Published in: 《计算机科学》 (Computer Science)

Cost-Sensitive Software Defect Prediction Method Based on Boosting

Abstract

Boosting resampling is a common way to expand small-sample data sets. To counter the curse of dimensionality that arises during resampling, this paper first proposes a random attribute subset selection method for dimensionality reduction. Then, because software defect prediction penalizes missed defects (false negatives) and false alarms (false positives) differently, a cost-sensitive algorithm is added to the attribute selection process. Several basic k-NN predictors serve as weak learners, and attributes are deleted according to a minimum-cost principle, which yields, for the current sampled set, a set of predictors each defined by a k value and an attribute subset. A cost-sensitive weight update mechanism assigns corresponding weights to the different data instances during resampling. All predictor sets are combined into an adaptive ensemble k-NN strong learner, on which the software defect prediction model is built. Experiments on NASA data sets show that, with small samples, the proposed method substantially lowers the miss rate while raising the false alarm rate somewhat, and its overall performance is better than that of the original Boosting ensemble prediction method.
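The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: the penalty values `C_FN` and `C_FP`, the candidate k values, the half-size random attribute subset, and the weight update (an AdaBoost-style update in which a misclassified instance is additionally scaled by its error cost). The function names `knn_predict`, `total_cost`, `fit`, and `predict` are hypothetical.

```python
import math
import random

# Assumed penalties: a missed defect (false negative) costs more than a false alarm.
C_FN, C_FP = 5.0, 1.0

def knn_predict(train, x, k, feats):
    """Majority vote of the k nearest training rows, using only the columns in feats."""
    nearest = sorted(train, key=lambda r: sum((r[0][f] - x[f]) ** 2 for f in feats))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def total_cost(train, data, k, feats):
    """Total misclassification cost on data of a k-NN predictor built from train."""
    cost = 0.0
    for x, y in data:
        if knn_predict(train, x, k, feats) != y:
            cost += C_FN if y == 1 else C_FP
    return cost

def fit(data, rounds=5, ks=(1, 3, 5), seed=0):
    """data: list of (feature_tuple, label) pairs with label 1 = defective."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        sample = rng.choices(data, weights=w, k=n)    # weighted resampling
        feats = rng.sample(range(d), max(1, d // 2))  # random attribute subset
        k = min(ks, key=lambda kk: total_cost(sample, data, kk, feats))
        # Minimum-cost attribute deletion: drop an attribute if the cost does not rise.
        for f in list(feats):
            trial = [g for g in feats if g != f]
            if trial and total_cost(sample, data, k, trial) <= total_cost(sample, data, k, feats):
                feats = trial
        preds = [knn_predict(sample, x, k, feats) for x, _ in data]
        err = sum(wi for wi, p, (_, y) in zip(w, preds, data) if p != y)
        err = min(max(err, 1e-9), 1 - 1e-9)           # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Assumed cost-sensitive update: errors are up-weighted in proportion to their cost.
        for i, (p, (_, y)) in enumerate(zip(preds, data)):
            if p != y:
                w[i] *= math.exp(alpha) * (C_FN if y == 1 else C_FP)
            else:
                w[i] *= math.exp(-alpha)
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, sample, k, feats))
    return ensemble

def predict(ensemble, x):
    """Alpha-weighted vote of all stored k-NN predictors."""
    score = sum(a * (1 if knn_predict(s, x, k, f) == 1 else -1) for a, s, k, f in ensemble)
    return 1 if score > 0 else 0
```

Setting `C_FN` above `C_FP` pushes later rounds to concentrate on missed defects. That is one plausible reading of why the reported miss rate falls while the false alarm rate rises.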
