When only a few modules in a software history repository carry defect labels, building an effective prediction model is a challenging problem. To address it, a semi-supervised dictionary learning method for software defect prediction based on twice (two-stage) learning is proposed. In the first stage, a sparse representation classifier assigns probabilistic soft labels to a large number of unlabeled samples, which are then added to the labeled training set. In the second stage, discriminative dictionary learning is performed on this expanded training set. Finally, defect proneness is predicted using the learned dictionary. Experiments on the widely used NASA MDP and PROMISE AR datasets demonstrate the superiority of the proposed method.
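The first learning stage can be illustrated with a minimal, dependency-free sketch of sparse-representation-based soft labeling. It assumes the standard SRC recipe: unit-normalize the labeled samples as dictionary atoms, sparse-code an unlabeled sample over them, and compare class-wise reconstruction residuals. The ISTA solver, the parameters `lam` and `temperature`, the softmax-over-residuals rule for turning residuals into probabilities, and all function names (`soft_labels`, `expand_training_set`, etc.) are illustrative assumptions, not the paper's exact formulation. The second-stage discriminative dictionary learning is not shown.

```python
import math

# Hypothetical sketch of the first learning stage: SRC-style soft labeling.
# The ISTA solver, lam, temperature, and the softmax-over-residuals rule are
# illustrative assumptions, not the paper's exact formulation.


def normalize(v):
    """Scale a vector to unit L2 norm (SRC conventionally normalizes atoms)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrinks x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_code(D, y, lam=0.01, iters=200):
    """Approximately solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA.

    D is a list of atoms (each atom is one labeled training sample)."""
    n, m = len(D), len(y)
    # Squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    L = sum(v * v for atom in D for v in atom) or 1.0
    a = [0.0] * n
    for _ in range(iters):
        # Residual D a - y, then gradient D^T (D a - y).
        r = [sum(D[j][i] * a[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(D[j][i] * r[i] for i in range(m)) for j in range(n)]
        a = [soft_threshold(a[j] - g[j] / L, lam / L) for j in range(n)]
    return a


def soft_labels(X_lab, y_lab, x, lam=0.01, temperature=1.0):
    """Return {class: probability} for one unlabeled sample x."""
    D = [normalize(v) for v in X_lab]
    y = normalize(x)
    a = sparse_code(D, y, lam)
    residuals = {}
    for c in sorted(set(y_lab)):
        # Reconstruct y from class-c atoms only; a small residual means x fits class c.
        recon = [sum(D[j][i] * a[j] for j in range(len(D)) if y_lab[j] == c)
                 for i in range(len(y))]
        residuals[c] = math.sqrt(sum((y[i] - recon[i]) ** 2 for i in range(len(y))))
    # Assumed rule: softmax over negative residuals turns them into probabilities.
    best = min(residuals.values())
    w = {c: math.exp(-(r - best) / temperature) for c, r in residuals.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}


def expand_training_set(X_lab, y_lab, X_unlab, lam=0.01, temperature=1.0):
    """Labeled samples keep one-hot labels; unlabeled ones get soft labels."""
    classes = sorted(set(y_lab))
    expanded = [(x, {c: 1.0 if c == lab else 0.0 for c in classes})
                for x, lab in zip(X_lab, y_lab)]
    for x in X_unlab:
        expanded.append((x, soft_labels(X_lab, y_lab, x, lam, temperature)))
    return expanded


if __name__ == "__main__":
    # Toy data: label 1 = defective, 0 = clean (hypothetical module metrics).
    X_lab = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    y_lab = [1, 0]
    X_unlab = [[0.95, 0.05, 0.0]]
    for x, probs in expand_training_set(X_lab, y_lab, X_unlab):
        print(x, probs)
```

In the toy run, the unlabeled sample lies close to the defective atom, so its soft label puts most of the probability mass on class 1 while still keeping some on class 0. Retaining that uncertainty, rather than forcing a hard label, is the point of soft labeling when the extra samples are later used for dictionary learning.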