Journal: International Journal of Software Engineering and Knowledge Engineering

SMOTE and Feature Selection for More Effective Bug Severity Prediction


"Severity" is one of the essential features of software bug reports, which is a crucial factor for developers to decide which bug should be fixed immediately and which bug could be delayed to a next release. Severity assignment is a manual process and its accuracy depends on the experience of the assignee. Prior research proposed several models to automate this process. These models are based on textual preprocessing of historical bug reports and classification techniques. Although bug repositories suffer from severity class imbalance, none of the prior studies investigated the impact of implementing a class rebalancing technique on the accuracy of their models. In this paper, we propose a framework for predicting fine-grained severity levels which utilizes an over-sampling technique "SMOTE", to balance the severity classes, and a feature selection scheme, to reduce the data scale and select the most informative features for training a K-nearest neighbor (KNN) classifier. The KNN classifier utilizes a distance-weighted voting scheme to predict the proper severity level of a newly reported bug. We investigated the effectiveness of our proposed approach on two large bug repositories, namely Eclipse and Mozilla, and the experimental results showed that our approach outperforms cutting-edge studies in predicting the minority severity classes.
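The two core ideas in the abstract, SMOTE over-sampling and distance-weighted KNN voting, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The 2-D feature vectors, the labels "normal" and "critical", the function names, and the values of k are all hypothetical, and the feature selection step is omitted (the vectors stand in for features that were already selected).

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples, each interpolated between a random
    minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """train is a list of (features, label) pairs. Each of the k nearest
    neighbors votes for its label with weight 1 / distance."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (euclidean(features, query) + eps)
    return max(votes, key=votes.get)


# Hypothetical toy data: six "normal" reports and three "critical" ones.
majority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
            [0.3, 0.2], [0.25, 0.1], [0.2, 0.3]]
minority = [[0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]

# Over-sample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority))

train = [(p, "normal") for p in majority]
train += [(p, "critical") for p in minority + synthetic]

prediction = weighted_knn_predict(train, [0.8, 0.85], k=3)
print(len(synthetic), prediction)
```

Because each synthetic sample lies on a line segment between two real minority samples, the over-sampled class stays inside the region the minority class already occupies. For real data, library implementations such as imbalanced-learn's SMOTE and scikit-learn's KNeighborsClassifier with weights="distance" would be the more practical choice.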



