HYDRA: Massively Compositional Model for Cross-Project Defect Prediction

Xin Xia; David Lo; Sinno Jialin Pan; Nachiappan Nagappan; Xinyu Wang

Abstract

Most software defect prediction approaches are trained and applied on data from the same project. However, a new project often does not have enough training data. Cross-project defect prediction, which uses data from other projects to predict defects in a particular project, provides a new perspective on defect prediction. In this work, we propose a HYbrid moDel Reconstruction Approach (HYDRA) for cross-project defect prediction, which includes two phases: a genetic algorithm (GA) phase and an ensemble learning (EL) phase. Together, these two phases create a massive composition of classifiers. To examine the benefits of HYDRA, we perform experiments on 29 datasets from the PROMISE repository, which together contain a total of 11,196 instances (i.e., Java classes) labeled as defective or clean. We experiment with logistic regression as the underlying classification algorithm of HYDRA. We compare our approach with the most recently proposed cross-project defect prediction approaches: TCA+ by Nam et al., Peters filter by Peters et al., GP by Liu et al., MO by Canfora et al., and CODEP by Panichella et al. Our results show that HYDRA achieves an average F1-score of 0.544. On average, across the 29 datasets, these results correspond to improvements in F1-score of 26.22, 34.99, 47.43, 28.61, and 30.14 percent over TCA+, Peters filter, GP, MO, and CODEP, respectively. In addition, HYDRA on average can discover 33 percent of all bugs if developers inspect the top 20 percent of the lines of code, an improvement of 44.41 percent over the best baseline approach (TCA+). We also find that HYDRA improves on the F1-score of Zero-R, which predicts all instances to be defective, by only 5.42 percent, but improves on Zero-R by 58.65 percent when inspecting the top 20 percent of the lines of code. In practice, Zero-R can be hard to use since it simply predicts all of the instances to be defective, and thus developers have to inspect all of the instances to find the defective ones. Moreover, the improvements of HYDRA over the other baseline approaches, both in terms of F1-score and when inspecting the top 20 percent of the lines of code, are substantial, and in most cases they are significant and have large effect sizes across the 29 datasets.
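
The two evaluation measures named in the abstract can be illustrated with a small Python sketch. It is not the authors' evaluation code. The toy labels, scores, and line counts are invented for illustration, the function names are hypothetical, and the inspection measure here counts defective classes (rather than individual bugs) and ranks classes by predicted score, which is an assumption about the setup. The sketch shows why Zero-R can reach a moderate F1-score (its recall is always 1) while giving developers no useful inspection order.

```python
def f1_score(y_true, y_pred):
    """F1-score for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def defects_found_in_top_loc(scores, loc, y_true, budget=0.20):
    """Fraction of defective classes reached when classes are inspected in
    descending score order until `budget` of the total lines of code is used.
    Assumption: inspection stops at the first class that exceeds the budget."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(loc)
    inspected = 0
    found = 0
    for i in order:
        if inspected + loc[i] > limit:
            break
        inspected += loc[i]
        found += y_true[i]
    total = sum(y_true)
    return found / total if total else 0.0


# Toy data (hypothetical): 10 classes, 3 of them defective.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
loc = [50, 200, 100, 50, 300, 150, 50, 50, 50, 50]

# Zero-R: every class is predicted defective, with the same score.
zero_r_pred = [1] * len(y_true)
zero_r_scores = [1.0] * len(y_true)

# A hypothetical classifier that ranks the defective classes highly.
model_scores = [0.9, 0.2, 0.3, 0.8, 0.1, 0.7, 0.4, 0.2, 0.1, 0.3]
model_pred = [1 if s >= 0.5 else 0 for s in model_scores]

print("Zero-R F1:", f1_score(y_true, zero_r_pred))
print("Model  F1:", f1_score(y_true, model_pred))
# With constant scores, the Zero-R inspection order is arbitrary
# (here it is simply the original order of the classes).
print("Zero-R top-20% LOC:", defects_found_in_top_loc(zero_r_scores, loc, y_true))
print("Model  top-20% LOC:", defects_found_in_top_loc(model_scores, loc, y_true))
```

For Zero-R, precision equals the defect rate p and recall is 1, so its F1-score is 2p / (1 + p). That is why a trivial predictor can look competitive on F1-score alone, and why the abstract also reports results under a fixed inspection budget.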

Bibliographic details

Source
IEEE Transactions on Software Engineering, 2016, Issue 10, pp. 977-998 (22 pages)
Authors
Xin Xia; David Lo; Sinno Jialin Pan; Nachiappan Nagappan; Xinyu Wang
Author affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China;

School of Information Systems, Singapore Management University, Singapore;

School of Computer Engineering, Nanyang Technological University, Singapore;

Testing, Verification and Measurement Research, Microsoft Research, Redmond, WA;

College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Genetic algorithms; Predictive models; Training; Buildings; Architecture; Data models; Measurement;
