An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data

Abstract

Software quality prediction is an important yet difficult problem in software project development and management. Historical datasets can be used to build models for software quality prediction. However, missing data significantly affects the predictive ability of models in knowledge discovery. Instead of ignoring missing observations, we investigate and improve incomplete-case k-nearest neighbor imputation. K-nearest neighbor imputation is widely applied, but it has rarely been refined so that each imputation uses the most appropriate parameter settings. This work performs imputation on four well-known software quality datasets to assess the impact of the proposed imputation method. We compare it with mean imputation and other commonly used versions of k-nearest neighbor imputation. The empirical results show that the proposed dynamic incomplete-case nearest neighbor imputation performs better when the missingness is completely at random or non-ignorable, regardless of the percentage of missing values.
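
To make the two baseline ideas named in the abstract concrete, here is a minimal sketch of mean imputation next to a fixed-k incomplete-case nearest neighbor imputation. It is not the authors' dynamic method, whose parameter selection the abstract does not describe. The function names `mean_impute` and `ic_knn_impute`, the Euclidean distance, the donor rule (a row may donate a value if it is observed on the target feature and on every feature the incomplete row has observed), and the fallback to the column mean are all assumptions made for illustration.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    out = X.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out

def ic_knn_impute(X, k=2):
    """Fixed-k incomplete-case kNN imputation (illustrative assumption).

    A row is an eligible donor for the missing cell (i, j) if it is observed
    on feature j and on every feature that row i has observed. Donors may
    therefore be incomplete themselves.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    out = X.copy()
    for i, j in zip(*np.where(~observed)):
        shared = observed[i]            # features observed in the target row
        needed = shared.copy()
        needed[j] = True                # donors must also have feature j
        donors = np.where(observed[:, needed].all(axis=1))[0]
        if donors.size == 0 or not shared.any():
            out[i, j] = col_means[j]    # assumed fallback: column mean
            continue
        # Euclidean distance over the features the target row has observed
        diffs = X[donors][:, shared] - X[i, shared]
        dists = np.sqrt((diffs ** 2).sum(axis=1))
        nearest = donors[np.argsort(dists, kind="stable")[:k]]
        out[i, j] = X[nearest, j].mean()  # average the k donors' values
    return out
```

Distances and donor values are always read from the original matrix, so the order in which cells are imputed does not change the result. A method that tunes `k` or the donor rule per missing cell would replace the fixed `k` argument here.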

Bibliographic details

Source: IEEE International Conference on Software Quality, Reliability and Security, 2015, pp. 37-42 (6 pages)
Authors: Huang Jianglin; Sun Hongyi; Li Yan-Fu; Xie Min
Original format: PDF
Keywords: imputation; incomplete-case; k-nearest neighbor; missing data treatments; software quality data
