Boundary sampling to boost mutation testing for deep learning models

Shen Weijun; Li Yanhui; Han Yuanlei; Chen Lin; Wu Di; Zhou Yuming; Xu Baowen

Abstract

Context: The widespread application of Deep Learning (DL) models has raised concerns about their reliability. Because DL follows a data-driven programming paradigm, the quality of the test dataset is critical to assessing a DL model accurately. Recently, researchers have introduced mutation testing into DL testing: mutation operators are applied to a DL model to generate mutants, and the quality of the test dataset is judged by whether the test data can identify those mutants. However, many factors (e.g., huge labeling effort and high running cost) still hinder the use of mutation testing for DL models.

Objective: We seek an approach for selecting a smaller, sensitive, representative, and efficient subset of the whole test dataset to support current mutation testing for DL models (e.g., by reducing labeling and running cost).

Method: We propose boundary sample selection (BSS), which uses the distance of samples to the decision boundary of a DL model as the indicator for constructing an appropriate subset. To evaluate the performance of BSS, we conduct an extensive empirical study with two widely used datasets, three popular DL models, and 14 up-to-date DL mutation operators.

Results: We observe that (1) the subsets generated by BSS are much smaller (about 3%-20% of the whole test set); (2) under most mutation operators, our subsets are superior to the whole test sets (by about 9.94-21.63) in observing mutation effects; (3) our subsets can replace the whole test sets to a very high degree (higher than 97%) when considering mutation score; (4) the MRR values of our proposed subsets are clearly better (about 2.28-13.19 times higher) than those of the whole test sets.

Conclusions: The results show that BSS can help testers save labeling cost, run mutation testing quickly, and identify killed mutants early.
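
The abstract does not say how BSS measures a sample's distance to the decision boundary. As a minimal sketch only, assuming the gap between the two largest softmax probabilities serves as a proxy for that distance (an assumption, not the authors' stated method), a boundary subset could be selected as follows. The function name `boundary_sample_indices` and the `fraction` parameter are hypothetical.

```python
import numpy as np


def boundary_sample_indices(probs: np.ndarray, fraction: float = 0.1) -> np.ndarray:
    """Return indices of the samples assumed to lie closest to the decision boundary.

    probs    -- array of shape (n_samples, n_classes) holding softmax outputs
    fraction -- share of the test set to keep (hypothetical parameter)

    ASSUMPTION: a small gap between the top-2 class probabilities means the
    sample is near the boundary. The source abstract does not define the metric.
    """
    if probs.ndim != 2 or probs.shape[1] < 2:
        raise ValueError("probs must have shape (n_samples, n_classes >= 2)")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Two largest probabilities per row, in ascending order.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]

    # Keep at least one sample; the smallest margins come first.
    k = max(1, int(round(fraction * len(probs))))
    return np.argsort(margin, kind="stable")[:k]


if __name__ == "__main__":
    demo = np.array([[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]])
    print(boundary_sample_indices(demo, fraction=0.5))
```

Only the selected samples would then need labels and would be run against each mutant, which is where the labeling and running-cost savings described in the abstract would come from.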


Bibliographic details

Source
Information and Software Technology, 2021, Issue 2, pp. 106413.1-106413.16 (16 pages)
Authors
Shen Weijun; Li Yanhui; Han Yuanlei; Chen Lin; Wu Di; Zhou Yuming; Xu Baowen
Author affiliations

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Software Inst Nanjing 210023 Peoples R China;

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Dept Comp Sci & Technol Nanjing 210023 Peoples R China;

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Dept Comp Sci & Technol Nanjing 210023 Peoples R China;

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Dept Comp Sci & Technol Nanjing 210023 Peoples R China;

Momenta Nantiancheng Rd Suzhou Peoples R China;

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Dept Comp Sci & Technol Nanjing 210023 Peoples R China;

Nanjing Univ State Key Lab Novel Software Technol Nanjing 210023 Peoples R China|Nanjing Univ Dept Comp Sci & Technol Nanjing 210023 Peoples R China;

Original format: PDF
Language: English
Keywords
Software testing; Deep learning; Mutation testing; Boundary; Neural network;


