Conference: IEEE/ACM International Conference on Mining Software Repositories

A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software

Abstract

Advancing our understanding of software vulnerabilities and automating their identification, the analysis of their impact, and ultimately their mitigation are necessary to enable the development of more secure software. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data were obtained both from the National Vulnerability Database (NVD) and from project-specific web resources, which we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Of the 624 vulnerabilities, 29 do not have a CVE (Common Vulnerabilities and Exposures) identifier at all, and 46, which do have such an identifier assigned by a numbering authority, are not yet available in the NVD. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Moreover, these scripts make it possible to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open source will allow future research to be based on data of industrial relevance; it also represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and industry.
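
The mapping from vulnerabilities to fix commits described in the abstract can be pictured with a small Python sketch. The abstract does not specify the dataset's file format, so the CSV layout below (columns `vuln_id`, `repo_url`, `commit_id`), the function names, and the sample rows are all assumptions made for illustration; the rows are placeholders, not entries from the dataset.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical sample rows (placeholders, NOT taken from the dataset):
# one row per fix commit, several rows may share a vulnerability.
SAMPLE_CSV = """vuln_id,repo_url,commit_id
CVE-0000-0001,https://example.org/project-a.git,aaaaaaa
CVE-0000-0001,https://example.org/project-a.git,bbbbbbb
PROJECT-B-123,https://example.org/project-b.git,ccccccc
"""

# Standard CVE identifier shape: CVE-<year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


def load_fix_commits(text):
    """Group fix commits by vulnerability: vuln_id -> [(repo_url, commit_id)]."""
    fixes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        fixes[row["vuln_id"]].append((row["repo_url"], row["commit_id"]))
    return dict(fixes)


def summarize(fixes):
    """Count vulnerabilities, projects, commits, and entries without a CVE id."""
    repos = {repo for commits in fixes.values() for repo, _ in commits}
    without_cve = [v for v in fixes if not CVE_PATTERN.match(v)]
    return {
        "vulnerabilities": len(fixes),
        "projects": len(repos),
        "commits": sum(len(commits) for commits in fixes.values()),
        "without_cve": len(without_cve),
    }


fixes = load_fix_commits(SAMPLE_CSV)
summary = summarize(fixes)
print(summary)
```

Under these assumptions, the same grouping applied to the released data would yield the kind of counts the abstract reports (vulnerabilities, distinct projects, fix commits, and vulnerabilities lacking a CVE identifier).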
