Conference: IEEE International Conference on Software Maintenance

Who's who in Gnome: Using LSA to merge software repository identities


Abstract

Understanding an individual's contribution to an ecosystem often necessitates integrating information from multiple repositories corresponding to different projects within the ecosystem or different kinds of repositories (e.g., mail archives and version control systems). However, recognising that different contributions belong to the same contributor is challenging, since developers may use different aliases. It is known that existing identity merging algorithms are sensitive to large discrepancies between the aliases used by the same individual: the noisier the data, the worse their performance. To assess the scale of the problem for a large software ecosystem, we study all Gnome Git repositories, classify the differences in aliases, and discuss robustness of existing algorithms with respect to these types of differences. We then propose a new identity merging algorithm based on Latent Semantic Analysis (LSA), designed to be robust against more types of differences in aliases, and evaluate it empirically by means of cross-validation on Gnome Git authors. Our results show a clear improvement over existing algorithms in terms of precision and recall on worst-case input data.
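To make the general idea concrete, the following is a minimal sketch of how Latent Semantic Analysis can be applied to alias matching. It is not the algorithm proposed in the paper: the tokenisation, the rank, the similarity threshold, and the function names `tokens`, `lsa_similarity` and `candidate_pairs` are all assumptions chosen for illustration, and the aliases are made up.

```python
import re
import numpy as np

def tokens(alias):
    """Split a 'Name <email>' alias into lowercase tokens (hypothetical tokenisation)."""
    return [t for t in re.split(r"[^a-z0-9]+", alias.lower()) if len(t) > 1]

def lsa_similarity(aliases, rank=2):
    """Cosine similarity between aliases after a rank-reduced SVD (the core of LSA)."""
    vocab = sorted({t for a in aliases for t in tokens(a)})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-by-alias count matrix: rows are terms, columns are aliases.
    counts = np.zeros((len(vocab), len(aliases)))
    for j, alias in enumerate(aliases):
        for t in tokens(alias):
            counts[index[t], j] += 1.0
    # Keep only the `rank` strongest singular directions.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    docs = (np.diag(s[:k]) @ vt[:k]).T  # one row per alias in the reduced space
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty aliases
    docs = docs / norms
    return docs @ docs.T

def candidate_pairs(aliases, threshold=0.9, rank=2):
    """Return alias pairs whose reduced-space similarity reaches the (assumed) threshold."""
    sim = lsa_similarity(aliases, rank)
    n = len(aliases)
    return [(aliases[i], aliases[j])
            for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]

if __name__ == "__main__":
    example = [
        "John Smith <john.smith@gnome.org>",
        "jsmith <john.smith@gmail.com>",
        "Alice Jones <alice@example.net>",
    ]
    for a, b in candidate_pairs(example):
        print("possible same identity:", a, "|", b)
```

In this toy input the first two aliases share the tokens `john` and `smith`, so they fall onto the same latent direction and are reported as a candidate pair, while the third alias shares no tokens with them. Real repository data would need a far richer term model and a threshold tuned against labelled identities, for example by the kind of cross-validation the abstract mentions.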
