Conference: Software Quality Days Conference (SWQD)

On Identifying Similarities in Git Commit Trends: A Comparison Between Clustering and SimSAX


Abstract

Software products evolve increasingly fast as markets continuously demand new features and agility in responding to customers' needs. This evolution of products triggers an evolution of software development practices in a different way. Compared to classical methods, where products were developed in projects, contemporary methods for continuous integration, delivery, and deployment develop products as part of continuous programs. In this context, software architects, designers, and quality engineers need to understand how the processes evolve over time, since there is no natural start and stop of projects. For example, they need to know how similar two iterations of the same program are, or how similar two development programs are. In this paper, we compare three methods for calculating the degree of similarity between projects by comparing their Git commit series: the DNA-motifs-inspired SimSAX measure and two forms of subsequence clustering (k-Means and hierarchical clustering). Our results show that the clustering algorithms are much more sensitive to parameters and often find similarities that are not correct. SimSAX, on the other hand, can be calibrated to find fewer similarities between the projects; the similarities are also more consistent for SimSAX than they are for the clustering. We conclude that it is better to use DNA-inspired motifs as they provide more accurate results.
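The abstract names SAX-based motifs and subsequence comparison but gives no implementation detail. As a rough illustration only, the sketch below shows the generic SAX discretization (z-normalization, piecewise aggregate approximation, equiprobable Gaussian breakpoints) applied to sliding windows of a commit-count series, followed by a naive shared-word ratio between two series. The function names, window length, alphabet size, and the ratio itself are assumptions made for this example; this is not the SimSAX measure as defined in the paper.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def sax_word(window, segments=4, alphabet="abcd"):
    """Turn one numeric window into a SAX word (hypothetical helper).

    Assumes len(window) >= segments; trailing values that do not fill
    a whole segment are ignored.
    """
    mu, sigma = mean(window), pstdev(window)
    # A flat window has no shape; map it to zeros instead of dividing by 0.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    # Piecewise Aggregate Approximation: mean of equal-sized chunks.
    size = len(z) // segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_left(cuts, v)] for v in paa)


def sax_words(series, window=8, **kw):
    """SAX words for every sliding window of a commit-count series."""
    return [sax_word(series[i:i + window], **kw)
            for i in range(len(series) - window + 1)]


def shared_word_ratio(a, b, window=8, **kw):
    """Share of windows in `a` whose SAX word also occurs somewhere in `b`.

    Illustrative similarity only, not the paper's SimSAX definition.
    """
    words_a = sax_words(a, window, **kw)
    words_b = set(sax_words(b, window, **kw))
    if not words_a:
        return 0.0
    return sum(w in words_b for w in words_a) / len(words_a)


if __name__ == "__main__":
    # Made-up weekly commit counts, for demonstration only.
    print(sax_word([1, 1, 2, 2, 3, 3, 4, 4]))  # abcd
    project_a = [1, 1, 2, 2, 3, 3, 4, 4, 3, 3]
    project_b = [10, 10, 20, 20, 30, 30, 40, 40, 30, 30]
    print(shared_word_ratio(project_a, project_b))
```

Because each window is z-normalized before discretization, two projects with the same commit trend but different absolute volumes map to the same words. The window length and alphabet size are the kind of parameters that would need calibration on real data.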
