Conference: IEEE/ACM International Conference on Mining Software Repositories

Cross-Language Clone Detection by Learning Over Abstract Syntax Trees



Abstract

Clone detection across programs written in the same programming language has been studied extensively in the literature. By contrast, detecting clones across multiple programming languages has received far less attention, and approaches based on comparison cannot be applied directly. In this paper, we present a clone detection method based on semi-supervised machine learning, designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset (to the best of our knowledge, the first of its kind) containing around 45,000 code fragments written in Java and Python. We evaluate our approach on this dataset and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.
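As a rough illustration of what "learning over abstract syntax trees" might take as input, the sketch below flattens a Python fragment into a sequence of AST node type names and maps them to integer ids, the kind of token sequence that an embedding layer and an LSTM could then consume. This is not the authors' implementation: the function names `ast_token_sequence` and `build_vocabulary` are hypothetical, the traversal order and the choice of node type names as tokens are assumptions, and the embedding training, the LSTM classifier, and the Java side (which would need its own parser) are all omitted.

```python
import ast
from typing import Dict, Iterable, List


def ast_token_sequence(source: str) -> List[str]:
    """Return AST node type names of a Python fragment in depth-first pre-order.

    Assumption: node type names stand in for the paper's tokens.
    """
    tree = ast.parse(source)
    tokens = []
    stack = [tree]
    while stack:
        node = stack.pop()
        tokens.append(type(node).__name__)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return tokens


def build_vocabulary(sequences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct token a consecutive integer id (hypothetical helper)."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


fragment = "def add(a, b):\n    return a + b\n"
tokens = ast_token_sequence(fragment)
vocab = build_vocabulary([tokens])
ids = [vocab[t] for t in tokens]  # integer sequence for an embedding lookup
```

In a full pipeline, sequences like `ids` from a Java fragment and a Python fragment would each be encoded and the pair classified as clone or non-clone; a vocabulary shared across both languages is one possible design choice, not something the abstract specifies.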
