Cross-Language Clone Detection by Learning Over Abstract Syntax Trees

Abstract

Clone detection across programs written in the same programming language has been studied extensively in the literature. On the contrary, the task of detecting clones across multiple programming languages has not been studied as much, and approaches based on comparison cannot be directly applied. In this paper, we present a clone detection method based on semi-supervised machine learning designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset - which is to the best of our knowledge the first of its kind - containing around 45,000 code fragments written in Java and Python. We evaluate our approach on the dataset we created and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.
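
As a rough illustration of the token-level input the abstract mentions, the sketch below linearizes a Python abstract syntax tree into a sequence of node-type tokens and assigns each token an integer id, which is the kind of sequence an embedding layer and an LSTM could consume. The depth-first traversal order, the function names `ast_node_sequence` and `build_vocabulary`, and the vocabulary scheme are assumptions made for illustration, not details taken from the paper; Java fragments would need a separate parser that emits a comparable node vocabulary.

```python
import ast
from collections import Counter


def ast_node_sequence(source):
    """Return the AST node type names of `source` in depth-first pre-order.

    Hypothetical helper: the traversal order is an assumption, not the
    paper's confirmed preprocessing.
    """
    tree = ast.parse(source)
    sequence = []

    def visit(node):
        # Record the node type, then descend into its children in field order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


def build_vocabulary(sequences):
    """Map each node type to an integer id, most frequent type first."""
    counts = Counter(token for seq in sequences for token in seq)
    return {token: index for index, (token, _) in enumerate(counts.most_common())}


if __name__ == "__main__":
    fragments = [
        "def add(a, b):\n    return a + b\n",
        "total = sum(range(10))\n",
    ]
    sequences = [ast_node_sequence(src) for src in fragments]
    vocabulary = build_vocabulary(sequences)
    # Integer-encoded sequences, ready to be fed to an embedding layer.
    encoded = [[vocabulary[token] for token in seq] for seq in sequences]
    print(sequences[0])
    print(encoded[0])
```

Working on node types rather than raw source text is what makes a shared representation across syntactically similar languages plausible, since constructs such as function definitions, returns, and binary operations have counterparts in both Java and Python.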

Bibliographic details

Source: IEEE/ACM International Conference on Mining Software Repositories, 2019, xxxiv 606 p., 11 pages
Authors: Daniel Perez; Shigeru Chiba
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords: Java; learning (artificial intelligence); program diagnostics; program verification; software libraries; software maintenance; tree data structures; unsupervised learning

Similar literature

1. Programmers' de-anonymization using a hybrid approach of abstract syntax tree and deep learning [J]. Ullah Farhan, Jabbar Sohail, Al-Turjman Fadi. Technological forecasting and social change. 2020, Issue Octa
2. Static code detection based on abstract syntax tree [J]. Lu Xiaofeng, Fang Denghui. Basic & clinical pharmacology & toxicology. 2020, Issue S9
3. Static code detection based on abstract syntax tree [J]. Lu Xiaofeng, Fang Denghui. Basic & clinical pharmacology & toxicology. 2019, Issue S1
4. Cross-Language Clone Detection by Learning Over Abstract Syntax Trees [C]. Daniel Perez, Shigeru Chiba. IEEE/ACM International Conference on Mining Software Repositories. 2019
5. Abstract Syntax Tree-Based Code Smell Detection and Refactoring [D]. Patodiya aka Patoliya, Aditi aka Palak. 2018
6. Artificial Grammar Learning Capabilities in an Abstract Visual Task Match Requirements for Linguistic Syntax [O]. Gesche Westphal-Fitch, Beatrice Giustolisi, Carlo Cecchetto
7. Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree [O]. Wenhan Wang, Ge Li, Bo Ma. 2020
