Published in: 2012 9th International Conference on Electrical Engineering, Computing Science and Automatic Control

Batch source-code plagiarism detection using an algorithm for the bounded longest common subsequence problem



Abstract

Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments in programming courses. Although reasonably easy to fool, string-based comparisons offer a high degree of accuracy with almost no false positives, and the length of the longest common subsequence of two strings is usually a good similarity metric. For two strings, the dynamic programming algorithm for this calculation unfortunately takes quadratic time even if the strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.
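The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple stand-in for each step: preprocessing is whitespace removal, the bound is a character-multiset intersection, and the search is the standard dynamic program abandoned as soon as the threshold becomes unreachable. The names `normalize`, `lcs_upper_bound`, `bounded_lcs`, `similar_pairs` and the `ratio` parameter are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def normalize(source):
    """Assumed preprocessing step: drop all whitespace."""
    return "".join(source.split())


def lcs_upper_bound(a, b):
    """Cheap bound: the LCS cannot exceed the multiset intersection of characters."""
    return sum((Counter(a) & Counter(b)).values())


def bounded_lcs(a, b, threshold):
    """Return the LCS length if it is >= threshold, otherwise None.

    The row-by-row dynamic program is abandoned once the best value so far
    plus the number of unprocessed characters of `a` cannot reach the threshold.
    """
    if lcs_upper_bound(a, b) < threshold:
        return None
    prev = [0] * (len(b) + 1)
    for i, x in enumerate(a, 1):
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        # Each remaining character of `a` can extend the LCS by at most one.
        if cur[-1] + (len(a) - i) < threshold:
            return None
        prev = cur
    return prev[-1] if prev[-1] >= threshold else None


def similar_pairs(files, ratio=0.8):
    """Report pairs whose LCS covers at least `ratio` of the longer file."""
    texts = {name: normalize(src) for name, src in files.items()}
    hits = []
    for (n1, t1), (n2, t2) in combinations(sorted(texts.items()), 2):
        threshold = max(1, math.ceil(ratio * max(len(t1), len(t2))))
        length = bounded_lcs(t1, t2, threshold)
        if length is not None:
            hits.append((n1, n2, length))
    return hits


if __name__ == "__main__":
    batch = {
        "a.py": "x = 1\nprint(x)\n",
        "b.py": "x  =  1\nprint( x )\n",
        "c.py": "import os\nos.getcwd()\n",
    }
    print(similar_pairs(batch))
```

In this sketch the worst case for a single pair is still quadratic; the saving comes from rejecting most dissimilar pairs with the cheap bound or after only a few rows of the table.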
