Batch source-code plagiarism detection using an algorithm for the bounded longest common subsequence problem

Abstract

Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments in programming courses. Although reasonably easy to fool, string-based comparisons offer a high degree of accuracy with almost no false positives, and a good similarity metric for two strings is usually the length of their longest common subsequence. For two strings, the dynamic programming algorithm for this calculation unfortunately takes quadratic time even if the strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.
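
The abstract does not spell out the preprocessing step or the branch-and-bound algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It shows the quadratic dynamic program for the longest common subsequence, then a bounded variant that only answers whether the length reaches a threshold and stops as soon as simple upper bounds rule the pair out. The function names, the character-count bound, and the `min_percent` parameter are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def lcs_length(a, b):
    """Classic dynamic program: O(len(a) * len(b)) time, two rows of memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_at_least(a, b, threshold):
    """Return True if LCS(a, b) >= threshold, giving up early when it cannot be."""
    if threshold <= 0:
        return True
    # Upper bound 1: the LCS is never longer than the shorter string.
    if min(len(a), len(b)) < threshold:
        return False
    # Upper bound 2 (assumed for this sketch): the LCS is never longer than
    # the number of characters the two strings share, counted with multiplicity.
    if sum((Counter(a) & Counter(b)).values()) < threshold:
        return False
    prev = [0] * (len(b) + 1)
    for i, x in enumerate(a, start=1):
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        if curr[-1] >= threshold:
            return True  # threshold already reached
        # Each remaining character of a can extend the LCS by at most one.
        if curr[-1] + (len(a) - i) < threshold:
            return False
        prev = curr
    return False


def similar_pairs(files, min_percent=80):
    """Return name pairs whose LCS covers min_percent of the longer file."""
    result = []
    for (name_a, text_a), (name_b, text_b) in combinations(files.items(), 2):
        longest = max(len(text_a), len(text_b))
        threshold = (min_percent * longest + 99) // 100  # integer ceiling
        if lcs_at_least(text_a, text_b, threshold):
            result.append((name_a, name_b))
    return result


if __name__ == "__main__":
    batch = {
        "a.c": "int x = 1; return x;",
        "b.c": "int y = 1; return y;",
        "c.c": "print('hello world')",
    }
    print(similar_pairs(batch))
```

In this sketch, pairs that fail either cheap bound never reach the quadratic loop, which is the kind of saving a batch comparison over many files depends on.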

Bibliographic record

Source
2012 9th International Conference on Electrical Engineering, Computing Science and Automatic Control, 2012, pp. 1-4 (4 pages)
Conference location: Mexico City (MX)
Authors
Campos R. A. Castro; Martinez F. J. Zaragoza
Author affiliation

Departamento de Sistemas, UAM Azcapotzalco, Mexico City, Mexico;

Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Plagiarism detection; branch and bound; longest common subsequence; source-code;

Similar literature

1. Exact algorithms for the repetition-bounded longest common subsequence problem [J]. Asahiro Yuichi, Jansson Jesper, Lin Guohui. Theoretical Computer Science. 2020, No. 1
2. An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis [J]. Cosma Georgina, Joy Mike. IEEE Transactions on Computers. 2012, No. 3
3. Evaluating the Performance of LSA for Source-code Plagiarism Detection [J]. Georgina Cosma, Mike Joy. Informatica: An International Journal of Computing and Informatics. 2012, No. 4
4. Batch source-code plagiarism detection using an algorithm for the bounded longest common subsequence problem [C]. Campos R. A. Castro, Martinez F. J. Zaragoza. International Conference on Electrical Engineering, Computing Science and Automatic Control. 2012
5. A Study Using Plagiarism Detection Services to Assess the Effect of an APA Formatting and Plagiarism Training Lesson on the Quality of Student Originality Scores [D]. Townsend, Grant R. 2017
6. A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem [O]. Jiaoyun Yang, Yun Xu, Yi Shang
7. An approach to source-code plagiarism detection investigation using latent semantic analysis [O]. Cosma Georgina. 2008
