Conference: Australasian conference on Computer science

Signature extraction for overlap detection in documents

Abstract

Easy access to the Web has led to increased potential for students cheating on assignments by plagiarising others' work. By the same token, Web-based tools offer the potential for instructors to check submitted assignments for signs of plagiarism. Overlap-detection tools are easy to use and accurate in plagiarism detection, so they can be an excellent deterrent to plagiarism. Documents can overlap for other reasons, too: old documents are superseded, and authors summarize previous work identically in several papers. Overlap-detection tools can pinpoint interconnections in a corpus of documents and could be used in search engines.

We describe a web-accessible text registry based on signature extraction. We extract a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures. This comparison allows us to estimate the amount of overlap between pairs of documents, although the total time required is linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
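The abstract does not spell out how a signature is built or compared, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a common approach: hash every run of consecutive words, keep a deterministic subset of the hashes as the signature, and estimate overlap from the hashes two signatures share. The function names `signature` and `overlap_estimate`, the parameters `chunk_words` and `keep_one_in`, and the choice of SHA-256 are all illustrative assumptions.

```python
import hashlib
import re


def signature(text, chunk_words=5, keep_one_in=4):
    """Return a small set of chunk hashes standing in for the whole text.

    Assumption: chunks are overlapping runs of `chunk_words` words, and a
    chunk is kept only when its hash is divisible by `keep_one_in`.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = set()
    for i in range(len(words) - chunk_words + 1):
        chunk = " ".join(words[i:i + chunk_words])
        digest = hashlib.sha256(chunk.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        # The same chunk always gets the same hash, so two documents that
        # share a passage keep (or drop) the same chunks from it.
        if value % keep_one_in == 0:
            kept.add(value)
    return kept


def overlap_estimate(sig_a, sig_b):
    """Fraction of the smaller signature that also appears in the other."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))


if __name__ == "__main__":
    original = ("We extract a small but diagnostic signature from each "
                "registered text for permanent storage and comparison.")
    copied = "As the authors say: " + original + " We agree with this."
    # keep_one_in=1 keeps every chunk, which suits very short demo texts.
    a = signature(original, keep_one_in=1)
    b = signature(copied, keep_one_in=1)
    print(f"estimated overlap: {overlap_estimate(a, b):.2f}")
```

Each document is read once to build its signature, which fits the abstract's remark that total time is linear in the total size of the documents. Only the stored signatures are needed for later comparisons, so the registry in this sketch would not have to keep the full texts.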
