International Conference on Advanced Cognitive Technologies and Applications

Linearithmic Corpus to Corpus Comparison by Sentence Hashing Algorithm SHAPD2


Abstract

This work presents an innovative method for comparing sets of textual documents in order to identify common phrase sequences. The SHAPD2 (Sentence Hashing Algorithm for Plagiarism Detection 2) algorithm was designed to achieve single-pass corpus-to-corpus comparison. It was developed taking into account results and observations from previous research activities. It is a highly efficient solution suited to considerable amounts of data, and it outperforms other approaches. One possible application is the detection of potential plagiarism by comparing not a single document against a corpus, but one corpus against another. The algorithm's performance allows it to be used in situations where results have to be served immediately after a query is issued. This makes SHAPD2 a valuable alternative to the available solutions.
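
The abstract does not describe the internals of SHAPD2, so the following is only a minimal sketch of the general idea of sentence hashing for corpus-to-corpus comparison, not the authors' algorithm. The function names (`sentence_hashes`, `build_index`, `shared_sentences`), the normalization rules, the three-word minimum, and the use of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


def sentence_hashes(text):
    """Yield (digest, normalized_sentence) for each sentence in text.

    Normalization (lowercasing, dropping punctuation) is an assumption
    chosen for this sketch, not taken from the paper.
    """
    for raw in re.split(r"[.!?]+", text):
        words = re.findall(r"\w+", raw.lower())
        if len(words) < 3:  # skip fragments too short to be meaningful
            continue
        normalized = " ".join(words)
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        yield digest, normalized


def build_index(corpus):
    """Map each sentence digest to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for digest, _ in sentence_hashes(text):
            index[digest].add(doc_id)
    return index


def shared_sentences(corpus_a, corpus_b):
    """Count matching sentences for every (doc_a, doc_b) pair.

    Corpus A is indexed once; corpus B is then read once and each of its
    sentence digests is looked up in the index.
    """
    index = build_index(corpus_a)
    matches = defaultdict(int)
    for doc_b, text in corpus_b.items():
        for digest, _ in sentence_hashes(text):
            for doc_a in index.get(digest, ()):
                matches[(doc_a, doc_b)] += 1
    return dict(matches)
```

In this sketch each corpus is a plain dictionary from a document id to its text, and the result maps document pairs to the number of sentences they share. Exact-hash matching like this only finds sentences that are identical after normalization; detecting reordered or lightly edited phrases would need a more tolerant scheme.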
