Journal: Information Processing & Management

Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges



Abstract

The proposed work aims to explore and compare the effectiveness of syntactic-semantic linguistic structures for plagiarism detection using natural language processing techniques. It examines linguistic features, namely part-of-speech tags, chunks and semantic roles, for detecting plagiarized fragments, and uses a combined syntactic-semantic similarity metric that draws semantic concepts from the WordNet lexical database. The linguistic information is used both for effective pre-processing and for semantically relevant comparisons. Another major contribution is an analysis of the proposed approach on plagiarism cases of varying complexity: the impact of plagiarism type and complexity level on the extracted features is analyzed and discussed. Further, unlike existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpora provided by the PAN competition (http://pan.webis.de) from 2009 to 2014. The approach showed considerable improvement over the top-ranked systems of the respective years. The evaluation and analysis across various cases of plagiarism also indicated the superiority of deeper linguistic features for identifying manually plagiarized data.
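The abstract does not give the exact formula of the combined syntactic-semantic metric, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes input that is already POS-tagged (Penn Treebank tags) and lemmatized, uses POS tags for pre-processing (keeping content words), scores semantic overlap through synonym matches, and scores syntactic overlap as Jaccard similarity of POS-tag bigrams. `TOY_SYNSETS` is a hypothetical stand-in for WordNet synsets, and the weight `alpha` and all function names are illustrative assumptions; chunks and semantic roles are not modeled.

```python
from typing import Dict, List, Set, Tuple

# (lemma, Penn Treebank POS tag) pairs; tagging and lemmatization are
# assumed to have been done upstream by some NLP toolkit.
Tagged = List[Tuple[str, str]]

# Hypothetical toy synonym table standing in for WordNet synsets.
TOY_SYNSETS: Dict[str, Set[str]] = {
    "car": {"car", "automobile", "auto"},
    "automobile": {"car", "automobile", "auto"},
    "fast": {"fast", "quick", "rapid"},
    "quick": {"fast", "quick", "rapid"},
    "buy": {"buy", "purchase"},
    "purchase": {"buy", "purchase"},
}

# Tag prefixes treated as content words (nouns, verbs, adjectives, adverbs).
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")


def content_words(x: Tagged) -> List[str]:
    """POS-based pre-processing: keep only content-word lemmas."""
    return [w for w, t in x if t.startswith(CONTENT_PREFIXES)]


def synonyms(word: str) -> Set[str]:
    """Toy synonym set of a word; unknown words map to themselves."""
    return TOY_SYNSETS.get(word, {word})


def semantic_similarity(a: Tagged, b: Tagged) -> float:
    """Symmetric share of content words that have a synonym-level match."""
    def directed(xs: List[str], ys: List[str]) -> float:
        if not xs:
            return 0.0
        targets = set(ys)
        hits = sum(1 for w in xs if synonyms(w) & targets)
        return hits / len(xs)

    ca, cb = content_words(a), content_words(b)
    return (directed(ca, cb) + directed(cb, ca)) / 2


def syntactic_similarity(a: Tagged, b: Tagged) -> float:
    """Jaccard overlap of POS-tag bigrams (a crude structural signal)."""
    def bigrams(x: Tagged) -> Set[Tuple[str, str]]:
        tags = [t for _, t in x]
        return set(zip(tags, tags[1:]))

    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def combined_similarity(a: Tagged, b: Tagged, alpha: float = 0.5) -> float:
    """Weighted mix of both scores; alpha is a hypothetical weight."""
    return alpha * semantic_similarity(a, b) + (1 - alpha) * syntactic_similarity(a, b)


# "they buy a fast car" vs. the passive paraphrase "a quick automobile is purchased"
SOURCE: Tagged = [("they", "PRP"), ("buy", "VBP"), ("a", "DT"),
                  ("fast", "JJ"), ("car", "NN")]
SUSPICIOUS: Tagged = [("a", "DT"), ("quick", "JJ"), ("automobile", "NN"),
                      ("be", "VBZ"), ("purchase", "VBN")]

print(round(combined_similarity(SOURCE, SUSPICIOUS), 3))  # prints 0.604
```

In this example the synonym substitutions keep the semantic score high (0.875) while the active-to-passive restructuring lowers the POS-bigram score (about 0.333), which is the kind of interaction between plagiarism type and features that the abstract says is analyzed. A real system would query WordNet itself (for instance through NLTK's WordNet interface) and would need a tuned decision threshold, which the abstract does not specify.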

