2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents

El Moatez Billah Nagoudi; Ahmed Khorsi; Hadda Cherroun; Didier Schwab


Abstract

Measuring the amount of information shared between two documents is key to addressing a number of Natural Language Processing (NLP) challenges, such as Information Retrieval (IR), Semantic Textual Similarity (STS), Sentiment Analysis (SA) and Plagiarism Detection (PD). In this paper, we report a plagiarism detection system based on two layers of assessment: 1) Fingerprinting, which simply compares the documents' fingerprints to detect verbatim reproduction; 2) Word embedding, which uses the semantic and syntactic properties of words to detect much more complicated reproductions. Moreover, Word Alignment (WA), Inverse Document Frequency (IDF) and Part-of-Speech (POS) weighting are applied to the examined documents to support the identification of the words that are most descriptive in each textual unit. In the present work, we focus on Arabic documents and evaluate the performance of the system on a dataset containing three types of plagiarism: 1) Simple reproduction (copy and paste); 2) Word and phrase shuffling; 3) Intelligent plagiarism, including synonym substitution, diacritics insertion and paraphrasing. The results show a recall of 88% and a precision of 86%. Compared to the results obtained by the systems participating in the Arabic Plagiarism Detection Shared Task 2015, our system outperforms all of them with a plagiarism detection score (Plagdet) of 83%.
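The two layers of assessment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word n-gram size, the MD5 hashing, the Jaccard overlap, both thresholds, the toy embedding table and all function names are assumptions made for illustration, and the Word Alignment and POS weighting steps are left out.

```python
import hashlib
import math


def fingerprints(text, n=3):
    """Hash every word n-gram of the text into a set of fingerprints."""
    words = text.split()
    count = max(len(words) - n + 1, 1)
    grams = [" ".join(words[i:i + n]) for i in range(count)]
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}


def fingerprint_similarity(a, b, n=3):
    """Level 1 (assumed measure): Jaccard overlap of the fingerprint sets."""
    fa, fb = fingerprints(a, n), fingerprints(b, n)
    return len(fa & fb) / len(fa | fb)


def weighted_vector(words, embeddings, idf):
    """Sum the word vectors of a textual unit, each scaled by its IDF weight."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for w in words:
        if w in embeddings:
            weight = idf.get(w, 1.0)  # assumption: unknown IDF counts as 1.0
            for i, x in enumerate(embeddings[w]):
                vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_level_check(suspicious, source, embeddings, idf,
                    fp_threshold=0.5, sem_threshold=0.8):
    """Try the cheap fingerprint test first, then the embedding test."""
    if fingerprint_similarity(suspicious, source) >= fp_threshold:
        return "verbatim"
    u = weighted_vector(suspicious.split(), embeddings, idf)
    v = weighted_vector(source.split(), embeddings, idf)
    if cosine(u, v) >= sem_threshold:
        return "semantic"
    return "none"


# Toy two-dimensional embeddings (hypothetical values, English for readability).
TOY_EMBEDDINGS = {
    "car": [1.0, 0.0], "automobile": [0.9, 0.1],
    "fast": [0.0, 1.0], "quick": [0.1, 0.9],
}

if __name__ == "__main__":
    print(two_level_check("the car is fast", "the car is fast", TOY_EMBEDDINGS, {}))
    print(two_level_check("the car is fast", "the automobile is quick", TOY_EMBEDDINGS, {}))
```

Running the cheap fingerprint comparison first means the embedding comparison is only needed for pairs that share little or no verbatim text, such as the synonym-substitution case in the second call.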


Bibliographic details

Source: Cybernetics and information technologies: CIT, 2017, Issue 1, 15 pages
Authors: El Moatez Billah Nagoudi; Ahmed Khorsi; Hadda Cherroun; Didier Schwab
Original format: PDF
Chinese Library Classification (CLC): automatic information theory

Similar literature


1. Towards Building an Arabic Plagiarism Detection System: Plagiarism Detection in Arabic [J]. Imtiaz Hussain Khan, Muazzam Ahmed Siddiqui, Kamal M. Jambi. International journal of information retrieval research. 2019, Issue 3
2. Plagiarism Detection in Arabic Documents: Approaches, Architecture and Systems [J]. Boubaker Kahloula, Jawad Berri. Journal of digital information management. 2016, Issue 2
3. Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information [J]. Nava Ehsan, Azadeh Shakery. Information Processing & Management. 2016, Issue 6
4. A Plagiarism Detection System for Arabic Documents [C]. Ashraf S. Hussein. IEEE International Conference Intelligent Systems. 2014
5. A Study Using Plagiarism Detection Services to Assess the Effect of an APA Formatting and Plagiarism Training Lesson on the Quality of Student Originality Scores [D]. Townsend, Grant R. 2017
6. Intelligent Bar Chart Plagiarism Detection in Documents [O]. Mohammed Mumtaz Al-Dabbagh, Naomie Salim, Amjad Rehman
7. 2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents [O]. El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun. 2018
