Arabic Text Categorization Based on Arabic Wikipedia

ADNAN YAHYA; ALI SALHI

Abstract

This article describes an algorithm for categorizing Arabic text. It relies on highly categorized corpus-based datasets obtained from the Arabic Wikipedia, using manual and automated processes to build and customize categories. The categorization algorithm was built by starting from a simple categorization idea and then moving to more complex ones. We applied tests and filtration criteria to reach the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data, supported by well-defined Wikipedia-based categories. Our algorithm supports two levels for categorizing Arabic text: categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge due to the correlation between certain subcategories and the overlap between main categories. We argue that our algorithm achieved good performance compared to other methods reported in the literature.
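
The abstract does not specify the statistical measure or the preprocessing the authors use, so the following is only a minimal sketch of the general two-level idea, not the article's algorithm. It assumes term-frequency profiles compared by cosine similarity and plain whitespace tokenization (no normalization or light stemming). The names `tokenize`, `cosine`, `build_profiles`, `categorize`, and the toy reference data are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization only. A real Arabic pipeline would
    # also normalize letters and apply light stemming.
    return text.split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(reference):
    # reference: {main category: {subcategory: [training texts]}}
    # Returns {main category: (main profile, {subcategory: profile})}.
    profiles = {}
    for main, subs in reference.items():
        sub_profiles = {
            sub: Counter(tok for doc in docs for tok in tokenize(doc))
            for sub, docs in subs.items()
        }
        main_profile = Counter()
        for profile in sub_profiles.values():
            main_profile.update(profile)  # main profile = sum of its subcategories
        profiles[main] = (main_profile, sub_profiles)
    return profiles


def categorize(text, profiles):
    # Level 1: pick the most similar main category.
    vec = Counter(tokenize(text))
    main = max(profiles, key=lambda m: cosine(vec, profiles[m][0]))
    # Level 2: pick the most similar subcategory within that main category.
    subs = profiles[main][1]
    sub = max(subs, key=lambda s: cosine(vec, subs[s]))
    return main, sub


# Toy reference data (hypothetical, not taken from the Arabic Wikipedia).
reference = {
    "sports": {
        "football": ["مباراة كرة القدم فريق هدف لاعب"],
        "basketball": ["مباراة كرة السلة فريق سلة لاعب"],
    },
    "economy": {
        "banking": ["بنك قرض فائدة سعر"],
        "oil": ["نفط سعر برميل سوق"],
    },
}
profiles = build_profiles(reference)
print(categorize("سعر برميل نفط", profiles))
```

Choosing the main category first and only then comparing subcategories keeps the second step small, but it also means an error at the first level cannot be recovered at the second, which is one way the overlap between main categories mentioned in the abstract can hurt accuracy.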

Bibliographic details

Source
ACM Transactions on Asian Language Information Processing | 2014, Issue 1 | pp. 4.1-4.20 | 20 pages
Authors
ADNAN YAHYA; ALI SALHI
Author affiliations

Department of Computer Systems Engineering, Birzeit University, Birzeit, Palestine;

Department of Computer Systems Engineering, Birzeit University, Birzeit, Palestine;

Indexing information
Original format: PDF
Language: English
Keywords
Arabic natural language processing; Arabic Wikipedia; categorized corpora; text categorization; light stemming; text analysis;

Similar literature

1. Arabic text categorization based on Arabic Wikipedia [J]. Lalit Saxena. Computing Reviews, 2014, Issue 9.
2. Effective Arabic Stemmer Based Hybrid Approach for Arabic Text Categorization [J]. Meryeme Hadni, Said Alaoui Ouatik, Abdelmonaime Lachkar. International Journal of Data Mining & Knowledge Management Process, 2013, Issue 4.
3. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J]. Gadri Said, Moussaoui Abdelouahab. The International Arab Journal of Information Technology, 2017, Issue 6.
4. Predicting the Popularity of Trending Arabic Wikipedia Articles Based on External Stimulants Using Data/Text Mining Techniques [C]. Al-Mutairi Hanadi Muqbil, Khan Mohammad Badruddin. 2015 International Conference on Cloud Computing, 2015.
5. From text to context: Literacy practices of native speakers of Arabic in Arabic and English [D]. Gherwash, Ghada. 2016.
6. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization [O]. Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi. 2019.
7. Arabic text categorization based on Arabic Wikipedia [O]. Yahya Adnan, Salhi Ali. 2014.
