Journal: ACM Transactions on Asian Language Information Processing

Arabic Text Categorization Based on Arabic Wikipedia


Abstract

This article describes an algorithm for categorizing Arabic text. It relies on highly categorized corpus-based datasets obtained from the Arabic Wikipedia, using manual and automated processes to build and customize categories. The categorization algorithm was built by adopting a simple categorization idea and then moving forward to more complex ones. We applied tests and filtration criteria to reach the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data, supported by well-defined Wikipedia-based categories. Our algorithm supports two levels of categorization for Arabic text: categories are grouped into a hierarchy of main categories and subcategories. This hierarchy introduces a challenge because certain subcategories are correlated and main categories overlap. We argue that our algorithm achieved good performance compared to other methods reported in the literature.
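
The two-level idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical relation the authors use, so cosine similarity over raw term counts is an assumption made here purely for illustration, and the toy reference data, category names, and the `categorize` function are all hypothetical. A real system would use Arabic tokens drawn from Wikipedia category pages.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy reference (training) data:
# main category -> subcategory -> tokens. Placeholder English tokens
# stand in for Arabic terms.
REFERENCE = {
    "sports": {
        "football": "goal match league striker goal".split(),
        "tennis": "serve match racket set serve".split(),
    },
    "economy": {
        "banking": "bank loan interest deposit bank".split(),
        "markets": "stock index shares trading stock".split(),
    },
}


def categorize(tokens, reference=REFERENCE):
    """Return (main_category, subcategory) for a tokenized input text."""
    query = Counter(tokens)

    # Level 1: score each main category against the pooled counts
    # of all its subcategories.
    main_profiles = {
        main: sum((Counter(toks) for toks in subs.values()), Counter())
        for main, subs in reference.items()
    }
    best_main = max(main_profiles, key=lambda m: cosine(query, main_profiles[m]))

    # Level 2: score only the subcategories of the chosen main category.
    subs = reference[best_main]
    best_sub = max(subs, key=lambda s: cosine(query, Counter(subs[s])))
    return best_main, best_sub


print(categorize("bank loan deposit".split()))
```

Choosing the main category first and then searching only its subcategories keeps the second step small, but it also means an error at the first level cannot be corrected at the second. That is one way the overlap between main categories mentioned in the abstract could hurt accuracy.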
