Conference: 2012 7th International Conference on Computing and Convergence Technology

Automatic Arabic Text Categorization using Bayesian learning

Abstract

Automatic Text Categorization (ATC) is the task of automatically assigning an electronic document to a predefined category based on its content. Many supervised Machine Learning (ML) techniques have been used to solve the Text Categorization (TC) problem. The complex morphology of the Arabic language and its large vocabulary size make these techniques difficult to apply and costly in time and effort. We investigated Bayesian learning, which is based on Bayes' theorem, for the Arabic ATC problem. The Bayesian classifiers applied are Multivariate Gauss Naïve Bayes (MGNB), Flexible Bayes (FB), Multivariate Bernoulli Naïve Bayes (MBNB), and Multinomial Naïve Bayes (MNB). For text representation, word-level N-grams (1-gram, 2-gram, and 3-gram) were used. For Arabic stemming, a simple stemmer called the TREC-2002 Light Stemmer is used in the prototype. For feature selection, we used several techniques: Chi-Square Statistic (CHI), Odds Ratio (OR), Mutual Information (MI), and GSS Coefficient (GSS). The results showed that FB outperforms MNB, MBNB, and MGNB. The experimental results indicate that word-level n-grams for ATC based on Bayesian learning lead to acceptable results.
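
The pipeline the abstract describes (word-level n-gram features, feature scoring, a Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' prototype: it covers only the Chi-Square statistic and Multinomial Naïve Bayes, omits stemming and the other classifiers, assumes add-one smoothing, and uses a tiny hypothetical English corpus as a stand-in for Arabic text. All function, class, and variable names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n):
    """Return the word-level n-grams of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(text):
    """Combine 1-grams and 2-grams (an assumed choice for this sketch)."""
    return word_ngrams(text, 1) + word_ngrams(text, 2)


def chi_square(docs, labels, term, category):
    """Chi-square statistic for one (term, category) pair from a 2x2 table."""
    a = b = c = d = 0
    for feats, label in zip(docs, labels):
        present, in_cat = term in feats, label == category
        if present and in_cat:
            a += 1          # term present, document in category
        elif present:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.term_counts[label].update(feats)
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for t in feats:
                if t in self.vocab:  # terms unseen in training are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; real experiments would use stemmed Arabic documents.
train_texts = [
    "team wins football match",
    "football match ends in draw",
    "bank raises interest rate",
    "central bank cuts interest rate",
]
train_labels = ["sport", "sport", "economy", "economy"]
train_docs = [featurize(t) for t in train_texts]

model = MultinomialNB().fit(train_docs, train_labels)
print(chi_square(train_docs, train_labels, "football", "sport"))
print(model.predict(featurize("football match today")))
```

In a fuller setup, the chi-square scores would be used to keep only the highest-ranked n-grams per category before training, which is one common way to shrink the large vocabulary the abstract mentions.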
