Statistical Bayesian Learning for Automatic Arabic Text Categorization

Bassam Al-Salemi; Mohd. Juzaiddin Ab Aziz

Abstract

Problem statement: The rapid increase in online Arabic documents has made it necessary to apply Text Categorization (TC) techniques, which are commonly used for English, to categorize them automatically. The complex morphology of Arabic and its large vocabulary make these techniques difficult to apply and costly in time and effort. Approach: We investigated Bayesian learning models in order to enhance Arabic automatic text categorization (ATC). Results: Three classifiers based on Bayes' theorem were implemented: Simple Naive Bayes (NB), Multivariate Bernoulli Naive Bayes (MBNB) and Multinomial Naive Bayes (MNB). The TREC-2002 Light Stemmer was applied for Arabic stemming. For text representation, bag-of-words (BOW) and character-level 3-, 4- and 5-grams were used. To reduce the dimensionality of the feature space, several feature selection methods were used: Mutual Information (MI), the Chi-Square statistic (CHI), Odds Ratio (OR) and the GSS coefficient (GSS). Conclusion: The MBNB classifier outperforms both the NB and MNB classifiers. The BOW representation leads to the best classification performance; nevertheless, character-level n-grams also yield satisfactory results with Bayesian learning for Arabic ATC.
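To make the difference between the two event models compared above concrete, here is a minimal Python sketch of textbook multinomial and multivariate Bernoulli naive Bayes with Laplace smoothing, a character n-gram extractor, and the CHI and GSS feature-scoring formulas computed from a term/category contingency table. It is not the authors' implementation: the add-one smoothing, the whitespace normalization in `char_ngrams`, all class and function names, and the toy data are assumptions, and no Arabic stemming (such as the TREC-2002 Light Stemmer) is performed.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Character-level n-grams; runs of whitespace are collapsed to one space (an assumption)."""
    s = " ".join(text.split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class MultinomialNaiveBayes:
    """Multinomial event model: each token occurrence is one draw from P(term | class)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior, self.log_cond = {}, {}
        for c in self.classes:
            c_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(c_docs) / len(docs))
            counts = Counter(t for d in c_docs for t in d)
            denom = sum(counts.values()) + len(self.vocab)  # Laplace (add-one) smoothing
            self.log_cond[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        def score(c):
            # One factor per in-vocabulary token occurrence; unseen tokens are skipped.
            return self.log_prior[c] + sum(self.log_cond[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


class BernoulliNaiveBayes:
    """Multivariate Bernoulli event model: one binary present/absent variable per vocabulary term."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = sorted({t for d in docs for t in d})
        self.log_prior, self.p_present = {}, {}
        for c in self.classes:
            c_docs = [set(d) for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(c_docs) / len(docs))
            df = Counter(t for d in c_docs for t in d)  # document frequency within class c
            # Laplace-smoothed P(term present | c); always strictly between 0 and 1.
            self.p_present[c] = {t: (df[t] + 1) / (len(c_docs) + 2) for t in self.vocab}
        return self

    def predict(self, doc):
        present = set(doc)

        def score(c):
            s = self.log_prior[c]
            for t in self.vocab:
                p = self.p_present[c][t]
                # Absent terms also contribute evidence, unlike in the multinomial model.
                s += math.log(p) if t in present else math.log(1.0 - p)
            return s
        return max(self.classes, key=score)


# Feature scores from a 2x2 contingency table for one term and one category:
# a = docs in the category containing the term, b = docs outside it containing the term,
# c = docs in the category without the term,   d = docs outside it without the term.
def chi_square(a, b, c, d):
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def gss(a, b, c, d):
    # GSS(t, c) = P(t, c) P(~t, ~c) - P(t, ~c) P(~t, c)
    n = a + b + c + d
    return (a * d - b * c) / (n * n)


if __name__ == "__main__":
    # Hypothetical toy data: tokenized, unstemmed Arabic words for "sport" vs. "economy".
    docs = [["فريق", "مباراة", "كرة"], ["فريق", "كرة", "هدف"],
            ["سوق", "أسهم", "اقتصاد"], ["سوق", "اقتصاد", "بنك"]]
    labels = ["sport", "sport", "economy", "economy"]
    query = ["كرة", "فريق"]
    print(MultinomialNaiveBayes().fit(docs, labels).predict(query))  # sport
    print(BernoulliNaiveBayes().fit(docs, labels).predict(query))    # sport
    print(len(char_ngrams("كرة قدم", 3)))                            # 5
    print(chi_square(2, 0, 0, 2), gss(2, 0, 0, 2))                   # 4.0 0.25
```

The design difference is in `predict`: the Bernoulli model iterates over the whole vocabulary, so a term's absence counts as evidence, while the multinomial model multiplies one factor per token occurrence and ignores absent terms. The same classifiers accept `char_ngrams(text, n)` output in place of a word list to approximate the character-level representations, and the CHI or GSS score could be used to keep only the top-ranked features before fitting.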

Bibliographic Information

Source
Journal of computer sciences, 2011, no. 1, pp. 39-45 (7 pages)
Authors
Bassam Al-Salemi; Mohd. Juzaiddin Ab Aziz
Affiliations

Department of Computer Science, Faculty of Information Technology, University Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia;

Department of Computer Science, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia;

Record Information
Original format: PDF
Text language: English
Keywords
Arabic text categorization; Bayesian learning; automatic text categorization; odds ratio (OR); information gain (IG); feature selection (FS); mutual information (MI)

Similar Documents

1. Statistical Bayesian Learning for Automatic Arabic Text Categorization | Science Publications [J]. Bassam Al-Salemi, Mohd. J. Ab Aziz. Journal of computer sciences, 2010, no. 1.
2. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization [J]. Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi. Data in Brief, 2019, no. 1.
3. Automatic Arabic text categorization: A comprehensive comparative study [J]. Ismail Hmeidi, Mahmoud Al-Ayyoub, Nawaf A. Abdulla. Journal of Information Science, 2015, no. 1.
4. Automatic arabic Text Categorization using Bayesian learning [C]. Kadhim Mahmood H., Omar Nazlia. 2012 7th International Conference on Computing and Convergence Technology, 2012.
5. Bayesian text categorization [D]. Eyheramendy, Susana. 2004.
6. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization [O]. Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi. 2019.
7. Statistical Bayesian Learning for Automatic Arabic Text Categorization [O]. Bassam Al-Salemi, Mohd. J. Ab Aziz. 2011.
8. Some Issues in the Automatic Classification of U.S. Patents. Working Notes for the AAAI-98 Workshop on Learning for Text Categorization [R]. Larkey, L. S. 1998.