Source: International Journal of Computer Trends and Technology

Afaan Oromo News Text Categorization using Decision Tree Classifier and Support Vector Machine: A Machine Learning Approach

Abstract

Afaan Oromo is one of the major African languages, widely spoken in most parts of Ethiopia and in parts of neighboring countries such as Kenya and Somalia. It is the language of the Oromo people, the largest ethnic group in Ethiopia, who account for 25.5% of the total population. Large collections of Afaan Oromo documents are available on the web, in addition to hard-copy documents in libraries and documentation centers. As the number of documents grows, identifying the documents relevant to a specific topic becomes a challenging task, so a text categorization mechanism is required for finding, filtering, and managing the rapidly growing body of online information. Text categorization is an important application of machine learning in document information retrieval. The objective of this research is to investigate the application of machine learning techniques to the automatic categorization of Afaan Oromo news text. Two machine learning techniques, Decision Tree Classifier and Support Vector Machine, are used to categorize Afaan Oromo news texts. Annotated news texts are used to train classifiers on six news categories: sport, business, politics, health, agriculture, and education. To design the Afaan Oromo news text categorization system, different techniques and tools are used for preprocessing, document clustering, and classifier model building. The Afaan Oromo documents were preprocessed with tokenization, stemming, and stop word removal. A total of 824 news texts were used in this research, and 10-fold cross-validation was used for testing. The results indicate that such classifiers are applicable to the automatic classification of Afaan Oromo news texts. On the six-category data, the best results obtained by the Decision Tree Classifier and the Support Vector Machine were 96.58% and 84.93%, respectively, which indicates that the Decision Tree Classifier is more applicable to the automatic categorization of Afaan Oromo news text.
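The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their tools, stop word list, or stemmer. The stop word set, the token pattern (which keeps the apostrophe inside words such as "bu'aa"), the sample sentence, and the function names `tokenize`, `preprocess`, and `k_fold_indices` are all assumptions made for illustration. Stemming is left out because the abstract gives no detail on the stemmer used.

```python
import re

# Hypothetical, illustrative stop word list; NOT the list used in the paper.
STOP_WORDS = {"fi", "kan", "akka", "irra", "keessa"}

# Assumption: tokens are runs of Latin letters, optionally joined by apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize the text and drop tokens that appear in the stop word set."""
    return [token for token in tokenize(text) if token not in stop_words]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Every sample lands in exactly one test fold. The first
    n_samples % k folds receive one extra sample.
    """
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Illustrative sentence, not taken from the paper's corpus.
sample = "Barattoonni fi barsiisonni mana barumsaa keessa jiru"
tokens = preprocess(sample)

# 824 documents split into 10 folds, matching the corpus size in the abstract.
folds = list(k_fold_indices(824, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

With 824 documents, four folds hold 83 test documents and six hold 82. In practice the documents would be shuffled (or stratified by category) before folding, and each training split would be passed to the classifier under evaluation.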
