Malaysian Journal of Computer Science

Comparative Study of Feature Selection Approaches for Urdu Text Categorization



Abstract

This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed by means of six recognized classification algorithms: support vector machines (with linear, polynomial, and radial basis kernels), naive Bayes, k-nearest neighbour (KNN), and decision tree (i.e. J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, chi-square statistics, and symmetrical uncertainty feature selection methods performed uniformly well in most cases. We also found that no single feature selection technique is best for every classifier: naive Bayes and J48 benefit more from gain ratio than from the other feature selection methods, while support vector machines (SVM) and KNN show their top performance with information gain. Overall, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any of the feature selection techniques had an advantage over the other classifiers on the small-size EMILLE corpus.
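As a concrete illustration of the kind of comparison the abstract describes, below is a minimal sketch in Python using scikit-learn. The tiny in-line corpus, the value of k, and the use of mutual_info_classif as a stand-in for information gain are illustrative assumptions; the paper's EMILLE and naive Urdu collections and its exact experimental setup are not reproduced here, and DecisionTreeClassifier stands in for J48 (C4.5).

# Minimal sketch (assumed setup, not the authors' original experiments) of
# comparing feature selection methods across classifiers for text categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder corpus and labels; replace with real Urdu documents.
docs = [
    "sports news text one", "sports news text two", "sports match report",
    "politics news text one", "politics news text two", "politics election report",
]
labels = [0, 0, 0, 1, 1, 1]

# Two of the feature selection criteria compared in the paper: chi-square
# statistics and information gain (approximated here by mutual information).
selectors = {
    "chi2": SelectKBest(chi2, k=5),
    "info_gain": SelectKBest(mutual_info_classif, k=5),
}

# Classifiers analogous to those in the study.
classifiers = {
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "decision_tree": DecisionTreeClassifier(random_state=0),  # stands in for J48
}

for sel_name, selector in selectors.items():
    for clf_name, clf in classifiers.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer()),  # bag-of-words / TF-IDF term features
            ("select", selector),          # keep the k highest-scoring terms
            ("clf", clf),
        ])
        # 3-fold cross-validated accuracy on the toy corpus; a real study would
        # evaluate on the full collections with a proper train/test protocol.
        scores = cross_val_score(pipe, docs, labels, cv=3)
        print(f"{sel_name:9s} + {clf_name:13s}: mean accuracy = {scores.mean():.2f}")

Running the script prints a mean cross-validated accuracy for each selector/classifier pair, which is the kind of grid the paper's comparison summarizes.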
