Malaysian Journal of Computer Science

Comparative Study of Feature Selection Approaches for Urdu Text Categorization



Abstract

This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed by means of six recognized classification algorithms: support vector machines (with linear, polynomial, and radial basis kernels), naive Bayes, k-nearest neighbour (KNN), and decision tree (i.e. J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, chi-square statistics, and symmetrical uncertainty feature selection methods performed uniformly well in most cases. We also found that no single feature selection technique is best for every classifier: naive Bayes and J48 benefit more from gain ratio than from the other feature selection methods, while support vector machines (SVM) and KNN show their top performance with information gain. Overall, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any of the feature selection techniques had an advantage over the other classifiers on the small-size EMILLE corpus.
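As a concrete illustration of the kind of comparison the abstract describes, below is a minimal sketch in Python using scikit-learn. The tiny in-line corpus, the value of k, and the use of mutual_info_classif as a stand-in for information gain are illustrative assumptions; the paper's EMILLE and naive Urdu collections and its exact experimental setup are not reproduced here, and DecisionTreeClassifier stands in for J48 (C4.5).

# Minimal sketch (assumed setup, not the authors' original experiments) of
# comparing feature selection methods across classifiers for text categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder corpus and labels; replace with real Urdu documents.
docs = [
    "sports news text one", "sports news text two", "sports match report",
    "politics news text one", "politics news text two", "politics election report",
]
labels = [0, 0, 0, 1, 1, 1]

# Two of the feature selection criteria compared in the paper: chi-square
# statistics and information gain (approximated here by mutual information).
selectors = {
    "chi2": SelectKBest(chi2, k=5),
    "info_gain": SelectKBest(mutual_info_classif, k=5),
}

# Classifiers analogous to those in the study.
classifiers = {
    "linear_svm": LinearSVC(),
    "naive_bayes": MultinomialNB(),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "decision_tree": DecisionTreeClassifier(random_state=0),  # stands in for J48
}

for sel_name, selector in selectors.items():
    for clf_name, clf in classifiers.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer()),  # bag-of-words / TF-IDF term features
            ("select", selector),          # keep the k highest-scoring terms
            ("clf", clf),
        ])
        # 3-fold cross-validated accuracy on the toy corpus; a real study would
        # evaluate on the full collections with a proper train/test protocol.
        scores = cross_val_score(pipe, docs, labels, cv=3)
        print(f"{sel_name:9s} + {clf_name:13s}: mean accuracy = {scores.mean():.2f}")

Running the script prints a mean cross-validated accuracy for each selector/classifier pair, which is the kind of grid the paper's comparison summarizes.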
