Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering

Abualigah Laith Mohammad; Khader Ahamad Tajudin; Al-Betar Mohammed Azmi; Alomari Osama Ahmad

Abstract

This paper proposes three feature selection algorithms that combine a feature weighting scheme with dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining: text documents are partitioned into several coherent clusters according to carefully selected informative features, using a suitable evaluation function that usually depends on term frequency. Informative features in each document are selected using feature selection methods. Genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO), the most successful feature selection methods, are established here using a novel weighting scheme, namely length feature weight (LFW), which depends on term frequency and on the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best outcomes for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features. © 2017 Elsevier Ltd. All rights reserved.
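The pipeline the abstract outlines (weight terms, keep a reduced set of features, cluster with k-means) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the LFW formula, so standard TF-IDF is swapped in as a weight that likewise depends on term frequency and on a term's appearance in other documents; the GA/HS/PSO search and the DDR step are replaced by a plain top-`keep` filter on summed weight. The function names `weight_terms`, `reduce_features`, and `kmeans`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def weight_terms(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Assumption: TF-IDF stands in for LFW, which the abstract does not define.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append([tf[t] * math.log(len(docs) / df[t]) for t in vocab])
    return vocab, rows


def reduce_features(vocab, rows, keep):
    """Keep the `keep` terms with the largest summed weight.

    Assumption: a simple filter replaces the metaheuristic search and DDR.
    """
    totals = [sum(row[j] for row in rows) for j in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda j: (-totals[j], vocab[j]))
    chosen = sorted(ranked[:keep])
    return [vocab[j] for j in chosen], [[row[j] for j in chosen] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Plain k-means; the first k rows seed the centroids for reproducibility."""
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus (hypothetical): two documents about pets, two about finance.
docs = [
    "cat dog pet cat",
    "stock market trade stock",
    "dog pet cat animal",
    "market trade bank stock",
]
vocab, rows = weight_terms(docs)
selected, reduced = reduce_features(vocab, rows, keep=4)
labels = kmeans(reduced, k=2)
print(selected, labels)
```

On this toy corpus the eight-term vocabulary is cut to four terms before clustering, and the two pet documents end up with one label and the two finance documents with the other. Seeding the centroids from the first rows keeps the run deterministic; a real experiment would more likely use randomized or k-means++ initialization.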

Bibliographic details

Source
Expert Systems with Applications, 2017, Issue 10, pp. 24-36 (13 pages)
Authors
Abualigah Laith Mohammad; Khader Ahamad Tajudin; Al-Betar Mohammed Azmi; Alomari Osama Ahmad
Author affiliations

Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia;

Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia;

Al Balqa Appl Univ, Al Huson Univ Coll, Dept Informat Technol, POB 50, Irbid, Jordan;

Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia;

Original format: PDF
Language: English
Keywords
Feature selection; Dynamic dimension reduction; Text document clustering; Weight score; Metaheuristics;
