Journal of Computational Science

A three-stage unsupervised dimension reduction method for text clustering

Abstract

Dimension reduction is a well-known pre-processing step in text clustering that removes irrelevant, redundant, and noisy features without sacrificing the performance of the underlying clustering algorithm. Dimension reduction methods fall into two main classes: feature selection (FS) methods and feature extraction (FE) methods. Although FS methods are robust against irrelevant features, they occasionally fail to retain important information present in the original feature space. Conversely, although FE methods reduce the dimensionality of the feature space without losing much information, they are significantly affected by irrelevant features. Neither the one-stage models (FS or FE methods alone) nor the two-stage models (combinations of FS and FE methods) proposed in the literature are sufficient to meet all of the above requirements of dimension reduction. Therefore, we propose three-stage dimension reduction models that remove irrelevant, redundant, and noisy features from the original feature space without losing much valuable information. These models incorporate the advantages of both FS and FE methods to create a low-dimensional feature subspace. Experiments on three well-known benchmark text datasets with different characteristics show that the proposed three-stage models significantly improve the performance of the clustering algorithm, as measured by micro F-score, macro F-score, and total execution time.
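The abstract does not say which FS and FE techniques make up the three stages. The sketch below shows one plausible arrangement, assumed purely for illustration: a term-variance filter drops low-variance (irrelevant) terms, a correlation filter drops redundant terms, and PCA then extracts a compact subspace. The function names, the 0.95 correlation threshold, and the stage ordering are hypothetical and are not taken from the paper.

```python
import numpy as np


def term_variance_selection(X: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical stage 1 (FS): indices of the k terms (columns) with the
    highest variance across documents; low-variance terms are treated as irrelevant."""
    variances = X.var(axis=0)
    top_k = np.argsort(variances)[::-1][:k]
    return np.sort(top_k)  # keep the original column order


def drop_redundant_terms(X: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Hypothetical stage 2 (FS): greedily keep a term only if its absolute
    Pearson correlation with every already-kept term is <= threshold."""
    centered = X - X.mean(axis=0)
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0  # constant columns: avoid division by zero
    unit = centered / norms
    corr = np.abs(unit.T @ unit)  # |Pearson correlation| between every pair of terms
    kept: list[int] = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def pca_extract(X: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical stage 3 (FE): project mean-centered documents onto the top
    principal components, computed with a thin SVD."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # vt has min(n_docs, n_terms) rows, so cap the requested number of components.
    n_components = min(n_components, vt.shape[0])
    return centered @ vt[:n_components].T


def three_stage_reduce(X, k_terms, n_components, corr_threshold=0.95):
    """Chain the three assumed stages: variance FS -> correlation FS -> PCA FE."""
    X1 = X[:, term_variance_selection(X, k_terms)]
    X2 = X1[:, drop_redundant_terms(X1, corr_threshold)]
    return pca_extract(X2, n_components)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((20, 50))  # toy stand-in for a 20-document x 50-term TF-IDF matrix
    X[:, 1] = 2.0 * X[:, 0]   # term 1 is a perfectly redundant copy of term 0
    Z = three_stage_reduce(X, k_terms=30, n_components=5)
    print(Z.shape)  # (20, 5)
```

Placing the two FS filters before PCA reflects the abstract's observation that FE methods are sensitive to irrelevant features: the extraction step only sees terms that survived selection. The reduced matrix `Z` would then be passed to the clustering algorithm.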