Journal: Expert Systems with Applications

Stamantic clustering: Combining statistical and semantic features for clustering of large text datasets


Abstract

Document clustering is a heavily researched problem in text mining. Approaches based on statistical features alone or on semantic features alone have been used extensively to solve it, but techniques that combine the advantages of both feature types have not been studied as often. As textual data grows rapidly in volume, there is a need for an approach that combines both types of features to give more accurate results within an acceptable time. This paper proposes a document clustering technique that combines the effectiveness of statistical features (using TF-IDF) and semantic features (using lexical chains). It is designed to use fewer features while maintaining comparable or even better accuracy for the task of document clustering.
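
The abstract does not say how the two feature types are combined, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a tiny hand-written lexicon (`CONCEPTS`, hypothetical) in place of a real lexical resource, builds lexical chains as groups of words sharing a concept, and keeps one TF-IDF-weighted feature per sufficiently long chain. All function names and the `min_chain_len` threshold are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a real lexical resource:
# each word maps to a coarse concept label used to link related words.
CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "bank": "finance", "loan": "finance", "interest": "finance",
}

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

def lexical_chains(doc):
    """Group a document's words into chains of words sharing a concept."""
    chains = defaultdict(list)
    for word in doc:
        if word in CONCEPTS:
            chains[CONCEPTS[word]].append(word)
    return chains

def combined_features(docs, min_chain_len=2):
    """One feature per long-enough chain: the summed TF-IDF of its distinct words."""
    features = []
    for doc, weights in zip(docs, tf_idf(docs)):
        vec = {}
        for concept, words in lexical_chains(doc).items():
            if len(words) >= min_chain_len:  # assumed threshold, not from the paper
                vec[concept] = sum(weights[w] for w in set(words))
        features.append(vec)
    return features

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

docs = [
    "the car engine failed so the truck towed the car".split(),
    "a truck with a new engine passed the car".split(),
    "the bank raised interest on every loan".split(),
]
features = combined_features(docs)
```

In this toy corpus each document is reduced to a single chain-level feature, far fewer than its raw TF-IDF terms, and the two vehicle documents remain much closer to each other than to the finance document. Any standard clustering algorithm could then be run on these reduced vectors.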
