Stamantic clustering: Combining statistical and semantic features for clustering of large text datasets

Mehta Vivek; Bawa Seema; Singh Jasmeet


Abstract

Document clustering is a heavily researched problem in text mining. Approaches based on statistical features alone, or on semantic features alone, have been used extensively to solve it. However, techniques that combine the advantages of both types of features have been studied far less often. In particular, as textual data grows rapidly in size, there is a need for an approach that combines the advantages of both feature types to give more accurate results within an acceptable time. In this paper, a document clustering technique is proposed that combines the effectiveness of statistical features (using TF-IDF) and semantic features (using lexical chains). It is designed to use fewer features while maintaining comparable, or even better, accuracy on the document clustering task.
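As a rough illustration of the statistical half of this idea, the sketch below computes TF-IDF weights and keeps only the highest-weighted terms of each document as a reduced feature set. It is not the authors' implementation: the function names `tf_idf` and `top_terms`, the whitespace tokenization, the unsmoothed IDF formula, and the sample documents are all assumptions made for illustration. The semantic half (lexical chains built from WordNet) is not described in enough detail on this page to sketch and is omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_dict, k):
    """Keep the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weight_dict.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, already lowercased and split on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stocks fell as markets closed".split(),
]
weights = tf_idf(docs)
reduced = [top_terms(w, 2) for w in weights]
print(reduced)
```

Under this weighting, a term that occurs in every document gets weight zero, so it never survives the top-k cut when more informative terms are available. The reduced term lists could then be passed to any standard clustering algorithm.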


Bibliographic details

Source: Expert systems with applications, 2021, Issue 7, pp. 114710.1-114710.9 (9 pages)
Authors: Mehta Vivek; Bawa Seema; Singh Jasmeet
Author affiliation (listed for each of the three authors): Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala 147001, Punjab, India
Original format: PDF
Language: English
Keywords: Document clustering; Semantic relations; Lexical chains; TF-IDF; WordNet; Big data

Similar literature

1. High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing [J]. Saida Ishak Boushaki, Nadjet Kamel, Omar Bendjeghaba. Journal of information & knowledge management. 2018, Issue 3.
2. Statistical approach to normalization of feature vectors and clustering of mixed datasets [J]. Suarez-Alvarez M.M., Pham D.-T., Prostov M.Y. Proceedings of the Royal Society. Mathematical, physical and engineering sciences. 2012, Issue 2145.
3. Combining Statistical Information and Semantic Similarity for Short Text Feature Extension [C]. Xiaohong Li, Yun Su, Huifang Ma. IFIP TC 12 international conference on intelligent information processing. 2016.
4. Semantic preserving text representation and its applications in text clustering [D]. Howard, Michael. 2012.
5. Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension [O]. Yuanchao Liu, Ming Liu, Xin Wang.
6. Statistical approach to normalization of feature vectors and clustering of mixed datasets [O]. Maria M. Suarez-Alvarez, Duc-Truong Pham, Mikhail Y. Prostov. 2012.
