A New Information Theory Based Clustering Fusion Method for Multi-view Representations of Text Documents

Abstract

Multi-view clustering is a complex problem that consists in extracting partitions from multiple representations of the same objects. In text mining and natural language processing, such views may take the form of word frequencies, topic-based representations, and many other encodings produced by various vector space model algorithms. In this paper, we propose a clustering fusion algorithm that takes the clustering results obtained from multiple vector space models of a given set of documents and merges them into a single partition. Our fusion method relies on an information theory model based on Kolmogorov complexity that was previously used for collaborative clustering applications. We apply our algorithm to several text corpora frequently used in the literature and find the results very satisfying.
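
To make the idea of merging per-view clusterings into a single partition concrete, here is a minimal sketch in Python. It is not the Kolmogorov-complexity-based method of the paper, whose details are not given in this record. It substitutes a simple majority-vote co-association rule: two documents are linked when more than half of the views cluster them together, and links are applied transitively. The function name `fuse_partitions` and the toy labels are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def fuse_partitions(partitions):
    """Merge several clusterings of the same objects into one partition.

    Illustrative stand-in (majority-vote co-association), NOT the paper's
    Kolmogorov-complexity-based fusion. Two objects share a fused cluster
    when a strict majority of the input partitions put them together,
    with links applied transitively.
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same objects")

    parent = list(range(n))  # union-find forest over object indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        votes = sum(p[i] == p[j] for p in partitions)
        if 2 * votes > len(partitions):
            parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Hypothetical cluster labels for six documents under three views.
views = [
    [0, 0, 0, 1, 1, 1],  # e.g. clustering of a word-frequency view
    [1, 1, 0, 0, 0, 0],  # e.g. clustering of a topic-based view
    [2, 2, 2, 0, 0, 1],  # e.g. clustering of a third encoding
]
fused = fuse_partitions(views)
print(fused)  # [0, 0, 0, 1, 1, 1]
```

Cluster label values need not agree across views, since only co-membership of document pairs is compared. An information-theoretic fusion such as the one described in the abstract would replace the majority vote with a criterion based on description length.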

Bibliographic record

Source
International Conference on Social Computing and Social Media; International Conference on Human-Computer Interaction | 2020 | pp. 156-167 | 12 pages
Authors
Juan Zamora; Jeremie Sublime
Original format: PDF
Keywords
Multi-view clustering; Information theory; Corpus analysis

Similar literature

1. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication. 2020, No. 4
2. An improved ant algorithm with LDA-based representation for text document clustering [J]. Aytug Onan, Hasan Bulut, Serdar Korukoglu. Journal of Information Science. 2017, No. 2
3. Graph Based Text Representation for Document Clustering [J]. Asma Khazaal Abdulsahib, Siti Sakira Kamaruddin. Journal of Theoretical and Applied Information Technology. 2015, No. 1
4. A survey on text document categorization using enhanced sentence vector space model and bi-gram text representation model based on novel fusion techniques [C]. Abdisa Demissie Amensisa, Seema Patil, Poorva Agrawal. 2018 2nd International Conference on Inventive Systems and Control. 2018
5. Multi-document Summarization Based on Document Clustering and Neural Sentence Fusion [D]. Fuad, Tanvir Ahmed. 2018
6. Thematic clustering of text documents using an EM-based approach [O]. Sun Kim, W John Wilbur. 2012
7. TM-SGTD: Text Mining Based Semantic Graph for Text Document Approach for Text Representation [O]. Ashish Pacharne, Pramod S Nair, Srinivasa Rao D. 2017
