
A text semantic topic discovery method based on the conditional co-occurrence degree



Abstract

Topic discovery is an effective tool for semantic mining and a key means of extracting new features from original text, and it plays an important role in text mining and knowledge discovery. To address problems in traditional topic models, such as the loss of semantic information, the ambiguity of topic concepts, and the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and the independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments, and new subdocuments are formed from these combinatorial words by feature expansion and content reconstruction. Third, the "topic-word" and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the "document-topic" distributions of the original documents are obtained by merging those of the new subdocuments with specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models under two evaluation methods, and the results verify its superiority and its efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and facilitate the condensation and summarization of topics. © 2019 Elsevier B.V. All rights reserved.
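
The abstract does not define the conditional co-occurrence degree, the feature-expansion rule, or the merging strategies, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes the degree of a word pair is the smaller of the two conditional probabilities P(b | a) and P(a | b) estimated from subdocument counts, that expansion appends a joined token such as `apple_pie`, and that merging is a length-weighted average. All function names, the `threshold` and `min_count` parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(subdocs):
    """Count, per word and per word pair, the subdocuments that contain them."""
    word_df, pair_df = Counter(), Counter()
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def combinatorial_words(subdocs, threshold=0.6, min_count=2):
    """Keep pairs whose weaker conditional probability reaches the threshold.

    Assumption: degree(a, b) = min(P(b | a), P(a | b)); the paper may differ.
    """
    word_df, pair_df = pair_statistics(subdocs)
    kept = []
    for (a, b), n_ab in pair_df.items():
        weakest = min(n_ab / word_df[a], n_ab / word_df[b])
        if n_ab >= min_count and weakest >= threshold:
            kept.append((a, b))
    return sorted(kept)


def expand_subdocs(subdocs, pairs):
    """Append a joined token for every kept pair present in a subdocument."""
    expanded = []
    for tokens in subdocs:
        present = set(tokens)
        extra = [f"{a}_{b}" for a, b in pairs if a in present and b in present]
        expanded.append(list(tokens) + extra)
    return expanded


def merge_topic_distributions(thetas, lengths):
    """Length-weighted average of subdocument "document-topic" distributions."""
    total = sum(lengths)
    n_topics = len(thetas[0])
    return [
        sum(theta[k] * n for theta, n in zip(thetas, lengths)) / total
        for k in range(n_topics)
    ]


subdocs = [
    ["apple", "pie", "recipe"],
    ["apple", "pie", "oven"],
    ["apple", "phone"],
    ["bank", "river"],
]
pairs = combinatorial_words(subdocs)
new_subdocs = expand_subdocs(subdocs, pairs)
```

In this toy corpus only the pair ("apple", "pie") passes both filters, so the subdocument about a phone keeps the bare token `apple` while the baking subdocuments gain `apple_pie`. That is one plausible reading of how combinatorial words could separate senses of a polysemous word. The topic-modeling step with Gibbs sampling is not shown here.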

Bibliographic information

  • Source
    Neurocomputing | 2019, Issue 27 | pp. 11-24 | 14 pages
  • Authors

    Wei Wei; Guo Chonghui;

  • Author affiliations

    Zhengzhou Univ, Ctr Energy Environm & Econ Res, Zhengzhou 450001, Henan, Peoples R China | Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China;

    Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Text language: English
  • Keywords

    Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree;

