Journal: Neurocomputing

Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation

Abstract

This paper presents a simple yet effective approach to measuring Chinese lexical semantic similarity without supervision, and shows its promising performance in automatic story segmentation of Mandarin broadcast news. Our approach centers on the unsupervised correlated affinity graph (UCAG) model, which is initialized as a hybrid sparse graph encoding both explicit word-to-word contextual correlations and latent word-to-character correlations within the given corpus. The UCAG model then diffuses these initial sparse correlations throughout the graph by parallel affinity propagation. The result is a dense, reliable, corpus-specific lexical semantic similarity measure derived purely from unlabeled data. We then generalize the classical cosine similarity metric so that it takes these soft similarities into account for story segmentation. Extensive experiments on benchmark datasets show that the proposed similarity measure outperforms previous measures. Specifically, our similarity measure yields an average relative F1-score improvement of 7.7% over state-of-the-art normalized cuts (NCuts) based story segmentation on two holistic benchmark Mandarin broadcast news corpora, TDT2 and CCTV, and a 10.8% relative F1-score improvement on the detailed broadcast news subsets. (C) 2018 Elsevier B.V. All rights reserved.
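The abstract does not give the exact form of the generalized cosine metric. One common way to let cosine similarity credit related but non-identical words is the soft cosine measure, cos_S(x, y) = xᵀSy / (√(xᵀSx) · √(yᵀSy)), where S holds pairwise word similarities and S = I recovers the ordinary cosine. The Python sketch below assumes that form. The similarity table stands in for a UCAG-style similarity matrix; its values, the function names, and the example words are hypothetical and not taken from the paper.

```python
import math
from typing import Mapping, Tuple

# Hypothetical bag-of-words vector for a sentence or text block: word -> weight.
Vector = Mapping[str, float]
# Hypothetical pairwise lexical similarity table (assumed symmetric, values in [0, 1]).
# In the paper this role would be played by the UCAG similarity measure.
SimTable = Mapping[Tuple[str, str], float]


def word_sim(sim: SimTable, a: str, b: str) -> float:
    """Return s(a, b): 1.0 for identical words, 0.0 for pairs missing from the table."""
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def soft_inner(x: Vector, y: Vector, sim: SimTable) -> float:
    """Generalized inner product x^T S y, summed over every word pair (quadratic cost)."""
    return sum(xi * yj * word_sim(sim, wi, wj)
               for wi, xi in x.items()
               for wj, yj in y.items())


def soft_cosine(x: Vector, y: Vector, sim: SimTable) -> float:
    """Soft cosine x^T S y / (sqrt(x^T S x) * sqrt(y^T S y)).

    Assumes S is positive semi-definite so the square roots are real;
    with an empty table (S = I) this reduces to the ordinary cosine.
    """
    denom = math.sqrt(soft_inner(x, x, sim)) * math.sqrt(soft_inner(y, y, sim))
    return soft_inner(x, y, sim) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical news sentences that share no surface words.
    s1 = {"股市": 1.0, "上涨": 1.0}
    s2 = {"股票": 1.0, "涨": 1.0}
    sims = {("股市", "股票"): 0.8, ("上涨", "涨"): 0.6}
    print(soft_cosine(s1, s2, {}))    # ordinary cosine: no overlap
    print(soft_cosine(s1, s2, sims))  # soft cosine: credits related words
```

Ordinary cosine scores the two sentences 0.0 because they share no words, while the soft version gives about 0.7. This illustrates why soft similarities can help a segmenter that relies on lexical cohesion between neighboring sentences, although the paper's own metric may differ in detail.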