Published in: Journal of Data Acquisition and Processing (《数据采集与处理》)

Chinese-Vietnamese Bilingual News Topic Discovery Based on Graph Clustering


Abstract

Cross-language news topic discovery aims to automatically group news texts, written in different languages, that report the same event. Because texts in different languages are hard to represent in a single feature space, mining their shared topics is difficult. However, reports of the same news event contain the same news elements regardless of the language used, and the associations among these elements reflect how closely the news events are related. This paper therefore proposes a Chinese-Vietnamese bilingual news topic discovery method based on document graph clustering. First, news elements are extracted from the Chinese and Vietnamese news texts, and the relatedness of each pair of texts is computed from the similarity of their news elements; this yields a Chinese-Vietnamese bilingual document graph model and a news-text similarity matrix. Second, exploiting the way relatedness propagates between texts in the graph model, the similarity matrix is adjusted with a random walk algorithm. Finally, the affinity propagation algorithm is used to cluster the texts into topics. Experimental results show that the proposed method is effective.
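The three-stage pipeline in the abstract (element similarity graph, random-walk adjustment, affinity propagation) can be illustrated with a minimal sketch in plain Python. It rests on several assumptions not stated in the abstract: news elements are taken as already extracted and aligned across Chinese and Vietnamese into shared IDs; Jaccard overlap stands in for the paper's element-similarity measure; the random-walk adjustment is approximated as a truncated, decayed sum of transition-matrix powers with hypothetical parameters `c` and `steps`; and the affinity propagation preference is set to the median similarity, a common default. All function names are illustrative, not the authors' code.

```python
from statistics import median

# Assumption: each news text is a set of news-element IDs (people, places,
# organisations, ...) already aligned across Chinese and Vietnamese.

def element_similarity(a, b):
    """Jaccard overlap of two element sets (stand-in for the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(docs):
    """Bilingual document graph: edge weight = element similarity, no self-loops."""
    n = len(docs)
    return [[0.0 if i == j else element_similarity(docs[i], docs[j])
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def random_walk_adjust(W, c=0.5, steps=3):
    """S = sum_{t=1..steps} c^(t-1) * P^t, symmetrised; P = row-normalised W.
    Multi-step walks let relatedness propagate through intermediate texts."""
    n = len(W)
    P = []
    for row in W:
        total = sum(row)
        P.append([x / total if total > 0 else 0.0 for x in row])
    S = [[0.0] * n for _ in range(n)]
    Pt = P
    for t in range(steps):
        for i in range(n):
            for j in range(n):
                S[i][j] += (c ** t) * Pt[i][j]
        Pt = matmul(Pt, P)
    return [[(S[i][j] + S[j][i]) / 2 for j in range(n)] for i in range(n)]

def affinity_propagation(S, damping=0.5, iters=200):
    """Standard responsibility/availability updates; returns each text's exemplar index."""
    n = len(S)
    if n < 2:
        return list(range(n))
    S = [row[:] for row in S]
    pref = median([S[i][j] for i in range(n) for j in range(n) if i != j])
    for i in range(n):
        S[i][i] = pref  # preference: how readily a text becomes an exemplar
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        for i in range(n):  # responsibilities r(i, k)
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best = max(vals[m] for m in range(n) if m != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        for k in range(n):  # availabilities a(i, k)
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # degenerate ties: fall back to the strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
            for i in range(n)]

docs = [
    {"e1", "e2", "e3"},  # hypothetical Chinese report, topic A
    {"e1", "e2", "e4"},  # hypothetical Vietnamese report, topic A
    {"e5", "e6", "e7"},  # hypothetical Chinese report, topic B
    {"e5", "e6", "e8"},  # hypothetical Vietnamese report, topic B
]
labels = affinity_propagation(random_walk_adjust(build_graph(docs)))
print(labels)  # texts sharing an exemplar index form one topic
```

Affinity propagation is a natural fit here because it does not need the number of topics in advance; texts that share an exemplar index are treated as one topic. Perfectly symmetric toy inputs like the one above can leave affinity propagation on a tie, which is why the sketch includes a fallback; practical implementations often add a tiny amount of noise to break such ties.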
