Journal: 《计算机应用与软件》 (Computer Applications and Software)

Multi-Strategy Entity Disambiguation and Entity Linking for Chinese Microblogs

Abstract

With social networks growing rapidly, disambiguating ambiguous entity mentions in microblog posts and linking those mentions to a knowledge base have become active research topics. This paper proposes a multi-strategy approach to entity disambiguation and entity linking. First, microblog text is segmented with ICTCLAS, and entities are normalized using Baidu Baike and an expert entity database. Next, a disambiguation text database is built from crawled Baidu Baike pages, microblog data, and Internet slang, and the TF-IDF algorithm is combined with the Fast-Newman clustering algorithm to disambiguate and link the entities. The approach was tested on the data provided for the Chinese microblog entity linking task of the 2nd Conference on Natural Language Processing and Chinese Computing (NLP&CC 2013). Accuracy was 84.99% in the official evaluation and reached 91.40% after further improvements to the model.
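The abstract does not give implementation details, so the following is only a minimal sketch of how TF-IDF similarity could be used to pick a knowledge-base candidate for an ambiguous mention. It assumes the text has already been segmented into tokens (the paper uses ICTCLAS for that step), it leaves out the Fast-Newman clustering step entirely, and the function names (`tfidf_vectors`, `cosine`, `link_entity`), the smoothed IDF formula, and the toy "apple" data are all hypothetical illustrations rather than the authors' method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumption of this sketch): terms that occur in
    # every document still receive a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entity(context_tokens, candidates):
    """Rank candidate entries against the mention's context.

    candidates maps a (hypothetical) knowledge-base id to the token list
    of its description. Returns (best_id, scores_by_id).
    """
    ids = list(candidates)
    vectors = tfidf_vectors([context_tokens] + [candidates[i] for i in ids])
    query, candidate_vectors = vectors[0], vectors[1:]
    scores = {i: cosine(query, vec) for i, vec in zip(ids, candidate_vectors)}
    best_id = max(scores, key=scores.get)
    return best_id, scores


# Toy data, invented for illustration only.
context = ["apple", "releases", "new", "phone"]
knowledge_base = {
    "apple_company": ["apple", "company", "phone", "computer", "releases"],
    "apple_fruit": ["apple", "fruit", "tree", "nutrition"],
}
best, scores = link_entity(context, knowledge_base)
print(best, scores)
```

In this sketch the mention is linked to whichever candidate description shares the most highly weighted terms with the post. A fuller pipeline along the lines the abstract describes would also normalize entity names first and would use clustering to group the texts in the disambiguation database.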
