METHOD AND SYSTEM FOR EXTRACTING AND CHARACTERIZING RELATIONSHIPS BETWEEN ENTITIES MENTIONED IN DOCUMENTS

Abstract

Methods and devices for gathering and analyzing data from a corpus of documents. The corpus is first scanned for words that qualify as entities according to user-defined criteria. Multiple counters track the number of documents that mention specific entities. A database of the entities mentioned in the documents is maintained, with an entry for each entity found in the corpus. The results are then presented to the user in a spiral, with the most important entity at the center. The importance of an entity may be determined either by how many entities it is connected to or by how many documents mention it. A connection exists between two entities if both are mentioned in at least one document; the more documents mention both entities, the stronger the connection between them. The presentation can also represent connections visually by drawing lines between connected entities, and the strength of a connection can be represented by the width of the line connecting the two entities.
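The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the function names (`analyze_corpus`, `rank_entities`, `spiral_positions`), the whitespace tokenization, the alphabetical tie-breaking, and the choice of an Archimedean spiral with `spacing` and `step` parameters are all assumptions introduced here. The abstract itself only says that entities are selected by user-defined criteria, counted per document, linked by co-mention, and laid out on a spiral.

```python
import math
from collections import Counter
from itertools import combinations


def analyze_corpus(documents, is_entity):
    """Count documents per entity and shared documents per entity pair.

    `is_entity` stands in for the user-defined entity criteria (assumed API).
    """
    doc_counts = Counter()     # entity -> number of documents mentioning it
    link_strength = Counter()  # (entity_a, entity_b) -> number of shared documents
    for text in documents:
        # Use a set so each entity counts at most once per document;
        # sort so every pair is stored in a consistent (a, b) order.
        entities = sorted({w for w in text.split() if is_entity(w)})
        doc_counts.update(entities)
        link_strength.update(combinations(entities, 2))
    return doc_counts, link_strength


def rank_entities(doc_counts, link_strength, by="documents"):
    """Order entities by importance, most important first.

    by="documents": number of documents mentioning the entity.
    by="connections": number of distinct entities it is connected to.
    """
    if by == "documents":
        score = dict(doc_counts)
    else:
        score = {entity: 0 for entity in doc_counts}
        for a, b in link_strength:
            score[a] += 1
            score[b] += 1
    # Ties are broken alphabetically (an assumption, for determinism).
    return sorted(score, key=lambda e: (-score[e], e))


def spiral_positions(ranked, spacing=1.0, step=0.5):
    """Place ranked entities on an Archimedean spiral (r = spacing * theta).

    The first (most important) entity lands at the center.
    """
    positions = {}
    for i, entity in enumerate(ranked):
        theta = i * step
        r = spacing * theta
        positions[entity] = (r * math.cos(theta), r * math.sin(theta))
    return positions


if __name__ == "__main__":
    docs = ["Alice met Bob", "Alice met Carol", "Alice and Bob and Carol"]
    counts, links = analyze_corpus(docs, lambda w: w[0].isupper())
    ranked = rank_entities(counts, links)
    for entity, (x, y) in spiral_positions(ranked).items():
        print(f"{entity}: docs={counts[entity]} at ({x:.2f}, {y:.2f})")
    for (a, b), strength in links.items():
        # Line width could simply be proportional to the shared-document count.
        print(f"{a} -- {b}: width {strength}")
```

In this sketch the value stored in `link_strength` doubles as the line width when drawing connections; a real display would likely scale or cap it.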
