
TREC Chemical IR Track 2009: A Distributed Dimensional Indexing Model for Chemical Patent Search


Abstract

For the TREC-2009 Chemical IR Track, we explore the development of a distributed information retrieval system based on a dimensional data model. The indexing model supports named entity identification and the aggregation of term statistics at multiple levels of patent structure, including individual words, sentences, claims, descriptions, abstracts, and titles. The system was deployed across 15 Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances and 15 Elastic Block Store (EBS) database shards to support efficient indexing and query processing of the relatively large index produced by indexing every individual word (excluding stop words) in the 100 GB+ collection of chemical patent documents. The query processing algorithm for technology survey search and prior art search uses information extraction techniques and locally aggregated term statistics to help disambiguate candidate entities and terms in context. For prior art search, query processing automatically generates a structured query based on the relative distinctiveness of individual terms and candidate entity phrases drawn from the claims, abstract, and title sections of the query patent. For both the technology survey and prior art search tasks, we evaluated several probabilistic retrieval functions that integrate statistics of retrieved named entities with term statistics at multiple levels of document structure to identify relevant patents.
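
To make the idea of sharded, multi-level term statistics concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the schema, the tokenizer, or the shard routing rule, so the CRC32-based routing, the tiny stop word list, and the function names `tokenize`, `shard_for`, `index_patent`, and `term_count` are all assumptions made for illustration. Only the shard count of 15 and the structural levels (title, abstract, claims, and so on) come from the abstract.

```python
import zlib
from collections import Counter, defaultdict

NUM_SHARDS = 15  # the abstract mentions 15 database shards
# Illustrative stop word list only; the real list is not given in the abstract.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "for", "in", "is", "to"})


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def shard_for(term, num_shards=NUM_SHARDS):
    """Assumed routing rule: stable CRC32 hash of the term, modulo shard count."""
    return zlib.crc32(term.encode("utf-8")) % num_shards


def index_patent(patent_id, sections):
    """Build {shard: Counter{(term, level, patent_id): count}} for one patent.

    `sections` maps a structural level (e.g. "title", "claims") to its text.
    """
    shards = defaultdict(Counter)
    for level, text in sections.items():
        for term in tokenize(text):
            shards[shard_for(term)][(term, level, patent_id)] += 1
    return shards


def term_count(shards, term, level, patent_id):
    """Look up a term's count at one structural level on the shard that owns it."""
    return shards.get(shard_for(term), Counter())[(term, level, patent_id)]


# Hypothetical patent text, invented for this example.
patent = {
    "title": "Synthesis of benzene derivatives",
    "claims": "A benzene derivative. The benzene ring is substituted.",
}
shards = index_patent("P1", patent)
print(term_count(shards, "benzene", "claims", "P1"))
print(term_count(shards, "benzene", "title", "P1"))
```

Keying each count by (term, level, patent) keeps statistics for the same word separate across title, abstract, and claims, which is the kind of per-level aggregation the abstract describes. Routing by term means a lookup for one term touches a single shard.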
