...
Journal: Knowledge and Information Systems

TEG—a hybrid approach to information extraction


Abstract

This paper describes a hybrid statistical and knowledge-based information extraction (IE) model that extracts entities and relations at the sentence level. The model aims to retain, and improve upon, the high accuracy of knowledge-based systems while drastically reducing the amount of manual labour by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (trainable extraction grammar), can be adapted to any IE domain by writing a suitable set of rules in an extraction language based on SCFG (stochastic context-free grammar) and training them on an annotated corpus. The system contains no purely linguistic components, such as a part-of-speech (PoS) tagger or a shallow parser, but allows the use of external linguistic components if necessary. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. We also demonstrate the robustness of our system when the training data is of poor quality.
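The abstract says TEG rules are written in an SCFG-based language and then trained on an annotated corpus. As a rough illustration of what "training" a stochastic context-free grammar can mean in general, the sketch below estimates each rule's probability by relative frequency, P(A → β) = count(A → β) / count(A), and scores a derivation as the product of its rule probabilities. This is a generic textbook technique, not TEG's documented procedure: the rule encoding, the function names (`train_rule_probs`, `derivation_prob`, `best_derivation`), the absence of smoothing, and the toy corpus are all hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical rule encoding (NOT TEG's syntax): (lhs, rhs), e.g. ("PERSON", ("Title", "Name")).
Rule = tuple[str, tuple[str, ...]]


def train_rule_probs(derivations: list[list[Rule]]) -> dict[Rule, float]:
    """Relative-frequency (maximum-likelihood) estimate:
    P(lhs -> rhs) = count(lhs -> rhs) / count(all rules with that lhs)."""
    rule_counts: Counter = Counter()
    lhs_counts: Counter = Counter()
    for derivation in derivations:
        for rule in derivation:
            rule_counts[rule] += 1
            lhs_counts[rule[0]] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def derivation_prob(derivation: list[Rule], probs: dict[Rule, float]) -> float:
    """A derivation's probability is the product of its rule probabilities;
    a rule never seen in training contributes 0 (no smoothing in this sketch)."""
    return prod(probs.get(rule, 0.0) for rule in derivation)


def best_derivation(candidates: list[list[Rule]], probs: dict[Rule, float]) -> list[Rule]:
    """Resolve ambiguity by picking the most probable candidate derivation."""
    return max(candidates, key=lambda d: derivation_prob(d, probs))


# Toy stand-in for an annotated corpus: the rules used to derive each labelled PERSON.
corpus = [
    [("PERSON", ("Title", "Name")), ("Title", ("Mr.",)), ("Name", ("Smith",))],
    [("PERSON", ("Title", "Name")), ("Title", ("Dr.",)), ("Name", ("Jones",))],
    [("PERSON", ("Title", "Name")), ("Title", ("Mr.",)), ("Name", ("Jones",))],
    [("PERSON", ("Name",)), ("Name", ("Smith",))],
]
probs = train_rule_probs(corpus)

# Two competing analyses of "Mr. Smith": with and without the title.
with_title = [("PERSON", ("Title", "Name")), ("Title", ("Mr.",)), ("Name", ("Smith",))]
name_only = [("PERSON", ("Name",)), ("Name", ("Smith",))]
print(derivation_prob(with_title, probs))
print(derivation_prob(name_only, probs))
print(best_derivation([name_only, with_title], probs))
```

On this toy data the titled analysis scores about 0.25 (0.75 × 2/3 × 0.5) against 0.125 for the name-only one, so the hand-written rules supply the candidate structures while corpus counts decide between them. That division of labour is one plausible reading of the "hybrid" idea in the abstract, not a description of TEG's internals.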
