
Entity Relation Detection with Factorial Hidden Markov Models and Maximum Entropy Discriminant Latent Dirichlet Allocations



Abstract

Coreference resolution (CR) and entity relation detection (ERD) aim at finding predefined relations between pairs of entities in text. CR focuses on resolving identity relations, while ERD focuses on detecting non-identity relations. Both are important because they can potentially improve other natural language processing (NLP) tasks, such as information retrieval and extraction, web search, and question answering, and can also enhance non-NLP tasks such as computer vision, database construction, and ontologies.

In this thesis, I propose models to handle both CR and ERD. Both systems are built on machine learning models: the CR system is based on Factorial Hidden Markov Models (FHMMs), and the ERD system is based on Maximum Entropy Discriminant Latent Dirichlet Allocation (MedLDA). The work on CR resolves pronouns only. It is a supervised system trained on an annotated corpus. The basic idea is that the hidden states of the FHMM act as an explicit short-term memory, with an antecedent buffer containing recently described referents. An observed pronoun can thus find its antecedent in the hidden buffer or, in terms of a generative model, the entries in the hidden buffer generate the corresponding pronouns. In the hidden buffer, all referents are expressed as diverse features. Besides the common gender, number, person, and animacy features, I converted the Givenness Hierarchy and Centering Theory into probabilistic features, thus greatly improving the accuracy. A system implementing this model is evaluated on the ACE corpus and the I2B2 medical corpus, with promising performance.

For ERD, a novel application of topic models is proposed. To make use of the latent semantics of text, the task of relation detection is reformulated as a topic modeling problem. The motivation is to find underlying topics that are indicative of relations between named entities. The approach treats pairs of named entities, together with the features associated with them, as mini documents. The system, called ERD-MedLDA, adapts MedLDA with mixed membership for relation detection. By using supervision, ERD-MedLDA is able to learn topic distributions indicative of relation types. Further, ERD-MedLDA is a topic model that combines the benefits of both Maximum Likelihood Estimation (MLE) and Maximum Margin Estimation (MME), and the mixed membership formulation enables the system to incorporate heterogeneous features. We incorporate diverse features into the system and perform experiments on the ACE 2005 corpus. Our approach achieves better overall precision, recall, and F-measure than SVM-based and LDA-based models. ERD-MedLDA also shows better overall performance than the state-of-the-art kernels previously used for relation detection.
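As a rough illustration of the "mini document" idea in the abstract, the sketch below turns one pair of named entities into a bag of feature "words" that a topic model could consume. It is not the thesis's implementation: the `Mention` class, the `mini_document` function, the feature names, and the example sentence are all hypothetical, and the feature set actually used by ERD-MedLDA is not given in this record.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A named-entity mention (hypothetical structure, for illustration only)."""
    text: str
    entity_type: str  # e.g. "PER" or "ORG"; the label set is an assumption
    start: int        # index of the first token of the mention
    end: int          # index one past the last token of the mention


def mini_document(tokens, m1, m2):
    """Return a bag of feature 'words' describing one entity pair."""
    # Order the two mentions by position so the result does not depend
    # on the order in which the pair is passed in.
    left, right = sorted((m1, m2), key=lambda m: m.start)
    features = [
        f"type1={left.entity_type}",
        f"type2={right.entity_type}",
        f"head1={left.text.lower()}",
        f"head2={right.text.lower()}",
    ]
    # Words between the two mentions serve as simple context features.
    features += [f"between={t.lower()}" for t in tokens[left.end:right.start]]
    return Counter(features)


tokens = "Alice works for Acme in Boston".split()
alice = Mention("Alice", "PER", 0, 1)
acme = Mention("Acme", "ORG", 3, 4)
doc = mini_document(tokens, alice, acme)
```

Each such `Counter` plays the role of one document's word counts. A supervised topic model would then be trained on these counts together with the relation label of each pair.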

Bibliographic record

  • Author

    Li, Dingcheng

  • Author affiliation

    University of Minnesota

  • Degree-granting institution: University of Minnesota
  • Subject: Language Linguistics; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 137 p.
  • Total pages: 137
  • Original format: PDF
  • Language of text: English