Entity Relation Detection with Factorial Hidden Markov Models and Maximum Entropy Discriminant Latent Dirichlet Allocations

Abstract

Coreference resolution (CR) and entity relation detection (ERD) aim at finding predefined relations between pairs of entities in text. CR focuses on resolving identity relations, while ERD focuses on detecting non-identity relations. Both CR and ERD are important because they can potentially improve other natural language processing (NLP) tasks such as information retrieval and extraction, web search, and question answering, and can also enhance non-NLP tasks such as computer vision, database construction, or ontologies.

In this thesis, I propose models to handle both CR and ERD. Both systems are built on machine learning models. The CR system is based on Factorial Hidden Markov Models (FHMMs). The ERD system is based on Maximum Entropy Discriminant Latent Dirichlet Allocation (MedLDA). The work on CR resolves pronouns only. It is a supervised system trained on an annotated corpus. The basic idea is that the hidden states of the FHMMs act as an explicit short-term memory, with an antecedent buffer containing recently described referents. Thus an observed pronoun can find its antecedent in the hidden buffer or, in terms of a generative model, the entries in the hidden buffer generate the corresponding pronouns. In the hidden buffer, all referents are expressed as diverse features. In this work, besides the common gender, number, person, and animacy features, I converted the Givenness Hierarchy and Centering Theory into probabilistic features, thus greatly improving the accuracy. A system implementing this model is evaluated on the ACE corpus and the I2B2 medical corpus, with promising performance.

For ERD, a novel application of topic models is proposed. In order to make use of the latent semantics of text, the task of relation detection is reformulated as a topic modeling problem. The motivation is to find underlying topics that are indicative of relations between named entities. The approach treats pairs of named entities, together with the features associated with them, as mini documents. The system, called ERD-MedLDA, adapts MedLDA with mixed membership for relation detection. By using supervision, ERD-MedLDA is able to learn topic distributions indicative of relation types. Further, ERD-MedLDA is a topic model that combines the benefits of both Maximum Likelihood Estimation (MLE) and Maximum Margin Estimation (MME), and the mixed membership formulation enables the system to incorporate heterogeneous features. We incorporate diverse features into the system and perform experiments on the ACE 2005 corpus. Our approach achieves better overall performance on the precision, recall, and F-measure metrics than SVM-based and LDA-based models. ERD-MedLDA also shows better overall performance than state-of-the-art kernels used previously for relation detection.
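
The "mini document" idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual features used by ERD-MedLDA, so the feature names below (entity types, mention text, words between the two mentions), the `Mention` class, and the `mini_document` helper are all hypothetical and chosen only for illustration. The sketch shows one plausible way to turn an entity pair into a bag of features that a topic model could consume; it is not the thesis's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A named-entity mention (hypothetical structure, not from the thesis)."""
    text: str
    entity_type: str  # e.g. "PER", "ORG"
    start: int        # index of the first token of the mention
    end: int          # index one past the last token of the mention


def mini_document(tokens, m1, m2):
    """Build a bag of features for one entity pair.

    The features here are illustrative assumptions only.
    """
    # Order the two mentions by their position in the sentence.
    left, right = sorted((m1, m2), key=lambda m: m.start)
    features = Counter()
    features[f"type1={left.entity_type}"] += 1
    features[f"type2={right.entity_type}"] += 1
    features[f"text1={left.text.lower()}"] += 1
    features[f"text2={right.text.lower()}"] += 1
    # Words strictly between the two mentions.
    for tok in tokens[left.end:right.start]:
        features[f"between={tok.lower()}"] += 1
    return features


tokens = ["Alice", "works", "for", "Acme", "Corp"]
alice = Mention("Alice", "PER", 0, 1)
acme = Mention("Acme Corp", "ORG", 3, 5)
doc = mini_document(tokens, alice, acme)
print(sorted(doc.items()))
```

Each such bag plays the role that a word-count vector plays in ordinary LDA; a supervised topic model would additionally receive the relation label of the pair.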

Bibliographic details

Author: Li, Dingcheng
Author affiliation: University of Minnesota
Degree-granting institution: University of Minnesota
Subject: Language Linguistics; Computer Science
Degree: Ph.D.
Year: 2011
Pages: 137 p.
Total pages: 137
Original format: PDF
Language of text: English

Similar literature

1. Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models [J]. Attamimi Muhammad, Ando Yuji, Nakamura Tomoaki. Advanced Robotics: The International Journal of the Robotics Society of Japan, 2016, no. 11-12.
2. Phishing detection and impersonated entity discovery using Conditional Random Field and Latent Dirichlet Allocation [J]. Venkatesh Ramanathan, Harry Wechsler. Computers & Security, 2013, May issue.
3. Proportional data modeling with hidden Markov models based on generalized Dirichlet and Beta-Liouville mixtures applied to anomaly detection in public areas [J]. Epaillard Elise, Bouguila Nizar. Pattern Recognition: The Journal of the Pattern Recognition Society, 2016.
4. Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging) [C]. Yasir Abdur Rohman, Retno Kusumaningrum. International Conference on Informatics and Computational Sciences, 2019.
5. Dirichlet process mixture modeling: Hidden Markov mixture models and multi-task compressive sensing [D]. Qi, Yuting. 2009.
6. A Dirichlet Process Mixture of Hidden Markov Models for Protein Structure Prediction [O]. Kristin P. Lennox, David B. Dahl, Marina Vannucci.
7. Characterizing A Database of Sequential Behaviors with Latent Dirichlet Hidden Markov Models [O]. Song, Yin; Cao, Longbing; Fan, Xuhui. 2013.
8. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies [R]. Isaly, L. A. 2010.
