Entity Analysis with Weak Supervision: Typing, Linking, and Attribute Extraction.

Abstract

With the advent of the Web, textual information has grown at an explosive rate. To digest this enormous amount of data, an automatic solution, Information Extraction (IE), has become necessary. Information extraction is a task of converting unstructured text strings into structured machine-readable data. The first key step of a general IE pipeline is often to analyze entities mentioned in the text before making holistic conclusions. To fully understand each entity, one needs to detect their mentions, categorize them into semantic types, connect them with their knowledge base entries, and identify their attributes as well as the relationships with others.

In this dissertation, we first present the problem of fine-grained entity recognition. Unlike most traditional named entity recognition systems using a small set of entity classes, e.g., person, organization, location or miscellaneous, we define a novel set of over one hundred fine-grained entity types. In order to intelligently understand text and extract a wide range of information, it is useful to more precisely determine the semantic classes of entities mentioned in unstructured text. We formulate the recognition problem as multi-class, multi-label classification, describe an unsupervised method for collecting training data, and present the FIGER implementation.

Next, we demonstrate that fine-grained entity types are closely connected with other entity analysis tasks. We describe an entity linking system whose prediction heavily relies on these types and present a simple yet effective implementation, called VINCULUM. An extensive evaluation on nine data sets, comparing VINCULUM with two state-of-the-art systems, elucidates key aspects of the system that include mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.

Finally, we describe an approach to acquire commonsense knowledge from a massive amount of text on the Web. In particular, a system called SIZEITALL is developed to extract numerical attribute values for various classes of entities. To resolve the ambiguity from the surface form text, we canonicalize the extractions with respect to WordNet senses and build a knowledge base on physical size for thousands of entity classes.

Throughout all three entity analysis tasks, we show the feasibility of building sophisticated IE systems without a significant investment in human effort to create sufficient labeled data.

Bibliographic record

Author: Ling, Xiao
Author affiliation: University of Washington
Degree-granting institution: University of Washington
Subject: Computer science; Artificial intelligence
Degree: Ph.D.
Year: 2015
Pages: 100 p.
Total pages: 100
Original format: PDF
Language of text: eng

Similar literature

1. Resolving Entity on A Large scale: DEtermining Linked Entities and Grouping similar Attributes represented in assorted TErminologies [J]. Vidhya K. A., Geetha T. V. Distributed and Parallel Databases. 2017, no. 3-4
2. Weakly-supervised learning for community detection based on graph convolution in attributed networks [J]. Wang Xiaofeng, Li Jianhua, Yang Li. International Journal of Machine Learning and Cybernetics. 2021, no. 12
3. Hand gesture understanding by weakly-supervised fusing shallow/deep image attributes [J]. Signal Processing. Image Communication: A Publication of the European Association for Signal Processing. 2020
4. Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model [C]. Hongliang Dai, Yangqiu Song, Haixun Wang. Annual Meeting of the Association for Computational Linguistics; International Joint Conference on Natural Language Processing. 2021
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. A Weakly-Supervised Named Entity Recognition Machine Learning Approach for Emergency Medical Services Clinical Audit [O]. Han Wang, Wesley Lok Kin Yeung, Qin Xiang Ng. 2021
7. Fine-Grained Entity Typing for Domain Independent Entity Linking [O]. Yasumasa Onoe, Greg Durrett. 2020
8. Direct Measurements of Current-Phase Relations for Several Types of Superconducting Weak Links [R]. Jackel, L. D.; Warlaumont, J. M. 1975
