Strategies for Lifelong Knowledge Extraction from the Web

Abstract

The increasing availability of electronic text has made it possible to acquire information using a variety of techniques that leverage the expertise of both humans and machines. In particular, the field of Information Extraction (IE), in which knowledge is extracted automatically from text, has shown promise for large-scale knowledge acquisition.

While IE systems can uncover assertions about individual entities with an increasing level of sophistication, text understanding (the formation of a coherent theory from a textual corpus) involves representation and learning abilities not currently achievable by today's IE systems. Compared to the individual relational assertions output by IE systems, a theory includes coherent knowledge of abstract concepts and the relationships among them.

We believe that fully discovering the richness of knowledge present within large, unstructured, and heterogeneous corpora will require a lifelong learning process in which earlier learned knowledge is used to guide subsequent learning. This paper introduces Alice, a lifelong learning agent whose goal is to automatically discover, directly from a large volume of Web text, a collection of concepts, facts, and generalizations that describe a particular topic of interest. Building upon recent advances in unsupervised information extraction, we demonstrate that Alice can iteratively discover new concepts and compose general domain knowledge with a precision of 78%.
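The abstract describes a lifelong learning process in which earlier learned knowledge guides subsequent learning. The toy Python sketch below illustrates only that general idea and is not Alice's actual algorithm. The triple format, the function names `lifelong_discovery` and `precision`, and the rule "objects of accepted assertions become concepts for the next round" are all hypothetical assumptions made for illustration. Precision is computed with its standard definition: the fraction of extracted assertions judged correct.

```python
from typing import Iterable

# Hypothetical IE output format: (subject, relation, object).
Triple = tuple[str, str, str]


def lifelong_discovery(
    triples: Iterable[Triple],
    seeds: set[str],
    max_rounds: int = 5,
) -> tuple[set[str], set[Triple]]:
    """Toy iterative loop (an assumption, not the paper's method):
    concepts learned in round k decide which assertions are used in round k + 1."""
    triples = list(triples)
    concepts = set(seeds)
    facts: set[Triple] = set()
    for _ in range(max_rounds):
        # Consider only not-yet-accepted assertions about already-known concepts.
        relevant = {t for t in triples if t[0] in concepts} - facts
        if not relevant:
            break  # nothing new can be learned from current knowledge
        facts |= relevant
        # Objects of newly accepted assertions guide the next round.
        concepts |= {obj for _, _, obj in relevant}
    return concepts, facts


def precision(extracted: set[Triple], correct: set[Triple]) -> float:
    """Standard precision: fraction of extracted assertions that are correct."""
    return len(extracted & correct) / len(extracted) if extracted else 0.0
```

On toy data such as `[("nutrition", "includes", "vitamin"), ("vitamin", "prevents", "scurvy"), ("scurvy", "is a", "disease"), ("car", "has", "engine")]` with seed `{"nutrition"}`, each round extends the known concepts by one step, and the unrelated `car` assertion is never reached because no learned concept leads to it.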

Record details

  • Source
    K-CAP'07 | 2007 | pp. 95-102 | 8 pages
  • Conference location: Whistler (CA)
  • Authors

    Michele Banko; Oren Etzioni;

  • Author affiliations

    Turing Center, University of Washington, Computer Science and Engineering, Box 352350, Seattle, WA 98195, USA; banko@cs.washington.edu

    Turing Center, University of Washington, Computer Science and Engineering, Box 352350, Seattle, WA 98195, USA; etzioni@cs.washington.edu

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: computing technology, computer technology
  • Keywords

    algorithms;
