Journal: Expert Systems with Applications

Lexicon-Grammar based open information extraction from natural language sentences in Italian

Abstract

In the last decade, the quantity of readily accessible text has grown rapidly and enormously, far exceeding the capacity of humans to read and understand it. One of the most interesting strategies proposed to address this problem is known as Open Information Extraction (OIE). It is essentially devised to read in sentences and rapidly extract one or more domain-independent coherent propositions, each represented by a verb relation and its arguments. Even though many OIE approaches exist for English, no significant research has been conducted on OIE for Italian texts. Because they rely on language-specific features, OIE systems operating in other languages are not directly applicable to Italian. Therefore, this paper proposes, as its first contribution, a novel approach to perform OIE for the Italian language, based on standard linguistic structures to analyze sentences and on a set of verbal behavior patterns to extract information from them. These patterns are built by combining a solid linguistic theoretical framework, i.e. Lexicon-Grammar (LG), with distributional profiles extracted from a contemporary Italian corpus, i.e. itWaC. Starting from simple sentences, the approach first determines elementary tuples, then all their permutations obtained by adding complements and adverbials, and finally n-ary propositions. It does so by granting syntactic invariance, preserving overall grammaticality, and respecting some syntactic constraints and selection preferences, thus approximating a first level of semantic acceptability. As the second contribution of this work, a gold standard dataset for the Italian language has been built from the itWaC corpus, intended to be widely used for the experimental validation of OIE solutions. It has been manually and independently labeled by four native Italian speakers with all the n-ary propositions that can be extracted, following the criteria of grammaticality and acceptability, i.e. granting syntactic well-formedness and meaningfulness in the context. Finally, the proposed approach has been tested and quantitatively validated on this gold standard dataset. It has also been compared with an indirect approach that translates input sentences and output propositions from Italian to English and vice versa and embeds an OIE approach for English, as well as with an OIE system for Italian previously presented by the authors. The results obtained have shown the effectiveness of the proposed approach in generating propositions with respect to these criteria of grammaticality and acceptability. Even though the approach has been evaluated for the Italian language, it is essentially based on linguistic resources produced by LG, which exist for many languages besides Italian, and on a representative corpus for the language under consideration. Given these premises, it has a general basis from a methodological perspective and can also be effectively extended to other languages. (C) 2019 Elsevier Ltd. All rights reserved.
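The expansion step described in the abstract (an elementary tuple, then variants obtained by adding complements and adverbials) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expand_propositions`, the example sentence, and the choice to enumerate order-preserving subsets of the optional constituents are all assumptions made for illustration, and the LG verbal patterns, syntactic constraints, and selection preferences mentioned in the abstract are deliberately left out.

```python
from itertools import combinations
from typing import List, Tuple


def expand_propositions(
    subject: str, verb: str, obj: str, optional: List[str]
) -> List[Tuple[str, ...]]:
    """Return the elementary tuple plus every variant that appends a subset
    of the optional constituents, kept in their original order.

    Hypothetical helper: a real system would filter the variants with
    grammaticality and acceptability checks, which are omitted here.
    """
    results = []
    for k in range(len(optional) + 1):
        # combinations() preserves the input order of the constituents
        for extra in combinations(optional, k):
            results.append((subject, verb, obj, *extra))
    return results


# Hypothetical input: "Maria ha comprato un libro a Roma ieri"
props = expand_propositions("Maria", "ha comprato", "un libro", ["a Roma", "ieri"])
for p in props:
    print(p)
```

With two optional constituents the sketch yields four candidate propositions, from the bare elementary tuple up to the full n-ary one; the filtering that the abstract attributes to syntactic constraints and selection preferences would then prune this candidate set.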
