Journal: International Journal on Semantic Web and Information Systems

Bootstrapping of Semantic Relation Extraction for a Morphologically Rich Language: Semi-Supervised Learning of Semantic Relations

Abstract

This article describes a bootstrapping approach to extracting the semantic relations that exist between two different concepts in Tamil text. The proposed system, Bootstrapping Approach to Semantic UNL Relation Extraction (BASURE), extracts generic relations between the components of a sentence by exploiting the morphological richness of Tamil. Tamil is a partially free word order language, so semantic relations between concepts can occur anywhere in the sentence, not necessarily in a fixed order. The authors use Universal Networking Language (UNL), an interlingua framework, to represent word-based features, and aim to define the UNL semantic relations that exist between any two constituents of a sentence. The morphological suffix, lexical category, and UNL semantic constraints associated with a word form the tuples of the patterns used for bootstrapping. Most systems define the initial set of seed patterns manually; this article instead uses a rule-based approach to obtain the word-based features that form the pattern tuples. Bootstrapping is then applied to extract all possible instances from the corpus and to generate new patterns. The authors also use the UNL Ontology to discover the semantic similarity between the semantic tuples of patterns, and hence to learn new patterns from the text corpus iteratively. Using the UNL Ontology makes the approach general and domain independent. The results are evaluated and compared with existing approaches; the authors report that the approach is generic, can extract all sentence-based UNL semantic relations, and significantly improves the performance of generic semantic relation extraction.
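
The loop the abstract describes (seed patterns built from word-feature tuples, matching against a corpus, ontology-based similarity to admit new patterns) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BASURE implementation: the names `WordFeatures`, `Pattern`, `ONTOLOGY`, `similar`, and `bootstrap` are hypothetical, the toy ontology and feature values are invented for the example, the "same parent" similarity test is a stand-in for whatever measure the authors use, and the rule-based seed generation and any pattern scoring are not reproduced.

```python
from typing import NamedTuple

class WordFeatures(NamedTuple):
    suffix: str      # morphological suffix of the word
    category: str    # lexical category
    constraint: str  # semantic constraint (a node in the toy ontology)

class Pattern(NamedTuple):
    head: WordFeatures
    dependent: WordFeatures
    relation: str    # relation label assigned to the word pair

# Hypothetical toy ontology (child -> parent); not the real UNL Ontology.
ONTOLOGY = {
    "city": "place",
    "village": "place",
    "place": "thing",
    "person": "thing",
    "move": "action",
}

def similar(a: str, b: str) -> bool:
    """Assumed similarity test: identical nodes or nodes sharing a parent."""
    parent = ONTOLOGY.get(a)
    return a == b or (parent is not None and parent == ONTOLOGY.get(b))

def soft_match(p: WordFeatures, w: WordFeatures) -> bool:
    """Suffix and category must agree; constraints only need to be similar."""
    return (p.suffix == w.suffix and p.category == w.category
            and similar(p.constraint, w.constraint))

def bootstrap(seeds, corpus, max_iterations=10):
    """Grow the pattern set until an iteration yields nothing new."""
    patterns, instances = set(seeds), set()
    for _ in range(max_iterations):
        new_patterns = set()
        for head, dep in corpus:
            for p in patterns:
                if soft_match(p.head, head) and soft_match(p.dependent, dep):
                    instances.add((head, dep, p.relation))
                    candidate = Pattern(head, dep, p.relation)
                    if candidate not in patterns:
                        new_patterns.add(candidate)
        if not new_patterns:
            break
        patterns |= new_patterns
    return patterns, instances

# Illustrative data: one seed pattern and two feature-annotated word pairs.
SEEDS = [Pattern(WordFeatures("", "verb", "move"),
                 WordFeatures("il", "noun", "city"), "plc")]
CORPUS = [
    (WordFeatures("", "verb", "move"), WordFeatures("il", "noun", "village")),
    (WordFeatures("", "verb", "move"), WordFeatures("ai", "noun", "person")),
]

patterns, instances = bootstrap(SEEDS, CORPUS)
print(len(patterns), "patterns,", len(instances), "instances")
```

In this toy run the first corpus pair is admitted because "village" and "city" share a parent in the assumed ontology, while the second pair is rejected on the suffix mismatch. Relaxing only the semantic constraint, and keeping suffix and category exact, is one plausible reading of how an ontology could let patterns generalize across domains.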
