Conference: International Conference on Computational Linguistics

Parenthetical Classification for Information Extraction

Abstract

The article focuses on a largely unexplored topic in NLP: parenthetical classification. Parentheticals are defined as any text sequence between parentheses. They have so far been approached only from isolated perspectives, such as translation-pair extraction, and a full account of their syntactic and semantic properties is lacking. This article proposes a new comprehensive classification scheme drawn from corpus-based linguistic studies of French news. The research is part of a project investigating the structural aspects of punctuation signs and their usefulness for Information Extraction. Parenthetical classification is approached as a relation extraction problem split into three correlated subtasks: syntactic classification, semantic classification, and head recognition. The corpus-based studies identified 11 syntactic and 18 semantic relation subtypes. The article addresses automatic classification using a combination of CRF and SVM. This baseline system reports F1 scores of 0.674 (head recognition), 0.908 (syntax), 0.734 (semantics), and 0.518 (end-to-end).
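The sketch below illustrates the task framing described in the abstract, under stated assumptions. A parenthetical is taken as any text between parentheses. Each one yields a record combining the outputs of the three subtasks: head, syntactic type, and semantic type. It is not the authors' system, and the CRF and SVM classifiers are not reproduced. Several pieces are hypothetical choices made for illustration only:

- the regex (flat, non-nested parentheses only);
- the `ParentheticalRelation` field names;
- the placeholder labels `SYN_1`, `SEM_1`, and so on;
- the strict exact-match rule used for end-to-end scoring.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Flat "( ... )" spans only; nested parentheses are out of scope for this sketch.
PAREN_RE = re.compile(r"\(([^()]*)\)")


def extract_parentheticals(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, inner_text) for every flat parenthetical in `text`."""
    return [(m.start(), m.end(), m.group(1)) for m in PAREN_RE.finditer(text)]


@dataclass(frozen=True)
class ParentheticalRelation:
    """Hypothetical record bundling the outputs of the three subtasks."""
    content: str          # text between the parentheses
    head: Optional[str]   # host-sentence token the parenthetical attaches to
    syntactic_type: str   # placeholder for one of the 11 syntactic subtypes
    semantic_type: str    # placeholder for one of the 18 semantic subtypes


def f1(gold: Set[ParentheticalRelation], pred: Set[ParentheticalRelation]) -> float:
    """Set-based F1: harmonic mean of precision and recall over exact matches."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "The WHO (World Health Organization) met in Geneva (Switzerland)."
    print([inner for _, _, inner in extract_parentheticals(sentence)])
    # ['World Health Organization', 'Switzerland']

    # Placeholder labels SYN_1, SEM_1, ... are illustrative, not the paper's inventory.
    gold = {
        ParentheticalRelation("World Health Organization", "WHO", "SYN_1", "SEM_1"),
        ParentheticalRelation("Switzerland", "Geneva", "SYN_1", "SEM_2"),
    }
    pred = {
        ParentheticalRelation("World Health Organization", "WHO", "SYN_1", "SEM_1"),
        ParentheticalRelation("Switzerland", "met", "SYN_1", "SEM_2"),  # wrong head
    }
    # Strict end-to-end match: the wrong head invalidates the second record.
    print(f1(gold, pred))  # 0.5
```

Under this assumed strict-match rule, an error in any one subtask invalidates the whole record. That is consistent with the reported end-to-end F1 (0.518) being lower than each individual subtask score.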