ACM International Conference on Information and Knowledge Management

Labeling by Landscaping: Classifying Tokens in Context by Pruning and Decorating Trees

Abstract

State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel-based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.
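
To make the idea of a tree kernel concrete, here is a minimal sketch of one well-known variant, the subset-tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share. The abstract does not say which tree kernel, decay value, or decoration scheme the authors use, so everything below is an assumption for illustration: the nested-tuple tree encoding, the function names, the `decay` parameter, and the `-TARGET` marker in `decorate_target` are all hypothetical.

```python
from functools import lru_cache


def nodes(tree):
    """Yield every internal node (a tuple) of a nested-tuple tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Return the grammar production at a node: its label plus child labels.

    Each child is tagged with whether it is an internal node or a leaf word,
    so a leaf "N" and an internal node labelled "N" are not confused.
    """
    children = tuple(
        (c[0], True) if isinstance(c, tuple) else (c, False) for c in node[1:]
    )
    return (node[0], children)


def subset_tree_kernel(t1, t2, decay=1.0):
    """Sum, over all node pairs, the (decayed) count of shared fragments."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can match only if productions agree.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, tuple):  # c2 is a tuple too: productions agree
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))


def decorate_target(tree, word, mark="-TARGET"):
    """Hypothetical decoration: relabel the node directly above `word`."""
    if not isinstance(tree, tuple):
        return tree
    label = tree[0] + mark if word in tree[1:] else tree[0]
    return (label,) + tuple(decorate_target(c, word, mark) for c in tree[1:])


# Trees are (label, child, ...) tuples; leaves are plain strings (assumed encoding).
t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))

print(subset_tree_kernel(t1, t2))  # 15.0
print(subset_tree_kernel(t1, t1))  # 24.0
print(decorate_target(t1, "dog")[1][2])  # ('N-TARGET', 'dog')
```

A Gram matrix of such kernel values over the training trees could then be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Decorating a node changes its production, so fragments containing the marked token match only fragments marked the same way; this is one plausible reading of how decoration steers the kernel toward the token being classified.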