
UCSG Shallow Parser



Abstract

Recently, there has been increasing interest in integrating rule-based methods with statistical techniques to develop robust, wide-coverage, high-performance parsing systems. In this paper, we describe an architecture, called the UCSG shallow parser architecture, which combines linguistic constraints expressed as finite state grammars with statistical rating using HMMs built from a POS-tagged corpus, and uses A* search for global optimization to determine the best shallow parse for a given sentence. The primary aim of the UCSG parsing architecture is a judicious combination of linguistic and statistical methods for building wide-coverage, robust shallow parsing systems without the need for large manually parsed training corpora. The UCSG architecture uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so as to produce the best parse first without compromising the ability to produce all possible parses. The architecture supports bootstrapping, with the aim of reducing the need for parsed training corpora. The complete system has been implemented in Perl under Linux. In this paper, we first describe the UCSG shallow parsing architecture and then focus on evaluating the UCSG finite state grammar on the chunking task for English. Recall values of 91.16% and 93.73% were obtained on the Susanne parsed corpus and the CoNLL 2000 chunking task test data set, respectively. Extensive experimentation to evaluate the other modules is under way.
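The abstract does not specify the grammar, the HMM parameters, or the search details. The following is therefore only a minimal Python sketch of the general "grammar proposes, statistics rank, A* searches" idea, and it should not be read as the UCSG implementation (which the authors wrote in Perl). Every name and number here is hypothetical. `CHUNK_PATTERNS` is a toy regex grammar over POS tags. `EMIT_COST` and `TRANS_COST` are hand-set costs standing in for HMM negative log-probabilities. `O` is an assumed fallback single-token chunk that guarantees at least one parse. The heuristic drops the (non-negative) transition costs, so it never overestimates and the search stays admissible.

```python
import heapq
import itertools
import re
from typing import Dict, Iterator, List, Tuple

# Hypothetical finite-state chunk patterns over space-separated POS tags
# (illustrative only; not the UCSG grammar).
CHUNK_PATTERNS: Dict[str, str] = {
    "NP": r"(DT )?(JJ )*(NN|NNS)( NN| NNS)*",
    "VP": r"(MD )?(VB|VBD|VBZ)",
    "PP": r"IN",
}
# Hypothetical costs standing in for HMM negative log-probabilities.
# "O" is an assumed fallback single-token chunk so every sentence has a parse.
EMIT_COST: Dict[str, float] = {"NP": 1.0, "VP": 1.0, "PP": 1.0, "O": 3.0}
TRANS_COST: Dict[Tuple[str, str], float] = {
    ("START", "NP"): 0.2, ("NP", "VP"): 0.2, ("VP", "NP"): 0.3,
    ("VP", "PP"): 0.3, ("NP", "PP"): 0.3, ("PP", "NP"): 0.1,
}
DEFAULT_TRANS = 1.0

Chunk = Tuple[int, int, str]  # (start, end, label); end is exclusive


def candidate_chunks(tags: List[str]) -> List[List[Tuple[int, str]]]:
    """For each start position, list (end, label) for every grammar-valid chunk."""
    n = len(tags)
    cands: List[List[Tuple[int, str]]] = [[] for _ in range(n)]
    for i in range(n):
        cands[i].append((i + 1, "O"))  # fallback chunk
        for j in range(i + 1, n + 1):
            span = " ".join(tags[i:j])
            for label, pattern in CHUNK_PATTERNS.items():
                if re.fullmatch(pattern, span):
                    cands[i].append((j, label))
    return cands


def astar_parses(tags: List[str]) -> Iterator[Tuple[float, Tuple[Chunk, ...]]]:
    """Yield every grammar-valid chunking of `tags`, cheapest first, via A* search."""
    n = len(tags)
    cands = candidate_chunks(tags)
    # Admissible heuristic: cheapest remaining emission cost, ignoring
    # transition costs (all >= 0), so it never overestimates.
    h = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = min(EMIT_COST[label] + h[j] for j, label in cands[i])
    tie = itertools.count()  # tie-breaker so tuples of chunks are never compared
    frontier = [(h[0], 0.0, next(tie), 0, "START", ())]
    while frontier:
        _, g, _, pos, prev, chunks = heapq.heappop(frontier)
        if pos == n:  # complete parse; h is 0 here, so f == g
            yield g, chunks
            continue
        for j, label in cands[pos]:
            trans = TRANS_COST.get((prev, label), DEFAULT_TRANS)
            g2 = g + trans + EMIT_COST[label]
            new_chunks = chunks + ((pos, j, label),)
            heapq.heappush(frontier, (g2 + h[j], g2, next(tie), j, label, new_chunks))
```

Because the generator pops complete parses in nondecreasing cost order, the first item is the best parse, and further iteration enumerates the remaining grammar-valid parses. This loosely mirrors the stated goal of producing the best parse first without losing the ability to produce all parses. With these toy costs, the tag sequence `DT JJ NN VBZ IN DT NN` yields NP(0,3) VP(3,4) PP(4,5) NP(5,7) first, at a cost of 4.8.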
