Published in: Conference on Empirical Methods in Natural Language Processing
Er... well, it matters, right? On the role of data representations in spoken language dependency parsing

Abstract

Despite the significant improvement of data-driven dependency parsing systems in recent years, they still achieve a considerably lower performance in parsing spoken language data in comparison to written data. On the example of Spoken Slovenian Treebank, the first spoken data treebank using the UD annotation scheme, we investigate which speech-specific phenomena undermine parsing performance, through a series of training data and treebank modification experiments using two distinct state-of-the-art parsing systems. Our results show that utterance segmentation is the most prominent cause of low parsing performance, both in parsing raw and pre-segmented transcriptions. In addition to shorter utterances, both parsers perform better on normalized transcriptions including basic markers of prosody and excluding disfluencies, discourse markers and fillers. On the other hand, the effects of written training data addition and speech-specific dependency representations largely depend on the parsing system selected.