Conference: International Conference on Language Resources and Evaluation

ODIL_Syntax: a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees



Abstract

This paper describes ODIL_Syntax, a French treebank built on spontaneous speech transcripts. The syntactic structure of every speech turn is represented by constituent trees, produced by a procedure that combines automatic annotation by a parser (here, the Stanford Parser) with manual revision. ODIL_Syntax follows the annotation scheme designed for the French TreeBank (FTB), with the addition of some annotation guidelines that aim at representing specific features of spoken language such as speech disfluencies. The corpus will be freely distributed by January 2020 under a Creative Commons licence. It will serve as the basis for a further semantic enrichment dedicated to the representation of temporal entities and temporal relations, as a second phase of the ODIL@Temporal project. The paper details the annotation scheme we followed, with an emphasis on the representation of speech disfluencies. We then present the annotation procedure that was carried out on the Contemplata annotation platform. In the last section, we provide some distributional characteristics of the annotated corpus (POS distribution, multiword expressions).
