Venue: Conference on Empirical Methods in Natural Language Processing

UD-Japanese BCCWJ: Universal Dependencies Annotation for the Balanced Corpus of Contemporary Written Japanese

Abstract

In this paper, we describe UD Japanese-BCCWJ, a corpus created by converting the Balanced Corpus of Contemporary Written Japanese (BCCWJ), a Japanese-language corpus, to the Universal Dependencies (UD) annotation schema. The BCCWJ already provides dependency annotation at the level of the bunsetsu (a Japanese syntactic unit comparable to a phrase). We developed a program that converts the BCCWJ to UD based on this dependency structure, and the corpus is the result of fully automatic conversion with that program. UD Japanese-BCCWJ is the largest UD Japanese corpus and the second-largest of all UD corpora, comprising 1,980 documents, 57,109 sentences, and 1,273k words across six distinct domains.
