
Enriching, Editing, and Representing Interlinear Glossed Text

Abstract

The majority of the world's languages have little to no NLP resources or tools. This is due to a lack of training data ("resources") on which tools such as taggers or parsers can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swathe of the world's languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. One promising line of research involves the use of Interlinear Glossed Text (IGT), a very common form of annotated data in linguistics. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of its content is not easily consumable by machines because it remains "trapped" in scholarly linguistic documents and in human-readable form. In this paper, we introduce several tools that make IGT more accessible to, and consumable by, NLP researchers.
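To make concrete what "human-readable" IGT looks like and what consuming it by machine might involve, here is a minimal sketch. It assumes the common three-line layout (language line, gloss line, free translation) with whitespace-aligned tokens. The `IGT` class, the `parse_igt` function, and the German example with its simplified glosses are hypothetical illustrations, not the tools or formats introduced in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IGT:
    """One IGT instance: language line, gloss line, free translation."""
    words: List[str]
    glosses: List[str]
    translation: str

    def aligned(self) -> List[Tuple[str, str]]:
        # Pair each source word with its gloss (one-to-one by position).
        return list(zip(self.words, self.glosses))


def parse_igt(text: str) -> IGT:
    """Parse a plain-text, three-line IGT block (hypothetical helper)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError("expected three lines: language, gloss, translation")
    words, glosses = lines[0].split(), lines[1].split()
    if len(words) != len(glosses):
        raise ValueError("language and gloss lines differ in token count")
    # Drop the quotation marks that conventionally wrap the translation.
    return IGT(words, glosses, lines[2].strip("'\""))


# Illustrative example with simplified glosses (an assumption, not source data).
example = """
Der   Hund     schläft
the   dog.NOM  sleep.3SG
'The dog sleeps.'
"""

igt = parse_igt(example)
for word, gloss in igt.aligned():
    print(f"{word}\t{gloss}")
print(igt.translation)
```

Real IGT found in scholarly documents is far noisier than this (misaligned columns, morpheme-level segmentation, inline citations), which is part of why dedicated tooling is needed.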
