Conference: Workshop on language technology for cultural heritage, social sciences, and humanities

Extending the tool, or how to annotate historical language varieties

Abstract

We present a general and simple method for adapting an existing NLP tool so that it can handle historical varieties of languages. The approach consists essentially of expanding the dictionary with old word variants and retraining the tagger on a small training corpus. We implement this approach for Old Spanish. A thorough evaluation of the extended tool shows that the method achieves almost state-of-the-art performance, adequate for quantitative studies in the humanities: 94.5% accuracy for the main part of speech and 92.6% for lemma. To our knowledge, this is the first time that such a strategy has been adopted to annotate historical language varieties, and we believe that it could also be used to deal with other non-standard varieties of languages.
