Conference: International Conference on Language Resources and Evaluation

Annotated Bibliographical Reference Corpora in Digital Humanities

Abstract

In this paper, we present new bibliographical reference corpora in digital humanities (DH) that were developed under the research project "Robust and Language Independent Machine Learning Approaches for Automatic Annotation of Bibliographical References in DH Books", supported by the Google Digital Humanities Research Awards. The main target is the bibliographical references in the articles of the Revues.org site, one of the oldest French online journal platforms in the DH field. Since the final objective is to provide automatic links between related references and articles, the automatic recognition of reference fields such as author and title is essential. These fields are therefore manually annotated using a set of carefully defined tags. After giving a full description of the three corpora, which are constructed separately according to the difficulty level of annotation, we briefly introduce our experimental results on the first two corpora. A popular machine learning technique, Conditional Random Fields (CRFs), is used to build a model that automatically annotates the fields of new references. In the experiments, we first establish a standard for defining features and labels adapted to our DH reference data. We then show that our new methodology for less structured references gives meaningful results.
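As a rough illustration of what "defining features" can mean when a CRF annotates reference fields, the Python sketch below turns one reference string into per-token feature dictionaries, an input shape that many CRF toolkits accept. The tokenizer, the feature names, and the sample reference are all assumptions made for illustration; they are not the feature set, labels, or data used in the paper.

```python
import re


def tokenize(reference: str) -> list[str]:
    """Split a reference into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference, flags=re.UNICODE)


def token_features(tokens: list[str], i: int) -> dict[str, object]:
    """Build a feature dictionary for token i (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "is_digit": tok.isdigit(),
        "is_punct": not any(ch.isalnum() for ch in tok),
        # Crude year heuristic: four digits starting with 15-19 or 20.
        "looks_like_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        # Neighbouring tokens give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def reference_to_features(reference: str) -> tuple[list[str], list[dict[str, object]]]:
    """Return the tokens of a reference and one feature dict per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Invented sample reference, not taken from the corpora.
    sample = "Dupont, J. (1998). Titre de l'ouvrage. Paris: Seuil."
    toks, feats = reference_to_features(sample)
    for tok, feat in zip(toks, feats):
        print(tok, feat)
```

In a full pipeline, each feature sequence would be paired with a manually annotated label sequence (for example, one field tag such as author or title per token) and passed to a CRF trainer; that training step is left out here because it depends on the chosen toolkit.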