Journal: ZooKeys

From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks

Abstract

Part diary, part scientific record, biological field notebooks often contain details necessary to understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these record sets were vetted, to provide valid taxon names, via a process we call “taxonomic referencing.” The result is identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.

“Compose your notes as if you were writing a letter to someone a century in the future.”
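
The extraction and cross-walk step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The XML element and attribute names (`notebook`, `page`, `date`, `place`, `taxon`, `iso`, `name`, `url`) and the sample values are hypothetical and invented for illustration. The choice of Darwin Core terms (`scientificName`, `eventDate`, `locality`, `basisOfRecord`, `references`) is likewise an assumption about what such a cross-walk might populate.

```python
import xml.etree.ElementTree as ET

# Invented sample: a hypothetical XML export of one annotated notebook page.
# Neither the schema nor the values come from the paper.
SAMPLE_XML = """
<notebook id="example-notebook">
  <page n="1" url="https://example.org/notebook/page/1">
    <date iso="1900-01-01">Jan. 1</date>
    <place name="Example Creek">the creek</place>
    <taxon name="Genus alpha">first example organism</taxon>
    <place name="Example Ridge">the ridge</place>
    <taxon name="Genus beta">second example organism</taxon>
  </page>
</notebook>
"""


def extract_darwin_core(xml_text):
    """Return one Darwin Core style dict per taxon annotation."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for page in root.iter("page"):
        event_date = None
        locality = None
        # Walk annotations in document order so that each taxon inherits
        # the most recent date and place annotated before it on the page.
        for node in page:
            if node.tag == "date":
                event_date = node.get("iso")
            elif node.tag == "place":
                locality = node.get("name")
            elif node.tag == "taxon":
                records.append({
                    "scientificName": node.get("name"),
                    "eventDate": event_date,
                    "locality": locality,
                    "basisOfRecord": "HumanObservation",
                    # Link back to the transcribed page, preserving the
                    # narrative context of the observation.
                    "references": page.get("url"),
                })
    return records


if __name__ == "__main__":
    for row in extract_darwin_core(SAMPLE_XML):
        print(row)
```

Keeping a `references` link on every row is one way to retain the connection to the original text that the abstract emphasizes. The names in `scientificName` would still need the vetting step the authors call taxonomic referencing.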