From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks

Abstract

Part diary, part scientific record, biological field notebooks often contain details necessary for understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these record sets were vetted, to provide valid taxon names, via a process we call “taxonomic referencing.” The result is identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.

“Compose your notes as if you were writing a letter to someone a century in the future.”
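
The annotate-then-extract step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the paper converts annotated pages to XML before extraction, whereas this sketch reads wikitext directly with a regular expression. The template names (`taxon`, `place`, `dated`), their argument order, and the rule that a taxon inherits the most recent preceding place and date are all assumptions made for illustration. The output keys (`scientificName`, `locality`, `eventDate`, `basisOfRecord`) are Darwin Core term names.

```python
import re

# Hypothetical annotation templates: the names and argument order are
# assumptions for this sketch, not the templates used in the paper.
TEMPLATE_RE = re.compile(r"\{\{(taxon|place|dated)\|([^{}|]+)(?:\|[^{}]*)?\}\}")


def extract_observations(wikitext):
    """Return one Darwin Core-style dict per taxon annotation.

    Each taxon inherits the most recent place and date annotations that
    precede it in the text (a simplifying assumption).
    """
    records = []
    locality = None
    event_date = None
    for match in TEMPLATE_RE.finditer(wikitext):
        kind = match.group(1)
        value = match.group(2).strip()
        if kind == "place":
            locality = value
        elif kind == "dated":
            event_date = value
        else:  # kind == "taxon"
            records.append({
                "scientificName": value,
                "locality": locality,
                "eventDate": event_date,
                "basisOfRecord": "HumanObservation",
            })
    return records


# Invented sample text, not a quotation from Henderson's notebooks.
sample = (
    "{{dated|1905-07-04}} Walked up the canyon near {{place|Boulder}}. "
    "Saw several {{taxon|Cyanocitta stelleri|Steller's jays}} in the pines."
)
records = extract_observations(sample)
```

Because the annotations stay inline in the transcribed text, each extracted record can still be traced back to the sentence it came from, which is the property the abstract emphasizes.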

Bibliographic details

Journal: ZooKeys
Authors: Andrea Thomer; Gaurav Vaidya; Robert Guralnick; David Bloom; Laura Russell
Year, issue: 2012, No. 209
Pages: 235–253
Total pages: 19
Original format: PDF
Chinese Library Classification: Zoology
Keywords: Field notes; notebooks; crowd sourcing; digitization; biodiversity; transcription; text-mining; Darwin Core; Junius Henderson; annotation; taxonomic referencing; natural history; Wikisource; Colorado; species occurrence records

Similar literature

1. From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks [J]. Bloom David, Guralnick Robert, Russell Laura. ZooKeys. 2012, No. 209
2. Extracting discourse elements and annotating scientific documents using the SciAnnotDoc model: a use case in gender documents [J]. Hélène de Ribaupierre, Gilles Falquet. International journal on digital libraries. 2018, No. 2-3
3. Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study [J]. Oana Inel, Lora Aroyo. OASIcs: OpenAccess Series in Informatics. 2019, No. 1
4. Survey on Extractive Text Summarization Methods with Multi-Document Datasets [C]. P N Varalakshmi K, Jagadish S Kallimani. International Conference on Advances in Computing, Communications and Informatics. 2018
5. Probabilistic random field based method for annotated machine printed documents preprocessing [D]. Peng, Xujun. 2011
6. An annotated fluorescence image dataset for training nuclear segmentation methods [O]. Florian Kromp, Eva Bozsaky, Fikret Rifatbegovic. 2020
7. Figure 5 from: Bloom D, Thomer A, Vaidya G, Guralnick R, Russell L (2012) From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks. ZooKeys 209: 235-253. https://doi.org/10.3897/zookeys.209.3247 [O]. Thomer, Andrea, Vaidya, Gaurav, Guralnick, Robert. 2012
