Journal: GigaScience

Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

Abstract

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOS_GEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOS_LIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
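
As an illustration of what integrating heterogeneous source datasets with documented provenance and quality control can look like in practice, the following is a minimal sketch in Python. It is not the LAGOS implementation: the source names, column names, unit conversions, and plausible-value ranges are all hypothetical, chosen only to show the pattern of mapping each source's conventions onto one standardized long-format record that carries its provenance and a QC flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    lake_id: str
    variable: str   # standardized variable name
    value: float    # value expressed in the standard unit
    unit: str
    source: str     # provenance: the source dataset the value came from
    qc_flag: str    # "ok" or "out_of_range"


# Assumed standard units and plausible ranges (illustrative values only).
STANDARD = {
    "total_phosphorus": ("ug/L", 0.0, 10000.0),
    "secchi_depth": ("m", 0.0, 50.0),
}

# Hypothetical per-source mappings:
# source column -> (standard variable, multiplier to reach the standard unit)
SOURCE_MAPPINGS = {
    "state_agency_a": {
        "TP_mgL": ("total_phosphorus", 1000.0),   # mg/L -> ug/L
        "secchi_m": ("secchi_depth", 1.0),
    },
    "volunteer_program_b": {
        "tp_ugl": ("total_phosphorus", 1.0),
        "secchi_ft": ("secchi_depth", 0.3048),    # feet -> metres
    },
}


def harmonize(source, rows):
    """Convert one source's rows into standardized, provenance-tagged records."""
    mapping = SOURCE_MAPPINGS[source]
    records = []
    for row in rows:
        for column, (variable, factor) in mapping.items():
            raw = row.get(column)
            if raw is None or raw == "":
                continue  # missing measurement: emit nothing rather than guess
            value = float(raw) * factor
            unit, low, high = STANDARD[variable]
            flag = "ok" if low <= value <= high else "out_of_range"
            records.append(
                Observation(row["lake_id"], variable, value, unit, source, flag)
            )
    return records


if __name__ == "__main__":
    combined = harmonize(
        "state_agency_a", [{"lake_id": "L1", "TP_mgL": "0.02", "secchi_m": "3.5"}]
    ) + harmonize(
        "volunteer_program_b", [{"lake_id": "L2", "tp_ugl": "35", "secchi_ft": "10"}]
    )
    for record in combined:
        print(record)
```

Keeping the per-source mappings as data rather than code is one way to make such a database extensible: under this design, adding a new dataset means adding a mapping entry, not rewriting the integration logic.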