Journal: European journal of human genetics: EJHG

The federated database--a basis for biobank-based post-genome studies, integrating phenome and genome data from 600,000 twin pairs in Europe.



Abstract

Integration of complex data and data management represent major challenges in large-scale biobank-based post-genome era research projects like GenomEUtwin (an international collaboration between eight Twin Registries) with extensive amounts of genotype and phenotype data combined from different data sources located in different countries. The challenge lies not only in data harmonization and constant update of clinical details in various locations, but also in the heterogeneity of data storage and confidentiality of sensitive health-related and genetic data. Solid infrastructure must be built to provide secure, but easily accessible and standardized, data exchange also facilitating statistical analyses of the stored data. Data collection sites desire to have full control of the accumulation of data, and at the same time the integration should facilitate effortless slicing and dicing of the data for different types of data pooling and study designs. Here we describe how we constructed a federated database infrastructure for genotype and phenotype information collected in seven European countries and Australia and connected this database setting via a network called TwinNET to guarantee effortless data exchange and pooled analyses. This federated database system offers a powerful facility for combining different types of information from multiple data sources. The system is transparent to end users and application developers, since it makes the set of federated data sources look like a single system. The user need not be aware of the format or site where the data are stored, the language or programming interface of the data source, how the data are physically stored, whether they are partitioned and/or replicated or what networking protocols are used. The user sees a single standardized interface with the desired data elements for pooled analyses.
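
The idea of a federated system that makes several data sources "look like a single system" can be illustrated with a minimal sketch. This is not the TwinNET implementation, which the abstract does not detail. It assumes two hypothetical sites (`site_a`, `site_b`) that store twin-pair height under different table names, column names and units, and a hypothetical `federated_select` function that returns one harmonized view. Each site keeps its own database and its own mapping query, so control over the data stays local.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Site:
    """One hypothetical registry that keeps control of its own database."""
    name: str
    conn: sqlite3.Connection
    query: str  # local SQL mapping the site's schema to (pair_id, height_cm)


def make_site(name, ddl, insert, rows, query):
    """Create an in-memory database standing in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return Site(name, conn, query)


def federated_select(sites):
    """Pool harmonized rows (site, pair_id, height_cm) from every site.

    The caller never sees local table names, column names or units.
    """
    pooled = []
    for site in sites:
        for pair_id, height_cm in site.conn.execute(site.query):
            pooled.append((site.name, pair_id, round(height_cm, 1)))
    return pooled


# Site A stores height in centimetres in a table called "twins".
site_a = make_site(
    "site_a",
    "CREATE TABLE twins (pair_id TEXT, height_cm REAL)",
    "INSERT INTO twins VALUES (?, ?)",
    [("A1", 170.0), ("A2", 182.5)],
    "SELECT pair_id, height_cm FROM twins ORDER BY pair_id",
)

# Site B stores height in metres in a table called "subjects".
site_b = make_site(
    "site_b",
    "CREATE TABLE subjects (pid TEXT, height_m REAL)",
    "INSERT INTO subjects VALUES (?, ?)",
    [("B1", 1.65)],
    "SELECT pid, height_m * 100.0 FROM subjects ORDER BY pid",
)

pooled = federated_select([site_a, site_b])
for row in pooled:
    print(row)
```

In this sketch the harmonization lives in each site's mapping query, so adding a data source means adding one `Site` entry rather than changing the code that runs pooled analyses.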
