
h5vc: scalable nucleotide tallies with HDF5

Abstract

Summary: As applications of genome sequencing, including exomes and whole genomes, are expanding, there is a need for analysis tools that are scalable to large sets of samples and/or ultra-deep coverage. Many current tool chains are based on the widely used file formats BAM and VCF or VCF-derivatives. However, for some desirable analyses, data management with these formats creates substantial implementation overhead, and much time is spent parsing files and collating data. We observe that a tally data structure, i.e. the table of counts of nucleotides × samples × strands × genomic positions, provides a reasonable intermediate level of abstraction for many genomics analyses, including single nucleotide variant (SNV) and InDel calling, copy-number estimation and mutation spectrum analysis. Here we present h5vc, a data structure and associated software for managing tallies. The software contains functionality for creating tallies from BAM files, flexible and scalable data visualization, data quality assessment, and computing statistics relevant to variant calling and other applications. Through the simplicity of its API, we envision making low-level analysis of large sets of genome sequencing data accessible to a wider range of researchers.

Availability and implementation: The package h5vc for the statistical environment R is available through the Bioconductor project. The HDF5 system is used as the core of our implementation.

Supplementary information: Supplementary data are available at Bioinformatics online.
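To make the tally idea concrete, the following is a minimal sketch in Python of a table of counts indexed by nucleotide × sample × strand × genomic position. It is not the h5vc API and does not use HDF5: the `Observation` record, the function names, and the in-memory nested lists are all assumptions chosen for illustration, standing in for the on-disk arrays the abstract describes.

```python
from collections import namedtuple

BASES = "ACGT"
STRANDS = "+-"

# Hypothetical record: one base call observed in one aligned read.
Observation = namedtuple("Observation", "sample strand position base")


def build_tally(observations, samples, start, end):
    """Count base calls into tally[base][sample][strand][position - start]."""
    width = end - start
    tally = [[[[0] * width for _ in STRANDS] for _ in samples] for _ in BASES]
    sample_index = {name: i for i, name in enumerate(samples)}
    for obs in observations:
        if obs.base not in BASES or not (start <= obs.position < end):
            continue  # skip ambiguous calls and positions outside the window
        b = BASES.index(obs.base)
        s = sample_index[obs.sample]
        d = STRANDS.index(obs.strand)
        tally[b][s][d][obs.position - start] += 1
    return tally


def coverage(tally, sample, offset):
    """Depth at one window offset in one sample, summed over bases and strands."""
    return sum(
        tally[b][sample][d][offset]
        for b in range(len(BASES))
        for d in range(len(STRANDS))
    )


def allele_fraction(tally, sample, offset, base):
    """Fraction of the depth at one offset that supports the given base."""
    depth = coverage(tally, sample, offset)
    if depth == 0:
        return 0.0
    b = BASES.index(base)
    return sum(tally[b][sample][d][offset] for d in range(len(STRANDS))) / depth


# Toy input (invented for illustration): two samples, window [100, 102).
observations = [
    Observation("tumour", "+", 100, "A"),
    Observation("tumour", "-", 100, "T"),
    Observation("tumour", "+", 100, "A"),
    Observation("control", "+", 100, "A"),
    Observation("control", "-", 101, "G"),
]
tally = build_tally(observations, ["control", "tumour"], 100, 102)
tumour_depth = coverage(tally, 1, 0)
tumour_t_fraction = allele_fraction(tally, 1, 0, "T")
```

Once the counts are collated this way, quantities such as per-sample depth or the fraction of reads supporting a non-reference base reduce to sums over one or two axes of the table, which is the kind of intermediate abstraction the summary argues for.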
