Source: Journal of Computational Biology (via the U.S. National Institutes of Health literature collection)

A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets


Abstract

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on analyzing each data set independently (i.e., peak calling). This approach disregards the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared toward multisample investigations have limited applicability in settings that aim to integrate hundreds to thousands of ChIP-seq data sets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, Zuo et al. developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this versatile framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization-based estimation structure hinders its applicability to large numbers of loci and samples. We address this limitation by developing a MAP-based asymptotic derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparison with MBASIC indicates that this speed comes at a relatively insignificant loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq data sets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.
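To give a feel for what a "K-means-like" algorithm derived from a Bayesian model can look like, the sketch below implements DP-means, a generic clustering procedure of this family in which a penalty decides when a new cluster is opened. It is an illustration only and is not the MAD-Bayes MBASIC algorithm, which the abstract does not spell out; the function names, the `penalty` parameter, and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(
    data: List[Sequence[float]], penalty: float, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Illustrative DP-means: K-means where a point farther than `penalty`
    (in squared distance) from every center opens a new cluster."""
    centers = [mean(data)]          # start with one global cluster
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if too far away.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > penalty:
                centers.append(list(x))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: recompute centers, dropping clusters that became empty.
        members: Dict[int, List[Sequence[float]]] = {}
        for lab, x in zip(labels, data):
            members.setdefault(lab, []).append(x)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[lab] for lab in labels]
        if not changed:             # assignments stable: converged
            break
    return labels, centers


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(dp_means(toy, penalty=4.0))
```

Because each pass is only a nearest-center assignment followed by a mean update, a run is cheap, which is the kind of property that makes it practical to try several initializations or penalty values.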
