Journal: BMC Bioinformatics

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models


Abstract

Histone modification constitutes a basic mechanism for the regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as modeling the noise in the count data with a Poisson distribution. In this work we start from these natural assumptions and show that it is possible to improve upon them. Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect peaks more accurately than algorithms that rely on natural assumptions. The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.
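
To make the idea of penalized multiple changepoint detection on count data concrete, here is a minimal sketch in plain Python. It is not the CROCS implementation and does not use its API. It assumes the baseline Poisson noise model, a simple quadratic-time dynamic program (optimal partitioning), and a penalty supplied by hand, where the abstract describes learning it from labeled data. The names `poisson_cost` and `optimal_partitioning` are hypothetical, chosen only for this illustration.

```python
import math


def poisson_cost(cumsum, start, end):
    """Poisson negative log-likelihood of counts[start:end], up to a constant,
    with the rate set to the segment mean (illustrative helper)."""
    total = cumsum[end] - cumsum[start]
    if total == 0:
        return 0.0  # an all-zero segment is fit exactly by rate 0
    mean = total / (end - start)
    # n * mean - total * log(mean), and n * mean == total
    return total - total * math.log(mean)


def optimal_partitioning(counts, penalty):
    """Return sorted changepoint positions minimizing
    sum of segment costs + penalty * (number of changepoints)."""
    n = len(counts)
    cumsum = [0]
    for c in counts:
        cumsum.append(cumsum[-1] + c)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of counts[:t]
    best[0] = -penalty       # so the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start index of the final segment

    for t in range(1, n + 1):
        best_val = math.inf
        for s in range(t):
            cand = best[s] + penalty + poisson_cost(cumsum, s, t)
            if cand < best_val:
                best_val, last[t] = cand, s
        best[t] = best_val

    # Walk back through the stored segment starts to recover changepoints.
    changepoints = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            changepoints.append(s)
        t = s
    return sorted(changepoints)


# Toy coverage profile: low background, an enriched region, low background.
toy_counts = [1, 0, 2, 1, 1, 20, 22, 19, 21, 20, 1, 2, 0, 1, 1]
print(optimal_partitioning(toy_counts, penalty=5.0))
```

On this toy profile a moderate penalty recovers the two boundaries of the enriched region, while a very large penalty returns no changepoints at all. That sensitivity is why the choice of penalty matters, and why the abstract proposes learning it in a supervised way.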