Conference: Research in Computational Molecular Biology

AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization



Abstract

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
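The idea of updating alignment probabilities by expectation maximization can be illustrated with a toy sketch. This is not the AREM implementation and does not reproduce its likelihood: it assumes the candidate regions are already known, treats each read as a list of candidate alignments, and only estimates mixing weights over K enriched regions plus one background component. The function name `em_multiread`, the input layout, and the example scores are all hypothetical.

```python
def em_multiread(reads, n_components, n_iter=50):
    """Toy EM for reads with several candidate alignments (illustrative only).

    reads: list of reads; each read is a list of (component, score) pairs,
           where component indexes an enriched region (0..K-1) or the
           background (K), and score is a positive alignment-quality prior.
    Returns (weights, resp): mixing weights per component and, for each
    read, the posterior probability of each of its candidate alignments.
    """
    weights = [1.0 / n_components] * n_components
    resp = []
    for _ in range(n_iter):
        # E-step: posterior of each candidate alignment given current weights.
        resp = []
        for candidates in reads:
            unnorm = [weights[c] * s for c, s in candidates]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: each component's weight is its expected share of reads.
        counts = [0.0] * n_components
        for candidates, posteriors in zip(reads, resp):
            for (c, _), p in zip(candidates, posteriors):
                counts[c] += p
        weights = [x / len(reads) for x in counts]
    return weights, resp


# Hypothetical data: components 0 and 1 are enriched regions, 2 is background.
# Four reads map uniquely to region 0, one maps uniquely to region 1, and the
# last read aligns equally well to both regions.
reads = [[(0, 1.0)] for _ in range(4)] + [[(1, 1.0)]] + [[(0, 0.5), (1, 0.5)]]
weights, resp = em_multiread(reads, n_components=3)
print(weights)
print(resp[-1])  # the ambiguous read is pulled toward the better-supported region
```

In this toy case the ambiguous read starts with equal probability for both regions and is gradually reassigned toward the region supported by more uniquely mapped reads, which is the intuition behind using all reads rather than discarding the non-unique ones.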
