AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

Abstract

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used to characterize genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis that uses all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (EM) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
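
The EM idea in the abstract (reweighting each multi-mapped read across its candidate locations according to how strongly each location is supported by the other reads) can be illustrated with the simplified Python sketch below. It is not AREM's implementation: the function name `em_multiread`, the grouping of candidate alignments into hypothetical window ids, and the per-alignment quality `prior` are all assumptions made for illustration, and the sketch omits the null genomic background component and the maximum-likelihood estimation of enriched-region locations described above.

```python
from collections import defaultdict


def em_multiread(reads, n_iter=100, tol=1e-8):
    """Hypothetical, simplified EM over genomic windows (not AREM's code).

    reads: one non-empty list per read of (window_id, prior) pairs, where
           prior > 0 is an ASSUMED alignment-quality score for that location.
    Returns (probs, weights): per-read alignment probabilities and the
    estimated fraction of all reads originating from each window.
    Simplification: no null/background component is modeled here.
    """
    # Initialize each read's alignment probabilities from its priors alone.
    probs = []
    for cands in reads:
        total = sum(q for _, q in cands)
        probs.append([q / total for _, q in cands])

    n_reads = len(reads)
    weights = {}
    for _ in range(n_iter):
        # M-step: a window's weight is its expected share of all reads.
        mass = defaultdict(float)
        for cands, p in zip(reads, probs):
            for (w, _), pi in zip(cands, p):
                mass[w] += pi
        weights = {w: m / n_reads for w, m in mass.items()}

        # E-step: each candidate's posterior is proportional to its prior
        # times the current weight of the window it falls in.
        new_probs = []
        max_change = 0.0
        for cands, p_old in zip(reads, probs):
            scores = [q * weights[w] for w, q in cands]
            z = sum(scores)
            p_new = [s / z for s in scores]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(p_new, p_old)))
            new_probs.append(p_new)
        probs = new_probs
        if max_change < tol:  # stop once alignment probabilities stabilize
            break
    return probs, weights


if __name__ == "__main__":
    reads = [
        [("chr1:1000", 1.0)],                       # uniquely mapped read
        [("chr1:1000", 1.0)],                       # uniquely mapped read
        [("chr1:1000", 1.0), ("chr7:5000", 1.0)],   # multi-read, equal priors
    ]
    probs, weights = em_multiread(reads)
    # The multi-read's probability shifts toward chr1:1000, the window
    # supported by the unique reads.
    print([round(p, 4) for p in probs[2]])
```

In this toy input the two uniquely mapped reads pull the ambiguous read toward `chr1:1000`, which is the qualitative behavior the abstract relies on for recovering peaks in repeats. A fuller model would also include a background term so that reads whose candidate locations all fall outside enriched regions are not forced into spurious windows.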

Bibliographic details

Source
Research in Computational Molecular Biology, 2011, pp. 283-297 (15 pages)
Conference location: Vancouver, Canada
Authors
Daniel Newkirk; Jacob Biesinger; Alvin Chon; Kyoko Yokomori; Xiaohui Xie
Affiliations

Department of Biological Chemistry, The Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697

Department of Computer Science, The Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697

Department of Computer Science, The Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697

Department of Biological Chemistry

Department of Computer Science, The Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697

Original format: PDF
Language: English
Chinese Library Classification: Bioengineering (biotechnology)
Keywords
ChIP-seq; mixture model; expectation-maximization; cohesin; CTCF; Srebp-1; repetitive elements; high-throughput sequencing; peak caller

Similar documents

1. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization [J]. Daniel Newkirk, Jacob Biesinger, Alvin Chon. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 2011, no. 11.
2. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization [J]. Alvin Chon, Daniel Newkirk, Jacob Biesinger. Journal of Computational Biology, 2011, no. 11.
3. Strand-seq enables reliable separation of long reads by chromosome via expectation maximization [J]. Ghareghani Maryam, Porubsky David, Sanders Ashley D. Bioinformatics, 2018, no. 13.
4. Analysis of Short-read Aligners using Genome Sequence Complexity [C]. Quang Tran, Nam Sy Vo, Eric Hicks. International Conference on Knowledge and Systems Engineering, 2020.
5. Constrained expectation-maximization (EM), dynamic analysis, linear quadratic tracking, and nonlinear constrained expectation-maximization (EM) for the analysis of genetic regulatory networks and signal transduction networks [D]. Xiong, Hao. 2008.
6. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization [O]. Daniel Newkirk, Jacob Biesinger, Alvin Chon.
7. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization [O]. Daniel Newkirk, Jacob Biesinger, Alvin Chon. 2011.