PLoS Computational Biology

Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
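
The idea of allocating multi-reads as fractional counts can be illustrated with a small sketch. This is not the authors' published algorithm: it is a simplified illustration that assumes each read's candidate locations have already been reduced to coarse bin labels, and it reweights each multi-read in proportion to the support its candidate bins receive from all other reads. The function name `allocate_multireads`, the helper `_bin_counts`, and the parameters `n_iter` and `pseudocount` are hypothetical.

```python
from collections import defaultdict


def _bin_counts(weights):
    """Sum the fractional contributions of all reads for each bin."""
    counts = defaultdict(float)
    for w in weights.values():
        for b, frac in w.items():
            counts[b] += frac
    return counts


def allocate_multireads(alignments, n_iter=50, pseudocount=1e-6):
    """Split each read across its candidate bins as fractional counts.

    alignments: dict mapping read id -> list of candidate bin labels.
    Returns (counts, weights): per-bin fractional counts and the
    per-read allocation weights, which sum to 1 for every read.
    This is an illustrative scheme, not the method from the paper.
    """
    # Start from a uniform split over each read's distinct candidate bins.
    weights = {}
    for read, bins in alignments.items():
        unique = list(dict.fromkeys(bins))
        weights[read] = {b: 1.0 / len(unique) for b in unique}

    for _ in range(n_iter):
        counts = _bin_counts(weights)
        updated = {}
        for read, w in weights.items():
            if len(w) == 1:
                # Uni-reads always contribute a full count to their bin.
                updated[read] = w
                continue
            # Support for each candidate bin from all *other* reads;
            # the pseudocount keeps the weights defined when support is zero.
            support = {b: counts[b] - w[b] + pseudocount for b in w}
            total = sum(support.values())
            updated[read] = {b: s / total for b, s in support.items()}
        weights = updated

    return dict(_bin_counts(weights)), weights
```

For example, with three uni-reads in bin `A`, one uni-read in bin `B`, and one multi-read that maps to both, this sketch assigns roughly three quarters of the multi-read to `A` and one quarter to `B`, so no read is discarded and the total count still equals the number of reads.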