Journal: Bioinformatics

Robust smooth segmentation approach for array CGH data analysis

Abstract

Motivation: Array comparative genomic hybridization (aCGH) provides a genome-wide technique to screen for copy number alteration. The existing segmentation approaches for analyzing aCGH data are based on modeling data as a series of discrete segments with unknown boundaries and unknown heights. Although the biological process of copy number alteration is discrete, in reality a variety of biological and experimental factors can cause the signal to deviate from a stepwise function. To take this into account, we propose a smooth segmentation (smoothseg) approach.

Methods: To achieve a robust segmentation, we use a doubly heavy-tailed random-effect model. The first heavy-tailed structure on the errors deals with outliers in the observations, and the second deals with possible jumps in the underlying pattern associated with different segments. We develop a fast and reliable computational procedure based on the iterative weighted least-squares algorithm with band-limited matrix inversion.

Results: Using simulated and real data sets, we demonstrate how smoothseg can aid in identification of regions with genomic alteration and in classification of samples. For the real data sets, smoothseg leads to smaller false discovery rate and classification error rate than the circular binary segmentation (CBS) algorithm. In a realistic simulation setting, smoothseg is better than wavelet smoothing and CBS in identification of regions with genomic alterations and better than CBS in classification of samples. For comparative analyses, we demonstrate that segmenting the t-statistics performs better than segmenting the data.
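The Methods paragraph can be illustrated with a small sketch. The abstract does not give the model details, so the following is not the authors' smoothseg implementation. It is a minimal illustration that assumes t-distributed (here Cauchy, df = 1) errors on both the observations and the first-order differences of the underlying signal, with fixed, hand-picked scale parameters. The function names (`solve_tridiagonal`, `t_weight`, `robust_smooth`) and all default values are hypothetical. With first-order differences the weighted least-squares system is tridiagonal, which stands in here for the band-limited solve mentioned in the abstract.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def t_weight(residual, scale, df):
    """Weight implied by a t-distribution: large residuals get small weights."""
    return (df + 1.0) / (df + (residual / scale) ** 2)


def robust_smooth(y, scale_obs=0.2, scale_jump=0.05, df=1.0, n_iter=50):
    """Iteratively reweighted smoother (illustrative assumptions only).

    Each pass minimizes
        sum_i w[i] * (y[i] - f[i])**2 / scale_obs**2
      + sum_j v[j] * (f[j+1] - f[j])**2 / scale_jump**2
    for fixed weights, then recomputes the weights from the new fit.
    Requires len(y) >= 2.
    """
    n = len(y)
    w = [1.0] * n          # observation weights (downweight outliers)
    v = [1.0] * (n - 1)    # difference weights (relax the penalty at jumps)
    f = list(y)
    for _ in range(n_iter):
        a = [wi / scale_obs ** 2 for wi in w]
        b = [vj / scale_jump ** 2 for vj in v]
        # Normal equations: (diag(a) + D' diag(b) D) f = diag(a) y,
        # where D takes first differences; the matrix is tridiagonal.
        diag = [
            a[i] + (b[i - 1] if i > 0 else 0.0) + (b[i] if i < n - 1 else 0.0)
            for i in range(n)
        ]
        lower = [0.0] + [-bj for bj in b]
        upper = [-bj for bj in b] + [0.0]
        rhs = [a[i] * y[i] for i in range(n)]
        f = solve_tridiagonal(lower, diag, upper, rhs)
        w = [t_weight(y[i] - f[i], scale_obs, df) for i in range(n)]
        v = [t_weight(f[j + 1] - f[j], scale_jump, df) for j in range(n - 1)]
    return f


if __name__ == "__main__":
    # Toy profile: one copy-number step plus a single outlying probe.
    y = [0.0] * 20 + [1.0] * 20
    y[5] = 5.0
    fit = robust_smooth(y)
    print([round(value, 2) for value in fit])
```

In this sketch the two weight vectors play the two roles described in the abstract: small observation weights keep an outlying probe from dragging the fit, and small difference weights let the fit change level abruptly where the data support a jump. A faithful implementation would also need the paper's choice of difference order and its procedure for estimating the scale parameters, neither of which is stated in the abstract.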
