
Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis


Abstract

Background. DNA amplifications and deletions characterize the cancer genome and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution in terms of genomic location. These data are then further analyzed to map aberrations and their boundaries and to identify biologically significant structures. Methods. We develop a statistical framework that enables the casting of several DNA copy-number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize $\varphi(I) = \sum_{i \in I} v_i / \sqrt{|I|}$ over all subintervals $I$ of the input vector. We present and prove a linear-time approximation scheme for this problem: a process with time complexity $O(n\varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where $\mathrm{Opt}$ is the actual optimum and $\alpha(\varepsilon) \to 1$ as $\varepsilon \to 0$. We further develop practical implementations that improve on the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm's performance. Examples. We benchmark our algorithms on synthetic as well as publicly available DNA copy-number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
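
To make the objective concrete, the following is a minimal sketch of the naive quadratic baseline that the abstract mentions, not the authors' approximation scheme. It scores every subinterval by the sum of its signals divided by the square root of its length, using prefix sums so that each score costs constant time. The function names `interval_score` and `best_interval_naive`, and the half-open `[start, end)` indexing, are assumptions made here for illustration.

```python
import math
from itertools import accumulate


def interval_score(prefix, start, end):
    """Score of the half-open interval [start, end): sum of signals / sqrt(length)."""
    return (prefix[end] - prefix[start]) / math.sqrt(end - start)


def best_interval_naive(signals):
    """Exhaustively score all subintervals; O(n^2) time, O(n) extra space.

    Returns (score, start, end) for the highest-scoring interval [start, end).
    """
    if not signals:
        raise ValueError("signals must be non-empty")
    # prefix[k] holds the sum of the first k signals, so any interval sum
    # is a difference of two prefix entries.
    prefix = [0.0, *accumulate(signals)]
    n = len(signals)
    best = (-math.inf, 0, 1)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = interval_score(prefix, start, end)
            if score > best[0]:
                best = (score, start, end)
    return best


if __name__ == "__main__":
    # Toy signal vector (made up for illustration, not data from the paper).
    print(best_interval_naive([1.0, -1.0, 2.0, 2.0, -1.0]))
```

Dividing by the square root of the length means a long run of moderately elevated signals can outscore a single high probe, which is why the search cannot simply pick the largest entry. The quadratic loop above is only practical for short vectors.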
