TWO-PASS IMPUTATION ALGORITHM FOR MISSING VALUE ESTIMATION IN GENE EXPRESSION TIME SERIES

ELENA TSIPORKOVA; VESELKA BOEVA

Abstract

Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. The accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several different parameter settings. The experiments show that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular for datasets with a higher level of missing entries. The two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and appeared superior to it. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also offers a choice between three different initial rough imputation methods.
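
To make the general idea concrete, the following is a minimal Python sketch of DTW-based imputation. It is not the authors' two-pass DTWimpute algorithm, whose details are not given in the abstract. It only illustrates the ingredients the abstract names: a rough initial imputation, a DTW distance between profiles, and an estimate drawn from the closest candidate profiles. The function names, the mean-based rough imputation, the inverse-distance weighting, and the parameter k are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rough_impute(profile):
    """Assumed rough imputation: replace None by the mean of observed values."""
    observed = [v for v in profile if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in profile]


def impute_profile(target, candidates, k=2):
    """Estimate the missing entries (None) of `target` from the k complete
    candidate profiles closest to it in DTW distance (hypothetical scheme)."""
    filled = rough_impute(target)
    ranked = sorted(candidates, key=lambda c: dtw_distance(filled, c))[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (dtw_distance(filled, c) + 1e-9) for c in ranked]
    total = sum(weights)
    result = list(target)
    for t, value in enumerate(target):
        if value is None:
            result[t] = sum(w * c[t] for w, c in zip(weights, ranked)) / total
    return result
```

For example, `impute_profile([1.0, None, 3.0], [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]], k=1)` selects the first candidate and fills the gap with its value at the same time point. A two-pass scheme would presumably re-estimate using the values obtained in a first pass, but that step is not reproduced here because the abstract does not specify it.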

Bibliographic record

Source
Journal of Bioinformatics and Computational Biology | 2007, Issue 5 | 18 pages
Authors
ELENA TSIPORKOVA; VESELKA BOEVA
Original format: PDF
Language: English
Chinese Library Classification: Cell biology
Keywords
DTW distance; gene expression time series; missing value imputation
