Published in: Journal of Biomedical Informatics

Ameliorative missing value imputation for robust biological knowledge inference.


Abstract

Gene expression data is widely used in various post-genomic analyses. The data is often probed using microarrays because of their ability to measure the expression of thousands of genes simultaneously. The expression data, however, contains significant numbers of missing values, which can affect subsequent biological analysis. To minimize the impact of these missing values, several imputation algorithms have been proposed, including Collateral Missing Value Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), Local Least Square Impute (LLSImpute), and K-Nearest Neighbour (KNN). These algorithms, however, exploit either only the global or only the local correlation structure of the data, which can often lead to higher estimation errors. This paper presents an Ameliorative Missing Value Imputation (AMVI) technique that can exploit global/local and positive/negative correlations in a given dataset by automatically selecting the optimal number of predictor genes k using a wrapper non-parametric method based on Monte Carlo simulations. AMVI has the CMVE strategy at its core because CMVE, which exploits positive/negative correlations, has demonstrated improved performance compared with both low-variance methods such as BPCA and LLSImpute and high-variance methods such as KNN and ZeroImpute. The performance of AMVI is compared with CMVE, BPCA, LLSImpute, and KNN by randomly removing between 1% and 15% of the values in eight different ovarian cancer, breast cancer, and yeast datasets. Together with the standard NRMS error metric, the True Positive (TP) rate of significant gene selection, the biological significance of the selected genes, and statistical significance test results are presented to investigate the impact of missing values on subsequent biological analysis.
The enhanced performance of AMVI was demonstrated by its lower NRMS error, improved TP rate, the biological significance of the selected genes, and statistical significance test results when compared with the aforementioned imputation methods across all the datasets. The results show that AMVI adapted to the latent correlation structure of the data and proved to be an effective and robust approach compared with the trial-and-error methodology for selecting k. The results confirmed that AMVI can be successfully applied to accurately impute missing values prior to any microarray data analysis.
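The abstract does not spell out AMVI's internals, but the general idea of choosing k by Monte Carlo simulation instead of trial and error can be illustrated. The sketch below repeatedly masks randomly chosen known entries, re-imputes them for each candidate k, and keeps the k with the lowest mean NRMS error on the masked entries. It rests on several assumptions: plain KNN imputation (row neighbours by Euclidean distance, unweighted mean) stands in for the CMVE estimator at AMVI's core; NRMS is taken as the RMS error divided by the standard deviation of the true values, one common definition that may differ from the paper's; and the names `knn_impute`, `nrms_error`, and `select_k_monte_carlo` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def knn_impute(data, k):
    """Fill None entries with the mean of the k nearest genes (rows).

    Stand-in estimator (assumption): plain KNN, not the paper's CMVE.
    Neighbours must be observed at the missing column and at every column
    where the target gene is observed; distance is Euclidean over those columns.
    """
    n_cols = len(data[0])
    result = [row[:] for row in data]  # never modify the caller's matrix
    for i, row in enumerate(data):
        observed = [c for c in range(n_cols) if row[c] is not None]
        for j in range(n_cols):
            if row[j] is not None:
                continue
            dists = []
            for g, other in enumerate(data):
                if g == i or other[j] is None:
                    continue
                if any(other[c] is None for c in observed):
                    continue
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in observed))
                dists.append((d, other[j]))
            dists.sort(key=lambda t: t[0])
            nearest = dists[:k]
            # Fall back to zero (ZeroImpute-style) if no neighbour qualifies.
            result[i][j] = mean(v for _, v in nearest) if nearest else 0.0
    return result


def nrms_error(true_vals, est_vals):
    """RMS error normalised by the std of the true values (assumed definition)."""
    rms = math.sqrt(mean((e - t) ** 2 for t, e in zip(true_vals, est_vals)))
    sd = pstdev(true_vals)
    return rms / sd if sd > 0 else rms  # avoid division by zero on constant truth


def select_k_monte_carlo(data, candidate_ks, missing_rate=0.05, n_trials=20, seed=0):
    """Pick k with the lowest mean NRMS over random masking trials."""
    rng = random.Random(seed)
    known = [(i, j) for i, row in enumerate(data)
             for j, v in enumerate(row) if v is not None]
    n_mask = max(2, int(missing_rate * len(known)))
    scores = {k: [] for k in candidate_ks}
    for _ in range(n_trials):
        # The same mask is scored for every k, so candidates are compared fairly.
        masked = rng.sample(known, n_mask)
        trial = [row[:] for row in data]
        for i, j in masked:
            trial[i][j] = None
        truth = [data[i][j] for i, j in masked]
        for k in candidate_ks:
            imputed = knn_impute(trial, k)
            scores[k].append(nrms_error(truth, [imputed[i][j] for i, j in masked]))
    return min(candidate_ks, key=lambda k: mean(scores[k]))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic matrix: 3 latent expression profiles, 10 noisy genes each.
    profiles = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
    genes = [[v + rng.gauss(0, 0.1) for v in p] for p in profiles for _ in range(10)]
    print("selected k:", select_k_monte_carlo(genes, [1, 3, 5, 10, 20]))
```

On this clustered synthetic data, a very large k pulls in genes from unrelated profiles and inflates the error, which is the kind of trade-off the Monte Carlo wrapper is meant to resolve automatically.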
