Journal: Nucleic Acids Research

Microarray missing data imputation based on a set theoretic framework and biological knowledge


Abstract

Gene expression data measured with microarrays usually suffer from missing values, yet many data analysis methods require a complete data matrix. Existing missing value imputation algorithms handle missing values well but have limitations. For example, some perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take the biological characteristics of the data into consideration: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. Synchronization loss is common in cyclic systems, and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
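
The POCS idea described in the abstract (one convex set per piece of prior knowledge, then repeated projection until the estimate lies in the intersection) can be illustrated with a minimal sketch. The three sets used here are stand-ins chosen for simplicity and are assumptions, not the local-correlation, global-correlation, or synchronization-loss sets designed in the paper: agreement with the observed entries, rows lying in a known subspace, and a value range. All function names and the toy data are hypothetical.

```python
import numpy as np

def project_observed(x, y, observed):
    """Project onto {X : X equals y wherever `observed` is True} (an affine set)."""
    out = x.copy()
    out[observed] = y[observed]
    return out

def project_row_subspace(x, basis):
    """Project onto {X : every row of X lies in the span of the rows of `basis`}.

    `basis` must have orthonormal rows. This is a linear subspace, hence convex.
    """
    return x @ basis.T @ basis

def project_box(x, lo, hi):
    """Project onto {X : lo <= X[i, j] <= hi for all i, j} (a convex box)."""
    return np.clip(x, lo, hi)

def pocs_impute(y, observed, projections, n_iter=200, tol=1e-10):
    """Cycle through the projections until the estimate stops changing."""
    x = np.where(observed, y, 0.0)  # start with zeros in the missing entries
    for _ in range(n_iter):
        x_prev = x
        for project in projections:
            x = project(x)
        if np.linalg.norm(x - x_prev) <= tol:
            break
    return x

# Toy data (assumed for illustration): a rank-1 "expression" matrix whose rows
# are all multiples of one profile, with one entry hidden in each row.
profile = np.array([[1.0, 2.0, 3.0, 4.0]])
truth = np.outer([1.0, -2.0, 0.5], profile[0])
observed = np.ones_like(truth, dtype=bool)
observed[0, 1] = observed[1, 3] = observed[2, 0] = False
y = np.where(observed, truth, np.nan)

basis = profile / np.linalg.norm(profile)  # orthonormal basis of the row subspace
projections = [
    lambda x: project_row_subspace(x, basis),
    lambda x: project_box(x, -10.0, 10.0),
    lambda x: project_observed(x, y, observed),
]
imputed = pocs_impute(y, observed, projections)
```

In this toy case the intersection of the three sets contains a single matrix, so the iteration recovers the hidden entries. With real microarray data the sets would have to encode the prior knowledge described in the abstract, and the intersection may contain many solutions.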