A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization

Journal: Genetics

Abstract

Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These carry a considerable computational burden, which limits their applicability to large datasets such as those produced by next-generation sequencing techniques. To address this, nonmodel-based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and the number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that, in addition to producing results of the same quality as the other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and an accompanying manual are freely available at .
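To make the combination of linear kernel-PCA and least-squares optimization more concrete, the following is a minimal sketch and not the PSIKO implementation. It assumes a samples-by-SNPs genotype matrix, and it assumes that the founder populations are already known as points in principal-component space, which is a simplification: the abstract says PSIKO also infers the number of founders. The function names `linear_kernel_pca` and `admixture_least_squares`, and the final clip-and-renormalize step, are illustrative choices, not taken from the paper.

```python
import numpy as np


def linear_kernel_pca(genotypes, n_components):
    """Return sample scores on the top principal components.

    Uses the n x n linear kernel (Gram) matrix of the column-centered
    genotype matrix, so the eigendecomposition cost depends on the
    number of samples rather than the number of SNPs.
    """
    X = genotypes - genotypes.mean(axis=0)      # center each SNP column
    K = X @ X.T                                 # linear kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)    # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)     # shape: (n_samples, n_components)


def admixture_least_squares(scores, founders):
    """Estimate admixture coefficients by least squares.

    `founders` holds one row per assumed founder population, given in the
    same PC space as `scores`. Each sample is expressed as a combination
    of the founder points whose coefficients sum to one.
    """
    n_samples = scores.shape[0]
    n_founders = founders.shape[0]
    # Append a row of ones to encode the sum-to-one constraint.
    A = np.vstack([founders.T, np.ones(n_founders)])
    B = np.vstack([scores.T, np.ones(n_samples)])
    Q = np.linalg.lstsq(A, B, rcond=None)[0].T  # shape: (n_samples, n_founders)
    # Illustrative post-processing (an assumption): clip negative
    # coefficients and renormalize each row to sum to one.
    Q = np.clip(Q, 0.0, None)
    return Q / Q.sum(axis=1, keepdims=True)
```

With K assumed founders, projecting onto K - 1 components is enough for the coefficients to be determined, provided the founder points are affinely independent in that space.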