Nucleic Acids Research

Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering

Abstract

The flood of evolutionary markers generated by the Human Genome Project and the 1000 Genomes Project Consortium has made it necessary to prune redundant and dependent variables. Various computational tools based on machine-learning and data-mining methods, such as feature selection and feature extraction, have been proposed to escape the curse of dimensionality in large datasets. Evolutionary studies, however, which rely primarily on sequentially evolved variations, have so far not benefited from these advances. Here, we present a novel approach of recursive feature selection for hierarchical clustering of Y-chromosomal SNPs/haplogroups that selects a minimal set of independent markers sufficient to infer population structure as precisely as a much larger number of evolutionary markers does. To validate the applicability of our approach, we designed an optimized MALDI-TOF mass spectrometry-based multiplex that accommodates the independent Y-chromosomal markers in a single multiplex, and genotyped two geographically distinct Indian populations. An analysis of 105 worldwide populations showed that 15 independent variations/markers were optimal for defining population structure parameters such as FST, molecular variance and correlation-based relationships. Adding further, randomly selected markers had a negligible effect on these parameters (close to zero, i.e. 1 × 10⁻³). The approach proves efficient for tracing complex population structures and deriving relationships among worldwide populations in a cost-effective and expedient manner.
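FST is one of the population-structure parameters the abstract uses to judge whether a reduced marker set is sufficient. As a minimal sketch, assuming haplogroup frequencies have already been tabulated per population and all populations are weighted equally, Nei's G_ST (a common FST analogue) can be computed as below. This is not necessarily the estimator used in the paper (which may, for example, derive FST from an AMOVA), and the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Gene (haplogroup) diversity h = 1 - sum(p_i^2) for one frequency vector."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T across populations.

    Assumption: every population is weighted equally and each row lists
    frequencies of the same haplogroups in the same order.
    """
    k = len(pop_freqs)
    n_hap = len(pop_freqs[0])
    # H_S: mean within-population diversity
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # H_T: diversity of the pooled (averaged) haplogroup frequencies
    pooled = [sum(f[i] for f in pop_freqs) / k for i in range(n_hap)]
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Toy data: two populations, three haplogroups
    fine = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    # Same data after dropping the marker that separates haplogroups 1 and 2
    coarse = [[1.0, 0.0], [0.5, 0.5]]
    print(round(nei_gst(fine), 4))    # 0.2
    print(round(nei_gst(coarse), 4))  # 0.3333
```

In this toy example, removing one marker (merging two haplogroups) moves the estimate from 0.2 to about 0.33, which illustrates why marker resolution has to be checked against such parameters before a reduced panel is trusted.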
