Asia-Pacific Bioinformatics Conference

Gene-gene interaction filtering with ensemble of filters

Abstract

Background: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies offer the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical for reducing the search space before the computationally intensive step of identifying gene-gene interactions. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, which gives rise to unstable and suboptimal results. We observe, however, that the 'unstable' results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve filtering performance.

Results: We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated, based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in retaining gene-gene interactions, compared with many state-of-the-art algorithms as well as the traditional χ²-test and odds ratio methods.
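The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the aggregation rule or the exact ReliefF variant, so the code below assumes a simplified Relief-style scorer in which ties in neighbour distance are broken by sample order, and it assumes that runs are aggregated by averaging scores over randomly shuffled sample orders. The function names `relief_scores`, `ensemble_scores`, and `rank_snps`, and the parameters `k`, `n_runs`, and `seed`, are hypothetical.

```python
import random


def relief_scores(X, y, k=3):
    """Simplified Relief-style SNP scores (illustrative, not the published ReliefF).

    X is a list of genotype rows (e.g. coded 0/1/2), y a list of class labels.
    Distance ties are broken by sample order, so the scores can change when
    the rows are reordered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Hamming distance between sample i and every other sample.
        dist = {j: sum(a != b for a, b in zip(X[i], X[j])) for j in others}
        # sorted() is stable: equally distant samples keep their dataset order.
        ranked = sorted(others, key=dist.__getitem__)
        hits = [j for j in ranked if y[j] == y[i]][:k]
        misses = [j for j in ranked if y[j] != y[i]][:k]
        for f in range(p):
            for j in hits:    # same-class neighbour differs: penalise the SNP
                w[f] -= (X[i][f] != X[j][f]) / (n * k)
            for j in misses:  # other-class neighbour differs: reward the SNP
                w[f] += (X[i][f] != X[j][f]) / (n * k)
    return w


def ensemble_scores(X, y, n_runs=20, k=3, seed=0):
    """Average the scores of n_runs runs, each on a shuffled sample order.

    Mean aggregation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_runs):
        order = list(range(n))
        rng.shuffle(order)
        run = relief_scores([X[i] for i in order], [y[i] for i in order], k)
        total = [t + s for t, s in zip(total, run)]
    return [t / n_runs for t in total]


def rank_snps(scores):
    """Return SNP indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda f: -scores[f])


if __name__ == "__main__":
    # Toy data: SNP 0 equals the class label, SNPs 1 and 2 are uninformative.
    X = [[0, 0, 1], [0, 1, 0], [0, 2, 1], [0, 0, 0],
         [1, 0, 1], [1, 1, 0], [1, 2, 1], [1, 0, 0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(rank_snps(ensemble_scores(X, y, n_runs=5, k=2, seed=1)))
```

Comparing the ranking from a single `relief_scores` call on one sample order with the ranking from `ensemble_scores` is one way to see how averaging can dampen order-dependent variation; whether it does so on real SNP data depends on the dataset and on the actual aggregation rule used.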
