NAR Genomics and Bioinformatics
An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients

Abstract

Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solving the genome interpretation puzzle, which aims to uncover the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems remain to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, owing to the complexity of the challenge, the relative scarcity of data, and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn’s disease (CD) patients that addresses many of the current methodological issues. First, we devise a rational, ML-friendly feature representation for WES data based on the concept of gene mutational burden, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on three CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding its limited complexity, the NN outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and by investigating the decision process leading to each prediction.
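
The two ideas named in the abstract, a gene-level mutational burden representation and a network with tied parameters and regularization, can be illustrated with a minimal sketch. The abstract does not specify the actual feature definition or architecture, so everything below is an assumption made for illustration: the input format, the gene and variant-type names, the function names `gene_burden`, `tied_forward` and `l2_penalty`, the `tanh` nonlinearity, and the use of an L2 penalty as the regularizer are all hypothetical and should not be read as the authors' method.

```python
import numpy as np

def gene_burden(variants, genes, variant_types):
    """Count variants per (gene, variant type) for one sample.

    variants: iterable of (gene, variant_type) pairs (hypothetical input format).
    Returns an array of shape (len(genes), len(variant_types)).
    Variants in genes or of types outside the given lists are ignored.
    """
    g_idx = {g: i for i, g in enumerate(genes)}
    t_idx = {t: j for j, t in enumerate(variant_types)}
    x = np.zeros((len(genes), len(variant_types)))
    for gene, vtype in variants:
        if gene in g_idx and vtype in t_idx:
            x[g_idx[gene], t_idx[vtype]] += 1.0
    return x

def tied_forward(x, w_shared, gene_w, bias):
    """Forward pass with tied parameters (assumed architecture).

    The same weight vector w_shared scores every gene's row, so each
    additional gene adds one entry to gene_w rather than a new set of
    variant-type weights.
    """
    gene_scores = np.tanh(x @ w_shared)        # shape (n_genes,)
    logit = gene_scores @ gene_w + bias        # scalar
    return 1.0 / (1.0 + np.exp(-logit))        # predicted probability of "case"

def l2_penalty(w_shared, gene_w, lam):
    """L2 regularization term, added to the training loss (assumed regularizer)."""
    return lam * (np.sum(w_shared ** 2) + np.sum(gene_w ** 2))

# Hypothetical placeholder names, not taken from the paper.
genes = ["GENE_A", "GENE_B", "GENE_C"]
variant_types = ["missense", "nonsense"]
sample = [("GENE_A", "missense"), ("GENE_A", "nonsense"),
          ("GENE_C", "missense"), ("GENE_X", "missense")]

x = gene_burden(sample, genes, variant_types)
rng = np.random.default_rng(0)
p = tied_forward(x, rng.normal(size=2), rng.normal(size=3), 0.0)
print(x)
print(float(p))
```

With this layout the model has one weight per variant type, one weight per gene and one bias, which is far fewer parameters than a dense layer over all gene-by-type features. That is the sense in which tying limits complexity in this sketch.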
