Journal: mSystems

Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection

Abstract

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few explicitly model the relative nature of these data and instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves the interpretability, without sacrificing classifier performance.
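
To make the idea of a balance concrete, the following minimal Python sketch computes the standard balance (a normalized log contrast of geometric means) for a pair and for a trio of parts. It is an illustration only, not the authors' implementation of discriminative balance analysis: the function name `balance`, the toy counts, and the choice of which parts go in the numerator and denominator are all hypothetical, and the sketch assumes every value is strictly positive (zeros would have to be handled beforehand).

```python
import math


def balance(sample, numerator, denominator):
    """Balance between two groups of parts in one sample.

    Computes sqrt(r*s/(r+s)) * ln(gm(numerator parts) / gm(denominator parts)),
    where gm is the geometric mean and r, s are the group sizes.
    Assumes all selected values are strictly positive (hypothetical helper).
    """
    r, s = len(numerator), len(denominator)
    # The mean of the logs equals the log of the geometric mean.
    log_gm_num = sum(math.log(sample[i]) for i in numerator) / r
    log_gm_den = sum(math.log(sample[i]) for i in denominator) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


# Toy abundances for four taxa in one sample (made-up numbers).
counts = [120.0, 30.0, 50.0, 800.0]
# The same sample rescaled to proportions that sum to 1.
proportions = [c / sum(counts) for c in counts]

# A pair: taxon 0 against taxon 1.
pair_raw = balance(counts, [0], [1])
pair_rel = balance(proportions, [0], [1])

# A trio: taxa 0 and 1 together against taxon 2.
trio_raw = balance(counts, [0, 1], [2])
trio_rel = balance(proportions, [0, 1], [2])

print(pair_raw, pair_rel)
print(trio_raw, trio_rel)
```

Because a balance depends only on ratios between parts, the raw counts and the rescaled proportions give the same value (up to floating-point error). This is the sense in which a balance-based feature does not require a separate normalization step.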
