Source: U.S. National Institutes of Health literature > PLoS Clinical Trials

Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition



Abstract

Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating advances in DNA sequencing technology with computational approaches such as machine learning. Although machine learning techniques have been applied to microbiome data, their use remains rare, and user-friendly platforms to implement them are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of 0.636 and 0.676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of the 1709 bacterial taxa, indicator species analysis identified 285 as significant determinants of DOC concentration. Of the top 285 ranked features determined by the machine learning methods, a subset of 86 taxa is common to all feature selection techniques. Using this subset of features, prediction results for random permutations of the data set are at least as accurate as predictions made using the entire feature set.
Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.
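
The two evaluation steps the abstract describes, correlating predicted with actual DOC on held-out samples and keeping only the taxa that every method ranks near the top, can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names, the toy DOC values, and the OTU importance scores below are all hypothetical, and the study used the top 285 features where this example uses k=2.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def top_k(scores, k):
    """Return the set of the k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_features(score_tables, k):
    """Return the features that every method places in its top k."""
    return set.intersection(*(top_k(scores, k) for scores in score_tables))


# Hypothetical held-out DOC values and model predictions (toy numbers).
actual_doc = [1.0, 2.0, 3.0, 4.0]
predicted_doc = [1.1, 1.9, 3.2, 3.8]
r = pearson_r(actual_doc, predicted_doc)

# Hypothetical per-OTU importance scores from two different methods.
nn_scores = {"otu_a": 0.9, "otu_b": 0.5, "otu_c": 0.1, "otu_d": 0.7}
rf_scores = {"otu_a": 0.8, "otu_b": 0.2, "otu_c": 0.6, "otu_d": 0.9}
shared = consensus_features([nn_scores, rf_scores], k=2)
```

Taking the intersection rather than the union is the conservative choice: a taxon survives only if every ranking method agrees it is important, which matches the abstract's description of a subset common to all feature selection techniques.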
