Computational and Structural Biotechnology Journal

Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning


Abstract

Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict such genes with confidence from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed here a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; are enriched in reproductive tissues or are targets for small RNAs bound to the Argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including ) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.
