Journal: PLoS Computational Biology

A novel artificial intelligence-based approach for identification of deoxynucleotide aptamers


Abstract

The selection of a DNA aptamer through the Systematic Evolution of Ligands by EXponential enrichment (SELEX) method involves multiple binding steps, in which a target and a library of randomized DNA sequences are mixed for selection of a single, nucleotide-specific molecule. Usually, 10 to 20 steps are required for SELEX to be completed. Throughout this process it is necessary to discriminate between true DNA aptamers and unspecified DNA-binding sequences. Thus, a novel machine learning-based approach was developed to support and simplify the early steps of the SELEX process by helping to discriminate DNA aptamers from those unspecified DNA-binding sequences. An Artificial Intelligence (AI) approach to identify aptamers was implemented based on Natural Language Processing (NLP) and Machine Learning (ML). An NLP method (CountVectorizer) was used to extract information from the nucleotide sequences. Four ML algorithms (Logistic Regression, Decision Tree, Gaussian Naïve Bayes, and Support Vector Machines) were trained using data from the NLP method along with sequence information. The best performing model was Support Vector Machines because it had the best ability to discriminate between positive and negative classes. For this model, an Accuracy (A) of 0.995 and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.998 were observed, where Accuracy is the fraction of samples that the model correctly classified and AUROC is the degree to which a model is capable of distinguishing between classes. The developed AI approach is useful for identifying potential DNA aptamers and so reducing the number of rounds in a SELEX selection. This new approach could be applied in the design of DNA libraries and result in a more efficient and faster process for choosing DNA aptamers during SELEX.
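The feature extraction and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract names CountVectorizer but does not say how sequences were tokenized, so the overlapping 3-mer "words" used here are an assumption, as are all function names and the toy inputs. The sketch uses only the Python standard library as a stand-in for a count-based vectorizer, and it computes Accuracy and AUROC in the sense the abstract defines them.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a DNA sequence (a bag of 'DNA words').

    k=3 is an assumed word length; the abstract does not specify one.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def vectorize(sequences, k=3):
    """Map each sequence to a count vector over all 4**k possible k-mers."""
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for s in sequences:
        counts = kmer_counts(s, k)
        rows.append([counts.get(word, 0) for word in vocabulary])
    return vocabulary, rows


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    vocab, X = vectorize(["ACGTACG", "TTTTGGGG"])
    print(len(vocab), sum(X[0]), sum(X[1]))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The count vectors produced by `vectorize` are the kind of input a classifier such as a support vector machine would be trained on. The pairwise form of `auroc` is quadratic in the number of samples, which is fine for a small illustration but not for large libraries.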
