Source: NIH literature collection, Scientific Reports

Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II



Abstract

Assessing hERG liability in the early stages of drug discovery programs is important. The recent increase in hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies constructed their datasets from only one database, limiting the predictive power and scope of the models. In this study, a hERG classification model was constructed using the largest dataset for hERG inhibition, built by integrating multiple databases. The integrated dataset consisted of more than 291,000 structurally diverse compounds derived from ChEMBL, GOSTAR, PubChem, and hERGCentral. The prediction model was built with a support vector machine (SVM), with descriptor selection based on the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) to optimize the descriptor set for maximum prediction performance with the minimal number of descriptors. The SVM classification model using 72 selected descriptors and ECFP_4 structural fingerprints achieved a kappa statistic of 0.733 and an accuracy of 0.984 on the test set, substantially outperforming current commercial applications for hERG prediction. Finally, the applicability domain of the prediction model was assessed based on the molecular similarity between the training set and test set compounds.
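The abstract reports both a kappa statistic and an accuracy. The sketch below illustrates why the two can differ when one class is rare. It is a minimal pure-Python version of Cohen's kappa, not the authors' code. The function name `cohen_kappa` and the toy labels are assumptions made for illustration and are not taken from the paper's data.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: this is the plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        # Only one label occurs in both sequences; kappa is undefined.
        raise ValueError("kappa is undefined when only one label is present")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy set: 95 non-inhibitors (0) and 5 inhibitors (1).
toy_true = [0] * 95 + [1] * 5
# A trivial classifier that labels every compound as a non-inhibitor.
toy_pred = [0] * 100

toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_kappa = cohen_kappa(toy_true, toy_pred)
print(f"accuracy={toy_accuracy:.3f} kappa={toy_kappa:.3f}")
```

In this toy case the trivial classifier has high accuracy, but its kappa is zero because the agreement is no better than chance. This is one reason a chance-corrected measure is informative alongside accuracy on imbalanced data.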
