Journal of Cheminformatics

A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood

Abstract

The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition for assuring the reliability of new predictions. This implies that reliability must be determined across chemical space in order to localize “safe” and “unsafe” regions for prediction. We therefore devised an applicability domain technique that addresses the data locally instead of handling it as a whole: the reliability-density neighbourhood (RDN). The main novelty of this method is that it characterizes each training instance according to the density of its neighbourhood in the training set, as well as its individual bias and precision. When scanning through the chemical space (by iteratively increasing the applicability domain area), new test compounds were successively included in the applicability domain in an order that correlates strongly with their predictive performance. This allows local reliability to be mapped across different locations in the training set space, and thus allows regions where the model has low reliability to be identified. The method also showed matching profiles between two external sets, which indicates that it performs robustly with new data. A second novel aspect of this technique is that it is paired with a specific feature selection algorithm. The impact of the feature set was therefore studied: the top 20 features selected by ReliefF yielded the best results, as opposed to using the model’s features or the entire feature set, as is commonly done. As the third novel aspect, we propose a new scoring function to help evaluate the quality of an applicability domain profile (i.e., the curve of accuracy vs. the applicability domain measure in question). Overall, the RDN proved to be a promising method that can correctly sort new instances according to predictive performance. As a result, this technique can be presented to an end-user as proof of concept for the performance of a QSAR model on new data, thus promoting the user’s trust in the QSAR output.
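The abstract does not give the formulas behind the method, so the following is only a minimal sketch of the general idea of a locally defined applicability domain that is widened step by step. It is not the authors' RDN algorithm: the bias and precision terms and the ReliefF feature selection are left out, and the function names (`local_radii`, `entry_scale`, `accuracy_profile`), the choice of Euclidean distance, the neighbour count `k`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def local_radii(X_train, k=2):
    """Assumed density measure: mean distance from each training
    point to its k nearest training neighbours."""
    d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nearest = np.sort(d, axis=1)[:, :k]
    return np.maximum(nearest.mean(axis=1), 1e-12)  # guard against zero radius

def entry_scale(X_train, X_test, radii):
    """Smallest scale s at which each test point lies within
    s * radius of at least one training point."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return (d / radii[None, :]).min(axis=1)

def accuracy_profile(scales, correct, grid):
    """Accuracy of the test predictions that are inside the domain
    at each scale in grid (NaN when the domain is still empty)."""
    out = []
    for s in grid:
        inside = scales <= s
        out.append(correct[inside].mean() if inside.any() else np.nan)
    return np.array(out)

# Toy data (hypothetical): four training compounds on a unit square,
# one test compound near them and one far away.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_test = np.array([[0.5, 0.5], [3.0, 3.0]])
correct = np.array([True, False])            # assumed prediction outcomes

radii = local_radii(X_train, k=2)
scales = entry_scale(X_train, X_test, radii)
profile = accuracy_profile(scales, correct, grid=[1.0, 3.0])
```

In this toy case the nearby compound enters the domain at a smaller scale than the distant one, so the accuracy inside the domain drops as the domain is widened. That is the kind of accuracy vs. applicability domain measure curve the abstract refers to, although the scoring function proposed in the paper is not reproduced here.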