Novel structure similarity-based methods for identifying drug-like compounds.

Abstract

The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid, or they consider only global similarities between structures. This study introduces two new alternative search approaches that overcome most of these limitations. First, the maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds. A new backtracking algorithm for MCS is proposed here and compared to global similarity measurements. Our algorithm provides high flexibility in the matching process and is very efficient in identifying local structural similarities. To apply the MCS-based similarity measure in predictive models of the biological activity of compounds, the concept of basis compounds is introduced so that researchers can easily combine MCS-based and traditional similarity measures with modern machine learning techniques. Our experiments on real compound datasets demonstrate that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, we propose an SVM-based algorithm for predicting the biological activities of chemical compounds with high specificity and sensitivity.

In similarity search and clustering applications on very large compound sets, most methods are limited in efficiency and scalability and cannot handle today's large compound datasets with several million entries. This is particularly true for MCS-based methods, whose computational complexity renders them infeasible for large compound datasets. The second main topic of this study addresses this performance issue by introducing a new method for greatly accelerating similarity search and clustering of very large compound sets using embedding and indexing techniques. The method, which can be used with MCS-based as well as traditional similarity measures, embeds compounds in a high-dimensional Euclidean space and searches this space using an efficient index-aware nearest neighbor search method based on Locality Sensitive Hashing. The method can also be used to accelerate cluster analysis of large compound sets. When applied to similarity search in compound datasets as large as PubChem, we found that the method was 40-200 times faster than sequential search methods, while maintaining comparable recall rates. It also made MCS-based similarity search tractable for large compound datasets. When applied to the clustering of such compound datasets, it helped to reduce the computation time from several months to only a few days.
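As a rough illustration of the indexing idea described in the abstract, the sketch below builds a small Locality Sensitive Hashing index over points in Euclidean space using random Gaussian projections. It is not the thesis's implementation: the class name `EuclideanLSH`, its parameters (`num_tables`, `hashes_per_table`, `bucket_width`), and the random vectors standing in for embedded compounds are all assumptions made for this example, and the step that embeds compounds into coordinates is taken as already done.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy LSH index for Euclidean points (hypothetical, for illustration only)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.points = []
        # Each table owns several (Gaussian direction, random offset) hash functions.
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, point):
        # Project onto each direction, shift, and quantize into a bucket id.
        return tuple(
            math.floor(
                (sum(a * x for a, x in zip(direction, point)) + offset)
                / self.bucket_width
            )
            for direction, offset in funcs
        )

    def add(self, point):
        idx = len(self.points)
        self.points.append(point)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, point)].append(idx)
        return idx

    def query(self, point, k=1):
        # Collect candidates that share a bucket with the query in any table,
        # then rank only those candidates by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, point), ()))
        ranked = sorted(candidates, key=lambda i: math.dist(self.points[i], point))
        return ranked[:k]


# Random vectors stand in for compounds already embedded in Euclidean space.
rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
index = EuclideanLSH(dim=16)
for p in points:
    index.add(p)
hits = index.query(points[5], k=3)
```

Only points that collide with the query in at least one table are compared exactly, which is where the saving over a sequential scan would come from. A larger `bucket_width` or more tables tends to raise recall at the cost of more candidates to rank.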

Bibliographic record

  • Author Cao, Yi Qun
  • Author affiliation University of California, Riverside
  • Degree-granting institution University of California, Riverside
  • Discipline Computer Science
  • Degree Ph.D.
  • Year 2010
  • Pages 154 p.
  • Total pages 154
  • Original format PDF
  • Language English
