Conference: IEEE International Conference on Cloud Computing and Big Data Analysis

Mining potential proteins as drug targets using ensemble models



Abstract

Mining drug targets from databases is a challenging but significant problem that has received much attention from academia and industry. Proteins are a main source of drug targets, and modern measurement techniques have characterized many of their sequence properties; these properties provide a new perspective for detecting proteins that are potential drug targets. In our research, two typical ensemble models, Bagging and AdaBoost of decision trees, were applied to predict potential drug target proteins (DTPs). Because it is uncertain which proteins are truly non-drug targets, random sampling was used to construct the negative dataset. The experiments showed that both ensemble models were insensitive to feature selection and to the ratio of the training dataset. The average recall for proteins as drug targets remained high, over 90% in both models. In one round of parallel experiments, Bagging and AdaBoost predicted 257 potential drug target proteins in common. These results confirm the usefulness of sequence information and the applicability of data mining techniques. The retrieval of drug targets should accelerate as new data mining techniques continue to develop.
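The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, replaces the real sequence-derived features with synthetic Gaussian data, and every name, dataset size, and hyperparameter (`known_targets`, `unlabeled_pool`, `n_estimators=50`, and so on) is hypothetical. It shows random sampling of presumed negatives from a pool of unlabeled proteins, training Bagging and AdaBoost over decision trees, measuring recall on the drug-target class, and intersecting the candidates that both models flag.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_features = 10  # hypothetical number of sequence-derived features

# Synthetic stand-ins (assumption): known drug targets, plus an unlabeled
# pool that is mostly non-targets but hides some undiscovered targets.
known_targets = rng.normal(loc=1.0, size=(200, n_features))
unlabeled_pool = np.vstack([
    rng.normal(loc=-1.0, size=(1800, n_features)),
    rng.normal(loc=1.0, size=(200, n_features)),
])

# Randomly sample presumed negatives from the unlabeled pool.
neg_idx = rng.choice(len(unlabeled_pool), size=len(known_targets), replace=False)
X = np.vstack([known_targets, unlabeled_pool[neg_idx]])
y = np.concatenate([np.ones(len(known_targets), dtype=int),
                    np.zeros(len(neg_idx), dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Two ensembles of decision trees; hyperparameters are illustrative only.
models = {
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0),
}

# Pool entries that were not used as training negatives are scored as candidates.
remaining = np.setdiff1d(np.arange(len(unlabeled_pool)), neg_idx)

recalls = {}
candidate_sets = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Recall on the positive (drug target) class of the held-out split.
    recalls[name] = recall_score(y_test, model.predict(X_test), pos_label=1)
    predicted = model.predict(unlabeled_pool[remaining])
    candidate_sets[name] = set(remaining[predicted == 1].tolist())

# Candidates flagged by both models, analogous to the "in common" set.
common_candidates = candidate_sets["bagging"] & candidate_sets["adaboost"]

for name, value in recalls.items():
    print(f"{name}: recall on target class = {value:.3f}")
print(f"candidates predicted by both models: {len(common_candidates)}")
```

Because the negatives are drawn at random, each draw yields a slightly different training set; repeating the sketch with different seeds is one way to probe how stable the shared candidate set is.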
