Journal: Journal of Molecular Modeling

PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines



Abstract

Secondary structure prediction is crucial for understanding the variety of protein structures and the biological functions they perform. Predicting the secondary structure of a new protein from its amino acid sequence is of fundamental importance in bioinformatics. We propose a novel technique for predicting protein secondary structure based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach that uses multiclass support vector machines (SVMs) as classifiers for three structural conformations: helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. The confidence values for helix, sheet and coil obtained from the first-stage SVM are then used by the second-stage SVM for structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained on proteins from the RS126 dataset and tested on target proteins from the ninth Critical Assessment of protein Structure Prediction experiment (CASP9). On randomly selected CASP9 targets, PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96. The overall performance is found to be comparable with the current state of the art. The PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo/
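
The data flow of the two-stage cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The sliding window, its half-width, the zero padding at the chain ends, and the function names `window_features` and `two_stage_predict` are all assumptions that do not appear in the abstract. The two classifiers are passed in as plain callables that return three confidence values (helix, sheet, coil); in PSP_MCSVM these would be trained multiclass SVMs. The labels H, E and C are the usual shorthand for the three states and are likewise an assumption here.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical classifier type: maps a feature vector to three
# confidence values ordered as (helix, sheet, coil).
Classifier = Callable[[Vector], Sequence[float]]


def window_features(rows: Sequence[Sequence[float]], half_width: int) -> List[Vector]:
    """Concatenate each residue's row with its neighbours' rows.

    Positions that fall outside the chain are zero-padded
    (an assumption; the abstract does not describe end handling).
    """
    width = len(rows[0])
    pad = [0.0] * width
    features: List[Vector] = []
    for i in range(len(rows)):
        vec: Vector = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else pad)
        features.append(vec)
    return features


def two_stage_predict(
    per_residue: Sequence[Sequence[float]],
    stage1: Classifier,
    stage2: Classifier,
    half_width: int = 1,  # assumed window half-width, not from the abstract
) -> str:
    """Run a sequence-to-structure stage, then a structure-to-structure stage."""
    # Stage 1: per-residue input rows (e.g. PSSM scores plus
    # physicochemical properties) -> confidence for helix, sheet, coil.
    confidences = [list(stage1(x)) for x in window_features(per_residue, half_width)]
    # Stage 2: windows of stage-1 confidences -> refined confidences.
    refined = [list(stage2(x)) for x in window_features(confidences, half_width)]
    labels = "HEC"  # helix, sheet, coil
    # Pick the state with the highest refined confidence for each residue.
    return "".join(labels[max(range(3), key=scores.__getitem__)] for scores in refined)
```

Any object that returns per-class confidences can be plugged in as `stage1` or `stage2`, so the same skeleton works with stand-in functions for testing and with trained SVMs in practice. The brainstorming consensus step is not covered, since the abstract does not describe how it combines predictions.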
