Objective To evaluate the utility of a K-nearest neighbor (KNN) model for predicting the subcellular localization of proteins containing fibronectin (FN) domains. Methods Eighty FN domain-containing proteins (40 extracellular and 40 intracellular) were selected. Their amino acid compositions served as input vectors, which were divided into training and test groups, and KNN was used to predict each protein's subcellular location. Prediction quality was assessed with two standard statistical tests: the jackknife test and 5-fold cross-validation. Results Under the jackknife test, the prediction accuracy was 88.75% (71/80): 36 intracellular and 35 extracellular proteins were assigned to the correct subcellular location. Under 5-fold cross-validation, the accuracy was 82.5% (66/80): 34 intracellular and 32 extracellular proteins were assigned correctly. Conclusion The KNN method can predict the subcellular location of FN domain-containing proteins with good accuracy.
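The pipeline summarized in the abstract (amino acid composition vectors, KNN classification, jackknife and 5-fold validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the value of k, the distance metric, or how folds were assigned, so k=1, Euclidean distance, and round-robin fold assignment are assumptions here, and the toy sequences are hypothetical placeholders rather than the 80 proteins of the study.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training vectors.

    Euclidean distance is an assumption; the abstract does not name the metric.
    Ties in the vote go to the label of the nearest tied neighbor.
    """
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, folds, k=1):
    """Hold out each fold in turn and return the fraction predicted correctly.

    Sample i goes to fold i % folds (an assumed assignment scheme).
    With folds == len(X) this is the jackknife (leave-one-out) test.
    """
    n = len(X)
    correct = 0
    for f in range(folds):
        train_X = [X[i] for i in range(n) if i % folds != f]
        train_y = [y[i] for i in range(n) if i % folds != f]
        for i in range(n):
            if i % folds == f and knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n

# Hypothetical toy sequences, for illustration only (not data from the study).
toy = [
    ("CCGCCSCC", "extracellular"),
    ("CGCCTCCC", "extracellular"),
    ("CCCCGSCT", "extracellular"),
    ("KKEKKLEK", "intracellular"),
    ("EKKLKEKK", "intracellular"),
    ("KEKKEKLK", "intracellular"),
]
X = [composition(seq) for seq, _ in toy]
y = [label for _, label in toy]

jackknife_acc = cross_val_accuracy(X, y, folds=len(X))  # leave-one-out
five_fold_acc = cross_val_accuracy(X, y, folds=5)

# The accuracies reported in the abstract follow from the per-class counts.
reported_jackknife = (36 + 35) / 80  # 0.8875
reported_five_fold = (34 + 32) / 80  # 0.825
```

Treating the jackknife as cross-validation with one sample per fold keeps both tests in a single function; on a real data set the fold assignment would normally be shuffled and stratified by class.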