Journal: Nucleic Acids Research

Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins

Abstract

Protein-DNA complexes play vital roles in many cellular processes through the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods show different levels of accuracy, which may depend on the choice of data sets used in training, the feature sets selected for developing a predictive model, the ability of the models to capture information useful for prediction, or a combination of these factors. In many cases, different methods are likely to produce similar results, whereas in others, the predictors may return contradictory predictions. In this situation, a priori estimates of prediction performance applicable to the system being investigated would help biologists choose the best method for designing their experiments. In this work, we have constructed unbiased, stringent and diverse data sets for DNA-binding proteins based on various biologically relevant considerations: (i) seven structural classes, (ii) 86 folds, (iii) 106 superfamilies, (iv) 194 families, (v) 15 binding motifs, (vi) single/double-stranded DNA, (vii) DNA conformation (A, B, Z, etc.), (viii) three functions and (ix) disordered regions. These data sets were culled to be non-redundant at sequence identities of 25% and 40% and used to evaluate the performance of 11 different methods for which online services or standalone programs are available. We observed that the best performing methods for each of the data sets showed significant biases toward the data sets selected for their benchmark. Our analysis revealed important data set features, which could be used to estimate these context-specific biases and hence suggest the best method to be used for a given problem. We have developed a web server, which considers these features on demand and displays the best method that the investigator should use. The web server is freely available at http://www.biotech.iitm.ac.in/DNA-protein/. Further, we have grouped the methods based on their complexity and analyzed their performance. The information gained in this work could be effectively used to select the best method for designing experiments.
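
The selection idea described in the abstract (score each method on each data set category, then recommend the top scorer for the category that matches the protein of interest) can be illustrated with a minimal sketch. This is not the authors' implementation or the web server's logic: the function name, the method names `method_A` and `method_B`, and all scores are hypothetical placeholders, and the abstract does not state which performance measure was used.

```python
def best_method_per_category(scores):
    """Return the highest-scoring method for each data set category.

    `scores` is an iterable of (category, method, score) tuples, where a
    higher score means better performance. The measure itself is an
    assumption; the abstract does not specify one.
    """
    best = {}
    for category, method, score in scores:
        # Keep the method only if it beats the current best for this category.
        if category not in best or score > best[category][1]:
            best[category] = (method, score)
    return {category: method for category, (method, _) in best.items()}


# Hypothetical placeholder values, NOT results from the paper.
example_scores = [
    ("single-stranded DNA", "method_A", 0.61),
    ("single-stranded DNA", "method_B", 0.58),
    ("double-stranded DNA", "method_A", 0.55),
    ("double-stranded DNA", "method_B", 0.67),
]

recommendation = best_method_per_category(example_scores)
print(recommendation)
```

With a table of this shape, a lookup such as `recommendation["double-stranded DNA"]` would give the suggested method for a protein known to bind double-stranded DNA; ties are resolved in favor of the method listed first.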
