Journal of Bioinformatics and Computational Biology

INCORPORATING HOMOLOGUES INTO SEQUENCE EMBEDDINGS FOR PROTEIN ANALYSIS


Abstract

Statistical and learning techniques are becoming increasingly popular for a range of tasks in bioinformatics. Many of the most powerful of these techniques apply to points in a Euclidean space but not directly to discrete sequences such as protein sequences. One way to apply them to protein sequences is to embed the sequences into a Euclidean space and then operate on the embedded points. In this work we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. This embedding allows us to apply learning techniques directly to protein sequences. We apply the homology kernel in several ways. We demonstrate how it can be used for protein family classification, where it outperforms state-of-the-art methods for remote homology detection. We show that it can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
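
To make the general idea of embedding sequences into a Euclidean space concrete, here is a minimal sketch of a much simpler embedding than the one the abstract describes: a plain k-mer count (spectrum) vector. This is not the homology kernel, and it ignores local alignment, homologues, and predicted secondary structure entirely. The function names, the choice of k = 2, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_embedding(sequence, k=2):
    """Map a protein sequence to a vector of k-mer counts (length 20**k).

    Illustrative spectrum-style embedding only; NOT the homology kernel.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # One coordinate per possible k-mer, in a fixed lexicographic order.
    return [counts.get("".join(kmer), 0)
            for kmer in product(AMINO_ACIDS, repeat=k)]


def dot(u, v):
    """Inner product of two embedded points (a linear kernel value)."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Length-normalised inner product; 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Toy sequences, invented for this example.
a = kmer_embedding("MKVLAAGIVG")
b = kmer_embedding("MKVLSAGIVG")  # one substitution relative to the first
c = kmer_embedding("PPPPGGGGWW")  # shares no 2-mers with the first

print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Once sequences are fixed-length vectors like these, any method that operates on points or inner products (for example a support vector machine) can be applied to them. A biologically motivated embedding would replace `kmer_embedding` while leaving that downstream step unchanged.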
