Proceedings of the National Academy of Sciences of the United States of America

Enhanced protein domain discovery by using language modeling techniques from speech recognition



Abstract

Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.
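The idea of using context to rescue weak domain hits can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` type, the `best_chain` function, the additive scoring scheme, the fixed acceptance penalty, and every coordinate and score below are assumptions made up for illustration. Each candidate hit carries a per-domain HMM score, a context function adds a bonus when a domain tends to follow the preceding one, and a dynamic program picks the best-scoring chain of non-overlapping hits.

```python
from typing import Callable, List, NamedTuple, Optional


class Hit(NamedTuple):
    """A candidate domain hit (hypothetical structure, not a real library type)."""
    domain: str   # domain family name
    start: int    # first residue, inclusive
    end: int      # last residue, inclusive
    score: float  # per-domain HMM score; values used below are invented


def best_chain(hits: List[Hit],
               context: Callable[[str, str], float],
               penalty: float) -> List[Hit]:
    """Return the non-overlapping chain of hits with the highest total score.

    Each accepted hit adds: its HMM score, plus context(previous, current),
    minus a fixed acceptance penalty. The empty chain scores 0.
    """
    ordered = sorted(hits, key=lambda h: h.end)
    totals: List[float] = []          # best total for a chain ending at hit i
    prev: List[Optional[int]] = []    # predecessor index in that chain
    for i, h in enumerate(ordered):
        # Option 1: this hit starts the chain.
        best_total = h.score + context("<start>", h.domain) - penalty
        best_prev = None
        # Option 2: extend a chain ending at an earlier, non-overlapping hit.
        for j in range(i):
            p = ordered[j]
            if p.end < h.start:
                total = totals[j] + h.score + context(p.domain, h.domain) - penalty
                if total > best_total:
                    best_total, best_prev = total, j
        totals.append(best_total)
        prev.append(best_prev)
    if not totals or max(totals) <= 0:
        return []
    # Trace back from the best-scoring chain end.
    i = max(range(len(totals)), key=totals.__getitem__)
    chain = []
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Invented example: a strong hit followed by a weak one.
hits = [Hit("Homeobox", 39, 95, 40.0), Hit("Tf_Otx", 164, 250, 6.0)]
bonus = {("Homeobox", "Tf_Otx"): 7.0}  # assumed co-occurrence bonus

without = best_chain(hits, lambda a, b: 0.0, penalty=10.0)
with_ctx = best_chain(hits, lambda a, b: bonus.get((a, b), 0.0), penalty=10.0)
print([h.domain for h in without])   # ['Homeobox']
print([h.domain for h in with_ctx])  # ['Homeobox', 'Tf_Otx']
```

In this toy setting the weak hit fails the penalty on its own score, but is accepted once the context bonus for following the strong hit is added. That is the general effect the abstract attributes to language modeling.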
