Computational and Structural Biotechnology Journal

A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package

Abstract

Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
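
As a rough illustration of the availability score described above, the sketch below counts overlapping SCSs of a fixed length in a set of sequences and divides each observed count by an expected count. The abstract does not state how the expected frequency is computed, so this sketch assumes an independent-residue model (the product of single-residue frequencies); that model, the toy sequences, and the function names count_scs, availability_scores, and rank_frequency are all hypothetical and are not taken from the SCS Package.

```python
from collections import Counter
from math import prod


def count_scs(sequences, k):
    """Count every overlapping length-k window (SCS) across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def availability_scores(sequences, k):
    """Observed SCS count divided by the count expected by chance.

    Assumption: the expected count comes from an independent-residue
    model, i.e. the product of the single-residue frequencies times the
    total number of length-k windows. The source may define it differently.
    """
    residue_counts = Counter("".join(sequences))
    total_residues = sum(residue_counts.values())
    residue_freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    scs_counts = count_scs(sequences, k)
    total_windows = sum(scs_counts.values())

    scores = {}
    for scs, observed in scs_counts.items():
        expected = total_windows * prod(residue_freq[aa] for aa in scs)
        scores[scs] = observed / expected
    return scores


def rank_frequency(counts):
    """Return (rank, SCS, count) tuples, most frequent SCS first."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, scs, n) for rank, (scs, n) in enumerate(ordered, start=1)]


# Toy input only; a real analysis would use a non-redundant protein database.
toy_sequences = ["AAB", "ABB"]
scores = availability_scores(toy_sequences, k=2)
ranking = rank_frequency(count_scs(toy_sequences, k=2))
```

Under these assumptions, a score above 1 marks an SCS used more often than chance would predict, a score below 1 marks an under-used one, and an SCS that never occurs (here "BA") simply has no entry. The rank_frequency helper produces the kind of rank-frequency list on which proteins and English text are compared.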
