EURASIP Journal on Advances in Signal Processing

Novel methodologies for spectral classification of exon and intron sequences

Abstract

Digital processing of a nucleotide sequence requires mapping it to a numerical sequence, and the choice of nucleotide-to-numeric mapping affects how well the sequence's biological properties are preserved and reflected when moving from the nucleotide domain to the numerical domain. Digital spectral analysis of nucleotide sequences reveals a period-3 power spectral value that is more prominent in exon sequences than in intron sequences. The success of period-3-based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are (1) to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them with existing codes to determine an appropriate representation, and (2) to introduce novel thresholding methods for more accurate period-3-based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers attractive performance. A windowed 1-sequence numerical representation (with window lengths of 9, 15, and 24 bases) offers a possible speed gain over the non-windowed 4-sequence Voss representation, and this gain increases as sequence length increases. A winner threshold value (chosen as the best among two defined threshold values and one other threshold value) offers the top precision for classifying an unknown sequence of a specified fixed length. An interpolated winner threshold value, applicable to an unknown sequence of arbitrary length, can be estimated from the winner threshold values of fixed-length sequences with comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences that better reveals their embedded properties, and it has potential applications in improved genome annotation.
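The period-3 test described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the article's implementation. It uses the 4-sequence Voss indicator mapping and, as an example 1-sequence code, the EIIP (electron-ion interaction potential) values, which are an existing mapping; the article's own codes, such as K-Quaternary Code I, are not reproduced here. The spectral measure (power at k = N/3 divided by the mean power over the nonzero frequencies), the helper names, and the `threshold` argument are all assumptions made for illustration; the abstract does not give the article's winner threshold values.

```python
import numpy as np

# Illustrative 1-sequence code: EIIP values (an existing mapping, used here only
# as an example; NOT the article's K-Quaternary Code I).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject anything outside A, C, G, T."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string over A, C, G, T")
    return seq


def voss(seq: str) -> np.ndarray:
    """4-sequence Voss representation: one 0/1 indicator row per base (A, C, G, T)."""
    seq = _clean(seq)
    return np.array([[1.0 if b == n else 0.0 for b in seq] for n in "ACGT"])


def one_sequence(seq: str, code: dict = EIIP) -> np.ndarray:
    """1-sequence representation: each base is replaced by a single number."""
    return np.array([code[b] for b in _clean(seq)])


def period3_ratio(x: np.ndarray) -> float:
    """Assumed measure: power at k = N/3 over the mean power for k = 1..N-1.

    x has shape (N,) for a 1-sequence code or (4, N) for Voss; each row is
    transformed separately and the row power spectra are summed.
    """
    x = np.atleast_2d(x)
    n = x.shape[1]
    if n < 3:
        raise ValueError("need at least 3 bases")
    power = (np.abs(np.fft.fft(x, axis=1)) ** 2).sum(axis=0)
    k3 = round(n / 3)  # exact bin when n is a multiple of 3
    mean_power = power[1:].mean()  # DC bin excluded from the average
    if mean_power == 0:
        return 0.0
    return float(power[k3] / mean_power)


def classify(seq: str, threshold: float, mapping: str = "voss") -> str:
    """Label a sequence 'exon' if its period-3 ratio exceeds a (hypothetical) threshold."""
    x = voss(seq) if mapping == "voss" else one_sequence(seq)
    return "exon" if period3_ratio(x) > threshold else "intron"


# Usage: a strongly period-3 toy sequence versus a random one.
HYPOTHETICAL_THRESHOLD = 4.0  # placeholder only, not a value from the article
rng = np.random.default_rng(0)
periodic = "ATG" * 100
random_seq = "".join(rng.choice(list("ACGT"), size=300))
print(period3_ratio(voss(periodic)), period3_ratio(voss(random_seq)))
print(classify(periodic, HYPOTHETICAL_THRESHOLD), classify(random_seq, HYPOTHETICAL_THRESHOLD))
```

The same `period3_ratio` accepts either representation, which makes the cost difference visible: the Voss mapping needs four FFTs per sequence, while a 1-sequence code needs one. Choosing the threshold is the part the article studies in depth; the fixed placeholder above only shows where that value plugs in.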