Journal of Mathematical Biology

Words in DNA sequences: some case studies based on their frequency statistics


Abstract

One of the critical requirements of data analysis involving large DNA sequences is an effective statistical summarization of those sequences. In this article, DNA sequences are analyzed on the basis of word frequencies. Our analysis focuses on detecting the structural signature of a genome reflected in its word frequencies, and on identifying phylogenetic relationships among species reflected in the variation of word distributions in their DNA sequences. We have carried out a statistical study of the complete genome of baker's yeast, of various ribosomal RNA sequences from different prokaryotic and eukaryotic organisms, and of the full genomes of some bacteriophages. Our exploratory analysis demonstrates the usefulness of DNA word frequencies in reducing the dimensionality of large sequences while retaining some of the structural information that may have biological significance. Some conceptual issues that arise in the course of our investigation are addressed, and a few interesting problems related to the statistics of DNA words are pointed out, with some indication of possible solutions. The work was partially motivated by the fact that sequence alignment and homology techniques, which are popular for comparing and analyzing relatively small DNA sequences of nearly equal size, are not applicable to data consisting of large sequences of widely varying size. Such sequences may contain segments with unknown or no biological function, so comparing them through functional homology is either impossible or extremely difficult. [References: 48]
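To make the idea of a word-frequency summary concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the article: the function names `word_frequencies` and `frequency_distance` are hypothetical, words are counted with overlap, and the Euclidean distance is only one possible way to compare two word distributions.

```python
from collections import Counter
from itertools import product
import math


def word_frequencies(seq, k):
    """Relative frequencies of all overlapping k-letter words over A, C, G, T.

    Hypothetical helper for illustration; the abstract does not specify
    how words are counted.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fix the full vocabulary so every sequence maps to a vector of length 4**k.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (for example N) are left out of the total.
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid k-letter words")
    return {w: counts[w] / total for w in words}


def frequency_distance(f, g):
    """Euclidean distance between two word-frequency vectors (an assumed choice)."""
    return math.sqrt(sum((f[w] - g[w]) ** 2 for w in f))


if __name__ == "__main__":
    a = word_frequencies("ACGTACGTACGT", 2)
    b = word_frequencies("AAAACCCCGGGGTTTT", 2)
    print(len(a), frequency_distance(a, b))
```

Whatever the length of the input, each sequence is reduced to a vector of 4^k numbers, which is what allows sequences of widely varying size to be compared without alignment.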
