Words in DNA sequences: some case studies based on their frequency statistics

Basu S.; Burma DP.; Chaudhuri P.

Abstract

One of the critical requirements of data analysis involving large DNA sequences is an effective statistical summarization of those sequences. In this article, DNA sequences are analyzed based on word frequencies. Our analysis focuses on detecting the structural signature of a genome reflected in word frequencies and on identifying phylogenetic relationships among different species reflected in the variation of word distributions in their DNA sequences. We have carried out a statistical study of the complete genome of baker's yeast, of various ribosomal RNA sequences from different prokaryotic and eukaryotic organisms, and of the full genomes of some bacteriophages. Our exploratory analysis demonstrates the usefulness of DNA word frequencies in reducing the dimensionality of large sequences while retaining some of the structural information that can have biological significance. Some conceptual issues that arise in the course of our investigation are addressed. A few interesting problems related to the statistics of DNA words are pointed out, with some indication of their possible solutions. The work has been partially motivated by the fact that sequence alignment and homology techniques, which are popular for comparing and analyzing relatively small DNA sequences of nearly equal sizes, are not applicable to data consisting of large sequences with widely varying sizes. Such sequences may contain segments with unknown or no biological function, so their comparison through functional homology is either impossible or extremely difficult. [References: 48]
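To make the idea of summarizing a sequence by its word frequencies concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: the function names `word_frequencies` and `l1_distance`, the use of overlapping windows, and the choice to skip windows containing symbols other than A, C, G, T are all assumptions made for this example. The L1 distance is used because it appears among the paper's keywords.

```python
from collections import Counter
from itertools import product


def word_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-letter words over A, C, G, T.

    Hypothetical helper for illustration; the paper's exact counting
    scheme is not given in the abstract.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fix the vocabulary to all 4**k possible words so that vectors from
    # different sequences are directly comparable. Windows containing
    # other symbols (for example N) are ignored.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def l1_distance(f: dict[str, float], g: dict[str, float]) -> float:
    """Sum of absolute differences between two word-frequency vectors."""
    return sum(abs(f.get(w, 0.0) - g.get(w, 0.0)) for w in f.keys() | g.keys())


if __name__ == "__main__":
    a = word_frequencies("ACGTACGTAC", 2)
    b = word_frequencies("AAAAAAAAAA", 2)
    print(l1_distance(a, b))
```

Because every sequence is reduced to a vector of length 4**k regardless of its own length, sequences of widely varying sizes can be compared without alignment. A matrix of such pairwise distances could then be passed to a clustering routine of one's choice.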

Bibliographic details

Source
Journal of Mathematical Biology | 2003, Issue 6 | 25 pages
Authors
Basu S.; Burma DP.; Chaudhuri P.
Original format: PDF
Language: English
Chinese Library Classification: Biological sciences
Keywords
Average linkage clustering; Chernoff's faces; Dendrograms; DNA words; F-ranks of words; F-ratios of words; L1-distance; Phylogenetic relationships; Rank correlation; Single linkage clustering; Escherichia coli genome; Markov chain analysis; Nucleotide sequences; Poisson approximations; Compound Poisson; Patterns; Similarities; Phylogenies; Linguistics; Inference

Similar literature

1. Words in DNA sequences: some case studies based on their frequency statistics [J]. Basu S., Burma DP., Chaudhuri P. Journal of Mathematical Biology. 2003, Issue 6
2. Measurement of word frequencies in genomic DNA sequences based on partial alignment and fuzzy set [J]. Fumiya Shida, Satoshi Mizuta. Journal of Bioinformatics and Computational Biology. 2014, Issue 4
3. Study on the Optimization Design of the Subject Indexing Based on the Word-frequency Statistics [J]. Huafeng Xie, Fang Wu, Xuying Lu. Computer and Information Science. 2011, Issue 2
4. Similarity analysis of DNA sequences based on k-word [C]. Hu Yingxin, Qi Zhaohui, Zheng Lijuan. IEEE International Conference on Progress in Informatics and Computing. 2014
5. Predicting letter search time through words and nonwords: the roles of statistical frequency and lexical status in the word-superiority effect [D]. Dutch, Susan Elaine. 1980
6. WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences [O]. G Pesole, N Prunella, S Liuni, 1992
7. Segmenting DNA sequence into words based on statistical language model [O]. Wang Liang. 2012
