Journal: Gene: An International Journal Focusing on Gene Cloning and Gene Structure and Function

A shotgun approach to discovering and reconstructing consensus retrotransposons ex novo from dense contigs of short sequences derived from Genbank Genome Survey Sequence database records.

Abstract

Retrotransposons constitute the majority of pseudogenic protein-coding regions of most eukaryotic genomes. Most genomes carry tens to thousands of retrotransposon copies derived from dozens of distinct families, but most, if not all, of these copies are non-functional and contain disabling mutations, including large numbers of indels. Until recently, most regions rich in these elements were virtually ignored in all but the most complete genome sequencing projects, and the full extent of their impact on the structure and function of the genomes of higher eukaryotes was under-appreciated. Even when new retrotransposons are encountered and annotated by automated gene-finding programs and similarity searches, their coding regions are treated as exons and, not surprisingly, are invariably mistranslated because of numerous frameshift mutations and large indels. Contrary to what such in silico annotations imply, very few functional retrotransposons contain introns. While many repetitive DNA consensus sequences have been assembled from collections of largely full-length copies using full-length templates, we have shown that repetitive DNA consensus sequence contigs representing long, moderately high copy-number elements can also be generated ex novo, in the absence of templates, from very short overlapping sequences. We have devised an in silico strategy to recover and reconstruct consensus sequences of elements up to 20,000 bp by building dense contigs of hundreds of overlapping 400- to 900-bp records found in the Genbank Genome Survey Sequence database. The results are hypothetical ancestral sequences that encode elements that appear to be fully functional, with intact open reading frames and other conserved features.
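
The abstract gives no implementation details, so the following is only a toy sketch of the general idea: overlapping short reads are piled onto a growing contig, and the majority base in each column is taken as the consensus, so that mutations private to a single copy are outvoted. It is not the authors' pipeline. The function names (consensus, place, assemble), the thresholds, and the 10-bp toy reads are all assumptions made for illustration. The overlap search is ungapped, so unlike a real alignment-based assembler it cannot handle the indels the abstract mentions.

```python
from collections import Counter


def consensus(columns):
    """Return (leftmost position, majority-base string) for the pile-up."""
    lo, hi = min(columns), max(columns)
    seq = "".join(columns[p].most_common(1)[0][0] for p in range(lo, hi + 1))
    return lo, seq


def place(contig, read, min_overlap, max_mismatches):
    """Best ungapped offset of `read` relative to the contig start, or None."""
    best = None
    for off in range(min_overlap - len(read), len(contig) - min_overlap + 1):
        lo, hi = max(off, 0), min(off + len(read), len(contig))
        if hi - lo < min_overlap:
            continue
        mism = sum(contig[i] != read[i - off] for i in range(lo, hi))
        score = (hi - lo) - mism  # number of matching columns
        if mism <= max_mismatches and (best is None or score > best[0]):
            best = (score, off)
    return None if best is None else best[1]


def assemble(reads, min_overlap=6, max_mismatches=1):
    """Seed with the first read, then add every read that overlaps the contig.

    Hypothetical toy procedure: reads that never overlap are left out.
    """
    columns = {i: Counter(base) for i, base in enumerate(reads[0])}
    pending = list(reads[1:])
    progress = True
    while pending and progress:
        progress = False
        for read in list(pending):
            start, contig = consensus(columns)
            off = place(contig, read, min_overlap, max_mismatches)
            if off is None:
                continue
            for j, base in enumerate(read):
                columns.setdefault(start + off + j, Counter())[base] += 1
            pending.remove(read)
            progress = True
    return consensus(columns)[1]


# Toy data (invented): six 10-bp reads sampled from a 20-bp "ancestral"
# sequence; two of them carry a single copy-specific substitution.
ancestor = "ATGGCGTACCTGAAGTTCGA"
reads = [
    "ATGGCGTACC",  # positions 0-9
    "GCGTGCCTGA",  # positions 3-12, A>G substitution at position 7
    "TACCTGAAGT",  # positions 6-15
    "TGAAGTTCGA",  # positions 10-19
    "GGCGTACCTG",  # positions 2-11
    "CCTGACGTTC",  # positions 8-17, A>C substitution at position 13
]
print(assemble(reads) == ancestor)
```

With these toy reads, each mutated position is covered by at least two unmutated reads, so the majority vote recovers the starting sequence. Recovering elements of up to 20,000 bp from 400- to 900-bp records, as described above, would presumably require gapped alignment, much deeper coverage, and handling of multiple element families, none of which this sketch attempts.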
