Journal of Molecular Biology

Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring.



Abstract

Gene identification in genomic DNA from eukaryotes is complicated by the vast combinatorial possibilities of potential exon assemblies. If the gene encodes a protein that is closely related to known proteins, gene identification is aided by scoring the similarity of potential translation products to those target proteins. The genomic DNA and protein sequences can be aligned directly by scoring the implied residues of in-frame nucleotide triplets against the protein residues in the conventional way, while allowing for long gaps in the alignment corresponding to introns in the genomic DNA. We describe a novel method for such spliced alignment. The method derives an optimal alignment based on scores for both the sequence similarity of the predicted gene product to the protein sequence and the intrinsic splice site strength of the predicted introns. Application of the method to a representative set of 50 known genes from Arabidopsis thaliana showed a significant improvement in prediction accuracy compared with previous spliced alignment methods. The method is also more accurate than ab initio gene prediction methods, provided sufficiently close target proteins are available. In view of the fast growth of public sequence repositories, we argue that close targets will be available for the majority of novel genes, making spliced alignment an excellent practical tool for high-throughput automated genome annotation. Copyright 2000 Academic Press.
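The spliced-alignment idea summarized above (in-frame codons scored against protein residues, long intron gaps, and a splice-site term added to the similarity score) can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' method or code: the scores (MATCH, MISMATCH, GAP, INTRON_OPEN) and the functions donor_score and acceptor_score are hypothetical placeholders rather than the paper's splice site models, and introns are assumed to start with GT, end with AG, and fall between codons (phase 0). Two states are kept, "in exon" (E) and "inside intron" (D); because the intron cost is flat, a long intron costs no more than a short one.

```python
from typing import Dict, List, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = dict(zip(_CODONS, _AMINO))

NEG = float("-inf")

# Hypothetical toy parameters (assumptions, not the paper's values).
MATCH, MISMATCH, STOP = 4.0, -1.0, -10.0
GAP = -3.0          # cost of skipping one codon or one protein residue
INTRON_OPEN = -5.0  # flat intron cost, independent of length, so long introns stay cheap


def residue_score(codon: str, aa: str) -> float:
    """Toy score for a codon's implied residue vs. a protein residue (stand-in for e.g. BLOSUM62)."""
    implied = CODON_TABLE.get(codon, "X")
    if implied == "*":
        return STOP
    return MATCH if implied == aa else MISMATCH


def donor_score(g: str, k: int) -> float:
    """Hypothetical donor strength: +0.5 per match to the GTAAGT consensus after the required GT."""
    return 0.5 * sum(x == y for x, y in zip(g[k + 2:k + 6], "AAGT"))


def acceptor_score(g: str, a: int) -> float:
    """Hypothetical acceptor strength: +0.25 per pyrimidine in the 8 bases before the AG at a."""
    return 0.25 * sum(c in "CT" for c in g[max(0, a - 8):a])


def spliced_align(g: str, p: str) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align genomic DNA g with protein p, allowing GT...AG introns.

    Simplification: introns fall between codons (phase 0). Returns the best
    score and the predicted exons as 0-based half-open (start, end) intervals.
    """
    n, m = len(g), len(p)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # g[:i], p[:j] aligned, ending in an exon
    D = [[NEG] * (m + 1) for _ in range(n + 1)]  # g[:i], p[:j] aligned, ending inside an intron
    back: Dict[Tuple[str, int, int], Tuple[str, int, int]] = {}
    E[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Intron state: open at a GT donor occupying g[i-2:i], or extend by one base.
            if i >= 2 and g[i - 2:i] == "GT" and E[i - 2][j] > NEG:
                s = E[i - 2][j] + INTRON_OPEN + donor_score(g, i - 2)
                if s > D[i][j]:
                    D[i][j], back["D", i, j] = s, ("E", i - 2, j)
            if i >= 1 and D[i - 1][j] > D[i][j]:
                D[i][j], back["D", i, j] = D[i - 1][j], ("D", i - 1, j)
            # Exon state: codon vs. residue, residue gap, codon gap, or close an intron at an AG.
            cands = []
            if i >= 3 and j >= 1:
                s = E[i - 3][j - 1] + residue_score(g[i - 3:i], p[j - 1])
                cands.append((s, ("E", i - 3, j - 1)))
            if j >= 1:
                cands.append((E[i][j - 1] + GAP, ("E", i, j - 1)))
            if i >= 3:
                cands.append((E[i - 3][j] + GAP, ("E", i - 3, j)))
            if i >= 2 and g[i - 2:i] == "AG":
                cands.append((D[i - 2][j] + acceptor_score(g, i - 2), ("D", i - 2, j)))
            for s, prev in cands:
                if s > E[i][j]:
                    E[i][j], back["E", i, j] = s, prev
    # Trace back from the end, recording each intron as (donor start, acceptor end).
    introns, intron_end = [], n
    state, i, j = "E", n, m
    while (state, i, j) in back:
        prev = back[state, i, j]
        if state == "E" and prev[0] == "D":
            intron_end = i
        elif state == "D" and prev[0] == "E":
            introns.append((prev[1], intron_end))
        state, i, j = prev
    exons, start = [], 0
    for a, b in reversed(introns):
        exons.append((start, a))
        start = b
    exons.append((start, n))
    return E[n][m], exons
```

As a usage example, take a made-up gene g = "ATGGCTAAA" + "GTAAGTTTTTCCTTAG" + "TGGGAA" and the protein "MAKWE". The only GT...AG intron compatible with the reading frame spans positions 9 to 25, so spliced_align(g, "MAKWE") returns (19.0, [(0, 9), (25, 31)]): five codon matches (20.0), minus the intron cost (5.0), plus the toy donor (2.0) and acceptor (2.0) bonuses. Replacing the placeholder splice functions with a real site model is exactly where a differential splice site score would plug in.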
