Source: Nucleic Acids Research

The whole alignment and nothing but the alignment: the problem of spurious alignment flanks.

Abstract

Pairwise sequence alignment is a ubiquitous tool for inferring the evolution and function of DNA, RNA and protein sequences. It is therefore essential to identify alignments arising by chance alone, i.e. spurious alignments. On the one hand, if an entire alignment is spurious, statistical techniques for identifying and eliminating it are well known. On the other hand, if only a part of the alignment is spurious, elimination is much more problematic. In practice, even the sizes and frequencies of spurious subalignments remain unknown. This article shows that some common scoring schemes tend to overextend alignments and generate spurious alignment flanks up to hundreds of base pairs/amino acids in length. For example, in the UCSC genome database, spurious flanks probably comprise >18% of the human-fugu genome alignment. To evaluate the possibility that chance alone generated a particular flank on a particular pairwise alignment, we provide a simple 'overalignment' P-value. The overalignment P-value can identify spurious alignment flanks, thereby eliminating potentially misleading inferences about evolution and function. Moreover, by explicitly demonstrating the tradeoff between over- and under-alignment, our methods guide the rational choice of scoring schemes for various alignment tasks.
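
To make the idea of overextension concrete, the following is a minimal simulation sketch. It is not the article's overalignment P-value, whose definition is not given in the abstract. It assumes a deliberately simplified model: ungapped extension, unrelated sequence with a fixed per-column match probability (0.25, as for uniformly random DNA), and a flank defined as the extension length at which the running score peaks. The function names, scoring values and parameters are all hypothetical.

```python
import random


def spurious_flank_length(match_score, mismatch_cost, match_prob=0.25,
                          max_len=500, rng=random):
    """Length of the best-scoring ungapped extension into unrelated sequence.

    Simplified model (an assumption, not the article's method): each column
    past the true alignment end is an independent match with probability
    match_prob. The flank is the extension length at which the running
    score reaches its maximum (0 means the alignment is not extended).
    """
    score = 0
    best_score = 0
    best_len = 0
    for length in range(1, max_len + 1):
        if rng.random() < match_prob:
            score += match_score
        else:
            score -= mismatch_cost
        if score > best_score:
            best_score = score
            best_len = length
    return best_len


def mean_flank(match_score, mismatch_cost, trials=500, seed=0):
    """Average spurious flank length over many simulated extensions."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = sum(
        spurious_flank_length(match_score, mismatch_cost, rng=rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical scoring schemes: (match score, mismatch cost).
    for scheme in [(1, 3), (1, 1), (2, 1)]:
        print(scheme, mean_flank(*scheme))
```

Under these assumptions, a scheme whose expected score per random column is only slightly negative, such as +2/-1, should yield longer chance flanks on average than a stricter scheme such as +1/-3. This is one way to picture the tradeoff between over- and under-alignment that the abstract mentions; real aligners also allow gaps and use drop-off cutoffs, which this sketch ignores.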
