
Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping



Abstract

Motivation: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used.

Results: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error thresholds (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers's bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.

Availability and implementation: We provide an implementation of SHD in C with Intel SSE instructions.

Supplementary information: Supplementary data are available at Bioinformatics online.
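The filtering idea in the Results paragraph can be illustrated with a much-simplified sketch. This is not the authors' SHD implementation: it uses plain 64-bit words rather than SSE registers, it handles equal-length strings of at most 64 characters, and it omits any further refinement steps the full algorithm applies. The names `mismatch_mask`, `popcount64` and `shd_sketch_pass` are hypothetical. The sketch builds one mismatch bit-mask per shift in the range -e to +e, ANDs the masks together, and rejects the pair only when more than e read positions match under no shift at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit i of the result is 1 when read[i] differs from ref[i + shift],
   or when i + shift falls outside the reference. */
static uint64_t mismatch_mask(const char *read, const char *ref,
                              int len, int shift)
{
    uint64_t mask = 0;
    for (int i = 0; i < len; i++) {
        int j = i + shift;
        if (j < 0 || j >= len || read[i] != ref[j])
            mask |= (uint64_t)1 << i;
    }
    return mask;
}

/* Count the set bits by clearing the lowest set bit until none remain. */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical filter: returns 0 when the pair certainly needs more than
   e edits, and 1 when it should be passed on to full alignment. */
int shd_sketch_pass(const char *read, const char *ref, int e)
{
    size_t n = strlen(read);
    assert(n == strlen(ref) && n <= 64); /* limits of this sketch */
    assert(e >= 0);

    /* A bit survives the AND only if that read position mismatches
       under every shift from -e to +e. */
    uint64_t unmatched = ~(uint64_t)0;
    for (int s = -e; s <= e; s++)
        unmatched &= mismatch_mask(read, ref, (int)n, s);

    return popcount64(unmatched) <= e;
}
```

In any alignment with at most e edits, a matched read character sits within e positions of its reference partner, so a position that mismatches under every shift must itself be an edit. The sketch therefore never rejects a pair whose edit-distance is within the threshold. It can still pass pairs that exceed it: `shd_sketch_pass("ACGTACGT", "CGTACGTA", 1)` returns 1 although that pair needs two edits, and such pairs are left for the verification step to reject.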
