
LEAPING SEARCH ALGORITHM FOR SIMILAR SUB-SEQUENCES IN CHARACTER SEQUENCES AND APPLICATION THEREOF IN SEARCHING IN BIOLOGICAL SEQUENCE DATABASE


Abstract

Disclosed are a leaping search algorithm for similar sub-sequences in character sequences and its application to searching a biological sequence database. The algorithm comprises: S0, constructing an FMD index and a lookup table for the database; S1, fetching from the lookup table the bi-interval of a k-character sub-sequence (the k-seed) of the query sequence; S2, using a backward search algorithm to successively find matching regions to the left of the k-seed; S3, applying a forward search algorithm to an interval that was not shrunk in step S2, to find matching regions to the right of the k-seed; S4, checking whether the current detection position is the end of the query sequence, ending the algorithm if it is and otherwise proceeding to step S5; and S5, leaping forward w-k+1 positions from the current detection position and repeating steps S2 to S5. The proposed lookup table occupies little memory and offers high access efficiency. By combining the lookup table with the FMD index, all w seeds can be found quickly. The invention has also been successfully applied to biological sequence alignment.
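The control flow of steps S0 to S5 can be illustrated with a simplified sketch. It is not the patented implementation: the FMD index and bi-intervals are replaced by a plain dictionary from k-mers to database positions, and the backward and forward searches are replaced by direct character comparison. Reverse-complement matching, which an FMD index also supports, is omitted. The function names, the starting detection position of 0, and the reading of w as the minimum length of a reported match are assumptions made for illustration.

```python
from collections import defaultdict


def build_lookup(db, k):
    """S0 (simplified): map every k-mer of the database to its start positions.

    A plain dictionary stands in for the FMD index and lookup table here.
    """
    table = defaultdict(list)
    for i in range(len(db) - k + 1):
        table[db[i:i + k]].append(i)
    return table


def leaping_search(query, db, k, w):
    """Return sorted (query_start, db_start, length) exact matches of length >= w."""
    if not 0 < k <= w:
        raise ValueError("expected 0 < k <= w")
    table = build_lookup(db, k)
    step = w - k + 1              # S5: leap distance
    found = set()
    # S4/S5: sample one k-mer every `step` positions until the query ends.
    for pos in range(0, len(query) - k + 1, step):
        # S1: look up the k-mer seed at the current detection position.
        for hit in table.get(query[pos:pos + k], ()):
            q_lo, d_lo = pos, hit
            # S2 (simplified): extend the match to the left of the seed.
            while q_lo > 0 and d_lo > 0 and query[q_lo - 1] == db[d_lo - 1]:
                q_lo -= 1
                d_lo -= 1
            q_hi, d_hi = pos + k, hit + k
            # S3 (simplified): extend the match to the right of the seed.
            while q_hi < len(query) and d_hi < len(db) and query[q_hi] == db[d_hi]:
                q_hi += 1
                d_hi += 1
            if q_hi - q_lo >= w:
                found.add((q_lo, d_lo, q_hi - q_lo))
    return sorted(found)
```

The leap of w-k+1 is safe because any window of w characters contains w-k+1 consecutive k-mer start positions, so at least one of them coincides with a sampled position. For example, `leaping_search("GGCGTACGGG", "AAACGTACGTTT", 3, 6)` samples query positions 0 and 4 and reports the shared substring `CGTACG` as `(2, 3, 6)`.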
