Conference: IEEE International Conference on Bioinformatics and Bioengineering

An Efficient Algorithm for Identifying Genomic Structural Inversion with Wide-spectrum of Length

Abstract

Genomic structural inversion is a class of structural variation that has been widely associated with a range of complex traits and diseases. Accurately identifying inversions from high-throughput sequencing data is of great significance for both research and clinical practice. However, detecting inversions is a challenging computational problem. Existing approaches are either limited to detecting inversions within specific length intervals or require a significant coverage distribution across the candidate interval. In this paper, we propose a novel detection algorithm that accurately identifies inversions across a wide spectrum of lengths. The proposed algorithm consists of two components: a clustering step and a segmentation and extension step. It first clusters the paired-end reads to narrow down the candidate intervals. It then uses a contig assembly strategy to reconstruct the candidate intervals, while a segmentation and extension strategy is applied. For each candidate interval, a feature vector is calculated from the characteristic values. Finally, the algorithm combines the comparison and verification results to filter out potential false positives, and returns the inversion breakpoints at base-pair resolution. We conduct a series of simulation experiments to verify the performance of the proposed algorithm and compare it with two very popular approaches, DELLY and Pindel. The results demonstrate that the proposed approach performs better on inversions across a wide spectrum of lengths, especially when inversions of short-to-medium length are present.
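The clustering step is only outlined in the abstract, so the following is a minimal sketch of the general idea rather than the authors' method. It assumes a standard forward-reverse paired-end library, in which read pairs spanning an inversion breakpoint tend to map with both mates on the same strand. The names `ReadPair`, `cluster_candidates`, `max_gap`, and `min_support` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """A mapped read pair (hypothetical, simplified record)."""
    left_pos: int      # mapped position of the leftmost mate
    right_pos: int     # mapped position of the rightmost mate
    left_strand: str   # '+' or '-'
    right_strand: str  # '+' or '-'


def is_inversion_signal(pair: ReadPair) -> bool:
    # Assumption: in a forward-reverse library, concordant pairs map as +/-.
    # Pairs with both mates on the same strand hint at an inversion breakpoint.
    return pair.left_strand == pair.right_strand


def cluster_candidates(
    pairs: List[ReadPair], max_gap: int = 500, min_support: int = 2
) -> List[Tuple[int, int, int]]:
    """Greedily group same-strand pairs whose left positions are close.

    Returns (start, end, support) for each cluster with enough support.
    max_gap and min_support are illustrative thresholds, not values
    taken from the paper.
    """
    signals = sorted(
        (p for p in pairs if is_inversion_signal(p)),
        key=lambda p: p.left_pos,
    )
    clusters: List[List[ReadPair]] = []
    for p in signals:
        # Extend the current cluster if this pair starts near the previous one.
        if clusters and p.left_pos - clusters[-1][-1].left_pos <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return [
        (min(q.left_pos for q in c), max(q.right_pos for q in c), len(c))
        for c in clusters
        if len(c) >= min_support
    ]


example_pairs = [
    ReadPair(100, 900, "+", "+"),
    ReadPair(150, 950, "+", "+"),
    ReadPair(5000, 5300, "+", "-"),  # concordant orientation, ignored
    ReadPair(8000, 9000, "-", "-"),  # lone signal, below min_support
]
candidates = cluster_candidates(example_pairs)
```

This sketch groups pairs by left position only. A fuller approach would also take the right mate position and the library's insert size distribution into account before passing candidate intervals on to assembly and breakpoint refinement.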