A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns

Abstract

Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C, represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors, as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing to extract local ‘texture’ changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors, and their ‘texture’ compared efficiently using machine learning algorithms that perform dimensionality reduction, to capture eigengenome information, and clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at .
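
To make the idea of a sequence ‘texture’ more concrete, the following is a minimal Python sketch of a one-dimensional local binary pattern histogram computed at several window sizes. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the nucleotide-to-number mapping, the thresholding rule, or the window sizes, so `NUC_TO_INT`, the `>=` comparison against the centre value, the `half_widths` parameter, and all function names are hypothetical choices.

```python
from collections import Counter

# Hypothetical numeric mapping; the abstract does not state which one is used.
NUC_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_codes(signal, p):
    """Compute a 1-D local binary pattern code at every interior position.

    Each of the 2*p neighbours (p on each side) contributes one bit:
    1 if the neighbour is >= the centre value, otherwise 0 (assumed rule).
    """
    codes = []
    for i in range(p, len(signal) - p):
        center = signal[i]
        neighbours = signal[i - p:i] + signal[i + 1:i + p + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(signal, p):
    """Normalised histogram over the 2**(2*p) possible codes."""
    codes = lbp_codes(signal, p)
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero for very short inputs
    return [counts.get(c, 0) / total for c in range(1 << (2 * p))]


def mlbp_features(sequence, half_widths=(1, 2, 3)):
    """Concatenate histograms from several window sizes (the 'resolutions').

    Characters outside A, C, G, T (for example N) are skipped in this sketch.
    """
    signal = [NUC_TO_INT[b] for b in sequence.upper() if b in NUC_TO_INT]
    features = []
    for p in half_widths:
        features.extend(lbp_histogram(signal, p))
    return features


if __name__ == "__main__":
    vector = mlbp_features("ACGTTGCAACGTAGCTAGCTAACG")
    print(len(vector))  # 4 + 16 + 64 histogram bins
```

Each read or contig then becomes a fixed-length vector regardless of its length, so vectors can be compared without pairwise alignment. In a pipeline like the one the abstract describes, such vectors would be stacked into a matrix and passed to a dimensionality-reduction step followed by clustering.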