Motif Discovery in DNA Sequences Using an Improved Gibbs (i Gibbs) Sampling Algorithm

Makolo AU; Lamidi UA

Abstract

Motifs are short, recurring sequence patterns, typically 6 to 20 bases long. Within deoxyribonucleic acid (DNA) sequences, these motifs constitute the conserved regions that serve as the most common signatures for recognizing protein domains relevant to their evolution, function and interaction. Gibbs sampling is a Markov Chain Monte Carlo (MCMC) algorithm that has previously been applied to discover motifs in DNA sequences. A drawback of this technique is the large number of iterations the sampling process requires, because it progressively chooses new candidate motif positions through continuous randomized sampling of the DNA sequences. We applied an Improved Gibbs (iGibbs) sampling algorithm to human breast cancer (brca) DNA sequences obtained from https://www.ncbi.nlm.nih.gov/nuccore. The algorithm alters the sampling process to curb this unwieldy iteration, reducing runtime while still producing accurate motif results. iGibbs takes a FASTA or GenBank (gbk) DNA file as input and creates a list of all nucleotides, from which it selects a random starting position for sampling. It then applies the given motif length and a smaller iteration count, and computes probability and position ranking scores using a Position Weight Matrix (PWM). The algorithm was implemented using Python, Python(x,y) and Biopython. iGibbs was evaluated with motif lengths of 12, 18 and 24 on sequences of 5,000, 10,000 and 15,000 bases at different iteration levels. It achieved average runtimes of 7, 10 and 23 seconds, respectively, compared with 12, 32 and 60 seconds for the existing Gibbs sampling algorithm available at http://ccmbweb.ccv.brown.edu/gibbs/gibbs.html. The accuracy of the motif results was checked using the Hamming distance to find the contiguous string and the minimum edit distance to the consensus sequences.
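
To make the sampling loop described in the abstract concrete, the following is a minimal sketch of a plain Gibbs site sampler with a PWM, written in Python with the standard library only. It is not the authors' iGibbs implementation: the abstract does not specify how iGibbs alters the process, so the function names (`build_pwm`, `gibbs_sample`, and so on), the pseudocount of 1, and the use of total Hamming distance to the consensus as the score are all assumptions made for illustration. Input sequences are assumed to contain only the bases A, C, G and T.

```python
import random
from collections import Counter

BASES = "ACGT"  # assumption: sequences contain only these four bases


def build_pwm(kmers, pseudocount=1.0):
    """Per-column base probabilities (a simple PWM) with a pseudocount."""
    total = len(kmers) + pseudocount * len(BASES)
    pwm = []
    for col in range(len(kmers[0])):
        counts = Counter(kmer[col] for kmer in kmers)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def kmer_probability(kmer, pwm):
    """Probability of a k-mer under the PWM (product over columns)."""
    prob = 1.0
    for col, base in enumerate(kmer):
        prob *= pwm[col][base]
    return prob


def consensus(kmers):
    """Most frequent base in each column."""
    return "".join(
        Counter(kmer[col] for kmer in kmers).most_common(1)[0][0]
        for col in range(len(kmers[0]))
    )


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def gibbs_sample(seqs, k, iterations, rng):
    """Return (best start positions, best score); lower score is better.

    Needs at least two sequences. The score is the total Hamming distance
    of the chosen k-mers to their consensus (an illustrative choice).
    """
    # Random starting position in every sequence.
    starts = [rng.randrange(len(s) - k + 1) for s in seqs]
    best_starts, best_score = list(starts), None
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence to resample
        others = [s[p:p + k]
                  for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
        pwm = build_pwm(others)
        # Weight each candidate position by its PWM probability.
        weights = [kmer_probability(seqs[i][p:p + k], pwm)
                   for p in range(len(seqs[i]) - k + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        kmers = [s[p:p + k] for s, p in zip(seqs, starts)]
        cons = consensus(kmers)
        score = sum(hamming(cons, km) for km in kmers)
        if best_score is None or score < best_score:
            best_score, best_starts = score, list(starts)
    return best_starts, best_score


if __name__ == "__main__":
    toy = ["TTGACGTACGATCC", "GGACGTACGTTAAC", "CCATACGTACGGTT"]
    starts, score = gibbs_sample(toy, k=7, iterations=200,
                                 rng=random.Random(0))
    print(starts, score)
```

Each iteration rebuilds the PWM from all sequences except the one being resampled, which is where the iteration cost mentioned in the abstract comes from. Reading FASTA or GenBank input, for example with Biopython, is left out to keep the sketch self-contained.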

Bibliographic details

Source: Journal of Computer Science & Systems Biology, 2018, Issue 5, 10 pages
Authors: Makolo AU; Lamidi UA
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Motif discovery; Gibbs sampling algorithm; Breast cancer DNA; Iteration
