Journal of Computer Science & Systems Biology

Motif Discovery in DNA Sequences Using an Improved Gibbs (i Gibbs) Sampling Algorithm

Abstract

Motifs are short recurring sequence patterns, usually between 6 and 20 bases long. Within deoxyribonucleic acid (DNA) sequences, these motifs form the conserved regions of the most common signatures used to recognize protein domains that are relevant to their evolution, function and interaction. Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm that has previously been applied to discover motifs in DNA sequences. A problem with this technique is the large number of iterative operations in the sampling process, because it progressively chooses new candidate motif positions through continuous random sampling of the DNA sequences. We applied an Improved Gibbs (iGibbs) sampling algorithm to breast cancer (brca) human disease DNA sequences obtained from https://www.ncbi.nlm.nih.gov/nuccore, altering the sampling process to overcome this unwieldy iteration, reduce the runtime and still obtain accurate, satisfactory motif results. The iGibbs algorithm takes a FASTA or GenBank (gbk) DNA file as input and creates a list of all nucleotides from which a random starting position for sampling is predicted. It takes the motif length and a smaller number of iterations as parameters, and computes the probability and position ranking scores using a Position Weight Matrix (PWM). The algorithm was implemented using Python, Python(x,y) and Biopython. iGibbs was evaluated with motif lengths of 12, 18 and 24 on sequence lengths of 5,000, 10,000 and 15,000 bases at different iteration levels. iGibbs achieved better average runtimes of 7, 10 and 23 seconds respectively, compared with 12, 32 and 60 seconds respectively for the existing Gibbs sampling algorithm found at http://ccmbweb.ccv.brown.edu/gibbs/gibbs.html. The accuracy of the motif results was checked using the Hamming distance to find contiguous matching strings and the minimum edit distance to the consensus sequences.
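
For readers unfamiliar with the baseline method, the following is a minimal sketch of a standard Gibbs motif sampler with a PWM, not the iGibbs variant: the abstract does not detail the iGibbs modifications, so they are not reproduced here. The function names, the pseudocount of 1, and the mismatches-to-consensus score are illustrative assumptions. The sketch also assumes at least two sequences containing only the characters A, C, G and T. In practice the sequences could be loaded with Biopython's `SeqIO.parse(path, "fasta")` (or `"genbank"`), but here they are plain strings so the example stays self-contained.

```python
import random
from collections import Counter

BASES = "ACGT"  # assumption: sequences contain only A/C/G/T


def build_pwm(motifs, pseudocount=1.0):
    """Column-wise base probabilities (a position weight matrix) with pseudocounts."""
    k = len(motifs[0])
    total = len(motifs) + 4 * pseudocount
    pwm = []
    for col in range(k):
        counts = Counter(m[col] for m in motifs)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def kmer_probability(kmer, pwm):
    """Probability of a k-mer under the PWM (product over columns)."""
    p = 1.0
    for col, base in enumerate(kmer):
        p *= pwm[col][base]
    return p


def score(motifs):
    """Mismatches against the column-wise consensus; lower is better."""
    return sum(len(motifs) - Counter(col).most_common(1)[0][1] for col in zip(*motifs))


def gibbs_sampler(seqs, k, iterations=200, seed=None):
    """Standard Gibbs motif sampler; returns (best_motifs, best_score)."""
    rng = random.Random(seed)
    # Random initial motif start position in every sequence.
    motifs = []
    for s in seqs:
        p = rng.randrange(len(s) - k + 1)
        motifs.append(s[p:p + k])
    best, best_score = list(motifs), score(motifs)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose motif is resampled
        pwm = build_pwm(motifs[:i] + motifs[i + 1:])  # profile from the other motifs
        seq = seqs[i]
        starts = range(len(seq) - k + 1)
        weights = [kmer_probability(seq[j:j + k], pwm) for j in starts]
        j = rng.choices(starts, weights=weights)[0]  # new start drawn proportional to PWM probability
        motifs[i] = seq[j:j + k]
        current = score(motifs)
        if current < best_score:
            best, best_score = list(motifs), current
    return best, best_score


if __name__ == "__main__":
    # Demo on synthetic data: plant a hypothetical motif in random sequences.
    demo_rng = random.Random(42)
    planted = "TGACGTCA"
    seqs = []
    for _ in range(6):
        s = "".join(demo_rng.choice(BASES) for _ in range(60))
        p = demo_rng.randrange(len(s) - len(planted) + 1)
        seqs.append(s[:p] + planted + s[p + len(planted):])
    motifs, mismatches = gibbs_sampler(seqs, k=len(planted), iterations=500, seed=7)
    print(motifs, mismatches)
```

The number of iterations is the knob the abstract says iGibbs lowers: each iteration rebuilds the PWM and rescores every window of one sequence, so runtime grows roughly with iterations times sequence length times motif length.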

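The abstract states that accuracy was checked with the Hamming distance (to find contiguous matching strings) and the minimum edit distance to consensus sequences, but it does not give the exact procedure. The sketch below shows one plausible reading under stated assumptions: the consensus is taken as the most frequent base in each column, and "minimum edit distance" is taken to mean the standard Levenshtein distance with unit costs. All function names are hypothetical.

```python
from collections import Counter


def consensus(motifs):
    """Most frequent base in each column of equal-length motifs (assumed definition)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*motifs))


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_window(pattern, seq):
    """(distance, start) of the contiguous window of seq closest to pattern by Hamming distance."""
    k = len(pattern)
    return min((hamming(pattern, seq[i:i + k]), i) for i in range(len(seq) - k + 1))


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    motifs = ["ACGT", "ACGA", "ACGT"]
    cons = consensus(motifs)
    print(cons, [hamming(m, cons) for m in motifs], edit_distance("ACGT", "AGT"))
```

Hamming distance only compares strings of equal length, which suits fixed-length motif windows; edit distance also tolerates insertions and deletions, so it can compare a predicted motif against a consensus of a different length.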