Pattern Discovery for Deciphering Gene Regulation Based on Evolutionary Computation.

Abstract

Bindings between Transcription Factors (TFs) and Transcription Factor Binding Sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. TFs and TFBSs are conserved and form patterns (motifs) because of their important roles in controlling gene expression, which ultimately affects function and appearance. Pattern discovery is thus important for deciphering gene regulation, which has a tremendous impact on the understanding of life, bio-engineering and therapeutic applications. This thesis contributes to pattern discovery involving TFBS motifs and TF-TFBS associated sequence patterns based on Evolutionary Computation (EC), especially Genetic Algorithms (GAs), which are promising for bioinformatics problems with huge and noisy search spaces.

For TFBS motif discovery, three novel GA-based algorithms are developed: GALF-P, with a focus on optimization; GALF-G, for modeling; and GASMEN, for spaced motifs. Novel memetic operators, namely local filtering and probabilistic refinement, are introduced to significantly improve the effectiveness (e.g. 73% better than MEME) and efficiency (e.g. a 4.49-fold speedup) of the search. The GA-based algorithms have been extensively tested on comprehensive synthetic, real and benchmark datasets, and have shown outstanding performance compared with state-of-the-art approaches. Our algorithms also "evolve" to handle increasingly relaxed cases: from fixed motif widths to fully flexible widths, from single motifs to multiple motifs with overlap control, from stringent assumptions about motif instances to very relaxed ones, and from contiguous motifs to generic spaced motifs with arbitrary spacers.

TF-TFBS associated sequence pattern (rule) discovery is further investigated to better decipher protein-DNA interactions in regulation. For the first time, we generalize previous exact TF-TFBS rules to approximate ones using a progressive approach. A customized algorithm is developed, outperforming MEME by over 73%. Compared with the exact rules, the approximate TF-TFBS rules yield significantly more verified rules and better verification ratios. Detailed analysis of PDB cases and conservation verification on NCBI protein records illustrate that the approximate rules reveal flexible and specific protein-DNA interactions with much greater generalization capability.

The comprehensive pattern discovery algorithms developed here will be further verified, improved and extended to decipher transcriptional regulation in more depth, for example by inferring whole gene regulatory networks from the discovered TFBS and TF-TFBS patterns together with expression data.
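
As a rough illustration of the kind of GA-based TFBS motif search the abstract describes, the sketch below evolves one candidate site position per sequence and scores the aligned sites by column consensus. It is not GALF-P, GALF-G or GASMEN: the function names (`consensus_score`, `ga_motif_search`), the one-site-per-sequence assumption, the fixed motif width, the consensus-count fitness and all parameter values are assumptions made for this example, and the memetic operators (local filtering, probabilistic refinement) are not implemented.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, width):
    """Sum, over motif columns, of the count of the most common base."""
    sites = [s[p:p + width] for s, p in zip(seqs, starts)]
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*sites))


def ga_motif_search(seqs, width, pop_size=60, generations=200,
                    mutation_rate=0.1, seed=0):
    """Plain GA over start positions (hypothetical helper, not the thesis code).

    An individual is a list holding one start position per sequence.
    Assumes every sequence is at least `width` bases long.
    """
    rng = random.Random(seed)
    max_start = [len(s) - width for s in seqs]

    def fitness(ind):
        return consensus_score(seqs, ind, width)

    def tournament(pop):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, m) for m in max_start] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each position comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Mutation: occasionally re-draw a start position at random.
            child = [rng.randint(0, m) if rng.random() < mutation_rate else g
                     for g, m in zip(child, max_start)]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Synthetic data: plant one exact copy of a motif in each random sequence.
    demo_rng = random.Random(1)
    motif = "TGACGTCA"
    seqs = []
    for _ in range(8):
        bg = "".join(demo_rng.choice("ACGT") for _ in range(40))
        pos = demo_rng.randint(0, len(bg) - len(motif))
        seqs.append(bg[:pos] + motif + bg[pos + len(motif):])
    starts, score = ga_motif_search(seqs, width=len(motif))
    print("start positions:", starts, "consensus score:", score)
```

A plain GA like this can stall on shifted or partial alignments of the true motif, which is the kind of weakness that problem-specific local search operators are typically meant to address.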

Bibliographic details

  • Author

    Chan, Tak Ming

  • Author affiliation

    The Chinese University of Hong Kong (Hong Kong)

  • Degree-granting institution: The Chinese University of Hong Kong (Hong Kong)
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 177 p.
  • Total pages: 177
  • Original format: PDF
  • Language of text: English