Development of Computational Techniques for Regulatory DNA Motif Identification Based on Big Biological Data.

Abstract

Accurate identification of regulatory DNA motifs (hereafter, motifs) plays a fundamental role in elucidating transcriptional regulatory mechanisms in a cell and can strongly support regulatory network construction for both prokaryotic and eukaryotic organisms. Next-generation sequencing techniques generate a huge amount of biological data for motif identification. Specifically, chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) enables researchers to identify motifs on a genome scale. Recently, technological improvements have allowed DNA structural information to be obtained in a high-throughput manner, which can provide four DNA shape features. DNA shape has been found to complement genomic sequence in predicting transcription factor (TF)-DNA binding specificity with traditional machine learning models. Recent studies have demonstrated that deep learning (DL), especially the convolutional neural network (CNN), enables identification of motifs directly from DNA sequence.

Although numerous algorithms and tools have been proposed and developed in this field, (1) the lack of intuitive and integrative web servers impedes effective use of emerging algorithms and tools; (2) DNA shape has not been integrated with DL; and (3) existing DL models still suffer from high false positive and false negative rates in motif identification.

This thesis focuses on developing an integrated web server for motif identification based on DNA sequences supplied by users or drawn from built-in databases. This web server allows further motif-related analysis and Cytoscape-like network interpretation and visualization. We then proposed a DL framework for both sequence and shape motif identification from ChIP-seq data using a binomial distribution strategy. This framework accepts different combinations of DNA sequence and DNA shape as input. Finally, we developed a gated convolutional neural network (GCNN) for capturing motif dependencies among long DNA sequences.

Results show that our web server provides more comprehensive motif analysis functionality than existing web servers. The DL framework can identify motifs using an optimized threshold and reveals the strong predictive power of DNA shape for TF-DNA binding specificity. The identified sequence and shape motifs can contribute to the interpretation of TF-DNA binding mechanisms. Additionally, GCNN improves TF-DNA binding specificity prediction over CNN on most of the datasets.

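Bibliographic record

  • Author

    Yang, Jinyu

  • Author affiliation

    South Dakota State University

  • Degree-granting institution: South Dakota State University
  • Subject: Statistics
  • Degree: M.S.
  • Year: 2018
  • Pages: 76 p.
  • Total pages: 76
  • Original format: PDF
  • Language of text: English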
