
Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA.


Abstract

With the advancement of high-throughput sequencing technologies, a new era of "big data" biological research has dawned. However, the abundance of biological data presents many challenges for analysis, and it has proven very difficult to extract important information from the data. One approach to this problem is to use machine learning methods.

In this dissertation, we describe novel probabilistic kernel-based learning methods and demonstrate their practical applicability by solving major bioinformatics problems at the transcriptome and proteome levels. The resulting tools are expected to help biologists further elucidate the important information contained in their data.

The proposed binary classification method, the Classification Relevance Units Machine (CRUM), combines kernel methods with empirical Bayesian theory to achieve non-linear classification and high generalization. We demonstrate the practical applicability of CRUM by applying it to the prediction of protein phosphorylation sites, which helps explain the mechanisms that control many biochemical processes.

We then develop an extension of CRUM for multiclass problems, called the Multiclass Relevance Units Machine (McRUM). McRUM uses the error-correcting output codes (ECOC) framework to decompose a multiclass problem into a set of binary problems. We devise a linear-time algorithm that aggregates the binary results into the final probabilistic multiclass prediction, making predictions feasible in large-scale applications. We demonstrate the practical applicability of McRUM by applying it to the identification of mature microRNA (miRNA) and piwi-interacting RNA (piRNA) in small RNA sequencing datasets. This gives biologists a tool to help discover novel miRNAs and piRNAs and further understand the molecular processes of the organisms they study.
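The ECOC decomposition and probabilistic aggregation steps described above can be illustrated with a minimal Python sketch. It is not McRUM's linear-time aggregation algorithm, whose details the abstract does not give. Instead it uses a simple likelihood-style decoding that treats the binary outputs as independent and scores each class by the product of the probabilities that agree with its codeword. The class labels, the code matrix, and the functions `decompose` and `decode` are hypothetical, and the input probabilities merely stand in for the outputs of CRUM-style binary classifiers.

```python
from math import prod

# Hypothetical class labels, for illustration only.
CLASSES = ["miRNA", "piRNA", "other"]

# Hypothetical ECOC code matrix: rows are classes, columns are binary problems.
# +1 = class on the positive side, -1 = negative side, 0 = class left out.
# Columns 0-2 are one-vs-rest; column 3 is miRNA vs. piRNA only.
CODE_MATRIX = [
    [+1, -1, -1, +1],
    [-1, +1, -1, -1],
    [-1, -1, +1,  0],
]


def decompose(labels, code_matrix, classes):
    """Turn multiclass labels into one binary label list per code-matrix column.

    True/False mark positive/negative examples; None marks samples whose
    class has a 0 entry in that column and is excluded from that problem.
    """
    index = {c: i for i, c in enumerate(classes)}
    problems = []
    for j in range(len(code_matrix[0])):
        column = [code_matrix[index[y]][j] for y in labels]
        problems.append([None if v == 0 else v == 1 for v in column])
    return problems


def decode(binary_probs, code_matrix):
    """Combine P(positive) from each binary classifier into class probabilities.

    Assumes independent binary outputs. Each class score is the product of the
    probabilities agreeing with its codeword (0 entries are skipped), and the
    scores are normalized to sum to 1. Cost: O(classes * columns) per sample.
    """
    scores = []
    for row in code_matrix:
        factors = [p if m == 1 else 1.0 - p
                   for m, p in zip(row, binary_probs) if m != 0]
        scores.append(prod(factors))
    total = sum(scores)
    if total == 0.0:  # every codeword ruled out; fall back to uniform
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

For example, `decode([0.9, 0.2, 0.1, 0.8], CODE_MATRIX)` gives the highest probability to `"miRNA"`, because its codeword agrees with all four binary outputs. Skipping 0 entries keeps a pairwise column from penalizing classes it was never trained on.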

Bibliographic details

  • Author: Menor, Mark S.
  • Author affiliation: University of Hawai'i at Manoa
  • Degree-granting institution: University of Hawai'i at Manoa
  • Subjects: Computer science; Bioinformatics; Molecular biology
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 131
  • Original format: PDF
  • Language of text: English
