Analysis of genomic and experimental data for identifying positively selected genes, early cancer diagnosis, and screening for improved industrial protein production.

Abstract

Bioinformatics takes an interdisciplinary approach to biological problems, drawing on techniques from statistics, computer science, and biology. Recent advances in laboratory techniques and computational capabilities, such as microarrays, permit the collection and analysis of biological data on a large scale. Analyzing such large data sets presents significant new challenges in both biology and computer science, and requires the development of specialized techniques and tools in bioinformatics.

Evolutionary genomics is a major area of bioinformatics research. As many previous studies have shown, computational analysis of the conservation of particular sequence elements under various selective pressures across multiple genomes contributes to our understanding of biological diversity and speciation. The protozoan parasites Cryptosporidium parvum and Cryptosporidium hominis infect humans and frequently cause diarrhea. Our analysis of their genome sequences shows that a small number of genes are under positive selection, which is consistent with the fact that organisms need to keep their genetic material stable for survival and reproduction. These genes are difficult to annotate because of their low sequence similarity to genes with known functions. In addition, proteins with transmembrane domains and signal peptides are enriched among the positively selected genes. These new results help answer questions about the evolutionary history of these pathogens.

Machine learning, or data mining, is a research area concerned with designing and developing algorithms that extract hidden patterns from large amounts of data. As more experimental biological data are assembled and researchers seek the underlying principles governing biological processes, many machine learning techniques have been applied to biological data and have successfully modeled these processes. We applied such techniques to two separate research projects: early diagnosis of pancreatic cancer using mass spectrometry data, and prediction of protein solubility. Using a data analysis framework with three steps (preprocessing, feature selection, and ensemble classification), we achieved good performance in distinguishing disease-group samples from control-group samples. This work suggests a possible future direction for diagnosing pancreatic cancer at an early stage. In the protein solubility project, we proposed a pair of novel features and then tested a variety of machine learning approaches on two newly assembled datasets. We also identified a group of important protein sequence and secondary structure features that correlate well with protein solubility status. Our results suggest that incorporating secondary structure information can improve solubility prediction.
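The three-step framework named in the abstract (preprocessing, feature selection, ensemble classification) can be illustrated with a small sketch. The abstract does not say which algorithms the thesis used, so everything below is an assumption chosen for brevity: sum normalization stands in for preprocessing, a mean-difference score for feature selection, and a majority vote of one-feature threshold rules for the ensemble. All function names are hypothetical, and the toy spectra are synthetic, not data from the thesis.

```python
from statistics import mean, pstdev

def normalize(spectrum):
    """Preprocessing (assumed): scale intensities so they sum to 1."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total else list(spectrum)

def score_feature(values, labels):
    """Selection score (assumed): class-mean gap divided by summed class spreads."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(pos) + pstdev(neg) + 1e-12  # small constant avoids division by zero
    return abs(mean(pos) - mean(neg)) / spread

def select_features(X, labels, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(X[0])
    scores = [score_feature([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def fit_stump(values, labels):
    """One weak learner: threshold midway between class means, plus its direction."""
    pos_mean = mean(v for v, y in zip(values, labels) if y == 1)
    neg_mean = mean(v for v, y in zip(values, labels) if y == 0)
    return (pos_mean + neg_mean) / 2, pos_mean >= neg_mean

def fit_ensemble(X, labels, k=3):
    """Run all three steps and return one (feature, threshold, direction) rule per feature."""
    Xn = [normalize(row) for row in X]
    chosen = select_features(Xn, labels, k)
    return [(j, *fit_stump([row[j] for row in Xn], labels)) for j in chosen]

def predict(stumps, spectrum):
    """Ensemble step: majority vote over the rules (ties count as class 0)."""
    row = normalize(spectrum)
    votes = sum(1 for j, threshold, pos_is_high in stumps
                if (row[j] > threshold) == pos_is_high)
    return 1 if 2 * votes > len(stumps) else 0

# Synthetic toy spectra: label 1 = disease group, label 0 = control group.
toy_X = [[10, 1, 5, 4], [9, 2, 5, 4], [11, 1, 4, 4],
         [2, 9, 5, 4], [1, 10, 4, 5], [2, 8, 5, 5]]
toy_y = [1, 1, 1, 0, 0, 0]
model = fit_ensemble(toy_X, toy_y, k=3)
```

Calling `predict(model, spectrum)` on a new intensity list returns 1 or 0. A real analysis would also need held-out evaluation, which this sketch omits.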

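Bibliographic record

  • Author

    Ge, Guangtao

  • Author affiliation

    Tufts University

  • Degree-granting institution Tufts University
  • Subject Biology, Bioinformatics; Computer Science
  • Degree Ph.D.
  • Year 2009
  • Pages 87 p.
  • Total pages 87
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Automation technology, computer technology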
