
The design and implementation of an extended database system to support biological sequence similarity analysis.


Abstract

Molecular biology researchers generate gene sequence data so quickly that production is outpacing their ability to characterize what function each sequence performs in the cell. A faster means of characterizing new sequences is to use similarity algorithms to compare them to known sequences. For large-scale sequencing projects, however, biologists face two problems with this technique: (1) they have too many sequences on which to execute similarity algorithms manually, and (2) the tremendous amount of textual data that results from running these algorithms is impossible to interpret manually. To solve these problems, we present the design and implementation of a Similarity Analysis Database System, which we developed during a cross-disciplinary research project between computer scientists and molecular biologists. The contributions of this work, to both computer science and computational biology research, are: (1) we have developed a DBMS-independent conceptual data schema for representing general information about the many different similarity algorithms, their execution parameters, and the results of those executions; (2) we have developed a processing system that automates the difficult task of executing similarity algorithms on the tens of thousands of sequences generated annually by researchers on our project, and we provide the similarity results to the rest of the community via index search on our WWW site; (3) we have stored these similarity results in a database patterned after the conceptual schema, using an extensible DBMS; (4) we have extended the DBMS with additional functions that facilitate faster and more complex interpretation of the similarities detected by the algorithms; (5) we show the value of these functions by reporting interesting results from several analyses that we have conducted on similarity data.
Because the system is faster and easier to use, biologists can now accomplish the previously unmanageable task of analyzing similarities for the large amounts of sequence data that they produce. We designed this system for long-term use by keeping it general and by giving biologists the ability to compare results using different sets of criteria. The system thus empowers scientists to explore the similarity data in ways that were not possible before.
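The abstract names three technical pieces: a schema covering algorithms, execution parameters, and results; a database built from that schema; and functions added to the DBMS to help interpret similarities. The sketch below illustrates that general shape only. It is not the thesis's schema or DBMS: every table, column, and function name is hypothetical, SQLite stands in for the extensible DBMS, and the length-normalized score is an invented example of an added function.

```python
import sqlite3

# Hypothetical schema: algorithms, their executions (with parameters),
# and the hits each execution produced. Names are illustrative only.
SCHEMA = """
CREATE TABLE algorithm (
    algorithm_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE execution (
    execution_id INTEGER PRIMARY KEY,
    algorithm_id INTEGER NOT NULL REFERENCES algorithm(algorithm_id),
    query_seq    TEXT NOT NULL,
    parameters   TEXT NOT NULL
);
CREATE TABLE hit (
    hit_id       INTEGER PRIMARY KEY,
    execution_id INTEGER NOT NULL REFERENCES execution(execution_id),
    target_seq   TEXT NOT NULL,
    score        REAL NOT NULL,
    align_len    INTEGER NOT NULL
);
"""


def score_per_position(score, align_len):
    """Invented example of an added DBMS function: score / alignment length."""
    if not align_len:
        return None
    return score / align_len


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Register the Python function so SQL queries can call it by name.
    conn.create_function("score_per_position", 2, score_per_position)
    conn.execute("INSERT INTO algorithm VALUES (1, 'algo_a')")
    conn.execute(
        "INSERT INTO execution VALUES (1, 1, 'query_1', 'word_size=3')"
    )
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "target_x", 120.0, 60),
            (2, 1, "target_y", 90.0, 30),
            (3, 1, "target_z", 40.0, 80),
        ],
    )
    return conn


def strong_hits(conn, query_seq, min_density):
    """Return (target, density) pairs for one query, filtered inside SQL."""
    rows = conn.execute(
        "SELECT h.target_seq, "
        "       score_per_position(h.score, h.align_len) AS density "
        "FROM hit h JOIN execution e ON e.execution_id = h.execution_id "
        "WHERE e.query_seq = ? "
        "  AND score_per_position(h.score, h.align_len) >= ? "
        "ORDER BY density DESC",
        (query_seq, min_density),
    )
    return rows.fetchall()
```

Calling `strong_hits(build_demo_db(), "query_1", 2.0)` filters the stored hits by the added function inside the query itself, so the caller never has to read raw textual output. This is the kind of interpretation step the abstract describes moving into the database.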

Bibliographic record

  • Author

    Shoop, Elizabeth Grace

  • Author affiliation

    University of Minnesota

  • Degree-granting institution: University of Minnesota
  • Subject: Computer Science; Information Science; Molecular Biology
  • Degree: Ph.D.
  • Year: 1996
  • Pages: 189 p.
  • Total pages: 189
  • Original format: PDF
  • Language: English
