The development of semantic meta-database: an ontology based semantic integration of biological databases

Abstract

Protein sequence annotation is important for preserving and reusing knowledge, for content-based queries, and for understanding protein function. Traditional wet-lab methods are labor intensive and prone to human error. Existing computational tools, for their part, are time intensive and require heavy investment in computing facilities when used offline, and they depend heavily on internet stability and speed when used online. A simple and practical computational method that is more accurate, faster, easy to configure and use, and low in computing cost is therefore needed, particularly for offline usage. In this study, a Gene Ontology (GO) based protein sequence annotation tool named extended UTMGO is developed to meet these requirements. The GO is selected because it provides dynamic, precisely defined, structured, and controlled terms that describe genes and their functions and products in any organism. Furthermore, the GO terms are linked with gene products and their protein sequences from various species, as provided by Gene Ontology Annotation (GOA). Highly correlated GO terms of annotated protein sequences can thus be assigned to partially annotated or newly discovered protein sequences.

The tool comprises two intelligent algorithms. The first combines a parallel genetic algorithm with the split-and-merge algorithm. The idea is to cluster the GO terms into k clusters in order to split the monolithic GO RDF/XML file into smaller files, which enables protein sequences and Inferred from Electronic Annotation (IEA) evidence associations to be included in those files. The second combines a parallel genetic algorithm with a semantic similarity measure algorithm. The aim is to search the fragmented GO RDF/XML files for a set of GO terms that are semantically similar to a given query. In addition, the basic version of the tool, a GO browser based on semantic similarity search, is introduced to overcome the weakness of the conventional approach, keyword matching.
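
To make the contrast between semantic similarity search and keyword matching concrete, here is a minimal sketch. It is not the UTMGO method: the abstract does not name the similarity measure, so the sketch assumes a simple Jaccard overlap of ancestor sets in the term hierarchy, and it leaves out the parallel genetic algorithm entirely. The term identifiers and the tiny `PARENTS` hierarchy are hypothetical and are not real GO data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy: each term maps to its direct parents (is_a edges).
# These identifiers are illustrative only and are not real GO terms.
PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:binding": {"T:root"},
    "T:catalysis": {"T:root"},
    "T:ion_binding": {"T:binding"},
    "T:metal_ion_binding": {"T:ion_binding"},
    "T:kinase": {"T:catalysis"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def similarity(a: str, b: str, parents: Dict[str, Set[str]]) -> float:
    """Assumed measure: Jaccard overlap of the two ancestor sets."""
    set_a = ancestors(a, parents)
    set_b = ancestors(b, parents)
    return len(set_a & set_b) / len(set_a | set_b)


def search(query: str, parents: Dict[str, Set[str]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank every other term by similarity to the query, highest first."""
    scored = [(similarity(query, t, parents), t) for t in parents if t != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(term, score) for score, term in scored[:top_n]]


if __name__ == "__main__":
    for term, score in search("T:metal_ion_binding", PARENTS):
        print(f"{term}\t{score:.2f}")
```

In this toy hierarchy, a query for `T:metal_ion_binding` ranks `T:ion_binding` and `T:binding` above unrelated terms because they share ancestors with the query, which is a relationship that plain keyword matching on term names would not be guaranteed to capture.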