Summary: We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. BioCyc (see http://BioCyc.org/) is a collection of 161 Pathway/Genome Databases (PGDBs) that represent cellular networks and genome information in a structured manner, to allow powerful computational analysis and manipulation of data. The highly curated Tier 1 PGDBs at the core of BioCyc are the EcoCyc and MetaCyc DBs (Karp et al., 2002c,b). They contain many experimentally elucidated metabolic pathways from Escherichia coli and other organisms. BioCyc is viewed and edited through Pathway Tools (Karp et al., 2002a), a software environment we have developed to query, display and edit information about each pathway and its component reactions, compounds, enzymes, protein complexes, genes, operons and regulation at the substrate and transcriptional level. Additionally, the data objects support literature references, evidence codes and links to external databases. The BioCyc schema attempts to faithfully capture biological concepts and the cross-links among widely differing types of data. The PGDBs in Tiers 2 and 3 were computationally predicted by Pathway Tools; Tier 2 has undergone moderate curation, whereas the 139 DBs in Tier 3 have undergone no curation. Tier 3 PGDBs are not yet available for programmatic access, but we expect they will be soon.