GO DATABASE DOWNLOADS
=====================

releases are now named go_<200304>-

200304 is the release date (the release export usually follows some
time after the monthly release, due to the time taken to build)

the DATASET is one of:
-------

* termdb - a database containing just the information on the GO terms
  and relationships. These are the tables that are populated:

    term               GO controlled vocab terms
    term2term          relationships between GO terms
    term_definition    definitions of terms
    dbxref             external database identifier entities
    term_dbxref        links from terms to other databases
    term_synonym       synonyms for terms
    graph_path         transitive closure (all paths) in graph

* assocdb - a database containing both the GO vocabulary and
  associations between GO terms and gene products. This database
  subsumes termdb. These are the extra tables that are populated:

    gene_product       gene or protein or entity annotated
    association        link between gene product and GO term
    evidence           evidence type and reference for an assoc
    gene_product_count recursive product counts per GO term

* seqdb - a database containing GO terms, gene products and the
  sequences associated with these gene products. This db subsumes the
  two above. It populates these additional tables:

    seq                biological sequence
    gene_product_seq   link between a product and a sequence
    seq_dbxref         external database links for a sequence

  NOTE: there are other unpopulated tables - we may or may not decide
  to populate these at some point in the future.

* seqdblite - this is the same as seqdb, except all IEA associations
  have been removed. The IEA associations provide relatively little
  value compared to the curated associations, and they slow querying
  down immensely. This is the distribution that AmiGO runs off of. We
  are working on optimisations to allow AmiGO to run off of the full
  seqdb release.

the TYPE is one of:
----

.xml - RDF XML export of the database. This comes as one single file.
  Note there is no RDF XML export of seqdb, as we do not include
  sequences in the xml yet. We do not include IEA evidence
  associations in the xml. We may decide to split this xml file into
  multiple files at a later date.

.tables - this is a directory containing the MySQL dump, see below

.sql - SQL CREATE TABLE and INSERT statements for building a local
  instance of the database. Equivalent to the .tables TYPE (but
  slower to load)

.fasta - Fasta format sequence files. These are generated from a
  combination of the source gene association data and SwissProt
  sequence files. There will usually be far fewer sequences than gene
  products, as the sequence information is often not provided by the
  data source provider. Also, we only include sequences for which a
  non-IEA association could be made. The header line includes the GO
  annotations for the gene product corresponding to that sequence.

SCHEMA DOCUMENTATION
********************

In this distribution, uncompress the file:

  go_200304-schema-mysql.sql.gz

which contains the (MySQL ported) schema used in this release.

You can also look at the HTML marked up version of the schema, or the
schema diagrams, here:

  http://www.godatabase.org/dev/database

To guarantee that your schema, code and database release are in sync
you should use the files from the same release.
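
As a quick way to see which tables the schema file defines without
unpacking it to disk, something like the following sketch may help.
The function name list_schema_tables is made up for this example, and
it assumes each table definition starts on its own line in the form
"CREATE TABLE name (", which may not hold for every release.

```shell
# Print the name of every table defined in a gzipped SQL schema file.
# Assumption: each definition begins a line with "CREATE TABLE <name>".
list_schema_tables() {
    # $1: path to the gzipped schema, e.g. go_200304-schema-mysql.sql.gz
    gunzip -c "$1" |
        awk 'toupper($1) == "CREATE" && toupper($2) == "TABLE" {
                 name = $3
                 sub(/\(.*/, "", name)   # drop a "(" glued to the name
                 gsub(/`/, "", name)     # drop MySQL backtick quoting
                 print name
             }'
}
```

For example, "list_schema_tables go_200304-schema-mysql.sql.gz" should
print one table name per line, which can be compared against the table
lists given for each DATASET above.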
**********************************
MYSQL USERS

The database export was prepared from a mysql db - you should have no
problem importing it:

  tar -zxvf go-200304-TYPE-tables.gz
  cd
  echo "create database mygo" | mysql
  cat *.sql | mysql mygo
  mysqlimport mygo *.txt

**********************************
OTHER DBMS USERS

Your database is not supported, but we do have some tips below.

also: the perl api code is mostly DBMS neutral; it should in theory
work on non mysql setups

**********************************
POSTGRES USERS:

TIP: for converting mysql dumps to postgres, try my2pg

  http://www.ziet.zhitomir.ua/~fonin/code/

Thanks to Joe Morris of Affymetrix for the tip

************************************
ORACLE USERS:

see the directory sql/oracle/ in the go_200304-utilities_src software
distribution

************************************
PROGRAMMERS:

you can access the data using the perl API - see

  http://www.godatabase.org/dev

OR look at the perl API release for the data release:

  tar -zxvf go_200304-utilities_src.tar.gz
  cd go_200304-utilities_src
  cd perl-api
  perldoc GO/AppHandle.pm

sometimes the perl API must be in sync with the database, e.g. if the
schema changes in a way that breaks old code

*************************************
AMIGO:

you can build a local AmiGO installation using the source code and
data included in this distribution. You can load your own data into it
using either the scripts in go-dev/apps/db-loading, or the configure
script and makefile in go-dev/sql

volunteers to write documentation on making this process simple are
much appreciated - if you want to contribute, email:

  Chris Mungall
  or: go-database@fruitfly.org

note - you will have a better chance of a response if you email the
mail list. Even then, don't be miffed if you don't get a response for
a while; other projects often divert me from GO related stuff.
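
After loading a dump as described under MYSQL USERS, one simple sanity
check is to count the rows in each of the termdb tables listed at the
top of this file. The sketch below only generates the SQL; the function
name count_sql and the column aliases are invented for this example,
and mygo is the example database name used in the import steps.

```shell
# Emit one row-count query per termdb table named in this README.
# The output is plain SQL, so it can be inspected before it is run.
count_sql() {
    for t in term term2term term_definition dbxref term_dbxref \
             term_synonym graph_path; do
        printf "SELECT '%s' AS table_name, COUNT(*) AS n_rows FROM %s;\n" \
            "$t" "$t"
    done
}
```

To run the queries against the local database, pipe them into the
client: "count_sql | mysql mygo". A table reporting zero rows would
suggest that its .txt file was not picked up by mysqlimport.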