About the Gene Ontology


Mission Statement

A core goal of biomedical research is to uncover how individual genes contribute to the biology of an organism, and their roles in health and disease. The mission of the Gene Ontology Consortium (GOC) is to provide a comprehensive and up-to-date computational model of the current scientific understanding of the functions of gene products (e.g., proteins, non-coding RNAs, and macromolecular complexes; often referred to simply as "genes"). GO encompasses all levels of biological systems, from molecular activities to complex cellular and organismal-level networks. GO provides uniform descriptors applicable to gene products across the entire tree of life. Today, GO is used to represent gene function in all sequenced organisms.

Background

The GOC was established in 1998 when researchers studying the genomes of three model organisms (the fruit fly Drosophila melanogaster, the mouse Mus musculus, and the brewer’s or baker’s yeast Saccharomyces cerevisiae) began to work collaboratively on a common classification scheme for gene function to compare the newly sequenced genomes of these organisms. One of the GO’s earliest documents, On the representation of “gene function” in databases, was written by Michael Ashburner in 1998. GO’s first official paper was the 2000 Nature Genetics publication Gene Ontology: tool for the unification of biology.

GO grew into a large data framework adapted for all living organisms, from bacteria to human. GO was the first of the hundreds of biomedical ontologies that currently exist, which together aim to represent the vast amount of biomedical knowledge in a computable form. GO is a major hub within these ontologies, being linked to many other biomedical ontologies. It is widely used as a tool in scientific research, and has been cited in tens of thousands of publications. The GO Consortium regularly publishes updates and developments under the author name Gene Ontology Consortium. For the most recent publications, please see our citation policy.

The GO consists of:

  • The GO ontology: the logical structure describing the full complexity of the biology, comprising the ‘classes’ (often referred to as ‘terms’) describing the many different types of molecular functions (Molecular Function), the pathways carrying out different biological programs (Biological Process), and the cellular locations where these occur (Cellular Component). The GO is structured by relating each class to other classes using specific relations.
  • The corpus of GO annotations: the traceable (i.e., associated with scientific articles), evidence-based statements relating a specific gene product to a specific ontology term. The set of all GO annotations associated with a gene provides a description of its biological role. As of October 2024, the GO includes experimental findings from over 180,000 published papers, representing over 1,000,000 experimentally supported annotations.

Together, the ontology and annotations provide a comprehensive model of biological systems.
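
The two components described above can be pictured as a small data structure. The following is only a minimal sketch, not the GO's actual data model or file format: the class names, the identifiers (EX:0001, geneA, REF:placeholder-1) and the relation labels are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology class ('term'). All fields here are illustrative."""
    term_id: str
    name: str
    aspect: str          # "MF", "BP" or "CC" (the three aspects named above)
    parents: tuple = ()  # (relation, parent_id) pairs linking to other classes


@dataclass(frozen=True)
class Annotation:
    """A traceable statement relating a gene product to a term."""
    gene_product: str
    term_id: str
    reference: str       # placeholder for the supporting article


def ancestors(term_id, terms):
    """Collect every class reachable from term_id by following relations."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for _relation, parent_id in terms[current].parents:
            if parent_id not in seen:
                seen.add(parent_id)
                stack.append(parent_id)
    return seen


def terms_for(gene_product, annotations):
    """The set of terms directly annotated to one gene product."""
    return {a.term_id for a in annotations if a.gene_product == gene_product}


# Hypothetical toy ontology: three classes in one aspect.
TERMS = {
    "EX:0001": Term("EX:0001", "example root process", "BP"),
    "EX:0002": Term("EX:0002", "example child process", "BP",
                    (("is_a", "EX:0001"),)),
    "EX:0003": Term("EX:0003", "example sub-process", "BP",
                    (("part_of", "EX:0002"),)),
}

# Hypothetical toy annotation corpus.
ANNOTATIONS = [
    Annotation("geneA", "EX:0003", "REF:placeholder-1"),
    Annotation("geneA", "EX:0002", "REF:placeholder-2"),
    Annotation("geneB", "EX:0001", "REF:placeholder-3"),
]

print(sorted(ancestors("EX:0003", TERMS)))
print(sorted(terms_for("geneA", ANNOTATIONS)))
```

In this sketch each annotation carries its own reference, which is what makes it traceable, and the relations between classes let a program move from a specific term to the more general classes it is linked to.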

In addition to this core knowledgebase of ontology + annotations, GO resources also include software to edit and perform logical reasoning over the ontologies, web access to the ontology and annotations, and analytical tools that use GO to support biomedical research.

Uses of the GO and annotations

The GO knowledgebase plays an essential role in supporting biomedical research and has been cited in tens of thousands of scientific studies. The most common use of GO annotations is for interpretation of large-scale molecular biology experiments, to gain insight into the structure, function, and dynamics of an organism. Gene Ontology enrichment analysis is used to discover statistically significant similarities or differences under alternate controlled experimental conditions.
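
As a rough illustration of what an enrichment analysis computes, the sketch below scores each term with a one-sided hypergeometric test, one common choice for over-representation analysis. It does not describe the method of any particular GO tool. The function names, gene names and term labels are hypothetical, and a real analysis would also correct for multiple testing, which is omitted here.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    k = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


def enrichment(study, population, gene_to_terms):
    """Return {term: p-value} for every term seen in the study set.

    study / population: collections of gene names (hypothetical inputs).
    gene_to_terms: dict mapping a gene name to the set of its terms.
    """
    population = set(population)
    study = set(study) & population
    pop_counts, study_counts = {}, {}
    for gene in population:
        for term in gene_to_terms.get(gene, ()):
            pop_counts[term] = pop_counts.get(term, 0) + 1
            if gene in study:
                study_counts[term] = study_counts.get(term, 0) + 1
    N, n = len(population), len(study)
    return {term: hypergeom_sf(count, N, pop_counts[term], n)
            for term, count in study_counts.items()}


# Toy example: 10 genes, 3 of which carry the hypothetical term "T1",
# and the study set happens to contain exactly those 3 genes.
population = [f"g{i}" for i in range(1, 11)]
gene_to_terms = {"g1": {"T1"}, "g2": {"T1"}, "g3": {"T1"}, "g4": {"T2"}}
result = enrichment(["g1", "g2", "g3"], population, gene_to_terms)
print(result)
```

A small p-value for a term suggests that the study set contains more genes annotated to that term than would be expected by chance, given the background population.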

The GO and the Alliance of Genome Resources

In 2016, the GO knowledgebase partnered with model organism databases (MODs) to form the Alliance of Genome Resources (the Alliance; www.alliancegenome.org). The mission of the Alliance is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease. The partner MODs are: FlyBase, Mouse Genome Database (MGI), Rat Genome Database (RGD), Saccharomyces Genome Database (SGD), WormBase, Xenbase, and Zebrafish Information Network (ZFIN).

The GO and the Global Biodata Coalition

The Global Biodata Coalition (GBC; globalbiodata.org), founded in 2019, is a forum working to ensure the efficient management and growth of biodata infrastructure by coordinating funding at the global level. GO has been a Global Core Biodata Resource (GCBR) since the first set was selected in December 2022. Among other criteria, GCBR selection is based on a resource's status as an authoritative database or knowledgebase that is used extensively, has proven longevity, and provides free and open access to its high-quality data. For more information and to view the full list of GCBRs, visit the GBC Global Core Biodata Resource page.

Funding

The GO Consortium is funded by the National Human Genome Research Institute (US National Institutes of Health), grant number HG012212, with co-funding by NIGMS.

Further reading about the Gene Ontology knowledgebase

For further guidance and reading, please see the following publications: