Michael Ashburner, PhD, ScD


Ontologies for Biologists - A Community Model for the Annotation of Genomic Data

The representation of biological knowledge in databases is a necessity for modern biomedical research. Historically, there has been very little collaboration or coordination between different database providers. Although many grandiose schemes for the "integration" of biological databases have been proposed over the years, none has proved practical enough to be implemented. Yet the need for integration remains: many biologists, both those at the bench and those who analyse data computationally, wish to combine data from diverse sources. Some seven years ago, the Gene Ontology Consortium (GOC) began to develop a resource that both the model organism databases (e.g. FlyBase, WormBase, Mouse Genome Database, The Arabidopsis Information Resource) and the large "horizontal" databases (e.g. UniProt, GeneDB, TIGR Gene Index) could use as a standard for the annotation of gene products. The GOC now maintains several structured controlled vocabularies for this purpose. The first three describe gene products with respect to three domains: their molecular function, their cellular location and the biological processes in which they are involved. These vocabularies, comprising nearly 18,000 terms, are now used to annotate the gene products of all of the major experimental eukaryotes and many prokaryotes.
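What makes a vocabulary "structured" is that its terms are linked by relationships, so a term may have several more general parents. An annotation to a specific term therefore also implies annotation to all of that term's ancestors; GO calls this the "true path rule". The following is a minimal Python sketch of that idea. It assumes a toy vocabulary with only is_a links. The EX: identifiers, the term and gene names, and the function names are all hypothetical and are not taken from GO.

```python
from collections import deque

# Hypothetical miniature "biological process" vocabulary.
# Each term maps to (name, list of is_a parent IDs). EX:5 has two parents,
# so the structure is a directed acyclic graph rather than a tree.
TERMS = {
    "EX:1": ("biological process", []),
    "EX:2": ("metabolic process", ["EX:1"]),
    "EX:3": ("catabolic process", ["EX:2"]),
    "EX:4": ("carbohydrate metabolic process", ["EX:2"]),
    "EX:5": ("carbohydrate catabolic process", ["EX:3", "EX:4"]),
}

# Hypothetical direct annotations: gene product -> set of term IDs.
ANNOTATIONS = {
    "geneA": {"EX:5"},
    "geneB": {"EX:4"},
}


def ancestors(term_id: str) -> set:
    """Return term_id plus every term reachable from it via is_a links."""
    seen = {term_id}
    queue = deque([term_id])
    while queue:
        current = queue.popleft()
        for parent in TERMS[current][1]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def genes_annotated_to(term_id: str) -> set:
    """Genes annotated to term_id directly or to any of its descendants."""
    return {
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term_id in ancestors(t) for t in terms)
    }


if __name__ == "__main__":
    print(sorted(ancestors("EX:5")))
    print(sorted(genes_annotated_to("EX:3")))
    print(sorted(genes_annotated_to("EX:2")))
```

A query for the general term EX:2 returns both genes, even though neither was annotated to it directly. This is the behaviour that lets annotations made at different levels of detail in different databases be compared.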

The philosophy of the GOC is now being extended to cover further domains of biological knowledge. Under the umbrella of "obo" (open biological ontologies), structured controlled vocabularies are now available, or are being developed, for sequence annotation, anatomies and development, cells and tissues, mouse pathology and experimental treatments.
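Ontologies distributed under the obo umbrella are commonly exchanged as plain text in the OBO flat-file format. In that format, each term is a [Term] stanza made up of "tag: value" lines. The Python sketch below reads such stanzas, assuming only simple tag-value lines. It skips the header, [Typedef] stanzas, trailing qualifiers and escape sequences, so it is not a complete OBO parser. The sample terms and their EX: identifiers are hypothetical.

```python
# Minimal reader for [Term] stanzas in an OBO-style flat file (a sketch,
# not a full parser). Sample identifiers below are invented for illustration.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000004
name: carbohydrate metabolic process
is_a: EX:0000002 ! metabolic process

[Term]
id: EX:0000005
name: carbohydrate catabolic process
is_a: EX:0000003 ! catabolic process
is_a: EX:0000004 ! carbohydrate metabolic process
"""


def parse_terms(text: str) -> dict:
    """Map each term id to a dict of tag -> list of values."""
    stanzas = []
    current = None  # tags of the [Term] stanza being read, or None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines or lines inside non-Term stanzas
        tag, value = line.split(":", 1)
        # Drop a trailing " ! comment", which conventionally repeats a name.
        value = value.split(" ! ", 1)[0].strip()
        current.setdefault(tag.strip(), []).append(value)
    return {s["id"][0]: s for s in stanzas if "id" in s}


if __name__ == "__main__":
    for term_id, tags in parse_terms(SAMPLE).items():
        print(term_id, tags["name"][0], tags.get("is_a", []))
```

Every tag maps to a list because tags such as is_a may repeat within a stanza. The resulting parent lists could be used to build a term graph like the one in the previous sketch.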

In this talk I will discuss how the concept of ontologies can be used for the intelligent design of database schemas and for the development of common tools for data exchange. I will also discuss some of the major limitations of the current models of data representation used by the GO Consortium, and proposals that would make the design of ontologies for shared use both more flexible and more powerful.



Michael Ashburner (1942-) is Professor of Biology at the University of Cambridge and is the former Joint-Head of the European Bioinformatics Institute (EBI).

He was educated at the Royal Grammar School, High Wycombe, and the University of Cambridge, where he received his undergraduate degree (1964) and Ph.D. (1968), both in genetics. He then went to the California Institute of Technology as a postdoctoral fellow with Herschel Mitchell. In 1979, he returned to the Department of Genetics in Cambridge, where he has been based ever since, successively as Assistant in Research, University Demonstrator, University Lecturer, Reader in Developmental Biology and, since 1991, Professor (ad hominem) of Biology. He has been Miller Professor at the University of California at Berkeley and a visiting professor at the University of California Medical School, San Francisco; the University of Crete, Greece; and the University of Pavia, Italy. From 1994 to 2001 he was first Research Coordinator and then Joint-Head of the European Molecular Biology Laboratory - European Bioinformatics Institute at Hinxton, Cambridge, during which time he was on 50% leave from the University of Cambridge.

His major research interests are now the structure and evolution of genomes. Most of his research has been with the model organism Drosophila melanogaster, about which he has written the standard research text (Drosophila: A Laboratory Handbook, Cold Spring Harbor Press, New York, 1989, 2nd ed. 2005). His research has covered a range of subjects, from classical genetics, developmental biology and cytogenetics to evolution, at both the molecular and organismal levels. He was a member of the consortium that recently sequenced the entire genome of this fly. He has had a strong interest in providing databases for biologists for about 15 years. He is a founder of FlyBase, a major database for researchers using Drosophila as a model organism, and of the Gene Ontology Consortium, a project to provide infrastructure for biological databases through a defined taxonomy of gene function. Ashburner is a Fellow of the Royal Society of London and of the Academia Europaea; he is a foreign honorary member of the American Academy of Arts and Sciences, a member of the European Molecular Biology Organization and a past president of the British Genetical Society. He is also a Professorial Fellow of Churchill College, Cambridge.

University of Cambridge
Department of Genetics
Downing Street, Cambridge, CB2 3EH, England

Telephone: +44 1223 333969