What Is a Genome Database?

Jillian O Keeffe

Last Modified Date: February 22, 2024

A genome is the complete set of genetic material present in one organism. Because the sequence and structure of this genetic material drive all of biological life, scientists are very interested in finding out what each part of it does. A genome database is a cross-referenced collection of information about the genomes of one or more organisms, so that a scientist can consult all the available genetic information to help in his or her research.

Genomes are highly complex and can contain billions of bases in their sequences. Computerized databases are therefore the only practical way of organizing the details in one place. Generally, these are available online for scientific research. A relatively new field of science, called bioinformatics, has developed to improve the way biological data can be interpreted through computer systems.

A genome database contains the sequence of the genes of an organism if the entire sequence is known. Otherwise, it can contain partial sequences. The human, mouse, and Drosophila fruit fly genomes have been sequenced, for example. When the sequence of a genome is known, geneticists can identify particular genes in the genome. Each gene is the instruction sheet for one particular cell product.

If a gene has a mutation, its sequence differs from that of the normal, functional gene. Mutations can be beneficial and produce a useful characteristic in the mutated organism. They can also make no difference to the product, or they can be detrimental to the normal workings of the organism. Many medical conditions, for example, are due to mutations in a particular gene.

Mutations can also be used to estimate how closely related one species is to another. Because mutations accumulate over time, more differences between two sequences generally indicate a more distant relationship. Individuals within a species can also vary in genome sequence, especially as large parts of the genome are not genes and do not code for any essential cell product. A genome database holds a sequence from an organism that is designated as a standard, but there will be many minor differences between the arbitrarily chosen standard and the other individuals in a species.
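The idea of counting differences against a standard sequence can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned and of equal length, and the sequences themselves are made up for illustration. Real comparisons rely on proper sequence alignment and evolutionary models, so this is only a simplified picture.

```python
def differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Made-up sequences, not real genome data.
reference = "ATGGCCATTGTAATGG"  # the designated standard
close = "ATGGCCATCGTAATGG"      # a closely related sequence
distant = "ATGACCATCGTTATGC"    # a more distantly related sequence

print(len(differences(reference, close)))    # 1
print(len(differences(reference, distant)))  # 4
```

Under this simplification, the sequence with fewer differences from the standard would be treated as the closer relative.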

Despite the presence of many differences, genes are recognizable by their sequences. If geneticists know what a particular gene does in one organism, then a gene with a similar sequence in another organism most likely performs the same function. Geneticists can use a genome database either to identify a gene that they are studying or to find out what the gene does.

Each genome database is searchable. Usually, a scientist can search a database in one of several different ways. Commonly, he or she can input the sequence of a gene that he or she has sequenced. The database then finds one or more similar sequences for comparison.
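A sequence search of this kind can be sketched in Python as a toy example. The version below ranks stored sequences by how many short subsequences (k-mers) they share with the query, scored with Jaccard similarity. The entry names and sequences are hypothetical, and this is not how any particular database works: production tools such as BLAST use far more sophisticated alignment heuristics.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, float]]:
    """Rank database entries by Jaccard similarity of their k-mer sets."""
    q = kmers(query, k)
    results = []
    for name, seq in database.items():
        t = kmers(seq, k)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        results.append((name, score))
    # Highest-scoring (most similar) entries come first.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical entries with made-up sequences.
database = {
    "gene_A": "ATGGCCATTGTAATGGGCC",
    "gene_B": "ATGGCCATCGTAATGGGCC",
    "gene_C": "TTTTAAAACCCCGGGGTTT",
}

hits = search("ATGGCCATTGTAATGGGCC", database)
for name, score in hits:
    print(name, round(score, 2))
```

Here the query is identical to `gene_A`, so that entry ranks first, followed by the near-identical `gene_B`; `gene_C` shares no 4-mers with the query and ranks last.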

A simpler way of searching the database involves looking for a gene keyword, such as the name of the gene. Authorities such as the U.S. National Center for Biotechnology Information (NCBI) can give sequences distinct reference numbers, and a geneticist can also search a genome database using one of these identifiers. He or she can also narrow down the results using more search parameters. Cross-referenced information is a feature of most genome databases, and a single sequence result will also furnish the database user with useful links to more genetic information. As well as information about a specific sequence, many databases provide a visual representation of the sequence and of the notable features of that area.
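Keyword and identifier searches can be pictured with a small in-memory sketch in Python. Everything here is an assumption made for illustration: the `Record` fields, the accession numbers such as `ACC0001`, and the placeholder sequences are invented and do not follow any real database's format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    accession: str  # hypothetical reference number
    gene: str       # gene name used for keyword search
    organism: str
    sequence: str   # placeholder, not real sequence data


RECORDS = [
    Record("ACC0001", "hemoglobin beta", "human", "ATGGTGCAC"),
    Record("ACC0002", "hemoglobin beta", "mouse", "ATGGTGCAT"),
    Record("ACC0003", "insulin", "human", "ATGGCCCTG"),
]


def by_accession(accession: str) -> Optional[Record]:
    """Look up a single record by its unique identifier."""
    for record in RECORDS:
        if record.accession == accession:
            return record
    return None


def by_keyword(keyword: str, organism: Optional[str] = None) -> list[Record]:
    """Find records whose gene name contains the keyword.

    The optional organism parameter narrows the results further.
    """
    matches = [r for r in RECORDS if keyword.lower() in r.gene.lower()]
    if organism is not None:
        matches = [r for r in matches if r.organism == organism]
    return matches


print([r.accession for r in by_keyword("hemoglobin")])
print([r.accession for r in by_keyword("hemoglobin", organism="human")])
```

The identifier lookup returns at most one record, while the keyword search may return several until extra parameters narrow it down.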

Some organisms have their own specific genome databases, but some larger databases cover more than one species. Various authorities maintain the different databases available, so the databases can use distinct formats and search capabilities. Examples of these authorities include the NCBI, the European Bioinformatics Institute, and even individual universities.