Raw data in results out: the database pasta maker

Secondary databases

In this Step, you will learn about secondary biological databases, building on the primary databases discussed in Week 1.

Biological databases are centralised resources that contain representations of DNA and protein sequences and their associated information. Primary databases store data and make it available to the public, acting as repositories. Secondary databases use the publicly available sequence data in primary databases to add layers of information to DNA or protein sequence data.

We already discussed the primary databases, or repositories, for nucleotide sequences, namely GenBank (NCBI), ENA (EMBL-EBI) and DDBJ, in Week 1. The role of primary databases is not restricted to nucleotide sequences: protein sequences and other types of data can also be submitted to some primary databases. Two examples that bioinformaticians use regularly are: (i) the Worldwide Protein Data Bank (wwPDB), a resource where three-dimensional structural data for proteins (and also nucleic acids) can be deposited and made publicly available; and (ii) UniProt, a primary database for protein sequences and functional annotation based on experimental evidence, which we will discuss in the next Step.

Secondary databases comprise data derived from analysing entries in primary databases. In most cases, they also provide tools to investigate genes and proteins further. They work by analysing pre-existing data (for example, all protein sequences ever submitted, or the conceptual translation of all nucleotide sequences) and, where possible, collecting information about the function of each sequence alongside it. Secondary databases overlay additional information, commonly derived from their own analyses, that highlights a particular characteristic of the protein or sequence, for example the presence of an enzyme catalytic site or a site of protein modification. Many secondary databases are applied to protein sequences rather than nucleotide sequences, and a few examples are given in the next Steps.
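To make the idea of a derived "layer" a little more concrete, here is a minimal Python sketch. It assumes the nucleotide entry is already a clean coding sequence that starts in reading frame 1. It performs a conceptual translation using the standard genetic code, then overlays one feature by scanning the protein with the PROSITE N-glycosylation pattern (PS00001, written N-{P}-[ST]-{P}) as an example of a protein modification site. The function names `translate` and `annotate` are hypothetical illustrations, not part of any database's interface, and real secondary databases use curated methods and much richer models than a single pattern.

```python
import re

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# PROSITE PS00001 (N-glycosylation site) pattern N-{P}-[ST]-{P} written as a regex.
# The zero-width lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def translate(dna: str) -> str:
    """Translate frame 1 of a coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases (e.g. 'N') are not in the table and become 'X'.
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def annotate(dna: str) -> dict:
    """Derive a protein from a nucleotide entry and overlay predicted feature sites."""
    protein = translate(dna)
    # Report 1-based positions of the asparagine (N) in each matching site.
    sites = [m.start() + 1 for m in N_GLYCOSYLATION.finditer(protein)]
    return {"protein": protein, "n_glycosylation_sites": sites}


if __name__ == "__main__":
    record = annotate("ATGAAAAATGGTTCTGCTCTGTAA")
    print(record)  # {'protein': 'MKNGSAL', 'n_glycosylation_sites': [3]}
```

Note that a short pattern like this is only a prediction: it will also match many sequences where no modification actually takes place, which is one reason secondary databases combine such signatures with curated information about function.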


This article is from the free online course:

Bacterial Genomes: From DNA to Protein Function Using Bioinformatics

Wellcome Genome Campus Advanced Courses and Scientific Conferences