
Secondary databases

An introduction to secondary biological databases.
Figure: Raw data in, results out (depiction of a blue database pasta maker)
© Francesca Short BY-CC
In this Step, you will learn about biological databases, building on the primary databases that were discussed in Week 1.

Biological databases are centralised resources that contain representations of DNA and protein sequences and their associated information. Primary databases act as repositories: they store data and make it available to the public. Secondary databases use the publicly available sequence data in primary databases to add layers of information to DNA or protein sequences.

We already discussed primary databases, or repositories, for nucleotide sequences in Week 1, namely GenBank (NCBI), ENA (EMBL-EBI) and DDBJ. The role of primary databases is not restricted to nucleotide sequences: protein sequences and other types of data can also be submitted to some primary databases. Two examples that bioinformaticians use regularly are (i) the Worldwide Protein Data Bank, a resource where three-dimensional structural data for proteins (and also nucleic acids) can be deposited and made publicly available; and (ii) UniProt, a primary database for protein sequences and functional annotation based on experimental evidence, which we will discuss in the next Step.

Secondary databases comprise data derived from analysing entries in primary databases. In most cases, they also provide tools for investigating genes and proteins further. They work by analysing pre-existing data (for example, all protein sequences ever submitted, or the conceptual translation of all nucleotide sequences) and, where possible, collecting information about the function of each sequence alongside it. Secondary databases then overlay additional information, commonly derived from their own analysis, that highlights a particular characteristic of the protein or sequence, for example the occurrence of an enzymatic catalytic site or a site for a protein modification. Many secondary databases focus on protein sequences rather than nucleotide sequences, and a few examples are given in the next Steps.
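
To make the idea of overlaying derived information more concrete, here is a minimal sketch in Python. It is only an illustration and not how any real secondary database is implemented. It scans a couple of protein sequences for the N-glycosylation motif, written in PROSITE notation as N-{P}-[ST]-{P}, and stores the hits against each accession. The accessions, the sequences and the names `primary_records`, `annotate` and `secondary_db` are all invented for this example.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} (an N-glycosylation motif) as a regex.
# The lookahead wrapper means overlapping matches are reported too.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

# Hypothetical stand-in for entries taken from a primary database:
# accession -> protein sequence. Both are invented for illustration.
primary_records = {
    "DEMO0001": "MKTNGSAVLLPNPSA",
    "DEMO0002": "MAAGLLKKV",
}


def annotate(records):
    """Build a toy 'secondary database': accession -> list of motif hits."""
    secondary = {}
    for accession, sequence in records.items():
        hits = []
        for match in MOTIF.finditer(sequence):
            hits.append({
                "start": match.start() + 1,  # 1-based residue position
                "site": match.group(1),      # the four matching residues
            })
        secondary[accession] = hits
    return secondary


secondary_db = annotate(primary_records)
for accession, hits in secondary_db.items():
    print(accession, hits)
```

In this toy run, DEMO0001 gains one annotation (the site NGSA starting at residue 4), while DEMO0002 has none. The original sequences are left unchanged; the derived layer simply sits alongside them, which is the relationship between primary and secondary databases described above.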

© Wellcome Genome Campus Advanced Courses and Scientific Conferences
This article is from the free online

Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics