
Secondary databases

An introduction to secondary biological databases.
Image: Raw data in, results out (a blue database depicted as a pasta maker)
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

In this Step, you will learn about secondary biological databases, building on the primary databases that were discussed in Week 1.

Biological databases are centralised resources that contain representations of DNA and protein sequences and their associated information. Primary databases store data and make it available to the public, acting as repositories. Secondary databases use the publicly available sequence data in primary databases to add layers of information to DNA or protein sequence data.

In Week 1 we discussed primary databases, or repositories, for nucleotide sequences, namely GenBank (NCBI), ENA (EMBL-EBI) and DDBJ. The role of primary databases is not restricted to nucleotide sequences: protein sequences and other types of data can also be submitted to some primary databases. Two examples that bioinformaticians use regularly are: (i) the Worldwide Protein Data Bank (wwPDB), a resource where three-dimensional structural data for proteins (and also nucleic acids) can be deposited and made publicly available; and (ii) UniProt, a primary database for protein sequences and functional annotation based on experimental evidence, which we will discuss in the next Step.

Secondary databases comprise data derived from analysing entries in primary databases. In most cases, they also provide tools to investigate the genes and proteins further. They work by analysing pre-existing data (for example, all protein sequences ever submitted, or the conceptual translation of all nucleotide sequences) and, where possible, collecting information about the function of each sequence alongside it. Secondary databases then overlay additional information, commonly derived from their own analysis, that highlights a particular characteristic of the protein or sequence, for example the occurrence of an enzymatic catalytic site or a site for a protein modification. Many secondary databases focus on protein sequences rather than nucleotide sequences; a few examples are given in the next Steps.
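To make the idea of "analysing pre-existing data and overlaying information" more concrete, here is a toy sketch in Python. It conceptually translates a nucleotide sequence into protein using the standard genetic code (NCBI translation table 1, first reading frame only), then scans the protein for one kind of modification site: the N-glycosylation pattern N-{P}-[ST]-{P} from the PROSITE entry PS00001, written as a regular expression. The function names `translate` and `annotate_sites` are hypothetical, chosen for illustration. This is not how any particular secondary database is implemented, only a minimal illustration under these assumptions.

```python
import re

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Regular-expression form of the PROSITE pattern PS00001 (N-glycosylation site):
# N-{P}-[ST]-{P}. The lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def translate(dna: str) -> str:
    """Conceptually translate a coding sequence (reading frame 1) into protein.

    Translation stops at the first stop codon. An incomplete trailing codon is
    ignored; codons containing ambiguous bases (e.g. N) are rendered as 'X'.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def annotate_sites(protein: str) -> list[tuple[int, str]]:
    """Overlay a layer of annotation: (1-based start, matched residues) per site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]


if __name__ == "__main__":
    protein = translate("ATGAACGTTTCAGGCTAA")
    print(protein)                  # MNVSG
    print(annotate_sites(protein))  # [(2, 'NVSG')]
```

The made-up sequence above translates to `MNVSG`, and the scan reports one candidate site starting at residue 2. Note that a pattern match only flags a *potential* site; it says nothing about whether the residue is actually modified in the cell.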

This article is from the free online course Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics.
