Primary Databases

In this article, you will learn about primary databases and their importance in storing and making sequence data available. Primary databases (also known as data repositories) are highly organised, user-friendly gateways to the huge amount of biological data produced by researchers around the world.
Primary Databases diagram with INSDC in the centre, DDBJ, ENA/EBI and NCBI in an outer ring
© EMBL-EBI 2018

Primary Database Development

The primary databases were first developed in the 1980s and 1990s to store experimentally determined DNA and protein sequences. At that time, proteins were sequenced one amino acid at a time and DNA sequencing was in its infancy, so the repositories contained a limited number of sequences. With the arrival of automated DNA sequencing, however, these data banks started to grow exponentially.

Nowadays, sequence submissions are made by individual laboratories, as well as “in bulk” by sequencing centres around the world, and DNA submissions greatly outnumber protein sequence submissions. Most protein sequences found in databases are the product of conceptual translation of genes and genomes determined by DNA sequencing.
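To make "conceptual translation" concrete, here is a minimal Python sketch that reads a DNA coding sequence three bases at a time and looks each codon up in the standard genetic code. The function name `translate` and the example sequence are illustrative assumptions, not part of any database's tooling. Real annotation pipelines also handle reading frames, reverse strands and alternative start codons, which this sketch ignores.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (hypothetical helper).

    Reads from the first base in frame, stops at the first stop codon,
    and writes 'X' for any codon containing an unrecognised base.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up 12-base sequence: ATG GCC AAA TAA
print(translate("ATGGCCAAATAA"))  # MAK
```

No protein was sequenced in the laboratory here: the amino acid sequence is inferred entirely from the DNA, which is how most protein entries in the databases are produced.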

The Primary Databases

There are three nucleotide repositories or primary databases for the submission of nucleotide and genome sequences:

  • GenBank, hosted by the National Center for Biotechnology Information (NCBI).
  • The European Nucleotide Archive (ENA), hosted by the European Bioinformatics Institute (EMBL-EBI), part of the European Molecular Biology Laboratory (EMBL).
  • The DNA Data Bank of Japan (DDBJ), hosted by the National Institute of Genetics (NIG).

Together they form the International Nucleotide Sequence Database Collaboration (INSDC) and, luckily for users, they all “mirror” each other by routinely exchanging their data. This means that irrespective of where a sequence is submitted, the entry will appear in all three databases.

Primary Databases Free to Access

Once data are deposited in primary databases, they can be accessed freely by anyone around the world. For example, suppose researchers are working on a Staphylococcus aureus strain that was isolated from a patient. After some investigation, they suspect that this strain might be genetically different from previously identified strains. They decide to sequence it and, after comparing its sequence with the DNA sequences already deposited in the public repository (the “known” strains), they conclude that their strain is indeed different. The research community will benefit from having this new sequence in the public repository: the next time researchers find the same strain, they will be able to tell whether their isolate is a novel one or is related to strains previously sequenced.
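The comparison step in this example can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes the sequences are already aligned and of equal length, and the strain names and sequences are made up. In practice, researchers would use a dedicated alignment or search tool against the public repositories rather than a position-by-position count.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def closest_strain(query: str, known: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the known strain most similar to the query."""
    best = max(known, key=lambda name: percent_identity(query, known[name]))
    return best, percent_identity(query, known[best])


# Hypothetical data standing in for sequences from a public repository.
known_strains = {
    "strain_A": "ATGGCCAAAT",
    "strain_B": "ATGGCGAAAT",
}
new_isolate = "ATGGCGAAAC"

name, identity = closest_strain(new_isolate, known_strains)
print(name, identity)  # strain_B 90.0
```

An identity below 100% against every known strain, as here, is the kind of result that would lead the researchers to treat their isolate as different and deposit its sequence.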

The accumulation of collective knowledge in public databases enables rapid and efficient access to data by individuals and institutions. The rapid identification of a virulent strain of a microbial pathogen from its sequence, together with the sharing of results and experiences among researchers and clinicians, could help put restrictions in place to prevent the pathogen from spreading in the community. In other situations, the correct identification of the disease-causing pathogen can guide the choice of antibiotics, enabling a better and quicker resolution of the disease.

© Wellcome Genome Campus Advanced Courses and Scientific Conferences
This article is from the free online course Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics.