Getting started and glossary

Getting Started - Introducing the Glossary and Key Terms that we will be using in the course.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences
Welcome to ‘From DNA to Protein Function Using Bioinformatics’. In this step, we introduce the course content and share some useful resources, such as the Glossary, to help your learning go smoothly.

Glossary

We start by summarising some of the key terms that you will come across during the course. The definitions are given in the context of bioinformatics. Whenever one of these terms appears, we will link you back to its definition. The Glossary is here for your reference and you do not need to read it all now. There is also a PDF version of this glossary in the downloads section at the bottom of this step.

Algorithm.

A finite set of well-defined steps for solving a problem or carrying out a task. A computer program is an implementation of one or more algorithms.

Accession number.

A unique identifier, often a combination of letters and numbers, that is permanently assigned to an entry in a database. The entry could be a DNA or protein sequence, or another type of molecule. Accession numbers can also be assigned to experiments in databases. Accession numbers are stable over time.

Conceptual translation.

The process of predicting the amino acid sequence of a polypeptide from the nucleotide sequence of its DNA or mRNA. The prediction is guided by the genetic code.
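As a minimal sketch of conceptual translation, the Python function below reads a coding sequence three nucleotides at a time and looks each codon up in the standard genetic code. The function name `translate` and the choices to read only the first reading frame, stop at the first stop codon, and report unrecognised codons as `X` are assumptions made for illustration; they are not part of the course material.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a DNA or mRNA coding sequence in reading frame 1.

    Translation stops at the first stop codon. Codons that are not in
    the table (for example, those containing N) are reported as 'X'.
    """
    dna = sequence.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # prints "MA"
```

A real tool would usually translate all six reading frames and let you choose a genetic code table suited to the organism.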

Homology annotation.

In bioinformatics, this term refers to the use of evolutionary conservation as a basis for extrapolating functional characteristics from one gene or protein to another.

Score (in BLAST).

This parameter describes how good the alignment between the query and the subject is. It depends on the number of “good” matches and “bad” matches (mismatches and gaps). The higher the score, the better the alignment is.
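To make the idea of a score concrete, here is a simplified Python sketch that scores two already-aligned nucleotide sequences. The reward and penalty values, the function name `raw_score` and the flat per-position gap penalty are illustrative assumptions. BLAST itself uses configurable scoring schemes, substitution matrices for proteins, and separate penalties for opening and extending a gap.

```python
def raw_score(query: str, subject: str,
              match: int = 1, mismatch: int = -2, gap: int = -3) -> int:
    """Score two aligned sequences of equal length ('-' marks a gap).

    The default values are illustrative only.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            score += gap        # penalise a gapped position
        elif q == s:
            score += match      # reward a "good" match
        else:
            score += mismatch   # penalise a "bad" match
    return score


print(raw_score("ACGT-A", "ACCTTA"))  # prints -1
```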

Expected Value (E-value).

In sequence similarity searches, this parameter describes the number of hits with an equal or better score that would be expected by chance alone, given the length of the query sequence and the size of the database. The lower the E-value, the less likely it is that the alignment arose by chance, and so the more likely it is that the alignment reflects homology. Learn more about E-values in this BLAST help page and in this tutorial.
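The dependence on sequence length and database size can be illustrated with the Karlin-Altschul formula, E = K · m · n · e^(−λS), where m is the query length, n is the total database length and S is the raw score. The Python sketch below is a simplification: the values of `k` and `lam` depend on the scoring system, the numbers used here are placeholders, and BLAST additionally corrects the lengths for edge effects.

```python
import math


def expected_hits(score: float, query_len: int, db_len: int,
                  lam: float, k: float) -> float:
    """Karlin-Altschul estimate of the number of chance hits scoring >= score."""
    return k * query_len * db_len * math.exp(-lam * score)


# Placeholder parameters, chosen only to show the trends.
small_db = expected_hits(score=50, query_len=300, db_len=1_000_000, lam=0.3, k=0.1)
large_db = expected_hits(score=50, query_len=300, db_len=2_000_000, lam=0.3, k=0.1)
better = expected_hits(score=80, query_len=300, db_len=1_000_000, lam=0.3, k=0.1)

print(large_db / small_db)  # a database twice as large doubles the E-value
print(better < small_db)    # a higher score gives a lower E-value
```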

Percentage identity.

In BLAST results, this value represents the number of residues (amino acids or nucleotides) that match exactly at the same position between the query and the subject, expressed as a percentage of the length of the aligned region.
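As a small worked example, the Python sketch below computes percentage identity for two aligned sequences. It assumes the denominator is the length of the alignment, gapped positions included, which is how BLAST reports the value; the function name `percent_identity` is hypothetical.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    if not query:
        raise ValueError("alignment is empty")
    # A column counts as identical only if it is not a gap ('-').
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * identical / len(query)


print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # prints 66.7
```

Here 4 of the 6 alignment columns are identical, giving about 66.7%.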

Primary database.

A resource to which researchers can submit experimentally derived data, often sequenced DNA or mRNA, to be archived and made available to the wider community. Other primary databases archive the three-dimensional structures of proteins. For more information on databases from the European Bioinformatics Institute see here.

Secondary database.

A resource in which entries from primary databases are processed computationally to derive new information from them (for example, the prediction of protein topology). Secondary databases provide “digested” information. More on databases from the European Bioinformatics Institute here.

Conserved domain.

A part of a protein that, by assuming a defined three-dimensional structure, confers a given function on that protein. Proteins can have more than one conserved domain and, at the same time, one given conserved domain can appear in different proteins. The amino acid sequence of a conserved domain is less likely to change (i.e. it is more conserved) than sequences outside the domain, so its structure is better maintained throughout evolution. https://en.wikipedia.org/wiki/Protein_domain

Flat file.

A plain text file containing records with no structured interrelationship. The records themselves can have an internal structure. A database stored in this way is known as a flat file database.
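FASTA, a widely used plain-text sequence format, is one example of a flat file: each record has a header line starting with `>` followed by sequence lines, and records do not refer to one another. The Python sketch below reads such text into a dictionary. The record names `seq1` and `seq2` and the function name `parse_fasta` are made up for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical record
ATGGCC
TAA
>seq2 another hypothetical record
ATGAAATGA
"""
print(parse_fasta(example))
# prints {'seq1': 'ATGGCCTAA', 'seq2': 'ATGAAATGA'}
```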
This article is from the free online

Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics

Created by
FutureLearn - Learning For Life
