Genome Annotation Files

Learn more about genome annotation files.

In this article, we learn how information about genes and proteins found in genomes can be stored in files.

Genomic features found in a genome can be stored in what we call annotation files. These are text files that record information about the different features of the genome (genes and other regions of interest, such as promoters) in a form designed mainly to be read by software, although most annotation files can be read by humans too. Annotation files are not exclusive to genomic DNA; they can also be used to annotate single genes or single protein sequences. In the case of proteins, instead of marking genetic regions of interest, one can mark, for example, secondary structure regions or catalytic residues.

Typically, a genome annotation file records the location of each gene and the strand on which it is found, and sometimes it also includes functional annotation (that is, the putative function of that gene or its protein product). Often, genomes downloaded from public databases already contain annotation information together with the sequence data. This might be in GFF or EMBL format.
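To make these fields concrete, here is a minimal Python sketch of a single annotation record written out as one tab-separated GFF3-style line. The class name `GeneAnnotation`, the method `to_gff3_line` and the sequence name `chromosome` are assumptions made for illustration; the coordinates, gene name and product are those of the thrL entry in the sample shown in this article. The sketch does not escape reserved characters in attribute values, which a real GFF3 writer would need to do.

```python
from dataclasses import dataclass


@dataclass
class GeneAnnotation:
    """One annotated gene: where it is, which strand, and what it may do."""
    seqid: str    # name of the sequence the gene lies on (hypothetical here)
    start: int    # first base of the feature, counting from 1
    end: int      # last base of the feature, inclusive
    strand: str   # "+" for the forward strand, "-" for the reverse strand
    gene: str     # gene name
    product: str  # functional annotation (putative product)

    def to_gff3_line(self):
        """Return the nine tab-separated columns of a GFF3 feature line."""
        attributes = f"Name={self.gene};product={self.product}"
        columns = [
            self.seqid,       # column 1: sequence ID
            ".",              # column 2: source (left undefined)
            "CDS",            # column 3: feature type
            str(self.start),  # column 4: start
            str(self.end),    # column 5: end
            ".",              # column 6: score (left undefined)
            self.strand,      # column 7: strand
            "0",              # column 8: phase (CDS begins on a codon boundary)
            attributes,       # column 9: attributes
        ]
        return "\t".join(columns)


thrL = GeneAnnotation("chromosome", 190, 255, "+", "thrL",
                      "thr operon leader peptide")
gff_line = thrL.to_gff3_line()
print(gff_line)
```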

Let’s have a look at a section of an annotation.

This content is taken from the Wellcome Connecting Science online course Bacterial Genomes II: Accessing and Analysing Microbial Genome Data Using Artemis.

The St.tab file, when opened in a text editor, looks something like this:

FT   CDS             190..255
FT                   /blastp_file="../old_whole_genome/blastp/St.tab.seq.00001.out"
FT                   /class="3.1.18"
FT                   /colour=7
FT                   /ec_orthologue="LPT_ECOLI"
FT                   /fasta_file="../old_whole_genome/fasta/St.tab.seq.00001.out"
FT                   /gene="STY0001"
FT                   /gene="thrL"
FT                   /hth_file="../old_whole_genome/hth/CORBA-St.tab.seq.00001.out"
FT                   /note="Orthologue of E. coli thrL (LPT_ECOLI); Fasta hit
FT                   to LPT_ECOLI (21 aa), 86% identity in 21 aa overlap"
FT                   /product="thr operon leader peptide"
FT   CDS             337..2799
FT                   /blastp_file="../old_whole_genome/blastp/St.tab.seq.00002.out"
FT                   /class="3.1.18"
FT                   /colour=7
FT                   /ec_orthologue="AK1H_ECOLI"
FT                   /fasta_file="../old_whole_genome/fasta/St.tab.seq.00002.out"
FT                   /gene="STY0002"
FT                   /gene="thrA"
FT                   /hth_file="../old_whole_genome/hth/CORBA-St.tab.seq.00002.out"
FT                   /note="Orthologue of E. coli thrA (AK1H_ECOLI); Fasta hit
FT                   to AK1H_ECOLI (820 aa), 94% identity in 820 aa overlap"
FT                   /product="aspartokinase I/homoserine dehydrogenase I"

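Note that each CDS (coding sequence) feature is clearly marked, and that the numbers on the same line as the CDS label give its start and end positions in the genome (for example, 190..255 means bases 190 to 255). The lines beginning with a slash, such as /gene and /product, are qualifiers that describe the feature above them.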
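Because the layout is so regular, the entries are easy to read with a short script. The following Python sketch is one possible way to do it, not the way Artemis itself reads the file. The function name `parse_feature_table` and the dictionary layout are assumptions made for illustration, and the sketch handles only simple locations of the form start..end, optionally wrapped in complement(...), which in EMBL-style feature tables marks the reverse strand. The sample text is an abbreviated copy of the entries shown above.

```python
import re

# Body of a feature line, e.g. "CDS 190..255" or "CDS complement(190..255)".
FEATURE_RE = re.compile(r"^(\S+)\s+(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_feature_table(text):
    """Turn 'FT' lines into a list of feature dictionaries."""
    features = []
    last_qualifier = None  # name of the qualifier most recently seen
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        body = line[2:].strip()
        if not body:
            continue
        if body.startswith("/") and features:
            # Qualifier line such as /gene="thrL" or /colour=7.
            name, _, value = body[1:].partition("=")
            # A qualifier can occur more than once, so keep a list of values.
            features[-1]["qualifiers"].setdefault(name, []).append(value.strip('"'))
            last_qualifier = name
            continue
        match = FEATURE_RE.match(body)
        if match:
            key, complement, start, end = match.groups()
            features.append({
                "key": key,
                "start": int(start),
                "end": int(end),
                "strand": "-" if complement else "+",
                "qualifiers": {},
            })
            last_qualifier = None
        elif last_qualifier and features:
            # Continuation of a value that wraps over several lines.
            values = features[-1]["qualifiers"][last_qualifier]
            values[-1] = values[-1] + " " + body.rstrip('"')
    return features


SAMPLE = '''FT CDS 190..255
FT /colour=7
FT /gene="STY0001"
FT /gene="thrL"
FT /note="Orthologue of E. coli thrL (LPT_ECOLI); Fasta hit
FT to LPT_ECOLI (21 aa), 86% identity in 21 aa overlap"
FT /product="thr operon leader peptide"
FT CDS 337..2799
FT /gene="thrA"
'''

features = parse_feature_table(SAMPLE)
for feature in features:
    length = feature["end"] - feature["start"] + 1
    print(feature["key"], feature["start"], feature["end"],
          feature["strand"], length, feature["qualifiers"]["gene"])
```

For the first entry the computed length is 66 bases, that is 22 codons, which agrees with the 21 amino acid peptide mentioned in its note plus a stop codon.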

It is important to note that FASTA, EMBL, GenBank and similar files are essentially text files with specific formatting. This means that the file name extension (that is, the .fa or .embl we add at the end of file names) doesn't need to be .fasta or .embl; it could be .txt, and Artemis will still be able to read the file, as long as the text it contains is correctly formatted.
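The idea that the content, not the extension, defines the format can be illustrated with a small Python sketch that guesses the format from the first non-blank line of a file. This is not how Artemis detects formats; the function name `guess_format` and the returned labels are assumptions made for illustration, and only a handful of well-known line prefixes are checked.

```python
def guess_format(text):
    """Guess a sequence or annotation format from the first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"            # FASTA records begin with a ">" header
        if line.startswith("LOCUS"):
            return "genbank"          # GenBank entries begin with a LOCUS line
        if line.startswith("ID "):
            return "embl"             # EMBL entries begin with an ID line
        if line.startswith("FT "):
            return "feature table"    # bare feature table, as in the .tab file
        if line.startswith("##gff-version"):
            return "gff"              # GFF files begin with a version pragma
        return "unknown"
    return "unknown"


# The file name plays no part in the guess, so a .txt file works just as well.
fasta_guess = guess_format(">example sequence\nACGTACGT\n")
table_guess = guess_format('FT CDS 190..255\nFT /gene="thrL"\n')
print(fasta_guess, "|", table_guess)
```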

You can download the full file here (we recommend using Chrome or Firefox to download data files): ftp://ftp.sanger.ac.uk/pub/resources/coursesandconferences/Online_Courses/Course3/data/S_typhi.tab

You may need to copy and paste the link into your internet browser.
