Genomes in chunks

Using multi-FASTA files.
A photo of two oranges with the one in front split into a half and two quarters.  The split orange's flesh is facing forwards.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

In this Step we will learn how to represent large multi-piece genomes in one file.

As is the case for genes and proteins, whole genome sequences can also be stored in FASTA format. A fully sequenced genome consisting of a single chromosome (as is the case for many bacterial genomes) can be represented in a FASTA file containing one entry: a header line beginning with “>”, followed by the full genome sequence on the lines below it. More often than not, however, a genome is known only in chunks (that is to say, gaps of unknown size remain between the sequenced pieces), or it has more than one chromosome. In these cases, the genome sequence is stored in a multi-FASTA file.

Multi-FASTA files have one FASTA entry for each chunk (chromosome or scaffold) of DNA. A mock example of a section of a multi-FASTA file is shown below.

>Futuris learnis bacterium - Chr1
TGGATTCGCACTCCTCCAGCTTATAGACCACCAAATGCCCCTATCCTATCAACACTTCCG
GAGACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGC
AGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAG
TATTCCTTGGACTCATAAGGTGGGGAACTTTACTGGGCTTTATTCTTCTACTGTACCTGT
CTTTAATCCTCATTGGAAAACACCATCTTTTCCTAATATACATTTACACCAAGACATTAT
CAAAAAATGTGAACAGTTTGTAGGCCCACTCACAGTTAATGAGAAAAGAAGATTGCAATT
GATTATGCCTGCCAGGTTTTATCCAAAGGTTACCAAATATTTACCATTGGATAAGGGTAT
TAAACCTTATTATCCAGAACATCTAGTTAATCATTACTTCCAAACTAGACACTATTTACA
CACTCTATGGAAGGCGGGTATATTATATAAGAGAGAAACAACACATAGCGCCTCATTTTG
TGGGTCACCATA
>Futuris learnis bacterium - Chr2
TATGGTGACCCACAAAATGAGGCGCTATGTGTTGTTTCTCTCTTATATAATATACCCGCC
TTCCATAGAGTGTGTAAATAGTGTCTAGTTTGGAAGTAATGATTAACTAGATGTTCTGGA
TAATAAGGTTTAATACCCTTATCCAATGGTAAATATTTGGTAACCTTTGGATAAAACCTG
GCAGGCATAATCAATTGCAATCTTCTTTTCTCATTAACTGTGAGTGGGCCTACAAACTGT
TCACATTTTTTGATAATGTCTTGGTGTAAATGTATATTAGGAAAAGATGGTGTTTTCCAA
TGAGGATTAAAGACAGGTACAGTAGAAGAATAAAGCCCAGTAAAGTTCCCCACCTTATGA
GTCCAAGGAATACTAACATTGAGATTCCCGAGATTGAGATCTTCTGCGACGCGGCGATTG
AGACCTTCGTCTGCGAGGCGAGGGAGTTCTTCTTCTAGGGGACCTGCCTCGTCGTCTAAC
AACAGTAGTCTCCGGAAGTGTTGATAGGATAGGGGCATTTGGTGGTCTATAAGCTGGAGG
AGTGCGAATCCA

(Please note that this is a dummy example: real bacterial chromosomes are much larger than what is shown here!)
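To make the structure of a multi-FASTA file concrete, here is a minimal Python sketch of how such a file could be read. It is an illustration only, not the method used by Artemis or any particular tool. The function name `read_fasta` is our own, and the input is a shortened stand-in for the mock example above.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or file-like object."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines belonging to the current entry are joined together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Shortened stand-in for the mock multi-FASTA example (not real data).
mock_file = StringIO(
    ">Futuris learnis bacterium - Chr1\n"
    "TGGATTCGCACTCCTCCAGC\n"
    "TGGGTCACCATA\n"
    ">Futuris learnis bacterium - Chr2\n"
    "TATGGTGACCCA\n"
)

records = list(read_fasta(mock_file))
for header, sequence in records:
    print(header, len(sequence))
```

Each “>” line starts a new entry, and every line up to the next “>” belongs to that entry's sequence. This is why a single file can hold any number of chromosomes or scaffolds.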

Multi-FASTA files are not limited to storing genomic DNA from a single organism. As we established earlier, genes and proteins can also be stored as multi-FASTA, so it is not uncommon for DNA or protein sequences from different organisms to be stored in one file. For instance, if we want to gather all the sequences of a virulence protein from different bacteria, we can collect them in one multi-FASTA file.
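As a rough sketch of that idea, the Python snippet below combines several protein sequences into one multi-FASTA text. The organism labels and peptide sequences are invented placeholders, not real data, and the helper name `format_fasta` and its `width` parameter are our own choices for illustration.

```python
def format_fasta(entries, width=60):
    """Return multi-FASTA text for a list of (header, sequence) pairs."""
    lines = []
    for header, sequence in entries:
        lines.append(">" + header)
        # Break the sequence into fixed-width lines, as in the example above.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder entries: invented names and sequences, for illustration only.
entries = [
    ("hypothetical_virulence_protein organism_A", "MKKLLIAGSLAVLTSGCAS"),
    ("hypothetical_virulence_protein organism_B", "MKRLLVAGSLAALTSGCSS"),
    ("hypothetical_virulence_protein organism_C", "MKKLFIAGTLAVLSSGCAT"),
]

multi_fasta = format_fasta(entries, width=10)
print(multi_fasta)
```

A line width of 60 characters matches the mock example above; the short width of 10 is used here only so that the wrapping is visible with these very short placeholder sequences.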

This article is from the free online course Bacterial Genomes II: Accessing and Analysing Microbial Genome Data Using Artemis.

Created by
FutureLearn - Learning For Life
