Genomes in chunks

Using multi-FASTA files.
A photo of two oranges with the one in front split into a half and two quarters.  The split orange's flesh is facing forwards.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

In this Step we will learn how to represent large multi-piece genomes in one file.

As is the case for genes and proteins, whole genome sequences can also be stored in FASTA format. A fully sequenced genome consisting of only one chromosome (as is the case for many bacterial genomes) can be represented in a FASTA file that contains a single entry for the full genome sequence: a header line that starts with “>”, followed by the sequence on the next lines. More often than not, however, a genome is known only in chunks (that is to say, some gaps of unknown size are present) or it has more than one chromosome. In these cases, the genome sequence is stored in a multi-FASTA file.

Multi-FASTA files have one FASTA entry for each chunk (chromosome or scaffold) of DNA. A mock example of a section of a multi-FASTA file is shown below.

>Futuris learnis bacterium - Chr1
TGGATTCGCACTCCTCCAGCTTATAGACCACCAAATGCCCCTATCCTATCAACACTTCCG
GAGACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGC
AGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAG
TATTCCTTGGACTCATAAGGTGGGGAACTTTACTGGGCTTTATTCTTCTACTGTACCTGT
CTTTAATCCTCATTGGAAAACACCATCTTTTCCTAATATACATTTACACCAAGACATTAT
CAAAAAATGTGAACAGTTTGTAGGCCCACTCACAGTTAATGAGAAAAGAAGATTGCAATT
GATTATGCCTGCCAGGTTTTATCCAAAGGTTACCAAATATTTACCATTGGATAAGGGTAT
TAAACCTTATTATCCAGAACATCTAGTTAATCATTACTTCCAAACTAGACACTATTTACA
CACTCTATGGAAGGCGGGTATATTATATAAGAGAGAAACAACACATAGCGCCTCATTTTG
TGGGTCACCATA
>Futuris learnis bacterium - Chr2
TATGGTGACCCACAAAATGAGGCGCTATGTGTTGTTTCTCTCTTATATAATATACCCGCC
TTCCATAGAGTGTGTAAATAGTGTCTAGTTTGGAAGTAATGATTAACTAGATGTTCTGGA
TAATAAGGTTTAATACCCTTATCCAATGGTAAATATTTGGTAACCTTTGGATAAAACCTG
GCAGGCATAATCAATTGCAATCTTCTTTTCTCATTAACTGTGAGTGGGCCTACAAACTGT
TCACATTTTTTGATAATGTCTTGGTGTAAATGTATATTAGGAAAAGATGGTGTTTTCCAA
TGAGGATTAAAGACAGGTACAGTAGAAGAATAAAGCCCAGTAAAGTTCCCCACCTTATGA
GTCCAAGGAATACTAACATTGAGATTCCCGAGATTGAGATCTTCTGCGACGCGGCGATTG
AGACCTTCGTCTGCGAGGCGAGGGAGTTCTTCTTCTAGGGGACCTGCCTCGTCGTCTAAC
AACAGTAGTCTCCGGAAGTGTTGATAGGATAGGGGCATTTGGTGGTCTATAAGCTGGAGG
AGTGCGAATCCA

(Please note, this is a dummy example, and real bacterial chromosomes are much larger than what is represented here!)
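To make the structure of a multi-FASTA file concrete, here is a minimal sketch in Python of how such a file could be read entry by entry. It uses only the standard library. The function name `read_multi_fasta` and the shortened sequences are our own illustrative assumptions, not part of the course material or of any particular tool.

```python
import io


def read_multi_fasta(handle):
    """Yield (header, sequence) pairs from an open multi-FASTA text handle.

    Hypothetical helper for illustration: a new entry starts at every line
    beginning with ">", and all following lines up to the next ">" are
    joined into one sequence string.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous entry
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        # Emit the last entry in the file
        yield header, "".join(chunks)


# Shortened, made-up input in the style of the mock example
example = io.StringIO(
    ">Futuris learnis bacterium - Chr1\n"
    "TGGATTCGCA\n"
    "CTCCTCC\n"
    ">Futuris learnis bacterium - Chr2\n"
    "TATGGTGACC\n"
)

records = list(read_multi_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```

Each entry comes back as one header and one continuous sequence, no matter how many lines the sequence was wrapped over in the file. For real projects, an established library such as Biopython is usually a safer choice than a hand-written parser.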

Multi-FASTA files are not limited to storing genomic DNA from just one organism per file. Remember that we established earlier that genes and proteins can also be stored as multi-FASTA, so it is not uncommon for DNA or protein sequences from different organisms to be stored in one file. For instance, if we want to gather all sequences of a virulence protein from different bacteria, we could collect them in one multi-FASTA file.
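As a rough illustration of collecting sequences from several organisms in one file, the Python sketch below writes a list of entries as multi-FASTA text. The helper `write_multi_fasta`, the species labels and the short protein sequences are all invented for this example, and the line width is set to 10 only to keep the output small.

```python
import io


def write_multi_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to a handle as multi-FASTA text.

    Hypothetical helper for illustration: each entry gets a ">" header
    line, and its sequence is wrapped to `width` characters per line.
    """
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Made-up protein sequences standing in for one virulence protein
# found in three different bacteria
virulence_proteins = [
    ("mock_toxin | Species A", "MKTLLVAGSALLA"),
    ("mock_toxin | Species B", "MKTLIVAGSA"),
    ("mock_toxin | Species C", "MRTLLVA"),
]

buffer = io.StringIO()
write_multi_fasta(virulence_proteins, buffer, width=10)
fasta_text = buffer.getvalue()
print(fasta_text)
```

To write to disk instead of an in-memory buffer, the same function could be given a file opened with `open("proteins.fasta", "w")`, where the file name is again only a placeholder.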

This article is from the free online course Bacterial Genomes II: Accessing and Analysing Microbial Genome Data Using Artemis.

Created by
FutureLearn - Learning For Life