
## Wellcome Genome Campus Advanced Courses and Scientific Conferences

Bacterial genomes often consist of a single chromosome. However, in many cases we have to represent this molecule in several pieces.

# Genomes in chunks

In this Step we will learn how to represent large multi-piece genomes in one file.

As is the case for genes and proteins, whole genome sequences can also be stored in FASTA format. A fully sequenced genome that consists of a single chromosome (as is the case for many bacterial genomes) can be represented by a FASTA file containing one entry: a header line that starts with “>”, followed by the full genome sequence on the next line(s). More often than not, however, the genome is known only in chunks (that is, it contains gaps of unknown size), or it has more than one chromosome. In these cases, the genome sequence is stored in a multi-FASTA file.

Multi-FASTA files have one FASTA entry for each chunk (chromosome or scaffold) of DNA. A mock example of a multi-FASTA file is shown below.

```
>Futuris learnis bacterium - Chr1
TGGATTCGCACTCCTCCAGCTTATAGACCACCAAATGCCCCTATCCTATCAACACTTCCG
GAGACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGC
AGACGAAGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAG
TATTCCTTGGACTCATAAGGTGGGGAACTTTACTGGGCTTTATTCTTCTACTGTACCTGT
CTTTAATCCTCATTGGAAAACACCATCTTTTCCTAATATACATTTACACCAAGACATTAT
CAAAAAATGTGAACAGTTTGTAGGCCCACTCACAGTTAATGAGAAAAGAAGATTGCAATT
GATTATGCCTGCCAGGTTTTATCCAAAGGTTACCAAATATTTACCATTGGATAAGGGTAT
TAAACCTTATTATCCAGAACATCTAGTTAATCATTACTTCCAAACTAGACACTATTTACA
CACTCTATGGAAGGCGGGTATATTATATAAGAGAGAAACAACACATAGCGCCTCATTTTG
TGGGTCACCATA
>Futuris learnis bacterium - Chr2
TATGGTGACCCACAAAATGAGGCGCTATGTGTTGTTTCTCTCTTATATAATATACCCGCC
TTCCATAGAGTGTGTAAATAGTGTCTAGTTTGGAAGTAATGATTAACTAGATGTTCTGGA
TAATAAGGTTTAATACCCTTATCCAATGGTAAATATTTGGTAACCTTTGGATAAAACCTG
GCAGGCATAATCAATTGCAATCTTCTTTTCTCATTAACTGTGAGTGGGCCTACAAACTGT
TCACATTTTTTGATAATGTCTTGGTGTAAATGTATATTAGGAAAAGATGGTGTTTTCCAA
TGAGGATTAAAGACAGGTACAGTAGAAGAATAAAGCCCAGTAAAGTTCCCCACCTTATGA
GTCCAAGGAATACTAACATTGAGATTCCCGAGATTGAGATCTTCTGCGACGCGGCGATTG
AGACCTTCGTCTGCGAGGCGAGGGAGTTCTTCTTCTAGGGGACCTGCCTCGTCGTCTAAC
AACAGTAGTCTCCGGAAGTGTTGATAGGATAGGGGCATTTGGTGGTCTATAAGCTGGAGG
AGTGCGAATCCA
```


(Please note that this is a dummy example: real bacterial chromosomes are much larger than what is represented here, often several million base pairs long!)
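To see how a program tells the chunks apart, here is a minimal Python sketch of a multi-FASTA reader. It assumes only the structure described above: every line starting with “>” opens a new entry, and the lines that follow, up to the next “>”, are that entry's sequence. The function name `parse_multifasta` is hypothetical, and the input is a shortened excerpt of the mock example. For real data, an established library such as Biopython (`Bio.SeqIO.parse(handle, "fasta")`) is usually a safer choice.

```python
def parse_multifasta(text):
    """Split multi-FASTA text into a list of (header, sequence) pairs.

    The header is the text after '>' on a header line; the sequence is
    every following non-empty line joined together, up to the next '>'.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between or inside entries
        if line.startswith(">"):
            # A new header closes the previous entry, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    # Close the last entry in the file.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened excerpt of the mock example, wrapped at an arbitrary width.
example = """>Futuris learnis bacterium - Chr1
TGGATTCGCACTCCTCCAGC
TTATAG
>Futuris learnis bacterium - Chr2
TATGGTGACCCACAAAATGA
"""

for header, seq in parse_multifasta(example):
    print(f"{header}: {len(seq)} bases")
```

Because the sequence lines of each entry are joined before measuring, the reported length does not depend on how many characters each line holds.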

Multi-FASTA files are not limited to storing genomic DNA from just one organism per file. As we saw earlier, genes and proteins can also be stored as multi-FASTA, so it is not uncommon for DNA or protein sequences from different organisms to be stored in one file. For instance, if we want to gather all sequences of a virulence protein from different bacteria, we could collect them in one multi-FASTA file.
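Collecting sequences into one file amounts to writing one “>” header line per sequence, followed by that sequence. The Python sketch below assumes the sequences are already in memory as (header, sequence) pairs. The function name `to_multifasta`, the default 60-character line width, and the organism names and protein sequences are all hypothetical placeholders invented for illustration, not real virulence proteins or part of any standard tool.

```python
def to_multifasta(records, width=60):
    """Format (header, sequence) pairs as multi-FASTA text.

    Each entry becomes a header line starting with '>' followed by the
    sequence wrapped to `width` characters per line.
    """
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Slice the sequence into consecutive pieces of at most `width` characters.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical entries: names and sequences are invented placeholders.
virulence_proteins = [
    ("Organism_A hypothetical_virulence_protein", "MAKLTVEGHRQDLLSA"),
    ("Organism_B hypothetical_virulence_protein", "MAKITVEGHRQELLSAK"),
]

# A small width is used here only so that the line wrapping is visible.
print(to_multifasta(virulence_proteins, width=10), end="")
```

Writing the returned text to a file (for example with `open(path, "w")`) produces a single multi-FASTA file holding the sequences from both organisms.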