Comparison of two FASTA files
In the next exercise we will compare two Staphylococcus aureus genomes in ACT and investigate the number of differences, also called synteny breaks.
The first sequence we will download is the genome of S. aureus strain TW20, an antibiotic-resistant strain. Click here to download the GenBank entry.
We will compare this genome to that of an antibiotic-sensitive strain, S. aureus MSSA476. Click here to download the GenBank entry.
To download the FASTA file, go to Send to on the right-hand side of the GenBank record, choose Complete Record, destination File and as Format choose FASTA.
Files downloaded from GenBank in this way are always named sequence.fasta. Rename each file to something more meaningful as soon as you have downloaded it, for example TW20.fasta and MSSA476.fasta.
If you cannot download the files from a public repository you can also download them from the following FTP site. The BlastN comparison file called TW20_vs_MSSA476.txt can also be downloaded from this site: ftp://ftp.sanger.ac.uk/pub/resources/coursesandconferences/Online_Courses/Course4/Week1/Step_1.15/
You may need to copy and paste the link into your internet browser. We recommend using Chrome or Firefox to download the data files.
We will now open the files in ACT. Double-click the ACT icon. Once the small ACT window is open, choose the following three files and click Apply.
Sequence file 1: TW20.fasta
Comparison file: TW20_vs_MSSA476.txt
Sequence file 2: MSSA476.fasta
Now follow the steps we outlined in the previous section. Zoom out (marked with an arrow) to get an overview of the complete genomes. Move the slider in the comparison view panel (the middle panel, marked with a circle) all the way down to eliminate low-scoring similarities.
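The filtering step can also be mimicked outside ACT. The following is a minimal Python sketch, assuming the comparison file is in 12-column tab-separated BLAST tabular format, with the alignment length in column 4 and the bit score in column 12. That layout is an assumption about TW20_vs_MSSA476.txt, so check the file before relying on it. The function name filter_hits and the example rows are hypothetical, and this is only an offline analogue of the slider, not how ACT itself filters.

```python
import csv
import io


def filter_hits(lines, min_bitscore=0.0, min_length=0):
    """Keep hits with bit score >= min_bitscore and length >= min_length.

    Assumes 12-column BLAST tabular rows (an assumption, see above):
    column 4 is the alignment length, column 12 is the bit score.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[11]) >= min_bitscore and int(row[3]) >= min_length:
            kept.append(row)
    return kept


# Two made-up hits for illustration only (not real TW20/MSSA476 results).
example = io.StringIO(
    "seqA\tseqB\t99.5\t5000\t20\t1\t1\t5000\t1\t5000\t0.0\t9000\n"
    "seqA\tseqB\t85.0\t120\t15\t2\t7000\t7119\t8000\t8119\t1e-20\t95.0\n"
)
strong = filter_hits(example, min_bitscore=500)
print(len(strong))  # 1
```

To try it on the real file, pass an open file handle instead of the example, for instance open("TW20_vs_MSSA476.txt").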
Differences between the genomes are shown as white spaces in the comparison view panel. They are called synteny breaks.
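To make the idea of white space concrete, here is a small Python sketch that lists the stretches of one genome not covered by any match. It assumes you already have the match coordinates for that genome as (start, end) pairs, 1-based and inclusive, for example taken from the comparison file. The function name uncovered_regions, the min_gap parameter and the toy coordinates are all hypothetical, and the regions it reports are only an approximation of what you see as white space in ACT.

```python
def uncovered_regions(intervals, genome_length, min_gap=1):
    """Return 1-based inclusive (start, end) regions not covered by any interval.

    Intervals may be given in either orientation, e.g. (1500, 1200) for a
    match on the reverse strand; they are normalised before sorting.
    Gaps shorter than min_gap bases are ignored.
    """
    gaps = []
    next_uncovered = 1  # first position not yet known to be covered
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if start - next_uncovered >= min_gap:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    # Anything left over after the last match is also uncovered.
    if genome_length - next_uncovered + 1 >= min_gap:
        gaps.append((next_uncovered, genome_length))
    return gaps


# Toy coordinates for illustration only (not real genome data).
hits = [(1, 400), (350, 900), (1500, 1200), (1800, 2000)]
print(uncovered_regions(hits, genome_length=2500, min_gap=100))
# [(901, 1199), (1501, 1799), (2001, 2500)]
```

Raising min_gap hides small gaps between neighbouring matches, which is useful when only large missing regions are of interest.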
Discuss how many synteny breaks you can observe.
Is one of the genomes bigger than the other? Discuss the possible reason behind this.
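If you would like to check the genome sizes outside ACT, the sequence lengths can be counted directly from the FASTA files. The sketch below uses only the Python standard library. The function name fasta_lengths and the toy record are hypothetical, and the file names in the usage note assume you renamed your downloads as suggested earlier.

```python
def fasta_lengths(lines):
    """Return {record_id: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Toy record for illustration only (not a real genome).
toy = [">toy_record example only", "ACGTACGT", "ACG"]
print(fasta_lengths(toy))  # {'toy_record': 11}
```

To use it on the downloaded genomes, pass an open file handle, for instance fasta_lengths(open("TW20.fasta")) and fasta_lengths(open("MSSA476.fasta")), then compare the two totals.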
You can read more about the S. aureus TW20 genome in this publication. Can you identify the 127 kb difference mentioned in the publication?
© Wellcome Genome Campus Advanced Courses and Scientific Conferences