Skip to 0 minutes and 8 seconds Hello, everyone. In this part of the course, we will focus on how we can use whole genome sequences to predict resistance. To do that, the first thing we need is genome sequence data. It could come from a strain that you have sequenced in-house, or it could be downloaded from a public database such as NCBI or ENA. For practice, we will download sequence data for a Salmonella typhi strain from the ENA database, the accession number for which I have noted down here. So let’s get started. First, we will open a new browser, then copy and paste the link for the ENA database into the browser tab.
Skip to 0 minutes and 59 seconds In the search box, copy and paste the accession number and hit Enter. Click on Run Details, and now we can see the organism, the platform that was used for sequencing, and the machine on which the sequencing was performed. And if we click More, we can even see whether the sequencing was paired end or single end. Remember that if the sequencing is paired end, it will have two raw sequence files. Now if we look at the top right corner, under the Read Files section, click Show. And if we scroll down, we can see two files under the FASTQ_FTP column. Check both read files and click on Download Selected Files.
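The same lookup can also be scripted instead of clicked through. The following is a minimal sketch, assuming the ENA Portal API file report endpoint and its `fastq_ftp` field. The function names are illustrative, and the `ftp://` prefix is an assumption about how the listed paths are served. The accession is left as a parameter because it is not given in this transcript.

```python
from urllib.parse import urlencode

# ENA Portal API endpoint that returns a per-run file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build the file-report query URL for one run accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,library_layout,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def fastq_urls(report_tsv):
    """Return the FASTQ URLs listed in a tab-separated file report."""
    lines = report_tsv.strip("\n").splitlines()
    column = lines[0].split("\t").index("fastq_ftp")
    urls = []
    for row in lines[1:]:
        cell = row.split("\t")[column]
        # Multiple files are separated by ';' and listed without a scheme.
        # Prefixing 'ftp://' is an assumption made for this sketch.
        urls += ["ftp://" + path for path in cell.split(";") if path]
    return urls
```

For a paired end run, `fastq_urls` should return two entries, one per read file, which matches the two files shown under the FASTQ_FTP column.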
Skip to 1 minute and 58 seconds Then a window will appear asking us where to save the files. In this case, I’m saving them on the Desktop, and the download process begins. Once that is completed, we will have downloaded the read data of this particular strain of Salmonella typhi. The next step is to assemble these downloaded reads into contigs. For that, we will open a new browser tab. Open the file, copy the link for SPAdes, and paste it in the address bar. The web server runs SPAdes, which uses the paired end reads, if we have provided them, and produces contigs.
Skip to 2 minutes and 48 seconds So remember, we downloaded paired end reads, so here we’ll select the option of Illumina paired end reads, and then we’ll click Choose Files. Select the two files that we have just downloaded. Remember, these are paired end sequences. Then select Open. Now you can see both read files on the screen. Once that is done, click Upload. The reads from your computer will be uploaded to the web server, where they will be assembled using SPAdes. Once done, it will generate the contigs. The results page will look something like this. Now you just have to click the Contigs tab and save the contigs wherever you want to save them.
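If SPAdes is installed locally, the assembly done on the web server can be approximated from a script. This is only a sketch under that assumption: the function names and file paths are hypothetical, and the web server may run SPAdes with different settings than the defaults used here.

```python
import subprocess
from pathlib import Path


def spades_command(reads_1, reads_2, out_dir):
    """Build a SPAdes command line for one paired end library."""
    # -1 and -2 take the forward and reverse read files, -o the output folder.
    return ["spades.py", "-1", str(reads_1), "-2", str(reads_2), "-o", str(out_dir)]


def assemble(reads_1, reads_2, out_dir):
    """Run SPAdes (assumed to be on the PATH) and return the contigs path."""
    subprocess.run(spades_command(reads_1, reads_2, out_dir), check=True)
    # SPAdes writes the assembled contigs to contigs.fasta in the output folder.
    return Path(out_dir) / "contigs.fasta"
```

Calling `assemble` with the two downloaded read files and an output folder would return the path of the contigs file, the local counterpart of saving from the Contigs tab.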
Skip to 3 minutes and 38 seconds From this exercise, we have learned how to access the ENA database, download the FASTQ files for a particular strain, and assemble them online using a web server.
ENA database and assembly
In this video, Narender explains how to download read data from the European Nucleotide Archive (ENA).
The ENA is a data repository containing huge volumes of sequence data stored in the form of DNA, RNA, protein sequences and associated metadata. The sequence database allows easy access to, and sharing of, genomic information among researchers globally.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences