ENA database and assembly

Demonstration of the ENA database and assembly
Hello, everyone. In this part of the course, we will focus on how we can use whole genome sequences to predict resistance. To do that, the first thing we need is the genome sequence data. It could be from a strain that you have sequenced in-house, or it could be data downloaded from public databases such as NCBI or ENA. For practice, we will download sequence data of a Salmonella typhi strain from the ENA database, the accession number for which I have noted down here. So let’s get started. First, we will open a new browser, then copy and paste the link for the ENA database into the browser tab.
In the search box, paste the accession number and hit Enter. Click on Run Details, and now we can see the organism, the platform that was used for sequencing, and the machine on which the sequencing was performed. And if we click More, we can even see whether the sequencing was paired end or single end. Remember that if the sequencing is paired end, it will have two raw sequence files. Now if we look at the top right corner, under the Read File section, click Show. And if we scroll down, we can see two files under the FASTQ_FTP column. Check both read files and click on Download Selected Files.
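The same lookup can also be scripted rather than clicked through. The sketch below is not part of the video: it assumes the ENA Portal API "filereport" endpoint and its `run_accession`, `library_layout` and `fastq_ftp` fields, and the function names are hypothetical. It only builds the query URL and parses the tab-separated reply, so it runs without a network connection.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: the ENA Portal API "filereport" service at this address.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report query URL for one accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,library_layout,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_filereport(tsv_text: str) -> dict:
    """Map each run accession to its layout and its list of FASTQ URLs."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp holds semicolon-separated paths without a scheme;
        # a paired end run is expected to list two files.
        files = [f"ftp://{p}" for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = {
            "layout": row["library_layout"],
            "fastq": files,
        }
    return runs
```

The text returned by fetching `filereport_url(...)` (for example with `urllib.request.urlopen`) could be passed straight to `parse_filereport`; a run whose layout is `PAIRED` should then show two entries under `fastq`, matching the two files seen in the FASTQ_FTP column.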
Then a window will appear asking us where to save the files. In this case, I’m saving them on the Desktop, and the download process begins. Once that is completed, we will have downloaded the read data of this particular strain of Salmonella typhi. The next step is to assemble these downloaded reads into contigs. For that, we will open a new browser tab. Open the file, copy the link for SPAdes, and paste it in the address bar. The web server runs SPAdes, which uses the paired end reads, if we have provided them, and produces contigs.
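Before uploading, it can be worth checking that both downloaded files are complete and contain the same number of reads, since a paired end run should have one mate in each file. This is a minimal sketch, not something shown in the video. It assumes standard four-line FASTQ records with no blank lines, and the function names are hypothetical.

```python
import gzip
from pathlib import Path


def count_reads(fastq_path) -> int:
    """Count records in a FASTQ file (4 lines per read), gzipped or plain."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path} looks truncated: {n_lines} lines")
    return n_lines // 4


def check_pair(forward, reverse) -> int:
    """Return the shared read count, or raise if the two files disagree."""
    n_fwd, n_rev = count_reads(forward), count_reads(reverse)
    if n_fwd != n_rev:
        raise ValueError(f"read counts differ: {n_fwd} vs {n_rev}")
    return n_fwd
```

Calling `check_pair` on the two downloaded files either returns the number of read pairs or raises an error, which usually points to an interrupted download.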
So remember, we downloaded paired end reads, so here we’ll select the option of Illumina paired end reads, and then we’ll click Choose Files. Select the two files that we have just downloaded. Remember, these are paired end sequences. Then select Open. Now you can see both read files on the screen. Once that is done, click Upload. The reads from your computer will be uploaded to the web server, where they will be assembled using SPAdes. Once done, it will generate the contigs. The results page will look something like this. Now you just have to click the Contigs tab and save the contigs wherever you want.
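The video uses a web server for this step. If SPAdes happens to be installed locally, which is an assumption and not what is demonstrated here, a roughly equivalent paired end run can be started from the command line. The sketch below only builds and prints that command. The file names and the output folder are hypothetical placeholders.

```python
import shlex


def spades_command(forward, reverse, out_dir, threads=4):
    """Build the argument list for a paired end SPAdes run."""
    return [
        "spades.py",
        "-1", str(forward),   # file with forward reads
        "-2", str(reverse),   # file with reverse reads
        "-o", str(out_dir),   # output directory
        "-t", str(threads),   # number of threads
    ]


# Hypothetical file names standing in for the two downloaded FASTQ files.
cmd = spades_command("strain_1.fastq.gz", "strain_2.fastq.gz", "spades_out")
print(shlex.join(cmd))
```

With SPAdes on the path, the list could be passed to `subprocess.run(cmd, check=True)`; the assembled contigs would then be written to `contigs.fasta` inside the output folder.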
From this exercise, we have learned how to access the ENA database, download the FASTQ files for a particular strain, and assemble them online using a web server.
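As a quick follow-up on the saved contigs, one could summarise the assembly by contig count, total length and N50. This goes slightly beyond what the video shows, so treat it as an optional sketch. It assumes the contigs were saved as a FASTA file, and the function names are hypothetical.

```python
def read_contig_lengths(fasta_text: str) -> list:
    """Return the length of every sequence in FASTA-formatted text."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths) -> int:
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

Reading the saved contigs file into a string and passing it to `read_contig_lengths` gives the list that `n50`, `len` and `sum` can then summarise.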

In this video, Narender is explaining how to download read data from the European Nucleotide Archive (ENA).

The ENA is a data repository containing huge volumes of sequence data, stored in the form of DNA, RNA and protein sequences with associated metadata. The database allows researchers around the world to access and share genomic information easily.

This article is from the free online course Bacterial Genomes: Antimicrobial Resistance in Bacterial Pathogens

Created by
FutureLearn - Learning For Life
