Running viralrecon

A guide to running viralrecon on previously prepared data

In the previous section we downloaded the data we’re going to analyse and created the samplesheet we’ll use as input for Nextflow.
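Before launching the pipeline, it can be worth confirming that every FASTQ file listed in the samplesheet actually exists. The sketch below is only an illustration: `check_samplesheet` is a hypothetical helper, not part of viralrecon, and it assumes the samplesheet is a comma-separated file with a header row and the columns `sample,fastq_1,fastq_2`.

```shell
# Hypothetical helper: report any FASTQ path in the samplesheet that is missing.
# Assumes columns sample,fastq_1,fastq_2 and a single header row.
check_samplesheet() {
    sheet="$1"
    status=0
    if [ ! -f "$sheet" ]; then
        echo "no such samplesheet: $sheet" >&2
        return 1
    fi
    {
        read -r _header   # discard the header row
        # The "|| [ -n ... ]" part handles a last line with no trailing newline.
        while IFS=, read -r sample fastq_1 fastq_2 || [ -n "$sample" ]; do
            for fq in "$fastq_1" "$fastq_2"; do
                # fastq_2 is empty for single-end data, so blank fields are skipped.
                if [ -n "$fq" ] && [ ! -f "$fq" ]; then
                    echo "missing file for ${sample}: ${fq}" >&2
                    status=1
                fi
            done
        done
    } < "$sheet"
    return "$status"
}

if check_samplesheet samplesheet.csv; then
    echo "samplesheet looks OK"
else
    echo "samplesheet has problems (see messages above)"
fi
```

Relative paths in the samplesheet are resolved against the current directory, so run the check from the directory you will launch Nextflow from.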

We should now have everything we need to run viralrecon. However, before we can run viralrecon, we need to activate the nextflow environment we created earlier:

conda activate nextflow
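One way to confirm the environment is active is to check that the tools the run depends on are on your `PATH`. This is a minimal sketch: `check_tool` is a hypothetical helper, and it assumes the container runtime is installed under the name `singularity` (on some systems the executable is called `apptainer` instead).

```shell
# Hypothetical helper: succeed if the named executable is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the two tools the viralrecon run relies on.
for tool in nextflow singularity; do
    check_tool "$tool" || echo "Is the conda environment active, and is ${tool} installed?" >&2
done
```

If `nextflow` is reported as found, running `nextflow -version` will also show which version the environment provides.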

viralrecon has a number of different parameters and options we could use to analyse our data, and these are listed on the viralrecon pipeline page. In the interests of time, we’re going to skip some of the steps, such as removing host-classified reads and lineage assignment, but feel free to have a play with the different parameters. OK, let’s run viralrecon using the command below:

nextflow run nf-core/viralrecon -profile singularity \
--max_memory '12.GB' --max_cpus 4 \
--input samplesheet.csv \
--outdir results/viralrecon \
--protocol amplicon \
--genome 'MN908947.3' \
--primer_set artic \
--primer_set_version 3 \
--skip_kraken2 \
--skip_assembly \
--skip_pangolin \
--skip_nextclade \
--platform illumina

The parameters we’ve used are described in the table below:

Parameter                     Description
--max_memory '12.GB'          Sets the maximum available memory (RAM) on your machine. You should set this to what you have on your system.
--max_cpus 4                  Sets the maximum number of CPUs (threads) available on your system. You should set this to what you have on your system.
--input samplesheet.csv       Input file we created in the previous section
--outdir results/viralrecon   Directory to save the pipeline results to
--protocol amplicon           Specifies the type of protocol used for sequencing.
--genome 'MN908947.3'         Specifies which reference to use to align the sequences to, in this case the Wuhan-Hu-1 sequence
--primer_set artic            The primer set to be used for the data analysis
--primer_set_version 3        Version of the primer set
--skip_kraken2                Skip Kraken2 process for removing host classified reads.
--skip_assembly               Skip all of the de novo assembly steps in the pipeline
--skip_pangolin               Skip Pangolin lineage analysis
--skip_nextclade              Skip Nextclade clade assignment, mutation calling, and sequence quality checks
--platform illumina           NGS platform used to sequence the samples
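If you are unsure what values to give --max_memory and --max_cpus, you can ask the machine. The sketch below assumes a Linux system with GNU coreutils (`nproc`) and `/proc/meminfo`; the variable names are just for illustration.

```shell
# Sketch: derive values for --max_cpus and --max_memory on a Linux machine.
# Assumes nproc (GNU coreutils) and /proc/meminfo are available.

# Number of CPU threads available to this shell.
cpus=$(nproc)

# Total RAM in whole gigabytes; MemTotal is reported in kB.
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)

# Print the flags in the same form as in the table above.
echo "--max_memory '${mem_gb}.GB' --max_cpus ${cpus}"
```

These are the totals for the machine, so you may prefer to pass slightly lower values and leave some headroom for the operating system and anything else you have running.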

If the pipeline starts running successfully, you should see output like the figure below:

The viralrecon pipeline launching in a terminal window

Once the pipeline starts submitting jobs and these jobs start running, you’ll see the progress of all the steps in the pipeline that are running (see below). The number of jobs running at any time will depend on the amount of RAM and number of CPUs you have available. The more you have, the more likely it is that multiple jobs will run at the same time. Fortunately, you don’t have to manage any of this yourself: Nextflow schedules the jobs within the limits you supply with the --max_memory and --max_cpus flags.

viralrecon jobs running and completing in a terminal window

The pipeline will take a little while to run, so it’s time to go and make a coffee. If the pipeline completes successfully, you should see a message like the one below:

Output text showing that the viralrecon pipeline completed successfully

Now, let’s move on to the next section where we’ll examine the results created by viralrecon.

© Wellcome Connecting Science
This article is from the free online course Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets

Created by
FutureLearn - Learning For Life