What is a workflow?

Introduction to, and definitions of, pipelines and workflows
Analysis of data, especially in bioinformatics, often involves a series of tasks, which may include data collection and cleaning, processing, and some kind of downstream analysis such as visualization.

These tasks are typically executed using a program or a script that performs a specific function. You were introduced to some of the most commonly used bioinformatics programs, such as FastQC, in Week 1. Often, however, we want to analyse large amounts of data using several different tools, each of which must be run on our input files or on the outputs of an earlier tool. Waiting around for each task to complete before entering the next command can be very inconvenient. It is much better to chain the tools together so that all the tasks are executed in one go. A collection of tasks chained in this way is known as a workflow or a pipeline. Workflows typically involve the execution of different software tools, with the output of one tool becoming the input to the next tool in the workflow (see the figure below).

[Figure: diagram illustrating a workflow, with boxes representing different tools and arrows flowing from one tool to the next]
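
The chaining idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular bioinformatics tool or workflow system works: the step functions (`clean`, `filter_short`, `summarise`), the `run_workflow` helper and the toy reads are all hypothetical names and data invented for this example. Each step stands in for a separate tool, and the output of one step is passed as the input to the next.

```python
from typing import Any, Callable, Dict, List


def clean(reads: List[str]) -> List[str]:
    """Stand-in for a cleaning tool: trim whitespace, drop empty records, upper-case."""
    return [r.strip().upper() for r in reads if r.strip()]


def filter_short(reads: List[str], min_length: int = 4) -> List[str]:
    """Stand-in for a processing tool: keep reads of at least min_length bases."""
    return [r for r in reads if len(r) >= min_length]


def summarise(reads: List[str]) -> Dict[str, float]:
    """Stand-in for downstream analysis: count reads and compute their mean length."""
    n = len(reads)
    mean_length = sum(len(r) for r in reads) / n if n else 0.0
    return {"n_reads": n, "mean_length": mean_length}


def run_workflow(data: Any, steps: List[Callable[[Any], Any]]) -> Any:
    """Run each step in order, feeding the output of one step into the next."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy input: four "reads", one blank and one too short.
raw_reads = ["acgt", "  ", "ggc", "ttgaccg\n"]

result = run_workflow(raw_reads, [clean, filter_short, summarise])
print(result)  # {'n_reads': 2, 'mean_length': 5.5}
```

Once the steps are listed in order, the whole analysis runs from a single command, with no waiting between steps to type the next one.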

Traditionally, bioinformatics workflows were written in general-purpose scripting and programming languages such as Bash, Perl or Python. More recently, dedicated workflow management systems such as Nextflow, Snakemake and Galaxy have been developed specifically to manage computational data analysis workflows. The major advantage of workflow management systems, and of pipelines in general, is that they make our bioinformatics analyses much more efficient and, importantly, free up more time for the actual interpretation of the results we’ve obtained. In the next section, we’ll go into more detail about workflow management systems and how they work.

© Wellcome Connecting Science
This article is from the free online course Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets.
