An expert’s view: Is cloud computing a feasible computational resource for biomedical researchers?

In this video Dr. Tsute (George) Chen talks about cloud computing for biomedical researchers.
Hello, my name is George Chen, from the Forsyth Institute in Cambridge, Massachusetts, United States. Today I will try to answer three questions regarding biological big data challenges. First, is cloud computing a feasible solution for analysing biological big data? Second, because of this big data, our need for storage space has been increasing significantly. What options do we have? Third, what is the sensible option for data storage, and how do we deal with data safety and integrity issues? In recent years, biological researchers have been generating a lot of data, especially through next-generation sequencing (NGS) technology. The data sizes are biggest for metagenomic and metatranscriptomic sequencing.
These curves show the growth of NCBI [INAUDIBLE] sequence data over the past 10 years, for both private and public sectors. It had grown to 60 million gigabytes by 2020. The biggest challenge is that it is impossible to analyse this tremendous amount of data. Also, scientists are still trying to improve their software tools, for example by using artificial intelligence to help analyse the data. Lastly, data transfer and storage are also very challenging. Luckily, most researchers won’t have to deal with large amounts of sequencing data, because after several upstream processes, such as quality control and mapping sequences to reference genomes, the large amount of sequence data is reduced to a very small information table called a read count table.
And with downstream analyses, such as statistical inference or network association analysis, the data size stays at a very small scale, no more than several megabytes per project. Most users don’t have to deal with the upstream analyses that convert and reduce big sequence data to the so-called read count table, which can simply be an Excel table. Instead, there are many user-friendly downstream analysis tools. For example, ResistoXplorer is a user-friendly, web-based tool for visual, statistical, and exploratory analysis of resistome data, which is also derived from sequence data. The software was developed by Dr. Fernanda Petersen’s group.
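To make the idea of a read count table concrete, here is a minimal sketch in Python. The feature names, sample names, and counts are invented for illustration and do not come from the talk. The function converts raw counts to per-sample proportions, one common first step in downstream analysis.

```python
# Hypothetical read count table: rows are features (for example genes),
# columns are samples. All names and numbers are made up for illustration.
read_counts = {
    "gene_A": {"sample_1": 120, "sample_2": 30},
    "gene_B": {"sample_1": 60, "sample_2": 90},
    "gene_C": {"sample_1": 20, "sample_2": 180},
}


def relative_abundance(table):
    """Convert raw read counts to per-sample proportions."""
    # Collect every sample name that appears in any row.
    samples = sorted({s for row in table.values() for s in row})
    # Total reads per sample, treating missing entries as zero.
    totals = {s: sum(row.get(s, 0) for row in table.values()) for s in samples}
    return {
        feature: {
            s: (row.get(s, 0) / totals[s] if totals[s] else 0.0)
            for s in samples
        }
        for feature, row in table.items()
    }


proportions = relative_abundance(read_counts)
for feature, row in proportions.items():
    print(feature, {s: round(p, 3) for s, p in row.items()})
```

A table like this holds only a few numbers per feature and sample, which is why it stays small even when the raw sequence files behind it are very large.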
Nevertheless, if your research requires an NGS project, you will have to deal with the data storage and upstream analysis challenges. There are three basic requirements for any data analysis: computer hardware, analysis software, and data storage devices. These three components can be located locally, in a scientist’s office, or remotely, in the so-called cloud environment. Cloud computing platforms have become very popular in recent years. These are the top commercial cloud platforms in the current market. The top one is Amazon Web Services. So the first question is, is cloud computing a feasible solution? The answer is yes, but you have to have someone who knows how to manage the cloud platform or resource.
The answer will be no if you don’t have such expertise. But even if you want to analyse the data locally, using your own computer and equipment, someone has to have the skills to maintain the physical computer. Another thing we need to consider is the cost. These are real-life examples of analysing NGS data in the cloud. It costs less than three US dollars to analyse a typical 16S ribosomal RNA amplicon sequencing project with 81 samples. And a typical metatranscriptomic project, with 20 to 30 million reads per sample, costs about $7 per sample to analyse.
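The arithmetic behind these figures can be sketched in a few lines of Python. The constants are the numbers quoted in the talk. The function names are illustrative, and the assumption that cost scales linearly with the number of samples is ours. Storage and data transfer charges are not included.

```python
# Figures quoted in the talk; treat them as rough examples, not price quotes.
AMPLICON_PROJECT_COST_USD = 3.0  # stated upper bound for the whole project
AMPLICON_PROJECT_SAMPLES = 81
METATRANSCRIPTOME_COST_PER_SAMPLE_USD = 7.0


def amplicon_cost_per_sample_upper_bound():
    """Upper bound on the per-sample cost of the 16S amplicon example."""
    return AMPLICON_PROJECT_COST_USD / AMPLICON_PROJECT_SAMPLES


def metatranscriptome_project_cost(n_samples):
    """Estimated analysis cost, assuming cost grows linearly with samples."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return n_samples * METATRANSCRIPTOME_COST_PER_SAMPLE_USD


print(f"16S amplicon: under ${amplicon_cost_per_sample_upper_bound():.3f} per sample")
print(f"Metatranscriptome, 81 samples: about ${metatranscriptome_project_cost(81):.2f}")
```

On these figures, the metatranscriptomic analysis is far more expensive per sample than the amplicon analysis, although both remain modest in absolute terms.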
They don’t seem to be very expensive, but it is inconvenient to have to manage the cloud resource and to transfer the big data back and forth. The second question is: considering the amount of data generated from metagenomic studies, our need for space has been increasing significantly, so what are the options to curb this issue moving forward? This slide shows some options at different user levels. Unfortunately, personal laptop or desktop computers are no longer enough for analysing biological big data. As for cloud solutions, almost all of us know, or have, one or two of these personal cloud storage services, such as OneDrive, Dropbox, or Google Drive.
These are convenient for small files, but are not enough for big data storage. So for a research laboratory generating NGS data, the most sensible solution for big data analysis is to use more powerful computers called workstations. And for storage, it will be this type of equipment, called network attached storage (NAS) devices. As for the answer to the third question, I would still suggest NAS devices. NAS devices put together multiple hard drives to provide ample data storage space. They are inexpensive and have very little risk of losing data, so they are very safe.
NAS devices are safe because they use so-called RAID technology to ensure there is very little chance of losing data due to hard drive failure. NAS devices are relatively inexpensive and convenient to purchase, install, and use compared to cloud storage. For example, a 32-terabyte NAS device costs just under $3,000, and an 80-terabyte NAS is a little above $6,000. Those are some very simple answers to these questions. Please feel free to contact me if you have any other questions. Thank you.
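As a rough illustration of the NAS figures above, the Python sketch below works out the approximate cost per terabyte from the two quoted prices, rounded to $3,000 and $6,000. It also shows how a parity-based RAID layout trades raw capacity for protection against drive failure. The talk does not say which RAID level is used or whether the quoted capacities are raw or usable, so the RAID 5 style example (one drive's worth of parity) and the four 8-terabyte drives are assumptions.

```python
def cost_per_terabyte(price_usd, capacity_tb):
    """Approximate purchase cost per terabyte of capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return price_usd / capacity_tb


def usable_capacity_tb(n_drives, drive_tb, parity_drives=1):
    """Usable space in a parity-based array (RAID 5 style when parity_drives=1).

    This is a simplification: it ignores filesystem overhead, and the RAID
    level itself is an assumption, since the talk does not specify one.
    """
    if n_drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (n_drives - parity_drives) * drive_tb


# Prices rounded from the talk ("just under $3,000", "a little above $6,000").
print(cost_per_terabyte(3000, 32))  # approximate USD per TB, smaller device
print(cost_per_terabyte(6000, 80))  # approximate USD per TB, larger device

# Hypothetical array: four 8 TB drives with one drive's worth of parity.
print(usable_capacity_tb(4, 8))
```

On these rounded prices, the larger device is cheaper per terabyte. In the hypothetical array, one drive's worth of space is given up so that a single drive can fail without data loss.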

To continue on the topic of computational resources, we invite Dr. Tsute (George) Chen to our series ‘An expert’s view’.

Dr. Chen is an expert microbiologist and bioinformatician who has led The Forsyth Institute’s effort to build and maintain the first human body site microbiome database – the ‘Human Oral Microbiome Database’ (HOMD). Over the past decade, the tools and data in the HOMD have been used by scientists worldwide for studying human oral microorganisms. Likewise, Dr. Chen works on the interface of ‘multi-omics’ aspects of microbiology, including genomics, transcriptomics, and metabolomics. It is our pleasure to welcome Dr. Chen to talk about cloud computing for biomedical researchers.

This article is from the free online

Exploring the Landscape of Antibiotic Resistance in Microbiomes

Created by
FutureLearn - Learning For Life
