Big data often needs to be stored and processed using a dedicated storage facility. Watch Dr Victoria Bennett, Head of CEDA, explain more.
The Centre for Environmental Data Analysis (CEDA) has been working with big data for 22 years – since before the phrase ‘big data’ was coined. Watch Dr Victoria Bennett, Head of CEDA, explain how storing and providing access to data can support environmental science.
CEDA is focused on atmospheric and Earth Observation data from satellites, climate model simulations, meteorological observations, aircraft and ground-based measurements, as well as datasets produced by scientists using these raw data as inputs.
Data are added to CEDA in different ways – some even arrive in the post on a disk or as an email attachment. Some data are routinely pushed to the organisation by data providers; others are pulled in via a satellite receiver dish, or over the internet using FTP (File Transfer Protocol). CEDA also has a facility for data producers to upload their data, which are checked before being added to the archives.
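To make the idea of pulling data over FTP concrete, here is a minimal sketch in Python using the standard library's ftplib module. It is an illustration only, not CEDA's actual ingestion software: the function name pull_file, the host ftp.example.org and the file name observations.nc are all hypothetical.

```python
from pathlib import Path


def pull_file(ftp, remote_name, local_dir):
    """Download one file over an already-open FTP connection.

    `ftp` is expected to behave like ftplib.FTP; `remote_name` is assumed
    to be a plain file name in the server's current directory.
    """
    local_path = Path(local_dir) / remote_name
    with open(local_path, "wb") as handle:
        # RETR is the standard FTP command for retrieving a file;
        # retrbinary passes each received block to handle.write.
        ftp.retrbinary(f"RETR {remote_name}", handle.write)
    return local_path


# Example use (hypothetical host and file name, needs network access):
#
#     from ftplib import FTP
#     with FTP("ftp.example.org") as ftp:
#         ftp.login()  # anonymous login
#         pull_file(ftp, "observations.nc", ".")
```

A real archive would go on to check each downloaded file before accepting it, as described above; the sketch stops at the download step.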
CEDA not only holds data for users but also provides an analysis environment. Adjacent computers allow users to analyse, process and investigate the data, without moving them around networks, and without needing to store them on their own computers. This avoids multiple copies of the same data in different institutes, saving both time and money.
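A rough calculation shows why analysing data where they are stored saves time. The short Python sketch below estimates how long it would take to copy a dataset across a network in the ideal case. The dataset size (1 terabyte) and the link speed (100 megabits per second) are assumptions chosen for illustration, not figures from CEDA.

```python
def transfer_time_hours(size_bytes, link_bits_per_second):
    """Ideal time to copy size_bytes over a link, ignoring protocol overhead."""
    seconds = size_bytes * 8 / link_bits_per_second  # 8 bits per byte
    return seconds / 3600


ONE_TERABYTE = 10**12          # assumed dataset size, in bytes
LINK_SPEED = 100 * 10**6       # assumed link speed: 100 megabits per second

hours = transfer_time_hours(ONE_TERABYTE, LINK_SPEED)
print(f"{hours:.1f} hours")    # prints "22.2 hours"
```

Under these assumptions a single terabyte takes almost a day to move, and every institute that wanted its own copy would have to repeat the transfer. Running the analysis on computers next to the archive avoids that cost.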
As Victoria Bennett mentions in the video, JASMIN is a petabyte-scale data storage and analysis infrastructure that lets users exploit the data held by CEDA. There are currently 17 petabytes of storage, over 5,000 processing cores and fast networking, so users can efficiently access, analyse and process the data. JASMIN offers different types of computing: a private cloud and hosted processing for expert users (infrastructure as a service and platform as a service), and software as a service in the form of web interfaces and services for accessing our data.
Over 1,000 users regularly log in to JASMIN to analyse the data remotely, while over 30,000 users regularly access the data and view the data catalogue. These include research scientists as well as users from other sectors (eg government and industry), depending on the data’s licences.
JASMIN users work on a huge range of science projects, including earthquake monitoring, simulating hurricanes, measuring greenhouse gas emissions, analysing high resolution global and regional climate models, understanding ocean currents and modelling air pollution.
This combination of storage, processing and networking offers a range of options and benefits for different types of user needs. Jon Blower will discuss these later this week.