
The four V’s of data

Big data can be characterised by the four V’s: volume, variety, velocity and veracity. Find out more in this article.
© University of Reading and Institute for Environmental Analytics
Let’s look at some of the key points when storing large volumes of data and the associated challenges.
Volume: Victoria talks about data size in her video. CEDA currently has 17 petabytes (PB) of storage, which is added to all the time (revisit the infographic in Step 1.4 if you need a reminder of the different byte sizes). Data volumes are predicted to grow, and the CEDA team is trialling new storage systems, moving to tape rather than fast, expensive disk. Tapes may not be accessible immediately and have to be retrieved by a robot, and it may take longer to find the right place on the tape for the required data than on disk. Those who have used tape (e.g. when music or video was on tape!) will appreciate some of the issues.
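To get a feel for these sizes, here is a small illustrative sketch (not CEDA code) that converts the archive size between byte units, using the decimal (SI) definitions from the Step 1.4 infographic:

```python
# Decimal (SI) byte units, as used in the Step 1.4 infographic.
TB = 10**12  # 1 terabyte = a trillion bytes
PB = 10**15  # 1 petabyte = a thousand terabytes

ceda_archive_bytes = 17 * PB       # CEDA's current archive size
print(ceda_archive_bytes / TB)     # -> 17000.0 (terabytes)
```

So the 17 PB archive is the equivalent of seventeen thousand 1 TB laptop drives.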
Variety: there are hundreds of datasets (nearly 500 collections), more than 150 million files, and many different suppliers, formats and uses. Topics for data held at CEDA range from bird habitats to archaeology. Much of what CEDA does is about making sure the data can be discovered, understood and used in years to come, without needing to contact the original data supplier.
Velocity: the fastest incoming dataset today is Sentinel data from the Copernicus Sentinel satellites, bringing in over 150 terabytes (TB) each month, with peak daily intakes of about 5 TB.
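These two figures can be related with a quick back-of-the-envelope calculation (assuming, for illustration, a 30-day month):

```python
# Sentinel intake figures from the article.
monthly_intake_tb = 150        # over 150 TB arrives each month
peak_daily_tb = 5              # peak daily intake of about 5 TB

avg_daily_tb = monthly_intake_tb / 30   # assume a 30-day month
print(round(avg_daily_tb, 1))           # -> 5.0 (TB per day)
```

So on an average day the intake is already close to the quoted daily peak, which shows how steadily the data streams in.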
Veracity: refers to the reliability of the data. Data may be biased or noisy, or contain outliers, making it difficult to analyse. The sources of these problems, such as a faulty instrument gathering data, may be discovered and corrected, or may remain in the data undiscovered. At CEDA, despite the pressure on storage, datasets are not deleted, partly because a record is needed in case an ‘incorrect’ dataset was used in research, and partly because it would take a great deal of effort to decide which datasets to delete.
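One common way to spot the kind of outliers mentioned above is a simple z-score check. This is an illustrative sketch only (not CEDA’s method), with made-up instrument readings in which one value is clearly faulty:

```python
# Flag readings far from the mean using a z-score threshold.
from statistics import mean, stdev

readings = [12.1, 11.9, 12.0, 12.2, 45.0, 11.8]  # 45.0 is a faulty reading
m, s = mean(readings), stdev(readings)

# A reading more than 2 standard deviations from the mean is suspect.
outliers = [x for x in readings if abs(x - m) / s > 2]
print(outliers)  # -> [45.0]
```

Real quality control is more involved, but this captures the idea: an undetected outlier like this one would otherwise distort any analysis built on the dataset.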
This article is from the free online course Big Data and the Environment.
