Big Data At Rest

To begin our overview of interacting with big data, we'll examine data in a state of rest, such as data in a data warehouse.

At Rest

Big data is at rest when high volumes of it sit in a store. To analyse and report on this data, you batch process what is in the store. Before you can do that, you first need to prepare the data.

A Problem of Volume

When dealing with large volumes of data, we can scale the processing out over multiple units, so that each unit handles one part of the data and the partial results are combined afterwards: ‘divide and conquer’.
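The ‘divide and conquer’ idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_into_chunks`, `process_chunk`, `parallel_mean`) and the data are made up, and threads on one machine stand in for the separate processing units that a real big data platform would spread across many machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Divide: cut the records into roughly equal, contiguous chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_chunk(chunk):
    """Work done by one unit: a partial sum and a partial count."""
    return sum(chunk), len(chunk)


def parallel_mean(records, n_workers=4):
    """Conquer: combine the partial results from every unit into one answer."""
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for separate processing units (an assumption for this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


readings = list(range(1, 1001))  # made-up data
print(parallel_mean(readings))
```

The important design point is that each unit returns a small partial result (a sum and a count) rather than the data itself, so combining the results stays cheap however large the store is.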

Another solution is to apply a schema-on-read approach. Don’t force the data into a particular storage structure; you need only apply the schema when processing. We can then use an orchestration workflow to transform the data into a processing database from which we’ll generate reports.
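As a minimal sketch of schema-on-read, the example below keeps raw records exactly as they arrived and only imposes types and defaults at the moment they are read. The store contents, the `SCHEMA` layout, and the `read_with_schema` helper are all hypothetical, chosen just to show the idea.

```python
import json

# Raw store: records are kept as they arrived, with no enforced structure.
RAW_STORE = [
    '{"id": "1", "amount": "19.99", "country": "GB"}',
    '{"id": "2", "amount": "5", "extra_field": "ignored"}',
    '{"id": "3", "amount": "not-a-number", "country": "US"}',
]

# Hypothetical schema: field name -> (converter, default used when the field is absent).
SCHEMA = {
    "id": (int, None),
    "amount": (float, 0.0),
    "country": (str, "unknown"),
}


def read_with_schema(raw_lines, schema):
    """Apply the schema at read time; skip records that cannot be converted."""
    for line in raw_lines:
        record = json.loads(line)
        try:
            row = {
                field: convert(record[field]) if field in record else default
                for field, (convert, default) in schema.items()
            }
        except (TypeError, ValueError):
            continue  # a real pipeline would likely log or quarantine this record
        yield row


rows = list(read_with_schema(RAW_STORE, SCHEMA))
for row in rows:
    print(row)
```

Because the raw store is never modified, a different schema can be applied to the same data later without reloading it.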

There are a few different suggested models for the processing of big data, but they all share some commonalities.

1. Explore

[Figure: KDD (Knowledge Discovery in Databases) process]

First, we must explore the data to get to know what it contains and what the structure is.
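One simple way to start exploring is to profile a sample of the records: which fields appear, what types their values have, and how many values are empty. The sketch below assumes the sample has already been loaded as a list of dictionaries; the `profile` function and the sample data are illustrative only.

```python
from collections import Counter


def profile(records):
    """Summarise each field: how often it is filled, how often empty, and value types.

    Only fields that appear in a record are counted; a field that is absent
    from a record altogether is not counted as missing in this sketch.
    """
    summary = {}
    for record in records:
        for field, value in record.items():
            stats = summary.setdefault(
                field, {"present": 0, "missing": 0, "types": Counter()}
            )
            if value is None or value == "":
                stats["missing"] += 1
            else:
                stats["present"] += 1
                stats["types"][type(value).__name__] += 1
    return summary


sample = [  # made-up sample
    {"id": 1, "age": 34, "city": "Leeds"},
    {"id": 2, "age": None, "city": "York"},
    {"id": 3, "age": "41", "city": ""},
]

report = profile(sample)
for field, stats in report.items():
    print(field, stats["present"], stats["missing"], dict(stats["types"]))
```

A field with mixed types, such as `age` here, is a typical finding that feeds directly into the cleaning step.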

2. Clean, Prepare, & Transform

After we’ve explored and understood the data, it’s necessary to prepare and clean it. This could take the form of normalising the databases, removing duplicate or irrelevant data, scaling the values so that no single field unfairly impacts the results, or searching for errors and outliers. This is also the time to design the schema-on-read mentioned earlier.
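Three of the cleaning tasks named above can be sketched with the Python standard library. The techniques chosen here are assumptions for illustration rather than the course's prescribed methods: duplicates are detected by a key field, values are scaled with min-max scaling, and outliers are flagged with the common interquartile range (IQR) rule.

```python
from statistics import quantiles


def remove_duplicates(records, key):
    """Keep the first record seen for each value of the key field."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def min_max_scale(values):
    """Rescale values to the range 0..1 so large-valued fields do not dominate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


orders = [  # made-up data
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 12},
    {"order_id": 2, "amount": 12},  # duplicate
    {"order_id": 3, "amount": 11},
]
unique_orders = remove_duplicates(orders, key="order_id")
print(len(unique_orders))
print(min_max_scale([10, 20, 30]))
print(flag_outliers([10, 12, 11, 13, 12, 95]))
```

Whether a flagged value is an error or a genuine extreme is a judgement that depends on the data, so a sketch like this only points out candidates for review.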

[Figure: CCC Big Data Pipeline]

3. Report & Mine the Data

Having prepared our data, we can now run reports and mine the data for trends. This is usually the main goal when working with large amounts of data, but the results are only as valuable as the accuracy of the data.

As many will tell you, databases and algorithms work on a ‘garbage in, garbage out’ principle: the more accurate the data and input, the more valuable the reporting and output.
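A report at its simplest is a grouped summary of the prepared data. The following sketch assumes cleaned rows with hypothetical `region` and `amount` fields and computes a count, total, and average per region; the `sales_report` name and the figures are invented for the example.

```python
from collections import defaultdict


def sales_report(rows):
    """Group rows by region and compute count, total, and average amount."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row["amount"])
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "average": sum(amounts) / len(amounts),
        }
        for region, amounts in sorted(groups.items())
    }


cleaned = [  # made-up, already cleaned data
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 100.0},
]

report = sales_report(cleaned)
for region, stats in report.items():
    print(region, stats)
```

If a single mistyped amount slipped through cleaning, the total and average for its region would be distorted accordingly, which is the ‘garbage in, garbage out’ effect in miniature.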

4. Analysis and Interpretation

Once we’ve run the reports, it’s time to interpret the results and analyse the value and efficacy of our process. This is a chance to see where we can go back and make changes to improve.

[Figure: CRISP-DM Big Data process]

In the final step of this activity, we’ll look at how our needs and options change when the data problem is one of high velocity (when it’s a constant incoming stream of information).

This article is from the free online course Microsoft Future Ready: Fundamentals of Big Data, created by FutureLearn.