
Introduction to Big Data Technologies

This activity will describe the technological solutions available within the Azure platform to tackle the problems highlighted in the previous section.

In previous steps, we explored batch processing, which is typically used when you have large amounts of static data that need to be processed regularly to aggregate or clean them.

We also spoke about real-time processing, where we take a continuous stream of data and process it as it arrives, to analyse or store it.

There are many technological solutions within Microsoft Azure to achieve this.

Batch Processing

An example of a workflow:

  1. Extract data
  2. Clean up
  3. Load into the analytical store
  4. Report from that store.
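
As a minimal sketch of these four steps (assuming a hypothetical CSV file of raw sales data and a local Parquet file standing in for the analytical store), the workflow might look like this in Python with pandas:

    import pandas as pd

    # 1. Extract: read the raw data (hypothetical file name)
    raw = pd.read_csv("raw_sales.csv")

    # 2. Clean up: drop incomplete rows and normalise column names
    clean = raw.dropna(subset=["order_id", "amount"])
    clean.columns = [c.strip().lower() for c in clean.columns]

    # 3. Load into the analytical store (a Parquet file stands in for it here)
    clean.to_parquet("sales_clean.parquet", index=False)

    # 4. Report from that store: a simple aggregate per region
    report = pd.read_parquet("sales_clean.parquet").groupby("region")["amount"].sum()
    print(report)

In Azure, step 3 would more realistically target a data lake or data warehouse rather than a local file, and the processing itself might run on one of the services discussed next.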

Orchestration of Analytics

Options available to us within the Azure platform for processing the data:

  • Apache Hadoop or Apache Spark (open source)
  • The native Platform-as-a-Service option: Azure Data Lake Analytics.

Both of these will be explored in later steps.
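
For example, the Spark route might be sketched roughly like this in PySpark (the file path and column names are placeholders; on Azure the session would typically run on a managed cluster rather than locally):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start a Spark session
    spark = SparkSession.builder.appName("batch-aggregation").getOrCreate()

    # Read the extracted data (placeholder path)
    events = spark.read.csv("raw_events.csv", header=True, inferSchema=True)

    # Aggregate events per day as a simple batch transformation
    daily = events.groupBy("event_date").agg(F.count("*").alias("event_count"))
    daily.show()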

Data Store for Analytics post-processing

Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and big data analytics. Dedicated SQL pool (referred to in this video as Azure SQL Data Warehouse) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.

Dedicated SQL pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehouse Units (DWUs).
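
As a rough illustration (the server, database, table, and credentials below are all placeholders), querying a dedicated SQL pool from Python looks much like querying any other SQL Server endpoint, for example with pyodbc:

    import pyodbc

    # Placeholder connection details for a Synapse dedicated SQL pool
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=tcp:myworkspace.sql.azuresynapse.net,1433;"
        "Database=mydedicatedpool;Uid=myuser;Pwd=mypassword;Encrypt=yes;"
    )

    cursor = conn.cursor()
    # A simple aggregate over a hypothetical fact table loaded by the batch process
    cursor.execute("SELECT region, SUM(amount) FROM dbo.FactSales GROUP BY region;")
    for region, total in cursor.fetchall():
        print(region, total)
    conn.close()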

Analytics

Once we’ve created these processes, we have options for displaying the outputs within Azure and the wider Microsoft family. We could export to Excel, generate charts and explore the data further there, or use some of the integrated functionality to display results within Power BI.
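
As a small sketch of the Excel option (the figures below are made up purely for illustration), the aggregated output of a batch job could be exported like this:

    import pandas as pd

    # Hypothetical aggregated results produced by the batch process
    report = pd.DataFrame({"region": ["North", "South"], "total_sales": [125000, 98000]})

    # Export to Excel so the data can be charted and explored further there;
    # Power BI could instead connect directly to the analytical store.
    report.to_excel("sales_report.xlsx", index=False)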

Real-time Data

As discussed in the last step, we first need to capture the data. Within Azure, we have a choice of tools such as:

  • Message Broker
  • Azure Event Hubs
  • IoT Hub.
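
For instance, publishing captured readings into Azure Event Hubs from Python might look like this (a minimal sketch using the azure-eventhub package; the connection string, hub name, and payloads are placeholders):

    from azure.eventhub import EventHubProducerClient, EventData

    # Placeholder connection details for an Event Hubs namespace
    producer = EventHubProducerClient.from_connection_string(
        conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>",
        eventhub_name="sensor-readings",
    )

    # Send a small batch of events; in practice these would arrive continuously
    with producer:
        batch = producer.create_batch()
        batch.add(EventData('{"device": "sensor-01", "temperature": 21.4}'))
        batch.add(EventData('{"device": "sensor-02", "temperature": 19.8}'))
        producer.send_batch(batch)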

There are also other tools, such as Azure Stream Analytics, or open-source Apache solutions such as Kafka, Storm, Spark, or HBase, which we can use to create bespoke solutions for our problems. These will be explored further in later steps.
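
As a taste of the kind of bespoke solution these enable (a hedged sketch: the broker address and topic name are placeholders, and the Spark Kafka connector package must be available to the cluster), Spark Structured Streaming can subscribe to a Kafka topic and process it continuously:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-processing").getOrCreate()

    # Subscribe to a Kafka topic (placeholder broker and topic names)
    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "sensor-readings")
        .load()
    )

    # Write the raw events to the console as they arrive; a real job would
    # parse the payload and aggregate it before storing the results
    query = (
        stream.selectExpr("CAST(value AS STRING) AS payload")
        .writeStream.format("console")
        .start()
    )
    query.awaitTermination()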

Processing

Azure Stream Analytics can process the incoming stream and output the results for real-time reporting, while Azure Machine Learning can also be used to perform some of the processing.

In the next step, Graeme will demonstrate some tools for the batch processing of data in the Azure environment.
