Introduction to Big Data Technologies

This activity will describe the technological solutions available within the Azure platform to tackle the problems highlighted in the previous section.

In previous steps, we explored batch processing, which typically happens when you have large amounts of static data that you need to process regularly to aggregate or clean them.

We also spoke about real-time processing, where we take a continuous stream of data and process it in real time as it arrives, in order to analyse or store it.

There are many technological solutions within Microsoft Azure to achieve this.

Batch Processing

An example of a workflow (sketched in code below):

  1. Extract data
  2. Clean up
  3. Load into the analytical store
  4. Report from that store.
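
As a rough illustration, here is a minimal PySpark sketch of those four steps; the storage paths, column names and output location are hypothetical placeholders, and the same workflow could equally be built with the other tools described below.

    # A minimal PySpark sketch of the four workflow steps above.
    # Storage paths, column names and the output location are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # 1. Extract data from raw storage (here, a hypothetical data lake folder)
    raw = spark.read.csv(
        "abfss://raw@examplelake.dfs.core.windows.net/sales/",
        header=True, inferSchema=True,
    )

    # 2. Clean up: drop incomplete rows and normalise the amount column
    clean = (raw.dropna(subset=["order_id", "amount"])
                .withColumn("amount", F.col("amount").cast("double")))

    # 3. Load into the analytical store (written as Parquet here; in practice
    #    this could be a dedicated SQL pool or another warehouse)
    clean.write.mode("overwrite").parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/sales/")

    # 4. Report from that store: a simple aggregate that could feed a dashboard
    clean.groupBy("region").agg(F.sum("amount").alias("total_sales")).show()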

Orchestration of Analytics

Options available to us within the Azure platform for processing the data:

  • Apache Hadoop or Spark (open-source)
  • Native Platform as a Service option: Azure Data Lake Analytics.

Both of these will be explored in later steps.

Data Store for Analytics post-processing

Azure Synapse Analytics is an analytics service that brings together enterprise data warehousing and big data analytics. Dedicated SQL pool (referred to in this video as Azure SQL Data Warehouse) refers to the enterprise data warehousing features that are available in Azure Synapse Analytics.

Dedicated SQL pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehousing Units (DWU).
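
As a rough illustration only, the sketch below queries a dedicated SQL pool from Python using pyodbc; the server name, database, credentials and table are placeholders and would differ in a real workspace.

    # A hypothetical sketch of querying a Synapse dedicated SQL pool from Python
    # with pyodbc. Server, database, credentials and the table are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:example-workspace.sql.azuresynapse.net,1433;"
        "Database=exampledw;Uid=example_user;Pwd=example_password;"
        "Encrypt=yes;"
    )

    cursor = conn.cursor()
    # A simple aggregate over a hypothetical fact table in the warehouse
    cursor.execute(
        "SELECT region, SUM(amount) AS total_sales "
        "FROM dbo.FactSales GROUP BY region"
    )
    for region, total_sales in cursor.fetchall():
        print(region, total_sales)
    conn.close()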

Analytics

Once we’ve created these processes, we have options for displaying the outputs within Azure and the wider Microsoft family. We could export to Excel and generate charts to explore the data further there, or use the integrated functionality to display results within Power BI.
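
For instance, a minimal sketch of the Excel route might look like the following, using pandas; the figures and file name are purely illustrative.

    # A minimal sketch of exporting aggregated results to an Excel file with
    # pandas; the figures are made up, and writing .xlsx requires openpyxl.
    import pandas as pd

    results = pd.DataFrame({
        "region": ["North", "South", "West"],
        "total_sales": [1200.0, 950.5, 780.25],
    })
    results.to_excel("sales_summary.xlsx", index=False)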

Real-time Data

As discussed in the last step, we first need to capture the data. Within Azure, we have tools such as the following (a minimal Event Hubs example is sketched after the list):

  • Message Broker
  • Azure Event Hubs
  • IoT Hub.
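
As an illustration, the sketch below sends a single event to Azure Event Hubs using the azure-eventhub Python SDK; the connection string, hub name and payload are placeholders.

    # A minimal sketch of sending one event to Azure Event Hubs using the
    # azure-eventhub Python SDK. Connection string, hub name and payload
    # are placeholders.
    import json
    from azure.eventhub import EventHubProducerClient, EventData

    producer = EventHubProducerClient.from_connection_string(
        conn_str="Endpoint=sb://example-namespace.servicebus.windows.net/;"
                 "SharedAccessKeyName=send-policy;SharedAccessKey=<key>",
        eventhub_name="telemetry",
    )

    # Batch up one or more events and send them to the hub
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"device_id": "sensor-01", "temperature": 21.4})))
    producer.send_batch(batch)
    producer.close()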

There are other tools, such as Azure Stream Analytics, or open-source Apache solutions such as Kafka, Storm, Spark, or HBase, which we can use to create bespoke solutions for our problems. These will be explored further in later steps.
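
To give a flavour of what a bespoke solution might look like, here is a minimal sketch using Spark Structured Streaming to consume a Kafka topic; the broker address and topic are placeholders, and it assumes the Spark Kafka connector package is available on the cluster.

    # A hypothetical sketch of a bespoke real-time pipeline: Spark Structured
    # Streaming consuming a Kafka topic. Broker address and topic are
    # placeholders, and the Spark Kafka connector package must be available.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

    # Read a continuous stream of messages from Kafka
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "telemetry")
              .load())

    # Decode the payload and write it to the console as it arrives; a real
    # solution would write to storage or a real-time dashboard instead.
    query = (stream.selectExpr("CAST(value AS STRING) AS payload")
                   .writeStream.format("console")
                   .start())
    query.awaitTermination()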

Processing

While Azure Stream Analytics can output results for real-time reporting, Azure Machine Learning can also be used to process the streamed data.

In the next step, Graeme will demonstrate some tools for the batch processing of data in the Azure environment.

This article is from the free online course Microsoft Future Ready: Fundamentals of Big Data, created by FutureLearn.
