Batch Processing with HDInsight

In this step, we’ll start exploring Azure HDInsight and some of the open-source technologies available to us when batch processing data. You’ll also be introduced to Jupyter Notebooks.

HDInsight & Apache Open Source

Azure HDInsight allows us to run popular open-source frameworks (including Apache Hadoop, Spark, Hive, Kafka, and more) within the Azure environment using a customisable, enterprise-grade service for open-source analytics.

You can process massive amounts of data and draw on the broad open-source project ecosystem at the global scale of Azure, which makes it easier to migrate big data workloads and processing to the cloud.

Load-balancing

To reduce the time our processes take to run, we need to scale up the processing power available. However, a single machine can only offer a limited amount of power.

Load-balancing allows us to spread responsibilities and tasks across multiple machines, increasing the total processing capacity. In Azure HDInsight, Spark clusters can be used to process tasks in parallel, which suits high-performance, near-instantaneous querying.
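To make the idea of spreading work across machines more concrete, here is a minimal sketch in plain Python. It is not HDInsight or Spark code: the function names (`split_into_partitions`, `count_words`, `parallel_word_count`) are hypothetical, and a thread pool on a single machine merely stands in for the worker nodes of a cluster. The pattern is what matters: split the data into partitions, let each worker process its own partition, then merge the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions roughly equal lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """The work a single worker performs on its own share of the data."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, num_workers=4):
    """Spread the counting across workers, then merge the partial counts."""
    partitions = split_into_partitions(records, num_workers)
    # Assumption: threads stand in for cluster nodes purely for illustration.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    sample_lines = ["spark hive spark", "kafka Spark", "hive"]
    print(parallel_word_count(sample_lines, num_workers=2))
```

On a real cluster, a framework such as Spark handles the partitioning, the scheduling of tasks onto machines, and the merging of results for you. The sketch only shows why adding workers can shorten a batch job: each one handles a smaller share of the data.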

Note: If you’d like to delve deeper into the open-source technologies mentioned in this step, take a look at the links posted in the See also section below.

In the final step of this activity, we’ll look at the real-time processing technologies available to us in Azure HDInsight.

This article is from the free online

Microsoft Future Ready: Fundamentals of Big Data

Created by
FutureLearn - Learning For Life
