Hadoop Ecosystem Essentials

Develop essential data analyst skills as you delve into the Hadoop ecosystem and learn how to handle large amounts of data.

Learn the skills needed to succeed as a data analyst

For data analysts, Hadoop is an extremely powerful tool for processing large amounts of data. Born from Google’s papers on MapReduce and the Google File System, it is used by successful companies such as Spotify and Yahoo.

On this four-week course, you’ll learn how to use Hadoop to its full potential, making it easier to store, analyse, and scale big data.

Through step-by-step guides and exercises, you’ll gain the knowledge and practical skills to take into your role in data analytics.

Understand how to manage your Hadoop cluster

You’ll understand how to manage clusters with Yet Another Resource Negotiator (YARN), Mesos, ZooKeeper, Oozie, Zeppelin, and Hue.

With this knowledge, you’ll be able to ensure high performance, workload management, security, and more.

Learn how to analyse streams of data

Next, you’ll uncover techniques for handling and streaming data in real time using Kafka, Flume, Spark Streaming, Flink, and Storm.

This understanding will help you react quickly to any issues that arise.

Hone your data handling skills

Finally, you’ll learn how to design real-world systems using the Hadoop ecosystem to ensure you can use your skills in practice.

By the end of the course, you’ll have the knowledge to handle large amounts of data using Hadoop.

SPEAKER: Happy with your Hadoop skills? Do you have the hang of Hadoop? If not, hurry over and start your Hadoop Essentials training today. Learn the essentials in just four weeks. Expert Frank Kane from Sundog Education shows you how to design real-world systems and manage clusters with the Hadoop ecosystem. By the end of this course, you’ll know how to use different query engines in Hadoop, use resource negotiators to manage a Hadoop cluster, describe streaming, analyse streams of data, and design a system to meet real-world business requirements. And the best part is that you can learn whenever from wherever you want. Step into your future with Hadoop Ecosystem Essentials from Packt and FutureLearn.

Syllabus

  • Week 1

    Querying data interactively in Hadoop

    • Introduction to the course

      Welcome to Hadoop Ecosystem Essentials and the start of your learning journey, brought to you by Packt.

    • Apache Drill

      In this activity, we will discuss how to install and use Apache Drill to query across multiple databases (a brief sketch follows the syllabus).

    • Apache Phoenix

      In this activity, we will describe Apache Phoenix, how to install the SQL driver, and how to integrate Phoenix with Pig (see the example after the syllabus).

    • Presto

      In this activity, we will discuss Presto, a query engine developed by Facebook (a query sketch appears after the syllabus).

    • Wrap up

      You have reached the end of Week 1. In this activity, you'll reflect on what you have learned.

  • Week 2

    Managing your cluster in Hadoop

    • Introduction to Week 2

      Welcome to Week 2. In this activity, we’ll highlight the main topics that will be covered this week.

    • Managing resources

      In this activity, we will discuss different technologies for managing resources in your cluster.

    • Managing clusters and tasks

      In this activity, we will describe technologies for managing clusters and tasks.

    • Other technologies

      In this activity, we will discuss Hue, Cloudera’s web interface that competes with Hortonworks’ Ambari, as well as some older Hadoop technologies.

    • Wrap up

      You have reached the end of Week 2. In this activity, you'll reflect on what you have learned.

  • Week 3

    Feeding and analysing data in Hadoop

    • Introduction to Week 3

      Welcome to Week 3. In this activity, we’ll highlight the main topics that will be covered this week.

    • Kafka

      In this activity, we will describe how Kafka provides a scalable and reliable means of collecting data across a cluster of computers and broadcasting it for further processing (a short producer/consumer sketch follows the syllabus).

    • Apache Flume

      In this activity, we will discuss another way to stream data using Apache Flume (a sample agent configuration follows the syllabus).

    • Spark Streaming

      In this activity, we will discuss using Spark Streaming to process continuous streams of data in real time (illustrated after the syllabus).

    • Introducing Apache Storm

      In this activity, we will describe streaming with Apache Storm, another tool for real-time data processing (a minimal bolt sketch follows the syllabus).

    • Flink

      In this activity, we will explore the Flink stream processing engine (a short job sketch follows the syllabus).

    • Wrap up

      You have reached the end of Week 3. In this activity, you'll reflect on what you have learned.

  • Week 4

    Designing real-world systems

    • Introduction to Week 4

      Welcome to Week 4. In this activity, we’ll highlight the main topics that will be covered this week.

    • Architecture design

      In this activity, we will discuss how to fit various systems together to design an architecture that solves real-world business problems (a pipeline sketch follows the syllabus).

    • Wrap up

      You have reached the end of Week 4. In this activity, you'll reflect on what you have learned.
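
A taste of the tools

To give a flavour of Week 1’s Apache Drill activity: Drill lets a single SQL statement join data held in different storage systems. Below is a minimal Python sketch against Drill’s REST endpoint (default port 8047); the hive.orders table and mongo.shop.users collection are hypothetical stand-ins for storage plugins you would configure yourself.

    import requests

    # Join a (hypothetical) Hive table with a (hypothetical) MongoDB
    # collection in one Drill SQL statement.
    query = """
        SELECT u.name, SUM(o.total) AS spend
        FROM hive.orders AS o
        JOIN mongo.shop.users AS u ON o.user_id = u.id
        GROUP BY u.name
    """

    # Drill's REST API accepts SQL over HTTP and returns JSON rows.
    resp = requests.post(
        "http://localhost:8047/query.json",
        json={"queryType": "SQL", "query": query},
    )
    print(resp.json()["rows"])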
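
For the Apache Phoenix activity: Phoenix puts a SQL skin on HBase. Here is a minimal sketch, assuming the phoenixdb Python client and a Phoenix Query Server on its default port (8765); the users table is hypothetical.

    import phoenixdb

    # Connect through the Phoenix Query Server (default port 8765).
    conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
    cursor = conn.cursor()

    # Phoenix uses UPSERT rather than INSERT; the table is stored in HBase.
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id BIGINT NOT NULL PRIMARY KEY, name VARCHAR)")
    cursor.execute("UPSERT INTO users VALUES (1, 'alice')")
    cursor.execute("SELECT * FROM users")
    print(cursor.fetchall())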
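
For the Presto activity: Presto speaks standard SQL across many catalogues. A minimal sketch with the presto-python-client package; the host, catalogue, and orders table are assumptions.

    import prestodb

    # Connect to a Presto coordinator (default port 8080) and query a
    # (hypothetical) Hive-backed table.
    conn = prestodb.dbapi.connect(
        host="localhost", port=8080, user="analyst",
        catalog="hive", schema="default",
    )
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM orders")
    print(cursor.fetchone())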
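
For Week 3’s Kafka activity: producers publish messages to topics and consumers subscribe to them. A minimal round trip using the kafka-python package, assuming a broker on localhost:9092; the page-views topic is hypothetical.

    from kafka import KafkaConsumer, KafkaProducer

    # Publish one message to a (hypothetical) topic.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("page-views", b'{"user": "alice", "page": "/home"}')
    producer.flush()

    # Read it back from the beginning of the topic.
    consumer = KafkaConsumer(
        "page-views",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.value)
        break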
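
For the Apache Flume activity: Flume agents are wired together in a properties file rather than in code. This is essentially the netcat-to-logger example from the Flume user guide: a source listening on a TCP port, an in-memory channel, and a sink that logs each event.

    # Name the components of agent a1.
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: read lines from a TCP socket.
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Channel: buffer events in memory.
    a1.channels.c1.type = memory

    # Sink: log each event to the console.
    a1.sinks.k1.type = logger

    # Wire the source and sink to the channel.
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

Start the agent with: flume-ng agent --conf conf --conf-file example.conf --name a1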
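
For the Spark Streaming activity: DStreams chop a live feed into micro-batches and process each one with ordinary Spark transformations. This is the classic network word count in PySpark, assuming text arrives on a local socket (feed it with nc -lk 9999).

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Two local threads: one receives data, one processes it.
    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, 1)  # one-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()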
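
For the Apache Storm activity: Storm itself is Java-based, but the third-party streamparse library lets you write bolts in Python. A minimal word-counting bolt following streamparse’s documented Bolt interface; an upstream spout emitting single words is assumed.

    from collections import Counter

    from streamparse import Bolt

    class WordCountBolt(Bolt):
        outputs = ["word", "count"]

        def initialize(self, conf, ctx):
            # Running totals kept per bolt instance.
            self.counts = Counter()

        def process(self, tup):
            # Each incoming tuple is assumed to carry one word.
            word = tup.values[0]
            self.counts[word] += 1
            self.emit([word, self.counts[word]])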
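
For the Flink activity: Flink processes streams event by event rather than in micro-batches. A minimal PyFlink job; the bounded from_collection source is a stand-in for a real Kafka or socket source.

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    # A bounded demo source; swap in a Kafka connector for real streams.
    words = env.from_collection(["hadoop", "flink", "hadoop"])
    words.map(lambda w: (w, 1)).print()

    env.execute("word_pairs_sketch")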
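
Finally, for Week 4’s architecture activity: real-world designs chain these pieces together, for example Kafka feeding Spark. A sketch using Spark’s Structured Streaming API, assuming the spark-sql-kafka connector is on the classpath and a broker runs locally; the clicks topic is hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ClickstreamPipeline").getOrCreate()

    # Subscribe to a (hypothetical) Kafka topic of click events.
    clicks = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "clicks")
              .load())

    # Count events per message value, printing each micro-batch.
    (clicks.groupBy("value").count()
           .writeStream
           .outputMode("complete")
           .format("console")
           .start()
           .awaitTermination())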

When would you like to start?

Start straight away and join a global classroom of learners. If the course hasn’t started yet, you’ll see the future start date listed below.

Learning on this course

On every step of the course, you can meet other learners, share your ideas, and join in with active discussions in the comments.

What will you achieve?

By the end of the course, you’ll be able to...

  • Practise using different query engines in the Hadoop ecosystem.
  • Demonstrate using different resource negotiators to manage a Hadoop cluster.
  • Describe streaming.
  • Practise analysing streams of data.
  • Design a system to meet real-world business requirements.

Who is the course for?

This course is designed for anyone who wants to hone their data handling skills using Hadoop.

What software or tools do you need?

You’ll be shown how to use a variety of open source utilities within the Hadoop environment. We assume you’ve already installed the Hadoop environment. If you haven’t, check out Introduction to Big Data Analytics with Hadoop.

Who developed the course?

Packt

Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and information services to IT professionals.

FutureLearn

FutureLearn is a leading social learning platform that has been providing high-quality online courses to learners around the world for the last ten years.

What's included?

This is a premium course. These courses are designed for professionals from specific industries looking to learn with a smaller group of like-minded individuals.

  • Unlimited access to this course
  • Includes any articles, videos, peer reviews and quizzes
  • Certificate of Achievement to prove your success when you're eligible
  • Download and print your Certificate of Achievement anytime

Still want to know more? Check out our FAQs

Learning on FutureLearn

Your learning, your rules

  • Courses are split into weeks, activities, and steps to help you keep track of your learning
  • Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
  • Stay motivated by using the Progress page to keep track of your step completion and assessment scores

Join a global classroom

  • Experience the power of social learning, and get inspired by an international network of learners
  • Share ideas with your peers and course educators on every step of the course
  • Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others

Map your progress

  • As you work through the course, use notifications and the Progress page to guide your learning
  • Whenever you’re ready, mark each step as complete; you’re in control
  • Complete 90% of course steps and all of the assessments to earn your certificate

Want to know more about learning on FutureLearn? See Using FutureLearn.
