
Introduction to Big Data Analytics with Hadoop

Hone your data analyst skills and improve your workflow as you learn how to store, analyse, and scale big data using Hadoop.


Learn how to use the Hadoop ecosystem

Understanding Hadoop is a highly valuable skill for anyone working with large amounts of data. Companies such as Amazon, eBay, Facebook, Google, LinkedIn, Spotify, and Twitter use Hadoop in some way to process huge chunks of data.

On this three-week course, you’ll become familiar with Hadoop’s ecosystem and understand how to apply Hadoop skills in the real world.

After exploring the history and key terminology of Hadoop, you’ll walk through the installation process on your desktop to help you get started.

Explore Hadoop Distributed File System (HDFS)

With a solid introduction to Hadoop, you’ll learn how to manage big data on a cluster with the Hadoop Distributed File System (HDFS).

You’ll also discover what MapReduce is and how it’s used before moving on to programming Hadoop with Pig and Spark.

With this knowledge, you’ll be able to start analysing data on Hadoop.

Understand MySQL and NoSQL

Next, you’ll learn how to do more with your data by storing and querying it. To help you do this, you’ll learn how to use tools such as Sqoop, Hive, MySQL, Phoenix, and MongoDB.

Develop core data analyst skills

Finally, you’ll hone your data analyst skills by learning how to query data interactively. You’ll also gain an overview of Presto and learn how to install it so that you can quickly query data of any size.

By the end of the course, you’ll have the skills to effectively work with big data using Hadoop and be able to streamline your processes.

SPEAKER: Big data is a big deal. Learn the basics of Hadoop and big data analytics in three short weeks. Learn from an expert in all things Hadoop. Let Frank Kane from Sundog Education introduce you to big data analytics. This course is designed to help you use the Hadoop Distributed File System, describe MapReduce, use Hadoop with Pig and Spark, use relational and non-relational data stores with Hadoop, and handle Hadoop happily. Learn from anywhere at any pace. Step into your future with Introduction to Big Data Analytics with Hadoop from Packt and FutureLearn.

Syllabus

  • Week 1

    Introduction to Hadoop and using the HDFS

    • Introduction to the course

      Welcome to Introduction to Big Data Analytics with Hadoop and the start of your learning journey, brought to you by Packt.

    • Introduction to Hadoop

      In this activity, we will discuss how to install Hadoop, the effect of the Hortonworks and Cloudera merger, an overview and history of Hadoop, and the Hadoop ecosystem.

    • Using the Hadoop Distributed File System (HDFS)

      In this activity, we will discuss the Hadoop Distributed File System (HDFS), installing the MovieLens dataset, installing a dataset into HDFS using the command line, and MapReduce.

    • Using Hadoop's Core: MapReduce

      In this activity, we will discuss MapReduce, how MapReduce distributes processing and a MapReduce example.

    • Using Hadoop’s Core: Activities and challenge exercise

      In this activity, we will explore Python MRJob, Nano and the MapReduce job. We will also describe how to rank movies by their popularity and check our results.

    • Wrap up

      You have reached the end of Week 1. In this activity, you'll reflect on what you have learned.

  • Week 2

    Programming Hadoop with Pig and Spark

    • Introduction to Week 2

      Welcome to Week 2. In this activity we'll highlight the main topics that will be covered this week.

    • Introduction to programming Hadoop with Pig

      In this activity, we will introduce Ambari and Pig. We will also apply Pig in an activity.

    • Pig continued

      In this activity, we will discuss Pig in more detail and apply Pig to a challenge exercise.

    • Programming Hadoop with Spark

      In this activity, we will discuss Hadoop with Spark, Resilient Distributed Datasets (RDDs), and using RDDs.

    • Datasets and Spark 2.0

      In this activity, we will discuss Datasets and Spark 2.0.

    • Wrap up

      You have reached the end of Week 2. In this activity, you'll reflect on what you have learned.

  • Week 3

    Using relational and non-relational data stores with Hadoop

    • Introduction to Week 3

      Welcome to Week 3. In this activity we'll highlight the main topics that will be covered this week.

    • Using relational data stores with Hadoop (1)

      In this activity, we will discuss what Hive is and how Hive works.

    • Using relational data stores with Hadoop (2)

      In this activity, we will discuss integrating MySQL with Hadoop. We will describe installing MySQL, importing data, and using Sqoop to import and export data.

    • Using non-relational data stores with Hadoop (1)

      In this activity, we will discuss NoSQL and HBase.

    • Using non-relational data stores with Hadoop (2)

      In this activity, we will discuss Cassandra, installing Cassandra and writing Spark output into Cassandra.

    • Using non-relational data stores with Hadoop (3)

      In this activity, we will discuss MongoDB, integrating MongoDB with Spark and using the MongoDB shell.

    • Wrap up

      You have reached the end of Week 3. In this activity, you'll reflect on what you have learned.
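
The Week 1 challenge of ranking movies by popularity can be pictured as a map step, a shuffle step, and a reduce step. The following is a minimal sketch in plain Python, not the course's own MRJob solution. The sample rows, the tab-separated field layout (user ID, movie ID, rating, timestamp), and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows (assumed layout: user_id, movie_id, rating, timestamp).
SAMPLE_RATINGS = [
    "1\t50\t5\t881250949",
    "2\t50\t4\t881250950",
    "3\t50\t3\t881250951",
    "1\t242\t3\t881250952",
    "2\t242\t2\t881250953",
    "3\t302\t4\t881250954",
]


def mapper(line):
    """Map step: emit (movie_id, 1) for every rating row."""
    user_id, movie_id, rating, timestamp = line.split("\t")
    yield movie_id, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(movie_id, counts):
    """Reduce step: sum the ones to get the number of ratings per movie."""
    yield movie_id, sum(counts)


def rank_movies_by_popularity(lines):
    """Run map, shuffle, and reduce in-process, then sort by rating count."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    totals = [
        result
        for movie_id, counts in grouped.items()
        for result in reducer(movie_id, counts)
    ]
    # Most-rated first; ties are broken by movie ID for a stable order.
    return sorted(totals, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for movie_id, count in rank_movies_by_popularity(SAMPLE_RATINGS):
        print(movie_id, count)
```

Here popularity is taken to mean the number of ratings a movie received. On a real cluster the shuffle and the final sort would be handled by the framework across many machines instead of in a single process.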

When would you like to start?

Start straight away and join a global classroom of learners. If the course hasn’t started yet you’ll see the future date listed below.

Learning on this course

On every step of the course you can meet other learners, share your ideas and join in with active discussions in the comments.

What will you achieve?

By the end of the course, you’ll be able to...

  • Discuss the Hadoop Distributed File System.
  • Describe MapReduce.
  • Practise using Hadoop with Pig.
  • Practise using Hadoop with Spark.
  • Demonstrate using relational data stores with Hadoop.
  • Demonstrate using non-relational data stores with Hadoop.

Who is the course for?

This course is designed for anyone who works with big data.

You don’t need any prior experience of using Hadoop as you’ll start with the very basics.

What software or tools do you need?

On this course, we’ll show you how to install the Hadoop environment on your operating system.

Who developed the course?

Packt

Founded in 2004 in Birmingham, UK, Packt’s mission is to help the world put software to work in new ways, through the delivery of effective learning and information services to IT professionals.

FutureLearn

FutureLearn is a leading social learning platform and has been providing high quality online courses for learners around the world over the last ten years.

What's included?

This is a premium course. These courses are designed for professionals from specific industries looking to learn with a smaller group of like-minded individuals.

  • Unlimited access to this course
  • Includes any articles, videos, peer reviews and quizzes
  • Certificate of Achievement to prove your success when you're eligible
  • Download and print your Certificate of Achievement anytime

Still want to know more? Check out our FAQs

Learning on FutureLearn

Your learning, your rules

  • Courses are split into weeks, activities, and steps to help you keep track of your learning
  • Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
  • Stay motivated by using the Progress page to keep track of your step completion and assessment scores

Join a global classroom

  • Experience the power of social learning, and get inspired by an international network of learners
  • Share ideas with your peers and course educators on every step of the course
  • Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others

Map your progress

  • As you work through the course, use notifications and the Progress page to guide your learning
  • Whenever you’re ready, mark each step as complete; you’re in control
  • Complete 90% of course steps and all of the assessments to earn your certificate

