Big Data: Statistical Inference and Machine Learning

Learn how to apply selected statistical and machine learning techniques and tools to analyse big data.

34,422 enrolled on this course

  • Duration: 2 weeks
  • Weekly study: 2 hours

Everyone has heard of big data. Many people have big data. But only some people know what to do with big data when they have it.

So what’s the problem? Well, the big problem is that the data is big: the size, complexity and diversity of datasets increase every day. This means that we need new technological or methodological solutions for analysing data. There is a great demand for people with the skills and know-how to do big data analytics.

Extract information from large datasets

This free online course equips you for working with these solutions by introducing you to selected statistical and machine learning techniques used for analysing large datasets and extracting information.

Of course, we can’t teach everything in one course, so we have focused on giving an overview of a selection of common methods. You will become familiar with predictive analysis, dimension reduction, machine learning and clustering techniques. You will also discover how simple decision trees can help us make informed decisions and you can dive into statistical learning theory.

Explore real-world big data problems

These methods will be described through case studies that explain how each is applied to solve real-world problems. You can also develop your coding skills by applying the techniques you’ve just learnt to complete hands-on tasks and obtain results.

Just as there are many statistical and machine learning methods for big data analytics, there are also many software packages (see ‘Requirements’ below) that can be used for this purpose. In this course, we will expose you to three such packages, so that you can start to become familiar with using different tools, and can gain confidence in going further with these packages or using others that may come your way.

Continue learning with the Big Data Analytics program

This course is one of four in the Big Data Analytics program on FutureLearn from the ARC Centre of Excellence for Mathematical and Statistical Frontiers at Queensland University of Technology (QUT).

The program enables you to understand how big data is collected and managed, before exploring statistical inference, machine learning, mathematical modelling and data visualisation.

When you complete all four courses and buy a Certificate of Achievement for each, you will earn a FutureLearn Award as proof of completing the program of study.

Acknowledgements

QUT would like to thank the following content contributors:

  • Tomasz Bednarz
  • Amy Cook
  • Miles McBain
  • Kerrie Mengersen
  • Sam Rathmanner
  • Nan Ye

Hi everyone and welcome to our Big Data Analytics collection of courses. My name is Kerrie Mengersen. Why are statistical inference and machine learning approaches important for analysing Big Data? To answer this question, I want to draw your attention to the world’s largest coral reef system, and one of Australia’s biggest natural wonders, the Great Barrier Reef. The Great Barrier Reef is composed of over 2,900 reefs and 900 islands, spanning over 2,300 km, and is one of the most diverse ecosystems on Earth. However, because of its large size, monitoring and predicting different trends in the reef is really difficult.

Here at QUT we’re developing mathematical and statistical models that use Big Data to help better understand environmental impacts and trends in biodiversity on the Great Barrier Reef. Both statistical inference and machine learning play a huge role in modelling information and making predictions using all of this reef data. For example, here at QUT we’re using machine learning approaches to design robots to seek out and control the damaging crown-of-thorns starfish. In this course we show you how to apply certain predictive analysis, dimension reduction, clustering, and machine learning techniques to analyse big data and make informed decisions.

We not only explain these concepts, but we also provide a hands-on approach that will help you better your programming skills using selected Big Data frameworks. Here we draw from the multi-faceted approach we use at ACEMS to provide you with a unique course on big data that meets the demand for analytics across a variety of different fields. We hope you enjoy the course as much as we have enjoyed creating it.

What topics will you cover?

  • Introduction to the relationship between statistical inference and machine learning
  • The application of methods from these areas to real-world projects
  • An overview of the most popular methods currently used in these fields
  • Machine learning methods used to undertake prediction and analysis of a given data set
  • Specific methods such as neural networks, decision trees, principal component analysis and clustering
  • The practical application of modern analysis tools such as RStudio and H2O
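
As an illustration of one of the methods listed above, here is a minimal sketch of k-means clustering in plain Python. It is not taken from the course materials, which use R, H2O and WEKA. The function names `nearest` and `kmeans`, their parameters, and the toy data points are all assumptions made for this example.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    distances = [(point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (an assumed, simple initialisation).
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                xs = [p[0] for p in cluster]
                ys = [p[1] for p in cluster]
                updated.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = updated
    # clusters reflects the most recent assignment step.
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On data this small the result is easy to check by eye. On genuinely large datasets you would normally rely on an established implementation in a package such as those used in this course, rather than hand-written code.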

Learning on this course

On every step of the course you can meet other learners, share your ideas and join in with active discussions in the comments.

What will you achieve?

By the end of the course, you’ll be able to...

  • Identify big data application areas
  • Explore big data frameworks
  • Model and analyse data by applying selected techniques
  • Demonstrate an integrated approach to big data
  • Develop an awareness of how to participate effectively in a team working with big data experts

Who is the course for?

You will enjoy this course most and benefit from the learning experience if you have a basic understanding of statistics and mathematics at an undergraduate level.

In this course you will be using the following free tools. Please review the product websites below to ensure your system meets the minimum requirements:

R and RStudio Desktop (open source edition)
You will complete practical exercises using RStudio, so you’ll need to be familiar enough with R to:

  • install a package
  • import data
  • read and run starter code
  • develop a solution or read through a solution and gain understanding from it.

NOTE: You must first have a working installation of R to use RStudio.

H2O Flow
H2O Flow can be used as a stand-alone package for big data analytics or can be used in conjunction with R. This package will allow you to tackle larger problems that you might encounter in your own work.

WEKA
WEKA is a popular workbench for machine learning and statistical analysis. It comprises a very wide range of tools that are suitable for big data analysis.

Knowing R, H2O Flow and WEKA will give you a powerful, flexible and scalable set of tools to manipulate and analyse big data.

Who will you learn with?

I’m a Professor at QUT and a Deputy Director of ACEMS. My interests are in statistical modelling and analysis, computational and simulation sciences and big data analytics.

Hi, I am a Computer Scientist from the Australian National University with a particular interest in machine learning and artificial intelligence.

I am a designer, digital artist, and PhD candidate. My research and creative practice bridges art, science, creative code, big data, emerging technologies, and the everyday user.

I am a PhD student with research interests in genomics, operations research, big data and machine learning.

I'm a statistics PhD student at QUT. I'm interested in Bayesian analysis and Queueing theory. I script in R.

Who developed the course?

Queensland University of Technology

QUT is a leading Australian university ranked in the top 1% of universities worldwide by the 2019 Times Higher Education World University Rankings. Located in Brisbane, it attracts over 50,000 students.

  • Established: 1989
  • Location: Brisbane, Australia
  • World ranking: Top 180 (Source: Times Higher Education World University Rankings 2019)

Endorsers and supporters

Content provided by

ARC Centre of Excellence for Mathematical and Statistical Frontiers

Learning on FutureLearn

Your learning, your rules

  • Courses are split into weeks, activities, and steps to help you keep track of your learning
  • Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
  • Stay motivated by using the Progress page to keep track of your step completion and assessment scores

Join a global classroom

  • Experience the power of social learning, and get inspired by an international network of learners
  • Share ideas with your peers and course educators on every step of the course
  • Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others

Map your progress

  • As you work through the course, use notifications and the Progress page to guide your learning
  • Whenever you’re ready, mark each step as complete: you’re in control
  • Complete 90% of course steps and all of the assessments to earn your certificate

You can use the hashtag #FLbigdataStats to talk about this course on social media.