
# Predictive Analysis for Solving Problems

Once you have identified the problem you would like to solve, the critical skill in solving it with data is decomposing the data analytics problem into pieces. You should then be able to match each piece with an available data analysis tool. Let’s briefly discuss the problems and tools we will use in this class.

First, classification is identifying the group to which each case belongs. Recall our customer retention case. There, classification means determining which customers will switch to another provider’s services and which will stay with the current service. For instance, we can use classification tree methods to solve this problem. We will learn about the tree model later.
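To make the idea concrete, here is a minimal sketch of the simplest possible classification tree: a single split, sometimes called a decision stump. The data, the `fit_stump` and `predict` names, and the use of "support calls" as the predictor are all assumptions made up for illustration. Real tree methods usually choose splits with an impurity measure such as Gini or entropy, and this sketch just counts misclassified cases.

```python
# Hypothetical toy data: (support calls last month, did the customer churn?).
# These values are invented for illustration and are not from the course.
customers = [(0, False), (1, False), (1, False), (2, False),
             (3, True), (4, True), (5, True), (2, True)]


def fit_stump(rows):
    """Find the threshold t with the fewest errors for the rule 'churn if calls > t'."""
    best_t, best_errors = None, len(rows) + 1
    for t in sorted({calls for calls, _ in rows}):
        # Count cases where the rule's prediction disagrees with the label.
        errors = sum((calls > t) != churned for calls, churned in rows)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


def predict(threshold, calls):
    """Classify one customer: True means 'predicted to churn'."""
    return calls > threshold


threshold, errors = fit_stump(customers)
print(f"rule: churn if calls > {threshold} ({errors} training error(s))")
```

A full tree repeats this search inside each resulting group, over every available variable.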

You already know linear regression. Linear regression is also a predictive modeling methodology. You can use regression to estimate unknown population parameters from a sample and then make predictions based on the fitted model. I will not spend much time on linear regression in this lecture, but you should read our material to refresh your memory of regression.

Let’s go back to our cell phone customer retention case. If you would like to know how much potential sales you will lose when a customer churns, you might be able to use regression. In this case, we relate the amount of money the customer spends to a few other variables. We call ‘the amount of money the customer spends’ the dependent variable, and the variables that explain it the independent variables. You can read about independent and dependent variables in our study material. During this course, we will solve a similar kind of problem using an artificial neural network.
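As a rough illustration of relating a dependent variable to one independent variable, the sketch below fits a straight line by ordinary least squares. The numbers and the choice of "data used (GB)" as the independent variable are assumptions invented for this example, and `fit_line` is a hypothetical helper name.

```python
# Hypothetical data: monthly data usage in GB (independent variable)
# and monthly spend (dependent variable). Invented for illustration.
usage_gb = [1, 2, 3, 4, 5]
spend = [15, 21, 24, 31, 34]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(usage_gb, spend)
# Predicted spend for a customer who uses 6 GB, which is the
# amount we would expect to lose per month if that customer churns.
predicted_spend = intercept + slope * 6
print(f"spend = {intercept:.2f} + {slope:.2f} * GB")
```

With several independent variables the same idea applies, but the fit is usually done with a statistics package, not by hand.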

When you use online services such as Netflix and Amazon, you might get some recommendations. The algorithm behind these recommendations performs similarity matching. We will discuss various methods such as similarity measures, nearest neighbors, clustering, and association rules to solve this kind of problem.
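A minimal sketch of similarity matching, assuming each user is represented by a vector of ratings for the same four items: cosine similarity serves as the similarity measure, and the nearest neighbor is the other user with the highest score. The users, ratings, and function names are made up for illustration, and treating an unrated item as 0 is a simplification. This is not how any particular service actually works.

```python
import math

# Hypothetical ratings of the same four films (0 = not rated).
ratings = {
    "ana":   [5, 4, 0, 1],
    "ben":   [4, 5, 1, 0],
    "chloe": [0, 1, 5, 4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(name, all_ratings):
    """Return the other user whose ratings are most similar to `name`'s."""
    others = (other for other in all_ratings if other != name)
    return max(others,
               key=lambda other: cosine_similarity(all_ratings[name],
                                                   all_ratings[other]))


print(nearest_neighbor("ana", ratings))
```

A recommender built on this idea would suggest items that the nearest neighbor rated highly and the target user has not yet seen.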

More data is not always more useful. Sometimes we perform data reduction. Here is a well-known dataset in R called “mtcars” that describes many attributes of European, Japanese, and American cars, including their miles per gallon, the number of cylinders, and the number of forward gears. Since the mtcars dataset has eleven attributes, a graph without data reduction would need eleven dimensions. That is too messy, so we reduce the dataset to two dimensions. Some information is lost, but the two new dimensions are chosen to retain as much of the variation in the data as possible. This method is called principal component analysis. We will not discuss principal component analysis in detail, but it is a widely used technique.
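To see how much of the variation a reduced view keeps, here is a small sketch of the idea behind principal component analysis on just two variables, reduced to one. The data points are made up (they are not the real mtcars values), and `explained_variance_2d` is a hypothetical helper. It uses the closed-form eigenvalues of a 2×2 covariance matrix, which only works for two variables. Real analyses use a library routine and often standardize the variables first.

```python
import math

# Made-up pairs of two correlated measurements, NOT real mtcars values.
points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 5.9)]


def explained_variance_2d(rows):
    """Share of total variance captured by each of the two principal components."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    # Eigenvalues of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    root = math.sqrt(max(0.0, trace ** 2 / 4 - det))
    first, second = trace / 2 + root, trace / 2 - root
    return first / trace, second / trace


first_share, second_share = explained_variance_2d(points)
print(f"first component keeps {first_share:.1%} of the variance")
```

Dropping the second component here discards only a small share of the variance, which is why the reduced picture is still useful even though it is not a lossless copy of the data.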

Is news relevant to your daily life? These days, data scientists often analyze text-based data such as news or even social media data. Extracting information from documents is called “information retrieval.” Data scientists have developed various methods to do this.
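One common family of such methods scores documents against a query by term weighting. The sketch below uses a simple TF-IDF score, assuming whitespace tokenization and three invented headlines. The `rank` function is a hypothetical name, and production systems add steps such as stemming and stop-word removal.

```python
import math
from collections import Counter

# Invented headlines standing in for a news collection.
docs = [
    "phone provider raises prices",
    "new phone released today",
    "football team wins final",
]


def rank(query, documents):
    """Return document indices ordered from most to least relevant to the query."""
    tokenized = [Counter(doc.lower().split()) for doc in documents]
    n_docs = len(documents)
    scores = []
    for counts in tokenized:
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df:
                # Rare terms get a larger weight than common ones.
                score += counts[term] * math.log(n_docs / df)
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])


print(rank("phone prices", docs))
```

The first headline ranks highest because it contains both query terms, including the rarer one.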

The goal of predictive analysis is to build a model using the various methods we just mentioned and apply it to predict outcomes that have not happened yet. Remember, our goal is not to build a model that explains the given data well but one that predicts unknown cases well.
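The difference between explaining the given data and predicting unknown cases can be sketched by holding some data back. In this illustration, with made-up numbers and hypothetical function names, one model simply memorizes the training cases and another fits a simple proportional rule. Both are then scored on cases they have never seen.

```python
# Invented (input, outcome) pairs: the first set is used to build the
# models, the second set is held back and used only for evaluation.
training = [(1, 12.0), (2, 19.0), (3, 31.0), (4, 41.0)]
held_out = [(5, 50.0), (6, 61.0)]


def memorizer(rows):
    """Model that recalls training outcomes exactly and otherwise guesses the mean."""
    table = dict(rows)
    fallback = sum(y for _, y in rows) / len(rows)
    return lambda x: table.get(x, fallback)


def proportional(rows):
    """Least-squares line through the origin: outcome = slope * input."""
    slope = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return lambda x: slope * x


def mean_abs_error(model, rows):
    """Average absolute gap between predicted and actual outcomes."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


for name, build in [("memorizer", memorizer), ("proportional", proportional)]:
    model = build(training)
    print(name,
          "train error:", round(mean_abs_error(model, training), 2),
          "held-out error:", round(mean_abs_error(model, held_out), 2))
```

The memorizer fits the training data perfectly yet does poorly on the held-out cases, while the simpler rule is slightly worse on the training data and much better on unseen cases.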

Data science techniques can provide answers to business questions. Remind yourself of the customer retention case. You can then answer the questions that probably matter in your workplace. Rather than debating verbally, solving problems with data should yield more sensible solutions.