# What are Models and Predictions?

What is a model? Discover how to use various machine learning methods to make a model.

What is a model?

A model is a formula that produces an answer from its inputs. We will use various machine learning methods to build models. There are two essential kinds of models: predictive models and descriptive models. A descriptive model gives insight into the underlying phenomenon. A predictive model, on the other hand, is a formula for estimating an unknown value of interest, called the target. That formula can be mathematical or logical (for example, an if-then rule), and predictive models are judged by their predictive performance.
Some models can serve both descriptive and predictive purposes.
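To make "a formula that answers based on inputs" concrete, here is a minimal sketch of two toy predictive models for a loan: one mathematical (a weighted score passed through a logistic function) and one logical (an if-then rule). The function names, inputs, coefficients, and thresholds are all hypothetical and chosen only for illustration. They are not fitted to any data and are not how any real platform scores loans.

```python
import math

# Hypothetical coefficients for illustration only; not fitted to any real data.
def formula_model(annual_income_k: float, interest_rate: float) -> float:
    """Mathematical model: returns an estimated probability of full repayment."""
    score = -1.0 + 0.03 * annual_income_k - 0.1 * interest_rate
    return 1.0 / (1.0 + math.exp(-score))  # logistic function maps the score into (0, 1)

# Hypothetical rule for illustration only.
def rule_model(grade: str, annual_income_k: float) -> bool:
    """Logical model: an if-then rule that predicts 'fully paid' (True) or not (False)."""
    return grade in {"A", "B"} and annual_income_k >= 40

print(round(formula_model(60, 10), 3))  # probability-like output
print(rule_model("B", 55))              # yes/no output
```

Both are predictive models in the sense above: each maps input attributes to an estimate of the target. The first returns a graded estimate, while the second returns a hard yes/no answer.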

Have you heard about p2p lending? Peer-to-peer lending refers to the practice of lending money to individuals or small businesses via online services. It matches lenders with borrowers. Lenders can typically earn higher returns relative to savings and some investment products offered by banks. However, there is, of course, the risk that the borrower defaults on his or her loan. A platform usually sets interest rates based on analyzing the borrower’s data. The platform generates revenue by collecting a one-time fee on funded loans from borrowers and by charging a loan servicing fee to investors.

LendingClub, in the U.S., is the largest P2P platform. LendingClub issues loans between $1,000 and $40,000 with a term of either 36 or 60 months. As mentioned, borrowers' interest rates are determined from personal information such as credit score and annual income. LendingClub also categorizes its loans using a grading scheme with grades A, B, C, D, E, F, and G, where grade G corresponds to the loans LendingClub judges to be “most risky.” Individual investors can browse loan listings online before deciding which loans to invest in.

From LendingClub, you can obtain loan data. There are two kinds: data on issued loans and data on declined loans. Data on issued loans include the time period, the current loan status (whether the loan is current, late, or fully paid), and the latest payment information. Data on declined loans tell you why applicants did not meet the requirements.

Remember that data mining starts with data. In data mining terminology, data have attributes. An attribute is a property or characteristic of an object, and a collection of attributes describes the object. In the data on issued loans, the attributes that describe each loan include loan status, loan amount, and interest rate.

Let’s look at the dataset provided by LendingClub. For customers 1 to 4, the data show each customer’s loan amount, funded amount, term, interest rate, loan grade, and loan status. Here the objects are the customers, and the attributes are the columns from loan amount to loan status.
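The object/attribute vocabulary maps naturally onto a record type: each object is one instance, and each attribute is one field. Below is a minimal sketch using a Python dataclass. The class name `Loan`, the field names, and both rows are hypothetical and invented for illustration. They are not actual LendingClub records or its real column names.

```python
from dataclasses import dataclass, fields

@dataclass
class Loan:
    """One object (a customer's loan) described by a collection of attributes."""
    loan_amount: float    # USD requested
    funded_amount: float  # USD actually funded
    term_months: int      # 36 or 60
    interest_rate: float  # percent per year
    grade: str            # loan grade, one of "A" through "G"
    loan_status: str      # e.g. "Fully Paid", "Current", "Late"

# Invented example rows for illustration; not actual LendingClub records.
customers = [
    Loan(10000, 10000, 36, 7.5, "A", "Fully Paid"),
    Loan(24000, 24000, 60, 18.9, "D", "Late"),
]

# The attribute names are the same for every object; only their values differ.
attribute_names = [f.name for f in fields(Loan)]
print(attribute_names)
for i, customer in enumerate(customers, start=1):
    print(f"customer {i}:", customer)
```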

Loan status is the target attribute. A target variable is what we would like to know. In this case, we would like to know whether a candidate loan will be fully paid or not. The target is a special kind of attribute, and the dataset contains a target column. The data we use to build the model are called the training data, and building the model is called training. In the training data, the target column holds the known values against which the model's outputs are compared.
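Before training, the target column is usually separated from the other attributes so the algorithm sees the inputs and the known answers separately. The sketch below does this for a few rows stored as dictionaries. The rows, the helper name `split_features_target`, and the choice to encode the target as "fully paid or not" are assumptions made for illustration.

```python
# Invented training rows for illustration: each dict is one object (row).
training_data = [
    {"loan_amount": 10000, "interest_rate": 7.5, "grade": "A", "loan_status": "Fully Paid"},
    {"loan_amount": 24000, "interest_rate": 18.9, "grade": "D", "loan_status": "Charged Off"},
    {"loan_amount": 5000, "interest_rate": 11.0, "grade": "B", "loan_status": "Fully Paid"},
]

TARGET = "loan_status"  # the target attribute we want to predict

def split_features_target(rows, target):
    """Separate each row into its input attributes (features) and its target value."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    # Binary target: True if the loan was fully paid, False otherwise.
    labels = [row[target] == "Fully Paid" for row in rows]
    return features, labels

X, y = split_features_target(training_data, TARGET)
print(X[0])  # features of the first object, without the target
print(y)     # the known answers used for comparison during training
```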

In this dataset, customer 3 has an issued loan. Using the loan amount, funded amount, loan term, interest rate, and loan grade, an algorithm is trained to map these other attributes to the customer’s loan status. The idea is very similar to regression: the algorithm tries to make the output it computes from many attributes match the target variable.
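The regression analogy can be made concrete with logistic regression, one common choice for a yes/no target (the text does not say which algorithm is used). The minimal sketch below fits a single feature, the interest rate, by gradient descent. On every pass it compares the model's output with the known target and nudges the weights to shrink the error. The data points are invented, and the learning rate and number of epochs are arbitrary assumptions.

```python
import math

# Invented toy training data: (interest rate in percent, 1 = fully paid, 0 = not).
data = [(6.0, 1), (8.0, 1), (10.0, 1), (12.0, 1), (16.0, 0), (19.0, 0), (22.0, 0), (25.0, 0)]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, lr=0.1, epochs=5000):
    """Fit p(fully paid) = sigmoid(w * (x - mean) + b) by gradient descent on log-loss."""
    mean = sum(x for x, _ in data) / len(data)  # center the feature for stable steps
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in data:
            pred = sigmoid(w * (x - mean) + b)  # model output
            error = pred - target               # compare the output with the target
            grad_w += error * (x - mean)
            grad_b += error
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b, mean

w, b, mean = train_logistic(data)

def predict(rate: float) -> bool:
    """Predict 'fully paid' when the estimated probability is at least 0.5."""
    return sigmoid(w * (rate - mean) + b) >= 0.5

print(w, b)
print(predict(9.0), predict(20.0))
```

A negative learned weight `w` means that higher interest rates lower the estimated chance of full repayment in this toy data. A real model would use many attributes at once, just as the text describes.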

A model is a simplified representation of reality created to serve a purpose. A predictive model is a formula for estimating the unknown value of interest: the target. Using many attributes, such as balance, age, job, and loan amount, the algorithm builds a model that answers the question: should we lend this person the money he or she requested? In data science terminology, the target attribute could be the loan status or a write-off value.

In data mining, the same thing often goes by several names. A dataset is traditionally laid out as a table of rows and columns, so "table" is another word for the dataset. Rows are called examples or instances. Columns are also called features or independent variables. An independent variable, as you know, is a variable whose variation does not depend on that of another.

Machine learning can be summarized as follows. You input data and train on that data. We call this induction, learning, or building a model, and the process of building the model is called training. Because the target is what interests us, during training the algorithm learns to predict the target by comparing its predictions with the known target values.

To understand the role of the target, we should know the two types of learning: supervised learning and unsupervised learning. Supervised learning is similar to a teacher supervising the learning process in class: we know the correct answers, and we teach the algorithm to learn them. Here, the circles are one group and the x's are the other. As in this example, the data tell us the right answer. We repeat this supervised process so that the algorithm learns and can then predict, for new data, whether each point belongs to group 1 or group 2.
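One simple supervised method for the circles-versus-x's picture is k-nearest neighbors: a new point gets the label held by most of its closest labeled points. This technique is chosen here only as an example, because the text does not name a specific algorithm. The coordinates and the helper name `knn_predict` are hypothetical. The key point is that every training point carries a known answer.

```python
import math
from collections import Counter

# Labeled training points: the answers (circle or x) are known in advance.
# Coordinates are invented for illustration.
labeled = [
    ((1.0, 1.0), "circle"), ((1.5, 2.0), "circle"), ((2.0, 1.2), "circle"),
    ((6.0, 6.5), "x"), ((7.0, 6.0), "x"), ((6.5, 7.5), "x"),
]

def knn_predict(point, training, k=3):
    """Classify a new point by majority vote among its k nearest labeled points."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.8, 1.6), labeled))  # a new point near the circles
print(knn_predict((6.2, 6.8), labeled))  # a new point near the x's
```

`math.dist` requires Python 3.8 or later.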

However, in unsupervised learning, nobody knows the answers. Only the input data is given. There is no output variable in the training data set. We do not know whether these are group 1 or group 2. Algorithms are left on their own to find the structure and distribution of the input data.
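For contrast, here is a minimal sketch of an unsupervised method, k-means clustering, chosen as one common example because the text does not name an algorithm. It receives only the points with no labels, and it finds two groups on its own by alternately assigning points to the nearest center and moving each center to the mean of its points. The points, the number of clusters, and the simple deterministic initialization are assumptions made for illustration.

```python
import math

# Unlabeled points: no one tells the algorithm which group each point belongs to.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]

def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid updates."""
    # Simple deterministic initialization: spread the starting centroids across the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Unlike the supervised sketch, nothing here says which group is "right". The algorithm only reports the structure it found, and a person must still interpret what each cluster means.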

We will continue to discuss supervised learning next week.