Learning process in machine learning

Here is the learning process in machine learning. If you want to apply a machine learning algorithm, what does its learning process look like? First, you need to provide the training data.

The computer then learns from that data and generates a model for you. From this model you obtain the training accuracy, and if you feed testing data into the model, you obtain another accuracy, which we call the testing accuracy. That is the difference between training and testing: usually you build the model while monitoring the training accuracy, and then use the testing accuracy to evaluate the performance on another data set. There is also another evaluation method, called cross-validation. What is cross-validation? For example, suppose you have an original data set.
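The train-then-test loop described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration, a toy 1-nearest-neighbour classifier on made-up one-dimensional data, not Weka's implementation:

```python
# Toy sketch of the training/testing evaluation described above.
# The classifier and data are hypothetical, for illustration only.

def predict_1nn(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return correct / len(xs)

# Made-up 1-D data: label 1 for large values, label 0 for small values.
train_x, train_y = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
test_x,  test_y  = [1.5, 8.5, 5.0],      [0, 1, 1]

train_acc = accuracy(train_x, train_y, train_x, train_y)  # on seen data
test_acc  = accuracy(train_x, train_y, test_x,  test_y)   # on unseen data
print(train_acc, test_acc)
```

Because the model has effectively memorised the training points, the training accuracy is perfect, while the testing accuracy on unseen points is lower: exactly the gap between training and testing accuracy described above.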
You could divide it into training and validation sets, as I showed before, and then perform the training and testing as in the previous case. So what is the difference between a single training/validation split and cross-validation? In cross-validation, for example five-fold cross-validation, I run five rounds of validation. In each round, one part of the data serves as the validation set and the rest serves as the training set. In other words, we divide the original data set into five parts, and in each round we use one part for validation.

The other four parts form the training set. We repeat this until every part has been used as the validation set once. If you then average all of the results, you obtain the cross-validation accuracy, the performance under cross-validation. In many applications, people prefer cross-validation because it gives a very consistent result: you average over many rounds of validation. If you use a single training/validation split, the result can be overfitted, and you cannot estimate how the model will perform on another data set. If you test many times, the results are more stable and consistent.
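The five-fold procedure can be sketched as follows. This is a minimal plain-Python illustration with a made-up data set and a deliberately trivial stand-in model (a majority-class predictor), not the actual classifier used in the course:

```python
# Sketch of five-fold cross-validation: split the data into 5 parts,
# validate on each part once, and average the per-fold accuracies.
# Data and model below are hypothetical, for illustration only.

def majority_label(labels):
    """A trivial stand-in model: always predict the most common label."""
    return max(set(labels), key=labels.count)

def k_fold_accuracies(xs, ys, k=5):
    fold_size = len(xs) // k
    accs = []
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        val_y = ys[start:stop]            # one part is the validation set
        train_y = ys[:start] + ys[stop:]  # the other parts are the training set
        pred = majority_label(train_y)    # "train" the stand-in model
        accs.append(sum(y == pred for y in val_y) / len(val_y))
    return accs

ys = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # made-up labels
xs = list(range(len(ys)))             # features unused by this toy model
accs = k_fold_accuracies(xs, ys, k=5)
cv_accuracy = sum(accs) / len(accs)   # cross-validation accuracy = average
print(accs, cv_accuracy)
```

Note that each of the five folds is used for validation exactly once, and the single reported number is the average over all folds, which is why the estimate is more stable than a one-off split.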
That is why I want to show you cross-validation: in real cases you will meet problems that we call overfitting and underfitting. So what is the difference between them? Overfitting means the training accuracy is very good, but when you test on another data set, such as unseen data, the performance is not good. The optimum is in the middle: the model contains very few errors, and the performance on the training and testing sets is very similar. The other case is underfitting.

With underfitting, you receive a lot of errors when you test the model. Underfitting means the model was not trained well, usually because you do not have enough data to train on, or because the model you implemented is not a good fit for the problem. Next, here is a slide showing how to read the results from Weka after you run a machine learning algorithm. Weka returns several pieces of information, including the correctly classified instances and the incorrectly classified instances.
You can see how many data instances you have, and the accuracies for the positive and negative data. For this example, the model predicts correctly for about 79 percent of the instances and incorrectly for about 21 percent. Below that, Weka shows the detailed accuracy by class. You have class 1 and class 2: for class 1 the recall is 56 percent, and for class 2 it is 71 percent.

Further down, you can see the confusion matrix, which contains the true positives, true negatives, false negatives, and false positives. From this confusion matrix, you can calculate the accuracies for the positive and negative classes as well as the final overall accuracy. That is all the information in the results, and this is how you can read the output from Weka.
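Reading a 2x2 confusion matrix can be sketched as follows. The counts here are hypothetical, not the numbers on the slide; the layout assumed is rows for actual class and columns for predicted class, as in Weka's output:

```python
# Hypothetical 2x2 confusion matrix (rows = actual, columns = predicted).
tp, fn = 40, 10   # actual class 1: predicted class 1 (TP), predicted class 2 (FN)
fp, tn = 5, 45    # actual class 2: predicted class 1 (FP), predicted class 2 (TN)

recall_class1 = tp / (tp + fn)              # accuracy on the positive class
recall_class2 = tn / (tn + fp)              # accuracy on the negative class
overall = (tp + tn) / (tp + fn + fp + tn)   # correctly classified instances
print(recall_class1, recall_class2, overall)
```

The diagonal entries (TP and TN) are the correctly classified instances; dividing each row's diagonal entry by its row total gives the per-class recall, and dividing the whole diagonal by the grand total gives the overall accuracy.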

In this video, Dr. Khanh will explain the learning process in machine learning. If you want to apply machine learning algorithms, how can machines learn? First, we need to provide the training data.

As the chart shows, if we give training data to the model, we obtain the training accuracy and the testing accuracy. Dr. Khanh then explains another evaluation method, cross-validation. In real cases, machine learning algorithms meet problems we call overfitting and underfitting, and he explains how they affect the testing results. The final slide shows the results returned by Weka, and Dr. Khanh will teach you how to read them.

This article is from the free online course Artificial Intelligence in Bioinformatics

Created by
FutureLearn - Learning For Life
