How much training data do I need? And how do I optimize all those parameters?

Ian Witten introduces this week's second Big Question
Two questions loom large when embarking on a data mining project. First, how much training data is enough? And second, given that data mining algorithms generally have parameters, how do you find suitable values for them, having chosen the algorithm itself?
Good questions.
And Weka can help. By the end of the week you will be able to examine how performance improves as the volume of training data increases. You’d expect it to improve rapidly at first, subject to random fluctuations, and then continue to improve – but at a steadily decreasing rate – thereafter. That should help you determine how much data is enough to achieve your goals. As for the second question, you should avoid optimizing parameters manually: you’re bound to overfit! Instead, by the end of the week you will be able to use “wrapper” metalearners that Weka provides to optimize parameters for best performance.
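To make the expected shape of that curve concrete, here is a minimal sketch in plain Python rather than Weka. Everything in it is an assumption made for illustration: the synthetic two-class data, the simple nearest-centroid classifier, and the function names (`make_data`, `learning_curve`, and so on) do not come from the course. It trains on progressively larger samples and reports mean accuracy on a fixed test set.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data: one noisy numeric feature per instance."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)  # class means 0 and 2, std dev 1
        data.append((x, label))
    return data


def train_centroids(train):
    """Nearest-centroid 'model': the mean feature value of each class seen."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids, test):
    """Fraction of test instances classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def learning_curve(sizes, repeats=20, seed=1):
    """Mean test accuracy for each training-set size, averaged over repeats."""
    rng = random.Random(seed)
    test = make_data(2000, rng)  # fixed held-out test set
    curve = []
    for n in sizes:
        accs = []
        for _ in range(repeats):
            train = make_data(n, rng)  # fresh training sample of size n
            accs.append(accuracy(train_centroids(train), test))
        curve.append((n, sum(accs) / len(accs)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([2, 5, 10, 50, 200, 1000]):
        print(f"{n:5d} training instances -> mean accuracy {acc:.3f}")
```

Averaging over several random training samples at each size smooths out the random fluctuations mentioned above. On data like this, accuracy should climb quickly at small sizes and then level off, which is the point at which more data stops paying for itself.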
And a final question, in this closing miscellany of things you need to know: what other things can be specified in the ARFF file format in which datasets are represented?
This article is from the free online course More Data Mining with Weka.