Reflect on your experience

Ian Witten reflects on the Big Data quiz, and explains why, instance for instance, testing is much slower than training for the NaiveBayesUpdateable classifier.

Well over a million instances of this 55-attribute dataset are needed before it becomes too big for Weka to load into memory. Beyond that point you have to use “updateable” classifiers, and then there is no limit on size.

As you discovered in the quiz, you could load the 581,000-instance covtype dataset into the Explorer. You could also have loaded the double-size version (1,162,000 instances; try it if you like), but not the triple-size version (1,743,000 instances).

The command line interface allows larger files, provided it’s configured in a suitable way – updateable classifiers and no cross-validation. Using a test set and NaiveBayesUpdateable, it was able to process the triple-size training file; and in fact both training and test files of any size could be processed.
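The reason an updateable classifier has no size limit can be sketched in a few lines: it looks at one instance at a time and keeps only running counts, never the whole file. The following is a minimal Python illustration, not Weka's code. The function name `train_incrementally`, the tiny three-row dataset and the CSV layout (class label in the last column) are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def train_incrementally(rows):
    """Consume rows one at a time, keeping only counts in memory."""
    counts = Counter()  # (attribute index, attribute value, class) -> occurrences
    seen = 0
    for row in rows:
        *values, label = row  # assumption: the class label is the last column
        for index, value in enumerate(values):
            counts[(index, value, label)] += 1
        seen += 1
    return counts, seen


# Stand-in for a very large file: csv.reader yields one row at a time,
# so the rows themselves are never all held in memory together.
data = io.StringIO("a,x,yes\nb,x,no\na,y,yes\n")
counts, seen = train_incrementally(csv.reader(data))

print(seen, "instances processed;", len(counts), "counts stored")
```

The memory needed depends on the number of distinct attribute-value and class combinations, not on the number of instances read.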

Weka reports the time that NaiveBayesUpdateable takes to build the model and the time it takes to test it. On my computer, with the triple-size dataset, training takes 10 seconds and testing 0.5 seconds. Training takes only 20 times longer than testing, despite the fact that the training file is about 175 times larger (1,743,000 instances compared with the test file’s 10,000)!
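The comparison is easier to see per instance. The short Python calculation below uses only the figures quoted above; the timings are from one computer, so treat the result as indicative rather than exact.

```python
# Figures quoted in the text (timings are from the author's computer).
train_instances = 1_743_000
test_instances = 10_000
train_seconds = 10.0
test_seconds = 0.5

size_ratio = train_instances / test_instances  # how much larger the training file is
time_ratio = train_seconds / test_seconds      # how much longer training takes

# Cost of handling a single instance, in microseconds.
train_us_per_instance = train_seconds / train_instances * 1e6
test_us_per_instance = test_seconds / test_instances * 1e6
per_instance_slowdown = test_us_per_instance / train_us_per_instance

print(f"training file is {size_ratio:.1f} times larger")
print(f"training takes {time_ratio:.0f} times longer")
print(f"training: {train_us_per_instance:.2f} microseconds per instance")
print(f"testing:  {test_us_per_instance:.2f} microseconds per instance")
print(f"testing costs {per_instance_slowdown:.1f} times more per instance")
```

On these numbers, testing an instance costs roughly 8 to 9 times as much as training on one.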

Why? This involves thinking about how the Naive Bayes algorithm works. Processing a single instance when training involves incrementing 55 counts, one for each attribute. Processing an instance when testing involves 55 multiplications for each of the 7 class values. Thus if the time taken for multiplication and addition were comparable, one would expect testing an instance to take 7 times as long as training on one.
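The counting argument can be checked with a toy model that tallies its own work. This is a simplified Python sketch, not Weka's NaiveBayesUpdateable: the class `CountingNaiveBayes` is hypothetical, it treats every attribute as nominal with two possible values, and it takes the figures of 55 attributes and 7 classes at face value.

```python
from collections import defaultdict

NUM_ATTRIBUTES = 55  # figure from the text
NUM_CLASSES = 7      # figure from the text


class CountingNaiveBayes:
    """Toy updateable Naive Bayes over nominal attributes that tallies its work."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.class_counts = [0] * num_classes
        # counts[(attribute index, attribute value, class)] -> occurrences
        self.counts = defaultdict(int)
        self.increments = 0       # attribute-count increments done in training
        self.multiplications = 0  # probability multiplications done in testing

    def update(self, instance, label):
        """Train on one instance: one count increment per attribute."""
        self.class_counts[label] += 1
        for i, value in enumerate(instance):
            self.counts[(i, value, label)] += 1
            self.increments += 1

    def predict(self, instance):
        """Classify one instance: one multiplication per attribute per class."""
        total = sum(self.class_counts)
        best_class, best_score = None, -1.0
        for c in range(self.num_classes):
            # Smoothed class prior.
            score = (self.class_counts[c] + 1) / (total + self.num_classes)
            for i, value in enumerate(instance):
                # Smoothed estimate; the "+ 2" assumes two values per attribute.
                count = self.counts.get((i, value, c), 0)
                score *= (count + 1) / (self.class_counts[c] + 2)
                self.multiplications += 1
            if score > best_score:
                best_class, best_score = c, score
        return best_class


model = CountingNaiveBayes(NUM_CLASSES)
instance = [0] * NUM_ATTRIBUTES
model.update(instance, label=3)
predicted = model.predict(instance)

print("increments for one training instance:", model.increments)
print("multiplications for one test instance:", model.multiplications)
```

One training instance costs 55 increments and one test instance costs 55 × 7 = 385 multiplications, which is where the factor of 7 comes from.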

This article is from the free online course More Data Mining with Weka.
