Skip main navigation

Reflect on your experience

In Witten reflects on the Big Data quiz, and explains why testing is much slower than training for the NaiveBayesUpdateable classifier.

Well over a million instances of this 55-attribute dataset are needed before it becomes too big for Weka to load into memory. Beyond that point you have to use “updateable” classifiers, and then there is no limit on size.

As you discovered in the quiz, you could load the 581,000-instance covtype dataset into the Explorer. You could also have loaded the double-size version (1,162,000 instances; try it if you like), but not the triple-size version (1,743,000 instances).

The command line interface allows larger files, provided it’s configured in a suitable way – updateable classifiers and no cross-validation. Using a test set and NaiveBayesUpdateable, it was able to process the triple-size training file; and in fact both training and test files of any size could be processed.

Weka reports the time that NaiveBayesUpdateable takes to build the model and then test it on the training data. On my computer, with the triple-size dataset, training takes 10 secs and testing 0.5 secs. Training takes only 20 times longer than testing – despite the fact that the training file is 175 times larger (1,743,000 instances compared with the test file’s 10,000)!

Why? This involves thinking about how the Naive Bayes algorithm works. Processing a single instance when training involves incrementing 55 counts, one for each attribute. Processing an instance when testing involves 55 multiplications for each of the 7 class values. Thus if the time taken for multiplication and addition were comparable, one would expect testing to take 7 times as long as training.

This article is from the free online

More Data Mining with Weka

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now