Reflect on your experience

Ian Witten reflects on the Big Data quiz, and explains why, per instance, testing is much slower than training for the NaiveBayesUpdateable classifier.

Well over a million instances of this 55-attribute dataset are needed before it becomes too big for Weka to load into memory. Beyond that point you have to use “updateable” classifiers, and then there is no limit on size.

As you discovered in the quiz, you could load the 581,000-instance covtype dataset into the Explorer. You could also have loaded the double-size version (1,162,000 instances; try it if you like), but not the triple-size version (1,743,000 instances).

The command-line interface can handle larger files, provided it is configured suitably: use an updateable classifier and no cross-validation. Given a separate test set and NaiveBayesUpdateable, it processed the triple-size training file; in fact, training and test files of any size can be processed this way.

Weka reports the time that NaiveBayesUpdateable takes to build the model and then to test it on the supplied test file. On my computer, with the triple-size dataset, training takes 10 seconds and testing 0.5 seconds. Training takes only 20 times longer than testing, despite the fact that the training file is about 175 times larger (1,743,000 instances compared with the test file’s 10,000)!
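To make the comparison concrete, the figures quoted above can be turned into per-instance times. The snippet below is only a worked calculation on those numbers (one machine, one run); the variable names are illustrative and not part of Weka.

```python
# Figures quoted in the text (triple-size covtype, one machine).
train_instances = 1_743_000
test_instances = 10_000
train_seconds = 10.0
test_seconds = 0.5

# How much larger the training file is, and how much longer training takes.
size_ratio = train_instances / test_instances
time_ratio = train_seconds / test_seconds

# Time spent on a single instance, in microseconds.
train_us_per_instance = train_seconds / train_instances * 1e6
test_us_per_instance = test_seconds / test_instances * 1e6

# How much slower testing is than training, per instance.
per_instance_ratio = test_us_per_instance / train_us_per_instance

print(f"training file is {size_ratio:.1f}x larger")
print(f"training takes {time_ratio:.0f}x longer overall")
print(f"training: {train_us_per_instance:.2f} us per instance")
print(f"testing:  {test_us_per_instance:.2f} us per instance")
print(f"testing is {per_instance_ratio:.1f}x slower per instance")
```

On these figures, testing costs roughly 50 microseconds per instance against under 6 for training, so testing is nearly 9 times slower per instance.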

This content is taken from The University of Waikato online course, More Data Mining with Weka.

Why? This involves thinking about how the Naive Bayes algorithm works. Processing a single instance when training involves incrementing 55 counts, one for each attribute. Processing an instance when testing involves 55 multiplications for each of the 7 class values. Thus if the time taken for multiplication and addition were comparable, one would expect testing to take 7 times as long as training for each instance. That is close to what the timings show: per instance, testing was roughly 9 times slower than training (0.5 seconds for 10,000 instances against 10 seconds for 1,743,000).
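The operation counts can be checked with a toy model. The following is a minimal sketch, not Weka's NaiveBayesUpdateable: the class `CountingNaiveBayes`, its counters, and the smoothing are hypothetical, and it assumes all 55 attributes are binary nominal values (Weka's implementation also handles numeric attributes, which this sketch does not).

```python
N_ATTRIBUTES = 55  # attribute count quoted in the text
N_CLASSES = 7      # class values quoted in the text


class CountingNaiveBayes:
    """Toy incremental naive Bayes that tallies its own operations."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.class_counts = [0] * n_classes
        self.counts = {}          # (attribute index, value, class) -> count
        self.increments = 0       # per-attribute count updates during training
        self.multiplications = 0  # per-attribute multiplications during testing

    def update(self, values, label):
        """Train on one instance: one count increment per attribute."""
        self.class_counts[label] += 1
        for i, v in enumerate(values):
            key = (i, v, label)
            self.counts[key] = self.counts.get(key, 0) + 1
            self.increments += 1

    def predict(self, values):
        """Classify one instance: one multiplication per attribute per class."""
        total = sum(self.class_counts)
        best_label, best_score = None, -1.0
        for c in range(self.n_classes):
            # Smoothed class prior (add-one smoothing, an assumption of this sketch).
            score = (self.class_counts[c] + 1) / (total + self.n_classes)
            for i, v in enumerate(values):
                # Smoothed conditional probability, assuming binary attributes.
                p = (self.counts.get((i, v, c), 0) + 1) / (self.class_counts[c] + 2)
                score *= p
                self.multiplications += 1
            if score > best_score:
                best_label, best_score = c, score
        return best_label


model = CountingNaiveBayes(N_CLASSES)
instance = [i % 2 for i in range(N_ATTRIBUTES)]  # made-up binary instance

model.update(instance, label=3)      # train on a single instance
predicted = model.predict(instance)  # test on a single instance

print(model.increments)       # 55
print(model.multiplications)  # 385, i.e. 7 times as many
print(predicted)              # 3
```

Because `update` touches each instance once and keeps only counts, memory use does not grow with the number of training instances, which is what lets an updateable classifier handle files too large to load.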
