Prepare for the quiz

Ian Witten suggests that you reproduce what he did in the video.

Before starting the quiz we suggest you reproduce what Ian did in the video, using the LED24 data in the Command Line Interface.

Make a test file with 100,000 instances. Precede all filenames below with an appropriate directory specification, surrounded by quotation marks if necessary.

java weka.datagenerators.classifiers.classification.LED24 -n 100000 -o test.arff

(The Command Line Interface gives no output; you need to look for the file to see if it has worked.)
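
Because the generator prints nothing, one way to confirm that it worked is to count the instances in the file it wrote. The following is a minimal Python sketch and not part of Weka. The function name `count_arff_instances` is hypothetical, and it assumes the usual ARFF layout: a header, then an `@data` line, then one instance per line, with `%` marking comments.

```python
def count_arff_instances(path):
    """Count data rows in an ARFF file, reading it line by line."""
    count = 0
    in_data = False
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if not in_data:
                # The header ends at the @data line (matched case-insensitively).
                if stripped.lower().startswith("@data"):
                    in_data = True
                continue
            # Inside the data section, skip blank lines and % comments.
            if stripped and not stripped.startswith("%"):
                count += 1
    return count
```

If the generation step above succeeded, `count_arff_instances("test.arff")` should return 100000. The file is read one line at a time, so the same check should also be practical for the much larger training file.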

Make a training file with 10,000,000 instances:

java weka.datagenerators.classifiers.classification.LED24 -n 10000000 -o train.arff

Apply NaiveBayesUpdateable:

java weka.classifiers.bayes.NaiveBayesUpdateable -t train.arff -T test.arff -v

(This takes about 30 seconds on my computer. The “-v” option suppresses evaluation on the training file, which the Command Line Interface would otherwise perform by default.)

Verify that Weka runs out of memory if cross-validation is attempted. (When no test file is specified, Weka evaluates by cross-validation by default.)

java weka.classifiers.bayes.NaiveBayesUpdateable -t train.arff

If Weka runs out of memory it becomes unresponsive, although you will probably not get an error message. (Unfortunately it is difficult to trap and report out-of-memory errors in a Java program.)

Memory is allocated to Weka by the Java Virtual Machine (JVM) code, and depends (in an obscure way) on the amount you have available on your machine. To force it to run out, you may have to double or even triple the size of the training file.
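
An alternative to enlarging the training file is to cap the JVM's maximum heap explicitly with the standard `-Xmx` option, which makes the behaviour less dependent on the machine. The Python sketch below only assembles the command line. The helper name `weka_command` and the `512m` value are assumptions chosen for illustration, not taken from the course. It also assumes that you launch Java from an operating-system terminal with Weka on the classpath. The option is unlikely to have any effect inside Weka's own Command Line Interface, where the JVM is already running.

```python
import shlex

def weka_command(weka_class, weka_args, max_heap=None):
    """Build a 'java ...' argument list, optionally capping the heap with -Xmx."""
    cmd = ["java"]
    if max_heap is not None:
        # -Xmx is the standard JVM option for the maximum heap size, e.g. -Xmx512m.
        cmd.append("-Xmx" + max_heap)
    cmd.append(weka_class)
    cmd.extend(weka_args)
    return cmd

cmd = weka_command(
    "weka.classifiers.bayes.NaiveBayesUpdateable",
    ["-t", "train.arff"],
    max_heap="512m",  # illustrative value only; adjust by experiment
)
# Print the command as a single shell-quoted line for copying into a terminal.
print(shlex.join(cmd))
```

The resulting list can also be passed directly to `subprocess.run(cmd)`. With an explicit cap, an out-of-memory condition is more likely to show up as a Java `OutOfMemoryError` in the terminal, though this may vary with the Weka and Java versions in use.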

If you feel brave, repeat the original exercise, without cross-validation, with a 100,000,000-instance training file.

This article is from the free online course More Data Mining with Weka.

