1.19

# Prepare for the quiz

Before starting the quiz we suggest you reproduce what Ian did in the video, using the LED24 data in the Command Line Interface.

Make a test file with 100,000 instances. Precede all filenames below with an appropriate directory specification, surrounded by quotation marks if necessary.

java weka.datagenerators.classifiers.classification.LED24 -n 100000 -o test.arff


(The Command Line Interface gives no output; you need to look for the file to see if it has worked.)
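
If you would rather check the result programmatically than open the file, a minimal sketch along these lines counts the instances in an ARFF file. It assumes the usual ARFF layout, in which data rows follow a line reading `@data` and comment lines begin with `%`. The class name `ArffInstanceCounter` is hypothetical and not part of Weka.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Weka): counts data rows in an ARFF file.
final class ArffInstanceCounter {

    // Streams the file line by line, so even a very large file needs little memory.
    static long countInstances(Path arffFile) throws IOException {
        long count = 0;
        boolean inData = false;
        try (BufferedReader reader = Files.newBufferedReader(arffFile)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                    continue; // skip blank lines and ARFF comments
                }
                if (!inData) {
                    // The data section starts after the "@data" marker (case-insensitive).
                    inData = trimmed.equalsIgnoreCase("@data");
                } else {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass your own file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.arff");
        System.out.println(file + ": " + countInstances(file) + " instances");
    }
}
```

For the test file generated above, the reported count should match the value given to `-n`, that is, 100,000.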

Make a training file with 10,000,000 instances:

 java weka.datagenerators.classifiers.classification.LED24 -n 10000000 -o train.arff


Apply NaiveBayesUpdateable:

 java weka.classifiers.bayes.NaiveBayesUpdateable -t train.arff -T test.arff -v


(This takes about 30 seconds on my computer. The “-v” option suppresses evaluation on the training file, which the Command Line Interface would otherwise perform by default.)

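Verify that Weka runs out of memory if cross-validation is attempted. When no test file is specified, Weka defaults to 10-fold cross-validation, which requires the whole training file to be held in memory: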

 java weka.classifiers.bayes.NaiveBayesUpdateable -t train.arff


If Weka runs out of memory it becomes unresponsive, although you will probably not get an error message. (Unfortunately it is difficult to trap and report out-of-memory errors in a Java program.)

The memory available to Weka is set by the Java Virtual Machine (JVM), and the limit depends (in an obscure way) on the amount of memory on your machine. To force Weka to run out, you may have to double or even triple the size of the training file.
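
To see the limit a JVM is working with, a small sketch like the following reports the maximum heap size using the standard `Runtime.maxMemory()` call. The class name `HeapReport` is hypothetical and not part of Weka. Note that it reports the limit of the JVM it runs in, which may differ from the one running Weka if Weka was launched with different settings.

```java
// Hypothetical helper (not part of Weka): reports the JVM's maximum heap size.
final class HeapReport {

    // Maximum memory the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

When a program is started directly with the `java` command, the standard `-Xmx` option (for example, `java -Xmx512m HeapReport`) sets this limit explicitly, which may be an easier way to provoke the out-of-memory condition than generating an even larger file.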

If you feel brave, repeat the original exercise, without cross-validation, with a 100,000,000-instance training file.