Growing random numbers from seeds
In the preceding video I talk about changing the random number seed in the Weka Explorer and getting a different result. (This was also done in the previous course, Data Mining with Weka.) In case you are mystified, an explanation follows.
Here’s the issue. Many data mining processes depend on some random process – like randomly splitting a dataset into training and test sets. This creates a conflict between getting repeatable results and realistic results. Realistically, the results should be slightly different each time, depending on the exact split. But in practice that would be a nightmare: you want to be able to repeat experiments and get the same results.
Here’s Weka’s solution. It uses a random number generator (a simple little program), but one that generates the same sequence of numbers each time, so that you can do the same thing tomorrow and get the same result. The sequence is controlled by a number called a “seed”. You change it in the Explorer’s Classify panel, under More options. The default value is 1, but you get a different sequence of random numbers by changing the seed to something else – like 2, or 3, or 42, or anything.
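To make the idea concrete, here is a minimal sketch of a seeded random split. It is not Weka's code: it uses Python's standard random module as a stand-in, and the function name split_indices and its parameters are hypothetical, chosen only for illustration.

```python
import random

def split_indices(n_instances, train_fraction, seed):
    """Shuffle instance indices with a seeded generator, then split them.

    Hypothetical helper for illustration; not part of Weka.
    """
    rng = random.Random(seed)           # the seed fixes the whole sequence
    indices = list(range(n_instances))
    rng.shuffle(indices)                # the "random" split depends only on the seed
    cut = int(n_instances * train_fraction)
    return indices[:cut], indices[cut:]

train_a, test_a = split_indices(20, 0.66, seed=1)
train_b, test_b = split_indices(20, 0.66, seed=1)
train_c, test_c = split_indices(20, 0.66, seed=2)

print(train_a == train_b)  # same seed: the split is reproduced exactly
print(train_a == train_c)  # different seed: a different shuffle is drawn
```

Running the split twice with the same seed reproduces it exactly, which is what makes the experiment repeatable; changing the seed draws a different shuffle, which is how you see the realistic variation between splits.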
In the Experimenter you don’t need to worry about changing the seed, because the random number generator is initialized only once, at the very beginning of an experiment, when you run it. Thus if the experiment involves several evaluations – say several cross-validations – they each take place with the random number generator in a different state, so they come up with different results. Yet the whole experiment is completely repeatable: if you run it again, the random number generator is re-initialized to the same starting state, so every evaluation gives the same result as before.
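The same behaviour can be sketched in a few lines. Again this is only an illustration using Python's random module, not the Experimenter's actual implementation; run_experiment and its parameters are hypothetical names, and a shuffle stands in for each evaluation.

```python
import random

def run_experiment(seed, n_runs=3, n_instances=10):
    """Initialize the generator once, then perform several shuffles.

    Hypothetical stand-in for an experiment with several evaluations.
    """
    rng = random.Random(seed)           # initialized only at the start
    orderings = []
    for _ in range(n_runs):
        indices = list(range(n_instances))
        rng.shuffle(indices)            # each run starts from the generator's current state
        orderings.append(indices)
    return orderings

first = run_experiment(seed=1)
second = run_experiment(seed=1)

print(first == second)  # re-running the whole experiment repeats every run
```

Within one experiment the runs see different orderings, because the generator has moved on between them, yet running the whole experiment again from the same seed repeats each run exactly.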