
Using the Command Line

Ian Witten shows how you can do everything the Explorer does (and more) from the command line – and also how to consult Javadoc to learn about Weka.
Hello again! Welcome back to New Zealand for a few minutes with More Data Mining with Weka. Let’s look at the Command Line interface in this lesson. Now, the Command Line interface isn’t for everyone, but it’s worth knowing about, just in case you might need to do some more advanced things. We’re going to run a classifier from within the Command Line interface. I’m going to run J48 on the iris data. The first thing I’m going to do is to print the J48 options. Let’s fire up the Simple Command Line interface. I’m going to type “java”. Every command here is going to begin with “java”, and this single line is where we type it.
I’m going to type “java weka.classifiers.trees.J48” (I’ll explain the name in a moment). I’m going to hit Enter, and here I’ve got printed out a bunch of information. Actually, this is error information.
It says “Weka exception: No training file and no object input file given.” Because it can’t interpret this command, Weka has kindly printed out the options for J48.
First of all, the general options: “-h” for help; “-t” for training file; “-T” for test file. We’ll be using those. Then, a bit further down after the general options, we’ve got the options specific to J48. There’s the “-C” option and the “-M” option, and a few more options for J48. To make sense of these options, I’ve opened the Explorer here, and this is J48 in the default configuration. You can see here the “-C” option and the “-M” option. These are things that we type into the Simple Command Line interface. This is the default configuration, and these are parameters for J48. I can actually copy this configuration.
I did a right-click, and I’m going to Copy the configuration to the clipboard. Then I’m going to go back and find my Simple Command Line interface, and I’m going to paste. It’s Ctrl-v for paste.
Oh, I should have put “java” at the beginning. I’m going to run this Java program with these options copied and pasted from the Explorer. Then I need a training file. That’s “-t” followed by a space, and now I need to put a filename for my training file.
Here it is. It’s a fully qualified file name starting with the disk. Unfortunately, in the Simple Command Line interface, you need to have fully qualified file names. This is where my datasets are, and it’s the iris.arff file. I’ve surrounded it in quotes, because there are actually spaces in this file name, and Windows doesn’t like file names with spaces unless you put quotes around them. Now I’m going to hit Enter again, and it should execute J48 on that dataset. There we go; this is the result. We’ve seen that kind of thing many times before. That’s how you run classifiers in the Simple Command Line interface. Over here on the slide, this is what we did.
We copied the classifier name and the options from the Explorer, then we put the training set afterwards manually. That’s a good way of using the Command Line interface. I want to talk about this complicated name “weka.classifiers.trees.J48”. J48 is a “class”, which roughly means a program in Java. It’s a collection of variables, along with some methods – that is, code – that operates on the variables.
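Putting the pieces of that command together (the class name, the options copied from the Explorer, then “-t” and the training file) can be sketched in Java as follows. This is only an illustration: the option values “-C 0.25 -M 2” are assumed to be the J48 defaults copied from the Explorer, the class J48CommandSketch and the file path are made up for the example, and nothing is actually executed.

```java
import java.util.ArrayList;
import java.util.List;

public class J48CommandSketch {

    /** Builds the words of the command in the order they are typed. */
    public static List<String> buildCommand(String trainingFile) {
        List<String> words = new ArrayList<>();
        words.add("java");
        words.add("weka.classifiers.trees.J48"); // fully qualified class name
        words.add("-C");
        words.add("0.25");                       // assumed default copied from the Explorer
        words.add("-M");
        words.add("2");                          // assumed default copied from the Explorer
        words.add("-t");
        words.add(trainingFile);                 // training file, added by hand
        return words;
    }

    /** Joins the words into one line, quoting any word that contains a space. */
    public static String toCommandLine(List<String> words) {
        StringBuilder line = new StringBuilder();
        for (String word : words) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word.contains(" ") ? "\"" + word + "\"" : word);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical path containing a space, for illustration only.
        String path = "C:\\My Data\\iris.arff";
        System.out.println(toCommandLine(buildCommand(path)));
    }
}
```

With the made-up path above, the program prints a single line in which only the file name is quoted, because it is the only word that contains a space.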
Classes come in packages. A “package” is a directory containing related classes. J48 is in the “trees” package, and the trees package is part of the classifiers package. We can see all this stuff in Javadoc. It’s useful to be able to look at the definitive documentation for Weka, and we can find that in our Weka installation. Let me go to where I installed Weka. Here’s My Computer. I’m going to go to C, and I installed it in Program Files (x86). I’m going to find Weka here. There’s Weka, and I’m going to find documentation.html. There is the documentation, and I want to look at the Package Documentation.
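The naming scheme described above, with the class J48 inside the trees package inside the classifiers package, can be sketched with a few lines of plain Java. The class QualifiedNameSketch and its methods are hypothetical helpers written for this example; they only manipulate the name as text and follow the general Java convention that each dot in a package name corresponds to a directory level.

```java
public class QualifiedNameSketch {

    /** Returns the package part of a fully qualified class name. */
    public static String packageOf(String qualifiedName) {
        int lastDot = qualifiedName.lastIndexOf('.');
        return lastDot < 0 ? "" : qualifiedName.substring(0, lastDot);
    }

    /** Returns the class name without its package. */
    public static String simpleNameOf(String qualifiedName) {
        return qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
    }

    /** Packages map onto directories: each dot becomes a path separator. */
    public static String classFilePath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String name = "weka.classifiers.trees.J48";
        System.out.println(packageOf(name));     // weka.classifiers.trees
        System.out.println(simpleNameOf(name));  // J48
        System.out.println(classFilePath(name)); // weka/classifiers/trees/J48.class
    }
}
```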
I can see the Weka Manual here, but I’m going to look at the Package Documentation. This is called the Javadoc, which is documentation generated from the Java source code. This is the definitive source of documentation for Weka. I’m going to find the classifiers. These are the packages up here. It’s a little bit complicated. I’m going to find the classifiers.trees package and click that. Down here I’ve got the contents of the classifiers.trees package, and I can click J48. Here I can see information about the J48 class. Actually, I could have got to the same thing if I had clicked All classes here and looked through this alphabetical list down here for J48, which is here. I get the same information.
When I scroll down in this Javadoc, you can see some computer-y stuff here and you can see the options. This is the definitive source of the options for J48. These are options that you can use in the Explorer or in the Simple Command Line interface. Then there’s a lot of other information. Back to the slide here. We found J48 in the “all classes” list and looked at its documentation.
Now, I know what you’re thinking: “what’s all this geeky stuff?” Well, don’t worry, just try to ignore things you don’t understand, and just power on through here. To set your mind at rest, we’re not going to be using the Simple Command Line interface very much in this course. In fact, we’ll use it in the next lesson, but after that, we won’t be using it at all. Just bear with us while we look at it. I want to find another thing in the Javadoc. Let’s go back to the Explorer (I’ll just find it again; here it is). Perhaps you’ve never noticed this, but here on the Preprocess panel there’s a button to open a database.
It’s labelled “Open DB…”, and if I click this, it says “Open a set of instances from a database”. I get a rather formidable looking form I’ve got to fill in here without really any help. Now we can find the documentation on this in Javadoc. I happen to know this is actually a “converter”, the database converter, and it’s in a package called weka.core.converters, the core of Weka. There’s a bunch of packages in the central core of Weka, and “converters” is one of them. If I look in the converters package and open the database loader, that gives us some documentation on this converter.
It’s a little bit complicated here, because reading from a database is a little bit complicated. We’ve got to specify a number of things here, like the URL of the database, the username, a password, and a query, and so on. We can specify all those things. Well, I don’t want to use this converter now, I just wanted to show you that the Javadoc is a source of detailed documentation on different bits of Weka. Coming back to the slide. The database loader will load from any JDBC database. It’s in the Explorer Preprocess panel, but the documentation is here in the Javadoc.
It’s useful to be able to find your way around the Javadoc to see more information about some of the facilities in Weka. This is what we’ve talked about here, the Command Line interface. I showed you it quickly. It can do everything the Explorer does, from the command line. We specify each option with a minus followed by a letter, then a space and a value, like “-C 0.25” or “-t filename”. You only get one line in the Command Line interface to type things, and people often open a terminal window instead, which gives you some advantages.
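The option format just described (a minus and a letter, then a space and a value) can be illustrated with a small stand-alone reader. This is a simplified sketch written for this example, not how Weka itself parses options: the class OptionSketch is hypothetical, and it would misread a value that begins with a minus sign, such as a negative number.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionSketch {

    /**
     * Reads "-letter value" pairs from the words of a command.
     * An option that is not followed by a value is stored with an empty string.
     * Words that neither start with '-' nor follow an option are ignored.
     */
    public static Map<String, String> parse(String[] words) {
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].startsWith("-") && words[i].length() > 1) {
                boolean hasValue = i + 1 < words.length && !words[i + 1].startsWith("-");
                options.put(words[i], hasValue ? words[++i] : "");
            }
        }
        return options;
    }

    public static void main(String[] args) {
        String[] words = {"-C", "0.25", "-M", "2", "-t", "iris.arff"};
        System.out.println(parse(words)); // {-C=0.25, -M=2, -t=iris.arff}
    }
}
```

The keys are kept exactly as typed, so “-t” and “-T” remain two different options, as they are in Weka’s general options.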
You can do scripting, so you can script a sequence of Weka commands, but in order to do that, you need to be able to set up your environment properly, and we’re not going to cover that in this course. I showed you how you can copy and paste a configured classifier from the Explorer. The advantage of the Command Line interface is that it gives you more control over memory usage. It’s kind of a lower level way of accessing the facilities of Weka, and we’ll be doing a little bit of that in the next lesson. Javadoc, as I’ve said, is the definitive source of Weka documentation.
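The lesson does not say how memory is controlled from the command line. One common mechanism, assumed here, is the Java launcher’s standard “-Xmx” flag, which sets the maximum heap size when a Java program is started from a terminal. The sketch below only reports the limit that the running Java virtual machine was given; the class HeapLimitSketch is made up for illustration and is not part of Weka.

```java
public class HeapLimitSketch {

    /** Maximum amount of heap the JVM will attempt to use, in whole megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it from a terminal as “java -Xmx512m HeapLimitSketch”, and then again with a larger value, should show the reported limit changing accordingly.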

You can do everything the Explorer does (and more) from the command line. Why would you want to – the Explorer is so easy? Well, some people find it faster to type commands! More importantly, the command line gives you more control over memory usage, because it’s a lower-level way of accessing Weka’s facilities. You will also learn how to consult Javadoc, which is the definitive source of Weka documentation.

This article is from the free online course More Data Mining with Weka.
