Hi, everyone! I’m Peter. Welcome to my class on scripting. Why would you do scripting? Well, there are pros and cons to scripting. On the positive side, when you write a script, it captures all the steps that you performed, from preprocessing to modeling to evaluation. Also, you really only write a script once, and you can run it multiple times at no extra cost. It’s also very easy to create a variant of the script in order to test some theories, for example, tweaking some parameters of a classifier or swapping out a classifier completely. The best thing about scripting is that you don’t need to compile anything, as you would have to with Java code.
On the not-so-good side, you will have to do programming. You need to familiarize yourself with the APIs of the libraries involved, and writing code is usually slower than clicking in the GUI. Now, which scripting languages will we cover in this class? We will cover Jython, Python, and Groovy. Jython is a pure Java implementation of Python 2.7 that runs entirely in the Java Virtual Machine. This means it gives you access to all the Java libraries on the CLASSPATH. Any Python code you use, however, has to be pure Python: native libraries such as NumPy won’t work. As for Python itself, we’ll be using Python 2.7 and invoking Weka from it.
That gives you access to the full Python library ecosystem. At the end, we’ll touch briefly on Groovy, which has a Java-like syntax and also runs in the Java Virtual Machine. Once again, it gives you access to all the Java libraries on the CLASSPATH. The simplest way to show why Python is a good choice of language for scripting is to compare Java code and Python code that do the same thing. What we’re trying to do is simply output “Hello WekaMOOC!” ten times. Looking at the Java code here, you have the outer class definition, then your main method.
Inside your main method, you have your for loop, where you finally output the text. In Python, this whole thing collapses to a two-liner: you simply iterate from 0 to 9 and print the message. Done. Now, to get Jython support in Weka, we need to install a package. I’m going to start up Weka. In the Package Manager, we need to install tigerJython. I’ve already done that. For plotting in Jython, we also want to use JFreeChart, and for that you need to install the jfreechartOffscreenRenderer package. After we’ve done that, we have to restart Weka.
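The Python half of that comparison can be sketched as the two-liner below. This is a reconstruction from the narration rather than the exact code on the slide; writing print with parentheses around a single string works in Jython 2.7 as well as in Python 3.

```python
# Print the greeting ten times; range(10) yields 0, 1, ..., 9.
for i in range(10):
    print("Hello WekaMOOC!")
```

The Java version needs a class, a main method and a loop to do the same job, which is the point of the comparison.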
Then, under the Tools menu, there will be a Jython console menu item, which brings up a little user interface for writing and running Jython scripts. The first time round it takes a bit longer to start, because it analyzes all the libraries on your CLASSPATH. Here’s our little interface. This area is where you write your script. Down here you’ll see any errors, as well as the output your script generates. You execute your script with the green triangle up here. You can also turn debug mode on and off, which lets you step through the program you’ve written.
You can also set breakpoints up here, which let you stop at certain points in the program and inspect, for instance, the values of variables. I usually run multiple scripts in parallel, so under Preferences I set a smaller font and use tabs rather than a single editor. Let’s revisit the really simple example from earlier, which just outputs our “Hello World”, more or less. When we run it (not in debug mode for the time being), we see the output “Hello WekaMOOC!”, from 1 to 10.
Now, if we toggle debug mode on again, we can set how fast the program steps through, and then simply run it. You can see the instruction pointer moving back and forth between those two lines, and if you open up the variables and types view over here, you can see the variable “i” being incremented. That’s a first quick introduction to tigerJython. When you’re writing code, you need to find information, and the best source of information on Java libraries such as Weka is the Javadoc.
Also, the release or snapshot that you’ve installed comes with a wekaexamples.zip file, which contains plenty of example code to get you going with Weka’s APIs. Last, but not least, check out the WekaManual.pdf document. In the appendix, under the “Using the API” section, you will find most of the important APIs in Weka explained, along with how to use them. Of course, I promised that we’re going to write a little script. We’re going to load data, filter it, and print it out. However, since every Weka installation around the world will be different, I’ll be using a little trick to locate the datasets.
I’ll be using an environment variable to point to the directory where I’ve stored my datasets. I’m going to close Weka for the time being. You can see here, on my desktop, that I have various datasets in the data directory, and we want to point to that directory. I’m going to copy that path and add an environment variable. I’m going into the Advanced settings, then Environment Variables, and I’m going to create one called MOOC_DATA and paste the path in there. OK. Close that dialogue, and we can close this window, too. Then we can start up Weka again, and our Jython console. First, we have to import some classes to actually do stuff.
We want to load data, and we’ll be using the DataSource class for that, abbreviated to DS; the Filter class for filtering a dataset; and the Remove filter to do the actual work. The os module is a standard Python module, also available in Jython, which gives us access to the operating system, for example to environment variables. To use the MOOC_DATA environment variable that I’ve just configured, I read it with the os.environ.get method, then append os.sep (a forward or backward slash, depending on your operating system) and the name of the dataset, in this case iris, and load that file. Then we configure our filter: we want a Remove filter.
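The path construction described above can be sketched in plain Python as follows. The helper name dataset_path is hypothetical, and the explicit check for a missing variable is an addition: os.environ.get returns None when MOOC_DATA isn’t set, and concatenating None with a string would fail with a less helpful TypeError.

```python
import os


def dataset_path(name, env_var="MOOC_DATA"):
    """Return <value of env_var> + os.sep + name, e.g. <data dir>/iris.arff.

    dataset_path is a hypothetical helper for illustration only.
    """
    data_dir = os.environ.get(env_var)  # None if the variable isn't set
    if data_dir is None:
        # Fail early with a clear message instead of a TypeError from None + str
        raise RuntimeError(env_var + " is not set")
    # os.sep is "\\" on Windows and "/" on macOS and Linux
    return data_dir + os.sep + name
```

In the Jython console you could then pass dataset_path("iris.arff") to the data loader; the file name iris.arff is an assumption, since the narration only says “iris”. os.path.join(data_dir, name) would work equally well and also avoids a doubled separator if MOOC_DATA ends with a slash.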
We want to remove the last attribute, which is done via the -R last option. Then we tell the filter what the data looks like, so that it can configure itself internally. Next, we use the Filter class to push the data through our Remove filter and get a new dataset. Finally, we output that new dataset in the console. When we run this, we get a lot of output. If we scroll to the top, we can see that the relation name has changed to include the filter setup, and the class attribute is gone.

In this first lesson, we installed tigerJython; saw that Python is very easy to read and write, and much more concise than Java; learned where to find API documentation; and wrote our first Jython script.
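In the lesson, Weka’s Remove filter does the real work. To make the effect of -R last concrete without needing Weka, here is a plain-Python illustration that drops columns from a tiny in-memory table (the first setosa and first versicolor instances of iris). It only mimics a subset of Weka’s range syntax (first, last, single 1-based indices and a-b ranges, comma-separated); the function names are hypothetical, and this is not how Weka implements the filter.

```python
def parse_range(spec, num_attributes):
    """Turn a Weka-style range string such as "last", "1-3" or "first,4"
    into a sorted list of 0-based column indices (illustrative subset only)."""
    def to_index(token):
        token = token.strip().lower()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)  # 1-based attribute index

    selected = set()
    for part in spec.split(","):
        if "-" in part:  # a range such as "2-4" or "first-last"
            start, end = part.split("-", 1)
            selected.update(range(to_index(start), to_index(end) + 1))
        else:
            selected.add(to_index(part))
    # Convert to 0-based positions, ignoring anything out of bounds
    return sorted(i - 1 for i in selected if 1 <= i <= num_attributes)


def remove_attributes(attributes, rows, spec):
    """Drop the columns selected by spec, mimicking the effect of Remove -R."""
    drop = set(parse_range(spec, len(attributes)))
    keep = [i for i in range(len(attributes)) if i not in drop]
    new_attributes = [attributes[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_attributes, new_rows


attributes = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]
rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
]
new_attributes, new_rows = remove_attributes(attributes, rows, "last")
print(new_attributes)
print(new_rows)
```

As in the Weka output, the last attribute (the class) is no longer present after filtering.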
Invoking Python from Weka
Peter Reutemann introduces scripting, and then demonstrates a Weka package that opens an editor in which you can write and execute Python scripts. Finally, he writes a script for loading and filtering data.
First install two packages using the Package Manager:
- the tigerjython package
- the jfreechartOffscreenRenderer package
(Note: tigerjython is not yet compatible with Java 9. If you normally run Weka under Java 9, you will have to use Java 8 temporarily.)
Then create an environment variable called MOOC_DATA that points to your Weka data files folder:
- On Windows, Peter right-clicked This PC and selected Properties to open the Control Panel, then chose Advanced system settings and Environment Variables. There, click New… and type in the variable name and value.
- On the Mac, you can install envPane in System Preferences, which makes it easy to set environment variables.
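After creating MOOC_DATA, a quick way to confirm that it is visible is to check it from Python, or from the Jython console once Weka has been restarted, as Peter does in the video. The sketch below assumes your datasets are ARFF files; the helper name check_mooc_data is hypothetical.

```python
import os


def check_mooc_data(env_var="MOOC_DATA"):
    """Return the sorted .arff file names in the directory named by env_var.

    check_mooc_data is a hypothetical helper for illustration only.
    """
    data_dir = os.environ.get(env_var)
    if not data_dir:
        raise RuntimeError(env_var + " is not set")
    if not os.path.isdir(data_dir):
        raise RuntimeError(env_var + " does not point to a directory: " + data_dir)
    # Keep only ARFF files, matching the extension case-insensitively
    return sorted(f for f in os.listdir(data_dir) if f.lower().endswith(".arff"))
```

Running print(check_mooc_data()) should list files such as iris.arff; an error instead usually means the variable was set after Weka was started.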
Here’s Peter’s script for loading and filtering data:
© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.