Hi! My name is Mike Mayo, and today I’m going to demonstrate the imageFilters package, which you can download for Weka using the Package Manager. The imageFilters package lets you convert images into features so that you can run image classification experiments, and then you can do exciting things like face recognition, scene recognition, and maybe even object detection. In this lesson, I’ll go over what the imageFilters package does and give a quick demo.
So what is an image feature? Basically, it’s a measurement concerning an image. In this example, there are a couple of images: one is a sunflower; one is a tree. The measurements we’re taking from the images relate to things such as color, brightness, and shape, and the two images differ quite a bit across those measurements. Once we calculate the measurements, we can put them together in a feature vector, and then use Weka’s standard machine learning algorithms to run image experiments and see whether we can classify different types of images. The first thing you need to do when you want to run an image classification experiment is get a whole lot of images.
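To make the idea of a feature vector concrete, here is a toy sketch. It is not what the imageFilters package computes: it assumes an image is given as a plain list of (red, green, blue) pixels with values from 0 to 255, and it measures only color and brightness, since shape needs more involved processing. The name `feature_vector` is hypothetical.

```python
from statistics import mean

# Hypothetical representation: an image is a list of (red, green, blue) pixels, 0-255.
Pixel = tuple[int, int, int]


def feature_vector(pixels: list[Pixel]) -> list[float]:
    """Return [mean red, mean green, mean blue, mean brightness] for one image."""
    if not pixels:
        raise ValueError("image has no pixels")
    reds = [r for r, _, _ in pixels]
    greens = [g for _, g, _ in pixels]
    blues = [b for _, _, b in pixels]
    # Brightness is approximated here as the plain average of the three channels.
    brightness = [(r + g + b) / 3 for r, g, b in pixels]
    return [float(mean(reds)), float(mean(greens)), float(mean(blues)), float(mean(brightness))]


# Two made-up "images": one mostly orange, one mostly white.
orange_image = [(240, 140, 20), (230, 120, 10)]
white_image = [(250, 250, 245), (240, 240, 235)]
print(feature_vector(orange_image))
print(feature_vector(white_image))
```

Each image becomes one fixed-length row of numbers, which is the kind of data Weka’s standard learners work with.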
For the image filters to work, the images all need to be in one directory. Here I’ve got an example dataset: a collection of monarch butterfly images and owl images, and we can see that they’re pretty easy to distinguish. Monarch butterflies are mostly orange and black; owls are mostly white. Once you have your directory of images, you need to create an ARFF file. It’s just like the ARFF files you’ve been using so far, except that it contains only two attributes, and the first one is a string. That string attribute has to contain the filenames of the images, and the second attribute contains the class.
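Writing this two-attribute ARFF file by hand gets tedious for large image collections, so here is a minimal sketch that generates it. It assumes each image’s class can be derived from its filename by a function you supply, that only common image extensions count as images, and that filenames contain no quote characters; `image_files` and `image_arff` are hypothetical helper names.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust for your data).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def image_files(image_dir: str) -> list[str]:
    """List image filenames (not full paths) in one directory, sorted."""
    return sorted(
        p.name for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def image_arff(filenames, label_of, relation="images") -> str:
    """Build ARFF text with a string 'filename' attribute and a nominal class.

    label_of is a caller-supplied function mapping a filename to its label.
    Filenames are single-quoted and assumed not to contain quote characters.
    """
    labels = [label_of(name) for name in filenames]
    classes = sorted(set(labels))
    lines = [
        f"@relation {relation}",
        "",
        "@attribute filename string",
        "@attribute class {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    # One instance per image: the filename first, the class label second.
    lines += [f"'{name}',{label}" for name, label in zip(filenames, labels)]
    return "\n".join(lines) + "\n"
```

For example, with files named like `monarch_001.jpg` (a hypothetical naming convention), `image_arff(image_files("images"), lambda name: name.split("_")[0])` returns text you could save as the dataset’s .arff file.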
So here on the left I’ve got an example of such a dataset, and you can see that it’s pretty straightforward.
There are the two attributes: the first one is the filename, and the second one is the class. When we apply the filters, all they do is add further attributes to the dataset. In all cases these are numeric attributes, and the example on the right shows a very simple filter that adds three numeric values to the dataset.
Now I’ve got Weka open, and I’ve opened that ARFF file to have a look at it. You can see that there are two attributes here: the filename, which is a string, so Weka can’t show any useful summary of it; and the class, which is a nominal attribute, shown here. There are two classes of equal frequency, with 50 examples of each in this dataset. I now want to apply a filter. If you’ve installed the imageFilters package correctly, all the filters should be available under Unsupervised/Instance/imageFilter, and you just open that folder and select one. I’m going to choose the ColorLayoutFilter, which has a “-D” option: that simply refers to the directory that contains all the images, so I’ll put the image directory in there.
Once I’ve done that, I can apply the filter, and it will go away and process all the images. Weka reads in each image and extracts its features, and after a few moments it’s done. We can see that a whole lot of additional features have been added to the dataset, and they’re all numeric. If we scroll down, we can see that there are 33 of them in total, and the class label is still there at the end.
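The effect a filter has on the dataset can be modeled in a few lines. This is a simplified sketch, not the package’s code: it assumes a dataset is a list of attribute names plus rows of values, with the filename first and the class last, and `add_image_features`, the `color1`… attribute names, and the `extract` function passed in are all hypothetical. It inserts the new numeric attributes right after the filename, which matches where the second filter’s features end up later in this lesson.

```python
from typing import Callable, Sequence


def add_image_features(
    attributes: list[str],
    rows: list[list],
    new_names: Sequence[str],
    extract: Callable[[str], Sequence[float]],
) -> tuple[list[str], list[list]]:
    """Return a new (attributes, rows) pair with numeric features added.

    The filename stays first so later filters can still find the images,
    and the class stays last.
    """
    new_attributes = attributes[:1] + list(new_names) + attributes[1:]
    new_rows = []
    for row in rows:
        features = list(extract(row[0]))  # row[0] is the image filename
        if len(features) != len(new_names):
            raise ValueError(f"expected {len(new_names)} features for {row[0]}")
        new_rows.append(row[:1] + features + row[1:])
    return new_attributes, new_rows


# Hypothetical 33-value extractor standing in for ColorLayoutFilter.
color_names = [f"color{i}" for i in range(1, 34)]
attrs, data = add_image_features(
    ["filename", "class"],
    [["owl_001.jpg", "owl"], ["monarch_001.jpg", "monarch"]],
    color_names,
    lambda filename: [0.0] * 33,
)
print(len(attrs), attrs[0], attrs[-1])
```

Applying it a second time with 80 edge features would give 115 attributes: the filename, 80 edge features, 33 color features, and the class.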
To run a classification experiment after applying the filter for the first time, we have to remove the filename attribute, because it is a string, and string attributes cause problems for many classifiers. So I’m going to remove it, switch over to the Classify tab, and choose a classifier. I’ll use J48 just for fun. I click Start, and we can see that Weka correctly classified 90% of the images. That’s pretty good accuracy. It is also possible to apply more than one filter in sequence to your images: all you need to do is apply the different filters to the dataset one after another.
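The same two steps can also be scripted with Weka’s command-line tools. This is a minimal sketch, assuming Java is installed, `weka.jar` is at the path you pass in, and the filtered dataset was first saved to a file (the filenames are hypothetical). `weka.filters.unsupervised.attribute.Remove -R 1` deletes the first attribute, and `weka.classifiers.trees.J48 -t` trains and evaluates J48; with no separate test set, Weka should default to 10-fold cross-validation.

```python
import subprocess


def remove_then_j48(in_arff: str, out_arff: str, weka_jar: str = "weka.jar") -> list[list[str]]:
    """Build two commands: drop attribute 1 (the filename), then run J48."""
    remove = [
        "java", "-cp", weka_jar,
        "weka.filters.unsupervised.attribute.Remove",
        "-R", "1",  # attribute indices are 1-based; 1 is the filename string
        "-i", in_arff,
        "-o", out_arff,
    ]
    # The class is still the last attribute, which Weka treats as the class by default.
    j48 = ["java", "-cp", weka_jar, "weka.classifiers.trees.J48", "-t", out_arff]
    return [remove, j48]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(remove_then_j48("color_layout.arff", "no_filename.arff"))` would then run both steps, provided those hypothetical files exist.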
I’ll give you a quick example of that. I’ll click Undo to get the filename attribute back, because the filters need the filenames to find the images. Then I’ll choose a different filter; for this example, I’ll use the EdgeHistogram filter for the second set of features. Again, I just set the directory, click Apply, and wait. OK, the features are there now, and they have different names: these are edge histogram features. If we scroll down, we can see there are a lot of them, 80 in total, and they’ve been inserted before the color layout features, which are still there.
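Chaining filters can be scripted too. This sketch builds a single command using Weka’s `weka.filters.MultiFilter`, which applies each `-F` filter in order. It assumes the imageFilters classes live in the Java package `weka.filters.unsupervised.instance.imagefilter` (matching the menu path above), that the edge histogram class is named `EdgeHistogramFilter`, that the classes are on the classpath (a Package Manager install may need Weka’s `weka.Run` launcher instead of plain `java -cp`), and that the image directory path contains no spaces. `chained_filters_command` and the filenames are hypothetical. The filename attribute is removed only after every image filter has run, for the same reason the lesson undoes the removal before adding edge features.

```python
# Assumed Java package for the imageFilters classes (check your installation).
IMAGE_FILTER_PACKAGE = "weka.filters.unsupervised.instance.imagefilter"


def chained_filters_command(
    image_dir: str,
    in_arff: str,
    out_arff: str,
    filter_names: list[str],
    weka_jar: str = "weka.jar",
) -> list[str]:
    """Build one MultiFilter command: image filters in order, then Remove -R 1."""
    if " " in image_dir:
        raise ValueError("this sketch does not quote paths containing spaces")
    specs: list[str] = []
    for name in filter_names:
        # Each -F value is a full filter spec: class name followed by its options.
        specs += ["-F", f"{IMAGE_FILTER_PACKAGE}.{name} -D {image_dir}"]
    # Drop the filename string last, once no image filter needs it any more.
    specs += ["-F", "weka.filters.unsupervised.attribute.Remove -R 1"]
    return ["java", "-cp", weka_jar, "weka.filters.MultiFilter", *specs,
            "-i", in_arff, "-o", out_arff]


cmd = chained_filters_command(
    "images", "butterflies_owls.arff", "color_and_edges.arff",
    ["ColorLayoutFilter", "EdgeHistogramFilter"],  # second name is an assumption
)
print(" ".join(cmd))
```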
The class is still at the end, but we have many more features this time. Again, I’ll remove the filename attribute, switch to Classify, and run exactly the same experiment. This time, interestingly, Weka does slightly worse: it gets 88% compared to 90% last time. So adding the edge histogram features has actually decreased the accuracy a little. That probably makes sense if you think about it, because the main differentiating feature between these two classes, monarch butterfly and owl, is clearly color, so the size and direction of edges probably don’t provide much extra information. In this case, at least, the edge features haven’t been very useful.
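Because the dataset has 50 images per class, the accuracies translate directly into image counts. A quick check of that arithmetic, using a hypothetical helper `correct_count`, shows the difference is two images out of 100, so on a dataset this small the drop is modest and could plausibly shift with a different random split of the data.

```python
def correct_count(accuracy_percent: float, n_instances: int) -> int:
    """Convert a percentage accuracy into a number of correctly classified instances."""
    return round(accuracy_percent * n_instances / 100)


n_images = 100  # 50 monarch butterfly images + 50 owl images
color_only = correct_count(90, n_images)        # ColorLayoutFilter features only
color_and_edges = correct_count(88, n_images)   # plus the edge histogram features
print(f"{color_only} vs {color_and_edges}: {color_only - color_and_edges} fewer correct")
```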
That’s a quick rundown of the imageFilters package. I’ve put a summary of the steps there, if you want to try the package out and run some experiments with your own images. It’s really straightforward to use, and I hope you enjoyed this lesson.