Ian Witten

I grew up in Ireland, studied at Cambridge, and taught computer science at the Universities of Essex in England and Calgary in Canada before moving to paradise (aka New Zealand) 25 years ago.

Location New Zealand


  • Yes, Weka can read csv files; see Q.3 of the quiz that follows (Step 1.17)

  • Darn. Guess you're missing the best part of the course :-)

  • yes

  • You are correct. This weird result is only true for this particular dataset.

  • Great! But please note, this is an advanced course. If you haven't done it already, you might be better signing up for the introductory course Data Mining with Weka (https://www.futurelearn.com/courses/data-mining-with-weka/) instead.

  • Ian Witten replied to [Learner left FutureLearn]

    What exactly is your question? If this https://www.futurelearn.com/comments/58046078 is it then the answer is "no".

  • My understanding is that once the course opens, all five weeks become available immediately when you join the course. Is this not the case for you? (Since I am an instructor rather than a student, it's possible the interface works differently for me.)

    Otherwise, I don't understand your question.

  • I just tried it and can confirm that the quiz answer is correct. And you are looking in the correct place. Try restarting Weka and doing the experiment again.

  • Ian Witten replied to [Learner left FutureLearn]

    !ʇhgir s’tahT

  • I believe I have fixed this now.

  • The plant is, in fact, real (as is my hair). In New Zealand we don't need artificial plants.

  • Reminds me (for some reason) of "rubber duck debugging", a method of debugging code whose name refers to a story about a programmer who carried around a rubber duck and debugged their code by forcing themselves to explain it, line-by-line, to the duck. (Don't tell your husband.)

  • @RobertGillespie I just checked this, and I believe the answer is correct as it stands.

  • @RobertGillespie You are correct, it's an error. I've fixed it. Thanks for pointing this out.

  • It's a little clunky, but the quickest way is to select another classifier (or filter) and then re-select the one you want.

  • Thanks for pointing this out. I've made them available on our own (Waikato University) computer, and changed the links appropriately.

  • Click the pink link in the sentence "Download the regression_outliers.csv dataset and open it with Weka." in the quiz instructions. (And note the two points that follow in those instructions.)

  • @RobertGillespie I checked, and the given answer is correct. I think you might not be constraining XMeans to 2 clusters only. Its default is maxNumClusters=4, minNumClusters=2; and you should change maxNumClusters to 2 – otherwise the result is as you describe.

  • Yes, there's plenty of scope for more experimentation. Give it a go!

  • The follow-up course is running right now, More Data Mining with Weka (https://www.futurelearn.com/courses/more-data-mining-with-weka).

  • @TeresaFranco Thanks for pointing this out; I've fixed it (stupid cut-and-paste error).

  • I'm not using Catalina myself, but I've talked to others who do and they report no problems regarding Weka.

    You mention "lots of little error boxes": Weka doesn't normally pop up lots of boxes; it would be nice to know what's in them :-)

    If you've worked with earlier versions of Weka previously, it may be worth removing the "wekafiles" folder in your...

  • Sorry you feel like that; hope it doesn't put you off the course. Did you manage to complete the quiz anyway?

  • Please re-read step 3.15 Using Weka in practice: some questions (https://www.futurelearn.com/courses/data-mining-with-weka/7/steps/658023)

  • If only life were so easy ...

  • The aim of the quizzes is not so much to test your knowledge as to help with your learning. Sounds like it's working in your case :-) By the way, it's fine to look at the answers!

  • @NicolasBrookes I'm sorry you're giving up. But don't blame the Mac – Weka works perfectly well on a Mac; I use one all the time.

  • I have no idea what the problem is, and you don't mention what the error is – though I'm not sure it would help me to know.

    Tens of thousands of people have installed the user classifier without reporting any problems, so I would guess it's some kind of network issue.

    I'd recommend you try again when network loading is light. And if you can't install it,...

  • @MarkGlover: "Python 2.7 is obsolete ..." -- thanks for reminding me! I'll make a note below the video, and remove references to Python 2.7.

  • Do you know where Weka has been installed? All ARFF files should be in a folder called "data" within the weka-3-8-4 folder.

  • Please try this: locate the folder called "wekafiles", which should be in your home directory, and remove it and all its contents. (This is where Weka puts the packages.)

    Good for Google Translate :-)

  • I have no idea, unfortunately. That's a FutureLearn question; I'm just the educator :-)

    But congratulations anyway.

  • Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved.

    Both are shown in the Weka classifier output, e.g. (for ZeroR on the Iris data):

    === Detailed Accuracy...
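
The two definitions above can be checked with a small sketch. It is not Weka's code or output: the function name `precision_recall` and the labels are made up for illustration. It assumes a classifier that, like ZeroR, predicts a single class for every instance, on a three-class dataset with 50 instances per class (as in Iris), and it further assumes the predicted class is the first one.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall) for one class; None where a ratio is undefined."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else None  # nothing retrieved: undefined
    recall = tp / (tp + fn) if (tp + fn) else None      # nothing relevant: undefined
    return precision, recall


# Hypothetical data: 50 instances of each class, every one predicted as "setosa".
actual = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
predicted = ["setosa"] * 150

for label in ("setosa", "versicolor", "virginica"):
    print(label, precision_recall(actual, predicted, label))
```

For the predicted class, precision is 50/150 (a third of what was retrieved is relevant) and recall is 1 (all relevant instances were retrieved). For the other two classes recall is 0, and precision is undefined because nothing was retrieved for them.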

  • The NaiveBayesMultinomial classifier, which (as you will learn about in Week 2 of the follow-up course More Data Mining with Weka) is used for text mining, is based on the multinomial model, which is a generalization of the binomial model.

  • Yes, it's not so important provided you get the idea.

    See my response below (at https://www.futurelearn.com/comments/43942319) for what I think are the 8 outliers.

  • @VivianWagumba Look at the Visualization tab and select the first of the four plots, which is X: year vs Y: phone calls. If you double-click a data point you get instance information, including the instance number. Instances 15, 16, 17, 18, 19 and 20 are clear outliers. Less obvious are instances 21 and (particularly) 14. To see that these are outliers,...

  • The difference is whether you're mining "data" (typically in spreadsheet-like tabular format) or text (typically a plain text file). Text mining is covered in Week 2 of the follow-up course, More Data Mining with Weka.

  • The manual is in a file called WekaManual.pdf that appears in the weka-3-8-4 folder when you download Weka. On my Mac that's /Applications/weka-3-8-4; on Windows I guess weka-3-8-4 is in the Program Files folder.

  • @KatieTerrell: rename the file weather.arff.txt to weather.arff.
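
If you would rather script the rename than do it in a file manager, here is a minimal sketch using only Python's standard library. The helper name `strip_txt_suffix` is hypothetical, and the sketch assumes the file is in a location you can write to.

```python
from pathlib import Path


def strip_txt_suffix(path):
    """Rename e.g. weather.arff.txt to weather.arff and return the new path."""
    path = Path(path)
    if path.name.endswith(".arff.txt"):
        target = path.with_suffix("")  # drops only the final ".txt"
        path.rename(target)
        return target
    return path  # leave other files untouched
```

Calling `strip_txt_suffix("weather.arff.txt")` in the folder that holds the file should leave you with weather.arff, which Weka can then open as an ARFF file.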

  • You will soon (Step 4.8) learn about Logistic Regression, which does (something like?) what I think you are describing. It often works well, but I know of no rules of thumb. That's why evaluation was stressed in Week 2.

  • Unfortunately this won't work at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086

  • It's not your Mac; unfortunately it doesn't work on anything at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086