Ian Witten

I grew up in Ireland, studied at Cambridge, and taught computer science at the Universities of Essex in England and Calgary in Canada before moving to paradise (aka New Zealand) 25 years ago.

Location New Zealand

Activity

  • Yes, Weka can read csv files; see Q.3 of the quiz that follows (Step 1.17)

  • Darn. Guess you're missing the best part of the course :-)

  • yes

  • You are correct. This weird result is only true for this particular dataset.

  • Great! But please note, this is an advanced course. If you haven't done it already, you might be better signing up for the introductory course Data Mining with Weka (https://www.futurelearn.com/courses/data-mining-with-weka/) instead.

  • Ian Witten replied to [Learner left FutureLearn]

    What exactly is your question? If this https://www.futurelearn.com/comments/58046078 is it then the answer is "no".

  • My understanding is that once the course opens, all five weeks become available immediately when you join the course. Is this not the case for you? (Since I am the instructor rather than a student, it's possible the interface works differently for me.)

    Otherwise, I don't understand your question.

  • I just tried it and can confirm that the quiz answer is correct. And you are looking in the correct place. Try restarting Weka and doing the experiment again.

  • Ian Witten replied to [Learner left FutureLearn]

    !ʇhgir s’tahT

  • I believe I have fixed this now.

  • The plant is, in fact, real (as is my hair). In New Zealand we don't need artificial plants.

  • Reminds me (for some reason) of "rubber duck debugging", a method of debugging code whose name refers to a story about a programmer who carried around a rubber duck and debugged their code by forcing themselves to explain it, line-by-line, to the duck. (Don't tell your husband.)

  • @RobertGillespie I just checked this, and I believe the answer is correct as it stands.

  • @RobertGillespie You are correct, it's an error. I've fixed it. Thanks for pointing this out.

  • It's a little clunky, but the quickest way is to select another classifier (or filter) and then re-select the one you want.

  • Thanks for pointing this out. I've made them available on our own (Waikato University) computer, and changed the links appropriately.

  • Click the pink link in the sentence "Download the regression_outliers.csv dataset and open it with Weka." in the quiz instructions. (And note the two points that follow in those instructions.)

  • @RobertGillespie I checked, and the given answer is correct. I think you might not be constraining XMeans to 2 clusters only. Its default is maxNumClusters=4, minNumClusters=2; and you should change maxNumClusters to 2 – otherwise the result is as you describe.

  • Yes, there's plenty of scope for more experimentation. Give it a go!

  • The follow-up course is running right now, More Data Mining with Weka (https://www.futurelearn.com/courses/more-data-mining-with-weka).

  • @TeresaFranco Thanks for pointing this out; I've fixed it (stupid cut-and-paste error).

  • I'm not using Catalina myself, but I've talked to others who do and they report no problems regarding Weka.

    You mention "lots of little error boxes": Weka doesn't normally pop up lots of boxes; it would be nice to know what's in them :-)

    If you've worked with earlier versions of Weka previously, it may be worth removing the "wekafiles" folder in your...

  • Sorry you feel like that; hope it doesn't put you off the course. Did you manage to complete the quiz anyway?

  • Please re-read step 3.15 Using Weka in practice: some questions (https://www.futurelearn.com/courses/data-mining-with-weka/7/steps/658023)

  • If only life were so easy ...

  • The aim of the quizzes is not so much to test your knowledge as to help with your learning. Sounds like it's working in your case :-) By the way, it's fine to look at the answers!

  • @NicolasBrookes I'm sorry you're giving up. But don't blame the Mac – Weka works perfectly well on a Mac; I use one all the time.

  • I have no idea what the problem is, and you don't mention what the error is – though I'm not sure it would help me to know.

    Tens of thousands of people have installed the user classifier without reporting any problems, so I would guess it's some kind of network issue.

    I'd recommend you try again when network loading is light. And if you can't install it,...

  • @MarkGlover: "Python 2.7 is obsolete ..." -- thanks for reminding me! I'll make a note below the video, and remove references to Python 2.7.

  • Do you know where Weka has been installed? All ARFF files should be in a folder called "data" within the weka-3-8-4 folder.

  • Please try this: locate the folder called "wekafiles", which should be in your home directory, and remove it and all its contents. (This is where Weka puts the packages.)

    Good for Google Translate :-)

  • I have no idea, unfortunately. That's a FutureLearn question; I'm just the educator :-)

    But congratulations anyway.

  • Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved.

    Both are shown in the Weka classifier output, e.g. (for ZeroR on the Iris data):

    === Detailed Accuracy...
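
As a rough illustration of the two definitions above (a sketch, not Weka's own code), precision and recall for one class can be computed from counts of true positives, false positives and false negatives. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class from confusion counts.

    tp: relevant instances that were retrieved (true positives)
    fp: retrieved instances that are not relevant (false positives)
    fn: relevant instances that were not retrieved (false negatives)
    """
    retrieved = tp + fp   # everything the classifier labelled as this class
    relevant = tp + fn    # everything that really belongs to this class
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts: 40 hits, 10 false alarms, 20 misses.
p, r = precision_recall(tp=40, fp=10, fn=20)
print(f"precision={p:.3f} recall={r:.3f}")
```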

  • The NaiveBayesMultinomial classifier, which (as you will learn about in Week 2 of the follow-up course More Data Mining with Weka) is used for text mining, is based on the multinomial model, which is a generalization of the binomial model.

  • Yes, it's not so important provided you get the idea.

    See my response below (at https://www.futurelearn.com/comments/43942319) for what I think are the 8 outliers.

  • @VivianWagumba Look at the Visualization tab and select the first of the four plots, which is X: year vs Y: phone calls. If you double-click a data point you get instance information, including the instance number. Instances 15, 16, 17, 18, 19 and 20 are clear outliers. Less obvious are instances 21 and (particularly) 14. To see that these are outliers,...

  • The difference is whether you're mining "data" (typically in spreadsheet-like tabular format) or text (typically a plain text file). Text mining is covered in Week 2 of the follow-up course, More Data Mining with Weka.

  • The manual is in a file called WekaManual.pdf that appears in the weka-3-8-4 folder when you download Weka. On my Mac that's /Applications/weka-3-8-4; on Windows I guess weka-3-8-4 is in the Program Files folder.

  • @KatieTerrell: rename the file weather.arff.txt to weather.arff.

  • You will soon (Step 4.8) learn about Logistic Regression, which does (something like?) what I think you are describing. It often works well, but I know of no rules of thumb. That's why evaluation was stressed in Week 2.

  • Unfortunately this won't work at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086

  • It's not your Mac; unfortunately it doesn't work on anything at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086

  • OK, here's the scoop. It turns out that the Spark 1.x libraries used in distributedWekaSpark are incompatible with recent versions of Java (> 1.8), which Weka has been updated to use. Spark 3.0 should resolve these issues, but it's only at the preview stage at the moment.

    You can overcome this by installing a Java 1.8 runtime environment for Quiz 4.10 and...

  • Looks like a bug to me. I'll check with Mark Hall.

  • You need to define a cost matrix using the costMatrix field of the CostSensitiveClassifier configuration panel.

    The error message appears because by default Weka tries to load the cost matrix from a file (called breast-cancer.cost in this case).

  • "Multiresponse linear regression" and "pairwise linear regression" are different ways of using linear regression for the classification problem. (For a 2-class problem, there is no difference between the two.)

    They are explained near the start of the Step 4.6 video "Classification by regression" (from 0:44 min:sec).

    Multiresponse linear regression works...

  • > I'm part way through the quiz.

    I guess you mean Step 1.19, Using J48. (It's annoying that the FutureLearn interface doesn't allow comments on quizzes – which in this course is where you most need them! – but I'm trying to establish the convention that queries are posted on the step following the quiz, not the one preceding it.)

    > I've opened the labor.arff...

  • The other attributes have already been used to obtain the best possible predicted number; now what we are doing is finding the best split-point to distinguish the two classes. If that number is all that will be used to make the final binary decision, OneR will produce the best split-point.

    It's clean and simple. Perhaps a more complicated scheme might make...
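
To make the idea of a "best split-point" on a single number concrete, here is a minimal sketch that tries the midpoint between each pair of adjacent sorted values and keeps the one with the fewest training errors. It is a simplified illustration only: Weka's OneR has further details (such as a minimum bucket size), and the function name and example values are hypothetical.

```python
def best_split_point(values, labels):
    """Find the threshold on one numeric attribute with the fewest training
    errors for a two-class problem.

    Returns (threshold, errors, low_label): instances with value < threshold
    are predicted low_label, all others the remaining class. Returns None if
    all values are identical, since no threshold can then separate them.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        # Try both assignments of classes to the two sides of the threshold.
        for low_label in classes:
            errors = sum(
                (label == low_label) != (value < threshold)
                for value, label in pairs
            )
            if best is None or errors < best[1]:
                best = (threshold, errors, low_label)
    return best


# Hypothetical predicted numbers and their true classes.
result = best_split_point([1, 2, 3, 6, 7, 9],
                          ["no", "no", "no", "yes", "yes", "yes"])
print(result)
```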

  • There is no way of doing this within Weka (as far as I know). However, I'm sure others have faced this problem. You should join the Weka email list and ask your question there.

  • @HawwauMoruf This will become clear as you work through the week.

  • This problem is fixed in the latest version of the massiveOnlineAnalysis package, 2020.05.1.

  • There is now a new version of the massiveOnlineAnalysis package, 2020.05.1, which works OK for me now. Try it.

  • Yes, I know. It doesn't work on my Mac but apparently it does work for the MOA guy who created the fix. He's looking into it.

  • Check out Robert's comment (below, in the same comment stream): https://www.futurelearn.com/comments/42437289

  • That's the dreaded "smart quotes" problem. If you look closely at the quotes around polygon you'll see that they're not regular quotation marks. Just edit them in the R command line, typing in the quote marks, and the command will work.

    The problem arose because you (wisely!) copied from the question text, and FutureLearn displays all quotation marks as...
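
If you'd rather not retype the quotes by hand, a small script can straighten them before you paste. This is a minimal sketch; the function name and the sample command string are hypothetical, and only the four most common typographic quote characters are handled.

```python
# Map common typographic ("smart") quotes to plain ASCII quotes.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text):
    """Return text with smart quotes replaced by regular quotation marks."""
    return text.translate(_TABLE)


# Hypothetical command copied from a web page, with curly quotes around polygon.
pasted = "some_command(type=\u201cpolygon\u201d)"
print(straighten_quotes(pasted))
```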

  • No!

  • Filters are called "supervised" if they use the actual class value of training instances in any way; otherwise they are "unsupervised". Almost all filters you will use are unsupervised.

    However, the addClassification filter is supervised. Why? That's a good question! As I am using it here it doesn't look at the actual class values (so it should be...

  • If a CSV file contains strings, quotation marks or newlines (maybe other characters too) within strings can cause this problem. Have a good look at line 2 of your file.
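
To see why such characters cause trouble, the sketch below compares naive line splitting with a quote-aware CSV parser on a made-up file. It says nothing about how Weka's own CSV loader behaves; it only shows that a quoted string may legitimately span lines and contain commas and doubled quotation marks.

```python
import csv
import io

# Hypothetical file: the comment field of the data row contains a comma, a
# doubled quotation mark and a newline, all inside one quoted string.
raw = 'id,comment\n1,"said ""hi"",\nthen left"\n'

# Naive line splitting sees three lines, cutting the record in two.
naive_lines = raw.splitlines()

# A CSV parser honours the quoting and returns a header row and one data row.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_lines), "lines when split naively")
print(len(rows), "rows when parsed as CSV")
print(rows[1])
```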

  • The NoChange classifier has been accidentally omitted from the LITE version of MOA. Change to the STANDARD version using the little menu at the top right of the interface, and then you will find it.

  • Please see my response to Hulya, https://www.futurelearn.com/comments/42215131

  • I have just discovered that an incompatibility has arisen between the versions of Java used in Weka (which was recently updated to use the latest version) and the massiveOnlineAnalysis package (which was not). If you are using the latest version of Weka, you are unable to select MOA’s data generators and classifiers from within Weka.

    I apologise for not...

  • @RonW: I have just discovered that an incompatibility has arisen between the versions of Java used in Weka (which was recently updated to use the latest version) and the massiveOnlineAnalysis package (which was not). This explains why, if you are using the latest version of Weka and of this package, you are unable to select MOA's data generators and...

  • I think your choice of folder name might betray your age.

  • Yes, that's OK. Actually, the separate installation of MOA.jar is unnecessary, but it does no harm. I have asked for that piece of text to be removed.

  • OK, thanks for letting me know. And sorry again for jumping to conclusions :-)

  • The file org_c_n.arff is large: 8.6 MB (about 11,000 lines).

  • It depends who you're talking to.

    1. Some people use the term "validation data" for what I call the test data.

    2. Sometimes the test data is used to help select between competing final models, in which case a "validation dataset" is held back to be used to make an unbiased estimate of the final model's performance – so you have training data, test data,...
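
Whatever the parts are called, the second situation amounts to a three-way split. The sketch below is one way to do it; the function name, the 60/20/20 proportions and the neutral names for the three parts are all assumptions made for illustration.

```python
import random


def three_way_split(instances, fractions=(0.6, 0.2, 0.2), seed=1):
    """Shuffle instances and split them into three disjoint parts:
    one for training, one for choosing between competing models, and one
    held back for the final performance estimate. Which of the last two is
    called "test" and which "validation" depends on who you're talking to.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n = len(shuffled)
    first_cut = round(n * fractions[0])
    second_cut = first_cut + round(n * fractions[1])
    train = shuffled[:first_cut]
    model_selection = shuffled[first_cut:second_cut]
    final_estimate = shuffled[second_cut:]
    return train, model_selection, final_estimate


train, model_selection, final_estimate = three_way_split(range(100))
print(len(train), len(model_selection), len(final_estimate))
```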

  • The classifier that you use in these lessons is SMO, and that is installed in your system.

    LibSVM is an external library that you would have to load explicitly; but you don't need it now. As I say in the video, the SMO algorithm only works with 2-class datasets, whereas the methods in LibSVM are more comprehensive.

    Yes, my video screenshots were taken...

  • I apologize! I thought you hadn't because when I click on your name I can see a list of the FutureLearn courses you have done, and those two are not on it. Is the list inaccurate, or did you do the courses under a different name? I sent the same message to several others on the same basis, so I would like to know.