Ian Witten
I grew up in Ireland, studied at Cambridge, and taught computer science at the Universities of Essex in England and Calgary in Canada before moving to paradise (aka New Zealand) 25 years ago.
Location New Zealand
Activity
-
Ian Witten replied to Dinda Veska
Yes, Weka can read csv files; see Q.3 of the quiz that follows (Step 1.17)
-
Ian Witten replied to Agustiah Agustiah
Darn. Guess you're missing the best part of the course :-)
-
Ian Witten replied to Agustiah Agustiah
yes
-
Ian Witten replied to Seren Evans
See here, Step 1.9: https://www.futurelearn.com/courses/data-mining-with-weka/9/steps/796485
-
Ian Witten replied to Tristen Fielding
You are correct. This weird result is only true for this particular dataset.
-
Great! But please note, this is an advanced course. If you haven't done it already, you might be better off signing up for the introductory course Data Mining with Weka (https://www.futurelearn.com/courses/data-mining-with-weka/) instead.
-
Ian Witten replied to [Learner left FutureLearn]
What exactly is your question? If this https://www.futurelearn.com/comments/58046078 is it then the answer is "no".
-
Ian Witten replied to Luiz Jacob
My understanding is that once the course opens, all five weeks become available immediately when you join the course. Is this not the case for you? (Since I am instructor rather than student, it's possible the interface works differently for me.)
Otherwise, I don't understand your question.
-
I just tried it and can confirm that the quiz answer is correct. And you are looking in the correct place. Try restarting Weka and doing the experiment again.
-
Ian Witten replied to [Learner left FutureLearn]
!ʇhgir s’tahT
-
Ian Witten replied to Angel Perez
I believe I have fixed this now.
-
Ian Witten replied to Peter Rossler
The plant is, in fact, real (as is my hair). In New Zealand we don't need artificial plants.
-
Reminds me (for some reason) of "rubber duck debugging", a method of debugging code whose name refers to a story about a programmer who carried around a rubber duck and debugged their code by forcing themselves to explain it, line-by-line, to the duck. (Don't tell your husband.)
-
@RobertGillespie I just checked this, and I believe the answer is correct as it stands.
-
@RobertGillespie You are correct, it's an error. I've fixed it. Thanks for pointing this out.
-
It's a little clunky, but the quickest way is to select another classifier (or filter) and then re-select the one you want.
-
Thanks for pointing this out. I've made them available on our own (Waikato University) computer, and changed the links appropriately.
-
Ian Witten replied to Anne C
Click the pink link in the sentence "Download the regression_outliers.csv dataset and open it with Weka." in the quiz instructions. (And note the two points that follow in those instructions.)
-
@RobertGillespie I checked, and the given answer is correct. I think you might not be constraining XMeans to 2 clusters only. Its default is maxNumClusters=4, minNumClusters=2; and you should change maxNumClusters to 2 – otherwise the result is as you describe.
-
Yes, there's plenty of scope for more experimentation. Give it a go!
-
Ian Witten replied to paul martin
The follow-up course is running right now, More Data Mining with Weka (https://www.futurelearn.com/courses/more-data-mining-with-weka).
-
Ian Witten replied to Teresa Franco
@TeresaFranco Thanks for pointing this out; I've fixed it (stupid cut-and-paste error).
-
Ian Witten replied to Amanda Bluett
I'm not using Catalina myself, but I've talked to others who do and they report no problems regarding Weka.
You mention "lots of little error boxes": Weka doesn't normally pop up lots of boxes; it would be nice to know what's in them :-)
If you've worked with earlier versions of Weka previously, it may be worth removing the "wekafiles" folder in your...
-
Sorry you feel like that; hope it doesn't put you off the course. Did you manage to complete the quiz anyway?
-
Please re-read step 3.15 Using Weka in practice: some questions (https://www.futurelearn.com/courses/data-mining-with-weka/7/steps/658023)
-
Ian Witten replied to Manish Pandey
If only life were so easy ...
-
Ian Witten replied to Lorna Johnson
The aim of the quizzes is not so much to test your knowledge as to help with your learning. Sounds like it's working in your case :-) By the way, it's fine to look at the answers!
-
Ian Witten replied to Nicolas Brookes
@NicolasBrookes I'm sorry you're giving up. But don't blame the Mac – Weka works perfectly well on a Mac; I use one all the time.
-
I have no idea what the problem is, and you don't mention what the error is – though I'm not sure it would help me to know.
Tens of thousands of people have installed the user classifier without reporting any problems, so I would guess it's some kind of network issue.
I'd recommend you try again when network loading is light. And if you can't install it,...
-
Ian Witten replied to Robert Gillespie
@MarkGlover: "Python 2.7 is obsolete ..." -- thanks for reminding me! I'll make a note below the video, and remove references to Python 2.7.
-
Ian Witten replied to Nicolas Brookes
Do you know where Weka has been installed? All ARFF files should be in a folder called "data" within the weka-3-8-4 folder.
-
Ian Witten replied to ismael salam
Please try this: locate the folder called "wekafiles", which should be in your home directory, and remove it and all its contents. (This is where Weka puts the packages.)
Good for Google Translate :-)
-
Ian Witten replied to Cathal King
I have no idea, unfortunately. That's a FutureLearn question; I'm just the educator :-)
But congratulations anyway.
-
Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved.
Both are shown in the Weka classifier output, e.g. (for ZeroR on the Iris data):
=== Detailed Accuracy...
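To make the two definitions concrete, here is a minimal Python sketch that computes both measures for one class from lists of actual and predicted labels. The function name and the toy labels are invented for illustration; this is not Weka's code, which reports these figures per class in its own output.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall), treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)  # true positives
    fp = sum(1 for a, p in pairs if p == positive and a != positive)  # false positives
    fn = sum(1 for a, p in pairs if p != positive and a == positive)  # false negatives
    # Precision is undefined if nothing was predicted positive;
    # recall is undefined if there are no positive instances at all.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical labels, not taken from any course dataset.
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no", "no"]
p, r = precision_recall(actual, predicted, "yes")

# A classifier that always predicts one class (as ZeroR does) retrieves every
# instance of that class, so its recall for that class is perfect even though
# its precision is only the class's share of the data.
p_all, r_all = precision_recall(actual, ["yes"] * len(actual), "yes")
```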
-
Ian Witten replied to Ken di
The NaiveBayesMultinomial classifier, which (as you will learn about in Week 2 of the follow-up course More Data Mining with Weka) is used for text mining, is based on the multinomial model, which is a generalization of the binomial model.
-
Ian Witten replied to Brent U
Yes, it's not so important provided you get the idea.
See my response below (at https://www.futurelearn.com/comments/43942319) for what I think are the 8 outliers.
-
Ian Witten replied to Vivian Wagumba
@VivianWagumba Look at the Visualization tab and select the first of the four plots, which is X: year vs Y: phone calls. If you double-click a data point you get instance information, including the instance number. Instances 15, 16, 17, 18, 19 and 20 are clear outliers. Less obvious are instances 21 and (particularly) 14. To see that these are outliers,...
-
Ian Witten replied to Hawwau Moruf
The difference is whether you're mining "data" (typically in spreadsheet-like tabular format) or text (typically a plain text file). Text mining is covered in Week 2 of the follow-up course, More Data Mining with Weka.
-
The manual is in a file called WekaManual.pdf that appears in the weka-3-8-4 folder when you download Weka. On my Mac that's /Applications/weka-3-8-4; on Windows I guess weka-3-8-4 is in the Program Files folder.
-
Ian Witten replied to Katie Terrell
@KatieTerrell: rename the file weather.arff.txt to weather.arff.
-
Ian Witten replied to Coby Beck
You will soon (Step 4.8) learn about Logistic Regression, which does (something like?) what I think you are describing. It often works well, but I know of no rules of thumb. That's why evaluation was stressed in Week 2.
-
Ian Witten replied to Jorge Pita
Unfortunately this won't work at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086
-
Ian Witten replied to Jorge Pita
It's not your Mac; unfortunately it doesn't work on anything at the moment. Please see my response to a question about the upcoming quiz on Cross-validating classifiers with Spark: https://www.futurelearn.com/comments/43342086
-
Ian Witten replied to Ron W
OK, here's the scoop. It turns out that the Spark 1.x libraries used in distributedWekaSpark are incompatible with recent versions of Java (> 1.8), which Weka has been updated to use. Spark 3.0 should resolve these issues, but it's only at the preview stage at the moment.
You can overcome this by installing a Java 1.8 runtime environment for Quiz 4.10 and...
-
Ian Witten replied to Ron W
Looks like a bug to me. I'll check with Mark Hall.
-
You need to define a cost matrix using the costMatrix field of the CostSensitiveClassifier configuration panel.
The error message appears because by default Weka tries to load the cost matrix from a file (called breast-cancer.cost in this case).
-
"Multiresponse linear regression" and "pairwise linear regression" are different ways of using linear regression for the classification problem. (For a 2-class problem, there is no difference between the two.)
They are explained near the start of the Step 4.6 video "Classification by regression" (from 0:44 min:sec).
Multiresponse linear regression works...
-
Ian Witten replied to Lorna Johnson
> I'm part way through the quiz.
I guess you mean Step 1.19, Using J48. (It's annoying that the FutureLearn interface doesn't allow comments on quizzes – which in this course is where you most need them! – but I'm trying to establish the convention that queries are posted on the step following the quiz, not the one preceding it.)
> I've opened the labor.arff...
-
Ian Witten replied to shine destine
The other attributes have already been used to obtain the best possible predicted number; now what we are doing is finding the best split-point to distinguish the two classes. If that number is all that will be used to make the final binary decision, OneR will produce the best split-point.
It's clean and simple. Perhaps a more complicated scheme might make...
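As a rough illustration of what "finding the best split-point" on a single number means, here is a small Python sketch. It tries the midpoint between each pair of adjacent sorted values and keeps the one that misclassifies the fewest instances. The function name and the data are invented, only the rule "above the threshold means positive" is considered, and this is not Weka's OneR implementation.

```python
def best_split_point(values, labels):
    """Return (threshold, errors) for the rule 'value > threshold => positive'.

    `labels` are booleans (True = positive class).
    """
    pairs = sorted(zip(values, labels))
    best_threshold, best_errors = None, len(pairs) + 1
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no split is possible between identical values
        threshold = (v1 + v2) / 2
        # Count the instances that the rule gets wrong at this threshold.
        errors = sum(1 for v, y in pairs if (v > threshold) != y)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical predicted numbers and their true classes.
predicted_numbers = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
classes = [False, False, True, False, True, True]
threshold, errors = best_split_point(predicted_numbers, classes)
```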
-
Ian Witten replied to Stephen Howells
There is no way of doing this within Weka (as far as I know). However, I'm sure others have faced this problem. You should join the Weka email list and ask your question there.
-
Ian Witten replied to Oluwasefunmi Bamidele
@HawwauMoruf This will become clear as you work through the week.
-
Ian Witten replied to Hulya Akil
This problem is fixed in the latest version of the massiveOnlineAnalysis package, 2020.05.1.
-
Ian Witten replied to Mark Glover
There is now a new version of the massiveOnlineAnalysis package, 2020.05.1, which works OK for me now. Try it.
-
Ian Witten replied to Mark Glover
Yes, I know. It doesn't work on my Mac but apparently it does work for the MOA guy who created the fix. He's looking into it.
-
Ian Witten replied to Hulya Akil
Check out Robert's comment (below, in the same comment stream): https://www.futurelearn.com/comments/42437289
-
That's the dreaded "smart quotes" problem. If you look closely at the quotes around polygon you'll see that they're not regular quotation marks. Just edit them in the R command line, typing in the quote marks, and the command will work.
The problem arose because you (wisely!) copied from the question text, and FutureLearn displays all quotation marks as...
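If you hit this often, the substitution can be automated instead of retyping the quote marks. A minimal Python sketch follows; the function name and the sample command are made up for illustration and are not the command from the quiz.

```python
# Map the common "smart" quotation marks to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(text):
    """Replace curly quotation marks with straight ones."""
    return text.translate(str.maketrans(SMART_QUOTES))


# Hypothetical command pasted from a web page, with curly quotes around polygon.
command = "plot(x, type=\u201cpolygon\u201d)"
fixed = straighten_quotes(command)
```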
-
Ian Witten replied to İbrahim Atakan Kubilay
No!
-
Filters are called "supervised" if they use the actual class value of training instances in any way; otherwise they are "unsupervised". Almost all filters you will use are unsupervised.
However, the addClassification filter is supervised. Why? That's a good question! As I am using it here it doesn't look at the actual class values (so it should be...
-
Ian Witten replied to Jeff H
If a CSV file contains strings, quotation marks or newlines (maybe other characters too) within strings can cause this problem. Have a good look at line 2 of your file.
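To see why such characters matter, here is a short Python sketch using the standard csv module. It shows a properly quoted field that legitimately contains a newline and a doubled quotation mark, and a helper that reports rows whose number of fields differs from the header's. The sample data and the helper name are invented, and Python's parser is not Weka's CSV loader, so treat it only as a way of inspecting a suspect file.

```python
import csv
import io

# Well-formed: the newline and the quotation marks sit inside a quoted field,
# and embedded quotation marks are doubled.
good = 'id,comment\n1,"first line\nsecond line"\n2,"she said ""hi"""\n'
rows = list(csv.reader(io.StringIO(good)))

# Malformed: quotation marks and a comma inside an unquoted field.
bad = 'id,comment\n1,he said "hi", twice\n2,ok\n'


def lines_with_wrong_width(text):
    """Return the line numbers of rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # reader.line_num is the number of physical lines read so far.
    return [reader.line_num for row in reader if len(row) != len(header)]


suspect = lines_with_wrong_width(bad)
```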
-
The NoChange classifier has been accidentally omitted from the LITE version of MOA. Change to the STANDARD version using the little menu at the top right of the interface, and then you will find it.
-
Ian Witten replied to Ron W
Please see my response to Hulya, https://www.futurelearn.com/comments/42215131
-
Ian Witten replied to Hulya Akil
Please see my response to Hulya, https://www.futurelearn.com/comments/42215131
-
Ian Witten replied to Mark Glover
Please see my response to Hulya, https://www.futurelearn.com/comments/42215131
-
Ian Witten replied to Hulya Akil
I have just discovered that an incompatibility has arisen between the versions of Java used in Weka (which was recently updated to use the latest version) and the massiveOnlineAnalysis package (which was not). If you are using the latest version of Weka, you are unable to select MOA’s data generators and classifiers from within Weka.
I apologise for not...
-
Ian Witten replied to Ron W
@RonW: I have just discovered that an incompatibility has arisen between the versions of Java used in Weka (which was recently updated to use the latest version) and the massiveOnlineAnalysis package (which was not). This explains why, if you are using the latest version of Weka and of this package, you are unable to select MOA's data generators and...
-
Ian Witten replied to Ron W
I think your choice of folder name might betray your age.
-
Ian Witten replied to Hulya Akil
Yes, that's OK. Actually, the separate installation of MOA.jar is unnecessary, but it does no harm. I have asked for that piece of text to be removed.
-
OK, thanks for letting me know. And sorry again for jumping to conclusions :-)
-
Ian Witten replied to Robert Gillespie
The NoChange classifier has been accidentally omitted from the LITE version of MOA. Change to the STANDARD version using the little menu at the top right of the interface, and then you will find it.
-
The file org_c_n.arff is large: 8.6 MB (about 11,000 lines).
-
Ian Witten replied to shine destine
It depends who you're talking to.
1. Some people use the term "validation data" for what I call the test data.
2. Sometimes the test data is used to help select between competing final models, in which case a "validation dataset" is held back to be used to make an unbiased estimate of the final model's performance – so you have training data, test data,...
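Whatever the parts are called, the mechanics of a three-way split are simple. Here is a minimal Python sketch; the function name, the 60/20/20 proportions and the deliberately neutral variable names are assumptions made for illustration.

```python
import random


def three_way_split(instances, fractions=(0.6, 0.2, 0.2), seed=1):
    """Shuffle the instances and cut them into three disjoint parts."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n = len(shuffled)
    first_cut = int(n * fractions[0])
    second_cut = first_cut + int(n * fractions[1])
    return shuffled[:first_cut], shuffled[first_cut:second_cut], shuffled[second_cut:]


# One part to build models, one to choose between them, and one held back
# for an unbiased estimate of the chosen model's performance.
build, choose, held_back = three_way_split(range(100))
```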
-
Ian Witten replied to Brent U
The classifier that you use in these lessons is SMO, and that is installed in your system.
LibSVM is an external library that you would have to load explicitly; but you don't need it now. As I say in the video, the SMO algorithm only works with 2-class datasets, whereas the methods in LibSVM are more comprehensive.
Yes, my video screenshots were taken...
-
I apologize! I thought you hadn't because when I click on your name I can see a list of the FutureLearn courses you have done, and those two are not on it. Is the list inaccurate, or did you do the courses under a different name? I sent the same message to several others on the same basis, so I would like to know.
-
Ian Witten replied to George Hannan
@AndyvanEmmerik Having selected the Copy filter (or any other filter or classifier), double-click it to bring up its configuration panel. The More button is near the top on the right-hand side.
-
Well, it could use Date-Remapped in the model, but it doesn't. Linear regression doesn't necessarily use all available attributes, because omitting some may produce a better model.
-
Ian Witten replied to Hyeon Jin Cho
Maybe :-). Look under "functions".
-
Ian Witten replied to Gautam Bhut
The dropdown list is scrollable.
-
Ian Witten replied to Walew Yeboah
As Q.3 of the Step 4.7 Quiz says, click Output predictions in the More options menu and output the predictions as PlainText.
-
Kaggle (https://www.kaggle.com/) has 19,000 public datasets, and also offers many competitions, past and present, some with attractive prizes! (at https://www.kaggle.com/competitions).
-
Ah yes, I know Bragg Creek, and nearby Moose Mountain with the ice cave, and Rock of Gibraltar in the Sheep River area. Great times!
-
You can keep data files on the Web and open them by clicking "Open URL" in the Explorer's Preprocess.
-
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to Uma Z
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to Joseph Wijeyagoonewardena
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to رضا القريشي
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to Satender Melandiya
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to Rouane Abdelselam
This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to İbrahim Atakan Kubilay
@OlarewajuBabatope This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to İbrahim Atakan Kubilay
@AmreenKureshi This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).
-
Ian Witten replied to swapan mitra
@MaryLynch The PrincipalComponents filter performs a principal components analysis and transformation of the data.
Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data -- default 0.95 (95%).
-
Ian Witten replied to Guus Löhlefink
@JyotiJalaj This course is an advanced one. I recommend you start with the introductory course "Data Mining with Weka" (https://www.futurelearn.com/courses/data-mining-with-weka/).