Chris Wild

Chris Wild is lead educator for “Data to Insight”. His interests are data from complex sampling, statistical thinking and reasoning processes, and visualisation.
See: https://www.stat.auckland.ac.nz/~wild

Location Auckland, New Zealand

Activity

  • Hi Muriel, These particular articles get the essential ideas across well at the right level -- Chris

  • Hi Teresa. It doesn't really matter for essential learning if you used education rather than Education.reord (which you had to create ... "Exercise 2.5 showed the use of this technique to create a new variable called Education.reord. You will need to do that again.") -- Chris

  • Hi Chi Ni, they are different systems with different strengths and weaknesses -- Chris

  • Hi Elena, The counts are a summary of a categorical variable. If you wanted to make a new dataset containing the counts, those counts would form a numerical variable in the new data set. So you are on the right track -- Chris

  • Hi Chi Ni, With a categorical variable each entity falls into one, and only one, category. With what you are calling "overlapped categories" you have/code a set of variables, one for each category, and each variable records whether or not an entity falls into that category. This situation arises with so-called multiple-response questionnaire items where,...

  • Working fine for me on Windows Maria so guessing you are using a Mac. Email me at inzight_support@stat.auckland.ac.nz -- Chris

  • Not in the sampling variation module Olena, but there is in the module to follow -- Chris

  • Hi Elvina, Fill out the form at https://www.stat.auckland.ac.nz/~wild/iNZight/support/contact/ and it should tell us what we need to know to help you -- Chris

  • Hi Elvina, iNZight Lite doesn't get installed. You just connect to it online -- Chris

  • Yes it does Osama -- Chris

  • The current one (3.5.3 at the moment) is always the one to use Theresa -- Chris

  • Hi Diana, https://www.stat.auckland.ac.nz/~wild/d2i/exercises/1.15%20exercise-import-data-into-inzight-lite_17.pdf . Make sure you use the getting started links near the top first - Chris

  • It's there if you get your copy of the gapminder dataset from the place given in the instructions in the next Step (2.15) -- Chris

  • Hi Areej. Re going "back to previous steps" I guess you are talking about the Play button. Instead of using the Play button, use the slider and then you have control of what graphs you are looking at. Occasionally some combination of things you do stops iNZight but unless it happens in a way we can replicate there is no way to find it and fix it. You just...

  • Hi Mmuso. Your question is a bit advanced for this course. If you Google you'll find all sorts of things about sample size calculations/determination -- Chris

  • Hi Eva, iNZight is free for anyone to use anywhere so no problems from our end -- Chris

  • Hi Eva. Once iNZight is installed nothing you are using (except perhaps if you ask for an interactive graph) uses your internet connection, so I can't see how it can be the cause -- Chris

  • Strange. I still can't replicate! -- Chris

  • Got them Emily. Thanks, Chris

  • The graphics were made for a first encounter with testing ideas and we decided that "2-tailed" added another obscuring layer of complexity. Take your tail area and double it for a reasonable approximation -- Chris

  • @PatrickKearns It is hard to understand but often happens when results are a little unusual but not extremely so. Remember, not having a small tail area does not demonstrate that no real (non-random) effect exists -- Chris

  • @areejfatima Found it Areej, " Is this a problem for intended analysis??" is just a trigger to make you think about "Is the data problem I am seeing going to cause problems for the type of analysis I want to do?" You may have to find out more or learn more to be able to answer such a question -- Chris

  • Hi Suubi, I would say "the *estimate* becomes more accurate because the sample size is bigger" -- Chris

  • Great to have you "back" Maggie -- Chris

  • Hi Areej, I don't get what you are asking. Can you give me more detail please? -- Chris

  • Hi Kemi, Can you email me a screenshot at inzight_support@stat.auckland.ac.nz so I can see what you are seeing? -- Chris

  • Hi Hakeem, You can just use the p-value if it was obtained using an appropriate method -- Chris

  • Hi Hakeem. This was just a taster. You will need a more full-on statistics course to get more into those aspects -- Chris

  • Thanks Patrick. That's an old link to the material on Step 6.9. I've removed it. Step 6.9 and the pdf of it linked from Step 6.9 are fine -- Chris

  • Fine Marcio but please see the answer I've just posted to Areej immediately above -- Chris

  • Hi Areej, Looks fine but I can't comment on everyone's answers to all these questions. I hope participants will look over one another's -- Chris

  • Hi Ali, Start from the top and email me at inzight_support@stat.auckland.ac.nz telling me about the first thing you strike that you can't understand -- Chris

  • I need more detail to understand what the problem is Areej -- Chris

  • Hi Anna, the bottom line is, if you are looking at the graph and want to spot evidence of where there are true differences, or get a visual indication of how small or big a true difference could be, use the black lines in graph 5 and not the red lines. Even though we are illustrating with 2 groups here, the technique is really for graphs with multiple groups. To...

  • Hi Patrick, (CI lower, estimate, CI upper): overweight 1.18, 1.29, 1.40; normal weight 1.38, 1.5, 1.62 is consistent with the story unless you are looking somewhere I haven't seen -- Chris

  • Hi Olena, We talked about the overlap between data from 2 different groups. IQR is talking about where the centre 50% of the data for one dataset/group is -- Chris

  • Hi Ася, VIT and VITonline can only cope with csv and tab-delimited text files -- Chris

  • No RIMAMSIKWE, Lots in the R libraries of Rob Hyndman (Google him) -- Chris

  • Can't currently in iNZight or with the iNZightPlot function in R Rosebud -- Chris (can in ggplot)

  • Are you using VIT or VITonline Victoria? -- Chris

  • Hi Ася, All the numbers above are produced by iNZight. Not sure what you mean about the "proportion rate"? Can you give more detail? Thanks, Chris

  • Hi Ася, I'm not quite following your question. Can you be more specific? -- Chris

  • Hi Ася, Step 7.11 discusses issues like this -- Chris

  • Hi HAKUZAYESU, desktop (installed) iNZight works offline. iNZight Lite is driven by a remote server so online only -- Chris

  • Sorry Sadequllah, Button name has changed to "Record my choices" to be the same as desktop VIT. Fixed -- Chris

  • Hi Ася, It's doing the whole 49 but only 40 numbers showing in the panel onscreen -- Chris

  • Hi John, The exercises are for you to play with software and data. The quizzes examine your level of understanding. With upgrading, the assessments do it better -- Chris

  • That's no problem on Windows John but it is a problem on Macs -- Chris

  • Hi Ася, we set up a scenario where we know the answers to see what sort of misconceptions we could get from biased sampling -- Chris

  • Hi Mark, If you run enough resamples you stop getting the differences. We've only been doing 1000, mainly so the visualizations are not too slow, since VIT is mainly a conceptual development tool -- Chris

  • A clean download is more reliable. Try in the early morning or some other time you think the internet in your area is not likely to be overloaded. If it remains a problem use iNZight Lite -- Chris

  • Hi Muriel, You'll soon find whether or not trying to use both is going to cost you more time than you are prepared to spend - in which case cut back to one -- Chris

  • Hi Pat, The point of this is demonstrating the variation you get in estimates from sample-to-sample. That variation gets smaller if the samples are bigger. So there is less sampling error in an estimate from a big sample than in an estimate from a smaller sample -- Chris

  • Hi Pat, Replication of results by many centres protects us against biases, experimental and other data-collection mistakes and special circumstances, so in that sense it is a much higher bar for concluding a research hypothesis (not a null hypothesis) is true. Assuming that several trials have been done well, pooling the results (meta analysis) is like having a very,...

  • Hi RIMAMSIKWE, In practice the true value of the population mean is unknown. But theory, and simulation experience in scenarios where we know the truth, both show that if we take samples and construct 95% confidence intervals, then for 95% of samples taken the true mean is in the calculated confidence interval -- Chris

  • No RIMAMSIKWE. Outliers can cause uncertainty (about whether this is a real observation or an error), but not the sort we can compensate for by using a confidence interval -- Chris

  • Hi RIMAMSIKWE, the confidence interval itself is a way of conveying the level of uncertainty in an estimate -- Chris

  • @RIMAMSIKWEANDEMAMMAN Not easily and definitely not with the bootstrap RIMAMSIKWE -- Chris

  • Email the data to me at inzight_support@stat.auckland.ac.nz Philip and I'll take a look -- Chris

  • No but you could try this one Barbara ... -- Chris
    https://www.futurelearn.com/courses/data-mining-with-weka

  • I don't know of any Patrick but I do know that in medical trials the analyst is sometimes blinded - in the sense that they only get treatment labels in their data and don't know what actual treatment each patient got - Chris

  • If you get hooked on that stuff Dave, start poking around Rob Hyndman's website. Rob and his group are amazing -- Chris

  • Hi Ася, The diagram above and the game itself should help. Look at the answers it produces and after a while you should get a better feel for it -- Chris

  • Email me directly then N E -- Chris

  • Hi Patrick. A negative relationship is one in which as one variable increases the other tends to decrease (opposite directions), so the wrong descriptor for "no association". Better ones would be unrelated or independent -- Chris

  • All working fine for me on v 3.5.3. Pretty sure time series hasn't been touched recently. As I said to Dave below for something similar, if restarting and trying again doesn't fix it email me at inzight_support@stat.auckland.ac.nz with an account of what you have done in what order, contents of the R Console window and a screenshot -- Chris

  • Hi Mark. This would be plenty to prepare for Stats 101 here. In some places we have gone further -- Chris

  • Hi Adam, p-values are a different idea - see Week 7 -- Chris

  • Thanks Dorota -- Chris

  • Hi Nadia, Have lived in Auckland most of my life -- Chris

  • Working fine for me Dave. If restarting and trying again doesn't fix it email me at inzight_support@stat.auckland.ac.nz with an account of what you have done in what order, contents of the R Console window and a screenshot -- Chris

  • Hi Dorota, The behaviour in the China series isn't a mistake; it is real behaviour that is unlike behaviour before or since. Recently lots of series have gone crazy because of COVID-19 induced behaviours that are unprecedented. In such circumstances there are no obvious ways of making good forecasts -- Chris

  • Hi Nadia, If you can't tell whether a seasonal series looks additive or multiplicative you'll generally get very similar results either way. Can always look at it both ways -- Chris

  • Hi S J, iNZight is a GUI-driven system written in R -- Chris

  • Hi Mark, You'd have to calculate the other tail area as well -- Chris

  • @NadiaB Hi Nadia and Laura. Jan-Mar is about 15,000 above the trend so the trend ... -- Chris (probably unnecessarily tricky)

  • Hi Gemma, That is the behaviour for a subsetting variable (slots 3 and 4) that is numeric. Put your variables in slots 1 and 2 -- Chris

  • Hi Hannah, Bar charts won't display for a categorical variable with more than about 200 categories (wouldn't be able to see anything anyway) -- Chris

  • Hi Hannah, View Data Set is not just a display. It has editing capabilities as well. When the data set gets too large it slows down the program. We disable it at about 20,000 cells I think it is. You can still view the data set using Dataset > View full dataset - Chris

  • Here's a nice treatment Roger ... -- Chris
    https://robjhyndman.com/hyndsight/cyclicts/

  • Future Steps Roger -- Chris

  • Yes Adam, even in the nonlinear case curve fitting is a form of regression -- Chris

  • Hi Meshack. That is not a question for a statistician. That is a question for an expert in the area of the problem (be it medical, business, political, ...) and the answer will change with the problem -- Chris

  • One word wrong in there Meshack, "Treatment differences are **practically** significant if they are big enough to have a real world impact" -- Chris

  • For data visualisation and analysis Areej, certainly -- Chris

  • Use iNZight Lite Jhonattan -- Chris

  • Hi Myra, iNZight Lite slowdowns are usually caused by usage spikes -- Chris

  • Thanks Petra. All the best for applying these ideas in your real world -- Chris

  • Fixed Petra --- Chris

  • Hi Geoff, Maths in here ... Not in because few would understand -- Chris
    https://www.stat.auckland.ac.nz/~wild/visdiffs/

  • Sure Geoff - Chris

  • Hi Anna, In desktop iNZight you are not limited by the drop down choices, you can type in colours .. -- Chris
    https://www.stat.auckland.ac.nz/~wild/iNZight/user_guides/advanced/#colours

  • Hi RIMAMSIKWE, It is in principle but it tends to be an expensive strategy cost wise. In practice more complex methods of random sampling are used for large populations including elements of stratified sampling and cluster sampling (https://en.wikipedia.org/wiki/Stratified_sampling, https://en.wikipedia.org/wiki/Cluster_sampling,...

  • More like during Stats 310 I guess Mark -- Chris

  • Congratulations Hashan. Glad you enjoyed it -- Chris

  • Hi Mark, I informed them. Thanks, Chris

  • Hi Emily, Depends what you are doing. Even severe lack of balance is fine for comparing men and women (provided you use things like means and proportions), but not fine, unless you make adjustments, if you want to use them as a combined group representing the general population -- Chris

  • Hi Afshin. Restart the program and it will be fine. If you can reproduce a sequence of steps that leads to the error, only then do we have a good chance of diagnosing and fixing it. Best, Chris

  • Hi Sarah, You can often only see what is wrong with a line of code by seeing what has come before. Email to inzight_support@stat.auckland.ac.nz. It could be as simple as misspelling a variable name -- Chris
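
The comments above about sampling variation (to Pat) and about 95% confidence intervals (to RIMAMSIKWE) can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the course's or iNZight's method: the population (Normal, mean 50, SD 10), the sample sizes, the function name simulate and the use of a "mean ± 1.96 standard errors" interval in place of the bootstrap are all chosen here purely for illustration.

```python
import random
import statistics


def simulate(sample_size, n_samples=2000, pop_mean=50.0, pop_sd=10.0, seed=1):
    """Repeatedly sample from a known, made-up Normal population.

    Returns (sd_of_sample_means, coverage), where coverage is the fraction
    of approximate 95% intervals (mean +/- 1.96 * standard error) that
    contain the true population mean.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / sample_size ** 0.5
        means.append(m)
        # Count the intervals that capture the (normally unknown) truth.
        if m - 1.96 * se <= pop_mean <= m + 1.96 * se:
            hits += 1
    return statistics.stdev(means), hits / n_samples


# Hypothetical sample sizes: a small sample and a much bigger one.
small = simulate(25)
large = simulate(400)
print("n = 25 : SD of sample means = %.2f, coverage = %.3f" % small)
print("n = 400: SD of sample means = %.2f, coverage = %.3f" % large)
```

The spread of the sample means should come out clearly smaller for the bigger samples, and the coverage should sit close to 95% in both cases. It may be a little lower for the small samples, because 1.96 ignores the extra uncertainty from estimating the SD.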