James Davenport

Professor of Information Technology at the University of Bath, in both Mathematical Sciences and Computer Science. Is on ISO/IEC JTC 1/SC 42/WG 3: Trustworthiness of Artificial Intelligence.

Location: Bath, UK


  • As the old joke goes, "He uses statistics the way a drunken man uses a lamppost: for support rather than for illumination."

  • Thanks Tom.

  • I think Laura has an important point here. The old phrase in computing was "garbage in, garbage out" (abbreviated to GIGO) and that's probably appropriate here.

  • Good, ambitious questions. There will be several legal/ethical questions around collecting and analysing such data, but that shouldn't stop you from trying.

  • Nice motivation. The sport that's made the most use of big data/analytics is probably baseball: see for example https://hbr.org/2019/07/what-baseball-can-teach-you-about-using-data-to-improve-yourself .

  • The most important thing about a convention is that it should be applied consistently. The bigger the project the more important this is.

  • @NiallBuswell : this doesn't quite work. What happens if I say N/N/N/Y/Y - I get no drink but with both milks.

  • Also, interdisciplinary teamwork - it requires a team with a broader range of skills and backgrounds than one person normally has.

  • To all, but in response to this remark - that's one reason we built this course, to help people realise their gaps.

  • A good set of motivations so far - broad, but that's to be expected, as AI and DS are very widely applicable.

  • All good comments here.

  • Days of the week, certainly. Hence the original design has a problem, as several have mentioned. But also holidays (which can differ by country, in the event of an internationally-oriented website).

  • Good point - it could be argued that Facebook has hijacked the word "friend"

  • Good point. There are pros and cons to suppressing a very large item. I'd have been tempted to use the 'broken y axis' technique in this case. But hindsight is always better!

  • Appreciating that you have a lot to learn is part of the journey, and much better than not appreciating it.

  • Sticking with the coffee theme, there's a popular article on the "does coffee stunt children's growth" myth at https://www.livescience.com/coffee-does-not-stunt-growth.html : there's a correlation between coffee and osteoporosis, but that's because coffee drinkers tend to drink less milk.

  • Indeed, and without a great deal of care, AI can easily replicate, and quite possibly exaggerate, existing biases.

  • @MichaelMorehouse Do you want to drop me a mail (masjhd@bath.ac.uk) to take this one further?

  • @MichaelMorehouse Indeed - I was just starting to re-read this and had the same thought. But BMI is still in use, e.g. for Covid prioritisation: https://www.liverpoolecho.co.uk/news/liverpool-news/invited-covid-vaccine-because-nhs-19857990?utm_source=nsday&utm_medium=email&utm_campaign=NSDAY_180221

  • And a more general lesson is that there needn't be a clear winner. This shows up in PR voting as the Condorcet-Dodgson paradox: https://en.wikipedia.org/wiki/Condorcet_paradox
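
The cycle behind that paradox can be sketched in a few lines of Python. The three ballots below are the standard textbook illustration, not data from the course, and `pairwise_winner` is a hypothetical helper name:

```python
from itertools import combinations

# Three illustrative ballots, each ranking candidates from most to least preferred.
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]


def pairwise_winner(x, y, ballots):
    """Return whichever of x and y is ranked higher on a majority of ballots."""
    x_votes = sum(b.index(x) < b.index(y) for b in ballots)
    return x if x_votes > len(ballots) - x_votes else y


# Run every head-to-head contest between two candidates.
results = {(x, y): pairwise_winner(x, y, ballots) for x, y in combinations("ABC", 2)}

# A Condorcet winner would have to win every contest it takes part in.
condorcet_winners = [
    c for c in "ABC" if all(w == c for pair, w in results.items() if c in pair)
]

for (x, y), winner in results.items():
    print(f"{x} vs {y}: {winner} wins")
print("Condorcet winners:", condorcet_winners)
```

A beats B, B beats C, and C beats A, so each candidate loses one head-to-head contest and the list of Condorcet winners is empty.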

  • Absolutely right that there are a lot of assumptions, many of which are driven by availability of (quality) data. The technical phrase would be that we are using insurance data as a proxy for accident data.

  • And I'll be looking at today's comments

  • The fragmentation issue is an interesting one. A lot of health research (e.g. on alcoholism) comes out of the U.S. Veterans Administration because they have essentially perfect tracking of their patients across multiple hospitals etc.

  • Sally B: good points. There is a lot of work in "medical ontologies" (read 'structured vocabularies') to ensure that the same terms are used, but it seems to me, as one who follows ontologies but isn't a doctor, that these are of limited scope. "Cause of death" for example, is one where the principal cause is well-structured, but secondary conditions tend to...

  • Just to correct Kweku: that's 1%, or 0.01. The deeper question he asks is interesting, but I don't have a definite answer. It depends crucially on the question: "does this suit 90% of the population" is very different from "does this suit people independent of neurodiversity".

  • And indeed problems that can be got wrong without machine learning. For example the English 2020 A-levels debacle was essentially done without machine learning, despite the claims of "mutant algorithms".

  • And also Data Exploration will involve some Visualisation: drawing a number of 2D/3D plots and so on.

  • Thanks for the comments on absolute numbers of Twitter users. What about the percentage of the population that are Twitter users?

  • Almost correct. Number 5 is the following: "False" != False.
    The first argument to != is a string, and the second is a Boolean. Booleans interoperate with integers (see the next example), BUT NOT with strings. So this is in fact True, and would be True for any string. Similarly, "Zero" != 0 is True.
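
A minimal sketch of those comparisons, assuming the exercise is in Python 3 (the comment does not say so explicitly); the variable names are only for illustration:

```python
# A string is never equal to a Boolean, so != gives True for any string.
string_vs_bool = "False" != False

# Likewise, a string is never equal to an integer.
string_vs_int = "Zero" != 0

# Booleans do interoperate with integers: bool is a subclass of int.
bool_equals_int = False == 0
bool_sum = True + True

print(string_vs_bool, string_vs_int, bool_equals_int, bool_sum)
```

The first two comparisons are True because the operands have unrelated types, while the last two lines show False behaving as 0 and True as 1.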

  • Thanks - bug reported.

  • You might want to consider whether the user voluntarily provides the data gathered by the Facebook pixel: https://en-gb.facebook.com/business/learn/facebook-ads-pixel

  • You should really give the basis of your calculations, not just the answers.

  • "Fruit flies like a banana" - not the way I throw bananas!

  • I think all these might need more detail. Max's step 3 is good but probably needs a definition of 'heated'.

  • The distinction between correlation and causation is extremely important. One of the better examples is at https://blogs.ams.org/blogonmathblogs/2017/04/10/divorce-and-margarine/

  • Zac's point about representation of minorities is important, as we were also seeing over the coronavirus vaccine.

  • Looks like the broken link has been fixed

  • You might think shopping habits were innocuous, but consider the story (search for 'Minneapolis') in https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html

  • Good comments about bias (often unconscious). Going back to Florence Nightingale, the joy of her polar graphics was that they convinced people that "died in war" was not the same as "died in battle", which was people's unconscious assumption.

  • Indeed, as several have said, it takes insight to ask the right questions

  • All good introductions, and a variety of backgrounds to learn from each other.

  • Personally, I'd add "presentation skills" to the personal qualities.

  • "So much to learn" - yes, it's a broad field. And though all the hype goes on "data scientist", many of them, much of the time, are really data engineers. See https://blog.panoply.io/what-is-the-difference-between-a-data-engineer-and-a-data-scientist for one person's view of the difference.

  • Is that "No" saying "No comments" or a negative response?

  • Lots of useful comments here, and plenty of focus on health and supporting people. Of course, these areas also have significant privacy concerns.