Count Distributions

Next, I want to talk about a special case: a variable that records a count of something. This is really common in data analysis. Often, the variables we are analysing represent counts of things, like the number of products sold or the number of apps downloaded. When you have data like that, the distribution tends to have a characteristic shape, and I want to emphasise two of its features. First, it’s bounded at zero. I can’t have a negative count of something; I can’t have a negative number of downloads, for instance. So you’ll notice the lowest score is zero. Often zero is a very common score, or at least the scores tend to cluster near zero.
In this case, I don’t have many zeros, but often you do. The second feature is positive skew. I can’t have negative downloads or a negative count, but there is usually no upper limit on how many downloads or things I can have. So there will be a small number of people with very large counts, and a distribution like this is often very highly skewed. This is just a feature of count distributions, but it’s something to keep in mind. When we talk about summary statistics later, this shape will be a warning to take care when you’re analysing count data.
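To make these two features concrete, here is a minimal Python sketch using a small, hypothetical set of download counts (invented for illustration, not taken from the course data). It checks that the minimum is zero and that the data are positively skewed: the mean sits above the median, and the skewness is positive. The `sample_skewness` helper is our own; it uses the adjusted sample skewness formula that Excel’s `SKEW` function is documented to use. In Excel itself, `MIN`, `AVERAGE`, `MEDIAN` and `SKEW` give the same summaries.

```python
import statistics

# Hypothetical download counts (illustrative only, not from the course data):
# no negatives, many small values, and one very large count.
downloads = [0, 0, 1, 1, 1, 2, 2, 3, 4, 5, 8, 15, 40]

def sample_skewness(xs):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(xs)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

# Bounded at zero: the lowest possible count is 0.
print("minimum:", min(downloads))
# Positive skew: a few large counts pull the mean above the median.
print("mean:", round(statistics.mean(downloads), 2))
print("median:", statistics.median(downloads))
print("skewness:", round(sample_skewness(downloads), 2))
```

The gap between the mean and the median is the practical warning: with skewed count data, the mean alone can give a misleading picture of a "typical" value.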

Lesson 3: Non-Normal Distributions

In this lesson, we’ll take a brief look at some other distributions of data. We’ll also explore the differences between “discrete” and “continuous” data.

Lab: Count Data

Data come in different flavors. Two of the major types of numerical data are discrete and continuous. In this lab, we’ll examine the same coffee data from the previous labs and identify some of their characteristics.
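As a rough illustration of the discrete/continuous distinction, the hypothetical helper `looks_like_count` below checks whether every value in a column is a non-negative whole number, which is what a count variable has to look like. This is a heuristic of our own, not part of the lab: whether a variable is discrete ultimately depends on what it measures, and a continuous measurement that has been rounded to whole numbers would still pass the check. The example columns are invented and are not taken from the coffee data set.

```python
from numbers import Real

def looks_like_count(values):
    """Hypothetical heuristic: True if every value is a non-negative whole number."""
    return all(
        isinstance(v, Real) and v >= 0 and float(v).is_integer()
        for v in values
    )

# Invented example columns (not from the lab's coffee data)
cups_sold = [12, 0, 7, 3, 25]        # counts: discrete
temperature_f = [71.5, 68.2, 70.0]   # measurements: continuous

print(looks_like_count(cups_sold))      # True
print(looks_like_count(temperature_f))  # False
```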

The lab instructions can be downloaded as a PDF file here.

The data set for this lab can be viewed here. From the link, copy and paste all the data into a new worksheet in Excel Online.

Essential Mathematics for Data Analysis in Microsoft Excel

