Statistical inference and probability models

How do probability models for statistical inference help you to deal with data variability and uncertainty?

Previously, you learned that drawing conclusions from data is called statistical inference. In this step, you learn how probability models for statistical inference help you to deal with data variability and uncertainty.

1. Inferential statistics

After the experiment is completed and the data are described and summarised, the aim is to draw a conclusion about the population (for example, which teaching method is more effective). The part of statistics concerned with drawing conclusions is called inferential statistics.

To be able to draw a valid conclusion from the data, we must take into account the possibility that the observed results arose merely by chance.

Continuing with the Teaching method example from earlier, suppose that the average score of members of the first group is higher than that of the second. Can you conclude that this increase is due to the teaching method used? Or is it possible that the teaching method was not responsible for the increased scores but rather that the higher scores of the first group were just a chance occurrence?

Example 1: Tossing a coin

A coin has two sides, referred to as ‘heads’ and ‘tails’. If we observe that the coin showed heads 7 times in 10 tosses, this does not necessarily mean that the coin is more likely to show heads than tails in future tosses – 7 heads out of 10 might have happened by mere chance. On the other hand, if the coin had shown heads 70 times out of 100 tosses, we would be quite certain that it was not an ordinary coin! Thus, the strength of the evidence provided by observing 70% heads is greater when more data have been collected.

The difference in the conclusions might be surprising since the observed fraction of heads equals 70% in both cases. This is where the power of statistical reasoning lies.
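To make this reasoning concrete, here is a minimal Python sketch (not part of the original article) that computes the exact probability of seeing at least that many heads when the coin is fair:

```python
# How surprising is each observation if the coin is fair (p = 0.5)?
# We compute the exact binomial tail probability P(X >= k) for
# X ~ Binomial(n, 0.5), using only the standard library.
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(7, 10))    # ~0.17: 7+ heads in 10 tosses is easily due to chance
print(prob_at_least(70, 100))  # ~0.00004: 70+ heads in 100 is very strong evidence
```

A fair coin shows 7 or more heads in 10 tosses about 17% of the time, whereas 70 or more heads in 100 tosses occurs with probability around 0.00004 – the same observed fraction, but a vastly different strength of evidence.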

To be able to draw logical conclusions from data, we usually make some assumptions about the chances of obtaining different data values. Such assumptions are collectively referred to as a probability model for the data.

Sometimes, the nature of the data suggests the form of the probability model that is assumed. Let’s consider some further examples. 

Example 2: Quality control

Suppose that an engineer wants to find out what proportion of computer chips produced by a new method will be defective. The engineer might select a group of these chips, with the resulting data being the number of defective chips in the sample. Provided that the chips are chosen ‘randomly’, it is reasonable to suppose that each one of them is defective with ‘probability’ \(p\), where \(p\) is the unknown proportion of defective chips among the entire batch of chips produced by the new method.

The resulting data can then be used to make inferences about \(p\). Namely, it is reasonable to believe that the unknown probability \(p\) has some bearing on the observed proportion of defective chips in the sample; for example, if \(p\) is small, the observed proportion is likely to be low. Moreover, common sense suggests that by increasing the size of the sample, the observed frequency would represent the unknown probability \(p\) more accurately, even though not exactly.
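A short simulation can make this concrete. In the sketch below, the ‘true’ defect rate \(p = 0.1\) is a made-up value used only so that we can simulate; in a real inspection \(p\) is unknown. Larger samples give observed proportions that cluster more tightly around \(p\):

```python
# Simulated quality control: each chip is independently defective with
# probability p. Observe how the sample proportion approaches p as the
# sample size n grows (p = 0.1 is assumed purely for illustration).
import random

random.seed(42)
p = 0.1  # unknown in practice; assumed here to drive the simulation

for n in (10, 100, 10_000):
    defects = sum(random.random() < p for _ in range(n))
    print(f"n = {n:6d}: observed proportion of defects = {defects / n:.4f}")
```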

In other situations, the appropriate probability model for a given dataset will not be readily apparent. However, careful description and presentation of the data can sometimes enable us to suggest a reasonable model, which we can then try to verify using more data.

Because the basis of statistical inference is the formulation of a probability model to describe the data, an understanding of statistical inference requires some knowledge of probability theory.

In other words, statistical inference starts with the assumption that important aspects of the phenomenon under study can be described in terms of probabilities. Then, you draw conclusions by using data to make inferences about these probabilities.

2. Concept of probability

The concept of the probability of a particular outcome (‘event’) in a random experiment admits various interpretations.

For instance, what does it mean if a geologist says that “there is a 60% chance of oil in a certain region”?

There are two plausible interpretations of this statement: frequentist and subjective.

2.1. Frequentist interpretation

One possible interpretation, called frequentist (from ‘frequencies’), is that from the geologist’s experience, there is oil in about 60% of the regions with similar environmental conditions.

In the frequentist interpretation, the probability \(\mathsf{P}(A)\) of a given outcome \(A\) in an experiment (for example, oil found in the exploration of a region) is considered a property of \(A\) (a ‘measure’ of its chances to occur).

This probability can be approximately evaluated by sequential repetition of the experiment (say, \(n\) times) and noting the number \(n_A\) of those with outcome \(A\). As in Example 2 on quality control, it may be expected that \(n_A/n \approx \mathsf{P}(A)\). Moreover, the longer the sequence of experiments, the better the approximation.

Mathematically, this suggests that, in some sense, 

\(\frac{n_A}{n} \rightarrow \mathsf{P}(A) \quad \text{as } n \rightarrow \infty\)

This is often referred to as the law of large numbers or stability of frequencies.
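The sketch below illustrates this stability of frequencies for a fair coin, taking \(A\) to be the outcome ‘heads’ so that \(\mathsf{P}(A) = 0.5\); the running frequency \(n_A/n\) settles towards \(0.5\) as \(n\) grows:

```python
# Law of large numbers in action: the running frequency n_A / n of heads
# in a sequence of fair-coin tosses approaches P(A) = 0.5 as n increases.
import random

random.seed(0)
heads = 0  # this is n_A, the count of tosses showing heads
for n in range(1, 100_001):
    heads += random.random() < 0.5  # True counts as 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:7d}: n_A / n = {heads / n:.4f}")
```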

This interpretation explains why, despite the popular use of percentages, in statistical theory it is more convenient to represent probability on a scale from \(0\) to \(1\). Thus, \(60\%\) in the geologist’s statement is expressed as ‘probability equals \(0.6\)’.

2.2. Subjective interpretation

Another possible interpretation, called subjective, is that the geologist believes it is more likely than not that the region will contain oil, and ‘\(60\%\)’ is a measure of the geologist’s belief that the region contains oil.

In the subjective interpretation, the probability \(\mathsf{P}(A)\) is considered as a subjective belief concerning the chance that the outcome \(A\) will occur. This approach underlies so-called Bayesian statistics.

In this interpretation, probability expresses someone’s degree of belief, so it cannot be justified by a sequence of repeated experiments. However, such beliefs can be updated and, if necessary, amended as data are collected.

We can further examine this subjective interpretation with an example of a Bayesian urn.

Example 3: Bayesian urn 

Suppose there is a box (‘urn’) containing two balls, each one either black or white, but their actual colours are unknown to us.

Assume that our subjective belief is that there is a \(50\%\) chance that the balls are of different colours, a \(25\%\) chance that both balls are black, and a remaining \(25\%\) chance that both are white:

\(\mathsf{P}(BW) = \tfrac{1}{2}, \quad \mathsf{P}(BB) = \tfrac{1}{4}, \quad \mathsf{P}(WW) = \tfrac{1}{4}.\)

We draw a ball at random, note its colour and return it to the urn (‘sampling with replacement’). Suppose we observed only black balls in 10 draws. How should we update our beliefs about the urn content?

The hypothesis that both balls are white cannot be true. Moreover, we have very strong evidence in favour of both balls being black, for otherwise the non-appearance of a white ball in 10 draws would be very unlikely.

Indeed, a Bayesian calculation reveals that our beliefs should be updated to about a 99.8% chance of two black balls and just 0.2% for mixed contents (one black and one white).
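For readers who would like to verify these figures, the Bayes update can be written out in a few lines of Python. The prior probabilities come from the stated beliefs, and the likelihood of drawing 10 black balls with replacement is \(1\) if both balls are black, \((1/2)^{10}\) if the colours are mixed, and \(0\) if both are white:

```python
# Bayes update for the urn example: posterior = prior * likelihood,
# normalised so the posteriors sum to 1.
priors = {"BB": 0.25, "BW": 0.50, "WW": 0.25}
likelihoods = {"BB": 1.0, "BW": 0.5**10, "WW": 0.0}  # P(10 black draws | hypothesis)

unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalised.values())
posteriors = {h: v / total for h, v in unnormalised.items()}

for h, prob in posteriors.items():
    print(f"P({h} | 10 black draws) = {prob:.4f}")
# BB: 0.9981, BW: 0.0019, WW: 0.0000 -- the ~99.8% and ~0.2% quoted above
```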

3. Random variables and their probability distribution

The concept of probability as a tool to quantify uncertainty about a random outcome of an experiment can be extended to cases with a larger variety of possible outcomes.

Example 4: Snack bars

Suppose snack bars of a certain brand should each contain, on average, 5 grams of chocolate. To check the quality, we buy 100 bars and determine the amount of chocolate in each of them. Due to variation during production, the measurements will vary, so we get a sample of 100 observed values.

Because such values would change almost continuously, it does not make sense to talk about probabilities of any specific values.

Instead, it is more informative to look at the distribution of probabilities across the range, for example, the probabilities for the amount of chocolate to be in various intervals.

The observed values in our sample would represent the chances of possible outcomes. If the bars are produced properly, the sample values should lie close to the target amount of 5 grams, with values further away being less likely and, therefore, observed more rarely.

Thus, it is helpful to imagine that our observations are produced according to some background random variable \(X\), with a certain probability distribution describing the chances of various values of the chocolate amount in a randomly chosen bar.

The sample of size \(n = 100\) will then be a collection of random values \(X_1, X_2, \dots, X_{100}\) representing the parent random variable \(X\).
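As a small illustration, we can mimic this setup by drawing a sample of 100 values from an assumed parent distribution. The normal distribution with mean 5 g and standard deviation 0.2 g used below is purely a modelling assumption for the sketch; the article does not specify the actual form of \(X\):

```python
# Simulating X_1, ..., X_100 from an assumed parent distribution for the
# chocolate amount (normal with mean 5 g, sd 0.2 g -- an illustrative choice).
import random

random.seed(1)
sample = [random.gauss(5.0, 0.2) for _ in range(100)]

mean = sum(sample) / len(sample)
print(f"sample mean = {mean:.3f} g (target: 5 g)")
print(f"smallest = {min(sample):.3f} g, largest = {max(sample):.3f} g")
```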
