Skip main navigation

Using simulations to ‘measure’ probability

Discover how to measure probability using simulations in RStudio. Read about the Law of Large Numbers and Central Limit Theorem.

The stability of frequencies can be used to find (at least approximately) the probabilities of events of interest using computer simulations.

1. Probability of 23% or more of 6s in rolls of a fair dice

As one such example, we have seen that in some of our simulations of 100 dice rolls, the value 6 may have been observed 23 times or more. What is the probability of such an outcome?

Modifying the function  dice6()  defined above, consider a new, more general function:

Dice23 <- function(n) 
{x <- sample(1:6, size=n, replace=TRUE) 
x6 <- sum(x == 6) 
y23 <- (x6 >= (23/100)*n) 
return(y23)
 } 

Note that this function has an argument n which indicates the length of the sequence of simulated rolls. The (logical) output of  Dice23(n)  is whether or not the percentage of observed 6s in (small n) rolls of a fair dice is at least 23%.

This definition will be useful if we want to explore the chances of seeing at least 23% of 6s in longer simulations of the dice rolls.

We can now replicate the simulation involved in  Dice23()  and calculate the relative frequency of seeing 23 or more 6s in 100 rolls:

n <- 100 # 100 dice rolls in each simulation 
N <- 5000 # 5000 simulated instances 
set.seed(2023) 
fr23 <- sum(replicate(N, Dice23(n))) 
print(fr23/N) 
## 0.0578 

So, the estimated ‘probability’ of seeing 23 or more 6s in 100 rolls is about 5.78%.

The accuracy (and confidence) can be improved by experimenting with more replicates, for example:

N <- 100000 # 100000 simulated instances 
fr23 <- sum(replicate(N, Dice23(n))) 
print(fr23/N) 
## 0.06272 

Here, the estimated probability changes to about 6.27%.

The theoretical probabilities of observing specific frequencies of 6 in repeated rolls of a fair dice is described by the so-called binomial distribution.

Based on this distribution, the probability of observing 23 or more 6s out of 100 rolls is calculated in R as follows:

# Using binomial distribution: 
pbinom(q=22.5, size=100, prob=1/6, lower.tail=FALSE) 
## 0.06305033 

This gives a probability of about 6.31, which is pretty close to the empirical estimate obtained above.

For comparison, let us estimate the probability of seeing at least 23% of 6s in, say, 1,000 rolls:

n <- 1000 
N <- 100000 
fr23 <- sum(replicate(N, Dice23(n))) 
print(fr23/N) 
## 0 

A ‘zero’ estimate means that there were few (if any) instances with 230 or more 6s.

This is confirmed by the binomial calculation, yielding an extremely small probability:

pbinom(q=229.5, size=1000, prob=1/6, lower.tail=FALSE) 
## 1.604014e-07 

2. Probability of runs of 6s in rolls of a fair dice

As our second example, recall that we were asking about how plausible it is to observe three 6s in a row in 100 rolls of a fair dice. Such ‘runs’ of 6s have not been observed in our simulation, but again, we need more conclusive evidence from further simulations.

To explain how to build a suitable R code, let us start with a simpler question about two 6s in a row. Suppose we have simulated a random sequence of 100 dice rolls as before:

set.seed(2023)
x <- sample(1:6, size=100, replace=TRUE) 

To inspect the values of the vector x  one by one, it would be most natural to use so-called loops, which are the coding structures that repeat the same instructions a required number of times or until ordered to stop according to some rule.

However, we can deploy simple ‘tricks’ to circumvent the use of loops.

Consider two auxiliary vectors augmented by adding, say, a zero at the beginning or at the end:

# Two copies of the sample augmented by zeroes: 
x1 <- c(x,0) 
x2 <- c(0,x) 
rbind(head(x1), head(x2)) 
## [,1] [,2] [,3] [,4] [,5] [,6] 
## [1,] 5 1 3 2 4 2 
## [2,] 0 5 1 3 2 4 

We can count the number of runs 66 :

# Count the number of 6-ties: 
run62 <- sum((x1 == 6) & (x2 == 6)) 
# Check if there is at least one 6-tie: 
irun62 <- (run62 > 0) 
print(irun62) 
## TRUE 

To understand this code, note that every pair of consecutive 6s will produce a TRUE contribution to the  run62  summation. Hence,  run62  equals the total number of pairs  66  in the vector x, while  irun62  says if there is at least one such pair.

[We make a convention here that, say, a sequence  666  comprises two consecutive pairs  66, i.e contributes two runs of length 2.]

To use this code for replication, define the corresponding function:

irun62 <- function(n) 
{x <- sample(1:6, size=n, replace=TRUE) 
x1 <- c(x,0) 
x2 <- c(0,x) 
run62 <- sum((x1 == 6) & (x2 == 6)) 
y <- (run62 > 0) 
return(y)
 } 

Now, we are ready to replicate this simulation and calculate the relative frequency of seeing at least one pair 66 :

# Set a required number of replicates: 
n <- 100 # 100 dice rolls in each simulation 
N <- 500 # 500 simulated instances 
# Simulate N instances of 100 dice rolls: 
set.seed(2023)
fr62 <- sum(replicate(N, expr=irun62(n))) 
print(fr62/N) 
## 0.928 

Thus, our estimated probability to observe at least one pair ’66’ in 100 dice rolls is quite high at 0.928.

This probability will get higher or lower if the number of rolls is increased or decreased, respectively.

For instance, with  n=40  rolls, the probability drops to 0.628:

n <- 40 
N <- 500 
set.seed(2023) 
fr62 <- sum(replicate(N, expr=irun62(n))) 
print(fr62/N) 
## 0.628 

Of course, these numerical values are only approximate estimates because they depend on random simulations.

To improve accuracy, we can experiment with more replicates, e.g  N=1000 or more; this is left as an exercise.

The question about three consecutive values 6 is addressed similarly, using three augmented sequences:

# Three copies of the sample augmented by zeroes: 
x1 <- c(x,0,0) 
x2 <- c(0,x,0) 
x3 <- c(0,0,x) 
rbind(head(x1), head(x2), head(x3)) 
## [,1] [,2] [,3] [,4] [,5] [,6] 
## [1,] 5 1 3 2 4 2 
## [2,] 0 5 1 3 2 4 
## [3,] 0 0 5 1 3 2 

The function to be replicated is modified accordingly:

irun63 <- function(n) 
{x <- sample(1:6, size=n, replace=TRUE) 
x1 <- c(x,0,0) 
x2 <- c(0,x,0) 
x3 <- c(0,0,x) 
run63 <- sum((x1 == 6) & (x2 == 6) & (x3 == 6)) 
y <- (run63 > 0) 
return(y)

Now, we replicate this function and calculate the observed relative frequencies of the runs:

n <- 100 
N <- 500 
set.seed(2023) 
fr63 <- sum(replicate(N, expr=irun63(n))) 
print(fr63/N) 
## 0.3 

Thus, the probability of at least one run of 6s of length 3 in 100 rolls of a fair dice is estimated as 0.30 (which may look surprisingly large).

The estimation accuracy improves with more replicates; e.g choosing N=5000 gives the estimated probability

## 0.3174 

This article is from the free online

Statistical Methods

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now