Using simulations to ‘measure’ probability
The stability of frequencies can be used to find (at least approximately) the probabilities of events of interest using computer simulations.
1. Probability of 23% or more of 6s in rolls of a fair dice
As one such example, we have seen that in some of our simulations of 100 dice rolls, the value 6 may have been observed 23 times or more. What is the probability of such an outcome?
Modifying the function dice6()
defined above, consider a new, more general function:
Dice23 <- function(n)
{x <- sample(1:6, size=n, replace=TRUE)
x6 <- sum(x == 6)
y23 <- (x6 >= (23/100)*n)
return(y23)
}
Note that this function has an argument n
which indicates the length of the sequence of simulated rolls. The (logical) output of Dice23(n)
is whether or not the percentage of observed 6s in (small n) rolls of a fair dice is at least 23%.
This definition will be useful if we want to explore the chances of seeing at least 23% of 6s in longer simulations of the dice rolls.
We can now replicate the simulation involved in Dice23()
and calculate the relative frequency of seeing 23 or more 6s in 100 rolls:
n <- 100 # 100 dice rolls in each simulation
N <- 5000 # 5000 simulated instances
set.seed(2023)
fr23 <- sum(replicate(N, Dice23(n)))
print(fr23/N)
## 0.0578
So, the estimated ‘probability’ of seeing 23 or more 6s in 100 rolls is about 5.78%.
The accuracy (and confidence) can be improved by experimenting with more replicates, for example:
N <- 100000 # 100000 simulated instances
fr23 <- sum(replicate(N, Dice23(n)))
print(fr23/N)
## 0.06272
Here, the estimated probability changes to about 6.27%.
The theoretical probabilities of observing specific frequencies of 6 in repeated rolls of a fair dice is described by the so-called binomial distribution.
Based on this distribution, the probability of observing 23 or more 6s out of 100 rolls is calculated in R as follows:
# Using binomial distribution:
pbinom(q=22.5, size=100, prob=1/6, lower.tail=FALSE)
## 0.06305033
This gives a probability of about 6.31, which is pretty close to the empirical estimate obtained above.
For comparison, let us estimate the probability of seeing at least 23% of 6s in, say, 1,000 rolls:
n <- 1000
N <- 100000
fr23 <- sum(replicate(N, Dice23(n)))
print(fr23/N)
## 0
A ‘zero’ estimate means that there were few (if any) instances with 230 or more 6s.
This is confirmed by the binomial calculation, yielding an extremely small probability:
pbinom(q=229.5, size=1000, prob=1/6, lower.tail=FALSE)
## 1.604014e-07
2. Probability of runs of 6s in rolls of a fair dice
As our second example, recall that we were asking about how plausible it is to observe three 6s in a row in 100 rolls of a fair dice. Such ‘runs’ of 6s have not been observed in our simulation, but again, we need more conclusive evidence from further simulations.
To explain how to build a suitable R code, let us start with a simpler question about two 6s in a row. Suppose we have simulated a random sequence of 100 dice rolls as before:
set.seed(2023)
x <- sample(1:6, size=100, replace=TRUE)
To inspect the values of the vector x
one by one, it would be most natural to use so-called loops, which are the coding structures that repeat the same instructions a required number of times or until ordered to stop according to some rule.
However, we can deploy simple ‘tricks’ to circumvent the use of loops.
Consider two auxiliary vectors augmented by adding, say, a zero at the beginning or at the end:
# Two copies of the sample augmented by zeroes:
x1 <- c(x,0)
x2 <- c(0,x)
rbind(head(x1), head(x2))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 5 1 3 2 4 2
## [2,] 0 5 1 3 2 4
We can count the number of runs 66
:
# Count the number of 6-ties:
run62 <- sum((x1 == 6) & (x2 == 6))
# Check if there is at least one 6-tie:
irun62 <- (run62 > 0)
print(irun62)
## TRUE
To understand this code, note that every pair of consecutive 6s will produce a TRUE contribution to the run62
summation. Hence, run62
equals the total number of pairs 66 in the vector x
, while irun62
says if there is at least one such pair.
[We make a convention here that, say, a sequence 666
comprises two consecutive pairs 66
, i.e contributes two runs of length 2.]
To use this code for replication, define the corresponding function:
irun62 <- function(n)
{x <- sample(1:6, size=n, replace=TRUE)
x1 <- c(x,0)
x2 <- c(0,x)
run62 <- sum((x1 == 6) & (x2 == 6))
y <- (run62 > 0)
return(y)
}
Now, we are ready to replicate this simulation and calculate the relative frequency of seeing at least one pair 66
:
# Set a required number of replicates:
n <- 100 # 100 dice rolls in each simulation
N <- 500 # 500 simulated instances
# Simulate N instances of 100 dice rolls:
set.seed(2023)
fr62 <- sum(replicate(N, expr=irun62(n)))
print(fr62/N)
## 0.928
Thus, our estimated probability to observe at least one pair ’66’ in 100 dice rolls is quite high at 0.928.
This probability will get higher or lower if the number of rolls is increased or decreased, respectively.
For instance, with n=40
rolls, the probability drops to 0.628:
n <- 40
N <- 500
set.seed(2023)
fr62 <- sum(replicate(N, expr=irun62(n)))
print(fr62/N)
## 0.628
Of course, these numerical values are only approximate estimates because they depend on random simulations.
To improve accuracy, we can experiment with more replicates, e.g N=1000
or more; this is left as an exercise.
The question about three consecutive values 6 is addressed similarly, using three augmented sequences:
# Three copies of the sample augmented by zeroes:
x1 <- c(x,0,0)
x2 <- c(0,x,0)
x3 <- c(0,0,x)
rbind(head(x1), head(x2), head(x3))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 5 1 3 2 4 2
## [2,] 0 5 1 3 2 4
## [3,] 0 0 5 1 3 2
The function to be replicated is modified accordingly:
irun63 <- function(n)
{x <- sample(1:6, size=n, replace=TRUE)
x1 <- c(x,0,0)
x2 <- c(0,x,0)
x3 <- c(0,0,x)
run63 <- sum((x1 == 6) & (x2 == 6) & (x3 == 6))
y <- (run63 > 0)
return(y)
}
Now, we replicate this function and calculate the observed relative frequencies of the runs:
n <- 100
N <- 500
set.seed(2023)
fr63 <- sum(replicate(N, expr=irun63(n)))
print(fr63/N)
## 0.3
Thus, the probability of at least one run of 6s of length 3 in 100 rolls of a fair dice is estimated as 0.30 (which may look surprisingly large).
The estimation accuracy improves with more replicates; e.g choosing N=5000
gives the estimated probability
## 0.3174
Reach your personal and professional goals
Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.
Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.
Register to receive updates
-
Create an account to receive our newsletter, course recommendations and promotions.
Register for free