Skip main navigation

R code to simulate random samples

What R code is used to simulate random samples?

To explore what we can expect in 100 rolls of a fair dice, we can set up a computer simulation to reproduce such random rolls.

3.1. R command  sample()

The R command to use is  sample():

# Simulate one instance of 100 rolls of a fair dice: 
set.seed(2023) # set RNG seed for reproducibility 
x <- sample(1:6, size=100, replace=TRUE) 
set.seed(Sys.time()) # to randomise the seed again 

Here, the first argument  1:6  defines the range from which we want to sample (1 to 6);  size=100  specifies the required number of trials (i.e the sample size), and the option  replace=TRUE  means that we sample from the same range, 1 to 6, at each trial (this design is called sampling with replacement).

You may wish to experiment and see what happens if you set  replace=FALSE

Then return it to replace=TRUE to watch the simulated sequence:

print(x) 
## [1] 5 1 3 2 4 2 1 1 5 1 1 5 5 2 3 5 4 5 1 1 6 2 6 6 5 1 2 6 6 1 5 4 6 1 6 4 6 
## [38] 6 2 2 3 4 6 1 5 6 2 4 4 1 5 2 3 1 4 2 5 2 4 4 4 3 1 1 3 6 3 4 6 2 4 2 3 5 
## [75] 6 4 4 1 6 2 2 5 5 3 3 3 1 6 2 6 1 4 5 4 1 1 2 1 4 3 

…and to summarise the output:

table(x) 
## x 
## 1 2 3 4 5 6 ## 21 17 12 18 15 17 

We see that the value 6 was observed 17 times in this simulation. This is about the same as we expected, 100/6=16.667, but it is smaller than our query count of 23 (or more).

Of course, this doesn’t mean that 23 is impossible, as it was just a single simulation.

We also observe in the same simulation that there are no ‘runs’ of three consecutive 6s, although there are three such runs of length two.

As described above, the command sample() picks up a value from the specified range completely at random, with all possible choices being equally likely.

If we want to change this, we can use the option prob=c() to prescribe a vector of weights for obtaining sampled elements. For instance, to simulate a sequence of heads and tails (encoded as  1 and 0, respectively) in tosses of a biased coin, with heads three times more likely than tails, we use the following code:

# Simulate 100 tosses of a biased coin, with weights 1:3 for "0" vs "1" 
set.seed(2023) 
coin <- sample(c(0,1), size=100, replace=TRUE, prob=c(1,3)) 
coin

## [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 
## [38] 1 1 0 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 
## [75] 0 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 0 0 0 1 1 1 0 1

3.2. R command  replicate()

To get a better feel of randomness, we can repeat sampling using the command  replicate()  to obtain several simulated instances of 100 rolls.

To simplify the structure of the R code, we first define a function that simulates an instance (as was done above) and then call this function repeatedly using replicate().

But this time, we make a record of the observed number of 6s directly, without using the command  table():

dice6 <- function() 
{x <- sample(1:6, size=100, replace=TRUE) 
x6 <- sum(x == 6) 
return(x6) 

Note that this function doesn’t take any arguments, but we still need the parentheses when creating it.

The logical query  x==6  is applied component-wise to the vector x, each time returning  TRUE  or  FALSE  depending on whether the component equals 6.

Under the command sum(), the logical values TRUE or FALSE are converted to 1 or 0, respectively, so the result of summation is the total number of components that equal 6 .

Let us check it out using our first simulation:

set.seed(2023) 
dice6() 
## 17 

Note that if we repeat the command  dice6(), it will use a new seed, so the result is likely to be different:

dice6() 
## 12 

Now, we can simulate several random instances by calling this function:

# Set a required number of instances: 
N <- 50 
# Simulate N instances of 100 dice rolls: 
set.seed(2023) 
dice6.N <- replicate(N, expr=dice6()) 
print(dice6.N) 
## [1] 17 12 19 17 17 15 11 19 11 21 16 17 21 15 11 21 27 19 24 23 17 18 16 22 17 
## [26] 11 18 17 14 13 14 18 15 19 18 18 16 13 17 15 22 14 15 18 20 21 17 15 14 23 

By inspection, we see that there have been two instances with 23 counts of 6, and even one with 24 and another with 27 counts.

The mean of this sample of counts is:

mean(dice6.N) 
## 17.16 

The median is:

median(dice6.N) 
## 17 

The frequency distribution of observed counts is visualised using a histogram:

hist(dice6.N, breaks=0.5+c(10:29), xlab="Number of 6s", 
main="Histogram of occurrence of 6 in 100 rollsn 
of a fair dice (50 replicates)") 

The following image is of the histogram that is produced. The 0.5 added in the definition of the breaks ensures that the histogram columns are centred at the integer values.

Next steps

You have now observed how we can use random experiments and Monte Carlo simulations to understand probability. Additionally, we have demonstrated how to simulate random samples using R commands like sample() and replicate(). In the next activity you will examine what will happen if we increase the number of simulated instances of 100 rolls of a fair dice.

This article is from the free online

Statistical Methods

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now