Simulation

This lesson has two sections. First demonstrates a method to simulate one-participant data. The function, simulate, in the ggdmc package creates a data frame based on the parameter vector and the model (both are defined by a user) with nsim observations for each row in model. ps is the true parameter vector.

Second section shows a method to conduct a process model. Specifically, the section conducts a simulation experiment to describe the British tea example on p 37 in Maxwell & Delaney (2004). See Maxwell and Delaney (2004) for an analytic method to calculate the same probabilities. Here I directly model the Britich tea example, approximating the same probabilities. The analytic method is just to use a binominal distribution and the idea of combinations and permutations.

One-participant simulation

This line define one S (stimulus) factor with two levels. So this model defines one two experimental conditions.

factors = list(S = c(“s1”, “s2”)),

Below are the R codes for defining a model and for simulating data from the model.

require(ggdmc)
model <- BuildModel(
   p.map     = list(A = "1", B = "R", t0 = "1", mean_v = "M", sd_v = "M", st0 = "1"),
   match.map = list(M = list(s1 = "r1", s2 = "r2")),
   factors   = list(S = c("s1", "s2")),  ## one factor with two levels, so only
   constants = c(sd_v.false = 1, st0 = 0),
   responses = c("r1", "r2"),
   type      = "norm")
   
p.vector <- c(A = .75, B.r1 = .25, B.r2 = .15, t0 = .2, mean_v.true = 2.5,
               mean_v.false = 1.5, sd_v.true = 0.5)

This just is to simulate only one observation per condition to check the function.

set.seed(123)  ## Set seed to get the same simulation
dat <- simulate(model, nsim = 1, ps = p.vector)
##    S  R        RT
## 1 s1 r1 0.3327392
## 2 s2 r1 0.3797985

The following simulates 500 observations per condition. So in total, there are 1000 observations.

ntrial <- 5e2  ## number of trials per condition
dat <- simulate(model, nsim = ntrial, ps = p.vector)
dplyr::tbl_df(dat)
##  A tibble: 1,000 x 3
##    S     R        RT
##    <fct> <fct> <dbl>
##  1 s1    r2    0.533
##  2 s1    r2    0.494
##  3 s1    r1    0.497
##  4 s1    r2    0.310
##  5 s1    r1    0.462
##  6 s1    r2    0.345
##  7 s1    r2    0.430
##  8 s1    r1    0.384
##  9 s1    r2    0.310
## 10 s1    r1    0.302
# # ... with 990 more rows

Note that model and data are in fact two separate objects. To fit data with certain models, we need to bind them together with BuildDMI. This is useful to facilitate model comparison. That is, a data set can bind with many different models, so we can compare them to see which model may fit the data better so perhaps provide a better account. I used a term, data-model instance (dmi), coined by Matthew Gretton.

dmi <- BuildDMI(dat, model)

We can the codes introduced in the “Descriptive Statistics” to check the correct 10%, 50%, 90% quantile RTs and accuracy, separately, for each level of the stimulus factor.

First I convert the dmi data frame to a data table and then create a new accuracy (logical) column, C.

require(data.table)
d <- data.table(dmi)
d$C <- ifelse(d$S == "s1" & d$R == "r1", TRUE,
       ifelse(d$S == "s2" & d$R == "r2", TRUE,
       ifelse(d$S == "s1" & d$R == "r2", FALSE,
       ifelse(d$S == "s2" & d$R == "r1", FALSE, NA))))

d[, .(q1 = round(quantile(RT, .1), 2),
      q5 = round(quantile(RT, .5), 2),
      q9 = round(quantile(RT, .9), 2)), .(C, S)]
##        C  S   q1   q5   q9
## 1:  TRUE s1 0.32 0.42 0.56
## 2:  TRUE s2 0.28 0.39 0.52
## 3: FALSE s1 0.27 0.37 0.50
## 4: FALSE s2 0.32 0.39 0.54

pro <- d[, .N, .(C, S)]
pro[, NN := sum(N), .(S)]
pro[, value := N/NN]
cp <- pro[C == TRUE] ## correct percentage
##       C  S   N  NN value
## 1: TRUE s1 333 500 0.666
## 2: TRUE s2 391 500 0.782

ep <- pro[C == FALSE] ## error percentage
##        C  S   N  NN value
## 1: FALSE s1 167 500 0.334
## 2: FALSE s2 109 500 0.218

Plot the RT distributions

require(ggplot2)
bw <- .01 ## 10 ms binwidth
p0 <- ggplot(d, aes(RT)) +
      geom_histogram(binwidth = bw, fill = "white", colour = "black") +
      facet_grid(.~C) +
      theme_bw(base_size = 18)
print(p0)

distributions

British tea example

This section shows how we may test an hypothetical question directly via a simulation. Quoted from Maxwell and Delaney (p. 37, 2004)

“A lady declares that by tasting a cup of tea made with milk, she can discriminate whether the milk or the tea infusion was first added to the cup. We will consider the problem of designing an experiment by means of which this assertion can be tested. (Fisher, 1935/1971, p. 11)”

This is essential a binominal decision making. That is, the decision maker (“the lady”) in question will be presented one cup of tea after another and then her task is to decide if the cup is made by milk or tea is added first.

This following function, British.tea implements a process model to describe the above “British tea example”. That is, it conducts a simulation experiment of presenting 8 (i.e., n) cups of tea to a participant. The n equals 8 is decided arbitrarily here.

One additional information (i.e., assumption) is that the participant is told half of the cups are milk first and tea and vice versa. So when simulating the chance only scenario, we need also to take this into consideration. That is, after making a decision for a cup (either MT or TM), the (chance) probability state should adjust accordingly.

##' British tea example
##'
##' The function runs a simulation study to test the British tea example
##'
##' @param the number of observation (cups of tea)
##' @param correct correct sequence: First four cups are tea and milk
##  (TM = 1), the next four cups are milk and then tea (MT = 0).
##' @param verbose print more information
##'
##' @export
British.tea <- function(n = 8, correct = c(1,1,1,1, 0,0,0,0),
                        verbose = TRUE) {

    MT <- n/2 ## 0 indicates milk and then tea (MT)
    TM <- n/2 ## 1 indicates tea and then milk (TM)

    ## Create three containers
    ## 1. x0 is a "n x 2" matrix to store the evolution of chance probabilities
    ## 2. res is a n-element numeric vector
    ## 3. acc is a n logical vector; default value is FALSE
    x0 <- matrix(numeric(n*2), ncol = 2)
    res <- numeric(n)
    acc <- rep(FALSE, n)

    ## Begin the experiment, presenting one cup after another
    for (i in 1:n) {
        if (verbose) cat("Cup", i, "in total", sum(MT, TM), " cup(s)\n")
        
		## store the chance probabilities of MT and TM in probs
        probs <- c(MT / (MT + TM), TM / (MT + TM))
        if (verbose) cat("Chances probabilities of (MT, TM): ", probs, "\n")
        
        x0[i, ] <- probs
        decision <- sample(c(0, 1), 1, prob = probs);
         if (decision == 0) {
             if (verbose) cat("This cup is made by adding milk first\n")
             MT <- MT - 1
             res[i] <- decision
             if (decision == correct[i]) acc[i] <- TRUE
         } else if (decision == 1) {
             if (verbose) cat("This cup is made by adding tea first\n")
             TM <- TM - 1
             res[i] <- decision
             if (decision == correct[i]) acc[i] <- TRUE
         } else cat("Unexpected situation\n")
         
         if (verbose) cat("Current state", i, ": ", c(MT, TM), "\n\n")
    }
    if (verbose) cat("Done\n")
    return(list(x0, res, correct, acc))
}

The simulation starts from the for loop.

for (i in 1:n) {…}

The for loop presents a cup of tea after another until the nth cup. Before the participant make a decision, the chance probabilities of the two possible outcomes are stored in x0 variable.

probs <- c(probMT, probTM)
x0[i, ] <- probs

And then the sample function acts as a chance mechanism to simulate the participant’s (chance) decision making process.

decision <- sample(c(0, 1), 1, replace = TRUE, prob = probs);

The function randomly choose two numbers, c(0, 1), with the probabilities, probs to for the first and second number.

    ## Begin the experiment, presenting one cup after another
    for (i in 1:n) {
        if (verbose) cat("Cup", i, "in total", sum(MT, TM), " cup(s)\n")
        
        probMT <- MT / (MT + TM)   ## chance probability of MT 0
        probTM <- TM / (MT + TM)   ## chance probability of TM 1
        probs  <- c(probMT, probTM)
        
        if (verbose) cat("Chances probabilities of (MT, TM): ", probs, "\n")
        
        x0[i, ] <- probs
        decision <- sample(c(0, 1), 1, prob = probs);
		
         if (decision == 0) {
             if (verbose) cat("This cup is made by adding milk first\n")
             MT <- MT - 1
             res[i] <- decision
             if (decision == correct[i]) acc[i] <- TRUE
         } else if (decision == 1) {
             if (verbose) cat("This cup is made by adding tea first\n")
             TM <- TM - 1
             res[i] <- decision
             if (decision == correct[i]) acc[i] <- TRUE
         } else cat("Unexpected situation\n")
         
         if (verbose) cat("Current state", i, ": ", c(MT, TM), "\n\n")
}
		 

Conduct one experiment and print information

ncup <- 8
cor <- c(rep(1, 4), rep(0, 4)); 
res <- British.tea(ncup, cor, TRUE)

Cup 1 in total 8 cup(s).

Chances probabilities of (MT, TM): 0.5 0.5 This cup is made by adding tea first Current state 1 : 4 3

Cup 2 in total 7 cup(s).

Chances probabilities of (MT, TM): 0.5714286 0.4285714 This cup is made by adding milk first Current state 2 : 3 3

Cup 3 in total 6 cup(s).

Chances probabilities of (MT, TM): 0.5 0.5 This cup is made by adding tea first Current state 3 : 3 2

Cup 4 in total 5 cup(s).

Chances probabilities of (MT, TM): 0.6 0.4 This cup is made by adding milk first Current state 4 : 2 2

Cup 5 in total 4 cup(s).

Chances probabilities of (MT, TM): 0.5 0.5 This cup is made by adding milk first Current state 5 : 1 2

Cup 6 in total 3 cup(s).

Chances probabilities of (MT, TM): 0.3333333 0.6666667 This cup is made by adding tea first Current state 6 : 1 1

Cup 7 in total 2 cup(s).

Chances probabilities of (MT, TM): 0.5 0.5 This cup is made by adding tea first Current state 7 : 1 0

Cup 8 in total 1 cup(s).

Chances probabilities of (MT, TM): 1 0 This cup is made by adding milk first Current state 8 : 0 0

Done

Now I replicate the experiments separately for 512, 4096, 32768, 262144, and 2097152 times and store each result in a list, called exp.

n <- 8^(3:7); 
exp <- vector("list", length(n))

## Use parallel package to conduct experiments
## 100.636 s
library(parallel)
cl <- makeCluster(detectCores())
clusterExport(cl, c("British.tea", "ncup", "cor"))
system.time(
    for (i in 1:length(n)) {
        exp[[i]] <- parSapply(cl, 1:n[i], function(i, ...) {British.tea(ncup, cor, FALSE)} )
    }
)
stopCluster(cl)
## Without using parallel
## for(i in 1:length(n)) {
##     exp[[i]] <- replicate(n[i], British.tea(ncup, cor, FALSE))
## }

res3 <- numeric(length(n)); res3  ## to store the result when 6 corrects
res4 <- numeric(length(n)); res4  ## to store the result when 8 corrects

## Collect results
for(i in 1:length(n)) {
    c3 <- 0
    c4 <- 0
    for(j in 1:n[i]) {
         ## Calculate exactly 4 corrects
         if(all(exp[[i]][,j][[4]])) c4 <- c4 + 1
         if(sum(exp[[i]][,j][[4]]) == 6) c3 <- c3 + 1
    }
    res3[i] <- c3 / n[i]
    res4[i] <- c4 / n[i]
}


round(res3, 4) ## [1] 0.2578 0.2324 0.2306 0.2288 0.2286
round(res4, 4) ## [1] 0.0137 0.0137 0.0140 0.0141 0.0143
require(ggplot2); require(data.table)
## Plot the result
## (How to add differernt horizontal lines on each facet)
DT <- data.table(x= rep(n, 2), y = c(res3, res4), gp = rep(c("6", "8"), each = 5),
                 ref = rep(c(16/70, 1/70), each = 5))

## Dashlines show theoretically probabilities
p0 <- ggplot(DT, aes(x, y)) +
    geom_point(size = 3) +
    geom_hline(aes(yintercept = ref), linetype = "dashed") +
    ## scale_x_log10(name = "N") +
    xlab("N") + ylab("Probability") +
    facet_grid(gp~., scales = "free") +
    theme_bw(base_size = 22) 

print(p0)

tea

Cognitive Models

Modelling Basics

Simulation

One-participant simulation

British tea example