Pleskac, Cesario, and Johnson (2017) examined a thorny issue in the U.S. (Wong, 2016; Chinese Community Reels After Brooklyn NYPD Shooting, 2014; America's police on trial, 2014): whether a police officer's decision to shoot or not to shoot is affected by the race of the target, among many other related factors. This is an important question into which cognitive experiments may be able to provide some insight. The authors analyzed four data sets with the hierarchical Wiener diffusion model, and they kindly provide their data and JAGS code at their OSF project. You may also want to read their article, which describes the findings. An earlier study, also fitting data from a first-person-shooter task but with a fixed-effects DDM, is reported in Correll, Wittenbrink, Crawford, and Sadler (2015).
In this tutorial, I use the data from Pleskac, Cesario, and Johnson (2017) to demonstrate how to use pMCMC to fit a hierarchical Wiener diffusion model to empirical data, although one might account for the decision scenario more realistically by using the urgency-gating model (Cisek, Puskas, & El-Murr, 2009). That said, I am not arguing that the urgency-gating model is better than the HDDM, as the story about the LCA model suggests (Miletic, Turner, Forstmann, & van Maanen, 2017). Only by putting both models to the test can we know whether the urgency-gating model accounts for the shooting decisions better than the HDDM.
When fitting empirical data, I need to rely on other techniques, for example the posterior predictive check (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2014), to examine whether the posterior distributions appropriately reflect the target distributions, because I do not know the true data-generating mechanism.
Set up a model object
First, I defined a race-threshold model, based on the stereotype findings in the classic social psychology literature (Duncan, 1976; Sagar & Schofield, 1990). One question arising from this classic observation is whether the stereotype affects people's decision threshold, their decision rate (i.e., drift rate), or both. As the first model, I set up a race-threshold model, stored in the object model below. Below is a list of the abbreviations for each experimental factor.
- RACE: the stimulus shows an African American (A) or a European American (E)
- S: the stimulus shows one holding a gun (G) or other object (N, not a gun)
- R: the response, "shoot" or "not" (do not shoot)
Below is a list of the abbreviations of the DDM parameters.
- a: the boundary separation
- v: the mean of the drift rate
- z: the mean of the starting point of the diffusion relative to threshold separation
- d: differences in the non-decisional component between upper and lower threshold
- sz: the width of the support of the distribution of z
- sv: the standard deviation of the drift rate
- t0: the mean of the non-decisional component of the response time
- st0: the width of the support of the distribution of t0
a = RACE means that the model assumes the race factor affects the a parameter. In R-style formula syntax, this would look like a ~ RACE.
match.map = list(M = list(G = "shoot", N = "not")),
indicates that a trial with a stimulus of type "G" and a "shoot" response (and, likewise, an "N" stimulus with a "not" response) is coded by BuildModel as a correct response (TRUE); any other pairing is coded as an error (FALSE).
factors = list(S = c("G", "N"), RACE = c("E", "A")),
indicates that the experiment has two factors, S and RACE, each with two levels. The former has a G level (holding a gun) and an N level (holding a non-gun object); the latter has an E level (European American) and an A level (African American).
library(ggdmc)
model <- BuildModel(
p.map = list(a = "RACE", v = "S", z = "1", d = "1", sz = "1", sv = "1",
t0 = "1", st0 = "1"),
match.map = list(M = list(G = "shoot", N = "not")),
factors = list(S = c("G", "N"), RACE = c("E", "A")),
constants = c(st0 = 0, d = 0),
responses = c("shoot", "not"),
type = "rd")
npar <- length(GetPNames(model))
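To double-check the design, I can print the parameter vector that BuildModel produced. The output shown here is what I would expect from this design (an assumption; confirm it in your own session):
GetPNames(model)
## [1] "a.E" "a.A" "v.G" "v.N" "z"   "sz"  "sv"  "t0"
npar
## [1] 8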
I conducted a parameter recovery study to make certain that the model can recover its parameters properly before I fit it to the real data.
## Population distribution
pop.mean <- c(a.E = 1.5, a.A = 2.5, v.G = 3, v.N = 2, z = .5, sz = .3,
sv = 1, t0 = .2)
pop.scale <- c(a.E =.5, a.A = .8, v.G = .5, v.N = .5, z = .1, sz = .1,
sv = .3, t0=.05)
pop.prior <- BuildPrior(
dists = rep("tnorm", npar),
p1 = pop.mean,
p2 = pop.scale,
lower = c(rep(0, 2), rep(-5, 2), rep(0, 4)),
upper = c(rep(5, 2), rep(7, 2), rep(2, 4)))
As usual, I want to visually check if the assumed mechanism is reasonable.
plot(pop.prior)
I loaded the empirical data to inspect the numbers of trials and participants.
load("data/race/study3.rda")
dplyr::tbl_df(study3)
## A tibble: 12,033 x 7
## RT S B CT RACE R s
## <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 0.753 gun blur safe black not 11
## 2 0.851 non blur safe white shoot 11
## 3 0.742 gun clear safe black not 11
## 4 0.636 non clear safe white shoot 11
## 5 0.644 gun blur safe black shoot 11
## 6 0.625 non clear safe black not 11
## 7 0.889 non clear safe white shoot 11
## 8 0.597 gun blur safe black not 11
## 9 0.724 gun clear safe white not 11
## 10 0.656 non blur safe white shoot 11
## ... with 12,023 more rows
Using the internal symbol .N in data.table, I found that the number of trials per design cell in study 3 is very small (6 to 33 trials). This is one more reason I tested only the RACE and S factors, which yield between 50 and 93 trials per cell. Later, I will test another model, which also includes the B and CT factors, to see whether the HDDM still returns good estimates when trial numbers are very small. The column abbreviations are:
- s: subject id
- S: stimulus factor
- B: object factor: blurred or clear view
- CT: context factor: safe or dangerous neighborhood
study3[, .N, .(s, S, B, CT, RACE)]
##
## s S B CT RACE N
## 1: 11 gun blur safe black 22
## 2: 11 non blur safe white 23
## 3: 11 gun clear safe black 19
## 4: 11 non clear safe white 18
## 5: 11 non clear safe black 21
## ---
## 604: 348 gun clear danger black 17
## 605: 348 non clear danger white 28
## 606: 348 gun blur danger white 20
## 607: 348 non clear danger black 16
## 608: 348 gun blur danger black 23
range(study3[, .N, .(s, S, B, CT, RACE)]$N)
## [1] 6 33
study3[, .N, .(s, S, RACE)]
# s S RACE N
# 1: 11 gun black 82
# 2: 11 non white 82
# 3: 11 non black 78
# 4: 11 gun white 78
# 5: 19 gun white 83
# ---
# 148: 344 non black 84
# 149: 348 non white 93
# 150: 348 gun black 81
# 151: 348 gun white 79
# 152: 348 non black 67
range(study3[, .N, .(s, S, RACE)]$N)
## [1] 50 93
nrow(study3[, .N, .(s)])
## 38 subjects
Then I set up the same number of participants and an optimistic number of trials.
nsubject <- 38
ntrial <- 100
dat <- simulate(model, nsim = ntrial, nsub = nsubject, p.prior = pop.prior)
dmi <- BindDataModel(dat, model)
ps <- attr(dat, "parameters")
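As a quick sanity check on the simulated data, I can tabulate trial counts and mean RTs per cell, mirroring the earlier .N check on the empirical data. This is a sketch, assuming simulate() returns a data frame with columns s, S, RACE, R, and RT, matching the empirical data:
library(data.table)
simdat <- data.table(dat)
simdat[, .(n = .N, mrt = round(mean(RT), 3)), .(S, RACE, R)]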
Next I started the sampling. This will take a few hours.
## Set-up Priors --------
p.prior <- BuildPrior(
dists = rep("tnorm", npar),
p1 = pop.mean,
p2 = pop.scale*10,
lower = c(rep(0, 2), rep(-5, 2), rep(0, 4)),
upper = c(rep(10, 2), rep(7, 2), rep(5, 4)))
mu.prior <- BuildPrior(
dists = rep("tnorm", npar),
p1 = pop.mean,
p2 = pop.scale*10,
lower = c(rep(0, 2), rep(-5, 2), rep(0, 4)),
upper = c(rep(10, 2), rep(7, 2), rep(5, 4)))
sigma.prior <- BuildPrior(
dists = rep("beta", npar),
p1 = rep(1, npar),
p2 = rep(1, npar),
upper = rep(2, npar))
names(sigma.prior) <- GetPNames(model)
pp.prior <- list(mu.prior, sigma.prior)
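## Optional: visually check the fitting priors, just as pop.prior was
## checked above (plot() works on prior objects)
plot(p.prior)
plot(mu.prior)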
## Sampling ----------
hsam <- run(StartNewHypersamples(5e2, dmi, p.prior, pp.prior, 16),
pm = .3, hpm = .3)
hsam <- run(RestartHypersamples(5e2, hsam, thin = 32),
pm = .3, hpm = .3) ## 3 hrs
hsam <- run(RestartHypersamples(5e2, hsam, thin = 16),
pm = .3, hpm = .3) ## 90 mins
save(model, pop.prior, nsubject, ntrial, dat, dmi, hsam,
file = "data/hierarchical/shoot-decision-recovery.rda")
As usual, I ran a series of checks to make sure the model estimation is reliable:
- Trace plots of posterior log-likelihood at hyper level
- Trace plots of the hyper parameters
- Trace plots of posterior log-likelihood at the data level
- Trace plots of each DDM parameter for each participant
- Posterior density plots (i.e., marginal posterior distributions) for the hyper parameters
- Posterior density plots of the DDM parameters for each participant
- Brooks and Gelman potential scale reduction factors
plot(hsam, hyper = TRUE) ## 1.
plot(hsam, hyper = TRUE, pll = FALSE) ## 2.
plot(hsam) ## 3.
plot(hsam, pll = FALSE) ## 4.
plot(hsam, hyper = TRUE, pll = FALSE, den = TRUE) ## 5.
plot(hsam, pll = FALSE, den = TRUE) ## 6.
rhat <- hgelman(hsam) ## 7.
## Diagnosing theta for many participants separately
## Diagnosing the hyper parameters, phi
## hyper 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 1.06 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01
## 30 31 32 33 34 35 36 37 38
## 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.01 1.03
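A quick programmatic summary can complement eyeballing the psrf table. This is a sketch, assuming hgelman returns the numeric values printed above:
all(unlist(rhat) < 1.1) ## a common rule of thumb for convergence
## [1] TRUE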
Then, I checked if my race-threshold model can recover the parameters.
hest1 <- summary(hsam, recovery = TRUE, hyper = TRUE, ps = pop.mean, type = 1)
hest2 <- summary(hsam, recovery = TRUE, hyper = TRUE, ps = pop.scale, type = 2)
ests <- summary(hsam, recovery = TRUE, ps = ps)
## Summary each participant separately
## a.E a.A v.G v.N z sz sv t0
## Mean 1.58 2.61 4.12 3.00 0.48 0.32 1.02 0.22
## True 1.69 2.50 2.94 2.08 0.51 0.33 1.02 0.19
## Diff 0.11 -0.11 -1.18 -0.92 0.03 0.00 0.00 -0.02
## Sd 0.49 0.80 0.39 0.59 0.10 0.12 0.39 0.05
## True 0.46 0.82 0.44 0.51 0.10 0.09 0.35 0.05
## Diff -0.04 0.02 0.05 -0.08 -0.01 -0.03 -0.04 0.00
lapply(list(hest1, hest2), round, 2)
## Mean
## a.A a.E sv sz t0 v.G v.N z
## True 2.50 1.50 1.00 0.30 0.20 3.00 2.00 0.50
## 2.5% Estimate 2.32 1.39 0.58 0.09 0.20 3.88 2.75 0.44
## 50% Estimate 2.60 1.57 0.98 0.31 0.22 4.11 3.00 0.48
## 97.5% Estimate 2.89 1.75 1.18 0.39 0.23 4.36 3.25 0.51
## Median-True 0.10 0.07 -0.02 0.01 0.02 1.11 1.00 -0.02
## SD
## a.A a.E sv sz t0 v.G v.N z
## True 0.80 0.50 0.30 0.10 0.05 0.50 0.50 0.10
## 2.5% Estimate 0.67 0.41 0.33 0.10 0.04 0.34 0.50 0.09
## 50% Estimate 0.85 0.52 0.50 0.17 0.05 0.52 0.67 0.11
## 97.5% Estimate 1.14 0.71 0.85 0.32 0.07 0.74 0.90 0.14
## Median-True 0.05 0.02 0.20 0.07 0.00 0.02 0.17 0.01
The recovery study supports the hypothesis that the race-threshold model can recover its parameters reliably, both at the level of individual participants and at the level of the hyper parameters. Therefore, if this model is the true model, we can use it to confirm or reject the hypothesis that a police officer has a higher shooting threshold towards a black target than towards a white target (i.e., a.A > a.E). Of course, this presumes that the hierarchical race-threshold model appropriately reflects the true phenomenon; that is, if the data do reflect this hypothesis, my hierarchical DDM is able to reveal it. I will return to the techniques of model selection in a later tutorial. Note that without highly efficient software, such as ggdmc, this model selection work is very difficult to conduct.
Another strength of ggdmc (as well as DMC), relative to the Python-HDDM (Wiecki, Sofer, & Frank, 2013), is that ggdmc estimates the DDM variabilities at the hyper level¹. I will return to this particular strength of ggdmc in another tutorial.
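With converged chains in hand, one way to quantify the a.A > a.E hypothesis directly is to compute the posterior probability of the difference from the hyper-level location chains. The sketch below assumes DMC-style storage, in which attr(hsam, "hyper")$phi[[1]] holds the location chains as a chains x parameters x iterations array; the exact slot names may differ across ggdmc versions, so verify before use.
hyper <- attr(hsam, "hyper")
phi1 <- hyper$phi[[1]] ## location chains: chains x npar x iterations
pnames <- GetPNames(model)
aA <- as.vector(phi1[, which(pnames == "a.A"), ])
aE <- as.vector(phi1[, which(pnames == "a.E"), ])
mean(aA > aE) ## posterior probability that a.A exceeds a.E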
Now I am ready to fit the empirical data with the race-threshold model.
References
- Pleskac, T. J., Cesario, J., & Johnson, D. J. (2017). How race affects evidence accumulation during the decision to shoot. Psychonomic Bulletin & Review, 1-30. https://doi.org/10.3758/s13423-017-1369-6
- Wong, J. C. (2016, April 18). 'Scapegoated?' The police killing that left Asian Americans angry – and divided. The Guardian.
- Chinese Community Reels After Brooklyn NYPD Shooting. (2014, December 24). NBC News.
- America's police on trial. (2014, December 11). The Economist.
¹ If one wishes to estimate the DDM variabilities at the hyper level in Python-HDDM, one would need to modify the Python-HDDM source code, which is possible but less convenient.