Cognitive Models logo Cognitive Models

In analyzing response time data with two choices, researchers would usually examine average response times (RTs) and response proportions. Depending on the model a researcher wishes to presume, the response proportions can simply be correct and error rates, or, if using SDT model, hits, correct rejections, false alarms and misses. Here I used Pleskac, Cesario, and Johnson’s (2017) data in the first-person shooter task (FPST; Correll et al., 2002) to illustrate one method to calculate average RTs and response proportions across participants.

Firstly, I use fread function to load the data file, which is in csv format. The data set provides clear and good column names. That is, the column names have informed the coding method. I simply just followed column names to code the factor levels and later checked against the data in the paper. Of course, I had also checked against the figures of behaviour analyses in the paper to make sure I did correctly identify the dependent and independent variables, which are listed in the following.

library(data.table);
study3 <- fread("data/race/Study3/original/Study3TrialData.csv")
study3$S    <- factor(ifelse(study3$Object0NG1G == 0, "non", "gun"))
study3$BC   <- factor(ifelse(study3$Blurry0Clear1Blur == 0, "clear", "blur"))
study3$CT   <- factor(ifelse(study3$Context1Safe2Danger == 0, "safe", "danger"))
study3$RACE <- factor(ifelse(study3$Race012B == 0, "white", "black"))
study3$R    <- factor(ifelse(study3$Resp0NS1Sh == 0, "not", "shoot"))
study3$RT   <- study3$RT / 1e3
study3$s    <- factor(study3$Subject)

factor is a R function converting variables, numeric or character, to categorical (i.e., nominal) variable. After reorganizing the columns, I removed the replicated columns by assigning them as NULL. This is a data.table specific syntax.

study3[, c("Subject", "NewSubject", "conditionRaceDangerBlurbject",
"conditionRaceDangerBlur",  "Object0NG1G", "Blurry0Clear1Blur",
"Context1Safe2Danger", "Race012B", "Resp0NS1Sh", "DiffusionRT") := NULL]

There are NaN response times in this data set. One method is to replace them with random RTs drawn from uniform distribution, with the range of valid RTs. This was achieved by using the data.table internal function .I. I firstly found the (row) index of these NaN RTs, and then replaced them. Of course, we can simply just remove them.

## save organized data to a temporary object, so I can roll back.
dtmp <- data.table(study3)
minmax <- range(study3$RT, na.rm = TRUE); minmax
idx <- dtmp[, .I[is.nan(RT)]]; idx
dtmp[idx, RT := runif(1, minmax[1], minmax[2])]
d <- dtmp

## scoring a correctness column
d$C <- ifelse(d$S == "gun" & d$R == "shoot", TRUE,
       ifelse(d$S == "non" & d$R == "not",   TRUE,
       ifelse(d$S == "gun" & d$R == "not",   FALSE,
       ifelse(d$S == "non" & d$R == "shoot", FALSE, NA))))

Now the data table looks like:

dplyr::tbl_df(d)
## # A tibble: 12,033 x 8
##       RT S     BC    CT    RACE  R     s     C
##    <dbl> <fct> <fct> <fct> <fct> <fct> <fct> <lgl>
##  1 0.753 gun   blur  safe  black shoot 11    TRUE
##  2 0.851 non   blur  safe  white not   11    TRUE
##  3 0.742 gun   clear safe  black shoot 11    TRUE
##  4 0.636 non   clear safe  white not   11    TRUE
##  5 0.644 gun   blur  safe  black not   11    FALSE
##  6 0.625 non   clear safe  black shoot 11    FALSE
##  7 0.889 non   clear safe  white not   11    TRUE
##  8 0.597 gun   blur  safe  black shoot 11    TRUE
##  9 0.724 gun   clear safe  white shoot 11    TRUE
## 10 0.656 non   blur  safe  white not   11    TRUE
## # ... with 12,023 more rows

Censoring RT data

Censoring outliers is a difficult task (Ratcliff, 1993). Here I illustrated one way to do it via Heathcote’s rc, a collection of very useful R functions and my summarise, also a collection of useful R functions. First, I used R’s source function to load this large collection of R functions.

source("~/rc/data.analysis.R")
source("~/rc/utils.R")
source("~/functions/summarise.R")

## Scoring ------------
se3 <- score.rc(data.frame(d), S = "s", R = "R", RT = "RT", SC = "C",
                F = c("BC", "CT", "RACE", "S"))
## Spreading 11851 of 12033 RTs that are ties given preceision 0.001 .
##    497 have ties out of 679 unique values
##
## Added the following manifest design
##      S  RACE     CT    BC     R rcell
## 1  gun black danger  blur   not     1
## 2  gun black danger  blur shoot     1
## 3  gun black danger clear   not     2
## ...
## 30 non white   safe  blur shoot    15
## 31 non white   safe clear   not    16
## 32 non white   safe clear shoot    16

score.rc function takes first argument data.frame, which is the data as seen previously. Because I stored it as data.table, I needed to convert it back to data.frame. Just a note. Although data.table may accommodate many functions operating in data.frame, there are some operations in rc functions, which cannot work in data.table.

Note the second argument, uppercase S, which takes the subject column, instead of the column of stimulus factor. The R and RT arguments take response column and the response time column. SC takes the column of score correctness, which is purely my guess. I cannot be sure why it is called SC. The last useful argument is F, which takes user-defined factors, including the stimulus factor.

score.rc detects the identical (ties) RTs and spread them into finer scale. For example, in this data set, there are 31 trials with 60y ms.

table(d$RT)
##             0.01            0.015            0.019            0.025 
##                1                1                1                1 
##            0.027             0.03            0.035            0.036 
##                1                1                1                1 
##    	       ...
##            0.386            0.387            0.388            0.389 
##                7                3                1                3 
##             0.39            0.391            0.392            0.393 
##                1                4                3                3 
##            0.394            0.395            0.396            0.397 
##                2                6                7                3 
##     		   ...
##            0.606            0.607            0.608            0.609 
##               50               31               40               39 
##             0.61            0.611            0.612            0.613 
##               42               49               31               41 
##            0.614            0.615            0.616            0.617 
##               49               55               42               39 
##            ...

If I printed them all out, the data set after scoring spreads these RT to a

se3[se3$RT >= .607 & se3$RT < .608,]

      cell rcell   s    BC     CT  RACE   S     C     R        RT
539     31    16  19 clear   safe white non  TRUE   not 0.6070000
838     31    16  24 clear   safe white non  TRUE   not 0.6075488
1909    26    13  39  blur danger white non FALSE shoot 0.6070313
2108     6     3  44  blur   safe black gun  TRUE shoot 0.6072812
2321     4     2  50 clear danger black gun  TRUE shoot 0.6079634
2354    19    10  50 clear danger black non  TRUE   not 0.6071250
2939    15     8  62 clear   safe white gun FALSE   not 0.6072188
3083    32    16  62 clear   safe white non FALSE shoot 0.6077683
3319    12     6  72 clear danger white gun  TRUE shoot 0.6075244
3762    19    10  82 clear danger black non  TRUE   not 0.6079146
4802    23    12 120 clear   safe black non  TRUE   not 0.6076951
5211    17     9 129  blur danger black non  TRUE   not 0.6075976
5391    19    10 129 clear danger black non  TRUE   not 0.6074063
6095     1     1 184  blur danger black gun FALSE   not 0.6073125
7057    10     5 201  blur danger white gun  TRUE shoot 0.6077195
7201    17     9 201  blur danger black non  TRUE   not 0.6078659
7288     6     3 214  blur   safe black gun  TRUE shoot 0.6073438
7345    25    13 214  blur danger white non  TRUE   not 0.6071875
7747    23    12 218 clear   safe black non  TRUE   not 0.6077439
8259     8     4 231 clear   safe black gun  TRUE shoot 0.6076707
8305    19    10 231 clear danger black non  TRUE   not 0.6079390
8671    12     6 235 clear danger white gun  TRUE shoot 0.6071563
8923    19    10 247 clear danger black non  TRUE   not 0.6073750
9002    31    16 247 clear   safe white non  TRUE   not 0.6074375
9045    31    16 247 clear   safe white non  TRUE   not 0.6070625
9509    31    16 286 clear   safe white non  TRUE   not 0.6076220
9551    31    16 286 clear   safe white non  TRUE   not 0.6078415
9692     6     3 286  blur   safe black gun  TRUE shoot 0.6075732
9903    27    14 288 clear danger white non  TRUE   not 0.6079878
10432   25    13 307  blur danger white non  TRUE   not 0.6077927
10534   10     5 308  blur danger white gun  TRUE shoot 0.6078902
10666   21    11 308  blur   safe black non  TRUE   not 0.6074688
10979   16     8 325 clear   safe white gun  TRUE shoot 0.6078171
11201    9     5 326  blur danger white gun FALSE   not 0.6070937
11642   19    10 344 clear danger black non  TRUE   not 0.6072500
11866   25    13 348  blur danger white non  TRUE   not 0.6076463

The original data set is to the millisecond scale.

d[RT == .607]

       RT   S    BC     CT  RACE     R   s     C
 1: 0.607 gun  blur danger black   not  11 FALSE
 2: 0.607 non  blur danger black   not  19  TRUE
 3: 0.607 non clear   safe white   not  19  TRUE
 4: 0.607 gun  blur   safe black shoot  24  TRUE
 5: 0.607 non clear danger white   not  28  TRUE
 6: 0.607 non clear   safe white   not  37  TRUE
 7: 0.607 non  blur danger white shoot  39 FALSE
 8: 0.607 non  blur danger black   not  44  TRUE
 9: 0.607 gun  blur   safe black shoot  44  TRUE
10: 0.607 non clear danger black   not  50  TRUE
11: 0.607 gun clear danger white shoot  50  TRUE
12: 0.607 gun clear   safe white   not  62 FALSE
13: 0.607 non clear danger black   not 129  TRUE
14: 0.607 gun  blur danger black   not 184 FALSE
15: 0.607 non clear   safe white   not 184  TRUE
16: 0.607 non  blur danger black   not 184  TRUE
17: 0.607 gun  blur   safe black shoot 214  TRUE
18: 0.607 non  blur danger white   not 214  TRUE
19: 0.607 non clear   safe white   not 218  TRUE
20: 0.607 gun clear danger white shoot 235  TRUE
21: 0.607 non clear danger black   not 247  TRUE
22: 0.607 non clear   safe white   not 247  TRUE
23: 0.607 non clear   safe white   not 247  TRUE
24: 0.607 gun clear danger white shoot 288  TRUE
25: 0.607 non  blur   safe black   not 308  TRUE
26: 0.607 non clear   safe white   not 325  TRUE
27: 0.607 gun  blur   safe white shoot 326  TRUE
28: 0.607 gun  blur danger black shoot 326  TRUE
29: 0.607 gun  blur danger white   not 326 FALSE
30: 0.607 non  blur danger black   not 344  TRUE
31: 0.607 non clear danger black   not 344  TRUE
       RT   S    BC     CT  RACE     R   s     C

The scored data set, se3, will also attach two new columns, cell and rcell, indicating the experimental design. In this example, it has 32 cell, so cell is from 1 to 32 and rcell is from 1 to 16, because this is a two-choice experiment, response, shoot and not to shoot in cell 1 and cell 2, belong to the same experimental design, but with different response types.

## 1  gun black danger  blur   not     1
## 2  gun black danger  blur shoot     1
## 3  gun black danger clear   not     2
## 4  gun black danger clear shoot     2

A usual practice is to take 3 times the standard deviation, respectively in each participants. This can be achieved via tapply function. If the data set is large, one can use data.table to achieve the same aim, which I will demonstrate in a later tutorial.

sd3 <- tapply(se3$RT, se3$s, mean) + tapply(se3$RT, se3$s, sd) * 3;

A second useful function in rc collection is the make.rc, which does the censoring work. It takes a first argument of the scored data set, from score.rc and a second argument, correct.name, indicating the character string for the correctness column, and the last two arguments, for the lower and upper bounds of the censoring.

me3 <- make.rc(se3, correct.name = "C", minrt = .2, maxrt = sd3)

How to average across trials

acc0 <- summarySE(d, mv = "error", gvs = c("s", "BC", "CT", "RACE", "S"))
mrt0 <- summarySE(d[C == TRUE], mv = "RT",    gvs = c("s", "BC", "CT", "RACE", "S"))
## Within se average across subjects for pc and nt
figA <- summarySEwithin(acc0, wvs = c("BC","CT", "RACE", "S"), mv = "error")
figB <- summarySEwithin(mrt0, wvs = c("BC","CT", "RACE", "S"), mv = "RT")
names(figA) <- c("BC", "CT", "RACE", "S", "N", "y", "sd", "se", "ci")
names(figB) <- c("BC", "CT", "RACE", "S", "N", "y", "sd", "se", "ci")

Reference

Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological bulletin, 114(3), 510.