Chapter 11 Mediation analysis

(Work in progress!)

See Tingley et at. (2014).

11.1 Simulated Example

11.1.1 Make up some data

In this example, let’s make up some data for an experiment where people are randomised to one of two groups. The intervention group receives £2000 and the control group receives £500. The hypothesis is that one month later, the intervention group will have more savings than the control group; i.e., they won’t just spend all of the money they have received.

set.seed(42)
study_n <- 200
dat <- tibble(savings_start = rnorm(study_n, 5000, 2000) %>%
                ifelse(. < 0, 0, .),
              treat_group    = rbinom(study_n, size = 1, p = .5),
              money_received = ifelse(treat_group == 1,
                                      2000,
                                      500) + rnorm(study_n, 0, 5),
              savings_end    = -500 + savings_start + money_received +
                                 rnorm(study_n, 0, 600) %>%
                ifelse(. < 0, 0, .))
                            
dat <- dat %>%
  mutate(treat_group = dplyr::recode(treat_group,
                                       `1` = "Intervention",
                                       `0` = "Control")) %>%
  mutate(treat_group = as.factor(treat_group))
dat
## # A tibble: 200 x 4
##    savings_start treat_group  money_received savings_end
##            <dbl> <fct>                 <dbl>       <dbl>
##  1         7742. Control                500.       8359.
##  2         3871. Intervention          2004.       5923.
##  3         5726. Intervention          2000.       7226.
##  4         6266. Control                504.       6351.
##  5         5809. Intervention          1999.       7308.
##  6         4788. Control                500.       4787.
##  7         8023. Intervention          2002.       9525.
##  8         4811. Control                505.       4816.
##  9         9037. Control                494.       9031.
## 10         4875. Control                500.       5376.
## # ... with 190 more rows

Here’s a graph:

dat %>%
  group_by(treat_group) %>%
  summarise(savings_Pre  = mean(savings_start),
            savings_Post = mean(savings_end)) %>%
  pivot_longer(cols = starts_with("savings"),
               names_to = "time",
               values_to = "savings",
               names_prefix = "savings_") %>%
  mutate(time = factor(time)) %>%
  mutate(time = relevel(time, "Pre")) %>%
  ggplot(aes(x = time, y = savings, fill = treat_group)) +
    geom_col(position = "dodge") +
    labs(x = "Time",
         y = "Savings",
         fill = "Group")
## `summarise()` ungrouping output (override with `.groups` argument)

11.1.2 Analyse it

Fit the two models:

mod_med <- lm(money_received ~ savings_start + treat_group, data = dat)
mod_out <- lm(savings_end ~ savings_start + treat_group + money_received, data = dat)

Have a look:

options(scipen = 3) # bias away from scientific notation (default 0)
summary(mod_med)
## 
## Call:
## lm(formula = money_received ~ savings_start + treat_group, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.6671  -3.0849  -0.1591   3.3646  14.6775 
## 
## Coefficients:
##                              Estimate    Std. Error  t value Pr(>|t|)    
## (Intercept)              499.79497950    0.97028728  515.100   <2e-16 ***
## savings_start             -0.00005186    0.00017527   -0.296    0.768    
## treat_groupIntervention 1500.56087162    0.67935371 2208.806   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.772 on 197 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.442e+06 on 2 and 197 DF,  p-value: < 2.2e-16

The coefficients for the model regressing savings on treatment group and money received wipe each other out…

summary(mod_out)
## 
## Call:
## lm(formula = savings_end ~ savings_start + treat_group + money_received, 
##     data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -265.4 -249.8 -199.9  180.2 1742.1 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              170.10314 2532.28559   0.067    0.947    
## savings_start              0.99974    0.01246  80.223   <2e-16 ***
## treat_groupIntervention 1185.99116 7600.14736   0.156    0.876    
## money_received             0.17450    5.06477   0.034    0.973    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 339.2 on 196 degrees of freedom
## Multiple R-squared:  0.9745, Adjusted R-squared:  0.9741 
## F-statistic:  2500 on 3 and 196 DF,  p-value: < 2.2e-16

Check out the VIFs:

vif(mod_out)
##  savings_start    treat_group money_received 
##       1.001488   24792.453868   24792.238825

HUGE and fine (“Multicollinearity is to be expected in a mediational analysis and it cannot be avoided,” explains David A. Kenny).

Test the mediation:

library(mediation)
test_med <- mediate(model.m = mod_med,
                    model.y = mod_out,
                    treat = "treat_group",
                    mediator = "money_received",
                    robustSE = TRUE,
                    sims = 100,
                    control.value = "Control",
                    treat.value = "Intervention")
summary(test_med)
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
##                  Estimate 95% CI Lower 95% CI Upper p-value    
## ACME              812.599   -16287.267        17423    0.98    
## ADE               636.341   -15973.607        17711    0.80    
## Total Effect     1448.940     1368.180         1507  <2e-16 ***
## Prop. Mediated      0.282      -11.547           12    0.98    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 200 
## 
## 
## Simulations: 100