Intro

In the spirit of writing as note-taking, I wanted to share a neat little trick in R for running regressions. By far the most common table in a political science paper is a regression table. Often, researchers run multiple regression specifications and then present them in a single table. Each regression may include a different set of variables, or one specification may add an interaction effect. This can mean a lot of repetitive typing, which violates the DRY ("don't repeat yourself") principle of coding.

I came across a nice use of the accumulate() function from the purrr package that both speeds up this task and makes it programmatic for easy replication. Lazy win!

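The basic concept behind accumulate() is to apply a two-argument function cumulatively over a list or vector, starting from the left and keeping every intermediate result. If you want to go in reverse, use accumulate(.dir = "backward"); the older accumulate_right() has been deprecated since purrr 0.3.0. As the name suggests, the typical use of this function is for cumulative sums, but because R can turn a character string into a formula, we can also use it to build a series of nested regression specifications.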
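To make the idea concrete, here is a minimal sketch of accumulate() on a numeric vector and on a character vector. The argument names acc and nxt are just illustrative, and the .dir argument assumes purrr 0.3.0 or later.

```r
library(purrr)

# Running total: each element is the sum of everything up to that position.
accumulate(1:5, `+`)
## [1]  1  3  6 10 15

# The same idea with strings: each element extends the previous one.
accumulate(c("a", "b", "c"), function(acc, nxt) paste(acc, nxt, sep = " + "))
## [1] "a"         "a + b"     "a + b + c"

# Accumulating from the right instead of the left.
accumulate(1:5, `+`, .dir = "backward")
## [1] 15 14 12  9  5
```

The string version is the one that matters for regressions: every intermediate result is a valid right-hand side for a formula.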

In order to replicate this post on your own machine, you will need the following packages.

install.packages(c("AER", "tidyverse", "estimatr", "texreg"))

A fake data example

In the first demonstration, I create a simple dataset with three predictors, two potential outcomes, and the observed outcome based on a treatment variable.

By construction, the treatment effect is \(\tau = 1\), and the data-generating process includes multiple pre-treatment covariates and an interaction.

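library(tidyverse)  # provides tibble(), accumulate(), map(), and %>%

set.seed(100)
N <- 1000
dat <- tibble(
  x1 = rnorm(N),
  x2 = rnorm(N, 1, 10),
  x3 = rnorm(N, 10, 3),
  # Randomly assign exactly half of the units to treatment
  treat = sample(c(rep(1, N / 2), rep(0, N / 2)), N, replace = FALSE),
  # Potential outcomes: y1 exceeds y0 by the treatment effect of 1
  y0 = x1 + x2 + x3 + x2 * x3 + runif(N),
  y1 = y0 + 1,
  yobs = ifelse(treat == 1, y1, y0)
)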
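Written out, the simulation above corresponds to the following model. The symbols \(u_i\) (uniform noise) and \(T_i\) (the treatment indicator) are notation introduced here for illustration; they do not appear in the code.

\[
\begin{aligned}
Y_i(0) &= x_{1i} + x_{2i} + x_{3i} + x_{2i} x_{3i} + u_i, \qquad u_i \sim \mathrm{Uniform}(0, 1) \\
Y_i(1) &= Y_i(0) + \tau, \qquad \tau = 1 \\
Y_i^{\mathrm{obs}} &= T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)
\end{aligned}
\]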

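Manually typing out the five nested specifications would be tedious and error-prone. Instead, accumulate() can build the right-hand side of each formula by pasting the predictors together one at a time.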

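# Predictors in the order they should enter the model
cols_to_use <- c("treat", "x1", "x2", "x3", "x2*x3")
# Cumulatively paste the predictors: "treat", "treat + x1", ...
predictors <- accumulate(cols_to_use, function(a, b) paste(a, b, sep = " + "))
formulas <- paste("yobs~", predictors)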

print(formulas)
## [1] "yobs~ treat"                        "yobs~ treat + x1"                  
## [3] "yobs~ treat + x1 + x2"              "yobs~ treat + x1 + x2 + x3"        
## [5] "yobs~ treat + x1 + x2 + x3 + x2*x3"
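The last formula names x2 and x3 twice, once on their own and once inside x2*x3. That is harmless: R expands x2*x3 to x2 + x3 + x2:x3 and drops the duplicates. A quick base R check, shown here only as an illustration, lists the terms R actually derives from that string.

```r
# Base R only: inspect the terms derived from the longest formula string.
f <- as.formula("yobs ~ treat + x1 + x2 + x3 + x2*x3")
attr(terms(f), "term.labels")
## [1] "treat" "x1"    "x2"    "x3"    "x2:x3"
```

So the fifth specification adds only the x2:x3 interaction on top of the fourth.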

Now we can run each specification with map() and collect the results in a single regression table.

library(estimatr)  # lm_robust()
library(texreg)    # htmlreg()

# Functional programming with purrr using map()
# lm_robust requires that we coerce each character string to a formula object
formulas %>%
  map(~ lm_robust(as.formula(.x), data = dat, se_type = "stata")) %>%
  htmlreg(include.ci = FALSE)
Statistical models
              Model 1    Model 2    Model 3    Model 4    Model 5
(Intercept)   23.61***   23.44***   9.68***    -9.60      0.54***
              (4.90)     (4.90)     (1.27)     (5.40)     (0.03)
treat         -7.61      -7.43      -2.76      -3.54      1.00***
              (7.10)     (7.11)     (1.97)     (1.93)     (0.02)
x1                       4.56       3.13**     3.35**     1.00***
                         (3.58)     (1.02)     (1.02)     (0.01)
x2                                  10.99***   11.04***   1.00***
                                    (0.17)     (0.17)     (0.00)
x3                                             1.97***    1.00***
                                               (0.53)     (0.00)
x2:x3                                                     1.00***
                                                          (0.00)
R²            0.00       0.00       0.92       0.93       1.00
Adj. R²       0.00       0.00       0.92       0.93       1.00
Num. obs.     1000       1000       1000       1000       1000
RMSE          112.30     112.26     31.03      30.42      0.28
*** p < 0.001; ** p < 0.01; * p < 0.05

An Application with Real Data

# CASchools (California test score data) ships with the AER package
data("CASchools", package = "AER")

CASchools <- CASchools %>% 
  mutate(STR = students/teachers,
         score = (read + math)/2,
         HiSTR = as.numeric(STR >=20),
         HiEL = as.numeric(english >= 10))

cols_to_use <- c("HiSTR", "HiEL", "HiSTR*HiEL")
predictors <- accumulate(cols_to_use, function(a,b){
  paste(a,b,sep = "+")
})

formulas <- paste("score~", predictors)
formulas %>%
  map(~ lm_robust(as.formula(.x), data = CASchools, se_type = "stata")) %>%
  htmlreg(include.ci = FALSE)
Statistical models
              Model 1     Model 2      Model 3
(Intercept)   657.25***   664.69***    664.14***
              (1.25)      (1.25)       (1.39)
HiSTR         -7.17***    -3.48*       -1.91
              (1.83)      (1.55)       (1.93)
HiEL                      -19.76***    -18.32***
                          (1.59)       (2.33)
HiSTR:HiEL                             -3.26
                                       (3.12)
R²            0.03        0.29         0.29
Adj. R²       0.03        0.29         0.29
Num. obs.     420         420          420
RMSE          18.74       16.06        16.06
*** p < 0.001; ** p < 0.01; * p < 0.05
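Both examples repeat the same two steps: accumulate the predictors, then paste on the outcome. If you do this often, the steps could be wrapped in a small helper. The sketch below is one way to do it; build_formulas() is a hypothetical name of my choosing, not a function from purrr or estimatr.

```r
library(purrr)

# Hypothetical helper: returns one formula string per cumulative set of
# right-hand-side terms, in the order the terms are supplied.
build_formulas <- function(outcome, terms) {
  rhs <- accumulate(terms, function(acc, nxt) paste(acc, nxt, sep = " + "))
  paste(outcome, "~", rhs)
}

# Print one formula per line.
writeLines(build_formulas("score", c("HiSTR", "HiEL", "HiSTR*HiEL")))
## score ~ HiSTR
## score ~ HiSTR + HiEL
## score ~ HiSTR + HiEL + HiSTR*HiEL
```

The returned strings can be passed to map() with as.formula() exactly as in the examples above.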
