Notes on Bulk Recoding in R

In the spirit of notes to myself, here’s a neat trick I learned to bulk recode lots of variables at once. Suppose we have conducted a survey experiment and gotten lots of data from our participants. Our raw data looks like the following:

    library(tibble)  # tibble()
    library(knitr)   # kable()

    # Each variable is a random permutation, so the values differ from run to run.
    raw_data <- tibble(
      ID = LETTERS[1:15],
      var1 = sample(1:15, replace = FALSE),
      var2 = sample(16:30, replace = FALSE),
      var3 = sample(31:45, replace = FALSE)
    )
    kable(raw_data)

| ID | var1 | var2 | var3 |
|----|------|------|------|
| A  | 1    | 17   | 40   |
| B  | 5    | 18   | 38   |
| C  | 15   | 24   | 41   |
| D  | 9    | 28   | 36   |
| E  | 10   | 26   | 45   |
| F  | 4    | 19   | 44   |
| G  | 2    | 20   | 34   |
| H  | 12   | 22   | 39   |
| I  | 13   | 27   | 42   |
| J  | 11   | 23   | 32   |
| K  | 7    | 25   | 35   |
| L  | 14   | 30   | 43   |
| M  | 8    | 29   | 31   |
| N  | 3    | 21   | 37   |
| O  | 6    | 16   | 33   |

Because survey data is likely to contain errors, we have a subject matter expert look at the data.

An Example of IV Estimation in R

There have likely been more words written about the use and misuse of instrumental variables than there are atoms in the universe. When I was starting out in grad school, almost all of our methods education came in the context of experiments. Instrumental variables were treated like a compliance problem: a researcher ran an experiment, but some people decided not to comply with treatment for some reason, which led to missing values.

Bulk downloads of pdfs from a website: An Example script

I am working at UC Berkeley’s D-Lab as a Data Science Fellow. One of my responsibilities is to provide consulting to the UC Berkeley community on statistical and data science projects. A common request of late, due to (points at everything), is help with web scraping for projects. Recently, a request came in to scrape a page and download the PDF files it linked to. Fortunately, the page was simple from an HTML perspective, and I could apply a few common patterns to pull the downloads.

Running Multiple Intent to Treat Analyses with purrr

Intro

When running a randomized controlled trial (RCT), there is often concern about non-compliance. Subjects may drop out of the treatment or otherwise go against their treatment assignment. A plausible solution is to run an Intent To Treat (ITT) analysis. In this setting, we analyze every randomized unit according to its treatment assignment and ignore any non-compliance or dropout that might have occurred. These studies measure the average effect of being assigned to the treatment group rather than the control group.
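To make the idea concrete, here is a minimal base R sketch of a single ITT estimate on simulated data. Everything in it (the sample size, the 70% compliance rate, the true effect of 2, and the variable names) is an assumption made up for illustration, and it does not use purrr or reproduce the post's actual analysis.

```r
# Simulated RCT with one-sided non-compliance (all numbers are hypothetical).
set.seed(42)
n <- 1000
assigned <- rbinom(n, 1, 0.5)      # random treatment assignment
complier <- rbinom(n, 1, 0.7)      # about 70% take the treatment if assigned
treated  <- assigned * complier    # treatment actually received
y <- 1 + 2 * treated + rnorm(n)    # outcome; effect of receiving treatment is 2

# ITT: regress the outcome on assignment and ignore actual take-up.
itt_fit <- lm(y ~ assigned)
itt <- unname(coef(itt_fit)["assigned"])

# The same estimate as a difference in means between assigned groups.
diff_means <- mean(y[assigned == 1]) - mean(y[assigned == 0])
```

Because only some assigned units take the treatment, the ITT estimate here should be smaller than the effect of receiving the treatment: it answers the question about assignment, not about take-up.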

Simulating Lead Changes with R

What constitutes an exciting game? One way to measure it is to look at how often teams trade the lead. For example, in 2014 the Portland Trail Blazers and the Los Angeles Clippers had a record 40 lead changes in a single game. The previous record was 34, set back in 2004 in a game between the then New Jersey Nets and the Phoenix Suns. Games like Portland-LA are especially interesting because they are rare. Most games have few lead changes.
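As a rough illustration of how such a simulation might look, the sketch below treats the score margin as a simple random walk and counts how often the lead switches sides. The scoring model (each possession moves the margin by -2, 0, or +2 with equal probability), the 200 possessions per game, and the function names are all assumptions for this example, not the model used in the post.

```r
set.seed(2014)

# Count lead changes in a running score margin (home minus away).
# Ties are dropped, so a lead change is counted only when the lead
# passes from one team to the other.
count_lead_changes <- function(margin) {
  leader <- sign(margin)
  leader <- leader[leader != 0]
  sum(diff(leader) != 0)
}

# Hypothetical game: each possession shifts the margin by -2, 0, or +2.
simulate_game <- function(n_possessions = 200) {
  points <- sample(c(-2, 0, 2), n_possessions, replace = TRUE)
  count_lead_changes(cumsum(points))
}

# Distribution of lead changes over many simulated games.
lead_changes <- replicate(5000, simulate_game())
```

Calling `hist(lead_changes)` or `table(lead_changes)` then shows how the simulated counts are distributed under these assumptions.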

Regression Tables for Lazy People

In the spirit of writing as note-taking, I wanted to share a neat little trick in R for running regressions. By far the most common table in a political science paper is a regression table. Often, researchers run multiple regression specifications and then present them in a single table. Each regression may have a different set of variables, or one specification may include an interaction effect. This can mean a lot of typing, which can implicitly violate DRY principles for coding.

Plotting Democracy over Time with R

Introduction

Recently, I made some maps for a research article. I initially had some hesitation, as it had been a long time since I worked with GIS. To my (pleasant) surprise, the R spatial ecosystem has evolved to make the process extremely user friendly. In the spirit of “write it down to not forget,” this post provides a beginning-to-end tutorial for plotting maps across time. To give the tutorial a practical application, I focus on plotting changes in electoral democracy across time using the V-Dem index.

Two Ways to fit Two-Way Fixed Effects in R

1. Introduction

Recently, a friend asked me how to fit a two-way fixed effects model in R. A fixed effects model is a regression model in which the intercept of the model is allowed to vary across individuals and groups. We most often see it in panel data contexts. Two-way fixed effects have seen massive interest from the methodological community. Some recent papers of interest are Imai and Kim 2019, Goodman-Bacon 2019, and Abraham and Sun 2018.
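For a sense of what fitting such a model involves, here is a small base R sketch on a simulated balanced panel. It fits the model once with unit and time dummies and once by two-way demeaning; these are not necessarily the two approaches the post itself covers, and the panel dimensions, the true slope of 0.5, and all object names are assumptions for illustration.

```r
set.seed(1)
n_units <- 30
n_periods <- 10

# Balanced panel: every unit is observed in every period.
panel <- expand.grid(unit = 1:n_units, time = 1:n_periods)
unit_fe <- rnorm(n_units)[panel$unit]     # unit-specific intercept shifts
time_fe <- rnorm(n_periods)[panel$time]   # period-specific intercept shifts
panel$x <- rnorm(nrow(panel)) + unit_fe   # regressor correlated with unit effects
panel$y <- 0.5 * panel$x + unit_fe + time_fe + rnorm(nrow(panel))

# Approach 1: least squares with a dummy for every unit and every period.
fit_dummies <- lm(y ~ x + factor(unit) + factor(time), data = panel)
beta_dummies <- unname(coef(fit_dummies)["x"])

# Approach 2: subtract unit means and time means, add back the grand mean.
# This shortcut matches approach 1 only when the panel is balanced.
demean2 <- function(v, unit, time) v - ave(v, unit) - ave(v, time) + mean(v)
y_dm <- demean2(panel$y, panel$unit, panel$time)
x_dm <- demean2(panel$x, panel$unit, panel$time)
beta_within <- unname(coef(lm(y_dm ~ x_dm - 1))["x_dm"])
```

The two slope estimates agree up to floating-point error here. The standard errors from the demeaned regression are not adjusted for the absorbed fixed effects, so they should not be reported as-is.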

Simulating Football Recruiting Evaluations with R

R is great for running quick simulations. Using a running example of college football recruit rankings, I show how we can leverage the power of R to see the implications of having evaluators of different quality.

OLS Estimation by "Hand" in R

A common programming assignment when learning regression is to calculate OLS estimators by hand. In this post, I show exactly how to program OLS estimation in R. In addition, I explain how to add different standard error calculations to replicate Huber-White standard errors and Stata robust standard errors.
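As a preview of the mechanics, the following sketch computes the OLS coefficients from the normal equations, then classical, heteroskedasticity-robust (HC0), and degrees-of-freedom-adjusted (HC1) standard errors using only base R matrix operations. The simulated data and all object names are assumptions for illustration. HC0 is the original Huber-White estimator, and HC1 is the n / (n - k) adjustment that Stata applies with `regress ..., robust`.

```r
set.seed(123)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Heteroskedastic errors: the noise grows with |x1|.
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n) * (1 + abs(x1))

# Design matrix with an explicit intercept column.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)
k <- ncol(X)

# Coefficients: beta_hat = (X'X)^{-1} X'y
XtX_inv <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)
resid <- as.vector(y - X %*% beta_hat)

# Classical standard errors: sigma^2 (X'X)^{-1}
sigma2 <- sum(resid^2) / (n - k)
se_classical <- sqrt(diag(sigma2 * XtX_inv))

# Huber-White (HC0) sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat <- crossprod(X * resid)   # multiplies row i of X by residual i
vcov_hc0 <- XtX_inv %*% meat %*% XtX_inv
se_hc0 <- sqrt(diag(vcov_hc0))

# HC1: HC0 scaled by n / (n - k)
se_hc1 <- sqrt(diag(vcov_hc0 * n / (n - k)))
```

A quick sanity check is to compare `beta_hat` and `se_classical` against `summary(lm(y ~ x1 + x2))`, which should match to numerical precision.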