R | Alex Stephenson

Filtering expressions with rlang

Suppose you run an experiment and are interested in an estimate of the average treatment effect (ATE) for the full sample as well as for sub-groups. An annoying aspect of coding this up is that you end up repeating a lot of code. Programming guides tell us that we should write a function, but if you are like me and reach for the tidyverse, you run into another problem: non-standard evaluation.
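
A minimal sketch of the kind of function described here, assuming dplyr is installed. The helper name `subgroup_ate`, the column names, and the simulated data are all hypothetical and are not taken from the post. The embrace operator `{{ }}` passes the unquoted filter expression and column names through to dplyr.

```r
library(dplyr)

# Hypothetical helper: difference-in-means estimate of the ATE
# among the rows that satisfy `condition`.
subgroup_ate <- function(data, condition, outcome, treatment) {
  data %>%
    filter({{ condition }}) %>%
    summarise(
      n_obs = n(),
      ate = mean({{ outcome }}[{{ treatment }} == 1]) -
        mean({{ outcome }}[{{ treatment }} == 0])
    )
}

# Simulated data, for illustration only.
set.seed(1)
dat <- tibble(
  female = rbinom(200, 1, 0.5),
  treat  = rbinom(200, 1, 0.5),
  y      = 1 + 2 * treat + rnorm(200)
)

full_sample <- subgroup_ate(dat, TRUE, y, treat)         # every row
women_only  <- subgroup_ate(dat, female == 1, y, treat)  # one sub-group
```

Passing `TRUE` as the condition keeps every row, so the same function covers the full sample and each sub-group without repeated code.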

Standard Errors and the Delta Method

A friend recently asked me a question about the delta method, which sent me briefly into a cold sweat trying to remember a concept from my first-year methods sequence. In the spirit of notes to myself, here’s a post explaining what it is, why you might want to use it, and how to calculate it with R.
What is the Delta Method?
The delta method is a result concerning the asymptotic behavior of functions of a random variable.
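
For reference, the standard univariate statement of the result can be written as follows. The notation is an assumption and does not come from the post: $X_n$ is a sequence of estimators of $\theta$, and $g$ is a function that is differentiable at $\theta$ with $g'(\theta) \neq 0$.

```latex
\[
\sqrt{n}\,(X_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(X_n) - g(\theta)\bigr) \xrightarrow{d}
\mathcal{N}\bigl(0,\; \sigma^2\,[g'(\theta)]^2\bigr)
\]
```

In practice this suggests approximating the standard error of $g(X_n)$ by $|g'(X_n)|$ times the standard error of $X_n$.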

Building Up from our Bootstraps

Thinking about Learning Methods
Note to Readers: If you arrived at this post because of a search query about how to do bootstrapping, especially wild bootstrapping, in R, skip to the Replication example section at the bottom.
When I first entered grad school, I knew little to nothing about quantitative methods. The methods sequence at Berkeley was daunting to me, and I felt overwhelmed with all of the new skills and techniques that my cohort seemed to be picking up with ease.

Notes on Bulk Recoding in R

In the spirit of notes to myself, here’s a neat trick I learned to bulk recode lots of variables at once. Suppose we have conducted a survey experiment and gotten lots of data from our participants. Our raw data looks like the following:

    raw_data <- tibble(
      ID = LETTERS[1:15],
      var1 = sample(1:15, replace = F),
      var2 = sample(16:30, replace = F),
      var3 = sample(31:45, replace = F)
    )
    kable(raw_data)

    ID  var1  var2  var3
    A      1    17    40
    B      5    18    38
    C     15    24    41
    D      9    28    36
    E     10    26    45
    F      4    19    44
    G      2    20    34
    H     12    22    39
    I     13    27    42
    J     11    23    32
    K      7    25    35
    L     14    30    43
    M      8    29    31
    N      3    21    37
    O      6    16    33

Because survey data is likely to contain errors, we have a subject matter expert look at the data.

An Example of IV Estimation in R

There have likely been more words written about the use and misuse of instrumental variables than atoms in the universe. When I was starting in grad school, almost all of our methods education came in the context of experiments. Instrumental Variables were treated as a compliance problem. A researcher ran an experiment, but some people decided not to comply with treatment for some reason, which led to missing values. Using the random assignment as an instrument for treatment, the researcher could find the Complier Average Treatment Effect (CATE).
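
As a sketch of that idea in symbols, using notation that is assumed here rather than taken from the post: $Z$ is the random assignment, $D$ is the treatment actually received, and $Y$ is the outcome. Under the usual instrumental variables assumptions (random assignment, the exclusion restriction, monotonicity, and a nonzero effect of $Z$ on $D$), the estimator is the ratio of two differences in means.

```latex
\[
\widehat{\text{CATE}}
= \frac{\widehat{\text{ITT}}_Y}{\widehat{\text{ITT}}_D}
= \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
\]
```

The numerator is the effect of assignment on the outcome, and the denominator is the effect of assignment on treatment take-up.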

Bulk downloads of pdfs from a website: An Example script

I am working at UC Berkeley’s D-Lab as a Data Science Fellow. One of my responsibilities is to provide consulting to the UC Berkeley community on statistical and data science projects. A common request of late, due to (points at everything), is help with web scraping for projects. Recently, a request came in to scrape a page and download the PDF files that were linked. Fortunately, the page was simple from an HTML perspective, and I could apply a few common patterns to pull the downloads.

Running Multiple Intent to Treat Analyses with purrr

Intro
When running a randomized controlled trial (RCT), there is often concern about non-compliance. Subjects may drop out of the treatment or choose to go against the treatment assignment somehow. A plausible solution is to run an Intent To Treat (ITT) analysis. In this setting, we include every unit that has been randomized according to treatment assignment and ignore any non-compliance or dropout that might have occurred. These studies measure the average effect of being assigned to the treatment or control group.
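
A minimal sketch of running several ITT analyses at once, assuming purrr is installed. The data are simulated and the names `trial`, `assigned`, `y1`, and `y2` are hypothetical; this is one plausible way to set it up, not necessarily the approach the post takes. Each outcome is regressed on assignment alone, so compliance plays no role in the estimate.

```r
library(purrr)

# Simulated trial, for illustration only: `assigned` is the random assignment.
set.seed(42)
n <- 500
trial <- data.frame(assigned = rbinom(n, 1, 0.5))
trial$y1 <- 0.5 * trial$assigned + rnorm(n)
trial$y2 <- rnorm(n)

outcomes <- c("y1", "y2")

# Fit one regression per outcome; the slope on `assigned` is the ITT estimate.
itt_fits <- map(set_names(outcomes), function(outcome) {
  lm(reformulate("assigned", response = outcome), data = trial)
})

# Collect the ITT estimates into a named numeric vector.
itt_estimates <- map_dbl(itt_fits, function(fit) coef(fit)[["assigned"]])
```

With a single binary regressor, each slope equals the difference in mean outcomes between the assigned-to-treatment and assigned-to-control groups.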

Simulating Lead Changes with R

What constitutes an exciting game? One possibility is to look at how often teams trade the lead. For example, in 2014 the Portland Trail Blazers and the Los Angeles Clippers had a record 40 lead changes in a single game. The previous record was 34, set back in 2004 in a game between the then New Jersey Nets and the Phoenix Suns. Games like Portland-LA are especially interesting because they are rare. Most games have few lead changes.
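
A rough sketch of how lead changes could be simulated, under a deliberately simple assumption that is not taken from the post: every score is worth one point and is equally likely to go to either team, so the score margin follows a symmetric random walk. The function names and the choice of 100 scores per game are hypothetical.

```r
# Count how often the team in front changes, ignoring moments when the game is tied.
count_lead_changes <- function(margin) {
  leader <- sign(margin[margin != 0])
  sum(diff(leader) != 0)
}

# Simulate one game as a symmetric random walk of the score margin.
simulate_game <- function(n_scores = 100) {
  steps <- sample(c(-1, 1), n_scores, replace = TRUE)
  count_lead_changes(cumsum(steps))
}

set.seed(2014)
lead_changes <- replicate(1000, simulate_game())
```

Calling `table(lead_changes)` or `hist(lead_changes)` then shows how the simulated counts are distributed across games.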

Regression Tables for Lazy People

In the spirit of writing as note-taking, I wanted to share a neat little trick in R for running regressions. By far the most common table in a political science paper is a regression table. Often, researchers run multiple regression specifications and then present them in a single table. Each regression may have a different set of variables, or one specification may include an interaction effect. This can mean a lot of typing, which can implicitly violate DRY principles for coding.

Plotting Democracy over Time with R

Introduction
Recently, I made some maps for a research article. I initially had some hesitation, as it had been a long time since I had worked with GIS. To my (pleasant) surprise, the R spatial ecosystem has evolved to make the process extremely user friendly. In the spirit of “write it down to not forget,” this post provides a beginning-to-end tutorial for plotting maps across time. To give the tutorial a practical application, I focus on plotting changes in electoral democracy across time using the V-Dem index.