In the spirit of notes to myself, here’s a neat trick I learned to bulk recode lots of variables at once.
Suppose we have conducted a survey experiment and gotten lots of data from our participants. Our raw data looks like the following:
raw_data <-tibble( ID = LETTERS[1:15], var1 = sample(1:15, replace = F), var2 = sample(16:30, replace = F), var3 = sample(31:45, replace = F) ) kable(raw_data) ID var1 var2 var3 A 1 17 40 B 5 18 38 C 15 24 41 D 9 28 36 E 10 26 45 F 4 19 44 G 2 20 34 H 12 22 39 I 13 27 42 J 11 23 32 K 7 25 35 L 14 30 43 M 8 29 31 N 3 21 37 O 6 16 33 Because survey data is likely to contain errors, we have a subject matter expert look at the data.
There have likely been more words written about the use and misuse of instrumental variables than exist atoms in the universe. When I was starting out in grad school, almost all of our methods education came in the context of experiments. Instrumental Variables were treated like a compliance problem. A researcher ran an experiment, but some people decided not to comply with treatment for some reason which led to missing values.
I am working at UC Berkeley’s D-Lab as a Data Science Fellow. One of my responsibilities is to provide consulting to the UC Berkeley community on statistical and data science projects. A common request of late due to points at everything is to help with web scraping for projects.
Recently, a request came in to scrape a page and download the pdf files that were linked. Fortunately, the page was simple from an HTML perspective, and I could apply a few common patterns to pull the downloads.
Intro When running a randomized control trial (RCT), there is often concern about non-compliance. Subjects may drop out of the treatment or choose to go against the treatment assignment somehow. A plausible solution is to run an Intent To Treat (ITT) analysis. In this setting, we include every unit that has been randomized according to treatment assignment and ignore any non-compliance or dropout that might have occurred. These studies measure the average effect of being assigned to the treatment or control group.
What constitutes an exciting game? One possibility is by looking at the amount teams trade off leads. For example, in 2014 the Portland Trailblazers and the Los Angeles Clippers had a record 40 lead changes in a single game. The previous record for 34 back in 2004 in a game between the then New Jersey Nets and the Phoenix Suns. Games like Portland-LA are especially interesting because they are rare. Most games have few lead changes.
In the spirit of writing as note-taking, I wanted to share a neat little trick in R for running regressions. By far the most common table in a political science paper is a regression table. Often, researchers run multiple regression specifications and then present them in a singular table. Each regression may have different sets of variables, or one specification will include an interaction effect. This can mean a lot of typing, which can implicitly violate DRY principles for coding.
Introduction Recently, I made some maps for a research article. I initially had some reticence, as it had been a long time since I worked with GIS systems. To my (pleasant) surprise, the R spatial ecosystem has evolved to make the process extremely user friendly. In the spirit of “write it down to not forget,” this post provides a beginning to end tutorial for plotting maps across time. To give the tutorial a practical application, I focus on plotting the electoral democracy changes across time using the V-Dem index.
On Friday, I gave a presentation on Morgan Kelly’s “The standard errors of persistence” a summary of which is available here.
The jist of Kelly’s work is that the persistence literature features unusually high t-statistics in part because of severe spatial autocorrelation in the residuals. When one accounts for this issue, he finds that the main persistence variable frequently has lower explanatory power than spatial noise. Furthermore, that persistence variable often strongly predicts spatial noise.
1. Introduction Recently, a friend asked me how to fit a two-way fixed effects model in R. A fixed effects model is a regression model in which the intercept of the model is allowed to move across individuals and groups. We most often see it in panel data contexts. Two-way fixed effects have seen massive interest from the methodological community. Some recent papers of interest are Imai and Kim 2019, Goodman-Bacon 2019, and Abraham and Sun 2018.
A common programming assignment when learning regression is to calculate OLS estimators by hand. In this post, I show exactly how to program OLS estimation in R. In addition, I explain how to add different standard error calculations to replicate Huber-White standard errors and Stata robust standard errors.