Mixtape University: Diff-in-Diff with a checklist. Simulating the Importance of Weighting
One of the perks I’m creating for paying subscribers is a library of short videos where I teach causal inference. These are 15-minute (more or less) videos that aim to build over time into a helpful resource. If you’ve heard me teach before and wanted a quick refresher on a concept, that’s what these are for.
Right now, I’m taking a checklist approach to teaching difference-in-differences. The jury’s still out on whether that’s the best pedagogical move, but I’m giving it a go.
In the last couple of videos, I focused on the role of weighting. But rather than just walking through formulas, I tried to show how the aggregation already present in a dataset can influence the parameter we estimate. The most relevant videos are here:
Mixtape University: Diff-in-diff with a checklist. Defining your target parameters
Greetings! Today I am posting two new 15-minute videos for Mixtape University, part of my “Diff-in-diff with a checklist” series, which is based on my workshop, Causal Inference 2. At the moment, this is my checklist, although after teaching it at a workshop last week, I found that Steps 4-6 haven’t fully gelled in my mind, and I’m not sure that’s how I’ll handle them when I get to those steps in this series. But we have time to fix that.
I’ve been thinking about this for a long time, ever since our paper, “Difference-in-Differences: A Practitioner’s Guide.” In an earlier version of that paper, we discussed how different weighting schemes could change the interpretation of the ATT: from being about the “average person” to being about the “average county.” That distinction stuck with me.
But another idea that kept resurfacing came from years of teaching potential outcomes in Excel spreadsheets. When you use spreadsheets, it becomes really clear that average treatment effects are just averages over units. If the unit is a person, then the average treatment effect is about the average person. But what if the unit is a county or a state—and those units are aggregated from an underlying person-level dataset?
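To make that distinction concrete, here is a minimal sketch (this is not the code from the videos; the county names, sizes, and effect sizes are made up for illustration). With heterogeneous treatment effects, the average effect over people and the average effect over counties are simply different numbers:

```python
# Minimal sketch: "average person" vs. "average county" effects.
# All numbers below are invented for illustration.
import pandas as pd

people = pd.DataFrame({
    "county": ["A"] * 90 + ["B"] * 10,   # county A is much larger
    "effect": [1.0] * 90 + [5.0] * 10,   # county B has a bigger effect per person
})

att_person = people["effect"].mean()                            # average over people
att_county = people.groupby("county")["effect"].mean().mean()   # average over counties

print(att_person)   # 1.4 -> the "average person" parameter
print(att_county)   # 3.0 -> the "average county" parameter
```

Both are legitimate averages; they just answer different questions, which is why defining the target parameter has to come first.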
Finally, I’ve always felt a bit uneasy about how people write about population weights in panel regressions—especially when those weights are added to make things “nationally representative.” That never quite sat right with me in this context.
So, I wrote up some new slides that start from step one: define the target parameter. From there, I built a simulation to explore when heterogeneous treatment effects and Roy-like sorting into regions could make the same dataset—aggregated in different ways—produce different estimates.
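As a rough illustration of what that simulation is getting at (this is a sketch of my own, not the code I’m sharing with the videos, and every parameter value is invented), here high-gain people sort into the larger treated counties, so the person-level diff-in-diff, the unweighted county-level diff-in-diff, and the population-weighted county-level diff-in-diff no longer agree:

```python
# Minimal simulation sketch: heterogeneous effects correlated with county size
# ("Roy-like" sorting of high-gain people into big treated counties) make the
# person-weighted and county-weighted diff-in-diff estimands disagree.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

counties = pd.DataFrame({
    "county":  np.arange(40),
    "treated": np.repeat([1, 0], 20),
    "pop":     rng.integers(100, 5000, size=40),
})
# Roy-like sorting: bigger treated counties attract people with bigger gains.
counties["effect"] = np.where(counties["treated"] == 1,
                              1 + counties["pop"] / 2000, 0.0)

# Build a two-period person-level panel with a common trend of 0.5.
rows = []
for _, c in counties.iterrows():
    base = rng.normal(0, 1, int(c["pop"]))        # person-level baseline outcome
    for t in (0, 1):
        y = base + 0.5 * t + c["effect"] * t * c["treated"]
        rows.append(pd.DataFrame({"county": c["county"], "treated": c["treated"],
                                  "post": t, "pop": c["pop"], "y": y}))
people = pd.concat(rows, ignore_index=True)

def did(df, weights=None):
    """2x2 diff-in-diff computed from (possibly weighted) group means."""
    w = df[weights] if weights else pd.Series(1.0, index=df.index)
    def cell(d, p):
        mask = (df["treated"] == d) & (df["post"] == p)
        return np.average(df.loc[mask, "y"], weights=w[mask])
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Aggregate the person-level data to county-period means.
county_means = (people.groupby(["county", "treated", "post"], as_index=False)
                      .agg(y=("y", "mean"), pop=("pop", "first")))

print("person-level DiD:           ", did(people))
print("county-level, unweighted:   ", did(county_means))
print("county-level, pop-weighted: ", did(county_means, weights="pop"))
```

In this setup the person-level answer and the population-weighted county-level answer coincide (both recover the person-weighted average effect), while the unweighted county-level answer recovers the average over counties instead; break the correlation between county size and effect size and all three line up.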
Today, I’m sharing videos of the simulations, along with code in Stata, R, and Python, that walk through exactly when different weighting schemes lead to the same answer and when they don’t. I think it’s pretty interesting, honestly. Anyone wanting to learn more about this can also read the great 2015 Journal of Human Resources article by Solon, Haider, and Wooldridge, “What Are We Weighting For?”
Thanks again for being a subscriber. It means a lot.