Mixtape Mailbag #7: What Happens in Difference-in-Differences if Parallel Trends is satisfied but No Anticipation is Violated?
Every Monday, so long as I have a question submitted, I try to answer one reader’s question. The questions are usually about causal inference and often very practical, like a project they’re working on. If you are curious about something too, feel free to email me at causalinf@mixtape.consulting. As a perk, anyone whose question I answer gets a 1-month free subscription to the Substack. And if you’re already a subscriber, you still get a month free (the next month). With that said, let’s see what’s in the bag this week.
Dear Scott,
I have attended many of your excellent mixtape classes. However, there is one problem I can't seem to solve by looking through old session slides. I was wondering if you could quickly advise me on it?
I am running an event study diff-in-diff with leads and lags based on a change in the age of eligibility for some payments. I was wondering how the assumption of no anticipation works here. I am actually expecting some people to only partially adjust their behaviour, so that the relevant behaviour change moves later in time but now occurs before the age of eligibility (the treatment event). Obviously, this means we can't test for pre-trends (not a problem, as the groups are very similar, so I am confident on that one). I want to keep the treatment time as the age of eligibility. But I am wondering if this anticipation could bias the lag estimates, because "not yet treated" units that are already experiencing a treatment effect are being used as the counterfactual for those that are post-treatment. If anticipation effects occur at a lead that is specified in the regression, do they still bias the results of the lags, or are they accounted for?
So, in the following, as long as anticipation occurs only in leads after -3 are we OK:
Thanks again for the classes and best!
DF
Dear DF,
Well, I have to say, DF, that in trying to answer your question I got the old pen and paper out and taught myself something I didn’t know, and that I don’t think many people know. I worked out what the population regression coefficient in a simple 2x2 (which ultimately carries over to the event study you’re asking about) identifies if you have parallel trends and SUTVA but not “No Anticipation” (NA). When you see what I learned, I think you’ll realize that losing the NA assumption is going to be fatal if you have (1) constant treatment effects, ironically, and/or (2) any treatment effect already present at baseline, which is what happens when the unit you use as your baseline is already treated. Your question concerns the leads and lags, but since each lead and lag is just a simple 2x2 calculation that references the problematic baseline term, the leads and lags will all suffer from bias too.
In this post, I’m going to focus first on the 2x2, which covers all of the post-treatment coefficients. Because each of the event study lags you reference is numerically identical to a 2x2 taken relative to some baseline, everything I say here applies to them. That said, I’m intrigued enough that at the end of this week I’m going to run some simulations to investigate further what happens when we lose no anticipation, with and without an event study.
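To make the 2x2 logic concrete, here is a minimal Python sketch, not the full derivation, built on hypothetical population cell means. It assumes parallel trends holds exactly and lets the treated group carry a treatment effect in the baseline period (`att_pre`), which is how a No Anticipation violation shows up. The function names `did_2x2` and `cell_means`, and every number in it, are made up for illustration.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Standard 2x2 difference-in-differences from four group-period means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


def cell_means(att_pre, att_post, base_treated=10.0, base_control=7.0, trend=2.0):
    """Build population cell means under exact parallel trends.

    att_pre  : treatment effect already present in the baseline period
               (nonzero means No Anticipation is violated)
    att_post : treatment effect in the post period
    The remaining arguments are hypothetical illustrative values; both groups
    share the same untreated trend, so parallel trends holds by construction.
    """
    return dict(
        treated_pre=base_treated + att_pre,
        treated_post=base_treated + trend + att_post,
        control_pre=base_control,
        control_post=base_control + trend,
    )


# True post-period effect is 5.0 in all three scenarios.
no_anticipation = did_2x2(**cell_means(att_pre=0.0, att_post=5.0))
partial_anticipation = did_2x2(**cell_means(att_pre=2.0, att_post=5.0))
constant_effect = did_2x2(**cell_means(att_pre=5.0, att_post=5.0))

print(no_anticipation, partial_anticipation, constant_effect)
```

Under these assumptions the 2x2 returns the post-period effect minus the baseline-period effect: 5 with no anticipation, 3 when a baseline effect of 2 is already present, and 0 when the effect is constant across both periods, even though the true post-period effect is 5 in every case.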
What I found in this analysis surprised me. It surprised me not because I didn’t know NA was crucial, but because I hadn’t actually worked out the assumptions and bias terms when you have parallel trends but not NA. Losing NA caused the regression coefficient to take a form that looks like what Goodman-Bacon found in his paper under differential timing, though the similarity is only superficial. Whereas Goodman-Bacon found that problems emerged if you had dynamic treatment effects and differential timing when you used two-way fixed effects, here we find that you get a problem in a simple 2x2 with either constant treatment effects or any treatment effect in the baseline period. Like I said, I think this is actually quite new, and I haven’t seen it before in a paper, so check it out below the paywall.
If you’re reading this, but aren’t a subscriber, you may want to click on the free 7-day subscription so you can see the steps, work it out yourself, and better understand how DiD absolutely must use as its baseline an untreated unit (just as it must use as its comparison untreated units).