The Mixtape Mailbag is a weekly Q&A in which someone writes me a question, preferably in my wheelhouse of expertise, and I hazard an answer. If I hazard an answer, you get a one-month free trial to the Substack, so please email all your questions to causalinf@mixtape.consulting. Just don’t email me problem set questions or test questions because, as a professor, I have earned the right to assign those but never take them again. And if, after reading this, you want to offer your own thoughts, please leave a comment! It would be great if this could prompt some discussion.
Mailbag Question: Difference in differences with staggered rollout
Dear Scott,
I hope you are well and that you had an enjoyable Christmas and New Year!
I have a question about difference in differences that I was hoping you might be able to answer (either in a reply to me or through the Mixtape Mailbag).
How do you deal with a scenario where you have a staggered rollout across more than one dimension?
For example, imagine there is a job training programme that is rolled out over several years to different states (some states remain never treated and some are treated from time T, time T+1, T+2 and so on). Once they become treated they stay treated. This lends itself nicely to some of the recent methods developed for staggered Diff in Diff.
However, what if that job training programme is also expanding in scope each year, so that each year it covers more industries? For example, in the first year the job training programme covers retail workers, in the second year it expands to cover retail and manufacturing, and in the third it adds administrative workers (and so on). And to make things more complex, states can choose which industries (out of those available in the overall programme at time T) they want to introduce the job training for. States can also add industries in successive years even if they have already been treated (i.e., state A might start with only retail but in two years' time also add all other available industries).
It seems to me that the number of industries covered is kind of akin to a treatment intensity variable, but not exactly, and it definitely cannot be represented as a continuous variable. It also strikes me that the training programme could be split by industry into separate treatments. But at the same time, the decision to add more industries to the programme is likely contingent on the programme's initial success, so there are some selection issues here too and the treatments are not independent.
I hope that all makes sense. It's something I've been thinking about for a while and I can't really get my head around the right way to approach this problem! If you have any thoughts or advice, I'd really appreciate it.
Kind regards,
RM
Dear RM,
Greetings from Waco, Texas, USA. Where I live, the summers reach 110 degrees Fahrenheit (or around 43.3 Celsius), sometimes for a month to six weeks at a time. It’s blistering hot, and truth be told I love every second of it. If Zeus, the Greek god of weather, asked me whether I had had enough, I would taunt him to make it even hotter. Perhaps now, knowing my love of extreme heat, you can understand when I say that it is bitter cold this morning at 15 degrees Fahrenheit (or -9.4 Celsius) and I do not particularly enjoy it. But I recently bought a new house (built in 1914, abandoned, unliveable but not unloveable, completely rebuilt), and this morning I am enjoying being inside, drinking my coffee and thinking about your question, while my three cats (Betty, Ronnie and Clara) run around, still curious about their new surroundings. After I finish posting this, I will run over to my former house, where I’m still paying rent, and check on my semi-feral cat, who I caught on the porch when I went by last night to get my daughter’s bedding in the dryer. This was one of my three orange feral cats (Little Mama, Tigger or Simba), and for the life of me I couldn’t figure out whether it was Simba or Tigger. I grabbed him by the scruff of his neck, took him inside the empty house, made him a bed by the kitty litter with some food, and left him there alone, screaming and meowing, feeling that at least he’d be inside, where it was a bit easier to stay safe (especially if he figured out that heat rises and that he can go upstairs). Anyway, that’s a long introduction, but I wanted to get that off my chest and put into the universe that Tigger-Simba is currently asleep, cuddled in the weighted blanket I left for him over at the house, while all the other feral cats in the community are doing who knows what during this horrible cold storm. Here’s a picture of him maybe even dreaming of warmer days.
The spring semester, depending on where you are, either began last week or begins today, unless you’re on the quarter system, whose start and stop dates are still a complete mystery to me. But let’s say that, more or less, some new part of the learning cycle is probably beginning this week. I have a feeling your question will stick with me for a while, just as it has stuck with you for a while. I want to begin by saying back to you what I understand your question to be.
Your question regards the expansion of some policy across multiple dimensions, as you say. It expands across time and across industries. I want to reframe this in language I understand. But I want to preface that by saying that I think your question is actually referencing data limitations that are mechanically creating a violation of SUTVA (the stable unit treatment value assumption). Framing it that way may turn out to be a useful route to a solution. Let me start first by explaining what I mean by a SUTVA violation.
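Before that, it may help to make the rollout itself concrete. What follows is a minimal Python sketch of the kind of data you describe, not anything drawn from a real programme: the states A, B and C, the years, the availability dates, and the adoption dates are all made up for illustration, and the function names are hypothetical. It builds a state-industry-year panel with an absorbing treatment dummy, then compares the state-level "first treated" cohort with the count of industries covered in each year.

```python
from itertools import product

YEARS = range(2014, 2019)
INDUSTRIES = ["retail", "manufacturing", "administrative"]

# Hypothetical: first year each industry is available in the overall programme.
AVAILABLE_FROM = {"retail": 2015, "manufacturing": 2016, "administrative": 2017}

# Hypothetical: year each state switches on training for each industry.
# A missing industry means the state never covers it; state "C" is never treated.
ADOPTION = {
    "A": {"retail": 2015, "manufacturing": 2017, "administrative": 2017},
    "B": {"retail": 2015},
    "C": {},
}


def build_panel(adoption, industries, years):
    """Return one row per state-industry-year with an absorbing treatment dummy."""
    rows = []
    for state, industry, year in product(adoption, industries, years):
        start = adoption[state].get(industry)  # None if never covered
        if start is not None and start < AVAILABLE_FROM[industry]:
            raise ValueError(f"{state} adopts {industry} before it is available")
        # Once treated, always treated: the dummy stays at 1 from `start` onward.
        treated = int(start is not None and year >= start)
        rows.append(
            {"state": state, "industry": industry, "year": year, "treated": treated}
        )
    return rows


def industries_covered(panel):
    """Count covered industries per state-year (the intensity-like measure)."""
    counts = {}
    for row in panel:
        key = (row["state"], row["year"])
        counts[key] = counts.get(key, 0) + row["treated"]
    return counts


def state_cohort(adoption):
    """First year a state covers any industry; None for never-treated states."""
    return {s: (min(a.values()) if a else None) for s, a in adoption.items()}


panel = build_panel(ADOPTION, INDUSTRIES, YEARS)
covered = industries_covered(panel)
cohorts = state_cohort(ADOPTION)

# Print each state's cohort and its industries-covered path over YEARS.
for state in ADOPTION:
    print(state, cohorts[state], [covered[(state, y)] for y in YEARS])
```

In this toy example, states A and B share the same state-level cohort (2015), yet A jumps from one covered industry to three in 2017 while B stays at one. A binary state-level treatment indicator would treat the two as identical, whereas the state-industry-year rows keep the difference visible. Whether your data actually come at that finer level is a separate question.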