Mixtape Mailbag #9: Log Transformations in Diff-in-Diff with Continuous Treatments
Discussion of three papers by Pedro Sant'Anna
This ended up becoming a discussion (at a high level) of three papers by Pedro Sant’Anna. Pedro, as many of you know, has been at the forefront of writing about difference-in-differences over the last five years, along with several other economists like Clément de Chaisemartin, Xavier D'Haultfoeuille, Andrew Goodman-Bacon, Brantly Callaway, Jonathan Roth, and more. I was not really planning on covering them, but the more I thought about this reader’s question, the more I realized that if there was an answer to his question, it most likely was at the intersection of those three papers. So hopefully you find this a useful response, and may it prompt you to look into these three papers as well.
Subject: Seeking Insights on Causal Inference and Strong Parallel Trends Assumption
Dear Scott,
I hope this message finds you well. I'm reaching out with a couple of queries related to causal inference, hoping to gain some of your valuable insights.
Firstly, I'm curious if the mixtape mailbag will address causal inference questions. I've been working on a Joint Modeling Project (JMP) involving an intriguing treatment regime and would greatly appreciate your off-the-cuff thoughts on it.
Secondly, I recently completed the continuous DiD mixtape session and found it incredibly enlightening, particularly regarding the strong parallel trends assumption. This assumption seems crucial, especially when considering the functional form in relation to the dose-response curve. It's easy to envision scenarios where this assumption might not hold in levels but could be valid in logarithmic transformations.
To illustrate this point, I've been pondering a thought experiment, initially shared with Professor Callaway, concerning a hypothetical scenario where the federal government raises the minimum wage to $20 an hour nationwide. This change would affect states differently, ranging from a $5 to a $13 increase. The strong parallel trends assumption, particularly in the context of dose-response curves, suggests that states should respond similarly to the same increment in minimum wage. However, this seems implausible because states *choose* their starting minimum wage. So the size of the delta is not randomly assigned. It's hard to imagine that West Virginia and Massachusetts would react identically to a $5 increase in their minimum wage, but strong parallel trends require that kind of assumption.
But what if we consider these changes in logarithmic terms or as a percentage of the state's median wage? In that case, the strong parallel trends assumption seems more tenable. This reflection is partly inspired by my JMP, which I am shamelessly plugging (Teacher Testing Standards and the New Teacher Pipeline).
I'd love to hear your thoughts on these matters, particularly regarding the feasibility of the strong parallel trends assumption in different contexts.
Thank you for your time and expertise.
Best,
Sleepless in Boston
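Before I reply, here is a quick back-of-the-envelope sketch of the thought experiment in the letter. It only does arithmetic: it measures the same policy change as a dose in dollars, in percent, and in log points. The $7 and $15 baselines are my assumption, backed out from the $13 and $5 increases to $20 mentioned above, and the state labels and the function name `doses` are hypothetical.

```python
import math

NEW_MINIMUM = 20.0  # the hypothetical federal minimum wage from the letter


def doses(baseline, new_minimum=NEW_MINIMUM):
    """Return the treatment dose in dollars, in percent, and in log points."""
    level = new_minimum - baseline
    percent = 100.0 * level / baseline
    log_points = math.log(new_minimum) - math.log(baseline)
    return level, percent, log_points


# Assumed baselines implied by the $13 and $5 increases (not real state data).
for label, baseline in [("low-wage state", 7.0), ("high-wage state", 15.0)]:
    level, percent, log_points = doses(baseline)
    print(f"{label}: +${level:.0f}, +{percent:.0f}%, {log_points:.2f} log points")

# The same $5 increment is a different log dose depending on the starting wage.
_, _, log_from_low_base = doses(7.0, new_minimum=12.0)
_, _, log_from_high_base = doses(15.0, new_minimum=20.0)
print(f"$5 from $7: {log_from_low_base:.2f} log points; "
      f"$5 from $15: {log_from_high_base:.2f} log points")
```

The point of the last two lines is just that "the same increment" means something different once the dose is rescaled, which is why the choice between levels and logs changes what strong parallel trends is actually asking for.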
Dear SIB,
I had hoped that by today I would be able to better answer your question, but I think all I’ve managed to do is put myself on a path to trying to answer it. So let me just try to answer as best I can, which is to say, let me just try to keep up with you as I think you’ve thought more deeply about this.
You bring up three things. First, you note that cities and states choose their own minimum wages, and as such, it’s possible that strong parallel trends fails. Put aside the parallel trends argument itself. The idea that units choose their own treatments is usually the simple heuristic that I give students, too, to explain broadly why we have to focus on the treatment assignment mechanism when conducting causal inference. Insofar as units choose treatments in anticipation of the impact that the treatment will have, then given unbounded heterogeneity in treatment effects, the selection bias baked into the data may be severe. So I appreciate your point.
A paper I have printed out and stuffed into my bag, but have not yet read, seems relevant to thinking about the threats to parallel trends in light of selection, though. It’s by the highly prolific Pedro Sant’Anna, Dalia Ghanem and Kaspar Wüthrich and is entitled, appropriately, “Selection and Parallel Trends”. From what I’ve skimmed, the authors argue that without any restrictions on the selection mechanism, the only way to identify the ATT using DiD is to make assumptions about time-invariant unobservables. Parallel trends, without any restrictions on selection, holds only insofar as the Y(0) potential outcome is constant across time up to “deterministic mean shifts”. This, they argue, is such a strong requirement that they refer to it as a “negative result” and suggest restricting selection to relax it.
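For reference, here is the parallel trends condition that this discussion turns on, written for the simple two-period, binary-treatment case. The notation is my own assumption for illustration and not necessarily the paper's: D_i is a treatment indicator and Y_{it}(0) is unit i's untreated potential outcome in period t.

```latex
% Textbook two-period parallel trends (illustrative notation, not the paper's):
% the average change in untreated potential outcomes is the same for
% the treated group (D_i = 1) and the comparison group (D_i = 0).
E\left[ Y_{i2}(0) - Y_{i1}(0) \mid D_i = 1 \right]
  = E\left[ Y_{i2}(0) - Y_{i1}(0) \mid D_i = 0 \right]
```

Selection enters through the conditioning on D_i: whatever determines who ends up treated also determines which units sit on each side of that equality.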