Scott's Mixtape Substack

Difference-in-Differences

Should I Include Covariates in Diff-in-Diff?

scott cunningham
Jun 01, 2026

I have heard the following enough times that it has registered, and it usually comes from fairly seasoned researchers. So both the frequency and the speakers have made me think it’s probably a common enough belief. It is this:

If I include covariates, and my diff-in-diff estimates change, then I do not believe the diff-in-diff estimates.

It comes in many forms, but that’s usually it in a nutshell. Today’s post is probably going to be the first of a few substacks on it. I’m going to try to be brief, which will require splitting the topic across a couple of posts. But first, I flipped a coin three times, it came up heads all three times, and therefore this will be paywalled (eventually, further below).

Thanks again for your support! If you’re dying to learn more about the importance of including covariates in diff-in-diff, then consider becoming a paying subscriber! At $5/month, which is the absolute bare minimum Substack allows me to charge, it’s a steal!


Why do you include covariates in diff-in-diff?

It is well known that diff-in-diff has one key assumption called parallel trends. And if you satisfy it, you don’t need to include any covariates as controls. Let me start with an illustration of what it means to satisfy parallel trends. Our outcome will be earnings, and I will compare college educated workers (our treatment group) with high school only workers (our control group). We will represent the untreated potential outcome as Y(0) and the treated potential outcome as Y(1), and therefore a treatment effect as Y(1) - Y(0).

First, let’s say that men’s high school only earnings grow by +10 a year, but women’s high school only earnings grow by +8 a year (euros, dollars, pounds, anything). We can write this as:

\(Y_{it}(0) = \alpha + 10t \cdot M_i + 8t \cdot (1-M_i) + \varepsilon_{it}\)

where M is a dummy variable equal to 1 if biologically male and 0 if biologically female, alpha is a level constant that could differ for males and females if we wanted, and epsilon is zero in expectation. Hence when M=1, E[Y(0)] grows by 10 per year, and when M=0, E[Y(0)] grows by 8 per year. Notice that this is an outcome model. It states that there is a “return” to being male and a “return” to being female, but that the two are not the same.

But subtly, notice also that that return is the same whether you are treated or not. If you are treated, then of course we never see Y(0). We only see Y(1). But that just means that for college educated workers, Y(0) is counterfactual.

And in this outcome model, we are saying that high school only males have different trends than females — not just different levels (i.e., alpha) but trends.
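To make the outcome model concrete, here is a minimal sketch of its expected value in Python. The function name `expected_y0` and the baseline level `alpha=100.0` are my own illustrative assumptions and are not numbers from the post. Only the growth rates of 10 and 8 come from the equation above.

```python
def expected_y0(t, male, alpha=100.0):
    """Expected untreated earnings E[Y(0)] at time t under the outcome model.

    alpha=100.0 is a hypothetical baseline level, not a value from the post.
    """
    return alpha + 10 * t * male + 8 * t * (1 - male)


# Year-over-year growth in E[Y(0)] by sex; alpha cancels in the difference.
male_growth = expected_y0(1, male=1) - expected_y0(0, male=1)
female_growth = expected_y0(1, male=0) - expected_y0(0, male=0)
print(male_growth, female_growth)  # 10.0 8.0
```

Changing `alpha` shifts the level of earnings but leaves both growth rates untouched, which is the sense in which sex affects trends and not just levels here.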


Balanced

Second, let’s say that 75% of our college educated workers are male and 75% of our high school educated workers are male. To see what that implies, let’s start by taking a first difference for everyone in the sample.

\(\Delta Y_{it}(0)=8+2M_i+\Delta\varepsilon_{it}\)

When we take expectations conditional on sex, we get:

\(E[\Delta Y_{it}(0) \mid M_i]=8+2M_i\)

Note that the alpha dropped out because it was a constant for each person i. So even if we allowed males and females to make different baseline earnings, the first difference wipes them out. It just doesn’t wipe out the effect of sex on trends. That’s the key here.
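For anyone who wants the first difference spelled out, here is the algebra step by step. It uses only the outcome model above, evaluated at t and t-1, with no additional assumptions:

\(Y_{it}(0) - Y_{i,t-1}(0) = \left[\alpha + 10t \cdot M_i + 8t \cdot (1-M_i) + \varepsilon_{it}\right] - \left[\alpha + 10(t-1) \cdot M_i + 8(t-1) \cdot (1-M_i) + \varepsilon_{i,t-1}\right]\)

\(\Delta Y_{it}(0) = 10M_i + 8(1-M_i) + \Delta\varepsilon_{it} = 8 + 2M_i + \Delta\varepsilon_{it}\)

The alpha terms cancel, and each trend term leaves behind exactly one year of growth.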

Now, recall I said that the two groups were balanced. 75% of the treatment group was male and 75% of the control group was male. This means that we can use that equation to calculate the trend in average earnings for both groups, and since it does not depend on treatment status, the trend will be the same. It will be 9.5, because 8 + 2 x 0.75 = 8 + 1.5 = 9.5.
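The same arithmetic as a tiny Python sketch, in case it helps to see it as a function of the share of males. The name `group_trend` is just an illustrative helper I am introducing here.

```python
def group_trend(share_male):
    """Average trend in E[Y(0)] for a group with the given share of males."""
    return 8 + 2 * share_male


college_trend = group_trend(0.75)      # treatment group, 75% male
high_school_trend = group_trend(0.75)  # control group, 75% male
pt_bias = college_trend - high_school_trend

print(college_trend, high_school_trend, pt_bias)  # 9.5 9.5 0.0
```

Because both groups have the same share of males, the two trends are identical and the gap between them is zero.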

So the two groups are balanced, they both grow at 9.5, and thus the college group and the high school group satisfy unconditional parallel trends. As a result, you do not need to control for sex in your diff-in-diff. You do not because every 2x2 is equal to this:

\(2 \times 2 = ATT + PT_{bias}\)

And since we just showed that there isn’t a parallel trends bias, the 2x2 is an unbiased and consistent estimate of the ATT. Done.
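Here is a small simulation sketch of the balanced case, using only the Python standard library. Everything not in the text above is an assumption for illustration: a constant treatment effect `ATT = 5.0` for college workers, a baseline `ALPHA = 100.0`, standard normal noise, `N = 20_000` workers per group, and two periods (t=0 before treatment, t=1 after).

```python
import random
from statistics import mean

random.seed(0)

ATT = 5.0      # hypothetical treatment effect, not a number from the post
ALPHA = 100.0  # hypothetical baseline earnings level
N = 20_000     # hypothetical number of workers per group


def simulate_group(n, share_male, treated):
    """Return (pre, post) earnings lists for one group at t=0 and t=1."""
    pre, post = [], []
    n_male = int(n * share_male)
    for i in range(n):
        male = 1 if i < n_male else 0
        # Untreated potential outcomes from the outcome model at t=0 and t=1.
        y0_pre = ALPHA + random.gauss(0, 1)
        y0_post = ALPHA + 10 * male + 8 * (1 - male) + random.gauss(0, 1)
        pre.append(y0_pre)
        # Treated units realize Y(1) = Y(0) + ATT in the post period.
        post.append(y0_post + (ATT if treated else 0.0))
    return pre, post


college_pre, college_post = simulate_group(N, 0.75, treated=True)
hs_pre, hs_post = simulate_group(N, 0.75, treated=False)

# The simple 2x2: change for college minus change for high school.
did = (mean(college_post) - mean(college_pre)) - (mean(hs_post) - mean(hs_pre))
print(round(did, 2))
```

With both groups at 75% male, the 2x2 should land close to the assumed ATT without any control for sex, up to sampling noise.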
