Mixtape Mailbag: Covariates, Regression, and Homogeneity

Mar 17, 2025


Greetings readers! Mixtape Mailbag is where people write in with questions, and I take a stab at answering them. It’s like Dear Abby, but for applied empirical work.

This week’s question comes from a reader asking about a diff-in-diff design with two treatments and a single regression. It comes from their own research, so I changed the details of the application but kept everything else. I hope you find this helpful.

Dear Scott,

I'm a big fan of your work and am eagerly anticipating your upcoming DID workshop!

I have a question related to a project I'm currently working on, exploring the effect of job-training programs on employment outcomes for displaced workers using difference-in-differences.

Here's the setup:

Data: Individual-level panel data observed in two years: 2015 (pre-treatment) and 2018 (post-treatment). Unfortunately, there's no data available for 2017, the exact year when the job-training program started.
Treatment timing: Initiated in 2017.
Treatments: Two separate types of job-training programs—one group received online training and another received in-person workshops. There's also a third group of displaced workers who received no training at all (never-treated control).
Specification: A two-way fixed effects (TWFE) regression with individual and year fixed effects, plus individual-level (X_it) and regional-level (W_rt) covariates:

\(Y_{irt} = \alpha_i + \lambda_t + \beta_1 (\text{OnlineTraining}_i \times \text{Post}_t) + \beta_2 (\text{InPersonTraining}_i \times \text{Post}_t) + X'_{it}\gamma + W'_{rt}\delta + \varepsilon_{irt}\)

where β₁ and β₂ represent the treatment effects I'm interested in estimating.

My questions:

Given there’s no staggered adoption (both treatment groups started simultaneously) and the control group is never-treated, can this TWFE approach with interaction terms still introduce bias, perhaps due to contamination between groups or complications from the missing treatment-year data (2017)?
On distinguishing the effects of the two treatments: I initially looked into using the `did_multiplegt_dyn` command in Stata, but it appears to aggregate treatment effects rather than separately identifying the online vs. in-person training impacts. Are there alternative recent DID approaches or methods you'd recommend that explicitly separate these two treatment effects?

I greatly appreciate any insights you could provide—thanks so much!

Best,

"Job-Training Jason"

Dear Job-Training Jason,

It’s great to hear from you. This sounds like an interesting project. Briefly, here’s what I understand you to be saying. Hopefully I got it right.
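
To make the setup in the letter concrete, here is a minimal sketch in Python, run on simulated data. Everything in it is an assumption for illustration: the function names (`simulate_panel`, `two_period_did`), the group sizes, the made-up effect sizes of 1.0 and 2.0, and the omission of the covariates X_it and W_rt. It relies on one fact: in a balanced panel with exactly two periods, a TWFE regression with individual and year fixed effects gives the same coefficients as regressing the first difference (2018 minus 2015) on the two treatment-group dummies.

```python
import numpy as np


def simulate_panel(n_per_group=2000, beta_online=1.0, beta_inperson=2.0, seed=0):
    """Simulate a balanced two-period panel (hypothetical data, not the reader's).

    group: 0 = never-treated, 1 = online training, 2 = in-person workshops.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1, 2], n_per_group)
    n = group.size
    # Individual fixed effects, allowed to differ in level across groups.
    alpha = rng.normal(size=n) + 0.5 * group
    # Year effects: a common trend of 0.3 between 2015 and 2018.
    lam_2015, lam_2018 = 0.0, 0.3
    # Treatment effects switch on only in the post period.
    effect = beta_online * (group == 1) + beta_inperson * (group == 2)
    y_2015 = alpha + lam_2015 + rng.normal(scale=0.5, size=n)
    y_2018 = alpha + lam_2018 + effect + rng.normal(scale=0.5, size=n)
    return group, y_2015, y_2018


def two_period_did(group, y_pre, y_post):
    """First-difference regression, equivalent to TWFE when T = 2 and the panel is balanced.

    Returns [common trend, beta_1 (online), beta_2 (in-person)].
    """
    dy = y_post - y_pre  # differencing removes the individual fixed effect
    X = np.column_stack([np.ones(dy.size), group == 1, group == 2]).astype(float)
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return coef


if __name__ == "__main__":
    group, y_pre, y_post = simulate_panel()
    trend, beta_1, beta_2 = two_period_did(group, y_pre, y_post)
    print(f"common trend: {trend:.3f}")
    print(f"beta_1 (online): {beta_1:.3f}")
    print(f"beta_2 (in-person): {beta_2:.3f}")
```

Because this regression is saturated in the group dummies, each coefficient is just that group's average change minus the never-treated group's average change. In other words, without covariates the single regression amounts to two separate 2x2 comparisons against the same control group.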
