I was waiting for this blog. Thank you, Scott. I will go through it in detail before I start the second phase of analyzing data with continuous and staggered treatment. :) Happy new year.

This is really helpful, thanks for sharing. For alleviating the selection bias term, does it make sense to match the units on, say, their propensity to select that level of treatment? For example, one could fit a linear regression that predicts their chosen treatment from the confounding factors that affect both their choice of treatment level and their outcome, and then stratify the individuals on that prediction. Within a stratum, can we assume that whatever variation exists in their choice of treatment level is random, so that the continuous DiD estimator would be relatively unbiased?
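To make the stratification idea above concrete, here is a minimal sketch on simulated two-period data. Everything in it is an assumption for illustration: the single observed confounder `x`, the data-generating process, the true per-unit effect of 2, and the helper names. It only speaks to selection on an observed confounder and says nothing about selection on unobservables.

```python
import random


def ols_slope_intercept(xs, ys):
    """Return (slope, intercept) of a one-regressor least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(n=5000, seed=0):
    """Hypothetical data: confounder x drives both the dose and the outcome change."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        dose = 1.0 + x + rng.gauss(0, 1)
        # dy is the before/after change in the outcome; the true per-unit effect is 2.
        dy = 2.0 * dose + 3.0 * x + rng.gauss(0, 1)
        rows.append((x, dose, dy))
    return rows


def stratified_slope(rows, n_strata=10):
    """Average the within-stratum slopes of dy on dose, stratifying on predicted dose."""
    slope_x, intercept_x = ols_slope_intercept(
        [r[0] for r in rows], [r[1] for r in rows]
    )
    # Sort units by the dose predicted from the confounder, then cut into equal strata.
    scored = sorted(rows, key=lambda r: intercept_x + slope_x * r[0])
    size = len(scored) // n_strata
    slopes = []
    for s in range(n_strata):
        chunk = scored[s * size:(s + 1) * size]
        slope, _ = ols_slope_intercept([r[1] for r in chunk], [r[2] for r in chunk])
        slopes.append(slope)
    return sum(slopes) / len(slopes)


rows = simulate()
naive, _ = ols_slope_intercept([r[1] for r in rows], [r[2] for r in rows])
stratified = stratified_slope(rows)
print(f"naive slope: {naive:.2f}, stratified slope: {stratified:.2f}, truth: 2.00")
```

In this setup the pooled slope is pulled away from 2 by the confounder, while the stratified average sits closer to it. Some bias remains because `x` still varies within each stratum, which is why the question says "relatively" unbiased.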

Just re-read this as it's relevant to a paper I am working on. If you were trying to demonstrate that the continuous treatment bias term isn't there, what are some concrete things you could do? I am thinking of something analogous to the way we use event studies to argue that our parallel trends assumption is valid.

My first thought is that one could break the continuous treatment variable up into 4 dummies, the way you did, and demonstrate that all four coefficients are equal. However, I think that would just show that the treatment variable has a linear average causal response curve... It is not clear to me whether that tells you anything about the bias we are concerned with.

Let me know what you think in terms of feasible diagnostics!
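As a rough illustration of the four-dummy check described above, the sketch below bins a simulated dose into four groups and compares each bin to untreated units. The data-generating process (randomized dose, linear effect of 2 per unit, common trend of 0.5) and all names are assumptions for illustration. Note that under a linear response the raw bin coefficients grow with dose; it is the per-unit versions (bin DiD divided by mean dose in the bin) that line up.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n=8000, seed=1):
    """Hypothetical two-period data with a randomized dose and a linear response."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.8
        dose = rng.uniform(1.0, 5.0) if treated else 0.0
        # dy is the before/after outcome change: common trend 0.5 plus 2 per unit of dose.
        dy = 0.5 + 2.0 * dose + rng.gauss(0, 1)
        rows.append((dose, dy))
    return rows


def binned_did(rows, n_bins=4):
    """For each dose bin, return (mean dose, DiD vs. untreated, DiD per unit of dose)."""
    trend = mean([dy for dose, dy in rows if dose == 0.0])
    treated = sorted((r for r in rows if r[0] > 0.0), key=lambda r: r[0])
    size = len(treated) // n_bins
    results = []
    for b in range(n_bins):
        # The last bin absorbs any remainder so every treated unit is used.
        chunk = treated[b * size:] if b == n_bins - 1 else treated[b * size:(b + 1) * size]
        mean_dose = mean([r[0] for r in chunk])
        did = mean([r[1] for r in chunk]) - trend
        results.append((mean_dose, did, did / mean_dose))
    return results


results = binned_did(simulate())
for mean_dose, did, per_unit in results:
    print(f"mean dose {mean_dose:.2f}: DiD {did:.2f}, per unit {per_unit:.2f}")
```

Because the dose is randomized by construction here, similar per-unit numbers across bins only reflect the linear response. As the comment says, that pattern describes the shape of the response curve and does not by itself rule out the selection bias term.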

So if I understand correctly: in a world where dosage is purely randomized but with staggered timing of dosages, can we still follow CS or some of the other methodologies in the staggered timing literature? With randomization, we are still recovering the ATT with selection bias = 0, right?

Is it proper to think of DDD with binary variables as an elementary way of getting around this issue? If you have a DiD paper you want to write and you have continuous X, you can find a reasonable way to break your sample into no treatment, some treatment, and high treatment.

This is obviously throwing away a lot of variation, but my takeaway from reading this is that if you are a PhD student working on your dissertation, you should go the triple difference route...
