Hello Scott, thanks for (once again) a great post! I have two questions that I have been struggling with for a while. 1) Identifying colliders: according to my understanding, and also based on the example below, are colliders simply variables that occur after treatment and are correlated with (and/or caused by) the treatment variable and the outcome? 2) What about variables with multiple potential paths, e.g. a variable Q that may causally affect Y while Y may also causally affect Q? Concretely, suppose you have introduced some change in the mixtape Substack, but not in your tapemix Substack, and you want to understand how this change relates to the comments on your Substack. However, your comments will be made up of comments and questions posed by others, but you only want to understand the effect on comments. Here, questions posed affect comments (Q -> Y), are most likely affected by the treatment (D -> Q), your change may have been affected by questions posed under your posts (Q -> D), and comments may affect questions that are asked (Y -> Q). So you have a D <-> Q <-> Y relationship.

Scott, what do you think of so-called causal discovery methods, e.g., the PC algorithm or score-and-search methods, for producing DAGs? And could you clarify what you mean when you say the conditional independence assumption is not testable? From what I know, conditional independence is testable assuming an underlying distribution.
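
To make the last point of this comment concrete, here is a minimal sketch of testing conditional independence between observed variables when one is willing to assume joint normality and linear relationships. It uses a partial correlation with a Fisher z-test, a common choice for this kind of test on Gaussian data. The chain X -> Z -> Y, the coefficients, the seed, and the helper names `residualize` and `fisher_z_pvalue` are all assumptions made up for illustration; none of them come from the post.

```python
import math
import numpy as np

def residualize(v, z):
    """Return the part of v not linearly explained by z (intercept included)."""
    design = np.column_stack([np.ones(len(v)), z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: the (partial) correlation is zero.

    Relies on the Gaussian assumption; n_cond is the number of
    conditioning variables.
    """
    z = math.sqrt(n - n_cond - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: a chain X -> Z -> Y, so X and Y are dependent
# marginally but independent once Z is conditioned on.
rng = np.random.default_rng(1)
n = 50_000
X = rng.normal(size=n)
Z = X + rng.normal(size=n)
Y = Z + rng.normal(size=n)

r_marginal = np.corrcoef(X, Y)[0, 1]
r_partial = np.corrcoef(residualize(X, Z), residualize(Y, Z))[0, 1]

p_marginal = fisher_z_pvalue(r_marginal, n, n_cond=0)
p_partial = fisher_z_pvalue(r_partial, n, n_cond=1)

print(f"corr(X, Y)     = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"corr(X, Y | Z) = {r_partial:.3f}, p = {p_partial:.3g}")
```

Note that this only covers independence statements among variables that are all observed. It says nothing about statements that involve unobserved quantities such as potential outcomes.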

What should I do if there is a covariate W that affects Y directly, as well as indirectly through the main variable of interest D (W -> D -> Y and W -> Y)? Should W be included in the regression? If so, how can I interpret the effect of D on Y with and without W in the model? Would it be advisable to run an instrumental variable regression with D as the dependent variable and W as the independent variable in the first stage, and then Y as the dependent variable and predicted D as the independent variable in the second stage, without W in the model?
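
The setup in this question can be explored with a small simulation. The sketch below assumes a linear model with made-up coefficients (a true effect of D on Y of 2.0, W -> D of 1.0, W -> Y of 3.0) and compares three estimates: Y on D alone, Y on D and W, and the two-stage procedure described in the comment. The `ols` helper and all numbers are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients, intercept first, then one per regressor."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data-generating process: W -> D -> Y and W -> Y.
rng = np.random.default_rng(0)
n = 200_000
W = rng.normal(size=n)
D = 1.0 * W + rng.normal(size=n)
Y = 2.0 * D + 3.0 * W + rng.normal(size=n)  # true effect of D is 2.0

# Without W: the D coefficient also absorbs the W -> Y path.
effect_without_w = ols(Y, D)[1]

# With W: the backdoor path D <- W -> Y is closed.
effect_with_w = ols(Y, D, W)[1]

# Two-stage procedure from the question, using W as the "instrument".
first_stage = ols(D, W)
D_hat = first_stage[0] + first_stage[1] * W
effect_two_stage = ols(Y, D_hat)[1]

print(f"Y ~ D:             {effect_without_w:.2f}")
print(f"Y ~ D + W:         {effect_with_w:.2f}")
print(f"two-stage, W as Z: {effect_two_stage:.2f}")
```

Under these assumptions, only the regression that includes W should land near 2.0. In this graph W is a confounder, and because it also affects Y directly it violates the exclusion restriction an instrument needs, so the two-stage estimate loads the whole W -> Y effect onto D.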

Thank you for such an explanation.

Very helpful. Thanks!

Is it OK if I think of it this way?

Confounder: Variable that affects treatment and outcome

Collider: Variable that is affected by treatment and outcome

Covariate: Variable that affects only outcome

“Control” only for confounders and covariates
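
The collider part of this summary can be checked with a quick simulation. The sketch below assumes a made-up linear world in which D has no effect on Y at all, and a variable C is caused by both. The `ols` helper, the coefficients, and the seed are hypothetical choices for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients, intercept first, then one per regressor."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data: D and Y are independent, C is a collider (D -> C <- Y).
rng = np.random.default_rng(2)
n = 200_000
D = rng.normal(size=n)
Y = rng.normal(size=n)             # true effect of D on Y is 0
C = D + Y + rng.normal(size=n)     # affected by both treatment and outcome

effect_no_control = ols(Y, D)[1]          # close to the true value of 0
effect_collider_control = ols(Y, D, C)[1] # pushed away from 0

print(f"Y ~ D:     {effect_no_control:.2f}")
print(f"Y ~ D + C: {effect_collider_control:.2f}")
```

In this setup, leaving C out recovers the true zero effect, while adding C as a control creates a spurious negative association between D and Y. That is consistent with the rule of thumb above: controlling for a collider opens a path that was closed.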

Excellent post, Scott! That is such a lucid explanation.

Can you please provide an example setting with all these variables?