We need to bring “ignorable treatment assignment” back into our vocabulary
Unconfoundedness, conditional independence, and ignorable treatment assignment
What’s the difference between unconfoundedness and ignorability? Mathematically, best I can tell, nothing. The phrase conditional independence, which I think Angrist and Pischke used more often or maybe first, is more or less a synonym for both. They all have the same notation:
(y0,y1) _||_ D | X
But in some ways, I think something important has been lost moving away from the word ignorability. And in this essay, I’m going to explain why. In a nutshell, unconfoundedness is what we mean when we say that all the problematic confounders are known and quantified. In a DAG, it’s the observed parent variable in that familiar triangle linking treatment and outcome. Unconfoundedness is what allows us to include covariates and avoid omitted variable bias. Unconfoundedness, in other words, is about possessing in your dataset the known and quantified confounders, which allows matching and basic multivariate regressions to be unbiased estimators of some treatment effect.
But ignorability isn’t about the confounders — at least not exactly. It has a different emphasis. It points its fingers at the treatment variable and connects it to the potential outcomes. It states that the treatment is assigned to units “independent of” the potential outcomes. So whatever rule is used, it’s not based on y1 or y0.
Now why is ignorability important? Why can’t we just say “you have controlled for all the confounders”? Because while for some that is a shocking, almost brazen claim, it is not nearly shocking enough until you see that it also implies ignorability. Let me give you an example.
Let’s say that a person, if he majors in literature, expects to have some flow of utility equal to 10 net of cost. But if he doesn’t, and majors in anything else, he expects to have a flow of utility equal to 9 net of cost. So if y1 is 10 and y0 is 9, then the causal effect of literature is:
Delta = y1 - y0 = 1
and so since the treatment effect is positive, we suspect the person, so long as he’s aware of that preference ordering within him, will choose to major in literature.
Guess what: that violates ignorability. The treatment is based on y1 and y0. The decision was based on the gains from the treatment, and so it violated ignorability. Which means it violated unconfoundedness too. Which means no DAG would satisfy the backdoor criterion if you believe at the end of the day the individual always chose to do what he did because treatment gains were positive.
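To make that concrete, here is a minimal simulation sketch of the literature example. Everything in it is an assumption of mine for illustration: the normal distributions, the spread of the gains, and the function name `simulate_selection_on_gains` are all made up. It compares a coin-flip assignment rule with a rule where each person picks literature whenever y1 is bigger than y0, and reports the simple difference in mean outcomes under each.

```python
import random
import statistics


def simulate_selection_on_gains(n=100_000, seed=0):
    """Compare a coin-flip rule with a choose-if-gain-is-positive rule.

    All distributions here are hypothetical, chosen only for illustration.
    Returns (true_ate, naive_under_coin_flip, naive_under_gain_rule).
    """
    rng = random.Random(seed)

    # Potential outcomes: y0 is utility from any other major, y1 from literature.
    # The individual gain y1 - y0 averages 1 but varies across people.
    y0 = [rng.gauss(9.0, 1.0) for _ in range(n)]
    gain = [rng.gauss(1.0, 2.0) for _ in range(n)]
    y1 = [a + g for a, g in zip(y0, gain)]

    true_ate = statistics.fmean(gain)

    def naive_difference(d):
        # We only ever observe y1 for the treated and y0 for the untreated.
        treated = [y1[i] for i in range(n) if d[i] == 1]
        control = [y0[i] for i in range(n) if d[i] == 0]
        return statistics.fmean(treated) - statistics.fmean(control)

    # Rule 1: assignment ignores the potential outcomes (ignorable).
    d_coin = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # Rule 2: assignment is based on the gain itself (not ignorable).
    d_gain = [1 if g > 0 else 0 for g in gain]

    return true_ate, naive_difference(d_coin), naive_difference(d_gain)


if __name__ == "__main__":
    ate, coin, gain_rule = simulate_selection_on_gains()
    print(f"true ATE:                   {ate:.2f}")
    print(f"naive, coin-flip rule:      {coin:.2f}")
    print(f"naive, choose-on-gain rule: {gain_rule:.2f}")
```

Under these made-up numbers the coin-flip comparison should land near the true average effect, while the choose-on-gain comparison overstates it, because the people who chose literature are exactly the ones with the biggest gains. There are no covariates in this toy setup, so there is nothing to condition on that would fix it.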
Sometimes we need to see things from the bleachers as well as from the court to understand them. And ignorability and unconfoundedness are both pictures of the court; they just differ in where you are standing. You can see things from the stands you can’t see as a player and you can see things as a player you can’t see from the stands. And that’s how I see these two words related.
Ignorability is about treatment assignment and equals:
(y0,y1) _||_ D | X
Unconfoundedness is about all confounders being known, quantified and included in your adjustments. And it equals:
(y0,y1) _||_ D | X
They aren’t, in other words, two assumptions. They are there to remind us that when we try to control for something to achieve causality, we are not merely saying we have all the variables. We are saying that people with the same values of those variables did not make their choice because of treatment gains. All selection on observables implies irrationality, in other words, once you condition on all the covariates, and as I’ve said before, this is a hard pill to swallow for the behavioral sciences.
Is it a hard pill for the statistics community who don’t study behavior, though? Was it hard for Rubin or Rosenbaum? Who knows. But I know it’s hard for me. As I’ve said, to weaken it and just ask for weak ignorability so that you can estimate the ATT does get you somewhere but not everywhere. In our example, if we wanted to say the treatment was ignorable with respect to Y0, we’d say:
y0 _||_ D | X
and if that was credible, then with common support or exogeneity, we could then estimate the ATT. You don’t need more than weak ignorability and common support for that one.
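Here is a rough sketch of what that looks like in a simulation. Again, the data-generating process is entirely hypothetical: one discrete covariate X, a y0 that depends on X, a gain that is independent of y0 given X, and a choice rule that looks at the gain and at X but never at y0. The function name `att_under_weak_ignorability` is mine. The estimator just compares treated and untreated means within each value of X and weights by where the treated units are.

```python
import random
import statistics


def att_under_weak_ignorability(n=200_000, seed=1):
    """Stratify on X to estimate the ATT when only y0 is independent of D given X.

    The data-generating process is hypothetical and only for illustration.
    """
    rng = random.Random(seed)
    rows = []  # each row is (x, d, observed y, individual gain)
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        y0 = 9.0 + x + rng.gauss(0.0, 1.0)
        gain = rng.gauss(1.0, 2.0)  # independent of y0 given x
        y1 = y0 + gain
        # The choice depends on the gain and on x, never on y0 itself,
        # so y0 _||_ D | X holds while y1 _||_ D | X does not.
        d = 1 if gain + 0.5 * x > 1.0 else 0
        rows.append((x, d, y1 if d == 1 else y0, gain))

    true_ate = statistics.fmean([g for _, _, _, g in rows])
    true_att = statistics.fmean([g for _, d, _, g in rows if d == 1])

    treated_y = [y for _, d, y, _ in rows if d == 1]
    control_y = [y for _, d, y, _ in rows if d == 0]
    naive = statistics.fmean(treated_y) - statistics.fmean(control_y)

    # Within each stratum of X, compare treated and untreated means,
    # then weight each stratum by its share of the treated units.
    n_treated = len(treated_y)
    stratified = 0.0
    for value in (0, 1, 2):
        yt = [y for x, d, y, _ in rows if x == value and d == 1]
        yc = [y for x, d, y, _ in rows if x == value and d == 0]
        stratified += (len(yt) / n_treated) * (
            statistics.fmean(yt) - statistics.fmean(yc)
        )

    return {
        "true_ate": true_ate,
        "true_att": true_att,
        "naive": naive,
        "stratified": stratified,
    }


if __name__ == "__main__":
    for name, value in att_under_weak_ignorability().items():
        print(f"{name}: {value:.2f}")
```

In this setup the stratified estimate should track the true ATT, the unadjusted comparison should not, and neither should be read as the ATE, since the treated are still selected on their gains.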
But then what are the behavioral implications of weak ignorability? What does it say about people’s minds? In our case it means persons with the same covariate values were attentive to what they gained when they made a choice (y1) but not what they lost (y0). They ignored opportunity costs, in other words.
Well, I teach economics for a living and guess what — people ignore opportunity costs all the time. So much that economists for centuries have talked about it a lot. Paul Samuelson thought it and comparative advantage were two of the most important principles we (economists) had pushed on people.
And yet, I just don’t know. I think about hard decisions I have to make. I weigh pros and cons. I make lists. I make tables comparing alternatives. I’ve even made up fake utility numbers and probabilities and maximized expected utility! Many of us do. The more rational you think someone or something is, the less you can use any method relying on irrationality as an identification strategy. The burden of proof is on you to explain why it just so happens you found a time where people actually flipped coins over a major life decision when most of us have never done that ourselves nor seen someone do it. But ironically, we analyze data all the time assuming they do. Mostly, I think, out of convenience.
Identification by convenience is a dangerous, seductive trap. It’s the Venus flytrap of causal inference, because often you do it at all only because the data is there. But once you grab the data and pursue a matching strategy, be careful you don’t fall into the belly of a carnivorous plant. (This metaphor may need to have been workshopped more).
I truly struggle on these points but I think where I come down on it is simple. I will always have an open mind. But a person will need to clearly tell me the behavioral narrative as to why conditional on X the actors stopped being “homo economicus”, and if it makes sense, I’m all set to go. It just is that more times than not, they don’t try. Not only do they not have a model to rationalize their choices, they don’t seem to know that this matching strategy assumes ignorability.
And that is why I want the word to come back. I want to bring sexy back.