One of the first causal methods we ever learn in statistics class is to “control for X”. We can run a regression that controls for X, and if we wonder what that means, we need only review the Frisch-Waugh-Lovell theorem to see what a multivariate regression is equivalent to. As you progress, you may learn that there is a whole range of estimators that use covariates to reconstruct a missing counterfactual when trying to estimate some average treatment effect. There are matching methods, which basically impute missing counterfactuals by finding units in the comparison group that have the same or almost the same covariate values. There are even fixes for when you can’t find exact matches, such as the Abadie and Imbens (2011) bias correction for matching discrepancies. There are propensity scores, which can be used to find matches as well as serve as weights in simple comparisons between treatment and control. And then there are things sort of in between, like coarsened exact matching. The list of ways you can try to solve thorny causal inference problems through the use of covariates is very long, and very old, and if you enjoy causal inference and econometrics, many of these will, the more time you spend with them, become interesting, and maybe even beautiful.
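To make concrete what “imputing a missing counterfactual by matching” looks like, here is a minimal sketch with made-up numbers: each treated unit borrows the observed outcome of its nearest comparison unit on a single covariate, and the average gap is the estimated effect on the treated.

```python
import numpy as np

# Toy data (invented for illustration): one covariate, say age.
age_t = np.array([25., 30., 41.])              # treated units
y_t   = np.array([9.0, 10.5, 12.0])            # their observed outcomes
age_c = np.array([24., 27., 31., 40., 55.])    # comparison pool
y_c   = np.array([7.5,  8.0,  9.8, 11.1, 13.0])

# For each treated unit, impute the missing Y0 with the outcome of the
# nearest comparison unit in covariate space (1-nearest-neighbor matching).
nearest = np.abs(age_t[:, None] - age_c[None, :]).argmin(axis=1)
y0_hat = y_c[nearest]

# Average treatment effect on the treated, using the imputed counterfactuals.
att = np.mean(y_t - y0_hat)
```

With many covariates the same idea goes through with a multivariate distance, which is where matching discrepancies, and the Abadie-Imbens bias correction for them, come in.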

But ask an economist if these beautiful objects are fit for solving causal inference problems, and more often than not, the answer will be an unambiguous “no”. They won’t even flinch when they say it, either! Many economists will just flat out look you in the face and say that not only are these methods unlikely to be useful in real life — they refuse to even accept the possibility that they’ll ever work.

The problem isn’t the math. All estimators that are unbiased are unbiased, and we have proofs to show it. The problem is the data. Economists in this day and age are borderline allergic to using “selection on observables” methods. And the main reason given is that they don’t believe the conditional independence assumption.

You see, there are two assumptions that must hold in your dataset in order for any covariate adjustment method to recover an aggregate causal parameter. The first is conditional independence and the second is common support. Common support actually gets short shrift a lot of the time when selection on observables comes up, but it’s often problems with support that will kill a regression model. Not because regressions inherently violate common support. Rather, anyone can run a regression without checking for common support. Checking is hard to do when you have many covariates, which is one of the advantages propensity scores have for us — they collapse all those covariates into a single scalar, which allows for simple checks for support, such as histograms.
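As a sketch of that overlap check, with simulated data and a hand-rolled logistic fit just to keep it self-contained: estimate the propensity score, then compare its distribution across the two arms. Bins where only one arm has units flag a support problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 4))                        # four covariates
true_logits = X @ np.array([0.8, -0.5, 0.3, 0.0])
D = rng.binomial(1, 1 / (1 + np.exp(-true_logits)))  # treatment assignment

# Fit a logistic regression by Newton's method to estimate e(X) = P(D=1|X).
Z = np.column_stack([np.ones(n), X])               # add an intercept
beta = np.zeros(Z.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (D - p))

pscore = 1 / (1 + np.exp(-Z @ beta))               # the single scalar summary

# Overlap check: histogram the score separately in each arm.
bins = np.linspace(0, 1, 21)
treated_hist, _ = np.histogram(pscore[D == 1], bins=bins)
control_hist, _ = np.histogram(pscore[D == 0], bins=bins)

# Bins where one arm has units and the other has none signal a support problem.
one_sided = np.sum((treated_hist == 0) ^ (control_hist == 0))
print(f"bins with one-sided support: {one_sided}")
```

In practice you would plot the two histograms, but the counts alone already tell you whether there are regions of the score where one group simply has no counterparts.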

But common support is testable, and if you have it, then you have it, and so that can’t really be the source of why so many microeconomists turn their noses up at selection on observables as a class of methods. The real target, for many, is the first assumption: conditional independence. Conditional independence is written like this:
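$$
(Y^0, Y^1) \perp\!\!\!\perp D \mid X
$$

(Here $Y^0$ and $Y^1$ are the untreated and treated potential outcomes, $D$ is the treatment, and $X$ is the set of covariates.)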

Sorry for the sloppy independence symbol there, but my equation editor doesn’t have the symbol I ordinarily use. But that’s basically the independence symbol. And conditional independence, or what is sometimes also called unconfoundedness, is required to estimate aggregate causal parameters anytime you are adjusting for covariates. Let me walk you through this equation for a moment:

- The stuff on the left are the potential outcomes. These are states of the world, not realized, hypothetical; they exist *a priori* but are unknown.
- The independence symbol followed by *D* (the treatment variable) means that the treatment assignment is made across the sample of units for some reason, but whatever that reason is, it’s got nothing to do with either potential outcome or functions of them (like the treatment effect itself).
- And last, the “conditional on X” part.

The last part, conditional on X, is not by itself the motivation for including covariates in a regression and expecting the coefficient on the treatment variable to have a causal interpretation. Rather, it’s the entire line that justifies “running regressions controlling for X”. What it basically means is that X is a confounder, but good news — it’s the only confounder. It’s both known and quantified. The known part requires a model, be it formal or informal in nature, and the quantified part means it’s measured and in your dataset. So if you have the known and quantified confounder, then a whole host of solutions avail themselves to you, like regression, matching, propensity scores, etc.

There’s a group of economists who object to this statement, and usually it’s that “known” part. They’ll basically just throw up their hands and say “Look, I don’t believe your model. I’m not saying you’re a liar. I’m just saying I could write a hundred models down and sometimes X satisfies this statement and sometimes it doesn’t. Which model is true?” And I think that is fair — there are a lot of people who are agnostic about the models and so are reluctant to use covariate adjustment, which itself requires a model to know which covariates are confounders and which ones are just irrelevant or downright dangerous.

But I have a different theory. I think there are also economists who reject conditional independence, not because they reject models, but because they embrace them. Or rather, they embrace a particular model, an all-encompassing model of human decision making, that I’ll just call “rationality”.

See, rationality is shorthand in the economist’s mind, not for being smart or clever or even pretty good. It means agents have transitive, complete and continuous preferences from which they can build sensible, ordinal rank orderings over different bundles of things, and that allows economists to build functions called utility functions that represent those preferences. The larger numbers just mean they like those bundles more than the bundles associated with the lower numbers. Early economists, like Bentham, equated utility with happiness, but modern economists typically root utility in the axioms of choice and don’t. They simply equate it with a ruler measuring preference orderings, and the numbers are just placeholders for where a person is on that utility function. The make-believe people in our models are said to buy goods and services at market prices until they both run out of money and maximize their utility. And that’s where we get demand curves.

Well, there’s a lot of modeling that empirical economists sort of don’t engage with. They’ll do it, but they won’t bet their lives on it. But utility maximization and profit maximization, the things we associate with the axioms of choice, are different. Push comes to shove, many applied economists will give up their accomplices in a bank robbery before they’ll ever give up on rationality. It’s the one model we just don’t outright abandon. We tinker with it, we acknowledge irrationalities, but at the end of the day, we still think people make choices to improve their lives under the constraints they face *as they define improvement*.

So what in the world could this possibly have to do with a dislike of conditional independence? Because most of the time, when you are fully committed to the notion that people are rational, or at least intentionally pursuing goals while living in the reality of scarcity, you actually think they are paying attention to those potential outcomes. Why? Because those potential outcomes represent the gains from the choice you’re making. Look at how we define treatment effects:
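$$
\delta = Y^1 - Y^0
$$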

You get Y1 if you choose D, and you get Y0 if you don’t, and the difference between choosing D and not choosing D is Y1 − Y0, the individual treatment effect. Well, guess what — if you think people make choices because they hope the choice will improve their life, then you believe their choices are *directly* dependent on Y0 and Y1. This is called “selection on treatment gains”, and it’s a tragic problem that, if true, almost certainly means covariate adjustment won’t work.

Why do I say that? Because think of what our conditional independence equation implies: once I condition on a series of covariates, if I line up a thousand people with the same exact covariate values, those thousand people completely disregarded the treatment gains when they made their choice. Put differently, conditional independence essentially says that for a group of people with the same covariate values, their decision making has become erratic and random. In other words, the covariates contained the rationality, and you had found the covariates that sucked that rationality out of their minds.
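A toy simulation makes the point. Suppose people take the treatment exactly when their own gain beats its cost (a Roy-style rule), and gains vary even among people with identical covariates. Then “controlling for X” overstates the average treatment effect, because within every X cell the treated are precisely the people with the biggest gains. All numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.binomial(1, 0.5, size=n)            # a single observed covariate

# Potential outcomes: gains vary even among people with the same X.
Y0 = 1.0 * X + rng.normal(size=n)
gain = 2.0 + 1.0 * X + rng.normal(size=n)   # individual treatment effect
Y1 = Y0 + gain

# Roy-style selection: take the treatment when your own gain beats the cost.
cost = 2.5
D = (gain > cost).astype(int)
Y = np.where(D == 1, Y1, Y0)                # observed outcome

true_ate = gain.mean()                      # about 2.5 in this setup

# "Control for X": difference in means within each X cell, then average.
est = np.mean([Y[(D == 1) & (X == x)].mean() - Y[(D == 0) & (X == x)].mean()
               for x in (0, 1)])
print(true_ate, est)  # the conditioned estimate overstates the true ATE
```

Within each X cell the treated group’s average gain is the gain distribution truncated from below at the cost, so the conditioned comparison recovers something larger than the average treatment effect no matter how finely you slice on X.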

This isn’t quite the same thing as saying “I seriously doubt your model” or “I seriously doubt that the only confounders you need are the ones in your model”. It’s more like saying “I’m sorry, but if you say to me one more time that anyone on this planet is ever irrational on their own, we’re going to have to go outside and talk this out like adults.” Those are fighting words to economists. Conditional independence is basically saying that there exist people whose entire rationality is contained inside those covariates, such that when you line up people with the same ones, any differences in their decision making come from coin flips. And while the structures statisticians and econometricians have built from it are mathematically elegant, for many applied microeconomists it’s a tough pill to swallow.

I haven’t heard this kind of complaint before — that certain kinds of rationality may indeed *never* allow us to even contemplate selection on observables. But I think maybe in my mind it could be why you saw, over the last 50 years, the push away from simple covariate adjustment toward more explicit randomization outside of the person’s own decision making. External factors that randomize treatments aren’t violating a person’s rationality any more than winning the lottery does. I think perhaps those things that come from above, particularly when they are randomizing devices, tend to just have more prima facie plausibility to many economists than believing in covariate adjustment, and again, not because covariate adjustment is mathematically wrong. It’s just that rational choice is the one model economists as a group have clung to up until now. And maybe one day we won’t, and when we don’t, perhaps conditional independence will seem appealing. Other fields don’t have nearly the addiction to rationality as an axiom like we do, and so maybe that’s why matching and weighting and covariate adjustment are a lot more palatable there. But I think deep down, as economists stare at that conditional independence equation and what it entails really sinks in, it comes across as almost offensive, and so they skip those chapters in the book.

Thanks a lot for this article. I've thought about this problem a lot (I too am allergic to the CI assumption), but I've never read something summarizing my misgivings quite so clearly. One other way I'd put the conditional independence assumption: conditional on X, rational actors don't have any additional information about their own delta, which seems pretty hard to believe.

I come from a philosophy background, which is also accused of being ‘too rational’. Aristotle found his way out of causal problems by distinguishing between the thing and the thing qua itself and I wonder if that could be helpful?

Is it not just a matter of ensuring you’re plotting tokens with tokens (‘qua’ things) and not plotting types?

For example plotting tokens of ice cream sales (at one time) with tokens of hot weather (at one time) gives you causal inference. Plotting tokens of crimes (at one time) with types of ice cream sales (at different times) does not.

Maybe I have oversimplified, but that’s what I suspect Aristotle would think (admittedly in a pre-econometrics world!)