Many Analyst Designs, Data Preparation and the Sources of Non Standard Errors

Feb 27, 2025


This is about a new paper by Nick Huntington-Klein and more than 100 coauthors. Before I dive into it, though, let me set it up. I want to frame this substack in terms of two things: bias in studies versus uncertainty in studies. I think they’re related in our minds but are technically not the same thing. But first, let’s flip a coin 3 times and see if this will be paywalled!

And it will be paywalled. Consider becoming a subscriber today! In this post I will talk about a new paper by Nick Huntington-Klein et al. entitled “The Sources of Research Variation in Economics”. I think it’s very interesting and worth your time, even if you don’t become a paying subscriber today! It could save a life!

Sources of Uncertainty in Empirical Causal Inference

Rung 1: Malfeasance and publication bias

Over the years, we have seen growing scandals in the sciences related to fraud. We all can name instances, so I won’t, but these range from outright data fabrication and data manipulation to p-hacking and publication bias. This level of malfeasance is not what I’m going to talk about today, though it might feel like what I’m going to talk about is the same thing.

Rung 2: Sampling uncertainty

On a second rung is statistical uncertainty. Specifically, the kind of uncertainty that our standard errors are meant to measure. Imagine we could sample the US population repeatedly. We pull 5%, run a regression. We pull 5%, run a regression. We do that 10,000 times and end up with 10,000 regression coefficients each corresponding to a particular random sample. The sampling distribution of an estimator has a hypothetical standard deviation, and our standard errors are meant to measure that.
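To make the thought experiment concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the "population" is synthetic (20,000 units with a true slope of 2), the helper `ols_slope_and_se` is a made-up name, and I draw 2,000 samples rather than 10,000 to keep it fast. It compares the standard deviation of the slope across repeated 5% samples with the average of the usual OLS standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population (an assumption, not real data): y = 2x + noise
N = 20_000
x = rng.normal(size=N)
y = 2.0 * x + rng.normal(size=N)

def ols_slope_and_se(xs, ys):
    """Slope of a bivariate OLS regression and its classical standard error."""
    xc = xs - xs.mean()
    slope = (xc @ ys) / (xc @ xc)
    intercept = ys.mean() - slope * xs.mean()
    resid = ys - intercept - slope * xs
    sigma2 = (resid @ resid) / (len(xs) - 2)  # residual variance
    return slope, np.sqrt(sigma2 / (xc @ xc))

n = N // 20      # pull 5% of the population
draws = 2_000    # number of repeated samples
slopes = np.empty(draws)
ses = np.empty(draws)
for i in range(draws):
    idx = rng.choice(N, size=n, replace=False)
    slopes[i], ses[i] = ols_slope_and_se(x[idx], y[idx])

sd_of_slopes = slopes.std(ddof=1)  # spread of the sampling distribution
mean_se = ses.mean()               # what a typical standard error reports
print(f"SD of slopes across samples: {sd_of_slopes:.4f}")
print(f"Average reported SE:         {mean_se:.4f}")
```

In this setup the two numbers should land close to each other, which is the sense in which a standard error stands in for the spread of a sampling distribution we never actually get to see.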

Rung 3: Design-based uncertainty

Without getting into details, there is another kind of uncertainty that is not about sampling. In causal inference, it is the uncertainty that comes from not knowing the counterfactual. In this situation, we may be thinking of the uncertainty that comes from treatment assignment. Randomization inference comes to mind as a way of quantifying that.
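A rough sketch of randomization inference, under assumptions I am making up for illustration: 100 fixed units, half of them treated at random, a constant treatment effect of 2, and a difference in means as the test statistic. The sample is held fixed and only the treatment labels are reshuffled, so all of the variation comes from assignment rather than from sampling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment (all values are assumptions): 100 units, 50 treated
n = 100
treated = np.zeros(n, dtype=bool)
treated[rng.choice(n, size=n // 2, replace=False)] = True
outcome = rng.normal(size=n) + 2.0 * treated  # constant effect of 2

def diff_in_means(y, d):
    """Mean outcome of the treated minus mean outcome of the controls."""
    return y[d].mean() - y[~d].mean()

observed = diff_in_means(outcome, treated)

# Under the sharp null of no effect for any unit, outcomes would not change
# if treatment were reassigned, so we can recompute the statistic under
# many alternative assignments of the same 50 treated slots.
perms = 5_000
null_stats = np.empty(perms)
for i in range(perms):
    null_stats[i] = diff_in_means(outcome, rng.permutation(treated))

# Two-sided p-value: share of reassignments at least as extreme as observed
p_value = np.mean(np.abs(null_stats) >= abs(observed))
print(f"Observed difference in means: {observed:.3f}")
print(f"Randomization p-value:        {p_value:.4f}")
```

The p-value here answers a design-based question: across the assignments that could have happened, how often would a gap this large show up if treatment did nothing to anyone?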

Rung 4: Non-standard errors
