Some Stylized Facts About Econometrics Citation Patterns, Diff-in-Diff Citation Patterns, and Software Downloads
Or, "How I couldn't go to sleep and stayed up to 2am doing fake research and rejected all my theories about something no one else but me probably cares about"
It likely goes without saying, but I'll say it anyway since maybe some readers don't know this: since 2018, a series of econometrics papers on difference-in-differences has absolutely exploded in popularity in economics. What started as a revision to practice has become, separate from that alone, a peculiar citation phenomenon unlike anything I have been able to find in the history of econometric citations.
When searching for a house, you'll typically get "comps," which are nearby homes with similar characteristics, so we can put the house we're interested in into better perspective. I'm going to list "total cites" and "best year cites," as well as the time it took to reach those points. So, here are some of the comps.
Arellano and Bond (1991) has around 46,000 cites, but it took decades to reach that level. This one is my leading candidate for a breakaway paper in econometrics that went on a tear. In 2001, ten years after it was published in 1991, it got 300 cites annually. Last year it had its historically highest count with 3,200 cites for the year. I'm going to call this a historic example of a "meteoric impact econometrics paper" because of its total cites, its year-over-year increases in cites, and its large number of annual cites, which has grown year over year. Its Stata package, xtabond2, is also the fifth most downloaded user-created package.

Hal White's 1980 heteroskedasticity paper has 36,000 cites but hit its max annual count around 2013 with 1,437 citations that year. Now this one is more of an example of a paper that has become so influential that it no longer gets cited. We don't cite Gauss every time we use OLS, and we don't cite Hal White when we use heteroskedasticity-robust standard errors. But the point is the same: its citation pattern was a slow crawl that eventually swamped the economics profession so thoroughly that it was generating over a thousand cites a year.
Abadie's seminal 2003 (7,000 cites) and 2010 (7,800 cites) synthetic control papers belong in the comp set too.
Imbens and Angrist's 1994 LATE paper in Econometrica has over 7,100 cites. Last year was its highest yet, with 591 cites in 2024. Their 1996 IV paper with Rubin in JASA is even higher at 8,400 cites, and last year was also its best year, with almost 600 annual cites.
Rosenbaum and Rubin (1983) has 42,000 cites. It's the second most-cited paper in the history of Biometrika, if not the most cited. But its influence was a slow boil. In 2000, 17 years after its publication date, this classic paper introducing the propensity score had 133 citations for the year. Last year it had over 3,000 cites, and it's on track to have even more this year.
And of course, Heckman (1979). This seminal paper on sample selection bias currently has around 43,000 cites after 46 years. And I think it was actually a runaway hit, because in 1986 it had 164 cites for the year. Its max annual citation count came in 2014, when it hit 2,000 cites for the year; last year it got 1,900. (Interestingly, the Arellano and Bond (1991) paper, published 12 years later, reached Heckman's current cite count 13 years sooner.)
Not going to leave behind my distant cousin in the causal inference family: Engle and Granger (1987) now sits comfortably at 55,500 cites. Its peak was in 2017, when it got a little over 2,500 cites for the year. But it's been declining; last year it got "only" a measly 2,000 cites.
Daniel McFadden’s 1972 article on the conditional logit has around 28,000 cites after 53 years. Its peak was in 2021 when it got 1,584 cites. Last year it got 1,450.
Lars Peter Hansen's classic 1982 Econometrica on GMM currently has close to 20,000 citations. Its best year was 2023, when it got just under a thousand cites. Last year it got exactly 900.
There are other influential papers, but they go back further in time (e.g., Anderson and Rubin 1949; Haavelmo 1944), and what I've noticed is that the phenomenon I'm going to document does seem to reflect a growing tendency for hits to become meteoric as time has passed.
So, with the caveat that I am basing this on my own very anecdotal eyesight and without more data, I nonetheless get the sense that the "econometrics hits," measured in citations, could be shifting over time. I think that still won't explain the things I'll be documenting below, but nonetheless that's my feeling. It's not that past papers did not reach giant levels (I just documented numerous ones that did), but rather that the speed with which they reach them does seem to be changing, and for the outliers, maybe speeding up. I suspect this is a function of a few things, some of which Dan Hamermesh has documented in analyzing six decades of trends in economics publishing.
- The economics profession has shifted over time from pure theory to pure empiricism.
- And if the profession shifts from pure theory to pure empiricism, econometricians' marginal product, measured in citations, would seem to be growing too.
I guess that’s the same thing, in two bullet points, but as this is a substack, I’ll allow it.
So anyway, I think those are reasonable candidates. Which is why I want to move now to the following papers, so that we have some context for what is, by historic measures in econometrics, a really weird pattern in citations.
The DiD Credibility Reformation: By the Numbers
I decided to focus on six papers that I will refer to collectively as the Kings and Queens of the Diff-in-Diff Credibility Reformation. I chose the name "credibility reformation" because "credibility revolution" was taken, plus I'm Protestant. Anyway, the six papers I am going to focus on are: Callaway and Sant'Anna; Bacon; Borusyak, Jaravel and Spiess; de Chaisemartin and D'Haultfœuille; Sun and Abraham; and Baker. I included Baker because he's been following a similar trend as the others.
A couple of omissions before I get into it, though. First, I don't include Gardner's two-stage diff-in-diff because John doesn't have a Google Scholar page, and while I found that his paper has over 700 citations, I can't easily track the annual counts without an author Google Scholar page (or at least I couldn't figure out how, and Cosmos told me "yeah it is hard," so I didn't really try). Also, it's not yet published. So a Google Scholar page and publication status apparently are part of my selection criteria for this substack manuscript.
Second, I do not include synthetic diff-in-diff, which has also surged: it now sits at 1,500 cites since it first appeared in 2019 (published in 2021) and shows no signs of slowing down. But despite its name, I consider it part of the synthetic control family, because in this analysis I am defining diff-in-diff as any method that identifies the ATT under parallel trends, and synthetic DiD relaxes the parallel trends assumption.
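To be concrete about that definition, the estimand and the assumption I have in mind, in canonical two-period potential-outcomes notation, are:

$$\text{ATT} = \mathbb{E}\big[Y_t(1) - Y_t(0) \mid D = 1\big]$$

$$\mathbb{E}\big[Y_t(0) - Y_{t-1}(0) \mid D = 1\big] = \mathbb{E}\big[Y_t(0) - Y_{t-1}(0) \mid D = 0\big] \quad \text{(parallel trends)}$$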
So, with that out of the way, here are the kings and queens of the credibility reformation.
Some of these papers have longer lives than others. But that's the annual cites. Here are the numbers in tabular form. The gray dots mean that either the paper did not yet exist in working paper form and/or Google Scholar did not yet have data on it. Bacon's paper was bouncing around in working paper form going back to 2018, but Google Scholar does not list cites for its 2018 version (though I bet some of these 2020 cites are in fact from 2018-2019). Similar stories hold for the others. Nonetheless, gray dots mean "paper didn't exist" and/or "not yet cited."
The dark blue means "working paper but not yet published." The light blue means "published." And that too is unusual. While I suspect the increases we see in the year of publication are at least somewhat due to publication itself, I don't think that explains everything, because many of these papers were on a tear before they were even published. BJS, for instance, had 1,880 cites before it had ever been published.
Third, the rise. Note that if you click on the "classic econometrics papers" links I embedded above, you'll be taken to Google Scholar pages. There you will see the more typical shape of an impactful econometrics article, which is the gradual rise. But notice the growth rates, which I'll show by taking the natural log of annual cites. These are not small levels (above), and they also aren't small rates of change (below).
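If you want to see why logs are the right lens here, a quick sketch in Python (with made-up citation series, not the actual Google Scholar counts):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up annual citation counts: a slow-boil classic vs. a meteoric riser
years = np.arange(2018, 2025)
meteoric = np.array([40, 150, 420, 1100, 1900, 2600, 3400])
slow_boil = np.array([900, 950, 1000, 1050, 1100, 1150, 1200])

# In logs, a constant growth *rate* is a straight line, so a steeper slope
# means faster proportional growth regardless of the level of cites
plt.plot(years, np.log(meteoric), label="meteoric riser (ln cites)")
plt.plot(years, np.log(slow_boil), label="slow-boil classic (ln cites)")
plt.xlabel("year")
plt.ylabel("ln(annual cites)")
plt.legend()
plt.show()
```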
So the pattern is pretty striking. This collection of papers, five of which are econometric theory papers (Baker's, by contrast, is a reevaluation of several finance papers that used TWFE diff-in-diff with differential timing), is advancing faster than any of the other papers I reviewed. Both Bacon and CS reached a thousand annual cites within two years of publication, and over two thousand by the fourth year. But those papers we reviewed, the breakout classics? They followed the "slow boil" pattern, taking decades to show that kind of annual and total citation count, and for several of them, their max annual cites are still below these papers' annual cites last year, which are still growing.
Why has this happened? Well, I think we all know.
- Diff-in-diff is the most popular method in economics, amounting to over 30% of all applied papers posted to NBER.
- Twoway fixed effects was the workhorse model for decades in what can only be called standard situations, where the treatment hit certain locations at different moments in time, or "differential timing."
- These papers as a group were either backwards-engineering the TWFE estimator in such situations, showing its biases under unrestricted heterogeneous treatment effects, and/or building estimators robust to unrestricted heterogeneous treatment effects under differential timing, or sometimes both.
You take all three together (a popular design; a model thought to be "simple," "intuitive," and "easy to understand"; the discovery that said model was complex, non-intuitive, not easy to understand and, the real killer, potentially biased), and then the delivery of solutions, and you know what you get? You get the Diff-in-Diff Credibility Reformation.
It's also possible that social media helped fuel this, particularly #EconTwitter. One thing happening with the current crop of econometrics papers that did not exist before Twitter and Bluesky is that these alternative media for promoting econometric theory papers have probably allowed young people to circumvent established channels, like the NBER working paper series, to broadcast their work. First, young people do not yet have established professional networks, let alone a name for themselves. Second, NBER doesn't even have an econometrics section. Third, publication lags have made freely accessible working paper series, like SSRN and arXiv.org, extremely valuable, because as I said, these papers were quite influential even before they were published, almost certainly assisted by the fact that they sat on publicly accessible platforms like those two.
Some Stylized Facts About CS Pulling Ahead
So we can agree the pattern is pretty impressive, right? Because now I want to note some weirder things for which I don't have any explanations, namely how CS clearly broke ahead of the pack after its publication. That's despite the fact that two other papers were published the same year, all three papers won paper of the year at the Journal of Econometrics, CS and two other diff-in-diff papers received a Stata package that year, and one paper was published in the AER the previous year with its own Stata package released a year before that. So what gives? Why did CS break away?
I remade the graph to show you that CS breaks from trend between 2021 and 2022 and then grows more steeply than Bacon's paper, overtaking it this year. Now, since it was also published in 2021, you might think this was the "publication effect," but I don't think that makes sense. CS was published in 2021, but so were Bacon and SA. And to make a publication effect even less likely, all three shared the Aigner award for paper of the year at the Journal of Econometrics. Elsewhere I wrote an entire thing about how those three papers appear to have caused the Journal of Econometrics to go from an impact factor of 2 to 10, which exceeded the impact factor of several top 5s, including Econometrica itself.
Diff-in-diff papers and their impact
Guido Imbens and Susan Athey said a few years ago that synthetic control was the most important innovation in causal inference of the last 15 (now 20) years, and as of a week ago, Andrew Goodman-Bacon's 2021 article on difference-in-differences with differential timing had overtaken both of Abadie's synthetic control papers in total cites.
The second thing I thought was that it was caused by Fernando Rios-Avila releasing csdid on Stata in late 2021. I thought it so much that I combed through whatever data I could find. So here are some facts about downloads of the major diff-in-diff packages. First, I found the csdid historic downloads at RePEc, as well as several others. The ones I'll focus on are CS (csdid and drdid, the latter just because Fernando also made it and it's the engine inside csdid), SA (eventstudyinteract), DCDH (did_multiplegt), and BJS (did_imputation). Here's what I found after aggregating to the annual level.
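(For the curious, the aggregation step was nothing fancy. A sketch of the idea in Python, where the column names and numbers are stand-ins for whatever the RePEc/SSC statistics pages actually give you:)

```python
import pandas as pd

# Stand-in monthly download counts; the real files come from the RePEc/SSC
# statistics pages and won't have exactly these column names
monthly = pd.DataFrame({
    "package":   ["csdid", "csdid", "did_imputation", "did_imputation"],
    "month":     ["2021-08", "2021-09", "2021-08", "2021-09"],
    "downloads": [25, 30, 40, 35],
})

# Extract the calendar year from each month and sum downloads within it
monthly["year"] = pd.to_datetime(monthly["month"]).dt.year
annual = monthly.groupby(["package", "year"], as_index=False)["downloads"].sum()
print(annual)
```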
But here are some basic facts.
did_multiplegt, the DCDH estimator from their 2020 AER, shows up in May 2019 and takes off. The year the AER comes out, they're at 400 downloads. In 2021, they have almost 700 downloads. In 2022 they hit their peak with 838, but every year since, they've been declining. Perhaps everyone who will use did_multiplegt now possesses it, or perhaps users have switched over to their better and faster package, did_multiplegt_dyn.
csdid was released in May 2021 on GitHub, but the SSC download statistics don't start until August 2021. Those five months resulted in 110 downloads. Then in 2022, it jumped to almost 500 downloads, and it basically stayed at 500 for three years; this year it's at 137 so far. Also, in May 2025, csdid was the 13th most downloaded user-created package at SSC. But interestingly, over the last 12 months it was the 8th most downloaded user-created package at SSC.

The BJS estimator, did_imputation, came out two months before the SA estimator, eventstudyinteract, both in 2021, and they have followed almost the same trajectory, with BJS having more downloads in 2021 (124 vs 45) and 2022 (328 vs 283). After that, they're basically in lockstep, and as of today they're close: BJS has 1,041 total downloads as of May of this year, whereas SA has 908.
Also, in case you were curious, I wasted a half hour of my life collecting data on how many user-created packages at Stata's SSC have been downloaded over time, and downloads appear to be in decline, which I'm sure is a bit worrisome for that for-profit company.
But, getting back to CS. I said that I thought it was caused by csdid but that I ultimately changed my mind. First, though, let's look a bit more at the data: did CS really break from trend? To check, I estimated event study coefficients using a simple 2xT diff-in-diff design. What's a 2xT diff-in-diff design? Well, you can learn more about that in our recently accepted paper at the Journal of Economic Literature, "Difference-in-Differences: A Practitioner's Guide" (pop pop!), but it's basically the following regression. I'll explain the 2xT regression equation in a moment, after this message from Magnitude!
So back to the event study regression equation. Written out, it is:

$$\text{cites}_{it} = \alpha + \lambda\, \text{CS}_i + \sum_{\substack{t=2019 \\ t \neq 2021}}^{2024} \gamma_t\, \mathbb{1}[\text{year}_{it} = t] + \sum_{\substack{t=2019 \\ t \neq 2021}}^{2024} \delta_t\, \text{CS}_i \times \mathbb{1}[\text{year}_{it} = t] + \varepsilon_{it}$$
This expression is straightforward. I have a treatment dummy equaling 1 if you're CS and 0 if you aren't, and calendar-year dummies that go from 2020 to 2024 for one analysis and from 2019 to 2024 for a different one. I drop 2025, in other words, since I don't have complete data for it. Notice that the specification fully interacts the CS dummy with the calendar dummies. I drop 2021 so it's the baseline for all the coefficients. I don't have standard errors, so I will only plot the coefficients.
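If you want to run the mechanics yourself, here's a minimal sketch in Python using statsmodels' formula API. The citation numbers below are placeholders I made up for illustration, not the real Google Scholar counts:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy paper-by-year panel of annual cites (placeholder numbers, not the real data)
data = {
    "CS":    {2020: 300, 2021: 800, 2022: 1500, 2023: 2100, 2024: 2900},
    "Bacon": {2020: 350, 2021: 850, 2022: 1200, 2023: 1500, 2024: 1800},
    "SA":    {2020: 150, 2021: 500, 2022: 900,  2023: 1200, 2024: 1500},
}
df = pd.DataFrame(
    [{"paper": p, "year": y, "cites": c} for p, ys in data.items() for y, c in ys.items()]
)
df["cs"] = (df["paper"] == "CS").astype(int)

# Fully interact the CS dummy with calendar-year dummies, omitting 2021
# so every interaction coefficient is measured relative to the publication year
model = smf.ols("cites ~ cs * C(year, Treatment(reference=2021))", data=df).fit()
print(model.params.filter(like="cs:"))  # the event-study (diff-in-diff) coefficients
```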
First, this version uses all the papers as controls, but since not all the papers have 2019 data, I can only estimate one pre-treatment coefficient. It appears that there is a slight trend, which I'll check again in a moment, but for now notice that the diff-in-diff estimates (I won't say treatment effects, since I don't know what the treatment is) go from just under 400 additional cites in 2022 (the year after publication), to 600 in 2023, to around 1,100 in 2024.
Note this does not seem like it could be a publication effect, as Bacon and SA were both published in 2021, and DCDH was published in 2020 in the AER. And I don't think it can be a Stata package effect, because while csdid was released in 2021, Brant's R package did came out in 2018, SA and BJS both had their packages come out in 2021, and DCDH already had a package in Stata and R too. Plus, I'll show you some data from Brant's R package, did, momentarily, which casts doubt on the csdid theory I was originally entertaining.
But, because of those pre-trends, I'm going to extend the event study back two years. That means working with only three papers: CS, DCDH, and BJS are the only ones for whom I have data going back to 2019. Same equation as before, only this time I have two pre-treatment coefficients. And since I have a different subset of papers, the 2020 coefficient changes slightly, as do the post-treatment coefficients, though interestingly not by much at all.
Here’s what I find.
- 2019: CS has a slightly negative coefficient of -91
- 2020: CS has a slightly negative coefficient of -76

So the pre-period is drifting up slightly toward the 2021 baseline, and the estimated treatment effects are pretty much the same as before.
Then, around 1am last night, when I simply could not sleep after all this nonsense I'd put myself through, and in a race to just be done, I decided to bound those estimates to see if the pre-trends could explain the effects, including some make-believe Rambachan and Roth (2024)-style stuff.
Basically, I drew a line from 2019 through 2020 and extended it out to see if that trend hit those Xs, and no, it didn't. Then I assumed a quadratic curve starting at 2019, and it didn't touch them either. Then I took the linear extrapolation from 2019 to 2020, added a hundred to the slope, and multiplied it by years after 2021 ("conservative"), and that didn't do it. And then I did the same thing but added 200 to the linear trend, multiplied by years since 2021 ("aggressive"), and that also didn't do it.
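In code, the back-of-the-envelope version of that bounding exercise looks something like this. A sketch only: the coefficients are the rounded values from above, and the "conservative"/"aggressive" slope bumps are the ad hoc 100 and 200 I described, not an actual Rambachan and Roth (2024) procedure:

```python
import numpy as np

# Event-study coefficients relative to the 2021 baseline (rounded values from above)
years = np.array([2019, 2020, 2022, 2023, 2024])
coefs = np.array([-91, -76, 400, 600, 1100])

pre_slope = -76 - (-91)  # slope of the pre-trend from 2019 to 2020: +15 per year

for label, bump in [("linear", 0), ("conservative", 100), ("aggressive", 200)]:
    # Extend the (bumped) pre-trend forward from 2021, where the coefficient is 0
    extrapolated = (pre_slope + bump) * (years - 2021)
    post = years > 2021
    # If every post coefficient sits above the extrapolated trend, then the
    # pre-trend, even inflated, can't account for the post-2021 surge
    print(label, "explains the surge:", not (coefs[post] > extrapolated[post]).all())
```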
So whatever is going on, CS does break away, as early as 2022, and races ahead.
But it wasn't csdid
So, late last night, Brant sent me download data for the history of his R package, did, which basically removed all possibility that csdid was driving the CS surge. csdid should have no effect on R downloads. So it's a perfect falsification test: if you see a similar jump in downloads of Brant's R package, did, when Fernando's csdid went on SSC, then it probably can't be csdid causing CS to get ahead. And in fact, you find the same thing.
First of all, let that sink in: Brant's R package, did, has been downloaded 100 times more than Fernando's csdid package on Stata. ONE HUNDRED. Not 10 times. ONE HUNDRED TIMES. His total downloads are 165,000, whereas Fernando's are 1,591. I think this has to be something about how R users download packages more often than Stata users do, though whether that explains a full factor of 100, I don't know.
But then if you look closely, you see how from 2022 to 2024, Brant's package plateaus at 40,000 downloads a year? Interestingly, Fernando's package plateaued at 400 a year over those same years. So I plotted them again, but to put them on the same scale, I indexed each series to its own 2021 value.
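The indexing itself is one line of pandas. A sketch with stand-in numbers echoing the plateaus I just described, not the exact download counts:

```python
import pandas as pd

# Annual downloads (stand-in values echoing the plateaus described above)
downloads = pd.DataFrame(
    {
        "did_r": [20000, 40000, 40000, 40000],   # Brant's R package (rough)
        "csdid_stata": [110, 500, 500, 500],     # Fernando's Stata package (rough)
    },
    index=[2021, 2022, 2023, 2024],
)

# Divide each series by its own 2021 value so both sit on a common scale
indexed = downloads / downloads.loc[2021]
print(indexed)
```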
Okay, so let’s recap. What did I learn?
- Historically, econometrics hits were a slow boil: a gradual rise in influence.
- The current diff-in-diff papers show a meteoric rise: faster growth driven by larger annual growth rates and larger annual counts.
- For some reason, though, CS experiences a surge starting in 2021 relative to everyone else, one which cannot be explained purely by pre-trends; by 2024, it's around 1,000-1,200 more cites relative to the group's 2021-2024 trend.
- It doesn't seem to be coming from software availability: DCDH already had software; CS had an R package going back to 2018 that was already on fire in terms of downloads; and CS, SA, and BJS all released a Stata package in 2021.
- Brant's R package also sees a big jump in 2022 over 2021, after which it flattens; that flattening in downloads can also be seen in Fernando's Stata package.
And that's it. I've got nothing else to say, and I have no idea why CS would have that break in trend in 2022. But whatever it is, you can see signs of it in the software downloads too, as the downloads of the other packages for which I have data don't show that same pattern. Hate to say it, but I might have just wasted a lot of time doing this.
This was one of the discussions we were having in the department as well. My personal take on this rise of the CS estimator (it might not align with the timing you show, but it might if you check) is the paper "What's trending in difference-in-differences?", mainly because it draws comparisons between CS and the other estimators, and the CS estimator gets a really detailed and intuitive explanation throughout. And there are small things (not really small, but in the grand scheme of things) like CS being able to deal with serially correlated outcomes, and parallel trends being relatively easy to justify over long time periods. Also, the paper discusses the no-carryover assumption of dCDH and its stronger parallel trends assumption, which also kind of tips the choice of estimator towards CS. This is coming from my PhD experience and why I chose CS (mainly due to the context of my research).