Introducing "Design-Based Regression Inference" Workshop Taught by Peter Hull
Information about Peter Hull’s new workshop, “Design-Based Regression Inference”
Starting next Monday, April 22nd, Peter Hull from Brown University is going to be teaching a new workshop on Mixtape Sessions called “Design-Based Regression Inference”. It’ll run Monday, Wednesday (April 24th), and Friday (April 26th), two hours each night, for a total of six hours. It will include lectures and coding exercises with solutions, plus a lively discussion over Discord, just like always, and permanent access to recordings stored on Vimeo. It’s a great opportunity, and I feel you shouldn’t pass it up.
This is, I think, Peter’s fourth or fifth workshop for Mixtape Sessions, but the others have all been on instrumental variables, either IV alone or shift-share IV. He’s a favorite of mine and of everyone else. But this workshop is both new for the platform and conceptually novel. You have maybe never heard of a workshop that goes by the name “design-based regression inference”. So below I want to do a few things to help explain what it’s about.
First, I want to get the logistics out of the way: I’ll give you directions and information about the workshop using pictures and links. After that, I want to give a little background on this word “design”, as it’s deeply rooted in modern causal inference, and if you’re like me, a little background helps. Third, I exchanged a few emails with Peter, asking him questions to get clarification about the workshop. So we will end with a conversation between Peter and me about the workshop and what you can expect.
But the bottom line is, this “design” class is cutting-edge material from econometrics over the last few years. It’s closely connected to work by Peter, Kirill Borusyak, Paul Goldsmith-Pinkham, Michal Kolesár, and no doubt more people still. It is about using prior knowledge of the quasi-random treatment assignment process to build estimators that identify causal effects. And to illustrate its value, he will walk you through many applications: from spillovers in networks, to new concepts like “formula treatments”, to old questions like how to cluster standard errors, to new applications in industrial organization. There’s a lot of regression and IV too.
So, with that said, here’s some information, and remember the same discounts apply: $1 for residents of low-income countries; $50 for students, predoc RAs, postdocs, residents of middle-income countries, and those between jobs; $95 for people with higher teaching loads or in non-tenure-track positions (e.g., visiting); and $595 for everyone else. Here’s the high-level information.
So what is this word “design” anyway and why is it everywhere?
“Design” is a word that you hear a lot in causal inference, but primarily in the economics wing of the causal inference tradition, and more specifically in the natural experiment tradition within economics. Best I can tell, it’s a word we associate with the 2021 winners of the Nobel Prize, David Card, Josh Angrist, and Guido Imbens, and through them it can be traced back to “experimental design”. And I think that’s pretty key, because we associate randomization with experimental design.
But we also associate “natural experiments” with the labor economists born out of the Princeton tradition. Take for instance this article by Peter Hull, Michal Kolesár and Christopher Walters celebrating the 2021 Nobel Laureates, called “Labor by Design”. They go into a decent amount of detail about the winners, noting the links between natural experiments and experimental designs, and they convey throughout the particular flavor of the kind of work the Laureates made a career out of doing. It’s a great piece of intellectual history, and I encourage you to read it.
Or just look at how many people explicitly note “design” in their papers. Kirill Borusyak has four papers on his website with the word “design” in them (here, here, here, and here). Jon Roth has two (here and here). It also came up in an old interview that Ben Zipperer did with Alan Krueger and David Card, when Ben mentioned the phrase “research design”. Listen to what Card said.
Card: And you mentioned research design. I remember Alan was an assistant professor and I was a professor at Princeton and Alan sat next to me. And he, for some reason, got a subscription to the New England Journal of Medicine. (Laughter.) And ‐‐
Zipperer: Intentionally?
Krueger: Yeah. I loved reading the New England Journal of Medicine.

Card: Yeah. And the New England Journal would come in every week, so there was a lot of stuff to read. And the beginning of each article would have “research design.”
Krueger: And “methods.”
Card: Yes, and if you've never seen that before and you were educated as an economist in the 1970s or 1980s, that just didn't make any sense. What is research design? And I remember one time I said, “I don't think my papers have a research design.”
But if you keep reading in that interview, you can also hear them referencing the natural experiment material too, and noting all these subtle strains around “design”, “shocks” and “natural experiments”, all rooted in the empirical labor work of the 70s, 80s and 90s. Listen here as they discuss the origin of the phrase “natural experiment”.
Card: The first person that I saw really use the phrase “natural experiment” was Richard Freeman.
Krueger: That's where I learned it from too. Richard always had an interest in evidence-based natural experiments. He was an enormous fan of the work by LaLonde; also, the paper Orley did in JASA [the Journal of the American Statistical Association] on the negative income tax experiment. Richard always had a soft spot for natural experiments. But I think he used the term differently than we would.
I think he applied it to big shocks. So to him, the passage of the Civil Rights Act was a natural experiment. The tight labor market in the 1960s was another natural experiment. I think the way he viewed it was a bit different than the way it started to get applied, which was that the world opened up and made a change. When Josh Angrist and I looked at compulsory schooling, we looked at a small change. The experiment was just being on one side or the other of the threshold for starting school and then affecting how many years of education you ultimately got because of different compulsory schooling laws.
But that's where I heard the term.
So that’s kind of interesting, I think. The research design concepts came to Krueger and Card via medical trials, i.e., RCTs. But as a group of labor economists, they were using natural experiments that they hoped could mimic the RCT, though it doesn’t always work out that way. Krueger notes that the “big shocks” and the “small shocks” seem to be different things, at least in his mind.
Oddly enough, a lot of these themes show up in the correspondence I had with Peter about his class. Below is a back-and-forth where I tried to pull out of him an “explain it to me like I’m 5” account of what “design-based inference” is. I’m going to let him say it, but one thing you should know is that this is going to be a cutting-edge course in causal inference, in some ways a flowering of the last 50 years of work. So I highly recommend it, no matter where you are or who you are. You will learn a lot about regression, IV, omitted variable bias, and how to take advantage of prior knowledge to estimate the causal effects you care about.
Email correspondence about Peter’s new workshop
Dear Peter
I’m super excited that you are teaching this three-day workshop you’ve named “Design-Based Regression Inference”, starting Monday, April 22nd, and continuing Wednesday, April 24th, and Friday, April 26th. And I appreciate you agreeing to let me interview you a little about it! It sort of feels like we are pen pals.
So let’s get started. I’ve never heard of a class like this before. I did read your Scandinavian Journal of Economics article about the Nobel laureates entitled “Labor by Design”, though, coauthored with Chris Walters (who’s also done a Mixtape Sessions workshop) and Michal Kolesár. So I’m guessing there’s something technical and specific about this idea of “design” that maybe makes your class stand out. Can you tell me a little about what this class is and how it differs, say, from your earlier workshops on instrumental variables and shift-share? And I guess I’m curious what this word “design” means, since I’ve now seen you use it twice.
Sincerely
Scott
Thanks Scott -- I’m super excited to teach it! And thanks for the opportunity to say a little more about what students can expect from this course.
In short, design-based methods use knowledge about the assignment process of as-good-as-randomly assigned shocks to identify certain causal effects or structural parameters. What does that mean exactly?
Well, think about an experiment that randomizes some treatment according to some protocol. The simplest experimental protocol is one in which all individuals face the same probability of assignment to treatment, but we could also imagine much more elaborate protocols where the assignment probability (aka the propensity score) varies across individuals. The point is that, whatever the protocol is, you know the “design” of the randomized treatment, and this knowledge turns out to be very powerful. For one thing, it allows you to construct robust estimators of many different kinds of treatment effects, estimators which you know will “work” without any restrictions on how unobservables (e.g. potential outcomes) relate to observables. One such robust estimator turns out to be linear regression, at least if you pick the controls appropriately given the design. For another, design knowledge can guide appropriate inference methods --- e.g. how to cluster your standard errors, which can otherwise be quite a headache to figure out!
As discussed in the SJE article, a leading insight of the 2021 Nobel Laureates is that empirical work in economics can draw on the strengths of “real” experiments to more robustly or credibly estimate policy-relevant parameters. But it’s taken us some time to really get to the bottom of what it means for an observational data analysis to leverage the design of a natural experiment in this way. For example, part of the class will contrast the design-based approach to other more “model-based” strategies which use restrictions on potential outcomes (e.g. parallel trends assumptions) instead of assumptions on the assignment of shocks for identification. These are also tractable and often very credible strategies, but they’re not design-based. And this can matter, for example in what we now know about the fragility of two-way fixed effect regressions or other diff-in-diff strategies under treatment effect heterogeneity (the now-infamous “negative weights” problem).
Most of the class will focus on these issues, and in particular on how you can most effectively leverage design assumptions and know when you’re in a setting appropriate for them. We’ll primarily discuss linear regression and instrumental variable methods (no surprises there for folks who know me!), but we’ll also have a bit of time at the end to talk about how design can help with estimation of nonlinear or “structural” models.
I should say that I’m most excited for this class because this stuff really is hot off the presses. Many of the issues we’ll be talking about have only come out in the last couple of years, and some of them are still very much being worked out. I’m looking forward to honing and exploring some of the ideas with the students, and hopefully generating some interesting discussions across fields and backgrounds as we learn together!
Peter
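Peter’s point about known assignment probabilities is worth making concrete before we continue. The simulation below is my own sketch, not anything from Peter’s course materials: it shows how, when the propensity score is known by design, a simple inverse-propensity-weighted (Horvitz–Thompson) estimator recovers an average treatment effect without any model of how potential outcomes relate to covariates. All the numbers in it are made up for illustration.

```python
# A minimal sketch of design-based estimation: the propensity score is known
# from the experimental protocol, not estimated from the data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Suppose the protocol assigns treatment with a probability that varies with a
# known covariate. This propensity score is known by design.
x = rng.binomial(1, 0.5, size=n)   # a stratifying covariate
p = np.where(x == 1, 0.8, 0.2)     # known assignment probabilities
d = rng.binomial(1, p)             # realized treatment assignment

# Potential outcomes may depend on x however they like; the estimator below
# places no restrictions on that relationship.
y0 = 1.0 + 2.0 * x + rng.normal(size=n)
y1 = y0 + 3.0                      # true average treatment effect = 3
y = np.where(d == 1, y1, y0)

# Horvitz-Thompson / inverse-propensity-weighted estimate of the ATE,
# built only from the observed data and the known design (p).
ate_hat = np.mean(d * y / p - (1 - d) * y / (1 - p))
print(round(ate_hat, 2))           # close to 3
```

Notice that the estimator never models the outcome at all; the only ingredient beyond the observed data is the assignment probability, which is exactly the “design knowledge” Peter describes.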
Peter
That's pretty cool. This seems like a fairly innovative course then, if it's drawing on the last few years of material. I'm curious though -- we've had IV and observational designs before this more explicitly branded "design-based regression inference," so something must be even newer than it sounds. I was wondering if you could tell me a little bit more about who it is out there that maybe thinks they already know this stuff but, humbly speaking, would in fact really benefit from it. Like what kinds of projects, researchers, industries, academic departments, and fields would find this not just interesting but maybe critically important to their own work?
Scott
Hi Scott,
Great question. I think design-based methods offer several practical insights for people from many different backgrounds and working on many different topics (both academic and not).
One immediate advantage of design knowledge is that it allows you to robustly and efficiently estimate the effects of “formula” treatments, which combine a set of as-good-as-randomly assigned shocks with other non-random variables, potentially in a very complicated way. This is the subject of my recent Econometrica with Kirill Borusyak. We give several examples of such treatments from different fields, such as those capturing network or general-equilibrium spillovers or indicating eligibility for complex public policies, along with a general design-based approach to estimation and inference. The class of formula treatments is quite large, spanning many fields, so I suspect students in the class will be able to come up with many other interesting applications of this method in their own work.
Another advantage of taking design seriously is that it can guide the appropriate clustering of standard errors (or the correct inference methods more generally). This can otherwise be a real headache in applied work, as Khoa Vu once succinctly put it in meme form. But design knowledge makes inference (relatively) easy! So again I think there are lessons here that are quite useful regardless of your background or field.
A final frontier consideration is how well-specified design can help with structural estimation. This is something that Kirill and I are actively working on, and I’m very excited about its potential for work in fields like industrial organization (IO) where I think it’s fair to say that identification can be much murkier than with simpler “reduced form” analyses. In both worlds, design knowledge can relax otherwise strong assumptions about how unobservables relate to observables and our sense is this can help a lot when the model is complicated. As I’ll discuss in the class, in some cases design knowledge can also clarify exactly what the model is “doing” – i.e. why it is useful for extrapolating out from the reduced-form variation given the design.
Here I think folks in industry might particularly benefit: my understanding is that A/B testing is quite common now in tech, for example, and such testing naturally yields design knowledge. My hope is the course will inspire many new and innovative ways to use this variation, potentially in conjunction with other non-random variation or existing models, hopefully giving folks a big leg up on the competition.
Peter
Peter
It’s really awesome you’re doing this. Most people are likely unfamiliar with this new material or its relevance for their work. Issues around clustering have always seemed like throwing darts at a dartboard: I do it one way and just hope no one asks me too much about it! I’m really excited you’re doing this and can’t wait to see how it goes. Before we stop, is there anything you want to say to people who are either on the fence about attending, or maybe are attending and want to get in the right headspace to prepare?
Sincerely
Scott
Scott,
I think that mostly covers it! The only other thing I’d mention is that -- as always with Mixtape courses -- there will be two coding labs to help reinforce the core practical lessons and give students some practice trying them out in existing applications. I’m a strong believer that econometrics is meant to be used and not just studied, so I’m having a lot of fun putting these together. Students will have a chance to work on the labs on their own, then come back together and see how I approach things in a live-coding demo. It should be a lot of fun!
Thanks again for the chance to say a little more about the course and for inviting me to do it in the first place! I’m really looking forward to it.
Peter
So there you go! There’s a lot of new stuff in this workshop: applications to IO and networks, new insights into regression and IV, formula treatments, and more. It’s a very cool workshop, and I highly recommend you come! And remember, even if you can’t make it live, you still get the recordings.