Designing Diff-in-Diff: Continuing Target Parameter Discussion plus Workshop Announcements
Greetings! We are at Baylor concluding the last week before our spring semester starts. I am going away for the weekend for a writing retreat to Austin, where I'll be in an Airbnb off of South Congress finalizing my prep for the new "Economics of AI" class I'm designing and teaching this semester. I leave in an hour sharp, so this substack is a race against time. Here's what you can expect from it.
I will share details about the Mixtape Sessions "Causal Inference I" workshop that begins next Saturday, January 25th, at 9:00AM CST.
I will report the results of the coin flip that determines whether the post will or will not be paywalled.
I will continue my discussion of the first part of the design stage as it applies to concealed carry laws, the empirical exercise I’m doing in this diff-in-diff checklist series.
So let’s begin!
January 25th Workshop, “Causal Inference I”
Every semester, I offer workshops taught by myself and others on my platform, "Mixtape Sessions". Adlerian psychology says that "feelings of inferiority", which all people feel and which are neutral, neither good nor bad, matter for our personal growth, and that one of the ways we strive with costly effort in response to them is by doing things that are social and create a sense of community. Mixtape Sessions is one of those things for me, as is really all of my "Mixtape stuff".
Mixtape Sessions was born out of my intense belief that causal inference is not taught in an accessible and inexpensive way everywhere. I believe, but cannot prove, that this is due to several factors, and if you've been reading the substack a while, you can skip them. But they have to do with sociological factors, field-specific factors, and the fact that causal inference is not the same thing as econometrics or statistics, and therefore in those classes it cannot always be covered in appropriate depth, given there is so much more to both than merely the Rubin causal model and its descendant methodologies.
I am sympathetic to all of that, and so Mixtape Sessions was created as a two-sided platform that allows people from all over the world to come to the workshops and take them with me and instructors from top universities, often elite ones like MIT, Brown, and so forth. We break the workshops into three categories: "The Classics", which are mine; "Singles", which are more detailed treatments of core designs like diff-in-diff taught by someone else; and "Deep Cuts", which are cutting-edge, frontier material not covered, sometimes at all, in "The Classics".
Next Saturday, January 25th, I will begin a new workshop, "Causal Inference I". It will cover the Rubin causal model, randomization and selection bias, directed acyclic graphs and their usefulness, unconfoundedness, regression discontinuity design, and instrumental variables. It's four days spread over two weekends. And if you can't make it but want to come, you should know you can buy a ticket and, whether you make it or not, you still get permanent access to the recordings. It's around 24 hours, give or take, of recorded material, plus code and applications of the code, usually in the form of replication exercises, as well as a lot of suggestions and opinions. I hear from others that it is valuable, and it's open to everyone, whether academic, student, faculty, industry, tech, nonprofit, or government. The pricing is very flexible: $1 for residents of low-income countries; $50 for students, predoctoral RAs, postdocs, residents of middle-income countries, and people between jobs; $95 for people off the tenure track who don't fit the prior category, or tenure/tenure-track faculty who, in their own opinion, carry a high teaching load; and $595 for everyone else. We meet over Zoom, and we talk a lot via Discord.
So click on this link, email me at causalinf@mixtape.consulting for a promotional code if you fit one of the discounted categories, and sign up! You have 8 days!
Coin flip
Today’s post is not paywalled due to tails being more common in a Monte Carlo simulation of coin flips repeated 101 times. Figure and code below.
Python code
import numpy as np
import matplotlib.pyplot as plt
# Step 1: Simulate 101 coin flips (Bernoulli trials)
np.random.seed(42) # For reproducibility
trials = np.random.binomial(1, 0.5, 101) # 1 for heads, 0 for tails
# Step 2: Count outcomes
heads_count = np.sum(trials)
tails_count = len(trials) - heads_count
# Step 3: Data for the plot
labels = ['Heads', 'Tails']
counts = [heads_count, tails_count]
# Step 4: Create the bar plot
plt.figure(figsize=(8, 6))
plt.bar(labels, counts, color=['skyblue', 'lightcoral'], alpha=0.8)
plt.title('Heads vs Tails Monte Carlo over 101 Trials', fontsize=16)
plt.ylabel('Count', fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', alpha=0.3)
# Step 5: Annotate bars
for i, count in enumerate(counts):
    plt.text(i, count + 1, str(count), ha='center', fontsize=12, color='black')
# Display the plot
plt.tight_layout()
plt.show()
Designing your diff-in-diff, Step 1: Define your target parameter
Let’s dive in. Recall that we are using as our running empirical example the evaluation of “right to carry” and concealed carry laws on homicides in the United States. I reviewed the literature on this, including work by Lott and Mustard (1997) and John Donohue’s several excellent articles on the topic, as well as criticisms of the datasets that have been used.
When we start a new project, it is usually born out of subject matter and public policy questions, not a clearly defined target parameter. So what is the difference, and how are causal studies of public policy questions more commonly done when target parameters are not defined?
In our case, we want to know the effect of a state passing concealed carry laws on homicides. The subject matter and public policy therefore are related to the research question of concealed carry and right-to-carry legislation. The outcome of interest is homicide. We are interested in it for whatever reason, but it is usually driven by combinations of subjective curiosity and scientific questions where answers are needed for the hopeful shaping of future public policy. The shaping of public policy is oftentimes the driving force of these types of studies though not always. Not everyone has a burning desire to do that, though in my readings of classical history of thought in economics, I have been quite surprised to learn that an interest in shaping public policy has essentially almost always been the hope and goal of economic inquiry in the beginning, even going back to Adam Smith, Thomas Malthus and David Ricardo. We often think that social scientists are like pure scientists, and sometimes they are. But economists, and I think this is also true for epidemiologists, sociologists, education researchers, and others, and especially those working in industry, are very much hoping that the work they are doing will impact the real world and make it better by changing what is sometimes called the rules of the game.
Target Parameters are not Regressions
Oftentimes, this overall motivation leading one to study the laws and their effects will, though, mask and distort the explicit definition of the target parameter. More often than not, the target parameter will be entirely conflated with an estimator, too. Individuals will say that they are “seeking to estimate the causal effect of concealed carry on homicides” and then immediately write down a regression model like:

$$Y_{st} = \alpha + \delta D_{st} + \gamma_s + \tau_t + \varepsilon_{st}$$

And they will simply assert that \delta is the parameter of interest. But what exactly is \delta? What does “causal effect” mean if it is only expressed inside the estimator? After all, that model is the estimator, and you’ve already chosen regression to estimate it.
The target parameter is not the parameter inside a regression model, not in the way that I mean it. The target parameter is expressed as two things:
It is expressed as an average over individual treatment effects which are defined using potential outcomes, and
It is expressed as an average over a specific population
Thus, we have to be very specific up front about both of these, because once we know which group of people (i.e., specific population) that we are wanting to take averages of their own individual treatment effects, then we can select the estimator best suited to identify it with data, as well as contemplate whether the assumptions required by that estimator are suitable for the data you have, the model you’ve chosen, and the specific specification of that model.
Individual Treatments Can Be Defined but not Calculated
Recall the definition, then, of the individual treatment effect when expressed with potential outcomes notation from the Rubin causal model framework:

$$\delta_i = Y_i(1) - Y_i(0)$$

where Y(1) for person i is whether they died by homicide in a world where their community was covered by concealed carry and right-to-carry laws, and Y(0) is, for the same person i, and this is important, at the exact same moment in time, whether they died by homicide in a world where the same community did not have a concealed carry / right-to-carry law.
The fundamental problem of causal inference is this: if you need both potential outcomes to measure a causal effect, but in the reality of human history only one of them will ever be accessible, then no dataset will ever have both potential outcomes for the same person. Thus no dataset, no matter how large and no matter who created it, including in a scientific lab where treatments are randomized, will be able to precisely measure a person's treatment effect. We do not observe Jonathan's homicide outcome at 7:32AM on January 17th, 2025 in Waco, Texas both in a world where his community legally allowed its residents to carry concealed weapons and in a world where it did not. Only one of those will hold at that point in time, by the law of noncontradiction if nothing else, and so we cannot directly measure individual treatment effects.
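This "missing counterfactual" is easy to see in a simulation, because in a simulation we are playing God: we can write down both potential outcomes for every person, even though any dataset we then "collect" reveals only one of them. A minimal sketch (all names and numbers here are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# In the simulation, BOTH potential outcomes exist for every person.
y0 = rng.binomial(1, 0.05, n)   # homicide outcome in the world without the law
y1 = rng.binomial(1, 0.04, n)   # homicide outcome in the world with the law
delta = y1 - y0                 # individual treatment effects: defined, never observed

# Each person lives in only one world, so a real dataset reveals only one outcome.
d = rng.binomial(1, 0.5, n)                 # which world actually happened
y_observed = np.where(d == 1, y1, y0)       # the switching equation

# We can compute mean(delta) here ONLY because this is a simulation;
# with y_observed and d alone, the counterfactual column is gone.
print(delta.mean())
```

The point of the sketch is the last comment: `delta` exists as an array only because we simulated it. Delete `y0` and `y1`, keep `y_observed` and `d`, and the fundamental problem reappears.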
The ATT and Diff-in-Diff
But we can define treatment effects. And since they are expressed as numbers, we can also average them, even though the averages will suffer from the same problem as the individual treatment effects, since the averages are averages of the individual treatment effects. So, let's create an average now:

$$ATT = E[Y_i(1) - Y_i(0) \mid D_i = 1]$$
It is not technically proper to put the i subscripts in there, as you're not averaging the outcomes of Jonathan alone; rather, you are averaging over all individuals. But if I didn't put them there, you might not see that we have to take means of Y(1) and Y(0), subtract one from the other, in order to create an average treatment effect. There are two things I want to note.
First, as I said, this cannot be calculated; it can only be defined. We are missing for each person either their Y(1) or their Y(0). I live in Waco, and I think people around here can carry weapons on their body concealed. If they can, then they cannot also not do so. They can choose, in other words, whether they do that, but the law either will let them or will not let them; it cannot do both at the same time. Still, individual treatment effects are defined as though that were possible, and so average treatment effects are too. When we take an average over individual treatment effects, we call it an "average treatment effect". But there are different versions of it, because there are different groups of people in your data, and you can target them if you want, at least conceptually.
So that leads to the second point. Notice the vertical bar and then notice what comes after — D=1. What is that? The treatment group are the communities that have concealed carry / right-to-carry laws, and so that is the D=1 group. The control groups are those communities that do not, and so that is the D=0 group.
Any time you are using diff-in-diff to estimate “the causal effect”, you are technically targeting the ATT parameter. The ATT parameter is the average treatment effect for the communities that have concealed carry / right-to-carry laws in place. But with diff-in-diff, since it’s using longitudinal data, it’s even more specific than that. Diff-in-diff targets the average treatment effects of the groups of units assigned to treatment, in all periods after the treatment has happened, which is:

$$ATT(t) = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1] \quad \text{for all post-treatment periods } t$$
Thus if 20 states adopted right-to-carry in 1985, and your data run from 1980 to 1990, then your ATT target parameter is the difference in mean Y(1) − Y(0) for those 20 states, averaged over 1985 to 1990.
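To make that concrete, here is a toy simulation in that spirit (all numbers are mine, invented for illustration): 20 states adopt in 1985, 30 never do, and the data run 1980 to 1990. Because the simulation builds in a common trend and a constant post-period effect, the simple 2x2 diff-in-diff of means recovers exactly the ATT averaged over 1985 to 1990.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1980, 1991)
n_treat, n_ctrl = 20, 30
tau = -2.0   # true effect on homicides per 100k, in every post period

state_fe = rng.normal(10, 3, n_treat + n_ctrl)     # state fixed effects
year_fe = {y: 0.5 * (y - 1980) for y in years}     # common (parallel) trend

rows = []
for s in range(n_treat + n_ctrl):
    treated = s < n_treat
    for y in years:
        post = y >= 1985
        outcome = state_fe[s] + year_fe[y] + (tau if treated and post else 0.0)
        rows.append((treated, post, outcome))

rows = np.array(rows)
treat, post, y_out = rows[:, 0].astype(bool), rows[:, 1].astype(bool), rows[:, 2]

# (treated post - treated pre) - (control post - control pre)
did = ((y_out[treat & post].mean() - y_out[treat & ~post].mean())
       - (y_out[~treat & post].mean() - y_out[~treat & ~post].mean()))
print(did)   # recovers tau because state effects and the common trend cancel
```

The state fixed effects cancel within each group across time, and the common year effects cancel across groups, which is exactly the parallel-trends logic that makes the ATT the parameter diff-in-diff identifies.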
Population
Recall that I said that the ATT is not just the mean of Y(1)-Y(0), though. It is also that mean for a given population. What is a population exactly and how does that population connect to data?
Fundamentally, the population is the individual units, and in the social sciences, the units are usually individual people. We want to know if Jonathan was murdered. But when it comes to data, the units can be individuals, such as we find in panel surveys like the NLSY97 (where we follow the same individuals over time) or in repeated cross sections like the Census (where we do not follow the same individuals over time, but we do follow different individuals over time), or they can be aggregates up to some level, such as the county, city, state, country, commuting zone, school, or firm level.
Target parameters are neutral definitions of aggregate treatment effects, but they are also normative in a sense. To the economist, "positive" is an adjective that simply means describing the world. And so the ATT is positive in that it describes an average. But "normative" means "what should you then do?" Should we raise the minimum wage? Should we get rid of social security? These are policy recommendations, preferences, ethical statements, or even aesthetic ones. And while the ATT as a definition is not normative, to state that the ATT will be your desired target parameter is normative, because you and your team have said not only that the ATT exists, but that it is the parameter you care about.
The following are different ATT parameters, and while they all average treatment effects over individuals, they do not have the same weights, because they use different levels of aggregation. Not only will they fail to be the same number; they very well might have different signs.

$$ATT_i = E[Y_i(1) - Y_i(0) \mid D_i = 1]$$

$$ATT_s = E[Y_s(1) - Y_s(0) \mid D_s = 1]$$

$$ATT_c = E[Y_c(1) - Y_c(0) \mid D_c = 1]$$

$$ATT_{cz} = E[Y_{cz}(1) - Y_{cz}(0) \mid D_{cz} = 1]$$
Each of these are aggregations over the same group of people, but each also correspond to different datasets. The first ATT corresponds to individuals, i, exposed to concealed carry / right-to-carry laws. The second ATT corresponds to counts of murders in states s exposed to concealed carry / right-to-carry laws. The third ATT corresponds to counts of murders for counties c exposed to concealed carry /right-to-carry laws. And the fourth ATT corresponds to a large panel dataset of commuting zones, cz exposed to concealed carry / right-to-carry laws.
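The "different signs" claim is easy to see with a stylized two-state example (all numbers hypothetical, chosen by me to make the point): a big state with a small positive per-capita effect and a small state with a large negative one. Weighting each state equally gives one answer; weighting each person equally gives another, with the opposite sign.

```python
import numpy as np

# Hypothetical: State A has 9.5M people, effect +2 homicides per 100k each;
# State B has 0.5M people, effect -20 per 100k each.
pop = np.array([9_500_000, 500_000])
effect = np.array([2.0, -20.0])   # per-100k treatment effect in each state

att_states = effect.mean()                     # state-level ATT: each state weighted equally
att_people = np.average(effect, weights=pop)   # person-level ATT: population-weighted

print(att_states, att_people)   # -9.0 vs 0.9: same treatment effects, opposite signs
```

Nothing about the underlying individual effects changed between the two lines; only the weights did. That is exactly why the choice of aggregation level is part of defining the target parameter, not an afterthought about which dataset happened to be available.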
Which of these target parameters do you want and why that one and not one of the others?
There is no right or wrong target parameter in one sense, because all target parameters, as I said, are positive and descriptive. They are merely averages, and averages aren't good or bad; they just are. They are Zen-like in that sense: empty of all things other than a number.
But if you are doing the study, then you have for some reason chosen to focus on one and not the others, and that is normative, not positive, because you have taken a stand and said that to you and your team, the most important question is to figure out the ATT for a given level of aggregated units. So the question is: why might you care about the ATT for states versus the ATT for counties, and what exactly is different between them?
I’m going to stop there because I want to leave that in the air for your contemplation. In the next post, I will pick up this topic and talk about how there are datasets of individuals, datasets of states, datasets of counties, and datasets of commuting zones, and how averaging over each of them is not the same: it will not yield the same number, and with heterogeneous treatment effects it may not even yield the same sign. I simply want to leave you with this already lengthy post so that you can be thinking about target parameters measured with potential outcomes averaged over populations, and that populations can be individuals or aggregations of individuals, and let you mull that over.
Now wish me the best of luck as I head out to Austin to continue prepping my class without any distractions!