Today’s substack is by Kyle Butts, an assistant professor of economics at the University of Arkansas, one of my good friends, and my partner in running Mixtape Sessions. This is a repost of a post he did on his blog here. In it, he talks about causal inference with spatial treatments, something Kyle has published on himself. I’ll turn it over to him now. Welcome Kyle again as a guest writer for the substack! And don’t forget to sign up for Causal Inference I, which starts Saturday! Below is the fall schedule, followed by Kyle’s post.
Introduction
Treatment often occurs at a point in space: a new apartment is constructed; a new firm opens on a street; or an abandoned lot is cleaned up. Evaluating the effects of this treatment, like all causal inference, requires finding a good control group.
This blog post is meant to serve as a starting point when thinking about this. I will cover two papers, one of my own and an excellent paper by Michael Pollmann. Each offers a different strategy for selecting a control group, so I want to summarize each strategy and its relative advantages and disadvantages.
As a leading example, we will consider evaluating the impact of a new apartment on rents in the local area. Our dataset would consist of housing units and their observed rent. Treatment is the set of points where apartments were built. The outcome of interest is the change in rents before vs. after the apartment is built.
Butts (2023)
My work considers a common strategy in urban work. The method considers an “inner ring” and an “outer ring”, where the inner ring is assumed to be impacted by treatment and the outer ring is not. The idea here is that by making the outer ring small enough, you’re looking at units within the same “neighborhood” (e.g. looking at 600m around a new apartment). Therefore, any neighborhood-level time shocks (changes to amenities, demand shocks, firm openings, etc.) would presumably be the same for all the units.
If you believe this “common neighborhood shocks” assumption, then counterfactual trends are the same as you move from the treatment location toward the outer ring. With this assumption, you can estimate a “treatment effect curve” that traces out treatment effects over space. This effectively says all shocks to rent are equal for housing within the outer ring.
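As a concrete sketch of this logic, the treatment-effect curve can be computed by comparing the average rent change in each inner distance bin to the average change in the outer ring, which stands in for the common neighborhood trend. Everything below (the data, the bin widths, and the `ring_estimator` helper) is hypothetical, not code from the paper:

```python
# Ring estimator sketch: the outer ring pins down the counterfactual
# neighborhood trend; the treatment-effect curve is each inner distance
# bin's mean change in rent minus the outer ring's mean change.
# All numbers and bin edges below are hypothetical.

def ring_estimator(units, inner_max=0.3, outer_max=0.6, bin_width=0.1):
    """units: list of (distance_to_treatment_km, change_in_log_rent)."""
    # Control group: units in the outer ring (inner_max, outer_max]
    control = [dy for d, dy in units if inner_max < d <= outer_max]
    counterfactual_trend = sum(control) / len(control)

    # Treatment-effect curve: one estimate per distance bin within inner_max
    curve = []
    edges = [i * bin_width for i in range(int(round(inner_max / bin_width)) + 1)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        treated = [dy for d, dy in units if lo <= d < hi]
        if treated:
            effect = sum(treated) / len(treated) - counterfactual_trend
            curve.append(((lo, hi), effect))
    return curve

# Hypothetical data: rents fall near the new apartment, flat further out
units = [(0.05, -0.10), (0.15, -0.04), (0.25, -0.01),
         (0.35, 0.02), (0.45, 0.02), (0.55, 0.02)]
for (lo, hi), effect in ring_estimator(units):
    print(f"{lo:.1f}-{hi:.1f} km: {effect:+.3f}")
```

Flat estimates in the outermost inner bins, as in this toy data, are the kind of evidence that supports the common-neighborhood-shocks assumption.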
As an example of what these estimates look like, I revisit a paper studying the effect of a sex offender moving onto a street on nearby home prices. There is a very large negative effect on the same city block as the offender, but the effect quickly moves to 0. After about one-tenth of a mile, trends in home prices are flat, supporting the “common neighborhood shocks” assumption.
However, there are some cases where you might not believe the common trends assumption. For example, think of a polycentric city with multiple “business streets” that have a lot of storefronts and apartments. This is where new apartments will be built. But as you move outward from the business area, you tend to enter more “residential streets” with more single-family homes. Then the inner ring will consist mostly of apartments on dense streets and the outer ring will be mostly residential streets. If there are shocks specifically to residential homes or specifically to apartments, the inner and outer rings will be on differential trends.
Pollmann (WP)
Michael Pollmann’s paper (R&R at Econometrica) takes an alternative approach. Instead of comparing units that are really close to treatment to units slightly further away, he focuses on finding comparable locations that could plausibly have been treatment locations. In our apartment example, we would look for parcels that were very likely to have a new apartment built on them, but did not due to random chance.
This is really helpful if we think there is a difference between the kinds of units that are very close to treatment (inner ring) and the kinds of units that are slightly further away (outer ring). Instead, this approach compares units very close to treatment to other units that are very close to an unrealized potential treatment location. That is, which housing units are near a place that very likely could have had a new apartment built, but did not for idiosyncratic reasons? In our example, this would be comparing “business streets” with a new apartment to other “business streets” that did not get one.
The difficult part of this approach is therefore finding a good set of comparison locations that could have been treated but, by random chance, were not. The simplest way to think about doing this is via a matching-type strategy. Imagine looking at a bunch of parcels and measuring information about each one (e.g. zoning, lot size, its census tract, and characteristics of the current residents). Then, you would find comparison parcels that are similar along all of these dimensions (i.e. matching).
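A minimal sketch of that matching step, with hypothetical parcel covariates and a simple squared-distance metric (in practice you would standardize covariates before computing distances so no single variable dominates):

```python
# Matching sketch: for each parcel where an apartment was built, find the
# most similar parcel where one was not, using parcel-level covariates.
# The covariate names and values here are hypothetical.

def nearest_match(treated, candidates):
    """Match each treated parcel to the candidate with the smallest
    squared covariate distance. Parcels are dicts of numeric covariates."""
    keys = treated[0].keys() - {"id"}

    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in keys)

    return {t["id"]: min(candidates, key=lambda c: dist(t, c))["id"]
            for t in treated}

treated = [{"id": "A", "lot_size": 0.5, "zoned_multifamily": 1, "median_income": 45}]
candidates = [
    {"id": "B", "lot_size": 0.6, "zoned_multifamily": 1, "median_income": 47},
    {"id": "C", "lot_size": 2.0, "zoned_multifamily": 0, "median_income": 90},
]
print(nearest_match(treated, candidates))  # {'A': 'B'}
```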
However, Pollmann points out that treatment is often not assigned to a location based solely on the location’s own characteristics, but also on the characteristics of the neighborhood. It might matter that a parcel is underdeveloped for apartment construction, but the characteristics of the surrounding parcels might matter too (e.g. are there a lot of consumption amenities nearby that people will pay for?). Another example is the difference between having a lot of low-income residents in a building versus being on a street with lots of low-income buildings. Therefore, we should really be thinking about matching both on the location’s own Xs and the neighboring units’ Xs.
In other words, imagine you have maps of all the units and their values along a bunch of different X variables. Then, you want to match not only on your own X, but on the distribution of Xs among your neighbors. This creates a bit of a high-dimensional covariate problem: even if the Xs are low-dimensional, spatial functions of units’ Xs can be high-dimensional (my next-door neighbor’s X, the average of my street’s X, the average of my tract’s X, etc.).
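One way to see the dimensionality problem is to build the features directly: even a single raw covariate expands into one feature per radius. The coordinates, radii, and values below are hypothetical:

```python
# Sketch of building spatial covariates: each parcel's own X plus the
# average of its neighbors' X within several radii. One raw covariate
# becomes len(radii) + 1 features, so a handful of Xs and radii quickly
# produces a high-dimensional matching problem. All values hypothetical.
import math

def spatial_features(parcels, radii=(0.1, 0.5, 1.0)):
    """parcels: list of (x, y, value). Returns one feature row per parcel:
    [own value, mean of neighbors' values within each radius]."""
    rows = []
    for (x, y, v) in parcels:
        row = [v]
        for r in radii:
            neigh = [v2 for (x2, y2, v2) in parcels
                     if (x2, y2) != (x, y)
                     and math.hypot(x2 - x, y2 - y) <= r]
            row.append(sum(neigh) / len(neigh) if neigh else 0.0)
        rows.append(row)
    return rows

# Three parcels on a line (km coordinates), one covariate each
parcels = [(0.0, 0.0, 10.0), (0.05, 0.0, 20.0), (0.3, 0.0, 30.0)]
print(spatial_features(parcels))
```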
Section 4.2 of the paper shows a really cool way of estimating propensity scores based on spatially varying data. Pollmann uses a convolutional neural network to estimate a propensity-score map. Convolutional neural networks are popular for analyzing image data because they look at a focal point and “see” the characteristics around that point (i.e. they look at the neighborhood). The neural net then learns to predict treatment based on its view of the neighborhood (i.e. it fits a propensity-score model). The method is a bit complicated, but fortunately there is example code you can use if you try to implement it.
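This is not Pollmann’s implementation, but the core idea of a convolution can be sketched in a few lines: each location’s score is a function of a window of its neighbors’ characteristics. Here I use a fixed 3×3 average pushed through a logistic link; a real CNN would learn many such filters from the data, and the grid values and weights are hypothetical:

```python
# Toy version of a convolutional propensity model: the score at each
# grid cell depends on the average characteristic in its 3x3 window
# (i.e. its "view" of the neighborhood), mapped through a logistic link.
# The fixed weight and bias stand in for parameters a CNN would learn.
import math

def neighborhood_score(grid, weight=2.0, bias=-1.0):
    n = len(grid)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            window = [grid[a][b]
                      for a in range(max(0, i - 1), min(n, i + 2))
                      for b in range(max(0, j - 1), min(n, j + 2))]
            z = weight * sum(window) / len(window) + bias
            scores[i][j] = 1 / (1 + math.exp(-z))  # logistic link
    return scores

# Hypothetical map of a neighborhood characteristic (e.g. commercial density)
grid = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(neighborhood_score(grid))
```

Cells surrounded by high values get high scores even if their own value is low, which is exactly the “it’s the neighborhood, not just the parcel” point above.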
From this neural network, you would receive a “heat map” of where treatment is likely to occur. This map might look like this map of lightning strike intensity. From this propensity-score map, you could draw placebo locations (according to the propensity score) to use as the comparison units. If this were your map, you would end up with a lot of placebo locations in Texas, Louisiana, and Florida.
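Given such a map, drawing placebo locations amounts to weighted sampling among untreated candidate locations. A sketch with hypothetical scores:

```python
# Sketch of drawing placebo locations from a propensity-score map:
# candidate (untreated) cells are sampled with probability proportional
# to their estimated treatment propensity. Scores here are hypothetical.
import random

def draw_placebos(score_map, n, seed=0):
    """score_map: dict mapping candidate location -> propensity score."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    locations = list(score_map)
    weights = [score_map[loc] for loc in locations]
    return rng.choices(locations, weights=weights, k=n)

score_map = {"cell_1": 0.80, "cell_2": 0.15, "cell_3": 0.05}
placebos = draw_placebos(score_map, n=10)
print(placebos)  # high-propensity cells appear most often
```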
Conclusion
We have covered two primary ways you might analyze treatments that occur at a point in space. Here are the cliff notes:

1. Compare points very close to treatment to those slightly further away (assuming these units look similar). This approach requires assumptions on where treatment is located and how units might change as you move away from treatment.

2. Compare units very close to the treatment point to comparison units that are very close to plausible alternative treatment locations. This approach requires assumptions on the comparability of treated units to the selected placebo locations.