This is very interesting. Good point about being led by 'ethical' considerations. I analyse a lot of survey data and noticed a few years ago that the more you aggregate data from the same source (i.e. from the same survey), the stronger the correlations between variables become. In a recent example, the correlation between two variables was 0.609 across 16,358 individual cases (i.e. respondents), 0.919 when those cases were aggregated into 45 monthly averages, 0.949 when combined into 15 quarterly averages, and 0.995 when aggregated into 4 annual averages. If you plot these with the number of units on a logarithmic y-axis (base 10), the points fall in a straight line. It's as if aggregation eliminates noise from the measures (in this case probably respondents' inconsistencies, or the varied considerations they bring to using a performance scale for different aspects of a service), distilling them into points and clarifying the underlying relationship in the process. Of course, the trade-off is that you lose sample size at each level of aggregation, starting with 16,358 cases and ending with just 4!

Excellent post on an important and neglected topic.

Thanks bud. I can never tell if I’m saying something obvious or irrelevant. I think it’s a big deal, though I admit that for some reason I find it quite hard to explain to students. I’m still struggling to find the perfect example. I’ve been trying some simple tables lately, so I may try that again.

In panel regressions, when using aggregate data (say at the state level), the ATE is wrong if the weighting is wrong. Typically, if one weights by state population, one is overweighting, and the ATE is dominated by treatment effects in just a few large states. If one does not weight, the small states dominate. Since the treatment effects are certain to differ between states, ATEs are likely to depend on the weight used. It is easy to see why a regression on aggregate data would reach different conclusions from an individual-level analysis if the former uses the wrong weights. The articles you give as examples are behind paywalls, so I cannot tell what the weights were or how they were justified, but I'll bet they either did not weight or weighted by population, so that the ATE in the aggregate-level study is wrong.

The proper weight (a function of population) should be one that gives equal influence to states of different sizes. To find it, one can use the Breusch-Pagan test, trying different weights until finding the one that leaves the least heteroskedasticity. Better yet, one can add up the dfbetas on the treatment variable for each state and try different weights until the dfbetas are close to the same for each state. In my experience, the proper weight differs greatly from regression to regression. It depends partly on the size of the dependent variable, since small numbers show more relative variation. Thus the proper weight in a murder regression is population to the 1.2 power, while for other crimes it is population to much smaller powers, for example population to the 0.3 power in a robbery regression.

In all, this is a data issue. The proper weight, one under which neither small nor large states have excessive influence in the particular data set, is not something that can be determined by theory or mathematical analysis. Lastly, the varying views on proper weighting give researchers degrees of freedom (different weights get different results), and sometimes it looks like a particular weight was chosen to achieve desirable results.
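
A stripped-down sketch of the weight search, under heavy simplifying assumptions: instead of a panel regression, it takes invented state-level effect estimates and averages them with weights population**alpha (alpha = 0 is unweighted, alpha = 1 is population weighting). Each state's influence is measured as the change in the estimate when that state is dropped, a DFBETA-like quantity that is unscaled and so not the exact dfbetas statistic, and the search picks the alpha whose absolute influences are most nearly equal. The state names, populations and effects are all hypothetical.

```python
from statistics import pstdev

# Hypothetical states: (population in millions, estimated state-level treatment effect).
STATES = {
    "A": (39.0, -0.5), "B": (29.0, -0.2), "C": (21.0, 0.1), "D": (12.0, 0.4),
    "E": (6.0, 0.8), "F": (3.0, 1.1), "G": (1.0, 1.5), "H": (0.6, 2.0),
}


def weighted_effect(states, alpha):
    """Weighted mean of state effects with weights population**alpha."""
    weights = {name: pop ** alpha for name, (pop, _) in states.items()}
    total = sum(weights.values())
    return sum(weights[name] * effect for name, (_, effect) in states.items()) / total


def leave_one_out_influence(states, alpha):
    """How much the estimate moves when each state is dropped (DFBETA-like, unscaled)."""
    full = weighted_effect(states, alpha)
    return {
        name: full - weighted_effect({k: v for k, v in states.items() if k != name}, alpha)
        for name in states
    }


def most_balanced_alpha(states, grid):
    """Exponent whose absolute per-state influences are most nearly equal."""
    def spread(alpha):
        return pstdev([abs(v) for v in leave_one_out_influence(states, alpha).values()])
    return min(grid, key=spread)


grid = [i / 10 for i in range(21)]  # alpha = 0.0, 0.1, ..., 2.0
for alpha in (0.0, 0.5, 1.0):
    print(f"alpha = {alpha}: estimate = {weighted_effect(STATES, alpha):+.3f}")
print("most balanced alpha:", most_balanced_alpha(STATES, grid))
```

With these made-up numbers the unweighted estimate is +0.650 and the population-weighted one is about -0.068, so the sign flips with the weighting choice alone.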

Hi Scott, I have one question and one clarification request.

First, for clarification: in the "Minimum Wage Studies" section, you write that under heterogeneous treatment effects aggregation does not matter, but that it does matter when effects are wildly heterogeneous. Do you mean homogeneous in the former case, or is it about the extent of heterogeneity? I ask because, if the extent of heterogeneity is the crucial factor, knowing a critical value for heterogeneity would be useful for future research.

My question: I have been thinking about a scenario where I see no effect on an aggregate (summed) measure but do see one when I weight it by population. My reasoning is that if, for example, the outcome is the number of hospital visits, an increase of 10 visits might mean something very different for different states, whereas an increase of 10 visits per unit of population means the same thing regardless of state size. Do you think, therefore, that aggregation matters when the "treatment effect" we calculate has different implications (in terms of seriousness, or of success versus failure) for different states? In other words, the choice of outcome may dictate whether aggregation matters.
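
A quick worked example of the levels-versus-rates point, with invented numbers: two hypothetical states share the same effect of 10 visits per 100,000 residents, yet in raw counts the effect is forty times larger in the bigger state. Conversely, the same +10 visits in levels is a tiny per-capita change in the large state and a large one in the small state. The state names, populations and the per-100,000 unit are assumptions for illustration.

```python
# Hypothetical states and populations (invented for illustration).
POPULATION = {"Large": 20_000_000, "Small": 500_000}
PER_100K = 100_000


def level_from_rate(rate_per_100k: float, population: int) -> float:
    """Convert an effect in visits per 100,000 residents into extra visits (a count)."""
    return rate_per_100k * population / PER_100K


def rate_from_level(extra_visits: float, population: int) -> float:
    """Convert an effect in extra visits (a count) into visits per 100,000 residents."""
    return extra_visits / population * PER_100K


# Same per-capita effect (10 per 100k) -> very different counts.
levels = {s: level_from_rate(10.0, pop) for s, pop in POPULATION.items()}
# Same count effect (+10 visits) -> very different per-capita effects (0.05 vs 2 per 100k).
rates = {s: rate_from_level(10.0, pop) for s, pop in POPULATION.items()}
print(levels)  # {'Large': 2000.0, 'Small': 50.0}
print(rates)
```

Roughly speaking, a regression on counts lets the large state's numbers dominate in the same way population weighting does, while a regression on rates treats the two states more evenly, so the outcome definition and the weighting question are closely linked.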

I may not have been clear, so let me say it now. Ultimately the minimum wage, if it does anything, does it to individual workers. So that's a population. If you could hypothetically line up the entire population of workers with their individual treatment effects and average them, you'd get the ATE for the entire population.

I talk about this next part in the latest Substack post.

Now let's say that we have two things:

1) "Unrestricted" heterogeneous treatment effects. So minimum wages might cause someone to get fired (Y1-Y0=0-1=-1) or to get hired (Y1-Y0=1-0=+1), or maybe they would've been fired anyway (0-0=0) or hired anyway (1-1=0). So if it's a binary outcome, the treatment effect takes only three values: 1, 0, and -1.

2) Sorting into different-sized cities based on those treatment effects. I call it Tiebout-Roy in the newest Substack post, since it's "voting with your feet", but doing so based on the effect the minimum wage will have on one's employment.

If you have both of those, then the ATE for the entire population (as if you simply averaged treatment effects over all individuals) will differ from the ATE you'd get if you (a) first averaged the treatment effects within each city (recall that the cities are of different sizes and their residents have different treatment effects) and then (b) averaged over the cities. In that case, the ATE for the average city might not equal the ATE for the average person, even though the cities contain exactly the same people.

The only way the simple ATE and the city-level ATE would be the same is if you either have homogeneous treatment effects or the sorting patterns have nothing to do with potential outcomes or treatment effects.

It's not really a bias per se, but it might help explain why CPS-level data, firm-level data, average state-level data and average city-level data may actually find different things, even if they were originally created from the exact same datasets. It all comes down to the distribution of treatment effects, sorting into regions based on them, and the weights.
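
A toy version of the two conditions above, with invented numbers: 1,000 workers whose individual effects are -1, 0 or +1, living in three cities of 600, 300 and 100 people. In the sorted case, people with similar effects end up in the same city (a crude stand-in for Tiebout-Roy sorting); in the unsorted case, every city gets the same mix. The sketch compares the ATE over individuals with the unweighted average of the city ATEs.

```python
from statistics import fmean

# Hypothetical population of individual effects Y1 - Y0 for a binary employment
# outcome, so each effect is -1, 0, or +1. Counts and city sizes are invented.
EFFECTS = [-1] * 300 + [0] * 500 + [1] * 200
CITY_SIZES = [600, 300, 100]


def sorted_cities(effects, sizes):
    """Crude Tiebout-Roy-style sorting: people with similar effects share a city."""
    ordered = sorted(effects)
    cities, start = [], 0
    for size in sizes:
        cities.append(ordered[start:start + size])
        start += size
    return cities


def mixed_cities(effects, sizes):
    """No sorting on effects: each city gets the population's mix of -1, 0 and +1."""
    total = len(effects)
    return [
        [value for value in (-1, 0, 1) for _ in range(effects.count(value) * size // total)]
        for size in sizes
    ]


def person_ate(cities):
    """Average effect over all individuals, ignoring city boundaries."""
    return fmean(effect for city in cities for effect in city)


def average_city_ate(cities):
    """Average of the city-level ATEs, each city weighted equally."""
    return fmean(fmean(city) for city in cities)


for label, build in (("sorted", sorted_cities), ("no sorting", mixed_cities)):
    cities = build(EFFECTS, CITY_SIZES)
    print(f"{label:>10}: person ATE = {person_ate(cities):+.3f}, "
          f"average-city ATE = {average_city_ate(cities):+.3f}")
```

With these numbers the sorted case gives a person-level ATE of -0.100 against an average-city ATE of +0.278, while the unsorted case gives -0.100 for both. Weighting each city's ATE by its population would recover -0.100 in the sorted case too, since a size-weighted average of city means is just the overall mean.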