A funny thing happened on the way to the minimum wage

A #JHR_Threads explainer of Meer and West (2016)

Long Winded Introduction about uncertainty and the minimum wage

The Nobel Laureate Robert Aumann has an interesting theorem popularly called “Agreeing to Disagree”. Like so many observations that come from game theory, the theorem feels both ridiculous and incredibly intuitive. It starts out with the following provocative sentence:

“If two people have the same priors, and their posteriors for a given event A are common knowledge, then these posteriors must be equal. This is so even though they may base their posteriors on quite different information. In brief, people with the same priors cannot agree to disagree.” (Aumann 1976)

This theorem is weird because it seems prima facie false: people agree to disagree all the time. But I don’t think Aumann is committing some sort of weird Zeno’s paradox where he denies reality. Rather, he is saying that if two people start with the same priors, and their posteriors are common knowledge, then they shouldn’t disagree. That they should agree is different from saying that they do, in other words. The point is that situations where people start from similar places and share the same information shouldn’t lead to strong disagreement. It makes me think that disagreement is more of a paradox than we really understand. And nowhere in all of microeconomics is that more true than in the controversial area of the minimum wage.

The minimum wage is studied by the same people (labor economists) with similar training (causal inference and econometrics), similar access to data sources, similar grounding in economic theory, and so forth. Despite all these nearly identical elements, the minimum wage remains divisive even among economists, as shown by their wildly different beliefs about its effects. The elasticity of employment with respect to the minimum wage is an empirical question, so there shouldn’t be a lot of disagreement. The answer is what it is. And yet there is substantial disagreement. For instance, look at the IGM Forum, which asked 43 economists to respond to the following statement:

The current US federal minimum wage is $7.25 per hour. States can choose whether to have a higher minimum - and many do. A federal minimum wage of $15 per hour would lower employment for low-wage workers in many states.

Look at that variation. 40% agreed, 33% uncertain, 14% disagreed. A third of the economists surveyed were uncertain? Fewer than half agreed? Why? Is it because we are predicting beyond the support of the data, since we haven’t experimented with a minimum wage that high? Is it that the answer is “it depends”, as in it depends on whether the minimum wage is binding, or on whether it hits competitive labor markets or intensely monopsonized ones? Whatever it is, such a spread of opinion over what seems like such a basic idea seems to contradict Aumann’s interesting theorem, and I find that peculiar.

Economics of the minimum wage in competitive markets

In partial equilibrium, a competitive firm’s demand for labor is non-increasing in the relative wage. But we also know from Joan Robinson’s monopsony model that a minimum wage can increase employment. Many economists disagree as to whether contemporary labor markets are in fact competitive. A rich and growing literature led by people such as Doug Webber, Ioana Marinescu, Arin Dube and others suggests monopsony may be more common than we think. And thus, to a real degree, these debates may be sustained because the minimum wage may operate in heterogeneous ways depending on how competitive the labor markets it hits are.

We don’t have to rely on Economics 101 to see this; we can go a bit further. Assume a firm’s production function is quasi-concave in its inputs and that the firm takes input prices as given. Solving the cost minimization problem yields the following three first order conditions:

Now we take the total differentiation of these three equations:

Then writing the system in matrix form, AB=c, we get:

Finally, we calculate the effect of a change in relative wages on labor demand using Cramer’s rule. This requires taking the determinant of A, the determinant of a “transformed” matrix, and dividing the second by the first:

Cramer’s rule substitutes column c into the first column of the A matrix to get the following determinant:

Dividing the determinant of this transformed matrix by the determinant of A, we get:

And thus we see that, for a cost minimizing firm operating in partial equilibrium, a rise in the relative wage reduces labor demand. So if the government were to impose a binding minimum wage on firms, it would raise relative wages and in turn reduce demand for labor. The effect would be more pronounced in the long run than in the short run, because fixed costs of production may temporarily mute the policy’s impact.
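As a sketch of the steps just described, assume two inputs, labor L and capital K, with prices w and r, a production function f(L, K), a target output Q, and a Lagrange multiplier λ. This notation, the ordering of the unknowns, and the choice to hold r and Q fixed (so that a rise in w is a rise in the relative wage) are my own assumptions for illustration, and the sign of |A| relies on f being strictly quasi-concave with λ > 0.

```latex
% Cost minimization: produce Q at least cost
\mathcal{L} = wL + rK + \lambda\,\bigl[\,Q - f(L,K)\,\bigr]

% Three first order conditions
\begin{aligned}
w - \lambda f_L &= 0 \\
r - \lambda f_K &= 0 \\
Q - f(L,K) &= 0
\end{aligned}

% Total differentiation with dr = dQ = 0
\begin{aligned}
-\lambda f_{LL}\,dL - \lambda f_{LK}\,dK - f_L\,d\lambda &= -dw \\
-\lambda f_{KL}\,dL - \lambda f_{KK}\,dK - f_K\,d\lambda &= 0 \\
-f_L\,dL - f_K\,dK &= 0
\end{aligned}

% Matrix form AB = c
\underbrace{\begin{pmatrix}
-\lambda f_{LL} & -\lambda f_{LK} & -f_L \\
-\lambda f_{KL} & -\lambda f_{KK} & -f_K \\
-f_L & -f_K & 0
\end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} dL \\ dK \\ d\lambda \end{pmatrix}}_{B}
=
\underbrace{\begin{pmatrix} -dw \\ 0 \\ 0 \end{pmatrix}}_{c}

% Determinant of A (using f_{LK} = f_{KL})
|A| = \lambda\,\bigl( f_{LL} f_K^{2} - 2 f_{LK} f_L f_K + f_{KK} f_L^{2} \bigr) < 0

% Cramer's rule: replace the first column of A with c
|A_1| =
\begin{vmatrix}
-dw & -\lambda f_{LK} & -f_L \\
0 & -\lambda f_{KK} & -f_K \\
0 & -f_K & 0
\end{vmatrix}
= f_K^{2}\,dw

% Effect of the wage on conditional labor demand
\frac{\partial L}{\partial w} = \frac{|A_1|}{|A|\,dw}
= \frac{f_K^{2}}{\lambda\,\bigl( f_{LL} f_K^{2} - 2 f_{LK} f_L f_K + f_{KK} f_L^{2} \bigr)} < 0
```

The numerator is a square and the denominator is negative under the stated assumptions, which is where the negative sign on labor demand comes from.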

Earlier work that focused just on level shifts found negligible effects, but Meer and West, the authors of the paper discussed below, suspected that the real world differs from the blackboard world of economics because of various real-world frictions that make firms reluctant to fire workers. Firing can reduce morale, for instance, so firms may accommodate the minimum wage temporarily simply because firings are so hard on the workers and therefore on the firm itself.

But in the long run, a higher minimum wage could reduce employment, not by increasing firings, but simply by slowing growth in hirings. Such effects imply a different counterfactual — rather than a discrete drop in employment, perhaps it’s a discrete drop in job creation growth as firms simply substitute at the margin between new workers and some alternative. This has implications for analysis, which I’ll discuss soon.

Monopsony

Standard neoclassical economics does not, as many claim, show an unambiguous effect of the minimum wage on employment. The prediction can change once we move into the world of input markets that are not competitive. Joan Robinson argued that just as we have a model of a lone firm in the output market (monopoly) with its associated inefficiencies, we can have a lone firm in the input market (monopsony), which will have inefficiencies of its own. Let’s look at a graphical representation of the effect that a rise in the minimum wage has on employment in her model.

In any model, solutions almost always come down to setting a choice variable at the point where marginal benefit equals marginal cost. In her model, this is where the rising marginal cost of labor curve equals the marginal revenue product of hiring another worker (Lm), which in this picture is $4. If a binding minimum wage is imposed, this equality can no longer be maintained, and we move down the marginal revenue product curve to where the minimum wage intersects it at L2. The efficient point is where supply equals marginal revenue product, and while we have overshot a little, we are closer to it than we had been absent the raise.
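To make the monopsony logic concrete, here is a minimal numerical sketch. It assumes linear curves, an inverse labor supply w = a + bL and a marginal revenue product c - dL, and the parameter values are hypothetical ones I picked for illustration; they are not taken from the figure.

```python
def monopsony_employment(a, b, c, d, min_wage=None):
    """Employment chosen by a single buyer of labor (illustrative linear model).

    Inverse labor supply:      w   = a + b * L   (b > 0)
    Marginal revenue product:  MRP = c - d * L   (d > 0)
    """
    # With no binding floor, the firm equates the marginal cost of labor
    # (a + 2bL, steeper than supply) to the marginal revenue product.
    l_monopsony = (c - a) / (2 * b + d)
    w_monopsony = a + b * l_monopsony
    if min_wage is None or min_wage <= w_monopsony:
        return l_monopsony

    # With a binding floor, the firm is a wage taker at the minimum wage,
    # so employment is the short side of the market at that wage.
    labor_supplied = (min_wage - a) / b
    labor_demanded = (c - min_wage) / d
    return max(0.0, min(labor_supplied, labor_demanded))


# Hypothetical parameters: supply w = 2 + L, MRP = 20 - 2L.
baseline = monopsony_employment(2, 1, 20, 2)               # no minimum wage
moderate = monopsony_employment(2, 1, 20, 2, min_wage=9)   # a bit above the competitive wage of 8
high = monopsony_employment(2, 1, 20, 2, min_wage=12)      # above the MRP at the monopsony outcome
print(baseline, moderate, high)
```

In this toy setup the moderate minimum wage raises employment relative to the monopsony baseline even though it overshoots the competitive wage, while the high one pushes employment below the baseline. So even inside the monopsony model the sign depends on how high the floor is set.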

Two seemingly different predictions, so who is right? Well, neither is “right” or “wrong” in the abstract. It ultimately comes down to whether the markets are competitive or not, and to a degree, this may help explain the agreeing to disagree. Suppose I think labor markets are competitive, in which case a rise in relative wages will reduce labor demand, but you believe they are not very competitive, in which case a rise in relative wages via the minimum wage can increase employment. We are both right given our premises. Settling the matter requires knowing whether a local labor market is or is not competitive, and much of the time we don’t know that. So maybe this contributes to the agreeing to disagree: we don’t agree on basic priors.

Given that the prediction depends on many other factors, like whether that particular labor market is competitive, we may have to rely more on empiricism than theory to decide what should happen. Despite what James Buchanan said when he called proponents of the minimum wage “camp following whores”, this is an empirical question and always will be. Expecting effects to differ across markets is natural given these two models, so it is also not surprising that we see efforts to estimate average effects across the country from many different minimum wages, in order to get a policy parameter that we might use to make predictions.

Enter Two Texas Economists

In this explainer, I want to discuss a 2016 article in the Journal of Human Resources by Jonathan Meer (Texas A&M) and Jeremy West (formerly a graduate student at A&M, now assistant professor at UC Santa Cruz) entitled “Effects of the Minimum Wage on Employment Dynamics”. I had a lot of fun reading this paper. The writing was unusually good, even superb, and the analysis was always thorough and thoughtful. But I also liked it because it seemed to anticipate some of the newer work that would come out on staggered rollouts, difference-in-differences, and panel fixed effects with time dummies (two-way fixed effects, or TWFE for short). But I’m getting ahead of myself. Let me begin.1

The authors are interested in several related questions:

  1. What is the effect of minimum wages on employment growth (not levels but growth)?

  2. What problems are created by staggered rollouts when estimating a DD design with TWFE?

  3. What is the effect of including state-specific time trends in TWFE models?

The first question is an important one for this project, but also for theory. In the short run, firms may be constrained by fixed costs such that they are unable to substitute away from inputs like labor. This could be driven by contracts, or by some other heterogeneous “stickiness” the firm faces that we as empiricists don’t understand. But in the long run, all costs become variable costs, and if the relative price of labor rises, we expect that demand should fall as firms re-optimize under the new relative prices. We also have that monopsony model in our minds, though, which suggests that a rise in the minimum wage may have ambiguous effects.2

The authors use three administrative datasets for their analysis:

  1. Business Dynamics Statistics (BDS)

  2. Quarterly Census of Employment and Wages (QCEW)

  3. Quarterly Workforce Indicators (QWI)

These datasets each have strengths and weaknesses, but by drawing on all of them the authors are able to create a panel of aggregate employment from 1975 to 2012. Their findings do not depend on which data source they use, probably because these are all close measurements of total employment.

Issues Created by the Staggered Rollout

One of the things that characterizes the minimum wage is its ubiquity and diversity. Nearly every area gets increases in the minimum wage, but some do so because of national increases and some because of autonomous local governments. This creates what is commonly called a “staggered rollout”. As other authors would note several years later, the staggered rollout creates problems for standard difference-in-differences models using TWFE.

Consider the following simple example illustrating two kinds of treatment effect in a staggered rollout: Panel A, where the treatment simply shifts the level of employment, and Panel B, where the treatment changes the growth rate itself.

It is fascinating for me to read this paper because Meer and West (2016) appear to have discovered, without deriving any decomposition of TWFE, that these two scenarios (Panels A and B) will not be handled by TWFE equivalently. They focus on the “middle period” in this analysis and note that in Panel A, the middle period shows a smaller gap than the outer periods, which should yield a negative number. Not only that, but they also write that the duration of time in which groups are treated shouldn’t actually affect the estimate itself. They write:

“Moreover, the duration of each of the three time periods is irrelevant for obtaining the correct inference.” (Meer and West 2016, p. 505).

But Panel B is a different animal, they say. What’s different exactly about Panel B, though? Panel B represents a scenario where the growth rate in employment flattens as a result of the treatment. Notice how the slope of the dashed line flattens almost to zero upon treatment, while group B continues to rise. But once group B is treated, its slope also flattens. Now why does this matter for TWFE?

Consider, they say, a scenario where the time to A’s treatment took up almost the entire figure, but was almost immediately followed by B’s treatment. Then the “average difference” between A and B would be almost entirely driven by the early pre-treatment difference and as before be negative.

But now consider an alternative scenario. What if A’s treatment occurred almost instantly, and there was a very long period between B’s treatment and the end of the panel? In this situation, the “average difference” between A and B is driven by the outer period. Meer and West think this has to be the case: ultimately the length of time to treatment, and the differences in time to treatment, are in their minds materially affecting the TWFE estimates. But here is what I thought was the really interesting thing:

“If T is selected such that the two outer periods have equivalent duration (that is, t1-0=T-t2), then DiD yields a zero treatment effect, visibly at odds with the plotted time paths of employment.”

The authors haven’t conducted the kind of Frisch-Waugh based decomposition that Goodman-Bacon (2021) has done, so they don’t fully appreciate that the TWFE estimate is not based merely on differences in averages between two groups, but is rather a weighted average of all the individual “2 by 2 DD” building blocks. But they do appreciate that panel length and dynamic treatment effects can skew results towards zero, a result firmly shown by Goodman-Bacon (2021). Goodman-Bacon notes that the difference in timing, when it is equal to half the panel length, will be weighted heavily, and if there are dynamic treatment effects, those in turn could wash away treatment effects due to the inherent biases of TWFE in a staggered rollout scenario. It was quite prescient, to say the least, that the authors stumbled upon what would later be a core part of the new DD methodology.
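The Panel B argument is easy to check numerically. Below is a minimal sketch, not Meer and West’s own simulation: two units, A treated first and B treated later, a common growth rate that flattens by a fixed amount once a unit is treated, and a TWFE regression run by plain least squares. The function name, the parameter values, and the half-period offset (which measures exposure at the midpoint of each period so the discrete panel matches the continuous picture) are all my own assumptions.

```python
import numpy as np


def twfe_coefficient(n_early, n_mid, n_late, growth=1.0, slowdown=0.5):
    """TWFE estimate when treatment flattens growth (a Panel B style toy example).

    n_early: periods before A is treated
    n_mid:   periods when only A is treated
    n_late:  periods when both A and B are treated
    """
    T = n_early + n_mid + n_late
    t = np.arange(T)
    t1, t2 = n_early, n_early + n_mid          # first treated period for A and for B

    d_a = (t >= t1).astype(float)              # treatment dummies
    d_b = (t >= t2).astype(float)

    # Common growth, minus a slowdown that accumulates with exposure to treatment.
    y_a = growth * t - slowdown * d_a * (t - t1 + 0.5)
    y_b = growth * t - slowdown * d_b * (t - t2 + 0.5)

    y = np.concatenate([y_a, y_b])
    d = np.concatenate([d_a, d_b])
    unit_b = np.concatenate([np.zeros(T), np.ones(T)])   # unit fixed effect
    time_dummies = np.tile(np.eye(T)[:, 1:], (2, 1))     # time fixed effects

    X = np.column_stack([d, np.ones(2 * T), unit_b, time_dummies])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return coef[0]                                       # coefficient on the treatment dummy


equal_outer = twfe_coefficient(10, 10, 10)   # outer periods of equal length
long_early = twfe_coefficient(18, 10, 2)     # long wait before A is treated
long_late = twfe_coefficient(2, 10, 18)      # long stretch after B is treated
print(equal_outer, long_early, long_late)
```

In this toy setup the true effect is a slowdown in both units, yet the estimate is zero when the outer periods have equal length, negative when the early period dominates, and positive when the late period dominates. The only thing changing across the three calls is the timing.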

State-specific trends

A second methodological insight of this great paper is the realization that when the treatment changes the slope of some outcome’s path, as opposed to causing a discrete shift in levels, the inclusion of unit-specific time trends, or here “state-specific linear trends”, could wash out the effect entirely. Think of it this way: the slope shifts, not the level itself. But since the linear trends are themselves lines, they absorb much of that change in slope, and the variation left around each line becomes so small that noise can wash out all evidence of the effect itself. This, too, would be echoed later in Appendix D of Goodman-Bacon (2021), who noted:

“Appendix D analyzes two common controls strategies: unit-specific linear time trends and region-by-year fixed effects. Column 6 of Table 2 shows that unit-specific trends change the unilateral divorce estimate to 0.59 (s.e. = 1.35), consistent with the observation that trends over-control for time-varying treatment effects (Lee and Solon 2011, Meer and West 2013, Neumark, Salas, and Wascher 2014).” (Goodman-Bacon 2021).3

The inclusion of state-specific trends can wash out all variation in the data in such scenarios, thus exacerbating the attenuation bias already created by the design itself. Meer and West show this in a beautiful picture of residuals corresponding to Panel A and B based on simulations they ran.

Even when they hard code treatment effects, the inclusion of state-specific time trends washes out the variation. Why does this matter? Because remember what OLS needs for identification: variation. But if there is no variation left because of this “over controlling” problem, then there is nothing for OLS to pick up. That is not because there is no effect, but because you have literally controlled for the treatment effect with the trends themselves!
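Here is a small sketch of that over-controlling problem. It is my own toy example rather than the authors’ simulation: one treated unit and one never-treated unit, a single treatment date in the middle of the panel (so the staggered timing issue is switched off), no noise, and a hard-coded slowdown in growth after treatment. The names and parameter values are hypothetical.

```python
import numpy as np


def did_with_and_without_trend(T=20, growth=1.0, slowdown=0.5):
    """Treatment dummy coefficient with and without a unit-specific linear trend.

    T should be even so that treatment starts exactly halfway through the panel.
    """
    t = np.arange(T)
    t0 = T // 2
    post = (t >= t0).astype(float)

    # The treated unit's growth flattens after t0; the control keeps growing.
    y_treated = growth * t - slowdown * post * (t - t0 + 0.5)
    y_control = growth * t

    y = np.concatenate([y_treated, y_control])
    treated = np.concatenate([np.ones(T), np.zeros(T)])   # unit fixed effect
    d = treated * np.tile(post, 2)                        # treatment dummy
    time_dummies = np.tile(np.eye(T)[:, 1:], (2, 1))      # time fixed effects

    base = np.column_stack([d, np.ones(2 * T), treated, time_dummies])

    # Linear trend for the treated unit only; the control unit's trend is
    # already absorbed by the time fixed effects.
    trend = treated * np.tile(t, 2)
    with_trend = np.column_stack([base, trend])

    beta_plain = np.linalg.lstsq(base, y, rcond=None)[0][0]
    beta_trend = np.linalg.lstsq(with_trend, y, rcond=None)[0][0]
    return beta_plain, beta_trend


beta_plain, beta_trend = did_with_and_without_trend()
print(beta_plain, beta_trend)
```

Without the trend, the coefficient on the treatment dummy is clearly negative. With the unit-specific trend it collapses to zero in this symmetric setup, even though the slowdown is hard coded into the data, because the trend soaks up the change in slope.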

Show me the money!

So with those criticisms in mind, let’s now move into their analysis. Lacking any alternative to panel fixed effects, they proceed to analyze the impact of the minimum wage on employment using a log-log specification, which allows the coefficient on the minimum wage to be interpreted as an elasticity. Let’s review their results here:

While the authors carefully discuss each column, I’ll go a bit faster. Putting in time-varying controls adds substantial precision. Under certain assumptions, a 10% increase in the minimum wage causes a 1.5% reduction in employment. This effect persists, and even controlling for 1-2 leads (an old-fashioned way of trying to evaluate pre-trends) does nothing to their estimates. Only in column 6, when they include unit-specific time trends, does their result vanish, but given the theoretical discussion that preceded this table, we are primed to be skeptical due to the natural attenuation bias that arises from this specification.
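To make the elasticity reading concrete, here is a tiny sketch. The elasticity of -0.15 is simply what the 10% and 1.5% figures above imply, not a number I am quoting from a specific column, and the function name is my own.

```python
def employment_change_pct(elasticity, min_wage_increase_pct):
    """Percent change in employment implied by a log-log coefficient."""
    # Usual shorthand: percent change is roughly elasticity times percent change.
    approx = elasticity * min_wage_increase_pct
    # Exact change implied by the log-log form for a discrete increase.
    exact = ((1 + min_wage_increase_pct / 100) ** elasticity - 1) * 100
    return approx, exact


approx, exact = employment_change_pct(-0.15, 10)
print(round(approx, 2), round(exact, 2))
```

The shorthand and the exact figure are close for a 10% increase; the gap widens for larger changes, which is worth keeping in mind when extrapolating to something like a doubling of the minimum wage.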

Next they analyze the effect of the minimum wage on what they call the “long difference” in growth rates where they examine the impact over time without trends (panel A) and with trends (panel B) using a difference in employment specification (regressed against a difference in minimum wage). Regarding the focus on long-term growth rates they write:

“If the minimum wage has a dynamic effect on employment, then the impact of the minimum wage may be small over short durations, but it will increase in magnitude and significance as the time span is increased. Ultimately, the estimated elasticity should level off in magnitude once the full effect has taken place.” (Meer and West 2016, p. 514)

This is presented in the following Table 3. The effect on growth rates is around -0.05 once we get to about three years, but as with the earlier theoretical and simulation based analysis, inclusion of trends greatly attenuates all results.

Conclusion

The paper has more, but I’ll stop there. This paper is valuable for three reasons. First, it showcases the challenges that TWFE faces when implementing a staggered rollout DD design. That alone raises issues that researchers will need to think long and hard about, such as treatment timing and panel length. The paper can serve as a guide for how researchers can inform their work through careful reasoning, economic theory, toy examples, and simulations to better understand what exactly is causing the variation in the data they’re using. What precisely is causing the number you have to be the number you have? Why is it that and not a different number? This paper opens up the black box of TWFE and provides an excellent guide for thinking through what should be dictating how we think about data and implementation.

The second valuable thing about this project is the realization that if the minimum wage has a slow effect on employment, working not through a level shift but through a gradual change in growth rates (something consistent with economic theory, mind you), then the inclusion of unit-specific trends may wash out so much of the variation that it becomes impossible to pick up anything, even when something is hard coded to be there in a simulation. This was shown through both graphical toy examples and figures showing residuals with and without the trends. This is an example to researchers of how the rhetoric of a scientific paper is greatly enhanced by pictures, arguments and simulations.

And finally, the authors emphasize employment growth as an outcome. While I am not an expert on this literature, and therefore cannot attest to how common this was before them, the arguments they raise for why we should care about growth in employment were, for me, valid ones. I do think that we should expect slow adjustments, if only because of various fixed costs that are only relaxed as time goes on. The authors find evidence for negative effects on employment growth around negative 0.05, which if I understand the specification correctly, remains an elasticity as a delta on delta specification.

While this paper won’t settle the debate over the minimum wage, it is information nonetheless, and as Aumann noted, all of us should be incorporating common knowledge into our beliefs and updating appropriately. Otherwise, something other than that common knowledge is influencing our posteriors.

Thank you for reading. I hope you have found this entry into #JHR_Threads interesting. For your pleasure, I have included a one hour interview with Jeremy and Jonathan which I recommend to all junior faculty and students because I asked a lot of questions geared towards what it’s like working on a project like this (which is so often contentious), and the skills they think they have that have made them successful more generally.

1

Did you know that the minimum wage is almost a millennium old? The first signs of it that we know of date to 1349. King James set minimum wages in 1604. It has a long, illustrious history as one of the oldest anti-poverty programs still around today.

2

There are still more caveats, though. Raising wages means raising income for the workers who remain employed. With that increased income, we expect increased demand for normal goods, which raises the marginal revenue product at firms that produce normal goods and so requires more employment. Still, in general equilibrium, we expect that for competitive industries a rise in the relative wage should reduce labor demand, and more so in the long run.

3

Justin Wolfers had earlier noted the problems with including unit-specific time trends when treatments are changes in slopes, rather than discrete changes in levels, in his 2006 AER article on unilateral divorce.