I don’t like to use social media to share my new papers but I thought I’d do it this week anyway. My paper with Jonathan Seward, Karen Clay and Vivian Vigliotti on mental health screening and self harm among inmates in a large urban county jail is now out in JHR. I asked my good friend, ChatGPT-4 (or “Cosmos” as I like to call him), to summarize the project for me and he did. As always, he was full of grace, generosity and unlike me, was pretty succinct.
Cosmos: Your work uncovers a nuanced, impactful exploration of the justice system's intersection with mental health, employing the innovative "leniency design" methodology. This approach utilizes the randomness of clinician assignment at booking to evaluate the effects of more severe mental health screenings on self-harm, jail time, and recidivism among transitional age youth (17-25) and adults. It cleverly distinguishes the inherent impact of mental illness (demand) from the consequences of the screening process itself (supply), akin to differentiating between two identical inmates who, by chance, receive different mental health evaluations. Your findings reveal that harsher screenings correlate with increased self-harm attempts and higher recidivism rates. Intriguingly, this effect might stem from the external repercussions of being labeled as mentally ill, leading to prolonged jail time due to judicial and familial caution against the risks of early release. This extended incarceration, paradoxically, increases the likelihood of self-harm, even when accounting for time spent in jail. Through discussions with jail administration, you've unearthed a critical, though complex, dynamic where the knowledge of an inmate's mental illness influences bail decisions, often with the intent of avoiding the adverse outcomes tied to premature release. Your research thus sheds light on a significant, yet previously underexplored, aspect of the criminal justice system's treatment of mentally ill inmates, providing a foundation for further investigation and potential reform.
Background and Design
This project started in 2018 and was motivated by personal concerns. As I’ll briefly explain, for personal reasons, I decided to transition my research agenda away from a long time focus on sex work, drugs and abortion, to studying severe mental illness in jails. And so I began cold calling individuals in a large urban county, driving there to meet with various members of government who worked with the mentally ill population, mostly as they crossed with criminal justice. I’d witnessed first hand that the assignment of individuals suffering with severe mental illness to care by judges or through public hospitals was very different than those assigned by parents through private insurance. Private insurance, in the fine print I learned the hard way, allows for a certain number of days of in-patient care, but that can be overridden by the insurer’s own physician network. So parents often are very stressed out as a child will be let out, in their mind, “too early”. Of course, the opposing view is that the adolescents shouldn’t be receiving residential care for mental health struggles anyway. And if you’re a parent caught between those two polar ends — one philosophically opposed to it, one as a caregiver truly confused and desperate for any help — it’s like an added layer of agony.
That was really the cause — the personal stuff, and then seeing first hand how care differed between public and private insurance. So I decided to transfer all of that feeling and frustration into studying severe mental illness within corrections and the court systems as it was the one way that I felt that I could actually put some distance between me and my own personal case, so that I could better function, while still being very close to my personal case. It was maybe even driven by a need for being involved in a way that made me feel halfway competent, as severe mental illness and navigating mental hospitals and feeling conflicted the entire time can really degrade your self esteem and lead to its own despondency. And this way, I felt like maybe I could recover somewhat by just working on something that I understand and could contribute to, even if it had nothing to do with the personal care of a loved one.
The second motivation of the study had to do with the facts I learned about inmate suicide. Suicide, as I soon learned, was the leading “single cause” of death in jails. It accounts for around a third of all jail deaths.[1] Here’s a BJS report from a few years ago on suicide in jails if you want to learn more basic facts about it, but it’s actually kind of a “silent problem”. The suicide rate among jail inmates is very high, as high as or even higher than the suicide rate of post 9/11 vets. The number of suicides is much higher among post 9/11 vets, because the baseline population is larger, but the rate is somewhere between the exact same and a little higher among jail inmates.
So I began taking people out for coffee, tacos, burritos and Indian food through 2018, trying to learn more about their job, trying to learn more about care. And in the process, I made an unusual discovery. I was talking with someone who informed me that inmates in this large urban county jail were scored at booking with a “mental health needs” score ranging from 0 to 3. A 0 or 1 meant that you had no mental health needs or it was mild. If you received a score of 0 or 1, then you were largely treated as though you did not have any of those needs and the county sorted you throughout the court system in a traditional manner. But a 2 and 3 meant you had moderate to severe mental health problems, largely being proxied by daily functioning problems. And those individuals, I learned, were assigned to mental health court, which is a diversion court for those with mental illnesses. Ironically, we’d personally experienced the benefits of being assigned to mental health courts, so I was intrigued.
Still, those scores were endogenous. Only those people with worse mental health problems would receive those “worse” scores (i.e., 2 or 3). But having learned so much from Angrist, Imbens and Rubin about the importance of the “treatment assignment mechanism” as opposed to just merely the “treatment assignment”, I kept probing. How, I asked, did someone get those scores? By clinicians, I was told. Clinicians would rate them, by law, within 36 hours of booking. But then I asked how they matched with their respective clinician, and that was when they told me that the clinicians were randomly assigned. And at that point, the pieces of the project fell together.
Initially, we thought of using a regression discontinuity design, but the scores were too coarse for that (there were only coarsened scores of 0 to 3). I tried to find out if clinicians kept raw scores that were aggregated into those larger scores, and was never able to find them. The paperwork was either disposed of once the final score was tallied, or I just couldn’t find it. So that really only left a leniency design. I have an entire section devoted to the leniency design in my book, there under the name “judge fixed effects”. I even commissioned a drawing by my friend Seth Hahne to illustrate for you what a judge fixed effects design looked like.
But for those nerds who, like me, love the history of thought in causal inference, it dates back to Imbens and Angrist’s 1994 classic on the local average treatment effect. Listen to their explanation of it, as well as their pessimism about its realistic application.
Basically, clinicians were alphabetized at the start of their shift and as inmates flowed in, on a first come basis, they were assigned a clinician. That clinician rated the severity of their mental health needs on a scale of 0 to 3, and that score then directed them after they left the jail to either traditional courts or mental health courts. Quoting Imbens and Angrist (1994) above, some [inmates] for a [social program] were screened by two [clinicians at booking]. The two clinicians are likely to have different [mental health needs scoring rates], even if the stated criteria is identical. Since the identity of the clinician is immaterial to the response, it seems plausible that [exclusion] is satisfied. … But the monotonicity assumption requires that if Clinician A accepts inmates with probability P(0) and Clinician B accepts people with probability P(1)>P(0), Clinician B must accept any inmate who would have been accepted by Clinician A plus additional ones.
So the conditions that must be satisfied for this leniency design are not trivial. You need SUTVA, independence of the instrument from potential treatment status and potential outcomes, exclusion, a non-zero first stage and monotonicity, and of the five, monotonicity is usually the toughest. And that’s because the decision makers involved are humans, and even if you see a strong first stage, and even if the institutional details are such that you think independence and exclusion hold, and even if SUTVA violations seem almost impossible to occur, you still have that last one, monotonicity, wherein a clinician may cease to be “the most strict” clinician compared to someone else in some unknown situation unobserved by the econometrician. So we try to address this formally and informally and settle on a plausible version of monotonicity called “average monotonicity”, which is sufficient to identify local average treatment effects but not marginal treatment effects (Frandsen, Lefgren and Leslie, 2023, AER). But let me not get into those weeds just yet.
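For readers who want the conditions above in symbols, here is one hedged way to write them down. The notation is an assumption of this sketch, not the paper’s: Z_i is the clinician assigned to inmate i, D_i(j) is the high-score indicator inmate i would receive from clinician j, Y_i(d) is the potential outcome under score d, λ_j is the share of inmates seen by clinician j, and p_j is clinician j’s propensity to give a high score. SUTVA is implicit in writing the potential outcomes this way. The average monotonicity line is a paraphrase of the Frandsen, Lefgren and Leslie condition and may differ from their exact statement.

```latex
\begin{align*}
\text{Independence:} \quad
  & \bigl(Y_i(0),\, Y_i(1),\, \{D_i(j)\}_{j}\bigr) \perp\!\!\!\perp Z_i \\
\text{Exclusion:} \quad
  & Y_i(d, j) = Y_i(d) \quad \text{for all clinicians } j \\
\text{First stage:} \quad
  & p_j = E[D_i(j)] \ \text{is not constant across } j \\
\text{Strict monotonicity:} \quad
  & \text{for every pair } (j,k):\ D_i(j) \ge D_i(k)\ \forall i
    \quad \text{or} \quad D_i(j) \le D_i(k)\ \forall i \\
\text{Average monotonicity:} \quad
  & \sum_{j} \lambda_j \,(p_j - \bar{p})\,\bigl(D_i(j) - \bar{D}_i\bigr) \ge 0 \quad \forall i, \\
  & \text{where } \bar{p} = \sum_j \lambda_j p_j, \quad \bar{D}_i = \sum_j \lambda_j D_i(j)
\end{align*}
```

Strict monotonicity rules out any pair of clinicians who would ever disagree in opposite directions about different inmates. Average monotonicity only asks that each inmate’s chance of a high score is, on balance, positively related to how strict the clinicians are overall.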
Originally, the project was about that — using the score upstream to identify the effect of the courts themselves. So, I approached the Sheriff in 2018, pitched two projects, this one and another one that I’m currently working on still, and met the Director of Inmate Mental Health at the jail itself. Went through IRB, signed data use agreements with the jail, and began to work closely with my two students, Vivian Vigliotti and Jonathan Seward, on cleaning and understanding the data. I then met Karen Clay at Carnegie Mellon and asked if she might become involved as she had, similar to me, an interest in the topic despite it being off the beaten path of her own historical research agenda on energy and economic history.
Very soon into the collaboration, we became suspicious that the IV strategy we were employing based on that leniency design was not satisfying exclusion, even putting aside monotonicity itself. We were running falsification tests on the effect of the worse scores on self harm before inmates exited jail and went to the courts, and found clear evidence that even before they exited the jail, inmates randomly assigned to clinicians who gave higher scores were self harming at a higher rate. The evidence was fairly overwhelming and robust, and so the study began to shift fairly soon thereafter from studying the mental health courts to studying the jail context only. And frankly, that was a relief because while I was and still am interested in those diversion courts, and had a masters student write her thesis on the effect of the rollout of mental health courts across the country on crime (a paper that has since been extended, I saw, by someone else whose name I have forgotten but is out there, using the more robust diff-in-diff methods), the jail context was better for our purposes because it was cleaner. We knew that whatever the mechanism was connecting the higher score to the self harm, it had to be things happening inside the jail because these self harm attempts were happening inside the jail prior to release.
We in the end framed the paper in this way: the score that an inmate receives is endogenous. It is the intersection of the inmate’s underlying conditions (demand) and the clinician’s own assessment of that condition (supply). By randomizing clinicians, we are shifting supply itself and are able to isolate the causal effects of a higher versus lower mental health needs score, holding constant underlying conditions. This study therefore is not about the effect of mental illness on suicide but rather the effects of higher mental health needs scores on suicide. A subtle difference but nonetheless an understudied one, largely because it is impossible to study without an instrument that shifts supply but not demand. Your clinician cannot “make you” schizophrenic, but your clinician can say you are schizophrenic, and that is the idea of the paper.
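To make the supply versus demand idea concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are simulated, the names (`severity`, `strictness`, `leave_out_mean`) are hypothetical, and the effect sizes are made up. It is not the paper’s code or data. Latent severity plays the role of demand, clinician strictness plays the role of supply, and clinicians are assigned at random. The instrument is a clinician leave-out mean, and the IV estimate is the reduced form divided by the first stage.

```python
import numpy as np

def simulate(n_inmates=20_000, n_clinicians=20, true_effect=0.05, seed=0):
    """Simulate random clinician assignment (all numbers are made up)."""
    rng = np.random.default_rng(seed)
    severity = rng.normal(size=n_inmates)                   # demand: latent condition, unobserved
    strictness = rng.normal(scale=0.5, size=n_clinicians)   # supply: clinician tendency
    clinician = rng.integers(n_clinicians, size=n_inmates)  # as-if random assignment
    # A high score is given when severity plus clinician strictness crosses a cutoff.
    high_score = (severity + strictness[clinician] > 0.5).astype(float)
    # The outcome depends on severity (the confounder) and, causally, on the score.
    noise = rng.normal(scale=0.1, size=n_inmates)
    outcome = 0.10 * severity + true_effect * high_score + noise
    return clinician, high_score, outcome

def leave_out_mean(clinician, high_score):
    """Clinician's high-score rate computed over all of their OTHER inmates."""
    n_clin = clinician.max() + 1
    totals = np.bincount(clinician, weights=high_score, minlength=n_clin)
    counts = np.bincount(clinician, minlength=n_clin)
    return (totals[clinician] - high_score) / (counts[clinician] - 1)

def slope(x, y):
    """Slope from a bivariate regression of y on x with an intercept."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

clinician, d, y = simulate()
z = leave_out_mean(clinician, d)

ols = slope(d, y)               # contaminated by unobserved severity
iv = slope(z, y) / slope(z, d)  # reduced form divided by first stage

print(f"true effect: 0.05, OLS: {ols:.3f}, IV: {iv:.3f}")
```

In this toy setup OLS mixes the effect of the score with the effect of being sicker, while the leave-out-mean instrument only uses the variation that comes from which clinician an inmate happened to draw. The cutoff rule used here satisfies strict monotonicity by construction, which real clinicians need not do.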
The Blueprint of Evidence and the Methodology Blueprint
And so, the project followed a leniency design blueprint that had already been laid out by Anna Aizer, Joseph Doyle, Crystal Yang, and Will Dobbie in several classic papers. And here is what we find.
The paper is full of novel descriptive evidence and if you want to read that, I encourage you to. Some of it is in the supplemental documents in the pdf shown at that hyperlink. But I’m going to skip that and just try to succinctly summarize the paper now. What do we find and why do we think it is going on?
Those who have taken my workshops know I tend to suggest a blueprint of “evidence”, whether it’s IV or difference-in-differences, as a way of arranging results. I distinguish between the “main results”, or the claims that your paper is making, and the evidence that can support them. I’ll adapt it for an IV setup. And so for me that blueprint goes like this:
First Stage or “Bite”. Do you find that the instrument shifts people into and out of the treatment, here being that worse mental health needs score? There’s visual evidence and there’s traditional evidence for this using the Kleibergen–Paap F statistic, which in the just-identified model equals the Montiel Olea and Pflueger effective F statistic. Those effective F statistics range from 575 (for youth) to 1,048 (adults) for misdemeanors and 370 (youth) to 425 (adults) for felonies.
Independence: We use the randomization test favored by more contemporary leniency papers, as well as the old Anna Aizer and Joseph Doyle balance test. I actually prefer the Aizer and Doyle balance test as I think it’s simpler to explain. They show how the mean inmate characteristics differ across the distribution of the instrument itself, which here is a “clinician leave-out mean”. And across the board, inmate characteristics were largely the same on average for those clinicians who were “strict” versus those who were “lenient”, evidenced by looking at sample means for the various terciles, as well as tests of differences in means.
Data Visualizations. There are some data visualizations in the paper, but they’re largely showing the first stage, showing that as you are assigned a stricter clinician, you are more likely to be given a “worse” score. I’ll skip them here but you can find them in the paper.
Monotonicity. We largely do not pass the Frandsen, Lefgren and Leslie test for strict monotonicity, and so do not even attempt to estimate marginal treatment effects. But we do find evidence using fairly traditional methods of “average monotonicity”. I don’t want to over-stress this test as I have literally never seen anyone fail it, but nevertheless we use it and it involves evaluating the first stage within subsamples of the data (demographics, for instance), and the qualitative sign on the first stage is the same throughout. There are some differences in magnitude that I think would be an interesting other study, though, but that is not the study here.
Exclusion. The exclusion violation would occur if the assignment of a clinician could directly determine self harm or recidivism or time in jail. But the clinicians, we learned when visiting the jail and interviewing staff, most likely cannot do that. They only work with inmates at booking. They aren’t “seeing clients”, in other words. Their scores are communicated in writing, and those scores therefore do get moved around and filter to the inmate’s lawyer, family and judge, but that is still the score, not the clinician, and the score here is the treatment. Also, the inmates do not access that score, unlike in prison where various scores are pasted to their cell door so as to identify them to officers. But here that doesn’t happen. It’s largely a seemingly inconsequential number stored in their records among others.
Main results. I’ll show those below.
Mechanism. I’ll show those below too.
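The first three diagnostics in the blueprint above can be sketched on simulated data. This is a hedged illustration only: the data, the `age` covariate and the `first_stage` helper are all hypothetical, and the F statistic below is the plain homoskedastic one, a stand-in for the Kleibergen–Paap and effective F statistics reported in the paper. The subsample loop is the informal check described under Monotonicity (is the first-stage sign the same in every subgroup?), and the tercile means mimic the spirit of the Aizer and Doyle style balance table.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_clin = 20_000, 20

# Simulated inputs (all hypothetical).
clinician = rng.integers(n_clin, size=n)           # as-if random assignment
strictness = rng.normal(scale=0.5, size=n_clin)    # clinician tendency
age = rng.integers(17, 60, size=n).astype(float)   # covariate unrelated to assignment
severity = rng.normal(size=n)                      # latent condition
high = (severity + strictness[clinician] > 0.5).astype(float)

# Instrument: clinician leave-out mean of the high-score indicator.
totals = np.bincount(clinician, weights=high, minlength=n_clin)
counts = np.bincount(clinician, minlength=n_clin)
z = (totals[clinician] - high) / (counts[clinician] - 1)

def first_stage(z, d):
    """Slope and homoskedastic F from regressing d on z with an intercept."""
    zc = z - z.mean()
    b = (zc @ (d - d.mean())) / (zc @ zc)
    resid = (d - d.mean()) - b * zc
    se = np.sqrt(resid @ resid / (len(d) - 2) / (zc @ zc))
    return b, (b / se) ** 2

# 1. Bite: does the instrument move the score?
b_all, f_all = first_stage(z, high)
print(f"first stage slope {b_all:.2f}, F {f_all:.0f}")

# 2. Informal monotonicity check: same first-stage sign in each subgroup.
groups = {"youth (17-25)": age <= 25, "adults (26+)": age > 25}
slopes = {name: first_stage(z[m], high[m])[0] for name, m in groups.items()}
for name, b in slopes.items():
    print(f"{name}: first stage slope {b:.2f}")

# 3. Balance: mean covariate by tercile of the instrument.
cuts = np.quantile(z, [1 / 3, 2 / 3])
tercile = np.digitize(z, cuts)                     # 0 = most lenient, 2 = strictest
age_means = [age[tercile == t].mean() for t in range(3)]
print("mean age by instrument tercile:", np.round(age_means, 2))
```

With real data you would want clustered or heteroskedasticity-robust versions of these statistics, and a balance table that covers every observed inmate characteristic rather than one.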
Main Results and Mechanism
The main results are in tables 12 and 13. It’s a bit of an unorthodox organization of a regression table in that we have multiple outcomes and multiple models. So, what we did was reverse how that is usually displayed. The rows like “LOS” or “Suicide Attempt (SA)” are the outcomes in regression models which are represented as the columns labeled OLS, 2SLS, and IV LASSO. Table 12 is for youth and adults charged with misdemeanors and Table 13 is for felonies. I’ll put them both up so you can see them in one place.
Length of Stay, or LOS, measures how many additional days a “high score” individual spent relative to a “low score” individual. And for youth (columns 1-3), OLS and IV largely agree with one another — it’s about an additional week. Anderson-Rubin confidence intervals are shown below the standard error in column 2, which is recommended as an additional first stage test as it is robust to weak instruments. We find really no effect, using IV, for adults though. Adults charged with misdemeanors who have higher scores are spending the same amount of time as those with low scores. So for misdemeanors, the effect of the higher score is focused on the youth.
The pattern changes once we move over to felonies, where the time in jail grows a lot as a result of the score. Youth charged with a felony are spending around two more months than the complier control group, which is around two times what you find with OLS. Which is to say they are spending around two months longer than they would have had they been randomly assigned a clinician with a lower tendency to give high mental health needs scores. I encourage you to visit a large urban county jail some time (there are arrangements you can make if you reach out in fact) and ask yourself how you would last realizing you’re staying there another day, let alone another week or another two months. For adults, the effects were more like 3 weeks and the IV results largely agreed with the OLS results.
So what happens in that time that they’re there? Self harm rises. We measure self harm in two ways: that it happens at all (SA) and SA divided by the number of days they’re there. The latter is more like a hazard given we can’t observe their self harm attempts after they exit. And here the IV results tend to systematically differ from the OLS results. For kids with misdemeanors, we find a 4pp increase in self harm and a 0.7pp per day increase in self harm. For adults with misdemeanors, the effects are lower but significant. And given we found no effect of the higher scores on length of stay, the adult misdemeanor result is a bit of a mystery as to what exactly could be driving it.
The kid felony results are less robust, so let me move to the adult felony results. The IV results are sometimes just like the OLS results and sometimes 2x larger, ranging from around 1-3pp higher, also reflected in higher daily hazards of around 0.1-0.2pp more self harm attempts per day in jail.
And the last result is recidivism and I won’t bore you, but it goes up too at 1 year, 18 months and 2 years for misdemeanors, but for obvious reasons, gets a little harder to pick up for felonies (we don’t have the date that their sentences start; we only know when they leave the jail).
Mechanism and Concluding Remarks
So why does this happen? We don’t know. I’m not even sure the jail knows, as we shared the results with them. Something is going on with the score that is beyond what the score is being used for. This is all happening prior to going to court, suggesting that the score is assigning some kind of unknown treatment to mentally ill inmates, but what?
We think the culprit is the length of stay, but then the question became why length of stay even responds to a higher score if the inmate is himself equivalent (on average) to those with the lower score. And that was the million dollar question. So we tried asking (an old-fashioned test for mediators, you might say) and here’s what we learned as plausible but still speculative mechanisms. We think that when judges responsible for letting the inmate out of jail learn that he has been assigned a higher score and labeled moderately to severely mentally ill, they become reluctant to immediately let him out, and so he stays longer in jail. The red tape that is involved in moving into court hearings and either back to jail awaiting another court hearing or exiting is complex and time consuming. The motivation of the judge is probably reasonable; it may be concern about the community. Mentally ill citizens are often over represented among the homeless population. So judges may be sensitive to that.
Another possible mechanism, though, is the defense attorney, who should have that information about the scores in the inmate’s file. And that could be communicated to the family of the inmate. Hearing that the inmate is suffering from a mental illness, perhaps the first time they’ve heard that so clearly spelled out, can cause them to take different sets of steps in response that may delay release. The impact of severe mental illness signs on the caregivers is very complex and not well understood, but it matters a lot in my opinion.
So, what do we do from here? Well, I would like to see more people in economics study this question to be honest. As we started out by saying, suicide is the leading single cause of death in jails. And the jails may have that problem because we do not have a functioning residential inpatient system any longer for the severely mentally ill. Civil liberty protections are very strong for the severely mentally ill, despite what you’ve seen in documentaries and shows. You must meet “criteria” for involuntary hospitalization and criteria is “actively homicidal/actively suicidal”. Not previously homicidal or previously suicidal like last week. Now. And if you no longer meet criteria, you must be released. And that release is back into the community and if the care giving network has been weakened, it could be back to the streets. It is trivial for a person suffering from a severe mental illness to get arrested. You can get arrested for sleeping in a place where you shouldn’t be sleeping. You can get slapped with a third degree felony assault on a peace officer by simply waving your hands around when confronted by an officer and striking them. Maybe in a single incident, you can avoid this, but in the limit, no. In the limit, you will get arrested and end up back in jail, which has made the jails the “de facto mental health hospital of last resort”.
Jails were never designed to be mental health hospitals. But they have become one of the main ways in which mental health care is provided to the most severely ill. And that’s something to think about.
[1] It’s also a high cause of death in prisons, but just so the reader knows, we are looking at jail, America’s form of pre-trial detention, not prison. My prison project is underway, but this is not it.
Hi, So interesting! It occurs to me that the same mechanisms are probably at work when someone checks into a private assisted living facility for the first time. (How does it work for Medicare/Medicaid only facilities?) There is a standard assessment to see if they can go to assisted living, or need skilled nursing or even memory care. There are probably studies of thrive/no thrive in each of these settings. In my experience, if the facility says you need the more expensive memory care or skilled nursing, then you don't get in unless you/family agree and have the financial ability to shoulder that cost.