I want to share something I learned recently. I mentioned it this week in the opening of my podcast, The Odd Couple, with Caitlin. And it's this: I learned last week during my MIT talk, from a grad student named Theo, that Claude Code is writing a diary of everything it did in your project, including the work it abandoned. For those of us starting down the path of studying the behavior of AI agents (which appears to be a topic I'm going to be pursuing for a little while anyway), this is valuable for understanding mechanisms. Let me explain.
Claude Code keeps a file associated with your work, in JSONL format, and it is Claude's diary: its notes from your sessions with it. It sits in the local folders where Claude Code records everything, including its reasoning. That is a bigger deal than it may sound. Because Claude Code is a reasoning model, not just an agent model, its reasoning is being written down. Why does this matter? Well, if we are trying to understand the behavior of agent-based production of research, that's where we are going to learn about it. And for me, that's one of the topics I think I am settling into: what can we learn about their behavior? How do they make decisions? And are they influenced by the researcher?
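If you want to look at yours, here is a minimal sketch of how you might read those diaries. The directory layout and field names are assumptions based on poking around my own machine; the format is undocumented and can change, so inspect your own files before trusting any of this.

```python
import json
from pathlib import Path

# Claude Code appears to keep per-session transcripts as JSONL under
# ~/.claude/projects/. The layout and schema assumed below are undocumented
# and may change, so verify them against your own files.
transcript_dir = Path.home() / ".claude" / "projects"

if transcript_dir.exists():
    for path in sorted(transcript_dir.rglob("*.jsonl")):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            # Assumed schema: a "type" field ("user", "assistant", ...) and
            # a "message" dict whose "content" is a list of typed blocks.
            if record.get("type") != "assistant":
                continue
            for block in record.get("message", {}).get("content", []):
                if isinstance(block, dict) and block.get("type") == "text":
                    print(f"{path.name}: {block['text'][:200]}")
```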
Ordinarily we can only estimate causal effects under quasi-random or controlled randomized scenarios. So we invoke independence, conditional independence, parallel trends, and so on. Then, having targeted the corresponding population causal estimand, we build estimators around those identification assumptions and pow, we get it. Under the law of large numbers, the estimator's sampling distribution will center exactly on top of that population causal effect, and under the central limit theorem, we get a normal distribution that we can use for hypothesis tests. Win-win.
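To make that chain concrete, here it is written out for the textbook two-period diff-in-diff (notation mine, nothing exotic):

```latex
% Two-period diff-in-diff. Under parallel trends,
%   E[Y_1(0) - Y_0(0) | D = 1] = E[Y_1(0) - Y_0(0) | D = 0],
% the observable double difference identifies the ATT:
\[
\mathrm{ATT} \equiv E[Y_1(1) - Y_1(0) \mid D = 1]
  = E[Y_1 - Y_0 \mid D = 1] - E[Y_1 - Y_0 \mid D = 0].
\]

% The sample analogue centers on the target (LLN) and is asymptotically
% normal (CLT), which is what licenses the hypothesis tests:
\[
\widehat{\mathrm{ATT}} \xrightarrow{p} \mathrm{ATT},
\qquad
\sqrt{n}\left(\widehat{\mathrm{ATT}} - \mathrm{ATT}\right)
  \xrightarrow{d} N(0, \sigma^2).
\]
```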
But that is not itself something we can easily use for identifying mechanisms. Very often all we have are a bunch of variables, and we have to more or less reuse the same research design and cross our fingers that the assumptions that held for the original target parameter will continue to hold for the "mechanism variables". That is not guaranteed. Parallel trends could hold for mortality but not for diseases, for instance, even though you may suspect diseases are the drivers of the mortality results you found. And if you switch research designs in order to get at the disease channel, maybe you are no longer in the world of the diff-in-diff with its targeted ATTs. Maybe to get the mechanism you have to switch over to an instrument, and as a result you end up with something more like a LATE, which is not only a different parameter but a different population. And I don't just mean that it's the average complier effect. I mean that the LATE is not the LATT is not the ATT.
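In potential-outcomes notation, the population mismatch is easy to state:

```latex
% Different parameters defined over different populations:
\[
\mathrm{ATT} = E\left[Y(1) - Y(0) \mid D = 1\right]
  \quad \text{(the treated)},
\]
\[
\mathrm{LATE} = E\left[Y(1) - Y(0) \mid D(1) > D(0)\right]
  \quad \text{(compliers: the units the instrument moves)}.
\]
% Only under homogeneous treatment effects, or similar restrictions,
% do these collapse to the same number.
```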
And so you end up having to invoke a lot of homogeneous treatment effect assumptions to figure out the mechanism, which is identification by magic in some instances, because, as I said, there is no implicit guarantee that an identification assumption that holds for one outcome, Y, must therefore hold for another outcome, call it Y2.
I think this is actually a not widely noted problem with modern causal inference and its assumption of unbounded heterogeneous treatment effects. Buying into unbounded heterogeneous treatment effects is a huge headache. It broke OLS twoway fixed effects (RIP), it broke instrumental variables (also RIP), and it will keep breaking things. In fact, unbounded heterogeneity is so harsh that it even seems to make Popperian falsification of scientific theories impossible. Why? Because if treatment effects really can be anything, and you have a theory that says some comparative static goes in some direction, then you lose the ability to test that statement empirically. The theory can say it, but once you are in a world of unbounded heterogeneous treatment effects, you have let theory go. Maybe the overall effect is supposed to be negative, but does that therefore mean the average complier effect is going to be negative? Unbounded heterogeneous treatment effects may undermine Popperian principles of falsification, making it hard to truly work out through deduction why you found what you found.
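For one concrete version of "it broke OLS twoway fixed effects": results in the spirit of de Chaisemartin and D'Haultfœuille (2020) and Goodman-Bacon (2021) show that, under staggered adoption, the static TWFE coefficient is a weighted average of group-time effects whose weights can go negative.

```latex
% Static TWFE under staggered adoption and heterogeneous effects:
\[
\hat{\beta}_{\mathrm{TWFE}} \xrightarrow{p}
  \sum_{g,t} w_{g,t}\,\mathrm{ATT}(g,t),
\qquad
\sum_{g,t} w_{g,t} = 1,
\]
% where some w_{g,t} can be negative, so the regression can return a
% negative coefficient even when every ATT(g,t) is positive.
```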
What does this have to do with AI agents? Well, we actually have the mechanism for their behavior. How so? Because in that JSONL file, they will literally write it down. And since it's all text, the main thing you need to do is use natural language processing techniques to draw it out.
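Just as a sketch of what I mean, and nothing more, here is a crude keyword pass over made-up diary entries. The marker phrases are invented; real work would use a proper classifier or an LLM labeler.

```python
import re

# Made-up diary entries for illustration; in practice, feed in the assistant
# text extracted by the JSONL reader sketched above.
entries = [
    "The estimate is positive, which is not what I expected. Let me try "
    "a different specification with region-by-year fixed effects.",
    "Parallel trends look plausible here; reporting the baseline estimate.",
]

# Crude, invented markers of specification searching. Real work would use a
# proper classifier or an LLM labeler; this keyword pass is only a sketch.
MARKERS = [
    r"not what I expected",
    r"try (a|another) different (model|specification)",
    r"let me re-?run",
    r"unexpected (sign|result)",
]
pattern = re.compile("|".join(MARKERS), re.IGNORECASE)

for i, text in enumerate(entries):
    match = pattern.search(text)
    print(f"entry {i}:", f"FLAGGED ({match.group(0)})" if match else "ok")
```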
In the minimum wage study I mentioned the other day, I have various agents that I randomly assigned to the same task with slightly different primes about the expected effect of the minimum wage. Once I get the R&R resubmitted, I'll post about it, but let me tell you this for the sake of everyone else working on these questions: I am finding evidence of specification searching in response to the various minimum wage primes. I can literally see in these diaries Claude Code running a model, looking at the outcome, and then openly stating that it's going to try a different model because the estimated effect is not what it expected.
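Schematically, and with every name, prime, and number here invented for illustration (the actual study details are in the paper), the design looks like this:

```python
import random

# Schematic version of the design; all details invented for illustration.
# The same task goes to every agent, and only the prime about the expected
# effect is randomly assigned.
PRIMES = [
    "Most economists expect this minimum wage increase reduced employment.",
    "Most economists expect this minimum wage increase had no employment effect.",
    "",  # no-prime control arm
]

TASK = "Estimate the employment effect of the minimum wage in this dataset."

rng = random.Random(42)  # fixed seed so the assignment is reproducible
runs = [
    {"agent_id": i, "prime": rng.choice(PRIMES), "task": TASK}
    for i in range(30)
]

# Each run produces its own JSONL diary, which can then be compared across
# arms: does the assigned prime predict specification switching?
print(runs[0])
```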
Which means a few things. First, Claude Code is not going to automatically bind itself to design principles. It won't, in other words, take the position that the result is what the result is. This, again, is evidence that we are going to have to become adept at verification. Even when I significantly handicapped the agents (I forced them to read our JEL piece on difference-in-differences, rife with warnings about twoway fixed effects, for instance), they will switch things around, all under the hood, leaving me with results that I was not personally involved in producing.
Secondly, Claude Code does not appear to be targeting a particular population estimand. Which means the human is likely going to have to scaffold that, and that requires human capital. Proofing the work does too, and figuring out whether it's specification searching is going to need to be something the researcher stays on top of. I am skeptical that the researcher can simply submit their JSONL files with the project, because frankly, those are fairly easy to clip. But I suspect it will require, at minimum, a lot of collective awareness that the human cannot just lie back and let the agent drive. Even if the agent has the comparative advantage in production, that does not mean it has the comparative advantage in verification too. In fact, rarely does any actor have the comparative advantage across all of the decision nodes. And if there are only two decision nodes, an actor cannot have the comparative advantage in both even if it has the absolute advantage in both.
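The Ricardian point is easy to see with invented numbers:

```latex
% Hours per unit of work (numbers invented for illustration):
\[
\begin{array}{lcc}
             & \text{production} & \text{verification} \\
\text{agent} & 1  & 5  \\
\text{human} & 10 & 10 \\
\end{array}
\]
% The agent is absolutely better at both tasks. But one verification costs
% the agent 5 foregone units of production, versus only 1 for the human,
% so the human holds the comparative advantage in verification.
```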
Of course, some of this gets a little tenuous with agents, because we aren't dealing with just one actor but with a potentially huge number of swarming agents.
Anyway, we are at the beginning of all this. And I just wanted to put out there, again, that you have more data than you may know, and that this is probably one of those "text as data" moments. I encourage you to start tearing those JSONL files open, therefore, and incorporating them into your research: for sure if you're studying the behavior of agents, but probably even if you aren't.
Also, I have now written 50 essays about Claude Code. My first one was December 13th, 2025. And today is May 13th (or at least, it is in Zurich). What a journey these past six months have been, going from first noticing, almost randomly, in mid-November the toggle on the desktop app with Claude Code, to now saying, genuinely and sincerely, that Claude Code has changed my life. I am all in. I am pushing all my chips onto this technology. No turning back for me. All of the problems that Claude Code creates for my own workflow are simply problems to solve. I am undeterred. I find it such a strange sensation to wake up feeling actual gratitude for software. But I do.



Some reactions here, in a great post on this series: https://jasonmfletcher.substack.com/p/ai-reasoning-vs-ai-justification