16 Comments
Dr Sam Illingworth's avatar

Great post, Scott, and I agree that this is a huge opportunity for social scientists and for research in general. Do you think there'll be any issues with journals accepting analysis done using Claude Code, though? For me personally, I'm genuinely thinking about moving a lot of my research beyond the paywalled barriers of expensive academic journals and onto Substack instead, where it can genuinely make a difference. Or is this just potential career silliness on my part?

scott cunningham's avatar

Well, that's a journal policy question, and I think we don't yet know. Journals do often require authors to acknowledge whether and for what they used AI, so I can imagine that'll just be what happens for the most part.

Dr Sam Illingworth's avatar

In the journals where I am an executive editor, we just ask for an explanation of how AI has been used and then engage in a dialogue, rather than a blanket yes or no. But it's far more didactic at the other journals I edit for, such as Nature Humanities and Social Sciences Communications, which I think is a shame and a missed opportunity.

Michael's avatar

For the time being, the $20/month ChatGPT plan comes with relatively generous limits on Codex (a product comparable to Claude Code, though not as nice in mysterious, hard-to-define ways). It's a reasonable place for a broke PhD student interested in this to look right now.

scott cunningham's avatar

Like what’s an example where they’re different but weird?

Michael's avatar

Codex seems to be tuned to just “close tickets.” It's very terse, asks fewer questions, and will just try to run off and code complete features. This can mean running down a dead end. You're delegating, not working with a partner.

On the other hand, if there are tricky mathematical/numerical implementation details, it seems to be a little better, possibly because the base model is better at math than Claude.

This is all just a vibes-based analysis and shouldn't really be trusted, but it is a consistent impression I get.

The non-AI software parts of the product are also not as polished, with fewer convenient little features.

scott cunningham's avatar

I wonder what’s going on. Are they targeting different types of users maybe?

Michael's avatar

Probably more of a software engineering workflow. You might have a junior dev do a self-contained set of work on a git branch, open a pull request, let tests run automatically, and then have it reviewed by a senior person before merging into the main codebase.

An agent that just goes off and independently makes complete pull requests with no interaction fits perfectly into this workflow. In fact, OpenAI lets you launch Codex jobs from the web without touching your machine at all, and they will appear on GitHub as pull requests.
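For concreteness, here is a minimal sketch of that branch-then-pull-request loop, scripted in Python around ordinary git and GitHub CLI commands. The branch name, commit message, and use of `gh` are illustrative assumptions on my part, not anything OpenAI or Codex prescribes.

```python
import subprocess

def run(*cmd):
    """Run a shell command and fail loudly if it errors."""
    subprocess.run(cmd, check=True)

# Illustrative names only -- not tied to any real repo or ticket.
branch = "feature/agent-task"

run("git", "checkout", "-b", branch)        # isolate the work on its own branch
# ... the agent (or junior dev) edits files here ...
run("git", "add", "-A")
run("git", "commit", "-m", "Draft implementation of the assigned task")
run("git", "push", "-u", "origin", branch)  # pushing typically triggers CI tests

# Open a pull request for a senior reviewer, using the GitHub CLI (gh).
run("gh", "pr", "create",
    "--title", "Draft implementation of the assigned task",
    "--body", "Agent-drafted change; please review before merging.")
```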

But I can speculate that training for this makes it harder to train for the interactive back-and-forth Claude Code style.

Scott Hancock's avatar

I vaguely recall that OpenAI and Anthropic offer their $20 plans free to students? Or maybe API credits?

Vitaly Meursault 🌹's avatar

This is me acting as a wall for you to bounce your ball off. Let's assume that the shock we're talking about is the release of ChatGPT, and we want to think about how it propagates through the economy. The release of ChatGPT can't have any direct effect on productivity (in the way that, say, a monetary policy shock can have a direct effect on interest rates). The productivity effects come through 1) the initial release serving as a signal that a new portion of the innovation space has been unlocked and that we can get better AI models in predictable ways, aka scaling; 2) software integrations that reduce frictions in applying intelligence, natural or artificial, to the physical world; and 3) people learning how to leverage natural or artificial intelligence better.

What makes analysis of equilibrium hard in this case (to my taste, too uncertain to be useful, but that's not a hill I would die on) is that everything we know about productivity effects so far has been snapshots of the productivity effects of specific [LLM, software integrations, human skill] bundles (and I'm even abstracting from the task object here, which is also very important, and the system-of-tasks object, which is even more important because of Goldratt's law). They don't tell us about equilibrium (by which I mean long-run steady state) effects, because each is just a point-in-time snapshot. The Acemoglu call-center [LLM, software integrations, human skill] bundle is nothing like the [LLM, software integrations, human skill] bundle from the Mollick paper, or the [LLM, software integrations, human skill] bundle we use when we use Claude Code. My [LLM, software integrations, human skill] bundle is nothing like yours. We're all in the process of learning.

The reason I like the METR study you cited so much (besides it being very sobering) is that it is painstakingly specific about the [LLM, software integrations, human skill] bundle they use. We need a long series of these studies over time so we can track these individual components as the detailed objects they are. Because 1) LLMs, software integrations, and human skill are all evolving fast, and 2) productivity depends not only on the sum of [LLM, software integrations, human skill] but also on their interaction, the expiration date of any insights from individual studies about the current state is very short (basically, because the label "AI" refers to a [LLM, software integrations, human skill] bundle, and the bundle drifts).

I'm with my old ancient Egyptian history professor here: "It's too early to write a history of ancient Egypt / think about the equilibrium effects of AI; we need to spend much more time digging up artifacts / observing and participating in [LLM, software integrations, human skill] evolution before we can even start."

I'm 100% with you that we all should start using Claude Code today (well, yesterday would be better, but today is second best). But to me that means not using a tool but rather subscribing to a long learning journey.

Scott Hancock's avatar

"But the equilibrium logic doesn’t care about any of that. It doesn’t require everyone to gain equally. It requires enough people to gain enough that non-adoption becomes costly."

Members of some organizations can enforce what are effectively prohibitions or cartels against adoption. This is especially costly if there are network effects to returns on adoption. I don't think academia is such a case, but you can imagine other types of organizations where it is true!

Scott Hancock's avatar

Gov

scott cunningham's avatar

The US government?

Scott Hancock's avatar

I messaged you earlier on LinkedIn about it; I had state govt in mind

Alexei Gannon's avatar

I'm not sure if the incentives are there for this, but I find AI agents incredibly useful for experimental planning and statistical replication in biology. When you have a new experimental idea, it now takes only a few hours to search the online literature for novelty, apply your analysis to similar datasets, and spin up simulations to ensure your next experiment is robust, novel, and reproducible. This is the type of rigor we want in scientific research, but it has always had too much time-cost and too little incentive for most labs to do regularly. That said, I am unsure whether competitive pressures in our current system will incentivize rigor over quantity, and how policy could change that.
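To make the "spin up simulations" step concrete, here is a minimal sketch of the kind of power simulation one might ask an agent to draft before running an experiment. The effect size, sample sizes, and simple two-group t-test are illustrative assumptions, not taken from any particular study.

```python
import numpy as np
from scipy import stats

def simulated_power(effect_size=0.5, n_per_group=30, alpha=0.05,
                    n_sims=5000, seed=0):
    """Estimate power for a two-group comparison via Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)           # baseline group
        treated = rng.normal(effect_size, 1.0, n_per_group)   # shifted by the assumed effect
        _, p_value = stats.ttest_ind(treated, control)
        rejections += p_value < alpha
    return rejections / n_sims

# Sweep sample sizes to see roughly where the design becomes adequately powered.
for n in (10, 20, 30, 50, 100):
    print(f"n per group = {n:3d}, estimated power = {simulated_power(n_per_group=n):.3f}")
```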