The following is just some thoughts I had about Claude Code based on spending a day working on an old project I had done a ton of work on shortly after discovering Claude Code in mid-November.
Thank you for sharing. I may assign this Substack to all my graduate students!
Absolutely! Print it out as a PDF just in case. Everything goes behind the paywall after a few days, even the free things.
I am a paid subscriber :-) … I am basically stalking you. Ha ha. I read all you Substack and then I go straight to knock on Pedro’s door to comment and how cool something you did is.
Knocking on Pedro’s door is what I do too only using slack DMs! Lol
Ha ha
"A lot of econometrics can be done with pencil and paper if you really can distill it to the most basic version of itself."
I had an NLP professor who introduced almost every new method with something like, "Today we're going to learn a new way to add things up and then divide them by some other things. That's all it is."
An incredibly useful pedagogical trick to make some pretty intense adding and dividing seem manageable.
If you have motivated students in the classroom, which I suspect you were, it’s amazing. But if you have students who only want the equations for the exam, they get pissed. lol
This post gets a complex and important subject exactly right. I hope it creates ripples in the pond.
Thanks buddy.
Returning to an old project after months away tells you more about your setup than any benchmark. Fresh context, cold files, code you didn't touch in months.
If Claude can pick it back up without you rebuilding all the scaffolding from scratch, something is working. If you spend the first hour re-explaining what the project even is, something isn't.
I've found the difference comes down almost entirely to how well the memory files were maintained when the project was active.
Totally agree with that 100%. But that's the thing -- I'm always juggling way too many projects, and with a fairly intensive teaching schedule and new preps, that's the life of the professor.
I think I need different workflows, though, for new projects with CC versus picking up pre-CC projects versus picking up early CC projects. This one was basically the second project I worked on after discovering CC in mid-November, and I can absolutely tell that all the progress I've made since then wasn't there yet. I remember that back then it would constantly make new versions of existing files, for instance, whenever I had a request. Plus it would hard-code estimates rather than use stored results in macros, which I only pieced together later because it didn't do that consistently, and so I was lulled into thinking it never did. I've still not fully figured out the human verification parts -- what to do, when to do it, how to do it, and where to do it.
But back then, it was even before I was religiously using markdowns to document things constantly. So there was just a lot of learning by doing that happened for me over the last 3 months.
Hah, read the first paragraph: "I'm always juggling way too many projects" - this is 100% me as well :D
And from what you wrote - we have a very similar experience! I was coding back in 2024 in Rails and had a lot of mini-projects. But now... it is just different.
How would you recommend Claude Code/AI be used by graduate students who are just acquiring expertise in the field? I feel like if I am not using AI, I am falling behind. But I'm also worried about the "blind leading the blind" trap.
Honestly, you need to use it. That’s my suggestion. But you just need to be careful that it doesn’t accidentally reduce your human capital, and honestly, I think none of us know when or if or even how precisely that is happening.
But I think we all do know what it feels like to have true knowledge, so for me, that's the goal. Aim to be literate and aware without AI present. Confidence.
Sounds like a case for SKILLS.md for methodology
Which part?
Read this and immediately thought: context problem, not Claude problem. The part where it couldn't replicate csdid and wanted to scrap everything — Claude was given a codebase as its baseline and tried to replicate code logic rather than methodological logic. Researcher code is usually spaghetti, so when replication fails, the instinct is "the code is wrong," not "my replication is wrong."
What could have helped is methodology in context first — the CS vignette (https://bcallaway11.github.io/did/articles/multi-period-did.html) as your first message would be like a CLAUDE.md for methodology. Slight improvement would be to pass it to Gemini or Deepseek and ask it to extract key assumptions and known failure modes. Your expert intervention would still be essential, but having the ground truth document in context would have inverted the reasoning: does this code implement CS correctly, rather than does my replication match this code.
Skills have already been drafted in your tradition — compound-science, econ-research-skills, causal-inference-r, econometrics-check. At least two cite the Mixtape directly. Worth knowing they exist, though they probably won't prevent failure modes on novel methodology applied in unusual ways.
"We are not at AGI" is exactly right — we're at context window management. Get the context right, get good results.
I don't think the CS vignette helps in this case, because I was having him do this manually. The R file was actually breaking down. This was an unconventional version of CS, so I was having him manually do the estimator and then compare it with the Stata csdid so that I could diagnose precisely where the problem was. Anyway, the point is, I'm pretty skeptical that a markdown reference to the syntax of the R package would've helped, since I was asking him to calculate the four means manually so that I could work around the C+ problem it was running into with me flipping the data around like I was.