Claude Code 19: When the Reclassification Is Massive But the Trends Don't Change, Something Interesting Is Happening (Part 4)
Testing the marginal-cases hypothesis with a thermometer, and letting Claude Code run wild on datasets
In Parts 1 and 2, I showed you the setup and the punchline: gpt-4o-mini agreed with the original RoBERTa classifier on only 69% of individual speeches, but the aggregate trends — partisan polarization, country-of-origin patterns, the whole historical arc — were virtually identical. Over 100,000 labels changed and yet the original story didn’t.
That result was interesting at first, but then it kept bugging me. How can you reclassify 100,000 speeches, more or less saying the original RoBERTa model was wrong, and yet have every subsequent analysis find almost exactly the same things? What does that even imply about measurement itself?
So yesterday I spent an hour working with Claude Code to extend the analysis, sending the speeches through OpenAI a second time to test a conjecture. I had two conjectures, in fact. The first was that the reclassified speeches were the "marginal speeches," and that they were canceling out because the flows were roughly symmetric: anti to neutral, and pro to neutral. The second was that, if so, this might be a special case of using one-shot LLMs in place of human annotation with RoBERTa, one that only applies when there is a built-in cancellation mechanism like you get with labels of -1, 0, and +1. Would it work with four categories that don't cancel out (e.g., race categories)?
So today I spent another hour with Claude Code trying to figure out why. I don't explore that last question in today's video, but I will note that Claude Code did crawl the web until it found four new datasets with classified text that will let me evaluate the "three-body problem." Today's post is about everything else.
Thank you for your support! This Substack is a labor of love, and the Claude Code series remains free for the first several days after each post goes out. So if you want to keep reading it for free, just keep your eyes peeled for updates! But maybe consider becoming a paying subscriber too, as it's only $5/month, the price of a cup of coffee!
Jason Fletcher’s Question
My friend Jason Fletcher — a health economist at Wisconsin — asked a good question when I showed him the results: does the agreement break down for older speeches? Congressional language in the 1880s is nothing like the 2010s. If gpt-4o-mini is a creature of modern text, you’d expect it to struggle with 19th-century rhetoric.
We built two tests. The surprise: overall agreement barely moves. It’s 70% in the 1880s and 69% in the modern era. The LLM handles 19th-century speech about as well as 21st-century speech.
But beneath that stable surface, the composition rotates dramatically. Pro-immigration agreement rises from 44% in the early period to 68% in the modern era. Neutral agreement falls from 91% to 80%. They cancel in aggregate — a different kind of balancing act, hiding in plain sight.
You can find this “beautiful deck” here if you want to peruse it yourself.
My Conjecture: Marginal Cases
Here’s the theory I kept coming back to. The key measure in Card et al. is net tone — the percentage of pro-immigration speeches minus the percentage of anti-immigration speeches. It’s a difference. And when the LLM reclassifies, it’s overwhelmingly pulling speeches toward neutral from both sides. 33% of Pro goes to Neutral. 44% of Anti goes to Neutral. Direct flips between Pro and Anti are rare — only about 4-5%.
So think of it like two graders scoring essays as A, B, or C. They disagree on a third of the essays, but the class average is the same every semester. That only works if the disagreements cancel — if the strict grader downgrades borderline A’s to B’s and borderline C’s to B’s at roughly equal rates. The B pile grows. The average doesn’t move.
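To make the cancellation concrete, here is a toy back-of-the-envelope calculation in Python. The flow rates mirror the ones reported above (33% of Pro to Neutral, 44% of Anti to Neutral, rare direct flips), but the starting label shares are made up purely for illustration.

```python
# Toy illustration of why roughly symmetric reclassification barely moves net tone.
# Starting shares are hypothetical; flow rates mirror the ones reported in the text.
pro, neutral, anti = 0.30, 0.45, 0.25            # original label shares (made up)
net_tone_before = pro - anti                      # net tone = %pro - %anti

pro_to_neu, anti_to_neu = 0.33, 0.44              # reclassification rates toward Neutral
pro_to_anti, anti_to_pro = 0.04, 0.05             # rare direct flips

pro_after = pro * (1 - pro_to_neu - pro_to_anti) + anti * anti_to_pro
anti_after = anti * (1 - anti_to_neu - anti_to_pro) + pro * pro_to_anti
net_tone_after = pro_after - anti_after

print(f"net tone before: {net_tone_before:+.3f}")
print(f"net tone after:  {net_tone_after:+.3f}")
# A third of Pro and roughly half of Anti labels change, yet the difference
# barely moves, because the two flows toward Neutral largely offset each other.
```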
I had Claude Code build two formal tests. A one-sample t-test rejects perfect symmetry — the mean delta in net tone is about 5 percentage points, and the symmetry ratio is 0.82 rather than 1.0. The LLM pulls harder from Anti than from Pro. But 5 points is small relative to the 40-60 point partisan swings that define the story. The mechanism is asymmetric but correlated, and large-sample averaging absorbs what’s left.
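For reference, here is a minimal sketch of the kind of symmetry test described above, assuming a table with one row per Congress-party cell and net tone under both classifiers. The file name, column names, and the particular definition of the symmetry ratio are hypothetical stand-ins, not the actual project script.

```python
# Sketch of a symmetry test on net-tone deltas; file and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("net_tone_by_congress_party.csv")        # hypothetical file
delta = df["net_tone_llm"] - df["net_tone_roberta"]        # delta in percentage points

# H0: mean delta = 0, i.e. the Pro->Neutral and Anti->Neutral flows cancel perfectly.
t_stat, p_value = stats.ttest_1samp(delta, popmean=0.0)
print(f"mean delta = {delta.mean():.2f} pp, t = {t_stat:.2f}, p = {p_value:.4f}")

# One way to define a symmetry ratio: how hard the LLM pulls from Pro relative to Anti.
pro_to_neu = df["share_pro_to_neutral"].mean()             # hypothetical columns
anti_to_neu = df["share_anti_to_neutral"].mean()
print(f"symmetry ratio = {pro_to_neu / anti_to_neu:.2f}")
```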
The Thermometer
To push this further, I wanted to see where on the spectrum the reclassified speeches actually fall. So we sent all 305,000 speeches back to OpenAI — same speeches, same model — but this time asking for a continuous score from -100 (anti-immigration) to +100 (pro-immigration), with 0 as neutral.
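The exact prompt lives in the project repo; below is a minimal sketch of what a single thermometer request might look like with the standard OpenAI Python client. The prompt wording, temperature setting, and function name are my reconstruction, not the actual script, and the real run went through the Batch API rather than one call at a time.

```python
# Sketch of a single thermometer scoring call; the real run used the Batch API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "You score U.S. congressional speeches on immigration tone. "
    "Return a single integer from -100 (strongly anti-immigration) "
    "to +100 (strongly pro-immigration), with 0 meaning neutral."
)

def thermometer_score(speech_text: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": speech_text},
        ],
    )
    return int(resp.choices[0].message.content.strip())

print(thermometer_score("Mr. Speaker, these newcomers strengthen our nation..."))
```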
The prediction: if reclassification is really about marginal cases, the speeches that got reclassified should cluster near zero. They were always borderline. The LLM just called them differently.
Getting the data back from OpenAI was its own adventure. The batch submission kept hitting SSL errors around batch 17 — probably Dropbox syncing interfering with the uploads. Claude Code diagnosed this, added retry logic with exponential backoff, and pushed all 39 batches through. Another ~$11, another ~2.6 hours of processing time. The batch API continues to be absurdly cheap.
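I didn't keep the exact code Claude Code wrote, but the retry pattern is a standard one; here is a minimal sketch of exponential backoff wrapped around the batch file upload and job creation. The helper name and file paths are placeholders.

```python
# Sketch of retry-with-exponential-backoff around a batch submission.
# The helper name and file paths are placeholders, not the actual script.
import time
from openai import OpenAI

client = OpenAI()

def submit_batch_with_retry(jsonl_path: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            batch_file = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
            return client.batches.create(
                input_file_id=batch_file.id,
                endpoint="/v1/chat/completions",
                completion_window="24h",
            )
        except Exception as exc:  # e.g., the SSL errors that hit around batch 17
            wait = 2 ** attempt   # 1s, 2s, 4s, 8s, 16s
            print(f"{jsonl_path} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError(f"gave up on {jsonl_path} after {max_retries} attempts")

# e.g., submit_batch_with_retry("batches/thermometer_batch_17.jsonl")
```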
Once the results came back, we merged three datasets: the original RoBERTa labels, the LLM tripartite labels, and the new thermometer scores. Then we tested the hypothesis three ways.
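The merge itself is mundane, but for completeness, here is roughly what joining the three label sets looks like in pandas. The file and column names are hypothetical stand-ins for the project files.

```python
# Sketch of merging the three label sets on a speech identifier.
# File and column names are hypothetical stand-ins for the project files.
import pandas as pd

roberta = pd.read_csv("roberta_labels.csv")       # speech_id, roberta_label
tripartite = pd.read_csv("llm_tripartite.csv")    # speech_id, llm_label
thermo = pd.read_csv("llm_thermometer.csv")       # speech_id, thermometer

merged = (
    roberta.merge(tripartite, on="speech_id", how="inner")
           .merge(thermo, on="speech_id", how="inner")
)
merged["reclassified"] = merged["roberta_label"] != merged["llm_label"]
print(merged["reclassified"].mean())              # should be roughly 0.31
```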
First, the distributions. We plotted thermometer scores separately for speeches where the classifiers agreed versus speeches that got reclassified. The reclassified Pro-to-Neutral speeches cluster near zero from the right. The reclassified Anti-to-Neutral speeches cluster near zero from the left. The speeches where both classifiers agreed sit further out toward the poles. Exactly what the theory predicts.
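A histogram split by agreement status is enough to see this pattern. A sketch, reusing the hypothetical merged frame from above:

```python
# Sketch: thermometer distributions for agreed vs. reclassified speeches.
import matplotlib.pyplot as plt

for flag, grp in merged.groupby("reclassified"):
    name = "reclassified" if flag else "agreed"
    plt.hist(grp["thermometer"], bins=50, alpha=0.5, density=True, label=name)

plt.axvline(0, linestyle="--", color="gray")
plt.xlabel("Thermometer score (-100 anti to +100 pro)")
plt.legend()
plt.savefig("thermometer_by_agreement.png", dpi=300)
```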
Second, the means. Reclassified speeches have thermometer scores dramatically closer to zero than agreed speeches. The marginal-cases story holds up quantitatively, not just visually.
Third, and most formally: we ran logistic regressions asking whether proximity to zero on the thermometer predicts the probability of reclassification. It does. Speeches near the boundary are far more likely to get reclassified than speeches at the poles. The relationship is monotonic and strong.
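The regression is the most direct test. A minimal sketch with statsmodels, again using the hypothetical merged frame and column names:

```python
# Sketch: does proximity to zero on the thermometer predict reclassification?
import statsmodels.formula.api as smf

merged["abs_thermo"] = merged["thermometer"].abs()      # distance from the neutral point
merged["reclassified"] = merged["reclassified"].astype(int)

# A negative coefficient on abs_thermo means speeches near the boundary are
# more likely to be reclassified than speeches at the poles.
model = smf.logit("reclassified ~ abs_thermo", data=merged).fit()
print(model.summary())
```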
And here we see a summary of the trends for all three measures: the original RoBERTa model, last week's LLM tripartite reclassification, and today's new thermometer classification. Same thing. They all agree, even though RoBERTa was trained on 7,500 speeches annotated by students, while I just used a one-shot method and spent $10-11 per run through OpenAI's batch API, which is priced at 50% off the standard rate.
The Three-Body Problem
But here’s what I can’t stop thinking about. This cancellation mechanism has a very specific structure: two poles and a center. Pro and Anti are +1 and -1 on a one-dimensional scale, and Neutral is the absorbing middle. Losses from both poles wash toward the center, and because the measure is a difference, they cancel.
What happens with four categories? Or five? Or twenty? If there’s no single absorbing center, does the whole thing fall apart?
I called this the three-body problem — partly as a joke, partly because I think there’s something genuinely structural about having exactly three categories with a symmetric setup.
To test this, I had Claude Code — running in a separate terminal with --dangerously-skip-permissions — search online for publicly available datasets with 4+ human-annotated categories. It found four: AG News (4 categories), SST-5 sentiment (5 categories on an ordinal scale), 20 Newsgroups (20 categories), and DBpedia-14 (14 ontological categories). It downloaded all of them, wrote READMEs for each, and organized them in the project directory.
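If you want to pull the same four datasets yourself, three of them are a one-liner with the Hugging Face datasets library and one ships with scikit-learn. The dataset IDs below are the standard public ones and may not match the exact copies Claude Code downloaded for the project.

```python
# Sketch: pulling public versions of the four external datasets.
# IDs are the standard public ones and may differ from the project copies.
from datasets import load_dataset
from sklearn.datasets import fetch_20newsgroups

ag_news = load_dataset("ag_news")               # 4 topic categories
sst5 = load_dataset("SetFit/sst5")              # 5 ordinal sentiment categories
dbpedia = load_dataset("dbpedia_14")            # 14 ontological categories
newsgroups = fetch_20newsgroups(subset="all")   # 20 categories, via scikit-learn

print(ag_news["train"].features, len(newsgroups.data))
```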
I haven’t run the analysis yet. That’s tomorrow. But the plan is to classify all four datasets with gpt-4o-mini, compare with the original human labels, and see whether aggregate distributions are preserved the way they were for the immigration speeches. If the three-category setup is special, we should see distribution preservation break down as the number of categories increases.
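One simple way to operationalize "distribution preservation" is the total variation distance between the human label shares and the LLM label shares per dataset. This is just my framing of the comparison, not necessarily the analysis I will end up running.

```python
# Sketch: total variation distance between human and LLM label shares,
# one possible way to measure "distribution preservation" per dataset.
import pandas as pd

def total_variation(human_labels: pd.Series, llm_labels: pd.Series) -> float:
    p = human_labels.value_counts(normalize=True)
    q = llm_labels.value_counts(normalize=True)
    cats = p.index.union(q.index)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cats)

# e.g., total_variation(ag_news_df["human_label"], ag_news_df["llm_label"])
# If the three-category setup is special, this distance should grow as the
# number of categories increases.
```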
What’s Easy and What’s Hard with Claude Code
This series started as an experiment in what Claude Code can actually do. Three sessions in, I’m developing a clearer picture.
What’s easy now: writing analysis scripts that follow established patterns, submitting batch API jobs, generating publication-quality figures, building Beamer decks, managing file organization, and debugging infrastructure problems like SSL errors. Claude Code handles all of this faster than I could.
What’s still hard: the thinking. The conjecture about marginal cases — that was mine. The connection to the three-body problem — mine. The decision to use a thermometer to test it — mine. Jason’s question about temporal stability — his. Claude Code is extraordinary at executing ideas, but the ideas still have to come from somewhere.
The most productive workflow I’ve found is what I’d call conversational direction. I think out loud. Claude Code listens, proposes, executes. I steer. It builds. The dialogue is the thinking process.
What’s Next
Next week, after Valentine’s, I’ll run the external dataset analysis and see if the three-body hypothesis holds up. I’ll also build a proper deck for the thermometer results — following the rhetoric of decks principles I’ve been developing, with assertion titles, TikZ intuition diagrams, and beautiful figures.
If you want to see where this goes, stick around.
Thanks for following along. All Claude Code posts are free when they first go out, though everything goes behind the paywall eventually. Normally I flip a coin on what gets paywalled, but for Claude Code, every new post starts free. If you like this series, I hope you’ll consider becoming a paying subscriber — it’s only $5/month or $50/year, the minimum Substack allows.

This whole process is so useful for me to follow as a researcher in critical AI use, thank you Scott. Do you use Plan Mode for this work in Claude Code? I have found it to be amazing for keeping the decision-making with me, although I really wish I didn't have to keep hammering '2' to give it permission to read secure websites. 😅