Discussion about this post

Ralf Elsas-Nicolle

Hi Scott. Super interesting experiment and blog!

I'm currently engaged in a similar endeavor with forum posts, but I keep running into token limits when trying to use cheaper batch processing, in my case on Google Cloud. OpenAI is supposedly even more restrictive in this regard. To understand your workflow and data: when you say 300k documents, are these chunks of speeches? What is the total token count of the speech data?
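For context, one quick way to estimate that total is to tokenize each document and sum the counts. A minimal sketch, assuming OpenAI-style tokenization via the tiktoken library; the corpus directory is a placeholder, not a path from the post:

```python
from pathlib import Path

import tiktoken

# cl100k_base is the encoding used by GPT-4-class models
enc = tiktoken.get_encoding("cl100k_base")

total = 0
for path in Path("speeches/").glob("*.txt"):  # hypothetical corpus directory
    text = path.read_text(encoding="utf-8")
    total += len(enc.encode(text))

print(f"Total tokens across corpus: {total:,}")
```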

Thanks, and I'm looking forward to reading your continued blog tomorrow.

Dr Sam Illingworth

Thanks for sharing this so publicly, Scott. I'm really enjoying following along and learning how to apply this approach in my own research.

What process are you using to record the steps you've taken for human quality assurance? I imagine that would be really interesting, and in some instances necessary, to report if this were ever published as peer-reviewed research.
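One lightweight way to make those steps reportable is an append-only audit log of every model call. A minimal sketch assuming a JSONL log file; the field names and the `log_call` helper are illustrative, not Scott's actual pipeline:

```python
import json
import time

def log_call(logfile, doc_id, prompt, response, model):
    # One record per model call, appended as a JSON line for later review.
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model": model,
        "doc_id": doc_id,
        "prompt": prompt,
        "response": response,
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```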
