This is great, thank you for sharing. I work on these APIs at OpenAI, and it's a surprise to me that transcription still works reasonably well at 2x/3x speed. On the other hand, for phone channels we get 8kHz audio that is upsampled to 24kHz for the model and it still works well, so perhaps the robustness shouldn't surprise me. Note there's probably a measurable decrease in transcription accuracy that worsens as you deviate from 1x speed. Also, we really need to support bigger/longer file uploads :)
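(For anyone who wants to try this at home, here's a minimal sketch of the speed-up pre-processing. It assumes ffmpeg is installed and on your PATH; atempo is ffmpeg's pitch-preserving tempo filter, and older builds cap a single atempo stage at 2.0, so larger factors are expressed as a chain of stages.)

```python
import subprocess

def speed_up(src: str, dst: str, factor: float) -> None:
    """Re-encode audio at a higher tempo without shifting pitch.

    Older ffmpeg builds limit a single atempo stage to [0.5, 2.0],
    so factors above 2.0 are split into a chain of stages.
    """
    stages = []
    remaining = factor
    while remaining > 2.0:
        stages.append("atempo=2.0")
        remaining /= 2.0
    stages.append(f"atempo={remaining}")
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-filter:a", ",".join(stages), dst],
        check=True,
    )

speed_up("talk.mp3", "talk_2x.mp3", 2.0)  # 2x: single stage
speed_up("talk.mp3", "talk_3x.mp3", 3.0)  # 3x: chained as 2.0 * 1.5
```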
I kind of want to take a more proper poke at this, but focused more on summarization accuracy than word-for-word accuracy, though I see the value in both.
I'm actually curious: if I run transcriptions back-to-back-to-back on the exact same audio, how much variance should I expect?
Maybe I'll try three approaches:
- A straight diff comparison (I know a lot of people are calling for this, but I really think this is less useful than it sounds)
- A "variance within the modal" test running it multiple times against the same audio, tracking how much it varies between runs
- An LLM analysis assessing whether the primary points from a talk were captured and summarized at 1x, 2x, 3x, and 4x (I think this is far more useful and interesting)
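For the variance test, something like the sketch below would do. It assumes the official openai Python SDK; the model name is illustrative, so swap in whichever transcription model you're testing. Pairwise word-level similarity via stdlib difflib is a rough stand-in for a proper WER metric.

```python
import itertools
from difflib import SequenceMatcher

from openai import OpenAI  # assumes the official openai Python SDK

client = OpenAI()

def transcribe(path: str) -> str:
    with open(path, "rb") as f:
        resp = client.audio.transcriptions.create(model="whisper-1", file=f)
    return resp.text

# Transcribe the same file several times, then compare every pair of runs.
runs = [transcribe("talk.mp3") for _ in range(5)]
for (i, a), (j, b) in itertools.combinations(enumerate(runs), 2):
    ratio = SequenceMatcher(None, a.split(), b.split()).ratio()
    print(f"run {i} vs run {j}: {ratio:.4f} word-level similarity")
```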
Quick feedback: it would be cool to research this internally and maybe find a sweet spot in the speed multiplier where the loss is minimal. This pre-processing is quite cheap and could eventually bring down the API price.
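A crude version of that sweet-spot sweep, reusing speed_up() and transcribe() from the sketches above (word-overlap similarity against the 1x transcript is only a proxy for the summarization-level metric discussed earlier):

```python
from difflib import SequenceMatcher

# Score each sped-up transcript against the 1x baseline.
baseline = transcribe("talk.mp3")
for factor in (1.5, 2.0, 2.5, 3.0, 4.0):
    sped = f"talk_{factor}x.mp3"
    speed_up("talk.mp3", sped, factor)
    text = transcribe(sped)
    score = SequenceMatcher(None, baseline.split(), text.split()).ratio()
    print(f"{factor}x: {score:.4f} similarity to 1x baseline")
```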