Gemini charges by tokens rather than minutes. I used VAD to trim silence, hoping the token count would go down. It barely changed (e.g., 30 seconds of background noise produced the same count as 2 seconds of background noise). Either the Gemini API trims silence under the hood, or audio tokenization depends on speech content rather than length. Not sure which.
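For anyone unfamiliar with "trim silence with VAD", here is a minimal sketch of the idea. It is not what I actually ran. It swaps a real VAD (such as WebRTC's) for a crude energy threshold, assumes mono float samples in [-1, 1], and uses hypothetical names (`trim_silence`, `frame_ms`, `rms_threshold`):

```python
import math
from typing import List, Sequence


def trim_silence(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    rms_threshold: float = 0.01,
) -> List[float]:
    """Drop fixed-length frames whose RMS energy is below rms_threshold.

    Crude energy-based stand-in for a real VAD (an assumption, not a
    specific library): each frame is labeled speech or silence on loudness
    alone, and only the "speech" frames are kept, in their original order.
    """
    # Number of samples per analysis frame (e.g. 480 at 16 kHz / 30 ms).
    frame_len = max(1, sample_rate * frame_ms // 1000)
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square amplitude of this frame (last frame may be short).
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= rms_threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    sr = 16000
    # 1 s of silence, 0.5 s of a 440 Hz tone, then 1 s of silence.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    audio = [0.0] * sr + tone + [0.0] * sr
    trimmed = trim_silence(audio, sample_rate=sr)
    print(len(audio), "samples before,", len(trimmed), "after")
```

Real VADs also pad or smooth their decisions so quiet word onsets aren't clipped; this sketch doesn't. Also note that an energy threshold would treat loud background noise as "speech", which is one reason a trained VAD is usually preferred.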

In either case, I bet OpenAI is doing the same optimization under the hood and keeping the savings for themselves.