Do the APIs support simultaneous voice transcription in a way that different voi...

Do the APIs support transcribing several simultaneous voices in a way that the different voices are tagged (either in the text or as metadata)?

If so: could you split the audio file in two, pitch-shift the latter half (say, by an octave), and mix the two halves on top of each other to get a shorter audio file? You would then transcribe it and join the tagged transcripts back into linear order, with the tags removed. (You could insert a prerecorded voice to mark the point where the second voice starts.) If a pitch change is not enough, maybe manipulate the formants as well.
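To make the idea concrete, here is a rough sketch of the two ends of that pipeline, using only NumPy. Everything named here is my own illustration: `split_and_overlay`, `relinearize`, and the `(speaker, start_sec, text)` segment format are hypothetical, not any API's actual interface. The pitch shifter is passed in as a function; with librosa installed, something like `lambda y: librosa.effects.pitch_shift(y, sr=sr, n_steps=12)` should give the octave shift. Whether a given API can actually tag two fully overlapping voices is exactly the open question, so treat this as a sketch under that assumption.

```python
import numpy as np

def split_and_overlay(samples, shift_fn):
    """Split mono audio in half, transform the second half, and mix both halves.

    shift_fn is any callable that takes and returns a 1-D array,
    e.g. a pitch shifter (assumption: supplied by the caller).
    """
    mid = len(samples) // 2
    first = np.asarray(samples[:mid], dtype=np.float64)
    second = np.asarray(shift_fn(samples[mid:]), dtype=np.float64)

    # The halves may differ in length by a sample or so; pad with silence.
    n = max(len(first), len(second))
    mixed = np.zeros(n, dtype=np.float64)
    mixed[:len(first)] += first
    mixed[:len(second)] += second
    return 0.5 * mixed  # halve the sum to avoid clipping

def relinearize(segments, first="A", second="B"):
    """Rebuild linear text from diarized segments, dropping the speaker tags.

    segments: iterable of (speaker, start_sec, text) tuples. This format is
    hypothetical; real APIs return diarization results in their own shapes.
    The unshifted voice (first) is emitted before the shifted one (second).
    """
    ordered = []
    for speaker in (first, second):
        own = sorted((s for s in segments if s[0] == speaker), key=lambda s: s[1])
        ordered.extend(text for _, _, text in own)
    return " ".join(ordered)
```

You would still have to work out which speaker label the API assigned to the shifted voice, which is where the prerecorded marker voice would come in.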