Sending an Audio Stream to OpenAI’s Whisper Model

The documentation for OpenAI’s Whisper model makes it pretty clear that the API expects a file. But there is a way around that, and it took some experimentation to find: wrap the recorded audio in a Blob, give it a file name, and upload it as multipart form data.

let mediaRecorder = null;
const apikey = "sk-XXXXXXX";

function transcribe() {
  // Request permission to access the microphone
  navigator.mediaDevices
    .getUserMedia({ audio: true })
    .then((stream) => {
      mediaRecorder = new MediaRecorder(stream, {
        mimeType: "audio/webm; codecs=opus",
      });
      // Fires when recording stops (or on each timeslice, if one is set)
      mediaRecorder.addEventListener("dataavailable", (event) => {
        // Wrap the recorded chunk in a Blob and give it a file name —
        // this is what lets the API treat the stream as a file upload
        const blob = new Blob([event.data], { type: "audio/webm" });
        const data = new FormData();
        const fileName = `audio-${new Date().toISOString()}.webm`;
        data.append("file", blob, fileName);
        data.append("model", "whisper-1");
        fetch("https://api.openai.com/v1/audio/transcriptions", {
          method: "POST",
          headers: {
            Authorization: `Bearer ${apikey}`,
          },
          body: data,
        })
          .then((response) => response.json())
          .then((data) => console.log("Transcription:", data.text))
          .catch((error) => console.error(error));
      });
    })
    .catch((error) => console.error(error));
}

You can start recording with a button event that calls mediaRecorder.start() and call mediaRecorder.stop() on release, walkie-talkie style 😃
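Here is a minimal sketch of that button wiring. The button id and event names are illustrative assumptions, not from the original post. transcribe() is called once up front so mediaRecorder exists before the first press, and start() without a timeslice buffers audio until stop(), which is what fires the dataavailable handler above.

// Hypothetical hold-to-talk wiring; assumes a <button id="talk"> in the page
transcribe(); // set up mediaRecorder before the first press

const talkButton = document.getElementById("talk");

// Hold to record...
talkButton.addEventListener("pointerdown", () => {
  if (mediaRecorder && mediaRecorder.state === "inactive") {
    mediaRecorder.start();
  }
});

// ...release to send: stop() fires the dataavailable handler above
talkButton.addEventListener("pointerup", () => {
  if (mediaRecorder && mediaRecorder.state === "recording") {
    mediaRecorder.stop();
  }
});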
