Sending an Audio Stream to OpenAI’s Whisper Model

The documentation for OpenAI’s Whisper model makes it pretty clear that the API expects a file. But there is a way around that, and it took some experimentation to find: wrap the recorded audio in a Blob, give it a file name, and upload it as multipart form data.

let mediaRecorder = null;
const apikey = "sk-XXXXXXX";

function transcribe() {
  // Request permission to access the microphone
  navigator.mediaDevices
    .getUserMedia({ audio: true })
    .then((stream) => {
      mediaRecorder = new MediaRecorder(stream, {
        mimeType: "audio/webm; codecs=opus",
      });
      // Fires when recording stops (or on each timeslice, if one is set)
      mediaRecorder.addEventListener("dataavailable", (event) => {
        // Wrap the recorded chunk in a Blob and give it a file name —
        // this is what lets the API treat the stream as a file upload
        const blob = new Blob([event.data], { type: "audio/webm" });
        const data = new FormData();
        const fileName = `audio-${new Date().toISOString()}.webm`;
        data.append("file", blob, fileName);
        data.append("model", "whisper-1");
        fetch("https://api.openai.com/v1/audio/transcriptions", {
          method: "POST",
          headers: {
            Authorization: `Bearer ${apikey}`,
          },
          body: data,
        })
          .then((response) => response.json())
          .then((data) => console.log("Transcription:", data.text))
          .catch((error) => console.error(error));
      });
    })
    .catch((error) => console.error(error));
}

You can start recording with a button event that calls mediaRecorder.start() and call mediaRecorder.stop() on release, walkie-talkie style 😃
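Here is a minimal sketch of that button wiring. The button id and event names are illustrative assumptions, not from the original post. transcribe() is called once up front so mediaRecorder exists before the first press, and start() without a timeslice buffers audio until stop(), which is what fires the dataavailable handler above.

// Hypothetical hold-to-talk wiring; assumes a <button id="talk"> in the page
transcribe(); // set up mediaRecorder before the first press

const talkButton = document.getElementById("talk");

// Hold to record...
talkButton.addEventListener("pointerdown", () => {
  if (mediaRecorder && mediaRecorder.state === "inactive") {
    mediaRecorder.start();
  }
});

// ...release to send: stop() fires the dataavailable handler above
talkButton.addEventListener("pointerup", () => {
  if (mediaRecorder && mediaRecorder.state === "recording") {
    mediaRecorder.stop();
  }
});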
