Whisper API — Speech to Text with OpenAI

Sanjeev Sharma

Introduction

Whisper is OpenAI's speech-to-text model, and it transcribes audio with high accuracy. This guide covers calling it through the OpenAI API from Python.

Getting Started

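from openai import OpenAI

# Replace the placeholder with your key, or omit api_key to have the
# client read it from the OPENAI_API_KEY environment variable.
client = OpenAI(api_key="sk-...")

# Open the audio in binary mode and send it to the transcription endpoint.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f
    )

# The default response exposes the transcript as plain text.
print(transcript.text)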

Features

  • Multiple languages
  • High accuracy
  • Automatic punctuation
  • Timestamps available
  • Reasonable pricing
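
To illustrate the timestamps feature, here is a minimal sketch that requests the `verbose_json` response format and collects per-segment timings. The helper name `transcribe_with_segments` is hypothetical, `client` is assumed to be an `OpenAI` instance created as in Getting Started, and the code assumes a recent SDK version in which each segment is an object with `start`, `end`, and `text` attributes.

```python
def transcribe_with_segments(client, path, language=None):
    """Return a list of (start_seconds, end_seconds, text) tuples for one audio file."""
    # verbose_json asks the API to include segment-level timing information.
    kwargs = {"model": "whisper-1", "response_format": "verbose_json"}
    if language:
        # Optional ISO-639-1 code such as "en".
        kwargs["language"] = language
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(file=f, **kwargs)
    # Assumption: segments are objects exposing start, end, and text.
    return [(seg.start, seg.end, seg.text) for seg in result.segments]
```

Calling `transcribe_with_segments(client, "audio.mp3", language="en")` would then give one tuple per segment, which is usually enough to build subtitles or jump-to-time links.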

Supported Formats

MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM
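
A quick pre-upload check against this list can save a failed request. The sketch below only looks at the file extension, not the actual audio encoding, and the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical.

```python
from pathlib import Path

# Extensions taken from the format list above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def is_supported(path):
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```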

Pricing

$0.006 per minute of audio, which works out to $0.36 per hour.
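
For budgeting, the per-minute rate turns into a one-line estimate. This is a rough sketch that uses the rate quoted in this article; the constant and function names are hypothetical, and actual billing may round differently.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted in this article; verify against current pricing


def estimate_cost(duration_seconds):
    """Estimate the transcription cost in USD for audio of the given length."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 4)
```

For example, `estimate_cost(3600)` gives the cost of a one-hour recording at this rate.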

Tips

  • Clean audio produces better results
  • Specify the language as an ISO-639-1 code (for example "en") to improve accuracy and latency
  • Handle errors gracefully
  • Cache transcripts
  • Monitor API usage
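
Several of these tips can be combined in one helper. The sketch below passes a language code, retries failed requests with exponential backoff, and caches transcripts on disk keyed by a hash of the audio bytes. Everything here is illustrative: `transcribe_cached`, its parameters, and the `transcript_cache` directory are hypothetical names, and `client` is assumed to be an `OpenAI` instance created as in Getting Started.

```python
import hashlib
import time
from pathlib import Path


def transcribe_cached(client, path, cache_dir="transcript_cache", language="en",
                      retries=3, backoff=1.0, retry_on=(Exception,)):
    """Transcribe `path`, reusing a cached transcript if the same audio was seen before."""
    # Key the cache on the audio content and language rather than the file name.
    key = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.{language}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")

    for attempt in range(1, retries + 1):
        try:
            with open(path, "rb") as f:
                result = client.audio.transcriptions.create(
                    model="whisper-1", file=f, language=language
                )
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(backoff * 2 ** (attempt - 1))  # wait 1x, 2x, 4x ... backoff
        else:
            # Store the transcript so repeated requests cost nothing.
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(result.text, encoding="utf-8")
            return result.text
    raise ValueError("retries must be at least 1")
```

In practice you would likely narrow `retry_on` to transient errors such as `openai.RateLimitError` and `openai.APIConnectionError`, so that invalid requests fail immediately instead of being retried.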

Use Cases

Transcription services, voice commands, accessibility, meeting notes, customer feedback.

Limitations

  • File size limit of 25 MB per upload (longer recordings must be compressed or split)
  • No real-time streaming
  • Language detection sometimes inaccurate
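
Because the limit applies to file size, it is easy to check before uploading. A minimal sketch, assuming the 25 MB limit is counted as 25 × 1024 × 1024 bytes; `MAX_UPLOAD_BYTES` and `fits_upload_limit` are hypothetical names.

```python
import os

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path):
    """Return True if the file is small enough to send in a single request."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Files that fail this check need to be re-encoded at a lower bitrate or split along the time axis with an audio tool; cutting the raw bytes at arbitrary offsets would generally produce invalid audio.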

Conclusion

Whisper is an excellent choice for most speech-to-text needs.

FAQ

Q: How accurate is it? A: Very high on clean audio, approaching human transcription; accuracy drops on noisy recordings.

Q: Does it support other languages? A: Yes. The underlying Whisper model was trained on roughly 99 languages, though accuracy varies from language to language.

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro