Speech-to-Text (STT)

Triple-engine speech recognition for real-time subtitle generation—online and offline, with up to 99 languages supported.

Three Engines, Maximum Flexibility

Sub!t integrates three speech recognition engines, each optimized for a different scenario. Switch between them instantly, depending on whether you need cloud-powered speed, fully offline privacy, or a lightweight streaming solution.

Deepgram Nova-3

Online real-time streaming with roughly 200 ms latency. Keywords Boosting improves recognition of specialized terminology. Best for live events with internet access.

Whisper large-v3-turbo

Offline recognition with Metal GPU acceleration. Supports 99 languages. Best accuracy for pre-recorded or high-fidelity scenarios.

Sherpa-onnx

Offline recognition in two modes: streaming (Zipformer, Chinese-English) and non-streaming (SenseVoice, 5 languages). Lightweight and privacy-first.
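As a rough illustration of how the three engines divide the work, the sketch below picks one from a session's constraints. It is not Sub!t's actual selection logic: the Session fields, the language codes, the engine identifiers, and the fallback order are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical description of a captioning session."""
    online: bool           # is an internet connection available?
    language: str          # assumed language code, e.g. "en", "zh", "fr"
    needs_streaming: bool  # must captions appear while people are speaking?


# Assumption: the streaming Zipformer model covers Chinese and English only.
SHERPA_STREAMING_LANGS = {"zh", "en"}


def pick_engine(session: Session) -> str:
    """Return an illustrative engine identifier for the given session."""
    # Live captions with a connection: use the low-latency cloud engine.
    if session.online and session.needs_streaming:
        return "deepgram-nova-3"
    # Live captions without a connection: use the lightweight offline streamer
    # when its language pair covers the session.
    if session.needs_streaming and session.language in SHERPA_STREAMING_LANGS:
        return "sherpa-onnx-zipformer"
    # Everything else falls back to the offline engine with the widest
    # language coverage.
    return "whisper-large-v3-turbo"


if __name__ == "__main__":
    print(pick_engine(Session(online=False, language="zh", needs_streaming=True)))
```

In practice the choice is made by the operator in the app; the function only makes the trade-offs described above explicit.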

Smart Language Processing

The built-in OpenCC engine automatically converts offline recognition results between Traditional and Simplified Chinese, including Taiwan-specific vocabulary mapping.
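To show why vocabulary mapping matters on top of character conversion, here is a toy Simplified-to-Traditional converter. It does not use OpenCC and is not how Sub!t implements the feature; the two small tables and the function name are made up for illustration, whereas real OpenCC configurations (for example s2twp) rely on far larger dictionaries.

```python
# Toy phrase table: Mainland terms mapped to the terms used in Taiwan.
PHRASES_S2TW = {
    "软件": "軟體",  # "software"
    "鼠标": "滑鼠",  # "computer mouse"
}

# Toy character table: Simplified to Traditional, one character at a time.
CHARS_S2T = {
    "这": "這",
    "个": "個",
    "软": "軟",
    "标": "標",
}

MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASES_S2TW)


def to_taiwan_traditional(text: str) -> str:
    """Convert text using longest-match phrases first, then single characters."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible phrase starting at position i (length >= 2).
        for n in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES_S2TW:
                out.append(PHRASES_S2TW[chunk])
                i += n
                break
        else:
            # No phrase matched: convert one character, or keep it unchanged.
            out.append(CHARS_S2T.get(text[i], text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_taiwan_traditional("这个软件"))
```

Character-by-character conversion alone would turn 软件 into 軟件, which is valid Traditional script but not the word used in Taiwan; the phrase table is what produces 軟體.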

Audio input device selector—choose any microphone or audio interface
Recognition language selection per session
Model management panel for downloading and switching models
Enter / Esc keys work globally during STT—no need to click the input box first

Use Cases

Live Conferences:

Real-time speech-to-subtitle with Deepgram for multilingual audiences. Keywords Boosting helps proper nouns and technical terms be recognized correctly.

Houses of Worship:

Offline recognition with Sherpa-onnx or Whisper—no internet required. Perfect for venues without reliable connectivity.

Broadcast Production:

Generate real-time captions and pipe them through NDI output for live broadcast overlay.