🤖
Whisper AI
State-of-the-art accuracy
🌍
99+ Languages
Auto-detection + translation
⚡
GPU Acceleration
2-22x faster with CUDA
🎚️
Multi-Track
Separate mic + game audio
Model Selection
The model dropdown only shows models installed on your system:
| Model | Size | Speed | Quality | Recommended For |
|---|---|---|---|---|
| large-v3-turbo | ~1.5 GB | Fast | Excellent | Daily use (recommended) |
| large-v3 | ~3 GB | Slow | Best | Maximum accuracy |
| medium | ~1.5 GB | Medium | Good | Balanced |
| small | ~500 MB | Fast | Decent | Low-end hardware |
| base | ~150 MB | Very Fast | Basic | Testing |
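As a rough illustration of how the table can guide a choice, the sketch below picks the highest-quality installed model that fits a size budget. The `ModelInfo` class, the `pick_model` function, and the numeric quality ranks are hypothetical names and values made up for this example; only the model names and approximate sizes come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    size_mb: int       # approximate size from the table
    quality_rank: int  # higher is better; illustrative ordering of the Quality column


# Sizes mirror the table; the ranks are an assumption for this sketch.
MODELS = [
    ModelInfo("large-v3", 3000, 5),
    ModelInfo("large-v3-turbo", 1500, 4),
    ModelInfo("medium", 1500, 3),
    ModelInfo("small", 500, 2),
    ModelInfo("base", 150, 1),
]


def pick_model(installed, max_size_mb):
    """Return the best-quality installed model within the size budget, or None."""
    candidates = [
        m for m in MODELS
        if m.name in installed and m.size_mb <= max_size_mb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.quality_rank).name


if __name__ == "__main__":
    print(pick_model({"base", "small", "large-v3-turbo"}, max_size_mb=2000))
```

With a 2000 MB budget and `base`, `small`, and `large-v3-turbo` installed, this selects `large-v3-turbo`, which matches the "Daily use" recommendation.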
Advanced Features
Voice Activity Detection (VAD)
Automatically filters silence and non-speech audio:
- Auto - Selects the best VAD method for your content
- Energy - Fast, works well with game audio
- Silero - High accuracy for clean speech
- Off - Process all audio (no filtering)
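To give a feel for what an energy-based filter does, here is a minimal sketch that marks frames as speech when their RMS energy crosses a threshold. It is not the application's actual implementation; the function name `energy_vad`, the 30 ms frame length, and the 0.02 threshold are assumptions chosen for illustration.

```python
import math


def energy_vad(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame RMS energy meets the threshold.

    samples: mono audio as a sequence of floats in [-1.0, 1.0].
    frame_ms and threshold are illustrative defaults, not tuned values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans = []
    span_start = None  # sample index where the current active span began

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if span_start is None:
                span_start = i
        elif span_start is not None:
            # Energy dropped below the threshold: close the span at this frame.
            spans.append((span_start / sample_rate, i / sample_rate))
            span_start = None

    if span_start is not None:
        # Audio ended while still active.
        spans.append((span_start / sample_rate, len(samples) / sample_rate))
    return spans
```

A pure energy threshold is cheap, which fits the "Fast" description, but it cannot tell speech from other loud sounds. Model-based detectors such as Silero address that at a higher compute cost.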
Speaker Diarization
Identifies who's speaking - great for podcasts, interviews, and co-op gaming.
Word-Level Timestamps
Enables karaoke-style captions where each word highlights as it's spoken.
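A simplified sketch of how word timings can become captions: each word is written as its own SRT cue. The input shape (a list of `(text, start_sec, end_sec)` tuples) and the function names are assumptions for this example; the application's real caption renderer may work differently, for instance by highlighting the active word within a full line.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words):
    """Build SRT text with one cue per word.

    words: list of (text, start_sec, end_sec) tuples (hypothetical input shape).
    """
    cues = []
    for index, (text, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(words_to_srt([("Hello", 0.0, 0.4), ("world", 0.4, 0.9)]))
```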
Workflow
- Select your Video Profile (Dashboard tab)
- Place videos in the profile's toupload/ folder
- Click "Refresh" to see videos
- Select videos to transcribe
- Click "Transcribe Selected"
Output Files:
- video_name_transcript.txt - Plain text transcript
- video_name.srt - Subtitle file
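The naming pattern above can be sketched as follows. This assumes the output files are written next to the source video, which the description does not confirm; `output_paths` is a hypothetical helper, not part of the application.

```python
from pathlib import Path


def output_paths(video_path):
    """Derive transcript and subtitle file names from a video file name.

    Assumption: outputs live in the same folder as the video.
    """
    video = Path(video_path)
    return {
        "transcript": video.with_name(f"{video.stem}_transcript.txt"),
        "subtitles": video.with_name(f"{video.stem}.srt"),
    }


if __name__ == "__main__":
    for kind, path in output_paths("toupload/my_clip.mp4").items():
        print(kind, path)
```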
💡 Pro Tips
- Use OBS Audio Channel 2 (mic only) for highest accuracy
- Enable Word Timestamps in Settings for caption burning
- GPU transcription is typically 10-20x faster than CPU; the exact speedup depends on the model and hardware
- Use large-v3-turbo for best speed/quality balance