🎤 Transcription (Muninn Engine)

Convert speech to text with AI-powered transcription. It is 100% free and runs locally.

  • 🤖 Whisper AI - State-of-the-art accuracy
  • 🌍 99+ Languages - Auto-detection + translation
  • GPU Acceleration - 2-22x faster with CUDA
  • 🎚️ Multi-Track - Separate mic + game audio

Model Selection

The model dropdown only shows models installed on your system:

Model            Size      Speed      Quality    Recommended For
large-v3-turbo   ~1.5 GB   Fast       Excellent  Daily use (recommended)
large-v3         ~3 GB     Slow       Best       Maximum accuracy
medium           ~1.5 GB   Medium     Good       Balanced
small            ~500 MB   Fast       Decent     Low-end hardware
base             ~150 MB   Very Fast  Basic      Testing
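
As a rough illustration of how the table can guide a choice, the sketch below picks the most preferred model among those installed, optionally capped by download size. The preference order and the pick_model helper are assumptions made for this example; they are not part of the Muninn Engine.

```python
# Approximate sizes in GB, copied from the table above.
MODEL_SIZES_GB = {
    "large-v3-turbo": 1.5,
    "large-v3": 3.0,
    "medium": 1.5,
    "small": 0.5,
    "base": 0.15,
}

# Assumed preference order: the recommended daily-use model first,
# then the remaining models from highest to lowest quality.
PREFERENCE = ["large-v3-turbo", "large-v3", "medium", "small", "base"]


def pick_model(installed, max_size_gb=None):
    """Return the most preferred installed model, or None if nothing fits."""
    for name in PREFERENCE:
        if name not in installed:
            continue
        if max_size_gb is not None and MODEL_SIZES_GB[name] > max_size_gb:
            continue
        return name
    return None
```

For example, pick_model({"base", "medium"}) returns "medium", and passing max_size_gb=1.0 on a low-end machine limits the choice to small or base.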

Advanced Features

Voice Activity Detection (VAD)

Automatically filters silence and non-speech audio:

  • Auto - Automatically selects the best VAD mode for your content
  • Energy - Fast, works well with game audio
  • Silero - High accuracy for clean speech
  • Off - Process all audio (no filtering)
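
To give a feel for what an energy-based VAD does, here is a minimal generic sketch: it splits audio into short frames and keeps the spans whose RMS energy exceeds a threshold. This is a textbook technique, not the engine's actual Energy mode; the function names, the 30 ms frame length, and the 0.02 threshold are all assumptions for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame energy exceeds threshold.

    samples: mono audio as floats in [-1.0, 1.0] (assumed input format).
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans = []
    start = None  # sample index where the current active span began
    for i in range(0, len(samples), frame_len):
        active = frame_rms(samples[i:i + frame_len]) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            spans.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:
        # Audio ended while still active: close the final span.
        spans.append((start / sample_rate, len(samples) / sample_rate))
    return spans
```

A fixed threshold like this is fast but cannot tell speech from other loud sounds, which is why a model-based VAD such as Silero tends to be more accurate on clean speech.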

Speaker Diarization

Identifies who's speaking - great for podcasts, interviews, and co-op gaming.
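
One common way to combine diarization with a transcript is to give each transcript segment the speaker whose turn overlaps it the most. The sketch below shows that idea only; the dictionary shapes, the SPEAKER_00 style labels, and the label_segments helper are hypothetical and do not describe the engine's internal format.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time ranges."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, speaker_turns):
    """Attach the best-overlapping speaker label to each transcript segment.

    segments:      [{"start": s, "end": s, "text": str}, ...]    (assumed shape)
    speaker_turns: [{"start": s, "end": s, "speaker": str}, ...] (assumed shape)
    """
    labeled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for turn in speaker_turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best, best_overlap = turn["speaker"], ov
        labeled.append({**seg, "speaker": best})
    return labeled
```

Segments that overlap no speaker turn keep the label "unknown" rather than being assigned a guess.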

Word-Level Timestamps

Enables karaoke-style captions where each word highlights as it's spoken.
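
As a sketch of how word timestamps can drive karaoke-style captions, the example below emits one cue per word, repeating the full line with the current word wrapped in highlight markers. The word dictionary shape, the karaoke_cues helper, and the <b> markers are assumptions for illustration; the engine's real caption format may differ.

```python
def karaoke_cues(words, open_tag="<b>", close_tag="</b>"):
    """Build one (start, end, text) cue per word, highlighting that word.

    words: [{"word": str, "start": sec, "end": sec}, ...] (assumed shape)
    """
    cues = []
    for i, current in enumerate(words):
        parts = [
            f"{open_tag}{w['word']}{close_tag}" if j == i else w["word"]
            for j, w in enumerate(words)
        ]
        cues.append((current["start"], current["end"], " ".join(parts)))
    return cues
```

Each cue is shown only for the duration of its word, so the highlight appears to move across the line as the words are spoken.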

Workflow

  1. Select your Video Profile (Dashboard tab)
  2. Place videos in the profile's toupload/ folder
  3. Click "Refresh" to list the videos
  4. Select videos to transcribe
  5. Click "Transcribe Selected"
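
The "Refresh" step can be pictured as a scan of the profile's toupload/ folder. The sketch below is an assumption about what such a scan could look like, not the app's actual code; the find_videos helper and the list of video extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of video extensions; the app may accept others.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov"}


def find_videos(profile_dir):
    """Return the video files in <profile_dir>/toupload, sorted by path."""
    folder = Path(profile_dir) / "toupload"
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
```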

Output Files:

  • video_name_transcript.txt - Plain text transcript
  • video_name.srt - Subtitle file
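
For readers who want to post-process the output, the sketch below shows how the two file names can be derived from a video path and how timed segments map onto the standard SRT layout (index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text, blank line). The helper names and the segment shape are hypothetical, and writing the outputs next to the video is an assumption.

```python
from pathlib import Path


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render [{"start", "end", "text"}, ...] (assumed shape) as SRT text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def output_paths(video_path):
    """Return (transcript_path, srt_path) following the naming shown above.

    Placing both files beside the video is an assumption of this sketch.
    """
    video = Path(video_path)
    return video.with_name(f"{video.stem}_transcript.txt"), video.with_suffix(".srt")
```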

💡 Pro Tips

  • Use OBS Audio Channel 2 (mic only) for highest accuracy
  • Enable Word Timestamps in Settings for caption burning
  • GPU transcription is 10-20x faster than CPU
  • Use large-v3-turbo for best speed/quality balance