Everything under the hood, for those who want to know.
| Component | Requirement |
|---|---|
| Operating System | Windows 10/11 (64-bit) |
| Processor | Intel Core i5 / AMD Ryzen 5 (minimum) · Intel Core i7 / AMD Ryzen 7 (recommended) |
| RAM | 8 GB (minimum) · 16 GB (recommended for large models) |
| Storage | 2 GB for application · 10-15 GB for AI models |
| GPU (for fast transcription) | NVIDIA GTX 1060 6 GB or better · CUDA 11.8+ supported |
| Display | 1920×1080 minimum · 4K supported |
| Internet | Required for: LLM API calls (Claude/GPT), YouTube upload, model downloads |
Note: GPU is optional but strongly recommended. CPU transcription works but is 10-20x slower.
| GPU | VRAM | Max Model | Performance |
|---|---|---|---|
| RTX 4090 | 24 GB | large-v3 | Excellent |
| RTX 4080 / 3090 | 16-24 GB | large-v3 | Excellent |
| RTX 4070 / 3080 | 10-12 GB | large-v3-turbo | Excellent |
| RTX 3070 / 4060 | 8 GB | medium / turbo | Very Good |
| RTX 3060 / 2070 | 6-8 GB | medium | Good |
| GTX 1660 / 1060 | 6 GB | small / base | Usable |
| CPU Only | — | Any (slow) | Works |
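Not sure which row applies to your machine? Any tool that reports the GPU model and VRAM will do; here is a minimal sketch that queries the NVIDIA driver's nvidia-smi utility (a generic system check, not something Loki Studio runs):

```python
import subprocess

# Generic system check (not part of Loki Studio): ask nvidia-smi for the GPU
# name and total VRAM so you can find your row in the table above.
try:
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.strip().splitlines():
        name, vram = (part.strip() for part in line.split(","))
        print(f"GPU: {name}, VRAM: {vram}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("No NVIDIA GPU/driver detected; transcription will fall back to CPU.")
```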
Loki Studio uses OpenAI's Whisper model via the faster-whisper implementation (CTranslate2). Models are downloaded once and run entirely on your machine.
| Model | Size | VRAM | Speed vs Quality |
|---|---|---|---|
| large-v3-turbo (recommended) | 1.6 GB | ~6 GB | Best balance — near large-v3 quality at 4x speed |
| large-v3 | 3.1 GB | ~10 GB | Maximum accuracy, slower |
| medium | 1.5 GB | ~5 GB | Good accuracy, fast |
| small | 488 MB | ~3 GB | Faster, reduced accuracy |
| base | 147 MB | ~2 GB | Fastest, basic accuracy |
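The model names above map directly onto faster-whisper checkpoints. As a rough illustration of what happens under the hood, a minimal faster-whisper call might look like this (the model name, device, and file name are assumptions for the example, not Loki Studio's exact configuration):

```python
from faster_whisper import WhisperModel

# Sketch of a faster-whisper (CTranslate2) transcription pass.
# "large-v3-turbo" and float16 are illustrative choices, not Loki Studio's settings.
model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "my_video.wav",          # hypothetical input file
    word_timestamps=True,    # word-level timing, as used for captions
    vad_filter=True,         # Silero VAD pre-filtering of silence
)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text}")
```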
| Aspect | Details | Notes |
|---|---|---|
| Model | NLLB-200 (Meta's neural machine translation) | 34+ languages, word-level timing preserved |
| Model Size | ~3 GB (downloaded on first use) | Runs locally on GPU/CPU |
| Languages | European, Asian, Middle Eastern, Cyrillic | Any-to-any translation supported |
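For reference, NLLB-200 is openly available through Hugging Face Transformers, so the kind of call involved looks roughly like this (the distilled 600M checkpoint and the language codes are illustrative choices, not Loki Studio's internal setup):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Sketch of an NLLB-200 translation call via Hugging Face Transformers.
# Checkpoint and language codes are illustrative; Loki Studio ships its own model setup.
checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("Welcome to the channel!", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    # NLLB selects the target language by forcing its code as the first generated token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("jpn_Jpan"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```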
| Feature | Description |
|---|---|
| Character-Level Timing | Distributes characters evenly across segment duration (no word boundaries in CJK); see the sketch below the table |
| Auto-Detection | Recognizes CJK content via language setting OR Unicode analysis |
| All Caption Styles | OneWordPop (2 chars), FullSegment (character highlight), WindowSlide (character scroll) |
| ASS Export | Character-level timing preserved in subtitle export |
| Languages | Chinese (Simplified/Traditional), Japanese, Korean |
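The "distributes characters evenly" rule from the first row is simple enough to show directly. A minimal sketch (illustrative only, not Loki Studio's internal code):

```python
def character_timings(text: str, start: float, end: float) -> list[tuple[str, float, float]]:
    """Spread per-character timestamps evenly across a segment's duration.

    CJK text has no word boundaries to anchor on, so each character gets an
    equal slice of the segment. Illustrative sketch, not Loki Studio's code.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    step = (end - start) / len(chars)
    return [(c, start + i * step, start + (i + 1) * step) for i, c in enumerate(chars)]

# Example: a 2-second Japanese segment starting at 12.0 s.
for char, t0, t1 in character_timings("こんにちは世界", 12.0, 14.0):
    print(f"{char}: {t0:.2f}s -> {t1:.2f}s")
```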
Loki Studio generates titles, descriptions, tags, and chapters using your choice of LLM. You provide your own API key — no middleman markup.
| Provider | Models | Notes |
|---|---|---|
| Anthropic Claude (recommended) | Claude 3.5 Sonnet, Claude 3 Opus | Best creative writing quality |
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4-turbo | Reliable, fast |
| Ollama | Llama 3, Mistral, custom models | Free, runs locally, requires setup |
| LM Studio | Any GGUF model | Free, runs locally, requires setup |
| Grimnir (Built-in) | Bundled llama.cpp models | No setup required, moderate quality |
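All of the hosted providers above are reached over plain HTTP, and the local options (Ollama, LM Studio) expose OpenAI-compatible endpoints, so a metadata request comes down to a single chat-completion call. A rough sketch in that OpenAI-compatible style (the endpoint, model name, and prompt are placeholders, not Loki Studio's actual configuration or prompts):

```python
import json
import urllib.request

# Sketch of a chat-completion request in the OpenAI-compatible style that
# OpenAI, Ollama, and LM Studio all accept. Endpoint, model, and prompt are
# placeholders; hosted providers also require an API key header.
ENDPOINT = "http://localhost:11434/v1/chat/completions"  # e.g. a local Ollama server
payload = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": "You write YouTube titles, descriptions, and tags."},
        {"role": "user", "content": "Transcript excerpt: ...\n\nSuggest a title and 10 tags."},
    ],
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())
print(reply["choices"][0]["message"]["content"])
```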
| Category | Formats | Notes |
|---|---|---|
| Video Input | MP4, MKV, MOV, AVI, WebM, WMV, FLV, M4V, TS, MTS | Any resolution up to 8K supported |
| Audio Input | MP3, WAV, FLAC, AAC, OGG, M4A, WMA | Multi-track audio from OBS/recording software |
| Video Export | MP4 (H.264/H.265), WebM (VP9) | Stream copy when possible for fast export (see sketch below) |
| Subtitle Export | SRT, ASS/SSA, VTT, JSON | Word-level timestamps, CJK character-level timing |
| Thumbnail Export | PNG, JPG, WebP | 1280×720 default, customizable |
| Project Files | JSON-based project format, EDL import/export | Human-readable, version control friendly |
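"Stream copy" means the already-encoded audio and video bitstreams are passed through untouched rather than re-encoded, so exports run at roughly disk speed. For reference, the equivalent standalone FFmpeg command looks something like this (file names and cut points are placeholders, and this is not Loki Studio's internal invocation):

```python
import subprocess

# Illustrative FFmpeg stream-copy trim: copies the existing video/audio streams
# without re-encoding, so the export finishes at roughly disk speed.
# File names and cut points are placeholders.
subprocess.run(
    [
        "ffmpeg",
        "-ss", "00:01:30",   # start of the kept range
        "-to", "00:12:00",   # end of the kept range
        "-i", "input.mp4",
        "-c", "copy",        # stream copy: no re-encode
        "output.mp4",
    ],
    check=True,
)
```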
| Feature | Details |
|---|---|
| Multi-Track Support | Separate audio tracks from OBS, Streamlabs, etc. are processed independently |
| VAD (Voice Activity Detection) | Silero VAD v5.1 (ONNX) — filters silence before transcription |
| LUFS Analysis | ITU-R BS.1770-4 compliant loudness measurement |
| Waveform Generation | SIMD-optimized (AVX2) waveform visualization |
| Sample Rates | 8 kHz to 192 kHz (internally resampled to 16 kHz for Whisper) |
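For context, the same ITU-R BS.1770 integrated-loudness (LUFS) figure can be reproduced with off-the-shelf Python tooling. A sketch using the pyloudnorm and soundfile packages (illustrative only; Loki Studio uses its own analysis engine):

```python
import soundfile as sf
import pyloudnorm as pyln

# Sketch of an ITU-R BS.1770 integrated-loudness (LUFS) measurement using
# pyloudnorm. Illustrative only, not Loki Studio's internal engine.
data, rate = sf.read("narration.wav")   # hypothetical input file
meter = pyln.Meter(rate)                # K-weighted BS.1770 meter
loudness = meter.integrated_loudness(data)
print(f"Integrated loudness: {loudness:.1f} LUFS")
```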
| Feature | Details |
|---|---|
| API Version | YouTube Data API v3 |
| Authentication | OAuth 2.0 (you authorize Loki Studio to upload on your behalf) |
| Upload Features | Video, thumbnail, title, description, tags, category, playlists, scheduling |
| Privacy Settings | Public, Unlisted, Private, Scheduled |
| Multiple Channels | Switch between authenticated channels |
| Upload Queue | Batch uploads with progress tracking and retry on failure |
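Under the hood this is the standard videos.insert resumable upload from the YouTube Data API v3. A stripped-down sketch using Google's official Python client (OAuth credential handling is omitted and the metadata values are placeholders; Loki Studio's own upload queue adds batching, retries, and progress tracking on top):

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Sketch of a YouTube Data API v3 resumable upload (videos.insert).
# `credentials` must be OAuth 2.0 user credentials obtained beforehand;
# the metadata values are placeholders, not Loki Studio's generated output.
def upload_video(credentials, path: str):
    youtube = build("youtube", "v3", credentials=credentials)
    request = youtube.videos().insert(
        part="snippet,status",
        body={
            "snippet": {"title": "My video", "description": "...", "tags": ["demo"]},
            "status": {"privacyStatus": "unlisted"},
        },
        media_body=MediaFileUpload(path, chunksize=-1, resumable=True),
    )
    response = None
    while response is None:
        # next_chunk() uploads one chunk and reports progress until done.
        status, response = request.next_chunk()
        if status:
            print(f"Uploaded {int(status.progress() * 100)}%")
    print(f"Video id: {response['id']}")
```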
Tested on an RTX 4070 with the large-v3-turbo model. Your results may vary based on hardware.
| Operation | 30-min Video | 1-hour Video |
|---|---|---|
| Transcription (GPU) | ~2-3 minutes | ~5-6 minutes |
| Transcription (CPU) | ~30-45 minutes | ~60-90 minutes |
| Metadata Generation | ~15-30 seconds | ~20-40 seconds |
| Thumbnail Frame Extraction | ~5 seconds | ~8 seconds |
| Video Metadata Read | <5 ms | <5 ms |
| Waveform Generation | ~2-3 seconds | ~4-5 seconds |
| Component | Technology |
|---|---|
| Application Framework | Qt 6.10 (C++17, QML) |
| Transcription Engine | CTranslate2 (faster-whisper), CUDA 11.8+ |
| Video Playback | GStreamer 1.0 MSVC |
| Video Processing | FFmpeg (statically linked in isolated DLLs) |
| Image Compositing | Custom AVX2-optimized engine |
| Local LLM | llama.cpp (optional Skuld module) |
| Model Inference | ONNX Runtime 1.19+ |
| Networking | Qt Network, OpenSSL 3.x |
| Aspect | Policy |
|---|---|
| Telemetry | None. Zero data collection. |
| Local Processing | Transcription, thumbnail creation, video editing — all local |
| Cloud Connections | Only when you request: LLM API calls, YouTube uploads, model downloads |
| API Keys | Stored locally, encrypted, never transmitted except to the provider you chose |
| Videos | Never uploaded anywhere except YouTube when you click Upload |
Not sure if your system can run Loki Studio? Just ask.
Email Craig and you'll get an honest answer about whether Loki Studio is right for your setup.