Technical Specifications

Everything under the hood, for those who want to know.

System Requirements

Operating System | Windows 10/11 (64-bit)
Processor | Intel Core i5 or AMD Ryzen 5 (minimum) · Intel Core i7 / AMD Ryzen 7 (recommended)
RAM | 8 GB (minimum) · 16 GB (recommended for large models)
Storage | 2 GB for application · 10-15 GB for AI models
GPU (for fast transcription) | NVIDIA GTX 1060 6GB or better, CUDA 11.8+ supported
Display | 1920×1080 minimum · 4K supported
Internet | Required for: LLM API calls (Claude/GPT), YouTube upload, model downloads

Note: GPU is optional but strongly recommended. CPU transcription works but is 10-20x slower.

GPU Compatibility

GPU | VRAM | Max Model | Performance
RTX 4090 | 24 GB | large-v3 | Excellent
RTX 4080 / 3090 | 16-24 GB | large-v3 | Excellent
RTX 4070 / 3080 | 10-12 GB | large-v3-turbo | Excellent
RTX 3070 / 4060 | 8 GB | medium / turbo | Very Good
RTX 3060 / 2070 | 6-8 GB | medium | Good
GTX 1660 / 1060 | 6 GB | small / base | Usable
CPU Only | N/A | Any (slow) | Works
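Curious which row you land in? The table above is what matters, but as a rough self-check, here is an illustrative Python sketch (not part of Loki Studio) that reads your GPU's VRAM with the pynvml package and maps it onto the tiers above:

```python
# Illustrative only: suggest a Whisper model size from available VRAM,
# using thresholds that approximate the compatibility table above.
import pynvml

def suggest_model() -> str:
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return "CPU only: any model works, but expect 10-20x slower transcription"
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    vram_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
    pynvml.nvmlShutdown()
    if vram_gb >= 16:
        return "large-v3"
    if vram_gb >= 10:
        return "large-v3-turbo"
    if vram_gb >= 8:
        return "medium / turbo"
    if vram_gb >= 6:
        return "medium (or small / base on older cards)"
    return "small / base"

print(suggest_model())
```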

Transcription Models (Whisper)

Loki Studio uses OpenAI's Whisper model via the faster-whisper implementation (CTranslate2). Models are downloaded once and run entirely on your machine.

Model | Size | VRAM | Speed vs Quality
large-v3-turbo (Recommended) | 1.6 GB | ~6 GB | Best balance: near large-v3 quality at 4x speed
large-v3 | 3.1 GB | ~10 GB | Maximum accuracy, slower
medium | 1.5 GB | ~5 GB | Good accuracy, fast
small | 488 MB | ~3 GB | Faster, reduced accuracy
base | 147 MB | ~2 GB | Fastest, basic accuracy
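Loki Studio drives these models through its own engine, but if you are curious what faster-whisper transcription looks like at the library level, here is a minimal sketch. The file name is a placeholder, and depending on your faster-whisper version you may need a Hugging Face model path instead of the short model name:

```python
# Minimal faster-whisper sketch: word-timestamped transcription at the
# library level. Illustrative only, not Loki Studio's code.
from faster_whisper import WhisperModel

# The model downloads once, then runs locally. Use device="cpu" and
# compute_type="int8" if you have no CUDA GPU.
model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe("my_video.mp4", word_timestamps=True)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")

for segment in segments:
    for word in segment.words:
        print(f"[{word.start:6.2f} -> {word.end:6.2f}] {word.word}")
```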

Translation Engine

Component | Technology | Details
NLLB-200 | Meta's Neural Machine Translation | 34+ languages, word-level timing preserved
Model Size | ~3 GB (downloaded on first use) | Runs locally on GPU/CPU
Languages | European, Asian, Middle Eastern, Cyrillic | Any-to-any translation supported
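Loki Studio bundles its own integration, but the general shape of NLLB-200 translation can be sketched with Hugging Face transformers. The distilled checkpoint and language codes below are examples only, not necessarily what Loki Studio ships:

```python
# Illustrative NLLB-200 translation via Hugging Face transformers.
# The checkpoint and language codes are examples, not Loki Studio's setup.
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",   # NLLB uses FLORES-200 language codes
    tgt_lang="jpn_Jpan",
)

result = translator("Thanks for watching, and don't forget to subscribe!")
print(result[0]["translation_text"])
```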

CJK Caption System

Feature | Description
Character-Level Timing | Distributes characters evenly across segment duration (no word boundaries in CJK)
Auto-Detection | Recognizes CJK content via language setting OR Unicode analysis
All Caption Styles | OneWordPop (2 chars), FullSegment (character highlight), WindowSlide (character scroll)
ASS Export | Character-level timing preserved in subtitle export
Languages | Chinese (Simplified/Traditional), Japanese, Korean
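The character-level timing approach is easy to illustrate: with no word boundaries to anchor on, a segment's duration is split evenly across its characters. A simplified sketch of the idea (real caption rendering also has to deal with punctuation and mixed scripts, which is omitted here):

```python
# Simplified sketch of even character-level timing for CJK text:
# split a segment's duration uniformly across its characters.
def char_timings(text: str, seg_start: float, seg_end: float):
    """Yield (char, start, end) tuples with evenly distributed timing."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return
    step = (seg_end - seg_start) / len(chars)
    for i, ch in enumerate(chars):
        yield ch, seg_start + i * step, seg_start + (i + 1) * step

# Example: a 2.4-second Japanese segment starting at 10.0 s
for ch, start, end in char_timings("ご視聴ありがとうございました", 10.0, 12.4):
    print(f"{start:5.2f}-{end:5.2f}  {ch}")
```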

AI Text Generation (LLM Providers)

Loki Studio generates titles, descriptions, tags, and chapters using your choice of LLM. You provide your own API key — no middleman markup.

Provider | Models | Notes
Anthropic Claude (Recommended) | Claude 3.5 Sonnet, Claude 3 Opus | Best creative writing quality
OpenAI | GPT-4o, GPT-4o-mini, GPT-4-turbo | Reliable, fast
Ollama | Llama 3, Mistral, custom models | Free, runs locally, requires setup
LM Studio | Any GGUF model | Free, runs locally, requires setup
Grimnir (Built-in) | Bundled llama.cpp models | No setup required, moderate quality
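What the call looks like depends on the provider you pick. As one example, generating title ideas through Anthropic's Python SDK with your own API key might look roughly like this; the prompt, excerpt, and model alias are placeholders, not Loki Studio's internal prompts:

```python
# Example only: title suggestions via Anthropic's Python SDK using your
# own API key. Prompt and model alias are placeholders.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

transcript_excerpt = "…first part of the transcript goes here…"

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": f"Suggest five YouTube titles for this video:\n\n{transcript_excerpt}",
    }],
)
print(message.content[0].text)
```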

Supported Formats

Video Input

MP4, MKV, MOV, AVI, WebM, WMV, FLV, M4V, TS, MTS

Any resolution up to 8K supported

Audio Input

MP3, WAV, FLAC, AAC, OGG, M4A, WMA

Multi-track audio from OBS/recording software

Video Output

MP4 (H.264/H.265), WebM (VP9)

Stream copy when possible for fast export

Caption Export

SRT, ASS/SSA, VTT, JSON

Word-level timestamps, CJK character-level timing
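SRT is the simplest of these formats. As a rough illustration of what timestamped segments reduce to on export, here is a sketch that writes one SRT cue per segment; the segment list is a made-up example, not Loki Studio's project format:

```python
# Rough illustration: writing segment-level SRT cues from a made-up
# list of (start, end, text) tuples. Not Loki Studio's exporter.
def srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(segments, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, start=1):
            f.write(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n")

write_srt(
    [(0.0, 2.5, "Welcome back to the channel."),
     (2.5, 5.1, "Today we're looking at caption exports.")],
    "captions.srt",
)
```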

Thumbnail Output

PNG, JPG, WebP

1280×720 default, customizable

Project Files

JSON-based project format, EDL import/export

Human-readable, version control friendly

Audio Processing

Multi-Track Support | Separate audio tracks from OBS, Streamlabs, etc. processed independently
VAD (Voice Activity Detection) | Silero VAD v5.1 (ONNX), filters silence before transcription
LUFS Analysis | ITU-R BS.1770-4 compliant loudness measurement
Waveform Generation | SIMD-optimized (AVX2) waveform visualization
Sample Rates | 8 kHz to 192 kHz (internally resampled to 16 kHz for Whisper)
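For reference, BS.1770-4 integrated loudness is easy to reproduce outside the app. This sketch uses the pyloudnorm and soundfile packages, which are tooling choices of this example rather than what Loki Studio uses internally:

```python
# Reference sketch: ITU-R BS.1770-4 integrated loudness with pyloudnorm.
# Loki Studio has its own built-in analyzer; this just shows the measurement.
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("my_recording.wav")   # float samples, any supported rate
meter = pyln.Meter(rate)                   # BS.1770-4 K-weighted meter
loudness = meter.integrated_loudness(data)
print(f"Integrated loudness: {loudness:.1f} LUFS")
```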

YouTube Integration

API Version | YouTube Data API v3
Authentication | OAuth 2.0 (you authorize Loki Studio to upload on your behalf)
Upload Features | Video, thumbnail, title, description, tags, category, playlists, scheduling
Privacy Settings | Public, Unlisted, Private, Scheduled
Multiple Channels | Switch between authenticated channels
Upload Queue | Batch uploads with progress tracking and retry on failure
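Under the hood this is the standard Data API v3 resumable upload flow. A stripped-down sketch with google-api-python-client looks roughly like this; the OAuth token file, metadata fields, and file name are placeholders:

```python
# Stripped-down YouTube Data API v3 upload sketch using
# google-api-python-client. Token file and metadata are placeholders.
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

creds = Credentials.from_authorized_user_file("token.json")  # saved OAuth 2.0 token
youtube = build("youtube", "v3", credentials=creds)

request = youtube.videos().insert(
    part="snippet,status",
    body={
        "snippet": {
            "title": "My Video Title",
            "description": "Generated description goes here.",
            "tags": ["example", "demo"],
            "categoryId": "22",
        },
        "status": {"privacyStatus": "unlisted"},
    },
    media_body=MediaFileUpload("final_cut.mp4", chunksize=-1, resumable=True),
)

response = None
while response is None:
    status, response = request.next_chunk()   # resumable upload with progress
    if status:
        print(f"Uploaded {int(status.progress() * 100)}%")
print("Video id:", response["id"])
```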

Performance Benchmarks

Tested on RTX 4070 with large-v3-turbo model. Your results may vary based on hardware.

Operation | 30-min Video | 1-hour Video
Transcription (GPU) | ~2-3 minutes | ~5-6 minutes
Transcription (CPU) | ~30-45 minutes | ~60-90 minutes
Metadata Generation | ~15-30 seconds | ~20-40 seconds
Thumbnail Frame Extraction | ~5 seconds | ~8 seconds
Video Metadata Read | <5 ms | <5 ms
Waveform Generation | ~2-3 seconds | ~4-5 seconds

Technical Architecture

Application Framework | Qt 6.10 (C++17, QML)
Transcription Engine | CTranslate2 (faster-whisper), CUDA 11.8+
Video Playback | GStreamer 1.0 (MSVC)
Video Processing | FFmpeg (statically linked in isolated DLLs)
Image Compositing | Custom AVX2-optimized engine
Local LLM | llama.cpp (optional Skuld module)
Model Inference | ONNX Runtime 1.19+
Networking | Qt Network, OpenSSL 3.x

Privacy & Data

Telemetry | None. Zero data collection.
Local Processing | Transcription, thumbnail creation, and video editing all run locally
Cloud Connections | Only when you request: LLM API calls, YouTube uploads, model downloads
API Keys | Stored locally, encrypted, never transmitted except to the provider you chose
Videos | Never uploaded anywhere except YouTube when you click Upload

Questions About Compatibility?

Not sure if your system can run Loki Studio? Just ask.

Email Craig

I'll give you an honest answer about whether Loki Studio is right for your setup.
