How It Works - Loki Studio Documentation

The Pipeline Overview

1. Transcribe (multi-track audio) → 2. Metadata (70/30 weighting) → 3. Thumbnail (content-aware) → 4. Upload (batch to YouTube)
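The four stages run in a fixed order for each video, and stages 2 and 3 both depend on the transcript from stage 1. Below is a minimal Python sketch of that flow. The `Job` record and the stage functions (`transcribe`, `generate_metadata`, `generate_thumbnail`, `upload`, `run_batch`) are hypothetical stand-ins, not Loki Studio's actual code. The stages only record placeholder outputs, and the file naming assumes "video" in the output file names stands for the video's file stem.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Job:
    """Hypothetical per-video record handed from one stage to the next."""
    video: str
    outputs: Dict[str, str] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)

def transcribe(job: Job) -> Job:
    # Assumed naming: output files are prefixed with the video's file stem.
    stem = job.video.rsplit(".", 1)[0]
    job.outputs["transcript"] = f"{stem}_transcript.txt"
    job.outputs["captions"] = f"{stem}_captions.srt"
    job.outputs["metadata"] = f"{stem}_metadata.json"
    return job

def generate_metadata(job: Job) -> Job:
    # Stage 2 needs the transcript produced by stage 1.
    if "transcript" not in job.outputs:
        raise RuntimeError("transcribe must run first")
    job.outputs["description"] = "<title and description placeholder>"
    return job

def generate_thumbnail(job: Job) -> Job:
    # Stage 3 also works from the transcript, not the raw video.
    if "transcript" not in job.outputs:
        raise RuntimeError("transcribe must run first")
    job.outputs["thumbnail"] = "<thumbnail placeholder>"
    return job

def upload(job: Job) -> Job:
    # Placeholder: a real uploader would talk to YouTube here.
    job.outputs["upload"] = "queued"
    return job

PIPELINE: List[Callable[[Job], Job]] = [
    transcribe, generate_metadata, generate_thumbnail, upload,
]

def run_batch(videos: List[str]) -> List[Job]:
    """Run every stage, in order, for each video in the batch."""
    jobs = []
    for video in videos:
        job = Job(video)
        for stage in PIPELINE:
            job = stage(job)
            job.completed.append(stage.__name__)
        jobs.append(job)
    return jobs
```

Keeping each stage as a plain function makes the order explicit and lets a batch reuse the same stage list for every video.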

Step 1: Transcription

Multi-Track Detection: Loki Studio detects up to 32 audio tracks in your video
Track Naming: You assign meaningful names to each track (System Audio, Microphone 1, etc.)
Primary Track Selection: Mark which track contains YOUR voice (radio button)

Output Files:

video_transcript.txt - All tracks with timestamps
video_captions.srt - PRIMARY and SECONDARY tracks only (human voices, for YouTube)
video_metadata.json - Full JSON data
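The captions file uses the standard SubRip (SRT) layout: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line. Below is a minimal Python sketch of writing that format from (start, end, text) segments. The names `srt_timestamp` and `to_srt` are illustrative, and this is not necessarily how Loki Studio writes the file.

```python
from typing import List, Tuple

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with "\n" leaves one blank line between cues.
    return "\n".join(blocks)
```

Times are rounded to whole milliseconds, the finest precision the SRT time format carries.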

Step 2: Metadata Generation

Weighted Content Loading: AI receives 70% from the PRIMARY track, 30% from the remaining (secondary and context) tracks
Timestamps Stripped: Clean text without [HH:MM:SS] clutter
Character Budget: Header/footer templates plus generated text stay within YouTube's 5000-character description limit
Focus: Description focuses on what YOU said, not just game sounds
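Weighted loading, timestamp stripping, and the character budget can be pictured together in the sketch below. It assumes, purely for illustration, that "70% / 30%" means a share of a fixed input character budget, and that the 5,000-character limit covers header, generated text, and footer combined. The names `strip_timestamps`, `weighted_context`, and `description_budget` are hypothetical.

```python
import re

# Matches "[HH:MM:SS]" markers plus any trailing whitespace.
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")
DESCRIPTION_LIMIT = 5000  # YouTube's description limit, in characters

def strip_timestamps(text: str) -> str:
    """Remove [HH:MM:SS] markers so the model sees clean text."""
    return TIMESTAMP.sub("", text)

def weighted_context(primary: str, others: str, budget: int,
                     primary_percent: int = 70) -> str:
    """Split an input character budget between primary and other tracks."""
    primary_chars = budget * primary_percent // 100  # integer math, no float rounding
    other_chars = budget - primary_chars
    primary_text = strip_timestamps(primary)[:primary_chars]
    other_text = strip_timestamps(others)[:other_chars]
    return f"PRIMARY:\n{primary_text}\n\nCONTEXT:\n{other_text}"

def description_budget(header: str, footer: str) -> int:
    """Characters left for generated text after the header and footer."""
    return max(0, DESCRIPTION_LIMIT - len(header) - len(footer))
```

A fuller version would hand any unused primary budget to the other tracks instead of leaving it empty.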

Step 3: Thumbnail Generation

Same Weighting: Uses 70% YOUR words, 30% from the remaining tracks
Real Content: Title/subtitle reflect actual topics discussed
Personality System: Applies Dad Joke, Brain Rot, or other styles

Track Priority System

Track Type	Weight	Examples	In Captions?
PRIMARY	70%	Microphone, Host, Commentary, My Voice	Yes
SECONDARY	20% (shared across all secondary tracks)	Guest 1, Guest 2, Speaker, Co-Host	Yes
CONTEXT	10%	System Audio, Game Audio, Music, SFX	No
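The table's percentages can be turned into per-track weights. The sketch below is one interpretation, not Loki Studio's documented algorithm. It assumes each category's share is split equally among its tracks, and that a missing non-primary category hands its share to the other one, which reproduces the 70/30 split for a primary track plus context only. `track_weights` and `BASE_SHARES` are hypothetical names.

```python
from fractions import Fraction
from typing import Dict, List

# Assumed base shares, in percent, taken from the Track Priority table.
BASE_SHARES = {"PRIMARY": 70, "SECONDARY": 20, "CONTEXT": 10}

def track_weights(tracks: Dict[str, str]) -> Dict[str, float]:
    """Map track name -> weight in percent, given track name -> track type."""
    groups: Dict[str, List[str]] = {"PRIMARY": [], "SECONDARY": [], "CONTEXT": []}
    for name, kind in tracks.items():
        groups[kind].append(name)  # unknown types raise KeyError
    if not groups["PRIMARY"]:
        raise ValueError("select a primary track first")

    supporting = [k for k in ("SECONDARY", "CONTEXT") if groups[k]]
    if not supporting:
        shares = {"PRIMARY": Fraction(100)}  # voice only: primary gets everything
    else:
        pool = Fraction(100 - BASE_SHARES["PRIMARY"])  # the 30% non-primary pool
        base_total = sum(BASE_SHARES[k] for k in supporting)
        shares = {"PRIMARY": Fraction(BASE_SHARES["PRIMARY"])}
        for k in supporting:
            # Redistribute the pool in proportion to the present categories.
            shares[k] = pool * BASE_SHARES[k] / base_total

    weights: Dict[str, float] = {}
    for kind, share in shares.items():
        per_track = share / len(groups[kind])  # equal split within a category
        for name in groups[kind]:
            weights[name] = round(float(per_track), 2)
    return weights
```

Using `Fraction` keeps the arithmetic exact until the final conversion to percentages.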

Track Naming Best Practices

Track Type	Good Names	Bad Names	Why?
Your Mic	Microphone 1, Host, Commentary	Mic, Track 1, Audio	Auto-detection works best with full words
Game Audio	System Audio, Game Audio, Desktop	Game, Sounds, Track 0	Clear context prevents mixing with voices
Guests	Guest 1, Speaker, Interview	Person, Other, Track 2	Identifies supporting speakers
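A simple keyword matcher shows why full words matter. The sketch below assumes detection is a case-insensitive substring match against keyword lists drawn from the Good Names column. The real detection rules are not documented here, and `detect_track_type` and `KEYWORDS` are hypothetical.

```python
from typing import Optional

# Hypothetical keyword lists, checked in order. SECONDARY comes first so
# "Co-Host" is not caught by the PRIMARY keyword "host".
KEYWORDS = [
    ("SECONDARY", ("co-host", "guest", "speaker", "interview")),
    ("PRIMARY", ("microphone", "host", "commentary", "my voice")),
    ("CONTEXT", ("system audio", "game audio", "desktop", "music", "sfx")),
]

def detect_track_type(name: str) -> Optional[str]:
    """Return the detected track type, or None when the name is unclear."""
    lowered = name.lower()
    for kind, words in KEYWORDS:
        if any(word in lowered for word in words):
            return kind
    return None
```

Under these assumptions, "Microphone 1" is detected as PRIMARY, while "Mic", "Game", and "Track 1" match nothing and come back as None.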

Avoid Human Names for Tracks

Problem: What if you name tracks "Joe" and "Sally"?

Auto-detection won't recognize "Joe" or "Sally" as primary
They may be treated as context, which also leaves them out of captions (bad!)
Solution: Manually select Primary radio button, or rename to "Joe (Host)" / "Sally (Guest)"
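Both fixes can be sketched as an auto-detected type per track plus an optional manual override. All of the following is hypothetical: unrecognized names fall back to CONTEXT, choosing a primary manually demotes any other auto-detected primary to SECONDARY, and `auto_type` and `resolve_types` are illustrative names.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists; secondary is checked first so "Co-Host" is not read as "host".
SECONDARY_WORDS = ("co-host", "guest", "speaker", "interview")
PRIMARY_WORDS = ("microphone", "host", "commentary", "my voice")

def auto_type(name: str) -> str:
    lowered = name.lower()
    if any(word in lowered for word in SECONDARY_WORDS):
        return "SECONDARY"
    if any(word in lowered for word in PRIMARY_WORDS):
        return "PRIMARY"
    # Assumed fallback: unrecognized names such as "Joe" become context.
    return "CONTEXT"

def resolve_types(names: List[str], primary_index: Optional[int] = None) -> Dict[str, str]:
    """Auto-detect each track's type, then apply the Primary radio button if set."""
    types = {name: auto_type(name) for name in names}
    if primary_index is not None:
        chosen = names[primary_index]
        for name, kind in list(types.items()):
            if kind == "PRIMARY" and name != chosen:
                types[name] = "SECONDARY"  # assumed: only one primary after override
        types[chosen] = "PRIMARY"
    return types
```

Under these assumptions, `resolve_types(["Joe", "Sally"], primary_index=0)` marks Joe as primary but still leaves Sally as context. Renaming to "Joe (Host)" / "Sally (Guest)" classifies both voices without any manual step.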

Real-World Examples

Gaming Commentary (Good)

Track 0: "System Audio"     → Context (game sounds)  ○
Track 1: "Microphone 1"     → Primary (your voice)   ●

Result: Metadata focuses on YOUR commentary, game audio adds context

Podcast with Guests (Good)

Track 0: "System Audio"     → Context (intro music)  ○
Track 1: "Host"             → Primary (you)          ●
Track 2: "Guest 1"          → Secondary              ○
Track 3: "Guest 2"          → Secondary              ○

Result: Weighting = 70% host, 20% guests (shared), 10% system audio

Poor Naming (Bad)

Track 0: "Track 1"          → Unclear                ○
Track 1: "Audio"            → Unclear                ○
Track 2: "Mic"              → Too vague              ○

Problem: Auto-detection can't identify your voice, so weighting is unclear and results are poor

Captions (SRT) Special Handling

What Goes in Captions?

INCLUDED: All PRIMARY and SECONDARY tracks (human voices)
EXCLUDED: CONTEXT tracks (game audio, music, SFX)

Note: Speaker labels are not yet supported. All human voices merge chronologically by timestamp.
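Chronological merging without speaker labels can be sketched in two steps: drop CONTEXT tracks, then interleave the remaining tracks' segments by start time. This assumes each segment is a (start_seconds, end_seconds, text) tuple and each track is stored as (type, segments). `caption_segments` is a hypothetical name, and its output could feed any SRT writer.

```python
import heapq
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)

def caption_segments(tracks: Dict[str, Tuple[str, List[Segment]]]) -> List[Segment]:
    """Merge voice tracks into one chronological list; CONTEXT tracks are dropped."""
    voice_tracks = [
        sorted(segments)  # sort each track by start time
        for kind, segments in tracks.values()
        if kind in ("PRIMARY", "SECONDARY")
    ]
    # heapq.merge interleaves already-sorted lists; no speaker label is attached.
    return list(heapq.merge(*voice_tracks, key=lambda seg: seg[0]))
```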

Pro Tip

Good track naming is the foundation of great metadata! Take 10 seconds to name tracks properly, and the AI will generate descriptions and thumbnails that truly reflect your content.