The Pipeline Overview
1. Transcribe (multi-track audio) → 2. Metadata (70/30 weighting) → 3. Thumbnail (content-aware) → 4. Upload (batch to YouTube)
Step 1: Transcription
- Multi-Track Detection: Loki Studio detects up to 32 audio tracks in your video
- Track Naming: You assign meaningful names to each track (System Audio, Microphone 1, etc.)
- Primary Track Selection: Mark which track contains YOUR voice (radio button)
Output Files:
video_transcript.txt - All tracks with timestamps
video_captions.srt - PRIMARY and SECONDARY tracks only (for YouTube)
video_metadata.json - Full JSON data
Step 2: Metadata Generation
- Weighted Content Loading: AI receives 70% PRIMARY track, 30% context tracks
- Timestamps Stripped: Clean text without [HH:MM:SS] clutter
- Character Budget: Leaves room for your header/footer templates within the 5000-character maximum
- Focus: Description focuses on what YOU said, not just game sounds
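The weighting, timestamp stripping, and character budget described in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Loki Studio's actual code: the function names `strip_timestamps`, `weighted_excerpt`, and `description_budget` are hypothetical, and the sketch assumes the 70/30 split is applied by character count.

```python
import re

# Matches bracketed timestamps such as [00:01:23] plus any trailing spaces.
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]\s*")


def strip_timestamps(text: str) -> str:
    """Remove [HH:MM:SS] markers and collapse leftover whitespace."""
    return " ".join(TIMESTAMP.sub("", text).split())


def weighted_excerpt(primary: str, context: str, total_chars: int,
                     primary_pct: int = 70) -> tuple[str, str]:
    """Split a character allowance between primary and context text.

    Assumption: the split is by character count (70% primary by default).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    primary_chars = total_chars * primary_pct // 100
    context_chars = total_chars - primary_chars
    return (strip_timestamps(primary)[:primary_chars],
            strip_timestamps(context)[:context_chars])


def description_budget(header: str, footer: str, max_chars: int = 5000) -> int:
    """Characters left for generated text once the templates are counted."""
    return max(0, max_chars - len(header) - len(footer))
```

For example, a 200-character header and a 300-character footer would leave 4500 characters for the generated description under this sketch.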
Step 3: Thumbnail Generation
- Same Weighting: Uses 70% YOUR words, 30% context
- Real Content: Title/subtitle reflect actual topics discussed
- Personality System: Applies Dad Joke, Brain Rot, or other styles
Track Priority System
| Track Type | Weight | Examples | In Captions? |
|---|---|---|---|
| PRIMARY | 70% | Microphone, Host, Commentary, My Voice | Yes |
| SECONDARY | 20% (shared) | Guest 1, Guest 2, Speaker, Co-Host | Yes |
| CONTEXT | 10% | System Audio, Game Audio, Music, SFX | No |
Track Naming Best Practices
| Track Type | Good Names | Bad Names | Why? |
|---|---|---|---|
| Your Mic | Microphone 1, Host, Commentary | Mic, Track 1, Audio | Auto-detection works best with full words |
| Game Audio | System Audio, Game Audio, Desktop | Game, Sounds, Track 0 | Clear context prevents mixing with voices |
| Guests | Guest 1, Speaker, Interview | Person, Other, Track 2 | Identifies supporting speakers |
Avoid Human Names for Tracks
Problem: What happens if you name tracks "Joe" and "Sally"?
- Auto-detection won't recognize "Joe" or "Sally" as primary
- It may treat them as context tracks, which are down-weighted and excluded from captions
- Solution: Manually select the Primary radio button, or rename the tracks to "Joe (Host)" / "Sally (Guest)"
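To see why a role word in the track name helps, here is a rough sketch of keyword-based detection. The real matching rules are not documented here, so everything in it is an assumption: the function name `classify_track` is hypothetical, the keyword sets are taken from the example names in the tables above, and unrecognized names fall back to CONTEXT because the text says they may be treated that way.

```python
import re

# Hypothetical keyword sets, drawn from the example track names above.
SECONDARY_WORDS = {"guest", "speaker", "interview", "cohost"}
PRIMARY_WORDS = {"microphone", "host", "commentary", "voice"}


def classify_track(name: str) -> str:
    """Guess a track type from whole words in its name (illustrative only)."""
    lowered = name.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    # Check secondary first so "Co-Host" is not mistaken for "Host".
    if "co-host" in lowered or words & SECONDARY_WORDS:
        return "SECONDARY"
    if words & PRIMARY_WORDS:
        return "PRIMARY"
    # Assumed fallback: names with no recognized role word count as context.
    return "CONTEXT"
```

Under these assumptions, "Joe" and "Mic" both fall through to CONTEXT, while "Joe (Host)" is detected as PRIMARY and "Sally (Guest)" as SECONDARY. Matching whole words is one way to model why abbreviations such as "Mic" are less reliable than "Microphone 1".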
Real-World Examples
Gaming Commentary (Good)
Track 0: "System Audio" → Context (game sounds) ○
Track 1: "Microphone 1" → Primary (your voice) ●
Result: Metadata focuses on YOUR commentary, game audio adds context
Podcast with Guests (Good)
Track 0: "System Audio" → Context (intro music) ○
Track 1: "Host" → Primary (you) ●
Track 2: "Guest 1" → Secondary ○
Track 3: "Guest 2" → Secondary ○
Result: Weighting = 50% host, 40% guests, 10% system
Poor Naming (Bad)
Track 0: "Track 1" → Unclear ○
Track 1: "Audio" → Unclear ○
Track 2: "Mic" → Too vague ○
Problem: Auto-detection fails, unclear weighting, poor results
Captions (SRT) Special Handling
What Goes in Captions?
- INCLUDED: All PRIMARY and SECONDARY tracks (human voices)
- EXCLUDED: CONTEXT tracks (game audio, music, SFX)
Note: Speaker labels are not yet supported. All human voices merge chronologically by timestamp.
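The caption rules above (human voices only, merged by timestamp, no speaker labels) can be illustrated with a short sketch. It is an assumption-laden example rather than the tool's implementation: the `Segment` type and the `srt_time` and `build_srt` helpers are hypothetical names, and segment times are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float      # seconds from the start of the video
    end: float        # seconds from the start of the video
    text: str
    track_type: str   # "PRIMARY", "SECONDARY", or "CONTEXT"


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments: list[Segment]) -> str:
    """Keep PRIMARY and SECONDARY segments, sort by start time, number them."""
    voiced = sorted(
        (s for s in segments if s.track_type in {"PRIMARY", "SECONDARY"}),
        key=lambda s: s.start,
    )
    blocks = [
        f"{i}\n{srt_time(s.start)} --> {srt_time(s.end)}\n{s.text}"
        for i, s in enumerate(voiced, start=1)
    ]
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

Because the segments carry no speaker label in the output, a host line and a guest line simply appear as consecutive caption entries in time order.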
Pro Tip
Good track naming is the foundation of great metadata! Take 10 seconds to name tracks properly, and the AI will generate descriptions and thumbnails that truly reflect your content.