- $0 Zero Costs: No per-request API fees
- 🔒 Privacy: All processing on your machine
- 📴 Offline Capable: No internet required
- ♾️ No Rate Limits: Process as much as you want
Trade-offs
- Slower than cloud APIs (5-10x longer processing)
- Requires decent hardware (GPU with 8GB+ VRAM recommended)
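Not sure how much VRAM your card has? A quick check, assuming an NVIDIA GPU with current drivers (other GPUs show "Dedicated GPU memory" under Task Manager > Performance > GPU):

```
# Reports GPU name and total VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv
```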
Option 1: Ollama (Recommended)
Ollama is the easiest way to run local LLMs. It just works, runs as a lightweight background service, and takes up almost no resources until you actually need it.
Step 1: Download and Install
- Visit ollama.com
- Download Ollama for Windows
- Install - Ollama starts automatically as a background service
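To confirm the install before moving on, open Command Prompt or PowerShell and check that the CLI and the background service respond (Ollama listens on port 11434 by default):

```
# Confirm the CLI is on your PATH
ollama --version

# The background service should reply that Ollama is running
curl http://localhost:11434
```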
Step 2: Download a Model
Option A: Install from Loki Studio (Easiest)
- Open Loki Studio
- Go to Settings > Application Settings > LLM Providers
- Select Ollama as provider
- Click Install next to any model (llama3, mistral, qwen, etc.)
- Wait for download to complete (progress shown)
- Click Refresh to see installed models
Option B: Install via Command Line
Open Command Prompt or PowerShell and run:
```
# Best balance of quality and speed (8GB VRAM)
ollama pull llama3:8b

# Faster, smaller footprint (4GB VRAM)
ollama pull llama3.2:3b

# Good alternative (8GB VRAM)
ollama pull mistral:7b
```
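Once a pull finishes, you can confirm the model is available and give it a quick test straight from the terminal:

```
# List installed models
ollama list

# Optional smoke test (substitute whichever model you pulled)
ollama run llama3:8b "Reply with one short sentence."
```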
Step 3: Configure Loki Studio
- Open Loki Studio
- Go to Settings > Application Settings > LLM Providers
- Set AI Provider to Ollama
- Select your model from the dropdown (auto-detected)
- Settings auto-save!
That's it! Ollama runs automatically in the background whenever you need it.
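If you ever need to double-check that Ollama is reachable from other applications, its standard HTTP API on port 11434 can be queried directly. These are documented Ollama endpoints; run them in Command Prompt (in PowerShell, call curl.exe explicitly to bypass the Invoke-WebRequest alias):

```
# List installed models - presumably what the Loki Studio dropdown auto-detects
curl http://localhost:11434/api/tags

# One-off, non-streaming generation request
curl http://localhost:11434/api/generate -d "{\"model\": \"llama3:8b\", \"prompt\": \"Hello\", \"stream\": false}"
```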
Option 2: LM Studio
LM Studio is a desktop application with a nice UI and model browser. It gives you more control over model settings.
Important: LM Studio requires you to manually start its server each time you want to use it. If you want something that "just works," use Ollama instead.
Step 1: Download and Install
- Visit lmstudio.ai
- Download LM Studio for Windows
- Install and launch the application
Step 2: Download a Model
- Open LM Studio
- Go to the Search tab (magnifying glass icon)
- Search for and download a model:
- Qwen2.5-7B-Instruct - Best balance (~8GB VRAM)
- Llama-3.2-3B-Instruct - Faster (~4GB VRAM)
- Mistral-7B-Instruct - Good alternative (~8GB VRAM)
- Phi-3-mini - Very fast (~2GB VRAM)
- Wait for download to complete
Step 3: Start the Server (Every Time!)
- Go to the Local Server tab (leftmost icon)
- Select your downloaded model from the dropdown
- Click Start Server
- Keep LM Studio running while using Loki Studio
Server runs at: http://localhost:1234
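Before switching over to Loki Studio, you can sanity-check the server from a terminal. LM Studio's local server speaks the OpenAI-compatible API, so listing models should return whatever you loaded:

```
# Should return a JSON object listing the loaded model
curl http://localhost:1234/v1/models
```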
Step 4: Configure Loki Studio
- Open Loki Studio
- Go to Application Settings
- Set AI Provider to LM Studio
- Set LM Studio Endpoint to http://localhost:1234
- Select your model from the dropdown
- Click Save Settings
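If the connection still fails, sending a chat request directly at the endpoint tells you whether the problem is LM Studio or the Loki Studio settings. This is a sketch against LM Studio's OpenAI-compatible chat completions route; the model name is a placeholder, so use whatever /v1/models reported (run it in Command Prompt, where ^ continues the line):

```
curl http://localhost:1234/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"model\": \"qwen2.5-7b-instruct\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"
```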
Model Recommendations
| Model | VRAM | Speed | Quality | Best For |
|---|---|---|---|---|
| Qwen2.5-7B | 8GB | Medium | Excellent | Daily use, best balance |
| Llama-3.2-3B | 4GB | Fast | Good | Budget GPUs |
| Mistral-7B | 8GB | Medium | Very Good | Alternative to Qwen |
| Phi-3-mini | 2GB | Very Fast | Decent | Older GPUs, testing |
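If you're running these through Ollama, the closest library tags at the time of writing are below; tags do change, so check ollama.com/library if a pull fails:

```
ollama pull qwen2.5:7b     # Qwen2.5-7B
ollama pull llama3.2:3b    # Llama-3.2-3B
ollama pull mistral:7b     # Mistral-7B
ollama pull phi3:mini      # Phi-3-mini
```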
Troubleshooting
"Connection refused" or timeout errors
- Ollama: Check service is running with ollama list
- LM Studio: Make sure server is started (green indicator)
- Verify endpoint URL matches what's in Loki Studio settings
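To see which local server (if any) is actually listening, check the default ports: 11434 for Ollama, 1234 for LM Studio. No output means nothing is bound to that port:

```
# Ollama's default port
netstat -ano | findstr 11434

# LM Studio's default port
netstat -ano | findstr 1234
```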
"Model not found"
- Ollama: Run ollama list to see installed models
- LM Studio: Model must be loaded in Local Server tab
- Model name is case-sensitive - check spelling
Very slow generation
- Check GPU is being used (Task Manager > Performance > GPU)
- Try a smaller model (3B instead of 7B)
- Close other GPU-intensive applications
- Update GPU drivers
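Two quick checks, assuming an NVIDIA GPU and Ollama:

```
# Watch GPU utilization and VRAM while a request is running
nvidia-smi

# Shows loaded models and whether they are running on GPU or CPU
ollama ps
```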
Cost Comparison
| Method | Cost per Video | Notes |
|---|---|---|
| Local LLM | $0.00 | Free after initial setup |
| OpenAI GPT-4o-mini | ~$0.01-0.05 | Fast, high quality |
| OpenAI GPT-4o | ~$0.10-0.50 | Premium quality |
Pro Tip
Process videos overnight with local LLMs. Set up a batch before bed and wake up to fully processed content - all at zero cost. After ~50-100 videos, the setup effort has paid for itself in saved API fees!
See also: Remote LLM Setup for cloud-based AI options