- $0 Zero Costs: No per-request API fees
- 🔒 Privacy: All processing on your machine
- 📴 Offline Capable: No internet required
- ♾️ No Rate Limits: Process as much as you want
Trade-offs
- Slower than cloud APIs (5-10x longer processing)
- Requires decent hardware (GPU with 8GB+ VRAM recommended)
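Not sure how much VRAM your card has? A quick check, assuming an NVIDIA GPU with current drivers (other GPUs show "Dedicated GPU memory" under Task Manager > Performance > GPU):

```
# Reports GPU name and total VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv
```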
Option 1: Ollama (Recommended)
Ollama is the easiest way to run local LLMs. It just works, runs as a lightweight background service, and takes up almost no resources until you actually need it.
Step 1: Download and Install
- Visit ollama.com
- Download Ollama for Windows
- Install - Ollama starts automatically as a background service
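To confirm the install before moving on, open Command Prompt or PowerShell and check that the CLI and the background service respond (Ollama listens on port 11434 by default):

```
# Confirm the CLI is on your PATH
ollama --version

# The background service should reply that Ollama is running
curl http://localhost:11434
```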
Step 2: Download a Model
Option A: Install from Loki Studio (Easiest)
- Open Loki Studio
- Go to Settings > Application Settings > LLM Providers
- Select Ollama as provider
- Click Install next to any model (llama3, mistral, qwen, etc.)
- Wait for download to complete (progress shown)
- Click Refresh to see installed models
Option B: Install via Command Line
Open Command Prompt or PowerShell and run:
```
# Best balance of quality and speed (8GB VRAM)
ollama pull llama3:8b

# Faster, smaller footprint (4GB VRAM)
ollama pull llama3.2:3b

# Good alternative (8GB VRAM)
ollama pull mistral:7b
```
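Once a pull finishes, you can confirm the model is available and give it a quick test straight from the terminal:

```
# List installed models
ollama list

# Optional smoke test (substitute whichever model you pulled)
ollama run llama3:8b "Reply with one short sentence."
```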
Step 3: Configure Loki Studio
- Open Loki Studio
- Go to Settings > Application Settings > LLM Providers
- Set AI Provider to Ollama
- Select your model from the dropdown (auto-detected)
- Settings auto-save!
That's it! Ollama runs automatically in the background whenever you need it.
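If you ever need to double-check that Ollama is reachable from other applications, its standard HTTP API on port 11434 can be queried directly. These are documented Ollama endpoints; run them in Command Prompt (in PowerShell, call curl.exe explicitly to bypass the Invoke-WebRequest alias):

```
# List installed models - presumably what the Loki Studio dropdown auto-detects
curl http://localhost:11434/api/tags

# One-off, non-streaming generation request
curl http://localhost:11434/api/generate -d "{\"model\": \"llama3:8b\", \"prompt\": \"Hello\", \"stream\": false}"
```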
Option 2: LM Studio
LM Studio is a desktop application with a nice UI and model browser. It gives you more control over model settings.
Important: LM Studio requires you to manually start its server each time you want to use it. If you want something that "just works," use Ollama instead.
Step 1: Download and Install
- Visit lmstudio.ai
- Download LM Studio for Windows
- Install and launch the application
Step 2: Download a Model
- Open LM Studio
- Go to the Search tab (magnifying glass icon)
- Search for and download a model:
- Qwen2.5-7B-Instruct - Best balance (~8GB VRAM)
- Llama-3.2-3B-Instruct - Faster (~4GB VRAM)
- Mistral-7B-Instruct - Good alternative (~8GB VRAM)
- Phi-3-mini - Very fast (~2GB VRAM)
- Wait for download to complete
Step 3: Start the Server (Every Time!)
- Go to the Local Server tab (leftmost icon)
- Select your downloaded model from the dropdown
- Click Start Server
- Keep LM Studio running while using Loki Studio
Server runs at: http://localhost:1234
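Before switching over to Loki Studio, you can sanity-check the server from a terminal. LM Studio's local server speaks the OpenAI-compatible API, so listing models should return whatever you loaded:

```
# Should return a JSON object listing the loaded model
curl http://localhost:1234/v1/models
```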
Step 4: Configure Loki Studio
- Open Loki Studio
- Go to Application Settings
- Set AI Provider to LM Studio
- Set LM Studio Endpoint to http://localhost:1234
- Select your model from the dropdown
- Click Save Settings
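If the connection still fails, sending a chat request directly at the endpoint tells you whether the problem is LM Studio or the Loki Studio settings. This is a sketch against LM Studio's OpenAI-compatible chat completions route; the model name is a placeholder, so use whatever /v1/models reported (run it in Command Prompt, where ^ continues the line):

```
curl http://localhost:1234/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"model\": \"qwen2.5-7b-instruct\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"
```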
Model Recommendations
| Model | VRAM | Speed | Quality | Best For |
|---|---|---|---|---|
| Qwen2.5-7B | 8GB | Medium | Excellent | Daily use, best balance |
| Llama-3.2-3B | 4GB | Fast | Good | Budget GPUs |
| Mistral-7B | 8GB | Medium | Very Good | Alternative to Qwen |
| Phi-3-mini | 2GB | Very Fast | Decent | Older GPUs, testing |
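If you're running these through Ollama, the closest library tags at the time of writing are below; tags do change, so check ollama.com/library if a pull fails:

```
ollama pull qwen2.5:7b     # Qwen2.5-7B
ollama pull llama3.2:3b    # Llama-3.2-3B
ollama pull mistral:7b     # Mistral-7B
ollama pull phi3:mini      # Phi-3-mini
```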
Troubleshooting
"Connection refused" or timeout errors
- Ollama: Check service is running with ollama list
- LM Studio: Make sure server is started (green indicator)
- Verify endpoint URL matches what's in Loki Studio settings
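To see which local server (if any) is actually listening, check the default ports: 11434 for Ollama, 1234 for LM Studio. No output means nothing is bound to that port:

```
# Ollama's default port
netstat -ano | findstr 11434

# LM Studio's default port
netstat -ano | findstr 1234
```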
"Model not found"
- Ollama: Run ollama list to see installed models
- LM Studio: Model must be loaded in Local Server tab
- Model name is case-sensitive - check spelling
Very slow generation
- Check GPU is being used (Task Manager > Performance > GPU)
- Try a smaller model (3B instead of 7B)
- Close other GPU-intensive applications
- Update GPU drivers
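Two quick checks, assuming an NVIDIA GPU and Ollama:

```
# Watch GPU utilization and VRAM while a request is running
nvidia-smi

# Shows loaded models and whether they are running on GPU or CPU
ollama ps
```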
Cost Comparison
| Method | Cost per Video | Notes |
|---|---|---|
| Local LLM | $0.00 | Free after initial setup |
| OpenAI GPT-4o-mini | ~$0.01-0.05 | Fast, high quality |
| OpenAI GPT-4o | ~$0.10-0.50 | Premium quality |
Pro Tip
Process videos overnight with local LLMs. Set up a batch before bed and wake up to fully processed content - all at zero cost. After ~50-100 videos, the setup effort has paid for itself in saved API fees!
See also: Remote LLM Setup for cloud-based AI options