Local Models & Open Source Providers
Run AI agents completely locally with open source models - perfect for privacy, cost savings, and offline development.
🔒 Complete Privacy & Zero API Costs
✅ Your data never leaves your machine
✅ No API keys required
✅ Works offline
✅ Unlimited usage - no per-token costs
✅ 6 local providers supported: Ollama, LM Studio, LocalAI, Text Generation WebUI, vLLM, Jan
Why Local Models?
Privacy
Sensitive data stays on your infrastructure. Perfect for healthcare, finance, and legal.
Cost Savings
No per-token charges. High-traffic production apps can save $75K+ per year in API costs.
Control
Full control over models, versions, and infrastructure. No rate limits.
Ollama (Recommended)
The easiest way to run local models. Ollama makes running LLMs as simple as ollama pull llama3.2.
Quick Start
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2
# Run a model
ollama run llama3.2
Using with AgentSea ADK
import {
Agent,
OllamaProvider,
ToolRegistry,
BufferMemory,
calculatorTool,
} from '@lov3kaizen/agentsea-core';
// Create Ollama provider - no API key needed!
const provider = new OllamaProvider({
baseUrl: 'http://localhost:11434',
});
// Check available models
const models = await provider.listModels();
console.log('Available models:', models);
// Pull a new model if needed
if (!models.includes('llama3.2')) {
console.log('Pulling llama3.2...');
await provider.pullModel('llama3.2');
}
// Create agent
const agent = new Agent(
{
name: 'local-assistant',
model: 'llama3.2',
provider: 'ollama',
systemPrompt: 'You are a helpful assistant running locally.',
tools: [calculatorTool],
temperature: 0.7,
maxTokens: 2048,
},
provider,
new ToolRegistry(),
new BufferMemory(50),
);
// Use it - everything runs locally!
const response = await agent.execute('What is 42 * 58?', {
conversationId: 'local-user',
sessionData: {},
history: [],
});
console.log(response.content);
Popular Models for Ollama
| Model | Download Size | Recommended RAM | Best For |
|---|---|---|---|
| llama3.2:3b | 2GB | 8GB | Fast, lightweight, good quality |
| llama3.2:latest | 4.7GB | 16GB | Best balance of quality & speed |
| mistral | 4.1GB | 16GB | Excellent instruction following |
| qwen2.5 | 4.7GB | 16GB | Strong coding & reasoning |
| gemma2 | 5.4GB | 16GB | Google's open model |
| codellama | 3.8GB | 16GB | Code generation |
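As a concrete example, the same Agent and OllamaProvider wiring shown above can point different agents at different models from this table depending on the task. A minimal sketch (the agent names and system prompts here are illustrative choices, not part of the SDK):

```typescript
import {
  Agent,
  OllamaProvider,
  ToolRegistry,
  BufferMemory,
} from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider({ baseUrl: 'http://localhost:11434' });

// Lightweight general assistant on the 3B model (runs in ~8GB RAM)
const chatAgent = new Agent(
  {
    name: 'chat-assistant',
    model: 'llama3.2:3b',
    provider: 'ollama',
    systemPrompt: 'You are a fast, lightweight local assistant.',
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);

// Code-focused assistant on codellama
const codeAgent = new Agent(
  {
    name: 'code-assistant',
    model: 'codellama',
    provider: 'ollama',
    systemPrompt: 'You help write and explain code.',
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);
```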
Other Local Providers
LM Studio
Desktop app with beautiful UI for running local models. OpenAI-compatible API server included.
import { LMStudioProvider } from '@lov3kaizen/agentsea-core';
const provider = new LMStudioProvider({
baseUrl: 'http://localhost:1234',
});
// Use like any other provider
const agent = new Agent(
{
name: 'lm-studio-assistant',
model: 'local-model', // Model loaded in LM Studio
provider: 'lm-studio',
},
provider,
toolRegistry,
);
LocalAI
Self-hosted OpenAI alternative supporting LLMs, Stable Diffusion, voice, embeddings.
import { LocalAIProvider } from '@lov3kaizen/agentsea-core';
const provider = new LocalAIProvider({
baseUrl: 'http://localhost:8080',
});
const agent = new Agent(
{
name: 'localai-assistant',
model: 'llama-3.2-3b',
provider: 'localai',
},
provider,
toolRegistry,
);
Text Generation WebUI
Feature-rich web UI for running models with extensions ecosystem.
import { TextGenerationWebUIProvider } from '@lov3kaizen/agentsea-core';
const provider = new TextGenerationWebUIProvider({
baseUrl: 'http://localhost:5000',
});
vLLM
High-throughput inference server for production deployments. Uses PagedAttention for efficiency.
import { VLLMProvider } from '@lov3kaizen/agentsea-core';
const provider = new VLLMProvider({
baseUrl: 'http://localhost:8000',
});
Model Management
Ollama provider includes built-in model management:
const provider = new OllamaProvider();
// List available models
const models = await provider.listModels();
console.log(models); // ['llama3.2', 'mistral', ...]
// Pull a new model
await provider.pullModel('codellama');
// Use the model
const agent = new Agent(
{
model: 'codellama',
provider: 'ollama',
},
provider,
toolRegistry,
);
// Model info
const info = await provider.getModelInfo('codellama');
console.log('Model size:', info.size);
console.log('Parameters:', info.parameters);
Streaming Support
All local providers support streaming for real-time responses:
import { Agent, OllamaProvider } from '@lov3kaizen/agentsea-core';
const provider = new OllamaProvider();
const agent = new Agent(
{
model: 'llama3.2',
provider: 'ollama',
stream: true, // Enable streaming
},
provider,
toolRegistry,
);
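// The streaming call below expects the same execution context shape as the earlier
// Ollama example; this concrete context object is an assumption for illustration.
const context = { conversationId: 'local-user', sessionData: {}, history: [] };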
// Stream response chunks
for await (const chunk of agent.stream('Write a story', context)) {
process.stdout.write(chunk.content);
}
Complete Privacy Example
Build a fully private AI system - LLM, voice, and tools all running locally:
import {
Agent,
OllamaProvider,
VoiceAgent,
LocalWhisperProvider,
PiperTTSProvider,
ToolRegistry,
BufferMemory,
} from '@lov3kaizen/agentsea-core';
// Local LLM
const ollamaProvider = new OllamaProvider();
// Local voice
const sttProvider = new LocalWhisperProvider({
whisperPath: '/usr/local/bin/whisper',
modelPath: '/path/to/ggml-base.bin',
});
const ttsProvider = new PiperTTSProvider({
piperPath: '/usr/local/bin/piper',
modelPath: '/path/to/en_US-lessac-medium.onnx',
});
// Create agent
const agent = new Agent(
{
name: 'private-assistant',
model: 'llama3.2',
provider: 'ollama',
systemPrompt: 'You are a completely private AI assistant.',
},
ollamaProvider,
new ToolRegistry(),
new BufferMemory(100),
);
// Wrap with voice
const voiceAgent = new VoiceAgent(agent, {
sttProvider,
ttsProvider,
autoSpeak: true,
});
// Everything runs locally - complete privacy!
const result = await voiceAgent.processVoice(audioInput, context);
// ✅ No data sent to cloud
// ✅ No API keys needed
// ✅ Works offline
// ✅ Zero API costs
Performance Tips
🚀 GPU Acceleration
Ollama automatically uses a GPU if one is available. Expect 10-50x faster inference with an NVIDIA GPU.
💾 Model Size vs Quality
Start with 3B models (8GB RAM) for testing. Use 7B models (16GB RAM) for production quality.
⚡ Context Length
Reduce maxTokens for faster responses. Most conversations work well with 1024-2048 tokens.
🔄 Keep Models Loaded
Ollama keeps models in memory for 5 minutes after use by default. The first request loads the model (slow); subsequent requests are fast.
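One way to apply the last tip is to warm the model up at startup so the slow first load never hits a real user. A minimal sketch, assuming an agent wired to the OllamaProvider as in the examples above (the warm-up prompt and conversationId are illustrative):

```typescript
// Warm-up request: forces Ollama to load the model into memory before real traffic arrives.
await agent.execute('Reply with "ok".', {
  conversationId: 'warmup', // illustrative ID, not part of the SDK
  sessionData: {},
  history: [],
});
```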
Provider Comparison
| Provider | Ease of Use | Performance | Features | Best For |
|---|---|---|---|---|
| Ollama | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Model mgmt, CLI | Getting started, development |
| LM Studio | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | GUI, easy setup | Non-technical users |
| LocalAI | ⭐⭐⭐ | ⭐⭐⭐⭐ | Multi-modal, Docker | Self-hosted services |
| vLLM | ⭐⭐ | ⭐⭐⭐⭐⭐ | PagedAttention | Production, high throughput |
| Text Gen WebUI | ⭐⭐⭐⭐ | ⭐⭐⭐ | Web UI, extensions | Experimentation |
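Because each local provider plugs into the same Agent constructor, switching backends is mostly a configuration change. A minimal sketch of using Ollama for development and vLLM for production (the LOCAL_BACKEND environment variable is illustrative, and the 'vllm' provider identifier is an assumption based on the pattern used by the other providers):

```typescript
import {
  Agent,
  OllamaProvider,
  VLLMProvider,
  ToolRegistry,
  BufferMemory,
} from '@lov3kaizen/agentsea-core';

// Illustrative switch: Ollama while developing, vLLM when deployed for throughput.
const useVllm = process.env.LOCAL_BACKEND === 'vllm';

const provider = useVllm
  ? new VLLMProvider({ baseUrl: 'http://localhost:8000' })
  : new OllamaProvider({ baseUrl: 'http://localhost:11434' });

const agent = new Agent(
  {
    name: 'local-assistant',
    model: useVllm ? 'mistral-7b' : 'llama3.2',
    provider: useVllm ? 'vllm' : 'ollama', // 'vllm' identifier is assumed, not confirmed above
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);
```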
Use Cases
🏥 Healthcare
Process patient data locally, maintain HIPAA compliance without cloud dependencies.
💰 Finance
Analyze financial data on-premise, meet regulatory requirements for data sovereignty.
⚖️ Legal
Review confidential documents locally, maintain attorney-client privilege.
🚀 Startups
Build MVP without API costs, scale without per-token charges eating profits.
Next Steps
- Add Voice Features - Local voice with Whisper & Piper
- Use the CLI Tool - Interactive model management
- Explore All Providers - 12+ providers total
- View Examples - Complete local examples
💡 Recommended Setup
Development: Start with Ollama + llama3.2:3b (fast, good quality)
Production: Use vLLM + mistral-7b (best throughput)
Privacy: Ollama + Local Whisper + Piper TTS (100% local)