
Local Models & Open Source Providers

Run AI agents completely locally with open source models - perfect for privacy, cost savings, and offline development.

🔒 Complete Privacy & Zero API Costs

✅ Your data never leaves your machine

✅ No API keys required

✅ Works offline

✅ Unlimited usage - no per-token costs

✅ 6 local providers supported: Ollama, LM Studio, LocalAI, Text Generation WebUI, vLLM, Jan

Why Local Models?

🔒 Privacy

Sensitive data stays on your infrastructure. Well suited to healthcare, finance, and legal workloads.

💰 Cost Savings

No per-token charges. For high-volume production apps, API savings can run into tens of thousands of dollars per year.

🎛️ Control

Full control over models, versions, and infrastructure. No rate limits.

Ollama (Recommended)

The easiest way to run local models. Ollama makes running LLMs as simple as ollama pull llama3.2.

Quick Start

bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2

# Run a model
ollama run llama3.2

Using with AgentSea ADK

typescript
import {
  Agent,
  OllamaProvider,
  ToolRegistry,
  BufferMemory,
  calculatorTool,
} from '@lov3kaizen/agentsea-core';

// Create Ollama provider - no API key needed!
const provider = new OllamaProvider({
  baseUrl: 'http://localhost:11434',
});

// Check available models
const models = await provider.listModels();
console.log('Available models:', models);

// Pull a new model if needed
if (!models.includes('llama3.2')) {
  console.log('Pulling llama3.2...');
  await provider.pullModel('llama3.2');
}

// Create agent
const agent = new Agent(
  {
    name: 'local-assistant',
    model: 'llama3.2',
    provider: 'ollama',
    systemPrompt: 'You are a helpful assistant running locally.',
    tools: [calculatorTool],
    temperature: 0.7,
    maxTokens: 2048,
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);

// Use it - everything runs locally!
const response = await agent.execute('What is 42 * 58?', {
  conversationId: 'local-user',
  sessionData: {},
  history: [],
});

console.log(response.content);

Popular Models for Ollama

| Model | Size | RAM | Best For |
|-------|------|-----|----------|
| llama3.2:3b | 2GB | 8GB | Fast, lightweight, good quality |
| llama3.2:latest | 4.7GB | 16GB | Best balance of quality & speed |
| mistral | 4.1GB | 16GB | Excellent instruction following |
| qwen2.5 | 4.7GB | 16GB | Strong coding & reasoning |
| gemma2 | 5.4GB | 16GB | Google's open model |
| codellama | 3.8GB | 16GB | Code generation |
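For example, on a machine with only 8GB of RAM you can pull the lightweight 3B tag from the table and point an agent at it. A minimal sketch using the listModels/pullModel helpers shown above (the tag is just an example; any model from the table works the same way):

typescript
import { Agent, OllamaProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider({ baseUrl: 'http://localhost:11434' });

// Pull the small 3B tag if it is not installed yet
const models = await provider.listModels();
if (!models.includes('llama3.2:3b')) {
  await provider.pullModel('llama3.2:3b');
}

// Same agent config shape as above, just a smaller model
const agent = new Agent(
  {
    name: 'lightweight-assistant',
    model: 'llama3.2:3b',
    provider: 'ollama',
  },
  provider,
  new ToolRegistry(),
);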

Other Local Providers

LM Studio

Desktop app with beautiful UI for running local models. OpenAI-compatible API server included.

typescript
import { Agent, LMStudioProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new LMStudioProvider({
  baseUrl: 'http://localhost:1234',
});

// Use it like any other provider
const agent = new Agent(
  {
    name: 'lm-studio-assistant',
    model: 'local-model', // Name of the model loaded in LM Studio
    provider: 'lm-studio',
  },
  provider,
  new ToolRegistry(),
);

LocalAI

Self-hosted OpenAI alternative supporting LLMs, Stable Diffusion, voice, embeddings.

typescript
import { Agent, LocalAIProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new LocalAIProvider({
  baseUrl: 'http://localhost:8080',
});

const agent = new Agent(
  {
    name: 'localai-assistant',
    model: 'llama-3.2-3b',
    provider: 'localai',
  },
  provider,
  new ToolRegistry(),
);

Text Generation WebUI

Feature-rich web UI for running models, with an ecosystem of extensions.

typescript
import { TextGenerationWebUIProvider } from '@lov3kaizen/agentsea-core';

const provider = new TextGenerationWebUIProvider({
  baseUrl: 'http://localhost:5000',
});

vLLM

High-throughput inference server for production deployments. Uses PagedAttention for efficiency.

typescript
import { VLLMProvider } from '@lov3kaizen/agentsea-core';

const provider = new VLLMProvider({
  baseUrl: 'http://localhost:8000',
});
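
As with the other providers, the vLLM provider can be handed to an Agent. A hedged sketch, assuming 'vllm' is the provider identifier expected by the ADK and that the model field matches whatever model your vLLM server was launched with (e.g. vllm serve mistralai/Mistral-7B-Instruct-v0.2):

typescript
import { Agent, VLLMProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new VLLMProvider({
  baseUrl: 'http://localhost:8000',
});

const agent = new Agent(
  {
    name: 'vllm-assistant',
    model: 'mistralai/Mistral-7B-Instruct-v0.2', // must match the model vLLM is serving
    provider: 'vllm', // assumption - check the provider key the ADK expects
  },
  provider,
  new ToolRegistry(),
);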

Model Management

Ollama provider includes built-in model management:

typescript
import { Agent, OllamaProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider();

// List available models
const models = await provider.listModels();
console.log(models); // ['llama3.2', 'mistral', ...]

// Pull a new model
await provider.pullModel('codellama');

// Use the model
const agent = new Agent(
  {
    model: 'codellama',
    provider: 'ollama',
  },
  provider,
  new ToolRegistry(),
);

// Model info
const info = await provider.getModelInfo('codellama');
console.log('Model size:', info.size);
console.log('Parameters:', info.parameters);

Streaming Support

All local providers support streaming for real-time responses:

typescript
import { Agent, OllamaProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider();
const agent = new Agent(
  {
    model: 'llama3.2',
    provider: 'ollama',
    stream: true, // Enable streaming
  },
  provider,
  new ToolRegistry(),
);

// Execution context, as in the execute() example above
const context = {
  conversationId: 'local-user',
  sessionData: {},
  history: [],
};

// Stream response chunks
for await (const chunk of agent.stream('Write a story', context)) {
  process.stdout.write(chunk.content);
}

Complete Privacy Example

Build a fully private AI system - LLM, voice, and tools all running locally:

typescript
import {
  Agent,
  OllamaProvider,
  VoiceAgent,
  LocalWhisperProvider,
  PiperTTSProvider,
  ToolRegistry,
  BufferMemory,
} from '@lov3kaizen/agentsea-core';

// Local LLM
const ollamaProvider = new OllamaProvider();

// Local voice
const sttProvider = new LocalWhisperProvider({
  whisperPath: '/usr/local/bin/whisper',
  modelPath: '/path/to/ggml-base.bin',
});

const ttsProvider = new PiperTTSProvider({
  piperPath: '/usr/local/bin/piper',
  modelPath: '/path/to/en_US-lessac-medium.onnx',
});

// Create agent
const agent = new Agent(
  {
    name: 'private-assistant',
    model: 'llama3.2',
    provider: 'ollama',
    systemPrompt: 'You are a completely private AI assistant.',
  },
  ollamaProvider,
  new ToolRegistry(),
  new BufferMemory(100),
);

// Wrap with voice
const voiceAgent = new VoiceAgent(agent, {
  sttProvider,
  ttsProvider,
  autoSpeak: true,
});

// Everything runs locally - complete privacy!
const result = await voiceAgent.processVoice(audioInput, context);

// ✅ No data sent to cloud
// ✅ No API keys needed
// ✅ Works offline
// ✅ Zero API costs

Performance Tips

🚀 GPU Acceleration

Ollama automatically uses a GPU when one is available. Expect roughly 10-50x faster inference on an NVIDIA GPU.

💾 Model Size vs Quality

Start with 3B models (8GB RAM) for testing. Use 7B models (16GB RAM) for production quality.

⚡ Context Length

Reduce maxTokens for faster responses. Most conversations work well with 1024-2048 tokens.

🔄 Keep Models Loaded

By default, Ollama keeps models in memory for 5 minutes after use. The first request loads the model (slow); subsequent requests are fast.
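
Putting these tips together, here is an illustrative config for a snappy local setup, using only fields shown earlier on this page (the exact values are starting points to tune, not benchmarked recommendations):

typescript
import { Agent, OllamaProvider, ToolRegistry } from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider();

const fastAgent = new Agent(
  {
    name: 'fast-local-assistant',
    model: 'llama3.2:3b', // small model: lower RAM use, faster tokens/sec
    provider: 'ollama',
    temperature: 0.7,
    maxTokens: 1024, // shorter responses come back noticeably faster
  },
  provider,
  new ToolRegistry(),
);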

Provider Comparison

| Provider | Ease of Use | Performance | Features | Best For |
|----------|-------------|-------------|----------|----------|
| Ollama | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Model mgmt, CLI | Getting started, development |
| LM Studio | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | GUI, easy setup | Non-technical users |
| LocalAI | ⭐⭐⭐ | ⭐⭐⭐⭐ | Multi-modal, Docker | Self-hosted services |
| vLLM | ⭐⭐ | ⭐⭐⭐⭐⭐ | PagedAttention | Production, high throughput |
| Text Gen WebUI | ⭐⭐⭐ | ⭐⭐⭐⭐ | Web UI, extensions | Experimentation |

Use Cases

🏥 Healthcare

Process patient data locally, maintain HIPAA compliance without cloud dependencies.

💰 Finance

Analyze financial data on-premise, meet regulatory requirements for data sovereignty.

⚖️ Legal

Review confidential documents locally, maintain attorney-client privilege.

🚀 Startups

Build an MVP without API costs, and scale without per-token charges eating into your profits.

Next Steps

💡 Recommended Setup

Development: Start with Ollama + llama3.2:3b (fast, good quality)
Production: Use vLLM + mistral-7b (best throughput)
Privacy: Ollama + Local Whisper + Piper TTS (100% local)
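
If you want to flip between these setups without changing agent code, one option is a tiny factory that picks the provider from an environment variable. Purely illustrative glue code (the LOCAL_LLM_BACKEND variable name is invented for this sketch), built from the providers shown above:

typescript
import { OllamaProvider, VLLMProvider } from '@lov3kaizen/agentsea-core';

// Hypothetical helper: choose a local provider based on the environment.
function createLocalProvider(): OllamaProvider | VLLMProvider {
  if (process.env.LOCAL_LLM_BACKEND === 'vllm') {
    // Production: high-throughput vLLM server
    return new VLLMProvider({ baseUrl: 'http://localhost:8000' });
  }
  // Development default: Ollama on its standard port
  return new OllamaProvider({ baseUrl: 'http://localhost:11434' });
}

const provider = createLocalProvider();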