Local Models & Open Source Providers
Run AI agents completely locally with open source models - perfect for privacy, cost savings, and offline development.
🔒 Complete Privacy & Zero API Costs
✅ Your data never leaves your machine
✅ No API keys required
✅ Works offline
✅ Unlimited usage - no per-token costs
✅ 6 local providers supported: Ollama, LM Studio, LocalAI, Text Generation WebUI, vLLM, Jan
Why Local Models?
Privacy
Sensitive data stays on your infrastructure. Perfect for healthcare, finance, and legal.
Cost Savings
No per-token charges. High-traffic production apps can save $75K+ per year in API costs.
Control
Full control over models, versions, and infrastructure. No rate limits.
Ollama (Recommended)
The easiest way to run local models. Ollama makes running LLMs as simple as ollama pull llama3.2.
Quick Start
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2
# Run a model
ollama run llama3.2
Using with AgentSea ADK
import {
Agent,
OllamaProvider,
ToolRegistry,
BufferMemory,
calculatorTool,
} from '@lov3kaizen/agentsea-core';
// Create Ollama provider - no API key needed!
const provider = new OllamaProvider({
baseUrl: 'http://localhost:11434',
});
// Check available models
const models = await provider.listModels();
console.log('Available models:', models);
// Pull a new model if needed
if (!models.includes('llama3.2')) {
console.log('Pulling llama3.2...');
await provider.pullModel('llama3.2');
}
// Create agent
const agent = new Agent(
{
name: 'local-assistant',
model: 'llama3.2',
provider: 'ollama',
systemPrompt: 'You are a helpful assistant running locally.',
tools: [calculatorTool],
temperature: 0.7,
maxTokens: 2048,
},
provider,
new ToolRegistry(),
new BufferMemory(50),
);
// Use it - everything runs locally!
const response = await agent.execute('What is 42 * 58?', {
conversationId: 'local-user',
sessionData: {},
history: [],
});
console.log(response.content);
Popular Models for Ollama
| Model | Download Size | Recommended RAM | Best For |
|---|---|---|---|
| llama3.2:3b | 2GB | 8GB | Fast, lightweight, good quality |
| llama3.2:latest | 4.7GB | 16GB | Best balance of quality & speed |
| mistral | 4.1GB | 16GB | Excellent instruction following |
| qwen2.5 | 4.7GB | 16GB | Strong coding & reasoning |
| gemma2 | 5.4GB | 16GB | Google's open model |
| codellama | 3.8GB | 16GB | Code generation |
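As a concrete example, the same Agent and OllamaProvider wiring shown above can point different agents at different models from this table depending on the task. A minimal sketch (the agent names and system prompts here are illustrative choices, not part of the SDK):

```typescript
import {
  Agent,
  OllamaProvider,
  ToolRegistry,
  BufferMemory,
} from '@lov3kaizen/agentsea-core';

const provider = new OllamaProvider({ baseUrl: 'http://localhost:11434' });

// Lightweight general assistant on the 3B model (runs in ~8GB RAM)
const chatAgent = new Agent(
  {
    name: 'chat-assistant',
    model: 'llama3.2:3b',
    provider: 'ollama',
    systemPrompt: 'You are a fast, lightweight local assistant.',
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);

// Code-focused assistant on codellama
const codeAgent = new Agent(
  {
    name: 'code-assistant',
    model: 'codellama',
    provider: 'ollama',
    systemPrompt: 'You help write and explain code.',
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);
```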
Other Local Providers
LM Studio
Desktop app with beautiful UI for running local models. OpenAI-compatible API server included.
import { LMStudioProvider } from '@lov3kaizen/agentsea-core';
const provider = new LMStudioProvider({
baseUrl: 'http://localhost:1234',
});
// Use like any other provider
const agent = new Agent(
{
name: 'lm-studio-assistant',
model: 'local-model', // Model loaded in LM Studio
provider: 'lm-studio',
},
provider,
toolRegistry,
);
LocalAI
Self-hosted OpenAI alternative supporting LLMs, Stable Diffusion, voice, embeddings.
import { LocalAIProvider } from '@lov3kaizen/agentsea-core';
const provider = new LocalAIProvider({
baseUrl: 'http://localhost:8080',
});
const agent = new Agent(
{
name: 'localai-assistant',
model: 'llama-3.2-3b',
provider: 'localai',
},
provider,
toolRegistry,
);
Text Generation WebUI
Feature-rich web UI for running models with extensions ecosystem.
import { TextGenerationWebUIProvider } from '@lov3kaizen/agentsea-core';
const provider = new TextGenerationWebUIProvider({
baseUrl: 'http://localhost:5000',
});
vLLM
High-throughput inference server for production deployments. Uses PagedAttention for efficiency.
import { VLLMProvider } from '@lov3kaizen/agentsea-core';
const provider = new VLLMProvider({
baseUrl: 'http://localhost:8000',
});
Model Management
Ollama provider includes built-in model management:
const provider = new OllamaProvider();
// List available models
const models = await provider.listModels();
console.log(models); // ['llama3.2', 'mistral', ...]
// Pull a new model
await provider.pullModel('codellama');
// Use the model
const agent = new Agent(
{
model: 'codellama',
provider: 'ollama',
},
provider,
toolRegistry,
);
// Model info
const info = await provider.getModelInfo('codellama');
console.log('Model size:', info.size);
console.log('Parameters:', info.parameters);
Streaming Support
All local providers support streaming for real-time responses:
import { Agent, OllamaProvider } from '@lov3kaizen/agentsea-core';
const provider = new OllamaProvider();
const agent = new Agent(
{
model: 'llama3.2',
provider: 'ollama',
stream: true, // Enable streaming
},
provider,
toolRegistry,
);
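// The streaming call below expects the same execution context shape as the earlier
// Ollama example; this concrete context object is an assumption for illustration.
const context = { conversationId: 'local-user', sessionData: {}, history: [] };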
// Stream response chunks
for await (const chunk of agent.stream('Write a story', context)) {
process.stdout.write(chunk.content);
}
Complete Privacy Example
Build a fully private AI system - LLM, voice, and tools all running locally:
import {
Agent,
OllamaProvider,
VoiceAgent,
LocalWhisperProvider,
PiperTTSProvider,
ToolRegistry,
BufferMemory,
} from '@lov3kaizen/agentsea-core';
// Local LLM
const ollamaProvider = new OllamaProvider();
// Local voice
const sttProvider = new LocalWhisperProvider({
whisperPath: '/usr/local/bin/whisper',
modelPath: '/path/to/ggml-base.bin',
});
const ttsProvider = new PiperTTSProvider({
piperPath: '/usr/local/bin/piper',
modelPath: '/path/to/en_US-lessac-medium.onnx',
});
// Create agent
const agent = new Agent(
{
name: 'private-assistant',
model: 'llama3.2',
provider: 'ollama',
systemPrompt: 'You are a completely private AI assistant.',
},
ollamaProvider,
new ToolRegistry(),
new BufferMemory(100),
);
// Wrap with voice
const voiceAgent = new VoiceAgent(agent, {
sttProvider,
ttsProvider,
autoSpeak: true,
});
// Everything runs locally - complete privacy!
const result = await voiceAgent.processVoice(audioInput, context);
// ✅ No data sent to cloud
// ✅ No API keys needed
// ✅ Works offline
// ✅ Zero API costs
Performance Tips
🚀 GPU Acceleration
Ollama automatically uses a GPU if one is available. Expect 10-50x faster inference with an NVIDIA GPU.
💾 Model Size vs Quality
Start with 3B models (8GB RAM) for testing. Use 7B models (16GB RAM) for production quality.
⚡ Context Length
Reduce maxTokens for faster responses. Most conversations work well with 1024-2048 tokens.
🔄 Keep Models Loaded
Ollama keeps models in memory for 5 minutes after use by default. The first request loads the model (slow); subsequent requests are fast.
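One way to apply the last tip is to warm the model up at startup so the slow first load never hits a real user. A minimal sketch, assuming an agent wired to the OllamaProvider as in the examples above (the warm-up prompt and conversationId are illustrative):

```typescript
// Warm-up request: forces Ollama to load the model into memory before real traffic arrives.
await agent.execute('Reply with "ok".', {
  conversationId: 'warmup', // illustrative ID, not part of the SDK
  sessionData: {},
  history: [],
});
```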
Provider Comparison
| Provider | Ease of Use | Performance | Features | Best For |
|---|---|---|---|---|
| Ollama | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Model mgmt, CLI | Getting started, development |
| LM Studio | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | GUI, easy setup | Non-technical users |
| LocalAI | ⭐⭐⭐ | ⭐⭐⭐⭐ | Multi-modal, Docker | Self-hosted services |
| vLLM | ⭐⭐ | ⭐⭐⭐⭐⭐ | PagedAttention | Production, high throughput |
| Text Gen WebUI | ⭐⭐⭐⭐ | ⭐⭐⭐ | Web UI, extensions | Experimentation |
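Because each local provider plugs into the same Agent constructor, switching backends is mostly a configuration change. A minimal sketch of using Ollama for development and vLLM for production (the LOCAL_BACKEND environment variable is illustrative, and the 'vllm' provider identifier is an assumption based on the pattern used by the other providers):

```typescript
import {
  Agent,
  OllamaProvider,
  VLLMProvider,
  ToolRegistry,
  BufferMemory,
} from '@lov3kaizen/agentsea-core';

// Illustrative switch: Ollama while developing, vLLM when deployed for throughput.
const useVllm = process.env.LOCAL_BACKEND === 'vllm';

const provider = useVllm
  ? new VLLMProvider({ baseUrl: 'http://localhost:8000' })
  : new OllamaProvider({ baseUrl: 'http://localhost:11434' });

const agent = new Agent(
  {
    name: 'local-assistant',
    model: useVllm ? 'mistral-7b' : 'llama3.2',
    provider: useVllm ? 'vllm' : 'ollama', // 'vllm' identifier is assumed, not confirmed above
  },
  provider,
  new ToolRegistry(),
  new BufferMemory(50),
);
```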
Use Cases
🏥 Healthcare
Process patient data locally, maintain HIPAA compliance without cloud dependencies.
💰 Finance
Analyze financial data on-premise, meet regulatory requirements for data sovereignty.
⚖️ Legal
Review confidential documents locally, maintain attorney-client privilege.
🚀 Startups
Build MVP without API costs, scale without per-token charges eating profits.
Next Steps
- Add Voice Features - Local voice with Whisper & Piper
- Use the CLI Tool - Interactive model management
- Explore All Providers - 12+ providers total
- View Examples - Complete local examples
💡 Recommended Setup
Development: Start with Ollama + llama3.2:3b (fast, good quality)
Production: Use vLLM + mistral-7b (best throughput)
Privacy: Ollama + Local Whisper + Piper TTS (100% local)