v0.5.2 release - Contributors, Sponsors and Enquiries are most welcome 😌

LLM Gateway

High-performance OpenAI-compatible API gateway with intelligent routing, caching, and cost optimization across multiple LLM providers.

The Gateway provides a unified API for all providers with virtual models like best, cheapest, and fastest for automatic routing.

Installation

bash
pnpm add @lov3kaizen/agentsea-gateway

Key Features

  • 🔌 Unified API - OpenAI-compatible API for all providers
  • 🎯 Intelligent Routing - Round-robin, failover, cost, and latency optimization
  • Virtual Models - Use best, cheapest, or fastest for auto-routing
  • 💾 Built-in Caching - LRU cache to reduce costs and latency
  • 📊 Metrics - Request tracking, cost calculation, latency monitoring
  • 🛡️ Circuit Breaker - Automatic failover and retry protection

Quick Start - HTTP Server

Run the gateway as an OpenAI-compatible HTTP server that you can use with any OpenAI SDK:

typescript
import {
  Gateway,
  createHTTPServer,
  startServer,
} from '@lov3kaizen/agentsea-gateway';

const gateway = new Gateway({
  providers: [
    {
      name: 'openai',
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4o', 'gpt-4o-mini'],
    },
    {
      name: 'anthropic',
      apiKey: process.env.ANTHROPIC_API_KEY,
      models: ['claude-3-5-sonnet-20241022'],
    },
  ],
  routing: {
    strategy: 'cost-optimized',
  },
});

const app = createHTTPServer({ gateway });
startServer(app, { port: 3000 });

Then use it like the OpenAI API:

bash
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cheapest",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
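
Because the server speaks the OpenAI wire format, any OpenAI client can point its base URL at the gateway. A minimal sketch with the official openai package (the apiKey here is a placeholder, assuming the gateway does not require its own auth token; provider keys live in the gateway config):

typescript
import OpenAI from 'openai';

// Point the standard OpenAI client at the local gateway instead of api.openai.com
const client = new OpenAI({
  baseURL: 'http://localhost:3000/v1',
  apiKey: 'not-needed', // placeholder; provider API keys are configured on the gateway
});

const completion = await client.chat.completions.create({
  model: 'cheapest', // virtual models work through the OpenAI SDK too
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);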

Quick Start - SDK

Use the gateway directly in your code:

typescript
import { Gateway } from '@lov3kaizen/agentsea-gateway';

const gateway = new Gateway({
  providers: [
    { name: 'openai', apiKey: process.env.OPENAI_API_KEY, models: ['gpt-4o'] },
  ],
});

// OpenAI-compatible interface
const response = await gateway.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
console.log(response._gateway); // Gateway metadata (provider, cost, latency)

Virtual Models

Use virtual models to automatically route to the optimal provider:

typescript
// Route to highest quality available model
await gateway.chat.completions.create({
  model: 'best',
  messages: [{ role: 'user', content: 'Complex reasoning task...' }],
});

// Route to cheapest model
await gateway.chat.completions.create({
  model: 'cheapest',
  messages: [{ role: 'user', content: 'Simple task...' }],
});

// Route to fastest provider
await gateway.chat.completions.create({
  model: 'fastest',
  messages: [{ role: 'user', content: 'Time-sensitive task...' }],
});

Routing Strategies

Round-Robin

Distributes requests across providers in rotation, with optional weights to skew the ratio:

typescript
const gateway = new Gateway({
  providers: [...],
  routing: {
    strategy: 'round-robin',
    weights: { openai: 2, anthropic: 1 }, // 2:1 ratio
  },
});

Failover

Tries providers in order until one succeeds:

typescript
const gateway = new Gateway({
  providers: [...],
  routing: {
    strategy: 'failover',
    fallbackChain: ['openai', 'anthropic', 'google'],
  },
});

Cost-Optimized

Selects the cheapest model meeting quality requirements:

typescript
const gateway = new Gateway({
  providers: [...],
  routing: { strategy: 'cost-optimized' },
});

Latency-Optimized

Routes to the fastest provider based on observed latencies:

typescript
const gateway = new Gateway({
  providers: [...],
  routing: { strategy: 'latency-optimized' },
});

Caching

Enable caching to reduce costs and latency for repeated requests:

typescript
const gateway = new Gateway({
  providers: [...],
  cache: {
    enabled: true,
    ttl: 3600, // 1 hour
    maxEntries: 1000,
    type: 'exact', // Hash-based matching
  },
});
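
With caching enabled, an identical repeated request can be served from the cache, which the cached flag in the response metadata reflects. A quick sketch (assuming the gateway above has at least one provider configured):

typescript
const messages = [{ role: 'user' as const, content: 'What is 2 + 2?' }];

const first = await gateway.chat.completions.create({ model: 'gpt-4o', messages });
console.log(first._gateway.cached); // false on a cold cache

const second = await gateway.chat.completions.create({ model: 'gpt-4o', messages });
console.log(second._gateway.cached); // true - served from the LRU cache via exact-match hashing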

Request Metadata

Add gateway-specific options to requests:

typescript
const response = await gateway.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  _gateway: {
    preferredProvider: 'anthropic',
    excludeProviders: ['google'],
    maxCost: 0.01, // Max $0.01 per request
    maxLatency: 5000, // Max 5 seconds
    cachePolicy: 'no-cache', // Skip cache
    tags: { user: 'user-123' },
  },
});

Response Metadata

Every response includes gateway metadata:

typescript
const response = await gateway.chat.completions.create({ ... });

console.log(response._gateway);
// {
//   provider: 'openai',
//   originalModel: 'cheapest',
//   latencyMs: 1234,
//   cost: 0.000123,
//   cached: false,
//   retries: 0,
//   routingDecision: { ... }
// }

Streaming

Full streaming support, delivered as an async iterable in the SDK and as server-sent events (SSE) over HTTP:

typescript
const stream = await gateway.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
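
Over HTTP, setting "stream": true returns server-sent events in the OpenAI streaming format. A sketch against the Quick Start server (-N disables curl's buffering so chunks print as they arrive):

bash
curl -N http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fastest",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'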

Metrics

Track usage and costs:

typescript
const metrics = gateway.getMetrics();

console.log(metrics.requests.total);
console.log(metrics.cost.total);
console.log(metrics.cost.byProvider);
console.log(metrics.latency.avg);
console.log(metrics.cache.hitRate);

Events

Listen to gateway events:

typescript
gateway.on('request:complete', (event) => {
  console.log(`${event.provider}: ${event.latencyMs}ms, $${event.cost}`);
});

gateway.on('request:error', (event) => {
  console.error(`Error: ${event.error.message}`);
});

gateway.on('provider:unhealthy', (provider) => {
  console.warn(`Provider ${provider} is unhealthy`);
});

API Reference

Gateway

  • constructor(config: GatewayConfig) - Create gateway instance
  • chat.completions.create(request) - Create completion
  • getMetrics() - Get usage metrics
  • getRegistry() - Get provider registry
  • getRouter() - Get router instance
  • checkHealth() - Check provider health
  • shutdown() - Clean shutdown
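
A brief lifecycle sketch using the methods above; the exact shape of the health result is not documented here, so the logged fields are illustrative:

typescript
// Check provider health and log aggregate usage before shutting down
const health = await gateway.checkHealth();
console.log(health); // per-provider health status (shape illustrative)

const metrics = gateway.getMetrics();
console.log(`requests: ${metrics.requests.total}, cost: $${metrics.cost.total}`);

// Release timers and connections for a clean exit
await gateway.shutdown();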

Built-in Providers

  • OpenAIProvider - OpenAI / Azure OpenAI
  • AnthropicProvider - Anthropic Claude
  • GoogleProvider - Google Gemini
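
The name field in the Gateway config selects the matching built-in provider. A sketch wiring all three (the GOOGLE_API_KEY variable and the Gemini model id are illustrative assumptions):

typescript
import { Gateway } from '@lov3kaizen/agentsea-gateway';

const gateway = new Gateway({
  providers: [
    { name: 'openai', apiKey: process.env.OPENAI_API_KEY, models: ['gpt-4o-mini'] },
    { name: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY, models: ['claude-3-5-sonnet-20241022'] },
    // env var name and model id below are assumptions for illustration
    { name: 'google', apiKey: process.env.GOOGLE_API_KEY, models: ['gemini-1.5-flash'] },
  ],
  routing: { strategy: 'failover', fallbackChain: ['openai', 'anthropic', 'google'] },
});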

Routing Strategies

  • RoundRobinStrategy - Even distribution
  • FailoverStrategy - Ordered fallback
  • CostOptimizedStrategy - Cheapest model
  • LatencyOptimizedStrategy - Fastest provider

Next Steps