LLM Gateway
High-performance OpenAI-compatible API gateway with intelligent routing, caching, and cost optimization across multiple LLM providers.
Use the virtual models best, cheapest, and fastest for automatic routing.

Installation
pnpm add @lov3kaizen/agentsea-gateway

Key Features
- Unified API: OpenAI-compatible API for all providers
- Intelligent Routing: Round-robin, failover, cost, and latency optimization
- Virtual Models: Use best, cheapest, or fastest for auto-routing
- Built-in Caching: LRU cache to reduce costs and latency
- Metrics: Request tracking, cost calculation, latency monitoring
- Circuit Breaker: Automatic failover and retry protection
Quick Start - HTTP Server
Run the gateway as an OpenAI-compatible HTTP server that you can use with any OpenAI SDK:
import {
Gateway,
createHTTPServer,
startServer,
} from '@lov3kaizen/agentsea-gateway';
const gateway = new Gateway({
providers: [
{
name: 'openai',
apiKey: process.env.OPENAI_API_KEY,
models: ['gpt-4o', 'gpt-4o-mini'],
},
{
name: 'anthropic',
apiKey: process.env.ANTHROPIC_API_KEY,
models: ['claude-3-5-sonnet-20241022'],
},
],
routing: {
strategy: 'cost-optimized',
},
});
const app = createHTTPServer({ gateway });
startServer(app, { port: 3000 });

Then use it like the OpenAI API:
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "cheapest",
"messages": [{"role": "user", "content": "Hello!"}]
}'
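Since the server speaks the OpenAI wire protocol, you can also point an existing OpenAI client at it. A minimal sketch using the official openai npm package (the apiKey value is a placeholder; how the gateway handles client-side auth depends on your setup):

import OpenAI from 'openai';

// Point the standard OpenAI client at the local gateway instead of api.openai.com.
const client = new OpenAI({
  baseURL: 'http://localhost:3000/v1',
  apiKey: 'unused', // placeholder; not validated in this sketch
});

const completion = await client.chat.completions.create({
  model: 'cheapest', // virtual models work through the HTTP server too
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);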
Quick Start - SDK

Use the gateway directly in your code:
import { Gateway } from '@lov3kaizen/agentsea-gateway';
const gateway = new Gateway({
providers: [
{ name: 'openai', apiKey: process.env.OPENAI_API_KEY, models: ['gpt-4o'] },
],
});
// OpenAI-compatible interface
const response = await gateway.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
console.log(response._gateway); // Gateway metadata (provider, cost, latency)

Virtual Models
Use virtual models to automatically route to the optimal provider:
// Route to highest quality available model
await gateway.chat.completions.create({
model: 'best',
messages: [{ role: 'user', content: 'Complex reasoning task...' }],
});
// Route to cheapest model
await gateway.chat.completions.create({
model: 'cheapest',
messages: [{ role: 'user', content: 'Simple task...' }],
});
// Route to fastest provider
await gateway.chat.completions.create({
model: 'fastest',
messages: [{ role: 'user', content: 'Time-sensitive task...' }],
});
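The response metadata (see Response Metadata below) records how a virtual model was resolved, which is useful for verifying routing behavior:

const response = await gateway.chat.completions.create({
  model: 'cheapest',
  messages: [{ role: 'user', content: 'Summarize this in one line.' }],
});

// The virtual name is preserved alongside the provider that actually served it.
console.log(response._gateway.provider);      // e.g. 'openai'
console.log(response._gateway.originalModel); // 'cheapest'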
Routing Strategies

Round-Robin
Distributes requests evenly across providers:
const gateway = new Gateway({
providers: [...],
routing: {
strategy: 'round-robin',
weights: { openai: 2, anthropic: 1 }, // 2:1 ratio
},
});

Failover
Tries providers in order until one succeeds:
const gateway = new Gateway({
providers: [...],
routing: {
strategy: 'failover',
fallbackChain: ['openai', 'anthropic', 'google'],
},
});

Cost-Optimized
Selects the cheapest model meeting quality requirements:
const gateway = new Gateway({
providers: [...],
routing: { strategy: 'cost-optimized' },
});

Latency-Optimized
Routes to the fastest provider based on observed latencies:
const gateway = new Gateway({
providers: [...],
routing: { strategy: 'latency-optimized' },
});

Caching
Enable caching to reduce costs and latency for repeated requests:
const gateway = new Gateway({
providers: [...],
cache: {
enabled: true,
ttl: 3600, // 1 hour
maxEntries: 1000,
type: 'exact', // Hash-based matching
},
});
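With exact-match caching enabled, repeating a request verbatim should be served from the cache; the cached flag in the response metadata (documented below) lets you confirm this. A small sketch under that assumption:

const request = {
  model: 'gpt-4o',
  messages: [{ role: 'user' as const, content: 'What is a gateway?' }],
};

// First call goes to the provider; an identical second call can be answered from cache.
const first = await gateway.chat.completions.create(request);
console.log(first._gateway.cached); // false

const second = await gateway.chat.completions.create(request);
console.log(second._gateway.cached); // true, if the entry is still within its TTL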
Request Metadata

Add gateway-specific options to requests:
const response = await gateway.chat.completions.create({
model: 'gpt-4o',
messages: [...],
_gateway: {
preferredProvider: 'anthropic',
excludeProviders: ['google'],
maxCost: 0.01, // Max $0.01 per request
maxLatency: 5000, // Max 5 seconds
cachePolicy: 'no-cache', // Skip cache
tags: { user: 'user-123' },
},
});

Response Metadata
Every response includes gateway metadata:
const response = await gateway.chat.completions.create({ ... });
console.log(response._gateway);
// {
// provider: 'openai',
// originalModel: 'cheapest',
// latencyMs: 1234,
// cost: 0.000123,
// cached: false,
// retries: 0,
// routingDecision: { ... }
// }

Streaming
Full streaming support with SSE:
const stream = await gateway.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Metrics
Track usage and costs:
const metrics = gateway.getMetrics();
console.log(metrics.requests.total);
console.log(metrics.cost.total);
console.log(metrics.cost.byProvider);
console.log(metrics.latency.avg);
console.log(metrics.cache.hitRate);
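For ongoing visibility you might poll these counters on a timer; a minimal sketch (the interval and log format are illustrative, not part of the library):

// Log a cost and latency summary once a minute.
setInterval(() => {
  const m = gateway.getMetrics();
  console.log(
    `requests=${m.requests.total} cost=$${m.cost.total.toFixed(4)} ` +
    `avgLatency=${m.latency.avg}ms cacheHitRate=${m.cache.hitRate}`
  );
}, 60_000);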
Events

Listen to gateway events:
gateway.on('request:complete', (event) => {
console.log(`${event.provider}: ${event.latencyMs}ms, $${event.cost}`);
});
gateway.on('request:error', (event) => {
console.error(`Error: ${event.error.message}`);
});
gateway.on('provider:unhealthy', (provider) => {
console.warn(`Provider ${provider} is unhealthy`);
});
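checkHealth() and shutdown() (listed in the API reference below) fit naturally into a server lifecycle. A sketch, assuming checkHealth() resolves with per-provider status and shutdown() returns a promise:

// Verify providers before taking traffic, and shut down cleanly on SIGTERM.
const health = await gateway.checkHealth();
console.log(health);

process.on('SIGTERM', async () => {
  await gateway.shutdown();
  process.exit(0);
});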
API Reference

Gateway
- constructor(config: GatewayConfig) - Create gateway instance
- chat.completions.create(request) - Create completion
- getMetrics() - Get usage metrics
- getRegistry() - Get provider registry
- getRouter() - Get router instance
- checkHealth() - Check provider health
- shutdown() - Clean shutdown
Built-in Providers
- OpenAIProvider - OpenAI / Azure OpenAI
- AnthropicProvider - Anthropic Claude
- GoogleProvider - Google Gemini
Routing Strategies
- RoundRobinStrategy - Even distribution
- FailoverStrategy - Ordered fallback
- CostOptimizedStrategy - Cheapest model
- LatencyOptimizedStrategy - Fastest provider