Browser Automation (Surf)
A computer-use agent for controlling desktop environments. It captures the screen and performs mouse and keyboard actions, using Claude's vision capabilities to analyze what is on screen and decide the next action.
Installation
pnpm add @lov3kaizen/agentsea-surf
# Optional dependencies
pnpm add puppeteer # For browser automation
pnpm add sharp # For image processing
Key Features
- 8 Computer-Use Tools - Screenshot, click, type, scroll, drag, key press, cursor move, wait
- Multiple Backends - Native (macOS, Linux, Windows), Puppeteer, Docker
- Claude Vision - Automatic screen analysis and action determination
- Streaming - Real-time event streaming during execution
- Security Sandbox - Rate limiting, command blocking, domain restrictions
- NestJS Integration - REST API and WebSocket support
Computer-Use Tools
Screenshot
Click
Type
Scroll
Drag
Key Press
Cursor Move
Wait
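One way to picture the eight tools is as a discriminated union of actions. The sketch below is illustrative only: the `SurfAction` type and `describeAction` function are hypothetical names, the field names are inferred from the tool calls shown under Using Individual Tools, and the `'up'` scroll direction is an assumption (the document only shows `'down'`). The library's real types may differ.

```typescript
// Hypothetical action shapes inferred from the tool calls in this document;
// these are NOT the library's published types.
type SurfAction =
  | { tool: 'screenshot' }
  | { tool: 'click'; x: number; y: number }
  | { tool: 'typeText'; text: string }
  | { tool: 'scroll'; direction: 'up' | 'down'; amount: number } // 'up' is assumed
  | { tool: 'drag'; startX: number; startY: number; endX: number; endY: number }
  | { tool: 'keyPress'; key: string }
  | { tool: 'cursorMove'; x: number; y: number }
  | { tool: 'wait'; duration: number }; // unit is not stated in the document

// Produce a human-readable summary; the switch is exhaustive over `tool`,
// so the compiler flags any action kind that is added but not handled.
function describeAction(action: SurfAction): string {
  switch (action.tool) {
    case 'screenshot':
      return 'capture the screen';
    case 'click':
      return `click at (${action.x}, ${action.y})`;
    case 'typeText':
      return `type "${action.text}"`;
    case 'scroll':
      return `scroll ${action.direction} by ${action.amount}`;
    case 'drag':
      return `drag from (${action.startX}, ${action.startY}) to (${action.endX}, ${action.endY})`;
    case 'keyPress':
      return `press ${action.key}`;
    case 'cursorMove':
      return `move cursor to (${action.x}, ${action.y})`;
    case 'wait':
      return `wait for ${action.duration}`;
  }
}
```

A union like this is handy for logging or validating a planned sequence of actions before handing it to a backend.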
Quick Start
import {
  SurfAgent,
  createNativeBackend,
} from '@lov3kaizen/agentsea-surf';
async function main() {
  // Create a native backend for your platform
  const backend = createNativeBackend();
  await backend.connect();
  // Create the agent
  const agent = new SurfAgent('session-1', backend, {
    maxSteps: 20,
    vision: {
      model: 'claude-sonnet-4-20250514',
      maxTokens: 4096,
      includeScreenshotInResponse: true,
    },
  });
  // Execute a task
  const result = await agent.execute(
    'Open Chrome and navigate to google.com'
  );
  console.log('Result:', result.response);
  console.log('Steps taken:', result.state.actionHistory.length);
  await backend.disconnect();
}
main().catch(console.error);
Backends
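Every backend in this section is opened with `connect()`, and the Quick Start closes it with `disconnect()`. If a task throws in between, the disconnect call is skipped. The sketch below shows one possible way to guarantee cleanup; `ConnectableBackend` and `withBackend` are hypothetical names introduced for illustration and are not part of the library.

```typescript
// Minimal lifecycle shape assumed for illustration: only connect() and
// disconnect() are taken from the examples in this document.
interface ConnectableBackend {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
}

// Hypothetical helper: connects, runs `task`, and always disconnects,
// even when `task` rejects. The original error still propagates.
async function withBackend<B extends ConnectableBackend, T>(
  backend: B,
  task: (backend: B) => Promise<T>,
): Promise<T> {
  await backend.connect();
  try {
    return await task(backend);
  } finally {
    await backend.disconnect();
  }
}
```

With the library installed, a call might look like `await withBackend(createNativeBackend(), (backend) => new SurfAgent('session-1', backend).execute('Open Chrome'))`.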
Native Backend
Automatically selects the appropriate backend for your platform (macOS, Linux, Windows):
import { createNativeBackend } from '@lov3kaizen/agentsea-surf';
const backend = createNativeBackend({ displayIndex: 0 });
await backend.connect();
Browser Backend (Puppeteer)
Automate web browsers with Puppeteer:
import { PuppeteerBackend } from '@lov3kaizen/agentsea-surf';
const backend = new PuppeteerBackend({
  headless: false,
  viewport: { width: 1920, height: 1080 },
  initialUrl: 'https://example.com',
});
await backend.connect();
Docker Backend
Run in an isolated Docker container:
import { DockerBackend } from '@lov3kaizen/agentsea-surf';
const backend = new DockerBackend({
  image: 'agentsea/desktop:ubuntu-22.04',
  resolution: { width: 1920, height: 1080, scaleFactor: 1 },
  removeOnDisconnect: true,
});
await backend.connect();
Streaming Execution
Stream events as the agent executes:
const agent = new SurfAgent('session-1', backend, config);
for await (const event of agent.executeStream('Search for weather')) {
  switch (event.type) {
    case 'screenshot':
      console.log('Screenshot taken');
      break;
    case 'action':
      console.log(`Executing: ${event.action.description}`);
      break;
    case 'complete':
      console.log('Task completed:', event.response);
      break;
  }
}
Using Individual Tools
Use tools independently for fine-grained control:
import {
  createSurfTools,
  createNativeBackend,
} from '@lov3kaizen/agentsea-surf';
const backend = createNativeBackend();
await backend.connect();
const tools = createSurfTools(backend);
// Use individual tools
await tools.screenshot.execute({});
await tools.click.execute({ x: 100, y: 200 });
await tools.typeText.execute({ text: 'Hello World' });
await tools.scroll.execute({ direction: 'down', amount: 200 });
await tools.keyPress.execute({ key: 'Enter' });
await tools.drag.execute({ startX: 0, startY: 0, endX: 100, endY: 100 });
await tools.cursorMove.execute({ x: 500, y: 300 });
await tools.wait.execute({ duration: 1000 });
Security Sandboxing
Restrict agent capabilities for safety:
const agent = new SurfAgent('session', backend, {
  sandbox: {
    enabled: true,
    maxActionsPerMinute: 60,
    blockedDomains: ['malicious-site.com'],
    blockedCommands: ['rm -rf', 'sudo'],
    blockedPaths: ['/etc', '/root'],
  },
});
NestJS Integration
import { Module } from '@nestjs/common';
import { SurfModule } from '@lov3kaizen/agentsea-surf/nestjs';
@Module({
  imports: [
    SurfModule.forRoot({
      backend: { type: 'native' },
      config: {
        maxSteps: 50,
        sandbox: { enabled: true },
      },
      enableRestApi: true,
      enableWebSocket: true,
    }),
  ],
})
export class AppModule {}
REST API Endpoints
When using the NestJS integration, the following endpoints are available:
| Method | Endpoint | Description |
|---|---|---|
| POST | /surf/execute | Execute a task |
| POST | /surf/action | Execute single action |
| POST | /surf/screenshot | Take a screenshot |
| GET | /surf/screen | Get screen state |
| GET | /surf/sessions | List active sessions |
| GET | /surf/status | Get backend status |
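As a rough illustration of calling the first endpoint in the table, the sketch below builds a request for `POST /surf/execute`. The document lists the route but not its payload, so the `task` field in the JSON body is an assumption, and `buildExecuteRequest` is a hypothetical helper rather than part of the library.

```typescript
// Shape of the value returned by the helper; `init` is structured so it can
// be passed as the second argument to fetch().
interface ExecuteRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper. ASSUMPTION: the endpoint accepts a JSON body of the
// form { task: string }; check the server's actual contract before relying on it.
function buildExecuteRequest(baseUrl: string, task: string): ExecuteRequest {
  // Strip trailing slashes so the path is joined with exactly one '/'.
  const root = baseUrl.replace(/\/+$/, '');
  return {
    url: `${root}/surf/execute`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ task }),
    },
  };
}
```

Given a server at a placeholder address such as `http://localhost:3000`, the result could be used as `const { url, init } = buildExecuteRequest('http://localhost:3000', 'Search for weather'); await fetch(url, init);`.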
WebSocket Events
| Event | Description |
|---|---|
| execute | Start task execution (emits stream, complete, error) |
| action | Execute single action (emits actionResult) |
| screenshot | Take screenshot (emits screenshotResult) |
| stop | Stop current execution |
| status | Get backend status |
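A client usually needs to know which events to listen for after sending a request. The sketch below encodes the table above as a lookup, independent of any particular WebSocket client. `RESPONSE_EVENTS` and `expectedResponses` are hypothetical names; the table does not say what `stop` and `status` emit, so they map to empty lists here rather than guessed names.

```typescript
// Request events and the events each one emits, copied from the table above.
// 'stop' and 'status' have no documented response events, so they are empty.
const RESPONSE_EVENTS = {
  execute: ['stream', 'complete', 'error'],
  action: ['actionResult'],
  screenshot: ['screenshotResult'],
  stop: [],
  status: [],
} as const;

type RequestEvent = keyof typeof RESPONSE_EVENTS;

// Type guard: true only for the five request events listed in the table.
function isRequestEvent(name: string): name is RequestEvent {
  return Object.prototype.hasOwnProperty.call(RESPONSE_EVENTS, name);
}

// Return the response events a client should subscribe to before sending `name`.
function expectedResponses(name: string): readonly string[] {
  if (!isRequestEvent(name)) {
    throw new Error(`Unknown Surf event: ${name}`);
  }
  return RESPONSE_EVENTS[name];
}
```

Registering listeners for `expectedResponses('execute')` before emitting `execute` avoids missing early `stream` events.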
API Reference
SurfAgent
// Constructor
new SurfAgent(
  sessionId: string,
  backend: DesktopBackend,
  config?: Partial<SurfConfig>
)
// Methods
agent.execute(task: string, context?: AgentContext)       // Execute a task
agent.executeStream(task: string, context?: AgentContext) // Execute with streaming
agent.stop()                                              // Stop the current execution
agent.getState()                                          // Get current agent state
SurfConfig
interface SurfConfig {
  maxSteps: number; // Maximum steps before stopping
  vision: {
    model: string; // Claude model to use
    maxTokens: number;
    includeScreenshotInResponse: boolean;
  };
  sandbox?: {
    enabled: boolean;
    maxActionsPerMinute?: number;
    blockedDomains?: string[];
    blockedCommands?: string[];
    blockedPaths?: string[];
  };
}
Use Cases
- Web Automation - Fill forms, navigate sites, scrape data
- Desktop Automation - Control native applications
- Testing - Automated UI testing with AI
- Data Entry - Automate repetitive data entry tasks
- Research - Gather information from multiple sources
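Returning to the `SurfConfig` interface in the API Reference: the constructor accepts a `Partial<SurfConfig>`, which is a shallow partial, so a nested object such as `vision` is either omitted or supplied whole. The sketch below shows one way an application might layer overrides onto defaults. The default values are hypothetical (borrowed from the Quick Start), `resolveConfig` is not a library function, and the library's own defaulting behavior is not documented here.

```typescript
// Copied from the SurfConfig interface in the API Reference.
interface SurfConfig {
  maxSteps: number;
  vision: {
    model: string;
    maxTokens: number;
    includeScreenshotInResponse: boolean;
  };
  sandbox?: {
    enabled: boolean;
    maxActionsPerMinute?: number;
    blockedDomains?: string[];
    blockedCommands?: string[];
    blockedPaths?: string[];
  };
}

// ASSUMPTION: illustrative defaults taken from the Quick Start example,
// not the library's actual defaults.
const DEFAULT_CONFIG: SurfConfig = {
  maxSteps: 20,
  vision: {
    model: 'claude-sonnet-4-20250514',
    maxTokens: 4096,
    includeScreenshotInResponse: true,
  },
};

// Hypothetical helper: top-level keys from `overrides` win, and `vision`
// is merged field by field so the result always has a complete vision block.
function resolveConfig(overrides: Partial<SurfConfig> = {}): SurfConfig {
  return {
    ...DEFAULT_CONFIG,
    ...overrides,
    vision: { ...DEFAULT_CONFIG.vision, ...overrides.vision },
  };
}
```

For example, `resolveConfig({ maxSteps: 50, sandbox: { enabled: true } })` keeps the default `vision` block while raising the step limit and enabling the sandbox.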