
Browser Automation (Surf)

A computer-use agent for controlling desktop environments: screen capture plus mouse and keyboard actions, driven by Claude's vision capabilities.

Surf lets AI agents interact with desktop applications and browsers through eight computer-use tools, using Claude vision to analyze the screen and decide each action.

Installation

bash
pnpm add @lov3kaizen/agentsea-surf

# Optional dependencies
pnpm add puppeteer  # For browser automation
pnpm add sharp      # For image processing

Key Features

  • 8 Computer-Use Tools - Screenshot, click, type, scroll, drag, key press, cursor move, wait
  • Multiple Backends - Native (macOS, Linux, Windows), Puppeteer, Docker
  • Claude Vision - Automatic screen analysis and action determination
  • Streaming - Real-time event streaming during execution
  • Security Sandbox - Rate limiting, command blocking, domain restrictions
  • NestJS Integration - REST API and WebSocket support

Computer-Use Tools

  • Screenshot
  • Click
  • Type
  • Scroll
  • Drag
  • Key Press
  • Cursor Move
  • Wait

Quick Start

typescript
import {
  SurfAgent,
  createNativeBackend,
} from '@lov3kaizen/agentsea-surf';

async function main() {
  // Create a native backend for your platform
  const backend = createNativeBackend();
  await backend.connect();

  // Create the agent
  const agent = new SurfAgent('session-1', backend, {
    maxSteps: 20,
    vision: {
      model: 'claude-sonnet-4-20250514',
      maxTokens: 4096,
      includeScreenshotInResponse: true,
    },
  });

  // Execute a task
  const result = await agent.execute(
    'Open Chrome and navigate to google.com'
  );

  console.log('Result:', result.response);
  console.log('Steps taken:', result.state.actionHistory.length);

  await backend.disconnect();
}

main().catch(console.error);
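
The Quick Start calls `backend.disconnect()` only after `agent.execute` resolves, so a thrown error would leave the backend connected. One way to guard against that is a small wrapper built on `try`/`finally`. This is a minimal sketch: `withBackend` and the `Connectable` type are hypothetical names introduced here, not library exports, and the sketch assumes only that a backend exposes the `connect()` and `disconnect()` methods used above.

```typescript
// Structural type: only the two methods the Quick Start calls on a backend.
interface Connectable {
  connect(): Promise<unknown>;
  disconnect(): Promise<unknown>;
}

// Hypothetical helper (not a library export): connects, runs `work`,
// and always disconnects afterwards, even if `work` throws.
async function withBackend<B extends Connectable, T>(
  backend: B,
  work: (backend: B) => Promise<T>,
): Promise<T> {
  await backend.connect();
  try {
    return await work(backend);
  } finally {
    await backend.disconnect();
  }
}
```

With this in place, the body of `main` could become `await withBackend(createNativeBackend(), (backend) => new SurfAgent('session-1', backend, { maxSteps: 20 }).execute('...'))`, and the explicit `connect`/`disconnect` calls go away.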

Backends

Native Backend

Automatically selects the appropriate backend for your platform (macOS, Linux, Windows):

typescript
import { createNativeBackend } from '@lov3kaizen/agentsea-surf';

const backend = createNativeBackend({ displayIndex: 0 });
await backend.connect();

Browser Backend (Puppeteer)

Automate web browsers with Puppeteer:

typescript
import { PuppeteerBackend } from '@lov3kaizen/agentsea-surf';

const backend = new PuppeteerBackend({
  headless: false,
  viewport: { width: 1920, height: 1080 },
  initialUrl: 'https://example.com',
});

await backend.connect();

Docker Backend

Run in an isolated Docker container:

typescript
import { DockerBackend } from '@lov3kaizen/agentsea-surf';

const backend = new DockerBackend({
  image: 'agentsea/desktop:ubuntu-22.04',
  resolution: { width: 1920, height: 1080, scaleFactor: 1 },
  removeOnDisconnect: true,
});

await backend.connect();

Streaming Execution

Stream events as the agent executes:

typescript
// `backend` is a connected backend and `config` a Partial<SurfConfig>, as in the Quick Start
const agent = new SurfAgent('session-1', backend, config);

for await (const event of agent.executeStream('Search for weather')) {
  switch (event.type) {
    case 'screenshot':
      console.log('Screenshot taken');
      break;
    case 'action':
      console.log(`Executing: ${event.action.description}`);
      break;
    case 'complete':
      console.log('Task completed:', event.response);
      break;
  }
}
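
If the event handling grows beyond a few log lines, it may help to move it into a typed function. The sketch below models only the three event types and fields that the loop above reads. The `StreamEvent` type is an assumption inferred from that loop (including `response` being a string), and the library's real event types may carry more fields or more variants.

```typescript
// Event shapes inferred from the fields read in the streaming loop;
// these are assumptions, not the library's published types.
type StreamEvent =
  | { type: 'screenshot' }
  | { type: 'action'; action: { description: string } }
  | { type: 'complete'; response: string };

// Turns one event into a log line. The default branch reports event types
// this sketch does not know about instead of silently dropping them.
function formatEvent(event: StreamEvent): string {
  switch (event.type) {
    case 'screenshot':
      return 'Screenshot taken';
    case 'action':
      return `Executing: ${event.action.description}`;
    case 'complete':
      return `Task completed: ${event.response}`;
    default:
      return `Unhandled event: ${(event as { type: string }).type}`;
  }
}
```

Inside the `for await` loop, the `switch` could then be replaced by `console.log(formatEvent(event))`, provided the real event type is compatible with this shape.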

Using Individual Tools

Use tools independently for fine-grained control:

typescript
import {
  createSurfTools,
  createNativeBackend,
} from '@lov3kaizen/agentsea-surf';

const backend = createNativeBackend();
await backend.connect();

const tools = createSurfTools(backend);

// Use individual tools
await tools.screenshot.execute({});
await tools.click.execute({ x: 100, y: 200 });
await tools.typeText.execute({ text: 'Hello World' });
await tools.scroll.execute({ direction: 'down', amount: 200 });
await tools.keyPress.execute({ key: 'Enter' });
await tools.drag.execute({ startX: 0, startY: 0, endX: 100, endY: 100 });
await tools.cursorMove.execute({ x: 500, y: 300 });
await tools.wait.execute({ duration: 1000 });
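
Individual tools compose naturally into small helpers. As an illustration, the sketch below chains `click`, `typeText`, and `keyPress` to fill in a text field. `fillField`, `Tool`, and `FieldTools` are hypothetical names. The argument shapes are copied from the calls above, and the return type of `execute` is left as `unknown` because it is not documented here.

```typescript
// Minimal structural types for the three tools used here, matching the
// argument shapes shown in the calls above.
interface Tool<Args> {
  execute(args: Args): Promise<unknown>;
}

interface FieldTools {
  click: Tool<{ x: number; y: number }>;
  typeText: Tool<{ text: string }>;
  keyPress: Tool<{ key: string }>;
}

// Hypothetical helper: focus a field by clicking it, type into it,
// and optionally press Enter to submit.
async function fillField(
  tools: FieldTools,
  at: { x: number; y: number },
  text: string,
  submit: boolean = true,
): Promise<void> {
  await tools.click.execute(at);
  await tools.typeText.execute({ text });
  if (submit) {
    await tools.keyPress.execute({ key: 'Enter' });
  }
}
```

Assuming the object returned by `createSurfTools(backend)` is structurally compatible, a call might look like `await fillField(tools, { x: 100, y: 200 }, 'Hello World')`.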

Security Sandboxing

Restrict agent capabilities for safety:

typescript
const agent = new SurfAgent('session', backend, {
  sandbox: {
    enabled: true,
    maxActionsPerMinute: 60,
    blockedDomains: ['malicious-site.com'],
    blockedCommands: ['rm -rf', 'sudo'],
    blockedPaths: ['/etc', '/root'],
  },
});
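
To make the options above more concrete, here is one way checks like `maxActionsPerMinute` and `blockedDomains` could be enforced. This is an illustrative sketch only: `SandboxGuard` is a hypothetical class, and nothing here is taken from the library's internal implementation, which may use different matching rules.

```typescript
// Hypothetical stand-in for two of the sandbox options shown above.
interface SandboxRules {
  maxActionsPerMinute: number;
  blockedDomains: string[];
}

class SandboxGuard {
  private readonly rules: SandboxRules;
  private timestamps: number[] = [];

  constructor(rules: SandboxRules) {
    this.rules = rules;
  }

  // Sliding-window rate limit: allow an action only if fewer than
  // maxActionsPerMinute actions were recorded in the last 60 seconds.
  allowAction(now: number = Date.now()): boolean {
    const windowStart = now - 60000;
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.rules.maxActionsPerMinute) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }

  // A hostname is blocked if it equals a blocked domain or is a subdomain of one.
  isHostBlocked(hostname: string): boolean {
    const host = hostname.toLowerCase();
    return this.rules.blockedDomains.some(
      (domain) => host === domain || host.endsWith(`.${domain}`),
    );
  }
}
```

Matching on the hostname suffix rather than a plain substring avoids blocking unrelated hosts such as `notmalicious-site.com` while still covering subdomains.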

NestJS Integration

typescript
import { Module } from '@nestjs/common';
import { SurfModule } from '@lov3kaizen/agentsea-surf/nestjs';

@Module({
  imports: [
    SurfModule.forRoot({
      backend: { type: 'native' },
      config: {
        maxSteps: 50,
        sandbox: { enabled: true },
      },
      enableRestApi: true,
      enableWebSocket: true,
    }),
  ],
})
export class AppModule {}

REST API Endpoints

When using NestJS integration:

Method  Endpoint          Description
POST    /surf/execute     Execute a task
POST    /surf/action      Execute single action
POST    /surf/screenshot  Take a screenshot
GET     /surf/screen      Get screen state
GET     /surf/sessions    List active sessions
GET     /surf/status      Get backend status
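
A client can keep the routes above in one typed table so that methods and paths are not retyped at each call site. The sketch below is an assumption-heavy illustration: `SURF_ROUTES` and `buildRequest` are hypothetical names, and the request body is passed through untouched because the payload schema for these endpoints is not documented here.

```typescript
// Methods and paths copied from the endpoint table above.
const SURF_ROUTES = {
  execute: { method: 'POST', path: '/surf/execute' },
  action: { method: 'POST', path: '/surf/action' },
  screenshot: { method: 'POST', path: '/surf/screenshot' },
  screen: { method: 'GET', path: '/surf/screen' },
  sessions: { method: 'GET', path: '/surf/sessions' },
  status: { method: 'GET', path: '/surf/status' },
} as const;

type RouteName = keyof typeof SURF_ROUTES;

interface BuiltRequest {
  url: string;
  init: { method: string; headers?: Record<string, string>; body?: string };
}

// Hypothetical helper: builds a URL and fetch-style options for a route.
// A JSON body is attached only for POST routes when one is supplied.
function buildRequest(baseUrl: string, route: RouteName, body?: unknown): BuiltRequest {
  const { method, path } = SURF_ROUTES[route];
  const url = baseUrl.replace(/\/+$/, '') + path;
  if (method === 'GET' || body === undefined) {
    return { url, init: { method } };
  }
  return {
    url,
    init: {
      method,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be handed to `fetch(url, init)`. For example, `buildRequest('http://localhost:3000', 'execute', { task: 'Search for weather' })` targets `POST /surf/execute`; both the port and the `task` field name are guesses and should be checked against the actual controller.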

WebSocket Events

Event       Description
execute     Start task execution (emits stream, complete, error)
action      Execute single action (emits actionResult)
screenshot  Take screenshot (emits screenshotResult)
stop        Stop current execution
status      Get backend status

API Reference

SurfAgent

typescript
// Constructor
new SurfAgent(
  sessionId: string,
  backend: DesktopBackend,
  config?: Partial<SurfConfig>
)

// Methods
agent.execute(task: string, context?: AgentContext)  // Execute a task
agent.executeStream(task: string, context?: AgentContext)  // Execute with streaming
agent.stop()  // Stop the current execution
agent.getState()  // Get current agent state

SurfConfig

typescript
interface SurfConfig {
  maxSteps: number;  // Maximum steps before stopping
  vision: {
    model: string;  // Claude model to use
    maxTokens: number;
    includeScreenshotInResponse: boolean;
  };
  sandbox?: {
    enabled: boolean;
    maxActionsPerMinute?: number;
    blockedDomains?: string[];
    blockedCommands?: string[];
    blockedPaths?: string[];
  };
}
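
Because the constructor accepts `Partial<SurfConfig>`, omitted fields must be filled in from defaults somewhere. `Partial` is shallow, so a partially specified `vision` object needs its own merge step. The sketch below shows one way to do that. `resolveConfig`, `ConfigOverrides`, and the values in `EXAMPLE_DEFAULTS` are placeholders chosen for illustration; the library's actual defaults and merge behavior are not documented here.

```typescript
// Same shape as the SurfConfig interface above, repeated so the sketch is self-contained.
interface SurfConfig {
  maxSteps: number;
  vision: {
    model: string;
    maxTokens: number;
    includeScreenshotInResponse: boolean;
  };
  sandbox?: {
    enabled: boolean;
    maxActionsPerMinute?: number;
    blockedDomains?: string[];
    blockedCommands?: string[];
    blockedPaths?: string[];
  };
}

// Overrides where `vision` may itself be partially specified.
type ConfigOverrides = Partial<Omit<SurfConfig, 'vision'>> & {
  vision?: Partial<SurfConfig['vision']>;
};

// Placeholder defaults (assumed values, taken from the Quick Start example).
const EXAMPLE_DEFAULTS: SurfConfig = {
  maxSteps: 20,
  vision: {
    model: 'claude-sonnet-4-20250514',
    maxTokens: 4096,
    includeScreenshotInResponse: true,
  },
};

// Top-level fields are merged shallowly; `vision` is merged field by field
// so that overriding one vision setting keeps the other defaults.
function resolveConfig(overrides: ConfigOverrides = {}): SurfConfig {
  return {
    ...EXAMPLE_DEFAULTS,
    ...overrides,
    vision: { ...EXAMPLE_DEFAULTS.vision, ...overrides.vision },
  };
}
```

For instance, `resolveConfig({ maxSteps: 50, vision: { maxTokens: 1024 } })` keeps the default model while changing the step and token limits.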

Use Cases

  • Web Automation - Fill forms, navigate sites, scrape data
  • Desktop Automation - Control native applications
  • Testing - Automated UI testing with AI
  • Data Entry - Automate repetitive data entry tasks
  • Research - Gather information from multiple sources
