dev-browser

The dev-browser skill enables browser automation that maintains page state across script executions. The agent writes small, focused scripts to accomplish tasks incrementally.

When it activates

The agent uses this skill when you ask it to:

Navigate to websites (“go to example.com”)
Interact with pages (“click the submit button”, “fill out the form”)
Capture visual state (“take a screenshot”)
Extract data (“scrape the product list”)
Test web applications (“test the checkout flow”)
Handle authentication (“log into the dashboard”)

How it works

A persistent Chromium browser runs in headless mode inside the task sandbox
Pages are created with descriptive names (e.g., "checkout", "login")
State persists between script executions—cookies, localStorage, and DOM remain intact
The agent writes TypeScript scripts that execute via npx tsx
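
The idea of named pages whose state survives between scripts can be pictured with a small in-memory model. This is only a sketch: `PageRegistry` and `PageRecord` are hypothetical names that do not come from the dev-browser API, and a plain `Map` stands in for the long-lived browser process that actually holds the state.

```typescript
// Hypothetical model: an in-memory stand-in for the persistent browser.
// None of these names are part of the dev-browser API.
interface PageRecord {
  name: string;
  url: string | null;
  cookies: Map<string, string>;
}

class PageRegistry {
  private pages = new Map<string, PageRecord>();

  // Get-or-create: the same name always resolves to the same record.
  page(name: string): PageRecord {
    let record = this.pages.get(name);
    if (!record) {
      record = { name, url: null, cookies: new Map() };
      this.pages.set(name, record);
    }
    return record;
  }
}

const registry = new PageRegistry();

// "First script": open the checkout page and store a session cookie.
const first = registry.page("checkout");
first.url = "https://example.com/checkout";
first.cookies.set("session", "abc123");

// "Second script": asking for the same name returns the same state.
const second = registry.page("checkout");
console.log(second.url); // https://example.com/checkout
console.log(second === first); // true
```

Because lookups are keyed by name, a descriptive name such as "checkout" is what lets a later script pick up where an earlier one stopped.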

Key capabilities

Page navigation and interaction

const page = await client.page("my-app");
await page.goto("https://example.com");
await page.click("button.submit");
await page.fill("input[name='email']", "user@example.com");

Screenshots

// Viewport screenshot
await page.screenshot({ path: "tmp/screenshot.png" });

// Full page screenshot
await page.screenshot({ path: "tmp/full.png", fullPage: true });

Element discovery

When the agent doesn’t know the page structure, it uses ARIA snapshots to discover elements via the accessibility tree:

- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
    - link "comments" [ref=e4]
- main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]

The agent then interacts with elements using their refs:

const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();
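
To make the ref lookup concrete, the sketch below parses snapshot text in the format shown above and finds the ref for a link by its accessible name. `parseSnapshotRefs` and `findRef` are hypothetical helpers, not part of dev-browser, and the line format is assumed from the example snapshot alone.

```typescript
// Hypothetical helpers: not part of dev-browser. They assume each element
// line looks like `- role "name" [ref=eN]`, as in the example snapshot.
interface SnapshotEntry {
  role: string;
  name: string;
  ref: string;
}

function parseSnapshotRefs(snapshotText: string): SnapshotEntry[] {
  const found: SnapshotEntry[] = [];
  // Optional indent, "- ", a role word, a quoted name, then [ref=...].
  const linePattern = /^\s*-\s+(\w+)\s+"([^"]*)"\s+\[ref=(\w+)\]\s*$/;
  for (const line of snapshotText.split("\n")) {
    const match = linePattern.exec(line);
    if (match) {
      found.push({ role: match[1], name: match[2], ref: match[3] });
    }
  }
  return found;
}

function findRef(
  entries: SnapshotEntry[],
  role: string,
  name: string
): string | undefined {
  return entries.find((e) => e.role === role && e.name === name)?.ref;
}

const snapshotText = [
  "- banner:",
  '  - link "Hacker News" [ref=e1]',
  "  - navigation:",
  '    - link "new" [ref=e2]',
  '    - link "comments" [ref=e4]',
  "- main:",
  "  - list:",
  "    - listitem:",
  '      - link "Article Title" [ref=e8]',
].join("\n");

const snapshotEntries = parseSnapshotRefs(snapshotText);
console.log(findRef(snapshotEntries, "link", "new")); // e2
```

Container lines such as `- banner:` carry no ref and are skipped, so only elements that can be targeted end up in the result.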

Console message capture

const errors = await client.getConsoleMessages("my-app", { type: "error" });

Login handling

When the agent encounters a login page and has no credentials, it outputs <login-required />. This displays a Control Browser button that lets you complete authentication manually in the agent’s browser session. After you log in, the agent continues with the authenticated session.
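
How the marker is picked up is not described here. As a rough illustration only, a check for it in the agent's text output might look like the sketch below. `needsManualLogin` is a hypothetical function, not part of dev-browser; only the marker text itself comes from this page.

```typescript
// Hypothetical check: the marker text comes from the docs, but this
// function is an illustration and not part of dev-browser.
function needsManualLogin(agentOutput: string): boolean {
  // Tolerate optional whitespace before the self-closing slash.
  return /<login-required\s*\/>/.test(agentOutput);
}

console.log(needsManualLogin("Reached the sign-in page. <login-required />")); // true
console.log(needsManualLogin("Dashboard loaded; screenshot saved.")); // false
```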

Example interaction

You: “Go to our staging site and take a screenshot of the dashboard”

Agent:

1. Navigates to the staging URL
2. Detects the login page and outputs <login-required />
3. (You log in manually via the Control Browser button)
4. Continues to the dashboard
5. Takes a screenshot and reports what it sees

Tips

Be specific about what you want: say “click the submit button” rather than “submit the form”
Ask for screenshots when you need visual confirmation
If login is required, you’ll get a prompt to authenticate manually
Page state persists—the agent can return to a page later without re-navigating
