The dev-browser skill enables browser automation that maintains page state across script executions. The agent writes small, focused scripts to accomplish tasks incrementally.

When it activates

The agent uses this skill when you ask it to:
  • Navigate to websites (“go to example.com”)
  • Interact with pages (“click the submit button”, “fill out the form”)
  • Capture visual state (“take a screenshot”)
  • Extract data (“scrape the product list”)
  • Test web applications (“test the checkout flow”)
  • Handle authentication (“log into the dashboard”)

How it works

  1. A persistent Chromium browser runs in headless mode inside the task sandbox
  2. Pages are created with descriptive names (e.g., "checkout", "login")
  3. State persists between script executions: cookies, localStorage, and the DOM remain intact
  4. The agent writes TypeScript scripts that execute via npx tsx
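
To make the persistence model concrete, here is a toy sketch in plain TypeScript. It does not drive a real browser: `PageRegistry` and `PageState` are hypothetical names standing in for the long-lived Chromium process and its named pages, not the skill's actual API. It shows one idea only: a page looked up by the same descriptive name in a later script sees the state an earlier script left behind.

```typescript
// Hypothetical in-memory model of name-keyed pages that outlive individual scripts.
// PageState and PageRegistry are illustrative names, not the dev-browser API.
interface PageState {
  name: string;
  url: string | null;
  cookies: Map<string, string>;
  localStorage: Map<string, string>;
}

class PageRegistry {
  private pages = new Map<string, PageState>();

  // Return the existing page for this name, or create a blank one.
  page(name: string): PageState {
    let state = this.pages.get(name);
    if (state === undefined) {
      state = { name, url: null, cookies: new Map(), localStorage: new Map() };
      this.pages.set(name, state);
    }
    return state;
  }

  get size(): number {
    return this.pages.size;
  }
}

// Stands in for the persistent browser process shared by all scripts.
const registry = new PageRegistry();

// "Script 1": navigate to the login page and receive a session cookie.
function scriptOne(): void {
  const page = registry.page("login");
  page.url = "https://example.com/login";
  page.cookies.set("session", "abc123");
}

// "Script 2": reattach by name; the cookie set by script 1 is still there.
function scriptTwo(): string | undefined {
  return registry.page("login").cookies.get("session");
}

scriptOne();
console.log(scriptTwo()); // abc123
```

In the real skill the persistence comes from the browser process staying alive between `npx tsx` runs; the map here just plays that role.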

Key capabilities

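// Assumes `client` is already connected to the persistent browser (setup not shown)
const page = await client.page("my-app"); // open, or return to, the page named "my-app"
await page.goto("https://example.com");
await page.click("button.submit");
await page.fill("input[name='email']", "user@example.com");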

Screenshots

// Viewport screenshot
await page.screenshot({ path: "tmp/screenshot.png" });

// Full page screenshot
await page.screenshot({ path: "tmp/full.png", fullPage: true });

Element discovery

When the agent doesn’t know the page structure, it uses ARIA snapshots to discover elements via the accessibility tree:
- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
    - link "comments" [ref=e4]
- main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]

The agent then interacts with elements using their refs:
const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();
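
As a rough illustration of how refs tie snapshot lines to elements, the sketch below parses the snapshot format shown above and looks up a ref by role and accessible name. `parseSnapshotRefs` and `findRef` are hypothetical helpers, not part of the skill; the regex assumes names contain no escaped quotes.

```typescript
// Hypothetical helpers: index the [ref=...] entries of an ARIA snapshot
// so an element can be found by role and accessible name.
interface SnapshotEntry {
  role: string;
  name: string;
  ref: string;
}

function parseSnapshotRefs(snapshot: string): SnapshotEntry[] {
  const entries: SnapshotEntry[] = [];
  // Matches lines such as:  - link "new" [ref=e2]
  // Container lines like "- banner:" have no quoted name and are skipped.
  const pattern = /^\s*-\s+([\w-]+)\s+"([^"]*)"\s+\[ref=(\w+)\]/;
  for (const line of snapshot.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      entries.push({ role: match[1], name: match[2], ref: match[3] });
    }
  }
  return entries;
}

function findRef(snapshot: string, role: string, name: string): string | undefined {
  return parseSnapshotRefs(snapshot).find((e) => e.role === role && e.name === name)?.ref;
}

const snapshot = `- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
    - link "comments" [ref=e4]
- main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]`;

console.log(findRef(snapshot, "link", "new")); // e2
```

The ref returned here ("e2") is the value passed as the second argument to `client.selectSnapshotRef("hackernews", "e2")`.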

Console message capture

const errors = await client.getConsoleMessages("my-app", { type: "error" });
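
The call above reads console output captured for a named page and filters it by type. The sketch below models that behavior with a hypothetical per-page buffer; `ConsoleBuffer`, `ConsoleEntry`, and the set of message types are assumptions for illustration, not the skill's implementation.

```typescript
// Hypothetical per-page console buffer; names and types are illustrative only.
type ConsoleType = "log" | "info" | "warning" | "error";

interface ConsoleEntry {
  type: ConsoleType;
  text: string;
}

class ConsoleBuffer {
  private byPage = new Map<string, ConsoleEntry[]>();

  // Append a message captured from the named page.
  record(pageName: string, entry: ConsoleEntry): void {
    const list = this.byPage.get(pageName) ?? [];
    list.push(entry);
    this.byPage.set(pageName, list);
  }

  // Return a copy of a page's messages, optionally filtered by type.
  get(pageName: string, filter: { type?: ConsoleType } = {}): ConsoleEntry[] {
    const list = this.byPage.get(pageName) ?? [];
    return filter.type ? list.filter((m) => m.type === filter.type) : [...list];
  }
}

const consoleBuffer = new ConsoleBuffer();
consoleBuffer.record("my-app", { type: "log", text: "app booted" });
consoleBuffer.record("my-app", { type: "error", text: "TypeError: x is undefined" });
consoleBuffer.record("other", { type: "error", text: "404 /favicon.ico" });

// Only errors from "my-app" are returned; the other page's error is excluded.
const appErrors = consoleBuffer.get("my-app", { type: "error" });
console.log(appErrors.map((m) => m.text)); // [ 'TypeError: x is undefined' ]
```

Keying the buffer by page name mirrors the rest of the skill: each named page keeps its own history across script executions.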

Login handling

When the agent encounters a login page and has no credentials, it outputs <login-required />. This displays a Control Browser button allowing you to complete authentication manually in the agent’s browser session. After you log in, the agent continues with the authenticated session.
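
On the receiving side, the `<login-required />` marker only needs to be detected and removed from the text shown to you. The sketch below is a hypothetical host-side parser (`parseAgentOutput` and `AgentTurn` are illustrative names); it is not the product's actual code and assumes the marker may contain optional whitespace before the slash.

```typescript
// Hypothetical host-side check: detect the <login-required /> marker, strip it
// from the displayed text, and flag that manual browser control should be offered.
interface AgentTurn {
  text: string;
  loginRequired: boolean;
}

function parseAgentOutput(output: string): AgentTurn {
  // Non-global regex for the test, so no lastIndex state is carried between calls.
  const loginRequired = /<login-required\s*\/>/.test(output);
  const text = output.replace(/<login-required\s*\/>/g, "").trim();
  return { text, loginRequired };
}

const turn = parseAgentOutput("Reached the staging login page.\n<login-required />");
if (turn.loginRequired) {
  // A real UI would show the Control Browser button at this point.
  console.log(`Show Control Browser button. Agent said: ${turn.text}`);
}
```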

Example interaction

You: “Go to our staging site and take a screenshot of the dashboard”

Agent:
  1. Navigates to the staging URL
  2. Detects login page, outputs <login-required />
  3. (You log in manually via the Control Browser button)
  4. Continues to the dashboard
  5. Takes screenshot and reports what it sees

Tips

  • Be specific about what you want: “click the submit button” is more precise than “submit the form”
  • Ask for screenshots when you need visual confirmation
  • If login is required, you’ll get a prompt to authenticate manually
  • Page state persists, so the agent can return to a page later without re-navigating