When it activates
The agent uses this skill when you ask it to:
- Navigate to websites (“go to example.com”)
- Interact with pages (“click the submit button”, “fill out the form”)
- Capture visual state (“take a screenshot”)
- Extract data (“scrape the product list”)
- Test web applications (“test the checkout flow”)
- Handle authentication (“log into the dashboard”)
How it works
- A persistent Chromium browser runs in headless mode inside the task sandbox
- Pages are created with descriptive names (e.g., “checkout”, “login”)
- State persists between script executions: cookies, localStorage, and DOM remain intact
- The agent writes TypeScript scripts that execute via npx tsx
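One way to picture the named-page model is that the long-running browser process keeps a map from descriptive names to open pages, and each script asks for a page by name instead of opening a new one. The sketch below assumes exactly that. `PageRegistry`, `getOrCreate`, and `names` are hypothetical names, not the skill’s actual API, and the page type is left generic.

```typescript
// Hypothetical registry: one open page per descriptive name.
// In the skill, the long-lived browser process would play this role,
// which is why a page can survive from one script execution to the next.
class PageRegistry<P> {
  // Promises are stored (not resolved pages) so concurrent requests
  // for the same name share a single page instead of opening two.
  private readonly pages = new Map<string, Promise<P>>();
  private readonly createPage: (name: string) => Promise<P>;

  constructor(createPage: (name: string) => Promise<P>) {
    this.createPage = createPage;
  }

  // Return the page registered under `name`, creating it on first use.
  // Reusing the same page, rather than opening a fresh one, is what keeps
  // its state intact between requests.
  getOrCreate(name: string): Promise<P> {
    let page = this.pages.get(name);
    if (page === undefined) {
      page = this.createPage(name);
      this.pages.set(name, page);
    }
    return page;
  }

  // Names of all registered pages, in creation order.
  names(): string[] {
    return Array.from(this.pages.keys());
  }
}
```

If the scripts use Playwright (an assumption; this page does not name the automation library), the factory could be `() => context.newPage()` for some shared `BrowserContext`. Caching the promise rather than the resolved page is the design choice that keeps two overlapping requests for “checkout” from opening duplicate pages.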
Key capabilities
Page navigation and interaction
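A typical interaction script navigates to a URL, fills labelled fields, and clicks a control identified by its accessible role and name. The sketch below assumes the scripts use Playwright and is typed against a minimal structural interface that a Playwright `Page` should satisfy, so it can also be exercised with a fake page. `fillAndSubmit` and its parameters are hypothetical.

```typescript
// Minimal structural slice of Playwright's Locator (assumption: Playwright).
interface ActionLocator {
  click(): Promise<void>;
  fill(value: string): Promise<void>;
}

// Minimal structural slice of Playwright's Page used by this sketch.
interface InteractivePage {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): ActionLocator;
  getByRole(role: "button", options: { name: string }): ActionLocator;
}

// Navigate, fill each labelled field in order, then click the named button.
async function fillAndSubmit(
  page: InteractivePage,
  url: string,
  fields: Record<string, string>,
  submitName: string,
): Promise<void> {
  await page.goto(url);
  for (const [label, value] of Object.entries(fields)) {
    await page.getByLabel(label).fill(value);
  }
  await page.getByRole("button", { name: submitName }).click();
}
```

Against a real page the call might look like `await fillAndSubmit(page, "https://example.com/login", { Email: "you@example.com" }, "Sign in")`, where the field label and button name are placeholders for whatever the target form actually uses. Targeting by label and role, rather than CSS selectors, tends to survive markup changes better.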
Screenshots
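Assuming Playwright, whose `page.screenshot()` accepts `path` and `fullPage` options, a screenshot step can be as small as the helper below. `captureScreenshot`, the default `screenshots` directory, and the timestamped file name are illustrative choices, not the skill’s conventions.

```typescript
// Minimal structural slice of Playwright's Page (assumption: Playwright).
interface ScreenshotPage {
  screenshot(options: { path: string; fullPage: boolean }): Promise<unknown>;
}

// Build a filesystem-safe name such as "dashboard-2024-01-02T03-04-05-000Z.png".
function screenshotPath(dir: string, pageName: string, now: Date): string {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `${dir}/${pageName}-${stamp}.png`;
}

// Capture the whole scrollable page (not just the viewport) and return the file path.
async function captureScreenshot(
  page: ScreenshotPage,
  pageName: string,
  dir = "screenshots",
  now: Date = new Date(),
): Promise<string> {
  const path = screenshotPath(dir, pageName, now);
  await page.screenshot({ path, fullPage: true });
  return path;
}
```

Naming files after the descriptive page name makes it easy to tell which page a screenshot came from when several are open.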
Element discovery
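An ARIA snapshot renders the accessibility tree of a page as indented, YAML-like text in which each node appears as a line such as `- button "Submit"`. In Playwright (1.49 and later), `locator.ariaSnapshot()` returns text in this form, for example via `page.locator("body").ariaSnapshot()`; whether the skill uses that exact call is an assumption. The parser below is a simplified, hypothetical sketch: it reads only the role and quoted accessible name of each line and ignores attributes and nesting, which is enough to list candidate elements to target by role and name.

```typescript
interface AriaNode {
  role: string;
  name: string; // empty when the line has no quoted accessible name
}

// Matches node lines like `  - button "Submit"` or `- heading "Title" [level=1]`.
// Property lines such as `- /url: /docs` do not start with a role and are skipped.
const NODE_LINE = /^\s*-\s+([a-z]+)(?:\s+"((?:[^"\\]|\\.)*)")?/;

function parseAriaSnapshot(snapshot: string): AriaNode[] {
  const nodes: AriaNode[] = [];
  for (const line of snapshot.split("\n")) {
    const match = NODE_LINE.exec(line);
    if (!match) continue;
    nodes.push({
      role: match[1],
      // Unescape sequences like \" inside the quoted name.
      name: (match[2] ?? "").replace(/\\(.)/g, "$1"),
    });
  }
  return nodes;
}

// Find nodes with the given role whose name contains `text` (case-insensitive).
function findByRole(nodes: AriaNode[], role: string, text: string): AriaNode[] {
  const needle = text.toLowerCase();
  return nodes.filter((n) => n.role === role && n.name.toLowerCase().includes(needle));
}
```

A match such as `{ role: "button", name: "Submit order" }` maps directly onto a Playwright locator: `page.getByRole("button", { name: "Submit order" })`.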
When the agent doesn’t know the page structure, it uses ARIA snapshots to discover elements via the accessibility tree.
Console message capture
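Console output helps surface JavaScript errors that a screenshot would not show. Assuming Playwright, a page emits a `console` event whose message object exposes `type()` and `text()`; the hypothetical `collectConsole` helper below buffers those messages so a script can report them, or only the errors, when it finishes.

```typescript
// Minimal structural slice of Playwright's ConsoleMessage (assumption: Playwright).
interface ConsoleMessageLike {
  type(): string; // e.g. "log", "warning", "error"
  text(): string;
}

// Minimal structural slice of Playwright's Page: only the console subscription.
interface ConsoleSource {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
}

interface CapturedMessage {
  type: string;
  text: string;
}

// Start buffering console messages; the returned array fills as the page logs.
function collectConsole(page: ConsoleSource): CapturedMessage[] {
  const messages: CapturedMessage[] = [];
  page.on("console", (msg) => {
    messages.push({ type: msg.type(), text: msg.text() });
  });
  return messages;
}

// Keep only error-level messages, usually the most useful when debugging.
function consoleErrors(messages: CapturedMessage[]): string[] {
  return messages.filter((m) => m.type === "error").map((m) => m.text);
}
```

Attach the listener before navigating: an event listener only sees messages emitted after it is registered.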
Login handling
When the agent encounters a login page and has no credentials, it outputs <login-required />. This displays a Control Browser button that lets you complete authentication manually in the agent’s browser session.
After you log in, the agent continues with the authenticated session.
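This page does not say how the agent recognizes a login page. One plausible heuristic, shown purely as a hypothetical sketch, is to scan an ARIA snapshot (text lines such as `- textbox "Password"`) for an element whose accessible name mentions a password. `looksLikeLoginPage` and `loginMarker` are illustrative names; only the rule "login page plus no credentials means <login-required />" comes from the documentation.

```typescript
// Hypothetical heuristic, not the agent's documented logic: treat a page as a
// login page if any node in its ARIA snapshot has a name mentioning a password.
function looksLikeLoginPage(snapshot: string): boolean {
  return snapshot
    .split("\n")
    .some((line) => /^\s*-\s+[a-z]+\s+"[^"]*password[^"]*"/i.test(line));
}

// Documented rule: on a login page with no credentials available, the agent
// outputs <login-required /> so you can sign in via Control Browser.
function loginMarker(snapshot: string, haveCredentials: boolean): string | null {
  if (haveCredentials || !looksLikeLoginPage(snapshot)) return null;
  return "<login-required />";
}
```

A name-based check like this can misfire (for example on a “Change password” settings page), which is one reason handing control to you is safer than guessing.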
Example interaction
You: “Go to our staging site and take a screenshot of the dashboard”
Agent:
- Navigates to the staging URL
- Detects the login page and outputs <login-required />
- (You log in manually via the Control Browser button)
- Continues to the dashboard
- Takes screenshot and reports what it sees
Tips
- Be specific about what you want: say “click the submit button” rather than “submit the form”
- Ask for screenshots when you need visual confirmation
- If login is required, you’ll get a prompt to authenticate manually
- Page state persists—the agent can return to a page later without re-navigating