Browser Use Architecture: Agent vs Cloud, Component Selection

Before you write a single line of deployment config, you need to answer one question: which Browser Use path are you actually on? There are two, and they lead to very different infrastructure decisions.

The open-source Python agent (browser-use on PyPI) gives you full control: you own the Chromium instances, the LLM endpoints, and the execution environment. The managed Cloud API abstracts all of that away: you send a task description, Browser Use Inc. runs it on their infrastructure, and you get the result back. The trade-off is straightforward. Self-hosting carries no per-task API fee, but you take on the operational burden; the Cloud API means zero ops, but its cost scales linearly with task volume. Above roughly 5,000 tasks per month, self-hosting starts to pull ahead on cost. Below that threshold, the Cloud API's per-task pricing is hard to beat once you factor in your own engineering time.
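The break-even point is simple arithmetic, and it is worth redoing with your own numbers. The sketch below is one way to set it up. Every figure in it is a hypothetical placeholder rather than actual Browser Use pricing, chosen only so the result lands on the ~5,000 tasks mentioned above. The function name and parameters are illustrative, and the optional self-hosted per-task cost (for example, LLM tokens you pay for directly) is an assumption you can leave at zero.

```python
def break_even_tasks(
    fixed_monthly_cents: int,
    cloud_cents_per_task: int,
    self_hosted_cents_per_task: int = 0,
) -> int:
    """Smallest monthly task count at which self-hosting costs no more than the Cloud API.

    All amounts are integer cents so the comparison is exact.
    """
    saving_per_task = cloud_cents_per_task - self_hosted_cents_per_task
    if saving_per_task <= 0:
        raise ValueError("self-hosting never breaks even with these inputs")
    # Ceiling division: fixed cost divided by the per-task saving, rounded up.
    return -(-fixed_monthly_cents // saving_per_task)


# Hypothetical inputs: $500/month for servers and ops time,
# $0.12 per Cloud task, $0.02 per self-hosted task.
print(break_even_tasks(50_000, 12, 2))
```

With these placeholder inputs, both options cost the same at the returned task count; above it, the fixed self-hosting cost is spread thinly enough to win.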

On the agent side, there's a second fork: the v0.13 Rust-powered beta agent versus the legacy Python agent. The Rust beta runs on a WebView-based runtime instead of Playwright, which means faster cold starts and a smaller memory footprint, but its API surface is smaller and the ecosystem of community extensions hasn't caught up yet. If you're building a new integration from scratch in mid-2026, the Rust beta is the right default. If you have existing Playwright scripts or need the full browser_use Python API, stick with legacy for now. The guide covers both paths with separate configuration sections.
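The rule of thumb above is small enough to write down as code, which can be handy if you want to record the decision in a project checklist. This is only a sketch of that rule: the `ProjectProfile` fields and the `"rust-beta"` / `"python-legacy"` labels are hypothetical names for illustration, not Browser Use configuration values.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    """Hypothetical description of a project, for illustration only."""
    has_existing_playwright_scripts: bool
    needs_full_python_api: bool


def choose_agent_runtime(profile: ProjectProfile) -> str:
    """Apply the rule of thumb: legacy if either constraint holds, else the Rust beta."""
    if profile.has_existing_playwright_scripts or profile.needs_full_python_api:
        return "python-legacy"
    return "rust-beta"


# A greenfield integration with no Playwright code and modest API needs.
greenfield = ProjectProfile(
    has_existing_playwright_scripts=False,
    needs_full_python_api=False,
)
print(choose_agent_runtime(greenfield))
```

Either constraint on its own is enough to keep a project on the legacy agent; only a project with neither defaults to the Rust beta.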

LLM backbone selection is the third architectural lever. Browser Use ships with built-in support for Claude (Anthropic), GPT (OpenAI), Gemini (Google), and its own open-source bu-2-0 model, which scores 83.3% on WebVoyager with a ~60s per-task average. Claude Sonnet is the current accuracy leader at ~87%, but at roughly 3x the per-token cost of GPT-4o. For cost-sensitive batch workloads, bu-2-0 self-hosted via vLLM hits ~55s/task at a fraction of the API cost, which suits overnight data extraction pipelines where losing a few percentage points of accuracy is acceptable.
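To check whether a batch actually fits in an overnight window, the per-task averages above can be turned into a rough wall-clock estimate. The sketch below assumes every task takes the average time and that agents run perfectly in parallel, which real workloads will not quite match. The batch size of 2,000 tasks and the 4 parallel agents are hypothetical inputs; only the 55s and 60s averages come from the figures quoted above.

```python
import math


def estimate_batch_hours(
    num_tasks: int,
    seconds_per_task: float,
    parallel_agents: int = 1,
) -> float:
    """Rough wall-clock hours for a batch, assuming uniform task times."""
    if num_tasks < 0 or seconds_per_task <= 0 or parallel_agents < 1:
        raise ValueError("invalid batch parameters")
    # Tasks run in "waves" of parallel_agents at a time; the last wave may be partial.
    waves = math.ceil(num_tasks / parallel_agents)
    return waves * seconds_per_task / 3600


# Hypothetical batch: 2,000 tasks across 4 parallel agents.
self_hosted = estimate_batch_hours(2000, 55, parallel_agents=4)
hosted_average = estimate_batch_hours(2000, 60, parallel_agents=4)
print(f"~55s/task: {self_hosted:.2f} h, ~60s/task: {hosted_average:.2f} h")
```

Under these assumptions the batch takes between seven and a half and eight and a half hours either way, so for this hypothetical workload the choice between the two comes down to cost and accuracy rather than speed.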

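import asyncio

from browser_use import Agent, Browser
from langchain_anthropic import ChatAnthropic


# Self-hosted agent: local Chromium, with Claude called through the Anthropic API.
# ChatAnthropic reads the API key from the ANTHROPIC_API_KEY environment variable.
async def main():
    browser = Browser(
        headless=False,  # show the browser window; use True on a server
        user_data_dir="./profiles/agent-1",  # persistent profile for this agent
    )
    llm = ChatAnthropic(
        model="claude-sonnet-4-20250514",
        temperature=0.0,  # minimize sampling randomness
    )
    agent = Agent(
        task="Search for the latest Browser Use release on GitHub",
        llm=llm,
        browser=browser,
        use_vision=True,  # send page screenshots to the model
    )
    # Run the agent loop to completion and print what it returns.
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())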