Browser Use Architecture: Agent vs Cloud, Component Selection
Before you write a single line of deployment config, you need to answer one question: which Browser Use path are you actually on? There are two, and they lead to very different infrastructure decisions.
The open-source Python agent (browser-use on PyPI) gives you full control: you own the Chromium instances, the LLM endpoints, and the execution environment. The managed Cloud API abstracts all of that away — you send a task description, Browser Use Inc. runs it on their infra, and you get the result back. The trade-off is straightforward: self-hosted means no per-task API cost but you carry the operational burden; Cloud API means zero ops but costs scale linearly with task volume. If you're running more than ~5,000 tasks per month, self-hosting starts to pull ahead on cost. Below that threshold, the Cloud API's per-task pricing is hard to beat once you factor in your own engineering time.
On the agent side, there's a second fork: the v0.13 Rust-powered beta agent versus the Python legacy agent. The Rust beta runs on a WebView-based runtime instead of Playwright, which means faster cold starts and a smaller memory footprint — but the API surface is smaller and the ecosystem of community extensions hasn't caught up yet. If you're building a new integration from scratch in mid-2026, the Rust beta is the right default. If you have existing Playwright scripts or need the full browser_use Python API, stick with legacy for now. The guide covers both paths with separate configuration sections.
LLM backbone selection is the third architectural lever. Browser Use ships with built-in support for Claude (Anthropic), GPT (OpenAI), Gemini (Google), and their own open-source bu-2-0 model which scores 83.3% on WebVoyager with a ~60s per-task average. Claude Sonnet is the current accuracy leader at ~87%, but at roughly 3x the per-token cost of GPT-4o. For cost-sensitive batch workloads, the bu-2-0 model self-hosted via vLLM hits ~55s/task at a fraction of the API cost — perfect for overnight data extraction pipelines where a few percentage points of accuracy loss is acceptable.
from browser_use import Agent, Browser
import asyncio
from langchain_anthropic import ChatAnthropic
# Self-hosted agent with custom LLM endpoint
async def main():
browser = Browser(
headless=False,
user_data_dir="./profiles/agent-1"
)
llm = ChatAnthropic(
model="claude-sonnet-4-20250514",
temperature=0.0,
)
agent = Agent(
task="Search for the latest Browser Use release on GitHub",
llm=llm,
browser=browser,
use_vision=True,
)
result = await agent.run()
print(result)
asyncio.run(main())
Unlock the full chapter. Get the complete decision tree for Agent vs Cloud, a detailed cost comparison table across all LLM backbones, hardware sizing recommendations for single-agent to 100-agent deployments, and Rust beta migration guidance.
Get the Production Guide — $39