Batch Task Queue & Concurrency: Scaling Beyond One Agent

A single Browser Use agent handles one task at a time. One Chromium instance, one LLM call chain, one output. That's fine for ad-hoc automation scripts. It's completely inadequate when you need to scrape 10,000 product pages, run a nightly data extraction pipeline across 200 URLs, or process a continuous stream of user-submitted tasks. You need a queue.

Celery with Redis is a well-established pattern in the Python ecosystem for this problem. Redis acts as both the message broker (tasks go in, workers pull them out) and the result backend (agent outputs are stored until your application retrieves them). Each Celery worker process wraps a single Browser Use agent, so with worker concurrency set to 1, one worker means one Chromium instance and one concurrent task. Scale horizontally by adding more workers, up to the memory limits of your host machine. Flower provides a real-time dashboard showing queue depth, worker status, task success and failure rates, and per-task execution time, which is essential for spotting when a particular website or task type is degrading your throughput.

Resource budgeting is where most scaling attempts fail. A headless Chromium instance needs roughly 500MB of RAM at steady state, spiking to 800MB during complex page loads. A headful instance (with Xvfb, see Chapter 3) needs 700MB to 1GB. On a 16GB VM you can safely run 8-12 headless agents, but only if you budget for OS overhead, the Celery worker processes themselves, and the fact that Chromium is slow to return freed memory to the OS (it keeps freed pages in its own allocator), so a long-running instance tends to sit near its peak footprint rather than its steady-state one. The agent spends most of its time waiting on LLM API calls, which are I/O-bound, so CPU is rarely the bottleneck. The real constraint is almost always RAM, followed by the number of concurrent Chromium processes the host can run before context-switching overhead outweighs any throughput gains.
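
The arithmetic behind the 8-12 figure can be sketched in a few lines. This is a rough estimate, not a measured formula: the 2GB OS reserve, the 100MB per Celery worker process, and the 25% safety headroom are illustrative assumptions, and `max_agents` is a hypothetical helper name. Only the per-browser peak figures come from the text above.

```python
def max_agents(
    total_ram_mb: int,
    peak_mb_per_browser: int,
    os_reserve_mb: int = 2048,      # assumption: RAM kept back for the OS
    worker_overhead_mb: int = 100,  # assumption: one Celery worker process
    headroom: float = 0.25,         # assumption: safety margin left unused
) -> int:
    """Estimate how many agents fit in RAM when each one sits at its peak footprint."""
    usable_mb = (total_ram_mb - os_reserve_mb) * (1.0 - headroom)
    per_agent_mb = peak_mb_per_browser + worker_overhead_mb
    return max(0, int(usable_mb // per_agent_mb))


if __name__ == "__main__":
    # 16GB VM, headless Chromium budgeted at its 800MB peak
    print("headless:", max_agents(16 * 1024, 800))
    # 16GB VM, headful Chromium budgeted at 1GB
    print("headful:", max_agents(16 * 1024, 1024))
```

Budgeting at the peak rather than the steady-state figure is deliberate: since a long-lived Chromium instance tends to stay near its high-water mark, the peak is the safer planning number.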

For GPU-accelerated deployments (e.g., running the open-source bu-2-0 model locally via vLLM), you have a second dimension to budget: VRAM. The model weights are loaded once, and each concurrent inference stream then needs additional GPU memory for its KV cache, which grows with model size and context length. The full chapter includes a budgeting worksheet that takes your target tasks-per-hour, works backward to the number of concurrent agents you need, and outputs the VM instance type and GPU configuration required.
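
The backward calculation can be approximated as follows. This is a simplified sketch, not the worksheet from the full chapter: `agents_needed` and `max_inference_streams` are hypothetical helpers, the 80% utilization target is an assumption, and the VRAM figures in the example are placeholders rather than measurements for any particular model.

```python
import math


def agents_needed(tasks_per_hour: float, avg_task_seconds: float,
                  utilization: float = 0.8) -> int:
    """Concurrent agents required to sustain a throughput target.

    utilization is the assumed fraction of each hour an agent spends
    doing useful work (the rest covers retries, startup, idle gaps).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_agent_seconds = tasks_per_hour * avg_task_seconds
    return math.ceil(busy_agent_seconds / (3600 * utilization))


def max_inference_streams(vram_mb: int, weights_mb: int,
                          kv_cache_mb_per_stream: int,
                          reserve_mb: int = 1024) -> int:
    """Concurrent LLM streams that fit in VRAM after loading the weights once."""
    free_mb = vram_mb - weights_mb - reserve_mb
    return max(0, free_mb // kv_cache_mb_per_stream)


if __name__ == "__main__":
    # 600 tasks/hour at 90 seconds each
    print("agents:", agents_needed(600, 90))
    # Placeholder numbers: 24GB card, 16GB of weights, 1GB of KV cache per stream
    print("streams:", max_inference_streams(24 * 1024, 16 * 1024, 1024))
```

If the stream count comes out lower than the agent count, the GPU rather than RAM is the limiting resource, and agents will queue on inference.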

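# celery_app.py: minimal Celery task wrapping a Browser Use agent
# Requires ANTHROPIC_API_KEY in the worker's environment.
# Note: the LLM wrapper import path depends on your Browser Use version.
import asyncio

from browser_use import Agent, Browser
from celery import Celery
from langchain_anthropic import ChatAnthropic

app = Celery("browser_tasks", broker="redis://localhost:6379/0")
app.conf.result_backend = "redis://localhost:6379/0"
app.conf.worker_concurrency = 1  # one task (one Chromium instance) per worker
app.conf.task_acks_late = True   # ack after completion so a crashed worker's task is redelivered

@app.task(bind=True, max_retries=2)
def run_browser_task(self, url: str, instruction: str):
    """Run one Browser Use task. One task = one Chromium instance."""
    async def _run():
        browser = Browser(headless=True)
        try:
            llm = ChatAnthropic(
                model="claude-sonnet-4-20250514",
                temperature=0.0,
            )
            # Put the target URL in the task text so the agent knows where to start.
            agent = Agent(task=f"Go to {url}. {instruction}", llm=llm, browser=browser)
            history = await agent.run()
            # Return only the final text: the full history object is not
            # JSON-serializable, so Celery could not store it in Redis.
            return history.final_result()
        finally:
            # Always close Chromium, even when the agent raises.
            await browser.close()

    try:
        return asyncio.run(_run())
    except Exception as exc:
        # Retry up to max_retries times, waiting 120 seconds between attempts.
        raise self.retry(exc=exc, countdown=120)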
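
On the producer side, enqueueing a batch and collecting the outputs might look like the sketch below. `submit_batch` and `collect` are hypothetical helper names; they rely only on Celery's standard `task.delay(...)` and `AsyncResult.get(...)` calls, and take the task as a parameter so the helpers stay independent of the module above.

```python
from typing import Any, Dict, Iterable


def submit_batch(task: Any, urls: Iterable[str], instruction: str) -> Dict[str, Any]:
    """Enqueue one job per URL and return {url: AsyncResult}."""
    return {url: task.delay(url, instruction) for url in urls}


def collect(pending: Dict[str, Any], timeout: float = 600.0) -> Dict[str, Any]:
    """Wait for each job in turn and return {url: output or exception}.

    propagate=False makes a failed task return its exception object
    instead of raising it, so one bad URL does not abort the batch.
    A job that exceeds the timeout still raises Celery's TimeoutError.
    """
    results = {}
    for url, async_result in pending.items():
        results[url] = async_result.get(timeout=timeout, propagate=False)
    return results


# Usage, assuming the task module above is importable as celery_app:
#   from celery_app import run_browser_task
#   pending = submit_batch(run_browser_task, urls, "Extract the product name and price.")
#   outputs = collect(pending)
```

Workers for this setup can be started with `celery -A celery_app worker --concurrency=1`, one process per Chromium instance you have budgeted for. With the `flower` package installed, `celery -A celery_app flower` starts the dashboard.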
