Overview

Web scraping lets you extract structured data from websites. Browserbase provides reliable browser infrastructure for building scrapers that can:
  • Scale without infrastructure management
  • Maintain consistent performance
  • Avoid bot detection and CAPTCHAs with Browserbase’s stealth mode
  • Be debugged and monitored with session replays and live views
This guide will help you get started with web scraping on Browserbase and highlight best practices.
Need scheduled scraping or webhook-triggered data collection? Functions provide serverless scraping that can be invoked on-demand or on a schedule—perfect for building data pipelines and monitoring workflows.

Scraping a website

Using a sample site, books.toscrape.com, we'll scrape the title, price, and other details of each book listed on the page.

Code Example

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
import dotenv from "dotenv";
dotenv.config();

const stagehand = new Stagehand({
    env: "BROWSERBASE",
    verbose: 0,
});

async function scrapeBooks() {
    await stagehand.init(); // starts a new Browserbase browser session
    const page = stagehand.page;

    await page.goto("https://books.toscrape.com/");

    // Describe the data in natural language and validate it with a Zod schema
    const scrape = await page.extract({
        instruction: "Extract the books from the page",
        schema: z.object({
            books: z.array(z.object({
                title: z.string(),
                price: z.string(),
                image: z.string(),
                inStock: z.string(),
                link: z.string(),
            }))
        }),
    });

    console.log(scrape.books);

    await stagehand.close(); // release the Browserbase session promptly
    return scrape.books;
}

scrapeBooks().catch(console.error);

Example output

[
  {
    title: 'A Light in the Attic',
    price: '£51.77',
    image: 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg',
    inStock: 'In stock',
    link: 'catalogue/a-light-in-the-attic_1000/index.html'
  },
  ...
]

Best Practices for Web Scraping

Follow these best practices to build reliable, efficient, and ethical web scrapers with Browserbase.

Ethical Scraping

  • Respect robots.txt: Check the website’s robots.txt file for crawling guidelines
  • Rate limiting: Implement reasonable delays between requests (2-5 seconds; see the sketch after this list)
  • Terms of Service: Review the website’s terms of service before scraping
  • Data usage: Only collect and use data in accordance with the website’s policies
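
Rate limiting can be as simple as a randomized pause before each navigation. Below is a minimal sketch; the sleep and politeGoto helpers are our own illustrations (not part of Stagehand), and the 2-5 second range matches the guideline above:

import type { Page } from "playwright"; // Stagehand's page extends Playwright's Page

// Hypothetical helper: resolve after ms milliseconds
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: wait a random 2-5 seconds, then navigate
async function politeGoto(page: Page, url: string): Promise<void> {
    await sleep(2000 + Math.random() * 3000);
    await page.goto(url);
}

Calling politeGoto in place of page.goto keeps sequential requests spaced out without any other changes to your scraping logic.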

Performance Optimization

  • Batch processing: Process multiple pages in batches with concurrent sessions (see the sketch after this list)
  • Selective scraping: Only extract the data you need
  • Resource management: Close browser sessions promptly after use
  • Connection reuse: Reuse browsers for sequential scraping tasks
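
For batch processing, one approach is to give each URL its own short-lived Stagehand session and cap how many run at once. This is a sketch under the assumption that one Stagehand instance maps to one Browserbase session; the CONCURRENCY value and the scrapeUrl helper are illustrative, not Browserbase requirements:

import { Stagehand } from "@browserbasehq/stagehand";

const CONCURRENCY = 3; // illustrative cap on simultaneous sessions

async function scrapeUrl(url: string) {
    const stagehand = new Stagehand({ env: "BROWSERBASE", verbose: 0 });
    await stagehand.init();
    try {
        await stagehand.page.goto(url);
        // ...extract data here, as in the example above...
    } finally {
        await stagehand.close(); // close sessions promptly after use
    }
}

async function scrapeInBatches(urls: string[]) {
    for (let i = 0; i < urls.length; i += CONCURRENCY) {
        const batch = urls.slice(i, i + CONCURRENCY);
        await Promise.all(batch.map(scrapeUrl)); // each batch runs concurrently
    }
}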

Stealth and Anti-Bot Avoidance

  • Enable Browserbase Advanced Stealth mode: Helps avoid bot detection (configuration sketch after this list)
  • Randomize behavior: Add variable delays between actions
  • Use proxies: Rotate IPs to distribute requests
  • Mimic human interaction: Add realistic mouse movements and delays
  • Handle CAPTCHAs: Enable Browserbase’s automatic CAPTCHA solving
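
These options are set when the session is created. The sketch below passes them through Stagehand's browserbaseSessionCreateParams; the proxies, advancedStealth, and solveCaptchas flags follow Browserbase's session-creation API as we understand it (Advanced Stealth is available only on certain plans), so check the current API reference before relying on them:

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
    env: "BROWSERBASE",
    // Forwarded to Browserbase when the session is created
    browserbaseSessionCreateParams: {
        projectId: process.env.BROWSERBASE_PROJECT_ID!,
        proxies: true, // rotate IPs through Browserbase-managed proxies
        browserSettings: {
            advancedStealth: true, // harder-to-fingerprint browser (plan-gated)
            solveCaptchas: true,   // automatic CAPTCHA solving
        },
    },
});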

Next Steps

Now that you understand the basics of web scraping with Browserbase, here are some features to explore next: