Overview

Web scraping lets you extract structured data from websites. Browserbase provides reliable browser infrastructure that helps you build scrapers that can:

  • Scale without infrastructure management
  • Maintain consistent performance
  • Avoid bot detection and CAPTCHAs with Browserbase’s stealth mode
  • Provide debugging and monitoring tools with session replays and live views

This guide will help you get started with web scraping on Browserbase and highlight best practices.

Scraping a website

Using a sample website, we'll scrape the title, price, and other details of each book listed on the page.


Code Example

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
import dotenv from "dotenv";
dotenv.config();

const stagehand = new Stagehand({
    env: "BROWSERBASE",
    verbose: 0,
});

async function scrapeBooks() {
    await stagehand.init();
    const page = stagehand.page;

    await page.goto("https://books.toscrape.com/");

    const scrape = await page.extract({
        instruction: "Extract the books from the page",
        schema: z.object({
            books: z.array(z.object({
                title: z.string(),
                price: z.string(),
                image: z.string(),
                inStock: z.string(),
                link: z.string(),
            }))
        }),
    });

    console.log(scrape.books);

    // Close the Browserbase session once we're done with it
    await stagehand.close();
    return scrape.books;
}

scrapeBooks().catch(console.error);

Example output

[
  {
    title: 'A Light in the Attic',
    price: '£51.77',
    image: 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg',
    inStock: 'In stock',
    link: 'catalogue/a-light-in-the-attic_1000/index.html'
  },
  ...
]

Best Practices for Web Scraping

Follow these best practices to build reliable, efficient, and ethical web scrapers with Browserbase.

Ethical Scraping

  • Respect robots.txt: Check the website’s robots.txt file for crawling guidelines
  • Rate limiting: Implement reasonable delays between requests (2-5 seconds); a minimal delay helper is sketched after this list
  • Terms of Service: Review the website’s terms of service before scraping
  • Data usage: Only collect and use data in accordance with the website’s policies
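
As a rough sketch (not Browserbase-specific), a small helper that waits a random 2-5 seconds between page visits keeps your request rate reasonable; the range is illustrative, so tune it to the site you're scraping.

// Illustrative helper: wait a random 2-5 seconds between requests.
function randomDelay(minMs = 2000, maxMs = 5000): Promise<void> {
    const ms = minMs + Math.random() * (maxMs - minMs);
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage between page visits:
// await page.goto(url);
// await randomDelay();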

Performance Optimization

  • Batch processing: Process multiple pages in batches with concurrent sessions
  • Selective scraping: Only extract the data you need
  • Resource management: Close browser sessions promptly after use
  • Connection reuse: Reuse browsers for sequential scraping tasks (the sketch after this list combines this with selective extraction)
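
The sketch below reuses a single Stagehand session for several pages and extracts only titles and prices. It assumes the stagehand instance and z import from the example above; the category URLs are illustrative.

// Sketch: reuse one Browserbase session across pages, extracting only what we need.
// Assumes the stagehand instance and z import from the example above.
async function scrapeCategories(urls: string[]) {
    await stagehand.init();
    const page = stagehand.page;
    const all: { title: string; price: string }[] = [];

    for (const url of urls) {
        await page.goto(url);
        const { books } = await page.extract({
            instruction: "Extract the book titles and prices from the page",
            schema: z.object({
                books: z.array(z.object({ title: z.string(), price: z.string() })),
            }),
        });
        all.push(...books);
    }

    await stagehand.close();
    return all;
}

scrapeCategories([
    "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
    "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
]).catch(console.error);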

Stealth and Anti-Bot Avoidance

  • Enable Browserbase Advanced Stealth mode: Helps avoid bot detection (a configuration sketch follows this list)
  • Randomize behavior: Add variable delays between actions
  • Use proxies: Rotate IPs to distribute requests
  • Mimic human interaction: Add realistic mouse movements and delays
  • Handle CAPTCHAs: Enable Browserbase’s automatic CAPTCHA solving
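
One way to turn these on is to pass Browserbase session-creation settings through Stagehand. The snippet below is a sketch: it assumes Stagehand's browserbaseSessionCreateParams option and the proxies, advancedStealth, and solveCaptchas fields from the Browserbase session API (Advanced Stealth may be plan-gated), so check the Browserbase docs for the exact fields available to you.

// Sketch: stealth-oriented session settings, assuming Stagehand's
// browserbaseSessionCreateParams option and the Browserbase session fields shown.
const stealthStagehand = new Stagehand({
    env: "BROWSERBASE",
    browserbaseSessionCreateParams: {
        projectId: process.env.BROWSERBASE_PROJECT_ID!,
        proxies: true, // rotate requests through Browserbase-managed proxies
        browserSettings: {
            advancedStealth: true, // harder-to-detect browser fingerprint
            solveCaptchas: true,   // automatic CAPTCHA solving
        },
    },
});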

Next Steps

Now that you understand the basics of web scraping with Browserbase, explore features like Advanced Stealth, proxies, and session replays to take your scrapers further.
