What You'll Build

An AI-powered web scraper that extracts product data from e-commerce sites and stores it in MongoDB with automatic schema validation and data analysis.

Before You Start

Ensure you have these requirements ready:
  1. Node.js and npm installed
  2. MongoDB running locally, or a MongoDB Atlas connection string
  3. A Browserbase API key and project ID
  4. An Anthropic API key for Stagehand's AI extraction

Step 1: Project Setup

Clone and Install

# Clone the integration template
npx degit browserbase/integrations/examples/integrations/mongodb browserbase-mongodb
cd browserbase-mongodb

# Install all dependencies
npm install

# Install browser binaries
npx playwright install

Step 2: Start MongoDB

MongoDB must be reachable before you continue. If you are using a local MongoDB installation, make sure the server is running:

# Start MongoDB (if installed locally)
mongod

If you are using MongoDB Atlas, skip this step; the database is already hosted in the cloud, so just have your connection string ready.
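
To confirm the connection before running the scraper, a quick ping with the official Node driver works for both local and Atlas deployments. This is an illustrative standalone script, not part of the template:

// check-mongo.ts: quick connectivity check (illustrative, not part of the template)
import { MongoClient } from "mongodb";

const uri = process.env.MONGO_URI ?? "mongodb://localhost:27017";

async function main() {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    // "ping" is a lightweight admin command that only confirms the server is reachable
    await client.db("admin").command({ ping: 1 });
    console.log(`MongoDB is reachable at ${uri}`);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Could not reach MongoDB:", err);
  process.exit(1);
});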

Step 3: Configuration

Environment Variables

Create your .env file with the required configuration:
# Browserbase Configuration (Recommended)
BROWSERBASE_API_KEY=your_browserbase_api_key
BROWSERBASE_PROJECT_ID=your_project_id

# MongoDB Configuration
MONGO_URI=mongodb://localhost:27017
DB_NAME=scraper_db

# Stagehand Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here
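
Before the first run, it can help to fail fast on a missing variable. A minimal sketch, assuming the dotenv package is available (adjust to however the template actually loads its environment):

// Fail fast if any required environment variable is missing (illustrative sketch)
import "dotenv/config";

const required = [
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
  "MONGO_URI",
  "DB_NAME",
  "ANTHROPIC_API_KEY",
];

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}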

Step 4: Configure Stagehand

The integration is configured to use Browserbase cloud browsers:
stagehand.config.ts
import type { ConstructorParams } from "@browserbasehq/stagehand";
import "dotenv/config"; // load the .env file created in Step 3

const StagehandConfig: ConstructorParams = {
  verbose: 1,
  domSettleTimeoutMs: 30_000,

  // LLM Configuration
  modelName: "claude-3-7-sonnet-20250219",
  modelClientOptions: {
    apiKey: process.env.ANTHROPIC_API_KEY,
  },

  // Run in Browserbase cloud (Recommended)
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID!,
    browserSettings: {
      blockAds: true,
      viewport: { width: 1024, height: 768 },
    },
  },
};

export default StagehandConfig;
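
For reference, this is roughly how such a config is consumed at runtime. The Stagehand constructor, init(), page, and close() calls are the standard Stagehand API; the import path assumes stagehand.config.ts exports StagehandConfig as its default export, as shown above:

import { Stagehand } from "@browserbasehq/stagehand";
import StagehandConfig from "./stagehand.config";

const stagehand = new Stagehand(StagehandConfig);
await stagehand.init();       // creates the Browserbase session
const page = stagehand.page;  // Playwright-compatible page with AI extract/act/observe
// ... navigate and extract here ...
await stagehand.close();      // ends the cloud session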

Step 5: Run Your First Scrape

What happens when you run the scraper:
  1. Connects to MongoDB and creates necessary collections
  2. Navigates to Amazon laptop category
  3. Scrapes product listings with AI-powered extraction
  4. Extracts detailed information for the first 3 products
  5. Stores all data in MongoDB with schema validation
  6. Runs analysis queries and displays results

Execute the Scraper

npm start
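
For orientation, the run boils down to something like the sketch below. Collection and file names are illustrative rather than the template's exact code:

import { MongoClient } from "mongodb";
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
import StagehandConfig from "./stagehand.config";

const ProductSchema = z.object({
  title: z.string(),
  price: z.string().optional(),
  rating: z.string().optional(),
});

async function main() {
  // 1. Connect to MongoDB
  const client = new MongoClient(process.env.MONGO_URI!);
  await client.connect();
  const products = client.db(process.env.DB_NAME).collection("products");

  // 2-4. Open a cloud browser, navigate, and extract listings with AI
  const stagehand = new Stagehand(StagehandConfig);
  await stagehand.init();
  const page = stagehand.page;
  await page.goto("https://www.amazon.com/s?k=laptop");
  const { items } = await page.extract({
    instruction: "Extract the product listings on this page",
    schema: z.object({ items: z.array(ProductSchema) }),
  });

  // 5. Store the Zod-validated results
  if (items.length > 0) await products.insertMany(items);

  // 6. A simple analysis query
  const total = await products.countDocuments();
  console.log(`Stored ${items.length} products; collection now holds ${total}.`);

  await stagehand.close();
  await client.close();
}

main().catch(console.error);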

Customization Options

Extend Data Schema

Add custom fields to capture more product information:
import { z } from "zod";

const ProductSchema = z.object({
  // Existing fields...
  title: z.string(),
  price: z.string().optional(),
  rating: z.string().optional(),

  // Add your custom fields
  brand: z.string().optional(),
  availability: z.string().optional(),
  shippingInfo: z.string().optional(),
  specifications: z.array(z.string()).optional(),
  customerReviews: z.number().optional(),
});
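
If extraction quality for the new fields is inconsistent, plain Zod .describe() hints give the model extra context per field. This is standard Zod rather than anything template-specific, for example:

import { z } from "zod";

const ProductSchema = z.object({
  title: z.string().describe("Full product title as shown in the listing"),
  price: z.string().optional().describe("Displayed price, including currency symbol"),
  brand: z.string().optional().describe("Manufacturer or brand name"),
  availability: z.string().optional().describe("Stock status text, e.g. 'In Stock'"),
  customerReviews: z.number().optional().describe("Number of customer reviews"),
});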

Custom Extraction Instructions

Modify the AI extraction to capture specific data:
const data = await page.extract({
  instruction: `
    Extract comprehensive product information including:
    - Brand and model details
    - Detailed specifications
    - Availability and shipping information
    - Customer ratings and review counts
  `,
  schema: ProductSchema,
});
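
Whatever fields you add flow straight through to MongoDB, because the extracted object's keys become the stored document's fields. Continuing from the extract call above (the collection name and timestamp field are illustrative):

import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGO_URI!);
await client.connect();

await client
  .db(process.env.DB_NAME)
  .collection("products")
  .insertOne({ ...data, scrapedAt: new Date() }); // a timestamp helps later analysis queries

await client.close();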

What’s Next?

Now that you have a working MongoDB + Stagehand integration, try extending the product schema, tailoring the extraction instructions, or pointing the scraper at other categories and sites.

Need help? Join the Stagehand Slack community for support and to share your scraping projects!