Speed up your automation with parallelization
Leverage multiple browser instances in parallel with a simple task queue.
Some complex automations need to process many pages and actions without the total run time growing with every page added. For example, a backup automation tool must access and back up hundreds of pages reliably and efficiently.
Using multiple pages or contexts for parallelism will result in issues with Browserbase’s persistent context. Instead, we recommend leveraging multiple browser instances as a pool to process actions on numerous pages in parallel.
Enable parallelism with a pool of Browser instances
Let’s walk through the code of our example script processing hundreds of Wikipedia pages efficiently:
The processBrowserbaseTasks() utility
The `processBrowserbaseTasks()` utility creates 5 Browserbase Sessions. Each session pulls tasks from a shared queue and reuses its page between tasks:
import { Page, chromium } from "playwright-core";

export async function processBrowserbaseTasks<R>(
  tasks: ((page: Page) => Promise<R>)[],
): Promise<R[]> {
  // Copy the task list so the caller's array is not mutated.
  const tasksQueue = tasks.slice();
  // Results are collected in completion order, not in task order.
  const resultsQueue: R[] = [];

  const createBrowserSession = async (browserWSEndpoint: string) => {
    const browser = await chromium.connectOverCDP(browserWSEndpoint);
    // Use the page that already exists in the session's default context.
    const context = browser.contexts()[0];
    const page = context.pages()[0];

    // Keep pulling tasks from the shared queue until it is empty.
    while (tasksQueue.length > 0) {
      const task = tasksQueue.shift();
      if (task) {
        const result = await task(page);
        resultsQueue.push(result);
      }
    }

    await page.close();
    await browser.close();
  };

  const browserWSEndpoint = `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}&enableProxy=true`;

  // Start 5 sessions that drain the queue concurrently.
  const sessions = Array.from({ length: 5 }, () =>
    createBrowserSession(browserWSEndpoint),
  );
  await Promise.all(sessions);

  return resultsQueue;
}
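The utility above returns results in completion order, and a single failing task rejects the whole run. If that matters for your use case, one possible variation is sketched below. It is not part of the example repository: `runPool`, `Task`, and `Outcome` are hypothetical names, and the generic `worker` stands in for a Playwright `Page` so the queueing logic can be read (and run) without a browser.

```typescript
// A task receives a worker (for example a Playwright Page) and resolves to a result.
type Task<W, R> = (worker: W) => Promise<R>;

// Per-task outcome, so one failure does not hide the other results.
type Outcome<R> =
  | { status: "fulfilled"; value: R }
  | { status: "rejected"; reason: unknown };

async function runPool<W, R>(
  workers: W[],
  tasks: Task<W, R>[],
): Promise<Outcome<R>[]> {
  const outcomes: Outcome<R>[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task, shared by all workers

  const drain = async (worker: W): Promise<void> => {
    // Claiming an index happens synchronously, so two workers never take the same task.
    while (next < tasks.length) {
      const index = next++;
      try {
        const value = await tasks[index](worker);
        // Store by index so the output order matches the input order.
        outcomes[index] = { status: "fulfilled", value };
      } catch (reason) {
        // Record the failure and keep draining the queue.
        outcomes[index] = { status: "rejected", reason };
      }
    }
  };

  // Each worker drains the shared queue until no tasks are left.
  await Promise.all(workers.map((worker) => drain(worker)));
  return outcomes;
}
```

With Playwright, the `workers` array would hold one page per connected browser. Closing each browser in a `finally` block after `runPool` resolves helps ensure sessions are released even when tasks fail.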
Creating a task to fetch each Wikipedia page
Let’s create a function that will be called with a `page` for each Wikipedia URL and return the `url`, `content` tuple:
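const tasks = loadUrlsFromFile("wikipedia_urls.txt").map(
  (url) => async (page: Page) => {
    console.log(`Processing ${url}...`);
    await page.goto(url);
    // Full HTML of the loaded page.
    const content = await page.content();
    // `as const` keeps the return type a [url, content] tuple rather than string[].
    return [url, content] as const;
  },
);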
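The snippet above calls `loadUrlsFromFile()` without showing it. A minimal sketch of what such a helper might look like follows, assuming `wikipedia_urls.txt` is a UTF-8 text file with one URL per line. Both the file format and the `parseUrlList` name are assumptions for illustration, not taken from the example repository.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical: split raw text into trimmed, non-empty lines (one URL per line).
function parseUrlList(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical helper matching the call in the guide: read the file, then parse it.
function loadUrlsFromFile(path: string): string[] {
  return parseUrlList(readFileSync(path, "utf8"));
}
```

Reading the file synchronously keeps the helper simple, which is reasonable here because it runs once at startup before any browser work begins.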
Passing the tasks to `processBrowserbaseTasks()` and printing the results
const result = await processBrowserbaseTasks(tasks);
// Print each URL with the first 200 characters of its HTML.
result.forEach(([url, content]) => {
  console.log(url, content.substring(0, 200) + "...");
});
Run the automation
Let’s now run our Wikipedia automation; you will notice that up to 5 pages are processed at a time:
$ BROWSERBASE_PROJECT_ID=xxxxxxxxx BROWSERBASE_API_KEY=xxxxxxxxx node dist/index.js
Processing https://en.wikipedia.org/wiki/Patrick_Flynn_(hurler)...
Processing https://en.wikipedia.org/wiki/Environmental_radioactivity...
Processing https://en.wikipedia.org/wiki/Alexi_Ogando...
Processing https://en.wikipedia.org/wiki/Costantino_Maria_Attilio_Barneschi...
Processing https://en.wikipedia.org/wiki/Breaking_bulk...
Processing https://en.wikipedia.org/wiki/New_Hampshire_Route_122...
Processing https://en.wikipedia.org/wiki/David_Hoff...
Processing https://en.wikipedia.org/wiki/Neodesha,_Oklahoma...
Processing https://en.wikipedia.org/wiki/List_of_Bethel_Threshers_head_football_coaches...
Processing https://en.wikipedia.org/wiki/Thysanodonta_boucheti...
Processing https://en.wikipedia.org/wiki/Sturm_und_Drang_(play)...
Processing https://en.wikipedia.org/wiki/Maša_Kolanović...
Processing https://en.wikipedia.org/wiki/Hermitage_of_Sant'Onofrio,_Serramonacesca...
Processing https://en.wikipedia.org/wiki/Bill_Simpson_(racing_driver)...
Processing https://en.wikipedia.org/wiki/Dundee,_Oregon...
Processing https://en.wikipedia.org/wiki/Caragh_McMurtry...
Processing https://en.wikipedia.org/wiki/Palmar_metacarpal_veins...
Processing https://en.wikipedia.org/wiki/2000_Uzbek_presidential_election...
Find the complete example on GitHub
Clone this GitHub repo to get started with a Playwright parallelism setup