Haystack
Fetch and extract data from complex web pages
Reliably fetch data from web pages that rely on JavaScript or use anti-bot mechanisms.
1. Get your API Key
Go to the Settings tab of the Dashboard.
Then copy your API Key from the input field and set it as the BROWSERBASE_KEY
environment variable.
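Before building a pipeline, it can help to confirm that the key is actually visible to your Python process. The following is a minimal sketch, not part of the Browserbase package: the helper name `get_browserbase_key` is hypothetical, and the only thing taken from this guide is the `BROWSERBASE_KEY` variable name.

```python
import os


def get_browserbase_key(env=os.environ):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; `env` defaults to the real
    process environment and can be swapped for a plain dict in tests.
    """
    key = env.get("BROWSERBASE_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BROWSERBASE_KEY is not set; copy it from the Dashboard's Settings tab."
        )
    return key
```

Calling `get_browserbase_key()` at startup fails early with a readable message instead of surfacing an authentication error in the middle of a pipeline run.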
2. Install the Browserbase Haystack integration package
pip install browserbase-haystack
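To check that the install landed in the interpreter you intend to use, you can look the module up without importing it. This is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module name `browserbase_haystack` is the one imported in the next step.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("browserbase_haystack"):
    print("browserbase_haystack not found; run: pip install browserbase-haystack")
```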
3. Import and configure BrowserbaseFetcher
Python
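from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from browserbase_haystack import BrowserbaseFetcher

# Prompt template; {{ documents }} is filled with the fetched pages.
prompt_template = (
    "Tell me the titles of the given pages. Pages: {{ documents }}"
)
prompt_builder = PromptBuilder(template=prompt_template)

# OpenAIGenerator reads the OPENAI_API_KEY environment variable by default.
llm = OpenAIGenerator()
# The fetcher relies on the BROWSERBASE_KEY environment variable from step 1.
browserbase_fetcher = BrowserbaseFetcher()

# Register the three components under the names used in the connections below.
pipe = Pipeline()
pipe.add_component("fetcher", browserbase_fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

# Data flow: fetcher -> prompt_builder -> llm.
pipe.connect("fetcher.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

# Fetch the page(s) and run the whole pipeline.
result = pipe.run(data={"fetcher": {"urls": ["https://example.com"]}})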
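The pipeline returns a dictionary keyed by component name, and Haystack's OpenAIGenerator places its text outputs under "replies". The sketch below shows one way to read the first reply defensively; the helper name `first_reply` is hypothetical, and the sample dictionary only mimics the shape of a real result so the snippet runs without network access.

```python
def first_reply(result, component="llm"):
    """Return the first reply of a generator component, or None if absent.

    Hypothetical helper: `result` is the dict returned by `pipe.run(...)`,
    `component` is the name passed to `pipe.add_component(...)`.
    """
    replies = result.get(component, {}).get("replies") or []
    return replies[0] if replies else None


# Stand-in for a real pipeline result, used here for illustration only.
sample_result = {"llm": {"replies": ["Example Domain"], "meta": [{}]}}
print(first_reply(sample_result))
```

With the pipeline from step 3, `first_reply(result)` would return the model's answer about the page titles, or `None` if the generator produced nothing.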