Tom Osman

Firecrawl

API + dashboard

Firecrawl homepage (live preview captured from firecrawl.dev)

Firecrawl is my go-to when I need to turn external websites into structured context for agents, knowledge bases, or research pipelines. Give it a URL (or a list), and it crawls, scrapes, cleans, and returns JSON/Markdown you can feed straight into your stack. (firecrawl.dev)
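
To make that concrete, here's a minimal single-page scrape against the hosted REST API. This is a sketch: the endpoint and field names follow the v1 API as I understand it, so double-check them against the current docs.

    import requests

    API_KEY = "fc-..."  # your Firecrawl API key (placeholder)

    # Scrape one page and get LLM-ready Markdown back.
    resp = requests.post(
        "https://api.firecrawl.dev/v1/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": "https://docs.example.com", "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()
    markdown = resp.json()["data"]["markdown"]
    print(markdown[:500])  # cleaned page content, ready for a prompt or vector store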

Why it's useful

  • LLM-friendly output. Firecrawl strips nav junk, normalizes content, and can return JSON, Markdown, or chunked text so you can drop it directly into vector stores or agent prompts.
  • Depth & rate controls. Define how deep to crawl, throttle requests, and avoid hammering a site while still grabbing everything you need (see the crawl sketch after this list).
  • Webhooks + streaming. Kick off crawls asynchronously, get notified when they're done, or stream chunks as they arrive for real-time processing.
  • Integrations. Use the hosted dashboard for ad-hoc jobs, or plug the REST API or SDKs into automations and pipelines (Zapier, LangChain, etc.).
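
Here's the crawl sketch mentioned above: kicking off an asynchronous crawl with depth and scope limits plus a webhook for progress notifications. The field names (maxDepth, limit, includePaths, excludePaths, webhook) are my reading of the v1 API, and the webhook URL is a placeholder.

    import requests

    API_KEY = "fc-..."  # placeholder

    # Start an async crawl with depth/scope limits; Firecrawl returns a job id
    # immediately and calls the webhook as the crawl progresses.
    resp = requests.post(
        "https://api.firecrawl.dev/v1/crawl",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": "https://docs.example.com",
            "maxDepth": 2,                 # how many links deep to follow
            "limit": 100,                  # cap on total pages scraped
            "includePaths": ["docs/.*"],   # only crawl matching routes
            "excludePaths": ["docs/v1/.*"],
            "scrapeOptions": {"formats": ["markdown"]},
            "webhook": "https://example.com/hooks/firecrawl",
        },
        timeout=60,
    )
    resp.raise_for_status()
    job_id = resp.json()["id"]  # keep this to poll status later
    print("crawl started:", job_id)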

Common plays

  1. Agent context. Run Firecrawl on docs/help centers so your agents have up-to-date knowledge.
  2. Competitive research. Crawl product sites or blogs and feed summaries into Notion/Slack for weekly updates.
  3. Dataset creation. Harvest structured data from directories/docs to train or fine-tune downstream systems (collection sketched below).
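
For plays 1 and 3, the collection step can be as simple as polling the crawl job and dumping each page's Markdown to disk as a corpus for chunking and embedding. A rough sketch, again assuming the v1 status endpoint (large crawls paginate their results, which this skips):

    import pathlib
    import time

    import requests

    API_KEY = "fc-..."      # placeholder
    JOB_ID = "your-job-id"  # returned when the crawl was started

    headers = {"Authorization": f"Bearer {API_KEY}"}

    # Poll until the crawl finishes.
    while True:
        status = requests.get(
            f"https://api.firecrawl.dev/v1/crawl/{JOB_ID}",
            headers=headers,
            timeout=30,
        ).json()
        if status.get("status") == "completed":
            break
        time.sleep(5)

    # Write each page out as Markdown -- a simple corpus to chunk and embed.
    out = pathlib.Path("corpus")
    out.mkdir(exist_ok=True)
    for i, page in enumerate(status.get("data", [])):
        (out / f"page_{i:04d}.md").write_text(page.get("markdown", ""), encoding="utf-8")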

Getting started

  • Create an account at firecrawl.dev and grab an API key.
  • Use the dashboard for manual jobs, or call the API/SDK (crawl, scrape, status, etc.) inside your scripts; an SDK sketch follows this list.
  • Configure depth, include/exclude patterns, output formats, and destinations (S3, webhooks, vector DB).
  • Pair with Exa, Expo Router content, or agent frameworks to keep your AI workflows fuelled with fresh, structured web data.
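
If you'd rather not hand-roll HTTP, the official Python SDK (firecrawl-py) wraps the same operations. A minimal sketch assuming the v1-style interface; method and argument names have shifted between SDK releases, so match this to the version you install.

    from firecrawl import FirecrawlApp  # pip install firecrawl-py

    app = FirecrawlApp(api_key="fc-...")  # key from the dashboard

    # One-off scrape: returns the page as cleaned, LLM-ready data.
    page = app.scrape_url("https://docs.example.com", params={"formats": ["markdown"]})

    # Full crawl: polls until complete and returns the collected pages.
    result = app.crawl_url(
        "https://docs.example.com",
        params={"limit": 50, "scrapeOptions": {"formats": ["markdown"]}},
    )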