Firecrawl
API + dashboard
Firecrawl is my go-to when I need to turn external websites into structured context for agents, knowledge bases, or research pipelines. Give it a URL (or a list), and it crawls, scrapes, cleans, and returns JSON/Markdown you can feed straight into your stack. (firecrawl.dev)
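To make the flow concrete, here's a minimal sketch of a single-page scrape against the REST API. The /v1/scrape path, Bearer auth, and the formats/data fields reflect my reading of the current API, and the FIRECRAWL_API_KEY env var is just a convention here, so verify the details against the docs before relying on them.

```python
# Minimal sketch: scrape one URL via Firecrawl's REST API and get Markdown back.
# Endpoint path and payload follow the v1 API as I understand it; check the
# current reference at firecrawl.dev for exact field names.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]  # assumed env var name

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com", "formats": ["markdown"]},
    timeout=60,
)
resp.raise_for_status()
markdown = resp.json()["data"]["markdown"]  # cleaned, LLM-ready Markdown
print(markdown[:500])
```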
Why it's useful
- LLM-friendly output. Firecrawl strips nav junk, normalizes content, and can return JSON, Markdown, or chunked text so you can drop it directly into vector stores or agent prompts.
- Depth & rate controls. Define how deep to crawl, throttle requests, and avoid hammering a site while still grabbing everything you need (the crawl sketch after this list shows the relevant options).
- Webhooks + streaming. Kick off crawls asynchronously, get notified when they're done, or stream chunks as they arrive for real-time processing.
- Integrations. Use the hosted dashboard for ad-hoc jobs, or plug the REST API or SDKs into automations and pipelines (Zapier, LangChain, etc.).
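As referenced above, here's a hedged sketch of kicking off a depth-limited crawl with a completion webhook. The maxDepth, limit, webhook, and scrapeOptions fields match the v1 crawl API as I understand it; the receiver URL is hypothetical.

```python
# Sketch of an async crawl with depth/volume limits and a completion webhook.
# Option names (maxDepth, limit, webhook, scrapeOptions) are my reading of the
# v1 crawl API; treat them as assumptions and verify against the docs.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://docs.example.com",
        "maxDepth": 2,    # don't follow links more than two hops deep
        "limit": 100,     # cap total pages so we don't hammer the site
        "webhook": "https://yourapp.example.com/firecrawl-done",  # hypothetical receiver
        "scrapeOptions": {"formats": ["markdown"]},
    },
    timeout=60,
)
resp.raise_for_status()
job_id = resp.json()["id"]  # keep this to poll status later
print("crawl started:", job_id)
```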
Common plays
- Agent context. Run Firecrawl on docs/help centers so your agents have up-to-date knowledge.
- Competitive research. Crawl product sites or blogs and feed summaries into Notion/Slack for weekly updates.
- Dataset creation. Harvest structured data from directories/docs to train or fine-tune downstream systems (see the chunking sketch after this list).
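For the agent-context and dataset plays, the usual post-processing step is chunking. Here's a rough sketch, assuming crawl results arrive as objects with markdown and metadata.sourceURL fields (the shape I've seen from the v1 API; verify for your version). The chunker itself is plain Python.

```python
# Illustrative sketch: turn crawled Markdown pages into fixed-size, overlapping
# chunks ready for embedding. `pages` mirrors the shape of a completed crawl's
# `data` array (an assumption); swap in real results from the API.
def chunk_markdown(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so context survives chunk borders."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

pages = [  # stand-in for crawl results
    {"metadata": {"sourceURL": "https://docs.example.com/intro"}, "markdown": "…page text…"},
]

records = [
    {"source": page["metadata"]["sourceURL"], "text": chunk}
    for page in pages
    for chunk in chunk_markdown(page["markdown"])
]
# `records` can now be embedded and upserted into whatever vector DB you use.
```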
Getting started
- Create an account at firecrawl.dev and grab an API key.
- Use the dashboard for manual jobs, or call the API/SDK (crawl, scrape, status, etc.) inside your scripts; the start-then-poll sketch after this list shows the pattern.
- Configure depth, include/exclude patterns, output formats, and destinations (S3, webhooks, vector DB).
- Pair with Exa, Expo Router content, or agent frameworks to keep your AI workflows fuelled with fresh, structured web data.
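And the start-then-poll pattern mentioned above, sketched against the v1 endpoints. The GET /v1/crawl/{id} status route and the status/data fields are assumptions from my reading of the API; adjust to the current reference.

```python
# Sketch of polling an async crawl until it completes. Route and field names
# are my reading of the v1 API; double-check the reference docs.
import os
import time
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def wait_for_crawl(job_id: str, poll_seconds: float = 5.0) -> list[dict]:
    """Poll a crawl job until it finishes, then return the scraped pages."""
    while True:
        resp = requests.get(
            f"https://api.firecrawl.dev/v1/crawl/{job_id}",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        if body["status"] == "completed":
            return body["data"]  # list of page objects (markdown + metadata)
        if body["status"] == "failed":
            raise RuntimeError(f"crawl {job_id} failed")
        time.sleep(poll_seconds)

# Usage: pages = wait_for_crawl(job_id), with the id returned when the crawl started.
```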