Firecrawl
API + dashboard
Firecrawl is my go-to when I need to turn external websites into structured context for agents, knowledge bases, or research pipelines. Give it a URL (or a list), and it crawls, scrapes, cleans, and returns JSON/Markdown you can feed straight into your stack. (firecrawl.dev)
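To make the flow concrete, here's a minimal sketch of a single-page scrape against the REST API. The /v1/scrape path, Bearer auth, and the formats/data fields reflect my reading of the current API, and the FIRECRAWL_API_KEY env var is just a convention here, so verify the details against the docs before relying on them.

```python
# Minimal sketch: scrape one URL via Firecrawl's REST API and get Markdown back.
# Endpoint path and payload follow the v1 API as I understand it; check the
# current reference at firecrawl.dev for exact field names.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]  # assumed env var name

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com", "formats": ["markdown"]},
    timeout=60,
)
resp.raise_for_status()
markdown = resp.json()["data"]["markdown"]  # cleaned, LLM-ready Markdown
print(markdown[:500])
```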
Why it's useful
- LLM-friendly output. Firecrawl strips nav junk, normalizes content, and can return JSON, Markdown, or chunked text so you can drop it directly into vector stores or agent prompts.
- Depth & rate controls. Define how deep to crawl, throttle requests, and avoid hammering a site while still grabbing everything you need (the crawl sketch after this list shows the relevant options).
- Webhooks + streaming. Kick off crawls asynchronously, get notified when they're done, or stream chunks as they arrive for real-time processing.
- Integrations. Use the hosted dashboard for ad-hoc jobs, or plug the REST API or SDKs into automations and pipelines (Zapier, LangChain, etc.).
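As referenced above, here's a hedged sketch of kicking off a depth-limited crawl with a completion webhook. The maxDepth, limit, webhook, and scrapeOptions fields match the v1 crawl API as I understand it; the receiver URL is hypothetical.

```python
# Sketch of an async crawl with depth/volume limits and a completion webhook.
# Option names (maxDepth, limit, webhook, scrapeOptions) are my reading of the
# v1 crawl API; treat them as assumptions and verify against the docs.
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://docs.example.com",
        "maxDepth": 2,    # don't follow links more than two hops deep
        "limit": 100,     # cap total pages so we don't hammer the site
        "webhook": "https://yourapp.example.com/firecrawl-done",  # hypothetical receiver
        "scrapeOptions": {"formats": ["markdown"]},
    },
    timeout=60,
)
resp.raise_for_status()
job_id = resp.json()["id"]  # keep this to poll status later
print("crawl started:", job_id)
```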
Common plays
- Agent context. Run Firecrawl on docs/help centers so your agents have up-to-date knowledge.
- Competitive research. Crawl product sites or blogs and feed summaries into Notion/Slack for weekly updates.
- Dataset creation. Harvest structured data from directories/docs to train or fine-tune downstream systems (see the chunking sketch after this list).
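For the agent-context and dataset plays, the usual post-processing step is chunking. Here's a rough sketch, assuming crawl results arrive as objects with markdown and metadata.sourceURL fields (the shape I've seen from the v1 API; verify for your version). The chunker itself is plain Python.

```python
# Illustrative sketch: turn crawled Markdown pages into fixed-size, overlapping
# chunks ready for embedding. `pages` mirrors the shape of a completed crawl's
# `data` array (an assumption); swap in real results from the API.
def chunk_markdown(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so context survives chunk borders."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

pages = [  # stand-in for crawl results
    {"metadata": {"sourceURL": "https://docs.example.com/intro"}, "markdown": "…page text…"},
]

records = [
    {"source": page["metadata"]["sourceURL"], "text": chunk}
    for page in pages
    for chunk in chunk_markdown(page["markdown"])
]
# `records` can now be embedded and upserted into whatever vector DB you use.
```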
Getting started
- Create an account at firecrawl.dev and grab an API key.
- Use the dashboard for manual jobs, or call the API/SDK (crawl, scrape, status, etc.) inside your scripts; the start-then-poll sketch after this list shows the pattern.
- Configure depth, include/exclude patterns, output formats, and destinations (S3, webhooks, vector DB).
- Pair with Exa, Expo Router content, or agent frameworks to keep your AI workflows fuelled with fresh, structured web data.
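And the start-then-poll pattern mentioned above, sketched against the v1 endpoints. The GET /v1/crawl/{id} status route and the status/data fields are assumptions from my reading of the API; adjust to the current reference.

```python
# Sketch of polling an async crawl until it completes. Route and field names
# are my reading of the v1 API; double-check the reference docs.
import os
import time
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def wait_for_crawl(job_id: str, poll_seconds: float = 5.0) -> list[dict]:
    """Poll a crawl job until it finishes, then return the scraped pages."""
    while True:
        resp = requests.get(
            f"https://api.firecrawl.dev/v1/crawl/{job_id}",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        if body["status"] == "completed":
            return body["data"]  # list of page objects (markdown + metadata)
        if body["status"] == "failed":
            raise RuntimeError(f"crawl {job_id} failed")
        time.sleep(poll_seconds)

# Usage: pages = wait_for_crawl(job_id), with the id returned when the crawl started.
```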