Overview
POST /web/extract crawls a starting URL, converts pages to Markdown, and returns a data object that matches the JSON Schema you provide. Use it when you need typed facts from a site instead of raw HTML or Markdown.
/web/extract API reference: See the full request body, response shape, limits, and pricing.
/brand/ai/query is deprecated in favor of /web/extract. It remains live
for the foreseeable future, so existing integrations do not need to migrate
immediately, but new structured extraction work should use /web/extract.
How It Works
Send a JSON Schema-compliant schema in the schema field. The endpoint uses that schema as the contract for the returned data object.
Good extraction schemas should:
- Use an object as the root schema
- Add clear description text to fields that need interpretation
- Use additionalProperties: false when you want a predictable output shape
- Allow null for uncertain facts instead of omitting keys
- Keep arrays and nested objects explicit
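The checklist above can be sketched as a single schema object. This is a minimal illustration, not a schema from the API reference: the company fields (name, foundedYear, offices) are hypothetical, and listing every key under required is one possible way to get null values back instead of missing keys.

```typescript
// Hypothetical schema for facts from a company site; field names are illustrative only.
const companySchema = {
  type: "object", // the root schema is an object
  additionalProperties: false, // keep the output shape predictable
  required: ["name", "foundedYear", "offices"], // every key is expected, even when its value is null
  properties: {
    name: {
      type: "string",
      description: "Company name as written on the site, without legal suffixes.",
    },
    foundedYear: {
      type: ["integer", "null"], // null is allowed when the fact is uncertain
      description: "Four-digit founding year; null if the site does not state it.",
    },
    offices: {
      type: "array", // the array and its item shape are spelled out explicitly
      description: "Physical office locations listed on the site.",
      items: {
        type: "object",
        additionalProperties: false,
        required: ["city", "country"],
        properties: {
          city: { type: "string" },
          country: {
            type: ["string", "null"],
            description: "Country name; null if only the city is given.",
          },
        },
      },
    },
  },
};
```

An object like this would be sent as the value of the schema field.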
Code Examples
For TypeScript, install the SDK and the zod package.
Define the shape once as a local model, send it as schema, then validate the returned data with the same local model.
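Validating the returned data against a local model might look like the sketch below. It is written without any library so it runs standalone; in a real TypeScript project a zod schema and its safeParse method would typically play this role. The CompanyFacts model and its two fields are hypothetical.

```typescript
// Hypothetical local model mirroring a two-field extraction schema.
interface CompanyFacts {
  name: string;
  foundedYear: number | null;
}

type ValidationResult =
  | { ok: true; value: CompanyFacts }
  | { ok: false; errors: string[] };

// Narrow an unknown `data` value to CompanyFacts, or collect the reasons it does not fit.
function validateCompanyFacts(data: unknown): ValidationResult {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["data must be an object"] };
  }
  const record = data as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof record.name !== "string") {
    errors.push("name must be a string");
  }
  const year = record.foundedYear;
  if (!(year === null || (typeof year === "number" && Number.isInteger(year)))) {
    // A missing key is rejected as well: null is allowed, omission is not.
    errors.push("foundedYear must be an integer or null");
  }
  // Mirror additionalProperties: false by rejecting unknown keys.
  for (const key of Object.keys(record)) {
    if (key !== "name" && key !== "foundedYear") {
      errors.push(`unexpected key: ${key}`);
    }
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, value: { name: record.name as string, foundedYear: year as number | null } };
}
```

Returning a result object rather than throwing lets the caller decide whether a mismatch should retry the request, log the errors, or fail.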
Request Options
Use instructions to clarify what to prioritize or how to interpret a field. Keep it short and specific; the schema should still carry the shape of the output.
Common options:
| Option | Use it for |
|---|---|
| factCheck | Require returned values to be grounded in page content. |
| maxAgeMs | Reuse recent cached scrape results for repeat requests. |
| waitForMs | Wait for client-rendered content before extraction. |
| stopAfterMs | Set a soft crawl time budget. |
| followSubdomains | Include pages on subdomains of the starting URL. |
| includeFrames | Include iframe content in the Markdown used for extraction. |
| pdf | Parse or bound PDF extraction with shouldParse, start, and end. |
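As a rough sketch, the options can be assembled into a request body like this. The option names come from the table; the url key for the starting URL, the value types (booleans, millisecond numbers, numeric start and end for pdf), and the buildExtractBody helper are all assumptions, so check the API reference for the actual request shape.

```typescript
// Option names are taken from the table above; the value types are assumed.
interface ExtractOptions {
  instructions?: string;
  factCheck?: boolean;
  maxAgeMs?: number;
  waitForMs?: number;
  stopAfterMs?: number;
  followSubdomains?: boolean;
  includeFrames?: boolean;
  pdf?: { shouldParse?: boolean; start?: number; end?: number };
}

// Hypothetical helper: merge the starting URL, the schema, and any options the caller set.
function buildExtractBody(
  url: string, // assumed key name for the starting URL
  schema: Record<string, unknown>,
  options: ExtractOptions = {},
): Record<string, unknown> {
  const body: Record<string, unknown> = { url, schema };
  for (const key of Object.keys(options) as (keyof ExtractOptions)[]) {
    const value = options[key];
    // Skip unset options so the body carries only what was chosen explicitly.
    if (value !== undefined) {
      body[key] = value;
    }
  }
  return body;
}

const exampleBody = buildExtractBody(
  "https://example.com",
  { type: "object", additionalProperties: false, properties: {} },
  {
    instructions: "Prefer facts from the About page.",
    factCheck: true,
    waitForMs: 2000,
    pdf: { shouldParse: true, start: 1, end: 5 },
  },
);

// Serialized form, ready to be sent as the JSON body of POST /web/extract.
const examplePayload = JSON.stringify(exampleBody);
```

The host and authentication scheme are not covered in this section, so the sketch stops at the serialized payload rather than issuing the request.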
Schema Tips
Use descriptions for fields that could be interpreted multiple ways.
With factCheck set to false, returned values are no longer required to be grounded in page content. For names, URLs, dates, quotes, and metrics, keep factCheck enabled.
Related Resources
- Extract Structured Website Data: Full API reference for /web/extract
- Scrape Markdown: Retrieve Markdown directly when you do not need structured output