Integrate Context.dev's Extract API in your app
Prerequisites
- A Context.dev API key. Sign up at context.dev/signup, copy the key from the dashboard (prefix ctxt_secret_), and export it as an environment variable.
- An SDK (optional). Install the SDK for your language, or skip the install and call the API directly with curl.
Extract data
You describe the result you want as a single JSON Schema. Property names become the keys of the response’s data object, and each property’s description tells the model what to look for.
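As an illustration, a schema for a company profile might look like the Python dict below. The property names (company_name, founded_year, headquarters_city) are hypothetical and chosen only to mirror the use cases on this page; the structure is plain JSON Schema.

```python
import json

# Hypothetical schema: each property name becomes a key of the response's
# data object, and each description tells the model what to look for.
company_schema = {
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string",
            "description": "Legal or trading name of the company",
        },
        "founded_year": {
            "type": "integer",
            "description": "Four-digit year the company was founded",
        },
        "headquarters_city": {
            "type": "string",
            "description": "City where the company is headquartered",
        },
    },
}

# The schema travels as JSON, so it has to serialize cleanly.
schema_json = json.dumps(company_schema, indent=2)
print(schema_json)
```

Writing specific descriptions ("Four-digit year" rather than "year") is a cheap way to steer the model toward the format you expect.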
Request parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. Starting URL to crawl. Must include http:// or https://. |
| schema | object | Required. A JSON Schema describing the structure of the data you want back. Add descriptions to properties to tell the model what to look for. |
| instructions | string | Plain-language guidance for the crawl and extraction (max 2000 chars), e.g. "Focus on the pricing page." |
| factCheck | boolean | When true, only values stated on the crawled pages are returned. When false (default), the model may make reasonable inferences. |
| followSubdomains | boolean | Follow links to subdomains of the starting domain. Default false. |
| maxPages | integer | Number of pages to analyze, 1–50. Default 5. |
| maxDepth | integer | Maximum link depth from the starting URL. Unlimited by default. |
| pdf | object | PDF handling: shouldParse (default true), plus start / end to limit parsing to a 1-based page range. |
| includeFrames | boolean | Include iframe contents in extraction. Default false. |
| maxAgeMs | integer | Maximum age of cached page content that may be served, 0–2592000000 ms (30 days). Default 604800000 (7 days). |
| waitForMs | integer | Extra browser wait after page load, in milliseconds. |
| stopAfterMs | integer | Soft time budget for the crawl, 10000–110000 ms. Default 80000. |
| timeoutMS | integer | Abort the request with a 408 if it exceeds this many milliseconds. Range 1000–300000 (5 min max). |
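
Checking these limits on the client can save a round trip that would otherwise end in a 400. The sketch below assembles a request body and rejects values outside the ranges in the table. The helper name build_extract_payload is hypothetical and not part of any SDK, and the code deliberately stops short of sending the request, since the endpoint URL is not given on this page.

```python
from typing import Any

# Numeric ranges copied from the parameter table above.
LIMITS = {
    "maxPages": (1, 50),
    "maxAgeMs": (0, 2_592_000_000),
    "stopAfterMs": (10_000, 110_000),
    "timeoutMS": (1_000, 300_000),
}
MAX_INSTRUCTIONS_CHARS = 2000


def build_extract_payload(url: str, schema: dict[str, Any], **options: Any) -> dict[str, Any]:
    """Assemble a request body, raising ValueError for out-of-range values."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include http:// or https://")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema must be a non-empty JSON Schema object")

    instructions = options.get("instructions")
    if instructions is not None and len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError(f"instructions must be at most {MAX_INSTRUCTIONS_CHARS} characters")

    for name, (low, high) in LIMITS.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    # Leave unset options out so the API applies its documented defaults.
    payload: dict[str, Any] = {"url": url, "schema": schema}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_extract_payload(
    "https://example.com",
    {"type": "object", "properties": {"name": {"type": "string"}}},
    maxPages=10,
    factCheck=True,
)
print(sorted(payload))
```

The resulting dict can be serialized with json.dumps and sent with whichever HTTP client or SDK you use.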
Use your schema library
Because schema is standard JSON Schema, you don’t have to write it by hand: generate it from the schema library you already use, and validate response.data with the same model when the response comes back. Nested structures such as arrays of objects need no extra work:
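To show the idea without any third-party dependency, the sketch below uses standard-library dataclasses as a stand-in for a schema library. The to_json_schema helper and the OpenRole and Careers models are hypothetical; a real schema library would replace the helper with its own JSON Schema export, and would validate types more strictly than a dataclass constructor does.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints

PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_json_schema(tp: Any) -> dict[str, Any]:
    """Convert a primitive, list[...] or dataclass type into JSON Schema."""
    if tp in PRIMITIVES:
        return {"type": PRIMITIVES[tp]}
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": to_json_schema(item_type)}
    if is_dataclass(tp):
        hints = get_type_hints(tp)
        properties = {}
        for f in fields(tp):
            prop = to_json_schema(hints[f.name])
            # Field metadata carries the description the model will read.
            description = f.metadata.get("description")
            if description:
                prop["description"] = description
            properties[f.name] = prop
        return {"type": "object", "properties": properties}
    raise TypeError(f"unsupported type: {tp!r}")


@dataclass
class OpenRole:
    title: str = field(metadata={"description": "Job title as posted"})
    location: str = field(metadata={"description": "Office location, or Remote"})
    team: str = field(metadata={"description": "Team or department hiring for the role"})


@dataclass
class Careers:
    open_roles: list[OpenRole] = field(
        metadata={"description": "Every open role listed on the careers page"}
    )


schema = to_json_schema(Careers)

# On the way back, load the data object into the same model.
# sample_data is made up for illustration; it is not real API output.
sample_data = {"open_roles": [{"title": "Engineer", "location": "Remote", "team": "Platform"}]}
careers = Careers(open_roles=[OpenRole(**role) for role in sample_data["open_roles"]])
print(schema["properties"]["open_roles"]["items"]["properties"].keys())
```

Keeping one model for both directions means a renamed field fails loudly in one place instead of silently producing a missing key.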
Understand the response
A successful call returns the starting URL, the URLs the crawler actually used, your data in the shape of your schema, and crawl statistics:
| Field | Type | Description |
|---|---|---|
| status | string | "ok" on success. |
| url | string | The starting URL that was analyzed. |
| urls_analyzed | string[] | Every URL the crawler actually used to produce the answer. |
| data | object | The extracted data. Matches the schema you sent. |
| metadata.numUrls | integer | Total URLs attempted during the crawl. |
| metadata.maxCrawlDepth | integer | Deepest link depth reached. |
| metadata.numSucceeded | integer | Pages fetched and analyzed successfully. |
| metadata.numFailed | integer | Pages that failed to fetch. |
| metadata.numSkipped | integer | Pages skipped as irrelevant to the schema. |
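
The crawl statistics are worth logging next to the extracted data, because a high failure count can explain a sparse result. The sketch below reads a response shaped like the table above. The sample_response values are invented for illustration and are not real API output, and summarize_crawl is a hypothetical helper.

```python
from typing import Any

# Hypothetical response built only from the fields in the table above.
sample_response = {
    "status": "ok",
    "url": "https://example.com",
    "urls_analyzed": ["https://example.com", "https://example.com/about"],
    "data": {"company_name": "Example Inc.", "founded_year": 2015},
    "metadata": {
        "numUrls": 5,
        "maxCrawlDepth": 1,
        "numSucceeded": 2,
        "numFailed": 1,
        "numSkipped": 2,
    },
}


def summarize_crawl(response: dict[str, Any]) -> str:
    """Return a one-line crawl summary, or raise if the call did not succeed."""
    if response.get("status") != "ok":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    meta = response["metadata"]
    return (
        f"{len(response['urls_analyzed'])} URLs used; "
        f"{meta['numSucceeded']} succeeded, {meta['numFailed']} failed, "
        f"{meta['numSkipped']} skipped of {meta['numUrls']} attempted"
    )


print(summarize_crawl(sample_response))
# prints: 2 URLs used; 2 succeeded, 1 failed, 2 skipped of 5 attempted
```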
Failed requests carry an error_code. Common ones: INPUT_VALIDATION_ERROR (bad URL or schema) and WEBSITE_ACCESS_ERROR on 400, UNAUTHORIZED on 401 (missing or invalid API key), REQUEST_TIMEOUT on 408, RATE_LIMITED on 429, and INTERNAL_ERROR on 500.
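One way to act on these codes is to separate the ones that may succeed on a second attempt from the ones that need a fix on your side. The sketch below is a client-side policy, not documented API behavior: treating REQUEST_TIMEOUT, RATE_LIMITED, and INTERNAL_ERROR as retryable is an assumption, and should_retry and backoff_seconds are hypothetical helpers.

```python
# HTTP status for each error code, as listed above.
ERROR_STATUS = {
    "INPUT_VALIDATION_ERROR": 400,
    "WEBSITE_ACCESS_ERROR": 400,
    "UNAUTHORIZED": 401,
    "REQUEST_TIMEOUT": 408,
    "RATE_LIMITED": 429,
    "INTERNAL_ERROR": 500,
}

# Assumption: these three are transient; the rest need a corrected request or key.
RETRYABLE = {"REQUEST_TIMEOUT", "RATE_LIMITED", "INTERNAL_ERROR"}


def should_retry(error_code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only transient errors, and only while attempts remain."""
    return error_code in RETRYABLE and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at cap seconds."""
    return min(cap, base * 2 ** (attempt - 1))


for attempt in (1, 2, 3):
    print(attempt, should_retry("RATE_LIMITED", attempt), backoff_seconds(attempt))
```

Retrying INPUT_VALIDATION_ERROR or UNAUTHORIZED would only repeat the same failure, so those are surfaced immediately.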
Use cases
- Lead enrichment: extract founded_year, employee_count, headquarters_city, etc. from a company’s site to enrich CRM records.
- Hiring signal tracking: extract an array of open roles (title, location, team) starting from a careers page for sourcing pipelines or competitor monitoring.
- Compliance snapshots: extract a structured summary of privacy policy or terms clauses on a schedule with factCheck: true, then diff against the last run.
- Investor relations data: extract revenue, ARR, headcount, or funding figures as typed numbers; PDF parsing picks up IR decks and annual reports automatically.
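For the compliance snapshot case, the diff step can be as small as comparing two data objects key by key. The sketch below assumes flat snapshots. The field names and values are invented for illustration, and diff_snapshots is a hypothetical helper.

```python
from typing import Any


def diff_snapshots(previous: dict[str, Any], current: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old, new)} for every top-level key whose value changed.

    A key missing from one side is reported with None on that side, so a
    missing key and an explicit None are treated alike.
    """
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Made-up snapshots from two scheduled runs.
last_run = {"data_retention_days": 90, "sells_personal_data": False}
this_run = {
    "data_retention_days": 30,
    "sells_personal_data": False,
    "dpo_contact": "privacy@example.com",
}

changes = diff_snapshots(last_run, this_run)
print(changes)
# prints: {'data_retention_days': (90, 30), 'dpo_contact': (None, 'privacy@example.com')}
```

Running with factCheck: true matters here, since an inferred value that changes between runs would show up as a false difference.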
Next steps
- Scrape Websites: Get clean Markdown, HTML, or sitemap URLs from any page.
- Extract Products: Typed product data (SKU, price, images) from any storefront.
- Best Practices: Caching, error handling, and key hygiene.
- Troubleshooting: Status codes, retry patterns, and common errors.