Skip to main content
POST
/
web
/
extract
JavaScript
import ContextDev from 'context.dev';

const client = new ContextDev({
  apiKey: process.env['CONTEXT_DEV_API_KEY'], // This is the default and can be omitted
});

const response = await client.web.extract({
  schema: {
    type: 'bar',
    properties: 'bar',
    required: 'bar',
    additionalProperties: 'bar',
  },
  url: 'https://example.com',
});

console.log(response.data);
{
  "status": "<string>",
  "url": "<string>",
  "urls_analyzed": [
    "<string>"
  ],
  "data": {},
  "metadata": {
    "numUrls": 123,
    "maxCrawlDepth": 123,
    "numSucceeded": 123,
    "numFailed": 123,
    "numSkipped": 123
  }
}
10 Credits

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string<uri>
required

The starting website URL to crawl and extract from. Must include http:// or https://.

schema
object
required

JSON Schema for the returned data object. TypeScript Zod users can pass a JSON Schema generated from a Zod object; Python users can pass the equivalent JSON Schema object.

Example:
{
"type": "object",
"properties": {
"mission_statement": {
"type": "string",
"description": "The company's stated mission."
},
"case_studies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": { "type": "string" },
"url": { "type": "string" }
},
"required": ["title", "url"],
"additionalProperties": false
}
}
},
"required": ["mission_statement", "case_studies"],
"additionalProperties": false
}
instructions
string

Optional extraction guidance, such as which facts to prioritize or how to interpret fields in the schema.

Maximum string length: 2000
factCheck
boolean
default:false

When true, every returned value must be grounded in facts stated on the page; fields that cannot be supported by the page are returned as null/empty. When false (default), the model may make reasonable inferences and derivations from the page content (e.g. ideal customer, competitor analysis, recommendations) while keeping verifiable specifics (names, quotes, URLs, dates, metrics) faithful to the source.

followSubdomains
boolean
default:false

When true, follow links on subdomains of the starting URL's domain.

pdf
object
includeFrames
boolean
default:false

When true, iframe contents are included in Markdown before extraction.

maxAgeMs
integer
default:604800000

Return cached scrape results if a prior scrape for the same parameters is younger than this many milliseconds. Defaults to 7 days (604800000 ms).

Required range: 0 <= x <= 2592000000
waitForMs
integer

Optional browser wait time in milliseconds after initial page load for each crawled page.

Required range: 0 <= x <= 30000
stopAfterMs
integer
default:120000

Soft time budget for the crawl in milliseconds.

Required range: 10000 <= x <= 240000
timeoutMS
integer

Optional timeout in milliseconds for the request. If the request takes longer than this value, it will be aborted with a 408 status code. Maximum allowed value is 300000ms (5 minutes).

Required range: 1000 <= x <= 300000

Response

Successful response

status
string
required

Status of the response, e.g., 'ok'

url
string
required

The starting URL that was analyzed

urls_analyzed
string[]
required

List of URLs whose Markdown was used for extraction

data
object
required

Extracted data matching the request schema

metadata
object
required