Extract Structured Data from Web Pages

Overview

POST /web/extract crawls a starting URL, converts pages to Markdown, and returns a data object that matches the JSON Schema you provide. Use it when you need typed facts from a site instead of raw HTML or Markdown.

/web/extract API reference

See the full request body, response shape, limits, and pricing.

/brand/ai/query is deprecated in favor of /web/extract. It remains live for the foreseeable future, so existing integrations do not need to migrate immediately, but new structured extraction work should use /web/extract.

How It Works

Send a JSON Schema-compliant schema in the schema field. The endpoint uses that schema as the contract for the returned data object. Good extraction schemas should:

Use an object as the root schema
Add clear description text to fields that need interpretation
Use additionalProperties: false when you want a predictable output shape
Allow null for uncertain facts instead of omitting keys
Keep arrays and nested objects explicit

Code Examples

For TypeScript, install the SDK and the zod package:

npm install context.dev zod

Use a local schema model, pass the generated or written JSON Schema as schema, then validate the returned data with the same local model.

import ContextDev from "context.dev";
import { z } from "zod";

const CaseStudySchema = z
  .object({
    title: z.string().describe("Case study title."),
    url: z.string().url().describe("Absolute URL for the case study."),
    customer: z.string().nullable().describe("Customer name if listed."),
  })
  .strict();

const WebsiteProfileSchema = z
  .object({
    mission_statement: z
      .string()
      .nullable()
      .describe("The company's stated mission, if one is listed."),
    value_props: z
      .array(z.string())
      .describe("Main product or service value propositions."),
    case_studies: z
      .array(CaseStudySchema)
      .describe("Published customer stories or case studies."),
  })
  .strict();

const jsonSchema = z.toJSONSchema(WebsiteProfileSchema);

async function extractWebsiteProfile(url: string) {
  const client = new ContextDev({
    apiKey: process.env.CONTEXT_DEV_API_KEY,
  });

  const response = await client.web.extract({
    url,
    schema: jsonSchema,
    instructions:
      "Extract facts from the homepage, about page, and customer story pages when available.",
    factCheck: true,
    maxAgeMs: 86_400_000,
  });

  return WebsiteProfileSchema.parse(response.data);
}

extractWebsiteProfile("https://example.com")
  .then((profile) => console.log(profile.case_studies))
  .catch(console.error);

Request Options

Use instructions to clarify what to prioritize or how to interpret a field. Keep it short and specific; the schema should still carry the shape of the output. Common options:

Option	Use it for
`factCheck`	Require returned values to be grounded in page content.
`maxAgeMs`	Reuse recent cached scrape results for repeat requests.
`waitForMs`	Wait for client-rendered content before extraction.
`stopAfterMs`	Set a soft crawl time budget.
`followSubdomains`	Include pages on subdomains of the starting URL.
`includeFrames`	Include iframe content in the Markdown used for extraction.
`pdf`	Parse or bound PDF extraction with `shouldParse`, `start`, and `end`.

Schema Tips

Use descriptions for fields that could be interpreted multiple ways:

{
  "type": "object",
  "properties": {
    "ideal_customer": {
      "type": ["string", "null"],
      "description": "The customer segment explicitly described on the site, not an inferred buyer persona."
    }
  },
  "required": ["ideal_customer"],
  "additionalProperties": false
}

For inferred fields such as recommendations, market positioning, or likely competitors, set factCheck to false. For names, URLs, dates, quotes, and metrics, keep factCheck enabled.

Extract Structured Website Data

Full API reference for /web/extract

Scrape Markdown

Retrieve Markdown directly when you do not need structured output

​Overview

/web/extract API reference

​How It Works

​Code Examples

​Request Options

​Schema Tips

​Related Resources

Extract Structured Website Data

Scrape Markdown

Overview

How It Works

Code Examples

Request Options

Schema Tips

Related Resources