Skip to main content

Overview

POST /web/extract crawls a starting URL, converts pages to Markdown, and returns a data object that matches the JSON Schema you provide. Use it when you need typed facts from a site instead of raw HTML or Markdown.

/web/extract API reference

See the full request body, response shape, limits, and pricing.
/brand/ai/query is deprecated in favor of /web/extract. It remains live for the foreseeable future, so existing integrations do not need to migrate immediately, but new structured extraction work should use /web/extract.

How It Works

Send a JSON Schema-compliant schema in the schema field. The endpoint uses that schema as the contract for the returned data object. Good extraction schemas should:
  • Use an object as the root schema
  • Add clear description text to fields that need interpretation
  • Use additionalProperties: false when you want a predictable output shape
  • Allow null for uncertain facts instead of omitting keys
  • Keep arrays and nested objects explicit

Code Examples

For TypeScript, install the SDK and the zod package:
npm install context.dev zod
Use a local schema model, pass the generated or written JSON Schema as schema, then validate the returned data with the same local model.
import ContextDev from "context.dev";
import { z } from "zod";

const CaseStudySchema = z
  .object({
    title: z.string().describe("Case study title."),
    url: z.string().url().describe("Absolute URL for the case study."),
    customer: z.string().nullable().describe("Customer name if listed."),
  })
  .strict();

const WebsiteProfileSchema = z
  .object({
    mission_statement: z
      .string()
      .nullable()
      .describe("The company's stated mission, if one is listed."),
    value_props: z
      .array(z.string())
      .describe("Main product or service value propositions."),
    case_studies: z
      .array(CaseStudySchema)
      .describe("Published customer stories or case studies."),
  })
  .strict();

const jsonSchema = z.toJSONSchema(WebsiteProfileSchema);

async function extractWebsiteProfile(url: string) {
  const client = new ContextDev({
    apiKey: process.env.CONTEXT_DEV_API_KEY,
  });

  const response = await client.web.extract({
    url,
    schema: jsonSchema,
    instructions:
      "Extract facts from the homepage, about page, and customer story pages when available.",
    factCheck: true,
    maxAgeMs: 86_400_000,
  });

  return WebsiteProfileSchema.parse(response.data);
}

extractWebsiteProfile("https://example.com")
  .then((profile) => console.log(profile.case_studies))
  .catch(console.error);

Request Options

Use instructions to clarify what to prioritize or how to interpret a field. Keep it short and specific; the schema should still carry the shape of the output. Common options:
OptionUse it for
factCheckRequire returned values to be grounded in page content.
maxAgeMsReuse recent cached scrape results for repeat requests.
waitForMsWait for client-rendered content before extraction.
stopAfterMsSet a soft crawl time budget.
followSubdomainsInclude pages on subdomains of the starting URL.
includeFramesInclude iframe content in the Markdown used for extraction.
pdfParse or bound PDF extraction with shouldParse, start, and end.

Schema Tips

Use descriptions for fields that could be interpreted multiple ways:
{
  "type": "object",
  "properties": {
    "ideal_customer": {
      "type": ["string", "null"],
      "description": "The customer segment explicitly described on the site, not an inferred buyer persona."
    }
  },
  "required": ["ideal_customer"],
  "additionalProperties": false
}
For inferred fields such as recommendations, market positioning, or likely competitors, set factCheck to false. For names, URLs, dates, quotes, and metrics, keep factCheck enabled.

Extract Structured Website Data

Full API reference for /web/extract

Scrape Markdown

Retrieve Markdown directly when you do not need structured output