POST /web/crawl
Example request (JavaScript)
import ContextDev from 'context.dev';

const client = new ContextDev({
  apiKey: process.env['CONTEXT_DEV_API_KEY'], // This is the default and can be omitted
});

const response = await client.web.webCrawlMd({ url: 'https://example.com' });

console.log(response.metadata);

Example response
{
  "results": [
    {
      "markdown": "<string>",
      "metadata": {
        "url": "<string>",
        "title": "<string>",
        "crawlDepth": 123,
        "statusCode": 123,
        "success": true
      }
    }
  ],
  "metadata": {
    "numUrls": 123,
    "maxCrawlDepth": 123,
    "numSucceeded": 123,
    "numFailed": 123,
    "numSkipped": 123
  }
}
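A sketch of reading this response: the hypothetical `summarizeCrawl` helper below (not part of the context.dev SDK) splits `results` into succeeded and failed pages using each entry's `metadata.success`, and passes the crawl-level `metadata` through unchanged. The sample object is made-up data in the documented shape.

```javascript
// Hypothetical helper (not part of the context.dev SDK): tallies per-page
// outcomes from a crawl response shaped like the example response.
function summarizeCrawl(response) {
  const succeeded = [];
  const failed = [];
  for (const page of response.results) {
    const { url, statusCode, success } = page.metadata;
    if (success) {
      succeeded.push(url);
    } else {
      failed.push({ url, statusCode });
    }
  }
  // Crawl-level counters are returned as reported by the API.
  return { succeeded, failed, reported: response.metadata };
}

// Made-up sample data in the documented response shape.
const sample = {
  results: [
    { markdown: "# Home", metadata: { url: "https://example.com/", title: "Home", crawlDepth: 0, statusCode: 200, success: true } },
    { markdown: "", metadata: { url: "https://example.com/missing", title: "", crawlDepth: 1, statusCode: 404, success: false } },
  ],
  metadata: { numUrls: 2, maxCrawlDepth: 1, numSucceeded: 1, numFailed: 1, numSkipped: 0 },
};

const summary = summarizeCrawl(sample);
console.log(summary.succeeded);
console.log(summary.failed);
```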


Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
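Without the SDK, the header can be set by hand. This is a minimal sketch for `fetch`: `buildCrawlRequest` is a hypothetical helper, and `BASE_URL` in the usage comment is a placeholder, since the API host is not stated on this page.

```javascript
// Hypothetical helper: builds fetch() options for POST /web/crawl.
// The API host is left to the caller because it is not given here.
function buildCrawlRequest(token, body) {
  return {
    method: "POST",
    headers: {
      // Bearer scheme, as described under Authorizations.
      Authorization: `Bearer ${token}`,
      // The request body is JSON.
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

const init = buildCrawlRequest("YOUR_TOKEN", { url: "https://example.com" });
// Usage (not executed here; BASE_URL is a placeholder):
// const res = await fetch(`${BASE_URL}/web/crawl`, init);
console.log(init.headers.Authorization);
```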

Body (application/json)
url (string<uri>, required)

The starting URL for the crawl (must include http:// or https:// protocol)
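A quick client-side pre-check for the scheme requirement, using the WHATWG `URL` parser built into Node.js and browsers. `hasHttpScheme` is a hypothetical name; the server performs its own validation.

```javascript
// Hypothetical pre-check: the crawl `url` must use the http or https scheme.
function hasHttpScheme(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    // new URL() throws on strings without a scheme, e.g. "example.com".
    return false;
  }
}

console.log(hasHttpScheme("https://example.com")); // true
console.log(hasHttpScheme("example.com"));         // false
console.log(hasHttpScheme("ftp://example.com"));   // false
```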

maxPages (integer, default: 100)

Maximum number of pages to crawl. Hard cap: 500.

Required range: 1 <= x <= 500

maxDepth (integer)

Maximum link depth from the starting URL (0 = only the starting page)

Required range: x >= 0
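To make the interaction of maxPages and maxDepth concrete, here is a conceptual model: a breadth-first walk over a made-up, in-memory link graph. Breadth-first order, and treating an omitted maxDepth as unbounded, are assumptions of this sketch; this page does not document the crawler's actual traversal order.

```javascript
// Conceptual model only (hypothetical crawlOrder): breadth-first traversal
// that stops at maxPages pages and does not follow links beyond maxDepth.
function crawlOrder(graph, startUrl, { maxPages = 100, maxDepth = Infinity } = {}) {
  const visited = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const pages = [];
  while (queue.length > 0 && pages.length < maxPages) {
    const { url, depth } = queue.shift();
    pages.push({ url, depth });
    // Links on a page at maxDepth are not followed (maxDepth 0 = start page only).
    if (depth >= maxDepth) continue;
    for (const next of graph[url] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return pages;
}

// Made-up link graph: page -> pages it links to.
const links = {
  "/": ["/a", "/b"],
  "/a": ["/a/1"],
  "/b": ["/"],
};

console.log(crawlOrder(links, "/", { maxDepth: 0 }).map((p) => p.url)); // [ '/' ]
console.log(crawlOrder(links, "/", { maxDepth: 1 }).map((p) => p.url)); // [ '/', '/a', '/b' ]
console.log(crawlOrder(links, "/", { maxPages: 2 }).map((p) => p.url)); // [ '/', '/a' ]
```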

urlRegex (string)

Regex pattern. Only URLs matching this pattern will be followed and scraped.

Example:

"^https?://[^/]+/blog/"

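Applying the example pattern to a few candidate URLs shows which ones would be followed. This assumes the pattern behaves like a JavaScript `RegExp`; the regex dialect the service uses is not specified here.

```javascript
// The documented example pattern, compiled as a JavaScript RegExp (assumption).
const urlRegex = new RegExp("^https?://[^/]+/blog/");

const candidates = [
  "https://example.com/blog/first-post", // matches
  "http://www.example.com/blog/",        // matches
  "https://example.com/about",           // no: path does not start with /blog/
  "https://example.com/news/blog/",      // no: /blog/ is not the first path segment
];

const followed = candidates.filter((u) => urlRegex.test(u));
console.log(followed);
```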
Preserve hyperlinks in the Markdown output

includeImages (boolean, default: false)

Include image references in the Markdown output

shortenBase64Images (boolean, default: true)

Truncate base64-encoded image data in the Markdown output

useMainContentOnly (boolean, default: false)

Extract only the main content, stripping headers, footers, sidebars, and navigation

followSubdomains (boolean, default: false)

When true, follow links on subdomains of the starting URL's domain (e.g. docs.example.com when starting from example.com). The www and apex (bare) domains are always treated as equivalent.
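A sketch of the scope rule as described: www and apex hosts compare as equal, and other subdomains are in scope only when followSubdomains is true. `inScope` and `stripWww` are hypothetical helpers, and the service's real matching logic may differ in edge cases.

```javascript
// Hypothetical helper: treat "www.example.com" and "example.com" as the same host.
function stripWww(host) {
  return host.startsWith("www.") ? host.slice(4) : host;
}

// Hypothetical scope check modeling the followSubdomains description.
function inScope(startUrl, candidateUrl, followSubdomains = false) {
  const base = stripWww(new URL(startUrl).hostname);
  const host = stripWww(new URL(candidateUrl).hostname);
  if (host === base) return true;
  // Require a dot boundary so "notexample.com" is not treated as a subdomain.
  return followSubdomains && host.endsWith(`.${base}`);
}

console.log(inScope("https://example.com", "https://www.example.com/x"));       // true
console.log(inScope("https://example.com", "https://docs.example.com/"));       // false
console.log(inScope("https://example.com", "https://docs.example.com/", true)); // true
console.log(inScope("https://example.com", "https://notexample.com/", true));   // false
```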

pdf (object)

PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.
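For illustration, a request body that limits PDF extraction to pages 2 through 4, plus a hypothetical `pagesSelected` helper that expands the inclusive, 1-based range. It assumes both start and end are supplied; behavior when either is omitted is not described here.

```javascript
// Hypothetical helper: lists the pages covered by an inclusive, 1-based range.
function pagesSelected({ start, end }) {
  const pages = [];
  for (let p = start; p <= end; p++) pages.push(p);
  return pages;
}

// Request body limiting text extraction and OCR to pages 2..4 of each PDF.
const requestBody = { url: "https://example.com", pdf: { start: 2, end: 4 } };
console.log(pagesSelected(requestBody.pdf)); // [ 2, 3, 4 ]
```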

includeFrames (boolean, default: false)

When true, the contents of iframes are rendered to Markdown for each crawled page.

maxAgeMs (integer, default: 86400000)

Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.

Required range: 0 <= x <= 2592000000
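The cache rule can be modeled as a single comparison. `canUseCache` and its timestamps are hypothetical; the sketch only restates the documented rule (reuse a prior result when it is younger than maxAgeMs) and checks the day arithmetic behind the default and maximum.

```javascript
// 1 day in milliseconds: the documented default for maxAgeMs.
const DAY_MS = 24 * 60 * 60 * 1000; // 86400000

// Hypothetical model: reuse a cached result only if it is younger than maxAgeMs.
// With maxAgeMs = 0 the comparison can never succeed, so every scrape is fresh.
function canUseCache(cachedAt, now, maxAgeMs = DAY_MS) {
  return now - cachedAt < maxAgeMs;
}

const now = Date.now();
console.log(canUseCache(now - 2 * 60 * 60 * 1000, now)); // true (2 hours old)
console.log(canUseCache(now - 2 * DAY_MS, now));         // false (2 days old)
console.log(canUseCache(now - 1000, now, 0));            // false (always scrape fresh)
console.log(30 * DAY_MS);                                // 2592000000, the documented maximum
```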

waitForMs (integer)

Optional time, in milliseconds, that the browser waits after the initial page load of each crawled page. Min: 0. Max: 30000 (30 seconds).

Required range: 0 <= x <= 30000

stopAfterMs (integer, default: 120000)

Soft time budget for the crawl in milliseconds. After each scrape, the crawler checks the elapsed time and, if exceeded, returns the pages collected so far instead of continuing. Min: 10000 (10s). Max: 240000 (4 min). Default: 120000 (2 min).

Required range: 10000 <= x <= 240000
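Because the budget is checked only after each scrape, a crawl can run past stopAfterMs by up to one page's scrape time. A small simulation with made-up durations shows this; `simulateSoftBudget` is hypothetical, and whether the service compares with > or >= is an assumption.

```javascript
// Hypothetical simulation of a soft time budget: elapsed time is checked
// after each scrape, so the page that crosses the budget is still kept.
function simulateSoftBudget(scrapeDurationsMs, stopAfterMs = 120000) {
  let elapsed = 0;
  let pages = 0;
  for (const duration of scrapeDurationsMs) {
    elapsed += duration;
    pages += 1;
    if (elapsed > stopAfterMs) break; // budget exceeded: return pages so far
  }
  return { pages, elapsed };
}

// Made-up scrape durations in milliseconds.
console.log(simulateSoftBudget([30000, 50000, 60000, 10000]));
// { pages: 3, elapsed: 140000 }
```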

timeoutMS (integer)

Optional timeout for the request, in milliseconds. If the request takes longer than this, it is aborted with a 408 status code. Min: 1000 (1s). Max: 300000 (5 min).

Required range: 1000 <= x <= 300000
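Before sending a request, the documented ranges can be checked client-side. `rangeErrors` is a hypothetical helper and the server remains the authority on validation; a check like this also cannot prevent the 408 returned when a request runs longer than timeoutMS.

```javascript
// Documented integer ranges for the numeric body parameters (inclusive).
const RANGES = {
  maxPages: [1, 500],
  maxDepth: [0, Infinity],
  maxAgeMs: [0, 2592000000],
  waitForMs: [0, 30000],
  stopAfterMs: [10000, 240000],
  timeoutMS: [1000, 300000],
};

// Hypothetical helper: returns one message per out-of-range parameter.
function rangeErrors(params) {
  const errors = [];
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    if (params[name] === undefined) continue; // omitted: left to the server
    const value = params[name];
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${name} must be an integer in [${min}, ${max}]`);
    }
  }
  return errors;
}

console.log(rangeErrors({ url: "https://example.com", maxPages: 50, timeoutMS: 60000 })); // []
console.log(rangeErrors({ maxPages: 1000, waitForMs: 45000 }));
```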

Response

Successful response

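results (object[], required)

Per-page results, each with the page's markdown and its metadata (url, title, crawlDepth, statusCode, success).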
metadata (object, required)

Crawl-level summary with numUrls, maxCrawlDepth, numSucceeded, numFailed, and numSkipped.