GET /web/scrape/markdown scrapes any URL into LLM-ready GitHub Flavored Markdown. Bot protection and geo-blocks are handled by automatic proxy escalation; pass useMainContentOnly: true to drop nav, footer, sidebars, and other chrome.
1 credit per call. The connection stays open while the page is fetched and converted, so there’s no need to poll. Repeated calls for the same URL within maxAgeMs return the cached scrape.
Required. Full URL to scrape. Must include http:// or https://.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| includeLinks | boolean | true | Preserve hyperlinks in the Markdown output. |
| includeImages | boolean | false | Include image references in the Markdown output. |
| shortenBase64Images | boolean | true | Truncate base64-encoded image data so it doesn’t dominate the response. |
| useMainContentOnly | boolean | false | Strip headers, footers, sidebars, and navigation, keeping only the main content. |
| includeFrames | boolean | false | When true, the contents of iframes are rendered to Markdown. |
| includeSelectors | string[] | none | CSS selectors. When provided, only matching HTML subtrees (and their descendants) are kept before conversion to Markdown. Examples: article.main, #content, [role=main]. |
| excludeSelectors | string[] | none | CSS selectors to remove before conversion to Markdown. Applied after includeSelectors; exclusion takes precedence. Examples: nav, footer, .ad-banner. |
| pdf | object | { shouldParse: true } | PDF-page controls: shouldParse, start, end (1-based inclusive range). Set shouldParse: false to skip PDFs. |
| maxAgeMs | integer | 86400000 (24h) | Return a cached scrape if one exists younger than this. 0 forces a fresh scrape. Max is 30 days. |
| waitForMs | integer | none | Browser wait time after initial load (max 30000). Use when the page needs JS time to populate. |
| headers | object | none | Outbound HTTP headers forwarded to the target URL, sent as deep-object query params (e.g. headers[X-Custom]=value). When provided, caching is bypassed entirely. |
| timeoutMS | integer | none | Abort with a 408 if the request exceeds this many milliseconds. Min 1000, max 300000 (5 min). |
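The parameters above all travel in the query string, so a small helper can validate the documented limits and serialize them in one place. The following is a minimal sketch, not the official client: the host `https://api.example.com` and the query parameter name `url` are placeholders, repeated keys for array values and the deep-object form for `pdf` are assumptions (the source only documents deep-object encoding for `headers`), and the lower bound of 0 for `waitForMs` is assumed. Authentication is left out because the source does not describe it.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one
THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000


def _scalar(value):
    """Render booleans as true/false and everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def _check_range(options, name, low, high):
    """Raise if a numeric option is present and outside [low, high]."""
    value = options.get(name)
    if value is not None and not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")


def build_scrape_url(url, **options):
    """Build a GET /web/scrape/markdown URL from keyword options."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include http:// or https://")
    _check_range(options, "waitForMs", 0, 30_000)       # max from the docs, min assumed
    _check_range(options, "timeoutMS", 1_000, 300_000)  # min and max from the docs
    _check_range(options, "maxAgeMs", 0, THIRTY_DAYS_MS)

    pairs = [("url", url)]  # assumption: the target is passed as `url`
    for name, value in options.items():
        if isinstance(value, dict):
            # Deep-object form: headers[X-Custom]=value (assumed for pdf as well).
            pairs.extend((f"{name}[{k}]", _scalar(v)) for k, v in value.items())
        elif isinstance(value, (list, tuple)):
            # Assumption: arrays are sent as repeated keys.
            pairs.extend((name, _scalar(item)) for item in value)
        else:
            pairs.append((name, _scalar(value)))
    return f"{BASE_URL}/web/scrape/markdown?{urlencode(pairs)}"


if __name__ == "__main__":
    print(build_scrape_url(
        "https://example.com",
        useMainContentOnly=True,
        excludeSelectors=["nav", "footer"],
        maxAgeMs=0,  # force a fresh scrape
    ))
```

Validating on the client side avoids spending a round trip on a request that the documented limits would reject anyway.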
{ "success": true, "url": "https://example.com", "markdown": "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)"}
| Field | Type | Description |
| --- | --- | --- |
| success | boolean | true when the scrape completed. |
| url | string | The URL that was scraped. |
| markdown | string | The page rendered as GitHub Flavored Markdown. By default the full page is converted; pass useMainContentOnly: true to strip nav, footer, sidebars, and other chrome. |
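Reading the response comes down to checking `success` before touching `markdown`. The sketch below assumes the body arrives as a JSON string with the three fields described above; the shape of a failed response is not documented here, so the error branch is a guess and only reports what it can see.

```python
import json


def extract_markdown(body: str) -> str:
    """Return the markdown field of a scrape response, or raise on failure."""
    payload = json.loads(body)
    if payload.get("success") is not True:
        # Assumption: failures still return JSON; the exact error shape is unknown.
        raise RuntimeError(f"scrape failed for {payload.get('url')!r}")
    return payload["markdown"]


if __name__ == "__main__":
    sample = json.dumps({
        "success": True,
        "url": "https://example.com",
        "markdown": "# Example Domain\n\n[Learn more](https://iana.org/domains/example)",
    })
    print(extract_markdown(sample).splitlines()[0])
```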
POST /web/crawl takes a seed URL and returns an array of scraped pages in one call. That’s exactly the shape you want for seeding a RAG index or building a knowledge base.
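To illustrate the RAG-seeding use case, here is a rough sketch of turning crawled pages into index-sized chunks. The crawl response shape is not shown in this section, so the code assumes each page is a dict with `url` and `markdown` keys, mirroring the single-page scrape response; `chunk_pages` and `max_chars` are hypothetical names.

```python
def chunk_pages(pages, max_chars=1000):
    """Split each page's markdown on blank lines and pack paragraphs into chunks.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks = []
    for page in pages:
        current = ""
        for para in page["markdown"].split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                # Adding this paragraph would overflow: close the current chunk.
                chunks.append({"url": page["url"], "text": current})
                current = para
            else:
                current = candidate
        if current:
            chunks.append({"url": page["url"], "text": current})
    return chunks


if __name__ == "__main__":
    pages = [{"url": "https://example.com", "markdown": "# Title\n\nFirst.\n\nSecond."}]
    for chunk in chunk_pages(pages, max_chars=16):
        print(chunk["url"], repr(chunk["text"]))
```

Keeping the source `url` on every chunk lets a retrieval step cite the page an answer came from.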
GET /web/scrape/sitemap reads sitemap.xml from a domain root, follows any nested sitemap indexes, and returns a de-duplicated URL list without rendering any of the pages. Use it for cheap coverage of large sites or to feed a downstream scraper with a curated list.
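As a sketch of the "curated list" idea, the helper below narrows a sitemap URL list to one section of a site before handing it to a scraper. It assumes the URLs have already been pulled out of the response into a plain list of strings, since the response format is not shown here; `curate` and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def curate(urls, path_prefix, limit=None):
    """Keep URLs whose path starts with path_prefix, preserving order."""
    picked = [u for u in urls if urlsplit(u).path.startswith(path_prefix)]
    return picked[:limit]  # limit=None keeps everything


if __name__ == "__main__":
    sitemap_urls = [
        "https://example.com/",
        "https://example.com/docs/intro",
        "https://example.com/blog/launch",
        "https://example.com/docs/api",
    ]
    print(curate(sitemap_urls, "/docs/"))
```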
GET /web/scrape/images takes a URL and returns a manifest of every image referenced on the page: <img> tags, inline <svg>, CSS background images, <picture> sources, OpenGraph and Twitter card images, and favicons. Opt into enrichment to also get measured dimensions, a CDN-hosted copy, and a visual-type classification per image.
For type: "url", the absolute image URL. For type: "html", the raw inline SVG/HTML.
| Field | Type | Description |
| --- | --- | --- |
| images[].element | enum | DOM origin: img, svg, link, source, video, css, object, meta, or background. |
| images[].type | enum | Format of src: url (external image), html (inline markup like SVG), or base64 (data URI). |
| images[].alt | string \| null | Alt text where present. |
| images[].enrichment.width | number | Pixel width. Present when enrichment.resolution=true. |
| images[].enrichment.height | number | Pixel height. Present when enrichment.resolution=true. |
| images[].enrichment.mimetype | string | MIME type. Present when hosted via enrichment.hostedUrl=true. |
| images[].enrichment.url | string | Context.dev CDN URL. Present when enrichment.hostedUrl=true. |
| images[].enrichment.type | enum | Visual category. Present when enrichment.classification=true. One of photography, illustration, logo, wordmark, icon, pattern, graphic, other. |
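Because every enrichment field is optional, code that consumes the manifest has to tolerate missing keys. The sketch below filters a manifest down to entries classified as logos or wordmarks and, where a width was measured, drops the ones that are too small. It assumes the manifest is a list of dicts laid out as the field paths above suggest; the sample data is invented for illustration.

```python
def pick_logos(images, min_width=64):
    """Return manifest entries classified as logo or wordmark.

    Entries without a measured width are kept, since resolution
    enrichment may not have been requested.
    """
    picked = []
    for image in images:
        enrichment = image.get("enrichment") or {}
        if enrichment.get("type") not in ("logo", "wordmark"):
            continue
        width = enrichment.get("width")
        if width is not None and width < min_width:
            continue
        picked.append(image)
    return picked


if __name__ == "__main__":
    manifest = [  # invented sample entries
        {"src": "https://example.com/logo.png", "element": "img", "type": "url",
         "alt": "Example", "enrichment": {"type": "logo", "width": 256, "height": 64}},
        {"src": "https://example.com/hero.jpg", "element": "img", "type": "url",
         "alt": None, "enrichment": {"type": "photography", "width": 1920, "height": 1080}},
    ]
    for entry in pick_logos(manifest):
        print(entry["src"])
```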
The base manifest is 1 credit. Setting any enrichment flag (resolution, hostedUrl, or classification) bumps the entire call to 5 credits, even if only one image qualifies for enrichment.
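The pricing rule is easy to encode for budgeting: the cost depends only on whether any enrichment flag is set, not on how many flags are set or how many images are returned. A minimal sketch, with `image_call_credits` as a hypothetical helper name:

```python
ENRICHMENT_FLAGS = ("resolution", "hostedUrl", "classification")


def image_call_credits(enrichment=None):
    """Return the credit cost of one /web/scrape/images call."""
    enrichment = enrichment or {}
    # Any truthy enrichment flag moves the whole call to the 5-credit tier.
    return 5 if any(enrichment.get(flag) for flag in ENRICHMENT_FLAGS) else 1


if __name__ == "__main__":
    print(image_call_credits())                      # base manifest
    print(image_call_credits({"resolution": True}))  # enriched call
```

Since one flag costs the same as all three, it can make sense to request every enrichment you might need in a single call rather than repeating the call later.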