Handle Rate Limits

Each plan has a per-minute request cap. If you exceed it, the API returns 429 Too Many Requests. To make your app production-ready, use these four patterns to prevent this or handle errors when it happens:

Client-side caching for hot domains.
Backoff on 429, honoring the Retry-After header.
Prefetch to shift slow work ahead of bursts.
Tier-aware fallbacks when the limit holds.

Rate limits per plan

Rate limits apply per API key, are measured per minute, and are visible on your dashboard. The current tiers:

Plan	Credits per month	Rate limit	Overage
Free	500 one-time*	30 requests/min	None
Hobby	10,000	60 requests/min	$15 per 10K credits
Starter	30,000	120 requests/min	$12 per 10K credits
Pro	200,000	300 requests/min	$9 per 10K credits
Scale	2,500,000	800 requests/min	$6 per 10K credits
Enterprise	Custom	Custom	Contact sales

* Free plan credits are a one-time grant, not a monthly allowance.

Logo Link and Prefetch endpoints do not have any rate limits. Monitors management endpoints (/v1/monitors/*) use a separate per-organization bucket.
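
As a rough illustration of what the per-minute caps in the table mean for an evenly paced client, the sketch below converts each cap into a minimum spacing between requests. The dictionary and the function name `min_interval_s` are hypothetical helpers for this example, not part of any SDK, and even spacing is only one way to stay under a fixed-window cap.

```python
# Per-minute caps copied from the plan table above.
# Enterprise is custom, so it is left out of this illustration.
PLAN_LIMITS_PER_MIN = {
    "free": 30,
    "hobby": 60,
    "starter": 120,
    "pro": 300,
    "scale": 800,
}


def min_interval_s(plan: str) -> float:
    """Seconds between requests for a client that paces itself evenly at the cap."""
    return 60.0 / PLAN_LIMITS_PER_MIN[plan]


for plan in PLAN_LIMITS_PER_MIN:
    print(f"{plan}: one request every {min_interval_s(plan):.3f}s")
```

On Pro, for instance, this works out to one request every 0.2 seconds if traffic is spread evenly across the minute.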

Monitors API has its own bucket

Requests to /v1/monitors/* — the endpoints that create, list, update, delete, and inspect monitors and their runs — do not count against the per-plan rate limit in the table above. They draw from a separate per-organization bucket:

Limit	Value
Requests per minute (all plans)	1000
Scope	Per organization

This bucket is isolated in both directions: heavy monitor polling never eats into your data-API budget, and heavy brand/web traffic never throttles your monitors dashboard or webhook backfill queries. The 1000/min ceiling is flat across every plan, including Free. When you exceed the monitors bucket, the response is the same 429 Rate limit exceeded envelope described below, but the X-RateLimit-* headers on monitors responses reflect the monitors bucket (limit 1000) rather than your plan’s data-API limit. Apply the same backoff pattern — reading Retry-After and X-RateLimit-Remaining from the monitors response works the same way.

Monitor runs (the actual scrape work executed on your monitor’s schedule) are separate from the monitors management API. Run frequency and count are controlled by your plan’s monitor limits and the monitor’s own schedule, not this per-minute cap.
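
To make the two-bucket isolation concrete, here is a minimal local model, assuming a client that wants to track both budgets itself. `bucket_for` and `LocalBudget` are hypothetical names, the monitor path in the usage note is made up for illustration, and the model ignores endpoint weights, window resets, and the unmetered Logo Link and Prefetch endpoints.

```python
MONITORS_PREFIX = "/v1/monitors/"
MONITORS_LIMIT_PER_MIN = 1000  # flat across plans, scoped per organization


def bucket_for(path: str) -> str:
    """Classify a request path into the bucket it draws from."""
    return "monitors" if path.startswith(MONITORS_PREFIX) else "plan"


class LocalBudget:
    """Client-side mirror of the two independent per-minute budgets."""

    def __init__(self, plan_limit_per_min: int) -> None:
        self.remaining = {
            "plan": plan_limit_per_min,
            "monitors": MONITORS_LIMIT_PER_MIN,
        }

    def spend(self, path: str) -> bool:
        """Charge one unit to the matching bucket; False means it is exhausted."""
        bucket = bucket_for(path)
        if self.remaining[bucket] <= 0:
            return False
        self.remaining[bucket] -= 1
        return True
```

With a Free-plan budget (`LocalBudget(30)`), the 31st call to `/brand/retrieve` is refused locally while a call under `/v1/monitors/` still succeeds, which mirrors the isolation described above.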

Weighted endpoints

Two endpoints fan out to many upstream scrapes per call and count as 10 requests against your per-minute rate limit instead of 1:

Endpoint	Weight
`POST /web/crawl`	10
`POST /brand/ai/products`	10

Every other endpoint still counts as 1 request. On a Pro plan (300 requests/min), that means at most 30 crawl or product-extraction calls per minute before you start seeing 429s, while lighter endpoints keep their full 300/min budget. Rejected requests do not consume budget — if a weighted call is throttled, none of the 10 units are charged and other requests in the same window are unaffected. The X-RateLimit-Remaining header reflects the post-weight count, so a successful /web/crawl on a fresh 300/min window drops X-RateLimit-Remaining from 300 to 290.
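
The arithmetic above can be replayed with a small simulation. This is a sketch under one stated assumption: a call is rejected when its weight exceeds what is left in the window. The document only says that rejected calls charge nothing, so the exact rejection rule is a guess, and `simulate_window` is a hypothetical helper.

```python
# Weights from the table above; every other endpoint counts as 1.
ENDPOINT_WEIGHTS = {
    "POST /web/crawl": 10,
    "POST /brand/ai/products": 10,
}


def weight_of(endpoint: str) -> int:
    return ENDPOINT_WEIGHTS.get(endpoint, 1)


def simulate_window(limit: int, calls: list[str]) -> tuple[int, list[str]]:
    """Replay calls against one window; return (remaining, rejected calls)."""
    remaining = limit
    rejected: list[str] = []
    for endpoint in calls:
        weight = weight_of(endpoint)
        if weight > remaining:
            # Assumed rule: a throttled call is rejected whole and charges nothing.
            rejected.append(endpoint)
            continue
        remaining -= weight
    return remaining, rejected


print(simulate_window(300, ["POST /web/crawl"]))
print(simulate_window(300, ["POST /web/crawl"] * 31))
```

The first call leaves 290 of 300 units, matching the header example above, and the 31st crawl in a Pro window is the first one rejected.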

Read rate-limit headers on every response

Every response to an authenticated request includes three headers, so you can pace requests without waiting for a 429:

Header	Meaning
`X-RateLimit-Limit`	Maximum requests allowed in the current fixed one-minute window.
`X-RateLimit-Remaining`	Requests remaining in the current window.
`X-RateLimit-Reset`	Unix timestamp (seconds) when the window resets.

Use X-RateLimit-Remaining to slow down proactively — for example, add a small delay when it drops under 10% of X-RateLimit-Limit — instead of retrying after a 429.
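
One way to act on that advice is sketched below. It assumes the headers are available as a plain dictionary keyed by the exact names in the table (real HTTP clients usually expose case-insensitive headers), and `pacing_delay_s` is a hypothetical helper. Once the remaining budget drops under 10% of the limit, it spreads the remaining requests over the time left in the window.

```python
import time
from typing import Mapping, Optional


def pacing_delay_s(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    threshold: float = 0.10,
) -> float:
    """Seconds to sleep before the next request, based on X-RateLimit-* headers."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix timestamp in seconds
    current = time.time() if now is None else now

    # Plenty of budget left: no need to slow down.
    if remaining >= limit * threshold:
        return 0.0

    # Spread what is left evenly over the rest of the window.
    seconds_left = max(0.0, reset_at - current)
    return seconds_left / max(remaining, 1)
```

With a limit of 300, 29 requests remaining, and 29 seconds until reset, this returns a one-second delay. With nothing remaining it returns the full time until the window resets.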

What a 429 looks like

The API returns a JSON envelope:

{
  "status": "error",
  "message": "Rate limit exceeded",
  "code": 429,
  "key_metadata": {
    "credits_consumed": 0,
    "credits_remaining": 29940
  }
}

credits_consumed is always 0 on a 429 — throttled requests are never charged. Every 429 response also includes the standard X-RateLimit-* headers plus a Retry-After header with the precise number of seconds (1–60) until your per-minute window resets:

Retry-After: 23

Through an SDK, the error surfaces as a typed exception with status === 429. The SDK does not retry automatically. You wire that in.
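
If you are working with raw HTTP responses rather than an SDK exception, the envelope and header can be read as in the sketch below. `RateLimitInfo` and `parse_429` are hypothetical names. The clamp to 1–60 seconds follows the range stated above, and a missing or unparseable header yields `None` so the caller can fall back to its own backoff.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    message: str
    credits_remaining: Optional[int]
    wait_s: Optional[int]  # None when Retry-After is missing or unparseable


def parse_429(
    status: int, body_text: str, headers: Mapping[str, str]
) -> Optional[RateLimitInfo]:
    """Return rate-limit details for a 429 response, or None for any other status."""
    if status != 429:
        return None
    envelope = json.loads(body_text)
    meta = envelope.get("key_metadata") or {}
    try:
        # Clamp to the documented 1-60 second range.
        wait_s = min(60, max(1, int(headers.get("Retry-After"))))
    except (TypeError, ValueError):
        wait_s = None
    return RateLimitInfo(
        message=envelope.get("message", ""),
        credits_remaining=meta.get("credits_remaining"),
        wait_s=wait_s,
    )
```

Fed the example envelope above with `Retry-After: 23`, this yields a 23-second wait and 29,940 credits remaining.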

Pattern 1: Client-side cache for hot domains

The cheapest way to stay under the cap is to skip the call. Brand data changes on the order of months, so a 30-day client cache is safe for most products:

import ContextDev from "context.dev";

const client = new ContextDev({ apiKey: process.env.CONTEXT_DEV_API_KEY });

const CACHE_TTL_MS = 30 * 24 * 60 * 60 * 1000;
const cache = new Map<string, { data: unknown; at: number }>();

async function getBrand(domain: string) {
  const hit = cache.get(domain);
  if (hit && Date.now() - hit.at < CACHE_TTL_MS) return hit.data;

  const { brand } = await client.brand.retrieve({ type: "by_domain", domain });
  cache.set(domain, { data: brand, at: Date.now() });
  return brand;
}

import os
import time
from context.dev import ContextDev

client = ContextDev(api_key=os.environ["CONTEXT_DEV_API_KEY"])

CACHE_TTL_S = 30 * 24 * 60 * 60
_cache: dict[str, tuple[float, dict]] = {}

def get_brand(domain: str):
    hit = _cache.get(domain)
    if hit and time.time() - hit[0] < CACHE_TTL_S:
        return hit[1]
    brand = client.brand.retrieve(type="by_domain", domain=domain).brand
    _cache[domain] = (time.time(), brand)
    return brand

require "context_dev"

client = ContextDev::Client.new(api_key: ENV.fetch("CONTEXT_DEV_API_KEY"))

CACHE_TTL_S = 30 * 24 * 60 * 60
@cache = {}

def get_brand(client, domain)
  hit = @cache[domain]
  return hit[:data] if hit && Time.now.to_i - hit[:at] < CACHE_TTL_S

  brand = client.brand.retrieve(body: { type: :by_domain, domain: domain }).brand
  @cache[domain] = { data: brand, at: Time.now.to_i }
  brand
end

get_brand(client, "acme.com")

package main

import (
    "context"
    "os"
    "sync"
    "time"

    contextdev "github.com/context-dot-dev/context-go-sdk"
    "github.com/context-dot-dev/context-go-sdk/option"
)

var (
    cacheMu sync.Mutex
    cache   = map[string]struct {
        data any
        at   time.Time
    }{}
    cacheTTL = 30 * 24 * time.Hour
    client   = contextdev.NewClient(option.WithAPIKey(os.Getenv("CONTEXT_DEV_API_KEY")))
)

func GetBrand(ctx context.Context, domain string) (any, error) {
    cacheMu.Lock()
    if hit, ok := cache[domain]; ok && time.Since(hit.at) < cacheTTL {
        cacheMu.Unlock()
        return hit.data, nil
    }
    cacheMu.Unlock()

    r, err := client.Brand.Get(ctx, contextdev.BrandGetParams{
        OfByDomain: &contextdev.BrandGetParamsBodyByDomain{Domain: domain},
    })
    if err != nil {
        return nil, err
    }
    cacheMu.Lock()
    cache[domain] = struct {
        data any
        at   time.Time
    }{r.Brand, time.Now()}
    cacheMu.Unlock()
    return r.Brand, nil
}

<?php

use ContextDev\Client;

$client = new Client(apiKey: getenv('CONTEXT_DEV_API_KEY'));

const CACHE_TTL_S = 30 * 24 * 60 * 60;
$cache = [];

function getBrand(string $domain)
{
    global $client, $cache;

    if (isset($cache[$domain]) && time() - $cache[$domain]['at'] < CACHE_TTL_S) {
        return $cache[$domain]['data'];
    }

    $brand = $client->brand->retrieve(type: 'by_domain', domain: $domain)->brand;
    $cache[$domain] = ['data' => $brand, 'at' => time()];
    return $brand;
}

Reasonable TTL starting points: 30 days for brand responses, 7 days for product extractions, indefinite for industry codes (NAICS / SIC). Adjust per use case.
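
The per-type starting points above could be wired into one cache along these lines. This is a minimal in-memory sketch: the `TTLCache` class, the kind names (`brand`, `products`, `industry`), and the injectable clock are all assumptions made for illustration, and nothing here calls the API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

DAY_S = 24 * 60 * 60

# Starting points from the text above; None means the entry never expires.
TTL_S: Dict[str, Optional[int]] = {
    "brand": 30 * DAY_S,
    "products": 7 * DAY_S,
    "industry": None,
}


class TTLCache:
    """In-memory cache whose expiry depends on the kind of data stored."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, kind: str, key: str, value: Any) -> None:
        self._entries[(kind, key)] = (self._clock(), value)

    def get(self, kind: str, key: str) -> Optional[Any]:
        entry = self._entries.get((kind, key))
        if entry is None:
            return None
        stored_at, value = entry
        ttl = TTL_S[kind]
        if ttl is not None and self._clock() - stored_at >= ttl:
            # Expired: drop it so the caller refetches.
            del self._entries[(kind, key)]
            return None
        return value
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting for real time to pass.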

Pattern 2: Backoff on 429 with `Retry-After`

When you hit rate limits, you get a 429 status code on the response:

{
  "status": "error",
  "message": "Rate limit exceeded",
  "code": 429
}

The response’s Retry-After header tells you exactly how many seconds until your window resets, so use it as the wait time when it’s present. Fall back to exponential backoff (wait 1 second before the first retry and double the delay on each subsequent attempt) if you can’t read the header. Here’s an example of a retry script that honors Retry-After and falls back to exponential delays:

async function retrieveWithBackoff(domain: string, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.brand.retrieve({ type: "by_domain", domain });
    } catch (err: any) {
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;

      const retryAfter = Number(err.headers?.["retry-after"]);
      const delayMs = retryAfter > 0 ? retryAfter * 1000 : Math.pow(2, attempt) * 1000;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}

import time
from context.dev import APIStatusError

def retrieve_with_backoff(domain: str, max_attempts: int = 4):
    for attempt in range(max_attempts):
        try:
            return client.brand.retrieve(type="by_domain", domain=domain)
        except APIStatusError as e:
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = int(e.response.headers.get("Retry-After", 0))
            time.sleep(retry_after if retry_after > 0 else 2 ** attempt)

def retrieve_with_backoff(client, domain, max_attempts: 4)
  attempt = 0
  begin
    client.brand.retrieve(body: { type: :by_domain, domain: domain })
  rescue ContextDev::Errors::APIStatusError => e
    raise unless e.status == 429 && attempt < max_attempts - 1
    retry_after = e.headers["retry-after"].to_i
    sleep(retry_after.positive? ? retry_after : 2**attempt)
    attempt += 1
    retry
  end
end

import (
    "context"
    "errors"
    "fmt"
    "math"
    "strconv"
    "time"

    contextdev "github.com/context-dot-dev/context-go-sdk"
)

func RetrieveWithBackoff(ctx context.Context, domain string) (*contextdev.BrandGetResponse, error) {
    const maxAttempts = 4
    for attempt := 0; attempt < maxAttempts; attempt++ {
        r, err := client.Brand.Get(ctx, contextdev.BrandGetParams{
            OfByDomain: &contextdev.BrandGetParamsBodyByDomain{Domain: domain},
        })
        if err == nil {
            return r, nil
        }
        var apiErr *contextdev.Error
        if !errors.As(err, &apiErr) || apiErr.StatusCode != 429 || attempt == maxAttempts-1 {
            return nil, err
        }
        delay := time.Duration(math.Pow(2, float64(attempt))) * time.Second
        if secs, parseErr := strconv.Atoi(apiErr.Response.Header.Get("Retry-After")); parseErr == nil && secs > 0 {
            delay = time.Duration(secs) * time.Second
        }
        time.Sleep(delay)
    }
    return nil, fmt.Errorf("unreachable")
}

<?php

use ContextDev\Client;
use ContextDev\Core\Exceptions\APIStatusException;

function retrieveWithBackoff(string $domain, int $maxAttempts = 4)
{
    global $client;

    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        try {
            return $client->brand->retrieve(
                type: 'by_domain',
                domain: $domain,
                requestOptions: ['maxRetries' => 0],
            );
        } catch (APIStatusException $e) {
            if ($e->status !== 429 || $attempt === $maxAttempts - 1) {
                throw $e;
            }
            $retryAfter = (int) ($e->response?->getHeaderLine('Retry-After') ?: 0);
            sleep($retryAfter > 0 ? $retryAfter : 2 ** $attempt);
        }
    }

    throw new RuntimeException('unreachable');
}
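
For a sense of how long the retry loops above can block, the sketch below works through the delay schedule. The helper names are hypothetical. With `max_attempts = 4` there are three sleeps, because the final failed attempt raises instead of waiting.

```python
from typing import List, Optional


def fallback_delays_s(max_attempts: int = 4) -> List[int]:
    """Exponential fallback delays: 2 ** attempt for every attempt that retries."""
    return [2 ** attempt for attempt in range(max_attempts - 1)]


def worst_case_wait_s(max_attempts: int = 4, retry_after_s: Optional[int] = None) -> int:
    """Total sleep time if every attempt is throttled."""
    retries = max_attempts - 1
    if retry_after_s is not None:
        # The server-provided Retry-After value replaces the fallback delay.
        return retries * retry_after_s
    return sum(fallback_delays_s(max_attempts))


print(fallback_delays_s())                  # [1, 2, 4]
print(worst_case_wait_s())                  # 7
print(worst_case_wait_s(retry_after_s=60))  # 180
```

So the fallback path sleeps at most 7 seconds in total, while honoring a `Retry-After` of 60 on every attempt could block for up to 3 minutes. Keep that in mind when calling these helpers on a user-facing request path.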

Pattern 3: Prefetch to shift slow work ahead of bursts

Bursty traffic (like when a marketing email triggers 200 signups in 60 seconds) can get you rate limited. Prefetching doesn’t reduce the number of Brand API calls that count against your limit; every user-facing /brand/retrieve still spends rate-limit budget. What it does is shift the slow crawl work earlier, so each call during the burst completes in under a second instead of stalling for up to a minute and piling up retries on top of an already-saturated window. Here’s how it works:

During the burst, your application calls POST /utility/prefetch right when it first receives the target domain or email, passing type: "brand" and either identifier.domain or identifier.email. Prefetch is rate-limit-free, so 200 calls in a minute is fine.
A few seconds later, when the user actually submits and the user-facing client hits the Brand API, the request lands on a warm cache and returns in under a second. That call still counts toward your per-minute limit; it’s just fast.

See Prefetch for Faster Response for the full pattern.
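
As a rough sketch of the first step, the request body might be built as follows. The nested `identifier` object is an assumption inferred from the `identifier.domain` / `identifier.email` naming above, and the `post` callable stands in for whatever HTTP client you use, so no base URL or authentication details are guessed here.

```python
from typing import Any, Callable, Dict, Optional


def build_prefetch_body(
    domain: Optional[str] = None, email: Optional[str] = None
) -> Dict[str, Any]:
    """Build a prefetch body with exactly one identifier (assumed payload shape)."""
    if (domain is None) == (email is None):
        raise ValueError("pass exactly one of domain or email")
    identifier = {"domain": domain} if domain is not None else {"email": email}
    return {"type": "brand", "identifier": identifier}


def prefetch(
    post: Callable[[str, Dict[str, Any]], Any],
    domain: Optional[str] = None,
    email: Optional[str] = None,
) -> Any:
    """Fire the prefetch through an injected `post(path, json_body)` callable."""
    return post("/utility/prefetch", build_prefetch_body(domain=domain, email=email))
```

Call `prefetch` as soon as the domain or email is known, for example when the field loses focus in a signup form, and make the user-facing Brand API call later as usual.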

Pattern 4: Degrade gracefully when the limit holds

If exponential backoff has run out of retries and you are still seeing 429s, the user is better served by a missing-data fallback than an error screen. Some examples:

Onboarding form. Skip the prefilled fields. Let the user enter them by hand and do not block on the API.
Logo wall. Render the customer’s name in a styled box instead of the logo.
CRM enrichment. Queue the contact for an offline enrichment job that runs overnight.

Build the fallback once and the end user never sees a rate-limit message.
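
The logo wall case could look like the sketch below. `RateLimited` is a hypothetical exception standing in for whatever your retry wrapper raises once it gives up on a 429, and `fetch_logo_url` is any callable you supply, so neither name is an SDK API.

```python
from typing import Callable, Dict


class RateLimited(Exception):
    """Hypothetical error raised after retries are exhausted on a 429."""


def logo_or_placeholder(
    fetch_logo_url: Callable[[str], str], company_name: str, domain: str
) -> Dict[str, str]:
    """Return a logo tile, or a styled-name placeholder when still throttled."""
    try:
        return {"kind": "logo", "url": fetch_logo_url(domain)}
    except RateLimited:
        # Degrade quietly: the page still renders, just without the image.
        return {"kind": "placeholder", "text": company_name}
```

Only the rate-limit error is swallowed here. Other failures still propagate, so real bugs are not hidden behind the fallback.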

Related resources

Prefetch. Warm the cache so burst-time calls return fast.
Best practices. Cache, fallback, and proxy patterns end to end.
Troubleshooting. Other status codes, retry logic, and SDK gotchas.
Pricing. Per-plan credit, rate limit, and overage details.
