> ## Documentation Index
> Fetch the complete documentation index at: https://docs.context.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Handle Rate Limits

> Stay under your plan's per-minute request cap with client caching, exponential backoff, prefetching, and graceful fallbacks.

Each plan has a per-minute request cap. If you exceed it, the API returns `429 Too Many Requests`. To make your app production-ready, use these four patterns to prevent this or handle errors when it happens:

* Client-side caching for hot domains.
* Backoff on 429, honoring the `Retry-After` header.
* Prefetch to shift slow work ahead of bursts.
* Tier-aware fallbacks when the limit holds.

## Rate limits per plan

Rate limits apply per API key, are measured per minute, and are visible on your [dashboard](https://www.context.dev/dashboard). The current tiers:

| Plan       | Credits per month | Rate limit         | Overage              |
| ---------- | ----------------- | ------------------ | -------------------- |
| Free       | 500 one-time\*    | 10 requests/min    | None                 |
| Starter    | 30,000            | 120 requests/min   | \$19 per 10K credits |
| Pro        | 200,000           | 300 requests/min   | \$9 per 10K credits  |
| Scale      | 2,500,000         | 1,200 requests/min | \$6 per 10K credits  |
| Enterprise | Custom            | Custom             | Contact sales        |

\* Free plan credits are a one-time grant, not a monthly allowance.

[Logo Link](/guides/get-logo-from-url) and [Prefetch](/optimization/prefetching) endpoints do not have any rate limits.

## What a 429 looks like

The API returns a JSON envelope:

```json theme={null}
{
  "status": "error",
  "message": "Rate limit exceeded",
  "code": 429,
  "key_metadata": {
    "credits_consumed": 0,
    "credits_remaining": 29940
  }
}
```

`credits_consumed` is always `0` on a 429: throttled requests are never charged.
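If you call the HTTP API directly rather than through an SDK, a minimal sketch of reading that envelope might look like the following. The helper name `read_throttle_info` and the sample constant are hypothetical, not part of any SDK; only the field names come from the envelope shown above.

```python
import json
from typing import Optional

# Sample body copied from the 429 envelope shown above.
SAMPLE_429_BODY = """
{
  "status": "error",
  "message": "Rate limit exceeded",
  "code": 429,
  "key_metadata": { "credits_consumed": 0, "credits_remaining": 29940 }
}
"""


def read_throttle_info(status_code: int, body_text: str) -> Optional[dict]:
    """Hypothetical helper: summarize a 429 envelope, or return None otherwise."""
    if status_code != 429:
        return None
    body = json.loads(body_text)
    # key_metadata may be absent in some error bodies, so default to empty.
    meta = body.get("key_metadata", {})
    return {
        "message": body.get("message", ""),
        "credits_consumed": meta.get("credits_consumed", 0),
        "credits_remaining": meta.get("credits_remaining"),
    }


info = read_throttle_info(429, SAMPLE_429_BODY)
if info is not None:
    print(f"Throttled: {info['message']} ({info['credits_remaining']} credits left)")
```

Returning `None` for non-429 responses keeps the caller's happy path free of rate-limit handling.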
Every 429 response also includes a `Retry-After` header with the number of seconds (1–60) until your per-minute window resets:

```text theme={null}
Retry-After: 23
```

Through an SDK, the error surfaces as a typed exception with `status === 429`. The SDK does not retry automatically. You wire that in.

## Pattern 1: Client-side cache for hot domains

The cheapest way to stay under the cap is to skip the call. Brand data changes on the order of months, so a 30-day client cache is safe for most products:

```typescript TypeScript theme={null}
import ContextDev from "context.dev";

const client = new ContextDev({ apiKey: process.env.CONTEXT_DEV_API_KEY });

const CACHE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days
const cache = new Map();

async function getBrand(domain: string) {
  // Serve from the cache while the entry is younger than the TTL.
  const hit = cache.get(domain);
  if (hit && Date.now() - hit.at < CACHE_TTL_MS) return hit.data;

  const { brand } = await client.brand.retrieve({ domain });
  cache.set(domain, { data: brand, at: Date.now() });
  return brand;
}
```

```python Python theme={null}
import os
import time
from context.dev import ContextDev

client = ContextDev(api_key=os.environ["CONTEXT_DEV_API_KEY"])

CACHE_TTL_S = 30 * 24 * 60 * 60  # 30 days
_cache: dict[str, tuple[float, dict]] = {}


def get_brand(domain: str):
    # Serve from the cache while the entry is younger than the TTL.
    hit = _cache.get(domain)
    if hit and time.time() - hit[0] < CACHE_TTL_S:
        return hit[1]
    brand = client.brand.retrieve(domain=domain).brand
    _cache[domain] = (time.time(), brand)
    return brand
```

```ruby Ruby theme={null}
require "context_dev"

client = ContextDev::Client.new(api_key: ENV.fetch("CONTEXT_DEV_API_KEY"))

CACHE_TTL_S = 30 * 24 * 60 * 60 # 30 days
@cache = {}

def get_brand(client, domain)
  # Serve from the cache while the entry is younger than the TTL.
  hit = @cache[domain]
  return hit[:data] if hit && Time.now.to_i - hit[:at] < CACHE_TTL_S

  brand = client.brand.retrieve(domain: domain).brand
  @cache[domain] = { data: brand, at: Time.now.to_i }
  brand
end

get_brand(client, "acme.com")
```

```go Go theme={null}
package main

import (
	"context"
	"os"
	"sync"
	"time"

	contextdev "github.com/context-dot-dev/context-go-sdk"
	"github.com/context-dot-dev/context-go-sdk/option"
)

var (
	cacheMu sync.Mutex
	cache   = map[string]struct {
		data any
		at   time.Time
	}{}
	cacheTTL = 30 * 24 * time.Hour // 30 days
	client   = contextdev.NewClient(option.WithAPIKey(os.Getenv("CONTEXT_DEV_API_KEY")))
)

func GetBrand(ctx context.Context, domain string) (any, error) {
	// Serve from the cache while the entry is younger than the TTL.
	cacheMu.Lock()
	if hit, ok := cache[domain]; ok && time.Since(hit.at) < cacheTTL {
		cacheMu.Unlock()
		return hit.data, nil
	}
	cacheMu.Unlock()

	r, err := client.Brand.Get(ctx, contextdev.BrandGetParams{Domain: domain})
	if err != nil {
		return nil, err
	}

	cacheMu.Lock()
	cache[domain] = struct {
		data any
		at   time.Time
	}{r.Brand, time.Now()}
	cacheMu.Unlock()
	return r.Brand, nil
}
```

Reasonable TTL starting points: 30 days for brand responses, 7 days for product extractions, indefinite for industry codes (NAICS / SIC). Adjust per use case.

## Pattern 2: Backoff on 429 with `Retry-After`

When you hit rate limits, you get a 429 status code on the response:

```json theme={null}
{
  "status": "error",
  "message": "Rate limit exceeded",
  "code": 429
}
```

The response's `Retry-After` header tells you exactly how many seconds until your window resets, so use it as the wait time when it's present. Fall back to exponential backoff (wait 1 second before the first retry and double the delay on each subsequent attempt) if you can't read the header.

Here's an example of a retry script that honors `Retry-After` and falls back to exponential delays:

```typescript TypeScript theme={null}
async function retrieveWithBackoff(domain: string, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.brand.retrieve({ domain });
    } catch (err: any) {
      // Rethrow anything that is not a 429, or a 429 on the final attempt.
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;
      // Prefer Retry-After (seconds); otherwise wait 1s, 2s, 4s, ...
      const retryAfter = Number(err.headers?.["retry-after"]);
      const delayMs = retryAfter > 0 ? retryAfter * 1000 : Math.pow(2, attempt) * 1000;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}
```

```python Python theme={null}
import time
from context.dev import APIStatusError


def retrieve_with_backoff(domain: str, max_attempts: int = 4):
    for attempt in range(max_attempts):
        try:
            return client.brand.retrieve(domain=domain)
        except APIStatusError as e:
            # Re-raise anything that is not a 429, or a 429 on the final attempt.
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            # Prefer Retry-After (seconds); otherwise wait 1s, 2s, 4s, ...
            retry_after = int(e.response.headers.get("Retry-After", 0))
            time.sleep(retry_after if retry_after > 0 else 2 ** attempt)
```

```ruby Ruby theme={null}
def retrieve_with_backoff(client, domain, max_attempts: 4)
  attempt = 0
  begin
    client.brand.retrieve(domain: domain)
  rescue ContextDev::Errors::APIStatusError => e
    # Re-raise anything that is not a 429, or a 429 on the final attempt.
    raise unless e.status == 429 && attempt < max_attempts - 1

    # Prefer Retry-After (seconds); otherwise wait 1s, 2s, 4s, ...
    retry_after = e.headers["retry-after"].to_i
    sleep(retry_after.positive? ? retry_after : 2**attempt)
    attempt += 1
    retry
  end
end
```

```go Go theme={null}
import (
	"context"
	"errors"
	"fmt"
	"math"
	"strconv"
	"time"

	contextdev "github.com/context-dot-dev/context-go-sdk"
)

func RetrieveWithBackoff(ctx context.Context, domain string) (*contextdev.BrandGetResponse, error) {
	const maxAttempts = 4
	for attempt := 0; attempt < maxAttempts; attempt++ {
		r, err := client.Brand.Get(ctx, contextdev.BrandGetParams{Domain: domain})
		if err == nil {
			return r, nil
		}
		// Return anything that is not a 429, or a 429 on the final attempt.
		var apiErr *contextdev.Error
		if !errors.As(err, &apiErr) || apiErr.StatusCode != 429 || attempt == maxAttempts-1 {
			return nil, err
		}
		// Default to 1s, 2s, 4s, ...; prefer Retry-After (seconds) when it parses.
		delay := time.Duration(math.Pow(2, float64(attempt))) * time.Second
		if secs, parseErr := strconv.Atoi(apiErr.Response.Header.Get("Retry-After")); parseErr == nil && secs > 0 {
			delay = time.Duration(secs) * time.Second
		}
		time.Sleep(delay)
	}
	return nil, fmt.Errorf("unreachable")
}
```

## Pattern 3: Prefetch to shift slow work ahead of bursts

Bursty traffic (like when a marketing email triggers 200 signups in 60 seconds) can get you rate limited.
Prefetching doesn't reduce the number of Brand API calls that count against your limit; every user-facing `/brand/retrieve` still spends rate-limit budget. What it does is shift the slow crawl work earlier, so each call during the burst completes in under a second instead of stalling for up to a minute and piling up retries on top of an already-saturated window.

Here's how it works:

* During the burst, your application calls `/brand/prefetch` (if it has a domain) or `/brand/prefetch-by-email` (if it has an email) right when it first receives the target domain or email. These prefetch endpoints are rate-limit-free, so 200 calls in a minute is fine.
* A few seconds later, when the user actually submits and the user-facing client hits the Brand API, the request lands on a warm cache and returns in under a second. That call still counts toward your per-minute limit; it's just fast.

See [Prefetch for Faster Response](/optimization/prefetching) for the full pattern.

## Pattern 4: Degrade gracefully when the limit holds

If exponential backoff has run out of retries and you are still seeing 429s, the user is better served by a missing-data fallback than an error screen. Some examples:

* **Onboarding form.** Skip the prefilled fields. Let the user enter them by hand and do not block on the API.
* **Logo wall.** Render the customer's name in a styled box instead of the logo.
* **CRM enrichment.** Queue the contact for an offline enrichment job that runs overnight.

Build the fallback once and the end user never sees a rate-limit message.

## Related resources

* Warm the cache so burst-time calls return fast.
* Cache, fallback, and proxy patterns end to end.
* Other status codes, retry logic, and SDK gotchas.
* Per-plan credit, rate limit, and overage details.
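As an illustration of Pattern 4 above, here is a minimal sketch of a wrapper that swaps a persistent 429 for a missing-data result and optionally queues the domain for later enrichment. Everything in it is hypothetical: `RateLimited` stands in for whatever error your retry helper raises once it gives up, and `fetch` and `enqueue` are callables you would supply yourself.

```python
from typing import Any, Callable, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for the error raised after retries are exhausted."""


def brand_or_fallback(
    domain: str,
    fetch: Callable[[str], Any],
    enqueue: Optional[Callable[[str], None]] = None,
) -> dict:
    """Return brand data, or a degraded result instead of surfacing a 429."""
    try:
        return {"domain": domain, "brand": fetch(domain), "degraded": False}
    except RateLimited:
        # Optionally hand the domain to an offline job (the CRM example above).
        if enqueue is not None:
            enqueue(domain)
        # The caller renders its own placeholder when brand is None.
        return {"domain": domain, "brand": None, "degraded": True}


def always_throttled(domain: str) -> Any:
    # Simulates an API that is still returning 429 after backoff.
    raise RateLimited(domain)


retry_queue: list = []
result = brand_or_fallback("acme.com", always_throttled, retry_queue.append)
print(result["degraded"], retry_queue)
```

Only the rate-limit error is caught here, so other failures still propagate to the caller instead of being masked by the fallback.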