Documentation Index

Fetch the complete documentation index at: https://developers.semji.com/llms.txt

Use this file to discover all available pages before exploring further.

The Semji API enforces rate limits to ensure fair usage and stable performance for all customers. Understanding these limits and building retry logic into your integration will prevent disruptions in production. This page explains the limits in place, the response headers you can use to track your usage, and recommended strategies for handling errors gracefully.

Limits

Two rate limits apply to every API key simultaneously:
Limit    Window              Maximum
------   -----------------   --------------
Hourly   1 hour (rolling)    1,000 requests
Burst    1 second            20 requests
Both limits are enforced per API key. The hourly limit is a rolling window — it tracks requests over the past 60 minutes, not a fixed clock-hour boundary. The burst limit prevents sudden spikes that could affect service reliability even when your hourly budget is healthy. Health check (/health), OpenAPI spec (/openapi.json), and documentation (/docs) endpoints are excluded from rate limiting.
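To stay under the burst limit without waiting for errors, a client can space its own requests. Below is a minimal sketch of client-side pacing; the BurstPacer name is our own illustration, not part of any Semji SDK:

```python
import time

class BurstPacer:
    """Spaces outgoing requests so no more than `max_per_second`
    are sent in any one-second window (20/s for the Semji API)."""

    def __init__(self, max_per_second: int = 20):
        self.min_interval = 1.0 / max_per_second
        self._last = 0.0  # monotonic timestamp of the last request

    def wait(self) -> None:
        """Block just long enough to honor the minimum interval,
        then record the current time as the last request."""
        now = time.monotonic()
        elapsed = now - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Call `pacer.wait()` immediately before each API request. This handles the burst limit only; the hourly budget still needs the header-based monitoring described below.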

Rate limit headers

Every API response includes the following headers reflecting your current hourly limit status:
X-RateLimit-Limit
integer
The maximum number of requests allowed per hour for this API key.
X-RateLimit-Remaining
integer
The number of requests remaining in the current rolling hour window.
X-RateLimit-Reset
integer
The number of seconds until the oldest request in the rolling window falls out and your remaining count increases.
You can inspect these headers with curl using the -i flag:
Inspect rate limit headers
curl -i https://api.semji.com/v1/me \
  -H "Authorization: Bearer sk_your_api_key_here"
Response headers (excerpt)
HTTP/2 200
x-ratelimit-limit: 1000
x-ratelimit-remaining: 847
x-ratelimit-reset: 1823
content-type: application/json
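The same values are straightforward to read programmatically. A small sketch of a parsing helper (the function name is our own; with the requests library, `response.headers` is case-insensitive, so a plain lowercase-keyed dict stands in here for illustration):

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract hourly rate-limit status from response headers.
    Missing headers default to 0 rather than raising."""
    return {
        "limit": int(headers.get("x-ratelimit-limit", 0)),
        "remaining": int(headers.get("x-ratelimit-remaining", 0)),
        "reset_seconds": int(headers.get("x-ratelimit-reset", 0)),
    }
```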

When the limit is exceeded

When you exceed either limit, the API responds with 429 Too Many Requests. The response body follows the standard error format:
429 Too Many Requests
{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests. Please try again later."
  }
}
On a 429, the response also includes a Retry-After header indicating the number of seconds to wait before retrying.

Handling 429 errors

Exponential backoff

The recommended approach is exponential backoff with jitter. Wait progressively longer between retries, and add a small random delay to avoid synchronized retry storms across multiple workers:
Exponential backoff with jitter (Python)
import os
import time
import random
import requests

api_key = os.environ["SEMJI_API_KEY"]

def semji_get(path: str, max_retries: int = 5) -> dict:
    url = f"https://api.semji.com/v1{path}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
            jitter = random.uniform(0, 1)
            wait = retry_after + jitter
            print(f"Rate limited. Retrying in {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError(f"Exceeded {max_retries} retries for {path}")
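When the Retry-After header is absent, the delay itself is commonly computed with "full jitter": the exponential value acts as an upper bound and the actual wait is drawn uniformly below it. A minimal sketch of that variant (the helper name is our own):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: the wait is drawn uniformly
    from [0, min(cap, base * 2**attempt)], which de-synchronizes
    retries across concurrent workers."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Capping the delay (here at 60 seconds) keeps a long outage from producing multi-minute waits between attempts.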

Proactively monitoring your budget

Instead of waiting for a 429, read the X-RateLimit-Remaining header on each response and slow down when your budget runs low:
Proactive throttling (Python)
import time
import requests

def semji_get_with_throttle(session: requests.Session, url: str) -> dict:
    response = session.get(url)
    response.raise_for_status()

    remaining = int(response.headers.get("X-RateLimit-Remaining", 1000))
    reset_in = int(response.headers.get("X-RateLimit-Reset", 0))

    # Slow down when fewer than 50 requests remain
    if remaining < 50 and reset_in > 0:
        sleep_time = reset_in / max(remaining, 1)
        time.sleep(min(sleep_time, 5))  # Cap at 5s between requests

    return response.json()

Best practices

Content generation jobs (/v1/content-generations) are asynchronous. Poll their status with a reasonable interval — every 5–10 seconds is sufficient. Polling every second wastes your request budget without providing faster results.
Cache responses where possible. Resources like workspaces, brand voices, and knowledge documents rarely change. Cache their IDs and names for the duration of your session rather than fetching them on every run.

Use pagination efficiently. Fetch only the pages you need. Use the maximum limit=100 when you need to process all items in a collection, rather than making many small requests.

Batch reads before writes. If your workflow reads several resources before creating or updating one, perform all the reads first. This groups your writes into a smaller time window and leaves more headroom for subsequent operations.

Avoid retrying 4xx errors other than 429. Errors in the 400–428 range indicate a problem with the request itself (bad input, missing permissions, not found). Retrying them will not help and only wastes your quota. Fix the request and then retry.
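The pagination advice above can be sketched as a small iterator that always requests the maximum page size. The `fetch_page(page, limit)` callable and its page/limit parameters are assumptions for illustration; check the pagination documentation for the API's actual scheme:

```python
from typing import Callable, Iterator

def iterate_pages(fetch_page: Callable[[int, int], dict],
                  limit: int = 100) -> Iterator[dict]:
    """Yield every item in a paginated collection, using the
    largest allowed page size to minimize the number of requests.
    `fetch_page` is a stand-in for one API call returning a body
    with a "data" list; a short page signals the final one."""
    page = 1
    while True:
        body = fetch_page(page, limit)
        items = body.get("data", [])
        yield from items
        if len(items) < limit:
            break
        page += 1
```

With limit=100, a 230-item collection costs 3 requests instead of the 12 that limit=20 would require.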