Skip to main content

Rate Limits & Quotas

Each organization has a daily token quota that resets at midnight UTC.

How it works

  • Every request consumes input tokens (your message + conversation history) and output tokens (the agent’s response)
  • Both are counted against your daily limit
  • When the limit is reached, subsequent requests return a 429 error with the time until reset

Error response

When your quota is exceeded, the API returns an SSE error event:
event: error
data: {
  "type": "error",
  "data": {
    "error": "Limite quotidienne organisation atteinte (1,000,000 tokens). Reset dans 3h42.",
    "error_type": "RateLimitExceeded"
  }
}

Monitoring usage

Contact your iNwealth account manager to:
  • Check your current consumption
  • Adjust your daily token limit
  • Get usage reports

Tips to optimize token usage

Use effort wisely

Use "low" or "medium" effort for simple questions. Reserve "high" and "max" for complex cross-border scenarios.

Keep history lean

Only send relevant conversation history in messages. Trim older turns when the conversation gets long.

Reuse session_id

Same session_id across turns enables prompt caching — up to 90% token savings on multi-turn conversations.

Skip extended thinking

Only enable extended_thinking for questions that truly need deep reasoning.