
Rate Limiting

The rate_limit engine protects against brute force, credential stuffing, scraping, and plain resource exhaustion by enforcing a per-key token-bucket limit. It is a header-phase, request-only engine — the body is never buffered for it — and a limited request receives an immediate 429 with a Retry-After header instead of the usual 403.

When to use it

  • Cap per-client request rates on public APIs (key: ip).
  • Give each partner or API key its own budget (key: header on the credential header).
  • Throttle an entire vhost regardless of caller (key: host).
  • Size limits safely first: run in detect mode to record would-be-429s without enforcing — see Modes & Fail Postures.

Configuration

| Field | Type | Required | Default | Allowed / notes |
| requests_per_second | float | yes | none | Sustained rate per key. Must be > 0. |
| burst | int | no | requests_per_second (floored to 1) | Bucket capacity (max instantaneous burst). ≥ 0 (0 = derive from the rate). |
| key | string | no | ip | One of ip, host, or header: the limit dimension. |
| header | string | conditional | none | Request header whose value is the limiter key. Required for key: header and used only there. Ignored for key: ip, where the source IP is the pre-derived trusted-hop address (tx.SourceIP), not a header you name here. |
Info: The source IP for key: ip is derived from the trusted hop (counted from the right side of X-Forwarded-For), never the spoofable leftmost token. See Envoy prerequisites.

Example

apiVersion: sentinel.elchi.io/v1
kind: SecurityPolicy
metadata:
  name: api-ratelimit
spec:
  defaults:
    mode: block
    fail_mode: fail_open

  domains:
    - hosts: ["gateway.example.com"]
      routes:
        # Per-IP limit: 100 req/s sustained, bursts up to 200.
        - match:
            path_prefix: "/v1/"
          policy:
            mode: block
            engines:
              rate_limit:
                requests_per_second: 100
                burst: 200
                key: ip            # ip | host | header

        # Per-API-key limit (key by a header value) + JWT auth on the same
        # route. Both engines run at the header phase.
        - match:
            path_prefix: "/partner/"
          policy:
            mode: block
            engines:
              rate_limit:
                requests_per_second: 10
                burst: 20
                key: header
                header: "X-Api-Key"
              jwt:
                issuer: "https://auth.example.com/"
                audience: "partner-api"
                algorithms: ["RS256"]
                public_key_file: "/etc/elchi/elchi-shield/keys/jwt-pub.pem"
                leeway: 30s        # clock-skew tolerance

        # Detect-mode (monitor) rate limit: records would-be-429s but allows
        # the request, so you can size limits before enforcing.
        - match:
            path_prefix: "/beta/"
          policy:
            mode: detect
            engines:
              rate_limit:
                requests_per_second: 5
                key: ip

How it decides

Key selection:

  • key: ip (default) → the trusted derived source IP. An empty source IP is not limited — the engine fails open on a missing IP rather than falling back to a spoofable one.
  • key: host → the canonicalized host (port stripped, IPv6 brackets removed, lowercased).
  • key: header → the named header's value. An absent header ⇒ empty key ⇒ not limited.
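The three key rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the engine's code: `canonical_host` and `limiter_key` are hypothetical names, and a return value of `None` stands for "empty key, not limited".

```python
from typing import Mapping, Optional


def canonical_host(host: str) -> str:
    """Lowercase the host, strip the port, and remove IPv6 brackets."""
    host = host.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8443" -> "::1".
        end = host.find("]")
        return host[1:end] if end != -1 else host[1:]
    if host.count(":") == 1:
        # "name:port" or "1.2.3.4:port"; a bare IPv6 literal has several colons.
        host = host.split(":", 1)[0]
    return host


def limiter_key(kind: str, source_ip: str, host: str,
                headers: Mapping[str, str], header_name: str = "") -> Optional[str]:
    """Return the bucket key, or None when the request must not be limited."""
    if kind == "ip":
        key = source_ip                      # trusted, pre-derived address
    elif kind == "host":
        key = canonical_host(host)
    elif kind == "header":
        wanted = header_name.lower()         # header names are case-insensitive
        key = next((v for k, v in headers.items() if k.lower() == wanted), "")
    else:
        raise ValueError(f"unknown key kind: {kind!r}")
    return key or None                       # empty key => not limited
```

Returning `None` for an empty key, rather than substituting another identifier, mirrors the fail-open rule: a request with no trusted IP or no header value is passed through instead of being keyed on something spoofable.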

Bucket math: requests_per_second is the refill rate; burst is the bucket capacity (default ≈ requests_per_second, floored to 1). The first sighting of a key allows the request and leaves burst − 1 tokens; after that, tokens += elapsed × rps, capped at burst. If at least 1 token is available the request is allowed and one token is consumed; otherwise it is blocked with 429, reason ratelimit.exceeded, severity Low, and a Retry-After header.
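The bucket math can be made concrete with a minimal Python sketch. It follows the rules stated above (first sighting leaves burst − 1 tokens, refill is elapsed × rps capped at burst). The class name `TokenBucketLimiter` is hypothetical, the default burst is taken as the integer part of the rate with a minimum of 1, and the Retry-After value (whole seconds until one token is available, minimum 1) is an assumption: the source does not say how the engine computes it.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last: float  # timestamp of the last update, in seconds


class TokenBucketLimiter:
    def __init__(self, requests_per_second: float, burst: int = 0):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be > 0")
        self.rps = requests_per_second
        # burst == 0 means "derive from the rate", never below 1.
        self.burst = burst if burst > 0 else max(1, int(requests_per_second))
        self.buckets: dict[str, Bucket] = {}

    def allow(self, key: str, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request."""
        if not key:
            return True, 0  # empty key: not limited
        b = self.buckets.get(key)
        if b is None:
            # First sighting: allow and leave burst - 1 tokens.
            self.buckets[key] = Bucket(tokens=self.burst - 1, last=now)
            return True, 0
        elapsed = max(0.0, now - b.last)
        b.tokens = min(self.burst, b.tokens + elapsed * self.rps)
        b.last = now
        if b.tokens >= 1:
            b.tokens -= 1
            return True, 0
        # Assumption: Retry-After is the whole seconds until one token exists.
        retry_after = max(1, math.ceil((1 - b.tokens) / self.rps))
        return False, retry_after


lim = TokenBucketLimiter(requests_per_second=2, burst=3)
print([lim.allow("203.0.113.7", now=0.0)[0] for _ in range(4)])
# [True, True, True, False]
print(lim.allow("203.0.113.7", now=0.5))
# (True, 0)  half a second at 2 tokens/s refills exactly one token
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock instead.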

The limiter is the one sanctioned stateful engine: 64 shards, each with its own mutex, touched only when a policy opts in — the rest of the hot path stays lock-free.
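To illustrate why sharding keeps contention low, here is a minimal sketch of per-key state split across 64 independently locked maps. The class `ShardedCounters` is hypothetical, and `zlib.crc32` is only a stand-in hash chosen for determinism; the source does not document which hash the engine uses.

```python
import threading
import zlib

SHARD_COUNT = 64


class ShardedCounters:
    """Per-key state split over independently locked shards."""

    def __init__(self, shard_count: int = SHARD_COUNT):
        self._locks = [threading.Lock() for _ in range(shard_count)]
        self._maps = [{} for _ in range(shard_count)]

    def shard_index(self, key: str) -> int:
        # Stand-in hash (assumption): any stable hash of the key would do.
        return zlib.crc32(key.encode("utf-8")) % len(self._maps)

    def update(self, key: str, fn):
        """Apply fn(old_state) -> new_state under the owning shard's lock only."""
        i = self.shard_index(key)
        with self._locks[i]:
            new_state = fn(self._maps[i].get(key))
            self._maps[i][key] = new_state
            return new_state
```

Two requests contend only when their keys hash to the same shard, so a single hot key does not serialize updates for keys that live in the other shards.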

Envoy prerequisites

For key: ip, the limit is only as trustworthy as the source IP:

  • Run Envoy with use_remote_address: true so the peer address Envoy appends to X-Forwarded-For is authoritative.
  • Set --xff-trusted-hops to the exact number of trusted proxies in front of Envoy. Misconfigure it and clients can mint fresh buckets at will by rotating a spoofed XFF value.

key: host and key: header have no special Envoy requirements. See Envoy wiring.
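The effect of a wrong hop count can be shown with a small sketch. It rests on assumptions that the source does not confirm: that the X-Forwarded-For value seen by the engine already ends with the peer address Envoy appended, and that the hop count is the number of entries to skip from the right. The function `trusted_source_ip` is a hypothetical name, not the engine's API.

```python
def trusted_source_ip(xff: str, trusted_hops: int) -> str:
    """Pick the entry that sits `trusted_hops` positions left of the rightmost one.

    Returns "" when the header has too few entries, which the limiter
    treats as "not limited" rather than falling back to a spoofable value.
    """
    entries = [part.strip() for part in xff.split(",") if part.strip()]
    index = len(entries) - 1 - trusted_hops
    if trusted_hops < 0 or index < 0:
        return ""
    return entries[index]


# One trusted proxy (10.0.0.2) in front of Envoy; the client prepended a fake entry.
header = "6.6.6.6, 198.51.100.9, 10.0.0.2"
print(trusted_source_ip(header, 1))  # 198.51.100.9  (real client)
print(trusted_source_ip(header, 2))  # 6.6.6.6       (attacker-chosen value)
```

With the hop count set one too high, the selected entry is whatever the client sent, so every rotated value becomes a fresh bucket.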

Verify

Under the limit, requests pass:

curl -i http://gateway.example.com/v1/items
# HTTP/1.1 200 OK

Exhaust the bucket and the next request is limited:

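for i in $(seq 1 250); do
  curl -s -o /dev/null -w "%{http_code}\n" http://gateway.example.com/v1/items
done | sort | uniq -c
# Illustrative counts (uniq -c prints the count, then the status code). The
# bucket refills at 100 tokens/s while the loop runs, so a slower loop sees
# fewer 429s.
#   200 200
#    50 429   <- ratelimit.exceeded, response carries Retry-After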

In detect mode the 429s show up in detections_total instead of being returned.

Gotchas

Warning: Never key on an attacker-controlled header. A spoofable key lets an attacker mint unlimited fresh buckets and bypass the limit entirely. Key-flood resilience is also coarse: each shard caps at 16384 keys and resets the whole shard map when full, which can let a burst through during a key flood. Prefer key: ip or key: host over key: header unless the header is a credential your edge guarantees.
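The reset-when-full behavior is easiest to see in a toy model. The sketch below keeps only a token count per key and omits refill; `ShardMap` and its `take` method are hypothetical names, and only the cap of 16384 and the "clear the whole map when full" rule come from the text above.

```python
MAX_KEYS_PER_SHARD = 16384


class ShardMap:
    def __init__(self, max_keys: int = MAX_KEYS_PER_SHARD):
        self.max_keys = max_keys
        self.tokens = {}   # key -> remaining tokens
        self.resets = 0

    def take(self, key: str, burst: int) -> bool:
        """Consume one token (no refill); unseen keys start with a full bucket."""
        if key not in self.tokens:
            if len(self.tokens) >= self.max_keys:
                # Shard is full: drop every bucket, including exhausted ones.
                self.tokens.clear()
                self.resets += 1
            self.tokens[key] = burst
        if self.tokens[key] >= 1:
            self.tokens[key] -= 1
            return True
        return False


shard = ShardMap(max_keys=4)                           # tiny cap for the demo
print([shard.take("victim", 2) for _ in range(3)])     # [True, True, False]
for i in range(4):                                     # flood of throwaway keys
    shard.take(f"spoofed-{i}", 2)
print(shard.resets, shard.take("victim", 2))           # 1 True
```

After the flood forces a reset, the previously exhausted key is treated as a first sighting and gets a full burst again, which is the gap the warning describes.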

Warning: Shared state follows policy inheritance. A rate_limit defined at the domain level and inherited by N routes is one combined limiter across all of them (shared buckets). The same block written directly on a route is independent of every other route, and two separately written identical blocks never share state. Define the limit at the scope it should actually apply to; see policy resolution.

  • An empty key (missing IP or absent header) is not limited — by design, but it means key: header only limits requests that carry the header.

Related engines: IP reputation, Bot detection.