Rate Limit
For introduction, see How to Limit Execution Rate.
This is an additional feature of SmallRye Fault Tolerance and is not specified by MicroProfile Fault Tolerance. |
Description
Rate limit enforces a maximum number of permitted invocations in a time window of some length.
For example, with a rate limit, one can make sure that a method may only be called 50 times per minute.
Invocations that would exceed the limit are rejected with an exception of type RateLimitException
.
Additionally, it is possible to define minimum spacing between invocations. For example, with minimum spacing of 1 second, if a second invocation happens 500 millis after the first, it is rejected even if the limit would not be exceeded yet.
Rate limit is superficially similar to a bulkhead (concurrency limit), but is in fact quite different. Bulkhead limits the number of executions happening concurrently at any point in time. Rate limit limits the number of executions in a time window of some length, without considering concurrency.
A method or a class can be annotated with @RateLimit
, which means the method or the methods in the class will apply the rate limit strategy.
The previous example with 50 maximum invocations per minute and minimum spacing of 1 second would look like this:
@RateLimit(value = 50,
window = 1, windowUnit = ChronoUnit.MINUTES,
minSpacing = 1, minSpacingUnit = ChronoUnit.SECONDS)
public void doSomething() {
...
}
Lifecycle
Rate limit needs to maintain some state between invocations: the number of recent invocations, the time stamp of last invocation, and so on.
This state is a singleton, irrespective of the lifecycle of the bean that uses the @RateLimit
annotation.
More specifically, the rate limit state is uniquely identified by the combination of the bean class (java.lang.Class
) and the method object (java.lang.reflect.Method
) representing the guarded method.
For example, if there’s a guarded method doWork
on a bean which is @RequestScoped
, each request will have its own instance of the bean, but all invocations of doWork
will share the same rate limit state.
Interactions with Other Strategies
See How to Use Multiple Strategies for an overview of how fault tolerance strategies are nested.
If @Fallback
is used with @RateLimit
, the fallback method or handler may be invoked if a RateLimitException
is thrown, depending on the fallback configuration.
If @Retry
is used with @RateLimit
, each retry attempt is processed by the rate limit as an independent invocation.
If RateLimitException
is thrown, the execution may be retried, depending on how retry is configured.
If @CircuitBreaker
is used with @RateLimit
, the circuit breaker is checked before enforcing the rate limit.
If rate limiting results in RateLimitException
, this may be counted as a failure, depending on how the circuit breaker is configured.
Configuration
There are 6 configuration options, corresponding to the 6 members of the @RateLimit
annotation.
minSpacing
+ minSpacingUnit
Type: long
+ ChronoUnit
Default: 0 seconds
The minimum spacing between two consecutive invocations.
type
Type: RateLimitType
Default: RateLimitType.FIXED
The type of time window used for rate limiting. There are 3 types of time windows used for rate limiting: fixed, rolling and smooth.
Fixed time windows are a result of dividing time into non-overlapping intervals of given length. The invocation limit is enforced for each interval independently. This means that short bursts of invocations occuring near the time window boundaries may temporarily exceed the configured rate limit. This kind of rate limiting is also called fixed window rate limiting.
Rolling time windows enforce the limit continuously, instead of dividing time into independent intervals. The invocation limit is enforced for all possible time intervals of given length, regardless of overlap. This is more precise, but requires more memory and may be slower. This kind of rate limiting is also called sliding log rate limiting.
Smooth time windows enforce a uniform distribution of invocations under a rate calculated from given time window length and given limit. If recent rate of invocations is under the limit, a subsequent burst of invocations is allowed during a shorter time span, but the calculated rate is never exceeded. This kind of rate limiting is also called token bucket or leaky bucket (as a meter) rate limiting, with the additional property that all work units are considered to have the same size.
Example usage:
@RateLimit(value = 50,
window = 1, windowUnit = ChronoUnit.MINUTES,
minSpacing = 1, minSpacingUnit = ChronoUnit.SECONDS,
type = RateLimitType.ROLLING)
public void doSomething() {
...
}
Metrics
Rate limit exposes the following metrics:
Name |
|
Type |
|
Unit |
None |
Description |
The number of times the rate limit logic was run. This is usually once per method call, but may be zero times if the circuit breaker prevented execution or more than once if the method call was retried. |
Tags |
|
See the Metrics reference guide for general metrics information.