Least Response Time Load Balancing

The least-response-time load balancing strategy collects response times of the calls made with service instances and picks an instance based on this information.

Erroneous responses are treated as responses with a long response time, by default 60 seconds. This can be controlled with the error-penalty attribute.

The algorithm for service instance selection is as follows:

if there is a service instance that wasn't used before - use it, otherwise:
if there are any service instances with collected response times - select the one for which score is the lowest, otherwise:
select a random instance

The score for an instance decreases in time if an instance is not used. This way we ensure that instances that haven't been used in a long time, are retried.

For the details on the score calculation, see Score calculation

Dependency

To use this load balancer, start with adding the least-response-time load-balancer dependency to your project:

<dependency>
    <groupId>io.smallrye.stork</groupId>
    <artifactId>stork-load-balancer-least-response-time</artifactId>
    <version>1.1.1</version>
</dependency>

Configuration

For each service expected to use a least-response-time selection, configure the load-balancer to be least-response-time:

stork standalonestork in quarkus

stork.my-service.service-discovery.type=...
stork.my-service.service-discovery...=...
stork.my-service.load-balancer.type=least-response-time

quarkus.stork.my-service.service-discovery.type=...
quarkus.stork.my-service.service-discovery...=...
quarkus.stork.my-service.load-balancer.type=least-response-time

The following attributes are supported:

Attribute	Mandatory	Default Value	Description
`declining-factor`	No	`0.9`	How much score should decline in time, see Score calculation in the docs for details.
`error-penalty`	No	`60s`	This load balancer treats an erroneous response as a response after this time.
`use-secure-random`	No	`false`	Whether the load balancer should use a SecureRandom instead of a Random (default). Check this page to understand the difference

Score calculation

The score of a service instance is calculated by dividing a weighted sum of response times by sum of the weighs. The result is additionally adjusted to account for instances that haven't been used for a long time.

Let:

$n$ denote how many instance selections were made so far
$t_i$ denote the response time for call $i$
$n_i$ denote the number of instance selections done until the moment of recording the response time for call $i$
$n_{max}$ denote the number of instance selections at the moment of last call recorded with this instance
$\delta$ denote a configurable declining-factor

The idea for the weight is to decrease the importance of the data collected long time (many calls) ago. For call $i$, the weight is calculated as follows: $$ w_i = \delta ^ {(n - n_i)} $$

The score of a service instance is calculated as: $$ score(n) = \delta^{n - n_{max}} * \frac{\sum_i t_i * w_i}{\sum_i w_i} = \delta^{n - n_{max}} * \frac{\sum_i t_i * \delta^{n - n_i}}{\sum_i \delta^{n - n_i}} $$

The declining-factor should be in $(0, 1]$ , the default is $0.9$. Using a lower value makes the older response times less important.