Receiving Kafka Records
The Kafka Connector retrieves Kafka Records from Kafka Brokers and maps each of them to Reactive Messaging Messages.
Example
Let’s imagine you have a Kafka broker running, and accessible using the kafka:9092 address (by default it would use localhost:9092).
Configure your application to receive Kafka records from a Kafka topic on the prices channel as follows:
- Configure the broker location. You can configure it globally or per channel.
- Configure the connector to manage the prices channel.
- Set the (Kafka) deserializer to read the record’s value.
- Make sure that we can receive from more than one consumer (see KafkaPriceConsumer and KafkaPriceMessageConsumer below).
Note
You don’t need to set the Kafka topic. By default, it uses the channel name (prices). You can configure the topic attribute to override it.
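As a minimal sketch, the four settings listed above and the topic default can be written out as follows. The connector name smallrye-kafka and the DoubleDeserializer class are assumptions, since the text above does not name them, and the PricesChannelConfig class with its topicFor helper is purely illustrative: the connector reads the configuration itself at runtime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative helper only; not part of the connector.
final class PricesChannelConfig {

    // The four settings, written as they could appear in a properties file.
    // Assumptions: connector name and deserializer class.
    static final String CONFIG = String.join("\n",
            "kafka.bootstrap.servers=kafka:9092",
            "mp.messaging.incoming.prices.connector=smallrye-kafka",
            "mp.messaging.incoming.prices.value.deserializer=org.apache.kafka.common.serialization.DoubleDeserializer",
            "mp.messaging.incoming.prices.broadcast=true");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    // The topic attribute wins; otherwise the channel name is used as the topic.
    static String topicFor(Properties props, String channel) {
        return props.getProperty("mp.messaging.incoming." + channel + ".topic", channel);
    }
}
```

With this configuration, topicFor(props, "prices") yields prices until a mp.messaging.incoming.prices.topic entry is added.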
Then, your application receives Message<Double>. You can consume the payload directly or retrieve the full Message<Double>.
Deserialization
The deserialization is handled by the underlying Kafka Client. You need to configure:
- mp.messaging.incoming.[channel-name].value.deserializer to configure the value deserializer (mandatory)
- mp.messaging.incoming.[channel-name].key.deserializer to configure the key deserializer (optional, defaults to String)
If you want to use a custom deserializer, add it to your CLASSPATH and configure the associated attribute.
In addition, the Kafka Connector also provides a set of message converters. So you can receive payloads representing records from Kafka using:
- Record - a key/value pair
- ConsumerRecord - a structure representing the record with all its metadata
Inbound Metadata
Messages coming from Kafka contain an instance of IncomingKafkaRecordMetadata in the metadata. It provides the key, topic, partition, headers and so on.
Acknowledgement
When a message produced from a Kafka record is acknowledged, the connector invokes a commit strategy. These strategies decide when the consumer offset for a specific topic/partition is committed. Committing an offset indicates that all previous records have been processed. It is also the position where the application would restart the processing after a crash recovery or a restart.
Committing every offset has performance penalties as Kafka offset management can be slow. However, not committing the offset often enough may lead to message duplication if the application crashes between two commits.
The Kafka connector supports three strategies:
- throttled keeps track of received messages and commits the next offset after the latest acked message in sequence. This strategy guarantees at-least-once delivery even if the channel performs asynchronous processing. The connector tracks the received records and periodically (period specified by auto.commit.interval.ms (default: 5000)) commits the highest consecutive offset. The connector will be marked as unhealthy if a message associated with a record is not acknowledged in throttled.unprocessed-record-max-age.ms (default: 60000). Indeed, this strategy cannot commit the offset as soon as a single record processing fails (see failure-strategy to configure what happens on failing processing). If throttled.unprocessed-record-max-age.ms is set to less than or equal to 0, it does not perform any health check verification. Such a setting might lead to running out of memory if there are poison pill messages. This strategy is the default if enable.auto.commit is not explicitly set to true.
- latest commits the record offset received by the Kafka consumer as soon as the associated message is acknowledged (if the offset is higher than the previously committed offset). This strategy provides at-least-once delivery if the channel processes the message without performing any asynchronous processing. This strategy should not be used under high load as offset commit is expensive. However, it reduces the risk of duplicates.
- ignore performs no commit. This strategy is the default strategy when the consumer is explicitly configured with enable.auto.commit set to true. It delegates the offset commit to the Kafka client. When enable.auto.commit is true this strategy DOES NOT guarantee at-least-once delivery. However, if the processing fails between two commits, messages received after the commit and before the failure will be re-processed.
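The bookkeeping behind the throttled strategy ("commit the highest consecutive offset") can be sketched in plain Java as below. This is a simplified, single-partition illustration under stated assumptions, not the connector's implementation: the ThrottledOffsetTracker class and its method names are hypothetical, and the periodic timer and health check are left out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, single-partition sketch of "highest consecutive offset" tracking.
final class ThrottledOffsetTracker {
    private final Set<Long> acked = new HashSet<>();
    private long nextOffset; // offset the consumer would resume from after a restart

    ThrottledOffsetTracker(long firstReceivedOffset) {
        this.nextOffset = firstReceivedOffset;
    }

    // Called when the message built from the record at this offset is acknowledged.
    void ack(long offset) {
        if (offset >= nextOffset) {
            acked.add(offset);
        }
    }

    // Called periodically (the connector uses auto.commit.interval.ms for the period).
    // Returns the offset to commit, or -1 when no new consecutive record was acknowledged.
    long offsetToCommit() {
        long previous = nextOffset;
        while (acked.remove(nextOffset)) {
            nextOffset++;
        }
        return nextOffset > previous ? nextOffset : -1;
    }
}
```

If offsets 0, 1 and 3 are acknowledged, only the run 0..1 is consecutive, so the sketch commits 2 and keeps 3 pending until 2 is acknowledged. This is why a single unacknowledged record blocks further commits.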
Important
The Kafka connector disables the Kafka auto commit if not explicitly enabled. This behavior differs from the traditional Kafka consumer.
If high throughput is important for you, and not limited by the downstream, we recommend either:
- using the throttled policy
- or setting enable.auto.commit to true and annotating the consuming method with @Acknowledgment(Acknowledgment.Strategy.NONE)
Failure Management
If a message produced from a Kafka record is nacked, a failure strategy is applied. The Kafka connector supports 3 strategies:
- fail - fail the application; no more records will be processed (default). The offset of the record that has not been processed correctly is not committed.
- ignore - the failure is logged, but the processing continues. The offset of the record that has not been processed correctly is committed.
- dead-letter-queue - the offset of the record that has not been processed correctly is committed, but the record is written to a (Kafka) dead letter queue topic.
The strategy is selected using the failure-strategy attribute.
In the case of dead-letter-queue, you can configure the following attributes:
- dead-letter-queue.topic: the topic to use to write the records not processed correctly, default is dead-letter-topic-$channel, with $channel being the name of the channel.
- dead-letter-queue.producer-client-id: the client id used by the Kafka producer when sending records to the dead letter queue topic. If not specified, it defaults to kafka-dead-letter-topic-producer-$client-id, with $client-id being the value obtained from the consumer client id.
- dead-letter-queue.key.serializer: the serializer used to write the record key on the dead letter queue. By default, it deduces the serializer from the key deserializer.
- dead-letter-queue.value.serializer: the serializer used to write the record value on the dead letter queue. By default, it deduces the serializer from the value deserializer.
The record written on the dead letter topic contains the original record’s headers, as well as a set of additional headers about the original record:
- dead-letter-reason - the reason of the failure (the Throwable passed to nack())
- dead-letter-cause - the cause of the failure (the getCause() of the Throwable passed to nack()), if any
- dead-letter-topic - the original topic of the record
- dead-letter-partition - the original partition of the record (integer mapped to String)
- dead-letter-offset - the original offset of the record (long mapped to String)
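The additional headers listed above can be pictured with the following plain-Java sketch. It assumes that the reason and cause are written as the exception messages, which the text above does not specify, and the DeadLetterHeaders class is a hypothetical illustration rather than connector code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the dead-letter-* headers.
final class DeadLetterHeaders {

    // Default dead letter topic name for a channel.
    static String defaultTopic(String channel) {
        return "dead-letter-topic-" + channel;
    }

    // Assumption: reason and cause are rendered as the exception messages.
    static Map<String, String> build(Throwable failure, String topic, int partition, long offset) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("dead-letter-reason", String.valueOf(failure.getMessage()));
        if (failure.getCause() != null) {
            // The cause header is only present when the failure has a cause.
            headers.put("dead-letter-cause", String.valueOf(failure.getCause().getMessage()));
        }
        headers.put("dead-letter-topic", topic);
        headers.put("dead-letter-partition", Integer.toString(partition)); // integer mapped to String
        headers.put("dead-letter-offset", Long.toString(offset));          // long mapped to String
        return headers;
    }
}
```

In the connector these headers are added to the original record's headers before the record is written to the dead letter topic.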
When using dead-letter-queue, it is also possible to change some metadata of the record that is sent to the dead letter topic. To do that, use the Message.nack(Throwable, Metadata) method.
The Metadata may contain an instance of OutgoingKafkaRecordMetadata. If the instance is present, the following properties will be used:
- key; if not present, the original record’s key will be used
- topic; if not present, the configured dead letter topic will be used
- partition; if not present, partition will be assigned automatically
- headers; combined with the original record’s headers, as well as the dead-letter-* headers described above
Retrying processing
You can combine Reactive Messaging with SmallRye Fault Tolerance, and retry processing when it fails. You can configure the delay, the number of retries, the jitter, and so on.
If your method returns a Uni, you need to add the @NonBlocking annotation.
The incoming messages are acknowledged only once the processing completes successfully. So, the connector commits the offset after the successful processing. If the processing still fails after the retries, the message is nacked and the failure strategy is applied.
You can also use @Retry on methods only consuming incoming messages.
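The acknowledgement semantics described above (ack only after a successful attempt, nack once the retries are exhausted) can be illustrated with a plain-Java loop. This sketch deliberately does not use SmallRye Fault Tolerance; the RetryThenNack class, its Outcome enum and the maxRetries parameter are hypothetical, and delays and jitter are omitted.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration of "retry, then ack or nack"; not the @Retry implementation.
final class RetryThenNack {
    enum Outcome { ACK, NACK }

    // Runs the processing once, then up to maxRetries more times if it throws.
    static Outcome process(Callable<?> processing, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                processing.call();
                return Outcome.ACK; // success: the message is acknowledged and the offset can be committed
            } catch (Exception e) {
                // failure: loop again unless the retries are exhausted (a real setup would also wait here)
            }
        }
        return Outcome.NACK; // retries exhausted: the failure strategy applies
    }
}
```

A processing step that fails twice and then succeeds ends with ACK when maxRetries is at least 2; one that always fails ends with NACK after maxRetries + 1 attempts.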
Handling deserialization failures
Because deserialization happens before creating a Message, the failure strategy presented above cannot be applied. However, when a deserialization failure occurs, you can intercept it and provide a fallback value. To achieve this, create a CDI bean implementing the DeserializationFailureHandler interface.
The bean must be exposed with the @Identifier qualifier specifying the name of the bean. Then, in the connector configuration, specify the following attribute:
- mp.messaging.incoming.$channel.key-deserialization-failure-handler: name of the bean handling deserialization failures happening for the record’s key
- mp.messaging.incoming.$channel.value-deserialization-failure-handler: name of the bean handling deserialization failures happening for the record’s value
The handler is called with the deserialization action as a Uni<T>, the record’s topic, a boolean indicating whether the failure happened on a key, the class name of the deserializer that throws the exception, the corrupted data, the exception, and the record’s headers augmented with headers describing the failure (which eases the write to a dead letter). On the deserialization Uni, failure strategies like retry, providing a fallback value or applying a timeout can be implemented. Note that the method must await on the result and return the deserialized object. Alternatively, the handler can only implement the handleDeserializationFailure method and provide a fallback value, which may be null.
If you don’t configure a deserialization failure handler and a deserialization failure happens, the application is marked unhealthy. You can also ignore the failure, which will log the exception and produce a null value. To enable this behavior, set the mp.messaging.incoming.$channel.fail-on-deserialization-failure attribute to false.
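The three outcomes described in this section (fallback from a handler, null when failures are ignored, reported failure otherwise) can be summarized in a plain-Java sketch. The DeserializationFallback class is hypothetical and stands in for the connector's logic; in particular the fallback function is only a simplified stand-in for a DeserializationFailureHandler bean, and rethrowing the exception stands in for marking the application unhealthy.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical decision logic; the connector wires this through CDI beans and configuration.
final class DeserializationFallback {

    // 'fallback' may be null, meaning no failure handler is configured.
    static <T> T deserialize(byte[] data, Function<byte[], T> deserializer,
                             Function<byte[], T> fallback, boolean failOnDeserializationFailure) {
        try {
            return deserializer.apply(data);
        } catch (RuntimeException e) {
            if (fallback != null) {
                return fallback.apply(data); // handler configured: use its fallback value
            }
            if (!failOnDeserializationFailure) {
                return null;                 // failure ignored: a null value is produced
            }
            throw e;                         // no handler: the failure is reported
        }
    }

    // Example deserializer: reads a double from its UTF-8 text form.
    static Double parseDouble(byte[] data) {
        return Double.valueOf(new String(data, StandardCharsets.UTF_8));
    }
}
```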
Receiving Cloud Events
The Kafka connector supports Cloud Events.
When the connector detects a structured or binary Cloud Event, it adds an IncomingKafkaCloudEventMetadata to the metadata of the Message. IncomingKafkaCloudEventMetadata contains the various (mandatory and optional) Cloud Event attributes.
If the connector cannot extract the Cloud Event metadata, it sends the Message without the metadata.
Binary Cloud Events
For binary Cloud Events, all mandatory Cloud Event attributes must be set in the record header, prefixed by ce_ (as mandated by the protocol binding).
The connector considers headers starting with the ce_ prefix but not listed in the specification as extensions. You can access them using the getExtension method from IncomingKafkaCloudEventMetadata. You can retrieve them as String.
The datacontenttype attribute is mapped to the content-type header of the record. The partitionkey attribute is mapped to the record’s key, if any.
Note that all headers are read as UTF-8.
With binary Cloud Events, the record’s key and value can use any deserializer.
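The header rules for binary Cloud Events can be sketched as follows. The attribute names come from the CloudEvents 1.0 specification; the BinaryCloudEventHeaders class and its methods are hypothetical and only illustrate how ce_ headers split into specification attributes and extensions.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the ce_ header rules; not the connector's code.
final class BinaryCloudEventHeaders {
    // Context attributes defined by the CloudEvents 1.0 specification.
    static final Set<String> SPEC_ATTRIBUTES = Set.of(
            "id", "source", "specversion", "type",
            "datacontenttype", "dataschema", "subject", "time");

    // All mandatory attributes must be present as ce_-prefixed headers.
    static boolean hasMandatoryAttributes(Map<String, byte[]> recordHeaders) {
        return recordHeaders.containsKey("ce_id")
                && recordHeaders.containsKey("ce_source")
                && recordHeaders.containsKey("ce_specversion")
                && recordHeaders.containsKey("ce_type");
    }

    // ce_ headers that are not specification attributes are treated as extensions.
    static Map<String, String> extensions(Map<String, byte[]> recordHeaders) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> header : recordHeaders.entrySet()) {
            String name = header.getKey();
            if (name.startsWith("ce_")) {
                String attribute = name.substring("ce_".length());
                if (!SPEC_ATTRIBUTES.contains(attribute)) {
                    // Header values are read as UTF-8 strings.
                    result.put(attribute, new String(header.getValue(), StandardCharsets.UTF_8));
                }
            }
        }
        return result;
    }
}
```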
Structured Cloud Events
For structured Cloud Events, the event is encoded in the record’s value. Only JSON is supported, so your event must be encoded as JSON in the record’s value.
Structured Cloud Events must set the content-type header of the record to application/cloudevents or prefix the value with application/cloudevents, such as: application/cloudevents+json; charset=UTF-8.
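The content-type rule above amounts to a prefix check, sketched below. The StructuredCloudEventCheck class is hypothetical, and the sketch assumes a plain case-sensitive prefix match; the connector may normalize the header value differently.

```java
// Hypothetical illustration of the content-type rule for structured Cloud Events.
final class StructuredCloudEventCheck {

    // Assumption: a case-sensitive prefix match on the content-type header value.
    static boolean isStructured(String contentType) {
        return contentType != null && contentType.startsWith("application/cloudevents");
    }
}
```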
To receive structured Cloud Events, your value deserializer must be one of:
- org.apache.kafka.common.serialization.StringDeserializer
- org.apache.kafka.common.serialization.ByteArrayDeserializer
- io.vertx.kafka.client.serialization.JsonObjectDeserializer
As mentioned previously, the value must be a valid JSON object containing at least all the mandatory Cloud Events attributes.
If the record is a structured Cloud Event, the created Message’s payload is the Cloud Event data.
The partitionkey attribute is mapped to the record’s key, if any.
Consumer Rebalance Listener
To handle offset commit and assigned partitions yourself, you can provide a consumer rebalance listener. To achieve this, implement the io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener interface, make the implementing class a bean, and add the @Identifier qualifier. A usual use case is to store offsets in a separate data store to implement exactly-once semantics, or to start the processing at a specific offset.
The listener is invoked every time the consumer topic/partition assignment changes. For example, when the application starts, it invokes the partitionsAssigned callback with the initial set of topics/partitions associated with the consumer. If, later, this set changes, it calls the partitionsRevoked and partitionsAssigned callbacks again, so you can implement custom logic.
Note that the rebalance listener methods are called from the Kafka polling thread and must block the caller thread until completion. That’s because the rebalance protocol has synchronization barriers, and asynchronous code in a rebalance listener may be executed after the synchronization barrier.
When topics/partitions are assigned to or revoked from a consumer, the connector pauses the message delivery and resumes it once the rebalance completes.
If the rebalance listener handles offset commit on behalf of the user (using the ignore commit strategy), the rebalance listener must commit the offset synchronously in the partitionsRevoked callback. We also recommend applying the same logic when the application stops.
Unlike the ConsumerRebalanceListener from Apache Kafka, the io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener methods pass the Kafka Consumer and the set of topics/partitions.
Example
In this example we set up a consumer that always starts on messages from at most 10 minutes ago (or offset 0). First we need to provide a bean that implements the io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener interface and is annotated with @Identifier. We then must configure our inbound connector to use this named bean.
To configure the inbound connector to use the provided listener we either set the consumer rebalance listener’s name:
mp.messaging.incoming.rebalanced-example.consumer-rebalance-listener.name=rebalanced-example.rebalancer
Or have the listener’s name be the same as the group id:
mp.messaging.incoming.rebalanced-example.group.id=rebalanced-example.rebalancer
Setting the consumer rebalance listener’s name takes precedence over using the group id.
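The precedence rule between the two attributes can be sketched as a small lookup. The RebalanceListenerName class is hypothetical and works on a plain map of channel attributes; it does not model the generated group id used when group.id is not set.

```java
import java.util.Map;

// Hypothetical illustration of how the listener bean name is chosen.
final class RebalanceListenerName {

    // Returns the bean name to look up, or null when neither attribute is set.
    static String resolve(Map<String, String> channelConfig) {
        String explicit = channelConfig.get("consumer-rebalance-listener.name");
        if (explicit != null) {
            return explicit; // the explicit listener name takes precedence
        }
        return channelConfig.get("group.id"); // otherwise fall back to the group id
    }
}
```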
Receiving Kafka Records in Batches
By default, incoming methods receive each Kafka record individually. Under the hood, Kafka consumer clients poll the broker constantly and receive records in batches, presented inside the ConsumerRecords container.
In batch mode, your application can receive all the records returned by the consumer poll in one go.
To achieve this you need to set mp.messaging.incoming.$channel.batch=true and specify a compatible container type to receive all the data.
The incoming method can also receive Message<List<Payload>>, KafkaRecordBatch<Payload> or ConsumerRecords<Key, Payload> types. They give access to record details such as offset or timestamp.
Note that the successful processing of the incoming record batch will commit the latest offsets for each partition received inside the batch. The configured commit strategy will be applied for these records only.
Conversely, if the processing throws an exception, all messages are nacked, applying the failure strategy for all the records inside the batch.
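To make "the latest offsets for each partition received inside the batch" concrete, the following plain-Java sketch computes them for a list of record positions. The BatchOffsets class and its Position type are hypothetical stand-ins for Kafka records, not connector types.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the per-partition offsets considered after a batch succeeds.
final class BatchOffsets {

    // Minimal stand-in for the position of a Kafka record.
    static final class Position {
        final int partition;
        final long offset;

        Position(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Latest (highest) offset seen for each partition in the batch.
    static Map<Integer, Long> latestPerPartition(List<Position> batch) {
        Map<Integer, Long> latest = new TreeMap<>();
        for (Position p : batch) {
            latest.merge(p.partition, p.offset, Long::max);
        }
        return latest;
    }
}
```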
Configuration Reference
Attribute (alias) | Description | Type | Mandatory | Default |
---|---|---|---|---|
auto.offset.reset | What to do when there is no initial offset in Kafka. Accepted values are earliest, latest and none | string | false | latest |
batch | Whether the Kafka records are consumed in batch. The channel injection point must consume a compatible type, such as List<Payload> or KafkaRecordBatch<Payload>. | boolean | false | false |
bootstrap.servers (kafka.bootstrap.servers) | A comma-separated list of host:port to use for establishing the initial connection to the Kafka cluster. | string | false | localhost:9092 |
broadcast | Whether the Kafka records should be dispatched to multiple consumers | boolean | false | false |
cloud-events | Enables (default) or disables the Cloud Event support. If enabled on an incoming channel, the connector analyzes the incoming records and tries to create Cloud Event metadata. If enabled on an outgoing channel, the connector sends the outgoing messages as Cloud Events if the message includes Cloud Event Metadata. | boolean | false | true |
commit-strategy | Specify the commit strategy to apply when a message produced from a record is acknowledged. Values can be latest, ignore or throttled. If enable.auto.commit is true then the default is ignore otherwise it is throttled | string | false | |
consumer-rebalance-listener.name | The name set in @Identifier of a bean that implements io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener. If set, this rebalance listener is applied to the consumer. | string | false | |
dead-letter-queue.key.serializer | When the failure-strategy is set to dead-letter-queue indicates the key serializer to use. If not set the serializer associated to the key deserializer is used | string | false | |
dead-letter-queue.producer-client-id | When the failure-strategy is set to dead-letter-queue indicates what client id the generated producer should use. Default is kafka-dead-letter-topic-producer-$client-id | string | false | |
dead-letter-queue.topic | When the failure-strategy is set to dead-letter-queue indicates on which topic the record is sent. Default is dead-letter-topic-$channel | string | false | |
dead-letter-queue.value.serializer | When the failure-strategy is set to dead-letter-queue indicates the value serializer to use. If not set the serializer associated to the value deserializer is used | string | false | |
enable.auto.commit | If enabled, consumer's offset will be periodically committed in the background by the underlying Kafka client, ignoring the actual processing outcome of the records. It is recommended to NOT enable this setting and let Reactive Messaging handle the commit. | boolean | false | false |
fail-on-deserialization-failure | When no deserialization failure handler is set and a deserialization failure happens, report the failure and mark the application as unhealthy. If set to false and a deserialization failure happens, a null value is forwarded. | boolean | false | true |
failure-strategy | Specify the failure strategy to apply when a message produced from a record is acknowledged negatively (nack). Values can be fail (default), ignore, or dead-letter-queue | string | false | fail |
fetch.min.bytes | The minimum amount of data the server should return for a fetch request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. | int | false | 1 |
graceful-shutdown | Whether or not a graceful shutdown should be attempted when the application terminates. | boolean | false | true |
group.id | A unique string that identifies the consumer group the application belongs to. If not set, a unique, generated id is used | string | false | |
health-enabled | Whether health reporting is enabled (default) or disabled | boolean | false | true |
health-readiness-enabled | Whether readiness health reporting is enabled (default) or disabled | boolean | false | true |
health-readiness-timeout | deprecated - During the readiness health check, the connector connects to the broker and retrieves the list of topics. This attribute specifies the maximum duration (in ms) for the retrieval. If exceeded, the channel is considered not-ready. Deprecated: Use 'health-topic-verification-timeout' instead. | long | false | |
health-readiness-topic-verification | deprecated - Whether the readiness check should verify that topics exist on the broker. Default to false. Enabling it requires an admin connection. Deprecated: Use 'health-topic-verification-enabled' instead. | boolean | false | |
health-topic-verification-enabled | Whether the startup and readiness check should verify that topics exist on the broker. Default to false. Enabling it requires an admin client connection. | boolean | false | false |
health-topic-verification-timeout | During the startup and readiness health check, the connector connects to the broker and retrieves the list of topics. This attribute specifies the maximum duration (in ms) for the retrieval. If exceeded, the channel is considered not-ready. | long | false | 2000 |
kafka-configuration | Identifier of a CDI bean that provides the default Kafka consumer/producer configuration for this channel. The channel configuration can still override any attribute. The bean must have a type of Map | string | false | |
key-deserialization-failure-handler | The name set in @Identifier of a bean that implements io.smallrye.reactive.messaging.kafka.DeserializationFailureHandler. If set, deserialization failures happening when deserializing keys are delegated to this handler which may retry or provide a fallback value. | string | false | |
key.deserializer | The deserializer classname used to deserialize the record's key | string | false | org.apache.kafka.common.serialization.StringDeserializer |
max-queue-size-factor | Multiplier factor to determine maximum number of records queued for processing, using max.poll.records * max-queue-size-factor. Defaults to 2. In batch mode max.poll.records is considered 1. | int | false | 2 |
partitions | The number of partitions to be consumed concurrently. The connector creates the specified amount of Kafka consumers. It should match the number of partitions of the targeted topic | int | false | 1 |
pattern | Indicate that the topic property is a regular expression. Must be used with the topic property. Cannot be used with the topics property | boolean | false | false |
pause-if-no-requests | Whether the polling must be paused when the application does not request items and resume when it does. This allows implementing back-pressure based on the application capacity. Note that polling is not stopped, but will not retrieve any records when paused. | boolean | false | true |
poll-timeout | The polling timeout in milliseconds. When polling records, the poll will wait at most that duration before returning records. Default is 1000ms | int | false | 1000 |
requests | When partitions is greater than 1, this attribute allows configuring how many records are requested by each consumer every time. | int | false | 128 |
retry | Whether or not the connection to the broker is re-attempted in case of failure | boolean | false | true |
retry-attempts | The maximum number of reconnection before failing. -1 means infinite retry | int | false | -1 |
retry-max-wait | The max delay (in seconds) between 2 reconnects | int | false | 30 |
throttled.unprocessed-record-max-age.ms | While using the throttled commit-strategy, specify the max age in milliseconds that an unprocessed message can be before the connector is marked as unhealthy. Setting this attribute to 0 disables this monitoring. | int | false | 60000 |
topic | The consumed / populated Kafka topic. If neither this property nor the topics properties are set, the channel name is used | string | false | |
topics | A comma-separated list of topics to be consumed. Cannot be used with the topic or pattern properties | string | false | |
tracing-enabled | Whether tracing is enabled (default) or disabled | boolean | false | true |
value-deserialization-failure-handler | The name set in @Identifier of a bean that implements io.smallrye.reactive.messaging.kafka.DeserializationFailureHandler. If set, deserialization failures happening when deserializing values are delegated to this handler which may retry or provide a fallback value. | string | false | |
value.deserializer | The deserializer classname used to deserialize the record's value | string | true | |
You can also pass any property supported by the underlying Kafka consumer.
For example, to configure the max.poll.records property, set the mp.messaging.incoming.[channel].max.poll.records attribute.
Some consumer client properties are configured to sensible default values:
If not set, reconnect.backoff.max.ms is set to 10000 to avoid high load on disconnection.
If not set, key.deserializer is set to org.apache.kafka.common.serialization.StringDeserializer.
The consumer client.id is configured according to the number of clients to create using the mp.messaging.incoming.[channel].partitions property.
- If a client.id is provided, it is used as-is or suffixed with the client index if the partitions property is set.
- If a client.id is not provided, it is generated as kafka-consumer-[channel][-index].
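The naming rules above can be sketched as a small function. Two points are assumptions rather than statements from the text: the suffix is taken to be a dash followed by the client index, and it is only added when more than one client is created (partitions greater than 1). The ConsumerClientIds class is hypothetical.

```java
// Hypothetical illustration of the client.id naming rules.
final class ConsumerClientIds {

    // Assumptions: suffix format "-<index>", added only when partitions > 1.
    static String clientId(String configuredClientId, String channel, int partitions, int index) {
        String base = configuredClientId != null
                ? configuredClientId            // provided client.id is used as the base
                : "kafka-consumer-" + channel;  // otherwise the id is generated from the channel name
        return partitions > 1 ? base + "-" + index : base;
    }
}
```

Under these assumptions, a prices channel with partitions set to 3 and no client.id would create clients named kafka-consumer-prices-0, kafka-consumer-prices-1 and kafka-consumer-prices-2.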