Kafka : Real-Time Data Streaming in Spring Boot (KRaft)

Published on: 11 July 2023

An open-source distributed streaming platform, Kafka enables real-time processing of large volumes of data.

Its design lets it handle data streams in a scalable, fault-tolerant, and durable way, so applications can publish and subscribe to streams of records much as they would with a message queue or enterprise messaging system. It is optimized for high-throughput, low-latency data streaming, which makes it well suited to use cases such as real-time analytics, data processing, and event-driven architectures.

Kafka is based on a distributed publish-subscribe model: it organizes data into topics, splits each topic into partitions, and spreads those partitions across multiple nodes in a cluster. This gives Kafka horizontal scalability, fault tolerance, and high availability. It also provides message retention, data replication, and client libraries for many programming languages. In this post we use Java with Spring Boot.
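To make the publish-subscribe idea concrete, here is a minimal sketch in plain Java. It is not Kafka code: `TopicLogSketch` is a hypothetical in-memory stand-in for a single-partition topic, assumed only for illustration. It shows the property that sets a log apart from a classic queue: reading does not remove records, so several subscribers can read the same data at their own pace.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory stand-in for a single-partition topic (hypothetical, not a Kafka API). */
final class TopicLogSketch {
    private final List<String> log = new ArrayList<>();

    /** Producer side: append a record and return the offset it was stored at. */
    int publish(String record) {
        log.add(record);
        return log.size() - 1;
    }

    /** Consumer side: read from an offset onward; reading never removes records. */
    List<String> readFrom(int offset) {
        return new ArrayList<>(log.subList(offset, log.size()));
    }

    public static void main(String[] args) {
        TopicLogSketch orders = new TopicLogSketch();
        orders.publish("order-1");
        orders.publish("order-2");
        // Each subscriber keeps its own offset, so both can read the same records.
        System.out.println("analytics reads " + orders.readFrom(0)); // [order-1, order-2]
        System.out.println("billing reads " + orders.readFrom(1));   // [order-2]
    }
}
```

The producer never learns who reads the records, and a new subscriber can be added later without touching the producer.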

When to use Kafka ?

Some common use cases of Kafka include :

  1. Messaging: Applications can use Kafka as a messaging system to decouple from one another and communicate asynchronously. They produce to and consume from Kafka topics without being aware of each other’s existence.
  2. Log aggregation: It can collect and store log data from multiple sources, such as web servers, application servers, and databases. This enables centralized log management, analysis, and monitoring.
  3. Real-time processing: Its fast and scalable architecture makes it well suited to real-time processing of large volumes of data. Kafka Streams, a Kafka client library, can process data streams in real time and generate insights and analytics.
  4. Event-driven architectures: Companies can use it as the backbone of event-driven architectures, where events trigger actions. Its support for pub-sub messaging and distributed processing makes it a strong platform for building event-driven systems.
  5. Microservices: It enables communication between microservices in a distributed system. Each microservice can publish and consume messages from Kafka topics, making it easy to build loosely coupled and scalable architectures.

Implementing Kafka in Spring Boot project :

Before starting the implementation, let’s cover some basics.

Earlier versions of Kafka relied on ZooKeeper to manage cluster metadata and coordination. However, managing ZooKeeper can be complex and time-consuming, and the ZooKeeper service has to be started separately before Kafka can run. These drawbacks led to the development of KRaft.

With KRaft (Kafka Raft), Kafka nodes coordinate with each other directly through a built-in consensus protocol, eliminating the need for a separate ZooKeeper cluster. This simplifies the deployment and management of Kafka clusters and improves their stability and availability.

Some benefits of using KRaft include:

  1. Simplified deployment
  2. Improved stability and availability
  3. Reduced maintenance
  4. Lower operational costs

You can read more about KRaft on the Confluent developer site: https://developer.confluent.io/learn/kraft/

Setting up system for development :

  1. Download Kafka from the official download site: https://dlcdn.apache.org/kafka/
  2. Extract the downloaded archive.
  3. Start the Kafka environment. In KRaft mode this means generating a cluster ID, formatting the storage directory, and then starting the server, as described in the quickstart.

Kafka is now ready to use in our project.

More about the setup can be read in the official quickstart: https://kafka.apache.org/quickstart#quickstart_download

Creating a Spring-Boot project :

NOTE : Make sure you use Java 8 or later.

Controller : Endpoint for sending messages to our Kafka server.

KafkaTopicConfig : Configuration class used to create a topic with its name, partition count, and replication factor.

Note : We explain these terms later in this post.
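As a rough illustration of what a topic configuration has to get right, the sketch below models the three settings in plain Java and checks them. In a real Spring Boot project the topic would typically be declared as a `NewTopic` bean (for example through Spring Kafka's `TopicBuilder`); the class `TopicSpecSketch` and its `validate` method are hypothetical names used only here. The checks assume Kafka's documented rules: topic names are limited to ASCII letters, digits, `.`, `_`, and `-` with at most 249 characters, and the replication factor cannot exceed the number of brokers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that sanity-checks topic settings before a topic is created. */
final class TopicSpecSketch {

    /** Returns a list of problems; an empty list means the settings look valid. */
    static List<String> validate(String name, int partitions, int replicationFactor, int brokerCount) {
        List<String> problems = new ArrayList<>();
        // Assumed naming rule: ASCII alphanumerics, '.', '_' and '-', at most 249 characters.
        if (name.isEmpty() || name.length() > 249 || !name.matches("[a-zA-Z0-9._-]+")) {
            problems.add("illegal topic name: " + name);
        }
        if (partitions < 1) {
            problems.add("partition count must be at least 1");
        }
        // Every replica lives on a different broker, so there cannot be more replicas than brokers.
        if (replicationFactor < 1 || replicationFactor > brokerCount) {
            problems.add("replication factor must be between 1 and the broker count " + brokerCount);
        }
        return problems;
    }

    public static void main(String[] args) {
        // A single local broker, as in a development setup: replication factor must be 1.
        System.out.println(validate("orders", 3, 1, 1)); // []
        System.out.println(validate("orders", 3, 2, 1)); // one problem: replication factor
    }
}
```

On a single-broker development machine this is why the replication factor is usually set to 1, while the partition count can still be greater than 1.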

Publisher : Service that publishes messages for consumers to consume.

Note : The consumer and the publisher need not be in the same project. To keep this post simple we have included the consumer in the same project, but you can run it as a separate application as long as you provide the right configuration there.

Consumer : Service that listens for and consumes the messages produced by the publisher.

ConsumerSeekAware : Implementing this Spring Kafka interface lets us reset the consumer's offset and re-read messages from the past. In this project it seeks to the beginning, so consumption starts again from the first message.
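The replay behaviour described above can be sketched without any Kafka dependency. The class below is a hypothetical in-memory model, not the Spring Kafka API: it keeps a read position over a fixed list of records and shows what "seek to beginning" does to that position. In Spring Kafka the equivalent step is usually done inside `ConsumerSeekAware.onPartitionsAssigned`, by calling `seekToBeginning` on the callback that the method receives.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a consumer's position within one partition. */
final class SeekSketch {
    private final List<String> partitionLog;
    private int position; // offset of the next record to read

    SeekSketch(List<String> partitionLog, int startPosition) {
        this.partitionLog = partitionLog;
        this.position = startPosition;
    }

    /** Returns the next record and advances the position, or null when caught up. */
    String poll() {
        return position < partitionLog.size() ? partitionLog.get(position++) : null;
    }

    /** Resets the position so that the next poll starts from the first record. */
    void seekToBeginning() {
        position = 0;
    }

    public static void main(String[] args) {
        // The consumer resumes at offset 2, so it would normally miss m0 and m1.
        SeekSketch consumer = new SeekSketch(Arrays.asList("m0", "m1", "m2"), 2);
        System.out.println(consumer.poll()); // m2
        consumer.seekToBeginning();
        System.out.println(consumer.poll()); // m0
    }
}
```

Replaying from the start means the listener sees old messages again, so its processing should tolerate duplicates.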

A snapshot of the project running :

[Figure: Listener listening to events published in the topic (kafka)]

You can find the complete code for this project in the Git repository :

https://github.com/akshat-jainn/Kafka_POC

Understanding commonly used terms :

  1. Producer : An application or system that produces data to Topics.
  2. Consumer : An application or system that consumes data from Topics.
  3. Consumer group : A group of consumers that collectively consume messages from the partitions of one or more topics. Kafka assigns each partition to exactly one consumer in the group, so a consumer may own one or more partitions.
  4. Offset : A sequential number that identifies a record's position within a partition. Kafka stores the offset each consumer group has committed for every partition, which is how it tracks consumption progress and lets a consumer resume where it left off.
  5. Broker : A Kafka server that stores and manages the topic partitions and handles requests from producers and consumers.
  6. Leader : The broker responsible for handling all read and write requests for a particular partition of a topic.
  7. Follower : A replica of a partition that is not currently the leader. It receives incremental updates to stay up-to-date with the leader’s data.
  8. Topic Name : A topic is a named category or feed to which producers publish messages and from which consumers read them. Kafka identifies a topic by a string name that may contain ASCII letters, digits, periods, underscores, and hyphens, up to 249 characters. Topics are the primary mechanism for message segregation: multiple producers can publish to the same topic, and multiple consumers can subscribe to it.
  9. Partition : A partition is a unit of parallelism and scalability within a Kafka topic. Kafka can divide a topic into multiple partitions, and it can host each partition on a different broker in a Kafka cluster.
  10. ReplicationFactor : The replication factor in Kafka refers to the number of copies of a particular topic partition that Kafka maintains across different brokers in a Kafka cluster.
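Two of the terms above, partitions and consumer groups, are easier to see in code. The following plain-Java sketch makes some simplifying assumptions: `partitionFor` uses `String.hashCode()` as a stand-in for the hash that Kafka's default partitioner applies to the key bytes, and `assign` hands out partitions round-robin, which is only one of several assignment strategies Kafka supports. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of key-based partitioning and consumer-group assignment. */
final class PartitioningSketch {

    /** Same key, same partition: this is what preserves per-key ordering. */
    static int partitionFor(String key, int numPartitions) {
        // Stand-in hash; floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Gives each partition to exactly one consumer in the group, round-robin. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<Integer>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3)); // true
        System.out.println(assign(Arrays.asList("c1", "c2"), 3)); // {c1=[0, 2], c2=[1]}
    }
}
```

With three partitions and two consumers, one consumer owns two partitions. A third consumer would take over one of them, and a fourth would sit idle, which is why the partition count caps the useful size of a consumer group.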

When to use Kafka and when to use RabbitMQ :

Both Kafka and RabbitMQ are widely used in distributed systems to enable communication between components. While they serve a similar purpose, there are some significant differences between the two.

Kafka handles real-time data streams at scale and is designed as a streaming platform, while RabbitMQ is designed as a traditional message broker for messaging between applications.

  1. Architecture: Kafka is a distributed streaming platform built to handle large volumes of data and stream processing, whereas RabbitMQ is a traditional message broker built for message queuing.
  2. Latency: Kafka is optimized for low latency and high throughput, making it a good choice for real-time data processing, while RabbitMQ is optimized for reliability and message durability, which can result in slightly higher latency.
  3. Message delivery guarantees: Kafka provides at-least-once delivery by default and can also be configured for at-most-once or exactly-once semantics. RabbitMQ provides at-most-once or at-least-once delivery, depending on the configuration.
  4. Protocols and APIs: Kafka clients talk to brokers over Kafka's own binary protocol on TCP, usually through client libraries such as the native Java API. Command-line tools ship with Kafka, and REST access is available through separate proxy components. RabbitMQ supports the Advanced Message Queuing Protocol (AMQP), which is a standard messaging protocol, as well as other protocols and APIs.
  5. Message ordering: Kafka is designed to maintain message order within a partition, but it does not maintain order across partitions. On the other hand, RabbitMQ preserves message order within a queue.
  6. Data processing: Streaming platforms optimize for processing continuous streams of data in real-time, whereas traditional message brokers optimize for queuing and message delivery.
  7. Scalability: Streaming platforms are highly scalable and can handle large volumes of data across multiple nodes and applications, while traditional message brokers are typically limited to a single node or cluster.
  8. Data persistence: Streaming platforms, such as Kafka, can persist data for longer periods of time and enable data replayability, while traditional message brokers are typically used for short-term storage and message delivery.
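The delivery guarantees in point 3 come down to the order of two steps in the consumer: committing the offset and processing the record. The sketch below simulates a single crash that happens between those two steps and then a restart from the last committed offset. It is a plain-Java model under simplifying assumptions (one partition, one consumer, hypothetical names), not client code for either system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of at-most-once vs at-least-once consumption. */
final class DeliverySketch {

    /**
     * Consumes the whole log, crashing once at offset crashAt between the
     * commit step and the processing step, then restarting from the commit.
     */
    static List<String> consume(List<String> log, boolean commitBeforeProcessing, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // where a restarted consumer resumes
        boolean crashed = false; // the simulated crash happens only once
        int offset = 0;
        while (offset < log.size()) {
            boolean crashNow = !crashed && offset == crashAt;
            if (commitBeforeProcessing) {
                committed = offset + 1;                  // at-most-once: commit first
                if (crashNow) { crashed = true; offset = committed; continue; }
                processed.add(log.get(offset));
            } else {
                processed.add(log.get(offset));          // at-least-once: process first
                if (crashNow) { crashed = true; offset = committed; continue; }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        System.out.println(consume(log, true, 1));  // [a, c]       "b" is lost
        System.out.println(consume(log, false, 1)); // [a, b, b, c] "b" is duplicated
    }
}
```

Committing first risks losing a message, while processing first risks handling it twice, which is why at-least-once consumers are usually written to be idempotent.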

Hence we can conclude :

For use cases that require low latency and high throughput, Kafka is a better choice, whereas for use cases where reliability and message durability are critical, RabbitMQ is a better choice.

AMQP Protocol used by RabbitMQ:

Think of AMQP like a post office. Just like how you can send a letter or package to someone through a post office, you can use AMQP to send a message from one application to another. The AMQP protocol defines how to structure messages, deliver them, and ensure that the sender and receiver interact securely to guarantee reliable message delivery.

In technical terms :

AMQP (Advanced Message Queuing Protocol) is a messaging protocol that allows different applications or systems to exchange messages in a reliable, secure, and efficient manner. AMQP provides a standard way for different applications to communicate with each other by sending and receiving messages.
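To continue the post office analogy in code: in the AMQP model that RabbitMQ implements, a publisher hands a message to an exchange, and the exchange uses bindings to decide which queues receive it. The sketch below models only the simplest case, a direct exchange that matches the routing key exactly. It is a hypothetical in-memory model with made-up names, not a RabbitMQ client, and it simply drops messages that match no binding.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of an AMQP-style direct exchange. */
final class DirectExchangeSketch {
    private final Map<String, List<String>> bindings = new HashMap<>(); // routing key -> queue names
    private final Map<String, Deque<String>> queues = new HashMap<>();  // queue name -> messages

    /** Binds a queue to this exchange under a routing key, creating the queue if needed. */
    void bind(String queue, String routingKey) {
        if (!queues.containsKey(queue)) {
            queues.put(queue, new ArrayDeque<String>());
        }
        if (!bindings.containsKey(routingKey)) {
            bindings.put(routingKey, new ArrayList<String>());
        }
        bindings.get(routingKey).add(queue);
    }

    /** Routes a message to every queue bound with this key; returns how many queues got it. */
    int publish(String routingKey, String message) {
        List<String> targets = bindings.containsKey(routingKey)
                ? bindings.get(routingKey)
                : Collections.<String>emptyList();
        for (String queue : targets) {
            queues.get(queue).addLast(message);
        }
        return targets.size();
    }

    /** Takes the oldest message off a queue, or returns null if it is empty or unknown. */
    String consume(String queue) {
        Deque<String> messages = queues.get(queue);
        return messages == null ? null : messages.pollFirst();
    }

    public static void main(String[] args) {
        DirectExchangeSketch exchange = new DirectExchangeSketch();
        exchange.bind("invoices", "order.paid");
        exchange.publish("order.paid", "order-1");
        System.out.println(exchange.consume("invoices")); // order-1
        System.out.println(exchange.consume("invoices")); // null
    }
}
```

Unlike a Kafka topic, the queue hands each message out once and then forgets it, which is the storage difference described in the comparison above.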

Conclusion:

In this blog we discussed what Kafka is, why Kafka moved from ZooKeeper to KRaft, how to set it up on our system, how to implement a Spring Boot project using Kafka in KRaft mode, the basic terminology, when to use Kafka over RabbitMQ, and what the AMQP protocol is.

That’s all for this blog