Apache Kafka Interview Questions
In this post we will look at Apache Kafka Interview questions. Examples are provided with explanation.
Q: What is Apache Kafka?
A: Apache Kafka is a distributed, scalable, and fault-tolerant publish-subscribe messaging system for building distributed applications. It is a top-level project of the Apache Software Foundation. Kafka is suitable for both offline and online message consumption.
Q: What are the advantages of using Apache Kafka?
A: The advantages of using Apache Kafka are as follows:
- High Throughput:
Kafka's design lets the platform process messages very quickly; throughput can exceed 100,000 messages per second. Data is split into partitions, and messages are kept in order within each partition.
- Scalability:
Kafka can scale at several levels. Multiple producers can write to the same topic, topics can be divided into partitions, and consumers can be organized into consumer groups in which each partition is consumed by one member of the group.
- Fault Tolerance:
Kafka has a distributed architecture: several nodes (brokers) run together to form a cluster. Topics are replicated, and users can choose the number of replicas for each topic so that data stays safe when a node fails; with enough replicas, the cluster keeps serving requests after a node failure. Kafka uses ZooKeeper to maintain cluster metadata (newer Kafka versions can use the built-in KRaft mode instead), so producers and consumers get accurate information about the cluster. Each partition of a topic has a leader replica that handles its writes, and when the node hosting a leader fails, a new leader is elected from the remaining in-sync replicas.
- Durability:
Messages written to Kafka are persisted to disk, and how long they are retained is configurable. Because messages remain available after they are consumed, they can be re-processed if required.
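The partitioning and consumer-group ideas above can be illustrated with a small in-memory sketch. It is a toy model, not the Kafka client API: the topic, the function names (`choose_partition`, `assign_round_robin`) and the use of CRC32 are assumptions made for illustration (Kafka's Java client hashes keys with murmur2), and round-robin is only one of the assignment strategies Kafka supports.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition.

    Assumption: CRC32 keeps this sketch dependency-free; Kafka's Java
    client hashes keys with murmur2 instead.
    """
    return zlib.crc32(key) % num_partitions


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one member of a consumer group."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Any number of producers may send to the topic; each message is routed by key.
topic: dict[int, list[tuple[bytes, bytes]]] = defaultdict(list)
messages = [(b"user-1", b"login"), (b"user-2", b"login"), (b"user-1", b"logout")]
for key, value in messages:
    topic[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# Same key -> same partition, so user-1's events stay in send order.
p = choose_partition(b"user-1", NUM_PARTITIONS)
print(p, [v for k, v in topic[p] if k == b"user-1"])

# Two consumers in one group split the three partitions between them.
group = assign_round_robin(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1]}
```

With three partitions and two consumers, one consumer reads two partitions. Adding a third consumer to the group would give each consumer one partition; a fourth would sit idle, since a partition is never shared between members of the same group.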
Q: How do you get started with Apache Kafka?
A: See Getting Started with Apache Kafka - Hello World Example.
Q: Have you integrated Apache Kafka with any framework?
A: Yes. See Spring Boot + Apache Kafka Example.
Q: What is a Kafka log?
A: An important concept in Apache Kafka is the “log”. This is not an application log or a system log; it is a log of the data itself. A log is an ordered, append-only sequence of records, and it imposes only a loose structure on the data Kafka stores: each record can contain anything, because to Kafka it is just an array of bytes.
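A log in this sense can be sketched as a list that only grows at the end. `PartitionLog` below is a hypothetical toy class, not Kafka code; it only shows the two properties described above: records are appended in order and addressed by their offset, and the payload is opaque bytes.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of a Kafka log: an ordered, append-only sequence of byte records."""

    records: list[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Add a record at the end and return its offset (position in the log)."""
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("the log stores raw bytes; serialize the value first")
        self.records.append(bytes(value))
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list[bytes]:
        """Return up to max_records records, in order, starting at offset."""
        return self.records[offset:offset + max_records]

    # Deliberately no update() or delete(): records are only ever appended.


log = PartitionLog()
offsets = [
    log.append(b'{"event": "signup"}'),  # JSON, already encoded
    log.append("plain text".encode()),   # UTF-8 text
    log.append(bytes([0x00, 0xFF])),     # arbitrary binary data
]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [b'plain text', b'\x00\xff']
```

Serialization (JSON, text, or a binary format) is the producer's and consumer's job; the log itself never interprets the bytes.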
Q: When should you not use Apache Kafka?
A: Kafka may not be a good fit if your application relies on the following features, which Kafka handles differently from a traditional message queue:
- Kafka does not assign global message IDs. Instead, each message is identified by its “offset”, its position within a partition's log.
- Consumers read data from topics, but Kafka does not track which consumer has consumed which individual message. Each consumer or consumer group must keep track of its own position (offset) in the topic, typically by committing offsets, which Kafka can store on its behalf.
- There are no random reads by message key or content. A consumer specifies an offset for a topic partition, and Kafka serves messages in order starting from that offset.
- Kafka does not offer a way to delete an individual message on demand. A message stays in the log until it expires according to the topic's retention settings (for example, the configured retention time).
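These points can be modeled together in a toy sketch: the log only hands out records sequentially from an offset, the consumer remembers its own position, and records disappear only when retention expires them. `RetainedLog`, `Consumer`, `poll`, `seek` and `expire` are hypothetical names chosen for this sketch, and time is passed in explicitly so the example is reproducible. Real Kafka removes whole log segments rather than single records, so its expiry is coarser than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Toy partition log with time-based retention (hypothetical, not Kafka code).

    Offsets keep increasing even after old records expire; start_offset is
    the offset of the oldest record that is still retained.
    """

    retention_seconds: float
    start_offset: int = 0
    entries: list[tuple[float, bytes]] = field(default_factory=list)

    def append(self, value: bytes, now: float) -> int:
        """Append a record stamped with time `now` and return its offset."""
        self.entries.append((now, value))
        return self.start_offset + len(self.entries) - 1

    def expire(self, now: float) -> None:
        """Drop records older than the retention time; there is no per-message delete."""
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.pop(0)
            self.start_offset += 1

    def read_from(self, offset: int, max_records: int = 10) -> list[tuple[int, bytes]]:
        """Read sequentially from `offset`; there is no lookup by key or content."""
        if offset < self.start_offset:
            raise ValueError(f"offset {offset} expired; earliest is {self.start_offset}")
        begin = offset - self.start_offset
        window = self.entries[begin:begin + max_records]
        return [(offset + i, value) for i, (_, value) in enumerate(window)]


class Consumer:
    """The consumer, not the log, tracks its position (the next offset to read)."""

    def __init__(self, log: RetainedLog) -> None:
        self.log = log
        self.position = log.start_offset

    def poll(self) -> list[bytes]:
        batch = self.log.read_from(self.position)
        if batch:
            self.position = batch[-1][0] + 1  # advance our own offset
        return [value for _, value in batch]

    def seek(self, offset: int) -> None:
        """Move to a chosen offset, e.g. to re-process retained messages."""
        self.position = offset


log = RetainedLog(retention_seconds=60)
for i, t in enumerate([0, 10, 20]):
    log.append(f"msg-{i}".encode(), now=t)

consumer = Consumer(log)
first = consumer.poll()          # [b'msg-0', b'msg-1', b'msg-2']
second = consumer.poll()         # [] - nothing after the tracked offset
log.expire(now=75)               # msg-0 and msg-1 are older than 60 s
consumer.seek(log.start_offset)  # rewind to the oldest retained record
replay = consumer.poll()         # [b'msg-2']
print(first, second, replay)
```

After the expiry, `log.read_from(0)` raises `ValueError` because offsets 0 and 1 are gone. Offsets are never reused, so the next append would receive offset 3.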