You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?

Study for the CCDAK Apache Kafka Test. Use flashcards and multiple choice questions with hints and explanations for each question. Prepare thoroughly for your exam!

Multiple Choice

Explanation:
In Kafka Connect, the parallelism of a sink connector is bounded by the number of partitions in the topics it reads. The connector’s tasks consume as members of a single consumer group, so each partition is assigned to exactly one task at a time, and you can usefully run at most as many tasks as there are partitions. The replication factor and the number of brokers don’t change this limit; replication affects durability and availability, not how many tasks can work in parallel.

With a topic that has 2 partitions, at most 2 tasks of the S3 sink connector can do useful work. If you set tasks.max higher (such as 4 or 10), no more than 2 tasks will be assigned a partition, and any extra tasks have nothing to consume. If you set it to 1, a single task handles both partitions, which reduces parallelism. Therefore, configuring 2 tasks gives one task per partition and the maximum useful parallelism.
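The reasoning above can be sketched in a few lines of Python. This is an illustration only, not how Kafka Connect is implemented: the helper names `assign_partitions` and `busy_tasks` are hypothetical, the round-robin spread is an assumption (the real assignment is made by the consumer group's partition assignor), and the topic name `orders` is a placeholder. The connector settings shown are a fragment; a real S3 sink also needs bucket, region, storage, and format settings.

```python
def assign_partitions(num_partitions, tasks_max):
    """Spread partitions over tasks round-robin (illustrative assumption)."""
    assignment = {task: [] for task in range(tasks_max)}
    for partition in range(num_partitions):
        assignment[partition % tasks_max].append(partition)
    return assignment


def busy_tasks(assignment):
    """Count the tasks that were given at least one partition."""
    return sum(1 for partitions in assignment.values() if partitions)


# Fragment of a sink connector configuration; "orders" is a placeholder topic.
connector_config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "tasks.max": "2",  # one task per partition of the 2-partition topic
}

if __name__ == "__main__":
    # Brokers and replication factor do not appear anywhere in the calculation.
    for tasks_max in (1, 2, 4, 10):
        assignment = assign_partitions(2, tasks_max)
        print(tasks_max, "task(s) configured,", busy_tasks(assignment), "with work")
```

In this sketch, raising the task count past 2 never raises the number of tasks with work, which is why 2 is the answer for a 2-partition topic.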
