lucasbru opened a new pull request, #17301:
URL: https://github.com/apache/kafka/pull/17301
First implementation of auto-topic creation for internal topics. A basic
implementation is working: it checks for the right ACL permissions, validates
the request, updates the topology record in the consumer offset topic,
determines the right number of partitions for all internal topics, and finally
triggers a corresponding CreateTopics request. Code for generating and
validating internal topic configuration lives inside the group coordinator,
but the actual creation of the topics is performed inside KafkaApis.
The key concept is that the topology description being sent to the broker is
parametric in the number of partitions: the broker checks the number of
partitions of the input topics and derives the number of partitions for the
internal topics from it. I ported ChangelogTopics, RepartitionTopics and
CopartitionedTopicsEnforcer from the StreamsPartitionAssignor to the broker code.
Compared to the schema in KIP-1071, I had to add copartition groups to the
initialization call. Copartition groups are not necessarily the same as
subtopologies, so we need to model copartition groups separately in the
broker-side topology if we want to keep the number of partitions in each
subtopology parametric / generic.
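
To illustrate what "parametric in the number of partitions" means, here is a
minimal Java sketch. It is not the code in this PR. It assumes a copartition
group is a set of topic names, that the partition counts of the source topics
come from the broker's metadata, and that every repartition topic in the group
takes the partition count shared by the group's source topics. The names
`CopartitionSketch` and `resolve` are hypothetical, and the sketch simply
rejects groups that contain no source topic, which a real enforcer would have
to handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: derive internal topic partition counts from input topics. */
final class CopartitionSketch {

    /**
     * @param group             all topics that must be copartitioned
     * @param repartitionTopics the internal topics in the group (no partition count yet)
     * @param sourcePartitions  partition counts of existing input topics, as seen by the broker
     * @return the partition count to use for each repartition topic in the group
     */
    static Map<String, Integer> resolve(Set<String> group,
                                        Set<String> repartitionTopics,
                                        Map<String, Integer> sourcePartitions) {
        Integer expected = null;
        for (String topic : group) {
            if (repartitionTopics.contains(topic)) {
                continue; // internal topic: its count is derived, not given
            }
            Integer count = sourcePartitions.get(topic);
            if (count == null) {
                throw new IllegalStateException("Missing source topic: " + topic);
            }
            if (expected != null && !expected.equals(count)) {
                throw new IllegalStateException(
                    "Source topics are not copartitioned: " + group);
            }
            expected = count;
        }
        if (expected == null) {
            // Simplification: a group made only of repartition topics is rejected here.
            throw new IllegalStateException("No source topic in group: " + group);
        }
        Map<String, Integer> result = new HashMap<>();
        for (String topic : group) {
            if (repartitionTopics.contains(topic)) {
                result.put(topic, expected);
            }
        }
        return result;
    }
}
```

With a group containing a 6-partition source topic and one repartition topic,
`resolve` assigns 6 partitions to the repartition topic. The same topology
description yields a different count if the source topic has a different
number of partitions, which is the sense in which the description is generic.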
This work is still in progress, and here are some things that I want to
improve before merging:
- There is a dependency between updating the topology record in the
consumer offset topic and creating the internal topics. In case of failures,
one could be executed without the other. I think we’ll need to make the update
to the topology record first, and then create internal topics. The topology
record will be the single source of truth. If internal topic creation fails,
this may or may not be detected by the client. We’ll need a way to deal
with missing internal topics. If the StreamsHeartbeat detects that internal
topics are missing, it should trigger re-initialization of the topology, to
create the missing topics.
- Surprisingly, existing RPCs (TopicMetadata and FindCoordinator) auto-create
topics asynchronously: a create topic request is started, but the result is
never checked. I followed the same route for this first version, but I’m not
sure this is what we want for StreamsInitialize, since you would have to
check the broker logs if internal topics cannot be created. We can do some
validation up-front, but the final confirmation that internal topics were
created comes from the response of the internal CreateTopics request, which
would mean we need to make a synchronous request.
- The set of tasks is defined by the TopicsImage in the group coordinator
and the partition-parametric topology description. The group epoch needs to be
bumped when either changes. We need to make sure to cache the result of
validating and instantiating a topology against a current set of input topics,
so that we don’t have to redo it on every heartbeat. After fail-over, we need
to reconstruct this cache from the topology and subscription records in the
offset topic.
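
The caching idea in the last item could look roughly like the sketch below.
It is only an illustration under assumptions: the topology is identified by
an integer epoch, the input topics are represented as a map from topic name to
partition count, and the expensive validation and instantiation step is passed
in as a `Supplier`. `ConfiguredTopologyCache` and its methods are hypothetical
names, not classes in this PR.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: recompute the configured topology only when its inputs change. */
final class ConfiguredTopologyCache<T> {
    private boolean valid = false;
    private int cachedTopologyEpoch;
    private Map<String, Integer> cachedPartitions;
    private T cachedResult;
    private int computations = 0;

    /**
     * Returns the cached result if neither the topology epoch nor the input
     * topic partition counts changed since the last call; otherwise recomputes.
     */
    synchronized T get(int topologyEpoch,
                       Map<String, Integer> inputPartitions,
                       Supplier<T> compute) {
        boolean stale = !valid
            || topologyEpoch != cachedTopologyEpoch
            || !inputPartitions.equals(cachedPartitions);
        if (stale) {
            cachedResult = compute.get();
            cachedTopologyEpoch = topologyEpoch;
            cachedPartitions = Map.copyOf(inputPartitions); // defensive copy
            valid = true;
            computations++;
        }
        return cachedResult;
    }

    /** Number of times the supplier was actually invoked. */
    synchronized int computations() {
        return computations;
    }
}
```

A heartbeat handler would call `get` on every heartbeat and only pay for
validation when the topology or the input topics changed. After fail-over the
cache starts empty, so the first call recomputes from whatever was replayed
from the topology and subscription records.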
We need to define clearly what happens if a topology conflicts with the
current set of topics on the broker. We may want to retry initialization of the
topology, or just auto-create any missing topics on the broker.
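
One way to make the two cases above concrete is to separate internal topics
that are simply missing (and could be auto-created) from topics that exist
with a partition count the topology does not expect (a genuine conflict). The
sketch below assumes both the required and the existing topics are given as
maps from topic name to partition count. `InternalTopicCheck` and its members
are hypothetical names used for illustration only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: compare required internal topics with what exists on the broker. */
final class InternalTopicCheck {

    static final class Result {
        /** Required topics that do not exist yet; candidates for auto-creation. */
        final Set<String> missing = new TreeSet<>();
        /** Topics that exist with an unexpected partition count, with a description. */
        final Map<String, String> conflicts = new TreeMap<>();
    }

    static Result check(Map<String, Integer> required, Map<String, Integer> existing) {
        Result result = new Result();
        for (Map.Entry<String, Integer> entry : required.entrySet()) {
            Integer actual = existing.get(entry.getKey());
            if (actual == null) {
                result.missing.add(entry.getKey());
            } else if (!actual.equals(entry.getValue())) {
                result.conflicts.put(entry.getKey(),
                    "expected " + entry.getValue() + " partitions, found " + actual);
            }
        }
        return result;
    }
}
```

A caller could create the topics in `missing` and fail or retry initialization
when `conflicts` is non-empty. Which of these the broker should do is exactly
the open question.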