lucasbru opened a new pull request, #17301:
URL: https://github.com/apache/kafka/pull/17301

   First implementation of auto-topic creation of internal topics. A basic 
implementation is working, which is checking for the right ACL permissions, 
validates if the request is valid, then updates the topology record in the 
consumer offset topic, determines the right number of partitions for all 
internal topics and finally triggers a corresponding CreateTopic request. Code 
for generating and validating internal topic configuration lives inside the 
group coordinator, but the actual creation of the topics is performed inside 
KafkaApis.
   
   The key concept is that the topology description being sent to the broker is 
parametric in the number of partitions -- that is, the broker checks the number 
of partitions for the input partition, and derives the number of partitions for 
the internal topic. I ported ChangelogTopics, RepartitionTopis and 
CopartionningEnforcer from the StreamsPartitionAssignor to the broker code.
   
   Compared to the schema in KIP-1071, I had to add copartition groups to the 
initialization call. Copartititon groups are not necessarily the same as 
subtopologies, so we need to model copartition groups separately in broker-side 
topology, if we want to keep the number of partitions in each subtopology 
parametric / generic. 
   
   This work is still in progress, and here are some things that I want to 
improve berfore merging:
   
    - There is a dependency between updating the topology record in the 
consumer offset topic and creating the internal topics. In case of failures, 
one could be executed without the other. I think we’ll need to make the update 
to the topology record first, and then create internal topics. The topology 
record will be the single source of truth. If internal topic creation fails, 
this may or may not be detected by the client. We’ll have to have way to deal 
with missing internal topics. If the StreamsHeartbeat detects that internal 
topics are missing, it should trigger re-initialization of the topology, to 
create the missing topics.
    - Surprising, existing RPCs (TopicMetadata and FindCoordinator) auto-create 
topics asynchronously — so a create topic request is started, but the result is 
never checked. I followed the same route for this first version, but I’m not 
quite sure if this is what we want for StreamsInitialize, as you’ll have to 
check the broker logs if internal topics cannot be created. We can do some 
validation up-front, but the final confirmation that internal topics were 
created comes from the response of the internal CreateTopics request, that 
would mean we need to do a synchronous request.
    - The set of tasks is defined by the TopicsImage in the group coordinator 
and the partition-parametric topology description. The group epoch needs to be 
bumped when either changes. We need to make sure to cache the result of 
validating and instantiating a topology against a current set of input topics, 
so that we don’t have to redo it on every heartbeat. After fail-over, we need 
to reconstruct this cache from the topology and subscription records in the 
offset topic.
   
   We need to define clearly what happens if a topology conflicts with the 
current set of topics on the broker. We may want to retry initialization of the 
topology, or just auto-create any missing topics on the broker.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to