kirktrue commented on PR #17035:
URL: https://github.com/apache/kafka/pull/17035#issuecomment-2327491994

   Hi @AndrewJSchofield!
   
   Thanks for the review ๐Ÿ‘ 
   
   > First, I think it better not to overload the PollEvent because that's 
already used in the share consumer.
   
   Agreed. I've introduced a `FetchEvent` so that the two separate mechanisms 
won't step on each others' toes.
   
   > Second, it seems to me that there is still the potential for 
over-fetching, and this will still cause churn of the fetch session cache.
   
   Agreed. It aims to _lessen_ the churn. _Preventing_ the churn completely is 
a future task. 
   
   > In the case where the consumer is only fetching a single partition, I 
think it works pretty well. The set of fetchable partitions will be empty if 
there's buffered data, and contain the only partition in the fetch session if 
there is not. So, you'll only send a Fetch request when there's a need for more 
data and the fetch session will not churn.
   
   Correct.
   
   > In the case where the consumer is fetching more than one partition on a 
particular node, if a subset of the partitions is fetchable, then the fetch 
session will be modified by sending a Fetch request and that seems to have the 
potential for a lot of churn.
   
   Correct again!
   
   Any partition with buffered data at the point where the fetch request is 
being generated will be marked as "removed" from the broker's fetch session 
cache. That's the crux of the problem ๐Ÿ˜ž
   
   Something that I tend to lose sight of is the fact that it's not a foregone 
conclusion that a fetch session will be evicted when it has partitions removed. 
Of course, it will increase its eligibility for eviction if the broker hosting 
the fetch session is resource-constrained and invokes the eviction process.
   
   > Of course, all of this code is in common between the legacy consumer and 
the async consumer.
   
   I'm not sure I follow. This code is all specific to the 
`AsyncKafkaConsumer`. While the `ClassicKafkaConsumer` has a similar race 
condition, it is 2-4 orders of magnitude less likely to happen.
   
   > The async consumer is still very keen on fetching so I don't properly 
grasp why this PR would make the fetch session behaviour better.
   
   Yepโ€”the design of the `AsyncKafkaConsumer` fetching continuously in the 
background makes it _very_ keen to cause this problem. With this change, the 
application thread now signals when to fetch, which results in the background 
thread creating and issuing the fetch requests _much_ less often.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to