lucasbru opened a new pull request, #20523:
URL: https://github.com/apache/kafka/pull/20523

   In the current solution, we only use a heap to select the right process, but 
resort to linear search for selecting a member within a process. This means use 
cases where a lot of threads run within the same process can yield slow 
assignment. The number of threads in a process shouldn’t scale arbitrarily (our 
assumed case for benchmarking of 50 threads in a single process seems quite 
extreme already), however, we can optimize for this case to reduce the runtime 
further.
   
   Other assignment algorithms assign directly on the member-level, but we 
cannot do this in Kafka Streams, since we cannot assign tasks to processes that 
already own the task. Defining a heap directly on members would mean that we 
may have to skip through 10s of member before finding one that does not belong 
to a process that does not yet own the member.
   
   Instead, we can define a separate heap for each process, which keeps the 
members of the process by load. We can only keep the heap as long as we are 
only changing the load of the top-most member (which we usually do). This means 
we keep track of a lot of heaps, but since heaps are backed by arrays in Java, 
this should not result in extreme memory inefficiencies.
   
   In our worst-performing benchmark, this improves the runtime by ~2x on top 
of the optimization above.
   
   Also piggybacked are some minor optimizations / clean-ups:
    - initialize HashMaps and ArrayLists with the right capacity
    - fix some comments
    - improve logging output
   
   Note that this is a pure performance change, so there are no changes to the 
unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to