jasperjiaguo opened a new issue, #10499:
URL: https://github.com/apache/pinot/issues/10499

   For high-cardinality columns, the local, intermediate, and global merge phases of
distinct / distinctcount can be memory- and CPU-intensive, because the merger must
ser/de and merge multiple large value sets from the responses. If the column used
by distinct / distinctcount is partitioned so that each partition holds a disjoint
set of values, the merger can instead simply concatenate (for distinct) or add (for
distinctcount) the intermediate results, since no value can appear in more than one
partial result. This can significantly reduce set ser/de, network transmission, and
merge time and memory footprint. The optimization can also be applied at different
levels of processing (local, intermediate, or global), depending on the partition
granularity.
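A minimal Python sketch of the idea, under stated assumptions: the function names, the CRC32-based `partition_of` helper, and the string values are hypothetical illustrations, not Pinot's implementation or API. It contrasts the general set-union merge with the partition-aware concatenate/add merge:

```python
import zlib
from typing import Iterable, List, Set


def partition_of(value: str, num_partitions: int) -> int:
    # Hypothetical partition function (deterministic CRC32), not Pinot's own.
    return zlib.crc32(value.encode("utf-8")) % num_partitions


def merge_distinct_union(partials: Iterable[Set[str]]) -> Set[str]:
    # General case: partial sets may overlap, so every value is
    # hashed into one merged set to remove duplicates.
    merged: Set[str] = set()
    for partial in partials:
        merged |= partial
    return merged


def merge_distinct_disjoint(partials: Iterable[Iterable[str]]) -> List[str]:
    # Disjoint partitions: no value appears in two partials, so plain
    # concatenation is already duplicate-free.
    merged: List[str] = []
    for partial in partials:
        merged.extend(partial)
    return merged


def merge_distinct_count_disjoint(partial_counts: Iterable[int]) -> int:
    # Disjoint partitions: only per-partition counts need to be shipped and summed.
    return sum(partial_counts)


if __name__ == "__main__":
    rows = ["a", "b", "c", "a", "d", "b", "e"]
    num_partitions = 3
    buckets: List[List[str]] = [[] for _ in range(num_partitions)]
    for v in rows:
        buckets[partition_of(v, num_partitions)].append(v)
    local_sets = [set(b) for b in buckets]  # per-partition distinct values
    print(len(merge_distinct_union(local_sets)))                      # 5
    print(merge_distinct_count_disjoint(len(s) for s in local_sets))  # 5
    print(sorted(merge_distinct_disjoint(local_sets)))                # ['a', 'b', 'c', 'd', 'e']
```

The shortcut relies entirely on disjointness: with overlapping partials such as {a, b} and {b, c}, summing counts gives 4 while the true distinct count is 3, so the union path is still needed wherever partitions can overlap.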
   
   <img width="757" alt="Screenshot 2023-03-28 at 8 34 39 PM" 
src="https://user-images.githubusercontent.com/10736840/228420057-f4957793-1820-4a6b-9974-45ec0fc80190.png">
   

