jasperjiaguo opened a new issue, #10499: URL: https://github.com/apache/pinot/issues/10499
For high-cardinality columns, the local/intermediate/global merge phases of distinct and distinctcount can be memory- and CPU-heavy, because the merger has to serialize/deserialize and merge multiple large sets from the responses. If the column being aggregated is partitioned into disjoint sets, the merger can instead simply concatenate (for distinct) or add (for distinctcount) the intermediate results. This can significantly reduce set ser/de, transmission, and merge time, as well as the memory footprint. It can also be applied at different levels of processing, depending on the partition granularity.

<img width="757" alt="Screenshot 2023-03-28 at 8 34 39 PM" src="https://user-images.githubusercontent.com/10736840/228420057-f4957793-1820-4a6b-9974-45ec0fc80190.png">
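To make the proposed shortcut concrete, here is a minimal Python sketch of the two merge strategies. It is an illustration only, not Pinot's actual code: the function names and the sample per-server results are hypothetical, and it assumes each partial result comes from a partition whose values do not appear in any other partition.

```python
from typing import Iterable, List, Set


def merge_distinct_count_general(partials: Iterable[Set[str]]) -> int:
    """General case: values may repeat across partials, so the full sets
    must be shipped to the merger and unioned before counting."""
    merged: Set[str] = set()
    for partial in partials:
        merged |= partial
    return len(merged)


def merge_distinct_count_partitioned(partial_counts: Iterable[int]) -> int:
    """Disjoint partitions (assumed): no value is shared between partials,
    so only the per-partition counts are needed and they can be added."""
    return sum(partial_counts)


def merge_distinct_partitioned(partials: Iterable[List[str]]) -> List[str]:
    """Disjoint partitions (assumed): the distinct values can be
    concatenated without any de-duplication step."""
    merged: List[str] = []
    for partial in partials:
        merged.extend(partial)
    return merged


# Hypothetical per-server results for a column partitioned across two servers.
server_a = {"u1", "u2", "u3"}
server_b = {"u4", "u5"}

general = merge_distinct_count_general([server_a, server_b])
partitioned = merge_distinct_count_partitioned([len(server_a), len(server_b)])
```

Both paths give the same answer only when the partitions really are disjoint. If a value can appear in more than one partial result, adding the counts overcounts, so the shortcut would have to be gated on knowing the partitioning of the column.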