Jackie-Jiang commented on issue #5893: URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-689154234
@mr-agrwal In order to aggregate the values, we have to store the serialized `Set` in the star-tree (the same as what we would need to store for `DistinctCount`). The size of this `Set` is unbounded, since it holds all the distinct values under a tree node. I don't think it will work properly, for 2 reasons:

1. The star-tree could become huge if there are many distinct values under a tree node, which can lead to memory issues.
2. Reading and deserializing the set could be very expensive (even more expensive than scanning the raw values and creating a new set).

It might work for low-cardinality columns (e.g. colA has <1000 distinct values), but that is not very common IMO.
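To make the size concern concrete, here is a rough Java sketch of what keeping a pre-aggregated distinct set per tree node would involve. Everything in it is an assumption for illustration: the class `DistinctSetSketch`, its methods, and the `[count][values...]` byte layout are hypothetical and are not Pinot's actual star-tree code or serialization format. It only shows that the stored bytes grow linearly with the number of distinct values, and that merging two nodes means deserializing both sets in full.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names are Pinot APIs.
final class DistinctSetSketch {

    // Assumed layout: a 4-byte count followed by 4 bytes per distinct int value.
    static byte[] serialize(Set<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (values.size() + 1));
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Rebuilds the whole set; the cost is proportional to the number of distinct values.
    static Set<Integer> deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        Set<Integer> values = new HashSet<>(count * 2);
        for (int i = 0; i < count; i++) {
            values.add(buf.getInt());
        }
        return values;
    }

    // Aggregating two nodes: deserialize both, union them, serialize the result.
    static byte[] merge(byte[] left, byte[] right) {
        Set<Integer> merged = deserialize(left);
        merged.addAll(deserialize(right));
        return serialize(merged);
    }

    // Helper producing the distinct values from (inclusive) to to (exclusive).
    static Set<Integer> range(int from, int to) {
        Set<Integer> values = new HashSet<>();
        for (int i = from; i < to; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // Print the per-node payload size for a low- and a high-cardinality column.
        for (int cardinality : new int[] {1_000, 100_000}) {
            byte[] node = serialize(range(0, cardinality));
            System.out.println(cardinality + " distinct values -> " + node.length + " bytes for one node");
        }
    }
}
```

Under this assumed layout, a node with 1,000 distinct ints costs about 4 KB, while one with 100,000 costs about 400 KB, and that cost is paid again for every tree node that covers those values. A real implementation would likely use a more compact encoding, but the linear growth would remain.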