[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

GitBox Tue, 08 Sep 2020 14:48:07 -0700


Jackie-Jiang commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-689154234



   @mr-agrwal In order to aggregate the values, we have to store the serialized 
`Set` in the star-tree (the same as what we would need to store for `DistinctCount`). 
The size of this `Set` is unbounded, because it stores all the distinct values 
under a tree node. I don't think it will work properly, for 2 reasons:
   1. The star-tree could become huge if there are many distinct values under 
a tree node, which can lead to memory issues
   2. Reading and deserializing the set could be very expensive (even more 
expensive than scanning the raw values and creating a new set)
   
   It might work for low cardinality columns (e.g. colA has <1000 distinct 
values), but that is not very common IMO
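
   To illustrate the size concern, here is a rough sketch in Python (not Pinot 
code). The byte layout (a 4-byte count followed by 8 bytes per value) and the 
helper names `serialize_set`, `deserialize_set` and `merged_distinct_count` are 
assumptions made up for this example, not the actual star-tree format. It only 
shows that the payload stored per tree node grows linearly with the number of 
distinct values, and that every payload has to be fully deserialized before 
the sets can be merged:

```python
import struct


def serialize_set(values):
    """Serialize the distinct values as a 4-byte count plus 8 bytes per value.

    Hypothetical layout for illustration only, not Pinot's real format.
    """
    distinct = sorted(set(values))
    return struct.pack(f">i{len(distinct)}q", len(distinct), *distinct)


def deserialize_set(payload):
    """Rebuild the set from the hypothetical layout above."""
    (count,) = struct.unpack_from(">i", payload, 0)
    return set(struct.unpack_from(f">{count}q", payload, 4))


def merged_distinct_count(payloads):
    """Aggregate across tree nodes: deserialize every set, then union them."""
    merged = set()
    for payload in payloads:
        merged |= deserialize_set(payload)
    return len(merged)


if __name__ == "__main__":
    # The stored size per node is 4 + 8 * (number of distinct values).
    for cardinality in (1_000, 10_000, 100_000):
        size = len(serialize_set(range(cardinality)))
        print(f"{cardinality} distinct values: {size} bytes")
```

   With this layout a node covering 1000 distinct values needs 8004 bytes, and 
the cost keeps growing with cardinality, which is why only the low cardinality 
case looks feasible.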


