Jackie-Jiang commented on issue #5893:
URL: 
https://github.com/apache/incubator-pinot/issues/5893#issuecomment-675644651


   @mr-agrwal It might not be efficient to support star-tree on 
`SegmentPartitionedDistinctCount` because:
   - To generate the star-tree, we need to compute the intermediate aggregated 
value for each combination of dimensions. For `SegmentPartitionedDistinctCount`, 
that value is a `Set` which contains all the distinct values and therefore has 
unbounded size.
   - Storing all these sets could cause memory issues during segment creation, 
and the segment size could become huge.
   - At query time, deserializing these sets could be slow, and we would not gain 
much performance because we still need to process all of these distinct values.
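
   To make the first two points concrete, here is a minimal Python sketch (not 
Pinot code). It assumes a simplified star-tree that materializes every 
dimension combination, and `pre_aggregate` and the `STAR` marker are 
hypothetical names. Each combination holds a `Set` of distinct values, so the 
all-star group keeps every distinct value, and the total number of stored 
values can grow up to (number of rows) x 2^(number of dimensions).

```python
from itertools import product

STAR = "*"  # hypothetical marker meaning "any value of this dimension"


def pre_aggregate(rows):
    """Compute a distinct-count intermediate value (a set) per dimension combination.

    rows: iterable of (dims_tuple, value). Each row's value is added to every
    combination obtained by replacing any subset of its dimensions with STAR.
    This is a simplification: a real star-tree picks a split order and does
    not necessarily materialize every combination.
    """
    groups = {}
    for dims, value in rows:
        # Each dimension either keeps its own value or collapses to STAR.
        for key in product(*[(d, STAR) for d in dims]):
            groups.setdefault(key, set()).add(value)
    return groups


# Example: 1,000 unique ids spread over two low-cardinality dimensions.
rows = [((i % 3, i % 2), i) for i in range(1000)]
groups = pre_aggregate(rows)
stored = sum(len(values) for values in groups.values())
print(len(groups), len(groups[(STAR, STAR)]), stored)  # 12 1000 4000
```

   Even with only 12 groups, the sets store 4,000 values for 1,000 input rows, 
and the `(STAR, STAR)` set alone holds every distinct value.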
   
   We usually add star-tree support only for functions whose intermediate 
aggregated values have limited size (e.g. Double, HyperLogLog, TDigest). For 
the distinct count family, we have star-tree support for `DistinctCountHLL` and 
`DistinctCountBitmap`.
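
   By contrast, a limited-size intermediate value stays the same size no matter 
how many values are merged into it. The sketch below is an illustration only, 
not Pinot's implementation: it uses linear counting as a simpler stand-in for 
HyperLogLog, and `LinearCountingSketch` is a hypothetical class. Every 
intermediate value is a fixed `num_bits` bit array, and merging two of them is 
a bitwise OR, so pre-aggregated nodes never grow.

```python
import hashlib
import math


class LinearCountingSketch:
    """Fixed-size approximate distinct counter (linear counting).

    Stand-in for a bounded-size intermediate such as HyperLogLog: the state is
    always `num_bits` bits, however many values are added or merged.
    """

    def __init__(self, num_bits=4096):
        self.num_bits = num_bits
        self.bits = 0  # a Python int used as a bit array

    def add(self, value):
        # Hash the value to one bit position and set that bit.
        digest = hashlib.sha1(str(value).encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits |= 1 << index

    def merge(self, other):
        # Union of two sketches is a bitwise OR; the size does not grow.
        if self.num_bits != other.num_bits:
            raise ValueError("sketch sizes differ")
        merged = LinearCountingSketch(self.num_bits)
        merged.bits = self.bits | other.bits
        return merged

    def estimate(self):
        # Linear counting estimate: n ~= -m * ln(zero_bits / m).
        zeros = self.num_bits - bin(self.bits).count("1")
        if zeros == 0:
            return math.inf  # saturated: the sketch can no longer estimate
        return -self.num_bits * math.log(zeros / self.num_bits)


# Example: two overlapping partial aggregates, as two child nodes might hold.
left, right, full = LinearCountingSketch(), LinearCountingSketch(), LinearCountingSketch()
for i in range(600):
    left.add(i)
for i in range(400, 1000):
    right.add(i)
for i in range(1000):
    full.add(i)
merged = left.merge(right)
print(round(merged.estimate()))  # approximately 1000
```

   The merged value has exactly the same bits as a sketch built from all 1,000 
values, yet it still occupies `num_bits` bits, which is what makes this kind of 
function a good fit for star-tree pre-aggregation.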

