richardstartin commented on issue #7973:
URL: https://github.com/apache/pinot/issues/7973#issuecomment-1005437083


   This makes sense as it is for a couple of reasons:
   * Chunks for metric columns are tiny: 4-8KB depending on the data type, so a column scan would need to decompress a very large number of chunks. As a rough example, a LONG metric column with 10 million rows is about 80MB raw, which is on the order of ten thousand 8KB chunks per scan.
   * General-purpose compression algorithms work better on text than on arbitrary numeric data, so the compression ratio for the average user's metric column likely wouldn't be very good.
   
   Together, these two factors make a less than compelling case for general-purpose compression of metric columns.
   
   There are numerous lightweight encoding techniques, such as delta or frame-of-reference encoding, that could be explored for metric columns in the future; on numeric data these tend to produce better space reduction than general-purpose compression and are faster to decode (see the sketch below).
   
   If you have a metric column that you expect to be compressible because it has lots of duplicate values, it would be worth experimenting with a dictionary-encoded column instead.
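   For reference, a sketch of what that experiment might look like in the table config. The column name is made up and the exact shape can vary between Pinot versions, so treat this as a starting point rather than a recipe: dictionary encoding is the default, so the main thing is to make sure the column is not listed in `noDictionaryColumns`, or to pin the encoding explicitly via `fieldConfigList`.

   ```json
   {
     "tableIndexConfig": {
       "noDictionaryColumns": []
     },
     "fieldConfigList": [
       {
         "name": "myMetricColumn",
         "encodingType": "DICTIONARY"
       }
     ]
   }
   ```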

