kangpinghuang opened a new issue #2967: add a test for different encoding
URL: https://github.com/apache/incubator-doris/issues/2967
 
 
   I added a test comparing encodings in different situations.
   
   I generated 100 million ints, classified into 4 types: sequence, random, 
small step, and large step (big_step in the tables below).
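
A minimal sketch of how four such datasets could be generated. The issue does not say how the data was produced, so the function name `make_dataset`, the step sizes, the random value range, and the seed are all assumptions made for illustration only.

```python
import random

def make_dataset(kind, n, seed=0, small_step=2, big_step=1000):
    """Return n ints following one of the four patterns named in the issue.

    Assumption: "small step" and "large step" mean a constant increment
    between consecutive values. The actual definitions and step sizes
    are not given in the issue.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    if kind == "sequence":
        return list(range(n))
    if kind == "random":
        # Assumed range: non-negative 32-bit signed ints.
        return [rng.randrange(2**31) for _ in range(n)]
    if kind == "small_step":
        return [i * small_step for i in range(n)]
    if kind == "big_step":
        return [i * big_step for i in range(n)]
    raise ValueError(f"unknown kind: {kind}")

# Use a small n here; the issue used 100 million values per dataset.
kinds = ("sequence", "random", "small_step", "big_step")
datasets = {kind: make_dataset(kind, 10) for kind in kinds}
```

Keeping the generator deterministic makes it possible to load identical data under each encoding, so that size and latency differences come from the encoding rather than from the data.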
   
   The original data sizes are as follows:
   
   sequence | random | small_step | big_step
   -- | -- | -- | --
   848 | 1000M | 859 | 859
   
   - tests
   
   I tested four encoding methods: alpha, beta_bitshuffle, beta_for (frame 
of reference), and beta_rle. The results are as follows:
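
For readers unfamiliar with two of the techniques named above, here is a toy sketch of run-length encoding and frame-of-reference encoding. It only illustrates the general ideas and is not the Doris implementation; the function names and the returned layouts are hypothetical.

```python
def rle_encode(values):
    """Run-length encoding: collapse repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original list."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out

def for_encode(values):
    """Frame of reference: keep the minimum once, then small offsets.

    Expects a non-empty list. Returns (base, bit_width, offsets), where
    bit_width is the number of bits needed for the largest offset.
    """
    base = min(values)
    offsets = [v - base for v in values]
    bit_width = max(offsets).bit_length()
    return base, bit_width, offsets

def for_decode(base, offsets):
    """Rebuild the original values from the base and the offsets."""
    return [base + o for o in offsets]

print(rle_encode([7, 7, 7, 8, 8]))       # [[7, 3], [8, 2]]
print(for_encode([1000, 1003, 1001]))    # (1000, 2, [0, 3, 1])
```

In this simplified form, run-length encoding pays off when values repeat, and frame of reference pays off when values sit close to a common base.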
   
   1. space
   
   The following is the space used after encoding 100 million ints.
   
   Unit (KB) | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 2865.152 | 104420.4 | 2108.416 | 2224.128
   beta_bitshuffle | 4094.976 | 143268.9 | 1682.432 | 2679.808
   beta_for | 4582.4 | 94251.01 | 797.3325 | 956.233728
   beta_rle | 818.0101 | 10342.4 | 778.4581 | 783.970304
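
To make the table easier to compare, the following sketch finds the smallest encoding per data type and expresses each size relative to alpha. The numbers are copied from the table above; the helper names are hypothetical.

```python
# Encoded sizes in KB, copied from the table above.
columns = ["sequence", "random", "small_step", "big_step"]
sizes_kb = {
    "alpha":           [2865.152, 104420.4, 2108.416, 2224.128],
    "beta_bitshuffle": [4094.976, 143268.9, 1682.432, 2679.808],
    "beta_for":        [4582.4, 94251.01, 797.3325, 956.233728],
    "beta_rle":        [818.0101, 10342.4, 778.4581, 783.970304],
}

def smallest_per_column(sizes, columns):
    """Map each data type to the encoding with the smallest encoded size."""
    result = {}
    for i, col in enumerate(columns):
        result[col] = min(sizes, key=lambda name: sizes[name][i])
    return result

def ratio_to_baseline(sizes, baseline="alpha"):
    """Express every size as a fraction of the baseline encoding's size."""
    base = sizes[baseline]
    return {
        name: [v / b for v, b in zip(vals, base)]
        for name, vals in sizes.items()
    }

print(smallest_per_column(sizes_kb, columns))
for name, ratios in ratio_to_baseline(sizes_kb).items():
    print(name, [round(r, 2) for r in ratios])
```

With these figures, `smallest_per_column` returns beta_rle for all four data types.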
   
   The graph is as follows:
   
![image](https://user-images.githubusercontent.com/40422952/75019415-c9441d00-54cb-11ea-92f7-a6ac0e2ae9f7.png)
   
   2. query time cost for count(*) 
   
   Times are 95th-percentile query latencies, in ms.
   
     | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 7399.1 | 5416.48 | 6231 | 5372.88
   beta_bitshuffle | 14342 | 12059.82 | 9186.91 | 8817.78
   beta_for | 8752.04 | 11379.43 | 12403.98 | 8415.49
   beta_rle | 8544.95 | 8614.29 | 9299.58 | 8295.44
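
For reference, a 95th-percentile latency can be derived from raw timing samples as sketched below. The issue does not say which percentile definition or tool was used; this sketch assumes the nearest-rank method, and the function name is hypothetical.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    Assumption: nearest-rank is only one of several common percentile
    definitions; the one used for the table above is not stated.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Rank is the smallest 1-based position covering pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies in ms, not taken from the benchmark.
latencies_ms = [8.1, 8.3, 8.2, 9.0, 8.4, 8.6, 8.2, 12.5, 8.3, 8.5]
print(percentile_nearest_rank(latencies_ms, 95))
```

Reporting a high percentile rather than the mean keeps a handful of very fast runs from hiding occasional slow queries.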
   
   the graph is:
   
![image](https://user-images.githubusercontent.com/40422952/75019604-22ac4c00-54cc-11ea-8ad9-38d9bb419ace.png)
   
   3. query time cost for point query
   
   select count(*) from table where id = xxx;
   
   
     | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 8.3 | 8.66 | 477.26 | 10.73
   beta_bitshuffle | 9.65 | 9.86 | 413.63 | 10.91
   beta_for | 25.3 | 29.98 | 401.32 | 30.13
   beta_rle | 8.65 | 9.06 | 398.92 | 10.86
   
   the graph is:
   
   
![image](https://user-images.githubusercontent.com/40422952/75019723-61420680-54cc-11ea-80ef-da9862433f4c.png)
   
   - conclusion
   
   beta_rle achieves the best space efficiency in all situations, compared 
with both the other beta encodings and alpha encoding. Its query performance is 
the best overall among the Segment V2 encodings, but somewhat worse than that 
of alpha encoding.
   
   
