benwtrent commented on issue #11830:
URL: https://github.com/apache/lucene/issues/11830#issuecomment-1319099313

   I changed the PR to use delta encoding and vint. Even with the memory offsets stored within `vex`, the storage savings are much better than with PackedInts.
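
As a rough illustration of why delta encoding plus vint shrinks the neighbor lists, here is a minimal Python sketch. It is not the code from the PR: the function names `encode_neighbors` and `decode_neighbors` are hypothetical, and it assumes each node's neighbor ids are sorted ascending so that every gap is non-negative. The byte layout follows the usual vint scheme (7 payload bits per byte, low-order bits first, high bit set when more bytes follow).

```python
def encode_neighbors(sorted_ids):
    """Delta-encode ascending ids and write each gap as a variable-length int.

    Hypothetical helper for illustration only; assumes ids are non-negative
    and sorted ascending.
    """
    out = bytearray()
    prev = 0
    for node in sorted_ids:
        delta = node - prev
        if delta < 0:
            raise ValueError("ids must be sorted ascending")
        prev = node
        # Emit 7 bits at a time, low-order first; the high bit marks continuation.
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_neighbors(data):
    """Inverse of encode_neighbors: read vints and accumulate the gaps."""
    ids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes belong to this gap
        else:
            prev += value       # gap complete: rebuild the absolute id
            ids.append(prev)
            value = 0
            shift = 0
    return ids


neighbors = [3, 7, 8, 300, 100000]
encoded = encode_neighbors(neighbors)
```

Small gaps fit in a single byte each, so densely numbered neighbor lists cost far less than a fixed width per id. The trade-off is that the encoded lists are variable-length, which is why per-node offsets have to be stored alongside them.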
   
   The table below shows the size improvements for different datasets and parameters:
   
   | packed_vex_mb_size | vex_mb_size | packed_index_build_time | index_build_time | params | dataset | percent_reduction |
   |--------------------|-------------|-------------------------|------------------|------------------------------------|---------------------|-------------------|
   | 79.9 | 161.6 | 767 | 784 | "{'M': 16, 'efConstruction': 100}" | glove-100-angular | 50.55693069 |
   | 108.4 | 464.1 | 1138 | 1225 | "{'M': 48, 'efConstruction': 100}" | glove-100-angular | 76.64296488 |
   | 2.3 | 8.2 | 36 | 36 | "{'M': 16, 'efConstruction': 100}" | mnist-784-euclidean | 71.95121951 |
   | 2.4 | 23.5 | 36 | 36 | "{'M': 48, 'efConstruction': 100}" | mnist-784-euclidean | 89.78723404 |
   | 66.1 | 392.2 | 501 | 572 | "{'M': 48, 'efConstruction': 100}" | sift-128-euclidean | 83.1463539 |
   | 59.7 | 136.6 | 449 | 516 | "{'M': 16, 'efConstruction': 100}" | sift-128-euclidean | 56.29575403 |
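
For reference, the `percent_reduction` column is consistent with the relative drop from `vex_mb_size` to `packed_vex_mb_size`. A small sketch of that arithmetic, where the helper name `percent_reduction` is my own and the sizes are copied from the table:

```python
def percent_reduction(baseline_mb, packed_mb):
    """Relative size reduction of the packed file versus the baseline, in percent."""
    return (baseline_mb - packed_mb) / baseline_mb * 100.0


# (vex_mb_size, packed_vex_mb_size) pairs taken from the table
sizes = {
    "glove-100-angular, M=16": (161.6, 79.9),
    "glove-100-angular, M=48": (464.1, 108.4),
    "mnist-784-euclidean, M=16": (8.2, 2.3),
    "mnist-784-euclidean, M=48": (23.5, 2.4),
    "sift-128-euclidean, M=48": (392.2, 66.1),
    "sift-128-euclidean, M=16": (136.6, 59.7),
}

reductions = {
    name: percent_reduction(baseline, packed)
    for name, (baseline, packed) in sizes.items()
}
```

In these numbers the reduction is larger at `M=48` than at `M=16` for every dataset.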
   
   
   For the curious, here are the QPS numbers (higher is better) for packed 
(delta & vint) vs baseline:
   
   # GloVe
   
   
![image](https://user-images.githubusercontent.com/4357155/202539450-415f1622-cf6f-4cc6-8de5-e714b47cc8a6.png)
   
   # MNIST
   
   
![image](https://user-images.githubusercontent.com/4357155/202539516-235485b1-9b01-497f-81af-ce2d7475ae74.png)
   
   # SIFT
   
   
![image](https://user-images.githubusercontent.com/4357155/202539592-d3c387e2-60e2-4956-8e92-b5b9361588bb.png)
   
   

