benwtrent commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1318779275

@rmuir @jpountz I have the following local changes (I will clean them up and push to the PR soon). The change seems to improve queries per second compared to the unpacked codec in some cases, and leaves it unchanged in others.
   
The local changes are:
 - Delta-encode the neighbor doc IDs
 - VInt-serialize them
 - Keep track of memory offsets and store them in the vex file as well
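A minimal Java sketch of the three changes listed above, written against an in-memory `ByteArrayOutputStream` instead of Lucene's `IndexOutput`. The class and method names (`PackedNeighborsSketch`, `encode`, `decode`) are hypothetical, storing the neighbor count before each list is an assumption, and this is not the code in the PR. The VInt layout mirrors Lucene's `DataOutput.writeVInt`: 7 payload bits per byte, with the high bit set when more bytes follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch: delta + VInt encoding of per-node neighbor lists, with offsets. */
public class PackedNeighborsSketch {

  /** Writes a non-negative int, 7 bits per byte, high bit = "more bytes follow". */
  static void writeVInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Reads a VInt starting at pos[0] and advances pos[0] past it. */
  static int readVInt(byte[] buf, int[] pos) {
    int b = buf[pos[0]++] & 0xFF;
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = buf[pos[0]++] & 0xFF;
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  /** Encodes every node's neighbors; offsets[i] receives the byte where node i starts. */
  static byte[] encode(int[][] neighbors, long[] offsets) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int node = 0; node < neighbors.length; node++) {
      offsets[node] = out.size(); // remember where this node's list begins
      int[] sorted = neighbors[node].clone();
      Arrays.sort(sorted); // sorted IDs give small, non-negative deltas
      writeVInt(out, sorted.length); // hypothetical: count stored first
      int prev = 0;
      for (int id : sorted) {
        writeVInt(out, id - prev); // store the gap, not the absolute ID
        prev = id;
      }
    }
    return out.toByteArray();
  }

  /** Decodes one node's neighbors by jumping straight to its recorded offset. */
  static int[] decode(byte[] buf, long offset) {
    int[] pos = {(int) offset};
    int n = readVInt(buf, pos);
    int[] ids = new int[n];
    int prev = 0;
    for (int i = 0; i < n; i++) {
      prev += readVInt(buf, pos); // undo the delta encoding
      ids[i] = prev;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[][] neighbors = {{5, 1, 300}, {0, 2}, {70000, 3, 4}};
    long[] offsets = new long[neighbors.length];
    byte[] packed = encode(neighbors, offsets);
    System.out.println(Arrays.toString(offsets)); // prints [0, 5, 8]
    System.out.println(Arrays.toString(decode(packed, offsets[2]))); // prints [3, 4, 70000]
  }
}
```

Recording the offsets is what keeps the variable-length encoding seekable: without them, reading node `i`'s neighbors would require decoding every earlier list first.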
   
Here are the vex file size reductions compared to the unpacked values (over 80% in some cases!):

It makes sense that the larger the `M`, the bigger the impact.
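One possible intuition for why a larger `M` helps, offered as an assumption rather than something the benchmark demonstrates: more neighbors per node means a denser sorted ID list, so the gaps between consecutive IDs shrink and need fewer VInt bytes. The sketch uses hypothetical, evenly spaced neighbor IDs over an assumed 1,000,000-doc segment and compares them against a fixed 4 bytes per ID. Real HNSW neighbor lists are not evenly spaced, and `GapSizeSketch` is a hypothetical name.

```java
/** Hypothetical illustration of how neighbor-list density affects VInt delta size. */
public class GapSizeSketch {

  /** Number of bytes a non-negative int occupies in VInt form (7 payload bits per byte). */
  static int vIntLength(int value) {
    int bytes = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      bytes++;
    }
    return bytes;
  }

  /**
   * Bytes for n neighbor IDs placed at gap, 2*gap, ..., n*gap within [0, maxDoc]:
   * every delta equals the gap, so each one costs vIntLength(gap) bytes.
   */
  static int packedBytes(int n, int maxDoc) {
    return n * vIntLength(maxDoc / n);
  }

  public static void main(String[] args) {
    int maxDoc = 1_000_000; // assumed segment size, not taken from the benchmark
    for (int n : new int[] {16, 32, 96}) {
      System.out.printf("n=%d fixed=%d packed=%d%n", n, 4 * n, packedBytes(n, maxDoc));
    }
    // prints:
    // n=16 fixed=64 packed=48
    // n=32 fixed=128 packed=96
    // n=96 fixed=384 packed=192
  }
}
```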
   
| packed vex size (MB) | unpacked vex size (MB) | packed index build time | unpacked index build time | params | dataset | vex size reduction (%) |
|---|---|---|---|---|---|---|
| 79.9 | 161.6 | 767 | 784 | `{'M': 16, 'efConstruction': 100}` | glove-100-angular | 50.55693069 |
| 108.4 | 464.1 | 1138 | 1225 | `{'M': 48, 'efConstruction': 100}` | glove-100-angular | 76.64296488 |
| 2.3 | 8.2 | 36 | 36 | `{'M': 16, 'efConstruction': 100}` | mnist-784-euclidean | 71.95121951 |
| 2.4 | 23.5 | 36 | 36 | `{'M': 48, 'efConstruction': 100}` | mnist-784-euclidean | 89.78723404 |
| 66.1 | 392.2 | 501 | 572 | `{'M': 48, 'efConstruction': 100}` | sift-128-euclidean | 83.1463539 |
| 59.7 | 136.6 | 449 | 516 | `{'M': 16, 'efConstruction': 100}` | sift-128-euclidean | 56.29575403 |
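The reduction column appears to be computed from the two vex sizes alone, as (unpacked − packed) / unpacked × 100; this matches every row checked, although the formula is inferred rather than stated in the comment. A small Java check of that assumption, where `ReductionCheck` and `percentReduction` are hypothetical names:

```java
/** Hypothetical check of how the vex size reduction column relates to the two size columns. */
public class ReductionCheck {

  /** Percent reduction of the packed vex size relative to the unpacked size. */
  static double percentReduction(double packedMb, double unpackedMb) {
    return (unpackedMb - packedMb) / unpackedMb * 100.0;
  }

  public static void main(String[] args) {
    // glove-100-angular, M=16 row: 79.9 MB packed vs 161.6 MB unpacked
    System.out.printf("%.2f%n", percentReduction(79.9, 161.6)); // prints 50.56
    // mnist-784-euclidean, M=48 row: 2.4 MB packed vs 23.5 MB unpacked
    System.out.printf("%.2f%n", percentReduction(2.4, 23.5)); // prints 89.79
  }
}
```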
   


