kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895246147

   Rebased the PR to incorporate recent changes (including the optimistic 
collection based on pro-rating)
   
   ---
   
   Single-segment search is unaffected, as expected:
   
   Lucene:
   ```
   recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
    0.812         0.731  200000   100      50       32        200         no     1392   153.38       1303.96           0.01             1           236.93       228.882       228.882
   ```
   
   Faiss:
   ```
   recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
    0.811         0.568  200000   100      50       32        200         no        0   133.12       1502.37           0.01             1           511.97       228.882       228.882
   ```
   
   ---
   
   Multi-segment search is a bit tricky now: we collect a different number of 
results from each segment based on its size, but the `efSearch` parameter is 
set independently of that (from the index factory string).
   
   Lucene:
   ```
   recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
    0.880         2.110  200000   100      50       32        200         no     9632    88.03       2271.85             6           235.10        228.882       228.882
   ```
   
   Faiss:
   ```
   recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
    0.894         1.731  200000   100      50       32        200         no        0    83.34       2399.69             4           511.99        228.882       228.882
   ```
   
   (Faiss latency is about 18% lower, 1.731 ms vs. 2.110 ms, so the speedup is <20% even with fewer segments: 4 vs. 6)
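
To make the mismatch concrete, here is a minimal sketch of size-proportional collection. It is an illustration only: `perSegmentK` is a hypothetical helper, the segment sizes are made up (they only sum to the 200000 docs in the tables), and the actual pro-rating in Lucene may use a different formula. The point is that the per-segment count varies with segment size, while a single `efSearch` value from the index factory string would apply to every segment.

```java
import java.util.Arrays;

public class ProRatedTopK {
    /**
     * Hypothetical helper: split topK across segments in proportion to
     * their doc counts. Not Lucene's actual implementation.
     */
    static int[] perSegmentK(int topK, int[] segmentSizes) {
        long total = 0;
        for (int size : segmentSizes) {
            total += size;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("segments must contain docs");
        }
        int[] perSegment = new int[segmentSizes.length];
        for (int i = 0; i < segmentSizes.length; i++) {
            // ceil(topK * size / total), clamped to the range [1, topK]
            long k = (topK * (long) segmentSizes[i] + total - 1) / total;
            perSegment[i] = (int) Math.max(1, Math.min(topK, k));
        }
        return perSegment;
    }

    public static void main(String[] args) {
        // Made-up segment sizes, for illustration only
        int[] sizes = {100_000, 50_000, 30_000, 20_000};
        // prints [50, 25, 15, 10]
        System.out.println(Arrays.toString(perSegmentK(100, sizes)));
    }
}
```

With these numbers the smallest segment is asked for 10 results and the largest for 50, yet both would be searched with the same `efSearch`.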
   
   ---
   
   In the future, we could try to expose a fixed set of parameters from Lucene 
and construct the index factory string programmatically to handle these 
caveats better.
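
A rough sketch of what that could look like follows. Everything here is an assumption rather than the PR's API: the class and method names are hypothetical, and so is the mapping of Lucene's `maxConn` to the HNSW `M` in a Faiss-style description, and of `beamWidth` to `efConstruction`. The parameter string follows the comma-separated `name=value` convention.

```java
public class FaissFactoryStrings {
    /** Hypothetical: build an index description from Lucene's maxConn. */
    static String description(int maxConn) {
        if (maxConn <= 0) {
            throw new IllegalArgumentException("maxConn must be positive");
        }
        return "IDMap,HNSW" + maxConn;
    }

    /** Hypothetical: build a "name=value,..." parameter string. */
    static String indexParams(int beamWidth, int efSearch) {
        if (beamWidth <= 0 || efSearch <= 0) {
            throw new IllegalArgumentException("parameters must be positive");
        }
        return "efConstruction=" + beamWidth + ",efSearch=" + efSearch;
    }

    public static void main(String[] args) {
        // maxConn=32 and beamWidth=200 come from the benchmark tables;
        // efSearch=150 (topK + fanout) is only an illustrative choice.
        System.out.println(description(32));       // prints IDMap,HNSW32
        System.out.println(indexParams(200, 150)); // prints efConstruction=200,efSearch=150
    }
}
```

Deriving `efSearch` from the same inputs as the per-segment result count would keep the two from drifting apart.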


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

