kaivalnp commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3024727415

   > see some knnPerfTest.py results comparing before/after
   
   I tried this on 300d byte vectors generated using:
   ```sh
   ./gradlew vectors-300
   ```
   
   ...and the results are strange!
   
   ---
   
   `main` without `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.834        0.507   0.498        0.982  100000   100      50       64        250         no     23.20       4310.16            0.00             1           31.91       114.441      114.441       HNSW
   ```
   
   `main` with `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.835        0.449   0.441        0.982  100000   100      50       64        250         no     15.36       6511.69            0.00             1           31.91       114.441      114.441       HNSW
   ```
   
   This PR without `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.839        0.524   0.516        0.985  100000   100      50       64        250         no     22.71       4402.57            0.00             1           31.93       114.441      114.441       HNSW
   ```
   
   This PR with `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.834        0.832   0.825        0.992  100000   100      50       64        250         no     15.43       6479.20            0.00             1           31.92       114.441      114.441       HNSW
   ```
   
   We see an ~85% latency regression in the vectorized implementation after these changes: latency goes from 0.449 ms on `main` to 0.832 ms with this PR, i.e. (0.832 − 0.449) / 0.449 ≈ 85%.
   
   ---
   
   I understand there is a cost to allocating a native `MemorySegment`, copying the query vector into it, and the associated GC overhead, but doesn't the regression seem a bit too high?
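   
   For context, here's a minimal sketch of the kind of per-comparison path I have in mind (the class, method, and kernel names below are hypothetical, not the PR's actual code):
   ```java
   import java.lang.foreign.Arena;
   import java.lang.foreign.MemorySegment;
   import java.lang.foreign.ValueLayout;
   
   class QuerySegmentSketch {
   
       // Hypothetical stand-in for the vectorized distance kernel; the real one
       // uses jdk.incubator.vector.
       static int dotProduct(MemorySegment a, MemorySegment b) {
           int sum = 0;
           for (long i = 0; i < a.byteSize(); i++) {
               sum += a.get(ValueLayout.JAVA_BYTE, i) * b.get(ValueLayout.JAVA_BYTE, i);
           }
           return sum;
       }
   
       // Suspected hot path: every similarity computation allocates a fresh
       // off-heap segment and copies the on-heap query bytes into it. An HNSW
       // search runs this once per visited node, so the allocation + copy
       // (and the resulting GC pressure) would add up quickly.
       static int score(byte[] query, MemorySegment docVector) {
           try (Arena arena = Arena.ofConfined()) {
               MemorySegment querySegment = arena.allocate(query.length);
               MemorySegment.copy(query, 0, querySegment, ValueLayout.JAVA_BYTE, 0, query.length);
               return dotProduct(querySegment, docVector);
           }
       }
   }
   ```
   If the allocation does happen per comparison rather than once per query, that would also be consistent with the flame graph below.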
   
   Attaching the flame graph comparison as well...
   <img width="1441" alt="Screenshot 2025-07-01 at 12 22 11 PM" src="https://github.com/user-attachments/assets/ae8450c0-ea2c-4bfb-af49-6cc3f126277d" />
   
   Any leads to debug would be appreciated!

