kaivalnp commented on PR #15341:
URL: https://github.com/apache/lucene/pull/15341#issuecomment-3423279799

   Thanks @mikemccand, there doesn't seem to be any performance penalty on 
"[beast3 (nightly benchmarking 
box)](https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html)
 -- a Ryzen Threadripper 3990X". Alignment definitely has some impact on 
"Raptor Lake box is i9-13900K", but it is smaller there (<10%) than on my 
machine -- so is this alignment issue mostly specific to Graviton, or ARM CPUs 
in general, as @rmuir shared?
   
   I tried running `knnPerfTest.py` on Cohere vectors (768d) with `DOT_PRODUCT` 
similarity
   
   `main` (4-byte-alignment)
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.890        1.917   1.908        0.995  1000000   100      50       32        250         no     5388     67.66      14780.22          130.41             1         3014.60      2929.688     2929.688       HNSW
   ```
   
   This PR (64-byte-alignment)
   
   ```
   recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.891        1.845   1.836        0.995  1000000   100      50       32        250         no     5403     62.48      16004.61          130.03             1         3014.90      2929.688     2929.688       HNSW
   ```
   
   Indexing time (`index(s)`) dropped by ~7.6%, and search latency (`latency(ms)`) dropped by ~3.8%.
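
   For reference, these percentages come from the `index(s)` and `latency(ms)` columns of the two runs above. A quick sketch of the arithmetic (the class and method names are only illustrative):

```java
public class SpeedupCheck {
    // Relative reduction, in percent, when a measurement goes from `before` to `after`.
    public static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // index(s): 67.66 on main vs 62.48 with this PR
        System.out.printf("indexing time reduced by %.2f%%%n", percentReduction(67.66, 62.48));
        // latency(ms): 1.917 on main vs 1.845 with this PR
        System.out.printf("search latency reduced by %.2f%%%n", percentReduction(1.917, 1.845));
    }
}
```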
   
   I see another action item from this benchmark: I wasn't aligning the output 
inside [this merge 
function](https://github.com/apache/lucene/blob/eb27b14eaa09c53496a50c5944160b4989910882/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsWriter.java#L252-L253),
 which HNSW-based vector formats use for merging. Note that `index(s)` 
improved in my benchmark but `force_merge(s)` did not -- presumably it will 
speed up after this additional change?
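
   To illustrate what "aligning the output" means here, below is a minimal sketch of padding an output to a 64-byte boundary before vector data is written. It uses a plain `ByteArrayOutputStream` instead of Lucene's `IndexOutput`, and the names `AlignmentSketch`, `alignOffset` and `padTo` are hypothetical. This is not the code in `Lucene99FlatVectorsWriter`, only the general idea.

```java
import java.io.ByteArrayOutputStream;

public class AlignmentSketch {
    // Round `offset` up to the next multiple of `alignmentBytes`.
    // Assumes `alignmentBytes` is a positive power of two (e.g. 4 or 64).
    public static long alignOffset(long offset, int alignmentBytes) {
        if (offset < 0 || alignmentBytes <= 0 || Integer.bitCount(alignmentBytes) != 1) {
            throw new IllegalArgumentException("offset must be >= 0 and alignment a power of two");
        }
        return (offset + alignmentBytes - 1) & -(long) alignmentBytes;
    }

    // Write zero bytes until the stream size is aligned; returns the aligned offset.
    public static long padTo(ByteArrayOutputStream out, int alignmentBytes) {
        long aligned = alignOffset(out.size(), alignmentBytes);
        while (out.size() < aligned) {
            out.write(0);
        }
        return aligned;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[70], 0, 70); // stand-in for a header or earlier field's data
        long vectorStart = padTo(out, 64); // vector data would begin at this offset
        System.out.println("vector data starts at offset " + vectorStart);
    }
}
```

   The point is that the same padding has to happen on every path that writes vectors, including the merge path, otherwise merged segments end up with the old alignment.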


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

