[I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

via GitHub Wed, 04 Oct 2023 05:48:59 -0700


benwtrent opened a new issue, #12621:
URL: https://github.com/apache/lucene/issues/12621


   ### Description
   
   While testing and digging around, I noticed that our float comparisons are 
way faster than our byte comparisons on my MacBook (M1), and pretty much the 
same as byte on a GCP Intel Sapphire Rapids CPU.
   
   This seems counter-intuitive to me. I would expect Panama to be able to do 
more `byte` operations per cycle than `float`. My guess is that the intrinsics 
are weird? Or that Panama Vector just doesn't support or detect the required 
operations?
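   
   For reference, here is a minimal scalar sketch of the two kernels being 
compared, assuming the usual definition of a dot product. The class and method 
names are hypothetical; this is not the Lucene or vectorbench code. It shows 
the lane arithmetic behind the expectation above (16 `byte` lanes vs 4 `float` 
lanes in 128 bits) and one structural difference between the kernels: a 
`byte * byte` product does not fit in a `byte`, so it has to be widened before 
accumulating. Whether that widening explains the gap is only a guess.
   
   ```java
   public class DotProductSketch {
       /** Scalar float dot product: multiply and accumulate at 32-bit width. */
       static float dotProduct(float[] a, float[] b) {
           if (a.length != b.length) {
               throw new IllegalArgumentException("vector lengths differ");
           }
           float sum = 0f;
           for (int i = 0; i < a.length; i++) {
               sum += a[i] * b[i];
           }
           return sum;
       }

       /** Scalar byte dot product: Java promotes both byte operands to int. */
       static int dotProduct(byte[] a, byte[] b) {
           if (a.length != b.length) {
               throw new IllegalArgumentException("vector lengths differ");
           }
           int sum = 0;
           for (int i = 0; i < a.length; i++) {
               sum += a[i] * b[i]; // product computed and accumulated as int
           }
           return sum;
       }

       /** Number of elements of the given width that fit in one vector register. */
       static int lanes(int vectorBits, int elementBits) {
           return vectorBits / elementBits;
       }

       public static void main(String[] args) {
           System.out.println("128-bit byte lanes:  " + lanes(128, Byte.SIZE));
           System.out.println("128-bit float lanes: " + lanes(128, Float.SIZE));
           System.out.println(dotProduct(new byte[] {127, 127}, new byte[] {127, 127}));
       }
   }
   ```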
   
   Here are two benchmark results using @rmuir's helpful vectorbench project:
   
   MacBook (Apple Silicon [128bits], JDK21):
   
   ```
   FloatDotProductBenchmark.dotProductNew     768  thrpt    5  21.781 ± 0.254  ops/us
   FloatDotProductBenchmark.dotProductNew    1024  thrpt    5  15.091 ± 0.217  ops/us
   BinaryDotProductBenchmark.dotProductNew     768  thrpt    5  8.041 ± 0.108  ops/us
   BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5  6.085 ± 0.133  ops/us
   ```
   
   GCP (Intel Sapphire Rapids [avx512], JDK21):
   
   ```
   FloatDotProductBenchmark.dotProductNew     768  thrpt    5  20.169 ± 0.385  ops/us
   FloatDotProductBenchmark.dotProductNew    1024  thrpt    5  18.334 ± 0.180  ops/us
   BinaryDotProductBenchmark.dotProductNew     768  thrpt    5  19.686 ± 0.050  ops/us
   BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5  14.934 ± 0.014  ops/us
   ```
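   
   To put the two result sets side by side, the float-to-byte throughput ratio 
can be computed directly from the scores above. This is a small sketch with a 
hypothetical class name; it only divides the reported `ops/us` values and 
ignores the error bars.
   
   ```java
   public class ThroughputRatio {
       /** How many times faster the float kernel ran than the byte kernel. */
       static double ratio(double floatOpsPerUs, double byteOpsPerUs) {
           return floatOpsPerUs / byteOpsPerUs;
       }

       public static void main(String[] args) {
           // Scores copied from the benchmark tables above (ops/us).
           System.out.printf("M1   768: %.2fx%n", ratio(21.781, 8.041));
           System.out.printf("M1  1024: %.2fx%n", ratio(15.091, 6.085));
           System.out.printf("SPR  768: %.2fx%n", ratio(20.169, 19.686));
           System.out.printf("SPR 1024: %.2fx%n", ratio(18.334, 14.934));
       }
   }
   ```
   
   On these numbers float is roughly 2.7x and 2.5x faster than byte on the M1, 
versus about 1.02x and 1.23x on Sapphire Rapids.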
   
   <details>
   <summary>cpu-flags</summary>
   
   ```
   Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
          pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
          constant_tsc rep_good nopl xtopology nonstop_tsc cpuid
          tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
          x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
          3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced
          fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rtm avx512f
          avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni
          avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 arat
          avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni
          avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm
          md_clear serialize arch_capabilities
   ```
   
   </details>

