gf2121 commented on PR #14361:
URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729250628

   Thanks for the feedback.
   
   > Should we floor to a multiple of 16 instead of 8 so that we have a perfect 
second loop with AVX-512 as well?
   
   That is what I thought initially. But my AVX-512 machine (hopefully it
   really is one) somehow only processes 256 bits at a time for its `vpand`
   and `vpslld` instructions, so flooring to a multiple of 8 works on it.
   Flooring to a multiple of 16 makes the benchmark a bit slower on my
   machine, since more remaining longs need to be read, but I'm fine with
   flooring to a multiple of 16 if there are machines that require it.
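
To make the trade-off concrete, here is a minimal sketch of the flooring arithmetic only. It is not the code from the PR: the class name, the helper names, and the count of 508 are hypothetical, chosen just to show how the remainder grows when the step goes from 8 to 16.

```java
// Hypothetical sketch: how many values land in the vectorizable loop
// versus the scalar remainder for a given floor step.
class FloorSketch {
  // Round count down to a multiple of step. Assumes step is a power of two,
  // so -step is a mask with the low log2(step) bits cleared.
  static int floorToMultiple(int count, int step) {
    return count & -step;
  }

  // Number of values left over for the remainder loop.
  static int tail(int count, int step) {
    return count - floorToMultiple(count, step);
  }

  public static void main(String[] args) {
    int count = 508; // hypothetical number of values to decode
    for (int step : new int[] {8, 16}) {
      System.out.println(
          "step=" + step
              + " vectorized=" + floorToMultiple(count, step)
              + " tail=" + tail(count, step));
    }
  }
}
```

With this made-up count, a step of 8 leaves 4 values for the remainder path while a step of 16 leaves 12, which matches the direction of the slowdown described above.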
   
   ```
   Benchmark                                                 Mode  Cnt   Score   Error   Units
   Decode21Benchmark.decode21Scalar                         thrpt    5  28.114 ± 0.013  ops/ms
   Decode21Benchmark.decode21Scalar:asm                     thrpt          NaN             ---
   Decode21Benchmark.decode21Vector                         thrpt    5  49.160 ± 1.661  ops/ms
   Decode21Benchmark.decode21Vector:asm                     thrpt          NaN             ---
   Decode21Benchmark.decode21VectorFloorToMultipleOf16      thrpt    5  74.828 ± 0.463  ops/ms
   Decode21Benchmark.decode21VectorFloorToMultipleOf16:asm  thrpt          NaN             ---
   Decode21Benchmark.decode21VectorFloorToMultipleOf8       thrpt    5  81.078 ± 0.397  ops/ms
   Decode21Benchmark.decode21VectorFloorToMultipleOf8:asm   thrpt          NaN             ---
   ```
   
   CPU flags:
   ```
   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc 
rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid 
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb 
stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx 
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl 
xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni arch_capabilities
   ```
   
   > By the way, which of your machine produced the above benchmark results?
   
   The luceneutil results were obtained on the Intel chip (Intel(R) Xeon(R)
   Gold 5118 CPU @ 2.30GHz, with AVX-512).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

