mikemccand commented on PR #15341:
URL: https://github.com/apache/lucene/pull/15341#issuecomment-3422697948

   Raptor Lake box is i9-13900K:
   
   ```
   processor       : 31
   vendor_id       : GenuineIntel
   cpu family      : 6
   model           : 183
   model name      : 13th Gen Intel(R) Core(TM) i9-13900K
   stepping        : 1
   microcode       : 0x12f
   cpu MHz         : 800.000
   cache size      : 36864 KB
   physical id     : 0
   siblings        : 32
   core id         : 47
   cpu cores       : 24
   apicid          : 94
   initial apicid  : 94
   fpu             : yes
   fpu_exception   : yes
   cpuid level     : 32
   wp              : yes
   flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
   vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs ept_violation_ve ept_mode_based_exec tsc_scaling usr_wait_pause
   bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb rfds bhi spectre_v2_user
   bogomips        : 5990.40
   clflush size    : 64
   cache_alignment : 64
   address sizes   : 46 bits physical, 48 bits virtual
   power management:
   ```
   
   Results:
   
   ```
   Benchmark                                      (padBytes)  (size)   Mode  Cnt   Score   Error   Units
   VectorScorerBenchmark.binaryDotProductDefault           0     256  thrpt   15  14.037 ± 0.061  ops/us
   VectorScorerBenchmark.binaryDotProductDefault           1     256  thrpt   15  14.046 ± 0.071  ops/us
   VectorScorerBenchmark.binaryDotProductDefault           2     256  thrpt   15  14.139 ± 0.089  ops/us
   VectorScorerBenchmark.binaryDotProductDefault           4     256  thrpt   15  14.069 ± 0.040  ops/us
   VectorScorerBenchmark.binaryDotProductDefault           6     256  thrpt   15  14.038 ± 0.072  ops/us
   VectorScorerBenchmark.binaryDotProductDefault           8     256  thrpt   15  14.094 ± 0.070  ops/us
   VectorScorerBenchmark.binaryDotProductDefault          16     256  thrpt   15  14.073 ± 0.059  ops/us
   VectorScorerBenchmark.binaryDotProductDefault          20     256  thrpt   15  14.134 ± 0.075  ops/us
   VectorScorerBenchmark.binaryDotProductDefault          32     256  thrpt   15  14.016 ± 0.044  ops/us
   VectorScorerBenchmark.binaryDotProductDefault          50     256  thrpt   15  14.031 ± 0.046  ops/us
   VectorScorerBenchmark.binaryDotProductDefault          64     256  thrpt   15  14.082 ± 0.068  ops/us
   VectorScorerBenchmark.binaryDotProductDefault         100     256  thrpt   15  14.013 ± 0.059  ops/us
   VectorScorerBenchmark.binaryDotProductDefault         128     256  thrpt   15  14.079 ± 0.069  ops/us
   VectorScorerBenchmark.binaryDotProductDefault         255     256  thrpt   15  14.143 ± 0.074  ops/us
   VectorScorerBenchmark.binaryDotProductDefault         256     256  thrpt   15  14.026 ± 0.028  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            0     256  thrpt   15  49.305 ± 0.244  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            1     256  thrpt   15  48.572 ± 0.030  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            2     256  thrpt   15  48.508 ± 0.198  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            4     256  thrpt   15  48.636 ± 0.094  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            6     256  thrpt   15  48.536 ± 0.185  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg            8     256  thrpt   15  49.346 ± 0.166  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg           16     256  thrpt   15  49.419 ± 0.102  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg           20     256  thrpt   15  49.224 ± 0.396  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg           32     256  thrpt   15  49.423 ± 0.134  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg           50     256  thrpt   15  48.676 ± 0.167  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg           64     256  thrpt   15  49.060 ± 0.866  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg          100     256  thrpt   15  49.181 ± 0.210  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg          128     256  thrpt   15  49.444 ± 0.082  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg          255     256  thrpt   15  48.362 ± 0.163  ops/us
   VectorScorerBenchmark.binaryDotProductMemSeg          256     256  thrpt   15  48.169 ± 5.291  ops/us
   VectorScorerBenchmark.floatDotProductDefault            0     256  thrpt   15  23.215 ± 0.023  ops/us
   VectorScorerBenchmark.floatDotProductDefault            1     256  thrpt   15  23.207 ± 0.067  ops/us
   VectorScorerBenchmark.floatDotProductDefault            2     256  thrpt   15  23.181 ± 0.086  ops/us
   VectorScorerBenchmark.floatDotProductDefault            4     256  thrpt   15  23.156 ± 0.290  ops/us
   VectorScorerBenchmark.floatDotProductDefault            6     256  thrpt   15  23.232 ± 0.012  ops/us
   VectorScorerBenchmark.floatDotProductDefault            8     256  thrpt   15  23.215 ± 0.091  ops/us
   VectorScorerBenchmark.floatDotProductDefault           16     256  thrpt   15  23.194 ± 0.071  ops/us
   VectorScorerBenchmark.floatDotProductDefault           20     256  thrpt   15  23.202 ± 0.083  ops/us
   VectorScorerBenchmark.floatDotProductDefault           32     256  thrpt   15  23.207 ± 0.048  ops/us
   VectorScorerBenchmark.floatDotProductDefault           50     256  thrpt   15  23.227 ± 0.031  ops/us
   VectorScorerBenchmark.floatDotProductDefault           64     256  thrpt   15  23.187 ± 0.095  ops/us
   VectorScorerBenchmark.floatDotProductDefault          100     256  thrpt   15  23.246 ± 0.114  ops/us
   VectorScorerBenchmark.floatDotProductDefault          128     256  thrpt   15  23.214 ± 0.077  ops/us
   VectorScorerBenchmark.floatDotProductDefault          255     256  thrpt   15  23.212 ± 0.035  ops/us
   VectorScorerBenchmark.floatDotProductDefault          256     256  thrpt   15  23.239 ± 0.117  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             0     256  thrpt   15  53.514 ± 5.159  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             1     256  thrpt   15  49.594 ± 3.885  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             2     256  thrpt   15  50.504 ± 0.122  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             4     256  thrpt   15  51.385 ± 4.406  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             6     256  thrpt   15  50.497 ± 0.146  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg             8     256  thrpt   15  52.327 ± 0.292  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg            16     256  thrpt   15  51.401 ± 4.426  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg            20     256  thrpt   15  52.373 ± 0.307  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg            32     256  thrpt   15  54.779 ± 0.078  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg            50     256  thrpt   15  49.447 ± 1.502  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg            64     256  thrpt   15  54.788 ± 0.060  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg           100     256  thrpt   15  51.600 ± 0.352  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg           128     256  thrpt   15  54.650 ± 0.377  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg           255     256  thrpt   15  50.042 ± 0.167  ops/us
   VectorScorerBenchmark.floatDotProductMemSeg           256     256  thrpt   15  54.583 ± 0.399  ops/us
   ```
   
   There might be some small misalignment penalty for float SIMD?
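
One rough way to probe that question is to group the `floatDotProductMemSeg` scores by how well each `padBytes` value is aligned, then compare the group means. The sketch below does that under stated assumptions: the class name `PadAlignment` and its helpers are hypothetical, not part of the benchmark, and the 32-byte cap assumes 256-bit vectors (the flags list `avx2` but no AVX-512). It also ignores the reported error bars, some of which are large, so treat any difference it shows as a hint and not as evidence.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of VectorScorerBenchmark.
class PadAlignment {
  // Scores (ops/us) copied from the floatDotProductMemSeg rows above, keyed by padBytes.
  static final Map<Integer, Double> FLOAT_MEMSEG = new LinkedHashMap<>();

  static {
    int[] pads = {0, 1, 2, 4, 6, 8, 16, 20, 32, 50, 64, 100, 128, 255, 256};
    double[] scores = {53.514, 49.594, 50.504, 51.385, 50.497, 52.327, 51.401, 52.373,
                       54.779, 49.447, 54.788, 51.600, 54.650, 50.042, 54.583};
    for (int i = 0; i < pads.length; i++) {
      FLOAT_MEMSEG.put(pads[i], scores[i]);
    }
  }

  // Largest power of two dividing padBytes, capped at 32 bytes.
  // Assumption: 32 bytes is one 256-bit vector; a pad of 0 counts as fully aligned.
  static int alignment(int padBytes) {
    if (padBytes == 0) {
      return 32;
    }
    return Math.min(32, Integer.lowestOneBit(padBytes));
  }

  // Mean score per alignment class, sorted by alignment.
  static Map<Integer, Double> meanByAlignment(Map<Integer, Double> scores) {
    Map<Integer, double[]> acc = new TreeMap<>(); // alignment -> {sum, count}
    for (Map.Entry<Integer, Double> e : scores.entrySet()) {
      double[] a = acc.computeIfAbsent(alignment(e.getKey()), k -> new double[2]);
      a[0] += e.getValue();
      a[1] += 1;
    }
    Map<Integer, Double> means = new TreeMap<>();
    acc.forEach((k, a) -> means.put(k, a[0] / a[1]));
    return means;
  }

  public static void main(String[] args) {
    meanByAlignment(FLOAT_MEMSEG).forEach((align, mean) ->
        System.out.printf("alignment %2d bytes: mean %.3f ops/us%n", align, mean));
  }
}
```

Running `main` prints one mean per alignment class. Feeding in the `binaryDotProductMemSeg` scores the same way would give a baseline for comparison.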

