ChrisHegarty commented on issue #14042:
URL: https://github.com/apache/lucene/issues/14042#issuecomment-2546160112

   HotSpot unrolls loops that use the Vector API for floating-point 
arithmetic. On my Intel box `dotProductBody` gets unrolled 4x by the compiler, 
and since it is already hand-unrolled 4x, we effectively get 16x unrolling, e.g.:
   
   ```
   ;; B49: #      out( B49 B50 ) <- in( B48 B49 ) Loop( B49-B49 inner main of N191 strip mined) Freq: 272.018
   0x000079609ff2fa11:   vmovdqu32   zmm0,ZMMWORD PTR [rdx+rax*4+0x310]
   0x000079609ff2fa1c:   vmovdqu32   zmm1,ZMMWORD PTR [rdx+rax*4+0x210]
   0x000079609ff2fa27:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0xd0]
   0x000079609ff2fa32:   vfmadd231ps zmm6,zmm4,ZMMWORD PTR [rcx+rax*4+0xd0]
   0x000079609ff2fa3d:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x1d0]
   0x000079609ff2fa48:   vfmadd231ps zmm6,zmm4,ZMMWORD PTR [rcx+rax*4+0x1d0]
   0x000079609ff2fa53:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x2d0]
   0x000079609ff2fa5e:   vfmadd231ps zmm6,zmm4,ZMMWORD PTR [rcx+rax*4+0x2d0]
   0x000079609ff2fa69:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x3d0]
   0x000079609ff2fa74:   vfmadd231ps zmm6,zmm4,ZMMWORD PTR [rcx+rax*4+0x3d0]
   0x000079609ff2fa7f:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x90]
   0x000079609ff2fa8a:   vfmadd231ps zmm5,zmm4,ZMMWORD PTR [rcx+rax*4+0x90]
   0x000079609ff2fa95:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x190]
   0x000079609ff2faa0:   vfmadd231ps zmm5,zmm4,ZMMWORD PTR [rcx+rax*4+0x190]
   0x000079609ff2faab:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x290]
   0x000079609ff2fab6:   vfmadd231ps zmm5,zmm4,ZMMWORD PTR [rcx+rax*4+0x290]
   0x000079609ff2fac1:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x390]
   0x000079609ff2facc:   vfmadd231ps zmm5,zmm4,ZMMWORD PTR [rcx+rax*4+0x390]
   0x000079609ff2fad7:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x50]
   0x000079609ff2fae2:   vfmadd231ps zmm3,zmm4,ZMMWORD PTR [rcx+rax*4+0x50]
   0x000079609ff2faed:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x150]
   0x000079609ff2faf8:   vfmadd231ps zmm3,zmm4,ZMMWORD PTR [rcx+rax*4+0x150]
   0x000079609ff2fb03:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x250]
   0x000079609ff2fb0e:   vfmadd231ps zmm3,zmm4,ZMMWORD PTR [rcx+rax*4+0x250]
   0x000079609ff2fb19:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x350]
   0x000079609ff2fb24:   vfmadd231ps zmm3,zmm4,ZMMWORD PTR [rcx+rax*4+0x350]
   0x000079609ff2fb2f:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x10]
   0x000079609ff2fb3a:   vfmadd231ps zmm2,zmm4,ZMMWORD PTR [rcx+rax*4+0x10]
   0x000079609ff2fb45:   vmovdqu32   zmm4,ZMMWORD PTR [rdx+rax*4+0x110]
   0x000079609ff2fb50:   vfmadd231ps zmm2,zmm4,ZMMWORD PTR [rcx+rax*4+0x110]
   0x000079609ff2fb5b:   vfmadd231ps zmm2,zmm1,ZMMWORD PTR [rcx+rax*4+0x210]
   0x000079609ff2fb66:   vfmadd231ps zmm2,zmm0,ZMMWORD PTR [rcx+rax*4+0x310]
   0x000079609ff2fb71:   add    eax,0x100
   0x000079609ff2fb76:   cmp    eax,ebp
   0x000079609ff2fb78:   jl     0x000079609ff2fa11
   ;; B50: #      out( B48 B51 ) <- in( B49 )  Freq: 16.0002
   ``` 
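
As a rough cross-check of the 16x figure, the loop stride in the listing can be converted into vector-sized steps. The sketch below is a hypothetical helper (the class and method names are made up for illustration and are not part of Lucene). It assumes 512-bit `zmm` registers holding 32-bit floats, and that `eax` counts float elements, which is what the `rax*4` scaling in the addresses suggests.

```java
// Hypothetical helper, not part of Lucene: derives the effective unroll
// factor from numbers visible in the disassembly.
class UnrollFactor {
    static final int ZMM_BITS = 512;          // width of one zmm register
    static final int FLOAT_BITS = Float.SIZE; // 32 bits per float

    /** Number of float lanes held by one zmm register. */
    static int lanesPerVector() {
        return ZMM_BITS / FLOAT_BITS;
    }

    /** Vector-sized steps per loop iteration, given the index stride in floats. */
    static int vectorsPerIteration(int strideInFloats) {
        return strideInFloats / lanesPerVector();
    }

    public static void main(String[] args) {
        int stride = 0x100;  // from `add eax,0x100`
        int handUnroll = 4;  // four accumulators in the listing: zmm2, zmm3, zmm5, zmm6
        int total = vectorsPerIteration(stride);
        System.out.println("vector steps per iteration: " + total);
        System.out.println("compiler unroll on top of the hand unroll: " + total / handUnroll);
    }
}
```

Under those assumptions one iteration covers 256 floats, i.e. 16 vector steps spread over the four accumulators, which is consistent with a 4x compiler unroll on top of the 4x hand unroll.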
   
   Reducing the hand-unrolling of `dotProductBody` to 2x (e.g. draft PR #14071) 
gives me a bit of an improvement.
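
To make the shape of a 2x hand-unrolled body concrete, here is a minimal scalar sketch. It is not the Lucene implementation and does not use the Vector API; the class name and the use of plain `float` values in place of vector lanes are assumptions made purely for illustration. The point is only the structure: two independent accumulators per iteration, with any further unrolling left to the JIT.

```java
// Illustrative scalar analogue only; NOT Lucene's dotProductBody.
class DotProductUnroll2 {
    /** Dot product with two independent accumulators (2x hand-unrolled). */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float acc0 = 0f;
        float acc1 = 0f;
        int i = 0;
        int bound = a.length & ~1; // largest even prefix
        for (; i < bound; i += 2) {
            // Two independent dependency chains per iteration.
            acc0 = Math.fma(a[i], b[i], acc0);
            acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
        }
        float sum = acc0 + acc1;
        // Scalar tail for an odd trailing element.
        for (; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {1f, 1f, 1f, 1f, 1f};
        System.out.println(dot(a, b));
    }
}
```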
   
   Linux
   ```
   Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
   main
   VectorUtilBenchmark.floatDotProductVector     768  thrpt   75  31.888 ± 0.812  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  26.240 ± 0.550  ops/us
   
   reduce unroll to x2
   VectorUtilBenchmark.floatDotProductVector     768  thrpt   75  35.129 ± 0.749  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  28.060 ± 0.619  ops/us
   
   reduce unroll to x2 AND first is mul (rather than FMA)
   VectorUtilBenchmark.floatDotProductVector     768  thrpt   75  37.100 ± 0.726  ops/us
   VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  29.172 ± 0.514  ops/us
   ```
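
To put the scores above in relative terms, the small sketch below computes the throughput gain over `main` for each variant. The helper name is hypothetical; the scores are copied from the table, and the reported error margins are ignored, so the results are only indicative.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH scores above as relative gains.
class BenchmarkGain {
    /** Relative throughput gain of candidate over baseline, in percent. */
    static double gainPercent(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/us, copied from the table above.
        double main768 = 31.888, main1024 = 26.240;
        double x2At768 = 35.129, x2At1024 = 28.060;
        double x2MulAt768 = 37.100, x2MulAt1024 = 29.172;

        System.out.println(String.format(Locale.ROOT,
            "x2 unroll:             768 -> %.1f%%, 1024 -> %.1f%%",
            gainPercent(main768, x2At768), gainPercent(main1024, x2At1024)));
        System.out.println(String.format(Locale.ROOT,
            "x2 unroll + first mul: 768 -> %.1f%%, 1024 -> %.1f%%",
            gainPercent(main768, x2MulAt768), gainPercent(main1024, x2MulAt1024)));
    }
}
```

On these numbers the x2 variant is roughly 10% (size 768) and 7% (size 1024) faster than `main`, and the x2 plus first-mul variant roughly 16% and 11% faster.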



