[Bug tree-optimization/90128] 507.cactuBSSN_r is 9-11% slower at -Ofast and native march/tuning on Zen CPUs

rguenth at gcc dot gnu.org Wed, 17 Apr 2019 05:07:46 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90128


--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ugh.  Cactus is really ugly code :/  For one, there's a loop-invariant
switch () in the innermost loop, expanded to a binary tree of compares
(with a slightly different split point in GCC 8 vs. trunk), and unswitching
obviously cannot handle this.  This is a general missed optimization that
precludes any vectorization attempt here.  Then we spill like hell because
of the way the code is written.  Other than that I don't see anything
obvious here.  It might be that trunk:

    5802:       83 fb 06                cmp    $0x6,%ebx
    5805:       0f 84 25 84 00 00       je     dc30
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0xdc30>
    580b:       0f 8f cf 1d 00 00       jg     75e0
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0x75e0>
    5811:       83 fb 02                cmp    $0x2,%ebx
    5814:       0f 85 06 c0 ff ff       jne    1820
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0x1820>

is worse for the branch predictor than the GCC 8 version:

    89ee:       0f 84 bc 64 00 00       je     eeb0
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0xeeb0>
    89f4:       0f 8e 96 45 00 00       jle    cf90
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0xcf90>
    89fa:       8b b4 24 a8 08 00 00    mov    0x8a8(%rsp),%esi
    8a01:       83 fe 06                cmp    $0x6,%esi
    8a04:       0f 85 e6 8e ff ff       jne    18f0
<_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd+0x18f0>

(notice the "padding" reload).  That is probably going to depend on the
final code layout again, of course.  I recall reading that a third
conditional jump in a fetch word requires an additional branch predictor
slot or so.

So it would be interesting to see whether the branch misses accumulate on
that binary tree generated from the loop-invariant switch, where in
theory those branches should all be totally predictable.
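
To make the pattern concrete, here is a minimal sketch of a loop-invariant
switch and its manually unswitched form.  This is not the Cactus code: the
function names, the case values and the arithmetic are made up for
illustration, and how a given compiler treats such a small example may differ
from what it does with the real kernel.

```c
#include <stddef.h>

/* Hypothetical kernel: `mode` never changes inside the loop, yet the
   switch sits in the loop body and is evaluated on every iteration. */
void advect_switch_inside(double *out, const double *in, size_t n, int mode)
{
    for (size_t i = 0; i < n; i++) {
        switch (mode) {
        case 2:  out[i] = in[i] * 2.0; break;
        case 6:  out[i] = in[i] + 6.0; break;
        default: out[i] = in[i];       break;
        }
    }
}

/* Manually unswitched variant: dispatch on `mode` once, then run a
   loop whose body contains no control flow for the chosen case. */
void advect_unswitched(double *out, const double *in, size_t n, int mode)
{
    switch (mode) {
    case 2:
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * 2.0;
        break;
    case 6:
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] + 6.0;
        break;
    default:
        for (size_t i = 0; i < n; i++)
            out[i] = in[i];
        break;
    }
}
```

Both functions compute the same result.  In the second form each loop body is
straight-line code, which is the shape a vectorizer needs, and the compare
tree for `mode` is executed once instead of once per iteration.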

That said, I'm not yet able to reproduce the slowdown but will try.

[Bug tree-optimization/90128] 507.cactuBSSN_r is 9-11% slower at -Ofast and native march/tuning on Zen CPUs
