[Bug tree-optimization/115841] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 16 Jul 2024 02:17:34 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841


--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
r15-2054-g1e3aa9c9278db6, when backported to the branch, avoids the failure,
but the bug is of course still latent.

The Fortran loop is the DO KR=1,NRM loop at
module_mp_fast_sbm.fppized.f90:6100, which is in the JERTIMESC subroutine:

        SUBROUTINE JERTIMESC(FI1,X1,SFN11,SFN12 &
     &                      ,B11_MY,B12_MY,RIEC,CF,ID,COL,NKR)
      IMPLICIT NONE
       INTEGER NRM,KR,ICE,ID,NKR
      REAL B12,B11,FUN,DELM,FK,CF,SFN12S,SFN11S
        REAL  COL, &
     & X1(NKR,ID),FI1(NKR,ID),B11_MY(NKR,ID),B12_MY(NKR,ID) &
     &,RIEC(NKR,ID),SFN11,SFN12

        NRM=NKR-1
        DO 1 ICE=1,ID
             SFN11S=0.
             SFN12S=0.
             SFN11=CF*SFN11S
             SFN12=CF*SFN12S
             DO KR=1,NRM
! VALUE OF DISTRIBUTION FUNCTION
                FK=FI1(KR,ICE)
! DELTA-M
                DELM=X1(KR,ICE)*3.*COL
! INTEGRAL'S EXPRESSION
                FUN=FK*DELM
! VALUES OF INTEGRALS
                B11=B11_MY(KR,ICE)
                B12=B12_MY(KR,ICE)
                SFN11S=SFN11S+FUN*B11
                SFN12S=SFN12S+FUN*B12
             ENDDO
! CORRECTION
             SFN11=CF*SFN11S
             SFN12=CF*SFN12S
    1   CONTINUE
! END
        RETURN
        END SUBROUTINE JERTIMESC
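
For readers more at home in C, the inner KR loop for a single ICE column
can be sketched roughly as below.  This is only an illustrative
transliteration, not the C testcase from this bug; the function name
jertimesc_column and its signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical C rendering of the DO KR=1,NRM loop for one ICE column.
   Parameter names mirror the Fortran; this is NOT the testcase attached
   to the bug, just a sketch of the two-accumulator reduction.  */
void jertimesc_column(const float *fi1, const float *x1,
                      const float *b11_my, const float *b12_my,
                      float col, float cf, size_t nkr,
                      float *sfn11, float *sfn12)
{
    float sfn11s = 0.0f, sfn12s = 0.0f;
    size_t nrm = nkr > 0 ? nkr - 1 : 0;      /* NRM = NKR - 1 */

    for (size_t kr = 0; kr < nrm; kr++) {
        float fk = fi1[kr];                  /* value of distribution function */
        float delm = x1[kr] * 3.0f * col;    /* delta-m */
        float fun = fk * delm;               /* integrand */
        sfn11s += fun * b11_my[kr];          /* first reduction */
        sfn12s += fun * b12_my[kr];          /* second reduction */
    }

    *sfn11 = cf * sfn11s;                    /* correction */
    *sfn12 = cf * sfn12s;
}
```

The loop body has four loads (fi1, x1, b11_my, b12_my) feeding two float
reductions, which matches the four data references and two reduction PHIs
visible in the vectorizer dump.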

It's an inlined copy in ONECOND1 (and that is an IPA-CP clone).

The key to reproducing this is the peeling for alignment.  We have

module_mp_fast_sbm.fppized.f90:6100:19: note:  vectorization_factor = 16,
niters = 32
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_analyze_data_refs_alignment ===
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.170
  alignment:    64
  misalignment: 0
  based on:     fk_206 = MEM <float[0:D.7150]> [(float[0:D.7065]
*)&A.170][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &xl
  alignment:    32
  misalignment: 0
  based on:     _171 = MEM <float[0:D.7174]> [(float[0:D.7069] *)&xl][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.166
  alignment:    64
  misalignment: 0
  based on:     b11_164 = MEM <float[0:D.7153]> [(float[0:D.7075]
*)&A.166][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.167
  alignment:    64
  misalignment: 0
  based on:     b12_161 = MEM <float[0:D.7156]> [(float[0:D.7079]
*)&A.167][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
<float[0:D.7150]> [(float[0:D.7065] *)&A.170][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: note:   can't force alignment of ref:
MEM <float[0:D.7174]> [(float[0:D.7069] *)&xl][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
<float[0:D.7153]> [(float[0:D.7075] *)&A.166][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
<float[0:D.7156]> [(float[0:D.7079] *)&A.167][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_prune_runtime_alias_test_list ===
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_enhance_data_refs_alignment ===
module_mp_fast_sbm.fppized.f90:6100:19: missed:   Unknown misalignment,
naturally aligned
module_mp_fast_sbm.fppized.f90:6100:19: note:   vect_can_advance_ivs_p:
module_mp_fast_sbm.fppized.f90:6100:19: note:   Analyze phi: sfn11s_17 = PHI
<sfn11s_156(69), 0.0(7)>
module_mp_fast_sbm.fppized.f90:6100:19: note:   reduc or virtual phi. skip.
module_mp_fast_sbm.fppized.f90:6100:19: note:   Analyze phi: sfn12s_2 = PHI
<sfn12s_133(69), 0.0(7)>
...
module_mp_fast_sbm.fppized.f90:6100:19: note:   Alignment of access forced
using peeling.
module_mp_fast_sbm.fppized.f90:6100:19: note:   Peeling for alignment will be
applied.

where we align the xl load and end up with the other accesses misaligned.
The C testcase has the A arrays aligned to 16 and xl aligned to 32.  We're
doing runtime alignment of xl with a scalar prologue.
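
To make the arithmetic concrete, here is a small sketch of how a scalar
prologue count follows from one reference's address, and what misalignment
that leaves on another reference.  It is not GCC's implementation; the
helper names peel_iters and misalign_after_peel are hypothetical, and the
64-byte vector size is an assumption based on a vectorization factor of 16
with float elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of scalar iterations to run before `addr` reaches a multiple of
   `align` bytes.  Assumes `align` is a power of two and that `addr` is a
   multiple of `elem_size` (naturally aligned elements).  */
size_t peel_iters(uintptr_t addr, size_t align, size_t elem_size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(elem_size != 0 && addr % elem_size == 0);
    size_t mis = (size_t)(addr & (align - 1));
    return mis == 0 ? 0 : (align - mis) / elem_size;
}

/* Byte misalignment (relative to `align`) of a different reference once
   `peel` elements have been consumed by the scalar prologue.  */
size_t misalign_after_peel(uintptr_t addr, size_t peel,
                           size_t align, size_t elem_size)
{
    return (size_t)((addr + peel * elem_size) & (align - 1));
}
```

With made-up addresses, an xl at 0x1020 (32-byte aligned) needs
peel_iters(0x1020, 64, 4) = 8 prologue iterations, after which an array at
0x2010 (16-byte aligned) sits at offset 48 within a 64-byte vector.  Peeling
can only fix the alignment of the one reference it is computed from.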
