https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115845

            Bug ID: 115845
           Summary: 25% runtime regression of 527.cam4_r when enabling
                    --param vect-partial-vector-usage={1,2} on top of
                    -Ofast -march=znver4
           Product: gcc
           Version: 14.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

There are many loops like the one below (perf annotate; the left column is
the sample count):

       │7c40:┌─ lea           (%rdi,%rax,1),%rbx
       │     │tau_w_f(1:ncol,1:pver,:) = tau_w_f(1:ncol,1:pver,:) + twf(1:ncol,:,:)
    68 │     │  vmovupd       (%rsi,%rax,1),%zmm14{%k1}
       │     │  add           $0x40,%rax
       │     │  vmovupd       (%rbx),%zmm17{%k1}
 11165 │     │  vaddpd        %zmm17,%zmm14,%zmm20
   864 │     │  vmovupd       %zmm20,(%rbx){%k1}
       │     │  mov           %r11d,%ebx
       │     │  vpbroadcastw  %r11d,%xmm20
     5 │     │  sub           $0x8,%r11d
       │     │  add           $0x8,%ebx
       │     │  vpcmpnleuw    %xmm1,%xmm20,%k1
     1 │     │  cmp           $0x8,%bx
    89 │     └──ja            7c40

resulting in

Samples: 1M of event 'cycles:u', Event count (approx.): 1356741812802
Overhead  Samples  Command          Shared Object         Symbol
   7.02%    79632  cam4_r_peak.gcc  cam4_r_peak.gcc7-m64  [.] __aer_rad_props_MOD_aer_rad_props_sw
   3.96%    43265  cam4_r_peak.gcc  cam4_r_peak.gcc7-m64  [.] __tracer_data_MOD_interpolate_trcdata.constprop.0
   3.09%    34998  cam4_r_peak.gcc  cam4_r_peak.gcc7-m64  [.] __radsw_MOD_radcswmx
   2.94%    32597  cam4_r_base.gcc  libm-2.31.so          [.] __ieee754_log_fma
   2.70%    30823  cam4_r_peak.gcc  libm-2.31.so          [.] __ieee754_log_fma
   2.68%    29978  cam4_r_base.gcc  cam4_r_base.gcc7-m64  [.] __radsw_MOD_radcswmx
   2.28%    24998  cam4_r_base.gcc  cam4_r_base.gcc7-m64  [.] __tracer_data_MOD_interpolate_trcdata.constprop.0
   2.21%    25215  cam4_r_peak.gcc  cam4_r_peak.gcc7-m64  [.] __radae_MOD_radabs
   2.07%    22878  cam4_r_base.gcc  cam4_r_base.gcc7-m64  [.] __radae_MOD_radabs
   1.77%    20098  cam4_r_peak.gcc  cam4_r_peak.gcc7-m64  [.] __zm_conv_MOD_ientropy.isra.0
   1.62%    18145  cam4_r_base.gcc  cam4_r_base.gcc7-m64  [.] __aer_rad_props_MOD_aer_rad_props_sw

(The topmost and bottommost entries are the same function, in the peak and
base binaries respectively.)

It almost feels like fault suppression kicking in, but on loads?!
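
For readers less used to AVX-512 masking, the structure of the annotated loop
above can be modelled in scalar C roughly as follows. This is only a sketch:
the function name masked_add and the LANES constant are made up for
illustration, and it assumes %xmm1 holds the lane indices 0..7 so that the
vpcmpnleuw computes "remaining > lane" into %k1. It is not a reproducer for
the slowdown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* eight doubles per 512-bit vector */

/* Hypothetical scalar model of the fully-masked loop: dst[i] += src[i]. */
void masked_add(double *dst, const double *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    for (size_t base = 0; base < n; base += LANES) {
        size_t remaining = n - base;

        /* Lane l is active iff l < remaining.  Assumption: this is what
           the broadcast + vpcmpnleuw pair writes into %k1. */
        uint8_t mask = remaining >= LANES
                           ? 0xffu
                           : (uint8_t)((1u << remaining) - 1u);

        /* Models the masked load, add and masked store of one vector;
           inactive lanes are neither read nor written. */
        for (size_t lane = 0; lane < LANES; ++lane) {
            if (mask & (1u << lane))
                dst[base + lane] += src[base + lane];
        }
    }
}
```

Note that in the real loop the mask is recomputed every iteration and both
loads as well as the store are predicated on it, even for full vectors.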
