https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110248

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |juzhe.zhong at rivai dot ai,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Target|                            |powerpc*-linux-gnu

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
Commit r14-1493 caused a 4% degradation on the SPEC2017 fp benchmark
503.bwaves_r at -Ofast --param=vect-partial-vector-usage=2 on Power10. As a
follow-up to [1], I looked into it and confirmed that it has nothing to do
with the existing load density heuristics. The gap comes from the hotspot
mat_times_vec_; perf showed that the different iv choices lead to more latency
for the length and its further uses.

On further checking, I think it exposes an issue: currently we only check
whether an addressing mode is supported against the mode, without any other
information such as the gimple statement. Unfortunately, the IFNs
len_{load,store} are generated as lxvl/stxvl, which only support one
addressing mode: base register (+ length register, which isn't even an index
register). But when determining the group cost with a cand
(determine_group_iv_cost_address), ivopts is unable to consider this
characteristic, as the current valid_mem_ref_p (valid_mem_ref_p ->
memory_address_addr_space_p -> legitimate_address_p) only checks the mode, the
address space and the constructed rtx. For V16QImode, the normal vector
load/store do support the addressing modes "base + offset (DQ-form)" and
"base + index" since power9, so ivopts considers it fine to use the base +
index addressing mode for LEN_{load,store} uses, and the related cost of
adopting the scalar (not address object based) candidate with step 16 for
those LEN_{load,store} uses is zero.

For example:

| Group 1:
|   Type: POINTER ARGUMENT ADDRESS
|   Use 1.0:
|     At stmt:    vect_434 = .LEN_LOAD (vectp_y.124_438, 64B, loop_len_436, 0);
|     At pos:     vectp_y.124_438
|     IV struct:
|       Type:     vector(2) real(kind=8) *
|       Base:     (vector(2) real(kind=8) *) vectp_y.125_195
|       Step:     32
|       Object:   (void *) vectp_y.125_195
|       Biv:      N
|       Overflowness wrto loop niter:     Overflow
|   Use 1.1:
|     At stmt:    .LEN_STORE (vectp_y.173_213, 64B, loop_len_436, vect_211, 0);
|     At pos:     vectp_y.173_213
|     IV struct:
|       Type:     vector(2) real(kind=8) *
|       Base:     (vector(2) real(kind=8) *) vectp_y.125_195
|       Step:     32
|       Object:   (void *) vectp_y.125_195
|       Biv:      N
|       Overflowness wrto loop niter:     Overflow

| Candidate 7:
|   Var befor: ivtmp.182
|   Var after: ivtmp.182
|   Incr POS: before exit test
|   IV struct:
|     Type:       sizetype
|     Base:       0
|     Step:       32
|     Biv:        N
|     Overflowness wrto loop niter:       No-overflow

  Group 1:
    cand  cost    compl.  inv.expr.       inv.vars
    1     8       2       NIL;    1
    2     12      2       18;     NIL;
    3     8       2       NIL;    1
    4     12      2       19;     NIL;
    5     12      2       19;     NIL;
    6     0       2       NIL;    NIL;
    7     0       2       NIL;    1     ==> zero cost
    8     0       0       NIL;    NIL;
    9     0       0       NIL;    NIL;
    31    0       0       NIL;    NIL;
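
To make the costing gap concrete, here is a toy model in Python. It is only a
sketch and not GCC code: AddrForm, UseKind, FIXUP_COST, mode_only_cost and
stmt_aware_cost are hypothetical names, and the value of FIXUP_COST is made up.
It contrasts a check that sees only the mode with one that also knows the use
is a LEN_{load,store}.

```python
from enum import Enum, auto


class AddrForm(Enum):
    """Address shapes discussed in the comment (hypothetical names)."""
    BASE = auto()
    BASE_PLUS_OFFSET = auto()   # DQ-form
    BASE_PLUS_INDEX = auto()


class UseKind(Enum):
    """Kind of memory use ivopts is costing (hypothetical names)."""
    VECTOR_MEM = auto()   # normal V16QImode vector load/store
    LEN_MEM = auto()      # .LEN_LOAD / .LEN_STORE, i.e. lxvl/stxvl


# Forms accepted for the mode alone, per the comment (power9 and later).
MODE_FORMS = {AddrForm.BASE, AddrForm.BASE_PLUS_OFFSET, AddrForm.BASE_PLUS_INDEX}
# lxvl/stxvl only take a base register (plus the length register).
LEN_FORMS = {AddrForm.BASE}

# Made-up penalty for the extra add needed to form the address in a register.
FIXUP_COST = 4


def mode_only_cost(kind, form):
    """Models the current behaviour: the use kind is ignored."""
    return 0 if form in MODE_FORMS else FIXUP_COST


def stmt_aware_cost(kind, form):
    """Models a check that also looks at the statement being costed."""
    allowed = LEN_FORMS if kind is UseKind.LEN_MEM else MODE_FORMS
    return 0 if form in allowed else FIXUP_COST


if __name__ == "__main__":
    # A sizetype candidate like Candidate 7 yields a base + index address.
    use, form = UseKind.LEN_MEM, AddrForm.BASE_PLUS_INDEX
    print("mode-only cost: ", mode_only_cost(use, form))
    print("stmt-aware cost:", stmt_aware_cost(use, form))
```

With the mode-only check the base + index form looks free for the LEN use,
which matches the zero cost shown for candidate 7 above; a statement-aware
check would charge for the extra address computation.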

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620305.html
