https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62631

--- Comment #18 from amker at gcc dot gnu.org ---
(In reply to Eric Botcazou from comment #16)
> > The cost of expression "p + ((sizetype)(99 - i_6(D)) + 1) * 4"  computed
> > using normal +/-/* operators on sparc64 is 18, but the cost is 32 if it is
> > computed as "p + ((sizetype)(99 - i_6(D)) + 1) << 2", which is returned by
> > get_shiftadd_cost.
> 
> How do you get the first number exactly?  Note that the cost of shiftadd is
In force_expr_to_var_cost, the cost is first calculated in the normal
way (that is where the 18 comes from) before the function returns the
result of get_shiftadd_cost instead.
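
To make the comparison concrete, here is a standalone illustration (my
own sketch, not the PR testcase) of the two equivalent address
computations being priced: the multiply form, costed with the plain
operators, versus the shift-add form, costed via get_shiftadd_cost.

#include <assert.h>
#include <stdint.h>

/* Multiply form, priced with the normal +/-/* operator costs.  */
static int *
addr_mult (int *p, uint64_t n)
{
  return (int *) ((char *) p + (n + 1) * 4);
}

/* Equivalent shift-add form, the shape priced by get_shiftadd_cost.  */
static int *
addr_shift (int *p, uint64_t n)
{
  return (int *) ((char *) p + ((n + 1) << 2));
}

int
main (void)
{
  int buf[8];

  for (uint64_t n = 0; n < 4; n++)
    assert (addr_mult (buf, n) == addr_shift (buf, n));
  return 0;
}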

> completely skewed (by a factor of 3) because expmed.c computes it as a
> multadd instead of a shiftadd:
> 
> Breakpoint 2, init_expmed_one_mode (all=0x7fffffffd540, mode=QImode, speed=1)
>     at /home/eric/svn/gcc/gcc/expmed.c:219
> 219           set_shiftadd_cost (speed, mode, m, set_src_cost
> (all->shift_add, speed));
> (gdb) p debug_rtx(all->shift_add)
> (plus:QI (mult:QI (reg:QI 109 [0])
>         (const_int 2 [0x2]))
>     (reg:QI 109 [0]))
> 
> but this should ensure that the costs are roughly the same for the
> expressions.
> 
> > From the assembly code, it seems the computation is expensive on
> > sparc64; I may skip the test for these architectures if there is no
> > other solution.
> 
> The hitch is that the code generated for 32-bit SPARC (where the test
> passes) is the optimal one and is also valid for 64-bit SPARC.
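
For reference, the loop involved is of roughly this shape (my
reconstruction from the assembly below; the exact testcase may
differ):

void
f1 (int *p, unsigned int i)
{
  for (; i < 100; i++)
    p[i] = 0;
}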

The assembly on sparc64 is as below:
f1:
    .register    %g2, #scratch
    sllx    %o1, 2, %g1
    mov    99, %g2
    add    %o0, %g1, %o0
    sub    %g2, %o1, %o1
    srl    %o1, 0, %g1
    add    %g1, 1, %g1
    sllx    %g1, 2, %g1
    add    %o0, %g1, %g1
    st    %g0, [%o0]
.LL5:
    add    %o0, 4, %o0
    cmp    %o0, %g1
    blu,a,pt %xcc, .LL5
     st    %g0, [%o0]
    jmp    %o7+8
     nop

The code is more efficient on sparc32, as below:

f1:
    sll    %o1, 2, %g1
    sub    %g0, %o1, %o1
    add    %o0, %g1, %o0
    sll    %o1, 2, %o1
    add    %o1, 400, %g1
    add    %o0, %g1, %g1
    st    %g0, [%o0]
.LL5:
    add    %o0, 4, %o0
    cmp    %o0, %g1
    blu,a    .LL5
     st    %g0, [%o0]
    jmp    %o7+8
     nop
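
For the record, the sparc32 bound computation folds because the
(sizetype) conversion is a no-op at 32 bits:

  ((99 - i) + 1) * 4  =  (100 - i) * 4  =  (-i) * 4 + 400

which is exactly the sub/sll/add-400 sequence above.  On sparc64 the
zero-extension (srl) of (99 - i) sits between the subtraction and the
"+ 1", so the constant cannot be combined.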

The bloated pre-header happens on all 64-bit platforms.  At least I can
confirm that on aarch64 it is much worse than on arm.  The difference is
that on aarch64 it is fixed up by later compilation passes (I didn't
investigate why or how).

I think the cause is that, on 64-bit platforms,
  "p + ((sizetype)(99 - i_6(D)) + 1) * 4" != "p + (sizetype)(100 - i_6(D)) * 4"
unlike on 32-bit platforms, because sizetype has larger precision than
the type of i_6(D).
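
A small standalone demonstration of that point (assuming i_6(D) has
unsigned 32-bit type, as the srl zero-extension in the sparc64 code
suggests; uint32_t stands in for the IV type and uint64_t for 64-bit
sizetype):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t i = 100;                       /* 99 - i wraps around */
  uint64_t a = (uint64_t) (99u - i) + 1;  /* (sizetype)(99 - i) + 1 */
  uint64_t b = (uint64_t) (100u - i);     /* (sizetype)(100 - i) */

  /* Prints a = 4294967296, b = 0: with 64-bit sizetype the two
     expressions differ, so folding one into the other would be wrong.
     With 32-bit sizetype the wrap-around makes them agree for all i.  */
  printf ("a = %" PRIu64 ", b = %" PRIu64 "\n", a, b);
  return 0;
}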
