From: Eric Botcazou <ebotca...@adacore.com>
Date: Tue, 18 Oct 2011 00:09:55 +0200

> I think that the original motivation for the previous design was the 32-bit 
> vector ABI, where the arguments are passed in integer registers.  So for:
> 
> typedef char  vec8 __attribute__((vector_size(8)));
> 
> extern vec8 foo (vec8);
> 
> vec8 bar(vec8 a, vec8 b)
> {
>   return foo(a & b);
> }
> 
> the generated code at -O2 is optimal:
> 
> fun8_2:
>       and     %o2, %o0, %o0
>       sethi   %hi(foo), %g1
>       jmp     %g1 + %lo(foo)
>        and    %o3, %o1, %o1
> 
> My understanding is that, with the changes, you will spill and reload twice.
> Of course things are totally different with the 64-bit ABI.
> 
> A compromise could be to segregate the patterns, but still have alternatives 
> for the other registers, i.e. andsi3 would still have the 'd' alternative at 
> the end and andv1si3 would have an 'r' alternative at the end, with both 
> properly disparaged.
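A rough C-level illustration of why the integer-register scheme is cheap for the quoted example: in the 32-bit code above, each vec8 argument arrives split across a pair of 32-bit integer registers (%o0/%o1 and %o2/%o3), so `a & b` is just two independent 32-bit ANDs. The sketch below assumes GCC's generic vector extension and only checks that equivalence in portable C; the helper names vec_and, split_and and same_vec8 are hypothetical, and this is not how the compiler implements the pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC generic vector extension: eight chars packed into 8 bytes. */
typedef char vec8 __attribute__((vector_size(8)));

/* Assumption: a vec8 fills exactly two 32-bit words (one per register). */
_Static_assert(sizeof(vec8) == 2 * sizeof(uint32_t), "vec8 must be 8 bytes");

/* Element-wise AND via the vector extension, as in bar() above. */
static vec8 vec_and(vec8 a, vec8 b)
{
  return a & b;
}

/* Hypothetical helper: the same AND done the way the 32-bit code does it,
   treating each vector as two 32-bit words and ANDing each pair. */
static vec8 split_and(vec8 a, vec8 b)
{
  uint32_t wa[2], wb[2], wr[2];
  vec8 r;

  memcpy(wa, &a, sizeof wa);
  memcpy(wb, &b, sizeof wb);
  wr[0] = wa[0] & wb[0];   /* first register pair  */
  wr[1] = wa[1] & wb[1];   /* second register pair */
  memcpy(&r, wr, sizeof r);
  return r;
}

/* Hypothetical helper: bytewise comparison of two vectors. */
static int same_vec8(vec8 x, vec8 y)
{
  return memcmp(&x, &y, sizeof x) == 0;
}
```

Because AND operates bit by bit, the result does not depend on byte order or on how the eight lanes map onto the two words, which is why no shuffling is needed between the two `and` instructions.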

I understand this, but one major problem with the original patterns is that
they told the compiler that integer arithmetic was possible also on DImode
and SImode values in float regs.

Guess what kinds of things reload does when you tell it that, and you also
give it access to a one-instruction way to move values between float and
integer regs?

Compounding this is the register allocation order for leaf functions.
That would cause the compiler to sometimes go to float regs for
integer reloads before it will go to the integer regs that would make
the function non-leaf.

Even in situations where this might provide some level of gain, it caused
extra code to be generated.  For example, if it reloaded the function's
return value temporarily into a float reg, we couldn't merge the reload
into the return value register as part of a "restore" instruction whereas
using a %lN register instead would allow us to do that.  So:

        mov     %o4, %l3
        ...
        ret
         restore %g0, %l3, %o0

turned into stuff like:

        movwtos %o4, %f0
        ...
        movstouw %f0, %o0
        ret
         restore

I would suggest we start with my patch, get the int<-->float move
instructions working reasonably, and then re-add vector-only cases for
the scenarios you describe above, making sure that we don't end up with
silly code generation scenarios like those I've just described.

I'm happy to work on all of that myself.

>> In fact, gcc.target/sparc/combined-1.c always passes, even without
>> adjusting the optimization level to placate the register allocator,
>> and many tests now generate more VIS instructions than before,
>> particularly on 32-bit.
> 
> Feel free to revert the adjustment I made as part of the patch.

Thanks for reviewing.

Richard Henderson also suggested to me during a separate conversation
that I should use CONSTANT_P in sparc_vector_init() instead of the
convoluted test I had there.
