https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #21 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #20)
> > Wilco, where have you seen the additional registers used with my
> > previous patch, maybe we can try to fix that somehow?
>
> What happens is that the move of zero causes us to use extra registers in
> shifts as both source and destination are now always live at the same time.
> We generate worse code for simple examples like x | (y << 3):
>
> -mfpu=vfp:
>         push    {r4, r5}
>         lsls    r5, r1, #3
>         orr     r5, r5, r0, lsr #29
>         lsls    r4, r0, #3
>         orr     r0, r4, r2
>         orr     r1, r5, r3
>         pop     {r4, r5}
>         bx      lr
>
> -mfpu=neon:
>         lsls    r1, r1, #3
>         orr     r1, r1, r0, lsr #29
>         lsls    r0, r0, #3
>         orrs    r0, r0, r2
>         orrs    r1, r1, r3
>         bx      lr

Hmm, I think with my patch reverted the code is the same.

With -O2 -marm -mfpu=vfp -mhard-float I get the first variant, with and
without the patch.

With -O2 -marm -mfpu=vfp -msoft-float I get the second variant, with and
without the patch.

With -O2 -marm -mfpu=neon -mhard-float I also get the second variant.

With -O2 -marm -mfpu=neon -msoft-float I get a third variant, again with
and without the patch:

        lsl     r1, r1, #3
        mov     ip, r0
        orr     r0, r2, r0, lsl #3
        orr     r1, r1, ip, lsr #29
        orr     r1, r1, r3
        bx      lr

Am I missing something?
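
(For reference, a test case along the following lines should exercise the
x | (y << 3) pattern shown above; this is only a sketch assuming 64-bit
operands passed in r0:r1 and r2:r3, not the exact source from the PR:)

        /* 64-bit OR of x with y shifted left by 3; with -marm and a
           soft-float ABI, y arrives in r0:r1 and x in r2:r3.  */
        unsigned long long
        f (unsigned long long y, unsigned long long x)
        {
          return x | (y << 3);
        }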