https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #21 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #20)
> > Wilco, where have you seen the additional registers used with my
> > previous patch, maybe we can try to fix that somehow?
>
> What happens is that the move of zero causes us to use extra registers in
> shifts as both source and destination are now always live at the same time.
> We generate worse code for simple examples like x | (y << 3):
>
> -mfpu=vfp:
> push {r4, r5}
> lsls r5, r1, #3
> orr r5, r5, r0, lsr #29
> lsls r4, r0, #3
> orr r0, r4, r2
> orr r1, r5, r3
> pop {r4, r5}
> bx lr
> -mfpu=neon:
> lsls r1, r1, #3
> orr r1, r1, r0, lsr #29
> lsls r0, r0, #3
> orrs r0, r0, r2
> orrs r1, r1, r3
> bx lr
>
Hmm, I think with my patch reverted the code is the same.
With -O2 -marm -mfpu=vfp -mhard-float I get the first variant,
with and without the patch.
With -O2 -marm -mfpu=vfp -msoft-float I get the second variant,
with and without the patch.
With -O2 -marm -mfpu=neon -mhard-float I also get the second variant.
With -O2 -marm -mfpu=neon -msoft-float I get a third variant,
again with and without the patch:
lsl r1, r1, #3
mov ip, r0
orr r0, r2, r0, lsl #3
orr r1, r1, ip, lsr #29
orr r1, r1, r3
bx lr
Am I missing something?
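
(For reference, here is a minimal sketch of the kind of test case being
compiled above; the 64-bit types, the function name and the argument order
are my assumptions, chosen so that the shifted operand lands in the r0/r1
register pair as in the listings:)

/* Hypothetical reconstruction of the x | (y << 3) example; 64-bit
   operands assumed, matching the r0/r1 and r2/r3 register pairs.  */
unsigned long long
or_shift (unsigned long long y, unsigned long long x)
{
  return x | (y << 3);
}

/* Compiled e.g. with:
   arm-none-eabi-gcc -O2 -marm -mfpu=vfp -mhard-float -S test.c  */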