https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #21 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #20)
> > Wilco, where have you seen the additional registers used with my
> > previous patch, maybe we can try to fix that somehow?
>
> What happens is that the move of zero causes us to use extra registers in
> shifts as both source and destination are now always live at the same time.
> We generate worse code for simple examples like x | (y << 3):
>
> -mfpu=vfp:
>         push    {r4, r5}
>         lsls    r5, r1, #3
>         orr     r5, r5, r0, lsr #29
>         lsls    r4, r0, #3
>         orr     r0, r4, r2
>         orr     r1, r5, r3
>         pop     {r4, r5}
>         bx      lr
>
> -mfpu=neon:
>         lsls    r1, r1, #3
>         orr     r1, r1, r0, lsr #29
>         lsls    r0, r0, #3
>         orrs    r0, r0, r2
>         orrs    r1, r1, r3
>         bx      lr

Hmm, I think with my patch reverted the code is the same.

With -O2 -marm -mfpu=vfp -mhard-float I get the first variant, with and
without the patch.

With -O2 -marm -mfpu=vfp -msoft-float I get the second variant, with and
without the patch.

With -O2 -marm -mfpu=neon -mhard-float I also get the second variant.

With -O2 -marm -mfpu=neon -msoft-float I get a third variant, again with
and without the patch:

        lsl     r1, r1, #3
        mov     ip, r0
        orr     r0, r2, r0, lsl #3
        orr     r1, r1, ip, lsr #29
        orr     r1, r1, r3
        bx      lr

Am I missing something?
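
(For reference, a test case along the following lines should exercise the
x | (y << 3) pattern shown above; this is only a sketch assuming 64-bit
operands passed in r0:r1 and r2:r3, not the exact source from the PR:)

        /* 64-bit OR of x with y shifted left by 3; with -marm and a
           soft-float ABI, y arrives in r0:r1 and x in r2:r3.  */
        unsigned long long
        f (unsigned long long y, unsigned long long x)
        {
          return x | (y << 3);
        }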