https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #57 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #56)
> (In reply to wilco from comment #55)
> > (In reply to Bernd Edlinger from comment #39)
> > > Created attachment 39940 [details]
> > > proposed patch, v2
> > >
> > > last upload was accidentally truncated.
> > > uploaded the right patch.
> >
> > Right so looking at your patch, I think we should make the LDRD peephole
> > change in a separate patch. I tried your foo example on all combinations of
> > ARM, Thumb-2, VFP, NEON on various CPUs with both settings of
> > prefer_ldrd_strd.
> >
> > In all cases the current GCC generates LDRD/STRD, even for zero offsets.
> > CPUs where prefer_ldrd_strd=false emit LDR/STR for the shifts with
> > -msoft-float or -mfpu=vfp (but not -mfpu=neon). This is clearly incorrect
> > given that LDRD/STRD is used in all other cases, and prefer_ldrd_strd seems
> > to imply whether to prefer using LDRD/STRD in prolog/epilog and inlined
> > memcpy.
> >
> > So that means we should remove the odd checks for codesize and
> > current_tune->prefer_ldrd_strd from all the peepholes.
>
> Agreed, I can split the patch.
>
> From what I understand, we should never emit ldrd/strd out of
> the memmovdi2 pattern when optimizing for speed and disable
> the peephole in the way I proposed it in the patch.

No that's incorrect. Not generating LDRD when optimizing for speed means a
slowdown on most cores, so it is essential we keep generating LDRD whenever
possible.