[Bug target/77308] surprisingly large stack usage for sha512 on arm

wilco at gcc dot gnu.org Thu, 03 Nov 2016 08:34:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #57 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #56)
> (In reply to wilco from comment #55)
> > (In reply to Bernd Edlinger from comment #39)
> > > Created attachment 39940 [details]
> > > proposed patch, v2
> > > 
> > > The last upload was accidentally truncated.
> > > Uploaded the right patch.
> > 
> > Right, so looking at your patch, I think we should make the LDRD peephole
> > change in a separate patch. I tried your foo example on all combinations of
> > ARM, Thumb-2, VFP, NEON on various CPUs with both settings of
> > prefer_ldrd_strd.
> > 
> > In all cases the current GCC generates LDRD/STRD, even for zero offsets.
> > CPUs where prefer_ldrd_strd=false emit LDR/STR for the shifts with
> > -msoft-float or -mfpu=vfp (but not -mfpu=neon). This is clearly incorrect
> > given that LDRD/STRD is used in all other cases, and prefer_ldrd_strd seems
> > to imply whether to prefer using LDRD/STRD in prolog/epilog and inlined
> > memcpy.
> > 
> > So that means we should remove the odd checks for codesize and
> > current_tune->prefer_ldrd_strd from all the peepholes.
> 
> Agreed, I can split the patch.
> 
> From what I understand, we should never emit ldrd/strd out of
> the memmovdi2 pattern when optimizing for speed and disable
> the peephole in the way I proposed it in the patch.

No, that's incorrect. Not generating LDRD when optimizing for speed means a
slowdown on most cores, so it is essential we keep generating LDRD whenever
possible.
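
As a hypothetical illustration (this is not the foo test case from the patch, which is not reproduced here), the function below loads, shifts and stores 64-bit values through pointers. This is the kind of code where, per the discussion above, GCC targeting 32-bit ARM may access each 64-bit value with a single LDRD/STRD, or with a pair of LDR/STR depending on tuning and options such as -msoft-float, -mfpu=vfp or -mfpu=neon. The function name shift_pair is an assumption made for this sketch; compile it with `-O2 -S` for the target of interest and inspect the generated assembly to see which form is emitted.

```c
#include <stdint.h>

/* Hypothetical example: each 64-bit load and store below may be emitted
   as one LDRD/STRD on 32-bit ARM, or as two LDR/STR instructions,
   depending on the CPU tuning and floating-point options in use. */
void shift_pair(uint64_t *dst, const uint64_t *src)
{
    dst[0] = src[0] << 3; /* 64-bit left shift of the first element */
    dst[1] = src[1] >> 4; /* 64-bit logical right shift of the second */
}
```

The arithmetic result is the same either way; only the choice of load/store instructions differs, which is what the prefer_ldrd_strd checks in the peepholes affect.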
