[Bug target/77308] surprisingly large stack usage for sha512 on arm

rearnsha at gcc dot gnu.org Wed, 02 Nov 2016 03:23:04 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #50 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to wilco from comment #47)
> (In reply to Richard Earnshaw from comment #46)
> > (In reply to wilco from comment #44)
> > > (In reply to Bernd Edlinger from comment #38)
> > > > Created attachment 39939 [details]
> > > > proposed patch, v2
> > > > 
> > > 
> > > > Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> > > > because thumb1 cannot split the adddi3 pattern, once it is emitted.
> > > 
> > > We can split into a new pattern that contains adds/adc together. Splitting
> > > should help Thumb-1 the most as it has just 3 allocatable DI mode
> > > registers...
> > 
> > Not on Thumb-1 we can't.  Because of register allocation limitations, we
> > cannot expose the flags until after register allocation has completed. 
> > (Since the register allocator needs to be able to insert loads, adds and
> > copy instructions between any two insns.  The add and copy instructions
> > clobber the flags, making early splitting impossible.)
> 
> What I meant is splitting into a single new instruction using SI mode
> registers rather than DI mode registers so that register allocation is more
> efficient.

You couldn't do that before combine, since the pattern would have to describe
setting both 'result' registers independently.  That would create a pattern
that combine couldn't handle (more than one non-flag result), which in turn
would stop the compiler from optimizing such a pattern properly.  Note that
the pattern would probably end up looking like a parallel that sets the high
and low parts to the result of the 64-bit operation.

It might help to rewrite the pattern that way after combine, but before
register allocation.
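
To illustrate what "setting both result registers" means, here is a rough C
model of a 64-bit add carried out on 32-bit halves.  It is only a sketch of
the arithmetic, not the ARM backend's pattern or any RTL; the names di_pair
and add_di_split are made up for this example.

```c
#include <stdint.h>

/* Hypothetical type: a DI-mode value held as two SI-mode halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_pair;

/* Hypothetical helper modelling an adds/adc pair.  It produces two
   results (lo and hi), and the high half depends on the carry out of
   the low half, which is what the flags carry between the two insns. */
static di_pair add_di_split(uint32_t a_lo, uint32_t a_hi,
                            uint32_t b_lo, uint32_t b_hi)
{
    di_pair r;
    uint32_t carry;

    r.lo = a_lo + b_lo;          /* "adds": low word, wraps modulo 2^32 */
    carry = (r.lo < a_lo);       /* carry out of the low-word add */
    r.hi = a_hi + b_hi + carry;  /* "adc": high word consumes the carry */
    return r;
}
```

The two outputs plus the carry dependency between them are the reason such an
operation has more than one non-flag result when written as a single pattern.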
