On Wed, Aug 12, 2015 at 04:59:27PM +0100, Wilco Dijkstra wrote:
> However it also creates new dependencies that may not be desirable
> (such as hash table size, algorithm used etc).
> 
> > Some notes about ppc64 in particular:
> > 
> >   * Constants aren't split until quite late, preventing all hope of
> >     CSE'ing portions of the generated code.  My gut feeling is that
> >     this is in general a mistake, but...
> 
> Late split is best in general as you want to CSE the original constants,
> not parts of the expansion (which would be very rarely possible).

CSE handles the REG_EQ* notes just fine?  For rs6000 many constants
are split at expansion already, and I never saw problems from that?
Maybe I haven't looked at the right (wrong :-) ) testcases though.
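Concretely, I mean something like this (just a sketch; emit_parts_of_constant
is a made-up backend helper, the rest is the usual rtl.h interface for
attaching the note CSE looks at):

  /* Sketch, assuming the usual GCC backend includes (rtl.h etc.):
     split VALUE into pieces at expand time, but record the full
     constant as a REG_EQUAL note so later CSE still sees the
     original value held in DEST.  */
  static void
  expand_constant_with_note (rtx dest, HOST_WIDE_INT value)
  {
    emit_parts_of_constant (dest, value); /* made-up: lis/ori/... sequence */
    set_unique_reg_note (get_last_insn (), REG_EQUAL, GEN_INT (value));
  }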

> > Comments?  Especially on the shared header?
> 
> I'm not convinced the amount of code that could be shared is enough to be
> worthwhile.

I agree.

> Also the way it is written makes the immediate generation more
> complex and likely consumes a lot of memory (each cached immediate requires
> at least 64 bytes).

It would take a bit less memory if it used pointers (to other cache elts)
for sub-expressions, but that makes cache invalidation more interesting,
and shared (within one constant) subexpressions non-trivial.
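To make the trade-off concrete (a sketch only, all names invented, using
GCC's HOST_WIDE_INT):

  /* Flat entry: the whole expansion is stored inline, so every entry
     is large (easily 64 bytes) but self-contained.  */
  struct imm_entry_flat
  {
    unsigned HOST_WIDE_INT value;
    int ninsns;
    unsigned HOST_WIDE_INT part[4];	/* the expansion, stored inline */
  };

  /* Linked entry: sub-expressions point at other cache entries.  This
     is smaller per entry, but evicting or invalidating one entry can
     leave dangling pointers in others, and sharing a sub-expression
     within one constant needs extra bookkeeping.  */
  struct imm_entry_linked
  {
    unsigned HOST_WIDE_INT value;
    int ninsns;
    struct imm_entry_linked *sub[2];	/* pointers to other cache elts */
  };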

> It is not obvious to me why it is a good idea to hide the
> simple/fast cases behind the hashing scheme - it seems better that the backend
> explicitly decides which cases should be cached.

Without going through the abstraction you mean?  I'd rather there was
no such abstraction at all :-)

> I looked at the statistics of AArch64 immediate generation a while ago. 
> The interesting thing is ~95% of calls are queries, and the same query is on 
> average repeated 10 times in a row.

Huh.

> So (a) it is not important to cache the expansions,

It also stores the expansion from analysis until emission time.  This
simplifies the code a lot.

Maybe that can be done without caching everything (and still not be
too expensive), dunno.

> and (b) the high repetition rate means a single-entry cache
> has a 90% hitrate.

And a 10% miss rate...
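
(For what it's worth, the single-entry variant is about this much code;
a sketch, with num_insns_for_constant standing in for whatever the
backend's real analysis routine is:)

  /* Single-entry (last-query) cache.  With the same query repeated
     ~10 times in a row this answers ~90% of calls from the cached
     pair and only reruns the analysis on a miss.  */
  static int
  cached_num_insns_for_constant (unsigned HOST_WIDE_INT value)
  {
    static unsigned HOST_WIDE_INT last_value;
    static int last_result = -1;

    if (last_result >= 0 && value == last_value)
      return last_result;

    last_value = value;
    last_result = num_insns_for_constant (value);	/* the real analysis */
    return last_result;
  }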

> We already have a patch for this and could collect stats
> comparing the approaches. If a single-entry cache can provide a similar 
> benefit as caching all immediates then my preference would be to keep things
> simple and just cache the last query.
> 
> Note the many repeated queries indicate a performance issue at a much higher 
> level (repeated cost queries on the same unchanged RTL), and solving that 
> problem will likely improve buildtime for all targets.

You cannot tell from a pointer to some RTL whether that RTL has changed, so
we'd have to store that info (the "unchanged" info, or the "cost" info)
somewhere.  Maybe it would be useful to store it in the RTL itself?
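
(To illustrate the problem: a side table keyed by the rtx pointer is easy
enough -- sketch below, lookup_cost / record_cost / compute_cost all
hypothetical -- but nothing in it notices when *x is later modified in
place, so entries can go stale silently.  Keeping the cost, or an
"unchanged" flag, in the rtx itself would let whoever modifies the RTL
reset it.)

  static int
  cached_cost (rtx x)
  {
    int cost;
    if (lookup_cost (x, &cost))	/* hypothetical side table, keyed by pointer */
      return cost;		/* ...possibly stale if *x was changed */
    cost = compute_cost (x);	/* hypothetical: the real, expensive query */
    record_cost (x, cost);
    return cost;
  }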


Segher
