Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.
> 
> I've implemented such a caching scheme for three targets, as
> a test of how much code could be shared.  The answer appears
> to be about 100 lines of boiler-plate.  Minimal, true, but it
> may still be worth it as a way of encouraging backends to do
> similar things in a similar way.

However, it also creates new dependencies that may not be desirable
(such as the hash table size, the hash algorithm used, etc.).

> Some notes about ppc64 in particular:
> 
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...

A late split is best in general, as you want to CSE the original constants,
not parts of the expansion (which is rarely possible).

>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
> 
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.
> 
>     For aarch64 and alpha, I simply assumed there would be no loss,
>     since the basic search algorithm is unchanged for each.
> 
> Comments?  Especially on the shared header?

I'm not convinced the amount of code that could be shared is enough to be
worthwhile. Also, the way it is written makes the immediate generation more
complex and likely consumes a lot of memory (each cached immediate requires
at least 64 bytes). It is not obvious to me why it is a good idea to hide the
simple/fast cases behind the hashing scheme; it seems better for the backend
to decide explicitly which cases should be cached.
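To illustrate the alternative (a minimal sketch with hypothetical names,
not code from either patch): the backend tests the cheap encodings inline
and only falls back to a cache, whatever its form, for the expensive search:

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical: the memory-hungry cached path for hard immediates.  */
  extern bool lookup_or_compute_cached (uint64_t val);

  bool
  backend_immediate_ok (uint64_t val)
  {
    /* Simple/fast cases handled inline and never cached, e.g. values
       that fit a single 16-bit move-immediate.  */
    if (val < 0x10000)
      return true;

    /* Only the genuinely expensive cases reach the cache.  */
    return lookup_or_compute_cached (val);
  }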

I looked at the statistics of AArch64 immediate generation a while ago.
The interesting thing is that ~95% of calls are queries, and the same query
is repeated on average 10 times in a row. So (a) it is not important to cache
the expansions, and (b) the high repetition rate means a single-entry cache
has a ~90% hit rate. We already have a patch for this and could collect
statistics comparing the approaches. If a single-entry cache provides a
benefit similar to caching all immediates, then my preference would be to
keep things simple and just cache the last query (see the sketch below).
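For comparison, this is roughly what a single-entry cache looks like
(a minimal sketch with hypothetical names; the actual patch may differ):

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical stand-in for the costly per-immediate search.  */
  extern bool expensive_immediate_check (uint64_t val);

  bool
  cached_immediate_check (uint64_t val)
  {
    /* Remember only the last query; with ~10 identical queries in a
       row this alone gives a ~90% hit rate.  */
    static uint64_t last_val;
    static bool last_result;
    static bool cache_valid;

    if (cache_valid && val == last_val)
      return last_result;

    last_val = val;
    last_result = expensive_immediate_check (val);
    cache_valid = true;
    return last_result;
  }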

Note that the many repeated queries indicate a performance issue at a much
higher level (repeated cost queries on the same unchanged RTL); solving that
problem will likely improve build time for all targets.

Wilco

