On 12/08/15 09:32, Richard Earnshaw wrote:
> On 12/08/15 02:11, Richard Henderson wrote:
>> Something last week had me looking at ppc64 code generation,
>> and some of what I saw was fairly bad. Fixing it wasn't going
>> to be easy, due to the fact that the logic for generating
>> constants wasn't contained within a single function.
>>
>> Better is the way that aarch64 and alpha have done it in the
>> past, sharing a single function with all of the logical that
>> can be used for both cost calculation and the actual emission
>> of the constants.
>>
>> However, the way that aarch64 and alpha have done it hasn't
>> been ideal, in that there's a fairly costly search that must
>> be done every time. I've thought before about changing this
>> so that we would be able to cache results, akin to how we do
>> it in expmed.c for multiplication.
>>
>> I've implemented such a caching scheme for three targets, as
>> a test of how much code could be shared. The answer appears
>> to be about 100 lines of boiler-plate. Minimal, true, but it
>> may still be worth it as a way of encouraging backends to do
>> similar things in a similar way.
>>
>
> I've got a short week this week, so won't have time to look at this in
> detail for a while. So a bunch of questions... but not necessarily
> objections :-)
>
> How do we clear the cache, and when? For example, on ARM, switching
> between ARM and Thumb state means we need to generate potentially
> radically different sequences? We can do such splitting at function
> boundaries now.
>
> Can we generate different sequences for hot/cold code within a single
> function?
>
> Can we cache sequences with the context (eg use with AND, OR, ADD, etc)?
>
>
>> Some notes about ppc64 in particular:
>>
>>   * Constants aren't split until quite late, preventing all hope of
>>     CSE'ing portions of the generated code. My gut feeling is that
>>     this is in general a mistake, but...
>>
>>     I did attempt to fix it, and got nothing for my troubles except
>>     poorer code generation for AND/IOR/XOR with non-trivial constants.
>>
> On AArch64 in particular, building complex constants is generally
> destructive on the source register (if you want to preserve intermediate
> values you have to make intermediate copies); that's clearly never going
> to be a win if you don't need at least 3 instructions to form the
> constant.
>
> There might be some cases where you could form a second constant as a
> difference from an earlier one, but that then creates data-flow
> dependencies and in OoO machines that might not be worth-while. Even
> for in-order machines it can restrict scheduling and result in worse code.
>
>
>> I'm somewhat surprised that the operands to the logicals aren't
>> visible at rtl generation time, given all the work done in gimple.
>> And failing that, combine has enough REG_EQUAL notes that it ought
>> to be able to put things back together and see the simpler pattern.
>>
>
> We've tried it in the past. Exposing the individual steps prevents the
> higher-level rtl-based optimizations since they can no-longer deal with
> the complete sub-expression.
Eg. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724

R.
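
To make the cache questions quoted above a little more concrete, here is a minimal sketch of one possible answer: put the ARM/Thumb state, the hot/cold hint and the consuming operation into the cache key, so that a state switch simply misses instead of requiring a flush. This is not taken from the patch. Every name in it (isa_state, use_ctx, for_speed, get_const_seq, const_seq_cost) is hypothetical, and the "search" is a trivial stand-in that emits one step per non-zero 16-bit chunk and ignores the context entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; nothing here is taken from the patch. */
enum isa_state { ISA_ARM, ISA_THUMB };               /* state the sequence is for */
enum use_ctx { CTX_SET, CTX_AND, CTX_IOR, CTX_ADD }; /* how the constant is used */

#define MAX_STEPS 4
#define CACHE_SIZE 256 /* must be a power of two */

struct const_seq {
  int n_steps;                  /* instruction count, doubles as the cost */
  uint64_t step_imm[MAX_STEPS]; /* immediate used by each step */
};

struct cache_key {
  uint64_t value;
  enum isa_state isa;
  enum use_ctx ctx;
  int for_speed; /* nonzero for hot code, zero for cold code */
};

struct cache_entry {
  int valid;
  struct cache_key key;
  struct const_seq seq;
};

static struct cache_entry cache[CACHE_SIZE];
static unsigned long n_searches; /* how often the slow path ran */

/* Stand-in for the costly search: one step per non-zero 16-bit chunk.
   A real backend would consult k->isa, k->ctx and k->for_speed here. */
static void slow_search(const struct cache_key *k, struct const_seq *seq)
{
  n_searches++;
  seq->n_steps = 0;
  for (int shift = 0; shift < 64; shift += 16) {
    uint64_t chunk = (k->value >> shift) & 0xffff;
    if (chunk != 0)
      seq->step_imm[seq->n_steps++] = chunk << shift;
  }
  if (seq->n_steps == 0) /* zero still needs one instruction */
    seq->step_imm[seq->n_steps++] = 0;
  assert(seq->n_steps <= MAX_STEPS);
}

/* Fold every key field into the hash so contexts land in different slots. */
static unsigned hash_key(const struct cache_key *k)
{
  const uint64_t m = 0x9e3779b97f4a7c15ull;
  uint64_t h = k->value;
  h = h * m + (uint64_t)k->isa;
  h = h * m + (uint64_t)k->ctx;
  h = h * m + (uint64_t)(k->for_speed != 0);
  return (unsigned)(h >> 32) & (CACHE_SIZE - 1);
}

static int key_equal(const struct cache_key *a, const struct cache_key *b)
{
  return a->value == b->value && a->isa == b->isa && a->ctx == b->ctx
         && (a->for_speed != 0) == (b->for_speed != 0);
}

/* Direct-mapped lookup.  The returned pointer is only good until the
   next call, which may evict the entry. */
const struct const_seq *get_const_seq(uint64_t value, enum isa_state isa,
                                      enum use_ctx ctx, int for_speed)
{
  struct cache_key k = { value, isa, ctx, for_speed };
  struct cache_entry *e = &cache[hash_key(&k)];
  if (!e->valid || !key_equal(&e->key, &k)) {
    slow_search(&k, &e->seq); /* miss: search once, then remember */
    e->key = k;
    e->valid = 1;
  }
  return &e->seq;
}

/* Cost query and emission would share the same cached sequence. */
int const_seq_cost(uint64_t value, enum isa_state isa,
                   enum use_ctx ctx, int for_speed)
{
  return get_const_seq(value, isa, ctx, for_speed)->n_steps;
}
```

With the state in the key, asking for the same value under ISA_ARM and then ISA_THUMB triggers two searches and never returns a sequence computed for the other state. An explicit flush would then only be needed when something outside the key changes, for instance tuning options, assuming those are not added to the key as well.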