On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote: > This patch series fixes a number of issues in rs6000_rtx_costs, the > aim being to provide costing somewhat closer to reality. Probably > the > most important patch of the series is patch 4, which just adds a > comment. Without the analysis that went into that comment, I found > myself making what seemed to be good changes but which introduced > regressions. > > So far these changes have not introduced any testsuite regressions > on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on > powerpc64le-linux. Pat spec tested on power9 against a baseline > master from a few months ago, seeing a few small improvements and no > degradations above the noise.
I've read through all the patches in this series, (including the tests that were sent a bit later). Your use of comments does a good job helping describe whats going on. One comment/question/point of clarity for the AND patch that I'll send separately. That said, the series lgtm. :-) thanks, -Will > > Some notes: > > Examination of varasm.o shows quite a number of cases where > if-conversion succeeds due to different seq_cost. One example: > > extern int foo (); > int > default_assemble_integer (unsigned size) > { > extern unsigned long rs6000_isa_flags; > > if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8)) > return 0; > return foo (); > } > > This rather horrible code turns the rs6000_isa_flags value into > either > 4 or 8: > rldicr 9,9,28,0 > srdi 9,9,28 > addic 9,9,-1 > subfe 9,9,9 > rldicr 9,9,0,61 > addi 9,9,8 > Better would be > rldicl 9,9,29,63 > sldi 9,9,2 > addi 9,9,4 > > There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60", > and "li r31,0x4000; rotldi r31,r31,17" vs. > "lis r31,0x8000; clrldi r31,r31,32". > Neither of these is a real change. I saw one occurrence of a 5 insn > sequence being replaced with a load from memory in > default_function_rodata_section, for ".rodata", and others elsewhere. > > Sometimes correct insn cost leads to unexpected results. For > example: > > extern unsigned bar (void); > unsigned > f1 (unsigned a) > { > if ((a & 0x01000200) == 0x01000200) > return bar (); > return 0; > } > > emits for a & 0x01000200 > (set (reg) (and (reg) (const_int 0x01000200))) > at expand time (two rlwinm insns) rather than the older > (set (reg) (const_int 0x01000200)) > (set (reg) (and (reg) (reg))) > which is three insns. However, since 0x01000200 is needed later the > older code after optimisation is smaller.