https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-09-17
                 CC|                            |matz at gcc dot gnu.org
          Component|c                           |middle-end
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for the main part of this PR we actually expand from

  <bb 7>:
  # count_16 = PHI <count_1(6), 0(2)>
  return count_16;

so it is a matter of coalescing and where we put that copy from zero.  We
coalesce the following way:

Coalesce list: (1)count_1 & (15)count_15 [map: 0, 7] : Success -> 0
Coalesce list: (3)ivtmp.6_3 & (13)ivtmp.6_13 [map: 2, 6] : Success -> 2
Coalesce list: (1)count_1 & (11)count_11 [map: 0, 5] : Success -> 0
Coalesce list: (1)count_1 & (16)count_16 [map: 0, 8] : Success -> 0
Coalesce list: (2)ivtmp.6_2 & (3)ivtmp.6_3 [map: 1, 2] : Success -> 2

so 'count' is fully coalesced but of course the constant is still there and
we insert a copy on the 2->7 edge.

Inserting a value copy on edge BB3->BB4 : PART.0 = 0
Inserting a value copy on edge BB2->BB7 : PART.0 = 0

which also looks good (we use the correct partition for this).

Note that the zero init isn't partially redundant so GCSE isn't able to
optimize here and RTL code hoisting isn't very advanced.  I'm also sure the
RA guys will say it's not the RA's job to do the hoisting.

So with my usual GIMPLE hat on I'd say it would have been nice to help things
by placing the value copy more intelligently.  We have a pass that is
supposed to help here - uncprop.  We're faced with

  <bb 2>:
  if (length_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 3>:
  ...

  <bb 4>:
  # count_15 = PHI <0(3), count_1(6)>
  ...

  <bb 6>:
  # count_1 = PHI <count_15(4), count_11(5)>
  ivtmp.6_3 = ivtmp.6_13 + 4;
  if (ivtmp.6_3 != _25)
    goto <bb 4>;
  else
    goto <bb 7>;

  <bb 7>:
  # count_16 = PHI <count_1(6), 0(2)>
  return count_16;

which might be a good enough pattern to detect (slight complication with the
forwarder BB3).  Note that we'd increase register pressure throughout BB3 and
that for the whole thing to work we'd still need to be able to make sure we
can coalesce all of count and the register we init with zero.

Given the interaction with coalescing I wonder whether it makes sense to do
"uncprop" together with coalescing or to make emitting and placing
value-copies work on GIMPLE, exposing the partitions explicitly somehow.

Well, trying to improve uncprop for this particular testcase would work and
shouldn't be too hard (you'd extend it from detecting edge equivalencies to
existing vargs to do PHI value hoisting).

There is also other code trying to improve coalescing - rewrite_out_of_ssa
has insert_backedge_copies so we could also detect the pattern here and
insert copies from the common constant in the common dominator.
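
To make the "common constant in the common dominator" idea concrete, here is
a rough Python sketch.  It is a toy model and not GCC code: the function
names are made up for illustration, the CFG edges are read off the dump, and
the 4->5 edge is an assumption because the dump elides the body of bb 4 and
all of bb 5.  It groups identical constant value copies by partition and
picks the nearest common dominator of the edge sources as the single
placement point.

```python
from collections import defaultdict

# Edges of the CFG in the dump.  ASSUMPTION: the 4->5 edge is guessed,
# since the dump elides the body of bb 4 and all of bb 5.
EDGES = [(2, 3), (2, 7), (3, 4), (4, 5), (4, 6), (5, 6), (6, 4), (6, 7)]
ENTRY = 2

# Constant PHI arguments as (source block, destination block, partition,
# value), mirroring the two "Inserting a value copy on edge" lines.
CONST_PHI_ARGS = [(3, 4, "PART.0", 0), (2, 7, "PART.0", 0)]


def dominators(edges, entry):
    """Return {block: set of blocks dominating it}, iterated to a fixed point."""
    blocks = {b for edge in edges for b in edge}
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in sorted(blocks - {entry}):
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def nearest_common_dominator(dom, blocks):
    """Return the deepest block that dominates every block in `blocks`."""
    common = set.intersection(*(dom[b] for b in blocks))
    # Common dominators form a chain; the deepest has the most dominators.
    return max(common, key=lambda b: len(dom[b]))


def hoist_constant_copies(edges, entry, const_args):
    """Map each (partition, value) pair to one block that could hold the copy."""
    dom = dominators(edges, entry)
    groups = defaultdict(list)
    for src, _dst, part, value in const_args:
        groups[(part, value)].append(src)
    return {key: nearest_common_dominator(dom, srcs)
            for key, srcs in groups.items()}


placement = hoist_constant_copies(EDGES, ENTRY, CONST_PHI_ARGS)
for (part, value), block in placement.items():
    print(f"{part} = {value} placed once in BB{block}")
```

For this CFG the sketch selects BB2, so the two edge copies collapse into
one.  It deliberately ignores the parts that make the real problem hard: it
does not check that nothing between the dominator and the edges writes the
partition, and it does not account for the longer live range through BB3.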