On Sun, Nov 11, 2018 at 9:16 AM Joern Wolfgang Rennecke
<joern.renne...@riscy-ip.com> wrote:
>
> It's nice to use the processor's vector arithmetic to good effect, but
> it's all for naught when there are too many moves from/to general
> registers cluttering up the loop. With a double-vector reduction
> variable, the standard final reduction code got so awkward that the
> register allocator decided that the reduction variable must live in
> general purpose registers, not only after the loop, but across the
> loop patch.
> Splitting the reduction to force the first step to be done as a vector
> operation seemed the obvious solution. The hook was called, but the
> vectorizer still generated the vanilla final reduction code. It turns
> out that the reduction splitting was calculated, but the result not
> used, and the calculation started anew.
>
> The attached patch fixes this.

That looks quite fragile to me, or at least warrants further cleanups.
Can you push the new_phis.length assert further up and elide the loop
over the PHIs? It looks like at the very beginning we are reducing the
PHIs to a single PHI, and new_phi_result is the one to look at. (The
vector is updated as well, but given that we replace the PHI with an
assign, using new_phi_result instead of the vector would be better.)

Richard.

> bootstrapped and regression tested on x86_64-pc-linux-gnu.
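
For illustration only, the shape asked for above can be sketched outside
GCC roughly as follows. This is not the vectorizer code: Phi,
reduce_to_single and finish_reduction are hypothetical stand-ins, and
summing ints stands in for the real vector operation. Only new_phis and
new_phi_result are names taken from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector PHI result; not a GCC type.
struct Phi { int value; };

// Fold all partial PHIs into one, modelling the "reduce the PHIs to a
// single PHI at the very beginning" step. Integer addition is an
// assumption standing in for the actual reduction operation.
Phi reduce_to_single(std::vector<Phi> &phis)
{
  assert(!phis.empty());
  Phi acc = phis[0];
  for (std::size_t i = 1; i < phis.size(); ++i)
    acc.value += phis[i].value;
  phis.assign(1, acc);  // the vector is kept up to date as well ...
  return acc;           // ... but callers are meant to use this result
}

// Hypothetical epilogue: assert once, early, then work from the single
// result instead of looping over the vector again.
int finish_reduction(std::vector<Phi> &new_phis)
{
  Phi new_phi_result = reduce_to_single(new_phis);
  assert(new_phis.size() == 1);  // checked right after the reduction
  return new_phi_result.value;   // no loop over new_phis from here on
}
```

Calling finish_reduction on a vector holding 1, 2 and 3 returns 6 and
leaves the vector with a single element. Everything after the early
assert depends on one value only, so there is no second place where the
reduction could be recomputed by accident.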