On Sun, Nov 11, 2018 at 9:16 AM Joern Wolfgang Rennecke
<joern.renne...@riscy-ip.com> wrote:
>
> It's nice to use the processor's vector arithmetic to good effect, but
> it's all for naught when there are too many moves from/to general
> registers cluttering up the loop.  With a double-vector reduction
> variable, the standard final reduction code got so awkward that
> the register allocator decided that the reduction variable must live in
> general-purpose registers, not only after the loop, but across the loop
> latch.
> Splitting the reduction to force the first step to be done as a vector
> operation seemed the obvious solution.  The hook was called, but the
> vectorizer still generated the vanilla final reduction code.  It turns
> out that the reduction splitting was calculated, but the result was not
> used, and the calculation started anew.
>
> The attached patch fixes this.

That looks quite fragile to me, or at least warrants further cleanups.
Can you push up the new_phis.length assert further and elide the loop
over the PHIs?  It looks like at the very beginning we are reducing the
PHIs to a single PHI, and new_phi_result is the one to look at
(the vector is updated as well, but given that we replace the PHI with
an assign, using new_phi_result instead of the vector would be better).

Richard.

> Bootstrapped and regression tested on x86_64-pc-linux-gnu.
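
For readers unfamiliar with the term, here is a minimal sketch of what
"splitting the reduction" means, written in plain C rather than as
vectorizer code.  Everything in it is an assumption made for
illustration: the 8-lane accumulator, the names LANES, HALF,
reduce_scalar and reduce_split are hypothetical and do not appear in
the patch or in GCC.  The "vanilla" version extracts every lane into a
scalar; the split version first adds the upper half onto the lower half
lane by lane (a step a target could do with one half-width vector add)
and only then reduces the remaining half.

```c
/* Hypothetical "double vector": two 4-lane halves modelled as an array. */
#define LANES 8
#define HALF  (LANES / 2)

/* Vanilla final reduction: every lane is pulled out and added as a scalar. */
static int reduce_scalar(const int acc[LANES])
{
    int sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += acc[i];
    return sum;
}

/* Split reduction: the first step is a lane-wise add of the two halves,
   which maps onto a single half-width vector add; only the resulting
   half is then reduced to a scalar. */
static int reduce_split(const int acc[LANES])
{
    int half[HALF];
    for (int i = 0; i < HALF; i++)
        half[i] = acc[i] + acc[i + HALF];

    int sum = 0;
    for (int i = 0; i < HALF; i++)
        sum += half[i];
    return sum;
}
```

Both functions return the same sum for integer lanes; the difference is
only in how much of the work can stay in vector registers.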
