On Fri, 6 Oct 2023, Robin Dapp wrote:
> > So if you think you got everything correct the patch is OK as-is,
> > I just wasn't sure - maybe the neutral_element change deserves
> > a comment as to how MINUS_EXPR is handled.
>
> Heh, I never think I got everything correct ;)
>
> Added this now:
>
> static bool
> fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
> {
> + /* We support MINUS_EXPR by negating the operand. This also preserves an
> + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
> + (-0.0) = -0.0. */
>
> What I still found is that aarch64 ICEs at the assertion you added
> with -frounding-math. Therefore I changed it to:
>
> - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
> + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "cannot vectorize fold-left reduction
> because"
> + " signed zeros cannot be preserved.\n");
> + return false;
> + }
>
> No code changes apart from that. Will leave it until Monday and push then
> barring any objections.
Hmm, the function is called at transform time so this shouldn't help
avoiding the ICE. I expected we refuse to vectorize _any_ reduction
when sign dependent rounding is in effect? OTOH maybe sign-dependent
rounding is OK but only when we use a unconditional fold-left
(so a loop mask from fully masking is OK but not an original COND_ADD?).
Still the check should be done in vectorizable_reduction, not only
during transform (there the assert is proper, if we can distinguish
the loop mask vs. the COND_ADD here, otherwise just remove it).
Richard.
> Thanks for the pointers.
>
> Regards
> Robin
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)