On Fri, Aug 1, 2014 at 12:19 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>
> These are presently documented as producing a vector with the result in
> element 0, and this is inconsistent with their use in tree-vect-loop.c
> (which on bigendian targets pulls the bits out of the other end of the
> vector result). This leads to bugs on bigendian targets - see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 for a small testcase.
>
> I discounted "fixing" the vectorizer (to always read from element 0) and
> then making the optab reverse the result on bigendian targets (whose
> architectural insn produces the result in lane N-1), as optimization of
> vectors in RTL seems unlikely to remove such a reverse/permute and so this
> would lead to a performance regression (specifically on PowerPC).
>
> Instead it seems more natural for the tree code to produce a scalar result
> (producing a vector with the result in lane 0 has already caused confusion,
> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>
> This patch preserves the meaning of the existing optab (producing a result
> in lane 0 on little-endian architectures or N-1 on bigendian), thus
> generally avoiding the need to change backends. Hence, expr.c extracts an
> endianness-dependent element from the optab result to give the result
> expected for the tree code.
>
> Significant complication in the AArch64 backend stems from the existence of
> builtins for reduction operations, which are gimple_fold'd to the tree code.
> Hence, I introduce new define_expands, and map the existing
> __builtin_aarch64_reduc_s{plus,min,max}_<mode> functions to those, with
> scalar result types, matching the result of the tree code to which these are
> still gimple_folded.
>
> If the above/proposed solution is acceptable, I'd make a longer patch
> series, including some cleanup to tree-vect-loop.c
> (vect_create_epilog_for_reduction now has only one case where
> extract_scalar_result == true), and separating out AArch64 changes. Further,
> I'd like to propose creating a new optab that directly outputs a scalar, as
> a migration path away from the existing optab whose meaning is
> endianness-dependent, i.e. such that expand_unop falls back to the existing
> optab only if the new one is not defined.
>
> Patch as it stands has been bootstrapped on x86_64 and regression tested on
> aarch64 and aarch64_be without regressions. On x86_64 there is a regression
> in gcc.target/i386/pr51235.c, where it seems my check in tree-cfg.c is too
> strict - we end up with a reduction from "vector (4) unsigned long int" to
> "void *". (Even if I modify tree-vect-loop.c to build the REDUC_..._EXPR as
> returning the element type of the input vector, its return type is later
> changed.) It seems I can "get away with" a less-strict check in tree-cfg.c,
> i.e. allowing the case where the modes of the expected and actual result
> types match (rather than "useless_type_conversion_p" holding between said
> types), but if anyone can suggest an alternative/better check then it'd be
> great to hear it...

We should fix the vectorizer code-generation instead.

Makes sense to me - non-aarch64 parts of the patch are ok.  The optab
migration strategy is as well.

Thanks,
Richard.

> --Alan

Reply via email to