https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c

commit r12-1551-g3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c
Author: Richard Biener <rguent...@suse.de>
Date:   Tue Jun 8 15:10:45 2021 +0200

    Vectorization of BB reductions

    This adds a simple reduction vectorization capability to the
    non-loop vectorizer.  Simple means it lacks any of the fancy
    ways to generate the reduction epilogue and only supports
    those we can handle via a direct internal function reducing
    a vector to a scalar.  The main reasons are to avoid
    massive refactoring at this point and that more complex
    epilogue operations are hardly profitable.
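
For illustration only, here is a minimal sketch of the kind of straight-line
reduction meant above, next to a hand-written vector form of the same
computation. The function names and the use of GCC's generic vector
extension are assumptions made for this example; they are not taken from
the commit or its testcase. The final lane sum stands in for the single
reduce-to-scalar operation.

```c
#include <string.h>

/* GCC/Clang generic vector extension: four 32-bit unsigned lanes.  */
typedef unsigned v4su __attribute__ ((vector_size (16)));

/* Scalar form: a chain of associative additions in one basic block,
   with no loop involved.  Unsigned arithmetic is used so that
   reassociating the additions cannot change the result.  */
unsigned
dot4_scalar (const unsigned *a, const unsigned *b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hypothetical hand-vectorized shape: one vector multiply followed by
   a reduction of the vector to a scalar.  */
unsigned
dot4_vector (const unsigned *a, const unsigned *b)
{
  v4su va, vb;
  memcpy (&va, a, sizeof va);   /* unaligned vector load */
  memcpy (&vb, b, sizeof vb);
  v4su prod = va * vb;          /* lane-wise multiply */
  /* Stand-in for the reduce-to-scalar epilogue.  */
  return prod[0] + prod[1] + prod[2] + prod[3];
}
```

Both functions return the same value for any input; whether a compiler
actually turns the first into the second depends on the target and on the
cost model.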

    Mixed-sign reductions are fended off for now, and I have not yet
    settled on whether we want an explicit SLP node for the
    reduction epilogue operation.  Handling mixed signs could be
    done by multiplying with a { 1, -1, .. } vector.  Also fended
    off are reductions with non-internal operands (constants
    or register parameters for example).
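
The { 1, -1, .. } idea can be sketched as follows. This is an illustration
under assumptions, not code from the commit: the function names are
hypothetical, GCC's generic vector extension is used for the vector form,
and the float additions are assumed to be reassociable (as with -Ofast).

```c
#include <string.h>

/* GCC/Clang generic vector extension: four float lanes.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Scalar form: a reduction chain mixing additions and subtractions.  */
float
alt_sum_scalar (const float *a)
{
  return a[0] - a[1] + a[2] - a[3];
}

/* Hypothetical vector form: multiplying by a sign vector turns the
   mixed-sign chain into a plain sum, which is then reduced.  */
float
alt_sum_vector (const float *a)
{
  const v4sf signs = { 1.0f, -1.0f, 1.0f, -1.0f };
  v4sf va;
  memcpy (&va, a, sizeof va);   /* unaligned vector load */
  v4sf signed_lanes = va * signs;
  /* Stand-in for the reduce-to-scalar epilogue.  */
  return signed_lanes[0] + signed_lanes[1]
         + signed_lanes[2] + signed_lanes[3];
}
```

The extra multiply is the price of handling the subtractions, which is one
reason such a transform needs to be weighed by the cost model.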

    Costing is done by accounting the original scalar participating
    stmts for the scalar cost and log2 permutes and operations for
    the vectorized epilogue.
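
A simplified counting model of that comparison might look like the sketch
below. It is not GCC's cost code: the struct, the function names, the
unit costs and the restriction to power-of-two lane counts are all
assumptions made for illustration. It counts one permute and one
operation per halving step of the epilogue, against lanes - 1 scalar
operations for the original chain.

```c
/* Hypothetical cost record for a vectorized reduction epilogue.  */
struct epilogue_cost
{
  unsigned permutes;
  unsigned ops;
};

/* Count the halving steps needed to reduce NLANES lanes to one.
   NLANES is assumed to be a power of two; each step is modeled as
   one permute plus one operation, giving log2 (NLANES) of each.  */
struct epilogue_cost
model_epilogue_cost (unsigned nlanes)
{
  struct epilogue_cost cost = { 0, 0 };
  for (unsigned width = nlanes; width > 1; width /= 2)
    {
      cost.permutes++;
      cost.ops++;
    }
  return cost;
}

/* A scalar chain combining NLANES values needs NLANES - 1 operations.  */
unsigned
model_scalar_chain_ops (unsigned nlanes)
{
  return nlanes > 0 ? nlanes - 1 : 0;
}
```

Under this model a two-lane reduction replaces one scalar operation with
one permute and one operation, which illustrates why short chains are
rarely a win.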

    --

    SPEC CPU 2017 FP with rate workload measurements show (picked
    fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
    508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and
    527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
    538.imagick_r (1.5%).  This is with -Ofast -march=znver2 on a Zen2.

    Statistics on CPU 2017 show that the overwhelming majority of the
    seeds we find are reductions of two lanes (well - that's basically
    every associative operation).  That means we put quite high
    pressure on the SLP discovery process this way.

    In total we find 583218 seeds we put to SLP discovery out of which
    66205 pass that and only 6185 of those make it through
    code generation checks. 796 of those are discarded because the reduction
    is part of a larger SLP instance.  4195 of the remaining
    are deemed not profitable to vectorize and 1194 are finally
    vectorized.  That's a poor 0.2% rate.

    Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
    28181 four (5%), 4808 five, 909 six and there are instances up to 120
    lanes.

    There's a set of 54086 candidate seeds we reject because
    they contain a constant or invariant (not implemented yet) but still
    have two or more lanes that could be put to SLP discovery.

    2021-06-16  Richard Biener   <rguent...@suse.de>

            PR tree-optimization/54400
            * tree-vectorizer.h (enum slp_instance_kind): Add
            slp_inst_kind_bb_reduc.
            (reduction_fn_for_scalar_code): Declare.
            * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
            Check SLP_INSTANCE_KIND instead of looking at the
            representative.
            (vect_slp_analyze_instance_alignment): Likewise.
            * tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
            * tree-vect-slp.c (vect_slp_linearize_chain): Split out
            chain linearization from vect_build_slp_tree_2 and generalize
            for the use of BB reduction vectorization.
            (vect_build_slp_tree_2): Adjust accordingly.
            (vect_optimize_slp): Elide permutes at the root of BB reduction
            instances.
            (vectorizable_bb_reduc_epilogue): New function.
            (vect_slp_prune_covered_roots): Likewise.
            (vect_slp_analyze_operations): Use them.
            (vect_slp_check_for_constructors): Recognize associatable
            chains for BB reduction vectorization.
            (vectorize_slp_instance_root_stmt): Generate code for the
            BB reduction epilogue.

            * gcc.dg/vect/bb-slp-pr54400.c: New testcase.
