https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:7b2fb7ddc7a713c057d033a48c9482d5383ba54c

commit r15-4662-g7b2fb7ddc7a713c057d033a48c9482d5383ba54c
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Oct 23 13:56:55 2024 +0200

    tree-optimization/116575 - SLP masked load-lanes discovery

    The following implements masked load-lanes discovery for SLP.  The
    challenge here is that a masked load has a full-width mask with
    group-size number of elements, whereas when it becomes a masked
    load-lanes instruction a single mask element gates all group members.
    We already have some discovery hints in place, namely
    STMT_VINFO_SLP_VECT_ONLY to guard non-uniform masks, but we need to
    choose a way for SLP discovery to handle possible masked load-lanes
    SLP trees.

    This time I have chosen to handle masked load-lanes discovery at the
    point where permute optimization has already been performed, which
    conveniently gives us the graph with predecessor edges built.  This
    is because, unlike non-masked loads, masked loads with a
    load_permutation are never produced by SLP discovery (load
    permutation handling does not handle un-permuting the mask), and thus
    the load-permutation lowering that handles non-masked load-lanes
    discovery does not trigger for them.

    With this, SLP discovery for a possible masked load-lanes, that is, a
    masked load with a uniform mask, produces a splat of a single-lane
    sub-graph as the mask SLP operand.  This representation should not
    pessimize the plain masked load case, and it allows the masked
    load-lanes transform to simply elide the splat.

    This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
    --param vect-force-slp=1.

            PR tree-optimization/116575
            * tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle
            gaps, aka NULL scalar stmt.
            (vect_build_slp_tree_2): Allow gaps in the middle of a
            grouped mask load.  When the mask of a grouped mask load
            is uniform do single-lane discovery for the mask and
            insert a splat VEC_PERM_EXPR node.
            (vect_optimize_slp_pass::decide_masked_load_lanes): New
            function.
            (vect_optimize_slp_pass::run): Call it.
