https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575
--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:7b2fb7ddc7a713c057d033a48c9482d5383ba54c

commit r15-4662-g7b2fb7ddc7a713c057d033a48c9482d5383ba54c
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Oct 23 13:56:55 2024 +0200

    tree-optimization/116575 - SLP masked load-lanes discovery

    The following implements masked load-lane discovery for SLP.  The
    challenge here is that a masked load has a full-width mask with
    group-size number of elements; when this becomes a masked load-lanes
    instruction, one mask element gates all group members.  We already
    have some discovery hints in place, namely STMT_VINFO_SLP_VECT_ONLY
    to guard non-uniform masks, but we need to choose a way for SLP
    discovery to handle possible masked load-lanes SLP trees.

    I have this time chosen to handle load-lanes discovery where we
    have performed permute optimization already and conveniently got
    the graph with predecessor edges built.  This is because, unlike
    non-masked loads, masked loads with a load_permutation are never
    produced by SLP discovery (because load permutation handling doesn't
    handle un-permuting the mask) and thus the load-permutation lowering
    which handles non-masked load-lanes discovery doesn't trigger.

    With this, SLP discovery for a possible masked load-lanes, thus
    a masked load with uniform mask, produces a splat of a single-lane
    sub-graph as the mask SLP operand.  This is a representation that
    shouldn't pessimize the mask load case and allows the masked load-lanes
    transform to simply elide this splat.

    This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
    --param vect-force-slp=1.

            PR tree-optimization/116575
            * tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle gaps,
            aka NULL scalar stmt.
            (vect_build_slp_tree_2): Allow gaps in the middle of a
            grouped mask load.  When the mask of a grouped mask load
            is uniform do single-lane discovery for the mask and
            insert a splat VEC_PERM_EXPR node.
            (vect_optimize_slp_pass::decide_masked_load_lanes): New
            function.
            (vect_optimize_slp_pass::run): Call it.
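
For readers unfamiliar with the mask shapes involved, the following is a minimal scalar model in C of the relationship described above: a full-width mask with group-size elements per group, the uniformity condition under which it is a splat, and a load-lanes style load where one mask element gates a whole group. It is only a sketch and is not code from tree-vect-slp.cc; the function names (mask_is_uniform_per_group, elide_splat, masked_load_lanes) are hypothetical, and writing zero for inactive elements is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function: return true when all
   group_size elements of every group in the full-width mask agree,
   i.e. the mask is a splat of one element per group.  */
bool
mask_is_uniform_per_group (const bool *mask, size_t ngroups,
                           size_t group_size)
{
  assert (group_size > 0);
  for (size_t g = 0; g < ngroups; ++g)
    for (size_t l = 1; l < group_size; ++l)
      if (mask[g * group_size + l] != mask[g * group_size])
        return false;
  return true;
}

/* Hypothetical helper: collapse a uniform full-width mask to one
   element per group, which models eliding the splat.  */
void
elide_splat (const bool *mask, size_t ngroups, size_t group_size,
             bool *lane_mask)
{
  for (size_t g = 0; g < ngroups; ++g)
    lane_mask[g] = mask[g * group_size];
}

/* Scalar model of a masked load-lanes: lane_mask[g] gates every member
   of group g.  The result is de-interleaved, so dst holds lane 0 of all
   groups, then lane 1, and so on.  Inactive elements are set to zero
   (an assumption of this model).  */
void
masked_load_lanes (const int *src, const bool *lane_mask, size_t ngroups,
                   size_t group_size, int *dst)
{
  for (size_t g = 0; g < ngroups; ++g)
    for (size_t l = 0; l < group_size; ++l)
      dst[l * ngroups + g] = lane_mask[g] ? src[g * group_size + l] : 0;
}
```

With a group size of 2 and three groups, the full-width mask {1,1,0,0,1,1} is uniform and collapses to {1,0,1}, while {1,0,1,1} is not uniform and would have to remain an ordinary masked load in this model.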