[Bug tree-optimization/116583] vectorizable_slp_permutation cannot handle even/odd extract from VLA vector

cvs-commit at gcc dot gnu.org via Gcc-bugs Mon, 07 Oct 2024 05:03:50 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116583


--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:2abd04d01bc4e18158c785e75c91576b836f3ba6

commit r15-4113-g2abd04d01bc4e18158c785e75c91576b836f3ba6
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Oct 7 13:03:04 2024 +0100

    vect: Restructure repeating_p case for SLP permutations [PR116583]

    The repeating_p case previously handled the specific situation
    in which the inputs have N lanes and the output has N lanes,
    where N divides the number of vector elements.  In that case,
    every output uses the same permute vector.

    The code was therefore structured so that the outer loop only
    constructed one permute vector, with an inner loop generating
    as many VEC_PERM_EXPRs from it as required.

    However, the main patch for PR116583 adds support for cycling
    through N permute vectors, rather than just having one.
    The current structure doesn't really handle that case well.
    (We'd need to interleave the results after generating them,
    which sounds a bit fragile.)

    This patch instead makes the transform phase calculate each output
    vector's permutation explicitly, like for the !repeating_p path.
    As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.

    This arguably undermines one of the justifications for using repeating_p
    for constant-length vectors: that the repeating_p path involved less
    work than the !repeating_p path.  That justification does still hold for
    the analysis phase, though, and that should be the more time-sensitive
    part.  And the other justification -- to get more coverage of the code --
    still applies.  So I'd prefer that we continue to use repeating_p for
    constant-length vectors unless that causes a known missed optimisation.

    gcc/
            PR tree-optimization/116583
            * tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
            the noutputs_per_mask inner loop and instead generate a
            separate permute vector for each output.
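
As a rough illustration of what "calculate each output vector's permutation explicitly" could look like, here is a self-contained C++ sketch. It is not the code from tree-vect-slp.cc: the helper explicit_permutes, the Source type and the flat "stream of scalars" model are assumptions made purely for this example, and it only covers fixed-length vectors.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One selected element: (index of the input vector, element within it).
// Hypothetical type for this sketch, not a GCC data structure.
using Source = std::pair<std::size_t, std::size_t>;

// Hypothetical helper, not a GCC function.
//
// Model: the input vectors, laid end to end, hold groups of IN_LANES
// scalars.  Each output group has lane_perm.size () lanes, and output
// lane J takes input lane lane_perm[J] of the same group.  For every
// output vector we work out directly where each element comes from,
// rather than building one selector and reusing it in an inner loop.
std::vector<std::vector<Source>>
explicit_permutes (std::size_t in_lanes,
                   const std::vector<std::size_t> &lane_perm,
                   std::size_t nelts, std::size_t noutputs)
{
  const std::size_t out_lanes = lane_perm.size ();
  std::vector<std::vector<Source>> result;
  for (std::size_t v = 0; v < noutputs; ++v)
    {
      std::vector<Source> sel;
      for (std::size_t e = 0; e < nelts; ++e)
        {
          // Position of this element in the flat output stream.
          std::size_t pos = v * nelts + e;
          std::size_t group = pos / out_lanes;
          std::size_t lane = pos % out_lanes;
          // Position of the selected scalar in the flat input stream.
          std::size_t src = group * in_lanes + lane_perm[lane];
          sel.emplace_back (src / nelts, src % nelts);
        }
      result.push_back (sel);
    }
  return result;
}
```

Under this model, in_lanes = 2 with lane_perm = {1, 0} and nelts = 4 gives every output vector the same element pattern (1, 0, 3, 2), which matches the single-permute-vector situation described in the commit message. Passing in_lanes = 2 with lane_perm = {0} models an even extract, where each output vector draws its elements from two input vectors.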

[Bug tree-optimization/116583] vectorizable_slp_permutation cannot handle even/odd extract from VLA vector
