https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

--- Comment #21 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:27653070db35216d5115cc25672fcc6a51203d26

commit r15-7520-g27653070db35216d5115cc25672fcc6a51203d26
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Feb 12 14:18:06 2025 +0100

    tree-optimization/90579 - avoid STLF fail by better optimizing

    For the testcase in question which uses a fold-left vectorized
    reduction of a reverse iterating loop we'd need two forwprop
    invocations to first bypass the permute emitted for the reverse
    iterating loop and then to decompose the vector load that only
    feeds element extracts.  The following moves the first transform
    to a match.pd pattern and makes sure we fold the element extracts
    when the vectorizer emits them so the single forwprop pass can
    then pick up the vector load decomposition, avoiding the
    store-to-load forwarding (STLF) failure that the vector load causes.

    Moving simplify_bitfield_ref also makes forwprop remove the dead
    VEC_PERM_EXPR via the simple-dce it uses - this was also
    previously missing.

            PR tree-optimization/90579
            * tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
            match.pd.
            (pass_forwprop::execute): Adjust.
            * match.pd (bit_field_ref (vec_perm ...)): New pattern
            modeled after simplify_bitfield_ref.
            * tree-vect-loop.cc (vect_expand_fold_left): Fold the
            element extract stmt, combining it with the vector def.

            * gcc.target/i386/pr90579.c: New testcase.
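
For illustration, the kind of loop the commit message describes (scalar
stores followed by a fold-left reduction that iterates in reverse over the
same memory) might look like the sketch below. This is a hypothetical
example, not the contents of gcc.target/i386/pr90579.c; the names r, a and
loop, and the array sizes, are assumptions made for the sketch.

```c
/* Hypothetical arrays for illustration only; sizes are assumptions.  */
static double r[6];
static double a[16];

/* First loop: scalar stores into r[].
   Second loop: an in-order (fold-left) sum that walks r[] backwards.
   Without flags that permit reassociation, the floating-point adds
   must stay in source order, so a vectorized version would load r[]
   as vectors and extract the elements one by one.  A vector load
   right after narrower scalar stores to the same memory is the
   situation where store-to-load forwarding can fail.  */
double loop(int k, double x)
{
    double t = 0.0;

    for (int i = 0; i < 6; i++)
        r[i] = x * a[i + k];

    for (int i = 0; i < 6; i++)
        t += r[5 - i];

    return t;
}
```

If the vector load that only feeds element extracts is decomposed into
scalar loads, as the commit message describes, each load matches an earlier
scalar store and forwarding can succeed.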
