https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99149

--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:5296bd57d0605d1fec900d85e3ab3875197e609a

commit r11-7355-g5296bd57d0605d1fec900d85e3ab3875197e609a
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Wed Feb 24 09:43:22 2021 +0000

    slp: fix sharing of SLP only patterns.

    The attached testcase ICEs due to a couple of issues.
    In the testcase you have two SLP instances that share the majority of their
    definition with each other.  One tree defines a COMPLEX_MUL sequence and the
    other tree a COMPLEX_FMA.

    The ICE happens because:

    1. the refcounts are wrong, in particular the FMA case doesn't correctly
    count the references for the COMPLEX_MUL that it consumes.

    2. when the FMA is created it incorrectly assumes it can just tear apart the
    MUL node that it's consuming.  This is wrong and should only be done when
    there are no more uses of the node, in which case the vector only pattern is
    no longer relevant.

    To fix the last part the SLP only pattern reset code was moved into
    vect_free_slp_tree, which results in cleaner code.  I also think it belongs
    there, since that function knows when there are no more uses of the node and
    so when the pattern should be unmarked.  That way, when the vectorizer is
    inspecting the BB it doesn't find the now invalid vector only patterns.
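
    The ownership rule described above can be illustrated with a minimal,
    standalone sketch.  It is not the GCC code: Stmt, Node, node_ref and
    node_free are hypothetical stand-ins for stmt_vec_info, the SLP node and
    its reference counting, and the only assumption carried over from the
    commit message is that the pattern marking is undone at the point where
    the last reference to a node is dropped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stmt_vec_info; it outlives the nodes using it.
struct Stmt {
    bool slp_only_pattern = false;  // "relevant as an SLP only pattern"
};

// Hypothetical stand-in for an SLP node (not GCC's real slp_tree type).
struct Node {
    unsigned refcnt = 1;            // one reference held by the creator
    Stmt* representative = nullptr; // statement this node stands for
    std::vector<Node*> children;    // nodes consumed by this one
};

// Take an extra reference, e.g. when a second tree consumes the node.
inline Node* node_ref(Node* n) {
    ++n->refcnt;
    return n;
}

// Drop one reference.  Only when the count reaches zero is the pattern
// marking undone and the node torn apart.  Returns true if the node was
// actually deleted.
inline bool node_free(Node* n) {
    assert(n->refcnt > 0);
    if (--n->refcnt != 0)
        return false;               // still used by another tree
    if (n->representative)
        n->representative->slp_only_pattern = false;
    for (Node* child : n->children)
        node_free(child);           // release the references we held
    delete n;
    return true;
}
```

    In this sketch a MUL node consumed by an FMA node is kept alive, and kept
    marked, until both the original owner and the FMA have released it.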

    The patch also clears the SLP_TREE_REPRESENTATIVE when stores are removed
    such that we don't hit an error later trying to free the stmt_vec_info again.
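
    As a rough illustration of why clearing the pointer helps, here is a small
    standalone sketch.  StmtInfo, StoreNode, StmtPool and remove_store are
    hypothetical names, not the vectorizer's data structures; the sketch only
    assumes that the statement info is owned elsewhere and that a stale
    pointer left in the node would otherwise be released a second time.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for a stmt_vec_info.
struct StmtInfo {
    int id;
};

// Hypothetical stand-in for an SLP store node.
struct StoreNode {
    StmtInfo* representative = nullptr;
};

// Hypothetical owner of all statement infos.
struct StmtPool {
    std::unordered_set<StmtInfo*> live;

    StmtInfo* create(int id) {
        StmtInfo* s = new StmtInfo{id};
        live.insert(s);
        return s;
    }

    // Release an info.  A null pointer is a no-op; releasing something that
    // is not live is reported as failure instead of being freed again.
    bool release(StmtInfo* s) {
        if (!s)
            return true;
        if (live.erase(s) == 0)
            return false;
        delete s;
        return true;
    }
};

// Remove a store: free its statement info and clear the node's pointer so
// that later cleanup of the node cannot reach the freed info.
inline void remove_store(StmtPool& pool, StoreNode& node) {
    pool.release(node.representative);
    node.representative = nullptr;
}
```

    With the pointer cleared, a later cleanup pass over the node sees a null
    representative and has nothing left to free.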

    Lastly it also tweaks the results of whether a pattern has been detected or
    not to return true when another SLP instance has created a pattern that is
    only used by a different instance (due to the trees being unshared).

    Instead of ICEing this code now produces

            adrp    x1, .LANCHOR0
            add     x2, x1, :lo12:.LANCHOR0
            movi    v1.2s, 0
            mov     w0, 0
            ldr     x4, [x1, #:lo12:.LANCHOR0]
            ldrsw   x3, [x2, 16]
            ldr     x1, [x2, 8]
            ldrsw   x2, [x2, 20]
            ldr     d0, [x4]
            ldr     d2, [x1, x3, lsl 3]
            fcmla   v2.2s, v0.2s, v0.2s, #0
            fcmla   v2.2s, v0.2s, v0.2s, #90
            str     d2, [x1, x3, lsl 3]
            fcmla   v1.2s, v0.2s, v0.2s, #0
            fcmla   v1.2s, v0.2s, v0.2s, #90
            str     d1, [x1, x2, lsl 3]
            ret

    PS. This testcase actually shows that the codegen we get in these cases is
    not optimal.  It should generate a MUL + ADD instead of MUL + FMA.

    But that's for GCC 12.

    gcc/ChangeLog:

            PR tree-optimization/99149
            * tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate
            the buffer.
            (vect_slp_reset_pattern): Remove.
            (complex_fma_pattern::matches): Remove call to
            vect_slp_reset_pattern.
            (complex_mul_pattern::build, complex_fma_pattern::build,
            complex_fms_pattern::build): Fix ref counts.
            * tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern
            relevancy when node is being deleted.
            (vect_match_slp_patterns_2): Correct result of cache hit on
            patterns.
            (vect_schedule_slp): Invalidate SLP_TREE_REPRESENTATIVE of removed
            stores.
            * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
            value.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/99149
            * g++.dg/vect/pr99149.cc: New test.
