https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103350

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:d5c965374cd688b0a8ad0334c85c971c1e9c3f44

commit r12-5996-gd5c965374cd688b0a8ad0334c85c971c1e9c3f44
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Wed Dec 15 10:26:10 2021 +0000

    middle-end: REE should always check all vector usages, even if it finds a defining def. [PR103350]

    This and the report in PR103632 are caused by a bug in REE where it
    generates incorrect code.

    It's trying to eliminate the following zero extension

    (insn 54 90 102 2 (set (reg:V4SI 33 v1)
            (zero_extend:V4SI (reg/v:V4HI 40 v8)))
         (nil))

    by folding it in the definition of `v8`:

    (insn 2 5 104 2 (set (reg/v:V4HI 40 v8)
            (reg:V4HI 32 v0 [156]))
         (nil))

    which is fine, except that `v8` is also used by the extracts, e.g.:

    (insn 11 10 12 2 (set (reg:SI 1 x1)
            (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
                    (parallel [
                            (const_int 3)
                        ]))))
         (nil))

    REE replaces insn 2 by folding insn 54 and placing it at the definition
    site of insn 2, so before insn 11.

    Trying to eliminate extension:
    (insn 54 90 102 2 (set (reg:V4SI 33 v1)
            (zero_extend:V4SI (reg/v:V4HI 40 v8)))
         (nil))
    Tentatively merged extension with definition (copy needed):
    (insn 2 5 104 2 (set (reg:V4SI 33 v1)
            (zero_extend:V4SI (reg:V4HI 32 v0)))
         (nil))

    to produce

    (insn 2 5 110 2 (set (reg:V4SI 33 v1)
            (zero_extend:V4SI (reg:V4HI 32 v0)))
         (nil))
    (insn 110 2 104 2 (set (reg:V4SI 40 v8)
            (reg:V4SI 33 v1))
         (nil))

    The new insn 2 using v0 directly is correct, but the insn 110 it creates is
    wrong: `v8` should still be V4HI.

    Alternatively, REE would also need to eliminate the zero extension from the
    extracts, so instead of

    (insn 11 10 12 2 (set (reg:SI 1 x1)
            (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
                    (parallel [
                            (const_int 3)
                        ]))))
         (nil))

    it should be

    (insn 11 10 12 2 (set (reg:SI 1 x1)
            (vec_select:SI (reg/v:V4SI 40 v8)
                    (parallel [
                            (const_int 3)
                        ])))
         (nil))

    Without doing so, the indices have been remapped by the extension and so we
    extract the wrong elements.

    At any optimization level other than -Os, REE seems to abandon the
    elimination, so this doesn't trigger:

    Trying to eliminate extension:
    (insn 54 90 101 2 (set (reg:V4SI 32 v0)
            (zero_extend:V4SI (reg/v:V4HI 40 v8)))
         (nil))
    Elimination opportunities = 2 realized = 0

    purely due to the ordering of instructions. REE doesn't check uses of `v8`
    because it assumes that with a zero-extended value, you still have access to
    the lower bits by using the bottom part of the register.

    This is true for scalar but not for vector.  This would have been fine as
    well if REE had eliminated the zero_extend on insn 11 and the rest, but it
    doesn't do so since REE can only handle cases where the SRC value satisfies
    REG_P.
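
    The scalar/vector difference can be illustrated outside the compiler. The
    following is a minimal sketch, not GCC code: the helper names
    (zext_v4hi_to_v4si, read_hi_lane, check_lane_remap) and the lane values are
    made up for illustration, and a register is modelled as a plain byte array.
    It shows that after widening, 16-bit lane 3 of the same bytes no longer
    holds the original element 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models (zero_extend:V4SI (reg:V4HI ...)): widen each 16-bit lane to
   32 bits.  Hypothetical helper, not part of GCC.  */
static void zext_v4hi_to_v4si(const uint16_t in[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = in[i];
}

/* Reads 16-bit lane IDX from the raw bytes of a register image, i.e. what
   an unchanged V4HI vec_select would see if the register held V4SI data.  */
static uint16_t read_hi_lane(const void *reg, int idx)
{
    uint16_t v;
    memcpy(&v, (const unsigned char *)reg + idx * sizeof v, sizeof v);
    return v;
}

/* Self-check contrasting the scalar and vector cases (illustrative values).  */
void check_lane_remap(void)
{
    const uint16_t v8[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    uint32_t wide[4];
    zext_v4hi_to_v4si(v8, wide);

    /* Scalar: the low 16 bits of a zero-extended value are the original.  */
    uint32_t scalar = (uint32_t)v8[3];
    assert((uint16_t)scalar == v8[3]);

    /* Vector: 32-bit lane 3 holds the value, but 16-bit lane 3 of the same
       bytes falls inside 32-bit lane 1 and yields something else.  */
    assert(wide[3] == v8[3]);
    assert(read_hi_lane(wide, 3) != v8[3]);
}
```

    In this model the mismatch holds on both little- and big-endian hosts,
    since 16-bit lane 3 always lands inside the widened lane 1.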

    It does try to do this in add_removable_extension:

      /* For vector mode extensions, ensure that all uses of the
         XEXP (src, 0) register are in insn or debug insns, as unlike
         integral extensions lowpart subreg of the sign/zero extended
         register are not equal to the original register, so we have
         to change all uses or none and the current code isn't able
         to change them all at once in one transaction.  */

    However, this code doesn't trigger for the example because REE doesn't check
    the uses if the defining instruction doesn't feed into another extension.

    Which is bogus. For vectors it should always check all usages.
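
    The rule being argued for can be sketched as a toy model. This is not the
    ree.c implementation: struct reg_use, vector_ext_removable and
    example_from_report are hypothetical names, and uses are modelled as a flat
    array rather than GCC's def-use chains. The idea is only that a vector
    extension is a candidate when every use of the narrow register is the
    extension itself or a debug insn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record of one use of a register; not a GCC type.  */
struct reg_use {
    int insn_uid;    /* UID of the insn containing the use.  */
    bool is_debug;   /* True if that insn is a debug insn.  */
};

/* Returns true only if every use of the narrow vector register is either
   the extension being eliminated or a debug insn.  */
bool vector_ext_removable(const struct reg_use *uses, size_t n,
                          int ext_insn_uid)
{
    for (size_t i = 0; i < n; i++) {
        if (uses[i].is_debug || uses[i].insn_uid == ext_insn_uid)
            continue;
        return false;   /* Some other insn still reads the narrow value.  */
    }
    return true;
}

/* The situation from the report: v8 is used by the extension (insn 54)
   and by an extract (insn 11), so the extension must be left alone.  */
void example_from_report(void)
{
    const struct reg_use v8_uses[] = { { 54, false }, { 11, false } };
    assert(!vector_ext_removable(v8_uses, 2, 54));
}
```

    With insn 11 present the check fails; if insn 54 were the only use, the
    same check would allow the elimination.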

    r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this as it
    now lowers VEC_SELECT 0 into the RTL canonical form subreg 0, which causes
    REE to run more often.

    gcc/ChangeLog:

            PR rtl-optimization/103350
            * ree.c (add_removable_extension): Don't stop at first definition
            but inspect all.

    gcc/testsuite/ChangeLog:

            PR rtl-optimization/103350
            * gcc.target/aarch64/pr103350-1.c: New test.
            * gcc.target/aarch64/pr103350-2.c: New test.
