https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec

commit r15-906-g39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Wed May 29 16:43:33 2024 +0100

    aarch64: Split aarch64_combinev16qi before RA [PR115258]

    Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
    purpose is to put the two input data vectors into consecutive registers.
    This aarch64_combinev16qi was then split after reload into individual
    moves (from the first input to the first half of the output, and from
    the second input to the second half of the output).
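
As a rough illustration of why the two inputs must sit in consecutive registers, the following C sketch models a two-vector TBL as a lookup into one 32-byte table built from both inputs. The names `vec16` and `tbl2` are hypothetical and are not part of GCC. The model assumes the architectural behaviour that out-of-range indices produce zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register holding 16 bytes. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of a two-register TBL: the table is the 32 bytes formed by two
   consecutive registers (lo followed by hi).  Each index byte selects one
   table byte; indices of 32 or more yield 0.  */
static vec16 tbl2(vec16 lo, vec16 hi, vec16 idx)
{
    uint8_t table[32];
    vec16 out;

    memcpy(table, lo.b, 16);       /* first half of the combined value */
    memcpy(table + 16, hi.b, 16);  /* second half of the combined value */

    for (int i = 0; i < 16; i++)
        out.b[i] = idx.b[i] < 32 ? table[idx.b[i]] : 0;
    return out;
}
```

The two `memcpy` calls play the role of the individual moves that the combine is split into.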

    In the worst case, the RA might allocate things so that the destination
    of the aarch64_combinev16qi is the second input followed by the first
    input.  In that case, the split form of aarch64_combinev16qi uses three
    eors to swap the registers around.
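
The three-eor sequence is the classic XOR swap. The sketch below models it in C on a hypothetical 128-bit register type (`reg128`, `eor` and `swap_with_three_eors` are illustrative names, not GCC code). It assumes the two registers are distinct, since XOR-swapping a register with itself would zero it.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register. */
typedef struct { uint64_t lo, hi; } reg128;

/* Model of a vector EOR: d = a ^ b, computed on both 64-bit halves. */
static void eor(reg128 *d, const reg128 *a, const reg128 *b)
{
    d->lo = a->lo ^ b->lo;
    d->hi = a->hi ^ b->hi;
}

/* Swap x and y without a scratch register.  x and y must not alias. */
static void swap_with_three_eors(reg128 *x, reg128 *y)
{
    eor(x, x, y);  /* x = x0 ^ y0 */
    eor(y, x, y);  /* y = (x0 ^ y0) ^ y0 = x0 */
    eor(x, x, y);  /* x = (x0 ^ y0) ^ x0 = y0 */
}
```

Each eor depends on the result of the previous one, so the three instructions form a serial chain, whereas a favourable allocation needs at most two independent moves.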

    This PR is about a test where this worst case occurred.  And given the
    insn description, that allocation doesn't seem unreasonable.

    early-ra should (hopefully) mean that we're now better at allocating
    subregs of vector registers.  The upcoming RA subreg patches should
    improve things further.  The best fix for the PR therefore seems
    to be to split the combination before RA, so that the RA can see
    the underlying moves.

    Perhaps it even makes sense to do this at expand time, avoiding the need
    for aarch64_combinev16qi entirely.  That deserves more experimentation
    though.

    gcc/
            PR target/115258
            * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
            the split before reload.
            * config/aarch64/aarch64.cc (aarch64_split_combinev16qi):
            Generalize into a form that handles pseudo registers.

    gcc/testsuite/
            PR target/115258
            * gcc.target/aarch64/pr115258.c: New test.
