https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:31dcf941ac78c4b1b01dc4b2ce9809f0209153b8

commit r15-7933-g31dcf941ac78c4b1b01dc4b2ce9809f0209153b8
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Mar 10 20:29:52 2025 +0000

    aarch64: Avoid unnecessary use of 2-input TBLs [PR115258]

    When using TBL for (say) a V4SI permutation, the aarch64 port first
    asks target-independent code to lower to a V16QI permutation.
    Then, during code generation, an input like:

      (reg:V4SI R)

    gets converted to:

      (subreg:V16QI (reg:V4SI R) 0)

    aarch64_vectorize_vec_perm_const had:

      d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
      if (op0 == op1)
        d.op1 = d.op0;
      else
        d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;

    But subregs (unlike regs) are not shared, so the op0 == op1 check
    always failed for this case.  We'd then force each subreg into a
    fresh register, meaning that during the later:

      aarch64_expand_vec_perm_1 (d->target, d->op0, d->op1, sel);

    there is no way for aarch64_expand_vec_perm_1 to realise that
    d->op0 and d->op1 are the same value.  It would therefore generate
    a two-input TBL in the testcase, even though a single-input TBL
    is enough.

    I'm not sure forcing subregs to a fresh register is a good idea --
    it caused problems for copysign & co. -- but that's not something
    to fiddle with during stage 4.  Using op0 == op1 for rtx equality
    is independently wrong, so we might as well just fix that for now.
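
The difference between pointer equality and structural equality described above can be sketched with a small toy model. This is only an illustration under stated assumptions: `toy_rtx`, `toy_gen_reg`, `toy_gen_subreg` and `toy_equal_p` are hypothetical names invented for this sketch and are not GCC's rtx API. The model only mimics the two properties the commit message relies on: register objects are shared, and subreg objects are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an RTL expression; NOT GCC's real rtx.  */
enum toy_code { TOY_REG, TOY_SUBREG };
enum toy_mode { TOY_V4SI, TOY_V16QI };

struct toy_rtx {
  enum toy_code code;
  enum toy_mode mode;
  int regno;                    /* used when code == TOY_REG */
  const struct toy_rtx *inner;  /* used when code == TOY_SUBREG */
  int offset;                   /* used when code == TOY_SUBREG */
};

#define TOY_MAX_REGS 8
static struct toy_rtx *toy_reg_table[TOY_MAX_REGS];

/* Allocate a zeroed node; the toy never frees its nodes.  */
static struct toy_rtx *toy_alloc (void)
{
  struct toy_rtx *x = calloc (1, sizeof *x);
  if (!x)
    abort ();
  return x;
}

/* Registers are shared: asking twice for the same register number
   returns the same object.  The mode is fixed by the first request.  */
static struct toy_rtx *toy_gen_reg (enum toy_mode mode, int regno)
{
  assert (regno >= 0 && regno < TOY_MAX_REGS);
  if (!toy_reg_table[regno])
    {
      struct toy_rtx *r = toy_alloc ();
      r->code = TOY_REG;
      r->mode = mode;
      r->regno = regno;
      toy_reg_table[regno] = r;
    }
  return toy_reg_table[regno];
}

/* Subregs are not shared: every call creates a fresh object, even
   when the arguments are identical.  */
static struct toy_rtx *toy_gen_subreg (enum toy_mode mode,
                                       const struct toy_rtx *inner,
                                       int offset)
{
  struct toy_rtx *s = toy_alloc ();
  s->code = TOY_SUBREG;
  s->mode = mode;
  s->inner = inner;
  s->offset = offset;
  return s;
}

/* Structural comparison: true if A and B describe the same value,
   whether or not they are the same object.  */
static bool toy_equal_p (const struct toy_rtx *a, const struct toy_rtx *b)
{
  if (a == b)
    return true;
  if (!a || !b)
    return false;
  if (a->code != b->code || a->mode != b->mode)
    return false;
  if (a->code == TOY_REG)
    return a->regno == b->regno;
  return a->offset == b->offset && toy_equal_p (a->inner, b->inner);
}
```

In this model, two calls to `toy_gen_subreg (TOY_V16QI, r, 0)` on the same register `r` return different pointers, so a `==` test on them fails even though `toy_equal_p` reports that they are the same value. That mirrors why the `op0 == op1` check never succeeded once the operands had been wrapped in V16QI subregs.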

    The patch gets rid of extra MOVs that are a regression from GCC 14.

    The testcase is based on one from Kugan, itself based on TSVC.

    gcc/
            PR target/115258
            * config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Use
            d.one_vector_p to decide whether op1 should be a copy of op0.

    gcc/testsuite/
            PR target/115258
            * gcc.target/aarch64/pr115258_2.c: New test.

    Co-authored-by: Kugan Vivekanandarajah <kvivekana...@nvidia.com>
