https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106329

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jennifer Schmitz <jschm...@gcc.gnu.org>:

https://gcc.gnu.org/g:5289540ed58e42ae66255e31f22afe4ca0a6e15e

commit r15-5957-g5289540ed58e42ae66255e31f22afe4ca0a6e15e
Author: Jennifer Schmitz <jschm...@nvidia.com>
Date:   Fri Nov 15 07:45:59 2024 -0800

    SVE intrinsics: Fold calls with pfalse predicate.

    If an SVE intrinsic has predicate pfalse, we can fold the call to
    a simplified assignment statement: For _m predication, the LHS can be
    assigned the operand for inactive values and for _z, we can assign a
    zero vector. For _x, the returned values can be arbitrary and, as
    suggested by Richard Sandiford, we fold to a zero vector.

    For example,
    svint32_t foo (svint32_t op1, svint32_t op2)
    {
      return svadd_s32_m (svpfalse_b (), op1, op2);
    }
    can be folded to lhs = op1, such that foo is compiled to just a RET.
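
The three predication cases above can be modelled in portable C without SVE hardware. The following is a minimal scalar sketch, not GCC code: `pred_mode`, `model_add_s32`, and `folded_add_s32` are hypothetical names introduced only for illustration, and lanes are represented as plain arrays. It shows that, when every predicate lane is false, a predicated add yields the same result as the simplified assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three explicit predication modes. */
enum pred_mode { PRED_M, PRED_Z, PRED_X };

/* Scalar model of a predicated 32-bit add over n lanes.
   Active lanes compute op1 + op2 (with wraparound); inactive lanes
   take op1 for _m and zero for _z.  For _x the inactive value is
   arbitrary; this model picks zero, the value the patch folds to.  */
void model_add_s32(enum pred_mode mode, const bool *pg,
                   const int32_t *op1, const int32_t *op2,
                   int32_t *out, size_t n)
{
  assert(pg && op1 && op2 && out);
  for (size_t i = 0; i < n; i++) {
    if (pg[i])
      out[i] = (int32_t)((uint32_t)op1[i] + (uint32_t)op2[i]);
    else if (mode == PRED_M)
      out[i] = op1[i];
    else
      out[i] = 0;
  }
}

/* Model of the folded form for an all-false predicate:
   lhs = op1 for _m, lhs = {0, ...} for _z and _x.  */
void folded_add_s32(enum pred_mode mode, const int32_t *op1,
                    int32_t *out, size_t n)
{
  assert(op1 && out);
  for (size_t i = 0; i < n; i++)
    out[i] = (mode == PRED_M) ? op1[i] : 0;
}
```

Calling both functions with an all-false `pg` and comparing the outputs lane by lane gives identical results for each mode, which is the property the fold relies on.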

    For implicit predication, a case distinction is necessary:
    Intrinsics that read from memory can be folded to a zero vector.
    Intrinsics that write to memory or prefetch can be folded to a no-op.
    Other intrinsics need case-by-case implementation, which we added in
    the corresponding svxxx_impl::fold.
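
The case distinction for implicit predication can be sketched as a small decision function. This is an illustrative model only, assuming a simplified three-way classification; `intrinsic_kind`, `fold_result`, and `fold_implicit_pfalse` are hypothetical names and do not appear in the GCC sources.

```c
#include <assert.h>

/* Hypothetical classification of an implicitly predicated intrinsic. */
enum intrinsic_kind {
  KIND_READS_MEMORY,
  KIND_WRITES_MEMORY,
  KIND_PREFETCH,
  KIND_OTHER
};

/* Hypothetical outcome of folding a call whose predicate is pfalse. */
enum fold_result {
  FOLD_TO_ZERO_VECTOR,  /* lhs = {0, ...} */
  FOLD_TO_NOP,          /* the call is removed */
  FOLD_CASE_BY_CASE     /* left to the intrinsic's own fold method */
};

/* Map each kind to the generic fold described in the text. */
enum fold_result fold_implicit_pfalse(enum intrinsic_kind kind)
{
  switch (kind) {
    case KIND_READS_MEMORY:
      return FOLD_TO_ZERO_VECTOR;
    case KIND_WRITES_MEMORY:
    case KIND_PREFETCH:
      return FOLD_TO_NOP;
    case KIND_OTHER:
      return FOLD_CASE_BY_CASE;
  }
  assert(0 && "unknown intrinsic kind");
  return FOLD_CASE_BY_CASE;
}
```

The model deliberately ignores the exceptions listed later in this message, which the patch handles separately.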

    We implemented this optimization during gimple folding by calling a new
    method gimple_folder::fold_pfalse from gimple_folder::fold, which covers
    the generic cases described above.

    We tested the new behavior for each intrinsic with all supported
    predications and data types and checked the produced assembly. There is
    a test file for each shape subclass with scan-assembler-times tests that
    look for the simplified instruction sequences, such as individual RET
    instructions or zeroing moves. There is an additional directive counting
    the total number of functions in the test, which must be the sum of
    counts of all other directives. This is to check that all tested
    intrinsics were optimized.
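
The counting invariant described above amounts to a simple sum check. The sketch below models it in plain C under the assumption that each directive is reduced to a pattern label and an expected count; `scan_directive` and `all_functions_accounted_for` are hypothetical names, and this is not how the DejaGnu harness itself evaluates directives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one scan-assembler-times directive:
   a label for the simplified sequence and how often it must appear. */
struct scan_directive {
  const char *pattern;
  unsigned count;
};

/* Return true if the per-pattern counts add up to the total number of
   functions in the test, i.e. every function matched some simplified
   instruction sequence. */
bool all_functions_accounted_for(const struct scan_directive *directives,
                                 size_t n, unsigned total_functions)
{
  assert(directives != NULL || n == 0);
  unsigned sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += directives[i].count;
  return sum == total_functions;
}
```

If a single intrinsic is not folded, its function matches none of the simplified patterns, the sum falls short of the total, and the check fails.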

    A few intrinsics were not covered by this patch:
    - svlasta and svlastb already have an implementation to cover a pfalse
    predicate. No changes were made to them.
    - svld1/2/3/4 return aggregate types and were excluded from the case
    that folds calls with implicit predication to lhs = {0, ...}.
    - svst1/2/3/4 already have an implementation in svstx_impl that precedes
    our optimization, such that it is not triggered.

    The patch was bootstrapped and regtested on aarch64-linux-gnu, no
    regression.
    OK for mainline?

    Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

    gcc/ChangeLog:

            PR target/106329
            * config/aarch64/aarch64-sve-builtins-base.cc
            (svac_impl::fold): Add folding if pfalse predicate.
            (svadda_impl::fold): Likewise.
            (class svaddv_impl): Likewise.
            (class svandv_impl): Likewise.
            (svclast_impl::fold): Likewise.
            (svcmp_impl::fold): Likewise.
            (svcmp_wide_impl::fold): Likewise.
            (svcmpuo_impl::fold): Likewise.
            (svcntp_impl::fold): Likewise.
            (class svcompact_impl): Likewise.
            (class svcvtnt_impl): Likewise.
            (class sveorv_impl): Likewise.
            (class svminv_impl): Likewise.
            (class svmaxnmv_impl): Likewise.
            (class svmaxv_impl): Likewise.
            (class svminnmv_impl): Likewise.
            (class svorv_impl): Likewise.
            (svpfirst_svpnext_impl::fold): Likewise.
            (svptest_impl::fold): Likewise.
            (class svsplice_impl): Likewise.
            * config/aarch64/aarch64-sve-builtins-sve2.cc
            (class svcvtxnt_impl): Likewise.
            (svmatch_svnmatch_impl::fold): Likewise.
            * config/aarch64/aarch64-sve-builtins.cc
            (is_pfalse): Return true if tree is pfalse.
            (gimple_folder::fold_pfalse): Fold calls with pfalse predicate.
            (gimple_folder::fold_call_to): Fold call to lhs = t for given
            tree t.
            (gimple_folder::fold_to_stmt_vops): Helper function that folds the
            call to given stmt and adjusts virtual operands.
            (gimple_folder::fold): Call fold_pfalse.
            * config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare
            is_pfalse.

    gcc/testsuite/ChangeLog:

            PR target/106329
            * gcc.target/aarch64/pfalse-binary_0.h: New test.
            * gcc.target/aarch64/pfalse-unary_0.h: New test.
            * gcc.target/aarch64/sve/pfalse-binary.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-binaryxn.c: New test.
            * gcc.target/aarch64/sve/pfalse-clast.c: New test.
            * gcc.target/aarch64/sve/pfalse-compare_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-count_pred.c: New test.
            * gcc.target/aarch64/sve/pfalse-fold_left.c: New test.
            * gcc.target/aarch64/sve/pfalse-load.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_ext.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_gather_sv.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_gather_vs.c: New test.
            * gcc.target/aarch64/sve/pfalse-load_replicate.c: New test.
            * gcc.target/aarch64/sve/pfalse-prefetch.c: New test.
            * gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: New test.
            * gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: New test.
            * gcc.target/aarch64/sve/pfalse-ptest.c: New test.
            * gcc.target/aarch64/sve/pfalse-rdffr.c: New test.
            * gcc.target/aarch64/sve/pfalse-reduction.c: New test.
            * gcc.target/aarch64/sve/pfalse-reduction_wide.c: New test.
            * gcc.target/aarch64/sve/pfalse-shift_right_imm.c: New test.
            * gcc.target/aarch64/sve/pfalse-store.c: New test.
            * gcc.target/aarch64/sve/pfalse-store_scatter_index.c: New test.
            * gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: New test.
            * gcc.target/aarch64/sve/pfalse-storexn.c: New test.
            * gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-ternary_rotate.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary_convertxn.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary_n.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary_pred.c: New test.
            * gcc.target/aarch64/sve/pfalse-unary_to_uint.c: New test.
            * gcc.target/aarch64/sve/pfalse-unaryxn.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test.
            * gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test.
            * gcc.target/aarch64/sve2/pfalse-compare.c: New test.
            * gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: New test.
            * gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: New test.
            * gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c:
            New test.
            * gcc.target/aarch64/sve2/pfalse-unary.c: New test.
            * gcc.target/aarch64/sve2/pfalse-unary_convert.c: New test.
            * gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: New test.
            * gcc.target/aarch64/sve2/pfalse-unary_to_int.c: New test.