[PATCH] MATCH: add abs support for half float
This patch extends abs detection in match.pd to half float.

Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?

gcc/ChangeLog:

* match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) for
half float.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/absfloat16.c: New test.

Signed-off-by: Kugan Vivekanandarajah

Attachment: 0001-abs-for-half-float.patch
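For readers without the attachment, here is a minimal sketch of the source shape the ChangeLog entry describes: the comparison is done on a converted copy of the operand, while the selected values keep the original type. This is an illustration, not code from the patch. float and double stand in for the half-float and promoted types (an assumption made so the sketch compiles on targets without _Float16), and both function names are made up.

```cpp
#include <cmath>

// Hand-written absolute value in the "(type)A >= 0 ? A : -A" shape:
// the comparison sees a widened copy of a, the selected values do not.
// float/double are stand-ins for the half-float case (assumption).
float abs_via_select(float a)
{
  return (double) a >= 0 ? a : -a;
}

// The form the optimizer would be expected to canonicalize to.
float abs_via_builtin(float a)
{
  return std::fabs(a);
}
```

The two functions agree on every nonzero input. They differ on -0.0, where the select form returns -0.0 and fabs returns +0.0, which is presumably why the existing pattern quoted later in this thread is guarded by !HONOR_SIGNED_ZEROS.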
[PATCH] gimple-fold: consistent dump of builtin call simplifications
Previously only simplifications of the `__st[xrp]cpy_chk` builtins were dumped.
Now all call replacement simplifications are dumped.

Examples of statements with corresponding dumpfile entries:

`printf("mystr\n");`:
  optimized: simplified printf to __builtin_puts
`printf("%c", 'a');`:
  optimized: simplified printf to __builtin_putchar
`printf("%s\n", "mystr");`:
  optimized: simplified printf to __builtin_puts

2024-07-13  Rubin Gerritsen

gcc/ChangeLog:

        * gimple-fold.cc (dump_transformation): Moved definition.
        (replace_call_with_call_and_fold): Calls dump_transformation.
        (gimple_fold_builtin_stxcpy_chk): Removes call to
        dump_transformation, now in replace_call_with_call_and_fold.
        (gimple_fold_builtin_stxncpy_chk): Removes call to
        dump_transformation, now in replace_call_with_call_and_fold.
---
 gcc/gimple-fold.cc | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 7c534d56bf1..b20d3a2ff9a 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
   gsi_replace_with_seq_vops (si_p, stmts);
 }
 
+/* Print a message in the dump file recording transformation of FROM to TO.  */
+
+static void
+dump_transformation (gcall *from, gcall *to)
+{
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
+                     gimple_call_fn (from), gimple_call_fn (to));
+}
 
 /* Replace the call at *GSI with the gimple value VAL.  */
 
@@ -835,6 +844,7 @@ static void
 replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  dump_transformation (as_a <gcall *> (stmt), as_a <gcall *> (repl));
   gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
   gimple_set_location (repl, gimple_location (stmt));
   gimple_move_vops (repl, stmt);
@@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator *gsi,
   return true;
 }
 
-/* Print a message in the dump file recording transformation of FROM to TO.  */
-
-static void
-dump_transformation (gcall *from, gcall *to)
-{
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
-                     gimple_call_fn (from), gimple_call_fn (to));
-}
-
 /* Fold a call to the __st[rp]cpy_chk builtin.
    DEST, SRC, and SIZE are the arguments to the call.
    IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
@@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator *gsi,
     return false;
 
   gcall *repl = gimple_build_call (fn, 2, dest, src);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }
@@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator *gsi,
     return false;
 
   gcall *repl = gimple_build_call (fn, 3, dest, src, len);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }
-- 
2.34.1
[pushed] wwwdocs: gcc-*: Tweak links to testing instructions to use https
Business as usual; pushed. Gerald --- htdocs/gcc-5/buildstat.html | 2 +- htdocs/gcc-6/buildstat.html | 2 +- htdocs/gcc-7/buildstat.html | 2 +- htdocs/gcc-8/buildstat.html | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-5/buildstat.html b/htdocs/gcc-5/buildstat.html index 59c9a5a6..03cbb03e 100644 --- a/htdocs/gcc-5/buildstat.html +++ b/htdocs/gcc-5/buildstat.html @@ -16,7 +16,7 @@ summaries. Instructions for running the testsuite and for submitting test results are part of -http://gcc.gnu.org/install/test.html";> +https://gcc.gnu.org/install/test.html";> Installing GCC: Testing. diff --git a/htdocs/gcc-6/buildstat.html b/htdocs/gcc-6/buildstat.html index a4609405..06a87da7 100644 --- a/htdocs/gcc-6/buildstat.html +++ b/htdocs/gcc-6/buildstat.html @@ -16,7 +16,7 @@ summaries. Instructions for running the testsuite and for submitting test results are part of -http://gcc.gnu.org/install/test.html";> +https://gcc.gnu.org/install/test.html";> Installing GCC: Testing. diff --git a/htdocs/gcc-7/buildstat.html b/htdocs/gcc-7/buildstat.html index fb9524d1..62659059 100644 --- a/htdocs/gcc-7/buildstat.html +++ b/htdocs/gcc-7/buildstat.html @@ -16,7 +16,7 @@ summaries. Instructions for running the test suite and for submitting test results are part of -http://gcc.gnu.org/install/test.html";> +https://gcc.gnu.org/install/test.html";> Installing GCC: Testing. diff --git a/htdocs/gcc-8/buildstat.html b/htdocs/gcc-8/buildstat.html index 0e7a808e..ad0ec217 100644 --- a/htdocs/gcc-8/buildstat.html +++ b/htdocs/gcc-8/buildstat.html @@ -16,7 +16,7 @@ summaries. Instructions for running the test suite and for submitting test results are part of -http://gcc.gnu.org/install/test.html";> +https://gcc.gnu.org/install/test.html";> Installing GCC: Testing. -- 2.45.2
[match.pd PATCH] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2)
Hi Richard,
Many thanks for the review and recommendation to use nop_convert?.
This revised patch implements that suggestion, which required a little
experimentation/tweaking as ranger/EVRP records the ranges on the useless
type conversions rather than the multiplications.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?

2024-07-14  Roger Sayle
            Richard Biener

gcc/ChangeLog
        PR tree-optimization/114661
        * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
        type conversions around multiplications, such as those inserted
        by this transformation.

gcc/testsuite/ChangeLog
        PR tree-optimization/114661
        * gcc.dg/pr114661.c: New test case.

Thanks again,
Roger
--

> -----Original Message-----
> From: Richard Biener
> Sent: 10 July 2024 12:34
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> MULT_EXPR recognition.
>
> On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle wrote:
> >
> > This patch resolves PR tree-optimization/114661, by generalizing the
> > set of expressions that we canonicalize to multiplication. This
> > extends the optimization(s) contributed (by me) back in July 2021.
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> >
> > The existing transformation folds (X*C1)^(X<
> > allowed. A subtlety is that for non-wrapping integer types, we
> > actually fold this into (int)((unsigned)X*C3) so that we don't
> > introduce an undefined overflow that wasn't in the original.
> > Unfortunately, this transformation confuses itself, as the type-safe
> > multiplication isn't recognized when further combining bit operations.
> > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<
> > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > construct multiplications.
> > > > For the example given in the PR: > > > > unsigned mul(unsigned char c) { > > if (c > 3) __builtin_unreachable(); > > return c << 18 | c << 15 | > >c << 12 | c << 9 | > >c << 6 | c << 3 | c; > > } > > > > GCC on x86_64 with -O2 previously generated: > > > > mul:movzbl %dil, %edi > > leal(%rdi,%rdi,8), %edx > > leal0(,%rdx,8), %eax > > movl%edx, %ecx > > sall$15, %edx > > orl %edi, %eax > > sall$9, %ecx > > orl %ecx, %eax > > orl %edx, %eax > > ret > > > > with this patch we now generate: > > > > mul:movzbl %dil, %eax > > imull $299593, %eax, %eax > > ret > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > I'm looking at the difference between the existing > > (simplify > (op:c (mult:s@0 @1 INTEGER_CST@2) > (lshift:s@3 @1 INTEGER_CST@4)) > (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type) >&& tree_int_cst_sgn (@4) > 0 >&& (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0) >(with { wide_int wone = wi::one (TYPE_PRECISION (type)); >wide_int c = wi::add (wi::to_wide (@2), > wi::lshift (wone, wi::to_wide (@4))); } > (mult @1 { wide_int_to_tree (type, c); } > > and > > + (simplify > + (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3)) > + (lshift:s@4 @2 INTEGER_CST@5)) > + (if (INTEGRAL_TYPE_P (type) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > + && TREE_TYPE (@2) == type > + && TYPE_UNSIGNED (TREE_TYPE (@1)) > + && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)) > + && tree_int_cst_sgn (@5) > 0 > + && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0) > + (with { tree t = TREE_TYPE (@1); > + wide_int wone = wi::one (TYPE_PRECISION (t)); > + wide_int c = wi::add (wi::to_wide (@3), > +wi::lshift (wone, wi::to_wide (@5))); } > +(convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); }) > > and wonder whether wrapping of the multiplication is required for correctness, > specifically the 
former seems to allow signed types with -fwrapv while the > latter > won't. It also looks the patterns could be merged doing > > (simplify > (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2) > (lshift:s@3 @1 INTEGER_CST@4)) > > and by using nop_convert instead of convert simplify the condition? > > Richard. > > > > > 2024-07-09 Roger Sayle > > > > gcc/ChangeLog > > PR tree-optimization/114661 > > * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize > > multiplications surrounded by casts to an unsigned type and back > > such as those generated by
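As a sanity check on the constant in the quoted assembly, the following sketch compares the shift-and-or form from the PR with a single multiplication. It is an illustration only, not part of the patch or its testsuite; the function names are made up and the __builtin_unreachable range hint is left out, so the equivalence is only claimed for c <= 3.

```cpp
// Shift-and-or form from the PR. For c <= 3 the shifted copies of c
// share no set bits, so OR-ing them equals adding them.
unsigned mul_shifts(unsigned char c)
{
  return c << 18 | c << 15 | c << 12 | c << 9 | c << 6 | c << 3 | c;
}

// Same value as one multiplication:
// 299593 = 2^18 + 2^15 + 2^12 + 2^9 + 2^6 + 2^3 + 2^0.
unsigned mul_const(unsigned char c)
{
  return c * 299593u;
}
```

The disjoint-bits property is what the tree_nonzero_bits test in the quoted pattern checks before the OR may be rewritten as a sum of multipliers.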
[x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.
This is a minor change to restore bootstrap on systems using gcc 4.8 as a host compiler. The fatal error is: In file included from gcc/gcc/coretypes.h:471:0, from gcc/gcc/config/i386/i386-expand.cc:23: gcc/gcc/config/i386/i386-expand.cc: In function 'void ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)': ./insn-modes.h:315:75: error: temporary of non-literal type 'scalar_float_mode' in a constant expression #define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode)) ^ gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro 'HFmode' case HFmode: ^ The solution is to use the E_?Fmode enumeration constants as case values in switch statements. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures (from this change). Ok for mainline? 2024-07-14 Roger Sayle * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator): Use E_?Fmode enumeration constants in switch statement. (ix86_expand_copysign): Likewise. (ix86_expand_xorsign): Likewise. 
Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index cfcfdd9..9a31e6d 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -2176,19 +2176,19 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode, switch (mode) { - case HFmode: + case E_HFmode: use_sse = true; vmode = V8HFmode; break; - case BFmode: + case E_BFmode: use_sse = true; vmode = V8BFmode; break; - case SFmode: + case E_SFmode: use_sse = TARGET_SSE_MATH && TARGET_SSE; vmode = V4SFmode; break; - case DFmode: + case E_DFmode: use_sse = TARGET_SSE_MATH && TARGET_SSE2; vmode = V2DFmode; break; @@ -2330,19 +2330,19 @@ ix86_expand_copysign (rtx operands[]) switch (mode) { - case HFmode: + case E_HFmode: vmode = V8HFmode; break; - case BFmode: + case E_BFmode: vmode = V8BFmode; break; - case SFmode: + case E_SFmode: vmode = V4SFmode; break; - case DFmode: + case E_DFmode: vmode = V2DFmode; break; - case TFmode: + case E_TFmode: vmode = mode; break; default: @@ -2410,16 +2410,16 @@ ix86_expand_xorsign (rtx operands[]) switch (mode) { - case HFmode: + case E_HFmode: vmode = V8HFmode; break; - case BFmode: + case E_BFmode: vmode = V8BFmode; break; - case SFmode: + case E_SFmode: vmode = V4SFmode; break; - case DFmode: + case E_DFmode: vmode = V2DFmode; break; default:
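The underlying issue can be shown with a small sketch. Everything below is hypothetical: mode_id, float_mode, HF and lanes_in_128_bits are made-up stand-ins, not GCC's real definitions. The point is only that a macro expanding to a class temporary relies on the host compiler accepting that temporary in a constant expression, whereas plain enumerators are valid case labels for any compiler.

```cpp
// Hypothetical miniature of the mode machinery (not GCC's definitions).
enum mode_id { E_HF, E_SF, E_DF };

// Wrapper class in the spirit of scalar_float_mode.
class float_mode
{
public:
  explicit float_mode (mode_id m) : m_id (m) {}
  operator mode_id () const { return m_id; }
private:
  mode_id m_id;
};

// A macro like this yields a class temporary. Using it as a case label
// only works if the compiler accepts it in a constant expression, so
// the switch below deliberately does not use it.
#define HF (float_mode (E_HF))

int lanes_in_128_bits (float_mode mode)
{
  switch (mode)             // converts to mode_id via the operator above
    {
    case E_HF: return 8;    // plain enumerators are always valid labels
    case E_SF: return 4;
    case E_DF: return 2;
    }
  return 0;
}
```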
Re: [pushed] Add function filtering to gcov
I'm seeing (dejagnu) testsuite problems from this (recent) patch.

Running /home/roger/GCC/patchem/gcc/testsuite/gcc.misc-tests/gcov.exp ...
ERROR: (DejaGnu) proc "lmap key { snd } { if { $key in $seen } continue set key }" does not exist.
The error code is NONE
The info on the error is:
invalid command name "lmap"
    while executing
"::tcl_unknown lmap key { snd } { if { $key in $seen } continue set key }"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

I guess (but I'm not sure) that lmap requires Tcl 8.6, and my RHEL-based
system has Tcl 8.5. Is there a simple workaround to avoid the use of lmap?

Admittedly the systems that I use are a bit "long in the tooth" (obsolete?)
[but that's also why they're available for use with my gcc "hobby"].

Thoughts?
Roger
--
Re: [x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.
On Sun, Jul 14, 2024 at 3:42 PM Roger Sayle wrote: > > > This is a minor change to restore bootstrap on systems using gcc 4.8 > as a host compiler. The fatal error is: > > In file included from gcc/gcc/coretypes.h:471:0, > from gcc/gcc/config/i386/i386-expand.cc:23: > gcc/gcc/config/i386/i386-expand.cc: In function 'void > ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)': > ./insn-modes.h:315:75: error: temporary of non-literal type > 'scalar_float_mode' in a constant expression > #define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode)) >^ > gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro > 'HFmode' >case HFmode: > ^ > > > The solution is to use the E_?Fmode enumeration constants as case values > in switch statements. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures (from this change). Ok for mainline? > > > 2024-07-14 Roger Sayle > > * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator): > Use E_?Fmode enumeration constants in switch statement. > (ix86_expand_copysign): Likewise. > (ix86_expand_xorsign): Likewise. OK, also for backports. Thanks, Uros. > > > Thanks in advance, > Roger > -- >
Re: [Patch, fortran] PR84868 - [11/12/13/14/15 Regression] ICE in gfc_conv_descriptor_offset, at fortran/trans-array.c:208
Hi Paul,

at first sight the patch seems to be the right approach, but it breaks
for the following two variations:

(1) LEN_TRIM is elemental, but the following is erroneously rejected:

  function g(n) result(z)
    integer,   intent(in) :: n
    character, parameter  :: d(3,3) = 'x'
    character(len_trim(d(n,n))) :: z
    z = d(n,n)
  end

This is fixed here by commenting/removing the line

  expr->rank = 1;

as the result shall have the same shape as the argument.
Can you check?

(2) The handling of namespaces is problematic: using the same name for a
parameter within procedures in the same scope generates another ICE.
The following testcase demonstrates this:

module m
  implicit none
  integer :: c
contains
  function f(n) result(z)
    integer,   intent(in) :: n
    character, parameter  :: c(3) = ['x', 'y', 'z']
    character(len_trim(c(n))) :: z
    z = c(n)
  end
  function h(n) result(z)
    integer,   intent(in) :: n
    character, parameter  :: c(3,3) = 'x'
    character(len_trim(c(n,n))) :: z
    z = c(n,n)
  end
end
program p
  use m
  implicit none
  print *, f(2)
  print *, h(1)
end

I get:

pr84868-z0.f90:22:15:

   22 |   print *, h(1)
      |               1
internal compiler error: in gfc_conv_descriptor_stride_get, at fortran/trans-array.cc:483
0x243e156 internal_error(char const*, ...)
        ../../gcc-trunk/gcc/diagnostic-global-context.cc:491
0x96dd70 fancy_abort(char const*, int, char const*)
        ../../gcc-trunk/gcc/diagnostic.cc:1725
0x749d68 gfc_conv_descriptor_stride_get(tree_node*, tree_node*)
        ../../gcc-trunk/gcc/fortran/trans-array.cc:483
[rest of traceback elided]

Renaming the parameter array in h solves the problem.

On 13.07.24 at 17:57, Paul Richard Thomas wrote:

Hi All,

Harald has pointed out that I attached the ChangeLog twice and the patch
not at all :-(

Please find the patch duly attached.

Paul

On Sat, 13 Jul 2024 at 10:58, Paul Richard Thomas
<paul.richard.tho...@gmail.com> wrote:

Hi All,

After messing around with argument mapping, where I found and fixed
another bug, I realised that the problem lay with simplification of
len_trim with an argument that is the element of a parameter array. The fix
was then a straightforward lift of existing code in expr.cc. The mapping
bug is also fixed by supplying the se string length when building character
typespecs.

Regtests just fine. OK for mainline? I believe that this is safe for
backporting to 14-branch before the 14.2 release - thoughts?

If you manage to correct/fix the above issues, I am fine with
backporting, as this appears a very reasonable fix.

Thanks,
Harald

Regards

Paul
Re: [pushed] Add function filtering to gcov
Certainly, I can rewrite from lmap. I'll send a patch shortly. On 7/14/24 16:27, Roger Sayle wrote: I’m seeing (dejagnu) testsuite problems from this (recent) patch. Running /home/roger/GCC/patchem/gcc/testsuite/gcc.misc-tests/gcov.exp ... ERROR: (DejaGnu) proc "lmap key { snd } { if { $key in $seen } continue set key }" does not exist. The error code is NONE The info on the error is: invalid command name "lmap" while executing "::tcl_unknown lmap key { snd } { if { $key in $seen } continue set key }" ("uplevel" body line 1) invoked from within "uplevel 1 ::tcl_unknown $args" I guess (but I’m not sure) that lmap requires Tcl 8.6, and my RHEL-based system has Tcl 8.5. Is there a simple workaround to avoid the use of lmap? Admittedly the systems that I use are a bit "long in the tooth" (obsolete?) [but that's also why they're available for use with my gcc "hobby"]. Thoughts? Roger --
Re: [PATCH] MATCH: add abs support for half float
On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah wrote:
>
> This patch extends abs detection in match.pd to half float.
>
> Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?

This is basically this pattern:
```
 /* A >=/> 0 ? A : -A    same as abs (A) */
 (for cmp (ge gt)
  (simplify
   (cnd (cmp @0 zerop) @1 (negate @1))
    (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
         && !TYPE_UNSIGNED (TREE_TYPE(@0))
         && bitwise_equal_p (@0, @1))
     (if (TYPE_UNSIGNED (type))
      (absu:type @0)
      (abs @0)
```

except extended to handle an optional convert. Why didn't you just extend
the above pattern to handle the convert instead? Also I think you have an
issue with unsigned types with the comparison.
Also you should extend the -abs(A) pattern right below it in a similar fashion.

Thanks,
Andrew Pinski

>
> gcc/ChangeLog:
>
> * match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) for
> half float.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/absfloat16.c: New test.
>
> Signed-off-by: Kugan Vivekanandarajah
>
[PATCH] Use foreach, not lmap, for tcl <= 8.5 compat
lmap was introduced in Tcl 8.6, and while that was released in 2012, lmap
does not make enough of a difference to warrant the friction on
conservative (and relevant) systems.

gcc/testsuite/ChangeLog:

        * lib/gcov.exp: Use foreach for tcl <= 8.5.
---
 gcc/testsuite/lib/gcov.exp | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/lib/gcov.exp b/gcc/testsuite/lib/gcov.exp
index 3fc7b65bee5..68696c9aa50 100644
--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -512,25 +512,29 @@ proc verify-filters { testname testcase file expected unexpected } {
     set seen [lsort -unique $seen]
 
-    set expected [lmap key $expected {
-        if { $key in $seen } continue
-        set key
-    }]
-    set unexpected [lmap key $unexpected {
-        if { $key ni $seen } continue
-        set key
-    }]
-
-    foreach sym $expected {
+    set ex {}
+    foreach key $expected {
+        if { $key ni $seen } {
+            lappend ex $key
+        }
+    }
+    set unex {}
+    foreach key $unexpected {
+        if { $key in $seen } {
+            lappend unex $key
+        }
+    }
+
+    foreach sym $ex {
         fail "Did not see expected symbol '$sym'"
     }
 
-    foreach sym $unexpected {
+    foreach sym $unex {
         fail "Found unexpected symbol '$sym'"
     }
 
     close $fd
-    return [expr [llength $expected] + [llength $unexpected]]
+    return [expr [llength $ex] + [llength $unex]]
 }
 
 proc verify-prime-paths { testname testcase file } {
-- 
2.39.2
[PR middle-end/114635] Set OMP safelen handling to INT_MAX when the pragma didn’t provide one.
OMP safelen handling assigns the backend-provided maximum as an int even
when the pragma did not provide one. As a result, the vectoriser rejects SVE
modes when comparing a poly_int with the safelen.

That is, for the attached test case, omp_max_vf gets [16, 16] from the
backend. This then becomes 16, as OMP safelen is an integer. When the
vectoriser checks the potential vector mode with maybe_lt (max_vf, min_vf),
the SVE modes are rejected, so no SVE vector mode is selected.

One suggestion there was to set safelen to INT_MAX when the OMP pragma does
not provide safelen explicitly.

Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?

Thanks,
Kugan

        PR middle-end/114635

gcc/ChangeLog:

        * omp-low.cc (lower_rec_input_clauses): Set INT_MAX
        when safelen is not provided instead of using backend
        provided safelen.

gcc/testsuite/ChangeLog:

        * c-c++-common/pr114635-1.cpp: New test.
        * c-c++-common/pr114635-2.cpp: New test.

Signed-off-by: Kugan Vivekanandarajah

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 4d003f42098..69feedbde54 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6980,6 +6980,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
           || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
               && maybe_gt (safe_len, sctx.max_vf)))
         {
+          if (!sctx.is_simt && maybe_ne (sctx.max_vf, 1U))
+            sctx.max_vf = INT_MAX;
           c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
           OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
                                                        sctx.max_vf);
diff --git a/gcc/testsuite/c-c++-common/pr114635-1.cpp b/gcc/testsuite/c-c++-common/pr114635-1.cpp
new file mode 100644
index 00000000000..9bf52ba85b0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr114635-1.cpp
@@ -0,0 +1,60 @@
+
+/* PR middle-end/114635 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O3 -fdump-tree-omplower" } */
+namespace std {
+  inline constexpr float
+  sqrt(float __x)
+  { return __builtin_sqrtf(__x); }
+}
+extern const float
PolyCoefficients4[] = { + 0.263729f, -0.0686285f, 0.00882248f, -0.000592487f, 0.164622f +}; + +template +static void GravityForceKernel(int n, float *__restrict__ x, float *__restrict__ y, + float *__restrict__ z, float *__restrict__ mass, + float x0, float y0, float z0, + float MaxSepSqrd, float SofteningLenSqrd, + float &__restrict__ ax, float &__restrict__ ay, + float &__restrict__ az) { + float lax = 0.0f, lay = 0.0f, laz = 0.0f; + +#pragma omp simd reduction(+:lax,lay,laz) + + for (int i = 0; i < n; ++i) { +float dx = x[i] - x0, dy = y[i] - y0, dz = z[i] - z0; +float r2 = dx * dx + dy * dy + dz * dz; + +if (r2 >= MaxSepSqrd || r2 == 0.0f) + continue; + +float r2s = r2 + SofteningLenSqrd; +float f = PolyCoefficients[PolyOrder]; +for (int p = 1; p <= PolyOrder; ++p) + f = PolyCoefficients[PolyOrder-p] + r2*f; + +f = (1.0f / (r2s * std::sqrt(r2s)) - f) * mass[i]; + +lax += f * dx; +lay += f * dy; +laz += f * dz; + } + + ax += lax; + ay += lay; + az += laz; +} + +void GravityForceKernel4(int n, float *__restrict__ x, float *__restrict__ y, + float *__restrict__ z, float *__restrict__ mass, + float x0, float y0, float z0, + float MaxSepSqrd, float SofteningLenSqrd, + float &__restrict__ ax, float &__restrict__ ay, + float &__restrict__ az) { + GravityForceKernel<4, PolyCoefficients4>(n, x, y, z, mass, x0, y0, z0, + MaxSepSqrd, SofteningLenSqrd, + ax, ay, az); +} + +/* { dg-final { scan-tree-dump "safelen(2147483647)" "omplower" } } */ diff --git a/gcc/testsuite/c-c++-common/pr114635-2.cpp b/gcc/testsuite/c-c++-common/pr114635-2.cpp new file mode 100644 index 000..7de2c8eea73 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr114635-2.cpp @@ -0,0 +1,61 @@ + +/* PR middle-end/114635 */ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -O3 -fdump-tree-omplower" } */ + +namespace std { + inline constexpr float + sqrt(float __x) + { return __builtin_sqrtf(__x); } +} +extern const float PolyCoefficients4[] = { + 0.263729f, -0.0686285f, 0.00882248f, -0.000592487f, 
0.164622f +}; + +template +static void GravityForceKernel(int n, float *__restrict__ x, float *__restrict__ y, + float *__restrict__ z, float *__restrict__ mass, + float x0, float y0, float z0, + float MaxSepSqrd, float SofteningLenSqrd, + float &__
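For context on what the clause means at source level, here is a minimal sketch contrasting a simd loop without safelen and one with an explicit safelen(8). It is an illustration, not one of the new tests. The function names are made up, and the pragmas only take effect when compiling with -fopenmp or -fopenmp-simd; otherwise they are ignored and the loops run as plain scalar code.

```cpp
// No safelen clause: the source places no upper bound on how many
// iterations may be combined into one SIMD chunk. This is the case the
// patch changes the default for.
float sum_default(const float *x, int n)
{
  float s = 0.0f;
#pragma omp simd reduction(+:s)
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}

// Explicit safelen(8): iterations executed concurrently must be fewer
// than 8 apart, which caps the usable vector length.
float sum_safelen8(const float *x, int n)
{
  float s = 0.0f;
#pragma omp simd safelen(8) reduction(+:s)
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```

With the patch applied, a loop like sum_default would presumably show safelen(2147483647) in the omplower dump, matching the scan-tree-dump directive in the new tests, while sum_safelen8 keeps its explicit bound.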
Re: Re: [PATCH 3/3 v3] RISC-V: Add md files for vector BFloat16
On 2024-07-12 06:19 Jeff Law wrote:
>
>On 7/11/24 1:10 AM, Feng Wang wrote:
>> V3: Add Bfloat16 vector insn in generic-vector-ooo.md
>> v2: Rebase
>> According to the BFloat16 spec, some vector iterators and new pattern
>> are added in md files.
>>
>> Signed-off-by: Feng Wang
>> gcc/ChangeLog:
>>
>> * config/riscv/generic-vector-ooo.md: Add def_insn_reservation for vector
>> BFloat16.
>> * config/riscv/riscv.md: Add new insn name for vector BFloat16.
>> * config/riscv/vector-iterators.md: Add some iterators for vector BFloat16.
>> * config/riscv/vector.md: Add some attribute for vector BFloat16.
>> * config/riscv/vector-bfloat16.md: New file. Add insn pattern vector
>> BFloat16.
>Note the spaces vs tabs issue pointed out by the lint phase. Those
>should be fixed. I don't think the rest of the lint issues need to be
>fixed.
>jeff

Thanks, I will fix this type of lint error according to the CI log and then commit it.
Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]
On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote: > > Hi, > > According to the instruction spec of AVX512BF16, the convert from float > to BF16 is not a simple truncation. It has special handling for > denormal/nan, even for normal float it will add an extra bias according > to the least significant bit for bf number. This means we cannot use the > vcvtne2ps2bf16 for any bf16 vector shuffle. > The optimization introduced in r15-1368 adds a specific split to convert > HImode permutation with this instruction, so remove it and treat the > BFmode permutation same as HFmode. > > Bootstrapped & regtested on x86_64-pc-linux-gnu. OK for trunk? Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > gcc/ChangeLog: > > PR target/115889 > * config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove. > * config/i386/sse.md (hi_cvt_bf): Remove. > (HI_CVT_BF): Likewise. > (vpermt2_sepcial_bf16_shuffle_):Likewise. > > gcc/testsuite/ChangeLog: > > PR target/115889 > * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option > and output scan. 
> --- > gcc/config/i386/predicates.md | 11 -- > gcc/config/i386/sse.md| 35 --- > .../i386/vpermt2-special-bf16-shufflue.c | 5 ++- > 3 files changed, 2 insertions(+), 49 deletions(-) > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > index a894847adaf..5d0bb1e0f54 100644 > --- a/gcc/config/i386/predicates.md > +++ b/gcc/config/i386/predicates.md > @@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand" > >return true; > }) > - > -;; Check that each element is odd and incrementally increasing from 1 > -(define_predicate "vcvtne2ps2bf_parallel" > - (and (match_code "const_vector") > - (match_code "const_int" "a")) > -{ > - for (int i = 0; i < XVECLEN (op, 0); ++i) > -if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1)) > - return false; > - return true; > -}) > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index b3b4697924b..c134494cd20 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -31460,38 +31460,3 @@ (define_insn "vpdp_" >"TARGET_AVXVNNIINT16" >"vpdp\t{%3, %2, %0|%0, %2, %3}" > [(set_attr "prefix" "vex")]) > - > -(define_mode_attr hi_cvt_bf > - [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")]) > - > -(define_mode_attr HI_CVT_BF > - [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")]) > - > -(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_" > - [(set (match_operand:VI2_AVX512F 0 "register_operand") > - (unspec:VI2_AVX512F > - [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel") > - (match_operand:VI2_AVX512F 2 "register_operand") > - (match_operand:VI2_AVX512F 3 "nonimmediate_operand")] > - UNSPEC_VPERMT2))] > - "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()" > - "#" > - "&& 1" > - [(const_int 0)] > -{ > - rtx op0 = gen_reg_rtx (mode); > - operands[2] = lowpart_subreg (mode, > - force_reg (mode, operands[2]), > - mode); > - operands[3] = lowpart_subreg (mode, > - force_reg (mode, operands[3]), > - mode); > - > - emit_insn (gen_avx512f_cvtne2ps2bf16_(op0, > - 
operands[3], > - operands[2])); > - emit_move_insn (operands[0], lowpart_subreg (mode, op0, > - mode)); > - DONE; > -} > -[(set_attr "mode" "")]) > diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > index 5c65f2a9884..4cbc85735de 100755 > --- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > +++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > @@ -1,7 +1,6 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */ > -/* { dg-final { scan-assembler-not "vpermi2b" } } */ > -/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */ > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl" } */ > +/* { dg-final { scan-assembler-times "vpermi2w" 3 } } */ > > typedef __bf16 v8bf __attribute__((vector_size(16))); > typedef __bf16 v16bf __attribute__((vector_size(32))); > -- > 2.34.1 > -- BR, Hongtao
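To make the "not a simple truncation" point concrete, here is a small sketch comparing plain truncation of a float to its upper 16 bits (what a 16-bit lane shuffle effectively selects) with a round-to-nearest-even conversion that adds a bias depending on the least significant kept bit. It is an illustration under stated assumptions: the function names are made up, only normal inputs are handled, and the special NaN and denormal handling mentioned in the patch description is left out, so this is not a model of the instruction itself.

```cpp
#include <cstdint>
#include <cstring>

// Bit-level views of a float (memcpy avoids aliasing problems).
static std::uint32_t float_bits(float f)
{
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

static float float_from_bits(std::uint32_t u)
{
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// What picking the upper 16-bit half of each 32-bit lane produces.
std::uint16_t bf16_truncate(float f)
{
  return static_cast<std::uint16_t>(float_bits(f) >> 16);
}

// Round-to-nearest-even for normal inputs only (assumption): add
// 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
std::uint16_t bf16_round_nearest_even(float f)
{
  std::uint32_t u = float_bits(f);
  std::uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);
  return static_cast<std::uint16_t>((u + bias) >> 16);
}
```

For an input with bit pattern 0x3F80C000, truncation gives 0x3F80 while the rounding conversion gives 0x3F81, so substituting a converting instruction for a permutation changes the data.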
[PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns
Hi,
  This patch adds TARGET_FLOAT128_HW into pattern conditions for quad-
precision insns. Some qp patterns are guarded by TARGET_P9_VECTOR
originally, so replace it with "TARGET_FLOAT128_HW".

  For test case float128-cmp2-runnable.c, it should be guarded with
ppc_float128_hw as it calls qp insns. The p9vector_hw is covered by
ppc_float128_hw, so it's removed.

  Compared to the previous version, the main change is to split the
redundant FLOAT128_IEEE_P removal into another patch.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns

gcc/
        * config/rs6000/rs6000.md (floatti2, floatunsti2,
        fix_truncti2): Add guard TARGET_FLOAT128_HW.
        * config/rs6000/vsx.md (xsxexpqp__, xsxsigqp__,
        xsiexpqpf_, xsiexpqp__, xscmpexpqp__,
        *xscmpexpqp, xststdcnegqp_): Replace guard TARGET_P9_VECTOR
        with TARGET_FLOAT128_HW.
        (xststdc_, *xststdc_, isinf2): Add guard
        TARGET_FLOAT128_HW for the IEEE128 modes.

gcc/testsuite/
        * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace
        ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.
patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index deffc4b601c..c0f6599c08b 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -6928,7 +6928,7 @@ (define_insn "floatdidf2" (define_insn "floatti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvsqqp %0,%1"; } @@ -6937,7 +6937,7 @@ (define_insn "floatti2" (define_insn "floatunsti2" [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvuqqp %0,%1"; } @@ -6946,7 +6946,7 @@ (define_insn "floatunsti2" (define_insn "fix_truncti2" [(set (match_operand:TI 0 "vsx_register_operand" "=v") (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))] - "TARGET_POWER10" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { return "xscvqpsqz %0,%1"; } diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 1272f8b2080..7dd08895bec 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__" (unspec:V2DI_DI [(match_operand:IEEE128 1 "altivec_register_operand" "v")] UNSPEC_VSX_SXEXPDP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsxexpqp %0,%1" [(set_attr "type" "vecmove")]) @@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__" (unspec:VEC_TI [(match_operand:IEEE128 1 "altivec_register_operand" "v")] UNSPEC_VSX_SXSIG))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsxsigqp %0,%1" [(set_attr "type" "vecmove")]) @@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_" [(match_operand:IEEE128 1 "altivec_register_operand" "v") (match_operand:DI 2 "altivec_register_operand" "v")] UNSPEC_VSX_SIEXPQP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsiexpqp %0,%1,%2" [(set_attr "type" "vecmove")]) @@ -5208,7 +5208,7 @@ 
(define_insn "xsiexpqp__" (match_operand:V2DI_DI 2 "altivec_register_operand" "v")] UNSPEC_VSX_SIEXPQP))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xsiexpqp %0,%1,%2" [(set_attr "type" "vecmove")]) @@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__" (set (match_operand:SI 0 "register_operand" "=r") (CMP_TEST:SI (match_dup 3) (const_int 0)))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" { if ( == UNORDERED && !HONOR_NANS (mode)) { @@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp" (match_operand:IEEE128 2 "altivec_register_operand" "v")] UNSPEC_VSX_SCMPEXPQP) (match_operand:SI 3 "zero_constant" "j")))] - "TARGET_P9_VECTOR" + "TARGET_FLOAT128_HW" "xscmpexpqp %0,%1,%2" [(set_attr "type" "fpcompare")]) @@ -5315,7 +5315,8 @@ (define_expand "xststdc_" (set (match_operand:SI 0 "register_operand" "=r") (eq:SI (match_dup 3) (const_int 0)))] - "TARGET_P9_VECTOR" + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" { operands[3] = gen_reg_rtx (CCFPmode); operands[4] = CONST0_RTX (SImode); @@ -5324,7 +5325,8 @@ (define_expand "xststdc_" (define_expand "isinf2" [(use (match_operand:SI 0 "gpc_reg_operand")) (use (match_operand:IEEE_FP 1 ""))] - "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" + "TARGET_P9_VECTOR + && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)" { int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_
[PATCH, rs6000] Remove redundant guard for float128 mode patterns
Hi, This patch removes FLOAT128_IEEE_P guard when the mode of pattern is IEEE128 and FLOAT128_IBM_P when the mode of pattern is IBM128. The mode iterators already do the checking. So they're redundant. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Remove redundant guard for float128 mode patterns gcc/ * config/rs6000/rs6000.md (movcc, *movcc_p10, *movcc_invert_p10, *fpmask, *xxsel, @ieee_128bit_vsx_abs2, *ieee_128bit_vsx_nabs2, add3, sub3, mul3, div3, sqrt2, copysign3, copysign3_hard, copysign3_soft, @neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw, *nfma4_hw, *nfms4_hw, extend2_hw, truncdf2_hw, truncsf2_hw, fix_2_hw, fix_trunc2, *fix_trunc2_mem, float_di2_hw, float_si2_hw, float2, floatuns_di2_hw, floatuns_si2_hw, floatuns2, floor2, ceil2, btrunc2, round2, add3_odd, sub3_odd, mul3_odd, div3_odd, sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd, *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128): Remove guard FLOAT128_IEEE_P. (@extenddf2_fprs, @extenddf2_vsx, truncdf2_internal1, truncdf2_internal2, fix_trunc_helper, neg2, *cmp_internal1, *cmp_internal2 for IBM128): Remove guard FLOAT128_IBM_P. 
patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index c0f6599c08b..f22b7ed6256 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -5736,7 +5736,7 @@ (define_expand "movcc" (if_then_else:IEEE128 (match_operand 1 "comparison_operator") (match_operand:IEEE128 2 "gpc_reg_operand") (match_operand:IEEE128 3 "gpc_reg_operand")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" { if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3])) DONE; @@ -5753,7 +5753,7 @@ (define_insn_and_split "*movcc_p10" (match_operand:IEEE128 4 "altivec_register_operand" "v,v") (match_operand:IEEE128 5 "altivec_register_operand" "v,v"))) (clobber (match_scratch:V2DI 6 "=0,&v"))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "#" "&& 1" [(set (match_dup 6) @@ -5785,7 +5785,7 @@ (define_insn_and_split "*movcc_invert_p10" (match_operand:IEEE128 4 "altivec_register_operand" "v,v") (match_operand:IEEE128 5 "altivec_register_operand" "v,v"))) (clobber (match_scratch:V2DI 6 "=0,&v"))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "#" "&& 1" [(set (match_dup 6) @@ -5820,7 +5820,7 @@ (define_insn "*fpmask" (match_operand:IEEE128 3 "altivec_register_operand" "v")]) (match_operand:V2DI 4 "all_ones_constant" "") (match_operand:V2DI 5 "zero_constant" "")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "xscmp%V1qp %0,%2,%3" [(set_attr "type" "fpcompare")]) @@ -5831,7 +5831,7 @@ (define_insn "*xxsel" (match_operand:V2DI 2 "zero_constant" "")) (match_operand:IEEE128 3 "altivec_register_operand" "v") (match_operand:IEEE128 4 "altivec_register_operand" "v")))] - "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)" + "TARGET_POWER10 && TARGET_FLOAT128_HW" "xxsel %x0,%x4,%x3,%x1" 
[(set_attr "type" "vecmove")]) @@ -8904,7 +8904,7 @@ (define_insn_and_split "@extenddf2_fprs" (match_operand:DF 1 "nonimmediate_operand" "d,m,d"))) (use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))] "!TARGET_VSX && TARGET_HARD_FLOAT - && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)" + && TARGET_LONG_DOUBLE_128" "#" "&& reload_completed" [(set (match_dup 3) (match_dup 1)) @@ -8921,7 +8921,7 @@ (define_insn_and_split "@extenddf2_vsx" [(set (match_operand:IBM128 0 "gpc_reg_operand" "=d,d") (float_extend:IBM128 (match_operand:DF 1 "nonimmediate_operand" "wa,m")))] - "TARGET_LONG_DOUBLE_128 && TARGET_VSX && FLOAT128_IBM_P (mode)" + "TARGET_LONG_DOUBLE_128 && TARGET_VSX" "#" "&& reload_completed" [(set (match_dup 2) (match_dup 1)) @@ -8967,7 +8967,7 @@ (define_insn_and_split "truncdf2_internal1" [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d") (float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "0,d")))] - "FLOAT128_IBM_P (mode) && !TARGET_XL_COMPAT + "!TARGET_XL_COMPAT && TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128" "@ # @@ -8983,7 +8983,7 @@ (define_insn_and_split "truncdf2_internal1" (define_insn "truncdf2_internal2" [(set (match_operand:DF 0 "gpc_reg_operand" "=d") (float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "d")))] - "FLOAT12
Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? We can still deal with BFmode permutation the same way as HFmode, so the change in ix86_vectorize_vec_perm_const can be preserved. Hongtao Liu wrote on Mon, Jul 15, 2024 at 09:40: > > On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote: > > > > Hi, > > > > According to the instruction spec of AVX512BF16, the convert from float > > to BF16 is not a simple truncation. It has special handling for > > denormal/nan, even for normal float it will add an extra bias according > > to the least significant bit for bf number. This means we cannot use the > > vcvtne2ps2bf16 for any bf16 vector shuffle. > > The optimization introduced in r15-1368 adds a specific split to convert > > HImode permutation with this instruction, so remove it and treat the > > BFmode permutation same as HFmode. > > > > Bootstrapped & regtested on x86_64-pc-linux-gnu. OK for trunk? > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > > > gcc/ChangeLog: > > > > PR target/115889 > > * config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove. > > * config/i386/sse.md (hi_cvt_bf): Remove. > > (HI_CVT_BF): Likewise. > > (vpermt2_sepcial_bf16_shuffle_):Likewise. > > > > gcc/testsuite/ChangeLog: > > > > PR target/115889 > > * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option > > and output scan. 
> > --- > > gcc/config/i386/predicates.md | 11 -- > > gcc/config/i386/sse.md| 35 --- > > .../i386/vpermt2-special-bf16-shufflue.c | 5 ++- > > 3 files changed, 2 insertions(+), 49 deletions(-) > > > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > > index a894847adaf..5d0bb1e0f54 100644 > > --- a/gcc/config/i386/predicates.md > > +++ b/gcc/config/i386/predicates.md > > @@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand" > > > >return true; > > }) > > - > > -;; Check that each element is odd and incrementally increasing from 1 > > -(define_predicate "vcvtne2ps2bf_parallel" > > - (and (match_code "const_vector") > > - (match_code "const_int" "a")) > > -{ > > - for (int i = 0; i < XVECLEN (op, 0); ++i) > > -if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1)) > > - return false; > > - return true; > > -}) > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > index b3b4697924b..c134494cd20 100644 > > --- a/gcc/config/i386/sse.md > > +++ b/gcc/config/i386/sse.md > > @@ -31460,38 +31460,3 @@ (define_insn "vpdp_" > >"TARGET_AVXVNNIINT16" > >"vpdp\t{%3, %2, %0|%0, %2, %3}" > > [(set_attr "prefix" "vex")]) > > - > > -(define_mode_attr hi_cvt_bf > > - [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")]) > > - > > -(define_mode_attr HI_CVT_BF > > - [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")]) > > - > > -(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_" > > - [(set (match_operand:VI2_AVX512F 0 "register_operand") > > - (unspec:VI2_AVX512F > > - [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel") > > - (match_operand:VI2_AVX512F 2 "register_operand") > > - (match_operand:VI2_AVX512F 3 "nonimmediate_operand")] > > - UNSPEC_VPERMT2))] > > - "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()" > > - "#" > > - "&& 1" > > - [(const_int 0)] > > -{ > > - rtx op0 = gen_reg_rtx (mode); > > - operands[2] = lowpart_subreg (mode, > > - force_reg (mode, operands[2]), > > - mode); > > - operands[3] = 
lowpart_subreg (mode, > > - force_reg (mode, operands[3]), > > - mode); > > - > > - emit_insn (gen_avx512f_cvtne2ps2bf16_(op0, > > - operands[3], > > - operands[2])); > > - emit_move_insn (operands[0], lowpart_subreg (mode, op0, > > - mode)); > > - DONE; > > -} > > -[(set_attr "mode" "")]) > > diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > index 5c65f2a9884..4cbc85735de 100755 > > --- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > +++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > @@ -1,7 +1,6 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */ > > -/* { dg-final { scan-assembler-not "vpermi2b" } } */ > > -/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */ > > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl" } */ > > +/* { dg-final { scan-assembler-times "vpermi2w" 3 } } */ > > > > typedef __bf16 v8bf __attribute__((vector_size(16))); > > typedef __bf16 v16bf __attribute__((vector_size(32))); > > -- > > 2.34.1 > > > > > -- > BR, > Hongtao
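To make the rounding point in the quoted patch description concrete, here is a minimal C sketch that contrasts plain truncation of a float to its upper 16 bits with a round-to-nearest-even conversion that adds a bias depending on the least significant kept bit. The helper names are invented for this illustration, and the sketch deliberately ignores the denormal and NaN handling mentioned above, so it should be read as an approximation of the idea rather than the exact semantics of vcvtne2ps2bf16.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its 32-bit pattern.  */
static uint32_t float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical helper: build a float from a 32-bit pattern.  */
static float float_from_bits (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Plain truncation: keep the upper 16 bits unchanged.  This is what a
   pure shuffle of 16-bit elements amounts to.  */
static uint16_t bf16_truncate (float f)
{
  return (uint16_t) (float_bits (f) >> 16);
}

/* Round to nearest even for normal numbers: add 0x7fff plus the least
   significant kept bit, then drop the low 16 bits.  NaN and denormal
   inputs are intentionally not handled in this sketch.  */
static uint16_t bf16_round_nearest_even (float f)
{
  uint32_t u = float_bits (f);
  uint32_t bias = 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) ((u + bias) >> 16);
}
```

For the bit pattern 0x3f80c000, bf16_truncate yields 0x3f80 while bf16_round_nearest_even yields 0x3f81, so a converting instruction can alter data that a permutation is expected to move unchanged.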
Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]
On Mon, Jul 15, 2024 at 10:21 AM Hongyu Wang wrote: > > > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > We can still deal with BFmode permutation the same way as HFmode, so > the change in ix86_vectorize_vec_perm_const can be preserved. > > Hongtao Liu wrote on Mon, Jul 15, 2024 at 09:40: > > > > On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote: > > > > > > Hi, > > > > > > According to the instruction spec of AVX512BF16, the convert from float > > > to BF16 is not a simple truncation. It has special handling for > > > denormal/nan, even for normal float it will add an extra bias according > > > to the least significant bit for bf number. This means we cannot use the > > > vcvtne2ps2bf16 for any bf16 vector shuffle. > > > The optimization introduced in r15-1368 adds a specific split to convert > > > HImode permutation with this instruction, so remove it and treat the > > > BFmode permutation same as HFmode. I see, patch LGTM. > > > > > > Bootstrapped & regtested on x86_64-pc-linux-gnu. OK for trunk? > > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > > > > > gcc/ChangeLog: > > > > > > PR target/115889 > > > * config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove. > > > * config/i386/sse.md (hi_cvt_bf): Remove. > > > (HI_CVT_BF): Likewise. > > > (vpermt2_sepcial_bf16_shuffle_):Likewise. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR target/115889 > > > * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option > > > and output scan. 
> > > --- > > > gcc/config/i386/predicates.md | 11 -- > > > gcc/config/i386/sse.md| 35 --- > > > .../i386/vpermt2-special-bf16-shufflue.c | 5 ++- > > > 3 files changed, 2 insertions(+), 49 deletions(-) > > > > > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > > > index a894847adaf..5d0bb1e0f54 100644 > > > --- a/gcc/config/i386/predicates.md > > > +++ b/gcc/config/i386/predicates.md > > > @@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand" > > > > > >return true; > > > }) > > > - > > > -;; Check that each element is odd and incrementally increasing from 1 > > > -(define_predicate "vcvtne2ps2bf_parallel" > > > - (and (match_code "const_vector") > > > - (match_code "const_int" "a")) > > > -{ > > > - for (int i = 0; i < XVECLEN (op, 0); ++i) > > > -if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1)) > > > - return false; > > > - return true; > > > -}) > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > > index b3b4697924b..c134494cd20 100644 > > > --- a/gcc/config/i386/sse.md > > > +++ b/gcc/config/i386/sse.md > > > @@ -31460,38 +31460,3 @@ (define_insn "vpdp_" > > >"TARGET_AVXVNNIINT16" > > >"vpdp\t{%3, %2, %0|%0, %2, %3}" > > > [(set_attr "prefix" "vex")]) > > > - > > > -(define_mode_attr hi_cvt_bf > > > - [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")]) > > > - > > > -(define_mode_attr HI_CVT_BF > > > - [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")]) > > > - > > > -(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_" > > > - [(set (match_operand:VI2_AVX512F 0 "register_operand") > > > - (unspec:VI2_AVX512F > > > - [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel") > > > - (match_operand:VI2_AVX512F 2 "register_operand") > > > - (match_operand:VI2_AVX512F 3 "nonimmediate_operand")] > > > - UNSPEC_VPERMT2))] > > > - "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()" > > > - "#" > > > - "&& 1" > > > - [(const_int 0)] > > > -{ > > > - rtx op0 = gen_reg_rtx (mode); > > > - operands[2] 
= lowpart_subreg (mode, > > > - force_reg (mode, operands[2]), > > > - mode); > > > - operands[3] = lowpart_subreg (mode, > > > - force_reg (mode, operands[3]), > > > - mode); > > > - > > > - emit_insn (gen_avx512f_cvtne2ps2bf16_(op0, > > > - operands[3], > > > - operands[2])); > > > - emit_move_insn (operands[0], lowpart_subreg (mode, op0, > > > - mode)); > > > - DONE; > > > -} > > > -[(set_attr "mode" "")]) > > > diff --git > > > a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > > b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > > index 5c65f2a9884..4cbc85735de 100755 > > > --- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > > +++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c > > > @@ -1,7 +1,6 @@ > > > /* { dg-do compile } */ > > > -/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */ > > > -/* { dg-final { scan-assembler-not "vpermi2b" } } */ > > > -/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } *
Re: [committed] Fix previously latent bug in reorg affecting cris port
> From: Hans-Peter Nilsson > Date: Fri, 12 Jul 2024 02:11:45 +0200 > > > Date: Wed, 3 Jul 2024 12:46:46 -0600 > > From: Jeff Law > > > The late-combine patch has triggered a previously latent bug in reorg. > > > > Basically we have a sequence like this in the middle of reorg before we > > start relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c) > > [...] > > > Pushing to the trunk momentarily. JFTR, for cris-elf, this can't be blamed on (to have been exposed by) late-combine, because this appeared with r15-1619-g3b9b8d6cfdf593 "ira: Scale save/restore costs of callee save registers with block frequency", even with -fno-late-combine-instructions. I noticed because I chased another regression, an XPASS, happening for gcc.dg/tree-ssa/loop-1.c which was also caused by that revision. Regarding that commit, checking the generated code for loop-1.c, that XPASS was reflecting a *regression*, not an improvement. To wit, it looks like _foo is no longer stored in a register for cris-elf, but there's no change in the number of saved registers. As coremark results are the same before/after that commit for cris-elf, I'm not going to make a fuss; IOW, not open a PR for the regression. (Phew, one less rabbit-hole. I see that patch exposed as many problems as late-combine!) Still, a heads-up to the author of that patch. Maybe the frequencies are miscalculated for that test-case. I tried to look at regs.h:REG_FREQ_FROM_BB, but it's a mystery to me: its value seems to vary between 1 and 1000, that doesn't seem right, but that macro's used all over the place. Not documented very much though. :( FAOD, not blaming the author of r15-1619-g3b9b8d6cfdf593 here. Also FTR, I had to search a bit to find the patch submission and review. It's in the archives of last October: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html as mentioned in another message. brgds, H-P
Re: [PATCH] [APX NF] Add a pass to convert legacy insn to NF insns
On Wed, Jul 10, 2024 at 2:46 PM Hongyu Wang wrote: > > Hi, > > For APX ccmp, the current infrastructure always generates a cstore for > the ccmp flag user, like > > cmpe%rcx, %r8 > ccmpnel %rax, %rbx > seta%dil > add %rcx, %r9 > add %r9, %rdx > testb %dil, %dil > je .L2 > > In such a case, the legacy add clobbers FLAGS_REG, so an extra cstore > is needed to keep the flag from being reset before it is used. If the > instructions between flag producer and user are NF insns, the setcc/ > test sequence is not required. > > Add a pass to convert legacy flag-clobbering insns to their NF counterparts. > The conversion only happens when > 1. APX_NF is enabled. > 2. For a BB, a cstore is found, and there are insns between that cstore > and the next explicit set of FLAGS_REG (test or cmp). > 3. All the insns in between have an NF counterpart. > > The pass is added after rtl-ifcvt, which eliminates some branches when > profitable and can thereby leave flag-clobbering insns between > cstore and jcc. > > Bootstrapped & regtested on x86_64-pc-linux-gnu and SDE. Also passed > spec2017 simulation run on SDE. > > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.md (has_nf): New define_attr, add to all > nf related patterns. > * config/i386/i386-features.cc (apx_nf_convert): New function > to convert Non-NF insns to their NF counterparts. > (class pass_apx_nf_convert): New pass class. > (make_pass_apx_nf_convert): New. > * config/i386/i386-passes.def: Add pass_apx_nf_convert after > rtl_ifcvt. > * config/i386/i386-protos.h (make_pass_apx_nf_convert): Declare. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/apx-nf-2.c: New test. 
> --- > gcc/config/i386/i386-features.cc | 163 +++ > gcc/config/i386/i386-passes.def | 1 + > gcc/config/i386/i386-protos.h| 1 + > gcc/config/i386/i386.md | 67 +- > gcc/testsuite/gcc.target/i386/apx-nf-2.c | 32 + > 5 files changed, 259 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-2.c > > diff --git a/gcc/config/i386/i386-features.cc > b/gcc/config/i386/i386-features.cc > index fc224ed06b0..3da56ddbdcc 100644 > --- a/gcc/config/i386/i386-features.cc > +++ b/gcc/config/i386/i386-features.cc > @@ -3259,6 +3259,169 @@ make_pass_remove_partial_avx_dependency (gcc::context > *ctxt) >return new pass_remove_partial_avx_dependency (ctxt); > } > > +/* Convert legacy instructions that clobbers EFLAGS to APX_NF > + instructions when there are no flag set between a flag > + producer and user. */ > + > +static unsigned int > +ix86_apx_nf_convert (void) > +{ > + timevar_push (TV_MACH_DEP); > + > + basic_block bb; > + rtx_insn *insn; > + hash_map converting_map; > + auto_vec current_convert_list; > + > + bool converting_seq = false; > + rtx cc = gen_rtx_REG (CCmode, FLAGS_REG); > + > + FOR_EACH_BB_FN (bb, cfun) > +{ > + /* Reset conversion for each bb. */ > + converting_seq = false; > + FOR_BB_INSNS (bb, insn) > + { > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + > + if (recog_memoized (insn) < 0) > + continue; > + > + /* Convert candidate insns after cstore, which should > +satisify the two conditions: > +1. Is not flag user or producer, only clobbers > +FLAGS_REG. > +2. Have corresponding nf pattern. */ > + > + rtx pat = PATTERN (insn); > + > + /* Starting convertion at first cstorecc. 
*/ > + rtx set = NULL_RTX; > + if (!converting_seq > + && (set = single_set (insn)) > + && ix86_comparison_operator (SET_SRC (set), VOIDmode) > + && reg_overlap_mentioned_p (cc, SET_SRC (set)) > + && !reg_overlap_mentioned_p (cc, SET_DEST (set))) > + { > + converting_seq = true; > + current_convert_list.truncate (0); > + } > + /* Terminate at the next explicit flag set. */ > + else if (reg_set_p (cc, pat) > + && GET_CODE (set_of (cc, pat)) != CLOBBER) > + converting_seq = false; > + > + if (!converting_seq) > + continue; > + > + if (get_attr_has_nf (insn) > + && GET_CODE (pat) == PARALLEL) > + { > + /* Record the insn to candidate map. */ > + current_convert_list.safe_push (insn); > + converting_map.put (insn, pat); > + } > + /* If the insn clobbers flags but has no nf_attr, > +revoke all previous candidates. */ > + else if (!get_attr_has_nf (insn) > + && reg_set_p (cc, pat) > + && GET_CODE (set_of (cc, pat)) == CLOBBER) > + { > + for (auto item : current_conv
[COMMITTED] CRIS: Adjust gcc.dg/tree-ssa/loop-1.c
Committed. -- >8 -- With r15-1619-g3b9b8d6cfdf593, there's a XPASS and a FAIL for this test-case for cris-elf. Looking at the generated code, _foo is indeed no longer saved in a register for CRIS. While that looks like a regression, coremark results are the same around this revision, so simply adjust the test-case: remove the target-specific exceptions for cris-*-*. * gcc.dg/tree-ssa/loop-1.c: Remove target-specific test and xfail to adjust for recent changes in register allocation. --- gcc/testsuite/gcc.dg/tree-ssa/loop-1.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c index a531b7584a64..a8f2c3bbfdb4 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c @@ -43,16 +43,15 @@ int xxx(void) /* The SH targets always use separate instructions to load the address and to do the actual call - bsr is only generated by link time relaxation. */ -/* CRIS and MSP430 keep the address in a register. */ +/* MSP430 keeps the address in a register. */ /* m68k sometimes puts the address in a register, depending on CPU and PIC. 
*/ -/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* cris-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */ +/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* sh*-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */ /* { dg-final { scan-assembler-times "foo,%r" 5 { target hppa*-*-* } } } */ /* { dg-final { scan-assembler-times "= foo" 5 { target ia64*-*-* } } } */ /* { dg-final { scan-assembler-times "call\[ \t\]*_foo" 5 { target i?86-*-mingw* i?86-*-cygwin* } } } */ /* { dg-final { scan-assembler-times "call\[ \t\]*foo" 5 { target x86_64-*-mingw* } } } */ /* { dg-final { scan-assembler-times "jsr|bsrf|blink\ttr?,r18" 5 { target sh*-*-* } } } */ -/* { dg-final { scan-assembler-times "Jsr \\\$r" 5 { target cris-*-* } } } */ /* { dg-final { scan-assembler-times "\[jb\]sr" 5 { target fido-*-* m68k-*-* pdp11-*-* } } } */ /* { dg-final { scan-assembler-times "bra *tr,r\[1-9\]*,r21" 5 { target visium-*-* } } } */ /* { dg-final { scan-assembler-times "(?n)\[ \t\]call\[ \t\].*\[ \t\]foo," 5 { target nvptx*-*-* } } } */ -- 2.30.2
Re: [i386] adjust flag_omit_frame_pointer in a single function [PR113719] (was: Re: [PATCH] [i386] restore recompute to override opts after change [PR113719])
On Thu, Jul 11, 2024 at 9:07 PM Alexandre Oliva wrote: > > On Jul 4, 2024, Alexandre Oliva wrote: > > > On Jul 3, 2024, Rainer Orth wrote: > > > Hmm, I wonder if leaf frame pointer has to do with that. > > It did, in a way. > > > > The first two patches for PR113719 have each regressed > gcc.dg/ipa/iinline-attr.c on a different target. The reason for this > instability is that there are competing flag_omit_frame_pointer > overriders on x86: > > - ix86_recompute_optlev_based_flags computes and sets a > -f[no-]omit-frame-pointer default depending on > USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size > > - ix86_option_override_internal enables flag_omit_frame_pointer for > -momit-leaf-frame-pointer to take effect > > ix86_option_override[_internal] calls > ix86_recompute_optlev_based_flags before setting > flag_omit_frame_pointer. It is called during global process_options. > > But ix86_recompute_optlev_based_flags is also called by > parse_optimize_options, during attribute processing, and at that > point, ix86_option_override is not called, so the final overrider for > global options is not applied to the optimize attributes. If they > differ, the testcase fails. > > In order to fix this, we need to process all overriders of this option > whenever we process any of them. Since this setting is affected by > optimization options, it makes sense to compute it in > parse_optimize_options, rather than in process_options. > > Regstrapped on x86_64-linux-gnu. Also verified that the regression is > cured with a i686-solaris cross compiler. Ok to install? Ok. thanks. > > > for gcc/ChangeLog > > PR target/113719 > * config/i386/i386-options.cc (ix86_option_override_internal): > Move flag_omit_frame_pointer final overrider... > (ix86_recompute_optlev_based_flags): ... here. 
> --- > gcc/config/i386/i386-options.cc | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc > index 5824c0cb072eb..059ef3ae6ad44 100644 > --- a/gcc/config/i386/i386-options.cc > +++ b/gcc/config/i386/i386-options.cc > @@ -1911,6 +1911,12 @@ ix86_recompute_optlev_based_flags (struct gcc_options > *opts, > opts->x_flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN; > } > } > + > + /* Keep nonleaf frame pointers. */ > + if (opts->x_flag_omit_frame_pointer) > +opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER; > + else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags)) > +opts->x_flag_omit_frame_pointer = 1; > } > > /* Implement part of TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook. */ > @@ -2590,12 +2596,6 @@ ix86_option_override_internal (bool main_args_p, > opts->x_target_flags |= MASK_NO_RED_ZONE; > } > > - /* Keep nonleaf frame pointers. */ > - if (opts->x_flag_omit_frame_pointer) > -opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER; > - else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags)) > -opts->x_flag_omit_frame_pointer = 1; > - >/* If we're doing fast math, we don't care about comparison order > wrt NaNs. This lets us use a shorter comparison sequence. */ >if (opts->x_flag_finite_math_only) > > > -- > Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/ >Free Software Activist GNU Toolchain Engineer > More tolerance and less prejudice are key for inclusion and diversity > Excluding neuro-others for not behaving ""normal"" is *not* inclusive -- BR, Hongtao
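As a rough model of why moving the final overrider helps, the following C sketch mimics the two competing settings with a toy option struct. The struct, the mask value and the function names are hypothetical stand-ins, not GCC's real data structures, and the default computation is simplified to a single parameter. The only point it tries to show is that once the leaf-frame-pointer override lives inside the recompute step, every caller of that step ends up with the same final value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MASK_OMIT_LEAF_FRAME_POINTER bit.  */
#define TOY_MASK_OMIT_LEAF_FP 0x1u

/* Toy model of the relevant option state.  */
struct toy_opts
{
  bool omit_frame_pointer;   /* models opts->x_flag_omit_frame_pointer */
  unsigned target_flags;     /* models opts->x_target_flags */
};

/* Models the "Keep nonleaf frame pointers" block from the patch.  */
static void toy_final_override (struct toy_opts *o)
{
  if (o->omit_frame_pointer)
    o->target_flags &= ~TOY_MASK_OMIT_LEAF_FP;
  else if (o->target_flags & TOY_MASK_OMIT_LEAF_FP)
    o->omit_frame_pointer = true;
}

/* Models the recompute step after the patch: compute the default
   (simplified here to a parameter) and then apply the final override,
   so global option processing and optimize attribute processing agree.  */
static void toy_recompute (struct toy_opts *o, bool default_omit)
{
  o->omit_frame_pointer = default_omit;
  toy_final_override (o);
}
```

In this model, calling toy_recompute with the leaf mask set and a default of false still ends with omit_frame_pointer enabled, which is the behaviour that was previously applied only on the global-options path.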
Re: [COMMITTED] CRIS: Adjust gcc.dg/tree-ssa/loop-1.c
> From: Hans-Peter Nilsson > Date: Mon, 15 Jul 2024 05:06:43 +0200 > With r15-1619-g3b9b8d6cfdf593, there's a XPASS and a FAIL > for this test-case for cris-elf. Looking at the generated > code, _foo is indeed no longer saved in a register for CRIS. > While that looks like a regression, coremark results are the > same around this revision, so simply adjust the test-case: > remove the target-specific exceptions for cris-*-*. Oh my... That "sameness" was due to fumblefingers on my part. Sorry about that. There is indeed a performance regression at "-O2 -march=v10" for cris-elf for coremark. Not a big one; going from 5179918 to 5181696 cycles gets me 0.034%, but still. Maybe there are other targets affected negatively by r15-1619-g3b9b8d6cfdf593, so I opened PR115932 to keep track. brgds, H-P
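For readers who want to check the percentage quoted above, a small C helper along these lines reproduces the arithmetic. The function name is made up for this note; the cycle counts are the ones given in the message.

```c
/* Relative change in percent when going from BEFORE to AFTER.  */
static double relative_change_percent (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Calling relative_change_percent (5179918, 5181696) gives a value of roughly 0.034, matching the figure in the message.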
[PATCH] i386: extend trunc{128}2{16,32,64}'s scope.
Hi, all Based on actual usage, the 128-bit trunc{128}2{16,32,64} expanders only use instructions from SSE/SSSE3, so relax their conditions from TARGET_AVX2 so that the optimization also applies to targets without AVX2. Bootstrapped and regtested on x86_64-linux-gnu, OK for trunk? BRs, Lin gcc/ChangeLog: PR target/107432 * config/i386/sse.md (PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI. (PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI. (trunc2): Change constraint from TARGET_AVX2 to TARGET_SSSE3. (trunc2): Ditto. (truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE. gcc/testsuite/ChangeLog: PR target/107432 * gcc.target/i386/pr107432-10.c: New test. --- gcc/config/i386/sse.md | 11 +++--- gcc/testsuite/gcc.target/i386/pr107432-10.c | 41 + 2 files changed, 47 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-10.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b3b4697924b..72f3c7df297 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -15000,7 +15000,8 @@ (define_expand "_2_mask_store" "TARGET_AVX512VL") (define_mode_iterator PMOV_SRC_MODE_3 [V4DI V2DI V8SI V4SI (V8HI "TARGET_AVX512BW")]) -(define_mode_iterator PMOV_SRC_MODE_3_AVX2 [V4DI V2DI V8SI V4SI V8HI]) +(define_mode_iterator PMOV_SRC_MODE_3_AVX2 + [(V4DI "TARGET_AVX2") V2DI (V8SI "TARGET_AVX2") V4SI V8HI]) (define_mode_attr pmov_dst_3_lower [(V4DI "v4qi") (V2DI "v2qi") (V8SI "v8qi") (V4SI "v4qi") (V8HI "v8qi")]) (define_mode_attr pmov_dst_3 @@ -15014,7 +15015,7 @@ (define_expand "trunc2" [(set (match_operand: 0 "register_operand") (truncate: (match_operand:PMOV_SRC_MODE_3_AVX2 1 "register_operand")))] - "TARGET_AVX2" + "TARGET_SSSE3" { if (TARGET_AVX512VL && (mode != V8HImode || TARGET_AVX512BW)) @@ -15390,7 +15391,7 @@ (define_insn_and_split "avx512vl_v8qi2_mask_store_2" (match_dup 2)))] "operands[0] = adjust_address_nv (operands[0], V8QImode, 0);") -(define_mode_iterator PMOV_SRC_MODE_4 [(V4DI "TARGET_AVX2") V2DI V4SI]) +(define_mode_iterator PMOV_SRC_MODE_4 [(V4DI "TARGET_AVX2") V2DI V4SI]) 
(define_mode_attr pmov_dst_4 [(V4DI "V4HI") (V2DI "V2HI") (V4SI "V4HI")]) (define_mode_attr pmov_dst_4_lower @@ -15404,7 +15405,7 @@ (define_expand "trunc2" [(set (match_operand: 0 "register_operand") (truncate: (match_operand:PMOV_SRC_MODE_4 1 "register_operand")))] - "TARGET_AVX2" + "TARGET_SSSE3" { if (TARGET_AVX512VL) { @@ -15659,7 +15660,7 @@ (define_expand "truncv2div2si2" [(set (match_operand:V2SI 0 "register_operand") (truncate:V2SI (match_operand:V2DI 1 "register_operand")))] - "TARGET_AVX2" + "TARGET_SSE" { if (TARGET_AVX512VL) { diff --git a/gcc/testsuite/gcc.target/i386/pr107432-10.c b/gcc/testsuite/gcc.target/i386/pr107432-10.c new file mode 100644 index 000..57edf7cfc78 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr107432-10.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-march=x86-64-v2 -O2" } */ +/* { dg-final { scan-assembler-times "shufps" 1 } } */ +/* { dg-final { scan-assembler-times "pshufb" 5 } } */ + +#include + +typedef short __v2hi __attribute__ ((__vector_size__ (4))); +typedef char __v2qi __attribute__ ((__vector_size__ (2))); +typedef char __v4qi __attribute__ ((__vector_size__ (4))); +typedef char __v8qi __attribute__ ((__vector_size__ (8))); + +__v2si mm_cvtepi64_epi32_builtin_convertvector(__v2di a) +{ + return __builtin_convertvector((__v2di)a, __v2si); +} + +__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2hi); +} + +__v4hi mm_cvtepi32_epi16_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v4si)a, __v4hi); +} + +__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2qi); +} + +__v4qi mm_cvtepi32_epi8_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v4si)a, __v4qi); +} + +__v8qi mm_cvtepi16_epi8_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v8hi)a, __v8qi); +} -- 2.31.1
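As a target-independent illustration of the operation the new test exercises, the sketch below uses the GNU vector extensions to truncate a two-element 64-bit vector to 32-bit elements with __builtin_convertvector. The typedef and function names are chosen for this example only (the test itself relies on the x86 intrinsic header types), and which instructions the compiler actually emits depends on the target flags, so nothing here says anything about the generated code.

```c
/* Generic GNU C vector types; names are local to this example.  */
typedef long long v2di_t __attribute__ ((vector_size (16)));
typedef int v2si_t __attribute__ ((vector_size (8)));

/* Convert each 64-bit element to a 32-bit element, as a C cast from
   long long to int would do per element.  */
static v2si_t truncate_v2di_to_v2si (v2di_t a)
{
  return __builtin_convertvector (a, v2si_t);
}
```

With GCC, an input of { 0x100000005LL, -1LL } produces { 5, -1 }, since each element keeps only its low 32 bits.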
[committed] RISC-V: Implement locality for __builtin_prefetch
This patch adds the Zihintntl instructions to the prefetch pattern.
Zicbop provides the prefetch instructions and Zihintntl provides the NTL
(non-temporal locality hint) instructions. If the target has the
Zihintntl extension, an NTL instruction is inserted before the prefetch
instruction.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
	to print zihintntl instructions string.
	* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/prefetch-zicbop.c: New test.
	* gcc.target/riscv/prefetch-zihintntl.c: New test.
---
 gcc/config/riscv/riscv.cc                     | 22 +++
 gcc/config/riscv/riscv.md                     | 10 ++---
 .../gcc.target/riscv/prefetch-zicbop.c        | 20 +
 .../gcc.target/riscv/prefetch-zihintntl.c     | 20 +
 4 files changed, 69 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 53ab2f1a881..084a592a313 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6488,6 +6488,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
    'A'  Print the atomic operation suffix for memory model OP.
    'I'  Print the LR suffix for memory model OP.
    'J'  Print the SC suffix for memory model OP.
+   'L'  Print a non-temporal locality hints instruction.
    'z'  Print x0 if OP is zero, otherwise print OP normally.
    'i'  Print i if the operand is not a register.
    'S'  Print shift-index of single-bit mask OP.
@@ -6682,6 +6683,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
         break;
       }
 
+    case 'L':
+      {
+        const char *ntl_hint = NULL;
+        switch (INTVAL (op))
+          {
+          case 0:
+            ntl_hint = "ntl.all";
+            break;
+          case 1:
+            ntl_hint = "ntl.pall";
+            break;
+          case 2:
+            ntl_hint = "ntl.p1";
+            break;
+          }
+
+      if (ntl_hint)
+        asm_fprintf (file, "%s\n\t", ntl_hint);
+      break;
+      }
+
     case 'i':
       if (code != REG)
         fputs ("i", file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 379015c60de..46c46039c33 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4113,12 +4113,16 @@
 {
   switch (INTVAL (operands[1]))
   {
-    case 0: return "prefetch.r\t%a0";
-    case 1: return "prefetch.w\t%a0";
+    case 0: return TARGET_ZIHINTNTL ? "%L2prefetch.r\t%a0" : "prefetch.r\t%a0";
+    case 1: return TARGET_ZIHINTNTL ? "%L2prefetch.w\t%a0" : "prefetch.w\t%a0";
     default: gcc_unreachable ();
   }
 }
-  [(set_attr "type" "store")])
+  [(set_attr "type" "store")
+   (set (attr "length") (if_then_else (and (match_test "TARGET_ZIHINTNTL")
+                                           (match_test "IN_RANGE (INTVAL (operands[2]), 0, 2)"))
+                                      (const_string "8")
+                                      (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
   [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
new file mode 100644
index 000..0faa120f1f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-not "ntl.all\t" } } */
+/* { dg-final { scan-assembler-not "ntl.pall\t" } } */
+/* { dg-final { scan-assembler-not "ntl.p1\t" } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
new file mode 100644
index 000..78a3afe6833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop_zihintntl -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-times "ntl.all" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.pall" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.p1" 2 } } *
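As a reading aid, the locality-to-hint mapping and the length attribute from the patch can be restated as plain C. This is only a sketch: the two function names are hypothetical and do not exist in GCC, and `has_zihintntl` stands in for TARGET_ZIHINTNTL.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the switch added for the 'L' operand
   letter: localities 0..2 select an NTL hint, anything else (in
   practice locality 3) selects none.  */
const char *
ntl_hint_for_locality (int locality)
{
  switch (locality)
    {
    case 0: return "ntl.all";
    case 1: return "ntl.pall";
    case 2: return "ntl.p1";
    default: return NULL;
    }
}

/* Hypothetical helper mirroring the "length" attribute: the hint and
   the prefetch take 8 bytes together, a lone prefetch takes 4.  */
int
prefetch_insn_length (int has_zihintntl, int locality)
{
  return (has_zihintntl && locality >= 0 && locality <= 2) ? 8 : 4;
}
```

With the eight calls in the tests (localities 0 to 3 for both read and write), each of the three hints would be expected twice, which matches the scan-assembler-times counts in prefetch-zihintntl.c.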
Re: [PATCH] i386: extend trunc{128}2{16,32,64}'s scope.
On Mon, Jul 15, 2024 at 1:39 PM Hu, Lin1 wrote:
>
> Hi, all
>
> Based on actual usage, trunc{128}2{16,32,64} use some instructions from
> sse/sse3, so extend their scope to extend the scope of optimization.
>
> Bootstraped and regtest on x86-64-linux-gnu, OK for trunk?

Ok.

> [...]

--
BR,
Hongtao
Re: [RFC] Proposal to support Packed Boolean Vector masks.
On 7/12/24 6:40 PM, Richard Biener wrote:
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote:
On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding. I can hardly
see where those are faster than using a vector of up to 4 char elements.
Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit
elements with 2 or one element vectors and 4-bit elements with 1 element
vectors.

I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
_BitInt(2) to say size of long long could be acceptable.

I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way
to say the element should have n (< 8) bits.

I have no idea what the stance on supporting _BitInt in C++ is, but most
certainly diverging support (or even semantics) of the vector extension
in C vs. C++ is undesirable.

I believe Clang supports it in C++ next to C, GCC doesn't and Jason
didn't look favorably to _BitInt support in C++, so at least until
something like that is standardized in C++ the answer is probably no.

OK, I think that rules out _BitInt use here, so while bool is then
natural for 1-bit elements, for 2-bit and 4-bit elements we'd have to
specify the number of bits explicitly. There is signed_bool_precision
but like vector_mask its use is restricted to the GIMPLE frontend
because interaction with the rest of the language isn't defined.

Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, _BitInt seemed like it was best placed for this, but not having
C++ support is definitely a blocker. But as you say, in the absence of
_BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.

Also, I think overloading GIMPLE's vector_mask takes us into the
earlier-discussed territory of what it should actually mean - it meaning
the target truth type in GIMPLE and a generic vector extension in the FE
will probably confuse gcc developers more than users.

That said - we're mixing two things here. The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the GCC
vector extension.

If we leave lane-disambiguation of svbool to the backend, the values I
see in supporting 1, 2 and 4 bitsizes are

1) first step towards supporting _BitInt(N) vectors possibly in the
   future
2) having a way for targets to define their intrinsics' bool vector
   types using GNU extensions
3) feature parity with Clang's ext_vector_type?

I believe the primary motivation for Clang to support ext_vector_type
was to have a way to define target intrinsics' vector bool type using
vector extensions.

Thanks,
Tejas.

Richard.

Jakub
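The list of problematic shapes quoted above (1-bit elements with 4, 2 or 1 lanes, 2-bit with 2 or 1, 4-bit with 1) can be checked with a little arithmetic. The sketch below assumes power-of-two lane counts and treats a shape as needing padding when its total bit count is not a whole number of bytes; both function names are invented for illustration.

```c
/* Nonzero when a packed vector of LANES elements, ELEM_BITS bits each,
   does not fill a whole number of bytes and so would need padding.
   Illustrative only; not a GCC interface.  */
int
needs_padding (unsigned elem_bits, unsigned lanes)
{
  return (elem_bits * lanes) % 8 != 0;
}

/* Count the padded shapes for element widths 1, 2 and 4 bits and
   power-of-two lane counts from 1 up to MAX_LANES.  */
unsigned
count_padded_shapes (unsigned max_lanes)
{
  unsigned count = 0;
  for (unsigned bits = 1; bits <= 4; bits *= 2)
    for (unsigned lanes = 1; lanes <= max_lanes; lanes *= 2)
      if (needs_padding (bits, lanes))
        count++;
  return count;
}
```

Under these assumptions exactly six shapes need padding, the same six named in the discussion; every larger power-of-two lane count fills whole bytes.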