Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.
On Thu, Jul 14, 2022 at 7:32 AM Roger Sayle wrote: > > > On Mon, Jul 11, 2022, H.J. Lu wrote: > > On Sun, Jul 10, 2022 at 2:38 PM Roger Sayle > > wrote: > > > Hi HJ, > > > > > > I believe this should now be handled by the post-reload (CSE) pass. > > > Consider the simple test case: > > > > > > __int128 a, b, c; > > > void foo() > > > { > > > a = 0; > > > b = 0; > > > c = 0; > > > } > > > > > > Without any STV, i.e. -O2 -msse4 -mno-stv, GCC get TI mode writes: > > > movq$0, a(%rip) > > > movq$0, a+8(%rip) > > > movq$0, b(%rip) > > > movq$0, b+8(%rip) > > > movq$0, c(%rip) > > > movq$0, c+8(%rip) > > > ret > > > > > > But with STV, i.e. -O2 -msse4, things get converted to V1TI mode: > > > pxor%xmm0, %xmm0 > > > movaps %xmm0, a(%rip) > > > movaps %xmm0, b(%rip) > > > movaps %xmm0, c(%rip) > > > ret > > > > > > You're quite right internally the STV actually generates the equivalent > > > of: > > > pxor%xmm0, %xmm0 > > > movaps %xmm0, a(%rip) > > > pxor%xmm0, %xmm0 > > > movaps %xmm0, b(%rip) > > > pxor%xmm0, %xmm0 > > > movaps %xmm0, c(%rip) > > > ret > > > > > > And currently because STV run before cse2 and combine, the const0_rtx > > > gets CSE'd be the cse2 pass to produce the code we see. However, if > > > you specify -fno-rerun-cse-after-loop (to disable the cse2 pass), > > > you'll see we continue to generate the same optimized code, as the > > > same const0_rtx gets CSE'd in postreload. > > > > > > I can't be certain until I try the experiment, but I believe that the > > > postreload CSE will clean-up, all of the same common subexpressions. > > > Hence, it should be safe to perform all STV at the same point (after > > > combine), which for a few additional optimizations. > > > > > > Does this make sense? Do you have a test case, > > > -fno-rerun-cse-after-loop produces different/inferior code for TImode STV > > chains? > > > > > > My guess is that the RTL passes have changed so much in the last six > > > or seven years, that some of the original motivation no longer applies. > > > Certainly we now try to keep TI mode operations visible longer, and > > > then allow STV to behave like a pre-reload pass to decide which set of > > > registers to use (vector V1TI or scalar doubleword DI). Any CSE > > > opportunities that cse2 finds with V1TI mode, could/should equally > > > well be found for TI mode (mostly). > > > > You are probably right. If there are no regressions in GCC testsuite, my > > original > > motivation is no longer valid. > > It was good to try the experiment, but H.J. is right, there is still some > benefit > (as well as some disadvantages) to running STV lowering before CSE2/combine. > A clean-up patch to perform all STV conversion as a single pass (removing a > pass from the compiler) results in just a single regression in the test suite: > FAIL: gcc.target/i386/pr70155-17.c scan-assembler-times movv1ti_internal 8 > which looks like: > > __int128 a, b, c, d, e, f; > void foo (void) > { > a = 0; > b = -1; > c = 0; > d = -1; > e = 0; > f = -1; > } > > By performing STV after combine (without CSE), reload prefers to implement > this function using a single register, that then requires 12 instructions > rather > than 8 (if using two registers). Alas there's nothing that postreload > CSE/GCSE > can do. Doh! Hmm, the RA could be taught to make use of more of the register file I suppose (shouldn't regrename do this job - but it runs after postreload-cse) > pxor%xmm0, %xmm0 > movaps %xmm0, a(%rip) > pcmpeqd %xmm0, %xmm0 > movaps %xmm0, b(%rip) > pxor%xmm0, %xmm0 > movaps %xmm0, c(%rip) > pcmpeqd %xmm0, %xmm0 > movaps %xmm0, d(%rip) > pxor%xmm0, %xmm0 > movaps %xmm0, e(%rip) > pcmpeqd %xmm0, %xmm0 > movaps %xmm0, f(%rip) > ret > > I also note that even without STV, the scalar implementation of this function > when > compiled with -Os is also larger than it needs to be due to poor CSE (notice > in the > following we only need a single zero register, and an all_ones reg would be > helpful). > > xorl%eax, %eax > xorl%edx, %edx > xorl%ecx, %ecx > movq$-1, b(%rip) > movq%rax, a(%rip) > movq%rax, a+8(%rip) > movq$-1, b+8(%rip) > movq%rdx, c(%rip) > movq%rdx, c+8(%rip) > movq$-1, d(%rip) > movq$-1, d+8(%rip) > movq%rcx, e(%rip) > movq%rcx, e+8(%rip) > movq$-1, f(%rip) > movq$-1, f+8(%rip) > ret > > I need to give the problem some more thought. It would be good to > clean-up/unify > the STV passes, but I/we need to solve/CSE HJ's last test case before we do. > Perhaps
Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative
On Thu, Jul 14, 2022 at 7:33 AM liuhongt wrote: > > And split it to GPR-version instruction after reload. > > > ?r was introduced under the assumption that we want vector values > > mostly in vector registers. Currently there are no instructions with > > memory or immediate operand, so that made sense at the time. Let's > > keep ?r until logic instructions with mem/imm operands are introduced. > > So, for the patch that adds 64-bit vector logic in GPR, I would advise > > to first introduce only register operands. mem/imm operands should be > Update patch to add ?r to 64-bit bit_op patterns. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > No big imact on SPEC2017(Most same binary). The problem with your approach is with the combine pass, where combine first tries to recognize the combined instruction without clobber, before re-recognizing instruction with added clobber. So, if a forward propagation happens, the combine will *always* choose the insn variant without GPR. So, the solution with VI_16_32 is to always expand with a clobbered version that is split to either SImode or V16QImode. With 64-bit instructions, we have two additional complications. First, we have a native MMX instruction, and we have to split to it after reload, and second, we have a builtin that expects vector insn. To solve the first issue, we should change the mode of "*mmx" to V1DImode and split your new _gpr version with clobber to it for !GENERAL_REG_P operands. The second issue could be solved by emitting V1DImode instructions directly from the expander. Please note there are several expanders that expect non-clobbered logic insn in certain mode to be available, so the situation can become quite annoying... Uros.
Re: [PATCH] xtensa: Minor fix for FP constant synthesis
On Wed, Jul 13, 2022 at 4:41 AM Takayuki 'January June' Suwa wrote: > > This patch fixes an non-fatal issue about negative constant values derived > from FP constant synthesis on hosts whose 'long' is wider than 'int32_t'. > > And also replaces the dedicated code in FP constant synthesis split > pattern with the appropriate existing function call. > > gcc/ChangeLog: > > * config/xtensa/xtensa.md: > In FP constant synthesis split pattern, subcontract to > avoid_constant_pool_reference() as in the case of integer, > because it can handle well too. And cast to int32_t before > calling xtensa_constantsynth() in order to ignore upper 32-bit. > > gcc/testsuite/ChangeLog: > > * gcc.target/xtensa/constsynth_double.c: > Modify in order to catch the issue. > --- > gcc/config/xtensa/xtensa.md | 35 +-- > .../gcc.target/xtensa/constsynth_double.c | 2 +- > 2 files changed, 9 insertions(+), 28 deletions(-) Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master. -- Thanks. -- Max
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Wed, 13 Jul 2022 at 12:22, Richard Biener wrote: > > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > wrote: > > > > Hi Richard, > > For the following test: > > > > svint32_t f2(int a, int b, int c, int d) > > { > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > } > > > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > foo.c: In function ‘f2’: > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > 4 | svint32_t f2(int a, int b, int c, int d) > > | ^~ > > svint32_t > > __Int32x4_t > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > during GIMPLE pass: forwprop > > dump file: foo.c.109t.forwprop2 > > foo.c:4:11: internal compiler error: verify_gimple failed > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > ../../gcc/gcc/tree-cfg.cc:5568 > > 0xe9371f execute_function_todo > > ../../gcc/gcc/passes.cc:2091 > > 0xe93ccb execute_todo > > ../../gcc/gcc/passes.cc:2145 > > > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > int32x4_t v; > > __Int32x4_t _1; > > svint32_t _9; > > vector(4) int _11; > > > >: > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > v_12 = _1; > > _11 = v_12; > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > return _9; > > > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > view_convert_expr, > > and the end result becomes: > > svint32_t _7; > > __Int32x4_t _8; > > > > ;; basic block 2, loop depth 0 > > ;;pred: ENTRY > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > return _7; > > ;;succ: EXIT > > > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > has incompatible types (svint32_t, int32x4_t). > > > > The attached patch disables simplification of VEC_PERM_EXPR > > in simplify_permutation, if lhs and rhs have non compatible types, > > which resolves ICE, but am not sure if it's the correct approach ? > > It for sure papers over the issue. I think the error happens earlier, > the V_C_E should have been built with the type of the VEC_PERM_EXPR > which is the type of the LHS. But then you probably run into the > different sizes ICE (VLA vs constant size). I think for this case you > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > selecting the "low" part of the VLA vector. Hi Richard, Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to represent dup operation from fixed width to VLA vector. I am not sure how folding it to BIT_FIELD_REF will work. Could you please elaborate ? Also, the issue doesn't seem restricted to this case. The following test case also ICE's during forwprop: svint32_t foo() { int32x4_t v = (int32x4_t) {1, 2, 3, 4}; svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); return v2; } foo2.c: In function ‘foo’: foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ 9 | } | ^ svint32_t int32x4_t v2_4 = { 1, 2, 3, 4 }; because simplify_permutation folds VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > into: vector_cst {1, 2, 3, 4} and it complains during verify_gimple_assign_single because we don't support assignment of vector_cst to VLA vector. I guess the issue really is that currently, only VEC_PERM_EXPR supports lhs and rhs to have vector types with differing lengths, and simplifying it to other tree codes, like above, will result in type errors ? Thanks, Prathamesh > > > > > Alternatively, should we allow assignments from fixed-width to SVE > > vector, so the above > > VIEW_CONVERT_EXPR would result in dup ? > > > > Thanks, > > Prathamesh
[PATCH] Fix ICE on view conversion between struct and integer
Hi, you can build a view conversion between pretty much anything in Ada including between types with different sizes, although the compiler warns in this case and gigi pads the smaller type to end up with the same size. The attached testcase triggers an ICE at -O or above for one of them: FAIL: gnat.dg/opt98.adb (test for excess errors) Excess errors: during GIMPLE pass: esra +===GNAT BUG DETECTED==+ | 13.0.0 20220713 (experimental) [master 6f5cf9470aa] (x86_64-suse-linux) GCC error:| | in gimplify_modify_expr, at gimplify.cc:6254 | | Error detected around /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb:10:7| | Compiling /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb | if (gimplify_ctxp->into_ssa && is_gimple_reg (*to_p)) { /* We should have got an SSA name from the start. */ gcc_assert (TREE_CODE (*to_p) == SSA_NAME || ! gimple_in_ssa_p (cfun)); } This happens from prepare_gimple_addressable for the variable to be marked with DECL_NOT_GIMPLE_REG_P when its initialization is gimplified, so it's apparently just a matter of setting the flag earlier. Bootstrapped/regtested on x86-64/Linux, OK for the mainline? 2022-07-14 Eric Botcazou * gimplify.cc (lookup_tmp_var): Add NOT_GIMPLE_REG boolean parameter and set DECL_NOT_GIMPLE_REG_P on the variable according to it. (internal_get_tmp_var): Add NOT_GIMPLE_REG boolean parameter and pass it in the call to lookup_tmp_var. (get_formal_tmp_var): Pass false in the call to lookup_tmp_var. (get_initialized_tmp_var): Likewise. (prepare_gimple_addressable): Call internal_get_tmp_var instead of get_initialized_tmp_var with NOT_GIMPLE_REG set to true. 2022-07-14 Eric Botcazou * gnat.dg/opt98.ads, gnat.dg/opt98.adb: New test. -- Eric Botcazoudiff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 04990ad91a6..2ac7ca0855e 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -573,20 +573,26 @@ create_tmp_from_val (tree val) } /* Create a temporary to hold the value of VAL. If IS_FORMAL, try to reuse - an existing expression temporary. */ + an existing expression temporary. If NOT_GIMPLE_REG, mark it as such. */ static tree -lookup_tmp_var (tree val, bool is_formal) +lookup_tmp_var (tree val, bool is_formal, bool not_gimple_reg) { tree ret; + /* We cannot mark a formal temporary with DECL_NOT_GIMPLE_REG_P. */ + gcc_assert (!is_formal || !not_gimple_reg); + /* If not optimizing, never really reuse a temporary. local-alloc won't allocate any variable that is used in more than one basic block, which means it will go into memory, causing much extra work in reload and final and poorer code generation, outweighing the extra memory allocation here. */ if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val)) -ret = create_tmp_from_val (val); +{ + ret = create_tmp_from_val (val); + DECL_NOT_GIMPLE_REG_P (ret) = not_gimple_reg; +} else { elt_t elt, *elt_p; @@ -617,7 +623,7 @@ lookup_tmp_var (tree val, bool is_formal) static tree internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p, - bool is_formal, bool allow_ssa) + bool is_formal, bool allow_ssa, bool not_gimple_reg) { tree t, mod; @@ -639,7 +645,7 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p, } } else -t = lookup_tmp_var (val, is_formal); +t = lookup_tmp_var (val, is_formal, not_gimple_reg); mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val)); @@ -667,7 +673,7 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p, tree get_formal_tmp_var (tree val, gimple_seq *pre_p) { - return internal_get_tmp_var (val, pre_p, NULL, true, true); + return internal_get_tmp_var (val, pre_p, NULL, true, true, false); } /* Return a temporary variable initialized with VAL. PRE_P and POST_P @@ -678,7 +684,7 @@ get_initialized_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p /* = NULL */, bool allow_ssa /* = true */) { - return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa); + return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa, false); } /* Declare all the variables in VARS in SCOPE. If DEBUG_INFO is true, @@ -4574,13 +4580,10 @@ prepare_gimple_addressable (tree *expr_p, gimple_seq *seq_p) { while (handled_component_p (*expr_p)) expr_p = &TREE_OPERAND (*expr_p, 0); + + /* Do not allow an SSA name as the temporary. */ if (is_gimple_reg (*expr_p)) -{ - /* Do not allow an SSA name as the temporary. */ - tree var = get_initialized_tmp_var (*expr_p, seq_p, NULL, false); - DECL_NOT_GIMPLE_REG_P (var) = 1; - *expr_p = var; -} +*expr_p = internal_get_tmp_var (*expr_p, seq_p, NULL, false, false, true); }
Re: [PATCH] [RFC]Support vectorization for Complex type.
On Wed, Jul 13, 2022 at 9:34 AM Richard Biener wrote: > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu wrote: > > > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener > > wrote: > > > > > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu wrote: > > > > > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches > > > > wrote: > > > > > > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt > > > > > wrote: > > > > > > > > > > > > The patch only handles load/store(including ctor/permutation, except > > > > > > gather/scatter) for complex type, other operations don't needs to be > > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD is > > > > > > not > > > > > > supported for complex type, so no need to handle either). > > > > > > > > > > (*) > > > > > > > > > > > Instead of support vector(2) _Complex double, this patch takes > > > > > > vector(4) > > > > > > double as vector type of _Complex double. Since vectorizer > > > > > > originally > > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for complex > > > > > > type, the patch handles nunits/ncopies/vf specially for complex > > > > > > type. > > > > > > > > > > For the limited set above(*) can you explain what's "special" about > > > > > vector(2) _Complex > > > > > vs. vector(4) double, thus why we need to have STMT_VINFO_COMPLEX_P > > > > > at all? > > > > Supporting a vector(2) complex is a straightforward idea, just like > > > > supporting other scalar type in vectorizer, but it requires more > > > > efforts(in the backend and frontend), considering that most of > > > > operations of complex type will be lowered into realpart and imagpart > > > > operations, supporting a vector(2) complex does not look that > > > > necessary. Then it comes up with supporting vector(4) double(with > > > > adjustment of vf/ctor/permutation), the vectorizer only needs to > > > > handle the vectorization of the move operation of the complex type(no > > > > need to worry about wrongly mapping vector(4) double multiplication to > > > > complex type multiplication since it's already lowered before > > > > vectorizer). > > > > stmt_info does not record the scalar type, in order to avoid duplicate > > > > operation like getting a lhs type from stmt to determine whether it is > > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is mainly > > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_ > > > > stmt. > > > > > > > > > > I wonder to what extent your handling can be extended to support > > > > > re-vectorizing > > > > > (with a higher VF for example) already vectorized code? The > > > > > vectorizer giving > > > > > up on vector(2) double looks quite obviously similar to it giving up > > > > > on _Complex double ... > > > > Yes, it can be extended to vector(2) double/float/int/ with a bit > > > > adjustment(exacting element by using bit_field instead of > > > > imagpart_expr/realpart_expr). > > > > > It would be a shame to not use the same underlying mechanism for > > > > > dealing with > > > > > both, where for the vector case obviously vector(4) would be > > > > > supported as well. > > > > > > > > > > In principle _Complex double operations should be two SLP lanes but > > > > > it seems you > > > > > are handling them with classical interleaving as well? > > > > I'm only handling move operations, for other operations it will be > > > > lowered to realpart and imagpart and thus two SLP lanes. > > > > > > Yes, I understood that. > > > > > > Doing it more general (and IMHO better) would involve enhancing > > > how we represent dataref groups, maintaining the number of scalars > > > covered by each of the vinfos. On the SLP representation side it > > > probably requires to rely on the representative for access and not > > > on the scalar stmts (since those do not map properly to the lanes). > > > > > > Ideally we'd be able to handle > > > > > > struct { _Complex double c; double a; double b; } a[], b[]; > > > > > > void foo () > > > { > > >for (int i = 0; i < 100; ++i) > > > { > > > a[i].c = b[i].c; > > > a[i].a = b[i].a; > > > a[i].b = b[i].b; > > > } > > > } > > > > > > which I guess your patch doesn't handle with plain AVX vector > > > copies but instead uses interleaving for the _Complex and non-_Complex > > > parts? > > Indeed, it produces wrong code. > > For _Complex, in case we don't get to the "true and only" solution it > might be easier to split the loads and stores when it's just memory > copies and we have vectorization enabled and a supported vector > mode that would surely re-assemble them (store-merging doesn't seem > to do that). > > Btw, we seem to produce > > movsd b(%rip), %xmm0 > movsd %xmm0, a(%rip) > movsd b+8(%rip), %xmm0 > movsd %xmm0, a+8(%rip) > > for a _Complex double memory copy on x86 which means we lack > true DCmode support (pseudos get decomposed). Not sure if we > can somehow
[PATCH][pushed] docs: fix position of @end deftypefn
gcc/ChangeLog: * doc/gimple.texi: Close properly a deftypefn. --- gcc/doc/gimple.texi | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index dd9149377f3..9150218fed8 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1965,11 +1965,10 @@ statements to be executed by just the master. @deftypefn {GIMPLE function} gimple gimple_build_omp_ordered (gimple_seq body) Build a @code{GIMPLE_OMP_ORDERED} statement. -@end deftypefn @code{BODY} is the sequence of statements inside a loop that will executed in sequence. - +@end deftypefn @node @code{GIMPLE_OMP_PARALLEL} @subsection @code{GIMPLE_OMP_PARALLEL} -- 2.37.0
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni wrote: > > On Wed, 13 Jul 2022 at 12:22, Richard Biener > wrote: > > > > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches > > wrote: > > > > > > Hi Richard, > > > For the following test: > > > > > > svint32_t f2(int a, int b, int c, int d) > > > { > > > int32x4_t v = (int32x4_t) {a, b, c, d}; > > > return svld1rq_s32 (svptrue_b8 (), &v[0]); > > > } > > > > > > The compiler emits following ICE with -O3 -mcpu=generic+sve: > > > foo.c: In function ‘f2’: > > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ > > > 4 | svint32_t f2(int a, int b, int c, int d) > > > | ^~ > > > svint32_t > > > __Int32x4_t > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > during GIMPLE pass: forwprop > > > dump file: foo.c.109t.forwprop2 > > > foo.c:4:11: internal compiler error: verify_gimple failed > > > 0xfda04a verify_gimple_in_cfg(function*, bool) > > > ../../gcc/gcc/tree-cfg.cc:5568 > > > 0xe9371f execute_function_todo > > > ../../gcc/gcc/passes.cc:2091 > > > 0xe93ccb execute_todo > > > ../../gcc/gcc/passes.cc:2145 > > > > > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have: > > > int32x4_t v; > > > __Int32x4_t _1; > > > svint32_t _9; > > > vector(4) int _11; > > > > > >: > > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; > > > v_12 = _1; > > > _11 = v_12; > > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; > > > return _9; > > > > > > During forwprop, simplify_permutation simplifies vec_perm_expr to > > > view_convert_expr, > > > and the end result becomes: > > > svint32_t _7; > > > __Int32x4_t _8; > > > > > > ;; basic block 2, loop depth 0 > > > ;;pred: ENTRY > > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; > > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); > > > return _7; > > > ;;succ: EXIT > > > > > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR > > > has incompatible types (svint32_t, int32x4_t). > > > > > > The attached patch disables simplification of VEC_PERM_EXPR > > > in simplify_permutation, if lhs and rhs have non compatible types, > > > which resolves ICE, but am not sure if it's the correct approach ? > > > > It for sure papers over the issue. I think the error happens earlier, > > the V_C_E should have been built with the type of the VEC_PERM_EXPR > > which is the type of the LHS. But then you probably run into the > > different sizes ICE (VLA vs constant size). I think for this case you > > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, > > selecting the "low" part of the VLA vector. > Hi Richard, > Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to > represent dup operation > from fixed width to VLA vector. I am not sure how folding it to > BIT_FIELD_REF will work. > Could you please elaborate ? > > Also, the issue doesn't seem restricted to this case. > The following test case also ICE's during forwprop: > svint32_t foo() > { > int32x4_t v = (int32x4_t) {1, 2, 3, 4}; > svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); > return v2; > } > > foo2.c: In function ‘foo’: > foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ > 9 | } > | ^ > svint32_t > int32x4_t > v2_4 = { 1, 2, 3, 4 }; > > because simplify_permutation folds > VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > > into: > vector_cst {1, 2, 3, 4} > > and it complains during verify_gimple_assign_single because we don't > support assignment of vector_cst to VLA vector. > > I guess the issue really is that currently, only VEC_PERM_EXPR > supports lhs and rhs > to have vector types with differing lengths, and simplifying it to > other tree codes, like above, > will result in type errors ? That might be the case - Richard should know. If so your type check is still too late, you should instead recognize that we are permuting a VLA vector and then refuse to go any of the non-VEC_PERM generating paths - that probably means just allowing the code == VEC_PERM_EXPR case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? Richard. > > Thanks, > Prathamesh > > > > > > > > Alternatively, should we allow assignments from fixed-width to SVE > > > vector, so the above > > > VIEW_CONVERT_EXPR would result in dup ? > > > > > > Thanks, > > > Prathamesh
Re: [PATCH] Fix ICE on view conversion between struct and integer
On Thu, Jul 14, 2022 at 10:20 AM Eric Botcazou via Gcc-patches wrote: > > Hi, > > you can build a view conversion between pretty much anything in Ada including > between types with different sizes, although the compiler warns in this case > and gigi pads the smaller type to end up with the same size. > > The attached testcase triggers an ICE at -O or above for one of them: > > FAIL: gnat.dg/opt98.adb (test for excess errors) > Excess errors: > during GIMPLE pass: esra > +===GNAT BUG DETECTED==+ > | 13.0.0 20220713 (experimental) [master 6f5cf9470aa] (x86_64-suse-linux) GCC > error:| > | in gimplify_modify_expr, at gimplify.cc:6254 | > | Error detected around > /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb:10:7| > | Compiling /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb | > > if (gimplify_ctxp->into_ssa && is_gimple_reg (*to_p)) > { > /* We should have got an SSA name from the start. */ > gcc_assert (TREE_CODE (*to_p) == SSA_NAME > || ! gimple_in_ssa_p (cfun)); > } > > This happens from prepare_gimple_addressable for the variable to be marked > with > DECL_NOT_GIMPLE_REG_P when its initialization is gimplified, so it's > apparently > just a matter of setting the flag earlier. > > Bootstrapped/regtested on x86-64/Linux, OK for the mainline? OK. Thanks, Richard. > > 2022-07-14 Eric Botcazou > > * gimplify.cc (lookup_tmp_var): Add NOT_GIMPLE_REG boolean parameter > and set DECL_NOT_GIMPLE_REG_P on the variable according to it. > (internal_get_tmp_var): Add NOT_GIMPLE_REG boolean parameter and pass > it in the call to lookup_tmp_var. > (get_formal_tmp_var): Pass false in the call to lookup_tmp_var. > (get_initialized_tmp_var): Likewise. > (prepare_gimple_addressable): Call internal_get_tmp_var instead of > get_initialized_tmp_var with NOT_GIMPLE_REG set to true. > > > 2022-07-14 Eric Botcazou > > * gnat.dg/opt98.ads, gnat.dg/opt98.adb: New test. > > -- > Eric Botcazou
[PATCH part2][PUSHED] docs: fix position of @end deftypefn
gcc/ChangeLog: * doc/gimple.texi: Close properly a deftypefn. --- gcc/doc/gimple.texi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 9150218fed8..7832fa6ff90 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -1977,12 +1977,12 @@ executed in sequence. @deftypefn {GIMPLE function} gomp_parallel *gimple_build_omp_parallel (@ gimple_seq body, tree clauses, tree child_fn, tree data_arg) Build a @code{GIMPLE_OMP_PARALLEL} statement. -@end deftypefn @code{BODY} is sequence of statements which are executed in parallel. @code{CLAUSES}, are the @code{OMP} parallel construct's clauses. @code{CHILD_FN} is the function created for the parallel threads to execute. @code{DATA_ARG} are the shared data argument(s). +@end deftypefn @deftypefn {GIMPLE function} bool gimple_omp_parallel_combined_p (gimple g) Return true if @code{OMP} parallel statement @code{G} has the @@ -2076,9 +2076,9 @@ Return true if @code{OMP} return statement @code{G} has the @deftypefn {GIMPLE function} gimple gimple_build_omp_section (gimple_seq body) Build a @code{GIMPLE_OMP_SECTION} statement for a sections statement. -@end deftypefn @code{BODY} is the sequence of statements in the section. +@end deftypefn @deftypefn {GIMPLE function} bool gimple_omp_section_last_p (gimple g) Return true if @code{OMP} section statement @code{G} has the -- 2.37.0
Re: [PATCH] [RFC]Support vectorization for Complex type.
On Thu, Jul 14, 2022 at 4:20 PM Richard Biener wrote: > > On Wed, Jul 13, 2022 at 9:34 AM Richard Biener > wrote: > > > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu wrote: > > > > > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener > > > wrote: > > > > > > > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu wrote: > > > > > > > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches > > > > > wrote: > > > > > > > > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt > > > > > > wrote: > > > > > > > > > > > > > > The patch only handles load/store(including ctor/permutation, > > > > > > > except > > > > > > > gather/scatter) for complex type, other operations don't needs to > > > > > > > be > > > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD > > > > > > > is not > > > > > > > supported for complex type, so no need to handle either). > > > > > > > > > > > > (*) > > > > > > > > > > > > > Instead of support vector(2) _Complex double, this patch takes > > > > > > > vector(4) > > > > > > > double as vector type of _Complex double. Since vectorizer > > > > > > > originally > > > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for complex > > > > > > > type, the patch handles nunits/ncopies/vf specially for complex > > > > > > > type. > > > > > > > > > > > > For the limited set above(*) can you explain what's "special" about > > > > > > vector(2) _Complex > > > > > > vs. vector(4) double, thus why we need to have STMT_VINFO_COMPLEX_P > > > > > > at all? > > > > > Supporting a vector(2) complex is a straightforward idea, just like > > > > > supporting other scalar type in vectorizer, but it requires more > > > > > efforts(in the backend and frontend), considering that most of > > > > > operations of complex type will be lowered into realpart and imagpart > > > > > operations, supporting a vector(2) complex does not look that > > > > > necessary. Then it comes up with supporting vector(4) double(with > > > > > adjustment of vf/ctor/permutation), the vectorizer only needs to > > > > > handle the vectorization of the move operation of the complex type(no > > > > > need to worry about wrongly mapping vector(4) double multiplication to > > > > > complex type multiplication since it's already lowered before > > > > > vectorizer). > > > > > stmt_info does not record the scalar type, in order to avoid duplicate > > > > > operation like getting a lhs type from stmt to determine whether it is > > > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is mainly > > > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_ > > > > > stmt. > > > > > > > > > > > > I wonder to what extent your handling can be extended to support > > > > > > re-vectorizing > > > > > > (with a higher VF for example) already vectorized code? The > > > > > > vectorizer giving > > > > > > up on vector(2) double looks quite obviously similar to it giving up > > > > > > on _Complex double ... > > > > > Yes, it can be extended to vector(2) double/float/int/ with a bit > > > > > adjustment(exacting element by using bit_field instead of > > > > > imagpart_expr/realpart_expr). > > > > > > It would be a shame to not use the same underlying mechanism for > > > > > > dealing with > > > > > > both, where for the vector case obviously vector(4) would be > > > > > > supported as well. > > > > > > > > > > > > In principle _Complex double operations should be two SLP lanes but > > > > > > it seems you > > > > > > are handling them with classical interleaving as well? > > > > > I'm only handling move operations, for other operations it will be > > > > > lowered to realpart and imagpart and thus two SLP lanes. > > > > > > > > Yes, I understood that. > > > > > > > > Doing it more general (and IMHO better) would involve enhancing > > > > how we represent dataref groups, maintaining the number of scalars > > > > covered by each of the vinfos. On the SLP representation side it > > > > probably requires to rely on the representative for access and not > > > > on the scalar stmts (since those do not map properly to the lanes). > > > > > > > > Ideally we'd be able to handle > > > > > > > > struct { _Complex double c; double a; double b; } a[], b[]; > > > > > > > > void foo () > > > > { > > > >for (int i = 0; i < 100; ++i) > > > > { > > > > a[i].c = b[i].c; > > > > a[i].a = b[i].a; > > > > a[i].b = b[i].b; > > > > } > > > > } > > > > > > > > which I guess your patch doesn't handle with plain AVX vector > > > > copies but instead uses interleaving for the _Complex and non-_Complex > > > > parts? > > > Indeed, it produces wrong code. > > > > For _Complex, in case we don't get to the "true and only" solution it > > might be easier to split the loads and stores when it's just memory > > copies and we have vectorization enabled and a supported vector > > mode that would surely re-assemble them (store-merging doesn't seem > > to do that). > > > > Btw,
Re: [PATCH] [RFC]Support vectorization for Complex type.
On Thu, Jul 14, 2022 at 4:53 PM Hongtao Liu wrote: > > On Thu, Jul 14, 2022 at 4:20 PM Richard Biener > wrote: > > > > On Wed, Jul 13, 2022 at 9:34 AM Richard Biener > > wrote: > > > > > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu wrote: > > > > > > > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener > > > > wrote: > > > > > > > > > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu > > > > > wrote: > > > > > > > > > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches > > > > > > wrote: > > > > > > > > > > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt > > > > > > > wrote: > > > > > > > > > > > > > > > > The patch only handles load/store(including ctor/permutation, > > > > > > > > except > > > > > > > > gather/scatter) for complex type, other operations don't needs > > > > > > > > to be > > > > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD > > > > > > > > is not > > > > > > > > supported for complex type, so no need to handle either). > > > > > > > > > > > > > > (*) > > > > > > > > > > > > > > > Instead of support vector(2) _Complex double, this patch takes > > > > > > > > vector(4) > > > > > > > > double as vector type of _Complex double. Since vectorizer > > > > > > > > originally > > > > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for > > > > > > > > complex > > > > > > > > type, the patch handles nunits/ncopies/vf specially for complex > > > > > > > > type. > > > > > > > > > > > > > > For the limited set above(*) can you explain what's "special" > > > > > > > about > > > > > > > vector(2) _Complex > > > > > > > vs. vector(4) double, thus why we need to have > > > > > > > STMT_VINFO_COMPLEX_P at all? > > > > > > Supporting a vector(2) complex is a straightforward idea, just like > > > > > > supporting other scalar type in vectorizer, but it requires more > > > > > > efforts(in the backend and frontend), considering that most of > > > > > > operations of complex type will be lowered into realpart and > > > > > > imagpart > > > > > > operations, supporting a vector(2) complex does not look that > > > > > > necessary. Then it comes up with supporting vector(4) double(with > > > > > > adjustment of vf/ctor/permutation), the vectorizer only needs to > > > > > > handle the vectorization of the move operation of the complex > > > > > > type(no > > > > > > need to worry about wrongly mapping vector(4) double multiplication > > > > > > to > > > > > > complex type multiplication since it's already lowered before > > > > > > vectorizer). > > > > > > stmt_info does not record the scalar type, in order to avoid > > > > > > duplicate > > > > > > operation like getting a lhs type from stmt to determine whether it > > > > > > is > > > > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is > > > > > > mainly > > > > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_ > > > > > > stmt. > > > > > > > > > > > > > > I wonder to what extent your handling can be extended to support > > > > > > > re-vectorizing > > > > > > > (with a higher VF for example) already vectorized code? The > > > > > > > vectorizer giving > > > > > > > up on vector(2) double looks quite obviously similar to it giving > > > > > > > up > > > > > > > on _Complex double ... > > > > > > Yes, it can be extended to vector(2) double/float/int/ with a > > > > > > bit > > > > > > adjustment(exacting element by using bit_field instead of > > > > > > imagpart_expr/realpart_expr). > > > > > > > It would be a shame to not use the same underlying mechanism for > > > > > > > dealing with > > > > > > > both, where for the vector case obviously vector(4) would be > > > > > > > supported as well. > > > > > > > > > > > > > > In principle _Complex double operations should be two SLP lanes > > > > > > > but it seems you > > > > > > > are handling them with classical interleaving as well? > > > > > > I'm only handling move operations, for other operations it will be > > > > > > lowered to realpart and imagpart and thus two SLP lanes. > > > > > > > > > > Yes, I understood that. > > > > > > > > > > Doing it more general (and IMHO better) would involve enhancing > > > > > how we represent dataref groups, maintaining the number of scalars > > > > > covered by each of the vinfos. On the SLP representation side it > > > > > probably requires to rely on the representative for access and not > > > > > on the scalar stmts (since those do not map properly to the lanes). > > > > > > > > > > Ideally we'd be able to handle > > > > > > > > > > struct { _Complex double c; double a; double b; } a[], b[]; > > > > > > > > > > void foo () > > > > > { > > > > >for (int i = 0; i < 100; ++i) > > > > > { > > > > > a[i].c = b[i].c; > > > > > a[i].a = b[i].a; > > > > > a[i].b = b[i].b; > > > > > } > > > > > } > > > > > > > > > > which I guess your patch doesn't handle with plain AVX vector > > > > > copies but instead uses interleaving
Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative
On Thu, Jul 14, 2022 at 3:22 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jul 14, 2022 at 7:33 AM liuhongt wrote: > > > > And split it to GPR-version instruction after reload. > > > > > ?r was introduced under the assumption that we want vector values > > > mostly in vector registers. Currently there are no instructions with > > > memory or immediate operand, so that made sense at the time. Let's > > > keep ?r until logic instructions with mem/imm operands are introduced. > > > So, for the patch that adds 64-bit vector logic in GPR, I would advise > > > to first introduce only register operands. mem/imm operands should be > > Update patch to add ?r to 64-bit bit_op patterns. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > No big imact on SPEC2017(Most same binary). > > The problem with your approach is with the combine pass, where combine > first tries to recognize the combined instruction without clobber, > before re-recognizing instruction with added clobber. So, if a forward > propagation happens, the combine will *always* choose the insn variant > without GPR. Thank you for the explanation, I really did not know this point. > > So, the solution with VI_16_32 is to always expand with a clobbered > version that is split to either SImode or V16QImode. With 64-bit > instructions, we have two additional complications. First, we have a > native MMX instruction, and we have to split to it after reload, and > second, we have a builtin that expects vector insn. > > To solve the first issue, we should change the mode of > "*mmx" to V1DImode and split your new _gpr version with > clobber to it for !GENERAL_REG_P operands. > > The second issue could be solved by emitting V1DImode instructions > directly from the expander. Please note there are several expanders > that expect non-clobbered logic insn in certain mode to be available, > so the situation can become quite annoying... Yes. It looks like it would add a lot of code complexity, I'll hold the patch for now. > > Uros. -- BR, Hongtao
[PATCH][pushed] libiberty: fix docs typo
libiberty/ChangeLog: * functions.texi: Replace strtoul with strtoull. --- libiberty/functions.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libiberty/functions.texi b/libiberty/functions.texi index e4d74b58220..b56b02e0686 100644 --- a/libiberty/functions.texi +++ b/libiberty/functions.texi @@ -1747,7 +1747,7 @@ that the converted value is unsigned. @c strtoll.c:33 @deftypefn Supplemental {long long int} strtoll (const char *@var{string}, @ char **@var{endptr}, int @var{base}) -@deftypefnx Supplemental {unsigned long long int} strtoul (@ +@deftypefnx Supplemental {unsigned long long int} strtoull (@ const char *@var{string}, char **@var{endptr}, int @var{base}) The @code{strtoll} function converts the string in @var{string} to a -- 2.37.0
Re: [PATCH] ipa-cp: Fix assert triggering with -fno-toplevel-reorder (PR 106260)
Hi, On Thu, Jul 14 2022, Richard Biener wrote: > On Wed, Jul 13, 2022 at 11:06 PM Martin Jambor wrote: >> >> Hi, >> >> with -fno-toplevel-reorder (and -fwhole-program), there apparently can >> be local functions without any callers. > > Did you check why? Can't we fix that? no, I have not checked much. Our manual description of -fno-toplevel-reorder says that: "When this option is used, unreferenced static variables are not removed." In the beginning, static (because of -fwhole-program) variables _ZTV1S and __gxx_personality_v0 do refer to the destructor in question, my hypothesis is that the dead function removal code somehow has to be careful in case these have been already output, even when these two variables are no longer in the symbol table when IPA-CP runs. But I did not look deeper, hoping that Honza would correct me if I am totally off. Perhaps the bug is that the function should not be local even with -fwhole-program. The semantics of this combination of options is a bit unclear to me. Martin > >> This is something that IPA-CP >> does not like because its propagation verifier checks that local >> functions do not end up with TOP in their lattices. Therefore there >> is an assert checking that all call-less unreachable functions have >> been removed, which triggers in PR 106260 with these two options. >> >> This patch detects the situation and marks the lattices as variable, >> thus avoiding both the assert trigger and the verification failure. >> >> Bootstrapped and tested on x86_64-linux. OK for master and then all >> active release branches? >> >> Thanks, >> >> Martin >> >> >> gcc/ChangeLog: >> >> 2022-07-13 Martin Jambor >> >> PR ipa/106260 >> * ipa-cp.cc (initialize_node_lattices): Replace assert that there are >> callers with handling that situation when -fno-toplevel_reorder. >> >> gcc/testsuite/ChangeLog: >> >> 2022-07-13 Martin Jambor >> >> PR ipa/106260 >> * g++.dg/ipa/pr106260.C: New test. >> --- >> gcc/ipa-cp.cc | 6 ++- >> gcc/testsuite/g++.dg/ipa/pr106260.C | 64 + >> 2 files changed, 69 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/g++.dg/ipa/pr106260.C >> >> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc >> index 543a9334e2c..f699a8dadc0 100644 >> --- a/gcc/ipa-cp.cc >> +++ b/gcc/ipa-cp.cc >> @@ -1286,10 +1286,14 @@ initialize_node_lattices (struct cgraph_node *node) >>int caller_count = 0; >>node->call_for_symbol_thunks_and_aliases (count_callers, >> &caller_count, >> true); >> - gcc_checking_assert (caller_count > 0); >>if (caller_count == 1) >> node->call_for_symbol_thunks_and_aliases (set_single_call_flag, >> NULL, true); >> + else if (caller_count == 0) >> + { >> + gcc_checking_assert (!opt_for_fn (node->decl, >> flag_toplevel_reorder)); >> + variable = true; >> + } >> } >>else >> { >> diff --git a/gcc/testsuite/g++.dg/ipa/pr106260.C >> b/gcc/testsuite/g++.dg/ipa/pr106260.C >> new file mode 100644 >> index 000..bd3b6e0af79 >> --- /dev/null >> +++ b/gcc/testsuite/g++.dg/ipa/pr106260.C >> @@ -0,0 +1,64 @@ >> +// { dg-do compile } >> +// { dg-options "-O2 -std=gnu++14 -fwhole-program -fno-unit-at-a-time" } >> + >> +struct A; >> +template >> +struct Q { Q (T); }; >> +template >> +struct U { >> + ~U () { m1 (nullptr); } >> + D m2 (); >> + T *u; >> + void m1 (T *) { m2 () (u); } >> +}; >> +struct F { F (int *); }; >> +template >> +using W = Q; >> +int a, b; >> +void fn1 (void *); >> +template >> +void >> +fn2 (T *x) >> +{ >> + if (x) >> +x->~T(); >> + fn1 (x); >> +} >> +template >> +struct C { >> + void operator() (T *x) { fn2 (x); } >> +}; >> +struct D; >> +template > >> +using V = U; >> +struct A { >> + A (int *); >> +}; >> +struct S; >> +struct G { >> + V m3 (); >> +}; >> +struct S { >> + int e; >> + virtual ~S () {} >> +}; >> +template >> +struct H { >> + H (int, T x, int) : h(x) {} >> + G g; >> + void m4 () { g.m3 (); } >> + T h; >> +}; >> +struct I { >> + I(A, W); >> +}; >> +void >> +test () >> +{ >> + A c (&b); >> + W d (&b); >> + I e (c, d); >> + H f (0, e, a); >> + f.m4 (); >> +} >> + >> -- >> 2.36.1 >>
Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative
On Thu, Jul 14, 2022 at 11:32 AM Hongtao Liu wrote: > > On Thu, Jul 14, 2022 at 3:22 PM Uros Bizjak via Gcc-patches > wrote: > > > > On Thu, Jul 14, 2022 at 7:33 AM liuhongt wrote: > > > > > > And split it to GPR-version instruction after reload. > > > > > > > ?r was introduced under the assumption that we want vector values > > > > mostly in vector registers. Currently there are no instructions with > > > > memory or immediate operand, so that made sense at the time. Let's > > > > keep ?r until logic instructions with mem/imm operands are introduced. > > > > So, for the patch that adds 64-bit vector logic in GPR, I would advise > > > > to first introduce only register operands. mem/imm operands should be > > > Update patch to add ?r to 64-bit bit_op patterns. > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > > No big imact on SPEC2017(Most same binary). > > > > The problem with your approach is with the combine pass, where combine > > first tries to recognize the combined instruction without clobber, > > before re-recognizing instruction with added clobber. So, if a forward > > propagation happens, the combine will *always* choose the insn variant > > without GPR. > Thank you for the explanation, I really did not know this point. > > > > So, the solution with VI_16_32 is to always expand with a clobbered > > version that is split to either SImode or V16QImode. With 64-bit > > instructions, we have two additional complications. First, we have a > > native MMX instruction, and we have to split to it after reload, and > > second, we have a builtin that expects vector insn. > > > > To solve the first issue, we should change the mode of > > "*mmx" to V1DImode and split your new _gpr version with > > clobber to it for !GENERAL_REG_P operands. > > > > The second issue could be solved by emitting V1DImode instructions > > directly from the expander. Please note there are several expanders > > that expect non-clobbered logic insn in certain mode to be available, > > so the situation can become quite annoying... > Yes. It looks like it would add a lot of code complexity, I'll hold > the patch for now. I did some experimenting in the past with the idea of adding GPR instructions to 64-bit vectors. While there were some opportunities with 32- and 16-bit operations, mostly due to the fact that these arguments are passed via integer registers, 64-bit cases never triggered, because 64-bit vectors are passed via XMM registers. Also, when mem/imm alternatives were added, many inter-regunit moves were generated for everything but the most simple testcases involving logic operations, also considering the limited range of 64-bit immediates. IMO, the only case it is worth adding is a direct immediate store to memory, which HJ recently added. Uros. Uros.
Re: [PATCH] lto-plugin: use -pthread only for detected targets
On 7/13/22 14:15, Richard Biener wrote: > Didn't we have it that way and not work? IIRC LDFLAGS is only > used during configure link tests and _not_ substituted? You are right. There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly in Makefile.am. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, MartinFrom bf0e9635086b5b3f88a40bab9839b2c438eed831 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 14 Jul 2022 09:51:33 +0200 Subject: [PATCH] lto-plugin: use -pthread only for detected targets Use -pthread only if we are going to use pthread functionality. PR bootstrap/106156 lto-plugin/ChangeLog: * Makefile.am: Append -pthread to LDFLAGS if LTO_PLUGIN_USE_PTHREAD. * configure.ac: Use AM_CONDITIONAL for LTO_PLUGIN_USE_PTHREAD. * Makefile.in: Regenerate. * configure: Regenerate. --- lto-plugin/Makefile.am | 7 +-- lto-plugin/Makefile.in | 5 +++-- lto-plugin/configure| 19 +-- lto-plugin/configure.ac | 2 ++ 4 files changed, 27 insertions(+), 6 deletions(-) diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am index 482946e4dd5..ce51e8e173e 100644 --- a/lto-plugin/Makefile.am +++ b/lto-plugin/Makefile.am @@ -9,12 +9,15 @@ libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(a AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS) AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"' -# The plug-in depends on pthreads. -AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@ +AM_LDFLAGS = @ac_lto_plugin_ldflags@ AM_LIBTOOLFLAGS = --tag=disable-static override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS)) override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS)) +if LTO_PLUGIN_USE_PTHREAD +override LDFLAGS := $(LDFLAGS) -pthread +endif + libexecsub_LTLIBRARIES = liblto_plugin.la gcc_build_dir = @gcc_build_dir@ in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib)) diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in index 9453bc7d607..c323b6a51de 100644 --- a/lto-plugin/Makefile.in +++ b/lto-plugin/Makefile.in @@ -344,8 +344,7 @@ gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER) libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(accel_dir_suffix) AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS) AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"' -# The plug-in depends on pthreads. -AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@ +AM_LDFLAGS = @ac_lto_plugin_ldflags@ AM_LIBTOOLFLAGS = --tag=disable-static libexecsub_LTLIBRARIES = liblto_plugin.la in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib)) @@ -685,6 +684,8 @@ uninstall-am: uninstall-libexecsubLTLIBRARIES override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS)) override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS)) +@LTO_PLUGIN_USE_PTHREAD_TRUE@override LDFLAGS := $(LDFLAGS) -pthread + all-local: $(in_gcc_libs) $(in_gcc_libs) : $(gcc_build_dir)/%: % diff --git a/lto-plugin/configure b/lto-plugin/configure index 870e49b2e62..bf7c4d064ee 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -650,6 +650,8 @@ LD FGREP SED LIBTOOL +LTO_PLUGIN_USE_PTHREAD_FALSE +LTO_PLUGIN_USE_PTHREAD_TRUE LTO_PLUGIN_USE_SYMVER_SUN_FALSE LTO_PLUGIN_USE_SYMVER_SUN_TRUE LTO_PLUGIN_USE_SYMVER_GNU_FALSE @@ -6033,6 +6035,15 @@ fi fi + if test "x$use_locking" = xyes; then + LTO_PLUGIN_USE_PTHREAD_TRUE= + LTO_PLUGIN_USE_PTHREAD_FALSE='#' +else + LTO_PLUGIN_USE_PTHREAD_TRUE='#' + LTO_PLUGIN_USE_PTHREAD_FALSE= +fi + + case `pwd` in *\ * | *\ *) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5 @@ -12104,7 +12115,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12107 "configure" +#line 12118 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12210,7 +12221,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12213 "configure" +#line 12224 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12693,6 +12704,10 @@ if test -z "${LTO_PLUGIN_USE_SYMVER_SUN_TRUE}" && test -z "${LTO_PLUGIN_USE_SYMV as_fn_error $? "conditional \"LTO_PLUGIN_USE_SYMVER_SUN\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi +if test -z "${LTO_PLUGIN_USE_PTHREAD_TRUE}" && test -z "${LTO_PLUGIN_USE_PTHREAD_FALSE}"; then + as_fn_error $? "conditional \"LTO_PLUGIN_USE_PTHREAD\" was never defined. +Usually this means the macro was only invoked conditionally." "$LINENO" 5 +fi : "${CONFIG_STATUS=./config.status}" ac_write_fail=0 diff --g
Re: [PATCH] lto-plugin: use -pthread only for detected targets
On Thu, Jul 14, 2022 at 12:03 PM Martin Liška wrote: > > On 7/13/22 14:15, Richard Biener wrote: > > Didn't we have it that way and not work? IIRC LDFLAGS is only > > used during configure link tests and _not_ substituted? > > You are right. > > There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly > in Makefile.am. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? I'm not really an expert on build issues but I'd have added some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it and done AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@ or figure where we set ac_lto_plugin_ldflags and amend that during configure instead: # Need -Wc to get it through libtool. if test "x$have_static_libgcc" = xyes; then ac_lto_plugin_ldflags="-Wc,-static-libgcc" fi AC_SUBST(ac_lto_plugin_ldflags) > Thanks, > Martin
[aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const
Hi, For following test case: svint32_t foo() { int32x4_t v = (int32x4_t) { 1, 2, 3, 4 }; svint32_t v2 = svld1rq_s32 (svptrue_b8(), &v[0]); return v2; } After applying workaround in forwprop to not simplify VEC_PERM_EXPR in simplify_permutation to avoid type error in middle end (or using -fno-tree-forwprop) as mentioned in: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598390.html We get following optimized gimple: v2_2 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; return v2_2; However we hit the following ICE during expansion of vec_perm_expr because in aarch64_vectorize_vec_perm_const, op0 is VECTOR_CST, and we call force_reg (VNx4SI, op0), which is incorrect mode for op0. The patch fixes it by using op_mode instead of vmode in calls to force_reg for op0 and op1. during RTL pass: expand foo2.c: In function ‘foo’: foo2.c:8:10: internal compiler error: in emit_move_insn, at expr.cc:4052 8 | return v2; | ^~ 0x74789b emit_move_insn(rtx_def*, rtx_def*) ../../gcc/gcc/expr.cc:4052 0xb8f664 force_reg(machine_mode, rtx_def*) ../../gcc/gcc/explow.cc:688 0x134182f aarch64_vectorize_vec_perm_const ../../gcc/gcc/config/aarch64/aarch64.cc:24132 0xe63070 expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*, int_vector_builder > const&, machine_mode, rtx_def*) ../../gcc/gcc/optabs.cc:6254 0xbb1569 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../gcc/gcc/expr.cc:10273 0xbb6498 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/gcc/expr.cc:10625 0xa897dc expand_expr ../../gcc/gcc/expr.h:310 0xa897dc expand_return ../../gcc/gcc/cfgexpand.cc:3809 0xa897dc expand_gimple_stmt_1 ../../gcc/gcc/cfgexpand.cc:3918 0xa897dc expand_gimple_stmt ../../gcc/gcc/cfgexpand.cc:4044 0xa8f238 expand_gimple_basic_block ../../gcc/gcc/cfgexpand.cc:6096 0xa91187 execute ../../gcc/gcc/cfgexpand.cc:6822 Is the patch OK to commit after bootstrap+test on aarch64-linux-gnu ? Thanks, Prathamesh diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 25f4cbb466d..303814b8cca 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24129,11 +24129,11 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, d.op_mode = op_mode; d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode); d.target = target; - d.op0 = op0 ? force_reg (vmode, op0) : NULL_RTX; + d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX; if (op0 == op1) d.op1 = d.op0; else -d.op1 = op1 ? force_reg (vmode, op1) : NULL_RTX; +d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX; d.testing_p = !target; if (!d.testing_p)
Re: Mips: Fix kernel_stat structure size
Hi Xi, > Please CC me (@xry111) in LLVM review. I just posted the patch. You should've gotten the email. If not, see: https://reviews.llvm.org/D129749. > By the way, if you have some spare time, you can also add the value for > Musl (all 3 ABIs). Musl has a different struct stat than Glibc (it's PR > 106136 in GCC bugzilla). I might be able to, but no promises. :) From: Xi Ruoyao Sent: Wednesday, July 13, 2022 4:38 AM To: Dimitrije Milosevic ; Hans-Peter Nilsson Cc: Djordje Todorovic ; gcc-patches@gcc.gnu.org Subject: Re: Mips: Fix kernel_stat structure size On Tue, 2022-07-12 at 06:42 +, Dimitrije Milosevic wrote: > I will send out a review, covering both 32 ABIs, meaning that both O32 > and N32 ABIs will be working properly. Please CC me (@xry111) in LLVM review. By the way, if you have some spare time, you can also add the value for Musl (all 3 ABIs). Musl has a different struct stat than Glibc (it's PR 106136 in GCC bugzilla). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] lto-plugin: use -pthread only for detected targets
On 7/14/22 12:10, Richard Biener wrote: > On Thu, Jul 14, 2022 at 12:03 PM Martin Liška wrote: >> >> On 7/13/22 14:15, Richard Biener wrote: >>> Didn't we have it that way and not work? IIRC LDFLAGS is only >>> used during configure link tests and _not_ substituted? >> >> You are right. >> >> There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly >> in Makefile.am. >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? > > I'm not really an expert on build issues but I'd have added > some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it > and done > > AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@ > > or figure where we set ac_lto_plugin_ldflags and amend that during > configure instead: > > # Need -Wc to get it through libtool. > if test "x$have_static_libgcc" = xyes; then >ac_lto_plugin_ldflags="-Wc,-static-libgcc" > fi > AC_SUBST(ac_lto_plugin_ldflags) All right, so something like this? Martin > > >> Thanks, >> Martin From 248469bd1b35d2c87d751b1adbc07c784ebd4830 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 14 Jul 2022 09:51:33 +0200 Subject: [PATCH] lto-plugin: use -pthread only for detected targets Use -pthread only if we are going to use pthread functionality. PR bootstrap/106156 lto-plugin/ChangeLog: * Makefile.am: Append -pthread to LDFLAGS if LTO_PLUGIN_USE_PTHREAD. * configure.ac: Use AM_CONDITIONAL for LTO_PLUGIN_USE_PTHREAD. * Makefile.in: Regenerate. * configure: Regenerate. --- lto-plugin/Makefile.am | 3 +-- lto-plugin/Makefile.in | 4 ++-- lto-plugin/configure| 10 -- lto-plugin/configure.ac | 5 + 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am index 482946e4dd5..aba3c5a00cd 100644 --- a/lto-plugin/Makefile.am +++ b/lto-plugin/Makefile.am @@ -9,8 +9,7 @@ libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(a AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS) AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"' -# The plug-in depends on pthreads. -AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@ +AM_LDFLAGS = @ac_lto_plugin_ldflags@ @ac_lto_plugin_extra_ldflags@ AM_LIBTOOLFLAGS = --tag=disable-static override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS)) override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS)) diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in index 9453bc7d607..cb568e1e09f 100644 --- a/lto-plugin/Makefile.in +++ b/lto-plugin/Makefile.in @@ -276,6 +276,7 @@ abs_top_builddir = @abs_top_builddir@ abs_top_srcdir = @abs_top_srcdir@ ac_ct_CC = @ac_ct_CC@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@ +ac_lto_plugin_extra_ldflags = @ac_lto_plugin_extra_ldflags@ ac_lto_plugin_ldflags = @ac_lto_plugin_ldflags@ ac_lto_plugin_warn_cflags = @ac_lto_plugin_warn_cflags@ accel_dir_suffix = @accel_dir_suffix@ @@ -344,8 +345,7 @@ gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER) libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(accel_dir_suffix) AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS) AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"' -# The plug-in depends on pthreads. -AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@ +AM_LDFLAGS = @ac_lto_plugin_ldflags@ @ac_lto_plugin_extra_ldflags@ AM_LIBTOOLFLAGS = --tag=disable-static libexecsub_LTLIBRARIES = liblto_plugin.la in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib)) diff --git a/lto-plugin/configure b/lto-plugin/configure index 870e49b2e62..fdb8a5964b4 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -650,6 +650,7 @@ LD FGREP SED LIBTOOL +ac_lto_plugin_extra_ldflags LTO_PLUGIN_USE_SYMVER_SUN_FALSE LTO_PLUGIN_USE_SYMVER_SUN_TRUE LTO_PLUGIN_USE_SYMVER_GNU_FALSE @@ -6012,6 +6013,7 @@ fi # Check for thread headers. use_locking=no +ac_lto_plugin_extra_ldflags= case $target in riscv*) @@ -6031,8 +6033,12 @@ $as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h fi + + ac_lto_plugin_extra_ldflags="-pthread" fi + + case `pwd` in *\ * | *\ *) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5 @@ -12104,7 +12110,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12107 "configure" +#line 12113 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12210,7 +12216,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12213 "configure" +#line 12219 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac index 18eb4f60b0a..e69a7bd9ec0 100644 --- a/lto-plugin
Re: [PATCH] lto-plugin: use -pthread only for detected targets
On Thu, Jul 14, 2022 at 1:13 PM Martin Liška wrote: > > On 7/14/22 12:10, Richard Biener wrote: > > On Thu, Jul 14, 2022 at 12:03 PM Martin Liška wrote: > >> > >> On 7/13/22 14:15, Richard Biener wrote: > >>> Didn't we have it that way and not work? IIRC LDFLAGS is only > >>> used during configure link tests and _not_ substituted? > >> > >> You are right. > >> > >> There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly > >> in Makefile.am. > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > > > > I'm not really an expert on build issues but I'd have added > > some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it > > and done > > > > AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@ > > > > or figure where we set ac_lto_plugin_ldflags and amend that during > > configure instead: > > > > # Need -Wc to get it through libtool. > > if test "x$have_static_libgcc" = xyes; then > >ac_lto_plugin_ldflags="-Wc,-static-libgcc" > > fi > > AC_SUBST(ac_lto_plugin_ldflags) > > All right, so something like this? LGTM with the changelog fixed > Martin > > > > > > >> Thanks, > >> Martin
Re: [aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const
Prathamesh Kulkarni writes: > Hi, > For following test case: > > svint32_t foo() > { > int32x4_t v = (int32x4_t) { 1, 2, 3, 4 }; > svint32_t v2 = svld1rq_s32 (svptrue_b8(), &v[0]); > return v2; > } > > After applying workaround in forwprop to not simplify VEC_PERM_EXPR in > simplify_permutation to avoid type error in middle end (or using > -fno-tree-forwprop) > as mentioned in: > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598390.html > > We get following optimized gimple: > v2_2 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>; > return v2_2; Hmm, we really should be able to fold that to a constant. However… > However we hit the following ICE during expansion of vec_perm_expr > because in aarch64_vectorize_vec_perm_const, > op0 is VECTOR_CST, and we call force_reg (VNx4SI, op0), which is incorrect > mode > for op0. The patch fixes it by using op_mode instead of vmode in calls > to force_reg > for op0 and op1. > > during RTL pass: expand > foo2.c: In function ‘foo’: > foo2.c:8:10: internal compiler error: in emit_move_insn, at expr.cc:4052 > 8 | return v2; > | ^~ > 0x74789b emit_move_insn(rtx_def*, rtx_def*) > ../../gcc/gcc/expr.cc:4052 > 0xb8f664 force_reg(machine_mode, rtx_def*) > ../../gcc/gcc/explow.cc:688 > 0x134182f aarch64_vectorize_vec_perm_const > ../../gcc/gcc/config/aarch64/aarch64.cc:24132 > 0xe63070 expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*, > int_vector_builder > const&, machine_mode, > rtx_def*) > ../../gcc/gcc/optabs.cc:6254 > 0xbb1569 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, > expand_modifier) > ../../gcc/gcc/expr.cc:10273 > 0xbb6498 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, > expand_modifier, rtx_def**, bool) > ../../gcc/gcc/expr.cc:10625 > 0xa897dc expand_expr > ../../gcc/gcc/expr.h:310 > 0xa897dc expand_return > ../../gcc/gcc/cfgexpand.cc:3809 > 0xa897dc expand_gimple_stmt_1 > ../../gcc/gcc/cfgexpand.cc:3918 > 0xa897dc expand_gimple_stmt > ../../gcc/gcc/cfgexpand.cc:4044 > 0xa8f238 expand_gimple_basic_block > ../../gcc/gcc/cfgexpand.cc:6096 > 0xa91187 execute > ../../gcc/gcc/cfgexpand.cc:6822 > > Is the patch OK to commit after bootstrap+test on aarch64-linux-gnu ? > > Thanks, > Prathamesh > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 25f4cbb466d..303814b8cca 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24129,11 +24129,11 @@ aarch64_vectorize_vec_perm_const (machine_mode > vmode, machine_mode op_mode, >d.op_mode = op_mode; >d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode); >d.target = target; > - d.op0 = op0 ? force_reg (vmode, op0) : NULL_RTX; > + d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX; >if (op0 == op1) > d.op1 = d.op0; >else > -d.op1 = op1 ? force_reg (vmode, op1) : NULL_RTX; > +d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX; >d.testing_p = !target; > >if (!d.testing_p) …yes, this is OK, thanks. Richard
Re: ICE after folding svld1rq to vec_perm_expr duing forwprop
Richard Biener writes: > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > wrote: >> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener >> wrote: >> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches >> > wrote: >> > > >> > > Hi Richard, >> > > For the following test: >> > > >> > > svint32_t f2(int a, int b, int c, int d) >> > > { >> > > int32x4_t v = (int32x4_t) {a, b, c, d}; >> > > return svld1rq_s32 (svptrue_b8 (), &v[0]); >> > > } >> > > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve: >> > > foo.c: In function ‘f2’: >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’ >> > > 4 | svint32_t f2(int a, int b, int c, int d) >> > > | ^~ >> > > svint32_t >> > > __Int32x4_t >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > during GIMPLE pass: forwprop >> > > dump file: foo.c.109t.forwprop2 >> > > foo.c:4:11: internal compiler error: verify_gimple failed >> > > 0xfda04a verify_gimple_in_cfg(function*, bool) >> > > ../../gcc/gcc/tree-cfg.cc:5568 >> > > 0xe9371f execute_function_todo >> > > ../../gcc/gcc/passes.cc:2091 >> > > 0xe93ccb execute_todo >> > > ../../gcc/gcc/passes.cc:2145 >> > > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we >> > > have: >> > > int32x4_t v; >> > > __Int32x4_t _1; >> > > svint32_t _9; >> > > vector(4) int _11; >> > > >> > >: >> > > _1 = {a_3(D), b_4(D), c_5(D), d_6(D)}; >> > > v_12 = _1; >> > > _11 = v_12; >> > > _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>; >> > > return _9; >> > > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to >> > > view_convert_expr, >> > > and the end result becomes: >> > > svint32_t _7; >> > > __Int32x4_t _8; >> > > >> > > ;; basic block 2, loop depth 0 >> > > ;;pred: ENTRY >> > > _8 = {a_2(D), b_3(D), c_4(D), d_5(D)}; >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8); >> > > return _7; >> > > ;;succ: EXIT >> > > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR >> > > has incompatible types (svint32_t, int32x4_t). >> > > >> > > The attached patch disables simplification of VEC_PERM_EXPR >> > > in simplify_permutation, if lhs and rhs have non compatible types, >> > > which resolves ICE, but am not sure if it's the correct approach ? >> > >> > It for sure papers over the issue. I think the error happens earlier, >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR >> > which is the type of the LHS. But then you probably run into the >> > different sizes ICE (VLA vs constant size). I think for this case you >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR, >> > selecting the "low" part of the VLA vector. >> Hi Richard, >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to >> represent dup operation >> from fixed width to VLA vector. I am not sure how folding it to >> BIT_FIELD_REF will work. >> Could you please elaborate ? >> >> Also, the issue doesn't seem restricted to this case. >> The following test case also ICE's during forwprop: >> svint32_t foo() >> { >> int32x4_t v = (int32x4_t) {1, 2, 3, 4}; >> svint32_t v2 = svld1rq_s32 (svptrue_b8 (), &v[0]); >> return v2; >> } >> >> foo2.c: In function ‘foo’: >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’ >> 9 | } >> | ^ >> svint32_t >> int32x4_t >> v2_4 = { 1, 2, 3, 4 }; >> >> because simplify_permutation folds >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} > >> into: >> vector_cst {1, 2, 3, 4} >> >> and it complains during verify_gimple_assign_single because we don't >> support assignment of vector_cst to VLA vector. >> >> I guess the issue really is that currently, only VEC_PERM_EXPR >> supports lhs and rhs >> to have vector types with differing lengths, and simplifying it to >> other tree codes, like above, >> will result in type errors ? > > That might be the case - Richard should know. I don't see anything particularly special about VEC_PERM_EXPR here, or about the VLA vs. VLS thing. We would have the same issue trying to build a 128-bit vector from 2 64-bit vectors. And there are other tree codes whose input types are/can be different from their output types. So it just seems like a normal type correctness issue: a VEC_PERM_EXPR of type T needs to be replaced by something of type T. Whether T has a constant size or a variable size doesn't matter. > If so your type check > is still too late, you should instead recognize that we are permuting > a VLA vector and then refuse to go any of the non-VEC_PERM generating > paths - that probably means just allowing the code == VEC_PERM_EXPR > case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases? Yeah. If we're talking about the match.pd code, I think only: (if (sel.series_p (0, 1, 0, 1)) { op0; } (if (sel.series_p (0, 1, nelts, 1)) { op1; } need a type compatibility check. For fold_vec_perm I think we
[PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit
>From 2d4e7cd1d476a065d824e11045c8dbc049d5f0a0 Mon Sep 17 00:00:00 2001 From: MacroFake Date: Thu, 14 Jul 2022 15:26:12 +0200 Subject: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit The optimizations from commit a54137c88061c7495728fc6b8dfd0474e812b2cb introduced a clang integer sanitizer error. Fix this with an explicit static_cast, similar to the fix in commit 074436cf8cdd2a9ce75cadd36deb8301f00e55b9. libstdc++-v3/ChangeLog: * include/std/charconv (__from_chars_alnum_to_val): Replace implicit conversions from int to unsigned char with explicit casts. --- libstdc++-v3/include/std/charconv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 218813e4797..bdf23e4a5ad 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -436,7 +436,7 @@ namespace __detail __from_chars_alnum_to_val(unsigned char __c) { if _GLIBCXX17_CONSTEXPR (_DecOnly) -return __c - '0'; + return static_cast(__c - '0'); else { // This initializer is deliberately made dependent in order to work -- 2.35.3 From 2d4e7cd1d476a065d824e11045c8dbc049d5f0a0 Mon Sep 17 00:00:00 2001 From: MacroFake Date: Thu, 14 Jul 2022 15:26:12 +0200 Subject: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit The optimizations from commit a54137c88061c7495728fc6b8dfd0474e812b2cb introduced a clang integer sanitizer error. Fix this with an explicit static_cast, similar to the fix in commit 074436cf8cdd2a9ce75cadd36deb8301f00e55b9. libstdc++-v3/ChangeLog: * include/std/charconv (__from_chars_alnum_to_val): Replace implicit conversions from int to unsigned char with explicit casts. --- libstdc++-v3/include/std/charconv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 218813e4797..bdf23e4a5ad 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -436,7 +436,7 @@ namespace __detail __from_chars_alnum_to_val(unsigned char __c) { if _GLIBCXX17_CONSTEXPR (_DecOnly) - return __c - '0'; + return static_cast(__c - '0'); else { // This initializer is deliberately made dependent in order to work -- 2.35.3
Re: [PATCH] match.pd: Add new abs pattern [PR94290]
On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski wrote: > On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches > wrote: > > > > This patch is intended to fix a missed optimization in match.pd. It > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to > write a second simplification in match.pd to handle the commutative > property as the match was not ocurring otherwise. Additionally, the pattern > (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the > other simplification rule. > > You could use :c for the commutative property instead and that should > simplify things. > That is: > > (simplify > (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) > (abs @0)) > > Also since integer_zerop works on vectors, it seems like you should > add a testcase or two for the vector case. > Also would be useful if you write a testcase that uses different > statements rather than one big one so it gets exercised in the > forwprop case. > Note also if either of the max are used more than just in this > simplification, it could increase the lifetime of @0, maybe you need > to add :s to the max expressions. > > Thanks for the feedback. I'm not quite sure what a vector test case would look like for this. Could I get some guidance on that? Thanks -Sam > Thanks, > Andrew > > > > > Tests are also included to be added to the testsuite. > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > PR tree-optimization/94290 > > > > gcc/ChangeLog: > > > > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New > simplification. > > * match.pd (x <= 0 ? -x : 0): New Simplification. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.c-torture/execute/pr94290-1.c: New test. > > * gcc.dg/pr94290-2.c: New test. > > * gcc.dg/pr94290.c: New test. > > --- > > gcc/match.pd | 15 ++ > > .../gcc.c-torture/execute/pr94290-1.c | 16 +++ > > gcc/testsuite/gcc.dg/pr94290-2.c | 15 ++ > > gcc/testsuite/gcc.dg/pr94290.c| 46 +++ > > 4 files changed, 92 insertions(+) > > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > > create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c > > create mode 100644 gcc/testsuite/gcc.dg/pr94290.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 45aefd96688..55ca79d7ac9 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -7848,3 +7848,18 @@ and, > >(if (TYPE_UNSIGNED (TREE_TYPE (@0))) > > (bit_and @0 @1) > >(cond (le @0 @1) @0 (bit_and @0 @1)) > > + > > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ > > +(simplify > > + (plus (max @0 integer_zerop) (max (negate @0) integer_zerop)) > > + (abs @0)) > > + > > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x. */ > > +(simplify > > + (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) ) > > + (abs @0)) > > + > > +/* (x <= 0 ? -x : 0) -> max(-x, 0). */ > > +(simplify > > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1) > > + (max (negate @0) @1)) > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > > new file mode 100644 > > index 000..93b80d569aa > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > > @@ -0,0 +1,16 @@ > > +/* PR tree-optimization/94290 */ > > + > > +#include "../../gcc.dg/pr94290.c" > > + > > +int main() { > > + > > +if (foo(0) != 0 > > +|| foo(-42) != 42 > > +|| foo(42) != 42 > > +|| baz(-10) != 10 > > +|| baz(-10) != 10) { > > +__builtin_abort(); > > +} > > + > > +return 0; > > +} > > diff --git a/gcc/testsuite/gcc.dg/pr94290-2.c > b/gcc/testsuite/gcc.dg/pr94290-2.c > > new file mode 100644 > > index 000..ea6e55755f5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr94290-2.c > > @@ -0,0 +1,15 @@ > > +/* PR tree-optimization/94290 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > +/* Form from PR. */ > > +__attribute__((noipa)) unsigned int foo(int x) { > > +return x <= 0 ? -x : 0; > > +} > > + > > +/* Changed order. */ > > +__attribute__((noipa)) unsigned int bar(int x) { > > +return 0 >= x ? -x : 0; > > +} > > + > > +/* { dg-final {scan-tree-dump-times " MAX_EXPR " 2 "optimized" } } */ > > diff --git a/gcc/testsuite/gcc.dg/pr94290.c > b/gcc/testsuite/gcc.dg/pr94290.c > > new file mode 100644 > > index 000..47617c36c02 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr94290.c > > @@ -0,0 +1,46 @@ > > +/* PR tree-optimization/94290 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > + > > +/* Same form as PR. */ > > +__attribute__((noipa)) unsigned int foo(int x) { > > +return (x >= 0 ? x : 0) + (x <= 0 ? -x : 0); > > +} > > + > > +/* Signed function. */ > > +__a
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
Hi Yonghong. > On 7/7/22 1:24 PM, Jose E. Marchesi wrote: >> Hi Yonghong. >> >>> On 6/21/22 9:12 AM, Jose E. Marchesi wrote: > On 6/17/22 10:18 AM, Jose E. Marchesi wrote: >> Hi Yonghong. >> >>> On 6/15/22 1:57 PM, David Faust wrote: On 6/14/22 22:53, Yonghong Song wrote: > > > On 6/7/22 2:43 PM, David Faust wrote: >> Hello, >> >> This patch series adds support for: >> >> - Two new C-language-level attributes that allow to associate (to >> "annotate" or >>to "tag") particular declarations and types with arbitrary >> strings. As >>explained below, this is intended to be used to, for example, >> characterize >>certain pointer types. >> >> - The conveyance of that information in the DWARF output in the form >> of a new >>DIE: DW_TAG_GNU_annotation. >> >> - The conveyance of that information in the BTF output in the form >> of two new >>kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >> >> All of these facilities are being added to the eBPF ecosystem, and >> support for >> them exists in some form in LLVM. >> >> Purpose >> === >> >> 1) Addition of C-family language constructs (attributes) to specify >> free-text >> tags on certain language elements, such as struct fields. >> >> The purpose of these annotations is to provide additional >> information about >> types, variables, and function parameters of interest to >> the kernel. A >> driving use case is to tag pointer types within the linux >> kernel and eBPF >> programs with additional semantic information, such as >> '__user' or '__rcu'. >> >> For example, consider the linux kernel function do_execve >> with the >> following declaration: >> >>static int do_execve(struct filename *filename, >> const char __user *const __user *__argv, >> const char __user *const __user *__envp); >> >> Here, __user could be defined with these annotations to >> record semantic >> information about the pointer parameters (e.g., they are >> user-provided) in >> DWARF and BTF information. Other kernel facilites such as >> the eBPF verifier >> can read the tags and make use of the information. >> >> 2) Conveying the tags in the generated DWARF debug info. >> >> The main motivation for emitting the tags in DWARF is that >> the Linux kernel >> generates its BTF information via pahole, using DWARF as a >> source: >> >> ++ BTF BTF +--+ >> | pahole |---> vmlinux.btf --->| verifier | >> ++ +--+ >> ^^ >> || >>DWARF |BTF | >> || >> vmlinux +-+ >> module1.ko | BPF program | >> module2.ko +-+ >> ... >> >> This is because: >> >> a) Unlike GCC, LLVM will only generate BTF for BPF >> programs. >> >> b) GCC can generate BTF for whatever target with -gbtf, >> but there is no >> support for linking/deduplicating BTF in the linker. >> >> In the scenario above, the verifier needs access to the >> pointer tags of >> both the kernel types/declarations (conveyed in the DWARF >> and translated >> to BTF by pahole) and those of the BPF program (available >> directly in BTF). >> >> Another motivation for having the tag information in DWARF, >> unrelated to >> BPF and BTF, is that the drgn project (another DWARF >> consumer) also wants >> to benefit from these tags in order to differentiate >> between different >> kinds of pointers in the kernel. >> >> 3) Conveying the tags in the generated BTF debug info. >
Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.
I have applied the patch to GCC 12. | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 | From: Michael Meissner | Date: Thu, 14 Jul 2022 11:16:08 -0400 | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. Testing has found that using load and store vector pair for block copies can result in a slow down on power10. This patch disables using the vector pair instructions for block copies if we are tuning for power10. 2022-06-11 Michael Meissner gcc/ * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do not generate block copies with vector pair instructions if we are tuning for power10. Back port from master branch. --- gcc/config/rs6000/rs6000.cc | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 4030864aa1a..040487bd277 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) { - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) + /* Do not generate lxvp and stxvp on power10 since there are some +performance issues. */ + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX + && rs6000_tune != PROCESSOR_POWER10) rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; else rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; -- 2.35.3 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
[PATCH] PR target/106278: Keep REG_EQUAL notes consistent during TImode STV.
This patch resolves PR target/106278 a regression on x86_64 caused by my recent TImode STV improvements. Now that TImode STV can handle comparisons such as "(set (regs:CC) (compare:CC (reg:TI) ...))" the convert_insn method sensibly checks that the mode of the SET_DEST is TImode before setting it to V1TImode [to avoid V1TImode appearing on the hard reg CC_FLAGS. Hence the current code looks like: if (GET_MODE (dst) == TImode) { tmp = find_reg_equal_equiv_note (insn); if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode) PUT_MODE (XEXP (tmp, 0), V1TImode); PUT_MODE (dst, V1TImode); fix_debug_reg_uses (dst); } break; which checks GET_MODE (dst) before calling PUT_MODE, and when a change is made updating the REG_EQUAL_NOTE tmp if it exists. The logical flaw (oversight) is that due to RTL sharing, the destination of this set may already have been updated to V1TImode, as this chain is being converted, but we still need to update any REG_EQUAL_NOTE that still has TImode. Hence the correct code is actually: if (GET_MODE (dst) == TImode) { PUT_MODE (dst, V1TImode); fix_debug_reg_uses (dst); } if (GET_MODE (dst) == V1TImode) { tmp = find_reg_equal_equiv_note (insn); if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode) PUT_MODE (XEXP (tmp, 0), V1TImode); } break; While fixing this behavior, I noticed I had some indentation whitespace issues and some vestigial dead code in this function/method that I've taken the liberty of cleaning up (as obvious) in this patch. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline? 2022-07-14 Roger Sayle gcc/ChangeLog PR target/106278 * config/i386/i386-features.cc (general_scalar_chain::convert_insn): Fix indentation whitespace. (timode_scalar_chain::fix_debug_reg_uses): Likewise. (timode_scalar_chain::convert_insn): Delete dead code. Update TImode REG_EQUAL_NOTE even if the SET_DEST is already V1TI. Fix indentation whitespace. (convertible_comparison_p): Likewise. (timode_scalar_to_vector_candidate_p): Likewise. gcc/testsuite/ChangeLog PR target/106278 * gcc.dg/pr106278.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index f1b03c3..813b203 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -1054,13 +1054,13 @@ general_scalar_chain::convert_insn (rtx_insn *insn) else if (REG_P (dst) && GET_MODE (dst) == smode) { /* Replace the definition with a SUBREG to the definition we - use inside the chain. */ +use inside the chain. */ rtx *vdef = defs_map.get (dst); if (vdef) dst = *vdef; dst = gen_rtx_SUBREG (vmode, dst, 0); /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST - is a non-REG_P. So kill those off. */ +is a non-REG_P. So kill those off. */ rtx note = find_reg_equal_equiv_note (insn); if (note) remove_note (insn, note); @@ -1246,7 +1246,7 @@ timode_scalar_chain::fix_debug_reg_uses (rtx reg) { rtx_insn *insn = DF_REF_INSN (ref); /* Make sure the next ref is for a different instruction, - so that we're not affected by the rescan. */ +so that we're not affected by the rescan. */ next = DF_REF_NEXT_REG (ref); while (next && DF_REF_INSN (next) == insn) next = DF_REF_NEXT_REG (next); @@ -1336,21 +1336,19 @@ timode_scalar_chain::convert_insn (rtx_insn *insn) rtx dst = SET_DEST (def_set); rtx tmp; - if (MEM_P (dst) && !REG_P (src)) -{ - /* There are no scalar integer instructions and therefore -temporary register usage is required. */ -} switch (GET_CODE (dst)) { case REG: if (GET_MODE (dst) == TImode) { + PUT_MODE (dst, V1TImode); + fix_debug_reg_uses (dst); + } + if (GET_MODE (dst) == V1TImode) + { tmp = find_reg_equal_equiv_note (insn); if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode) PUT_MODE (XEXP (tmp, 0), V1TImode); - PUT_MODE (dst, V1TImode); - fix_debug_reg_uses (dst); } break; case MEM: @@ -1410,8 +1408,8 @@ timode_scalar_chain::convert_insn (rtx_insn *insn) if (MEM_P (dst)) { tmp = gen_reg_rtx (V1TImode); - emit_insn_before (gen_rtx_SET (tmp, src), insn); - src = tmp; + emit_insn_before (gen_rtx_SET (tmp, src), insn); + src = tmp; } break; @@ -1434,8 +1432,8 @@ timode_scalar_chain::convert_insn (rtx_insn *insn) if (MEM_P (dst)) { t
Re: [PATCH] match.pd: Add new abs pattern [PR94290]
On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer wrote: > > > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski wrote: >> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches >> wrote: >> > >> > This patch is intended to fix a missed optimization in match.pd. It >> > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to >> > write a second simplification in match.pd to handle the commutative >> > property as the match was not ocurring otherwise. Additionally, the >> > pattern (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps >> > with the other simplification rule. >> >> You could use :c for the commutative property instead and that should >> simplify things. >> That is: >> >> (simplify >> (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) >> (abs @0)) >> >> Also since integer_zerop works on vectors, it seems like you should >> add a testcase or two for the vector case. >> Also would be useful if you write a testcase that uses different >> statements rather than one big one so it gets exercised in the >> forwprop case. >> Note also if either of the max are used more than just in this >> simplification, it could increase the lifetime of @0, maybe you need >> to add :s to the max expressions. >> > > Thanks for the feedback. I'm not quite sure what a vector test case would > look like for this. Could I get some guidance on that? Yes this should produce the pattern at forwprop1 time (with the C++ front-end, the C front-end does not support vector selects): typedef int __attribute__((vector_size(4*sizeof(int vint; vint foo(vint x) { vint t = (x >= 0 ? x : 0) ; vint xx = -x; vint t1 = (xx >= 0 ? xx : 0); return t + t1; } int foo(int x) { int t = (x >= 0 ? x : 0) ; int xx = -x; int t1 = (xx >= 0 ? xx : 0); return t + t1; } Thanks, Andrew Pinski > > Thanks > -Sam > >> >> Thanks, >> Andrew >> >> > >> > Tests are also included to be added to the testsuite. >> > >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? >> > >> > PR tree-optimization/94290 >> > >> > gcc/ChangeLog: >> > >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New >> > simplification. >> > * match.pd (x <= 0 ? -x : 0): New Simplification. >> > >> > gcc/testsuite/ChangeLog: >> > >> > * gcc.c-torture/execute/pr94290-1.c: New test. >> > * gcc.dg/pr94290-2.c: New test. >> > * gcc.dg/pr94290.c: New test. >> > --- >> > gcc/match.pd | 15 ++ >> > .../gcc.c-torture/execute/pr94290-1.c | 16 +++ >> > gcc/testsuite/gcc.dg/pr94290-2.c | 15 ++ >> > gcc/testsuite/gcc.dg/pr94290.c| 46 +++ >> > 4 files changed, 92 insertions(+) >> > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290.c >> > >> > diff --git a/gcc/match.pd b/gcc/match.pd >> > index 45aefd96688..55ca79d7ac9 100644 >> > --- a/gcc/match.pd >> > +++ b/gcc/match.pd >> > @@ -7848,3 +7848,18 @@ and, >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0))) >> > (bit_and @0 @1) >> >(cond (le @0 @1) @0 (bit_and @0 @1)) >> > + >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ >> > +(simplify >> > + (plus (max @0 integer_zerop) (max (negate @0) integer_zerop)) >> > + (abs @0)) >> > + >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x. */ >> > +(simplify >> > + (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) ) >> > + (abs @0)) >> > + >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0). */ >> > +(simplify >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1) >> > + (max (negate @0) @1)) >> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> > new file mode 100644 >> > index 000..93b80d569aa >> > --- /dev/null >> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> > @@ -0,0 +1,16 @@ >> > +/* PR tree-optimization/94290 */ >> > + >> > +#include "../../gcc.dg/pr94290.c" >> > + >> > +int main() { >> > + >> > +if (foo(0) != 0 >> > +|| foo(-42) != 42 >> > +|| foo(42) != 42 >> > +|| baz(-10) != 10 >> > +|| baz(-10) != 10) { >> > +__builtin_abort(); >> > +} >> > + >> > +return 0; >> > +} >> > diff --git a/gcc/testsuite/gcc.dg/pr94290-2.c >> > b/gcc/testsuite/gcc.dg/pr94290-2.c >> > new file mode 100644 >> > index 000..ea6e55755f5 >> > --- /dev/null >> > +++ b/gcc/testsuite/gcc.dg/pr94290-2.c >> > @@ -0,0 +1,15 @@ >> > +/* PR tree-optimization/94290 */ >> > +/* { dg-do compile } */ >> > +/* { dg-options "-O2 -fdump-tree-optimized" } */ >> > + >> > +/* Form from PR. */ >> > +__attribute__((noipa)) unsigned int foo(int x) { >> > +return x <= 0 ? -x : 0; >> > +} >> > + >> > +/* Changed order. */ >> > +__attribute__((noipa)) unsigned
[PATCH v3] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]
On Tue, Jul 12, 2022 at 04:10:08PM -0400, Jason Merrill wrote: > On 7/8/22 13:41, Marek Polacek wrote: > > +bool > > +reference_from_temporary (tree to, tree from, bool direct_init_p) > > +{ > > + /* Check is_reference. */ > > + if (!TYPE_REF_P (to)) > > +return false; > > + /* Check is_constructible. > > + ??? This check doesn't seem to be necessary; if T isn't constructible > > + from U, we won't be able to create a conversion. */ > > + if (!is_xible (INIT_EXPR, to, build_tree_list (NULL_TREE, from))) > > +return false; > > I agree with the comment, did you try leaving this out? If it stays I'd > think it needs to consider direct_init_p. I did, it doesn't break anything, so in this version I dropped the is_xible call entirely. > > @@ -13811,7 +13821,7 @@ warn_for_range_copy (tree decl, tree expr) > > if (TYPE_REF_P (type)) > > { > > - if (glvalue_p (expr) && !ref_conv_binds_directly_p (type, expr)) > > + if (glvalue_p (expr) && ref_conv_binds_to_temporary_p (type, expr)) > > { > > auto_diagnostic_group d; > > if (warning_at (loc, OPT_Wrange_loop_construct, > > @@ -13842,7 +13852,7 @@ warn_for_range_copy (tree decl, tree expr) > > tree rtype = cp_build_reference_type (type, /*rval*/false); > > /* If we could initialize the reference directly, it wouldn't involve > > any > >copies. */ > > - if (!ref_conv_binds_directly_p (rtype, expr)) > > + if (ref_conv_binds_to_temporary_p (rtype, expr)) > > return; > > I think this case wants the old handling of invalid conversions you > mentioned in your intro; we don't want to suggest changing to a reference if > that's ill-formed. Right. I've changed the return type to 'tristate', so that we can properly distinguish between true/false/dunno. > In passing we might change the comment to "If we can initialize a reference > directly, suggest that to avoid the copy." and move it above the rtype > declaration. Done. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This patch implements C++23 P2255R2, which adds two new type traits to detect reference binding to a temporary. They can be used to detect code like std::tuple t("meow"); which is incorrect because it always creates a dangling reference, because the std::string temporary is created inside the selected constructor of std::tuple, and not outside it. There are two new compiler builtins, __reference_constructs_from_temporary and __reference_converts_from_temporary. The former is used to simulate direct- and the latter copy-initialization context. But I had a hard time finding a test where there's actually a difference. Under DR 2267, both of these are invalid: struct A { } a; struct B { explicit B(const A&); }; const B &b1{a}; const B &b2(a); so I had to peruse [over.match.ref], and eventually realized that the difference can be seen here: struct G { operator int(); // #1 explicit operator int&&(); // #2 }; int&& r1(G{}); // use #2 (no temporary) int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&) The implementation itself was rather straightforward because we already have the conv_binds_ref_to_prvalue function. The main function here is reference_from_temporary. I've changed the return type of ref_conv_binds_directly to tristate, because previously the function didn't distinguish between an invalid conversion and one that binds to a prvalue. Since it no longer returns a bool, I removed the _p suffix. The patch also adds the relevant class and variable templates to . PR c++/104477 gcc/c-family/ChangeLog: * c-common.cc (c_common_reswords): Add __reference_constructs_from_temporary and __reference_converts_from_temporary. * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. gcc/cp/ChangeLog: * call.cc (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_directly): ... this. Add a new bool parameter. Change the return type to tristate. * constraint.cc (diagnose_trait_expr): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * cp-tree.h: Include "tristate.h". (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_directly): ... this. (reference_from_temporary): Declare. * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * method.cc (reference_from_temporary): New. * parser.cc (cp_parser_primary_expression): Handle RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. (cp_parser_trait_expr): Likewise. (warn_for_range_copy): Adjust to call ref_conv_binds_directly. * semant
[PATCH] x86: Disable sibcall if indirect_return attribute doesn't match
When shadow stack is enabled, function with indirect_return attribute may return via indirect jump. In this case, we need to disable sibcall if caller doesn't have indirect_return attribute and indirect branch tracking is enabled since compiler won't generate ENDBR when calling the caller. gcc/ PR target/85620 * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return false if callee has indirect_return attribute and caller doesn't. gcc/testsuite/ PR target/85620 * gcc.target/i386/pr85620-2.c: Updated. * gcc.target/i386/pr85620-5.c: New test. * gcc.target/i386/pr85620-6.c: Likewise. * gcc.target/i386/pr85620-7.c: Likewise. --- gcc/config/i386/i386.cc | 10 ++ gcc/testsuite/gcc.target/i386/pr85620-2.c | 3 ++- gcc/testsuite/gcc.target/i386/pr85620-5.c | 13 + gcc/testsuite/gcc.target/i386/pr85620-6.c | 14 ++ gcc/testsuite/gcc.target/i386/pr85620-7.c | 14 ++ 5 files changed, 53 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-7.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 3a3c7299eb4..e03f86d4a23 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -1024,6 +1024,16 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) return false; } + /* Disable sibcall if callee has indirect_return attribute and + caller doesn't since callee will return to the caller's caller + via an indirect jump. */ + if (((flag_cf_protection & (CF_RETURN | CF_BRANCH)) + == (CF_RETURN | CF_BRANCH)) + && lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (type)) + && !lookup_attribute ("indirect_return", + TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl +return false; + /* Otherwise okay. That also includes certain types of indirect calls. */ return true; } diff --git a/gcc/testsuite/gcc.target/i386/pr85620-2.c b/gcc/testsuite/gcc.target/i386/pr85620-2.c index b2e680fa1fe..14ce0ffd1e1 100644 --- a/gcc/testsuite/gcc.target/i386/pr85620-2.c +++ b/gcc/testsuite/gcc.target/i386/pr85620-2.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fcf-protection" } */ -/* { dg-final { scan-assembler-times {\mendbr} 1 } } */ +/* { dg-final { scan-assembler-times {\mendbr} 2 } } */ +/* { dg-final { scan-assembler-not "jmp" } } */ struct ucontext; diff --git a/gcc/testsuite/gcc.target/i386/pr85620-5.c b/gcc/testsuite/gcc.target/i386/pr85620-5.c new file mode 100644 index 000..04537702d09 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr85620-5.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fcf-protection" } */ +/* { dg-final { scan-assembler-not "jmp" } } */ + +struct ucontext; + +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__)); + +int +foo (struct ucontext *oucp) +{ + return bar (oucp); +} diff --git a/gcc/testsuite/gcc.target/i386/pr85620-6.c b/gcc/testsuite/gcc.target/i386/pr85620-6.c new file mode 100644 index 000..0b6a64e8454 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr85620-6.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fcf-protection" } */ +/* { dg-final { scan-assembler "jmp" } } */ + +struct ucontext; + +extern int bar (struct ucontext *) __attribute__((__indirect_return__)); + +__attribute__((__indirect_return__)) +int +foo (struct ucontext *oucp) +{ + return bar (oucp); +} diff --git a/gcc/testsuite/gcc.target/i386/pr85620-7.c b/gcc/testsuite/gcc.target/i386/pr85620-7.c new file mode 100644 index 000..fa62d56decf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr85620-7.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fcf-protection" } */ +/* { dg-final { scan-assembler "jmp" } } */ + +struct ucontext; + +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__)); +extern int foo (struct ucontext *) __attribute__((__indirect_return__)); + +int +foo (struct ucontext *oucp) +{ + return bar (oucp); +} -- 2.36.1
Re: [PATCH] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]
On Tue, Jul 12, 2022 at 04:15:00PM -0400, Jason Merrill wrote: > On 7/12/22 16:10, Jason Merrill wrote: > > On 7/8/22 13:41, Marek Polacek wrote: > > > This patch implements C++23 P2255R2, which adds two new type traits to > > > detect reference binding to a temporary. They can be used to detect code > > > like > > > > > > std::tuple t("meow"); > > > > > > which is incorrect because it always creates a dangling reference, > > > because > > > the std::string temporary is created inside the selected constructor of > > > std::tuple, and not outside it. > > > > > > There are two new compiler builtins, > > > __reference_constructs_from_temporary > > > and __reference_converts_from_temporary. The former is used to simulate > > > direct- and the latter copy-initialization context. But I had a > > > hard time > > > finding a test where there's actually a difference. Under DR 2267, both > > > of these are invalid: > > > > > > struct A { } a; > > > struct B { explicit B(const A&); }; > > > const B &b1{a}; > > > const B &b2(a); > > > > > > so I had to peruse [over.match.ref], and eventually realized that the > > > difference can be seen here: > > > > > > struct G { > > > operator int(); // #1 > > > explicit operator int&&(); // #2 > > > }; > > > > > > int&& r1(G{}); // use #2 (no temporary) > > > int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&) > > > > > > The implementation itself was rather straightforward because we already > > > have conv_binds_ref_to_prvalue. The main function here is > > > reference_from_temporary. The renaming to ref_conv_binds_to_temporary_p > > > is because previously the function didn't distinguish between an invalid > > > conversion and one that binds to a prvalue. > > > > > > The patch also adds the relevant class and variable templates to > > > . > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > > > PR c++/104477 > > > > > > gcc/c-family/ChangeLog: > > > > > > * c-common.cc (c_common_reswords): Add > > > __reference_constructs_from_temporary and > > > __reference_converts_from_temporary. > > > * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and > > > RID_REF_CONVERTS_FROM_TEMPORARY. > > > > > > gcc/cp/ChangeLog: > > > > > > * call.cc (ref_conv_binds_directly_p): Rename to ... > > > (ref_conv_binds_to_temporary_p): ... this. Add a new bool > > > parameter. Return true only if the conversion is valid and > > > conv_binds_ref_to_prvalue returns true. > > > * constraint.cc (diagnose_trait_expr): Handle > > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and > > > CPTK_REF_CONVERTS_FROM_TEMPORARY. > > > * cp-tree.h (enum cp_trait_kind): Add > > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY > > > and CPTK_REF_CONVERTS_FROM_TEMPORARY. > > > (ref_conv_binds_directly_p): Rename to ... > > > (ref_conv_binds_to_temporary_p): ... this. > > > (reference_from_temporary): Declare. > > > * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle > > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and > > > CPTK_REF_CONVERTS_FROM_TEMPORARY. > > > * method.cc (reference_from_temporary): New. > > > * parser.cc (cp_parser_primary_expression): Handle > > > RID_REF_CONSTRUCTS_FROM_TEMPORARY and > > > RID_REF_CONVERTS_FROM_TEMPORARY. > > > (cp_parser_trait_expr): Likewise. > > > (warn_for_range_copy): Adjust to call ref_conv_binds_to_temporary_p. > > > * semantics.cc (trait_expr_value): Handle > > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and > > > CPTK_REF_CONVERTS_FROM_TEMPORARY. > > > (finish_trait_expr): Likewise. > > > > > > libstdc++-v3/ChangeLog: > > > > > > * include/std/type_traits (reference_constructs_from_temporary, > > > reference_converts_from_temporary): New class templates. > > > (reference_constructs_from_temporary_v, > > > reference_converts_from_temporary_v): New variable templates. > > > (__cpp_lib_reference_from_temporary): Define for C++23. > > > * include/std/version (__cpp_lib_reference_from_temporary): > > > Define for > > > C++23. > > > * testsuite/20_util/variable_templates_for_traits.cc: Test > > > reference_constructs_from_temporary_v and > > > reference_converts_from_temporary_v. > > > * testsuite/20_util/reference_from_temporary/value.cc: New test. > > > * testsuite/20_util/reference_from_temporary/value2.cc: New test. > > > * testsuite/20_util/reference_from_temporary/version.cc: New test. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * g++.dg/ext/reference_constructs_from_temporary1.C: New test. > > > * g++.dg/ext/reference_converts_from_temporary1.C: New test. > > > --- > > > gcc/c-family/c-common.cc | 4 + > > > gcc/c-family/c-common.h | 2 + > > > gcc/cp/call.cc | 14 +- > > > gcc/cp/constraint.cc | 8 + > > >
Re: [RFC] RISC-V: Add support for RV64E/lp64e
On Tue, 12 Jul 2022 22:09:53 PDT (-0700), Palmer Dabbelt wrote: gcc/ChangeLog * config.gcc (riscv): Accept rv64e and lp64e. * config/riscv/arch-canonicalize: Likewise. * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Likewise. * config/riscv/riscv-opts.h (riscv_abi_type): Likewise. * config/riscv/riscv.cc (riscv_option_override): Likewise * config/riscv/riscv.h (UNITS_PER_FP_ARG): Likewise. (STACK_BOUNDARY): Likewise. (ABI_STACK_BOUNDARY): Likewise. (MAX_ARGS_IN_REGISTERS): Likewise. (ABI_SPEC): Likewise. * config/riscv/riscv.opt (abi_type): Likewise. * doc/invoke.texi (RISC-V) <-mabi>: Likewise. --- This is all still in flight, but evidently RV64E exists. I haven't tested this at all, but given that we don't even have the ABI docs lined up yet it's likely a bit away from being mergable. --- gcc/config.gcc | 8 +--- gcc/config/riscv/arch-canonicalize | 2 +- gcc/config/riscv/riscv-c.cc| 1 + gcc/config/riscv/riscv-opts.h | 1 + gcc/config/riscv/riscv.cc | 6 -- gcc/config/riscv/riscv.h | 11 +++ gcc/config/riscv/riscv.opt | 3 +++ gcc/doc/invoke.texi| 5 +++-- 8 files changed, 25 insertions(+), 12 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 4e3b15bb5e9..4617ecb8d9b 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4637,7 +4637,7 @@ case "${target}" in # Infer arch from --with-arch, --target, and --with-abi. case "${with_arch}" in - rv32e* | rv32i* | rv32g* | rv64i* | rv64g*) + rv32e* | rv32i* | rv32g* | rv64e* | rv64i* | rv64g*) # OK. ;; "") @@ -4645,12 +4645,13 @@ case "${target}" in case "${with_abi}" in ilp32e) with_arch="rv32e" ;; ilp32 | ilp32f | ilp32d) with_arch="rv32gc" ;; + lp64e) with_arch="rv64e" ;; lp64 | lp64f | lp64d) with_arch="rv64gc" ;; *) with_arch="rv${xlen}gc" ;; esac ;; *) - echo "--with-arch=${with_arch} is not supported. The argument must begin with rv32e, rv32i, rv32g, rv64i, or rv64g." 1>&2 + echo "--with-arch=${with_arch} is not supported. The argument must begin with rv32e, rv32i, rv32g, rv64e, rv64i, or rv64g." 1>&2 exit 1 ;; esac @@ -4672,6 +4673,7 @@ case "${target}" in rv32e*) with_abi=ilp32e ;; rv32*) with_abi=ilp32 ;; rv64*d* | rv64g*) with_abi=lp64d ;; + rv64e*) with_abi=lp64e ;; rv64*) with_abi=lp64 ;; esac ;; @@ -4687,7 +4689,7 @@ case "${target}" in ilp32,rv32* | ilp32e,rv32e* \ | ilp32f,rv32*f* | ilp32f,rv32g* \ | ilp32d,rv32*d* | ilp32d,rv32g* \ - | lp64,rv64* \ + | lp64,rv64* | lp64e,rv64e* \ | lp64f,rv64*f* | lp64f,rv64g* \ | lp64d,rv64*d* | lp64d,rv64g*) ;; diff --git a/gcc/config/riscv/arch-canonicalize b/gcc/config/riscv/arch-canonicalize index fd7651ac491..8db3e88ddd7 100755 --- a/gcc/config/riscv/arch-canonicalize +++ b/gcc/config/riscv/arch-canonicalize @@ -71,7 +71,7 @@ def arch_canonicalize(arch, isa_spec): new_arch = "" extra_long_ext = [] std_exts = [] - if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']: + if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64e', 'rv64i', 'rv64g']: new_arch = arch[:5].replace("g", "i") if arch[:5] in ['rv32g', 'rv64g']: std_exts = ['m', 'a', 'f', 'd'] diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc index eb7ef09297e..4614dc6b6d9 100644 --- a/gcc/config/riscv/riscv-c.cc +++ b/gcc/config/riscv/riscv-c.cc @@ -67,6 +67,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile) switch (riscv_abi) { case ABI_ILP32E: +case ABI_LP64E: builtin_define ("__riscv_abi_rve"); gcc_fallthrough (); diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h index 1e153b3a6e7..70fe708cbae 100644 --- a/gcc/config/riscv/riscv-opts.h +++ b/gcc/config/riscv/riscv-opts.h @@ -27,6 +27,7 @@ enum riscv_abi_type { ABI_ILP32F, ABI_ILP32D, ABI_LP64, + ABI_LP64E, ABI_LP64F, ABI_LP64D }; diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 2e83ca07394..51b7195c17b 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5047,8 +5047,10 @@ riscv_option_override (void) error ("requested ABI requires %<-march%> to subsume the %qc extension", UNIT
Re: [GCC 11 backport] Disable generating load/store vector pairs for block copies.
Back port patch (changing .cc to .c) from trunk to GCC 11 committed. | From 3118d0856b030fe491a170354fed2df570df199f Mon Sep 17 00:00:00 2001 | From: Michael Meissner | Date: Thu, 14 Jul 2022 14:03:37 -0400 | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. Testing has found that using load and store vector pair for block copies can result in a slow down on power10. This patch disables using the vector pair instructions for block copies if we are tuning for power10. 2022-06-11 Michael Meissner gcc/ * config/rs6000/rs6000.c (rs6000_option_override_internal): Do not generate block copies with vector pair instructions if we are tuning for power10. Back port from master branch. --- gcc/config/rs6000/rs6000.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 8c89b45922f..73f90f134f8 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4149,7 +4149,10 @@ rs6000_option_override_internal (bool global_init_p) if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) { - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) + /* Do not generate lxvp and stxvp on power10 since there are some +performance issues. */ + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX + && rs6000_tune != PROCESSOR_POWER10) rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; else rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; -- 2.35.3 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
[PATCH] jit: Make recording::memento non-copyable
These polymoprhic types don't appear to be copied anywhere. Rather than trying to reason about what it means to copy a polymoprhic base without copying the derived part, disable copies. This also avoids a potential double-free if a recorindg::string object does somehow get copied (it owns a pointer to dynamically allocated memory, but the implicit copy constructor will just make shallow copies). Tested x86_64-linux with --enable-languages=c,c++,jit --enable-host-shared OK for trunk? -- >8 -- gcc/jit/ChangeLog: * jit-recording.h (recording::memento): Define copy constructor and copy assignment operator as deleted. (recording::string): Likewise. (recording::string::c_str): Add const qualifier. --- gcc/jit/jit-recording.h | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h index 0dfb42f2676..8610ea988bd 100644 --- a/gcc/jit/jit-recording.h +++ b/gcc/jit/jit-recording.h @@ -405,6 +405,9 @@ public: virtual void write_reproducer (reproducer &r) = 0; virtual location *dyn_cast_location () { return NULL; } + memento (const memento&) = delete; + memento& operator= (const memento&) = delete; + protected: memento (context *ctxt) : m_ctxt (ctxt), @@ -436,13 +439,16 @@ public: string (context *ctxt, const char *text, bool escaped); ~string (); - const char *c_str () { return m_buffer; } + const char *c_str () const { return m_buffer; } static string * from_printf (context *ctxt, const char *fmt, ...) GNU_PRINTF(2, 3); void replay_into (replayer *) final override {} + string (const string&) = delete; + string& operator= (const string&) = delete; + private: string * make_debug_string () final override; void write_reproducer (reproducer &r) final override; -- 2.36.1
Re: [PATCH] jit: Make recording::memento non-copyable
On Thu, 2022-07-14 at 19:44 +0100, Jonathan Wakely wrote: > These polymoprhic types don't appear to be copied anywhere. Rather > than > trying to reason about what it means to copy a polymoprhic base > without > copying the derived part, disable copies. This also avoids a > potential > double-free if a recorindg::string object does somehow get copied (it > owns a pointer to dynamically allocated memory, but the implicit copy > constructor will just make shallow copies). > > Tested x86_64-linux with --enable-languages=c,c++,jit --enable-host- > shared > > OK for trunk? [CCing jit mailing list] Thanks; OK for trunk. > > -- >8 -- > > gcc/jit/ChangeLog: > > * jit-recording.h (recording::memento): Define copy > constructor > and copy assignment operator as deleted. > (recording::string): Likewise. > (recording::string::c_str): Add const qualifier. > --- > gcc/jit/jit-recording.h | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h > index 0dfb42f2676..8610ea988bd 100644 > --- a/gcc/jit/jit-recording.h > +++ b/gcc/jit/jit-recording.h > @@ -405,6 +405,9 @@ public: > virtual void write_reproducer (reproducer &r) = 0; > virtual location *dyn_cast_location () { return NULL; } > > + memento (const memento&) = delete; > + memento& operator= (const memento&) = delete; > + > protected: > memento (context *ctxt) > : m_ctxt (ctxt), > @@ -436,13 +439,16 @@ public: > string (context *ctxt, const char *text, bool escaped); > ~string (); > > - const char *c_str () { return m_buffer; } > + const char *c_str () const { return m_buffer; } > > static string * from_printf (context *ctxt, const char *fmt, ...) > GNU_PRINTF(2, 3); > > void replay_into (replayer *) final override {} > > + string (const string&) = delete; > + string& operator= (const string&) = delete; > + > private: > string * make_debug_string () final override; > void write_reproducer (reproducer &r) final override;
Re: [PATCH] PR target/106278: Keep REG_EQUAL notes consistent during TImode STV.
On Thu, Jul 14, 2022 at 6:58 PM Roger Sayle wrote: > > > This patch resolves PR target/106278 a regression on x86_64 caused by my > recent TImode STV improvements. Now that TImode STV can handle comparisons > such as "(set (regs:CC) (compare:CC (reg:TI) ...))" the convert_insn method > sensibly checks that the mode of the SET_DEST is TImode before setting > it to V1TImode [to avoid V1TImode appearing on the hard reg CC_FLAGS. > > Hence the current code looks like: > > if (GET_MODE (dst) == TImode) > { > tmp = find_reg_equal_equiv_note (insn); > if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode) > PUT_MODE (XEXP (tmp, 0), V1TImode); > PUT_MODE (dst, V1TImode); > fix_debug_reg_uses (dst); > } > break; > > which checks GET_MODE (dst) before calling PUT_MODE, and when a > change is made updating the REG_EQUAL_NOTE tmp if it exists. > > The logical flaw (oversight) is that due to RTL sharing, the destination > of this set may already have been updated to V1TImode, as this chain is > being converted, but we still need to update any REG_EQUAL_NOTE that > still has TImode. Hence the correct code is actually: > > > if (GET_MODE (dst) == TImode) > { > PUT_MODE (dst, V1TImode); > fix_debug_reg_uses (dst); > } > if (GET_MODE (dst) == V1TImode) > { > tmp = find_reg_equal_equiv_note (insn); > if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode) > PUT_MODE (XEXP (tmp, 0), V1TImode); > } > break; > > While fixing this behavior, I noticed I had some indentation whitespace > issues and some vestigial dead code in this function/method that I've > taken the liberty of cleaning up (as obvious) in this patch. While it is desired to fix cosmetic issues, those kinds of changes unnecessarily make review of the patch more difficult. > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? OK, but please do not mix cleanups with the functional changes. Cleanups should be separated to a follow-up patch. Thanks, Uros. > > > 2022-07-14 Roger Sayle > > gcc/ChangeLog > PR target/106278 > * config/i386/i386-features.cc (general_scalar_chain::convert_insn): > Fix indentation whitespace. > (timode_scalar_chain::fix_debug_reg_uses): Likewise. > (timode_scalar_chain::convert_insn): Delete dead code. > Update TImode REG_EQUAL_NOTE even if the SET_DEST is already V1TI. > Fix indentation whitespace. > (convertible_comparison_p): Likewise. > (timode_scalar_to_vector_candidate_p): Likewise. > > gcc/testsuite/ChangeLog > PR target/106278 > * gcc.dg/pr106278.c: New test case. > > > Thanks in advance, > Roger > -- >
Re: [PATCH] match.pd: Add new abs pattern [PR94290]
On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski wrote: > On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer wrote: > > > > > > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski wrote: > >> > >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches > >> wrote: > >> > > >> > This patch is intended to fix a missed optimization in match.pd. It > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to > write a second simplification in match.pd to handle the commutative > property as the match was not ocurring otherwise. Additionally, the pattern > (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the > other simplification rule. > >> > >> You could use :c for the commutative property instead and that should > >> simplify things. > >> That is: > >> > >> (simplify > >> (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) > >> (abs @0)) > >> > >> Also since integer_zerop works on vectors, it seems like you should > >> add a testcase or two for the vector case. > >> Also would be useful if you write a testcase that uses different > >> statements rather than one big one so it gets exercised in the > >> forwprop case. > >> Note also if either of the max are used more than just in this > >> simplification, it could increase the lifetime of @0, maybe you need > >> to add :s to the max expressions. > >> > > > > Thanks for the feedback. I'm not quite sure what a vector test case > would look like for this. Could I get some guidance on that? > > Yes this should produce the pattern at forwprop1 time (with the C++ > front-end, the C front-end does not support vector selects): > typedef int __attribute__((vector_size(4*sizeof(int vint; > > vint foo(vint x) { > vint t = (x >= 0 ? x : 0) ; > vint xx = -x; > vint t1 = (xx >= 0 ? xx : 0); > return t + t1; > } > > int foo(int x) { > int t = (x >= 0 ? x : 0) ; > int xx = -x; > int t1 = (xx >= 0 ? xx : 0); > return t + t1; > } > > Thanks, > Andrew Pinski > > Thanks for the help. I'm still having trouble with the vector test, though. When I try to compile, I get an error saying "used vector type where scalar is required", referring to the max expressions. How do I use the max expression with two vectors as the inputs? Thanks -Sam > > > > > Thanks > > -Sam > > > >> > >> Thanks, > >> Andrew > >> > >> > > >> > Tests are also included to be added to the testsuite. > >> > > >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > >> > > >> > PR tree-optimization/94290 > >> > > >> > gcc/ChangeLog: > >> > > >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New > simplification. > >> > * match.pd (x <= 0 ? -x : 0): New Simplification. > >> > > >> > gcc/testsuite/ChangeLog: > >> > > >> > * gcc.c-torture/execute/pr94290-1.c: New test. > >> > * gcc.dg/pr94290-2.c: New test. > >> > * gcc.dg/pr94290.c: New test. > >> > --- > >> > gcc/match.pd | 15 ++ > >> > .../gcc.c-torture/execute/pr94290-1.c | 16 +++ > >> > gcc/testsuite/gcc.dg/pr94290-2.c | 15 ++ > >> > gcc/testsuite/gcc.dg/pr94290.c| 46 > +++ > >> > 4 files changed, 92 insertions(+) > >> > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c > >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290.c > >> > > >> > diff --git a/gcc/match.pd b/gcc/match.pd > >> > index 45aefd96688..55ca79d7ac9 100644 > >> > --- a/gcc/match.pd > >> > +++ b/gcc/match.pd > >> > @@ -7848,3 +7848,18 @@ and, > >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0))) > >> > (bit_and @0 @1) > >> >(cond (le @0 @1) @0 (bit_and @0 @1)) > >> > + > >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ > >> > +(simplify > >> > + (plus (max @0 integer_zerop) (max (negate @0) integer_zerop)) > >> > + (abs @0)) > >> > + > >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x. */ > >> > +(simplify > >> > + (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) ) > >> > + (abs @0)) > >> > + > >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0). */ > >> > +(simplify > >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1) > >> > + (max (negate @0) @1)) > >> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > >> > new file mode 100644 > >> > index 000..93b80d569aa > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c > >> > @@ -0,0 +1,16 @@ > >> > +/* PR tree-optimization/94290 */ > >> > + > >> > +#include "../../gcc.dg/pr94290.c" > >> > + > >> > +int main() { > >> > + > >> > +if (foo(0) != 0 > >> > +|| foo(-42) != 42 > >> > +|| foo(42) != 42 > >> > +|| baz(-10) != 10 > >> > +|| baz(-10) != 10) { > >> > +__builtin_abort(); > >> > +} > >> > + > >> > +return 0; > >> > +}
Re: [PATCH] match.pd: Add new abs pattern [PR94290]
On Thu, Jul 14, 2022 at 12:38 PM Sam Feifer wrote: > > > > On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski wrote: >> >> On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer wrote: >> > >> > >> > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski wrote: >> >> >> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches >> >> wrote: >> >> > >> >> > This patch is intended to fix a missed optimization in match.pd. It >> >> > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to >> >> > write a second simplification in match.pd to handle the commutative >> >> > property as the match was not ocurring otherwise. Additionally, the >> >> > pattern (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps >> >> > with the other simplification rule. >> >> >> >> You could use :c for the commutative property instead and that should >> >> simplify things. >> >> That is: >> >> >> >> (simplify >> >> (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) >> >> (abs @0)) >> >> >> >> Also since integer_zerop works on vectors, it seems like you should >> >> add a testcase or two for the vector case. >> >> Also would be useful if you write a testcase that uses different >> >> statements rather than one big one so it gets exercised in the >> >> forwprop case. >> >> Note also if either of the max are used more than just in this >> >> simplification, it could increase the lifetime of @0, maybe you need >> >> to add :s to the max expressions. >> >> >> > >> > Thanks for the feedback. I'm not quite sure what a vector test case would >> > look like for this. Could I get some guidance on that? >> >> Yes this should produce the pattern at forwprop1 time (with the C++ >> front-end, the C front-end does not support vector selects): >> typedef int __attribute__((vector_size(4*sizeof(int vint; >> >> vint foo(vint x) { >> vint t = (x >= 0 ? x : 0) ; >> vint xx = -x; >> vint t1 = (xx >= 0 ? xx : 0); >> return t + t1; >> } >> >> int foo(int x) { >> int t = (x >= 0 ? x : 0) ; >> int xx = -x; >> int t1 = (xx >= 0 ? xx : 0); >> return t + t1; >> } >> >> Thanks, >> Andrew Pinski >> > > Thanks for the help. I'm still having trouble with the vector test, though. > When I try to compile, I get an error saying "used vector type where scalar > is required", referring to the max expressions. How do I use the max > expression with two vectors as the inputs? As I mentioned it only works with the C++ front-end :). The C front-end does not support ?: with vectors types. Thanks, Andrew Pinski > > Thanks > -Sam >> >> >> > >> > Thanks >> > -Sam >> > >> >> >> >> Thanks, >> >> Andrew >> >> >> >> > >> >> > Tests are also included to be added to the testsuite. >> >> > >> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? >> >> > >> >> > PR tree-optimization/94290 >> >> > >> >> > gcc/ChangeLog: >> >> > >> >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New >> >> > simplification. >> >> > * match.pd (x <= 0 ? -x : 0): New Simplification. >> >> > >> >> > gcc/testsuite/ChangeLog: >> >> > >> >> > * gcc.c-torture/execute/pr94290-1.c: New test. >> >> > * gcc.dg/pr94290-2.c: New test. >> >> > * gcc.dg/pr94290.c: New test. >> >> > --- >> >> > gcc/match.pd | 15 ++ >> >> > .../gcc.c-torture/execute/pr94290-1.c | 16 +++ >> >> > gcc/testsuite/gcc.dg/pr94290-2.c | 15 ++ >> >> > gcc/testsuite/gcc.dg/pr94290.c| 46 +++ >> >> > 4 files changed, 92 insertions(+) >> >> > create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c >> >> > create mode 100644 gcc/testsuite/gcc.dg/pr94290.c >> >> > >> >> > diff --git a/gcc/match.pd b/gcc/match.pd >> >> > index 45aefd96688..55ca79d7ac9 100644 >> >> > --- a/gcc/match.pd >> >> > +++ b/gcc/match.pd >> >> > @@ -7848,3 +7848,18 @@ and, >> >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0))) >> >> > (bit_and @0 @1) >> >> >(cond (le @0 @1) @0 (bit_and @0 @1)) >> >> > + >> >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ >> >> > +(simplify >> >> > + (plus (max @0 integer_zerop) (max (negate @0) integer_zerop)) >> >> > + (abs @0)) >> >> > + >> >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x. */ >> >> > +(simplify >> >> > + (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) ) >> >> > + (abs @0)) >> >> > + >> >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0). */ >> >> > +(simplify >> >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1) >> >> > + (max (negate @0) @1)) >> >> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> >> > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> >> > new file mode 100644 >> >> > index 000..93b80d569aa >> >> > --- /dev/null >> >> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c >> >> > @@ -0,0 +1,16 @@ >> >> > +/*
Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls
On Thu, 14 Jul 2022, Richard Biener wrote: > Indeed. Guess that's what __builtin_setjmp[_receiver] for SJLJ_EH got > "right". > > When copying a block we do not copy labels so any "jumps" remain to the > original > block and thus we are indeed able to isolate normal control flow. Given that > returns_twice functions _do_ seem to be special, and also special as to how > we handle other abnormal receivers in duplicate_block. We do? Sorry, I don't see what you mean here, can you point me to specific lines? > So it might indeed make sense to special-case them in can_duplicate_block_p > ... (sorry for going back-and-forth ...) > > Note that I think this detail of duplicate_block (the function) and the hook > needs to be better documented (the semantics on incoming edges, not > duplicating > labels used for incoming control flow). > > Can you see as to how to adjust the RTL side for this? It looks like at least > some places set a REG_SETJMP note on call_insns (emit_call_1), I wonder > if in rtl_verify_flow_info[_1] (or its callees) we can check that such > calls come > first ... they might not since IIRC we do _not_ preserve abnormal edges when > expanding RTL (there's some existing bug about this and how this breaks some > setjmp tests) (but we try to recompute them?). No, we emit arguments/return value handling before/after a REG_SETJMP call, and yeah, we don't always properly recompute abnormal edges, so improving RTL in this respect seems hopeless. For example, it is easy enough to create a testcase where bb-reordering duplicates a returns_twice call, although it runs so late that perhaps later passes don't care: // gcc -O2 --param=max-grow-copy-bb-insns=100 __attribute__((returns_twice)) int rtwice(int); int g1(int), g2(int); void f(int i) { do { i = i%2 ? g1(i) : g2(i); } while (i = rtwice(i)); } FWIW, I also investigated https://gcc.gnu.org/PR101347 > Sorry about the back-and-forth again ... your original patch looks OK for the > GIMPLE side but can you amend the cfghooks.{cc,h} documentation to > summarize our findings and > the desired semantics of duplicate_block in this respect? Like below? ---8<--- Subject: [PATCH v3] tree-cfg: do not duplicate returns_twice calls A returns_twice call may have associated abnormal edges that correspond to the "second return" from the call. If the call is duplicated, the copies of those edges also need to be abnormal, but e.g. tracer does not enforce that. Just prohibit the (unlikely to be useful) duplication. gcc/ChangeLog: * cfghooks.cc (duplicate_block): Expand comment. * tree-cfg.cc (gimple_can_duplicate_bb_p): Reject blocks with calls that may return twice. --- gcc/cfghooks.cc | 13 ++--- gcc/tree-cfg.cc | 7 +-- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/gcc/cfghooks.cc b/gcc/cfghooks.cc index e435891fa..c6ac9532c 100644 --- a/gcc/cfghooks.cc +++ b/gcc/cfghooks.cc @@ -1086,9 +1086,16 @@ can_duplicate_block_p (const_basic_block bb) return cfg_hooks->can_duplicate_block_p (bb); } -/* Duplicates basic block BB and redirects edge E to it. Returns the - new basic block. The new basic block is placed after the basic block - AFTER. */ +/* Duplicate basic block BB, place it after AFTER (if non-null) and redirect + edge E to it (if non-null). Return the new basic block. + + If BB contains a returns_twice call, the caller is responsible for recreating + incoming abnormal edges corresponding to the "second return" for the copy. + gimple_can_duplicate_bb_p rejects such blocks, while RTL likes to live + dangerously. + + If BB has incoming abnormal edges for some other reason, their destinations + should be tied to label(s) of the original BB and not the copy. */ basic_block duplicate_block (basic_block bb, edge e, basic_block after, copy_bb_data *id) diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index f846dc2d8..5bcf78198 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -6346,12 +6346,15 @@ gimple_can_duplicate_bb_p (const_basic_block bb) { gimple *g = gsi_stmt (gsi); - /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be + /* Prohibit duplication of returns_twice calls, otherwise associated +abnormal edges also need to be duplicated properly. +An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be duplicated as part of its group, or not at all. The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a group, so the same holds there. */ if (is_gimple_call (g) - && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC) + && (gimple_call_flags (g) & ECF_RETURNS_TWICE + || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC) || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY) || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_
[PATCH, committed] Fortran: error recovery for bad initializers of implied-shape arrays [PR106209]
Dear all, the attached patch introduces error recovery for two cases of using an invalid array in a declaration of an implied-shape array instead of hitting internal errors. Initial patch by Steve. The final version was approved in the PR by Steve. Committed as: https://gcc.gnu.org/g:748f8a8b145dde59c7b63aa68b5a59515b7efc49 Thanks, Harald From 748f8a8b145dde59c7b63aa68b5a59515b7efc49 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Thu, 14 Jul 2022 22:24:55 +0200 Subject: [PATCH] Fortran: error recovery for bad initializers of implied-shape arrays [PR106209] gcc/fortran/ChangeLog: PR fortran/106209 * decl.cc (add_init_expr_to_sym): Handle bad initializers for implied-shape arrays. gcc/testsuite/ChangeLog: PR fortran/106209 * gfortran.dg/pr106209.f90: New test. Co-authored-by: Steven G. Kargl --- gcc/fortran/decl.cc| 15 +-- gcc/testsuite/gfortran.dg/pr106209.f90 | 9 + 2 files changed, 22 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/pr106209.f90 diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc index 339f8b15035..b6400514731 100644 --- a/gcc/fortran/decl.cc +++ b/gcc/fortran/decl.cc @@ -2129,10 +2129,21 @@ add_init_expr_to_sym (const char *name, gfc_expr **initp, locus *var_locus) /* The shape may be NULL for EXPR_ARRAY, set it. */ if (init->shape == NULL) { - gcc_assert (init->expr_type == EXPR_ARRAY); + if (init->expr_type != EXPR_ARRAY) + { + gfc_error ("Bad shape of initializer at %L", &init->where); + return false; + } + init->shape = gfc_get_shape (1); if (!gfc_array_size (init, &init->shape[0])) - gfc_internal_error ("gfc_array_size failed"); + { + gfc_error ("Cannot determine shape of initializer at %L", + &init->where); + free (init->shape); + init->shape = NULL; + return false; + } } for (dim = 0; dim < sym->as->rank; ++dim) diff --git a/gcc/testsuite/gfortran.dg/pr106209.f90 b/gcc/testsuite/gfortran.dg/pr106209.f90 new file mode 100644 index 000..44f9233ec2f --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr106209.f90 @@ -0,0 +1,9 @@ +! { dg-do compile } +! PR fortran/106209 - ICE in add_init_expr_to_sym +! Contributed by G.Steinmetz + +program p + integer, parameter :: a(:) = 0 ! { dg-error "of deferred shape" } + integer, parameter :: b(*) = a ! { dg-error "Bad shape of initializer" } + integer, parameter :: c(*) = [a] ! { dg-error "Cannot determine shape" } +end -- 2.35.3
Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."
On Mon, 20 Jun 2022 20:48:50 PDT (-0700), gcc-patches@gcc.gnu.org wrote: On Mon, Jun 20, 2022 at 1:21 AM Kito Cheng wrote: Generally I agree we should fix that by GCC driver rather than ld emulation, but I think this should be reverted with the -L path fix, otherwise that will break multilib on GNU toolchain for linux immediately? IIUC just changing this will break even non-multlib systems, though it's possible that the symlinks work around that sufficiently. Thanks for the good consideration. That said, I am unsure any distro uses this currently. I think some just work around the possibly non-existent paths by creating symlinks. Perhaps we should prioritize on fixing the scheme before distros start to rely on the behavior. I'm kind of torn on this one: this has been around for a while and dropping it would be an ABI break, but the feedback from distro folks is pretty consistently that multlib is broken on RISC-V. If it's really unusably broken then I could buy the argument that there's no binaries (and thus no ABI to break), but there's at some base multilib functionality working -- I build multilib cross toolchains regularly, for example, and they can build simple stuff. I always find making that "nobody's used it" argument really hard, there's just too many users to try and track everyone down. We're in kind of a weird spot with RISC-V in general when it comes to ABI stuff: we were probably a bit overly optimistic about how fast any of this was going to get used when we committed to the ABI freeze, but any ABi break has such a huge potential for user headaches that I'm not sure it's going to be possible. It means we're stuck with some baggage, and while it's a headache to keep around stuff that's probably not all that useful I think it's just what we've got to live with. If multlib really is so broken it's not fixable without an ABI break then I guess there's no other option, but I think in this case we have some: One option would be to add an ld argument that says to turn off the emulation-specific path resolution, which we could then add to LINK_SPEC when we get the library paths sorted out? We'd still have the emulations and the subdirs, but at least we wouldn't need a flag day. Another option would be to add new multlib paths that don't have the subdirectories, as last I checked that was an issue for distros (violates FHS, breaks build systems, etc). If we're going to do that anyway then we could piggyback the new behavior on it and deprecate the old paths along with whatever behavior is associated with them. On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches wrote: > > This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc. > > The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962 > let GCC pass -m emulation to ld and let the ld emulation configure > default library paths. This scheme is problematic: > > * It's not ld's business to specify default -L. Different platforms have > different opinions on the hierarchy and all other arches work well without ld's > default -L. > * If some ABI derived library paths are desired, the compiler driver is in a > better position to make the decision and traditionally has done this. > * -m emulation is opaque to the compiler driver. It doesn't affect -B, so > data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not affected. > > As is, many platforms just use symlinks to fake the lib64/{ilp32{,f},lp64{,f}} > hierarchies needed by the GNU ld emulation. They can always specify -L > explicitly if they want some ABI derived library paths. See also the rejected > https://reviews.llvm.org/D95755 I don't do a lot of LLVM stuff, but that has a green check mark that says "accepted" at the top. Does that mean it was merged somewhere, or just that it was acked/reviewed and then dropped? > > gcc/Changelog: > > * config/riscv/linux.h (LD_EMUL_SUFFIX): Remove. > (LINK_SPEC): Remove LD_EMUL_SUFFIX. > --- > gcc/config/riscv/linux.h | 10 +- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h > index 38803723ba9..e0ff6e6a178 100644 > --- a/gcc/config/riscv/linux.h > +++ b/gcc/config/riscv/linux.h > @@ -49,16 +49,8 @@ along with GCC; see the file COPYING3. If not see > > #define CPP_SPEC "%{pthread:-D_REENTRANT}" > > -#define LD_EMUL_SUFFIX \ > - "%{mabi=lp64d:}" \ > - "%{mabi=lp64f:_lp64f}" \ > - "%{mabi=lp64:_lp64}" \ > - "%{mabi=ilp32d:}" \ > - "%{mabi=ilp32f:_ilp32f}" \ > - "%{mabi=ilp32:_ilp32}" > - > #define LINK_SPEC "\ > --melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \ > +-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \ > %{mno-relax:--no-relax} \ > %{mbig-endian:-EB} \ > %{mlittle-endian:-EL} \ > -- > 2.36.1.476.g0c4daa206d-goog > -- 宋方睿
Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."
On 2022-07-14, Palmer Dabbelt wrote: On Mon, 20 Jun 2022 20:48:50 PDT (-0700), gcc-patches@gcc.gnu.org wrote: On Mon, Jun 20, 2022 at 1:21 AM Kito Cheng wrote: Generally I agree we should fix that by GCC driver rather than ld emulation, but I think this should be reverted with the -L path fix, otherwise that will break multilib on GNU toolchain for linux immediately? IIUC just changing this will break even non-multlib systems, though it's possible that the symlinks work around that sufficiently. Thanks for the good consideration. That said, I am unsure any distro uses this currently. I think some just work around the possibly non-existent paths by creating symlinks. Perhaps we should prioritize on fixing the scheme before distros start to rely on the behavior. I'm kind of torn on this one: this has been around for a while and dropping it would be an ABI break, but the feedback from distro folks is pretty consistently that multlib is broken on RISC-V. If it's really unusably broken then I could buy the argument that there's no binaries (and thus no ABI to break), but there's at some base multilib functionality working -- I build multilib cross toolchains regularly, for example, and they can build simple stuff. I always find making that "nobody's used it" argument really hard, there's just too many users to try and track everyone down. We're in kind of a weird spot with RISC-V in general when it comes to ABI stuff: we were probably a bit overly optimistic about how fast any of this was going to get used when we committed to the ABI freeze, but any ABi break has such a huge potential for user headaches that I'm not sure it's going to be possible. It means we're stuck with some baggage, and while it's a headache to keep around stuff that's probably not all that useful I think it's just what we've got to live with. If multlib really is so broken it's not fixable without an ABI break then I guess there's no other option, but I think in this case we have some: One option would be to add an ld argument that says to turn off the emulation-specific path resolution, which we could then add to LINK_SPEC when we get the library paths sorted out? We'd still have the emulations and the subdirs, but at least we wouldn't need a flag day. Another option would be to add new multlib paths that don't have the subdirectories, as last I checked that was an issue for distros (violates FHS, breaks build systems, etc). If we're going to do that anyway then we could piggyback the new behavior on it and deprecate the old paths along with whatever behavior is associated with them. Thanks for chiming in and providing the good consideration. I don't do much GNU development. I just spotted something and think it should be fixed. If dropping -L is considered a risk, an alternative scheme is that someone: * create an alternative patch making GCC driver pass the -L to ld and use the plain emulation name. * a configure option can possibly be added to guard whether the -L is added at all * drop the excessive emulations in GNU ld. This way an alternative linker implementation doesn't have to add a compatibility no-op option. I appreciate anyone who wants to step up and helps removing eelf64lriscv_{lp64,ilp32f,...}. We will have "*64briscv*" variants and it will then look really ugly. On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches wrote: This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc. The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962 let GCC pass -m emulation to ld and let the ld emulation configure default library paths. This scheme is problematic: * It's not ld's business to specify default -L. Different platforms have different opinions on the hierarchy and all other arches work well without ld's default -L. * If some ABI derived library paths are desired, the compiler driver is in a better position to make the decision and traditionally has done this. * -m emulation is opaque to the compiler driver. It doesn't affect -B, so data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not affected. As is, many platforms just use symlinks to fake the lib64/{ilp32{,f},lp64{,f}} hierarchies needed by the GNU ld emulation. They can always specify -L explicitly if they want some ABI derived library paths. See also the rejected https://reviews.llvm.org/D95755 I don't do a lot of LLVM stuff, but that has a green check mark that says "accepted" at the top. Does that mean it was merged somewhere, or just that it was acked/reviewed and then dropped? I could land it, but I realized that would be a mistake so I stopped. There has since been no real need for this feature, either. I wished that some RISC-V folks stepped up and fixed it. It took so long so I ended up sending this patch myself, but I have no bandwidth to change this patch to a form providing better compatibility gcc/Changelog: * config/ri
Re: [PATCH v3] Simplify memchr with small constant strings
On Wed, Jul 13, 2022 at 11:42 PM Richard Biener wrote: > > On Wed, Jul 13, 2022 at 6:50 PM H.J. Lu wrote: > > > > When memchr is applied on a constant string of no more than the bytes of > > a word, simplify memchr by checking each byte in the constant string. > > > > int f (int a) > > { > >return __builtin_memchr ("AE", a, 2) != 0; > > } > > > > is simplified to > > > > int f (int a) > > { > > return ((char) a == 'A' || (char) a == 'E') != 0; > > } > > > > gcc/ > > > > PR tree-optimization/103798 > > * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h". > > (simplify_builtin_call): Inline memchr with constant strings of > > no more than the bytes of a word. > > * tree-ssa-strlen.cc (use_in_zero_equality): Make it global. > > * tree-ssa-strlen.h (use_in_zero_equality): New. > > > > gcc/testsuite/ > > > > PR tree-optimization/103798 > > * c-c++-common/pr103798-1.c: New test. > > * c-c++-common/pr103798-2.c: Likewise. > > * c-c++-common/pr103798-3.c: Likewise. > > * c-c++-common/pr103798-4.c: Likewise. > > * c-c++-common/pr103798-5.c: Likewise. > > * c-c++-common/pr103798-6.c: Likewise. > > * c-c++-common/pr103798-7.c: Likewise. > > * c-c++-common/pr103798-8.c: Likewise. > > * c-c++-common/pr103798-9.c: Likewise. > > * c-c++-common/pr103798-10.c: Likewise. > > --- > > gcc/testsuite/c-c++-common/pr103798-1.c | 28 + > > gcc/testsuite/c-c++-common/pr103798-10.c | 10 > > gcc/testsuite/c-c++-common/pr103798-2.c | 30 ++ > > gcc/testsuite/c-c++-common/pr103798-3.c | 28 + > > gcc/testsuite/c-c++-common/pr103798-4.c | 28 + > > gcc/testsuite/c-c++-common/pr103798-5.c | 26 + > > gcc/testsuite/c-c++-common/pr103798-6.c | 27 + > > gcc/testsuite/c-c++-common/pr103798-7.c | 27 + > > gcc/testsuite/c-c++-common/pr103798-8.c | 27 + > > gcc/testsuite/c-c++-common/pr103798-9.c | 10 > > gcc/tree-ssa-forwprop.cc | 73 > > gcc/tree-ssa-strlen.cc | 4 +- > > gcc/tree-ssa-strlen.h| 2 + > > 13 files changed, 318 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-10.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c > > create mode 100644 gcc/testsuite/c-c++-common/pr103798-9.c > > > > diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c > > b/gcc/testsuite/c-c++-common/pr103798-1.c > > new file mode 100644 > > index 000..cd3edf569fc > > --- /dev/null > > +++ b/gcc/testsuite/c-c++-common/pr103798-1.c > > @@ -0,0 +1,28 @@ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ > > + > > +__attribute__ ((weak)) > > +int > > +f (char a) > > +{ > > + return __builtin_memchr ("a", a, 1) == 0; > > +} > > + > > +__attribute__ ((weak)) > > +int > > +g (char a) > > +{ > > + return a != 'a'; > > +} > > + > > +int > > +main () > > +{ > > + for (int i = 0; i < 255; i++) > > + if (f (i) != g (i)) > > + __builtin_abort (); > > + > > + return 0; > > +} > > + > > +/* { dg-final { scan-assembler-not "memchr" } } */ > > diff --git a/gcc/testsuite/c-c++-common/pr103798-10.c > > b/gcc/testsuite/c-c++-common/pr103798-10.c > > new file mode 100644 > > index 000..4677e9539fa > > --- /dev/null > > +++ b/gcc/testsuite/c-c++-common/pr103798-10.c > > @@ -0,0 +1,10 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-Os -fdump-tree-optimized -save-temps" } */ > > + > > +int > > +f (char a) > > +{ > > + return __builtin_memchr ("ac", a, 1) == 0; > > +} > > + > > +/* { dg-final { scan-assembler "memchr" } } */ > > diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c > > b/gcc/testsuite/c-c++-common/pr103798-2.c > > new file mode 100644 > > index 000..e7e99c3679e > > --- /dev/null > > +++ b/gcc/testsuite/c-c++-common/pr103798-2.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ > > + > > +#include > > + > > +__attribute__ ((weak)) > > +int > > +f (int a) > > +{ > > + return memchr ("aE", a, 2) != NULL; > > +} > > + > > +__attribute__ ((weak)) > > +int > > +g (char a) > > +{ > > + return a == 'a' || a == 'E'; > > +} > > + > > +int > > +main () > > +{ > > + for (int i = 0; i < 255; i++) > > + if (f (i + 256) != g (i + 256)) > > + __builtin_abort (); > > + > > + return 0; > > +} > > + > > +/* { dg-final { scan-asse
[PATCH] libcpp: Improve encapsulation of label_text
I'm not sure if label_text::get () is the best name for the new accessor. Other options include buffer () and c_str () but I don't see a compelling reason to prefer either of those. As a follow-up patch we could change label_text::m_buffer (and pod_label_text::m_buffer) to be const char*, since those labels are never modified once a label_text takes ownership of it. That would make it clear to callers that they are not supposed to modify the contents, and would remove the const_cast currently used by label_text::borrow (), which is a code smell (returning a non-const char* makes it look like it's OK to write into it, which is definitely not true for a borrowed string that was originally a string literal). That would require using const_cast when passing the buffer to free for deallocation, but that's fine, and avoids the impression that the object holds a modifiable string buffer. Tested x86_64-linux, OK for trunk? -- >8 -- This adjusts the API of label_text so that the data members are private and cannot be modified by callers. Add accessors for them instead. Also rename moved_from () to the more idiomatic release (). Also remove the unused take_or_copy () member function which has confusing ownership semantics. gcc/analyzer/ChangeLog: * call-info.cc (call_info::print): Adjust to new label_text API. * checker-path.cc (checker_event::dump): Likewise. (region_creation_event::get_desc): Likewise. (state_change_event::get_desc): Likewise. (superedge_event::should_filter_p): Likewise. (start_cfg_edge_event::get_desc): Likewise. (call_event::get_desc): Likewise. (return_event::get_desc): Likewise. (warning_event::get_desc): Likewise. (checker_path::dump): Likewise. (checker_path::debug): Likewise. * diagnostic-manager.cc (diagnostic_manager::prune_for_sm_diagnostic): Likewise. (diagnostic_manager::prune_interproc_events): Likewise. * engine.cc (feasibility_state::maybe_update_for_edge): Likewise. * program-state.cc (sm_state_map::to_json): Likewise. * region-model-impl-calls.cc (region_model::impl_call_analyzer_describe): Likewise. (region_model::impl_call_analyzer_dump_capacity): Likewise. * region.cc (region::to_json): Likewise. * sm-malloc.cc (inform_nonnull_attribute): Likewise. * store.cc (binding_map::to_json): Likewise. (store::to_json): Likewise. * supergraph.cc (superedge::dump): Likewise. * svalue.cc (svalue::to_json): Likewise. gcc/c-family/ChangeLog: * c-format.cc (class range_label_for_format_type_mismatch): Adjust to new label_text API. gcc/ChangeLog: * diagnostic-format-json.cc (json_from_location_range): Adjust to new label_text API. * diagnostic-format-sarif.cc (sarif_builder::make_location_object): Likewise. * diagnostic-show-locus.cc (struct pod_label_text): Likewise. (layout::print_any_labels): Likewise. * tree-diagnostic-path.cc (class path_label): Likewise. (struct event_range): Likewise. (default_tree_diagnostic_path_printer): Likewise. (default_tree_make_json_for_path): Likewise. libcpp/ChangeLog: * include/line-map.h (label_text::take_or_copy): Remove. (label_text::moved_from): Rename to release. (label_text::m_buffer, label_text::m_owned): Make private. (label_text::get, label_text::is_owned): New accessors. --- gcc/analyzer/call-info.cc | 2 +- gcc/analyzer/checker-path.cc| 46 - gcc/analyzer/diagnostic-manager.cc | 20 +-- gcc/analyzer/engine.cc | 2 +- gcc/analyzer/program-state.cc | 2 +- gcc/analyzer/region-model-impl-calls.cc | 4 +-- gcc/analyzer/region.cc | 2 +- gcc/analyzer/sm-malloc.cc | 10 +++--- gcc/analyzer/store.cc | 6 ++-- gcc/analyzer/supergraph.cc | 4 +-- gcc/analyzer/svalue.cc | 2 +- gcc/c-family/c-format.cc| 4 +-- gcc/diagnostic-format-json.cc | 4 +-- gcc/diagnostic-format-sarif.cc | 2 +- gcc/diagnostic-show-locus.cc| 6 ++-- gcc/tree-diagnostic-path.cc | 16 - libcpp/include/line-map.h | 27 --- 17 files changed, 80 insertions(+), 79 deletions(-) diff --git a/gcc/analyzer/call-info.cc b/gcc/analyzer/call-info.cc index e1142d743a3..efc070b8bed 100644 --- a/gcc/analyzer/call-info.cc +++ b/gcc/analyzer/call-info.cc @@ -75,7 +75,7 @@ void call_info::print (pretty_printer *pp) const { label_text desc (get_desc (pp_show_color (pp))); - pp_string (pp, desc.m_buffer); + pp_string (pp, desc.get ()); } /* Implementation of custom_edge_info::add_events_to_path vfunc for diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc index 21
Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.
On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote: > I have applied the patch to GCC 12. > > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 > | From: Michael Meissner > | Date: Thu, 14 Jul 2022 11:16:08 -0400 > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for > block copies. > > Testing has found that using load and store vector pair for block copies > can result in a slow down on power10. This patch disables using the > vector pair instructions for block copies if we are tuning for power10. > > 2022-06-11 Michael Meissner > > gcc/ > > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do > not generate block copies with vector pair instructions if we are > tuning for power10. Back port from master branch. You never posted the trunk version of this, so that never was approved either. > +++ b/gcc/config/rs6000/rs6000.cc > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) > >if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) > { > - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) > + /* Do not generate lxvp and stxvp on power10 since there are some > + performance issues. */ > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > + && rs6000_tune != PROCESSOR_POWER10) > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; >else > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; The TARGET_MMA in that should not be there. Please fix that (that probably needs more changes). This statement does the opposite of what the comment says. Please fix this. On trunk, first. Segher
Re: [PATCH] libcpp: Improve encapsulation of label_text
On Thu, 2022-07-14 at 22:10 +0100, Jonathan Wakely wrote: Thanks for the patch. > I'm not sure if label_text::get () is the best name for the new > accessor. Other options include buffer () and c_str () but I don't > see a > compelling reason to prefer either of those. label_text::get should return a const char *, rather than a char *. I don't think anything uses label_text::is_owner; do we need that? > > As a follow-up patch we could change label_text::m_buffer (and > pod_label_text::m_buffer) to be const char*, since those labels are > never modified once a label_text takes ownership of it. That would > make it clear to callers that they are not supposed to modify the > contents, and would remove the const_cast currently used by > label_text::borrow (), which is a code smell (returning a non-const > char* makes it look like it's OK to write into it, which is > definitely > not true for a borrowed string that was originally a string literal). > That would require using const_cast when passing the buffer to free > for > deallocation, but that's fine, and avoids the impression that the > object > holds a modifiable string buffer. I'm not sure I agree with your reasoning about the proposed follow-up, but the patch below is OK for trunk, with the above nits fixed. Thanks Dave > > Tested x86_64-linux, OK for trunk? > > > -- >8 -- > > This adjusts the API of label_text so that the data members are > private > and cannot be modified by callers. Add accessors for them instead. > Also rename moved_from () to the more idiomatic release (). Also > remove > the unused take_or_copy () member function which has confusing > ownership > semantics. > > gcc/analyzer/ChangeLog: > > * call-info.cc (call_info::print): Adjust to new label_text > API. > * checker-path.cc (checker_event::dump): Likewise. > (region_creation_event::get_desc): Likewise. > (state_change_event::get_desc): Likewise. > (superedge_event::should_filter_p): Likewise. > (start_cfg_edge_event::get_desc): Likewise. > (call_event::get_desc): Likewise. > (return_event::get_desc): Likewise. > (warning_event::get_desc): Likewise. > (checker_path::dump): Likewise. > (checker_path::debug): Likewise. > * diagnostic-manager.cc > (diagnostic_manager::prune_for_sm_diagnostic): > Likewise. > (diagnostic_manager::prune_interproc_events): Likewise. > * engine.cc (feasibility_state::maybe_update_for_edge): > Likewise. > * program-state.cc (sm_state_map::to_json): Likewise. > * region-model-impl-calls.cc > (region_model::impl_call_analyzer_describe): Likewise. > (region_model::impl_call_analyzer_dump_capacity): Likewise. > * region.cc (region::to_json): Likewise. > * sm-malloc.cc (inform_nonnull_attribute): Likewise. > * store.cc (binding_map::to_json): Likewise. > (store::to_json): Likewise. > * supergraph.cc (superedge::dump): Likewise. > * svalue.cc (svalue::to_json): Likewise. > > gcc/c-family/ChangeLog: > > * c-format.cc (class range_label_for_format_type_mismatch): > Adjust to new label_text API. > > gcc/ChangeLog: > > * diagnostic-format-json.cc (json_from_location_range): > Adjust > to new label_text API. > * diagnostic-format-sarif.cc > (sarif_builder::make_location_object): > Likewise. > * diagnostic-show-locus.cc (struct pod_label_text): Likewise. > (layout::print_any_labels): Likewise. > * tree-diagnostic-path.cc (class path_label): Likewise. > (struct event_range): Likewise. > (default_tree_diagnostic_path_printer): Likewise. > (default_tree_make_json_for_path): Likewise. > > libcpp/ChangeLog: > > * include/line-map.h (label_text::take_or_copy): Remove. > (label_text::moved_from): Rename to release. > (label_text::m_buffer, label_text::m_owned): Make private. > (label_text::get, label_text::is_owned): New accessors. > --- > gcc/analyzer/call-info.cc | 2 +- > gcc/analyzer/checker-path.cc | 46 --- > -- > gcc/analyzer/diagnostic-manager.cc | 20 +-- > gcc/analyzer/engine.cc | 2 +- > gcc/analyzer/program-state.cc | 2 +- > gcc/analyzer/region-model-impl-calls.cc | 4 +-- > gcc/analyzer/region.cc | 2 +- > gcc/analyzer/sm-malloc.cc | 10 +++--- > gcc/analyzer/store.cc | 6 ++-- > gcc/analyzer/supergraph.cc | 4 +-- > gcc/analyzer/svalue.cc | 2 +- > gcc/c-family/c-format.cc | 4 +-- > gcc/diagnostic-format-json.cc | 4 +-- > gcc/diagnostic-format-sarif.cc | 2 +- > gcc/diagnostic-show-locus.cc | 6 ++-- > gcc/tree-diagnostic-path.cc | 16 - > libcpp/include/line-map.h
Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.
On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote: > On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote: > > I have applied the patch to GCC 12. > > > > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 > > | From: Michael Meissner > > | Date: Thu, 14 Jul 2022 11:16:08 -0400 > > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs > > for block copies. > > > > Testing has found that using load and store vector pair for block copies > > can result in a slow down on power10. This patch disables using the > > vector pair instructions for block copies if we are tuning for power10. > > > > 2022-06-11 Michael Meissner > > > > gcc/ > > > > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do > > not generate block copies with vector pair instructions if we are > > tuning for power10. Back port from master branch. > > You never posted the trunk version of this, so that never was approved > either. I did post the trunk version on June 10th, and your only comment was fix the commit message, which I thought I did in the commit. > > +++ b/gcc/config/rs6000/rs6000.cc > > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) > > > >if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) > > { > > - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) > > + /* Do not generate lxvp and stxvp on power10 since there are some > > +performance issues. */ > > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > > + && rs6000_tune != PROCESSOR_POWER10) > > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > >else > > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > The TARGET_MMA in that should not be there. Please fix that (that > probably needs more changes). All of the movoo and movxo support require TARGET_MMA as does the code in rs6000-string.cc that could possibly generate load/store vector pair. To remove the check here would mean also fixing all of the vector load and store pairs in mma.md. > This statement does the opposite of what the comment says. > > Please fix this. On trunk, first. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
[PATCH] libphobos: Fix instability in the parallelized testsuite
Hello- I get a different number of test results from libphobos.unittest/unittest.exp, depending on server load. I believe it's because this testsuite doesn't check runtest_file_p: $ make -j 1 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#' # of expected passes 10 $ make -j 2 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#' # of expected passes 10 # of expected passes 10 $ make -j 4 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#' # of expected passes 10 # of expected passes 10 # of expected passes 10 # of expected passes 10 When running in parallel along with other tests, even at a fixed argument for -j, the number of tests that actually execute will depend on how many of the parallel sub-makes happened to start prior to the first one finishing, hence it changes from run to run. The attached patch fixes it for me, if it looks OK? Thanks, this would remove some noise from before/after test comparisons. -Lewis libphobos: Fix instability in the parallelized testsuite libphobos.unittest/unittest.exp calls bare dg-test rather than dg-runtest, and so it should call runtest_file_p to determine whether to run each test or not. Without that call, the tests run too many times in parallel mode (they will run as many times, as the argument to make -j). libphobos/ChangeLog: * testsuite/libphobos.unittest/unittest.exp: Call runtest_file_p prior to running each test. diff --git a/libphobos/testsuite/libphobos.unittest/unittest.exp b/libphobos/testsuite/libphobos.unittest/unittest.exp index 2a019caca8c..175decdc333 100644 --- a/libphobos/testsuite/libphobos.unittest/unittest.exp +++ b/libphobos/testsuite/libphobos.unittest/unittest.exp @@ -42,6 +42,9 @@ foreach unit_test $unit_test_list { set expected_fail [lindex $unit_test 1] foreach test $tests { + if {![runtest_file_p $runtests $test]} { +continue +} set shouldfail $expected_fail dg-test $test "" $test_flags }
[PATCH] stack-protector: Check stack canary for noreturn function
Check stack canary for noreturn function to catch stack corruption before calling noreturn function. For C++, check stack canary when throwing exception or resuming stack unwind to avoid corrupted stack. gcc/ PR middle-end/58245 * calls.cc (expand_call): Check stack canary for noreturn function. gcc/testsuite/ PR middle-end/58245 * c-c++-common/pr58245-1.c: New test. * g++.dg/pr58245-1.C: Likewise. * g++.dg/fstack-protector-strong.C: Adjusted. --- gcc/calls.cc | 7 ++- gcc/testsuite/c-c++-common/pr58245-1.c | 12 gcc/testsuite/g++.dg/fstack-protector-strong.C | 2 +- gcc/testsuite/g++.dg/pr58245-1.C | 10 ++ 4 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/pr58245-1.c create mode 100644 gcc/testsuite/g++.dg/pr58245-1.C diff --git a/gcc/calls.cc b/gcc/calls.cc index bc96aff38f0..7816c2c8d99 100644 --- a/gcc/calls.cc +++ b/gcc/calls.cc @@ -3154,7 +3154,12 @@ expand_call (tree exp, rtx target, int ignore) if (pass && (flags & ECF_MALLOC)) start_sequence (); - if (pass == 0 + /* Check the canary value for sibcall or function which doesn't +return. */ + if ((pass == 0 + || ((flags & ECF_NORETURN) != 0 + && (fndecl + != get_callee_fndecl (targetm.stack_protect_fail () && crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ()) stack_protect_epilogue (); diff --git a/gcc/testsuite/c-c++-common/pr58245-1.c b/gcc/testsuite/c-c++-common/pr58245-1.c new file mode 100644 index 000..945acc53004 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr58245-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */ +/* { dg-options "-O2 -fstack-protector-all" } */ + +extern void foo (void) __attribute__ ((noreturn)); + +void +bar (void) +{ + foo (); +} + +/* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */ diff --git a/gcc/testsuite/g++.dg/fstack-protector-strong.C b/gcc/testsuite/g++.dg/fstack-protector-strong.C index ae6d2fdb8df..034af2ce9ab 100644 --- a/gcc/testsuite/g++.dg/fstack-protector-strong.C +++ b/gcc/testsuite/g++.dg/fstack-protector-strong.C @@ -85,4 +85,4 @@ int foo7 (B *p) return p->return_slot ().a1; } -/* { dg-final { scan-assembler-times "stack_chk_fail" 7 } } */ +/* { dg-final { scan-assembler-times "stack_chk_fail" 8 } } */ diff --git a/gcc/testsuite/g++.dg/pr58245-1.C b/gcc/testsuite/g++.dg/pr58245-1.C new file mode 100644 index 000..1439bc62e71 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr58245-1.C @@ -0,0 +1,10 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */ +/* { dg-options "-O2 -fstack-protector-all" } */ + +void +bar (void) +{ + throw 1; +} + +/* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */ -- 2.36.1
Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.
On Thu, Jul 14, 2022 at 05:49:57PM -0400, Michael Meissner wrote: > On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote: > > You never posted the trunk version of this, so that never was approved > > either. > > I did post the trunk version on June 10th, and your only comment was fix the > commit message, which I thought I did in the commit. I did not approve the patch. Of course not, I didn't even get as far as reading it. You should have fixed it and sent again, I did not approve anything. > > > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > > > + && rs6000_tune != PROCESSOR_POWER10) > > > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > >else > > > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > > > The TARGET_MMA in that should not be there. Please fix that (that > > probably needs more changes). > > All of the movoo and movxo support require TARGET_MMA as does the code in > rs6000-string.cc that could possibly generate load/store vector pair. And all that is wrong and should be fixed. > To > remove the check here would mean also fixing all of the vector load and store > pairs in mma.md. That is wha I said, yes,. > > This statement does the opposite of what the comment says. > > > > Please fix this. On trunk, first. This is the core problem with this patch: it is simply wrong. It is a very roundabout way of saying "only enable vector pairs if -mcpu=power10 but -mtune=somethingelse". Which is not a sensible thing to do, and not what the comment says either. Segher
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
On 7/14/22 8:09 AM, Jose E. Marchesi wrote: Hi Yonghong. On 7/7/22 1:24 PM, Jose E. Marchesi wrote: Hi Yonghong. On 6/21/22 9:12 AM, Jose E. Marchesi wrote: On 6/17/22 10:18 AM, Jose E. Marchesi wrote: Hi Yonghong. On 6/15/22 1:57 PM, David Faust wrote: On 6/14/22 22:53, Yonghong Song wrote: On 6/7/22 2:43 PM, David Faust wrote: Hello, This patch series adds support for: - Two new C-language-level attributes that allow to associate (to "annotate" or to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types. - The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation. - The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM. Purpose === 1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields. The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'. For example, consider the linux kernel function do_execve with the following declaration: static int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp); Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilites such as the eBPF verifier can read the tags and make use of the information. 2) Conveying the tags in the generated DWARF debug info. The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source: ++ BTF BTF +--+ | pahole |---> vmlinux.btf --->| verifier | ++ +--+ ^^ || DWARF |BTF | || vmlinux +-+ module1.ko | BPF program | module2.ko +-+ ... This is because: a) Unlike GCC, LLVM will only generate BTF for BPF programs. b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker. In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF). Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel. 3) Conveying the tags in the generated BTF debug info. This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs. For more information about these tags and the motivation behind them, please refer to the following linux kernel discussions: https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/ https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/ https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/ Implementation Overview === To enable these annotations, two new C language attributes are added: __attribute__((debug_annotate_decl("foo"))) and __attribute__((debug_annotate_type("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debug information. They have no effect on code generation. Note that we are not using the same attribute names as LLVM (btf_decl_tag and btf_type_tag, respectively). While these attributes are
Re: [PATCH, committed] Fortran: error recovery for bad initializers of implied-shape arrays [PR106209]
Hi Herald, Looks good to me. I have always preferred informative messages. Thanks, Jerry On 7/14/22 1:34 PM, Harald Anlauf via Fortran wrote: Dear all, the attached patch introduces error recovery for two cases of using an invalid array in a declaration of an implied-shape array instead of hitting internal errors. Initial patch by Steve. The final version was approved in the PR by Steve. Committed as: https://gcc.gnu.org/g:748f8a8b145dde59c7b63aa68b5a59515b7efc49 Thanks, Harald
Re: [PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]
On Thu, Jul 14, 2022 at 2:11 PM Kong, Lingling via Gcc-patches wrote: > > Hi, > > The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result. These > intrinsics have changed over time, like `_mm_comieq_ss ` old operation is > `RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation update is `RETURN > ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0`. > > OK for master? All _mm_comiXX_ss uses order_compare except for mm_comine_ss which uses unordered_compare, now it's aligned with intrinsic guide. Ok for trunk. > > gcc/ChangeLog: > > PR target/106113 > * config/i386/i386-builtin.def (BDESC): Fix [u]comi{ss,sd} > comparison due to intrinsics changed over time. > * config/i386/i386-expand.cc (ix86_ssecom_setcc): > Add unordered check and mode for sse comi codegen. > (ix86_expand_sse_comi): Add unordered check and check a different > CCmode. > (ix86_expand_sse_comi_round):Extract unordered check and mode part > in ix86_ssecom_setcc. > > gcc/testsuite/ChangeLog: > > PR target/106113 > * gcc.target/i386/avx-vcomisd-pr106113-2.c: New test. > * gcc.target/i386/avx-vcomiss-pr106113-2.c: Ditto. > * gcc.target/i386/avx-vucomisd-pr106113-2.c: Ditto. > * gcc.target/i386/avx-vucomiss-pr106113-2.c: Ditto. > * gcc.target/i386/sse-comiss-pr106113-1.c: Ditto. > * gcc.target/i386/sse-comiss-pr106113-2.c: Ditto. > * gcc.target/i386/sse-ucomiss-pr106113-1.c: Ditto. > * gcc.target/i386/sse-ucomiss-pr106113-2.c: Ditto. > * gcc.target/i386/sse2-comisd-pr106113-1.c: Ditto. > * gcc.target/i386/sse2-comisd-pr106113-2.c: Ditto. > * gcc.target/i386/sse2-ucomisd-pr106113-1.c: Ditto. > * gcc.target/i386/sse2-ucomisd-pr106113-2.c: Ditto. > --- > gcc/config/i386/i386-builtin.def | 32 ++-- > gcc/config/i386/i386-expand.cc| 140 +++--- > .../gcc.target/i386/avx-vcomisd-pr106113-2.c | 8 + > .../gcc.target/i386/avx-vcomiss-pr106113-2.c | 8 + > .../gcc.target/i386/avx-vucomisd-pr106113-2.c | 8 + > .../gcc.target/i386/avx-vucomiss-pr106113-2.c | 8 + > .../gcc.target/i386/sse-comiss-pr106113-1.c | 19 +++ > .../gcc.target/i386/sse-comiss-pr106113-2.c | 59 > .../gcc.target/i386/sse-ucomiss-pr106113-1.c | 19 +++ > .../gcc.target/i386/sse-ucomiss-pr106113-2.c | 59 > .../gcc.target/i386/sse2-comisd-pr106113-1.c | 19 +++ > .../gcc.target/i386/sse2-comisd-pr106113-2.c | 59 > .../gcc.target/i386/sse2-ucomisd-pr106113-1.c | 19 +++ > .../gcc.target/i386/sse2-ucomisd-pr106113-2.c | 59 > 14 files changed, 450 insertions(+), 66 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomisd-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomiss-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomisd-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomiss-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-2.c > > diff --git a/gcc/config/i386/i386-builtin.def > b/gcc/config/i386/i386-builtin.def > index fd160935e67..acb7e8ca64b 100644 > --- a/gcc/config/i386/i386-builtin.def > +++ b/gcc/config/i386/i386-builtin.def > @@ -35,30 +35,30 @@ > IX86_BUILTIN__BDESC_##NEXT_KIND##_FIRST - 1. */ > > BDESC_FIRST (comi, COMI, > - OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", > IX86_BUILTIN_COMIEQSS, UNEQ, 0) > -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", > IX86_BUILTIN_COMILTSS, UNLT, 0) > -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", > IX86_BUILTIN_COMILESS, UNLE, 0) > + OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", > IX86_BUILTIN_COMIEQSS, EQ, 0) > +BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", > IX86_BUILTIN_COMILTSS, LT, 0) > +BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", > IX86_BUILTIN_COMILESS, LE, 0) > BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comigt", > IX86_BUILTIN_COMIGTSS, GT, 0) > BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comige", > IX86_BUILTIN_COMIGESS, GE, 0) > -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comineq", > IX86_BUILTIN
Re: [PATCH] x86: Disable sibcall if indirect_return attribute doesn't match
On Fri, Jul 15, 2022 at 1:44 AM H.J. Lu via Gcc-patches wrote: > > When shadow stack is enabled, function with indirect_return attribute > may return via indirect jump. In this case, we need to disable sibcall > if caller doesn't have indirect_return attribute and indirect branch > tracking is enabled since compiler won't generate ENDBR when calling the > caller. > LGTM. > gcc/ > > PR target/85620 > * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return > false if callee has indirect_return attribute and caller > doesn't. > > gcc/testsuite/ > > PR target/85620 > * gcc.target/i386/pr85620-2.c: Updated. > * gcc.target/i386/pr85620-5.c: New test. > * gcc.target/i386/pr85620-6.c: Likewise. > * gcc.target/i386/pr85620-7.c: Likewise. > --- > gcc/config/i386/i386.cc | 10 ++ > gcc/testsuite/gcc.target/i386/pr85620-2.c | 3 ++- > gcc/testsuite/gcc.target/i386/pr85620-5.c | 13 + > gcc/testsuite/gcc.target/i386/pr85620-6.c | 14 ++ > gcc/testsuite/gcc.target/i386/pr85620-7.c | 14 ++ > 5 files changed, 53 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-5.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-6.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-7.c > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index 3a3c7299eb4..e03f86d4a23 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -1024,6 +1024,16 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) > return false; > } > > + /* Disable sibcall if callee has indirect_return attribute and > + caller doesn't since callee will return to the caller's caller > + via an indirect jump. */ > + if (((flag_cf_protection & (CF_RETURN | CF_BRANCH)) > + == (CF_RETURN | CF_BRANCH)) > + && lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (type)) > + && !lookup_attribute ("indirect_return", > + TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl > +return false; > + >/* Otherwise okay. That also includes certain types of indirect calls. */ >return true; > } > diff --git a/gcc/testsuite/gcc.target/i386/pr85620-2.c > b/gcc/testsuite/gcc.target/i386/pr85620-2.c > index b2e680fa1fe..14ce0ffd1e1 100644 > --- a/gcc/testsuite/gcc.target/i386/pr85620-2.c > +++ b/gcc/testsuite/gcc.target/i386/pr85620-2.c > @@ -1,6 +1,7 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fcf-protection" } */ > -/* { dg-final { scan-assembler-times {\mendbr} 1 } } */ > +/* { dg-final { scan-assembler-times {\mendbr} 2 } } */ > +/* { dg-final { scan-assembler-not "jmp" } } */ > > struct ucontext; > > diff --git a/gcc/testsuite/gcc.target/i386/pr85620-5.c > b/gcc/testsuite/gcc.target/i386/pr85620-5.c > new file mode 100644 > index 000..04537702d09 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr85620-5.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fcf-protection" } */ > +/* { dg-final { scan-assembler-not "jmp" } } */ > + > +struct ucontext; > + > +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__)); > + > +int > +foo (struct ucontext *oucp) > +{ > + return bar (oucp); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr85620-6.c > b/gcc/testsuite/gcc.target/i386/pr85620-6.c > new file mode 100644 > index 000..0b6a64e8454 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr85620-6.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fcf-protection" } */ > +/* { dg-final { scan-assembler "jmp" } } */ > + > +struct ucontext; > + > +extern int bar (struct ucontext *) __attribute__((__indirect_return__)); > + > +__attribute__((__indirect_return__)) > +int > +foo (struct ucontext *oucp) > +{ > + return bar (oucp); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr85620-7.c > b/gcc/testsuite/gcc.target/i386/pr85620-7.c > new file mode 100644 > index 000..fa62d56decf > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr85620-7.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fcf-protection" } */ > +/* { dg-final { scan-assembler "jmp" } } */ > + > +struct ucontext; > + > +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__)); > +extern int foo (struct ucontext *) __attribute__((__indirect_return__)); > + > +int > +foo (struct ucontext *oucp) > +{ > + return bar (oucp); > +} > -- > 2.36.1 > -- BR, Hongtao
Re: [PATCH] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]
On 7/14/22 13:43, Marek Polacek wrote: On Tue, Jul 12, 2022 at 04:15:00PM -0400, Jason Merrill wrote: On 7/12/22 16:10, Jason Merrill wrote: On 7/8/22 13:41, Marek Polacek wrote: This patch implements C++23 P2255R2, which adds two new type traits to detect reference binding to a temporary. They can be used to detect code like std::tuple t("meow"); which is incorrect because it always creates a dangling reference, because the std::string temporary is created inside the selected constructor of std::tuple, and not outside it. There are two new compiler builtins, __reference_constructs_from_temporary and __reference_converts_from_temporary. The former is used to simulate direct- and the latter copy-initialization context. But I had a hard time finding a test where there's actually a difference. Under DR 2267, both of these are invalid: struct A { } a; struct B { explicit B(const A&); }; const B &b1{a}; const B &b2(a); so I had to peruse [over.match.ref], and eventually realized that the difference can be seen here: struct G { operator int(); // #1 explicit operator int&&(); // #2 }; int&& r1(G{}); // use #2 (no temporary) int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&) The implementation itself was rather straightforward because we already have conv_binds_ref_to_prvalue. The main function here is reference_from_temporary. The renaming to ref_conv_binds_to_temporary_p is because previously the function didn't distinguish between an invalid conversion and one that binds to a prvalue. The patch also adds the relevant class and variable templates to . Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? PR c++/104477 gcc/c-family/ChangeLog: * c-common.cc (c_common_reswords): Add __reference_constructs_from_temporary and __reference_converts_from_temporary. * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. gcc/cp/ChangeLog: * call.cc (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_to_temporary_p): ... this. Add a new bool parameter. Return true only if the conversion is valid and conv_binds_ref_to_prvalue returns true. * constraint.cc (diagnose_trait_expr): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * cp-tree.h (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_to_temporary_p): ... this. (reference_from_temporary): Declare. * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * method.cc (reference_from_temporary): New. * parser.cc (cp_parser_primary_expression): Handle RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. (cp_parser_trait_expr): Likewise. (warn_for_range_copy): Adjust to call ref_conv_binds_to_temporary_p. * semantics.cc (trait_expr_value): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. (finish_trait_expr): Likewise. libstdc++-v3/ChangeLog: * include/std/type_traits (reference_constructs_from_temporary, reference_converts_from_temporary): New class templates. (reference_constructs_from_temporary_v, reference_converts_from_temporary_v): New variable templates. (__cpp_lib_reference_from_temporary): Define for C++23. * include/std/version (__cpp_lib_reference_from_temporary): Define for C++23. * testsuite/20_util/variable_templates_for_traits.cc: Test reference_constructs_from_temporary_v and reference_converts_from_temporary_v. * testsuite/20_util/reference_from_temporary/value.cc: New test. * testsuite/20_util/reference_from_temporary/value2.cc: New test. * testsuite/20_util/reference_from_temporary/version.cc: New test. gcc/testsuite/ChangeLog: * g++.dg/ext/reference_constructs_from_temporary1.C: New test. * g++.dg/ext/reference_converts_from_temporary1.C: New test. --- gcc/c-family/c-common.cc | 4 + gcc/c-family/c-common.h | 2 + gcc/cp/call.cc | 14 +- gcc/cp/constraint.cc | 8 + gcc/cp/cp-tree.h | 7 +- gcc/cp/cxx-pretty-print.cc | 6 + gcc/cp/method.cc | 28 +++ gcc/cp/parser.cc | 14 +- gcc/cp/semantics.cc | 8 + .../reference_constructs_from_temporary1.C | 214 ++ .../ext/reference_converts_from_temporary1.C | 214 ++ libstdc++-v3/include/std/type_traits | 39 libstdc++-v3/include/std/version | 5 +- .../20_util/reference_f