[PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
Hi,

Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO if func->decl is not null but no cgraph node is available for it. As PR105818 shows, this can give an unexpected result. For the case in PR105818, when parsing the bar decl in function foo, cfun is the function structure for foo, for which there is no cgraph node, so it returns OPTIMIZE_SIZE_NO. But that is incorrect, since the context is to optimize for size: the flag optimize_size is true.

The patch makes optimize_function_for_size_p check optimize_size, as it already does when func->decl is unavailable.

One regression failure got exposed on aarch64-linux-gnu:

PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \
  -DPREVENT_OPTIMIZATION line 21 x == 10 - i

The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT used in function fold_range_test during C parsing; it uses optimize_function_for_speed_p, which is the inversion of optimize_function_for_size_p. At that time cfun->decl is valid but there is no cgraph node for it. Without this patch, optimize_function_for_speed_p eventually returns true, while with this patch it returns false. Since the command-line option -Os is specified, there is no reason to interpret it as "for speed". I think this failure is expected, and I have adjusted the test case accordingly.

Is it ok for trunk?

BR,
Kewen
-----

	PR target/105818

gcc/ChangeLog:

	* predict.cc (optimize_function_for_size_p): Check optimize_size when
	func->decl is valid but its cgraph node is unavailable.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr105818.c: New test.
	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
--- gcc/predict.cc | 2 +- gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +- gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 + 3 files changed, 11 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c diff --git a/gcc/predict.cc b/gcc/predict.cc index 5734e4c8516..6c60a973236 100644 --- a/gcc/predict.cc +++ b/gcc/predict.cc @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) cgraph_node *n = cgraph_node::get (fun->decl); if (n) return n->optimize_for_size_p (); - return OPTIMIZE_SIZE_NO; + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; } /* Return true if function FUN should always be optimized for speed. */ diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c index 68aa6c63d71..14ca94ad37d 100644 --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c @@ -17,7 +17,7 @@ foo (int x, int y, int z) int i = 0; while (x > 3 && y > 3 && z > 3) { /* { dg-final { gdb-test .+2 "i" "v + 1" } } */ - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */ bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */ /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ i++, x--, y -= 2, z -= 3; diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c new file mode 100644 index 000..18781edbf9e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c @@ -0,0 +1,9 @@ +/* { dg-options "-Os -fno-tree-vectorize" } */ + +#pragma GCC optimize "-fno-tree-vectorize" + +void +foo (void) +{ + void bar (void); +} -- 2.27.0
[PATCH] tree-optimization/105946 - avoid accessing excess args from uninit diag
The uninit diagnostics use the passing-via-reference and access attributes, but that iterates over function type arguments, which can in some cases apparently outrun the actual arguments, leading to ICEs. The following simply ignores arguments that are not present.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-06-14  Richard Biener

	PR tree-optimization/105946
	* tree-ssa-uninit.cc (maybe_warn_pass_by_reference): Do not
	look at arguments not specified in the function call.
---
 gcc/tree-ssa-uninit.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-uninit.cc b/gcc/tree-ssa-uninit.cc
index b48fcf1a8ba..f326f1775c0 100644
--- a/gcc/tree-ssa-uninit.cc
+++ b/gcc/tree-ssa-uninit.cc
@@ -797,6 +797,9 @@ maybe_warn_pass_by_reference (gcall *stmt, wlimits &wlims)
     {
       ++argno;
 
+      if (argno > nargs)
+	break;
+
       if (!POINTER_TYPE_P (argtype))
 	continue;
 
-- 
2.35.3
Re: [PATCH] Do not erase warning data in gimple_set_location
> Hmm, I think instead of special-casing UNKNOWN_LOCATION > what gimple_set_location should probably do is either not copy > warnings at all or union them. Btw, gimple_set_location also > removes a previously set BLOCK (but gimple_set_block preserves > the location locus and diagnostic override). > > So I'd be tempted to axe the copy_warning () completely here. The first thing I tried, but it regressed the original testcase IIRC. Even my minimal patch manages to break bootstrap on ARM: buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libcpp/lex.cc: 1523:9: error: pointer used after ‘void operator delete(void*, std::size_t)’ [-Werror=use-after-free] # 00:31:04 make[3]: *** [Makefile:227: lex.o] Error 1 # 00:31:04 make[2]: *** [Makefile:9527: all-stage3-libcpp] Error 2 # 00:31:35 make[1]: *** [Makefile:25887: stage3-bubble] Error 2 # 00:31:35 make: *** [Makefile:1072: all] Error 2 /* Don't warn for cases like when a cdtor returns 'this' on ARM. */ else if (warning_suppressed_p (var, OPT_Wuse_after_free)) return; because warning-control.cc:copy_warning also clobbers the warning data of the destination. We have in cp/decl.cc:maybe_return_this the lines: /* Return the address of the object. */ tree val = DECL_ARGUMENTS (current_function_decl); suppress_warning (val, OPT_Wuse_after_free); -Wuse-after-free is suppressed for the location of VAL and the TREE_NO_WARNING bit set on it. But other expressions may have the same location as VAL and the TREE_NO_WARNING bit _not_ set, so when you call copy_warning (expr, expr) (we do that a lot after failed folding) for them, copy_warning erases the warning data of the location. I have installed the obvious fixlet after testing on x86-64/Linux, but the decoupling between TREE_NO_WARNING bit and location looks a bit problematic. * warning-control.cc (copy_warning) [generic version]: Do not erase the warning data of the destination location when the no-warning bit is not set on the source. 
(copy_warning) [tree version]: Return early if TO is equal to FROM. (copy_warning) [gimple version]: Likewise. testsuite/ * g++.dg/warn/Wuse-after-free5.C: New test. 
-- 
Eric Botcazou

diff --git a/gcc/warning-control.cc b/gcc/warning-control.cc index 0cbb4f079fa..7e9e701cfbe 100644 --- a/gcc/warning-control.cc +++ b/gcc/warning-control.cc @@ -191,7 +191,7 @@ void copy_warning (ToType to, FromType from) { const location_t to_loc = get_location (to); - bool supp = get_no_warning_bit (from); + const bool supp = get_no_warning_bit (from); nowarn_spec_t *from_spec = get_nowarn_spec (from); if (RESERVED_LOCATION_P (to_loc)) @@ -209,7 +209,7 @@ void copy_warning (ToType to, FromType from) nowarn_spec_t tem = *from_spec; nowarn_map->put (to_loc, tem); } - else + else if (supp) { if (nowarn_map) nowarn_map->remove (to_loc); @@ -226,6 +226,8 @@ void copy_warning (tree to, const_tree from) { + if (to == from) +return; copy_warning(to, from); } @@ -250,5 +252,7 @@ copy_warning (gimple *to, const_tree from) void copy_warning (gimple *to, const gimple *from) { + if (to == from) +return; copy_warning(to, from); }

// Check the suppression of -Wuse-after-free for destructors on ARM // { dg-do compile } // { dg-options "-Wuse-after-free" } struct range_label { virtual ~range_label(); }; struct unpaired_bidi_rich_location { struct custom_range_label : range_label {}; unpaired_bidi_rich_location(int); custom_range_label m_custom_label; }; void maybe_warn_bidi_on_close() { unpaired_bidi_rich_location(0); }
[PATCH] middle-end/105965 - add missing v_c_e <{ el }> simplification
When we got the simplification of bit-field-ref to view-convert we lost the ability to detect FMAs since we cannot look through _1 = {_10}; _11 = VIEW_CONVERT_EXPR(_1); the following amends the (view_convert CONSTRUCTOR) pattern to handle this case. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2022-06-14 Richard Biener PR middle-end/105965 * match.pd (view_convert CONSTRUCTOR): Handle single-element CTOR case. * gcc.target/i386/pr105965.c: New testcase. --- gcc/match.pd | 17 + gcc/testsuite/gcc.target/i386/pr105965.c | 12 2 files changed, 25 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr105965.c diff --git a/gcc/match.pd b/gcc/match.pd index 44a385b912d..776c9c6489a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3669,12 +3669,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TYPE_UNSIGNED (TREE_TYPE (@1) (view_convert @1))) -/* Simplify a view-converted empty constructor. */ +/* Simplify a view-converted empty or single-element constructor. */ (simplify (view_convert CONSTRUCTOR@0) - (if (TREE_CODE (@0) != SSA_NAME - && CONSTRUCTOR_NELTS (@0) == 0) - { build_zero_cst (type); })) + (with + { tree ctor = (TREE_CODE (@0) == SSA_NAME + ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0); } + (switch +(if (CONSTRUCTOR_NELTS (ctor) == 0) + { build_zero_cst (type); }) +(if (CONSTRUCTOR_NELTS (ctor) == 1 +&& VECTOR_TYPE_P (TREE_TYPE (ctor)) +&& operand_equal_p (TYPE_SIZE (type), +TYPE_SIZE (TREE_TYPE + (CONSTRUCTOR_ELT (ctor, 0)->value + (view_convert { CONSTRUCTOR_ELT (ctor, 0)->value; }) /* Re-association barriers around constants and other re-association barriers can be removed. 
*/ diff --git a/gcc/testsuite/gcc.target/i386/pr105965.c b/gcc/testsuite/gcc.target/i386/pr105965.c new file mode 100644 index 000..5bb53790de8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr105965.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mfma -mfpmath=sse" } */ + +typedef float v1sf __attribute__((vector_size(4))); + +v1sf +foo43 (v1sf a, v1sf b, v1sf c) +{ + return a * b + c; +} + +/* { dg-final { scan-assembler "fmadd" } } */ -- 2.35.3
Re: [PATCH v2] RISC-V: bitmanip: improve constant-loading for (1ULL << 31) in DImode
Thanks, backport applied to releases/gcc-12! On Tue, 7 Jun 2022 at 12:28, Kito Cheng wrote: > > OK for backport? > > OK, it seems no issue after a week :) > > > > > > On Thu, 2 Jun 2022 at 21:23, Philipp Tomsich > wrote: > > > > > > Thanks, applied to trunk! > > > > > > On Thu, 2 Jun 2022 at 15:17, Kito Cheng wrote: > > > > > > > > LGTM > > > > > > > > On Mon, May 30, 2022 at 5:52 AM Philipp Tomsich > > > > wrote: > > > > > > > > > > The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for > > > > > bits above 31 only (to side-step any issues with the negative > SImode > > > > > value 0x8000/(-1ull << 31)/(1 << 31)). This moves the special > > > > > handling of this SImode value (i.e. the check for (-1ull << 31) to > > > > > riscv.cc and relaxes the SINGLE_BIT_MASK_OPERAND() test. > > > > > > > > > > With this, the code-generation for loading (1ULL << 31) from: > > > > > li a0,1 > > > > > sllia0,a0,31 > > > > > to: > > > > > bseti a0,zero,31 > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * config/riscv/riscv.cc (riscv_build_integer_1): Rewrite > value as > > > > > (-1 << 31) for the single-bit case, when operating on (1 > << 31) > > > > > in SImode. > > > > > * gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): > Allow for > > > > > any single-bit value, moving the special case for (1 << > 31) to > > > > > riscv_build_integer_1 (in riscv.c). > > > > > > > > > > Signed-off-by: Philipp Tomsich > > > > > > > > > > --- > > > > > > > > > > Changes in v2: > > > > > - Use HOST_WIDE_INT_1U/HOST_WIDE_INT_M1U instead of constants. > > > > > - Fix some typos in the comment above the rewrite of the value. > > > > > - Update the comment to clarify that we expect a LUI to be emitted > for > > > > > the SImode case (i.e. sign-extended for RV64) of (1 << 31). 
> > > > > > > > > > gcc/config/riscv/riscv.cc | 9 + > > > > > gcc/config/riscv/riscv.h | 11 --- > > > > > 2 files changed, 13 insertions(+), 7 deletions(-) > > > > > > > > > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > > > > > index f83dc796d88..2e83ca07394 100644 > > > > > --- a/gcc/config/riscv/riscv.cc > > > > > +++ b/gcc/config/riscv/riscv.cc > > > > > @@ -420,6 +420,15 @@ riscv_build_integer_1 (struct > riscv_integer_op codes[RISCV_MAX_INTEGER_OPS], > > > > >/* Simply BSETI. */ > > > > >codes[0].code = UNKNOWN; > > > > >codes[0].value = value; > > > > > + > > > > > + /* RISC-V sign-extends all 32bit values that live in a 32bit > > > > > +register. To avoid paradoxes, we thus need to use the > > > > > +sign-extended (negative) representation (-1 << 31) for the > > > > > +value, if we want to build (1 << 31) in SImode. This will > > > > > +then expand to an LUI instruction. */ > > > > > + if (mode == SImode && value == (HOST_WIDE_INT_1U << 31)) > > > > > + codes[0].value = (HOST_WIDE_INT_M1U << 31); > > > > > + > > > > >return 1; > > > > > } > > > > > > > > > > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h > > > > > index 5083a1c24b0..6f7f4d3fbdc 100644 > > > > > --- a/gcc/config/riscv/riscv.h > > > > > +++ b/gcc/config/riscv/riscv.h > > > > > @@ -528,13 +528,10 @@ enum reg_class > > > > >(((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - > IMM_REACH) \ > > > > > || ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0) > > > > > > > > > > -/* If this is a single bit mask, then we can load it with bseti. > But this > > > > > - is not useful for any of the low 31 bits because we can use > addi or lui > > > > > - to load them. It is wrong for loading SImode 0x8000 on > rv64 because it > > > > > - needs to be sign-extended. So we restrict this to the upper > 32-bits > > > > > - only. 
*/ > > > > > -#define SINGLE_BIT_MASK_OPERAND(VALUE) \ > > > > > - (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32)) > > > > > +/* If this is a single bit mask, then we can load it with bseti. > Special > > > > > + handling of SImode 0x8000 on RV64 is done in > riscv_build_integer_1. */ > > > > > +#define SINGLE_BIT_MASK_OPERAND(VALUE) > \ > > > > > + (pow2p_hwi (VALUE)) > > > > > > > > > > /* Stack layout; function entry, exit and calling. */ > > > > > > > > > > -- > > > > > 2.34.1 > > > > > >
Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
> Hi, > > Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO > if func->decl is not null but no cgraph node is available for it. > As PR105818 shows, this could give unexpected result. For the > case in PR105818, when parsing bar decl in function foo, the cfun > is a function structure for foo, for which there is none cgraph > node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since > the context is to optimize for size, the flag optimize_size is > true. > > The patch is to make optimize_function_for_size_p to check > optimize_size as what it does when func->decl is unavailable. > > One regression failure got exposed on aarch64-linux-gnu: > > PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ > -DPREVENT_OPTIMIZATION line 21 x == 10 - i > > The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT > used in function fold_range_test during c parsing, it uses > optimize_function_for_speed_p which is equal to the invertion > of optimize_function_for_size_p. At that time cfun->decl is valid > but no cgraph node for it, w/o this patch function > optimize_function_for_speed_p returns true eventually, while it > returns false with this patch. Since the command line option -Os > is specified, there is no reason to interpret it as "for speed". > I think this failure is expected and adjust the test case > accordingly. > > Is it ok for trunk? > > BR, > Kewen > - > > PR target/105818 > > gcc/ChangeLog: > > * predict.cc (optimize_function_for_size_p): Check optimize_size when > func->decl is valid but its cgraph node is unavailable. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr105818.c: New test. > * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. 
> --- > gcc/predict.cc | 2 +- > gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +- > gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 + > 3 files changed, 11 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c > > diff --git a/gcc/predict.cc b/gcc/predict.cc > index 5734e4c8516..6c60a973236 100644 > --- a/gcc/predict.cc > +++ b/gcc/predict.cc > @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >cgraph_node *n = cgraph_node::get (fun->decl); >if (n) > return n->optimize_for_size_p (); > - return OPTIMIZE_SIZE_NO; > + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; We could also do (opt_for_fn (cfun->decl, optimize_size) that is probably better since one can change optimize_size with optimization attribute. However I think in most cases we check for optimize_size early I think we are doing something wrong, since at that time htere is no profile available. Why exactly PR105818 hits the flag change issue? Honza > } > > /* Return true if function FUN should always be optimized for speed. */ > diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c > b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > index 68aa6c63d71..14ca94ad37d 100644 > --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c > +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > @@ -17,7 +17,7 @@ foo (int x, int y, int z) >int i = 0; >while (x > 3 && y > 3 && z > 3) > {/* { dg-final { gdb-test .+2 "i" "v + 1" } } */ > - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ > + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { > aarch64*-*-* && { any-opts "-Os" } } } } } */ >bar (i); /* { dg-final { gdb-test . 
"y" "20 - 2 * i" } } */ > /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { > aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ >i++, x--, y -= 2, z -= 3; > diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c > b/gcc/testsuite/gcc.target/powerpc/pr105818.c > new file mode 100644 > index 000..18781edbf9e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c > @@ -0,0 +1,9 @@ > +/* { dg-options "-Os -fno-tree-vectorize" } */ > + > +#pragma GCC optimize "-fno-tree-vectorize" > + > +void > +foo (void) > +{ > + void bar (void); > +} > -- > 2.27.0
Re: [PATCH v1 1/3] RISC-V: add consecutive_bits_operand predicate
Thanks, applied to master! On Tue, 7 Jun 2022 at 12:26, Kito Cheng wrote: > > LGTM > > > On Wed, May 25, 2022 at 5:48 AM Philipp Tomsich > wrote: > > > > Provide an easy way to constrain for constants that are a single, > > consecutive run of ones. > > > > gcc/ChangeLog: > > > > * config/riscv/predicates.md (consecutive_bits_operand): > > Implement new predicate. > > > > Signed-off-by: Philipp Tomsich > > --- > > > > gcc/config/riscv/predicates.md | 11 +++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md > > index c37caa2502b..90db5dfcdd5 100644 > > --- a/gcc/config/riscv/predicates.md > > +++ b/gcc/config/riscv/predicates.md > > @@ -243,3 +243,14 @@ (define_predicate "const63_operand" > > (define_predicate "imm5_operand" > >(and (match_code "const_int") > > (match_test "INTVAL (op) < 5"))) > > + > > +;; A CONST_INT operand that consists of a single run of consecutive set > > bits. > > +(define_predicate "consecutive_bits_operand" > > + (match_code "const_int") > > +{ > > + unsigned HOST_WIDE_INT val = UINTVAL (op); > > + if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0) > > + return false; > > + > > + return true; > > +}) > > -- > > 2.34.1 > >
Re: [PATCH v1 2/3] RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w
Thanks, applied to master! For [3/3], I'll submit a new standalone patch with the requested changes. On Tue, 7 Jun 2022 at 12:25, Kito Cheng wrote: > > LGTM, you can commit that without [3/3] if you like :) > > On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich > wrote: > > > > When encountering a prescaled (biased) value as a candidate for > > sh[123]add.uw, the combine pass will present this as shifted by the > > aggregate amount (prescale + shift-amount) with an appropriately > > adjusted mask constant that has fewer than 32 bits set. > > > > E.g., here's the failing expression seen in combine for a prescale of > > 1 and a shift of 2 (note how 0x3fff8 >> 3 is 0x7fff). > > Trying 7, 8 -> 10: > > 7: r78:SI=r81:DI#0<<0x1 > > REG_DEAD r81:DI > > 8: r79:DI=zero_extend(r78:SI) > > REG_DEAD r78:SI > > 10: r80:DI=r79:DI<<0x2+r82:DI > > REG_DEAD r79:DI > > REG_DEAD r82:DI > > Failed to match this instruction: > > (set (reg:DI 80 [ cD.1491 ]) > > (plus:DI (and:DI (ashift:DI (reg:DI 81) > >(const_int 3 [0x3])) > >(const_int 17179869176 [0x3fff8])) > > (reg:DI 82))) > > > > To address this, we introduce a splitter handling these cases. > > > > gcc/ChangeLog: > > > > * config/riscv/bitmanip.md: Add split to handle opportunities > > for slli + sh[123]add.uw > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/zba-shadd.c: New test. 
> > > > Signed-off-by: Philipp Tomsich > > Co-developed-by: Manolis Tsamis > > > > --- > > > > gcc/config/riscv/bitmanip.md | 44 ++ > > gcc/testsuite/gcc.target/riscv/zba-shadd.c | 13 +++ > > 2 files changed, 57 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shadd.c > > > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md > > index 0ab9ffe3c0b..6c1ccc6f8c5 100644 > > --- a/gcc/config/riscv/bitmanip.md > > +++ b/gcc/config/riscv/bitmanip.md > > @@ -79,6 +79,50 @@ (define_insn "*shNadduw" > >[(set_attr "type" "bitmanip") > > (set_attr "mode" "DI")]) > > > > +;; During combine, we may encounter an attempt to combine > > +;; slli rtmp, rs, #imm > > +;; zext.w rtmp, rtmp > > +;; sh[123]add rd, rtmp, rs2 > > +;; which will lead to the immediate not satisfying the above constraints. > > +;; By splitting the compound expression, we can simplify to a slli and a > > +;; sh[123]add.uw. > > +(define_split > > + [(set (match_operand:DI 0 "register_operand") > > + (plus:DI (and:DI (ashift:DI (match_operand:DI 1 "register_operand") > > + (match_operand:QI 2 > > "immediate_operand")) > > +(match_operand:DI 3 "consecutive_bits_operand")) > > +(match_operand:DI 4 "register_operand"))) > > + (clobber (match_operand:DI 5 "register_operand"))] > > + "TARGET_64BIT && TARGET_ZBA" > > + [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 6))) > > + (set (match_dup 0) (plus:DI (and:DI (ashift:DI (match_dup 5) > > + (match_dup 7)) > > + (match_dup 8)) > > + (match_dup 4)))] > > +{ > > + unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]); > > + /* scale: shift within the sh[123]add.uw */ > > + int scale = 32 - clz_hwi (mask); > > + /* bias: pre-scale amount (i.e. the prior shift amount) */ > > + int bias = ctz_hwi (mask) - scale; > > + > > + /* If the bias + scale don't add up to operand[2], reject. 
*/ > > + if ((scale + bias) != UINTVAL (operands[2])) > > + FAIL; > > + > > + /* If the shift-amount is out-of-range for sh[123]add.uw, reject. */ > > + if ((scale < 1) || (scale > 3)) > > + FAIL; > > + > > + /* If there's no bias, the '*shNadduw' pattern should have matched. > > */ > > + if (bias == 0) > > + FAIL; > > + > > + operands[6] = GEN_INT (bias); > > + operands[7] = GEN_INT (scale); > > + operands[8] = GEN_INT (0xULL << scale); > > +}) > > + > > (define_insn "*add.uw" > >[(set (match_operand:DI 0 "register_operand" "=r") > > (plus:DI (zero_extend:DI > > diff --git a/gcc/testsuite/gcc.target/riscv/zba-shadd.c > > b/gcc/testsuite/gcc.target/riscv/zba-shadd.c > > new file mode 100644 > > index 000..33da2530f3f > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/zba-shadd.c > > @@ -0,0 +1,13 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -march=rv64gc_zba -mabi=lp64" } */ > > + > > +unsigned long foo(unsigned int a, unsigned long b) > > +{ > > +a = a << 1; > > +unsigned long c = (unsigned long) a; > > +unsigned long d = b + (c<<2); > > +return d; > > +} > > + > > +/* { dg-final { scan-assembler "sh2add.uw" } } */ > > +/* { dg-final { scan-assembler-not "zext" } }
Re: [PATCH RFA] ubsan: default to trap on unreachable at -O0 and -Og [PR104642]
On Mon, Jun 13, 2022 at 03:53:13PM -0400, Jason Merrill via Gcc-patches wrote: > When not optimizing, we can't do anything useful with unreachability in > terms of code performance, so we might as well improve debugging by turning > __builtin_unreachable into a trap. In the PR richi suggested introducing an > -funreachable-traps flag for this, but this functionality is already > implemented as -fsanitize=unreachable -fsanitize-undefined-trap-on-error, we > just need to set those flags by default. > > I think it also makes sense to do this when we're explicitly optimizing for > the debugging experience. > > I then needed to make options-save handle -fsanitize and > -fsanitize-undefined-trap-on-error; since -fsanitize is has custom parsing, > that meant handling it explicitly in the awk scripts. I also noticed we > weren't setting it in opts_set. > > Do we still want -funreachable-traps as an alias (or separate flag) for this > behavior that doesn't mention the sanitizer? I do not like doing it this way, -fsanitize-undefined-trap-on-error is (unfortunately for compatibility with llvm misdesign, users should have ways to control which of the enabled sanitizers should be handled which way, where the 3 ways are runtime diagnostics without abort, runtime diagnostics with abort and __builtin_trap ()) an all or nothing option which affects all the sanitizers. Your patch will e.g. silently enable the sanitization whenever just -fsanitize-undefined-trap-on-error is passed, but that can be passed even when users don't really expect any sanitization, just making sure that if it is enabled (say for selected TUs or functions), it doesn't need a runtime library to report it. Furthermore, handling it the UBSan way means we slow down the compiler (enable a bunch of extra passes, like sanopt, ubsan), which is undesirable e.g. for -O0 compilation speed. 
So, I think -funreachable-traps should be a separate flag and not an alias, enabled by default for -O0 and -Og, which would be handled elsewhere (I'd say e.g. in fold_builtin_0 and perhaps gimple_fold_builtin too to avoid allocating trees unnecessary) and would be done if flag_unreachable_traps && !sanitize_flag_p (SANITIZE_UNREACHABLE), just replacing that __builtin_unreachable call with __builtin_trap. For the function ends in fact under those conditions we could emit __builtin_trap right away instead of emitting __builtin_unreachable and waiting on folding it later to __builtin_trap. Jakub
Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote: > PR target/105818 Change this to something else? It is not a target bug. "middle-end" perhaps? Thanks for fixing this, Segher
Re: [PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]
On Mon, Jun 13, 2022 at 4:27 PM Eric Gallager wrote: > > On Mon, Jun 13, 2022 at 7:02 AM Richard Biener > wrote: > > > > On Thu, Jun 2, 2022 at 5:54 PM Eric Gallager via Gcc-patches > > wrote: > > > > > > So, I'm working on fixing PR bootstrap/44425, and have this patch to > > > have the top-level configure script check in the value passed to > > > `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that > > > configuring with just `--prefix=` and none of > > > `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed > > > before), but unfortunately it results in a bunch of duplicated > > > `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that > > > acceptable or should I try another approach? > > > > --prefix is documented as to be used for installing (architecture > > independent) files, > > I'm not sure it is a good idea to probe that for gmp/mpfr/mpc installs used > > for > > the host. > > > > Richard. > > > > > Eric > > So... I guess that means we should close bug 44425 as INVALID or > WONTFIX then? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44425 That would be my reaction, yes.
RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
On Mon, 13 Jun 2022, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Monday, June 13, 2022 12:48 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > > > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 > > bitmask > > > > On Mon, 13 Jun 2022, Tamar Christina wrote: > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Monday, June 13, 2022 10:39 AM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > > > > > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by > > > > pow2 bitmask > > > > > > > > On Mon, 13 Jun 2022, Richard Biener wrote: > > > > > > > > > On Thu, 9 Jun 2022, Tamar Christina wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > In plenty of image and video processing code it's common to > > > > > > modify pixel values by a widening operation and then scale them > > > > > > back into range > > > > by dividing by 255. > > > > > > > > > > > > This patch adds an optab to allow us to emit an optimized > > > > > > sequence when doing an unsigned division that is equivalent to: > > > > > > > > > > > >x = y / (2 ^ (bitsize (y)/2)-1 > > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > > x86_64-pc-linux-gnu and no issues. > > > > > > > > > > > > Ok for master? > > > > > > > > > > Looking at 2/2 it seems that this is the wrong way to attack the > > > > > problem. The ISA doesn't have such instruction so adding an optab > > > > > looks premature. I suppose that there's no unsigned vector > > > > > integer division and thus we open-code that in a different way? > > > > > Isn't the correct thing then to fixup that open-coding if it is more > > efficient? > > > > > > > > > > The problem is that even if you fixup the open-coding it would need to > > > be something target specific? The sequence of instructions we generate > > > don't have a GIMPLE representation. 
So whatever is generated I'd have > > to fixup in RTL then. > > What's the operation that doesn't have a GIMPLE representation? > > For NEON use two operations: > 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2 > Where the + widens and the >> narrows. So you give it two shorts, get a > byte > 2. Add widening add of lowpart so basically lowpart (a +w b) > > For SVE2 we use a different sequence, we use two back-to-back sequences of: > 1. Add narrow high part (bottom). In SVE the Top and Bottom instructions > select >Even and odd elements of the vector rather than "top half" and "bottom > half". > >So this instruction does : Add each vector element of the first source > vector to the >corresponding vector element of the second source vector, and place the > most > significant half of the result in the even-numbered half-width > destination elements, > while setting the odd-numbered elements to zero. > > So there's an explicit permute in there. The instructions are sufficiently > different that there > wouldn't be a single GIMPLE representation. I see. Are these also useful to express scalar integer division? I'll defer to others to ack the special udiv_pow2_bitmask optab or suggest some piecemeal things other targets might be able to do as well. It does look very special. I'd also bikeshed it to udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I interpreted 'bitmask' correctly ;)). It seems to be even less general since it is a unary op and the actual divisor is constrained by the mode itself? Richard. > > > > I think for costing you could resort to the *_cost functions as used by > > synth_mult and friends. > > > The problem with this is that it seemed fragile. 
We generate from the > > > Vectorizer: > > > > > > vect__3.8_35 = MEM [(uint8_t *)_21]; > > > vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR > vect_cst__36>; > > > vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR > vect_cst__36>; > > > vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897, > > 32897, 32897, 32897, 32897 }; > > > vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897, > > 32897, 32897, 32897, 32897 }; > > > vect_patt_25.11_42 = vect_patt_19.10_40 >> 7; > > > vect_patt_25.11_43 = vect_patt_19.10_41 >> 7; > > > vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR > > vect_patt_25.11_43>; > > > > > > and if the magic constants change then we miss the optimization. I > > > could rewrite the open coding to use shifts alone, but that might be a > > regression for some uarches I would imagine. > > > > OK, so you do have a highpart multiply. I suppose the pattern is too deep > > to > > be recognized by combine? What's the RTL good vs. bad before combine of > > one of the expressions? > > Yeah combine only tried 2-3 instructions, but to use these sequences we have > to > match the entire chain as the instructions do the narro
[PATCH V2]rs6000: Store complicated constant into pool
Hi, This patch is same with nearly: https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595378.html The concept of this patch is similar with the patches which is attached in PR63281. e.g. https://gcc.gnu.org/bugzilla/attachment.cgi?id=42186 I had a test for perlbench from SPEC2017. As expected, on -O2 for P9, there is ~2% performance improvement with this patch. This patch reduces the threshold of instruction number for storing constant to pool and update cost for constant and mem accessing. And then if building the constant needs more than 2 instructions (or more than 1 instruction on P10), then prefer to load it from constant pool. Bootstrap and regtest pass on ppc64le and ppc64. Is this ok for trunk? Thanks for comments and sugguestions. BR, Jiufu PR target/63281 gcc/ChangeLog: 2022-06-14 Jiufu Guo Alan Modra * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Exclude rtx with code 'HIGH'. (rs6000_emit_move): Update threshold of const insn. (rs6000_rtx_costs): Update cost of constant and mem. gcc/testsuite/ChangeLog: 2022-06-14 Jiufu Guo Alan Modra * gcc.target/powerpc/medium_offset.c: Update. * gcc.target/powerpc/pr93012.c: Update. * gcc.target/powerpc/pr63281.c: New test. --- gcc/config/rs6000/rs6000.cc | 23 +++ .../gcc.target/powerpc/medium_offset.c| 2 +- gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 + gcc/testsuite/gcc.target/powerpc/pr93012.c| 2 +- 4 files changed, 31 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index cd291f93019..90c91a8e1ea 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void) static bool rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) { - if (GET_CODE (x) == HIGH - && GET_CODE (XEXP (x, 0)) == UNSPEC) + /* Exclude CONSTANT HIGH part. e.g. + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )). 
*/ + if (GET_CODE (x) == HIGH) return true; /* A TLS symbol in the TOC cannot contain a sum. */ @@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode) && FP_REGNO_P (REGNO (operands[0]))) || !CONST_INT_P (operands[1]) || (num_insns_constant (operands[1], mode) - > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) + > (TARGET_PREFIXED ? 1 : 2))) && !toc_relative_expr_p (operands[1], false, NULL, NULL) && (TARGET_CMODEL == CMODEL_SMALL || can_create_pseudo_p () @@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, case CONST_DOUBLE: case CONST_WIDE_INT: + /* It may needs a few insns for const to SET. -1 for outer SET code. */ + if (outer_code == SET) + { + *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1; + return true; + } + /* FALLTHRU */ + case CONST: case HIGH: case SYMBOL_REF: @@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, case MEM: /* When optimizing for size, MEM should be slightly more expensive than generating address, e.g., (plus (reg) (const)). -L1 cache latency is about two instructions. */ - *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2); +L1 cache latency is about two instructions. +For prefixed load (pld), we would set it slightly faster than +than two instructions. */ + *total = !speed +? COSTS_N_INSNS (1) + 1 +: TARGET_PREFIXED ? 
COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2); if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x))) *total += COSTS_N_INSNS (100); return true; diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c index f29eba08c38..4889e8fa8ec 100644 --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c @@ -1,7 +1,7 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target lp64 } */ /* { dg-options "-O" } */ -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ static int x; diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c new file mode 100644 index 000..469a8f64400 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c @@ -0,0 +1,11 @@ +/* PR target/63281 */ +/* { dg-do compile { target lp64 } } */ +/* { dg-options "-O2 -std=c99" } */ + +void +foo (unsigned long long *a) +{ + *a = 0x02080500610600
Re: [PATCH] Do not erase warning data in gimple_set_location
On Tue, Jun 14, 2022 at 12:49 PM Eric Botcazou wrote: > > > Hmm, I think instead of special-casing UNKNOWN_LOCATION > > what gimple_set_location should probably do is either not copy > > warnings at all or union them. Btw, gimple_set_location also > > removes a previously set BLOCK (but gimple_set_block preserves > > the location locus and diagnostic override). > > > > So I'd be tempted to axe the copy_warning () completely here. > > The first thing I tried, but it regressed the original testcase IIRC. > > Even my minimal patch manages to break bootstrap on ARM: > > buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libcpp/lex.cc: > 1523:9: error: pointer used after ‘void operator delete(void*, std::size_t)’ > [-Werror=use-after-free] > # 00:31:04 make[3]: *** [Makefile:227: lex.o] Error 1 > # 00:31:04 make[2]: *** [Makefile:9527: all-stage3-libcpp] Error 2 > # 00:31:35 make[1]: *** [Makefile:25887: stage3-bubble] Error 2 > # 00:31:35 make: *** [Makefile:1072: all] Error 2 > > /* Don't warn for cases like when a cdtor returns 'this' on ARM. */ > else if (warning_suppressed_p (var, OPT_Wuse_after_free)) > return; > > because warning-control.cc:copy_warning also clobbers the warning data of the > destination. We have in cp/decl.cc:maybe_return_this the lines: > > /* Return the address of the object. */ > tree val = DECL_ARGUMENTS (current_function_decl); > suppress_warning (val, OPT_Wuse_after_free); > > -Wuse-after-free is suppressed for the location of VAL and the TREE_NO_WARNING > bit set on it. But other expressions may have the same location as VAL and > the TREE_NO_WARNING bit _not_ set, so when you call copy_warning (expr, expr) > (we do that a lot after failed folding) for them, copy_warning erases the > warning data of the location. > > I have installed the obvious fixlet after testing on x86-64/Linux, but the > decoupling between TREE_NO_WARNING bit and location looks a bit problematic. Thanks - that makes more sense. 
> > * warning-control.cc (copy_warning) [generic version]: Do not erase > the warning data of the destination location when the no-warning > bit is not set on the source. > (copy_warning) [tree version]: Return early if TO is equal to FROM. > (copy_warning) [gimple version]: Likewise. > testsuite/ > * g++.dg/warn/Wuse-after-free5.C: New test. > > -- > Eric Botcazou
RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
> -Original Message- > From: Richard Biener > Sent: Tuesday, June 14, 2022 2:19 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 > bitmask > > On Mon, 13 Jun 2022, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Monday, June 13, 2022 12:48 PM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > > > > > Subject: RE: [PATCH 1/2]middle-end Support optimized division by > > > pow2 bitmask > > > > > > On Mon, 13 Jun 2022, Tamar Christina wrote: > > > > > > > > -Original Message- > > > > > From: Richard Biener > > > > > Sent: Monday, June 13, 2022 10:39 AM > > > > > To: Tamar Christina > > > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > > > > > > > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by > > > > > pow2 bitmask > > > > > > > > > > On Mon, 13 Jun 2022, Richard Biener wrote: > > > > > > > > > > > On Thu, 9 Jun 2022, Tamar Christina wrote: > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > In plenty of image and video processing code it's common to > > > > > > > modify pixel values by a widening operation and then scale > > > > > > > them back into range > > > > > by dividing by 255. > > > > > > > > > > > > > > This patch adds an optab to allow us to emit an optimized > > > > > > > sequence when doing an unsigned division that is equivalent to: > > > > > > > > > > > > > >x = y / (2 ^ (bitsize (y)/2)-1 > > > > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > > > x86_64-pc-linux-gnu and no issues. > > > > > > > > > > > > > > Ok for master? > > > > > > > > > > > > Looking at 2/2 it seems that this is the wrong way to attack > > > > > > the problem. The ISA doesn't have such instruction so adding > > > > > > an optab looks premature. 
I suppose that there's no unsigned > > > > > > vector integer division and thus we open-code that in a different > way? > > > > > > Isn't the correct thing then to fixup that open-coding if it > > > > > > is more > > > efficient? > > > > > > > > > > > > > The problem is that even if you fixup the open-coding it would > > > > need to be something target specific? The sequence of instructions > > > > we generate don't have a GIMPLE representation. So whatever is > > > > generated I'd have to fixup in RTL then. > > > > > > What's the operation that doesn't have a GIMPLE representation? > > > > For NEON use two operations: > > 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2 > > Where the + widens and the >> narrows. So you give it two shorts, > > get a byte 2. Add widening add of lowpart so basically lowpart (a +w > > b) > > > > For SVE2 we use a different sequence, we use two back-to-back > sequences of: > > 1. Add narrow high part (bottom). In SVE the Top and Bottom instructions > select > >Even and odd elements of the vector rather than "top half" and "bottom > half". > > > >So this instruction does : Add each vector element of the first source > vector to the > >corresponding vector element of the second source vector, and place the > most > > significant half of the result in the even-numbered half-width > > destination > elements, > > while setting the odd-numbered elements to zero. > > > > So there's an explicit permute in there. The instructions are > > sufficiently different that there wouldn't be a single GIMPLE > representation. > > I see. Are these also useful to express scalar integer division? Hmm not these exact instructions as they only exist on vector. Scalar may Potentially benefit from rewriting this to (x + ((x + 257) >> 8)) >> 8 Which avoids the multiply with the magic constant. 
But the problem here is that unless undone for vector it would likely generate worse code if vectorized exactly like this on most ISAs compared to what we have now. > > I'll defer to others to ack the special udiv_pow2_bitmask optab or suggest > some piecemail things other targets might be able to do as well. It does look > very special. I'd also bikeshed it to > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I > interpreted 'bitmask' correctly ;)). It seems to be even less general since > it is > an unary op and the actual divisor is constrained by the mode itself? I am happy to change the name, and quite happy to add the constant as an argument. I had only made it this specific because this was the only fairly common operation I had found. Though perhaps it's indeed better to keep the optab a bit more general? Thanks, Tamar > > Richard. > > > > > > > I think for costing you could resort to the *_cost functions as used > > > by synth_mult and friends. > > > > > > > The problem with this is that it seemed fragile. We generate from > > > > the > > > > Vectorizer: > > > > > > > > vect__3.8_35 = MEM [(uint8_t
Re: [PATCH take #2] Fold truncations of left shifts in match.pd
On Sun, Jun 5, 2022 at 1:12 PM Roger Sayle wrote: > > > Hi Richard, > Many thanks for taking the time to explain how vectorization is supposed > to work. I now see that vect_recog_rotate_pattern in tree-vect-patterns.cc > is supposed to handle lowering of rotations to (vector) shifts, and > completely agree that adding support for signed types (using appropriate > casts to unsigned_type_for and casting the result back to the original > signed type) is a better approach to avoid the regression of pr98674.c. > > I've also implemented your suggestions of combining the proposed new > (convert (lshift @1 INTEGER_CST@2)) with the existing one, and at the > same time including support for valid shifts greater than the narrower > type, such as (short)(x << 20), to constant zero. Although this optimization > is already performed during the tree-ssa passes, it's convenient to > also catch it here during constant folding. > > This revised patch has been tested on x86_64-pc-linux-gnu with > make bootstrap and make -k check, both with and without > --target_board=unix{-m32}, with no new failures. Ok for mainline? OK. Thanks, Richard. > 2022-06-05 Roger Sayle > Richard Biener > > gcc/ChangeLog > * match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer > left shifts by a constant when the result is truncated, and the > shift constant is well-defined. > * tree-vect-patterns.cc (vect_recog_rotate_pattern): Add > support for rotations of signed integer types, by lowering > using unsigned vector shifts. > > gcc/testsuite/ChangeLog > * gcc.dg/fold-convlshift-4.c: New test case. > * gcc.dg/optimize-bswaphi-1.c: Update found bswap count. > * gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP. > * gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests. > * gcc.dg/vect/vect-over-widen-1.c: Likewise. > * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise. > * gcc.dg/vect/vect-over-widen-3.c: Likewise. > * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise. 
> * gcc.dg/vect/vect-over-widen-4.c: Likewise. > > > Thanks again, > Roger > -- > > > -Original Message- > > From: Richard Biener > > Sent: 02 June 2022 12:03 > > To: Roger Sayle > > Cc: GCC Patches > > Subject: Re: [PATCH] Fold truncations of left shifts in match.pd > > > > On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle > > wrote: > > > > > > > > > Hi Richard, > > > > + /* RTL expansion knows how to expand rotates using shift/or. */ > > > > + if (icode == CODE_FOR_nothing > > > > + && (code == LROTATE_EXPR || code == RROTATE_EXPR) > > > > + && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing > > > > + && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing) > > > > +icode = (int) optab_handler (lshr_optab, vec_mode); > > > > > > > > but we then get the vector costing wrong. > > > > > > The issue is that we currently get the (relative) vector costing wrong. > > > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar > > > code requires two shifts and an ior, so believes its profitable to > > > vectorize this loop using two vector shifts and an vector ior. But > > > once match.pd simplifies the truncate and recognizes the HImode rotate we > > end up with: > > > > > > pr98674.c:6:16: note: ==> examining statement: _6 = _1 r>> 8; > > > pr98674.c:6:16: note: vect_is_simple_use: vectype vector(8) short int > > > pr98674.c:6:16: note: vect_is_simple_use: operand 8, type of def: > > > constant > > > pr98674.c:6:16: missed: op not supported by target. > > > pr98674.c:8:33: missed: not vectorized: relevant stmt not supported: _6 > > > = _1 > > r>> 8; > > > pr98674.c:6:16: missed: bad operation or unsupported loop bound. > > > > > > > > > Clearly, it's a win to vectorize HImode rotates, when the backend can > > > perform > > > 8 (or 16) rotations at a time, but using 3 vector instructions, even > > > when a scalar rotate can performed in a single instruction. 
> > > Fundamentally, vectorization may still be desirable/profitable even when > > > the > > backend doesn't provide an optab. > > > > Yes, as said it's tree-vect-patterns.cc job to handle this not natively > > supported > > rotate by re-writing it. Can you check why vect_recog_rotate_pattern does > > not > > do this? Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why > > though (for rotates it should not matter and for the lowered sequence we can > > convert to desired signedness to get arithmetic/logical shifts)? > > > > > The current situation where the i386's backend provides expanders to > > > lower rotations (or vcond) into individual instruction sequences, also > > > interferes > > with > > > vector costing. It's the vector cost function that needs to be fixed, > > > not the > > > generated code made worse (or the backend bloated performing its own > > > RTL expans
Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
Richard Biener writes: > On Mon, 13 Jun 2022, Tamar Christina wrote: > >> > -Original Message- >> > From: Richard Biener >> > Sent: Monday, June 13, 2022 12:48 PM >> > To: Tamar Christina >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford >> > >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 >> > bitmask >> > >> > On Mon, 13 Jun 2022, Tamar Christina wrote: >> > >> > > > -Original Message- >> > > > From: Richard Biener >> > > > Sent: Monday, June 13, 2022 10:39 AM >> > > > To: Tamar Christina >> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford >> > > > >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by >> > > > pow2 bitmask >> > > > >> > > > On Mon, 13 Jun 2022, Richard Biener wrote: >> > > > >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote: >> > > > > >> > > > > > Hi All, >> > > > > > >> > > > > > In plenty of image and video processing code it's common to >> > > > > > modify pixel values by a widening operation and then scale them >> > > > > > back into range >> > > > by dividing by 255. >> > > > > > >> > > > > > This patch adds an optab to allow us to emit an optimized >> > > > > > sequence when doing an unsigned division that is equivalent to: >> > > > > > >> > > > > >x = y / (2 ^ (bitsize (y)/2)-1 >> > > > > > >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, >> > > > > > x86_64-pc-linux-gnu and no issues. >> > > > > > >> > > > > > Ok for master? >> > > > > >> > > > > Looking at 2/2 it seems that this is the wrong way to attack the >> > > > > problem. The ISA doesn't have such instruction so adding an optab >> > > > > looks premature. I suppose that there's no unsigned vector >> > > > > integer division and thus we open-code that in a different way? >> > > > > Isn't the correct thing then to fixup that open-coding if it is more >> > efficient? >> > > > >> > > >> > > The problem is that even if you fixup the open-coding it would need to >> > > be something target specific? 
The sequence of instructions we generate >> > > don't have a GIMPLE representation. So whatever is generated I'd have >> > > to fixup in RTL then. >> > >> > What's the operation that doesn't have a GIMPLE representation? >> >> For NEON use two operations: >> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2 >> Where the + widens and the >> narrows. So you give it two shorts, get a >> byte >> 2. Add widening add of lowpart so basically lowpart (a +w b) >> >> For SVE2 we use a different sequence, we use two back-to-back sequences of: >> 1. Add narrow high part (bottom). In SVE the Top and Bottom instructions >> select >>Even and odd elements of the vector rather than "top half" and "bottom >> half". >> >>So this instruction does : Add each vector element of the first source >> vector to the >>corresponding vector element of the second source vector, and place the >> most >> significant half of the result in the even-numbered half-width >> destination elements, >> while setting the odd-numbered elements to zero. >> >> So there's an explicit permute in there. The instructions are sufficiently >> different that there >> wouldn't be a single GIMPLE representation. > > I see. Are these also useful to express scalar integer division? > > I'll defer to others to ack the special udiv_pow2_bitmask optab > or suggest some piecemail things other targets might be able to do as > well. It does look very special. I'd also bikeshed it to > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming > I interpreted 'bitmask' correctly ;)). It seems to be even less > general since it is an unary op and the actual divisor is constrained > by the mode itself? Yeah, those were my concerns as well. For n-bit numbers, the same kind of arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so from a target-independent point of view, m==n/2 isn't particularly special. 
Hard-coding one value of m would make sense if there was an underlying instruction that did exactly this, but like you say, there isn't. Would a compromise be to define an optab for ADDHN and then add a vector pattern for this division that (at least initially) prefers ADDHN over the current approach whenever ADDHN is available? We could then adapt the conditions on the pattern if other targets also provide ADDHN but don't want this transform. (I think the other instructions in the pattern already have optabs.) That still leaves open the question about what to do about SVE2, but the underlying problem there is that the vectoriser doesn't know about the B/T layout. Thanks, Richard
[PATCH] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR
The addr_expr computation does not check for error_mark_node before returning the size expression. This used to work in the constant case because the conversion to uhwi would end up causing it to return size_unknown, but that won't work for the dynamic case. Modify the control flow to explicitly return size_unknown if the offset computation returns an error_mark_node. gcc/ChangeLog: PR tree-optimization/105736 * tree-object-size.cc (addr_object_size): Return size_unknown when object offset computation returns an error. gcc/testsuite/ChangeLog: PR tree-optimization/105736 * gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3, test_pr105736): New struct declaration, variable and function to test PR. (main): Use them. Signed-off-by: Siddhesh Poyarekar --- Tested: - x86_64 bootstrap and test - --with-build-config=bootstrap-ubsan build May I also backport this to gcc12? .../gcc.dg/builtin-dynamic-object-size-0.c| 19 ++ gcc/tree-object-size.cc | 20 ++- 2 files changed, 30 insertions(+), 9 deletions(-) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index b5b0b3a677c..90f303ef40e 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, int incr) return __builtin_dynamic_object_size (ptr, 0); } +/* Other tests. */ + +struct TV4 +{ + __attribute__((vector_size (sizeof (int) * 4))) int v; +}; + +struct TV4 val3; +int * +test_pr105736 (struct TV4 *a) +{ + return &a->v[0]; +} + unsigned nfails = 0; #define FAIL() ({ \ @@ -633,6 +647,11 @@ main (int argc, char **argv) FAIL (); if (test_loop (arr, 42, 20, 52, 1) != 0) FAIL (); + /* pr105736. 
*/ + int *t = test_pr105736 (&val3); + if (__builtin_dynamic_object_size (t, 0) != -1) +__builtin_abort (); + if (nfails > 0) __builtin_abort (); diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 5ca87ae3504..12bc0868b77 100644 --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, var_size = pt_var_size; bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var); if (bytes != error_mark_node) - bytes = size_for_offset (var_size, bytes); - if (var != pt_var - && pt_var_size - && TREE_CODE (pt_var) == MEM_REF - && bytes != error_mark_node) { - tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var); - if (bytes2 != error_mark_node) + bytes = size_for_offset (var_size, bytes); + if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF) { - bytes2 = size_for_offset (pt_var_size, bytes2); - bytes = size_binop (MIN_EXPR, bytes, bytes2); + tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), + pt_var); + if (bytes2 != error_mark_node) + { + bytes2 = size_for_offset (pt_var_size, bytes2); + bytes = size_binop (MIN_EXPR, bytes, bytes2); + } } } + else + bytes = size_unknown (object_size_type); wholebytes = object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize; -- 2.35.3
[PATCH v1.1] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR
The addr_expr computation does not check for error_mark_node before returning the size expression. This used to work in the constant case because the conversion to uhwi would end up causing it to return size_unknown, but that won't work for the dynamic case. Modify the control flow to explicitly return size_unknown if the offset computation returns an error_mark_node. gcc/ChangeLog: PR tree-optimization/105736 * tree-object-size.cc (addr_object_size): Return size_unknown when object offset computation returns an error. gcc/testsuite/ChangeLog: PR tree-optimization/105736 * gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3, test_pr105736): New struct declaration, variable and function to test PR. (main): Use them. Signed-off-by: Siddhesh Poyarekar --- Changes from v1: - Used FAIL() instead of __builtin_abort() in the test. Tested: - x86_64 bootstrap and test - --with-build-config=bootstrap-ubsan build May I also backport this to gcc12? .../gcc.dg/builtin-dynamic-object-size-0.c| 18 + gcc/tree-object-size.cc | 20 ++- 2 files changed, 29 insertions(+), 9 deletions(-) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index b5b0b3a677c..01a280b2d7b 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, int incr) return __builtin_dynamic_object_size (ptr, 0); } +/* Other tests. */ + +struct TV4 +{ + __attribute__((vector_size (sizeof (int) * 4))) int v; +}; + +struct TV4 val3; +int * +test_pr105736 (struct TV4 *a) +{ + return &a->v[0]; +} + unsigned nfails = 0; #define FAIL() ({ \ @@ -633,6 +647,10 @@ main (int argc, char **argv) FAIL (); if (test_loop (arr, 42, 20, 52, 1) != 0) FAIL (); + /* pr105736. 
*/ + int *t = test_pr105736 (&val3); + if (__builtin_dynamic_object_size (t, 0) != -1) +FAIL (); if (nfails > 0) __builtin_abort (); diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 5ca87ae3504..12bc0868b77 100644 --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, var_size = pt_var_size; bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var); if (bytes != error_mark_node) - bytes = size_for_offset (var_size, bytes); - if (var != pt_var - && pt_var_size - && TREE_CODE (pt_var) == MEM_REF - && bytes != error_mark_node) { - tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var); - if (bytes2 != error_mark_node) + bytes = size_for_offset (var_size, bytes); + if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF) { - bytes2 = size_for_offset (pt_var_size, bytes2); - bytes = size_binop (MIN_EXPR, bytes, bytes2); + tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), + pt_var); + if (bytes2 != error_mark_node) + { + bytes2 = size_for_offset (pt_var_size, bytes2); + bytes = size_binop (MIN_EXPR, bytes, bytes2); + } } } + else + bytes = size_unknown (object_size_type); wholebytes = object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize; -- 2.35.3
RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, June 14, 2022 2:43 PM > To: Richard Biener > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; > nd > Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 > bitmask > > Richard Biener writes: > > On Mon, 13 Jun 2022, Tamar Christina wrote: > > > >> > -Original Message- > >> > From: Richard Biener > >> > Sent: Monday, June 13, 2022 12:48 PM > >> > To: Tamar Christina > >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > >> > > >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by > >> > pow2 bitmask > >> > > >> > On Mon, 13 Jun 2022, Tamar Christina wrote: > >> > > >> > > > -Original Message- > >> > > > From: Richard Biener > >> > > > Sent: Monday, June 13, 2022 10:39 AM > >> > > > To: Tamar Christina > >> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > >> > > > > >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division > >> > > > by > >> > > > pow2 bitmask > >> > > > > >> > > > On Mon, 13 Jun 2022, Richard Biener wrote: > >> > > > > >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote: > >> > > > > > >> > > > > > Hi All, > >> > > > > > > >> > > > > > In plenty of image and video processing code it's common to > >> > > > > > modify pixel values by a widening operation and then scale > >> > > > > > them back into range > >> > > > by dividing by 255. > >> > > > > > > >> > > > > > This patch adds an optab to allow us to emit an optimized > >> > > > > > sequence when doing an unsigned division that is equivalent to: > >> > > > > > > >> > > > > >x = y / (2 ^ (bitsize (y)/2)-1 > >> > > > > > > >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > >> > > > > > x86_64-pc-linux-gnu and no issues. > >> > > > > > > >> > > > > > Ok for master? > >> > > > > > >> > > > > Looking at 2/2 it seems that this is the wrong way to attack > >> > > > > the problem. The ISA doesn't have such instruction so adding > >> > > > > an optab looks premature. 
I suppose that there's no unsigned > >> > > > > vector integer division and thus we open-code that in a different > way? > >> > > > > Isn't the correct thing then to fixup that open-coding if it > >> > > > > is more > >> > efficient? > >> > > > > >> > > > >> > > The problem is that even if you fixup the open-coding it would > >> > > need to be something target specific? The sequence of > >> > > instructions we generate don't have a GIMPLE representation. So > >> > > whatever is generated I'd have to fixup in RTL then. > >> > > >> > What's the operation that doesn't have a GIMPLE representation? > >> > >> For NEON use two operations: > >> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2 > >> Where the + widens and the >> narrows. So you give it two > >> shorts, get a byte 2. Add widening add of lowpart so basically > >> lowpart (a +w b) > >> > >> For SVE2 we use a different sequence, we use two back-to-back > sequences of: > >> 1. Add narrow high part (bottom). In SVE the Top and Bottom instructions > select > >>Even and odd elements of the vector rather than "top half" and "bottom > half". > >> > >>So this instruction does : Add each vector element of the first source > vector to the > >>corresponding vector element of the second source vector, and place > the most > >> significant half of the result in the even-numbered half-width > destination elements, > >> while setting the odd-numbered elements to zero. > >> > >> So there's an explicit permute in there. The instructions are > >> sufficiently different that there wouldn't be a single GIMPLE > representation. > > > > I see. Are these also useful to express scalar integer division? > > > > I'll defer to others to ack the special udiv_pow2_bitmask optab or > > suggest some piecemail things other targets might be able to do as > > well. It does look very special. 
I'd also bikeshed it to > > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I > > interpreted 'bitmask' correctly ;)). It seems to be even less general > > since it is an unary op and the actual divisor is constrained by the > > mode itself? > > Yeah, those were my concerns as well. For n-bit numbers, the same kind of > arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so > from a target-independent point of view, m==n/2 isn't particularly special. > Hard-coding one value of m would make sense if there was an underlying > instruction that did exactly this, but like you say, there isn't. > > Would a compromise be to define an optab for ADDHN and then add a vector > pattern for this division that (at least initially) prefers ADDHN over the > current approach whenever ADDHN is available? We could then adapt the > conditions on the pattern if other targets also provide ADDHN but don't want > this transform. (I think the other instructions in the pattern already have > optabs.) > > That still leaves o
c++: Elide calls to NOP module initializers
This implements NOP module initializer elision. Each CMI gains a new flag informing importers whether its initializer actually does something (initializes its own globals, and/or calls initializers of its imports). This allows an importer to determine whether to call it.

nathan
-- Nathan Sidwell
Re: [committed] openmp: Conforming device numbers and omp_{initial, invalid}_device
Hi Jakub!

On 2022-06-13T14:06:39+0200, Jakub Jelinek via Gcc-patches wrote:
> OpenMP 5.2 changed once more what device numbers are allowed.
> libgomp/
> * testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
> test with omp_initial_device. Use -5 instead of -1 for negative value
> test.
> * testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
> Likewise. Reorder stop numbers.

In an offloading configuration, I'm seeing:

    PASS: libgomp.fortran/get-mapped-ptr-1.f90 -O (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/get-mapped-ptr-1.f90 -O execution test

Does that one need similar treatment? It FAILs in 'STOP 1'; 'libgomp.fortran/get-mapped-ptr-1.f90':

     1  program main
     2    use omp_lib
     3    use iso_c_binding
     4    implicit none (external, type)
     5    integer :: d, id
     6    type(c_ptr) :: p
     7    integer, target :: q
     8
     9    d = omp_get_default_device ()
    10    id = omp_get_initial_device ()
    11
    12    if (d < 0 .or. d >= omp_get_num_devices ()) &
    13      d = id
    14
    15    p = omp_target_alloc (c_sizeof (q), d)
    16    if (.not. c_associated (p)) &
    17      stop 0 ! okay
    18
    19    if (omp_target_associate_ptr (c_loc (q), p, c_sizeof (q), &
    20                                  0_c_size_t, d) == 0) then
    21
    22      if(c_associated (omp_get_mapped_ptr (c_loc (q), -1))) &
    23        stop 1
    [...]
Grüße Thomas > --- libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c.jj > 2022-05-23 21:44:48.950848384 +0200 > +++ libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c > 2022-06-13 13:10:56.471535852 +0200 > @@ -17,7 +17,10 @@ main () >if (!omp_target_is_accessible (p, sizeof (int), id)) > __builtin_abort (); > > - if (omp_target_is_accessible (p, sizeof (int), -1)) > + if (!omp_target_is_accessible (p, sizeof (int), omp_initial_device)) > +__builtin_abort (); > + > + if (omp_target_is_accessible (p, sizeof (int), -5)) > __builtin_abort (); > >if (omp_target_is_accessible (p, sizeof (int), n + 1)) > --- libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90.jj > 2022-05-23 21:44:48.954848343 +0200 > +++ libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90 > 2022-06-13 13:12:08.133819977 +0200 > @@ -19,12 +19,15 @@ program main >if (omp_target_is_accessible (p, c_sizeof (d), id) /= 1) & > stop 2 > > - if (omp_target_is_accessible (p, c_sizeof (d), -1) /= 0) & > + if (omp_target_is_accessible (p, c_sizeof (d), omp_initial_device) /= 1) & > stop 3 > > - if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) & > + if (omp_target_is_accessible (p, c_sizeof (d), -5) /= 0) & > stop 4 > > + if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) & > +stop 5 > + >! Currently, a host pointer is accessible if the device supports shared >! memory or omp_target_is_accessible is executed on the host. This >! test case must be adapted when unified shared memory is avialable. 
> @@ -35,14 +38,14 @@ program main > !$omp end target > > if (omp_target_is_accessible (p, c_sizeof (d), d) /= shared_mem) & > - stop 5; > + stop 6; > > if (omp_target_is_accessible (c_loc (a), 128 * sizeof (a(1)), d) /= > shared_mem) & > - stop 6; > + stop 7; > > do i = 1, 128 >if (omp_target_is_accessible (c_loc (a(i)), sizeof (a(i)), d) /= > shared_mem) & > -stop 7; > +stop 8; > end do > >end do
Re: [PATCH] opts: improve option suggestion
On 5/24/2022 2:45 AM, Martin Liška wrote: PING^1 On 5/12/22 09:10, Martin Liška wrote: On 5/11/22 20:49, David Malcolm wrote: On Wed, 2022-05-11 at 16:49 +0200, Martin Liška wrote: In case where we have 2 equally good candidates like -ftrivial-auto-var-init= -Wtrivial-auto-var-init for -ftrivial-auto-var-init, we should take the candidate that has a difference in trailing sign symbol. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR driver/105564 gcc/ChangeLog: * spellcheck.cc (test_find_closest_string): Add new test. * spellcheck.h (class best_match): Prefer a difference in trailing sign symbol. OK jeff
[PATCH] i386: Disallow sibcall when calling ifunc functions with PIC register
Disallow sibcall when calling ifunc functions with a PIC register so that the PIC register can be restored. gcc/ PR target/105960 * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return false if PIC register is used when calling ifunc functions. gcc/testsuite/ PR target/105960 * gcc.target/i386/pr105960.c: New test. --- gcc/config/i386/i386.cc | 9 + gcc/testsuite/gcc.target/i386/pr105960.c | 19 +++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr105960.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 3d189e124e4..1ca7836e11e 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -1015,6 +1015,15 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) } } + if (decl && ix86_use_pseudo_pic_reg ()) +{ + /* When PIC register is used, it must be restored after ifunc +function returns. */ + cgraph_node *node = cgraph_node::get (decl); + if (node && node->ifunc_resolver) +return false; +} + /* Otherwise okay. That also includes certain types of indirect calls. */ return true; } diff --git a/gcc/testsuite/gcc.target/i386/pr105960.c b/gcc/testsuite/gcc.target/i386/pr105960.c new file mode 100644 index 000..db137a1642d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr105960.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O2 -fpic" } */ + +__attribute__((target_clones("default","fma"))) +static inline double +expfull_ref(double x) +{ + return __builtin_pow(x, 0.1234); +} + +double +exp_ref(double x) +{ + return expfull_ref(x); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]*expfull_ref@PLT" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]*expfull_ref@PLT" { target ia32 } } } */ -- 2.36.1
[PATCH] gcc/configure.ac: fix --enable-fixed-point enablement [PR34422]
So, in investigating PR target/34422, I discovered that the gcc subdirectory's configure script had an instance of AC_ARG_ENABLE with 3rd and 4th its arguments reversed: the one where it warns that the --enable-fixed-point flag is being ignored is the one where that flag hasn't even been passed in the first place. The attached patch puts the warning in the correct argument to the macro in question. (I'm not including the regeneration of gcc/configure in the patch this time since that confused people last time.) OK to commit, with an appropriate ChangeLog? patch-gcc_configure.diff Description: Binary data
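For reference, the macro's signature is AC_ARG_ENABLE(FEATURE, HELP-STRING, ACTION-IF-GIVEN, ACTION-IF-NOT-GIVEN): the third argument runs only when the user actually passed --enable-FEATURE/--disable-FEATURE. A sketch of the corrected shape (paraphrased; target lists and messages are illustrative, not the literal gcc/configure.ac text):

```m4
dnl AC_ARG_ENABLE(FEATURE, HELP-STRING, ACTION-IF-GIVEN, ACTION-IF-NOT-GIVEN)
AC_ARG_ENABLE(fixed-point,
[AS_HELP_STRING([--enable-fixed-point],
                [enable fixed-point arithmetic extension to C])],
[dnl ACTION-IF-GIVEN: the flag was passed, so this is the only place a
 dnl "flag ignored" warning makes sense.
  case $target in
    arm* | mips*)
      ;;
    *)
      AC_MSG_WARN([fixed-point is not supported for this target, ignored])
      enable_fixed_point=no
      ;;
  esac
],
[dnl ACTION-IF-NOT-GIVEN: pick a per-target default, silently.
  case $target in
    arm* | mips*)
      enable_fixed_point=yes
      ;;
    *)
      enable_fixed_point=no
      ;;
  esac
])
```

The bug was simply that these two bodies were swapped, so the warning fired precisely when the flag had not been passed.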
Re: [PATCH 2/5] xtensa: Add support for sibling call optimization
-fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr90949.c -O2 execution test
FAIL: gcc.c-torture/execute/pr90949.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr90949.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr90949.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/printf-2.c -O2 execution test
FAIL: gcc.c-torture/execute/printf-2.c -O3 -g execution test
FAIL: gcc.c-torture/execute/printf-2.c -Os execution test
FAIL: gcc.c-torture/execute/printf-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/printf-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/packed-array.c execution test
FAIL: gcc.dg/pr20115.c execution test
FAIL: gcc.dg/pr44404.c execution test
FAIL: gcc.dg/pr81292-2.c execution test
FAIL: gcc.dg/strlenopt-31.c execution test
FAIL: gcc.dg/strlenopt-81.c execution test
FAIL: gcc.dg/torture/builtin-complex-1.c -O2 execution test
FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.dg/torture/builtin-complex-1.c -O3 -g execution test
FAIL: gcc.dg/torture/builtin-complex-1.c -Os execution test
FAIL: gcc.dg/torture/builtin-complex-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr56661.c -Os execution test
FAIL: gcc.dg/torture/pr65077.c -O2 execution test
FAIL: gcc.dg/torture/pr65077.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.dg/torture/pr65077.c -O3 -g execution test
FAIL: gcc.dg/torture/pr65077.c -Os execution test
FAIL: gcc.dg/torture/pr65077.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr67916.c -O2 execution test
FAIL: gcc.dg/torture/pr67916.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.dg/torture/pr67916.c -O3 -g execution test
FAIL: gcc.dg/torture/pr67916.c -Os execution test
FAIL: gcc.dg/torture/pr67916.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/tree-ssa/cswtch-3.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-5.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-6.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-7.c execution test

The code generated for e.g. gcc.c-torture/execute/921208-2.c looks like this:

	.file	"921208-2.c"
	.text
	.literal_position
	.align	4
	.global	g
	.type	g, @function
g:
	ret.n
	.size	g, .-g
	.literal_position
	.literal .LC1, g@PLT
	.literal .LC3, 1072693248
	.literal .LC4, 1073741824
	.align	4
	.global	f
	.type	f, @function
f:
	addi	sp, sp, -16
	s32i.n	a13, sp, 4
	l32r	a13, .LC3
	s32i.n	a12, sp, 8
	s32i.n	a14, sp, 0
	movi.n	a12, 0
	l32r	a14, .LC1
	s32i.n	a0, sp, 12
	mov.n	a3, a13
	mov.n	a4, a12
	mov.n	a5, a13
	mov.n	a2, a12
	callx0	a14
	l32i.n	a0, sp, 12
	l32i.n	a14, sp, 0
	mov.n	a4, a12
	mov.n	a5, a13
	l32i.n	a12, sp, 8
	l32i.n	a13, sp, 4
	l32r	a3, .LC4
	movi.n	a2, 0
	addi	sp, sp, 16
	jx	a14
	.size	f, .-f
	.section	.text.startup,"ax",@progbits
	.literal_position
	.literal .LC5, f@PLT
	.literal .LC6, exit@PLT
	.align	4
	.global	main
	.type	main, @function
main:
	addi	sp, sp, -16
	l32r	a2, .LC5
	s32i.n	a0, sp, 12
	callx0	a2
	l32r	a3, .LC6
	movi.n	a2, 0
	callx0	a3
	.size	main, .-main
	.ident	"GCC: (GNU) 13.0.0 20220614 (experimental)"

--
Thanks.
-- Max
[committed] libstdc++: Check for size overflow in constexpr allocation [PR105957]
Tested powerpc64le-linux, pushed to trunk. -- >8 -- libstdc++-v3/ChangeLog: PR libstdc++/105957 * include/bits/allocator.h (allocator::allocate): Check for overflow in constexpr allocation. * testsuite/20_util/allocator/105975.cc: New test. --- libstdc++-v3/include/bits/allocator.h | 7 ++- .../testsuite/20_util/allocator/105975.cc | 18 ++ 2 files changed, 24 insertions(+), 1 deletion(-) create mode 100644 libstdc++-v3/testsuite/20_util/allocator/105975.cc diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h index ee1121b080a..aec0b374fd1 100644 --- a/libstdc++-v3/include/bits/allocator.h +++ b/libstdc++-v3/include/bits/allocator.h @@ -184,7 +184,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION allocate(size_t __n) { if (std::__is_constant_evaluated()) - return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp))); + { + if (__builtin_mul_overflow(__n, sizeof(_Tp), &__n)) + std::__throw_bad_array_new_length(); + return static_cast<_Tp*>(::operator new(__n)); + } + return __allocator_base<_Tp>::allocate(__n, 0); } diff --git a/libstdc++-v3/testsuite/20_util/allocator/105975.cc b/libstdc++-v3/testsuite/20_util/allocator/105975.cc new file mode 100644 index 000..4342aeade04 --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/allocator/105975.cc @@ -0,0 +1,18 @@ +// { dg-options "-std=gnu++20" } +// { dg-do compile { target c++20 } } + +// PR libstdc++/105957 + +#include + +consteval bool test_pr105957() +{ + std::allocator a; + auto n = std::size_t(-1) / (sizeof(long long) - 1); + auto p = a.allocate(n); // { dg-error "constexpr" } + a.deallocate(p, n); + return true; +} +static_assert( test_pr105957() ); + +// { dg-error "throw_bad_array_new_length" "" { target *-*-* } 0 } -- 2.34.3
[committed] libstdc++: Fix indentation in allocator base classes
Tested powerpc64le-linux, pushed to trunk. -- >8 -- libstdc++-v3/ChangeLog: * include/bits/new_allocator.h: Fix indentation. * include/ext/malloc_allocator.h: Likewise. --- libstdc++-v3/include/bits/new_allocator.h | 6 +++--- libstdc++-v3/include/ext/malloc_allocator.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/new_allocator.h b/libstdc++-v3/include/bits/new_allocator.h index 1a5bc51b956..92ae9847f1c 100644 --- a/libstdc++-v3/include/bits/new_allocator.h +++ b/libstdc++-v3/include/bits/new_allocator.h @@ -119,9 +119,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION allocate(size_type __n, const void* = static_cast(0)) { #if __cplusplus >= 201103L -// _GLIBCXX_RESOLVE_LIB_DEFECTS -// 3308. std::allocator().allocate(n) -static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types"); + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3308. std::allocator().allocate(n) + static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types"); #endif if (__builtin_expect(__n > this->_M_max_size(), false)) diff --git a/libstdc++-v3/include/ext/malloc_allocator.h b/libstdc++-v3/include/ext/malloc_allocator.h index b61e9a85bb2..82b3f0a1c6f 100644 --- a/libstdc++-v3/include/ext/malloc_allocator.h +++ b/libstdc++-v3/include/ext/malloc_allocator.h @@ -103,9 +103,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION allocate(size_type __n, const void* = 0) { #if __cplusplus >= 201103L -// _GLIBCXX_RESOLVE_LIB_DEFECTS -// 3308. std::allocator().allocate(n) -static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types"); + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3308. std::allocator().allocate(n) + static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types"); #endif if (__builtin_expect(__n > this->_M_max_size(), false)) -- 2.34.3
[committed] libstdc++: Inline all basic_string::compare overloads [PR59048]
Tested powerpc64le-linux, pushed to trunk. -- >8 -- Defining the compare member functions inline allows calls to traits_type::length and std::min to be inlined, taking advantage of constant expression arguments. When not inline, the compiler prefers to use the explicit instantiation definitions in libstdc++.so and can't take advantage of constant arguments. libstdc++-v3/ChangeLog: PR libstdc++/59048 * include/bits/basic_string.h (compare): Define inline. * include/bits/basic_string.tcc (compare): Remove out-of-line definitions. * include/bits/cow_string.h (compare): Define inline. * testsuite/21_strings/basic_string/operations/compare/char/3.cc: New test. --- libstdc++-v3/include/bits/basic_string.h | 63 -- libstdc++-v3/include/bits/basic_string.tcc| 85 --- libstdc++-v3/include/bits/cow_string.h| 63 -- .../basic_string/operations/compare/char/3.cc | 7 ++ 4 files changed, 123 insertions(+), 95 deletions(-) create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/3.cc diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index f76ddf970c6..a34b3d9ed28 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -3235,7 +3235,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ _GLIBCXX20_CONSTEXPR int - compare(size_type __pos, size_type __n, const basic_string& __str) const; + compare(size_type __pos, size_type __n, const basic_string& __str) const + { + _M_check(__pos, "basic_string::compare"); + __n = _M_limit(__pos, __n); + const size_type __osize = __str.size(); + const size_type __len = std::min(__n, __osize); + int __r = traits_type::compare(_M_data() + __pos, __str.data(), __len); + if (!__r) + __r = _S_compare(__n, __osize); + return __r; + } /** * @brief Compare substring to a substring. 
@@ -3263,7 +3273,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _GLIBCXX20_CONSTEXPR int compare(size_type __pos1, size_type __n1, const basic_string& __str, - size_type __pos2, size_type __n2 = npos) const; + size_type __pos2, size_type __n2 = npos) const + { + _M_check(__pos1, "basic_string::compare"); + __str._M_check(__pos2, "basic_string::compare"); + __n1 = _M_limit(__pos1, __n1); + __n2 = __str._M_limit(__pos2, __n2); + const size_type __len = std::min(__n1, __n2); + int __r = traits_type::compare(_M_data() + __pos1, + __str.data() + __pos2, __len); + if (!__r) + __r = _S_compare(__n1, __n2); + return __r; + } /** * @brief Compare to a C string. @@ -3281,7 +3303,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ _GLIBCXX20_CONSTEXPR int - compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT; + compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT + { + __glibcxx_requires_string(__s); + const size_type __size = this->size(); + const size_type __osize = traits_type::length(__s); + const size_type __len = std::min(__size, __osize); + int __r = traits_type::compare(_M_data(), __s, __len); + if (!__r) + __r = _S_compare(__size, __osize); + return __r; + } // _GLIBCXX_RESOLVE_LIB_DEFECTS // 5 String::compare specification questionable @@ -3306,7 +3338,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 */ _GLIBCXX20_CONSTEXPR int - compare(size_type __pos, size_type __n1, const _CharT* __s) const; + compare(size_type __pos, size_type __n1, const _CharT* __s) const + { + __glibcxx_requires_string(__s); + _M_check(__pos, "basic_string::compare"); + __n1 = _M_limit(__pos, __n1); + const size_type __osize = traits_type::length(__s); + const size_type __len = std::min(__n1, __osize); + int __r = traits_type::compare(_M_data() + __pos, __s, __len); + if (!__r) + __r = _S_compare(__n1, __osize); + return __r; + } /** * @brief Compare substring against a character %array. 
@@ -3335,7 +3378,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _GLIBCXX20_CONSTEXPR int compare(size_type __pos, size_type __n1, const _CharT* __s, - size_type __n2) const; + size_type __n2) const + { + __glibcxx_requires_string_len(__s, __n2); + _M_check(__pos, "basic_string::compare"); + __n1 = _M_limit(__pos, __n1); + const size_type __len = std::min(__n1, __n2); + int __r = traits_type::compare(_M_data() + __pos, __s, __len); + if (!__r) + __r = _S_compare(__n1, __n2); + return __r; + } #if __cplusplus >= 202002L constexpr bool diff --git a/libstdc++-v3/include/bits/b
[committed] libstdc++: Check lengths first in operator== for basic_string [PR62187]
Tested powerpc64le-linux, pushed to trunk. -- >8 -- As confirmed by LWG 2852, the calls to traits_type::compare do not need to be observable, so we can make operator== compare string lengths first and return immediately for non-equal lengths. This avoids doing a slow string comparison for "abc...xyz" == "abc...xy". Previously we only did this optimization for std::char_traits, but we can enable it unconditionally thanks to LWG 2852. For comparisons with a const char* we can call traits_type::length right away to do the same optimization. That strlen call can be folded away for constant arguments, making it very efficient. For the pre-C++20 operator== and operator!= overloads we can swap the order of the arguments to take advantage of the operator== improvements. libstdc++-v3/ChangeLog: PR libstdc++/62187 * include/bits/basic_string.h (operator==): Always compare lengths before checking string contents. [!__cpp_lib_three_way_comparison] (operator==, operator!=): Reorder arguments. --- libstdc++-v3/include/bits/basic_string.h | 24 ++-- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index a34b3d9ed28..57247e306dc 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -3627,17 +3627,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs, const basic_string<_CharT, _Traits, _Alloc>& __rhs) _GLIBCXX_NOEXCEPT -{ return __lhs.compare(__rhs) == 0; } - - template -_GLIBCXX20_CONSTEXPR -inline -typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value, bool>::__type -operator==(const basic_string<_CharT>& __lhs, - const basic_string<_CharT>& __rhs) _GLIBCXX_NOEXCEPT -{ return (__lhs.size() == __rhs.size() - && !std::char_traits<_CharT>::compare(__lhs.data(), __rhs.data(), - __lhs.size())); } +{ + return __lhs.size() == __rhs.size() + && !_Traits::compare(__lhs.data(), __rhs.data(),
__lhs.size()); +} /** * @brief Test equivalence of string and C string. @@ -3650,7 +3643,10 @@ _GLIBCXX_END_NAMESPACE_CXX11 inline bool operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs, const _CharT* __rhs) -{ return __lhs.compare(__rhs) == 0; } +{ + return __lhs.size() == _Traits::length(__rhs) + && !_Traits::compare(__lhs.data(), __rhs, __lhs.size()); +} #if __cpp_lib_three_way_comparison /** @@ -3691,7 +3687,7 @@ _GLIBCXX_END_NAMESPACE_CXX11 inline bool operator==(const _CharT* __lhs, const basic_string<_CharT, _Traits, _Alloc>& __rhs) -{ return __rhs.compare(__lhs) == 0; } +{ return __rhs == __lhs; } // operator != /** @@ -3717,7 +3713,7 @@ _GLIBCXX_END_NAMESPACE_CXX11 inline bool operator!=(const _CharT* __lhs, const basic_string<_CharT, _Traits, _Alloc>& __rhs) -{ return !(__lhs == __rhs); } +{ return !(__rhs == __lhs); } /** * @brief Test difference of string and C string. -- 2.34.3
[PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]
Hello- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902 The attached patch resolves PR preprocessor/103902 as described in the patch message inline below. bootstrap + regtest all languages was successful on x86-64 Linux, with no new failures: FAIL 103 103 PASS 542338 542371 UNSUPPORTED 15247 15250 UNTESTED 136 136 XFAIL 4166 4166 XPASS 17 17 Please let me know if it looks OK? A few questions I have: - A difference introduced with this patch is that after lexing something like `operator ""_abc', then `_abc' is added to the identifier hash map, whereas previously it was not. I feel like this must be OK because with the optional space as in `operator "" _abc', it would be added with or without the patch. - The behavior of `#pragma GCC poison' is not consistent (including prior to my patch). I tried to make it more so but there is still one thing I want to ask about. Leaving aside extended characters for now, the inconsistency is that currently the poison is only checked, when the suffix appears as a standalone token. #pragma GCC poison _X bool operator ""_X (unsigned long long); //accepted before the patch, //rejected after it bool operator "" _X (unsigned long long); //rejected either before or after const char * operator ""_X (const char *, unsigned long); //accepted before, //rejected after const char * operator "" _X (const char *, unsigned long); //rejected either const char * s = ""_X; //accepted before the patch, rejected after it const bool b = 1_X; //accepted before or after I feel like after the patch, the behavior is the expected behavior for all cases but the last one. Here, we allow the poisoned identifier because it's not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK like this or does it need to be addressed? Thanks for taking a look! 
-Lewis Subject: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902] The PR complains that we do not handle UTF-8 in the suffix for a user-defined literal, such as: bool operator ""_π (unsigned long long); In fact we don't handle any extended identifier characters there, whether UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after the "" tokens is included, since then the identifier is lexed in the "normal" way as its own token. But when it is lexed as part of the string token, this is handled in lex_string() with a one-off loop that is not aware of extended characters. This patch fixes it by adding a new function scan_cur_identifier() that can be used to lex an identifier while in the middle of lexing another token. It is somewhat duplicative of the code in lex_identifier(), which handles the normal case, but I think there's no good way to avoid that without pessimizing the usual case, since lex_identifier() takes advantage of the fact that the first character of the identifier has already been analyzed. The code duplication is somewhat offset by factoring out the identifier lexing diagnostics (e.g. for poisoned identifiers), which were formerly duplicated in two places, and have been factored into their own function that's used in (now) 3 places. BTW, the other place that was lexing identifiers is lex_identifier_intern(), which is used to implement #pragma push_macro and #pragma pop_macro. This does not support extended characters either. I will add that in a subsequent patch, because it can't directly reuse the new function, but rather needs to lex from a string instead of a cpp_buffer. With scan_cur_identifier(), we do also correctly warn about bidi and normalization issues in the extended identifiers comprising the suffix, and we check for poisoned identifiers there as well. PR preprocessor/103902 libcpp/ChangeLog: * lex.cc (identifier_diagnostics_on_lex): New function refactors common code from... 
(lex_identifier_intern): ...here, and... (lex_identifier): ...here. (struct scan_id_result): New struct to hold the result of... (scan_cur_identifier): ...new function. (create_literal2): New function. (is_macro): Removed function that is now handled directly in lex_string() and lex_raw_string(). (is_macro_not_literal_suffix): Likewise. (lit_accum::create_literal2): New function. (lex_raw_string): Make use of new function scan_cur_identifier(). (lex_string): Likewise. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/udlit-extended-id-1.C: New test. * g++.dg/cpp0x/udlit-extended-id-2.C: New test. * g++.dg/cpp0x/udlit-extended-id-3.C: New test. diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C new file mode 100644 index 000..411d4fdd0ba --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C @@ -0,0 +1,68 @@ +// { dg-do run
Re: [PATCH][_Hashtable] Fix insertion of range of type convertible to value_type PR 105714
On Wed, 25 May 2022 at 06:10, François Dumont wrote: > > Here is the patch to fix just what is described in PR 105714. > > libstdc++: [_Hashtable] Insert range of types convertible to > value_type PR 105714 > > Fix insertion of range of types convertible to value_type. > > libstdc++-v3/ChangeLog: > > PR libstdc++/105714 > * include/bits/hashtable_policy.h (_ValueTypeEnforcer): New. > * include/bits/hashtable.h > (_Hashtable<>::_M_insert_unique_aux): New. > (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&, > true_type)): Use latters. > (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&, > false_type)): Likewise. > (_Hashtable(_InputIterator, _InputIterator, size_type, > const _Hash&, const _Equal&, > const allocator_type&, true_type)): Use this.insert range. > (_Hashtable(_InputIterator, _InputIterator, size_type, > const _Hash&, const _Equal&, > const allocator_type&, false_type)): Use _M_insert. > * testsuite/23_containers/unordered_map/cons/56112.cc: > Check how many times conversion > is done. > * testsuite/23_containers/unordered_map/insert/105714.cc: > New test. > * testsuite/23_containers/unordered_set/insert/105714.cc: > New test. > > Tested under Linux x64, ok to commit ? I think "_ConvertToValueType" would be a better name than _ValueTypeEnforcer, and all overloads of _ValueTypeEnforcer can be const. OK with that change, thanks.
libgo patch committed: Format the syscall package
This Go formatter is starting to format documentation comments in some cases. As a step toward that in libgo, this patch adds blank lines after //sys comments in the syscall package where needed, and then runs the new formatter on the syscall package files. This is the libgo version of https://go.dev/cl/407136. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 3f4a86eef4ebc28e394a7108a2353098d2ca4856 diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 2cf7141c4fa..aeada9f8d0c 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -0058658a9efb6e5c5faa6f0f65949beea5ddbc98 +bbb3a4347714faee620dc205674510a0f20b81ae The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/go/syscall/dir_plan9.go b/libgo/go/syscall/dir_plan9.go index 4ed052de761..1667cbc02f4 100644 --- a/libgo/go/syscall/dir_plan9.go +++ b/libgo/go/syscall/dir_plan9.go @@ -184,6 +184,7 @@ func gbit8(b []byte) (uint8, []byte) { } // gbit16 reads a 16-bit number in little-endian order from b and returns it with the remaining slice of b. +// //go:nosplit func gbit16(b []byte) (uint16, []byte) { return uint16(b[0]) | uint16(b[1])<<8, b[2:] diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go index 6c2441d364d..59f7a82c6d7 100644 --- a/libgo/go/syscall/errstr.go +++ b/libgo/go/syscall/errstr.go @@ -4,8 +4,8 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. 
-// +build !hurd -// +build !linux +//go:build !hurd && !linux +// +build !hurd,!linux package syscall diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go index 5b19e6f202d..03a327dbc90 100644 --- a/libgo/go/syscall/errstr_glibc.go +++ b/libgo/go/syscall/errstr_glibc.go @@ -7,6 +7,7 @@ // We use this rather than errstr.go because on GNU/Linux sterror_r // returns a pointer to the error message, and may not use buf at all. +//go:build hurd || linux // +build hurd linux package syscall diff --git a/libgo/go/syscall/exec_bsd.go b/libgo/go/syscall/exec_bsd.go index 86e513efdea..e631593cbd9 100644 --- a/libgo/go/syscall/exec_bsd.go +++ b/libgo/go/syscall/exec_bsd.go @@ -49,6 +49,7 @@ func runtime_AfterForkInChild() // For the same reason compiler does not race instrument it. // The calls to RawSyscall are okay because they are assembly // functions that do not grow the stack. +// //go:norace func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) { // Declare all variables at top in case any diff --git a/libgo/go/syscall/exec_freebsd.go b/libgo/go/syscall/exec_freebsd.go index f02f89d1ca0..8e8ecb7e989 100644 --- a/libgo/go/syscall/exec_freebsd.go +++ b/libgo/go/syscall/exec_freebsd.go @@ -57,6 +57,7 @@ func runtime_AfterForkInChild() // For the same reason compiler does not race instrument it. // The calls to RawSyscall are okay because they are assembly // functions that do not grow the stack. 
+// //go:norace func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) { // Declare all variables at top in case any diff --git a/libgo/go/syscall/exec_hurd.go b/libgo/go/syscall/exec_hurd.go index 06df513c55c..a62b3e920e6 100644 --- a/libgo/go/syscall/exec_hurd.go +++ b/libgo/go/syscall/exec_hurd.go @@ -49,6 +49,7 @@ func runtime_AfterForkInChild() // For the same reason compiler does not race instrument it. // The calls to RawSyscall are okay because they are assembly // functions that do not grow the stack. +// //go:norace func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) { // Declare all variables at top in case any diff --git a/libgo/go/syscall/exec_linux.go b/libgo/go/syscall/exec_linux.go index 86fb8e84a66..77846af89e4 100644 --- a/libgo/go/syscall/exec_linux.go +++ b/libgo/go/syscall/exec_linux.go @@ -80,6 +80,7 @@ func runtime_AfterFork() func runtime_AfterForkInChild() // Implemented in clone_linux.c +// //go:noescape func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, ctid *Pid_t, regs unsafe.Pointer) _C_long @@ -92,6 +93,7 @@ func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, ctid *Pid_t, regs // For the same reason compiler does not race instrument it. // The calls to RawSyscall are okay because they are assembly // functions that do not grow the stack. +// //go:norace func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) { // Set up and fork. This returns immediately in the parent or diff --git a/libgo/go/syscall/exec_stubs.go b/libgo/go/syscall/exec_stubs.go index c837cf7a4e2..
Re: [PATCH V2]rs6000: Store complicated constant into pool
Hi! On Tue, Jun 14, 2022 at 09:23:55PM +0800, Jiufu Guo wrote: > This patch reduces the threshold of instruction number for storing > constant to pool and update cost for constant and mem accessing. > And then if building the constant needs more than 2 instructions (or > more than 1 instruction on P10), then prefer to load it from constant > pool. Have you tried with different limits? And, p10 is a red herring, you actually test if prefixed insns are used. > * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): > Exclude rtx with code 'HIGH'. > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void) > static bool > rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) > { > - if (GET_CODE (x) == HIGH > - && GET_CODE (XEXP (x, 0)) == UNSPEC) > + /* Exclude CONSTANT HIGH part. e.g. > + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )). */ > + if (GET_CODE (x) == HIGH) > return true; So, why is this? You didn't explain. > @@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode > mode) > && FP_REGNO_P (REGNO (operands[0]))) > || !CONST_INT_P (operands[1]) > || (num_insns_constant (operands[1], mode) > -> (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))) > +> (TARGET_PREFIXED ? 1 : 2))) > && !toc_relative_expr_p (operands[1], false, NULL, NULL) > && (TARGET_CMODEL == CMODEL_SMALL > || can_create_pseudo_p () This is the more obvious part. > @@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > outer_code, > > case CONST_DOUBLE: > case CONST_WIDE_INT: > + /* It may needs a few insns for const to SET. -1 for outer SET code. > */ > + if (outer_code == SET) > + { > + *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1; > + return true; > + } > + /* FALLTHRU */ > + > case CONST: > case HIGH: > case SYMBOL_REF: But this again isn't an obvious improvement at all, and needs performance testing -- separately of the other changes. 
> @@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int > outer_code, > case MEM: >/* When optimizing for size, MEM should be slightly more expensive >than generating address, e.g., (plus (reg) (const)). > - L1 cache latency is about two instructions. */ > - *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2); > + L1 cache latency is about two instructions. > + For prefixed load (pld), we would set it slightly faster than > + than two instructions. */ > + *total = !speed > + ? COSTS_N_INSNS (1) + 1 > + : TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2); >if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x))) > *total += COSTS_N_INSNS (100); >return true; And this is completely independent of the rest as well. Cost 5 or 7 are completely random numbers, why did you pick these? Does it work better than 8, or 4, etc.? > --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c > +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c > @@ -1,7 +1,7 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-require-effective-target lp64 } */ > /* { dg-options "-O" } */ > -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */ > +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */ Why? This is still better generated in code, no? It should never be loaded from a constant pool (it is hex 4000___, easy to construct with just one or two insns). > --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c > @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0x92349234ULL; } > unsigned long long mskl1() { return 0x2bcd2bcdULL; } > unsigned long long mskse() { return 0x12341234ULL; } > > -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */ > +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */ Please make this the exact number of times you want to see rldimi and the number of times you want a load. 
Btw, you need to write \m(?:rldimi|ld|pld)\M or it will mean \mrldimi or ld or pld\M (and that "ld" will match anything that "pld$" will match of course). So no doubt this will improve things, but we need testing of each part separately. Also look at code size, or differences in the generated code in general: this is much more sensitive to detect than performance, and is not itself sensitive to things like system load, so a) is easier to measure, and b) has more useful outputs, outputs that tell more of the whole story. Segher
[PATCH] Fix ICE in extract_insn, at recog.cc:2791
(In reply to Uroš Bizjak from comment #1) > Instruction does not accept memory operand for operand 3: > > (define_insn_and_split > "*_blendv_ltint" > [(set (match_operand: 0 "register_operand" "=Yr,*x,x") > (unspec: > [(match_operand: 1 "register_operand" "0,0,x") > (match_operand: 2 "vector_operand" "YrBm,*xBm,xm") > (subreg: >(lt:VI48_AVX > (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x") > (match_operand:VI48_AVX 4 "const0_operand")) 0)] > UNSPEC_BLENDV))] > > The problematic insn is: > > (define_insn_and_split "*avx_cmp3_ltint_not" > [(set (match_operand:VI48_AVX 0 "register_operand") >(vec_merge:VI48_AVX >(match_operand:VI48_AVX 1 "vector_operand") >(match_operand:VI48_AVX 2 "vector_operand") >(unspec: > [(subreg:VI48_AVX > (not: > (match_operand: 3 "vector_operand")) 0) > (match_operand:VI48_AVX 4 "const0_operand") > (match_operand:SI 5 "const_0_to_7_operand")] > UNSPEC_PCMP)))] > > which gets split to the above pattern. > > In the preparation statements we have: > > if (!MEM_P (operands[3])) > operands[3] = force_reg (mode, operands[3]); > operands[3] = lowpart_subreg (mode, operands[3], mode); > > Which won't fly when operand 3 is memory operand... > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/105953 * config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg operands[3]. gcc/testsuite/ChangeLog: * g++.target/i386/pr105953.C: New test. 
--- gcc/config/i386/sse.md | 3 +-- gcc/testsuite/g++.target/i386/pr105953.C | 4 2 files changed, 5 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 75609eaf9b7..3e3d96fe087 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not" gen_lowpart (mode, operands[1])); operands[2] = gen_lowpart (mode, operands[2]); - if (!MEM_P (operands[3])) -operands[3] = force_reg (mode, operands[3]); + operands[3] = force_reg (mode, operands[3]); operands[3] = lowpart_subreg (mode, operands[3], mode); }) diff --git a/gcc/testsuite/g++.target/i386/pr105953.C b/gcc/testsuite/g++.target/i386/pr105953.C new file mode 100644 index 000..b423d2dfdae --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr105953.C @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mabi=ms" } */ + +#include "pr100738-1.C" -- 2.18.1
Re: [PATCH] Fix ICE in extract_insn, at recog.cc:2791
On Wed, Jun 15, 2022 at 12:49 AM liuhongt wrote: > > (In reply to Uroš Bizjak from comment #1) > > Instruction does not accept memory operand for operand 3: > > > > (define_insn_and_split > > "*_blendv_ltint" > > [(set (match_operand: 0 "register_operand" "=Yr,*x,x") > > (unspec: > > [(match_operand: 1 "register_operand" "0,0,x") > > (match_operand: 2 "vector_operand" "YrBm,*xBm,xm") > > (subreg: > >(lt:VI48_AVX > > (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x") > > (match_operand:VI48_AVX 4 "const0_operand")) 0)] > > UNSPEC_BLENDV))] > > > > The problematic insn is: > > > > (define_insn_and_split "*avx_cmp3_ltint_not" > > [(set (match_operand:VI48_AVX 0 "register_operand") > >(vec_merge:VI48_AVX > >(match_operand:VI48_AVX 1 "vector_operand") > >(match_operand:VI48_AVX 2 "vector_operand") > >(unspec: > > [(subreg:VI48_AVX > > (not: > > (match_operand: 3 "vector_operand")) 0) > > (match_operand:VI48_AVX 4 "const0_operand") > > (match_operand:SI 5 "const_0_to_7_operand")] > > UNSPEC_PCMP)))] > > > > which gets split to the above pattern. > > > > In the preparation statements we have: > > > > if (!MEM_P (operands[3])) > > operands[3] = force_reg (mode, operands[3]); > > operands[3] = lowpart_subreg (mode, operands[3], mode); > > > > Which won't fly when operand 3 is memory operand... > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/105953 > * config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg > operands[3]. > > gcc/testsuite/ChangeLog: > > * g++.target/i386/pr105953.C: New test. LGTM. Thanks, Uros. 
> --- > gcc/config/i386/sse.md | 3 +-- > gcc/testsuite/g++.target/i386/pr105953.C | 4 > 2 files changed, 5 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 75609eaf9b7..3e3d96fe087 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not" > gen_lowpart (mode, operands[1])); >operands[2] = gen_lowpart (mode, operands[2]); > > - if (!MEM_P (operands[3])) > -operands[3] = force_reg (mode, operands[3]); > + operands[3] = force_reg (mode, operands[3]); >operands[3] = lowpart_subreg (mode, operands[3], mode); > }) > > diff --git a/gcc/testsuite/g++.target/i386/pr105953.C > b/gcc/testsuite/g++.target/i386/pr105953.C > new file mode 100644 > index 000..b423d2dfdae > --- /dev/null > +++ b/gcc/testsuite/g++.target/i386/pr105953.C > @@ -0,0 +1,4 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mavx512vl -mabi=ms" } */ > + > +#include "pr100738-1.C" > -- > 2.18.1 >
Re: [PATCH 0/9] Add debug_annotate attributes
On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow associating (to "annotate" or to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM.

Purpose
=======

1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the linux kernel function do_execve with the following declaration:

  static int do_execve(struct filename *filename,
                       const char __user *const __user *__argv,
                       const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilities such as the eBPF verifier can read the tags and make use of the information.

2) Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                 BTF    +----------+
    | pahole | ----> vmlinux.btf -------> | verifier |
    +--------+                            +----------+
        ^                                      ^
        |                                      |
      DWARF                                   BTF
        |                                      |
     vmlinux                           +-------------+
     module1.ko                        | BPF program |
     module2.ko                        +-------------+
     ...
This is because: a) Unlike GCC, LLVM will only generate BTF for BPF programs. b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker. In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF). Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel. 3) Conveying the tags in the generated BTF debug info. This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs. For more information about these tags and the motivation behind them, please refer to the following linux kernel discussions: https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/ https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/ https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/ Implementation Overview === To enable these annotations, two new C language attributes are added: __attribute__((debug_annotate_decl("foo"))) and __attribute__((debug_annotate_type("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debug information. They have no effect on code generation. Note that we are not using the same attribute names as LLVM (btf_decl_tag and btf_type_tag, respectively). While these attributes are functionally very similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" in the attribute name seems misleading. DWARF support is enabled via a new DW_TAG_GNU_annotation. 
When generating DWARF, declarations and types will be checked for the corresponding attributes. If present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for the annotated type or declaration, one for each tag. These DIEs link the arbitrary tag value to the item they annotate. For example, the following variable declaration: #define __typetag1 _
Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
Hi Honza, Thanks for the comments! Some replies are inlined below. on 2022/6/14 19:37, Jan Hubicka wrote: >> Hi, >> >> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >> if func->decl is not null but no cgraph node is available for it. >> As PR105818 shows, this could give unexpected result. For the >> case in PR105818, when parsing bar decl in function foo, the cfun >> is a function structure for foo, for which there is none cgraph >> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >> the context is to optimize for size, the flag optimize_size is >> true. >> >> The patch is to make optimize_function_for_size_p to check >> optimize_size as what it does when func->decl is unavailable. >> >> One regression failure got exposed on aarch64-linux-gnu: >> >> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >> >> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >> used in function fold_range_test during c parsing, it uses >> optimize_function_for_speed_p which is equal to the invertion >> of optimize_function_for_size_p. At that time cfun->decl is valid >> but no cgraph node for it, w/o this patch function >> optimize_function_for_speed_p returns true eventually, while it >> returns false with this patch. Since the command line option -Os >> is specified, there is no reason to interpret it as "for speed". >> I think this failure is expected and adjust the test case >> accordingly. >> >> Is it ok for trunk? >> >> BR, >> Kewen >> - >> >> PR target/105818 >> >> gcc/ChangeLog: >> >> * predict.cc (optimize_function_for_size_p): Check optimize_size when >> func->decl is valid but its cgraph node is unavailable. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr105818.c: New test. >> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. 
>> ---
>> gcc/predict.cc | 2 +-
>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +-
>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
>> 3 files changed, 11 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>
>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>> index 5734e4c8516..6c60a973236 100644
>> --- a/gcc/predict.cc
>> +++ b/gcc/predict.cc
>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>   cgraph_node *n = cgraph_node::get (fun->decl);
>>   if (n)
>>     return n->optimize_for_size_p ();
>> - return OPTIMIZE_SIZE_NO;
>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>
> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
> probably better since one can change optimize_size with optimization
> attribute.

Good point, agree!

> However I think in most cases we check for optimize_size early I think
> we are doing something wrong, since at that time there is no profile
> available. Why exactly PR105818 hits the flag change issue?

For PR105818, the reason why the flag changes is that:

Firstly, the inconsistent flag is the OPTION_MASK_SAVE_TOC_INDIRECT bit of rs6000_isa_flags_explicit; it's set as below:

  /* If we can shrink-wrap the TOC register save separately, then use
     -msave-toc-indirect unless explicitly disabled.  */
  if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
      && flag_shrink_wrap_separate
      && optimize_function_for_speed_p (cfun))
    rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;

Initially, rs6000 initializes target_option_default_node with OPTION_MASK_SAVE_TOC_INDIRECT unset; at that time cfun is NULL and optimize_size is true. Later, when the C parser is handling function foo, it builds a target option node as target_option_default_node in function handle_optimize_attribute; it does global option saving and verifying there as well, and at that time cfun is NULL, so no issue is found.
Function store_parm_decls then allocates a struct function for foo, so cfun becomes the function struct for foo. When the C parser continues to handle the decl bar in foo, function handle_optimize_attribute works as before and tries to restore the target options at the end: it calls targetm.target_option.restore (rs6000_function_specific_restore), which calls function rs6000_option_override_internal again. At this time cfun is not NULL while there is no cgraph node for its decl, so optimize_function_for_speed_p returns true and the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag rs6000_isa_flags gets set unexpectedly, inconsistent with the one saved previously.

IMHO, both the global context and the context of function decl foo here hold optimize_size, so function optimize_function_for_speed_p should not return true anyway.

Btw, the failed aarch64 case also gets the unexpected result from optimize_function_for_speed_p during C parsing (fold_range_test <- ... <- c_parser_condition). IIUC, at parsing time we don't have the profile information available.

BR, Kewen
Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
on 2022/6/14 20:53, Segher Boessenkool wrote:
> On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
>> PR target/105818
>
> Change this to something else? It is not a target bug. "middle-end"
> perhaps?

Good catch, will fix this. Thanks, Segher!

BR, Kewen