Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]
On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches wrote: > > On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote: > > Please put some space here, e.g.: > ... > > Can you just name the relevant insn pattern and use > > > > emit_insn (gen_bsr_1)? > > Here is the updated patch. I'll bootstrap/regtest it tonight. > > 2021-07-30 Jakub Jelinek > > PR target/78103 > * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New > define_insn patterns. > (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns. > Add combine splitters for constant - clz. > (clz2): Use a temporary pseudo for bsr result. > > * gcc.target/i386/pr78103-1.c: New test. > * gcc.target/i386/pr78103-2.c: New test. > * gcc.target/i386/pr78103-3.c: New test. > > --- gcc/config/i386/i386.md.jj 2021-07-28 12:05:56.857977764 +0200 > +++ gcc/config/i386/i386.md 2021-07-30 15:13:49.994946550 +0200 > @@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64" > (set_attr "znver1_decode" "vector") > (set_attr "mode" "DI")]) > > +(define_insn "bsr_rex64_1" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (minus:DI (const_int 63) > + (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm" > + (clobber (reg:CC FLAGS_REG))] > + "!TARGET_LZCNT && TARGET_64BIT" > + "bsr{q}\t{%1, %0|%0, %1}" > + [(set_attr "type" "alu1") > + (set_attr "prefix_0f" "1") > + (set_attr "znver1_decode" "vector") > + (set_attr "mode" "DI")]) > + > (define_insn "bsr" >[(set (reg:CCZ FLAGS_REG) > (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm") > @@ -14775,17 +14787,204 @@ (define_insn "bsr" > (set_attr "znver1_decode" "vector") > (set_attr "mode" "SI")]) > > +(define_insn "bsr_1" > + [(set (match_operand:SI 0 "register_operand" "=r") > + (minus:SI (const_int 31) > + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm" > + (clobber (reg:CC FLAGS_REG))] > + "!TARGET_LZCNT" > + "bsr{l}\t{%1, %0|%0, %1}" > + [(set_attr "type" "alu1") > + (set_attr "prefix_0f" "1") > + (set_attr "znver1_decode" "vector") > + 
(set_attr "mode" "SI")]) > + > +(define_insn "bsr_zext_1" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (zero_extend:DI > + (minus:SI > + (const_int 31) > + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm") > + (clobber (reg:CC FLAGS_REG))] > + "!TARGET_LZCNT && TARGET_64BIT" > + "bsr{l}\t{%1, %k0|%k0, %1}" > + [(set_attr "type" "alu1") > + (set_attr "prefix_0f" "1") > + (set_attr "znver1_decode" "vector") > + (set_attr "mode" "SI")]) > + > +; As bsr is undefined behavior on zero and for other input > +; values it is in range 0 to 63, we can optimize away sign-extends. > +(define_insn_and_split "*bsr_rex64_2" > + [(set (match_operand:DI 0 "register_operand") > + (xor:DI > + (sign_extend:DI > + (minus:SI > + (const_int 63) > + (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand")) > +0))) > + (const_int 63))) > +(clobber (reg:CC FLAGS_REG))] > + "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(parallel [(set (reg:CCZ FLAGS_REG) > + (compare:CCZ (match_dup 1) (const_int 0))) > + (set (match_dup 2) > + (minus:DI (const_int 63) (clz:DI (match_dup 1]) > + (parallel [(set (match_dup 0) > + (zero_extend:DI (xor:SI (match_dup 3) (const_int 63 > + (clobber (reg:CC FLAGS_REG))])] > +{ > + operands[2] = gen_reg_rtx (DImode); > + operands[3] = lowpart_subreg (SImode, operands[2], DImode); > +}) > + > +(define_insn_and_split "*bsr_2" > + [(set (match_operand:DI 0 "register_operand") > + (sign_extend:DI > + (xor:SI > + (minus:SI > + (const_int 31) > + (clz:SI (match_operand:SI 1 "nonimmediate_operand"))) > + (const_int 31 > + (clobber (reg:CC FLAGS_REG))] > + "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()" > + "#" > + "&& 1" > + [(parallel [(set (reg:CCZ FLAGS_REG) > + (compare:CCZ (match_dup 1) (const_int 0))) > + (set (match_dup 2) > + (minus:SI (const_int 31) (clz:SI (match_dup 1]) > + (parallel [(set (match_dup 0) > + (zero_extend:DI (xor:SI (match_dup 2) (const_int 31 > + (clobber (reg:CC 
FLAGS_REG))])] > + "operands[2] = gen_reg_rtx (SImode);") > + > +; Splitters to optimize 64 - __builtin_clzl (x) or 32 - __builtin_clz (x). > +; Again, as for !TARGET_LZCNT CLZ is UB at zero, CLZ is guaranteed to be > +; in [0, 63] or [0, 31] range. > +(define_split > + [(set (match_operand:SI 0 "register_operand") > + (minus:SI > + (match_operand:SI 2 "const_int_operand") > + (xor:SI > + (minus:SI (const
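The arithmetic the patch comments rely on can be sketched in plain C. This is only an illustration, assuming a 32-bit unsigned int and a nonzero argument (the case where __builtin_clz is defined); the helper names bsr32, clz_via_bsr32 and bit_width32 are hypothetical and do not appear in the patch.

```c
/* Index of the highest set bit, i.e. the value bsr leaves in its
   destination register.  x must be nonzero.  */
static int
bsr32 (unsigned x)
{
  return 31 - __builtin_clz (x);
}

/* For any v in [0, 31], v ^ 31 == 31 - v, so clz can be recovered
   from the bsr result with a single xor.  */
static int
clz_via_bsr32 (unsigned x)
{
  return bsr32 (x) ^ 31;
}

/* "constant - clz": 32 - clz (x) is the bsr result plus one, so no
   xor is needed at all, only an add (or lea).  */
static int
bit_width32 (unsigned x)
{
  return 32 - __builtin_clz (x);
}
```

Because the intermediate value is always in [0, 31] (or [0, 63] for the 64-bit variant), sign-extending and zero-extending it give the same result, which is presumably why the patterns can drop the sign extension.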
Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]
On Sat, Jul 31, 2021 at 12:38 PM H.J. Lu wrote: > > On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches > wrote: > > > > On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote: > > > Please put some space here, e.g.: > > ... > > > Can you just name the relevant insn pattern and use > > > > > > emit_insn (gen_bsr_1)? > > > > Here is the updated patch. I'll bootstrap/regtest it tonight. > > > > 2021-07-30 Jakub Jelinek > > > > PR target/78103 > > * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New > > define_insn patterns. > > (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns. > > Add combine splitters for constant - clz. > > (clz2): Use a temporary pseudo for bsr result. > > > > * gcc.target/i386/pr78103-1.c: New test. > > * gcc.target/i386/pr78103-2.c: New test. > > * gcc.target/i386/pr78103-3.c: New test. > > > > --- gcc/config/i386/i386.md.jj 2021-07-28 12:05:56.857977764 +0200 > > +++ gcc/config/i386/i386.md 2021-07-30 15:13:49.994946550 +0200 > > @@ -14761,6 +14761,18 @@ (define_insn "bsr_rex64" > > (set_attr "znver1_decode" "vector") > > (set_attr "mode" "DI")]) > > > > +(define_insn "bsr_rex64_1" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (minus:DI (const_int 63) > > + (clz:DI (match_operand:DI 1 "nonimmediate_operand" > > "rm" > > + (clobber (reg:CC FLAGS_REG))] > > + "!TARGET_LZCNT && TARGET_64BIT" > > + "bsr{q}\t{%1, %0|%0, %1}" > > + [(set_attr "type" "alu1") > > + (set_attr "prefix_0f" "1") > > + (set_attr "znver1_decode" "vector") > > + (set_attr "mode" "DI")]) > > + > > (define_insn "bsr" > >[(set (reg:CCZ FLAGS_REG) > > (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm") > > @@ -14775,17 +14787,204 @@ (define_insn "bsr" > > (set_attr "znver1_decode" "vector") > > (set_attr "mode" "SI")]) > > > > +(define_insn "bsr_1" > > + [(set (match_operand:SI 0 "register_operand" "=r") > > + (minus:SI (const_int 31) > > + (clz:SI (match_operand:SI 1 "nonimmediate_operand" > > "rm" > > + (clobber (reg:CC 
FLAGS_REG))] > > + "!TARGET_LZCNT" > > + "bsr{l}\t{%1, %0|%0, %1}" > > + [(set_attr "type" "alu1") > > + (set_attr "prefix_0f" "1") > > + (set_attr "znver1_decode" "vector") > > + (set_attr "mode" "SI")]) > > + > > +(define_insn "bsr_zext_1" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (zero_extend:DI > > + (minus:SI > > + (const_int 31) > > + (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm") > > + (clobber (reg:CC FLAGS_REG))] > > + "!TARGET_LZCNT && TARGET_64BIT" > > + "bsr{l}\t{%1, %k0|%k0, %1}" > > + [(set_attr "type" "alu1") > > + (set_attr "prefix_0f" "1") > > + (set_attr "znver1_decode" "vector") > > + (set_attr "mode" "SI")]) > > + > > +; As bsr is undefined behavior on zero and for other input > > +; values it is in range 0 to 63, we can optimize away sign-extends. > > +(define_insn_and_split "*bsr_rex64_2" > > + [(set (match_operand:DI 0 "register_operand") > > + (xor:DI > > + (sign_extend:DI > > + (minus:SI > > + (const_int 63) > > + (subreg:SI (clz:DI (match_operand:DI 1 > > "nonimmediate_operand")) > > +0))) > > + (const_int 63))) > > +(clobber (reg:CC FLAGS_REG))] > > + "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()" > > + "#" > > + "&& 1" > > + [(parallel [(set (reg:CCZ FLAGS_REG) > > + (compare:CCZ (match_dup 1) (const_int 0))) > > + (set (match_dup 2) > > + (minus:DI (const_int 63) (clz:DI (match_dup 1]) > > + (parallel [(set (match_dup 0) > > + (zero_extend:DI (xor:SI (match_dup 3) (const_int 63 > > + (clobber (reg:CC FLAGS_REG))])] > > +{ > > + operands[2] = gen_reg_rtx (DImode); > > + operands[3] = lowpart_subreg (SImode, operands[2], DImode); > > +}) > > + > > +(define_insn_and_split "*bsr_2" > > + [(set (match_operand:DI 0 "register_operand") > > + (sign_extend:DI > > + (xor:SI > > + (minus:SI > > + (const_int 31) > > + (clz:SI (match_operand:SI 1 "nonimmediate_operand"))) > > + (const_int 31 > > + (clobber (reg:CC FLAGS_REG))] > > + "!TARGET_LZCNT && TARGET_64BIT && ix86_pre_reload_split ()" > > + "#" > > 
+ "&& 1" > > + [(parallel [(set (reg:CCZ FLAGS_REG) > > + (compare:CCZ (match_dup 1) (const_int 0))) > > + (set (match_dup 2) > > + (minus:SI (const_int 31) (clz:SI (match_dup 1]) > > + (parallel [(set (match_dup 0) > > + (zero_extend:DI (xor:SI (match_dup 2) (const_int 31 > > + (clobber (reg:CC FLAGS_REG))])] > > + "operands[2] = gen_reg_rtx (SImode);") > > + > > +; Splitters to optimize 64 - __builtin_clzl (x) or 32 - __bu
[r12-2649 Regression] FAIL: gcc.target/i386/pr78103-2.c scan-assembler \\m(leal|addl)\\M on Linux/x86_64
On Linux/x86_64, 91425e2adecd00091d7443104ecb367686e88663 is the first bad commit commit 91425e2adecd00091d7443104ecb367686e88663 Author: Jakub Jelinek Date: Sat Jul 31 09:19:32 2021 +0200 i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103] caused FAIL: gcc.target/i386/pr78103-2.c scan-assembler \\m(leal|addl)\\M with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2649/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78103-2.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) on Linux/x86_64
On Linux/x86_64, f7bf03cf69ccb7dcfa0320774aa7f3c51344dada is the first bad commit commit f7bf03cf69ccb7dcfa0320774aa7f3c51344dada Author: Roger Sayle Date: Fri Jul 30 22:46:32 2021 +0100 Decrement followed by cmov improvements. caused FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2640/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[committed] openmp: Handle OpenMP directives in attribute syntax in attribute-declaration
Hi! Now that we parse attribute-declaration (outside of functions), the following patch handles OpenMP directives in its attribute(s). What needs handling incrementally is diagnose mismatching begin/end pair like [[omp::directive (declare target)]]; int a; #pragma omp end declare target or #pragma omp declare target int b; [[omp::directive (end declare target)]]; and handling declare simd/declare variant on declarations (function definitions and declarations), for those in two different spots. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2021-07-31 Jakub Jelinek * parser.c (cp_parser_declaration): Handle OpenMP directives in attribute-declaration. * g++.dg/gomp/attrs-9.C: New test. --- gcc/cp/parser.c.jj 2021-07-30 14:43:43.049383470 +0200 +++ gcc/cp/parser.c 2021-07-30 19:43:22.464675663 +0200 @@ -14423,6 +14423,25 @@ cp_parser_declaration (cp_parser* parser { location_t attrs_loc = token1->location; tree std_attrs = cp_parser_std_attribute_spec_seq (parser); + + if (std_attrs && (flag_openmp || flag_openmp_simd)) + { + gcc_assert (!parser->lexer->in_omp_attribute_pragma); + std_attrs = cp_parser_handle_statement_omp_attributes (parser, +std_attrs); + if (parser->lexer->in_omp_attribute_pragma) + { + cp_lexer *lexer = parser->lexer; + while (parser->lexer->in_omp_attribute_pragma) + { + gcc_assert (cp_lexer_next_token_is (parser->lexer, + CPP_PRAGMA)); + cp_parser_pragma (parser, pragma_external, NULL); + } + cp_lexer_destroy (lexer); + } + } + if (std_attrs != NULL_TREE) warning_at (make_location (attrs_loc, attrs_loc, parser->lexer), OPT_Wattributes, "attribute ignored"); --- gcc/testsuite/g++.dg/gomp/attrs-9.C.jj 2021-07-30 19:51:28.977218521 +0200 +++ gcc/testsuite/g++.dg/gomp/attrs-9.C 2021-07-30 19:30:54.421622986 +0200 @@ -0,0 +1,15 @@ +// { dg-do compile { target c++11 } } + +[[omp::sequence (directive (requires, atomic_default_mem_order (seq_cst)))]]; +[[omp::directive (declare reduction (plus: int: omp_out += omp_in) 
initializer (omp_priv = 0))]]; +int a; +[[omp::directive (declare target (a))]]; +int t; +[[omp::sequence (omp::directive (threadprivate (t)))]]; +int b, c; +[[omp::directive (declare target, to (b), link (c))]]; +[[omp::directive (declare target)]]; +[[omp::directive (declare target)]]; +int d; +[[omp::directive (end declare target)]]; +[[omp::directive (end declare target)]]; Jakub
[PATCH] Optimize x ? bswap(x) : 0 in tree-ssa-phiopt
Many thanks again to Jakub Jelinek for a speedy fix for PR 101642. Interestingly, that test case "bswap16(x) ? : x" also reveals a missed optimization opportunity. The resulting "x ? bswap(x) : 0" can be further simplified to just bswap(x). Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the related "x ? popcount(x) : 0", so this patch simply makes that transformation more general, additionally handling bswap, parity, ffs and clrsb. All of the required infrastructure is already present thanks to Jakub previously adding support for clz/ctz. To reflect this generalization, the name of the function is changed from cond_removal_in_popcount_clz_ctz_pattern to the hopefully equally descriptive cond_removal_in_builtin_zero_pattern. The following patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make -k check" with no new failures. Ok for mainline? 2021-07-31 Roger Sayle gcc/ChangeLog * tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern): Renamed from cond_removal_in_popcount_clz_ctz_pattern. Add support for BSWAP, FFS, PARITY and CLRSB builtins. (tree_ssa_phiopt_worker): Update call to function above. gcc/testsuite/ChangeLog * gcc.dg/tree-ssa/phi-opt-25.c: New test case. 
Roger -- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index c6adbbd..66af902 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -66,9 +66,9 @@ static bool minmax_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); static bool spaceship_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); -static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block, - edge, edge, gphi *, - tree, tree); +static bool cond_removal_in_builtin_zero_pattern (basic_block, basic_block, + edge, edge, gphi *, + tree, tree); static bool cond_store_replacement (basic_block, basic_block, edge, edge, hash_set *); static bool cond_if_else_store_replacement (basic_block, basic_block, basic_block); @@ -350,9 +350,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p) early_p)) cfgchanged = true; else if (!early_p - && cond_removal_in_popcount_clz_ctz_pattern (bb, bb1, e1, - e2, phi, arg0, - arg1)) + && cond_removal_in_builtin_zero_pattern (bb, bb1, e1, e2, + phi, arg0, arg1)) cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; @@ -2466,7 +2465,8 @@ spaceship_replacement (basic_block cond_bb, basic_block middle_bb, return true; } -/* Convert +/* Optimize x ? __builtin_fun (x) : C, where C is __builtin_fun (0). + Convert if (b_4(D) != 0) @@ -2498,10 +2498,10 @@ spaceship_replacement (basic_block cond_bb, basic_block middle_bb, instead of 0 above it uses the value from that macro. 
*/ static bool -cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb, - basic_block middle_bb, - edge e1, edge e2, gphi *phi, - tree arg0, tree arg1) +cond_removal_in_builtin_zero_pattern (basic_block cond_bb, + basic_block middle_bb, + edge e1, edge e2, gphi *phi, + tree arg0, tree arg1) { gimple *cond; gimple_stmt_iterator gsi, gsi_from; @@ -2549,6 +2549,12 @@ cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb, int val = 0; switch (cfn) { +case CFN_BUILT_IN_BSWAP16: +case CFN_BUILT_IN_BSWAP32: +case CFN_BUILT_IN_BSWAP64: +case CFN_BUILT_IN_BSWAP128: +CASE_CFN_FFS: +CASE_CFN_PARITY: CASE_CFN_POPCOUNT: break; CASE_CFN_CLZ: @@ -2577,6 +2583,15 @@ cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb, } } return false; +case BUILT_IN_CLRSB: + val = TYPE_PRECISION (integer_type_node) - 1; + break; +case BUILT_IN_CLRSBL: + val = TYPE_PRECISION (long_integer_type_node) - 1; + break; +case BUILT_IN_CLRSBLL: + val = TYPE_PRECISION (long_long_integer_type_node) - 1; + break; default: return false; } /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ unsi
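The source-level effect of the phiopt change described above can be sketched as follows. This is a hedged illustration rather than the patch's own test case: the function names are hypothetical, and the clrsb constant assumes the guard value is the type's precision minus one, as in the patch's BUILT_IN_CLRSB case.

```c
#include <limits.h>
#include <stdint.h>

/* Guarded forms as a programmer might write them.  In each case the
   "else" value equals what the builtin returns for zero, so the
   branch is redundant.  */
static uint32_t
guarded_bswap32 (uint32_t x)
{
  return x ? __builtin_bswap32 (x) : 0;
}

static int
guarded_ffs (int x)
{
  return x ? __builtin_ffs (x) : 0;
}

static int
guarded_parity (unsigned x)
{
  return x ? __builtin_parity (x) : 0;
}

static int
guarded_clrsb (int x)
{
  /* clrsb (0) is the precision of int minus one.  */
  return x ? __builtin_clrsb (x) : (int) (sizeof (int) * CHAR_BIT) - 1;
}

/* Returns 1 when every guarded form agrees with the bare builtin.  */
static int
guard_is_redundant (uint32_t x)
{
  int sx = (int) x;  /* Wraps for large x under GCC.  */
  return guarded_bswap32 (x) == __builtin_bswap32 (x)
	 && guarded_ffs (sx) == __builtin_ffs (sx)
	 && guarded_parity (x) == (int) __builtin_parity (x)
	 && guarded_clrsb (sx) == __builtin_clrsb (sx);
}
```

Unlike clz and ctz, none of these builtins is undefined at zero, so the condition can be dropped without consulting a target macro for the value at zero.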
New French PO file for 'gcc' (version 11.2.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the French team of translators. The file is available at: https://translationproject.org/latest/gcc/fr.po (This file, 'gcc-11.2.0.fr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
RE: [r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) on Linux/x86_64
[Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c With -m32, this test case is sensitive to the instruction timings of the target (for ifcvt to normalize bar() to foo() during the ce1 pass, prior to the transformations actually being tested here). Specifying -march=core2 prevents these failures. Committed as obvious. 2021-07-31 Roger Sayle gcc/testsuite/ChangeLog * gcc.target/i386/dec-cmov-2.c: Require -march=core2 with -m32. Roger -- -Original Message- From: sunil.k.pandey Sent: 31 July 2021 08:13 To: gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org; ro...@nextmovesoftware.com Subject: [r12-2640 Regression] FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) on Linux/x86_64 On Linux/x86_64, f7bf03cf69ccb7dcfa0320774aa7f3c51344dada is the first bad commit commit f7bf03cf69ccb7dcfa0320774aa7f3c51344dada Author: Roger Sayle Date: Fri Jul 30 22:46:32 2021 +0100 Decrement followed by cmov improvements. caused FAIL: gcc.target/i386/dec-cmov-2.c scan-assembler-not test(l|q|w) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2640/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/dec-cmov-2.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
committed: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]
On Sat, 2021-07-31 at 02:08 +0800, Xi Ruoyao via Gcc-patches wrote: > On Fri, 2021-07-30 at 16:23 +0800, Xi Ruoyao via Gcc-patches wrote: > > On Fri, 2021-07-30 at 09:11 +0100, Richard Sandiford wrote: > > > Xi Ruoyao writes: > > > > Ping again. > > > > > > > > On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote: > > > > > Commit message shamelessly copied from 1777beb6b129 by jakub: > > > > > > > > > > This function, because it is sometimes called even outside of > > > > > function > > > > > bodies, uses create_tmp_var_raw rather than create_tmp_var. > > > > > But > > > > > in > > > > > order > > > > > for that to work, when first referenced, the VAR_DECLs need to > > > > > appear > > > > > in a > > > > > TARGET_EXPR so that during gimplification the var gets the > > > > > right > > > > > DECL_CONTEXT and is added to local decls. > > > > > > > > > > Bootstrapped & regtested on mips64el-linux-gnu. Ok for trunk > > > > > and > > > > > backport > > > > > to 11, 10, and 9? > > > > > > OK for all, thanks. > > > > > > Similar comments to the previous message about the appropriateness > > > of me reviewing the patch, but like you say, this is doing for > > > MIPS > > > what we've already had to do for other targets. > > > > Thanks for reviewing. > > > > Will bootstrap and test it again, and commit if there is no > > regressions. > > Committed to master at 20656544 and releases/gcc-11 at 7db1795a. Committed to releases/gcc-10 at 613e4ebc and releases/gcc-9 at 79184d8c.
[pushed] c++: pretty-print TYPE_PACK_EXPANSION better
gcc/cp/ChangeLog: * ptree.c (cxx_print_type) [TYPE_PACK_EXPANSION]: Also print PACK_EXPANSION_PATTERN. --- Tested x86_64-pc-linux-gnu, applying to trunk. gcc/cp/ptree.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c index 33b73fb24b6..7f140f5f06b 100644 --- a/gcc/cp/ptree.c +++ b/gcc/cp/ptree.c @@ -171,6 +171,7 @@ cxx_print_type (FILE *file, tree node, int indent) return; case TYPE_PACK_EXPANSION: + print_node (file, "pattern", PACK_EXPANSION_PATTERN (node), indent + 4); print_node (file, "args", PACK_EXPANSION_EXTRA_ARGS (node), indent + 4); return; base-commit: 4c4249b71de3b15ba1e176ce90a57fb7bc54b917 -- 2.27.0
[pushed] c++: ICE on anon struct with base [PR96636]
pinskia pointed out that my recent change to reject anonymous structs with bases was relevant to this PR. But we still ICEd after giving that error; this fixes the ICE. Tested x86_64-pc-linux-gnu, applying to trunk. PR c++/96636 gcc/cp/ChangeLog: * decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING after error. gcc/testsuite/ChangeLog: * g++.dg/ext/anon-struct9.C: New test. --- gcc/cp/decl.c | 6 +- gcc/testsuite/g++.dg/ext/anon-struct9.C | 9 + 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/ext/anon-struct9.C diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index e4be6be1819..6fa6b9adc87 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -5094,7 +5094,11 @@ fixup_anonymous_aggr (tree t) tree field, type; if (BINFO_N_BASE_BINFOS (TYPE_BINFO (t))) - error_at (location_of (t), "anonymous struct with base classes"); + { + error_at (location_of (t), "anonymous struct with base classes"); + /* Avoid ICE after error on anon-struct9.C. */ + TYPE_NEEDS_CONSTRUCTING (t) = false; + } for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field)) if (TREE_CODE (field) == FIELD_DECL) diff --git a/gcc/testsuite/g++.dg/ext/anon-struct9.C b/gcc/testsuite/g++.dg/ext/anon-struct9.C new file mode 100644 index 000..56759429620 --- /dev/null +++ b/gcc/testsuite/g++.dg/ext/anon-struct9.C @@ -0,0 +1,9 @@ +// PR c++/96636 +// { dg-options "" } + +typedef class { + class a {}; + class : virtual a {};// { dg-error "anonymous struct with base" } +} b; +void foo(){ b();} + base-commit: 4c4249b71de3b15ba1e176ce90a57fb7bc54b917 prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092 -- 2.27.0
Re: [PATCH] c++: Reject anonymous struct with bases
On Fri, Jul 30, 2021 at 3:35 PM Andrew Pinski wrote: > On Fri, Jul 30, 2021 at 9:26 AM Jason Merrill via Gcc-patches > wrote: > > > > In discussion of jakub's patch for C++20 pointer-interconvertibility, it > > came up that we allow anonymous structs to have bases, but don't do > anything > > usable with them. Let's reject it. > > > > The comment change is something I noticed while looking for the right > place > > to diagnose this: finish_struct_anon does not actually check for anything > > invalid, so it shouldn't claim to. > > This should fix PR 96636 by rejecting the code. > Thanks. Jason
PING^1 [PATCH v5] : Add pragma GCC target("general-regs-only")
On Sat, Jul 17, 2021 at 6:45 PM H.J. Lu wrote: > > On Thu, Apr 22, 2021 at 7:30 AM Richard Biener via Gcc-patches > wrote: > > > > On Thu, Apr 22, 2021 at 2:52 PM Richard Biener > > wrote: > > > > > > On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek wrote: > > > > > > > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via > > > > Gcc-patches wrote: > > > > > > The question is if the pragma GCC target right now behaves > > > > > > incrementally > > > > > > or not, whether > > > > > > #pragma GCC target("avx2") > > > > > > adds -mavx2 to options if it was missing before and nothing > > > > > > otherwise, or if > > > > > > it switches other options off. If it is incremental, we could e.g. > > > > > > try to > > > > > > use the second least significant bit of global_options_set.x_* to > > > > > > mean > > > > > > this option has been set explicitly by some surrounding #pragma GCC > > > > > > target. > > > > > > The normal tests - global_options_set.x_flag_whatever could still > > > > > > work > > > > > > fine because they wouldn't care if the option was explicit from > > > > > > anywhere > > > > > > (command line or GCC target or target attribute) and just & 2 would > > > > > > mean > > > > > > it was explicit from pragma GCC target; though there is the case of > > > > > > bitfields... And then the inlining decision could check the & 2 > > > > > > flags to > > > > > > see what is required and what is just from command line. > > > > > > Or we can have some other pragma GCC that would be like target but > > > > > > would > > > > > > have flags that are explicit (and could e.g. be more restricted, to > > > > > > ISA > > > > > > options only, and let those use in addition to #pragma GCC target. > > > > > > > > > > I'm still curious as to what you think will break if always-inline > > > > > does what > > > > > it is documented to do. 
> > > > > > > > We will silently accept calling intrinsics that must be used only in > > > > certain > > > > ISA contexts, which will lead to people writing non-portable code. > > > > > > > > So -O2 -mno-avx > > > > #include > > > > > > > > void > > > > foo (__m256 *x) > > > > { > > > > x[0] = _mm256_sub_ps (x[1], x[2]); > > > > } > > > > etc. will now be accepted when it shouldn't be. > > > > clang rejects it like gcc with: > > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > > > > feature 'avx', but would be inlined into function 'foo' that is > > > > compiled without support for 'avx' > > > > x[0] = _mm256_sub_ps (x[1], x[2]); > > > > ^ > > > > > > > > Note, if I do: > > > > #include > > > > > > > > __attribute__((target ("no-sse3"))) void > > > > foo (__m256 *x) > > > > { > > > > x[0] = _mm256_sub_ps (x[1], x[2]); > > > > } > > > > and compile > > > > clang -S -O2 -mavx2 1.c > > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > > > > feature 'avx', but would be inlined into function 'foo' that is > > > > compiled without support for 'avx' > > > > x[0] = _mm256_sub_ps (x[1], x[2]); > > > > ^ > > > > then from the error message it seems that unlike GCC, clang remembers > > > > the exact target features that are needed for the intrinsics and checks > > > > just > > > > those. > > > > Though, looking at the preprocessed source, seems it uses > > > > static __inline __m256 __attribute__((__always_inline__, __nodebug__, > > > > __target__("avx"), __min_vector_width__(256))) > > > > _mm256_sub_ps(__m256 __a, __m256 __b) > > > > { > > > > return (__m256)((__v8sf)__a-(__v8sf)__b); > > > > } > > > > and not target pragmas. > > > > > > > > Anyway, if we tweak our intrinsic headers so that > > > > -#ifndef __AVX__ > > > > #pragma GCC push_options > > > > #pragma GCC target("avx") > > > > -#define __DISABLE_AVX__ > > > > -#endif /* __AVX__ */ > > > > > > > > ... 
> > > > -#ifdef __DISABLE_AVX__ > > > > -#undef __DISABLE_AVX__ > > > > #pragma GCC pop_options > > > > -#endif /* __DISABLE_AVX__ */ > > > > and do the opts_set->x_* & 2 stuff on explicit options coming out of > > > > target/optimize pragmas and attributes, perhaps we don't even need > > > > to introduce a new attribute and can handle everything magically: > > > > Oh, and any such changes will likely interact with Martins ideas to rework > > how optimize and target attributes work (aka adding ontop of the > > commandline options). That is, attribute target will then not be enough > > to remember the exact set of needed ISA features (as opposed to what > > likely clang implements?) > > > > > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise > > > > disallow them for always_inline functions > > > > > > There are a lot of intrinsics using extern inline __gnu_inline though... > > > > > > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* > > > > & 2 > > > > stuff > > > > This will keep both intrinsics and glibc fortify macros working fine > > > > in all the needed use cases