Re: [PATCH] x86: Shrink writing 0/-1 to memory using and/or with -Oz.
On Tue, Dec 21, 2021 at 4:08 PM Roger Sayle wrote:
>
>
> This is the second part of my fix to PR target/103773 where -Oz shouldn't
> use push/pop on x86 to shrink writing small integer constants to memory.
> Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
> when writing -1 to memory when using -Oz.  This patch implements this
> via peephole2 where we can confirm that its ok to clobber the flags.
>
> On the CSiBE benchmark, this reduces total code size from 3664172 bytes
> to 3663304 bytes, saving 868 bytes.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and the new testcase checked
> both with and without -m32.  Ok for mainline?
>
>
> 2021-12-21  Roger Sayle
>
> gcc/ChangeLog
>         * gcc/config/i386/i386.md (define_peephole2): With -Oz use
>         andl $0,mem instead of movl $0,mem and orl $-1,mem instead of
>         movl $-1,mem.

Your approach uses access to uninitialized memory, which may confuse
optimizers. Please rather enhance *mov_xor and *mov_or to accept
memory operand and convert to these patterns.

Uros.

> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pr103773-2.c: New test case.
>
>
> Thanks in advance (and my apologies for the breakage).
> Roger
> --
>
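For readers following along, the equivalence the patch relies on can be checked in plain C: and-ing any value with 0 gives 0, and or-ing any value with -1 gives -1, so the read-modify-write forms leave the same value in memory as a plain store. The sketch below is illustrative only. The function names are made up here, and the byte counts in the comments assume a 32-bit store through a plain register base with no displacement.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration, not part of the patch.  */

/* Models "movl $0, mem": 6 bytes (C7 /0 imm32) under the assumed
   addressing mode.  */
static void store_zero_mov (int32_t *p) { *p = 0; }

/* Models "andl $0, mem": 3 bytes (83 /4 imm8).  Note that it reads
   *p before writing, which is the concern raised in the review.  */
static void store_zero_and (int32_t *p) { *p &= 0; }

/* Models "movl $-1, mem" and "orl $-1, mem" respectively.  */
static void store_m1_mov (int32_t *p) { *p = -1; }
static void store_m1_or (int32_t *p) { *p |= -1; }

/* Returns 1 if both forms leave the same value in memory for the
   given starting contents.  */
static int forms_agree (int32_t start)
{
  int32_t a = start, b = start, c = start, d = start;
  store_zero_mov (&a);
  store_zero_and (&b);
  store_m1_mov (&c);
  store_m1_or (&d);
  return a == b && c == d && a == 0 && c == -1;
}
```

The result is the same for every starting value; the difference is only that the and/or forms read the destination and clobber the flags, which is why the patch does this in peephole2.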
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote:
>
>
> My apologies for the inconvenience.  The new support for -Oz using
> push/pop for small integer constants on x86_64 is only a win/correct
> for loading registers.  Fixed by adding !MEM_P tests in the appropriate
> locations.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-12-21  Roger Sayle
>
> gcc/ChangeLog
>         PR target/103773
>         * config/i386/i386.md (*movdi_internal): Only use short
>         push/pop sequence for register (non-memory) destinations.
>         (*movsi_internal): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/103773
>         * gcc.target/i386/pr103773.c: New test case.

Ouch, as pointed out in the PR, this approach clobbers the red zone.

Please revert the original patch.

Thanks,
Uros.

>
> Roger
> --
>
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote:
>
> On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote:
> >
> >
> > My apologies for the inconvenience.  The new support for -Oz using
> > push/pop for small integer constants on x86_64 is only a win/correct
> > for loading registers.  Fixed by adding !MEM_P tests in the appropriate
> > locations.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
> >
> >
> > 2021-12-21  Roger Sayle
> >
> > gcc/ChangeLog
> >         PR target/103773
> >         * config/i386/i386.md (*movdi_internal): Only use short
> >         push/pop sequence for register (non-memory) destinations.
> >         (*movsi_internal): Likewise.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/103773
> >         * gcc.target/i386/pr103773.c: New test case.
>
> Ouch, as pointed out in the PR, this approach clobbers the red zone.
>
> Please revert the original patch.

*Maybe* we can use frame->red_zone_size here, but the frame is
recalculated several times during the compilation.

I think it is just too dangerous to use push/pop w.r.t. red zone clobbering.

Uros.
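To illustrate the red zone concern, here is a toy model in C. Under the x86-64 System V ABI, code may use the 128 bytes below the stack pointer as scratch space without moving the stack pointer, and a push writes to the slot immediately below the stack pointer, which is where such data lives. The model uses an array as the stack; every name in it is invented for illustration and nothing here touches the real stack.

```c
#include <stdint.h>

/* Toy stack of 64-bit slots; sp is an index and the stack grows
   downwards.  This is a hypothetical model, not GCC code.  */
enum { STACK_SLOTS = 32 };

struct toy_machine
{
  int64_t stack[STACK_SLOTS];
  int sp;        /* index one past the current top of stack */
  int64_t reg;   /* a single general register */
};

/* push imm: pre-decrement sp, then store.  */
static void toy_push (struct toy_machine *m, int64_t imm)
{
  m->sp -= 1;
  m->stack[m->sp] = imm;
}

/* pop reg: load, then post-increment sp.  */
static void toy_pop (struct toy_machine *m)
{
  m->reg = m->stack[m->sp];
  m->sp += 1;
}

/* A function keeps a local in the red zone (the slot just below sp,
   without moving sp), then loads a small constant via push/pop.
   Returns the contents of the local's slot afterwards.  */
static int64_t leaf_with_red_zone_local (int64_t local, int64_t imm)
{
  struct toy_machine m = { {0}, STACK_SLOTS, 0 };
  m.stack[m.sp - 1] = local;   /* local lives one slot below sp */
  toy_push (&m, imm);          /* writes to that same slot */
  toy_pop (&m);                /* reg = imm, sp is restored */
  return m.stack[m.sp - 1];    /* the local has been overwritten */
}
```

In this model the register ends up with the right constant and the stack pointer is unchanged, yet the local is silently replaced by the immediate, which matches the wrong-code symptom discussed in the PR.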
Re: vxworks libstdc++ locale
On 21/12/2021 16.42, Rasmus Villemoes wrote:
> Hi
>
> While trying to upgrade our vxworks 5.5 compiler to gcc12, I've hit a
> problem when loading the libstdc++ module on target. It manifests as
>
> [00] tShell memPartFree: invalid block 8bf72c in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf38c in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf304 in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf348 in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf23c in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf6c4 in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf794 in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf7a0 in partition 9605dc.
> [00] tShell memPartFree: invalid block 8bf7bc in partition 9605dc.
>
> being printed on the console. We didn't use to pass an explicit
> --enable-clocale option to configure, but if I add
> --enable-clocale=generic , thus reverting to the locale implementation
> used for gcc11, the problem goes away.
>
> The vxworks locale seems to be mostly identical to generic, just
> differing in CCTYPE_CC. And comparing the .a files, it seems that that
> TU ends up defining a constructor
> _GLOBAL__sub_I__ZNSt12ctype_bynameIcEC2EPKcj , which calls
> _ZNSt8ios_base4InitC1Ev . But then I'm lost.
>
> Any ideas?

So if I remove the #include from
libstdc++-v3/config/locale/vxworks/ctype_members.cc the problem goes
away, and I don't see the purpose of that #include anyway (a debug
leftover perhaps?).

Rasmus
RE: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
Hi Uros,
Would you consider the following variant that disables this optimization when a
red zone is used by the current function?  You're right that cfun's red_zone_size is
recalculated dynamically, but ix86_red_zone_used should be a better "gate" given
that this logic resides very late during compilation, in the output templates, where
whether or not a red zone is used is known.

On CSiBE, disabling this optimization in non-leaf functions that use a red zone costs
219 bytes, but remains a significant win over -Os.  (Alas the absolute numbers aren't
comparable as this testing included the 0/-1 write to memory changes).

Tested (overnight) on x86_64-pc-linux-gnu with make bootstrap and make -k check
with no new failures.

2021-12-22  Roger Sayle

gcc/ChangeLog
        PR target/103773
        * config/i386/i386.md (*movdi_internal): Only use short
        push/pop sequence for register (non-memory) destinations
        when the current function doesn't make use of a red zone.
        (*movsi_internal): Likewise.

gcc/testsuite/ChangeLog
        PR target/103773
        * gcc.target/i386/pr103773.c: New test case.

Please let me know what you think.  I'll revert, if this tweak doesn't address
your concerns.

Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 22 December 2021 08:20
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to
> memory.
>
> On Wed, Dec 22, 2021 at 9:10 AM Uros Bizjak wrote:
> >
> > On Tue, Dec 21, 2021 at 1:27 PM Roger Sayle wrote:
> > >
> > >
> > > My apologies for the inconvenience.  The new support for -Oz using
> > > push/pop for small integer constants on x86_64 is only a win/correct
> > > for loading registers.  Fixed by adding !MEM_P tests in the
> > > appropriate locations.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check with no new failures.  Ok for mainline?
> > >
> > >
> > > 2021-12-21  Roger Sayle
> > >
> > > gcc/ChangeLog
> > >         PR target/103773
> > >         * config/i386/i386.md (*movdi_internal): Only use short
> > >         push/pop sequence for register (non-memory) destinations.
> > >         (*movsi_internal): Likewise.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR target/103773
> > >         * gcc.target/i386/pr103773.c: New test case.
> >
> > Ouch, as pointed out in the PR, this approach clobbers the red zone.
> >
> > Please revert the original patch.
>
> *Maybe* we can use frame->red_zone_size here, but the frame is recalculated
> several times during the compilation. I think it is just too dangerous to use
> push/pop w.r.t. red zone clobbering.
>
> Uros.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..489cede 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2217,7 +2217,9 @@
 	  if (optimize_size > 1
 	      && TARGET_64BIT
 	      && CONST_INT_P (operands[1])
-	      && IN_RANGE (INTVAL (operands[1]), -128, 127))
+	      && IN_RANGE (INTVAL (operands[1]), -128, 127)
+	      && !MEM_P (operands[0])
+	      && !ix86_red_zone_used)
 	    return "push{q}\t%1\n\tpop{q}\t%0";
 	  return "mov{l}\t{%k1, %k0|%k0, %k1}";
 	}
@@ -2440,7 +2442,9 @@
 	return "lea{l}\t{%E1, %0|%0, %E1}";
       else if (optimize_size > 1
 	       && CONST_INT_P (operands[1])
-	       && IN_RANGE (INTVAL (operands[1]), -128, 127))
+	       && IN_RANGE (INTVAL (operands[1]), -128, 127)
+	       && !MEM_P (operands[0])
+	       && !ix86_red_zone_used)
 	{
 	  if (TARGET_64BIT)
 	    return "push{q}\t%1\n\tpop{q}\t%q0";
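The condition added by this diff can be read as a simple predicate. The following is a standalone C paraphrase of the 64-bit (*movdi_internal) case, not GCC code: the function and its parameters are hypothetical stand-ins for optimize_size, TARGET_64BIT, CONST_INT_P, MEM_P and ix86_red_zone_used, and it assumes that -Oz corresponds to an optimize_size level above 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical paraphrase of the -Oz push/pop gate.  opt_size_level
   models optimize_size (assumed to exceed 1 only for -Oz).  */
static bool use_push_pop (int opt_size_level, bool target_64bit,
                          bool src_is_const_int, int64_t value,
                          bool dest_is_mem, bool red_zone_used)
{
  return opt_size_level > 1
         && target_64bit
         && src_is_const_int
         && value >= -128 && value <= 127   /* fits a sign-extended imm8 */
         && !dest_is_mem                    /* register destinations only */
         && !red_zone_used;                 /* a push would clobber it */
}
```

Written this way, the two new conjuncts map directly onto the two bugs in the thread: the memory-destination case from the first follow-up, and the red zone clobber raised in review.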
[PATCH][pushed] docs: Unify instruction set name.
gcc/ChangeLog:

	* doc/extend.texi: Use uppercase letters for SSEx.
---
 gcc/doc/extend.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f52384f7629..a15c4fe9b33 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6855,12 +6855,12 @@ and SSE4.2).
 @item sse4.1
 @itemx no-sse4.1
 @cindex @code{target("sse4.1")} function attribute, x86
-Enable/disable the generation of the sse4.1 instructions.
+Enable/disable the generation of the SSE4.1 instructions.
 
 @item sse4.2
 @itemx no-sse4.2
 @cindex @code{target("sse4.2")} function attribute, x86
-Enable/disable the generation of the sse4.2 instructions.
+Enable/disable the generation of the SSE4.2 instructions.
 
 @item sse4a
 @itemx no-sse4a
-- 
2.34.1
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle wrote:
>
>
> Hi Uros,
> Would you consider the following variant that disables this optimization
> when a red zone is used by the current function?  You're right that cfun's
> red_zone_size is recalculated dynamically, but ix86_red_zone_used should be
> a better "gate" given that this logic resides very late during compilation,
> in the output templates, where whether or not a red zone is used is known.
>
> On CSiBE, disabling this optimization in non-leaf functions that use a red
> zone costs 219 bytes, but remains a significant win over -Os.  (Alas the
> absolute numbers aren't comparable as this testing included the 0/-1 write
> to memory changes).
>
> Tested (overnight) on x86_64-pc-linux-gnu with make bootstrap and make -k
> check with no new failures.
>
> 2021-12-22  Roger Sayle
>
> gcc/ChangeLog
>         PR target/103773
>         * config/i386/i386.md (*movdi_internal): Only use short
>         push/pop sequence for register (non-memory) destinations
>         when the current function doesn't make use of a red zone.
>         (*movsi_internal): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/103773
>         * gcc.target/i386/pr103773.c: New test case.
>
> Please let me know what you think.  I'll revert, if this tweak doesn't
> address your concerns.

Yes, using ix86_red_zone_used looks safe.

OTOH, is there a reason the transformation is not implemented via
peephole2 pass? IIRC, frame is stable after pro_and_epilogue_pass, and
peephole2 pass is instanced well after register allocation.

Uros.
[PATCH][GCC] aarch64: fix: ls64 tests fail on aarch64-linux-gnu_ilp32 [PR103729]
This patch fixes the LS64 intrinsics tests, which fail on the
aarch64-linux-gnu_ilp32 target.

Regtested on aarch64-linux-gnu_ilp32, aarch64-elf and aarch64_be-elf
with no issues.

OK to install?

gcc/ChangeLog:

	PR target/103729
	* config/aarch64/aarch64-builtins.c
	(aarch64_expand_builtin_ls64): Handle SImode for ILP32.

rb15171.patch
Description: rb15171.patch
[PATCH][pushed] docs: use ';' for function declarations.
Pushed as obvious, it makes the documentation more consistent. Martin gcc/ChangeLog: * doc/extend.texi: Unify all function declarations in examples where some miss trailing ';'. --- gcc/doc/extend.texi | 2973 +-- 1 file changed, 1483 insertions(+), 1490 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index a15c4fe9b33..7e5791b67c5 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -14591,34 +14591,34 @@ The following built-in functions are always available. They all generate the machine instruction that is part of the name. @smallexample -long __builtin_alpha_implver (void) -long __builtin_alpha_rpcc (void) -long __builtin_alpha_amask (long) -long __builtin_alpha_cmpbge (long, long) -long __builtin_alpha_extbl (long, long) -long __builtin_alpha_extwl (long, long) -long __builtin_alpha_extll (long, long) -long __builtin_alpha_extql (long, long) -long __builtin_alpha_extwh (long, long) -long __builtin_alpha_extlh (long, long) -long __builtin_alpha_extqh (long, long) -long __builtin_alpha_insbl (long, long) -long __builtin_alpha_inswl (long, long) -long __builtin_alpha_insll (long, long) -long __builtin_alpha_insql (long, long) -long __builtin_alpha_inswh (long, long) -long __builtin_alpha_inslh (long, long) -long __builtin_alpha_insqh (long, long) -long __builtin_alpha_mskbl (long, long) -long __builtin_alpha_mskwl (long, long) -long __builtin_alpha_mskll (long, long) -long __builtin_alpha_mskql (long, long) -long __builtin_alpha_mskwh (long, long) -long __builtin_alpha_msklh (long, long) -long __builtin_alpha_mskqh (long, long) -long __builtin_alpha_umulh (long, long) -long __builtin_alpha_zap (long, long) -long __builtin_alpha_zapnot (long, long) +long __builtin_alpha_implver (void); +long __builtin_alpha_rpcc (void); +long __builtin_alpha_amask (long); +long __builtin_alpha_cmpbge (long, long); +long __builtin_alpha_extbl (long, long); +long __builtin_alpha_extwl (long, long); +long __builtin_alpha_extll (long, long); +long 
__builtin_alpha_extql (long, long); +long __builtin_alpha_extwh (long, long); +long __builtin_alpha_extlh (long, long); +long __builtin_alpha_extqh (long, long); +long __builtin_alpha_insbl (long, long); +long __builtin_alpha_inswl (long, long); +long __builtin_alpha_insll (long, long); +long __builtin_alpha_insql (long, long); +long __builtin_alpha_inswh (long, long); +long __builtin_alpha_inslh (long, long); +long __builtin_alpha_insqh (long, long); +long __builtin_alpha_mskbl (long, long); +long __builtin_alpha_mskwl (long, long); +long __builtin_alpha_mskll (long, long); +long __builtin_alpha_mskql (long, long); +long __builtin_alpha_mskwh (long, long); +long __builtin_alpha_msklh (long, long); +long __builtin_alpha_mskqh (long, long); +long __builtin_alpha_umulh (long, long); +long __builtin_alpha_zap (long, long); +long __builtin_alpha_zapnot (long, long); @end smallexample The following built-in functions are always with @option{-mmax} @@ -14627,19 +14627,19 @@ later. They all generate the machine instruction that is part of the name. 
@smallexample -long __builtin_alpha_pklb (long) -long __builtin_alpha_pkwb (long) -long __builtin_alpha_unpkbl (long) -long __builtin_alpha_unpkbw (long) -long __builtin_alpha_minub8 (long, long) -long __builtin_alpha_minsb8 (long, long) -long __builtin_alpha_minuw4 (long, long) -long __builtin_alpha_minsw4 (long, long) -long __builtin_alpha_maxub8 (long, long) -long __builtin_alpha_maxsb8 (long, long) -long __builtin_alpha_maxuw4 (long, long) -long __builtin_alpha_maxsw4 (long, long) -long __builtin_alpha_perr (long, long) +long __builtin_alpha_pklb (long); +long __builtin_alpha_pkwb (long); +long __builtin_alpha_unpkbl (long); +long __builtin_alpha_unpkbw (long); +long __builtin_alpha_minub8 (long, long); +long __builtin_alpha_minsb8 (long, long); +long __builtin_alpha_minuw4 (long, long); +long __builtin_alpha_minsw4 (long, long); +long __builtin_alpha_maxub8 (long, long); +long __builtin_alpha_maxsb8 (long, long); +long __builtin_alpha_maxuw4 (long, long); +long __builtin_alpha_maxsw4 (long, long); +long __builtin_alpha_perr (long, long); @end smallexample The following built-in functions are always with @option{-mcix} @@ -14648,9 +14648,9 @@ later. They all generate the machine instruction that is part of the name. @smallexample -long __builtin_alpha_cttz (long) -long __builtin_alpha_ctlz (long) -long __builtin_alpha_ctpop (long) +long __builtin_alpha_cttz (long); +long __builtin_alpha_ctlz (long); +long __builtin_alpha_ctpop (long); @end smallexample The following built-in functions are available on systems that use the OSF/1 @@ -14659,8 +14659,8 @@ PAL calls, but when invoked with @option{-mtls-kernel}, they invoke @code{rdval} and @code{wrval}. @smallexample -void *__builtin_thread_pointer (void) -void __builtin_set_thread_pointer (void *) +void *__builtin_thread_pointer (void); +void __builtin_set_thread_p
Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to memory.
On Wed, Dec 22, 2021 at 11:26 AM Uros Bizjak wrote: > > On Wed, Dec 22, 2021 at 10:26 AM Roger Sayle > wrote: > > > > > > Hi Uros, > > Would you consider the following variant that disables this optimization > > when a > > red zone is used by the current function? You're right that cfun's > > red_zone_size is > > recalculated dynamically, but ix86_red_zone_used should be a better "gate" > > given > > that this logic resides very late during compilation, in the output > > templates, where > > whether or not a red zone is used is known. > > > > On CSiBE, disabling this optimization in non-leaf functions that use a red > > zone costs > > 219 bytes, but remains a significant win over -Os. (Alas the absolute > > numbers aren't > > comparable as this testing included the 0/-1 write to memory changes). > > > > Tested (overnight) on x86_64-pc-linux-gnu with make bootstrap and make -k > > check > > with no new failures. > > > > 2021-12-22 Roger Sayle > > > > gcc/ChangeLog > > PR target/103773 > > * config/i386/i386.md (*movdi_internal): Only use short > > push/pop sequence for register (non-memory) destinations > > when the current function doesn't make use of a red zone. > > (*movsi_internal): Likewise. > > > > gcc/testsuite/ChangeLog > > PR target/103773 > > * gcc.target/i386/pr103773.c: New test case. > > > > Please let me know what you think. I'll revert, if this tweak doesn't > > address > > your concerns. > > Yes, using ix86_red_zone_used looks safe. > > OTOH, is there a reason the transformation is not implemented via > peephole2 pass? IIRC, frame is stable after pro_and_epilogue_pass, and > peephole2 pass is instanced well after register allocation. Something like the attached patch. Uros. 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 58b10643fcb..e5d603f0025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2514,6 +2514,24 @@
 	   ]
 	   (symbol_ref "true")))])
 
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "const_int_operand"))]
+  "optimize_insn_for_size_p () && optimize_size > 1
+   && IN_RANGE (INTVAL (operands[1]), -128, 127)
+   && !ix86_red_zone_used"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (match_dup 3))]
+{
+  if (GET_MODE (operands[0]) != word_mode)
+    operands[0] = gen_rtx_REG (word_mode, REGNO (operands[0]));
+
+  operands[2] = gen_rtx_MEM (word_mode,
+			     gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx));
+  operands[3] = gen_rtx_MEM (word_mode,
+			     gen_rtx_POST_INC (Pmode, stack_pointer_rtx));
+})
+
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
 	 "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
Re: [patch, Fortran] Make REAL(KIND=16) detection more robust
Thanks Thomas, pushed as 228173565eafbe34e44c1600c32e32a323eb5aab 228173565eafbe34e44c1600c32e32a323eb5aab.patch Description: Binary data
[PATCH] Fix typo in type verification.
Hello.

The patch is a quite obvious fix.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

	PR ipa/103786

gcc/ChangeLog:

	* tree.c (verify_type): Fix typo.
---
 gcc/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index 72cceda568f..0741e3b01af 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -13530,7 +13530,7 @@ verify_type (const_tree t)
   tree ct = TYPE_CANONICAL (t);
   if (!ct)
     ;
-  else if (TYPE_CANONICAL (t) != ct)
+  else if (TYPE_CANONICAL (ct) != ct)
     {
       error ("%<TYPE_CANONICAL%> has different %<TYPE_CANONICAL%>");
       debug_tree (ct);
-- 
2.34.1
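Why the old condition could never fire: ct is defined as TYPE_CANONICAL (t), so comparing TYPE_CANONICAL (t) against ct always finds them equal. The fixed condition checks that a canonical type is its own canonical type. A minimal standalone model in C follows, with a made-up struct standing in for tree nodes; none of these names exist in GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node; only the canonical link
   is modelled.  */
struct toy_type
{
  const struct toy_type *canonical;
};

/* The check as written before the fix: it compares a value with
   itself, so it can never report a problem.  */
static bool old_check_fails (const struct toy_type *t)
{
  const struct toy_type *ct = t->canonical;
  if (!ct)
    return false;
  return t->canonical != ct;      /* always false */
}

/* The check after the fix: the canonical type must be its own
   canonical type.  */
static bool new_check_fails (const struct toy_type *t)
{
  const struct toy_type *ct = t->canonical;
  if (!ct)
    return false;
  return ct->canonical != ct;
}
```

With a chain a -> b -> c where only c is self-canonical, the old check accepts a while the new check flags it.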
[PATCH] docs: replace http:// with https://
I replaced and verified http:// links for various domains.

Ready to be installed?
Thanks,
Martin

gcc/ada/ChangeLog:

	* doc/share/gnu_free_documentation_license.rst: Replace http:// with https.
	* gnat-style.texi: Likewise.
	* gnat_rm.texi: Likewise.
	* gnat_ugn.texi: Likewise.

gcc/d/ChangeLog:

	* gdc.texi: Replace http:// with https.

gcc/ChangeLog:

	* doc/contrib.texi: Replace http:// with https.
	* doc/contribute.texi: Likewise.
	* doc/extend.texi: Likewise.
	* doc/gccint.texi: Likewise.
	* doc/gnu.texi: Likewise.
	* doc/implement-c.texi: Likewise.
	* doc/implement-cxx.texi: Likewise.
	* doc/include/fdl.texi: Likewise.
	* doc/include/gpl_v3.texi: Likewise.
	* doc/install.texi: Likewise.
	* doc/invoke.texi: Likewise.
	* doc/passes.texi: Likewise.
	* doc/service.texi: Likewise.
	* doc/sourcebuild.texi: Likewise.
	* doc/standards.texi: Likewise.

gcc/fortran/ChangeLog:

	* gfortran.texi: Replace http:// with https.
	* intrinsic.texi: Likewise.

gcc/go/ChangeLog:

	* gccgo.texi: Replace http:// with https.

gcc/jit/ChangeLog:

	* docs/_build/texinfo/libgccjit.texi: Replace http:// with https.
	* docs/cp/index.rst: Likewise.
	* docs/cp/intro/index.rst: Likewise.
	* docs/cp/intro/tutorial01.rst: Likewise.
	* docs/cp/intro/tutorial02.rst: Likewise.
	* docs/cp/intro/tutorial03.rst: Likewise.
	* docs/cp/intro/tutorial04.rst: Likewise.
	* docs/cp/topics/asm.rst: Likewise.
	* docs/cp/topics/compilation.rst: Likewise.
	* docs/cp/topics/contexts.rst: Likewise.
	* docs/cp/topics/expressions.rst: Likewise.
	* docs/cp/topics/functions.rst: Likewise.
	* docs/cp/topics/index.rst: Likewise.
	* docs/cp/topics/locations.rst: Likewise.
	* docs/cp/topics/objects.rst: Likewise.
	* docs/cp/topics/types.rst: Likewise.
	* docs/index.rst: Likewise.
	* docs/internals/index.rst: Likewise.
	* docs/intro/index.rst: Likewise.
	* docs/intro/tutorial01.rst: Likewise.
	* docs/intro/tutorial02.rst: Likewise.
	* docs/intro/tutorial03.rst: Likewise.
	* docs/intro/tutorial04.rst: Likewise.
	* docs/intro/tutorial05.rst: Likewise.
	* docs/topics/asm.rst: Likewise.
	* docs/topics/compatibility.rst: Likewise.
	* docs/topics/compilation.rst: Likewise.
	* docs/topics/contexts.rst: Likewise.
	* docs/topics/expressions.rst: Likewise.
	* docs/topics/function-pointers.rst: Likewise.
	* docs/topics/functions.rst: Likewise.
	* docs/topics/index.rst: Likewise.
	* docs/topics/locations.rst: Likewise.
	* docs/topics/objects.rst: Likewise.
	* docs/topics/performance.rst: Likewise.
	* docs/topics/types.rst: Likewise.
---
 .../share/gnu_free_documentation_license.rst  |  4 +-
 gcc/ada/gnat-style.texi                       |  4 +-
 gcc/ada/gnat_rm.texi                          |  4 +-
 gcc/ada/gnat_ugn.texi                         |  4 +-
 gcc/d/gdc.texi                                | 10 +-
 gcc/doc/contrib.texi                          |  2 +-
 gcc/doc/contribute.texi                       | 10 +-
 gcc/doc/extend.texi                           |  4 +-
 gcc/doc/gccint.texi                           |  2 +-
 gcc/doc/gnu.texi                              |  4 +-
 gcc/doc/implement-c.texi                      |  2 +-
 gcc/doc/implement-cxx.texi                    |  2 +-
 gcc/doc/include/fdl.texi                      |  6 +-
 gcc/doc/include/gpl_v3.texi                   |  6 +-
 gcc/doc/install.texi                          | 32 +++
 gcc/doc/invoke.texi                           | 10 +-
 gcc/doc/passes.texi                           |  2 +-
 gcc/doc/service.texi                          |  2 +-
 gcc/doc/sourcebuild.texi                      |  2 +-
 gcc/doc/standards.texi                        |  6 +-
 gcc/fortran/gfortran.texi                     | 14 +--
 gcc/fortran/intrinsic.texi                    |  4 +-
 gcc/go/gccgo.texi                             |  4 +-
 gcc/jit/docs/_build/texinfo/libgccjit.texi    | 96 +--
 gcc/jit/docs/cp/index.rst                     |  4 +-
 gcc/jit/docs/cp/intro/index.rst               |  2 +-
 gcc/jit/docs/cp/intro/tutorial01.rst          |  2 +-
 gcc/jit/docs/cp/intro/tutorial02.rst          |  2 +-
 gcc/jit/docs/cp/intro/tutorial03.rst          |  2 +-
 gcc/jit/docs/cp/intro/tutorial04.rst          |  2 +-
 gcc/jit/docs/cp/topics/asm.rst                |  2 +-
 gcc/jit/docs/cp/topics/compilation.rst        |  2 +-
 gcc/jit/docs/cp/topics/contexts.rst           |  2 +-
 gcc/jit/docs/cp/topics/expressions.rst        |  2 +-
 gcc/jit/docs/cp/topics/functions.rst          |  2 +-
 gcc/jit/docs/cp/topics/index.rst              |  2 +-
 gcc/jit/docs/cp/topics/locations.rst          |  2 +-
 gcc/jit
[OG11][PATCH] OpenMP: Ensure that offloaded variables are public
This is now backported to the devel/omp/gcc-11 branch (OG11). Andrew On 09/12/2021 11:41, Andrew Stubbs wrote: On 02/12/2021 16:43, Jakub Jelinek wrote: On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote: On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples. Why doesn't catch it? Is the variable created only post-IPA? I'd think that it should have been created before IPA, streamed and therefore I don't understand why you don't see it after streaming LTO in. On closer inspection it does, in fact, catch it as you'd expect, but then the variable is no longer marked public when it gets to pass_omp_target_link::execute, so something somewhere is resetting it. More investigation is needed The "whole-program" pass is removing the public flag. That's probably working as intended, and I assume it is run for offload code on purpose? So you'd stick it somewhere into e.g. symbol_table::compile after ipa_passes call, guarded with #ifdef ACCEL_COMPILER ? I've given up on this approach, and switched to loading the symbol addresses from the table directly. The relocation issues that I had with older assemblers/linkers do not seem to be a problem any more. This patch requires only a single symbol to be forced global, and since that's one that I create in mkoffload there is no issue with previous definitions. I think I can approve this myself, but if you have any observations I'm happy to hear them. Andrew
Re: [PATCH] docs: replace http:// with https://
Excerpts from Martin Liška's message of Dezember 22, 2021 1:57 pm: > I replaced and verified http:// links for various domains. > > Ready to be installed? > Tahnks, > Martin > Hi, > gcc/d/ChangeLog: > > * gdc.texi: Replace http:// with https. > > --- > gcc/d/gdc.texi| 10 +- > OK for the D front-end docs change. Iain. > diff --git a/gcc/d/gdc.texi b/gcc/d/gdc.texi > index bfec1568857..d93d2e8001a 100644 > --- a/gcc/d/gdc.texi > +++ b/gcc/d/gdc.texi > @@ -326,14 +326,14 @@ values are supported: > @item all > Turns on all upcoming D language features. > @item dip1000 > -Implements @uref{http://wiki.dlang.org/DIP1000} (Scoped pointers). > +Implements @uref{https://wiki.dlang.org/DIP1000} (Scoped pointers). > @item dip1008 > -Implements @uref{http://wiki.dlang.org/DIP1008} (Allow exceptions in > +Implements @uref{https://wiki.dlang.org/DIP1008} (Allow exceptions in > @code{@@nogc} code). > @item dip1021 > -Implements @uref{http://wiki.dlang.org/DIP1021} (Mutable function arguments). > +Implements @uref{https://wiki.dlang.org/DIP1021} (Mutable function > arguments). > @item dip25 > -Implements @uref{http://wiki.dlang.org/DIP25} (Sealed references). > +Implements @uref{https://wiki.dlang.org/DIP25} (Sealed references). > @item dtorfields > Turns on generation for destructing fields of partially constructed objects. > @item fieldwise > @@ -383,7 +383,7 @@ are supported: > @item all > Turns off all revertable D language features. > @item dip25 > -Reverts @uref{http://wiki.dlang.org/DIP25} (Sealed references). > +Reverts @uref{https://wiki.dlang.org/DIP25} (Sealed references). > @item dtorfields > Turns off generation for destructing fields of partially constructed > objects. > @item markdown
Re: [PATCH 1/2][GCC] arm: Move arm_simd_info array declaration into header
On 24/11/2021 12:18, Richard Earnshaw via Gcc-patches wrote: On 24/11/2021 12:15, Murray Steele wrote: On 18/11/2021 15:40, Richard Earnshaw wrote: On 16/11/2021 10:14, Murray Steele via Gcc-patches wrote: Hi all, This patch moves the arm_simd_type and arm_type_qualifiers enums, and arm_simd_info struct from arm-builtins.c into arm-builtins.h header. This is a first step towards internalising the type definitions for MVE predicate, vector, and tuple types. By moving arm_simd_types into a header, we allow future patches to use these type trees externally to arm-builtins.c, which is a crucial step towards developing an MVE intrinsics framework similar to the current SVE implementation. Thanks, Murray gcc/ChangeLog: * config/arm/arm-builtins.c (enum arm_type_qualifiers): Move to arm_builtins.h (enum arm_simd_type): Move to arm-builtins.h (struct arm_simd_type_info): Move to arm-builtins.h * config/arm/arm-builtins.h (enum arm_simd_type): Move from arm-builtins.c (enum arm_type_qualifiers): Move from arm-builtins.c (struct arm_simd_type_info): Move from arm-builtins.c OK. R. Hi Richard, I don't currently have write access, so I will need this patch committed on my behalf. Thanks again, Murray That can be done when 2/2 patch has been resolved. They need to go in together. R. Now pushed. R.
Re: [PATCH v3 2/2][GCC] arm: Declare MVE types internally via pragma
On 09/12/2021 15:24, Murray Steele via Gcc-patches wrote: Changes from original patch: 1. Make mentioned changes to changelog. 2. Add namespace-end comments. 3. Add #error for when arm-mve-builtins.def is included without defining DEF_MVE_TYPE. 4. Make placement of '#undef DEF_MVE_TYPE' consistent. --- This patch moves the implementation of MVE ACLE types from arm_mve_types.h to inside GCC via a new pragma, which replaces the prior type definitions. This allows for the types to be used internally for intrinsic function definitions. Bootstrapped and regression tested on arm-none-linux-gnuabihf, and regression tested on arm-eabi -- no issues. Thanks, Murray gcc/ChangeLog: * config.gcc: Add arm-mve-builtins.o to extra_objs. * config/arm/arm-c.c (arm_pragma_arm): Handle "#pragma GCC arm". (arm_register_target_pragmas): Register it. * config/arm/arm-protos.h: (arm_mve::arm_handle_mve_types_h): New prototype. * config/arm/arm_mve_types.h: Replace MVE type definitions with new pragma. * config/arm/t-arm: (arm-mve-builtins.o): New target rule. * config/arm/arm-mve-builtins.cc: New file. * config/arm/arm-mve-builtins.def: New file. * config/arm/arm-mve-builtins.h: New file. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve.exp: Add new subdirectories. * gcc.target/arm/mve/general-c/type_redef_1.c: New test. * gcc.target/arm/mve/general/double_pragmas_1.c: New test. * gcc.target/arm/mve/general/nomve_1.c: New test. I fixed a minor issue in the changelog (config.gcc needs to mention arm*-*-* as the 'function') and pushed this. Thanks, R.
[PATCH][GCC] arm: fix __arm_vld1q_z* and __arm_vst1q_p* intrinsics.
Hi All,

This patch fixes the implementation of the existing __arm_vld1q_z* and
__arm_vst1q_p* MVE intrinsic functions.

The MVE ACLE allows for __ARM_MVE_PRESERVE_USER_NAMESPACE to be defined,
which removes definitions for intrinsic functions without the __arm_
prefix. __arm_vld1q_z* and __arm_vst1q_p* are currently implemented via
calls to vldr* and vstr*, which results in several compile-time errors when
__ARM_MVE_PRESERVE_USER_NAMESPACE is defined. This patch replaces these
with calls to their prefixed counterparts, __arm_vldr* and __arm_vstr*,
and adds a test covering the definition of __ARM_MVE_PRESERVE_USER_NAMESPACE.

Regression tested on arm-eabi -- no issues.

Thanks,
Murray

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vst1q_p_u8): Use prefixed intrinsic
	function.
	(__arm_vst1q_p_s8): Likewise.
	(__arm_vld1q_z_u8): Likewise.
	(__arm_vld1q_z_s8): Likewise.
	(__arm_vst1q_p_u16): Likewise.
	(__arm_vst1q_p_s16): Likewise.
	(__arm_vld1q_z_u16): Likewise.
	(__arm_vld1q_z_s16): Likewise.
	(__arm_vst1q_p_u32): Likewise.
	(__arm_vst1q_p_s32): Likewise.
	(__arm_vld1q_z_u32): Likewise.
	(__arm_vld1q_z_s32): Likewise.
	(__arm_vld1q_z_f16): Likewise.
	(__arm_vst1q_p_f16): Likewise.
	(__arm_vld1q_z_f32): Likewise.
	(__arm_vst1q_p_f32): Likewise.

gcc/testsuite/ChangeLog: * gcc.target/arm/mve/general/preserve_user_namespace_1.c: New test.diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index e04d46218d03effdf0cb79471108cd2f24e92dec..708f5c71fddfc2cab0b0456e0b8724c803544ddc 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -16171,14 +16171,14 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p) { - return vstrbq_p_u8 (__addr, __value, __p); + return __arm_vstrbq_p_u8 (__addr, __value, __p); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p) { - return vstrbq_p_s8 (__addr, __value, __p); + return __arm_vstrbq_p_s8 (__addr, __value, __p); } __extension__ extern __inline void @@ -16203,14 +16203,14 @@ __extension__ extern __inline uint8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld1q_z_u8 (uint8_t const *__base, mve_pred16_t __p) { - return vldrbq_z_u8 ( __base, __p); + return __arm_vldrbq_z_u8 ( __base, __p); } __extension__ extern __inline int8x16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld1q_z_s8 (int8_t const *__base, mve_pred16_t __p) { - return vldrbq_z_s8 ( __base, __p); + return __arm_vldrbq_z_s8 ( __base, __p); } __extension__ extern __inline int8x16x2_t @@ -16253,14 +16253,14 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_u16 (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p) { - return vstrhq_p_u16 (__addr, __value, __p); + return __arm_vstrhq_p_u16 (__addr, __value, __p); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_s16 (int16_t * __addr, int16x8_t __value, mve_pred16_t __p) { - return 
vstrhq_p_s16 (__addr, __value, __p); + return __arm_vstrhq_p_s16 (__addr, __value, __p); } __extension__ extern __inline void @@ -16285,14 +16285,14 @@ __extension__ extern __inline uint16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld1q_z_u16 (uint16_t const *__base, mve_pred16_t __p) { - return vldrhq_z_u16 ( __base, __p); + return __arm_vldrhq_z_u16 ( __base, __p); } __extension__ extern __inline int16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld1q_z_s16 (int16_t const *__base, mve_pred16_t __p) { - return vldrhq_z_s16 ( __base, __p); + return __arm_vldrhq_z_s16 ( __base, __p); } __extension__ extern __inline int16x8x2_t @@ -16335,14 +16335,14 @@ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_u32 (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p) { - return vstrwq_p_u32 (__addr, __value, __p); + return __arm_vstrwq_p_u32 (__addr, __value, __p); } __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst1q_p_s32 (int32_t * __addr, int32x4_t __value, mve_pred16_t __p) { - return vstrwq_p_s32 (__addr, __value, __p); + return __arm_vstrwq_p_s32 (__addr, __value, __p); } __extension__ extern __inline void @@ -16367,14 +16367,14 @@ __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vld1q_z_u32 (uint32_t const *__base, mve_pred16_t __
Re: [PATCH][GCC] arm: fix __arm_vld1q_z* and __arm_vst1q_p* intrinsics.
On 22/12/2021 15:55, Murray Steele via Gcc-patches wrote: > Hi All, > > This patch fixes the implementation of the existing __arm_vld1q_z* and > __arm_vst1q_p* MVE intrinsic functions. > > The MVE ACLE allows for __ARM_MVE_PRESERVE_USER_NAMESPACE to be defined, > which removes definitions for intrinsic functions without the __arm_ > prefix. __arm_vld1q_z* and __arm_vst1q_p* are currently implemented via > calls to vldr* and vstr*, which results in several compile-time errors when > __ARM_MVE_PRESERVE_USER_NAMESPACE is defined. This patch replaces these > with calls to their prefixed counterparts, __arm_vldr* and __arm_vstr*, > and adds a test covering the definition of __ARM_MVE_PRESERVE_USER_NAMESPACE. Is there a PR in bugzilla for this? R. > > Regression tested on arm-eabi -- no issues. > > Thanks, > Murray > > gcc/ChangeLog: > > * config/arm/arm_mve.h (__arm_vst1q_p_u8): Use prefixed intrinsic > function. > (__arm_vst1q_p_s8): Likewise. > (__arm_vld1q_z_u8): Likewise. > (__arm_vld1q_z_s8): Likewise. > (__arm_vst1q_p_u16): Likewise. > (__arm_vst1q_p_s16): Likewise. > (__arm_vld1q_z_u16): Likewise. > (__arm_vld1q_z_s16): Likewise. > (__arm_vst1q_p_u32): Likewise. > (__arm_vst1q_p_s32): Likewise. > (__arm_vld1q_z_u32): Likewise. > (__arm_vld1q_z_s32): Likewise. > (__arm_vld1q_z_f16): Likewise. > (__arm_vst1q_p_f16): Likewise. > (__arm_vld1q_z_f32): Likewise. > (__arm_vst1q_p_f32): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/mve/general/preserve_user_namespace_1.c: New test. >
Re: [PATCH] Fix typo in type verification.
On December 22, 2021 1:03:18 PM GMT+01:00, "Martin Liška" wrote: >Hello. > >The patch is quite obvious fix. > >Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >Ready to be installed? Ok. Richard. >Thanks, >Martin > > PR ipa/103786 > >gcc/ChangeLog: > > * tree.c (verify_type): Fix typo. >--- > gcc/tree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/gcc/tree.c b/gcc/tree.c >index 72cceda568f..0741e3b01af 100644 >--- a/gcc/tree.c >+++ b/gcc/tree.c >@@ -13530,7 +13530,7 @@ verify_type (const_tree t) >tree ct = TYPE_CANONICAL (t); >if (!ct) > ; >- else if (TYPE_CANONICAL (t) != ct) >+ else if (TYPE_CANONICAL (ct) != ct) > { >error ("% has different %"); >debug_tree (ct);
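For readers skimming the one-line diff above, the following is only a toy model of why the old condition could never trigger. DemoType, old_check_fires and new_check_fires are hypothetical names invented for this sketch; they are not GCC's tree API.

```cpp
// Toy model of the check touched by the patch. DemoType is a hypothetical
// stand-in for a tree node, and `canonical` for TYPE_CANONICAL (t).
#include <cassert>

struct DemoType {
  const DemoType *canonical;
};

// Condition before the patch: ct was just read from t, so comparing
// t's canonical link against ct can never be true.
bool old_check_fires(const DemoType *t) {
  const DemoType *ct = t->canonical;
  if (!ct)
    return false;
  return t->canonical != ct;
}

// Condition after the patch: fires when the canonical type is not
// its own canonical type.
bool new_check_fires(const DemoType *t) {
  const DemoType *ct = t->canonical;
  if (!ct)
    return false;
  return ct->canonical != ct;
}
```

With a chain a -> b -> c where only c is self-canonical, the old condition stays silent for a while the new one reports it.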
Re: [PATCH][GCC] arm: fix __arm_vld1q_z* and __arm_vst1q_p* intrinsics.
Hi, On 22/12/2021 16:04, Richard Earnshaw wrote: > > Is there a PR in bugzilla for this? > > R. > No, not at this time. It's something I came across whilst making changes of my own. For completeness, the ACLE specification I am referencing has been added below [1]. [1]: https://github.com/ARM-software/acle/releases/tag/r2021Q3 Thanks, Murray
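As a rough illustration of the failure mode discussed in this thread, here is a simplified model of the header's naming scheme. All demo_* names and DEMO_PRESERVE_USER_NAMESPACE are hypothetical; they only mimic the roles of the __arm_-prefixed intrinsics, their unprefixed aliases, and __ARM_MVE_PRESERVE_USER_NAMESPACE.

```cpp
// Simplified model; every name here is hypothetical.
#include <cassert>

// Plays the role of __ARM_MVE_PRESERVE_USER_NAMESPACE being defined.
#define DEMO_PRESERVE_USER_NAMESPACE

// "Prefixed" implementation: always available.
static inline int demo_prefixed_load(const int *base) { return *base; }

#ifndef DEMO_PRESERVE_USER_NAMESPACE
// "Unprefixed" alias: only exists when the user namespace is not preserved.
#define demo_load(base) demo_prefixed_load(base)
#endif

// Wrapper in the style of __arm_vld1q_z_*. Calling demo_load here would
// fail to compile once the alias above is removed, so the wrapper has to
// forward to the prefixed name instead.
static inline int demo_prefixed_load1(const int *base) {
  return demo_prefixed_load(base);
}
```

The point of the sketch is only that a wrapper inside the header must never depend on a name that the header itself may choose not to define.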
[PATCH] c++: hard error w/ ptr+CST and incomplete type [PR103700]
In pointer_int_sum when called from a SFINAE context, we need to avoid calling size_in_bytes_loc on an incomplete pointed-to type since this latter function isn't SFINAE-friendly and always emits an error in this case. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps 11? pointer_int_sum is also used in the C FE, but always with the complain parameter defaulted to true so this change should have no effect there AFAICT. PR c++/103700 gcc/c-family/ChangeLog: * c-common.c (pointer_int_sum): When quiet, return error_mark_node for an incomplete type and avoid calling size_in_bytes_loc. gcc/testsuite/ChangeLog: * g++.dg/template/sfinae32.C: New test. --- gcc/c-family/c-common.c | 2 ++ gcc/testsuite/g++.dg/template/sfinae32.C | 17 + 2 files changed, 19 insertions(+) create mode 100644 gcc/testsuite/g++.dg/template/sfinae32.C diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index a25d59fa77b..f3e3e9ba0a5 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3308,6 +3308,8 @@ pointer_int_sum (location_t loc, enum tree_code resultcode, size_exp = integer_one_node; else { + if (!complain && !COMPLETE_TYPE_P (TREE_TYPE (result_type))) + return error_mark_node; size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); /* Wrap the pointer expression in a SAVE_EXPR to make sure it is evaluated first when the size expression may depend diff --git a/gcc/testsuite/g++.dg/template/sfinae32.C b/gcc/testsuite/g++.dg/template/sfinae32.C new file mode 100644 index 000..488bf145e21 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/sfinae32.C @@ -0,0 +1,17 @@ +// PR c++/103700 +// { dg-do compile { target c++11 } } + +template auto f() -> decltype(*p + N) = delete; +template auto f() -> decltype(*p - N) = delete; +template auto f() -> decltype(N + *p) = delete; +template void f(); + +struct Incomplete *p; + +int main() { + f(); + f(); + f(); + f(); + f(); +} -- 2.34.1.363.g597af311a2
Re: [PATCH] c++: hard error w/ ptr+CST and incomplete type [PR103700]
On 12/22/21 12:39, Patrick Palka wrote: In pointer_int_sum when called from a SFINAE context, we need to avoid calling size_in_bytes_loc on an incomplete pointed-to type since this latter function isn't SFINAE-friendly and always emits an error in this case. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps 11? pointer_int_sum is also used in the C FE, but always with the complain parameter defaulted to true so this change should have no effect there AFAICT. LGTM, but let's give the C maintainers time to comment; OK on Friday if no comment. PR c++/103700 gcc/c-family/ChangeLog: * c-common.c (pointer_int_sum): When quiet, return error_mark_node for an incomplete type and avoid calling size_in_bytes_loc. gcc/testsuite/ChangeLog: * g++.dg/template/sfinae32.C: New test. --- gcc/c-family/c-common.c | 2 ++ gcc/testsuite/g++.dg/template/sfinae32.C | 17 + 2 files changed, 19 insertions(+) create mode 100644 gcc/testsuite/g++.dg/template/sfinae32.C diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index a25d59fa77b..f3e3e9ba0a5 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3308,6 +3308,8 @@ pointer_int_sum (location_t loc, enum tree_code resultcode, size_exp = integer_one_node; else { + if (!complain && !COMPLETE_TYPE_P (TREE_TYPE (result_type))) + return error_mark_node; size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type)); /* Wrap the pointer expression in a SAVE_EXPR to make sure it is evaluated first when the size expression may depend diff --git a/gcc/testsuite/g++.dg/template/sfinae32.C b/gcc/testsuite/g++.dg/template/sfinae32.C new file mode 100644 index 000..488bf145e21 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/sfinae32.C @@ -0,0 +1,17 @@ +// PR c++/103700 +// { dg-do compile { target c++11 } } + +template auto f() -> decltype(*p + N) = delete; +template auto f() -> decltype(*p - N) = delete; +template auto f() -> decltype(N + *p) = delete; +template void f(); + +struct 
Incomplete *p; + +int main() { + f(); + f(); + f(); + f(); + f(); +}
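The "quiet versus complaining" behaviour the patch relies on can be sketched with a small model. This is not the front end's code: demo_scaled_offset, DemoDiagnostics and DEMO_ERROR_MARK are invented for illustration, a non-positive size stands in for an incomplete pointed-to type, and what the real function returns in complaining mode is not modelled.

```cpp
// Hypothetical model of a helper that must stay silent in SFINAE context.
#include <cassert>

struct DemoDiagnostics {
  int errors_emitted = 0;
};

const long DEMO_ERROR_MARK = -1;  // stands in for error_mark_node

// pointee_size <= 0 models an incomplete pointed-to type.
long demo_scaled_offset(long pointee_size, long n, bool complain,
                        DemoDiagnostics &diag) {
  if (pointee_size <= 0) {
    if (!complain)
      return DEMO_ERROR_MARK;  // quiet mode: fail without any diagnostic
    ++diag.errors_emitted;     // models the error the size query emits
    return DEMO_ERROR_MARK;
  }
  return pointee_size * n;
}
```

In quiet mode the caller sees only the error marker and can discard the candidate, which is what substitution failure needs.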
[PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
Fixes: b667dd7017a ("Libsanitizer merge from trunk r368656.") Refs: https://reviews.llvm.org/D116176 --- .../sanitizer_common/sanitizer_common_interceptors.inc | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index abb38ccfa15..60b0545a943 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -7858,12 +7858,13 @@ INTERCEPTOR(void, setbuf, __sanitizer_FILE *stream, char *buf) { unpoison_file(stream); } -INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, int mode) { +INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, + SIZE_T size) { void *ctx; - COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, mode); + COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, size); REAL(setbuffer)(stream, buf, mode); if (buf) { -COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, __sanitizer_bufsiz); +COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, size); } if (stream) unpoison_file(stream); -- 2.33.1
Re: [PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
On 22 December 2021 19:19:12 CET, Azat Khuzhin via Gcc-patches wrote: >Fixes: b667dd7017a ("Libsanitizer merge from trunk r368656.") >Refs: https://reviews.llvm.org/D116176 >--- > .../sanitizer_common/sanitizer_common_interceptors.inc | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > >diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc >b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc >index abb38ccfa15..60b0545a943 100644 >--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc >+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc >@@ -7858,12 +7858,13 @@ INTERCEPTOR(void, setbuf, __sanitizer_FILE *stream, >char *buf) { > unpoison_file(stream); > } > >-INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, int mode) { >+INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, >+ SIZE_T size) { > void *ctx; >- COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, mode); >+ COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, size); > REAL(setbuffer)(stream, buf, mode); Where does mode come from after this patch? thanks, > if (buf) { >-COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, __sanitizer_bufsiz); >+COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, size); > } > if (stream) > unpoison_file(stream);
Re: [PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
On Wed, Dec 22, 2021 at 09:41:06PM +0100, Bernhard Reutner-Fischer wrote: > On 22 December 2021 19:19:12 CET, Azat Khuzhin via Gcc-patches > wrote: > >Fixes: b667dd7017a ("Libsanitizer merge from trunk r368656.") > >Refs: https://reviews.llvm.org/D116176 > >--- > > .../sanitizer_common/sanitizer_common_interceptors.inc | 7 --- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > >diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc > >b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc > >index abb38ccfa15..60b0545a943 100644 > >--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc > >+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc > >@@ -7858,12 +7858,13 @@ INTERCEPTOR(void, setbuf, __sanitizer_FILE *stream, > >char *buf) { > > unpoison_file(stream); > > } > > > >-INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, int mode) > >{ > >+INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, > >+ SIZE_T size) { > > void *ctx; > >- COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, mode); > >+ COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, size); > > REAL(setbuffer)(stream, buf, mode); > > Where does mode come from after this patch? setbuffer() does not accept a mode; it simply does not change it. Only setvbuf() can change the mode. > thanks, > > > if (buf) { > >-COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, __sanitizer_bufsiz); > >+COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, size); > > } > > if (stream) > > unpoison_file(stream); >
[PATCH v2] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
Fixes: b667dd7017a ("Libsanitizer merge from trunk r368656.") Refs: https://reviews.llvm.org/D116176 --- .../sanitizer_common/sanitizer_common_interceptors.inc | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc index abb38ccfa15..86784768fe5 100644 --- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc +++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc @@ -7858,12 +7858,13 @@ INTERCEPTOR(void, setbuf, __sanitizer_FILE *stream, char *buf) { unpoison_file(stream); } -INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, int mode) { +INTERCEPTOR(void, setbuffer, __sanitizer_FILE *stream, char *buf, + SIZE_T size) { void *ctx; - COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, mode); - REAL(setbuffer)(stream, buf, mode); + COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, size); + REAL(setbuffer)(stream, buf, size); if (buf) { -COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, __sanitizer_bufsiz); +COMMON_INTERCEPTOR_WRITE_RANGE(ctx, buf, size); } if (stream) unpoison_file(stream); -- 2.33.1
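A minimal sketch of the v2 behaviour, assuming a glibc/BSD-style libc that declares setbuffer(FILE *, char *, size_t) in <stdio.h>. demo_setbuffer, demo_note_write_range and demo_last_range are hypothetical stand-ins for the interceptor and its write-range bookkeeping; they are not sanitizer APIs.

```cpp
// Hypothetical wrapper in the spirit of the v2 interceptor.
#include <cassert>
#include <cstddef>
#include <stdio.h>

static std::size_t demo_last_range = 0;  // size of the last range recorded

// Stand-in for the write-range bookkeeping: just remembers the length.
static void demo_note_write_range(char *buf, std::size_t n) {
  (void)buf;
  demo_last_range = n;
}

void demo_setbuffer(FILE *stream, char *buf, std::size_t size) {
  setbuffer(stream, buf, size);      // third argument is a size, not a mode
  if (buf)
    demo_note_write_range(buf, size);  // range follows the caller's size
}
```

The buffer handed to setbuffer() has to outlive the stream, so a caller would typically use static or heap storage and close the stream before releasing it.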
Re: [PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
On Wed, Dec 22, 2021 at 09:41:06PM +0100, Bernhard Reutner-Fischer wrote: > On 22 December 2021 19:19:12 CET, Azat Khuzhin via Gcc-patches > wrote: > >- COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, mode); > >+ COMMON_INTERCEPTOR_ENTER(ctx, setbuffer, stream, buf, size); > > REAL(setbuffer)(stream, buf, mode); > > Where does mode come from after this patch? Sorry, missed that. Fixed, and sent a v2 patch. Thanks!
Re: [PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
On Wed, 22 Dec 2021 23:50:39 +0300 Azat Khuzhin wrote: > Thanks! you're welcome. You should state how you tested the patch. Please refer to https://gcc.gnu.org/contribute.html#testing thanks,
Re: [PATCH] libsanitizer: Fix setbuffer() interceptor (accept size not mode)
On Wed, Dec 22, 2021 at 10:02:02PM +0100, Bernhard Reutner-Fischer wrote:
> You should state how you tested the patch. Please refer to
> https://gcc.gnu.org/contribute.html#testing

I thought about this, but when gcc syncs changes with upstream [1], it does not sync the tests, even though they were there [2].

[1]: https://github.com/gcc-mirror/gcc/commit/b667dd7017a8
[2]: https://github.com/llvm/llvm-project/commit/0c81a62d9d76

That's why I thought that the test can be omitted in this case (I've added a test for llvm in [3]).

[3]: https://reviews.llvm.org/D116176

So what is the right way here?
- migrate all tests
- write a test only for setbuffer()
- do not add any tests, since they are covered in the llvm repo

-- Azat.
[PATCH] rs6000: Fix an assertion in update_target_cost_per_stmt [PR103702]
Hi, This patch is to fix one wrong assertion which is too aggressive. Vectorizer can do vec_construct costing for the vector type which only has one unit. For the failed case, the passed-in vector type is "vector(1) int", though it doesn't end up with any construction eventually. We have to handle this kind of input in function rs6000_cost_data::update_target_cost_per_stmt. Bootstrapped and regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8. Is it ok for trunk? BR, Kewen - gcc/ChangeLog: PR target/103702 * config/rs6000/rs6000.c (rs6000_cost_data::update_target_cost_per_stmt): Fix one wrong assertion with early return. gcc/testsuite/ChangeLog: PR target/103702 * gcc.target/powerpc/pr103702.c: New test. --- gcc/config/rs6000/rs6000.c | 7 -- gcc/testsuite/gcc.target/powerpc/pr103702.c | 24 + 2 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103702.c diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 0b09713b2f5..37f07fe5358 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -5461,8 +5461,11 @@ rs6000_cost_data::update_target_cost_per_stmt (vect_cost_for_stmt kind, { tree vectype = STMT_VINFO_VECTYPE (stmt_info); unsigned int nunits = vect_nunits_for_cost (vectype); - /* We don't expect strided/elementwise loads for just 1 nunit. */ - gcc_assert (nunits > 1); + /* As PR103702 shows, it's possible that vectorizer wants to do +costings for only one unit here, it's no need to do any +penalization for it, so simply early return here. 
*/ + if (nunits == 1) + return; /* i386 port adopts nunits * stmt_cost as the penalized cost for this kind of penalization, we used to follow it but found it could result in an unreliable body cost especially diff --git a/gcc/testsuite/gcc.target/powerpc/pr103702.c b/gcc/testsuite/gcc.target/powerpc/pr103702.c new file mode 100644 index 000..585946fd64b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr103702.c @@ -0,0 +1,24 @@ +/* We don't have one powerpc.*_ok for Power6, use altivec_ok conservatively. */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-mdejagnu-cpu=power6 -O2 -ftree-loop-vectorize -fno-tree-scev-cprop" } */ + +/* Verify there is no ICE. */ + +unsigned short a, e; +int *b, *d; +int c; +extern int fn2 (); +void +fn1 () +{ + void *f; + for (;;) +{ + fn2 (); + b = f; + e = 0; + for (; e < a; ++e) + b[e] = d[e * c]; +} +} + -- 2.27.0
[PATCH] rs6000: Disable MMA if no P9 VECTOR support [PR103627]
Hi, As PR103627 shows, there is an unexpected case where !TARGET_VSX and TARGET_MMA co-exist. As ISA3.1 claims, SIMD is a requirement for MMA. By looking into the ICE, I noticed that the current MMA implementation depends on vector pairs load/store, but since we don't have a separated option to control Power10 vector, this patch is to check for Power9 vector instead. Bootstrapped and regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8. Is it ok for trunk? BR, Kewen - gcc/ChangeLog: PR target/103627 * config/rs6000/rs6000.c (rs6000_option_override_internal): Disable MMA if !TARGET_P9_VECTOR. gcc/testsuite/ChangeLog: PR target/103627 * gcc.target/powerpc/pr103627-1.c: New test. * gcc.target/powerpc/pr103627-2.c: New test. --- gcc/config/rs6000/rs6000.c| 11 +++ gcc/testsuite/gcc.target/powerpc/pr103627-1.c | 16 gcc/testsuite/gcc.target/powerpc/pr103627-2.c | 16 3 files changed, 43 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-2.c diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index c020947abc8..ec3b46682a7 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4505,6 +4505,17 @@ rs6000_option_override_internal (bool global_init_p) rs6000_isa_flags &= ~OPTION_MASK_MMA; } + /* MMA requires SIMD support as ISA 3.1 claims and our implementation + such as "*movoo" uses vector pair access which are only supported + from ISA 3.1. But since we don't have one separated option to + control Power10 vector, check for Power9 vector instead. 
*/ + if (TARGET_MMA && !TARGET_P9_VECTOR) +{ + if ((rs6000_isa_flags_explicit & OPTION_MASK_MMA) != 0) + error ("%qs requires %qs", "-mmma", "-mpower9-vector"); + rs6000_isa_flags &= ~OPTION_MASK_MMA; +} + if (!TARGET_PCREL && TARGET_PCREL_OPT) rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT; diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-1.c b/gcc/testsuite/gcc.target/powerpc/pr103627-1.c new file mode 100644 index 000..6c6c16188fb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-1.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -mno-power9-vector" } */ + +/* Verify compiler emits error message instead of ICE. */ + +extern float *dest; +extern __vector_quad src; + +int +foo () +{ + __builtin_mma_disassemble_acc (dest, &src); + /* { dg-error "'__builtin_mma_disassemble_acc' requires the '-mmma' option" "" { target *-*-* } .-1 } */ + return 0; +} + diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-2.c b/gcc/testsuite/gcc.target/powerpc/pr103627-2.c new file mode 100644 index 000..6604872c0e8 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-2.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -mmma -mno-power9-vector" } */ + +/* Verify the emitted error message. */ + +extern float *dest; +extern __vector_quad src; + +int +foo () +{ + __builtin_mma_disassemble_acc (dest, &src); + /* { dg-error "'-mmma' requires '-mpower9-vector'" "mma" { target *-*-* } 0 } */ + return 0; +} + -- 2.27.0
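The option check added by this patch can be modelled with plain bit masks. The sketch below is an assumption-laden toy: the DEMO_MASK_* constants, DemoResult and demo_override are invented names, and the boolean `error` merely stands in for the call to error ().

```cpp
// Toy model of "disable MMA unless Power9 vector is on".
#include <cassert>

const unsigned DEMO_MASK_MMA = 1u << 0;
const unsigned DEMO_MASK_P9_VECTOR = 1u << 1;

struct DemoResult {
  unsigned flags;  // resulting ISA flags
  bool error;      // true if the model would have reported an error
};

DemoResult demo_override(unsigned flags, unsigned explicit_flags) {
  bool error = false;
  if ((flags & DEMO_MASK_MMA) && !(flags & DEMO_MASK_P9_VECTOR)) {
    // Only complain when the user asked for MMA explicitly.
    if (explicit_flags & DEMO_MASK_MMA)
      error = true;
    // Either way, MMA ends up disabled.
    flags &= ~DEMO_MASK_MMA;
  }
  return {flags, error};
}
```

This mirrors the two new tests: with MMA only implied by the cpu it is dropped silently, and with an explicit -mmma the option error is reported.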
[PATCH] rs6000: Move the hunk affecting VSX/ALTIVEC ahead [PR103627]
Hi, There is one hunk checking for functions with target attribute/pragma have the same altivec abi as the one of main_target_opt, it can update both VSX and ALTIVEC flags. Meanwhile, we have some codes to check or warn for some isa flags related to VSX and ALTIVEC, that sit where the mentioned hunk is proposed to be moved to in this patch. Since the flags update in the mentioned hunk happen behind those adjustments based on VSX and ALTIVEC flags, it can cause the incompatibility and result in unexpected behaviors, the associated test case is one typical case. Besides, we already have the code which sets TARGET_FLOAT128_TYPE and lays after where the hunk is moved to, and OPTION_MASK_FLOAT128_KEYWORD will rely on TARGET_FLOAT128_TYPE, so this patch just simply removes them. Bootstrapped and regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8 and P7. Is it ok for trunk? BR, Kewen - gcc/ChangeLog: PR target/103627 * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the hunk affecting VSX and ALTIVEC to the appropriate place. gcc/testsuite/ChangeLog: PR target/103627 * gcc.target/powerpc/pr103627-3.c: New test. --- gcc/config/rs6000/rs6000.c| 21 --- gcc/testsuite/gcc.target/powerpc/pr103627-3.c | 20 ++ 2 files changed, 29 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-3.c diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index ec3b46682a7..0b09713b2f5 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -3955,6 +3955,15 @@ rs6000_option_override_internal (bool global_init_p) else if (TARGET_ALTIVEC) rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~ignore_masks); + /* Disable VSX and Altivec silently if the user switched cpus to power7 in a + target attribute or pragma which automatically enables both options, + unless the altivec ABI was set. This is set by default for 64-bit, but + not for 32-bit. 
Don't move this before the above code using ignore_masks, + since it can reset the cleared VSX/ALTIVEC flag again. */ + if (main_target_opt != NULL && !main_target_opt->x_rs6000_altivec_abi) +rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC) + & ~rs6000_isa_flags_explicit); + if (TARGET_CRYPTO && !TARGET_ALTIVEC) { if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO) @@ -4373,18 +4382,6 @@ rs6000_option_override_internal (bool global_init_p) } } - /* Disable VSX and Altivec silently if the user switched cpus to power7 in a - target attribute or pragma which automatically enables both options, - unless the altivec ABI was set. This is set by default for 64-bit, but - not for 32-bit. */ - if (main_target_opt != NULL && !main_target_opt->x_rs6000_altivec_abi) -{ - TARGET_FLOAT128_TYPE = 0; - rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC -| OPTION_MASK_FLOAT128_KEYWORD) - & ~rs6000_isa_flags_explicit); -} - /* Enable Altivec ABI for AIX -maltivec. */ if (TARGET_XCOFF && (TARGET_ALTIVEC || TARGET_VSX) diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-3.c b/gcc/testsuite/gcc.target/powerpc/pr103627-3.c new file mode 100644 index 000..9df2b73fe85 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-3.c @@ -0,0 +1,20 @@ +/* There are no error messages for either LE or BE 64bit. */ +/* { dg-require-effective-target be }*/ +/* { dg-require-effective-target ilp32 } */ +/* We don't have one powerpc.*_ok for Power6, use altivec_ok conservatively. */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-mdejagnu-cpu=power6" } */ + +/* Verify compiler emits error message instead of ICE. */ + +#pragma GCC target "cpu=power10" +int +main () +{ + float *b; + __vector_quad c; + __builtin_mma_disassemble_acc (b, &c); + /* { dg-error "'__builtin_mma_disassemble_acc' requires the '-mmma' option" "" { target *-*-* } .-1 } */ + return 0; +} + -- 2.27.0
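The masking idiom in the moved hunk, `flags &= ~(mask & ~explicit)`, clears only those bits of `mask` that the user did not set explicitly. A small sketch with hypothetical constants (DEMO_VSX, DEMO_ALTIVEC, DEMO_OTHER are not the real OPTION_MASK_* values):

```cpp
// Toy model of clearing implied flags while keeping explicit ones.
#include <cassert>

const unsigned DEMO_VSX = 1u << 0;
const unsigned DEMO_ALTIVEC = 1u << 1;
const unsigned DEMO_OTHER = 1u << 2;  // any unrelated flag

unsigned demo_clear_unless_explicit(unsigned flags, unsigned explicit_flags) {
  // Bits to clear: VSX/ALTIVEC, minus whatever was requested explicitly.
  unsigned to_clear = (DEMO_VSX | DEMO_ALTIVEC) & ~explicit_flags;
  return flags & ~to_clear;
}
```

Because the result depends on the current flag state, where this statement sits relative to other code that sets or inspects VSX/ALTIVEC matters, which is the ordering concern the patch description raises.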
Re: [PATCH] i386: Enable intrinsics that convert float and bf16 data to each other.
On Wed, Dec 22, 2021 at 11:28 AM Kong, Lingling via Gcc-patches wrote: > > Hi, > > > This patch is to enable intrinsics that convert float and bf16 data to each > other. > Ok for master? > Ok. > gcc/ChangeLog: > > * config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic. > (_mm512_cvtpbh_ps): Likewise. > (_mm512_maskz_cvtpbh_ps): Likewise. > (_mm512_mask_cvtpbh_ps): Likewise. > * config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise. > (_mm_cvtpbh_ps): Likewise. > (_mm256_cvtpbh_ps): Likewise. > (_mm_maskz_cvtpbh_ps): Likewise. > (_mm256_maskz_cvtpbh_ps): Likewise. > (_mm_mask_cvtpbh_ps): Likewise. > (_mm256_mask_cvtpbh_ps): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test. > * gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto. > * gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto. > * gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto. > --- > gcc/config/i386/avx512bf16intrin.h| 36 +++ > gcc/config/i386/avx512bf16vlintrin.h | 63 +++ > .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c | 15 + > .../gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c | 20 ++ > .../i386/avx512bf16vl-cvtness2sbh-1.c | 14 + > .../i386/avx512bf16vl-vcvtpbh2ps-1.c | 29 + > 6 files changed, 177 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c > > diff --git a/gcc/config/i386/avx512bf16intrin.h > b/gcc/config/i386/avx512bf16intrin.h > index 9afc6bd7d2b..6b62dc3e398 100644 > --- a/gcc/config/i386/avx512bf16intrin.h > +++ b/gcc/config/i386/avx512bf16intrin.h > @@ -41,6 +41,16 @@ typedef short __v32bh __attribute__ ((__vector_size__ > (64))); > vector types, and their scalar components. 
*/ typedef short __m512bh > __attribute__ ((__vector_size__ (64), __may_alias__)); > > +/* Convert One BF16 Data to One Single Float Data. */ extern __inline > +float __attribute__ ((__gnu_inline__, __always_inline__, > +__artificial__)) _mm_cvtsbh_ss (__bfloat16 __A) { > + union{ float a; unsigned int b;} __tmp; > + __tmp.b = ((unsigned int)(__A)) << 16; > + return __tmp.a; > +} > + > /* vcvtne2ps2bf16 */ > > extern __inline __m512bh > @@ -110,6 +120,32 @@ _mm512_maskz_dpbf16_ps (__mmask16 __A, __m512 __B, > __m512bh __C, __m512bh __D) >return (__m512)__builtin_ia32_dpbf16ps_v16sf_maskz(__B, __C, __D, __A); } > > +extern __inline __m512 > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_cvtpbh_ps (__m256bh __A) { > + return (__m512)_mm512_castsi512_ps ((__m512i)_mm512_slli_epi32 ( > +(__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16)); } > + > +extern __inline __m512 > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_maskz_cvtpbh_ps (__mmask16 __U, __m256bh __A) { > + return (__m512)_mm512_castsi512_ps ((__m512i) _mm512_slli_epi32 ( > +(__m512i)_mm512_maskz_cvtepi16_epi32 ( > +(__mmask16)__U, (__m256i)__A), 16)); > +} > + > +extern __inline __m512 > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A) { > + return (__m512)_mm512_castsi512_ps ((__m512i)(_mm512_mask_slli_epi32 ( > +(__m512i)__S, (__mmask16)__U, > +(__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16))); } > + > #ifdef __DISABLE_AVX512BF16__ > #undef __DISABLE_AVX512BF16__ > #pragma GCC pop_options > diff --git a/gcc/config/i386/avx512bf16vlintrin.h > b/gcc/config/i386/avx512bf16vlintrin.h > index 6dd396d4008..5e6a6503aa6 100644 > --- a/gcc/config/i386/avx512bf16vlintrin.h > +++ b/gcc/config/i386/avx512bf16vlintrin.h > @@ -43,6 +43,7 @@ typedef short __v8bh __attribute__ ((__vector_size__ > (16))); typedef short __m256bh __attribute__ ((__vector_size__ (32), > 
__may_alias__)); typedef short __m128bh __attribute__ ((__vector_size__ > (16), __may_alias__)); > > +typedef unsigned short __bfloat16; > /* vcvtne2ps2bf16 */ > > extern __inline __m256bh > @@ -175,6 +176,68 @@ _mm_maskz_dpbf16_ps (__mmask8 __A, __m128 __B, __m128bh > __C, __m128bh __D) >return (__m128)__builtin_ia32_dpbf16ps_v4sf_maskz(__B, __C, __D, __A); } > > +extern __inline __bfloat16 > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_cvtness_sbh (float __A) { > + __v4sf __V = {__A, 0, 0, 0}; > + __v8hi __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V, > + (__v8hi)_mm_undefined_si128 (), (__mmask8)-1); > + return __R[0]; > +} > + > +extern __inline __m128 > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)
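The scalar conversion in the quoted _mm_cvtsbh_ss places the 16 bf16 bits in the upper half of a 32-bit float. A portable sketch of the same idea follows; note that it swaps the patch's union for std::memcpy, assumes a 32-bit IEEE-754 float, and demo_bf16_to_float is a made-up name rather than an intrinsic.

```cpp
// Portable model of widening bf16 bits to float.
#include <cassert>
#include <cstdint>
#include <cstring>

float demo_bf16_to_float(std::uint16_t bits) {
  // bf16 is the top 16 bits of a binary32 value; the low 16 bits are zero.
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof out);  // bit copy, no value conversion
  return out;
}
```

Since the widening only appends zero mantissa bits, this direction is exact; only the float-to-bf16 direction has to round.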
Re: [PATCH] [i386] Add define_insn_and_split for vpcmp{b, w, d, q} vpcmp{ph, ps, pd}.
On Tue, Dec 21, 2021 at 2:27 PM liuhongt wrote: > > The purpose of those define_insn_and_split: > 1. Combine vpcmpuw and zero_extend into vpcmpuw. > 2. Canonicalize vpcmpuw pattern so CSE can replace duplicate vpcmpuw to just > kmov > 3. Use DImode as dest of zero_extend so cprop_hardreg can eliminate redundant > kmov. Using DImode as the dest of zero_extend is too aggressive and causes several regressions. The new patch adds a define_insn_and_split that just combines vpcmpuw and zero_extend into vpcmpuw. Here's the patch I'm checking in. > > It should partially fix the issue in PR. > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready to push to trunk. > > gcc/ChangeLog: > > PR target/103750 > * config/i386/sse.md > (*_cmp3_zero_extend): > New define_insn_and_split. > (*_cmp3): Ditto. > (*_cmp3_zero_extenddi): New define_insn. > (*_cmp3_zero_extend): > New define_insn_and_split. > (*_ucmp3_zero_extend): > Ditto. > (*_ucmp3): Ditto. > (*_ucmp3_zero_extenddi): New define_insn. > (*_ucmp3_zero_extend): > New define_insn_and_split. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/bitwise_mask_op-3.c: Adjust test/ > * g++.target/i386/pr103750-1.C: New test. > --- > gcc/config/i386/sse.md| 267 ++ > gcc/testsuite/g++.target/i386/pr103750-1.C| 50 > .../gcc.target/i386/bitwise_mask_op-3.c | 6 +- > 3 files changed, 320 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/g++.target/i386/pr103750-1.C > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 5196149ee32..fb885d58272 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -3702,6 +3702,75 @@ (define_insn > "_cmp3" > (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > +;; Those Splitters are used to canonicalize vpcmpuw pattern, so that CSE can > transfrom > +;; duplicated vpcmpuw to vpcmpuw and kmov > +;; Choose biggest mode(DImode) as dest, so kmov can be optimized by > cprop_hardreg. 
> +(define_insn_and_split > "*_cmp3_zero_extend" > + [(set (match_operand:SWI248x 0 "register_operand" "=k") > + (zero_extend:SWI248x > + (unspec: > + [(match_operand:V48H_AVX512VL 1 "register_operand" "v") > +(match_operand:V48H_AVX512VL 2 "nonimmediate_operand" "vm") > +(match_operand:SI 3 "" "n")] > + UNSPEC_PCMP)))] > + "TARGET_AVX512BW > + && (GET_MODE_NUNITS (mode) > + < GET_MODE_PRECISION (mode))" > + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" > + "&& mode != E_DImode" > + [(set (match_dup 0) > + (zero_extend:DI > + (unspec: > + [(match_dup 1) > +(match_dup 2) > +(match_dup 3)] > + UNSPEC_PCMP)))] > + "operands[0] = lowpart_subreg (DImode, operands[0], mode);" > + [(set_attr "type" "ssecmp") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + (set_attr "mode" "")]) > + > +(define_insn_and_split "*_cmp3" > + [(set (match_operand: 0 "register_operand" "=k") > + (unspec: > + [(match_operand:V48H_AVX512VL 1 "register_operand" "v") > + (match_operand:V48H_AVX512VL 2 "nonimmediate_operand" "vm") > + (match_operand:SI 3 "" "n")] > + UNSPEC_PCMP))] > + "TARGET_AVX512BW > + && GET_MODE_NUNITS (mode) < 64" > + "#" > + "&& 1" > + [(set (match_dup 0) > + (zero_extend:DI > + (unspec: > + [(match_dup 1) > +(match_dup 2) > +(match_dup 3)] > + UNSPEC_PCMP)))] > + "operands[0] = lowpart_subreg (DImode, operands[0], > mode);" > + [(set_attr "type" "ssecmp") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + (set_attr "mode" "")]) > + > +(define_insn "*_cmp3_zero_extenddi" > + [(set (match_operand:DI 0 "register_operand" "=k") > + (zero_extend:DI > + (unspec: > + [(match_operand:V48H_AVX512VL 1 "register_operand" "v") > +(match_operand:V48H_AVX512VL 2 "nonimmediate_operand" "vm") > +(match_operand:SI 3 "" "n")] > + UNSPEC_PCMP)))] > + "TARGET_AVX512BW > + && GET_MODE_NUNITS (mode) < 64" > + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" > + [(set_attr "type" "ssecmp") > + (set_attr "length_immediate" "1") > + (set_attr "prefix" "evex") > + 
(set_attr "mode" "")]) > + > (define_insn_and_split "*_cmp3" >[(set (match_operand: 0 "register_operand") > (not: > @@ -3735,6 +3804,72 @@ (define_insn > "_cmp3" > (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > +(define_insn_and_split > "*_cmp3_zero_extend" > + [(set (match_operand:SWI248x 0 "register_operand" "=k") > + (zero_extend:SWI248x > + (unspec: > + [(match_operand:VI12_AVX512VL 1 "register_operand" "v") > +(match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm") > +
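Editorial sketch, for readers following point 1 of the quoted rationale (combining vpcmpuw with a following zero_extend): the idea can be modelled in portable C. This is not code from the patch. `cmp_lt_u16x8`, `zero_extend_to_di` and `check_zero_extend_is_redundant` are hypothetical names, the 8-lane width is chosen only for illustration, and the model assumes that a mask-writing compare leaves every bit above the lane count clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an 8-lane unsigned 16-bit "less than"
   compare: bit i of the result is set when a[i] < b[i].  The name and
   the lane count are illustrative only.  */
uint8_t cmp_lt_u16x8(const uint16_t a[8], const uint16_t b[8])
{
    uint8_t mask = 0;
    for (int i = 0; i < 8; i++)
        if (a[i] < b[i])
            mask |= (uint8_t)(1u << i);
    return mask;
}

/* Models the explicit zero_extend to DImode that follows the compare.  */
uint64_t zero_extend_to_di(uint8_t mask)
{
    return (uint64_t)mask;
}

/* Self-check: widening never changes the mask value, which is why the
   compare and the zero_extend can be treated as one operation.  */
void check_zero_extend_is_redundant(const uint16_t a[8], const uint16_t b[8])
{
    uint8_t narrow = cmp_lt_u16x8(a, b);
    assert(zero_extend_to_di(narrow) == narrow);
}
```

Under that assumption the widened value carries no information beyond the mask itself, so a pattern that matches compare plus zero_extend can emit the compare alone.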
[PATCH] fixed testcase riscv/pr103302.c
From: LiaoShihua

Because riscv32 does not support __int128, skip the test when
-march=rv32* is used.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/pr103302.c: Skip if -march=rv32*.
---
 gcc/testsuite/gcc.target/riscv/pr103302.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/riscv/pr103302.c b/gcc/testsuite/gcc.target/riscv/pr103302.c
index 822c4087416..2cfb12498a2 100644
--- a/gcc/testsuite/gcc.target/riscv/pr103302.c
+++ b/gcc/testsuite/gcc.target/riscv/pr103302.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-skip-if "rv32 not support _int128" { *-*-* } { "-march=rv32*" } } */
 /* { dg-options "-Og -fharden-compares -fno-tree-dce -fno-tree-fre " } */
 
 typedef unsigned char u8;
-- 
2.31.1.windows.1
Re: [PATCH] fixed testcase riscv/pr103302.c
On Wed, Dec 22, 2021 at 11:37 PM wrote:
>
> From: LiaoShihua
>
> Because riscv32 does not support __int128, skip the test when
> -march=rv32* is used.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/riscv/pr103302.c: Skip if -march=rv32*.
> ---
>  gcc/testsuite/gcc.target/riscv/pr103302.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr103302.c
> b/gcc/testsuite/gcc.target/riscv/pr103302.c
> index 822c4087416..2cfb12498a2 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr103302.c
> +++ b/gcc/testsuite/gcc.target/riscv/pr103302.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-skip-if "rv32 not support _int128" { *-*-* } { "-march=rv32*" } } */

Better fix:
/* { dg-do run { target int128 } } */

Thanks,
Andrew Pinski

>  /* { dg-options "-Og -fharden-compares -fno-tree-dce -fno-tree-fre " } */
>
>  typedef unsigned char u8;
> --
> 2.31.1.windows.1
>
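Editorial sketch of how the top of pr103302.c would presumably look with the suggested directive applied. It is assembled only from the lines quoted in this thread. The rest of the test body is not shown in the thread and is left out here.

```c
/* { dg-do run { target int128 } } */
/* { dg-options "-Og -fharden-compares -fno-tree-dce -fno-tree-fre " } */

typedef unsigned char u8;
```

The `int128` effective-target keyword asks the testsuite whether the target actually supports `__int128`, instead of matching on the `-march=rv32*` option string. The test is therefore also skipped on any other configuration that lacks the type, including rv32 toolchains where no `-march` option is passed explicitly.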