Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 3:36 PM, Segher Boessenkool wrote:
>
> On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool wrote:
>>>
>>> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool wrote:
>>>>>
>>>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>>>> I don't understand why it's not correct if we clear call-clobbered
>>>>>> registers AFTER restoring call-preserved registers?
>>>>>
>>>>> Because the compiler backend (or the linker!  Or the dynamic
>>>>> linker!  Etc.) can use volatile registers for their own purposes.
>>>>
>>>> For the following sequence at the end of a routine:
>>>>
>>>>     ...
>>>>     restore call-preserved registers
>>>>     clear call-clobbered registers
>>>>     ret
>>>>
>>>> "Clear call-clobbered registers" will only clear the call-clobbered
>>>> registers that are not live at the end of the routine.
>>>
>>> And they can be written again right after the routine, by
>>> linker-generated code for example.  This is a waste.
>>>>
>>>> In the new version of the patch, the implementation of clearing
>>>> call-clobbered registers is done in the backend; the middle end only
>>>> computes a hard register set based on the user option, source
>>>> attribute, data flow information, and function ABI information, and
>>>> then passes this hard register set to the target hook to generate
>>>> the clearing sequence.  The backend will have all the details on the
>>>> special situations you mentioned.  Let me know of any more concerns
>>>> here.
>>>
>>> I cannot find that patch?
>>
>> Haven't finished yet. -:)
>
> Ah okay :-)
>
> If you have, please send it in a new thread (not as a reply)?  So that
> it will be much easier to handle :-)

Okay.  Will do.

Qing

>
> Segher
[PATCH] libstdc++: only pull in bits/align.h if C++11 or later
libstdc++-v3/ChangeLog:

	* include/std/memory: Move #include <bits/align.h> inside C++11
	conditional includes.

Tested x86_64-pc-linux-gnu, committed to master.

---
 libstdc++-v3/include/std/memory | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index a56952fb114..aee7b050bd7 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -61,7 +61,6 @@
  */
 #include
-#include <bits/align.h>
 #include
 #include
 #include
@@ -75,6 +74,7 @@
 # include // std::basic_ostream
 # include
 # include
+# include <bits/align.h>
 # include
 # include // std::less
 # include
-- 
2.26.2
i386: Fix array index in expander
I noticed a compiler warning about an out-of-bounds access.  Fixed thusly.

gcc/
	* config/i386/sse.md (mov): Fix operand indices.

pushed as obvious

-- 
Nathan Sidwell

diff --git i/gcc/config/i386/sse.md w/gcc/config/i386/sse.md
index a728b979f01..a784346a23b 100644
--- i/gcc/config/i386/sse.md
+++ w/gcc/config/i386/sse.md
@@ -23491,7 +23491,7 @@ (define_expand "mov"
     (match_operand:MASK_DWI 1 "nonimmediate_operand"))]
   "TARGET_AVX512VP2INTERSECT"
 {
-  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+  if (MEM_P (operands[0]) && MEM_P (operands[1]))
     operands[1] = force_reg (mode, operands[1]);
 })
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
>
> Hi!
>
> On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
>> My understanding of how this scheme helps ROP is: the attacker usually
>> uses scratch registers to pass
>
> Help obstruct ROP ;-)

Thanks for catching my mistake.

>> parameters to the sys call in the gadget; if we clear the scratch
>> registers immediately before "ret", then the parameters that are
>> passed to the sys call will be destroyed, and therefore the attack
>> will likely fail.
>
> But you do not need more than one non-zero argument for execv*, and
> that is usually the same register as the normal return value register;
> all other registers *should* be zero for a simple
> execv*("/bin/sh", ...)!
>
> (There is also the system call number register, rax on x86-64, but if
> overwriting that would be at all effective, you could just do that one
> always and everywhere.  This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it is very effective and cheap
> then.)

In the above, do you mean that only clearing "rax" on x86-64 should be
effective enough?

Qing

>
> Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes:
>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford wrote:
>>
>> If we go for (2), then I think it would be better to do it at the
>> start of pass_late_compilation instead.  (Some targets wouldn't cope
>> with doing it later.)  The reason for doing it so late is that the set
>> of used "volatile"/caller-saved registers is not fixed at prologue and
>> epilogue generation: later optimisation passes can introduce uses of
>> volatile registers that weren't used previously.  (Sorry if this has
>> already been suggested.)
>
> Yes, I agree.
>
> I thought that it might be better to move this task to very late in
> the RTL stage, i.e., before the "final" phase.
>
> Another solution is (discussed with Hongjiu):
>
> 1. Define a new target hook:
>
>    targetm.return_with_zeroing (bool simple_return_p,
>                                 HARD_REG_SET need_zeroed_hardregs,
>                                 bool gpr_only)
>
> 2. Add the following routine in the middle end:
>
>    rtx_insn *
>    generate_return_rtx (bool simple_return_p)
>    {
>      if (targetm.return_with_zeroing)
>        {
>          /* Compute the hard regs set for clearing into
>             "need_zeroed_hardregs".  */
>          return targetm.return_with_zeroing (simple_return_p,
>                                              need_zeroed_hardregs,
>                                              gpr_only);
>        }
>      else
>        {
>          if (simple_return_p)
>            return targetm.gen_simple_return ();
>          else
>            return targetm.gen_return ();
>        }
>    }
>
>    Then replace all calls to "targetm.gen_simple_return" and
>    "targetm.gen_return" with "generate_return_rtx ()".
>
> 3. In the target, implement "return_with_zeroing".
>
> Let me know your comments on this.

I think having a separate pass is better.  We don't normally know at the
point of generating the return which registers will need to be cleared.
So IMO the pass should just search for all the returns in a function and
insert the zeroing instructions before each one.

Having a target hook sounds good, but I think it should have a default
definition that just uses the move patterns to zero each selected
register.  I expect the default will be good enough for most targets.

Thanks,
Richard
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
> >> The parameters that are passed to the sys call will be destroyed;
> >> therefore, the attack will likely fail.
> >
> > But you do not need more than one non-zero argument for execv*, and
> > that is usually the same register as the normal return value
> > register; all other registers *should* be zero for a simple
> > execv*("/bin/sh", ...)!
> >
> > (There is also the system call number register, rax on x86-64, but if
> > overwriting that would be at all effective, you could just do that
> > one always and everywhere.  This is only an effective defence if
> > there are no gadgets that do the system call an attacker wants, and
> > he has to construct that sequence himself; but it is very effective
> > and cheap then.)
>
> In the above, do you mean only clearing "rax" on x86-64 should be
> effective enough?

(rax=0 is "read"; you might want to use another value, but that's just
details.)

"This is only an effective defence if there are no gadgets that do the
system call an attacker wants, and he has to construct that sequence
himself; but it is very effective and cheap then."

It is definitely *not* effective if there are gadgets that set rax to a
value the attacker wants and then do a syscall.  It of course is quite
effective in breaking a ROP chain of (set rax) -> (syscall).  How
effective it is in practice, I have no idea.

My point was that your proposed scheme does not protect the other
syscall parameters very much either.

And, hrm, rax is actually the first return value.  On most ABIs the same
registers are used for arguments and for return values; I got confused.
Sorry.  So this cannot be very effective for x86-64 no matter what.


Segher
[PATCH] -fprofile-reproducible: fix option value handling
From: Sergei Trofimovich

Before the change, option handling did not accept an argument:

    xgcc: error: unknown profile reproducibility method '=serial'
    xgcc: note: valid arguments to '-fprofile-reproducible' are:
    multithreaded parallel-runs serial; did you mean 'serial'?

The change also includes the trailing '=' as part of the option prefix.

gcc/ChangeLog:

	* common.opt: Fix handling of '-fprofile-reproducible' option.
---
 gcc/common.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index dd68c61ae1d..84bf521128d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2228,7 +2228,7 @@ Enum(profile_reproducibility) String(parallel-runs) Value(PROFILE_REPRODUCIBILIT
 EnumValue
 Enum(profile_reproducibility) String(multithreaded) Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
 
-fprofile-reproducible
+fprofile-reproducible=
 Common Joined RejectNegative Var(flag_profile_reproducible) Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
 -fprofile-reproducible=[serial|parallel-runs|multithreaded]	Control level of reproducibility of profile gathered by -fprofile-generate.
-- 
2.28.0
[PATCH] doc: fix spelling of -fprofile-reproducibility
From: Sergei Trofimovich

gcc/ChangeLog:

	* doc/invoke.texi: Fix '-fprofile-reproducible' option spelling
	in manual.
---
 gcc/doc/invoke.texi | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..183ce7715d1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -557,7 +557,8 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} @gol
 -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol
 -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol
--fprofile-exclude-files=@var{regex} -fprofile-reproducibility @gol
+-fprofile-exclude-files=@var{regex} @gol
+-fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} @gol
 -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
 -fasan-shadow-offset=@var{number} -fsanitize-sections=@var{s1},@var{s2},... @gol
 -fsanitize-undefined-trap-on-error -fbounds-check @gol
@@ -13889,14 +13890,14 @@ any of the regular expressions (separated by semi-colons).
 For example, @option{-fprofile-exclude-files=/usr/.*} will prevent
 instrumentation of all files that are located in the @file{/usr/} folder.
 
-@item -fprofile-reproducible
+@item -fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]}
 @opindex fprofile-reproducible
 Control level of reproducibility of profile gathered by
 @code{-fprofile-generate}.  This makes it possible to rebuild program
 with same outcome which is useful, for example, for distribution
 packages.
 
-With @option{-fprofile-reproducibility=serial} the profile gathered by
+With @option{-fprofile-reproducible=serial} the profile gathered by
 @option{-fprofile-generate} is reproducible provided the trained program
 behaves the same at each invocation of the train run, it is not
 multi-threaded and profile data streaming is always done in the same
@@ -13911,7 +13912,7 @@ Such non-reproducible part of programs may be annotated by
 @option{-l} can be used to dump gathered data and verify that they are
 indeed reproducible.
 
-With @option{-fprofile-reproducibility=parallel-runs} collected profile
+With @option{-fprofile-reproducible=parallel-runs} collected profile
 stays reproducible regardless the order of streaming of the data into
 gcda files.  This setting makes it possible to run multiple instances of
 instrumented program in parallel (such as with @code{make -j}).  This
-- 
2.28.0
[PATCH] rs6000: Rename mffgpr/mftgpr instruction types
The following is mostly a mechanical change to rename the mffgpr/mftgpr
insn types to mtvsr/mfvsr to be more clear.  It also removes Power6
references to those insn types since we no longer generate those
instructions.

Bootstrap/regtest on powerpc64le with no new regressions.  Ok for trunk?

-Pat


2020-09-11  Pat Haugen

gcc/
	* config/rs6000/power10.md (power10-mffgpr, power10-mftgpr):
	Rename to mtvsr/mfvsr.
	* config/rs6000/power6.md (X2F_power6, power6-mftgpr,
	power6-mffgpr): Remove.
	* config/rs6000/power8.md (power8-mffgpr, power8-mftgpr): Rename
	to mtvsr/mfvsr.
	* config/rs6000/power9.md (power8-mffgpr, power8-mftgpr):
	Likewise.
	* config/rs6000/rs6000.c (rs6000_adjust_cost): Remove Power6
	TYPE_MFFGPR cases.
	* config/rs6000/rs6000.md (mffgpr, mftgpr, zero_extendsi2,
	extendsi2, @signbit2_dm, lfiwax, lfiwzx, *movsi_internal1,
	movsi_from_sf, *movdi_from_sf_zero_ext, *mov_internal,
	movsd_hardfloat, movsf_from_si, *mov_hardfloat64, p8_mtvsrwz,
	p8_mtvsrd_df, p8_mtvsrd_sf, p8_mfvsrd_3_, *movdi_internal64,
	unpack_dm): Rename mffgpr/mftgpr to mtvsr/mfvsr.
	* config/rs6000/vsx.md (vsx_mov_64bit, vsx_extract_,
	vsx_extract_si, *vsx_extract__p8): Likewise.
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 9f8a5825744..2b4d882e8df 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -468,13 +468,13 @@ (define_insn_reservation "power10-qpmul" 24
    (eq_attr "cpu" "power10"))
   "DU_super_power10,dfu_power10*8")
 
-(define_insn_reservation "power10-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power10-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")
 
-(define_insn_reservation "power10-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power10-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
        (eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")
 
diff --git a/gcc/config/rs6000/power6.md b/gcc/config/rs6000/power6.md
index a94ce52425f..e2e7582e67c 100644
--- a/gcc/config/rs6000/power6.md
+++ b/gcc/config/rs6000/power6.md
@@ -56,10 +56,6 @@ (define_reservation "LX2_power6"
 (define_reservation "FX2_power6"
   "iu1_power6+iu2_power6")
 
-(define_reservation "X2F_power6"
-  "(iu1_power6+iu2_power6+fpu1_power6)\
-   |(iu1_power6+iu2_power6+fpu2_power6)")
-
 (define_reservation "BX2_power6"
   "iu1_power6+iu2_power6+bpu_power6")
 
@@ -605,20 +601,3 @@ (define_bypass 6 "power6-vecperm" "power6-veccomplex" )
 (define_bypass 5 "power6-vecperm" "power6-vecstore" )
 
-(define_insn_reservation "power6-mftgpr" 8
-  (and (eq_attr "type" "mftgpr")
-       (eq_attr "cpu" "power6"))
-  "X2F_power6")
-
-(define_insn_reservation "power6-mffgpr" 14
-  (and (eq_attr "type" "mffgpr")
-       (eq_attr "cpu" "power6"))
-  "LX2_power6")
-
-(define_bypass 4 "power6-mftgpr" "power6-imul,\
-                power6-lmul,\
-                power6-imul-cmp,\
-                power6-lmul-cmp,\
-                power6-imul3,\
-                power6-idiv,\
-                power6-ldiv" )
diff --git a/gcc/config/rs6000/power8.md b/gcc/config/rs6000/power8.md
index fae2ad8bf57..a3f46c62be2 100644
--- a/gcc/config/rs6000/power8.md
+++ b/gcc/config/rs6000/power8.md
@@ -379,13 +379,13 @@ (define_insn_reservation "power8-vecdiv" 31
    (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
-(define_insn_reservation "power8-mffgpr" 5
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power8-mtvsr" 5
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
-(define_insn_reservation "power8-mftgpr" 6
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power8-mfvsr" 6
+  (and (eq_attr "type" "mfvsr")
        (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
diff --git a/gcc/config/rs6000/power9.md b/gcc/config/rs6000/power9.md
index 2277b145308..c86d643a7d3 100644
--- a/gcc/config/rs6000/power9.md
+++ b/gcc/config/rs6000/power9.md
@@ -466,13 +466,13 @@ (define_insn_reservation "power9-qpmul" 24
    (eq_attr "cpu" "power9"))
   "DU_super_power9,dfu_power9*8")
 
-(define_insn_reservation "power9-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power9-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
-(define_insn_reservation "power9-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power9-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ca5b71ecdd3..5b0162dcccd 10
[PATCH 2/4, revised patch applied] PowerPC: Rename functions for min, max, cmove
Here is the patch that I applied:

From 1a2e0742e3e3c45f75d0ce31c45a7778c8d1f45e Mon Sep 17 00:00:00 2001
From: Michael Meissner
Date: Fri, 11 Sep 2020 16:57:13 -0400
Subject: [PATCH] PowerPC: rename some functions.

gcc/
2020-09-11  Michael Meissner

	* config/rs6000/rs6000.c (rs6000_maybe_emit_maxc_minc): Rename
	from rs6000_emit_p9_fp_minmax.  Change return type to bool.  Add
	comments to document NaN/signed zero behavior.
	(rs6000_maybe_emit_fp_cmove): Rename from rs6000_emit_p9_fp_cmove.
	(have_compare_and_set_mask): New helper function.
	(rs6000_emit_cmove): Update calls to new names and the new helper
	function.
---
 gcc/config/rs6000/rs6000.c | 93 --
 1 file changed, 69 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9908830b07a..20a4ba382bc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15057,13 +15057,33 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   return 1;
 }
 
-/* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction
-   for SF/DF scalars.  Move TRUE_COND to DEST if OP of the operands of the last
-   comparison is nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the
-   hardware has no such operation.  */
+/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
+   minimum with "C" semantics.
 
-static int
-rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Unless you use -ffast-math, you can't use these instructions to replace
+   conditions that implicitly reverse the condition because the comparison
+   might generate a NaN or signed zero.
+
+   I.e. the following can be replaced all of the time
+	ret = (op1 >  op2) ? op1 : op2	; generate xsmaxcdp
+	ret = (op1 >= op2) ? op1 : op2	; generate xsmaxcdp
+	ret = (op1 <  op2) ? op1 : op2	; generate xsmincdp
+	ret = (op1 <= op2) ? op1 : op2	; generate xsmincdp
+
+   The following can be replaced only if -ffast-math is used:
+	ret = (op1 <  op2) ? op2 : op1	; generate xsmaxcdp
+	ret = (op1 <= op2) ? op2 : op1	; generate xsmaxcdp
+	ret = (op1 >  op2) ? op2 : op1	; generate xsmincdp
+	ret = (op1 >= op2) ? op2 : op1	; generate xsmincdp
+
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if we can't generate the appropriate minimum or maximum, and
+   true if we did emit the minimum or maximum.  */
+
+static bool
+rs6000_maybe_emit_maxc_minc (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15073,14 +15093,14 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
   bool max_p = false;
 
   if (result_mode != compare_mode)
-    return 0;
+    return false;
 
   if (code == GE || code == GT)
     max_p = true;
   else if (code == LE || code == LT)
     max_p = false;
   else
-    return 0;
+    return false;
 
   if (rtx_equal_p (op0, true_cond) && rtx_equal_p (op1, false_cond))
     ;
@@ -15093,19 +15113,23 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
     max_p = !max_p;
 
   else
-    return 0;
+    return false;
 
   rs6000_emit_minmax (dest, max_p ? SMAX : SMIN, op0, op1);
-  return 1;
+  return true;
 }
 
-/* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and
-   XXSEL instructions for SF/DF scalars.  Move TRUE_COND to DEST if OP of the
-   operands of the last comparison is nonzero/true, FALSE_COND if it is
-   zero/false.  Return 0 if the hardware has no such operation.  */
+/* Possibly emit a floating point conditional move by generating a compare that
+   sets a mask instruction and a XXSEL select instruction.
 
-static int
-rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if the operation cannot be generated, and true if we could
+   generate the instruction.  */
+
+static bool
+rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15133,7 +15157,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
       break;
 
     default:
-      return 0;
+      return false;
     }
 
   /* Generate:	[(parallel [(set (dest)
@@ -15153,7 +15177,28 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
   emit_insn (gen_rtx_PARALLEL (VOIDmode,
			       gen_rtvec (2, cmove_rtx, clobber_rtx)));
-  return 1;
+  return true;
+}
+
+/* Helper function to return true if the target has instructions to do a
+   compare and set mask instruction that can
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:44 PM, Richard Sandiford wrote:
>
> Qing Zhao writes:
>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford wrote:
>>>
>>> If we go for (2), then I think it would be better to do it at the
>>> start of pass_late_compilation instead.  (Some targets wouldn't cope
>>> with doing it later.)  The reason for doing it so late is that the
>>> set of used "volatile"/caller-saved registers is not fixed at
>>> prologue and epilogue generation: later optimisation passes can
>>> introduce uses of volatile registers that weren't used previously.
>>> (Sorry if this has already been suggested.)
>>
>> Yes, I agree.
>>
>> I thought that it might be better to move this task to very late in
>> the RTL stage, i.e., before the "final" phase.
>>
>> Another solution is (discussed with Hongjiu):
>>
>> 1. Define a new target hook:
>>
>>    targetm.return_with_zeroing (bool simple_return_p,
>>                                 HARD_REG_SET need_zeroed_hardregs,
>>                                 bool gpr_only)
>>
>> 2. Add the following routine in the middle end:
>>
>>    rtx_insn *
>>    generate_return_rtx (bool simple_return_p)
>>    {
>>      if (targetm.return_with_zeroing)
>>        {
>>          /* Compute the hard regs set for clearing into
>>             "need_zeroed_hardregs".  */
>>          return targetm.return_with_zeroing (simple_return_p,
>>                                              need_zeroed_hardregs,
>>                                              gpr_only);
>>        }
>>      else
>>        {
>>          if (simple_return_p)
>>            return targetm.gen_simple_return ();
>>          else
>>            return targetm.gen_return ();
>>        }
>>    }
>>
>>    Then replace all calls to "targetm.gen_simple_return" and
>>    "targetm.gen_return" with "generate_return_rtx ()".
>>
>> 3. In the target, implement "return_with_zeroing".
>>
>> Let me know your comments on this.
>
> I think having a separate pass is better.  We don't normally know
> at the point of generating the return which registers will need
> to be cleared.

At the point of generating the return, we can compute the
"need_zeroed_hardregs" HARD_REG_SET by using data flow information, the
function ABI of the routine, and also the user option and source code
attribute information together.  This information should be available at
each point when generating the return.

> So IMO the pass should just search for all the returns in a function
> and insert the zeroing instructions before each one.

I was considering this approach too for some time; however, there is one
major issue with it, as Segher mentioned: the middle end does not know
some details about the registers, and lacking such detailed information
might result in incorrect code generation in the middle end.

For example, on the x86_64 target, when we "return" with pop, the
scratch register "ECX" will be used for returning.  Then it's incorrect
to zero "ecx" before generating the return.  Since the middle end
doesn't have such information, it cannot avoid zeroing "ecx", and
therefore incorrect code might be generated.

Segher also mentioned that on Power some scratch registers are also used
for other purposes; clearing them before the return is not correct.

> Having a target hook sounds good, but I think it should have a default
> definition that just uses the move patterns to zero each selected
> register.  I expect the default will be good enough for most targets.

Based on the above, I think that generating the zeroing instructions in
the middle end is not correct.

Thanks.

Qing
Re: [PATCH] rs6000: Rename mffgpr/mftgpr instruction types
Hi Pat,

On Fri, Sep 11, 2020 at 05:04:29PM -0500, Pat Haugen wrote:
> The following is mostly a mechanical change to rename the mffgpr/mftgpr
> insn types to mtvsr/mfvsr to be more clear.  It also removes Power6
> references to those insn types since we no longer generate those
> instructions.

> * config/rs6000/power10.md (power10-mffgpr, power10-mftgpr): Rename to
> mtvsr/mfvsr.

Please spell out the new names in full, so that they can be searched
for.

Okay for trunk.  Thank you!


Segher
[PATCH] doc: use @code{} instead of @samp{@command{}} around 'date %s'
From: Sergei Trofimovich

Before the change, 'man gcc' rendered the "SOURCE_DATE_EPOCH" section as:

    ... the output of @command{date +%s} on GNU/Linux ...

After the change it renders as:

    ... the output of "date +%s" on GNU/Linux ...

gcc/ChangeLog:

	* doc/cppenv.texi: Use @code{} instead of @samp{@command{}}
	around 'date +%s'.
---
 gcc/doc/cppenv.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/cppenv.texi b/gcc/doc/cppenv.texi
index 123df5c464c..e1e408a2974 100644
--- a/gcc/doc/cppenv.texi
+++ b/gcc/doc/cppenv.texi
@@ -89,7 +89,7 @@ reproducible.
 The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp, defined
 as the number of seconds (excluding leap seconds) since 01 Jan 1970
 00:00:00 represented in ASCII; identical to the output of
-@samp{@command{date +%s}} on GNU/Linux and other systems that support the
+@code{date +%s} on GNU/Linux and other systems that support the
 @code{%s} extension in the @code{date} command.
 
 The value should be a known timestamp such as the last modification
-- 
2.28.0
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool wrote:
>
> On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
>>>> The parameters that are passed to the sys call will be destroyed;
>>>> therefore, the attack will likely fail.
>>>
>>> But you do not need more than one non-zero argument for execv*, and
>>> that is usually the same register as the normal return value
>>> register; all other registers *should* be zero for a simple
>>> execv*("/bin/sh", ...)!
>>>
>>> (There is also the system call number register, rax on x86-64, but if
>>> overwriting that would be at all effective, you could just do that
>>> one always and everywhere.  This is only an effective defence if
>>> there are no gadgets that do the system call an attacker wants, and
>>> he has to construct that sequence himself; but it is very effective
>>> and cheap then.)
>>
>> In the above, do you mean only clearing "rax" on x86-64 should be
>> effective enough?
>
> (rax=0 is "read"; you might want to use another value, but that's just
> details.)
>
> "This is only an effective defence if there are no gadgets that do the
> system call an attacker wants, and he has to construct that sequence
> himself; but it is very effective and cheap then."
>
> It is definitely *not* effective if there are gadgets that set rax to
> a value the attacker wants and then do a syscall.

You mean the following gadget:

Gadget 1:
	mov	rax, value
	syscall
	ret

Qing

> It of course is quite effective in breaking a ROP chain of
> (set rax) -> (syscall).  How effective it is in practice, I have no
> idea.
>
> My point was that your proposed scheme does not protect the other
> syscall parameters very much either.
>
> And, hrm, rax is actually the first return value.  On most ABIs the
> same registers are used for arguments and for return values; I got
> confused.  Sorry.  So this cannot be very effective for x86-64 no
> matter what.
>
>
> Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes: >> On Sep 11, 2020, at 4:44 PM, Richard Sandiford >> wrote: >> >> Qing Zhao writes: On Sep 11, 2020, at 12:32 PM, Richard Sandiford >> If we go for (2), then I think it would be better to do it at the start of pass_late_compilation instead. (Some targets wouldn't cope with doing it later.) The reason for doing it so late is that the set of used “volatile”/caller-saved registers is not fixed at prologue and epilogue generation: later optimisation passes can introduce uses of volatile registers that weren't used previously. (Sorry if this has already been suggested.) >>> >>> Yes, I agree. >>> >>> I thought that it might be better to move this task at the very late of the >>> RTL stage, i.e, before “final” phase. >>> >>> Another solution is (discussed with Hongjiu): >>> >>> 1. Define a new target hook: >>> >>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET >>> need_zeroed_hardregs, bool gpr_only) >>> >>> 2. Add the following routine in middle end: >>> >>> rtx_insn * >>> generate_return_rtx (bool simple_return_p) >>> { >>> if (targetm.return_with_zeroing) >>>{ >>> Compute the hardregs set for clearing into “need_zeroed_hardregs”; >>> return targetm.return_with_zeroing (simple_return_p, >>> need_zeroed_hardregs, gpr_only); >>> } >>> else >>>{ >>> if (simple_return_p) >>> return targetm.gen_simple_return ( ); >>>else >>> return targetm.gen_return (); >>> } >>> } >>> >>> Then replace all call to “targetm.gen_simple_return” and >>> “targetm.gen_return” to “generate_return_rtx()”. >>> >>> 3. In the target, >>> Implement “return_with_zeroing”. >>> >>> >>> Let me know your comments on this. >> >> I think having a separate pass is better. We don't normally know >> at the point of generating the return which registers will need >> to be cleared. 
> > At the point of generating the return, we can compute the > “need_zeroed_hardregs” HARD_REG_SET > by using data flow information, the function abi of the routine, and also the > user option and source code > attribute information together. These information should be available at each > point when generating the return. Like I mentioned earlier though, passes that run after pass_thread_prologue_and_epilogue can use call-clobbered registers that weren't previously used. For example, on x86_64, the function might not use %r8 when the prologue, epilogue and returns are generated, but pass_regrename might later introduce a new use of %r8. AIUI, the “used” version of the new command-line option is supposed to clear %r8 in these circumstances, but it wouldn't do so if the data was collected at the point that the return is generated. That's why I think it's more robust to do this later (at the beginning of pass_late_compilation) and insert the zeroing before returns that already exist. >> So IMO the pass should just search for all the >> returns in a function and insert the zeroing instructions before >> each one. > > I was considering this approach too for some time, however, there is one > major issue with this as > Segher mentioned, The middle end does not know some details on the registers, > lacking such > detailed information might result incorrect code generation at middle end. > > For example, on x86_64 target, when “return” with pop, the scratch register > “ECX” will be > used for returning, then it’s incorrect to zero “ecx” before generating the > return. Since middle end > doesn't have such information, it cannot avoid to zero “ecx”. Therefore > incorrect code might be > generated. > > Segher also mentioned that on Power, there are some scratch registers also > are used for > Other purpose, clearing them before return is not correct. 
But the dataflow information has to be correct between
pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise any pass
in that region could clobber the registers in the same way.

To get the registers that are live before the return, you can start
with the registers that are live out from the block that contains the
return, then “simulate” the return instruction backwards to get the set
of registers that are live before the return instruction (see
df_simulate_one_insn_backwards).

In the x86_64 case you mention, the pattern is:

(define_insn "*simple_return_indirect_internal"
  [(simple_return)
   (use (match_operand:W 0 "register_operand" "r"))]
  "reload_completed"
  …)

This (use …) tells the df machinery that the instruction needs
operand 0 (= ecx).  The process above would therefore realise that
ecx can't be clobbered.

Thanks,
Richard
[PATCH] libstdc++: Add C++2a synchronization support
From: Thomas Rodgers

This patch supersedes both the Add C++2a synchronization support patch
being replied to *and* the patch adding wait/notify_* to atomic_flag.

Add support for -
  * atomic_flag::wait/notify_one/notify_all
  * atomic::wait/notify_one/notify_all
  * counting_semaphore
  * binary_semaphore
  * latch

libstdc++-v3/ChangeLog:

        * include/Makefile.am (bits_headers): Add new header.
        * include/Makefile.in: Regenerate.
        * include/bits/atomic_base.h (__atomic_flag::wait): Define.
        (__atomic_flag::notify_one): Likewise.
        (__atomic_flag::notify_all): Likewise.
        (__atomic_base<_Itp>::wait): Likewise.
        (__atomic_base<_Itp>::notify_one): Likewise.
        (__atomic_base<_Itp>::notify_all): Likewise.
        (__atomic_base<_Ptp*>::wait): Likewise.
        (__atomic_base<_Ptp*>::notify_one): Likewise.
        (__atomic_base<_Ptp*>::notify_all): Likewise.
        (__atomic_impl::wait): Likewise.
        (__atomic_impl::notify_one): Likewise.
        (__atomic_impl::notify_all): Likewise.
        (__atomic_float<_Fp>::wait): Likewise.
        (__atomic_float<_Fp>::notify_one): Likewise.
        (__atomic_float<_Fp>::notify_all): Likewise.
        (__atomic_ref<_Tp>::wait): Likewise.
        (__atomic_ref<_Tp>::notify_one): Likewise.
        (__atomic_ref<_Tp>::notify_all): Likewise.
        (atomic_wait<_Tp>): Likewise.
        (atomic_wait_explicit<_Tp>): Likewise.
        (atomic_notify_one<_Tp>): Likewise.
        (atomic_notify_all<_Tp>): Likewise.
        * include/bits/atomic_wait.h: New file.
        * include/bits/atomic_timed_wait.h: New file.
        * include/bits/semaphore_base.h: New file.
        * include/std/atomic (atomic::wait): Define.
        (atomic::wait_one): Likewise.
        (atomic::wait_all): Likewise.
        (atomic<_Tp>::wait): Likewise.
        (atomic<_Tp>::wait_one): Likewise.
        (atomic<_Tp>::wait_all): Likewise.
        (atomic<_Tp*>::wait): Likewise.
        (atomic<_Tp*>::wait_one): Likewise.
        (atomic<_Tp*>::wait_all): Likewise.
        * include/std/latch: New file.
        * include/std/semaphore: New file.
        * include/std/version: Add __cpp_lib_semaphore and
        __cpp_lib_latch defines.
        * testsuite/29_atomics/atomic/wait_notify/atomic_refs.cc: New test.
        * testsuite/29_atomics/atomic/wait_notify/bool.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/integrals.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/floats.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/generic.h: New file.
        * testsuite/29_atomics/atomic_flag/wait_notify/1.cc: New test.
        * testsuite/30_threads/semaphore/1.cc: New test.
        * testsuite/30_threads/semaphore/2.cc: Likewise.
        * testsuite/30_threads/semaphore/least_max_value_neg.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_for.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_posix.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
        * testsuite/30_threads/latch/1.cc: New test.
        * testsuite/30_threads/latch/2.cc: New test.
        * testsuite/30_threads/latch/3.cc: New test.
---
 libstdc++-v3/include/Makefile.am              | 5 +
 libstdc++-v3/include/Makefile.in              | 5 +
 libstdc++-v3/include/bits/atomic_base.h       | 195 +++-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 281
 libstdc++-v3/include/bits/atomic_wait.h       | 301 ++
 libstdc++-v3/include/bits/semaphore_base.h    | 283
 libstdc++-v3/include/std/atomic               | 73 +
 libstdc++-v3/include/std/latch                | 90 ++
 libstdc++-v3/include/std/semaphore            | 92 ++
 libstdc++-v3/include/std/version              | 2 +
 .../atomic/wait_notify/atomic_refs.cc         | 103 ++
 .../29_atomics/atomic/wait_notify/bool.cc     | 59
 .../29_atomics/atomic/wait_notify/floats.cc   | 32 ++
 .../29_atomics/atomic/wait_notify/generic.cc  | 31 ++
 .../29_atomics/atomic/wait_notify/generic.h   | 160 ++
 .../atomic/wait_notify/integrals.cc           | 65
 .../29_atomics/atomic/wait_notify/pointers.cc | 59
 .../29_atomics/atomic_flag/wait_notify/1.cc   | 61
 libstdc++-v3/testsuite/30_threads/latch/1.cc  | 27 ++
 libstdc++-v3/testsuite/30_threads/latch/2.cc  | 27 ++
 libstdc++-v3/testsuite/30_threads/latch/3.cc  | 50 +++
 .../testsuite/30_threads/semaphore/1.cc       | 27 ++
 .../testsuite/30_threads/semaphore/2.cc       | 27 ++
 .../semaphore/least_max_value_neg.cc          | 30 ++
 .../30_threads/semaphore/try_acquire.cc       | 55
 .../30_threads/semaphore/try_acquire_for.cc   | 85 +
 .../30_threads/sema
[PATCH] tree-optimization/97013 - avoid duplicate 'vectorization is not profitable'
This avoids dumping 'vectorization is not profitable' one more time if none of the opportunities in a BB is profitable. tested on x86_64-unknown-linux-gnu, pushed. 2020-09-11 Richard Biener PR tree-optimization/97013 * tree-vect-slp.c (vect_slp_analyze_bb_1): Remove duplicate dumping. --- gcc/tree-vect-slp.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index aa6aa3328e8..35bde9bcb9d 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3626,16 +3626,10 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal, vect_bb_partition_graph (bb_vinfo); - /* Cost model: check if the vectorization is worthwhile. */ + /* Cost model: check if the vectorization opportunities are worthwhile. */ if (!unlimited_cost_model (NULL) && !vect_bb_vectorization_profitable_p (bb_vinfo)) -{ - if (dump_enabled_p ()) -dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"not vectorized: vectorization is not " -"profitable.\n"); - return false; -} +return false; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, -- 2.26.2
[PATCH] random vectorizer fixes
This fixes random things found when doing SLP discovery from arbitrary sets of stmts. Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed. Richard. 2020-09-10 Richard Biener * tree-vect-slp.c (vect_build_slp_tree_1): Check vector types for all lanes are compatible. (vect_analyze_slp_instance): Appropriately check for stores. (vect_schedule_slp): Likewise. --- gcc/tree-vect-slp.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 458a2c5bb30..aa6aa3328e8 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1007,6 +1007,16 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, continue; } } + + if (!types_compatible_p (vectype, *node_vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"Build SLP failed: different vector type " +"in %G", stmt); + /* Mismatch. */ + continue; + } } /* Grouped store or load. */ @@ -2382,7 +2392,7 @@ vect_analyze_slp_instance (vec_info *vinfo, unsigned HOST_WIDE_INT const_nunits; if (is_a (vinfo) && STMT_VINFO_GROUPED_ACCESS (stmt_info) - && DR_GROUP_FIRST_ELEMENT (stmt_info) + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)) && nunits.is_constant (&const_nunits)) { /* We consider breaking the group only on VF boundaries from the existing @@ -4893,14 +4903,13 @@ vect_schedule_slp (vec_info *vinfo) if (is_a (vinfo)) vect_remove_slp_scalar_calls (vinfo, root); + /* Remove vectorized stores original scalar stmts. */ for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info); j++) { - if (!STMT_VINFO_DATA_REF (store_info)) + if (!STMT_VINFO_DATA_REF (store_info) + || !DR_IS_WRITE (STMT_VINFO_DATA_REF (store_info))) break; - if (SLP_INSTANCE_ROOT_STMT (instance)) - continue; - store_info = vect_orig_stmt (store_info); /* Free the attached stmt_vec_info and remove the stmt. */ vinfo->remove_stmt (store_info); -- 2.26.2
Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly
Hi, Ok & Thanks, It’s fixed in patch v6. Jojo 在 2020年8月28日 +0800 AM4:40,Segher Boessenkool ,写道: > Hi! > > On Thu, Aug 27, 2020 at 08:47:19PM +0800, Jojo R wrote: > > +insn-emit-split-c = $(foreach o, $(shell for i in > > {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c) > > If you use a variable for the result of that "seq", this will be more > readable / maintainable / etc. > > (Should this use := instead of = ? What about the assignment to > insn-generated-split-num itself?) > > > + long long read_count = 0; > > We use "int" in many other places for similar counts. 2**31 should be > enough for anybody. > > > md_rtx_info info; > > while (read_md_rtx (&info)) > > + { > > + if (!(read_count++ % 1)) > > Wrong indent. "== 0" is more typical for testing if numbers are zero. > > > + { > > + printf ("/* Split file into separate compilation units for parallel > > compilation %lld */\n\n", read_count); > > Please split this (at least the source line, but probably the target > line is too long a well). > > > All that are details. This does look like it fixes the problems in the > previous versions. Thanks! > > > Segher
Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly
Hi, Ok & Thanks, It’s fixed in patch v6. Jojo 在 2020年8月28日 +0800 PM5:52,Richard Sandiford ,写道: > Thanks for doing this. In addition to what Segher said: > > Jojo R writes: > > gcc/ChangeLog: > > > > * genemit.c (main): Print 'split line'. > > * Makefile.in (insn-emit.c): Define split count and file > > > > --- > > gcc/Makefile.in | 15 + > > gcc/genemit.c | 87 - > > 2 files changed, 64 insertions(+), 38 deletions(-) > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index 79e854aa938..08e4aa7ef6f 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -1258,6 +1258,17 @@ ANALYZER_OBJS = \ > > # We put the *-match.o and insn-*.o files first so that a parallel make > > # will build them sooner, because they are large and otherwise tend to be > > # the last objects to finish building. > > + > > +# target overrides > > +-include $(tmake_file) > > + > > +INSN-GENERATED-SPLIT-NUM ?= 0 > > +insn-generated-split-num = $(shell expr $(INSN-GENERATED-SPLIT-NUM) + 1) > > + > > +insn-emit-split-c = $(foreach o, $(shell for i in > > {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c) > > The {a..b} construct isn't portable: this needs to be runnable with > a plain Bourne shell like /bin/dash. > > I think we should use the same wordlist technique as check_p_numbers[0-6]. > So I guess the first step would be to rename check_p_numbers[0-6] to > something more general and use it both here and in check_p_numbers. > > Thanks, > Richard
Re: [PATCH] vec: don't select partial vectors when looping on full vectors
Richard Biener writes:
> On Wed, 9 Sep 2020, Andrea Corallo wrote:
>> Hi all,
>>
>> this patch is meant to avoid generating predication in a loop when
>> the loop operates only on full vectors.
>>
>> Ex:
>>
>> #+BEGIN_SRC C
>> /* Vector length is 256.  */
>> void
>> f (int *restrict x, int *restrict y, unsigned int n) {
>>   for (unsigned int i = 0; i < n * 8; ++i)
>>     x[i] += y[i];
>> }
>> #+END_SRC
>>
>> Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk
>> gives:
>>
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w2, w2, 3
>>         cbz     w2, .L1
>>         mov     x3, 0
>>         whilelo p0.s, xzr, x2
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x3, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x3, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x3, lsl 2]
>>         add     x3, x3, 8
>>         whilelo p0.s, x3, x2
>>         b.any   .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>>
>> With the patch applied:
>>
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w3, w2, 3
>>         cbz     w3, .L1
>>         mov     x2, 0
>>         ptrue   p0.b, vl32
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x2, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x2, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x2, lsl 2]
>>         add     x2, x2, 8
>>         cmp     x2, x3
>>         bne     .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>>
>> To achieve this we check earlier if the loop needs peeling, and if
>> that is not the case we do not set
>> LOOP_VINFO_USING_PARTIAL_VECTORS_P to true.
>>
>> I moved some logic from 'determine_peel_for_niter' to
>> 'vect_need_peeling_or_part_vects_p' so it can be used for this
>> purpose.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Looks OK to me, the comment
>
> @@ -2267,7 +2278,10 @@ start_over:
>  {
>    if (param_vect_partial_vector_usage == 0)
> 	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -  else if (vect_verify_full_masking (loop_vinfo)
> +  else if ((vect_verify_full_masking (loop_vinfo)
> +	    && vect_need_peeling_or_part_vects_p (loop_vinfo))
> +	   /* Don't use partial vectors if we don't need to peel the
> +	      loop.
> 	      */
> 	   || vect_verify_loop_lens (loop_vinfo))
>
> seems to be oddly misplaced (I'd put it before the call).

Yeah, IMO it'd be better to put it in the first “if”.  Also, very
minor, but I think it'd be better not to shorten the name:

  vect_need_peeling_or_partial_vectors_p

Thanks,
Richard
[PATCH v6] genemit.c (main): split insn-emit.c for compiling parallelly
gcc/ChangeLog: * genemit.c (main): Print 'split line'. * Makefile.in (insn-emit.c): Define split count and file --- gcc/Makefile.in | 19 +++ gcc/genemit.c | 128 ++-- 2 files changed, 89 insertions(+), 58 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 79e854aa938..a7fcc7d5949 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \ # We put the *-match.o and insn-*.o files first so that a parallel make # will build them sooner, because they are large and otherwise tend to be # the last objects to finish building. + +# target overrides +-include $(tmake_file) + +INSN-GENERATED-SPLIT-NUM ?= 0 + +insn-generated-split-num = $(shell i=1; j=`expr $(INSN-GENERATED-SPLIT-NUM) + 1`; \ + while test $$i -le $$j; do \ + echo $$i; i=`expr $$i + 1`; \ + done) + +insn-emit-split-c := $(foreach o, $(shell for i in $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c) +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c)) +$(insn-emit-split-c): insn-emit.c + OBJS = \ gimple-match.o \ generic-match.o \ @@ -1265,6 +1280,7 @@ OBJS = \ insn-automata.o \ insn-dfatab.o \ insn-emit.o \ + $(insn-emit-split-obj) \ insn-extract.o \ insn-latencytab.o \ insn-modes.o \ @@ -2365,6 +2381,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: build/gen%$(build_exeext) $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \ $(filter insn-conditions.md,$^) > tmp-$*.c $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\ + [ ! 
"$$$*v" ] || grep "match not found" <<< $$$*v + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch insn-$*$$i.c; done && echo "" > insn-$*.c) $(STAMP) s-$* # gencheck doesn't read the machine description, and the file produced diff --git a/gcc/genemit.c b/gcc/genemit.c index 84d07d388ee..4fc8e61c5c8 100644 --- a/gcc/genemit.c +++ b/gcc/genemit.c @@ -847,6 +847,46 @@ handle_overloaded_gen (overloaded_name *oname) } } +#define printf_include() do { \ + printf ("/* Generated automatically by the program `genemit'\n\ +from the machine description file `md'. */\n\n"); \ + printf ("#define IN_TARGET_CODE 1\n");\ + printf ("#include \"config.h\"\n"); \ + printf ("#include \"system.h\"\n"); \ + printf ("#include \"coretypes.h\"\n");\ + printf ("#include \"backend.h\"\n"); \ + printf ("#include \"predict.h\"\n"); \ + printf ("#include \"tree.h\"\n"); \ + printf ("#include \"rtl.h\"\n"); \ + printf ("#include \"alias.h\"\n");\ + printf ("#include \"varasm.h\"\n"); \ + printf ("#include \"stor-layout.h\"\n"); \ + printf ("#include \"calls.h\"\n");\ + printf ("#include \"memmodel.h\"\n"); \ + printf ("#include \"tm_p.h\"\n"); \ + printf ("#include \"flags.h\"\n");\ + printf ("#include \"insn-config.h\"\n"); \ + printf ("#include \"expmed.h\"\n"); \ + printf ("#include \"dojump.h\"\n"); \ + printf ("#include \"explow.h\"\n"); \ + printf ("#include \"emit-rtl.h\"\n"); \ + printf ("#include \"stmt.h\"\n"); \ + printf ("#include \"expr.h\"\n"); \ + printf ("#include \"insn-codes.h\"\n"); \ + printf ("#include \"optabs.h\"\n"); \ + printf ("#include \"dfp.h\"\n"); \ + printf ("#include \"output.h\"\n"); \ + printf ("#include \"recog.h\"\n");\ + printf ("#include \"df.h\"\n"); \ + printf ("#include \"resource.h\"\n"); \ + printf ("#include \"reload.h\"\n"); \ + printf ("#include \"diagnostic-core.h\"\n"); \ + printf ("#include \"regs.h\"\n"); \ + printf ("#include \"tm-constrs.h\"\n"); \ + printf ("#include \"ggc.h\"\n"); \ + printf ("#include \"target.h\"\n\n"); \ 
+} while (0) + int main (int argc, const char **argv) { @@ -862,73 +902,45 @@ main (int argc, const char **argv) /* Assign sequential codes to all entries in the machine description in parallel with the tables in insn-output.c. */ - printf ("/* Generated automatically by the program `genemit'\n\ -from the machine description file `md'. */\n\n"); - - printf ("#define IN_TARGET_CODE 1\n"); - printf ("#include \"config.h\"\n"); - printf ("#include \"system.h\"\n"); - printf ("#
Re: [PATCH] fortran, openmp: PR fortran/93660 Fix ICE when coarrays used with 'omp declare simd'
On Thu, Sep 10, 2020 at 11:51:51PM +0100, Kwok Cheung Yeung wrote: > simd_clone_vector_of_formal_parm_types prefers to use the TYPE_ARG_TYPES of > a function decl if available, but they do not contain entries for the hidden > caf_* parameters. Replacements are therefore not generated for them, and the > ICE occurs in ipa_simd_modify_function_body due to an assert firing when a > replacement is not found for these hidden arguments. So, isn't it a bug that TYPE_ARG_TYPES is inconsistent with DECL_ARGUMENTS on the hidden arguments? That looks like lying to the middle-end to me, one doesn't have always a declaration with DECL_ARGUMENTS around, so sometimes TYPE_ARG_TYPES is the only way to get at the argument types. We had e.g. PR92305 in the past... Jakub
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
Hello, Segher,

On Jul 9, 2020, Segher Boessenkool wrote:

>> The problem it addresses is that the current checking only tests for
>> existence, not for an incompatible/compatible setting.
>> Currently both are reported as incompatible.

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549760.html

> Put that
>   ((rs6000_isa_flags & OPTION_MASK_MINIMAL_TOC) != 0)
> check inside the block please, together with the CMODEL_SMALL check?

We're trying to make sense of this suggestion of yours, and I confess
I'm quite confused as to the intended effects of this piece of code,
and of similar ones in the aix, freebsd64 and rtems rs6000 headers.

Clearly, we wish to issue an error message if -mminimal-toc is combined
with an -mcmodel other than small.  But should we silently force the
code model to small if -mno-minimal-toc is specified?  (I believe that
would be a consequence of the change you propose.)

Upon first reading your suggestion, I thought it was a thinko, and that
you meant to move the rs6000_isa_flags_explicit test into the block,
for that would:

- set cmodel to small whenever -mminimal-toc is enabled, explicitly or
  by default

- error iff -mminimal-toc and a non-small -mcmodel are explicitly given

However, this could change behavior for the various headers that have
an else block for the if that currently tests rs6000_isa_flags_explicit,
which might be undesirable, and that's where things get confusing.
It's not at all clear to me what's expected if e.g.:

- -mminimal-toc is enabled by default (is it ever?) and a non-small
  code model is selected (explicitly or by default): error, silent
  change to small cmodel, silent disabling of minimal-toc, or something
  else?

- -mno-minimal-toc is explicitly given along with some non-small code
  model: any reason to error out, force small cmodel, force-enable
  minimal-toc, or something else?

Could you please shed any light as to the intent, so that we can sort
out the logic that will implement it?
The logic in this code seems to be unchanged since it was first introduced by Alan Modra along with -mcmodel back in 2010, despite the many spelling changes as internal representation of flags and state and whatnot changed. I'm copying Alan as well, just in case he can recall or figure out what the intent was. AFAICT it first made linux64.h, and other headers got copies or variants thereof as the -mcmodel option got adopted by other systems. Thanks in advance, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: [PATCH] arm: Fix up arm_override_options_after_change [PR96939]
On Tue, 8 Sep 2020 at 10:45, Jakub Jelinek via Gcc-patches wrote: > > Hi! > > As mentioned in the PR, the testcase fails to link, because when set_cfun is > being called on the crc function, arm_override_options_after_change is > called from set_cfun -> invoke_set_current_function_hook: > /* Change optimization options if needed. */ > if (optimization_current_node != opts) > { > optimization_current_node = opts; > cl_optimization_restore (&global_options, TREE_OPTIMIZATION (opts)); > } > and at that point target_option_default_node actually matches even the > current state of options, so this means armv7 (or whatever) arch is set as > arm_active_target, then > targetm.set_current_function (fndecl); > is called later in that function, which because the crc function's > DECL_FUNCTION_SPECIFIC_TARGET is different from the current one will do: > cl_target_option_restore (&global_options, TREE_TARGET_OPTION (new_tree)); > which calls arm_option_restore and sets arm_active_target to armv8-a+crc > (so far so good). > Later arm_set_current_function calls: > save_restore_target_globals (new_tree); > which in this case calls: > /* Call target_reinit and save the state for TARGET_GLOBALS. */ > TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts (); > which because optimization_current_node != optimization_default_node > (the testcase is LTO, so all functions have their > DECL_FUNCTION_SPECIFIC_TARGET and TREE_OPTIMIZATION nodes) will call: > cl_optimization_restore > (&global_options, > TREE_OPTIMIZATION (optimization_default_node)); > and > cl_optimization_restore (&global_options, >TREE_OPTIMIZATION (opts)); > The problem is that these call arm_override_options_after_change again, > and that one uses the target_option_default_node as what to set the > arm_active_target to (i.e. back to armv7 or whatever, but not to the > armv8-a+crc that should be the active target for the crc function). > That means we then error on the builtin call in that function. 
> > Now, the targetm.override_options_after_change hook is called always at the > end of cl_optimization_restore, i.e. when we change the Optimization marked > generic options. So it seems unnecessary to call arm_configure_build_target > at that point (nothing it depends on changed), and additionally incorrect > (because it uses the target_option_default_node, rather than the current > set of options; we'd need to revert > https://gcc.gnu.org/legacy-ml/gcc-patches/2016-12/msg01390.html > otherwise so that it works again with global_options otherwise). > The options that arm_configure_build_target cares about will change only > during option parsing (which is where it is called already), or during > arm_set_current_function, where it is done during the > cl_target_option_restore. > Now, arm_override_options_after_change_1 wants to adjust the > str_align_functions, which depends on the current Optimization options (e.g. > optimize_size and flag_align_options and str_align_functions) as well as > the target options target_flags, so IMHO needs to be called both > when the Optimization options (possibly) change, i.e. from > the targetm.override_options_after_change hook, and from when the target > options change (set_current_function hook). > > Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk? > > Looking further at arm_override_options_after_change_1, it also seems to be > incorrect, rather than testing > !opts->x_str_align_functions > it should be really testing > !opts_set->x_str_align_functions > and get &global_options_set or similar passed to it as additional opts_set > argument. That is because otherwise the decision will be sticky, while it > should be done whenever use provided -falign-functions but didn't provide > -falign-functions= (either on the command line, or through optimize > attribute or pragma). 
>
> 2020-09-08  Jakub Jelinek
>
> 	PR target/96939
> 	* config/arm/arm.c (arm_override_options_after_change): Don't call
> 	arm_configure_build_target here.
> 	(arm_set_current_function): Call arm_override_options_after_change_1
> 	at the end.
>
> 	* gcc.target/arm/lto/pr96939_0.c: New test.
> 	* gcc.target/arm/lto/pr96939_1.c: New file.
>

Hi Jakub,

I'm seeing an ICE with this new test on most of my arm configurations,
for instance:
--target arm-none-linux-gnueabi --with-cpu cortex-a9

/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/ c_lto_pr96939_0.o c_lto_pr96939_1.o -fdiagnostics-plain-output -flto -O2 -o gcc-target-arm-lto-pr96939-01.exe

In function 'crc':
lto1: internal compiler error: Segmentation fault
0xba720f crash_signal
        /gcc/toplev.c:327
0x172beb9 strchr
        /usr/include/string.h:227
0x172beb9 arm_parse_cpu_option_name(cpu_optio
[PATCH] Fortran : Two further previously missed ICEs PR53298
For review.

Fixes the two ICEs reported in the PR that remained after the previous
fix.

There is a side effect that is manifested in the tree dumps.  Instead of

  __builtin_free (ptr2->dat.data);

we get

  __builtin_free ((void *) ptr2->dat.data);

I do not know the cause of this, but from what I can tell the newly
inserted cast is harmless.  All the examples I've seen so far have the
cast except where the parameter is declared as void *.  In the tree
dumps ptr2 is declared as struct testtype2 *; I do not know where the
type is declared, so I don't know whether data is declared void *
(I expect it is).  Is it worth the effort to determine how to remove
the extra (void *)?

[PATCH] Fortran : Two further previously missed ICEs PR53298

There were 3 ICEs with different call stacks in the comments of this
PR.  A previous commit fixed only one of those ICEs.  The ICEs fixed
here are in trans-array.c and trans-expr.c.

The first ICE occurred when the array reference is not AR_ELEMENT:
gfc_conv_scalarized_array_ref is called with se and ar, and if se->ss
is NULL the ICE occurs.  If se->ss is NULL there is nothing to do
before the return.

The second ICE occurs in code that did not match its comments.  Fixing
the code to match the comments fixes the ICE.  A side effect is that
the tree dumps for finalize_35.f90 and finalize_36.f90 contain
"__builtin_free ((void *) ptr2->dat.data);"; the "(void *)" was
previously omitted.  The cast is harmless.

2020-09-11  Mark Eggleston

gcc/fortran/

	PR fortran/53298
	* trans-array.c (gfc_conv_array_ref): In the body of the if
	statement only execute the code before the return if se->ss is
	set.
	* trans-expr.c (gfc_conv_component_ref): Change the if expression
	to match the comments.

2020-09-04  Mark Eggleston

gcc/testsuite/

	PR fortran/53298
	* gfortran.dg/finalize_35.f90: Handle extra (void *).
	* gfortran.dg/finalize_36.f90: Handle extra (void *).
	* gfortran.dg/pr53298_2.f90: New test.
	* gfortran.dg/pr53298_3.f90: New test.

--
https://www.codethink.co.uk/privacy.html
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On Fri, Sep 11, 2020 at 04:43:50AM -0300, Alexandre Oliva wrote: > Could you please shed any light as to the intent, so that we can sort > out the logic that will implement it? The history goes back to 2003 commit 9739c90c8d where a ppc64-linux host built most of gcc with -mminimal-toc due to toc/got size constraints of the small model, but build libgcc with -mno-minimal-toc. When the medium/large model was implemented, I thought it good to continue compiling libgcc in small model. We'd like to keep doing that too. That's why -mno-minimal-toc, an option valid for both ppc32 and ppc64, chooses small model on ppc64 when the compiler default is medium model. If you change that then make sure that libgcc continues to be small model. And any other code that might have copied the libgcc trick.. I also thought it reasonable to error on an explicit -mcmodel=medium or -mcmodel=large with either of -mminimal-toc or -mno-minimal-toc, since the toc options really are specific to small model code. Why change that? -- Alan Modra Australia Development Lab, IBM
[PATCH] improve BB vectorization dump locations
This tries to improve BB vectorization dumps by providing more precise locations. Currently the vect_location is simply the very last stmt in a basic-block that has a location. So for double a[4], b[4]; int x[4], y[4]; void foo() { a[0] = b[0]; // line 5 a[1] = b[1]; a[2] = b[2]; a[3] = b[3]; x[0] = y[0]; // line 9 x[1] = y[1]; x[2] = y[2]; x[3] = y[3]; } // line 13 we show the user with -O3 -fopt-info-vec t.c:13:1: optimized: basic block part vectorized using 16 byte vectors while with the patch we point to both independently vectorized opportunities: t.c:5:8: optimized: basic block part vectorized using 16 byte vectors t.c:9:8: optimized: basic block part vectorized using 16 byte vectors there's the possibility that the location regresses in case the root stmt in the SLP instance has no location. For a SLP subgraph with multiple entries the location also chooses one entry at random, not sure in which case we want to dump both. Still as the plan is to extend the basic-block vectorization scope from single basic-block to multiple ones this is a first step to preserve something sensible. Implementation-wise this makes both costing and code-generation happen on the subgraphs as analyzed. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard - is iteration over vector modes for BB vectorization still important now that we have related_vector_type and thus no longer only consider a fixed size? If so it will probably make sense to somehow still iterate even if there was some SLP subgraph vectorized? It also looks like BB vectorization was never updated to consider multiple modes based on cost, it will still pick the first opportunity. For BB vectorization we also have the code that re-tries SLP discovery with splitting the store group. So what's your overall thoughts to this? Thanks, Richard. 2020-09-11 Richard Biener * tree-vectorizer.h (_slp_instance::location): New method. (vect_schedule_slp): Adjust prototype. 
* tree-vectorizer.c (vec_info::remove_stmt): Adjust the BB region begin if we removed the stmt it points to. * tree-vect-loop.c (vect_transform_loop): Adjust. * tree-vect-slp.c (_slp_instance::location): Implement. (vect_analyze_slp_instance): For BB vectorization set vect_location to that of the instance. (vect_slp_analyze_operations): Likewise. (vect_bb_vectorization_profitable_p): Remove wrapper. (vect_slp_analyze_bb_1): Remove cost check here. (vect_slp_region): Cost check and code generate subgraphs separately, report optimized locations and missed optimizations due to profitability for each of them. (vect_schedule_slp): Get the vector of SLP graph entries to vectorize as argument. --- gcc/tree-vect-loop.c | 2 +- gcc/tree-vect-slp.c | 138 +++--- gcc/tree-vectorizer.c | 8 ++- gcc/tree-vectorizer.h | 4 +- 4 files changed, 73 insertions(+), 79 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 80e78f7adf4..c95ec5ad267 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -9018,7 +9018,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) if (!loop_vinfo->slp_instances.is_empty ()) { DUMP_VECT_SCOPE ("scheduling SLP instances"); - vect_schedule_slp (loop_vinfo); + vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo)); } /* FORNOW: the vectorizer supports only loops which body consist diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 35bde9bcb9d..519cd6a7254 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -117,6 +117,18 @@ vect_free_slp_tree (slp_tree node, bool final_p) delete node; } +/* Return a location suitable for dumpings related to the SLP instance. */ + +dump_user_location_t +_slp_instance::location () const +{ + if (root_stmt) +return root_stmt->stmt; + else +return SLP_TREE_SCALAR_STMTS (root)[0]->stmt; +} + + /* Free the memory allocated for the SLP instance. 
   FINAL_P is true if we have vectorized the instance or if we have
   made a final decision not to vectorize the statements in any way.  */
@@ -2121,6 +2133,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
   vec<stmt_vec_info> scalar_stmts;
   bool constructor = false;

+  if (is_a <bb_vec_info> (vinfo))
+    vect_location = stmt_info->stmt;
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       scalar_type = TREE_TYPE (DR_REF (dr));
@@ -3120,6 +3134,8 @@ vect_slp_analyze_operations (vec_info *vinfo)
       hash_set<slp_tree> lvisited;
       stmt_vector_for_cost cost_vec;
       cost_vec.create (2);
+      if (is_a <bb_vec_info> (vinfo))
+	vect_location = instance->location ();
       if (!vect_slp_analyze_node_operations (vinfo,
					     SLP_INSTANCE_TREE (instance),
					     instance, visited, lvisited,
@@ -3
[Ada] Add missing stride entry in debug info
This adds a missing stride entry for bit-packed arrays of record types.

Tested on x86_64-suse-linux, applied on the mainline.

2020-09-11  Eric Botcazou

	* gcc-interface/misc.c (get_array_bit_stride): Return TYPE_ADA_SIZE
	for record and union types.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index e6a563e3666..781868e2ad3 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -1003,6 +1003,10 @@ get_array_bit_stride (tree comp_type)
   if (INTEGRAL_TYPE_P (comp_type))
     return TYPE_RM_SIZE (comp_type);

+  /* Likewise for record or union types.  */
+  if (RECORD_OR_UNION_TYPE_P (comp_type) && !TYPE_FAT_POINTER_P (comp_type))
+    return TYPE_ADA_SIZE (comp_type);
+
   /* The gnat_get_array_descr_info debug hook expects a debug type.  */
   comp_type = maybe_debug_type (comp_type);
[Ada] Drop GNAT encodings for fixed-point types
GDB can now deal with the DWARF representation just fine.

Tested on x86_64-suse-linux, applied on the mainline.

2020-09-11  Eric Botcazou

	* gcc-interface/misc.c (gnat_get_fixed_point_type): Bail out only
	when the GNAT encodings are specifically used.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 183daf33fb6..e6a563e3666 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -618,10 +618,9 @@ gnat_get_fixed_point_type_info (const_tree type,
 {
   tree scale_factor;

-  /* GDB cannot handle fixed-point types yet, so rely on GNAT encodings
-     instead for it.  */
+  /* Do nothing if the GNAT encodings are used.  */
   if (!TYPE_IS_FIXED_POINT_P (type)
-      || gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL)
+      || gnat_encodings == DWARF_GNAT_ENCODINGS_ALL)
     return false;

   scale_factor = TYPE_SCALE_FACTOR (type);
[Ada] Fix ICE on nested packed variant record type
This is a regression present on the mainline and 10 branch: the compiler
aborts on code accessing a component of a packed record type whose type is
a packed discriminated record type with variant part.

Tested on x86_64-suse-linux, applied on the mainline and 10 branch.

2020-09-11  Eric Botcazou

	* gcc-interface/utils.c (type_has_variable_size): New function.
	(create_field_decl): In the packed case, also force byte alignment
	when the type of the field has variable size.

2020-09-11  Eric Botcazou

	* gnat.dg/pack27.adb: New test.
	* gnat.dg/pack27_pkg.ads: New helper.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index a96fde668be..3065fcb6260 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -2905,6 +2905,31 @@ aggregate_type_contains_array_p (tree type, bool self_referential)
     }
 }

+/* Return true if TYPE is a type with variable size or a padding type with a
+   field of variable size or a record that has a field with such a type.  */
+
+static bool
+type_has_variable_size (tree type)
+{
+  tree field;
+
+  if (!TREE_CONSTANT (TYPE_SIZE (type)))
+    return true;
+
+  if (TYPE_IS_PADDING_P (type)
+      && !TREE_CONSTANT (DECL_SIZE (TYPE_FIELDS (type))))
+    return true;
+
+  if (!RECORD_OR_UNION_TYPE_P (type))
+    return false;
+
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    if (type_has_variable_size (TREE_TYPE (field)))
+      return true;
+
+  return false;
+}
+
 /* Return a FIELD_DECL node.  NAME is the field's name, TYPE is its type
    and RECORD_TYPE is the type of the enclosing record.  If SIZE is nonzero,
    it is the specified size of the field.  If POS is nonzero, it is the bit
@@ -2974,13 +2999,15 @@ create_field_decl (tree name, tree type, tree record_type, tree size, tree pos,
   DECL_PACKED (field_decl) = pos ? DECL_BIT_FIELD (field_decl) : packed;

-  /* If FIELD_TYPE is BLKmode, we must ensure this is aligned to at least a
-     byte boundary since GCC cannot handle less-aligned BLKmode bitfields.
+  /* If FIELD_TYPE has BLKmode, we must ensure this is aligned to at least
+     a byte boundary since GCC cannot handle less aligned BLKmode bitfields.
+     Likewise if it has a variable size and no specified position because
+     variable-sized objects need to be aligned to at least a byte boundary.
      Likewise for an aggregate without specified position that contains an
-     array, because in this case slices of variable length of this array
-     must be handled by GCC and variable-sized objects need to be aligned
-     to at least a byte boundary.  */
+     array because, in this case, slices of variable length of this array
+     must be handled by GCC and have variable size.  */
   if (packed && (TYPE_MODE (type) == BLKmode
+		 || (!pos && type_has_variable_size (type))
		 || (!pos
		     && AGGREGATE_TYPE_P (type)
		     && aggregate_type_contains_array_p (type, false

pragma No_Component_Reordering;

package Pack27_Pkg is

  type Enum is (One, Two, Three);

  type Rec1 (D : Enum := One) is record
    case D is
      when One => null;
      when Two => null;
      when Three => C : Character;
    end case;
  end record;
  pragma Pack (Rec1);

  type Rec2 is record
    R : Rec1;
  end record;
  pragma Pack (Rec2);

  type Rec3 is record
    B : Boolean;
    R : Rec2;
  end record;
  pragma Pack (Rec3);

  type Rec4 is record
    B : Boolean;
    R : Rec3;
  end record;
  pragma Pack (Rec4);

end Pack27_Pkg;

-- { dg-do compile }

with Pack27_Pkg; use Pack27_Pkg;

procedure Pack27 is
  R1 : Rec1;
  R4 : Rec4;
begin
  R4.R.R.R := R1;
end;
[Ada] Fix crash on array component with nonstandard index type
This is a regression present on the mainline, 10 and 9 branches: the
compiler goes into an infinite recursion, eventually exhausting the stack,
for the declaration of a discriminated record type with an array component
having a discriminant as bound and an index type that is an enumeration
type with a non-standard representation clause.

Tested on x86_64-suse-linux, applied on the mainline, 10 and 9 branches.

2020-09-11  Eric Botcazou

	* gcc-interface/decl.c (gnat_to_gnu_entity): Only create
	extra subtypes for discriminants if the RM size of the base type
	of the index type is lower than that of the index type.

2020-09-11  Eric Botcazou

	* gnat.dg/specs/discr7.ads: New test.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 8045fa5ff97..2b7392c62c0 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -2480,8 +2480,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
	    tree gnu_base_orig_max = TYPE_MAX_VALUE (gnu_base_index_type);
	    tree gnu_min, gnu_max, gnu_high;

-	    /* We try to define subtypes for discriminants used as bounds
-	       that are more restrictive than those declared by using the
+	    /* We try to create subtypes for discriminants used as bounds
+	       that are more restrictive than those declared, by using the
	       bounds of the index type of the base array type.  This will
	       make it possible to calculate the maximum size of the record
	       type more conservatively.  This may have already been done by
@@ -2489,8 +2489,8 @@
	       there will be a conversion that needs to be removed first.  */
	    if (CONTAINS_PLACEHOLDER_P (gnu_orig_min)
		&& TYPE_RM_SIZE (gnu_base_index_type)
-		&& !tree_int_cst_lt (TYPE_RM_SIZE (gnu_index_type),
-				     TYPE_RM_SIZE (gnu_base_index_type)))
+		&& tree_int_cst_lt (TYPE_RM_SIZE (gnu_base_index_type),
+				    TYPE_RM_SIZE (gnu_index_type)))
	      {
		gnu_orig_min = remove_conversions (gnu_orig_min, false);
		TREE_TYPE (gnu_orig_min)
@@ -2501,8 +2501,8 @@
	    if (CONTAINS_PLACEHOLDER_P (gnu_orig_max)
		&& TYPE_RM_SIZE (gnu_base_index_type)
-		&& !tree_int_cst_lt (TYPE_RM_SIZE (gnu_index_type),
-				     TYPE_RM_SIZE (gnu_base_index_type)))
+		&& tree_int_cst_lt (TYPE_RM_SIZE (gnu_base_index_type),
+				    TYPE_RM_SIZE (gnu_index_type)))
	      {
		gnu_orig_max = remove_conversions (gnu_orig_max, false);
		TREE_TYPE (gnu_orig_max)

-- { dg-do compile }

package Discr7 is

  type Enum is (One, Two, Three);
  for Enum use (One => 1, Two => 2, Three => 3);

  type Arr is array (Integer range <>, Enum range <>) of Boolean;

  type Rec (D : Integer) is record
    A : Arr (1 .. D, Enum'Range);
  end record;

end Discr7;
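The fix above hinges on reversing a size comparison: the old guard
!tree_int_cst_lt (index, base) also fired when the two RM sizes were
equal, which is what triggered the infinite recursion, while the new
guard tree_int_cst_lt (base, index) does not.  A toy model with plain
ints (names invented for illustration, not actual GNAT code) makes the
difference visible:

```cpp
#include <cassert>

// Toy stand-ins for TYPE_RM_SIZE values; tree_int_cst_lt is modelled as
// plain integer "<".  These names are illustrative only.
static bool old_guard (int rm_index, int rm_base)
{
  // Before the patch: proceed unless the subtype's RM size is strictly
  // smaller than the base type's, i.e. also when they are equal.
  return !(rm_index < rm_base);
}

static bool new_guard (int rm_index, int rm_base)
{
  // After the patch: proceed only when the base type's RM size is
  // strictly smaller than the subtype's.
  return rm_base < rm_index;
}
```

The two guards agree whenever the sizes differ; only the equal-size case
(e.g. an enumeration index type whose representation clause leaves the RM
sizes identical) changes from "create the extra subtype" to "don't",
breaking the recursion.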
Re: [PATCH] arm: Fix up arm_override_options_after_change [PR96939]
On Fri, Sep 11, 2020 at 09:46:37AM +0200, Christophe Lyon via Gcc-patches wrote:
> I'm seeing an ICE with this new test on most of my arm configurations,
> for instance:
> --target arm-none-linux-gnueabi --with-cpu cortex-a9
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
> c_lto_pr96939_0.o c_lto_pr96939_1.o -fdiagnostics-plain-output -flto -O2
> -o gcc-target-arm-lto-pr96939-01.exe

Seems a latent issue.  Neither cl_optimization_{save,restore} nor
cl_target_option_{save,restore} (nor any of the target hooks they call)
saves or restores any opts_set values, so I think opts_set can be trusted
only during option processing (if at all), but not later.

So, short term, a fix would IMHO be to just stop using opts_set altogether
in arm_configure_build_target; it doesn't make much sense to me, it should
test whether those strings are non-NULL instead, or at least do that when
it is invoked from arm_option_restore (e.g. could be done by calling it
with opts instead of &global_options_set).

Longer term, the question is whether cl_optimization_{save,restore} and
cl_target_option_{save,restore} shouldn't be changed to not only
save/restore the options, but also save the opts_set flags.  It could be
done e.g. by adding a bool array or set of bool members to struct
cl_optimization and struct cl_target_option, or, more compactly, by using
bitmasks, packing each 64 adjacent option flags into a UHWI element of an
array.
> In function 'crc': > lto1: internal compiler error: Segmentation fault > 0xba720f crash_signal > /gcc/toplev.c:327 > 0x172beb9 strchr > /usr/include/string.h:227 > 0x172beb9 arm_parse_cpu_option_name(cpu_option const*, char const*, > char const*, bool) > /gcc/common/config/arm/arm-common.c:349 > 0xfa9372 arm_configure_build_target(arm_build_target*, > cl_target_option*, gcc_options*, bool) > /gcc/config/arm/arm.c:3209 > 0xfbf8e1 arm_set_current_function > /gcc/config/arm/arm.c:32334 > 0x86b68b invoke_set_current_function_hook > /gcc/function.c:4670 > 0x870d77 invoke_set_current_function_hook > /gcc/function.c:4836 > 0x870d77 allocate_struct_function(tree_node*, bool) > /gcc/function.c:4793 > 0xa25943 input_function > /gcc/lto-streamer-in.c:1385 > 0xa25943 lto_read_body_or_constructor > /gcc/lto-streamer-in.c:1624 > 0x6f37ff cgraph_node::get_untransformed_body() > /gcc/cgraph.c:3932 > 0x703f42 cgraph_node::expand() > /gcc/cgraphunit.c:2274 > 0x70567c expand_all_functions > /gcc/cgraphunit.c:2476 > 0x70567c symbol_table::compile() > /gcc/cgraphunit.c:2839 > 0x63970b lto_main() > /gcc/lto/lto.c:653 > > This is with a cross-compiler. Jakub
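The bitmask variant of the idea sketched above — packing 64 "was this
option explicitly set" flags into each element of an array — could look
roughly like this.  The flag count and helper names are invented for
illustration; real code would use the generated options tables and
HOST_WIDE_INT:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical number of option flags; the real value would come from
// the generated options machinery.
const unsigned num_flags = 200;

// One bit per flag, 64 flags per element; zero-initialized.
static uint64_t flags_explicitly_set[(num_flags + 63) / 64];

// Record that option OPT was set explicitly.
static void set_explicit (unsigned opt)
{
  flags_explicitly_set[opt / 64] |= uint64_t (1) << (opt % 64);
}

// Was option OPT set explicitly?
static bool explicit_p (unsigned opt)
{
  return (flags_explicitly_set[opt / 64] >> (opt % 64)) & 1;
}
```

Saving and restoring the opts_set state then amounts to copying this
small array alongside the option values themselves.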
Re: [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock
I'm finally getting round to merging this series!

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock,
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called, which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.
There's no guarantee that __clock_t is running at the same speed as the
caller's clock, so if the underlying wait times out we need to check the
timeout against the caller's clock again before potentially looping.

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
  std::chrono::steady_clock but isn't std::chrono::steady_clock.  This
  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
  that takes an arbitrary clock to be called.

* invent test04, which uses a deliberately slow-running clock in order to
  exercise the looping behaviour of
  __atomic_futex_unsigned::_M_load_when_equal_until described above.

	* libstdc++-v3/include/bits/atomic_futex.h:
	(__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
	generic _Clock to check the timeout against _Clock again after
	_M_load_when_equal_until returns indicating a timeout.

	* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
	slow_clock that runs at an eleventh of steady_clock's speed.  Use
	it to test the user-supplied-clock variant of
	__atomic_futex_unsigned::_M_load_when_equal_until works generally
	with test03 and loops correctly when the timeout time hasn't been
	reached in test04.
---
 libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +-
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 4375129..5f95ade 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _M_load_when_equal_until(unsigned __val, memory_order __mo,
	const chrono::time_point<_Clock, _Duration>& __atime)
     {
-	const typename _Clock::time_point __c_entry = _Clock::now();
-	const __clock_t::time_point __s_entry = __clock_t::now();
-	const auto __delta = __atime - __c_entry;
-	const auto __s_atime = __s_entry + __delta;
-	return _M_load_when_equal_until(__val, __mo, __s_atime);
+	typename _Clock::time_point __c_entry = _Clock::now();
+	do {
+	  const __clock_t::time_point __s_entry = __clock_t::now();
+	  const auto __delta = __atime - __c_entry;
+	  const auto __s_atime = __s_entry + __delta;
+	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
+	    return true;

I wonder if we can avoid looping if _Clock::is_steady is true, i.e.

  if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
    return false;

If _Clock is steady then it can't be adjusted to move backwards.  I am not
100% sure, but I don't think a steady clock can run "slow" either.  But
I'm checking with LWG about that.  We should keep the loop for now.

The reason for wanting to return early is that _Clock::now() can be quite
expensive, so avoiding the second call if it's not needed would be
significant.

Related to this, I've been experimenting with a change to wait_for that
checks __rtime <= 0 and, if so, uses __clock_t::time_point::min() for the
wait_until call, so that we don't bother to call __clock_t::now().  For a
non-positive duration we don't need to determine the precise time_point in
the past, because it's in the past anyway.
Using time_point::min() is as good as __clock_t::now() - rtime (as long as
we don't overflow anything when converting it to a struct timespec, of
course!)

Using future.wait_for(0s) to poll for a future becoming ready takes a long
time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
Using wait_until(C::time_point::min()) is faster, because it doesn't call
C::now().

This probably isn't important for most timed waiting functions in the
library, because I don't see any reason to use wait_for(0s) to poll a
mutex, condition_variable or atomic.  But it's reasonable to do for
futures.  I already added (__rtime <= __rtime.zero()) to
this_thread::sleep_for in d1a74a287ee1a84b90e5675904dac7f945cffca1.

The extra branch to check rtime <= rtime.zero() or atime < C::now() is
probably insignificant compared to the cost of unnecessary calls to now()
on one or more clocks.

+	  __c_entry = _Clock::now();
+	} while (__c_entry < __atime);
+	return false;
       }
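The need for the retry loop can be demonstrated with a toy model of a
caller clock running slower than the reference clock — no real clocks or
futexes here, just integer counters, and all names are invented for
illustration:

```cpp
#include <cassert>

// The reference clock ("ref") runs three times as fast as the caller's
// slow clock, so a single converted wait undershoots the caller's
// deadline and must be retried, like the loop added to
// _M_load_when_equal_until.
struct toy_wait
{
  long ref_now = 0;                     // reference-clock time
  int waits = 0;                        // number of underlying waits

  long caller_now () const { return ref_now / 3; }  // slow caller clock

  void wait_until_caller (long caller_deadline)
  {
    long c = caller_now ();
    do
      {
	// Convert the remaining caller-clock delta to a reference-clock
	// deadline and "sleep" until it.
	ref_now += caller_deadline - c;
	++waits;
	c = caller_now ();
      }
    while (c < caller_deadline);
  }
};
```

Without the loop, a wait until caller time 10 would return at reference
time 10, i.e. caller time 3 — far before the requested deadline.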
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Kees Cook via Gcc-patches writes: > [tried to clean up quoting...] > > On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote: >> >> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor >> > wrote: >> > >> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote: >> >>> So, my question is: >> >>> >> >>> From the security point of view, does clearing ALL registers have more >> >>> benefit than clearing USED registers? >> >>> From my understanding, clearing registers that are not used in the >> >>> current routine does NOT provide additional benefit, correct me if I am >> >>> wrong here. >> > >> > You are right, it does not provide additional security >> >> Then, is it necessary to provide >> >> -fzero-call-used-regs=all-arg|all-gpr|all to the user? >> >> Can we just delete these 3 sub options? > > Well... I'd say there is some benefit (remember that ROP gadgets are > built from function trailers, so there is rarely a concern over what the > rest of the function is doing). Generally, they are chained together > based on just the last couple instructions: > > *useful action* > *ret* > > So with ...=used this turns into: > > *useful action* > *clear some registers* > *ret* > > Which may still be helpful (if, for example, the state being built by > the attacker is using registers _not_ in the cleared list). However: > > *useful action* > *clear all registers* > *ret* > > Means that suddenly the ROP chain cannot use *any* of the caller-saved > registers to hold state. > > So, while ...=used is likely going to block a lot, ...=all will block > even more. I'd prefer to have both available, if for no other reason > than to compare the ROP gadget availability for any given binary (e.g. > if some future attack is found that bypasses ...=used, does it also > bypass ...=all?) 
This might have already been discussed/answered, sorry, but: when there's a choice, is there an obvious winner between: (1) clearing call-clobbered registers and then restoring call-preserved ones (2) restoring call-preserved registers and then clearing call-clobbered ones Is one option more likely to be useful to attackers than the other? (For some frames, it might be necessary to use a small number of call-clobbered registers to perform the restore sequence, so (1) wouldn't be fully achievable in all cases.) Thanks, Richard
[committed][libatomic] Add nvptx support
On 9/8/20 8:51 AM, Tom de Vries wrote: > Hi, > > Add nvptx support to libatomic. > > Given that atomic_test_and_set is not implemented for nvptx (PR96964), the > compiler translates __atomic_test_and_set falling back onto the "Failing all > else, assume a single threaded environment and simply perform the operation" > case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic > operation. > > Still, that counts as supported for the configure test of libatomic, so we > end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding > __atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic > implementation. > > Fix this by adding an atomic_test_and_set expansion for nvptx, that uses > libatomics __atomic_test_and_set_1. > > This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so > instead we use this case in tas_n.c: > ... > /* If this type is smaller than word-sized, fall back to a word-sized >compare-and-swap loop. */ > bool > SIZE(libat_test_and_set) (UTYPE *mptr, int smodel) > ... > which for __atomic_test_and_set_8 uses INVERT_MASK_8. > > Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8. > > Tested libatomic testsuite on nvptx. > > Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk? > > Any other comments? > > Thanks, > - Tom > > [libatomic] Add nvptx support > > gcc/ChangeLog: > > PR target/96964 > * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New > expansion. > * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. > I realized after reading the header comment of write_fn_proto_from_insn in nvptx.c that I could drop the sync-builtins.def bit. So, committed without the sync-builtins.def change. Thanks, - Tom > libatomic/ChangeLog: > > PR target/96898 > * configure.tgt: Add nvptx. > * libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition. > * config/nvptx/host-config.h: New file. > * config/nvptx/lock.c: New file. 
> > --- > gcc/config/nvptx/nvptx.md| 16 +++ > gcc/sync-builtins.def| 2 ++ > libatomic/config/nvptx/host-config.h | 56 > > libatomic/config/nvptx/lock.c| 56 > > libatomic/configure.tgt | 3 ++ > libatomic/libatomic_i.h | 2 ++ > 6 files changed, 135 insertions(+) > > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > index 4168190fa42..6178e6a0f77 100644 > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > @@ -1667,6 +1667,22 @@ >"%.\\tatom%A1.b%T0.\\t%0, %1, %2;" >[(set_attr "atomic" "true")]) > > +(define_expand "atomic_test_and_set" > + [(match_operand:SI 0 "nvptx_register_operand") ;; bool success output > + (match_operand:QI 1 "memory_operand") ;; memory > + (match_operand:SI 2 "const_int_operand")] ;; model > + "" > +{ > + rtx libfunc; > + rtx addr; > + libfunc = init_one_libfunc ("__atomic_test_and_set_1"); > + addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0)); > + emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode, > + addr, ptr_mode, > + operands[2], SImode); > + DONE; > +}) > + > (define_insn "nvptx_barsync" >[(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri") >(match_operand:SI 1 "const_int_operand")] > diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def > index 156a13ce0f8..b802257bd1a 100644 > --- a/gcc/sync-builtins.def > +++ b/gcc/sync-builtins.def > @@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, > "__sync_synchronize", > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set", > BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1", > + BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", > BT_FN_VOID_VPTR_INT, > ATTR_NOTHROWCALL_LEAF_LIST) > diff --git a/libatomic/config/nvptx/host-config.h > b/libatomic/config/nvptx/host-config.h > new file mode 100644 > index 000..eb9de81f388 > --- /dev/null > 
+++ b/libatomic/config/nvptx/host-config.h > @@ -0,0 +1,56 @@ > +/* Copyright (C) 2020 Free Software Foundation, Inc. > + > + This file is part of the GNU Atomic Library (libatomic). > + > + Libatomic is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + Libatomic is distributed in the hope that it will be useful, but WITHOUT > ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS > + FO
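The tas_n.c fallback relied on above — a byte-wide test-and-set built from
a word-wide compare-and-swap — can be sketched single-threaded as follows.
The names are invented, the CAS is a plain function rather than a real
atomic, and the INVERT_MASK_N byte-order handling that actual libatomic
applies for big-endian targets is omitted:

```cpp
#include <cassert>
#include <cstdint>

// Single-threaded stand-in for an atomic compare-and-swap: on mismatch,
// refresh *expected with the current value, like __atomic_compare_exchange.
static bool compare_exchange (uint32_t *p, uint32_t *expected,
			      uint32_t desired)
{
  if (*p == *expected)
    {
      *p = desired;
      return true;
    }
  *expected = *p;
  return false;
}

// Byte-sized test-and-set on top of a word-sized CAS, in the spirit of
// libatomic's tas_n.c (little-endian layout only).
static bool toy_test_and_set (uint32_t *word, unsigned byte_index)
{
  unsigned shift = byte_index * 8;
  uint32_t set = uint32_t (1) << shift;   // the "set" value for that byte
  uint32_t old = *word;
  while (!compare_exchange (word, &old, old | set))
    ;                                     // retry until the CAS succeeds
  return ((old >> shift) & 0xff) != 0;    // previous byte value, as bool
}
```

The real code additionally computes the containing word from the byte's
address and, on big-endian targets, flips the shift with an invert mask —
which is exactly why the patch needs MASK_8/INVERT_MASK_8 definitions.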
[committed] amdgcn: align TImode registers
This patch fixes an execution failure in which the compiler would corrupt
TImode values due to missed early-clobber problems with partially
overlapping register allocations.  In fact, adding early-clobber
constraints does not fix the issue because IRA doesn't check that for move
instructions properly in all circumstances anyway (the constraint causes
an ICE in postreload).

This patch fixes the problem by ensuring that TImode values are always
aligned to 4-register boundaries, meaning that inputs and outputs will
either overlap completely, or not at all, neither of which have
early-clobber issues.  This is an artificial restriction not present in
the hardware, but it is the same solution we use for DImode values, where
we had a lot of the same problems.

With the patch I see the following new test passes:

PASS: gfortran.dg/PR95331.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/PR95331.f90 -O3 -g execution test
PASS: gfortran.dg/gamma_5.f90 -O0 execution test
PASS: gfortran.dg/gamma_5.f90 -O1 execution test
PASS: gfortran.dg/gamma_5.f90 -O2 execution test
PASS: gfortran.dg/gamma_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/gamma_5.f90 -O3 -g execution test
PASS: gfortran.dg/gamma_5.f90 -Os execution test
PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test
PASS: gfortran.dg/optional_absent_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/optional_absent_5.f90 -O3 -g execution test

Andrew

amdgcn: align TImode registers

This prevents execution failures caused by partially overlapping input
and output registers.  This is the same solution already used for DImode.
gcc/ChangeLog:

	* config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align TImode registers.
	* config/gcn/gcn.md: Assert that TImode registers do not early
	clobber.

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 8b3c4544dd5..84d1fd9a354 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -475,7 +475,8 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     return (vgpr_1reg_mode_p (mode)
	    || (!((regno - FIRST_VGPR_REG) & 1) && vgpr_2reg_mode_p (mode))
	    /* TImode is used by DImode compare_and_swap.  */
-	    || mode == TImode);
+	    || (mode == TImode
+		&& !((regno - FIRST_VGPR_REG) & 3)));
   return false;
 }

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index aeb25fbb931..0e73fea93cf 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -677,6 +677,8 @@ (define_insn_and_split "*movti_insn"
    (set (match_dup 4) (match_dup 5))
    (set (match_dup 6) (match_dup 7))]
   {
+    gcc_assert (rtx_equal_p (operands[0], operands[1])
+		|| !reg_overlap_mentioned_p (operands[0], operands[1]));
    operands[6] = gcn_operand_part (TImode, operands[0], 3);
    operands[7] = gcn_operand_part (TImode, operands[1], 3);
    operands[4] = gcn_operand_part (TImode, operands[0], 2);
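Why 4-register alignment removes the partial-overlap hazard can be checked
with a small standalone model.  FIRST_VGPR_REG is modelled as 0 here and
the helpers are illustrative stand-ins, not the actual backend code:

```cpp
#include <cassert>

const unsigned first_vgpr = 0;  // stand-in for FIRST_VGPR_REG

// The patch's predicate: a TImode value (4 consecutive registers) may
// only start on a 4-register boundary.
static bool timode_regno_ok (unsigned regno)
{
  return ((regno - first_vgpr) & 3) == 0;
}

// Do the 4-register windows [a, a+4) and [b, b+4) overlap without being
// identical?  That partial overlap is the early-clobber hazard.
static bool partial_overlap (unsigned a, unsigned b)
{
  bool overlap = a < b + 4 && b < a + 4;
  return overlap && a != b;
}
```

For any two starting registers that satisfy timode_regno_ok, the windows
either coincide or are disjoint, which is exactly the property the patch
relies on to make early-clobber constraints unnecessary.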
Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point
Hi Dave, thanks for the review!

David Malcolm writes:

[...]

> Was there a reason for using reinterpret_cast here, rather than
> static_cast?

Yes, the reason is that apparently we can't use static_cast for that:

  "error: invalid ‘static_cast’ from type ‘gcc_jit_lvalue*’ to type
  ‘gcc::jit::recording::global*’"

Why that is is another question I fear I'm not very qualified to answer.
If you feel it is necessary and want to suggest some rework to improve
this, I'll be happy to put some effort in if you like.

>> diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
>> index 1c5a12e9c01..3cbcbef2693 100644
>> --- a/gcc/jit/libgccjit.h
>> +++ b/gcc/jit/libgccjit.h
>> @@ -788,6 +788,20 @@ gcc_jit_context_new_global (gcc_jit_context *ctxt,
>> gcc_jit_type *type,
>> const char *name);
>>
>> +#define LIBGCCJIT_HAVE_gcc_jit_global_set_initializer
>> +
>> +/* Set a static initializer for a global; return the global itself.

[...]

> OK for master with these nits fixed, assuming your usual testing.
>
> Thanks!
> Dave

After applying the suggestions and having tested it, I've installed the
patch as 4ecc0061c40.

Thanks!

Andrea
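The cast situation discussed above can be reproduced in a few lines: the
public handle type and the internal type are unrelated as far as the C++
type system knows (the handle is an incomplete type), so static_cast
between their pointer types is ill-formed, while reinterpret_cast simply
reinterprets the pointer and round-trips safely.  The names below are
invented for illustration, not the actual libgccjit types:

```cpp
#include <cassert>

struct api_handle;              // opaque public handle, never defined

namespace impl {
  struct global { int value; }; // the internal type it really points to
}

// reinterpret_cast is required here: api_handle and impl::global are
// unrelated types, so "static_cast<api_handle *> (g)" would not compile.
static api_handle *wrap (impl::global *g)
{
  return reinterpret_cast<api_handle *> (g);
}

static impl::global *unwrap (api_handle *h)
{
  return reinterpret_cast<impl::global *> (h);
}
```

Round-tripping a pointer through reinterpret_cast like this is
well-defined; it is the same pattern C libraries use to hide their
implementation types behind opaque handles.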
Re: [PATCH] improve BB vectorization dump locations
Richard Biener writes: > This tries to improve BB vectorization dumps by providing more > precise locations. Currently the vect_location is simply the > very last stmt in a basic-block that has a location. So for > > double a[4], b[4]; > int x[4], y[4]; > void foo() > { > a[0] = b[0]; // line 5 > a[1] = b[1]; > a[2] = b[2]; > a[3] = b[3]; > x[0] = y[0]; // line 9 > x[1] = y[1]; > x[2] = y[2]; > x[3] = y[3]; > } // line 13 > > we show the user with -O3 -fopt-info-vec > > t.c:13:1: optimized: basic block part vectorized using 16 byte vectors > > while with the patch we point to both independently vectorized > opportunities: > > t.c:5:8: optimized: basic block part vectorized using 16 byte vectors > t.c:9:8: optimized: basic block part vectorized using 16 byte vectors > > there's the possibility that the location regresses in case the > root stmt in the SLP instance has no location. For a SLP subgraph > with multiple entries the location also chooses one entry at random, > not sure in which case we want to dump both. > > Still as the plan is to extend the basic-block vectorization > scope from single basic-block to multiple ones this is a first > step to preserve something sensible. > > Implementation-wise this makes both costing and code-generation > happen on the subgraphs as analyzed. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > Richard - is iteration over vector modes for BB vectorization > still important now that we have related_vector_type and thus > no longer only consider a fixed size? If so it will probably > make sense to somehow still iterate even if there was some > SLP subgraph vectorized? It also looks like BB vectorization > was never updated to consider multiple modes based on cost, > it will still pick the first opportunity. For BB vectorization > we also have the code that re-tries SLP discovery with > splitting the store group. So what's your overall thoughts to > this? 
I think there might be different answers for “in principle” and “in practice”. :-) In principle, there's no one right answer to (say) “what vector mode should I use for 4 32-bit integers?”. If the block is only operating on that type, then VNx4SI is the right choice for 128-bit SVE. But if the block is mostly operating on 4 64-bit integers and just converting to 32-bit integers for a small region, then it might be better to use 2 VNx2SIs instead (paired with 2 VNx2DIs). In practice, one situation in which the current loop might be needed is pattern statements. There we assign a vector type during pattern recognition, based only on the element type. So in that situation, the first pass (with the autodetected base vector mode) will not take the number of scalar stmts into account. Also, although SLP currently only operates on full vectors, I was hoping we would eventually support predication for SLP too. At that point, the number of scalar statements wouldn't directly determine the number of vector lanes. On the cost thing: it would be better to try all four and pick the one with the lowest cost, but given your in-progress changes, it seemed like a dead end to do that with the current code. It sounded like the general direction here was to build an SLP graph and “solve” the vector type assignment problem in a more global way, once we have a view of the entire graph. Is that right? If so, then at that point we might be able to do something more intelligent than just iterate over all the options. (Although at the same time, iterating over all the options on a fixed (sub?)graph would be cheaper than what we do now.) Thanks, Richard
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi Martin, > -Original Message- > From: Martin Jambor > Sent: Tuesday, September 8, 2020 3:01 PM > To: Tamar Christina ; Richard Sandiford > ; luoxhu via Gcc-patches patc...@gcc.gnu.org> > Cc: seg...@kernel.crashing.org; luoxhu ; > wschm...@linux.ibm.com; li...@gcc.gnu.org; Jan Hubicka > ; dje@gmail.com > Subject: RE: [PATCH] ipa-inline: Improve growth accumulation for recursive > calls > > Hi, > > On Fri, Aug 21 2020, Tamar Christina wrote: > >> > >> Honza's changes have been motivated to big extent as an enabler for > >> IPA-CP heuristics changes to actually speed up 548.exchange2_r. > >> > >> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds > two > >> weeks ago, this week it is 403, but with my WIP (and so far untested) > >> patch below it is just 276 seconds - faster than one built with GCC 8 > >> which needs > >> 283 seconds. > >> > >> I'll be interested in knowing if it also works this well on other > >> architectures. > >> > > I have posted the new version of the patch series to the mailing list > yesterday and I have also pushed the branch to the FSF repo as > refs/users/jamborm/heads/ipa-context_and_exchange-200907 > Thanks! I've pushed it through our CI with a host of different options (see below) > > > > Many thanks for working on this! > > > > I tried this on an AArch64 Neoverse-N1 machine and didn't see any > difference. > > Do I need any flags for it to work? The patch was applied on top of > > 656218ab982cc22b826227045826c92743143af1 > > > > I only have access to fairly old AMD (Seattle) Opteron 1100 which might not > support some interesting Aarch64 ISA extensions but I can measure a > significant speedup on it (everything with just -Ofast -march=native - > mtune=native, no non-default parameters, without LTO, without any inlining > options): > > GCC 10 branch: 915 seconds > Master (rev. 995bb851ffe): 989 seconds > My branch: 827 seconds > > (All is 548.exchange_r reference run time.) 
| Compiler | Flags | diff GCC 10 |
|----------|-------|-------------|
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | |
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer | -44% |
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto | -36% |
| GCC 11 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 | -22% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
Re: [PATCH] improve BB vectorization dump locations
On Fri, 11 Sep 2020, Richard Sandiford wrote: > Richard Biener writes: > > This tries to improve BB vectorization dumps by providing more > > precise locations. Currently the vect_location is simply the > > very last stmt in a basic-block that has a location. So for > > > > double a[4], b[4]; > > int x[4], y[4]; > > void foo() > > { > > a[0] = b[0]; // line 5 > > a[1] = b[1]; > > a[2] = b[2]; > > a[3] = b[3]; > > x[0] = y[0]; // line 9 > > x[1] = y[1]; > > x[2] = y[2]; > > x[3] = y[3]; > > } // line 13 > > > > we show the user with -O3 -fopt-info-vec > > > > t.c:13:1: optimized: basic block part vectorized using 16 byte vectors > > > > while with the patch we point to both independently vectorized > > opportunities: > > > > t.c:5:8: optimized: basic block part vectorized using 16 byte vectors > > t.c:9:8: optimized: basic block part vectorized using 16 byte vectors > > > > there's the possibility that the location regresses in case the > > root stmt in the SLP instance has no location. For a SLP subgraph > > with multiple entries the location also chooses one entry at random, > > not sure in which case we want to dump both. > > > > Still as the plan is to extend the basic-block vectorization > > scope from single basic-block to multiple ones this is a first > > step to preserve something sensible. > > > > Implementation-wise this makes both costing and code-generation > > happen on the subgraphs as analyzed. > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > > > Richard - is iteration over vector modes for BB vectorization > > still important now that we have related_vector_type and thus > > no longer only consider a fixed size? If so it will probably > > make sense to somehow still iterate even if there was some > > SLP subgraph vectorized? It also looks like BB vectorization > > was never updated to consider multiple modes based on cost, > > it will still pick the first opportunity. 
For BB vectorization > > we also have the code that re-tries SLP discovery with > > splitting the store group. So what's your overall thoughts to > > this? > > I think there might be different answers for “in principle” and > “in practice”. :-) > > In principle, there's no one right answer to (say) “what vector mode > should I use for 4 32-bit integers?”. If the block is only operating on > that type, then VNx4SI is the right choice for 128-bit SVE. But if the > block is mostly operating on 4 64-bit integers and just converting to > 32-bit integers for a small region, then it might be better to use > 2 VNx2SIs instead (paired with 2 VNx2DIs). > > In practice, one situation in which the current loop might be needed > is pattern statements. There we assign a vector type during pattern > recognition, based only on the element type. So in that situation, > the first pass (with the autodetected base vector mode) will not take > the number of scalar stmts into account. Ah, indeed. So currently the per-BB decision is probably not too bad but when it becomes a per-function decision we need to do something about this I guess. > Also, although SLP currently only operates on full vectors, > I was hoping we would eventually support predication for SLP too. > At that point, the number of scalar statements wouldn't directly > determine the number of vector lanes. > > On the cost thing: it would be better to try all four and pick the one > with the lowest cost, but given your in-progress changes, it seemed like > a dead end to do that with the current code. > > It sounded like the general direction here was to build an SLP graph > and “solve” the vector type assignment problem in a more global way, > once we have a view of the entire graph. Is that right? If so, > then at that point we might be able to do something more intelligent > than just iterate over all the options. 
(Although at the same time, > iterating over all the options on a fixed (sub?)graph would be cheaper > than what we do now.) Yes, in the final end we should have the SLP tree build without having assigned a vector type which also means doing pattern detection on the SLP tree after the SLP build (which makes that a viable change only after we got rid of the non-SLP paths). But yeah, I forgot about the early vector type assignment during pattern recog ... I did try to get rid of vect_update_shared_vectype some months ago (the only remaining user of STMT_VINFO_NUM_SLP_USES), but somehow miserably failed even though we now have SLP_TREE_VECTYPE - a vector type per SLP node (see https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547981.html). So yeah, pattern recog ... a convenient but also quite iffy feature :/ Richard.
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > condition, lose = 1; >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); >mark_referenced_resources (trial, &needed, true); > + if (filter_flags) > + { > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > + > + /* Groups of flags-register setters with users should not > + affect opportunities to move flags-register-setting insns > + (clobbers) into the delay-slot. */ > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > + } Don't you need to put the new block before mark_referenced_resources (as I did in fill_simple_delay_slots) in case the needed insn reads the flags register? -- Eric Botcazou
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Eric Botcazou > CC: "gcc-patches@gcc.gnu.org" > Date: Fri, 11 Sep 2020 13:09:48 +0200 > > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > condition, lose = 1; > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > >mark_referenced_resources (trial, &needed, true); > > + if (filter_flags) > > + { > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > + > > + /* Groups of flags-register setters with users should not > > +affect opportunities to move flags-register-setting insns > > +(clobbers) into the delay-slot. */ > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > + } > > Don't you need to put the new block before mark_referenced_resources (as I > did > in fill_simple_delay_slots) in case the needed insn reads the flags register? It can't be needed, as the flags register is dead in the insn with the delay-slot, as part of the filter_flags condition. brgds, H-P
Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point
On Fri, 2020-09-11 at 12:31 +0200, Andrea Corallo wrote: > Hi Dave, > > thanks for the review! > > David Malcolm writes: > > [...] > > > Was there a reason for using reinterpret_cast here, rather than > > static_cast? > > Yes the reason is that apparently we can't use static_cast for that: > > "error: invalid ‘static_cast’ from type ‘gcc_jit_lvalue*’ to type > ‘gcc::jit::recording::global*’" > > Why is that is another question I fear I'm not very qualified to > answer. > If you feel it is necessary and want to suggest some rework to improve > this > I'll be happy to put some effort in if you like. It's because we're effectively downcasting from lvalue to global. This is safe because of the check for global->is_global (). A slightly cleaner way to do it would be to have a dyn_cast_global vfunc here rather than is_global, and to use the result - but I don't think it matters. [...] > > > After applying the suggestions and having tested it I've > > > installed the patch as 4ecc0061c40. > > Thanks! Thanks Dave
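The checked downcast David describes can be sketched with stand-in classes. The names here (lvalue, global, as_global) are illustrative only — the real libgccjit code additionally casts from the opaque C header type gcc_jit_lvalue*, which is unrelated to the internal C++ hierarchy and therefore needs reinterpret_cast rather than the static_cast used in this sketch:

```cpp
#include <cassert>

// Toy model of the pattern: downcast only after a runtime check, and
// (as suggested) return the result of the check directly from a
// dyn_cast-style helper instead of a separate is_global () query.
struct lvalue
{
  virtual ~lvalue () {}
  virtual bool is_global () const { return false; }
};

struct global : lvalue
{
  bool is_global () const override { return true; }
  int initial_value = 42;
};

// Returns the downcast pointer on success, nullptr otherwise.
static global *
as_global (lvalue *v)
{
  return v->is_global () ? static_cast<global *> (v) : nullptr;
}
```

A dyn_cast_global virtual function in the real hierarchy would play the role of as_global here.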
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Hans-Peter Nilsson > Date: Fri, 11 Sep 2020 13:24:18 +0200 > > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > > condition, lose = 1; > > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > > >mark_referenced_resources (trial, &needed, true); > > > + if (filter_flags) > > > + { > > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > > + > > > + /* Groups of flags-register setters with users should not > > > + affect opportunities to move flags-register-setting insns > > > + (clobbers) into the delay-slot. */ > > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > > + } > > > > Don't you need to put the new block before mark_referenced_resources (as I > > did > > in fill_simple_delay_slots) in case the needed insn reads the flags > > register? > > It can't be needed, as the flags register is dead in the insn with the > delay-slot, as part of the filter_flags condition. Uh, I'll get back with a more thoughtful reply... brgds, H-P
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Eric Botcazou > Date: Fri, 11 Sep 2020 13:09:48 +0200 > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > condition, lose = 1; > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > >mark_referenced_resources (trial, &needed, true); > > + if (filter_flags) > > + { > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > + > > + /* Groups of flags-register setters with users should not > > +affect opportunities to move flags-register-setting insns > > +(clobbers) into the delay-slot. */ > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > + } > > Don't you need to put the new block before mark_referenced_resources (as I > did > in fill_simple_delay_slots) in case the needed insn reads the flags register? I think the simplest answer is that it wouldn't be good; it'd leave the "CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);" without effect. Note also fill_simple_delay_slots is moving insns from before the delay-slotting insn, while this is from a "thread" (or arm, or leg) "after". brgds, H-P
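The resource bookkeeping being debated can be modelled with plain bitmasks. This is a toy stand-in for reorg.c's HARD_REG_SET, mark_set_resources and CLEAR_HARD_REG_BIT — not the real code — with FLAGS_BIT standing in for targetm.flags_regnum:

```c
#include <assert.h>

/* A candidate insn may move into a delay slot only if the registers it
   sets do not overlap the registers still needed afterwards.  When
   filter_flags is active, the flags bit is cleared from both masks, so
   a flags clobber no longer blocks the move -- safe precisely because
   the flags register is known dead at the delay-slot insn.  */
#define FLAGS_BIT (1u << 7)

static int
can_fill_slot (unsigned trial_sets, unsigned needed, int filter_flags)
{
  if (filter_flags)
    {
      trial_sets &= ~FLAGS_BIT;
      needed &= ~FLAGS_BIT;
    }
  return (trial_sets & needed) == 0;
}
```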
[OG10] Merge GCC 10 into branch; merge some mainline nvptx patches
OG10 = devel/omp/gcc-10 I have merged releases/gcc-10 into that branch. And added a bunch of mainline alias GCC 11 nvptx patches to that branch. 2df8e0f1bc4 [libatomic] Add nvptx support 5544bca37bc [nvptx] Fix UB in nvptx_assemble_value 7e10b6b0b34 [nvptx] Fix printing of 128-bit constant (negative case) 79d64f8ab05 [nvptx] Fix printing of 128-bit constant 21fc67c95a4 [nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c 46595b72fed [nvptx] Fix Wformat in nvptx_assemble_decl_begin 842cd983048 [nvptx] Fix boolean type test in write_fn_proto 97012eaf9d3 [nvptx] Fix array dimension in nvptx_assemble_decl_begin 1e00dba14b2 [nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle 2a509a60479 Update ChangeLog.omp for the last two commits fbfd2a5674e nvptx: Provide vec_set and vec_extract patterns 89f5545f62c nvptx: Support 16-bit shifts and extendqihi2 34f8c7e5fa3 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10 - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
[PATCH] fixincludes/fixfixes.c: Fix 'set but not used' warning.
pz_tmp_base and pz_tmp_dot are always set, but used only when _PC_NAME_MAX is defined. This patch moves their declaration and definition under #ifdef _PC_NAME_MAX to avoid this warning. 2020-09-11 Torbjörn SVENSSON Christophe Lyon fixincludes/ * fixfixes.c (pz_tmp_base, pz_tmp_dot): Define only with _PC_NAME_MAX. --- fixincludes/fixfixes.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c index 034e15d..5b23a8b 100644 --- a/fixincludes/fixfixes.c +++ b/fixincludes/fixfixes.c @@ -738,8 +738,10 @@ main( int argc, char** argv ) { tFixDesc* pFix; char* pz_tmptmp; +#ifdef _PC_NAME_MAX char* pz_tmp_base; char* pz_tmp_dot; +#endif if (argc != 5) { @@ -772,12 +774,12 @@ main( int argc, char** argv ) pz_tmptmp = XNEWVEC (char, strlen (argv[4]) + 5); strcpy( pz_tmptmp, argv[4] ); +#ifdef _PC_NAME_MAX /* Don't lose because "12345678" and "12345678X" map to the same file under DOS restricted 8+3 file namespace. Note that DOS doesn't allow more than one dot in the trunk of a file name. */ pz_tmp_base = basename( pz_tmptmp ); pz_tmp_dot = strchr( pz_tmp_base, '.' ); -#ifdef _PC_NAME_MAX if (pathconf( pz_tmptmp, _PC_NAME_MAX ) <= 12/* is this DOS or Windows9X? */ && pz_tmp_dot != (char*)NULL) strcpy (pz_tmp_dot+1, "X"); /* nuke the original extension */ -- 2.7.4
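The shape of the fix, reduced to a hypothetical helper (not the actual fixfixes.c code): when a variable's only use is conditionally compiled, its declaration moves under the same guard, so configurations without the macro never see a set-but-unused variable. WANT_EXTENSION stands in for _PC_NAME_MAX being defined:

```c
#include <assert.h>
#include <string.h>

#define WANT_EXTENSION 1   /* stands in for '_PC_NAME_MAX is defined' */

static const char *
extension_of (const char *base)
{
#ifdef WANT_EXTENSION
  /* Declared only in the configuration that actually reads it.  */
  const char *dot = strchr (base, '.');
  if (dot != NULL)
    return dot + 1;
#endif
  return base;
}
```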
[PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
This patch adds the missing prototypes for the functions defined in fp16.c, to avoid these warnings during the build: /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes] 2020-09-11 Torbjörn SVENSSON Christophe Lyon libgcc/ * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) (__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes. --- libgcc/config/arm/fp16.c | 9 + 1 file changed, 9 insertions(+) diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c index e8f7afb..3d1d8f4 100644 --- a/libgcc/config/arm/fp16.c +++ b/libgcc/config/arm/fp16.c @@ -52,6 +52,15 @@ binary64 = 52 /* significand. */ }; +/* Function prototypes */ +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); +unsigned short __gnu_f2h_ieee(unsigned int a); +unsigned int __gnu_h2f_ieee(unsigned short a); +unsigned short __gnu_f2h_alternative(unsigned int x); +unsigned int __gnu_h2f_alternative(unsigned short a); +unsigned short __gnu_d2h_ieee (unsigned long long a); +unsigned short __gnu_d2h_alternative (unsigned long long x); + static inline unsigned short __gnu_float2h_internal (const struct format* fmt, unsigned long long a, int ieee) -- 2.7.4
[PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), + const char* __what__attribute__((unused))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
[PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid warning with -fno-exceptions.
When building with -fno-exceptions, __throw_exception_again expands to nothing, causing a "suggest braces around empty body in an 'if' statement" warning. This patch adds braces, like what was done in eh_personality.cc in svn r193295 (git g:54ba39f599fc2f3d59fd3cd828a301ce9b731a20) 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index ee44b1a..d50c4fb 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -138,7 +138,7 @@ __cxa_call_unexpected(void* exc_obj_in) if (__cxa_type_match(&new_xh->unwindHeader, catch_type, false, &new_ptr) != ctm_failed) - __throw_exception_again; + { __throw_exception_again; } // If the exception spec allows std::bad_exception, throw that. // We don't have a thrown object to compare against, but since -- 2.7.4
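A minimal reproduction of the warning class being fixed, with a hypothetical THROW_AGAIN macro playing the role of __throw_exception_again: when the macro expands to nothing (as under -fno-exceptions), an unbraced if body becomes empty and -Wempty-body's "suggest braces" warning fires; the braces keep the statement well-formed either way:

```cpp
#include <cassert>

#if defined (DEMO_EXCEPTIONS)
# define THROW_AGAIN throw      // rethrow the active exception
#else
# define THROW_AGAIN            // expands to nothing, like -fno-exceptions
#endif

int
classify (int type_match)
{
  if (type_match)
    { THROW_AGAIN; }            // braced, so the body is never empty
  return 0;
}
```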
[PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid "set but not used" warning
When building with -fno-exceptions, bad_exception_allowed is set but not used, causing a warning during the build. This patch adds __attribute__((unused)) to avoid it. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index d50c4fb..3c7426e 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -124,7 +124,7 @@ __cxa_call_unexpected(void* exc_obj_in) void* new_ptr = __get_object_from_ambiguous_exception (new_xh); const std::type_info* catch_type; int n; - bool bad_exception_allowed = false; + bool bad_exception_allowed __attribute__((unused)) = false; const std::type_info& bad_exc = typeid(std::bad_exception); // Check the new exception against the rtti list -- 2.7.4
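The same idea in a reduced, hypothetical form (not the eh_call.cc code): the flag is written unconditionally but read only on the exceptions-enabled path, so the attribute documents that some configurations legitimately never read it:

```cpp
#include <cassert>

int
check_spec (bool matched)
{
  // Written below, but read only when exceptions are enabled.
  bool bad_exception_allowed __attribute__((__unused__)) = false;
  if (matched)
    bad_exception_allowed = true;
#ifdef __cpp_exceptions
  if (bad_exception_allowed)
    return 2;                   // the would-be throwing path
#endif
  return 0;
}
```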
RE: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
Hi Christophe, > -Original Message- > From: Gcc-patches On Behalf Of > Christophe Lyon via Gcc-patches > Sent: 11 September 2020 13:23 > To: gcc-patches@gcc.gnu.org; i...@airs.com > Subject: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes > > This patch adds the missing prototypes for the fonctions defined in fp16.c, to > avoid these warnings during the build: > /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for > '__gnu_h2f_internal' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for > '__gnu_f2h_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for > '__gnu_h2f_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for > '__gnu_f2h_alternative' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for > '__gnu_h2f_alternative' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for > '__gnu_d2h_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for > '__gnu_d2h_alternative' [-Wmissing-prototypes] > > 2020-09-11 Torbjörn SVENSSON > Christophe Lyon > > libgcc/ > * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) > (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) > (__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes. > --- > libgcc/config/arm/fp16.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c > index e8f7afb..3d1d8f4 100644 > --- a/libgcc/config/arm/fp16.c > +++ b/libgcc/config/arm/fp16.c > @@ -52,6 +52,15 @@ binary64 = >52 /* significand. 
*/ > }; > > +/* Function prototypes */ > +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); I think this function should just be marked as static, as it is truly internal to this file, whereas the other ones can be called from code emitted in config/arm/arm.c Thanks, Kyrill > +unsigned short __gnu_f2h_ieee(unsigned int a); > +unsigned int __gnu_h2f_ieee(unsigned short a); > +unsigned short __gnu_f2h_alternative(unsigned int x); > +unsigned int __gnu_h2f_alternative(unsigned short a); > +unsigned short __gnu_d2h_ieee (unsigned long long a); > +unsigned short __gnu_d2h_alternative (unsigned long long x); > + > static inline unsigned short > __gnu_float2h_internal (const struct format* fmt, > unsigned long long a, int ieee) > -- > 2.7.4
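Kyrill's point in miniature, with illustrative names rather than the real fp16.c functions: -Wmissing-prototypes only complains about externally visible definitions, so a truly file-internal helper can simply become static, while the entry points called from compiler-generated code keep explicit prototypes:

```c
#include <assert.h>

/* Public entry point: keeps a prototype.  */
unsigned short demo_f2h_ieee (unsigned int a);

/* File-internal helper: 'static' removes the need for a prototype
   and guarantees it cannot be called from other translation units.  */
static unsigned int
demo_h2f_internal (unsigned short a, int ieee)
{
  return (unsigned int) a + (ieee ? 0u : 1u);
}

unsigned short
demo_f2h_ieee (unsigned int a)
{
  return (unsigned short) (demo_h2f_internal ((unsigned short) a, 1) & 0xffffu);
}
```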
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On Sep 11, 2020, Alan Modra wrote: > I also thought it reasonable to error on an explicit -mcmodel=medium > or -mcmodel=large with either of -mminimal-toc or -mno-minimal-toc, > since the toc options really are specific to small model code. Why > change that? Thanks. I think the key piece of information that was missing for us was that -mno-minimal-toc was specific to small model code as much as -mminimal-toc. Without that info, the prevailing expectation was that, if -mminimal-toc conflicted with -mcmodel=*, then reversing its effects with -mno-minimal-toc should make it fine, but doesn't. I guess this means we withdraw the patch. Thanks again! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
[PATCH] tree-optimization/97020 - account SLP cost in loop vect again
The previous re-org made the cost of SLP vector stmts in loop vectorization ignored. The following rectifies this mistake. Bootstrapped & tested on x86_64-unknown-linux-gnu, pushed. 2020-09-11 Richard Biener PR tree-optimization/97020 * tree-vect-slp.c (vect_slp_analyze_operations): Apply SLP costs when doing loop vectorization. --- gcc/tree-vect-slp.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 519cd6a7254..15912515caa 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3163,8 +3163,15 @@ vect_slp_analyze_operations (vec_info *vinfo) visited.add (*x); i++; - /* Remember the SLP graph entry cost for later. */ - instance->cost_vec = cost_vec; + /* For BB vectorization remember the SLP graph entry +cost for later. */ + if (is_a <bb_vec_info> (vinfo)) + instance->cost_vec = cost_vec; + else + { + add_stmt_costs (vinfo, vinfo->target_cost_data, &cost_vec); + cost_vec.release (); + } } } -- 2.26.2
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi, On Fri, Sep 11 2020, Tamar Christina wrote: > Hi Martin, >> On Fri, Aug 21 2020, Tamar Christina wrote: >> >> >> >> Honza's changes have been motivated to big extent as an enabler for >> >> IPA-CP heuristics changes to actually speed up 548.exchange2_r. >> >> >> >> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds >> two >> >> weeks ago, this week it is 403, but with my WIP (and so far untested) >> >> patch below it is just 276 seconds - faster than one built with GCC 8 >> >> which needs >> >> 283 seconds. >> >> >> >> I'll be interested in knowing if it also works this well on other >> >> architectures. >> >> >> >> I have posted the new version of the patch series to the mailing list >> yesterday and I have also pushed the branch to the FSF repo as >> refs/users/jamborm/heads/ipa-context_and_exchange-200907 >> > > Thanks! I've pushed it through our CI with a host of different options (see > below) > >> > >> > Many thanks for working on this! >> > >> > I tried this on an AArch64 Neoverse-N1 machine and didn't see any >> difference. >> > Do I need any flags for it to work? The patch was applied on top of >> > 656218ab982cc22b826227045826c92743143af1 >> > >> >> I only have access to fairly old AMD (Seattle) Opteron 1100 which might not >> support some interesting AArch64 ISA extensions but I can measure a >> significant speedup on it (everything with just -Ofast -march=native -mtune=native, no non-default parameters, without LTO, without any inlining >> options): >> >> GCC 10 branch: 915 seconds >> Master (rev. 995bb851ffe): 989 seconds >> My branch: 827 seconds >> >> (All is 548.exchange_r reference run time.)
>> > > +--+--+--+--+--+ > | Compiler | Flags > > | diff GCC 10 | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer > > | -44% | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto > > | -36% | | | > +--+--+--+--+--+ > | GCC 11 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | -12% | | | > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > | -22% | | | > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | -12% | | | can you please confirm that the difference between these two is all due to the last option -fno-inline-functions-called-once ? Is LTo necessary? I.e., can you run the benchmark also built with the branch compiler and -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once ? > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto > > | -24% | | | > +-
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), + const char* __what__attribute__((unused))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
Re: [PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid warning with -fno-exceptions.
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __throw_exception_again expands to nothing, causing a "suggest braces around empty body in an 'if' statement" warning. This patch adds braces, like what was done in eh_personality.cc in svn r193295 (git g:54ba39f599fc2f3d59fd3cd828a301ce9b731a20) OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index ee44b1a..d50c4fb 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -138,7 +138,7 @@ __cxa_call_unexpected(void* exc_obj_in) if (__cxa_type_match(&new_xh->unwindHeader, catch_type, false, &new_ptr) != ctm_failed) - __throw_exception_again; + { __throw_exception_again; } // If the exception spec allows std::bad_exception, throw that. // We don't have a thrown object to compare against, but since -- 2.7.4
Re: [PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid "set but not used" warning
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, bad_exception_allowed is set but not used, causing a warning during the build. This patch adds __attribute__((unused)) to avoid it. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index d50c4fb..3c7426e 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -124,7 +124,7 @@ __cxa_call_unexpected(void* exc_obj_in) void* new_ptr = __get_object_from_ambiguous_exception (new_xh); const std::type_info* catch_type; int n; - bool bad_exception_allowed = false; + bool bad_exception_allowed __attribute__((unused)) = false; const std::type_info& bad_exc = typeid(std::bad_exception); // Check the new exception against the rtti list -- 2.7.4
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On 9/11/20 5:35 AM, Alexandre Oliva wrote: I guess this means we withdraw the patch. Thanks again! The original motivation for this was a failing fsf-testsuite-gcc test on vxworks7. If the patch is wrong, then I suppose the alternative is to modify the test, or else kill it?
Re: [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote: This series ensures that the std::future::wait_* functions use std::chrono::steady_clock when required, introduces std::chrono::__detail::ceil to make that easier to do, and then makes use of that function to simplify and improve the fix for PR68519 in std::condition_variable. v1 of this series was originally posted back in September 2017 (see https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html ) v2 of this series was originally posted back in January 2018 (see https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html ) v3 of this series was originally posted back in August 2018 (see https://gcc.gnu.org/ml/libstdc++/2018-08/msg1.html ) v4 of this series was originally posted back in October 2019 (see https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01934.html ) Changes since v4: * Expose std::chrono::ceil as std::chrono::__detail::ceil so that it can be used to fix PR91486 in std::future::wait_for (as suggested by John Salmon in PR91486.) * Use std::chrono::__detail::ceil to simplify fix for PR68519 in std::condition_variable::wait_for. * Also fix equivalent of PR68519 in std::condition_variable::wait_until and add test. I've pushed this series (except for patch 8/8) to trunk now, with only minor indentation changes in the code and some ChangeLog reformatting. Thanks for your persistence in getting this fixed, and sorry it took so many years!
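The rounding behaviour at the heart of this series can be illustrated with the public C++17 std::chrono::ceil, which std::chrono::__detail::ceil mirrors for pre-C++17 builds: converting a timeout to a coarser duration must round up, because truncation would let wait_for/wait_until wake before the deadline:

```cpp
#include <cassert>
#include <chrono>

// duration_cast truncates toward zero: 1500ms -> 1s, an early wake.
// ceil rounds up: 1500ms -> 2s, which can only over-wait slightly.
std::chrono::seconds
round_up_timeout (std::chrono::milliseconds t)
{
  return std::chrono::ceil<std::chrono::seconds> (t);
}
```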
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 13:55 +0100, Jonathan Wakely wrote: On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), Actually, not OK, because that needs to be __unused__ not unused. + const char* __what__attribute__((unused))) And that fails to compile: /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/regex_error.h:171:54: error: expected ‘,’ or ‘...’ before ‘(’ token There's a space missing between the name and the attribute. { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 14:37 +0100, Jonathan Wakely wrote: On 11/09/20 13:55 +0100, Jonathan Wakely wrote: On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), Actually, not OK, because that needs to be __unused__ not unused. + const char* __what__attribute__((unused))) And that fails to compile: /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/regex_error.h:171:54: error: expected ‘,’ or ‘...’ before ‘(’ token There's a space missing between the name and the attribute. Fixed with this patch. Untested, but it builds. commit 29216f56d002982f10c33056f4b3d7f07e164122 Author: Jonathan Wakely Date: Fri Sep 11 14:51:36 2020 libstdc++: Fix build error in libstdc++-v3/ChangeLog: * include/bits/regex_error.h (__throw_regex_error): Fix parameter declaration and use reserved attribute names. 
diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 88f3f8114a4..f9c01650caa 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,8 +167,9 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), - const char* __what__attribute__((unused))) + __throw_regex_error(regex_constants::error_type __ecode + __attribute__((__unused__)), + const char* __what __attribute__((__unused__))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION
Re: [committed] amdgcn: align TImode registers
This is now backported to GCC 10. Andrew On 11/09/2020 11:17, Andrew Stubbs wrote: This patch fixes an execution failure in which the compiler would corrupt TImode values due to missed early clobber problems with partially overlapping register allocations. In fact, adding early clobber constraints does not fix the issue because IRA doesn't check that for move instructions properly in all circumstances anyway (the constraint causes an ICE in postreload). This patch fixes the problem by ensuring that TImode values are always aligned to 4-register boundaries, meaning that inputs and outputs will either overlap completely, or not at all, neither of which has early-clobber issues. This is an artificial restriction not present in the hardware, but it is the same solution we use for DImode values where we had a lot of the same problems. With the patch I see the following new test passes: PASS: gfortran.dg/PR95331.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/PR95331.f90 -O3 -g execution test PASS: gfortran.dg/gamma_5.f90 -O0 execution test PASS: gfortran.dg/gamma_5.f90 -O1 execution test PASS: gfortran.dg/gamma_5.f90 -O2 execution test PASS: gfortran.dg/gamma_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/gamma_5.f90 -O3 -g execution test PASS: gfortran.dg/gamma_5.f90 -Os execution test PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test PASS: gfortran.dg/optional_absent_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/optional_absent_5.f90 -O3 -g execution test Andrew
Re: [PATCH] testsuite: gimplefe-44 requires exceptions
This is now committed and backported to GCC 10. Andrew On 10/09/2020 15:03, Andrew Stubbs wrote: This patch prevents an ICE (segmentation fault) that occurs for amdgcn because the test is trying to use -fexceptions, which is unsupported on the target. Arguably it should fail more gracefully, but either way the test is inappropriate. OK to commit? Andrew
Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes
On Fri, Sep 11, 2020 at 08:07:39AM +0200, Richard Biener wrote: > On Thu, Sep 10, 2020 at 5:12 PM Segher Boessenkool > wrote: > > On Thu, Sep 10, 2020 at 04:33:30PM +0200, Richard Biener wrote: > > > On Thu, Sep 10, 2020 at 4:22 PM Aaron Sawdey > > > wrote: > > > > If it feels like a hack, that would because it is a hack. > > > > > > > > What I’d really like to discuss is how to accomplish the real goal: > > > > keep anything from trying to do other operations (zero/sign extend for > > > > one) to POImode. > > > > > > > > Is there an existing mechanism for this? > > > > > > Not that I know, but somehow x86 gets away with OImode and XImode without > > > providing too many patterns for those. > > > > What we were seeing is DSE (of all things!) tries to extract a DImode > > from a POImode (and expects that insn to exist!) That is no good. > > Maybe. I don't know what kind of operations have to exist if a mode > is present and what not. > But are you sure this will be the only case you'll run into? No, I am not sure if there are bugs related to this elsewhere in the compiler :-) Until 2014 (and documented just days ago ;-) ) all bits of a partial integer mode were considered unknown. I have looked at a lot of it in our code the past weeks, and we still treat it like that in most places. We now see bootstrap problems if we use POImode in some contexts (that's this PR96791). POImode can only live in pairs of VSX registers; taking a subreg of POImode that would not be valid on one VSX register is not okay. Maybe we are missing some hooks or macros? Segher
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 9/2/20 1:48 PM, Tom de Vries wrote: > On 9/2/20 12:44 PM, Jakub Jelinek wrote: >> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: >>> And test-case passes on x86_64 with this patch (obviously, in >>> combination with trigger patch above). >>> >>> Jakub, WDYT? >> >> I guess the normal answer would be use libatomic, but it isn't ported for >> nvptx. > > Ah, I was not aware of that one, filed > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. > >> I guess at least temporarily this is ok, though I'm wondering why >> you need __sync_*_16 rather than __atomic_*_16, > > That's what omp-expand.c uses in expand_omp_atomic_pipeline: > BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . > I've got an updated version of this patch. It: - no longer supplies the __atomic_load_16, since that's now handled by libatomic - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, which also falls back on libatomic. I'm currently retesting. Any comments? Otherwise, I'll commit on Monday. Thanks, - Tom
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
[ Fixing ENOPATCH. ] On 9/11/20 4:24 PM, Tom de Vries wrote: > On 9/2/20 1:48 PM, Tom de Vries wrote: >> On 9/2/20 12:44 PM, Jakub Jelinek wrote: >>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: And test-case passes on x86_64 with this patch (obviously, in combination with trigger patch above). Jakub, WDYT? >>> >>> I guess the normal answer would be use libatomic, but it isn't ported for >>> nvptx. >> >> Ah, I was not aware of that one, filed >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. >> >>> I guess at least temporarily this is ok, though I'm wondering why >>> you need __sync_*_16 rather than __atomic_*_16, >> >> That's what omp-expand.c uses in expand_omp_atomic_pipeline: >> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . >> > > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. > > Thanks, > - Tom > [libgomp, nvptx] Add __sync_compare_and_swap_16 As reported here ( https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html ), when running test-case libgomp.c-c++-common/reduction-16.c for powerpc host with nvptx accelerator, we run into: ... unresolved symbol __sync_val_compare_and_swap_16 ... I can reproduce the problem on x86_64 with a trigger patch that: - initializes ix86_isa_flags2 to TARGET_ISA2_CX16 - enables define_expand "atomic_load" in gcc/config/i386/sync.md for TImode The problem is that omp-expand.c generates atomic builtin calls based on checks whether those are supported on the host, which forces the target to support these, even though those checks fail for the accelerator target. 
Fix this by: - adding a __sync_val_compare_and_swap_16 in libgomp for nvptx, which falls back onto libatomic's __atomic_compare_and_swap_16 - adding -foffload=-latomic in the test-case Tested libgomp on x86_64-linux with nvptx accelerator. Tested libgomp with trigger patch on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: * config/nvptx/atomic.c: New file. Add __sync_val_compare_and_swap_16. --- libgomp/config/nvptx/atomic.c | 18 ++ libgomp/testsuite/libgomp.c-c++-common/reduction-16.c | 1 + 2 files changed, 19 insertions(+) diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c new file mode 100644 index 000..2e13b09b497 --- /dev/null +++ b/libgomp/config/nvptx/atomic.c @@ -0,0 +1,18 @@ +#include "libgomp.h" + +#include "../../atomic.c" + +/* Implement __sync_val_compare_and_swap_16, to support offloading from hosts + that support this builtin. Fallback on libatomic. This can be removed + once omp-expand starts using __atomic_compare_exchange_n instead. */ + +unsigned __int128 +__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval, +unsigned __int128 newval) +{ + volatile __int128 *ptr = vptr; + __int128 expected = oldval; + __atomic_compare_exchange_n (ptr, &expected, newval, false, + MEMMODEL_SEQ_CST, MEMMODEL_SEQ_CST); + return expected; +} diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c index d0e82b04790..cfa8853f18b 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ #include
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On Fri, Sep 11, 2020 at 04:24:42PM +0200, Tom de Vries wrote: > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. If some functions are now in libatomic, do we expect users to know that and pass -foffload=-latomic to link, or will mkoffload or whatever do that automatically? If the latter, will e.g. libgomp testsuite ensure that during testing the library can be found even non-installed, if the former, will libgomp testsuite add it for the respective testcases that need it, perhaps under special options? Jakub
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote: Convert the specified duration to the target clock's duration type before adding it to the current time in __atomic_futex_unsigned::_M_load_when_equal_for and _M_load_when_equal_until. This removes the risk of the timeout being rounded down to the current time resulting in there being no wait at all when the duration type lacks sufficient precision to hold the steady_clock current time. Rather than using the style of fix from PR68519, let's expose the C++17 std::chrono::ceil function as std::chrono::__detail::ceil so that it can be used in code compiled with earlier standards versions and simplify the fix. This was suggested by John Salmon in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 . This problem has become considerably less likely to trigger since I switched the __atomic_futex_unsigned::__clock_t reference clock from system_clock to steady_clock and added the loop, but the consequences of triggering it have changed too. By my calculations it takes just over 194 days from the epoch for the current time not to be representable in a float. This means that system_clock is always subject to the problem (with the standard 1970 epoch) whereas steady_clock with float duration only runs out of resolution if the machine has been running for that long (assuming the Linux implementation of CLOCK_MONOTONIC.) The recently-added loop in __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario into a busy wait. Unfortunately the combination of both of these things means that it's not possible to write a test case for this occurring in _M_load_when_equal_until as it stands. * libstdc++-v3/include/std/chrono: (__detail::ceil) Move implementation of std::chrono::ceil into private namespace so that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. 
@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_duration<_ToDur> ceil(const duration<_Rep, _Period>& __d) { - auto __to = chrono::duration_cast<_ToDur>(__d); - if (__to < __d) - return __to + _ToDur{1}; - return __to; + return __detail::ceil<_ToDur>(__d); This isn't valid in C++11, a constexpr function needs to be just a return statement. Fix incoming ...
[PATCH, rs6000] testsuite fixup pr96139 tests
Hi, As reported, the recently added pr96139 tests will fail on older targets because the tests are missing the appropriate -mvsx or -maltivec options. This adds the options and clarifies the dg-require statements. sniff-regtested OK when specifying older targets on a power7 host. --target_board=unix/'{-mcpu=power4,-mcpu=power5,-mcpu=power6,-mcpu=power7, -mcpu=power8,-mcpu=power9}''{-m64,-m32}'" OK for trunk? Thanks -=Will gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. * gcc.target/powerpc/pr96139-b.c: specify -mvsx option and require. * gcc.target/powerpc/pr96139-c.c: specify -maltivec option and require. diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c index b3daee4..12a3383 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wall -m32" } */ +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ /* { dg-require-effective-target ilp32 } */ -/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ #include #include void diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c index 19c1110..379849a 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wall -m64" } */ +/* { dg-options "-O2 -Wall -m64 -mvsx" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ #include #include void diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c index 2464b8d..3ada260 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c @@ -1,7 +1,7 @@ /* { 
dg-do run } */ -/* { dg-options "-O2 -Wall" } */ +/* { dg-options "-O2 -Wall -maltivec" } */ /* { dg-require-effective-target powerpc_altivec_ok } */ /* * Based on test created by sjmunroe for pr96139 */
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. Andrew
Re: [PATCH] c++: Remove LOOKUP_CONSTINIT.
On 9/10/20 10:15 PM, Marek Polacek via Gcc-patches wrote: Since we now have DECL_DECLARED_CONSTINIT_P, we no longer need LOOKUP_CONSTINIT. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? looks good, thanks for noticing. BTW, you now have > /* Set constexpr flag on vars (functions got it in grokfndecl). */ > if (constexpr_p && VAR_P (decl)) > DECL_DECLARED_CONSTEXPR_P (decl) = true; > +/* And the constinit flag (which only applies to variables). */ > +else if (constinit_p && VAR_P (decl)) > + DECL_DECLARED_CONSTINIT_P (decl) = true; might that be clearer as if (VAR_P (decl)) { constexpr stuff constinit stuff } ? Ok either way gcc/cp/ChangeLog: * cp-tree.h (LOOKUP_CONSTINIT): Remove. (LOOKUP_REWRITTEN): Adjust. * decl.c (duplicate_decls): Set DECL_DECLARED_CONSTINIT_P. (check_initializer): Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. (cp_finish_decl): Don't set DECL_DECLARED_CONSTINIT_P. Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. (grokdeclarator): Set DECL_DECLARED_CONSTINIT_P. * decl2.c (grokfield): Don't handle LOOKUP_CONSTINIT. * parser.c (cp_parser_decomposition_declaration): Remove LOOKUP_CONSTINIT handling. (cp_parser_init_declarator): Likewise. * pt.c (tsubst_expr): Likewise. (instantiate_decl): Likewise. * typeck2.c (store_init_value): Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. --- gcc/cp/cp-tree.h | 4 +--- gcc/cp/decl.c| 13 +++-- gcc/cp/decl2.c | 3 --- gcc/cp/parser.c | 9 ++--- gcc/cp/pt.c | 8 ++-- gcc/cp/typeck2.c | 2 +- 6 files changed, 13 insertions(+), 26 deletions(-) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b166475b5f1..5923574a7aa 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5598,13 +5598,11 @@ enum overload_flags { NO_SPECIAL = 0, DTOR_FLAG, TYPENAME_FLAG }; #define LOOKUP_DELEGATING_CONS (LOOKUP_NO_NON_INTEGRAL << 1) /* Allow initialization of a flexible array members. 
*/ #define LOOKUP_ALLOW_FLEXARRAY_INIT (LOOKUP_DELEGATING_CONS << 1) -/* Require constant initialization of a non-constant variable. */ -#define LOOKUP_CONSTINIT (LOOKUP_ALLOW_FLEXARRAY_INIT << 1) /* We're looking for either a rewritten comparison operator candidate or the operator to use on the former's result. We distinguish between the two by knowing that comparisons other than == and <=> must be the latter, as must a <=> expression trying to rewrite to <=> without reversing. */ -#define LOOKUP_REWRITTEN (LOOKUP_CONSTINIT << 1) +#define LOOKUP_REWRITTEN (LOOKUP_ALLOW_FLEXARRAY_INIT << 1) /* Reverse the order of the two arguments for comparison rewriting. First we swap the arguments in add_operator_candidates, then we swap the conversions in add_candidate (so that they correspond to the original order of the diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index f1b7fbaa75b..79afdeeb055 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2312,6 +2312,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend) |= DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (olddecl); DECL_DECLARED_CONSTEXPR_P (newdecl) |= DECL_DECLARED_CONSTEXPR_P (olddecl); + DECL_DECLARED_CONSTINIT_P (newdecl) + |= DECL_DECLARED_CONSTINIT_P (olddecl); /* Merge the threadprivate attribute from OLDDECL into NEWDECL. */ if (DECL_LANG_SPECIFIC (olddecl) @@ -6884,7 +6886,7 @@ check_initializer (tree decl, tree init, int flags, vec **cleanups) flags |= LOOKUP_ALREADY_DIGESTED; } else if (DECL_DECLARED_CONSTEXPR_P (decl) - || (flags & LOOKUP_CONSTINIT)) + || DECL_DECLARED_CONSTINIT_P (decl)) { /* Declared constexpr or constinit, but no suitable initializer; massage init appropriately so we can pass it into @@ -7675,10 +7677,6 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, DECL_INITIAL (decl) = NULL_TREE; } - /* Handle `constinit' on variable templates. 
*/ - if (flags & LOOKUP_CONSTINIT) - DECL_DECLARED_CONSTINIT_P (decl) = true; - /* Generally, initializers in templates are expanded when the template is instantiated. But, if DECL is a variable constant then it can be used in future constant expressions, so its value @@ -7782,7 +7780,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, /* [dcl.constinit]/1 "The constinit specifier shall be applied only to a declaration of a variable with static or thread storage duration." */ - if ((flags & LOOKUP_CONSTINIT) + if (DECL_DECLARED_CONSTINIT_P (decl) && !(dk == dk_thread || dk == dk_static)) { error_at (DECL_SOURCE_LOCATI
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 2020-09-11 16:48, Andrew Stubbs wrote: On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. It looks like the customary way to handle that is to use offload_target_nvptx. Thanks, - Tom
Re: [PATCH] c++: Remove LOOKUP_CONSTINIT.
On Fri, Sep 11, 2020 at 10:57:21AM -0400, Nathan Sidwell wrote: > On 9/10/20 10:15 PM, Marek Polacek via Gcc-patches wrote: > > Since we now have DECL_DECLARED_CONSTINIT_P, we no longer need > > LOOKUP_CONSTINIT. > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > looks good, thanks for noticing. BTW, you now have > > /* Set constexpr flag on vars (functions got it in grokfndecl). */ > > if (constexpr_p && VAR_P (decl)) > > DECL_DECLARED_CONSTEXPR_P (decl) = true; > > +/* And the constinit flag (which only applies to variables). */ > > +else if (constinit_p && VAR_P (decl)) > > + DECL_DECLARED_CONSTINIT_P (decl) = true; > > might that be clearer as > if (VAR_P (decl)) > { constexpr stuff constinit stuff } > > ? Ok either way Agreed, it'd be nicer to factor VAR_P out. I'll make that change. Thanks, Marek
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 9/11/20 5:03 PM, tdevries wrote: On 2020-09-11 16:48, Andrew Stubbs wrote: On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. It looks like the customary way to handle that is to use offload_target_nvptx. Or { target { powerpc*-*-* } } ? For some (known) reasons, the __sync_val_compare_and_swap_16 is produced for powerpc but not for x86-64. I could imagine that GCN is affected in the same way as nvptx, except that AMD's ROC is currently not supported for PowerPC, if I understand it correctly. If FAILs start to occur in some CPU/GPU combinations, it can still be revisited. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
objc++: Always pop scope with method definitions [PR97015]
Syntax errors in method definition lists could leave us in a function scope. My recent change for block scope externs didn't like that. This reimplements the parsing loop to finish the method definition we started. AFAICT the original code was attempting to provide some error recovery. Also while there, simply do the token peeking at the top of the loop, rather than at the two(!) ends. gcc/cp/ * parser.c (cp_parser_objc_method_definition_list): Reimplement loop, make sure we pop scope. gcc/testsuite/ * obj-c++.dg/syntax-error-9.mm: Adjust expected errors. pushed -- Nathan Sidwell diff --git c/gcc/cp/parser.c w/gcc/cp/parser.c index fed16895b42..fba3fcc0c4c 100644 --- c/gcc/cp/parser.c +++ w/gcc/cp/parser.c @@ -32980,44 +32980,42 @@ cp_parser_objc_method_prototype_list (cp_parser* parser) static void cp_parser_objc_method_definition_list (cp_parser* parser) { - cp_token *token = cp_lexer_peek_token (parser->lexer); - - while (token->keyword != RID_AT_END && token->type != CPP_EOF) + for (;;) { - tree meth; + cp_token *token = cp_lexer_peek_token (parser->lexer); - if (token->type == CPP_PLUS || token->type == CPP_MINUS) + if (token->keyword == RID_AT_END) { - cp_token *ptk; - tree sig, attribute; - bool is_class_method; - if (token->type == CPP_PLUS) - is_class_method = true; - else - is_class_method = false; + cp_lexer_consume_token (parser->lexer); /* Eat '@end'. 
*/ + break; + } + else if (token->type == CPP_EOF) + { + cp_parser_error (parser, "expected %<@end%>"); + break; + } + else if (token->type == CPP_PLUS || token->type == CPP_MINUS) + { + bool is_class_method = token->type == CPP_PLUS; + push_deferring_access_checks (dk_deferred); - sig = cp_parser_objc_method_signature (parser, &attribute); + tree attribute; + tree sig = cp_parser_objc_method_signature (parser, &attribute); if (sig == error_mark_node) + cp_parser_skip_to_end_of_block_or_statement (parser); + else { - cp_parser_skip_to_end_of_block_or_statement (parser); - token = cp_lexer_peek_token (parser->lexer); - continue; - } - objc_start_method_definition (is_class_method, sig, attribute, - NULL_TREE); + objc_start_method_definition (is_class_method, sig, + attribute, NULL_TREE); - /* For historical reasons, we accept an optional semicolon. */ - if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)) - cp_lexer_consume_token (parser->lexer); + /* For historical reasons, we accept an optional semicolon. */ + if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)) + cp_lexer_consume_token (parser->lexer); - ptk = cp_lexer_peek_token (parser->lexer); - if (!(ptk->type == CPP_PLUS || ptk->type == CPP_MINUS - || ptk->type == CPP_EOF || ptk->keyword == RID_AT_END)) - { perform_deferred_access_checks (tf_warning_or_error); stop_deferring_access_checks (); - meth = cp_parser_function_definition_after_declarator (parser, - false); + tree meth + = cp_parser_function_definition_after_declarator (parser, false); pop_deferring_access_checks (); objc_finish_method_definition (meth); } @@ -33037,15 +33035,8 @@ cp_parser_objc_method_definition_list (cp_parser* parser) else /* Allow for interspersed non-ObjC++ code. */ cp_parser_objc_interstitial_code (parser); - - token = cp_lexer_peek_token (parser->lexer); } - if (token->type != CPP_EOF) -cp_lexer_consume_token (parser->lexer); /* Eat '@end'. 
*/ - else -cp_parser_error (parser, "expected %<@end%>"); - objc_finish_implementation (); } diff --git c/gcc/testsuite/obj-c++.dg/syntax-error-9.mm w/gcc/testsuite/obj-c++.dg/syntax-error-9.mm index ae104e5c140..1876c32dd20 100644 --- c/gcc/testsuite/obj-c++.dg/syntax-error-9.mm +++ w/gcc/testsuite/obj-c++.dg/syntax-error-9.mm @@ -1,3 +1,3 @@ @implementation SaturnDoc /* { dg-warning "cannot find interface declaration" } */ - read: (void*)aStream ggg /* { dg-error "expected .:. at end of input" } */ -/* { dg-error "-:expected ..end. at end of input" "" { target *-*-* } .+1 } */ +/* { dg-error "-:expected ..*. at end of input" "" { target *-*-* } .+1 } */
Re: [PATCH v6] genemit.c (main): split insn-emit.c for compiling parallelly
Hi! On Fri, Sep 11, 2020 at 03:26:17PM +0800, Jojo R wrote: > +#define printf_include() do { \ Don't use macros please, use a function? And maybe do this in a separate patch, for ease of review. That should be ack'ed pretty much immediately, after which it is out of the way, and we do not have to see it again. >while (read_md_rtx (&info)) > -switch (GET_CODE (info.def)) Factor this body to a separate function, too? Again, as earlier patch. As it is, it is impossible to see if you changed anything here. I suspect all this patch really does is pretty trivial, but it is hard to tell. Segher
Re: [OG10] Merge GCC 10 into branch; merge some mainline nvptx patches
On 11/09/2020 13:02, Tobias Burnus wrote: OG10 = devel/omp/gcc-10

I have merged releases/gcc-10 into that branch, and added a bunch of mainline (a.k.a. GCC 11) nvptx patches to it:

2df8e0f1bc4 [libatomic] Add nvptx support
5544bca37bc [nvptx] Fix UB in nvptx_assemble_value
7e10b6b0b34 [nvptx] Fix printing of 128-bit constant (negative case)
79d64f8ab05 [nvptx] Fix printing of 128-bit constant
21fc67c95a4 [nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c
46595b72fed [nvptx] Fix Wformat in nvptx_assemble_decl_begin
842cd983048 [nvptx] Fix boolean type test in write_fn_proto
97012eaf9d3 [nvptx] Fix array dimension in nvptx_assemble_decl_begin
1e00dba14b2 [nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle
2a509a60479 Update ChangeLog.omp for the last two commits
fbfd2a5674e nvptx: Provide vec_set and vec_extract patterns
89f5545f62c nvptx: Support 16-bit shifts and extendqihi2
34f8c7e5fa3 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10

I've done another merge from GCC 10 now, to get another two patches:

bc67c5c7d14 testsuite: gimplefe-44 requires exceptions
04912cbb0b4 amdgcn: align TImode registers

Andrew
[PATCH] Preliminary work on support for 128bits integers
From: Arnaud Charlet

	* fe.h, opt.ads (Enable_128bit_Types): New.
	* stand.ads (Standard_Long_Long_Long_Integer,
	S_Long_Long_Long_Integer): New.
---
 gcc/ada/fe.h      | 1 +
 gcc/ada/opt.ads   | 7 +++++++
 gcc/ada/stand.ads | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h
index 8ad16c2b1c9..520301e4c3e 100644
--- a/gcc/ada/fe.h
+++ b/gcc/ada/fe.h
@@ -192,6 +192,7 @@ extern Boolean In_Extended_Main_Code_Unit (Entity_Id);
 #define Ada_Version			opt__ada_version
 #define Back_End_Inlining		opt__back_end_inlining
 #define Debug_Generated_Code		opt__debug_generated_code
+#define Enable_128bit_Types		opt__enable_128bit_types
 #define Exception_Extra_Info		opt__exception_extra_info
 #define Exception_Locations_Suppressed	opt__exception_locations_suppressed
 #define Exception_Mechanism		opt__exception_mechanism
diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads
index c982f83b9e4..885a6fb9497 100644
--- a/gcc/ada/opt.ads
+++ b/gcc/ada/opt.ads
@@ -525,6 +525,13 @@ package Opt is
    --  dataflow analysis, which is not available. This behavior parallels that
    --  of the old ABE mechanism.
 
+   Enable_128bit_Types : Boolean := False;
+   --  GNAT
+   --  Set to True to enable the support for 128-bit types in the compiler.
+   --  The prerequisite is a 64-bit target that supports 128-bit computation.
+
+   --  WARNING: There is a matching C declaration of this variable in fe.h
+
    Error_Msg_Line_Length : Nat := 0;
    --  GNAT
    --  Records the error message line length limit. If this is set to zero,
diff --git a/gcc/ada/stand.ads b/gcc/ada/stand.ads
index f3f7eb512d5..57b4d55387e 100644
--- a/gcc/ada/stand.ads
+++ b/gcc/ada/stand.ads
@@ -61,6 +61,7 @@ package Stand is
       S_Integer,
       S_Long_Integer,
       S_Long_Long_Integer,
+      S_Long_Long_Long_Integer,
 
       S_Natural,
       S_Positive,
@@ -283,6 +284,9 @@ package Stand is
    Standard_Long_Integer      : Entity_Id renames SE (S_Long_Integer);
    Standard_Long_Long_Integer : Entity_Id renames SE (S_Long_Long_Integer);
 
+   Standard_Long_Long_Long_Integer : Entity_Id renames
+                                       SE (S_Long_Long_Long_Integer);
+
    Standard_Op_Add    : Entity_Id renames SE (S_Op_Add);
    Standard_Op_And    : Entity_Id renames SE (S_Op_And);
    Standard_Op_Concat : Entity_Id renames SE (S_Op_Concat);
-- 
2.26.2
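The "64-bit target that supports 128-bit computation" prerequisite corresponds, on the C/C++ side, to GCC's `__int128` extension (TImode on 64-bit targets). A minimal sketch of the kind of arithmetic involved, purely for illustration and not part of the patch:

```cpp
#include <cstdint>

// A full 64x64 -> 128-bit multiply: the basic operation a 128-bit-capable
// 64-bit target must provide.
unsigned __int128
mul_64x64 (std::uint64_t a, std::uint64_t b)
{
  return static_cast<unsigned __int128> (a) * b;
}
```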
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
> This might have already been discussed/answered, sorry, but:
> when there's a choice, is there an obvious winner between:
>
> (1) clearing call-clobbered registers and then restoring call-preserved ones
> (2) restoring call-preserved registers and then clearing call-clobbered ones
>
> Is one option more likely to be useful to attackers than the other?
>
> (For some frames, it might be necessary to use a small number of
> call-clobbered registers to perform the restore sequence, so (1)
> wouldn't be fully achievable in all cases.)

The same is true for what you have to do *after* restoring registers, as
I said before. Clearing all is not correct in all cases, and also it is
not useful in all cases (code right after it might write the registers
again).

This really is very (sub-)target-specific, it cannot be done by generic
code on its own *at all*.


Segher
Re: [PATCH] Preliminary work on support for 128bits integers
Sorry, I made a mistake. Please ignore it. On Fri, Sep 11, 2020 at 9:06 AM Sunil K Pandey via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > From: Arnaud Charlet > > * fe.h, opt.ads (Enable_128bit_Types): New. > * stand.ads (Standard_Long_Long_Long_Integer, > S_Long_Long_Long_Integer): New. > --- > gcc/ada/fe.h | 1 + > gcc/ada/opt.ads | 7 +++ > gcc/ada/stand.ads | 4 > 3 files changed, 12 insertions(+) > > diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h > index 8ad16c2b1c9..520301e4c3e 100644 > --- a/gcc/ada/fe.h > +++ b/gcc/ada/fe.h > @@ -192,6 +192,7 @@ extern Boolean In_Extended_Main_Code_Unit > (Entity_Id); > #define Ada_Versionopt__ada_version > #define Back_End_Inlining opt__back_end_inlining > #define Debug_Generated_Code opt__debug_generated_code > +#define Enable_128bit_Typesopt__enable_128bit_types > #define Exception_Extra_Info opt__exception_extra_info > #define Exception_Locations_Suppressed opt__exception_locations_suppressed > #define Exception_Mechanismopt__exception_mechanism > diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads > index c982f83b9e4..885a6fb9497 100644 > --- a/gcc/ada/opt.ads > +++ b/gcc/ada/opt.ads > @@ -525,6 +525,13 @@ package Opt is > -- dataflow analysis, which is not available. This behavior parallels > that > -- of the old ABE mechanism. > > + Enable_128bit_Types : Boolean := False; > + -- GNAT > + -- Set to True to enable the support for 128-bit types in the > compiler. > + -- The prerequisite is a 64-bit target that supports 128-bit > computation. > + > + -- WARNING: There is a matching C declaration of this variable in fe.h > + > Error_Msg_Line_Length : Nat := 0; > -- GNAT > -- Records the error message line length limit. 
If this is set to > zero, > diff --git a/gcc/ada/stand.ads b/gcc/ada/stand.ads > index f3f7eb512d5..57b4d55387e 100644 > --- a/gcc/ada/stand.ads > +++ b/gcc/ada/stand.ads > @@ -61,6 +61,7 @@ package Stand is >S_Integer, >S_Long_Integer, >S_Long_Long_Integer, > + S_Long_Long_Long_Integer, > >S_Natural, >S_Positive, > @@ -283,6 +284,9 @@ package Stand is > Standard_Long_Integer: Entity_Id renames SE (S_Long_Integer); > Standard_Long_Long_Integer : Entity_Id renames SE > (S_Long_Long_Integer); > > + Standard_Long_Long_Long_Integer : Entity_Id renames > + SE > (S_Long_Long_Long_Integer); > + > Standard_Op_Add : Entity_Id renames SE (S_Op_Add); > Standard_Op_And : Entity_Id renames SE (S_Op_And); > Standard_Op_Concat : Entity_Id renames SE (S_Op_Concat); > -- > 2.26.2 > >
[PATCH] Fix fma test case [PR97018]
These tests are written for 256 bit vector. For -march=cascadelake, vector size changed to 512 bit. It doubles the number of fma instruction and test fail. Fix is to explicitly disable 512 bit vector by passing additional option -mno-avx512f. Tested on x86-64. gcc/testsuite/ChangeLog: PR target/97018 * gcc.target/i386/l_fma_double_1.c: Add option -mno-avx512f. * gcc.target/i386/l_fma_double_2.c: Likewise. * gcc.target/i386/l_fma_double_3.c: Likewise. * gcc.target/i386/l_fma_double_4.c: Likewise. * gcc.target/i386/l_fma_double_5.c: Likewise. * gcc.target/i386/l_fma_double_6.c: Likewise. * gcc.target/i386/l_fma_float_1.c: Likewise. * gcc.target/i386/l_fma_float_2.c: Likewise. * gcc.target/i386/l_fma_float_3.c: Likewise. * gcc.target/i386/l_fma_float_4.c: Likewise. * gcc.target/i386/l_fma_float_5.c: Likewise. * gcc.target/i386/l_fma_float_6.c: Likewise. --- gcc/testsuite/gcc.target/i386/l_fma_double_1.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_2.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_3.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_4.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_5.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_6.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_1.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_2.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_3.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_4.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_5.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_6.c | 2 +- 12 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c index 5089874faa5..3413beba960 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma 
-mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c index e4696204299..1b9b7988850 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c index df986d0a633..0fbe9ab9569 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c index ae065590f62..c9eba09fea3 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. 
*/ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c index 5d31abaa5a7..3217d2683f6 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_6.c b/gcc/testsuite/gcc.target/i386/l_fma_double_6.c index ff857fb02f1..a22b4e2e37a 100644 --- a/gcc/tes
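The test sources themselves are not quoted in this thread; a hypothetical reduced kernel of the shape whose generated fma instructions the l_fma_* scans count:

```cpp
// With -mfma the vectorizer contracts a[i] * b[i] + c[i] into vfmadd
// instructions. Allowing -mavx512f widens the vectors from 256 to 512
// bits, which changes how many such instructions appear in the assembly
// and breaks the scan counts -- hence the added -mno-avx512f.
void
fma_kernel (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * b[i] + c[i];
}
```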
Re: [PATCH] Fix fma test case [PR97018]
On Fri, Sep 11, 2020 at 9:22 AM Sunil K Pandey wrote: > > These tests are written for 256 bit vector. For -march=cascadelake, > vector size changed to 512 bit. It doubles the number of fma > instruction and test fail. Fix is to explicitly disable 512 bit > vector by passing additional option -mno-avx512f. > > Tested on x86-64. > > gcc/testsuite/ChangeLog: > > PR target/97018 > * gcc.target/i386/l_fma_double_1.c: Add option -mno-avx512f. > * gcc.target/i386/l_fma_double_2.c: Likewise. > * gcc.target/i386/l_fma_double_3.c: Likewise. > * gcc.target/i386/l_fma_double_4.c: Likewise. > * gcc.target/i386/l_fma_double_5.c: Likewise. > * gcc.target/i386/l_fma_double_6.c: Likewise. > * gcc.target/i386/l_fma_float_1.c: Likewise. > * gcc.target/i386/l_fma_float_2.c: Likewise. > * gcc.target/i386/l_fma_float_3.c: Likewise. > * gcc.target/i386/l_fma_float_4.c: Likewise. > * gcc.target/i386/l_fma_float_5.c: Likewise. > * gcc.target/i386/l_fma_float_6.c: Likewise. > --- > gcc/testsuite/gcc.target/i386/l_fma_double_1.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_2.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_3.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_4.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_5.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_6.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_1.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_2.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_3.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_4.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_5.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_6.c | 2 +- > 12 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > index 5089874faa5..3413beba960 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > 
-/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > index e4696204299..1b9b7988850 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > index df986d0a633..0fbe9ab9569 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. 
*/ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > index ae065590f62..c9eba09fea3 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > index 5d31abaa5a7..3217d2683f6 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool > wrote: > > On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote: >> This might have already been discussed/answered, sorry, but: >> when there's a choice, is there an obvious winner between: >> >> (1) clearing call-clobbered registers and then restoring call-preserved ones >> (2) restoring call-preserved registers and then clearing call-clobbered ones >> >> Is one option more likely to be useful to attackers than the other? for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing call-clobbered register sequence will be immediately before “ret” instruction. This will prevent the gadget from doing any useful things. >> >> (For some frames, it might be necessary to use a small number of >> call-clobbered registers to perform the restore sequence, so (1) >> wouldn't be fully achievable in all cases.) > Yes, looks like that (1) is also not correct. > The same is true for what you have to do *after* restoring registers, as > I said before. Clearing all is not correct in all cases, and also it is > not useful in all cases (code right after it might write the registers > again. I don’t understand why it’s not correct if we clearing call-clobbered registers AFTER restoring call-preserved registers? Even though we might need to use some call-clobbered registers to restore the call-preserved registers, after the restoring is done, we can use data flow to make sure the call-clobbered registers not lived at that point anymore, then Clearing those not-lived call-clobbered registers immediately before “ret”. For me, this should be correct. Let me know anything I am missing here. Thanks. Qing > > This really is very (sub-)target-specific, it cannot be done by generic > code on its own *at all*. > > > Segher
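Schematically (x86-64 flavored, with made-up register choices, purely illustrative), the two orderings Richard lists end a function like this:

```
    (1) clear, then restore           (2) restore, then clear
        xorl %ecx, %ecx                   popq %rbx        # call-preserved
        xorl %edx, %edx                   popq %rbp
        popq %rbx  # call-preserved       xorl %ecx, %ecx  # call-clobbered
        popq %rbp                         xorl %edx, %edx
        ret                               ret
```

In (2) the clearing sits immediately before the ret, so a ROP gadget entered through that ret finds the scratch registers zeroed; in (1) the pops after the clearing reload state from the stack, which is the point of Qing's argument above.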
Re: [PATCH] Enable GCC support for AMX
Hi Thanks for your review, and sorry for the late reply. It took a while to finish the runtime test. > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index 797f0ad5edd..d0e59e86a5c 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -412,7 +412,7 @@ i[34567]86-*-*) > > waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h > > avx512bf16intrin.h enqcmdintrin.h serializeintrin.h > > avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h > > -tsxldtrkintrin.h" > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h > > amxbf16intrin.h" > > Line more than 80 chars. > > > ;; > > x86_64-*-*) > > cpu_type=i386 > > @@ -447,7 +447,7 @@ x86_64-*-*) > > waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h > > avx512bf16intrin.h enqcmdintrin.h serializeintrin.h > > avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h > > -tsxldtrkintrin.h" > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h > > amxbf16intrin.h" > > Ditto. Changed. > > > diff --git a/gcc/config/i386/amxbf16intrin.h > > b/gcc/config/i386/amxbf16intrin.h > > new file mode 100644 > > index 000..df0e2262d50 > > --- /dev/null > > +++ b/gcc/config/i386/amxbf16intrin.h > > @@ -0,0 +1,25 @@ > > +#if !defined _IMMINTRIN_H_INCLUDED > > +#error "Never use directly; include > > instead." > > +#endif > > + > > +#ifndef _AMXBF16INTRIN_H_INCLUDED > > +#define _AMXBF16INTRIN_H_INCLUDED > > + > > +#if !defined(__AMX_BF16__) > > +#pragma GCC push_options > > +#pragma GCC target("amx-bf16") > > +#define __DISABLE_AMX_BF16__ > > +#endif /* __AMX_BF16__ */ > > + > > +#if defined(__x86_64__) && defined(__AMX_BF16__) > > +#define _tile_dpbf16ps(dst,src1,src2) > > \ > > + __asm__ volatile\ > > + ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", > > %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::) > > +#endif > > I hope in future we'll replace it with unspecs at least... Currently we think it is redundant to add builtins with just const int parameters, which are supposed to be replaced in the future. 
> > diff --git a/gcc/config/i386/i386.opt
> > index c9f7195d423..9389dc24948 100644
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index bca8c856dc8..a46e31f5862 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > -mvpclmulqdq -mavx512bitalg -mmovdiri -mmovdir64b -mavx512vpopcntdq @gol
> > -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
> > -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > +-mamx-tile -mamx-int8 -mamx-bf16@gol
>
> Add space please.

Changed.

> > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > new file mode 100644
> > index 000..605a44df3f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do assemble { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > +/* { dg-require-effective-target amx_bf16 } */
> > +#include"amxbf16-asmintel-1.c"
>
> I didn't get it. We usually use the second testcase to actually execute
> it and (well, a little) verify that semantics is ok. E.g. that
> operands order is correct. Could you please do that?
> This applies to all *-2.c cases.
> I've checked and looks like public SDE simulator supports AMX.

Added runtime test. Tested and passed under SDE. Also, we adjust the
intrinsic call to accept macro parameters. Updated patch.

> --
> K
> Hello,
>
> On 03 Sep 08:17, H.J. Lu wrote:
> > On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> > wrote:
> > >
> > > Hello,
> > >
> > > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > > Hi:
> > > >
> > > > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > > > which will be enabled in GLC.
> > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > components: a set of 2-dimensional registers (tiles) representing
> > > > sub-arrays from a larger 2-dimensional memory image,
> > > > and an accelerator able to operate on tiles
> > > >
> > > > Supported instructions are
> > > >
> > > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > > AMX-BF16:tdpbf16ps
> > > >
> > > > The intrinsics adopt constant tile register numbers as their input
> > > > parameters.
> > > >
> > > > For detailed information, please refer to
> > > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > > >
> > > > Bootstrap ok, regression test on i3
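Aside: the _tile_dpbf16ps macro quoted earlier builds its asm template by preprocessor stringization of the constant tile numbers. The same mechanism, shown without the inline asm (illustration only, not the intrinsic itself):

```cpp
#include <string>

// #dst turns the macro argument into a string literal, and adjacent string
// literals concatenate, yielding the full instruction text at compile time.
#define TILE_TEMPLATE(dst, src1, src2) \
  "tdpbf16ps\t%tmm" #dst ", %tmm" #src1 ", %tmm" #src2

inline std::string
tile_template_012 ()
{
  return TILE_TEMPLATE (0, 1, 2);
}
```

This is why the intrinsics require literal tile numbers: a variable cannot be stringized, which is also why the thread discusses accepting macro parameters rather than adding builtins.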
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote: > I don’t understand why it’s not correct if we clearing call-clobbered > registers > AFTER restoring call-preserved registers? Because the compiler backend (or the linker! Or the dynamic linker! Etc.) can use volatile registers for their own purposes. Like, on Power, r11 and r12 are used for various calling convention purposes; they are also used for other purposes; and r0 is used as some all-purpose volatile (it typically holds the return address near the end of a function). "Call-clobbered" is pretty meaningless. It only holds meaning for a function calling another, and then only to know which registers lose their value then. It has no semantics for other cases, like a function that will return soonish, as here. Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote: > Shrink-wrapped stuff. Quite important for performance. Not something > you can throw away. ^^^ !!! ^^^ > > Start looking at handle_simple_exit()? targetm.gen_simple_return()… > > Yes, I have been looking at this since this morning. > You are right, we also need to insert zeroing sequence before this > simple_return which the current patch missed. Please run the performance loss numbers again after you have something more realistic :-( > I am currently try to resolve this issue with the following idea: > > In the routine “thread_prologue_and_epilogue_insns”, After both > “make_epilogue_seq” and “try_shrink_wrapping” finished, > > Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), > If YES, generate the zero sequence before this RETURN insn. > > Then we should take care all the exit path that returns. > > Do you see any issue from this idea? You need to let the backend decide what to do, for this as well as for all other cases. I do not know how often I will have to repeat that. There also is separate shrink-wrapping, which you haven't touched on at all yet. Joy. Segher
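As pseudocode against GCC's CFG structures, the idea Qing describes reads roughly as follows (the helper names approximate GCC internals; this is not code from any posted patch):

```
// After make_epilogue_seq () and try_shrink_wrapping () have both run:
for each edge e in EXIT_BLOCK_PTR_FOR_FN (cfun)->preds:
    rtx_insn *insn = BB_END (e->src);
    if (ANY_RETURN_P (PATTERN (insn)))
        // Per Segher's point, the backend must decide which registers to
        // clear and how, via a target hook, not generic code:
        emit the target-chosen zeroing sequence immediately before insn
```

Separate shrink-wrapping, which Segher mentions, moves individual register saves/restores onto sub-paths of the CFG, so a single "scan exit blocks" walk still has to consult the backend about what is live on each path.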
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 11/09/20 15:41 +0100, Jonathan Wakely wrote:

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

Convert the specified duration to the target clock's duration type
before adding it to the current time in
__atomic_futex_unsigned::_M_load_when_equal_for and
_M_load_when_equal_until. This removes the risk of the timeout being
rounded down to the current time, resulting in there being no wait at
all when the duration type lacks sufficient precision to hold the
steady_clock current time.

Rather than using the style of fix from PR68519, let's expose the C++17
std::chrono::ceil function as std::chrono::__detail::ceil so that it can
be used in code compiled with earlier standards versions and simplify
the fix. This was suggested by John Salmon in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

This problem has become considerably less likely to trigger since I
switched the __atomic_futex_unsigned::__clock_t reference clock from
system_clock to steady_clock and added the loop, but the consequences of
triggering it have changed too.

By my calculations it takes just over 194 days from the epoch for the
current time not to be representable in a float. This means that
system_clock is always subject to the problem (with the standard 1970
epoch) whereas steady_clock with float duration only runs out of
resolution once the machine has been running for that long (assuming the
Linux implementation of CLOCK_MONOTONIC).

The recently-added loop in
__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
into a busy wait. Unfortunately the combination of both of these things
means that it's not possible to write a test case for this occurring in
_M_load_when_equal_until as it stands.

* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
implementation of std::chrono::ceil into private namespace so
that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); I'm testing the attached patch to fix the C++11 constexpr error, but while re-looking at the uses of __detail::ceil I noticed this is using _Duration as the target type. Shouldn't that be __clock_t::duration instead? Why do we care about the duration of the user's time_point here, rather than the preferred duration of the clock we're about to wait against? 
if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_duration<_ToDur> ceil(const duration<_Re
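A standalone illustration of the hazard __detail::ceil guards against: duration_cast truncates toward zero, so converting a timeout to a coarser duration can round it down, shortening (or eliminating) the wait. The sketch below mirrors the rounding logic of the quoted patch under different names; it is not the libstdc++ code itself.

```cpp
#include <chrono>

using std::chrono::milliseconds;
using std::chrono::seconds;

// Round up on conversion instead of truncating toward zero like
// duration_cast does; exact values pass through unchanged.
template <typename ToDur, typename Rep, typename Period>
ToDur
ceil_dur (const std::chrono::duration<Rep, Period>& d)
{
  auto to = std::chrono::duration_cast<ToDur> (d);
  if (to < d)
    return to + ToDur{1};
  return to;
}
```

For example, a 1500 ms timeout becomes 1 s under duration_cast (waking up to half a second early) but 2 s under ceil, which never produces an earlier deadline.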
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes: >> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool >> wrote: >> >> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote: >>> This might have already been discussed/answered, sorry, but: >>> when there's a choice, is there an obvious winner between: >>> >>> (1) clearing call-clobbered registers and then restoring call-preserved ones >>> (2) restoring call-preserved registers and then clearing call-clobbered ones >>> >>> Is one option more likely to be useful to attackers than the other? > > for mitigating ROP purpose, I think that (2) is better than (1). i.e, the > clearing > call-clobbered register sequence will be immediately before “ret” > instruction. > This will prevent the gadget from doing any useful things. OK. The reason I was asking was that (from the naive perspective of someone not well versed in this stuff): if the effect of one of the register restores is itself a useful gadget, the clearing wouldn't protect against it. But if the register restores are not part of the intended effect, it seemed that having them immediately before the ret might make the gadget harder to use than clearing registers would, because the side-effects of restores would be harder to control than the (predictable) effect of clearing registers. But like I say, this is very much not my area of expertise, so that's probably missing the point in a major way. ;-) I think the original patch plugged into pass_thread_prologue_and_epilogue, is that right? If we go for (2), then I think it would be better to do it at the start of pass_late_compilation instead. (Some targets wouldn't cope with doing it later.) The reason for doing it so late is that the set of used “volatile”/caller-saved registers is not fixed at prologue and epilogue generation: later optimisation passes can introduce uses of volatile registers that weren't used previously. (Sorry if this has already been suggested.) 
Unlike Segher, I think this can/should be done in target-independent code as far as possible (like the patch seemed to do). Thanks, Richard
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi Martin,

> Can you please confirm that the difference between these two is all due to
> the last option -fno-inline-functions-called-once? Is LTO necessary? I.e., can
> you run the benchmark also built with the branch compiler and -mcpu=native
> -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?

Done, see below.

> > +--------+------------------------------------------------+------+
> > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> > +--------+------------------------------------------------+------+
> > | Branch | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> > +--------+------------------------------------------------+------+
> >
> > (Hopefully the table shows up correct)
>
> it does show OK for me, thanks.
>
> > It looks like your patch definitely does improve the basic cases. So
> > there's not much difference between lto and non-lto anymore and it's
> > much better than GCC 10. However it still contains the regression
> > introduced by Honza's changes.
>
> I assume these are rates, not times, so negative means bad. But do I
> understand it correctly that you're comparing against GCC 10 with the two
> parameters set to rather special values? Because your table seems to
> indicate that even for you, the branch is faster than GCC 10 with just
> -mcpu=native -Ofast -fomit-frame-pointer.

Yes, these are indeed rates, and indeed I am comparing against the same
options we used to get the fastest rates before, which is the two parameters
and the inline flag.

> So is the problem that the best obtainable run-time, even with obscure
> options, from the branch is slower than the best obtainable run-time from
> GCC 10?

Yeah, that's the problem: when we compare the two we're still behind.
I've done the additional two runs

+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Compiler | Flags                                                                                                                                        | diff GCC 10 |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |             |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                     | -44%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                               | -36%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                 | -22%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto
Re: [PATCH, rs6000] testsuite fixup pr96139 tests
Hi! On Fri, Sep 11, 2020 at 09:44:54AM -0500, will schmidt wrote: > As reported, the recently added pr96139 tests will fail on older targets > because the tests are missing the appropriate -mvsx or -maltivec > options. > This adds the options and clarifies the dg-require statements. > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > * gcc.target/powerpc/pr96139-a.c: specify -maltivec option and > require. (Capital "S" on each line.) ("and require" isn't clear... "dg-require"?) (And the files are -a, -b, -c). > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > index b3daee4..12a3383 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -Wall -m32" } */ > +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ > /* { dg-require-effective-target ilp32 } */ > -/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ This uses vector long long. Okay. > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c Ditto. > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > index 2464b8d..3ada260 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > @@ -1,7 +1,7 @@ > /* { dg-do run } */ > -/* { dg-options "-O2 -Wall" } */ > +/* { dg-options "-O2 -Wall -maltivec" } */ > /* { dg-require-effective-target powerpc_altivec_ok } */ But this one as well, why does it not need VSX like the rest? Segher
[gcc-7-arm] Backport Neoverse-N1 tuning
Hi, The attached patches bring the description of Ares and Neoverse-N1 to the gcc-7-arm vendor branch. There were 2 changes to adjust the first patch to the older code in gcc-7. Instead of: + "32:16", /* function_align. */ + "32:16", /* jump_align. */ + "32:16", /* loop_align. */ the patch has: + 32, /* function_align. */ + 32, /* jump_align. */ + 32, /* loop_align. */ The syntax for “32:16” was added in gcc-8: +@itemx -falign-functions=@var{n}:@var{m} Align the start of functions to the next power-of-two greater than +@var{n}, skipping up to @var{m}-1 bytes. This ensures that at least +the first @var{m} bytes of the function can be fetched by the CPU +without crossing an @var{n}-byte alignment boundary. +If @var{m} is not specified, it defaults to @var{n}. + +Examples: @option{-falign-functions=32} aligns functions to the next +32-byte boundary, @option{-falign-functions=24} aligns to the next +32-byte boundary only if this can be done by skipping 23 bytes or less, +@option{-falign-functions=32:7} aligns to the next +32-byte boundary only if this can be done by skipping 6 bytes or less. The second adjustment is due to the fact that gcc-7-branch does not yet support several of the features listed in gcc-8. Instead of: AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE the patch has: AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 The patches pass bootstrap and regression test on aarch64-linux Graviton2. Ok to commit to gcc-7-arm vendor branch? Thanks, Sebastian 0001-AArch64-Initial-mcpu-ares-tuning.patch Description: 0001-AArch64-Initial-mcpu-ares-tuning.patch 0002-AArch64-Add-support-for-Neoverse-N1.patch Description: 0002-AArch64-Add-support-for-Neoverse-N1.patch 0003-AArch64-Set-jump-align-4-for-neoversen1.patch Description: 0003-AArch64-Set-jump-align-4-for-neoversen1.patch 0004-AArch64-Enable-compare-branch-fusion.patch Description: 0004-AArch64-Enable-compare-branch-fusion.patch
[Patch] OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]
This is a first attempt to improve the OpenMP mapping for allocatables and pointers; there are some more issues – cf. PR and for scalars PR 97021.

In real-world code, a usage like the following is not uncommon:

  real, allocatable :: A(:,:)
  !$omp target enter data map(to: A)

This maps an unallocated array (a.data == NULL), the array descriptor itself ("a", pointer set) and then pointer-associates on the device the mapped data (well, NULL) with the device's "a.data". That works well – and one can now use A on the device and allocate it (and, before 'end target', deallocate it).

However, many programs now do on the host:

  allocate(A(n,m))
  !$omp target
    do i = ...
      do j = ...
        A(j,i) = ...
  !$omp end target

which gets an implicit "map(tofrom:A)". While "a.data" now gets mapped, the "a" is not updated as it is already present, and pointer-setting 'a.data' on the device is also not needed as it is already there. As written, such code is rather common and other compilers handle this.

The Fortran spec between OpenMP 4.5 and TR 8 is a bit unclear; in TR 9 (not yet available), the code above is only valid with map(always, tofrom: A) (or 'to:'), where the 'always' is required. The general notion is that it should also be valid for the case above, but allocatable components of derived types should not always be rechecked/remapped every time map(dt) is used. — Hence, this was deferred and only the 'always' part was clarified in the draft for the upcoming TR 9.

Additionally, for POINTER there is already the following wording in the spec, which implies that the pointer has to be (potentially) updated every time:

"If a list item in a map clause is an associated pointer and the pointer is not the base pointer of another list item in a map clause on the same construct, then it is treated as if its pointer target is implicitly mapped in the same clause. For the purposes of the map clause, the mapped pointer target is treated as if its base pointer is the associated pointer."

OK?
Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]

gcc/fortran/ChangeLog:

	PR fortran/96668
	* trans-openmp.c (gfc_omp_finish_clause): Use GOMP_MAP_ALWAYS_POINTER
	with PSET for pointers.
	(gfc_trans_omp_clauses): Likewise and also if the always modifier is
	used.

gcc/ChangeLog:

	PR fortran/96668
	* gimplify.c (gimplify_scan_omp_clauses): Handle
	GOMP_MAP_ALWAYS_POINTER like GOMP_MAP_POINTER for target exit data.

include/ChangeLog:

	PR fortran/96668
	* gomp-constants.h (GOMP_MAP_ALWAYS_POINTER_P): New define.

libgomp/ChangeLog:

	PR fortran/96668
	* libgomp.h (struct target_var_desc): Add has_null_ptr_assoc member.
	* target.c (gomp_map_vars_existing): Add always_to_flag flag.
	(gomp_map_vars_existing): Update call to it.
	(gomp_map_fields_existing): Likewise.
	(gomp_map_vars_internal): Update PSET handling such that if a nullptr
	is now allocated or if GOMP_MAP_POINTER is used PSET is updated and
	pointer remapped.
	(GOMP_target_enter_exit_data): Handle GOMP_MAP_ALWAYS_POINTER like
	GOMP_MAP_POINTER.
	* testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test.

 gcc/fortran/trans-openmp.c                         |  28 +++-
 gcc/gimplify.c                                     |   1 +
 include/gomp-constants.h                           |   3 +
 libgomp/libgomp.h                                  |   3 +
 libgomp/target.c                                   | 173 -
 .../testsuite/libgomp.fortran/map-alloc-ptr-1.f90  | 114 ++
 6 files changed, 282 insertions(+), 40 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 0e1da04..268467d 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1357,6 +1357,15 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   tree type = TREE_TYPE (decl);
   tree ptr = gfc_conv_descriptor_data_get (decl);

+  /* OpenMP: automatically map pointer targets with the pointer;
+     hence, always update the descriptor/pointer itself.
+ NOTE: This also remaps the pointer for allocatable arrays with + 'target' attribute which also don't have the 'restrict' qualifier. */ + bool always_modifier = false; + + if (flag_openmp && !(TYPE_QUALS (TREE_TYPE (ptr)) & TYPE_QUAL_RESTRICT)) + always_modifier = true; + if (present) ptr = gfc_build_cond_assign_expr (&block, present, ptr, null_pointer_node); @@ -1376,7 +1385,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p) OMP_CLAUSE_DECL (c2) = decl; OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (type); c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (c3, GOMP_MAP_POINTER); + OMP_CLAUSE_SET_MAP_KIND (c3, always_modifier ? GOMP_MAP_ALWAYS_POINTER + : GOMP_MAP_P
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 11/09/20 18:22 +0100, Jonathan Wakely wrote:
On 11/09/20 15:41 +0100, Jonathan Wakely wrote:
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

Convert the specified duration to the target clock's duration type before adding it to the current time in __atomic_futex_unsigned::_M_load_when_equal_for and _M_load_when_equal_until. This removes the risk of the timeout being rounded down to the current time, resulting in there being no wait at all when the duration type lacks sufficient precision to hold the steady_clock current time.

Rather than using the style of fix from PR68519, let's expose the C++17 std::chrono::ceil function as std::chrono::__detail::ceil so that it can be used in code compiled with earlier standards versions and simplify the fix. This was suggested by John Salmon in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

This problem has become considerably less likely to trigger since I switched the __atomic_futex_unsigned::__clock_t reference clock from system_clock to steady_clock and added the loop, but the consequences of triggering it have changed too. By my calculations it takes just over 194 days from the epoch for the current time not to be representable in a float. This means that system_clock is always subject to the problem (with the standard 1970 epoch) whereas steady_clock with float duration only runs out of resolution if the machine has been running for that long (assuming the Linux implementation of CLOCK_MONOTONIC).

The recently-added loop in __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario into a busy wait. Unfortunately the combination of both of these things means that it's not possible to write a test case for this occurring in _M_load_when_equal_until as it stands.

* libstdc++-v3/include/std/chrono: (__detail::ceil) Move implementation of std::chrono::ceil into private namespace so that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); I'm testing the attached patch to fix the C++11 constexpr error, but while re-looking at the uses of __detail::ceil I noticed this is using _Duration as the target type. Shouldn't that be __clock_t::duration instead? Why do we care about the duration of the user's time_point here, rather than the preferred duration of the clock we're about to wait against? 
if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_dura
Re: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
On Fri, 11 Sep 2020 at 14:35, Kyrylo Tkachov wrote:
>
> Hi Christophe,
>
> > -----Original Message-----
> > From: Gcc-patches On Behalf Of Christophe Lyon via Gcc-patches
> > Sent: 11 September 2020 13:23
> > To: gcc-patches@gcc.gnu.org; i...@airs.com
> > Subject: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
> >
> > This patch adds the missing prototypes for the functions defined in fp16.c,
> > to avoid these warnings during the build:
> > /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes]
> >
> > 2020-09-11  Torbjörn SVENSSON
> > 	    Christophe Lyon
> >
> > libgcc/
> > 	* config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee)
> > 	(__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative)
> > 	(__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes.
> > ---
> >  libgcc/config/arm/fp16.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
> > index e8f7afb..3d1d8f4 100644
> > --- a/libgcc/config/arm/fp16.c
> > +++ b/libgcc/config/arm/fp16.c
> > @@ -52,6 +52,15 @@ binary64 =
> >    52 /* significand.
*/ > > }; > > > > +/* Function prototypes */ > > +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); > > I think this function should just be marked as static, as it is truly > internal to this file, whereas the other ones can be called from code emitted > in config/arm/arm.c > Right, this updated patch does that, like for the other functions in the same file. OK? Thanks Christophe > Thanks, > Kyrill > > > +unsigned short __gnu_f2h_ieee(unsigned int a); > > +unsigned int __gnu_h2f_ieee(unsigned short a); > > +unsigned short __gnu_f2h_alternative(unsigned int x); > > +unsigned int __gnu_h2f_alternative(unsigned short a); > > +unsigned short __gnu_d2h_ieee (unsigned long long a); > > +unsigned short __gnu_d2h_alternative (unsigned long long x); > > + > > static inline unsigned short > > __gnu_float2h_internal (const struct format* fmt, > > unsigned long long a, int ieee) > > -- > > 2.7.4 > From 8722a517e051309d6b021107273210d83a1a1ae6 Mon Sep 17 00:00:00 2001 From: Christophe Lyon Date: Fri, 11 Sep 2020 11:43:56 + Subject: [PATCH] libgcc/config/arm/fp16.c: Make _internal functions static inline MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch makes the *_internal functions 'static inline' to avoid these warnings during the build: /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes] 
/libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes] 2020-09-11 Torbjörn SVENSSON Christophe Lyon libgcc/ * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) (__gnu_d2h_ieee, __gnu_d2h_alternative): Add 'static inline' qualifier. --- libgcc/config/arm/fp16.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c index e8f7afb..3745671 100644 --- a/libgcc/config/arm/fp16.c +++ b/libgcc/config/arm/fp16.c @@ -165,7 +165,7 @@ __gnu_d2h_internal (unsigned long long a, int ieee) return __gnu_float2h_internal (&binary64, a, ieee); } -unsigned int +static inline unsigned int __gnu_h2f_internal(unsigned short a, int ieee) { unsigned int sign = (unsigned int)(a & 0x8000) << 16; @@ -190,37 +190,37 @@ __gnu_h2f_internal(unsigned short a, int ieee) return sign | (((aexp + 0x70) << 23) + (mantissa << 13)); }
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool > wrote: > > On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote: >> I don’t understand why it’s not correct if we clearing call-clobbered >> registers >> AFTER restoring call-preserved registers? > > Because the compiler backend (or the linker! Or the dynamic linker! > Etc.) can use volatile registers for their own purposes. For the following sequence at the end of a routine: *...* “restore call-preserved registers” *clear call-clobbered registers"* *ret* “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine. If the call-clobbered register is live at the end of the routine, for example, holding the return value, It will NOT be cleared at all. If the call-clobbered register has some other usage after the routine return, then the backend should know this and will not clear it. Then we will resolve this issue, right? > > Like, on Power, r11 and r12 are used for various calling convention > purposes; they are also used for other purposes; and r0 is used as some > all-purpose volatile (it typically holds the return address near the > end of a function). In the new version of the patch, the implementation of clearing call-clobbered registers is done in backend, middle end only computes a hard register set based on user option, source attribute, data flow information, and function abi information, and Then pass this hard register set to the target hook to generate the clearing sequence. The backend will have all the details on the special situations you mentioned. Let me know any more concerns here. thanks. Qing > > "Call-clobbered" is pretty meaningless. It only holds meaning for a > function calling another, and then only to know which registers lose > their value then. It has no semantics for other cases, like a function > that will return soonish, as here. > > > Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 12:18 PM, Segher Boessenkool > wrote: > > On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote: >> Shrink-wrapped stuff. Quite important for performance. Not something >> you can throw away. > > ^^^ !!! ^^^ > >>> Start looking at handle_simple_exit()? targetm.gen_simple_return()… >> >> Yes, I have been looking at this since this morning. >> You are right, we also need to insert zeroing sequence before this >> simple_return which the current patch missed. > > Please run the performance loss numbers again after you have something > more realistic :-( Yes, I will collect the performance data with the new patch. > >> I am currently try to resolve this issue with the following idea: >> >> In the routine “thread_prologue_and_epilogue_insns”, After both >> “make_epilogue_seq” and “try_shrink_wrapping” finished, >> >> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), >> If YES, generate the zero sequence before this RETURN insn. >> >> Then we should take care all the exit path that returns. >> >> Do you see any issue from this idea? > > You need to let the backend decide what to do, for this as well as for > all other cases. I do not know how often I will have to repeat that. Yes, the new patch will separate the whole task into two parts: A. Compute the hard register set based on user option, source code attribute, data flow information, function abi information, The result will be “need_zeroed_register_set”, and then pass this hard reg set to the target hook. B. Each target will have it’s own implementation of emitting the zeroing sequence based on the “need_zeroed_register_set”. > > There also is separate shrink-wrapping, which you haven't touched on at > all yet. Joy. Yes, in addition to shrink-wrapping, I also noticed that there are other places that generate “simple_return” or “return” that are not in The epilogue, for example, in “dbr” phase (delay_slots phase), in “mach” phase (machine reorg phase), etc. 
So, only generating the zeroing sequence in the epilogue is not enough.

Hongjiu and I discussed this more, and we came up with a new implementation; I will describe this new implementation in another email later.

Thanks.

Qing

>
>
> Segher
[aarch64] Backport missing NEON intrinsics to GCC8
Hi, gcc-8 branch is missing NEON intrinsics for loads and stores. Attached patches pass bootstrap and regression testing on Graviton2 aarch64-linux. Ok to commit to gcc-8 branch? Thanks, Sebastian 0001-Patch-implementing-vld1_-_x3-vst1_-_x2-and-vst1_-_x3.patch Description: 0001-Patch-implementing-vld1_-_x3-vst1_-_x2-and-vst1_-_x3.patch 0002-add-intrinsics-for-vld1-q-_x4-and-vst1-q-_x4.patch Description: 0002-add-intrinsics-for-vld1-q-_x4-and-vst1-q-_x4.patch
Re: [PATCH, rs6000] testsuite fixup pr96139 tests
On Fri, 2020-09-11 at 12:37 -0500, Segher Boessenkool wrote: > Hi! > > On Fri, Sep 11, 2020 at 09:44:54AM -0500, will schmidt wrote: > > As reported, the recently added pr96139 tests will fail on older > > targets > > because the tests are missing the appropriate -mvsx or -maltivec > > options. > > This adds the options and clarifies the dg-require statements. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > > * gcc.target/powerpc/pr96139-a.c: specify -maltivec option and > > require. > > (Capital "S" on each line.) > ("and require" isn't clear... "dg-require"?) > (And the files are -a, -b, -c). Thanks for the review. I've updated that to read: gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96139-a.c: Specify -mvsx option and update the dg-require stanza to match. * gcc.target/powerpc/pr96139-b.c: Same. * gcc.target/powerpc/pr96139-c.c: Specify -maltivec option and update the dg-require stanza to match. > > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > index b3daee4..12a3383 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > @@ -1,9 +1,9 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -Wall -m32" } */ > > +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ > > /* { dg-require-effective-target ilp32 } */ > > -/* { dg-require-effective-target powerpc_altivec_ok } */ > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > This uses vector long long. Okay. > > > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > > Ditto. 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > index 2464b8d..3ada260 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -Wall" } */
> > +/* { dg-options "-O2 -Wall -maltivec" } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
>
> But this one as well, why does it not need VSX like the rest?

I made these changes based on the failures reported when I tested against older targets, and that's all it seems to want. :-)

Upon further investigation, it appears that the logic simplifies down to a BIT_FIELD_REF against the llfoo symbol and the call to printf. The actual code generation consists entirely of non-vector instructions (lis, ori, std, addi, ... bl, ...). So it simply does not need the vsx support. It is still sufficient to trigger the initially reported error, so it does still serve that purpose.

thanks,
-Will

>
> Segher