Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 3:36 PM, Segher Boessenkool wrote:
>
> On Fri, Sep 11, 2020 at 03:17:19PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 3:05 PM, Segher Boessenkool wrote:
>>>
>>> On Fri, Sep 11, 2020 at 02:40:06PM -0500, Qing Zhao wrote:
>>>>> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool wrote:
>>>>>
>>>>> On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote:
>>>>>> I don't understand why it's not correct if we clear call-clobbered
>>>>>> registers AFTER restoring call-preserved registers?
>>>>>
>>>>> Because the compiler backend (or the linker!  Or the dynamic
>>>>> linker!  Etc.) can use volatile registers for their own purposes.
>>>>
>>>> For the following sequence at the end of a routine:
>>>>
>>>>     ...
>>>>     restore call-preserved registers
>>>>     clear call-clobbered registers
>>>>     ret
>>>>
>>>> "Clear call-clobbered registers" will only clear the call-clobbered
>>>> registers that are not live at the end of the routine.
>>>
>>> And they can be written again right after the routine, by
>>> linker-generated code for example.  This is a waste.
>>>>
>>>> In the new version of the patch, the implementation of clearing
>>>> call-clobbered registers is done in the backend; the middle end only
>>>> computes a hard register set based on the user option, source
>>>> attribute, data flow information, and function ABI information, and
>>>> then passes this hard register set to the target hook to generate
>>>> the clearing sequence.  The backend will have all the details on the
>>>> special situations you mentioned.  Let me know of any more concerns
>>>> here.
>>>
>>> I cannot find that patch?
>>
>> Haven't finished yet. -:)
>
> Ah okay :-)
>
> If you have, please send it in a new thread (not as a reply)?  So that
> it will be much easier to handle :-)

Okay.  Will do.

Qing

>
> Segher
[PATCH] libstdc++: only pull in bits/align.h if C++11 or later
libstdc++-v3/ChangeLog:

	* include/std/memory: Move #include <bits/align.h> inside C++11
	conditional includes.

Tested x86_64-pc-linux-gnu, committed to master.

---
 libstdc++-v3/include/std/memory | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index a56952fb114..aee7b050bd7 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -61,7 +61,6 @@
  */
 #include
-#include <bits/align.h>
 #include
 #include
 #include
@@ -75,6 +74,7 @@
 # include // std::basic_ostream
 # include
 # include
+# include <bits/align.h>
 # include
 # include // std::less
 # include
-- 
2.26.2
i386: Fix array index in expander
I noticed a compiler warning about an out-of-bounds access.  Fixed thusly.

gcc/
	* config/i386/sse.md (mov): Fix operand indices.

pushed as obvious

-- 
Nathan Sidwell

diff --git i/gcc/config/i386/sse.md w/gcc/config/i386/sse.md
index a728b979f01..a784346a23b 100644
--- i/gcc/config/i386/sse.md
+++ w/gcc/config/i386/sse.md
@@ -23491,7 +23491,7 @@ (define_expand "mov"
     (match_operand:MASK_DWI 1 "nonimmediate_operand"))]
   "TARGET_AVX512VP2INTERSECT"
 {
-  if (MEM_P (operands[1]) && MEM_P (operands[2]))
+  if (MEM_P (operands[0]) && MEM_P (operands[1]))
     operands[1] = force_reg (mode, operands[1]);
 })
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
>
> Hi!
>
> On Fri, Sep 11, 2020 at 03:14:57PM -0500, Qing Zhao wrote:
>> My understanding of how this scheme helps ROP is: the attacker usually
>> uses scratch registers to pass
>
> Help obstruct ROP ;-)

Thanks for catching my mistake.

>> parameters to the sys call in the gadget; if we clear the scratch
>> registers immediately before "ret", then the parameters that are
>> passed to the sys call will be destroyed, and therefore the attack
>> will likely fail.
>
> But you do not need more than one non-zero argument for execv*, and
> that is usually the same register as the normal return value register;
> all other registers *should* be zero for a simple
> execv*("/bin/sh", ...)!
>
> (There is also the system call number register, rax on x86-64, but if
> overwriting that would be at all effective, you could just do that one
> always and everywhere.  This is only an effective defence if there are
> no gadgets that do the system call an attacker wants, and he has to
> construct that sequence himself; but it is very effective and cheap
> then.)

In the above, do you mean that only clearing "rax" on x86-64 should be
effective enough?

Qing

>
> Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes:
>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford wrote:
>>
>> If we go for (2), then I think it would be better to do it at the
>> start of pass_late_compilation instead.  (Some targets wouldn't cope
>> with doing it later.)  The reason for doing it so late is that the set
>> of used "volatile"/caller-saved registers is not fixed at prologue and
>> epilogue generation: later optimisation passes can introduce uses of
>> volatile registers that weren't used previously.  (Sorry if this has
>> already been suggested.)
>
> Yes, I agree.
>
> I thought that it might be better to move this task to very late in
> the RTL stage, i.e., before the "final" phase.
>
> Another solution is (discussed with Hongjiu):
>
> 1. Define a new target hook:
>
>    targetm.return_with_zeroing (bool simple_return_p,
>                                 HARD_REG_SET need_zeroed_hardregs,
>                                 bool gpr_only)
>
> 2. Add the following routine in the middle end:
>
>    rtx_insn *
>    generate_return_rtx (bool simple_return_p)
>    {
>      if (targetm.return_with_zeroing)
>        {
>          /* Compute the hard regs set for clearing into
>             "need_zeroed_hardregs".  */
>          return targetm.return_with_zeroing (simple_return_p,
>                                              need_zeroed_hardregs,
>                                              gpr_only);
>        }
>      else
>        {
>          if (simple_return_p)
>            return targetm.gen_simple_return ();
>          else
>            return targetm.gen_return ();
>        }
>    }
>
>    Then replace all calls to "targetm.gen_simple_return" and
>    "targetm.gen_return" with "generate_return_rtx ()".
>
> 3. In the target, implement "return_with_zeroing".
>
> Let me know your comments on this.

I think having a separate pass is better.  We don't normally know at the
point of generating the return which registers will need to be cleared.
So IMO the pass should just search for all the returns in a function and
insert the zeroing instructions before each one.

Having a target hook sounds good, but I think it should have a default
definition that just uses the move patterns to zero each selected
register.  I expect the default will be good enough for most targets.

Thanks,
Richard
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
> > On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
> >> The parameters that are passed to the sys call will be destroyed;
> >> therefore, the attack will likely fail.
> >
> > But you do not need more than one non-zero argument for execv*, and
> > that is usually the same register as the normal return value
> > register; all other registers *should* be zero for a simple
> > execv*("/bin/sh", ...)!
> >
> > (There is also the system call number register, rax on x86-64, but if
> > overwriting that would be at all effective, you could just do that
> > one always and everywhere.  This is only an effective defence if
> > there are no gadgets that do the system call an attacker wants, and
> > he has to construct that sequence himself; but it is very effective
> > and cheap then.)
>
> In the above, do you mean only clearing "rax" on x86-64 should be
> effective enough?

(rax=0 is "read"; you might want to use another value, but that's just
details.)

"This is only an effective defence if there are no gadgets that do the
system call an attacker wants, and he has to construct that sequence
himself; but it is very effective and cheap then."

It is definitely *not* effective if there are gadgets that set rax to a
value the attacker wants and then do a syscall.  It of course is quite
effective in breaking a ROP chain of (set rax) -> (syscall).  How
effective it is in practice, I have no idea.

My point was that your proposed scheme does not protect the other
syscall parameters very much either.

And, hrm, rax is actually the first return value.  On most ABIs the same
registers are used for arguments and for return values; I got confused.
Sorry.  So this cannot be very effective for x86-64 no matter what.


Segher
[PATCH] -fprofile-reproducible: fix option value handling
From: Sergei Trofimovich

Before the change, option handling did not accept an argument:

    xgcc: error: unknown profile reproducibility method '=serial'
    xgcc: note: valid arguments to '-fprofile-reproducible' are:
    multithreaded parallel-runs serial; did you mean 'serial'?

The change also includes the trailing '=' as part of the option prefix.

gcc/ChangeLog:

	* common.opt: Fix handling of '-fprofile-reproducible' option.
---
 gcc/common.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index dd68c61ae1d..84bf521128d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2228,7 +2228,7 @@ Enum(profile_reproducibility) String(parallel-runs) Value(PROFILE_REPRODUCIBILIT
 EnumValue
 Enum(profile_reproducibility) String(multithreaded) Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
 
-fprofile-reproducible
+fprofile-reproducible=
 Common Joined RejectNegative Var(flag_profile_reproducible) Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
 -fprofile-reproducible=[serial|parallel-runs|multithreaded]	Control level of reproducibility of profile gathered by -fprofile-generate.
-- 
2.28.0
[PATCH] doc: fix spelling of -fprofile-reproducibility
From: Sergei Trofimovich

gcc/ChangeLog:

	* doc/invoke.texi: Fix '-fprofile-reproducible' option spelling
	in manual.
---
 gcc/doc/invoke.texi | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..183ce7715d1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -557,7 +557,8 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} @gol
 -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol
 -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol
--fprofile-exclude-files=@var{regex} -fprofile-reproducibility @gol
+-fprofile-exclude-files=@var{regex} @gol
+-fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} @gol
 -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
 -fasan-shadow-offset=@var{number} -fsanitize-sections=@var{s1},@var{s2},... @gol
 -fsanitize-undefined-trap-on-error -fbounds-check @gol
@@ -13889,14 +13890,14 @@ any of the regular expressions (separated by semi-colons).
 For example, @option{-fprofile-exclude-files=/usr/.*} will prevent
 instrumentation of all files that are located in the @file{/usr/} folder.
 
-@item -fprofile-reproducible
+@item -fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]}
 @opindex fprofile-reproducible
 Control level of reproducibility of profile gathered by
 @code{-fprofile-generate}.  This makes it possible to rebuild program
 with same outcome which is useful, for example, for distribution
 packages.
 
-With @option{-fprofile-reproducibility=serial} the profile gathered by
+With @option{-fprofile-reproducible=serial} the profile gathered by
 @option{-fprofile-generate} is reproducible provided the trained program
 behaves the same at each invocation of the train run, it is not
 multi-threaded and profile data streaming is always done in the same
@@ -13911,7 +13912,7 @@ Such non-reproducible part of programs may be annotated by
 @option{-l} can be used to dump gathered data and verify that they are
 indeed reproducible.
 
-With @option{-fprofile-reproducibility=parallel-runs} collected profile
+With @option{-fprofile-reproducible=parallel-runs} collected profile
 stays reproducible regardless the order of streaming of the data into
 gcda files.  This setting makes it possible to run multiple instances of
 instrumented program in parallel (such as with @code{make -j}).  This
-- 
2.28.0
[PATCH] rs6000: Rename mffgpr/mftgpr instruction types
The following is mostly a mechanical change to rename the mffgpr/mftgpr
insn types to mtvsr/mfvsr to be more clear.  It also removes Power6
references to those insn types since we no longer generate those
instructions.

Bootstrap/regtest on powerpc64le with no new regressions.  Ok for trunk?

-Pat


2020-09-11  Pat Haugen

gcc/
	* config/rs6000/power10.md (power10-mffgpr, power10-mftgpr):
	Rename to mtvsr/mfvsr.
	* config/rs6000/power6.md (X2F_power6, power6-mftgpr,
	power6-mffgpr): Remove.
	* config/rs6000/power8.md (power8-mffgpr, power8-mftgpr): Rename
	to mtvsr/mfvsr.
	* config/rs6000/power9.md (power8-mffgpr, power8-mftgpr):
	Likewise.
	* config/rs6000/rs6000.c (rs6000_adjust_cost): Remove Power6
	TYPE_MFFGPR cases.
	* config/rs6000/rs6000.md (mffgpr, mftgpr, zero_extendsi2,
	extendsi2, @signbit2_dm, lfiwax, lfiwzx, *movsi_internal1,
	movsi_from_sf, *movdi_from_sf_zero_ext, *mov_internal,
	movsd_hardfloat, movsf_from_si, *mov_hardfloat64, p8_mtvsrwz,
	p8_mtvsrd_df, p8_mtvsrd_sf, p8_mfvsrd_3_, *movdi_internal64,
	unpack_dm): Rename mffgpr/mftgpr to mtvsr/mfvsr.
	* config/rs6000/vsx.md (vsx_mov_64bit, vsx_extract_,
	vsx_extract_si, *vsx_extract__p8): Likewise.
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 9f8a5825744..2b4d882e8df 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -468,13 +468,13 @@ (define_insn_reservation "power10-qpmul" 24
    (eq_attr "cpu" "power10"))
   "DU_super_power10,dfu_power10*8")
 
-(define_insn_reservation "power10-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power10-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")
 
-(define_insn_reservation "power10-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power10-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
        (eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")
 
diff --git a/gcc/config/rs6000/power6.md b/gcc/config/rs6000/power6.md
index a94ce52425f..e2e7582e67c 100644
--- a/gcc/config/rs6000/power6.md
+++ b/gcc/config/rs6000/power6.md
@@ -56,10 +56,6 @@ (define_reservation "LX2_power6"
 (define_reservation "FX2_power6"
   "iu1_power6+iu2_power6")
 
-(define_reservation "X2F_power6"
-  "(iu1_power6+iu2_power6+fpu1_power6)\
-   |(iu1_power6+iu2_power6+fpu2_power6)")
-
 (define_reservation "BX2_power6"
   "iu1_power6+iu2_power6+bpu_power6")
 
@@ -605,20 +601,3 @@ (define_bypass 6 "power6-vecperm" "power6-veccomplex" )
 (define_bypass 5 "power6-vecperm" "power6-vecstore" )
 
-(define_insn_reservation "power6-mftgpr" 8
-  (and (eq_attr "type" "mftgpr")
-       (eq_attr "cpu" "power6"))
-  "X2F_power6")
-
-(define_insn_reservation "power6-mffgpr" 14
-  (and (eq_attr "type" "mffgpr")
-       (eq_attr "cpu" "power6"))
-  "LX2_power6")
-
-(define_bypass 4 "power6-mftgpr" "power6-imul,\
-                power6-lmul,\
-                power6-imul-cmp,\
-                power6-lmul-cmp,\
-                power6-imul3,\
-                power6-idiv,\
-                power6-ldiv" )
diff --git a/gcc/config/rs6000/power8.md b/gcc/config/rs6000/power8.md
index fae2ad8bf57..a3f46c62be2 100644
--- a/gcc/config/rs6000/power8.md
+++ b/gcc/config/rs6000/power8.md
@@ -379,13 +379,13 @@ (define_insn_reservation "power8-vecdiv" 31
    (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
-(define_insn_reservation "power8-mffgpr" 5
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power8-mtvsr" 5
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
-(define_insn_reservation "power8-mftgpr" 6
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power8-mfvsr" 6
+  (and (eq_attr "type" "mfvsr")
        (eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
diff --git a/gcc/config/rs6000/power9.md b/gcc/config/rs6000/power9.md
index 2277b145308..c86d643a7d3 100644
--- a/gcc/config/rs6000/power9.md
+++ b/gcc/config/rs6000/power9.md
@@ -466,13 +466,13 @@ (define_insn_reservation "power9-qpmul" 24
    (eq_attr "cpu" "power9"))
   "DU_super_power9,dfu_power9*8")
 
-(define_insn_reservation "power9-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power9-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
        (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
-(define_insn_reservation "power9-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power9-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ca5b71ecdd3..5b0162dcccd 10
[PATCH 2/4, revised patch applied] PowerPC: Rename functions for min, max, cmove
Here is the patch that I applied:

From 1a2e0742e3e3c45f75d0ce31c45a7778c8d1f45e Mon Sep 17 00:00:00 2001
From: Michael Meissner
Date: Fri, 11 Sep 2020 16:57:13 -0400
Subject: [PATCH] PowerPC: rename some functions.

gcc/
2020-09-11  Michael Meissner

	* config/rs6000/rs6000.c (rs6000_maybe_emit_maxc_minc): Rename
	from rs6000_emit_p9_fp_minmax.  Change return type to bool.  Add
	comments to document NaN/signed zero behavior.
	(rs6000_maybe_emit_fp_cmove): Rename from rs6000_emit_p9_fp_cmove.
	(have_compare_and_set_mask): New helper function.
	(rs6000_emit_cmove): Update calls to new names and the new helper
	function.
---
 gcc/config/rs6000/rs6000.c | 93 --
 1 file changed, 69 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9908830b07a..20a4ba382bc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15057,13 +15057,33 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   return 1;
 }
 
-/* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction
-   for SF/DF scalars.  Move TRUE_COND to DEST if OP of the operands of the last
-   comparison is nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the
-   hardware has no such operation.  */
+/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
+   minimum with "C" semantics.
 
-static int
-rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Unless you use -ffast-math, you can't use these instructions to replace
+   conditions that implicitly reverse the condition because the comparison
+   might generate a NaN or signed zero.
+
+   I.e. the following can be replaced all of the time
+	ret = (op1 >  op2) ? op1 : op2	; generate xsmaxcdp
+	ret = (op1 >= op2) ? op1 : op2	; generate xsmaxcdp
+	ret = (op1 <  op2) ? op1 : op2	; generate xsmincdp
+	ret = (op1 <= op2) ? op1 : op2	; generate xsmincdp
+
+   The following can be replaced only if -ffast-math is used:
+	ret = (op1 <  op2) ? op2 : op1	; generate xsmaxcdp
+	ret = (op1 <= op2) ? op2 : op1	; generate xsmaxcdp
+	ret = (op1 >  op2) ? op2 : op1	; generate xsmincdp
+	ret = (op1 >= op2) ? op2 : op1	; generate xsmincdp
+
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if we can't generate the appropriate minimum or maximum, and
+   true if we did emit the minimum or maximum.  */
+
+static bool
+rs6000_maybe_emit_maxc_minc (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15073,14 +15093,14 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
   bool max_p = false;
 
   if (result_mode != compare_mode)
-    return 0;
+    return false;
 
   if (code == GE || code == GT)
     max_p = true;
   else if (code == LE || code == LT)
     max_p = false;
   else
-    return 0;
+    return false;
 
   if (rtx_equal_p (op0, true_cond) && rtx_equal_p (op1, false_cond))
     ;
@@ -15093,19 +15113,23 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
     max_p = !max_p;
 
   else
-    return 0;
+    return false;
 
   rs6000_emit_minmax (dest, max_p ? SMAX : SMIN, op0, op1);
-  return 1;
+  return true;
 }
 
-/* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and
-   XXSEL instructions for SF/DF scalars.  Move TRUE_COND to DEST if OP of the
-   operands of the last comparison is nonzero/true, FALSE_COND if it is
-   zero/false.  Return 0 if the hardware has no such operation.  */
+/* Possibly emit a floating point conditional move by generating a compare that
+   sets a mask instruction and a XXSEL select instruction.
 
-static int
-rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if the operation cannot be generated, and true if we could
+   generate the instruction.  */
+
+static bool
+rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15133,7 +15157,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
       break;
 
     default:
-      return 0;
+      return false;
     }
 
   /* Generate:	[(parallel [(set (dest)
@@ -15153,7 +15177,28 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
   emit_insn (gen_rtx_PARALLEL (VOIDmode,
			       gen_rtvec (2, cmove_rtx, clobber_rtx)));
-  return 1;
+  return true;
+}
+
+/* Helper function to return true if the target has instructions to do a
+   compare and set mask instruction that can
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:44 PM, Richard Sandiford wrote:
>
> Qing Zhao writes:
>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford wrote:
>>>
>>> If we go for (2), then I think it would be better to do it at the
>>> start of pass_late_compilation instead.  (Some targets wouldn't cope
>>> with doing it later.)  The reason for doing it so late is that the
>>> set of used "volatile"/caller-saved registers is not fixed at
>>> prologue and epilogue generation: later optimisation passes can
>>> introduce uses of volatile registers that weren't used previously.
>>> (Sorry if this has already been suggested.)
>>
>> Yes, I agree.
>>
>> I thought that it might be better to move this task to very late in
>> the RTL stage, i.e., before the "final" phase.
>>
>> Another solution is (discussed with Hongjiu):
>>
>> 1. Define a new target hook:
>>
>>    targetm.return_with_zeroing (bool simple_return_p,
>>                                 HARD_REG_SET need_zeroed_hardregs,
>>                                 bool gpr_only)
>>
>> 2. Add the following routine in the middle end:
>>
>>    rtx_insn *
>>    generate_return_rtx (bool simple_return_p)
>>    {
>>      if (targetm.return_with_zeroing)
>>        {
>>          /* Compute the hard regs set for clearing into
>>             "need_zeroed_hardregs".  */
>>          return targetm.return_with_zeroing (simple_return_p,
>>                                              need_zeroed_hardregs,
>>                                              gpr_only);
>>        }
>>      else
>>        {
>>          if (simple_return_p)
>>            return targetm.gen_simple_return ();
>>          else
>>            return targetm.gen_return ();
>>        }
>>    }
>>
>>    Then replace all calls to "targetm.gen_simple_return" and
>>    "targetm.gen_return" with "generate_return_rtx ()".
>>
>> 3. In the target, implement "return_with_zeroing".
>>
>> Let me know your comments on this.
>
> I think having a separate pass is better.  We don't normally know
> at the point of generating the return which registers will need
> to be cleared.

At the point of generating the return, we can compute the
"need_zeroed_hardregs" HARD_REG_SET by using data flow information, the
function ABI of the routine, and also the user option and source code
attribute information together.  This information should be available at
each point when generating the return.

> So IMO the pass should just search for all the returns in a function
> and insert the zeroing instructions before each one.

I was considering this approach too for some time; however, there is one
major issue with it, as Segher mentioned: the middle end does not know
some details about the registers, and lacking such detailed information
might result in incorrect code generation in the middle end.

For example, on the x86_64 target, when we "return" with pop, the
scratch register "ECX" will be used for returning.  Then it's incorrect
to zero "ecx" before generating the return.  Since the middle end
doesn't have such information, it cannot avoid zeroing "ecx", and
therefore incorrect code might be generated.

Segher also mentioned that on Power some scratch registers are also used
for other purposes; clearing them before the return is not correct.

> Having a target hook sounds good, but I think it should have a default
> definition that just uses the move patterns to zero each selected
> register.  I expect the default will be good enough for most targets.

Based on the above, I think that generating the zeroing instructions in
the middle end is not correct.

Thanks.

Qing
Re: [PATCH] rs6000: Rename mffgpr/mftgpr instruction types
Hi Pat,

On Fri, Sep 11, 2020 at 05:04:29PM -0500, Pat Haugen wrote:
> The following is mostly a mechanical change to rename the mffgpr/mftgpr
> insn types to mtvsr/mfvsr to be more clear.  It also removes Power6
> references to those insn types since we no longer generate those
> instructions.

> * config/rs6000/power10.md (power10-mffgpr, power10-mftgpr): Rename to
> mtvsr/mfvsr.

Please spell out the new names in full, so that they can be searched
for.

Okay for trunk.  Thank you!


Segher
[PATCH] doc: use @code{} instead of @samp{@command{}} around 'date %s'
From: Sergei Trofimovich

Before the change, 'man gcc' rendered the "SOURCE_DATE_EPOCH" section as:

    ... the output of @command{date +%s} on GNU/Linux ...

After the change it renders as:

    ... the output of "date +%s" on GNU/Linux ...

gcc/ChangeLog:

	* doc/cppenv.texi: Use @code{} instead of @samp{@command{}}
	around 'date +%s'.
---
 gcc/doc/cppenv.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/cppenv.texi b/gcc/doc/cppenv.texi
index 123df5c464c..e1e408a2974 100644
--- a/gcc/doc/cppenv.texi
+++ b/gcc/doc/cppenv.texi
@@ -89,7 +89,7 @@ reproducible.
 The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp, defined
 as the number of seconds (excluding leap seconds) since 01 Jan 1970
 00:00:00 represented in ASCII; identical to the output of
-@samp{@command{date +%s}} on GNU/Linux and other systems that support the
+@code{date +%s} on GNU/Linux and other systems that support the
 @code{%s} extension in the @code{date} command.
 
 The value should be a known timestamp such as the last modification
-- 
2.28.0
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 4:51 PM, Segher Boessenkool wrote:
>
> On Fri, Sep 11, 2020 at 04:29:16PM -0500, Qing Zhao wrote:
>>> On Sep 11, 2020, at 4:03 PM, Segher Boessenkool wrote:
>>>> The parameters that are passed to the sys call will be destroyed;
>>>> therefore, the attack will likely fail.
>>>
>>> But you do not need more than one non-zero argument for execv*, and
>>> that is usually the same register as the normal return value
>>> register; all other registers *should* be zero for a simple
>>> execv*("/bin/sh", ...)!
>>>
>>> (There is also the system call number register, rax on x86-64, but if
>>> overwriting that would be at all effective, you could just do that
>>> one always and everywhere.  This is only an effective defence if
>>> there are no gadgets that do the system call an attacker wants, and
>>> he has to construct that sequence himself; but it is very effective
>>> and cheap then.)
>>
>> In the above, do you mean only clearing "rax" on x86-64 should be
>> effective enough?
>
> (rax=0 is "read"; you might want to use another value, but that's just
> details.)
>
> "This is only an effective defence if there are no gadgets that do the
> system call an attacker wants, and he has to construct that sequence
> himself; but it is very effective and cheap then."
>
> It is definitely *not* effective if there are gadgets that set rax to
> a value the attacker wants and then do a syscall.

You mean the following gadget:

Gadget 1:
	mov	rax, value
	syscall
	ret

Qing

> It of course is quite effective in breaking a ROP chain of
> (set rax) -> (syscall).  How effective it is in practice, I have no
> idea.
>
> My point was that your proposed scheme does not protect the other
> syscall parameters very much either.
>
> And, hrm, rax is actually the first return value.  On most ABIs the
> same registers are used for arguments and for return values; I got
> confused.  Sorry.  So this cannot be very effective for x86-64 no
> matter what.
>
>
> Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes: >> On Sep 11, 2020, at 4:44 PM, Richard Sandiford >> wrote: >> >> Qing Zhao writes: On Sep 11, 2020, at 12:32 PM, Richard Sandiford >> If we go for (2), then I think it would be better to do it at the start of pass_late_compilation instead. (Some targets wouldn't cope with doing it later.) The reason for doing it so late is that the set of used “volatile”/caller-saved registers is not fixed at prologue and epilogue generation: later optimisation passes can introduce uses of volatile registers that weren't used previously. (Sorry if this has already been suggested.) >>> >>> Yes, I agree. >>> >>> I thought that it might be better to move this task at the very late of the >>> RTL stage, i.e, before “final” phase. >>> >>> Another solution is (discussed with Hongjiu): >>> >>> 1. Define a new target hook: >>> >>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET >>> need_zeroed_hardregs, bool gpr_only) >>> >>> 2. Add the following routine in middle end: >>> >>> rtx_insn * >>> generate_return_rtx (bool simple_return_p) >>> { >>> if (targetm.return_with_zeroing) >>>{ >>> Compute the hardregs set for clearing into “need_zeroed_hardregs”; >>> return targetm.return_with_zeroing (simple_return_p, >>> need_zeroed_hardregs, gpr_only); >>> } >>> else >>>{ >>> if (simple_return_p) >>> return targetm.gen_simple_return ( ); >>>else >>> return targetm.gen_return (); >>> } >>> } >>> >>> Then replace all call to “targetm.gen_simple_return” and >>> “targetm.gen_return” to “generate_return_rtx()”. >>> >>> 3. In the target, >>> Implement “return_with_zeroing”. >>> >>> >>> Let me know your comments on this. >> >> I think having a separate pass is better. We don't normally know >> at the point of generating the return which registers will need >> to be cleared. 
> > At the point of generating the return, we can compute the > “need_zeroed_hardregs” HARD_REG_SET > by using data flow information, the function abi of the routine, and also the > user option and source code > attribute information together. These information should be available at each > point when generating the return. Like I mentioned earlier though, passes that run after pass_thread_prologue_and_epilogue can use call-clobbered registers that weren't previously used. For example, on x86_64, the function might not use %r8 when the prologue, epilogue and returns are generated, but pass_regrename might later introduce a new use of %r8. AIUI, the “used” version of the new command-line option is supposed to clear %r8 in these circumstances, but it wouldn't do so if the data was collected at the point that the return is generated. That's why I think it's more robust to do this later (at the beginning of pass_late_compilation) and insert the zeroing before returns that already exist. >> So IMO the pass should just search for all the >> returns in a function and insert the zeroing instructions before >> each one. > > I was considering this approach too for some time, however, there is one > major issue with this as > Segher mentioned, The middle end does not know some details on the registers, > lacking such > detailed information might result incorrect code generation at middle end. > > For example, on x86_64 target, when “return” with pop, the scratch register > “ECX” will be > used for returning, then it’s incorrect to zero “ecx” before generating the > return. Since middle end > doesn't have such information, it cannot avoid to zero “ecx”. Therefore > incorrect code might be > generated. > > Segher also mentioned that on Power, there are some scratch registers also > are used for > Other purpose, clearing them before return is not correct. 
But the dataflow information has to be correct between
pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise any pass
in that region could clobber the registers in the same way.

To get the registers that are live before the return, you can start
with the registers that are live out from the block that contains the
return, then “simulate” the return instruction backwards to get the set
of registers that are live before the return instruction (see
df_simulate_one_insn_backwards).

In the x86_64 case you mention, the pattern is:

(define_insn "*simple_return_indirect_internal"
  [(simple_return)
   (use (match_operand:W 0 "register_operand" "r"))]
  "reload_completed"
  …)

This (use …) tells the df machinery that the instruction needs
operand 0 (= ecx).  The process above would therefore realise that
ecx can't be clobbered.

Thanks,
Richard
[PATCH] libstdc++: Add C++2a synchronization support
From: Thomas Rodgers

This patch supersedes both the Add C++2a synchronization support patch
being replied to *and* the patch adding wait/notify_* to atomic_flag.

Add support for -
  * atomic_flag::wait/notify_one/notify_all
  * atomic::wait/notify_one/notify_all
  * counting_semaphore
  * binary_semaphore
  * latch

libstdc++-v3/ChangeLog:

        * include/Makefile.am (bits_headers): Add new header.
        * include/Makefile.in: Regenerate.
        * include/bits/atomic_base.h (__atomic_flag::wait): Define.
        (__atomic_flag::notify_one): Likewise.
        (__atomic_flag::notify_all): Likewise.
        (__atomic_base<_Itp>::wait): Likewise.
        (__atomic_base<_Itp>::notify_one): Likewise.
        (__atomic_base<_Itp>::notify_all): Likewise.
        (__atomic_base<_Ptp*>::wait): Likewise.
        (__atomic_base<_Ptp*>::notify_one): Likewise.
        (__atomic_base<_Ptp*>::notify_all): Likewise.
        (__atomic_impl::wait): Likewise.
        (__atomic_impl::notify_one): Likewise.
        (__atomic_impl::notify_all): Likewise.
        (__atomic_float<_Fp>::wait): Likewise.
        (__atomic_float<_Fp>::notify_one): Likewise.
        (__atomic_float<_Fp>::notify_all): Likewise.
        (__atomic_ref<_Tp>::wait): Likewise.
        (__atomic_ref<_Tp>::notify_one): Likewise.
        (__atomic_ref<_Tp>::notify_all): Likewise.
        (atomic_wait<_Tp>): Likewise.
        (atomic_wait_explicit<_Tp>): Likewise.
        (atomic_notify_one<_Tp>): Likewise.
        (atomic_notify_all<_Tp>): Likewise.
        * include/bits/atomic_wait.h: New file.
        * include/bits/atomic_timed_wait.h: New file.
        * include/bits/semaphore_base.h: New file.
        * include/std/atomic (atomic::wait): Define.
        (atomic::wait_one): Likewise.
        (atomic::wait_all): Likewise.
        (atomic<_Tp>::wait): Likewise.
        (atomic<_Tp>::wait_one): Likewise.
        (atomic<_Tp>::wait_all): Likewise.
        (atomic<_Tp*>::wait): Likewise.
        (atomic<_Tp*>::wait_one): Likewise.
        (atomic<_Tp*>::wait_all): Likewise.
        * include/std/latch: New file.
        * include/std/semaphore: New file.
        * include/std/version: Add __cpp_lib_semaphore and
        __cpp_lib_latch defines.
        * testsuite/29_atomics/atomic/wait_notify/atomic_refs.cc: New test.
        * testsuite/29_atomics/atomic/wait_notify/bool.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/integrals.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/floats.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
        * testsuite/29_atomics/atomic/wait_notify/generic.h: New file.
        * testsuite/29_atomics/atomic_flag/wait_notify/1.cc: New test.
        * testsuite/30_threads/semaphore/1.cc: New test.
        * testsuite/30_threads/semaphore/2.cc: Likewise.
        * testsuite/30_threads/semaphore/least_max_value_neg.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_for.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_posix.cc: Likewise.
        * testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
        * testsuite/30_threads/latch/1.cc: New test.
        * testsuite/30_threads/latch/2.cc: New test.
        * testsuite/30_threads/latch/3.cc: New test.
---
 libstdc++-v3/include/Makefile.am              | 5 +
 libstdc++-v3/include/Makefile.in              | 5 +
 libstdc++-v3/include/bits/atomic_base.h       | 195 +++-
 libstdc++-v3/include/bits/atomic_timed_wait.h | 281
 libstdc++-v3/include/bits/atomic_wait.h       | 301 ++
 libstdc++-v3/include/bits/semaphore_base.h    | 283
 libstdc++-v3/include/std/atomic               | 73 +
 libstdc++-v3/include/std/latch                | 90 ++
 libstdc++-v3/include/std/semaphore            | 92 ++
 libstdc++-v3/include/std/version              | 2 +
 .../atomic/wait_notify/atomic_refs.cc         | 103 ++
 .../29_atomics/atomic/wait_notify/bool.cc     | 59
 .../29_atomics/atomic/wait_notify/floats.cc   | 32 ++
 .../29_atomics/atomic/wait_notify/generic.cc  | 31 ++
 .../29_atomics/atomic/wait_notify/generic.h   | 160 ++
 .../atomic/wait_notify/integrals.cc           | 65
 .../29_atomics/atomic/wait_notify/pointers.cc | 59
 .../29_atomics/atomic_flag/wait_notify/1.cc   | 61
 libstdc++-v3/testsuite/30_threads/latch/1.cc  | 27 ++
 libstdc++-v3/testsuite/30_threads/latch/2.cc  | 27 ++
 libstdc++-v3/testsuite/30_threads/latch/3.cc  | 50 +++
 .../testsuite/30_threads/semaphore/1.cc       | 27 ++
 .../testsuite/30_threads/semaphore/2.cc       | 27 ++
 .../semaphore/least_max_value_neg.cc          | 30 ++
 .../30_threads/semaphore/try_acquire.cc       | 55
 .../30_threads/semaphore/try_acquire_for.cc   | 85 +
 .../30_threads/sema
[PATCH] tree-optimization/97013 - avoid duplicate 'vectorization is not profitable'
This avoids dumping 'vectorization is not profitable' one more time if none of the opportunities in a BB is profitable. tested on x86_64-unknown-linux-gnu, pushed. 2020-09-11 Richard Biener PR tree-optimization/97013 * tree-vect-slp.c (vect_slp_analyze_bb_1): Remove duplicate dumping. --- gcc/tree-vect-slp.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index aa6aa3328e8..35bde9bcb9d 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3626,16 +3626,10 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal, vect_bb_partition_graph (bb_vinfo); - /* Cost model: check if the vectorization is worthwhile. */ + /* Cost model: check if the vectorization opportunities are worthwhile. */ if (!unlimited_cost_model (NULL) && !vect_bb_vectorization_profitable_p (bb_vinfo)) -{ - if (dump_enabled_p ()) -dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"not vectorized: vectorization is not " -"profitable.\n"); - return false; -} +return false; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, -- 2.26.2
[PATCH] random vectorizer fixes
This fixes random things found when doing SLP discovery from arbitrary sets of stmts. Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed. Richard. 2020-09-10 Richard Biener * tree-vect-slp.c (vect_build_slp_tree_1): Check vector types for all lanes are compatible. (vect_analyze_slp_instance): Appropriately check for stores. (vect_schedule_slp): Likewise. --- gcc/tree-vect-slp.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 458a2c5bb30..aa6aa3328e8 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1007,6 +1007,16 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, continue; } } + + if (!types_compatible_p (vectype, *node_vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"Build SLP failed: different vector type " +"in %G", stmt); + /* Mismatch. */ + continue; + } } /* Grouped store or load. */ @@ -2382,7 +2392,7 @@ vect_analyze_slp_instance (vec_info *vinfo, unsigned HOST_WIDE_INT const_nunits; if (is_a (vinfo) && STMT_VINFO_GROUPED_ACCESS (stmt_info) - && DR_GROUP_FIRST_ELEMENT (stmt_info) + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)) && nunits.is_constant (&const_nunits)) { /* We consider breaking the group only on VF boundaries from the existing @@ -4893,14 +4903,13 @@ vect_schedule_slp (vec_info *vinfo) if (is_a (vinfo)) vect_remove_slp_scalar_calls (vinfo, root); + /* Remove vectorized stores original scalar stmts. */ for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info); j++) { - if (!STMT_VINFO_DATA_REF (store_info)) + if (!STMT_VINFO_DATA_REF (store_info) + || !DR_IS_WRITE (STMT_VINFO_DATA_REF (store_info))) break; - if (SLP_INSTANCE_ROOT_STMT (instance)) - continue; - store_info = vect_orig_stmt (store_info); /* Free the attached stmt_vec_info and remove the stmt. */ vinfo->remove_stmt (store_info); -- 2.26.2
Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly
Hi, Ok & Thanks, It’s fixed in patch v6. Jojo 在 2020年8月28日 +0800 AM4:40,Segher Boessenkool ,写道: > Hi! > > On Thu, Aug 27, 2020 at 08:47:19PM +0800, Jojo R wrote: > > +insn-emit-split-c = $(foreach o, $(shell for i in > > {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c) > > If you use a variable for the result of that "seq", this will be more > readable / maintainable / etc. > > (Should this use := instead of = ? What about the assignment to > insn-generated-split-num itself?) > > > + long long read_count = 0; > > We use "int" in many other places for similar counts. 2**31 should be > enough for anybody. > > > md_rtx_info info; > > while (read_md_rtx (&info)) > > + { > > + if (!(read_count++ % 1)) > > Wrong indent. "== 0" is more typical for testing if numbers are zero. > > > + { > > + printf ("/* Split file into separate compilation units for parallel > > compilation %lld */\n\n", read_count); > > Please split this (at least the source line, but probably the target > line is too long a well). > > > All that are details. This does look like it fixes the problems in the > previous versions. Thanks! > > > Segher
Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly
Hi, Ok & Thanks, It’s fixed in patch v6. Jojo 在 2020年8月28日 +0800 PM5:52,Richard Sandiford ,写道: > Thanks for doing this. In addition to what Segher said: > > Jojo R writes: > > gcc/ChangeLog: > > > > * genemit.c (main): Print 'split line'. > > * Makefile.in (insn-emit.c): Define split count and file > > > > --- > > gcc/Makefile.in | 15 + > > gcc/genemit.c | 87 - > > 2 files changed, 64 insertions(+), 38 deletions(-) > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index 79e854aa938..08e4aa7ef6f 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -1258,6 +1258,17 @@ ANALYZER_OBJS = \ > > # We put the *-match.o and insn-*.o files first so that a parallel make > > # will build them sooner, because they are large and otherwise tend to be > > # the last objects to finish building. > > + > > +# target overrides > > +-include $(tmake_file) > > + > > +INSN-GENERATED-SPLIT-NUM ?= 0 > > +insn-generated-split-num = $(shell expr $(INSN-GENERATED-SPLIT-NUM) + 1) > > + > > +insn-emit-split-c = $(foreach o, $(shell for i in > > {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c) > > The {a..b} construct isn't portable: this needs to be runnable with > a plain Bourne shell like /bin/dash. > > I think we should use the same wordlist technique as check_p_numbers[0-6]. > So I guess the first step would be to rename check_p_numbers[0-6] to > something more general and use it both here and in check_p_numbers. > > Thanks, > Richard
Re: [PATCH] vec: don't select partial vectors when looping on full vectors
Richard Biener writes:
> On Wed, 9 Sep 2020, Andrea Corallo wrote:
>> Hi all,
>>
>> this patch is meant to avoid generating predication in a loop when
>> the loop operates only on full vectors.
>>
>> Ex:
>>
>> #+BEGIN_SRC C
>> /* Vector length is 256.  */
>> void
>> f (int *restrict x, int *restrict y, unsigned int n) {
>>   for (unsigned int i = 0; i < n * 8; ++i)
>>     x[i] += y[i];
>> }
>> #+END_SRC
>>
>> Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk
>> gives:
>>
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w2, w2, 3
>>         cbz     w2, .L1
>>         mov     x3, 0
>>         whilelo p0.s, xzr, x2
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x3, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x3, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x3, lsl 2]
>>         add     x3, x3, 8
>>         whilelo p0.s, x3, x2
>>         b.any   .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>>
>> With the patch applied:
>>
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w3, w2, 3
>>         cbz     w3, .L1
>>         mov     x2, 0
>>         ptrue   p0.b, vl32
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x2, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x2, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x2, lsl 2]
>>         add     x2, x2, 8
>>         cmp     x2, x3
>>         bne     .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>>
>> To achieve this we check earlier if the loop needs peeling, and if
>> that is not the case we do not set
>> LOOP_VINFO_USING_PARTIAL_VECTORS_P to true.
>>
>> I moved some logic from 'determine_peel_for_niter' to
>> 'vect_need_peeling_or_part_vects_p' so it can be used for this
>> purpose.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Looks OK to me, the comment
>
> @@ -2267,7 +2278,10 @@ start_over:
>  {
>    if (param_vect_partial_vector_usage == 0)
> 	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -  else if (vect_verify_full_masking (loop_vinfo)
> +  else if ((vect_verify_full_masking (loop_vinfo)
> +	    && vect_need_peeling_or_part_vects_p (loop_vinfo))
> +	   /* Don't use partial vectors if we don't need to peel the
> +	      loop.
> 	      */
> 	   || vect_verify_loop_lens (loop_vinfo))
>
> seems to be oddly misplaced (I'd put it before the call).

Yeah, IMO it'd be better to put it in the first “if”.  Also, very
minor, but I think it'd be better not to shorten the name:

  vect_need_peeling_or_partial_vectors_p

Thanks,
Richard
[PATCH v6] genemit.c (main): split insn-emit.c for compiling parallelly
gcc/ChangeLog: * genemit.c (main): Print 'split line'. * Makefile.in (insn-emit.c): Define split count and file --- gcc/Makefile.in | 19 +++ gcc/genemit.c | 128 ++-- 2 files changed, 89 insertions(+), 58 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 79e854aa938..a7fcc7d5949 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \ # We put the *-match.o and insn-*.o files first so that a parallel make # will build them sooner, because they are large and otherwise tend to be # the last objects to finish building. + +# target overrides +-include $(tmake_file) + +INSN-GENERATED-SPLIT-NUM ?= 0 + +insn-generated-split-num = $(shell i=1; j=`expr $(INSN-GENERATED-SPLIT-NUM) + 1`; \ + while test $$i -le $$j; do \ + echo $$i; i=`expr $$i + 1`; \ + done) + +insn-emit-split-c := $(foreach o, $(shell for i in $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c) +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c)) +$(insn-emit-split-c): insn-emit.c + OBJS = \ gimple-match.o \ generic-match.o \ @@ -1265,6 +1280,7 @@ OBJS = \ insn-automata.o \ insn-dfatab.o \ insn-emit.o \ + $(insn-emit-split-obj) \ insn-extract.o \ insn-latencytab.o \ insn-modes.o \ @@ -2365,6 +2381,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: build/gen%$(build_exeext) $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \ $(filter insn-conditions.md,$^) > tmp-$*.c $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\ + [ ! 
"$$$*v" ] || grep "match not found" <<< $$$*v + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch insn-$*$$i.c; done && echo "" > insn-$*.c) $(STAMP) s-$* # gencheck doesn't read the machine description, and the file produced diff --git a/gcc/genemit.c b/gcc/genemit.c index 84d07d388ee..4fc8e61c5c8 100644 --- a/gcc/genemit.c +++ b/gcc/genemit.c @@ -847,6 +847,46 @@ handle_overloaded_gen (overloaded_name *oname) } } +#define printf_include() do { \ + printf ("/* Generated automatically by the program `genemit'\n\ +from the machine description file `md'. */\n\n"); \ + printf ("#define IN_TARGET_CODE 1\n");\ + printf ("#include \"config.h\"\n"); \ + printf ("#include \"system.h\"\n"); \ + printf ("#include \"coretypes.h\"\n");\ + printf ("#include \"backend.h\"\n"); \ + printf ("#include \"predict.h\"\n"); \ + printf ("#include \"tree.h\"\n"); \ + printf ("#include \"rtl.h\"\n"); \ + printf ("#include \"alias.h\"\n");\ + printf ("#include \"varasm.h\"\n"); \ + printf ("#include \"stor-layout.h\"\n"); \ + printf ("#include \"calls.h\"\n");\ + printf ("#include \"memmodel.h\"\n"); \ + printf ("#include \"tm_p.h\"\n"); \ + printf ("#include \"flags.h\"\n");\ + printf ("#include \"insn-config.h\"\n"); \ + printf ("#include \"expmed.h\"\n"); \ + printf ("#include \"dojump.h\"\n"); \ + printf ("#include \"explow.h\"\n"); \ + printf ("#include \"emit-rtl.h\"\n"); \ + printf ("#include \"stmt.h\"\n"); \ + printf ("#include \"expr.h\"\n"); \ + printf ("#include \"insn-codes.h\"\n"); \ + printf ("#include \"optabs.h\"\n"); \ + printf ("#include \"dfp.h\"\n"); \ + printf ("#include \"output.h\"\n"); \ + printf ("#include \"recog.h\"\n");\ + printf ("#include \"df.h\"\n"); \ + printf ("#include \"resource.h\"\n"); \ + printf ("#include \"reload.h\"\n"); \ + printf ("#include \"diagnostic-core.h\"\n"); \ + printf ("#include \"regs.h\"\n"); \ + printf ("#include \"tm-constrs.h\"\n"); \ + printf ("#include \"ggc.h\"\n"); \ + printf ("#include \"target.h\"\n\n"); \ 
+} while (0) + int main (int argc, const char **argv) { @@ -862,73 +902,45 @@ main (int argc, const char **argv) /* Assign sequential codes to all entries in the machine description in parallel with the tables in insn-output.c. */ - printf ("/* Generated automatically by the program `genemit'\n\ -from the machine description file `md'. */\n\n"); - - printf ("#define IN_TARGET_CODE 1\n"); - printf ("#include \"config.h\"\n"); - printf ("#include \"system.h\"\n"); - printf ("#
Re: [PATCH] fortran, openmp: PR fortran/93660 Fix ICE when coarrays used with 'omp declare simd'
On Thu, Sep 10, 2020 at 11:51:51PM +0100, Kwok Cheung Yeung wrote: > simd_clone_vector_of_formal_parm_types prefers to use the TYPE_ARG_TYPES of > a function decl if available, but they do not contain entries for the hidden > caf_* parameters. Replacements are therefore not generated for them, and the > ICE occurs in ipa_simd_modify_function_body due to an assert firing when a > replacement is not found for these hidden arguments. So, isn't it a bug that TYPE_ARG_TYPES is inconsistent with DECL_ARGUMENTS on the hidden arguments? That looks like lying to the middle-end to me, one doesn't have always a declaration with DECL_ARGUMENTS around, so sometimes TYPE_ARG_TYPES is the only way to get at the argument types. We had e.g. PR92305 in the past... Jakub
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
Hello, Segher,

On Jul 9, 2020, Segher Boessenkool wrote:

>> The problem it addresses is that the current checking only tests for
>> existence, not for an incompatible/compatible setting.
>> Currently both are reported as incompatible.

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549760.html

> Put that
>   ((rs6000_isa_flags & OPTION_MASK_MINIMAL_TOC) != 0)
> check inside the block please, together with the CMODEL_SMALL check?

We're trying to make sense of this suggestion of yours, and I confess
I'm quite confused as to the intended effects of this piece of code,
and of similar ones in the aix, freebsd64 and rtems rs6000 headers.

Clearly, we wish to issue an error message if -mminimal-toc is combined
with an -mcmodel other than small.  But should we silently force the
code model to small if -mno-minimal-toc is specified?  (I believe that
would be a consequence of the change you propose.)

Upon first reading your suggestion, I thought it was a thinko, and that
you meant to move the rs6000_isa_flags_explicit test into the block,
for that would:

- set cmodel to small whenever -mminimal-toc is enabled, explicitly or
  by default

- error iff -mminimal-toc and a non-small -mcmodel are explicitly given

However, this could change behavior for the various headers that have
an else block for the if that currently tests rs6000_isa_flags_explicit,
which might be undesirable, and that's where things get confusing.
It's not at all clear to me what's expected if e.g.:

- -mminimal-toc is enabled by default (is it ever?) and a non-small
  code model is selected (explicitly or by default): error, silent
  change to small cmodel, silent disabling of minimal-toc, or something
  else?

- -mno-minimal-toc is explicitly given along with some non-small code
  model: any reason to error out, force small cmodel, force-enable
  minimal-toc, or something else?

Could you please shed any light as to the intent, so that we can sort
out the logic that will implement it?
The logic in this code seems to be unchanged since it was first introduced by Alan Modra along with -mcmodel back in 2010, despite the many spelling changes as internal representation of flags and state and whatnot changed. I'm copying Alan as well, just in case he can recall or figure out what the intent was. AFAICT it first made linux64.h, and other headers got copies or variants thereof as the -mcmodel option got adopted by other systems. Thanks in advance, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
Re: [PATCH] arm: Fix up arm_override_options_after_change [PR96939]
On Tue, 8 Sep 2020 at 10:45, Jakub Jelinek via Gcc-patches wrote: > > Hi! > > As mentioned in the PR, the testcase fails to link, because when set_cfun is > being called on the crc function, arm_override_options_after_change is > called from set_cfun -> invoke_set_current_function_hook: > /* Change optimization options if needed. */ > if (optimization_current_node != opts) > { > optimization_current_node = opts; > cl_optimization_restore (&global_options, TREE_OPTIMIZATION (opts)); > } > and at that point target_option_default_node actually matches even the > current state of options, so this means armv7 (or whatever) arch is set as > arm_active_target, then > targetm.set_current_function (fndecl); > is called later in that function, which because the crc function's > DECL_FUNCTION_SPECIFIC_TARGET is different from the current one will do: > cl_target_option_restore (&global_options, TREE_TARGET_OPTION (new_tree)); > which calls arm_option_restore and sets arm_active_target to armv8-a+crc > (so far so good). > Later arm_set_current_function calls: > save_restore_target_globals (new_tree); > which in this case calls: > /* Call target_reinit and save the state for TARGET_GLOBALS. */ > TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts (); > which because optimization_current_node != optimization_default_node > (the testcase is LTO, so all functions have their > DECL_FUNCTION_SPECIFIC_TARGET and TREE_OPTIMIZATION nodes) will call: > cl_optimization_restore > (&global_options, > TREE_OPTIMIZATION (optimization_default_node)); > and > cl_optimization_restore (&global_options, >TREE_OPTIMIZATION (opts)); > The problem is that these call arm_override_options_after_change again, > and that one uses the target_option_default_node as what to set the > arm_active_target to (i.e. back to armv7 or whatever, but not to the > armv8-a+crc that should be the active target for the crc function). > That means we then error on the builtin call in that function. 
> > Now, the targetm.override_options_after_change hook is called always at the > end of cl_optimization_restore, i.e. when we change the Optimization marked > generic options. So it seems unnecessary to call arm_configure_build_target > at that point (nothing it depends on changed), and additionally incorrect > (because it uses the target_option_default_node, rather than the current > set of options; we'd need to revert > https://gcc.gnu.org/legacy-ml/gcc-patches/2016-12/msg01390.html > otherwise so that it works again with global_options otherwise). > The options that arm_configure_build_target cares about will change only > during option parsing (which is where it is called already), or during > arm_set_current_function, where it is done during the > cl_target_option_restore. > Now, arm_override_options_after_change_1 wants to adjust the > str_align_functions, which depends on the current Optimization options (e.g. > optimize_size and flag_align_options and str_align_functions) as well as > the target options target_flags, so IMHO needs to be called both > when the Optimization options (possibly) change, i.e. from > the targetm.override_options_after_change hook, and from when the target > options change (set_current_function hook). > > Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk? > > Looking further at arm_override_options_after_change_1, it also seems to be > incorrect, rather than testing > !opts->x_str_align_functions > it should be really testing > !opts_set->x_str_align_functions > and get &global_options_set or similar passed to it as additional opts_set > argument. That is because otherwise the decision will be sticky, while it > should be done whenever use provided -falign-functions but didn't provide > -falign-functions= (either on the command line, or through optimize > attribute or pragma). 
>
> 2020-09-08  Jakub Jelinek
>
> 	PR target/96939
> 	* config/arm/arm.c (arm_override_options_after_change): Don't call
> 	arm_configure_build_target here.
> 	(arm_set_current_function): Call arm_override_options_after_change_1
> 	at the end.
>
> 	* gcc.target/arm/lto/pr96939_0.c: New test.
> 	* gcc.target/arm/lto/pr96939_1.c: New file.
>

Hi Jakub,

I'm seeing an ICE with this new test on most of my arm configurations,
for instance:
--target arm-none-linux-gnueabi --with-cpu cortex-a9

/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/ c_lto_pr96939_0.o c_lto_pr96939_1.o -fdiagnostics-plain-output -flto -O2 -o gcc-target-arm-lto-pr96939-01.exe

In function 'crc':
lto1: internal compiler error: Segmentation fault
0xba720f crash_signal
        /gcc/toplev.c:327
0x172beb9 strchr
        /usr/include/string.h:227
0x172beb9 arm_parse_cpu_option_name(cpu_optio
[PATCH] Fortran : Two further previously missed ICEs PR53298
For review.

Fixes the two ICEs reported in the PR that remained after the previous
fix.

There is a side effect that is manifested in the tree dumps.  Instead of

  __builtin_free (ptr2->dat.data);

we get

  __builtin_free ((void *) ptr2->dat.data);

I do not know the cause of this, but from what I can tell the newly
inserted cast is harmless.  All the examples I've seen so far have the
cast except where the parameter is declared as void *.  In the tree
dumps ptr2 is declared as struct testtype2 *; I do not know where the
type is declared, so I don't know whether data is declared void *
(I expect it is).  Is it worth the effort to determine how to remove
the extra (void *)?

[PATCH] Fortran : Two further previously missed ICEs PR53298

There were 3 ICEs with different call stacks in the comments of this
PR.  A previous commit fixed only one of those ICEs.  The ICEs fixed
here are in trans-array.c and trans-expr.c.

The first ICE occurred when the array reference is not AR_ELEMENT:
gfc_conv_scalarized_array_ref is called with se and ar, and if se->ss
is NULL the ICE occurs.  If se->ss is NULL there is nothing to do
before the return.

The second ICE occurs in code that did not match its comments.  Fixing
the code to match the comments fixes the ICE.  A side effect is that
the tree dumps for finalize_35.f90 and finalize_36.f90 contain
"__builtin_free ((void *) ptr2->dat.data);"; the "(void *)" was
previously omitted.  The cast is harmless.

2020-09-11  Mark Eggleston

gcc/fortran/

	PR fortran/53298
	* trans-array.c (gfc_conv_array_ref): In the body of the if
	statement only execute the code before the return if se->ss is
	set.
	* trans-expr.c (gfc_conv_component_ref): Change the if expression
	to match the comments.

2020-09-04  Mark Eggleston

gcc/testsuite/

	PR fortran/53298
	* gfortran.dg/finalize_35.f90: Handle extra (void *).
	* gfortran.dg/finalize_36.f90: Handle extra (void *).
	* gfortran.dg/pr53298_2.f90: New test.
	* gfortran.dg/pr53298_3.f90: New test.

--
https://www.codethink.co.uk/privacy.html
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On Fri, Sep 11, 2020 at 04:43:50AM -0300, Alexandre Oliva wrote: > Could you please shed any light as to the intent, so that we can sort > out the logic that will implement it? The history goes back to 2003 commit 9739c90c8d where a ppc64-linux host built most of gcc with -mminimal-toc due to toc/got size constraints of the small model, but build libgcc with -mno-minimal-toc. When the medium/large model was implemented, I thought it good to continue compiling libgcc in small model. We'd like to keep doing that too. That's why -mno-minimal-toc, an option valid for both ppc32 and ppc64, chooses small model on ppc64 when the compiler default is medium model. If you change that then make sure that libgcc continues to be small model. And any other code that might have copied the libgcc trick.. I also thought it reasonable to error on an explicit -mcmodel=medium or -mcmodel=large with either of -mminimal-toc or -mno-minimal-toc, since the toc options really are specific to small model code. Why change that? -- Alan Modra Australia Development Lab, IBM
[PATCH] improve BB vectorization dump locations
This tries to improve BB vectorization dumps by providing more precise locations. Currently the vect_location is simply the very last stmt in a basic-block that has a location. So for double a[4], b[4]; int x[4], y[4]; void foo() { a[0] = b[0]; // line 5 a[1] = b[1]; a[2] = b[2]; a[3] = b[3]; x[0] = y[0]; // line 9 x[1] = y[1]; x[2] = y[2]; x[3] = y[3]; } // line 13 we show the user with -O3 -fopt-info-vec t.c:13:1: optimized: basic block part vectorized using 16 byte vectors while with the patch we point to both independently vectorized opportunities: t.c:5:8: optimized: basic block part vectorized using 16 byte vectors t.c:9:8: optimized: basic block part vectorized using 16 byte vectors there's the possibility that the location regresses in case the root stmt in the SLP instance has no location. For a SLP subgraph with multiple entries the location also chooses one entry at random, not sure in which case we want to dump both. Still as the plan is to extend the basic-block vectorization scope from single basic-block to multiple ones this is a first step to preserve something sensible. Implementation-wise this makes both costing and code-generation happen on the subgraphs as analyzed. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard - is iteration over vector modes for BB vectorization still important now that we have related_vector_type and thus no longer only consider a fixed size? If so it will probably make sense to somehow still iterate even if there was some SLP subgraph vectorized? It also looks like BB vectorization was never updated to consider multiple modes based on cost, it will still pick the first opportunity. For BB vectorization we also have the code that re-tries SLP discovery with splitting the store group. So what's your overall thoughts to this? Thanks, Richard. 2020-09-11 Richard Biener * tree-vectorizer.h (_slp_instance::location): New method. (vect_schedule_slp): Adjust prototype. 
* tree-vectorizer.c (vec_info::remove_stmt): Adjust the BB region begin if we removed the stmt it points to. * tree-vect-loop.c (vect_transform_loop): Adjust. * tree-vect-slp.c (_slp_instance::location): Implement. (vect_analyze_slp_instance): For BB vectorization set vect_location to that of the instance. (vect_slp_analyze_operations): Likewise. (vect_bb_vectorization_profitable_p): Remove wrapper. (vect_slp_analyze_bb_1): Remove cost check here. (vect_slp_region): Cost check and code generate subgraphs separately, report optimized locations and missed optimizations due to profitability for each of them. (vect_schedule_slp): Get the vector of SLP graph entries to vectorize as argument. --- gcc/tree-vect-loop.c | 2 +- gcc/tree-vect-slp.c | 138 +++--- gcc/tree-vectorizer.c | 8 ++- gcc/tree-vectorizer.h | 4 +- 4 files changed, 73 insertions(+), 79 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 80e78f7adf4..c95ec5ad267 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -9018,7 +9018,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) if (!loop_vinfo->slp_instances.is_empty ()) { DUMP_VECT_SCOPE ("scheduling SLP instances"); - vect_schedule_slp (loop_vinfo); + vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo)); } /* FORNOW: the vectorizer supports only loops which body consist diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 35bde9bcb9d..519cd6a7254 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -117,6 +117,18 @@ vect_free_slp_tree (slp_tree node, bool final_p) delete node; } +/* Return a location suitable for dumpings related to the SLP instance. */ + +dump_user_location_t +_slp_instance::location () const +{ + if (root_stmt) +return root_stmt->stmt; + else +return SLP_TREE_SCALAR_STMTS (root)[0]->stmt; +} + + /* Free the memory allocated for the SLP instance. 
   FINAL_P is true if we have vectorized the instance or if we have
   made a final decision not to vectorize the statements in any way.  */
@@ -2121,6 +2133,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
   vec<stmt_vec_info> scalar_stmts;
   bool constructor = false;

+  if (is_a <bb_vec_info> (vinfo))
+    vect_location = stmt_info->stmt;
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       scalar_type = TREE_TYPE (DR_REF (dr));
@@ -3120,6 +3134,8 @@ vect_slp_analyze_operations (vec_info *vinfo)
       hash_set<slp_tree> lvisited;
       stmt_vector_for_cost cost_vec;
       cost_vec.create (2);
+      if (is_a <bb_vec_info> (vinfo))
+	vect_location = instance->location ();
       if (!vect_slp_analyze_node_operations (vinfo,
					     SLP_INSTANCE_TREE (instance),
					     instance, visited, lvisited,
@@ -3
[Ada] Add missing stride entry in debug info
This adds a missing stride entry for bit-packed arrays of record types.

Tested on x86_64-suse-linux, applied on the mainline.

2020-09-11  Eric Botcazou

	* gcc-interface/misc.c (get_array_bit_stride): Return TYPE_ADA_SIZE
	for record and union types.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index e6a563e3666..781868e2ad3 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -1003,6 +1003,10 @@ get_array_bit_stride (tree comp_type)
   if (INTEGRAL_TYPE_P (comp_type))
     return TYPE_RM_SIZE (comp_type);

+  /* Likewise for record or union types.  */
+  if (RECORD_OR_UNION_TYPE_P (comp_type) && !TYPE_FAT_POINTER_P (comp_type))
+    return TYPE_ADA_SIZE (comp_type);
+
   /* The gnat_get_array_descr_info debug hook expects a debug type.  */
   comp_type = maybe_debug_type (comp_type);
[Ada] Drop GNAT encodings for fixed-point types
GDB can now deal with the DWARF representation just fine.

Tested on x86_64-suse-linux, applied on the mainline.

2020-09-11  Eric Botcazou

	* gcc-interface/misc.c (gnat_get_fixed_point_type): Bail out only
	when the GNAT encodings are specifically used.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 183daf33fb6..e6a563e3666 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -618,10 +618,9 @@ gnat_get_fixed_point_type_info (const_tree type,
 {
   tree scale_factor;

-  /* GDB cannot handle fixed-point types yet, so rely on GNAT encodings
-     instead for it.  */
+  /* Do nothing if the GNAT encodings are used.  */
   if (!TYPE_IS_FIXED_POINT_P (type)
-      || gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL)
+      || gnat_encodings == DWARF_GNAT_ENCODINGS_ALL)
     return false;

   scale_factor = TYPE_SCALE_FACTOR (type);
[Ada] Fix ICE on nested packed variant record type
This is a regression present on the mainline and 10 branch: the compiler
aborts on code accessing a component of a packed record type whose type is
a packed discriminated record type with variant part.

Tested on x86_64-suse-linux, applied on the mainline and 10 branch.

2020-09-11  Eric Botcazou

	* gcc-interface/utils.c (type_has_variable_size): New function.
	(create_field_decl): In the packed case, also force byte alignment
	when the type of the field has variable size.

2020-09-11  Eric Botcazou

	* gnat.dg/pack27.adb: New test.
	* gnat.dg/pack27_pkg.ads: New helper.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index a96fde668be..3065fcb6260 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -2905,6 +2905,31 @@ aggregate_type_contains_array_p (tree type, bool self_referential)
     }
 }

+/* Return true if TYPE is a type with variable size or a padding type with a
+   field of variable size or a record that has a field with such a type.  */
+
+static bool
+type_has_variable_size (tree type)
+{
+  tree field;
+
+  if (!TREE_CONSTANT (TYPE_SIZE (type)))
+    return true;
+
+  if (TYPE_IS_PADDING_P (type)
+      && !TREE_CONSTANT (DECL_SIZE (TYPE_FIELDS (type))))
+    return true;
+
+  if (!RECORD_OR_UNION_TYPE_P (type))
+    return false;
+
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    if (type_has_variable_size (TREE_TYPE (field)))
+      return true;
+
+  return false;
+}
+
 /* Return a FIELD_DECL node.  NAME is the field's name, TYPE is its type
    and RECORD_TYPE is the type of the enclosing record.  If SIZE is nonzero,
    it is the specified size of the field.  If POS is nonzero, it is the bit
@@ -2974,13 +2999,15 @@ create_field_decl (tree name, tree type, tree record_type, tree size, tree pos,
   DECL_PACKED (field_decl) = pos ? DECL_BIT_FIELD (field_decl) : packed;

-  /* If FIELD_TYPE is BLKmode, we must ensure this is aligned to at least a
-     byte boundary since GCC cannot handle less-aligned BLKmode bitfields.
+  /* If FIELD_TYPE has BLKmode, we must ensure this is aligned to at least
+     a byte boundary since GCC cannot handle less aligned BLKmode bitfields.
+     Likewise if it has a variable size and no specified position because
+     variable-sized objects need to be aligned to at least a byte boundary.
      Likewise for an aggregate without specified position that contains an
-     array, because in this case slices of variable length of this array
-     must be handled by GCC and variable-sized objects need to be aligned
-     to at least a byte boundary.  */
+     array because, in this case, slices of variable length of this array
+     must be handled by GCC and have variable size.  */
   if (packed && (TYPE_MODE (type) == BLKmode
+		 || (!pos && type_has_variable_size (type))
		 || (!pos
		     && AGGREGATE_TYPE_P (type)
		     && aggregate_type_contains_array_p (type, false

pragma No_Component_Reordering;

package Pack27_Pkg is

  type Enum is (One, Two, Three);

  type Rec1 (D : Enum := One) is record
    case D is
      when One => null;
      when Two => null;
      when Three => C : Character;
    end case;
  end record;
  pragma Pack (Rec1);

  type Rec2 is record
    R : Rec1;
  end record;
  pragma Pack (Rec2);

  type Rec3 is record
    B : Boolean;
    R : Rec2;
  end record;
  pragma Pack (Rec3);

  type Rec4 is record
    B : Boolean;
    R : Rec3;
  end record;
  pragma Pack (Rec4);

end Pack27_Pkg;

-- { dg-do compile }

with Pack27_Pkg; use Pack27_Pkg;

procedure Pack27 is
  R1 : Rec1;
  R4 : Rec4;
begin
  R4.R.R.R := R1;
end;
[Ada] Fix crash on array component with nonstandard index type
This is a regression present on the mainline, 10 and 9 branches: the
compiler goes into an infinite recursion, eventually exhausting the stack,
for the declaration of a discriminated record type with an array component
having a discriminant as bound and an index type that is an enumeration
type with a non-standard representation clause.

Tested on x86_64-suse-linux, applied on the mainline, 10 and 9 branches.

2020-09-11  Eric Botcazou

	* gcc-interface/decl.c (gnat_to_gnu_entity): Only create
	extra subtypes for discriminants if the RM size of the base type
	of the index type is lower than that of the index type.

2020-09-11  Eric Botcazou

	* gnat.dg/specs/discr7.ads: New test.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 8045fa5ff97..2b7392c62c0 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -2480,8 +2480,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
	    tree gnu_base_orig_max = TYPE_MAX_VALUE (gnu_base_index_type);
	    tree gnu_min, gnu_max, gnu_high;

-	    /* We try to define subtypes for discriminants used as bounds
-	       that are more restrictive than those declared by using the
+	    /* We try to create subtypes for discriminants used as bounds
+	       that are more restrictive than those declared, by using the
	       bounds of the index type of the base array type.  This will
	       make it possible to calculate the maximum size of the record
	       type more conservatively.  This may have already been done by
@@ -2489,8 +2489,8 @@
	       there will be a conversion that needs to be removed first.  */
	    if (CONTAINS_PLACEHOLDER_P (gnu_orig_min)
		&& TYPE_RM_SIZE (gnu_base_index_type)
-		&& !tree_int_cst_lt (TYPE_RM_SIZE (gnu_index_type),
-				     TYPE_RM_SIZE (gnu_base_index_type)))
+		&& tree_int_cst_lt (TYPE_RM_SIZE (gnu_base_index_type),
+				    TYPE_RM_SIZE (gnu_index_type)))
	      {
		gnu_orig_min = remove_conversions (gnu_orig_min, false);
		TREE_TYPE (gnu_orig_min)
@@ -2501,8 +2501,8 @@
	    if (CONTAINS_PLACEHOLDER_P (gnu_orig_max)
		&& TYPE_RM_SIZE (gnu_base_index_type)
-		&& !tree_int_cst_lt (TYPE_RM_SIZE (gnu_index_type),
-				     TYPE_RM_SIZE (gnu_base_index_type)))
+		&& tree_int_cst_lt (TYPE_RM_SIZE (gnu_base_index_type),
+				    TYPE_RM_SIZE (gnu_index_type)))
	      {
		gnu_orig_max = remove_conversions (gnu_orig_max, false);
		TREE_TYPE (gnu_orig_max)

-- { dg-do compile }

package Discr7 is

  type Enum is (One, Two, Three);
  for Enum use (One => 1, Two => 2, Three => 3);

  type Arr is array (Integer range <>, Enum range <>) of Boolean;

  type Rec (D : Integer) is record
    A : Arr (1 .. D, Enum'Range);
  end record;

end Discr7;
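The fix above hinges on reversing a size comparison: the old guard
!tree_int_cst_lt (index, base) also fired when the two RM sizes were
equal, which is what triggered the infinite recursion, while the new
guard tree_int_cst_lt (base, index) does not.  A toy model with plain
ints (names invented for illustration, not actual GNAT code) makes the
difference visible:

```cpp
#include <cassert>

// Toy stand-ins for TYPE_RM_SIZE values; tree_int_cst_lt is modelled as
// plain integer "<".  These names are illustrative only.
static bool old_guard (int rm_index, int rm_base)
{
  // Before the patch: proceed unless the subtype's RM size is strictly
  // smaller than the base type's, i.e. also when they are equal.
  return !(rm_index < rm_base);
}

static bool new_guard (int rm_index, int rm_base)
{
  // After the patch: proceed only when the base type's RM size is
  // strictly smaller than the subtype's.
  return rm_base < rm_index;
}
```

The two guards agree whenever the sizes differ; only the equal-size case
(e.g. an enumeration index type whose representation clause leaves the RM
sizes identical) changes from "create the extra subtype" to "don't",
breaking the recursion.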
Re: [PATCH] arm: Fix up arm_override_options_after_change [PR96939]
On Fri, Sep 11, 2020 at 09:46:37AM +0200, Christophe Lyon via Gcc-patches wrote:
> I'm seeing an ICE with this new test on most of my arm configurations,
> for instance:
> --target arm-none-linux-gnueabi --with-cpu cortex-a9
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
> c_lto_pr96939_0.o c_lto_pr96939_1.o -fdiagnostics-plain-output -flto -O2
> -o gcc-target-arm-lto-pr96939-01.exe

Seems a latent issue.  Neither cl_optimization_{save,restore} nor
cl_target_option_{save,restore} (nor any of the target hooks they call)
saves or restores any opts_set values, so I think opts_set can be trusted
only during option processing (if at all), but not later.

So, short term, a fix would IMHO be to just stop using opts_set altogether
in arm_configure_build_target; it doesn't make much sense to me, it should
test whether those strings are non-NULL instead, or at least do that when
it is invoked from arm_option_restore (e.g. could be done by calling it
with opts instead of &global_options_set).

Longer term, the question is whether cl_optimization_{save,restore} and
cl_target_option_{save,restore} shouldn't be changed to not only
save/restore the options, but also save the opts_set flags.  It could be
done e.g. by adding a bool array or set of bool members to struct
cl_optimization and struct cl_target_option, or, more compactly, by using
bitmasks, packing each 64 adjacent option flags into a UHWI element of an
array.
> In function 'crc': > lto1: internal compiler error: Segmentation fault > 0xba720f crash_signal > /gcc/toplev.c:327 > 0x172beb9 strchr > /usr/include/string.h:227 > 0x172beb9 arm_parse_cpu_option_name(cpu_option const*, char const*, > char const*, bool) > /gcc/common/config/arm/arm-common.c:349 > 0xfa9372 arm_configure_build_target(arm_build_target*, > cl_target_option*, gcc_options*, bool) > /gcc/config/arm/arm.c:3209 > 0xfbf8e1 arm_set_current_function > /gcc/config/arm/arm.c:32334 > 0x86b68b invoke_set_current_function_hook > /gcc/function.c:4670 > 0x870d77 invoke_set_current_function_hook > /gcc/function.c:4836 > 0x870d77 allocate_struct_function(tree_node*, bool) > /gcc/function.c:4793 > 0xa25943 input_function > /gcc/lto-streamer-in.c:1385 > 0xa25943 lto_read_body_or_constructor > /gcc/lto-streamer-in.c:1624 > 0x6f37ff cgraph_node::get_untransformed_body() > /gcc/cgraph.c:3932 > 0x703f42 cgraph_node::expand() > /gcc/cgraphunit.c:2274 > 0x70567c expand_all_functions > /gcc/cgraphunit.c:2476 > 0x70567c symbol_table::compile() > /gcc/cgraphunit.c:2839 > 0x63970b lto_main() > /gcc/lto/lto.c:653 > > This is with a cross-compiler. Jakub
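The bitmask variant of the idea sketched above — packing 64 "was this
option explicitly set" flags into each element of an array — could look
roughly like this.  The flag count and helper names are invented for
illustration; real code would use the generated options tables and
HOST_WIDE_INT:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical number of option flags; the real value would come from
// the generated options machinery.
const unsigned num_flags = 200;

// One bit per flag, 64 flags per element; zero-initialized.
static uint64_t flags_explicitly_set[(num_flags + 63) / 64];

// Record that option OPT was set explicitly.
static void set_explicit (unsigned opt)
{
  flags_explicitly_set[opt / 64] |= uint64_t (1) << (opt % 64);
}

// Was option OPT set explicitly?
static bool explicit_p (unsigned opt)
{
  return (flags_explicitly_set[opt / 64] >> (opt % 64)) & 1;
}
```

Saving and restoring the opts_set state then amounts to copying this
small array alongside the option values themselves.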
Re: [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock
I'm finally getting round to merging this series!

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock,
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called, which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.
There's no guarantee that __clock_t is running at the same speed as the
caller's clock, so if the underlying wait times out we need to check the
timeout against the caller's clock again before potentially looping.

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
  std::chrono::steady_clock but isn't std::chrono::steady_clock.  This
  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
  that takes an arbitrary clock to be called.

* invent test04, which uses a deliberately slow-running clock in order to
  exercise the looping behaviour of
  __atomic_futex_unsigned::_M_load_when_equal_until described above.

	* libstdc++-v3/include/bits/atomic_futex.h:
	(__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
	generic _Clock to check the timeout against _Clock again after
	_M_load_when_equal_until returns indicating a timeout.

	* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
	slow_clock that runs at an eleventh of steady_clock's speed.  Use
	it to test the user-supplied-clock variant of
	__atomic_futex_unsigned::_M_load_when_equal_until works generally
	with test03 and loops correctly when the timeout time hasn't been
	reached in test04.
---
 libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +-
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 4375129..5f95ade 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _M_load_when_equal_until(unsigned __val, memory_order __mo,
	const chrono::time_point<_Clock, _Duration>& __atime)
     {
-	const typename _Clock::time_point __c_entry = _Clock::now();
-	const __clock_t::time_point __s_entry = __clock_t::now();
-	const auto __delta = __atime - __c_entry;
-	const auto __s_atime = __s_entry + __delta;
-	return _M_load_when_equal_until(__val, __mo, __s_atime);
+	typename _Clock::time_point __c_entry = _Clock::now();
+	do {
+	  const __clock_t::time_point __s_entry = __clock_t::now();
+	  const auto __delta = __atime - __c_entry;
+	  const auto __s_atime = __s_entry + __delta;
+	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
+	    return true;

I wonder if we can avoid looping if _Clock::is_steady is true, i.e.

  if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
    return false;

If _Clock is steady then it can't be adjusted to move backwards.  I am not
100% sure, but I don't think a steady clock can run "slow" either.  But
I'm checking with LWG about that.  We should keep the loop for now.

The reason for wanting to return early is that _Clock::now() can be quite
expensive, so avoiding the second call if it's not needed would be
significant.

Related to this, I've been experimenting with a change to wait_for that
checks __rtime <= 0 and, if so, uses __clock_t::time_point::min() for the
wait_until call, so that we don't bother to call __clock_t::now().  For a
non-positive duration we don't need to determine the precise time_point in
the past, because it's in the past anyway.
Using time_point::min() is as good as __clock_t::now() - rtime (as long as
we don't overflow anything when converting it to a struct timespec, of
course!)

Using future.wait_for(0s) to poll for a future becoming ready takes a long
time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
Using wait_until(C::time_point::min()) is faster, because it doesn't call
C::now().

This probably isn't important for most timed waiting functions in the
library, because I don't see any reason to use wait_for(0s) to poll a
mutex, condition_variable or atomic.  But it's reasonable to do for
futures.  I already added (__rtime <= __rtime.zero()) to
this_thread::sleep_for in d1a74a287ee1a84b90e5675904dac7f945cffca1.

The extra branch to check rtime <= rtime.zero() or atime < C::now() is
probably insignificant compared to the cost of unnecessary calls to now()
on one or more clocks.

+	  __c_entry = _Clock::now();
+	} while (__c_entry < __atime);
+	return false;
       }
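The need for the retry loop can be demonstrated with a toy model of a
caller clock running slower than the reference clock — no real clocks or
futexes here, just integer counters, and all names are invented for
illustration:

```cpp
#include <cassert>

// The reference clock ("ref") runs three times as fast as the caller's
// slow clock, so a single converted wait undershoots the caller's
// deadline and must be retried, like the loop added to
// _M_load_when_equal_until.
struct toy_wait
{
  long ref_now = 0;                     // reference-clock time
  int waits = 0;                        // number of underlying waits

  long caller_now () const { return ref_now / 3; }  // slow caller clock

  void wait_until_caller (long caller_deadline)
  {
    long c = caller_now ();
    do
      {
	// Convert the remaining caller-clock delta to a reference-clock
	// deadline and "sleep" until it.
	ref_now += caller_deadline - c;
	++waits;
	c = caller_now ();
      }
    while (c < caller_deadline);
  }
};
```

Without the loop, a wait until caller time 10 would return at reference
time 10, i.e. caller time 3 — far before the requested deadline.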
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Kees Cook via Gcc-patches writes: > [tried to clean up quoting...] > > On Tue, Sep 08, 2020 at 10:00:09AM -0500, Qing Zhao wrote: >> >> > On Sep 7, 2020, at 8:06 AM, Rodriguez Bahena, Victor >> > wrote: >> > >> >>> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote: >> >>> So, my question is: >> >>> >> >>> From the security point of view, does clearing ALL registers have more >> >>> benefit than clearing USED registers? >> >>> From my understanding, clearing registers that are not used in the >> >>> current routine does NOT provide additional benefit, correct me if I am >> >>> wrong here. >> > >> > You are right, it does not provide additional security >> >> Then, is it necessary to provide >> >> -fzero-call-used-regs=all-arg|all-gpr|all to the user? >> >> Can we just delete these 3 sub options? > > Well... I'd say there is some benefit (remember that ROP gadgets are > built from function trailers, so there is rarely a concern over what the > rest of the function is doing). Generally, they are chained together > based on just the last couple instructions: > > *useful action* > *ret* > > So with ...=used this turns into: > > *useful action* > *clear some registers* > *ret* > > Which may still be helpful (if, for example, the state being built by > the attacker is using registers _not_ in the cleared list). However: > > *useful action* > *clear all registers* > *ret* > > Means that suddenly the ROP chain cannot use *any* of the caller-saved > registers to hold state. > > So, while ...=used is likely going to block a lot, ...=all will block > even more. I'd prefer to have both available, if for no other reason > than to compare the ROP gadget availability for any given binary (e.g. > if some future attack is found that bypasses ...=used, does it also > bypass ...=all?) 
This might have already been discussed/answered, sorry, but: when there's a choice, is there an obvious winner between: (1) clearing call-clobbered registers and then restoring call-preserved ones (2) restoring call-preserved registers and then clearing call-clobbered ones Is one option more likely to be useful to attackers than the other? (For some frames, it might be necessary to use a small number of call-clobbered registers to perform the restore sequence, so (1) wouldn't be fully achievable in all cases.) Thanks, Richard
[committed][libatomic] Add nvptx support
On 9/8/20 8:51 AM, Tom de Vries wrote: > Hi, > > Add nvptx support to libatomic. > > Given that atomic_test_and_set is not implemented for nvptx (PR96964), the > compiler translates __atomic_test_and_set falling back onto the "Failing all > else, assume a single threaded environment and simply perform the operation" > case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic > operation. > > Still, that counts as supported for the configure test of libatomic, so we > end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding > __atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic > implementation. > > Fix this by adding an atomic_test_and_set expansion for nvptx, that uses > libatomics __atomic_test_and_set_1. > > This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so > instead we use this case in tas_n.c: > ... > /* If this type is smaller than word-sized, fall back to a word-sized >compare-and-swap loop. */ > bool > SIZE(libat_test_and_set) (UTYPE *mptr, int smodel) > ... > which for __atomic_test_and_set_8 uses INVERT_MASK_8. > > Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8. > > Tested libatomic testsuite on nvptx. > > Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk? > > Any other comments? > > Thanks, > - Tom > > [libatomic] Add nvptx support > > gcc/ChangeLog: > > PR target/96964 > * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New > expansion. > * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin. > I realized after reading the header comment of write_fn_proto_from_insn in nvptx.c that I could drop the sync-builtins.def bit. So, committed without the sync-builtins.def change. Thanks, - Tom > libatomic/ChangeLog: > > PR target/96898 > * configure.tgt: Add nvptx. > * libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition. > * config/nvptx/host-config.h: New file. > * config/nvptx/lock.c: New file. 
> > --- > gcc/config/nvptx/nvptx.md| 16 +++ > gcc/sync-builtins.def| 2 ++ > libatomic/config/nvptx/host-config.h | 56 > > libatomic/config/nvptx/lock.c| 56 > > libatomic/configure.tgt | 3 ++ > libatomic/libatomic_i.h | 2 ++ > 6 files changed, 135 insertions(+) > > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > index 4168190fa42..6178e6a0f77 100644 > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > @@ -1667,6 +1667,22 @@ >"%.\\tatom%A1.b%T0.\\t%0, %1, %2;" >[(set_attr "atomic" "true")]) > > +(define_expand "atomic_test_and_set" > + [(match_operand:SI 0 "nvptx_register_operand") ;; bool success output > + (match_operand:QI 1 "memory_operand") ;; memory > + (match_operand:SI 2 "const_int_operand")] ;; model > + "" > +{ > + rtx libfunc; > + rtx addr; > + libfunc = init_one_libfunc ("__atomic_test_and_set_1"); > + addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0)); > + emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode, > + addr, ptr_mode, > + operands[2], SImode); > + DONE; > +}) > + > (define_insn "nvptx_barsync" >[(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri") >(match_operand:SI 1 "const_int_operand")] > diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def > index 156a13ce0f8..b802257bd1a 100644 > --- a/gcc/sync-builtins.def > +++ b/gcc/sync-builtins.def > @@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, > "__sync_synchronize", > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set", > BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1", > + BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST) > > DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", > BT_FN_VOID_VPTR_INT, > ATTR_NOTHROWCALL_LEAF_LIST) > diff --git a/libatomic/config/nvptx/host-config.h > b/libatomic/config/nvptx/host-config.h > new file mode 100644 > index 000..eb9de81f388 > --- /dev/null > 
+++ b/libatomic/config/nvptx/host-config.h > @@ -0,0 +1,56 @@ > +/* Copyright (C) 2020 Free Software Foundation, Inc. > + > + This file is part of the GNU Atomic Library (libatomic). > + > + Libatomic is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + Libatomic is distributed in the hope that it will be useful, but WITHOUT > ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS > + FO
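The tas_n.c fallback relied on above — a byte-wide test-and-set built from
a word-wide compare-and-swap — can be sketched single-threaded as follows.
The names are invented, the CAS is a plain function rather than a real
atomic, and the INVERT_MASK_N byte-order handling that actual libatomic
applies for big-endian targets is omitted:

```cpp
#include <cassert>
#include <cstdint>

// Single-threaded stand-in for an atomic compare-and-swap: on mismatch,
// refresh *expected with the current value, like __atomic_compare_exchange.
static bool compare_exchange (uint32_t *p, uint32_t *expected,
			      uint32_t desired)
{
  if (*p == *expected)
    {
      *p = desired;
      return true;
    }
  *expected = *p;
  return false;
}

// Byte-sized test-and-set on top of a word-sized CAS, in the spirit of
// libatomic's tas_n.c (little-endian layout only).
static bool toy_test_and_set (uint32_t *word, unsigned byte_index)
{
  unsigned shift = byte_index * 8;
  uint32_t set = uint32_t (1) << shift;   // the "set" value for that byte
  uint32_t old = *word;
  while (!compare_exchange (word, &old, old | set))
    ;                                     // retry until the CAS succeeds
  return ((old >> shift) & 0xff) != 0;    // previous byte value, as bool
}
```

The real code additionally computes the containing word from the byte's
address and, on big-endian targets, flips the shift with an invert mask —
which is exactly why the patch needs MASK_8/INVERT_MASK_8 definitions.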
[committed] amdgcn: align TImode registers
This patch fixes an execution failure in which the compiler would corrupt
TImode values due to missed early-clobber problems with partially
overlapping register allocations.  In fact, adding early-clobber
constraints does not fix the issue because IRA doesn't check that for move
instructions properly in all circumstances anyway (the constraint causes
an ICE in postreload).

This patch fixes the problem by ensuring that TImode values are always
aligned to 4-register boundaries, meaning that inputs and outputs will
either overlap completely, or not at all, neither of which have
early-clobber issues.  This is an artificial restriction not present in
the hardware, but it is the same solution we use for DImode values, where
we had a lot of the same problems.

With the patch I see the following new test passes:

PASS: gfortran.dg/PR95331.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/PR95331.f90 -O3 -g execution test
PASS: gfortran.dg/gamma_5.f90 -O0 execution test
PASS: gfortran.dg/gamma_5.f90 -O1 execution test
PASS: gfortran.dg/gamma_5.f90 -O2 execution test
PASS: gfortran.dg/gamma_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/gamma_5.f90 -O3 -g execution test
PASS: gfortran.dg/gamma_5.f90 -Os execution test
PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test
PASS: gfortran.dg/optional_absent_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: gfortran.dg/optional_absent_5.f90 -O3 -g execution test

Andrew

amdgcn: align TImode registers

This prevents execution failures caused by partially overlapping input
and output registers.  This is the same solution already used for DImode.
gcc/ChangeLog:

	* config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align TImode registers.
	* config/gcn/gcn.md: Assert that TImode registers do not early
	clobber.

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 8b3c4544dd5..84d1fd9a354 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -475,7 +475,8 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
     return (vgpr_1reg_mode_p (mode)
	    || (!((regno - FIRST_VGPR_REG) & 1) && vgpr_2reg_mode_p (mode))
	    /* TImode is used by DImode compare_and_swap.  */
-	    || mode == TImode);
+	    || (mode == TImode
+		&& !((regno - FIRST_VGPR_REG) & 3)));
   return false;
 }

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index aeb25fbb931..0e73fea93cf 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -677,6 +677,8 @@ (define_insn_and_split "*movti_insn"
    (set (match_dup 4) (match_dup 5))
    (set (match_dup 6) (match_dup 7))]
   {
+    gcc_assert (rtx_equal_p (operands[0], operands[1])
+		|| !reg_overlap_mentioned_p (operands[0], operands[1]));
    operands[6] = gcn_operand_part (TImode, operands[0], 3);
    operands[7] = gcn_operand_part (TImode, operands[1], 3);
    operands[4] = gcn_operand_part (TImode, operands[0], 2);
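Why 4-register alignment removes the partial-overlap hazard can be checked
with a small standalone model.  FIRST_VGPR_REG is modelled as 0 here and
the helpers are illustrative stand-ins, not the actual backend code:

```cpp
#include <cassert>

const unsigned first_vgpr = 0;  // stand-in for FIRST_VGPR_REG

// The patch's predicate: a TImode value (4 consecutive registers) may
// only start on a 4-register boundary.
static bool timode_regno_ok (unsigned regno)
{
  return ((regno - first_vgpr) & 3) == 0;
}

// Do the 4-register windows [a, a+4) and [b, b+4) overlap without being
// identical?  That partial overlap is the early-clobber hazard.
static bool partial_overlap (unsigned a, unsigned b)
{
  bool overlap = a < b + 4 && b < a + 4;
  return overlap && a != b;
}
```

For any two starting registers that satisfy timode_regno_ok, the windows
either coincide or are disjoint, which is exactly the property the patch
relies on to make early-clobber constraints unnecessary.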
Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point
Hi Dave, thanks for the review!

David Malcolm writes:

[...]

> Was there a reason for using reinterpret_cast here, rather than
> static_cast?

Yes, the reason is that apparently we can't use static_cast for that:

  "error: invalid ‘static_cast’ from type ‘gcc_jit_lvalue*’ to type
  ‘gcc::jit::recording::global*’"

Why that is is another question I fear I'm not very qualified to answer.
If you feel it is necessary and want to suggest some rework to improve
this, I'll be happy to put some effort in if you like.

>> diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
>> index 1c5a12e9c01..3cbcbef2693 100644
>> --- a/gcc/jit/libgccjit.h
>> +++ b/gcc/jit/libgccjit.h
>> @@ -788,6 +788,20 @@ gcc_jit_context_new_global (gcc_jit_context *ctxt,
>> gcc_jit_type *type,
>> const char *name);
>>
>> +#define LIBGCCJIT_HAVE_gcc_jit_global_set_initializer
>> +
>> +/* Set a static initializer for a global; return the global itself.

[...]

> OK for master with these nits fixed, assuming your usual testing.
>
> Thanks!
> Dave

After applying the suggestions and having tested it, I've installed the
patch as 4ecc0061c40.

Thanks!

Andrea
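The cast situation discussed above can be reproduced in a few lines: the
public handle type and the internal type are unrelated as far as the C++
type system knows (the handle is an incomplete type), so static_cast
between their pointer types is ill-formed, while reinterpret_cast simply
reinterprets the pointer and round-trips safely.  The names below are
invented for illustration, not the actual libgccjit types:

```cpp
#include <cassert>

struct api_handle;              // opaque public handle, never defined

namespace impl {
  struct global { int value; }; // the internal type it really points to
}

// reinterpret_cast is required here: api_handle and impl::global are
// unrelated types, so "static_cast<api_handle *> (g)" would not compile.
static api_handle *wrap (impl::global *g)
{
  return reinterpret_cast<api_handle *> (g);
}

static impl::global *unwrap (api_handle *h)
{
  return reinterpret_cast<impl::global *> (h);
}
```

Round-tripping a pointer through reinterpret_cast like this is
well-defined; it is the same pattern C libraries use to hide their
implementation types behind opaque handles.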
Re: [PATCH] improve BB vectorization dump locations
Richard Biener writes: > This tries to improve BB vectorization dumps by providing more > precise locations. Currently the vect_location is simply the > very last stmt in a basic-block that has a location. So for > > double a[4], b[4]; > int x[4], y[4]; > void foo() > { > a[0] = b[0]; // line 5 > a[1] = b[1]; > a[2] = b[2]; > a[3] = b[3]; > x[0] = y[0]; // line 9 > x[1] = y[1]; > x[2] = y[2]; > x[3] = y[3]; > } // line 13 > > we show the user with -O3 -fopt-info-vec > > t.c:13:1: optimized: basic block part vectorized using 16 byte vectors > > while with the patch we point to both independently vectorized > opportunities: > > t.c:5:8: optimized: basic block part vectorized using 16 byte vectors > t.c:9:8: optimized: basic block part vectorized using 16 byte vectors > > there's the possibility that the location regresses in case the > root stmt in the SLP instance has no location. For a SLP subgraph > with multiple entries the location also chooses one entry at random, > not sure in which case we want to dump both. > > Still as the plan is to extend the basic-block vectorization > scope from single basic-block to multiple ones this is a first > step to preserve something sensible. > > Implementation-wise this makes both costing and code-generation > happen on the subgraphs as analyzed. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > Richard - is iteration over vector modes for BB vectorization > still important now that we have related_vector_type and thus > no longer only consider a fixed size? If so it will probably > make sense to somehow still iterate even if there was some > SLP subgraph vectorized? It also looks like BB vectorization > was never updated to consider multiple modes based on cost, > it will still pick the first opportunity. For BB vectorization > we also have the code that re-tries SLP discovery with > splitting the store group. So what's your overall thoughts to > this? 
I think there might be different answers for “in principle” and “in practice”. :-) In principle, there's no one right answer to (say) “what vector mode should I use for 4 32-bit integers?”. If the block is only operating on that type, then VNx4SI is the right choice for 128-bit SVE. But if the block is mostly operating on 4 64-bit integers and just converting to 32-bit integers for a small region, then it might be better to use 2 VNx2SIs instead (paired with 2 VNx2DIs). In practice, one situation in which the current loop might be needed is pattern statements. There we assign a vector type during pattern recognition, based only on the element type. So in that situation, the first pass (with the autodetected base vector mode) will not take the number of scalar stmts into account. Also, although SLP currently only operates on full vectors, I was hoping we would eventually support predication for SLP too. At that point, the number of scalar statements wouldn't directly determine the number of vector lanes. On the cost thing: it would be better to try all four and pick the one with the lowest cost, but given your in-progress changes, it seemed like a dead end to do that with the current code. It sounded like the general direction here was to build an SLP graph and “solve” the vector type assignment problem in a more global way, once we have a view of the entire graph. Is that right? If so, then at that point we might be able to do something more intelligent than just iterate over all the options. (Although at the same time, iterating over all the options on a fixed (sub?)graph would be cheaper than what we do now.) Thanks, Richard
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi Martin, > -Original Message- > From: Martin Jambor > Sent: Tuesday, September 8, 2020 3:01 PM > To: Tamar Christina ; Richard Sandiford > ; luoxhu via Gcc-patches patc...@gcc.gnu.org> > Cc: seg...@kernel.crashing.org; luoxhu ; > wschm...@linux.ibm.com; li...@gcc.gnu.org; Jan Hubicka > ; dje@gmail.com > Subject: RE: [PATCH] ipa-inline: Improve growth accumulation for recursive > calls > > Hi, > > On Fri, Aug 21 2020, Tamar Christina wrote: > >> > >> Honza's changes have been motivated to big extent as an enabler for > >> IPA-CP heuristics changes to actually speed up 548.exchange2_r. > >> > >> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds > two > >> weeks ago, this week it is 403, but with my WIP (and so far untested) > >> patch below it is just 276 seconds - faster than one built with GCC 8 > >> which needs > >> 283 seconds. > >> > >> I'll be interested in knowing if it also works this well on other > >> architectures. > >> > > I have posted the new version of the patch series to the mailing list > yesterday and I have also pushed the branch to the FSF repo as > refs/users/jamborm/heads/ipa-context_and_exchange-200907 > Thanks! I've pushed it through our CI with a host of different options (see below) > > > > Many thanks for working on this! > > > > I tried this on an AArch64 Neoverse-N1 machine and didn't see any > difference. > > Do I need any flags for it to work? The patch was applied on top of > > 656218ab982cc22b826227045826c92743143af1 > > > > I only have access to fairly old AMD (Seattle) Opteron 1100 which might not > support some interesting Aarch64 ISA extensions but I can measure a > significant speedup on it (everything with just -Ofast -march=native - > mtune=native, no non-default parameters, without LTO, without any inlining > options): > > GCC 10 branch: 915 seconds > Master (rev. 995bb851ffe): 989 seconds > My branch: 827 seconds > > (All is 548.exchange_r reference run time.) 
| Compiler | Flags | diff GCC 10 |
|----------|-------|-------------|
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | |
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer | -44% |
| GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto | -36% |
| GCC 11 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 | -22% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12% |
| Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
Re: [PATCH] improve BB vectorization dump locations
On Fri, 11 Sep 2020, Richard Sandiford wrote: > Richard Biener writes: > > This tries to improve BB vectorization dumps by providing more > > precise locations. Currently the vect_location is simply the > > very last stmt in a basic-block that has a location. So for > > > > double a[4], b[4]; > > int x[4], y[4]; > > void foo() > > { > > a[0] = b[0]; // line 5 > > a[1] = b[1]; > > a[2] = b[2]; > > a[3] = b[3]; > > x[0] = y[0]; // line 9 > > x[1] = y[1]; > > x[2] = y[2]; > > x[3] = y[3]; > > } // line 13 > > > > we show the user with -O3 -fopt-info-vec > > > > t.c:13:1: optimized: basic block part vectorized using 16 byte vectors > > > > while with the patch we point to both independently vectorized > > opportunities: > > > > t.c:5:8: optimized: basic block part vectorized using 16 byte vectors > > t.c:9:8: optimized: basic block part vectorized using 16 byte vectors > > > > there's the possibility that the location regresses in case the > > root stmt in the SLP instance has no location. For a SLP subgraph > > with multiple entries the location also chooses one entry at random, > > not sure in which case we want to dump both. > > > > Still as the plan is to extend the basic-block vectorization > > scope from single basic-block to multiple ones this is a first > > step to preserve something sensible. > > > > Implementation-wise this makes both costing and code-generation > > happen on the subgraphs as analyzed. > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > > > Richard - is iteration over vector modes for BB vectorization > > still important now that we have related_vector_type and thus > > no longer only consider a fixed size? If so it will probably > > make sense to somehow still iterate even if there was some > > SLP subgraph vectorized? It also looks like BB vectorization > > was never updated to consider multiple modes based on cost, > > it will still pick the first opportunity. 
For BB vectorization > > we also have the code that re-tries SLP discovery with > > splitting the store group. So what's your overall thoughts to > > this? > > I think there might be different answers for “in principle” and > “in practice”. :-) > > In principle, there's no one right answer to (say) “what vector mode > should I use for 4 32-bit integers?”. If the block is only operating on > that type, then VNx4SI is the right choice for 128-bit SVE. But if the > block is mostly operating on 4 64-bit integers and just converting to > 32-bit integers for a small region, then it might be better to use > 2 VNx2SIs instead (paired with 2 VNx2DIs). > > In practice, one situation in which the current loop might be needed > is pattern statements. There we assign a vector type during pattern > recognition, based only on the element type. So in that situation, > the first pass (with the autodetected base vector mode) will not take > the number of scalar stmts into account. Ah, indeed. So currently the per-BB decision is probably not too bad but when it becomes a per-function decision we need to do something about this I guess. > Also, although SLP currently only operates on full vectors, > I was hoping we would eventually support predication for SLP too. > At that point, the number of scalar statements wouldn't directly > determine the number of vector lanes. > > On the cost thing: it would be better to try all four and pick the one > with the lowest cost, but given your in-progress changes, it seemed like > a dead end to do that with the current code. > > It sounded like the general direction here was to build an SLP graph > and “solve” the vector type assignment problem in a more global way, > once we have a view of the entire graph. Is that right? If so, > then at that point we might be able to do something more intelligent > than just iterate over all the options. 
(Although at the same time, > iterating over all the options on a fixed (sub?)graph would be cheaper > than what we do now.) Yes, in the final end we should have the SLP tree build without having assigned a vector type which also means doing pattern detection on the SLP tree after the SLP build (which makes that a viable change only after we got rid of the non-SLP paths). But yeah, I forgot about the early vector type assignment during pattern recog ... I did try to get rid of vect_update_shared_vectype some months ago (the only remaining user of STMT_VINFO_NUM_SLP_USES), but somehow miserably failed even though we now have SLP_TREE_VECTYPE - a vector type per SLP node (see https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547981.html). So yeah, pattern recog ... a convenient but also quite iffy feature :/ Richard.
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > condition, lose = 1; >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); >mark_referenced_resources (trial, &needed, true); > + if (filter_flags) > + { > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > + > + /* Groups of flags-register setters with users should not > + affect opportunities to move flags-register-setting insns > + (clobbers) into the delay-slot. */ > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > + } Don't you need to put the new block before mark_referenced_resources (as I did in fill_simple_delay_slots) in case the needed insn reads the flags register? -- Eric Botcazou
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Eric Botcazou > CC: "gcc-patches@gcc.gnu.org" > Date: Fri, 11 Sep 2020 13:09:48 +0200 > > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > condition, lose = 1; > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > >mark_referenced_resources (trial, &needed, true); > > + if (filter_flags) > > + { > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > + > > + /* Groups of flags-register setters with users should not > > +affect opportunities to move flags-register-setting insns > > +(clobbers) into the delay-slot. */ > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > + } > > Don't you need to put the new block before mark_referenced_resources (as I > did > in fill_simple_delay_slots) in case the needed insn reads the flags register? It can't be needed, as the flags register is dead in the insn with the delay-slot, as part of the filter_flags condition. brgds, H-P
Re: [PATCH V2] libgccjit: Add new gcc_jit_context_new_blob entry point
On Fri, 2020-09-11 at 12:31 +0200, Andrea Corallo wrote: > Hi Dave, > > thanks for the review! > > David Malcolm writes: > > [...] > > > Was there a reason for using reinterpret_cast here, rather than > > static_cast? > > Yes the reason is that apparently we can't use static_cast for that: > > "error: invalid ‘static_cast’ from type ‘gcc_jit_lvalue*’ to type > ‘gcc::jit::recording::global*’" > > Why is that is another question I fear I'm not very qualified to > answer. > If you feel it is necessary and want to suggest some rework to improve > this > I'll be happy to put some effort in if you like. It's because we're effectively downcasting from lvalue to global. This is safe because of the check for global->is_global (). A slightly cleaner way to do it would be to have a dyn_cast_global vfunc here rather than is_global, and to use the result - but I don't think it matters. [...] > > > After applying the suggestions and having tested it I've > > > installed the patch as 4ecc0061c40. > > Thanks! Thanks Dave
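The checked downcast David describes can be sketched with stand-in classes. The names here (lvalue, global, as_global) are illustrative only — the real libgccjit code additionally casts from the opaque C header type gcc_jit_lvalue*, which is unrelated to the internal C++ hierarchy and therefore needs reinterpret_cast rather than the static_cast used in this sketch:

```cpp
#include <cassert>

// Toy model of the pattern: downcast only after a runtime check, and
// (as suggested) return the result of the check directly from a
// dyn_cast-style helper instead of a separate is_global () query.
struct lvalue
{
  virtual ~lvalue () {}
  virtual bool is_global () const { return false; }
};

struct global : lvalue
{
  bool is_global () const override { return true; }
  int initial_value = 42;
};

// Returns the downcast pointer on success, nullptr otherwise.
static global *
as_global (lvalue *v)
{
  return v->is_global () ? static_cast<global *> (v) : nullptr;
}
```

A dyn_cast_global virtual function in the real hierarchy would play the role of as_global here.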
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Hans-Peter Nilsson > Date: Fri, 11 Sep 2020 13:24:18 +0200 > > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > > condition, lose = 1; > > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > > >mark_referenced_resources (trial, &needed, true); > > > + if (filter_flags) > > > + { > > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > > + > > > + /* Groups of flags-register setters with users should not > > > + affect opportunities to move flags-register-setting insns > > > + (clobbers) into the delay-slot. */ > > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > > + } > > > > Don't you need to put the new block before mark_referenced_resources (as I > > did > > in fill_simple_delay_slots) in case the needed insn reads the flags > > register? > > It can't be needed, as the flags register is dead in the insn with the > delay-slot, as part of the filter_flags condition. Uh, I'll get back with a more thoughtful reply... brgds, H-P
Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets
> From: Eric Botcazou > Date: Fri, 11 Sep 2020 13:09:48 +0200 > > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx > > condition, lose = 1; > >mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL); > >mark_referenced_resources (trial, &needed, true); > > + if (filter_flags) > > + { > > + mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL); > > + > > + /* Groups of flags-register setters with users should not > > +affect opportunities to move flags-register-setting insns > > +(clobbers) into the delay-slot. */ > > + CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum); > > + CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum); > > + } > > Don't you need to put the new block before mark_referenced_resources (as I > did > in fill_simple_delay_slots) in case the needed insn reads the flags register? I think the simplest answer is that it wouldn't be good; it'd leave the "CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);" without effect. Note also fill_simple_delay_slots is moving insns from before the delay-slotting insn, while this is from a "thread" (or arm, or leg) "after". brgds, H-P
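The resource bookkeeping being debated can be modelled with plain bitmasks. This is a toy stand-in for reorg.c's HARD_REG_SET, mark_set_resources and CLEAR_HARD_REG_BIT — not the real code — with FLAGS_BIT standing in for targetm.flags_regnum:

```c
#include <assert.h>

/* A candidate insn may move into a delay slot only if the registers it
   sets do not overlap the registers still needed afterwards.  When
   filter_flags is active, the flags bit is cleared from both masks, so
   a flags clobber no longer blocks the move -- safe precisely because
   the flags register is known dead at the delay-slot insn.  */
#define FLAGS_BIT (1u << 7)

static int
can_fill_slot (unsigned trial_sets, unsigned needed, int filter_flags)
{
  if (filter_flags)
    {
      trial_sets &= ~FLAGS_BIT;
      needed &= ~FLAGS_BIT;
    }
  return (trial_sets & needed) == 0;
}
```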
[OG10] Merge GCC 10 into branch; merge some mainline nvptx patches
OG10 = devel/omp/gcc-10 I have merged releases/gcc-10 into that branch. And added a bunch of mainline alias GCC 11 nvptx patches to that branch. 2df8e0f1bc4 [libatomic] Add nvptx support 5544bca37bc [nvptx] Fix UB in nvptx_assemble_value 7e10b6b0b34 [nvptx] Fix printing of 128-bit constant (negative case) 79d64f8ab05 [nvptx] Fix printing of 128-bit constant 21fc67c95a4 [nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c 46595b72fed [nvptx] Fix Wformat in nvptx_assemble_decl_begin 842cd983048 [nvptx] Fix boolean type test in write_fn_proto 97012eaf9d3 [nvptx] Fix array dimension in nvptx_assemble_decl_begin 1e00dba14b2 [nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle 2a509a60479 Update ChangeLog.omp for the last two commits fbfd2a5674e nvptx: Provide vec_set and vec_extract patterns 89f5545f62c nvptx: Support 16-bit shifts and extendqihi2 34f8c7e5fa3 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10 - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
[PATCH] fixincludes/fixfixes.c: Fix 'set but not used' warning.
pz_tmp_base and pz_tmp_dot are always set, but used only when _PC_NAME_MAX is defined. This patch moves their declaration and definition under #ifdef _PC_NAME_MAX to avoid this warning. 2020-09-11 Torbjörn SVENSSON Christophe Lyon fixincludes/ * fixfixes.c (pz_tmp_base, pz_tmp_dot): Define only with _PC_NAME_MAX. --- fixincludes/fixfixes.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c index 034e15d..5b23a8b 100644 --- a/fixincludes/fixfixes.c +++ b/fixincludes/fixfixes.c @@ -738,8 +738,10 @@ main( int argc, char** argv ) { tFixDesc* pFix; char* pz_tmptmp; +#ifdef _PC_NAME_MAX char* pz_tmp_base; char* pz_tmp_dot; +#endif if (argc != 5) { @@ -772,12 +774,12 @@ main( int argc, char** argv ) pz_tmptmp = XNEWVEC (char, strlen (argv[4]) + 5); strcpy( pz_tmptmp, argv[4] ); +#ifdef _PC_NAME_MAX /* Don't lose because "12345678" and "12345678X" map to the same file under DOS restricted 8+3 file namespace. Note that DOS doesn't allow more than one dot in the trunk of a file name. */ pz_tmp_base = basename( pz_tmptmp ); pz_tmp_dot = strchr( pz_tmp_base, '.' ); -#ifdef _PC_NAME_MAX if (pathconf( pz_tmptmp, _PC_NAME_MAX ) <= 12/* is this DOS or Windows9X? */ && pz_tmp_dot != (char*)NULL) strcpy (pz_tmp_dot+1, "X"); /* nuke the original extension */ -- 2.7.4
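The shape of the fix, reduced to a hypothetical helper (not the actual fixfixes.c code): when a variable's only use is conditionally compiled, its declaration moves under the same guard, so configurations without the macro never see a set-but-unused variable. WANT_EXTENSION stands in for _PC_NAME_MAX being defined:

```c
#include <assert.h>
#include <string.h>

#define WANT_EXTENSION 1   /* stands in for '_PC_NAME_MAX is defined' */

static const char *
extension_of (const char *base)
{
#ifdef WANT_EXTENSION
  /* Declared only in the configuration that actually reads it.  */
  const char *dot = strchr (base, '.');
  if (dot != NULL)
    return dot + 1;
#endif
  return base;
}
```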
[PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
This patch adds the missing prototypes for the functions defined in fp16.c, to avoid these warnings during the build: /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes] 2020-09-11 Torbjörn SVENSSON Christophe Lyon libgcc/ * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) (__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes. --- libgcc/config/arm/fp16.c | 9 + 1 file changed, 9 insertions(+) diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c index e8f7afb..3d1d8f4 100644 --- a/libgcc/config/arm/fp16.c +++ b/libgcc/config/arm/fp16.c @@ -52,6 +52,15 @@ binary64 = 52 /* significand. */ }; +/* Function prototypes */ +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); +unsigned short __gnu_f2h_ieee(unsigned int a); +unsigned int __gnu_h2f_ieee(unsigned short a); +unsigned short __gnu_f2h_alternative(unsigned int x); +unsigned int __gnu_h2f_alternative(unsigned short a); +unsigned short __gnu_d2h_ieee (unsigned long long a); +unsigned short __gnu_d2h_alternative (unsigned long long x); + static inline unsigned short __gnu_float2h_internal (const struct format* fmt, unsigned long long a, int ieee) -- 2.7.4
[PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), + const char* __what__attribute__((unused))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
[PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid warning with -fno-exceptions.
When building with -fno-exceptions, __throw_exception_again expands to nothing, causing a "suggest braces around empty body in an 'if' statement" warning. This patch adds braces, like what was done in eh_personality.cc in svn r193295 (git g:54ba39f599fc2f3d59fd3cd828a301ce9b731a20) 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index ee44b1a..d50c4fb 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -138,7 +138,7 @@ __cxa_call_unexpected(void* exc_obj_in) if (__cxa_type_match(&new_xh->unwindHeader, catch_type, false, &new_ptr) != ctm_failed) - __throw_exception_again; + { __throw_exception_again; } // If the exception spec allows std::bad_exception, throw that. // We don't have a thrown object to compare against, but since -- 2.7.4
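A minimal reproduction of the warning class being fixed, with a hypothetical THROW_AGAIN macro playing the role of __throw_exception_again: when the macro expands to nothing (as under -fno-exceptions), an unbraced if body becomes empty and -Wempty-body's "suggest braces" warning fires; the braces keep the statement well-formed either way:

```cpp
#include <cassert>

#if defined (DEMO_EXCEPTIONS)
# define THROW_AGAIN throw      // rethrow the active exception
#else
# define THROW_AGAIN            // expands to nothing, like -fno-exceptions
#endif

int
classify (int type_match)
{
  if (type_match)
    { THROW_AGAIN; }            // braced, so the body is never empty
  return 0;
}
```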
[PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid "set but not used" warning
When building with -fno-exceptions, bad_exception_allowed is set but not used, causing a warning during the build. This patch adds __attribute__((unused)) to avoid it. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index d50c4fb..3c7426e 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -124,7 +124,7 @@ __cxa_call_unexpected(void* exc_obj_in) void* new_ptr = __get_object_from_ambiguous_exception (new_xh); const std::type_info* catch_type; int n; - bool bad_exception_allowed = false; + bool bad_exception_allowed __attribute__((unused)) = false; const std::type_info& bad_exc = typeid(std::bad_exception); // Check the new exception against the rtti list -- 2.7.4
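The same idea in a reduced, hypothetical form (not the eh_call.cc code): the flag is written unconditionally but read only on the exceptions-enabled path, so the attribute documents that some configurations legitimately never read it:

```cpp
#include <cassert>

int
check_spec (bool matched)
{
  // Written below, but read only when exceptions are enabled.
  bool bad_exception_allowed __attribute__((__unused__)) = false;
  if (matched)
    bad_exception_allowed = true;
#ifdef __cpp_exceptions
  if (bad_exception_allowed)
    return 2;                   // the would-be throwing path
#endif
  return 0;
}
```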
RE: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
Hi Christophe, > -Original Message- > From: Gcc-patches On Behalf Of > Christophe Lyon via Gcc-patches > Sent: 11 September 2020 13:23 > To: gcc-patches@gcc.gnu.org; i...@airs.com > Subject: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes > > This patch adds the missing prototypes for the fonctions defined in fp16.c, to > avoid these warnings during the build: > /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for > '__gnu_h2f_internal' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for > '__gnu_f2h_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for > '__gnu_h2f_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for > '__gnu_f2h_alternative' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for > '__gnu_h2f_alternative' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for > '__gnu_d2h_ieee' [-Wmissing-prototypes] > /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for > '__gnu_d2h_alternative' [-Wmissing-prototypes] > > 2020-09-11 Torbjörn SVENSSON > Christophe Lyon > > libgcc/ > * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) > (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) > (__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes. > --- > libgcc/config/arm/fp16.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c > index e8f7afb..3d1d8f4 100644 > --- a/libgcc/config/arm/fp16.c > +++ b/libgcc/config/arm/fp16.c > @@ -52,6 +52,15 @@ binary64 = >52 /* significand. 
*/ > }; > > +/* Function prototypes */ > +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); I think this function should just be marked as static, as it is truly internal to this file, whereas the other ones can be called from code emitted in config/arm/arm.c Thanks, Kyrill > +unsigned short __gnu_f2h_ieee(unsigned int a); > +unsigned int __gnu_h2f_ieee(unsigned short a); > +unsigned short __gnu_f2h_alternative(unsigned int x); > +unsigned int __gnu_h2f_alternative(unsigned short a); > +unsigned short __gnu_d2h_ieee (unsigned long long a); > +unsigned short __gnu_d2h_alternative (unsigned long long x); > + > static inline unsigned short > __gnu_float2h_internal (const struct format* fmt, > unsigned long long a, int ieee) > -- > 2.7.4
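Kyrill's point in miniature, with illustrative names rather than the real fp16.c functions: -Wmissing-prototypes only complains about externally visible definitions, so a truly file-internal helper can simply become static, while the entry points called from compiler-generated code keep explicit prototypes:

```c
#include <assert.h>

/* Public entry point: keeps a prototype.  */
unsigned short demo_f2h_ieee (unsigned int a);

/* File-internal helper: 'static' removes the need for a prototype
   and guarantees it cannot be called from other translation units.  */
static unsigned int
demo_h2f_internal (unsigned short a, int ieee)
{
  return (unsigned int) a + (ieee ? 0u : 1u);
}

unsigned short
demo_f2h_ieee (unsigned int a)
{
  return (unsigned short) (demo_h2f_internal ((unsigned short) a, 1) & 0xffffu);
}
```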
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On Sep 11, 2020, Alan Modra wrote: > I also thought it reasonable to error on an explicit -mcmodel=medium > or -mcmodel=large with either of -mminimal-toc or -mno-minimal-toc, > since the toc options really are specific to small model code. Why > change that? Thanks. I think the key piece of information that was missing for us was that -mno-minimal-toc was specific to small model code as much as -mminimal-toc. Without that info, the prevailing expectation was that, if -mminimal-toc conflicted with -mcmodel=*, then reversing its effects with -mno-minimal-toc should make it fine, but doesn't. I guess this means we withdraw the patch. Thanks again! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer
[PATCH] tree-optimization/97020 - account SLP cost in loop vect again
The previous re-org made the cost of SLP vector stmts in loop vectorization ignored. The following rectifies this mistake. Bootstrapped & tested on x86_64-unknown-linux-gnu, pushed. 2020-09-11 Richard Biener PR tree-optimization/97020 * tree-vect-slp.c (vect_slp_analyze_operations): Apply SLP costs when doing loop vectorization. --- gcc/tree-vect-slp.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 519cd6a7254..15912515caa 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3163,8 +3163,15 @@ vect_slp_analyze_operations (vec_info *vinfo) visited.add (*x); i++; - /* Remember the SLP graph entry cost for later. */ - instance->cost_vec = cost_vec; + /* For BB vectorization remember the SLP graph entry +cost for later. */ + if (is_a <bb_vec_info> (vinfo)) + instance->cost_vec = cost_vec; + else + { + add_stmt_costs (vinfo, vinfo->target_cost_data, &cost_vec); + cost_vec.release (); + } } } -- 2.26.2
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi, On Fri, Sep 11 2020, Tamar Christina wrote: > Hi Martin, >> On Fri, Aug 21 2020, Tamar Christina wrote: >> >> >> >> Honza's changes have been motivated to big extent as an enabler for >> >> IPA-CP heuristics changes to actually speed up 548.exchange2_r. >> >> >> >> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds >> two >> >> weeks ago, this week it is 403, but with my WIP (and so far untested) >> >> patch below it is just 276 seconds - faster than one built with GCC 8 >> >> which needs >> >> 283 seconds. >> >> >> >> I'll be interested in knowing if it also works this well on other >> >> architectures. >> >> >> >> I have posted the new version of the patch series to the mailing list >> yesterday and I have also pushed the branch to the FSF repo as >> refs/users/jamborm/heads/ipa-context_and_exchange-200907 >> > > Thanks! I've pushed it through our CI with a host of different options (see > below) > >> > >> > Many thanks for working on this! >> > >> > I tried this on an AArch64 Neoverse-N1 machine and didn't see any >> difference. >> > Do I need any flags for it to work? The patch was applied on top of >> > 656218ab982cc22b826227045826c92743143af1 >> > >> >> I only have access to fairly old AMD (Seattle) Opteron 1100 which might not >> support some interesting AArch64 ISA extensions but I can measure a >> significant speedup on it (everything with just -Ofast -march=native -mtune=native, no non-default parameters, without LTO, without any inlining >> options): >> >> GCC 10 branch: 915 seconds >> Master (rev. 995bb851ffe): 989 seconds >> My branch: 827 seconds >> >> (All is 548.exchange_r reference run time.)
>> > > +--+--+--+--+--+ > | Compiler | Flags > > | diff GCC 10 | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer > > | -44% | | | > +--+--+--+--+--+ > | GCC 10 | -mcpu=native -Ofast -fomit-frame-pointer -flto > > | -36% | | | > +--+--+--+--+--+ > | GCC 11 | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | -12% | | | > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > | -22% | | | > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto --param > ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 > -fno-inline-functions-called-once | -12% | | | can you please confirm that the difference between these two is all due to the last option -fno-inline-functions-called-once ? Is LTo necessary? I.e., can you run the benchmark also built with the branch compiler and -mcpu=native -Ofast -fomit-frame-pointer -fno-inline-functions-called-once ? > +--+--+--+--+--+ > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto > > | -24% | | | > +-
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), + const char* __what__attribute__((unused))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
Re: [PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid warning with -fno-exceptions.
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __throw_exception_again expands to nothing, causing a "suggest braces around empty body in an 'if' statement" warning. This patch adds braces, like what was done in eh_personality.cc in svn r193295 (git g:54ba39f599fc2f3d59fd3cd828a301ce9b731a20) OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index ee44b1a..d50c4fb 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -138,7 +138,7 @@ __cxa_call_unexpected(void* exc_obj_in) if (__cxa_type_match(&new_xh->unwindHeader, catch_type, false, &new_ptr) != ctm_failed) - __throw_exception_again; + { __throw_exception_again; } // If the exception spec allows std::bad_exception, throw that. // We don't have a thrown object to compare against, but since -- 2.7.4
Re: [PATCH] libstdc++-v3/libsupc++/eh_call.cc: Avoid "set but not used" warning
On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, bad_exception_allowed is set but not used, causing a warning during the build. This patch adds __attribute__((unused)) to avoid it. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * libsupc++/eh_call.cc: Avoid warning with -fno-exceptions. --- libstdc++-v3/libsupc++/eh_call.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/libsupc++/eh_call.cc b/libstdc++-v3/libsupc++/eh_call.cc index d50c4fb..3c7426e 100644 --- a/libstdc++-v3/libsupc++/eh_call.cc +++ b/libstdc++-v3/libsupc++/eh_call.cc @@ -124,7 +124,7 @@ __cxa_call_unexpected(void* exc_obj_in) void* new_ptr = __get_object_from_ambiguous_exception (new_xh); const std::type_info* catch_type; int n; - bool bad_exception_allowed = false; + bool bad_exception_allowed __attribute__((unused)) = false; const std::type_info& bad_exc = typeid(std::bad_exception); // Check the new exception against the rtti list -- 2.7.4
Re: [PATCH] ppc64 check for incompatible setting of minimal-toc
On 9/11/20 5:35 AM, Alexandre Oliva wrote: I guess this means we withdraw the patch. Thanks again! The original motivation for this was a failing fsf-testsuite-gcc test on vxworks7. If the patch is wrong, then I suppose the alternative is to modify the test, or else kill it?
Re: [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote: This series ensures that the std::future::wait_* functions use std::chrono::steady_clock when required, introduces std::chrono::__detail::ceil to make that easier to do, and then makes use of that function to simplify and improve the fix for PR68519 in std::condition_variable. v1 of this series was originally posted back in September 2017 (see https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html ) v2 of this series was originally posted back in January 2018 (see https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html ) v3 of this series was originally posted back in August 2018 (see https://gcc.gnu.org/ml/libstdc++/2018-08/msg1.html ) v4 of this series was originally posted back in October 2019 (see https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01934.html ) Changes since v4: * Expose std::chrono::ceil as std::chrono::__detail::ceil so that it can be used to fix PR91486 in std::future::wait_for (as suggested by John Salmon in PR91486.) * Use std::chrono::__detail::ceil to simplify fix for PR68519 in std::condition_variable::wait_for. * Also fix equivalent of PR68519 in std::condition_variable::wait_until and add test. I've pushed this series (except for patch 8/8) to trunk now, with only minor indentation changes in the code and some ChangeLog reformatting. Thanks for your persistence in getting this fixed, and sorry it took so many years!
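The rounding behaviour at the heart of this series can be illustrated with the public C++17 std::chrono::ceil, which std::chrono::__detail::ceil mirrors for pre-C++17 builds: converting a timeout to a coarser duration must round up, because truncation would let wait_for/wait_until wake before the deadline:

```cpp
#include <cassert>
#include <chrono>

// duration_cast truncates toward zero: 1500ms -> 1s, an early wake.
// ceil rounds up: 1500ms -> 2s, which can only over-wait slightly.
std::chrono::seconds
round_up_timeout (std::chrono::milliseconds t)
{
  return std::chrono::ceil<std::chrono::seconds> (t);
}
```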
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 13:55 +0100, Jonathan Wakely wrote: On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), Actually, not OK, because that needs to be __unused__ not unused. + const char* __what__attribute__((unused))) And that fails to compile: /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/regex_error.h:171:54: error: expected ‘,’ or ‘...’ before ‘(’ token There's a space missing between the name and the attribute. { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION -- 2.7.4
Re: [PATCH] libstdc++-v3/include/bits/regex_error.h: Avoid warning with -fno-exceptions.
On 11/09/20 14:37 +0100, Jonathan Wakely wrote: On 11/09/20 13:55 +0100, Jonathan Wakely wrote: On 11/09/20 12:23 +, Christophe Lyon via Libstdc++ wrote: When building with -fno-exceptions, __GLIBCXX_THROW_OR_ABORT expands to abort(), causing warnings: unused parameter '__ecode' unused parameter '__what' This patch adds __attribute__((unused)) to avoid them. OK, thanks. 2020-09-11 Torbjörn SVENSSON Christophe Lyon libstdc++-v3/ * include/bits/regex_error.h: Avoid warning with -fno-exceptions. --- libstdc++-v3/include/bits/regex_error.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 09e9288..88f3f811 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,7 +167,8 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), Actually, not OK, because that needs to be __unused__ not unused. + const char* __what__attribute__((unused))) And that fails to compile: /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/regex_error.h:171:54: error: expected ‘,’ or ‘...’ before ‘(’ token There's a space missing between the name and the attribute. Fixed with this patch. Untested, but it builds. commit 29216f56d002982f10c33056f4b3d7f07e164122 Author: Jonathan Wakely Date: Fri Sep 11 14:51:36 2020 libstdc++: Fix build error in libstdc++-v3/ChangeLog: * include/bits/regex_error.h (__throw_regex_error): Fix parameter declaration and use reserved attribute names. 
diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 88f3f8114a4..f9c01650caa 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -167,8 +167,9 @@ namespace regex_constants __throw_regex_error(regex_constants::error_type __ecode); inline void - __throw_regex_error(regex_constants::error_type __ecode __attribute__((unused)), - const char* __what__attribute__((unused))) + __throw_regex_error(regex_constants::error_type __ecode + __attribute__((__unused__)), + const char* __what __attribute__((__unused__))) { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } _GLIBCXX_END_NAMESPACE_VERSION
Re: [committed] amdgcn: align TImode registers
This is now backported to GCC 10. Andrew On 11/09/2020 11:17, Andrew Stubbs wrote: This patch fixes an execution failure in which the compiler would corrupt TImode values due to missed early clobber problems with partially overlapping register allocations. In fact, adding early clobber constraints does not fix the issue because IRA doesn't check that for move instructions properly in all circumstances anyway (the constraint causes an ICE in postreload). This patch fixes the problem by ensuring that TImode values are always aligned to 4-register boundaries, meaning that inputs and outputs will either overlap completely, or not at all, neither of which has early-clobber issues. This is an artificial restriction not present in the hardware, but it is the same solution we use for DImode values where we had a lot of the same problems. With the patch I see the following new test passes: PASS: gfortran.dg/PR95331.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/PR95331.f90 -O3 -g execution test PASS: gfortran.dg/gamma_5.f90 -O0 execution test PASS: gfortran.dg/gamma_5.f90 -O1 execution test PASS: gfortran.dg/gamma_5.f90 -O2 execution test PASS: gfortran.dg/gamma_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/gamma_5.f90 -O3 -g execution test PASS: gfortran.dg/gamma_5.f90 -Os execution test PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test PASS: gfortran.dg/optional_absent_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: gfortran.dg/optional_absent_5.f90 -O3 -g execution test Andrew
Re: [PATCH] testsuite: gimplefe-44 requires exceptions
This is now committed and backported to GCC 10. Andrew On 10/09/2020 15:03, Andrew Stubbs wrote: This patch prevents an ICE (segmentation fault) that occurs for amdgcn because the test is trying to use -fexceptions, which is unsupported on the target. Arguably it should fail more gracefully, but either way the test is inappropriate. OK to commit? Andrew
Re: [PATCH] [PATCH] PR rtl-optimization/96791 Check precision of partial modes
On Fri, Sep 11, 2020 at 08:07:39AM +0200, Richard Biener wrote: > On Thu, Sep 10, 2020 at 5:12 PM Segher Boessenkool > wrote: > > On Thu, Sep 10, 2020 at 04:33:30PM +0200, Richard Biener wrote: > > > On Thu, Sep 10, 2020 at 4:22 PM Aaron Sawdey > > > wrote: > > > > If it feels like a hack, that would because it is a hack. > > > > > > > > What I’d really like to discuss is how to accomplish the real goal: > > > > keep anything from trying to do other operations (zero/sign extend for > > > > one) to POImode. > > > > > > > > Is there an existing mechanism for this? > > > > > > Not that I know, but somehow x86 gets away with OImode and XImode without > > > providing too many patterns for those. > > > > What we were seeing is DSE (of all things!) tries to extract a DImode > > from a POImode (and expects that insn to exist!) That is no good. > > Maybe. I don't know what kind of operations have to exist if a mode > is present and what not. > But are you sure this will be the only case you'll run into? No, I am not sure if there are bugs related to this elsewhere in the compiler :-) Until 2014 (and documented just days ago ;-) ) all bits of a partial integer mode were considered unknown. I have looked at a lot of it in our code the past weeks, and we still treat it like that in most places. We now see bootstrap problems if we use POImode in some contexts (that's this PR96791). POImode can only live in pairs of VSX registers; taking a subreg of POImode that would not be valid on one VSX register is not okay. Maybe we are missing some hooks or macros? Segher
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 9/2/20 1:48 PM, Tom de Vries wrote: > On 9/2/20 12:44 PM, Jakub Jelinek wrote: >> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: >>> And test-case passes on x86_64 with this patch (obviously, in >>> combination with trigger patch above). >>> >>> Jakub, WDYT? >> >> I guess the normal answer would be use libatomic, but it isn't ported for >> nvptx. > > Ah, I was not aware of that one, filed > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. > >> I guess at least temporarily this is ok, though I'm wondering why >> you need __sync_*_16 rather than __atomic_*_16, > > That's what omp-expand.c uses in expand_omp_atomic_pipeline: > BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . > I've got an updated version of this patch. It: - no longer supplies the __atomic_load_16, since that's now handled by libatomic - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, which also falls back on libatomic. I'm currently retesting. Any comments? Otherwise, I'll commit on Monday. Thanks, - Tom
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
[ Fixing ENOPATCH. ] On 9/11/20 4:24 PM, Tom de Vries wrote: > On 9/2/20 1:48 PM, Tom de Vries wrote: >> On 9/2/20 12:44 PM, Jakub Jelinek wrote: >>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote: And test-case passes on x86_64 with this patch (obviously, in combination with trigger patch above). Jakub, WDYT? >>> >>> I guess the normal answer would be use libatomic, but it isn't ported for >>> nvptx. >> >> Ah, I was not aware of that one, filed >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that. >> >>> I guess at least temporarily this is ok, though I'm wondering why >>> you need __sync_*_16 rather than __atomic_*_16, >> >> That's what omp-expand.c uses in expand_omp_atomic_pipeline: >> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N . >> > > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. > > Thanks, > - Tom > [libgomp, nvptx] Add __sync_compare_and_swap_16 As reported here ( https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html ), when running test-case libgomp.c-c++-common/reduction-16.c for powerpc host with nvptx accelerator, we run into: ... unresolved symbol __sync_val_compare_and_swap_16 ... I can reproduce the problem on x86_64 with a trigger patch that: - initializes ix86_isa_flags2 to TARGET_ISA2_CX16 - enables define_expand "atomic_load" in gcc/config/i386/sync.md for TImode The problem is that omp-expand.c generates atomic builtin calls based on checks whether those are supported on the host, which forces the target to support these, even though those checks fail for the accelerator target. 
Fix this by: - adding a __sync_val_compare_and_swap_16 in libgomp for nvptx, which falls back onto libatomic's __atomic_compare_and_swap_16 - adding -foffload=-latomic in the test-case Tested libgomp on x86_64-linux with nvptx accelerator. Tested libgomp with trigger patch on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: * config/nvptx/atomic.c: New file. Add __sync_val_compare_and_swap_16. --- libgomp/config/nvptx/atomic.c | 18 ++ libgomp/testsuite/libgomp.c-c++-common/reduction-16.c | 1 + 2 files changed, 19 insertions(+) diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c new file mode 100644 index 000..2e13b09b497 --- /dev/null +++ b/libgomp/config/nvptx/atomic.c @@ -0,0 +1,18 @@ +#include "libgomp.h" + +#include "../../atomic.c" + +/* Implement __sync_val_compare_and_swap_16, to support offloading from hosts + that support this builtin. Fallback on libatomic. This can be removed + once omp-expand starts using __atomic_compare_exchange_n instead. */ + +unsigned __int128 +__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval, +unsigned __int128 newval) +{ + volatile __int128 *ptr = vptr; + __int128 expected = oldval; + __atomic_compare_exchange_n (ptr, &expected, newval, false, + MEMMODEL_SEQ_CST, MEMMODEL_SEQ_CST); + return expected; +} diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c index d0e82b04790..cfa8853f18b 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ #include
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On Fri, Sep 11, 2020 at 04:24:42PM +0200, Tom de Vries wrote: > I've got an updated version of this patch. It: > - no longer supplies the __atomic_load_16, since that's now handled by > libatomic > - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap, > which also falls back on libatomic. > > I'm currently retesting. > > Any comments? > > Otherwise, I'll commit on Monday. If some functions are now in libatomic, do we expect users to know that and pass -foffload=-latomic to link, or will mkoffload or whatever do that automatically? If the latter, will e.g. libgomp testsuite ensure that during testing the library can be found even non-installed, if the former, will libgomp testsuite add it for the respective testcases that need it, perhaps under special options? Jakub
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote: Convert the specified duration to the target clock's duration type before adding it to the current time in __atomic_futex_unsigned::_M_load_when_equal_for and _M_load_when_equal_until. This removes the risk of the timeout being rounded down to the current time resulting in there being no wait at all when the duration type lacks sufficient precision to hold the steady_clock current time. Rather than using the style of fix from PR68519, let's expose the C++17 std::chrono::ceil function as std::chrono::__detail::ceil so that it can be used in code compiled with earlier standards versions and simplify the fix. This was suggested by John Salmon in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 . This problem has become considerably less likely to trigger since I switched the __atomic_futex_unsigned::__clock_t reference clock from system_clock to steady_clock and added the loop, but the consequences of triggering it have changed too. By my calculations it takes just over 194 days from the epoch for the current time not to be representable in a float. This means that system_clock is always subject to the problem (with the standard 1970 epoch) whereas steady_clock with float duration only runs out of resolution if the machine has been running for that long (assuming the Linux implementation of CLOCK_MONOTONIC.) The recently-added loop in __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario into a busy wait. Unfortunately the combination of both of these things means that it's not possible to write a test case for this occurring in _M_load_when_equal_until as it stands. * libstdc++-v3/include/std/chrono: (__detail::ceil) Move implementation of std::chrono::ceil into private namespace so that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. 
@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_duration<_ToDur> ceil(const duration<_Rep, _Period>& __d) { - auto __to = chrono::duration_cast<_ToDur>(__d); - if (__to < __d) - return __to + _ToDur{1}; - return __to; + return __detail::ceil<_ToDur>(__d); This isn't valid in C++11, a constexpr function needs to be just a return statement. Fix incoming ...
[PATCH, rs6000] testsuite fixup pr96139 tests
Hi, As reported, the recently added pr96139 tests will fail on older targets because the tests are missing the appropriate -mvsx or -maltivec options. This adds the options and clarifies the dg-require statements. sniff-regtested OK when specifying older targets on a power7 host. --target_board=unix/'{-mcpu=power4,-mcpu=power5,-mcpu=power6,-mcpu=power7, -mcpu=power8,-mcpu=power9}''{-m64,-m32}'" OK for trunk? Thanks -=Will gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. * gcc.target/powerpc/pr96139-b.c: specify -mvsx option and require. * gcc.target/powerpc/pr96139-c.c: specify -maltivec option and require. diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c index b3daee4..12a3383 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wall -m32" } */ +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ /* { dg-require-effective-target ilp32 } */ -/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ #include #include void diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c index 19c1110..379849a 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Wall -m64" } */ +/* { dg-options "-O2 -Wall -m64 -mvsx" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ #include #include void diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c index 2464b8d..3ada260 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c @@ -1,7 +1,7 @@ /* { 
dg-do run } */ -/* { dg-options "-O2 -Wall" } */ +/* { dg-options "-O2 -Wall -maltivec" } */ /* { dg-require-effective-target powerpc_altivec_ok } */ /* * Based on test created by sjmunroe for pr96139 */
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. Andrew
Re: [PATCH] c++: Remove LOOKUP_CONSTINIT.
On 9/10/20 10:15 PM, Marek Polacek via Gcc-patches wrote: Since we now have DECL_DECLARED_CONSTINIT_P, we no longer need LOOKUP_CONSTINIT. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? looks good, thanks for noticing. BTW, you now have > /* Set constexpr flag on vars (functions got it in grokfndecl). */ > if (constexpr_p && VAR_P (decl)) > DECL_DECLARED_CONSTEXPR_P (decl) = true; > +/* And the constinit flag (which only applies to variables). */ > +else if (constinit_p && VAR_P (decl)) > + DECL_DECLARED_CONSTINIT_P (decl) = true; might that be clearer as if (VAR_P (decl)) { constexpr stuff constinit stuff } ? Ok either way gcc/cp/ChangeLog: * cp-tree.h (LOOKUP_CONSTINIT): Remove. (LOOKUP_REWRITTEN): Adjust. * decl.c (duplicate_decls): Set DECL_DECLARED_CONSTINIT_P. (check_initializer): Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. (cp_finish_decl): Don't set DECL_DECLARED_CONSTINIT_P. Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. (grokdeclarator): Set DECL_DECLARED_CONSTINIT_P. * decl2.c (grokfield): Don't handle LOOKUP_CONSTINIT. * parser.c (cp_parser_decomposition_declaration): Remove LOOKUP_CONSTINIT handling. (cp_parser_init_declarator): Likewise. * pt.c (tsubst_expr): Likewise. (instantiate_decl): Likewise. * typeck2.c (store_init_value): Use DECL_DECLARED_CONSTINIT_P instead of LOOKUP_CONSTINIT. --- gcc/cp/cp-tree.h | 4 +--- gcc/cp/decl.c| 13 +++-- gcc/cp/decl2.c | 3 --- gcc/cp/parser.c | 9 ++--- gcc/cp/pt.c | 8 ++-- gcc/cp/typeck2.c | 2 +- 6 files changed, 13 insertions(+), 26 deletions(-) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b166475b5f1..5923574a7aa 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5598,13 +5598,11 @@ enum overload_flags { NO_SPECIAL = 0, DTOR_FLAG, TYPENAME_FLAG }; #define LOOKUP_DELEGATING_CONS (LOOKUP_NO_NON_INTEGRAL << 1) /* Allow initialization of a flexible array members. 
*/ #define LOOKUP_ALLOW_FLEXARRAY_INIT (LOOKUP_DELEGATING_CONS << 1) -/* Require constant initialization of a non-constant variable. */ -#define LOOKUP_CONSTINIT (LOOKUP_ALLOW_FLEXARRAY_INIT << 1) /* We're looking for either a rewritten comparison operator candidate or the operator to use on the former's result. We distinguish between the two by knowing that comparisons other than == and <=> must be the latter, as must a <=> expression trying to rewrite to <=> without reversing. */ -#define LOOKUP_REWRITTEN (LOOKUP_CONSTINIT << 1) +#define LOOKUP_REWRITTEN (LOOKUP_ALLOW_FLEXARRAY_INIT << 1) /* Reverse the order of the two arguments for comparison rewriting. First we swap the arguments in add_operator_candidates, then we swap the conversions in add_candidate (so that they correspond to the original order of the diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index f1b7fbaa75b..79afdeeb055 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2312,6 +2312,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend) |= DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (olddecl); DECL_DECLARED_CONSTEXPR_P (newdecl) |= DECL_DECLARED_CONSTEXPR_P (olddecl); + DECL_DECLARED_CONSTINIT_P (newdecl) + |= DECL_DECLARED_CONSTINIT_P (olddecl); /* Merge the threadprivate attribute from OLDDECL into NEWDECL. */ if (DECL_LANG_SPECIFIC (olddecl) @@ -6884,7 +6886,7 @@ check_initializer (tree decl, tree init, int flags, vec **cleanups) flags |= LOOKUP_ALREADY_DIGESTED; } else if (DECL_DECLARED_CONSTEXPR_P (decl) - || (flags & LOOKUP_CONSTINIT)) + || DECL_DECLARED_CONSTINIT_P (decl)) { /* Declared constexpr or constinit, but no suitable initializer; massage init appropriately so we can pass it into @@ -7675,10 +7677,6 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, DECL_INITIAL (decl) = NULL_TREE; } - /* Handle `constinit' on variable templates. 
*/ - if (flags & LOOKUP_CONSTINIT) - DECL_DECLARED_CONSTINIT_P (decl) = true; - /* Generally, initializers in templates are expanded when the template is instantiated. But, if DECL is a variable constant then it can be used in future constant expressions, so its value @@ -7782,7 +7780,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, /* [dcl.constinit]/1 "The constinit specifier shall be applied only to a declaration of a variable with static or thread storage duration." */ - if ((flags & LOOKUP_CONSTINIT) + if (DECL_DECLARED_CONSTINIT_P (decl) && !(dk == dk_thread || dk == dk_static)) { error_at (DECL_SOURCE_LOCATI
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 2020-09-11 16:48, Andrew Stubbs wrote: On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. It looks like the customary way to handle that is to use offload_target_nvptx. Thanks, - Tom
Re: [PATCH] c++: Remove LOOKUP_CONSTINIT.
On Fri, Sep 11, 2020 at 10:57:21AM -0400, Nathan Sidwell wrote: > On 9/10/20 10:15 PM, Marek Polacek via Gcc-patches wrote: > > Since we now have DECL_DECLARED_CONSTINIT_P, we no longer need > > LOOKUP_CONSTINIT. > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > looks good, thanks for noticing. BTW, you now have > > /* Set constexpr flag on vars (functions got it in grokfndecl). */ > > if (constexpr_p && VAR_P (decl)) > > DECL_DECLARED_CONSTEXPR_P (decl) = true; > > +/* And the constinit flag (which only applies to variables). */ > > +else if (constinit_p && VAR_P (decl)) > > + DECL_DECLARED_CONSTINIT_P (decl) = true; > > might that be clearer as > if (VAR_P (decl)) > { constexpr stuff constinit stuff } > > ? Ok either way Agreed, it'd be nicer to factor VAR_P out. I'll make that change. Thanks, Marek
Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
On 9/11/20 5:03 PM, tdevries wrote: On 2020-09-11 16:48, Andrew Stubbs wrote: On 11/09/2020 15:25, Tom de Vries wrote: --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload=-latomic" } */ This will probably break amdgcn, where libatomic does not exist. It looks like the customary way to handle that is to use offload_target_nvptx. Or { target { powerpc*-*-* } } ? For some (known) reasons, the __sync_val_compare_and_swap_16 is produced for powerpc but not for x86-64. I could imagine that GCN is affected in the same way as nvptx, except that AMD's ROC is currently not supported for PowerPC, if I understand it correctly. If FAILs start to occur in some CPU/GPU combinations, it can still be revisited. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
objc++: Always pop scope with method definitions [PR97015]
Syntax errors in method definition lists could leave us in a function scope. My recent change for block scope externs didn't like that. This reimplements the parsing loop to finish the method definition we started. AFAICT the original code was attempting to provide some error recovery. Also while there, simply do the token peeking at the top of the loop, rather than at the two(!) ends. gcc/cp/ * parser.c (cp_parser_objc_method_definition_list): Reimplement loop, make sure we pop scope. gcc/testsuite/ * obj-c++.dg/syntax-error-9.mm: Adjust expected errors. pushed -- Nathan Sidwell diff --git c/gcc/cp/parser.c w/gcc/cp/parser.c index fed16895b42..fba3fcc0c4c 100644 --- c/gcc/cp/parser.c +++ w/gcc/cp/parser.c @@ -32980,44 +32980,42 @@ cp_parser_objc_method_prototype_list (cp_parser* parser) static void cp_parser_objc_method_definition_list (cp_parser* parser) { - cp_token *token = cp_lexer_peek_token (parser->lexer); - - while (token->keyword != RID_AT_END && token->type != CPP_EOF) + for (;;) { - tree meth; + cp_token *token = cp_lexer_peek_token (parser->lexer); - if (token->type == CPP_PLUS || token->type == CPP_MINUS) + if (token->keyword == RID_AT_END) { - cp_token *ptk; - tree sig, attribute; - bool is_class_method; - if (token->type == CPP_PLUS) - is_class_method = true; - else - is_class_method = false; + cp_lexer_consume_token (parser->lexer); /* Eat '@end'. 
*/ + break; + } + else if (token->type == CPP_EOF) + { + cp_parser_error (parser, "expected %<@end%>"); + break; + } + else if (token->type == CPP_PLUS || token->type == CPP_MINUS) + { + bool is_class_method = token->type == CPP_PLUS; + push_deferring_access_checks (dk_deferred); - sig = cp_parser_objc_method_signature (parser, &attribute); + tree attribute; + tree sig = cp_parser_objc_method_signature (parser, &attribute); if (sig == error_mark_node) + cp_parser_skip_to_end_of_block_or_statement (parser); + else { - cp_parser_skip_to_end_of_block_or_statement (parser); - token = cp_lexer_peek_token (parser->lexer); - continue; - } - objc_start_method_definition (is_class_method, sig, attribute, - NULL_TREE); + objc_start_method_definition (is_class_method, sig, + attribute, NULL_TREE); - /* For historical reasons, we accept an optional semicolon. */ - if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)) - cp_lexer_consume_token (parser->lexer); + /* For historical reasons, we accept an optional semicolon. */ + if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON)) + cp_lexer_consume_token (parser->lexer); - ptk = cp_lexer_peek_token (parser->lexer); - if (!(ptk->type == CPP_PLUS || ptk->type == CPP_MINUS - || ptk->type == CPP_EOF || ptk->keyword == RID_AT_END)) - { perform_deferred_access_checks (tf_warning_or_error); stop_deferring_access_checks (); - meth = cp_parser_function_definition_after_declarator (parser, - false); + tree meth + = cp_parser_function_definition_after_declarator (parser, false); pop_deferring_access_checks (); objc_finish_method_definition (meth); } @@ -33037,15 +33035,8 @@ cp_parser_objc_method_definition_list (cp_parser* parser) else /* Allow for interspersed non-ObjC++ code. */ cp_parser_objc_interstitial_code (parser); - - token = cp_lexer_peek_token (parser->lexer); } - if (token->type != CPP_EOF) -cp_lexer_consume_token (parser->lexer); /* Eat '@end'. 
*/ - else -cp_parser_error (parser, "expected %<@end%>"); - objc_finish_implementation (); } diff --git c/gcc/testsuite/obj-c++.dg/syntax-error-9.mm w/gcc/testsuite/obj-c++.dg/syntax-error-9.mm index ae104e5c140..1876c32dd20 100644 --- c/gcc/testsuite/obj-c++.dg/syntax-error-9.mm +++ w/gcc/testsuite/obj-c++.dg/syntax-error-9.mm @@ -1,3 +1,3 @@ @implementation SaturnDoc /* { dg-warning "cannot find interface declaration" } */ - read: (void*)aStream ggg /* { dg-error "expected .:. at end of input" } */ -/* { dg-error "-:expected ..end. at end of input" "" { target *-*-* } .+1 } */ +/* { dg-error "-:expected ..*. at end of input" "" { target *-*-* } .+1 } */
Re: [PATCH v6] genemit.c (main): split insn-emit.c for compiling parallelly
Hi! On Fri, Sep 11, 2020 at 03:26:17PM +0800, Jojo R wrote: > +#define printf_include() do { \ Don't use macros please, use a function? And maybe do this in a separate patch, for ease of review. That should be ack'ed pretty much immediately, after which it is out of the way, and we do not have to see it again. >while (read_md_rtx (&info)) > -switch (GET_CODE (info.def)) Factor this body to a separate function, too? Again, as earlier patch. As it is, it is impossible to see if you changed anything here. I suspect all this patch really does is pretty trivial, but it is hard to tell. Segher
Re: [OG10] Merge GCC 10 into branch; merge some mainline nvptx patches
On 11/09/2020 13:02, Tobias Burnus wrote: OG10 = devel/omp/gcc-10

I have merged releases/gcc-10 into that branch, and added a bunch of mainline (a.k.a. GCC 11) nvptx patches to it:

2df8e0f1bc4 [libatomic] Add nvptx support
5544bca37bc [nvptx] Fix UB in nvptx_assemble_value
7e10b6b0b34 [nvptx] Fix printing of 128-bit constant (negative case)
79d64f8ab05 [nvptx] Fix printing of 128-bit constant
21fc67c95a4 [nvptx, libgcc] Fix Wbuiltin-declaration-mismatch in atomic.c
46595b72fed [nvptx] Fix Wformat in nvptx_assemble_decl_begin
842cd983048 [nvptx] Fix boolean type test in write_fn_proto
97012eaf9d3 [nvptx] Fix array dimension in nvptx_assemble_decl_begin
1e00dba14b2 [nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle
2a509a60479 Update ChangeLog.omp for the last two commits
fbfd2a5674e nvptx: Provide vec_set and vec_extract patterns
89f5545f62c nvptx: Support 16-bit shifts and extendqihi2
34f8c7e5fa3 Merge remote-tracking branch 'origin/releases/gcc-10' into devel/omp/gcc-10

I've done another merge from GCC 10 now, to get another two patches:

bc67c5c7d14 testsuite: gimplefe-44 requires exceptions
04912cbb0b4 amdgcn: align TImode registers

Andrew
[PATCH] Preliminary work on support for 128bits integers
From: Arnaud Charlet

	* fe.h, opt.ads (Enable_128bit_Types): New.
	* stand.ads (Standard_Long_Long_Long_Integer,
	S_Long_Long_Long_Integer): New.
---
 gcc/ada/fe.h      | 1 +
 gcc/ada/opt.ads   | 7 +++++++
 gcc/ada/stand.ads | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h
index 8ad16c2b1c9..520301e4c3e 100644
--- a/gcc/ada/fe.h
+++ b/gcc/ada/fe.h
@@ -192,6 +192,7 @@ extern Boolean In_Extended_Main_Code_Unit (Entity_Id);
 #define Ada_Version			opt__ada_version
 #define Back_End_Inlining		opt__back_end_inlining
 #define Debug_Generated_Code		opt__debug_generated_code
+#define Enable_128bit_Types		opt__enable_128bit_types
 #define Exception_Extra_Info		opt__exception_extra_info
 #define Exception_Locations_Suppressed	opt__exception_locations_suppressed
 #define Exception_Mechanism		opt__exception_mechanism
diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads
index c982f83b9e4..885a6fb9497 100644
--- a/gcc/ada/opt.ads
+++ b/gcc/ada/opt.ads
@@ -525,6 +525,13 @@ package Opt is
    --  dataflow analysis, which is not available. This behavior parallels that
    --  of the old ABE mechanism.
 
+   Enable_128bit_Types : Boolean := False;
+   --  GNAT
+   --  Set to True to enable the support for 128-bit types in the compiler.
+   --  The prerequisite is a 64-bit target that supports 128-bit computation.
+
+   --  WARNING: There is a matching C declaration of this variable in fe.h
+
    Error_Msg_Line_Length : Nat := 0;
    --  GNAT
    --  Records the error message line length limit. If this is set to zero,
diff --git a/gcc/ada/stand.ads b/gcc/ada/stand.ads
index f3f7eb512d5..57b4d55387e 100644
--- a/gcc/ada/stand.ads
+++ b/gcc/ada/stand.ads
@@ -61,6 +61,7 @@ package Stand is
       S_Integer,
       S_Long_Integer,
       S_Long_Long_Integer,
+      S_Long_Long_Long_Integer,
 
       S_Natural,
       S_Positive,
@@ -283,6 +284,9 @@ package Stand is
    Standard_Long_Integer      : Entity_Id renames SE (S_Long_Integer);
    Standard_Long_Long_Integer : Entity_Id renames SE (S_Long_Long_Integer);
 
+   Standard_Long_Long_Long_Integer : Entity_Id renames
+                                       SE (S_Long_Long_Long_Integer);
+
    Standard_Op_Add    : Entity_Id renames SE (S_Op_Add);
    Standard_Op_And    : Entity_Id renames SE (S_Op_And);
    Standard_Op_Concat : Entity_Id renames SE (S_Op_Concat);
-- 
2.26.2
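The "64-bit target that supports 128-bit computation" prerequisite corresponds, on the C/C++ side, to GCC's `__int128` extension (TImode on 64-bit targets). A minimal sketch of the kind of arithmetic involved, purely for illustration and not part of the patch:

```cpp
#include <cstdint>

// A full 64x64 -> 128-bit multiply: the basic operation a 128-bit-capable
// 64-bit target must provide.
unsigned __int128
mul_64x64 (std::uint64_t a, std::uint64_t b)
{
  return static_cast<unsigned __int128> (a) * b;
}
```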
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote:
> This might have already been discussed/answered, sorry, but:
> when there's a choice, is there an obvious winner between:
>
> (1) clearing call-clobbered registers and then restoring call-preserved ones
> (2) restoring call-preserved registers and then clearing call-clobbered ones
>
> Is one option more likely to be useful to attackers than the other?
>
> (For some frames, it might be necessary to use a small number of
> call-clobbered registers to perform the restore sequence, so (1)
> wouldn't be fully achievable in all cases.)

The same is true for what you have to do *after* restoring registers, as
I said before. Clearing all is not correct in all cases, and also it is
not useful in all cases (code right after it might write the registers
again).

This really is very (sub-)target-specific, it cannot be done by generic
code on its own *at all*.


Segher
Re: [PATCH] Preliminary work on support for 128bits integers
Sorry, I made a mistake. Please ignore it. On Fri, Sep 11, 2020 at 9:06 AM Sunil K Pandey via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > From: Arnaud Charlet > > * fe.h, opt.ads (Enable_128bit_Types): New. > * stand.ads (Standard_Long_Long_Long_Integer, > S_Long_Long_Long_Integer): New. > --- > gcc/ada/fe.h | 1 + > gcc/ada/opt.ads | 7 +++ > gcc/ada/stand.ads | 4 > 3 files changed, 12 insertions(+) > > diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h > index 8ad16c2b1c9..520301e4c3e 100644 > --- a/gcc/ada/fe.h > +++ b/gcc/ada/fe.h > @@ -192,6 +192,7 @@ extern Boolean In_Extended_Main_Code_Unit > (Entity_Id); > #define Ada_Versionopt__ada_version > #define Back_End_Inlining opt__back_end_inlining > #define Debug_Generated_Code opt__debug_generated_code > +#define Enable_128bit_Typesopt__enable_128bit_types > #define Exception_Extra_Info opt__exception_extra_info > #define Exception_Locations_Suppressed opt__exception_locations_suppressed > #define Exception_Mechanismopt__exception_mechanism > diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads > index c982f83b9e4..885a6fb9497 100644 > --- a/gcc/ada/opt.ads > +++ b/gcc/ada/opt.ads > @@ -525,6 +525,13 @@ package Opt is > -- dataflow analysis, which is not available. This behavior parallels > that > -- of the old ABE mechanism. > > + Enable_128bit_Types : Boolean := False; > + -- GNAT > + -- Set to True to enable the support for 128-bit types in the > compiler. > + -- The prerequisite is a 64-bit target that supports 128-bit > computation. > + > + -- WARNING: There is a matching C declaration of this variable in fe.h > + > Error_Msg_Line_Length : Nat := 0; > -- GNAT > -- Records the error message line length limit. 
If this is set to > zero, > diff --git a/gcc/ada/stand.ads b/gcc/ada/stand.ads > index f3f7eb512d5..57b4d55387e 100644 > --- a/gcc/ada/stand.ads > +++ b/gcc/ada/stand.ads > @@ -61,6 +61,7 @@ package Stand is >S_Integer, >S_Long_Integer, >S_Long_Long_Integer, > + S_Long_Long_Long_Integer, > >S_Natural, >S_Positive, > @@ -283,6 +284,9 @@ package Stand is > Standard_Long_Integer: Entity_Id renames SE (S_Long_Integer); > Standard_Long_Long_Integer : Entity_Id renames SE > (S_Long_Long_Integer); > > + Standard_Long_Long_Long_Integer : Entity_Id renames > + SE > (S_Long_Long_Long_Integer); > + > Standard_Op_Add : Entity_Id renames SE (S_Op_Add); > Standard_Op_And : Entity_Id renames SE (S_Op_And); > Standard_Op_Concat : Entity_Id renames SE (S_Op_Concat); > -- > 2.26.2 > >
[PATCH] Fix fma test case [PR97018]
These tests are written for 256 bit vector. For -march=cascadelake, vector size changed to 512 bit. It doubles the number of fma instruction and test fail. Fix is to explicitly disable 512 bit vector by passing additional option -mno-avx512f. Tested on x86-64. gcc/testsuite/ChangeLog: PR target/97018 * gcc.target/i386/l_fma_double_1.c: Add option -mno-avx512f. * gcc.target/i386/l_fma_double_2.c: Likewise. * gcc.target/i386/l_fma_double_3.c: Likewise. * gcc.target/i386/l_fma_double_4.c: Likewise. * gcc.target/i386/l_fma_double_5.c: Likewise. * gcc.target/i386/l_fma_double_6.c: Likewise. * gcc.target/i386/l_fma_float_1.c: Likewise. * gcc.target/i386/l_fma_float_2.c: Likewise. * gcc.target/i386/l_fma_float_3.c: Likewise. * gcc.target/i386/l_fma_float_4.c: Likewise. * gcc.target/i386/l_fma_float_5.c: Likewise. * gcc.target/i386/l_fma_float_6.c: Likewise. --- gcc/testsuite/gcc.target/i386/l_fma_double_1.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_2.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_3.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_4.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_5.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_double_6.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_1.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_2.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_3.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_4.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_5.c | 2 +- gcc/testsuite/gcc.target/i386/l_fma_float_6.c | 2 +- 12 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c index 5089874faa5..3413beba960 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma 
-mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c index e4696204299..1b9b7988850 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c index df986d0a633..0fbe9ab9569 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c index ae065590f62..c9eba09fea3 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. 
*/ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c index 5d31abaa5a7..3217d2683f6 100644 --- a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4" } */ +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 -mno-avx512f" } */ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_6.c b/gcc/testsuite/gcc.target/i386/l_fma_double_6.c index ff857fb02f1..a22b4e2e37a 100644 --- a/gcc/tes
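The test sources themselves are not quoted in this thread; a hypothetical reduced kernel of the shape whose generated fma instructions the l_fma_* scans count:

```cpp
// With -mfma the vectorizer contracts a[i] * b[i] + c[i] into vfmadd
// instructions. Allowing -mavx512f widens the vectors from 256 to 512
// bits, which changes how many such instructions appear in the assembly
// and breaks the scan counts -- hence the added -mno-avx512f.
void
fma_kernel (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * b[i] + c[i];
}
```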
Re: [PATCH] Fix fma test case [PR97018]
On Fri, Sep 11, 2020 at 9:22 AM Sunil K Pandey wrote: > > These tests are written for 256 bit vector. For -march=cascadelake, > vector size changed to 512 bit. It doubles the number of fma > instruction and test fail. Fix is to explicitly disable 512 bit > vector by passing additional option -mno-avx512f. > > Tested on x86-64. > > gcc/testsuite/ChangeLog: > > PR target/97018 > * gcc.target/i386/l_fma_double_1.c: Add option -mno-avx512f. > * gcc.target/i386/l_fma_double_2.c: Likewise. > * gcc.target/i386/l_fma_double_3.c: Likewise. > * gcc.target/i386/l_fma_double_4.c: Likewise. > * gcc.target/i386/l_fma_double_5.c: Likewise. > * gcc.target/i386/l_fma_double_6.c: Likewise. > * gcc.target/i386/l_fma_float_1.c: Likewise. > * gcc.target/i386/l_fma_float_2.c: Likewise. > * gcc.target/i386/l_fma_float_3.c: Likewise. > * gcc.target/i386/l_fma_float_4.c: Likewise. > * gcc.target/i386/l_fma_float_5.c: Likewise. > * gcc.target/i386/l_fma_float_6.c: Likewise. > --- > gcc/testsuite/gcc.target/i386/l_fma_double_1.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_2.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_3.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_4.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_5.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_double_6.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_1.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_2.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_3.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_4.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_5.c | 2 +- > gcc/testsuite/gcc.target/i386/l_fma_float_6.c | 2 +- > 12 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > index 5089874faa5..3413beba960 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > 
-/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > index e4696204299..1b9b7988850 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > index df986d0a633..0fbe9ab9569 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. 
*/ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > index ae065590f62..c9eba09fea3 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way to deal with scans. */ > /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > index 5d31abaa5a7..3217d2683f6 100644 > --- a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > +++ b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4" } */ > +/* { dg-options "-O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic > -mno-fma4 -mno-avx512f" } */ > /* Disabling epilogues until we find a better way
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool > wrote: > > On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote: >> This might have already been discussed/answered, sorry, but: >> when there's a choice, is there an obvious winner between: >> >> (1) clearing call-clobbered registers and then restoring call-preserved ones >> (2) restoring call-preserved registers and then clearing call-clobbered ones >> >> Is one option more likely to be useful to attackers than the other? for mitigating ROP purpose, I think that (2) is better than (1). i.e, the clearing call-clobbered register sequence will be immediately before “ret” instruction. This will prevent the gadget from doing any useful things. >> >> (For some frames, it might be necessary to use a small number of >> call-clobbered registers to perform the restore sequence, so (1) >> wouldn't be fully achievable in all cases.) > Yes, looks like that (1) is also not correct. > The same is true for what you have to do *after* restoring registers, as > I said before. Clearing all is not correct in all cases, and also it is > not useful in all cases (code right after it might write the registers > again. I don’t understand why it’s not correct if we clearing call-clobbered registers AFTER restoring call-preserved registers? Even though we might need to use some call-clobbered registers to restore the call-preserved registers, after the restoring is done, we can use data flow to make sure the call-clobbered registers not lived at that point anymore, then Clearing those not-lived call-clobbered registers immediately before “ret”. For me, this should be correct. Let me know anything I am missing here. Thanks. Qing > > This really is very (sub-)target-specific, it cannot be done by generic > code on its own *at all*. > > > Segher
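Schematically (x86-64 flavored, with made-up register choices, purely illustrative), the two orderings Richard lists end a function like this:

```
    (1) clear, then restore           (2) restore, then clear
        xorl %ecx, %ecx                   popq %rbx        # call-preserved
        xorl %edx, %edx                   popq %rbp
        popq %rbx  # call-preserved       xorl %ecx, %ecx  # call-clobbered
        popq %rbp                         xorl %edx, %edx
        ret                               ret
```

In (2) the clearing sits immediately before the ret, so a ROP gadget entered through that ret finds the scratch registers zeroed; in (1) the pops after the clearing reload state from the stack, which is the point of Qing's argument above.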
Re: [PATCH] Enable GCC support for AMX
Hi Thanks for your review, and sorry for the late reply. It took a while to finish the runtime test. > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index 797f0ad5edd..d0e59e86a5c 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -412,7 +412,7 @@ i[34567]86-*-*) > > waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h > > avx512bf16intrin.h enqcmdintrin.h serializeintrin.h > > avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h > > -tsxldtrkintrin.h" > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h > > amxbf16intrin.h" > > Line more than 80 chars. > > > ;; > > x86_64-*-*) > > cpu_type=i386 > > @@ -447,7 +447,7 @@ x86_64-*-*) > > waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h > > avx512bf16intrin.h enqcmdintrin.h serializeintrin.h > > avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h > > -tsxldtrkintrin.h" > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h > > amxbf16intrin.h" > > Ditto. Changed. > > > diff --git a/gcc/config/i386/amxbf16intrin.h > > b/gcc/config/i386/amxbf16intrin.h > > new file mode 100644 > > index 000..df0e2262d50 > > --- /dev/null > > +++ b/gcc/config/i386/amxbf16intrin.h > > @@ -0,0 +1,25 @@ > > +#if !defined _IMMINTRIN_H_INCLUDED > > +#error "Never use directly; include > > instead." > > +#endif > > + > > +#ifndef _AMXBF16INTRIN_H_INCLUDED > > +#define _AMXBF16INTRIN_H_INCLUDED > > + > > +#if !defined(__AMX_BF16__) > > +#pragma GCC push_options > > +#pragma GCC target("amx-bf16") > > +#define __DISABLE_AMX_BF16__ > > +#endif /* __AMX_BF16__ */ > > + > > +#if defined(__x86_64__) && defined(__AMX_BF16__) > > +#define _tile_dpbf16ps(dst,src1,src2) > > \ > > + __asm__ volatile\ > > + ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", > > %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::) > > +#endif > > I hope in future we'll replace it with unspecs at least... Currently we think it is redundant to add builtins with just const int parameters, which are supposed to be replaced in the future. 
> > diff --git a/gcc/config/i386/i386.opt
> > index c9f7195d423..9389dc24948 100644
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index bca8c856dc8..a46e31f5862 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > -mvpclmulqdq -mavx512bitalg -mmovdiri -mmovdir64b -mavx512vpopcntdq @gol
> > -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
> > -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > +-mamx-tile -mamx-int8 -mamx-bf16@gol
>
> Add space please.

Changed.

> > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > new file mode 100644
> > index 000..605a44df3f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > @@ -0,0 +1,4 @@
> > +/* { dg-do assemble { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > +/* { dg-require-effective-target amx_bf16 } */
> > +#include"amxbf16-asmintel-1.c"
>
> I didn't get it. We usually use the second testcase to actually execute
> it and (well, a little) verify that semantics is ok. E.g. that
> operands order is correct. Could you please do that?
> This applies to all *-2.c cases.
> I've checked and looks like public SDE simulator supports AMX.

Added runtime test. Tested and passed under SDE. Also, we adjust the
intrinsic call to accept macro parameters. Updated patch.

> --
> K
> Hello,
>
> On 03 Sep 08:17, H.J. Lu wrote:
> > On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> > wrote:
> > >
> > > Hello,
> > >
> > > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > > Hi:
> > > >
> > > > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > > > which will be enabled in GLC.
> > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > components: a set of 2-dimensional registers (tiles) representing
> > > > sub-arrays from a larger 2-dimensional memory image,
> > > > and an accelerator able to operate on tiles
> > > >
> > > > Supported instructions are
> > > >
> > > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > > AMX-BF16:tdpbf16ps
> > > >
> > > > The intrinsics adopt constant tile register numbers as their input
> > > > parameters.
> > > >
> > > > For detailed information, please refer to
> > > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > > >
> > > > Bootstrap ok, regression test on i3
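Aside: the _tile_dpbf16ps macro quoted earlier builds its asm template by preprocessor stringization of the constant tile numbers. The same mechanism, shown without the inline asm (illustration only, not the intrinsic itself):

```cpp
#include <string>

// #dst turns the macro argument into a string literal, and adjacent string
// literals concatenate, yielding the full instruction text at compile time.
#define TILE_TEMPLATE(dst, src1, src2) \
  "tdpbf16ps\t%tmm" #dst ", %tmm" #src1 ", %tmm" #src2

inline std::string
tile_template_012 ()
{
  return TILE_TEMPLATE (0, 1, 2);
}
```

This is why the intrinsics require literal tile numbers: a variable cannot be stringized, which is also why the thread discusses accepting macro parameters rather than adding builtins.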
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote: > I don’t understand why it’s not correct if we clearing call-clobbered > registers > AFTER restoring call-preserved registers? Because the compiler backend (or the linker! Or the dynamic linker! Etc.) can use volatile registers for their own purposes. Like, on Power, r11 and r12 are used for various calling convention purposes; they are also used for other purposes; and r0 is used as some all-purpose volatile (it typically holds the return address near the end of a function). "Call-clobbered" is pretty meaningless. It only holds meaning for a function calling another, and then only to know which registers lose their value then. It has no semantics for other cases, like a function that will return soonish, as here. Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote: > Shrink-wrapped stuff. Quite important for performance. Not something > you can throw away. ^^^ !!! ^^^ > > Start looking at handle_simple_exit()? targetm.gen_simple_return()… > > Yes, I have been looking at this since this morning. > You are right, we also need to insert zeroing sequence before this > simple_return which the current patch missed. Please run the performance loss numbers again after you have something more realistic :-( > I am currently try to resolve this issue with the following idea: > > In the routine “thread_prologue_and_epilogue_insns”, After both > “make_epilogue_seq” and “try_shrink_wrapping” finished, > > Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), > If YES, generate the zero sequence before this RETURN insn. > > Then we should take care all the exit path that returns. > > Do you see any issue from this idea? You need to let the backend decide what to do, for this as well as for all other cases. I do not know how often I will have to repeat that. There also is separate shrink-wrapping, which you haven't touched on at all yet. Joy. Segher
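As pseudocode against GCC's CFG structures, the idea Qing describes reads roughly as follows (the helper names approximate GCC internals; this is not code from any posted patch):

```
// After make_epilogue_seq () and try_shrink_wrapping () have both run:
for each edge e in EXIT_BLOCK_PTR_FOR_FN (cfun)->preds:
    rtx_insn *insn = BB_END (e->src);
    if (ANY_RETURN_P (PATTERN (insn)))
        // Per Segher's point, the backend must decide which registers to
        // clear and how, via a target hook, not generic code:
        emit the target-chosen zeroing sequence immediately before insn
```

Separate shrink-wrapping, which Segher mentions, moves individual register saves/restores onto sub-paths of the CFG, so a single "scan exit blocks" walk still has to consult the backend about what is live on each path.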
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 11/09/20 15:41 +0100, Jonathan Wakely wrote:

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

Convert the specified duration to the target clock's duration type
before adding it to the current time in
__atomic_futex_unsigned::_M_load_when_equal_for and
_M_load_when_equal_until. This removes the risk of the timeout being
rounded down to the current time, resulting in there being no wait at
all when the duration type lacks sufficient precision to hold the
steady_clock current time.

Rather than using the style of fix from PR68519, let's expose the C++17
std::chrono::ceil function as std::chrono::__detail::ceil so that it can
be used in code compiled with earlier standards versions and simplify
the fix. This was suggested by John Salmon in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

This problem has become considerably less likely to trigger since I
switched the __atomic_futex_unsigned::__clock_t reference clock from
system_clock to steady_clock and added the loop, but the consequences of
triggering it have changed too.

By my calculations it takes just over 194 days from the epoch for the
current time not to be representable in a float. This means that
system_clock is always subject to the problem (with the standard 1970
epoch) whereas steady_clock with float duration only runs out of
resolution once the machine has been running for that long (assuming the
Linux implementation of CLOCK_MONOTONIC).

The recently-added loop in
__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
into a busy wait. Unfortunately the combination of both of these things
means that it's not possible to write a test case for this occurring in
_M_load_when_equal_until as it stands.

* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
implementation of std::chrono::ceil into private namespace so
that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); I'm testing the attached patch to fix the C++11 constexpr error, but while re-looking at the uses of __detail::ceil I noticed this is using _Duration as the target type. Shouldn't that be __clock_t::duration instead? Why do we care about the duration of the user's time_point here, rather than the preferred duration of the clock we're about to wait against? 
if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_duration<_ToDur> ceil(const duration<_Re
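A standalone illustration of the hazard __detail::ceil guards against: duration_cast truncates toward zero, so converting a timeout to a coarser duration can round it down, shortening (or eliminating) the wait. The sketch below mirrors the rounding logic of the quoted patch under different names; it is not the libstdc++ code itself.

```cpp
#include <chrono>

using std::chrono::milliseconds;
using std::chrono::seconds;

// Round up on conversion instead of truncating toward zero like
// duration_cast does; exact values pass through unchanged.
template <typename ToDur, typename Rep, typename Period>
ToDur
ceil_dur (const std::chrono::duration<Rep, Period>& d)
{
  auto to = std::chrono::duration_cast<ToDur> (d);
  if (to < d)
    return to + ToDur{1};
  return to;
}
```

For example, a 1500 ms timeout becomes 1 s under duration_cast (waking up to half a second early) but 2 s under ceil, which never produces an earlier deadline.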
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
Qing Zhao writes: >> On Sep 11, 2020, at 11:14 AM, Segher Boessenkool >> wrote: >> >> On Fri, Sep 11, 2020 at 11:06:03AM +0100, Richard Sandiford wrote: >>> This might have already been discussed/answered, sorry, but: >>> when there's a choice, is there an obvious winner between: >>> >>> (1) clearing call-clobbered registers and then restoring call-preserved ones >>> (2) restoring call-preserved registers and then clearing call-clobbered ones >>> >>> Is one option more likely to be useful to attackers than the other? > > for mitigating ROP purpose, I think that (2) is better than (1). i.e, the > clearing > call-clobbered register sequence will be immediately before “ret” > instruction. > This will prevent the gadget from doing any useful things. OK. The reason I was asking was that (from the naive perspective of someone not well versed in this stuff): if the effect of one of the register restores is itself a useful gadget, the clearing wouldn't protect against it. But if the register restores are not part of the intended effect, it seemed that having them immediately before the ret might make the gadget harder to use than clearing registers would, because the side-effects of restores would be harder to control than the (predictable) effect of clearing registers. But like I say, this is very much not my area of expertise, so that's probably missing the point in a major way. ;-) I think the original patch plugged into pass_thread_prologue_and_epilogue, is that right? If we go for (2), then I think it would be better to do it at the start of pass_late_compilation instead. (Some targets wouldn't cope with doing it later.) The reason for doing it so late is that the set of used “volatile”/caller-saved registers is not fixed at prologue and epilogue generation: later optimisation passes can introduce uses of volatile registers that weren't used previously. (Sorry if this has already been suggested.) 
Unlike Segher, I think this can/should be done in target-independent code as far as possible (like the patch seemed to do). Thanks, Richard
RE: [PATCH] ipa-inline: Improve growth accumulation for recursive calls
Hi Martin,

> Can you please confirm that the difference between these two is all due to
> the last option -fno-inline-functions-called-once? Is LTO necessary? I.e., can
> you run the benchmark also built with the branch compiler and -mcpu=native
> -Ofast -fomit-frame-pointer -fno-inline-functions-called-once?

Done, see below.

> > +--------+------------------------------------------------+------+
> > | Branch | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24% |
> > +--------+------------------------------------------------+------+
> > | Branch | -mcpu=native -Ofast -fomit-frame-pointer       | -26% |
> > +--------+------------------------------------------------+------+
> >
> > (Hopefully the table shows up correct)
>
> it does show OK for me, thanks.
>
> > It looks like your patch definitely does improve the basic cases. So
> > there's not much difference between lto and non-lto anymore and it's
> > much better than GCC 10. However it still contains the regression
> > introduced by Honza's changes.
>
> I assume these are rates, not times, so negative means bad. But do I
> understand it correctly that you're comparing against GCC 10 with the two
> parameters set to rather special values? Because your table seems to
> indicate that even for you, the branch is faster than GCC 10 with just
> -mcpu=native -Ofast -fomit-frame-pointer.

Yes, these are indeed rates, and indeed I am comparing against the same
options we used to get the fastest rates before, which is the two parameters
and the inline flag.

> So is the problem that the best obtainable run-time, even with obscure
> options, from the branch is slower than the best obtainable run-time from
> GCC 10?

Yeah, that's the problem: when we compare the two we're still behind.
I've done the additional two runs

+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Compiler | Flags                                                                                                                                        | diff GCC 10 |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once |             |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                                                                                                     | -44%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                                                                                               | -36%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80                                 | -22%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once | -12%        |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto
Re: [PATCH, rs6000] testsuite fixup pr96139 tests
Hi! On Fri, Sep 11, 2020 at 09:44:54AM -0500, will schmidt wrote: > As reported, the recently added pr96139 tests will fail on older targets > because the tests are missing the appropriate -mvsx or -maltivec > options. > This adds the options and clarifies the dg-require statements. > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > * gcc.target/powerpc/pr96139-a.c: specify -maltivec option and > require. (Capital "S" on each line.) ("and require" isn't clear... "dg-require"?) (And the files are -a, -b, -c). > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > index b3daee4..12a3383 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -Wall -m32" } */ > +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ > /* { dg-require-effective-target ilp32 } */ > -/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ This uses vector long long. Okay. > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c Ditto. > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > index 2464b8d..3ada260 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c > @@ -1,7 +1,7 @@ > /* { dg-do run } */ > -/* { dg-options "-O2 -Wall" } */ > +/* { dg-options "-O2 -Wall -maltivec" } */ > /* { dg-require-effective-target powerpc_altivec_ok } */ But this one as well, why does it not need VSX like the rest? Segher
[gcc-7-arm] Backport Neoverse-N1 tuning
Hi, The attached patches bring the description of Ares and Neoverse-N1 to the gcc-7-arm vendor branch. There were 2 changes to adjust the first patch to the older code in gcc-7. Instead of: + "32:16", /* function_align. */ + "32:16", /* jump_align. */ + "32:16", /* loop_align. */ the patch has: + 32, /* function_align. */ + 32, /* jump_align. */ + 32, /* loop_align. */ The syntax for “32:16” was added in gcc-8: +@itemx -falign-functions=@var{n}:@var{m} Align the start of functions to the next power-of-two greater than +@var{n}, skipping up to @var{m}-1 bytes. This ensures that at least +the first @var{m} bytes of the function can be fetched by the CPU +without crossing an @var{n}-byte alignment boundary. +If @var{m} is not specified, it defaults to @var{n}. + +Examples: @option{-falign-functions=32} aligns functions to the next +32-byte boundary, @option{-falign-functions=24} aligns to the next +32-byte boundary only if this can be done by skipping 23 bytes or less, +@option{-falign-functions=32:7} aligns to the next +32-byte boundary only if this can be done by skipping 6 bytes or less. The second adjustment is due to the fact that gcc-7-branch does not yet support several of the features listed in gcc-8. Instead of: AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE the patch has: AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 The patches pass bootstrap and regression test on aarch64-linux Graviton2. Ok to commit to gcc-7-arm vendor branch? Thanks, Sebastian 0001-AArch64-Initial-mcpu-ares-tuning.patch Description: 0001-AArch64-Initial-mcpu-ares-tuning.patch 0002-AArch64-Add-support-for-Neoverse-N1.patch Description: 0002-AArch64-Add-support-for-Neoverse-N1.patch 0003-AArch64-Set-jump-align-4-for-neoversen1.patch Description: 0003-AArch64-Set-jump-align-4-for-neoversen1.patch 0004-AArch64-Enable-compare-branch-fusion.patch Description: 0004-AArch64-Enable-compare-branch-fusion.patch
[Patch] OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]
This is a first attempt to improve the OpenMP mapping for allocatables and pointers; there are some more issues – cf. PR and for scalars PR 97021.

In real-world code, a usage like the following is not uncommon:

  real, allocatable :: A(:,:)
  !$omp target enter data map(to: A)

This maps an unallocated array (a.data == NULL), the array descriptor itself ("a", pointer set) and then pointer-associates on the device the mapped data (well, NULL) with the device's "a.data". That works well – and one can now use A on the device and allocate it (and, before 'end target', deallocate it).

However, many programs now do on the host:

  allocate(A(n,m))
  !$omp target
    do i = ...
      do j = ...
        A(j,i) = ...
  !$omp end target

which gets an implicit "map(tofrom:A)". While "a.data" now gets mapped, the "a" is not updated as it is already present, and pointer-setting 'a.data' on the device is also not needed as it is already there. As written, such code is rather common and other compilers handle this.

The Fortran spec between OpenMP 4.5 and TR 8 is a bit unclear; in TR 9 (not yet available), the code above is only valid with map(always, tofrom: A) (or 'to:'), where the 'always' is required. The general notion is that it should also be valid for the case above, but allocatable components of derived types should not always be rechecked/remapped every time map(dt) is used. — Hence, this was deferred and only the 'always' part was clarified in the draft for the upcoming TR 9.

Additionally, for POINTER there is already the following wording in the spec, which implies that the pointer has to be (potentially) updated every time:

"If a list item in a map clause is an associated pointer and the pointer is not the base pointer of another list item in a map clause on the same construct, then it is treated as if its pointer target is implicitly mapped in the same clause. For the purposes of the map clause, the mapped pointer target is treated as if its base pointer is the associated pointer."

OK?
Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]

gcc/fortran/ChangeLog:

	PR fortran/96668
	* trans-openmp.c (gfc_omp_finish_clause): Use GOMP_MAP_ALWAYS_POINTER
	with PSET for pointers.
	(gfc_trans_omp_clauses): Likewise and also if the always modifier is
	used.

gcc/ChangeLog:

	PR fortran/96668
	* gimplify.c (gimplify_scan_omp_clauses): Handle
	GOMP_MAP_ALWAYS_POINTER like GOMP_MAP_POINTER for target exit data.

include/ChangeLog:

	PR fortran/96668
	* gomp-constants.h (GOMP_MAP_ALWAYS_POINTER_P): New define.

libgomp/ChangeLog:

	PR fortran/96668
	* libgomp.h (struct target_var_desc): Add has_null_ptr_assoc member.
	* target.c (gomp_map_vars_existing): Add always_to_flag flag.
	(gomp_map_vars_existing): Update call to it.
	(gomp_map_fields_existing): Likewise.
	(gomp_map_vars_internal): Update PSET handling such that if a nullptr
	is now allocated or if GOMP_MAP_POINTER is used PSET is updated and
	pointer remapped.
	(GOMP_target_enter_exit_data): Handle GOMP_MAP_ALWAYS_POINTER like
	GOMP_MAP_POINTER.
	* testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test.

 gcc/fortran/trans-openmp.c                         |  28 +++-
 gcc/gimplify.c                                     |   1 +
 include/gomp-constants.h                           |   3 +
 libgomp/libgomp.h                                  |   3 +
 libgomp/target.c                                   | 173 -
 .../testsuite/libgomp.fortran/map-alloc-ptr-1.f90  | 114 ++
 6 files changed, 282 insertions(+), 40 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 0e1da04..268467d 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1357,6 +1357,15 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   tree type = TREE_TYPE (decl);
   tree ptr = gfc_conv_descriptor_data_get (decl);

+  /* OpenMP: automatically map pointer targets with the pointer;
+     hence, always update the descriptor/pointer itself.
+ NOTE: This also remaps the pointer for allocatable arrays with + 'target' attribute which also don't have the 'restrict' qualifier. */ + bool always_modifier = false; + + if (flag_openmp && !(TYPE_QUALS (TREE_TYPE (ptr)) & TYPE_QUAL_RESTRICT)) + always_modifier = true; + if (present) ptr = gfc_build_cond_assign_expr (&block, present, ptr, null_pointer_node); @@ -1376,7 +1385,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p) OMP_CLAUSE_DECL (c2) = decl; OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (type); c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (c3, GOMP_MAP_POINTER); + OMP_CLAUSE_SET_MAP_KIND (c3, always_modifier ? GOMP_MAP_ALWAYS_POINTER + : GOMP_MAP_P
Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
On 11/09/20 18:22 +0100, Jonathan Wakely wrote:
On 11/09/20 15:41 +0100, Jonathan Wakely wrote:
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

Convert the specified duration to the target clock's duration type before adding it to the current time in __atomic_futex_unsigned::_M_load_when_equal_for and _M_load_when_equal_until. This removes the risk of the timeout being rounded down to the current time, resulting in there being no wait at all when the duration type lacks sufficient precision to hold the steady_clock current time.

Rather than using the style of fix from PR68519, let's expose the C++17 std::chrono::ceil function as std::chrono::__detail::ceil so that it can be used in code compiled with earlier standards versions and simplify the fix. This was suggested by John Salmon in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

This problem has become considerably less likely to trigger since I switched the __atomic_futex_unsigned::__clock_t reference clock from system_clock to steady_clock and added the loop, but the consequences of triggering it have changed too. By my calculations it takes just over 194 days from the epoch for the current time not to be representable in a float. This means that system_clock is always subject to the problem (with the standard 1970 epoch) whereas steady_clock with float duration only runs out of resolution if the machine has been running for that long (assuming the Linux implementation of CLOCK_MONOTONIC).

The recently-added loop in __atomic_futex_unsigned::_M_load_when_equal_until turns this scenario into a busy wait. Unfortunately the combination of both of these things means that it's not possible to write a test case for this occurring in _M_load_when_equal_until as it stands.

* libstdc++-v3/include/std/chrono: (__detail::ceil) Move implementation of std::chrono::ceil into private namespace so that it's available to pre-C++17 code.
* libstdc++-v3/include/bits/atomic_futex.h: (__atomic_futex_unsigned::_M_load_when_equal_for, __atomic_futex_unsigned::_M_load_when_equal_until): Use __detail::ceil to convert delta to the reference clock duration type to avoid resolution problems * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486): New test for __atomic_futex_unsigned::_M_load_when_equal_for. --- libstdc++-v3/include/bits/atomic_futex.h | 6 +++-- libstdc++-v3/include/std/chrono | 19 + libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h index 5f95ade..aa137a7 100644 --- a/libstdc++-v3/include/bits/atomic_futex.h +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_load_when_equal_for(unsigned __val, memory_order __mo, const chrono::duration<_Rep, _Period>& __rtime) { + using __dur = typename __clock_t::duration; return _M_load_when_equal_until(__val, __mo, - __clock_t::now() + __rtime); + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime)); } // Returns false iff a timeout occurred. @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION do { const __clock_t::time_point __s_entry = __clock_t::now(); const auto __delta = __atime - __c_entry; - const auto __s_atime = __s_entry + __delta; + const auto __s_atime = __s_entry + + chrono::__detail::ceil<_Duration>(__delta); I'm testing the attached patch to fix the C++11 constexpr error, but while re-looking at the uses of __detail::ceil I noticed this is using _Duration as the target type. Shouldn't that be __clock_t::duration instead? Why do we care about the duration of the user's time_point here, rather than the preferred duration of the clock we're about to wait against? 
if (_M_load_when_equal_until(__val, __mo, __s_atime)) return true; __c_entry = _Clock::now(); diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 6d78f32..4257c7c 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #endif // C++20 +// We want to use ceil even when compiling for earlier standards versions +namespace __detail +{ + template +constexpr __enable_if_is_duration<_ToDur> +ceil(const duration<_Rep, _Period>& __d) +{ + auto __to = chrono::duration_cast<_ToDur>(__d); + if (__to < __d) + return __to + _ToDur{1}; + return __to; + } +} + #if __cplusplus >= 201703L # define __cpp_lib_chrono 201611 @@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr __enable_if_is_dura
Re: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
On Fri, 11 Sep 2020 at 14:35, Kyrylo Tkachov wrote:
>
> Hi Christophe,
>
> > -----Original Message-----
> > From: Gcc-patches On Behalf Of Christophe Lyon via Gcc-patches
> > Sent: 11 September 2020 13:23
> > To: gcc-patches@gcc.gnu.org; i...@airs.com
> > Subject: [PATCH] libgcc/config/arm/fp16.c: Add missing prototypes
> >
> > This patch adds the missing prototypes for the functions defined in fp16.c,
> > to avoid these warnings during the build:
> > /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes]
> > /libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes]
> >
> > 2020-09-11  Torbjörn SVENSSON
> > 	    Christophe Lyon
> >
> > libgcc/
> > 	* config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee)
> > 	(__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative)
> > 	(__gnu_d2h_ieee, __gnu_d2h_alternative): Add missing prototypes.
> > ---
> >  libgcc/config/arm/fp16.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c
> > index e8f7afb..3d1d8f4 100644
> > --- a/libgcc/config/arm/fp16.c
> > +++ b/libgcc/config/arm/fp16.c
> > @@ -52,6 +52,15 @@ binary64 =
> >    52 /* significand.
*/ > > }; > > > > +/* Function prototypes */ > > +unsigned int __gnu_h2f_internal(unsigned short a, int ieee); > > I think this function should just be marked as static, as it is truly > internal to this file, whereas the other ones can be called from code emitted > in config/arm/arm.c > Right, this updated patch does that, like for the other functions in the same file. OK? Thanks Christophe > Thanks, > Kyrill > > > +unsigned short __gnu_f2h_ieee(unsigned int a); > > +unsigned int __gnu_h2f_ieee(unsigned short a); > > +unsigned short __gnu_f2h_alternative(unsigned int x); > > +unsigned int __gnu_h2f_alternative(unsigned short a); > > +unsigned short __gnu_d2h_ieee (unsigned long long a); > > +unsigned short __gnu_d2h_alternative (unsigned long long x); > > + > > static inline unsigned short > > __gnu_float2h_internal (const struct format* fmt, > > unsigned long long a, int ieee) > > -- > > 2.7.4 > From 8722a517e051309d6b021107273210d83a1a1ae6 Mon Sep 17 00:00:00 2001 From: Christophe Lyon Date: Fri, 11 Sep 2020 11:43:56 + Subject: [PATCH] libgcc/config/arm/fp16.c: Make _internal functions static inline MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch makes the *_internal functions 'static inline' to avoid these warnings during the build: /libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes] /libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes] 
/libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes] 2020-09-11 Torbjörn SVENSSON Christophe Lyon libgcc/ * config/arm/fp16.c (__gnu_h2f_internal, __gnu_f2h_ieee) (__gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative) (__gnu_d2h_ieee, __gnu_d2h_alternative): Add 'static inline' qualifier. --- libgcc/config/arm/fp16.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c index e8f7afb..3745671 100644 --- a/libgcc/config/arm/fp16.c +++ b/libgcc/config/arm/fp16.c @@ -165,7 +165,7 @@ __gnu_d2h_internal (unsigned long long a, int ieee) return __gnu_float2h_internal (&binary64, a, ieee); } -unsigned int +static inline unsigned int __gnu_h2f_internal(unsigned short a, int ieee) { unsigned int sign = (unsigned int)(a & 0x8000) << 16; @@ -190,37 +190,37 @@ __gnu_h2f_internal(unsigned short a, int ieee) return sign | (((aexp + 0x70) << 23) + (mantissa << 13)); }
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 12:13 PM, Segher Boessenkool > wrote: > > On Fri, Sep 11, 2020 at 11:52:29AM -0500, Qing Zhao wrote: >> I don’t understand why it’s not correct if we clearing call-clobbered >> registers >> AFTER restoring call-preserved registers? > > Because the compiler backend (or the linker! Or the dynamic linker! > Etc.) can use volatile registers for their own purposes. For the following sequence at the end of a routine: *...* “restore call-preserved registers” *clear call-clobbered registers"* *ret* “Clear call-clobbered registers” will only clear the call-clobbered registers that are not live at the end of the routine. If the call-clobbered register is live at the end of the routine, for example, holding the return value, It will NOT be cleared at all. If the call-clobbered register has some other usage after the routine return, then the backend should know this and will not clear it. Then we will resolve this issue, right? > > Like, on Power, r11 and r12 are used for various calling convention > purposes; they are also used for other purposes; and r0 is used as some > all-purpose volatile (it typically holds the return address near the > end of a function). In the new version of the patch, the implementation of clearing call-clobbered registers is done in backend, middle end only computes a hard register set based on user option, source attribute, data flow information, and function abi information, and Then pass this hard register set to the target hook to generate the clearing sequence. The backend will have all the details on the special situations you mentioned. Let me know any more concerns here. thanks. Qing > > "Call-clobbered" is pretty meaningless. It only holds meaning for a > function calling another, and then only to know which registers lose > their value then. It has no semantics for other cases, like a function > that will return soonish, as here. > > > Segher
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> On Sep 11, 2020, at 12:18 PM, Segher Boessenkool > wrote: > > On Thu, Sep 10, 2020 at 05:50:40PM -0500, Qing Zhao wrote: >> Shrink-wrapped stuff. Quite important for performance. Not something >> you can throw away. > > ^^^ !!! ^^^ > >>> Start looking at handle_simple_exit()? targetm.gen_simple_return()… >> >> Yes, I have been looking at this since this morning. >> You are right, we also need to insert zeroing sequence before this >> simple_return which the current patch missed. > > Please run the performance loss numbers again after you have something > more realistic :-( Yes, I will collect the performance data with the new patch. > >> I am currently try to resolve this issue with the following idea: >> >> In the routine “thread_prologue_and_epilogue_insns”, After both >> “make_epilogue_seq” and “try_shrink_wrapping” finished, >> >> Scan every exit block to see whether the last insn is a ANY_RETURN_P(insn), >> If YES, generate the zero sequence before this RETURN insn. >> >> Then we should take care all the exit path that returns. >> >> Do you see any issue from this idea? > > You need to let the backend decide what to do, for this as well as for > all other cases. I do not know how often I will have to repeat that. Yes, the new patch will separate the whole task into two parts: A. Compute the hard register set based on user option, source code attribute, data flow information, function abi information, The result will be “need_zeroed_register_set”, and then pass this hard reg set to the target hook. B. Each target will have it’s own implementation of emitting the zeroing sequence based on the “need_zeroed_register_set”. > > There also is separate shrink-wrapping, which you haven't touched on at > all yet. Joy. Yes, in addition to shrink-wrapping, I also noticed that there are other places that generate “simple_return” or “return” that are not in The epilogue, for example, in “dbr” phase (delay_slots phase), in “mach” phase (machine reorg phase), etc. 
So, only generating the zeroing sequence in the epilogue is not enough.

Hongjiu and I discussed this more, and we came up with a new implementation; I will describe this new implementation in another email later.

Thanks.

Qing

>
>
> Segher
[aarch64] Backport missing NEON intrinsics to GCC8
Hi, gcc-8 branch is missing NEON intrinsics for loads and stores. Attached patches pass bootstrap and regression testing on Graviton2 aarch64-linux. Ok to commit to gcc-8 branch? Thanks, Sebastian 0001-Patch-implementing-vld1_-_x3-vst1_-_x2-and-vst1_-_x3.patch Description: 0001-Patch-implementing-vld1_-_x3-vst1_-_x2-and-vst1_-_x3.patch 0002-add-intrinsics-for-vld1-q-_x4-and-vst1-q-_x4.patch Description: 0002-add-intrinsics-for-vld1-q-_x4-and-vst1-q-_x4.patch
Re: [PATCH, rs6000] testsuite fixup pr96139 tests
On Fri, 2020-09-11 at 12:37 -0500, Segher Boessenkool wrote: > Hi! > > On Fri, Sep 11, 2020 at 09:44:54AM -0500, will schmidt wrote: > > As reported, the recently added pr96139 tests will fail on older > > targets > > because the tests are missing the appropriate -mvsx or -maltivec > > options. > > This adds the options and clarifies the dg-require statements. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > > * gcc.target/powerpc/pr96139-a.c: specify -mvsx option and require. > > * gcc.target/powerpc/pr96139-a.c: specify -maltivec option and > > require. > > (Capital "S" on each line.) > ("and require" isn't clear... "dg-require"?) > (And the files are -a, -b, -c). Thanks for the review. I've updated that to read: gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96139-a.c: Specify -mvsx option and update the dg-require stanza to match. * gcc.target/powerpc/pr96139-b.c: Same. * gcc.target/powerpc/pr96139-c.c: Specify -maltivec option and update the dg-require stanza to match. > > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > index b3daee4..12a3383 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-a.c > > @@ -1,9 +1,9 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -Wall -m32" } */ > > +/* { dg-options "-O2 -Wall -m32 -mvsx" } */ > > /* { dg-require-effective-target ilp32 } */ > > -/* { dg-require-effective-target powerpc_altivec_ok } */ > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > This uses vector long long. Okay. > > > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-b.c > > Ditto. 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > index 2464b8d..3ada260 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -Wall" } */
> > +/* { dg-options "-O2 -Wall -maltivec" } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
>
> But this one as well, why does it not need VSX like the rest?

I made these changes based on the failures reported when I tested against older targets, and that's all it seems to want. :-)

Upon further investigation, it appears that the logic simplifies down to a BIT_FIELD_REF against the llfoo symbol and the call to printf. The actual code generation consists entirely of non-vector instructions (lis, ori, std, addi, ... bl, ...). So it simply does not need the vsx support. It is still sufficient to trigger the initially reported error, so it does still serve that purpose.

thanks,
-Will

>
> Segher