Re: [PATCH] x86: Update model value for Alderlake and Rocketlake

2022-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 4, 2022 at 6:20 AM Cui,Lili  wrote:
>
> Hi Uros,
>
> This patch updates the model values for Alderlake and Rocketlake.
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
>
> gcc/ChangeLog
>
> * common/config/i386/cpuinfo.h (get_intel_cpu): Add new model values
> to Alderlake and Rocketlake.

OK (this patch can be considered as an obvious patch).

Thanks,
Uros.

> ---
>  gcc/common/config/i386/cpuinfo.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 2d8ea201ab5..61b1a0f291c 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -415,6 +415,7 @@ get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_subtype = INTEL_COREI7_SKYLAKE;
>break;
>  case 0xa7:
> +case 0xa8:
>/* Rocket Lake.  */
>cpu = "rocketlake";
>CHECK___builtin_cpu_is ("corei7");
> @@ -487,6 +488,7 @@ get_intel_cpu (struct __processor_model *cpu_model,
>break;
>  case 0x97:
>  case 0x9a:
> +case 0xbf:
>/* Alder Lake.  */
>cpu = "alderlake";
>CHECK___builtin_cpu_is ("corei7");
> --
> 2.17.1
>
> Thanks,
> Lili.


[PATCH] tree-optimization/103864 - SLP reduction of reductions with conversions

2022-01-04 Thread Richard Biener via Gcc-patches
This generalizes the fix for PR103544 to also cover reductions that
are not reduction chains: reductions wrapped in sign conversions are
no longer considered for SLP reduction handling.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-01-04  Richard Biener  

PR tree-optimization/103864
PR tree-optimization/103544
* tree-vect-slp.c (vect_analyze_slp_instance): Exclude
reductions wrapped in conversions from SLP handling.
(vect_analyze_slp): Revert PR103544 change.

* gcc.dg/vect/pr103864.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr103864.c | 16 
 gcc/tree-vect-slp.c  | 18 +-
 2 files changed, 25 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr103864.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr103864.c b/gcc/testsuite/gcc.dg/vect/pr103864.c
new file mode 100644
index 000..464d5731a42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103864.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -fno-tree-reassoc" } */
+
+void
+crash_me (short int *crash_me_result, int i, char crash_me_ptr_0)
+{
+  while (i < 1)
+{
+  int j;
+
+  for (j = 0; j < 2; ++j)
+crash_me_result[j] += crash_me_ptr_0 + 1;
+
+  i += 3;
+}
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 3566752c657..c3a1681d7c6 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3325,8 +3325,13 @@ vect_analyze_slp_instance (vec_info *vinfo,
= as_a  (vinfo)->reductions;
   scalar_stmts.create (reductions.length ());
   for (i = 0; reductions.iterate (i, &next_info); i++)
-   if (STMT_VINFO_RELEVANT_P (next_info)
-   || STMT_VINFO_LIVE_P (next_info))
+   if ((STMT_VINFO_RELEVANT_P (next_info)
+|| STMT_VINFO_LIVE_P (next_info))
+   /* ???  Make sure we didn't skip a conversion around a reduction
+  path.  In that case we'd have to reverse engineer that conversion
+  stmt following the chain using reduc_idx and from the PHI
+  using reduc_def.  */
+   && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
  scalar_stmts.quick_push (next_info);
   /* If less than two were relevant/live there's nothing to SLP.  */
   if (scalar_stmts.length () < 2)
@@ -3419,13 +3424,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
vinfo = next;
  }
STMT_VINFO_DEF_TYPE (first_element) = vect_internal_def;
-   /* It can be still vectorized as part of an SLP reduction.
-  ???  But only if we didn't skip a conversion around the group.
-  In that case we'd have to reverse engineer that conversion
-  stmt following the chain using reduc_idx and from the PHI
-  using reduc_def.  */
-   if (STMT_VINFO_DEF_TYPE (last) == vect_reduction_def)
- loop_vinfo->reductions.safe_push (last);
+   /* It can be still vectorized as part of an SLP reduction.  */
+   loop_vinfo->reductions.safe_push (last);
  }
 
   /* Find SLP sequences starting from groups of reductions.  */
-- 
2.31.1


[PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error

2022-01-04 Thread Chung-Lin Tang

In the patch that implemented omp_get_device_num(), the libgomp offload
image symbol search stringified GOMP_DEVICE_NUM_VAR, the macro that
expands to the name of the actual symbol, with the STRINGX() macro.
STRINGX() stringifies its argument without macro-expanding it first, so
the lookup used the macro's own name instead of the variable name it
expands to.

This patch fixes this by using XSTRING(), also from include/symcat.h,
which expands its argument before stringifying it.

This change was fairly obvious, so I committed it directly.

Thanks,
Chung-Lin

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
From fbb592407c9dd244b4cea086cbb90d7bd0bf60bb Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 4 Jan 2022 17:26:23 +0800
Subject: [PATCH] libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during
 offload image load

In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 4 ++--
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8ffd3d1a2cf..d0f05b28bf3 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3401,12 +3401,12 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
}
 }
 
-  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+  GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_DEVICE_NUM_VAR));
 
   hsa_status_t status;
   hsa_executable_symbol_t var_symbol;
   status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
-STRINGX (GOMP_DEVICE_NUM_VAR),
+XSTRING (GOMP_DEVICE_NUM_VAR),
 agent->id, 0, &var_symbol);
   if (status == HSA_STATUS_SUCCESS)
 {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f32276b0a18..b4f0a84d77a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1353,7 +1353,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   size_t device_num_varsize;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
  &device_num_varsize, module,
- STRINGX (GOMP_DEVICE_NUM_VAR));
+ XSTRING (GOMP_DEVICE_NUM_VAR));
   if (r == CUDA_SUCCESS)
 {
   targ_tbl->start = (uintptr_t) device_num_varptr;
-- 
2.17.1



[PATCH] c-family: Fix up -W*conversion on bitwise &/|/^ [PR101537]

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcases emit a bogus -Wconversion warning.  This is because
the conversion_warning function doesn't handle BIT_*_EXPR; only
unsafe_conversion_p, which is called in the default: case, does, and it
doesn't handle the SAVE_EXPRs that are added when |= or &= is used and the
unsigned char & or | operands promoted to int have side-effects.

The patch handles BIT_IOR_EXPR and BIT_XOR_EXPR like the last two operands
of COND_EXPR: it recurses on both operands and complains if either of them
doesn't fit into the narrower type.  BIT_AND_EXPR is handled the same way,
but it first needs to handle some special cases that unsafe_conversion_p
also handles, namely when one of the two operands is a constant.

This completely fixes the pr101537.c test, and also pr103881.c for C,
without regressing anything in the testsuite; for C++, pr103881.c still
emits the bogus warnings.
This is because, while the C FE emits a SAVE_EXPR in that case, which
conversion_warning can already handle, the C++ FE emits
TARGET_EXPR , something | D.whatever
etc., and conversion_warning handles COMPOUND_EXPR by "recursing" on the
rhs.  To handle that case, for a TARGET_EXPR on the lhs we'd need to
remember the mapping from D.whatever to the TARGET_EXPR in some hash map
and, when we see D.whatever, use the corresponding TARGET_EXPR initializer
instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-04  Jakub Jelinek  

PR c/101537
PR c/103881
gcc/c-family/
* c-warn.c (conversion_warning): Handle BIT_AND_EXPR, BIT_IOR_EXPR
and BIT_XOR_EXPR.
gcc/testsuite/
* c-c++-common/pr101537.c: New test.
* c-c++-common/pr103881.c: New test.

--- gcc/c-family/c-warn.c.jj2022-01-03 10:40:49.745044771 +0100
+++ gcc/c-family/c-warn.c   2022-01-03 12:42:55.174012944 +0100
@@ -1304,6 +1304,34 @@ conversion_warning (location_t loc, tree
|| conversion_warning (loc, type, op2, result));
   }
 
+case BIT_AND_EXPR:
+  if (TREE_CODE (expr_type) == INTEGER_TYPE
+ && TREE_CODE (type) == INTEGER_TYPE)
+   for (int i = 0; i < 2; ++i)
+ {
+   tree op = TREE_OPERAND (expr, i);
+   if (TREE_CODE (op) != INTEGER_CST)
+ continue;
+
+   /* If one of the operands is a non-negative constant
+  that fits in the target type, then the type of the
+  other operand does not matter.  */
+   if (int_fits_type_p (op, c_common_signed_type (type))
+   && int_fits_type_p (op, c_common_unsigned_type (type)))
+ return false;
+
+   /* If constant is unsigned and fits in the target
+  type, then the result will also fit.  */
+   if (TYPE_UNSIGNED (TREE_TYPE (op)) && int_fits_type_p (op, type))
+ return false;
+ }
+  /* FALLTHRU */
+case BIT_IOR_EXPR:
+case BIT_XOR_EXPR:
+  return (conversion_warning (loc, type, TREE_OPERAND (expr, 0), result)
+ || conversion_warning (loc, type, TREE_OPERAND (expr, 1),
+result));
+
 default_:
 default:
   conversion_kind = unsafe_conversion_p (type, expr, result, true);
--- gcc/testsuite/c-c++-common/pr101537.c.jj2022-01-03 12:08:33.781823852 +0100
+++ gcc/testsuite/c-c++-common/pr101537.c   2022-01-03 12:08:33.781823852 +0100
@@ -0,0 +1,26 @@
+/* PR c/101537 */
+/* { dg-do compile } */
+/* { dg-options "-Wconversion" } */
+
+int
+foo ()
+{
+  int aaa = 1;
+  unsigned char bbb = 0;
+  bbb |= aaa ? 1 : 0;
+  return bbb;
+}
+
+int
+bar (unsigned char x, int f)
+{
+  x |= f ? 1 : 0;
+  return x;
+}
+
+int
+baz (unsigned char x, int f)
+{
+  x = x | f ? 1 : 0;
+  return x;
+}
--- gcc/testsuite/c-c++-common/pr103881.c.jj2022-01-03 12:08:33.781823852 +0100
+++ gcc/testsuite/c-c++-common/pr103881.c   2022-01-03 12:08:33.781823852 +0100
@@ -0,0 +1,20 @@
+/* PR c/103881 */
+/* { dg-do compile } */
+/* { dg-options "-Wconversion" } */
+
+unsigned char bar (void);
+
+void
+foo (void)
+{
+  unsigned char t = 0;
+  t |= bar ();
+  t |= bar () & bar ();/* { dg-bogus "conversion from 'int' to 'unsigned char' may change value" "" { xfail c++ } } */
+  t &= bar () & bar ();/* { dg-bogus "conversion from 'int' to 'unsigned char' may change value" "" { xfail c++ } } */
+  t = bar () & bar ();
+
+  unsigned char a = bar ();
+  t |= a & a;
+  t |= bar () & a; /* { dg-bogus "conversion from 'int' to 'unsigned char' may change value" "" { xfail c++ } } */
+  t |= a & bar (); /* { dg-bogus "conversion from 'int' to 'unsigned char' may change value" "" { xfail c++ } } */
+}

Jakub



[power-ieee128] RFH: LTO broken

2022-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 03, 2022 at 11:43:57PM +0100, Thomas Koenig wrote:
> > clearly there is still work to fix (but seems e.g. most of the lto tests
> > are related to the gnu attributes stuff:(  ).
> 
> This is looking better than what I expected.  Apart from LTO, I expect

I've just verified that LTO is broken even in C/C++; it isn't just gfortran.
Just do
make check-gcc RUNTESTFLAGS='--target_board=unix\{-mabi=ieeelongdouble\} lto.exp'
on a system where gcc is configured to default to -mabi=ibmlongdouble
with glibc 2.32 or later and watch all the FAILs.
All the failures look like:
/home/jakub/gcc/obj/gcc/xgcc -B/home/jakub/gcc/obj/gcc/ c_lto_20081024_0.o -mabi=ieeelongdouble -fdiagnostics-plain-output -O0 -flto -flto-partition=none -o gcc-dg-lto-20081024-01.exe
lto1: warning: Using IEEE extended precision 'long double' [-Wpsabi]
FAIL: gcc.dg/lto/20081024 c_lto_20081024_0.o-c_lto_20081024_0.o link, -O0 -flto -flto-partition=none

Michael, do you think you could have a look?  The culprit is either the ELF
object created for debug info or the one created by lto1.

Jakub



Re: [PATCH] Transition nvptx backend to STORE_FLAG_VALUE = 1

2022-01-04 Thread Tom de Vries via Gcc-patches

On 10/5/21 19:48, Roger Sayle wrote:


This patch to the nvptx backend changes the backend's STORE_FLAG_VALUE
from -1 to 1, by using BImode predicates and selp instructions, instead
of set instructions (almost always followed by integer negation).

Historically, it was reasonable (though rare) for backends to use -1
for representing true during the RTL passes.  However with tree-ssa,
GCC now emits lots of code that reads and writes _Bool values, requiring
STORE_FLAG_VALUE=-1 targets to frequently convert 0/-1 pseudos to 0/1
pseudos using integer negation.  Unfortunately, this process prevents
or complicates many optimizations (negate isn't associative with logical
AND, OR and XOR, and interferes with range/vrp/nonzerobits bounds etc.).

The impact of this is that for a relatively simple logical expression
like "return (x==21) && (y==69);", the nvptx backend currently generates:

 mov.u32 %r26, %ar0;
 mov.u32 %r27, %ar1;
 set.u32.eq.u32  %r30, %r26, 21;
 neg.s32 %r31, %r30;
 mov.u32 %r29, %r31;
 set.u32.eq.u32  %r33, %r27, 69;
 neg.s32 %r34, %r33;
 mov.u32 %r32, %r34;
 cvt.u16.u8  %r39, %r29;
 mov.u16 %r36, %r39;
 cvt.u16.u8  %r39, %r32;
 mov.u16 %r37, %r39;
 and.b16 %r35, %r36, %r37;
 cvt.u32.u16 %r38, %r35;
 cvt.u32.u8  %value, %r38;

This patch tweaks nvptx to generate 0/1 values instead, using (BImode)
predicate registers and selp instructions.  This requires the same number
of instructions and now generates the almost identical:

 mov.u32 %r26, %ar0;
 mov.u32 %r27, %ar1;
 setp.eq.u32 %r31, %r26, 21;
 selp.u32 %r30, 1, 0, %r31;
 mov.u32 %r29, %r30;
 setp.eq.u32 %r34, %r27, 69;
 selp.u32 %r33, 1, 0, %r34;
 mov.u32 %r32, %r33;
 cvt.u16.u8  %r39, %r29;
 mov.u16 %r36, %r39;
 cvt.u16.u8  %r39, %r32;
 mov.u16 %r37, %r39;
 and.b16 %r35, %r36, %r37;
 cvt.u32.u16 %r38, %r35;
 cvt.u32.u8  %value, %r38;

The hidden benefit is that this sequence can (in theory) be optimized
by the RTL passes to eventually generate a much shorter sequence using
an and.pred instruction (just like Nvidia's nvcc compiler).

This patch has been tested nvptx-none with a "make" and "make -k check"
(including newlib) hosted on x86_64-pc-linux-gnu with no new failures.
Ok for mainline?




Thanks for the patch, sounds reasonable.

Committed.

Thanks,
- Tom


2021-10-05  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.h (STORE_FLAG_VALUE): Change to 1.
* config/nvptx/nvptx.md (movbi): Use P1 constraint for true.
(setcc_from_bi): Remove SImode specific pattern.
(setcc_from_bi): Provide more general HSDIM pattern.
(extendbi2, zeroextendbi2): Provide instructions
for sign- and zero-extending BImode predicates to integers.
(setcc_int): Remove previous (-1-based) instructions.
(cstorebi4): Remove BImode to SImode specific expander.
(cstore4): Fix indentation.  Expand using setccsi_from_bi.
(cstore4): For both integer and floating point modes.


Thanks in advance,
Roger
--



Re: [2/2] PR96463 -- changes to type checking vec_perm_expr in middle end

2022-01-04 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Fri, 17 Dec 2021, Richard Sandiford wrote:
>
>> Prathamesh Kulkarni  writes:
>> > Hi,
>> > The attached patch rearranges order of type-check for vec_perm_expr
>> > and relaxes type checking for
>> > lhs = vec_perm_expr
>> >
>> > when:
>> > rhs1 == rhs2,
>> > lhs is variable length vector,
>> > rhs1 is fixed length vector,
>> > TREE_TYPE (lhs) == TREE_TYPE (rhs1)
>> >
>> > I am not sure though if this check is correct? My intent was to capture
>> > the case when vec_perm_expr is used to "extend" a fixed-length vector to
>> > its VLA equivalent.
>> 
>> VLAness isn't really the issue.  We want the same thing to work for
>> -msve-vector-bits=256, -msve-vector-bits=512, etc., even though the
>> vectors are fixed-length in that case.
>> 
>> The principle is that for:
>> 
>>   A = VEC_PERM_EXPR ;
>> 
>> the requirements are:
>> 
>> - A, B, C and D must be vectors
>> - A, B and C must have the same element type
>> - D must have an integer element type
>> - A and D must have the same number of elements (NA)
>> - B and C must have the same number of elements (NB)
>> 
>> The semantics are that we create a joined vector BC (all elements of B
>> followed by all elements of C) and that:
>> 
>>   A[i] = BC[D[i] % (NB+NB)]
>> 
>> for 0 ≤ i < NA.
>> 
>> This operation makes sense even if NA != NB.
>
> But note that we don't currently expect NA != NB and the optab just
> has a single mode.

True, but we only need this for constant permutes.  They are already
special in that they allow the index elements to be wider than the data
elements.

Thanks,
Richard

>
> I'd rather go with the simpler patch I posted as reply to the earlier
> mail than such a large refactoring at this point.
>
> Richard.
>
>> Thanks,
>> Richard
>> 
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>> > index 672e384ef09..9f91878c468 100644
>> > --- a/gcc/tree-cfg.c
>> > +++ b/gcc/tree-cfg.c
>> > @@ -4325,10 +4325,11 @@ verify_gimple_assign_ternary (gassign *stmt)
>> >break;
>> >  
>> >  case VEC_PERM_EXPR:
>> > -  if (!useless_type_conversion_p (lhs_type, rhs1_type)
>> > -|| !useless_type_conversion_p (lhs_type, rhs2_type))
>> > +  if (TREE_CODE (rhs1_type) != VECTOR_TYPE
>> > +|| TREE_CODE (rhs2_type) != VECTOR_TYPE
>> > +|| TREE_CODE (rhs3_type) != VECTOR_TYPE)
>> >{
>> > -error ("type mismatch in %qs", code_name);
>> > +error ("vector types expected in %qs", code_name);
>> >  debug_generic_expr (lhs_type);
>> >  debug_generic_expr (rhs1_type);
>> >  debug_generic_expr (rhs2_type);
>> > @@ -4336,11 +4337,14 @@ verify_gimple_assign_ternary (gassign *stmt)
>> >  return true;
>> >}
>> >  
>> > -  if (TREE_CODE (rhs1_type) != VECTOR_TYPE
>> > -|| TREE_CODE (rhs2_type) != VECTOR_TYPE
>> > -|| TREE_CODE (rhs3_type) != VECTOR_TYPE)
>> > +  if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
>> > +|| (TREE_CODE (rhs3) != VECTOR_CST
>> > +&& (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE
>> > +  (TREE_TYPE (rhs3_type)))
>> > +!= GET_MODE_BITSIZE (SCALAR_TYPE_MODE
>> > + (TREE_TYPE (rhs1_type))
>> >{
>> > -error ("vector types expected in %qs", code_name);
>> > +error ("invalid mask type in %qs", code_name);
>> >  debug_generic_expr (lhs_type);
>> >  debug_generic_expr (rhs1_type);
>> >  debug_generic_expr (rhs2_type);
>> > @@ -4348,15 +4352,18 @@ verify_gimple_assign_ternary (gassign *stmt)
>> >  return true;
>> >}
>> >  
>> > -  if (maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
>> > -  TYPE_VECTOR_SUBPARTS (rhs2_type))
>> > -|| maybe_ne (TYPE_VECTOR_SUBPARTS (rhs2_type),
>> > - TYPE_VECTOR_SUBPARTS (rhs3_type))
>> > -|| maybe_ne (TYPE_VECTOR_SUBPARTS (rhs3_type),
>> > - TYPE_VECTOR_SUBPARTS (lhs_type)))
>> > +  /* Accept lhs = vec_perm_expr if lhs is vector length agnostic,
>> > +   and has same element type as v.  */
>> > +  if (!TYPE_VECTOR_SUBPARTS (lhs_type).is_constant ()
>> > +&& operand_equal_p (rhs1, rhs2, 0)
>> > +&& TYPE_VECTOR_SUBPARTS (rhs1_type).is_constant ()
>> > +&& TREE_TYPE (lhs_type) == TREE_TYPE (rhs1_type)) 
>> > +  return false;
>> > +
>> > +  if (!useless_type_conversion_p (lhs_type, rhs1_type)
>> > +|| !useless_type_conversion_p (lhs_type, rhs2_type))
>> >{
>> > -error ("vectors with different element number found in %qs",
>> > -   code_name);
>> > +error ("type mismatch in %qs", code_name);
>> >  debug_generic_expr (lhs_type);
>> >  debug_generic_expr (rhs1_type);
>> >  debug_generic_expr (rhs2_type);
>> > @@ -4364,21 +4371,21 @@ verify_gimple_assign_ternary (gassign *stmt)
>> >  return true;
>> >}
>> >  
>> > -  if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
>> > -|| (TREE_CODE (rhs3) != VECTOR_CST
>> > -&& (GET_MODE_BITSIZE (SCALAR_I

[PATCH] tree-optimization/103690 - not up-to-date SSA and PRE DCE

2022-01-04 Thread Richard Biener via Gcc-patches
This avoids running simple_dce_from_worklist on SSA form that is partially
out of date (in unreachable code regions) by performing the CFG cleanup
manually beforehand, as is done anyway when tail-merging runs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-01-04  Richard Biener  

PR tree-optimization/103690
* tree-pass.h (tail_merge_optimize): Adjust.
* tree-ssa-tail-merge.c (tail_merge_optimize): Pass in whether
to re-split critical edges, move CFG cleanup ...
* tree-ssa-pre.c (pass_pre::execute): ... here, before
simple_dce_from_worklist and delay freeing inserted_exprs from
...
(fini_pre): ... here.
---
 gcc/tree-pass.h   |  2 +-
 gcc/tree-ssa-pre.c| 25 ++---
 gcc/tree-ssa-tail-merge.c | 14 --
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index eef1f3e2400..36097cf2736 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -412,7 +412,7 @@ extern gimple_opt_pass *make_pass_early_thread_jumps (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
-extern unsigned int tail_merge_optimize (unsigned int);
+extern unsigned int tail_merge_optimize (unsigned int, bool);
 extern gimple_opt_pass *make_pass_profile (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strip_predict_hints (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_complex_O0 (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index f67bd076678..ab24fa98a1f 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -4306,7 +4306,6 @@ fini_pre ()
   value_expressions.release ();
   constant_value_expressions.release ();
   expressions.release ();
-  BITMAP_FREE (inserted_exprs);
   bitmap_obstack_release (&grand_bitmap_obstack);
   bitmap_set_pool.release ();
   pre_expr_pool.release ();
@@ -4431,16 +4430,28 @@ pass_pre::execute (function *fun)
 
   vn_valueize = NULL;
 
+  fini_pre ();
+
+  scev_finalize ();
+  loop_optimizer_finalize ();
+
+  /* Perform a CFG cleanup before we run simple_dce_from_worklist since
+ unreachable code regions will have not up-to-date SSA form which
+ confuses it.  */
+  bool need_crit_edge_split = false;
+  if (todo & TODO_cleanup_cfg)
+{
+  cleanup_tree_cfg ();
+  todo &= ~TODO_cleanup_cfg;
+  need_crit_edge_split = true;
+}
+
   /* Because we don't follow exactly the standard PRE algorithm, and decide not
  to insert PHI nodes sometimes, and because value numbering of casts isn't
  perfect, we sometimes end up inserting dead code.   This simple DCE-like
  pass removes any insertions we made that weren't actually used.  */
   simple_dce_from_worklist (inserted_exprs);
-
-  fini_pre ();
-
-  scev_finalize ();
-  loop_optimizer_finalize ();
+  BITMAP_FREE (inserted_exprs);
 
   /* TODO: tail_merge_optimize may merge all predecessors of a block, in which
  case we can merge the block with the remaining predecessor of the block.
@@ -4449,7 +4460,7 @@ pass_pre::execute (function *fun)
  - call merge_blocks after all tail merge iterations
  - mark TODO_cleanup_cfg when necessary
  - share the cfg cleanup with fini_pre.  */
-  todo |= tail_merge_optimize (todo);
+  todo |= tail_merge_optimize (todo, need_crit_edge_split);
 
   free_rpo_vn ();
 
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index f717bb2b4ad..fd333800f0f 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1724,7 +1724,7 @@ update_debug_stmts (void)
 /* Runs tail merge optimization.  */
 
 unsigned int
-tail_merge_optimize (unsigned int todo)
+tail_merge_optimize (unsigned int todo, bool need_crit_edge_split)
 {
   int nr_bbs_removed_total = 0;
   int nr_bbs_removed;
@@ -1738,15 +1738,9 @@ tail_merge_optimize (unsigned int todo)
 
   timevar_push (TV_TREE_TAIL_MERGE);
 
-  /* We enter from PRE which has critical edges split.  Elimination
- does not process trivially dead code so cleanup the CFG if we
- are told so.  And re-split critical edges then.  */
-  if (todo & TODO_cleanup_cfg)
-{
-  cleanup_tree_cfg ();
-  todo &= ~TODO_cleanup_cfg;
-  split_edges_for_insertion ();
-}
+  /* Re-split critical edges when PRE did a CFG cleanup.  */
+  if (need_crit_edge_split)
+split_edges_for_insertion ();
 
   if (!dom_info_available_p (CDI_DOMINATORS))
 {
-- 
2.31.1


Re: [2/2] PR96463 -- changes to type checking vec_perm_expr in middle end

2022-01-04 Thread Richard Biener via Gcc-patches
On Tue, 4 Jan 2022, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 17 Dec 2021, Richard Sandiford wrote:
> >
> >> Prathamesh Kulkarni  writes:
> >> > Hi,
> >> > The attached patch rearranges order of type-check for vec_perm_expr
> >> > and relaxes type checking for
> >> > lhs = vec_perm_expr
> >> >
> >> > when:
> >> > rhs1 == rhs2,
> >> > lhs is variable length vector,
> >> > rhs1 is fixed length vector,
> >> > TREE_TYPE (lhs) == TREE_TYPE (rhs1)
> >> >
> >> > I am not sure though if this check is correct? My intent was to capture
> >> > the case when vec_perm_expr is used to "extend" a fixed-length vector to
> >> > its VLA equivalent.
> >> 
> >> VLAness isn't really the issue.  We want the same thing to work for
> >> -msve-vector-bits=256, -msve-vector-bits=512, etc., even though the
> >> vectors are fixed-length in that case.
> >> 
> >> The principle is that for:
> >> 
> >>   A = VEC_PERM_EXPR ;
> >> 
> >> the requirements are:
> >> 
> >> - A, B, C and D must be vectors
> >> - A, B and C must have the same element type
> >> - D must have an integer element type
> >> - A and D must have the same number of elements (NA)
> >> - B and C must have the same number of elements (NB)
> >> 
> >> The semantics are that we create a joined vector BC (all elements of B
> >> followed by all elements of C) and that:
> >> 
> >>   A[i] = BC[D[i] % (NB+NB)]
> >> 
> >> for 0 ≤ i < NA.
> >> 
> >> This operation makes sense even if NA != NB.
> >
> > But note that we don't currently expect NA != NB and the optab just
> > has a single mode.
> 
> True, but we only need this for constant permutes.  They are already
> special in that they allow the index elements to be wider than the data
> elements.

OK, then we should reflect this in the stmt verification and only relax
the constant permute vector case and also amend the
TARGET_VECTORIZE_VEC_PERM_CONST accordingly.

For non-constant permutes the docs say the mode of vec_perm is
the common mode of operands 1 and 2 whilst the mode of operand 0
is unspecified - even unconstrained by the docs.  I'm not sure
if vec_perm expansion is expected to eventually FAIL.  Updating the
docs of vec_perm would be appreciated as well.

As I said, I prefer not to mangle the existing stmt checking too much
at this stage, so a minimal adjustment is preferred there.

Thanks,
Richard.

> Thanks,
> Richard
> 
> >
> > I'd rather go with the simpler patch I posted as reply to the earlier
> > mail than such a large refactoring at this point.
> >
> > Richard.
> >
> >> Thanks,
> >> Richard
> >> 
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >
> >> > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >> > index 672e384ef09..9f91878c468 100644
> >> > --- a/gcc/tree-cfg.c
> >> > +++ b/gcc/tree-cfg.c
> >> > @@ -4325,10 +4325,11 @@ verify_gimple_assign_ternary (gassign *stmt)
> >> >break;
> >> >  
> >> >  case VEC_PERM_EXPR:
> >> > -  if (!useless_type_conversion_p (lhs_type, rhs1_type)
> >> > -  || !useless_type_conversion_p (lhs_type, rhs2_type))
> >> > +  if (TREE_CODE (rhs1_type) != VECTOR_TYPE
> >> > +  || TREE_CODE (rhs2_type) != VECTOR_TYPE
> >> > +  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
> >> >  {
> >> > -  error ("type mismatch in %qs", code_name);
> >> > +  error ("vector types expected in %qs", code_name);
> >> >debug_generic_expr (lhs_type);
> >> >debug_generic_expr (rhs1_type);
> >> >debug_generic_expr (rhs2_type);
> >> > @@ -4336,11 +4337,14 @@ verify_gimple_assign_ternary (gassign *stmt)
> >> >return true;
> >> >  }
> >> >  
> >> > -  if (TREE_CODE (rhs1_type) != VECTOR_TYPE
> >> > -  || TREE_CODE (rhs2_type) != VECTOR_TYPE
> >> > -  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
> >> > +  if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
> >> > +  || (TREE_CODE (rhs3) != VECTOR_CST
> >> > +  && (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE
> >> > +(TREE_TYPE (rhs3_type)))
> >> > +  != GET_MODE_BITSIZE (SCALAR_TYPE_MODE
> >> > +   (TREE_TYPE (rhs1_type))
> >> >  {
> >> > -  error ("vector types expected in %qs", code_name);
> >> > +  error ("invalid mask type in %qs", code_name);
> >> >debug_generic_expr (lhs_type);
> >> >debug_generic_expr (rhs1_type);
> >> >debug_generic_expr (rhs2_type);
> >> > @@ -4348,15 +4352,18 @@ verify_gimple_assign_ternary (gassign *stmt)
> >> >return true;
> >> >  }
> >> >  
> >> > -  if (maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> >> > -TYPE_VECTOR_SUBPARTS (rhs2_type))
> >> > -  || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs2_type),
> >> > -   TYPE_VECTOR_SUBPARTS (rhs3_type))
> >> > -  || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs3_type),
> >> > -   TYPE_VECTOR_SUBPARTS (lhs_ty

Patch ping

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/libstdc++/2021-December/053680.html
time_get patch.

Thanks

Jakub



[power-ieee128] libgfortran: -mabi=ieeelongdouble I/O fix

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch fixes:
FAIL: gfortran.dg/fmt_en.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_en.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_en.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_en.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_en.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_en.f90   -Os  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_en_rd.f90   -Os  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_en_rn.f90   -Os  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_en_ru.f90   -Os  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_en_rz.f90   -Os  output pattern test
FAIL: gfortran.dg/fmt_g0_7.f08   -O0  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O1  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O2  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O3 -g  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -Os  execution test
FAIL: gfortran.dg/fmt_pf.f90   -O0  output pattern test
FAIL: gfortran.dg/fmt_pf.f90   -O1  output pattern test
FAIL: gfortran.dg/fmt_pf.f90   -O2  output pattern test
FAIL: gfortran.dg/fmt_pf.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  output pattern test
FAIL: gfortran.dg/fmt_pf.f90   -O3 -g  output pattern test
FAIL: gfortran.dg/fmt_pf.f90   -Os  output pattern test
FAIL: gfortran.dg/large_real_kind_1.f90   -O0  execution test
FAIL: gfortran.dg/large_real_kind_1.f90   -O1  execution test
FAIL: gfortran.dg/large_real_kind_1.f90   -O2  execution test
FAIL: gfortran.dg/large_real_kind_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/large_real_kind_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/large_real_kind_1.f90   -Os  execution test

Ok for power-ieee128?

2022-01-04  Jakub Jelinek  

* io/write_float.def (CALCULATE_EXP): If HAVE_GFC_REAL_17, also use
CALCULATE_EXP(17).
(determine_en_precision): Use 17 instead of 16 as first EN_PREC
argument for kind 17.
(get_float_string): Use 17 instead of 16 as first FORMAT_FLOAT
argument for kind 17.

--- libgfortran/io/write_float.def.jj   2022-01-04 10:27:56.528323600 +
+++ libgfortran/io/write_float.def  2022-01-04 13:09:51.751534884 +
@@ -799,6 +799,10 @@ CALCULATE_EXP(10)
 #ifdef HAVE_GFC_REAL_16
 CALCULATE_EXP(16)
 #endif
+
+#ifdef HAVE_GFC_REAL_17
+CALCULATE_EXP(17)
+#endif
 #undef CALCULATE_EXP
 
 
@@ -942,7 +946,7 @@ determine_en_precision (st_parameter_dt
 #endif
 #ifdef HAVE_GFC_REAL_17
 case 17:
-  EN_PREC(16,Q)
+  EN_PREC(17,Q)
 #endif
   break;
 default:
@@ -1150,7 +1154,7 @@ get_float_string (st_parameter_dt *dtp,
 #endif
 #ifdef HAVE_GFC_REAL_17
 case 17:
-  FORMAT_FLOAT(16,Q)
+  FORMAT_FLOAT(17,Q)
   break;
 #endif
 default:

Jakub
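If, as the ChangeLog entries suggest, the first argument of CALCULATE_EXP, EN_PREC and FORMAT_FLOAT is token-pasted into per-kind type and function names, then `EN_PREC(16,Q)` under `case 17:` would have handled a kind-17 value with the kind-16 machinery. A simplified, hypothetical sketch of that macro pattern (REAL_8/REAL_10 stand in for the GFC_REAL_* types; this is not the actual write_float.def code):

```c
#include <assert.h>

/* Hypothetical stand-ins for libgfortran's GFC_REAL_8 / GFC_REAL_10.  */
typedef double REAL_8;
typedef long double REAL_10;

/* One macro body, instantiated per kind via token pasting:
   CALCULATE_EXP (8) defines calculate_exp_8 returning REAL_8,
   i.e. 10**d computed in that kind's type.  */
#define CALCULATE_EXP(x)                              \
  static REAL_##x calculate_exp_##x (int d)           \
  {                                                   \
    REAL_##x r = 1;                                   \
    for (int i = 0; i < (d >= 0 ? d : -d); i++)       \
      r *= 10;                                        \
    return d >= 0 ? r : 1 / r;                        \
  }

CALCULATE_EXP (8)
CALCULATE_EXP (10)
#undef CALCULATE_EXP
```

Each instantiation yields a separate function whose arithmetic is done in that kind's type, so passing the wrong kind number silently selects the wrong precision without any compile-time error.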



Re: [2/2] PR96463 -- changes to type checking vec_perm_expr in middle end

2022-01-04 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 4 Jan 2022, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Fri, 17 Dec 2021, Richard Sandiford wrote:
>> >
>> >> Prathamesh Kulkarni  writes:
>> >> > Hi,
>> >> > The attached patch rearranges order of type-check for vec_perm_expr
>> >> > and relaxes type checking for
>> >> > lhs = vec_perm_expr
>> >> >
>> >> > when:
>> >> > rhs1 == rhs2,
>> >> > lhs is variable length vector,
>> >> > rhs1 is fixed length vector,
>> >> > TREE_TYPE (lhs) == TREE_TYPE (rhs1)
>> >> >
>> >> > I am not sure though if this check is correct. My intent was to capture
>> >> > the case when vec_perm_expr is used to "extend" a fixed length vector to
>> >> > its VLA equivalent.
>> >> 
>> >> VLAness isn't really the issue.  We want the same thing to work for
>> >> -msve-vector-bits=256, -msve-vector-bits=512, etc., even though the
>> >> vectors are fixed-length in that case.
>> >> 
>> >> The principle is that for:
>> >> 
>> >>   A = VEC_PERM_EXPR <B, C, D>;
>> >> 
>> >> the requirements are:
>> >> 
>> >> - A, B, C and D must be vectors
>> >> - A, B and C must have the same element type
>> >> - D must have an integer element type
>> >> - A and D must have the same number of elements (NA)
>> >> - B and C must have the same number of elements (NB)
>> >> 
>> >> The semantics are that we create a joined vector BC (all elements of B
>> >> followed by all elements of C) and that:
>> >> 
>> >>   A[i] = BC[D[i] % (NB+NB)]
>> >> 
>> >> for 0 ≤ i < NA.
>> >> 
>> >> This operation makes sense even if NA != NB.
>> >
>> > But note that we don't currently expect NA != NB and the optab just
>> > has a single mode.
>> 
>> True, but we only need this for constant permutes.  They are already
>> special in that they allow the index elements to be wider than the data
>> elements.
>
> OK, then we should reflect this in the stmt verification and only relax
> the constant permute vector case and also amend the
> TARGET_VECTORIZE_VEC_PERM_CONST accordingly.

Sounds good.

> For non-constant permutes the docs say the mode of vec_perm is
> the common mode of operands 1 and 2 whilst the mode of operand 0
> is unspecified - even unconstrained by the docs.  I'm not sure
> if vec_perm expansion is expected to eventually FAIL.  Updating the
> docs of vec_perm would be appreciated as well.

Yeah, I guess de facto operand 0 has to be the same mode as operands
1 and 2.  Maybe that was just an oversight, or maybe it seemed obvious
or self-explanatory at the time. :-)

> As said I prefer to not mangle the existing stmt checking too much
> at this stage so minimal adjustment is preferred there.

The PR is only an enhancement request rather than a bug, so I think the
patch would need to wait for GCC 13 whatever happens.

Thanks,
Richard
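The VEC_PERM_EXPR semantics Richard describes in this thread, A[i] = BC[D[i] % (NB+NB)], can be modelled with a scalar loop. This is a hypothetical reference sketch (vec_perm_ref, int elements and unsigned indices are illustrative choices, not GCC internals); it shows why NA != NB is well defined: the output length only controls how many indices are read.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of A = VEC_PERM_EXPR <B, C, D>: BC is B followed by C
   (2 * NB elements) and A[i] = BC[D[i] % (2 * NB)] for 0 <= i < NA.
   NA and NB may differ.  */
static void
vec_perm_ref (int *a, size_t na, const int *b, const int *c, size_t nb,
              const unsigned *d)
{
  for (size_t i = 0; i < na; i++)
    {
      size_t idx = d[i] % (2 * nb);   /* index into the joined vector BC */
      a[i] = idx < nb ? b[idx] : c[idx - nb];
    }
}
```

For example, with NB = 4 and D = {0, 4, 1, 5, 2, 6, 3, 7}, the eight-element result interleaves B and C, so NA = 2 * NB in that case.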


[committed] libgomp/testsuite: Improve omp_get_device_num() tests (was: Re: [PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error)

2022-01-04 Thread Tobias Burnus

On 04.01.22 10:28, Chung-Lin Tang wrote:


> In the patch that implemented omp_get_device_num(), there was an error where
> the stringification of GOMP_DEVICE_NUM_VAR, ...


... which caused omp_get_device_num() to always return 0 on nvptx/gcn.

That's fine if there is only a single non-host device (as is often the
case), but not if there are several.

This commit r12-6209 now makes the testcases iterate over all devices
(including the initial/host device).

Hence, had this test been run with multiple non-host devices, the error
would have been found earlier ... ;-)

Tobias
commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
Author: Tobias Burnus 
Date:   Tue Jan 4 14:58:06 2022 +0100

libgomp/testsuite: Improve omp_get_device_num() tests

Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933
"libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"

That commit fixed an issue with omp_get_device_num() on gcn/nvptx that
caused it to always return 0.
This commit modifies the tests to iterate over all devices, so that on a
system with multiple non-host devices they would have detected that always-zero issue.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices.
* testsuite/libgomp.fortran/target10.f90: Likewise.

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-45.c b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
index 81acee81064..837503996d7 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-45.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -14,17 +14,23 @@ int main (void)
   int device_num;
   int initial_device;
 
-  #pragma omp target map(from: device_num, initial_device)
-  {
-initial_device = omp_is_initial_device ();
-device_num = omp_get_device_num ();
-  }
-
-  if (initial_device && host_device_num != device_num)
-abort ();
-
-  if (!initial_device && host_device_num == device_num)
-abort ();
+  for (int i = 0; i <= omp_get_num_devices (); i++)
+{
+  #pragma omp target map(from: device_num, initial_device) device(i)
+	{
+	  initial_device = omp_is_initial_device ();
+	  device_num = omp_get_device_num ();
+	}
+
+  if (i != device_num)
+	abort ();
+
+  if (initial_device && host_device_num != device_num)
+	abort ();
+
+  if (!initial_device && host_device_num == device_num)
+	abort ();
+}
 
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.fortran/target10.f90 b/libgomp/testsuite/libgomp.fortran/target10.f90
index f41a726de75..f6951fc9057 100644
--- a/libgomp/testsuite/libgomp.fortran/target10.f90
+++ b/libgomp/testsuite/libgomp.fortran/target10.f90
@@ -4,18 +4,20 @@
 program main
   use omp_lib
   implicit none
-  integer :: device_num, host_device_num
+  integer :: device_num, host_device_num, i
   logical :: initial_device
 
   host_device_num = omp_get_device_num ()
   if (host_device_num .ne. omp_get_initial_device ()) stop 1
 
-  !$omp target map(from: device_num, initial_device)
-  initial_device = omp_is_initial_device ()
-  device_num = omp_get_device_num ()
-  !$omp end target
-
-  if (initial_device .and. (host_device_num .ne. device_num)) stop 2
-  if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 3
+  do i = 0, omp_get_num_devices ()
+!$omp target map(from: device_num, initial_device) device(i)
+  initial_device = omp_is_initial_device ()
+  device_num = omp_get_device_num ()
+!$omp end target
+if (i /= device_num) stop 2
+if (initial_device .and. (host_device_num .ne. device_num)) stop 3
+if ((.not. initial_device) .and. (host_device_num .eq. device_num)) stop 4
+  end do
 
 end program main


[power-ieee128] fortran, libgfortran: Assorted -mabi=ieeelongdouble I/O fixes

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

Another patch, this fixes:
FAIL: gfortran.dg/intrinsic_spread_2.f90   -O0  execution test
FAIL: gfortran.dg/intrinsic_spread_2.f90   -O1  execution test
FAIL: gfortran.dg/intrinsic_spread_2.f90   -O2  execution test
FAIL: gfortran.dg/intrinsic_spread_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/intrinsic_spread_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/intrinsic_spread_2.f90   -Os  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -O0  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -O1  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -O2  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/intrinsic_unpack_2.f90   -Os  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -O0  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -O1  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -O2  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/large_real_kind_form_io_1.f90   -Os  execution test
FAIL: gfortran.dg/quad_2.f90   -O0  execution test
FAIL: gfortran.dg/quad_2.f90   -O1  execution test
FAIL: gfortran.dg/quad_2.f90   -O2  execution test
FAIL: gfortran.dg/quad_2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/quad_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/quad_2.f90   -Os  execution test

Ok for power-ieee128?

2022-01-04  Jakub Jelinek  

gcc/fortran/
* trans-io.c (transfer_array_desc): Pass abi kind instead of kind
to libgfortran.
libgfortran/
* io/read.c (convert_real): Add missing break; for the
HAVE_GFC_REAL_17 case.

--- gcc/fortran/trans-io.c.jj   2022-01-04 10:27:56.498322942 +
+++ gcc/fortran/trans-io.c  2022-01-04 13:51:50.336998696 +
@@ -2528,7 +2528,7 @@ transfer_array_desc (gfc_se * se, gfc_ty
   else
 charlen_arg = build_int_cst (gfc_charlen_type_node, 0);
 
-  kind_arg = build_int_cst (integer_type_node, ts->kind);
+  kind_arg = build_int_cst (integer_type_node, gfc_type_abi_kind (ts));
 
   tmp = gfc_build_addr_expr (NULL_TREE, dt_parm);
   if (last_dt == READ)
--- libgfortran/io/read.c.jj2022-01-04 10:27:56.518323381 +
+++ libgfortran/io/read.c   2022-01-04 13:58:51.676285518 +
@@ -203,6 +203,7 @@ convert_real (st_parameter_dt *dtp, void
 # else
   *((GFC_REAL_17*) dest) = __qmath_(strtoflt128) (buffer, &endptr);
 # endif
+  break;
 #endif
 
 default:

Jakub
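The read.c hunk fixes a classic C switch fall-through: without the `break;`, the kind-17 case converts the value and then continues into the next label. A hypothetical miniature (convert_kind is not libgfortran code; its default label merely stands in for whatever error path follows in convert_real):

```c
#include <assert.h>

/* Hypothetical miniature of a switch over real kinds.  Returns 0 on
   success and 1 when control reaches the unsupported-kind path.  */
static int
convert_kind (int kind, int with_break)
{
  int status = 0;
  switch (kind)
    {
    case 17:
      /* ... convert the kind-17 value here ...  */
      if (with_break)
        break;          /* the fix: leave the switch after converting */
      /* Without that break, execution falls through into default.  */
      /* FALLTHRU */
    default:
      status = 1;       /* stands in for the unsupported-kind error */
      break;
    }
  return status;
}
```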



Re: [power-ieee128] libgfortran: -mabi=ieeelongdouble I/O fix

2022-01-04 Thread Thomas Koenig via Gcc-patches

On 04.01.22 14:41, Jakub Jelinek via Fortran wrote:

Ok for power-ieee128?


OK.


Re: [power-ieee128] fortran, libgfortran: Assorted -mabi=ieeelongdouble I/O fixes

2022-01-04 Thread Thomas Koenig via Gcc-patches

On 04.01.22 15:23, Jakub Jelinek via Fortran wrote:

Ok for power-ieee128?


Also OK.

Best regards

Thomas


[PING^2][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-04 Thread Dan Li via Gcc-patches

Gentle ping for this again :), thanks.
Link: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586204.html


Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For the Linux kernel, only compiler support is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
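For reference, clang's AArch64 ShadowCallStack reserves x18 as the shadow stack pointer, saving the link register with `str x30, [x18], #8` in the prologue and reloading it with `ldr x30, [x18, #-8]!` in the epilogue; it is an assumption here that this patch's scs_push/scs_pop patterns follow the same scheme. The toy C model below (all names hypothetical) shows why that protects the return address: the epilogue takes LR from the shadow stack, so overwriting the ordinary stack copy has no effect.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: shadow_sp plays the role of x18, lr the role of x30.  */
static uintptr_t shadow_stack[64];
static uintptr_t *shadow_sp = shadow_stack;

/* Models "str x30, [x18], #8": store LR, then post-increment.  */
static void
scs_push (uintptr_t lr)
{
  *shadow_sp++ = lr;
}

/* Models "ldr x30, [x18, #-8]!": pre-decrement, then reload LR.  */
static uintptr_t
scs_pop (void)
{
  return *--shadow_sp;
}

/* Simulates a function whose on-stack copy of the return address is
   overwritten (e.g. by a buffer overflow) before it returns.  */
static uintptr_t
simulated_call (uintptr_t lr, uintptr_t overwrite_with)
{
  scs_push (lr);                 /* prologue: LR to the shadow stack first */
  uintptr_t frame_lr = lr;       /* prologue: ordinary save of LR in the frame */
  frame_lr = overwrite_with;     /* corruption of the ordinary stack copy */
  (void) frame_lr;               /* without SCS, the return would use this */
  return scs_pop ();             /* epilogue: LR comes from the shadow stack */
}
```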


Signed-off-by: Dan Li 

gcc/c-family/ChangeLog:

* c-attribs.c (handle_no_sanitize_shadow_call_stack_attribute):
New.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_shadow_call_stack_enabled):
New decl.
* config/aarch64/aarch64.c (aarch64_shadow_call_stack_enabled):
New.
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 from SCS.
* config/aarch64/aarch64.h (TARGET_SUPPORT_SHADOW_CALL_STACK):
New macro.
(TARGET_CHECK_SCS_RESERVED_REGISTER): Likewise.
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* defaults.h (TARGET_SUPPORT_SHADOW_CALL_STACK): New macro.
* doc/extend.texi: Document attribute.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* flag-types.h (enum sanitize_code): Add
SANITIZE_SHADOW_CALL_STACK.
* opts-global.c (handle_common_deferred_options): Add SCS
compile option check.
* opts.c (finish_options): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
---
 gcc/c-family/c-attribs.c  | 21 +
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.c  | 27 +++
 gcc/config/aarch64/aarch64.h  | 11 +
 gcc/config/aarch64/aarch64.md | 18 
 gcc/defaults.h|  4 ++
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/invoke.texi   | 29 
 gcc/flag-types.h  |  2 +
 gcc/opts-global.c |  6 +++
 gcc/opts.c| 12 +
 .../gcc.target/aarch64/shadow_call_stack_1.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_2.c  |  6 +++
 .../gcc.target/aarch64/shadow_call_stack_3.c  | 45 +++
 .../gcc.target/aarch64/shadow_call_stack_4.c  | 18 
 15 files changed, 213 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/shadow_call_stack_4.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..9b3a35c06bf 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
+static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
+ tree, int, bool *);
 static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
 int, bool *);
 static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
@@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_no_sanitize_attribute, NULL },
   { "no_sanitize_address",0, 0, true, false, false, false,
  handle_no_sanitize_address_attribute, NULL },
+  { "no_sanitize_shadow_call_stack",
+ 0, 0, true, false, false, false,
+ handle_no_sanitize_shadow_call_stack_attribute,
+ NULL },
   { "no_sanitize_thread", 0, 0, true, false, false, false,
  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
@@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize

[PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs
This patch implements the OpenMP pinned memory trait for Linux hosts. On 
other hosts and on devices the trait becomes a no-op (instead of being 
rejected).


The memory is locked via the mlock syscall, which is both the "correct" 
way to do it on Linux, and a problem because the default ulimit for 
pinned memory is very small (and most users don't have permission to 
increase it (much?)). Therefore the code emits a non-fatal warning 
message if locking fails.


Another approach might be to use cudaHostAlloc to allocate the memory in 
the first place, which bypasses the ulimit somehow, but this would not 
help non-NVidia users.


The tests work on Linux and will xfail on other hosts; neither libgomp 
nor the test knows how to allocate or query pinned memory elsewhere.


The patch applies on top of the text of my previously submitted patches, 
but does not actually depend on the functionality of those patches.


OK for stage 1?

I'll commit a backport to OG11 shortly.

Andrew

libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_PIN): New macro.
(xmlock): New function.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b1f5fe0a5e2..671b91e7ff8 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -51,6 +51,25 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   ((void)MEMSPACE, (void)SIZE, free (ADDR))
 #endif
+#ifndef MEMSPACE_PIN
+/* Only define this on supported host platforms.  */
+#ifdef __linux__
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, xmlock (ADDR, SIZE))
+
+#include 
+#include 
+void
+xmlock (void *addr, size_t size)
+{
+  if (mlock (addr, size))
+  perror ("libgomp: failed to pin memory (ulimit too low?)");
+}
+#else
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
+#endif
+#endif
 
 /* Map the predefined allocators to the correct memory space.
The index to this table is the omp_allocator_handle_t enum value.  */
@@ -212,7 +231,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
 return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -326,6 +345,9 @@ retry:
 #endif
  goto fail;
}
+
+  if (allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
 }
   else
 {
@@ -335,6 +357,9 @@ retry:
   ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
+
+  if (allocator_data && allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
 }
 
   if (new_alignment > sizeof (void *))
@@ -539,6 +564,9 @@ retry:
 #endif
  goto fail;
}
+
+  if (allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
 }
   else
 {
@@ -548,6 +576,9 @@ retry:
   ptr = MEMSPACE_CALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
+
+  if (allocator_data && allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
 }
 
   if (new_alignment > sizeof (void *))
@@ -727,7 +758,11 @@ retry:
 #endif
  goto fail;
}
-  else if (prev_size)
+
+  if (allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, new_ptr, new_size);
+
+  if (prev_size)
{
  ret = (char *) new_ptr + sizeof (struct omp_mem_header);
  ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
@@ -747,6 +782,10 @@ retry:
   new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
   if (new_ptr == NULL)
goto fail;
+
+  if (allocator_data && allocator_data->pinned)
+   MEMSPACE_PIN (allocator_data->memspace, new_ptr, new_size);
+
   ret = (char *) new_ptr + sizeof (struct omp_mem_header);
   ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
   ((struct omp_mem_header *) ret)[-1].size = new_size;
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 000..0a6360cda29
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#ifdef __linux__
+#include 
+#include 
+#include 
+#include 
+
+#inc

[PATCH] tree-optimization/103800 - sanity check more PHI vectorization

2022-01-04 Thread Richard Biener via Gcc-patches
Bool pattern detection doesn't really handle PHIs well so we have
to be prepared for mismatched vector types in more cases than
originally thought.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-01-04  Richard Biener  

PR tree-optimization/103800
* tree-vect-loop.c (vectorizable_phi): Remove assert and
expand comment.

* gcc.dg/vect/bb-slp-pr103800.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr103800.c | 17 +
 gcc/tree-vect-loop.c| 10 --
 2 files changed, 21 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr103800.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr103800.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr103800.c
new file mode 100644
index 000..33c2d2081cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr103800.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int a;
+long b;
+extern int c[], d[];
+extern _Bool e[];
+void f() {
+  if (a)
+;
+  for (;;) {
+for (int g = 2; g; g = a)
+  d[g] = 0;
+for (int h = 1; h < 13; h++)
+  e[h] = b ? (short)c[4 + h - 1] : c[4 + h - 1];
+  }
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ebd7d9c2218..77f1cc0f788 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7850,17 +7850,15 @@ vectorizable_phi (vec_info *,
 && !useless_type_conversion_p (vectype,
SLP_TREE_VECTYPE (child)))
  {
-   /* With bools we can have mask and non-mask precision vectors,
-  while pattern recog is supposed to guarantee consistency here
-  bugs in it can cause mismatches (PR103489 for example).
+   /* With bools we can have mask and non-mask precision vectors
+  or different non-mask precisions.  While pattern recog is
+  supposed to guarantee consistency here, bugs in it can cause
+  mismatches (PR103489 and PR103800 for example).
   Deal with them here instead of ICEing later.  */
if (dump_enabled_p ())
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "incompatible vector type setup from "
   "bool pattern detection\n");
-   gcc_checking_assert
- (VECTOR_BOOLEAN_TYPE_P (SLP_TREE_VECTYPE (child))
-  != VECTOR_BOOLEAN_TYPE_P (vectype));
return false;
  }
 
-- 
2.31.1


Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 04, 2022 at 03:32:17PM +, Andrew Stubbs wrote:
> This patch implements the OpenMP pinned memory trait for Linux hosts. On
> other hosts and on devices the trait becomes a no-op (instead of being
> rejected).
> 
> The memory is locked via the mlock syscall, which is both the "correct" way
> to do it on Linux, and a problem because the default ulimit for pinned
> memory is very small (and most users don't have permission to increase it
> (much?)). Therefore the code emits a non-fatal warning message if locking
> fails.
> 
> Another approach might be to use cudaHostAlloc to allocate the memory in the
> first place, which bypasses the ulimit somehow, but this would not help
> non-NVidia users.
> 
> The tests work on Linux and will xfail on other hosts; neither libgomp nor
> the test knows how to allocate or query pinned memory elsewhere.
> 
> The patch applies on top of the text of my previously submitted patches, but
> does not actually depend on the functionality of those patches.
> 
> OK for stage 1?
> 
> I'll commit a backport to OG11 shortly.
> 
> Andrew

> libgomp: pinned memory
> 
> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
> syscall.
> 
> libgomp/ChangeLog:
> 
>   * allocator.c (MEMSPACE_PIN): New macro.
>   (xmlock): New function.
>   (omp_init_allocator): Don't disallow the pinned trait.
>   (omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
>   (omp_aligned_calloc): Likewise.
>   (omp_realloc): Likewise.
>   * testsuite/libgomp.c/alloc-pinned-1.c: New test.
>   * testsuite/libgomp.c/alloc-pinned-2.c: New test.
> 
> diff --git a/libgomp/allocator.c b/libgomp/allocator.c
> index b1f5fe0a5e2..671b91e7ff8 100644
> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c
> @@ -51,6 +51,25 @@
>  #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
>((void)MEMSPACE, (void)SIZE, free (ADDR))
>  #endif
> +#ifndef MEMSPACE_PIN
> +/* Only define this on supported host platforms.  */
> +#ifdef __linux__
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, xmlock (ADDR, SIZE))
> +
> +#include 
> +#include 
> +void
> +xmlock (void *addr, size_t size)
> +{
> +  if (mlock (addr, size))
> +  perror ("libgomp: failed to pin memory (ulimit too low?)");
> +}
> +#else
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
> +#endif
> +#endif

The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.

I think perror is the wrong thing to do; omp_alloc etc. have a well-defined
interface for what to do in such cases - the allocation should just fail (not be
allocated) and, depending on the user's choice, that can be fatal, return NULL,
or chain to some other allocator with other properties etc.

Other issues in the patch are that it doesn't munlock on deallocation and
that because of that deallocation we need to figure out what to do on page
boundaries.  As documented, mlock can be passed address and/or address +
size that aren't at page boundaries and pinning happens even just for
partially touched pages.  But munlock unpins also even the partially
overlapping pages and we don't know at that point whether some other pinned
allocations also appear in those pages.
Some bad options are: only pin pages wholly contained within the allocation
and don't pin partial pages around it; force at least page alignment and
size so that everything can be pinned; somehow ensure that we never allocate
more than one pinned allocation in such partial pages (but can allocate
non-pinned allocations there); use some internal data structure to
track how many pinned allocations are on the partial pages (say a hash map
from page start address to a counter of how many pinned allocations are there;
if it goes to 0, munlock even that page, otherwise munlock just the wholly
contained pages); or perhaps use page size aligned allocation and size and
just remember in some data structure that the partial pages could be used
for other pinned (small) allocations.

Jakub
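One of the options above, forcing page alignment and size, can be sketched as follows, with failure reported by returning NULL rather than by calling perror, so that the allocator's fallback trait can decide what happens. This is a minimal sketch assuming a POSIX host with mlock/munlock; pinned_alloc and pinned_free are hypothetical helpers, not libgomp functions, and the trade-off is wasting up to a page per small allocation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round SIZE up to a multiple of PAGE (PAGE must be a power of two).  */
static size_t
round_up_to_page (size_t size, size_t page)
{
  return (size + page - 1) & ~(page - 1);
}

/* Allocate SIZE bytes on whole pages and pin them.  Returns NULL, without
   printing anything, if either the allocation or the mlock fails.  */
static void *
pinned_alloc (size_t size)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t len = round_up_to_page (size ? size : 1, page);
  void *p;
  if (posix_memalign (&p, page, len) != 0)
    return NULL;
  if (mlock (p, len) != 0)      /* e.g. RLIMIT_MEMLOCK too low */
    {
      free (p);
      return NULL;
    }
  return p;
}

/* Unpin and free a block from pinned_alloc.  Because the block owns whole
   pages, munlock cannot unpin pages shared with another allocation.  */
static void
pinned_free (void *p, size_t size)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  munlock (p, round_up_to_page (size ? size : 1, page));
  free (p);
}
```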



Patch Ping : [Patch][V2]Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables

2022-01-04 Thread Qing Zhao via Gcc-patches
Hi,

I’d like to ping the patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587014.html

Please take a look and let me know whether it’s okay to commit.

Thanks.

Qing

> On Dec 16, 2021, at 9:59 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi,
> 
> This is the 2nd version of the patch.
> The original patch is at:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586341.html
> 
> In addition to resolving the two issues mentioned in the original patch,
> this patch can also be used as a very good workaround for the issue in
> PR103720
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103720
> 
> And as I checked, the patch can fix all the bogus uninitialized warnings when
> building kernel with -O2 + -ftrivial-auto-var-init=zero + -Wuninitialized.
> 
> So, this is a very important patch that needs to be included in GCC 12.
> 
> Compared to the 1st patch, the major changes resolve Martin’s comments on
> tree-ssa-uninit.c:
> 
> 1.  Add some meaningful temporaries to break up the long expression and make it
> readable. Also add comments to explain the purpose of the statement;
> 
> 2.  Resolve the memory leakage of the dynamically created string.
> 
> The patch has been bootstrapped and regression tested on both x86 and
> aarch64, no issues.
> Okay for commit?
> 
> thanks.
> 
> Qing
> 
> =
> 
> **Compared to the 1st version, the code change is:
> 
> --- a/gcc/tree-ssa-uninit.c
> +++ b/gcc/tree-ssa-uninit.c
> @@ -182,9 +182,22 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
> @@ -798,26 +798,35 @@
>if (!var && !SSA_NAME_VAR (t))
>  {
>gimple *def_stmt = SSA_NAME_DEF_STMT (t);
> -@@ -197,9 +210,34 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
> +@@ -197,9 +210,43 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
>   && zerop (gimple_assign_rhs2 (def_stmt)))
> var = SSA_NAME_VAR (v);
> }
> +
> +  if (gimple_call_internal_p (def_stmt, IFN_DEFERRED_INIT))
> +   {
> ++tree lhs_var = NULL_TREE;
> ++tree lhs_var_name = NULL_TREE;
> ++const char *lhs_var_name_str = NULL;
> + /* Get the variable name from the 3rd argument of call.  */
> + var_name = gimple_call_arg (def_stmt, 2);
> + var_name = TREE_OPERAND (TREE_OPERAND (var_name, 0), 0);
> + var_name_str = TREE_STRING_POINTER (var_name);
> +
> -+if (is_gimple_assign (context)
> -+&& TREE_CODE (gimple_assign_lhs (context)) == VAR_DECL
> -+&& DECL_NAME (gimple_assign_lhs (context))
> -+&& IDENTIFIER_POINTER (DECL_NAME (gimple_assign_lhs (context
> -+  if (strcmp
> -+(IDENTIFIER_POINTER (DECL_NAME (gimple_assign_lhs (context))),
> -+ var_name_str) == 0)
> -+return;
> ++/* Ignore the call to .DEFERRED_INIT that define the original
> ++   var itself.  */
> ++if (is_gimple_assign (context))
> ++  {
> ++if (TREE_CODE (gimple_assign_lhs (context)) == VAR_DECL)
> ++  lhs_var = gimple_assign_lhs (context);
> ++else if (TREE_CODE (gimple_assign_lhs (context)) == SSA_NAME)
> ++  lhs_var = SSA_NAME_VAR (gimple_assign_lhs (context));
> ++  }
> ++if (lhs_var
> ++&& (lhs_var_name = DECL_NAME (lhs_var))
> ++&& (lhs_var_name_str = IDENTIFIER_POINTER (lhs_var_name))
> ++&& (strcmp (lhs_var_name_str, var_name_str) == 0))
> ++  return;
> +
> + /* Get the variable declaration location from the def_stmt.  */
> + var_decl_loc = gimple_location (def_stmt);
> @@ -834,7 +843,7 @@
>  return;
> 
>/* Avoid warning if we've already done so or if the warning has been
> -@@ -207,36 +245,56 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
> +@@ -207,36 +254,54 @@ warn_uninit (opt_code opt, tree t, tree var, const char *gmsgid,
>if (((warning_suppressed_p (context, OPT_Wuninitialized)
> || (gimple_assign_single_p (context)
> && get_no_uninit_warning (gimple_assign_rhs1 (context)
> @@ -863,25 +872,24 @@
> 
>auto_diagnostic_group d;
> -  if (!warning_at (location, opt, gmsgid, var))
> +-return;
> +  char *gmsgid_final = XNEWVEC (char, strlen (gmsgid) + 5);
> +  gmsgid_final[0] = 0;
> -+  if (var)
> -+strcat (gmsgid_final, "%qD ");
> -+  else if (var_name)
> -+strcat (gmsgid_final, "%qs ");
> ++  strcat (gmsgid_final, var ? "%qD " : "%qs ");
> +  strcat (gmsgid_final, gmsgid);
> +
> -+  if (var && !warning_at (location, opt, gmsgid_final, var))
> -+return;
> -+  else if (var_name && !warning_at (location, opt, gmsgid_final, var_name_str))
> - return;
> ++  if ((var && !warning_at (location, opt, gmsgid_final, var))
> ++  || (var_name && !warning_at (location, opt, gmsgid_final, var_name_st

[PATCH] c++: constexpr base-to-derived conversion with offset 0 [PR103879]

2022-01-04 Thread Patrick Palka via Gcc-patches
r12-136 made us canonicalize an object/offset pair with negative offset
into one with a nonnegative offset, by iteratively absorbing the
innermost component into the offset and stopping as soon as the offset
becomes nonnegative.

This patch strengthens this transformation to make it keep absorbing
even if the offset is already 0 as long as the innermost component is at
position 0 (and thus absorbing doesn't change the offset).  This lets us
accept the two constexpr testcases below, which we'd previously reject
essentially because cxx_fold_indirect_ref wasn't able to resolve
*(B*)&b.D123 (where D123 is the base subobject A at position 0) to just b.
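A rough C model of the strengthened loop, for illustration only: the `struct obj_ref` type and the plain integer offsets are hypothetical stand-ins for the COMPONENT_REF chain and INTEGER_CST offset that `canonicalize_obj_off` actually walks. Positions are listed outermost first, so `c.b.D123` with both fields at position 0 becomes `{0, 0}`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an object reference such as c.b.D123: POS holds
   the byte position of each COMPONENT_REF's field, outermost first, and
   DEPTH is how many components still wrap the base object.  */
struct obj_ref {
  const long *pos;
  size_t depth;
};

/* Absorb the innermost component into *OFF while *OFF is negative, and
   also while *OFF is zero and the innermost field is at position 0
   (absorbing such a field leaves the offset unchanged).  */
static void
canonicalize_obj_off (struct obj_ref *obj, long *off)
{
  assert (obj != NULL && off != NULL);
  while (obj->depth > 0 && *off <= 0)
    {
      long field_pos = obj->pos[obj->depth - 1];
      if (*off == 0 && field_pos != 0)
        break;                  /* absorbing would make the offset positive */
      *off += field_pos;
      obj->depth--;             /* drop the innermost component */
    }
}
```

With the old condition (negative offsets only), the `{0, 0}` case would stop immediately at depth 2 and the base-to-derived fold would not be found.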

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/103879

gcc/cp/ChangeLog:

* constexpr.c (cxx_fold_indirect_ref): Split out object/offset
canonicalization step into a local lambda.  Strengthen it to
absorb more components at position 0.  Use it before both calls
to cxx_fold_indirect_ref_1.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-base2.C: New test.
* g++.dg/cpp1y/constexpr-base2a.C: New test.
---
 gcc/cp/constexpr.c| 38 +--
 gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C  | 21 ++
 gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C | 25 
 3 files changed, 72 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 72be45c9e87..1ec33a00ee5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5144,6 +5144,25 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, location_t loc, tree type,
   if (!INDIRECT_TYPE_P (subtype))
 return NULL_TREE;
 
+  /* Canonicalizes the given OBJ/OFF pair by iteratively absorbing
+ the innermost component into the offset until the offset is
+ nonnegative, so that cxx_fold_indirect_ref_1 can identify
+ more folding opportunities.  */
+  auto canonicalize_obj_off = [] (tree& obj, tree& off) {
+while (TREE_CODE (obj) == COMPONENT_REF
+  && (tree_int_cst_sign_bit (off) || integer_zerop (off)))
+  {
+   tree field = TREE_OPERAND (obj, 1);
+   tree pos = byte_position (field);
+   if (integer_zerop (off) && integer_nonzerop (pos))
+ /* If the offset is already 0, keep going as long as the
+component is at position 0.  */
+ break;
+   off = int_const_binop (PLUS_EXPR, off, pos);
+   obj = TREE_OPERAND (obj, 0);
+  }
+  };
+
   if (TREE_CODE (sub) == ADDR_EXPR)
 {
   tree op = TREE_OPERAND (sub, 0);
@@ -5162,7 +5181,12 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, location_t loc, tree type,
return op;
}
   else
-   return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
+   {
+ tree off = integer_zero_node;
+ canonicalize_obj_off (op, off);
+ gcc_assert (integer_zerop (off));
+ return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
+   }
 }
   else if (TREE_CODE (sub) == POINTER_PLUS_EXPR
   && tree_fits_uhwi_p (TREE_OPERAND (sub, 1)))
@@ -5174,17 +5198,7 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, location_t loc, tree type,
   if (TREE_CODE (op00) == ADDR_EXPR)
{
  tree obj = TREE_OPERAND (op00, 0);
- while (TREE_CODE (obj) == COMPONENT_REF
-&& tree_int_cst_sign_bit (off))
-   {
- /* Canonicalize this object/offset pair by iteratively absorbing
-the innermost component into the offset until the offset is
-nonnegative, so that cxx_fold_indirect_ref_1 can identify
-more folding opportunities.  */
- tree field = TREE_OPERAND (obj, 1);
- off = int_const_binop (PLUS_EXPR, off, byte_position (field));
- obj = TREE_OPERAND (obj, 0);
-   }
+ canonicalize_obj_off (obj, off);
  return cxx_fold_indirect_ref_1 (ctx, loc, type, obj,
  tree_to_uhwi (off), empty_base);
}
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
new file mode 100644
index 000..7cbf5bf32b7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
@@ -0,0 +1,21 @@
+// PR c++/103879
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int n = 42;
+};
+
+struct B : A { };
+
+struct C {
+  B b;
+};
+
+constexpr int f() {
+  C c;
+  A& a = static_cast<A&>(c.b);
+  B& b = static_cast<B&>(a);
+  return b.n;
+}
+
+static_assert(f() == 42, "");
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C
new file mode 100644
index 000..872e9bb6d6a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C
@@ -0,0

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs

On 04/01/2022 15:55, Jakub Jelinek wrote:

The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.


OK, good point, I can do that.


I think perror is the wrong thing to do, omp_alloc etc. has a well defined
interface what to do in such cases - the allocation should just fail (not be
allocated) and depending on user's choice that can be fatal, or return NULL,
or chain to some other allocator with other properties etc.


I did it this way because pinning feels more like an optimization, and 
falling back to "just works" seemed like what users would want to 
happen. The perror was added because it turns out the default ulimit is 
tiny and I wanted to hint at the solution.


I guess you're right that the consistent behaviour would be to silently 
switch to the fallback allocator, but it still feels like users will be 
left in the dark about why it failed.



Other issues in the patch are that it doesn't munlock on deallocation and
that because of that deallocation we need to figure out what to do on page
boundaries.  As documented, mlock can be passed address and/or address +
size that aren't at page boundaries and pinning happens even just for
partially touched pages.  But munlock unpins also even the partially
overlapping pages and we don't know at that point whether some other pinned
allocations don't appear in those pages.


Right, it doesn't munlock because of these issues. I don't know of any 
way to solve this that wouldn't involve building tables of locked ranges 
(and knowing what the page size is).


I considered using mmap with the lock flag instead, but the failure mode 
looked unhelpful. I guess we could mmap with the regular flags, then 
mlock after. That should bypass the regular heap and ensure each 
allocation has its own page. I'm not sure what the unintended 
side-effects of that might be.
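A minimal sketch of the mmap-then-mlock idea, assuming a Linux/POSIX host; `pinned_alloc`, `pinned_free` and `round_up_to_page` are hypothetical names, not libgomp functions. Because every allocation gets whole pages of its own, releasing one cannot unpin a neighbour, and an mlock failure is reported as a NULL return instead of a perror, leaving the fallback decision to the caller.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round SIZE up to a multiple of PAGE (a power of two).  */
static size_t
round_up_to_page (size_t size, size_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return (size + page - 1) & ~(page - 1);
}

/* Hypothetical helper: map fresh anonymous pages, then try to pin them.
   Returns NULL if either step fails, so the caller can fall back.  */
static void *
pinned_alloc (size_t size)
{
  size_t len = round_up_to_page (size, (size_t) sysconf (_SC_PAGESIZE));
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, len) != 0)      /* e.g. ENOMEM or EPERM under a low ulimit -l */
    {
      munmap (p, len);
      return NULL;
    }
  return p;
}

/* Hypothetical helper: unmapping also removes the lock on these pages,
   and since no other allocation shares them nothing else is unpinned.  */
static void
pinned_free (void *p, size_t size)
{
  munmap (p, round_up_to_page (size, (size_t) sysconf (_SC_PAGESIZE)));
}
```

The cost of this design is that even a tiny allocation consumes a whole page of the (often small) RLIMIT_MEMLOCK budget.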



Some bad options are: only pin pages wholly contained within the allocation
and don't pin partial pages around it, force at least page alignment and
size so that everything can be pinned, somehow ensure that we never allocate
more than one pinned allocation in such partial pages (but can allocate
there non-pinned allocations), or e.g. use some internal data structure to
track how many pinned allocations are on the partial pages (say a hash map
from page start address to a counter how many pinned allocations are there,
if it goes to 0 munlock even that page, otherwise munlock just the wholly
contained pages), or perhaps use page size aligned allocation and size and
just remember in some data structure that the partial pages could be used
for other pinned (small) allocations.
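The counting option can be sketched as follows. This is an illustrative toy, not libgomp code: a linear table stands in for the hash map, a 4 KiB page size is assumed, and all names are hypothetical. Only pages whose counter drops to zero are reported as safe to munlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)   /* assumed page size */
#define SKETCH_MAX_PAGES 64                   /* capacity of the toy table */

/* Page start address -> number of live pinned allocations touching it.  */
static struct { uintptr_t page; unsigned count; } page_table[SKETCH_MAX_PAGES];
static size_t page_table_len;

static unsigned *
page_counter (uintptr_t page, int create)
{
  for (size_t i = 0; i < page_table_len; i++)
    if (page_table[i].page == page)
      return &page_table[i].count;
  if (!create)
    return NULL;
  assert (page_table_len < SKETCH_MAX_PAGES);
  page_table[page_table_len].page = page;
  page_table[page_table_len].count = 0;
  return &page_table[page_table_len++].count;
}

static uintptr_t
page_of (uintptr_t addr)
{
  return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Record a pinned allocation [ADDR, ADDR + SIZE).  Every page it touches,
   including partial pages at either end, gets its counter bumped; the
   caller would mlock the range itself.  */
static void
note_pinned (uintptr_t addr, size_t size)
{
  assert (size > 0);
  for (uintptr_t pg = page_of (addr); pg <= page_of (addr + size - 1);
       pg += SKETCH_PAGE_SIZE)
    ++*page_counter (pg, 1);
}

/* Forget a pinned allocation.  Pages whose counter drops to zero are
   stored in OUT (up to MAX_OUT entries); only those are safe to munlock.
   Returns the number of pages stored.  */
static size_t
note_unpinned (uintptr_t addr, size_t size, uintptr_t *out, size_t max_out)
{
  size_t n = 0;
  assert (size > 0);
  for (uintptr_t pg = page_of (addr); pg <= page_of (addr + size - 1);
       pg += SKETCH_PAGE_SIZE)
    {
      unsigned *count = page_counter (pg, 0);
      assert (count != NULL && *count > 0);
      if (--*count == 0 && n < max_out)
        out[n++] = pg;
    }
  return n;
}
```

For example, two allocations sharing page 0x10000 keep it pinned until both are released.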


Bad options indeed. If any part of the memory block is not pinned I 
expect no performance gains whatsoever. And all this other business adds 
complexity and runtime overhead.


For version 1.0 it feels reasonable to omit the unlock step and hope 
that a) pinned data will be long-lived, or b) short-lived pinned data 
will be replaced with more data that -- most likely -- occupies the same 
pages.


Similarly, it seems likely that serious HPC applications will run on 
devices with lots of RAM, and if not, any page swapping will destroy the 
performance gains of using OpenMP.


For now I'll just fix the architectural issues.

Andrew


Re: [PATCH] Use enclosing object size if it's smaller than member [PR 101475]

2022-01-04 Thread Martin Sebor via Gcc-patches

On 12/20/21 12:29 PM, Jeff Law wrote:



On 12/16/2021 12:56 PM, Martin Sebor via Gcc-patches wrote:

Enabling vectorization at -O2 caused quite a few tests for
warnings to start failing in GCC 12.  These tests were xfailed
and bugs were opened to track the problems until they can be
fully analyzed and ultimately fixed before GCC 12 is released.

I've now started going through these and the first such bug
I tackled is PR 102944.  As it turns out, the xfails there
are all due to a known limitation tracked in PR 101475: when
determining the size of a destination for a COMPONENT_REF,
unless asked for the size of the complete object,
compute_objsize() only considers the size of the referenced
member, even when the member is larger than the object would
allow.  This prevents warnings from diagnosing unvectorized
past-the-end accesses to objects in backing buffers (such as
in character arrays or allocated chunks of memory).

Many (though not all) accesses that are vectorized are diagnosed
because there the COMPONENT_REF is replaced by a MEM_REF.  But
because vectorization depends on target-specific things like
alignment requirements, what is and isn't diagnosed also tends
to be target-specific, making these tests quite brittle.

The attached patch corrects this oversight by using the complete
object's size instead of the member when the former is smaller.
Besides improving the out-of-bounds access detection it also
makes the tests behave more consistently across targets.

Tested on x86_64-linux and by building Glibc and verifying
that the change triggers no new warnings.
I must be missing something here.  How can the enclosing object be 
smaller than a member?


When the enclosing object is backed by a buffer of insufficient
size.  The buffer might be a declared character array such as
in the tests added and modified by the patch, or it might
be dynamically allocated.
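To make the situation concrete, here is a small hypothetical C example (`struct S` and the helper are illustrative, not from the patch). When a `struct S` is placed in a buffer shorter than the struct, the space really available through member `a` is the smaller of the member's size and what the buffer leaves after the member's offset, which is roughly the bound the patch has compute_objsize() prefer when it is the smaller one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a member array after an int.  */
struct S { int i; char a[8]; };

/* Given a struct S at the start of a backing buffer of BUFSIZE bytes,
   return how many bytes of member A an access may really touch.  For
   example, on a target with 4-byte int, with char buf[6] and
   struct S *p = (struct S *) buf, a store to p->a[4] stays inside the
   member but runs past the buffer.  */
static size_t
avail_in_member_a (size_t bufsize)
{
  size_t off = offsetof (struct S, a);
  size_t member = sizeof (((struct S *) 0)->a);
  if (bufsize <= off)
    return 0;
  return bufsize - off < member ? bufsize - off : member;
}
```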

Martin


[power-ieee128] fortran, libgfortran: Add remaining missing *_r17 symbols

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

Following patch adds remaining missing *_r17 entrypoints, so that
we have 91 *_r16 and 91 *_r17 entrypoints (and 24 *_c16 and 24 *_c17).

This fixes:
FAIL: gfortran.dg/dec_math.f90   -O0  execution test
FAIL: gfortran.dg/dec_math.f90   -O1  execution test
FAIL: gfortran.dg/dec_math.f90   -O2  execution test
FAIL: gfortran.dg/dec_math.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/dec_math.f90   -O3 -g  execution test
FAIL: gfortran.dg/dec_math.f90   -Os  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test

Ok for power-ieee128?

2022-01-04  Jakub Jelinek  

gcc/fortran/
* trans-intrinsic.c (gfc_get_intrinsic_lib_fndecl): Use
gfc_type_abi_kind.
libgfortran/
* libgfortran.h (GFC_REAL_17_INFINITY, GFC_REAL_17_QUIET_NAN): Define.
(__erfcieee128): Declare.
* intrinsics/trigd.c (_gfortran_sind_r17, _gfortran_cosd_r17,
_gfortran_tand_r17): Define for HAVE_GFC_REAL_17.
* intrinsics/random.c (random_r17, arandom_r17, rnumber_17): Define.
* intrinsics/erfc_scaled.c (ERFC_SCALED): Define.
(erfc_scaled_r16): Use ERFC_SCALED macro.
(erfc_scaled_r17): Define.

--- gcc/fortran/trans-intrinsic.c.jj2021-12-31 11:08:18.642826955 +
+++ gcc/fortran/trans-intrinsic.c   2022-01-04 15:32:29.789881496 +
@@ -881,7 +881,7 @@ gfc_get_intrinsic_lib_fndecl (gfc_intrin
 {
   snprintf (name, sizeof (name), PREFIX ("%s_%c%d"), m->name,
ts->type == BT_COMPLEX ? 'c' : 'r',
-   ts->kind);
+   gfc_type_abi_kind (ts));
 }
 
   argtypes = NULL;
--- libgfortran/libgfortran.h.jj2022-01-04 10:27:56.528323600 +
+++ libgfortran/libgfortran.h   2022-01-04 16:44:54.075203222 +
@@ -309,6 +309,9 @@ typedef GFC_UINTEGER_4 gfc_char4_t;
 #   define GFC_REAL_16_INFINITY __builtin_infq ()
 #  endif
 # endif
+# ifdef HAVE_GFC_REAL_17
+#  define GFC_REAL_17_INFINITY __builtin_inff128 ()
+# endif
 #endif
 #if __FLT_HAS_QUIET_NAN__
 # define GFC_REAL_4_QUIET_NAN __builtin_nanf ("")
@@ -327,6 +330,9 @@ typedef GFC_UINTEGER_4 gfc_char4_t;
 #   define GFC_REAL_16_QUIET_NAN nanq ("")
 #  endif
 # endif
+# ifdef HAVE_GFC_REAL_17
+#  define GFC_REAL_17_QUIET_NAN __builtin_nanf128 ("")
+# endif
 #endif
 
 typedef struct descriptor_dimension
@@ -1954,6 +1960,8 @@ extern __float128 __coshieee128 (__float
   __attribute__ ((__nothrow__, __leaf__));
 extern __float128 __cosieee128 (__float128)
   __attribute__ ((__nothrow__, __leaf__));
+extern __float128 __erfcieee128 (__float128)
+  __attribute__ ((__nothrow__, __leaf__));
 extern __float128 __erfieee128 (__float128)
   __attribute__ ((__nothrow__, __leaf__));
 extern __float128 __expieee128 (__float128)
--- libgfortran/intrinsics/trigd.c.jj   2021-12-31 11:00:58.083137032 +
+++ libgfortran/intrinsics/trigd.c  2022-01-04 16:29:56.585599529 +
@@ -289,3 +289,42 @@ see the files COPYING3 and COPYING.RUNTI
 #undef HAVE_INFINITY_KIND
 
 #endif /* HAVE_GFC_REAL_16 */
+
+#ifdef HAVE_GFC_REAL_17
+
+/* Build _gfortran_sind_r17, _gfortran_cosd_r17, and _gfortran_tand_r17  */
+
+#define KIND   17
+#define TINY   0x1.p-16400 /* ~= 1.28e-4937 */
+#undef  SIND_SMALL /* not precise */
+
+/* Proper float128 precision.  */
+#define COSD_SMALL  0x1.p-51   /* ~= 4.441e-16 */
+#define COSD30  8.66025403784438646763723170752936183e-01
+#define PIO180H 1.74532925199433197605003442731685936e-02
+#define PIO180L -2.39912634365882824665106671063098954e-17
+
+/* libquadmath or glibc 2.32+: HAVE_*Q are never defined.  They must be available.  */
+#define ENABLE_SIND
+#define ENABLE_COSD
+#define ENABLE_TAND
+
+#ifdef GFC_REAL_17_INFINITY
+#define HAVE_INFINITY_KIND
+#endif
+
+#include "trigd_lib.inc"
+
+#undef KIND
+#undef TINY
+#undef COSD_SMALL
+#undef SIND_SMALL
+#undef COSD30
+#undef PIO180H
+#undef PIO180L
+#undef ENABLE_SIND
+#undef ENABLE_COSD
+#undef ENABLE_TAND
+#undef HAVE_INFINITY_KIND
+
+#endif /* HAVE_GFC_REAL_17 */
--- libgfortran/intrinsics/random.c.jj  2021-12-31 11:00:58.083137032 +
+++ libgfortran/intrinsics/random.c 2022-01-04 16:40:37.819575318 +
@@ -79,6 +79,16 @@ export_proto(arandom_r16);
 
 #endif
 
+#ifdef HAVE_GFC_REAL_17
+
+extern void random_r17 (GFC_REAL_17 *);
+iexport_proto(random_r17);
+
+extern void arandom_r17 (gfc_array_r17 *);
+export_proto(arandom_r17);
+
+#endif
+
 #ifdef __GTHREAD_MUTEX_INIT
 static __gthread_mutex_t random_lock = __GTHREAD_MUTEX_INIT;
 #else
@@ -161,6 +171,27 @@ rnumber_16 (GFC_REAL_16 *f, GFC_UINTEGER

[PATCH] c++: "more constrained" vs staticness of memfn [PR103783]

2022-01-04 Thread Patrick Palka via Gcc-patches
Here we're rejecting the calls to g1 and g2 as ambiguous even though one
overload is more constrained than the other (and otherwise equivalent),
because the implicit 'this' parameter of the non-static overload causes
cand_parms_match to think the function parameter lists aren't equivalent.

This patch fixes this by making cand_parms_match skip over 'this'
appropriately.  Note that this bug only occurs with non-template member
functions, because for the template case more_specialized_fns already
seems to skip over 'this' appropriately.
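The shape of the fix can be modelled outside the C++ front end. In this hypothetical sketch a candidate's parameters are plain type names, with the implicit object parameter listed first for a non-static member function; as in the patch, 'this' is skipped only when exactly one of the two functions is non-static. The real code uses skip_artificial_parms_for, which can also skip other artificial parameters (such as those of constructors).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a candidate function: PARMS lists the parameter types,
   and for a non-static member function the first entry is the implicit
   object parameter ('this').  */
struct fn_sketch {
  bool is_member;
  bool is_nonstatic;
  const char *const *parms;
  size_t nparms;
};

/* Compare parameter lists, skipping 'this' only when one function is a
   static member and the other a non-static one.  */
static bool
parms_match_sketch (const struct fn_sketch *f1, const struct fn_sketch *f2)
{
  size_t i1 = 0, i2 = 0;
  assert (!f1->is_nonstatic || f1->nparms > 0);
  assert (!f2->is_nonstatic || f2->nparms > 0);
  if (f1->is_member && f2->is_member && f1->is_nonstatic != f2->is_nonstatic)
    {
      i1 = f1->is_nonstatic;    /* 1 skips 'this', 0 skips nothing */
      i2 = f2->is_nonstatic;
    }
  if (f1->nparms - i1 != f2->nparms - i2)
    return false;
  for (; i1 < f1->nparms; i1++, i2++)
    if (strcmp (f1->parms[i1], f2->parms[i2]) != 0)
      return false;
  return true;
}
```

For the g1 overloads in the new test, the non-static version has the single entry for 'this' and the static one has none, so they now compare as equivalent and the constraints decide.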

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 11?

PR c++/103783

gcc/cp/ChangeLog:

* call.c (cand_parms_match): Skip over 'this' when given one
static and one non-static member function.  Declare static.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-memfun2.C: New test.
---
 gcc/cp/call.c | 17 ++---
 gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C | 25 +++
 2 files changed, 39 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 7f7ee88deed..ed74b907828 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -11918,7 +11918,7 @@ joust_maybe_elide_copy (z_candidate *&cand)
 /* True if the defining declarations of the two candidates have equivalent
parameters.  */
 
-bool
+static bool
 cand_parms_match (z_candidate *c1, z_candidate *c2)
 {
   tree fn1 = c1->fn;
@@ -11940,8 +11940,19 @@ cand_parms_match (z_candidate *c1, z_candidate *c2)
   fn1 = DECL_TEMPLATE_RESULT (t1);
   fn2 = DECL_TEMPLATE_RESULT (t2);
 }
-  return compparms (TYPE_ARG_TYPES (TREE_TYPE (fn1)),
-   TYPE_ARG_TYPES (TREE_TYPE (fn2)));
+  tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
+  tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (fn2));
+  if (DECL_FUNCTION_MEMBER_P (fn1)
+  && DECL_FUNCTION_MEMBER_P (fn2)
+  && (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn1)
+ != DECL_NONSTATIC_MEMBER_FUNCTION_P (fn2)))
+{
+  /* Ignore 'this' when comparing the parameters of a static member
+function with those of a non-static one.  */
+  parms1 = skip_artificial_parms_for (fn1, parms1);
+  parms2 = skip_artificial_parms_for (fn2, parms2);
+}
+  return compparms (parms1, parms2);
 }
 
 /* Compare two candidates for overloading as described in
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C
new file mode 100644
index 000..e3845e48387
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C
@@ -0,0 +1,25 @@
+// PR c++/103783
+// { dg-do compile { target c++20 } }
+
+template
+struct A {
+  template void f1() = delete;
+  template static void f1() requires B;
+
+  template void f2() requires B;
+  template static void f2() = delete;
+
+  void g1() = delete;
+  static void g1() requires B;
+
+  void g2() requires B;
+  static void g2() = delete;
+};
+
+int main() {
+  A a;
+  a.f1(); // OK
+  a.f2(); // OK
+  a.g1(); // OK, previously rejected as ambiguous
+  a.g2(); // OK, previously rejected as ambiguous
+}
-- 
2.34.1.428.gdcc0cd074f



PING 3 [PATCH v2 1/2] add -Wuse-after-free

2022-01-04 Thread Martin Sebor via Gcc-patches

Ping.  (CC'ing Jason as requested.)

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 12/13/21 9:48 AM, Martin Sebor wrote:

Ping.

Jeff, I addressed your comments in the updated patch.  If there
are no other changes is the last revision okay to commit?

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 12/6/21 5:50 PM, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 11/30/21 3:32 PM, Martin Sebor wrote:

Attached is a revised patch with the following changes based
on your comments:

1) Set and use statement uids to determine which statement
    precedes which in the same basic block.
2) Avoid testing flag_isolate_erroneous_paths_dereference.
3) Use post-dominance to decide whether to use the "maybe"
    phrasing vs a definite form.

David raised (and in our offline discussion today reiterated)
an objection to the default setting of the option being
the strictest.  I have not changed that in this revision.
See my rationale for this choice in my reply below:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583176.html

Martin

On 11/23/21 2:16 PM, Martin Sebor wrote:

On 11/22/21 6:32 PM, Jeff Law wrote:



On 11/1/2021 4:17 PM, Martin Sebor via Gcc-patches wrote:

Patch 1 in the series detects a small subset of uses of pointers
made indeterminate by calls to deallocation functions like free
or C++ operator delete.  To control the conditions under which the
warnings are issued, the new -Wuse-after-free= option provides three
levels.  At the lowest level the warning triggers only for
unconditional uses of freed pointers and doesn't warn for uses
in equality expressions.  Level 2 warns also for some conditional
uses, and level 3 also for uses in equality expressions.

I debated whether to make level 2 or 3 the default included in
-Wall.  I decided on 3 for two reasons: 1) to raise awareness
of both the problem and GCC's new ability to detect it: using
a pointer after it's been freed, even only in principle, by
a successful call to realloc, is undefined, and 2) because
it's trivial to lower the level either globally, or locally
by suppressing the warning around such misuses.
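Comparing a pointer after a successful realloc is the level-3 case described above. One way code can still learn whether the block moved, without using the invalidated pointer, is to record the old address as an integer before the call. A sketch with a hypothetical `grow_buffer` helper; this is an illustration, not something the patch prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow *BUF to NEWSIZE bytes and report in *MOVED
   whether the block moved.  The old address is captured as an integer
   before realloc, so the later comparison involves no freed pointer.  */
static bool
grow_buffer (char **buf, size_t newsize, bool *moved)
{
  assert (buf != NULL && moved != NULL);
  uintptr_t old_addr = (uintptr_t) *buf;
  char *p = realloc (*buf, newsize);
  if (p == NULL)
    return false;               /* *buf is still valid and unchanged */
  *moved = (uintptr_t) p != old_addr;
  *buf = p;
  return true;
}
```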

I've tested the patch on x86_64-linux and by building Glibc
and Binutils/GDB.  It triggers a number of times in each, all
due to comparing invalidated pointers for equality (i.e., level
3).  I have suppressed these in GCC (libiberty) by a #pragma,
and will see how the Glibc folks want to deal with theirs (I
track them in BZ #28521).

The tests contain a number of xfails due to limitations I'm
aware of.  I marked them pr?? until the patch is approved.
I will open bugs for them before committing if I don't resolve
them in a followup.

Martin

gcc-63272-1.diff

Add -Wuse-after-free.

gcc/c-family/ChangeLog

* c.opt (-Wuse-after-free): New options.

gcc/ChangeLog:

* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
* diagnostic-spec.h (NW_DANGLING): New enumerator.
* doc/invoke.texi (-Wuse-after-free): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::check_call): Rename...
(pass_waccess::check_call_access): ...to this.
(pass_waccess::check): Rename...
(pass_waccess::check_block): ...to this.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::warn_invalid_pointer): New function.
(pass_waccess::check_builtin): Handle free and realloc.
(gimple_use_after_inval_p): New function.
(get_realloc_lhs): New function.
(maybe_warn_mismatched_realloc): New function.
(pointers_related_p): New function.
(pass_waccess::check_call): Call check_pointer_uses.
(pass_waccess::execute): Compute and free dominance info.

libcpp/ChangeLog:

* files.c (_cpp_find_file): Substitute a valid pointer for
an invalid one to avoid -Wuse-after-free.

libiberty/ChangeLog:

* regex.c: Suppress -Wuse-after-free.

gcc/testsuite/ChangeLog:

* gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
* gcc.dg/Wmismatched-dealloc-3.c: Same.
* gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
* gcc.dg/attr-alloc_size-7.c: Same.
* c-c++-common/Wuse-after-free-2.c: New test.
* c-c++-common/Wuse-after-free-3.c: New test.
* c-c++-common/Wuse-after-free-4.c: New test.
* c-c++-common/Wuse-after-free-5.c: New test.
* c-c++-common/Wuse-after-free-6.c: New test.
* c-c++-common/Wuse-after-free-7.c: New test.
* c-c++-common/Wuse-after-free.c: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* g++.dg/warn/Wmismatched-dealloc-3.C: New test.
* g++.dg/warn/Wuse-after-free.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 63fc27a1487..2065402a2b9 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3397,33 +3417,460 @@ pass_waccess::mayb

PING 3 [PATCH v2 2/2] add -Wdangling-pointer [PR #63272]

2022-01-04 Thread Martin Sebor via Gcc-patches

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 12/13/21 9:50 AM, Martin Sebor wrote:

Ping.  This patch, originally submitted on Nov. 1, has not been
reviewed yet.

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 12/6/21 5:51 PM, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 11/30/21 3:55 PM, Martin Sebor wrote:

Attached is a revision of this patch with adjustments for
the changes to the prerequisite patch 1 in the series and
a couple of minor simplifications and slightly improved
test coverage, tested on x86_64-linux.

On 11/1/21 4:18 PM, Martin Sebor wrote:

Patch 2 in this series adds support for detecting the uses of
dangling pointers: those to auto objects that have gone out of
scope.  Like patch 1, to minimize false positives this detection
is very simplistic.  However, thanks to the more deterministic
nature of the problem (all local objects go out of scope) it is able
to detect more instances of it.  The approach I used is to simply
search the IL for clobbers that dominate uses of pointers to
the clobbered objects.  If such a use is found that's not
followed by a clobber of the same object the warning triggers.
Similar to -Wuse-after-free, the new -Wdangling-pointer option
has multiple levels: level 1 to detect unconditional uses and
level 2 to flag conditional ones.  Unlike with -Wuse-after-free
there is no use case for testing dangling pointers for
equality, so there is no level 3.
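For illustration, the pattern described above and one conventional fix are shown below; `sum_then_read` is a hypothetical function, and the dangling form appears only in a comment because executing it would be undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example.  The pattern the warning looks for is roughly:

     int *p;
     { int x = 42; p = &x; }    (x's lifetime ends here: a clobber)
     return *p;                 (use of the now-dangling pointer)

   Moving the object to the enclosing scope means no clobber of it
   dominates the use.  */
static int
sum_then_read (const int *vals, size_t n)
{
  int total = 0;                /* lives until the function returns */
  int *p = &total;
  assert (vals != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    *p += vals[i];
  return *p;                    /* fine: total is still in scope */
}
```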

Tested on x86_64-linux and by building Glibc and Binutils/GDB.
It found no problems outside of the GCC test suite.

As with the first patch in this series, the tests contain a number
of xfails due to known limitations marked with pr??.  I'll
open bugs for them before committing the patch if I don't resolve
them first in a followup.

Martin



Re: [power-ieee128] fortran, libgfortran: Assorted -mabi=ieeelongdouble I/O fixes

2022-01-04 Thread Jakub Jelinek via Gcc-patches
Hi!

This test FAILs because
f951: Error: '-mabi=ieeelongdouble' requires full ISA 2.06 support
compiler exited with status 1
FAIL: gfortran.dg/pr47614.f   -O0  (test for excess errors)
As powerpc64le* only supports -mcpu=power8 and newer, I think we shouldn't
be testing with that option.

Ok for power-ieee128?

All the remaining FAILs I get are due to the -flto -mgnu-attribute issues
or FAIL also with -mabi=ibmlongdouble.

Though, I'm still unsure on what we should do with the array descriptors.
typedef struct dtype_type
{
  size_t elem_len;
  int version;
  signed char rank;
  signed char type;
  signed short attribute;
}
dtype_type;

Is elem_len really element length, or kind, or both?

It seems a lot of code uses that interchangeably, is there anything where
we'd rely on whether it is the IBM extended real(kind=16) or IEEE quad
real(kind=16) (either in libgfortran or elsewhere)?
At least in libgfortran/generated/*, GFC_DESCRIPTOR_SIZE is mostly used
as mask_kind (I think the mask arrays are always logical not real/complex,
right?), or for logical stuff like matmul_l*.

2022-01-04  Jakub Jelinek  

* gfortran.dg/pr47614.f: Don't use -mcpu=power4 for
powerpc64le*-*-linux*.

--- gcc/testsuite/gfortran.dg/pr47614.f.jj  2021-12-31 11:00:53.733041354 +
+++ gcc/testsuite/gfortran.dg/pr47614.f 2022-01-04 17:51:05.422663254 +
@@ -1,6 +1,7 @@
 ! { dg-do run { target { powerpc*-*-* } } }
 ! { dg-skip-if "" { powerpc*-*-darwin* } }
 ! { dg-options "-O3 -funroll-loops -ffast-math -mcpu=power4" }
+! { dg-options "-O3 -funroll-loops -ffast-math" { target powerpc64le*-*-linux* } }
 
 
   SUBROUTINE SFCPAR(ZET,NZ,ZMH,TSL,TMES)

Jakub



Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 04, 2022 at 04:58:19PM +, Andrew Stubbs wrote:
> > I think perror is the wrong thing to do, omp_alloc etc. has a well defined
> > interface what to do in such cases - the allocation should just fail (not be
> > allocated) and depending on user's choice that can be fatal, or return NULL,
> > or chain to some other allocator with other properties etc.
> 
> I did it this way because pinning feels more like an optimization, and
> falling back to "just works" seemed like what users would want to happen.
> The perror was added because it turns out the default ulimit is tiny and I
> wanted to hint at the solution.

Something like perror might be acceptable for GOMP_DEBUG mode, but not
normal operation.  So perhaps use gomp_debug there instead?

If it is just an optimization for the user, they should be using the
chaining to corresponding allocator without the pinning to make it clear
what they want and also standard conforming.

> > Other issues in the patch are that it doesn't munlock on deallocation and
> > that because of that deallocation we need to figure out what to do on page
> > boundaries.  As documented, mlock can be passed address and/or address +
> > size that aren't at page boundaries and pinning happens even just for
> > partially touched pages.  But munlock unpins also even the partially
> > overlapping pages and we don't know at that point whether some other pinned
> > allocations don't appear in those pages.
> 
> Right, it doesn't munlock because of these issues. I don't know of any way
> to solve this that wouldn't involve building tables of locked ranges (and
> knowing what the page size is).
> 
> I considered using mmap with the lock flag instead, but the failure mode
> looked unhelpful. I guess we could mmap with the regular flags, then mlock
> after. That should bypass the regular heap and ensure each allocation has
> its own page. I'm not sure what the unintended side-effects of that might
> be.

But the munlock is even more important because of the low ulimit -l: if
munlock isn't done on deallocation, the default limit (64KB, I think) will
be reached even earlier.  If most users have just a 64KB limit on pinned
memory per process, then that most likely calls for grabbing such memory
in whole pages and doing memory management on that resource, because
spending that precious memory on partial pages, which will most likely
get non-pinned allocations when we only have 16 such pages, is a big
waste.

Jakub



[PATCH] i386: Introduce V2QImode vectorized logic [PR103861]

2022-01-04 Thread Uros Bizjak via Gcc-patches
Add V2QImode logic operations with SSE and GP registers and split
them to V4QImode SSE instructions or SImode GP instructions.

The patch also fixes PR target/103900.
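The splitters rely on bitwise NOT and AND-NOT acting on each byte independently, so a 2-byte vector can be processed with a 32-bit GP operation (keeping only the low 16 bits) or with an SSE XOR against all ones. A small C model of that equivalence, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Model: a V2QImode value (two bytes) occupies the low 16 bits of a
   32-bit register.  NOT on the whole 32-bit word (not:SI) yields the
   bytewise NOT in the low 16 bits.  The upper bits start as zero here,
   but whatever they held would not affect the low 16 bits.  */
static uint16_t
v2qi_not_via_si (uint16_t v)
{
  uint32_t w = v;
  return (uint16_t) ~w;
}

/* AND-NOT the same way: (and:SI (not:SI a) b), as BMI's ANDN computes.  */
static uint16_t
v2qi_andnot_via_si (uint16_t a, uint16_t b)
{
  uint32_t wa = a, wb = b;
  return (uint16_t) (~wa & wb);
}

/* SSE-style NOT: XOR with an all-ones value.  */
static uint16_t
v2qi_not_via_xor (uint16_t v)
{
  return (uint16_t) (v ^ 0xFFFFu);
}

/* Check both NOT variants against a byte-by-byte reference.  */
static void
check_v2qi_not (uint16_t v)
{
  uint8_t lo = (uint8_t) ~(v & 0xFF);
  uint8_t hi = (uint8_t) ~(v >> 8);
  uint16_t ref = (uint16_t) (lo | (hi << 8));
  assert (v2qi_not_via_si (v) == ref);
  assert (v2qi_not_via_xor (v) == ref);
}
```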

2022-01-04  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/mmx.md (one_cmplv2qi3): New insn pattern.
(one_cmplv2qi3 splitters): New post-reload splitters.
(*andnotv2qi3): New insn pattern.
(andnotv2qi3 splitters): New post-reload splitters.
(v2qi3): New insn pattern.
(v2qi3 splitters): New post-reload splitters.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/warn-vect-op-2.c: Adjust warnings.
* gcc.target/i386/pr103900.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5b33d3cfc1c..fc8ec5e4d49 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2745,6 +2745,45 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (mode, CONSTM1_RTX (mode));")
 
+(define_insn "one_cmplv2qi2"
+  [(set (match_operand:V2QI 0 "register_operand" "=r,&x,&v")
+   (not:V2QI
+ (match_operand:V2QI 1 "register_operand" "0,x,v")))]
+  ""
+  "#"
+  [(set_attr "isa" "*,sse2,avx512vl")
+   (set_attr "type" "negnot,sselog,sselog")
+   (set_attr "mode" "SI,TI,TI")])
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+   (not:V2QI
+ (match_operand:V2QI 1 "general_reg_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+   (not:SI (match_dup 1)))]
+{
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "sse_reg_operand")
+   (not:V2QI
+ (match_operand:V2QI 1 "sse_reg_operand")))]
+  "TARGET_SSE2 && reload_completed"
+  [(set (match_dup 0)
+   (xor:V4QI
+ (match_dup 0) (match_dup 1)))]
+{
+  emit_insn
+   (gen_rtx_SET (gen_rtx_REG (V16QImode, REGNO (operands[0])),
+CONSTM1_RTX (V16QImode)));
+
+  operands[1] = gen_lowpart (V4QImode, operands[1]);
+  operands[0] = gen_lowpart (V4QImode, operands[0]);
+})
+
 (define_insn "mmx_andnot<mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x,v")
(and:MMXMODEI
@@ -2775,6 +2814,69 @@
(set_attr "type" "sselog")
(set_attr "mode" "TI")])
 
+(define_insn "*andnotv2qi3"
+  [(set (match_operand:V2QI 0 "register_operand" "=&r,r,x,x,v")
+(and:V2QI
+ (not:V2QI (match_operand:V2QI 1 "register_operand" "0,r,0,x,v"))
+ (match_operand:V2QI 2 "register_operand" "r,r,x,x,v")))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  [(set_attr "isa" "*,bmi,sse2_noavx,avx,avx512vl")
+   (set_attr "type" "alu,bitmanip,sselog,sselog,sselog")
+   (set_attr "mode" "SI,SI,TI,TI,TI")])
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+(and:V2QI
+ (not:V2QI (match_operand:V2QI 1 "general_reg_operand"))
+ (match_operand:V2QI 2 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed"
+  [(parallel
+ [(set (match_dup 0)
+  (and:SI (not:SI (match_dup 1)) (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+(and:V2QI
+ (not:V2QI (match_operand:V2QI 1 "general_reg_operand"))
+ (match_operand:V2QI 2 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_BMI && reload_completed"
+  [(set (match_dup 0)
+(not:SI (match_dup 1)))
+   (parallel
+ [(set (match_dup 0)
+  (and:SI (match_dup 0) (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "sse_reg_operand")
+(and:V2QI
+ (not:V2QI (match_operand:V2QI 1 "sse_reg_operand"))
+ (match_operand:V2QI 2 "sse_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_SSE2 && reload_completed"
+  [(set (match_dup 0)
+   (and:V4QI (not:V4QI (match_dup 1)) (match_dup 2)))]
+{
+  operands[2] = gen_lowpart (V4QImode, operands[2]);
+  operands[1] = gen_lowpart (V4QImode, operands[1]);
+  operands[0] = gen_lowpart (V4QImode, operands[0]);
+})
+
 (define_expand "mmx_<code><mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
(any_logic:MMXMODEI
@@ -2821,6 +2923,50 @@
(set_attr "type" "sselog")
(set_attr "mode" "TI")])
 
+(define_insn "<code>v2qi3"
+  [(set (match_operand:V2QI 0 "register_operand" "=r,x,x,v")
+(any_logic:V2QI
+ (match_operand:V2QI 1 "register_operand" "%0,0,x,v")
+ (match_operand:V2QI 2 "register_operand" "r,x,x,v")))
+   (clobber (re

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 04, 2022 at 07:28:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > Other issues in the patch are that it doesn't munlock on deallocation and
> > > that because of that deallocation we need to figure out what to do on page
> > > boundaries.  As documented, mlock can be passed address and/or address +
> > > size that aren't at page boundaries and pinning happens even just for
> > > partially touched pages.  But munlock unpins also even the partially
> > > overlapping pages and we don't know at that point whether some other 
> > > pinned
> > > allocations don't appear in those pages.
> > 
> > Right, it doesn't munlock because of these issues. I don't know of any way
> > to solve this that wouldn't involve building tables of locked ranges (and
> > knowing what the page size is).
> > 
> > I considered using mmap with the lock flag instead, but the failure mode
> > looked unhelpful. I guess we could mmap with the regular flags, then mlock
> > after. That should bypass the regular heap and ensure each allocation has
> > its own page. I'm not sure what the unintended side-effects of that might
> > be.
> 
> But the munlock is even more important because of the low ulimit -l:
> if munlock isn't done on deallocation, the default limit (64KB, I think)
> will be reached much earlier.  If most users have just a 64KB limit on
> pinned memory per process, then that most likely calls for grabbing such
> memory in whole pages and doing memory management on that resource.
> Spending that precious memory on partial pages, which will most likely
> also hold non-pinned allocations when we have just 16 such pages, is a
> big waste.

E.g. if we start using (dynamically, using dlopen/dlsym etc.) the memkind
library for some of the allocators, for the pinned memory we could use
e.g. the memkind_create_fixed API - on the first pinned allocation, check
what is the ulimit -l and if it is fairly small, mmap PROT_NONE the whole
pinned size (but don't pin it whole at start, just whatever we need as we
go).
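A rough sketch of the reservation scheme suggested above, assuming Linux semantics for mmap, mprotect, mlock and getrlimit. The whole budget is reserved once as PROT_NONE address space, and pages are made accessible and pinned only as they are needed. The pinned_pool names and the 64MB cap are hypothetical, and the sub-allocation on top of the pinned region (what memkind would provide) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical pool state.  */
struct pinned_pool
{
  char *base;    /* start of the PROT_NONE reservation */
  size_t limit;  /* bytes we allow ourselves to pin, page-aligned */
  size_t used;   /* bytes already made accessible and pinned */
  size_t page;   /* system page size */
};

/* Reserve (but do not pin) address space for the whole pinnable budget.
   An unlimited or very large RLIMIT_MEMLOCK is capped at an arbitrary
   64MB here; a real policy would differ.  */
static int
pinned_pool_init (struct pinned_pool *pool)
{
  struct rlimit rl;
  const size_t cap = (size_t) 64 << 20;

  pool->base = NULL;
  pool->used = 0;
  pool->page = (size_t) sysconf (_SC_PAGESIZE);
  assert (pool->page > 0);
  if (getrlimit (RLIMIT_MEMLOCK, &rl) != 0)
    return -1;
  size_t lim = (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > cap)
               ? cap : (size_t) rl.rlim_cur;
  pool->limit = lim / pool->page * pool->page;
  if (pool->limit == 0)
    return -1;
  void *p = mmap (NULL, pool->limit, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  pool->base = p;
  return 0;
}

/* Make the next SIZE bytes (rounded up to pages) accessible and pin
   them.  Returns their start, or NULL if the budget or mlock says no.  */
static void *
pinned_pool_grow (struct pinned_pool *pool, size_t size)
{
  size_t len = (size + pool->page - 1) / pool->page * pool->page;
  if (len == 0 || len > pool->limit - pool->used)
    return NULL;
  char *start = pool->base + pool->used;
  if (mprotect (start, len, PROT_READ | PROT_WRITE) != 0)
    return NULL;
  if (mlock (start, len) != 0)
    {
      /* Other pinned memory in the process counts against the limit.  */
      mprotect (start, len, PROT_NONE);
      return NULL;
    }
  pool->used += len;
  return start;
}
```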

Jakub



Re: [power-ieee128] fortran, libgfortran: Add remaining missing *_r17 symbols

2022-01-04 Thread Thomas Koenig via Gcc-patches

Hi Jakub,


> Following patch adds remaining missing *_r17 entrypoints, so that
> we have 91 *_r16 and 91 *_r17 entrypoints (and 24 *_c16 and 24 *_c17).
>
> This fixes:
> FAIL: gfortran.dg/dec_math.f90   -O0  execution test
> FAIL: gfortran.dg/dec_math.f90   -O1  execution test
> FAIL: gfortran.dg/dec_math.f90   -O2  execution test
> FAIL: gfortran.dg/dec_math.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/dec_math.f90   -O3 -g  execution test
> FAIL: gfortran.dg/dec_math.f90   -Os  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O0  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O1  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O2  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -O3 -g  execution test
> FAIL: gfortran.dg/ieee/dec_math_1.f90   -Os  execution test
>
> Ok for power-ieee128?


Looks good to me.

Thanks!

Thomas


Re: [power-ieee128] fortran, libgfortran: Assorted -mabi=ieeelongdouble I/O fixes

2022-01-04 Thread Thomas Koenig via Gcc-patches



Hi Jakub,


> This test FAILs because
> f951: Error: '-mabi=ieeelongdouble' requires full ISA 2.06 support
> compiler exited with status 1
> FAIL: gfortran.dg/pr47614.f   -O0  (test for excess errors)
> As powerpc64le* only supports -mcpu=power8 and newer, I think we shouldn't
> be testing with that option.
>
> Ok for power-ieee128?


OK (I would also consider it obvious).



> Though, I'm still unsure on what we should do with the array descriptors.
> typedef struct dtype_type
> {
>    size_t elem_len;
>    int version;
>    signed char rank;
>    signed char type;
>    signed short attribute;
> }
> dtype_type;
>
> Is elem_len really element length, or kind, or both?


It is specified as a length, and should be set correctly also
for complex (where it is twice the length).
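To make the answer concrete, a sketch with a hypothetical helper, elem_len_for: elem_len is a byte count, so a complex element is twice its real counterpart. One consequence is that IBM extended double and IEEE quad real(kind=16) both have an elem_len of 16, so elem_len alone cannot distinguish the two formats.

```c
#include <assert.h>
#include <stddef.h>

/* The descriptor dtype quoted above.  */
typedef struct dtype_type
{
  size_t elem_len;
  int version;
  signed char rank;
  signed char type;
  signed short attribute;
}
dtype_type;

/* A complex element is stored as two reals.  */
static_assert (sizeof (double _Complex) == 2 * sizeof (double),
               "complex is two reals");

/* Hypothetical helper: elem_len is the byte size of one element, derived
   from sizeof of the real component rather than from the kind number.  */
static size_t
elem_len_for (size_t real_size, int is_complex)
{
  return is_complex ? 2 * real_size : real_size;
}
```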


> It seems a lot of code uses that interchangeably, is there anything where
> we'd rely on whether it is the IBM extended real(kind=16) or IEEE quad
> real(kind=16) (either in libgfortran or elsewhere)?


I think we use sizeof in all relevant files.


> At least in libgfortran/generated/*, GFC_DESCRIPTOR_SIZE is mostly used
> as mask_kind (I think the mask arrays are always logical not real/complex,
> right?), or for logical stuff like matmul_l*.


Yes, masks are always of type logical, so that should not be an issue.

There are two direct uses of elem_len in caf/single.c, one in
date_and_time.c and one in io/transfer.c.  There is also some
use of a variable of that name in ISO_Fortran_binding.c, which can
be checked later.

Best regards

Thomas


Re: [PATCH] Libquadmath: add nansq() function

2022-01-04 Thread Joseph Myers
On Sat, 1 Jan 2022, FX via Gcc-patches wrote:

> This patch adds nansq() to libquadmath, a function that returns a 
> signalling NaN. It is a need for full libgfortran support of signalling 
> NaNs, because not all targets that have _Float128 define a 
> __builtin_nanq() function.

All targets with _Float128 should have __builtin_nansf128, since we have

DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, ATTR_CONST_NOTHROW_NONNULL)

in builtins.def.
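The point can be illustrated with a GCC-specific sketch, assuming a target that has _Float128 (such as x86_64 or powerpc64le): __builtin_nansf128 already yields a signalling NaN, which can be confirmed by decoding the IEEE binary128 bits. The helper names make_snan128 and is_signalling128 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* binary128 is 16 bytes; it is read back as two 64-bit words.  */
static_assert (sizeof (_Float128) == 2 * sizeof (uint64_t), "binary128 size");

/* GCC builtin returning a _Float128 signalling NaN.  */
static _Float128
make_snan128 (void)
{
  return __builtin_nansf128 ("");
}

/* Decode the IEEE binary128 bits: exponent all ones, significand nonzero,
   quiet bit (the top significand bit) clear.  Word order is chosen from
   GCC's __BYTE_ORDER__ macro.  */
static int
is_signalling128 (_Float128 x)
{
  uint64_t w[2];
  memcpy (w, &x, sizeof w);
  int little = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
  uint64_t hi = w[little ? 1 : 0];
  uint64_t lo = w[little ? 0 : 1];
  int exp_all_ones = ((hi >> 48) & 0x7fff) == 0x7fff;
  int quiet = (int) ((hi >> 47) & 1);
  int sig_nonzero = (hi & 0x0000ffffffffffffULL) != 0 || lo != 0;
  return exp_all_ones && sig_nonzero && !quiet;
}
```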

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][hooks-bin] Port email_to.py to Python3.

2022-01-04 Thread Joseph Myers
On Mon, 3 Jan 2022, Martin Liška wrote:

> The patch ports the script to Python3.
> 
> Tested with:
> 
> $ echo "libstdc++-v3/xx" | ./email_to.py
> gcc-...@gcc.gnu.org
> libstdc++-...@gcc.gnu.org
> 
> May I install it?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Go patch committed: Remove duplication of Named_object traversal

2022-01-04 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend removes duplication of Named_object
traversal code.  Adding type parameters was about to add a partial
third version.  Remove the duplication to avoid that.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
5ef7d6c289350eb94ff6dd626b7d3f6c7ed65ea2
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 2d04f4b01c0..a18f3a37349 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-d3be41f0a1fca20e241e1db62b4b0f5262caac55
+9732b077b9235d0f35d0fb0abfe406b94d49
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 290d294e83b..e2fd509f58a 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -6890,80 +6890,12 @@ Block::traverse(Traverse* traverse)
  | Traverse::traverse_expressions
  | Traverse::traverse_types)) != 0)
 {
-  const unsigned int e_or_t = (Traverse::traverse_expressions
-  | Traverse::traverse_types);
-  const unsigned int e_or_t_or_s = (e_or_t
-   | Traverse::traverse_statements);
   for (Bindings::const_definitions_iterator pb =
 this->bindings_->begin_definitions();
   pb != this->bindings_->end_definitions();
   ++pb)
{
- int t = TRAVERSE_CONTINUE;
- switch ((*pb)->classification())
-   {
-   case Named_object::NAMED_OBJECT_CONST:
- if ((traverse_mask & Traverse::traverse_constants) != 0)
-   t = traverse->constant(*pb, false);
- if (t == TRAVERSE_CONTINUE
- && (traverse_mask & e_or_t) != 0)
-   {
- Type* tc = (*pb)->const_value()->type();
- if (tc != NULL
- && Type::traverse(tc, traverse) == TRAVERSE_EXIT)
-   return TRAVERSE_EXIT;
- t = (*pb)->const_value()->traverse_expression(traverse);
-   }
- break;
-
-   case Named_object::NAMED_OBJECT_VAR:
-   case Named_object::NAMED_OBJECT_RESULT_VAR:
- if ((traverse_mask & Traverse::traverse_variables) != 0)
-   t = traverse->variable(*pb);
- if (t == TRAVERSE_CONTINUE
- && (traverse_mask & e_or_t) != 0)
-   {
- if ((*pb)->is_result_variable()
- || (*pb)->var_value()->has_type())
-   {
- Type* tv = ((*pb)->is_variable()
- ? (*pb)->var_value()->type()
- : (*pb)->result_var_value()->type());
- if (tv != NULL
- && Type::traverse(tv, traverse) == TRAVERSE_EXIT)
-   return TRAVERSE_EXIT;
-   }
-   }
- if (t == TRAVERSE_CONTINUE
- && (traverse_mask & e_or_t_or_s) != 0
- && (*pb)->is_variable())
-   t = (*pb)->var_value()->traverse_expression(traverse,
-   traverse_mask);
- break;
-
-   case Named_object::NAMED_OBJECT_FUNC:
-   case Named_object::NAMED_OBJECT_FUNC_DECLARATION:
- go_unreachable();
-
-   case Named_object::NAMED_OBJECT_TYPE:
- if ((traverse_mask & e_or_t) != 0)
-   t = Type::traverse((*pb)->type_value(), traverse);
- break;
-
-   case Named_object::NAMED_OBJECT_TYPE_DECLARATION:
-   case Named_object::NAMED_OBJECT_UNKNOWN:
-   case Named_object::NAMED_OBJECT_ERRONEOUS:
- break;
-
-   case Named_object::NAMED_OBJECT_PACKAGE:
-   case Named_object::NAMED_OBJECT_SINK:
- go_unreachable();
-
-   default:
- go_unreachable();
-   }
-
- if (t == TRAVERSE_EXIT)
+ if ((*pb)->traverse(traverse, false) == TRAVERSE_EXIT)
return TRAVERSE_EXIT;
}
 }
@@ -8673,6 +8605,99 @@ Named_object::location() const
 }
 }
 
+// Traverse a Named_object.
+
+int
+Named_object::traverse(Traverse* traverse, bool is_global)
+{
+  const unsigned int traverse_mask = traverse->traverse_mask();
+  const unsigned int e_or_t = (Traverse::traverse_expressions
+  | Traverse::traverse_types);
+  const unsigned int e_or_t_or_s = (e_or_t
+   | Traverse::traverse_statements);
+
+  int t = TRAVERSE_CONTINUE;
+  switch (this->classification_)
+{
+case Named_object::NAMED_OBJECT_CONST:
+  if ((traverse_mask & Traverse::traverse_constants) != 0)
+   t = traverse->constant(this, is_global);
+  if (t == TRAVERSE_CONTINUE
+ && (traverse_mask & e_or_t) != 0)
+   {
+ Type

[PATCH] Fortran: Fix ICE caused by missing error for untyped symbol [PR103258]

2022-01-04 Thread Sandra Loosemore
This patch fixes an ICE that appeared after I checked in my patch for 
PR101337 back in November, which made the resolve phase try harder to 
check all operands/arguments for errors instead of giving up after the 
first one, but it's actually a bug that existed before that and was only 
revealed by that earlier patch.


The problem is that the parse phase is doing early resolution to try to 
constant-fold a character length expression.  It's throwing away the 
error(s) if it fails, but in the test case for this issue it was leaving 
behind some state indicating that the error had already been diagnosed 
so it wasn't getting caught again during the "real" resolution phase either.


Every bit of code touched by this seems kind of hacky to me -- the 
different mechanisms for suppressing/ignoring errors, the magic bit in 
the symbol attributes, the part that tries to constant-fold an 
expression that might not actually be a constant, etc.  But, this is the 
least hacky fix I could come up with.  :-P  It fixes the test case from 
the issue and does not cause any regressions elsewhere in the gfortran 
testsuite.
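A minimal model of the fix described above, with hypothetical stand-ins for gfc_push_suppress_errors, gfc_pop_suppress_errors, gfc_query_suppress_errors and the attr.untyped bit. Suppression is a nesting counter, and the "already diagnosed" bit is set only when the error is really reported, so the later unsuppressed resolution still diagnoses the symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Nesting counter standing in for gfortran's suppress_errors.  */
static int suppress_errors = 0;

static void push_suppress_errors (void) { ++suppress_errors; }

static void
pop_suppress_errors (void)
{
  assert (suppress_errors > 0);
  --suppress_errors;
}

static bool query_suppress_errors (void) { return suppress_errors > 0; }

/* Hypothetical stand-in for a gfc_symbol with its attr.untyped bit.  */
struct symbol
{
  const char *name;
  bool untyped;   /* set once the "no IMPLICIT type" error was issued */
};

/* Returns true if the error was reported.  The bit is set only when the
   error is really emitted, so a suppressed early pass leaves it clear
   and the later resolution still diagnoses the symbol.  */
static bool
diagnose_untyped (struct symbol *sym)
{
  if (sym->untyped || query_suppress_errors ())
    return false;
  fprintf (stderr, "Error: Symbol '%s' has no IMPLICIT type\n", sym->name);
  sym->untyped = true;
  return true;
}
```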


OK to check in?

-Sandra
commit ea7deef7dad4239435374884713a187ae8faa4eb
Author: Sandra Loosemore 
Date:   Tue Jan 4 18:18:13 2022 -0800

Fortran: Fix ICE caused by missing error for untyped symbol [PR103258]

The bit on a symbol to mark that it had already been diagnosed as
lacking a type was getting set even when the error was suppressed or
discarded, specifically when doing early resolution on a character
length expression to see if it can be constant-folded.  Explicitly
suppress errors before doing that, then check whether they are
suppressed before setting the bit.

2022-01-04  Sandra Loosemore  

	PR fortran/103258

	gcc/fortran/
	* decl.c (gfc_match_char_spec): Suppress errors around call
	to gfc_reduce_init_expr.
	* error.c (gfc_query_suppress_errors): New.
	* gfortran.h (gfc_query_suppress_errors): Declare.
	* symbol.c (gfc_set_default_type): Check gfc_query_suppress_errors.

	gcc/testsuite/
	* gfortran.dg/pr103258.f90: New.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 4e510cc..c846923 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -3609,7 +3609,9 @@ done:
 	  gfc_current_ns = gfc_get_namespace (NULL, 0);
 
 	  e = gfc_copy_expr (len);
+	  gfc_push_suppress_errors ();
 	  gfc_reduce_init_expr (e);
+	  gfc_pop_suppress_errors ();
 	  if (e->expr_type == EXPR_CONSTANT)
 	{
 	  gfc_replace_expr (len, e);
diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index be2eb93..e95c083 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -83,6 +83,15 @@ gfc_pop_suppress_errors (void)
 }
 
 
+/* Query whether errors are suppressed.  */
+
+bool
+gfc_query_suppress_errors (void)
+{
+  return suppress_errors > 0;
+}
+
+
 /* Determine terminal width (for trimming source lines in output).  */
 
 static int
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d01a9dc..3b791a4 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1083,6 +1083,7 @@ typedef struct
 
 void gfc_push_suppress_errors (void);
 void gfc_pop_suppress_errors (void);
+bool gfc_query_suppress_errors (void);
 
 
 /* Character length structures hold the expression that gives the
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 0385595..1a4b022 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -299,7 +299,7 @@ gfc_set_default_type (gfc_symbol *sym, int error_flag, gfc_namespace *ns)
 
   if (ts->type == BT_UNKNOWN)
 {
-  if (error_flag && !sym->attr.untyped)
+  if (error_flag && !sym->attr.untyped && !gfc_query_suppress_errors ())
 	{
 	  const char *guessed = lookup_symbol_fuzzy (sym->name, sym);
 	  if (guessed)
diff --git a/gcc/testsuite/gfortran.dg/pr103258.f90 b/gcc/testsuite/gfortran.dg/pr103258.f90
new file mode 100644
index 000..4521fcd
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103258.f90
@@ -0,0 +1,14 @@
+! { dg-do compile}
+! { dg-additional-options "-Wno-pedantic" }
+!
+! Test from PR103258.  This used to ICE due to incorrectly marking the
+! no-implicit-type error for n and m in the character length expression
+! as already diagnosed during early resolution, when in fact errors are
+! ignored in that parsing context.  We now expect the errors to be diagnosed
+! at the point of the first use of each symbol.
+
+subroutine s(n) ! { dg-error "Symbol 'n' .*has no IMPLICIT type" }
+implicit none
+character(n+m) :: c ! { dg-error "Symbol 'm' .*has no IMPLICIT type" }
+entry e(m)
+end


[PATCH] [RTL/fwprop] Allow propagations from inner loop to outer loop.

2022-01-04 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk.
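A sketch of the containment test and the patched condition, assuming a simplified loop structure in which superloops[i] is the enclosing loop at depth i (as in GCC's loop tree, where the length of the superloops vector is the loop depth). The struct and may_propagate are hypothetical. loop_nested_p shows the depth-indexed form that GCC's existing flow_loop_nested_p uses, which avoids the linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified loop tree node.  */
struct loop
{
  unsigned depth;              /* number of enclosing loops */
  struct loop **superloops;    /* superloops[depth - 1] is the parent */
};

/* Linear scan, like the loop_contains_p helper in the patch.  */
static bool
loop_contains_p (const struct loop *outer, const struct loop *inner)
{
  for (unsigned i = 0; i < inner->depth; i++)
    if (inner->superloops[i] == outer)
      return true;
  return false;
}

/* Constant-time variant indexed by the outer loop's depth.  */
static bool
loop_nested_p (const struct loop *outer, const struct loop *inner)
{
  return inner->depth > outer->depth
         && inner->superloops[outer->depth] == outer;
}

/* The patched condition: without a single-def reg-to-reg copy, only
   propagate within the same loop or out of a nested loop.  */
static bool
may_propagate (const struct loop *def_loop, const struct loop *use_loop,
               bool reg_prop_only, bool single_def_reg_copy)
{
  bool into_other_loop = use_loop != NULL && def_loop != use_loop
                         && !loop_contains_p (use_loop, def_loop);
  return !(reg_prop_only || into_other_loop) || single_def_reg_copy;
}
```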

gcc/ChangeLog:

PR rtl/103750
* cfgloop.h (loop_contains_p): New function.
* fwprop.c (forward_propagate_into): Allow propagations from
inner loop to outer loop.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr103750-fwprop-1.C: New test.
---
 gcc/cfgloop.h | 12 +
 gcc/fwprop.c  |  7 +++--
 .../g++.target/i386/pr103750-fwprop-1.C   | 26 +++
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index d2714e20cb0..e8fe0cedd5f 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -908,6 +908,18 @@ loop_outermost (class loop *loop)
   return (*loop->superloops)[1];
 }
 
+/* Returns true if loop OUTER contains loop INNER.  */
+static inline bool
+loop_contains_p (class loop* outer, class loop* inner)
+{
+  unsigned n = vec_safe_length (inner->superloops);
+
+  for (unsigned i = 0; i != n; i++)
+if ((*inner->superloops)[i] == outer)
+  return true;
+  return false;
+}
+
 extern void record_niter_bound (class loop *, const widest_int &, bool, bool);
 extern HOST_WIDE_INT get_estimated_loop_iterations_int (class loop *);
 extern HOST_WIDE_INT get_max_loop_iterations_int (const class loop *);
diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 2eab4fd4614..aed48e7273f 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -866,10 +866,13 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)
   rtx src = SET_SRC (def_set);
 
   /* Allow propagations into a loop only for reg-to-reg copies, since
- replacing one register by another shouldn't increase the cost.  */
+ replacing one register by another shouldn't increase the cost.
+ Propagations from inner loop to outer loop should be also ok.  */
   struct loop *def_loop = def_insn->bb ()->cfg_bb ()->loop_father;
   struct loop *use_loop = use->bb ()->cfg_bb ()->loop_father;
-  if ((reg_prop_only || def_loop != use_loop)
+  if ((reg_prop_only
+   || (use_loop && def_loop != use_loop
+  && !loop_contains_p (use_loop, def_loop)))
   && (!reg_single_def_p (dest) || !reg_single_def_p (src)))
 return false;
 
diff --git a/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C
new file mode 100644
index 000..26987d307aa
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C
@@ -0,0 +1,26 @@
+/* PR target/103750.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -std=c++1y -march=cannonlake -fdump-rtl-fwprop1" } */
+/* { dg-final { scan-rtl-dump-not "subreg:HI\[ \\\(\]*reg:SI\[^\n]*\n\[^\n]*UNSPEC_TZCNT" "fwprop1" } } */
+
+#include <immintrin.h>
+const char16_t *qustrchr(char16_t *n, char16_t *e, char16_t c) noexcept
+{
+  __m256i mch256 = _mm256_set1_epi16(c);
+  for ( ; n < e; n += 32) {
+__m256i data1 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(n));
+__m256i data2 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(n) + 1);
+__mmask16 mask1 = _mm256_cmpeq_epu16_mask(data1, mch256);
+__mmask16 mask2 = _mm256_cmpeq_epu16_mask(data2, mch256);
+if (_kortestz_mask16_u8(mask1, mask2))
+  continue;
+
+unsigned idx = _tzcnt_u32(mask1);
+if (mask1 == 0) {
+  idx = __tzcnt_u16(mask2);
+  n += 16;
+}
+return n + idx;
+  }
+  return e;
+}
-- 
2.18.1



[PATCH] match.pd: Simplify 1 / X for integer X [PR95424]

2022-01-04 Thread Zhao Wei Liew via Gcc-patches
match.pd/95424: Simplify 1 / X for integer X

This patch implements an optimization for the following C++ code:

int f(int x) {
return 1 / x;
}

int f(unsigned int x) {
return 1 / x;
}

Before this patch, x86-64 gcc -std=c++20 -O3 produces the following
assembly:

f(int):
xor edx, edx
mov eax, 1
idiv edi
ret
f(unsigned int):
xor edx, edx
mov eax, 1
div edi
ret

In comparison, clang++ -std=c++20 -O3 produces the following assembly:

f(int):
lea ecx, [rdi + 1]
xor eax, eax
cmp ecx, 3
cmovb eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret

Clang's output is more efficient as it avoids expensive div operations.

With this patch, GCC now produces the following assembly:
f(int):
lea eax, [rdi + 1]
cmp eax, 2
mov eax, 0
cmovbe eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret

which is virtually identical to Clang's assembly output. Any slight
differences in the output for f(int) are related to another possible
missed optimization.
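A quick way to convince oneself the rewrites are sound is to compare them with real division. This sketch, with hypothetical function names, checks x == 1 for unsigned x and a single unsigned range test for signed x, which appears to be the shape the divide-7.c scan pattern expects; x == 0 is excluded, as in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* For unsigned x != 0, 1 / x is 1 only when x == 1.  */
static unsigned
recip_u (unsigned x)
{
  assert (x != 0);
  return x == 1;
}

/* For signed x != 0, 1 / x is x when x is -1 or 1, else 0.  The range
   test is done in unsigned arithmetic, so x + 1 cannot overflow.  */
static int
recip_s (int x)
{
  assert (x != 0);
  return (unsigned) x + 1u <= 2u ? x : 0;
}

/* Compare both rewrites with real division over [LO, HI], which must
   not include INT_MAX, plus the extreme values.  */
static bool
verify_recip (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    {
      if (x == 0)
        continue;
      if (recip_s (x) != 1 / x)
        return false;
      if (x > 0 && recip_u ((unsigned) x) != 1u / (unsigned) x)
        return false;
    }
  return recip_s (INT_MIN) == 1 / INT_MIN
         && recip_s (INT_MAX) == 1 / INT_MAX
         && recip_u (UINT_MAX) == 1u / UINT_MAX;
}
```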

gcc/ChangeLog:

* match.pd: Simplify 1 / X where X is an integer.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/divide-6.c: New test.
* gcc.dg/tree-ssa/divide-7.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 84c9b918041..5edae1818bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(div:C @0 (negate @0))
(if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
  && TYPE_OVERFLOW_UNDEFINED (type))
-{ build_minus_one_cst (type); })))
+{ build_minus_one_cst (type); }))
+
+ /* 1 / X -> X == 1 for unsigned integer X
+1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X
+But not for 1 / 0 so that we can get proper warnings and errors. */
+ (simplify
+   (div integer_onep@0 @1)
+   (switch
+ (if (!integer_zerop (@1)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && TYPE_UNSIGNED (TREE_TYPE (@1)))
+  (eq @0 @1))
+ (if (!integer_zerop (@1)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && !TYPE_UNSIGNED (TREE_TYPE (@1)))
+  (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
+ (le @1 { build_one_cst (integer_type_node); }))
+@1 { build_zero_cst (type); })

 /* For unsigned integral types, FLOOR_DIV_EXPR is the same as
TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
new file mode 100644
index 000..a9fc4c04058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+unsigned int f(unsigned int x) {
+  return 1 / x;
+}
+
+/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
+/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
new file mode 100644
index 000..285279af7c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+int f(int x) {
+  return 1 / x;
+}
+
+/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
+/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */


[PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2022-01-04 Thread Kewen.Lin via Gcc-patches
Hi,

This patch fixes the inconsistent behavior between non-LTO and LTO
modes.  As Martin pointed out, the function rs6000_can_inline_p
currently treats a callee as inlinable whenever callee_tree is NULL,
which is unexpected; we should use the command line options from
target_option_default_node as the default instead.

When caller_tree is NULL, it now uses target_option_default_node
rather than rs6000_isa_flags, since that is more straightforward and
avoids bugs where rs6000_isa_flags does not hold the default options.

It also extends the check to the case where the callee has explicitly
set options; previously, the inlining in test case pr102059-5.c could
happen unexpectedly, and that is now fixed.

As Richi/Mike pointed out, differences in some tuning flags such as
MASK_P8_FUSION can be ignored for always_inline callees, so this patch
also tolerates such flags when the callee has the always_inline
attribute.
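A simplified model of the rule described above, using hypothetical flag names in place of the real OPTION_MASK_* values: a missing target attribute on either side falls back to the command-line default, the callee's ISA flags must be a subset of the caller's, and for always_inline callees the tuning-only bits are masked out first. The real hook also looks at explicitly set options, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ISA bits standing in for OPTION_MASK_* values.  */
#define ISA_ALTIVEC   ((uint64_t) 1 << 0)
#define ISA_VSX       ((uint64_t) 1 << 1)
#define ISA_P8_FUSION ((uint64_t) 1 << 2)  /* tuning only */
#define ISA_PCREL_OPT ((uint64_t) 1 << 3)  /* optimization only */

/* Bits whose mismatch is tolerated for always_inline callees.  */
#define ALWAYS_INLINE_SAFE_MASK (ISA_P8_FUSION | ISA_PCREL_OPT)

/* A function without a target attribute uses the command-line default.  */
static uint64_t
effective_isa (const uint64_t *attr_isa, uint64_t default_isa)
{
  return attr_isa != NULL ? *attr_isa : default_isa;
}

/* The callee's features must be a subset of the caller's.  */
static bool
can_inline_p (uint64_t caller_isa, uint64_t callee_isa, bool always_inline)
{
  if (always_inline)
    {
      caller_isa &= ~ALWAYS_INLINE_SAFE_MASK;
      callee_isa &= ~ALWAYS_INLINE_SAFE_MASK;
    }
  return (caller_isa & callee_isa) == callee_isa;
}
```

In this model, a callee without a target attribute is no longer inlined unconditionally; it is compared using the default options, which is the behavior change the patch describes.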

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html

This patch is one re-post of this updated version[1] and also
rebased and adjusted on top of the related commit r12-6219.

Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html

BR,
Kewen
-
gcc/ChangeLog:

PR target/102059
* config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
target_option_default_node and consider always_inline_safe flags.

gcc/testsuite/ChangeLog:

PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.
* gcc.target/powerpc/pr102059-5.c: New test.
* gcc.target/powerpc/pr102059-6.c: New test.
* gcc.target/powerpc/pr102059-7.c: New test.
* gcc.target/powerpc/pr102059-8.c: New test.
* gcc.dg/lto/pr102059-1_0.c: Remove unneeded option.


---
 gcc/config/rs6000/rs6000.c| 110 +++---
 gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr102059-4.c |  24 
 gcc/testsuite/gcc.target/powerpc/pr102059-5.c |  20 
 gcc/testsuite/gcc.target/powerpc/pr102059-6.c |  95 +++
 gcc/testsuite/gcc.target/powerpc/pr102059-7.c |  22 
 gcc/testsuite/gcc.target/powerpc/pr102059-8.c |  22 
 7 files changed, 255 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-5.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-6.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-7.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-8.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 7d07b47d9e3..60e131f2191 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25379,55 +25379,87 @@ rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
 static bool
 rs6000_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If the callee has no option attributes, then it is ok to inline.  */
+  /* If the caller/callee has option attributes, then use them.
+ Otherwise, use the command line options.  */
   if (!callee_tree)
-ret = true;
-
-  else
-{
-  HOST_WIDE_INT caller_isa;
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
-  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
+callee_tree = target_option_default_node;
+  if (!caller_tree)
+caller_tree = target_option_default_node;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+
+  bool always_inline
+= DECL_DISREGARD_INLINE_LIMITS (callee)
+  && lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee));
+
+  /* Some features can be tolerated for always inlines.  */
+  unsigned HOST_WIDE_INT always_inline_safe_mask
+/* Fusion option masks.  */
+= OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION
+  | OPTION_MASK_P8_FUSION_SIGN | OPTION_MASK_P10_FUSION
+  | OPTION_MASK_P10_FUSION_LD_CMPI | OPTION_MASK_P10_FUSION_2LOGICAL
+  | OPTION_MASK_P10_FUSION_LOGADD | OPTION_MASK_P10_FUSION_ADDLOG
+  | OPTION_MASK_P10_FUSION_2ADD
+  /* Like fusion, some option masks which are just for optimization.  */
+  | OPTION_MASK_SAVE_TOC_INDIRECT | OPTION_MASK_PCREL_OPT;
+
+  /* Some features are totally safe for inlining (or always inlines),
+ let's exclude them from the following checkings.  */
+  HOST_WIDE_INT safe_mask = always_inli

Re: [PATCH] Register --sysroot in the driver switches table

2022-01-04 Thread Martin Liška

On 12/20/21 22:28, Olivier Hainque via Gcc-patches wrote:

> Hello,
>
> This change adjusts the processing of --sysroot to save the option in the
> internal "switches" array, which lets self-specs test for it and provide a
> default value possibly dependent on environment variables, as in
>
>    --with-specs=%{!-sysroot*:--sysroot=%:getenv("WIND_BASE" /target)}
>
> This helps the use we have of self specs for VxWorks, and
> was bootstrapped and regression tested on native 64bit linux.
>
> Ok to commit ?
>
> Thanks in advance,
>
> With Kind Regards,
>
> Olivier



Hello.

I think the patch broke my cross-rx-gcc12 package, failing now with:

[  162s] checking for rx-elf-gcc... /home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/./gcc/xgcc -B/home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/./gcc/ -B/usr/rx-elf/bin/ -B/usr/rx-elf/lib/ -isystem /usr/rx-elf/include -isystem /usr/rx-elf/sys-include --sysroot=/usr/rx-elf/sys-root
[  162s] checking for suffix of object files... configure: error: in `/home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/rx-elf/libgcc':
[  162s] configure: error: cannot compute suffix of object files: cannot compile
[  162s] See `config.log' for more details
[  162s] make[1]: *** [Makefile:12902: configure-target-libgcc] Error 1
[  162s] make[1]: *** Waiting for unfinished jobs
[  162s] g++ -static-libstdc++ -static-libgcc   -o g++-mapper-server server.o resolver.o ../libcody/libcody.a ../libiberty/libiberty.a
[  162s] /usr/bin/install -c g++-mapper-server ../gcc/g++-mapper-server
[  162s] make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/c++tools'
[  162s] make[1]: Leaving directory '/home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux'
[  162s] make: *** [Makefile:1027: all] Error 2
[  162s] error: Bad exit status from /var/tmp/rpm-tmp.wDOIGP (%build)

configure:3566: /home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/./gcc/xgcc -B/home/abuild/rpmbuild/BUILD/gcc-12.0.0+git190624/obj-x86_64-suse-linux/./gcc/ -B/usr/rx-elf/bin/ -B/usr/rx-elf/lib/ -isystem /usr/rx-elf/include -isystem /usr/rx-elf/sys-include --sysroot=/usr/rx-elf/sys-root   -o conftest -g -O2   conftest.c  >&5
xgcc: error: unrecognized command-line option '--sysroot=/usr/rx-elf/sys-root'
configure:3569: $? = 1

The compiler is configured with:

[   21s] + ../configure --prefix=/usr --infodir=/usr/share/info 
--mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 
--enable-languages=c,c++ --enable-checking=release --disable-werror 
--with-gxx-include-dir=/usr/include/c++/12 --enable-ssp --disable-libssp 
--disable-libvtv --enable-cet=auto --disable-libcc1 --disable-plugin 
--with-bugurl=https://bugs.opensuse.org/ '--with-pkgversion=SUSE Linux' 
--with-slibdir=/usr/rx-elf/sys-root/lib64 --with-system-zlib 
--enable-libstdcxx-allocator=new --disable-libstdcxx-pch 
--enable-version-specific-runtime-libs --with-gcc-major-version-only 
--enable-linux-futex --enable-gnu-indirect-function --program-suffix=-12 
--program-prefix=rx-elf- --target=rx-elf --disable-nls 
--with-sysroot=/usr/rx-elf/sys-root --with-build-sysroot=/usr/rx-elf/sys-root 
--with-build-time-tools=/usr/rx-elf/bin --with-newlib --disable-libsanitizer 
--build=x86_64-suse-linux --host=x86_64-suse-linux

Can you please take a look?

Cheers,
Martin