[PATCH (pushed)] param: document ranger-recompute-depth

2023-04-03 Thread Martin Liška

gcc/ChangeLog:

* doc/invoke.texi: Document new param.
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index def2df4584b..c9482886c5a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16170,6 +16170,10 @@ per supernode, before terminating analysis.
 Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.
 
+@item ranger-recompute-depth
+Maximum depth of instruction chains to consider for recomputation
+in the outgoing range calculator.
+
 @item relation-block-limit
 Maximum number of relations the oracle will register in a basic block.
 
--
2.40.0



[PATCH] driver: drop flag_var_tracking_assignments flag

2023-04-03 Thread Martin Liška

Revision r13-259-g76db543db88727 moved a condition from one file to
another, but since then we no longer clear x_flag_var_tracking_assignments,
as was done before that revision.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR driver/108241

gcc/ChangeLog:

* opts.cc (finish_options): Drop also
  x_flag_var_tracking_assignments.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108241.c: New test.
* gcc.dg/pr79570.c: Add also -g option.
---
 gcc/opts.cc |  1 +
 gcc/testsuite/gcc.dg/pr108241.c | 63 +
 gcc/testsuite/gcc.dg/pr79570.c  |  2 +-
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr108241.c

diff --git a/gcc/opts.cc b/gcc/opts.cc
index f102c1328b9..fb2e5388ab1 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1384,6 +1384,7 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
}
   opts->x_flag_var_tracking = 0;
   opts->x_flag_var_tracking_uninit = 0;
+  opts->x_flag_var_tracking_assignments = 0;
 }
 
   /* One could use EnabledBy, but it would lead to a circular dependency.  */

diff --git a/gcc/testsuite/gcc.dg/pr108241.c b/gcc/testsuite/gcc.dg/pr108241.c
new file mode 100644
index 000..06d210fae68
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr108241.c
@@ -0,0 +1,63 @@
+/* PR driver/108241 */
+/* { dg-options "-Os -frounding-math -fvar-tracking-assignments -fno-dce -fno-trapping-math -fno-tree-dce -fno-tree-dse" } */
+
+long int n1;
+int n2, n3, n4;
+char n5;
+
+void
+foo (long int x1, long int x2, int x3, int x4, int x5, char x6, char x7)
+{
+  char a01 = n2, a02 = x4, a03 = 0;
+  short int a04;
+  unsigned short int a05 = x5;
+  int a06, a07, a08 = a05, a09 = x3, a10 = 0;
+  long int a11, a12 = x4;
+
+  if (x1)
+{
+  a07 = x6 + (float)0x101;
+  a03 = a12 = a01 = a06 = ~0;
+
+  if (x5)
+   a11 = n5;
+}
+  else
+{
+  a10 = x3 = n3;
+  if (n3)
+   a06 = a05 = x7;
+}
+
+  if (n3 < n5)
+{
+  n4 = (x2 == x4) + !n1;
+  if (n4 % (n1 % x3))
+   {
+ a04 = n4;
+ a02 = n2;
+   }
+
+  if (x3)
+   {
+ a05 = !n1 % n2;
+ a08 = n1;
+ a04 = x5 + a06;
+   }
+
+  if (a12)
+   a09 = n3 + n4;
+
+  a12 = a07;
+  n3 = a11 % x1;
+  n5 += x6;
+  n1 = a04;
+}
+
+  n4 = x2 % x5 % a11;
+  a06 = a10 + a08 % a02 == n4;
+  a09 = a09 == a01 * x7;
+  n4 = x4;
+  a12 += x4 / 0xc000 + !a03;
+  a03 = !a05;
+}
diff --git a/gcc/testsuite/gcc.dg/pr79570.c b/gcc/testsuite/gcc.dg/pr79570.c
index 00841b9487a..a15be9f201d 100644
--- a/gcc/testsuite/gcc.dg/pr79570.c
+++ b/gcc/testsuite/gcc.dg/pr79570.c
@@ -1,6 +1,6 @@
 /* PR target/79570 */
 /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments" } */
+/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments -g" } */
 /* { dg-warning "changes selective scheduling" "" { target *-*-* } 0 } */
 
 #include "pr69956.c"

--
2.40.0



[PATCH 0/2] Support Intel AMX-COMPLEX

2023-04-03 Thread Haochen Jiang via Gcc-patches
Hi all,

This patch series adds the Intel AMX-COMPLEX instructions. We have also
added AMX-COMPLEX to -march=graniterapids.

The information is based on the newly released
Intel Architecture Instruction Set Extensions and Future Features document,
which is available here:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Since there are only two instructions under this new ISA, I suppose the
risk is low and it might have a chance for GCC 13, so I am sending the
patches out now.

Tested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Haochen




[PATCH 2/2] i386: Add AMX-COMPLEX to Granite Rapids

2023-04-03 Thread Haochen Jiang via Gcc-patches
gcc/ChangeLog:

* config/i386/i386.h (PTA_GRANITERAPIDS): Add PTA_AMX_COMPLEX.
---
 gcc/config/i386/i386.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index dd9391c492b..1da6dce8e0b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2361,7 +2361,7 @@ constexpr wide_int_bitmask PTA_ALDERLAKE = PTA_TREMONT | PTA_ADX | PTA_AVX
 constexpr wide_int_bitmask PTA_SIERRAFOREST = PTA_ALDERLAKE | PTA_AVXIFMA
   | PTA_AVXVNNIINT8 | PTA_AVXNECONVERT | PTA_CMPCCXADD;
 constexpr wide_int_bitmask PTA_GRANITERAPIDS = PTA_SAPPHIRERAPIDS | PTA_AMX_FP16
-  | PTA_PREFETCHI;
+  | PTA_PREFETCHI | PTA_AMX_COMPLEX;
 constexpr wide_int_bitmask PTA_GRANDRIDGE = PTA_SIERRAFOREST | PTA_RAOINT;
 constexpr wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW
   | PTA_AVX5124FMAPS | PTA_AVX512VPOPCNTDQ;
-- 
2.31.1



[PATCH 1/2] Support Intel AMX-COMPLEX

2023-04-03 Thread Haochen Jiang via Gcc-patches
gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect AMX-COMPLEX.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_COMPLEX_SET,
OPTION_MASK_ISA2_AMX_COMPLEX_UNSET): New.
(ix86_handle_option): Handle -mamx-complex.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AMX_COMPLEX.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
amx-complex.
* config.gcc: Add amxcomplexintrin.h.
* config/i386/cpuid.h (bit_AMX_COMPLEX): New.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AMX_COMPLEX__.
* config/i386/i386-isa.def (AMX_COMPLEX): Add DEF_PTA(AMX_COMPLEX).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle amx-complex.
* config/i386/i386.opt: Add option -mamx-complex.
* config/i386/immintrin.h: Include amxcomplexintrin.h.
* doc/extend.texi: Document amx-complex.
* doc/invoke.texi: Document -mamx-complex.
* doc/sourcebuild.texi: Document target amx-complex.
* config/i386/amxcomplexintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mamx-complex.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/amx-check.h: Add cpu check for AMX-COMPLEX.
* gcc.target/i386/amx-helper.h: Add amx-complex support.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mamx-complex.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add amx-complex.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_amx_complex): New.
* gcc.target/i386/amxcomplex-asmatt-1.c: New test.
* gcc.target/i386/amxcomplex-asmintel-1.c: Ditto.
* gcc.target/i386/amxcomplex-cmmimfp16ps-2.c: Ditto.
* gcc.target/i386/amxcomplex-cmmrlfp16ps-2.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h  |  2 +
 gcc/common/config/i386/i386-common.cc | 19 +-
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h|  2 +
 gcc/config.gcc|  2 +-
 gcc/config/i386/amxcomplexintrin.h| 59 +++
 gcc/config/i386/cpuid.h   |  1 +
 gcc/config/i386/i386-c.cc |  2 +
 gcc/config/i386/i386-isa.def  |  1 +
 gcc/config/i386/i386-options.cc   |  4 +-
 gcc/config/i386/i386.opt  |  4 ++
 gcc/config/i386/immintrin.h   |  2 +
 gcc/doc/extend.texi   |  5 ++
 gcc/doc/invoke.texi   | 11 ++--
 gcc/doc/sourcebuild.texi  |  3 +
 gcc/testsuite/g++.dg/other/i386-2.C   |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |  2 +-
 gcc/testsuite/gcc.target/i386/amx-check.h |  3 +
 gcc/testsuite/gcc.target/i386/amx-helper.h|  4 +-
 .../gcc.target/i386/amxcomplex-asmatt-1.c | 15 +
 .../gcc.target/i386/amxcomplex-asmintel-1.c   | 12 
 .../i386/amxcomplex-cmmimfp16ps-2.c   | 53 +
 .../i386/amxcomplex-cmmrlfp16ps-2.c   | 53 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 +
 gcc/testsuite/gcc.target/i386/sse-12.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|  4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 11 
 30 files changed, 270 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/i386/amxcomplexintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-asmatt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-asmintel-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-cmmimfp16ps-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-cmmrlfp16ps-2.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 5bde0cddb24..61559ed9de2 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -879,6 +879,8 @@ get_available_features (struct __processor_model *cpu_model,
{
  if (eax & bit_AMX_FP16)
set_feature (FEATURE_AMX_FP16);
+ if (edx & bit_AMX_COMPLEX)
+   set_feature (FEATURE_AMX_COMPLEX);
}
 }
 
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 0181e06b1c5..d90c558311b 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -117,6 +117,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA

[PATCH][stage1] gcov: respect -fprofile-prefix-map when it comes to output of .gcda file

2023-04-03 Thread Martin Liška

Respect the profile prefix map and save .gcda files to a path that is
also translated with -fprofile-prefix-map option (if provided).

This is stage 1 material. If you are interested in the fix, please install it,
as I won't be able to take care of it at that time.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

PR gcov-profile/105063

gcc/ChangeLog:

* coverage.cc (coverage_init): Combine strings with concat and
respect profile path mapping.
---
 gcc/coverage.cc | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/gcc/coverage.cc b/gcc/coverage.cc
index 7ed3a5d4ceb..3fd7f6e8e76 100644
--- a/gcc/coverage.cc
+++ b/gcc/coverage.cc
@@ -112,7 +112,7 @@ static char *bbg_file_name;
 static unsigned bbg_file_stamp;
 
 /* Name of the count data (gcda) file.  */
-static char *da_file_name;
+static const char *da_file_name;
 
 /* The names of merge functions for counters.  */
 #define STR(str) #str
@@ -1259,8 +1259,6 @@ coverage_init (const char *filename)
 #else
   const char *separator = "/";
 #endif
-  int len = strlen (filename);
-  int prefix_len = 0;
 
   /* Since coverage_init is invoked very early, before the pass
  manager, we need to set up the dumping explicitly. This is
@@ -1289,26 +1287,19 @@ coverage_init (const char *filename)
 "prefix %qs", filename, profile_prefix_path);
}
  filename = mangle_path (filename);
- len = strlen (filename);
}
   else
profile_data_prefix = getpwd ();
 }
 
-  if (profile_data_prefix)
-prefix_len = strlen (profile_data_prefix);
-
   /* Name of da file.  */
-  da_file_name = XNEWVEC (char, len + strlen (GCOV_DATA_SUFFIX)
- + prefix_len + 2);
-
   if (profile_data_prefix)
-{
-  memcpy (da_file_name, profile_data_prefix, prefix_len);
-  da_file_name[prefix_len++] = *separator;
-}
-  memcpy (da_file_name + prefix_len, filename, len);
-  strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX);
+da_file_name = concat (profile_data_prefix, separator, filename,
+  GCOV_DATA_SUFFIX, NULL);
+  else
+da_file_name = concat (filename, GCOV_DATA_SUFFIX, NULL);
+
+  da_file_name = remap_profile_filename (da_file_name);
 
   bbg_file_stamp = local_tick;
   if (flag_auto_profile)
@@ -1385,7 +1376,6 @@ coverage_finish (void)
   coverage_obj_finish (fn_ctor, object_checksum);
 }
 
-  XDELETEVEC (da_file_name);
   da_file_name = NULL;
 }
 
--
2.40.0
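
For readers who want to see the string handling in isolation, the following is a minimal standalone C sketch of what the gcov patch above builds: an optional prefix, a separator, the file name and the ".gcda" suffix joined into one freshly allocated string. It uses only the C standard library rather than libiberty's concat, the function name build_da_file_name is hypothetical, and the final remap_profile_filename step from the patch is not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not GCC code: join PREFIX, SEPARATOR, FILENAME and
   the ".gcda" suffix into a newly allocated string.  PREFIX may be NULL,
   in which case only FILENAME and the suffix are joined.  The caller
   frees the result.  Returns NULL if allocation fails.  */
char *
build_da_file_name (const char *prefix, const char *separator,
                    const char *filename)
{
  const char *suffix = ".gcda";
  size_t len = strlen (filename) + strlen (suffix) + 1;

  if (prefix)
    len += strlen (prefix) + strlen (separator);

  char *result = malloc (len);
  if (!result)
    return NULL;

  if (prefix)
    snprintf (result, len, "%s%s%s%s", prefix, separator, filename, suffix);
  else
    snprintf (result, len, "%s%s", filename, suffix);

  return result;
}
```

Calling build_da_file_name ("/tmp/prof", "/", "a-b") in this sketch yields "/tmp/prof/a-b.gcda". The patch gets the same effect from concat without the manual length bookkeeping that the removed memcpy/strcpy code needed.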



[PATCH] ipa: propagate attributes for target attribute clones

2023-04-03 Thread Martin Liška

Hi.

The patch propagates the noreturn attribute for MV functions. However, I noticed
we get the following ICE when we do the same for the TREE_READONLY attribute:

$ cat tc.c
double bar() __attribute__((target_clones("avx,avx2,avx512f,default")));
double bar() { return 1.2f; }

int foo() { return (int)bar(); }

$ ./xgcc -B. ~/Programming/testcases/tc.c -O3 -c -fprofile-generate
/home/marxin/Programming/testcases/tc.c: In function ‘foo’:
/home/marxin/Programming/testcases/tc.c:4:5: error: virtual definition of statement not up to date
4 | int foo() { return (int)bar(); }
  | ^~~
_1 = bar ();
during GIMPLE pass: fixup_cfg

Thus I intend to propagate only the noreturn attribute.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR ipa/106816

gcc/ChangeLog:

* config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
Propagate function attributes for clones.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr106816.C: New test.

Co-Authored-By: H.J. Lu 
---
 gcc/config/i386/i386-features.cc |  1 +
 gcc/testsuite/g++.target/i386/pr106816.C | 23 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr106816.C

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c09abf8fc20..f2b0d59a73c 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3379,6 +3379,7 @@ ix86_get_function_versions_dispatcher (void *decl)
   /* Right now, the dispatching is done via ifunc.  */
   dispatch_decl = make_dispatcher_decl (default_node->decl);
   TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn);
+  TREE_THIS_VOLATILE (dispatch_decl) = TREE_THIS_VOLATILE (fn);
 
   dispatcher_node = cgraph_node::get_create (dispatch_decl);
   gcc_assert (dispatcher_node != NULL);
diff --git a/gcc/testsuite/g++.target/i386/pr106816.C b/gcc/testsuite/g++.target/i386/pr106816.C
new file mode 100644
index 000..0f5cc1f13dd
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr106816.C
@@ -0,0 +1,23 @@
+// PR ipa/106816
+
+// { dg-do compile }
+// { dg-require-ifunc "" }
+// { dg-options "-O2 -fdump-tree-optimized" }
+
+__attribute__((noreturn, target("default"))) void f()
+{
+  for (;;) {}
+}
+
+__attribute__((noreturn, target("sse4.2,bmi"))) void f()
+{
+  for (;;) {}
+}
+
+int main()
+{
+  f();
+  return 12345;
+}
+
+/* { dg-final { scan-tree-dump-not "12345" "optimized" } } */
--
2.40.0



Re: [PATCH v3] rs6000: Fix vector parity support [PR108699]

2023-04-03 Thread Segher Boessenkool
On Mon, Mar 20, 2023 at 02:31:31PM +0800, Kewen.Lin wrote:
> The failures on the original failed case builtin-bitops-1.c
> and the associated test case pr108699.c here show that the
> current support of parity vector mode is wrong on Power.
> The hardware insns vprtyb[wdq] operate on the least
> significant bit of each byte per element, which doesn't match
> what the RTL opcode parity needs, but the current implementation
> wrongly expands it with them.
> 
> This patch is to fix the handling with one more insn vpopcntb.
> 
> Comparing to v2 [1]:
>   - Use rs6000_vprtyb2 rather than parityb2, and
> adjust several places with it accordingly.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P{8,9}
> and powerpc64le-linux-gnu P10.
> 
> Is it ok for trunk?

Looks good.  Thanks!


Segher
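
As an aside for readers, here is a scalar C sketch of the bit arithmetic behind this fix, modelling a single 32-bit element. It is not the rs6000 expansion and all function names are hypothetical. It only shows why XOR-ing the least significant bit of each byte (the behaviour described for vprtyb[wdq]) gives the element's parity once every byte has first been replaced by its population count (the role of vpopcntb).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the vprtyb-style step on one 32-bit element: XOR together the
   least significant bit of each of the four bytes.  */
unsigned
lsb_xor_per_byte (uint32_t x)
{
  unsigned r = 0;
  for (int i = 0; i < 4; i++)
    r ^= (x >> (8 * i)) & 1u;
  return r;
}

/* Model of the vpopcntb-style step: replace each byte with the number of
   bits set in that byte (0..8).  */
uint32_t
popcount_per_byte (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint32_t byte = (x >> (8 * i)) & 0xffu;
      uint32_t count = 0;
      for (; byte; byte >>= 1)
        count += byte & 1u;
      r |= count << (8 * i);
    }
  return r;
}

/* Reference parity: 1 if an odd number of bits is set in X.  */
unsigned
parity_reference (uint32_t x)
{
  unsigned r = 0;
  for (; x; x >>= 1)
    r ^= x & 1u;
  return r;
}

/* Two-step parity: per-byte popcount first, then XOR of the byte LSBs.
   The LSB of each byte count is that byte's parity, so the XOR of those
   bits is the parity of the whole element.  */
unsigned
parity_via_popcount (uint32_t x)
{
  return lsb_xor_per_byte (popcount_per_byte (x));
}
```

For the input 0x2, lsb_xor_per_byte alone returns 0 while the true parity is 1. With the popcount step in front, the result matches the reference.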


Re: [PATCH, rs6000] rs6000: correct vector sign extend built-ins on Big Endian [PR108812]

2023-04-03 Thread Segher Boessenkool
On Mon, Mar 27, 2023 at 03:14:26PM +0800, Kewen.Lin wrote:
> on 2023/3/27 14:16, HAO CHEN GUI wrote:
> >   This patch removes the byte reverse operation before vector integer sign
> > extension on Big Endian. These built-ins are required to sign extend the
> > rightmost element, so both BE and LE should do the same operation and the
> > byte reversal is not needed. This patch fixes it. Now these built-ins have
> > the same behavior on all compilers. The test case is modified as well.

When extending from size A to B, it is the rightmost A in every B.  That is
the same in every endianness, yes -- it is what the machine insns do
after all, it has nothing to do with how the elements are numbered in
the ABI :-)

> I think the whole define_expand can be removed, we can just use the
> define_insn names vsx_sign_extend_qi_* in rs6000-builtins.def (just
> like what you changed for __builtin_altivec_vsignextsw2d).

A very welcome cleanup :-)

> One interesting thing is that we used qi/hi/si in the name for
> V16QI/V8HI/V4SI but used v2di for V2DI, could you also adjust the
> names from vsx_sign_extend_{qi,hi,si}_* to ..._{v16qi,v8hi,v4si}_*
> then make them adopt the same naming style?

Yes please :-)


Segher
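
To make the "rightmost A in every B" point concrete, here is a small scalar C sketch for A = 8 bits and B = 64 bits. The function name is hypothetical and this is not the built-in's implementation. It works only on the numeric value of the element, so nothing in it depends on how elements or bytes are numbered in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: sign-extend the rightmost (least significant) byte
   of a 64-bit element to the full 64 bits.  The xor/subtract form avoids
   implementation-defined narrowing conversions.  */
int64_t
sign_extend_byte_to_dword (uint64_t element)
{
  int64_t low = (int64_t) (element & 0xffu);   /* 0 .. 255 */
  return (low ^ 0x80) - 0x80;                   /* -128 .. 127 */
}
```

The upper 56 bits of the input have no effect on the result, which is the sense in which the operation is the same on BE and LE.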


Re: [PATCH] rs6000: Fix vector_set_var_p9 by considering BE [PR108807]

2023-04-03 Thread Segher Boessenkool
Hi!

On Fri, Feb 17, 2023 at 05:55:04PM +0800, Kewen.Lin wrote:
> As PR108807 exposes, the current handling in function
> rs6000_expand_vector_set_var_p9 doesn't take care of big
> endianness.  Currently the function is to rotate the
> target vector by moving element to-be-set to element 0,
> set element 0 with the given val, then rotate back.  To
> get the permutation control vector for the rotation, it
> makes use of lvsr and lvsl, but the element ordering is
> different for BE and LE (like element 0 is the most
> significant one on BE while the least significant one on
> LE), this patch is to add consideration for BE and make
> sure permutation control vectors for rotations are expected.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -7235,22 +7235,26 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
> 
>machine_mode shift_mode;
>rtx (*gen_ashl)(rtx, rtx, rtx);
> -  rtx (*gen_lvsl)(rtx, rtx);
> -  rtx (*gen_lvsr)(rtx, rtx);
> +  rtx (*gen_pcvr1)(rtx, rtx);
> +  rtx (*gen_pcvr2)(rtx, rtx);

Space before "(" btw, you can fix that at the same time? :-)

What does "pcvr" mean?  You could put that in a short comment?

> +  /* Generate one permutation control vector used for rotating the element

Ah.  Yeah just "/* Permutation control vector */" for the above one
prevents all mysteries :-)

Patch looks good.  Thanks!


Segher
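
For readers unfamiliar with the approach Kewen describes (rotate the element to be set into element 0, set element 0, rotate back), a plain C sketch on a four-element array follows. The names and the fixed element count are assumptions for illustration. The sketch does not model lvsl/lvsr, the permutation control vectors, or the BE/LE element numbering that the patch is actually about.

```c
#include <assert.h>
#include <stddef.h>

#define NELTS 4

/* Rotate V left by N positions, so that element N ends up in element 0.  */
void
rotate_left (int v[NELTS], size_t n)
{
  int tmp[NELTS];
  for (size_t i = 0; i < NELTS; i++)
    tmp[i] = v[(i + n) % NELTS];
  for (size_t i = 0; i < NELTS; i++)
    v[i] = tmp[i];
}

/* Set element IDX of V to VAL using only a fixed-position store:
   rotate IDX to position 0, store there, then undo the rotation.  */
void
vector_set_var (int v[NELTS], int val, size_t idx)
{
  idx %= NELTS;
  rotate_left (v, idx);
  v[0] = val;
  rotate_left (v, (NELTS - idx) % NELTS);
}
```

Starting from {10, 20, 30, 40}, vector_set_var (v, 99, 2) leaves {10, 20, 99, 40}. The two rotations must be exact inverses of each other, which is why the direction of each permutation has to be right for the target's element order.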


Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-04-03 Thread Philipp Tomsich
Kyrill,

We reran on GCC12 and GCC11, reproducing the same improvements (e.g.,
on fotonik3d) that prompted the changes.
I'll apply the backports later this week, unless you have any further concerns…

Thanks,
Philipp.


On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Monday, March 27, 2023 9:50 AM
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > ; Tamar Christina
> > ; Manolis Tsamis 
> > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov 
> > wrote:
> > >
> > > Hi Philipp,
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  > > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > > > Tomsich
> > > > Sent: Monday, March 27, 2023 8:47 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Sandiford ; Tamar Christina
> > > > ; Philipp Tomsich
> > ;
> > > > Manolis Tsamis 
> > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > > >
> > > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > > prior to exhaustive testing of vectorizable workloads against
> > > > hardware.
> > > >
> > > > Adjust the vector costs to achieve the best results and more closely
> > > > match the underlying hardware.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > > >
> > > > Co-Authored-By: Manolis Tsamis 
> > > >
> > > > Signed-off-by: Philipp Tomsich 
> > > > ---
> > > > We would like to get this into GCC 13 to avoid having to backport at
> > > > the start of the next cycle.
> > > >
> > >
> > > Given this affects only the ampere1 costs that sounds fine to me and
> > > fairly low risk, you are being trusted that these costs are actually
> > > desirable and properly validated on the hardware involved.
> > >
> > > > OK for backports?
> > >
> > > This is ok for trunk (GCC 13). Do you also want to backport this to other
> > > branches?
> >
> > Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> > I would like to backport to those as well.
>
> Ok then, though you may want to run the benchmarks on the branches as well to 
> make sure the costs give the expected benefit there as well.
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Philipp.
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >
> > > >  gcc/config/aarch64/aarch64.cc | 12 ++--
> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc
> > > > index b27f4354031..661fff65cea 100644
> > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > > thunderx3t110_vector_cost =
> > > >
> > > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > > >  {
> > > > -  3, /* int_stmt_cost  */
> > > > +  1, /* int_stmt_cost  */
> > > >3, /* fp_stmt_cost  */
> > > >0, /* ld2_st2_permute_cost  */
> > > >0, /* ld3_st3_permute_cost  */
> > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > > ampere1_advsimd_vector_cost =
> > > >8, /* store_elt_extra_cost  */
> > > >6, /* vec_to_scalar_cost  */
> > > >7, /* scalar_to_vec_cost  */
> > > > -  5, /* align_load_cost  */
> > > > -  5, /* unalign_load_cost  */
> > > > -  2, /* unalign_store_cost  */
> > > > -  2  /* store_cost  */
> > > > +  4, /* align_load_cost  */
> > > > +  4, /* unalign_load_cost  */
> > > > +  1, /* unalign_store_cost  */
> > > > +  1  /* store_cost  */
> > > >  };
> > > >
> > > >  /* Ampere-1 costs for vector insn classes.  */
> > > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > > >  {
> > > >1, /* scalar_int_stmt_cost  */
> > > > -  1, /* scalar_fp_stmt_cost  */
> > > > +  3, /* scalar_fp_stmt_cost  */
> > > >4, /* scalar_load_cost  */
> > > >1, /* scalar_store_cost  */
> > > >1, /* cond_taken_branch_cost  */
> > > > --
> > > > 2.34.1
> > >


Re: [PATCH (pushed)] param: document ranger-recompute-depth

2023-04-03 Thread Andrew MacLeod via Gcc-patches

Bah.. forgot that.. thanks :-)

Andrew

On 4/3/23 04:04, Martin Liška wrote:

gcc/ChangeLog:

* doc/invoke.texi: Document new param.
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index def2df4584b..c9482886c5a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16170,6 +16170,10 @@ per supernode, before terminating analysis.
 Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.

+@item ranger-recompute-depth
+Maximum depth of instruction chains to consider for recomputation
+in the outgoing range calculator.
+
 @item relation-block-limit
 Maximum number of relations the oracle will register in a basic block.





Re: [PATCH] sanitizer: missing signed integer overflow errors [PR109107]

2023-04-03 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 14, 2023 at 06:50:26PM -0400, Marek Polacek via Gcc-patches wrote:
> Here we're failing to detect a signed overflow with -O because match.pd,
> since r8-1516, transforms
> 
>   c = (a + 1) - (int) (short int) b;
> 
> into
> 
>   c = (int) ((unsigned int) a + 4294946117);
> 
> wrongly eliding the overflow.  This kind of problem is usually
> avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place.
> The first match.pd hunk in the patch fixes it.  I've constructed
> a testcase for each of the surrounding cases as well.  Then I
> noticed that fold_binary_loc/associate has the same problem, so I've
> added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse,
> sorry).  Then I found yet another problem, but instead of fixing it
> now I've opened 109134.  I could probably go on and find a dozen more.
> 
> Is this worth doing?
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
>   PR sanitizer/109107
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED
>   when associating.
>   * match.pd: Use TYPE_OVERFLOW_SANITIZED.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/ubsan/pr109107-2.c: New test.
>   * c-c++-common/ubsan/pr109107-3.c: New test.
>   * c-c++-common/ubsan/pr109107-4.c: New test.
>   * c-c++-common/ubsan/pr109107.c: New test.

Please rename the last test to pr109107-1.c.

Otherwise LGTM.

Jakub
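
As an illustration of the folding Marek describes, the C sketch below contrasts the two forms. It assumes a 32-bit int and takes x = (int) (short int) b to be 21180, a value derived here from the constant in the mail (4294946117 = 2^32 - 21179), not taken from the testcase. The function names are hypothetical. __builtin_add_overflow and __builtin_sub_overflow are GCC/Clang builtins, used only to make visible the overflow that the sanitizer would be expected to report.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Original form: c = (a + 1) - x, evaluated in signed int.  Returns true
   if either signed step overflows, i.e. the case that
   -fsanitize=signed-integer-overflow is expected to diagnose.  On success
   the result is stored in *C.  */
bool
original_overflows (int a, int x, int *c)
{
  int t;
  if (__builtin_add_overflow (a, 1, &t))
    return true;
  return __builtin_sub_overflow (t, x, c);
}

/* Folded form: c = (int) ((unsigned int) a + (1u - (unsigned int) x)).
   Unsigned arithmetic wraps silently, so no overflow is observable.  The
   final conversion to int is implementation-defined for out-of-range
   values; GCC reduces it modulo 2^32.  */
int
folded (int a, int x)
{
  return (int) ((unsigned int) a + (1u - (unsigned int) x));
}
```

With a = INT_MAX and x = 21180, original_overflows reports an overflow at a + 1, while folded quietly returns INT_MAX - 21179. That is the diagnostic the transformation loses.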



[og12] OpenACC: Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock' [PR76739] (was: [PATCH, OpenACC, v3] Non-contiguous array support for OpenACC data clauses)

2023-04-03 Thread Thomas Schwinge
Hi!

On 2019-11-26T22:49:21+0800, Chung-Lin Tang  wrote:
> this is a reorg of the last non-contiguous arrays patch.

(Sorry, this is still not the master branch integration email...)


Just a small clean-up, to simplify other changes that I'm working on:

On 2019-11-26T22:49:21+0800, Chung-Lin Tang  wrote:
> --- libgomp/oacc-parallel.c   (revision 278656)
> +++ libgomp/oacc-parallel.c   (working copy)

> +void *
> +goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca,
> +void *tgt_ptrblock_addr)
> +{
> +  [...]
> +  void *ptrblock = gomp_malloc (nca->ptrblock_size);

> --- libgomp/target.c  (revision 278656)
> +++ libgomp/target.c  (working copy)

> @@ -1044,6 +1114,98 @@ gomp_map_vars_internal (struct gomp_device_descr *

> +   /* Now we have the target memory allocated, and target offsets of all
> +  row blocks assigned and calculated, we can construct the
> +  accelerator side ptrblock and copy it in.  */
> +   if (nca->ptrblock_size)
> + {
> +   void *ptrblock = goacc_noncontig_array_create_ptrblock
> + (nca, target_ptrblock);
> +   gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
> +   nca->ptrblock_size, cbufp);
> +   free (ptrblock);
> + }

Pushed to devel/omp/gcc-12 branch
commit c58b28cb650995a41e1ab0166169799f3991bdd6
"OpenACC: Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock' [PR76739]",
see attached.


Grüße
 Thomas


>From c58b28cb650995a41e1ab0166169799f3991bdd6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Mar 2023 14:32:12 +0100
Subject: [PATCH] OpenACC: Pass pre-allocated 'ptrblock' to
 'goacc_noncontig_array_create_ptrblock' [PR76739]

... to simplify later changes.  No functional change.

Follow-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".

	PR other/76739
	libgomp/
	* target.c (gomp_map_vars_internal): Pass pre-allocated 'ptrblock'
	to 'goacc_noncontig_array_create_ptrblock'.
	* oacc-parallel.c (goacc_noncontig_array_create_ptrblock): Adjust.
	* oacc-int.h (goacc_noncontig_array_create_ptrblock): Adjust.
---
 libgomp/ChangeLog.omp   | 6 ++
 libgomp/oacc-int.h  | 3 ++-
 libgomp/oacc-parallel.c | 5 ++---
 libgomp/target.c| 5 +++--
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index d8a7e476090..7afb5f43c04 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,11 @@
 2023-04-03  Thomas Schwinge  
 
+	PR other/76739
+	* target.c (gomp_map_vars_internal): Pass pre-allocated 'ptrblock'
+	to 'goacc_noncontig_array_create_ptrblock'.
+	* oacc-parallel.c (goacc_noncontig_array_create_ptrblock): Adjust.
+	* oacc-int.h (goacc_noncontig_array_create_ptrblock): Adjust.
+
 	* libgomp.texi (AMD Radeon, nvptx): Document OpenMP 'pinned'
 	memory.
 
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h
index d86aeb82dfa..28a6118873a 100644
--- a/libgomp/oacc-int.h
+++ b/libgomp/oacc-int.h
@@ -213,7 +213,8 @@ struct goacc_ncarray_info
   struct goacc_ncarray ncarray[];
 };
 
-extern void *goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *, void *);
+extern void goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *,
+		   void *, void *);
 
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 136702d6e61..8d1c2cce836 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -165,13 +165,13 @@ goacc_process_noncontiguous_arrays (size_t mapnum, void **hostaddrs,
   return nca_info;
 }
 
-void *
+void
 goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca,
+   void *ptrblock,
    void *tgt_ptrblock_addr)
 {
   struct goacc_ncarray_descr_type *descr = nca->descr;
   void **tgt_data_rows = nca->tgt_data_rows;
-  void *ptrblock = gomp_malloc (nca->ptrblock_size);
   void **curr_dim_ptrblock = (void **) ptrblock;
   size_t n = 1;
 
@@ -210,7 +210,6 @@ goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca,
   curr_dim_ptrblock = next_dim_ptrblock;
 }
   assert (n == nca->data_row_num);
-  return ptrblock;
 }
 
 /* Handle the mapping pair that are presented when a
diff --git a/libgomp/target.c b/libgomp/target.c
index de3facb6428..b88b1ebaa13 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1939,8 +1939,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		 accelerator side ptrblock and copy it in.  */
 	  if (nca->ptrblock_size)
 		{
-		  void *ptrblock = goacc_noncontig_array_create_ptrblock
-		   

[PATCH] c++: satisfaction and ARGUMENT_PACK_SELECT [PR105644]

2023-04-03 Thread Patrick Palka via Gcc-patches
This testcase demonstrates we can legitimately enter satisfaction with
an ARGUMENT_PACK_SELECT argument, which is problematic because we can't
store such arguments in the satisfaction cache (or any other hash table).

Since this appears to be possible only during constrained auto deduction
for a return-type-requirement, the most appropriate spot to fix this seems
to be from do_auto_deduction, by calling preserve_args to strip A_P_S args
before entering satisfaction.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?

PR c++/105644

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Call preserve_args before entering
satisfaction for adc_requirement contexts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires36.C: New test.
---
 gcc/cp/pt.cc |  6 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C | 12 
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 4429ae66b68..821e0035c08 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30965,6 +30965,12 @@ do_auto_deduction (tree type, tree init, tree auto_node,
return type;
}
 
+  /* We can see an ARGUMENT_PACK_SELECT argument when evaluating
+a return-type-requirement.  Get rid of them before entering
+satisfaction, since the satisfaction cache can't handle them.  */
+  if (context == adc_requirement)
+   outer_targs = preserve_args (outer_targs);
+
   if (context == adc_return_type
  || context == adc_variable_type
  || context == adc_decomp_type)
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C b/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C
new file mode 100644
index 000..7d13b9b3e54
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C
@@ -0,0 +1,12 @@
+// PR c++/105644
+// { dg-do compile { target c++20 } }
+
+template
+concept same_as = __is_same(T, U);
+
+template
+concept C = (requires { { Ts() } -> same_as; } && ...);
+
+static_assert(C);
+static_assert(!C);
+static_assert(!C);
-- 
2.40.0.153.g6369acd968



[og12] '-foffload-memory=pinned' using offloading device interfaces (was: -foffload-memory=pinned)

2023-04-03 Thread Thomas Schwinge
Hi!

On 2023-02-13T15:20:07+, Andrew Stubbs  wrote:
> On 13/02/2023 14:38, Thomas Schwinge wrote:
>> On 2022-03-08T11:30:55+, Hafiz Abid Qadeer  
>> wrote:
>>> From: Andrew Stubbs 
>>>
>>> Add a new option.  It will be used in follow-up patches.
>>
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>
>>> +@option{-foffload-memory=pinned} forces all host memory to be pinned (this
>>> +mode may require the user to increase the ulimit setting for locked 
>>> memory).
>>
>> So, this is currently implemented via 'mlockall', which, as discussed,
>> (a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it
>> meant to achieve (because it doesn't register the page-locked memory with
>> the GPU driver).
>> [...]
>> As '-foffload-memory=pinned', per the name
>> of the option, concerns itself with memory used in offloading but not
>> host execution generally, why are we actually attempting to "[force] all
>> host memory to be pinned" -- why not just the memory that's being used
>> with offloading?  That is, if '-foffload-memory=pinned' is set, register
>> as page-locked with the GPU driver all memory that appears in OMP
>> offloading data regions, such as OpenMP 'target' 'map' clauses etc.  That
>> way, this is directed at the offloading data transfers, as intended, but
>> at the same time we don't "waste" page-locked memory for generic host
>> memory allocations.  What do you think -- you, who've spent a lot more
>> time on this topic than I have, so it's likely possible that I fail to
>> realize some "details"?
>
> The main reason it is the way it is is because in general it's not
> possible to know what memory is going to be offloaded at the time it is
> allocated (and stack/static memory is never allocated that way).
>
> If there's a way to pin it after the fact then maybe that's not a
> terrible idea?  [...]

I've now pushed to devel/omp/gcc-12 branch my take on this in
commit 43095690ea519205bf56fc148b346edaa43e0f0f
"'-foffload-memory=pinned' using offloading device interfaces", and for
changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches":
commit 694bbd399c1323975b4a6735646e46c6914de63d
"'-foffload-memory=pinned' using offloading device interfaces for 
non-contiguous array support",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 43095690ea519205bf56fc148b346edaa43e0f0f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 30 Mar 2023 10:08:12 +0200
Subject: [PATCH 1/2] '-foffload-memory=pinned' using offloading device
 interfaces

Implemented for nvptx offloading via 'cuMemHostAlloc', 'cuMemHostRegister'.

	gcc/
	* doc/invoke.texi (-foffload-memory=pinned): Document.
	include/
	* cuda/cuda.h (CUresult): Add
	'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'.
	(CUdevice_attribute): Add
	'CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED'.
	(CU_MEMHOSTREGISTER_READ_ONLY): Add.
	(cuMemHostGetFlags, cuMemHostRegister, cuMemHostUnregister): Add.
	libgomp/
	* libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add
	'struct goacc_asyncqueue *' formal parameter.
	(GOMP_OFFLOAD_page_locked_host_register)
	(GOMP_OFFLOAD_page_locked_host_unregister)
	(GOMP_OFFLOAD_page_locked_host_p): Add.
	* libgomp.h (always_pinned_mode)
	(gomp_page_locked_host_register_dev)
	(gomp_page_locked_host_unregister_dev): Add.
	(struct splay_tree_key_s): Add 'page_locked_host_p'.
	(struct gomp_device_descr): Add
	'GOMP_OFFLOAD_page_locked_host_register',
	'GOMP_OFFLOAD_page_locked_host_unregister',
	'GOMP_OFFLOAD_page_locked_host_p'.
	* libgomp.texi (-foffload-memory=pinned): Document.
	* plugin/cuda-lib.def (cuMemHostGetFlags, cuMemHostRegister_v2)
	(cuMemHostRegister, cuMemHostUnregister): Add.
	* plugin/plugin-nvptx.c (struct ptx_device): Add
	'read_only_host_register_supported'.
	(nvptx_open_device): Initialize it.
	(free_host_blocks, free_host_blocks_lock)
	(nvptx_run_deferred_page_locked_host_free)
	(nvptx_page_locked_host_free_callback, nvptx_page_locked_host_p)
	(GOMP_OFFLOAD_page_locked_host_register)
	(nvptx_page_locked_host_unregister_callback)
	(GOMP_OFFLOAD_page_locked_host_unregister)
	(GOMP_OFFLOAD_page_locked_host_p)
	(nvptx_run_deferred_page_locked_host_unregister)
	(nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback):
	Add.
	(GOMP_OFFLOAD_fini_device, GOMP_OFFLOAD_page_locked_host_alloc)
	(GOMP_OFFLOAD_run): Call
	'nvptx_run_deferred_page_locked_host_free'.
	(struct goacc_asyncqueue): Add
	'page_locked_host_unregister_blocks_lock',
	'page_locked_host_unregister_blocks'.
	(nvptx_goacc_asyncqueue_construct)
	(nvptx_goacc_asyncqueue_destruct): Handle those.
	(GOMP_OFFLOAD_page_locked_host_free): Handle
	'struct goacc_asyncqueue *' formal parameter.
	(GOMP_OFFLOAD_openac

Re: [PATCH] c++: ICE on loopy var tmpl auto deduction [PR109300]

2023-04-03 Thread Patrick Palka via Gcc-patches
On Wed, 29 Mar 2023, Jason Merrill wrote:

> On 3/28/23 13:37, Patrick Palka wrote:
> > Now that we resolve non-dependent variable template-ids ahead of time,
> > cp_finish_decl needs to handle a new invalid situation: we can end up
> > trying to instantiate a variable template with deduced return type
> > before we fully parsed (and attached) its initializer.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/109300
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.cc (cp_finish_decl): Diagnose ordinary auto deduction
> > with no initializer instead of asserting.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1y/var-templ79.C: New test.
> > ---
> >   gcc/cp/decl.cc   | 15 ++-
> >   gcc/testsuite/g++.dg/cpp1y/var-templ79.C |  5 +
> >   2 files changed, 19 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ79.C
> > 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 20b980f68c8..2c91693b99d 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -8276,7 +8276,20 @@ cp_finish_decl (tree decl, tree init, bool
> > init_const_expr_p,
> >   return;
> > }
> >   -   gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node));
> > + if (CLASS_PLACEHOLDER_TEMPLATE (auto_node))
> > +   /* Class deduction with no initializer is OK.  */;
> > + else
> > +   {
> > + /* Ordinary auto deduction without an initializer, a situation
> > +which grokdeclarator already catches and rejects for the most
> > +part.  But we can still get here if we're instantiating a
> > +variable template before we've fully parsed (and attached)
> > its
> > +initializer, e.g. template auto x = x;  */
> 
> In the case of recursively dependent instantiation I'd hope to have an
> error_mark_node initializer, rather than none?

Do you mean setting the initializer to error_mark_node after the fact, e.g.

@@ -8288,7 +8297,7 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
  error_at (DECL_SOURCE_LOCATION (decl),
"declaration of %q#D has no initializer", decl);
  TREE_TYPE (decl) = error_mark_node;
- return;
+ init = error_mark_node;
}
}
   d_init = init;

or before the fact, i.e. setting DECL_INITIAL to error_mark_node as a
sentinel value for detecting recursion before we begin parsing a variable
initializer?  The former should work I suppose, but the latter is
problematic because we also call cp_finish_decl with init=error_mark_node
when the initializer is generally invalid, so by overloading the meaning
of error_mark_node here and checking for it from cp_finish_decl we would
end up emitting a bogus extra diagnostic in a bunch of cases e.g.
g++.dg/pr53055.C:

  int i = p ->* p ; // invalid initializer

I guess we would need to use a different sentinel value for detecting
recursion, or expose and inspect the 'lambda_scope' stack which already
keeps track of whether we're in the middle of a variable initializer...
Dunno if it's worth it just for sake of a better diagnostic for this
corner case, I notice e.g. Clang doesn't give a great diagnostic either:

 src/gcc/testsuite/g++.dg/cpp1y/var-templ79.C:5:6: error: declaration of 
variable 'x' with deduced type 'auto' requires an initializer
 auto x = x; // { dg-error "" }
  ^

> 
> > + error_at (DECL_SOURCE_LOCATION (decl),
> > +   "declaration of %q#D has no initializer", decl);
> > + TREE_TYPE (decl) = error_mark_node;
> > + return;
> > +   }
> > }
> > d_init = init;
> > if (d_init)
> > diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
> > b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
> > new file mode 100644
> > index 000..3c0d276153a
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
> > @@ -0,0 +1,5 @@
> > +// PR c++/109300
> > +// { dg-do compile { target c++14 } }
> > +
> > +template
> > +auto x = x; // { dg-error "" }
> 
> 



Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-04-03 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 13 Mar 2023 at 13:03, Richard Biener  wrote:
>
> On Fri, 10 Mar 2023, Richard Sandiford wrote:
>
> > Sorry for the slow reply.
> >
> > Prathamesh Kulkarni  writes:
> > > Unfortunately it regresses code-gen for the following case:
> > >
> > > svint32_t f(int32x4_t x)
> > > {
> > >   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> > > }
> > >
> > > -O2 code-gen with trunk:
> > > f:
> > > dup z0.q, z0.q[0]
> > > ret
> > >
> > > -O2 code-gen with patch:
> > > f:
> > > dup s1, v0.s[1]
> > > mov v2.8b, v0.8b
> > > ins v1.s[1], v0.s[3]
> > > ins v2.s[1], v0.s[2]
> > > zip1 v0.4s, v2.4s, v1.4s
> > > dup z0.q, z0.q[0]
> > > ret
> > >
> > > IIUC, svdupq_impl::expand uses aarch64_expand_vector_init
> > > to initialize the "base 128-bit vector" and then use dupq to replicate it.
> > >
> > > Without patch, aarch64_expand_vector_init generates fallback code, and 
> > > then
> > > combine optimizes a sequence of vec_merge/vec_select pairs into an 
> > > assignment:
> > >
> > > (insn 7 3 8 2 (set (reg:SI 99)
> > > (vec_select:SI (reg/v:V4SI 97 [ x ])
> > > (parallel [
> > > (const_int 1 [0x1])
> > > ]))) "bar.c":6:10 2592 {aarch64_get_lanev4si}
> > >  (nil))
> > >
> > > (insn 13 9 15 2 (set (reg:V4SI 102)
> > > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 99))
> > > (reg/v:V4SI 97 [ x ])
> > > (const_int 2 [0x2]))) "bar.c":6:10 1794 
> > > {aarch64_simd_vec_setv4si}
> > >  (expr_list:REG_DEAD (reg:SI 99)
> > > (expr_list:REG_DEAD (reg/v:V4SI 97 [ x ])
> > > (nil
> > >
> > > into:
> > > Trying 7 -> 13:
> > > 7: r99:SI=vec_select(r97:V4SI,parallel)
> > >13: r102:V4SI=vec_merge(vec_duplicate(r99:SI),r97:V4SI,0x2)
> > >   REG_DEAD r99:SI
> > >   REG_DEAD r97:V4SI
> > > Successfully matched this instruction:
> > > (set (reg:V4SI 102)
> > > (reg/v:V4SI 97 [ x ]))
> > >
> > > which eventually results into:
> > > (note 2 25 3 2 NOTE_INSN_DELETED)
> > > (note 3 2 7 2 NOTE_INSN_FUNCTION_BEG)
> > > (note 7 3 8 2 NOTE_INSN_DELETED)
> > > (note 8 7 9 2 NOTE_INSN_DELETED)
> > > (note 9 8 13 2 NOTE_INSN_DELETED)
> > > (note 13 9 15 2 NOTE_INSN_DELETED)
> > > (note 15 13 17 2 NOTE_INSN_DELETED)
> > > (note 17 15 18 2 NOTE_INSN_DELETED)
> > > (note 18 17 22 2 NOTE_INSN_DELETED)
> > > (insn 22 18 23 2 (parallel [
> > > (set (reg/i:VNx4SI 32 v0)
> > > (vec_duplicate:VNx4SI (reg:V4SI 108)))
> > > (clobber (scratch:VNx16BI))
> > > ]) "bar.c":7:1 5202 {aarch64_vec_duplicate_vqvnx4si_le}
> > >  (expr_list:REG_DEAD (reg:V4SI 108)
> > > (nil)))
> > > (insn 23 22 0 2 (use (reg/i:VNx4SI 32 v0)) "bar.c":7:1 -1
> > >  (nil))
> > >
> > > I was wondering if we should add the above special case, of assigning
> > > target = vec in aarch64_expand_vector_init, if initializer is {
> > > vec[0], vec[1], ... } ?
> >
> > I'm not sure it will be easy to detect that.  Won't the inputs to
> > aarch64_expand_vector_init just be plain registers?  It's not a
> > good idea in general to search for definitions of registers
> > during expansion.
> >
> > It would be nice to fix this by lowering svdupq into:
> >
> > (a) a constructor for a 128-bit vector
> > (b) a duplication of the 128-bit vector to fill an SVE vector
> >
> > But I'm not sure what the best way of doing (b) would be.
> > In RTL we can use vec_duplicate, but I don't think gimple
> > has an equivalent construct.  Maybe Richi has some ideas.
>
> On GIMPLE it would be
>
>  _1 = { a, ... }; // (a)
>  _2 = { _1, ... }; // (b)
>
> but I'm not sure if (b), a VL CTOR of fixed len(?) sub-vectors is
> possible?  But at least a CTOR of vectors is what we use to
> concat vectors.
>
> With the recent relaxing of VEC_PERM inputs it's also possible to
> express (b) with a VEC_PERM:
>
>  _2 = VEC_PERM <_1, _1, { 0, 1, 2, 3, 0, 1, 2, 3, ... }>
>
> but again I'm not sure if that repeating 0, 1, 2, 3 is expressible
> for VL vectors (maybe we'd allow "wrapping" here, I'm not sure).
>
Hi,
Thanks for the suggestions and sorry for the late response in turn.
The attached patch tries to fix the issue by explicitly constructing a CTOR
from svdupq's arguments and then using VEC_PERM_EXPR with VL mask
having encoded elements {0, 1, ... nargs-1},
npatterns == nargs, and nelts_per_pattern == 1, to replicate the base vector.
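
To illustrate what such a mask means, here is a minimal sketch in plain
C++, not GCC code.  It assumes that a selector encoded with
npatterns == nargs and nelts_per_pattern == 1 simply repeats its encoded
elements for as many lanes as the vector turns out to have.  The names
expand_selector and replicate are hypothetical and exist only for this
example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand a selector that is encoded as ENCODED.size () interleaved
// patterns with one element per pattern into NELTS explicit indices.
// With a single element per pattern, every pattern is a constant
// series, so the encoded elements simply repeat.
std::vector<unsigned>
expand_selector (const std::vector<unsigned> &encoded, std::size_t nelts)
{
  assert (!encoded.empty ());
  std::vector<unsigned> full (nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    full[i] = encoded[i % encoded.size ()];
  return full;
}

// Model VEC_PERM_EXPR <base, base, sel> for a selector that only
// refers to the first input: result[i] = base[sel[i]].
std::vector<int>
replicate (const std::vector<int> &base, std::size_t nelts)
{
  std::vector<unsigned> encoded (base.size ());
  for (std::size_t i = 0; i < base.size (); ++i)
    encoded[i] = static_cast<unsigned> (i);   // {0, 1, ... nargs-1}

  std::vector<unsigned> sel = expand_selector (encoded, nelts);
  std::vector<int> result (nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    result[i] = base[sel[i]];
  return result;
}
```

Calling replicate ({10, 20, 30, 40}, 8) models a 256-bit vector of int
filled from a 128-bit base vector; the real lane count is of course only
known at run time for VL vectors.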

So for example, for the above case,
svint32_t f_32(int32x4_t x)
{
  return svdupq_s32 (x[0], x[1], x[2], x[3]);
}

forwprop1 lowers it to:
  svint32_t _6;
  vector(4) int _8;
  :
  _1 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  _4 = BIT_FIELD_REF ;
  _8 = {_1, _2, _3, _4};
  _6 = VEC_PERM_EXPR <_8, _8, { 0, 1, 2, 3, ... }>;
  return _6;

which is then eventually optimized to:
  svint32_t _6;
   [local count: 1073741824]:
  _6 = VEC_PERM_EXPR ;
  return _6;

code-gen:
f_32:
dup z0.q, 

Re: [aarch64] Code-gen for vector initialization involving constants

2023-04-03 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 13 Feb 2023 at 11:58, Prathamesh Kulkarni
 wrote:
>
> On Fri, 3 Feb 2023 at 12:46, Prathamesh Kulkarni
>  wrote:
> >
> > Hi Richard,
> > While digging thru aarch64_expand_vector_init, I noticed it gives
> > priority to loading a constant first:
> >  /* Initialise a vector which is part-variable.  We want to first try
> >  to build those lanes which are constant in the most efficient way we
> >  can.  */
> >
> > which results in suboptimal code-gen for following case:
> > int16x8_t f_s16(int16_t x)
> > {
> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > }
> >
> > code-gen trunk:
> > f_s16:
> > > movi v0.8h, 0x1
> > ins v0.h[0], w0
> > ins v0.h[1], w0
> > ins v0.h[2], w0
> > ins v0.h[3], w0
> > ins v0.h[4], w0
> > ins v0.h[5], w0
> > ins v0.h[6], w0
> > ret
> >
> > The attached patch tweaks the following condition:
> > if (n_var == n_elts && n_elts <= 16)
> >   {
> > ...
> >   }
> >
> > to pass if maxv >= 80% of n_elts, with 80% being an
> > arbitrary "high enough" threshold. The intent is to dup
> > > the most repeating variable if its repetition
> > is "high enough" and insert constants which should be "better" than
> > loading constant first and inserting variables like in the above case.
> >
> > Alternatively, I suppose we can remove threshold and for constants,
> > generate both sequences and check which one is more
> > efficient ?
> >
> > code-gen with patch:
> > f_s16:
> > dup v0.8h, w0
> > > movi v1.4h, 0x1
> > ins v0.h[7], v1.h[0]
> > ret
> >
> > The patch is lightly tested to verify that vec[t]-init-*.c tests pass
> > with bootstrap+test
> > in progress.
> > Does this look OK ?
> Hi Richard,
> ping https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html
Hi Richard,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
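
For reference, the "maxv >= 80% of n_elts" test described in the quoted
mail could look roughly like the sketch below.  This is a standalone
model, not the code in aarch64_expand_vector_init: the Lane struct, the
helper names, and the integer form of the 80% comparison are all
assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// One lane of a vector initializer (hypothetical model).
struct Lane
{
  bool is_const;  // true for a constant lane
  int id;         // hypothetical identifier of the variable or constant
};

// Return how often the most frequently used variable appears (maxv).
std::size_t
max_var_repeat (const std::vector<Lane> &lanes)
{
  std::map<int, std::size_t> counts;
  std::size_t maxv = 0;
  for (const Lane &lane : lanes)
    if (!lane.is_const)
      maxv = std::max (maxv, ++counts[lane.id]);
  return maxv;
}

// Prefer "dup the variable, then insert the remaining lanes" when the
// most repeated variable fills at least 80% of the lanes.  The check
// maxv / n_elts >= 4 / 5 is done in integers to avoid rounding.
bool
prefer_dup_then_insert (const std::vector<Lane> &lanes)
{
  if (lanes.empty ())
    return false;
  return max_var_repeat (lanes) * 5 >= lanes.size () * 4;
}
```

For { x, x, x, x, x, x, x, 1 } this gives maxv == 7 out of 8 lanes, so
the dup-first sequence would be chosen; with four variable and four
constant lanes it would not.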


[PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]

2023-04-03 Thread Martin Uecker via Gcc-patches



With the relatively new warnings (11..) affecting VLA bounds,
I now get a lot of false positives with -Wall. In general, I find
the new warnings very useful, but they seem a bit too
aggressive and some minor tweaks are needed, otherwise they are
too noisy.  This patch suggests two changes:

1. For VLA bounds non-null is implied only when 'static' is
used (similar to clang) and not already when a bound > 0 is
specified:

int foo(int n, char buf[static n]);

foo(10, 0); // warning with 'static' but not without.


(It also seems problematic to require a size of 0 to indicate 
that the pointer may be null, because 0 is not allowed in
ISO C as a size. It is also inconsistent with how arrays with
static bound behave.) 

There seems to be agreement about this change in PR98541.


2. GCC always warns when the number of unspecified
bounds is different between two declarations:

int foo(int n, char buf[*]);
int foo(int n, char buf[n]);

or

int foo(int n, char buf[n]);
int foo(int n, char buf[*]);

But the first version is useful if the size expression
can not be specified in a header (e.g. because it uses
a macro or variable not available there) and there is
currently no easy way to avoid this.  The warning for
both cases was by design, but I suggest limiting the
warning to the second case.

Note that the logic currently applied by GCC is too
simplistic anyway, as GCC does not warn for

int foo(int x, int y, double m[*][y]);
int foo(int x, int y, double m[x][*]);

because the number of specified / unspecified bounds
is the same.  So I suggest going with the attached
patch now and adding more precise warnings later
if there is more experience with these warnings
in general and if this then still seems desirable.


Martin


Less warnings for parameters declared as arrays [PR98541, PR98536]

To avoid false positives, tune the warnings for parameters declared
as arrays with size expressions.  Only warn about null arguments with
'static'.  Also do not warn when more bounds are specified in the new
declaration than before.

PR c/98541
PR c/98536

c-family/
* c-warn.cc (warn_parm_array_mismatch): Do not warn if more
bounds are specified.

gcc/
* gimple-ssa-warn-access.cc
  (pass_waccess::maybe_check_access_sizes): For VLA bounds
in parameters, only warn about null pointers with 'static'.

gcc/testsuite:
* gcc.dg/Wnonnull-4: Adapt test.
* gcc.dg/Wstringop-overflow-40.c: Adapt test.
* gcc.dg/Wvla-parameter-4.c: Adapt test.
* gcc.dg/attr-access-2.c: Adapt test.


diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index 9ac43a1af6e..f79fb876142 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -3599,23 +3599,13 @@ warn_parm_array_mismatch (location_t origloc, tree 
fndecl, tree newparms)
  continue;
}
 
- if (newunspec != curunspec)
+ if (newunspec > curunspec)
{
  location_t warnloc = newloc, noteloc = origloc;
  const char *warnparmstr = newparmstr.c_str ();
  const char *noteparmstr = curparmstr.c_str ();
  unsigned warnunspec = newunspec, noteunspec = curunspec;
 
- if (newunspec < curunspec)
-   {
- /* If the new declaration has fewer unspecified bounds
-point the warning to the previous declaration to make
-it clear that that's the one to change.  Otherwise,
-point it to the new decl.  */
- std::swap (warnloc, noteloc);
- std::swap (warnparmstr, noteparmstr);
- std::swap (warnunspec, noteunspec);
-   }
  if (warning_n (warnloc, OPT_Wvla_parameter, warnunspec,
 "argument %u of type %s declared with "
 "%u unspecified variable bound",
@@ -3641,16 +3631,11 @@ warn_parm_array_mismatch (location_t origloc, tree 
fndecl, tree newparms)
  continue;
}
}
-
   /* Iterate over the lists of VLA variable bounds, comparing each
-pair for equality, and diagnosing mismatches.  The case of
-the lists having different lengths is handled above so at
-this point they do .  */
-  for (tree newvbl = newa->size, curvbl = cura->size; newvbl;
+pair for equality, and diagnosing mismatches.  */
+  for (tree newvbl = newa->size, curvbl = cura->size; newvbl && curvbl;
   newvbl = TREE_CHAIN (newvbl), curvbl = TREE_CHAIN (curvbl))
{
- gcc_assert (curvbl);
-
  tree newpos = TREE_PURPOSE (newvbl);
  tree curpos = TREE_PURPOSE (curvbl);
 
@@ -3663,7 +3648,6 @@ warn_parm_array_mismatch (location_t origloc, tree 
fndecl, tree newparms)
   and both are the same expression

[PATCH] Fortran: reject module variable as character length in PARAMETER [PR104349]

2023-04-03 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch fixes an ICE-on-invalid for a PARAMETER expression
where the character length was a MODULE variable.  The ICE seemed
strange, as we were catching related erroneous code for declarations in
programs or subroutines.  Removing a seemingly bogus check of restricted
expressions is the simplest way to fix this.  (We could also catch this
differently in decl.cc).

Besides, this also fixes an accepts-invalid, see testcase. :-)

Regtested on x86_64-pc-linux-gnu.  OK for mainline (13) or rather wait?

Thanks,
Harald

From 37136ce94b44149dd013b3d7fed7adba769241e6 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 3 Apr 2023 21:34:01 +0200
Subject: [PATCH] Fortran: reject module variable as character length in
 PARAMETER [PR104349]

gcc/fortran/ChangeLog:

	PR fortran/104349
	* expr.cc (check_restricted): Adjust check for valid variables in
	restricted expressions: make no exception for module variables.

gcc/testsuite/ChangeLog:

	PR fortran/104349
	* gfortran.dg/der_charlen_1.f90: Adjust dg-patterns.
	* gfortran.dg/pr104349.f90: New test.
---
 gcc/fortran/expr.cc | 2 --
 gcc/testsuite/gfortran.dg/der_charlen_1.f90 | 2 ++
 gcc/testsuite/gfortran.dg/pr104349.f90  | 8 
 3 files changed, 10 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr104349.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 7fb33f81788..02028f993fd 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -3504,8 +3504,6 @@ check_restricted (gfc_expr *e)
 	|| sym->attr.implied_index
 	|| sym->attr.flavor == FL_PARAMETER
 	|| is_parent_of_current_ns (sym->ns)
-	|| (sym->ns->proc_name != NULL
-		  && sym->ns->proc_name->attr.flavor == FL_MODULE)
 	|| (gfc_is_formal_arg () && (sym->ns == gfc_current_ns)))
 	{
 	  t = true;
diff --git a/gcc/testsuite/gfortran.dg/der_charlen_1.f90 b/gcc/testsuite/gfortran.dg/der_charlen_1.f90
index 9f394c73f25..1246522d516 100644
--- a/gcc/testsuite/gfortran.dg/der_charlen_1.f90
+++ b/gcc/testsuite/gfortran.dg/der_charlen_1.f90
@@ -22,3 +22,5 @@ CONTAINS
 type(T), intent(in)  :: X
   end subroutine
 end module another_core
+
+! { dg-prune-output "cannot appear in the expression" }
diff --git a/gcc/testsuite/gfortran.dg/pr104349.f90 b/gcc/testsuite/gfortran.dg/pr104349.f90
new file mode 100644
index 000..2bea4a37214
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr104349.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR fortran/104349 - reject module variable as character length in PARAMETER
+! Contributed by G.Steinmetz
+
+module m
+  character(n), parameter :: a(1) = 'b' ! { dg-error "cannot appear" }
+  character(n), parameter :: c= 'b' ! { dg-error "cannot appear" }
+end
--
2.35.3



Re: [PATCH 3/3] Fortran: Fix mpz and mpfr memory leaks

2023-04-03 Thread Harald Anlauf via Gcc-patches

Hi Bernhard,

there is neither context nor a related PR with a testcase showing
that this patch fixes issues seen there.

On 4/2/23 17:05, Bernhard Reutner-Fischer via Gcc-patches wrote:

From: Bernhard Reutner-Fischer 

Cc: fort...@gcc.gnu.org

gcc/fortran/ChangeLog:

* array.cc (gfc_ref_dimen_size): Free mpz memory before ICEing.
* expr.cc (find_array_section): Fix mpz memory leak.
* simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
error paths.
(gfc_simplify_set_exponent): Fix mpfr memory leak.
---
  gcc/fortran/array.cc| 3 +++
  gcc/fortran/expr.cc | 8 
  gcc/fortran/simplify.cc | 7 ++-
  3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index be5eb8b6a0f..8b1e816a859 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -2541,6 +2541,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t 
*result, mpz_t *end)
return t;

  default:
+  mpz_clear (lower);
+  mpz_clear (stride);
+  mpz_clear (upper);
gfc_internal_error ("gfc_ref_dimen_size(): Bad dimen_type");
  }


What is the point of clearing variables before issuing a gfc_internal_error?


diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 7fb33f81788..b4736804eda 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -1539,6 +1539,7 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
mpz_init_set_ui (delta_mpz, one);
mpz_init_set_ui (nelts, one);
mpz_init (tmp_mpz);
+  mpz_init (ptr);

/* Do the initialization now, so that we can cleanup without
   keeping track of where we were.  */
@@ -1682,7 +1683,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
mpz_mul (delta_mpz, delta_mpz, tmp_mpz);
  }

-  mpz_init (ptr);
cons = gfc_constructor_first (base);

/* Now clock through the array reference, calculating the index in
@@ -1735,7 +1735,8 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
 "at %L requires an increase of the allowed %d "
 "upper limit.  See %<-fmax-array-constructor%> "
 "option", &expr->where, flag_max_array_constructor);
- return false;
+ t = false;
+ goto cleanup;
}

cons = gfc_constructor_lookup (base, limit);
@@ -1750,8 +1751,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
   gfc_copy_expr (cons->expr), NULL);
  }

-  mpz_clear (ptr);
-
  cleanup:

mpz_clear (delta_mpz);
@@ -1765,6 +1764,7 @@ cleanup:
mpz_clear (ctr[d]);
mpz_clear (stride[d]);
  }
+  mpz_clear (ptr);
gfc_constructor_free (base);
return t;
  }
diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index ecf0e3558df..d1f06335e79 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -6866,6 +6866,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
*shape_exp,
  gfc_error ("The SHAPE array for the RESHAPE intrinsic at %L has a "
 "negative value %d for dimension %d",
 &shape_exp->where, shape[rank], rank+1);
+ mpz_clear (index);
  return &gfc_bad_expr;
}

@@ -6889,6 +6890,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
*shape_exp,
{
  gfc_error ("Shapes of ORDER at %L and SHAPE at %L are different",
 &order_exp->where, &shape_exp->where);
+ mpz_clear (index);
  return &gfc_bad_expr;
}

@@ -6902,6 +6904,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
*shape_exp,
{
  gfc_error ("Sizes of ORDER at %L and SHAPE at %L are different",
 &order_exp->where, &shape_exp->where);
+ mpz_clear (index);
  return &gfc_bad_expr;
}

@@ -6918,6 +6921,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
*shape_exp,
 "in the range [1, ..., %d] for the RESHAPE intrinsic "
 "near %L", order[i], &order_exp->where, rank,
 &shape_exp->where);
+ mpz_clear (index);
  return &gfc_bad_expr;
}

@@ -6926,6 +6930,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
*shape_exp,
{
  gfc_error ("ORDER at %L is not a permutation of the size of "
 "SHAPE at %L", &order_exp->where, &shape_exp->where);
+ mpz_clear (index);
  return &gfc_bad_expr;
}
  x[order[i]] = 1;
@@ -7408,7 +7413,7 @@ gfc_simplify_set_exponent (gfc_expr *x, gfc_expr *i)
exp2 = (unsigned long) mpz_get_d (i->value.integer);
mpfr_mul_2exp (result->value.real, frac, exp2, GFC_RND_MODE);

-  mpfr_clears (absv, log2, pow2, frac, NULL);
+  mpfr_clears (exp, absv, log2, pow2, frac, NULL);

return range_check (result, "SET_EXPONENT");
  }




[PATCH] range-op-float: Fix reverse ops of comparisons [PR109386]

2023-04-03 Thread Jakub Jelinek via Gcc-patches
Hi!

I've missed that one of my recent range-op-float.cc changes (likely the
r13-6967 one) caused
FAIL: libphobos.phobos/std/math/algebraic.d execution test
FAIL: libphobos.phobos_shared/std/math/algebraic.d execution test
regressions, distilled into a C testcase below.

In the testcase, we have
!(u >= v)
condition where both u and v are results of fabs*, which guards
t1 = u u<= __FLT_MAX__;
and
t2 = v u<= __FLT_MAX__;
comparisons.  From false u >= v where u and v have [0.0, +Inf] NAN
ranges we (incorrectly) deduce that one of them is
[nextafterf (0.0, 1.0), +Inf] NAN
and the other is [0.0, nextafterf (+Inf, 0.0)] NAN and from that deduce that
one of the comparisons is always true, because UNLE_EXPR with the maximum
representable number is false only if the value is +Inf and our ranges tell
that is not possible.

The bug is that the u >= v comparison determines a sensible range only when
it is true - we then know neither operand can be NAN and it behaves
correctly.  But when the comparison is false, our current code gives
sensible answers only if the other op can't be NAN.  If it can be NAN,
whenever it is NAN, the comparison is always false regardless of the other
value, so the other value needs to be VARYING.
Now, foperator_unordered_lt::op1_range etc. had code to deal with that
for op?.known_isnan (), but as the testcase shows, it is enough if it may be a
NAN at runtime to make it VARYING.
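
The problem can be reproduced outside the compiler with a small sketch
like the following.  It is only an illustration of the IEEE comparison
semantics involved, not the testcase added by the patch; the function
names are made up for this example.

```cpp
#include <cmath>
#include <limits>

// True when the guard !(u >= v) holds.  With IEEE semantics that is
// the case when u < v, but also whenever at least one operand is NaN.
bool
guard_holds (float u, float v)
{
  return !(u >= v);
}

// Model of UNLE_EXPR: true if either operand is NaN or if a <= b.
bool
unle (float a, float b)
{
  return std::isnan (a) || std::isnan (b) || a <= b;
}

// True when u passes the guard and yet "u u<= FLT_MAX" is false, i.e.
// when folding that comparison to true would have been wrong.
bool
breaks_deduction (float u, float v)
{
  return guard_holds (u, v)
	 && !unle (u, std::numeric_limits<float>::max ());
}
```

With u == +Inf and v == NaN the guard holds although u is +Inf, so on
the false edge of u >= v nothing can be said about u as long as v may
be a NaN.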

So, the following patch changes, for all those BRS_FALSE cases of the normal
non-equality comparisons, the if (opOTHER.known_isnan ()) r.set_varying (type);
check so that it also triggers if maybe_isnan ().

For the unordered or ... comparisons, it is similar for BRS_TRUE.  Those
comparisons are true whenever either operand is NAN or, if neither is NAN,
whenever the corresponding normal comparison is true.  So, if those
comparisons are true and the other operand might be a NAN, we can't tell
(VARYING); if they are false, the current handling is correct.

Bootstrapped/regtested on x86_64-linux and i686-linux, fixes those 2
D testcases and the newly added one.  Ok for trunk?

2023-04-03  Jakub Jelinek  

PR tree-optimization/109386
* range-op-float.cc (foperator_lt::op1_range, foperator_lt::op2_range,
foperator_le::op1_range, foperator_le::op2_range,
foperator_gt::op1_range, foperator_gt::op2_range,
foperator_ge::op1_range, foperator_ge::op2_range): Make r varying for
BRS_FALSE case even if the other op is maybe_isnan, not just
known_isnan.
(foperator_unordered_lt::op1_range, foperator_unordered_lt::op2_range,
foperator_unordered_le::op1_range, foperator_unordered_le::op2_range,
foperator_unordered_gt::op1_range, foperator_unordered_gt::op2_range,
foperator_unordered_ge::op1_range, foperator_unordered_ge::op2_range):
Make r varying for BRS_TRUE case even if the other op is maybe_isnan,
not just known_isnan.

* gcc.c-torture/execute/ieee/pr109386.c: New test.

--- gcc/range-op-float.cc.jj2023-04-03 10:42:54.0 +0200
+++ gcc/range-op-float.cc   2023-04-03 13:31:01.163216123 +0200
@@ -889,7 +889,7 @@ foperator_lt::op1_range (frange &r,
 
 case BRS_FALSE:
   // On the FALSE side of x < NAN, we know nothing about x.
-  if (op2.known_isnan ())
+  if (op2.known_isnan () || op2.maybe_isnan ())
r.set_varying (type);
   else
build_ge (r, type, op2);
@@ -926,7 +926,7 @@ foperator_lt::op2_range (frange &r,
 
 case BRS_FALSE:
   // On the FALSE side of NAN < x, we know nothing about x.
-  if (op1.known_isnan ())
+  if (op1.known_isnan () || op1.maybe_isnan ())
r.set_varying (type);
   else
build_le (r, type, op1);
@@ -1005,7 +1005,7 @@ foperator_le::op1_range (frange &r,
 
 case BRS_FALSE:
   // On the FALSE side of x <= NAN, we know nothing about x.
-  if (op2.known_isnan ())
+  if (op2.known_isnan () || op2.maybe_isnan ())
r.set_varying (type);
   else
build_gt (r, type, op2);
@@ -1038,7 +1038,7 @@ foperator_le::op2_range (frange &r,
 
 case BRS_FALSE:
   // On the FALSE side of NAN <= x, we know nothing about x.
-  if (op1.known_isnan ())
+  if (op1.known_isnan () || op1.maybe_isnan ())
r.set_varying (type);
   else if (op1.undefined_p ())
return false;
@@ -1122,7 +1122,7 @@ foperator_gt::op1_range (frange &r,
 
 case BRS_FALSE:
   // On the FALSE side of x > NAN, we know nothing about x.
-  if (op2.known_isnan ())
+  if (op2.known_isnan () || op2.maybe_isnan ())
r.set_varying (type);
   else if (op2.undefined_p ())
return false;
@@ -1161,7 +1161,7 @@ foperator_gt::op2_range (frange &r,
 
 case BRS_FALSE:
   // On The FALSE side of NAN > x, we know nothing about x.
-  if (op1.known_isnan ())
+  if (op1.known_isnan () || op1.maybe_isnan ())
r.set_varying (type);
   else if (op1.undefined_p ())
return false;
@@ -1241,7 +1241,7 @@ foperator_ge::op1

Re: [PATCH] c++: ICE on loopy var tmpl auto deduction [PR109300]

2023-04-03 Thread Jason Merrill via Gcc-patches

On 4/3/23 12:28, Patrick Palka wrote:

On Wed, 29 Mar 2023, Jason Merrill wrote:


On 3/28/23 13:37, Patrick Palka wrote:

Now that we resolve non-dependent variable template-ids ahead of time,
cp_finish_decl needs to handle a new invalid situation: we can end up
trying to instantiate a variable template with deduced return type
before we fully parsed (and attached) its initializer.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/109300

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Diagnose ordinary auto deduction
with no initializer instead of asserting.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ79.C: New test.
---
   gcc/cp/decl.cc   | 15 ++-
   gcc/testsuite/g++.dg/cpp1y/var-templ79.C |  5 +
   2 files changed, 19 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ79.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 20b980f68c8..2c91693b99d 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8276,7 +8276,20 @@ cp_finish_decl (tree decl, tree init, bool
init_const_expr_p,
  return;
}
   -  gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node));
+ if (CLASS_PLACEHOLDER_TEMPLATE (auto_node))
+   /* Class deduction with no initializer is OK.  */;
+ else
+   {
+ /* Ordinary auto deduction without an initializer, a situation
+which grokdeclarator already catches and rejects for the most
+part.  But we can still get here if we're instantiating a
+variable template before we've fully parsed (and attached)
its
+initializer, e.g. template auto x = x;  */


In the case of recursively dependent instantiation I'd hope to have an
error_mark_node initializer, rather than none?


Do you mean setting the initializer to error_mark_node after the fact, e.g.

@@ -8288,7 +8297,7 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
   error_at (DECL_SOURCE_LOCATION (decl),
 "declaration of %q#D has no initializer", decl);
   TREE_TYPE (decl) = error_mark_node;
- return;
+ init = error_mark_node;
 }
 }
d_init = init;

or before the fact, i.e. setting DECL_INITIAL to error_mark_node as a
sentinel value for detecting recursion before we begin parsing a variable
initializer?  The former should work I suppose, but the latter is
problematic because we also call cp_finish_decl with init=error_mark_node
when the initializer is generally invalid, so by overloading the meaning
of error_mark_node here and checking for it from cp_finish_decl we would
end up emitting a bogus extra diagnostic in a bunch of cases e.g.
g++.dg/pr53055.C:

   int i = p ->* p ; // invalid initializer

I guess we would need to use a different sentinel value for detecting
recursion, or expose and inspect the 'lambda_scope' stack which already
keeps track of whether we're in the middle of a variable initializer...
Dunno if it's worth it just for sake of a better diagnostic for this
corner case, I notice e.g. Clang doesn't give a great diagnostic either:

  src/gcc/testsuite/g++.dg/cpp1y/var-templ79.C:5:6: error: declaration of 
variable 'x' with deduced type 'auto' requires an initializer
  auto x = x; // { dg-error "" }
   ^


Yeah, let's just go with your patch, thanks.




+ error_at (DECL_SOURCE_LOCATION (decl),
+   "declaration of %q#D has no initializer", decl);
+ TREE_TYPE (decl) = error_mark_node;
+ return;
+   }
}
 d_init = init;
 if (d_init)
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
new file mode 100644
index 000..3c0d276153a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C
@@ -0,0 +1,5 @@
+// PR c++/109300
+// { dg-do compile { target c++14 } }
+
+template
+auto x = x; // { dg-error "" }

Re: [PATCH] c++: satisfaction and ARGUMENT_PACK_SELECT [PR105644]

2023-04-03 Thread Jason Merrill via Gcc-patches

On 4/3/23 10:49, Patrick Palka wrote:

This testcase demonstrates we can legitimately enter satisfaction with
an ARGUMENT_PACK_SELECT argument, which is problematic because we can't
store such arguments in the satisfaction cache (or any other hash table).

Since this appears to be possible only during constrained auto deduction
for a return-type-requirement, the most appropriate spot to fix this seems
to be from do_auto_deduction, by calling preserve_args to strip A_P_S args
before entering satisfaction.

+++ b/gcc/cp/pt.cc
@@ -30965,6 +30965,12 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
return type;
}
  
+  /* We can see an ARGUMENT_PACK_SELECT argument when evaluating

+a return-type-requirement.  Get rid of them before entering
+satisfaction, since the satisfaction cache can't handle them.  */
+  if (context == adc_requirement)
+   outer_targs = preserve_args (outer_targs);


I'd like to get do_auto_deduction out of the business of handling 
return-type-requirements, since there is no longer any actual deduction 
involved (as there was in the TS).  So I'd prefer not to add any more 
tweaks there.


Maybe this should happen higher up, in tsubst_requires_expr?  Maybe just 
before the call to add_extra_args?


Jason



Re: [PATCH 3/3] Fortran: Fix mpz and mpfr memory leaks

2023-04-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On 3 April 2023 21:50:49 CEST, Harald Anlauf  wrote:
>Hi Bernhard,
>
>there is neither context nor a related PR with a testcase showing
>that this patch fixes issues seen there.

Yes, I forgot to mention the PR:

PR fortran/68800

I did not construct individual test cases but it should be obvious that we 
should not leak these.
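
The find_array_section part of the patch quoted below swaps an early `return false;` for `t = false; goto cleanup;` so that every exit passes through the same release code. A minimal sketch of that pattern follows. It does not use GMP; `buf_init`, `buf_clear`, `live_buffers` and `process` are hypothetical stand-ins invented for this illustration:

```cpp
#include <cstdlib>

// Hypothetical stand-in for an mpz_t-like resource: counts live allocations
// so a leak shows up as a non-zero counter.
int live_buffers = 0;

int *buf_init() {
    ++live_buffers;
    return static_cast<int *>(std::malloc(sizeof(int)));
}

void buf_clear(int *p) {
    std::free(p);
    --live_buffers;
}

// Returns false on the error path, but still releases both buffers because
// the error path jumps to the shared cleanup label instead of returning.
bool process(int limit, int requested) {
    bool ok = true;
    // Initialize everything up front so cleanup can be unconditional.
    int *ptr = buf_init();
    int *tmp = buf_init();

    if (requested > limit) {
        ok = false;
        goto cleanup;   // a plain "return false;" here would leak ptr and tmp
    }

    *ptr = requested;
    *tmp = limit - requested;

cleanup:
    buf_clear(tmp);
    buf_clear(ptr);
    return ok;
}
```

After either `process(10, 5)` or `process(10, 50)` the counter is back at zero, which is the property the patch aims for with the mpz variables.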

>
>On 4/2/23 17:05, Bernhard Reutner-Fischer via Gcc-patches wrote:
>> From: Bernhard Reutner-Fischer 
>> 
>> Cc: fort...@gcc.gnu.org
>> 
>> gcc/fortran/ChangeLog:
>> 
>>  * array.cc (gfc_ref_dimen_size): Free mpz memory before ICEing.
>>  * expr.cc (find_array_section): Fix mpz memory leak.
>>  * simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
>>  error paths.
>>  (gfc_simplify_set_exponent): Fix mpfr memory leak.
>> ---
>>   gcc/fortran/array.cc| 3 +++
>>   gcc/fortran/expr.cc | 8 
>>   gcc/fortran/simplify.cc | 7 ++-
>>   3 files changed, 13 insertions(+), 5 deletions(-)
>> 
>> diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
>> index be5eb8b6a0f..8b1e816a859 100644
>> --- a/gcc/fortran/array.cc
>> +++ b/gcc/fortran/array.cc
>> @@ -2541,6 +2541,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, 
>> mpz_t *result, mpz_t *end)
>> return t;
>> 
>>   default:
>> +  mpz_clear (lower);
>> +  mpz_clear (stride);
>> +  mpz_clear (upper);
>> gfc_internal_error ("gfc_ref_dimen_size(): Bad dimen_type");
>>   }
>
>What is the point of clearing variables before issuing a gfc_internal_error?

To make it obvious that we are aware that we allocated these.

thanks,
>
>> diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
>> index 7fb33f81788..b4736804eda 100644
>> --- a/gcc/fortran/expr.cc
>> +++ b/gcc/fortran/expr.cc
>> @@ -1539,6 +1539,7 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
>> mpz_init_set_ui (delta_mpz, one);
>> mpz_init_set_ui (nelts, one);
>> mpz_init (tmp_mpz);
>> +  mpz_init (ptr);
>> 
>> /* Do the initialization now, so that we can cleanup without
>>keeping track of where we were.  */
>> @@ -1682,7 +1683,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
>> mpz_mul (delta_mpz, delta_mpz, tmp_mpz);
>>   }
>> 
>> -  mpz_init (ptr);
>> cons = gfc_constructor_first (base);
>> 
>> /* Now clock through the array reference, calculating the index in
>> @@ -1735,7 +1735,8 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
>>   "at %L requires an increase of the allowed %d "
>>   "upper limit.  See %<-fmax-array-constructor%> "
>>   "option", &expr->where, flag_max_array_constructor);
>> -  return false;
>> +  t = false;
>> +  goto cleanup;
>>  }
>> 
>> cons = gfc_constructor_lookup (base, limit);
>> @@ -1750,8 +1751,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
>> gfc_copy_expr (cons->expr), NULL);
>>   }
>> 
>> -  mpz_clear (ptr);
>> -
>>   cleanup:
>> 
>> mpz_clear (delta_mpz);
>> @@ -1765,6 +1764,7 @@ cleanup:
>> mpz_clear (ctr[d]);
>> mpz_clear (stride[d]);
>>   }
>> +  mpz_clear (ptr);
>> gfc_constructor_free (base);
>> return t;
>>   }
>> diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
>> index ecf0e3558df..d1f06335e79 100644
>> --- a/gcc/fortran/simplify.cc
>> +++ b/gcc/fortran/simplify.cc
>> @@ -6866,6 +6866,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
>> *shape_exp,
>>gfc_error ("The SHAPE array for the RESHAPE intrinsic at %L has a "
>>   "negative value %d for dimension %d",
>>   &shape_exp->where, shape[rank], rank+1);
>> +  mpz_clear (index);
>>return &gfc_bad_expr;
>>  }
>> 
>> @@ -6889,6 +6890,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
>> *shape_exp,
>>  {
>>gfc_error ("Shapes of ORDER at %L and SHAPE at %L are different",
>>   &order_exp->where, &shape_exp->where);
>> +  mpz_clear (index);
>>return &gfc_bad_expr;
>>  }
>> 
>> @@ -6902,6 +6904,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
>> *shape_exp,
>>  {
>>gfc_error ("Sizes of ORDER at %L and SHAPE at %L are different",
>>   &order_exp->where, &shape_exp->where);
>> +  mpz_clear (index);
>>return &gfc_bad_expr;
>>  }
>> 
>> @@ -6918,6 +6921,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
>> *shape_exp,
>>   "in the range [1, ..., %d] for the RESHAPE intrinsic "
>>   "near %L", order[i], &order_exp->where, rank,
>>   &shape_exp->where);
>> +  mpz_clear (index);
>>return &gfc_bad_expr;
>>  }
>> 
>> @@ -6926,6 +6930,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr 
>> *shape_exp,
>>  {
>>gfc_error ("ORDER at %L is not a permutation of the size of "
>>   "SHAPE at %L", &order_exp->where, &shape_

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-03 Thread Jan Hubicka via Gcc-patches
> On Tue, 28 Mar 2023, Richard Biener wrote:
> 
> > When adjusting calls to reflect instrumentation we failed to handle
> > calls to aliases since they appear to have no body.  Instead resort
> > to symtab node availability.  The patch also avoids touching
> > internal function calls in a more obvious way (builtins might
> > have a body available).
> > 
> > profiledbootstrap & regtest running on x86_64-unknown-linux-gnu.
> > 
> > Honza - does this look OK?
> > PR tree-optimization/109304
> > * tree-profile.cc (tree_profiling): Use symtab node
> > availability to decide whether to skip adjusting calls.
> > Do not adjust calls to internal functions.
> > @@ -842,12 +842,15 @@ tree_profiling (void)
> > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >   {
> > gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > -   if (!call)
> > +   if (!call || gimple_call_internal_p (call))
> >   continue;
> >  
> > /* We do not clear pure/const on decls without body.  */
> > tree fndecl = gimple_call_fndecl (call);
> > -   if (fndecl && !gimple_has_body_p (fndecl))
> > +   cgraph_node *callee;
> > +   if (fndecl
> > +   && (callee = cgraph_node::get (fndecl))
> > +   && callee->get_availability (node) == AVAIL_NOT_AVAILABLE)

As discussed earlier, the testcase I posted can be adjusted to put the
const-declared wrapper into another translation unit, so I think we will
need to drop the visibility check completely.  But as discussed, it is a
wrong-code issue, not a regression, so we may go with the
availability check as you suggest.  So the patch is OK.


I wonder whether we should drop it everywhere (as we plan for the next
stage1 anyway).  I think an ICE similar to the one in the PR can be
produced with LTO.  In the normal situation declaration merging will do
the right thing: if you have unit A calling const foo externally, it won't
get processed by the code above.  However, unit B declaring foo will get it
downgraded to non-const.

Now at WPA time we will read both A and B, and in declaration merging B's
definition will prevail.  This won't happen if lto_symtab_merge_p
returns false, which can probably be triggered by adding a warning/error
attribute to B's declaration but not to A's.

It is, however, really a corner case, and I am worried about dropping
pure/const from builtin declarations...

Honza


Re: [PATCH v2] RISC-V: Add Z*inx imcompatible check in gcc.

2023-04-03 Thread Hans-Peter Nilsson
On Tue, 28 Mar 2023, Jiawei wrote:

> +  // Zfinx is conflict with float extensions.
> +  if (TARGET_ZFINX && TARGET_HARD_FLOAT)
> +error ("z*inx is conflict with float extensions");
> +

While I'm not a native English speaker, "is conflict with" 
doesn't sound grammatically correct.  Perhaps "conflicts with" 
or "is in conflict with"?

brgds, H-P


Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-03 Thread Lulu Cheng



/* snip */


diff --git a/gcc/testsuite/gcc.target/loongarch/add-const.c 
b/gcc/testsuite/gcc.target/loongarch/add-const.c
new file mode 100644
index 000..3a9f72fe83d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/add-const.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mabi=lp64d" } */
+
+/* None of these functions should load the const operand into a temp
+   register.  */
+
+/* { dg-final { scan-assembler-not "add\\.[dw]" } } */
+
+unsigned long f01 (unsigned long x) { return x + 1; }
+unsigned long f02 (unsigned long x) { return x - 1; }
+unsigned long f03 (unsigned long x) { return x + 2047; }
+unsigned long f04 (unsigned long x) { return x + 4094; }
+unsigned long f05 (unsigned long x) { return x - 2048; }
+unsigned long f06 (unsigned long x) { return x - 4096; }
+unsigned long f07 (unsigned long x) { return x + 0x7fff; }
+unsigned long f08 (unsigned long x) { return x - 0x8000l; }
+unsigned long f09 (unsigned long x) { return x + 0x7fffl * 2; }
+unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; }
+unsigned long f11 (unsigned long x) { return x - 0x8000l * 2; }

These two test cases are duplicates.

+unsigned long f12 (unsigned long x) { return x + 0x7fff + 0x1; }
+unsigned long f13 (unsigned long x) { return x + 0x7fff - 0x1; }
+unsigned long f14 (unsigned long x) { return x + 0x7fff + 0x7ff; }
+unsigned long f15 (unsigned long x) { return x + 0x7fff - 0x800; }
+unsigned long f16 (unsigned long x) { return x - 0x8000l - 1; }
+unsigned long f17 (unsigned long x) { return x - 0x8000l + 1; }
+unsigned long f18 (unsigned long x) { return x - 0x8000l - 0x800; }
+unsigned long f19 (unsigned long x) { return x - 0x8000l + 0x7ff; }
+
+unsigned int g01 (unsigned int x) { return x + 1; }
+unsigned int g02 (unsigned int x) { return x - 1; }
+unsigned int g03 (unsigned int x) { return x + 2047; }
+unsigned int g04 (unsigned int x) { return x + 4094; }
+unsigned int g05 (unsigned int x) { return x - 2048; }
+unsigned int g06 (unsigned int x) { return x - 4096; }
+unsigned int g07 (unsigned int x) { return x + 0x7fff; }
+unsigned int g08 (unsigned int x) { return x - 0x8000l; }
+unsigned int g09 (unsigned int x) { return x + 0x7fffl * 2; }
+unsigned int g10 (unsigned int x) { return x - 0x8000l * 2; }
+unsigned int g11 (unsigned int x) { return x - 0x8000l * 2; }


Ditto.

I found that with this patch applied, the test cases
gcc.target/loongarch/stack-check-cfa-1.c and
gcc.target/loongarch/stack-check-cfa-2.c fail.


Although the tests fail, the generated assembly code is better, and
there is no problem with the logic of the assembly code.  I haven't
checked the reason for the failures yet.


Otherwise LGTM, thanks!




[pushed] c++: friend template matching [PR107484]

2023-04-03 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Here friend matching tries to find a matching non-template friend and fails,
so we mark the friend as a template specialization to be determined later.
Then cplus_decl_attributes tries again to find a matching function and gets
confused by DECL_TEMPLATE_INSTANTIATION without DECL_TEMPLATE_INFO.  But it
doesn't make sense for find_last_decl to be trying to match anything with
DECL_USE_TEMPLATE set; those are matched elsewhere.

PR c++/107484

gcc/cp/ChangeLog:

* decl2.cc (find_last_decl): Return early if DECL_USE_TEMPLATE.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/friend25.C: New test.
---
 gcc/cp/decl2.cc| 5 +
 gcc/testsuite/g++.dg/lookup/friend25.C | 9 +
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/lookup/friend25.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 2b195e7..9594be4092c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1613,6 +1613,11 @@ find_last_decl (tree decl)
 
   if (tree name = DECL_P (decl) ? DECL_NAME (decl) : NULL_TREE)
 {
+  /* Template specializations are matched elsewhere.  */
+  if (DECL_LANG_SPECIFIC (decl)
+ && DECL_USE_TEMPLATE (decl))
+   return NULL_TREE;
+
   /* Look up the declaration in its scope.  */
   tree pushed_scope = NULL_TREE;
   if (tree ctype = DECL_CONTEXT (decl))
diff --git a/gcc/testsuite/g++.dg/lookup/friend25.C 
b/gcc/testsuite/g++.dg/lookup/friend25.C
new file mode 100644
index 000..74cf5dc3431
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/friend25.C
@@ -0,0 +1,9 @@
+// PR c++/107484
+
+namespace qualified_friend_no_match {
+  void f(int);
+  template<typename T> void f(T*);
+  struct X {
+friend void qualified_friend_no_match::f(double); // { dg-error "does not 
match any template" }
+  };
+}

base-commit: 59b4a555c3f1c3dba376da1c4886a9ea18ad208d
-- 
2.31.1



[PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-03 Thread liuhongt via Gcc-patches
There's a potential performance issue when the backend returns an
unreasonable cost for a mode that can never be allocated in the given
reg class.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
OK for trunk (or GCC 14 stage1)?
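
The effect of the added check can be sketched with a simplified model. This is not the ira.cc code: `ClassCost` and `best_memory_move_cost` are hypothetical names, and the `mode_ok` flag merely stands in for what targetm.hard_regno_mode_ok reports for a class's first hard register.

```cpp
#include <limits>
#include <vector>

// Hypothetical per-class record for one machine mode.
struct ClassCost {
    bool mode_ok;           // can the class hold a value of this mode at all?
    int memory_move_cost;   // cost the backend reports for this class
};

// Picks the lowest memory move cost over all classes, skipping classes that
// cannot hold the mode, so their (possibly meaningless) cost is ignored.
// Returns INT_MAX when no class qualifies.
int best_memory_move_cost(const std::vector<ClassCost> &classes) {
    int best = std::numeric_limits<int>::max();
    for (const ClassCost &c : classes) {
        if (!c.mode_ok)
            continue;       // mirrors the "continue" added by the patch
        if (c.memory_move_cost < best)
            best = c.memory_move_cost;
    }
    return best;
}
```

With classes `{false, 1}`, `{true, 8}` and `{true, 6}` the result is 6; without the `mode_ok` test the bogus cost 1 would win and skew the NO_REGS estimate.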

gcc/ChangeLog:

PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
---
 gcc/ira.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 6c7f4901e4c..02dea5d49ee 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -588,6 +588,10 @@ setup_class_subset_and_memory_move_costs (void)
/* Costs for NO_REGS are used in cost calculation on the
   1st pass when the preferred register classes are not
   known yet.  In this case we take the best scenario.  */
+   if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0],
+(machine_mode) mode))
+ continue;
+
if (ira_memory_move_cost[mode][NO_REGS][0]
> ira_memory_move_cost[mode][cl][0])
  ira_max_memory_move_cost[mode][NO_REGS][0]
-- 
2.39.1.388.g2fc9e9ca3c



Re: [PATCH] rs6000: Fix vector_set_var_p9 by considering BE [PR108807]

2023-04-03 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the review!

on 2023/4/3 19:44, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Feb 17, 2023 at 05:55:04PM +0800, Kewen.Lin wrote:
>> As PR108807 exposes, the current handling in function
>> rs6000_expand_vector_set_var_p9 doesn't take care of big
>> endianness.  Currently the function is to rotate the
>> target vector by moving element to-be-set to element 0,
>> set element 0 with the given val, then rotate back.  To
>> get the permutation control vector for the rotation, it
>> makes use of lvsr and lvsl, but the element ordering is
>> different for BE and LE (like element 0 is the most
>> significant one on BE while the least significant one on
>> LE), this patch is to add consideration for BE and make
>> sure permutation control vectors for rotations are expected.
> 
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -7235,22 +7235,26 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx 
>> val, rtx idx)
>>
>>machine_mode shift_mode;
>>rtx (*gen_ashl)(rtx, rtx, rtx);
>> -  rtx (*gen_lvsl)(rtx, rtx);
>> -  rtx (*gen_lvsr)(rtx, rtx);
>> +  rtx (*gen_pcvr1)(rtx, rtx);
>> +  rtx (*gen_pcvr2)(rtx, rtx);
> 
> Space before "(" btw, you can fix that at the same time? :-)
> 

Good catch, fixed.

> What does "pcvr" mean?  You could put that in a short comment?
> 
>> +  /* Generate one permutation control vector used for rotating the element
> 
> Ah.  Yeah just "/* Permutation control vector */" for the above one
> prevents all mysteries :-)

One comment line added for gen_* function pointers.

> 
> Patch looks good.  Thanks!
> 

Pushed as r13-6994-gd634e6088f139e, thanks!

BR,
Kewen
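
The rotate, set, rotate-back scheme described in the quoted commit message can be modelled on a plain array. This is only a sketch of the idea: `set_var_by_rotation` is a hypothetical name, and std::rotate stands in for the vector permutes driven by lvsl/lvsr, so the endianness-dependent element numbering that the patch fixes does not arise here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Sets v[idx] = val without indexing by idx directly:
// rotate the wanted element to position 0, write it, rotate back.
template <typename T, std::size_t N>
void set_var_by_rotation(std::array<T, N> &v, std::size_t idx, T val) {
    idx %= N;
    // Rotate left by idx: the element at idx moves to position 0.
    std::rotate(v.begin(), v.begin() + idx, v.end());
    v[0] = val;
    // Rotate left by N - idx (i.e. right by idx) to restore the order.
    std::rotate(v.begin(), v.begin() + (N - idx) % N, v.end());
}
```

For example, applying it to `{10, 20, 30, 40}` with idx 2 and val 99 leaves `{10, 20, 99, 40}`.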


[PATCH] testsuite: Adjust powerpc test case pr83677.c for BE [PR108815]

2023-04-03 Thread Kewen.Lin via Gcc-patches
Hi,

The test case gcc.target/powerpc/pr83677.c was written for an
LE environment; this patch makes it work on BE as well.

Tested on both BE and LE.  I'm going to push this soon if there are no
objections.

BR,
Kewen
-
PR testsuite/108815

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr83677.c (v_expand_u8, v_expand_u16,
v_load_deinterleave_f32, v_store_interleave_f32): Adjust some code by
considering BE.
---
 gcc/testsuite/gcc.target/powerpc/pr83677.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr83677.c 
b/gcc/testsuite/gcc.target/powerpc/pr83677.c
index c1a09687174..8b1caff3f98 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr83677.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr83677.c
@@ -9,14 +9,24 @@

 void v_expand_u8(vector unsigned char* a, vector unsigned short* b0, vector 
unsigned short* b1)
 {
+#if __LITTLE_ENDIAN__
   *b0 = (vector unsigned short)vec_mergeh(*a, vec_splats((unsigned char)0));
   *b1 = (vector unsigned short)vec_mergel(*a, vec_splats((unsigned char)0));
+#else
+  *b0 = (vector unsigned short)vec_mergeh(vec_splats((unsigned char)0), *a);
+  *b1 = (vector unsigned short)vec_mergel(vec_splats((unsigned char)0), *a);
+#endif
 }

 void v_expand_u16(vector unsigned short* a, vector unsigned int* b0, vector 
unsigned int* b1)
 {
+#if __LITTLE_ENDIAN__
 *b0 = (vector unsigned int)vec_mergeh(*a, vec_splats((unsigned short)0));
 *b1 = (vector unsigned int)vec_mergel(*a, vec_splats((unsigned short)0));
+#else
+*b0 = (vector unsigned int)vec_mergeh(vec_splats((unsigned short)0), *a);
+*b1 = (vector unsigned int)vec_mergel(vec_splats((unsigned short)0), *a);
+#endif
 }

 void v_load_deinterleave_u8(unsigned char *ptr, vector unsigned char* a, 
vector unsigned char* b, vector unsigned char* c)
@@ -44,13 +54,23 @@ void v_load_deinterleave_f32(float *ptr, vector float* a, 
vector float* b, vecto
 vector float v2 = vec_xl(16, ptr);
 vector float v3 = vec_xl(32, ptr);

+#if __LITTLE_ENDIAN__
+vector float t1 = vec_sld(v3, v2, 8);
+vector float t2 = vec_sld(v1, v3, 8);
+vector float t3 = vec_sld(v2, v1, 8);
+#else
+vector float t1 = vec_sld(v2, v3, 8);
+vector float t2 = vec_sld(v3, v1, 8);
+vector float t3 = vec_sld(v1, v2, 8);
+#endif
+
 static const vector unsigned char flp = {0, 1, 2, 3, 12, 13, 14, 15, 16, 
17, 18, 19, 28, 29, 30, 31};
-*a = vec_perm(v1, vec_sld(v3, v2, 8), flp);
+*a = vec_perm(v1, t1, flp);

 static const vector unsigned char flp2 = {28, 29, 30, 31, 0, 1, 2, 3, 12, 
13, 14, 15, 16, 17, 18, 19};
-*b = vec_perm(v2, vec_sld(v1, v3, 8), flp2);
+*b = vec_perm(v2, t2, flp2);

-*c = vec_perm(vec_sld(v2, v1, 8), v3, flp);
+*c = vec_perm(t3, v3, flp);
 }

 void v_store_interleave_f32(float *ptr, vector float a, vector float b, vector 
float c)
@@ -61,7 +81,11 @@ void v_store_interleave_f32(float *ptr, vector float a, 
vector float b, vector f
 vec_xst(vec_perm(a, hbc, ahbc),  0, ptr);

 vector float lab = vec_mergel(a, b);
+#if __LITTLE_ENDIAN__
 vec_xst(vec_sld(lab, hbc, 8), 16, ptr);
+#else
+vec_xst(vec_sld(hbc, lab, 8), 16, ptr);
+#endif

 static const vector unsigned char clab = {8, 9, 10, 11, 24, 25, 26, 27, 
28, 29, 30, 31, 12, 13, 14, 15};
 vec_xst(vec_perm(c, lab, clab), 32, ptr);
--
2.39.1


Re: [PATCH] Fortran: reject module variable as character length in PARAMETER [PR104349]

2023-04-03 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

OK for mainline. It is sufficiently small that, if there is any fallout in
the next weeks, it can easily be reverted without great impact.

Thanks for the patch.

Paul


On Mon, 3 Apr 2023 at 20:46, Harald Anlauf via Fortran 
wrote:

> Dear all,
>
> the attached patch fixes an ICE-on-invalid for a PARAMETER expression
> where the character length was a MODULE variable.  The ICE seemed
> strange, as we were catching related erroneous code for declarations in
> programs or subroutines.  Removing a seemingly bogus check of restricted
> expressions is the simplest way to fix this.  (We could also catch this
> differently in decl.cc).
>
> Besides, this also fixes an accepts-invalid, see testcase. :-)
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline (13) or rather wait?
>
> Thanks,
> Harald
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein