Re: [PATCH] libgomp: Add OMPD Address Space Information functions.

2020-07-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Jul 08, 2020 at 07:53:23PM -0400, y2s1982 . wrote:
> I do remember, though I obviously understood wrongly. Sorry about that.
> I had assumed it might have something to do with ICV but didn't realize it
> would also
> apply to other variables. In all honesty, I was looking for _OPENMP macro;
> I assumed
> such information would be stored somewhere already and thought
> symbol_addr_lookup() would find it somehow. I saw mentions of it on
> ChangeLog,
> testsuits, and in one string, but I couldn't find the actual macro. As for
> openmp_version,

openmp_version is a Fortran PARAMETER, which is (for integral variables) 
something
like
enum { openmp_version = ... };
or C++ const int openmp_version = ...
if only non-ODR used, i.e. replaced by the value whenever used.

You can always readelf -Wa .libs/libgomp.so.1 or nm -D .libs/libgomp.so.1 to
see what is actually present and exported.

> I (wrongly) made the assumption looking at the omp_lib.h.in. I should learn
> more
> about .in file's syntax and what they do.
> 
> To place a variable in libgomp.so.1, should I define a related struct and
> declare a global
> extern variable of the struct in omp.h and define it in some related .c
> file?

Definitely not in omp.h, that header is for OpenMP users.  Private
implementation stuff should not be there.  But it can be in libgomp.h.

> Can I then simply use the name of the declared variable as the name (where
> "openmp_version" currently  is) to find the struct? As for the value for
> _OPENMP version,
> where can I find it, or should OMPD maintain its own values for it?

_OPENMP is a predefined macro by C/C++ (and Fortran uses that omp_lib.f90
file).  And it is only predefined if preprocessing with -fopenmp.
I guess you can define some private structure, especially for the header of
that variable (where you add perhaps some magic 32-bit number, some version
number of the layout of the variable, this _OPENMP version, size of the
whole variable and others as needed).  What I'd suggest is that it then
contains some array of self-describing further values once you'll need it.
And it should be compact if possible (ideal would be e.g. sleb128/uleb128
encoding of numbers like in DWARF; e.g. one way to do it compact would be
have a generator program that would include omp.h and libgomp.h and stdio.h,
would use offsetof/sizeof and whatever and write it on stdout as initializer
of a const char GOMP_library_description[] = { 0x12, 0x7a, 0x2b, ... };
which then would be compiled and linked into libgomp.so.1.
And that you don't try to read data from that variable all the time, but
instead do it once from ompd_process_init, where you'd look up that
variable, read and parse that variable, store what you found into the
process handle and punt if anything is unexpected in there.
Once you need more information in there, it would be nice to macroize
that info, e.g. have a macro when you need size of some internal structure,
or offsetof of its member, or size of the member, or when some pointer needs
to be dereferenced to achieve something.

But perhaps for start just use a struct and parse the information from
there, and only when we have clearer picture on what kind of information we
need we can extend it.

Jakub



Initial Sapphire Rapids and Alder Lake support from ISA r40

2020-07-09 Thread Cui, Lili via Gcc-patches
Hi:
This patch is about to add Sapphire Rapids and Alder Lake to GCC.
Sapphire Rapids is based on Cooper Lake and plus ISA 
MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT/ENQCMD/CLDEMOTE/PTWRITE/WAITPKG/SERIALIZE/TSXLDTRK.
Alder Lake is based on Skylake and plus ISA CLDEMOTE/PTWRITE/WAITPK/SERIALIZE.

For detailed information, please refer to 
https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

OK for master?

gcc/ChangeLog
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle sapphirerapids.
* common/config/i386/i386-common.c
(processor_names): Add sapphirerapids and alderlake.
(processor_alias_table): Add sapphirerapids and alderlake.
* common/config/i386/i386-cpuinfo.h
(processor_subtypes): Add INTEL_COREI7_ALDERLAKE and
INTEL_COREI7_ALDERLAKE.
* config.gcc: Add -march=sapphirerapids and alderlake.
* config/i386/driver-i386.c
(host_detect_local_cpu) Handle sapphirerapids and alderlake.
* config/i386/i386-c.c
(ix86_target_macros_internal): Handle sapphirerapids and alderlake.
* config/i386/i386-options.c
(m_SAPPHIRERAPIDS) : Define.
(m_ALDERLAKE): Ditto.
(m_CORE_AVX512) : Add m_SAPPHIRERAPIDS.
(processor_cost_table): Add sapphirerapids and alderlake.
(ix86_option_override_internal) Handle PTA_WAITPKG, PTA_ENQCMD,
PTA_CLDEMOTE, PTA_SERIALIZE, PTA_TSXLDTRK.
* config/i386/i386.h
(ix86_size_cost) : Define SAPPHIRERAPIDS and ALDERLAKE.
(processor_type) : Add PROCESSOR_SAPPHIRERAPIDS and
PROCESSOR_ALDERLAKE.
(PTA_ENQCMD): New.
(PTA_CLDEMOTE): Ditto.
(PTA_SERIALIZE): Ditto.
(PTA_TSXLDTRK): New.
(PTA_SAPPHIRERAPIDS): Ditto.
(PTA_ALDERLAKE): Ditto.
(processor_type) : Add PROCESSOR_SAPPHIRERAPIDS and
PROCESSOR_ALDERLAKE.
* doc/extend.texi: Add sapphirerapids and alderlake.
* doc/invoke.texi: Add sapphirerapids and alderlake.

gcc/testsuite/ChangeLog
* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv16.C: Handle new march

Thanks,
Lili.



0001-Initial-Sapphire-Rapids-and-Alder-Lake-support-from-.patch
Description: 0001-Initial-Sapphire-Rapids-and-Alder-Lake-support-from-.patch


Re: [patch] Make memory copy functions scalar storage order barriers

2020-07-09 Thread Eric Botcazou
> OK with me.

Thanks.

> I still believe we could handle reverse storage order more "optimistically"
> (without all the current usage restrictions).  We seem to have no problems
> with address-spaces in this area for example (their problematic cases are
> of course slightly different).

The fundamental restriction of the implementation is that we don't assign a 
storage order to the scalar rvalues themselves, i.e. there is only one kind of 
scalar rvalues and they are represented in target order.  Only scalar lvalues 
can have a storage order.  This one cannot reasonably be relaxed I think.

>From there, you have secondary restrictions related to aliasing:

 - are pointers to scalar values stored in reverse order allowed?  Currently 
they are not and the compiler issues an error if you try to make one.  This 
one could probably be relaxed by adding a "reverse" flag to pointers and the 
analogy with address-spaces would be exact (as a matter of fact, the latest 
UltraSPARC specifications introduced an Address Space Identifier for little-
endian data, so these pointers could be implemented in hardware for them).

 - is storage order toggling by means of aliasing supported?  Currently it is 
not and this is documented as such.  This one looks hard to relax because this 
would be essentially equivalent to relaxing the fundamenal restriction: if you 
apply SRA to such a case, either materially or symbolically through VN, you 
will need to effectively assign a storage order to scalar rvalues internally.

-- 
Eric Botcazou


[PATCH] S/390: Emit vector alignment hints for z13 if AS accepts them [BACKPORT GCC10]

2020-07-09 Thread Stefan Schulze Frielinghaus via Gcc-patches
gcc/ChangeLog:

* config.in: Regenerate.
* config/s390/s390.c (print_operand): Emit vector alignment hints
for target z13, if AS accepts them.  For other targets the logic
stays the same.
* config/s390/s390.h (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS): Define
macro.
* configure: Regenerate.
* configure.ac: Check HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/align-1.c: Change target architecture
to z13.
* gcc.target/s390/vector/align-2.c: Change target architecture
to z13.

(cherry picked from commit 929fd91ba975eebf9e57f7f092041271dcaf0c34)
(squashed with commit f842bdd7a97e9fef7513a266d641cac72d5f97cc)
---
 gcc/config.in |  7 +
 gcc/config/s390/s390.c|  4 +--
 gcc/config/s390/s390.h|  7 +
 gcc/configure | 31 +++
 gcc/configure.ac  |  5 +++
 .../gcc.target/s390/vector/align-1.c  |  2 +-
 .../gcc.target/s390/vector/align-2.c  |  2 +-
 7 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 809e7b26823..364eba47737 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -706,6 +706,13 @@
 #endif
 
 
+/* Define if your assembler supports vl/vst/vlm/vstm with an optional
+   alignment hint argument on z13. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13
+#endif
+
+
 /* Define if your assembler supports VSX instructions. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_VSX
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 18332271ed7..ac579a9458b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7853,15 +7853,13 @@ print_operand (FILE *file, rtx x, int code)
   switch (code)
 {
 case 'A':
-#ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
-  if (TARGET_Z14 && MEM_P (x))
+  if (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS && MEM_P (x))
{
  if (MEM_ALIGN (x) >= 128)
fprintf (file, ",4");
  else if (MEM_ALIGN (x) == 64)
fprintf (file, ",3");
}
-#endif
   return;
 case 'C':
   fprintf (file, s390_branch_condition_mnemonic (x, FALSE));
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 2e29dbe97e8..e4ef63e4080 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -167,6 +167,13 @@ enum processor_flags
(TARGET_VX && TARGET_CPU_VXE2)
 #define TARGET_VXE2_P(opts)\
(TARGET_VX_P (opts) && TARGET_CPU_VXE2_P (opts))
+#if defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13)
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z13
+#elif defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS)
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z14
+#else
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS 0
+#endif
 
 #ifdef HAVE_AS_MACHINE_MACHINEMODE
 #define S390_USE_TARGET_ATTRIBUTE 1
diff --git a/gcc/configure b/gcc/configure
index cd3d9516fce..eb6061c1631 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28235,6 +28235,37 @@ if test 
$gcc_cv_as_s390_vector_loadstore_alignment_hints = yes; then
 
 $as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS 1" >>confdefs.h
 
+fi
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for vector 
load/store alignment hints on z13" >&5
+$as_echo_n "checking assembler for vector load/store alignment hints on z13... 
" >&6; }
+if ${gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=no
+  if test x$gcc_cv_as != x; then
+$as_echo ' vl %v24,0(%r15),3 ' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mzarch -march=z13 -o conftest.o 
conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&5
+$as_echo "$gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&6; }
+if test $gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13 = yes; then
+
+$as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13 1" 
>>confdefs.h
+
 fi
 
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0de3b4bf97b..715fcba0482 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5103,6 +5103,11 @@ configured with --enable-newlib-nano-formatted-io.])
   [vl %v24,0(%r15),3

[PATCH] S/390: Emit vector alignment hints for z13 if AS accepts them [BACKPORT GCC9]

2020-07-09 Thread Stefan Schulze Frielinghaus via Gcc-patches
Bootstrapped and regtested on s390x with and without a patched gas. Ok
for master?

gcc/ChangeLog:

* config.in: Regenerate.
* config/s390/s390.c (print_operand): Emit vector alignment hints
for target z13, if AS accepts them.  For other targets the logic
stays the same.
* config/s390/s390.h (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS): Define
macro.
* configure: Regenerate.
* configure.ac: Check HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/align-1.c: Change target architecture
to z13.
* gcc.target/s390/vector/align-2.c: Change target architecture
to z13.

(cherry picked from commit 929fd91ba975eebf9e57f7f092041271dcaf0c34)
(squashed with commit 87cb9423add08743d8bb3368f0af61ddc9572837)
---
 gcc/config.in |  7 +
 gcc/config/s390/s390.c|  4 +--
 gcc/config/s390/s390.h|  7 +
 gcc/configure | 31 +++
 gcc/configure.ac  |  5 +++
 .../gcc.target/s390/vector/align-1.c  |  2 +-
 .../gcc.target/s390/vector/align-2.c  |  2 +-
 7 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index a718ceaf3da..bfef2340339 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -676,6 +676,13 @@
 #endif
 
 
+/* Define if your assembler supports vl/vst/vlm/vstm with an optional
+   alignment hint argument on z13. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13
+#endif
+
+
 /* Define if your assembler supports VSX instructions. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_VSX
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index db3f94978ec..fb0ef44c196 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7766,15 +7766,13 @@ print_operand (FILE *file, rtx x, int code)
   switch (code)
 {
 case 'A':
-#ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
-  if (TARGET_Z14 && MEM_P (x))
+  if (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS && MEM_P (x))
{
  if (MEM_ALIGN (x) >= 128)
fprintf (file, ",4");
  else if (MEM_ALIGN (x) == 64)
fprintf (file, ",3");
}
-#endif
   return;
 case 'C':
   fprintf (file, s390_branch_condition_mnemonic (x, FALSE));
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index f7023d985f1..bd5316ffe94 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -167,6 +167,13 @@ enum processor_flags
(TARGET_VX && TARGET_CPU_VXE2)
 #define TARGET_VXE2_P(opts)\
(TARGET_VX_P (opts) && TARGET_CPU_VXE2_P (opts))
+#if defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13)
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z13
+#elif defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS)
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z14
+#else
+#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS 0
+#endif
 
 #ifdef HAVE_AS_MACHINE_MACHINEMODE
 #define S390_USE_TARGET_ATTRIBUTE 1
diff --git a/gcc/configure b/gcc/configure
index a065ba23728..35f5a87983b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -27779,6 +27779,37 @@ if test 
$gcc_cv_as_s390_vector_loadstore_alignment_hints = yes; then
 
 $as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS 1" >>confdefs.h
 
+fi
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for vector 
load/store alignment hints on z13" >&5
+$as_echo_n "checking assembler for vector load/store alignment hints on z13... 
" >&6; }
+if ${gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=no
+  if test x$gcc_cv_as != x; then
+$as_echo ' vl %v24,0(%r15),3 ' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mzarch -march=z13 -o conftest.o 
conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&5
+$as_echo "$gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&6; }
+if test $gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13 = yes; then
+
+$as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13 1" 
>>confdefs.h
+
 fi
 
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 3a7251102ef..eef4f1acd01 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4908,6 +4908,11 @@ po

Re: [patch] Make memory copy functions scalar storage order barriers

2020-07-09 Thread Richard Biener via Gcc-patches
On Thu, Jul 9, 2020 at 9:29 AM Eric Botcazou  wrote:
>
> > OK with me.
>
> Thanks.
>
> > I still believe we could handle reverse storage order more "optimistically"
> > (without all the current usage restrictions).  We seem to have no problems
> > with address-spaces in this area for example (their problematic cases are
> > of course slightly different).
>
> The fundamental restriction of the implementation is that we don't assign a
> storage order to the scalar rvalues themselves, i.e. there is only one kind of
> scalar rvalues and they are represented in target order.  Only scalar lvalues
> can have a storage order.  This one cannot reasonably be relaxed I think.
>
> From there, you have secondary restrictions related to aliasing:
>
>  - are pointers to scalar values stored in reverse order allowed?  Currently
> they are not and the compiler issues an error if you try to make one.  This
> one could probably be relaxed by adding a "reverse" flag to pointers and the
> analogy with address-spaces would be exact (as a matter of fact, the latest
> UltraSPARC specifications introduced an Address Space Identifier for little-
> endian data, so these pointers could be implemented in hardware for them).
>
>  - is storage order toggling by means of aliasing supported?  Currently it is
> not and this is documented as such.  This one looks hard to relax because this
> would be essentially equivalent to relaxing the fundamenal restriction: if you
> apply SRA to such a case, either materially or symbolically through VN, you
> will need to effectively assign a storage order to scalar rvalues internally.

Hmm, I'd probably avoid to assign a storage order to rvalues (SSA names
and pseudos) and simply go with a storage order on lvalues.  That means
SRA can only scalarize when all actual accesses have the same storage
order and "full" scalarization (scalarization of portions where no accesses
are observed) can be only done when its materialized back to memory.
So I don't see where the need arises to tag pointers apart from on the
frontend side which of course needs to tag dereferences of pointers
appropriately for the middle-end.  In particular memcpy() behaves
storage-order agnostic so it shouldn't care.

I don't see how aliases or storage order "punning" become a problem then.
It's no different from aliasing int and float accesses?

As of treating it like address-spaces, yes - it's basically like that at the
moment but with more implementation restrictions.

Richard.

> --
> Eric Botcazou


Re: [PATCH] S/390: Emit vector alignment hints for z13 if AS accepts them [BACKPORT GCC10]

2020-07-09 Thread Stefan Schulze Frielinghaus via Gcc-patches
Bootstrapped and regtested on s390x with and without patched gas. Ok for
releases/gcc-10 branch?

On Thu, Jul 09, 2020 at 09:29:21AM +0200, Stefan Schulze Frielinghaus via 
Gcc-patches wrote:
> gcc/ChangeLog:
> 
>   * config.in: Regenerate.
>   * config/s390/s390.c (print_operand): Emit vector alignment hints
>   for target z13, if AS accepts them.  For other targets the logic
>   stays the same.
>   * config/s390/s390.h (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS): Define
>   macro.
>   * configure: Regenerate.
>   * configure.ac: Check HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/align-1.c: Change target architecture
>   to z13.
>   * gcc.target/s390/vector/align-2.c: Change target architecture
>   to z13.
> 
> (cherry picked from commit 929fd91ba975eebf9e57f7f092041271dcaf0c34)
> (squashed with commit f842bdd7a97e9fef7513a266d641cac72d5f97cc)
> ---
>  gcc/config.in |  7 +
>  gcc/config/s390/s390.c|  4 +--
>  gcc/config/s390/s390.h|  7 +
>  gcc/configure | 31 +++
>  gcc/configure.ac  |  5 +++
>  .../gcc.target/s390/vector/align-1.c  |  2 +-
>  .../gcc.target/s390/vector/align-2.c  |  2 +-
>  7 files changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config.in b/gcc/config.in
> index 809e7b26823..364eba47737 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -706,6 +706,13 @@
>  #endif
>  
>  
> +/* Define if your assembler supports vl/vst/vlm/vstm with an optional
> +   alignment hint argument on z13. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13
> +#endif
> +
> +
>  /* Define if your assembler supports VSX instructions. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_AS_VSX
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 18332271ed7..ac579a9458b 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -7853,15 +7853,13 @@ print_operand (FILE *file, rtx x, int code)
>switch (code)
>  {
>  case 'A':
> -#ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
> -  if (TARGET_Z14 && MEM_P (x))
> +  if (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS && MEM_P (x))
>   {
> if (MEM_ALIGN (x) >= 128)
>   fprintf (file, ",4");
> else if (MEM_ALIGN (x) == 64)
>   fprintf (file, ",3");
>   }
> -#endif
>return;
>  case 'C':
>fprintf (file, s390_branch_condition_mnemonic (x, FALSE));
> diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
> index 2e29dbe97e8..e4ef63e4080 100644
> --- a/gcc/config/s390/s390.h
> +++ b/gcc/config/s390/s390.h
> @@ -167,6 +167,13 @@ enum processor_flags
>   (TARGET_VX && TARGET_CPU_VXE2)
>  #define TARGET_VXE2_P(opts)  \
>   (TARGET_VX_P (opts) && TARGET_CPU_VXE2_P (opts))
> +#if defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13)
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z13
> +#elif defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS)
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z14
> +#else
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS 0
> +#endif
>  
>  #ifdef HAVE_AS_MACHINE_MACHINEMODE
>  #define S390_USE_TARGET_ATTRIBUTE 1
> diff --git a/gcc/configure b/gcc/configure
> index cd3d9516fce..eb6061c1631 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -28235,6 +28235,37 @@ if test 
> $gcc_cv_as_s390_vector_loadstore_alignment_hints = yes; then
>  
>  $as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS 1" >>confdefs.h
>  
> +fi
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for vector 
> load/store alignment hints on z13" >&5
> +$as_echo_n "checking assembler for vector load/store alignment hints on 
> z13... " >&6; }
> +if ${gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=no
> +  if test x$gcc_cv_as != x; then
> +$as_echo '   vl %v24,0(%r15),3 ' > conftest.s
> +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mzarch -march=z13 -o 
> conftest.o conftest.s >&5'
> +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; }
> +then
> + gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=yes
> +else
> +  echo "configure: failed program was" >&5
> +  cat conftest.s >&5
> +fi
> +rm -f conftest.o conftest.s
> +  fi
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&5
> +$as_echo "$gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&6; }
> +if t

Re: [PATCH] S/390: Emit vector alignment hints for z13 if AS accepts them [BACKPORT GCC9]

2020-07-09 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Thu, Jul 09, 2020 at 09:35:42AM +0200, Stefan Schulze Frielinghaus via 
Gcc-patches wrote:
> Bootstrapped and regtested on s390x with and without a patched gas. Ok
> for master?

Ok for releases/gcc-9 branch? (not for master of course, sorry for the
confusion).

> gcc/ChangeLog:
> 
>   * config.in: Regenerate.
>   * config/s390/s390.c (print_operand): Emit vector alignment hints
>   for target z13, if AS accepts them.  For other targets the logic
>   stays the same.
>   * config/s390/s390.h (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS): Define
>   macro.
>   * configure: Regenerate.
>   * configure.ac: Check HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/align-1.c: Change target architecture
>   to z13.
>   * gcc.target/s390/vector/align-2.c: Change target architecture
>   to z13.
> 
> (cherry picked from commit 929fd91ba975eebf9e57f7f092041271dcaf0c34)
> (squashed with commit 87cb9423add08743d8bb3368f0af61ddc9572837)
> ---
>  gcc/config.in |  7 +
>  gcc/config/s390/s390.c|  4 +--
>  gcc/config/s390/s390.h|  7 +
>  gcc/configure | 31 +++
>  gcc/configure.ac  |  5 +++
>  .../gcc.target/s390/vector/align-1.c  |  2 +-
>  .../gcc.target/s390/vector/align-2.c  |  2 +-
>  7 files changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config.in b/gcc/config.in
> index a718ceaf3da..bfef2340339 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -676,6 +676,13 @@
>  #endif
>  
>  
> +/* Define if your assembler supports vl/vst/vlm/vstm with an optional
> +   alignment hint argument on z13. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13
> +#endif
> +
> +
>  /* Define if your assembler supports VSX instructions. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_AS_VSX
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index db3f94978ec..fb0ef44c196 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -7766,15 +7766,13 @@ print_operand (FILE *file, rtx x, int code)
>switch (code)
>  {
>  case 'A':
> -#ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
> -  if (TARGET_Z14 && MEM_P (x))
> +  if (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS && MEM_P (x))
>   {
> if (MEM_ALIGN (x) >= 128)
>   fprintf (file, ",4");
> else if (MEM_ALIGN (x) == 64)
>   fprintf (file, ",3");
>   }
> -#endif
>return;
>  case 'C':
>fprintf (file, s390_branch_condition_mnemonic (x, FALSE));
> diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
> index f7023d985f1..bd5316ffe94 100644
> --- a/gcc/config/s390/s390.h
> +++ b/gcc/config/s390/s390.h
> @@ -167,6 +167,13 @@ enum processor_flags
>   (TARGET_VX && TARGET_CPU_VXE2)
>  #define TARGET_VXE2_P(opts)  \
>   (TARGET_VX_P (opts) && TARGET_CPU_VXE2_P (opts))
> +#if defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13)
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z13
> +#elif defined(HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS)
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z14
> +#else
> +#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS 0
> +#endif
>  
>  #ifdef HAVE_AS_MACHINE_MACHINEMODE
>  #define S390_USE_TARGET_ATTRIBUTE 1
> diff --git a/gcc/configure b/gcc/configure
> index a065ba23728..35f5a87983b 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -27779,6 +27779,37 @@ if test 
> $gcc_cv_as_s390_vector_loadstore_alignment_hints = yes; then
>  
>  $as_echo "#define HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS 1" >>confdefs.h
>  
> +fi
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for vector 
> load/store alignment hints on z13" >&5
> +$as_echo_n "checking assembler for vector load/store alignment hints on 
> z13... " >&6; }
> +if ${gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=no
> +  if test x$gcc_cv_as != x; then
> +$as_echo '   vl %v24,0(%r15),3 ' > conftest.s
> +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mzarch -march=z13 -o 
> conftest.o conftest.s >&5'
> +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; }
> +then
> + gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13=yes
> +else
> +  echo "configure: failed program was" >&5
> +  cat conftest.s >&5
> +fi
> +rm -f conftest.o conftest.s
> +  fi
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $gcc_cv_as_s390_vector_loadstore_alignment_hints_on_z13" >&5
> +$as_ech

[PATCH] Fortran : accepts pointer initialization of DT dummy args, PR45337

2020-07-09 Thread Mark Eggleston

Please find attached a fix for this PR.

I think the discussion of intent muddied the waters for this PR. As I 
understand it initialisation of variables implies the save attribute.  
The save attribute is incompatible with the dummy attribute so an error 
should be output when initialisation is attempted for dummy variables.


Is this correct? If so OK to commit and backport?

make check-fortran doesn't produce any additional errors on x86_64.

Commit message:

Fortran  : accepts pointer initialization of DT dummy args PR45337

Initialisation of a variable results in an implicit save attribute
being added to the variable.  The save attribute is not allowed for
variables with the dummy attribute set.  Initialisation should be
rejected for dummy variables.

2020-07-09  Mark Eggleston 

gcc/fortran/

    PR fortran/45337
    * resolve.c (resolve_fl_variable): Remove type and intent
    checks from the check for dummy.

2020-07-09  Mark Eggleston 

gcc/testsuite/

    PR fortran/45337
    * gfortran.dg/pr45337_1.f90: New test.
    * gfortran.dg/pr45337_2.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From d77b47ea1de104ee960c0c952148f7f1500cf97a Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Wed, 10 Jun 2020 07:22:50 +0100
Subject: [PATCH] Fortran  : accepts pointer initialization of DT dummy args
 PR45337

Initialisation of a variable results in an implicit save attribute
being added to the variable.  The save attribute is not allowed for
variables with the dummy attribute set.  Initialisation should be
rejected for dummy variables.

2020-07-09  Mark Eggleston  

gcc/fortran/

	PR fortran/45337
	* resolve.c (resolve_fl_variable): Remove type and intent
	checks from the check for dummy.

2020-07-09  Mark Eggleston  

gcc/testsuite/

	PR fortran/45337
	* gfortran.dg/pr45337_1.f90: New test.
	* gfortran.dg/pr45337_2.f90: New test.
---
 gcc/fortran/resolve.c   |  3 +--
 gcc/testsuite/gfortran.dg/pr45337_1.f90 | 14 ++
 gcc/testsuite/gfortran.dg/pr45337_2.f90 | 18 ++
 3 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr45337_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr45337_2.f90

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 223de91..730d11105bd 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -12918,8 +12918,7 @@ resolve_fl_variable (gfc_symbol *sym, int mp_flag)
   else if (sym->attr.external)
 	gfc_error ("External %qs at %L cannot have an initializer",
 		   sym->name, &sym->declared_at);
-  else if (sym->attr.dummy
-	&& !(sym->ts.type == BT_DERIVED && sym->attr.intent == INTENT_OUT))
+  else if (sym->attr.dummy)
 	gfc_error ("Dummy %qs at %L cannot have an initializer",
 		   sym->name, &sym->declared_at);
   else if (sym->attr.intrinsic)
diff --git a/gcc/testsuite/gfortran.dg/pr45337_1.f90 b/gcc/testsuite/gfortran.dg/pr45337_1.f90
new file mode 100644
index 000..2bb8ff244cc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr45337_1.f90
@@ -0,0 +1,14 @@
+! { dg-do compile }
+
+module ptrmod
+contains
+subroutine lengthX(x, i) ! { dg-error "Dummy 'x' at .1. cannot have an initializer" }
+   implicit none
+   real, pointer, intent(out) :: x(:)=>null()
+   integer :: i
+   x=>null()
+   allocate(x(i))
+   x=i
+end subroutine
+end module
+
diff --git a/gcc/testsuite/gfortran.dg/pr45337_2.f90 b/gcc/testsuite/gfortran.dg/pr45337_2.f90
new file mode 100644
index 000..ca7a6f53ad6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr45337_2.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+
+type t
+end type t
+type t2
+  integer :: j = 7
+end type t2
+contains
+  subroutine x(a, b, c)
+intent(out) :: a, b, c
+type(t) :: a = t()
+type(t2) :: b = t2()
+type(t2) :: c
+  end subroutine x
+end
+
+! { dg-error "Dummy .a. at .1. cannot have an initializer" " " { target *-*-* } 9 }
+! { dg-error "Dummy .b. at .1. cannot have an initializer" " " { target *-*-* } 9 }
-- 
2.11.0



[PATCH] Fortran : Implicitly type parameter causes an invalid error, PR96038

2020-07-09 Thread Mark Eggleston
Please find attached patch for fix PR.  The original patch was provided 
by Steve Kargl in the initial problem report.


OK to commit to master and backport?

Fortran  : Implicitly type parameter causes an invalid errorPR96038

If a paramter to declared and initialised before its type is
declared a bogus error is output at the type declaration
idicating that initialisation is missing.

2020-07-09  Steven G. Kargl  

gcc/fortran/

    PR fortran/96038
    * decl.c (add_init_expr_sym):  For a symbol that is a
    parameter accept an initialisation if it does not have a
    value otherwise output a error and reject.

2020-07-09  Mark Eggleston 

gcc/testsuite/

    PR fortran/96038
    * gfortran.dg/pr96038.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From 7dfee4edf9796304c75785bb56610f3e06211f29 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Mon, 6 Jul 2020 07:14:59 +0100
Subject: [PATCH] Fortran  : Implicitly type parameter causes an invalid error
 PR96038

If a paramter to declared and initialised before its type is
declared a bogus error is output at the type declaration
idicating that initialisation is missing.

2020-07-09  Steven G. Kargl  

gcc/fortran/

	PR fortran/96038
	* decl.c (add_init_expr_sym):  For a symbol that is a
	parameter accept an initialisation if it does not have a
	value otherwise output a error and reject.

2020-07-09  Mark Eggleston  

gcc/testsuite/

	PR fortran/96038
	* gfortran.dg/pr96038.f90: New test.
---
 gcc/fortran/decl.c| 15 +--
 gcc/testsuite/gfortran.dg/pr96038.f90 |  8 
 2 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr96038.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 52c2a624b6e..d854b2a0307 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1889,13 +1889,16 @@ add_init_expr_to_sym (const char *name, gfc_expr **initp, locus *var_locus)
 
   /* If this symbol is confirming an implicit parameter type,
  then an initialization expression is not allowed.  */
-  if (attr.flavor == FL_PARAMETER
-  && sym->value != NULL
-  && *initp != NULL)
+  if (attr.flavor == FL_PARAMETER && sym->value != NULL)
 {
-  gfc_error ("Initializer not allowed for PARAMETER %qs at %C",
-		 sym->name);
-  return false;
+  if (*initp != NULL)
+	{
+	  gfc_error ("Initializer not allowed for PARAMETER %qs at %C",
+		 sym->name);
+	  return false;
+	}
+  else
+	return true;
 }
 
   if (init == NULL)
diff --git a/gcc/testsuite/gfortran.dg/pr96038.f90 b/gcc/testsuite/gfortran.dg/pr96038.f90
new file mode 100644
index 000..f1098f33c1b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96038.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+
+function ifoo()
+  parameter (n = 50)
+  integer n
+  ifoo = n
+end
+
-- 
2.11.0



Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-09 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

Hi,

> On Wed, Jul 08, 2020 at 11:39:56AM +0800, Jiufu Guo wrote:
>> Segher Boessenkool  writes:
>> > I am not happy about what is considered "a complex loop" here.
>> For early exit, which may cause and *next* unrolled iterations may be
>> not executed, then unroll may be not benifit.
>
> Yes, and it can well result in worse branch prediction.
>
>> For too many branches (this patch say 20% of insns), may cause more
>> branch-misses even for unrolled loop.
>> 
>> >From my intial intuition, I once think each condition may *define* loop
>> as complex. :-)
>> 
>> But for each single condition, loop unrolling may still be helpful.
>> While, if these conditions are all occur in a loop, it would be more
>> possible to get negative impacts after unrolled.
>
> Yes, but how many loops have *all* these conditions?  That is my problem
> with it: it is only tested with one specific loop, and only benefits
> that loop.

I also encounter a few of this kind of loops, some in hot path of leela
and perlbench, and had negative impact leelar_r (~2%), perlbench(>0.5%),
and also gcc_r.  I had a quick count, there are ~500 this kind of loops
occur in specint build.

>
>> > (A PARAM would be nice, but too many of those isn't actually useful
>> > either...  Next time, add one as soon as writing the code, at least it
>> > is useful at that point in time, when you still need to experiment with
>> > it :-) )
>> Yes, with PARAM, we can just change param to experiment:
>> "if (loop->ninsns <= 10) return 0;" (6insn - 10 insn) may just located
>> n one cache line, it may not need to unroll. But, after some tunning, I
>> chose 6/4,10/2 here. 4 hardcodes may be too many for PARAM.  2 PARAMs
>> (one for 6, one for 10) may be ok.  Any comments?
>
> RTL insns are not one-to-one with machine insns.  One important reason
> to unroll is for small loops, where we need to hide the latency of a
> fetch redirect (say, three cycles); this is complicated by most insns
> having a 2 cycle latency (or more for FP and such), and by different
> SMT modes changing stuff here, oh and different CPUs of course.
>
> So it is very important to unroll small loops enough, and it can be
> beneficial for larger loops too, but it also hurts (in general, and
> more for bigger loops, or loops with calls, or jumps).
>
> It's not a science, more an art.  You'll just have to find something
> that works well in practice.  But not something that looks at very
> special cases only, preferably.
After unroll there are some optimizations which also could change rtl
insn sequence, and the number is not same with binary instrctions. 
Yes, some 'think so' theory is not `real so`. It is heuristic. :-).

>
>> > So, we use rs6000_complex_loop_p only to prevent all unrolling, never to
>> > reduce the unrolling, and only in very specific cases.
>> >
>> > Is there no middle road possible?  Say, don't unroll to more than 25
>> > insns total (which is what the "only small loops" does, sort of -- it
>> > also avoids unrolling 3x a bit, yes), and don't unroll to more than 2
>> > calls, and not to more than 4 branches (I'm making up those numbers, of
>> > course, and PARAMS would be helpful).  Some of this already does exist,
>> > and might need retuning for us?
>> It may make sense. There are param_max_unrolled_insns,
>> param_max_unroll_times, param_max_peel_branches(cunrol)...; we may add
>> calls number and branches number checking for rtl unroller.
>
> Hrm yes, that may be generally useful even.
>
>> While, actually, here we would need condition to define *complex* loop,
>> where contains call exist (may just 1), branch exist(may 2) and early
>> exit(may 1) at the same time, but each number is not large.
>> Any sugguestions? Thanks.
>
> How many loops have you seen where all those conditions are true, but
> the generic code still decides to unroll things?
Some occur as above said.  I use -fopt-info to compare the changed
unroll_adjust_hook to check loops of this kind.

Thanks a lot!
Jiufu

>
>
> Segher


Re: [PATCH] middle-end: Improve RTL expansion in expand_mul_overflow,

2020-07-09 Thread Richard Sandiford
"Roger Sayle"  writes:
> This patch improves the RTL that the middle-end generates for testing
> signed overflow following a widening multiplication.  During this
> expansion the middle-end generates a truncation which can get used
> multiple times.  Placing this intermediate value in a pseudo register
> reduces the amount of code generated on platforms where this truncation
> requires an explicit instruction.
>
> This simple call to force_reg eliminates 368 lines of the -S output
> from testsuite/c-c++-common/torture/builtin-arith-overflow-1.c on
> nvptx-none.  An example difference is in t120_1smul where the following
> 7 instruction sequence in which the 1st and 6th instructions perform
> the same truncation:
>
> < cvt.u32.u64 %r31, %r28;   <- truncate %r28
> < shr.s32 %r30, %r31, 31;
> < cvt.u32.u64 %r32, %r29;
> < setp.eq.u32 %r33, %r30, %r32;
> < selp.u32%r24, 0, 1, %r33;
> < cvt.u32.u64 %r25, %r28;   <- truncate %r28
> < setp.eq.u32 %r34, %r24, 0;
>
> is now generated as a 4 instruction sequence without duplication:
>
>> cvt.u32.u64 %r30, %r28;
>> shr.s32 %r31, %r30, 31;
>> cvt.u32.u64 %r32, %r29;
>> setp.eq.u32 %r33, %r31, %r32;
>
> On x86_64-pc-linux-gnu, where SUBREGs are free, this patch generates
> exactly the same builtin-arith-overflow-1.s as before.
>
> This patch has been tested on both x86_64-pc-linux-gnu with
> "make bootstrap" and nvptx-none with "make", with no new
> testsuite regressions on either platform.
> Ok for mainline?
>
>
> 2020-07-06  Roger Sayle  
>
> gcc/ChangeLog:
>   * internal-fn.c (expand_mul_overflow): When checking for signed
>   overflow from a widening multiplication, we access the truncated
>   lowpart RES twice, so keep this value in a pseudo register.
>
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 0be2eb4..d1bd6cc 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -1627,6 +1627,9 @@ expand_mul_overflow (location_t loc, tree lhs, tree 
> arg0, tree arg1,
>profile_probability::very_likely ());
> else
>   {
> +   /* RES is used more than once, place it in a pseudo.  */
> +   res = force_reg (mode, res);
> +
> rtx signbit = expand_shift (RSHIFT_EXPR, mode, res, prec - 1,
> NULL_RTX, 0);
> /* RES is low half of the double width result, HIPART

In general, this can be dangerous performance-wise on targets where
subregs are free.  If the move survives to the register allocators,
it increases the risk that the move will become a machine insn.
(The RA will prefer to tie the registers, but that isn't guaranteed.)

But more fundamentally, this can hurt if the SUBREG_REG is live at
the same time as the new pseudo, since the RA then has to treat them
as separate quantities.  From your results, that obviously doesn't
occur in the test case, but I'm not 100% confident that it won't
occur elsewhere.

If target-independent code is going to optimise for “no subreg operand”
targets like nvptx, I think it needs to know that the target wants that.

Thanks,
Richard


Re: [wwwdocs PATCH] remove tree-browser page and links

2020-07-09 Thread Richard Sandiford
Hu Jiangping  writes:
> Hi,
>
> I'm trying Tree Browser during debugging, but failed.
> I found that tree-browser.c and tree-browser.def have been
> removed at 2015-07-25. So, to avoid misunderstanding,
> can we remove this tree-browser page too?

Thanks for the patch.  Seems like a good idea to me.  I guess the only
question is whether we should keep it around for historical purposes,
but with a big banner to say that it's no longer up-to-date.  I also
don't know whether we try to avoid 404s on old pages.

Gerald, WDYT?

Richard

>
> Regards.
> hujp
>
> ---
>  htdocs/projects/tree-ssa/index.html|   6 -
>  htdocs/projects/tree-ssa/tree-browser.html | 306 -
>  2 files changed, 312 deletions(-)
>  delete mode 100644 htdocs/projects/tree-ssa/tree-browser.html
>
> diff --git a/htdocs/projects/tree-ssa/index.html 
> b/htdocs/projects/tree-ssa/index.html
> index a15d0f32..930df390 100644
> --- a/htdocs/projects/tree-ssa/index.html
> +++ b/htdocs/projects/tree-ssa/index.html
> @@ -21,7 +21,6 @@
>  GENERIC and GIMPLE
>  SSA implementation
>  Unparsing GENERIC trees
> -Tree Browser
>  Implementation Status (last updated: 
> 2003-11-22)
>  TODO list (last updated: 2003-12-27)
>  
> @@ -221,11 +220,6 @@ functions that given a GENERIC tree node, they print a C 
> representation of
>  the tree.  The output is not meant to be compilable, but it is of great
>  help when debugging transformations done by the transformation passes.
>  
> -
> -Tree Browser
> -For debugging, browsing, discovering, and playing with trees you can
> -use the Tree Browser directly from gdb.
> -
>  
>  Implementation Status
>  
> diff --git a/htdocs/projects/tree-ssa/tree-browser.html 
> b/htdocs/projects/tree-ssa/tree-browser.html
> deleted file mode 100644
> index ce95a103..
> --- a/htdocs/projects/tree-ssa/tree-browser.html
> +++ /dev/null
> @@ -1,306 +0,0 @@
> -
> -
> -
> -
> -
> -Tree Browser
> -https://gcc.gnu.org/gcc.css"; />
> -
> -  
> -  
> -Tree Browser
> -
> -Until recently the only way to debug trees from gdb was to call
> -debug_tree as follows:
> -
> -
> -(gdb) p debug_tree (current_function_decl) 
> -
> -
> -An alternative for interactively scan tree structures is to use the
> -Tree Browser.  You can access Tree Browser from anywhere during a debugging
> -session as follows:
> -
> -
> -(gdb) p browse_tree (current_function_decl)
> - 
> -Tree Browser 
> -foo 
> -Up/prev expressions updated. 
> -TB> 
> -
> -
> -For listing available commands, you could try:
> -
> -
> -TB> h 
> -Possible commands are: 
> - 
> -   x  -  Exits tree-browser. 
> -   q  -  Exits tree-browser. 
> -   h  -  Prints this help message. 
> -  update  -  Update information about parent expressions. 
> - verbose  -  Sets/unsets verbose mode (default is on). 
> - fun  -  Go to the current function declaration. 
> -  nx  -  Go to the next expression in a BIND_EXPR. 
> -  pr  -  Go to the previous expression in a BIND_EXPR. 
> -  up  -  Go to the parent tree node. 
> -last  -  Go to the last expression in a BIND_EXPR. 
> -   first  -  Go to the first expression in a BIND_EXPR. 
> - hpr  -  Go to the previous visited node (history previous). 
> -arg0  -  Child 0. 
> -arg1  -  Child 1. 
> -arg2  -  Child 2. 
> -arg3  -  Child 3. 
> - decl_saved_tree  -  Body of a function. 
> -type  -  Field accessor. 
> -size  -  Field accessor. 
> -   unit_size  -  Field accessor. 
> -  offset  -  Field accessor. 
> -  bit_offset  -  Field accessor. 
> - context  -  Field accessor. 
> -  attributes  -  Field accessor. 
> - abstract_origin  -  Field accessor. 
> -   arguments  -  Field accessor. 
> -  result  -  Field accessor. 
> - initial  -  Field accessor. 
> -arg-type  -  Field accessor. 
> - arg-type-as-written  -  Field accessor. 
> -   chain  -  Field accessor. 
> -  values  -  Field accessor. 
> -  domain  -  Field accessor. 
> - method_basetype  -  Field accessor. 
> -  fields  -  Field accessor. 
> -   arg-types  -  Field accessor. 
> -basetype  -  Field accessor. 
> - pointer_to_this  -  Field accessor. 
> -   reference_to_this  -  Field accessor. 
> -vars  -  Field accessor. 
> -supercontext  -  Field accessor. 
> -body  -  Field accessor. 
> -   subblocks  -  Field accessor. 
> -   block  -  Field accessor. 
> -real  -  Field accessor. 
> -imag  -  Field accessor. 
> - purpose  -  Field accessor. 
> -   value  -  Field accessor. 
> - elt  -  Field ac

Re: [committed] amdgcn: Add fold_left_plus vector reductions

2020-07-09 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 07/07/2020 12:03, Richard Sandiford wrote:
>> Andrew Stubbs  writes:
>>> This patch implements a floating-point fold_left_plus vector pattern,
>>> which gives a significant speed-up in the BabelStream "dot" benchmark.
>>>
>>> The GCN architecture can't actually do an in-order vector reduction any
>>> more efficiently than that equivalent scalar algorithm, so this is a bit
>>> of a cheat.  However, dividing the problem into threads using OpenACC or
>>> OpenMP has already broken the in-order semantics, so we may as well
>>> optimize the operation at the vector level too.
>>>
>>> If the user has specifically sorted the input data in order to get a
>>> more correct FP result then using multiple threads is already the wrong
>>> thing to do. But, if the input data is in no particular numerical order
>>> then this optimization will give a correct answer much faster, albeit
>>> possibly a slightly different one each run.
>> 
>> There doesn't seem to be anything GCN-specific here though.
>> If pragmas say that we can ignore associativity rules, we should apply
>> that in target-independent code rather than in each individual target.
>
> Yes, I'm lazy. That, and I'm not sure what a target independent solution 
> would look like.
>
> Presumably we'd need something for both OpenMP and OpenACC, and it would 
> need to be specific to certain operations (not just blanket 
> -fassociative-math), which means the vectorizer (anywhere else?) would 
> need to be taught about the new thing?
>
> The nearest example I can think of is the force_vectorize flag that 
> OpenMP "simd" and OpenACC "vector" already use (the latter being 
> amdgcn-only as nvptx does its own OpenACC vectorization).

Yeah, I guess we'd need a way of querying whether a given reduction
is by nature reassociative due to pragmas.  It would probably need
to name the specific reductions somehow.

Agree it doesn't sound easy.

> I'm also not completely convinced that this -- or other cases like it -- 
> isn't simply a target-specific issue. Could it be harmful on other 
> architectures?

I'd hope not.  No target should prefer in-order reductions over
any-order reductions, since in-order implements any-order.

Thanks,
Richard


Re: [PATCH] middle-end: Improve RTL expansion in expand_mul_overflow,

2020-07-09 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 09, 2020 at 09:17:46AM +0100, Richard Sandiford wrote:
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -1627,6 +1627,9 @@ expand_mul_overflow (location_t loc, tree lhs, tree 
> > arg0, tree arg1,
> >  profile_probability::very_likely ());
> >   else
> > {
> > + /* RES is used more than once, place it in a pseudo.  */
> > + res = force_reg (mode, res);
> > +
> >   rtx signbit = expand_shift (RSHIFT_EXPR, mode, res, prec - 1,
> >   NULL_RTX, 0);
> >   /* RES is low half of the double width result, HIPART
> 
> In general, this can be dangerous performance-wise on targets where
> subregs are free.  If the move survives to the register allocators,
> it increases the risk that the move will become a machine insn.
> (The RA will prefer to tie the registers, but that isn't guaranteed.)
> 
> But more fundamentally, this can hurt if the SUBREG_REG is live at
> the same time as the new pseudo, since the RA then has to treat them
> as separate quantities.  From your results, that obviously doesn't
> occur in the test case, but I'm not 100% confident that it won't
> occur elsewhere.
> 
> If target-independent code is going to optimise for “no subreg operand”
> targets like nvptx, I think it needs to know that the target wants that.

Isn't that though what the expander is doing in lots of places?
Force operands into pseudos especially if optimize to hope for better CSE
etc., and hope combine does its job to undo it when it is better to be
propagated?
It is true that if res is used several times, then combine will not
propagate it due to multiple uses, so the question I guess is why as Roger
says we get the same code before/after (which pass undoes that; RA?).

Jakub



[PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-07-09 Thread Hongtao Liu via Gcc-patches
Hi:
  For a constant vector having one duplicated value, there's no need
to put the whole vector in the constant pool, using embedded broadcast
instead.

  Bootstrap test is Ok, regression test for i386/x86-64 backend is ok.

gcc/ChangeLog:

PR target/87767
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): New function.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): New pass.
(make_pass_constant_pool_broadcast): Ditto.
* config/i386/i386-passes.def: Insert new pass after combine.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Declare.
* config/i386/sse.md (*avx512dq_mul3_bcst,
*avx512f_mul3_bcst): New define_insn.

gcc/testsuite/ChangeLog:

PR target/87767
* gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
* gcc.target/i386/pr92865-1.c: Adjust testcase.




-- 
BR,
Hongtao
From b8f49299e3d23f927a659cd394e3099e3291a76f Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 8 Jul 2020 17:14:36 +0800
Subject: [PATCH] Optimize memory broadcast for constant vector under AVX512.

For constant vector having one duplicated value, there's no need to put
whole vector in the constant pool, using embedded broadcast instead.

2020-07-09  Hongtao Liu  

gcc/ChangeLog:

	PR target/87767
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): New function.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): New pass.
	(make_pass_constant_pool_broadcast): Ditto.
	* config/i386/i386-passes.def: Insert new pass after combine.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Declare.
	* config/i386/sse.md (*avx512dq_mul3_bcst,
	*avx512f_mul3_bcst): New define_insn.

gcc/testsuite/ChangeLog:

	PR target/87767
	* gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/pr92865-1.c: Adjust testcase.
---
 gcc/config/i386/i386-features.c   | 146 ++
 gcc/config/i386/i386-passes.def   |   1 +
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/sse.md|  25 +++
 .../i386/avx2-broadcast-pr87767-1.c   |  40 +
 .../i386/avx512f-broadcast-pr87767-1.c|  66 
 .../i386/avx512f-broadcast-pr87767-2.c|  54 +++
 .../i386/avx512vl-broadcast-pr87767-1.c   |  40 +
 gcc/testsuite/gcc.target/i386/pr92865-1.c |   9 +-
 9 files changed, 378 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index 535fc7e981d..8f81d101382 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2379,6 +2379,152 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
   return new pass_remove_partial_avx_dependency (ctxt);
 }
 
+/* Replace all one-value const vector that are referenced by SYMBOL_REFs in x
+   with embedded broadcast. i.e.transform
+
+ vpaddq .LC0(%rip), %zmm0, %zmm0
+ ret
+  .LC0:
+.quad 3
+.quad 3
+.quad 3
+.quad 3
+.quad 3
+.quad 3
+.quad 3
+.quad 3
+
+to
+
+ vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0
+ ret
+  .LC0:
+.quad 3  */
+static void
+replace_constant_pool_with_broadcast (rtx_insn* insn)
+{
+  subrtx_ptr_iterator::array_type array;
+  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
+{
+  rtx *loc = *iter;
+  rtx x = *loc;
+  rtx broadcast_mem, vec_dup, constant, first;
+  machine_mode mode;
+  if (GET_CODE (x) != MEM
+	  || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
+	  || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
+	continue;
+
+  mode = GET_MODE (x);
+  if (!VECTOR_MODE_P (mode))
+	return;
+
+  constant = get_pool_constant (XEXP (x, 0));
+  first = XVECEXP (constant, 0, 0);
+  /* There could be some rtx like
+	 (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
+	 but with "*.LC1" refer to V2DI constant vector.  */
+  if (GET_MODE (constant) != mode)
+	return;
+
+  for (int i = 1; i < GET_MODE_NUNITS (mode); ++i)
+	{
+	  rtx tmp = XVECEXP (constant, 0, i);
+	  /* Only handle one-value const vector.  */
+	  if (!rtx_equal_p (tmp, first))
+	return;
+	}
+
+  broadcast_mem = force_cons

Re: [PATCH] middle-end: Improve RTL expansion in expand_mul_overflow,

2020-07-09 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Jul 09, 2020 at 09:17:46AM +0100, Richard Sandiford wrote:
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -1627,6 +1627,9 @@ expand_mul_overflow (location_t loc, tree lhs, tree 
>> > arg0, tree arg1,
>> > profile_probability::very_likely ());
>> >  else
>> >{
>> > +/* RES is used more than once, place it in a pseudo.  */
>> > +res = force_reg (mode, res);
>> > +
>> >  rtx signbit = expand_shift (RSHIFT_EXPR, mode, res, prec - 1,
>> >  NULL_RTX, 0);
>> >  /* RES is low half of the double width result, HIPART
>> 
>> In general, this can be dangerous performance-wise on targets where
>> subregs are free.  If the move survives to the register allocators,
>> it increases the risk that the move will become a machine insn.
>> (The RA will prefer to tie the registers, but that isn't guaranteed.)
>> 
>> But more fundamentally, this can hurt if the SUBREG_REG is live at
>> the same time as the new pseudo, since the RA then has to treat them
>> as separate quantities.  From your results, that obviously doesn't
>> occur in the test case, but I'm not 100% confident that it won't
>> occur elsewhere.
>> 
>> If target-independent code is going to optimise for “no subreg operand”
>> targets like nvptx, I think it needs to know that the target wants that.
>
> Isn't that though what the expander is doing in lots of places?
> Force operands into pseudos especially if optimize to hope for better CSE
> etc., and hope combine does its job to undo it when it is better to be
> propagated?
> It is true that if res is used several times, then combine will not
> propagate it due to multiple uses, so the question I guess is why as Roger
> says we get the same code before/after (which pass undoes that; RA?).

I'd imagine fwprop.  That's just a guess though.

But I don't see what this force_reg achieves on “free subreg” targets,
even for CSE.  The SUBREG_REG is still set as a full REG value and
can be CSEd in the normal way.  And because the subreg itself is free,
trying to CSE the subregs is likely to be actively harmful for the
reason above: we can then have the SUBREG_REG and the CSEd subreg
live at the same time.

I.e. it's not usually worth extending the lifetime of a previous
subreg result when a new subreg on the same value would have no cost.

Maybe in the old days it made more sense, because we relied on RTL
optimisers to do constant propagation and folding, and so a subreg
move might later become a constant move, with the constant then being
CSEable.  Is that what you mean?  But that should be much less necessary
now.  There shouldn't be many cases in which only the RTL optimisers can
prove that an internal function argument is constant.

Thanks,
Richard


Re: [wwwdocs PATCH] remove tree-browser page and links

2020-07-09 Thread Gerald Pfeifer
On Thu, 9 Jul 2020, Richard Sandiford wrote:
>> I'm trying Tree Browser during debugging, but failed.
>> I found that tree-browser.c and tree-browser.def have been
>> removed at 2015-07-25. So, to avoid misunderstanding,
>> can we remove this tree-browser page too?
> Thanks for the patch.  Seems like a good idea to me.  I guess the only
> question is whether we should keep it around for historical purposes,
> but with a big banner to say that it's no longer up-to-date.  I also
> don't know whether we try to avoid 404s on old pages.

Thank you, Hujp and Richard!

We generally try to avoid 404s.  So in case we remove a page put in
a redirect. You can do so via a new entry in wwwdocs/htdocs/.htaccess 
(which I believe is self explanatory, and I'm happy to help, too).

My recommendation in a case like this is to follow the approach of
adding a big banner at the top as you suggest, Richard. That said,
if the consensus by you/those working in this area is to completely
remove the page, that is perfectly fine, too.

Gerald


Re: [PATCH] Fix unnecessary register spill that depends on function ordering

2020-07-09 Thread Richard Sandiford
Sorry for the slow reply.

Omar Tahir  writes:
>> Omar Tahir  writes:
>> > Hi Richard,
>> >
>> > From: Richard Sandiford 
>> >> > @@ -3719,6 +3722,7 @@ static unsigned int rest_of_handle_sched (void) 
>> >> > { #ifdef INSN_SCHEDULING
>> >> > +  first_moveable_pseudo = last_moveable_pseudo;
>> >> >if (flag_selective_scheduling
>> >> >&& ! maybe_skip_selective_scheduling ())
>> >> >  run_selective_scheduling ();
>> >> 
>> >> I think instead we should zero out both variables at the end of IRA.
>> >> There are other places besides the scheduler that call into the IRA code, 
>> >> so tackling the problem there seems more general.
>> >
>> > If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
>> > they'll be zero for the second scheduler pass, which uses them.
>> 
>> Are you sure?  It shouldn't be doing that, since there are no pseudos
>> left when the second scheduling pass runs.  RA replaces all pseudos
>> with hard registers.
>> 
>> So if the values in the variables has a noticeable effect on sched2,
>> I think that's even more reason to clear them after IRA :-)
>
> That's a good point. A few other passes call functions that make use of
> the moveable pseudo variables. But if they're before IRA then they should
> be zero, and as you say if they're after IRA then there are no pseudos left!
>
> I've moved the reset to the end of move_unallocated_pseudos. Unfortunately I
> can't inline the patch as there's a form feed (^L) that's mangling the text,
> not sure how to get around that.

Thanks, pushed to master.

Richard


[committed] openmp: Change omp_atv_default value and rename omp_atv_sequential to omp_atv_serialized

2020-07-09 Thread Jakub Jelinek via Gcc-patches
Hi!

While this is an OpenMP 5.1 change, it is undesirable to let people use 
different
values and then deal with ABI backwards compatibility in a year or two.

Have been waiting with this for a month until the final wording for
omp_atv_default makes it through.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-07-09  Jakub Jelinek  

* omp.h.in (omp_alloctrait_value_t): Change omp_atv_default from
2 to -1.  Add omp_atv_serialized and define omp_atv_sequential using
it.  Remove __omp_alloctrait_value_max__.
* allocator.c (omp_init_allocator): Handle omp_atv_default for
omp_atk_alignment and omp_atk_pool_size.

--- libgomp/omp.h.in.jj 2020-05-19 10:09:56.735370056 +0200
+++ libgomp/omp.h.in2020-06-04 23:24:10.115738893 +0200
@@ -136,12 +136,13 @@ typedef enum omp_alloctrait_key_t
 
 typedef enum omp_alloctrait_value_t
 {
+  omp_atv_default = (__UINTPTR_TYPE__) -1,
   omp_atv_false = 0,
   omp_atv_true = 1,
-  omp_atv_default = 2,
   omp_atv_contended = 3,
   omp_atv_uncontended = 4,
-  omp_atv_sequential = 5,
+  omp_atv_serialized = 5,
+  omp_atv_sequential = omp_atv_serialized,
   omp_atv_private = 6,
   omp_atv_all = 7,
   omp_atv_thread = 8,
@@ -154,8 +155,7 @@ typedef enum omp_alloctrait_value_t
   omp_atv_environment = 15,
   omp_atv_nearest = 16,
   omp_atv_blocked = 17,
-  omp_atv_interleaved = 18,
-  __omp_alloctrait_value_max__ = __UINTPTR_MAX__
+  omp_atv_interleaved = 18
 } omp_alloctrait_value_t;
 
 typedef struct omp_alloctrait_t
--- libgomp/allocator.c.jj  2020-06-02 11:56:59.400537423 +0200
+++ libgomp/allocator.c 2020-06-04 23:29:16.896333916 +0200
@@ -90,6 +90,11 @@ omp_init_allocator (omp_memspace_handle_
  }
break;
   case omp_atk_alignment:
+if (traits[i].value == omp_atv_default)
+ {
+   data.alignment = 1;
+   break;
+ }
if ((traits[i].value & (traits[i].value - 1)) != 0
|| !traits[i].value)
  return omp_null_allocator;
@@ -112,7 +117,10 @@ omp_init_allocator (omp_memspace_handle_
  }
break;
   case omp_atk_pool_size:
-   data.pool_size = traits[i].value;
+   if (traits[i].value == omp_atv_default)
+ data.pool_size = ~(uintptr_t) 0;
+   else
+ data.pool_size = traits[i].value;
break;
   case omp_atk_fallback:
switch (traits[i].value)


Jakub



[PATCH] openmp: Optimize triangular loop logical iterator to actual iterators computation using search for quadratic equation root(s)

2020-07-09 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch implements the optimized logical to actual iterators
computation for triangular loops.

I have a rough implementation using integers, but this one uses floating
point.  There is a small problem that -fopenmp programs aren't linked with
-lm, so it does it only if the hw has sqrt optab (and uses ifn rather than
__builtin_sqrt because it obviously doesn't need errno handling etc.).

Bootstrapped/regtested on x86_64-linux and i686-linux.

Do you think it is ok this way, or should I use the integral computation
using inlined isqrt (we have inequation of the form
start >= x * t10 + t11 * (((x - 1) * x) / 2)
where t10 and t11 are signed long long values and start unsigned long long,
and the division by 2 actually is a problem for accuracy in some cases, so
if we do it in integral, we need to do actually
  long long t12 = 2 * t10 - t11;
  unsigned long long t13 = t12 * t12 + start * 8 * t11;
  unsigned long long isqrt_ = isqrtull (t13);
  long long x = (((long long) isqrt_ - t12) / t11) >> 1;
with careful overflow checking on all the computations before isqrtull
(and on overflows use the fallback implementation).

2020-07-09  Jakub Jelinek  

* omp-general.h (struct omp_for_data): Add min_inner_iterations
and factor members.
* omp-general.c (omp_extract_for_data): Initialize them and remember
them in OMP_CLAUSE_COLLAPSE_COUNT if needed and restore from there.
* omp-expand.c (expand_omp_for_init_counts): Fix up computation of
counts[fd->last_nonrect] if fd->loop.n2 is INTEGER_CST.
(expand_omp_for_init_vars): For
fd->first_nonrect + 1 == fd->last_nonrect loops with for now
INTEGER_CST fd->loop.n2 find quadratic equation roots instead of
using fallback method when possible.

* testsuite/libgomp.c/loop-19.c: New test.
* testsuite/libgomp.c/loop-20.c: New test.

--- gcc/omp-general.h.jj2020-06-29 14:51:54.880085560 +0200
+++ gcc/omp-general.h   2020-07-03 11:13:46.571809236 +0200
@@ -78,6 +78,13 @@ struct omp_for_data
   unsigned char sched_modifiers;
   enum omp_clause_schedule_kind sched_kind;
   struct omp_for_data_loop *loops;
+  /* The following are relevant only for non-rectangular loops
+ where only a single loop depends on an outer loop iterator.  */
+  tree min_inner_iterations; /* Number of iterations of the inner
+   loop with either the first or last
+   outer iterator, depending on which
+   results in fewer iterations.  */
+  tree factor; /* (m2 - m1) * outer_step / inner_step.  */
 };
 
 #define OACC_FN_ATTRIB "oacc function"
--- gcc/omp-general.c.jj2020-06-29 14:51:54.870085702 +0200
+++ gcc/omp-general.c   2020-07-07 12:43:12.651936017 +0200
@@ -212,6 +212,8 @@ omp_extract_for_data (gomp_for *for_stmt
   fd->sched_modifiers = 0;
   fd->chunk_size = NULL_TREE;
   fd->simd_schedule = false;
+  fd->min_inner_iterations = NULL_TREE;
+  fd->factor = NULL_TREE;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -653,6 +655,8 @@ omp_extract_for_data (gomp_for *for_stmt
  else
t2 = fold_build2 (TRUNC_DIV_EXPR, itype, t2, step);
  t2 = fold_convert (llutype, t2);
+ fd->min_inner_iterations = t;
+ fd->factor = t2;
  t = fold_build2 (MULT_EXPR, llutype, t,
   single_nonrect_count);
  tree t3 = fold_build2 (MINUS_EXPR, llutype,
@@ -707,7 +711,17 @@ omp_extract_for_data (gomp_for *for_stmt
   if (collapse_count && *collapse_count == NULL)
 {
   if (count)
-   *collapse_count = fold_convert_loc (loc, iter_type, count);
+   {
+ *collapse_count = fold_convert_loc (loc, iter_type, count);
+ if (fd->min_inner_iterations && fd->factor)
+   {
+ t = make_tree_vec (3);
+ TREE_VEC_ELT (t, 0) = *collapse_count;
+ TREE_VEC_ELT (t, 1) = fd->min_inner_iterations;
+ TREE_VEC_ELT (t, 2) = fd->factor;
+ *collapse_count = t;
+   }
+   }
   else
*collapse_count = create_tmp_var (iter_type, ".count");
 }
@@ -717,6 +731,13 @@ omp_extract_for_data (gomp_for *for_stmt
   fd->loop.v = *collapse_iter;
   fd->loop.n1 = build_int_cst (TREE_TYPE (fd->loop.v), 0);
   fd->loop.n2 = *collapse_count;
+  if (TREE_CODE (fd->loop.n2) == TREE_VEC)
+   {
+ gcc_assert (fd->non_rect);
+ fd->min_inner_iterations = TREE_VEC_ELT (fd->loop.n2, 1);
+ fd->factor = TREE_VEC_ELT (fd->loop.n2, 2);
+ fd->loop.n2 = TREE_VEC_ELT (fd->loop.n2, 0);
+   }
   fd->loop.step = build_int_cst (TREE_TYPE (fd->loop.v), 1);
   fd->loop.m1 = NULL_TREE;
   fd->loop.m2 = NULL_TREE;
--- gcc/omp-expand.c.jj 2020-07-02 11:03:11.275499143 +0200
+++ gcc/omp-expand.c2020-07-08 17:14:02.146638008 +0200
@@ -2137,

[PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

2020-07-09 Thread Tamar Christina
Hi All,

This patch fixes a couple of issues in AArch64's -mcpu=native processing:

The buffer used to read the lines from /proc/cpuinfo is 128 bytes long.  While
this was enough in the past with the increase in architecture extensions it is
no longer enough.   It results in two bugs:

1) No option string longer than 127 characters is correctly parsed.  Features
   that are supported are silently ignored.

2) It incorrectly enables features that are not present on the machine:
  a) It checks for substring matching instead of full word matching.  This makes
 it incorrectly detect sb support when ssbs is provided instead.
  b) Due to the truncation at the 127 char border it also incorrectly enables
 features due to the full feature being cut off and the part that is left
 accidentally enables something else.

This breaks -mcpu=native detection on some of our newer system.

The patch fixes these issues by reading full lines up to the \n in a string.
This gives us the full feature line.  Secondly it creates a set from this string
to:

 1) Reduce matching complexity from O(n*m) to O(n*logm).
 2) Perform whole word matching instead of substring matching.

To make this code somewhat cleaner I also changed from using char* to using
std::string and std::set.

Note that I have intentionally avoided the use of ifstream and stringstream
to make it easier to backport.  I have also not change the substring matching
for the initial line classification as I cannot find a documented cpuinfo format
which leads me to believe there may be kernels out there that require this which
may be why the original code does this.

I also do not want this to break if the kernel adds a new line that is long and
indents the file by two tabs to keep everything aligned.  In short I think an
imprecise match is the right thing here.

Test for this is added as the last thing in this series as it requires some
changes to be made to be able to test this.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master, GCC 10, 9 and 8?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/driver-aarch64.c (INCLUDE_SET): New.
(parse_field): Use std::string.
(split_words, readline): New.
(host_detect_local_cpu): Fix truncation issues.

-- 
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index d1229e676806f9607c258e5d678fb3175fadf1c2..3accf3216102a54009af6ece3b2796a55af934ae 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -21,6 +21,7 @@
 
 #include "config.h"
 #define INCLUDE_STRING
+#define INCLUDE_SET
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -116,9 +117,9 @@ valid_bL_core_p (unsigned int *core, unsigned int bL_core)
 /* Returns the hex integer that is after ':' for the FIELD.
Returns -1 is returned if there was problem parsing the integer. */
 static unsigned
-parse_field (const char *field)
+parse_field (const std::string field)
 {
-  const char *rest = strchr (field, ':');
+  const char *rest = strchr (field.c_str (), ':');
   char *after;
   unsigned fint = strtol (rest + 1, &after, 16);
   if (after == rest + 1)
@@ -126,6 +127,57 @@ parse_field (const char *field)
   return fint;
 }
 
+/* Splits and returns a string based on whitespace and return it as
+   part of a set. Empty strings are ignored.  */
+
+static void
+split_words (const std::string val, std::set &result)
+{
+  size_t cur, prev = 0;
+  std::string word;
+  while ((cur = val.find_first_of (" \n", prev)) != std::string::npos)
+  {
+word = val.substr (prev, cur - prev);
+/* Skip adding empty words.  */
+if (!word.empty ())
+  result.insert (word);
+prev = cur+1;
+  }
+
+  word = val.substr (prev, cur - prev);
+  if (!word.empty ())
+result.insert (word);
+}
+
+/* Read an entire line from F until '\n' or EOF.  */
+
+static std::string
+readline (FILE *f)
+{
+  char *buf = NULL;
+  int size = 0;
+  int last = 0;
+  const int buf_size = 128;
+
+  if (feof (f))
+return std::string ();
+
+  do
+  {
+size += buf_size;
+buf = (char*)xrealloc (buf, size);
+gcc_assert (buf);
+fgets (buf+last, buf_size, f);
+/* If we're not at the end of the line then override the
+   \0 added by fgets.  */
+last = strlen (buf) - 1;
+  } while (!feof (f) && buf[last]!='\n');
+
+  std::string result (buf);
+  free (buf);
+  return result;
+}
+
 /*  Return true iff ARR contains CORE, in either of the two elements. */
 
 static bool
@@ -164,7 +216,6 @@ host_detect_local_cpu (int argc, const char **argv)
 {
   const char *res = NULL;
   static const int num_exts = ARRAY_SIZE (aarch64_extensions);
-  char buf[128];
   FILE *f = NULL;
   bool arch = false;
   bool tune = false;
@@ -178,6 +229,7 @@ host_detect_local_cpu (int argc, const char **argv)
   bool processed_exts = false;
   uint64_t extension_flags = 0;
   uint64_t default_flags = 0;
+  std::string buf;
 
   gcc_assert (argc);
 
@@ -202

[PATCH 2/6] AArch64: Add GCC_CPUINFO override

2020-07-09 Thread Tamar Christina
Hi All,

This adds an in intentionally undocumented environment variable
GCC_CPUINFO which can be used to test -mcpu=native.

Tests using this are added later on.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master, GCC 10, 9 and 8?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/driver-aarch64.c (host_detect_local_cpu):
Add GCC_CPUINFO.

-- 
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index 78616379189e80e75a28973f6b90ca4ca287e4bc..498350f5588e1109472ed95a61f1021d697eb057 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -230,6 +230,7 @@ host_detect_local_cpu (int argc, const char **argv)
   uint64_t extension_flags = 0;
   uint64_t default_flags = 0;
   std::string buf;
+  char *fcpu_info;
 
   gcc_assert (argc);
 
@@ -247,7 +248,11 @@ host_detect_local_cpu (int argc, const char **argv)
   if (!arch && !tune && !cpu)
 goto not_found;
 
-  f = fopen ("/proc/cpuinfo", "r");
+  fcpu_info = getenv ("GCC_CPUINFO");
+  if (fcpu_info)
+f = fopen (fcpu_info, "r");
+  else
+f = fopen ("/proc/cpuinfo", "r");
 
   if (f == NULL)
 goto not_found;



[PATCH 3/6] Arm: Add GCC_CPUINFO override

2020-07-09 Thread Tamar Christina
Hi All,

This adds an in intentionally undocumented environment variable
GCC_CPUINFO which can be used to test -mcpu=native.

Tests using these are added later on.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master, GCC 10, 9 and 8?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/driver-arm.c (host_detect_local_cpu): Add GCC_CPUINFO.

-- 
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index 254e5ba53a6130183cf7561913943765a3a56898..85058f257c9d34f31c395c5d4502f73a7f00625e 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -61,6 +61,7 @@ host_detect_local_cpu (int argc, const char **argv)
   FILE *f = NULL;
   bool arch;
   const struct vendor_cpu *cpu_table = NULL;
+  char *fcpu_info = NULL;
 
   if (argc < 1)
 goto not_found;
@@ -69,7 +70,12 @@ host_detect_local_cpu (int argc, const char **argv)
   if (!arch && strcmp (argv[0], "cpu") != 0 && strcmp (argv[0], "tune"))
 goto not_found;
 
-  f = fopen ("/proc/cpuinfo", "r");
+  fcpu_info = getenv ("GCC_CPUINFO");
+  if (fcpu_info)
+f = fopen (fcpu_info, "r");
+  else
+f = fopen ("/proc/cpuinfo", "r");
+
   if (f == NULL)
 goto not_found;
 



[PATCH 4/6] Testsuite: Make it easier to debug environment setting functions

2020-07-09 Thread Tamar Christina
Hi All,

This adds verbose output to dg-set-compiler-env-var and dg-set-target-env-var
so you can actually see what they're setting when you add -v -v.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* lib/gcc-dg.exp (dg-set-compiler-env-var, dg-set-target-env-var): Add
verbose output.

-- 
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 27cc7c19625d8bff12268de16ec0c20d558394b7..ccf6b7cc70b1d4268c0b0c65f960fb42deab0d88 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -477,7 +477,10 @@ proc dg-set-target-env-var { args } {
 	error "dg-set-target-env-var: need two arguments"
 	return
 }
-lappend set_target_env_var [list [lindex $args 1] [lindex $args 2]]
+set var [lindex $args 1]
+set value [lindex $args 2]
+verbose "dg-set-compiler-env-var $var $value" 2
+lappend set_target_env_var [list $var $value]
 }
 
 proc set-target-env-var { } {
@@ -518,6 +521,7 @@ proc dg-set-compiler-env-var { args } {
 }
 set var [lindex $args 1]
 set value [lindex $args 2]
+verbose "dg-set-compiler-env-var $var $value" 2
 if [info exists ::env($var)] {
   lappend saved_compiler_env_var [list $var 1 $::env($var)]
 } else {



Re: Initial Sapphire Rapids and Alder Lake support from ISA r40

2020-07-09 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 9, 2020 at 9:20 AM Cui, Lili  wrote:
>
> Hi:
> This patch is about to add Sapphire Rapids and Alder Lake to GCC.
> Sapphire Rapids is based on Cooper Lake and plus ISA 
> MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT/ENQCMD/CLDEMOTE/PTWRITE/WAITPKG/SERIALIZE/TSXLDTRK.
> Alder Lake is based on Skylake and plus ISA CLDEMOTE/PTWRITE/WAITPK/SERIALIZE.
>
> For detailed information, please refer to 
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
>
> gcc/ChangeLog
> * common/config/i386/cpuinfo.h
> (get_intel_cpu): Handle sapphirerapids.
> * common/config/i386/i386-common.c
> (processor_names): Add sapphirerapids and alderlake.
> (processor_alias_table): Add sapphirerapids and alderlake.
> * common/config/i386/i386-cpuinfo.h
> (processor_subtypes): Add INTEL_COREI7_ALDERLAKE and
> INTEL_COREI7_ALDERLAKE.
> * config.gcc: Add -march=sapphirerapids and alderlake.
> * config/i386/driver-i386.c
> (host_detect_local_cpu) Handle sapphirerapids and alderlake.
> * config/i386/i386-c.c
> (ix86_target_macros_internal): Handle sapphirerapids and alderlake.
> * config/i386/i386-options.c
> (m_SAPPHIRERAPIDS) : Define.
> (m_ALDERLAKE): Ditto.
> (m_CORE_AVX512) : Add m_SAPPHIRERAPIDS.
> (processor_cost_table): Add sapphirerapids and alderlake.
> (ix86_option_override_internal) Handle PTA_WAITPKG, PTA_ENQCMD,
> PTA_CLDEMOTE, PTA_SERIALIZE, PTA_TSXLDTRK.
> * config/i386/i386.h
> (ix86_size_cost) : Define SAPPHIRERAPIDS and ALDERLAKE.
> (processor_type) : Add PROCESSOR_SAPPHIRERAPIDS and
> PROCESSOR_ALDERLAKE.
> (PTA_ENQCMD): New.
> (PTA_CLDEMOTE): Ditto.
> (PTA_SERIALIZE): Ditto.
> (PTA_TSXLDTRK): New.
> (PTA_SAPPHIRERAPIDS): Ditto.
> (PTA_ALDERLAKE): Ditto.
> (processor_type) : Add PROCESSOR_SAPPHIRERAPIDS and
> PROCESSOR_ALDERLAKE.
> * doc/extend.texi: Add sapphirerapids and alderlake.
> * doc/invoke.texi: Add sapphirerapids and alderlake.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> * g++.target/i386/mv16.C: Handle new march

OK.

Thanks,
Uros.

> Thanks,
> Lili.
>


[PATCH 5/6] Docs: Document environment setting directives for testsuite

2020-07-09 Thread Tamar Christina
Hi All,

This document some of the existing DejaGnu directives to modify
environment variables before test or compiler invocations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (dg-set-compiler-env-var,
dg-set-target-env-var): Document.

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 5b5b845afe6e8c8836bad73872797a10b4921cea..b025b3de186ce135d029bf0874ef420a90bb8fc2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1244,6 +1244,18 @@ This DejaGnu directive compares @var{regexp} to the combined output
 that the test executable writes to @file{stdout} and @file{stderr}.
 @end table
 
+@subsubsection Specify environment variables for a test
+
+@table @code
+@item @{ dg-set-compiler-env-var @var{var_name} "@var{var_value}" @}
+Specify that the environment variable @var{var_name} needs to be set
+to @var{var_value} before invoking the compiler on the test file.
+
+@item @{ dg-set-target-env-var @var{var_name} "@var{var_value}" @}
+Specify that the environment variable @var{var_name} needs to be set
+to @var{var_value} before execution of the program created by the test.
+@end table
+
 @subsubsection Specify additional files for a test
 
 @table @code



[PATCH 6/6] AArch64: Add test for -mcpu=native

2020-07-09 Thread Tamar Christina
Hi All,

This adds some tests to the GCC testsuite for testing the
-mcpu=native code.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master, GCC 10, 9 and 8?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/aarch64-cpunative.exp: New test.
* gcc.target/aarch64/cpunative/info_0: New test.
* gcc.target/aarch64/cpunative/info_1: New test.
* gcc.target/aarch64/cpunative/info_10: New test.
* gcc.target/aarch64/cpunative/info_11: New test.
* gcc.target/aarch64/cpunative/info_12: New test.
* gcc.target/aarch64/cpunative/info_13: New test.
* gcc.target/aarch64/cpunative/info_14: New test.
* gcc.target/aarch64/cpunative/info_15: New test.
* gcc.target/aarch64/cpunative/info_2: New test.
* gcc.target/aarch64/cpunative/info_3: New test.
* gcc.target/aarch64/cpunative/info_4: New test.
* gcc.target/aarch64/cpunative/info_5: New test.
* gcc.target/aarch64/cpunative/info_6: New test.
* gcc.target/aarch64/cpunative/info_7: New test.
* gcc.target/aarch64/cpunative/info_8: New test.
* gcc.target/aarch64/cpunative/info_9: New test.
* gcc.target/aarch64/cpunative/native_cpu_0.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_1.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_10.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_11.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_12.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_13.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_14.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_15.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_2.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_3.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_4.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_5.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_6.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_7.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_8.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_9.c: New test.

-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp b/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp
new file mode 100644
index ..ce80ca04b8d095c96acf0bda4d5b16a6460c4cbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp
@@ -0,0 +1,35 @@
+# Copyright (C) 2014-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if ![istarget aarch64*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"" ""
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_0 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_0
new file mode 100644
index ..ef4a3f606fa56bc7ef1e951c1896737f013f78a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_0
@@ -0,0 +1,8 @@
+processor	: 0
+BogoMIPS	: 100.00
+Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp
+CPU implementer	: 0x41
+CPU architecture: 8
+CPU variant	: 0x0
+CPU part	: 0xd08
+CPU revision	: 2
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_1 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_1
new file mode 100644
index ..0f434bca28521f678bcee9f407f34ef813e7f1f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_1
@@ -0,0 +1,8 @@
+processor	: 0
+BogoMIPS	: 100.00
+Features	: fp
+CPU implementer	: 0x41
+CPU architecture: 8
+CPU variant	: 0x0
+CPU part	: 0xd08
+CPU revision	: 2
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_10 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_10
new file mode 100644
index ..c6e9d7ca9e274a0be8a75bd7056dad958aa53cb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_10
@@ -0,0 +1,8 @@
+processor	: 0
+BogoMIP

Re: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC

2020-07-09 Thread Thomas Schwinge
Hi Kwok!

On 2020-07-08T18:35:43+0100, Kwok Cheung Yeung  wrote:
>  > I tried out the patch with one test-case and -pie -fPIC/-fpic already
>  > seems to works, so perhaps we could have at least one test-case
>  > exercising this in libgomp?  That sounds easier to do than the
>  > shared-lib test-case.
>
> I've created a simple testcase which tries to generate a shared library with
> offloaded code.

Thanks.

> Without the mkoffload patch, it fails during linking with:
>
> ld: /tmp/ccNaT7fO.target.o: relocation R_X86_64_32 against `.rodata' can not 
> be
> used when making a shared object; recompile with -fPIC
>
> I have tested this on a x64 host with offloading to nvptx and gcn.

Confirmed, and also that it actually works; using the two files attached:

$ install/bin/gcc -Wall -Wextra -fopenacc -o libf.so shared-lib.c -shared 
-fPIC
$ install/bin/gcc -Wall -Wextra -fopenacc shared-lib-main.c -Wl,-rpath,$PWD 
-L$PWD -lf

..., and execute 'a.out' on a GCN and a nvptx GPU machine.

(As discussed before, it's non-trivial to get such a two-step compilation
process into the libgomp testsuite, so not doing that now.)

> On AMD GCN,
> it also produces a couple of extra linker warnings that I have added 
> dg-warning
> entries for.

Isn't that unexpected to see such warnings?  (Would you declare this
issue "done" if users then see such warnings?)  That said, I'm not seeing
these, thus get:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/shared-lib.c 
-DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0   
(test for warnings, line )
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/shared-lib.c 
-DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0   
(test for warnings, line )

> Okay for trunk/OG10 together with the previous mkoffload patch?

In principle yes, with the 'dg-warning' directives removed.  Please
verify why you're seeing these warnings with GCN 'mkoffload'.

> commit 43238117c261285a6b95d881bcc2f9efd9f752ad
> Author: Kwok Cheung Yeung 
> Date:   Wed Jul 8 03:28:08 2020 -0700
>
> Add test case

Try to make 'git log --oneline' meaningful, so a bit more verbose,
please.  Maybe: "Add test case 'libgomp.oacc-c-c++-common/shared-lib.c'"?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/shared-lib.c
> @@ -0,0 +1,16 @@

Maybe add a header line comment, like: "Verify that we can compile
offloading code into a shared library".

> +/* { dg-do link } */
> +/* { dg-additional-options "-shared -fPIC" { target fpic } } */

Let's hope this does the expected thing for all offloading/libgomp
testing configurations.  Not sure if (additionally?) we should
'dg-require-effective-target shared'?  See 'g++.dg/tls/pr85400.C', for
example.  (Jakub?)

> +
> +#define N 512
> +
> +void f(int a[])
> +{
> +  int i;
> +
> +  #pragma acc parallel
> +  for (i = 0; i < N; i++)
> +a[i]++;
> +}
> +
> +/* { dg-warning "relocation against `.*' in read-only section `\.rodata'" "" 
> { target openacc_radeon_accel_selected } 0 } */
> +/* { dg-warning "creating DT_TEXTREL in a shared object" "" { target 
> openacc_radeon_accel_selected } 0 } */


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
/* { dg-do link } */
/* { dg-additional-options "-shared -fPIC" { target fpic } } */

#define N 512

void f(int a[])
{
  int i;

#pragma acc parallel loop copy(a[0:N])
  for (i = 0; i < N; i++)
a[i] = -i;
}
#include 

extern void f(int a[]);

#define N 512

int main()
{
  int a[N] = { 0 };

  f(a);

  for (int i = 0; i < N; i++)
assert (a[i] == -i);

  return 0;
}


Re: [PATCH] remove premature vect_verify_datarefs_alignment

2020-07-09 Thread Richard Biener
On Thu, 9 Jul 2020, Kewen.Lin wrote:

> on 2020/7/9 上午10:48, Kewen.Lin via Gcc-patches wrote:
> > Hi Richi,
> > 
> > on 2020/7/8 下午10:45, Richard Biener wrote:
> >> This followup removes vect_verify_datarefs_alignment and its
> >> premature cancellation of vectorization leaving the actual
> >> decision whether alignment is supported to the functions
> >> deciding whether we can vectorize a load or store.
> >>
> >> I'll see whether to find a suitable machine to test !hw_misalign_supported
> >> (altivec-only ppc I think?  hints welcome...), but maybe I'm lazy...
> >>
> > 
> > Thanks for caring about this!  As my limited experience, Power7 machine
> > is qualified for this even with explicit configuration -mcpu=power7,
> 
> Oops, sorry I meant --with-cpu=power7.
> 
> > most of vector test cases will go with -mno-allow-movmisalign there.
> > 
> > cfarm gcc110 looks fine to use.
> > 

OK, so that showed a few testcases to adjust, thus I committed the
following.  In particular gcc.dg/vect/slp-45.c shows we can very
well vectorize the loops and prematurely failed to.

Thanks,
Richard.

>From a1e25d0008791118dd58eaddff5f4c3691f8750e Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 8 Jul 2020 16:37:55 +0200
Subject: [PATCH] remove premature vect_verify_datarefs_alignment

This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization leaving the actual
decision whether alignment is supported to the functions
deciding whether we can vectorize a load or store.

2020-07-08  Richard Biener  

* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.

* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-10.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-45.c|  3 +-
 gcc/testsuite/gcc.dg/vect/vect-109.c  |  2 +-
 gcc/tree-vect-data-refs.c | 88 +++
 gcc/tree-vect-loop.c  |  2 -
 gcc/tree-vect-slp.c   |  2 +-
 gcc/tree-vect-stmts.c | 14 +
 gcc/tree-vectorizer.h |  4 +-
 8 files changed, 28 insertions(+), 89 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
index 1f9b917dcd2..ad6f878e407 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
@@ -49,6 +49,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "bad data alignment in basic block" "slp2" { 
target { ! vect_element_align } } } } */
+/* { dg-final { scan-tree-dump "unsupported unaligned access" "slp2" { target 
{ ! vect_element_align } } } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" { 
target vect_element_align } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/slp-45.c 
b/gcc/testsuite/gcc.dg/vect/slp-45.c
index d83472ca095..1e35d354203 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-45.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-45.c
@@ -77,5 +77,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 13 "vect" { target 
vect_hw_misalign } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { 
! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 13 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c 
b/gcc/testsuite/gcc.dg/vect/vect-109.c
index ac5d0827899..fe7ea6c420f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -77,6 +77,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target 
vect_element_align } } } */
-/* { dg-final { scan-tree-dump-times "not vectorized: unsupported unaligned 
store" 2 "vect" { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "unsupported unaligned access" 2 "vect" { 
xfail vect_element_align } } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 
"vect" { target vect_elemen

[PATCH] Add -fdump-profile-report.

2020-07-09 Thread Martin Liška

When using -fprofile-report, -fdump-profile-report can be used to
print the report to a foo.c.000i.profile-report file instead
of stderr. I see it handy for comparison purpose.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

---
 gcc/dumpfile.c |  3 ++-
 gcc/dumpfile.h |  1 +
 gcc/passes.c   | 47 +++
 3 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 5d61946fc49..9a5496a18e8 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -103,8 +103,9 @@ static struct dump_file_info dump_files[TDI_end] =
   DUMP_FILE_INFO (".gimple", "tree-gimple", DK_tree, 0),
   DUMP_FILE_INFO (".nested", "tree-nested", DK_tree, 0),
   DUMP_FILE_INFO (".lto-stream-out", "ipa-lto-stream-out", DK_ipa, 0),
+  DUMP_FILE_INFO (".profile-report", "profile-report", DK_ipa, 0),
 #define FIRST_AUTO_NUMBERED_DUMP 1
-#define FIRST_ME_AUTO_NUMBERED_DUMP 4
+#define FIRST_ME_AUTO_NUMBERED_DUMP 5
 
   DUMP_FILE_INFO (NULL, "lang-all", DK_lang, 0),

   DUMP_FILE_INFO (NULL, "tree-all", DK_tree, 0),
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 00e175a4737..ee9e602b67f 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -48,6 +48,7 @@ enum tree_dump_index
   TDI_gimple,  /* dump each function after gimplifying it */
   TDI_nested,  /* dump each function after unnesting it */
   TDI_lto_stream_out,  /* dump information about lto streaming */
+  TDI_profile_report,  /* dump information about profile quality */
 
   TDI_lang_all,			/* enable all the language dumps.  */

   TDI_tree_all,/* enable all the GENERIC/GIMPLE dumps. 
 */
diff --git a/gcc/passes.c b/gcc/passes.c
index 07b2613ffea..a5da9a46f4e 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1850,10 +1850,15 @@ pass_manager::dump_profile_report () const
 
   if (!profile_record)

 return;
-  fprintf (stderr, "\nProfile consistency report:\n\n");
-  fprintf (stderr, " |mismatch |mismatch |  
   |\n");
-  fprintf (stderr, "Pass name|IN|IN|OUT   |OUT   
|overall  |\n");
-  fprintf (stderr, " |freq  |count |freq  |count 
|size  |time  |\n");
+
+  FILE *dump_file = dump_begin (TDI_profile_report, NULL);
+  if (dump_file == NULL)
+dump_file = stderr;
+
+  fprintf (dump_file, "Profile consistency report:\n\n");
+  fprintf (dump_file, " |mismatch |mismatch 
| |\n");
+  fprintf (dump_file, "Pass name|IN|IN|OUT   |OUT   
|overall  |\n");
+  fprintf (dump_file, " |freq  |count |freq  |count 
|size  |time  |\n");

   for (int i = 1; i < passes_by_id_size; i++)
 if (profile_record[i].run)
@@ -1876,47 +1881,47 @@ pass_manager::dump_profile_report () const
|| rel_time_change || rel_size_change)
  {
last_reported = i;
-   fprintf (stderr, "%-33s", passes_by_id[i]->name);
+   fprintf (dump_file, "%-33s", passes_by_id[i]->name);
if (profile_record[i].num_mismatched_freq_in != last_freq_in)
- fprintf (stderr, "| %+5i",
+ fprintf (dump_file, "| %+5i",
   profile_record[i].num_mismatched_freq_in
   - last_freq_in);
else
- fprintf (stderr, "|  ");
+ fprintf (dump_file, "|  ");
if (profile_record[i].num_mismatched_count_in != last_count_in)
- fprintf (stderr, "| %+5i",
+ fprintf (dump_file, "| %+5i",
   profile_record[i].num_mismatched_count_in
   - last_count_in);
else
- fprintf (stderr, "|  ");
+ fprintf (dump_file, "|  ");
if (profile_record[i].num_mismatched_freq_out != last_freq_out)
- fprintf (stderr, "| %+5i",
+ fprintf (dump_file, "| %+5i",
   profile_record[i].num_mismatched_freq_out
   - last_freq_out);
else
- fprintf (stderr, "|  ");
+ fprintf (dump_file, "|  ");
if (profile_record[i].num_mismatched_count_out != last_count_out)
- fprintf (stderr, "| %+5i",
+ fprintf (dump_file, "| %+5i",
   profile_record[i].num_mismatched_count_out
   - last_count_out);
else
- fprintf (stderr, "|  ");
+ fprintf (dump_file, "|  ");
 
 	/* Size/time units change across gimple and RTL.  */

if (i == pass_expand_1->static_pass_number)
- fprintf (stderr, "|--|--");
+ fprintf (dump_file, "|--|--");
 

Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-09 Thread Martin Liška

On 7/2/20 4:35 AM, Jiufu Guo via Gcc-patches wrote:

I would like to reping this patch.
Since this is correcting COUNT and PROB for hot blocks, it helps some
cases.

https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html


Hey.

I've just briefly looked at the patch and I don't feel the right person
to make a review for it.

I believe Richi, Bin or Honza can help us here?
Martin


Re: [PATCH] x86: Enable FMA in rsqrt2 expander

2020-07-09 Thread Kirill Yukhin via Gcc-patches
On 07 июл 09:06, H.J. Lu wrote:
> On Tue, Jul 7, 2020 at 8:56 AM Kirill Yukhin  wrote:
> >
> > Hello HJ,
> >
> > On 28 июн 07:19, H.J. Lu via Gcc-patches wrote:
> > > Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
> > > rsqrt2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
> > > Although it doesn't show performance change in our workloads, FMA can
> > > improve other workloads.
> > >
> > > gcc/
> > >
> > >   PR target/88713
> > >   * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
> > >   * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
> > >   (rsqrt2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
> > >   (rsqrtv16sf2): Removed.
> > >
> > > gcc/testsuite/
> > >
> > >   PR target/88713
> > >   * gcc.target/i386/pr88713-1.c: New test.
> > >   * gcc.target/i386/pr88713-2.c: Likewise.
> >
> > So, you've introduced new rsqrt expanders for DF vectors and relaxed
> > condition for V16SF. What I didn't get is why did you change unspec
> > type from RSQRT to RSQRT28 for V16SF expander?
> >
> 
> UNSPEC in define_expand is meaningless when the pattern is fully
> expanded by ix86_emit_swsqrtsf.  I believe that UNSPEC in rsqrt2
> expander can be removed.

Agree.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */

I gues -O2 is useless here (and in -2.c test).

Othwerwise LGTM.

--
K

> 
> -- 
> H.J.


RE: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

2020-07-09 Thread Kyrylo Tkachov
Hi Tamar,

> -Original Message-
> From: Tamar Christina 
> Sent: 09 July 2020 10:54
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
> 
> Hi All,
> 
> This patch fixes a couple of issues in AArch64's -mcpu=native processing:
> 
> The buffer used to read the lines from /proc/cpuinfo is 128 bytes long.  While
> this was enough in the past with the increase in architecture extensions it is
> no longer enough.   It results in two bugs:
> 
> 1) No option string longer than 127 characters is correctly parsed.  Features
>that are supported are silently ignored.
> 
> 2) It incorrectly enables features that are not present on the machine:
>   a) It checks for substring matching instead of full word matching.  This
> makes
>  it incorrectly detect sb support when ssbs is provided instead.
>   b) Due to the truncation at the 127 char border it also incorrectly enables
>  features due to the full feature being cut off and the part that is left
>  accidentally enables something else.
> 
> This breaks -mcpu=native detection on some of our newer system.
> 
> The patch fixes these issues by reading full lines up to the \n in a string.
> This gives us the full feature line.  Secondly it creates a set from this 
> string
> to:
> 
>  1) Reduce matching complexity from O(n*m) to O(n*logm).
>  2) Perform whole word matching instead of substring matching.
> 
> To make this code somewhat cleaner I also changed from using char* to
> using
> std::string and std::set.
> 
> Note that I have intentionally avoided the use of ifstream and stringstream
> to make it easier to backport.  I have also not change the substring matching
> for the initial line classification as I cannot find a documented cpuinfo 
> format
> which leads me to believe there may be kernels out there that require this
> which
> may be why the original code does this.
> 
> I also do not want this to break if the kernel adds a new line that is long 
> and
> indents the file by two tabs to keep everything aligned.  In short I think an
> imprecise match is the right thing here.
> 
> Test for this is added as the last thing in this series as it requires some
> changes to be made to be able to test this.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master, GCC 10, 9 and 8?

Ok.
Note that GCC 10.2 release is coming soon (next week). Please make sure this is 
tested well on GCC 10 (not that I doubt your testing).

Thanks,
Kyrill


> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/driver-aarch64.c (INCLUDE_SET): New.
>   (parse_field): Use std::string.
>   (split_words, readline): New.
>   (host_detect_local_cpu): Fix truncation issues.
> 
> --


Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-07-09 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-07-01 at 21:48 +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> > On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via Gcc-patches
> > wrote:
> > > gcc/ChangeLog:
> > > 
> > > 2020-06-30  Ilya Leoshkevich  
> > > 
> > >   * asan.c (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
> > >   * defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > >   * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > >   * doc/tm.texi.in: Likewise.
> > Don't we already have the ability to set label alignments?  See
> > LABEL_ALIGN.
> 
> The following works with -falign-labels=2:
> 
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx
> pbase,
> unsigned int alignb,
>DECL_INITIAL (decl) = decl;
>TREE_ASM_WRITTEN (decl) = 1;
>TREE_ASM_WRITTEN (id) = 1;
> -  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
> +  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) *
> BITS_PER_UNIT);
>emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>shadow_base = expand_binop (Pmode, lshr_optab, base,
>   gen_int_shift_amount (Pmode,
> ASAN_SHADOW_SHIFT),
> 
> In order to go this way, we would need to raise `-falign-labels=`
> default to 2 for s390, which is not incorrect, but would
> unnecessarily
> clutter asm with `.align 2` before each label.  So IMHO it would be
> nicer to simply ask the backend "what is your target's instruction
> alignment?".

Besides that it would clutter asm with .align 2, another argument
against using LABEL_ALIGN here is that it's semantically different from
what is needed: -falign-labels value, which it returns, is specified by
user for optimization purposes, whereas here we need to query the
architecture's property.

In practical terms, if user specifies -falign-labels=4096, this would
affect how the code is generated here. However, this would be
completely unnecessary: we never jump to decl, its address is only
saved for reporting.

Best regards,
Ilya



RE: [PATCH 2/6] AArch64: Add GCC_CPUINFO override

2020-07-09 Thread Kyrylo Tkachov


> -Original Message-
> From: Tamar Christina 
> Sent: 09 July 2020 10:55
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 2/6] AArch64: Add GCC_CPUINFO override
> 
> Hi All,
> 
> This adds an in intentionally undocumented environment variable
> GCC_CPUINFO which can be used to test -mcpu=native.
> 
> Tests using this are added later on.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master, GCC 10, 9 and 8?
> 

I like the idea. I'm a bit wary of using a non-aarch64-namespaced name for the 
variable, but it looks like it could be useful for other targets testing if 
they wished to. In any case, as it's internal-only and non-documented we won't 
make any promises on its compatibility.

So ok if the testsuite parts are approved.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/driver-aarch64.c (host_detect_local_cpu):
>   Add GCC_CPUINFO.
> 
> --


RE: [PATCH 3/6] Arm: Add GCC_CPUINFO override

2020-07-09 Thread Kyrylo Tkachov


> -Original Message-
> From: Tamar Christina 
> Sent: 09 July 2020 10:55
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH 3/6] Arm: Add GCC_CPUINFO override
> 
> Hi All,
> 
> This adds an in intentionally undocumented environment variable
> GCC_CPUINFO which can be used to test -mcpu=native.
> 
> Tests using these are added later on.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master, GCC 10, 9 and 8?
> 

Ok on the same grounds and conditions as the aarch64 one.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/driver-arm.c (host_detect_local_cpu): Add
> GCC_CPUINFO.
> 
> --


RE: [PATCH 6/6] AArch64: Add test for -mcpu=native

2020-07-09 Thread Kyrylo Tkachov


> -Original Message-
> From: Tamar Christina 
> Sent: 09 July 2020 10:58
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 6/6] AArch64: Add test for -mcpu=native
> 
> Hi All,
> 
> This adds some tests to the GCC testsuite for testing the
> -mcpu=native code.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master, GCC 10, 9 and 8?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/cpunative/aarch64-cpunative.exp: New test.
>   * gcc.target/aarch64/cpunative/info_0: New test.
>   * gcc.target/aarch64/cpunative/info_1: New test.
>   * gcc.target/aarch64/cpunative/info_10: New test.
>   * gcc.target/aarch64/cpunative/info_11: New test.
>   * gcc.target/aarch64/cpunative/info_12: New test.
>   * gcc.target/aarch64/cpunative/info_13: New test.
>   * gcc.target/aarch64/cpunative/info_14: New test.
>   * gcc.target/aarch64/cpunative/info_15: New test.
>   * gcc.target/aarch64/cpunative/info_2: New test.
>   * gcc.target/aarch64/cpunative/info_3: New test.
>   * gcc.target/aarch64/cpunative/info_4: New test.
>   * gcc.target/aarch64/cpunative/info_5: New test.
>   * gcc.target/aarch64/cpunative/info_6: New test.
>   * gcc.target/aarch64/cpunative/info_7: New test.
>   * gcc.target/aarch64/cpunative/info_8: New test.
>   * gcc.target/aarch64/cpunative/info_9: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_0.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_1.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_10.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_11.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_12.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_13.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_14.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_15.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_2.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_3.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_4.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_5.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_6.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_7.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_8.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_9.c: New test.
> 
> --

diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp 
b/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp
new file mode 100644
index 
..ce80ca04b8d095c96acf0bda4d5b16a6460c4cbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/aarch64-cpunative.exp
@@ -0,0 +1,35 @@
+# Copyright (C) 2014-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if ![istarget aarch64*-*-*] then {


May as well add a test for native here to avoid going into the tests themselves?

Ok either way with me.
Thanks,
Kyrill

+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp



Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

2020-07-09 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This patch fixes a couple of issues in AArch64's -mcpu=native processing:
>
> The buffer used to read the lines from /proc/cpuinfo is 128 bytes long.  While
> this was enough in the past with the increase in architecture extensions it is
> no longer enough.   It results in two bugs:
>
> 1) No option string longer than 127 characters is correctly parsed.  Features
>that are supported are silently ignored.
>
> 2) It incorrectly enables features that are not present on the machine:
>   a) It checks for substring matching instead of full word matching.  This 
> makes
>  it incorrectly detect sb support when ssbs is provided instead.
>   b) Due to the truncation at the 127 char border it also incorrectly enables
>  features due to the full feature being cut off and the part that is left
>  accidentally enables something else.
>
> This breaks -mcpu=native detection on some of our newer system.
>
> The patch fixes these issues by reading full lines up to the \n in a string.
> This gives us the full feature line.  Secondly it creates a set from this 
> string
> to:
>
>  1) Reduce matching complexity from O(n*m) to O(n*logm).
>  2) Perform whole word matching instead of substring matching.
>
> To make this code somewhat cleaner I also changed from using char* to using
> std::string and std::set.
>
> Note that I have intentionally avoided the use of ifstream and stringstream
> to make it easier to backport.  I have also not change the substring matching
> for the initial line classification as I cannot find a documented cpuinfo 
> format
> which leads me to believe there may be kernels out there that require this 
> which
> may be why the original code does this.
>
> I also do not want this to break if the kernel adds a new line that is long 
> and
> indents the file by two tabs to keep everything aligned.  In short I think an
> imprecise match is the right thing here.
>
> Test for this is added as the last thing in this series as it requires some
> changes to be made to be able to test this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Sorry to be awkward.  I know Kyrill's already approved this,
but I kind-of object.

I think we should split this into two: fixing the limit bug,
and fixing the complexity.  It's easier to justify backporting
the fix for the limit bug than the fix for the complexity.

For fixing the limit bug, it seems better to use an obstack to read the
file instead.  This should actually be much easier than what the patch
does. :-)  It should also be more efficient.

For fixing the complexity: does this actually make things noticeably
faster in practice?  The “m” in the O(n*m) is a fairly small constant.
The cost of making it O(n*logm) this way involves many more memory
allocations, so it isn't obvious that it's actually more efficient
in practice.  Do you have any timings?

If search time is a problem, building a hash_map might be better.

Thanks,
Richard


Re: [PATCH] rs6000: Allow MMA built-in initialization regardless of compiler options

2020-07-09 Thread Peter Bergner via Gcc-patches
On 7/8/20 11:02 PM, Peter Bergner wrote:
> Is this ok for trunk assuming the bootstrap and regression testing
> show no regressions?
> 
> This also affects GCC10, so I'd like to backport this before the release.
> Ok there too after it sits on trunk a day or two?
> 
> Peter
> 
> 
> gcc/
>   PR target/96125
>   * config/rs6000/rs6000-call.c (rs6000_init_builtins): Define the MMA
>   specific types __vector_quad and __vector_pair, and initialize the
>   MMA built-ins if TARGET_EXTRA_BUILTINS is set.
>   (mma_init_builtins): Don't test for mask set in rs6000_builtin_mask.
>   Remove now unneeded mask variable.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Add the
>   OPTION_MASK_MMA flag for power10 if not already set.
> 
> gcc/testsuite/
>   PR target/96125
>   * gcc.target/powerpc/pr96125.c: New test.

Trunk testing was clean with no regressions.

Peter



Re: [PATCH] [RISC-V] Add support for TLS stack protector canary access

2020-07-09 Thread Guo Ren via Gcc-patches

Hi Cooper,

Great Job!

Tested-by: Guo Ren 

Here is kernel related patch with tested result:

https://lore.kernel.org/linux-riscv/1594279697-72511-2-git-send-email-guo...@kernel.org/T/#u 



Best Regards
 Guo Ren

On 2020/7/8 上午10:51, cooper wrote:

The linux kernel guys are discussing about supporting TLS register based
stack proctector canary, the link is as follows:

https://lore.kernel.org/linux-riscv/202007051820.DABE7F87D7@keescook/T/#t
I implemented register based stack protector canary with reference to
aarch64 and x86. When adding -mstack-protector-guard=tls,
use -mstack-protector-guard= to specify a register such as tp
and mstack-protector-guard-offset= to specify the offset, then
the TLS stack protector canary code will be generated.

gcc/
* config/riscv/riscv-opts.h (stack_protector_guard): New enum.
* config/riscv/riscv.c (riscv_option_override): Handle
the new options.
* config/riscv/riscv.md (stack_protect_set): New pattern to handle
flexible stack protector guard settings.
(stack_protect_set_): Ditto.
(stack_protect_test): Ditto.
(stack_protect_test_): Ditto.
* config/riscv/riscv.opt (mstack-protector-guard=,
mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
options.
* doc/invoke.texi (Option Summary) [RISC-V Options]:
Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
-mstack-protector-guard-offset=.
(RISC-V Options): Ditto.

---
  gcc/ChangeLog | 18 
  gcc/config/riscv/riscv-opts.h |  6 +++
  gcc/config/riscv/riscv.c  | 41 ++
  gcc/config/riscv/riscv.md | 80 +++
  gcc/config/riscv/riscv.opt| 28 
  gcc/doc/invoke.texi   | 22 +-
  6 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ea2f78df22e..98745f9f946 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,21 @@
+2020-07-07  Cooper Qu  
+
+   * config/riscv/riscv-opts.h (stack_protector_guard): New enum.
+   * config/riscv/riscv.c (riscv_option_override): Handle
+   the new options.
+   * config/riscv/riscv.md (stack_protect_set): New pattern to handle
+   flexible stack protector guard settings.
+   (stack_protect_set_): Ditto.
+   (stack_protect_test): Ditto.
+   (stack_protect_test_): Ditto.
+   * config/riscv/riscv.opt (mstack-protector-guard=,
+   mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
+   options.
+   * doc/invoke.texi (Option Summary) [RISC-V Options]:
+   Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
+   -mstack-protector-guard-offset=.
+   (RISC-V Options): Ditto.
+
  2020-07-06  Richard Biener  
  
  	PR tree-optimization/96075

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 8f12e50b9f1..2a3f9d9eef5 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -51,4 +51,10 @@ enum riscv_align_data {
riscv_align_data_type_natural
  };
  
+/* Where to get the canary for the stack protector.  */

+enum stack_protector_guard {
+  SSP_TLS, /* per-thread canary in TLS block */
+  SSP_GLOBAL   /* global canary */
+};
+
  #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index bfb3885ed08..e606f24fa74 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -4775,6 +4775,47 @@ riscv_option_override (void)
   " [%<-mriscv-attribute%>]");
  #endif
  
+  if (riscv_stack_protector_guard == SSP_GLOBAL

+  && global_options_set.x_riscv_stack_protector_guard_offset_str)
+{
+  error ("incompatible options %<-mstack-protector-guard=global%> and "
+"%<-mstack-protector-guard-offset=%s%>",
+riscv_stack_protector_guard_offset_str);
+}
+
+  if (riscv_stack_protector_guard == SSP_TLS
+  && !(global_options_set.x_riscv_stack_protector_guard_offset_str
+  && global_options_set.x_riscv_stack_protector_guard_reg_str))
+{
+  error ("both %<-mstack-protector-guard-offset%> and "
+"%<-mstack-protector-guard-reg%> must be used "
+"with %<-mstack-protector-guard=sysreg%>");
+}
+
+  if (global_options_set.x_riscv_stack_protector_guard_reg_str)
+{
+  const char *str = riscv_stack_protector_guard_reg_str;
+  int reg = decode_reg_name (str);
+
+  if (!IN_RANGE (reg, 1, 31))
+   error ("%qs is not a valid base register in %qs", str,
+  "-mstack-protector-guard-reg=");
+
+  riscv_stack_protector_guard_reg = reg;
+}
+
+  if (global_options_set.x_riscv_stack_protector_guard_offset_str)
+{
+  char *end;
+  const char *str = riscv_stack_protector_guard_offset_str;
+  errno = 0;
+  long offs = strtol (riscv_stack_protector_guard_offset_str, &e

Re: [PATCH] x86: Enable FMA in rsqrt2 expander

2020-07-09 Thread H.J. Lu via Gcc-patches
On Thu, Jul 9, 2020 at 5:04 AM Kirill Yukhin  wrote:
>
> On 07 июл 09:06, H.J. Lu wrote:
> > On Tue, Jul 7, 2020 at 8:56 AM Kirill Yukhin  
> > wrote:
> > >
> > > Hello HJ,
> > >
> > > On 28 июн 07:19, H.J. Lu via Gcc-patches wrote:
> > > > Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
> > > > rsqrt2 expander which expands to UNSPEC_RSQRT28 for 
> > > > TARGET_AVX512ER.
> > > > Although it doesn't show performance change in our workloads, FMA can
> > > > improve other workloads.
> > > >
> > > > gcc/
> > > >
> > > >   PR target/88713
> > > >   * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
> > > >   * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
> > > >   (rsqrt2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
> > > >   (rsqrtv16sf2): Removed.
> > > >
> > > > gcc/testsuite/
> > > >
> > > >   PR target/88713
> > > >   * gcc.target/i386/pr88713-1.c: New test.
> > > >   * gcc.target/i386/pr88713-2.c: Likewise.
> > >
> > > So, you've introduced new rsqrt expanders for DF vectors and relaxed
> > > condition for V16SF. What I didn't get is why did you change unspec
> > > type from RSQRT to RSQRT28 for V16SF expander?
> > >
> >
> > UNSPEC in define_expand is meaningless when the pattern is fully
> > expanded by ix86_emit_swsqrtsf.  I believe that UNSPEC in rsqrt2
> > expander can be removed.
>
> Agree.

I will leave UNSPEC alone here.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */
>
> I gues -O2 is useless here (and in -2.c test).

Fixed.

> Othwerwise LGTM.
>

This is the patch I am checking in.

Thanks.

-- 
H.J.
From 54d4f6c740ed885e55b0092895ef0986714211b3 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 23 Jan 2019 06:33:58 -0800
Subject: [PATCH] x86: Enable FMA in rsqrt2 expander

Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
rsqrt2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
Although it doesn't show performance change in our workloads, FMA can
improve other workloads.

gcc/

	PR target/88713
	* config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
	* config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
	(rsqrt2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
	(rsqrtv16sf2): Removed.

gcc/testsuite/

	PR target/88713
	* gcc.target/i386/pr88713-1.c: New test.
	* gcc.target/i386/pr88713-2.c: Likewise.
---
 gcc/config/i386/i386-expand.c | 18 -
 gcc/config/i386/sse.md| 24 ++-
 gcc/testsuite/gcc.target/i386/pr88713-1.c | 13 
 gcc/testsuite/gcc.target/i386/pr88713-2.c |  6 ++
 4 files changed, 42 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88713-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88713-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d81dd73f034..49718b7a41c 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15535,14 +15535,22 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
 	}
 }
 
+  mthree = force_reg (mode, mthree);
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
-  /* e1 = e0 * x0 */
-  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
 
-  /* e2 = e1 - 3. */
-  mthree = force_reg (mode, mthree);
-  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
+  if (TARGET_FMA || TARGET_AVX512F)
+emit_insn (gen_rtx_SET (e2,
+			gen_rtx_FMA (mode, e0, x0, mthree)));
+  else
+{
+  /* e1 = e0 * x0 */
+  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
+
+  /* e2 = e1 - 3. */
+  emit_insn (gen_rtx_SET (e2, gen_rtx_PLUS (mode, e1, mthree)));
+}
 
   mhalf = force_reg (mode, mhalf);
   if (recip)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 431571a4bc1..d3ad5833e1f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -326,6 +326,12 @@ (define_mode_iterator VF_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+;; AVX512VL SF/DF plus 128- and 256-bit SF vector modes
+(define_mode_iterator VF_AVX512VL_VF1_128_256
+  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX512VL")
+   (V2DF "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -2070,26 +2076,16 @@ (define_insn "*_vmsqrt2"
(set_attr "mode" "")])
 
 (define_expand "rsqrt2"
-  [(set (match_operand:VF1_128_256 0 "register_operand")
-	(unspec:VF1_128_256
-	  [(match_operand:VF1_128_256 1 "vector_operand")] UNSPEC_RSQRT))]
+  [(set (match_operand:VF_AVX512VL_VF1_128_256 0 "register_operand")
+	(unspec:VF_AVX512VL_VF1_128_256
+	  [(match_operand:VF_AVX512VL_VF1_

RE: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

2020-07-09 Thread Tamar Christina
Hi Richard,

Thanks for the review,

> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, July 9, 2020 1:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
> >
> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
> > long.  While this was enough in the past with the increase in architecture
> extensions it is
> > no longer enough.   It results in two bugs:
> >
> > 1) No option string longer than 127 characters is correctly parsed.  
> > Features
> >that are supported are silently ignored.
> >
> > 2) It incorrectly enables features that are not present on the machine:
> >   a) It checks for substring matching instead of full word matching.  This
> makes
> >  it incorrectly detect sb support when ssbs is provided instead.
> >   b) Due to the truncation at the 127 char border it also incorrectly 
> > enables
> >  features due to the full feature being cut off and the part that is 
> > left
> >  accidentally enables something else.
> >
> > This breaks -mcpu=native detection on some of our newer system.
> >
> > The patch fixes these issues by reading full lines up to the \n in a string.
> > This gives us the full feature line.  Secondly it creates a set from
> > this string
> > to:
> >
> >  1) Reduce matching complexity from O(n*m) to O(n*logm).
> >  2) Perform whole word matching instead of substring matching.
> >
> > To make this code somewhat cleaner I also changed from using char* to
> > using std::string and std::set.
> >
> > Note that I have intentionally avoided the use of ifstream and
> > stringstream to make it easier to backport.  I have also not change
> > the substring matching for the initial line classification as I cannot
> > find a documented cpuinfo format which leads me to believe there may
> > be kernels out there that require this which may be why the original code
> does this.
> >
> > I also do not want this to break if the kernel adds a new line that is
> > long and indents the file by two tabs to keep everything aligned.  In
> > short I think an imprecise match is the right thing here.
> >
> > Test for this is added as the last thing in this series as it requires
> > some changes to be made to be able to test this.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Sorry to be awkward.  I know Kyrill's already approved this, but I kind-of
> object.
> 
> I think we should split this into two: fixing the limit bug, and fixing the
> complexity.  It's easier to justify backporting the fix for the limit bug 
> than the
> fix for the complexity.

It was never the intention to fix the complexity. The change in complexity is 
just
simply because I split the feature strings into actual words now. I do so 
because
in my opinion this is simpler than doing memcmp and checking the the character
after the one you matched is a space or a null character, and then checking 
that you are
one character away from the previous space or at the start of the line.

Do you know of a simpler way to do word matches in C?

> 
> For fixing the limit bug, it seems better to use an obstack to read the file
> instead.  This should actually be much easier than what the patch does. :-)  
> It
> should also be more efficient.
> 

Sorry.. I genuinely don't understand. I had looked at obstack and the interface 
seemed to be
more work to me.  I am probably wrong since this interface has zero 
documentation
but it looks like to use obstack I have to (based on guessing from what other 
code are doing)

1. call gcc_obstack_init
2. call obstack_alloc
3. call obstack_grow any time I need a bigger buffer.
4. call obstack_free

So I am missing why this is simpler than calling realloc... 
But obviously I am missing something.

> For fixing the complexity: does this actually make things noticeably faster in
> practice?  The “m” in the O(n*m) is a fairly small constant.
> The cost of making it O(n*logm) this way involves many more memory
> allocations, so it isn't obvious that it's actually more efficient in 
> practice.  Do
> you have any timings?

Well, no.. I didn't run any benchmarks. Did you have any in mind that you 
wanted me to run?

As I mentioned, the complexity change was just a byproduct of doing a much 
easier to see correct
word matching.  But if you want me to just add extra checks to the existing 
implementation to see
if it meets all the criteria of being a standalone word then sure I can do 
that.  I don't particularly find
that implementation easier to read and I would have thought correctness was to 
be favored over memory
allocation in something that is called once.

> 
> If search time is a problem, building a hash_map might be better.

I would have thought the red-black t

PING^2 [PATCH 0/2] x86: Add cmpmemsi for -minline-all-stringops

2020-07-09 Thread H.J. Lu via Gcc-patches
On Tue, Jun 9, 2020 at 9:32 AM H.J. Lu  wrote:
>
> On Sun, May 31, 2020 at 4:10 PM H.J. Lu  wrote:
> >
> > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was changed
> > by
> >
> > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > Author: Nick Clifton 
> > Date:   Fri Aug 12 16:26:11 2011 +
> >
> > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
> >
> > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> > pattern.
> > * doc/md.texi (cmpstrn): Note that the comparison stops if both
> > fetched bytes are zero.
> > (cmpstr): Likewise.
> > (cmpmem): Note that the comparison does not stop if both of the
> > fetched bytes are zero.
> >
> > Duplicate the cmpstrn pattern for cmpmem and expand cmpmem to "repz cmpsb"
> > for -minline-all-stringops.  The only difference is that the length
> > argument of cmpmem is guaranteed to be less than or equal to lengths of
> > 2 memory areas.  Since cmpmem expander may pass the actual string length
> > directly to cmpstrnqi patterns.  Pass a copy of the string length to
> > cmpstrnqi patterns to avoid changing the actual string length by cmpstrnqi
> > patterns.
> >
> > Tested on Linux/x86 and Linux/x86-64.  OK for master?
> >
> > H.J. Lu (2):
> >   x86: Pass a copy of the string length to cmpstrnqi
> >   x86: Add cmpmemsi for -minline-all-stringops
> >
> >  gcc/config/i386/i386-expand.c |  84 ++
> >  gcc/config/i386/i386-protos.h |   1 +
> >  gcc/config/i386/i386.md   |  80 -
> >  gcc/testsuite/gcc.target/i386/pr95151-1.c |  17 +++
> >  gcc/testsuite/gcc.target/i386/pr95151-2.c |  10 ++
> >  gcc/testsuite/gcc.target/i386/pr95151-3.c |  18 +++
> >  gcc/testsuite/gcc.target/i386/pr95151-4.c |  11 ++
> >  gcc/testsuite/gcc.target/i386/pr95443-1.c | 130 ++
> >  gcc/testsuite/gcc.target/i386/pr95443-2.c |  79 +
> >  9 files changed, 371 insertions(+), 59 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95151-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95151-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95151-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95151-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95443-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr95443-2.c
> >
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546920.html
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
>
>
> --
> H.J.



-- 
H.J.


Re: [PATCH] aarch64: Change costs for TX2 to expose more vectorization opportunities

2020-07-09 Thread Anton Youdkevitch

Richard,

Can you approve the backporting of the patch to GCC10?
Also, since I don't have the commit permission can you push
it if approved?

--
  Thanks,
  Anton

On 06.7.2020 21:04 , Richard Sandiford wrote:

Joel Jones  writes:

I approve of this patch. I'm responsible for GCC for TX2 at Marvell. Andrew 
Pinski should certainly chime in if he wants.

Ah, in that case, the patch is OK.

Thanks,
Richard




Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:58 -0700, Carl Love wrote:
> [PATCH 1/6] rs6000, Update support for vec_extract



Email subject needs to be updated too.  This is at least correct in-
line.  Here and subsequent messages in thread.


> 
> -
> V4 changes
>   rebased onto mainline 7/2/2020
>   Add iterator name to Change log
> 
> ---
> V3 changes
> 
>   Redo ChangeLog for code move.
>   Replace spaces with tabs in ChangeLog.
>   Replaced intruction names using * with the actual list of names.  For
>   example vextdu*vrx with the explicit instruction names vextdubvrx,
>   vextduhvrx, etc.
> -
> v2 changes
> 
> config/rs6000/altivec.md log entry for move from changed as suggested.
> 
> config/rs6000/vsx.md log entro for moved to here changed as suggested.
> 
> define_mode_iterator VI2 also moved, included in both change log entries
> 
> 
> GCC maintainers:
> 
> Move the existing vector extract support in altivec.md to vsx.md
> so all of the vector insert and extract support is in the same file.
> 
> The patch also updates the name of the builtins and descriptions for the
> builtins in the documentation file so they match the approved builtin
> names and descriptions.
> 
> The patch does not make any functional changes.
> 
> Please let me know if the changes are acceptable for mainline.  Thanks.
> 
>   Carl Love
> 
> --
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  
> 
>   * config/rs6000/altivec.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
>   (vextractl, vextractr)
>   (vextractl_internal, vextractr_internal for mode VI2)
>   (VI2): Move to ...
>   * config/rs6000/vsx.md: (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
>   (vextractl, vextractr)
>   (vextractl_internal, vextractr_internal for mode VI2)
>   (VI2):  ..here.
>   * gcc/doc/extend.texi: Update documentation for vec_extractl.
>   Replace builtin name vec_extractr with vec_extracth.  Update description
>   of vec_extracth.
> ---
>  gcc/config/rs6000/altivec.md | 64 -
>  gcc/config/rs6000/vsx.md | 66 ++
>  gcc/doc/extend.texi  | 78 ++--
>  3 files changed, 105 insertions(+), 103 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2ce9227c765..749b2c42c14 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -172,8 +172,6 @@
> UNSPEC_XXEVAL
> UNSPEC_VSTRIR
> UNSPEC_VSTRIL
> -   UNSPEC_EXTRACTL
> -   UNSPEC_EXTRACTR
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -184,8 +182,6 @@
> UNSPECV_DSS
>])
> 
> -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
> -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
>  ;; Longer vec int modes for rotate/mask ops
> @@ -786,66 +782,6 @@
>DONE;
>  })
> 
> -(define_expand "vextractl"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -   (match_operand:VI2 2 "altivec_register_operand")
> -   (match_operand:SI 3 "register_operand")]
> -  UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -{
> -  emit_insn (gen_vextractl_internal (operands[0], operands[1],
> -operands[2], operands[3]));
> -  emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -}
> -  else
> -emit_insn (gen_vextractr_internal (operands[0], operands[2],
> -  operands[1], operands[3]));
> -  DONE;
> -})
> -
> -(define_insn "vextractl_internal"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v")
> -   (match_operand:VEC_I 2 "altivec_register_operand" "v")
> -   (match_operand:SI 3 "register_operand" "r")]
> -  UNSPEC_EXTRACTL))]
> -  "TARGET_POWER10"
> -  "vextvlx %0,%1,%2,%3"
> -  [(set_attr "type" "vecsimple")])
> -
> -(define_expand "vextractr"
> -  [(set (match_operand:V2DI 0 "altivec_register_operand")
> - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand")
> -   (match_operand:VI2 2 "altivec_register_operand")
> -   (match_operand:SI 3 "register_operand")]
> -  UNSPEC_EXTRACTR))]
> -  "TARGET_POWER10"
> -{
> -  if (BYTES_BIG_ENDIAN)
> -{
> -  emit_insn (gen_vextractr_internal (operands[0], operands[1],
> -operands[2], operands[3]));
> -  emit_insn (gen_xxswapd_v2di (operands[0], operands[0]));
> -}
> -  else
> -emit_insn (gen_v

Ping Re: c: Add C2X BOOL_MAX and BOOL_WIDTH to limits.h

2020-07-09 Thread Joseph Myers
Ping for this limits.h patch 
.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 2/6] rs6000 Add vector insert builtin support
> 
> 
> V4 changes
>   Rebased on mainline.  Changed FUTURE to P10 as needed.
> 
> 
> V3 changes
> 
>   Replace spaces with of tabs in ChangeLog
>   Ditto in gcc/config/rs6000/vsx.md.
>   Updated description for vec_insertl() builtin.
>   Cleaned up vec_insert description.
> 
> -
> v2 changes
> 
> Fix change log entry for config/rs6000/altivec.h
> 
> Fix change log entry for config/rs6000/rs6000-builtin.def
> 
> Fix change log entry for config/rs6000/rs6000-call.c
> 
> vsx.md: Fixed if (BYTES_BIG_ENDIAN) else statements.
> Porting error from pu branch.
> 
> ---
> GCC maintainers:
> 
> This patch adds support for vec_insertl and vec_inserth builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline branch.
> 
> Thanks.
> 
>  Carl Love
> 
> --
> gcc/ChangeLog
> 
> 2020-07-02  Carl Love  
> 
>   * config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines.
>   * config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL,
>   VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL,
>   VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR,
>   VINSERTVPRHR, VINSERTVPRWR): New builtins.
>   (INSERTL, INSERTH): New builtins.
>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_INSERTL,
>   P10_BUILTIN_VEC_INSERTH): New overloaded definitions.
>   (P10_BUILTIN_VINSERTGPRBL, P10_BUILTIN_VINSERTGPRHL,
>   P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
>   P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
>   P10_BUILTIN_VINSERTVPRWL): Add case entries.
>   * config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL,
>   UNSPEC_INSERTR.
>   (define_expand): Add vinsertvl_, vinsertvr_,
>   vinsertgl_, vinsertgr_, mode is VI2.
>   (define_ins): vinsertvl_internal_, vinsertvr_internal_,
>   vinsertgl_internal_, vinsertgr_internal_, mode VEC_I.
>   * doc/extend.texi: Add documentation for vec_insertl, vec_inserth.
> 

ok

> gcc/testsuite/ChangeLog
> 
> 2020-07-02  Carl Love  
> 
>   * gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
> ---
>  gcc/config/rs6000/altivec.h   |   2 +
>  gcc/config/rs6000/rs6000-builtin.def  |  18 +
>  gcc/config/rs6000/rs6000-call.c   |  51 +++
>  gcc/config/rs6000/vsx.md  | 110 ++
>  gcc/doc/extend.texi   |  71 
>  .../powerpc/vec-insert-word-runnable.c| 345 ++
>  6 files changed, 597 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index bb1524f4a67..0563853c03f 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle,
>  /* Overloaded built-in functions for ISA 3.1.  */
>  #define vec_extractl(a, b, c)__builtin_vec_extractl (a, b, c)
>  #define vec_extracth(a, b, c)__builtin_vec_extracth (a, b, c)
> +#define vec_insertl(a, b, c)   __builtin_vec_insertl (a, b, c)
> +#define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
> 
>  #define vec_gnb(a, b)__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)   __builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index 363656ec05c..e73d144c1cc 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2708,6 +2708,22 @@ BU_P10V_3 (VEXTRACTHR, "vextduhvhx", CONST, 
> vextractrv8hi)
>  BU_P10V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si)
>  BU_P10V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di)
> 
> +BU_P10V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi)
> +BU_P10V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi)
> +BU_P10V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si)
> +BU_P10V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di)
> +BU_P10V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi)
> +BU_P10V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi)
> +BU_P10V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si)
> +
> +BU_P10V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi)
> +BU_P10V_3 (VINSERTGPRHR, "vinsguhvrx", CONST, vinsertgr_v8hi)
> +BU_P10V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si)
> +BU_P10V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di)
> +BU_P10V_3 (VIN

Re: [PATCH] handle MEM_REF with void* arguments (PR c++/95768)

2020-07-09 Thread Martin Sebor via Gcc-patches

On 6/29/20 1:19 AM, Richard Biener wrote:

On Mon, Jun 29, 2020 at 1:08 AM Martin Sebor  wrote:


On 6/23/20 1:12 AM, Richard Biener wrote:

On Tue, Jun 23, 2020 at 12:22 AM Martin Sebor via Gcc-patches
 wrote:


On 6/22/20 12:55 PM, Jason Merrill wrote:

On 6/22/20 1:25 PM, Martin Sebor wrote:

The attached fix parallels the one for the equivalent C bug 95580
where the pretty printers don't correctly handle MEM_REF arguments
with type void* or other pointers to an incomplete type.

The incorrect handling was exposed by the recent change to
-Wuninitialized which includes such expressions in diagnostics.



+if (tree size = TYPE_SIZE_UNIT (TREE_TYPE (argtype)))
+  if (!integer_onep (size))
+{
+  pp_cxx_left_paren (pp);
+  dump_type (pp, ptr_type_node, flags);
+  pp_cxx_right_paren (pp);
+}


Don't we want to print the cast if the pointer target type is incomplete?


I suppose, yes, although after some more testing I think what should
be output is the type of the access.  The target pointer type isn't
meaningful (at least not in this case).

Here's what the warning looks like in C for the test case in
gcc.dg/pr95580.c:

 warning: ‘*((void *)(p)+1)’ may be used uninitialized

and like this in C++:

 warning: ‘*(p +1)’ may be used uninitialized

The +1 is a byte offset, which is correct given that incrementing
a void* in GCC is the same as adding 1 to the byte address, but
dereferencing a void* doesn't correspond to what's going on in
the source.

Even for a complete type (with size greater than 1), printing
the type of the argument plus a byte offset is wrong.  It ends
up with this for the C++ test case from 95768:

 warning: ‘*((int*) +4)’ is used uninitialized

when the access is actually ‘*((int*) +1)’

So it seems to me for MEM_REF, to make the output meaningful,
it's the type of the access (i.e., the MEM_REF type) that should
be printed here, and the offset should either be in elements of
the accessed type, i.e.,

 warning: ‘*((int*) +1)’ is used uninitialized

or, if the access is misaligned, the argument should first be
cast to char*, the offset added, and the result then cast to
the access type, like this:

 warning: ‘*(T*)((char*) +1)’ is used uninitialized

The attached revised and less than fully tested patch implements
this for C++ only for now.  If we agree on this approach I'll see
about making the corresponding change in C.


Note that there is no C/C++ way of fully expressing MEM_REF
semantics.  __MEM  ((T *)p + 1) is not actually
*(int *)((char *)p + 1) because that does not reflect that the
effective type of the lvalue when TBAA is concerned is 'T'
rather than 'int'.


What form would you say is closest to the C/C++ semantics, or
likely the most useful to users, that GCC could print instead?


Hmm, I'd try *() maybe?  Because there's
no C/C++ that can express what GIMPLE can do here.  Of course
pattern matching the exact cases we can handle like your patch
is an improvement (but as said the TBAA issue is still present).


"pointer derived from" would be misleading because of C++ class
derivation.  But more important, I think the output needs to
reflect what the warning actually is based on.  Leaving out
salient details like types or offsets from warnings about
out-of-bounds accesses makes analyzing them more difficult,
both for users and for us during initial triage.


Note for MEM_REF the offset is always
a constant byte offset but it indeed does not have to be a
multiple of the MEM_REF type size.

I wonder whether printing the MEM_REF in full provides
any real diagnostic value in the more "obfuscated" cases.


I'm not sure what obfuscated cases you're thinking of, or what
you mean by printing it in full.


I think that printing ‘*(T*)((char*) +1)’ is likely going
to confuse users because they cannot match this to a source
location.  If we have a source location we should have caret
diagnostics.


  I instrumented the code to
print every MEM_REF in that comes up in warn_uninitialized_vars
and rebuilt GCC.  There are 17,456 distinct instances so I didn't
review them all but those I did look at all look reasonable.
Probably the least useful are those that mention  by
itself (i.e.,  or *).  Those with an offset
are more informative (e.g., *((access**) +1).  In
a few the offset is very large, such as *((unsigned int*)sp
+4611686018427387900), but that doesn't seem like a problem.
I'd be happy to share the result.


Here +4611686018427387900 should be printed as -4, MEM_REF
offsets are to be interpreted as signed.


Sure, converting the offset to signed sounds like a good idea.





I'd also not print  but .


I also don't find  helpful, but I don't see 
as an improvement.  I think printing the SSA variable would be
more informative here since its name is usually related to
the variable it was derived from in the source.  But making that
change (or any other like it) feels like too much feature creep

Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 3/6] rs6000, Add vector replace builtin support
> 
> --
> V4 Fixes:
> 
>Rebased on mainline.  Changed FUTURE to P10 in code and ChangeLog.
>Set DEBUG to 0 in vec-replace-word-runnable.c test program.
>Fixed too long lines in ChangeLog.
> 
> --
> V3 fixes:
>Fixed bad word breaks in ChangLog.
>Replace spaces with tabs in ChangeLog.
> 
> 
> v2 fixes:
> 
> change log entries config/rs6000/vsx.md, config/rs6000/rs6000-builtin.def,
> config/rs6000/rs6000-call.c.
> 
> gcc/config/rs6000/rs6000-call.c: fixed if check for 3rd arg between 0 and 3
>  fixed if check for 3rd arg between 0 and 12
> 
> gcc/config/rs6000/vsx.md: removed REPLACE_ELT_atr definition and used
>   VS_scalar instead.
>   removed REPLACE_ELT_inst definition and used
>  instead
>   fixed spelling mistake on Endianness.
>   fixed indenting for vreplace_elt_
> 
> ---
> 
> GCC maintainers:
> 
> The following patch adds support for builtins vec_replace_elt and
> vec_replace_unaligned.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline
> branch.  Thanks.
> 
>  Carl Love
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-07-06 Carl Love  
> 
>   * config/rs6000/altivec.h: Add define for vec_replace_elt and
>   vec_replace_unaligned.
>   * config/rs6000/vsx.md (UNSPEC_REPLACE_ELT, UNSPEC_REPLACE_UN): New.

New unspecs.


>   (REPLACE_ELT): New mode iterator.
>   (REPLACE_ELT_atr, REPLACE_ELT_inst, REPLACE_ELT_char,
>   REPLACE_ELT_sh, REPLACE_ELT_max): New mode attributes.


_atr and _inst are no longer in the patch.  


>   (vreplace_un_, vreplace_elt__inst): New.



>   * config/rs6000/rs6000-builtin.def (VREPLACE_ELT_V4SI,
>   VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_UV2DI,
>   VREPLACE_ELT_V2DF, VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI,
>   VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI,
>   VREPLACE_UN_V2DF, (REPLACE_ELT, REPLACE_UN): New.

New builtin entries.

VREPLACE_ELT_V2DI is missing from list.


>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_REPLACE_ELT,
>   P10_BUILTIN_VEC_REPLACE_UN): New.

New what?

>   (rs6000_expand_ternop_builtin): Add 3rd argument checks for

diff suggests this is in rs6000_expand_quaternop_builtin()  ? 


>   CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf,
>   CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf.
>   (builtin_function_type) [P10_BUILTIN_VREPLACE_ELT_UV4SI,
>   P10_BUILTIN_VREPLACE_ELT_UV2DI, P10_BUILTIN_VREPLACE_UN_UV4SI,
>   P10_BUILTIN_VREPLACE_UN_UV2DI]: New cases.



>   * doc/extend.texi: Add description for vec_replace_elt and
>   vec_replace_unaligned builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06 Carl Love  
> 
>   * gcc.target/powerpc/vec-replace-word.c: Add new test.


s/Add new/New/

Nothing else jumped out at me below.


Thanks
-Will





Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 4/6] rs6000, Add vector shift double builtin support
> 

Nothing popped out at me for this patch.
lgtm

thanks
-Will


> --
> V4 Fixes:
> 
>Rebased on mainline.  Changed FUTURE to P10.
>Changed SLDB_LR to SLDB_lr
>Changed error ("argument 3 must be in the range 0 to 7"); to
>error ("argument 3 must be a constant in the range 0 to 7");
> 
> -
> V3 Fixes
>   Replace spaces with tabs in ChangeLog.
>   Minor edits to ChangeLog entry.
>   Minor edits to vec_sldb description in gcc/doc/extend.texi.
> 
> 
> v2 fixes:
> 
>  change logs redone
> 
>   gcc/config/rs6000/rs6000-call.c - added spaces before parenthesis
> around args.
> 
> -
> GCC maintainers:
> 
> The following patch adds support for the vector shift double
> builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> and Mambo with no regression errors.
> 
> Please let me know if this patch is acceptable for the mainline
> branch.
> 
> Thanks.
> 
>  Carl Love
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  
> 
>   * config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
>   * config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
>   (SLDB_LR): New attribute.
>   (VSHIFT_DBL_LR): New iterator.
>   (vsdb_): New define_insn.
>   * config/rs6000/rs6000-builtin.def (VSLDB_V16QI, VSLDB_V8HI,
>   VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI,
>   VSRDB_V2DI): New BU_P10V_3 definitions.
>   (SLDB, SRDB): New BU_P10_OVERLOAD_3 definitions.
>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_SLDB,
>   P10_BUILTIN_VEC_SRDB): New definitions.
>   (rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
>   CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
>   CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
>   CODE_FOR_vsrdb_v2di}: Add clauses.
>   * doc/extend.texi: Add description for vec_sldb and vec_srdb.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06  Carl Love  
> 
>   * gcc.target/powerpc/vec-shift-double-runnable.c:  New test
> file.
> ---
>  gcc/config/rs6000/altivec.h   |   2 +
>  gcc/config/rs6000/altivec.md  |  18 +
>  gcc/config/rs6000/rs6000-builtin.def  |  12 +
>  gcc/config/rs6000/rs6000-call.c   |  70 
>  gcc/doc/extend.texi   |  53 +++
>  .../powerpc/vec-shift-double-runnable.c   | 384
> ++
>  6 files changed, 539 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-
> double-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 560c43cfc99..c202fcf25da 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -703,6 +703,8 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
>  #define vec_replace_elt(a, b, c)   __builtin_vec_replace_elt (a,
> b, c)
>  #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a,
> b, c)
> +#define vec_sldb(a, b, c)  __builtin_vec_sldb (a, b, c)
> +#define vec_srdb(a, b, c)  __builtin_vec_srdb (a, b, c)
> 
>  #define vec_gnb(a, b)__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)   __builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 749b2c42c14..c58fb3961e0 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -172,6 +172,8 @@
> UNSPEC_XXEVAL
> UNSPEC_VSTRIR
> UNSPEC_VSTRIL
> +   UNSPEC_SLDB
> +   UNSPEC_SRDB
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -782,6 +784,22 @@
>DONE;
>  })
> 
> +;; Map UNSPEC_SLDB to "l" and  UNSPEC_SRDB to "r".
> +(define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
> +   (UNSPEC_SRDB "r")])
> +
> +(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> +
> +(define_insn "vsdb_"
> + [(set (match_operand:VI2 0 "register_operand" "=v")
> +  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> +(match_operand:VI2 2 "register_operand" "v")
> +(match_operand:QI 3 "const_0_to_12_operand" "n")]
> +   VSHIFT_DBL_LR))]
> +  "TARGET_POWER10"
> +  "vsdbi %0,%1,%2,%3"
> +  [(set_attr "type" "vecsimple")])
> +
>  (define_expand "vstrir_"
>[(set (match_operand:VIshort 0 "altivec_register_operand")
>   (unspec:VIshort [(match_operand:VIshort 1
> "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index e22b3e4d53b..c6fdfadeda8 100644
> --- a/gcc/

Re: [PATCH 5/6] Docs: Document environment setting directives for testsuite

2020-07-09 Thread Sandra Loosemore

On 7/9/20 3:57 AM, Tamar Christina wrote:

Hi All,

This document some of the existing DejaGnu directives to modify
environment variables before test or compiler invocations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (dg-set-compiler-env-var,
dg-set-target-env-var): Document.



Looks good to me.

-Sandra



Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-09 Thread Segher Boessenkool
Hi Jiufu,

On Thu, Jul 09, 2020 at 04:01:38PM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> >> But for each single condition, loop unrolling may still be helpful.
> >> While, if these conditions are all occur in a loop, it would be more
> >> possible to get negative impacts after unrolled.
> >
> > Yes, but how many loops have *all* these conditions?  That is my problem
> > with it: it is only tested with one specific loop, and only benefits
> > that loop.
> 
> I also encounter a few of this kind of loops, some in hot path of leela
> and perlbench, and had negative impact leelar_r (~2%), perlbench(>0.5%),
> and also gcc_r.  I had a quick count, there are ~500 this kind of loops
> occur in specint build.

If the generic code decides to unroll big loops with calls *and* jumps,
there is a big problem there?

On most targets, it should not unroll loops with calls *at all*, for
example.

> >> While, actually, here we would need condition to define *complex* loop,
> >> where contains call exist (may just 1), branch exist(may 2) and early
> >> exit(may 1) at the same time, but each number is not large.
> >> Any sugguestions? Thanks.
> >
> > How many loops have you seen where all those conditions are true, but
> > the generic code still decides to unroll things?
> Some occur as above said.  I use -fopt-info to compare the changed
> unroll_adjust_hook to check loops of this kind.

But there are a few cases where we *do* want to unroll, you say?  What
is special about those cases, what do they do differently?


Segher


Re: [PATCH] rs6000: Allow MMA built-in initialization regardless of compiler options

2020-07-09 Thread Segher Boessenkool
Hi!

On Wed, Jul 08, 2020 at 11:02:45PM -0500, Peter Bergner wrote:
> PR96125 shows a bug when we try to use an MMA built-in within a function
> that uses #pragma target/attribute target to enable power10 code generation
> and the -mcpu= command line option is pre-power10.
> 
> The problem is that we only initialize built-ins once, fairly early, when
> the command line options are in force.  If the -mcpu= is pre-power10,
> then we fail to initialize the MMA built-ins at all and so they are not
> available to call in a #pragma target/attribute target function.
> 
> The patch below basically always (on server type cpus) initializes the MMA
> built-ins so we can use them in #pragma target/attribute target functions.
> The patch below fixes the bug and is currently in the middle of testing.
> 
> Is this ok for trunk assuming the bootstrap and regression testing
> show no regressions?
> 
> This also affects GCC10, so I'd like to backport this before the release.
> Ok there too after it sits on trunk a day or two?

> gcc/
>   PR target/96125
>   * config/rs6000/rs6000-call.c (rs6000_init_builtins): Define the MMA
>   specific types __vector_quad and __vector_pair, and initialize the
>   MMA built-ins if TARGET_EXTRA_BUILTINS is set.
>   (mma_init_builtins): Don't test for mask set in rs6000_builtin_mask.
>   Remove now unneeded mask variable.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Add the
>   OPTION_MASK_MMA flag for power10 if not already set.
> 
> gcc/testsuite/
>   PR target/96125
>   * gcc.target/powerpc/pr96125.c: New test.

Okay for trunk and 10 (but see test nit below).  Thanks!

> -  /* Create Altivec and VSX builtins on machines with at least the
> +  /* Create Altivec, VSX and MMA builtins on machines with at least the
>   general purpose extensions (970 and newer) to allow the use of
>   the target attribute.  */
>if (TARGET_EXTRA_BUILTINS)
> -altivec_init_builtins ();
> -  if (TARGET_MMA)
> -mma_init_builtins ();
> +{
> +  altivec_init_builtins ();
> +  mma_init_builtins ();
> +}
>if (TARGET_HTM)
>  htm_init_builtins ();

So maybe we should just do all builtins always?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96125.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */

powerpc_vsx_ok is not the right test for -mcpu=power8 (it means p7).
Usually powerpc_p8vector_ok is used...  not a great situation, but :-)


Segher


Re: [PATCH 4/6] Testsuite: Make it easier to debug environment setting functions

2020-07-09 Thread Mike Stump via Gcc-patches
On Jul 9, 2020, at 2:56 AM, Tamar Christina  wrote:
> This adds verbose output to dg-set-compiler-env-var and dg-set-target-env-var
> so you can actually see what they're setting when you add -v -v.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Ok.


Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.

2020-07-09 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Richard,
>
> Thanks for the review,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Thursday, July 9, 2020 1:35 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 1/6] AArch64: Fix bugs in -mcpu=native detection.
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This patch fixes a couple of issues in AArch64's -mcpu=native processing:
>> >
>> > The buffer used to read the lines from /proc/cpuinfo is 128 bytes
>> > long.  While this was enough in the past with the increase in architecture
>> extensions it is
>> > no longer enough.   It results in two bugs:
>> >
>> > 1) No option string longer than 127 characters is correctly parsed.  
>> > Features
>> >that are supported are silently ignored.
>> >
>> > 2) It incorrectly enables features that are not present on the machine:
>> >   a) It checks for substring matching instead of full word matching.  This
>> makes
>> >  it incorrectly detect sb support when ssbs is provided instead.
>> >   b) Due to the truncation at the 127 char border it also incorrectly 
>> > enables
>> >  features due to the full feature being cut off and the part that is 
>> > left
>> >  accidentally enables something else.
>> >
>> > This breaks -mcpu=native detection on some of our newer system.
>> >
>> > The patch fixes these issues by reading full lines up to the \n in a 
>> > string.
>> > This gives us the full feature line.  Secondly it creates a set from
>> > this string
>> > to:
>> >
>> >  1) Reduce matching complexity from O(n*m) to O(n*logm).
>> >  2) Perform whole word matching instead of substring matching.
>> >
>> > To make this code somewhat cleaner I also changed from using char* to
>> > using std::string and std::set.
>> >
>> > Note that I have intentionally avoided the use of ifstream and
>> > stringstream to make it easier to backport.  I have also not change
>> > the substring matching for the initial line classification as I cannot
>> > find a documented cpuinfo format which leads me to believe there may
>> > be kernels out there that require this which may be why the original code
>> does this.
>> >
>> > I also do not want this to break if the kernel adds a new line that is
>> > long and indents the file by two tabs to keep everything aligned.  In
>> > short I think an imprecise match is the right thing here.
>> >
>> > Test for this is added as the last thing in this series as it requires
>> > some changes to be made to be able to test this.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> 
>> Sorry to be awkward.  I know Kyrill's already approved this, but I kind-of
>> object.
>> 
>> I think we should split this into two: fixing the limit bug, and fixing the
>> complexity.  It's easier to justify backporting the fix for the limit bug 
>> than the
>> fix for the complexity.
>
> It was never the intention to fix the complexity. The change in complexity is 
> just
> simply because I split the feature strings into actual words now. I do so 
> because
> in my opinion this is simpler than doing memcmp and checking the the character
> after the one you matched is a space or a null character, and then checking 
> that you are
> one character away from the previous space or at the start of the line.
>
> Do you know of a simpler way to do word matches in C?

This isn't an anti-C++ thing :-)  I just think any clean-up should
be separate from the bugfix, and only the bugfix should be backported.

>> For fixing the limit bug, it seems better to use an obstack to read the file
>> instead.  This should actually be much easier than what the patch does. :-)  
>> It
>> should also be more efficient.
>> 
>
> Sorry.. I genuinely don't understand. I had looked at obstack and the 
> interface seemed to be
> more work to me.  I am probably wrong since this interface has zero 
> documentation
> but it looks like to use obstack I have to (based on guessing from what other 
> code are doing)
>
> 1. call gcc_obstack_init
> 2. call obstack_alloc
> 3. call obstack_grow any time I need a bigger buffer.
> 4. call obstack_free
>
> So I am missing why this is simpler than calling realloc... 
> But obviously I am missing something.

Yeah, (1), (3) and (4) are the right steps.  But the simplifying
thing is that obstack has “optimised” single-character insertions.
So rather than going by fgets, we can just do something like:

  while ((ch = getc (f)) != EOF)
{
  obstack_1grow (&ob, ch);
  if (ch == '\n')
break;
}
  obstack_1grow (&ob, 0);
  return obstack_finish (&ob);

to read a zero-terminated line.  Then use:

  obstack_free (&ob, line);

after each line.  (Optional, but more efficient, since it reuses memory.)

This saves doing a fresh xmalloc+free for each line.

>> For fixing the complexity: does this actually make things noticeably faster 
>> in
>> practice?  The “m” in the O(n*m) 

Re: [PATCH 4/6] Testsuite: Make it easier to debug environment setting functions

2020-07-09 Thread Richard Sandiford
Tamar Christina  writes:
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index 
> 27cc7c19625d8bff12268de16ec0c20d558394b7..ccf6b7cc70b1d4268c0b0c65f960fb42deab0d88
>  100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -477,7 +477,10 @@ proc dg-set-target-env-var { args } {
>   error "dg-set-target-env-var: need two arguments"
>   return
>  }
> -lappend set_target_env_var [list [lindex $args 1] [lindex $args 2]]
> +set var [lindex $args 1]
> +set value [lindex $args 2]
> +verbose "dg-set-compiler-env-var $var $value" 2

“dg-set-target-env-var”

OK with that change, thanks.

Richard

> +lappend set_target_env_var [list $var $value]
>  }
 
>  proc set-target-env-var { } {
> @@ -518,6 +521,7 @@ proc dg-set-compiler-env-var { args } {
>  }
>  set var [lindex $args 1]
>  set value [lindex $args 2]
> +verbose "dg-set-compiler-env-var $var $value" 2
>  if [info exists ::env($var)] {
>lappend saved_compiler_env_var [list $var 1 $::env($var)]
>  } else {




Re: [PATCH 5/6] Docs: Document environment setting directives for testsuite

2020-07-09 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This document some of the existing DejaGnu directives to modify
> environment variables before test or compiler invocations.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * doc/sourcebuild.texi (dg-set-compiler-env-var,
>   dg-set-target-env-var): Document.

OK, thanks.

Richard


Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 5/6] rs6000, Add vector splat builtin support
> 
> --
> V4 Fixes:
> 
>Rebased on mainline.  Changed FUTURE to P10.
>define_predicate "s32bit_cint_operand" removed unnecessary cast in
>  definition.
>Changed define_expand "xxsplti32dx_v4si" to use "0" for constraint
>  of operand 1.
>Changed define_insn "xxsplti32dx_v4si_inst" to use "0 for constraint
>  of operand 1.
>Removed define_predicate "f32bit_const_operand".  Use const_double_operand
>  instead.
> 
>*** Please provide feedback for the following change:
>(define_insn "xxspltidp_v2df_inst", Added print statement to warn of
>possible undefined behavior.  The xxspltidp instruction result is
>undefined for subnormal inputs.  I added a test for subnormal input with
>a fprintf to stderr to warn the "user" if the constant input is a subnormal
>value.  I tried assert initially, but that causes GCC to exit ungracefully
>with no information as to why.  I really didn't like that behavior.
>A subnormal input is not really a fatal error but the "user" needs
>to be told it is not a good idea.  Not sure if using an fprintf statement
>in a define_insn is an acceptable thing either.  But it does give the
>user the needed input and GCC exits normally.  Let me know if there
>is a better option here.


Maybe this should be an RFC/Patch then..   Put this in the intro
paragraph, not in the v4 fixes blurb. 


I certainly defer to other opinions here.
I don't see any other define_insn entries that emit warning printfs for 
undefined results.
I'd lean towards dropping that part here.

You may be able to add some logic over in rs6000-call.c  rs6000_expand_* to 
handle this.
Though I think there are many cases where undefined results can be the result 
of user input.
Perhaps just documenting this (extend.texi) is enough? 



Nothing else below jumped out at me.

thanks, 
-Will





Re: [PATCH ver 4] RS6000, add VSX mask manipulation support

2020-07-09 Thread Segher Boessenkool
Hi Carl,

On Wed, Jul 08, 2020 at 01:22:33PM -0700, Carl Love wrote:
> The following patch adds support for builtins vec_genbm(),  vec_genhm(),
> vec_genwm(), vec_gendm(), vec_genqm(), vec_cntm(), vec_expandm(),
> vec_extractm().  Support for instructions mtvsrbm, mtvsrhm, mtvsrwm,
> mtvsrdm, mtvsrqm, cntm, vexpandm, vextractm.

> +;; Mode attribute to give the suffix for the mask instruction
> +(define_mode_attr VSX_MM_SUFFIX [(V16QI "b")
> +  (V8HI "h")
> +  (V4SI "w")
> +  (V2DI "d")
> +  (V1TI "q")])

This could just use ?

> +(define_insn "vec_mtvsr_"
> +  [(set (match_operand:VSX_MM 0 "altivec_register_operand" "=v")
> +(unspec:VSX_MM [(match_operand:DI 1 "gpc_reg_operand" "b")]
> +UNSPEC_MTVSBM))]
> +  "TARGET_POWER10"
> +  "mtvsrm %0,%1";
> +  [(set_attr "type" "vecsimple")])

This can take any "r", not just "b" I think?  I.e. r0 is allowed here.

The rest looks great.  Okay with those tweaks.  Thanks!


Segher


[PATCH] fixup BIT_FIELD_REF detection in SLP discovery

2020-07-09 Thread Richard Biener
This fixes a thinko where we end up combining a BIT_FIELD_REF
and a memory access, fixed by checking all stmts are a load or
none.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

2020-07-09  Richard Biener  

PR tree-optimization/96133
* tree-vect-slp.c (vect_build_slp_tree_1): Compare load_p
status between stmts.
---
 gcc/tree-vect-slp.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 35ae6984593..b3645b0a820 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -760,7 +760,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   machine_mode optab_op2_mode;
   machine_mode vec_mode;
   stmt_vec_info first_load = NULL, prev_first_load = NULL;
-  bool load_p = false;
+  bool first_stmt_load_p = false, load_p = false;
 
   /* For every stmt in NODE find its def stmt/s.  */
   stmt_vec_info stmt_info;
@@ -850,6 +850,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
{
  *node_vectype = vectype;
  first_stmt_code = rhs_code;
+ first_stmt_load_p = load_p;
 
  /* Shift arguments should be equal in all the packed stmts for a
 vector shift with scalar shift operand.  */
@@ -931,24 +932,25 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  if (first_stmt_code != rhs_code
  && alt_stmt_code == ERROR_MARK)
alt_stmt_code = rhs_code;
- if (first_stmt_code != rhs_code
- && (first_stmt_code != IMAGPART_EXPR
- || rhs_code != REALPART_EXPR)
- && (first_stmt_code != REALPART_EXPR
- || rhs_code != IMAGPART_EXPR)
- /* Handle mismatches in plus/minus by computing both
-and merging the results.  */
- && !((first_stmt_code == PLUS_EXPR
-   || first_stmt_code == MINUS_EXPR)
-  && (alt_stmt_code == PLUS_EXPR
-  || alt_stmt_code == MINUS_EXPR)
-  && rhs_code == alt_stmt_code)
- && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
-   && (first_stmt_code == ARRAY_REF
-   || first_stmt_code == BIT_FIELD_REF
-   || first_stmt_code == INDIRECT_REF
-   || first_stmt_code == COMPONENT_REF
-   || first_stmt_code == MEM_REF)))
+ if ((first_stmt_code != rhs_code
+  && (first_stmt_code != IMAGPART_EXPR
+  || rhs_code != REALPART_EXPR)
+  && (first_stmt_code != REALPART_EXPR
+  || rhs_code != IMAGPART_EXPR)
+  /* Handle mismatches in plus/minus by computing both
+ and merging the results.  */
+  && !((first_stmt_code == PLUS_EXPR
+|| first_stmt_code == MINUS_EXPR)
+   && (alt_stmt_code == PLUS_EXPR
+   || alt_stmt_code == MINUS_EXPR)
+   && rhs_code == alt_stmt_code)
+  && !(STMT_VINFO_GROUPED_ACCESS (stmt_info)
+   && (first_stmt_code == ARRAY_REF
+   || first_stmt_code == BIT_FIELD_REF
+   || first_stmt_code == INDIRECT_REF
+   || first_stmt_code == COMPONENT_REF
+   || first_stmt_code == MEM_REF)))
+ || first_stmt_load_p != load_p)
{
  if (dump_enabled_p ())
{
-- 
2.26.2


[committed] c++: Partially revert fix for PR c++/95497 [PR96132]

2020-07-09 Thread Patrick Palka via Gcc-patches
I was mistaken to assume that a dependent type is necessarily
incomplete, and indeed there are multiple places in the frontend where
we check a type for both dependency and completeness.  So this patch
partially reverts the fix for PR95497, restoring the dependent_type_p
check that guarded the call to is_really_empty_class below.

gcc/cp/ChangeLog:

PR c++/96132
* constexpr.c (potential_constant_expression_1) :
Restore dependent_type_p check that guarded the call to
is_really_empty_class.

gcc/testsuite/ChangeLog:

PR c++/96132
* g++.dg/template/incomplete12.C: New test.
---
 gcc/cp/constexpr.c   | 1 +
 gcc/testsuite/g++.dg/template/incomplete12.C | 9 +
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/incomplete12.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ff78ebda2dc..97dcc1b1d10 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -7444,6 +7444,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
{
  tree type = TREE_TYPE (t);
  if ((processing_template_decl && !COMPLETE_TYPE_P (type))
+ || dependent_type_p (type)
  || is_really_empty_class (type, /*ignore_vptr*/false))
/* An empty class has no data to read.  */
return true;
diff --git a/gcc/testsuite/g++.dg/template/incomplete12.C 
b/gcc/testsuite/g++.dg/template/incomplete12.C
new file mode 100644
index 000..335e5356874
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/incomplete12.C
@@ -0,0 +1,9 @@
+// PR c++/96132
+// { dg-do compile }
+
+template  class a;
+
+template  class c {
+  a e;
+  void operator=(c d) { e = d; }
+};
-- 
2.27.0.343.g4a0fcf9f76



Re: [PATCH] ppc64 check for incompatible setting of minimal-toc

2020-07-09 Thread Segher Boessenkool
Hi!

On Mon, May 18, 2020 at 04:02:34PM -0700, Douglas B Rupp wrote:
> The attached patch is proposed for rs6000/linux64.h.

(Please use inline patches if at all possible.  For example, your
replies left out the patch already).

> The problem it addresses is that the current checking only tests for 
> existence not for an incompatible/compatible setting.
> 
> For example:
> 
> $ powerpc64-linux-gnu-gcc -mcmodel=medium -mminimal-toc foo.c
> is an incompatible set of switches
> 
> however
> 
> $ powerpc64-linux-gnu-gcc -mcmodel=medium -mno-minimal-toc foo.c
> is ok.
> 
> Currently both are reported as incompatible.

> diff --git gcc/config/rs6000/linux64.h gcc/config/rs6000/linux64.h
> index 34776c8421e..6728006d208 100644
> --- gcc/config/rs6000/linux64.h
> +++ gcc/config/rs6000/linux64.h
> @@ -134,8 +134,10 @@ extern int dot_symbols;
> rs6000_isa_flags |= OPTION_MASK_POWERPC64;\
> error ("%<-m64%> requires a PowerPC64 cpu");  \
>   }   \
> -   if ((rs6000_isa_flags_explicit\
> -& OPTION_MASK_MINIMAL_TOC) != 0) \
> +   if (((rs6000_isa_flags_explicit   \
> + & OPTION_MASK_MINIMAL_TOC) != 0) && \
> +   ((rs6000_isa_flags\
> + & OPTION_MASK_MINIMAL_TOC) != 0))   \
>   {   \
> if (global_options_set.x_rs6000_current_cmodel\
> && rs6000_current_cmodel != CMODEL_SMALL) \

(only allow it with CMODEL_SMALL; no other action in this block).

Put that
  ((rs6000_isa_flags & OPTION_MASK_MINIMAL_TOC) != 0)
check inside the block please, together with the CMODEL_SMALL check?


Segher


Re: [PATCH 0/6 ver 4] ] Permute Class Operations

2020-07-09 Thread will schmidt via Gcc-patches
On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 6/6] rs6000 Add vector blend, permute builtin support
> 
> --
> V4 Fixes:
> 
>Rebased on mainline.  Changed FUTURE to P10.
> -
> 
> v3 fixes:
>Replace spaces with tabs in ChangeLog description.
>Fix implementation comments for define_expand "xxpermx" in file
>  gcc/config/rs6000/alitvec.md.
>Fix minor typos in the comments for the changes in
> gcc/config/rs6000/rs6000-call.c.
> 
> 
> v2 changes:
> 
>Updated ChangeLog per comments.
> 
>Updated implementation of the define_expand "xxpermx".
> 
>Fixed the comments and check for 3-bit immediate field for the
>   CODE_FOR_xxpermx check.
> 
>gcc/doc/extend.texi:
>   comment "Maybe it should say it is related to vsel/xxsel, but
> per
>   bigger element?", added comment.  I took the description
> directly
>   from spec.  Don't really don't want to mess with the approved
>   description.
> 
>fixed typo for Vector Permute Extendedextracth
> 
> --
> 
> GCC maintainers:
> 
> The following patch adds support for the vec_blendv and vec_permx
> builtins.
> 
> The patch has been compiled and tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The test cases were compiled on a Power 9 system and then tested on
> Mambo.
> 
>  Carl Love
> 
> ---
> rs6000 RFC2609 vector blend, permute instructions
> 
> gcc/ChangeLog
> 
> 2020-07-06  Carl Love  
> 
>   * config/rs6000/altivec.h (vec_blendv, vec_permx): Add define.
>   * config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX.):
> New
>   unspecs.
>   (VM3): New define_mode.
>   (VM3_char): New define_attr.
>   (xxblend_ mode VM3): New define_insn.
>   (xxpermx): New define_expand.
>   (xxpermx_inst): New define_insn.
>   * config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI,
> VXXBLEND_V8HI,
>   VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF):
> New
>   BU_P10V_3 definitions.

>   (XXBLENDBU_P10_OVERLOAD_3): New BU_P10_OVERLOAD_3 definition.

extra noise in (), should be just XXBLEND .

>   (XXPERMX): New BU_P10_OVERLOAD_4 definition.


>   * config/rs6000/rs6000-c.c
> (altivec_resolve_overloaded_builtin):
>   (P10_BUILTIN_VXXPERMX): Add if case support.
>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VXXBLEND_V16QI,
>   P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
>   P10_BUILTIN_VXXBLEND_V2DI, P10_BUILTIN_VXXBLEND_V4SF,
>   P10_BUILTIN_VXXBLEND_V2DF, P10_BUILTIN_VXXPERMX): Define
>   overloaded arguments.
>   (rs6000_expand_quaternop_builtin): Add if case for
> CODE_FOR_xxpermx.

s/if//

>   (builtin_quaternary_function_type): Add v16uqi_type and
> xxpermx_type
>   variables, add case statement for P10_BUILTIN_VXXPERMX.
>   (builtin_function_type)[P10_BUILTIN_VXXBLEND_V16QI,
>   P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
>   P10_BUILTIN_VXXBLEND_V2DI]: Add case statements.

Add space after (builtin_function_type)
Reverse tense?
(b_f_t) Add case statements for P10_BUILTIN_..., P10_BUILTIN_...


>   * doc/extend.texi: Add documentation for vec_blendv and
> vec_permx.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-06  Carl Love  
>   gcc.target/powerpc/vec-blend-runnable.c: New test.
>   gcc.target/powerpc/vec-permute-ext-runnable.c: New test.
> ---
>  gcc/config/rs6000/altivec.h   |   2 +
>  gcc/config/rs6000/altivec.md  |  71 +
>  gcc/config/rs6000/rs6000-builtin.def  |  13 +
>  gcc/config/rs6000/rs6000-c.c  |  27 +-
>  gcc/config/rs6000/rs6000-call.c   |  95 ++
>  gcc/doc/extend.texi   |  63 
>  .../gcc.target/powerpc/vec-blend-runnable.c   | 276 
>  .../powerpc/vec-permute-ext-runnable.c| 294
> ++
>  8 files changed, 835 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend-
> runnable.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 126409c168b..e8fdeb31b0b 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_splati(a)  __builtin_vec_xxspltiw (a)
>  #define vec_splatid(a) __builtin_vec_xxspltid (a)
>  #define vec_splati_ins(a, b, c)__builtin_vec_xxsplti32dx (a,
> b, c)
> +#define vec_blendv(a, b, c)__builtin_vec_xxblend (a, b, c)
> +#define vec_permx(a, b, c, d)  __builtin_vec_xxpermx (a, b, c, d)
> 
>  #define vec_gnb(a, b)__builtin_vec_gnb (a, b)
>  #define vec_clrl(a, b)   __builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/altivec.md
> 

[PATCH] middle-end: Call get_constant_section with DECL not EXP.

2020-07-09 Thread David Edelsohn via Gcc-patches
output_constant_def_contents() can call get_constant_section() with an
EXP that is a CONSTRUCTOR, which is not a declaration.  This can hit
asserts in GCC machinery to choose a named section for the initialization
data that expects the parameters to be DECLs.

get_constant_section() is a wrapper around TARGET_ASM_SELECT_SECTION,
which is documented as:

Return the section into which @var{exp} should be placed.  You can
assume that @var{exp} is either a @code{VAR_DECL} node or a constant of
some sort.

A CONSTRUCTOR initializer is a "constant of some sort", but isn't
consistent with DECL_SECTION_NAME, etc.

This patch changes get_constant_section() and its callers to pass the
DECL within the EXP instead of the EXP, and to explicitly pass the reloc
information that must be determined from the EXP.

Another option is to remove get_constant_section() completely, replace
it with direct calls to targetm.asm_out.select_section() (for which it
now is a complete pass-through), and update the Target Hook
documentation.

What's the preferred path forward?

Thanks, David

gcc/ChangeLog

2020-07-09  David Edelsohn  

* varasm.c (get_constant_section): Change first parameter from
EXP to DECL and insert reloc second parameter.
(build_constant_desc): Adjust call.
(output_constant_def_contents): Adjust call.


0001-middle-end-Call-get_constant_section-with-DECL-not-E.patch
Description: Binary data


Re: [PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]

2020-07-09 Thread Segher Boessenkool
Hi!

On Thu, Jul 09, 2020 at 11:09:42AM +0800, luoxhu wrote:
> > Maybe change it back to just SI?  It won't match often at all for QI or
> > HI anyway, it seems.  Sorry for that detour.  Should be good with the
> > above nits fixed :-)
> 
> OK, if I see correctly, subreg of DImode should be SImode and I used
> subreg:SI to match only SI, so no need to consider mask for QI and HI? :)

My problem with it was that the shift amounts won't be 32 for QI etc.?

> +;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
> +;; split it before reload with "and mask" to avoid generating shift right
> +;; 32 bit then shift left 32 bit.
> +(define_insn_and_split "movsf_from_si2"
> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
> + (unspec:SF [
> +  (subreg:SI (
> +  ashiftrt:DI (

Opening parens *start* a line, they never end it.  So:

(define_insn_and_split "movsf_from_si2"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
  (unspec:SF
[(subreg:SI
   (ashiftrt:DI
 (match_operand:DI 1 "input_operand" "r")
 (const_int 32))
  0)]
UNSPEC_SF_FROM_SI))
   (clobber (match_scratch:DI 2 "=r"))]

or something like that.  There aren't really any real rules...  The
important points are that nested things should be indented, and things
at the same level should have the same indent (like the outer set and
clobber).  The [ for an unspec is sometimes put at the end of a line,
that reads a little better perhaps.

> +  "TARGET_NO_SF_SUBREG"
> +  "#"
> +  "&& vsx_reg_sfsubreg_ok (operands[0], SFmode)"

Put this in the insn condition?  And since this is just a predicate,
you can just use it instead of gpc_reg_operand.

(The split condition becomes "&& 1" then, not "").


Segher


[PATCH] libgccjit: Fix several memory leaks in the driver

2020-07-09 Thread Alex Coplan
Hello,

This patch fixes several memory leaks in the driver, all of which relate
to the handling of static specs. We introduce functions
set_static_spec_{shared,owned}() which are used to enforce proper memory
management when updating the strings in the static_specs table.

This is achieved by making use of the alloc_p field in the table
entries. Similarly to set_spec(), each time we update an entry, we check
whether alloc_p is set, and free the old value if so. We then set
alloc_p correctly based on whether we "own" this memory or whether we're
just taking a pointer to a shared string which we shouldn't free.

The following table shows the number of leaks found by AddressSanitizer
when running a minimal libgccjit program on AArch64. The test program
does the whole libgccjit compilation cycle in a loop (including acquiring
and releasing the context), and the table below shows the number of leaks
for different iterations of that loop.

+--+-+-+--+---+
| # of runs >  | 1   | 2   | 3| Leaks per run |
+--+-+-+--+---+
| Before patch | 463 | 940 | 1417 | 477   |
+--+-+-+--+---+
| After patch  | 416 | 846 | 1276 | 430   |
+--+-+-+--+---+

Ensuring that we minimize "leaks per run" (ultimately eliminating all of
them) is important in order for long-running applications to be able to
make use of in-process libgccjit.

Testing:
 * Bootstrap and regtest on aarch64-linxu-gnu, x86_64-linux-gnu.
 * Bootstrap and regtest on aarch64-linux-gnu with bootstrap-asan config.
 * Smoke test of libgccjit, ran regressions on a --disable-bootstrap build on
   aarch64-linux-gnu.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

2020-07-09  Alex Coplan  

* gcc.c (set_static_spec): New.
(set_static_spec_owned): New.
(set_static_spec_shared): New.
(driver::maybe_putenv_COLLECT_LTO_WRAPPER): Use
set_static_spec_owned() to take ownership of lto_wrapper_file
such that it gets freed in driver::finalize.
(driver::maybe_run_linker): Use set_static_spec_shared() to
ensure that we don't try and free() the static string "ld",
also ensuring that any previously-allocated string in
linker_name_spec is freed. Likewise with argv0.
(driver::finalize): Use set_static_spec_shared() when resetting
specs that previously had allocated strings; remove if(0)
around call to free().

diff --git a/gcc/gcc.c b/gcc/gcc.c
index c0eb3c10cfd..f839971d42d 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1908,6 +1908,47 @@ init_spec (void)
 
   specs = sl;
 }
+
+static void
+set_static_spec (const char **spec, const char *value, bool alloc_p)
+{
+  struct spec_list *sl = NULL;
+
+  for (unsigned i = 0; i < ARRAY_SIZE (static_specs); i++)
+{
+  if (static_specs[i].ptr_spec == spec)
+   {
+ sl = static_specs + i;
+ break;
+   }
+}
+
+  gcc_assert (sl);
+
+  if (sl->alloc_p)
+{
+  const char *old = *spec;
+  free (const_cast  (old));
+}
+
+  *spec = value;
+  sl->alloc_p = alloc_p;
+}
+
+/* Update a static spec to a new string, taking ownership of that
+   string's memory.  */
+static void set_static_spec_owned (const char **spec, const char *val)
+{
+  return set_static_spec (spec, val, true);
+}
+
+/* Update a static spec to point to a new value, but don't take
+   ownership of (i.e. don't free) that string.  */
+static void set_static_spec_shared (const char **spec, const char *val)
+{
+  return set_static_spec (spec, val, false);
+}
+
 
 /* Change the value of spec NAME to SPEC.  If SPEC is empty, then the spec is
removed; If the spec starts with a + then SPEC is added to the end of the
@@ -8333,7 +8374,7 @@ driver::maybe_putenv_COLLECT_LTO_WRAPPER () const
   if (lto_wrapper_file)
 {
   lto_wrapper_file = convert_white_space (lto_wrapper_file);
-  lto_wrapper_spec = lto_wrapper_file;
+  set_static_spec_owned ( 0
@@ -8864,7 +8905,7 @@ driver::maybe_run_linker (const char *argv0) const
  linker_plugin_file_spec = convert_white_space (temp_spec);
}
 #endif
- lto_gcc_spec = argv0;
+ set_static_spec_shared (

Re: [PATCH 6/9] [OpenACC] Set bias to zero for explicit attach/detach clauses in C and C++

2020-07-09 Thread Thomas Schwinge
Hi Julian!

On 2020-06-25T13:36:15+0200, I wrote:
> On 2020-06-16T15:39:42-0700, Julian Brown  wrote:
>> This is a fix for the pointer (or array) size inadvertently being used
>> for the bias of attach and detach clauses (PR95270)
>
> Thanks for looking into that one, which had caused my some gray hair.
>
>> for C and C++.
>
> That means, there is no such problem for Fortran?  (I haven't run into
> one, just curious.)

Looking into something else, I've now found the very same (?) problem for
Fortran, too.  :-| For the following simple testcase, I again do see
non-zero 'bias: 64' for 'enter data attach(data_p)':

program main
  use openacc
  implicit none
  !TODO Per PR96080, data types chosen so that we can create a "pointer 
object 'data_p'" on the device.
  integer, dimension(:), target :: data(1)
  integer, dimension(:), pointer :: data_p

  !TODO Per PR96080, not using OpenACC/Fortran runtime library routines.

  !$acc enter data create(data)
  data_p => data
  !$acc enter data copyin(data_p)

  !$acc enter data attach(data_p)
end program main

..., and the 'attach' fails with 'libgomp: pointer target not mapped for
attach'.  It doesn't fail when I force 'bias = 0' in
'gomp_attach_pointer'.

I've tried a bit, but it seems a bit difficult in Fortran to verify (with
'associated(data_p, data)' etc.) what we've actually 'attach'ed: per
PR96080, a 'call acc_update_self(data_p)' may not be doing the expected
thing, and a '!$acc update self(data_p)' per
'libgomp/oacc-parallel.c:GOACC_update' will update the actual data, but
is no-op for 'GOMP_MAP_TO_PSET', 'GOMP_MAP_POINTER'.  I've stopped
experimenting with that any further.

So it seems Fortran front end changes will also be required in addition
to the C, C++ front end changes you've come up with.  (For avoidance of
doubt: OK to do separately, if you'd like to.  Please also reference GCC
PR95270 for these, and include the testcase from above, or something
better.)


Grüße
 Thomas


> In principle, yes, for master and releases/gcc-10 branches, but please
> incorporate the following items:
>
>>  PR middle-end/95270
>>
>>  gcc/c/
>>  * c-typeck.c (c_finish_omp_clauses): Set OMP_CLAUSE_SIZE (bias) to zero
>>  for standalone attach/detach clauses.
>>
>>  gcc/cp/
>>  * semantics.c (finish_omp_clauses): Likewise.
>>
>>  gcc/testsuite/
>>  * c-c++-common/goacc/mdc-1.c: Update expected dump output for zero
>>  bias.
>> ---
>>  gcc/c/c-typeck.c |  8 
>>  gcc/cp/semantics.c   |  8 
>>  gcc/testsuite/c-c++-common/goacc/mdc-1.c | 14 +++---
>>  3 files changed, 23 insertions(+), 7 deletions(-)
>
>> --- a/gcc/c/c-typeck.c
>> +++ b/gcc/c/c-typeck.c
>> @@ -14533,6 +14533,10 @@ c_finish_omp_clauses (tree clauses, enum 
>> c_omp_region_type ort)
>>  }
>>if (c_oacc_check_attachments (c))
>>  remove = true;
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>> +  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
>> +  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH))
>> +OMP_CLAUSE_SIZE (c) = size_zero_node;
>>break;
>>  }
>>if (t == error_mark_node)
>> @@ -14546,6 +14550,10 @@ c_finish_omp_clauses (tree clauses, enum 
>> c_omp_region_type ort)
>>remove = true;
>>break;
>>  }
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>> +  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
>> +  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH))
>> +OMP_CLAUSE_SIZE (c) = size_zero_node;
>>if (TREE_CODE (t) == COMPONENT_REF
>>&& OMP_CLAUSE_CODE (c) != OMP_CLAUSE__CACHE_)
>>  {
>
> I cannot comment if these two code paths are good places (and the only
> ones) that need to set 'OMP_CLAUSE_SIZE', so I'll trust you've found the
> best/all places.
>
> Does that override an 'OMP_CLAUSE_SIZE' that was set earlier, or
> initialize it?  Maybe the latter, given my comment in
> : "make sure to skip/invalidate the
> 'gcc/gimplify.c:gimplify_scan_omp_clauses' handling"?
>
> Plase add some commentary here in the code, instead of just in the
> ChangeLog, something like: "initialize here, so that gimplify doesn't
> wrongly do so later" (if that's what it is, and in proper language, of
> course).
>
>> --- a/gcc/cp/semantics.c
>> +++ b/gcc/cp/semantics.c
>> @@ -7334,6 +7334,10 @@ finish_omp_clauses (tree clauses, enum 
>> c_omp_region_type ort)
>>  }
>>if (cp_oacc_check_attachments (c))
>>  remove = true;
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>> +  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
>> +  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH))
>> +OMP_CLAUSE_SIZE (c) = size_zero_node;
>>break;
>>  }
>>if (t == error_mark

Re: [PATCH] rs6000: Allow MMA built-in initialization regardless of compiler options

2020-07-09 Thread Peter Bergner via Gcc-patches
On 7/9/20 12:11 PM, Segher Boessenkool wrote:
>> gcc/testsuite/
>>  PR target/96125
>>  * gcc.target/powerpc/pr96125.c: New test.
> 
> Okay for trunk and 10 (but see test nit below).  Thanks!
[snip]
> So maybe we should just do all builtins always?

I think that is the correct thing to do, but I think maybe that
should wait for Bill's rewrite of the built-in generation.



>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96125.c
>> @@ -0,0 +1,47 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
> 
> powerpc_vsx_ok is not the right test for -mcpu=power8 (it means p7).
> Usually powerpc_p8vector_ok is used...  not a great situation, but :-)

As we discussed offline, I changed this to -mdejagnu-cpu=power7 and
kept the powerpc_vsx_ok effective target test, so that this test case
will get more coverage, especially on older power7 BE systems.

I verified the updated test case passes on both LE and BE, so I've
pushed this now.  I'll let Bill Seurer's nightly testing try this
on a wider variety of builds before backporting this to GCC10.
I'll try and do that tomorrow.

Thanks!!

Peter




Re: [PATCH] rs6000: Allow MMA built-in initialization regardless of compiler options

2020-07-09 Thread Bill Schmidt via Gcc-patches



On 7/9/20 4:10 PM, Peter Bergner wrote:

On 7/9/20 12:11 PM, Segher Boessenkool wrote:

gcc/testsuite/
PR target/96125
* gcc.target/powerpc/pr96125.c: New test.

Okay for trunk and 10 (but see test nit below).  Thanks!

[snip]

So maybe we should just do all builtins always?

I think that is the correct thing to do, but I think maybe that
should wait for Bill's rewrite of the built-in generation.



I put it on my list after today's discussion.

Bill




Re: [PATCH 6/9] [OpenACC] Set bias to zero for explicit attach/detach clauses in C and C++

2020-07-09 Thread Julian Brown
On Thu, 9 Jul 2020 23:06:29 +0200
Thomas Schwinge  wrote:

> Hi Julian!
> 
> On 2020-06-25T13:36:15+0200, I wrote:
> > On 2020-06-16T15:39:42-0700, Julian Brown 
> > wrote:  
> >> This is a fix for the pointer (or array) size inadvertently being
> >> used for the bias of attach and detach clauses (PR95270)  
> >
> > Thanks for looking into that one, which had caused my some gray
> > hair. 
> >> for C and C++.  
> >
> > That means, there is no such problem for Fortran?  (I haven't run
> > into one, just curious.)  
> 
> Looking into something else, I've now found the very same (?) problem
> for Fortran, too.  :-| For the following simple testcase, I again do
> see non-zero 'bias: 64' for 'enter data attach(data_p)':
> 
> program main
>   use openacc
>   implicit none
>   !TODO Per PR96080, data types chosen so that we can create a
> "pointer object 'data_p'" on the device. integer, dimension(:),
> target :: data(1) integer, dimension(:), pointer :: data_p
> 
>   !TODO Per PR96080, not using OpenACC/Fortran runtime library
> routines. 
>   !$acc enter data create(data)
>   data_p => data
>   !$acc enter data copyin(data_p)
> 
>   !$acc enter data attach(data_p)
> end program main
> 
> ..., and the 'attach' fails with 'libgomp: pointer target not mapped
> for attach'.  It doesn't fail when I force 'bias = 0' in
> 'gomp_attach_pointer'.
> 
> I've tried a bit, but it seems a bit difficult in Fortran to verify
> (with 'associated(data_p, data)' etc.) what we've actually
> 'attach'ed: per PR96080, a 'call acc_update_self(data_p)' may not be
> doing the expected thing, and a '!$acc update self(data_p)' per
> 'libgomp/oacc-parallel.c:GOACC_update' will update the actual data,
> but is no-op for 'GOMP_MAP_TO_PSET', 'GOMP_MAP_POINTER'.  I've stopped
> experimenting with that any further.
> 
> So it seems Fortran front end changes will also be required in
> addition to the C, C++ front end changes you've come up with.  (For
> avoidance of doubt: OK to do separately, if you'd like to.  Please
> also reference GCC PR95270 for these, and include the testcase from
> above, or something better.)

Do the 7th & 8th patches in this series help? They were "supposed to"
be the Fortran equivalent of these C/C++ changes, though I found
additional problems too.

Thanks,

Julian


[PATCH] x86: Check TARGET_AVX512VL when enabling FMA

2020-07-09 Thread H.J. Lu via Gcc-patches
On Thu, Jul 9, 2020 at 6:35 AM H.J. Lu  wrote:
>
> On Thu, Jul 9, 2020 at 5:04 AM Kirill Yukhin  wrote:
> >
> > On 07 июл 09:06, H.J. Lu wrote:
> > > On Tue, Jul 7, 2020 at 8:56 AM Kirill Yukhin  
> > > wrote:
> > > >
> > > > Hello HJ,
> > > >
> > > > On 28 июн 07:19, H.J. Lu via Gcc-patches wrote:
> > > > > Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
> > > > > rsqrt2 expander which expands to UNSPEC_RSQRT28 for 
> > > > > TARGET_AVX512ER.
> > > > > Although it doesn't show performance change in our workloads, FMA can
> > > > > improve other workloads.
> > > > >
> > > > > gcc/
> > > > >
> > > > >   PR target/88713
> > > > >   * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
> > > > >   * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
> > > > >   (rsqrt2): Replace VF1_128_256 with 
> > > > > VF_AVX512VL_VF1_128_256.
> > > > >   (rsqrtv16sf2): Removed.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > >   PR target/88713
> > > > >   * gcc.target/i386/pr88713-1.c: New test.
> > > > >   * gcc.target/i386/pr88713-2.c: Likewise.
> > > >
> > > > So, you've introduced new rsqrt expanders for DF vectors and relaxed
> > > > condition for V16SF. What I didn't get is why did you change unspec
> > > > type from RSQRT to RSQRT28 for V16SF expander?
> > > >
> > >
> > > UNSPEC in define_expand is meaningless when the pattern is fully
> > > expanded by ix86_emit_swsqrtsf.  I believe that UNSPEC in rsqrt2
> > > expander can be removed.
> >
> > Agree.
>
> I will leave UNSPEC alone here.
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */
> >
> > I gues -O2 is useless here (and in -2.c test).
>
> Fixed.
>
> > Othwerwise LGTM.
> >
>
> This is the patch I am checking in.
>

This patch is needed for

FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-3.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-4.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-5.c (test for excess errors)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (internal compiler error)
FAIL: gcc.target/i386/avx512er-vrsqrt28ps-6.c (test for excess errors)

-- 
H.J.
From bd455fa210c94ed3f1f9af71ed79022c499dc8ff Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 9 Jul 2020 14:56:48 -0700
Subject: [PATCH] x86: Check TARGET_AVX512VL when enabling FMA

Check TARGET_AVX512VL when enabling FMA to avoid

gcc.target/i386/avx512er-vrsqrt28ps-3.c:25:1: error: unrecognizable insn:
(insn 29 28 30 6 (set (reg:V8SF 108)
(fma:V8SF (reg:V8SF 106)
(reg:V8SF 105)
(reg:V8SF 110)))

when TARGET_AVX512VL isn't enabled.

	* config/i386/i386-expand.c (ix86_emit_swsqrtsf): Check
	TARGET_AVX512VL when enabling FMA.
---
 gcc/config/i386/i386-expand.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 49718b7a41c..2e2b912fc83 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15540,7 +15540,11 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
 
-  if (TARGET_FMA || TARGET_AVX512F)
+  unsigned vector_size = GET_MODE_SIZE (mode);
+  if (TARGET_FMA
+  || (TARGET_AVX512F && vector_size == 64)
+  || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
+
 emit_insn (gen_rtx_SET (e2,
 			gen_rtx_FMA (mode, e0, x0, mthree)));
   else
-- 
2.26.2



Re: [PATCH] rs6000: Allow MMA built-in initialization regardless of compiler options

2020-07-09 Thread Segher Boessenkool
On Thu, Jul 09, 2020 at 04:10:41PM -0500, Peter Bergner wrote:
> On 7/9/20 12:11 PM, Segher Boessenkool wrote:
> [snip]
> > So maybe we should just do all builtins always?
> 
> I think that is the correct thing to do, but I think maybe that
> should wait for Bill's rewrite of the built-in generation.

Yes, this is for the future anyhow (not for power10) :-)


Segher


[PATCH] libgomp: Add OMPD Address Space Information functions.

2020-07-09 Thread y2s1982 via Gcc-patches
This patch adds Address Space Information function implementations as
defined in section 5.5.4 of OpenMP API Specification 5.0. It also
defines a struct that stores various information used by OMPD.

2020-07-09  Tony Sim  

libgomp/ChangeLog:

* Makefile.am (libgompd_la_OBJECTS): Add ompd-addr.c.
* Makefile.in: Regenerate.
* libgompd.h (struct gompd_env): Add an extern declaration.
* ompd-lib.c (struct gompd_env): Define.
(gompd_set_environment): Define a placeholder function.
(ompd_initialize): Add a call to gompd_set_environment.
* ompd-addr.c: New file.

---
 libgomp/Makefile.am |  2 +-
 libgomp/Makefile.in |  5 +--
 libgomp/libgompd.h  | 15 +
 libgomp/ompd-addr.c | 81 +
 libgomp/ompd-lib.c  | 12 +++
 5 files changed, 112 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/ompd-addr.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index fe0a92122ea..0a4a9c10eb9 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-addr.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 2b487e00499..9ceb2c6e460 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-addr.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -592,7 +592,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-addr.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -816,6 +816,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-addr.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 495995e00d3..36cb858589e 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -47,4 +47,19 @@ typedef struct _ompd_aspace_handle {
   ompd_size_t ref_count;
 } ompd_address_space_handle_t;
 
+struct gompd_env
+{
+  /* TODO: when the struct is better defined, turn it into a compact form.
+ LINK: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549698.html
+ For now, keep it as a struct.  */
+
+  /* Environment set version number.  */
+  ompd_word_t gompd_env_version;
+  /* Represents _OPENMP that is in mm format.  */
+  ompd_word_t openmp_version;
+};
+
+/* TODO: when gompd_env is better defined, turn it into a compact form.  */
+extern struct gompd_env gompd_env_info;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-addr.c b/libgomp/ompd-addr.c
new file mode 100644
index 000..b751d64ce28
--- /dev/null
+++ b/libgomp/ompd-addr.c
@@ -0,0 +1,81 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Yoosuk Sim .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception alo

[PATCH 2/2] openacc: Adjust dynamic reference count semantics

2020-07-09 Thread Julian Brown
This patch adjusts how dynamic reference counts work so that they match
the semantics of the source program more closely, instead of representing
"excess" reference counts beyond those that represent pointers in the
internal libgomp splay-tree data structure. This allows some corner
cases to be handled more gracefully.

OK?

Thanks,

Julian

2020-07-10  Julian Brown  
Thomas Schwinge  

libgomp/
* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
dynamic_refcount.
(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
* oacc-mem.c (acc_map_data): Substitute virtual_refcount for
dynamic_refcount.
(acc_unmap_data): Update comment.
(goacc_map_var_existing, goacc_enter_datum): Adjust for
dynamic_refcount semantics.
(goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking.
Adjust for dynamic_refcount semantics.
(goacc_enter_data_internal): Implement "present" case of dynamic
memory-map handling here.  Update "non-present" case for
dynamic_refcount semantics.
(goacc_exit_data_internal): Use goacc_exit_datum_1.
* target.c (gomp_map_vars_internal): Remove
GOMP_MAP_VARS_OPENACC_ENTER_DATA handling.  Update for dynamic_refcount
handling.
(gomp_unmap_vars_internal): Remove virtual_refcount handling.
(gomp_load_image_to_device): Substitute dynamic_refcount for
virtual_refcount.
* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs.
* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs.
* testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New
test.
* testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c:
Remove stale comment.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL.
---
 libgomp/libgomp.h |   8 +-
 libgomp/oacc-mem.c| 236 --
 libgomp/target.c  |  38 +--
 .../libgomp.oacc-c-c++-common/pr92843-1.c |   6 +-
 .../libgomp.oacc-c-c++-common/refcounting-1.c |  31 +++
 .../libgomp.oacc-c-c++-common/refcounting-2.c |  31 +++
 .../libgomp.oacc-c-c++-common/struct-3-1-1.c  |  34 +++
 .../structured-dynamic-lifetimes-4.c  |   2 -
 .../libgomp.oacc-fortran/deep-copy-6.f90  |   8 -
 .../dynamic-incr-structural-1.f90 |  48 
 .../mdc-refcount-1-1-1.f90|   8 -
 .../mdc-refcount-1-1-2.F90|   3 -
 .../mdc-refcount-1-2-1.f90|   8 -
 .../mdc-refcount-1-2-2.f90|   8 -
 .../mdc-refcount-1-3-1.f90|   8 -
 .../mdc-refcount-1-4-1.f90|   3 +-
 16 files changed, 312 insertions(+), 168 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index ca42e0de640..7b52ce7d5c2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1016,11 +1016,8 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Reference counts beyond those that represent genuine references in the
- linked splay tree key/target memory structures, e.g. for multiple OpenACC
- "present increment" operations (via "acc enter data") referring to the 
same
- host-memory block.  */
-  uintptr_t virtual_refcount;
+  /* Dynamic reference count.  */
+  uintptr_t dynamic_refcount;
   struct splay_tree_aux *aux;
 };
 
@@ -1153,7 +1150,6 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
-  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 6f9043bf5b1..c839e5e9542 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -409,7 +409,7 @@ acc_map_data (void *h, void *d, size_t s)
   splay_tree_key n = tgt->list[0].key;
   assert (n);
   assert (n->refcount

[PATCH 1/2] openacc: Helper functions for enter/exit data using single mapping

2020-07-09 Thread Julian Brown
This patch factors out the parts of goacc_enter_datum and
goacc_exit_datum that can be shared with goacc_enter_data_internal
and goacc_exit_data_internal respectively (in the next patch),
without overloading function return values or complicating code paths
unnecessarily.

OK?

Thanks,

Julian

2020-07-10  Julian Brown  

libgomp/
* oacc-mem.c (goacc_map_var_existing): New function.
(goacc_enter_datum): Use above function.
(goacc_exit_datum_1): New function.
(goacc_exit_datum): Use above function.
---
 libgomp/oacc-mem.c | 127 ++---
 1 file changed, 74 insertions(+), 53 deletions(-)

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 247ca135804..6f9043bf5b1 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -498,6 +498,36 @@ acc_unmap_data (void *h)
 }
 
 
+/* Helper function to map a single dynamic data item, represented by a single
+   mapping.  The acc_dev->lock should be held on entry, and remains locked on
+   exit.  */
+
+static void *
+goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr,
+   size_t size, goacc_aq aq, splay_tree_key n)
+{
+  assert (n);
+
+  /* Present. */
+  void *d = (void *) (n->tgt->tgt_start + n->tgt_offset + hostaddr
+   - n->host_start);
+
+  if (hostaddr + size > (void *) n->host_end)
+{
+  gomp_mutex_unlock (&acc_dev->lock);
+  gomp_fatal ("[%p,+%d] not mapped", hostaddr, (int) size);
+}
+
+  assert (n->refcount != REFCOUNT_LINK);
+  if (n->refcount != REFCOUNT_INFINITY)
+{
+  n->refcount++;
+  n->virtual_refcount++;
+}
+
+  return d;
+}
+
 /* Enter dynamic mapping for a single datum.  Return the device pointer.  */
 
 static void *
@@ -529,27 +559,10 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
   gomp_mutex_lock (&acc_dev->lock);
 
   n = lookup_host (acc_dev, hostaddrs[0], sizes[0]);
+  goacc_aq aq = get_goacc_asyncqueue (async);
   if (n)
 {
-  void *h = hostaddrs[0];
-  size_t s = sizes[0];
-
-  /* Present. */
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
-
-  if ((h + s) > (void *)n->host_end)
-   {
- gomp_mutex_unlock (&acc_dev->lock);
- gomp_fatal ("[%p,+%d] not mapped", (void *)h, (int)s);
-   }
-
-  assert (n->refcount != REFCOUNT_LINK);
-  if (n->refcount != REFCOUNT_INFINITY)
-   {
- n->refcount++;
- n->virtual_refcount++;
-   }
-
+  d = goacc_map_var_existing (acc_dev, hostaddrs[0], sizes[0], aq, n);
   gomp_mutex_unlock (&acc_dev->lock);
 }
   else
@@ -558,8 +571,6 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
 
   gomp_mutex_unlock (&acc_dev->lock);
 
-  goacc_aq aq = get_goacc_asyncqueue (async);
-
   struct target_mem_desc *tgt
= gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,
   kinds, true, GOMP_MAP_VARS_OPENACC_ENTER_DATA);
@@ -649,38 +660,13 @@ acc_pcopyin (void *h, size_t s)
 #endif
 
 
-/* Exit a dynamic mapping for a single variable.  */
+/* Helper function to unmap a single data item.  Device lock should be held on
+   entry, and remains locked on exit.  */
 
 static void
-goacc_exit_datum (void *h, size_t s, unsigned short kind, int async)
+goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
+   unsigned short kind, splay_tree_key n, goacc_aq aq)
 {
-  /* No need to call lazy open, as the data must already have been
- mapped.  */
-
-  kind &= 0xff;
-
-  struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev = thr->dev;
-
-  if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
-return;
-
-  acc_prof_info prof_info;
-  acc_api_info api_info;
-  bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
-  if (profiling_p)
-{
-  prof_info.async = async;
-  prof_info.async_queue = prof_info.async;
-}
-
-  gomp_mutex_lock (&acc_dev->lock);
-
-  splay_tree_key n = lookup_host (acc_dev, h, s);
-  if (!n)
-/* PR92726, RP92970, PR92984: no-op.  */
-goto out;
-
   if ((uintptr_t) h < n->host_start || (uintptr_t) h + s > n->host_end)
 {
   size_t host_size = n->host_end - n->host_start;
@@ -691,6 +677,7 @@ goacc_exit_datum (void *h, size_t s, unsigned short kind, 
int async)
 
   bool finalize = (kind == GOMP_MAP_DELETE
   || kind == GOMP_MAP_FORCE_FROM);
+
   if (finalize)
 {
   if (n->refcount != REFCOUNT_INFINITY)
@@ -709,8 +696,6 @@ goacc_exit_datum (void *h, size_t s, unsigned short kind, 
int async)
 
   if (n->refcount == 0)
 {
-  goacc_aq aq = get_goacc_asyncqueue (async);
-
   bool copyout = (kind == GOMP_MAP_FROM
  || kind == GOMP_MAP_FORCE_FROM);
   if (copyout)
@@ -740,8 +725,44 @@ goacc_exit_datum (void *h, size_t s, unsigned short kind, 
int async)
 

[PATCH 0/2] openacc: Refactor enter/exit data code paths, adjust dynamic refcount semantics

2020-07-09 Thread Julian Brown
These two patches follow on from:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548908.html

and:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549178.html

The first patch has been rethought somewhat -- we can factor out
the common parts of goacc_enter_datum/goacc_enter_data_internal and
goacc_exit_datum/goacc_exit_data_internal into two new functions.
Doing it this way avoids overloading the return type of the "enter data"
function (i.e. returning NULL for multiple mappings -- I really don't
like that "overloading"), and is less problematic wrt. the profiling API,
I think.

The second patch has been adjusted to account for the first (using the
newly factored-out helper functions as appropriate), and I've also fixed
some locking thinkos and added code to fix the new struct-3-1-1.c test.

Currently re-testing with offloading to NVPTX -- the patches have been
tested in combination with others previously though.

OK now?

Julian

Julian Brown (2):
  openacc: Helper functions for enter/exit data using single mapping
  openacc: Adjust dynamic reference count semantics

 libgomp/libgomp.h |   8 +-
 libgomp/oacc-mem.c| 355 +++---
 libgomp/target.c  |  38 +-
 .../libgomp.oacc-c-c++-common/pr92843-1.c |   6 +-
 .../libgomp.oacc-c-c++-common/refcounting-1.c |  31 ++
 .../libgomp.oacc-c-c++-common/refcounting-2.c |  31 ++
 .../libgomp.oacc-c-c++-common/struct-3-1-1.c  |  34 ++
 .../structured-dynamic-lifetimes-4.c  |   2 -
 .../libgomp.oacc-fortran/deep-copy-6.f90  |   8 -
 .../dynamic-incr-structural-1.f90 |  48 +++
 .../mdc-refcount-1-1-1.f90|   8 -
 .../mdc-refcount-1-1-2.F90|   3 -
 .../mdc-refcount-1-2-1.f90|   8 -
 .../mdc-refcount-1-2-2.f90|   8 -
 .../mdc-refcount-1-3-1.f90|   8 -
 .../mdc-refcount-1-4-1.f90|   3 +-
 16 files changed, 382 insertions(+), 217 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/refcounting-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90

-- 
2.23.0



Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-09 Thread luoxhu via Gcc-patches
Hi,

On 2020/7/10 03:25, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 09, 2020 at 11:09:42AM +0800, luoxhu wrote:
>>> Maybe change it back to just SI?  It won't match often at all for QI or
>>> HI anyway, it seems.  Sorry for that detour.  Should be good with the
>>> above nits fixed :-)
>>
>> OK, if I see correctly, subreg of DImode should be SImode and I used
>> subreg:SI to match only SI, so no need to consider mask for QI and HI? :)
> 
> My problem with it was that the shift amounts won't be 32 for QI etc.?

But if the subreg:SI only accepts SImode comes in, why do we need handle the
shift amounts for QI/HI? It not using QHSI here...

> 
>> +;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>> +;; split it before reload with "and mask" to avoid generating shift right
>> +;; 32 bit then shift left 32 bit.
>> +(define_insn_and_split "movsf_from_si2"
>> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>> +(unspec:SF [
>> + (subreg:SI (
>> + ashiftrt:DI (
> 
> Opening parens *start* a line, they never end it.  So:
> 
> (define_insn_and_split "movsf_from_si2"
>[(set (match_operand:SF 0 "vsx_reg_sfsubreg_ok" "=wa")
> (unspec:SF
>   [(subreg:SI
>  (ashiftrt:DI
>(match_operand:DI 1 "input_operand" "r")
>(const_int 32))
> 0)]
>   UNSPEC_SF_FROM_SI))
> (clobber (match_scratch:DI 2 "=r"))]
> 
> or something like that.  There aren't really any real rules...  The
> important points are that nested things should be indented, and things
> at the same level should have the same indent (like the outer set and
> clobber).  The [ for an unspec is sometimes put at the end of a line,
> that reads a little better perhaps.

OK, seems the md file needs a format tool too...

> 
>> +  "TARGET_NO_SF_SUBREG"
>> +  "#"
>> +  "&& vsx_reg_sfsubreg_ok (operands[0], SFmode)"
> 
> Put this in the insn condition?  And since this is just a predicate,
> you can just use it instead of gpc_reg_operand.
> 
> (The split condition becomes "&& 1" then, not "").

OK, this seems a bit strange as movsi_from_sf, movsf_from_si and
movdi_from_sf_zero_ext all use it as condition...

And why vsx_reg_sfsubreg_ok allows "SF SUBREGS" and TARGET_NO_SF_SUBREG
"avoid (SUBREG:SF (REG:SI)", I guess they are not the same meaning? (The 
TARGET_NO_SF_SUBREG is also copied from other similar defines.)  Thanks.


Updated patch and correct the title:


Define movsf_from_si2 to extract high part SF element from DImode[PR89310]


For extracting high part element from DImode register like:

{%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}

split it before reload with "and mask" to avoid generating shift right
32 bit then shift left 32 bit.  This pattern also exists in PR42475 and
PR67741, etc.

srdi 3,3,32
sldi 9,3,32
mtvsrd 1,9
xscvspdpn 1,1

=>

rldicr 3,3,0,31
mtvsrd 1,3
xscvspdpn 1,1

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog:

2020-07-08  Xionghu Luo  

PR rtl-optimization/89310
* config/rs6000/rs6000.md (movsf_from_si2): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

2020-07-08  Xionghu Luo  

PR rtl-optimization/89310
* gcc.target/powerpc/pr89310.c: New test.
---
 gcc/config/rs6000/rs6000.md| 32 ++
 gcc/testsuite/gcc.target/powerpc/pr89310.c | 17 
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr89310.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4fcd6a94022..2331c84dcbd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7593,6 +7593,38 @@ (define_insn_and_split "movsf_from_si"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
 
+;; For extracting high part element from DImode register like:
+;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
+;; split it before reload with "and mask" to avoid generating shift right
+;; 32 bit then shift left 32 bit.
+(define_insn_and_split "movsf_from_si2"
+  [(set (match_operand:SF 0 "vsx_reg_sfsubreg_ok" "=wa")
+   (unspec:SF
+[(subreg:SI
+  (ashiftrt:DI
+   (match_operand:DI 1 "input_operand" "r")
+   (const_int 32))
+  0)]
+UNSPEC_SF_FROM_SI))
+  (clobber (match_scratch:DI 2 "=r"))]
+  "TARGET_NO_SF_SUBREG"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (GET_CODE (operands[2]) == SCRATCH)
+operands[2] = gen_reg_rtx (DImode);
+
+  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << 32);
+  emit_insn (gen_anddi3 (operands[2], operands[1], mask));
+  emit_insn (gen_p8_mtvsrd_sf (operands[0], operands[2]));
+  emit_insn (gen_vsx_xscvspdpn_directmove (operands[0], operands[0]));
+  DONE;
+}
+  [(set_attr "length" "12")
+  (set_attr "type" "vecfloat")
+  (set_attr "isa" "p8v")])
 
 ;; Move 64-bit binary/decimal floating point

Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-09 Thread Jiufu Guo via Gcc-patches


Hi Martin,

Thanks so much for your time and kindly help!!!

Wish Richi, Bin or Honza have time to review this patch. ;-)

--Here is a summmary---

PR68212 mentioned that the COUNT of unrolled loop was not correct, and
comments of this PR also mentioned that loop become 'cold'.

The following patch fixes the wrong COUNT/PROB of unrolled loop.  And
the patch resets the COUNT the case where unrolling in unreliable count
number can cause a loop to no longer look hot and therefor not get aligned.

Belows messages are referenced.
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02368.html) and comment
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02380.html,
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00044.html),

Bootstrap and regtest are ok on powerpc64le.  Is this ok for trunk?

ChangeLog:
2020-07-10  Jiufu Guo   
Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Correct COUNT/PROB
for unrolled/peeled blocks.

testsuite/ChangeLog:
2020-07-10  Jiufu Guo   
Pat Haugen  
PR rtl-optimization/68212
* gcc.dg/pr68212.c: New test.

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 727e951..ded0046 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-ssa-loop-manip.h"
 #include "dumpfile.h"
+#include "cfgrtl.h"
 
 static void copy_loops_to (class loop **, int,
   class loop *);
@@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  /* If original loop is executed COUNT_IN times, the unrolled
 loop will account SCALE_MAIN_DEN times.  */
  scale_main = count_in.probability_in (scale_main_den);
+
+ /* If we are guessing at the number of iterations and count_in
+becomes unrealistically small, reset probability.  */
+ if (!(count_in.reliable_p () || loop->any_estimate))
+   {
+ profile_count new_count_in = count_in.apply_probability 
(scale_main);
+ profile_count preheader_count = loop_preheader_edge (loop)->count 
();
+ if (new_count_in.apply_scale (1, 10) < preheader_count)
+   scale_main = profile_probability::likely ();
+   }
+
  scale_act = scale_main * prob_pass_main;
}
   else
{
+ profile_count new_loop_count;
  profile_count preheader_count = e->count ();
- for (i = 0; i < ndupl; i++)
-   scale_main = scale_main * scale_step[i];
  scale_act = preheader_count.probability_in (count_in);
+ /* Compute final preheader count after peeling NDUPL copies.  */
+ for (i = 0; i < ndupl; i++)
+   preheader_count = preheader_count.apply_probability (scale_step[i]);
+ /* Subtract out exit(s) from peeled copies.  */
+ new_loop_count = count_in - (e->count () - preheader_count);
+ scale_main = new_loop_count.probability_in (count_in);
}
 }
 
@@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  scale_bbs_frequencies (new_bbs, n, scale_act);
  scale_act = scale_act * scale_step[j];
}
+
+  /* Need to update PROB of exit edge and corresponding COUNT.  */
+  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
+ && bbs_to_scale)
+   {
+ edge new_exit = new_spec_edges[SE_ORIG];
+ profile_count new_count_in = new_exit->src->count;
+ profile_count preheader_count = loop_preheader_edge (loop)->count ();
+ edge e;
+ edge_iterator ei;
+
+ FOR_EACH_EDGE (e, ei, new_exit->src->succs)
+   if (e != new_exit)
+ break;
+
+ gcc_assert (e && e != new_exit);
+
+ new_exit->probability = preheader_count.probability_in (new_count_in);
+ e->probability = new_exit->probability.invert ();
+
+ profile_count new_latch_count
+   = new_exit->src->count.apply_probability (e->probability);
+ profile_count old_latch_count = e->dest->count;
+
+ EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
+   scale_bbs_frequencies_profile_count (new_bbs + i, 1,
+new_latch_count,
+old_latch_count);
+
+ if (current_ir_type () != IR_GIMPLE)
+   update_br_prob_note (e->src);
+   }
 }
   free (new_bbs);
   free (orig_loops);
diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 000..f3b7c22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param 
max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0

[PATCH 1/2] rs6000: Init V4SF vector without converting SP to DP

2020-07-09 Thread Xiong Hu Luo via Gcc-patches
Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
Better instruction sequence could be generated on Power9:

lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvsrdd

With the patch followed, it could be continue optimized to:

lwz + rldimi + mtvsrdd

The point is to use lwz to avoid converting the single-precision to
double-precision upon load, pack four 32-bit data into one 128-bit
register directly.

gcc/ChangeLog:

2020-07-10  Xionghu Luo  

* config/rs6000/rs6000.c (rs6000_expand_vector_init):
Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
---
 gcc/config/rs6000/rs6000.c | 49 +++---
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..d94e88c23a5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6423,35 +6423,34 @@ rs6000_expand_vector_init (rtx target, rtx vals)
}
   else
{
- rtx dbl_even = gen_reg_rtx (V2DFmode);
- rtx dbl_odd  = gen_reg_rtx (V2DFmode);
- rtx flt_even = gen_reg_rtx (V4SFmode);
- rtx flt_odd  = gen_reg_rtx (V4SFmode);
- rtx op0 = force_reg (SFmode, XVECEXP (vals, 0, 0));
- rtx op1 = force_reg (SFmode, XVECEXP (vals, 0, 1));
- rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
- rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
-
- /* Use VMRGEW if we can instead of doing a permute.  */
- if (TARGET_P8_VECTOR)
+ rtx tmpSF[4];
+ rtx tmpSI[4];
+ rtx tmpDI[4];
+ rtx mrgDI[4];
+ for (i = 0; i < 4; i++)
{
- emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
- emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
- emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
- emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
- if (BYTES_BIG_ENDIAN)
-   emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_even, 
flt_odd));
- else
-   emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_odd, 
flt_even));
+ tmpSI[i] = gen_reg_rtx (SImode);
+ tmpDI[i] = gen_reg_rtx (DImode);
+ mrgDI[i] = gen_reg_rtx (DImode);
+ tmpSF[i] = force_reg (SFmode, XVECEXP (vals, 0, i));
+ emit_insn (gen_movsi_from_sf (tmpSI[i], tmpSF[i]));
+ emit_insn (gen_zero_extendsidi2 (tmpDI[i], tmpSI[i]));
}
- else
+
+ if (!BYTES_BIG_ENDIAN)
{
- emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
- emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
- emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
- emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
- rs6000_expand_extract_even (target, flt_even, flt_odd);
+ std::swap (tmpDI[0], tmpDI[1]);
+ std::swap (tmpDI[2], tmpDI[3]);
}
+
+ emit_insn (gen_ashldi3 (mrgDI[0], tmpDI[0], GEN_INT (32)));
+ emit_insn (gen_iordi3 (mrgDI[1], mrgDI[0], tmpDI[1]));
+ emit_insn (gen_ashldi3 (mrgDI[2], tmpDI[2], GEN_INT (32)));
+ emit_insn (gen_iordi3 (mrgDI[3], mrgDI[2], tmpDI[3]));
+
+ rtx tmpV2DI = gen_reg_rtx (V2DImode);
+ emit_insn (gen_vsx_concat_v2di (tmpV2DI, mrgDI[1], mrgDI[3]));
+ emit_move_insn (target, gen_lowpart (V4SFmode, tmpV2DI));
}
   return;
 }
-- 
2.27.0.90.geebb51ba8c



[PATCH 2/2] rs6000: Define define_insn_and_split to split unspec sldi+or to rldimi

2020-07-09 Thread Xiong Hu Luo via Gcc-patches
Combine pass could recognize the pattern defined and split it in split1,
this patch could optimize:

21: r130:DI=r133:DI<<0x20
11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r130:DI|r129:DI

to

21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r149:DI&0x|r133:DI<<0x20

rldimi is generated instead of sldi+or.

gcc/ChangeLog:

2020-07-10  Xionghu Luo  

* config/rs6000/rs6000.md (rotl_unspec): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

2020-07-10  Xionghu Luo  

* gcc.target/powerpc/vector_float.c: New test.
---
 gcc/config/rs6000/rs6000.md   | 26 +++
 .../gcc.target/powerpc/vector_float.c | 14 ++
 2 files changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector_float.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0aa5265d199..64b655df363 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4239,6 +4239,32 @@
   operands[5] = GEN_INT ((HOST_WIDE_INT_1U << ) - 1);
 })
 
+; rldimi with UNSPEC_SI_FROM_SF.
+(define_insn_and_split "*rotl_unspec"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+   (ior:DI
+(ashift:DI (match_operand:DI 1 "gpc_reg_operand")
+ (match_operand:SI 2 "const_int_operand"))
+(zero_extend:DI
+ (unspec:QHSI
+  [(match_operand:SF 3 "memory_operand")]
+  UNSPEC_SI_FROM_SF
+  (clobber (match_scratch:V4SF 4))]
+  "INTVAL (operands[2]) == "
+  "#"
+  ""
+  [(parallel [(set (match_dup 5)
+  (zero_extend:DI (unspec:QHSI [(match_dup 3)] UNSPEC_SI_FROM_SF)))
+(clobber (match_dup 4))])
+  (set (match_dup 0)
+   (ior:DI
+(and:DI (match_dup 5) (match_dup 6))
+(ashift:DI (match_dup 1) (match_dup 2]
+{
+  operands[5] = gen_reg_rtx (DImode);
+  operands[6] = GEN_INT ((HOST_WIDE_INT_1U << ) - 1);
+})
+
 ; rlwimi, too.
 (define_split
   [(set (match_operand:SI 0 "gpc_reg_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vector_float.c 
b/gcc/testsuite/gcc.target/powerpc/vector_float.c
new file mode 100644
index 000..414824ad264
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector_float.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+vector float
+test (float *a, float *b, float *c, float *d)
+{
+  return (vector float){*a, *b, *c, *d};
+}
+
+/* { dg-final { scan-assembler-not {\mlxsspx\M} } } */
+/* { dg-final { scan-assembler-not {\mlfs\M} } } */
+/* { dg-final { scan-assembler-times {\mlwz\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
-- 
2.27.0.90.geebb51ba8c



Re: [PATCH 1/2] rs6000: Init V4SF vector without converting SP to DP

2020-07-09 Thread luoxhu via Gcc-patches
Update patch to keep the logic for non TARGET_P8_VECTOR targets.
Please ignore the previous [PATCH 1/2], Sorry!


Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
Better instruction sequence could be generated on Power9:

lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvsrdd

With the patch followed, it could be continue optimized to:

lwz + rldimi + mtvsrdd

The point is to use lwz to avoid converting the single-precision to
double-precision upon load, pack four 32-bit data into one 128-bit
register directly.

gcc/ChangeLog:

2020-07-10  Xionghu Luo  

* config/rs6000/rs6000.c (rs6000_expand_vector_init):
Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
---
 gcc/config/rs6000/rs6000.c | 55 +-
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..00972fb5165 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6423,29 +6423,48 @@ rs6000_expand_vector_init (rtx target, rtx vals)
}
   else
{
- rtx dbl_even = gen_reg_rtx (V2DFmode);
- rtx dbl_odd  = gen_reg_rtx (V2DFmode);
- rtx flt_even = gen_reg_rtx (V4SFmode);
- rtx flt_odd  = gen_reg_rtx (V4SFmode);
- rtx op0 = force_reg (SFmode, XVECEXP (vals, 0, 0));
- rtx op1 = force_reg (SFmode, XVECEXP (vals, 0, 1));
- rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
- rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
-
- /* Use VMRGEW if we can instead of doing a permute.  */
  if (TARGET_P8_VECTOR)
{
- emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
- emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
- emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
- emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
- if (BYTES_BIG_ENDIAN)
-   emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_even, 
flt_odd));
- else
-   emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_odd, 
flt_even));
+ rtx tmpSF[4];
+ rtx tmpSI[4];
+ rtx tmpDI[4];
+ rtx mrgDI[4];
+ for (i = 0; i < 4; i++)
+   {
+ tmpSI[i] = gen_reg_rtx (SImode);
+ tmpDI[i] = gen_reg_rtx (DImode);
+ mrgDI[i] = gen_reg_rtx (DImode);
+ tmpSF[i] = force_reg (SFmode, XVECEXP (vals, 0, i));
+ emit_insn (gen_movsi_from_sf (tmpSI[i], tmpSF[i]));
+ emit_insn (gen_zero_extendsidi2 (tmpDI[i], tmpSI[i]));
+   }
+
+ if (!BYTES_BIG_ENDIAN)
+   {
+ std::swap (tmpDI[0], tmpDI[1]);
+ std::swap (tmpDI[2], tmpDI[3]);
+   }
+
+ emit_insn (gen_ashldi3 (mrgDI[0], tmpDI[0], GEN_INT (32)));
+ emit_insn (gen_iordi3 (mrgDI[1], mrgDI[0], tmpDI[1]));
+ emit_insn (gen_ashldi3 (mrgDI[2], tmpDI[2], GEN_INT (32)));
+ emit_insn (gen_iordi3 (mrgDI[3], mrgDI[2], tmpDI[3]));
+
+ rtx tmpV2DI = gen_reg_rtx (V2DImode);
+ emit_insn (gen_vsx_concat_v2di (tmpV2DI, mrgDI[1], mrgDI[3]));
+ emit_move_insn (target, gen_lowpart (V4SFmode, tmpV2DI));
}
  else
{
+ rtx dbl_even = gen_reg_rtx (V2DFmode);
+ rtx dbl_odd  = gen_reg_rtx (V2DFmode);
+ rtx flt_even = gen_reg_rtx (V4SFmode);
+ rtx flt_odd  = gen_reg_rtx (V4SFmode);
+ rtx op0 = force_reg (SFmode, XVECEXP (vals, 0, 0));
+ rtx op1 = force_reg (SFmode, XVECEXP (vals, 0, 1));
+ rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
+ rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
+
  emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
  emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
  emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
-- 
2.27.0.90.geebb51ba8c




[committed] wwwdocs: Complete note on -fallow-invalid-boz.

2020-07-09 Thread Gerald Pfeifer
...and move from passive voice to active voice on the way.

Pushed.

Gerald

---
 htdocs/gcc-10/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 4e5f6139..2b773f1d 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -572,7 +572,7 @@ int get_na??ve_pi() {
 entity.  As a part of the rework, documented and undocumented
 extensions to the Fortran standard now emit errors during compilation.
 Some of these extensions are permitted with the
--fallow-invalid-boz, where the error is degraded to a
+-fallow-invalid-boz option, which degrades the error to a
 warning and the code is compiled as with older gfortran.
   
   
-- 
2.27.0


Re: [PATCH] middle-end: Call get_constant_section with DECL not EXP.

2020-07-09 Thread Richard Biener via Gcc-patches
On Thu, Jul 9, 2020 at 8:29 PM David Edelsohn  wrote:
>
> output_constant_def_contents() can call get_constant_section() with an
> EXP that is a CONSTRUCTOR, which is not a declaration.  This can hit
> asserts in GCC machinery to choose a named section for the initialization
> data that expects the parameters to be DECLs.
>
> get_constant_section() is a wrapper around TARGET_ASM_SELECT_SECTION,
> which is documented as:
>
> Return the section into which @var{exp} should be placed.  You can
> assume that @var{exp} is either a @code{VAR_DECL} node or a constant of
> some sort.
>
> A CONSTRUCTOR initializer is a "constant of some sort", but isn't
> consistent with DECL_SECTION_NAME, etc.
>
> This patch changes get_constant_section() and its callers to pass the
> DECL within the EXP instead of the EXP, and to explicitly pass the reloc
> information that must be determined from the EXP.
>
> Another option is to remove get_constant_section() completely, replace
> it with direct calls to targetm.asm_out.select_section() (for which it
> now is a complete pass-through), and update the Target Hook
> documentation.
>
> What's the preferred path forward?

Given we can always pass a decl - in the cases you change the former
parameter is simply DECL_INITIAL of the now passed decl - this is
a welcome cleanup of the TARGET_ASM_SELECT_SECTION
interface and it should be amended with adjusting its documentation.

That in turn might be an opportunity to cleanup target code.

I think we cannot rule out that targets special-case say STRING_CST
but not DECL_INITIAL (decl) == STRING_CST so some kind of
auditing is probably required.

David - I know you need this for AIX - I think the target hook implementation
is supposed to assume that if it doesn't get passed a decl then it's
a "random constant" without any special attributes attached.  What's
your requirement that can possibly only be fulfilled with seeing a decl?

Richard.

> Thanks, David
>
> gcc/ChangeLog
>
> 2020-07-09  David Edelsohn  
>
> * varasm.c (get_constant_section): Change first parameter from
> EXP to DECL and insert reloc second parameter.
> (build_constant_desc): Adjust call.
> (output_constant_def_contents): Adjust call.