Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-28 Thread Chung-Lin Tang




On 2020/10/27 9:17 PM, Julian Brown wrote:

And, in which context are cuStreamAddCallback registered callbacks
run? E.g. if it is inside an asynchronous interrupt, using locking in
there might not be the best thing to do.

The cuStreamAddCallback API is documented here:

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483

We're quite limited in what we can do in the callback function since
"Callbacks must not make any CUDA API calls". So what *can* a callback
function do? It is mentioned that the callback function's execution will
"pause" the stream it is logically running on. So can we get deadlock,
e.g. if multiple host threads are launching offload kernels
simultaneously? I don't think so, but I don't know how to prove it!


I think it's not deadlock that's the problem here, but that the lock acquisition
in nvptx_stack_acquire will effectively serialize GPU kernel execution to just
one host thread (since you're holding it till kernel completion).
Also in that case, why do you need to use a CUDA callback? You can just call the
unlock directly afterwards.

I think a better way is to use a list of stack blocks in ptx_dev, and quickly
retrieve/unlock it in nvptx_stack_acquire, like how we did it in
GOMP_OFFLOAD_alloc for general device memory allocation.

Chung-Lin
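
The list-of-stack-blocks idea above could look roughly like the following.
This is only a sketch under stated assumptions: struct device and
struct stack_block are hypothetical stand-ins for ptx_dev and its cached
blocks, and malloc stands in for the device allocation; none of it is taken
from the actual libgomp plugin.  What it illustrates is that the lock is held
only while the list is manipulated, not until kernel completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached stack block.  */
struct stack_block
{
  struct stack_block *next;
  size_t size;
  void *mem;   /* Would be a device pointer in a real plugin.  */
};

/* Hypothetical stand-in for the per-device state (ptx_dev in the thread).  */
struct device
{
  pthread_mutex_t lock;
  struct stack_block *free_blocks;
};

/* Take a cached block of at least SIZE bytes, or allocate a new one.
   The lock only protects the list walk and the unlink.  */
static struct stack_block *
stack_acquire (struct device *dev, size_t size)
{
  struct stack_block *blk = NULL;

  pthread_mutex_lock (&dev->lock);
  for (struct stack_block **p = &dev->free_blocks; *p; p = &(*p)->next)
    if ((*p)->size >= size)
      {
        blk = *p;
        *p = blk->next;
        break;
      }
  pthread_mutex_unlock (&dev->lock);

  if (blk == NULL)
    {
      blk = malloc (sizeof *blk);
      if (blk == NULL)
        return NULL;
      blk->mem = malloc (size);   /* Placeholder for a device allocation.  */
      if (blk->mem == NULL)
        {
          free (blk);
          return NULL;
        }
      blk->size = size;
    }
  blk->next = NULL;
  return blk;
}

/* Put BLK back on the free list once the kernel using it has finished.  */
static void
stack_release (struct device *dev, struct stack_block *blk)
{
  pthread_mutex_lock (&dev->lock);
  blk->next = dev->free_blocks;
  dev->free_blocks = blk;
  pthread_mutex_unlock (&dev->lock);
}
```

A launch would call stack_acquire before the kernel and stack_release after
it completes, so several host threads can have kernels in flight at once.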


Re: [PATCH]AArch64 Fix overflow in memcopy expansion on aarch64.

2020-10-28 Thread Christophe Lyon via Gcc-patches
Hi,

On Mon, 26 Oct 2020 at 13:44, Richard Sandiford via Gcc-patches
 wrote:
>
> Tamar Christina  writes:
> > Hi Richard,
> >
> > The 10/26/2020 11:29, Richard Sandiford wrote:
> >> Tamar Christina  writes:
> > >> >/* We can't do anything smart if the amount to copy is not constant.  */
> >> >if (!CONST_INT_P (operands[2]))
> >> >  return false;
> >> >
> >> > -  n = INTVAL (operands[2]);
> > >> > +  /* This may get truncated but that's fine as it would be above our maximum
> >> > + memset inline limit.  */
> >> > +  unsigned tmp = INTVAL (operands[2]);
> >>
> >> That's not true for (1ULL << 32) + 1 for example, since the truncated
> >> value will come under the limit.  I think we should just do:
> >>
> >>   unsigned HOST_WIDE_INT tmp = UINTVAL (operands[2]);
> >>
> >> without a comment.
> >>
> >
> > Updated patch attached.
> >
> > Ok for master and GCC 8, 9, 10?
>
> OK, thanks.
>
> Richard

The new test fails with -mabi=ilp32:
FAIL: gcc.target/aarch64/pr97535.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/aarch64/pr97535.c:7:1: warning: this decimal
constant is unsigned only in ISO C90
/gcc/testsuite/gcc.target/aarch64/pr97535.c:7:13: error: size of array
'raw_buffer' is too large
/gcc/testsuite/gcc.target/aarch64/pr97535.c:11:9: warning: this
decimal constant is unsigned only in ISO C90

Do you mind fixing it?

Thanks,

Christophe
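
Richard's (1ULL << 32) + 1 counterexample quoted above can be checked with a
small standalone sketch.  INLINE_LIMIT and both helper names are made up for
illustration and are not the values or functions used in the aarch64 backend;
uint32_t is used to model a 32-bit unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inline-expansion limit, for illustration only.  */
#define INLINE_LIMIT 256

/* Buggy variant: the byte count is truncated to 32 bits before the check,
   so a huge count can wrap to a small one.  */
static int
fits_truncated (uint64_t n)
{
  uint32_t tmp = (uint32_t) n;
  return tmp <= INLINE_LIMIT;
}

/* Fixed variant: compare the full-width value.  */
static int
fits_full (uint64_t n)
{
  return n <= INLINE_LIMIT;
}
```

With n = (1ULL << 32) + 1 the truncated value is 1, so the buggy check
accepts a copy of more than 4 GiB while the full-width check rejects it.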


Re: [r11-4427 Regression] FAIL: gfortran.dg/vect/pr83232.f90 -O scan-tree-dump-times slp1 "vectorizing stmts using SLP" 3 on Linux/x86_64

2020-10-28 Thread Richard Biener
On Tue, 27 Oct 2020, sunil.k.pandey wrote:

> On Linux/x86_64,
> 
> 5af1e827bbb624eb28f80d2c5e0da46185af3708 is the first bad commit
> commit 5af1e827bbb624eb28f80d2c5e0da46185af3708
> Author: Richard Biener 
> Date:   Tue Oct 27 11:03:27 2020 +0100
> 
> Avoid uniform lane BB vectorization
> 
> caused
> 
> FAIL: gfortran.dg/vect/pr83232.f90   -O   scan-tree-dump-times slp1 
> "vectorizing stmts using SLP" 3
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-4427/usr
>  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gfortran.dg/vect/pr83232.f90 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gfortran.dg/vect/pr83232.f90 
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at skpgkp2 at gmail dot com)

So the testcase is now improved with AVX but the dump scanning doesn't
match the need to split the vectorization at (1) into an AVX and SSE
part.  There's no suitable target selector yet, the vect_sizes_32B_16B
one is an exact match and doesn't fit.  I filed PR97611 for the
testsuite harness issue.


Re: [patch, shared coarrays, committed] Make header use more consistent

2020-10-28 Thread Andre Vehreschild
Hi Nicolas, Thomas,

are you planning to also rename the directory and library name from "nca" to
"shared_caf" or the like?

Regards,
Andre

On Tue, 27 Oct 2020 17:18:21 -0400
David Edelsohn via Fortran  wrote:

> The current COARRAYS branch correctly bootstraps on AIX.  Thanks for
> correcting the contents and ordering of the header files.
>
> Thanks, David
>
> On Tue, Oct 27, 2020 at 1:31 PM Thomas Koenig  wrote:
> >
> > I just committed
> >
> > https://gcc.gnu.org/g:0c261d5b5c931d9e9214d06531bdc7e9e16aeaab
> >
> > to hopefully fix the header issue on the native_coarray branch.
> >
> > If anybody wants to give this a spin, please go right ahead.
> >
> > I've also discussed with Nicolas on how best to proceed.  The
> > best way forward is probably to merge the branch into trunk at
> > the end of stage 1 and follow Richard's suggestion to use configure.tgt
> > to only compile the shared coarray library for systems where it is
> > known to at least compile. As people test more systems, we can then
> > add these to configure.tgt.
> >
> > Best regards
> >
> > Thomas
> >
> > Always include libgfortran.h first; sanitize header dependencies.
> >
> > libgfortran/ChangeLog:
> >
> > * nca/coarraynative.c: Do not include util.h. Remove commented
> > include for stdlib.h..
> > * nca/collective_subroutine.c: Move #include  after
> > other #include statement.
> > * nca/hashmap.c: Include shared_memory.h and allocator.h
> > * nca/hashmap.h: Remove includess.
> > * nca/libcoarraynative.h: Include only those headers which
> > are needed.
> > * nca/shared_memory.c: Do not include util.h
> > * nca/shared_memory.h: Do not include other headers.
> > * nca/sync.c: Move include of string.h after other headers.
> > * nca/sync.h: Remove include of shared_memory.h and alloc.h.
> > * nca/util.h: Do not include stdint.h and stddef.h; include
> > limits.h and assert.h.
> > * nca/wrapper.c: Remove include for sync.h, util.h and
> > collective_subroutine.h. Move include of string.h after other
> > headers.
> >


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH 2/2] combine: Don't turn (mult (extend x) 2^n) into extract

2020-10-28 Thread Alex Coplan via Gcc-patches
On 27/10/2020 17:31, Segher Boessenkool wrote:
> On Tue, Oct 27, 2020 at 10:35:59AM +, Alex Coplan wrote:
> > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > I do not like handling both mult and ashift in one case like this, it
> > > complicates things for no good reason.  Write it as two cases, and it
> > > should be good.
> > 
> > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > of make_extraction so that the existing ASHIFT block can do the work for
> > us. We remember if we did it and then convert it back if necessary.
> > 
> > I'm not convinced that it's an improvement. What do you think?
> 
> Restoring it like that is just yuck.  That can be okay if it is as the
> start and end of a smallish function, importantly some self-contained
> piece of code, but this is not.
> 
> Just write it as two blocks? One handling the shift, that is already
> there; and add one block adding the mult case.  That should not
> increase the complexity of this already way too complex code.

OK, how about the attached?

Bootstrap and regtest in progress on aarch64-none-linux-gnu.

Thanks,
Alex

---

gcc/ChangeLog:

* combine.c (make_extraction): Also handle shifts written as
(mult x 2^n), avoid creating an extract rtx for these.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..5040dabff98 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
   if (new_rtx != 0)
return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
 }
+  else if (GET_CODE (inner) == MULT
+  && CONST_INT_P (XEXP (inner, 1))
+  && pos_rtx == 0 && pos == 0)
+{
+  /* We're extracting the least significant bits of an rtx
+(mult X (const_int 2^C)), where LEN > C.  Extract the
+least significant (LEN - C) bits of X, giving an rtx
+whose mode is MODE, then multiply it by 2^C.  */
+  const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+  if (shift_amt > 0 && len > shift_amt)
+   {
+ new_rtx = make_extraction (mode, XEXP (inner, 0),
+0, 0, len - shift_amt,
+unsignedp, in_dest, in_compare);
+ if (new_rtx)
+   return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+   }
+}
   else if (GET_CODE (inner) == TRUNCATE
   /* If trying or potentionally trying to extract
  bits outside of is_mode, don't look through
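
The identity that the new MULT block relies on (the low LEN bits of X * 2^C
equal the low LEN - C bits of X, multiplied by 2^C) can be checked on plain
64-bit integers.  The sketch below is not RTL code; the function names are
invented for illustration, and it assumes 0 < C < LEN < 64.

```c
#include <assert.h>
#include <stdint.h>

/* Low LEN bits of V, for 0 < LEN < 64.  */
static uint64_t
low_bits (uint64_t v, unsigned len)
{
  return v & ((UINT64_C (1) << len) - 1);
}

/* Extract the low (LEN - C) bits of X first, then multiply by 2^C.  */
static uint64_t
extract_then_mult (uint64_t x, unsigned len, unsigned c)
{
  return low_bits (x, len - c) * (UINT64_C (1) << c);
}

/* Multiply X by 2^C first, then extract the low LEN bits.  */
static uint64_t
mult_then_extract (uint64_t x, unsigned len, unsigned c)
{
  return low_bits (x * (UINT64_C (1) << c), len);
}
```

For X = 0xFF, LEN = 8 and C = 4, both orders give 0xF0.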


PING [Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-10-28 Thread Hongyu Wang via Gcc-patches
On Wed, Oct 14, 2020 at 11:27 AM, Hongyu Wang wrote:
>
> Hi:
>
> This patch adds support for Intel AVX-VNNI instructions.
>
> AVX-VNNI is equivalent to AVX512-VNNI but uses VEX encoding. The instructions
> are the same, but take an extra {vex} prefix to distinguish them from
> AVX512-VNNI instructions in the assembler.
>
> For more details, please refer to 
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Bootstrap ok, regression test on i386/x86 backend is ok.
>
> OK for master?
>
> 2020-10-13  Hongtao Liu  
> Hongyu Wang  
>
> gcc/
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect AVXVNNI.
> * common/config/i386/i386-common.c
> (OPTION_MASK_ISA2_AVXVNNI_SET,
> OPTION_MASK_ISA2_AVXVNNI_UNSET, OPTION_MASK_ISA2_AVX2_UNSET):
> New.
> (ix86_handle_option): Handle -mavxvnni, unset avxvnni when
> avx2 is disabled.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_AVXVNNI.
> * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
> for avxvnni.
> * config.gcc: Add avxvnniintrin.h.
> * config/i386/avx512vnniintrin.h: Remove 128/256 bit non-mask
> intrinsics.
> * config/i386/avxvnniintrin.h: New header file.
> * config/i386/cpuid.h (bit_AVXVNNI): New.
> * config/i386/i386-builtins.c (def_builtin): Handle AVXVNNI mask
> for unified builtin.
> * config/i386/i386-builtin.def (BDESC): Adjust AVX512VNNI
> builtins for AVXVNNI.
> * config/i386/i386-c.c (ix86_target_macros_internal): Define
> __AVXVNNI__.
> * config/i386/i386-expand.c (ix86_expand_builtin): Handle bisa
> for AVXVNNI to support unified intrinsic name, since there is no
> dependency between AVX512VNNI and AVXVNNI.
> * config/i386/i386-options.c (isa2_opts): Add -mavxvnni.
> (ix86_valid_target_attribute_inner_p): Handle avxvnni.
> (ix86_valid_target_attribute_inner_p): Ditto.
> * config/i386/i386.h (TARGET_AVXVNNI, TARGET_AVXVNNI_P,
> TARGET_AVXVNNI_P, PTA_AVXVNNI): New.
> (PTA_SAPPHIRERAPIDS): Add AVX_VNNI.
> (PTA_ALDERLAKE): Likewise.
> * config/i386/i386.md ("isa"): Add avxvnni, avx512vnnivl.
> ("enabled"): Adjust for avxvnni and avx512vnnivl.
> * config/i386/i386.opt: Add option -mavxvnni.
> * config/i386/immintrin.h: Include avxvnniintrin.h.
> * config/i386/sse.md (vpdpbusd_): Adjust for AVXVNNI.
> (vpdpbusds_): Likewise.
> (vpdpwssd_): Likewise.
> (vpdpwssds_): Likewise.
> (vpdpbusd_v16si): New.
> (vpdpbusds_v16si): Likewise.
> (vpdpwssd_v16si): Likewise.
> (vpdpwssds_v16si): Likewise.
> * doc/invoke.texi: Document -mavxvnni.
> * doc/extend.texi: Document avxvnni.
> * doc/sourcebuild.texi: Document target avxvnni.
>
> gcc/testsuite/
>
> * gcc.target/i386/avx512vl-vnni-1.c: Rename..
> * gcc.target/i386/avx512vl-vnni-1a.c: To This.
> * gcc.target/i386/avx512vl-vnni-1b.c: New test.
> * gcc.target/i386/avx512vl-vnni-2.c: Ditto.
> * gcc.target/i386/avx512vl-vnni-3.c: Ditto.
> * gcc.target/i386/avx-vnni-1.c: Ditto.
> * gcc.target/i386/avx-vnni-2.c: Ditto.
> * gcc.target/i386/avx-vnni-3.c: Ditto.
> * gcc.target/i386/avx-vnni-4.c: Ditto.
> * gcc.target/i386/avx-vnni-5.c: Ditto.
> * gcc.target/i386/avx-vnni-6.c: Ditto.
> * gcc.target/i386/avx-vpdpbusd-2.c: Ditto.
> * gcc.target/i386/avx-vpdpbusds-2.c: Ditto.
> * gcc.target/i386/avx-vpdpwssd-2.c: Ditto.
> * gcc.target/i386/avx-vpdpwssds-2.c: Ditto.
> * gcc.target/i386/vnni_inline_error.c: Ditto.
> * gcc.target/i386/avx512vnnivl-builtin.c: Ditto.
> * gcc.target/i386/avxvnni-builtin.c: Ditto.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/pr83488-3.c: Adjust.
> * gcc.target/i386/sse-12.c: Add -mavxvnni.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-14.c: Ditto.
> * gcc.target/i386/sse-22.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * g++.dg/other/i386-2.C: Ditto.
> * g++.dg/other/i386-3.C: Ditto.
> * lib/target-supports.exp (check_effective_target_avxvnni):
> New proc.
>
> --
> Regards,
>
> Hongyu, Wang

Rebased on 2020-10-27 trunk and PING.
From e95c07fd392a012865e98cba78765edf4c4862de Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 13 Oct 2020 16:16:16 +0800
Subject: [PATCH] Support Intel AVX VNNI

2020-10-13  Hongtao Liu  
	Hongyu Wang  

gcc/
	* common/config/i386/cpuinfo.h (get_available_features):
	Detect AVXVNNI.
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA2_AVXVNNI_SET,
	OPTION_MASK_ISA2_AVXVNNI_UNSET, OPTION_MASK_ISA2_AVX2_UNSET):
	New.
	(ix86_handle_option): Handle -mavxvnni, unset avxvnni when
	avx2 is disabled.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVXVNNI.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for avxvnni.
	* config.gcc: Add avxvnniintrin.h

[committed] wide-int: Fix up set_bit_large

2020-10-28 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 27, 2020 at 05:46:42PM +, Richard Sandiford via Gcc wrote:
> >> and proceeds to call
> >> 
> >> wide_int new_lb = wi::set_bit (r.lower_bound (0), 127)
> >> 
> >> and creates the value:
> >> 
> >> p new_lb
> >> { = {val = {-65535, -1, 0}, len = 2, precision = 128},
> >> static is_sign_extended = true}
> >
> > This is non-canonical and so invalid, if the low HWI has the MSB set
> > and the high HWI is -1, it should have been just
> > val = {-65535}, len = 1, precision = 128}
> >
> > I guess the bug is that wi::set_bit_large doesn't call canonize.
> 
> Yeah, looks like a micro-optimisation gone wrong.
> LGTM, thanks.

I've now successfully bootstrapped/regtested the patch and committed to
trunk.  I'll backport it later.

2020-10-28  Jakub Jelinek  

* wide-int.cc (wi::set_bit_large): Call canonize unless setting
msb bit and clearing bits above it.

--- gcc/wide-int.cc.jj  2020-10-19 18:42:41.134426398 +0200
+++ gcc/wide-int.cc 2020-10-27 18:33:38.546703763 +0100
@@ -702,8 +702,11 @@ wi::set_bit_large (HOST_WIDE_INT *val, c
   /* If the bit we just set is at the msb of the block, make sure
 that any higher bits are zeros.  */
   if (bit + 1 < precision && subbit == HOST_BITS_PER_WIDE_INT - 1)
-   val[len++] = 0;
-  return len;
+   {
+ val[len++] = 0;
+ return len;
+   }
+  return canonize (val, len, precision);
 }
   else
 {


Jakub
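
The canonical-form rule discussed above (drop upper blocks that are only the
sign extension of the block below them) can be sketched as follows.  This is
a simplified illustration, not GCC's canonize: the name canonize_sketch is
invented, blocks are assumed to be 64 bits wide, and the precision is assumed
to be a multiple of 64 with len already within bounds.

```c
#include <assert.h>
#include <stdint.h>

/* 0 if X is non-negative, -1 if X is negative.  */
static int64_t
sign_mask (int64_t x)
{
  return x < 0 ? -1 : 0;
}

/* Return the number of blocks of VAL (least significant first) that are
   needed once redundant sign-extension blocks at the top are dropped.  */
static unsigned
canonize_sketch (const int64_t *val, unsigned len)
{
  if (len <= 1)
    return len;

  int64_t top = val[len - 1];
  if (top != 0 && top != -1)
    return len;

  /* TOP is 0 or -1: find the first lower block that is not a copy of it.  */
  for (int i = (int) len - 2; i >= 0; i--)
    if (val[i] != top)
      /* Block I implies TOP by sign extension, or one extra block is
         needed to carry the sign.  */
      return sign_mask (val[i]) == top ? (unsigned) i + 1 : (unsigned) i + 2;

  /* The whole number is 0 or -1.  */
  return 1;
}
```

For the value from the report, {-65535, -1} with len = 2, the sketch returns
1, matching the canonical form Jakub describes.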



Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-28 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
> > On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> > > Over the last couple of months quite a few warnings about uninitialized
> > > variables were raised while building GCC.  A reason why these warnings
> > > show up on S/390 only is due to the aggressive inlining settings here.
> > > Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> > > 1657178f59b) could be fixed or in case of a false positive silenced by
> > > initializing the corresponding variable.  Since the latter reoccurs and
> > > while bootstrapping such warnings are turned into errors bootstrapping
> > > fails on S/390 consistently.  Therefore, for the moment do not turn
> > > those warnings into errors.
> > > 
> > > config/ChangeLog:
> > > 
> > >   * warnings.m4: Do not turn maybe-uninitialized warnings into errors
> > >   on S/390.
> > > 
> > > fixincludes/ChangeLog:
> > > 
> > >   * configure: Regenerate.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * configure: Regenerate.
> > > 
> > > libcc1/ChangeLog:
> > > 
> > >   * configure: Regenerate.
> > > 
> > > libcpp/ChangeLog:
> > > 
> > >   * configure: Regenerate.
> > > 
> > > libdecnumber/ChangeLog:
> > > 
> > >   * configure: Regenerate.
> > 
> > That change looks good to me. Could a global reviewer please comment!
> 
> Ping

Ping

> 
> > 
> > Andreas
> > 
> > > ---
> > >  config/warnings.m4 | 20 ++--
> > >  fixincludes/configure  |  8 +++-
> > >  gcc/configure  | 12 +---
> > >  libcc1/configure   |  8 +++-
> > >  libcpp/configure   |  8 +++-
> > >  libdecnumber/configure |  8 +++-
> > >  6 files changed, 51 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/config/warnings.m4 b/config/warnings.m4
> > > index ce007f9b73e..d977bfb20af 100644
> > > --- a/config/warnings.m4
> > > +++ b/config/warnings.m4
> > > @@ -101,8 +101,10 @@ AC_ARG_ENABLE(werror-always,
> > >  AS_HELP_STRING([--enable-werror-always],
> > >  [enable -Werror despite compiler version]),
> > >  [], [enable_werror_always=no])
> > > -AS_IF([test $enable_werror_always = yes],
> > > -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > > +AS_IF([test $enable_werror_always = yes], [dnl
> > > +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > > +  AS_CASE([$host], [s390*-*-*],
> > > +  [acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> > >   m4_if($1, [manual],,
> > >   [AS_VAR_PUSHDEF([acx_GCCvers], [acx_cv_prog_cc_gcc_$1_or_newer])dnl
> > >AC_CACHE_CHECK([whether $CC is GCC >=$1], acx_GCCvers,
> > > @@ -116,7 +118,9 @@ AS_IF([test $enable_werror_always = yes],
> > > [AS_VAR_SET(acx_GCCvers, yes)],
> > > [AS_VAR_SET(acx_GCCvers, no)])])
> > >   AS_IF([test AS_VAR_GET(acx_GCCvers) = yes],
> > > -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > > +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > > +AS_CASE([$host], [s390*-*-*],
> > > +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> > >AS_VAR_POPDEF([acx_GCCvers])])
> > >  m4_popdef([acx_Var])dnl
> > >  AC_LANG_POP(C)
> > > @@ -205,8 +209,10 @@ AC_ARG_ENABLE(werror-always,
> > >  AS_HELP_STRING([--enable-werror-always],
> > >  [enable -Werror despite compiler version]),
> > >  [], [enable_werror_always=no])
> > > -AS_IF([test $enable_werror_always = yes],
> > > -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > > +AS_IF([test $enable_werror_always = yes], [dnl
> > > +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > > +  AS_CASE([$host], [s390*-*-*],
> > > +  [strict_warn="$strict_warn -Wno-error=maybe-uninitialized"])])
> > >   m4_if($1, [manual],,
> > >   [AS_VAR_PUSHDEF([acx_GXXvers], [acx_cv_prog_cxx_gxx_$1_or_newer])dnl
> > >AC_CACHE_CHECK([whether $CXX is G++ >=$1], acx_GXXvers,
> > > @@ -220,7 +226,9 @@ AS_IF([test $enable_werror_always = yes],
> > > [AS_VAR_SET(acx_GXXvers, yes)],
> > > [AS_VAR_SET(acx_GXXvers, no)])])
> > >   AS_IF([test AS_VAR_GET(acx_GXXvers) = yes],
> > > -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > > +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > > +AS_CASE([$host], [s390*-*-*],
> > > +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> > >AS_VAR_POPDEF([acx_GXXvers])])
> > >  m4_popdef([acx_Var])dnl
> > >  AC_LANG_POP(C++)
> > > diff --git a/fixincludes/configure b/fixincludes/configure
> > > index 6e2d67b655b..e0d679cc18e 100755
> > > --- a/fixincludes/configure
> > > +++ b/fixincludes/configure
> > > @@ -4753,7 +4753,13 @@ else
> > >  fi
> > >  
> > >  if test $enable_werror_always = yes; then :
> > > -  WERROR="$WERROR${WERROR:+ }-Werror"
> > > +WERROR="$WERROR${WERROR:+ }-Werror"
> > > +  case $host in #(
> > > +  s390*-*-*) :
> > > +WERROR="$WERROR -Wno-error=maybe-uninitialized" ;; #(
> > > +  *) :
> > > + ;;
> > > +esac
> > >  fi
> >

[committed] openmp: Implicitly discover declare target for variants of declare variant calls

2020-10-28 Thread Jakub Jelinek via Gcc-patches
Hi!

This marks all variants of declare variant also declare target if the base
functions are called directly in target regions or declare target functions.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-28  Jakub Jelinek  

* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle direct calls to
declare variant base functions.
* testsuite/libgomp.c/target-42.c: New test.

--- gcc/omp-offload.c.jj    2020-10-22 09:26:21.772352421 +0200
+++ gcc/omp-offload.c   2020-10-22 18:54:40.920324767 +0200
@@ -196,7 +196,26 @@ omp_declare_target_var_p (tree decl)
 static tree
 omp_discover_declare_target_tgt_fn_r (tree *tp, int *walk_subtrees, void *data)
 {
-  if (TREE_CODE (*tp) == FUNCTION_DECL)
+  if (TREE_CODE (*tp) == CALL_EXPR
+  && CALL_EXPR_FN (*tp)
+  && TREE_CODE (CALL_EXPR_FN (*tp)) == ADDR_EXPR
+  && TREE_CODE (TREE_OPERAND (CALL_EXPR_FN (*tp), 0)) == FUNCTION_DECL
+  && lookup_attribute ("omp declare variant base",
+                       DECL_ATTRIBUTES (TREE_OPERAND (CALL_EXPR_FN (*tp),
+                                                      0))))
+{
+  tree fn = TREE_OPERAND (CALL_EXPR_FN (*tp), 0);
+  for (tree attr = DECL_ATTRIBUTES (fn); attr; attr = TREE_CHAIN (attr))
+   {
+ attr = lookup_attribute ("omp declare variant base", attr);
+ if (attr == NULL_TREE)
+   break;
+ tree purpose = TREE_PURPOSE (TREE_VALUE (attr));
+ if (TREE_CODE (purpose) == FUNCTION_DECL)
+   omp_discover_declare_target_tgt_fn_r (&purpose, walk_subtrees, data);
+   }
+}
+  else if (TREE_CODE (*tp) == FUNCTION_DECL)
 {
   tree decl = *tp;
   tree id = get_identifier ("omp declare target");
@@ -237,7 +256,7 @@ omp_discover_declare_target_tgt_fn_r (tr
}
   if (omp_declare_target_fn_p (decl)
  || lookup_attribute ("omp declare target host",
-   DECL_ATTRIBUTES (decl)))
+  DECL_ATTRIBUTES (decl)))
return NULL_TREE;
 
   if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
--- libgomp/testsuite/libgomp.c/target-42.c.jj  2020-10-22 18:48:59.679244952 +0200
+++ libgomp/testsuite/libgomp.c/target-42.c 2020-10-22 18:52:51.348903566 +0200
@@ -0,0 +1,42 @@
+#include <stdio.h>
+
+int
+on_nvptx (void)
+{
+  return 1;
+}
+
+int
+on_gcn (void)
+{
+  return 2;
+}
+
+#pragma omp declare variant (on_nvptx) match(construct={target},device={arch(nvptx)})
+#pragma omp declare variant (on_gcn) match(construct={target},device={arch(gcn)})
+int
+on (void)
+{
+  return 0;
+}
+
+int
+main ()
+{
+  int v;
+  #pragma omp target map(from:v)
+  v = on ();
+  switch (v)
+{
+default:
+  printf ("Host fallback or unknown offloading\n");
+  break;
+case 1:
+  printf ("Offloading to NVidia PTX\n");
+  break;
+case 2:
+  printf ("Offloading to AMD GCN\n");
+  break;
+}
+  return 0;
+}


Jakub



[PATCH] value-range: Give up on POLY_INT_CST ranges [PR97457]

2020-10-28 Thread Richard Sandiford via Gcc-patches
This PR shows another problem with calculating value ranges for
POLY_INT_CSTs.  We have:

  ivtmp_76 = ASSERT_EXPR  POLY_INT_CST [9, 4294967294]>

where the VQ coefficient is unsigned but is effectively acting
as a negative number.  We wrongly give the POLY_INT_CST the range:

  [9, INT_MAX]

and things go downhill from there: later iterations of the unrolled
epilogue are wrongly removed as dead.
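
To see why [9, INT_MAX] is wrong, the POLY_INT_CST can be evaluated by hand
in 32-bit unsigned arithmetic.  In this sketch x stands for the runtime
indeterminate (assumed to be a non-negative integer), and poly_value is a
made-up helper, not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Value of POLY_INT_CST [9, 4294967294] for indeterminate X, computed
   modulo 2^32.  Since 4294967294 is -2 modulo 2^32, this is 9 - 2 * X.  */
static uint32_t
poly_value (uint32_t x)
{
  return UINT32_C (9) + UINT32_C (4294967294) * x;
}
```

The results are 9, 7, 5, ... for x = 0, 1, 2, ..., so every x >= 1 gives a
value below the claimed lower bound of 9.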

I guess this is the final nail in the coffin for doing VRP on
POLY_INT_CSTs.  For other similarly exotic testcases we could have
overflow for any coefficient, not just those that could be treated
as contextually negative.

Testing TYPE_OVERFLOW_UNDEFINED doesn't seem like an option because we
couldn't handle warn_strict_overflow properly.  At this stage we're
just recording a range that might or might not lead to strict-overflow
assumptions later.

It still feels like we should be able to do something here, but for
now removing the code seems safest.  It's also telling that there
are no testsuite failures on SVE from doing this.

Tested on aarch64-linux-gnu (with and without SVE) and
x86_64-linux-gnu.  OK for trunk and backports?

Richard


gcc/
PR tree-optimization/97457
* value-range.cc (irange::set): Don't decay POLY_INT_CST ranges
to integer ranges.

gcc/testsuite/
PR tree-optimization/97457
* gcc.dg/vect/pr97457.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr97457.c | 15 +++
 gcc/value-range.cc  | 30 +
 2 files changed, 20 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97457.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97457.c b/gcc/testsuite/gcc.dg/vect/pr97457.c
new file mode 100644
index 000..506ba249b00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97457.c
@@ -0,0 +1,15 @@
+/* { dg-additional-options "-O3" } */
+
+int a;
+long c;
+signed char d(char e, char f) { return e + f; }
+int main(void) {
+  for (; a <= 1; a++) {
+c = -8;
+for (; c != 3; c = d(c, 1))
+  ;
+  }
+  char b = c;
+  if (b != 3)
+__builtin_abort();
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 7847104050c..2319c13388a 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -248,31 +248,11 @@ irange::set (tree min, tree max, value_range_kind kind)
   set_undefined ();
   return;
 }
-  if (kind == VR_RANGE)
-{
-  /* Convert POLY_INT_CST bounds into worst-case INTEGER_CST bounds.  */
-  if (POLY_INT_CST_P (min))
-   {
- tree type_min = vrp_val_min (TREE_TYPE (min));
- widest_int lb
-   = constant_lower_bound_with_limit (wi::to_poly_widest (min),
-  wi::to_widest (type_min));
- min = wide_int_to_tree (TREE_TYPE (min), lb);
-   }
-  if (POLY_INT_CST_P (max))
-   {
- tree type_max = vrp_val_max (TREE_TYPE (max));
- widest_int ub
-   = constant_upper_bound_with_limit (wi::to_poly_widest (max),
-  wi::to_widest (type_max));
- max = wide_int_to_tree (TREE_TYPE (max), ub);
-   }
-}
-  else if (kind != VR_VARYING)
-{
- if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
-   kind = VR_VARYING;
-}
+
+  if (kind != VR_VARYING
+  && (POLY_INT_CST_P (min) || POLY_INT_CST_P (max)))
+kind = VR_VARYING;
+
   if (kind == VR_VARYING)
 {
   set_varying (TREE_TYPE (min));


[committed] openmp: Parsing and some semantic analysis of OpenMP allocate clause

2020-10-28 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds parsing of OpenMP allocate clause, but still ignores
it during OpenMP lowering where we should for privatized variables
with allocate clause use the corresponding allocators rather than
allocating them on the stack.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-28  Jakub Jelinek  

gcc/
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_ALLOCATE.
* tree.h (OMP_CLAUSE_ALLOCATE_ALLOCATOR,
OMP_CLAUSE_ALLOCATE_COMBINED): Define.
* tree.c (omp_clause_num_ops, omp_clause_code_name): Add allocate
clause.
(walk_tree_1): Handle OMP_CLAUSE_ALLOCATE.
* tree-pretty-print.c (dump_omp_clause): Likewise.
* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses,
gimplify_omp_for): Likewise.
* tree-nested.c (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Likewise.
* omp-low.c (scan_sharing_clauses): Likewise.
gcc/c-family/
* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_ALLOCATE.
* c-omp.c: Include bitmap.h.
(c_omp_split_clauses): Handle OMP_CLAUSE_ALLOCATE.
gcc/c/
* c-parser.c (c_parser_omp_clause_name): Handle allocate.
(c_parser_omp_clause_allocate): New function.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_ALLOCATE.
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
OMP_TASK_CLAUSE_MASK, OMP_TASKGROUP_CLAUSE_MASK,
OMP_DISTRIBUTE_CLAUSE_MASK, OMP_TEAMS_CLAUSE_MASK,
OMP_TARGET_CLAUSE_MASK, OMP_TASKLOOP_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_ALLOCATE.
* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_ALLOCATE.
gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle allocate.
(cp_parser_omp_clause_allocate): New function.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_ALLOCATE.
(OMP_FOR_CLAUSE_MASK, OMP_SECTIONS_CLAUSE_MASK,
OMP_PARALLEL_CLAUSE_MASK, OMP_SINGLE_CLAUSE_MASK,
OMP_TASK_CLAUSE_MASK, OMP_TASKGROUP_CLAUSE_MASK,
OMP_DISTRIBUTE_CLAUSE_MASK, OMP_TEAMS_CLAUSE_MASK,
OMP_TARGET_CLAUSE_MASK, OMP_TASKLOOP_CLAUSE_MASK): Add
PRAGMA_OMP_CLAUSE_ALLOCATE.
* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_ALLOCATE.
* pt.c (tsubst_omp_clauses): Likewise.
gcc/testsuite/
* c-c++-common/gomp/allocate-1.c: New test.
* c-c++-common/gomp/allocate-2.c: New test.
* c-c++-common/gomp/clauses-1.c (omp_allocator_handle_t): New typedef.
(foo, bar, baz): Add allocate clauses where allowed.

--- gcc/tree-core.h.jj  2020-10-27 14:36:27.321461746 +0100
+++ gcc/tree-core.h 2020-10-27 18:40:11.130111517 +0100
@@ -276,6 +276,9 @@ enum omp_clause_code {
   /* OpenMP clause: aligned (variable-list[:alignment]).  */
   OMP_CLAUSE_ALIGNED,
 
+  /* OpenMP clause: allocate ([allocator:]variable-list).  */
+  OMP_CLAUSE_ALLOCATE,
+
   /* OpenMP clause: depend ({in,out,inout}:variable-list).  */
   OMP_CLAUSE_DEPEND,
 
--- gcc/tree.h.jj   2020-10-27 14:36:27.360461180 +0100
+++ gcc/tree.h  2020-10-27 18:40:11.125111589 +0100
@@ -1731,6 +1731,16 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_ALIGNED_ALIGNMENT(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALIGNED), 1)
 
+#define OMP_CLAUSE_ALLOCATE_ALLOCATOR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE), 1)
+
+/* True if an ALLOCATE clause was present on a combined or composite
+   construct and the code for splitting the clauses has already performed
+   checking if the listed variable has explicit privatization on the
+   construct.  */
+#define OMP_CLAUSE_ALLOCATE_COMBINED(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE)->base.public_flag)
+
 #define OMP_CLAUSE_NUM_TEAMS_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_NUM_TEAMS), 0)
 
--- gcc/tree.c.jj   2020-10-27 14:36:42.409242730 +0100
+++ gcc/tree.c  2020-10-27 18:40:11.129111532 +0100
@@ -291,6 +291,7 @@ unsigned const char omp_clause_num_ops[]
   1, /* OMP_CLAUSE_COPYPRIVATE  */
   3, /* OMP_CLAUSE_LINEAR  */
   2, /* OMP_CLAUSE_ALIGNED  */
+  2, /* OMP_CLAUSE_ALLOCATE  */
   1, /* OMP_CLAUSE_DEPEND  */
   1, /* OMP_CLAUSE_NONTEMPORAL  */
   1, /* OMP_CLAUSE_UNIFORM  */
@@ -375,6 +376,7 @@ const char * const omp_clause_code_name[
   "copyprivate",
   "linear",
   "aligned",
+  "allocate",
   "depend",
   "nontemporal",
   "uniform",
@@ -12213,6 +12215,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func
  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
case OMP_CLAUSE_ALIGNED:
+   case OMP_CLAUSE_ALLOCATE:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE_TO:
case OMP_CLAUSE_MAP:
--- gcc/tree-pretty-print.c.jj  2020-10-27 14:36:27.331461601 +0100
+++ gcc/tree-pretty-print.c 2020-10-27 18:40:11.125111589 +0

Re: libgo patch committed: Additional BSD-specific syscall wrappers

2020-10-28 Thread Rainer Orth
Hi Ian,

> This libgo patch by Nikhil Benesch imports additional code from
> upstream for handling system calls on BSD systems. This makes the
> syscall package on NetBSD complete enough to compile the standard
> library.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
> Committed to mainline.

this patch broke Solaris bootstrap:

/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_solaris_largefile.go:12:1: error: redefinition of 'ReadDirent'
   12 | func ReadDirent(fd int, buf []byte) (n int, err error) {
      | ^
/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_bsd.go:27:1: note: previous definition of 'ReadDirent' was here
   27 | func ReadDirent(fd int, buf []byte) (n int, err error) {
      | ^
libcalls.go:2320:1: error: redefinition of 'raw_ptrace'
 2320 | func raw_ptrace(request int, pid int, addr uintptr, data uintptr) (err Errno) {
      | ^
libcalls.go:383:1: note: previous definition of 'raw_ptrace' was here
  383 | func raw_ptrace(request int, pid int, addr uintptr, data uintptr) (err Errno) {
      | ^
/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_bsd.go:33:16: error: reference to undefined name 'Getdirentries'
   33 | return Getdirentries(fd, buf, base)
      |^
/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_bsd.go:33:9: error: not enough arguments to return
   33 | return Getdirentries(fd, buf, base)
      | ^
/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_bsd.go:69:21: error: reference to undefined name 'nametomib'
   69 | mib, err := nametomib(name)
      | ^
/vol/gcc/src/hg/master/local/libgo/go/syscall/libcall_bsd.go:98:21: error: reference to undefined name 'nametomib'
   98 | mib, err := nametomib(name)
      | ^
libcalls.go:2321:85: error: argument 4 has incompatible type
 2321 | _r := c_ptrace(_C_int(request), Pid_t(pid), (*byte)(unsafe.Pointer(addr)), (*byte)(unsafe.Pointer(data)))
      | ^
make[4]: *** [Makefile:2912: syscall.lo] Error 1

Of the functions used there, ptrace (32-bit only) is already handled and
sysctl, paccept and flock don't exist on Solaris.  Only pipe2 does
exist, but it's only in Solaris 11.4 and Illumos, not Solaris 11.3, so
better left off for now.

Fixed by removing the solaris build tag.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


diff --git a/libgo/go/syscall/libcall_bsd.go b/libgo/go/syscall/libcall_bsd.go
--- a/libgo/go/syscall/libcall_bsd.go
+++ b/libgo/go/syscall/libcall_bsd.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd netbsd openbsd solaris
+// +build darwin dragonfly freebsd netbsd openbsd
 
 // BSD library calls.
 


Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-28 Thread Hongyu Wang via Gcc-patches
Hi Uros,

Thanks for the example. We've updated the patterns with new expanders
and predicates, like vzeroall.
The insn now generated for "encodekey128u32" looks like this:

(insn 7 6 8 2 (parallel [
(set (reg:SI 84 [  ])
(unspec_volatile:SI [
(reg:SI 85)
(reg:V2DI 20 xmm0)
] UNSPECV_ENCODEKEY128U32))
(set (reg:V2DI 20 xmm0)
(unspec_volatile:V2DI [
(const_int 0 [0])
] UNSPECV_ENCODEKEY128U32))
(set (reg:V2DI 21 xmm1)
(unspec_volatile:V2DI [
(const_int 0 [0])
] UNSPECV_ENCODEKEY128U32))
(set (reg:V2DI 22 xmm2)
(unspec_volatile:V2DI [
(const_int 0 [0])
] UNSPECV_ENCODEKEY128U32))
(set (reg:V2DI 24 xmm4)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
]))
(set (reg:V2DI 25 xmm5)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
]))
(set (reg:V2DI 26 xmm6)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
]))
(clobber (reg:CC 17 flags))

The patch is rebased on the 2020-10-27 trunk and updated.

On Wed, Oct 21, 2020 at 8:20 PM, Uros Bizjak wrote:
>
> On Wed, Oct 21, 2020 at 1:48 PM Uros Bizjak  wrote:
> >
> > On Wed, Oct 21, 2020 at 11:11 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > > IIRC, adding a new regclass is O(n^2), so it should be avoided. I
> > > > think that the new patterns should follow the same path as vzeroall
> > > > and vzeroupper patterns, where we emit the pattern with explicit hard
> > > > regs.
> > > >
> > > > BTW: We do have SSE_FIRST_REG class, but this class was added to solve
> > > > some reload problems in the past by marking %xmm0 as likely spilled.
> > >
> > > Thanks for your suggestion. We have removed the register classes and
> > > constraints, and set explicit SSE hard registers in the expander. The
> > > corresponding patterns are also adjusted.
> > >
> > > Updated and rebased patch.
> >
> > The attached patch goes only half-way to using explicit registers. As
> > said previously, please see how avx_vzeroall expander is generating
> > its insn pattern, and how *avx_vzeroall matches the generated pattern
> > using "vzeroall_operation" predicate.
>
> For example:
>
> +(define_insn "encodekey128u32"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(unspec_volatile:SI
> +  [(match_operand:SI   1 "register_operand" "r")
> +   (match_operand:V2DI 3 "register_operand" "2")]
> + UNSPECV_ENCODEKEY128U32))
>
> should be generated as:
>
> (parallel [
>   (set ( ... as above ... )
> (unspec_volatile:SI [( ... as above ... ) ( reg:V2DI 20 xmm0 )]
> UNSPEC_ENCODEKEY128U32))
>
> followed by a series of:
>
>(set (reg:V2DI 20 xmm0)
> (unspec_volatile:V2DI [(const_int 0)] UNSPECV_ENCODEKEY128U32))
>
> no need to duplicate already listed input operands in unspec_volatile.
>
> followed by another series of:
>
>(set (reg:V2DI 26 xmm6)
> (const_vector:V2DI [(const_int 0) (const_int 0)]))
>
> to tell the optimizer that some registers now hold zero, so the value
> in the register can eventually be reused elsewhere.
>
> and finish the parallel with clobber of flags_reg.
>
> Another example:
>
> +(define_insn "aesu8"
> +  [(set (reg:CCZ FLAGS_REG)
> +(unspec_volatile:CCZ [(match_operand:BLK 0 "memory_operand" "m")
> +  (match_operand:V2DI 9  "register_operand" "1")
> +  (match_operand:V2DI 2  "sse_reg_operand")
> +  (match_operand:V2DI 3  "sse_reg_operand")
> +  (match_operand:V2DI 4  "sse_reg_operand")
> +  (match_operand:V2DI 5  "sse_reg_operand")
> +  (match_operand:V2DI 6  "sse_reg_operand")
> +  (match_operand:V2DI 7  "sse_reg_operand")
> +  (match_operand:V2DI 8  "sse_reg_operand")]
> + AESDECENCWIDEKL))
> +   (set (match_operand:V2DI 1 "register_operand" "=Yz")
> +(unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
> +   (set (match_dup 2)
> +(unspec_volatile:V2DI [(const_int 0)] AESDECENCWIDEKL))
>
> This should be written as:
>
> parallel [
>   (set ( ... as above ... )
> (unspec_volatile:CCZ [( ... as above, BLK only ... )]
> UNSPEC_AESDECENWIDEKL))
>
> followed by a series of:
>
>(set (reg:V2DI 20 xmm0)
> (unspec_volatile:V2DI [(reg:V2DI 20 xmm0)] UNSPEC_AESDECENCWIDEKL))
>
> Please see the mentioned expander and pattern for how the above series
> is generated and matched.
>
> Uros.
From b19d4d0694911ad5ffd4fcbe1b52a6a66f9

Re: testsuite: Enable and adjust powerpc fold-vec-extract/insert testcases

2020-10-28 Thread Alan Modra via Gcc-patches
git commit badeac77f552 changed expected number of addi instructions,
causing these fails on powerpc-linux.

gcc.target/powerpc/fold-vec-insert-int-p9.c: \\maddi\\M found 12 times
FAIL: gcc.target/powerpc/fold-vec-insert-int-p9.c scan-assembler-times 
\\maddi\\M 8
gcc.target/powerpc/fold-vec-extract-char.p9.c: addi found 6 times
FAIL: gcc.target/powerpc/fold-vec-extract-char.p9.c scan-assembler-times addi 3
gcc.target/powerpc/fold-vec-extract-int.p9.c: \\maddi\\M found 6 times
FAIL: gcc.target/powerpc/fold-vec-extract-int.p9.c scan-assembler-times 
\\maddi\\M 3
gcc.target/powerpc/fold-vec-extract-longlong.p7.c: \\maddi\\M found 6 times
FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p7.c scan-assembler-times 
\\maddi\\M 4
gcc.target/powerpc/fold-vec-extract-longlong.p8.c: \\maddi\\M found 6 times
FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p8.c scan-assembler-times 
\\maddi\\M 4
changed by badeac77f552

I'm not at all sure why we are counting addi.  On linux I see
eight in fold-vec-insert-int-p9.c tearing down the stack frame in
function epilogues, and four in
addi 9,1,16
lvewx 0,0,9
For aix you have the above four but with a -16 offset.  There are no
stack frames, and you have four addressing stack red-zone as
addi 9,1,-64

fold-vec-extract-char.p9.c on linux just has epilogue addi, aix has
red-zone addressing.  The same for fold-vec-extract-int.p9.c,
fold-vec-extract-longlong.p7.c and fold-vec-extract-longlong.p8.c.

It seems silly to count addi in a function epilogue, and fragile to
count them in code.  So remove the ilp32 addi checks.

Regression tested powerpc64-linux and powerpc64le-linux.  OK?

* gcc.target/powerpc/fold-vec-extract-char.p9.c: Don't check addi
count for ilp32.
* gcc.target/powerpc/fold-vec-extract-int.p9.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-longlong.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p9.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p9.c
index ff03c9a722b..8a4c380edad 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p9.c
@@ -12,7 +12,6 @@
 
 /* { dg-final { scan-assembler-times "stxv" 6 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lbz" 6 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "addi" 3 { target ilp32 } } } */
 
 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p9.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p9.c
index 868b673cdaf..1abf19da40d 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p9.c
@@ -18,7 +18,6 @@
 /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstxv\M} 6 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlwz\M} 6 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 3 { target ilp32 } } } */
 
 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p7.c
index 033d21c9a43..b97fcb40eda 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p7.c
@@ -22,7 +22,6 @@
 /* -m32 target with constant test uses (+2)li where the -m64 has an ld */
 /* { dg-final { scan-assembler-times {\mli\M} 5 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\maddi\M} 6 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mldx\M} 3 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c
index 0b624d262e1..8ddce3fd2d8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c
@@ -17,7 +17,6 @@
 /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target 
ilp32 } } } */
 /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */
diff --git a/gcc/testsuite/gcc.target/powerp

Re: [PATCH] SLP vectorize across PHI nodes

2020-10-28 Thread Christophe Lyon via Gcc-patches
On Tue, 27 Oct 2020 at 13:18, Richard Biener  wrote:
>
> This makes SLP discovery detect backedges by seeding the bst_map with
> the node to be analyzed so it can be picked up from recursive calls.
> This removes the need to discover backedges in a separate walk.
>
> This enables SLP build to handle PHI nodes in full, continuing
> the SLP build to non-backedges.  For loop vectorization this
> enables outer loop vectorization of nested SLP cycles and for
> BB vectorization this enables vectorization of PHIs at CFG merges.
>
> It also turns code generation into a SCC discovery walk to handle
> irreducible regions and nodes only reachable via backedges where
> we now also fill in vectorized backedge defs.
>
> This requires sanitizing the SLP tree for SLP reduction chains even
> more, manually filling the backedge SLP def.
>
> This also exposes the fact that CFG copying (and edge splitting
> until I fixed that) ends up with different edge order in the
> copy which doesn't play well with the desired 1:1 mapping of
> SLP PHI node children and edges for epilogue vectorization.
> I've tried to fixup CFG copying here but this really looks
> like a dead (or expensive) end there so I've done fixup in
> slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> we can run into.
>
> There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> not sure it's possible to eliminate them all this stage1 so the
> patch has quite some checks for this case all over the place.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> and SPEC CPU 2006 successfully built and tested.
>
> Will push soon.
>
> Richard.
>
> 2020-10-27  Richard Biener  
>
> * gimple.h (gimple_expr_type): For PHIs return the type
> of the result.
> * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
> Make sure edge order into copied loop headers line up with the
> originals.
> * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> loops with SLP.
> (vectorizable_phi): New function.
> (vectorizable_live_operation): For BB vectorization compute insert
> location here.
> * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> SLP_TREE_CHILDREN entries.
> (vect_create_new_slp_node): Add overloads with pre-existing node
> argument.
> (vect_print_slp_graph): Likewise.
> (vect_mark_slp_stmts): Likewise.
> (vect_mark_slp_stmts_relevant): Likewise.
> (vect_gather_slp_loads): Likewise.
> (vect_optimize_slp): Likewise.
> (vect_slp_analyze_node_operations): Likewise.
> (vect_bb_slp_scalar_cost): Likewise.
> (vect_remove_slp_scalar_calls): Likewise.
> (vect_get_and_check_slp_defs): Handle PHIs.
> (vect_build_slp_tree_1): Handle PHIs.
> (vect_build_slp_tree_2): Continue SLP build, following PHI
> arguments.  Fix memory leak.
> (vect_build_slp_tree): Put stub node into the hash-map so
> we can discover cycles directly.
> (vect_build_slp_instance): Set the backedge SLP def for
> reduction chains.
> (vect_analyze_slp_backedges): Remove.
> (vect_analyze_slp): Do not call it.
> (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
> (vect_slp_analyze_node_operations): Handle stray failed
> backedge defs by failing.
> (vect_slp_build_vertices): Adjust leaf condition.
> (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> hash-set to handle cycles.
> (vect_slp_analyze_operations): Adjust.
> (vect_bb_partition_graph_r): Likewise.
> (vect_slp_function): Adjust split condition to allow CFG
> merges.
> (vect_schedule_slp_instance): Rename to ...
> (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> (vect_schedule_scc): ... this new function.
> (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> backedge fill code.
> * tree-vect-stmts.c (vect_analyze_stmt): Call
> vectorizable_phi.
> (vect_transform_stmt): Likewise.
> (vect_is_simple_use): Handle vect_backedge_def.
> * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> set loop header PHIs to vect_unknown_def_type for loop
> vectorization.
> * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
> (enum stmt_vec_info_type): Add phi_info_type.
> (vectorizable_phi): Declare.
>
> * gcc.dg/vect/bb-slp-54.c: New test.
> * gcc.dg/vect/bb-slp-55.c: Likewise.
> * gcc.dg/vect/bb-slp-56.c: Likewise.
> * gcc.dg/vect/bb-slp-57.c: Likewise.
> * gcc.dg/vect/bb-slp-58.c: Likewise.
> * gcc.dg/vect/bb-slp-59.c: Likewise.
> * gcc.dg/vect/bb-slp-60.c: Likewise.
> * gcc.dg/vect/bb-slp-61.c: Likewise.
> * gcc.dg/vect/bb-slp-62.c: Likewise

Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts

2020-10-28 Thread Chung-Lin Tang

Hi Jakub,
Thank you for the review.

On 2020/10/13 9:01 PM, Jakub Jelinek wrote:

 gcc/c-family/
 * c-common.h (c_omp_adjust_clauses): New declaration.
 * c-omp.c (c_omp_adjust_clauses): New function.

Besides the naming, I wonder why is it done in a separate function and so
early, can't what the function does be done either in
{,c_}finish_omp_clauses (provided we'd pass separate ORT_OMP vs.
ORT_OMP_TARGET to it to determine if it is target region vs. anything else),
or perhaps even better during gimplification (gimplify_scan_omp_clauses)?


I figured that differentiating with something like "C_ORT_OMP_TARGET" could be
more error-prone when adjusting changes related to C_ORT_OMP across the code;
a separate function also has the added benefit of keeping the handling logic in
a single place shared by C and C++.

You're right about the need for early addressable-marking. I learned that the
hard way: one of my prior attempts placed this code somewhere in gimplify, and
it didn't work.


 gcc/cp/
 * parser.c (cp_parser_omp_target_data): Add use of
 new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as
 handled map clause kind.
 (cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.

The changelog has some 8 space indented lines.


I'll take care of that in the final git push.


+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+   && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
+   && TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c))) != ARRAY_TYPE)
+  {
+   tree ptr = OMP_CLAUSE_DECL (c);
+   bool ptr_mapped = false;
+   if (is_target)
+ {
+   for (tree m = clauses; m; m = OMP_CLAUSE_CHAIN (m))

Isn't this O(n^2) in number of clauses?  I mean, e.g. for the equality
comparisons (but see below) it could be dealt with e.g. using some bitmap
with DECL_UIDs.


At this stage we don't assume any ordering of the clauses, nor do we try to
modify their order yet, so the base-pointer map (if it exists) could be
anywhere in the list (building some "visited set" isn't really suitable here).
I don't think this is much of a concern, though.
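
For reference, the DECL_UID-set idea raised in the review comment does not
depend on clause order if it is done in two passes over the list. The following
is only a minimal sketch under that assumption; `clause`, its flags and
`pointer_also_mapped` are hypothetical illustration names, not GCC's tree API,
and a std::unordered_set stands in for a GCC bitmap:

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a map clause; not GCC's tree representation.
struct clause
{
  unsigned decl_uid;            // plays the role of DECL_UID of the clause decl
  bool is_firstprivate_pointer; // a GOMP_MAP_FIRSTPRIVATE_POINTER-like clause
  bool is_plain_map;            // an alloc/to/from/tofrom-like clause
};

// For each clause, report whether it is a pointer clause whose decl is also
// mapped by some plain map clause located anywhere in the list.
std::vector<bool>
pointer_also_mapped (const std::vector<clause> &clauses)
{
  // Pass 1: record the uid of every plainly mapped decl.
  std::unordered_set<unsigned> mapped;
  for (const clause &c : clauses)
    if (c.is_plain_map)
      mapped.insert (c.decl_uid);

  // Pass 2: one set lookup per pointer clause instead of an inner loop.
  std::vector<bool> result;
  for (const clause &c : clauses)
    result.push_back (c.is_firstprivate_pointer
                      && mapped.count (c.decl_uid) != 0);
  return result;
}
```

This only covers the plain decl-equality case; anything like the
structx.ptr[:32] form discussed below would need a different key.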


+ if (OMP_CLAUSE_CODE (m) == OMP_CLAUSE_MAP
+ && OMP_CLAUSE_DECL (m) == ptr

Does it really need to be equality?  I mean it will be for
map(tofrom:ptr) map(tofrom:ptr[:32])
but what about e.g.
map(tofrom:structx) map(tofrom:structx.ptr[:32])
?  It is true that likely we don't parse this yet though.


COMPONENT_REF-based expressions are actually handled quite differently, in
gimplify_scan_omp_clauses. I'm not completely sure there is nothing to handle
for the code in this patch set, but such testcases will have to be discovered
later.


+ && (OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALLOC
+ || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TO
+ || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_FROM
+ || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TOFROM))

What about the always modified mapping kinds?


Took care of that.


+   {
+ ptr_mapped = true;
+ break;
+   }
+
+   if (!ptr_mapped
+   && DECL_P (ptr)
+   && is_global_var (ptr)
+   && lookup_attribute ("omp declare target",
+DECL_ATTRIBUTES (ptr)))
+ ptr_mapped = true;
+ }
+
+   /* If the pointer variable was mapped, or if this is not an offloaded
+  target region, adjust the map kind to attach/detach.  */
+   if (ptr_mapped || !is_target)
+ {
+   OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ATTACH_DETACH);
+   c_common_mark_addressable_vec (ptr);

Though perhaps this is argument why it needs to be done in the FEs and not
during gimplification, because it is hard to mark something addressable at
that point.


Discussed above.


--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13580,16 +13580,17 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
break;
  }
tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
if (ort != C_ORT_OMP && ort != C_ORT_ACC)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_POINTER);
else if (TREE_CODE (t) == COMPONENT_REF)
{
- gomp_map_kind k = (ort == C_ORT_ACC) ? GOMP_MAP_ATTACH_DETACH
-  : GOMP_MAP_ALWAYS_POINTER;
+

Re: [PATCH, 2/3, OpenMP] Target mapping changes for OpenMP 5.0, middle-end parts and compiler testcases

2020-10-28 Thread Chung-Lin Tang

On 2020/10/13 9:31 PM, Jakub Jelinek wrote:

+/* Implement OpenMP 5.x map ordering rules for target directives. There are
+   several rules, and with some level of ambiguity, hopefully we can at least
+   collect the complexity here in one place.  */
+
+static void
+omp_target_reorder_clauses (tree *list_p)
+{

So, first of all, are you convinced we can sort just the explicit clauses
and leave out the (later on) implicitly added ones?
If it is possible, sure, it will be easier (because we later on need to deal
with the GOMP_MAP_STRUCT sorting too).


Yeah, there will probably be more cases to handle later, possibly including
sinking the call to omp_target_reorder_clauses until after the main handling in
gimplify_scan_omp_clauses. But the current routine handles a straightforward
set of cases, which can be extended later.


+  vec clauses = vNULL;

Isn't this a memory leak?  Nothing frees the vector.  Perhaps better
   auto_vec clauses;


+  for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp))
+clauses.safe_push (*cp);

The rest of the function deals only with OMP_CLAUSE_MAP clauses, wouldn't it
be better to just push to the vec those clauses and keep other clauses just
in *list_p chain?


+  /* Collect refs to alloc/release/delete maps.  */
+  vec ard = vNULL;

Again, auto_vec ard;


Thanks for catching this. I'm using auto_vec now.
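
To illustrate why the switch matters: a plain handle needs an explicit release
call, while a wrapper with a destructor releases on scope exit. This is a
simplified sketch, not GCC's vec.h; `plain_vec`, `auto_plain_vec` and the
`live_blocks` counter are hypothetical names, and the counter stands in for
real heap allocation:

```cpp
#include <cassert>

// Counts "allocations" that have not been released yet.
static int live_blocks = 0;

// Plain handle: storage must be released explicitly by the user.
struct plain_vec
{
  bool allocated = false;

  void reserve ()
  {
    if (!allocated)
      {
        allocated = true;
        ++live_blocks;
      }
  }

  void release ()
  {
    if (allocated)
      {
        assert (live_blocks > 0);
        allocated = false;
        --live_blocks;
      }
  }
};

// RAII wrapper: releases the storage when it goes out of scope.
struct auto_plain_vec : plain_vec
{
  auto_plain_vec () = default;
  auto_plain_vec (const auto_plain_vec &) = delete;
  auto_plain_vec &operator= (const auto_plain_vec &) = delete;
  ~auto_plain_vec () { release (); }
};

// Returns how many blocks are still live after using a plain handle
// without calling release ().
int blocks_left_by_plain_use ()
{
  int before = live_blocks;
  {
    plain_vec v;
    v.reserve ();
  } // nothing releases the storage here
  return live_blocks - before;
}

// Same usage with the RAII wrapper.
int blocks_left_by_auto_use ()
{
  int before = live_blocks;
  {
    auto_plain_vec v;
    v.reserve ();
  } // destructor releases the storage
  return live_blocks - before;
}
```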


+  tree *cp = list_p;
+  for (unsigned int i = 0; i < clauses.length (); i++)
+if (clauses[i])
+  {
+   *cp = clauses[i];
+   cp = &OMP_CLAUSE_CHAIN (clauses[i]);
+  }
+  for (unsigned int i = 0; i < ard.length (); i++)
+{
+  *cp = ard[i];
+  cp = &OMP_CLAUSE_CHAIN (ard[i]);
+}
+  *cp = NULL_TREE;
+
+  /* OpenMP 5.0 requires that pointer variables are mapped before
+ its use as a base-pointer.  */
+  for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp))
+if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP)
+  {
+   tree decl = OMP_CLAUSE_DECL (*cp);
+   gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp);
+   if ((k == GOMP_MAP_ALLOC
+|| k == GOMP_MAP_TO
+|| k == GOMP_MAP_FROM
+|| k == GOMP_MAP_TOFROM)

What about the *ALWAYS* kinds?


Adjustment done, plus re-written so only one pass of this checking is done.


+   && (TREE_CODE (decl) == INDIRECT_REF
+   || TREE_CODE (decl) == MEM_REF))
+ {
+   tree base_ptr = TREE_OPERAND (decl, 0);
+   STRIP_TYPE_NOPS (base_ptr);
+   for (tree *cp2 = &OMP_CLAUSE_CHAIN (*cp); *cp2;
+cp2 = &OMP_CLAUSE_CHAIN (*cp2))
+ if (OMP_CLAUSE_CODE (*cp2) == OMP_CLAUSE_MAP)
+   {
+ tree decl2 = OMP_CLAUSE_DECL (*cp2);
+ gomp_map_kind k2 = OMP_CLAUSE_MAP_KIND (*cp2);
+ if ((k2 == GOMP_MAP_ALLOC
+  || k2 == GOMP_MAP_TO
+  || k2 == GOMP_MAP_FROM
+  || k2 == GOMP_MAP_TOFROM)

Again.

This is O(n^2) too, but due to the is_or_contains_p I'm not sure
if we can avoid it.  Perhaps sort the clauses by uid of the base expressions
and deal with those separately.  Maybe let's ignore it for now.


I re-wrote most of omp_target_reorder_clauses to be more efficient. The O(n^2)
issues should be fixed now.
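
For readers following along, the relinking step in the hunk quoted earlier
(explicit clauses first, collected alloc/release/delete maps appended at the
end, relative order preserved) amounts to a stable partition. The sketch below
only illustrates that ordering on a plain vector; `map_clause`, `map_kind` and
`sink_alloc_release_delete` are hypothetical names, and the selection predicate
is an assumption, since the condition that fills the "ard" vector is not shown
in the quoted hunk:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified map kinds and clause record.
enum map_kind { MAP_TO, MAP_FROM, MAP_TOFROM, MAP_ALLOC, MAP_RELEASE, MAP_DELETE };

struct map_clause
{
  int id;        // just a label to make the resulting order visible
  map_kind kind;
};

// Move alloc/release/delete clauses to the end, keeping the relative order
// within both groups unchanged.
void
sink_alloc_release_delete (std::vector<map_clause> &clauses)
{
  std::stable_partition (clauses.begin (), clauses.end (),
                         [] (const map_clause &c)
                         {
                           return !(c.kind == MAP_ALLOC
                                    || c.kind == MAP_RELEASE
                                    || c.kind == MAP_DELETE);
                         });
}
```

Starting from clauses 1 (alloc), 2 (to), 3 (delete), 4 (from), the resulting
order is 2, 4, 1, 3.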


@@ -8958,25 +9083,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  /* An "attach/detach" operation on an update directive should
 behave as a GOMP_MAP_ALWAYS_POINTER.  Beware that
 unlike attach or detach map kinds, GOMP_MAP_ALWAYS_POINTER
 depends on the previous mapping.  */
  if (code == OACC_UPDATE
  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH)
OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER);
- if (gimplify_expr (pd, pre_p, NULL, is_gimple_lvalue, fb_lvalue)
- == GS_ERROR)
-   {
- remove = true;
- break;
-   }

So what gimplifies those now?


They're gimplified somewhere during omp-low now.
(some gimplify scan testcases were adjusted to accommodate this change)

I don't remember the exact case I encountered, but there were some issues with
gimplified expressions inside the map clauses making some later checking more
difficult. I haven't seen any negative effect of this modification so far.


+   if (! (code == OMP_TARGET
+  || code == OMP_TARGET_DATA
+  || code == OMP_TARGET_ENTER_DATA
+  || code == OMP_TARGET_EXIT_DATA))
+ {

Isn't this just if ((region_type & ORT_ACC) == 0) ?  Or do we want
it for target update too?  Though, we wouldn't talk about more than once in
map clauses then because target update doesn't have those.


It's actually "(region_type & ORT_ACC) != 0", which I now use in the patch.
I orig

Re: [PATCH, 3/3, OpenMP] Target mapping changes for OpenMP 5.0, libgomp parts [resend]

2020-10-28 Thread Chung-Lin Tang

On 2020/9/1 9:37 PM, Chung-Lin Tang wrote:

This patch is the changes to libgomp and testcases.

There is now (again) a need to indicate OpenACC/OpenMP and
an 'enter data' style directive, hence the associated changes to
'enum gomp_map_vars_kind'.

There is a slight change in the logic of gomp_attach_pointer
handling, because for OpenMP there might be a non-offloaded
data clause that attempts an attachment but silently continues
in case the pointer is not mapped.

Also in the testcases, an XFAILed testcase for structure element
mapping is added. OpenMP 5.0 specifies that elements of the same
structure variable are allocated/deallocated in a uniform fashion,
but this hasn't been implemented yet in this patch.
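
As a small illustration of why the enum values become powers of two and the
checks become bit-ands: once a caller passes two flags OR-ed together, an
equality test against a single flag no longer matches. This is a minimal sketch
with hypothetical names (`map_vars_kind`, `initial_refcount`); only the values
and the shape of the refcount test are taken from the patch:

```cpp
// Hypothetical copy of the flag values used in the patch.
enum map_vars_kind : unsigned
{
  MAP_VARS_OPENACC    = 1,
  MAP_VARS_TARGET     = 2,
  MAP_VARS_DATA       = 4,
  MAP_VARS_ENTER_DATA = 8
};

// Bit-and test: matches whenever the ENTER_DATA bit is set, regardless of
// which other flags accompany it.
int
initial_refcount (unsigned pragma_kind)
{
  return (pragma_kind & MAP_VARS_ENTER_DATA) ? 0 : 1;
}

// Equality test: only matches when ENTER_DATA is the sole flag.
int
initial_refcount_by_equality (unsigned pragma_kind)
{
  return pragma_kind == MAP_VARS_ENTER_DATA ? 0 : 1;
}
```

With `MAP_VARS_OPENACC | MAP_VARS_ENTER_DATA` the first function yields 0 and
the second yields 1, which is why every former `==` check has to change
together with the enum.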


Hi Jakub,
you haven't reviewed this 3rd part yet, but I'm still updating with a rebased
patch here.

I've removed the above-mentioned XFAILed testcase from the patch, since it
actually belongs in the structure element mapping patches instead of here.

Thanks,
Chung-Lin

libgomp/
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.

* target.c (gomp_map_vars_internal): Change checks of
GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use of
gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index da7ac037dcd..0cc3f4d406b 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1162,10 +1162,10 @@ struct gomp_device_descr
 /* Kind of the pragma, for which gomp_map_vars () is called.  */
 enum gomp_map_vars_kind
 {
-  GOMP_MAP_VARS_OPENACC,
-  GOMP_MAP_VARS_TARGET,
-  GOMP_MAP_VARS_DATA,
-  GOMP_MAP_VARS_ENTER_DATA
+  GOMP_MAP_VARS_OPENACC= 1,
+  GOMP_MAP_VARS_TARGET = 2,
+  GOMP_MAP_VARS_DATA   = 4,
+  GOMP_MAP_VARS_ENTER_DATA = 8
 };
 
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 65757ab2ffc..8dc521ac6d6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -403,7 +403,8 @@ acc_map_data (void *h, void *d, size_t s)
 
   struct target_mem_desc *tgt
= gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
-&kinds, true, GOMP_MAP_VARS_ENTER_DATA);
+&kinds, true,
+GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA);
   assert (tgt);
   assert (tgt->list_count == 1);
   splay_tree_key n = tgt->list[0].key;
@@ -572,7 +573,8 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void 
*kinds, int async)
 
   struct target_mem_desc *tgt
= gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,
-  kinds, true, GOMP_MAP_VARS_ENTER_DATA);
+  kinds, true,
+  GOMP_MAP_VARS_OPENACC | 
GOMP_MAP_VARS_ENTER_DATA);
   assert (tgt);
   assert (tgt->list_count == 1);
   n = tgt->list[0].key;
@@ -1202,7 +1204,7 @@ goacc_enter_data_internal (struct gomp_device_descr 
*acc_dev, size_t mapnum,
  struct target_mem_desc *tgt
= gomp_map_vars_async (acc_dev, aq, groupnum, &hostaddrs[i], NULL,
   &sizes[i], &kinds[i], true,
-  GOMP_MAP_VARS_ENTER_DATA);
+  GOMP_MAP_VARS_OPENACC | 
GOMP_MAP_VARS_ENTER_DATA);
  assert (tgt);
 
  gomp_mutex_lock (&acc_dev->lock);
diff --git a/libgomp/target.c b/libgomp/target.c
index 1a8c67c2df5..61dab064fae 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -683,7 +683,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
   struct target_mem_desc *tgt
 = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
-  tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
+  tgt->refcount = (pragma_kind & GOMP_MAP_VARS_ENTER_DATA) ? 0 : 1;
   tgt->device_descr = devicep;
   tgt->prev = NULL;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
@@ -1212,15 +1212,16 @@ gomp_map_vars_internal (struct gomp_device_descr 
*devicep,
  /* OpenACC 'attach'/'detach' doesn't affect
 structured/dynamic reference counts ('n->refcount',
 'n->dynamic_refcount').  */
+
+ gomp_attach_pointer (devicep, aq, mem_map, n,
+  (uintptr_t) hostaddrs[i], sizes[i],
+  cbufp);
}
-  

Re: [PATCH] tree-optimization/97428 - split SLP groups for loop vectorization

2020-10-28 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 27 Oct 2020, Richard Sandiford wrote:
>
>> Sorry for the very late comment (was out last week)?
>> 
>> Richard Biener  writes:
>> > This enables SLP store group splitting also for loop vectorization.
>> > For the existing testcase gcc.dg/vect/vect-complex-5.c this then
>> > generates much better code, likewise for the PR97428 testcase.
>> >
>> > Both of those have a splitting opportunity splitting the group
>> > into two equal (vector-sized) halves, still the patch enables
>> > quite arbitrary splitting since generally the interleaving scheme
>> > results in quite awkward code for even small groups.  If any
>> > problems surface with this it's easy to restrict the splitting
>> > to known-good cases.  Is there any additional constraints for
>> > non-constant sized vectors?  Note this interacts with vector
>> > size iteration (but comparing interleaving cost with SLP cost
>> > of a smaller vector size doesn't reliably pick the smaller
>> > vector size).
>> 
>> Not sure about the variable-sized vector aspect.  For SVE it
>> isn't really natural to split the store itself up: I think we'd
>> instead want to keep a unified store and blend in the stored
>> values where necessary.  E.g. rather than split:
>> 
>>   a a a a b b c c
>> 
>> into:
>> 
>>   a a a a
>>   b b
>>   c c
>> 
>> we'd be better off having predicated groups of the form:
>> 
>>   a a a a _ _ _ _
>>   _ _ _ _ b b _ _
>>   _ _ _ _ _ _ c c
>> 
>> This is one thing on the very long todo list :-/
>
> Hmm, I see.  Looking at the case of a group_size == 3 store
> right now which (for the sake of register pressure) would
> benefit from V4xy vectorization and a masked store, doing
> sth "smart" to fill up lane 4 (duplicating another one
> would always work but possibly make loads more expensive,
> masking would work here as well).

Yeah.  Also, SVE has an instruction that fills up a predicate up to the
largest multiple of 3.  So for a group size of 3 we could do something
like:

        ptrue   p0.b, mul3
        ld1b    z0.b, p0/z, ...
        ...
        st1b    z0.b, p0, ...

For the final (possibly partial) iteration we'd just use WHILELO as
normal, knowing that nscalars * 3 fits into a vector.
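
A scalar model of that predicate may help: for a vector of N byte lanes, the
"largest multiple of 3" pattern activates the first N - N % 3 lanes, so whole
groups of three never straddle the end of the vector. This is only a sketch of
the arithmetic; the function names are hypothetical and nothing here uses SVE
intrinsics:

```cpp
#include <vector>

// Number of active lanes for a "largest multiple of 3" predicate.
unsigned
mul3_active_lanes (unsigned lanes)
{
  return lanes - lanes % 3;
}

// The predicate itself, as one bool per lane.
std::vector<bool>
mul3_predicate (unsigned lanes)
{
  std::vector<bool> pred (lanes, false);
  for (unsigned i = 0; i < mul3_active_lanes (lanes); ++i)
    pred[i] = true;
  return pred;
}

// Whole groups of three scalars handled per vector iteration.
unsigned
groups_per_vector (unsigned lanes)
{
  return lanes / 3;
}
```

For a 128-bit vector of bytes (16 lanes) this gives 15 active lanes and 5
groups per iteration, with the last lane left inactive.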

Yet another thing on the to-do list :-)

Thanks,
Richard


Re: [PATCH] SLP vectorize across PHI nodes

2020-10-28 Thread Christophe Lyon via Gcc-patches
On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
 wrote:
>
> On Tue, 27 Oct 2020 at 13:18, Richard Biener  wrote:
> >
> > This makes SLP discovery detect backedges by seeding the bst_map with
> > the node to be analyzed so it can be picked up from recursive calls.
> > This removes the need to discover backedges in a separate walk.
> >
> > This enables SLP build to handle PHI nodes in full, continuing
> > the SLP build to non-backedges.  For loop vectorization this
> > enables outer loop vectorization of nested SLP cycles and for
> > BB vectorization this enables vectorization of PHIs at CFG merges.
> >
> > It also turns code generation into a SCC discovery walk to handle
> > irreducible regions and nodes only reachable via backedges where
> > we now also fill in vectorized backedge defs.
> >
> > This requires sanitizing the SLP tree for SLP reduction chains even
> > more, manually filling the backedge SLP def.
> >
> > This also exposes the fact that CFG copying (and edge splitting
> > until I fixed that) ends up with different edge order in the
> > copy which doesn't play well with the desired 1:1 mapping of
> > SLP PHI node children and edges for epilogue vectorization.
> > I've tried to fixup CFG copying here but this really looks
> > like a dead (or expensive) end there so I've done fixup in
> > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> > we can run into.
> >
> > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> > not sure it's possible to eliminate them all this stage1 so the
> > patch has quite some checks for this case all over the place.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> > and SPEC CPU 2006 successfully built and tested.
> >
> > Will push soon.
> >
> > Richard.
> >
> > 2020-10-27  Richard Biener  
> >
> > * gimple.h (gimple_expr_type): For PHIs return the type
> > of the result.
> > * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
> > Make sure edge order into copied loop headers line up with the
> > originals.
> > * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> > loops with SLP.
> > (vectorizable_phi): New function.
> > (vectorizable_live_operation): For BB vectorization compute insert
> > location here.
> > * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> > SLP_TREE_CHILDREN entries.
> > (vect_create_new_slp_node): Add overloads with pre-existing node
> > argument.
> > (vect_print_slp_graph): Likewise.
> > (vect_mark_slp_stmts): Likewise.
> > (vect_mark_slp_stmts_relevant): Likewise.
> > (vect_gather_slp_loads): Likewise.
> > (vect_optimize_slp): Likewise.
> > (vect_slp_analyze_node_operations): Likewise.
> > (vect_bb_slp_scalar_cost): Likewise.
> > (vect_remove_slp_scalar_calls): Likewise.
> > (vect_get_and_check_slp_defs): Handle PHIs.
> > (vect_build_slp_tree_1): Handle PHIs.
> > (vect_build_slp_tree_2): Continue SLP build, following PHI
> > arguments.  Fix memory leak.
> > (vect_build_slp_tree): Put stub node into the hash-map so
> > we can discover cycles directly.
> > (vect_build_slp_instance): Set the backedge SLP def for
> > reduction chains.
> > (vect_analyze_slp_backedges): Remove.
> > (vect_analyze_slp): Do not call it.
> > (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
> > (vect_slp_analyze_node_operations): Handle stray failed
> > backedge defs by failing.
> > (vect_slp_build_vertices): Adjust leaf condition.
> > (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> > hash-set to handle cycles.
> > (vect_slp_analyze_operations): Adjust.
> > (vect_bb_partition_graph_r): Likewise.
> > (vect_slp_function): Adjust split condition to allow CFG
> > merges.
> > (vect_schedule_slp_instance): Rename to ...
> > (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> > (vect_schedule_scc): ... this new function.
> > (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> > backedge fill code.
> > * tree-vect-stmts.c (vect_analyze_stmt): Call
> > vectorizable_phi.
> > (vect_transform_stmt): Likewise.
> > (vect_is_simple_use): Handle vect_backedge_def.
> > * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> > set loop header PHIs to vect_unknown_def_type for loop
> > vectorization.
> > * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
> > (enum stmt_vec_info_type): Add phi_info_type.
> > (vectorizable_phi): Declare.
> >
> > * gcc.dg/vect/bb-slp-54.c: New test.
> > * gcc.dg/vect/bb-slp-55.c: Likewise.
> > * gcc.dg/vect/bb-slp-56.c: Likewise.
> > * gcc.dg/v

Re: [RS6000] Do not define builtins that overload disabled builtins

2020-10-28 Thread Alan Modra via Gcc-patches
commit 25ffd3d34e means we no longer define an overloaded
__builtin_byte_in_set for -m32, so the more informative
"__builtin_byte_in_set is not supported in this compiler
configuration" is not reported.

Regression tested powerpc64-linux biarch.  OK?

PR bootstrap/92661
* gcc.target/powerpc/byte-in-set-2.c: Update expected error.

diff --git a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c 
b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
index 9a80c27fe26..34ab50e25ba 100644
--- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
@@ -11,5 +11,5 @@
 int
 test_byte_in_set (unsigned char b, unsigned long long set_members)
 {
-  return __builtin_byte_in_set (b, set_members); /* { dg-error 
"'__builtin_byte_in_set' is not supported in this compiler configuration" } */
+  return __builtin_byte_in_set (b, set_members); /* { dg-warning "implicit 
declaration of function" } */
 }

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] float128-type-2.c unsupported

2020-10-28 Thread Alan Modra via Gcc-patches
From e7ce33cef478a826a2fe4e110b43b49586ef2438 Mon Sep 17 00:00:00 2001
From: Alan Modra 
Date: Wed, 28 Oct 2020 15:57:57 +1030
Subject: 

I noticed this test is unsupported on power10 when looking through
test logs.  There seems no reason why that should be the case, ie.
likely the target test was meant to be powerpc64*-*-linux*.  And that
simplifies down further.

Regression tested powerpc64le-linux.  OK?

* gcc.target/powerpc/float128-type-1.c: Simplify target test.
* gcc.target/powerpc/float128-type-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-type-1.c 
b/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
index 13152ac7c26..53f9e357535 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc64*-*-linux* && lp64 } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O2 -mno-float128" } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-type-2.c 
b/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
index 5644281c3d4..02dbad1fa4f 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc64-*-linux* && lp64 } } } */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power9 -O2 -mno-float128" } */
 

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Don't be too clever with dg-do run and dg-do compile

2020-10-28 Thread Alan Modra via Gcc-patches
Otherwise some versions of dejagnu go ahead and run the vsx tests
below when they should not.  To best cope with older dejagnu, put
"run" before "compile", the idea being that if the second dg-do always
wins then that won't cause fails.

The altivec tests also need -save-temps for the scan-assembler test to
occur when vmx_hw.

Regression tested powerpc64le-linux and powerpc64-linux.  OK?

* gcc.target/powerpc/vsx-load-element-extend-char.c: Put "dg-do run"
before "dg-do compile", and make them mutually exclusive.
* gcc.target/powerpc/vsx-load-element-extend-int.c: Likewise.
* gcc.target/powerpc/vsx-load-element-extend-longlong.c: Likewise.
* gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
* gcc.target/powerpc/vsx-store-element-truncate-char.c: Likewise.
* gcc.target/powerpc/vsx-store-element-truncate-int.c: Likewise.
* gcc.target/powerpc/vsx-store-element-truncate-longlong.c: Likewise.
* gcc.target/powerpc/vsx-store-element-truncate-short.c: Likewise.
* gcc.target/powerpc/altivec-consts.c: Likewise, add -save-temps.
* gcc.target/powerpc/le-altivec-consts.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-consts.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
index d59f9b4cf1c..c68c68125d1 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
@@ -1,7 +1,7 @@
 /* { dg-do run { target vmx_hw } } */
-/* { dg-do compile } */
+/* { dg-do compile { target { ! vmx_hw } } } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -mabi=altivec -O2" } */
+/* { dg-options "-maltivec -mabi=altivec -O2 -save-temps" } */
 
 /* Check that "easy" AltiVec constants are correctly synthesized.  */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c 
b/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
index f48ef44e676..a1db5e92f87 100644
--- a/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
+++ b/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
@@ -1,7 +1,7 @@
 /* { dg-do run { target vmx_hw } } */
-/* { dg-do compile } */
+/* { dg-do compile { target { ! vmx_hw } } } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -mabi=altivec -O2" } */
+/* { dg-options "-maltivec -mabi=altivec -O2 -save-temps" } */
 
 /* Check that "easy" AltiVec constants are correctly synthesized.  */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
index f386346e059..c23a9128680 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
@@ -2,8 +2,9 @@
Test of vec_xl_sext and vec_xl_zext (load into rightmost
vector element and zero/sign extend). */
 
-/* { dg-do compile {target power10_ok} } */
-/* { dg-do run {target power10_hw} } */
+/* { dg-do run { target power10_hw } } */
+/* { dg-do compile { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
 /* { dg-options "-mdejagnu-cpu=power10 -O3 -save-temps" } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
index ea737466a58..c40e1a3a0f7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
@@ -2,8 +2,9 @@
Test of vec_xl_sext and vec_xl_zext (load into rightmost
vector element and zero/sign extend). */
 
-/* { dg-do compile {target power10_ok} } */
-/* { dg-do run {target power10_hw} } */
+/* { dg-do run { target power10_hw } } */
+/* { dg-do compile { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
 
 /* Deliberately set optization to zero for this test to confirm
diff --git 
a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c
index cd155c2013d..405b4245f8e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c
@@ -2,8 +2,9 @@
Test of vec_xl_sext and vec_xl_zext (load into rightmost
vector element and zero/sign extend). */
 
-/* { dg-do compile {target power10_ok} } */
-/* { dg-do run {target power10_hw} } */
+/* { dg-do run { target power10_hw } } */
+/* { dg-do compile { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
 /* { dg-require-effective-target int128 } */
 /* { dg-options "-mdejagnu-cpu=power10 -O3 -save-temps" } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-short.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-e

[FYI,Ada,PR97504] riscv needs wraplf for aux_long_long_float too

2020-10-28 Thread Alexandre Oliva


riscv is another platform on which GNAT maps Long_Long_Float to double
rather than long double, so we have to explicitly avoid the long
double intrinsics.


for  gcc/ada/ChangeLog

PR ada/97504
* Makefile.rtl (LIBGNAT_TARGET_PAIRS> : Use wraplf
version of Aux_Long_Long_Float.
---
 gcc/ada/Makefile.rtl |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index cc957b6..6f014d2 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2697,6 +2697,7 @@ endif
 ifeq ($(strip $(filter-out riscv% linux%,$(target_cpu) $(target_os))),)
   LIBGNAT_TARGET_PAIRS = \
   a-intnam.ads
[...]

https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


[PATCH V4] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

2020-10-28 Thread Andrea Corallo via Gcc-patches
Richard Sandiford  writes:

[...]

>> Hi Richard,
>>
>> I had a look a little more closely and just moving the #undefs to the
>> end of the file is not viable as these macros are: defined, undefined,
>> redefined and finally undefined to generate the intrinsic and their 'q'
>> variants.
>>
>> In the attached patch the pragmas are added around the bfloat intrinsics
>> without moving the code.
>>
>> Other option would be to rename some of these macro so they can be
>> undefed at the end of the file without overlapping.  Please let me know
>> if you prefer this way, I'll be happy to rework the patches accordingly.
>
> Yeah, that sounds better (sorry).  This file is big enough and hard
> enough to parse without overloaded macro names adding to the fun.
> Generating the vld2q functions from __LD2Q_LANE_FUNC rather than
> __LD2_LANE_FUNC seems more mnemonic as well as solving the undef
> problem.
>
> Thanks,
> Richard

Hi Richard,

here is the reworked version, renaming the __LD*_LANE_FUNC macros in
place and doing the undef at the bottom of the file.
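
A minimal sketch of the idea, using made-up DEMO_* macro names and trivial
function bodies rather than the real arm_neon.h contents: once the 'q'
generator has its own name, neither macro has to be undefined and redefined
midway, and both can be undefined together at the end.

```c
#include <assert.h>

/* Generator for the 64-bit style variants (hypothetical body).  */
#define DEMO_LD2_LANE_FUNC(type, suffix)                \
  static inline type                                    \
  demo_ld2_lane_##suffix (const type *ptr)              \
  {                                                     \
    return ptr[0];                                      \
  }

/* Separately named generator for the 'q' variants, so it does not
   need to reuse (and first #undef) DEMO_LD2_LANE_FUNC.  */
#define DEMO_LD2Q_LANE_FUNC(type, suffix)               \
  static inline type                                    \
  demo_ld2q_lane_##suffix (const type *ptr)             \
  {                                                     \
    return ptr[1];                                      \
  }

DEMO_LD2_LANE_FUNC (int, s32)
DEMO_LD2Q_LANE_FUNC (int, s32)

/* A later addition (e.g. a new element type) can still use both.  */
DEMO_LD2_LANE_FUNC (short, s16)
DEMO_LD2Q_LANE_FUNC (short, s16)

/* All undefs can now live together at the end of the file.  */
#undef DEMO_LD2_LANE_FUNC
#undef DEMO_LD2Q_LANE_FUNC
```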

Regtested and bootstrapped.

Okay for trunk and 10?

Thanks!

  Andrea

From 3a11baf699c59062b503df3ea18c862aca8961ff Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 15 Oct 2020 10:16:18 +0200
Subject: [PATCH] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16
 intrisics

gcc/ChangeLog

2020-10-15  Andrea Corallo  

* config/aarch64/arm_neon.h (__LD2_LANE_FUNC, __LD3_LANE_FUNC)
(__LD4_LANE_FUNC): Rename the macro generating the 'q' variants
into __LD2Q_LANE_FUNC, __LD3Q_LANE_FUNC, __LD4Q_LANE_FUNC so they
can all be undefed at the end of the file.
(vld2_lane_bf16, vld2q_lane_bf16, vld3_lane_bf16, vld3q_lane_bf16)
(vld4_lane_bf16, vld4q_lane_bf16): Add new intrinsics.

gcc/testsuite/ChangeLog

2020-10-15  Andrea Corallo  

* gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c: New
testcase.
* gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_bf16_indices_1.c:
Likewise.
---
 gcc/config/aarch64/arm_neon.h | 118 +-
 .../advsimd-intrinsics/bf16_vldN_lane_1.c |  74 +++
 .../advsimd-intrinsics/bf16_vldN_lane_2.c |  52 
 .../vld2_lane_bf16_indices_1.c|  17 +++
 .../vld2q_lane_bf16_indices_1.c   |  17 +++
 .../vld3_lane_bf16_indices_1.c|  17 +++
 .../vld3q_lane_bf16_indices_1.c   |  17 +++
 .../vld4_lane_bf16_indices_1.c|  17 +++
 .../vld4q_lane_bf16_indices_1.c   |  17 +++
 9 files changed, 289 insertions(+), 57 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_bf16_indices_1.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 00cc9d660e7..8b380201553 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -20848,11 +20848,9 @@ __LD2_LANE_FUNC (uint32x2x2_t, uint32x2_t, 
uint32x4x2_t, uint32_t, v2si, v4si, s
 __LD2_LANE_FUNC (uint64x1x2_t, uint64x1_t, uint64x2x2_t, uint64_t, di, v2di, 
di,
 u64, int64x2_t)
 
-#undef __LD2_LANE_FUNC
-
 /* vld2q_lane */
 
-#define __LD2_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
+#define __LD2Q_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
 __extension__ extern __inline intype \
 __attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
 vld2q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
@@ -20868,22 +20866,20 @@ vld2q_lane_##funcsuffix (const ptrtype * __ptr, 
intype __b, const int __c) \
   return ret; \
 }
 
-__LD2_LANE_FUNC (float16x8x2

Re: [PATCH 2/2] combine: Don't turn (mult (extend x) 2^n) into extract

2020-10-28 Thread Alex Coplan via Gcc-patches
On 28/10/2020 09:09, Alex Coplan via Gcc-patches wrote:
> On 27/10/2020 17:31, Segher Boessenkool wrote:
> > On Tue, Oct 27, 2020 at 10:35:59AM +, Alex Coplan wrote:
> > > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > > I do not like handling both mult and ashift in one case like this, it
> > > > complicates things for no good reason.  Write it as two cases, and it
> > > > should be good.
> > > 
> > > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > > of make_extraction so that the existing ASHIFT block can do the work for
> > > us. We remember if we did it and then convert it back if necessary.
> > > 
> > > I'm not convinced that it's an improvement. What do you think?
> > 
> > Restoring it like that is just yuck.  That can be okay if it is as the
> > start and end of a smallish function, importantly some self-contained
> > piece of code, but this is not.
> > 
> > Just write it as two blocks? One handling the shift, that is already
> > there; and add one block adding the mult case.  That should not
> > increase the complexity of this already way too complex code.
> 
> OK, how about the attached?
> 
> Bootstrap and regtest in progress on aarch64-none-linux-gnu.

This fails bootstrap since we trigger -Wsign-compare without the cast to
unsigned HOST_WIDE_INT on shift_amt:

> +  const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
> +  if (shift_amt > 0 && len > shift_amt)

So, quoting an earlier reply:

> > +  const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : 
> > ci;
> > +
> > +  if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
> 
> Space after cast; better is to not need a cast at all (and you do not
> need one, len is unsigned HOST_WIDE_INT already).

unfortunately we do need the cast here.
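
For illustration, here is a small standalone C model of that comparison.
It assumes HOST_WIDE_INT is long long, and the function names are invented
for this example.  The point is that a signed value compared against an
unsigned one is converted to unsigned, which is what -Wsign-compare flags;
the explicit cast documents the conversion, and the shift_amt > 0 guard is
what keeps a negative value (such as exact_log2's -1) from being treated
as a huge count.

```c
#include <assert.h>

/* Guarded form, mirroring the condition in the patch.  */
static int
len_exceeds_shift (unsigned long long len, long long shift_amt)
{
  return shift_amt > 0 && len > (unsigned long long) shift_amt;
}

/* Unguarded form: a negative shift_amt converts to a very large
   unsigned value, so the comparison is false for any realistic LEN.  */
static int
unguarded_compare (unsigned long long len, long long shift_amt)
{
  return len > (unsigned long long) shift_amt;
}
```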

See the revised patch attached, bootstrap on aarch64 in progress.

Thanks,
Alex

---

gcc/ChangeLog:

* combine.c (make_extraction): Also handle shifts written as
(mult x 2^n), avoid creating an extract rtx for these.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..729d04b1d9e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
   if (new_rtx != 0)
return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
 }
+  else if (GET_CODE (inner) == MULT
+  && CONST_INT_P (XEXP (inner, 1))
+  && pos_rtx == 0 && pos == 0)
+{
+  /* We're extracting the least significant bits of an rtx
+(mult X (const_int 2^C)), where LEN > C.  Extract the
+least significant (LEN - C) bits of X, giving an rtx
+whose mode is MODE, then multiply it by 2^C.  */
+  const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+  if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)
+   {
+ new_rtx = make_extraction (mode, XEXP (inner, 0),
+0, 0, len - shift_amt,
+unsignedp, in_dest, in_compare);
+ if (new_rtx)
+   return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+   }
+}
   else if (GET_CODE (inner) == TRUNCATE
   /* If trying or potentionally trying to extract
  bits outside of is_mode, don't look through


Re: [PATCH V4] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

2020-10-28 Thread Richard Sandiford via Gcc-patches
Andrea Corallo  writes:
> Richard Sandiford  writes:
>
> [...]
>
>>> Hi Richard,
>>>
>>> I had a look a little more closely and just moving the #undefs to the
>>> end of the file is not viable as these macros are: defined, undefined,
>>> redefined and finally undefined to generate the intrinsic and their 'q'
>>> variants.
>>>
>>> In the attached patch the pragmas are added around the bfloat intrinsics
>>> without moving the code.
>>>
>>> Other option would be to rename some of these macro so they can be
>>> undefed at the end of the file without overlapping.  Please let me know
>>> if you prefer this way, I'll be happy to rework the patches accordingly.
>>
>> Yeah, that sounds better (sorry).  This file is big enough and hard
>> enough to parse without overloaded macro names adding to the fun.
>> Generating the vld2q functions from __LD2Q_LANE_FUNC rather than
>> __LD2_LANE_FUNC seems more mnemonic as well as solving the undef
>> problem.
>>
>> Thanks,
>> Richard
>
> Hi Richard,
>
> here the reworked version renaming in place the
> __LD*_LANE_FUNC macros and doing the undef at the bottom of the file.
>
> Regtested and bootstrapped.
>
> Okay for trunk and 10?

OK for both.  Thanks for doing this.

Richard


[PATCH V3] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics

2020-10-28 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Hi all,
>
> Second version of the patch here implementing the bfloat16_t neon
> related store intrinsics: vst2_lane_bf16, vst2q_lane_bf16,
> vst3_lane_bf16, vst3q_lane_bf16 vst4_lane_bf16, vst4q_lane_bf16.
>
> Please refer to:
> ACLE 
> ISA  
>
> This better narrows testcases so they do not cause regressions for the
> arm backend where these intrinsics are not yet present.
>
> Please refer to:
> ACLE 
> ISA  
>

Hi all,

third version of this patch, following the suggestions received for its
sister patch.

Regtested and bootstrapped.

Okay for trunk and 10?

Thanks!

  Andrea

From 55535eada983c4be9cd6a4ba26afec685c01ba91 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 8 Oct 2020 11:02:09 +0200
Subject: [PATCH] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics

gcc/ChangeLog

2020-10-19  Andrea Corallo  

* config/aarch64/arm_neon.h (__ST2_LANE_FUNC, __ST3_LANE_FUNC)
(__ST4_LANE_FUNC): Rename the macro generating the 'q' variants
into __ST2Q_LANE_FUNC, __ST3Q_LANE_FUNC, __ST4Q_LANE_FUNC so they
can all be undefed at the end of the file.
(vst2_lane_bf16, vst2q_lane_bf16, vst3_lane_bf16, vst3q_lane_bf16)
(vst4_lane_bf16, vst4q_lane_bf16): Add new intrinsics.

gcc/testsuite/ChangeLog

2020-10-19  Andrea Corallo  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(hbfloat16_t): Define type.
(CHECK_FP): Make it working for bfloat types.
* gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c:
Likewise.
---
 gcc/config/aarch64/arm_neon.h | 110 +
 .../aarch64/advsimd-intrinsics/arm-neon-ref.h |   4 +-
 .../advsimd-intrinsics/bf16_vstN_lane_1.c | 227 ++
 .../advsimd-intrinsics/bf16_vstN_lane_2.c |  52 
 .../vst2_lane_bf16_indices_1.c|  16 ++
 .../vst2q_lane_bf16_indices_1.c   |  16 ++
 .../vst3_lane_bf16_indices_1.c|  16 ++
 .../vst3q_lane_bf16_indices_1.c   |  16 ++
 .../vst4_lane_bf16_indices_1.c|  16 ++
 .../vst4q_lane_bf16_indices_1.c   |  16 ++
 10 files changed, 440 insertions(+), 49 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8b380201553..7071610e90c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10926,8 +10926,7 @@ __ST2_LANE_FUNC (uint32x2x2_t, uint32x4x2_t, uint32_t, 
v2si, v4si, si, u32,
 __ST2_LANE_FUNC (uint64x1x2_t, uint64x2x2_t, uint64_t, di, v2di, di, u64,
 int64x2_t)
 
-#undef __ST2_LANE_FUNC
-#define __ST2_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix)   \
+#define __ST2Q_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix)  \
 __extension__ extern __inline void \
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
 vst2q_lane_ ## funcsuffix (ptrtype *__ptr, \
@@ -10939,20 +10938,20 @@ vst2q_lane_ ## funcsuffix (ptrtype *__ptr,
\
__ptr, __temp.__o, __c);\
 }
 
-__ST2_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16)
-__ST2_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf,

Re: [PATCH V4] aarch64: Add bfloat16 vldN_lane_bf16 + vldNq_lane_bf16 intrisics

2020-10-28 Thread Andrea Corallo via Gcc-patches
Richard Sandiford  writes:
[...]
> OK for both.  Thanks for doing this.
>
> Richard

Welcome, installed into master as 44e570d9fb0.

Will follow-up for the backport.

Thanks!

  Andrea


Re: [PATCH V3] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics

2020-10-28 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:
[...]

> Hi all,
>
> third version of this patch following the suggestions got for its sister
> patch 
>
> Regtested and bootstrapped.
>
> Okay for trunk and 10?
>
> Thanks!
>
>   Andrea

Installed into master as 292c812a27c (okay given on the list).

Will follow-up for the backport.

  Andrea



Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-10-28 Thread Julian Brown
On Wed, 28 Oct 2020 15:25:56 +0800
Chung-Lin Tang  wrote:

> On 2020/10/27 9:17 PM, Julian Brown wrote:
> >> And, in which context are cuStreamAddCallback registered callbacks
> >> run? E.g. if it is inside of asynchronous interrupt, using locking
> >> in there might not be the best thing to do.  
> > The cuStreamAddCallback API is documented here:
> > 
> > https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
> > 
> > We're quite limited in what we can do in the callback function since
> > "Callbacks must not make any CUDA API calls". So what *can* a
> > callback function do? It is mentioned that the callback function's
> > execution will "pause" the stream it is logically running on. So
> > can we get deadlock, e.g. if multiple host threads are launching
> > offload kernels simultaneously? I don't think so, but I don't know
> > how to prove it!  
> 
> I think it's not deadlock that's a problem here, but that the lock
> acquisition in nvptx_stack_acquire will effectively serialize GPU
> kernel execution to just one host thread (since you're holding it
> till kernel completion). Also in that case, why do you need to use a
> CUDA callback? You can just call the unlock directly afterwards.

IIUC, there's a single GPU queue used for synchronous launches no
matter which host thread initiates the operation, and kernel execution
is serialised anyway, so that shouldn't be a problem. The only way to
get different kernels executing simultaneously is to use different CUDA
streams -- but I think that's still TBD for OpenMP ("TODO: Implement
GOMP_OFFLOAD_async_run").

> I think a better way is to use a list of stack blocks in ptx_dev, and
> quickly retrieve/unlock it in nvptx_stack_acquire, like how we did it
> in GOMP_OFFLOAD_alloc for general device memory allocation.

If it weren't for the serialisation, we could also keep a stack cache
per-host-thread in nvptx_thread. But as it is, I don't think we need the
extra complication. When we do OpenMP async support, maybe a stack
cache can be put per-stream in goacc_asyncqueue or the OpenMP
equivalent.
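
To make the free-list alternative discussed above a bit more concrete,
here is a host-only sketch in C.  Everything in it is hypothetical: the
demo_* names are invented, malloc stands in for the device allocation, and
it is not the libgomp plugin code.  It only shows the shape of the idea,
with the lock held just long enough to take a block off the list or put
one back, rather than for the duration of a kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_stack_block
{
  void *ptr;                      /* stand-in for a device pointer */
  size_t size;
  struct demo_stack_block *next;
};

struct demo_dev
{
  pthread_mutex_t lock;
  struct demo_stack_block *free_blocks;
};

/* Take a cached block of at least SIZE bytes, or allocate a new one.
   The lock covers only the list manipulation.  */
static struct demo_stack_block *
demo_stack_acquire (struct demo_dev *dev, size_t size)
{
  struct demo_stack_block *found = NULL;

  pthread_mutex_lock (&dev->lock);
  for (struct demo_stack_block **pp = &dev->free_blocks; *pp;
       pp = &(*pp)->next)
    if ((*pp)->size >= size)
      {
        found = *pp;
        *pp = found->next;
        break;
      }
  pthread_mutex_unlock (&dev->lock);

  if (!found)
    {
      found = malloc (sizeof *found);
      if (!found)
        return NULL;
      found->ptr = malloc (size);  /* assumption: device alloc goes here */
      if (!found->ptr)
        {
          free (found);
          return NULL;
        }
      found->size = size;
    }
  found->next = NULL;
  return found;
}

/* Return a block to the cache once the kernel using it has finished.  */
static void
demo_stack_release (struct demo_dev *dev, struct demo_stack_block *block)
{
  pthread_mutex_lock (&dev->lock);
  block->next = dev->free_blocks;
  dev->free_blocks = block;
  pthread_mutex_unlock (&dev->lock);
}
```

Whether this is worth the extra complication depends on the point made
above: if synchronous launches are serialised anyway, a single cached
block gives the same reuse with less code.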

Thanks,

Julian


Re: [PATCH V3] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics

2020-10-28 Thread Richard Sandiford via Gcc-patches
Andrea Corallo  writes:
> Andrea Corallo via Gcc-patches  writes:
>
>> Hi all,
>>
>> Second version of the patch here implementing the bfloat16_t neon
>> related store intrinsics: vst2_lane_bf16, vst2q_lane_bf16,
>> vst3_lane_bf16, vst3q_lane_bf16 vst4_lane_bf16, vst4q_lane_bf16.
>>
>> Please refer to:
>> ACLE 
>> ISA  
>>
>> This better narrows testcases so they do not cause regressions for the
>> arm backend where these intrinsics are not yet present.
>>
>> Please refer to:
>> ACLE 
>> ISA  
>>
>
> Hi all,
>
> third version of this patch following the suggestions got for its sister
> patch 
>
> Regtested and bootstrapped.
>
> Okay for trunk and 10?

OK, thanks.  FTR, I wondered whether bf16_vstN_lane_1.c (being a run test)
required a stronger condition than arm_v8_2a_bf16_neon_ok.  But it doesn't
of course: these particular instructions exist in base Armv8-A and the
+bf16 requirement is a software-only thing.

Richard


Re: PING [PATCH] Enable GCC support for Intel Key Locker extension

2020-10-28 Thread Uros Bizjak via Gcc-patches
On Wed, Oct 28, 2020 at 10:54 AM Hongyu Wang  wrote:
>
> Hi Uros,
>
> Thanks for the example. We've updated the patterns with new expanders
> and predicates like vzeroall.
> Now the generated insn for "encodekey128u32"  is like
>
> (insn 7 6 8 2 (parallel [
> (set (reg:SI 84 [  ])
> (unspec_volatile:SI [
> (reg:SI 85)
> (reg:V2DI 20 xmm0)
> ] UNSPECV_ENCODEKEY128U32))
> (set (reg:V2DI 20 xmm0)
> (unspec_volatile:V2DI [
> (const_int 0 [0])
> ] UNSPECV_ENCODEKEY128U32))
> (set (reg:V2DI 21 xmm1)
> (unspec_volatile:V2DI [
> (const_int 0 [0])
> ] UNSPECV_ENCODEKEY128U32))
> (set (reg:V2DI 22 xmm2)
> (unspec_volatile:V2DI [
> (const_int 0 [0])
> ] UNSPECV_ENCODEKEY128U32))
> (set (reg:V2DI 24 xmm4)
> (const_vector:V2DI [
> (const_int 0 [0]) repeated x2
> ]))
> (set (reg:V2DI 25 xmm5)
> (const_vector:V2DI [
> (const_int 0 [0]) repeated x2
> ]))
> (set (reg:V2DI 26 xmm6)
> (const_vector:V2DI [
> (const_int 0 [0]) repeated x2
> ]))
> (clobber (reg:CC 17 flags))
>
> Rebased on 2020-10-27 trunk and updated the patch.

Yes, this looks much better.

+#define OPTION_MASK_ISA2_AVX2_UNSET OPTION_MASK_ISA2_AVX512F_UNSET
+#define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
+#define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
+#define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
+#define OPTION_MASK_ISA2_SSE4_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
+#define OPTION_MASK_ISA2_SSSE3_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
+#define OPTION_MASK_ISA2_SSE3_UNSET OPTION_MASK_ISA2_SSSE3_UNSET
+#define OPTION_MASK_ISA2_SSE2_UNSET \
+  (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
+#define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET

Is there a reason to introduce all these (with corresponding changes)?
SSE options live in ISA bitmap, so it is kind of strange you need to
handle them in ISA2 bitmap. Option handling is not exactly my area,
please ask HJ to comment and review this part.

Eventually, some options could be moved from ISA to ISA2 to accommodate
KL options in ISA bitmap.

+  pat = gen_rtx_EQ (QImode, gen_rtx_REG (CCZmode, FLAGS_REG),
+const0_rtx);
+  emit_move_insn (target, pat);

emit_insn (gen_rtx_SET (target, pat));

+case IX86_BUILTIN_ENCODEKEY128U32:
+  {
+rtx op, xmm_regs[7];
+
+arg0 = CALL_EXPR_ARG (exp, 0); // unsigned int htype
+arg1 = CALL_EXPR_ARG (exp, 1); // __m128i key
+arg2 = CALL_EXPR_ARG (exp, 2); // void *h
+
+op0 = expand_normal (arg0);
+op1 = expand_normal (arg1);
+op2 = expand_normal (arg2);
+
+if (!REG_P (op0))
+  op0 = copy_to_mode_reg (SImode, op0);
+
+op1 = copy_to_suggested_reg (op1,
+ gen_rtx_REG (V2DImode,
+  GET_SSE_REGNO (0)),
+ V2DImode);
+
+xmm_regs[0] = op1;

This is no better than:

reg = gen_rtx_REG (V2DImode, GET_SSE_REGNO (0));
emit_move_insn (reg, op1);

+xmm_regs[0] = op1;
+for (i = 1; i < 3; i++)
+  xmm_regs[i] = gen_rtx_REG (V2DImode, GET_SSE_REGNO (i));

The first line is dead code: copy_to_suggested_reg already generated the
(reg xmm0) RTX for op1. Just use:

for (i = 0; i < 3; i++)
  xmm_regs[i] = gen_rtx_REG (V2DImode, GET_SSE_REGNO (i));

Similar comments for:

+   case IX86_BUILTIN_ENCODEKEY256U32:

...

+(define_expand "encodekey128u32"
+  [(match_par_dup 3
+[(set (match_operand:SI 0 "register_operand")
+  (unspec_volatile:SI
+[(match_operand:SI   1 "register_operand")
+ (match_operand:V2DI 2 "register_operand")]

It is better to use hard register in this particular case, (reg:V2DI XMM0_REG).

+UNSPECV_ENCODEKEY128U32))])]
+  "TARGET_KL"
+{
+  rtx xmm_regs[7];
+  rtx tmp_unspec;
+  unsigned i;
+
+  /* parallel rtx for encodekey128 predicate */
+  operands[3] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (8));
+
+  xmm_regs[0] = operands[2];
+  for (i = 1; i < 7; i++)
+  xmm_regs[i] = gen_rtx_REG (V2DImode, GET_SSE_REGNO (i));

for (i = 0; i < 7; i++)

+  tmp_unspec
+= gen_rtx_UNSPEC_VOLATILE (SImode,
+   gen_rtvec (2, operands[1], xmm_regs[0]),
+   UNSPECV_ENCODEKEY128U32);
+
+  XVECEXP (operands[3], 0, 0)
+= gen_rtx_SET (operands[0], tmp_unspec);
+
+  for (i = 0; i < 3; i++)
+{
+  tmp_unspec
+= gen_rtx_UNSPEC_VOLATILE (V2DImode,
+   gen_rtvec (1, const0_rtx),
+   UNSPECV_ENCODEKEY128U32);

Please move the above out of the loop.

+  XVECEXP

[committed] libstdc++: Make std::span layout-compatible with struct iovec [PR 95609]

2020-10-28 Thread Jonathan Wakely via Gcc-patches
This change reorders the data members of std::span so that span is
layout-compatible with common implementations of struct iovec. This will
allow span to be used directly in places that use a struct iovec
to do scatter-gather I/O.

It's important to note that POSIX doesn't specify the order of members
in iovec. Also the equivalent type on Windows has members in the other
order, and uses type ULONG (which is always 32-bit whereas size_t is
64-bit for Win64). So this change will only help for certain targets and
an indirection between std::span and I/O system calls will still be
needed for the general case.

libstdc++-v3/ChangeLog:

PR libstdc++/95609
* include/std/span (span): Reorder data members to match common
implementations of struct iovec.
* testsuite/23_containers/span/layout_compat.cc: New test.

Tested powerpc64le-linux. Committed to trunk.
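
To make the layout argument above concrete, here is a minimal sketch, not
taken from the patch. Both iovec_like and span_like are hypothetical
stand-ins (std::span keeps its members private, and the real struct iovec
lives in a system header), and the pointer-then-length order is the assumed
"common implementation" order. The static_asserts only check size and member
offsets, which is weaker than true layout-compatibility, and to_iovec shows
the field-by-field indirection that is still needed where the order differs.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the common iovec layout: pointer, then length.
struct iovec_like { void* iov_base; std::size_t iov_len; };

// Hypothetical stand-in for a dynamic-extent span after the reordering:
// the pointer member now comes first, the extent second.
struct span_like { char* ptr; std::size_t extent; };

// Both types are standard-layout, so offsetof is well-defined for them.
static_assert(std::is_standard_layout_v<iovec_like>);
static_assert(std::is_standard_layout_v<span_like>);

// Same size and same member offsets: a rough approximation of the property
// the commit is after (assumption: this is not a full compatibility check).
static_assert(sizeof(span_like) == sizeof(iovec_like));
static_assert(offsetof(span_like, ptr) == offsetof(iovec_like, iov_base));
static_assert(offsetof(span_like, extent) == offsetof(iovec_like, iov_len));

// Portable fallback: copy member by member, which works whatever order the
// platform's iovec-like type uses.
inline iovec_like to_iovec(span_like s)
{
  return iovec_like{ s.ptr, s.extent };
}
```

On a target whose I/O vector type uses the other member order or a narrower
length type, the static_asserts would fail and only to_iovec would remain
usable.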

commit 0f7cd5e5735e5536bf7bc8ca2b998f7ce8b4ddee
Author: Jonathan Wakely 
Date:   Wed Oct 28 12:07:40 2020

libstdc++: Make std::span layout-compatible with struct iovec [PR 95609]

This change reorders the data members of std::span so that span is
layout-compatible with common implementations of struct iovec. This will
allow span to be used directly in places that use a struct iovec
to do scatter-gather I/O.

It's important to note that POSIX doesn't specify the order of members
in iovec. Also the equivalent type on Windows has members in the other
order, and uses type ULONG (which is always 32-bit whereas size_t is
64-bit for Win64). So this change will only help for certain targets and
an indirection between std::span and I/O system calls will still be
needed for the general case.

libstdc++-v3/ChangeLog:

PR libstdc++/95609
* include/std/span (span): Reorder data members to match common
implementations of struct iovec.
* testsuite/23_containers/span/layout_compat.cc: New test.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index fb349403c9e..24c61ba4172 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -38,8 +38,8 @@
 
 #if __cplusplus > 201703L
 
-#include 
 #include 
+#include 
 #include 
 #include 
 
@@ -151,7 +151,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr
   span() noexcept
   requires ((_Extent + 1u) <= 1u)
-  : _M_extent(0), _M_ptr(nullptr)
+  : _M_ptr(nullptr), _M_extent(0)
   { }
 
   template
@@ -159,7 +159,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr explicit(extent != dynamic_extent)
span(_It __first, size_type __count)
noexcept
-   : _M_extent(__count), _M_ptr(std::to_address(__first))
+   : _M_ptr(std::to_address(__first)), _M_extent(__count)
{
  if constexpr (_Extent != dynamic_extent)
{
@@ -173,8 +173,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr explicit(extent != dynamic_extent)
span(_It __first, _End __last)
noexcept(noexcept(__last - __first))
-   : _M_extent(static_cast(__last - __first)),
- _M_ptr(std::to_address(__first))
+   : _M_ptr(std::to_address(__first)),
+ _M_extent(static_cast(__last - __first))
{
  if constexpr (_Extent != dynamic_extent)
{
@@ -392,8 +392,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 private:
-  [[no_unique_address]] __detail::__extent_storage _M_extent;
   pointer _M_ptr;
+  [[no_unique_address]] __detail::__extent_storage _M_extent;
 };
 
   // deduction guides
diff --git a/libstdc++-v3/testsuite/23_containers/span/layout_compat.cc 
b/libstdc++-v3/testsuite/23_containers/span/layout_compat.cc
new file mode 100644
index 000..efc5b8e4706
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/layout_compat.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+#include 
+
+#if __has_include()
+#include 
+#else
+struct iovec { void* iov_base; std::size_t iov_len; };
+#endif
+
+#if __cpp_lib_is_pointer_interconvertible
+using std::is_layout_compatible_v;
+#else
+// A poor substitute for i

[committed] libstdc++: Fix name clash with _Cosh in QNX headers [PR 95592]

2020-10-28 Thread Jonathan Wakely via Gcc-patches
This replaces unqualified names like _Cosh with struct std::_Cosh to
ensure there is no ambiguity with other entities with the same name.

libstdc++-v3/ChangeLog:

PR libstdc++/95592
* include/bits/valarray_after.h (_DEFINE_EXPR_UNARY_OPERATOR)
(_DEFINE_EXPR_BINARY_OPERATOR, _DEFINE_EXPR_BINARY_FUNCTION):
Use elaborated-type-specifier and qualified-id to avoid
ambiguities with QNX system headers.
* testsuite/26_numerics/valarray/95592.cc: New test.

Tested powerpc64le-linux. Committed to trunk.
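
A small illustration of why the elaborated-type-specifier helps. The QNX
declarations are not shown in this thread, so the clash is simulated here
with a hypothetical namespace lib in which a function hides a class of the
same name. All names below are invented for the sketch.

```cpp
#include <cassert>
#include <cmath>

namespace lib {
  // The class we want to pass as a template argument.
  struct Cosh {
    double operator()(double x) const { return std::cosh(x); }
  };

  // A function with the same name in the same scope. From here on the
  // plain name lib::Cosh refers to this function, not to the class.
  inline double Cosh(double x) { return std::cosh(x); }
}

// Applies a default-constructed function object of type Op to x.
template<typename Op>
double apply_op(double x) { return Op{}(x); }

inline double cosh_via_tag(double x)
{
  // apply_op<lib::Cosh>(x) would not compile, because lib::Cosh names the
  // function.  The "struct" keyword makes lookup ignore non-type names.
  return apply_op<struct lib::Cosh>(x);
}
```

The same idea applies to the macros in valarray_after.h: writing
"struct std::_Cosh" names the class even if some other entity called _Cosh
is visible.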

commit 72a87d82e0d0741d75c72c8f3d2fc070e3a02b5f
Author: Jonathan Wakely 
Date:   Wed Oct 28 12:35:44 2020

libstdc++: Fix name clash with _Cosh in QNX headers [PR 95592]

This replaces unqualified names like _Cosh with struct std::_Cosh to
ensure there is no ambiguity with other entities with the same name.

libstdc++-v3/ChangeLog:

PR libstdc++/95592
* include/bits/valarray_after.h (_DEFINE_EXPR_UNARY_OPERATOR)
(_DEFINE_EXPR_BINARY_OPERATOR, _DEFINE_EXPR_BINARY_FUNCTION):
Use elaborated-type-specifier and qualified-id to avoid
ambiguities with QNX system headers.
* testsuite/26_numerics/valarray/95592.cc: New test.

diff --git a/libstdc++-v3/include/bits/valarray_after.h 
b/libstdc++-v3/include/bits/valarray_after.h
index cf84e17e5ad..439b3e9a5ed 100644
--- a/libstdc++-v3/include/bits/valarray_after.h
+++ b/libstdc++-v3/include/bits/valarray_after.h
@@ -331,9 +331,9 @@ namespace __detail
   return _Expr<_Closure, _Tp>(_Closure(this->_M_closure));\
 }
 
-_DEFINE_EXPR_UNARY_OPERATOR(+, __unary_plus)
-_DEFINE_EXPR_UNARY_OPERATOR(-, __negate)
-_DEFINE_EXPR_UNARY_OPERATOR(~, __bitwise_not)
+_DEFINE_EXPR_UNARY_OPERATOR(+, struct std::__unary_plus)
+_DEFINE_EXPR_UNARY_OPERATOR(-, struct std::__negate)
+_DEFINE_EXPR_UNARY_OPERATOR(~, struct std::__bitwise_not)
 
 #undef _DEFINE_EXPR_UNARY_OPERATOR
 
@@ -402,24 +402,24 @@ namespace __detail
   return _Expr<_Closure, _Value>(_Closure(__v, __e ()));\
 }
 
-_DEFINE_EXPR_BINARY_OPERATOR(+, __plus)
-_DEFINE_EXPR_BINARY_OPERATOR(-, __minus)
-_DEFINE_EXPR_BINARY_OPERATOR(*, __multiplies)
-_DEFINE_EXPR_BINARY_OPERATOR(/, __divides)
-_DEFINE_EXPR_BINARY_OPERATOR(%, __modulus)
-_DEFINE_EXPR_BINARY_OPERATOR(^, __bitwise_xor)
-_DEFINE_EXPR_BINARY_OPERATOR(&, __bitwise_and)
-_DEFINE_EXPR_BINARY_OPERATOR(|, __bitwise_or)
-_DEFINE_EXPR_BINARY_OPERATOR(<<, __shift_left)
-_DEFINE_EXPR_BINARY_OPERATOR(>>, __shift_right)
-_DEFINE_EXPR_BINARY_OPERATOR(&&, __logical_and)
-_DEFINE_EXPR_BINARY_OPERATOR(||, __logical_or)
-_DEFINE_EXPR_BINARY_OPERATOR(==, __equal_to)
-_DEFINE_EXPR_BINARY_OPERATOR(!=, __not_equal_to)
-_DEFINE_EXPR_BINARY_OPERATOR(<, __less)
-_DEFINE_EXPR_BINARY_OPERATOR(>, __greater)
-_DEFINE_EXPR_BINARY_OPERATOR(<=, __less_equal)
-_DEFINE_EXPR_BINARY_OPERATOR(>=, __greater_equal)
+_DEFINE_EXPR_BINARY_OPERATOR(+, struct std::__plus)
+_DEFINE_EXPR_BINARY_OPERATOR(-, struct std::__minus)
+_DEFINE_EXPR_BINARY_OPERATOR(*, struct std::__multiplies)
+_DEFINE_EXPR_BINARY_OPERATOR(/, struct std::__divides)
+_DEFINE_EXPR_BINARY_OPERATOR(%, struct std::__modulus)
+_DEFINE_EXPR_BINARY_OPERATOR(^, struct std::__bitwise_xor)
+_DEFINE_EXPR_BINARY_OPERATOR(&, struct std::__bitwise_and)
+_DEFINE_EXPR_BINARY_OPERATOR(|, struct std::__bitwise_or)
+_DEFINE_EXPR_BINARY_OPERATOR(<<, struct std::__shift_left)
+_DEFINE_EXPR_BINARY_OPERATOR(>>, struct std::__shift_right)
+_DEFINE_EXPR_BINARY_OPERATOR(&&, struct std::__logical_and)
+_DEFINE_EXPR_BINARY_OPERATOR(||, struct std::__logical_or)
+_DEFINE_EXPR_BINARY_OPERATOR(==, struct std::__equal_to)
+_DEFINE_EXPR_BINARY_OPERATOR(!=, struct std::__not_equal_to)
+_DEFINE_EXPR_BINARY_OPERATOR(<, struct std::__less)
+_DEFINE_EXPR_BINARY_OPERATOR(>, struct std::__greater)
+_DEFINE_EXPR_BINARY_OPERATOR(<=, struct std::__less_equal)
+_DEFINE_EXPR_BINARY_OPERATOR(>=, struct std::__greater_equal)
 
 #undef _DEFINE_EXPR_BINARY_OPERATOR
 
@@ -442,20 +442,20 @@ namespace __detail
   return _Expr<_Closure, _Tp>(_Closure(__v));\
 }
 
-_DEFINE_EXPR_UNARY_FUNCTION(abs, _Abs)
-_DEFINE_EXPR_UNARY_FUNCTION(cos, _Cos)
-_DEFINE_EXPR_UNARY_FUNCTION(acos, _Acos)
-_DEFINE_EXPR_UNARY_FUNCTION(cosh, _Cosh)
-_DEFINE_EXPR_UNARY_FUNCTION(sin, _Sin)
-_DEFINE_EXPR_UNARY_FUNCTION(asin, _Asin)
-_DEFINE_EXPR_UNARY_FUNCTION(sinh, _Sinh)
-_DEFINE_EXPR_UNARY_FUNCTION(tan, _Tan)
-_DEFINE_EXPR_UNARY_FUNCTION(tanh, _Tanh)
-_DEFINE_EXPR_UNARY_FUNCTION(atan, _Atan)
-_DEFINE_EXPR_UNARY_FUNCTION(exp, _Exp)
-_DEFINE_EXPR_UNARY_FUNCTION(log, _Log)
-_DEFINE_EXPR_UNARY_FUNCTION(log10, _Log10)
-_DEFINE_EXPR_UNARY_FUNCTION(sqrt, _Sqrt)
+_DEFINE_EXPR_UNARY_FUNCTION(abs

Re: Avoid typeless storage in trailing wide ints

2020-10-28 Thread Richard Biener via Gcc-patches
On Tue, Oct 27, 2020 at 11:49 PM Jan Hubicka  wrote:
>
> Hi,
> this patch avoids typeless storage in trailing_wide_ints.  This improves
> TBAA disambiguation rate on cc1plus by 7%.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.

I guess if uint8_t were a separate type from unsigned char, that
would work as well, but it is not.

Thanks,
Richard.

> * wide-int.h (trailing_wide_ints ): Turn len to array of structures
> so it does not imply typeless storage.
> (trailing_wide_ints ::operator): update
> (trailing_wide_ints ::operator []): Update.
> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
> index 39cd5b9bd17..6eae46da12e 100644
> --- a/gcc/wide-int.h
> +++ b/gcc/wide-int.h
> @@ -1387,8 +1387,10 @@ private:
>/* The shared maximum length of each number.  */
>unsigned char m_max_len;
>
> -  /* The current length of each number.  */
> -  unsigned char m_len[N];
> +  /* The current length of each number.
> + Avoid char array so the whole structure is not a typeless storage
> + that will, in turn, turn off TBAA on gimple, trees and RTL.  */
> +  struct {unsigned char len;} m_len[N];
>
>/* The variable-length part of the structure, which always contains
>   at least one HWI.  Element I starts at index I * M_MAX_LEN.  */
> @@ -1470,7 +1472,7 @@ template 
>  inline trailing_wide_int
>  trailing_wide_ints ::operator [] (unsigned int index)
>  {
> -  return trailing_wide_int_storage (m_precision, &m_len[index],
> +  return trailing_wide_int_storage (m_precision, &m_len[index].len,
> &m_val[index * m_max_len]);
>  }
>
> @@ -1479,7 +1481,7 @@ inline typename trailing_wide_ints ::const_reference
>  trailing_wide_ints ::operator [] (unsigned int index) const
>  {
>return wi::storage_ref (&m_val[index * m_max_len],
> - m_len[index], m_precision);
> + m_len[index].len, m_precision);
>  }
>
>  /* Return how many extra bytes need to be added to the end of the structure
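
The shape of the change can be seen in isolation with a sketch like the one
below. The two templates are hypothetical reductions of trailing_wide_ints,
not the real class. The sketch only shows that wrapping each length in a
one-member struct keeps the same size and the same access pattern; the TBAA
benefit described in the patch comment is a property of GCC's alias analysis
and is not something this code can demonstrate.

```cpp
#include <cassert>
#include <cstddef>

// Before: a bare unsigned char array member.  Per the patch comment, this
// makes the whole enclosing structure count as typeless storage.
template<unsigned N>
struct lengths_plain
{
  unsigned char m_len[N];
};

// After: each length lives in its own single-member struct.
template<unsigned N>
struct lengths_wrapped
{
  struct { unsigned char len; } m_len[N];

  // Access goes through the .len member, as in the patched operator [].
  unsigned char& at(unsigned i)
  {
    assert(i < N);
    return m_len[i].len;
  }
};

// The wrapper is expected to add no padding on common ABIs (assumption).
static_assert(sizeof(lengths_wrapped<3>) == sizeof(lengths_plain<3>));
```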


Re: move sincos after pre

2020-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 28, 2020 at 4:18 AM Alexandre Oliva  wrote:
>
> On Oct 27, 2020, Richard Biener  wrote:
>
> > For trapping math SRC may be a constant?  Better be safe
> > and guard against TREE_CODE (src) != SSA_NAME.
>
> *nod*
>
> > You also want to guard against SSA_NAME_OCCURS_IN_ABNORMAL_PHI (src)
> > since you cannot generally propagate or move uses of those.
>
> What if I don't propagate or move them?

That's fine then.

> In my first cut at this change, I figured it (looked like it) took half
> a brain to implement it, and shut down the other half, only making minor
> tweaks to a copy of execute_cse_sincos_1.
>
> Your response was a wake-up call to many issues I hadn't considered but
> should have, and that it looks like sincos doesn't either, so I ended up
> rewriting the cse_conv function to use and propagate a preexisting
> dominating def, instead of inserting one at a point where there wasn't
> one.  That's because any such conv might trap, unless an earlier one
> didn't.
>
> > Any reason you are not replacing all uses via replace_uses_by
> > and removing the old conversion stmts?  OK, removing might
> > wreck the upthread iterator so replacing with GIMPLE_NOP
> > is the usual trick then.
>
> Removing them would be fine, I was just thoughtlessly mirroring the
> cse_sincos_1 behavior that I'd based it on.
>
>
> Now, I put in code to leave conversion results alone if
> SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs), but I wonder if it would work to
> replace uses thereof, or to propagate uses thereof, as long as I turned
> such conversion results that would be deleted into copies.  I.e., given:
>
>   _2 = (double) _1;
>   _4 = sin (_2);
>   ...
>   _3 = (double) _1; // occurs in abnormal phi below
>   _5 = cos (_3);
>   ...
>   _6 = PHI <..., _3 (abnormal), ...>;
>
> Wouldn't this be ok to turn into:
>
>   _2 = (double) _1;
>   _4 = sin (_2);
>   ...
>   _3 = _2;
>   _5 = cos (_2);
>   ...
>   _6 = PHI <..., _3 (abnormal), ...>;
>
> ?

Yes.

>
> Anyway, here's a patch that does *not* assume it would work, and skips
> defs used in abnormal phis instead.
>
>
> sincos, however, may mess with them, and even introduce/move a trapping
> call to a place where there wasn't any, even if none of the preexisting
> calls were going to be executed.  The only thing I fixed there was a
> plain return within a FOR_EACH_IMM_USE_STMT that I introduced recently.
> Though it's unlikely to ever hit, I understand it's wrong per the
> current API.
>
> BTW, any reason why we are not (yet?) using something like:
>
> #define FOR_EACH_IMM_USE_STMT(STMT, ITER, SSAVAR)   \
>   for (auto_end_imm_use_stmt_traverse auto_end  \
>STMT) = first_imm_use_stmt (&(ITER), (SSAVAR))), \
>  &(ITER))); \
>!end_imm_use_stmt_p (&(ITER));   \
>(void) ((STMT) = next_imm_use_stmt (&(ITER

Just laziness.  Or rather last time I remembered this I tried to do it
more fancy via range-for but didn't get very far due to the nested
iteration ...

>
> Anyway, here's what I'm testing now.  Bootstrap succeeded, regression
> testing underway.  Ok to install if it succeeds?

OK.

Thanks,
Richard.

>
> CSE conversions within sincos
>
> From: Alexandre Oliva 
>
> On platforms in which Aux_[Real_Type] involves non-NOP conversions
> (e.g., between single- and double-precision, or between short float
> and float), the conversions before the calls are CSEd too late for
> sincos to combine calls.
>
> This patch enables the sincos pass to CSE type casts used as arguments
> to eligible calls before looking for other calls using the same
> operand.
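
For readers who want to see the source-level pattern, here is a hypothetical
reduction (it is not the gcc.dg/sin_cos.c test, which is cut off in this
archive). Each call widens the same float argument separately, which
corresponds to the two "(double) _1" conversions in the GIMPLE discussed
earlier in the thread. Once those conversions are CSEd, both calls share one
operand and become candidates for a combined sincos.

```cpp
#include <cassert>
#include <cmath>

struct sin_cos_result { float s; float c; };

// Computes sine and cosine of a float through the double-precision
// functions.  The two static_cast<double>(x) conversions are the ones the
// pass needs to recognize as identical.
inline sin_cos_result sin_cos(float x)
{
  float s = static_cast<float>(std::sin(static_cast<double>(x)));
  float c = static_cast<float>(std::cos(static_cast<double>(x)));
  return { s, c };
}
```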
>
>
> for  gcc/ChangeLog
>
> * tree-ssa-math-opts.c (sincos_stats): Add conv_removed.
> (execute_cse_conv_1): New.
> (execute_cse_sincos_1): Call it.  Fix return within
> FOR_EACH_IMM_USE_STMT.
> (pass_cse_sincos::execute): Report conv_inserted.
>
> for  gcc/testsuite/ChangeLog
>
> * gnat.dg/sin_cos.ads: New.
> * gnat.dg/sin_cos.adb: New.
> * gcc.dg/sin_cos.c: New.
> ---
>  gcc/testsuite/gcc.dg/sin_cos.c|   41 ++
>  gcc/testsuite/gnat.dg/sin_cos.adb |   14 +
>  gcc/testsuite/gnat.dg/sin_cos.ads |4 +
>  gcc/tree-ssa-math-opts.c  |  107 
> +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/sin_cos.c
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.adb
>  create mode 100644 gcc/testsuite/gnat.dg/sin_cos.ads
>
> diff --git a/gcc/testsuite/gcc.dg/sin_cos.c b/gcc/testsuite/gcc.dg/sin_cos.c
> new file mode 100644
> index ..aa71dca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sin_cos.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* This maps to essentially the same gimple that is generated for
> +   gnat.dg/sin_cos.adb, on platforms that use the wraplf variant of
> +   Ada.Numerics

Re: [PATCH] value-range: Give up on POLY_INT_CST ranges [PR97457]

2020-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 28, 2020 at 10:44 AM Richard Sandiford via Gcc-patches
 wrote:
>
> This PR shows another problem with calculating value ranges for
> POLY_INT_CSTs.  We have:
>
>   ivtmp_76 = ASSERT_EXPR  POLY_INT_CST [9, 4294967294]>
>
> where the VQ coefficient is unsigned but is effectively acting
> as a negative number.  We wrongly give the POLY_INT_CST the range:
>
>   [9, INT_MAX]
>
> and things go downhill from there: later iterations of the unrolled
> epilogue are wrongly removed as dead.
>
> I guess this is the final nail in the coffin for doing VRP on
> POLY_INT_CSTs.  For other similarly exotic testcases we could have
> overflow for any coefficient, not just those that could be treated
> as contextually negative.
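
A worked example of the wraparound, under the assumption that
POLY_INT_CST [9, 4294967294] is read as 9 + 4294967294 * x in 32-bit
unsigned arithmetic, where x is the non-negative runtime indeterminate.
The function name is made up for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Evaluates 9 + 4294967294 * x modulo 2^32.  Since 4294967294 is congruent
// to -2 modulo 2^32, the value decreases by 2 for every step of x.
inline std::uint32_t poly_value(std::uint32_t x)
{
  return std::uint32_t{9} + std::uint32_t{4294967294u} * x;
}
```

With x = 0 the value is 9, with x = 1 it is 7 and with x = 2 it is 5, so a
lower bound of 9 is only correct for x = 0. That is the sense in which the
range [9, INT_MAX] is wrong.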
>
> Testing TYPE_OVERFLOW_UNDEFINED doesn't seem like an option because we
> couldn't handle warn_strict_overflow properly.  At this stage we're
> just recording a range that might or might not lead to strict-overflow
> assumptions later.
>
> It still feels like we should be able to do something here, but for
> now removing the code seems safest.  It's also telling that there
> are no testsuite failures on SVE from doing this.
>
> Tested on aarch64-linux-gnu (with and without SVE) and
> x86_64-linux-gnu.  OK for trunk and backports?

OK.

Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/97457
> * value-range.cc (irange::set): Don't decay POLY_INT_CST ranges
> to integer ranges.
>
> gcc/testsuite/
> PR tree-optimization/97457
> * gcc.dg/vect/pr97457.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/pr97457.c | 15 +++
>  gcc/value-range.cc  | 30 +
>  2 files changed, 20 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97457.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97457.c 
> b/gcc/testsuite/gcc.dg/vect/pr97457.c
> new file mode 100644
> index 000..506ba249b00
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr97457.c
> @@ -0,0 +1,15 @@
> +/* { dg-additional-options "-O3" } */
> +
> +int a;
> +long c;
> +signed char d(char e, char f) { return e + f; }
> +int main(void) {
> +  for (; a <= 1; a++) {
> +c = -8;
> +for (; c != 3; c = d(c, 1))
> +  ;
> +  }
> +  char b = c;
> +  if (b != 3)
> +__builtin_abort();
> +}
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 7847104050c..2319c13388a 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -248,31 +248,11 @@ irange::set (tree min, tree max, value_range_kind kind)
>set_undefined ();
>return;
>  }
> -  if (kind == VR_RANGE)
> -{
> -  /* Convert POLY_INT_CST bounds into worst-case INTEGER_CST bounds.  */
> -  if (POLY_INT_CST_P (min))
> -   {
> - tree type_min = vrp_val_min (TREE_TYPE (min));
> - widest_int lb
> -   = constant_lower_bound_with_limit (wi::to_poly_widest (min),
> -  wi::to_widest (type_min));
> - min = wide_int_to_tree (TREE_TYPE (min), lb);
> -   }
> -  if (POLY_INT_CST_P (max))
> -   {
> - tree type_max = vrp_val_max (TREE_TYPE (max));
> - widest_int ub
> -   = constant_upper_bound_with_limit (wi::to_poly_widest (max),
> -  wi::to_widest (type_max));
> - max = wide_int_to_tree (TREE_TYPE (max), ub);
> -   }
> -}
> -  else if (kind != VR_VARYING)
> -{
> - if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
> -   kind = VR_VARYING;
> -}
> +
> +  if (kind != VR_VARYING
> +  && (POLY_INT_CST_P (min) || POLY_INT_CST_P (max)))
> +kind = VR_VARYING;
> +
>if (kind == VR_VARYING)
>  {
>set_varying (TREE_TYPE (min));


[PATCH] Change the way we split stores in BB vectorization

2020-10-28 Thread Richard Biener
The following fixes missed optimizations due to the strange way we
split stores in BB vectorization.  The solution is to split at
the failure boundary and not re-align that to the initial piece
chosen vector size.  Also re-analyze any larger matching rest.
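
As a rough model of the new splitting (a simplification, not the code in
vect_build_slp_instance), the sketch below computes the piece sizes for a
store group. It assumes the first piece is the leading run of whole vectors
before the first mismatch, that lanes is a power of two, and it ignores the
"at least two lanes" conditions that decide which pieces get re-analyzed.
split_sizes and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Returns the sizes of the pieces a store group is split into when the
// first non-matching store is at index first_mismatch.
inline std::vector<unsigned>
split_sizes(unsigned group_size, unsigned first_mismatch, unsigned lanes)
{
  assert(lanes != 0 && (lanes & (lanes - 1)) == 0);
  assert(first_mismatch <= group_size);

  std::vector<unsigned> parts;

  // Whole vectors that matched before the failure point.
  unsigned group1_size = first_mismatch & ~(lanes - 1);
  if (group1_size != 0)
    parts.push_back(group1_size);

  // Matching stores between the last full vector and the failure point.
  // The old code skipped to the next vector boundary instead.
  unsigned matching_rest = first_mismatch - group1_size;
  if (matching_rest != 0)
    parts.push_back(matching_rest);

  // Everything from the failure point onwards.
  unsigned tail = group_size - first_mismatch;
  if (tail != 0)
    parts.push_back(tail);

  return parts;
}
```

For the bb-slp-68.c testcase below (10 stores, first mismatch at x[6], four
doubles per 32-byte vector) this model gives 4, 2, 4, matching the split the
test comment asks for.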

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-10-28  Richard Biener  

* tree-vect-slp.c (vect_build_slp_instance): Split the store
group at the failure boundary and also re-analyze a large enough
matching rest.

* gcc.dg/vect/bb-slp-68.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 22 ++
 gcc/tree-vect-slp.c   | 20 +---
 2 files changed, 35 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-68.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
new file mode 100644
index 000..8718031cc71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+double x[10], y[6], z[4];
+
+void foo ()
+{
+  x[0] = y[0];
+  x[1] = y[1];
+  x[2] = y[2];
+  x[3] = y[3];
+  x[4] = y[4];
+  x[5] = y[5];
+  x[6] = z[0] + 1.;
+  x[7] = z[1] + 1.;
+  x[8] = z[2] + 1.;
+  x[9] = z[3] + 1.;
+}
+
+/* We want to have the store group split into 4, 2, 4 when using 32byte 
vectors.  */
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 470b67d76b5..50a2d37eb25 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2412,15 +2412,21 @@ vect_build_slp_instance (vec_info *vinfo,
   group1_size);
  bool res = vect_analyze_slp_instance (vinfo, bst_map, stmt_info,
max_tree_size);
- /* If the first non-match was in the middle of a vector,
-skip the rest of that vector.  Do not bother to re-analyze
-single stmt groups.  */
- if (group1_size < i)
+ /* Split the rest at the failure point and possibly
+re-analyze the remaining matching part if it has
+at least two lanes.  */
+ if (group1_size < i
+ && (i + 1 < group_size
+ || i - group1_size > 1))
{
- i = group1_size + const_nunits;
- if (i + 1 < group_size)
-   rest = vect_split_slp_store_group (rest, const_nunits);
+ stmt_vec_info rest2 = rest;
+ rest = vect_split_slp_store_group (rest, i - group1_size);
+ if (i - group1_size > 1)
+   res |= vect_analyze_slp_instance (vinfo, bst_map,
+ rest2, max_tree_size);
}
+ /* Re-analyze the non-matching tail if it has at least
+two lanes.  */
  if (i + 1 < group_size)
res |= vect_analyze_slp_instance (vinfo, bst_map,
  rest, max_tree_size);
-- 
2.26.2


[PATCH] dump reason for throwing away SLP instance

2020-10-28 Thread Richard Biener
This adds dumping to vect_slp_analyze_node_alignment when it fails
an SLP instance due to shared vector type conflicts.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-10-28  Richard Biener  

* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Dump when vect_update_shared_vectype fails.
---
 gcc/tree-vect-data-refs.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 4abd27e4c70..fd14b480dbf 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2428,7 +2428,13 @@ vect_slp_analyze_node_alignment (vec_info *vinfo, 
slp_tree node)
   /* We need to commit to a vector type for the group now.  */
   if (is_a  (vinfo)
   && !vect_update_shared_vectype (first_stmt_info, SLP_TREE_VECTYPE 
(node)))
-return false;
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"desired vector type conflicts with earlier one "
+"for %G", first_stmt_info->stmt);
+  return false;
+}
 
   dr_vec_info *dr_info = STMT_VINFO_DR_INFO (first_stmt_info);
   vect_compute_data_ref_alignment (vinfo, dr_info);
-- 
2.26.2


Re: [PATCH] value-range: Give up on POLY_INT_CST ranges [PR97457]

2020-10-28 Thread Andrew MacLeod via Gcc-patches

On 10/28/20 5:43 AM, Richard Sandiford via Gcc-patches wrote:

This PR shows another problem with calculating value ranges for
POLY_INT_CSTs.  We have:

   ivtmp_76 = ASSERT_EXPR  POLY_INT_CST [9, 4294967294]>

where the VQ coefficient is unsigned but is effectively acting
as a negative number.  We wrongly give the POLY_INT_CST the range:

   [9, INT_MAX]

and things go downhill from there: later iterations of the unrolled
epilogue are wrongly removed as dead.

I guess this is the final nail in the coffin for doing VRP on
POLY_INT_CSTs.  For other similarly exotic testcases we could have
overflow for any coefficient, not just those that could be treated
as contextually negative.

Testing TYPE_OVERFLOW_UNDEFINED doesn't seem like an option because we
couldn't handle warn_strict_overflow properly.  At this stage we're
just recording a range that might or might not lead to strict-overflow
assumptions later.

It still feels like we should be able to do something here, but for
now removing the code seems safest.  It's also telling that there
are no testsuite failures on SVE from doing this.


Once things calm down and I get some time, I'm happy to see if there is 
some way we can integrate POLY_INTs better with irange going forward.  I 
don't understand them very well right now, and have some other things on 
my plate that I need to get to first.


Andrew



[committed] libstdc++: Add comment to nothrow new explaining catch (...)

2020-10-28 Thread Jonathan Wakely via Gcc-patches
The decision to not rethrow a __forced_unwind exception is deliberate,
so add a comment explaining it.

libstdc++-v3/ChangeLog:

* libsupc++/new_opnt.cc (new): Add comment about forced unwind
exceptions.

Tested powerpc64le-linux. Committed to trunk.
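
The control flow being documented can be sketched outside libsupc++ as
follows. alloc_or_null is a hypothetical helper, not the library code: it
takes the allocation step as a callable so that the failure path can be
exercised without exhausting memory. It does not reproduce a __forced_unwind
exception (for example from thread cancellation), which is the case the new
comment is about.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Mirrors the shape of the nothrow operator new: call a throwing
// allocation function and turn any exception into a null result.
// Because the function is noexcept, nothing can propagate out of it.
template<typename Alloc>
void* alloc_or_null(Alloc alloc, std::size_t sz) noexcept
{
  try
    {
      return alloc(sz);
    }
  catch (...)
    {
      // Swallowing the exception here is what makes the result nullptr.
      return nullptr;
    }
}
```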

commit c227d96feb0030d63efad352b8fa7175b4c30721
Author: Jonathan Wakely 
Date:   Wed Oct 28 13:19:21 2020

libstdc++: Add comment to nothrow new explaining catch (...)

The decision to not rethrow a __forced_unwind exception is deliberate,
so add a comment explaining it.

libstdc++-v3/ChangeLog:

* libsupc++/new_opnt.cc (new): Add comment about forced unwind
exceptions.

diff --git a/libstdc++-v3/libsupc++/new_opnt.cc 
b/libstdc++-v3/libsupc++/new_opnt.cc
index ace4e6f9345..0e5e39ab994 100644
--- a/libstdc++-v3/libsupc++/new_opnt.cc
+++ b/libstdc++-v3/libsupc++/new_opnt.cc
@@ -26,9 +26,6 @@
 #include 
 #include "new"
 
-using std::new_handler;
-using std::bad_alloc;
-
 extern "C" void *malloc (std::size_t);
 
 _GLIBCXX_WEAK_DEFINITION void *
@@ -43,6 +40,13 @@ operator new (std::size_t sz, const std::nothrow_t&) noexcept
 }
   __catch (...)
 {
+  // N.B. catch (...) means the process will terminate if operator new(sz)
+  // exits with a __forced_unwind exception. The process will print
+  // "FATAL: exception not rethrown" to stderr before exiting.
+  //
+  // If we propagated that exception the process would still terminate
+  // (because this function is noexcept) but with a less informative error:
+  // "terminate called without active exception".
   return nullptr;
 }
 }


Re: This is my patch for fstream to fix the performance issue on Windows.

2020-10-28 Thread Jonathan Wakely via Gcc-patches

On 02/10/20 23:04 +0100, Jonathan Wakely wrote:

On 01/10/20 03:29 +, sotrdg sotrdg via Libstdc++ wrote:

From fb8d644a4c315058af141a3e84fcc083d665c8b9 Mon Sep 17 00:00:00 2001
From: ejsvifq_mabmip 
Date: Wed, 30 Sep 2020 23:26:47 -0400
Subject: [PATCH] Fix a long term performance issue of fstream on Windows since
MSVCRT defines BUFSIZ as 512 which causes the serious downgrade of I/O
performance.

Even stdio itself uses 4096 as its real buffer size, so the behavior should be 
the same as FILE* on Windows.


The attached patch seems a cleaner approach. Does it solve your
performance issues?


Pushed to trunk for GCC 11.




Re: testsuite: Enable and adjust powerpc fold-vec-extract/insert testcases

2020-10-28 Thread David Edelsohn via Gcc-patches
On Wed, Oct 28, 2020 at 6:26 AM Alan Modra  wrote:
>
> git commit badeac77f552 changed expected number of addi instructions,
> causing these fails on powerpc-linux.
>
> gcc.target/powerpc/fold-vec-insert-int-p9.c: \\maddi\\M found 12 times
> FAIL: gcc.target/powerpc/fold-vec-insert-int-p9.c scan-assembler-times 
> \\maddi\\M 8
> gcc.target/powerpc/fold-vec-extract-char.p9.c: addi found 6 times
> FAIL: gcc.target/powerpc/fold-vec-extract-char.p9.c scan-assembler-times addi 
> 3
> gcc.target/powerpc/fold-vec-extract-int.p9.c: \\maddi\\M found 6 times
> FAIL: gcc.target/powerpc/fold-vec-extract-int.p9.c scan-assembler-times 
> \\maddi\\M 3
> gcc.target/powerpc/fold-vec-extract-longlong.p7.c: \\maddi\\M found 6 times
> FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p7.c scan-assembler-times 
> \\maddi\\M 4
> gcc.target/powerpc/fold-vec-extract-longlong.p8.c: \\maddi\\M found 6 times
> FAIL: gcc.target/powerpc/fold-vec-extract-longlong.p8.c scan-assembler-times 
> \\maddi\\M 4
> changed by badeac77f552
>
> I'm not at all sure why we are counting addi.  On linux I see
> eight in fold-vec-insert-int-p9.c tearing down the stack frame in
> function epilogues, and four in
> addi 9,1,16
> lvewx 0,0,9
> For aix you have the above four but with a -16 offset.  There are no
> stack frames, and you have four addressing stack red-zone as
> addi 9,1,-64
>
> fold-vec-extract-char.p9.c on linux just has epilogue addi, aix has
> red-zone addressing.  The same for fold-vec-extract-int.p9.c,
> fold-vec-extract-longlong.p7.c and fold-vec-extract-longlong.p8.c.
>
> It seems silly to count addi in a function epilogue, and fragile to
> count them in code.  So remove the ilp32 addi checks.
>
> Regression tested powerpc64-linux and powerpc64le-linux.  OK?
>
> * gcc.target/powerpc/fold-vec-extract-char.p9.c: Don't check addi
> count for ilp32.
> * gcc.target/powerpc/fold-vec-extract-int.p9.c: Likewise.
> * gcc.target/powerpc/fold-vec-extract-longlong.p7.c: Likewise.
> * gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Likewise.
> * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.

Okay.

Thanks, David


Re: [PATCH] SLP vectorize across PHI nodes

2020-10-28 Thread Richard Biener
On Wed, 28 Oct 2020, Christophe Lyon wrote:

> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
>  wrote:
> >
> > On Tue, 27 Oct 2020 at 13:18, Richard Biener  wrote:
> > >
> > > This makes SLP discovery detect backedges by seeding the bst_map with
> > > the node to be analyzed so it can be picked up from recursive calls.
> > > This removes the need to discover backedges in a separate walk.
> > >
> > > This enables SLP build to handle PHI nodes in full, continuing
> > > the SLP build to non-backedges.  For loop vectorization this
> > > enables outer loop vectorization of nested SLP cycles and for
> > > BB vectorization this enables vectorization of PHIs at CFG merges.
> > >
> > > It also turns code generation into a SCC discovery walk to handle
> > > irreducible regions and nodes only reachable via backedges where
> > > we now also fill in vectorized backedge defs.
> > >
> > > This requires sanitizing the SLP tree for SLP reduction chains even
> > > more, manually filling the backedge SLP def.
> > >
> > > This also exposes the fact that CFG copying (and edge splitting
> > > until I fixed that) ends up with different edge order in the
> > > copy which doesn't play well with the desired 1:1 mapping of
> > > SLP PHI node children and edges for epilogue vectorization.
> > > I've tried to fixup CFG copying here but this really looks
> > > like a dead (or expensive) end there so I've done fixup in
> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> > > we can run into.
> > >
> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> > > not sure it's possible to eliminate them all this stage1 so the
> > > patch has quite some checks for this case all over the place.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> > > and SPEC CPU 2006 successfully built and tested.
> > >
> > > Will push soon.
> > >
> > > Richard.
> > >
> > > 2020-10-27  Richard Biener  
> > >
> > > * gimple.h (gimple_expr_type): For PHIs return the type
> > > of the result.
> > > * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
> > > Make sure edge order into copied loop headers line up with the
> > > originals.
> > > * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> > > loops with SLP.
> > > (vectorizable_phi): New function.
> > > (vectorizable_live_operation): For BB vectorization compute insert
> > > location here.
> > > * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> > > SLP_TREE_CHILDREN entries.
> > > (vect_create_new_slp_node): Add overloads with pre-existing node
> > > argument.
> > > (vect_print_slp_graph): Likewise.
> > > (vect_mark_slp_stmts): Likewise.
> > > (vect_mark_slp_stmts_relevant): Likewise.
> > > (vect_gather_slp_loads): Likewise.
> > > (vect_optimize_slp): Likewise.
> > > (vect_slp_analyze_node_operations): Likewise.
> > > (vect_bb_slp_scalar_cost): Likewise.
> > > (vect_remove_slp_scalar_calls): Likewise.
> > > (vect_get_and_check_slp_defs): Handle PHIs.
> > > (vect_build_slp_tree_1): Handle PHIs.
> > > (vect_build_slp_tree_2): Continue SLP build, following PHI
> > > arguments.  Fix memory leak.
> > > (vect_build_slp_tree): Put stub node into the hash-map so
> > > we can discover cycles directly.
> > > (vect_build_slp_instance): Set the backedge SLP def for
> > > reduction chains.
> > > (vect_analyze_slp_backedges): Remove.
> > > (vect_analyze_slp): Do not call it.
> > > (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
> > > (vect_slp_analyze_node_operations): Handle stray failed
> > > backedge defs by failing.
> > > (vect_slp_build_vertices): Adjust leaf condition.
> > > (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> > > hash-set to handle cycles.
> > > (vect_slp_analyze_operations): Adjust.
> > > (vect_bb_partition_graph_r): Likewise.
> > > (vect_slp_function): Adjust split condition to allow CFG
> > > merges.
> > > (vect_schedule_slp_instance): Rename to ...
> > > (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> > > (vect_schedule_scc): ... this new function.
> > > (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> > > backedge fill code.
> > > * tree-vect-stmts.c (vect_analyze_stmt): Call
> > > vectorizable_phi.
> > > (vect_transform_stmt): Likewise.
> > > (vect_is_simple_use): Handle vect_backedge_def.
> > > * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> > > set loop header PHIs to vect_unknown_def_type for loop
> > > vectorization.
> > > * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
> > > (enum stmt_vec_in

Re: [RS6000] float128-type-2.c unsupported

2020-10-28 Thread David Edelsohn via Gcc-patches
On Wed, Oct 28, 2020 at 6:48 AM Alan Modra  wrote:
>
> From e7ce33cef478a826a2fe4e110b43b49586ef2438 Mon Sep 17 00:00:00 2001
> From: Alan Modra 
> Date: Wed, 28 Oct 2020 15:57:57 +1030
> Subject:
>
> I noticed this test is unsupported on power10 when looking through
> test logs.  There seems no reason why that should be the case, ie.
> likely the target test was meant to be powerpc64*-*-linux*.  And that
> simplifies down further.
>
> Regression tested powerpc64le-linux.  OK?
>
> * gcc.target/powerpc/float128-type-1.c: Simplify target test.
> * gcc.target/powerpc/float128-type-2.c: Likewise.

Unfortunately, no.

The GCC testsuite has probes for float128, ppc_float128, ppc_ieee128.
The testcases should test for the appropriate feature, not for Linux
nor for LP64.

Thanks, David


[PATCH] vect: Fix load costs for SLP permutes

2020-10-28 Thread Richard Sandiford via Gcc-patches
For the following test case (compiled with load/store lanes
disabled locally):

  void
  f (uint32_t *restrict x, uint8_t *restrict y, int n)
  {
for (int i = 0; i < n; ++i)
  {
x[i * 2] = x[i * 2] + y[i * 2];
x[i * 2 + 1] = x[i * 2 + 1] + y[i * 2];
  }
  }

we have a redundant no-op permute on the x[] load node:

   node 0x4472350 (max_nunits=8, refcnt=2)
  stmt 0 _5 = *_4;
  stmt 1 _13 = *_12;
  load permutation { 0 1 }

Then, when costing it, we pick a cost of 1, even though we need 4 copies
of the x[] load to match a single y[] load:

   ==> examining statement: _5 = *_4;
   Vectorizing an unaligned access.
   vect_model_load_cost: unaligned supported by hardware.
   vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .

The problem is that the code only considers the permutation for
the first scalar iteration, rather than for all VF iterations.

This patch tries to fix that by using similar logic to
vect_transform_slp_perm_load.
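
To make the difference concrete, the counting can be sketched outside GCC.
This is only an illustration under assumptions, not the patch itself:
count_vector_loads and its plain-array parameters are hypothetical stand-ins
for the SLP node data, and a constant VF and nunits are assumed.  It marks
the input lanes used across all VF iterations and then counts the vectors
that contain at least one used lane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-alone model of the load counting.
   GROUP_SIZE: scalar elements in the load group per iteration.
   NSCALARS:   scalar statements in the SLP node.
   VF:         vectorization factor (assumed constant).
   NUNITS:     lanes per vector (assumed constant).
   LOAD_PERM:  NSCALARS entries, each an index into the group.  */
unsigned
count_vector_loads (unsigned group_size, unsigned nscalars, unsigned vf,
                    unsigned nunits, const unsigned *load_perm)
{
  assert (group_size > 0 && nscalars > 0 && vf > 0 && nunits > 0);
  unsigned in_nlanes = group_size * vf;
  unsigned out_nlanes = nscalars * vf;
  bool *used = calloc (in_nlanes, sizeof *used);
  assert (used != NULL);

  /* Mark every input lane that some output lane reads, over all
     VF iterations rather than just the first one.  */
  for (unsigned i = 0; i < out_nlanes; ++i)
    {
      unsigned iter_num = i / nscalars;
      unsigned stmt_num = i % nscalars;
      used[iter_num * group_size + load_perm[stmt_num]] = true;
    }

  /* Count the vectors that contain at least one used lane.  */
  unsigned ncopies = 0;
  bool load_seen = false;
  for (unsigned i = 0; i < in_nlanes; ++i)
    {
      if (i % nunits == 0)
        {
          if (load_seen)
            ncopies++;
          load_seen = false;
        }
      if (used[i])
        load_seen = true;
    }
  if (load_seen)
    ncopies++;

  free (used);
  return ncopies;
}
```

For the x[] node above, assuming 4 lanes per vector and VF = 8 (values not
stated in the dump), the sketch returns 4, whereas marking lanes for the
first scalar iteration only would find a single vector.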

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

[-b version also included]


gcc/
* tree-vect-stmts.c (vect_model_load_cost): Use similar logic
to vect_transform_slp_perm_load when counting the number of
loads in a permuted SLP operation.
---
 gcc/tree-vect-stmts.c | 67 ++-
 1 file changed, 41 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3575f25241f..6eacd641e6b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1099,38 +1099,53 @@ vect_model_load_cost (vec_info *vinfo,
   stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
   /* Record the cost for the permutation.  */
   unsigned n_perms;
-  unsigned assumed_nunits
-   = vect_nunits_for_cost (STMT_VINFO_VECTYPE (first_stmt_info));
   vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL,
vf, true, &n_perms);
   inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
   first_stmt_info, 0, vect_body);
-  /* And adjust the number of loads performed.  This handles
-redundancies as well as loads that are later dead.  */
-  auto_sbitmap perm (DR_GROUP_SIZE (first_stmt_info));
-  bitmap_clear (perm);
-  for (unsigned i = 0;
-  i < SLP_TREE_LOAD_PERMUTATION (slp_node).length (); ++i)
-   bitmap_set_bit (perm, SLP_TREE_LOAD_PERMUTATION (slp_node)[i]);
-  ncopies = 0;
-  bool load_seen = false;
-  for (unsigned i = 0; i < DR_GROUP_SIZE (first_stmt_info); ++i)
-   {
- if (i % assumed_nunits == 0)
+
+  /* And if this is not a simple "load N vectors and then permute each
+vector internally" operation, adjust the number of loads performed.
+This handles redundancies as well as loads that are later dead.  */
+  unsigned int nscalars = SLP_TREE_SCALAR_STMTS (slp_node).length ();
+  unsigned int dr_group_size = DR_GROUP_SIZE (first_stmt_info);
+  tree vectype = STMT_VINFO_VECTYPE (first_stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  if (nscalars != dr_group_size
+ || !multiple_p (nunits, nscalars))
+   {
+ /* The constant VF and nunits are enforced by
+vect_transform_slp_perm_load for this case.  */
+ unsigned int const_nunits = nunits.to_constant ();
+ unsigned int in_nlanes = dr_group_size * vf.to_constant ();
+ unsigned int out_nlanes = nscalars * vf.to_constant ();
+ auto_sbitmap perm (in_nlanes);
+ bitmap_clear (perm);
+ for (unsigned i = 0; i < out_nlanes; ++i)
+   {
+ unsigned int iter_num = i / nscalars;
+ unsigned int stmt_num = i % nscalars;
+ unsigned int in_lane
+   = (iter_num * dr_group_size
+  + SLP_TREE_LOAD_PERMUTATION (slp_node)[stmt_num]);
+ bitmap_set_bit (perm, in_lane);
+   }
+ ncopies = 0;
+ bool load_seen = false;
+ for (unsigned i = 0; i < in_nlanes; ++i)
{
- if (load_seen)
-   ncopies++;
- load_seen = false;
+ if (i % const_nunits == 0)
+   {
+ if (load_seen)
+   ncopies++;
+ load_seen = false;
+   }
+ if (bitmap_bit_p (perm, i))
+   load_seen = true;
}
- if (bitmap_bit_p (perm, i))
-   load_seen = true;
-   }
-  if (load_seen)
-   ncopies++;
-  gcc_assert (ncopies
- <= (DR_GROUP_SIZE (first_stmt_info)
- - DR_GROUP_GAP (first_stmt_info)
- + assumed_nunits - 1) / assumed_nunits);
+ if (load_seen)
+   ncopies++;
+   }
 }
 
   /* Grouped loads read all elements in the group at once,

diff --

[PATCH] Fix gcc.dg/vect/bb-slp-5[89].c

2020-10-28 Thread Richard Biener
I forgot a vect_double check.

Testing in progress.

2020-10-28  Richard Biener  

* gcc.dg/vect/bb-slp-58.c: Require vect_double.
* gcc.dg/vect/bb-slp-59.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-58.c | 1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-58.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-58.c
index 11bf5c333ab..5a3d3b75aa8 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-58.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-58.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
 
 double x[1024];
 void bar (void);
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
index 2e35725ff2a..815b44e1f7c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
 /* { dg-additional-options "-fdump-tree-loopdone" } */
 
 double x[1024];
-- 
2.26.2


Re: [PATCH] vect: Fix load costs for SLP permutes

2020-10-28 Thread Richard Biener via Gcc-patches
On Wed, Oct 28, 2020 at 2:39 PM Richard Sandiford via Gcc-patches
 wrote:
>
> For the following test case (compiled with load/store lanes
> disabled locally):
>
>   void
>   f (uint32_t *restrict x, uint8_t *restrict y, int n)
>   {
> for (int i = 0; i < n; ++i)
>   {
> x[i * 2] = x[i * 2] + y[i * 2];
> x[i * 2 + 1] = x[i * 2 + 1] + y[i * 2];
>   }
>   }
>
> we have a redundant no-op permute on the x[] load node:
>
>node 0x4472350 (max_nunits=8, refcnt=2)
>   stmt 0 _5 = *_4;
>   stmt 1 _13 = *_12;
>   load permutation { 0 1 }
>
> Then, when costing it, we pick a cost of 1, even though we need 4 copies
> of the x[] load to match a single y[] load:
>
>==> examining statement: _5 = *_4;
>Vectorizing an unaligned access.
>vect_model_load_cost: unaligned supported by hardware.
>vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .
>
> The problem is that the code only considers the permutation for
> the first scalar iteration, rather than for all VF iterations.
>
> This patch tries to fix that by using similar logic to
> vect_transform_slp_perm_load.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

I wonder if we can instead do this counting in vect_transform_slp_perm_load
where we already count the number of permutes.  That would avoid
the duplication of the "logic".

Richard.

> Richard
>
> [-b version also included]
>
>
> gcc/
> * tree-vect-stmts.c (vect_model_load_cost): Use similar logic
> to vect_transform_slp_perm_load when counting the number of
> loads in a permuted SLP operation.
> ---
>  gcc/tree-vect-stmts.c | 67 ++-
>  1 file changed, 41 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3575f25241f..6eacd641e6b 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1099,38 +1099,53 @@ vect_model_load_cost (vec_info *vinfo,
>stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
>/* Record the cost for the permutation.  */
>unsigned n_perms;
> -  unsigned assumed_nunits
> -   = vect_nunits_for_cost (STMT_VINFO_VECTYPE (first_stmt_info));
>vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL,
> vf, true, &n_perms);
>inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
>first_stmt_info, 0, vect_body);
> -  /* And adjust the number of loads performed.  This handles
> -redundancies as well as loads that are later dead.  */
> -  auto_sbitmap perm (DR_GROUP_SIZE (first_stmt_info));
> -  bitmap_clear (perm);
> -  for (unsigned i = 0;
> -  i < SLP_TREE_LOAD_PERMUTATION (slp_node).length (); ++i)
> -   bitmap_set_bit (perm, SLP_TREE_LOAD_PERMUTATION (slp_node)[i]);
> -  ncopies = 0;
> -  bool load_seen = false;
> -  for (unsigned i = 0; i < DR_GROUP_SIZE (first_stmt_info); ++i)
> -   {
> - if (i % assumed_nunits == 0)
> +
> +  /* And if this is not a simple "load N vectors and then permute each
> +vector internally" operation, adjust the number of loads performed.
> +This handles redundancies as well as loads that are later dead.  */
> +  unsigned int nscalars = SLP_TREE_SCALAR_STMTS (slp_node).length ();
> +  unsigned int dr_group_size = DR_GROUP_SIZE (first_stmt_info);
> +  tree vectype = STMT_VINFO_VECTYPE (first_stmt_info);
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (nscalars != dr_group_size
> + || !multiple_p (nunits, nscalars))
> +   {
> + /* The constant VF and nunits are enforced by
> +vect_transform_slp_perm_load for this case.  */
> + unsigned int const_nunits = nunits.to_constant ();
> + unsigned int in_nlanes = dr_group_size * vf.to_constant ();
> + unsigned int out_nlanes = nscalars * vf.to_constant ();
> + auto_sbitmap perm (in_nlanes);
> + bitmap_clear (perm);
> + for (unsigned i = 0; i < out_nlanes; ++i)
> +   {
> + unsigned int iter_num = i / nscalars;
> + unsigned int stmt_num = i % nscalars;
> + unsigned int in_lane
> +   = (iter_num * dr_group_size
> +  + SLP_TREE_LOAD_PERMUTATION (slp_node)[stmt_num]);
> + bitmap_set_bit (perm, in_lane);
> +   }
> + ncopies = 0;
> + bool load_seen = false;
> + for (unsigned i = 0; i < in_nlanes; ++i)
> {
> - if (load_seen)
> -   ncopies++;
> - load_seen = false;
> + if (i % const_nunits == 0)
> +   {
> + if (load_seen)
> +   ncopies++;
> + load_seen = false;
> +   }
> + if (bitmap_bit_p (perm, i))
> +   load_seen 

[PATCH] tree-optimization/97615 - avoid creating externals from patterns

2020-10-28 Thread Richard Biener
The previous change failed to check for patterns again; the following
corrects that.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2020-10-28  Richard Biener  

PR tree-optimization/97615
* tree-vect-slp.c (vect_build_slp_tree_2): Do not build
an external from pattern defs.

* gcc.dg/vect/bb-slp-pr97615.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr97615.c | 23 ++
 gcc/tree-vect-slp.c|  3 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr97615.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr97615.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97615.c
new file mode 100644
index 000..b4a8aa2f4a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97615.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+short *a;
+int e, f;
+
+void
+foo (int c, int d)
+{
+  short *a1, *a2, *a3;
+  a1 = a++;
+  *a1 = c;
+  a2 = a++;
+  *a2 = *a1;
+  a3 = a++;
+  *a3 = d;
+}
+
+void
+bar (void)
+{
+  foo (e + f - 2, e + f - 1);
+  foo (e + f - 1, 0);
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 5eafc037955..9f1da3070f5 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1599,7 +1599,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
}
 
   if (is_a  (vinfo)
- && oprnd_info->first_dt == vect_internal_def)
+ && oprnd_info->first_dt == vect_internal_def
+ && !oprnd_info->any_pattern)
{
  /* For BB vectorization, if all defs are the same do not
 bother to continue the build along the single-lane
-- 
2.26.2


[PATCH] Fix iteration over loads in SLP optimize

2020-10-28 Thread Richard Biener
I've made a typo when refactoring the iteration over all loads in
the SLP graph.  Fixed.
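
The mistake is easy to reproduce in isolation.  In this sketch (hypothetical
names and data, not the vectorizer's own types), leafs holds indices into
vertices, so the loop has to index through it rather than use the loop
counter directly:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the values of the leaf vertices.  LEAFS holds N_LEAFS indices
   into VERTICES; both arrays are hypothetical demo data.  */
int
sum_leaf_values (const int *vertices, const size_t *leafs, size_t n_leafs)
{
  assert (vertices != NULL && leafs != NULL);
  int sum = 0;
  for (size_t i = 0; i < n_leafs; ++i)
    sum += vertices[leafs[i]];  /* not vertices[i] */
  return sum;
}
```

With vertices {10, 20, 30, 40} and leafs {1, 3} the correct loop visits 20
and 40, while indexing with i alone would visit 10 and 20.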

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2020-10-28  Richard Biener  

* tree-vect-slp.c (vect_optimize_slp): Fix iteration over
all loads.
---
 gcc/tree-vect-slp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 50a2d37eb25..5eafc037955 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3043,7 +3043,7 @@ vect_optimize_slp (vec_info *vinfo)
   /* Now elide load permutations that are not necessary.  */
   for (i = 0; i < leafs.length (); ++i)
 {
-  node = vertices[i];
+  node = vertices[leafs[i]];
   if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
continue;
 
-- 
2.26.2


RE: [PATCH 1/2] Enable OpenMP efficient performance profiling via ITT tracing

2020-10-28 Thread Vitaly Slobodskoy
Hi Jakub,

Thanks for your comments! Please see my reply below.

> > In order to optimize OpenMP workloads, it is quite important to have a
> dedicated performance analysis tool familiar with the OpenMP runtime
> specifics. The typical OpenMP performance issues are:
> > - Not all the performance-critical code is parallel, serial execution
> > significantly affects scaling (Amdahl's law)
> > - Work balance is not good, not all the cores doing useful work
> > - Overhead on synchronization, scheduling, threads creation
> 
> So, first thing is that this brings in quite large code that clearly was 
> written by
> somebody else, so the most important question is what is the upstream repo
> for it, what is the license, for the steering committee the question is if we
> can allow it in the GCC codebase or whether instead if the library is
> configured in certain way we just shouldn't require users to have that library
> installed instead.
> If it is included, we need a process of updates from the upstream repo being
> synced into GCC tree.

 [Vitaly Slobodskoy] Well, ITT API (https://github.com/intel/ittapi) is 
licensed under joint GPLv2 and 3-Clause BSD licenses and can be included within 
GPL projects. For example, LLVM has already integrated it 
(https://github.com/llvm/llvm-project/tree/master/openmp/runtime/src/thirdparty/ittnotify).
 It is really just an API: the ittnotify.h header file lists all the possible 
tracing routine definitions, with a very small amount of logic responsible for 
loading the ITT trace collector and binding the callbacks. The ITT API evolves 
primarily by adding new tracing APIs (which happens very rarely); existing APIs 
are not removed. As a result, all code instrumented with an older ITT API 
version remains fully supported by performance tools compiled with the latest 
ITT API.
So frequent updates of the ITT API within gcc should not be needed.

> > This proposal adds new "--disable-itt-instrumentation" configure option
> which completely disables (removes) all the tracing. The tracing is ON by
> default.
> > OpenMP Imbalance time calculation is not included in this patch.
> 
> Second thing, making this on by default is a very bad idea.
> Most people will not need it and it will just slow things down.

 [Vitaly Slobodskoy] Well, on the contrary: if tracing is off by default, almost 
nobody would be able to benefit from these capabilities, because analyzing the 
performance of OpenMP workloads would then require rebuilding the compiler to 
get an instrumented runtime. I'd also like to underline that trace points are 
added within non-performance-critical code, and the overhead of the 
instrumentation is negligible if there is no data collector: just a single 
comparison of a pointer with NULL for every trace point. As a result, this 
instrumentation itself has negligible impact on the performance of OpenMP 
workloads.
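
A minimal sketch of that "compare a pointer with NULL" pattern is below. All 
names here (trace_hook, counting_collector, run_parallel_region) are 
hypothetical and are not part of the ITT API or of libgomp; the sketch only 
shows why a trace point is cheap when no collector is attached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collector callback; NULL while no collector is attached.  */
typedef void (*trace_fn) (const char *event);
trace_fn trace_hook = NULL;

/* Demo collector that only counts the events it receives.  */
int events_seen = 0;

void
counting_collector (const char *event)
{
  assert (event != NULL);
  ++events_seen;
}

/* An instrumented region: each trace point costs one NULL test
   unless a collector has been bound to trace_hook.  */
void
run_parallel_region (void)
{
  if (trace_hook)
    trace_hook ("region_begin");

  /* ... the actual work would go here ... */

  if (trace_hook)
    trace_hook ("region_end");
}
```

Calling run_parallel_region with trace_hook left as NULL records nothing; 
after assigning counting_collector to trace_hook, each call records two 
events.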

> Also, OpenMP 5.0 adds OMPT support which is exactly meant for tracing,
> wouldn't it be better to add OMPT support and then add an ITT plugin as one
> of the perhaps multiple users of OMPT?

[Vitaly Slobodskoy] Right, this is an alternative to an OMPT solution. There 
are a few reasons why I'm proposing the ITT-based solution:
1) There is still no OMPT support within gcc. Are there any plans to add it?
2) OMPT implies higher performance overhead, especially for imbalance time 
calculation. Since OMPT provides a set of dedicated callbacks, calculating the 
imbalance time for a parallel loop would require handling the 
ompt_callback_sync_region_wait callback for every thread. With ITT, the 
imbalance time can be calculated within the runtime, and a callback is made on 
the master thread just once with the actual imbalance time.
3) There are existing OpenMP runtimes (LLVM OpenMP runtime, Intel Compiler 
OpenMP runtime) instrumented with ITT and tools (e.g. Intel VTune) relying on 
ITT instrumentation within OpenMP runtime.
 
> > +#ifdef ENABLE_ITT_INSTRUMENTATION
> > +static __itt_domain* s_gomp_parallel_domain = NULL;
> > +#endif
> 
> Code in libgomp proper needs to follow the GCC Coding Conventions.
> So, e.g. space before * rather than after it.

[Vitaly Slobodskoy] Sure, thanks for this and other catches!

> 
> > +#ifdef ENABLE_ITT_INSTRUMENTATION
> > +  if (__itt_frame_submit_v3_ptr)
> > +  {
> 
> { indented 2 spaces more than the if, and the body another 2.
> 
> > +if (!s_gomp_parallel_domain)
> > +{
> > +  s_gomp_parallel_domain = __itt_domain_create("$omp$parallel");
> > +  __itt_thread_set_name("OMP Master Thread");
> 
> Spaces before ( in calls.
> 
> > +__itt_frame_submit_v3(s_gomp_parallel_domain, NULL,
> > +  parallel_region_begin_ts, parallel_region_end_ts);
> 
> The arguments should be aligned (i.e. parallel_region_begin_ts below
> s_gomp_parallel_domain).  Also, all the itt specific variable names should
> have itt somewhere in it.
> 
> > +#ifdef ENABLE_ITT_INSTRUMENTATION
> > +  char thread_name[30];
> > +#endif
> 
> This

Re: [PATCH 1/X] libsanitizer: Tie the hwasan library into our build system

2020-10-28 Thread Matthew Malcomson via Gcc-patches
Hi Richard,

I've done most of the updates you suggested, but have a few questions to
ensure I have the right end of the stick before making the remaining changes.

There are also a few clarifications I'd like to make where I hadn't explained
the rationale for certain bits of the original code, and I think that those
clarifications may lead to further changes you'd like.

I'm putting all the questions and extra clarifications in one email rather than
replying to each of the emails in turn.

--- w.r.t Patch 1

- You suggested this should also update README.gcc and merge.sh.
I believe Martin Liska wanted to perform the merge of libhwasan into the
libsanitizer directory, and that seems to make a lot of sense given that he
knows a lot more here.
He's posted a patch that would do this work here
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556365.html

--- w.r.t. Patch 2

- You correctly read that the `configure.tgt` indicates that HWASAN is supported
for ilp32, that's my mistake.  HWASAN does not actually handle ilp32 (the hwasan
library functions `__hwasan_load*` take pointers as arguments and check the top
bytes which obviously can't pass that top byte).

N.b. I may have mentioned ilp32 a few times and put comments discussing
ptr_mode and Pmode being different, but that's more to do with trying to ensure
the code is done "properly" rather than actually ensuring the functionality
works on ilp32.

--- w.r.t. Patch 4

- Parametrising the tag size is pretty easy towards fewer bits, the current code
handles this quite nicely using HWASAN_TAG_SIZE and in fact I've already done
that for a WIP branch for MTE stack tagging.

Parametrising the tag size to more bits is much more involved -- especially
since the hwasan library uses a u8 data type to represent these tags
everywhere.  Hence I figure I'll leave this as it is.

Does that sound ok?

--- w.r.t. Patch 5

- Around hwasan_increment_tag, yes -- the STATIC_ASSERT made the modulus
redundant; it should have asserted the below (since the "less than or equal"
check still works for the smaller tag size used in MTE rather than fixing it to
tag_offset).
HWASAN_TAG_SIZE <= sizeof (tag_offset) * CHAR_BIT

- Exporting hwasan_base_ptr (which will be renamed to hwasan_frame_base_ptr).
I currently export this through a function hwasan_frame_base rather than via an
exported variable.  I want to do this so that any use of the base pointer will
be recorded so we know to emit the initialisation (whereas a use of
virtual_stack_vars_rtx can rest assured that the pointer will always be
initialised).

N.b. This is also related to API for TARGET_MEMTAG_GENTAG (renamed in the
new patch to TARGET_MEMTAG_INSERT_RANDOM_TAG).  I have tried to separate
generating a register to be used as hwasan_frame_base_ptr and emitting the RTL
to initialise that register.
That way, code that knows it does not want to emit any RTL can still access this
variable, knowing that the initialisation will be emitted later.
This is largely because I didn't want to start spreading the possibility of
emitting RTL earlier in the expand pass.  At the moment hwasan_emit_prologue
is often the first time that RTL is emitted (unless there are large aligned
variables in the stack -- these are indexed off of an alternate "base register"
generated using get_dynamic_stack_base, and that function emits some code to
generate that new register).

However, an alternative API which returns a fresh register would match the
interface of get_dynamic_stack_base which is an existing API in the codebase.

- default_memtag_addtag (now *_add_tag)
I was mentioning compile-time UB, I thought that `plus_constant` taking a
poly_int64 rather than a poly_uint64 meant I can't pass such a large number.
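
Going back to the hwasan_increment_tag point, a minimal sketch of how the
"less than or equal" assertion and the modulus fit together might look like
this.  TAG_SIZE, tag_t and increment_tag are hypothetical names, not the
patch's code, and the 4-bit width is an assumption standing in for the
smaller MTE tag.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical tag width in bits.  4 is assumed here to stand in for
   the smaller MTE tag; an 8-bit tag would fill the u8 type exactly.  */
#define TAG_SIZE 4

typedef uint8_t tag_t;

/* The tag only has to fit in the storage type; it need not fill it.  */
_Static_assert (TAG_SIZE <= sizeof (tag_t) * CHAR_BIT,
                "tag must fit in the tag type");

/* Advance to the next tag, wrapping at 2^TAG_SIZE.  The modulus is a
   no-op only when TAG_SIZE equals the full width of tag_t.  */
tag_t
increment_tag (tag_t tag)
{
  assert (tag < (1u << TAG_SIZE));
  return (tag_t) ((tag + 1u) % (1u << TAG_SIZE));
}
```

With an equality assertion the wrap would come for free from the u8
overflow; with "<=" the explicit modulus is what keeps a 4-bit tag in range.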


--- w.r.t. Patch 6

- I have added a comment about C++ exceptions, but thought I'd include a bit
more information about the state of things here.
LLVM have support for C++ exceptions by using a different unwinding personality
function for all HWASAN tagged frames.  That personality function is defined in
libhwasan, and untags the entire stack frame as we unwind it.

Unfortunately, that personality function relies on the frame pointer pointing to
just before the variables on the stack (commented as not being enforced by the
ABI but being a requirement for HWASAN).  That holds in LLVM but does not for
GCC.
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/hwasan/hwasan_exceptions.cpp#L51

I have a hack that modifies that function to use _Unwind_Backtrace to find the
extent of the current frame, and then adds the exception support to GCC based on
this new personality function.
Since the focus of the implementation in GCC is for the kernel (which doesn't
have C++ exceptions) I don't have plans to turn that into something
release-quality and upstream it.  The userspace story for hwasan on anything
other than Android has much bigger difficulties around not using the "platform
ABI", so I don't think putting much effort into C++ excep

[patch] vxworks: Fix conditional inclusion guard in gthr-vxworks-thread.c

2020-10-28 Thread Olivier Hainque

This change fixes the name of the macro used to condition the
inclusion of an actual implementation of some of the gthread
support services for VxWorks, to agree with the side
defining that macro based on tests against the targeted
VxWorks version major.
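
The failure mode is silent because an undefined identifier in an #if
evaluates to 0.  A small sketch with hypothetical macro names (not the real
gthr ones) shows the effect:

```c
#include <assert.h>

/* Hypothetical feature macro, set by the "defining side".  */
#define DEMO_THREADS_CXX0X 1

/* Misspelled guard: DEMO_THREAD_CXX0X is never defined, so the
   preprocessor treats it as 0 and drops the branch without an error.  */
int
misspelled_guard_enabled (void)
{
#if DEMO_THREAD_CXX0X
  return 1;
#else
  return 0;
#endif
}

/* Correctly spelled guard: the implementation branch is compiled in.  */
int
correct_guard_enabled (void)
{
#if DEMO_THREADS_CXX0X
  return 1;
#else
  return 0;
#endif
}

/* Self-check documenting the expected outcome of both guards.  */
void
check_guards (void)
{
  assert (misspelled_guard_enabled () == 0);
  assert (correct_guard_enabled () == 1);
}
```

Building with GCC's -Wundef makes the misspelled guard visible as a warning.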

Tested by verifying the desired effect of having the expected
entry points available after a rebuild for the configuration where
they were missing.

Olivier

2020-10-28  Olivier Hainque  

libgcc/
* config/gthr-vxworks-thread.c: Fix name of macro used
to condition the inclusion of an actual implementation,
to match the defining side (__GTHREADS_CXX0X).

diff --git a/libgcc/config/gthr-vxworks-thread.c 
b/libgcc/config/gthr-vxworks-thread.c
index c87168c22711..c8fe65f8cf35 100644
--- a/libgcc/config/gthr-vxworks-thread.c
+++ b/libgcc/config/gthr-vxworks-thread.c
@@ -29,7 +29,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 
 #include "gthr.h"
 
-#if __GTHREAD_CXX0X
+#if __GTHREADS_CXX0X
 
 #include 
 #include 
-- 
2.17.1





Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Hi, Richard,

To be consistent with other flags in flag-types.h, for example 
“sanitize_code”,
I didn’t use a namespace and instead made the names more specific, as follows:

/* Different settings for zeroing subset of registers.  */
enum  zero_regs_flags {
  ZERO_REGS_UNSET = 0,
  ZERO_REGS_SKIP = 1UL << 0,
  ZERO_REGS_ONLY_USED = 1UL << 1,
  ZERO_REGS_ONLY_GPR = 1UL << 2,
  ZERO_REGS_ONLY_ARG = 1UL << 3,
  ZERO_REGS_ENABLED = 1UL << 4,
  ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
   | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
   | ZERO_REGS_ONLY_GPR,
  ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
   | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
  ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
  | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
  ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_ALL = ZERO_REGS_ENABLED
};

Is this good?

Or do you still prefer the namespace?
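
Either way, the intended use of these bits can be sketched as follows. This is 
only an illustration: struct reg_info and should_zero_reg are hypothetical and 
not part of the patch, the enum is a subset of the one above repeated so the 
sketch compiles on its own, and treating SKIP and ENABLED as mutually exclusive 
is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the flags proposed above, repeated for self-containment.  */
enum zero_regs_flags {
  ZERO_REGS_UNSET = 0,
  ZERO_REGS_SKIP = 1u << 0,
  ZERO_REGS_ONLY_USED = 1u << 1,
  ZERO_REGS_ONLY_GPR = 1u << 2,
  ZERO_REGS_ONLY_ARG = 1u << 3,
  ZERO_REGS_ENABLED = 1u << 4,
  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
                       | ZERO_REGS_ONLY_GPR,
  ZERO_REGS_ALL = ZERO_REGS_ENABLED
};

/* Hypothetical description of one register.  */
struct reg_info {
  bool used;
  bool is_gpr;
  bool is_arg;
};

/* Hypothetical helper: should REG be zeroed under FLAGS?  ENABLED turns
   zeroing on, and each ONLY_* bit filters the candidate registers.  */
bool
should_zero_reg (unsigned flags, struct reg_info reg)
{
  /* Assumption: SKIP is never combined with ENABLED.  */
  assert (!((flags & ZERO_REGS_SKIP) && (flags & ZERO_REGS_ENABLED)));

  if (!(flags & ZERO_REGS_ENABLED))
    return false;
  if ((flags & ZERO_REGS_ONLY_USED) && !reg.used)
    return false;
  if ((flags & ZERO_REGS_ONLY_GPR) && !reg.is_gpr)
    return false;
  if ((flags & ZERO_REGS_ONLY_ARG) && !reg.is_arg)
    return false;
  return true;
}
```

With this reading, ALL is simply ENABLED with no ONLY_* filter set, which 
matches the suggestion that ALL is the absence of ONLY_*.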

thanks.

Qing


> On Oct 27, 2020, at 10:36 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
 diff --git a/gcc/flag-types.h b/gcc/flag-types.h
 index 852ea76..0f7e503 100644
 --- a/gcc/flag-types.h
 +++ b/gcc/flag-types.h
 @@ -285,6 +285,15 @@ enum sanitize_code {
  | SANITIZE_BOUNDS_STRICT
 };
 
 +enum  zero_call_used_regs_code {
 +  UNSET = 0,
 +  SKIP = 1UL << 0,
 +  ONLY_USED = 1UL << 1,
 +  ONLY_GPR = 1UL << 2,
 +  ONLY_ARG = 1UL << 3,
 +  ALL = 1UL << 4
 +};
>>> 
>>> I'd suggested these names on the assumption that we'd be using
>>> a C++ enum class, so that the enum would be referenced as
>>> name::ALL, name::SKIP, etc.  But I guess using a C++ enum class
>>> doesn't work well with bitfields after all.
>>> 
>>> These names are too generic without the name:: scoping though.
>>> Perhaps we should put them in a namespace:
>>> 
>>> namespace zero_regs_flags {
>>>   const unsigned int UNSET = 0;
>>>   …etc…
>>> }
>>> 
>>> (call-used probably doesn't need to be part of the flag names,
>>> since the concept is more general than that and call-usedness
>>> is really a filter that's being applied on top.  Although I guess
>>> the same is true of “zero”. ;-))
>>> 
>>> I don't think we should have ALL as a separate flag: ALL is the absence
>>> of ONLY_*.  Maybe we should have an ENABLED flag that all non-skip
>>> combinations use?
>>> 
>>> If it makes things easier, I think it would be good to have e.g.:
>>> 
>>> unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>>> 
>>> inside the namespace, to reduce the verbosity in the option table.
>> 
>> Then, the final namespace will look like:
>> 
>> namespace zero_regs_flags {
>>  const unsigned int UNSET = 0;
>>  const unsigned int SKIP = 1UL << 0;
>>  const unsigned int ONLY_USED = 1UL << 1;
>>  const unsigned int ONLY_GPR = 1UL << 2;
>>  const unsigned int ONLY_ARG = 1UL << 3;
>>  const unsigned int ENABLED = 1UL << 4;
>>  const unsigned int USED_GPR_ARG = ONLY_USED | ONLY_GPR | ONLY_ARG;
> 
> “ENABLED |” here
> 
>>  const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>>  const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
>>  const unsigned int USED = ENABLED | ONLY_USED;
>>  const unsigned int ALL_GRP_ARG = ENABLED | ONLY_GPR | ONLY_ARG;
> 
> GPR
> 
>>  const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
>>  const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
>>  const unsigned int ALL = ENABLED;
>> }
>> 
>> ??
> 



[PATCH] Ignore ignored operands in vect_get_and_check_slp_defs

2020-10-28 Thread Richard Biener
This passes down skip_args to vect_get_and_check_slp_defs to skip
ignored ops there, too and not fail SLP discovery.  This fixes
gcc.target/aarch64/sve/reduc_strict_5.c

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2020-10-28  Richard Biener  

* tree-vect-slp.c (vect_get_and_check_slp_defs): For skipped
args just push NULLs and vect_uninitialized_def.
(vect_build_slp_tree_2): Allocate skip_args for all ops
and pass it down to vect_get_and_check_slp_defs.
---
 gcc/tree-vect-slp.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9f1da3070f5..c98f747d4a9 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -413,6 +413,7 @@ vect_def_types_match (enum vect_def_type dta, enum 
vect_def_type dtb)
ok return 0.  */
 static int
 vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
+bool *skip_args,
 vec<stmt_vec_info> stmts, unsigned stmt_num,
 vec<slp_oprnd_info> *oprnds_info)
 {
@@ -507,6 +508,14 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
  return -1;
}
 
+  if (skip_args[i])
+   {
+ oprnd_info->def_stmts.quick_push (NULL);
+ oprnd_info->ops.quick_push (NULL_TREE);
+ oprnd_info->first_dt = vect_uninitialized_def;
+ continue;
+   }
+
   if (def_stmt_info && is_pattern_stmt_p (def_stmt_info))
oprnd_info->any_pattern = true;
 
@@ -589,6 +598,12 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
   /* Now match the operand definition types to that of the first stmt.  */
   for (i = 0; i < number_of_oprnds;)
 {
+  if (skip_args[i])
+   {
+ ++i;
+ continue;
+   }
+
   oprnd_info = (*oprnds_info)[i];
   dt = dts[i];
   stmt_vec_info def_stmt_info = oprnd_info->def_stmts[stmt_num];
@@ -1412,7 +1427,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 
   /* If the SLP node is a PHI (induction or reduction), terminate
  the recursion.  */
-  bool skip_args[2] = { false, false };
+  bool *skip_args = XALLOCAVEC (bool, nops);
+  memset (skip_args, 0, nops);
   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
 if (gphi *stmt = dyn_cast <gphi *> (stmt_info->stmt))
   {
@@ -1557,7 +1573,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   slp_oprnd_info oprnd_info;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
 {
-  int res = vect_get_and_check_slp_defs (vinfo, swap[i],
+  int res = vect_get_and_check_slp_defs (vinfo, swap[i], skip_args,
 stmts, i, &oprnds_info);
   if (res != 0)
matches[(res == -1) ? 0 : i] = false;
@@ -1582,19 +1598,19 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   slp_tree child;
   unsigned int j;
 
-  if (oprnd_info->first_dt == vect_uninitialized_def)
+  /* We're skipping certain operands from processing, for example
+outer loop reduction initial defs.  */
+  if (skip_args[i])
{
- /* COND_EXPR have one too many eventually if the condition
-is a SSA name.  */
- gcc_assert (i == 3 && nops == 4);
+ children.safe_push (NULL);
  continue;
}
 
-  /* We're skipping certain operands from processing, for example
-outer loop reduction initial defs.  */
-  if (i <= 1 && skip_args[i])
+  if (oprnd_info->first_dt == vect_uninitialized_def)
{
- children.safe_push (NULL);
+ /* COND_EXPR have one too many eventually if the condition
+is a SSA name.  */
+ gcc_assert (i == 3 && nops == 4);
  continue;
}
 
-- 
2.26.2


Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
> Hi, Richard,
>
> In order to be consistent with other flags in flag-types.h, for example, 
> “sanitize_code”,
> I didn’t use namespace, instead making the name more specific as following:
>
> /* Different settings for zeroing subset of registers.  */
> enum  zero_regs_flags {
>   ZERO_REGS_UNSET = 0,
>   ZERO_REGS_SKIP = 1UL << 0,
>   ZERO_REGS_ONLY_USED = 1UL << 1,
>   ZERO_REGS_ONLY_GPR = 1UL << 2,
>   ZERO_REGS_ONLY_ARG = 1UL << 3,
>   ZERO_REGS_ENABLED = 1UL << 4,
>   ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>| ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>| ZERO_REGS_ONLY_GPR,
>   ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>| ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>   ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>   | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>   ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_ALL = ZERO_REGS_ENABLED
> };
>
> Is this good?
>
> Or you still prefer namespace?

I prefer the namespace.  I realise namespaces aren't used that much
in GCC yet, but they *are* used.

The advantage they have is that it's possible to do:

  using namespace ...;

in contexts where there's no ambiguity.  They also make lines like
the | ones above easier to read.

Thanks,
Richard
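
To make the trade-off concrete, here is a minimal sketch of the namespace form.
The flag values follow the listing quoted earlier in this thread; the helper
should_zero and its parameters are hypothetical, made up for illustration, and
are not part of the patch.

```cpp
#include <cassert>

namespace zero_regs_flags {
  const unsigned int UNSET = 0;
  const unsigned int SKIP = 1UL << 0;
  const unsigned int ONLY_USED = 1UL << 1;
  const unsigned int ONLY_GPR = 1UL << 2;
  const unsigned int ONLY_ARG = 1UL << 3;
  const unsigned int ENABLED = 1UL << 4;
  const unsigned int USED_GPR_ARG = ENABLED | ONLY_USED | ONLY_GPR | ONLY_ARG;
  const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
  const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
  const unsigned int ALL = ENABLED;
}

/* Hypothetical helper: decide whether one register should be zeroed.
   ALL is simply ENABLED with no ONLY_* filter applied on top.  */
bool
should_zero (unsigned int flags, bool is_used, bool is_gpr, bool is_arg)
{
  /* No ambiguity inside this function, so the short names read well.  */
  using namespace zero_regs_flags;

  if (!(flags & ENABLED))
    return false;
  if ((flags & ONLY_USED) && !is_used)
    return false;
  if ((flags & ONLY_GPR) && !is_gpr)
    return false;
  if ((flags & ONLY_ARG) && !is_arg)
    return false;
  return true;
}
```

Outside such a function the constants are still spelled with the scope, e.g.
zero_regs_flags::USED_GPR, so the generic names do not leak into other code.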


[PATCH] openmp: Implicit 'declare target' for C++ static initializers

2020-10-28 Thread Kwok Cheung Yeung

Hello

OpenMP 5.0 has a new feature for implicitly marking variables and functions that 
are referenced in the initializers of static variables and functions that are 
already marked 'declare target'. Support was added in the commit 'openmp: 
Implement discovery of implicit declare target to clauses' 
(dc703151d4f4560e647649506d5b4ceb0ee11e90). However, this does not work with 
non-constant C++ initializers, where the initializers can contain references to 
other (non-constant) variables and function calls.


The C++ front-end stores the initialization information in the static_aggregates 
list (with the variable decl in the TREE_VALUE of an entry and the 
initialization in TREE_PURPOSE) rather than in TREE_INITIAL(var_decl). I have 
added an extra function in omp-offload.c to walk the variable initializer 
trees in static_aggregates, and added a call to it from the FE shortly before 
the initializations are emitted. I have also added a testcase to ensure that the 
implicitly marked variables/functions can be referenced in offloaded code.
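
The marking itself is a plain worklist traversal. The sketch below shows the
idea with standard containers only; it is a stand-in under stated assumptions,
not the GCC code: the names init_refs and mark_implicit_targets are
hypothetical, and strings replace the trees that static_aggregates really holds.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: maps each static variable to the names that
// its dynamic initializer references (GCC keeps trees, not strings).
using init_refs = std::map<std::string, std::vector<std::string>>;

// Return every name that ends up marked, starting from the variables
// that are explicitly 'declare target'.
std::set<std::string>
mark_implicit_targets (const init_refs &inits,
                       const std::set<std::string> &declared)
{
  std::set<std::string> marked (declared);
  std::vector<std::string> worklist (declared.begin (), declared.end ());

  while (!worklist.empty ())
    {
      std::string decl = worklist.back ();
      worklist.pop_back ();

      auto it = inits.find (decl);
      if (it == inits.end ())
        continue;  // No initializer recorded: nothing to walk.

      for (const std::string &ref : it->second)
        if (marked.insert (ref).second)
          worklist.push_back (ref);  // Newly marked: walk it as well.
    }
  return marked;
}
```

Anything reachable only from an unmarked variable stays unmarked, which is the
property the new testcase is meant to exercise on the real implementation.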


The libgomp tests have been run with offloading to an Nvidia card with no 
regressions, and I have also bootstrapped the compiler with no offloading on 
x86-64. Okay for trunk?


Thanks

Kwok
commit d2c8c5bd2826851b727e93a8ea2141596e50a621
Author: Kwok Cheung Yeung 
Date:   Wed Oct 28 07:13:14 2020 -0700

openmp: Implicitly add 'declare target' directives for dynamic static 
initializers in C++

2020-10-28  Kwok Cheung Yeung  

cp/
* decl2.c: Include omp-offload.h
(c_parse_final_cleanups): Call omp_mark_target_static_initializers.
* omp-offload.c (omp_discover_declare_target_var_r): Add all static
variables to worklist.
(omp_discover_implicit_declare_target): Check that worklist items
that are variable declarations have an initialization expression
before walking.
(omp_mark_target_static_initializers): New.
* omp-offload.h (omp_mark_target_static_initializers): New prototype.

libgomp/
* testsuite/libgomp.c++/declare_target-3.C: New.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 2f0d637..b207d58 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "omp-offload.h"
 
 /* Id for dumping the raw trees.  */
 int raw_dump_id;
@@ -4970,6 +4971,12 @@ c_parse_final_cleanups (void)
  /* Make sure the back end knows about all the variables.  */
  write_out_vars (vars);
 
+ /* Mark functions and variables in static initializers as
+'omp declare target' if the initialized variable is marked
+as such.  */
+ if (flag_openmp)
+   omp_mark_target_static_initializers (vars);
+
  /* Set the line and file, so that it is obviously not from
 the source file.  */
  input_location = locus_at_end_of_parsing;
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3e9c31d..8ecc181 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -296,7 +296,7 @@ omp_discover_declare_target_var_r (tree *tp, int 
*walk_subtrees, void *data)
  DECL_ATTRIBUTES (*tp)
= remove_attribute ("omp declare target link", DECL_ATTRIBUTES 
(*tp));
}
-  if (TREE_STATIC (*tp) && DECL_INITIAL (*tp))
+  if (TREE_STATIC (*tp))
((vec<tree> *) data)->safe_push (*tp);
   DECL_ATTRIBUTES (*tp) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (*tp));
   symtab_node *node = symtab_node::get (*tp);
@@ -348,7 +348,7 @@ omp_discover_implicit_declare_target (void)
   while (!worklist.is_empty ())
 {
   tree decl = worklist.pop ();
-  if (VAR_P (decl))
+  if (VAR_P (decl) && DECL_INITIAL (decl))
walk_tree_without_duplicates (&DECL_INITIAL (decl),
  omp_discover_declare_target_var_r,
  &worklist);
@@ -363,6 +363,33 @@ omp_discover_implicit_declare_target (void)
 }
 }
 
+void
+omp_mark_target_static_initializers (tree vars)
+{
+  tree node;
+  auto_vec<tree> worklist;
+
+  for (node = vars; node; node = TREE_CHAIN (node))
+if (omp_declare_target_var_p (TREE_VALUE (node)))
+   worklist.safe_push (TREE_VALUE (node));
+
+  while (!worklist.is_empty ())
+{
+  tree decl = worklist.pop ();
+
+  if (!VAR_P (decl) || !TREE_STATIC (decl))
+   continue;
+
+  for (node = vars; node; node = TREE_CHAIN (node))
+   if (TREE_VALUE (node) == decl)
+ {
+   walk_tree_without_duplicates (&TREE_PURPOSE (node),
+ omp_discover_declare_target_var_r,
+ &worklist);
+   break;
+ }
+}
+}
 
 /* Create new symbols containing (address, size) pairs for global variables,
marked with "omp declare target" attribute, as well as add

[patch] vxworks: Fix the logic conditioning VX_ENTER/LEAVE_TLS_DTOR

2020-10-28 Thread Olivier Hainque

This change fixes a basic #if/#ifdef confusion exposed by
the wider range of testing we’re running for the forthcoming
introduction of additional support for VxWorks 7r2, which
comes together with a few organizational changes to keep
supporting older versions.

Committing as obvious after checking that it does fix
unexpected undefined references on powerpc for vx7r2.

Olivier

2020-10-28  Olivier Hainque  

libgcc/
* config/gthr-vxworks-tls.c: Fix preprocessor logic
controlling the definition of VX_ENTER_TLS_DTOR and
VX_LEAVE_TLS_DTOR based on a version major check.

diff --git a/libgcc/config/gthr-vxworks-tls.c b/libgcc/config/gthr-vxworks-tls.c
index 8987e55c35ac..1d5c4fbb34de 100644
--- a/libgcc/config/gthr-vxworks-tls.c
+++ b/libgcc/config/gthr-vxworks-tls.c
@@ -115,7 +115,7 @@ extern void __gthread_set_tls_data (void *data);
 
 #endif
 
-#ifdef _VXWORKS_MAJOR_EQ(6)
+#if _VXWORKS_MAJOR_EQ(6)
 
 extern void __gthread_enter_tls_dtor_context (void);
 extern void __gthread_leave_tls_dtor_context (void);
-- 
2.17.1



Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Okay, I will change it to namespace.

Qing

> On Oct 28, 2020, at 9:19 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>> Hi, Richard,
>> 
>> In order to be consistent with other flags in flag-types.h, for example, 
>> “sanitize_code”,
>> I didn’t use namespace, instead making the name more specific as following:
>> 
>> /* Different settings for zeroing subset of registers.  */
>> enum  zero_regs_flags {
>>  ZERO_REGS_UNSET = 0,
>>  ZERO_REGS_SKIP = 1UL << 0,
>>  ZERO_REGS_ONLY_USED = 1UL << 1,
>>  ZERO_REGS_ONLY_GPR = 1UL << 2,
>>  ZERO_REGS_ONLY_ARG = 1UL << 3,
>>  ZERO_REGS_ENABLED = 1UL << 4,
>>  ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>  ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>  | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL = ZERO_REGS_ENABLED
>> };
>> 
>> Is this good?
>> 
>> Or you still prefer namespace?
> 
> I prefer the namespace.  I realise namespaces aren't used that much
> in GCC yet, but they *are* used.
> 
> The advantage they have is that it's possible to do:
> 
>  using namespace ...;
> 
> in contexts where there's no ambiguity.  They also make lines like
> the | ones above easier to read.
> 
> Thanks,
> Richard



Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-28 Thread Jeff Law via Gcc-patches


On 10/28/20 3:38 AM, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus via 
> Gcc-patches wrote:
>> On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
>>> On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
>>>> Over the last couple of months quite a few warnings about uninitialized
>>>> variables were raised while building GCC.  A reason why these warnings
>>>> show up on S/390 only is due to the aggressive inlining settings here.
>>>> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
>>>> 1657178f59b) could be fixed or in case of a false positive silenced by
>>>> initializing the corresponding variable.  Since the latter reoccurs and
>>>> while bootstrapping such warnings are turned into errors bootstrapping
>>>> fails on S/390 consistently.  Therefore, for the moment do not turn
>>>> those warnings into errors.
>>>>
>>>> config/ChangeLog:
>>>>
>>>> * warnings.m4: Do not turn maybe-uninitialized warnings into errors
>>>> on S/390.
>>>>
>>>> fixincludes/ChangeLog:
>>>>
>>>> * configure: Regenerate.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * configure: Regenerate.
>>>>
>>>> libcc1/ChangeLog:
>>>>
>>>> * configure: Regenerate.
>>>>
>>>> libcpp/ChangeLog:
>>>>
>>>> * configure: Regenerate.
>>>>
>>>> libdecnumber/ChangeLog:
>>>>
>>>> * configure: Regenerate.
>>> That change looks good to me. Could a global reviewer please comment!
>> Ping
> Ping

I think this would be a huge mistake to install.


Jeff




Re: [RS6000] Do not define builtins that overload disabled builtins

2020-10-28 Thread David Edelsohn via Gcc-patches
On Wed, Oct 28, 2020 at 6:46 AM Alan Modra  wrote:
>
> commit 25ffd3d34e means we no longer define an overloaded
> __builtin_byte_in_set for -m32, so the more informative
> "__builtin_byte_in_set is not supported in this compiler
> configuration" is not reported.
>
> Regression tested powerpc64-linux biarch.  OK?
>
> PR bootstrap/92661
> * gcc.target/powerpc/byte-in-set-2.c: Update expected error.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c 
> b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> index 9a80c27fe26..34ab50e25ba 100644
> --- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> @@ -11,5 +11,5 @@
>  int
>  test_byte_in_set (unsigned char b, unsigned long long set_members)
>  {
> -  return __builtin_byte_in_set (b, set_members); /* { dg-error 
> "'__builtin_byte_in_set' is not supported in this compiler configuration" } */
> +  return __builtin_byte_in_set (b, set_members); /* { dg-warning "implicit 
> declaration of function" } */
>  }

Thanks for tracking this down and generating the fix.  The problems
were more extensive than this one error message.  All of the
byte-in-*.c testcases had incorrect target requirements.

I updated the testcases and changed the expected result for byte-in-2.c.

I committed the patch.

Thanks, David


Re: [PATCH 1/X] libsanitizer: Tie the hwasan library into our build system

2020-10-28 Thread Richard Sandiford via Gcc-patches
I have the feeling that I've picked the most awkward of each binary
choice here, sorry…

Matthew Malcomson  writes:
> - Parametrising the tag size is pretty easy towards less bits, the current 
> code
> handles this quite nicely using HWASAN_TAG_SIZE and in fact I've already done
> that for a WIP branch for MTE stack tagging.
>
> Parametrising the tag size to more bits is much more involved -- especially
> since the hwasan library uses a u8 data type to represent these tags
> everywhere.  Hence I figure I'll leave this as it is.
>
> Does that sound ok?

If it's easy to parameterise towards fewer bits, I think it'd be worth
having a target twiddle for it and simply asserting that the value is
not more than 8.
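
Something along these lines, as a rough sketch only. The names demo_tag_size
and demo_increment_tag are hypothetical and stand in for whatever the target
hook and helper end up being called; 4 is used because MTE tags are 4 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical target knob; 4 mirrors the smaller MTE tag width.
constexpr unsigned demo_tag_size = 4;

// The library stores tags in a u8, so wider tags cannot be represented.
static_assert (demo_tag_size <= 8,
               "tags must fit the library's u8 representation");

// Advance a tag, wrapping within the configured number of bits.
constexpr std::uint8_t
demo_increment_tag (std::uint8_t tag, std::uint8_t by)
{
  return static_cast<std::uint8_t> ((tag + by)
                                    & ((1u << demo_tag_size) - 1));
}
```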

> --- w.r.t. Patch 5
>
> - Around hwasan_increment_tag, yes -- the STATIC_ASSERT made the modulus
> redundant, it should have asserted the below (since the "less than or equal"
> check still works for the smaller tag size used in MTE rather than fixing it 
> to
> tag_offset).
> HWASAN_TAG_SIZE <= sizeof (tag_offset) * CHAR_BIT

OK.

> - Exporting hwasan_base_ptr (which will be renamed to hwasan_frame_base_ptr).
> I currently export this through a function hwasan_frame_base rather than via 
> an
> exported variable.  I want to do this so that any use of the base pointer will
> be recorded so we know to emit the initialisation (whereas a use of
> virtual_stack_vars_rtx can rest assured that the pointer will always be
> initialised).

The reason for that comment was that the cfgexpand.c code seemed to have
to work around the fact that it couldn't directly tell whether a given
rtx was hwasan_base_ptr.  I think we should have the ability to test that,
even if it's via a function rather than by directly exposing the variable.

E.g. maybe you could have a little class around it or something, so that
the comparison still happens inline via a public member function, but so
that it's not possible for cfgexpand.c to use the private value directly.
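
A minimal sketch of what I mean, with made-up names throughout: demo_rtx
stands in for rtx and demo_frame_base for the wrapper class. Only the shape
matters here, namely a private value with a public inline identity test.

```cpp
#include <cassert>

// Stand-in for GCC's rtx; only pointer identity matters in this sketch.
struct demo_rtx_def {};
typedef demo_rtx_def *demo_rtx;

class demo_frame_base
{
public:
  // Record the register once it has been created.
  void set (demo_rtx reg) { m_reg = reg; }

  // Inline identity test; callers never get hold of m_reg itself.
  bool is_frame_base (demo_rtx x) const
  {
    return m_reg != nullptr && x == m_reg;
  }

private:
  demo_rtx m_reg = nullptr;
};
```

Code that only needs to ask "is this the frame base?" can then do so without
being able to use the register directly and bypass the usage tracking.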

> N.b. This is also related to API for TARGET_MEMTAG_GENTAG (renamed in the
> new patch to TARGET_MEMTAG_INSERT_RANDOM_TAG).  I have tried to separate
> generating a register to be used as hwasan_frame_base_ptr and emitting the RTL
> to initialise that register.
> That way, code that knows it does not want to emit any RTL can still access
> this
> variable, knowing that the initialisation will be emitted later.
> This is largely because I didn't want to start spreading the possibility of
> emitting RTL earlier in the expand pass.  At the moment hwasan_emit_prologue
> is often the first time that RTL is emitted (unless there are large aligned
> variables in the stack -- these are indexed off of an alternate "base 
> register"
> generated using get_dynamic_stack_base, and that function emits some code to
> generate that new register).
>
> However, an alternative API which returns a fresh register would match the
> interface of get_dynamic_stack_base which is an existing API in the codebase.

Yeah, I think that would be better.

> - default_memtag_addtag (now *_add_tag)
> I was mentioning compile-time UB, I thought that `plus_constant` taking a
> poly_int64 rather than a poly_uint64 meant I can't pass such a large number.

No, it's just that C++ forces us to choose between signed or unsigned,
even though for modes of N<=64 bits, the offset is really just a
signless bag of N bits.

I think the reason a signed type was chosen is that, when operating
on 32-bit modes, adding int64_t (-1) looks more obvious than adding
~uint64_t (0), since the latter seems to be adding to bits beyond
the MSB.
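
To illustrate with plain integers rather than poly_int64 (a sketch; the
function name is made up and the 32-bit "mode" is modelled with uint32_t):

```cpp
#include <cassert>
#include <cstdint>

// The same 64 bits, written the signed way and the unsigned way.
const std::uint64_t demo_signed_form
  = static_cast<std::uint64_t> (std::int64_t (-1));
const std::uint64_t demo_unsigned_form = ~std::uint64_t (0);

// Add a 64-bit offset "bag of bits" to a value held in a 32-bit mode;
// only the low 32 bits of the offset take part in the result.
std::uint32_t
demo_add_in_32bit_mode (std::uint32_t base, std::uint64_t offset_bits)
{
  return base + static_cast<std::uint32_t> (offset_bits);
}
```

Both spellings subtract one from the 32-bit value; the signed one just makes
that intent easier to see.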

> - I have added a comment about C++ exceptions, but thought I'd include a bit
> more information about the state of things here.
> LLVM have support for C++ exceptions by using a different unwinding 
> personality
> function for all HWASAN tagged frames.  That personality function is defined 
> in
> libhwasan, and untags the entire stack frame as we unwind it.
>
> Unfortunately, that personality function relies on the frame pointer pointing
> to
> just before the variables on the stack (commented as not being enforced by the
> ABI but being a requirement for HWASAN).  That holds in LLVM but does not for
> GCC.
> https://github.com/llvm-mirror/compiler-rt/blob/master/lib/hwasan/
> hwasan_exceptions.cpp#L51
>
> I have a hack that modifies that function to use _Unwind_Backtrace to find the
> extent of the current frame, and then adds the exception support to GCC based
> on
> this new personality function.
> Since the focus of the implementation in GCC is for the kernel (which doesn't
> have C++ exceptions) I don't have plans to turn that into something
> release-quality and upstream it.  The userspace story for hwasan on anything
> other than Android has much bigger difficulties around not using the "platform
> ABI", so I don't think putting much effort into C++ exceptions makes sense
> right
> now.

OK.

> - About maybe combining the hwasan pass and the asan pass.
> Yes we could just bra

[PATCH] libstdc++: Fix arithmetic bug in year_month_weekday conversion [PR96713]

2020-10-28 Thread Patrick Palka via Gcc-patches
The conversion function year_month_weekday::operator sys_days computes
the number of days to offset from the first weekday of the month with:

 days{(index()-1)*7}
  ^  type 'unsigned'

We'd like the above to yield -7d when index() is 0u, but our 'days'
alias is based on long instead of int, so the conversion from unsigned
to long instead yields a large positive quantity.

This patch fixes this by casting the result of index() to int so that
the initializer is sign-extended in the conversion to long.  The added
testcase also verifies that we do the right thing when index() == 5.
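
The effect can be reproduced outside of <chrono> with plain integers. This is
an illustrative sketch, assuming a 32-bit unsigned; the function names are
made up, and int64_t stands in for the 64-bit rep of 'days'.

```cpp
#include <cassert>
#include <cstdint>

// Offset as the old code computed it: the subtraction wraps in unsigned
// arithmetic, and the widening conversion then zero-extends the result.
std::int64_t
demo_offset_unsigned (unsigned index)
{
  return (index - 1) * 7;
}

// With the cast, the subtraction happens in int and the widening
// conversion sign-extends.
std::int64_t
demo_offset_signed (unsigned index)
{
  return (static_cast<int> (index) - 1) * 7;
}
```

For index 0 the first form gives 4294967289 rather than -7; for any index of
1 or more the two forms agree.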

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/96713
* include/std/chrono (year_month_weekday::operator sys_days):
Cast the result of index() to int so that the initializer for
days{} is sign-extended when it's converted to the underlying
type.
* testsuite/std/time/year_month_weekday/3.cc: New test.
---
 libstdc++-v3/include/std/chrono   |  3 +-
 .../std/time/year_month_weekday/3.cc  | 66 +++
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7539d7184ea..7c35b78fe59 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator sys_days() const noexcept
   {
auto __d = sys_days{year() / month() / 1};
-   return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
+   return __d + (weekday() - chrono::weekday(__d)
+ + days{((int)index()-1)*7});
   }
 
   explicit constexpr
diff --git a/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc 
b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
new file mode 100644
index 000..86db85d04e2
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
@@ -0,0 +1,66 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// PR libstdc++/97613
+// Test year_month_weekday to sys_days conversion for extreme values of 
index().
+
+#include <chrono>
+
+void
+test01()
+{
+  using namespace std::chrono;
+  using ymd = year_month_day;
+  using ymwd = year_month_weekday;
+
+  static_assert(ymd{sys_days{2020y/January/Sunday[0]}} == 2019y/December/29);
+  static_assert(ymd{sys_days{2020y/January/Monday[0]}} == 2019y/December/30);
+  static_assert(ymd{sys_days{2020y/January/Tuesday[0]}} == 2019y/December/31);
+  static_assert(ymd{sys_days{2020y/January/Wednesday[0]}} == 
2019y/December/25);
+  static_assert(ymd{sys_days{2020y/January/Thursday[0]}} == 2019y/December/26);
+  static_assert(ymd{sys_days{2020y/January/Friday[0]}} == 2019y/December/27);
+  static_assert(ymd{sys_days{2020y/January/Saturday[0]}} == 2019y/December/28);
+
+  static_assert((2020y).is_leap());
+  static_assert(ymd{sys_days{2020y/March/Sunday[0]}} == 2020y/February/23);
+  static_assert(ymd{sys_days{2020y/March/Monday[0]}} == 2020y/February/24);
+  static_assert(ymd{sys_days{2020y/March/Tuesday[0]}} == 2020y/February/25);
+  static_assert(ymd{sys_days{2020y/March/Wednesday[0]}} == 2020y/February/26);
+  static_assert(ymd{sys_days{2020y/March/Thursday[0]}} == 2020y/February/27);
+  static_assert(ymd{sys_days{2020y/March/Friday[0]}} == 2020y/February/28);
+  static_assert(ymd{sys_days{2020y/March/Saturday[0]}} == 2020y/February/29);
+
+  static_assert(!(2019y).is_leap());
+  static_assert(ymd{sys_days{2019y/March/Sunday[0]}} == 2019y/February/24);
+  static_assert(ymd{sys_days{2019y/March/Monday[0]}} == 2019y/February/25);
+  static_assert(ymd{sys_days{2019y/March/Tuesday[0]}} == 2019y/February/26);
+  static_assert(ymd{sys_days{2019y/March/Wednesday[0]}} == 2019y/February/27);
+  static_assert(ymd{sys_days{2019y/March/Thursday[0]}} == 2019y/February/28);
+  static_assert(ymd{sys_days{2019y/March/Friday[0]}} == 2019y/February/22);
+  static_assert(ymd{sys_days{2019y/March/Saturday[0]}} == 2019y/February/23);
+
+  static_assert(ymd{sys_days{2020y/December/Sunday[5]}} == 202

Re: [RS6000] Don't be too clever with dg-do run and dg-do compile

2020-10-28 Thread will schmidt via Gcc-patches
On Wed, 2020-10-28 at 21:20 +1030, Alan Modra via Gcc-patches wrote:
> Otherwise some versions of dejagnu go ahead and run the vsx tests
> below when they should not.  To best cope with older dejagnu, put
> "run" before "compile", the idea being that if the second dg-do always
> wins then that won't cause fails.
> 
> The altivec tests also need -save-temps for the scan-assembler test to
> occur when vms_hw.

vmx_hw ? :)

> 
> Regression tested powerpc64le-linux and powerpc64-linux.  OK?
> 
>   * gcc.target/powerpc/vsx-load-element-extend-char.c: Put "dg-do run"
>   before "dg-do compile", and make them mutually exclusive.
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Likewise.
>   * gcc.target/powerpc/vsx-load-element-extend-longlong.c: Likewise.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-char.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-int.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-longlong.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-short.c: Likewise.
>   * gcc.target/powerpc/altivec-consts.c: Likewise, add -save-temps.
>   * gcc.target/powerpc/le-altivec-consts.c: Likewise.
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-consts.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
> index d59f9b4cf1c..c68c68125d1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
> @@ -1,7 +1,7 @@
>  /* { dg-do run { target vmx_hw } } */
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
>  /* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-options "-maltivec -mabi=altivec -O2" } */
> +/* { dg-options "-maltivec -mabi=altivec -O2 -save-temps" } */
> 
>  /* Check that "easy" AltiVec constants are correctly synthesized.  */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c 
> b/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
> index f48ef44e676..a1db5e92f87 100644
> --- a/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
> +++ b/gcc/testsuite/gcc.target/powerpc/le-altivec-consts.c
> @@ -1,7 +1,7 @@
>  /* { dg-do run { target vmx_hw } } */
> -/* { dg-do compile } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
>  /* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-options "-maltivec -mabi=altivec -O2" } */
> +/* { dg-options "-maltivec -mabi=altivec -O2 -save-temps" } */
> 
>  /* Check that "easy" AltiVec constants are correctly synthesized.  */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
> index f386346e059..c23a9128680 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
> @@ -2,8 +2,9 @@
> Test of vec_xl_sext and vec_xl_zext (load into rightmost
> vector element and zero/sign extend). */
> 
> -/* { dg-do compile {target power10_ok} } */
> -/* { dg-do run {target power10_hw} } */
> +/* { dg-do run { target power10_hw } } */
> +/* { dg-do compile { target { ! power10_hw } } } */
> +/* { dg-require-effective-target power10_ok } */
>  /* { dg-require-effective-target int128 } */
>  /* { dg-options "-mdejagnu-cpu=power10 -O3 -save-temps" } */


Ok.  These are from some tests I recently committed; I obviously
missed this combo (testing with older dejagnu).  I think I've updated
my dejagnu versions all over the place for other reasons.  Do you
consider this a non-typical corner case with older dejagnu, or should I
try to explicitly check for this in the future?

Similar/same changes below.  These changes all seem reasonable.

lgtm, 
thanks
-Will



> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
> index ea737466a58..c40e1a3a0f7 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
> @@ -2,8 +2,9 @@
> Test of vec_xl_sext and vec_xl_zext (load into rightmost
> vector element and zero/sign extend). */
> 
> -/* { dg-do compile {target power10_ok} } */
> -/* { dg-do run {target power10_hw} } */
> +/* { dg-do run { target power10_hw } } */
> +/* { dg-do compile { target { ! power10_hw } } } */
> +/* { dg-require-effective-target power10_ok } */
>  /* { dg-require-effective-target int128 } */
> 
>  /* Deliberately set optization to zero for this test to confirm
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c
> index cd155c2013d..405b4245f8e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-longlong.c
> +++ b/gcc/testsuite/

[PATCH v4] builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-10-28 Thread Raoni Fassina Firmino via Gcc-patches
I am repeating the "changelog" from v3 here because v4 is just one
bugfix. I also tested on BE systems for v4.

Changes since v3[1]:
  - Fixed fegetround bug on powerpc64 (big endian) that Segher
    spotted;

Changes since v2[2]:
  - Added documentation for the new optabs;
  - Removed use of non-portable __builtin_clz;
  - Changed feclearexcept and feraiseexcept to accept all 4 valid
    flags at the same time and added more tests for that case;
  - Extended feclearexcept and feraiseexcept testcases to match
    accepting multiple flags;
  - Fixed builtin-feclearexcept-feraiseexcept-2.c testcase comparison
    after feclearexcept tests;
  - Updated commit message to reflect change in feclearexcept and
    feraiseexcept from the glibc counterpart;
  - Fixed English spelling and typos;
  - Fixed code-style;
  - Changed subject line tag to make clear it is not just rs6000 code.

Tested on top of master (75ce04fba49eb30b6a8fe23bc3605cf0ef9a8e28)
on the following platforms with no regressions:
  - powerpc64le-linux-gnu (Power 9)
  - powerpc64le-linux-gnu (Power 8)
  - powerpc64-linux-gnu (Power 8)
  - powerpc-linux-gnu (Power 8)

Documentation changes tested on x86_64-redhat-linux.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557109.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553297.html

 8< 

These optimizations were originally in glibc, but were removed, and it
was suggested that they would be a good fit as gcc builtins[1].

feclearexcept and feraiseexcept were extended (in comparison to the
glibc version) to accept any combination of the accepted flags, not
limited to just one flag bit at a time anymore.
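
To make the "any combination of flags" behaviour concrete, here is a
minimal sketch using only the standard <cfenv> functions.  It does not
show the builtin expansion from this patch; whether the calls are
expanded inline or go to the C library depends on the target and the
compiler.  The three helper function names are made up for this
illustration.

```cpp
#include <cassert>
#include <cfenv>

// Raise two exception flags with a single call, then report which of the
// two are set.  Helper names here are hypothetical, not from the patch.
int raise_divbyzero_and_inexact()
{
  std::feclearexcept(FE_ALL_EXCEPT);
  std::feraiseexcept(FE_DIVBYZERO | FE_INEXACT);
  return std::fetestexcept(FE_DIVBYZERO | FE_INEXACT);
}

// Clear the same two flags with a single call and report what is left.
int clear_divbyzero_and_inexact()
{
  std::feraiseexcept(FE_DIVBYZERO | FE_INEXACT);
  std::feclearexcept(FE_DIVBYZERO | FE_INEXACT);
  return std::fetestexcept(FE_DIVBYZERO | FE_INEXACT);
}

// Query the current rounding mode (FE_TONEAREST unless it was changed).
int current_rounding_mode()
{
  return std::fegetround();
}
```

With the default floating-point environment, the first helper should
report both flags set, the second should report neither, and the third
should return FE_TONEAREST.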

The associated bugreport: PR target/94193

[1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html

2020-08-13  Raoni Fassina Firmino  

gcc/ChangeLog:

* builtins.c (expand_builtin_fegetround): New function.
(expand_builtin_feclear_feraise_except): New function.
(expand_builtin): Add cases for BUILT_IN_FEGETROUND,
BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT
* config/rs6000/rs6000.md (fegetroundsi): New pattern.
(feclearexceptsi): New Pattern.
(feraiseexceptsi): New Pattern.
* optabs.def (fegetround_optab): New optab.
(feclearexcept_optab): New optab.
(feraiseexcept_optab): New optab.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test.
* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test.
* gcc.target/powerpc/builtin-fegetround.c: New test.

Signed-off-by: Raoni Fassina Firmino 
---
 gcc/builtins.c|  76 +++
 gcc/config/rs6000/rs6000.md   |  81 +++
 gcc/doc/md.texi   |  18 ++
 gcc/optabs.def|   4 +
 .../builtin-feclearexcept-feraiseexcept-1.c   |  76 +++
 .../builtin-feclearexcept-feraiseexcept-2.c   | 203 ++
 .../gcc.target/powerpc/builtin-fegetround.c   |  36 
 7 files changed, 494 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3a3eb5562df..a40daf5e84f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -116,6 +116,9 @@ static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
 static rtx expand_builtin_interclass_mathfn (tree, rtx);
 static rtx expand_builtin_sincos (tree);
+static rtx expand_builtin_fegetround (tree, rtx, machine_mode);
+static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode,
+ optab);
 static rtx expand_builtin_cexpi (tree, rtx);
 static rtx expand_builtin_int_roundingfn (tree, rtx);
 static rtx expand_builtin_int_roundingfn_2 (tree, rtx);
@@ -2887,6 +2890,59 @@ expand_builtin_sincos (tree exp)
   return const0_rtx;
 }
 
+/* Expand call EXP to the fegetround builtin (from C99 venv.h), returning the
+   result and setting it in TARGET.  Otherwise return NULL_RTX on failure.  */
+static rtx
+expand_builtin_fegetround (tree exp, rtx target, machine_mode target_mode)
+{
+  if (!validate_arglist (exp, VOID_TYPE))
+return NULL_RTX;
+
+  insn_code icode = direct_optab_handler (fegetround_optab, SImode);
+  if (icode == CODE_FOR_nothing)
+return NULL_RTX;
+
+  if (target == 0
+  || GET_MODE (target) != target_mode
+  || ! (*insn_data[icode].operand[0].predicate) (target, target_mode))
+target = gen_reg_rtx (target_mode);
+
+  rtx pat = GEN_FCN (icode) (target);
+  if (! pat)
+return NULL_RTX;
+  emit_insn (pat);
+
+  return target;
+}
+
+/* Expand call EX

Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Hi, Richard, 

I changed the “enum” to “namespace”.

There is no issue for C++ compilation. However, the flag-types.h header file
is also included by C modules compiled with gcc, so I got many compilation
errors like the following:

make[4]: Entering directory 
'/home/qinzhao/Work/x86-build/x86_64-pc-linux-gnu/libgcc'
In file included from ../.././gcc/options.h:6,
 from ../.././gcc/tm.h:22,
 from ../../../x86-gcc/libgcc/libgcc2.c:29,
 from ../../../x86-gcc/libgcc/config/i386/64/_multc3.c:6:
../../../x86-gcc/libgcc/../gcc/flag-types.h:289:1: error: unknown type name 
‘namespace’
  289 | namespace  zero_regs_code {
  | ^

It looks like I should not put this new namespace inside “flag-types.h”.
Which other header file should I put this namespace in?

thanks.

Qing

> On Oct 28, 2020, at 9:24 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Okay, I will change it to namespace.
> 
> Qing
> 
>> On Oct 28, 2020, at 9:19 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
>>> Hi, Richard,
>>> 
>>> In order to be consistent with other flags in flag-types.h, for example, 
>>> “sanitize_code”,
>>> I didn’t use namespace, instead making the name more specific as follows:
>>> 
>>> /* Different settings for zeroing subset of registers.  */
>>> enum  zero_regs_flags {
>>> ZERO_REGS_UNSET = 0,
>>> ZERO_REGS_SKIP = 1UL << 0,
>>> ZERO_REGS_ONLY_USED = 1UL << 1,
>>> ZERO_REGS_ONLY_GPR = 1UL << 2,
>>> ZERO_REGS_ONLY_ARG = 1UL << 3,
>>> ZERO_REGS_ENABLED = 1UL << 4,
>>> ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>> ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>> | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL = ZERO_REGS_ENABLED
>>> };
>>> 
>>> Is this good?
>>> 
>>> Or you still prefer namespace?
>> 
>> I prefer the namespace.  I realise namespaces aren't used that much
>> in GCC yet, but they *are* used.
>> 
>> The advantage they have is that it's possible to do:
>> 
>> using namespace ...;
>> 
>> in contexts where there's no ambiguity.  They also make lines like
>> the | ones above easier to read.
>> 
>> Thanks,
>> Richard
> 



c++: Refactor push_template_decl

2020-10-28 Thread Nathan Sidwell



Sadly I need to wander into push_template_decl again.  But here's a
piece of RAII goodness first.

gcc/cp/
* pt.c (push_template_decl): Refactor for some RAII.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 3c0f2546489..0d2946fd7c4 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -5682,12 +5682,6 @@ template_parm_outer_level (tree t, void *data)
 tree
 push_template_decl (tree decl, bool is_friend)
 {
-  tree tmpl;
-  tree args;
-  tree info;
-  tree ctx;
-  bool is_primary;
-  bool is_partial;
   int new_template_p = 0;
   /* True if the template is a member template, in the sense of
  [temp.mem].  */
@@ -5697,19 +5691,20 @@ push_template_decl (tree decl, bool is_friend)
 return error_mark_node;
 
   /* See if this is a partial specialization.  */
-  is_partial = ((DECL_IMPLICIT_TYPEDEF_P (decl)
-		 && TREE_CODE (TREE_TYPE (decl)) != ENUMERAL_TYPE
-		 && CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
-		|| (VAR_P (decl)
-		&& DECL_LANG_SPECIFIC (decl)
-		&& DECL_TEMPLATE_SPECIALIZATION (decl)
-		&& TINFO_USED_TEMPLATE_ID (DECL_TEMPLATE_INFO (decl))));
+  bool is_partial = ((DECL_IMPLICIT_TYPEDEF_P (decl)
+		  && TREE_CODE (TREE_TYPE (decl)) != ENUMERAL_TYPE
+		  && CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
+		 || (VAR_P (decl)
+			 && DECL_LANG_SPECIFIC (decl)
+			 && DECL_TEMPLATE_SPECIALIZATION (decl)
+			 && TINFO_USED_TEMPLATE_ID (DECL_TEMPLATE_INFO (decl))));
 
   /* No surprising friend functions.  */
   gcc_checking_assert (is_friend
 		   || !(TREE_CODE (decl) == FUNCTION_DECL
 			&& DECL_UNIQUE_FRIEND_P (decl)));
 
+  tree ctx;
   if (is_friend)
 /* For a friend, we want the context of the friend, not
the type of which it is a friend.  */
@@ -5731,14 +5726,16 @@ push_template_decl (tree decl, bool is_friend)
 DECL_CONTEXT (decl) = FROB_CONTEXT (current_namespace);
 
   /* See if this is a primary template.  */
+  bool is_primary = false;
   if (is_friend && ctx
   && uses_template_parms_level (ctx, processing_template_decl))
 /* A friend template that specifies a class context, i.e.
  template <class T> friend void A<T>::f();
is not primary.  */
-is_primary = false;
+;
   else if (TREE_CODE (decl) == TYPE_DECL && LAMBDA_TYPE_P (TREE_TYPE (decl)))
-is_primary = false;
+/* Lambdas are not primary.  */
+;
   else
 is_primary = template_parm_scope_p ();
 
@@ -5871,8 +5868,9 @@ push_template_decl (tree decl, bool is_friend)
   if (is_partial)
 return process_partial_specialization (decl);
 
-  args = current_template_args ();
+  tree args = current_template_args ();
 
+  tree tmpl;
   if (!ctx
   || TREE_CODE (ctx) == FUNCTION_DECL
   || (CLASS_TYPE_P (ctx) && TYPE_BEING_DEFINED (ctx))
@@ -6077,7 +6075,7 @@ push_template_decl (tree decl, bool is_friend)
   if (DECL_TEMPLATE_INFO (tmpl))
 args = add_outermost_template_args (DECL_TI_ARGS (tmpl), args);
 
-  info = build_template_info (tmpl, args);
+  tree info = build_template_info (tmpl, args);
 
   if (DECL_IMPLICIT_TYPEDEF_P (decl))
 SET_TYPE_TEMPLATE_INFO (TREE_TYPE (tmpl), info);


Re: [PATCH] libstdc++: Fix arithmetic bug in year_month_weekday conversion [PR96713]

2020-10-28 Thread Patrick Palka via Gcc-patches
On Wed, 28 Oct 2020, Patrick Palka wrote:

> The conversion function year_month_weekday::operator sys_days computes
> the number of days to offset from the first weekday of the month with:
> 
>  days{(index()-1)*7}
>   ^  type 'unsigned'
> 
> We'd like the above to yield -7d when index() is 0u, but our 'days'
> alias is based on long instead of int, so the conversion from unsigned
> to long instead yields a large positive quantity.
> 
> This patch fixes this by casting the result of index() to int so that
> the initializer is sign-extended in the conversion to long.  The added
> testcase also verifies that we do the right thing when index() == 5.
> 
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/96713
>   * include/std/chrono (year_month_weekday::operator sys_days):
>   Cast the result of index() to int so that the initializer for
>   days{} is sign-extended when it's converted to the underlying
>   type.
>   * testsuite/std/time/year_month_weekday/3.cc: New test.
> ---
>  libstdc++-v3/include/std/chrono   |  3 +-
>  .../std/time/year_month_weekday/3.cc  | 66 +++
>  2 files changed, 68 insertions(+), 1 deletion(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
> 
> diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
> index 7539d7184ea..7c35b78fe59 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>operator sys_days() const noexcept
>{
>   auto __d = sys_days{year() / month() / 1};
> - return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
> + return __d + (weekday() - chrono::weekday(__d)
> +   + days{((int)index()-1)*7});

On second thought, for consistency with the rest of the header, I guess
we should use a functional cast instead of a C-style cast here:

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/96713
* include/std/chrono (year_month_weekday::operator sys_days):
Cast the result of index() to int so that the initializer for
days{} is sign-extended when it's converted to the underlying
type.
* testsuite/std/time/year_month_weekday/3.cc: New test.
---
 libstdc++-v3/include/std/chrono   |  3 +-
 .../std/time/year_month_weekday/3.cc  | 65 +++
 2 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7539d7184ea..f947082c528 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator sys_days() const noexcept
   {
auto __d = sys_days{year() / month() / 1};
-   return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
+   return __d + (weekday() - chrono::weekday(__d)
+ + days{(int(index())-1)*7});
   }
 
   explicit constexpr
diff --git a/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
new file mode 100644
index 000..cccaccef211
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
@@ -0,0 +1,65 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// PR libstdc++/97613
+// Test year_month_weekday to sys_days conversion for extreme values of index().
+
+#include <chrono>
+
+void
+test01()
+{
+  using namespace std::chrono;
+  using ymd = year_month_day;
+
+  static_assert(ymd{sys_days{2020y/January/Sunday[0]}} == 2019y/December/29);
+  static_assert(ymd{sys_days{2020y/January/Monday[0]}} == 2019y/December/30);
+  static_assert(ymd{sys_days{2020y/January/Tuesday[0]}} == 2019y/December/31);
+  static_assert(ymd{sys_days{2020y/January/Wednesday[0]}} == 2019y/December/25);
+  static_assert(ymd{sys_days{2020y/January/Thursday[0]}} == 2019y/December/26);
+  static_assert(ymd{sys_days{2020y/January/Frid

[PATCH] dump when SLP analysis fails due to shared vectype mismatch

2020-10-28 Thread Richard Biener
This adds another one.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-10-28  Richard Biener  

* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Dump
when shared vectype update fails.
---
 gcc/tree-vect-slp.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c98f747d4a9..ff3a0c2fd8e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3371,7 +3371,13 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
 
   if (is_a <bb_vec_info> (vinfo)
   && !vect_update_shared_vectype (stmt_info, SLP_TREE_VECTYPE (node)))
-return false;
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"desired vector type conflicts with earlier one "
+"for %G", stmt_info->stmt);
+  return false;
+}
 
   bool dummy;
   return vect_analyze_stmt (vinfo, stmt_info, &dummy,
-- 
2.26.2


Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
> Hi, Richard, 
>
> I changed the “enum” to “namespace”.
>
> There is no issue for C++ compilation. However, the flag-types.h header file
> is also included by C modules compiled with gcc, so I got many compilation
> errors like the following:
>
> make[4]: Entering directory 
> '/home/qinzhao/Work/x86-build/x86_64-pc-linux-gnu/libgcc'
> In file included from ../.././gcc/options.h:6,
>  from ../.././gcc/tm.h:22,
>  from ../../../x86-gcc/libgcc/libgcc2.c:29,
>  from ../../../x86-gcc/libgcc/config/i386/64/_multc3.c:6:
> ../../../x86-gcc/libgcc/../gcc/flag-types.h:289:1: error: unknown type name 
> ‘namespace’
>   289 | namespace  zero_regs_code {
>   | ^
>
> It looks like I should not put this new namespace inside “flag-types.h”.
> Which other header file should I put this namespace in?

I think we should just protect the contents of flag-types.h with:

#if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)

similarly to what we do for flags.h.
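
As a rough sketch of what such a guarded header could look like: the
flag values below mirror the enum quoted earlier in this thread, but the
namespace name zero_regs_flags, the unprefixed constant names and the
helper zeroes_only_used_regs are assumptions made for illustration, not
the final patch.

```cpp
#include <cassert>

// Only expose the C++-only contents when the header is not being pulled
// into the C-compiled target libraries (same guard as suggested above).
#if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
namespace zero_regs_flags {
  const unsigned int UNSET = 0;
  const unsigned int SKIP = 1U << 0;
  const unsigned int ONLY_USED = 1U << 1;
  const unsigned int ONLY_GPR = 1U << 2;
  const unsigned int ONLY_ARG = 1U << 3;
  const unsigned int ENABLED = 1U << 4;
  const unsigned int USED_GPR_ARG = ENABLED | ONLY_USED | ONLY_GPR | ONLY_ARG;
  const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
  const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
  const unsigned int USED = ENABLED | ONLY_USED;
  const unsigned int ALL_GPR_ARG = ENABLED | ONLY_GPR | ONLY_ARG;
  const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
  const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
  const unsigned int ALL = ENABLED;
}
#endif

// Hypothetical consumer: inside a scope with no ambiguity, a
// using-directive lets the flag names be written without any prefix.
bool zeroes_only_used_regs(unsigned int flags)
{
  using namespace zero_regs_flags;
  return (flags & ENABLED) != 0 && (flags & ONLY_USED) != 0;
}
```

A caller outside such a scope would spell the names in full, for example
zeroes_only_used_regs (zero_regs_flags::USED_GPR).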

Thanks,
Richard


Re: deprecations in OpenMP 5.0

2020-10-28 Thread Kwok Cheung Yeung

Hello

I found this almost two-year-old thread while looking for how the OpenMP 5.0 
deprecations were to be handled.



E.g. if somebody tries hard to write portable OpenMP code and has:
  omp_lock_t lock;
  #if __OPENMP__ >= 201811L
  omp_init_lock_with_hint (&lock, omp_sync_hint_contended);
  #elif __OPENMP__ >= 201511L
  omp_init_lock_with_hint (&lock, omp_lock_hint_contended);
  #else
  omp_init_lock (&lock);
  #endif
they would now get a warning even when they did the right thing.
So, deprecating those when we change __OPENMP__ macro is the right thing.


What if we made the definition of __GOMP_DEPRECATED in the original patch 
conditional on the current value of __OPENMP__? i.e. Something like:


+#if defined(__GNUC__) && __OPENMP__ >= 201811L
+# define __GOMP_DEPRECATED __attribute__((__deprecated__))
+#else
+# define __GOMP_DEPRECATED
+#endif

In that case, __GOMP_DEPRECATED will not do anything until __OPENMP__ is updated 
to reflect OpenMP 5.0, but when it is, the functions will immediately be marked 
deprecated without any further work.


However, GFortran does not support the deprecated attribute, so how should it 
behave? My first thought would be to print out a warning message at runtime the 
first time a deprecated function is called (printing it out every time would 
probably be too annoying), and maybe add an environment variable that can be set 
to disable the warning. A similar runtime warning could also be printed if the 
OMP_NESTED environment variable is set. Again, printing these warnings could be 
suppressed until the value of __OPENMP__ is bumped up.


Kwok


Re: Materialize clones on demand

2020-10-28 Thread Jan Hubicka
> > > > However main problem is
> > > > cfg.c:202 (connect_src)                           5745k:  0.2%   271M:  1.9%  1754k:  0.0%  1132k:  0.2%  7026k
> > > > cfg.c:212 (connect_dest)                          6307k:  0.2%   281M:  2.0% 10129k:  0.2%  2490k:  0.5%  7172k
> > > > varasm.c:3359 (build_constant_desc)               7387k:  0.2%      0:  0.0%      0:  0.0%      0:  0.0%    51k
> > > > emit-rtl.c:486 (gen_raw_REG)                      7799k:  0.2%   215M:  1.5%     96:  0.0%      0:  0.0%  9502k
> > > > dwarf2cfi.c:2341 (add_cfis_to_fde)                8027k:  0.2%      0:  0.0%  4906k:  0.1%  1405k:  0.3%    78k
> > > > emit-rtl.c:4074 (make_jump_insn_raw)              8239k:  0.2%    93M:  0.7%      0:  0.0%      0:  0.0%  1442k
> > > > tree-ssanames.c:308 (make_ssa_name_fn)            9130k:  0.2%   456M:  3.3%      0:  0.0%      0:  0.0%  6622k
> > > > gimple.c:1808 (gimple_copy)                       9508k:  0.3%   524M:  3.7%  8609k:  0.2%  2972k:  0.6%  7135k
> > > > tree-inline.c:4879 (expand_call_inline)           9590k:  0.3%    21M:  0.2%      0:  0.0%      0:  0.0%   328k
> > > > dwarf2cfi.c:418 (new_cfi)                           10M:  0.3%      0:  0.0%      0:  0.0%      0:  0.0%   444k
> > > > cfg.c:266 (unchecked_make_edge)                     10M:  0.3%    60M:  0.4%   355M:  6.8%      0:  0.0%  9083k
> > I think it is a bug to have function bodies at the end of compilation - I
> > will try to work out the reason for that.
> > > > tree.c:1642 (wide_int_to_tree_1)                    10M:  0.3%  2313k:  0.0%      0:  0.0%      0:  0.0%   548k
> > > > stringpool.c:41 (stringpool_ggc_alloc)              10M:  0.3%  7055k:  0.0%      0:  0.0%  2270k:  0.5%   588k
> > > > stringpool.c:63 (alloc_node)                        10M:  0.3%    12M:  0.1%      0:  0.0%      0:  0.0%   588k
> > > > tree-phinodes.c:119 (allocate_phi_node)             11M:  0.3%   153M:  1.1%      0:  0.0%  3539k:  0.7%   340k
> > > > cgraph.c:289 (create_empty)                         12M:  0.3%      0:  0.0%   109M:  2.1%      0:  0.0%   371k
> > > > cfg.c:127 (alloc_block)                             14M:  0.4%   705M:  5.0%      0:  0.0%      0:  0.0%  7086k
> > > > tree-streamer-in.c:558 (streamer_read_tree_bitfi    22M:  0.6%    13k:  0.0%      0:  0.0%    22k:  0.0%    64k
> > > > tree-inline.c:834 (remap_block)                     28M:  0.8%   159M:  1.1%      0:  0.0%      0:  0.0%  2009k
> > > > stringpool.c:79 (ggc_alloc_string)                  28M:  0.8%  5619k:  0.0%      0:  0.0%  6658k:  1.4%  1785k
> > > > dwarf2out.c:11727 (add_ranges_num)                  32M:  0.9%      0:  0.0%    32M:  0.6%    144:  0.0%     20
> > > > tree-inline.c:5942 (copy_decl_to_var)               39M:  1.1%    51M:  0.4%      0:  0.0%      0:  0.0%   646k
> > > > tree-inline.c:5994 (copy_decl_no_change)            78M:  2.1%   270M:  1.9%      0:  0.0%      0:  0.0%  2497k
> > > > function.c:4438 (reorder_blocks_1)                  96M:  2.6%   101M:  0.7%      0:  0.0%      0:  0.0%  2109k
> > > > hash-table.h:802 (expand)                          142M:  3.9%    18M:  0.1%   198M:  3.8%    32M:  6.9%    38k
> > > > dwarf2out.c:10086 (new_loc_list)                   219M:  6.0%    11M:  0.1%      0:  0.0%      0:  0.0%  2955k
> > > > tree-streamer-in.c:637 (streamer_alloc_tree)       379M: 10.3%   426M:  3.0%      0:  0.0%  4201k:  0.9%  9828k
> > > > dwarf2out.c:5702 (new_die_raw)                     434M: 11.8%      0:  0.0%      0:  0.0%      0:  0.0%  5556k
> > > > dwarf2out.c:1383 (new_loc_descr)                   519M: 14.1%    12M:  0.1%   2880:  0.0%      0:  0.0%  6812k
> > > > dwarf2out.c:4420 (add_dwarf_attr)                  640M: 17.4%      0:  0.0%    94M:  1.8%  4584k:  1.0%  3877k
> > > > toplev.c:906 (realloc_for_line_map)                768M: 20.8%      0:  0.0%   767M: 14.6%   255M: 54.4%     33
> > > > 
> > > > GGC memory                                                Leak       Garbage         Freed      Overhead  Times
> > > > -

Re: [PATCH] libstdc++: Fix arithmetic bug in year_month_weekday conversion [PR96713]

2020-10-28 Thread Jonathan Wakely via Gcc-patches

On 28/10/20 11:04 -0400, Patrick Palka via Libstdc++ wrote:

The conversion function year_month_weekday::operator sys_days computes
the number of days to offset from the first weekday of the month with:

days{(index()-1)*7}
 ^  type 'unsigned'

We'd like the above to yield -7d when index() is 0u, but our 'days'
alias is based on long instead of int, so the conversion from unsigned
to long instead yields a large positive quantity.

This patch fixes this by casting the result of index() to int so that
the initializer is sign-extended in the conversion to long.  The added
testcase also verifies that we do the right thing when index() == 5.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?


Yes, thanks.



libstdc++-v3/ChangeLog:

PR libstdc++/96713
* include/std/chrono (year_month_weekday::operator sys_days):
Cast the result of index() to int so that the initializer for
days{} is sign-extended when it's converted to the underlying
type.
* testsuite/std/time/year_month_weekday/3.cc: New test.
---
libstdc++-v3/include/std/chrono   |  3 +-
.../std/time/year_month_weekday/3.cc  | 66 +++
2 files changed, 68 insertions(+), 1 deletion(-)
create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7539d7184ea..7c35b78fe59 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  operator sys_days() const noexcept
  {
auto __d = sys_days{year() / month() / 1};
-   return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
+   return __d + (weekday() - chrono::weekday(__d)
+ + days{((int)index()-1)*7});
  }

  explicit constexpr
diff --git a/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
new file mode 100644
index 000..86db85d04e2
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
@@ -0,0 +1,66 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// PR libstdc++/97613
+// Test year_month_weekday to sys_days conversion for extreme values of index().
+
+#include <chrono>
+
+void
+test01()
+{
+  using namespace std::chrono;
+  using ymd = year_month_day;
+  using ymwd = year_month_weekday;
+
+  static_assert(ymd{sys_days{2020y/January/Sunday[0]}} == 2019y/December/29);
+  static_assert(ymd{sys_days{2020y/January/Monday[0]}} == 2019y/December/30);
+  static_assert(ymd{sys_days{2020y/January/Tuesday[0]}} == 2019y/December/31);
+  static_assert(ymd{sys_days{2020y/January/Wednesday[0]}} == 2019y/December/25);
+  static_assert(ymd{sys_days{2020y/January/Thursday[0]}} == 2019y/December/26);
+  static_assert(ymd{sys_days{2020y/January/Friday[0]}} == 2019y/December/27);
+  static_assert(ymd{sys_days{2020y/January/Saturday[0]}} == 2019y/December/28);
+
+  static_assert((2020y).is_leap());
+  static_assert(ymd{sys_days{2020y/March/Sunday[0]}} == 2020y/February/23);
+  static_assert(ymd{sys_days{2020y/March/Monday[0]}} == 2020y/February/24);
+  static_assert(ymd{sys_days{2020y/March/Tuesday[0]}} == 2020y/February/25);
+  static_assert(ymd{sys_days{2020y/March/Wednesday[0]}} == 2020y/February/26);
+  static_assert(ymd{sys_days{2020y/March/Thursday[0]}} == 2020y/February/27);
+  static_assert(ymd{sys_days{2020y/March/Friday[0]}} == 2020y/February/28);
+  static_assert(ymd{sys_days{2020y/March/Saturday[0]}} == 2020y/February/29);
+
+  static_assert(!(2019y).is_leap());
+  static_assert(ymd{sys_days{2019y/March/Sunday[0]}} == 2019y/February/24);
+  static_assert(ymd{sys_days{2019y/March/Monday[0]}} == 2019y/February/25);
+  static_assert(ymd{sys_days{2019y/March/Tuesday[0]}} == 2019y/February/26);
+  static_assert(ymd{sys_days{2019y/March/Wednesday[0]}} == 2019y/February/27);
+  static_assert(ymd{sys_days{2019y/March/Thursday[0]}} == 2019y/February/28);
+  static_assert(ymd{sys_days{2019y/March/Friday[0]}} == 2019y/February/22);
+  static_assert(ymd{sys_days{2019y/March/Saturday[0]}} == 2019y/February/23

Re: [PATCH] libstdc++: Fix arithmetic bug in year_month_weekday conversion [PR96713]

2020-10-28 Thread Jonathan Wakely via Gcc-patches

On 28/10/20 11:21 -0400, Patrick Palka via Libstdc++ wrote:

On Wed, 28 Oct 2020, Patrick Palka wrote:


The conversion function year_month_weekday::operator sys_days computes
the number of days to offset from the first weekday of the month with:

 days{(index()-1)*7}
  ^  type 'unsigned'

We'd like the above to yield -7d when index() is 0u, but our 'days'
alias is based on long instead of int, so the conversion from unsigned
to long instead yields a large positive quantity.

This patch fixes this by casting the result of index() to int so that
the initializer is sign-extended in the conversion to long.  The added
testcase also verifies that we do the right thing when index() == 5.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/96713
* include/std/chrono (year_month_weekday::operator sys_days):
Cast the result of index() to int so that the initializer for
days{} is sign-extended when it's converted to the underlying
type.
* testsuite/std/time/year_month_weekday/3.cc: New test.
---
 libstdc++-v3/include/std/chrono   |  3 +-
 .../std/time/year_month_weekday/3.cc  | 66 +++
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7539d7184ea..7c35b78fe59 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator sys_days() const noexcept
   {
auto __d = sys_days{year() / month() / 1};
-   return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
+   return __d + (weekday() - chrono::weekday(__d)
+ + days{((int)index()-1)*7});


On second thought, for consistency with the rest of the header, I guess
we should use a functional cast instead of a C-style cast here:


Or static_cast<int>(index()) :-)

Any of those options is OK.
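
For readers following along, a small sketch of why the cast matters and
why the three spellings are interchangeable.  It uses plain integers
rather than <chrono>; std::int64_t stands in for the 'long'
representation of days, 'unsigned' is assumed to be 32 bits wide, and
the function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Original form: the subtraction and multiplication happen in 'unsigned',
// so index == 0 wraps around before the value is widened.
std::int64_t offset_unsigned(unsigned index)
{
  return (index - 1) * 7;
}

// C-style cast, as in the first version of the patch.
std::int64_t offset_c_cast(unsigned index)
{
  return ((int)index - 1) * 7;
}

// Functional cast, as in the revised patch.
std::int64_t offset_functional_cast(unsigned index)
{
  return (int(index) - 1) * 7;
}

// static_cast, the third option mentioned above.
std::int64_t offset_static_cast(unsigned index)
{
  return (static_cast<int>(index) - 1) * 7;
}
```

With a 32-bit unsigned, offset_unsigned (0) is 4294967289 (2^32 - 7)
rather than -7, while all three cast variants give -7 for index 0 and 28
for index 5.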



Re: deprecations in OpenMP 5.0

2020-10-28 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 28, 2020 at 03:41:25PM +, Kwok Cheung Yeung wrote:
> I found this almost two-year old thread while looking for how the OpenMP 5.0
> deprecations were to be handled.
> 
> > E.g. if somebody tries hard to write portable OpenMP code and has:
> >   omp_lock_t lock;
> >   #if __OPENMP__ >= 201811L
> >   omp_init_lock_with_hint (&lock, omp_sync_hint_contended);
> >   #elif __OPENMP__ >= 201511L
> >   omp_init_lock_with_hint (&lock, omp_lock_hint_contended);
> >   #else
> >   omp_init_lock (&lock);
> >   #endif
> > they would now get a warning even when they did the right thing.
> > So, deprecating those when we change __OPENMP__ macro is the right thing.
> 
> What if we made the definition of __GOMP_DEPRECATED in the original patch
> conditional on the current value of __OPENMP__? i.e. Something like:
> 
> +#if defined(__GNUC__) && __OPENMP__ >= 201811L
> +# define __GOMP_DEPRECATED __attribute__((__deprecated__))
> +#else
> +# define __GOMP_DEPRECATED
> +#endif
> 
> In that case, __GOMP_DEPRECATED will not do anything until __OPENMP__ is
> updated to reflect OpenMP 5.0, but when it is, the functions will
> immediately be marked deprecated without any further work.

That could work, but the macro name would need to incorporate the exact
OpenMP version, because some APIs can be deprecated in OpenMP 5.0, others in
5.1 or 5.2 (all to be removed in 6.0), others in 6.0/6.1 (to be removed in
7.0), and so on.
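
A rough sketch of what per-version macros might look like.  Every name
below (SKETCH_OPENMP_VERSION, SKETCH_DEPRECATED_5_0,
SKETCH_DEPRECATED_5_1 and the two functions) is hypothetical and only
illustrates the idea; the real header would test the compiler's own
OpenMP version macro (the standard spelling is _OPENMP) rather than a
stand-in.

```cpp
#include <cassert>

// Stand-in for the OpenMP version the implementation advertises.
// 201511L is OpenMP 4.5, 201811L is 5.0, 202011L is 5.1.
#ifndef SKETCH_OPENMP_VERSION
# define SKETCH_OPENMP_VERSION 201511L
#endif

// One macro per OpenMP version in which something became deprecated, so
// each API starts warning only once that version is actually advertised.
#if defined(__GNUC__) && SKETCH_OPENMP_VERSION >= 201811L
# define SKETCH_DEPRECATED_5_0 __attribute__((__deprecated__))
#else
# define SKETCH_DEPRECATED_5_0
#endif

#if defined(__GNUC__) && SKETCH_OPENMP_VERSION >= 202011L
# define SKETCH_DEPRECATED_5_1 __attribute__((__deprecated__))
#else
# define SKETCH_DEPRECATED_5_1
#endif

// Hypothetical APIs deprecated in different versions.
SKETCH_DEPRECATED_5_0 int api_deprecated_in_5_0() { return 50; }
SKETCH_DEPRECATED_5_1 int api_deprecated_in_5_1() { return 51; }
```

With the default stand-in value both macros expand to nothing, so
neither call warns; building with -DSKETCH_OPENMP_VERSION=201811L would
mark only the first function as deprecated.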
> 
> However, GFortran does not support the deprecated attribute, so how should
> it behave? My first thought would be to print out a warning message at
> runtime the first time a deprecated function is called (printing it out
> every time would probably be too annoying), and maybe add an environment
> variable that can be set to disable the warning. A similar runtime warning
> could also be printed if the OMP_NESTED environment variable is set. Again,
> printing these warnings could be suppressed until the value of __OPENMP__ is
> bumped up.

I'm against such runtime diagnostics, that is perhaps good for some
sanitization, but not normal usage.  Perhaps better implement deprecated
attribute in gfortran?

Jakub



Re: [PATCH] libstdc++: Fix arithmetic bug in year_month_weekday conversion [PR96713]

2020-10-28 Thread Patrick Palka via Gcc-patches
On Wed, 28 Oct 2020, Jonathan Wakely wrote:

> On 28/10/20 11:21 -0400, Patrick Palka via Libstdc++ wrote:
> > On Wed, 28 Oct 2020, Patrick Palka wrote:
> > 
> > > The conversion function year_month_weekday::operator sys_days computes
> > > the number of days to offset from the first weekday of the month with:
> > > 
> > >  days{(index()-1)*7}
> > >   ^  type 'unsigned'
> > > 
> > > We'd like the above to yield -7d when index() is 0u, but our 'days'
> > > alias is based on long instead of int, so the conversion from unsigned
> > > to long instead yields a large positive quantity.
> > > 
> > > This patch fixes this by casting the result of index() to int so that
> > > the initializer is sign-extended in the conversion to long.  The added
> > > testcase also verifies that we do the right thing when index() == 5.
> > > 
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > >   PR libstdc++/96713
> > >   * include/std/chrono (year_month_weekday::operator sys_days):
> > >   Cast the result of index() to int so that the initializer for
> > >   days{} is sign-extended when it's converted to the underlying
> > >   type.
> > >   * testsuite/std/time/year_month_weekday/3.cc: New test.
> > > ---
> > >  libstdc++-v3/include/std/chrono   |  3 +-
> > >  .../std/time/year_month_weekday/3.cc  | 66 +++
> > >  2 files changed, 68 insertions(+), 1 deletion(-)
> > >  create mode 100644
> > > libstdc++-v3/testsuite/std/time/year_month_weekday/3.cc
> > > 
> > > diff --git a/libstdc++-v3/include/std/chrono
> > > b/libstdc++-v3/include/std/chrono
> > > index 7539d7184ea..7c35b78fe59 100644
> > > --- a/libstdc++-v3/include/std/chrono
> > > +++ b/libstdc++-v3/include/std/chrono
> > > @@ -2719,7 +2719,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >operator sys_days() const noexcept
> > >{
> > >   auto __d = sys_days{year() / month() / 1};
> > > - return __d + (weekday() - chrono::weekday(__d) + days{(index()-1)*7});
> > > + return __d + (weekday() - chrono::weekday(__d)
> > > +   + days{((int)index()-1)*7});
> > 
> > On second thought, for consistency with the rest of the header, I guess
> > we should use a functional cast instead of a C-style cast here:
> 
> > Or static_cast<int>(index()) :-)
> 
> Any of those options is OK.

Thanks for the review.  I committed the patch just now, albeit with
reference to the unrelated PR96713 instead of PR97613 :( I'll adjust the
ChangeLog entry when it's generated tomorrow.
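
For anyone reading along, the arithmetic problem can be reproduced outside of <chrono>.  This is only a sketch of the integer conversion, not the libstdc++ code; week_offset_unsigned and week_offset_fixed are hypothetical helpers that model days{...} as a plain long:

```cpp
#include <cassert>

// Models days{(index()-1)*7}: the subtraction is done in 'unsigned', so
// index == 0 wraps around before the value is converted to long.
long week_offset_unsigned (unsigned index)
{
  return (index - 1) * 7;
}

// Models the fix: convert to int first, so the subtraction is signed and
// the result is sign-extended when it is converted to long.
long week_offset_fixed (unsigned index)
{
  return (static_cast<int> (index) - 1) * 7;
}
```

Where long is wider than unsigned, week_offset_unsigned (0u) is a large positive value, while week_offset_fixed (0u) is -7.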



Re: [PATCH v4] builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-10-28 Thread Segher Boessenkool
On Wed, Oct 28, 2020 at 12:09:54PM -0300, Raoni Fassina Firmino wrote:
> * builtins.c (expand_builtin_fegetround): New function.
> (expand_builtin_feclear_feraise_except): New function.
> (expand_builtin): Add cases for BUILT_IN_FEGETROUND,
> BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT
> * config/rs6000/rs6000.md (fegetroundsi): New pattern.
> (feclearexceptsi): New Pattern.
> (feraiseexceptsi): New Pattern.
> * optabs.def (fegetround_optab): New optab.
> (feclearexcept_optab): New optab.
> (feraiseexcept_optab): New optab.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New 
> test.
> * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New 
> test.
> * gcc.target/powerpc/builtin-fegetround.c: New test.


> +/* Expand call EXP to the fegetround builtin (from C99 venv.h), returning the
> +   result and setting it in TARGET.  Otherwise return NULL_RTX on failure.  
> */

fenv.h

> +  rtx pat = GEN_FCN (icode) (target);
> +  if (! pat)
> +return NULL_RTX;

No space after "!" (you do this more often; there are many bad examples
to follow, of course).

> +/* Expand call EXP to either feclearexcept or feraiseexcept builtins (from 
> C99
> +fenv.h), returning the result and setting it in TARGET.  Otherwise return
> +NULL_RTX on failure.  */

"f" and "N" should align with the "E".

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6565,6 +6565,87 @@
>[(set_attr "type" "fpload")
> (set_attr "length" "8")
> (set_attr "isa" "*,p8v,p8v")])
> +
> +;; int __builtin_fegetround()

Well, give the arguments (or just their types) as well then?

> +(define_expand "fegetroundsi"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT"
> +{
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +  emit_insn (gen_rs6000_mffsl (tmp_df));
> +
> +  rtx tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +  rtx tmp_di_2 = gen_reg_rtx (DImode);
> +  emit_insn (gen_anddi3 (tmp_di_2, tmp_di, GEN_INT (3)));
> +  rtx tmp_si = gen_reg_rtx (SImode);
> +  tmp_si = gen_lowpart (SImode, tmp_di_2);
> +  emit_move_insn (operands[0], tmp_si);
> +  DONE;
> +})

Okay.

> +;; int feclearexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when excepts is a

(EXCEPTS)

> +;; constant known at compile time and specifies any one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It doesn't handle values out of range, and always returns 0.
> +;; Note that FE_INVALID is unsuported because it maps to more than

(unsupported)

> +;; one bit on FPSCR register.

s/on/of the/

> +;; Because of these restrictions, this only expands on the desired cases.

Maybe say in other cases you just get a call to libc?

> +(define_expand "feclearexceptsi"
> +  [(use (match_operand:SI 1 "const_int_operand" "n"))
> +   (set (match_operand:SI 0 "gpc_reg_operand")
> + (const_int 0))]
> +  "TARGET_HARD_FLOAT"
> +{
> +  unsigned int fe = INTVAL (operands[1]);
> +  if (fe != (fe & 0x1e00))
> +FAIL;
> +
> +  if (fe & 0x0200)  /* FE_INEXACT */
> +emit_insn (gen_rs6000_mtfsb0 (gen_rtx_CONST_INT (SImode, 6)));
> +  if (fe & 0x0400)  /* FE_DIVBYZERO */
> +emit_insn (gen_rs6000_mtfsb0 (gen_rtx_CONST_INT (SImode, 5)));
> +  if (fe & 0x0800)  /* FE_UNDERFLOW */
> +emit_insn (gen_rs6000_mtfsb0 (gen_rtx_CONST_INT (SImode, 4)));
> +  if (fe & 0x1000)  /* FE_OVERFLOW */
> +emit_insn (gen_rs6000_mtfsb0 (gen_rtx_CONST_INT (SImode, 3)));
> +
> +  emit_move_insn (operands[0], const0_rtx);
> +  DONE;
> +})

Okay.
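
To spell out the mask logic in the expander, here is a small standalone sketch.  It only mirrors the constants shown in the quoted pattern (0x0200 -> bit 6, 0x0400 -> bit 5, 0x0800 -> bit 4, 0x1000 -> bit 3); the function name fpscr_bits_for is hypothetical and no RTL is involved:

```cpp
#include <cassert>
#include <vector>

// Returns false where the expander would FAIL (some bit outside 0x1e00
// is set); otherwise fills BITS with the bit numbers that would be
// passed to mtfsb0/mtfsb1, in the order the pattern emits them.
bool fpscr_bits_for (unsigned fe, std::vector<int> &bits)
{
  bits.clear ();
  if (fe != (fe & 0x1e00u))
    return false;

  if (fe & 0x0200u)  /* FE_INEXACT */
    bits.push_back (6);
  if (fe & 0x0400u)  /* FE_DIVBYZERO */
    bits.push_back (5);
  if (fe & 0x0800u)  /* FE_UNDERFLOW */
    bits.push_back (4);
  if (fe & 0x1000u)  /* FE_OVERFLOW */
    bits.push_back (3);
  return true;
}
```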

> +;; int feraiseexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when excepts is a
> +;; constant known at compile time and specifies any one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It doesn't handle values out of range, and always returns 0.
> +;; Note that FE_INVALID is unsupported because it maps to more than
> +;; one bit on FPSCR register.
> +;; Because of these restrictions, this only expands on the desired cases.
> +(define_expand "feraiseexceptsi"
> +  [(use (match_operand:SI 1 "const_int_operand" "n"))
> +   (set (match_operand:SI 0 "gpc_reg_operand")
> + (const_int 0))]
> +  "TARGET_HARD_FLOAT"
> +{
> +  unsigned int fe = INTVAL (operands[1]);
> +  if (fe != (fe & 0x1e00))
> +FAIL;
> +
> +  if (fe & 0x0200)  /* FE_INEXACT */
> +emit_insn (gen_rs6000_mtfsb1 (gen_rtx_CONST_INT (SImode, 6)));
> +  if (fe & 0x0400)  /* FE_DIVBYZERO */
> +emit_insn (gen_rs6000_mtfsb1 (gen_rtx_CONST_INT (SImode, 5)));
> +  if (fe & 0x0800)  /* FE_UNDERFLOW */
> +emit_insn (gen_rs6000_mtfsb1 (gen_rtx_CONST_INT (SImode, 4)));
> +  if (fe & 0x1000)  /* FE_OVERFLOW */
> +emit_insn (gen_rs6000_mtfsb1 (gen_rtx_CONST_INT (SImode, 3)));
> +
> +  emit_move_insn (operands

Re: [patch, shared coarrays, committed] Make header use more consistent

2020-10-28 Thread Nicolas König

Hi Andre,

I'll make another pass after I'm done implementing stat & errmsg to 
remove any reference to native coarrays I find.


Kind regards,

  Nicolas

On 28/10/2020 09:25, Andre Vehreschild wrote:

Hi Nicolas, Thomas,

are you planning to also rename the directory and library name from "nca" to
"shared_caf" or the like?

Regards,
Andre

On Tue, 27 Oct 2020 17:18:21 -0400
David Edelsohn via Fortran  wrote:


The current COARRAYS branch correctly bootstraps on AIX.  Thanks for
correcting the contents and ordering of the header files.

Thanks, David

On Tue, Oct 27, 2020 at 1:31 PM Thomas Koenig  wrote:


I just committed

https://gcc.gnu.org/g:0c261d5b5c931d9e9214d06531bdc7e9e16aeaab

to hopefully fix the header issue on the native_coarray branch.

If anybody wants to give this a spin, please go right ahead.

I've also discussed with Nicolas on how best to proceed.  The
best way forward is probably to merge the branch into trunk at
the end of stage 1 and follow Richard's suggestion to use configure.tgt
to only compile the shared coarray library for systems where it is
known to at least compile. As people test more systems, we can then
add these to configure.tgt.

Best regards

 Thomas

Always include libgfortran.h first; sanitize header dependencies.

libgfortran/ChangeLog:

 * nca/coarraynative.c: Do not include util.h. Remove commented
 include for stdlib.h.
 * nca/collective_subroutine.c: Move #include  after
 other #include statement.
 * nca/hashmap.c: Include shared_memory.h and allocator.h
 * nca/hashmap.h: Remove includes.
 * nca/libcoarraynative.h: Include only those headers which
 are needed.
 * nca/shared_memory.c: Do not include util.h
 * nca/shared_memory.h: Do not include other headers.
 * nca/sync.c: Move include of string.h after other headers.
 * nca/sync.h: Remove include of shared_memory.h and alloc.h.
 * nca/util.h: Do not include stdint.h and stddef.h; include
 limits.h and assert.h.
 * nca/wrapper.c: Remove include for sync.h, util.h and
 collective_subroutine.h. Move include of string.h after other
 headers.




--
Andre Vehreschild * Email: vehre ad gmx dot de



Re: [RS6000] Don't be too clever with dg-do run and dg-do compile

2020-10-28 Thread Segher Boessenkool
On Wed, Oct 28, 2020 at 10:05:14AM -0500, will schmidt wrote:
> On Wed, 2020-10-28 at 21:20 +1030, Alan Modra via Gcc-patches wrote:
> > Otherwise some versions of dejagnu go ahead and run the vsx tests
> > below when they should not.  To best cope with older dejagnu, put
> > "run" before "compile", the idea being that if the second dg-do always
> > wins then that won't cause fails.
> > 
> > The altivec tests also need -save-temps for the scan-assembler test to
> > occur when vms_hw.
> 
> vmx_hw ? :)

For vms_hw it needs a lot more ;-)

> Ok.   These are from some tests I recently committed,   I obviously
> missed this combo (testing with older dejagnu).. I think I've updated
> my dejagnu versions all over the place for other reasons.  Do you
> consider this a non-typical corner case with older dejagnu, or should I
> try to explicitly check for this in the future?

I don't know what version of dejagnu introduced the fix, but probably
newer than our minimum required version.

Thanks,


Segher


Re: [RS6000] Don't be too clever with dg-do run and dg-do compile

2020-10-28 Thread Segher Boessenkool
On Wed, Oct 28, 2020 at 09:20:56PM +1030, Alan Modra wrote:
> Otherwise some versions of dejagnu go ahead and run the vsx tests
> below when they should not.  To best cope with older dejagnu, put
> "run" before "compile", the idea being that if the second dg-do always
> wins then that won't cause fails.

If they are mutually exclusive, does the order still matter?  (Just FMI.)
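
For reference, a mutually exclusive pair would look roughly like the header below.  This is a sketch rather than a copy of the committed tests; it assumes the vsx_hw effective-target keyword is the selector used for the vsx tests, and the body is a placeholder:

```cpp
/* Sketch of a test header: run on VSX hardware, otherwise only compile.
   The "run" directive comes first, so a dejagnu in which the second
   directive always wins falls back to the weaker "compile" action.  */
/* { dg-do run { target { vsx_hw } } } */
/* { dg-do compile { target { ! vsx_hw } } } */

/* Placeholder body; the real tests exercise the VSX built-ins.  */
int main (void)
{
  return 0;
}
```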

> The altivec tests also need -save-temps for the scan-assembler test to
> occur when vms_hw.

> Regression tested powerpc64le-linux and powerpc64-linux.  OK?
> 
>   * gcc.target/powerpc/vsx-load-element-extend-char.c: Put "dg-do run"
>   before "dg-do compile", and make them mutually exclusive.
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Likewise.
>   * gcc.target/powerpc/vsx-load-element-extend-longlong.c: Likewise.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-char.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-int.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-longlong.c: Likewise.
>   * gcc.target/powerpc/vsx-store-element-truncate-short.c: Likewise.
>   * gcc.target/powerpc/altivec-consts.c: Likewise, add -save-temps.
>   * gcc.target/powerpc/le-altivec-consts.c: Likewise.

Okay for trunk.  Thanks!


Segher


Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-28 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Wed, Oct 28, 2020 at 08:39:41AM -0600, Jeff Law wrote:
> 
> On 10/28/20 3:38 AM, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> > On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus via 
> > Gcc-patches wrote:
> >> On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
> >>> On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> >>>> Over the last couple of months quite a few warnings about uninitialized
> >>>> variables were raised while building GCC.  A reason why these warnings
> >>>> show up on S/390 only is due to the aggressive inlining settings here.
> >>>> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> >>>> 1657178f59b) could be fixed or in case of a false positive silenced by
> >>>> initializing the corresponding variable.  Since the latter reoccurs and
> >>>> while bootstrapping such warnings are turned into errors bootstrapping
> >>>> fails on S/390 consistently.  Therefore, for the moment do not turn
> >>>> those warnings into errors.
> >>>>
> >>>> config/ChangeLog:
> >>>>
> >>>> 	* warnings.m4: Do not turn maybe-uninitialized warnings into errors
> >>>> 	on S/390.
> >>>>
> >>>> fixincludes/ChangeLog:
> >>>>
> >>>> 	* configure: Regenerate.
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> 	* configure: Regenerate.
> >>>>
> >>>> libcc1/ChangeLog:
> >>>>
> >>>> 	* configure: Regenerate.
> >>>>
> >>>> libcpp/ChangeLog:
> >>>>
> >>>> 	* configure: Regenerate.
> >>>>
> >>>> libdecnumber/ChangeLog:
> >>>>
> >>>> 	* configure: Regenerate.
> >>> That change looks good to me. Could a global reviewer please comment!
> >> Ping
> > Ping
> 
> I think this would be a huge mistake to install.

The root cause why those false positives show up on S/390 only seems to
be of more aggressive inlining w.r.t. other architectures.  Because of
bigger caches and a rather huge function call overhead we greatly
benefit from those inlining parameters. Thus:

1) Reverting those parameters would have a negative performance impact.

2) Fixing the maybe-uninitialized warnings analysis itself seems not to
   happen in the near future (assuming that it is fixable at all).

3) Silencing the warning by initialising the variable itself also seems
   to be undesired and feels like a fight against windmills ;-)

4) Not lifting maybe-uninitialized warnings to errors on S/390 only.

Option (4) has the least intrusive effect to me.  At least then it is
not necessary to bootstrap with --disable-werror and we would still
treat all other warnings as errors.  All maybe-uninitialized warnings
which are triggered in common code with non-aggressive inlining are
still caught by other architectures.  Therefore, I'm wondering why this
should be a huge mistake?  What would you propose instead?

Cheers,
Stefan


Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-28 Thread Jeff Law via Gcc-patches


On 10/28/20 11:29 AM, Stefan Schulze Frielinghaus wrote:
> On Wed, Oct 28, 2020 at 08:39:41AM -0600, Jeff Law wrote:
>> On 10/28/20 3:38 AM, Stefan Schulze Frielinghaus via Gcc-patches wrote:
>>> On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus via 
>>> Gcc-patches wrote:
>>>> On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
>>>>> On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
>>>>>> Over the last couple of months quite a few warnings about uninitialized
>>>>>> variables were raised while building GCC.  A reason why these warnings
>>>>>> show up on S/390 only is due to the aggressive inlining settings here.
>>>>>> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
>>>>>> 1657178f59b) could be fixed or in case of a false positive silenced by
>>>>>> initializing the corresponding variable.  Since the latter reoccurs and
>>>>>> while bootstrapping such warnings are turned into errors bootstrapping
>>>>>> fails on S/390 consistently.  Therefore, for the moment do not turn
>>>>>> those warnings into errors.
>>>>>>
>>>>>> config/ChangeLog:
>>>>>>
>>>>>> 	* warnings.m4: Do not turn maybe-uninitialized warnings into errors
>>>>>> 	on S/390.
>>>>>>
>>>>>> fixincludes/ChangeLog:
>>>>>>
>>>>>> 	* configure: Regenerate.
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* configure: Regenerate.
>>>>>>
>>>>>> libcc1/ChangeLog:
>>>>>>
>>>>>> 	* configure: Regenerate.
>>>>>>
>>>>>> libcpp/ChangeLog:
>>>>>>
>>>>>> 	* configure: Regenerate.
>>>>>>
>>>>>> libdecnumber/ChangeLog:
>>>>>>
>>>>>> 	* configure: Regenerate.
>>>>> That change looks good to me. Could a global reviewer please comment!
>>>> Ping
>>> Ping
>> I think this would be a huge mistake to install.
> The root cause why those false positives show up on S/390 only seems to
> be of more aggressive inlining w.r.t. other architectures.  Because of
> bigger caches and a rather huge function call overhead we greatly
> benefit from those inlining parameters. Thus:
>
> 1) Reverting those parameters would have a negative performance impact.
>
> 2) Fixing the maybe-uninitialized warnings analysis itself seems not to
>    happen in the near future (assuming that it is fixable at all).
>
> 3) Silencing the warning by initialising the variable itself also seems
>    to be undesired and feels like a fight against windmills ;-)
>
> 4) Not lifting maybe-uninitialized warnings to errors on S/390 only.
>
> Option (4) has the least intrusive effect to me.  At least then it is
> not necessary to bootstrap with --disable-werror and we would still
> treat all other warnings as errors.  All maybe-uninitialized warnings
> which are triggered in common code with non-aggressive inlining are
> still caught by other architectures.  Therefore, I'm wondering why this
> should be a huge mistake?  What would you propose instead?

I'm aware of all that.  What I think it all argues is that y'all need to
address the issues because of how you've changed the tuning on the s390
port.  Simply disabling things like you've suggested is, IMHO, horribly
wrong.


Improve the analysis, dummy initializers, pragmas all seem viable.  But
again, it feels like it's something the s390 maintainers will have to
take the lead on because of how you've retuned the port.
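
To make two of those options concrete, a minimal sketch follows.  The functions are invented for illustration and are not taken from the GCC sources that triggered the warnings; the pragmas are the usual GCC diagnostic pragmas:

```cpp
#include <cassert>

// Option: dummy initializer.  VALUE is given a defined value up front so
// the maybe-uninitialized analysis has nothing to complain about.
int select_with_dummy_init (int kind)
{
  int value = 0;  /* dummy initializer */
  switch (kind)
    {
    case 0: value = 10; break;
    case 1: value = 20; break;
    }
  return (kind == 0 || kind == 1) ? value : -1;
}

// Option: pragma.  VALUE is only read on paths where it was assigned,
// and the warning is silenced locally in case the analysis cannot
// prove that.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
int select_with_pragma (int kind)
{
  int value;
  switch (kind)
    {
    case 0: value = 10; break;
    case 1: value = 20; break;
    }
  return (kind == 0 || kind == 1) ? value : -1;
}
#pragma GCC diagnostic pop
```

Either form keeps the warning an error everywhere else in the build.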


And note that this isn't just an issue with uninitialized warnings, the
changes in inlining heuristics can impact all the middle end warnings.


jeff



Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-28 Thread Richard Earnshaw via Gcc-patches
On 27/10/2020 15:42, Richard Earnshaw via Gcc-patches wrote:
> On 26/10/2020 10:52, Christophe Lyon via Gcc-patches wrote:
>> On Thu, 22 Oct 2020 at 17:22, Richard Earnshaw
>>  wrote:
>>>
>>> On 22/10/2020 09:45, Christophe Lyon via Gcc-patches wrote:
 On Wed, 21 Oct 2020 at 19:36, Richard Earnshaw
  wrote:
>
> On 21/10/2020 17:11, Christophe Lyon via Gcc-patches wrote:
>> On Wed, 21 Oct 2020 at 18:07, Richard Earnshaw
>>  wrote:
>>>
>>> On 21/10/2020 16:49, Christophe Lyon via Gcc-patches wrote:
 On Tue, 20 Oct 2020 at 13:25, Richard Earnshaw
  wrote:
>
> On 20/10/2020 12:22, Richard Earnshaw wrote:
>> On 19/10/2020 17:32, Christophe Lyon via Gcc-patches wrote:
>>> On Mon, 19 Oct 2020 at 16:39, Richard Earnshaw
>>>  wrote:

 On 12/10/2020 08:59, Christophe Lyon via Gcc-patches wrote:
> On Thu, 8 Oct 2020 at 11:58, Richard Earnshaw
>  wrote:
>>
>> On 08/10/2020 10:07, Christophe Lyon via Gcc-patches wrote:
>>> On Tue, 6 Oct 2020 at 18:02, Richard Earnshaw
>>>  wrote:

 On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote:
> When mi_delta is > 255 and -mpure-code is used, we cannot 
> load delta
> from code memory (like we do without -mpure-code).
>
> This patch builds the value of mi_delta into r3 with a series 
> of
> movs/adds/lsls.
>
> We also do some cleanup by not emitting the function address 
> and delta
> via .word directives at the end of the thunk since we don't 
> use them
> with -mpure-code.
>
> No need for new testcases, this bug was already identified by
> eg. pr46287-3.C
>
> 2020-09-29  Christophe Lyon  
>
>   gcc/
>   * config/arm/arm.c (arm_thumb1_mi_thunk): Build 
> mi_delta in r3 and
>   do not emit function address and delta when -mpure-code 
> is used.

>>> Hi Richard,
>>>
>>> Thanks for your comments.
>>>
 There are some optimizations you can make to this code.

 Firstly, for values between 256 and 510 (inclusive), it would 
 be better
 to just expand a mov of 255 followed by an add.
>>> I now see the splitter for the "Pe" constraint which I hadn't 
>>> noticed
>>> before, so I can write something similar indeed.
>>>
>>> However, I'm not quite sure I understand the benefit of the 
>>> split
>>> when -mpure-code is NOT used.
>>> Consider:
>>> int f3_1 (void) { return 510; }
>>> int f3_2 (void) { return 511; }
>>> Compile with -O2 -mcpu=cortex-m0:
>>> f3_1:
>>>         movs    r0, #255
>>>         lsls    r0, r0, #1
>>>         bx      lr
>>> f3_2:
>>>         ldr     r0, .L4
>>>         bx      lr
>>>
>>> The splitter makes the code bigger, does it "compensate" for 
>>> this by
>>> not having to load the constant?
>>> Actually the constant uses 4 more bytes, which should be taken 
>>> into
>>> account when comparing code size,
>>
>> Yes, the size of the literal pool entry needs to be taken into 
>> account.
>>  It might happen that the entry could be shared with another use 
>> of that
>> literal, but in general that's rare.
>>
>>> so f3_1 uses 6 bytes, and f3_2 uses 8, so as you say below three
>>> thumb1 instructions would be equivalent in size compared to 
>>> loading
>>> from the literal pool. Should the 256-510 range be extended?
>>
>> It's a bit borderline at three instructions when literal pools 
>> are not
>> expensive to use, but in thumb1 literal pools tend to be quite 
>> small due
>> to the limited pc offsets we can use.  I think on balance we 
>> probably
>> want to use the instruction sequence unless optimizing for size.
>>
>>>
>>>
 This is also true for
 the literal pools alternative as well, so should be handled 
 before all
 this.
>>> I am not sure what you mean: with -mpure-code, the above s

Re: [PATCH v2] c++: Implement -Wvexing-parse [PR25814]

2020-10-28 Thread Marek Polacek via Gcc-patches
On Wed, Oct 28, 2020 at 01:26:53AM -0400, Jason Merrill via Gcc-patches wrote:
> On 10/24/20 7:40 PM, Marek Polacek wrote:
> > On Fri, Oct 23, 2020 at 09:33:38PM -0400, Jason Merrill via Gcc-patches 
> > wrote:
> > > On 10/23/20 3:01 PM, Marek Polacek wrote:
> > > > This patch implements the -Wvexing-parse warning to warn about the
> > > > sneaky most vexing parse rule in C++: the cases when a declaration
> > > > looks like a variable definition, but the C++ language requires it
> > > > to be interpreted as a function declaration.  This warning is on by
> > > > default (like clang++).  From the docs:
> > > > 
> > > > void f(double a) {
> > > >   int i();// extern int i (void);
> > > >   int n(int(a));  // extern int n (int);
> > > > }
> > > > 
> > > > Another example:
> > > > 
> > > > struct S { S(int); };
> > > > void f(double a) {
> > > >   S x(int(a));   // extern struct S x (int);
> > > >   S y(int());// extern struct S y (int (*) (void));
> > > >   S z(); // extern struct S z (void);
> > > > }
> > > > 
> > > > You can find more on this in [dcl.ambig.res].
> > > > 
> > > > I spent a fair amount of time on fix-it hints so that GCC can recommend
> > > > various ways to resolve such an ambiguity.  Sometimes that's tricky.
> > > > E.g., suggesting default-initialization when the class doesn't have
> > > > a default constructor would not be optimal.  Suggesting {}-init is also
> > > > not trivial because it can use an initializer-list constructor if no
> > > > default constructor is available (which ()-init wouldn't do).  And of
> > > > course, pre-C++11, we shouldn't be recommending {}-init at all.
> > > 
> > > What do you think of, instead of passing the type down into the declarator
> > > parse, adding the paren locations to cp_declarator::function and giving 
> > > the
> > > diagnostic from cp_parser_init_declarator instead?
> 
> Oops, now I see there's already cp_declarator::parenthesized; might as well
> reuse that.  And maybe change it to a range, while we're at it.

I'm afraid I can't reuse it because grokdeclarator uses it to warn about
"unnecessary parentheses in declaration".  So when we have:

  int (x());

declarator->parenthesized points to the outer parens (if any), whereas
declarator->u.function.parens_loc should point to the inner ones.  We also
have declarator->id_loc but I think we should only use it for declarator-ids.

(We should still adjust ->parenthesized to be a range to generate a better
diagnostic; I shall send a patch soon.)

> Hmm, I wonder why we have the parenthesized_p parameter to some of these
> functions, since we can look at the declarator to find that information...

That would be a nice cleanup.

> > Interesting idea.  I suppose it's better, and makes the implementation
> > more localized.  The approach here is that if the .function.parens_loc
> > is UNKNOWN_LOCATION, we've not seen a vexing parse.
> 
> I'd rather always set the parens location, and then analyze the
> cp_declarator in warn_about_ambiguous_parse to see if it's a vexing parse;
> we should have all the information we need.

I could always set .parens_loc, but then I'd still need another flag telling
me whether we had an ambiguity.  Otherwise I don't know how I would tell
apart e.g. "int f()" (warn) v. "int f(void)" (don't warn), etc.
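
For context, these are the kinds of rewrites a user could apply to get a variable instead of a function declaration.  It is only a sketch: struct S matches the example in the patch description apart from the added member, and the function name demo is made up:

```cpp
#include <cassert>

struct S
{
  S (int v) : value (v) {}
  int value;
};

int demo (double a)
{
  // "int i();" would declare a function; braces give a zero-initialized int.
  int i{};
  // "int n(int(a));" would declare a function; the extra parentheses
  // force the initializer to be parsed as an expression.
  int n ((int (a)));
  // "S x(int(a));" would declare a function; braces avoid the ambiguity.
  S x{int (a)};
  // "S y(int());" would declare a function; copy-initialization cannot
  // be read as a declarator.
  S y = S (int ());
  return i + n + x.value + y.value;
}
```

Note the {}-init forms need C++11, which is why the fix-it should not suggest them in older dialects.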

Marek



Re: [PATCH v2] c++: Prevent warnings for value-dependent exprs [PR96742]

2020-10-28 Thread Marek Polacek via Gcc-patches
On Tue, Oct 27, 2020 at 01:36:30PM -0400, Jason Merrill wrote:
> On 10/24/20 6:52 PM, Marek Polacek wrote:
> > Here, in r11-155, I changed the call to uses_template_parms to
> > type_dependent_expression_p_push to avoid a crash in C++98 in
> > value_dependent_expression_p on a non-constant expression.  But that
> > prompted a host of complaints that we now warn for value-dependent
> > expressions in templates.  Those warnings are technically valid, but
> > people still don't want them because they're awkward to avoid.  So let's
> > partially revert my earlier fix and make sure that we don't ICE in
> > value_dependent_expression_p by checking potential_constant_expression
> > first.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/96675
> > PR c++/96742
> > * pt.c (tsubst_copy_and_build): Call uses_template_parms instead of
> > type_dependent_expression_p_push.  Only call uses_template_parms
> > for expressions that are potential_constant_expression.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/96675
> > PR c++/96742
> > * g++.dg/warn/Wdiv-by-zero-3.C: Turn dg-warning into dg-bogus.
> > * g++.dg/warn/Wtautological-compare3.C: New test.
> > * g++.dg/warn/Wtype-limits5.C: New test.
> > * g++.old-deja/g++.pt/crash10.C: Remove dg-warning.
> > ---
> >   gcc/cp/pt.c|  6 --
> >   gcc/testsuite/g++.dg/warn/Wdiv-by-zero-3.C |  6 --
> >   gcc/testsuite/g++.dg/warn/Wtautological-compare3.C | 11 +++
> >   gcc/testsuite/g++.dg/warn/Wtype-limits5.C  | 11 +++
> >   gcc/testsuite/g++.old-deja/g++.pt/crash10.C|  1 -
> >   5 files changed, 30 insertions(+), 5 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/warn/Wtautological-compare3.C
> >   create mode 100644 gcc/testsuite/g++.dg/warn/Wtype-limits5.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index dc664ec3798..8aa0bc2c0d8 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -19618,8 +19618,10 @@ tsubst_copy_and_build (tree t,
> > {
> > /* If T was type-dependent, suppress warnings that depend on the range
> >of the types involved.  */
> > -   bool was_dep = type_dependent_expression_p_push (t);
> > -
> > +   ++processing_template_decl;
> > +   const bool was_dep = (!potential_constant_expression (t)
> > + || uses_template_parms (t));
> 
> We don't want to suppress warnings for a non-constant expression that uses
> no template parms.  So maybe

Fair enough.

> potential_c_e ? value_d : type_d

That works for all the cases I have.

> ?  Or perhaps instantiation_dependent_expression_p.

i_d_e_p would still crash in C++98 :(.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

-- >8 --
Here, in r11-155, I changed the call to uses_template_parms to
type_dependent_expression_p_push to avoid a crash in C++98 in
value_dependent_expression_p on a non-constant expression.  But that
prompted a host of complaints that we now warn for value-dependent
expressions in templates.  Those warnings are technically valid, but
people still don't want them because they're awkward to avoid.  This
patch uses value_dependent_expression_p or type_dependent_expression_p.
But make sure that we don't ICE in value_dependent_expression_p by
checking potential_constant_expression first.

gcc/cp/ChangeLog:

PR c++/96675
PR c++/96742
* pt.c (tsubst_copy_and_build): Call value_dependent_expression_p or
type_dependent_expression_p instead of type_dependent_expression_p_push.
But only call value_dependent_expression_p for expressions that are
potential_constant_expression.

gcc/testsuite/ChangeLog:

PR c++/96675
PR c++/96742
* g++.dg/warn/Wdiv-by-zero-3.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wtautological-compare3.C: New test.
* g++.dg/warn/Wtype-limits5.C: New test.
* g++.old-deja/g++.pt/crash10.C: Remove dg-warning.
---
 gcc/cp/pt.c|  7 +--
 gcc/testsuite/g++.dg/warn/Wdiv-by-zero-3.C |  6 --
 gcc/testsuite/g++.dg/warn/Wtautological-compare3.C | 11 +++
 gcc/testsuite/g++.dg/warn/Wtype-limits5.C  | 11 +++
 gcc/testsuite/g++.old-deja/g++.pt/crash10.C|  1 -
 5 files changed, 31 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wtautological-compare3.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wtype-limits5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3c0f2546489..57db476645e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19618,8 +19618,11 @@ tsubst_copy_and_build (tree t,
   {
/* If T was type-dependent, suppress warnings that depend on the range
   of the types involved.  */
-   bool was_dep = type_dependent_expression_p_push (t);
-
+   ++processing_template_decl;
+   const bo

[PATCH] c++: Deprecate arithmetic convs on different enums [PR97573]

2020-10-28 Thread Marek Polacek via Gcc-patches
I noticed that C++20 P1120R0 deprecated certain arithmetic conversions
as outlined in [depr.arith.conv.enum], but we don't warn about them.  In
particular, "If one operand is of enumeration type and the other operand
is of a different enumeration type or a floating-point type, this
behavior is deprecated."  These will likely become ill-formed in C++23,
so we should warn by default in C++20.  To this effect, this patch adds
two new warnings (like clang++): -Wdeprecated-enum-enum-conversion and
-Wdeprecated-enum-float-conversion.  They are enabled by default in
C++20.  In older dialects, to enable these warnings you can now use
-Wenum-conversion which I made available in C++ too.  Note that unlike
C, in C++ it is not enabled by -Wextra, because that breaks bootstrap.

We already warn about comparisons of two different enumeration types via
-Wenum-compare, the rest is handled in this patch: we're performing the
usual arithmetic conversions in these contexts:
  - an arithmetic operation,
  - a bitwise operation,
  - a comparison,
  - a conditional operator,
  - a compound assign operator.

Using the spaceship operator as enum <=> real_type is ill-formed but we
don't reject it yet.  We should also address [depr.array.comp], but
it's not handled in this patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/97573
* c-opts.c (c_common_post_options): In C++20, turn on
-Wdeprecated-enum-enum-conversion and
-Wdeprecated-enum-float-conversion.
* c.opt (Wdeprecated-enum-enum-conversion,
Wdeprecated-enum-float-conversion): New options.
(Wenum-conversion): Allow for C++ too.

gcc/cp/ChangeLog:

PR c++/97573
* call.c (build_conditional_expr_1): Warn about the deprecated
enum/real type conversion in C++20.  Also warn about a non-enumerated
and enumerated type in ?: when -Wenum-conversion is on.
* typeck.c (do_warn_enum_conversions): New function.
(cp_build_binary_op): Call it.

gcc/ChangeLog:

PR c++/97573
* doc/invoke.texi: Document -Wdeprecated-enum-enum-conversion
and -Wdeprecated-enum-float-conversion.  -Wenum-conversion is
no longer C/ObjC only.

gcc/testsuite/ChangeLog:

PR c++/97573
* g++.dg/cpp0x/linkage2.C: Add dg-warning.
* g++.dg/parse/attr3.C: Likewise.
* g++.dg/cpp2a/enum-conv1.C: New test.
* g++.dg/cpp2a/enum-conv2.C: New test.
* g++.dg/cpp2a/enum-conv3.C: New test.
---
 gcc/c-family/c-opts.c   |  10 ++
 gcc/c-family/c.opt  |  11 ++-
 gcc/cp/call.c   |  35 +--
 gcc/cp/typeck.c | 112 +-
 gcc/doc/invoke.texi |  44 -
 gcc/testsuite/g++.dg/cpp0x/linkage2.C   |   2 +-
 gcc/testsuite/g++.dg/cpp2a/enum-conv1.C | 120 
 gcc/testsuite/g++.dg/cpp2a/enum-conv2.C | 115 +++
 gcc/testsuite/g++.dg/cpp2a/enum-conv3.C | 115 +++
 gcc/testsuite/g++.dg/parse/attr3.C  |   2 +-
 10 files changed, 549 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv3.C

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 38d33849423..120f4489f6c 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -925,6 +925,16 @@ c_common_post_options (const char **pfilename)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set, warn_volatile,
   cxx_dialect >= cxx20 && warn_deprecated);
 
+  /* -Wdeprecated-enum-enum-conversion is enabled by default in C++20.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_enum_enum_conv,
+  cxx_dialect >= cxx20 && warn_deprecated);
+
+  /* -Wdeprecated-enum-float-conversion is enabled by default in C++20.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_enum_float_conv,
+  cxx_dialect >= cxx20 && warn_deprecated);
+
   /* Declone C++ 'structors if -Os.  */
   if (flag_declone_ctor_dtor == -1)
 flag_declone_ctor_dtor = optimize_size;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 1009defbf16..10e53ea67c9 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -518,6 +518,15 @@ C++ ObjC++ Var(warn_deprecated_copy, 2) Warning
 Mark implicitly-declared copy operations as deprecated if the class has a
 user-provided copy operation or destructor.
 
+Wdeprecated-enum-enum-conversion
+C++ ObjC++ Var(warn_deprecated_enum_enum_conv) Warning
+Warn about deprecated arithmetic conversions on operands of enumeration types.
+
+Wdeprecated-enum-float-conversion
+C++ ObjC++ Var(warn_deprecated_enum_float_conv) Warning
+Warn about depre

[PATCH] c++: GCC accepts junk before fold-expression [PR86773]

2020-10-28 Thread Marek Polacek via Gcc-patches
Here we accept a bogus expression before a left fold:

Recall that a fold expression looks like:

 fold-expression:
( cast-expression fold-operator ... )
( ... fold-operator cast-expression )
( cast-expression fold-operator ... fold-operator cast-expression )

but here we have

( cast-expression ... fold-operator cast-expression )

The best fix seems to just return error_mark_node when we know this code
is invalid, and let the subsequent code report that a ) was expected.
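
For reference, the three valid forms from the grammar above can be sketched as follows (illustrative only; the function names are invented and do not come from the patch or its testcase). The rejected form is kept in a comment so the block still compiles:

```cpp
// Unary right fold: ( cast-expression fold-operator ... )
template <typename... Ts>
constexpr auto sum_right(Ts... xs)
{
  return (xs + ...);
}

// Unary left fold: ( ... fold-operator cast-expression )
template <typename... Ts>
constexpr auto sum_left(Ts... xs)
{
  return (... + xs);
}

// Binary left fold:
// ( cast-expression fold-operator ... fold-operator cast-expression )
// Expands to ((init / x1) / x2) / ...
template <typename... Ts>
constexpr double divide_all(double init, Ts... xs)
{
  return (init / ... / xs);
}

// Invalid form discussed in the PR (operator missing before the ellipsis):
//   return (init ... / xs);
```

A caller that wants an initial value in front of the pack needs the operator on both sides of the ellipsis, as in divide_all.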

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/86773
* parser.c (cp_parser_fold_expression): Return error_mark_node
if a left fold is preceded by an expression.

gcc/testsuite/ChangeLog:

PR c++/86773
* g++.dg/cpp1z/fold12.C: New test.
---
 gcc/cp/parser.c |  2 ++
 gcc/testsuite/g++.dg/cpp1z/fold12.C | 13 +
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold12.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cce3d0a679e..1c0eeefe036 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -5138,6 +5138,8 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
   // Left fold.
   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
 {
+  if (expr1)
+   return error_mark_node;
   cp_lexer_consume_token (parser->lexer);
   int op = cp_parser_fold_operator (parser);
   if (op == ERROR_MARK)
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold12.C b/gcc/testsuite/g++.dg/cpp1z/fold12.C
new file mode 100644
index 000..90d74cc5947
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold12.C
@@ -0,0 +1,13 @@
+// PR c++/86773
+// { dg-do compile { target c++17 } }
+
+template <typename... Param>
+auto work(Param && ...param)
+{
+  return ("asda" ... / param); // { dg-error "expected" }
+}
+
+int main()
+{
+  work(1.0, 2.0, 5, 4.0);
+}

base-commit: 2118438f49f0c193abe3fa3def350a8129045746
-- 
2.26.2



[PATCH] c++: Implement CWG 625: Use of auto as template-arg [PR97479]

2020-10-28 Thread Marek Polacek via Gcc-patches
This patch implements CWG 625 which prohibits using auto in a template
argument.  A few tests used this construction.  We could perhaps only
give an error in C++20, but not in C++17 with -fconcepts.
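
A small sketch of what the rule means for user code (illustrative only; Wrapper, make_wrapper and unwrap are hypothetical names, not part of the patch or its tests). The prohibited spelling is left in a comment:

```cpp
#include <type_traits>

template <typename T>
struct Wrapper { T value; };

inline Wrapper<int> make_wrapper() { return Wrapper<int>{42}; }

// Prohibited by CWG 625 ('auto' used as a template-argument):
//   Wrapper<auto> w = make_wrapper();

// Allowed: let 'auto' stand for the whole type instead.
inline int unwrap()
{
  auto w = make_wrapper();
  static_assert(std::is_same<decltype(w), Wrapper<int>>::value,
                "auto deduces the complete type");
  return w.value;
}
```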

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

DR 625
PR c++/97479
* parser.c (cp_parser_type_id_1): Reject using auto as
a template-argument.

gcc/testsuite/ChangeLog:

DR 625
PR c++/97479
* g++.dg/concepts/auto1.C: Add dg-error.
* g++.dg/concepts/auto3.C: Likewise.
* g++.dg/concepts/auto4.C: Likewise.
* g++.dg/cpp0x/auto3.C: Update dg-error.
* g++.dg/cpp0x/auto9.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979-2.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979-3.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979.C: Likewise.
* g++.dg/DRs/dr625.C: New test.
---
 gcc/cp/parser.c |  8 +++-
 gcc/testsuite/g++.dg/DRs/dr625.C| 15 +++
 gcc/testsuite/g++.dg/concepts/auto1.C   |  4 ++--
 gcc/testsuite/g++.dg/concepts/auto3.C   |  6 +++---
 gcc/testsuite/g++.dg/concepts/auto4.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/auto3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/auto9.C  |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-pr84979-2.C | 12 ++--
 gcc/testsuite/g++.dg/cpp2a/concepts-pr84979-3.C | 12 ++--
 gcc/testsuite/g++.dg/cpp2a/concepts-pr84979.C   |  2 +-
 10 files changed, 43 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr625.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cce3d0a679e..4b210db34a5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -22419,7 +22419,10 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags flags,
 
   if (type_specifier_seq.type
   /* The concepts TS allows 'auto' as a type-id.  */
-  && (!flag_concepts || parser->in_type_id_in_expr_p)
+  && (!flag_concepts
+ || parser->in_type_id_in_expr_p
+ /* DR 625 prohibits use of auto as a template-argument.  */
+ || parser->in_template_argument_list_p)
   /* None of the valid uses of 'auto' in C++14 involve the type-id
 nonterminal, but it is valid in a trailing-return-type.  */
   && !(cxx_dialect >= cxx14 && is_trailing_return))
@@ -22446,6 +22449,9 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags flags,
inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
tmpl);
  }
+   else if (parser->in_template_argument_list_p)
+ error_at (loc, "%qT not permitted in template argument",
+   auto_node);
else
  error_at (loc, "invalid use of %qT", auto_node);
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/DRs/dr625.C b/gcc/testsuite/g++.dg/DRs/dr625.C
new file mode 100644
index 000..ce30a9258e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/DRs/dr625.C
@@ -0,0 +1,15 @@
+// DR 625 - Use of auto as a template-argument
+// PR c++/97479
+// { dg-do compile { target c++14 } }
+
+template
+struct A { };
+
+void f(int);
+
+int main()
+{
+  A x = A(); // { dg-error "not permitted|invalid|cannot convert" }
+  A a = A(); // { dg-error "not permitted|invalid|cannot convert" }
+  void (*p)(auto); // { dg-error "parameter" }
+}
diff --git a/gcc/testsuite/g++.dg/concepts/auto1.C 
b/gcc/testsuite/g++.dg/concepts/auto1.C
index e05330610fc..ae93ecc8382 100644
--- a/gcc/testsuite/g++.dg/concepts/auto1.C
+++ b/gcc/testsuite/g++.dg/concepts/auto1.C
@@ -7,8 +7,8 @@ A a;
 A a2;
 A a22;
 
-A b = a;
-A b1 = a2;
+A b = a;   // { dg-error "" }
+A b1 = a2; // { dg-error "" }
 
 template  concept bool C = __is_same_as (T, int);
 
diff --git a/gcc/testsuite/g++.dg/concepts/auto3.C 
b/gcc/testsuite/g++.dg/concepts/auto3.C
index 27a6afa4ed9..f460295a2af 100644
--- a/gcc/testsuite/g++.dg/concepts/auto3.C
+++ b/gcc/testsuite/g++.dg/concepts/auto3.C
@@ -4,10 +4,10 @@
 template  class tuple {};
 
 tuple t;
-tuple y = t;
+tuple y = t; // { dg-error "" }
 
 tuple t2;
-tuple x = t2;
-tuple x2 = t;
+tuple x = t2; // { dg-error "" }
+tuple x2 = t; // { dg-error "" }
 
 tuple y2 = t2;   // { dg-error "" }
diff --git a/gcc/testsuite/g++.dg/concepts/auto4.C 
b/gcc/testsuite/g++.dg/concepts/auto4.C
index 8bf3fa9b1ce..ee712617e6d 100644
--- a/gcc/testsuite/g++.dg/concepts/auto4.C
+++ b/gcc/testsuite/g++.dg/concepts/auto4.C
@@ -4,7 +4,7 @@
 
 template struct A {};
 
-template A foo() { return A{}; }
+template A foo() { return A{}; }  // { dg-error "" }
 
 void bar()
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto3.C 
b/gcc/testsuite/g++.dg/cpp0x/auto3.C
index 709898db39d..2cd0520023d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto3.C
@@ -17,7 +17,7 @@ struct A { };
 
 A A1;
 // CWG issue 625
-A A2 = A1;  

Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-28 Thread Christophe Lyon via Gcc-patches
On Wed, 28 Oct 2020 at 18:44, Richard Earnshaw
 wrote:
>
> On 27/10/2020 15:42, Richard Earnshaw via Gcc-patches wrote:
> > On 26/10/2020 10:52, Christophe Lyon via Gcc-patches wrote:
> >> On Thu, 22 Oct 2020 at 17:22, Richard Earnshaw
> >>  wrote:
> >>>
> >>> On 22/10/2020 09:45, Christophe Lyon via Gcc-patches wrote:
>  On Wed, 21 Oct 2020 at 19:36, Richard Earnshaw
>   wrote:
> >
> > On 21/10/2020 17:11, Christophe Lyon via Gcc-patches wrote:
> >> On Wed, 21 Oct 2020 at 18:07, Richard Earnshaw
> >>  wrote:
> >>>
> >>> On 21/10/2020 16:49, Christophe Lyon via Gcc-patches wrote:
>  On Tue, 20 Oct 2020 at 13:25, Richard Earnshaw
>   wrote:
> >
> > On 20/10/2020 12:22, Richard Earnshaw wrote:
> >> On 19/10/2020 17:32, Christophe Lyon via Gcc-patches wrote:
> >>> On Mon, 19 Oct 2020 at 16:39, Richard Earnshaw
> >>>  wrote:
> 
>  On 12/10/2020 08:59, Christophe Lyon via Gcc-patches wrote:
> > On Thu, 8 Oct 2020 at 11:58, Richard Earnshaw
> >  wrote:
> >>
> >> On 08/10/2020 10:07, Christophe Lyon via Gcc-patches wrote:
> >>> On Tue, 6 Oct 2020 at 18:02, Richard Earnshaw
> >>>  wrote:
> 
>  On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote:
> > When mi_delta is > 255 and -mpure-code is used, we cannot 
> > load delta
> > from code memory (like we do without -mpure-code).
> >
> > This patch builds the value of mi_delta into r3 with a 
> > series of
> > movs/adds/lsls.
> >
> > We also do some cleanup by not emitting the function 
> > address and delta
> > via .word directives at the end of the thunk since we don't 
> > use them
> > with -mpure-code.
> >
> > No need for new testcases, this bug was already identified 
> > by
> > eg. pr46287-3.C
> >
> > 2020-09-29  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/arm.c (arm_thumb1_mi_thunk): Build 
> > mi_delta in r3 and
> >   do not emit function address and delta when 
> > -mpure-code is used.
> 
> >>> Hi Richard,
> >>>
> >>> Thanks for your comments.
> >>>
>  There are some optimizations you can make to this code.
> 
>  Firstly, for values between 256 and 510 (inclusive), it 
>  would be better
>  to just expand a mov of 255 followed by an add.
> >>> I now see the splitter for the "Pe" constraint which I hadn't noticed
> >>> before, so I can write something similar indeed.
> >>>
> >>> However, I'm not quite sure I understand the benefit of the split
> >>> when -mpure-code is NOT used.
> >>> Consider:
> >>> int f3_1 (void) { return 510; }
> >>> int f3_2 (void) { return 511; }
> >>> Compile with -O2 -mcpu=cortex-m0:
> >>> f3_1:
> >>>         movs    r0, #255
> >>>         lsls    r0, r0, #1
> >>>         bx      lr
> >>> f3_2:
> >>>         ldr     r0, .L4
> >>>         bx      lr
> >>>
> >>> The splitter makes the code bigger, does it "compensate" for 
> >>> this by
> >>> not having to load the constant?
> >>> Actually the constant uses 4 more bytes, which should be 
> >>> taken into
> >>> account when comparing code size,
> >>
> >> Yes, the size of the literal pool entry needs to be taken into 
> >> account.
> >>  It might happen that the entry could be shared with another 
> >> use of that
> >> literal, but in general that's rare.
> >>
> >>> so f3_1 uses 6 bytes, and f3_2 uses 8, so as you say below 
> >>> three
> >>> thumb1 instructions would be equivalent in size compared to 
> >>> loading
> >>> from the literal pool. Should the 256-510 range be extended?
> >>
> >> It's a bit borderline at three instructions when literal pools 
> >> are not
> >> expensive to use, but in thumb1 literal pools tend to be quite 
> >> small due
> >> to the limited pc offsets we can use.  I think on balance we 
> >> probably
> >> want to use the instruction sequence un

Re: [PATCH] c++: GCC accepts junk before fold-expression [PR86773]

2020-10-28 Thread Jason Merrill via Gcc-patches

On 10/28/20 2:02 PM, Marek Polacek wrote:

Here we accept a bogus expression before a left fold:

Recall that a fold expression looks like:

  fold-expression:
 ( cast-expression fold-operator ... )
 ( ... fold-operator cast-expression )
 ( cast-expression fold-operator ... fold-operator cast-expression )

but here we have

 ( cast-expression ... fold-operator cast-expression )

The best fix seems to just return error_mark_node when we know this code
is invalid, and let the subsequent code report that a ) was expected.


It might be nice to suggest how better to write a fold-expression, but 
it's not necessary.  The patch is OK.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/86773
* parser.c (cp_parser_fold_expression): Return error_mark_node
if a left fold is preceded by an expression.

gcc/testsuite/ChangeLog:

PR c++/86773
* g++.dg/cpp1z/fold12.C: New test.
---
  gcc/cp/parser.c |  2 ++
  gcc/testsuite/g++.dg/cpp1z/fold12.C | 13 +
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold12.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cce3d0a679e..1c0eeefe036 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -5138,6 +5138,8 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
// Left fold.
if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
  {
+  if (expr1)
+   return error_mark_node;
cp_lexer_consume_token (parser->lexer);
int op = cp_parser_fold_operator (parser);
if (op == ERROR_MARK)
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold12.C 
b/gcc/testsuite/g++.dg/cpp1z/fold12.C
new file mode 100644
index 000..90d74cc5947
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold12.C
@@ -0,0 +1,13 @@
+// PR c++/86773
+// { dg-do compile { target c++17 } }
+
+template <typename... Param>
+auto work(Param && ...param)
+{
+  return ("asda" ... / param); // { dg-error "expected" }
+}
+
+int main()
+{
+  work(1.0, 2.0, 5, 4.0);
+}

base-commit: 2118438f49f0c193abe3fa3def350a8129045746





Re: [PATCH] c++: Implement CWG 625: Use of auto as template-arg [PR97479]

2020-10-28 Thread Jason Merrill via Gcc-patches

On 10/28/20 2:02 PM, Marek Polacek wrote:

This patch implements CWG 625 which prohibits using auto in a template
argument.  A few tests used this construction.  We could perhaps only
give an error in C++20, but not in C++17 with -fconcepts.


We should not give an error with -fconcepts-ts, this was allowed by the 
Concepts TS.


Does just changing !flag_concepts to !flag_concepts_ts work?


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

DR 625
PR c++/97479
* parser.c (cp_parser_type_id_1): Reject using auto as
a template-argument.

gcc/testsuite/ChangeLog:

DR 625
PR c++/97479
* g++.dg/concepts/auto1.C: Add dg-error.
* g++.dg/concepts/auto3.C: Likewise.
* g++.dg/concepts/auto4.C: Likewise.
* g++.dg/cpp0x/auto3.C: Update dg-error.
* g++.dg/cpp0x/auto9.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979-2.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979-3.C: Likewise.
* g++.dg/cpp2a/concepts-pr84979.C: Likewise.
* g++.dg/DRs/dr625.C: New test.
---
  gcc/cp/parser.c |  8 +++-
  gcc/testsuite/g++.dg/DRs/dr625.C| 15 +++
  gcc/testsuite/g++.dg/concepts/auto1.C   |  4 ++--
  gcc/testsuite/g++.dg/concepts/auto3.C   |  6 +++---
  gcc/testsuite/g++.dg/concepts/auto4.C   |  2 +-
  gcc/testsuite/g++.dg/cpp0x/auto3.C  |  2 +-
  gcc/testsuite/g++.dg/cpp0x/auto9.C  |  2 +-
  gcc/testsuite/g++.dg/cpp2a/concepts-pr84979-2.C | 12 ++--
  gcc/testsuite/g++.dg/cpp2a/concepts-pr84979-3.C | 12 ++--
  gcc/testsuite/g++.dg/cpp2a/concepts-pr84979.C   |  2 +-
  10 files changed, 43 insertions(+), 22 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/DRs/dr625.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cce3d0a679e..4b210db34a5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -22419,7 +22419,10 @@ cp_parser_type_id_1 (cp_parser *parser, 
cp_parser_flags flags,
  
if (type_specifier_seq.type

/* The concepts TS allows 'auto' as a type-id.  */
-  && (!flag_concepts || parser->in_type_id_in_expr_p)
+  && (!flag_concepts
+ || parser->in_type_id_in_expr_p
+ /* DR 625 prohibits use of auto as a template-argument.  */
+ || parser->in_template_argument_list_p)
/* None of the valid uses of 'auto' in C++14 involve the type-id
 nonterminal, but it is valid in a trailing-return-type.  */
&& !(cxx_dialect >= cxx14 && is_trailing_return))
@@ -22446,6 +22449,9 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags 
flags,
inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
tmpl);
  }
+   else if (parser->in_template_argument_list_p)
+ error_at (loc, "%qT not permitted in template argument",
+   auto_node);
else
  error_at (loc, "invalid use of %qT", auto_node);
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/DRs/dr625.C b/gcc/testsuite/g++.dg/DRs/dr625.C
new file mode 100644
index 000..ce30a9258e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/DRs/dr625.C
@@ -0,0 +1,15 @@
+// DR 625 - Use of auto as a template-argument
+// PR c++/97479
+// { dg-do compile { target c++14 } }
+
+template
+struct A { };
+
+void f(int);
+
+int main()
+{
+  A x = A(); // { dg-error "not permitted|invalid|cannot 
convert" }
+  A a = A(); // { dg-error "not permitted|invalid|cannot convert" }
+  void (*p)(auto); // { dg-error "parameter" }
+}
diff --git a/gcc/testsuite/g++.dg/concepts/auto1.C 
b/gcc/testsuite/g++.dg/concepts/auto1.C
index e05330610fc..ae93ecc8382 100644
--- a/gcc/testsuite/g++.dg/concepts/auto1.C
+++ b/gcc/testsuite/g++.dg/concepts/auto1.C
@@ -7,8 +7,8 @@ A a;
  A a2;
  A a22;
  
-A b = a;

-A b1 = a2;
+A b = a; // { dg-error "" }
+A b1 = a2;   // { dg-error "" }
  
  template  concept bool C = __is_same_as (T, int);
  
diff --git a/gcc/testsuite/g++.dg/concepts/auto3.C b/gcc/testsuite/g++.dg/concepts/auto3.C

index 27a6afa4ed9..f460295a2af 100644
--- a/gcc/testsuite/g++.dg/concepts/auto3.C
+++ b/gcc/testsuite/g++.dg/concepts/auto3.C
@@ -4,10 +4,10 @@
  template  class tuple {};
  
  tuple t;

-tuple y = t;
+tuple y = t;   // { dg-error "" }
  
  tuple t2;

-tuple x = t2;
-tuple x2 = t;
+tuple x = t2;   // { dg-error "" }
+tuple x2 = t;   // { dg-error "" }
  
  tuple y2 = t2;		// { dg-error "" }

diff --git a/gcc/testsuite/g++.dg/concepts/auto4.C 
b/gcc/testsuite/g++.dg/concepts/auto4.C
index 8bf3fa9b1ce..ee712617e6d 100644
--- a/gcc/testsuite/g++.dg/concepts/auto4.C
+++ b/gcc/testsuite/g++.dg/concepts/auto4.C
@@ -4,7 +4,7 @@
  
  template struct A {};
  
-template A foo() { return A{}; }

+template A foo() { return A{}; }  // { dg-error "" }
  
  void bar()

  {
diff --git a/gcc/testsuite/g++.dg/cpp0x/au

Re: [PATCH] c++: Deprecate arithmetic convs on different enums [PR97573]

2020-10-28 Thread Jason Merrill via Gcc-patches

On 10/28/20 2:01 PM, Marek Polacek wrote:

I noticed that C++20 P1120R0 deprecated certain arithmetic conversions
as outlined in [depr.arith.conv.enum], but we don't warn about them.  In
particular, "If one operand is of enumeration type and the other operand
is of a different enumeration type or a floating-point type, this
behavior is deprecated."  These will likely become ill-formed in C++23,
so we should warn by default in C++20.  To this effect, this patch adds
two new warnings (like clang++): -Wdeprecated-enum-enum-conversion and
-Wdeprecated-enum-float-conversion.  They are enabled by default in
C++20.  In older dialects, to enable these warnings you can now use
-Wenum-conversion which I made available in C++ too.  Note that unlike
C, in C++ it is not enabled by -Wextra, because that breaks bootstrap.

We already warn about comparisons of two different enumeration types via
-Wenum-compare, the rest is handled in this patch: we're performing the
usual arithmetic conversions in these contexts:
   - an arithmetic operation,
   - a bitwise operation,
   - a comparison,
   - a conditional operator,
   - a compound assign operator.

Using the spaceship operator as enum <=> real_type is ill-formed but we
don't reject it yet.


Hmm, oops.  Will you fix that as well?  It should be simple to fix in 
the SPACESHIP_EXPR block that starts just at the end of this patch.



We should also address [depr.array.comp] too, but
it's not handled in this patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.


gcc/c-family/ChangeLog:

PR c++/97573
* c-opts.c (c_common_post_options): In C++20, turn on
-Wdeprecated-enum-enum-conversion and
-Wdeprecated-enum-float-conversion.
* c.opt (Wdeprecated-enum-enum-conversion,
Wdeprecated-enum-float-conversion): New options.
(Wenum-conversion): Allow for C++ too.

gcc/cp/ChangeLog:

PR c++/97573
* call.c (build_conditional_expr_1): Warn about the deprecated
enum/real type conversion in C++20.  Also warn about a non-enumerated
and enumerated type in ?: when -Wenum-conversion is on.
* typeck.c (do_warn_enum_conversions): New function.
(cp_build_binary_op): Call it.

gcc/ChangeLog:

PR c++/97573
* doc/invoke.texi: Document -Wdeprecated-enum-enum-conversion
and -Wdeprecated-enum-float-conversion.  -Wenum-conversion is
no longer C/ObjC only.

gcc/testsuite/ChangeLog:

PR c++/97573
* g++.dg/cpp0x/linkage2.C: Add dg-warning.
* g++.dg/parse/attr3.C: Likewise.
* g++.dg/cpp2a/enum-conv1.C: New test.
* g++.dg/cpp2a/enum-conv2.C: New test.
* g++.dg/cpp2a/enum-conv3.C: New test.
---
  gcc/c-family/c-opts.c   |  10 ++
  gcc/c-family/c.opt  |  11 ++-
  gcc/cp/call.c   |  35 +--
  gcc/cp/typeck.c | 112 +-
  gcc/doc/invoke.texi |  44 -
  gcc/testsuite/g++.dg/cpp0x/linkage2.C   |   2 +-
  gcc/testsuite/g++.dg/cpp2a/enum-conv1.C | 120 
  gcc/testsuite/g++.dg/cpp2a/enum-conv2.C | 115 +++
  gcc/testsuite/g++.dg/cpp2a/enum-conv3.C | 115 +++
  gcc/testsuite/g++.dg/parse/attr3.C  |   2 +-
  10 files changed, 549 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/enum-conv3.C

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 38d33849423..120f4489f6c 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -925,6 +925,16 @@ c_common_post_options (const char **pfilename)
SET_OPTION_IF_UNSET (&global_options, &global_options_set, warn_volatile,
   cxx_dialect >= cxx20 && warn_deprecated);
  
+  /* -Wdeprecated-enum-enum-conversion is enabled by default in C++20.  */

+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_enum_enum_conv,
+  cxx_dialect >= cxx20 && warn_deprecated);
+
+  /* -Wdeprecated-enum-float-conversion is enabled by default in C++20.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_enum_float_conv,
+  cxx_dialect >= cxx20 && warn_deprecated);
+
/* Declone C++ 'structors if -Os.  */
if (flag_declone_ctor_dtor == -1)
  flag_declone_ctor_dtor = optimize_size;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 1009defbf16..10e53ea67c9 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -518,6 +518,15 @@ C++ ObjC++ Var(warn_deprecated_copy, 2) Warning
  Mark implicitly-declared copy operations as deprecated if the class has a
  user-provided copy operation or destructor.
  
+Wdeprecated-enum-enum-conversion

+C++ ObjC++ Var(

Re: [RS6000] float128-type-2.c unsupported

2020-10-28 Thread Segher Boessenkool
On Wed, Oct 28, 2020 at 09:18:35PM +1030, Alan Modra wrote:
> From e7ce33cef478a826a2fe4e110b43b49586ef2438 Mon Sep 17 00:00:00 2001
> From: Alan Modra 
> Date: Wed, 28 Oct 2020 15:57:57 +1030
> Subject: 
> 
> I noticed this test is unsupported on power10 when looking through
> test logs.  There seems no reason why that should be the case, ie.
> likely the target test was meant to be powerpc64*-*-linux*.  And that
> simplifies down further.

The target name does not tell you if you are doing a -m32 or a -m64
build; both powerpc-linux and powerpc64-linux can build both 32-bit and
64-bit just fine (and hopefully identically).  Having target powerpc64*
is basically always wrong.

> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-type-1.c 
> b/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
> index 13152ac7c26..53f9e357535 100644
> --- a/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-type-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { powerpc64*-*-linux* && lp64 } } } */
> +/* { dg-do compile { target { *-*-linux* && lp64 } } } */
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8 -O2 -mno-float128" } */
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-type-2.c 
> b/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
> index 5644281c3d4..02dbad1fa4f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-type-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { powerpc64-*-linux* && lp64 } } } */
> +/* { dg-do compile { target { *-*-linux* && lp64 } } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9 -O2 -mno-float128" } */

Your patch is fine though, modulo what David said.  If there is some
selector you can use (or you can make one) that is much preferred.  But
since this patch is strictly an improvement already, it is okay for
trunk (if the 2nd works on powerpc64le-linux of course ;-) )  Thanks!

(Improving it to test on exactly the right targets would be nice :-) )


Segher


c++: Make OMP UDR DECL_LOCAL_DECL_P earlier

2020-10-28 Thread Nathan Sidwell


I discovered that we were pushing an OMP UDR in a template before
setting DECL_LOCAL_DECL.  This caused the template machinery to give
it some template info.  It doesn't need that, and this changes the
parser to set it earlier.  We have to adjust instantiate_body to not
try and access such a function's non-existent template_info.  The
access checks that we're no longer doing are the same as those we did
on the containing function anyway.  So nothing is lost.
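
For context, the kind of source this affects is a user-defined reduction declared at block scope inside a function template. A minimal sketch, assuming nothing beyond standard OpenMP syntax (sum_all and add_t are made-up names, and this is not one of the testsuite cases):

```cpp
// Sums n elements of type T using a user-defined reduction (UDR) that is
// declared at block scope inside a function template.
template <typename T>
T sum_all(const T *data, int n)
{
  // Block-scope UDR: this declaration is local to sum_all<T>.
  #pragma omp declare reduction (add_t : T : omp_out += omp_in) \
    initializer (omp_priv = T ())

  T total = T ();
  #pragma omp parallel for reduction (add_t : total)
  for (int i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

Built without -fopenmp the pragmas are ignored and the loop runs serially, so the result is the same either way.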

gcc/cp/
* parser.c (cp_parser_omp_declare_reduction): Set
DECL_LOCAL_DECL_P before push_template_decl.
* pt.c (instantiate_body): Nested fns do not have template_info.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/parser.c w/gcc/cp/parser.c
index 03780fab0a7..52637b1d2af 100644
--- i/gcc/cp/parser.c
+++ w/gcc/cp/parser.c
@@ -42700,15 +42700,19 @@ cp_parser_omp_declare_reduction (cp_parser *parser, cp_token *pragma_tok,
   DECL_ATTRIBUTES (fndecl)
 	= tree_cons (get_identifier ("gnu_inline"), NULL_TREE,
 		 DECL_ATTRIBUTES (fndecl));
-  if (processing_template_decl)
-	fndecl = push_template_decl (fndecl);
   bool block_scope = false;
-  tree block = NULL_TREE;
   if (current_function_decl)
 	{
 	  block_scope = true;
 	  DECL_CONTEXT (fndecl) = current_function_decl;
 	  DECL_LOCAL_DECL_P (fndecl) = true;
+	}
+
+  if (processing_template_decl)
+	fndecl = push_template_decl (fndecl);
+
+  if (block_scope)
+	{
 	  if (!processing_template_decl)
 	pushdecl (fndecl);
 	}
@@ -42736,6 +42740,8 @@ cp_parser_omp_declare_reduction (cp_parser *parser, cp_token *pragma_tok,
 	  /* We should never meet a matched duplicate decl.  */
 	  gcc_checking_assert (d == error_mark_node || d == fndecl);
 	}
+
+  tree block = NULL_TREE;
   if (!block_scope)
 	start_preparsed_function (fndecl, NULL_TREE, SF_PRE_PARSED);
   else
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 0d2946fd7c4..fdeaa02c887 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -25595,9 +25595,11 @@ instantiate_body (tree pattern, tree args, tree d, bool nested_p)
   if (nested_p)
 	block = push_stmt_list ();
   else
-	start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED);
+	{
+	  start_preparsed_function (d, NULL_TREE, SF_PRE_PARSED);
 
-  perform_instantiation_time_access_checks (code_pattern, args);
+	  perform_instantiation_time_access_checks (code_pattern, args);
+	}
 
   /* Create substitution entries for the parameters.  */
   register_parameter_specializations (code_pattern, d);
@@ -25636,7 +25638,8 @@ instantiate_body (tree pattern, tree args, tree d, bool nested_p)
 }
 
   /* We're not deferring instantiation any more.  */
-  TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (d)) = 0;
+  if (!nested_p)
+TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (d)) = 0;
 
   if (push_to_top)
 pop_from_top_level ();


Re: [PATCH v2] c++: Prevent warnings for value-dependent exprs [PR96742]

2020-10-28 Thread Jason Merrill via Gcc-patches

On 10/28/20 2:00 PM, Marek Polacek wrote:

On Tue, Oct 27, 2020 at 01:36:30PM -0400, Jason Merrill wrote:

On 10/24/20 6:52 PM, Marek Polacek wrote:

Here, in r11-155, I changed the call to uses_template_parms to
type_dependent_expression_p_push to avoid a crash in C++98 in
value_dependent_expression_p on a non-constant expression.  But that
prompted a host of complaints that we now warn for value-dependent
expressions in templates.  Those warnings are technically valid, but
people still don't want them because they're awkward to avoid.  So let's
partially revert my earlier fix and make sure that we don't ICE in
value_dependent_expression_p by checking potential_constant_expression
first.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/96675
PR c++/96742
* pt.c (tsubst_copy_and_build): Call uses_template_parms instead of
type_dependent_expression_p_push.  Only call uses_template_parms
for expressions that are potential_constant_expression.

gcc/testsuite/ChangeLog:

PR c++/96675
PR c++/96742
* g++.dg/warn/Wdiv-by-zero-3.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wtautological-compare3.C: New test.
* g++.dg/warn/Wtype-limits5.C: New test.
* g++.old-deja/g++.pt/crash10.C: Remove dg-warning.
---
   gcc/cp/pt.c|  6 --
   gcc/testsuite/g++.dg/warn/Wdiv-by-zero-3.C |  6 --
   gcc/testsuite/g++.dg/warn/Wtautological-compare3.C | 11 +++
   gcc/testsuite/g++.dg/warn/Wtype-limits5.C  | 11 +++
   gcc/testsuite/g++.old-deja/g++.pt/crash10.C|  1 -
   5 files changed, 30 insertions(+), 5 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/warn/Wtautological-compare3.C
   create mode 100644 gcc/testsuite/g++.dg/warn/Wtype-limits5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index dc664ec3798..8aa0bc2c0d8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19618,8 +19618,10 @@ tsubst_copy_and_build (tree t,
 {
/* If T was type-dependent, suppress warnings that depend on the range
   of the types involved.  */
-   bool was_dep = type_dependent_expression_p_push (t);
-
+   ++processing_template_decl;
+   const bool was_dep = (!potential_constant_expression (t)
+ || uses_template_parms (t));


We don't want to suppress warnings for a non-constant expression that uses
no template parms.  So maybe


Fair enough.


potential_c_e ? value_d : type_d


That works for all the cases I have.


?  Or perhaps instantiation_dependent_expression_p.


i_d_e_p would still crash in C++98 :(.


Perhaps we should protect the value_d call in i_d_e_p with potential_c_e?


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?


OK.


-- >8 --
Here, in r11-155, I changed the call to uses_template_parms to
type_dependent_expression_p_push to avoid a crash in C++98 in
value_dependent_expression_p on a non-constant expression.  But that
prompted a host of complaints that we now warn for value-dependent
expressions in templates.  Those warnings are technically valid, but
people still don't want them because they're awkward to avoid.  This
patch uses value_dependent_expression_p or type_dependent_expression_p.
But make sure that we don't ICE in value_dependent_expression_p by
checking potential_constant_expression first.
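
For illustration only, here is a sketch of the kind of template code behind those complaints. It is not taken from the new tests, and the names `count_below` and `N` are made up. Assuming -Wtype-limits is enabled, a comparison such as `i < N` is tautological only in the one instantiation where `N` is 0, which the author of the template cannot easily avoid:

```cpp
#include <cstddef>

// Hypothetical example: N is a non-type template parameter, so the
// expression `i < N` is value-dependent inside the template.
template <std::size_t N>
std::size_t
count_below (std::size_t limit)
{
  std::size_t n = 0;
  // For N == 0 this comparison is always false, but only in that one
  // instantiation; the template as written is reasonable.
  for (std::size_t i = 0; i < N; ++i)
    if (i < limit)
      ++n;
  return n;
}
```

Instantiating `count_below<0>` is the case where a warning about the loop condition would be technically valid yet unhelpful.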

gcc/cp/ChangeLog:

PR c++/96675
PR c++/96742
* pt.c (tsubst_copy_and_build): Call value_dependent_expression_p or
type_dependent_expression_p instead of type_dependent_expression_p_push.
But only call value_dependent_expression_p for expressions that are
potential_constant_expression.

gcc/testsuite/ChangeLog:

PR c++/96675
PR c++/96742
* g++.dg/warn/Wdiv-by-zero-3.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wtautological-compare3.C: New test.
* g++.dg/warn/Wtype-limits5.C: New test.
* g++.old-deja/g++.pt/crash10.C: Remove dg-warning.
---
  gcc/cp/pt.c|  7 +--
  gcc/testsuite/g++.dg/warn/Wdiv-by-zero-3.C |  6 --
  gcc/testsuite/g++.dg/warn/Wtautological-compare3.C | 11 +++
  gcc/testsuite/g++.dg/warn/Wtype-limits5.C  | 11 +++
  gcc/testsuite/g++.old-deja/g++.pt/crash10.C|  1 -
  5 files changed, 31 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wtautological-compare3.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Wtype-limits5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3c0f2546489..57db476645e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19618,8 +19618,11 @@ tsubst_copy_and_build (tree t,
{
/* If T was type-dependent, suppress warnings that depend on the range
   of the types involved.  */
-   bool was_dep = type_dependent_expression_p_push (t);
-
+   ++processing_template_decl;
+   co

Re: [PATCH][PR target/97540] Don't extract memory from operand for normal memory constraint.

2020-10-28 Thread Richard Sandiford via Gcc-patches
Hongtao Liu  writes:
> On Tue, Oct 27, 2020 at 7:13 PM Richard Sandiford
>  wrote:
>>
>> Hongtao Liu via Gcc-patches  writes:
>> > Hi:
>> >   For inline asm, there could be an operand like (not (mem:)), it's
>> > not a valid operand for normal memory constraint.
>> >   Bootstrap is ok, regression test is ok for make check
>> > RUNTESTFLAGS="--target_board='unix{-m32,}'"
>> >
>> > gcc/ChangeLog
>> > PR target/97540
>> > * ira.c: (ira_setup_alts): Extract memory from operand only
>> > for special memory constraint.
>> > * recog.c (asm_operand_ok): Ditto.
>> > * lra-constraints.c (process_alt_operands): MEM_P is
>> > required for normal memory constraint.
>> >
>> > gcc/testsuite/ChangeLog
>> > * gcc.target/i386/pr97540.c: New test.
>>
>> Sorry to stick my oar in, but I think we should reconsider the
>> bcst_mem_operand approach.  It seems like these patches (and the
>> previous one) are fighting against the principle that operands
>> cannot be arbitrary expressions.
>>
>> This kind of thing was attempted long ago (even before my time!)
>> for SIGN_EXTEND on MIPS.  It ended up causing more problems than
>> it solved and in the end it had to be taken out.  I'm worried that
>> we might end up going through the same cycle again.
>>
>> Also, this LRA code is extremely performance-sensitive in terms
>> of compile time: it's often at the top or near the top of the profile.
>> So adding calls to new functions like extract_mem_from_operand for
>> a fairly niche case probably isn't a good trade-off.
>>
>> I think we should instead find a nice(?) syntax for generating separate
>> patterns for the two bcst_vector_operand alternatives from a single
>> .md pattern.  That would fit the existing model much more closely.
>>
>
> We have define_subst for RTL template transformations, but it's not
> suitable for this case(too many define_subst templates need to
> be added, and it doesn't show any advantage compared to adding
> separate bcst patterns.). I don't find other workable existing syntax for it.

Yeah, I think it would need to be new syntax.  I was wondering if it
would help if we had something like (strawman suggestion):

  (one_of 0
    [(match_operand:VI_AVX2 1 "vector_operand" "...")
     (vec_duplicate:VI_AVX2
       (match_operand:<...> 1 "..." "..."))])

where all instances of (one_of N ...) for a given N are required
to have the same number of alternatives.

This could be handled in a similar way to define_subst, with the
one_of being expanded before the main generator routines see it.

But maybe it wouldn't help that much.  E.g. for:

(define_insn "*3"
  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
(plusminus:VI_AVX2
  (match_operand:VI_AVX2 1 "bcst_vector_operand" "0,v")
  (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,vmBr")))]

the vec_duplicate version should only really have one alternative.
I guess we could handle that by using a:

  (one_of 0
[(set_attr "enabled" "*,*")
 (set_attr "enabled" "0,*")])

or some variant of that that uses a derived attribute.  But it feels
a bit clunky…

Without that, I guess the only pattern that would benefit directly is:

(define_insn "avx512dq_mul3"
  [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
(mult:VI8_AVX512VL
  (match_operand:VI8_AVX512VL 1 "bcst_vector_operand" "%v")
  (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")))]

> So suppose I should revert my former 2 patches and add separate bcst patterns.

Are there going to be more patterns that need bcst_vector_operand,
or is the current set complete?

I definitely think we should have a better way of handling this in the
.md files, and I'd be happy to hack something up on the generator side
(given that I'm being the awkward one here).  But I guess the answer to
the question above will decide whether it makes things better or not.

FWIW, I think having separate patterns (whether they're produced from
one .md construct or from several) might give better optimisation results.
But I guess there's a risk of combinatorial explosion, and the port has
a lot of patterns as it is.

Thanks,
Richard


c: Allow omitted parameter names for C2x

2020-10-28 Thread Joseph Myers
C2x allows parameter names to be omitted in function definitions, as
in C++; add support for this feature.  As with other features that
only result in previously rejected code being accepted, this feature
is now accepted as an extension for previous standard versions, with a
pedwarn-if-pedantic that is disabled by -Wno-c11-c2x-compat.  The
logic for avoiding unused-parameter warnings for unnamed parameters is
in code shared between C and C++, so no changes are needed there.
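
As a small sketch of where this is convenient (all names below are made up, not part of the patch): a handler that must match a callback signature but has no use for one of its arguments can simply leave that parameter unnamed. The snippet is compiled as C++ here and is written so that it should also be accepted as C2x.

```cpp
// Hypothetical callback signature; the second argument is required by
// the (made-up) interface but this handler does not need it.
typedef int (*handler_fn) (int code, void *context);

static int
double_code (int code, void *)  /* unnamed parameter, so nothing is unused */
{
  return code * 2;
}

static int
dispatch (handler_fn h, int code)
{
  /* Pass a null context; the handler ignores it.  */
  return h (code, 0);
}
```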

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c/
2020-10-28  Joseph Myers  

* c-decl.c (store_parm_decls_newstyle): Use pedwarn_c11 not
error_at for omitted parameter name.

gcc/testsuite/
2020-10-28  Joseph Myers  

* gcc.dg/c11-parm-omit-1.c, gcc.dg/c11-parm-omit-2.c,
gcc.dg/c11-parm-omit-3.c, gcc.dg/c11-parm-omit-4.c,
gcc.dg/c2x-parm-omit-1.c, gcc.dg/c2x-parm-omit-2.c,
gcc.dg/c2x-parm-omit-3.c, gcc.dg/c2x-parm-omit-4.c: New tests.
* gcc.dg/noncompile/pr79758.c: Do not expect error for omitted
parameter name.

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 1673b958555..a5d0b158a26 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9630,7 +9630,9 @@ store_parm_decls_newstyle (tree fndecl, const struct 
c_arg_info *arg_info)
warn_if_shadowing (decl);
}
   else
-   error_at (DECL_SOURCE_LOCATION (decl), "parameter name omitted");
+   pedwarn_c11 (DECL_SOURCE_LOCATION (decl), OPT_Wpedantic,
+"ISO C does not support omitting parameter names in "
+"function definitions before C2X");
 }
 
   /* Record the parameter list in the function declaration.  */
diff --git a/gcc/testsuite/gcc.dg/c11-parm-omit-1.c 
b/gcc/testsuite/gcc.dg/c11-parm-omit-1.c
new file mode 100644
index 000..83d1b508286
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-parm-omit-1.c
@@ -0,0 +1,5 @@
+/* Test omitted parameter names not in C11: -pedantic-errors.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+void f (int) { } /* { dg-error "omitting parameter names" } */
diff --git a/gcc/testsuite/gcc.dg/c11-parm-omit-2.c 
b/gcc/testsuite/gcc.dg/c11-parm-omit-2.c
new file mode 100644
index 000..2efd4505db3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-parm-omit-2.c
@@ -0,0 +1,5 @@
+/* Test omitted parameter names not in C11: -pedantic.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic" } */
+
+void f (int) { } /* { dg-warning "omitting parameter names" } */
diff --git a/gcc/testsuite/gcc.dg/c11-parm-omit-3.c 
b/gcc/testsuite/gcc.dg/c11-parm-omit-3.c
new file mode 100644
index 000..5bf27a03aff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-parm-omit-3.c
@@ -0,0 +1,5 @@
+/* Test omitted parameter names not in C11: -pedantic -Wno-c11-c2x-compat.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic -Wno-c11-c2x-compat" } */
+
+void f (int) { }
diff --git a/gcc/testsuite/gcc.dg/c11-parm-omit-4.c 
b/gcc/testsuite/gcc.dg/c11-parm-omit-4.c
new file mode 100644
index 000..ea4cbfa9928
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-parm-omit-4.c
@@ -0,0 +1,6 @@
+/* Test omitted parameter names not in C11: accepted by default in the
+   absence of -pedantic.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11" } */
+
+void f (int) { }
diff --git a/gcc/testsuite/gcc.dg/c2x-parm-omit-1.c 
b/gcc/testsuite/gcc.dg/c2x-parm-omit-1.c
new file mode 100644
index 000..0dc89bb0270
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-parm-omit-1.c
@@ -0,0 +1,5 @@
+/* Test omitted parameter names in C2x.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+void f (int) { }
diff --git a/gcc/testsuite/gcc.dg/c2x-parm-omit-2.c 
b/gcc/testsuite/gcc.dg/c2x-parm-omit-2.c
new file mode 100644
index 000..7d689332813
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-parm-omit-2.c
@@ -0,0 +1,10 @@
+/* Test omitted parameter names in C2x.  Warning test: there should be
+   no warning for an unnamed parameter being unused.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors -Wall -Wextra" } */
+
+int
+f (int a, int, int c, int d) /* { dg-warning "unused parameter 'd'" } */
+{
+  return a + c;
+}
diff --git a/gcc/testsuite/gcc.dg/c2x-parm-omit-3.c 
b/gcc/testsuite/gcc.dg/c2x-parm-omit-3.c
new file mode 100644
index 000..dac258b0fb8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-parm-omit-3.c
@@ -0,0 +1,23 @@
+/* Test omitted parameter names in C2x.  Execution test.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+extern void abort (void);
+extern void exit (int);
+
+void
+f (int a, int [++a], int b)
+{
+  /* Verify array size expression of unnamed parameter is processed as
+ expected.  */
+  if (a != 2 || b != 3)
+abort ();
+}
+
+int
+main (void)
+{
+  int t[2];
+  f (1, t, 3);
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/c2x-parm-omit-4.c 
b

Re: [PATCH v2] c++: Implement -Wvexing-parse [PR25814]

2020-10-28 Thread Jason Merrill via Gcc-patches

On 10/28/20 1:58 PM, Marek Polacek wrote:

On Wed, Oct 28, 2020 at 01:26:53AM -0400, Jason Merrill via Gcc-patches wrote:

On 10/24/20 7:40 PM, Marek Polacek wrote:

On Fri, Oct 23, 2020 at 09:33:38PM -0400, Jason Merrill via Gcc-patches wrote:

On 10/23/20 3:01 PM, Marek Polacek wrote:

This patch implements the -Wvexing-parse warning to warn about the
sneaky most vexing parse rule in C++: the cases when a declaration
looks like a variable definition, but the C++ language requires it
to be interpreted as a function declaration.  This warning is on by
default (like clang++).  From the docs:

 void f(double a) {
   int i();// extern int i (void);
   int n(int(a));  // extern int n (int);
 }

 Another example:

 struct S { S(int); };
 void f(double a) {
   S x(int(a));   // extern struct S x (int);
   S y(int());// extern struct S y (int (*) (void));
   S z(); // extern struct S z (void);
 }

You can find more on this in [dcl.ambig.res].

I spent a fair amount of time on fix-it hints so that GCC can recommend
various ways to resolve such an ambiguity.  Sometimes that's tricky.
E.g., suggesting default-initialization when the class doesn't have
a default constructor would not be optimal.  Suggesting {}-init is also
not trivial because it can use an initializer-list constructor if no
default constructor is available (which ()-init wouldn't do).  And of
course, pre-C++11, we shouldn't be recommending {}-init at all.
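
To make the options concrete, here is a hedged sketch of the usual ways to write such declarations so that they are parsed as variable definitions. The struct and function names are hypothetical, and this is not the output of the fix-it hints in the patch. Unlike the `S` in the example above, this one has a default constructor and no initializer-list constructor, so every form is unambiguous:

```cpp
// Hypothetical type for illustration; not from the patch.
struct S
{
  int v;
  S () : v (0) {}
  S (int x) : v (x) {}
};

int
vexing_parse_demo (double a)
{
  // S x(int(a));  // vexing parse: declares a function x taking an int
  // S z();        // vexing parse: declares a function z returning S

  S x1 ((int (a)));    // extra parentheses force an expression
  S x2 = S (int (a));  // copy-initialization form
  S x3 {int (a)};      // C++11 brace-init; no initializer-list ctor here
  S z1;                // default-initialization; S has a default ctor
  S z2 {};             // C++11 value-initialization

  return x1.v + x2.v + x3.v + z1.v + z2.v;
}
```

Which of these is appropriate depends on the class and the language version, which is why a single fix-it hint does not fit every case.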


What do you think of, instead of passing the type down into the declarator
parse, adding the paren locations to cp_declarator::function and giving the
diagnostic from cp_parser_init_declarator instead?


Oops, now I see there's already cp_declarator::parenthesized; might as well
reuse that.  And maybe change it to a range, while we're at it.


I'm afraid I can't reuse it because grokdeclarator uses it to warn about
"unnecessary parentheses in declaration".  So when we have:

   int (x());

declarator->parenthesized points to the outer parens (if any), whereas
declarator->u.function.parens_loc should point to the inner ones.  We also
have declarator->id_loc but I think we should only use it for declarator-ids.


Makes sense.


(We should still adjust ->parenthesized to be a range to generate a better
diagnostic; I shall send a patch soon.)


Hmm, I wonder why we have the parenthesized_p parameter to some of these
functions, since we can look at the declarator to find that information...


That would be a nice cleanup.


Interesting idea.  I suppose it's better, and makes the implementation
more localized.  The approach here is that if the .function.parens_loc
is UNKNOWN_LOCATION, we've not seen a vexing parse.


I'd rather always set the parens location, and then analyze the
cp_declarator in warn_about_ambiguous_parse to see if it's a vexing parse;
we should have all the information we need.


I could always set .parens_loc, but then I'd still need another flag telling
me whether we had an ambiguity.  Otherwise I don't know how I would tell
apart e.g. "int f()" (warn) v. "int f(void)" (don't warn), etc.


Ah, I was thinking that we still had the parameter declarators, but now 
I see that cp_parser_parameter_declaration_list groks them and returns a 
TREE_LIST.  We could set a TREE_LANG_FLAG on each TREE_LIST if its 
parameter declarator was parenthesized?


Jason



[committed][PATCH]AArch64 Skip test for pr97535 on ILP32 since it can't express the range.

2020-10-28 Thread Tamar Christina via Gcc-patches
Hi All,

I am excluding the test from ILP32 since the goal of the test is to test
truncations of large numbers above INT_MAX.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR target/97535
* gcc.target/aarch64/pr97535.c: Exclude ILP32.

-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97535.c b/gcc/testsuite/gcc.target/aarch64/pr97535.c
index 6f83b3f571413577180682c18400d913bb13124d..7d4db485f1feaf1d4b379a4ba2daa2715cb8dc22 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr97535.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr97535.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! ilp32 } } } */
 
 #include 
 


