[PATCH] x86_64: Add insn patterns for V1TI mode logic operations.

2021-10-22 Thread Roger Sayle

On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE
register (where regular TI mode uses a pair of 64-bit general purpose
scalar registers).  This patch improves the implementation of AND, IOR,
XOR and NOT on these values.

The benefit is demonstrated by the following simple test program:

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));
v1ti and(v1ti x, v1ti y) { return x & y; }
v1ti ior(v1ti x, v1ti y) { return x | y; }
v1ti xor(v1ti x, v1ti y) { return x ^ y; }
v1ti not(v1ti x) { return ~x; }

For which GCC currently generates the rather large:

and:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        andq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        andq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq  %xmm5, %xmm0
        ret

ior:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        orq     %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        orq     %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq  %xmm5, %xmm0
        ret

xor:    movdqa  %xmm0, %xmm2
        movq    %xmm1, %rdx
        movq    %xmm0, %rax
        xorq    %rdx, %rax
        movhlps %xmm2, %xmm3
        movhlps %xmm1, %xmm4
        movq    %rax, %xmm0
        movq    %xmm4, %rdx
        movq    %xmm3, %rax
        xorq    %rdx, %rax
        movq    %rax, %xmm5
        punpcklqdq  %xmm5, %xmm0
        ret

not:    movdqa  %xmm0, %xmm1
        movq    %xmm0, %rax
        notq    %rax
        movhlps %xmm1, %xmm2
        movq    %rax, %xmm0
        movq    %xmm2, %rax
        notq    %rax
        movq    %rax, %xmm3
        punpcklqdq  %xmm3, %xmm0
        ret


With this patch we now generate the much more efficient:

and:    pand    %xmm1, %xmm0
        ret

ior:    por     %xmm1, %xmm0
        ret

xor:    pxor    %xmm1, %xmm0
        ret

not:    pcmpeqd %xmm1, %xmm1
        pxor    %xmm1, %xmm0
        ret
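
As a side note, the semantics the new patterns have to preserve can be
checked at run time with a small sketch like the one below.  It is not
part of the patch or its testsuite; the helper names (make_v1ti,
check_v1ti_logic) are made up for illustration, and it assumes GCC's
vector extensions and __int128 on a 64-bit target.  It compares each
V1TI operation against the same operation on the scalar element, and
checks that NOT equals XOR with all-ones, which is what the pcmpeqd/pxor
sequence relies on.

```c
#include <assert.h>

typedef unsigned __int128 u128;
typedef u128 v1ti __attribute__ ((__vector_size__ (16)));

static v1ti v1ti_and (v1ti x, v1ti y) { return x & y; }
static v1ti v1ti_ior (v1ti x, v1ti y) { return x | y; }
static v1ti v1ti_xor (v1ti x, v1ti y) { return x ^ y; }
static v1ti v1ti_not (v1ti x) { return ~x; }

/* Hypothetical helper: build a V1TI value from two 64-bit halves.  */
static v1ti
make_v1ti (unsigned long long hi, unsigned long long lo)
{
  v1ti v = { ((u128) hi << 64) | lo };
  return v;
}

/* Return the number of mismatches between the vector operations and
   the corresponding scalar operations; 0 means everything agrees.  */
int
check_v1ti_logic (void)
{
  v1ti x = make_v1ti (0x0123456789abcdefULL, 0xfedcba9876543210ULL);
  v1ti y = make_v1ti (0xf0f0f0f00f0f0f0fULL, 0x00ff00ffff00ff00ULL);
  v1ti all_ones = make_v1ti (~0ULL, ~0ULL);

  v1ti a = v1ti_and (x, y);
  v1ti o = v1ti_ior (x, y);
  v1ti e = v1ti_xor (x, y);
  v1ti n = v1ti_not (x);
  v1ti n2 = v1ti_xor (x, all_ones);

  int mismatches = 0;
  mismatches += a[0] != (x[0] & y[0]);
  mismatches += o[0] != (x[0] | y[0]);
  mismatches += e[0] != (x[0] ^ y[0]);
  mismatches += n[0] != ~x[0];
  /* NOT is the same as XOR with an all-ones constant.  */
  mismatches += n[0] != n2[0];
  return mismatches;
}
```

Calling check_v1ti_logic () from a test driver should return 0 with or
without the patch; only the generated code is expected to differ.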


For my first few attempts at this patch I tried adding V1TI to the
existing VI and VI12_AVX_512F mode iterators, but these then have
dependencies on other iterators (and attributes), and so on until
everything ties itself into a knot, as V1TI mode isn't really a
first-class vector mode on x86_64.  Hence I ultimately opted to use
simple stand-alone patterns (as used by the existing TF mode support).

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-10-22  Roger Sayle  

gcc/ChangeLog
* config/i386/sse.md (v1ti3): New define_insn to
implement V1TImode AND, IOR and XOR on TARGET_SSE2 (and above).
(one_cmplv1ti2): New define expand.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-logic.c: New test case.
* gcc.target/i386/sse2-v1ti-logic-2.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fbf056b..f37c5c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16268,6 +16268,31 @@
  ]
  (const_string "")))])
 
+(define_insn "<code>v1ti3"
+  [(set (match_operand:V1TI 0 "register_operand" "=x,x,v")
+   (any_logic:V1TI
+ (match_operand:V1TI 1 "register_operand" "%0,x,v")
+ (match_operand:V1TI 2 "vector_operand" "xBm,xm,vm")))]
+  "TARGET_SSE2"
+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,avx")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "type" "sselog")
+   (set_attr "mode" "TI")])
+
+(define_expand "one_cmplv1ti2"
+  [(set (match_operand:V1TI 0 "register_operand")
+   (xor:V1TI (match_operand:V1TI 1 "register_operand")
+ (match_dup 2)))]
+  "TARGET_SSE2"
+{
+  operands[2] = force_reg (V1TImode, CONSTM1_RTX (V1TImode));
+})
+
 (define_mode_iterator AVX512ZEXTMASK
   [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI])
 
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti and(v1ti x, v1ti y)
{
  return x & y;
}

v1ti ior(v1ti x, v1ti y)
{
  return x | y;
}

v1ti xor(v1ti x, v1ti y)
{
  return x ^ y;
}

v1ti not(v1ti x)
{
  return ~x;
}

/* { dg-final { scan-assembler "pand" } } */
/* { dg-final { scan-assembler "por" } } */
/* { dg-final { scan-assembler-times "pxor" 2 } } */
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

v1ti x;
v1ti y;
v1ti z;

void and2()
{
  x &= y;
}

void and3()
{
  x 

José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Tobias Burnus

Hi José, hi all,

especially since my patch which moved the descriptor conversion from
libgfortran to gfortran is in, I wonder whether there are still patches
to be applied and useful testcases; I assume there are more issues in
Bugzilla, especially as I filed some myself (often related to
polymorphism or assumed rank). While I (and also Sandra) try to resolve
some bugs and look at testcases:

it would be helpful if others – in particular José – could check
whether: (a) PRs can be now closed, (b) testcases exist which still
should be added, (c) patches exist which still are applicable (even if
they need to be revised). (Partial/full list below.)

I hope that we can really clean up this backlog – and possibly fix also
some of the remaining bugs before GCC 12 is released!

And kudos to José for the bug reports, testcases and patches – and sorry
for the slow reviews. I hope we can resolve the pending issues and be
quicker in future.

Tobias

PS: Old and probably current but incomplete pending patch list:

On 21.06.21 17:21, José Rui Faustino de Sousa wrote:

On 21/06/21 12:37, Tobias Burnus wrote:

Thus: Do you have a list of patches pending review?


https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html

I am not 100% sure this is all of them but it should be most.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-22 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 20 Oct 2021 at 18:21, Richard Biener  wrote:
>
> On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Tue, 19 Oct 2021 at 16:55, Richard Biener  wrote:
> > >
> > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds 
> > > > > > > > > > > > two patterns
> > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > >
> > > > > > > > > > > > This works to remove call to erfc for the following 
> > > > > > > > > > > > test:
> > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > >
> > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > >
> > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > {
> > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > >
> > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > >   return t1;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not 
> > > > > > > > > > > > transform 1 -
> > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) 
> > > > > > > > > > > > gets applied,
> > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. 
> > > > > > > > > > > > So we end up
> > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't 
> > > > > > > > > > > > ideal.
> > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > >
> > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even 
> > > > > > > > > > when there's
> > > > > > > > > > no erf(x) in the source ?
> > > > > > > > >
> > > > > > > > > I do think it's reasonable to expect erfc to be available 
> > > > > > > > > when erf
> > > > > > > > > is and vice versa but note both are C99 specified functions 
> > > > > > > > > (either
> > > > > > > > > requires -lm).
> > > > > > > > OK, thanks. Would it be OK to commit the patch after 
> > > > > > > > bootstrap+test ?
> > > > > > >
> > > > > > > Yes, but I'm confused because you say the patch doesn't work for 
> > > > > > > you?
> > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > However when there's only erfc in the source, it canonicalizes 
> > > > > > erfc(x)
> > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > erfc(x)
> > > > > > with -O3 -funsafe-math-optimizations.
> > > > > >
> > > > > > For,
> > > > > > t1 = __builtin_erfc(x),
> > > > > >
> > > > > > .optimized dump shows:
> > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > >
> > > > > > and for,
> > > > > > double t1 = x + __builtin_erfc(x);
> > > > > >
> > > > > > .optimized dump shows:
> > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > >   t1_4 = _7 - _3;
> > > > > >
> > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > I think the reason uncanonicalization fails is because the pattern
> > > > > > > 1 - erf(x) to erfc(x)
> > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > > up with 1 - erf(x) in code-gen.
> > > > > >
> > > > > > From gimple-match.c, it hits the simplification:
> > > > > >
> 

Re: [PATCH] Handle jobserver file descriptors in btest.

2021-10-22 Thread Martin Liška

On 10/21/21 20:15, Ian Lance Taylor wrote:

On Thu, Oct 21, 2021 at 12:48 AM Martin Liška  wrote:


The patch is about sensitive handling of file descriptors opened
by make's jobserver.


Thanks.  I think a better approach would be, at the start of main,
fstat the descriptors up to 10 and record the ones for which fstat
succeeds.  Then at the end of main only check the descriptors for
which fstat failed earlier.


Sure, makes sense.



I can work on that at some point if you don't want to tackle it.


I've just done that in the attached patch.

Is it fine?
Thanks,
Martin



Ian

From ad52a33e10f76119867dbf0b6d5875378aa399ab Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 22 Oct 2021 10:12:56 +0200
Subject: [PATCH] Handle jobserver file descriptors in btest.

	PR testsuite/102742

libbacktrace/ChangeLog:

	* btest.c (MIN_DESCRIPTOR): New.
	(MAX_DESCRIPTOR): Likewise.
	(check_available_files): Likewise.
	(check_open_files): Check only file descriptors that
	were not available at the entry.
	(main): Call check_available_files.
---
 libbacktrace/btest.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index 9f9c03babf3..7ef6d320497 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -38,6 +38,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 #include 
 #include 
+#include <sys/stat.h>
 
 #include "filenames.h"
 
@@ -458,16 +459,29 @@ test5 (void)
   return failures;
 }
 
+#define MIN_DESCRIPTOR 3
+#define MAX_DESCRIPTOR 10
+
+static int fstat_status[MAX_DESCRIPTOR];
+
+/* Check files that are available.  */
+
+static void
+check_available_files (void)
+{
+  struct stat s;
+  for (unsigned i = MIN_DESCRIPTOR; i < MAX_DESCRIPTOR; i++)
+fstat_status[i] = fstat (i, &s);
+}
+
 /* Check that are no files left open.  */
 
 static void
 check_open_files (void)
 {
-  int i;
-
-  for (i = 3; i < 10; i++)
+  for (unsigned i = MIN_DESCRIPTOR; i < MAX_DESCRIPTOR; i++)
 {
-  if (close (i) == 0)
+  if (fstat_status[i] != 0 && close (i) == 0)
 	{
 	  fprintf (stderr,
 		   "ERROR: descriptor %d still open after tests complete\n",
@@ -482,6 +496,8 @@ check_open_files (void)
 int
 main (int argc ATTRIBUTE_UNUSED, char **argv)
 {
+  check_available_files ();
+
   state = backtrace_create_state (argv[0], BACKTRACE_SUPPORTS_THREADS,
   error_callback_create, NULL);
 
-- 
2.33.1
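
For readers who want to try the idea outside btest.c, here is a minimal
standalone sketch of the approach Ian suggested: record which descriptors
are already open at startup, then report only those that became open
afterwards.  It is an illustration under assumptions, not the patch
itself: the names (record_open_fds, count_leaked_fds, fd_self_test) are
hypothetical, and unlike the patch it probes with fstat at the end
instead of close, so that inherited descriptors are never touched.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

enum { MIN_FD = 3, MAX_FD = 10 };

/* Nonzero for descriptors that were already open at startup, for
   example ones inherited from make's jobserver.  */
static int was_open[MAX_FD];

/* Record which descriptors in [MIN_FD, MAX_FD) are open right now.  */
void
record_open_fds (void)
{
  struct stat s;
  for (int fd = MIN_FD; fd < MAX_FD; fd++)
    was_open[fd] = (fstat (fd, &s) == 0);
}

/* Count descriptors that are open now but were not open when
   record_open_fds ran.  */
int
count_leaked_fds (void)
{
  struct stat s;
  int leaked = 0;
  for (int fd = MIN_FD; fd < MAX_FD; fd++)
    if (!was_open[fd] && fstat (fd, &s) == 0)
      leaked++;
  return leaked;
}

/* Small self-check: open a descriptor on purpose and make sure it is
   reported, then close it again.  Returns 0 on success.  */
int
fd_self_test (void)
{
  record_open_fds ();
  if (count_leaked_fds () != 0)
    return 1;

  int fd = open ("/dev/null", O_RDONLY);
  if (fd < 0)
    return 2;
  /* The new descriptor only counts if it falls in the checked range.  */
  int expected = (fd >= MIN_FD && fd < MAX_FD) ? 1 : 0;
  int seen = count_leaked_fds ();
  close (fd);
  if (seen != expected)
    return 3;
  return count_leaked_fds () != 0 ? 4 : 0;
}
```

Descriptors that were open before record_open_fds ran are ignored, which
is the property needed when the test runs under a parallel make.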



Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

My disappearance is partly responsible for the backlog. I told José that I
would start working on it some months since but just have not had the time.
I can do some of the reviews but still do not have much time to spare.
Perhaps you could divide them up between us.

Andrew Benson has been working on some standards issues associated with a
patch of mine that sorts out finalization for intrinsic assignment -
PR64290. The main issue was that of finalization of finalizable
types/classes that themselves have finalizable array components. I believe
that the patch has it right, following a comparison between the
(differing!) behaviour of other brands. We have also found a case where
gfortran, with the patch applied, that still does not finalize when it
should. I will work up a fix for this and will coordinate with Andrew to
produce testcases as necessary, well before 15th November.

Best regards

Paul


On Fri, 22 Oct 2021 at 08:42, Tobias Burnus  wrote:

> Hi José, hi all,
>
> especially since my patch which moved the descriptor conversion from
> libgfortran to gfortran is in, I wonder whether there are still patches
> to be applied and useful testcases; I assume there are more issues in
> Bugzilla, especially as I filed some myself (often related to
> polymorphism or assumed rank). While I (and also Sandra) try to resolve
> some bugs and look at testcases:
>
> it would be helpful if others – in particular José – could check
> whether: (a) PRs can be now closed, (b) testcases exist which still
> should be added, (c) patches exist which still are applicable (even if
> they need to be revised). (Partial/full list below.)
>
> I hope that we can really clean up this backlog – and possibly fix also
> some of the remaining bugs before GCC 12 is released!
>
> And kudos to José for the bug reports, testcases and patches – and sorry
> for slow reviews. I hope we resolve the pending issues and be quicker in
> future.
>
> Tobias
>
> PS: Old and probably current but incomplete pending patch list:
>
> On 21.06.21 17:21, José Rui Faustino de Sousa wrote:
> > On 21/06/21 12:37, Tobias Burnus wrote:
> >> Thus: Do you have a list of patches pending review?
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html
> >
> > https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html
> >
> > I am not 100% sure this is all of them but it should be most.


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-10-22 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 20 Oct 2021 at 15:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi,
> >> > The attached patch emits a more verbose diagnostic for target attribute 
> >> > that
> >> > is an architecture extension needing a leading '+'.
> >> >
> >> > For the following test,
> >> > void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >
> >> > With patch, the compiler now emits:
> >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >   | ^~~~
> >> >
> >> > instead of:
> >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >   | ^~~~
> >>
> >> Nice :-)
> >>
> >> > (This isn't specific to sve though).
> >> > OK to commit after bootstrap+test ?
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> > index a9a1800af53..975f7faf968 100644
> >> > --- a/gcc/config/aarch64/aarch64.c
> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
> >> >num_attrs++;
> >> >if (!aarch64_process_one_target_attr (token))
> >> >   {
> >> > -   error ("pragma or attribute %<target(\"%s\")%> is not valid", 
> >> > token);
> >> > +   /* Check if token is possibly an arch extension without
> >> > +  leading '+'.  */
> >> > +   char *str = (char *) xmalloc (strlen (token) + 2);
> >> > +   str[0] = '+';
> >> > +   strcpy(str + 1, token);
> >>
> >> I think std::string would be better here, e.g.:
> >>
> >>   auto with_plus = std::string ("+") + token;
> >>
> >> > +   if (aarch64_handle_attr_isa_flags (str))
> >> > + error("arch extension %<%s%> should be prepended with %<+%>", 
> >> > token);
> >>
> >> Nit: should be a space before the “(”.
> >>
> >> In principle, a fixit hint would have been nice here, but I don't think
> >> we have enough information to provide one.  (Just saying for the record.)
> > Thanks for the suggestions.
> > Does the attached patch look OK ?
>
> Looks good apart from a couple of formatting nits.
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Richard
> >>
> >> > +   else
> >> > + error ("pragma or attribute %<target(\"%s\")%> is not valid", 
> >> > token);
> >> > +   free (str);
> >> > return false;
> >> >   }
> >> >
> >
> > [aarch64] PR102376 - Emit better diagnostics for arch extension in target 
> > attribute.
> >
> > gcc/ChangeLog:
> >   PR target/102376
> >   * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): Change 
> > str's
> >   type to const char *.
> >   (aarch64_process_target_attr): Check if token is possibly an arch 
> > extension
> >   without leading '+' and emit diagnostic accordingly.
> >
> > gcc/testsuite/ChangeLog:
> >   PR target/102376
> >   * gcc.target/aarch64/pr102376.c: New test.
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index a9a1800af53..b72079bc466 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
> > modified.  */
> >
> >  static bool
> > -aarch64_handle_attr_isa_flags (char *str)
> > +aarch64_handle_attr_isa_flags (const char *str)
> >  {
> >enum aarch64_parse_opt_result parse_res;
> >uint64_t isa_flags = aarch64_isa_flags;
> > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
> >num_attrs++;
> >if (!aarch64_process_one_target_attr (token))
> >   {
> > -   error ("pragma or attribute %<target(\"%s\")%> is not valid", 
> > token);
> > +   /* Check if token is possibly an arch extension without
> > +  leading '+'.  */
> > +   auto with_plus = std::string("+") + token;
>
> Should be a space before “(”.
>
> > +   if (aarch64_handle_attr_isa_flags (with_plus.c_str ()))
> > + error ("arch extension %<%s%> should be prepended with %<+%>", 
> > token);
>
> Long line, should be:
>
> error ("arch extension %<%s%> should be prepended with %<+%>",
>token);
>
> OK with those changes, thanks.
Thanks, the patch regressed some target attr tests because it emitted
diagnostics twice from
aarch64_handle_attr_isa_flags.
So for eg, spellcheck_1.c:
__attribute__((target ("arch=armv8-a-typo"))) void foo () {}

results in:
spellcheck_1.c:5:1: error: invalid name ("armv8-a-typo") in
‘target("arch=")’ pragma or attribute
5 | {
  | ^
spellcheck_1.c:5:1: note: valid arguments are: armv8-a armv8.1-a
armv8.2-a armv8.3-a armv8.4-a armv8.5-a armv8.6-a armv8.7-a armv8-r
armv9-a
spellcheck_1.c:5:1: error: invalid feature modifier arch=armv8-a-typo
of value ("+arch=armv8-a-typo") in ‘target()’ pragma or attribute

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-22 Thread Richard Biener via Gcc-patches
On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:

> On Wed, 20 Oct 2021 at 18:21, Richard Biener  wrote:
> >
> > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 19 Oct 2021 at 16:55, Richard Biener  wrote:
> > > >
> > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that 
> > > > > > > > > > > > > adds two patterns
> > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > >
> > > > > > > > > > > > > This works to remove call to erfc for the following 
> > > > > > > > > > > > > test:
> > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > {
> > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > >
> > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > {
> > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > >
> > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not 
> > > > > > > > > > > > > transform 1 -
> > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) 
> > > > > > > > > > > > > gets applied,
> > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > resimplify erfc(x), which fails post 
> > > > > > > > > > > > > canonicalization. So we end up
> > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't 
> > > > > > > > > > > > > ideal.
> > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > >
> > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even 
> > > > > > > > > > > when there's
> > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > >
> > > > > > > > > > I do think it's reasonable to expect erfc to be available 
> > > > > > > > > > when erf
> > > > > > > > > > is and vice versa but note both are C99 specified functions 
> > > > > > > > > > (either
> > > > > > > > > > requires -lm).
> > > > > > > > > OK, thanks. Would it be OK to commit the patch after 
> > > > > > > > > bootstrap+test ?
> > > > > > > >
> > > > > > > > Yes, but I'm confused because you say the patch doesn't work 
> > > > > > > > for you?
> > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > However when there's only erfc in the source, it canonicalizes 
> > > > > > > erfc(x)
> > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > > erfc(x)
> > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > >
> > > > > > > For,
> > > > > > > t1 = __builtin_erfc(x),
> > > > > > >
> > > > > > > .optimized dump shows:
> > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > >
> > > > > > > and for,
> > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > >
> > > > > > > .optimized dump shows:
> > > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > > >   t1_4 = _7 - _3;
> > > > > > >
> > > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > > I think the reason uncanonicalization fails is be

[PATCH] bootstrap/102681 - properly CSE PHIs with default def args

2021-10-22 Thread Richard Biener via Gcc-patches
The PR shows that we fail to CSE PHIs containing (different)
default definitions because of how we now handle the
on-demand build of VN_INFO.  The following fixes this in the
same way the PHI visitation code does.

On gcc.dg/ubsan/pr81981.c this causes one expected warning to be
elided since the uninit pass sees the change

[local count: 1073741824]:
   # u$0_2 = PHI 
-  # cstore_11 = PHI 
   v = u$0_2;
-  return cstore_11;
+  return u$0_2;

and thus only one of the conditionally uninitialized uses (the
other became dead).  I have XFAILed the missing diagnostic,
I don't see a way to preserve that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-22  Richard Biener  

PR bootstrap/102681
* tree-ssa-sccvn.c (vn_phi_insert): For undefined SSA args
record VN_TOP.
(vn_phi_lookup): Likewise.

* gcc.dg/tree-ssa/ssa-fre-97.c: New testcase.
* gcc.dg/ubsan/pr81981.c: XFAIL one case.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c | 19 +++
 gcc/testsuite/gcc.dg/ubsan/pr81981.c   |  2 +-
 gcc/tree-ssa-sccvn.c   | 14 --
 3 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
new file mode 100644
index 000..2f09c8baeb8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* ethread threading does not yet catch this but it might at some point.  */
+/* { dg-options "-O -fdump-tree-fre1-details -fno-thread-jumps" } */
+
+int foo (int b, int x)
+{
+  int i, j;
+  if (b)
+i = x;
+  if (b)
+j = x;
+  return j == i;
+}
+
+/* Even with different undefs we should CSE a PHI node with the
+   same controlling condition.  */
+
+/* { dg-final { scan-tree-dump "Replaced redundant PHI node" "fre1" } } */
+/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/ubsan/pr81981.c 
b/gcc/testsuite/gcc.dg/ubsan/pr81981.c
index 8a6597c84c8..d201efb3f65 100644
--- a/gcc/testsuite/gcc.dg/ubsan/pr81981.c
+++ b/gcc/testsuite/gcc.dg/ubsan/pr81981.c
@@ -16,6 +16,6 @@ foo (int i)
   u[0] = i;
 }
 
-  v = u[0];/* { dg-warning "may be used uninitialized" } */
+  v = u[0];/* { dg-warning "may be used uninitialized" "" { xfail 
*-*-* } } */
   return t[0]; /* { dg-warning "may be used uninitialized" } */
 }
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index ae0172a143e..893b1d0ddaa 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4499,7 +4499,12 @@ vn_phi_lookup (gimple *phi, bool backedges_varying_p)
   tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
   if (TREE_CODE (def) == SSA_NAME
  && (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK)))
-   def = SSA_VAL (def);
+   {
+ if (ssa_undefined_value_p (def, false))
+   def = VN_TOP;
+ else
+   def = SSA_VAL (def);
+   }
   vp1->phiargs[e->dest_idx] = def;
 }
   vp1->type = TREE_TYPE (gimple_phi_result (phi));
@@ -4543,7 +4548,12 @@ vn_phi_insert (gimple *phi, tree result, bool 
backedges_varying_p)
   tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
   if (TREE_CODE (def) == SSA_NAME
  && (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK)))
-   def = SSA_VAL (def);
+   {
+ if (ssa_undefined_value_p (def, false))
+   def = VN_TOP;
+ else
+   def = SSA_VAL (def);
+   }
   vp1->phiargs[e->dest_idx] = def;
 }
   vp1->value_id = VN_INFO (result)->value_id;
-- 
2.31.1
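
To make the effect of recording VN_TOP concrete, here is a toy model of
the comparison.  It deliberately does not use GCC's data structures: the
struct, the TOY_VN_TOP sentinel and the function names are all invented
for illustration, and value numbers are assumed to be non-negative
integers.  Each undefined argument is mapped to the same sentinel before
two PHIs are compared, so PHIs that differ only in which uninitialized
default definition they reference compare equal.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sentinel playing the role of VN_TOP; assumes real value ids
   are never negative.  */
#define TOY_VN_TOP (-1)

struct toy_phi_arg
{
  int value_id;    /* value number of the argument */
  int undefined;   /* nonzero for an uninitialized default definition */
};

/* Map every undefined argument to the single TOP sentinel.  */
static int
toy_canonical_arg (struct toy_phi_arg a)
{
  return a.undefined ? TOY_VN_TOP : a.value_id;
}

/* Return 1 if two PHIs with NARGS arguments each are equal after
   canonicalizing undefined arguments.  */
int
toy_phis_equal (const struct toy_phi_arg *p, const struct toy_phi_arg *q,
                size_t nargs)
{
  for (size_t i = 0; i < nargs; i++)
    if (toy_canonical_arg (p[i]) != toy_canonical_arg (q[i]))
      return 0;
  return 1;
}

/* Shape of the new testcase: i = PHI <x, i_undef>, j = PHI <x, j_undef>.
   The two undefs carry different ids (7 and 9) but are both undefined.  */
int
toy_phi_self_test (void)
{
  struct toy_phi_arg phi_i[2] = { { 1, 0 }, { 7, 1 } };
  struct toy_phi_arg phi_j[2] = { { 1, 0 }, { 9, 1 } };
  struct toy_phi_arg phi_k[2] = { { 2, 0 }, { 9, 1 } };

  assert (toy_phis_equal (phi_i, phi_j, 2));    /* differ only in undefs */
  assert (!toy_phis_equal (phi_i, phi_k, 2));   /* defined args differ */
  return 0;
}
```

Without the canonicalization step the first pair would compare unequal,
which corresponds to the missed CSE described at the top of this mail.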


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-22 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 15/10/2021 09:48, Richard Biener wrote:
>> On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:
>>
>>> Hi Richi,
>>>
>>> I think this is what you meant, I now hide all the unrolling cost 
>>> calculations
>>> in the existing target hooks for costs. I did need to adjust 'finish_cost' 
>>> to
>>> take the loop_vinfo so the target's implementations are able to set the 
>>> newly
>>> renamed 'suggested_unroll_factor'.
>>>
>>> Also added the checks for the epilogue's VF.
>>>
>>> Is this more like what you had in mind?
>> Not exactly (sorry..).  For the target hook I think we don't want to
>> pass vec_info but instead another output parameter like the existing
>> ones.
>>
>> vect_estimate_min_profitable_iters should then via
>> vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
>> suggestion to vect_analyze_loop which should then, if the suggestion
>> was > 1, instead of iterating to the next vector mode run again
>> with a fixed VF (old VF times suggested unroll factor - there's
>> min_vf in vect_analyze_loop_2 which we should adjust to
>> the old VF times two for example and maybe store the suggested
>> factor as hint) - if it succeeds the result will end up in the
>> list of considered modes (where we now may have more than one
>> entry for the same mode but a different VF), we probably want to
>> only consider more unrolling once.
>>
>> For simplicity I'd probably set min_vf = max_vf = old VF * suggested
>> factor, thus take the targets request literally.
>>
>> Richard.
>
> Hi,
>
> I now pass an output parameter to finish_costs and route it through the 
> various calls up to vect_analyze_loop.  I tried to rework 
> vect_determine_vectorization_factor and noticed that merely setting 
> min_vf and max_vf is not enough, we only use these to check whether the 
> vectorization factor is within range, well actually we only use max_vf 
> at that stage. We only seem to use 'min_vf' to make sure the 
> data_references are valid.  I am not sure my changes are the most 
> appropriate here, for instance I am pretty sure the checks for max and 
> min vf I added in vect_determine_vectorization_factor are currently 
> superfluous as they will pass by design, but thought they might be good 
> future proofing?
>
> Also I changed how we compare against max_vf, rather than relying on the 
> 'MAX_VECTORIZATION' I decided to use the estimated_poly_value with 
> POLY_VALUE_MAX, to be able to bound it further in case we have knowledge 
> of the VL. I am not entirely sure about the validity of this change, maybe we 
> are better off keeping the MAX_VECTORIZATION in place and not making any 
> changes to max_vf for unrolling.

Yeah, estimated_poly_value is just an estimate (even for POLY_VALUE_MAX)
rather than a guarantee.  We can't rely on it for correctness.

Richard


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-22 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> That said, the overall flow is OK now, some details about the
> max_vf check and where to compute the unrolled VF need to be
> fleshed out.  And then there's the main analysis loop which,
> frankly, is a mess right now, even before your patch :/

Yeah, the loop is certainly ripe for a rewrite :-)

> I'm thinking of rewriting the analysis loop in vect_analyze_loop
> to use a worklist initially seeded by the vector_modes[] but
> that we can push things like as-main-loop, unrolled and
> epilogue analysis to.  Maybe have the worklist specify
> pairs of mode and kind or tuples of mode, min-VF and kind where
> 'kind' is as-main/epilogue/unroll (though maybe 'kind' is
> redundant there).  Possibly as preparatory step.

Sounds good.  I think we can also drop some of the complexity if
we're prepared to analyse candidate replacements for the main
loop separately from candidate epilogue loops (even if the two
candidates have the same mode and VF, meaning that a lot of work
would be repeated).
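
To make the worklist idea concrete, here is a minimal sketch in plain C.
It is only an illustration under stated assumptions: every name in it
(analysis_kind, work_item, worklist, run_analysis) is hypothetical and
none of them exist in the vectorizer, and modes are modelled as small
integers.  Each mode is seeded as a main-loop candidate, and a suggested
unroll factor > 1 queues exactly one extra analysis for that mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in GCC.  */
enum analysis_kind { AS_MAIN, AS_EPILOGUE, AS_UNROLLED };

struct work_item
{
  int mode;                  /* index into a vector_modes[]-like array */
  unsigned min_vf;           /* 0 means "no constraint" */
  enum analysis_kind kind;
};

#define WORKLIST_MAX 32

struct worklist
{
  struct work_item items[WORKLIST_MAX];
  size_t head, tail;         /* FIFO: pop at head, push at tail */
};

static void
worklist_push (struct worklist *wl, int mode, unsigned min_vf,
               enum analysis_kind kind)
{
  assert (wl->tail < WORKLIST_MAX);
  wl->items[wl->tail++] = (struct work_item) { mode, min_vf, kind };
}

static int
worklist_pop (struct worklist *wl, struct work_item *out)
{
  if (wl->head == wl->tail)
    return 0;
  *out = wl->items[wl->head++];
  return 1;
}

/* Seed the list with every mode as a main-loop candidate, then drain it.
   SUGGESTED_UNROLL[mode] models a target's unroll suggestion; a value > 1
   queues one extra analysis with VF * factor, and unrolled items never
   queue further unrolling.  Returns the number of analyses performed.  */
static unsigned
run_analysis (const unsigned *base_vf, const unsigned *suggested_unroll,
              int n_modes, struct work_item *log, size_t log_len)
{
  struct worklist wl = { .head = 0, .tail = 0 };
  for (int m = 0; m < n_modes; m++)
    worklist_push (&wl, m, 0, AS_MAIN);

  unsigned n = 0;
  struct work_item it;
  while (worklist_pop (&wl, &it))
    {
      if (n < log_len)
        log[n] = it;
      n++;
      if (it.kind == AS_MAIN && suggested_unroll[it.mode] > 1)
        worklist_push (&wl, it.mode,
                       base_vf[it.mode] * suggested_unroll[it.mode],
                       AS_UNROLLED);
    }
  return n;
}
```

With two modes of VF 4 and 8 and an unroll suggestion of 2 for the first
one only, this performs three analyses, the last being the unrolled
variant of the first mode with min_vf 8.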

Thanks,
Richard


[COMMITTED] Disregard incoming equivalences to a path when defining a new one.

2021-10-22 Thread Aldy Hernandez via Gcc-patches
The equivalence oracle creates a new equiv set at each def point,
killing any incoming equivalences, however in the path sensitive
oracle we create brand new equivalences at each PHI:

   BB4:

   BB8:
  x_5 = PHI 

Here we note that x_5 == y_8 at the end of the path.

The current code is intersecting this new equivalence with previously
known equivalences coming into the path.  This is incorrect, as this
is a new definition.  This patch kills any known equivalence before we
register a new one.

This hasn't caused problems so far, but upcoming changes to the
pipeline have us threading more aggressively and triggering corner
cases where this causes incorrect code.
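
As a rough illustration of why the old relation has to go first, here is
a toy model in plain C.  It is not GCC's data structure: the toy_* names
and the bitmask representation are assumptions made up for this sketch,
and SSA names are just small integers.  The point is only that a new
definition must drop the name from whatever equivalence class it was in
before the new equivalence is recorded.

```c
#include <assert.h>
#include <stdint.h>

#define N_NAMES 32

/* Toy model: equiv[v] is a bitmask of the names known equal to v
   (bit v itself is always set).  This is not GCC's data structure.  */
struct toy_oracle
{
  uint32_t equiv[N_NAMES];
};

static void
toy_init (struct toy_oracle *o)
{
  for (unsigned v = 0; v < N_NAMES; v++)
    o->equiv[v] = UINT32_C (1) << v;
}

/* Record A == B by merging both equivalence classes.  */
static void
toy_register_equiv (struct toy_oracle *o, unsigned a, unsigned b)
{
  assert (a < N_NAMES && b < N_NAMES);
  uint32_t merged = o->equiv[a] | o->equiv[b];
  for (unsigned v = 0; v < N_NAMES; v++)
    if (merged & (UINT32_C (1) << v))
      o->equiv[v] = merged;
}

/* V gets a new definition: drop it from every class it belonged to.  */
static void
toy_killing_def (struct toy_oracle *o, unsigned v)
{
  assert (v < N_NAMES);
  uint32_t bit = UINT32_C (1) << v;
  for (unsigned w = 0; w < N_NAMES; w++)
    o->equiv[w] &= ~bit;
  o->equiv[v] = bit;
}

/* Nonzero if A is currently known to be equal to B.  */
static int
toy_equal_p (const struct toy_oracle *o, unsigned a, unsigned b)
{
  return (o->equiv[a] >> b) & 1;
}
```

If name 5 came into the path equal to name 3, then calling
toy_killing_def (o, 5) followed by toy_register_equiv (o, 5, 8) leaves 5
equal to 8 only; the stale relation with 3 is gone in both directions.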

I have tested this patch with the usual regstrap cycle.  I have also
hacked a compiler comparing the old and new behavior to see if we were
previously threading paths where the decision was made due to invalid
equivalences.  Luckily, there were no such paths, but there were 22
paths in a set of .ii files where disregarding incoming relations
allowed us to thread the path.  This is a minuscule improvement,
but we moved a handful of threadable paths earlier in the pipeline,
which is always good.

Tested on x86-64 Linux.

Co-authored-by: Andrew MacLeod 

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::compute_phi_relations):
Kill any global relations we may know before registering a new
one.
* value-relation.cc (path_oracle::killing_def): New.
* value-relation.h (path_oracle::killing_def): New.
---
 gcc/gimple-range-path.cc | 10 +-
 gcc/value-relation.cc| 23 +++
 gcc/value-relation.h |  1 +
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 694271306a7..557338993ae 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -698,7 +698,15 @@ path_range_query::compute_phi_relations (basic_block bb, 
basic_block prev)
tree arg = gimple_phi_arg_def (phi, i);
 
if (gimple_range_ssa_p (arg))
- m_oracle->register_relation (entry, EQ_EXPR, arg, result);
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "  from bb%d:", bb->index);
+
+   // Throw away any previous relation.
+   get_path_oracle ()->killing_def (result);
+
+   m_oracle->register_relation (entry, EQ_EXPR, arg, result);
+ }
 
break;
  }
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index ac5f3f9afc0..2acf375ca9a 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1285,6 +1285,29 @@ path_oracle::register_equiv (basic_block bb, tree ssa1, 
tree ssa2)
   bitmap_ior_into (m_equiv.m_names, b);
 }
 
+// Register killing definition of an SSA_NAME.
+
+void
+path_oracle::killing_def (tree ssa)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, " Registering killing_def (path_oracle) ");
+  print_generic_expr (dump_file, ssa, TDF_SLIM);
+  fprintf (dump_file, "\n");
+}
+
+  bitmap b = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_set_bit (b, SSA_NAME_VERSION (ssa));
+  equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
+   sizeof (equiv_chain));
+  ptr->m_names = b;
+  ptr->m_bb = NULL;
+  ptr->m_next = m_equiv.m_next;
+  m_equiv.m_next = ptr;
+  bitmap_ior_into (m_equiv.m_names, b);
+}
+
 // Register relation K between SSA1 and SSA2, resolving unknowns by
 // querying from BB.
 
diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index 53cefbfa7dc..97be3251144 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -222,6 +222,7 @@ public:
   ~path_oracle ();
   const_bitmap equiv_set (tree, basic_block);
   void register_relation (basic_block, relation_kind, tree, tree);
+  void killing_def (tree);
   relation_kind query_relation (basic_block, tree, tree);
   relation_kind query_relation (basic_block, const_bitmap, const_bitmap);
   void reset_path ();
-- 
2.31.1



Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-22 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 22 Oct 2021 at 14:56, Richard Biener  wrote:
>
> On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Wed, 20 Oct 2021 at 18:21, Richard Biener  wrote:
> > >
> > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener  wrote:
> > > > >
> > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via 
> > > > > > > Gcc-patches
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that 
> > > > > > > > > > > > > > adds two patterns
> > > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This works to remove call to erfc for the following 
> > > > > > > > > > > > > > test:
> > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does 
> > > > > > > > > > > > > > not transform 1 -
> > > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> 
> > > > > > > > > > > > > > erfc(x) gets applied,
> > > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > > resimplify erfc(x), which fails post 
> > > > > > > > > > > > > > canonicalization. So we end up
> > > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose 
> > > > > > > > > > > > > > isn't ideal.
> > > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) 
> > > > > > > > > > > > even when there's
> > > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > > >
> > > > > > > > > > > I do think it's reasonable to expect erfc to be available 
> > > > > > > > > > > when erf
> > > > > > > > > > > is and vice versa but note both are C99 specified 
> > > > > > > > > > > functions (either
> > > > > > > > > > > requires -lm).
> > > > > > > > > > OK, thanks. Would it be OK to commit the patch after 
> > > > > > > > > > bootstrap+test ?
> > > > > > > > >
> > > > > > > > > Yes, but I'm confused because you say the patch doesn't work 
> > > > > > > > > for you?
> > > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > > However when there's only erfc in the source, it canonicalizes 
> > > > > > > > erfc(x)
> > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back 
> > > > > > > > to
> > > > > > > > erfc(x)
> > > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > > >
> > > > > > > > For,
> > > > > > > > t1 = __builtin_erfc(x),
> > > > > > > >
> > > > > > > > .optimized dump shows:
> > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > >
> > > > > > > > and for,
> > > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > > >
>

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-22 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 15, 2021 at 12:39 PM Aldy Hernandez  wrote:

> Also, I am PINGing patch 0002, which is the strlen pass conversion to
> the ranger.  As mentioned, this is just a change from an evrp client to
> a ranger client.  The APIs are exactly the same, and besides, the evrp
> analyzer is deprecated and slated for removal.  OK for trunk?

PING*2



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-22 Thread Aldy Hernandez via Gcc-patches
On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor  wrote:

> I'd like to see gimple-ssa-array-bounds invoked from the access
> pass too (instead of from VRP), and eventually -Wrestrict as well.

You can do that right now.  The pass has been converted to the new API
and it would just require calling it with a ranger instead of the
vr_values from VRP:

  array_bounds_checker array_checker (fun, &vrp_vr_values);
  array_checker.check ();

That is, move it where you want and pass it a fresh new gimple_ranger.
If there are any regressions, we'd be glad to look at them.

> I'm not sure about the strlen/sprintf warnings; those might need
> to stay where they are because they run as part of the optimizers
> there.
>
> (By the way, I don't see range info in the access pass at -O0.
> Should I?)

I assume you mean you don't see anything in the dump files.

None of the VRP passes (evrp included) run at -O0, so you wouldn't see
anything in the IL.  You *may* be able to see some global ranges that
DOM's use of the evrp engine exported, but I'm not sure.

You're going to have to instantiate a gimple_ranger and use it if you
want to have range info available, but that's not going to show up in
the IL, even after you use it, because it doesn't export global ranges
by default.  What are you trying to do?

Aldy



[PATCH] tree-optimization/102893 - properly DCE empty loops inside infinite loops

2021-10-22 Thread Richard Biener via Gcc-patches
The following fixes the test for an exit edge I put in place for
the fix for PR45178 where I somehow misunderstood how the cyclic
list works.
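
For readers unfamiliar with the pitfall, here is a small self-contained
sketch in plain C.  The struct is a simplified stand-in: the field names
mirror the ones used in the fix (next, e), but the layout and the helper
names are assumptions for illustration only.  In a circular list headed
by a sentinel, next is never NULL, so testing it says nothing about
whether the list has elements; the sentinel is recognised by its NULL
edge instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a loop-exit record; the field names mirror the
   patch (next, e) but the layout is an assumption for illustration.  */
struct exit_rec
{
  const char *e;             /* the exit edge; NULL in the sentinel */
  struct exit_rec *next, *prev;
};

/* An empty circular list is a sentinel linked to itself.  */
static void
exit_list_init (struct exit_rec *head)
{
  head->e = NULL;
  head->next = head->prev = head;
}

/* Link REC carrying EDGE right after the sentinel.  */
static void
exit_list_add (struct exit_rec *head, struct exit_rec *rec, const char *edge)
{
  assert (edge != NULL);
  rec->e = edge;
  rec->next = head->next;
  rec->prev = head;
  head->next->prev = rec;
  head->next = rec;
}

/* Wrong test: NEXT is never NULL in a circular list.  */
static int
has_exit_buggy (const struct exit_rec *head)
{
  return head->next != NULL;
}

/* Fixed test: the first element is a real exit only if it carries an edge.  */
static int
has_exit_fixed (const struct exit_rec *head)
{
  return head->next->e != NULL;
}
```

On a freshly initialised (empty) list the buggy test reports an exit
while the fixed test does not.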

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-22  Richard Biener  

PR tree-optimization/102893
* tree-ssa-dce.c (find_obviously_necessary_stmts): Fix the
test for an exit edge.

* gcc.dg/tree-ssa/ssa-dce-9.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c | 10 ++
 gcc/tree-ssa-dce.c|  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c
new file mode 100644
index 000..e1ffa7f038d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
+
+int main()
+{
+  while(1)
+for(int i=0; i<900; i++){}
+}
+
+/* { dg-final { scan-tree-dump-not "if" "cddce1" } } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index c4907af923c..372e0691ae6 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -436,7 +436,7 @@ find_obviously_necessary_stmts (bool aggressive)
 
   for (auto loop : loops_list (cfun, 0))
/* For loops without an exit do not mark any condition.  */
-   if (loop->exits->next && !finite_loop_p (loop))
+   if (loop->exits->next->e && !finite_loop_p (loop))
  {
if (dump_file)
  fprintf (dump_file, "cannot prove finiteness of loop %i\n",
-- 
2.31.1


Re: [RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-22 Thread Ard Biesheuvel via Gcc-patches
On Thu, 21 Oct 2021 at 18:51, Ard Biesheuvel  wrote:
>
> Add support for accessing the stack canary value via the TLS register,
> so that multiple threads running in the same address space can use
> distinct canary values. This is intended for the Linux kernel running in
> SMP mode, where processes entering the kernel are essentially threads
> running the same program concurrently: using a global variable for the
> canary in that context is problematic because it can never be rotated,
> and so the OS is forced to use the same value as long as it remains up.
>
> Using the TLS register to index the stack canary helps with this, as it
> allows each CPU to context switch the TLS register along with the rest
> of the process, permitting each process to use its own value for the
> stack canary.
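
To illustrate the idea quoted above, here is a minimal emulation in
plain C.  It is only a sketch under stated assumptions: the
thread_block layout and the load_canary helper are hypothetical, and
the "thread pointer" is an ordinary pointer rather than a value read
from the hardware TLS register.  It shows the canary being fetched at a
fixed offset from a per-thread base, so two threads can hold distinct
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulated per-thread block; on real hardware the base address would come
   from the TLS register.  The layout here is purely hypothetical.  */
struct thread_block
{
  uint32_t id;
  uintptr_t canary;
};

/* Load the canary located GUARD_OFFSET bytes past the thread pointer TP.
   memcpy avoids any alignment or aliasing assumptions about the offset.  */
static uintptr_t
load_canary (const void *tp, size_t guard_offset)
{
  uintptr_t value;
  assert (tp != NULL);
  memcpy (&value, (const unsigned char *) tp + guard_offset, sizeof value);
  return value;
}
```

Here the offset plays the role of the value given to
-mstack-protector-guard-offset=, and each context switch would simply
install a different base.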
>
> 2021-10-21 Ard Biesheuvel 
>
> * config/arm/arm-opts.h (enum stack_protector_guard): New
> * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
> New
> * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
> (arm_option_override_internal): Handle and put in error checks
> for stack protector guard options.
> (arm_option_reconfigure_globals): Likewise
> (arm_stack_protect_tls_canary_mem): New
> (arm_stack_protect_guard): New
> * config/arm/arm.md (stack_protect_set): New
> (stack_protect_set_tls): Likewise
> (stack_protect_test): Likewise
> (stack_protect_test_tls): Likewise
> * config/arm/arm.opt (-mstack-protector-guard): New
> (-mstack-protector-guard-offset): New.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  gcc/config/arm/arm-opts.h   |  6 ++
>  gcc/config/arm/arm-protos.h |  2 +
>  gcc/config/arm/arm.c| 52 
>  gcc/config/arm/arm.md   | 62 +++-
>  gcc/config/arm/arm.opt  | 22 +++
>  gcc/doc/invoke.texi |  9 +++
>  6 files changed, 151 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
> index 5c4b62f404f7..581ba3c4fbbb 100644
> --- a/gcc/config/arm/arm-opts.h
> +++ b/gcc/config/arm/arm-opts.h
> @@ -69,4 +69,10 @@ enum arm_tls_type {
>TLS_GNU,
>TLS_GNU2
>  };
> +
> +/* Where to get the canary for the stack protector.  */
> +enum stack_protector_guard {
> +  SSP_TLSREG,  /* per-thread canary in TLS register */
> +  SSP_GLOBAL   /* global canary */
> +};
>  #endif
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 9b1f61394ad7..37e80256a78d 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, 
> rtx, rtx, rtx, rtx);
>  extern rtx arm_load_tp (rtx);
>  extern bool arm_coproc_builtin_available (enum unspecv);
>  extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
> +extern rtx arm_stack_protect_tls_canary_mem (void);
> +
>
>  #if defined TREE_CODE
>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c4ff06b087eb..0bf06e764dbb 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
>
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
> +
> +#undef TARGET_STACK_PROTECT_GUARD
> +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
>
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
> @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts,
>if (TARGET_THUMB2_P (opts->x_target_flags))
>  opts->x_inline_asm_unified = true;
>
> +  if (arm_stack_protector_guard == SSP_GLOBAL
> +  && opts->x_arm_stack_protector_guard_offset_str)
> +{
> +  error ("incompatible options %'-mstack-protector-guard=global%' and"
> +"%'-mstack-protector-guard-offset=%qs%'",
> +arm_stack_protector_guard_offset_str);
> +}
> +
> +  if (opts->x_arm_stack_protector_guard_offset_str)
> +{
> +  char *end;
> +  const char *str = arm_stack_protector_guard_offset_str;
> +  errno = 0;
> +  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
> +  if (!*str || *end || errno)
> +   error ("%qs is not a valid offset in %qs", str,
> +  "-mstack-protector-guard-offset=");
> +  arm_stack_protector_guard_offset = offs;
> +}
> +
>  #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
>SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
>  #endif
> @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void)
>else
> target_thread_pointer = TP_SOFT;
>  }
> +
> +  if (arm_stack_protector_guard == SSP_TLSREG
> +  && target_thread_pointer != TP_CP15)
> +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
>  }
>
>  /* Perform some validation between

[PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2021-10-22 Thread Hafiz Abid Qadeer
This patch adds support for the OpenMP 5.0 allocate clause in Fortran. It does not
yet support the allocator-modifier as specified in OpenMP 5.1. The allocate
clause is already supported in C/C++.
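
For comparison, this is roughly what the clause looks like on the C
side.  It is a small sketch, not taken from the patch or the testsuite;
the function name sum_below is made up for the example.  The allocate
clause names a variable that is also privatized on the same directive
(here by the reduction clause), and since no allocator is given the
default one is used.

```c
#include <assert.h>

/* Sum 0 + 1 + ... + (n - 1).  Each thread accumulates into a private copy
   of SUM created by the reduction clause; the allocate clause asks for
   that private copy to come from an OpenMP allocator (the default one,
   since none is named).  Built without OpenMP support, the pragma is
   ignored and the loop simply runs serially.  */
static long
sum_below (int n)
{
  assert (n >= 0);
  long sum = 0;
  #pragma omp parallel for reduction (+ : sum) allocate (sum)
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}
```

The result is the same with or without -fopenmp; only where the private
copies live differs.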

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE.
* gfortran.h (OMP_LIST_ALLOCATE): New enum value.
(allocate): New member in gfc_symbol.
* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE
(OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES)
(OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES)
(OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES)
(OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE.
(OMP_TASKGROUP_CLAUSES): New
(gfc_match_omp_taskgroup): Use 'OMP_TASKGROUP_CLAUSES' instead of
'OMP_CLAUSE_TASK_REDUCTION'
(resolve_omp_clauses): Handle OMP_LIST_ALLOCATE.
(resolve_omp_do): Avoid warning when loop iteration variable is
in allocate clause.
* trans-openmp.c (gfc_trans_omp_clauses): Handle translation of
allocate clause.
(gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-1.f90: New test.
* gfortran.dg/gomp/allocate-2.f90: New test.
* gfortran.dg/gomp/collapse1.f90: Update error message.
* gfortran.dg/gomp/openmp-simd-4.f90: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/allocate-1.c: New test.
* testsuite/libgomp.fortran/allocate-1.f90: New test.
---
 gcc/fortran/dump-parse-tree.c |   1 +
 gcc/fortran/gfortran.h|   5 +
 gcc/fortran/openmp.c  | 140 +++-
 gcc/fortran/trans-openmp.c|  34 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-1.f90 | 123 +++
 gcc/testsuite/gfortran.dg/gomp/allocate-2.f90 |  45 +++
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |   2 +-
 .../gfortran.dg/gomp/openmp-simd-4.f90|   6 +-
 .../testsuite/libgomp.fortran/allocate-1.c|   7 +
 .../testsuite/libgomp.fortran/allocate-1.f90  | 333 ++
 10 files changed, 675 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.f90

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 14a307856fc..66af802ec36 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1685,6 +1685,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
  case OMP_LIST_USE_DEVICE_PTR: type = "USE_DEVICE_PTR"; break;
  case OMP_LIST_USE_DEVICE_ADDR: type = "USE_DEVICE_ADDR"; break;
  case OMP_LIST_NONTEMPORAL: type = "NONTEMPORAL"; break;
+ case OMP_LIST_ALLOCATE: type = "ALLOCATE"; break;
  case OMP_LIST_SCAN_IN: type = "INCLUSIVE"; break;
  case OMP_LIST_SCAN_EX: type = "EXCLUSIVE"; break;
  default:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 66192c07d8c..feae00052cc 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1388,6 +1388,7 @@ enum
   OMP_LIST_USE_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NONTEMPORAL,
+  OMP_LIST_ALLOCATE,
   OMP_LIST_NUM
 };
 
@@ -1880,6 +1881,10 @@ typedef struct gfc_symbol
  according to the Fortran standard.  */
   unsigned pass_as_value:1;
 
+  /* Used to check if a variable used in allocate clause has also been
+ used in privatization clause.  */
+  unsigned allocate:1;
+
   int refs;
   struct gfc_namespace *ns;/* namespace containing this symbol */
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index dcf22ac2c2f..aac8d2580a4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -911,6 +911,7 @@ enum omp_mask1
   OMP_CLAUSE_MEMORDER,  /* OpenMP 5.0.  */
   OMP_CLAUSE_DETACH,  /* OpenMP 5.0.  */
   OMP_CLAUSE_AFFINITY,  /* OpenMP 5.0.  */
+  OMP_CLAUSE_ALLOCATE,  /* OpenMP 5.0.  */
   OMP_CLAUSE_BIND,  /* OpenMP 5.0.  */
   OMP_CLAUSE_FILTER,  /* OpenMP 5.1.  */
   OMP_CLAUSE_AT,  /* OpenMP 5.1.  */
@@ -1540,6 +1541,40 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
omp_mask mask,
}
  continue;
}
+ if ((mask & OMP_CLAUSE_ALLOCATE)
+ && gfc_match ("allocate ( ") == MATCH_YES)
+   {
+ gfc_expr *allocator = NULL;
+ old_loc = gfc_current_locus;
+ m = gfc_match_expr (&allocator);
+ if (m != MATCH_YES)
+   {
+ gfc_error ("Expected allocator or variable list at %C");
+ goto error;
+   }
+ if (gfc_match (" : ") != MATCH_YES)
+   {
+ /* If n

Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Tobias Burnus

Hi José, hi Fortraners,

triage of all listed patches:


On 21.06.21 17:21, José Rui Faustino de Sousa wrote:

https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html


PR100040 - Wrong code with intent out assumed-rank allocatable
PR100029 - ICE on subroutine call with allocatable polymorphic
→ Both: Still occurs, ICE in gfc_deallocate_scalar_with_status

TODO: Review patch.


https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html


PR100097 - Unlimited polymorphic pointers and allocatables have incorrect rank
PR100098 - Polymorphic pointers and allocatables have incorrect rank
→ Both: PASS

TODO: Check whether it makes sense to apply the testcase
TODO: Close PRs
→ See also patch below (2021-June/056169.html)


https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html


PR96870 - Class name on error message
→ Fixed with sufficient test coverage; thus, I closed the PR.

Nothing to be done.


https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html


PR96724 - Bogus warnings with the repeat intrinsic and the flag 
-Wconversion-extra
  repeat ('x', NCOPIES=i08) ! i08 is 20_1
shows:
  Warning: Conversion from INTEGER(1) to INTEGER(8) at (1) [-Wconversion-extra]

TODO: Review patch.


https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html


Bug 93308 - bind(c) subroutine changes lower bound of array argument in caller
Bug 93963 - Select rank mishandling allocatable and pointer arguments with 
bind(c)
Bug 94327 - Bind(c) argument attributes are incorrectly set
Bug 94331 - Bind(C) corrupts array descriptors
Bug 97046 - Bad interaction between lbound/ubound, allocatable arrays and 
bind(C) subroutine with dimension(..) parameter
→ All already closed as FIXED

TODO: Nothing, unless we want to pick one of the testcases.


https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html


PR94104 - Request for diagnostic improvement
   10 |   print *, sumf(a)
  |1
Error: Actual argument for ‘a’ must be a pointer at (1)
NOTE: as the dummy is intent(in), since F2008 a TARGET attribute would 
also be okay as an alternative.

TODO: Review patch - in principle, I am fine with it, but I am not sure the
'valid target' in the error message is clear enough. It might require some
message tweaking for clarity.


https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html


Gerald's PR100948 - [12 Regression] ICE in gfc_conv_expr_val, at 
fortran/trans-expr.c:9069
Still has an ICE.

TODO: Review patch.


https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html


Bug 100906 - Bind(c): failure handling character with len/=1
→ Testcase now passes.
Bug 100907 - Bind(c): failure handling wide character
→ I think now okay – but the testcase assumes elem_len/sizeof(char) == #chars
  but for the C descriptor, elem_len / sizeof(char-type) = #chars
  Thus, sz is not 1 or 7 bytes but 4 or 28 bytes (or 1/7 characters)
Bug 100911 - Bind(c): failure handling C_PTR
→ Closed as FIXED.
Bug 100914 - Bind(c): errors handling complex
→ Closed as FIXED
Bug 100915 - Bind(c): failure handling C_FUNPTR
→ Closed as FIXED
Bug 100916 - Bind(c): CFI_type_other unimplemented
→ Bogus testcase (for 't(ype)' argument) otherwise it expects
  CFI_type_other instead of CFI_type_struct (TODO: Is that sensible?)

TODO: Check whether a testcase is needed
TODO: Close the three still open PRs


https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html


Bug 101047 - Pointer explicit initialization fails
Bug 101048 - Class pointer explicit initialization refuses valid
  ..., pointer, save :: ptr => tgt
fails to associate ptr with tgt
(wrong-code + rejects valid)

TODO: Review patch.


https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html


PR92621 - Problems with memory handling with allocatable intent(out) arrays 
with bind(c)

I think mostly fixed by my big bind(C) patch, but there is still one ICE
with '-fcheck=all -fsanitize=undefined'

TODO: Fix that bug  (unlikely to be fixed by José's patch)
TODO: Check whether testcase should be added
and then close the PR


https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html


PR100245 - ICE on automatic reallocation.
Still ICEs

TODO: Review patch.


https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html


PR100136 - ICE, regression, using flag -fcheck=pointer

First testcase has an ICE with -fcheck=pointer
Second testcase always has an ICE + possibly missing func.

TODO: Review patch – and probably: follow-up patch for remaining issue


https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html


PR100132 - Optimization breaks pointer association.
'fn spec' is wrong :-(

TODO: Review patch!


https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html


PR100103 - Automatic reallocation fails inside select rank
Still segfaults at runtime for 'that = a' where the RHS is a parameter
and the LHS an allocatable assumed-rank array (inside select rank).

TODO: Review patch


https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html


PR100097 - Unlimited polymorphic pointers a

Re: [PATCH] Canonicalize __atomic/sync_fetch_or/xor/and for constant mask.

2021-10-22 Thread H.J. Lu via Gcc-patches
On Thu, Oct 21, 2021 at 10:48 PM liuhongt  wrote:
>
> Hi:
>  This patch tries to canonicalize the bit_and and nop_convert order for
> __atomic_fetch_or_*, __atomic_fetch_xor_*,
> __atomic_xor_fetch_*,__sync_fetch_and_or_*,
> __sync_fetch_and_xor_*,__sync_xor_and_fetch_*,
> __atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant.
>
> i.e.
>
> +/* Canonicalize
> +  _1 = __atomic_fetch_or_4 (&v, 1, 0);
> +  _2 = (int) _1;
> +  _5 = _2 & 1;
> +
> +to
> +
> +  _1 = __atomic_fetch_or_4 (&v, 1, 0);
> +  _2 = _1 & 1;
> +  _5 = (int) _2;
>
> +/* Convert
> + _1 = __atomic_fetch_and_4 (a_6(D), 4294959103, 0);
> + _2 = (int) _1;
> + _3 = _2 & 8192;
> +to
> +  _1 = __atomic_fetch_and_4 (a_4(D), 4294959103, 0);
> +  _7 = _1 & 8192;
> +  _6 = (int) _7;
> + So it can be handled by  optimize_atomic_bit_test_and.  */
>
> I'm trying to rewrite match part in match.pd and find the
> canonicalization is ok when mask is constant, but not for variable
> since it will be simplified back by
>  /* In GIMPLE, getting rid of 2 conversions for one new results
>     in smaller IL.  */
>  (simplify
>   (convert (bitop:cs@2 (nop_convert:s @0) @1))
>   (if (GIMPLE
>        && TREE_CODE (@1) != INTEGER_CST
>        && tree_nop_conversion_p (type, TREE_TYPE (@2))
>        && types_match (type, @0))
>    (bitop @0 (convert @1)
>
> The canonicalization for a variable mask looks like
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (int) _3;
>   _5 = _4 & mask_7;
>
> to
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (unsigned int) mask_7
>   _6 = _3 & _4
>   _5 = (int) _6
>
> which is then simplified back.
>
> I've also tried another way of simplification, like
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>  _4 = (int) _3;
>  _5 = _4 & mask_7;
>
> to
>   _1 = (unsigned int)mask_7;
>   _2 = ~ _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>_6 = _3 & _1
>   _5 = (int)
>
> but that is prevented by the check below, since __atomic_fetch_and_4 is
> not CONST and we would need to regenerate the call with the updated parameter.
>
>   /* We can't and should not emit calls to non-const functions.  */
>   if (!(flags_from_decl_or_type (decl) & ECF_CONST))
>     return NULL;
>
>
>   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> * match.pd: Canonicalize bit_and and nop_convert order for
> __atomic/sync_fetch_or/xor/and when mask is constant.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102566-1a.c: New test.
> * gcc.target/i386/pr102566-2a.c: New test.
> ---
>  gcc/match.pd                                | 118
>  gcc/testsuite/gcc.target/i386/pr102566-1a.c |  66 +++
>  gcc/testsuite/gcc.target/i386/pr102566-2a.c |  65 +++
>  3 files changed, 249 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-2a.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5bed2e12715..06b369d1ab1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -104,6 +104,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (define_operator_list COND_TERNARY
>IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
>
> +/* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
> +(define_operator_list ATOMIC_FETCH_OR_XOR_N
> +  BUILT_IN_ATOMIC_FETCH_OR_1 BUILT_IN_ATOMIC_FETCH_OR_2
> +  BUILT_IN_ATOMIC_FETCH_OR_4 BUILT_IN_ATOMIC_FETCH_OR_8
> +  BUILT_IN_ATOMIC_FETCH_OR_16
> +  BUILT_IN_ATOMIC_FETCH_XOR_1 BUILT_IN_ATOMIC_FETCH_XOR_2
> +  BUILT_IN_ATOMIC_FETCH_XOR_4 BUILT_IN_ATOMIC_FETCH_XOR_8
> +  BUILT_IN_ATOMIC_FETCH_XOR_16
> +  BUILT_IN_ATOMIC_XOR_FETCH_1 BUILT_IN_ATOMIC_XOR_FETCH_2
> +  BUILT_IN_ATOMIC_XOR_FETCH_4 BUILT_IN_ATOMIC_XOR_FETCH_8
> +  BUILT_IN_ATOMIC_XOR_FETCH_16)
> +/* __sync_fetch_and_or_*, __sync_fetch_and_xor_*, __sync_xor_and_fetch_*  */
> +(define_operator_list SYNC_FETCH_OR_XOR_N
> +  BUILT_IN_SYNC_FETCH_AND_OR_1 BUILT_IN_SYNC_FETCH_AND_OR_2
> +  BUILT_IN_SYNC_FETCH_AND_OR_4 BUILT_IN_SYNC_FETCH_AND_OR_8
> +  BUILT_IN_SYNC_FETCH_AND_OR_16
> +  BUILT_IN_SYNC_FETCH_AND_XOR_1 BUILT_IN_SYNC_FETCH_AND_XOR_2
> +  BUILT_IN_SYNC_FETCH_AND_XOR_4 BUILT_IN_SYNC_FETCH_AND_XOR_8
> +  BUILT_IN_SYNC_FETCH_AND_XOR_16
> +  BUILT_IN_SYNC_XOR_AND_FETCH_1 BUILT_IN_SYNC_XOR_AND_FETCH_2
> +  BUILT_IN_SYNC_XOR_AND_FETCH_4 BUILT_IN_SYNC_XOR_AND_FETCH_8
> +  BUILT_IN_SYNC_XOR_AND_FETCH_16)
> +/* __atomic_fetch_and_*.  */
> +(define_operator_list ATOMIC_FETCH_AND_N
> +  BUILT_IN_ATOMIC_FETCH_AND_1 BUILT_IN_ATOMIC_FETCH_AND_2
> +  BUILT_IN_ATOMIC_FETCH_AND_4 BUILT_IN_ATOMIC_FETCH_AND_8
> +  BUILT_IN_ATOMIC_FETCH_AND_16)
> +/* __sync_fetch_and_and_*.  */
> +(define_operator_list SYNC_FETCH_AND_AND_N
> +  BUILT_IN_SYNC_FETCH_AND_AND_1 BUILT_IN_SYNC_FETCH_AND_AND_2
> +  BUILT_IN_SYNC_FETCH_AND_AND_4 BUILT_IN_SYNC_FETCH_AND_AND_8
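
For readers following the thread, the GIMPLE sequences quoted above typically come from source like the following. This is only a sketch under assumptions: the function names are made up for illustration, the relaxed memory order is an arbitrary choice, and nothing here is taken from the pr102566 test cases. It checks the run-time semantics of "fetch, cast to int, test one constant bit"; it does not check which instructions the compiler emits.

```c
#include <assert.h>
#include <stdbool.h>

/* Atomically set bit 0 and report whether it was already set.
   The (int) cast mirrors the "_2 = (int) _1; _5 = _2 & 1" shape.  */
bool
test_and_set_bit0 (unsigned int *v)
{
  return ((int) __atomic_fetch_or (v, 1u, __ATOMIC_RELAXED)) & 1;
}

/* Atomically clear bit 13 (mask 8192) and report whether it was set.
   ~8192u is 4294959103, the constant that appears in the quoted GIMPLE.  */
bool
test_and_clear_bit13 (unsigned int *v)
{
  return ((int) __atomic_fetch_and (v, ~8192u, __ATOMIC_RELAXED)) & 8192;
}

/* Illustrative self-check of the semantics only.  */
void
atomic_bit_self_check (void)
{
  unsigned int v = 0;
  assert (!test_and_set_bit0 (&v));   /* bit was clear before */
  assert (v == 1u);
  assert (test_and_set_bit0 (&v));    /* bit is set now */

  v = 8192u | 5u;
  assert (test_and_clear_bit13 (&v)); /* bit 13 was set */
  assert (v == 5u);                   /* other bits untouched */
  assert (!test_and_clear_bit13 (&v));
}
```

Whether the cast is applied before or after the AND does not change the result for a constant mask, which is why reordering the two is safe in these cases.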

Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2021-10-22 Thread Tobias Burnus

Hi all,

On 22.10.21 15:05, Hafiz Abid Qadeer wrote:

This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not
yet support the allocator-modifier as specified in OpenMP 5.1. The allocate
clause is already supported in C/C++.


I think the following shouldn't block the acceptance of the patch,
but I think we eventually need to handle the following as well:

type t
  integer, allocatable :: xx(:)
end type

type(t) :: tt
class(t), allocatable :: cc

allocate(t :: cc)
tt%xx = [1,2,3,4,5,6]
cc%xx = [1,2,3,4,5,6]

! ...
!$omp task firstprivate(tt, cc) allocate(h)
 ...

In my reading of the spec, both tt/cc themselves and tt%xx and cc%xx
should use the specified allocator.

And unless I missed something (I only glanced at the patch so far),
it is not handled.

But for derived types (except for recursive allocatables, valid since 5.1),
I think it can be handled in gfc_omp_clause_copy_ctor / gfc_omp_clause_dtor,
but I have not checked whether those support it properly.

For CLASS + recursive allocatables, it requires some more changes
(which might be provided by my derived-type deep copy patch,
of which only 1/3 has been written).

Tobias

PS: Just a side note, OpenMP has the following for Fortran:

"If any operation of the base language causes a reallocation
 of a variable that is allocated with a memory allocator then
 that memory allocator will be used to deallocate the current
 memory and to allocate the new memory. For allocated
 allocatable components of such variables, the allocator that
 will be used for the deallocation and allocation is unspecified."



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-22 Thread Martin Sebor via Gcc-patches

On 10/22/21 5:22 AM, Aldy Hernandez wrote:

On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor  wrote:


I'd like to see gimple-ssa-array-bounds invoked from the access
pass too (instead of from VRP), and eventually -Wrestrict as well.


You can do that right now.  The pass has been converted to the new API
and it would just require calling it with a ranger instead of the
vr_values from VRP:

   array_bounds_checker array_checker (fun, &vrp_vr_values);
   array_checker.check ();

That is, move it where you want and pass it a fresh new gimple_ranger.
If there are any regressions, we'd be glad to look at them.


I appreciate that and I'm not worried about regressions due to
ranger code.

It's not so simple as it seems because of the optimizer
dependencies I mentioned.  -Warray-bounds runs before vectorization
and the access pass after it.  Moving it into the access pass will
mean dealing with the fallout: either accepting regressions in
the quality of warnings (bad locations due to vectorization
merging distinct stores into one) or running the access pass at
a different point in the pipeline, and facing regressions in
the other warnings due to that.  Running it twice, once earlier
for -Warray-bounds and then again later for -Wstringop-overflow
etc, would be less than optimal because they all do the same
thing (compute object sizes and offsets) and should be able to
share the same data (the pointer query cache).  So the ideal
solution is to find a middle ground where all these warnings
can run from the same pass with optimal results.

-Warray-bounds might also need to be adjusted for -O0 to avoid
warning on unreachable code, although, surprisingly, that hasn't
been an issue for the other warnings now enabled at -O0.

All this will take some time, which I'm out of for this stage 1.




I'm not sure about the strlen/sprintf warnings; those might need
to stay where they are because they run as part of the optimizers
there.

(By the way, I don't see range info in the access pass at -O0.
Should I?)


I assume you mean you don't see anything in the dump files.


I mean that I don't get accurate range info from the ranger
instance in any function.  I'd like the example below to trigger
a warning even at -O0 but it doesn't because n's range is
[0, UINT_MAX] instead of [7, UINT_MAX]:

  char a[4];

  void f (unsigned n)
  {
    if (n < 7)
      n = 7;
    __builtin_memset (a, 0, n);
  }



None of the VRP passes (evrp included) run at -O0, so you wouldn't see
anything in the IL.  You *may* be able to see some global ranges that
DOM's use of the evrp engine exported, but I'm not sure.

You're going to have to instantiate a gimple_ranger and use it if you
want to have range info available, but that's not going to show up in
the IL, even after you use it, because it doesn't export global ranges
by default.  What are you trying to do?


The above.  The expected warning runs in the access warning pass.
It uses the per-function instance of the ranger but it gets back
a range for the type.  To see that put a breakpoint in
get_size_range() in pointer-query.cc and compile the above with
-O0.

Martin


[PATCH 1/6] aarch64: Move Neon vector-tuple type declaration into the compiler

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch declares the Neon vector-tuple types inside the
compiler instead of in the arm_neon.h header. This is a necessary first
step before adding corresponding machine modes to the AArch64
backend.

The vector-tuple types are implemented using a #pragma. This means
initialization of builtin functions that have vector-tuple types as
arguments or return values has to be delayed until the #pragma is
handled.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Note that this patch series cannot be merged until the following has
been accepted: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581948.html

Ok for master with this proviso?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-09-10  Jonathan Wright  

* config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
Factor out main loop to...
(aarch64_init_simd_builtin_functions): This new function.
(register_tuple_type): Define.
(aarch64_scalar_builtin_type_p): Define.
(handle_arm_neon_h): Define.
* config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle
pragma for arm_neon.h.
* config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p):
Declare.
(handle_arm_neon_h): Likewise.
* config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
Remove static modifier.
* config/aarch64/arm_neon.h (target): Remove Neon vector
structure type definitions.


rb14838.patch
Description: rb14838.patch


[PATCH 2/6] gcc/expr.c: Remove historic workaround for broken SIMD subreg

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

A long time ago, using a parallel to take a subreg of a SIMD register
was broken. This temporary fix[1] (from 2003) spilled these registers
to memory and reloaded the appropriate part to obtain the subreg.

The fix initially existed for the benefit of the PowerPC E500 - a
platform for which GCC removed support a number of years ago.
Regardless, a proper mechanism for taking a subreg of a SIMD register
exists now anyway.

This patch removes the workaround thus preventing SIMD registers
being dumped to memory unnecessarily - which sometimes can't be fixed
by later passes.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

[1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html

---

gcc/ChangeLog:

2021-10-11  Jonathan Wright  

* expr.c (emit_group_load_1): Remove historic workaround.


rb14923.patch
Description: rb14923.patch


[PATCH 3/6] gcc/expmed.c: Ensure vector modes are tieable before extraction

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

Extracting a bitfield from a vector can be achieved by casting the
vector to a new type whose elements are the same size as the desired
bitfield, before generating a subreg. However, this is only an
optimization if the original vector can be accessed in the new
machine mode without first being copied - a condition denoted by the
TARGET_MODES_TIEABLE_P hook.

This patch adds a check to make sure that the vector modes are
tieable before attempting to generate a subreg. This is a necessary
prerequisite for a subsequent patch that will introduce new machine
modes for Arm Neon vector-tuple types.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-11  Jonathan Wright  

* expmed.c (extract_bit_field_1): Ensure modes are tieable.


rb14926.patch
Description: rb14926.patch


[PATCH 5/6] gcc/lower_subreg.c: Prevent decomposition if modes are not tieable

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

Preventing decomposition if modes are not tieable is necessary to
stop AArch64 partial Neon structure modes being treated as packed in
registers.

This is a necessary prerequisite for a future AArch64 PCS change to
maintain good code generation.

Bootstrapped and regression tested on:
* x86_64-pc-linux-gnu - no issues.
* aarch64-none-linux-gnu - two test failures which will be fixed by
  the next patch in this series. 

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-14  Jonathan Wright  

* lower-subreg.c (simple_move): Prevent decomposition if
modes are not tieable.


rb14936.patch
Description: rb14936.patch


[PATCH 6/6] aarch64: Pass and return Neon vector-tuple types without a parallel

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

Neon vector-tuple types can be passed in registers on function call
and return - there is no need to generate a parallel rtx. This patch
adds cases to detect vector-tuple modes and generates an appropriate
register rtx.

This change greatly improves code generated when passing Neon vector-
tuple types between functions; many new test cases are added to
defend these improvements.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-07  Jonathan Wright  

* config/aarch64/aarch64.c (aarch64_function_value): Generate
a register rtx for Neon vector-tuple modes.
(aarch64_layout_arg): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New code
generation tests.


rb14937.patch
Description: rb14937.patch


Re: [PATCH 1/6] aarch64: Move Neon vector-tuple type declaration into the compiler

2021-10-22 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch declares the Neon vector-tuple types inside the
> compiler instead of in the arm_neon.h header. This is a necessary first
> step before adding corresponding machine modes to the AArch64
> backend.
>
> The vector-tuple types are implemented using a #pragma. This means
> initialization of builtin functions that have vector-tuple types as
> arguments or return values has to be delayed until the #pragma is
> handled.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Note that this patch series cannot be merged until the following has
> been accepted:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581948.html
>
> Ok for master with this proviso?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-09-10  Jonathan Wright  
>
> * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
> Factor out main loop to...
> (aarch64_init_simd_builtin_functions): This new function.
> (register_tuple_type): Define.
> (aarch64_scalar_builtin_type_p): Define.
> (handle_arm_neon_h): Define.
> * config/aarch64/aarch64-c.c (aarch64_pragma_aarch64): Handle
> pragma for arm_neon.h.
> * config/aarch64/aarch64-protos.h (aarch64_advsimd_struct_mode_p):
> Declare.
> (handle_arm_neon_h): Likewise.
> * config/aarch64/aarch64.c (aarch64_advsimd_struct_mode_p):
> Remove static modifier.
> * config/aarch64/arm_neon.h (target): Remove Neon vector
> structure type definitions.

OK when the prerequisite you mention is applied, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
> index 1a507ea59142d0b5977b0167abfe9a58a567adf7..27f2dc5ea4337da80f3b84b6a798263e7bd9012e 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -1045,32 +1045,22 @@ aarch64_init_fcmla_laneq_builtins (void)
>  }
>  
>  void
> -aarch64_init_simd_builtins (void)
> +aarch64_init_simd_builtin_functions (bool called_from_pragma)
>  {
>unsigned int i, fcode = AARCH64_SIMD_PATTERN_START;
>  
> -  if (aarch64_simd_builtins_initialized_p)
> -return;
> -
> -  aarch64_simd_builtins_initialized_p = true;
> -
> -  aarch64_init_simd_builtin_types ();
> -
> -  /* Strong-typing hasn't been implemented for all AdvSIMD builtin intrinsics.
> -     Therefore we need to preserve the old __builtin scalar types.  It can be
> -     removed once all the intrinsics become strongly typed using the qualifier
> -     system.  */
> -  aarch64_init_simd_builtin_scalar_types ();
> - 
> -  tree lane_check_fpr = build_function_type_list (void_type_node,
> -   size_type_node,
> -   size_type_node,
> -   intSI_type_node,
> -   NULL);
> -  aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_LANE_CHECK]
> -= aarch64_general_add_builtin ("__builtin_aarch64_im_lane_boundsi",
> -lane_check_fpr,
> -AARCH64_SIMD_BUILTIN_LANE_CHECK);
> +  if (!called_from_pragma)
> +{
> +  tree lane_check_fpr = build_function_type_list (void_type_node,
> +   size_type_node,
> +   size_type_node,
> +   intSI_type_node,
> +   NULL);
> +  aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_LANE_CHECK]
> + = aarch64_general_add_builtin ("__builtin_aarch64_im_lane_boundsi",
> +lane_check_fpr,
> +AARCH64_SIMD_BUILTIN_LANE_CHECK);
> +}
>  
>for (i = 0; i < ARRAY_SIZE (aarch64_simd_builtin_data); i++, fcode++)
>  {
> @@ -1100,6 +1090,18 @@ aarch64_init_simd_builtins (void)
>tree return_type = void_type_node, args = void_list_node;
>tree eltype;
>  
> +  int struct_mode_args = 0;
> +  for (int j = op_num; j >= 0; j--)
> + {
> +   machine_mode op_mode = insn_data[d->code].operand[j].mode;
> +   if (aarch64_advsimd_struct_mode_p (op_mode))
> + struct_mode_args++;
> + }
> +
> +  if ((called_from_pragma && struct_mode_args == 0)
> +   || (!called_from_pragma && struct_mode_args > 0))
> + continue;
> +
>/* Build a function type directly from the insn_data for this
>builtin.  The build_function_type () function takes care of
>removing duplicates for us.  */
> @@ -1173,9 +1175,82 @@ aarch64_init_simd_builtins (void)
>fndecl = aarch64_general_add_builtin (namebuf, ftype, fcode, attrs);
>aarch64_builtin_decls[fcode] = fndecl;
>  }
> +}
> +
> +/* Register the tuple type that c

Re: [PATCH 2/6] gcc/expr.c: Remove historic workaround for broken SIMD subreg

2021-10-22 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> A long time ago, using a parallel to take a subreg of a SIMD register
> was broken. This temporary fix[1] (from 2003) spilled these registers
> to memory and reloaded the appropriate part to obtain the subreg.
>
> The fix initially existed for the benefit of the PowerPC E500 - a
> platform for which GCC removed support a number of years ago.
> Regardless, a proper mechanism for taking a subreg of a SIMD register
> exists now anyway.
>
> This patch removes the workaround thus preventing SIMD registers
> being dumped to memory unnecessarily - which sometimes can't be fixed
> by later passes.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu - no issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2003-April/102099.html
>
> ---
>
> gcc/ChangeLog:
>
> 2021-10-11  Jonathan Wright  
>
> * expr.c (emit_group_load_1): Remove historic workaround.

OK, thanks.

Richard

> diff --git a/gcc/expr.c b/gcc/expr.c
> index e0bcbccd9053df168c2e861414729fc7cf017f85..62446118b7beb725933ec6f7b0386e7e4b84fa90 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -2508,19 +2508,6 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
>  NULL);
>   }
>   }
> -  /* FIXME: A SIMD parallel will eventually lead to a subreg of a
> -  SIMD register, which is currently broken.  While we get GCC
> -  to emit proper RTL for these cases, let's dump to memory.  */
> -  else if (VECTOR_MODE_P (GET_MODE (dst))
> -&& REG_P (src))
> - {
> -   poly_uint64 slen = GET_MODE_SIZE (GET_MODE (src));
> -   rtx mem;
> -
> -   mem = assign_stack_temp (GET_MODE (src), slen);
> -   emit_move_insn (mem, src);
> -   tmps[i] = adjust_address (mem, mode, bytepos);
> - }
>else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode
> && XVECLEN (dst, 0) > 1)
>  tmps[i] = simplify_gen_subreg (mode, src, GET_MODE (dst), bytepos);


Re: [PATCH 3/6] gcc/expmed.c: Ensure vector modes are tieable before extraction

2021-10-22 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Extracting a bitfield from a vector can be achieved by casting the
> vector to a new type whose elements are the same size as the desired
> bitfield, before generating a subreg. However, this is only an
> optimization if the original vector can be accessed in the new
> machine mode without first being copied - a condition denoted by the
> TARGET_MODES_TIEABLE_P hook.
>
> This patch adds a check to make sure that the vector modes are
> tieable before attempting to generate a subreg. This is a necessary
> prerequisite for a subsequent patch that will introduce new machine
> modes for Arm Neon vector-tuple types.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu - no issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-10-11  Jonathan Wright  
>
> * expmed.c (extract_bit_field_1): Ensure modes are tieable.

OK, thanks.

Richard

> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 59734d4841cbd2056a7d5bda9134af79c8024c87..f58fb9d877d66809b39253ccdc803f0ecb009326 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1734,7 +1734,8 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>FOR_EACH_MODE_FROM (new_mode, new_mode)
>   if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
>   && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
> - && targetm.vector_mode_supported_p (new_mode))
> + && targetm.vector_mode_supported_p (new_mode)
> + && targetm.modes_tieable_p (GET_MODE (op0), new_mode))
> break;
>if (new_mode != VOIDmode)
>   op0 = gen_lowpart (new_mode, op0);
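
The effect of the extra condition in the hunk above can be modelled with a small stand-alone program. This is a toy sketch, not GCC code: `struct toy_mode`, the `tie_group` field and every function name are hypothetical stand-ins for machine modes, `targetm.vector_mode_supported_p` and `targetm.modes_tieable_p`. It only illustrates that a candidate mode with the right total size and element size is now skipped when it cannot be tied to the source mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode.  */
struct toy_mode
{
  const char *name;
  unsigned size;       /* total bytes, in the role of GET_MODE_SIZE */
  unsigned unit_size;  /* bytes per element, in the role of GET_MODE_UNIT_SIZE */
  bool supported;      /* in the role of targetm.vector_mode_supported_p */
  int tie_group;       /* assumption: modes tie only within the same group */
};

/* Toy version of the tieable query: true if a value in mode A can be
   accessed in mode B without a copy.  */
bool
toy_modes_tieable_p (const struct toy_mode *a, const struct toy_mode *b)
{
  return a->tie_group == b->tie_group;
}

/* Return the first candidate that has the same total size as OP0, elements
   of FIELD_SIZE bytes, is supported, and is tieable with OP0; NULL if none.  */
const struct toy_mode *
toy_pick_extract_mode (const struct toy_mode *op0, unsigned field_size,
                       const struct toy_mode *cands, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (cands[i].size == op0->size
        && cands[i].unit_size == field_size
        && cands[i].supported
        && toy_modes_tieable_p (op0, &cands[i]))
      return &cands[i];
  return NULL;
}

/* Illustrative self-check.  */
void
toy_tieable_self_check (void)
{
  const struct toy_mode src = { "toy-4x4", 16, 4, true, 0 };
  const struct toy_mode cands[] = {
    { "toy-8x2-untieable", 16, 2, true, 1 },  /* right shape, wrong group */
    { "toy-8x2", 16, 2, true, 0 },
  };

  /* The untieable candidate is skipped in favour of the tieable one.  */
  assert (toy_pick_extract_mode (&src, 2, cands, 2) == &cands[1]);
  /* With only the untieable candidate available, nothing is chosen.  */
  assert (toy_pick_extract_mode (&src, 2, cands, 1) == NULL);
}
```

When no candidate is found, the caller keeps the original mode, which corresponds to the `new_mode != VOIDmode` test in the quoted code.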


[PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types

2021-10-22 Thread Jonathan Wright via Gcc-patches
Hi,

Until now, GCC has used large integer machine modes (OI, CI and XI)
to model Neon vector-tuple types. This is suboptimal for many
reasons, the most notable are:

 1) Large integer modes are opaque and modifying one vector in the
    tuple requires a lot of inefficient set/get gymnastics. The
    result is a lot of superfluous move instructions.
 2) Large integer modes do not map well to types that are tuples of
    64-bit vectors - we need additional zero-padding which again
    results in superfluous move instructions.

This patch adds new machine modes that better model the C-level Neon
vector-tuple types. The approach is somewhat similar to that already
used for SVE vector-tuple types.

All of the AArch64 backend patterns and builtins that manipulate Neon
vector tuples are updated to use the new machine modes. This has the
effect of significantly reducing the amount of boiler-plate code in
the arm_neon.h header.

While this patch increases the quality of code generated in many
instances, there is still room for significant improvement - which
will be attempted in subsequent patches.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
aarch64_be-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-08-09  Jonathan Wright  
            Richard Sandiford  

* config/aarch64/aarch64-builtins.c (v2x8qi_UP): Define.
(v2x4hi_UP): Likewise.
(v2x4hf_UP): Likewise.
(v2x4bf_UP): Likewise.
(v2x2si_UP): Likewise.
(v2x2sf_UP): Likewise.
(v2x1di_UP): Likewise.
(v2x1df_UP): Likewise.
(v2x16qi_UP): Likewise.
(v2x8hi_UP): Likewise.
(v2x8hf_UP): Likewise.
(v2x8bf_UP): Likewise.
(v2x4si_UP): Likewise.
(v2x4sf_UP): Likewise.
(v2x2di_UP): Likewise.
(v2x2df_UP): Likewise.
(v3x8qi_UP): Likewise.
(v3x4hi_UP): Likewise.
(v3x4hf_UP): Likewise.
(v3x4bf_UP): Likewise.
(v3x2si_UP): Likewise.
(v3x2sf_UP): Likewise.
(v3x1di_UP): Likewise.
(v3x1df_UP): Likewise.
(v3x16qi_UP): Likewise.
(v3x8hi_UP): Likewise.
(v3x8hf_UP): Likewise.
(v3x8bf_UP): Likewise.
(v3x4si_UP): Likewise.
(v3x4sf_UP): Likewise.
(v3x2di_UP): Likewise.
(v3x2df_UP): Likewise.
(v4x8qi_UP): Likewise.
(v4x4hi_UP): Likewise.
(v4x4hf_UP): Likewise.
(v4x4bf_UP): Likewise.
(v4x2si_UP): Likewise.
(v4x2sf_UP): Likewise.
(v4x1di_UP): Likewise.
(v4x1df_UP): Likewise.
(v4x16qi_UP): Likewise.
(v4x8hi_UP): Likewise.
(v4x8hf_UP): Likewise.
(v4x8bf_UP): Likewise.
(v4x4si_UP): Likewise.
(v4x4sf_UP): Likewise.
(v4x2di_UP): Likewise.
(v4x2df_UP): Likewise.
(TYPES_GETREGP): Delete.
(TYPES_SETREGP): Likewise.
(TYPES_LOADSTRUCT_U): Define.
(TYPES_LOADSTRUCT_P): Likewise.
(TYPES_LOADSTRUCT_LANE_U): Likewise.
(TYPES_LOADSTRUCT_LANE_P): Likewise.
(TYPES_STORE1P): Move for consistency.
(TYPES_STORESTRUCT_U): Define.
(TYPES_STORESTRUCT_P): Likewise.
(TYPES_STORESTRUCT_LANE_U): Likewise.
(TYPES_STORESTRUCT_LANE_P): Likewise.
(aarch64_simd_tuple_types): Define.
(aarch64_lookup_simd_builtin_type): Handle tuple type lookup.
(aarch64_init_simd_builtin_functions): Update frontend lookup
for builtin functions after handling arm_neon.h pragma.
(register_tuple_type): Manually set modes of single-integer
tuple types. Record tuple types.
* config/aarch64/aarch64-modes.def
(ADV_SIMD_D_REG_STRUCT_MODES): Define D-register tuple modes.
(ADV_SIMD_Q_REG_STRUCT_MODES): Define Q-register tuple modes.
(SVE_MODES): Give single-vector modes priority over vector-
tuple modes.
(VECTOR_MODES_WITH_PREFIX): Set partial-vector mode order to
be after all single-vector modes.
* config/aarch64/aarch64-simd-builtins.def: Update builtin
generator macros to reflect modifications to the backend
patterns.
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2):
Use vector-tuple mode iterator and rename to...
(aarch64_simd_ld2): This.
(aarch64_simd_ld2r): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_ld2r): This.
(aarch64_vec_load_lanesoi_lane): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_load_lanes_lane): This.
(vec_load_lanesoi): Use vector-tuple mode iterator and
rename to...
(vec_load_lanes): This.
(aarch64_simd_st2): Use vector-tuple mode iterator and
rename to...
(aarch64_simd_st2): This.
(aarch64_vec_store_lanesoi_lane): Use vector-tuple mode
iterator and rename to...
(aarch64_vec_store_lanes_lane): This.
  

Re: [PATCH] x86_64: Add insn patterns for V1TI mode logic operations.

2021-10-22 Thread Uros Bizjak via Gcc-patches
On Fri, Oct 22, 2021 at 9:19 AM Roger Sayle  wrote:
>
>
> On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE
> register (where regular TI mode uses a pair of 64-bit general purpose
> scalar registers).  This patch improves the implementation of AND, IOR,
> XOR and NOT on these values.
>
> The benefit is demonstrated by the following simple test program:
>
> typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));
> v1ti and(v1ti x, v1ti y) { return x & y; }
> v1ti ior(v1ti x, v1ti y) { return x | y; }
> v1ti xor(v1ti x, v1ti y) { return x ^ y; }
> v1ti not(v1ti x) { return ~x; }
>
> For which GCC currently generates the rather large:
>
> and:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         andq    %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         andq    %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq  %xmm5, %xmm0
>         ret
>
> ior:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         orq     %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         orq     %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq  %xmm5, %xmm0
>         ret
>
> xor:    movdqa  %xmm0, %xmm2
>         movq    %xmm1, %rdx
>         movq    %xmm0, %rax
>         xorq    %rdx, %rax
>         movhlps %xmm2, %xmm3
>         movhlps %xmm1, %xmm4
>         movq    %rax, %xmm0
>         movq    %xmm4, %rdx
>         movq    %xmm3, %rax
>         xorq    %rdx, %rax
>         movq    %rax, %xmm5
>         punpcklqdq  %xmm5, %xmm0
>         ret
>
> not:    movdqa  %xmm0, %xmm1
>         movq    %xmm0, %rax
>         notq    %rax
>         movhlps %xmm1, %xmm2
>         movq    %rax, %xmm0
>         movq    %xmm2, %rax
>         notq    %rax
>         movq    %rax, %xmm3
>         punpcklqdq  %xmm3, %xmm0
>         ret
>
>
> with this patch we now generate the much more efficient:
>
> and:    pand    %xmm1, %xmm0
>         ret
>
> ior:    por     %xmm1, %xmm0
>         ret
>
> xor:    pxor    %xmm1, %xmm0
>         ret
>
> not:    pcmpeqd %xmm1, %xmm1
>         pxor    %xmm1, %xmm0
>         ret
>
>
> For my first few attempts at this patch I tried adding V1TI to the
> existing VI and VI12_AVX_512F mode iterators, but these then have
> dependencies on other iterators (and attributes), and so on until
> everything ties itself into a knot, as V1TI mode isn't really a
> first-class vector mode on x86_64.  Hence I ultimately opted to use
> simple stand-alone patterns (as used by the existing TF mode support).
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?
>
>
> 2021-10-22  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/sse.md (v1ti3): New define_insn to
> implement V1TImode AND, IOR and XOR on TARGET_SSE2 (and above).
> (one_cmplv1ti2): New define expand.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse2-v1ti-logic.c: New test case.
> * gcc.target/i386/sse2-v1ti-logic-2.c: New test case.

There is no need for

/* { dg-require-effective-target sse2 } */

for compile tests. The compilation does not reach the assembler.

OK with the above change.

BTW: You can add testcases to the main patch with "git add "
and then create the patch with "git diff HEAD".

Thanks,
Uros.
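
As a companion to the compile-only test discussed above, a run-time check of the four operations could look roughly like this. It is a sketch under assumptions, not the contents of sse2-v1ti-logic-2.c: the `v1ti_*` function names, the `make128` helper and the constants are invented for illustration, and it relies on GCC's vector extensions and `__int128`. It verifies results against plain 128-bit scalar arithmetic and says nothing about which instructions are emitted.

```c
#include <assert.h>

typedef unsigned __int128 v1ti __attribute__ ((__vector_size__ (16)));

/* Same operations as the test program in the patch description, renamed
   here to avoid clashing with the <iso646.h> macros and C++ keywords.  */
v1ti v1ti_and (v1ti x, v1ti y) { return x & y; }
v1ti v1ti_ior (v1ti x, v1ti y) { return x | y; }
v1ti v1ti_xor (v1ti x, v1ti y) { return x ^ y; }
v1ti v1ti_not (v1ti x) { return ~x; }

/* Hypothetical helper: build a 128-bit value from two 64-bit halves.  */
unsigned __int128
make128 (unsigned long long hi, unsigned long long lo)
{
  return ((unsigned __int128) hi << 64) | lo;
}

/* Compare each vector operation with the equivalent scalar operation.  */
void
v1ti_self_check (void)
{
  unsigned __int128 a = make128 (0xf0f0f0f0f0f0f0f0ULL, 0x0123456789abcdefULL);
  unsigned __int128 b = make128 (0x00ff00ff00ff00ffULL, 0xfedcba9876543210ULL);
  v1ti x = { a };
  v1ti y = { b };
  v1ti r;

  r = v1ti_and (x, y);
  assert (r[0] == (a & b));
  r = v1ti_ior (x, y);
  assert (r[0] == (a | b));
  r = v1ti_xor (x, y);
  assert (r[0] == (a ^ b));
  r = v1ti_not (x);
  assert (r[0] == ~a);
}
```

Using distinct bit patterns in the high and low halves makes it more likely that a bug affecting only one 64-bit half would be caught.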


Re: [PATCH 4/6] aarch64: Add machine modes for Neon vector-tuple types

2021-10-22 Thread Richard Sandiford via Gcc-patches
Thanks a lot for doing this.

Jonathan Wright  writes:
> @@ -763,9 +839,16 @@ aarch64_lookup_simd_builtin_type (machine_mode mode,
>  return aarch64_simd_builtin_std_type (mode, q);
>  
>for (i = 0; i < nelts; i++)
> -if (aarch64_simd_types[i].mode == mode
> - && aarch64_simd_types[i].q == q)
> -  return aarch64_simd_types[i].itype;
> +{
> +  if (aarch64_simd_types[i].mode == mode
> +   && aarch64_simd_types[i].q == q)
> + return aarch64_simd_types[i].itype;
> +  else if (aarch64_simd_tuple_types[i][0] != NULL_TREE)

Very minor (sorry for not noticing earlier), but: the “else” is
redundant here.

> + for (int j = 0; j < 3; j++)
> +   if (TYPE_MODE (aarch64_simd_tuple_types[i][j]) == mode
> +   && aarch64_simd_types[i].q == q)
> + return aarch64_simd_tuple_types[i][j];
> +}
>  
>return NULL_TREE;
>  }
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 48eddf64e05afe3788abfa05141f6544a9323ea1..0aa185b67ff13d40c87db0449aec312929ff5387 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -6636,162 +6636,165 @@
>  
>  ;; Patterns for vector struct loads and stores.
>  
> -(define_insn "aarch64_simd_ld2"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> - (unspec:OI [(match_operand:OI 1 "aarch64_simd_struct_operand" "Utv")
> - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -UNSPEC_LD2))]
> +(define_insn "aarch64_simd_ld2"
> +  [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w")
> + (unspec:VSTRUCT_2Q [
> +   (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")]
> +   UNSPEC_LD2))]
>"TARGET_SIMD"
>"ld2\\t{%S0. - %T0.}, %1"
>[(set_attr "type" "neon_load2_2reg")]
>  )
>  
> -(define_insn "aarch64_simd_ld2r"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> -   (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> -   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
> -  UNSPEC_LD2_DUP))]
> +(define_insn "aarch64_simd_ld2r"
> +  [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
> + (unspec:VSTRUCT_2QD [
> +   (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")]
> +  UNSPEC_LD2_DUP))]

Sorry again for missing this, but the ld2rs, ld3rs and ld4rs should
keep their BLKmode arguments, since they only access 2, 3 or 4
scalar memory elements.

> @@ -7515,10 +7605,10 @@
>  )
>  
>  (define_insn_and_split "aarch64_combinev16qi"
> -  [(set (match_operand:OI 0 "register_operand" "=w")
> - (unspec:OI [(match_operand:V16QI 1 "register_operand" "w")
> - (match_operand:V16QI 2 "register_operand" "w")]
> -UNSPEC_CONCAT))]
> +  [(set (match_operand:V2x16QI 0 "register_operand" "=w")
> + (unspec:V2x16QI [(match_operand:V16QI 1 "register_operand" "w")
> +  (match_operand:V16QI 2 "register_operand" "w")]
> + UNSPEC_CONCAT))]

Just realised that we can now make this a vec_concat, since the
modes are finally self-consistent.

No need to do that though, either way is fine.

Looks good otherwise.

Richard


Re: [PATCH 5/6] gcc/lower_subreg.c: Prevent decomposition if modes are not tieable

2021-10-22 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Preventing decomposition if modes are not tieable is necessary to
> stop AArch64 partial Neon structure modes being treated as packed in
> registers.
>
> This is a necessary prerequisite for a future AArch64 PCS change to
> maintain good code generation.
>
> Bootstrapped and regression tested on:
> * x86_64-pc-linux-gnu - no issues.
> * aarch64-none-linux-gnu - two test failures which will be fixed by
>   the next patch in this series.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-10-14  Jonathan Wright  
>
> * lower-subreg.c (simple_move): Prevent decomposition if
> modes are not tieable.

OK as a single commit with 6/6.  Thanks for splitting this out for
review purposes.

Richard

>
> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
> index 
> 21078268ba0d241efc469fe71357d3b94f8935d6..f0dc63f485f1237d96ceeb0c75dca9aa8e053c6e
>  100644
> --- a/gcc/lower-subreg.c
> +++ b/gcc/lower-subreg.c
> @@ -383,8 +383,10 @@ simple_move (rtx_insn *insn, bool speed_p)
>   non-integer mode for which there is no integer mode of the same
>   size.  */
>mode = GET_MODE (SET_DEST (set));
> +  scalar_int_mode int_mode;
>if (!SCALAR_INT_MODE_P (mode)
> -  && !int_mode_for_size (GET_MODE_BITSIZE (mode), 0).exists ())
> +  && (!int_mode_for_size (GET_MODE_BITSIZE (mode), 0).exists (&int_mode)
> +   || !targetm.modes_tieable_p (mode, int_mode)))
>  return NULL_RTX;
>  
>/* Reject PARTIAL_INT modes.  They are used for processor specific


Re: [PATCH 6/6] aarch64: Pass and return Neon vector-tuple types without a parallel

2021-10-22 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Neon vector-tuple types can be passed in registers on function call
> and return - there is no need to generate a parallel rtx. This patch
> adds cases to detect vector-tuple modes and generates an appropriate
> register rtx.
>
> This change greatly improves code generated when passing Neon vector-
> tuple types between functions;

Indeed.

> many new test cases are added to defend these improvements.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> aarch64_be-none-linux-gnu - no issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-10-07  Jonathan Wright  
>
> * config/aarch64/aarch64.c (aarch64_function_value): Generate
> a register rtx for Neon vector-tuple modes.
> (aarch64_layout_arg): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: New code
> generation tests.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> cbfcf7efcca8e0978518b69cbeafb6812c38889a..9c2b3cb7d677a1570b32a8c9b6ee14bef156cb45
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6433,6 +6433,12 @@ aarch64_function_value (const_tree type, const_tree 
> func,
> gcc_assert (count == 1 && mode == ag_mode);
> return gen_rtx_REG (mode, V0_REGNUM);
>   }
> +  else if (aarch64_advsimd_full_struct_mode_p (mode)
> +&& known_eq (GET_MODE_SIZE (ag_mode), 16))
> + return gen_rtx_REG (mode, V0_REGNUM);
> +  else if (aarch64_advsimd_partial_struct_mode_p (mode)
> +&& known_eq (GET_MODE_SIZE (ag_mode), 8))
> + return gen_rtx_REG (mode, V0_REGNUM);
>else
>   {
> int i;
> @@ -6728,6 +6734,12 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const 
> function_arg_info &arg)
> gcc_assert (nregs == 1);
> pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>   }
> +   else if (aarch64_advsimd_full_struct_mode_p (mode)
> +&& known_eq (GET_MODE_SIZE (pcum->aapcs_vfp_rmode), 16))
> + pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> +   else if (aarch64_advsimd_partial_struct_mode_p (mode)
> +&& known_eq (GET_MODE_SIZE (pcum->aapcs_vfp_rmode), 8))
> + pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> else
>   {
> rtx par;
> diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c 
> b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> index 
> 89e9de18a92dbc00e58261e4558b3cff38c7ca75..100739ab4e67e27a7341b8b1a4ddd9494f0e181d
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> @@ -17,6 +17,14 @@ TEST_TBL (vqtbl2q, int8x16_t, int8x16x2_t, uint8x16_t, s8)
>  TEST_TBL (vqtbl2q, uint8x16_t, uint8x16x2_t, uint8x16_t, u8)
>  TEST_TBL (vqtbl2q, poly8x16_t, poly8x16x2_t, uint8x16_t, p8)
>  
> +TEST_TBL (vqtbl3, int8x8_t, int8x16x3_t, uint8x8_t, s8)
> +TEST_TBL (vqtbl3, uint8x8_t, uint8x16x3_t, uint8x8_t, u8)
> +TEST_TBL (vqtbl3, poly8x8_t, poly8x16x3_t, uint8x8_t, p8)
> +
> +TEST_TBL (vqtbl3q, int8x16_t, int8x16x3_t, uint8x16_t, s8)
> +TEST_TBL (vqtbl3q, uint8x16_t, uint8x16x3_t, uint8x16_t, u8)
> +TEST_TBL (vqtbl3q, poly8x16_t, poly8x16x3_t, uint8x16_t, p8)
> +
>  TEST_TBL (vqtbl4, int8x8_t, int8x16x4_t, uint8x8_t, s8)
>  TEST_TBL (vqtbl4, uint8x8_t, uint8x16x4_t, uint8x8_t, u8)
>  TEST_TBL (vqtbl4, poly8x8_t, poly8x16x4_t, uint8x8_t, p8)
> @@ -25,62 +33,35 @@ TEST_TBL (vqtbl4q, int8x16_t, int8x16x4_t, uint8x16_t, s8)
>  TEST_TBL (vqtbl4q, uint8x16_t, uint8x16x4_t, uint8x16_t, u8)
>  TEST_TBL (vqtbl4q, poly8x16_t, poly8x16x4_t, uint8x16_t, p8)
>  
> -#define TEST_TBL3(name, rettype, tbltype, idxtype, ts) \
> -  rettype test_ ## name ## _ ## ts (idxtype a, tbltype b) \
> - { \
> - return name ## _ ## ts (b, a); \
> - }
> -
> -TEST_TBL3 (vqtbl3, int8x8_t, int8x16x3_t, uint8x8_t, s8)
> -TEST_TBL3 (vqtbl3, uint8x8_t, uint8x16x3_t, uint8x8_t, u8)
> -TEST_TBL3 (vqtbl3, poly8x8_t, poly8x16x3_t, uint8x8_t, p8)
> -
> -TEST_TBL3 (vqtbl3q, int8x16_t, int8x16x3_t, uint8x16_t, s8)
> -TEST_TBL3 (vqtbl3q, uint8x16_t, uint8x16x3_t, uint8x16_t, u8)
> -TEST_TBL3 (vqtbl3q, poly8x16_t, poly8x16x3_t, uint8x16_t, p8)
> -
> -#define TEST_TBX2(name, rettype, tbltype, idxtype, ts) \
> -  rettype test_ ## name ## _ ## ts (rettype a, idxtype b, tbltype c) \
> - { \
> - return name ## _ ## ts (a, c, b); \
> - }
> -
> -TEST_TBX2 (vqtbx2, int8x8_t, int8x16x2_t, uint8x8_t, s8)
> -TEST_TBX2 (vqtbx2, uint8x8_t, uint8x16x2_t, uint8x8_t, u8)
> -TEST_TBX2 (vqtbx2, poly8x8_t, poly8x16x2_t, uint8x8_t, p8)
> -
> -TEST_TBX2 (vqtbx2q, int8x16_t, int8x16x2_t, uint8x16_t, s8)
> -TEST_TBX2 (vqtbx2q, uint8x16_t, uint8x16x2_t, uint8x16_t, u8)
> -TEST_TBX2 (vqtbx2q,

Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-22 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor  wrote:
>
> On 10/22/21 5:22 AM, Aldy Hernandez wrote:
> > On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor  wrote:
> >
> >> I'd like to see gimple-ssa-array-bounds invoked from the access
> >> pass too (instead of from VRP), and eventually -Wrestrict as well.
> >
> > You can do that right now.  The pass has been converted to the new API
> > and it would just require calling it with a ranger instead of the
> > vr_values from VRP:
> >
> >array_bounds_checker array_checker (fun, &vrp_vr_values);
> >array_checker.check ();
> >
> > That is, move it where you want and pass it a fresh new gimple_ranger.
> > If there are any regressions, we'd be glad to look at them.
>
> I appreciate that and I'm not worried about regressions due to
> ranger code.
>
> It's not so simple as it seems because of the optimizer
> dependencies I mentioned.  -Warray-bounds runs before vectorization
> and the access pass after it.  Moving it into the access pass will
> mean dealing with the fallout: either accepting regressions in
> the quality of warnings (bad locations due to vectorization
> merging distinct stores into one) or running the access pass at
> a different point in the pipeline, and facing regressions in
> the other warnings due to that.  Running it twice, once earlier
> for -Warray-bounds and then again later for -Wstringop-overflow
> etc, would be less than optimal because they all do the same
> thing (compute object sizes and offsets) and should be able to
> share the same data (the pointer query cache).  So the ideal
> solution is to find a middle ground where all these warnings
> can run from the same pass with optimal results.
>
> -Warray-bounds might also need to be adjusted for -O0 to avoid
> warning on unreachable code, although, surprisingly, that hasn't
> been an issue for the other warnings now enabled at -O0.
>
> All this will take some time, which I'm out of for this stage 1.
>
> >
> >> I'm not sure about the strlen/sprintf warnings; those might need
> >> to stay where they are because they run as part of the optimizers
> >> there.
> >>
> >> (By the way, I don't see range info in the access pass at -O0.
> >> Should I?)
> >
> > I assume you mean you don't see anything in the dump files.
>
> I mean that I don't get accurate range info from the ranger
> instance in any function.  I'd like the example below to trigger
> a warning even at -O0 but it doesn't because n's range is
> [0, UINT_MAX] instead of [7, UINT_MAX]:
>
>char a[4];
>
>void f (unsigned n)
>{
>  if (n < 7)
>n = 7;
>  __builtin_memset (a, 0, n);
>}

Breakpoint 5, get_size_range (query=0x0, bound=, range=0x7fffda10,
bndrng=0x7fffdc98) at
/home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196
(gdb) p debug_ranger()
;; Function f

=== BB 2 
Imports: n_3(D)
Exports: n_3(D)
n_3(D)unsigned int VARYING
 :
if (n_3(D) <= 6)
  goto ; [INV]
else
  goto ; [INV]

2->3  (T) n_3(D) : unsigned int [0, 6]
2->4  (F) n_3(D) : unsigned int [7, +INF]

=== BB 3 
 :
n_4 = 7;

n_4 : unsigned int [7, 7]

=== BB 4 
 :
# n_2 = PHI 
_1 = (long unsigned int) n_2;
__builtin_memset (&a, 0, _1);
return;

_1 : long unsigned int [7, 4294967295]
n_2 : unsigned int [7, +INF]
Non-varying global ranges:
=:
_1  : long unsigned int [7, 4294967295]
n_2  : unsigned int [7, +INF]
n_4  : unsigned int [7, 7]

From the above it looks like _1 at BB4 is [7, 4294967295].  You probably want:

  range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call)
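
To see where [7, +INF] at the memset comes from: judging by the dump, n_2
merges the value arriving on the false edge 2->4 ([7, +INF]) with the
value assigned in BB 3 ([7, 7]).  A toy C model of that merge, not ranger
code (struct urange, urange_union and range_at_memset are made-up names
for illustration only):

```c
#include <limits.h>

/* Toy closed interval over unsigned int; not GCC's irange.  */
struct urange { unsigned lo, hi; };

/* Smallest single interval covering both inputs.  */
static struct urange
urange_union (struct urange a, struct urange b)
{
  struct urange r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}

/* Range of n_2 at the memset call in f(): the union of the value
   arriving on edge 2->4 and the value assigned in BB 3.  */
static struct urange
range_at_memset (void)
{
  struct urange on_false_edge = { 7, UINT_MAX };  /* n_3(D) when !(n <= 6) */
  struct urange from_bb3 = { 7, 7 };              /* n_4 = 7 */
  return urange_union (on_false_edge, from_bb3);
}
```

The point is only that the lower bound of 7 is a property of that
particular program point; a query with no statement context has nothing
to tie it to BB 4.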

BTW, debug_ranger() tells you everything ranger would know for the
given IL.  It's meant as a debugging aid.  You may want to look at
its source to see how it calls the ranger.

Aldy



Re: [PATCH][WIP] Add install-dvi Makefile targets

2021-10-22 Thread Jeff Law via Gcc-patches




On 10/18/2021 7:30 PM, Eric Gallager wrote:

On Tue, Oct 12, 2021 at 5:09 PM Eric Gallager  wrote:

On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager  wrote:

Currently the build machinery handles install-pdf and install-html
targets, but no install-dvi target. This patch is a step towards
fixing that. Note that I have only tested with
--enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will
probably also have to be added for the languages I skipped.
Also, please note that this patch applies on top of:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html

ChangeLog:

2016-10-06  Eric Gallager  

 * Makefile.def: Handle install-dvi target.
 * Makefile.tpl: Likewise.
 * Makefile.in: Regenerate.

gcc/ChangeLog:

2016-10-06  Eric Gallager  

 * Makefile.in: Handle dvidir and install-dvi target.
 * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi
 target hooks.
 * configure.ac: Handle install-dvi target.
 * configure: Regenerate.

libiberty/ChangeLog:

2016-10-06  Eric Gallager  

 * Makefile.in: Handle dvidir and install-dvi target.
 * functions.texi: Regenerate.

Ping. The prerequisite patch that I linked to previously has gone in now.
I'm not sure if this specific patch still applies, though.
Also note that I've opened a bug to track this issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663

Hi, I have updated this patch and tested it with more languages now; I
can now confirm that it works with ada, d, and fortran now. The only
languages that remain untested now are go (since I'm building on
darwin and go doesn't build on darwin anyways, as per bug 46986) and
jit (which I ran into a bug about that I brought up on IRC, and will
probably need to file on bugzilla). OK to install?
Yea, I think this is OK.  We might need to adjust go/jit and perhaps 
other toplevel modules, but if those do show up as problems I think we 
can fault in those fixes.


jeff


Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-10-22 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists) via Gcc-patches"  writes:
> Hi,
>
> This patch changes the order in which we check outside and inside costs 
> for epilogue loops, this is to ensure that a predicated epilogue is more 
> likely to be picked over an unpredicated one, since it saves having to 
> enter a scalar epilogue loop.
>
> gcc/ChangeLog:
>
>      * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how 
> epilogue loop costs are compared.

OK, thanks.  Sorry for the slow review.
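
For readers skimming the diff, the branch that handles a known iteration
count boils down to the arithmetic below.  This is a simplified
standalone sketch, not the GCC code: struct epilogue and its fields are
illustrative stand-ins for the data read from a loop_vec_info, and
niters is the number of scalar iterations left over from the main loop.

```c
/* Illustrative stand-in for the data the patch reads from a
   loop_vec_info; not a GCC type.  */
struct epilogue
{
  unsigned long vf;            /* likely vectorization factor */
  int partial_vectors;         /* nonzero if the loop is predicated */
  unsigned long inside_cost;   /* cost of one vector iteration */
  unsigned long outside_cost;  /* cost outside the loop body */
};

/* Expected number of vector iterations for NITERS leftover scalar
   iterations.  A predicated loop also executes the final partial
   iteration itself.  */
static unsigned long
epilogue_factor (const struct epilogue *e, unsigned long niters)
{
  unsigned long factor = niters / e->vf;
  if (e->partial_vectors && niters % e->vf != 0)
    factor++;
  return factor;
}

/* Total cost: inside cost times the iteration factor, plus outside cost.  */
static unsigned long
epilogue_cost (const struct epilogue *e, unsigned long niters)
{
  return e->inside_cost * epilogue_factor (e, niters) + e->outside_cost;
}

/* Nonzero if NEW_E is cheaper than OLD_E for NITERS iterations.  */
static int
new_epilogue_better_p (const struct epilogue *old_e,
                       const struct epilogue *new_e,
                       unsigned long niters)
{
  return epilogue_cost (new_e, niters) < epilogue_cost (old_e, niters);
}
```

With made-up costs, an unpredicated VF-4 epilogue (inside 10, outside 5)
costs 15 for 6 leftover iterations, while a predicated VF-8 epilogue
(inside 8, outside 2) costs 10, so the predicated one would be chosen.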

Richard

> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 
> 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38
>  100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info 
> new_loop_vinfo,
>   return new_simdlen_p;
>  }
>  
> +  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
> +  if (main_loop)
> +{
> +  poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
> +  unsigned HOST_WIDE_INT main_vf;
> +  unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
> +  /* If we can determine how many iterations are left for the epilogue
> +  loop, that is if both the main loop's vectorization factor and number
> +  of iterations are constant, then we use them to calculate the cost of
> +  the epilogue loop together with a 'likely value' for the epilogues
> +  vectorization factor.  Otherwise we use the main loop's vectorization
> +  factor and the maximum poly value for the epilogue's.  If the target
> +  has not provided with a sensible upper bound poly vectorization
> +  factors are likely to be favored over constant ones.  */
> +  if (main_poly_vf.is_constant (&main_vf)
> +   && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
> + {
> +   unsigned HOST_WIDE_INT niters
> + = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
> +   HOST_WIDE_INT old_likely_vf
> + = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
> +   HOST_WIDE_INT new_likely_vf
> + = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
> +
> +   /* If the epilogue is using partial vectors we account for the
> +  partial iteration here too.  */
> +   old_factor = niters / old_likely_vf;
> +   if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
> +   && niters % old_likely_vf != 0)
> + old_factor++;
> +
> +   new_factor = niters / new_likely_vf;
> +   if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
> +   && niters % new_likely_vf != 0)
> + new_factor++;
> + }
> +  else
> + {
> +   unsigned HOST_WIDE_INT main_vf_max
> + = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
> +
> +   old_factor = main_vf_max / estimated_poly_value (old_vf,
> +POLY_VALUE_MAX);
> +   new_factor = main_vf_max / estimated_poly_value (new_vf,
> +POLY_VALUE_MAX);
> +
> +   /* If the loop is not using partial vectors then it will iterate one
> +  time less than one that does.  It is safe to subtract one here,
> +  because the main loop's vf is always at least 2x bigger than that
> +  of an epilogue.  */
> +   if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
> + old_factor -= 1;
> +   if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))
> + new_factor -= 1;
> + }
> +
> +  /* Compute the costs by multiplying the inside costs with the factor 
> and
> +  add the outside costs for a more complete picture.  The factor is the
> +  amount of times we are expecting to iterate this epilogue.  */
> +  old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
> +  new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
> +  old_cost += old_loop_vinfo->vec_outside_cost;
> +  new_cost += new_loop_vinfo->vec_outside_cost;
> +  return new_cost < old_cost;
> +}
> +
>/* Limit the VFs to what is likely to be the maximum number of iterations,
>   to handle cases in which at least one loop_vinfo is fully-masked.  */
> -  HOST_WIDE_INT estimated_max_niter;
> -  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
> -  unsigned HOST_WIDE_INT main_vf;
> -  if (main_loop
> -  && LOOP_VINFO_NITERS_KNOWN_P (main_loop)
> -  && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (&main_vf))
> -estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
> -  else
> -estimated_max_niter = likely_max_stmt_executions_int (loop);
> +  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
>if (estimated_max_niter != -1)
>  {
>if (known_le (estimated_max_niter, new_vf))


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-22 Thread Martin Sebor via Gcc-patches

On 10/22/21 9:18 AM, Aldy Hernandez wrote:

On Fri, Oct 22, 2021 at 4:27 PM Martin Sebor  wrote:


On 10/22/21 5:22 AM, Aldy Hernandez wrote:

On Thu, Oct 21, 2021 at 4:51 PM Martin Sebor  wrote:


I'd like to see gimple-ssa-array-bounds invoked from the access
pass too (instead of from VRP), and eventually -Wrestrict as well.


You can do that right now.  The pass has been converted to the new API
and it would just require calling it with a ranger instead of the
vr_values from VRP:

array_bounds_checker array_checker (fun, &vrp_vr_values);
array_checker.check ();

That is, move it where you want and pass it a fresh new gimple_ranger.
If there are any regressions, we'd be glad to look at them.


I appreciate that and I'm not worried about regressions due to
ranger code.

It's not so simple as it seems because of the optimizer
dependencies I mentioned.  -Warray-bounds runs before vectorization
and the access pass after it.  Moving it into the access pass will
mean dealing with the fallout: either accepting regressions in
the quality of warnings (bad locations due to vectorization
merging distinct stores into one) or running the access pass at
a different point in the pipeline, and facing regressions in
the other warnings due to that.  Running it twice, once earlier
for -Warray-bounds and then again later for -Wstringop-overflow
etc, would be less than optimal because they all do the same
thing (compute object sizes and offsets) and should be able to
share the same data (the pointer query cache).  So the ideal
solution is to find a middle ground where all these warnings
can run from the same pass with optimal results.

-Warray-bounds might also need to be adjusted for -O0 to avoid
warning on unreachable code, although, surprisingly, that hasn't
been an issue for the other warnings now enabled at -O0.

All this will take some time, which I'm out of for this stage 1.




I'm not sure about the strlen/sprintf warnings; those might need
to stay where they are because they run as part of the optimizers
there.

(By the way, I don't see range info in the access pass at -O0.
Should I?)


I assume you mean you don't see anything in the dump files.


I mean that I don't get accurate range info from the ranger
instance in any function.  I'd like the example below to trigger
a warning even at -O0 but it doesn't because n's range is
[0, UINT_MAX] instead of [7, UINT_MAX]:

char a[4];

void f (unsigned n)
{
  if (n < 7)
n = 7;
  __builtin_memset (a, 0, n);
}


Breakpoint 5, get_size_range (query=0x0, bound=, range=0x7fffda10,
 bndrng=0x7fffdc98) at
/home/aldyh/src/gcc/gcc/gimple-ssa-warn-access.cc:1196
(gdb) p debug_ranger()
;; Function f

=== BB 2 
Imports: n_3(D)
Exports: n_3(D)
n_3(D)unsigned int VARYING
  :
 if (n_3(D) <= 6)
   goto ; [INV]
 else
   goto ; [INV]

2->3  (T) n_3(D) : unsigned int [0, 6]
2->4  (F) n_3(D) : unsigned int [7, +INF]

=== BB 3 
  :
 n_4 = 7;

n_4 : unsigned int [7, 7]

=== BB 4 
  :
 # n_2 = PHI 
 _1 = (long unsigned int) n_2;
 __builtin_memset (&a, 0, _1);
 return;

_1 : long unsigned int [7, 4294967295]
n_2 : unsigned int [7, +INF]
Non-varying global ranges:
=:
_1  : long unsigned int [7, 4294967295]
n_2  : unsigned int [7, +INF]
n_4  : unsigned int [7, 7]

 From the above it looks like _1 at BB4 is [7, 4294967295].


Great!


  You probably want:

   range_of_expr (r, tree_for_ssa_1, gimple_for_the_memset_call)


That's what the function does.  But its caller doesn't have
access to the Gimple statement so it passes in null instead.
Presumably without it, range_of_expr() doesn't have enough
context to know what BB I'm asking about.  It does work
without the statement at -O but then there's just one BB
(the if statement becomes a MAX_EXPR) so there's just one
range.



BTW, debug_ranger() tells you everything ranger would know for the
given IL.  It's meant as a debugging aid.  You may want to look at
its source to see how it calls the ranger.


Thanks for the tip.  I should do that.  There's a paradigm
shift from the old ways of working with ranges and the new
way, and it will take a bit of adjusting to.  I just haven't
spent enough time working with Ranger to be there.  But this
exchange alone was already very helpful!

Martin


[PATCH] rs6000: Add optimizations for _mm_sad_epu8

2021-10-22 Thread Paul A. Clarke via Gcc-patches
The Power9 ISA added the `vabsdub` instruction, which is realized in the
`vec_absd` intrinsic.

Use `vec_absd` for the `_mm_sad_epu8` compatibility intrinsic when
`_ARCH_PWR9` is defined.

Also, the realization of `vec_sum2s` on little-endian includes
two shifts in order to position the input and output to match
the semantics of `vec_sum2s`:
- Shift the second input vector left 12 bytes. In the current usage,
  that vector is `{0}`, so this shift is unnecessary, but is currently
  not eliminated under optimization.
- Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
  The two words within each doubleword of this (shifted) result must then
  be explicitly swapped to match the semantics of `_mm_sad_epu8`,
  effectively reversing this shift.  So, this shift (and a subsequent swap)
  are unnecessary, but not currently removed under optimization.

Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
option for removing the shifts.

For little-endian, use the `vsum2sws` instruction directly, and
eliminate the explicit shift (swap).
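
For reference, the result that `_mm_sad_epu8` has to produce can be
modeled in scalar C.  This is only a sketch of the x86 semantics (the
sum of absolute byte differences over each 8-byte half, stored in the
corresponding 64-bit lane), written with the same max-minus-min
formulation as the pre-Power9 path.  `sad_epu8_ref` is a hypothetical
helper name and is not part of the patch.

```c
#include <stdint.h>

/* Scalar model of _mm_sad_epu8: for each 8-byte half of the inputs,
   sum |a[i] - b[i]| and store the sum in the matching 64-bit lane.  */
static void
sad_epu8_ref (const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
  for (int half = 0; half < 2; half++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        {
          uint8_t x = a[half * 8 + i];
          uint8_t y = b[half * 8 + i];
          /* max - min is the unsigned absolute difference, i.e. the
             per-byte value that vec_absd yields on Power9.  */
          uint8_t vmax = x > y ? x : y;
          uint8_t vmin = x < y ? x : y;
          sum += (uint8_t) (vmax - vmin);
        }
      out[half] = sum;
    }
}
```

Each lane holds at most 8 * 255 = 2040, so only the low 16 bits of each
64-bit result are ever nonzero.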

2021-10-22  Paul A. Clarke  

gcc
* config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
when _ARCH_PWR9, optimize vec_sum2s when LE.
---
Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`,
and on powerpc/powerpc64-linux on Power8.

OK for trunk?

 gcc/config/rs6000/emmintrin.h | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ab16c13c379e..c4758be0e777 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -2197,27 +2197,37 @@ extern __inline __m128i __attribute__((__gnu_inline__, 
__always_inline__, __arti
 _mm_sad_epu8 (__m128i __A, __m128i __B)
 {
   __v16qu a, b;
-  __v16qu vmin, vmax, vabsdiff;
+  __v16qu vabsdiff;
   __v4si vsum;
   const __v4su zero = { 0, 0, 0, 0 };
   __v4si result;
 
   a = (__v16qu) __A;
   b = (__v16qu) __B;
-  vmin = vec_min (a, b);
-  vmax = vec_max (a, b);
+#ifndef _ARCH_PWR9
+  __v16qu vmin = vec_min (a, b);
+  __v16qu vmax = vec_max (a, b);
   vabsdiff = vec_sub (vmax, vmin);
+#else
+  vabsdiff = vec_absd (a, b);
+#endif
   /* Sum four groups of bytes into integers.  */
   vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
+#ifdef __LITTLE_ENDIAN__
+  /* Sum across four integers with two integer results.  */
+  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
+  /* Note: vec_sum2s could be used here, but on little-endian, vector
+ shifts are added that are not needed for this use-case.
+ A vector shift to correctly position the 32-bit integer results
+ (currently at [0] and [2]) to [1] and [3] would then need to be
+ swapped back again since the desired results are two 64-bit
+ integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
+#else
   /* Sum across four integers with two integer results.  */
   result = vec_sum2s (vsum, (__vector signed int) zero);
   /* Rotate the sums into the correct position.  */
-#ifdef __LITTLE_ENDIAN__
-  result = vec_sld (result, result, 4);
-#else
   result = vec_sld (result, result, 6);
 #endif
-  /* Rotate the sums into the correct position.  */
   return (__m128i) result;
 }
 
-- 
2.27.0



Re: [PATCH] Handle jobserver file descriptors in btest.

2021-10-22 Thread Ian Lance Taylor via Gcc-patches
On Fri, Oct 22, 2021 at 1:15 AM Martin Liška  wrote:
>
> On 10/21/21 20:15, Ian Lance Taylor wrote:
> > On Thu, Oct 21, 2021 at 12:48 AM Martin Liška  wrote:
> >>
> >> The patch is about sensitive handling of file descriptors opened
> >> by make's jobserver.
> >
> > Thanks.  I think a better approach would be, at the start of main,
> > fstat the descriptors up to 10 and record the ones for which fstat
> > succeeds.  Then at the end of main only check the descriptors for
> > which fstat failed earlier.
>
> Sure, makes sense.
>
> >
> > I can work on that at some point if you don't want to tackle it.
>
> I've just done that in the attached patch.
>
> Is it fine?

This is OK.
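
For anyone reading along without the attachment, the approach discussed
above can be sketched like this.  It is a minimal illustration under my
own assumptions, not the code from the patch: the helper names are
hypothetical, and I take "up to 10" to mean descriptors 0 through 9.

```c
#include <sys/stat.h>

#define MAX_CHECKED_FD 10

/* open_at_start[fd] is nonzero if fd was already open when it was
   recorded (for example, inherited from make's jobserver).  */
static int open_at_start[MAX_CHECKED_FD];

/* Call at the start of main.  */
static void
record_open_fds (void)
{
  struct stat st;
  for (int fd = 0; fd < MAX_CHECKED_FD; fd++)
    open_at_start[fd] = (fstat (fd, &st) == 0);
}

/* Call at the end of main.  Returns the number of descriptors that
   are open now but were not open at startup, i.e. leaked ones.  */
static int
count_leaked_fds (void)
{
  struct stat st;
  int leaked = 0;
  for (int fd = 0; fd < MAX_CHECKED_FD; fd++)
    if (!open_at_start[fd] && fstat (fd, &st) == 0)
      leaked++;
  return leaked;
}
```

Descriptors that were open before the test started are simply ignored,
so inherited jobserver pipes no longer look like leaks.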

Thanks.

Ian


[PATCH] sra: Fix the fix for PR 102505 (PR 102886)

2021-10-22 Thread Martin Jambor
Hi,

I was not careful with the fix for PR 102505 and did not craft the
check to satisfy the verifier carefully, which led to PR 102886.
(The verifier has the test structured differently and somewhat
redundantly, so I could not just copy it).

This patch fixes it.  I hope it is a quite obvious correction of an
oversight and so I will commit it if it survives bootstrap and testing
on x86_64-linux and ppc64le-linux.
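
To illustrate the oversight with made-up numbers: pos is an absolute bit
position (it already includes root->offset), so it has to be compared
with the absolute end of the access, root->offset + root->size, and not
with root->size alone.  A standalone sketch, using plain long instead of
HOST_WIDE_INT and the hypothetical names field_fits_old and
field_fits_new:

```c
/* All values are in bits.  POS is absolute, i.e. the root offset plus
   the field's position within the type, as in
   totally_scalarize_subtree.  */

/* The check from the PR 102505 fix: it wrongly compares an absolute
   end position with a relative size.  */
static int
field_fits_old (long pos, long fsize, long root_offset, long root_size)
{
  (void) root_offset;
  return !(pos + fsize > root_size);
}

/* The corrected check: compare against the absolute end of ROOT.  */
static int
field_fits_new (long pos, long fsize, long root_offset, long root_size)
{
  return !(pos + fsize > root_offset + root_size);
}
```

For a root at offset 64 with size 64, a 32-bit field at relative
position 0 has pos = 64.  The old check sees 96 > 64 and gives up; the
new one sees 96 <= 128 and carries on.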

Testcase for this bug is gcc.dg/tree-ssa/sra-18.c (but only on
platforms with constant pools).  I will backport the two fixes
to the release branches squashed.

Sorry for the stupid mistake,

Martin


gcc/ChangeLog:

2021-10-22  Martin Jambor  

PR tree-optimization/102886
* tree-sra.c (totally_scalarize_subtree): Fix the out of
access-condition.
---
 gcc/tree-sra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index f561e1a2133..76e3aae405c 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3288,7 +3288,7 @@ totally_scalarize_subtree (struct access *root)
  continue;
 
HOST_WIDE_INT pos = root->offset + int_bit_position (fld);
-   if (pos + fsize > root->size)
+   if (pos + fsize > root->offset + root->size)
  return false;
enum total_sra_field_state
  state = total_should_skip_creating_access (root,
-- 
2.33.0



[Fortran, committed] Add testcase for PR 94289

2021-10-22 Thread Sandra Loosemore
I've committed this slightly cleaned-up version of the testcase 
originally submitted with the now-fixed issue PR 94289.


-Sandra
commit c31d2d14f798dc7ca9cc078200d37113749ec3bd
Author: Sandra Loosemore 
Date:   Fri Oct 22 11:08:19 2021 -0700

Add testcase for PR fortran/94289

2021-10-22  José Rui Faustino de Sousa  
	Sandra Loosemore  

	gcc/testsuite/

	PR fortran/94289
	* gfortran.dg/PR94289.f90: New.

diff --git a/gcc/testsuite/gfortran.dg/PR94289.f90 b/gcc/testsuite/gfortran.dg/PR94289.f90
new file mode 100644
index 000..4f17d97
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR94289.f90
@@ -0,0 +1,168 @@
+! { dg-do run }
+!
+! Testcase for PR 94289
+!
+! - if the dummy argument is a pointer/allocatable, it has the same 
+!   bounds as the dummy argument
+! - if it is nonallocatable nonpointer, the lower bounds are [1, 1, 1].
+
+module bounds_m
+
+  implicit none
+
+  private
+  public :: &
+lb, ub
+
+  public :: &
+bnds_p, &
+bnds_a, &
+bnds_e
+
+  integer, parameter :: lb1 = 3
+  integer, parameter :: lb2 = 5
+  integer, parameter :: lb3 = 9
+  integer, parameter :: ub1 = 4
+  integer, parameter :: ub2 = 50
+  integer, parameter :: ub3 = 11
+  integer, parameter :: ex1 = ub1 - lb1 + 1
+  integer, parameter :: ex2 = ub2 - lb2 + 1
+  integer, parameter :: ex3 = ub3 - lb3 + 1
+
+  integer, parameter :: lf(*) = [1,1,1]
+  integer, parameter :: lb(*) = [lb1,lb2,lb3]
+  integer, parameter :: ub(*) = [ub1,ub2,ub3]
+  integer, parameter :: ex(*) = [ex1,ex2,ex3]
+
+contains
+
+  subroutine bounds(a, lb, ub)
+integer, pointer, intent(in) :: a(..)
+integer,  intent(in) :: lb(3)
+integer,  intent(in) :: ub(3)
+
+integer :: ex(3)
+
+ex = max(ub-lb+1, 0)
+if(any(lbound(a)/=lb)) stop 101
+if(any(ubound(a)/=ub)) stop 102
+if(any( shape(a)/=ex)) stop 103
+return
+  end subroutine bounds
+
+  subroutine bnds_p(this)
+integer, pointer, intent(in) :: this(..)
+
+if(any(lbound(this)/=lb)) stop 1
+if(any(ubound(this)/=ub)) stop 2
+if(any( shape(this)/=ex)) stop 3
+call bounds(this, lb, ub)
+return
+  end subroutine bnds_p
+  
+  subroutine bnds_a(this)
+integer, allocatable, target, intent(in) :: this(..)
+
+if(any(lbound(this)/=lb)) stop 4
+if(any(ubound(this)/=ub)) stop 5
+if(any( shape(this)/=ex)) stop 6
+call bounds(this, lb, ub)
+return
+  end subroutine bnds_a
+  
+  subroutine bnds_e(this)
+integer, target, intent(in) :: this(..)
+
+if(any(lbound(this)/=lf)) stop 7
+if(any(ubound(this)/=ex)) stop 8
+if(any( shape(this)/=ex)) stop 9
+call bounds(this, lf, ex)
+return
+  end subroutine bnds_e
+  
+end module bounds_m
+
+program bounds_p
+
+  use, intrinsic :: iso_c_binding, only: c_int
+  
+  use bounds_m
+  
+  implicit none
+
+  integer, parameter :: fpn = 1
+  integer, parameter :: fan = 2
+  integer, parameter :: fon = 3
+
+  integer :: i
+  
+  do i = fpn, fon
+call test_p(i)
+  end do
+  do i = fpn, fon
+call test_a(i)
+  end do
+  do i = fpn, fon
+call test_e(i)
+  end do
+  stop
+
+contains
+
+  subroutine test_p(t)
+integer, intent(in) :: t
+
+integer, pointer :: a(:,:,:)
+
+allocate(a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3)))
+select case(t)
+case(fpn)
+  call bnds_p(a)
+case(fan)
+case(fon)
+  call bnds_e(a)
+case default
+  stop
+end select
+deallocate(a)
+return
+  end subroutine test_p
+
+  subroutine test_a(t)
+integer, intent(in) :: t
+
+integer, allocatable, target :: a(:,:,:)
+
+allocate(a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3)))
+select case(t)
+case(fpn)
+  call bnds_p(a)
+case(fan)
+  call bnds_a(a)
+case(fon)
+  call bnds_e(a)
+case default
+  stop
+end select
+deallocate(a)
+return
+  end subroutine test_a
+
+  subroutine test_e(t)
+integer, intent(in) :: t
+
+integer, target :: a(lb(1):ub(1),lb(2):ub(2),lb(3):ub(3))
+
+select case(t)
+case(fpn)
+  call bnds_p(a)
+case(fan)
+case(fon)
+  call bnds_e(a)
+case default
+  stop
+end select
+return
+  end subroutine test_e
+
+end program bounds_p


[PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467

2021-10-22 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

the recently introduced shape validation for array components
in DT constructors did not properly deal with invalid code
created by ingenious testers.

Obvious solution: replace the gcc_assert by a suitable error message.
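
The pattern can be sketched outside the compiler. This is a toy Python model, not gfortran's code: `component_extents` and its arguments are hypothetical, and a non-integer bound stands in for a bound that is not a constant expression. Each extent is computed as 1 + upper - lower, and a bad bound is recorded as an error instead of tripping an assertion.

```python
def component_extents(name, bounds, errors):
    """Return per-dimension extents, or None after recording an error.

    bounds is a list of (lower, upper) pairs; anything that is not an int
    stands in for a bound that is not a constant expression.
    """
    extents = []
    for lower, upper in bounds:
        if not isinstance(lower, int) or not isinstance(upper, int):
            # Invalid user code: diagnose and stop instead of asserting.
            errors.append(f"Bad array spec of component '{name}'")
            return None
        extents.append(1 + upper - lower)
    return extents


errors = []
print(component_extents("a", [(1, 2), (0, 3)], errors))  # [2, 4]
print(component_extents("a", [(1, "[2]")], errors))      # None
print(errors)
```

The caller can then keep resolving the rest of the program and report every problem it finds, which an assertion failure would prevent.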

Regarding the error message: before the shape validation, gfortran
would emit the same error message twice referring to the same line,
namely the bad declaration of the component.  With the attached patch
we get one error message for the bad declaration of the component,
and one for the structure constructor referring to that DT component.
One could easily change that and make the second message refer to the
same as the declaration, giving two errors for the same line.

Comments / opinions?

Regtested on x86_64-pc-linux-gnu.  OK?

Thanks,
Harald

Fortran: error recovery on initializing invalid derived type array component

gcc/fortran/ChangeLog:

	PR fortran/102816
	* resolve.c (resolve_structure_cons): Reject invalid array spec of
	a DT component referred in a structure constructor.

gcc/testsuite/ChangeLog:

	PR fortran/102816
	* gfortran.dg/pr102816.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 5ccd9072c24..dc4ca5ef818 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -1463,8 +1463,15 @@ resolve_structure_cons (gfc_expr *expr, int init)
 	  mpz_init (len);
 	  for (int n = 0; n < rank; n++)
 	{
-	  gcc_assert (comp->as->upper[n]->expr_type == EXPR_CONSTANT
-			  && comp->as->lower[n]->expr_type == EXPR_CONSTANT);
+	  if (comp->as->upper[n]->expr_type != EXPR_CONSTANT
+		  || comp->as->lower[n]->expr_type != EXPR_CONSTANT)
+		{
+		  gfc_error ("Bad array spec of component %qs referred in "
+			 "structure constructor at %L",
+			 comp->name, &cons->expr->where);
+		  t = false;
+		  break;
+		};
 	  mpz_set_ui (len, 1);
 	  mpz_add (len, len, comp->as->upper[n]->value.integer);
 	  mpz_sub (len, len, comp->as->lower[n]->value.integer);
diff --git a/gcc/testsuite/gfortran.dg/pr102816.f90 b/gcc/testsuite/gfortran.dg/pr102816.f90
new file mode 100644
index 000..46831743b2b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr102816.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/102816
+
+program p
+  type t
+ integer :: a([2]) ! { dg-error "must be scalar" }
+  end type
+  type(t) :: x = t([3, 4]) ! { dg-error "Bad array spec of component" }
+end


Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

Am 22.10.21 um 15:06 schrieb Tobias Burnus:


https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html


PR100103 - Automatic reallocation fails inside select rank
Still segfaults at runtime for 'that = a' where the RHS is a parameter
and the LHS an allocatable assumed-rank array (inside select rank).

TODO: Review patch


this one LGTM.

Thanks for the patch!

Harald


Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Harald Anlauf via Gcc-patches

Hi Tobias, José,

Am 22.10.21 um 15:06 schrieb Tobias Burnus:


https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html


PR100136 - ICE, regression, using flag -fcheck=pointer

First testcase has an ICE with -fcheck=pointer
Second testcase has always an ICE + possibly missing func.

TODO: Review patch – and probably: follow-up patch for remaining issue


I think this LGTM.

Thanks for the patch!

Harald


Re: José's pending bind(C) patches / status (was: Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid)

2021-10-22 Thread Harald Anlauf via Gcc-patches

Hi Tobias, José,

Am 22.10.21 um 15:06 schrieb Tobias Burnus:


https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html


PR100245 - ICE on automatic reallocation.
Still ICEs

TODO: Review patch.


this one works and LGTM.

Thanks for the patch!

Harald



Re: [PATCH] Port update-copyright.py to Python3

2021-10-22 Thread Thomas Schwinge
Hi!

On 2021-01-04T11:15:22+0100, Martin Liška  wrote:
> The patch ports the script to Python3.

Turns out, there is another issue, observed in combination with a
few "BadYear" occurrences due to "improper" copyright lines (Bill,
for your information).  OK to push "Fix 'contrib/update-copyright.py':
'TypeError: exceptions must derive from BaseException'" as well as
"Fix 'Copyright (C) 2020-21' into '2020-2021'", see attached?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 3cffeead3b7f900999ec7885ae044e63e44deff3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 22 Oct 2021 15:54:42 +0200
Subject: [PATCH] Fix 'contrib/update-copyright.py': 'TypeError: exceptions
 must derive from BaseException'

Running 'contrib/update-copyright.py' currently fails:

[...]
Traceback (most recent call last):
  File "contrib/update-copyright.py", line 365, in update_copyright
canon_form = self.canonicalise_years (dir, filename, filter, years)
  File "contrib/update-copyright.py", line 270, in canonicalise_years
(min_year, max_year) = self.year_range (years)
  File "contrib/update-copyright.py", line 253, in year_range
year_list = [self.parse_year (year)
  File "contrib/update-copyright.py", line 253, in <listcomp>
year_list = [self.parse_year (year)
  File "contrib/update-copyright.py", line 250, in parse_year
raise self.BadYear (string)
TypeError: exceptions must derive from BaseException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "contrib/update-copyright.py", line 796, in <module>
GCCCmdLine().main()
  File "contrib/update-copyright.py", line 527, in main
self.copyright.process_tree (dir, filter)
  File "contrib/update-copyright.py", line 458, in process_tree
self.process_file (dir, filename, filter)
  File "contrib/update-copyright.py", line 421, in process_file
res = self.update_copyright (dir, filename, filter,
  File "contrib/update-copyright.py", line 366, in update_copyright
except self.BadYear as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
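
Both TypeErrors in the traceback above follow from a Python 3 rule: only classes derived from BaseException can be raised or caught. A minimal standalone sketch of that rule, not taken from the script (the names `PlainBadYear` and `try_raise` are illustrative only):

```python
class PlainBadYear:
    """Mirrors the pre-fix shape: a plain class, not an exception."""
    def __init__(self, year):
        self.year = year


class BadYear(Exception):
    """Mirrors the fixed shape: derives from Exception."""
    def __init__(self, year):
        self.year = year


def try_raise(cls, year):
    """Raise cls(year) and return the name of the exception type that surfaces."""
    try:
        raise cls(year)
    except Exception as e:
        return type(e).__name__


print(try_raise(PlainBadYear, "21"))  # TypeError
print(try_raise(BadYear, "21"))       # BadYear
```

With the plain class, the `raise` statement itself fails, so the intended "bad year" signal never reaches the handler.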

Fix up for commit 3b25e83536bcd1b2977659a2c6d9f0f9bf2a3152
"Port update-copyright.py to Python3".

	contrib/
	* update-copyright.py (class BadYear): Derive from 'Exception'.
---
 contrib/update-copyright.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
index 2b2bb11d2e6..d13b963a147 100755
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 #
-# Copyright (C) 2013-2020 Free Software Foundation, Inc.
+# Copyright (C) 2013-2021 Free Software Foundation, Inc.
 #
 # This script is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -233,7 +233,7 @@ class Copyright:
 def add_external_author (self, holder):
 self.holders[holder] = None
 
-class BadYear():
+class BadYear (Exception):
 def __init__ (self, year):
 self.year = year
 
-- 
2.33.0

>From 881f3e7701ab7ae5269db72cb33a7879b7e94e09 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 22 Oct 2021 16:01:54 +0200
Subject: [PATCH] Fix 'Copyright (C) 2020-21' into '2020-2021'

'contrib/update-copyright.py' currently complains:

gcc/config/rs6000/rs6000-gen-builtins.c: unrecognised year: 21
gcc/config/rs6000/rs6000-overload.def: unrecognised year: 21
gcc/config/rs6000/rbtree.h: unrecognised year: 21
gcc/config/rs6000/rbtree.c: unrecognised year: 21
gcc/config/rs6000/rs6000-builtin-new.def: unrecognised year: 21
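
The rewrite applied to these headers can be sketched as a small standalone helper. This is a hypothetical illustration, not part of 'contrib/update-copyright.py'; `expand_short_year_range` is an invented name, and it assumes the start and end year fall in the same century.

```python
import re

# Matches "Copyright (C) YYYY-YY" where the end year has exactly two digits.
_SHORT_RANGE = re.compile(r"(Copyright \(C\) \d{4})-(\d{2})(?!\d)")


def expand_short_year_range(line):
    """Rewrite 'Copyright (C) YYYY-YY' as 'Copyright (C) YYYY-YYYY'."""
    def repl(match):
        prefix, short = match.group(1), match.group(2)
        century = prefix[-4:-2]  # first two digits of the start year
        return f"{prefix}-{century}{short}"
    return _SHORT_RANGE.sub(repl, line)


print(expand_short_year_range(
    "Copyright (C) 2020-21 Free Software Foundation, Inc."))
```

Lines that already carry a four-digit end year do not match the pattern and are returned unchanged.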

Fix up files added in commit fa5f8b49e55caf5bb341f5eb6b5ab828b9286425
"rs6000: Red-black tree implementation for balanced tree search",
commit 4a720a9547320699aceda7d2e0b08de5ab40132f
"rs6000: Add initial input files",
commit bd5b625228d545d5ecb35df24f9f094edc95e3fa
"rs6000: Initial create of rs6000-gen-builtins.c".

	gcc/
	* config/rs6000/rbtree.c: Fix 'Copyright (C) 2020-21' into '2020-2021'
	* config/rs6000/rbtree.h: Likewise.
	* config/rs6000/rs6000-builtin-new.def: Likewise.
	* config/rs6000/rs6000-gen-builtins.c: Likewise.
	* config/rs6000/rs6000-overload.def: Likewise.
---
 gcc/config/rs6000/rbtree.c   | 2 +-
 gcc/config/rs6000/rbtree.h   | 2 +-
 gcc/config/rs6000/rs6000-builtin-new.def | 2 +-
 gcc/config/rs6000/rs6000-gen-builtins.c  | 2 +-
 gcc/config/rs6000/rs6000-overload.def| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rbtree.c b/gcc/config/rs6000/rbtree.c
index 37a559c1fbc..d3d03a62433 100644

[committed] Fortran: Avoid running into assert with -fcheck= + UBSAN [PR92621]

2021-10-22 Thread Tobias Burnus

The testcase of the PR or as attached gave an ICE, but only when
compiled with -fcheck=all -fsanitize=undefined. Solution: Strip
the nop to avoid the assert failure.
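
As a rough illustration of the idea, here is a toy Python model. It is not GCC's tree API: the `Node` class and `strip_nops` function are hypothetical, and the real STRIP_NOPS macro applies more detailed conditions. The point is only that a check on the node kind should look through no-op wrappers first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    code: str                          # e.g. "NOP_EXPR", "POINTER_PLUS_EXPR"
    operand: Optional["Node"] = None   # wrapped expression, if any


def strip_nops(node):
    """Peel off outer no-op wrappers and return the first node that is not one."""
    while node.code == "NOP_EXPR" and node.operand is not None:
        node = node.operand
    return node


wrapped = Node("NOP_EXPR", Node("POINTER_PLUS_EXPR"))
print(wrapped.code)              # NOP_EXPR
print(strip_nops(wrapped).code)  # POINTER_PLUS_EXPR
```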

Committed as r12-4632-g24e99e6ec1cc57f3660c00ff677c7feb16aa94d2

Tobias

 * * *

PS: Similar issues when using additional flags:

ICE also with -fcheck=all -fsanitize=undefined:
https://gcc.gnu.org/PR102901 ICE (segfault) when compiling pdt_13.f03 with 
-fcheck=all in gfc_check_pdt_dummy -> structure_alloc_comps
https://gcc.gnu.org/PR102900 ICE via gfc_class_data_get with 
alloc_comp_class_4.f03 or proc_ptr_52.f90 using -fcheck=all

+ runtime same flags but running the code:
https://gcc.gnu.org/PR102903 New: Invalid gfortran.dg testcases or wrong-code 
with -fcheck=all -fsanitize=undefined
Here, false positives might/do surely exist as do testcase bugs. (And the list 
is incomplete.)

+ -flto fail (not really fitting into this series):
https://gcc.gnu.org/PR102885 - [12 Regression] ICE when compiling 
gfortran.dg/bind_c_char_10.f90 with -flto since r12-4467-g64f9623765da3306
commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2
Author: Tobias Burnus 
Date:   Fri Oct 22 23:23:06 2021 +0200

Fortran: Avoid running into assert with -fcheck= + UBSAN

PR fortran/92621
gcc/fortran/
* trans-expr.c (gfc_trans_assignment_1): Add STRIP_NOPS.

gcc/testsuite/
* gfortran.dg/bind-c-intent-out-2.f90: New test.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 29697e69e75..2d7f9e0fb91 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -11727,6 +11727,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 
 	  tmp = INDIRECT_REF_P (lse.expr)
 	  ? gfc_build_addr_expr (NULL_TREE, lse.expr) : lse.expr;
+	  STRIP_NOPS (tmp);
 
 	  /* We should only get array references here.  */
 	  gcc_assert (TREE_CODE (tmp) == POINTER_PLUS_EXPR
diff --git a/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90 b/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90
new file mode 100644
index 000..fe8f6060f1f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bind-c-intent-out-2.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+! { dg-additional-options "-fsanitize=undefined -fcheck=all" }
+
+! PR fortran/92621
+
+subroutine hello(val) bind(c)
+  use, intrinsic :: iso_c_binding, only: c_int
+
+  implicit none
+  
+  integer(kind=c_int), allocatable, intent(out) :: val(:)
+
+  allocate(val(1))
+  val = 2
+  return
+end subroutine hello
+
+program alloc_p
+
+  use, intrinsic :: iso_c_binding, only: c_int
+
+  implicit none
+
+  interface
+subroutine hello(val) bind(c)
+  import :: c_int
+  implicit none
+  integer(kind=c_int), allocatable, intent(out) :: val(:)
+end subroutine hello
+  end interface
+
+  integer(kind=c_int), allocatable :: a(:)
+
+  allocate(a(1))
+  a = 1
+  call hello(a)
+  stop
+
+end program alloc_p


[committed] wwwdocs: gcc-5/changes.html: Update link to Intel's pcommit deprecation

2021-10-22 Thread Gerald Pfeifer
---
 htdocs/gcc-5/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-5/changes.html b/htdocs/gcc-5/changes.html
index 05e796dd..2e2e20e6 100644
--- a/htdocs/gcc-5/changes.html
+++ b/htdocs/gcc-5/changes.html
@@ -1084,7 +1084,7 @@ are not listed here).
 IA-32/x86-64
   
 Support for the https://software.intel.com/content/www/us/en/develop/blogs/deprecate-pcommit-instruction.html";>deprecated
+
href="https://www.intel.com/content/www/us/en/developer/articles/technical/deprecate-pcommit-instruction.html";>deprecated
 pcommit instruction has been removed.
   
 
-- 
2.33.0


[committed]

2021-10-22 Thread Tobias Burnus

Committed as r12-4633-g030875c197e339542ddfcbad90cfc01263151bec

To reduce the XFAIL clutter in the *.sum files, this patch removes some
pointless XFAIL in favour of pruning the output which should be ignored
and using explicit checks for the currently output warnings/errors.

Tobias
commit 030875c197e339542ddfcbad90cfc01263151bec
Author: Tobias Burnus 
Date:   Sat Oct 23 00:04:43 2021 +0200

Fortran: Change XFAIL to PASS

Replace dg-excess-errors by dg-error/warning and dg-prune-output for
more fine-grained output handling and to avoid XPASS.

gcc/testsuite/ChangeLog:

* gfortran.dg/associate_3.f03: Replace dg-excess-errors by
other dg-* to change XFAIL to PASS.
* gfortran.dg/binding_label_tests_4.f03: Likewise.
* gfortran.dg/block_4.f08: Likewise.
* gfortran.dg/charlen_04.f90: Likewise.
* gfortran.dg/charlen_05.f90: Likewise.
* gfortran.dg/charlen_06.f90: Likewise.
* gfortran.dg/charlen_13.f90: Likewise.
* gfortran.dg/coarray_9.f90: Likewise.
* gfortran.dg/coarray_collectives_3.f90: Likewise.
* gfortran.dg/data_invalid.f90: Likewise.
* gfortran.dg/do_4.f: Likewise.
* gfortran.dg/dollar_sym_1.f90: Likewise.
* gfortran.dg/dollar_sym_3.f: Likewise.
* gfortran.dg/fmt_tab_1.f90: Likewise.
* gfortran.dg/fmt_tab_2.f90: Likewise.
* gfortran.dg/forall_16.f90: Likewise.
* gfortran.dg/g77/970125-0.f: Likewise.
* gfortran.dg/gomp/unexpected-end.f90: Likewise.
* gfortran.dg/interface_operator_1.f90: Likewise.
* gfortran.dg/interface_operator_2.f90: Likewise.
* gfortran.dg/line_length_4.f90: Likewise.
* gfortran.dg/line_length_5.f90: Likewise.
* gfortran.dg/line_length_6.f90: Likewise.
* gfortran.dg/line_length_8.f90: Likewise.
* gfortran.dg/line_length_9.f90: Likewise.
* gfortran.dg/pr65045.f90: Likewise.
* gfortran.dg/pr69497.f90: Likewise.
* gfortran.dg/submodule_21.f08: Likewise.
* gfortran.dg/tab_continuation.f: Likewise.
* gfortran.dg/typebound_proc_2.f90: Likewise.
* gfortran.dg/warnings_are_errors_1.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/associate_3.f03 b/gcc/testsuite/gfortran.dg/associate_3.f03
index da7bec951d1..dfd5a99500e 100644
--- a/gcc/testsuite/gfortran.dg/associate_3.f03
+++ b/gcc/testsuite/gfortran.dg/associate_3.f03
@@ -34,4 +34,4 @@ PROGRAM main
 INTEGER :: b ! { dg-error "Unexpected data declaration statement" }
   END ASSOCIATE
 END PROGRAM main ! { dg-error "Expecting END ASSOCIATE" }
-! { dg-excess-errors "Unexpected end of file" }
+! { dg-error "Unexpected end of file" "" { target "*-*-*" } 0 }
diff --git a/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03 b/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03
index f8c0f046063..af9a588cfec 100644
--- a/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03
+++ b/gcc/testsuite/gfortran.dg/binding_label_tests_4.f03
@@ -20,4 +20,4 @@ module C
 use A
 use B ! { dg-error "Cannot open module file" }
 end module C
-! { dg-excess-errors "compilation terminated" }
+! { dg-prune-output "compilation terminated" }
diff --git a/gcc/testsuite/gfortran.dg/block_4.f08 b/gcc/testsuite/gfortran.dg/block_4.f08
index 4c63194c85d..3ff52b0a098 100644
--- a/gcc/testsuite/gfortran.dg/block_4.f08
+++ b/gcc/testsuite/gfortran.dg/block_4.f08
@@ -15,4 +15,4 @@ PROGRAM main
   myname2: BLOCK
   END BLOCK ! { dg-error "Expected block name of 'myname2'" }
 END PROGRAM main ! { dg-error "Expecting END BLOCK" }
-! { dg-excess-errors "Unexpected end of file" }
+! { dg-error "Unexpected end of file" "" { target "*-*-*" } 0 }
diff --git a/gcc/testsuite/gfortran.dg/charlen_04.f90 b/gcc/testsuite/gfortran.dg/charlen_04.f90
index f93465f2ae6..97aa0ec583c 100644
--- a/gcc/testsuite/gfortran.dg/charlen_04.f90
+++ b/gcc/testsuite/gfortran.dg/charlen_04.f90
@@ -3,6 +3,5 @@
 program p
type t
   character(*), allocatable :: x(*)  ! { dg-error "must have a deferred shape" }
-   end type
+   end type  ! { dg-error "needs to be a constant specification" "" { target "*-*-*" } .-1 } 
 end
-! { dg-excess-errors "needs to be a constant specification" } 
diff --git a/gcc/testsuite/gfortran.dg/charlen_05.f90 b/gcc/testsuite/gfortran.dg/charlen_05.f90
index 0eb0015bf38..e58f9263330 100644
--- a/gcc/testsuite/gfortran.dg/charlen_05.f90
+++ b/gcc/testsuite/gfortran.dg/charlen_05.f90
@@ -3,6 +3,5 @@
 program p
type t
   character(*) :: x y  ! { dg-error "error in data declar

[committed] libstdc++: Constrain std::make_any [PR102894]

2021-10-22 Thread Jonathan Wakely via Gcc-patches
std::make_any should be constrained so it can only be called if the
construction of the return value would be valid.


Tested x86_64-linux, committed to trunk.
I plan to backport this too.



libstdc++-v3/ChangeLog:

PR libstdc++/102894
* include/std/any (make_any): Add SFINAE constraint.
* testsuite/20_util/any/102894.cc: New test.
---
 libstdc++-v3/include/std/any | 13 +
 libstdc++-v3/testsuite/20_util/any/102894.cc | 20 
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/any/102894.cc

diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any
index 9c102a58b26..f75dddf6d92 100644
--- a/libstdc++-v3/include/std/any
+++ b/libstdc++-v3/include/std/any
@@ -428,16 +428,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Exchange the states of two @c any objects.
   inline void swap(any& __x, any& __y) noexcept { __x.swap(__y); }
 
-  /// Create an any holding a @c _Tp constructed from @c __args.
+  /// Create an `any` holding a `_Tp` constructed from `__args...`.
   template <typename _Tp, typename... _Args>
-any make_any(_Args&&... __args)
+inline
+enable_if_t<is_constructible_v<any, in_place_type_t<_Tp>, _Args...>, any>
+make_any(_Args&&... __args)
 {
   return any(in_place_type<_Tp>, std::forward<_Args>(__args)...);
 }
 
-  /// Create an any holding a @c _Tp constructed from @c __il and @c __args.
+  /// Create an `any` holding a `_Tp` constructed from `__il` and `__args...`.
   template <typename _Tp, typename _Up, typename... _Args>
-any make_any(initializer_list<_Up> __il, _Args&&... __args)
+inline
+enable_if_t<is_constructible_v<any, in_place_type_t<_Tp>,
+  initializer_list<_Up>&, _Args...>, any>
+make_any(initializer_list<_Up> __il, _Args&&... __args)
 {
   return any(in_place_type<_Tp>, __il, std::forward<_Args>(__args)...);
 }
diff --git a/libstdc++-v3/testsuite/20_util/any/102894.cc b/libstdc++-v3/testsuite/20_util/any/102894.cc
new file mode 100644
index 000..66ea9a03fea
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/any/102894.cc
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++17 } }
+#include <any>
+
+template<typename T, typename = void>
+struct can_make_any
+: std::false_type
+{ };
+
+template<typename T>
+struct can_make_any<T, std::void_t<decltype(std::make_any<T>())>>
+: std::true_type
+{ };
+
+struct move_only
+{
+  move_only() = default;
+  move_only(move_only&&) = default;
+};
+
+static_assert( ! can_make_any<move_only>::value ); // PR libstdc++/102894
-- 
2.31.1



[committed] doc: Convert mingw-w64.org links to https

2021-10-22 Thread Gerald Pfeifer
It turns out my link checker does catch broken links under 
gcc.gnu.org/install/ - fixed thusly.

(That makes it all the more puzzling how the issue you fixed last
week did not arise, Jonathan.)

Gerald


gcc:
* doc/install.texi (Binaries): Convert mingw-w64.org to https.
(Specific): Ditto.
---
 gcc/doc/install.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 7c775965964..38f96bf5a89 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3473,7 +3473,7 @@ Microsoft Windows:
 The @uref{https://sourceware.org/cygwin/,,Cygwin} project;
 @item
 The @uref{https://osdn.net/projects/mingw/,,MinGW} and
-@uref{http://www.mingw-w64.org/,,mingw-w64} projects.
+@uref{https://www.mingw-w64.org/,,mingw-w64} projects.
 @end itemize
 
 @item
@@ -5080,7 +5080,7 @@ the Win32 subsystem that provides a subset of POSIX.
 
 @subheading Intel 64-bit versions
 GCC contains support for x86-64 using the mingw-w64
-runtime library, available from @uref{http://mingw-w64.org/doku.php}.
+runtime library, available from @uref{https://mingw-w64.org/doku.php}.
 This library should be used with the target triple x86_64-pc-mingw32.
 
 Presently Windows for Itanium is not supported.
-- 
2.33.0


Re: [PATCH][WIP] Add install-dvi Makefile targets

2021-10-22 Thread Eric Gallager via Gcc-patches
On Fri, Oct 22, 2021 at 8:23 AM Jeff Law  wrote:
>
>
>
> On 10/18/2021 7:30 PM, Eric Gallager wrote:
> > On Tue, Oct 12, 2021 at 5:09 PM Eric Gallager  wrote:
> >> On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager  wrote:
> >>> Currently the build machinery handles install-pdf and install-html
> >>> targets, but no install-dvi target. This patch is a step towards
> >>> fixing that. Note that I have only tested with
> >>> --enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will
> >>> probably also have to be added for the languages I skipped.
> >>> Also, please note that this patch applies on top of:
> >>> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html
> >>>
> >>> ChangeLog:
> >>>
> >>> 2016-10-06  Eric Gallager  
> >>>
> >>>  * Makefile.def: Handle install-dvi target.
> >>>  * Makefile.tpl: Likewise.
> >>>  * Makefile.in: Regenerate.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2016-10-06  Eric Gallager  
> >>>
> >>>  * Makefile.in: Handle dvidir and install-dvi target.
> >>>  * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi
> >>>  target hooks.
> >>>  * configure.ac: Handle install-dvi target.
> >>>  * configure: Regenerate.
> >>>
> >>> libiberty/ChangeLog:
> >>>
> >>> 2016-10-06  Eric Gallager  
> >>>
> >>>  * Makefile.in: Handle dvidir and install-dvi target.
> >>>  * functions.texi: Regenerate.
> >> Ping. The prerequisite patch that I linked to previously has gone in now.
> >> I'm not sure if this specific patch still applies, though.
> >> Also note that I've opened a bug to track this issue:
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663
> > Hi, I have updated this patch and tested it with more languages now; I
> > can now confirm that it works with ada, d, and fortran now. The only
> > languages that remain untested now are go (since I'm building on
> > darwin and go doesn't build on darwin anyways, as per bug 46986) and
> > jit (which I ran into a bug about that I brought up on IRC, and will
> > probably need to file on bugzilla). OK to install?
> Yea, I think this is OK.  We might need to adjust go/jit and perhaps
> other toplevel modules, but if those do show up as problems I think we
> can fault in those fixes.
>
> jeff

OK thanks, installed as r12-4636:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c3e80a16af287e804b87b8015307085399755cd4


Re: [committed] doc: Convert mingw-w64.org links to https

2021-10-22 Thread Jonathan Wakely via Gcc-patches
On Fri, 22 Oct 2021, 23:28 Gerald Pfeifer,  wrote:

> It turns out my link checker does catch broken links under
> gcc.gnu.org/install/ - fixed thusly.
>
> (That makes it all the more puzzling how the issue you fixed last
> week did not arise, Jonathan.)
>

It didn't give a 404, there was a page at the end of the link, just an
empty one. So it probably looks like a good link to your script.



> Gerald
>
>
> gcc:
> * doc/install.texi (Binaries): Convert mingw-w64.org to https.
> (Specific): Ditto.
> ---
>  gcc/doc/install.texi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 7c775965964..38f96bf5a89 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -3473,7 +3473,7 @@ Microsoft Windows:
>  The @uref{https://sourceware.org/cygwin/,,Cygwin} project;
>  @item
>  The @uref{https://osdn.net/projects/mingw/,,MinGW} and
> -@uref{http://www.mingw-w64.org/,,mingw-w64} projects.
> +@uref{https://www.mingw-w64.org/,,mingw-w64} projects.
>  @end itemize
>
>  @item
> @@ -5080,7 +5080,7 @@ the Win32 subsystem that provides a subset of POSIX.
>
>  @subheading Intel 64-bit versions
>  GCC contains support for x86-64 using the mingw-w64
> -runtime library, available from @uref{http://mingw-w64.org/doku.php}.
> +runtime library, available from @uref{https://mingw-w64.org/doku.php}.
>  This library should be used with the target triple x86_64-pc-mingw32.
>
>  Presently Windows for Itanium is not supported.
> --
> 2.33.0
>


Re: [committed] doc: Convert mingw-w64.org links to https

2021-10-22 Thread Jonathan Wakely via Gcc-patches
On Sat, 23 Oct 2021, 00:43 Jonathan Wakely,  wrote:

>
>
> On Fri, 22 Oct 2021, 23:28 Gerald Pfeifer,  wrote:
>
>> It turns out my link checker does catch broken links under
>> gcc.gnu.org/install/ - fixed thusly.
>>
>> (That makes it all the more puzzling how the issue you fixed last
>> week did not arise, Jonathan.)
>>
>
> It didn't give a 404, there was a page at the end of the link, just an
> empty one. So it probably looks like a good link to your script.
>


Maybe something is (or was?) still generating old.html, as an empty page:

https://gcc.gnu.org/install/old.html


Re: [committed] doc: Convert mingw-w64.org links to https

2021-10-22 Thread Gerald Pfeifer
On Sat, 23 Oct 2021, Jonathan Wakely wrote:
>> (That makes it all the more puzzling how the issue you fixed last
>> week did not arise, Jonathan.)
> It didn't give a 404, there was a page at the end of the link, just 
> an empty one. So it probably looks like a good link to your script.

Yes, as long as there is a page the link is considered valid, 
regardless of contents.
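
A checker that also wants to catch this case could look at the response body as well as the status. The following is a hypothetical Python sketch, not the link checker discussed here; `classify_link` is an invented name, and it takes an already fetched status code and body so that no network access is needed.

```python
def classify_link(status, body):
    """Classify a fetched link from its HTTP status code and body text."""
    if status >= 400:
        return "broken"   # e.g. 404: no page at the end of the link
    if not body.strip():
        return "empty"    # a page exists but has no content
    return "ok"


print(classify_link(404, ""))                  # broken
print(classify_link(200, "   \n"))             # empty
print(classify_link(200, "<html>...</html>"))  # ok
```

An "empty" result would flag a page such as the old.html case mentioned in this thread, which a status-only check accepts.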

On Sat, 23 Oct 2021, Jonathan Wakely wrote:
> Maybe something is (or was?) still generating old.html, as an empty page:
> 
> https://gcc.gnu.org/install/old.html

Ahh, that got me thinking. Thank you for the hint, Jonathan!

I know what happens and will address it (after a good night's sleep :-).

Gerald


[Fortran, committed] Add testcase for PR95196

2021-10-22 Thread Sandra Loosemore
I've committed another testcase from a bugzilla issue that now appears 
to be fixed.


-Sandra
commit 9a0e34eb45e36d4f90cedb61191fd31da0bab256
Author: Sandra Loosemore 
Date:   Fri Oct 22 17:22:00 2021 -0700

Add testcase for PR fortran/95196

2021-10-22  José Rui Faustino de Sousa  
	Sandra Loosemore  

	gcc/testsuite/

	PR fortran/95196
	* gfortran.dg/PR95196.f90: New.

diff --git a/gcc/testsuite/gfortran.dg/PR95196.f90 b/gcc/testsuite/gfortran.dg/PR95196.f90
new file mode 100644
index 000..14333e4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR95196.f90
@@ -0,0 +1,83 @@
+! { dg-do run }
+
+program rnk_p
+
+  implicit none
+
+  integer, parameter :: n = 10
+  integer, parameter :: m = 5
+  integer, parameter :: s = 4
+  integer, parameter :: l = 4
+  integer, parameter :: u = s+l-1
+  
+  integer :: a(n)
+  integer :: b(n,n)
+  integer :: c(n,n,n)
+  integer :: r(s*s*s)
+  integer :: i
+
+  a = reshape([(i, i=1,n)], [n])
+  b = reshape([(i, i=1,n*n)], [n,n])
+  c = reshape([(i, i=1,n*n*n)], [n,n,n])
+  r(1:s) = a(l:u)
+  call rnk_s(a(l:u), r(1:s))
+  r(1:s*s) = reshape(b(l:u,l:u), [s*s])
+  call rnk_s(b(l:u,l:u), r(1:s*s))
+  r = reshape(c(l:u,l:u,l:u), [s*s*s])
+  call rnk_s(c(l:u,l:7,l:u), r)
+  stop
+  
+contains
+
+  subroutine rnk_s(a, b)
+integer, intent(in) :: a(..)
+integer, intent(in) :: b(:)
+
+!integer :: l(rank(a)), u(rank(a)) does not work due to Bug 94048 
+integer, allocatable :: lb(:), ub(:)
+integer  :: i, j, k, l
+
+lb = lbound(a)
+ub = ubound(a)
+select rank(a)
+rank(1)
+  if(any(lb/=lbound(a))) stop 11
+  if(any(ub/=ubound(a))) stop 12
+  if(size(a)/=size(b))   stop 13
+  do i = 1, size(a)
+if(a(i)/=b(i)) stop 14
+  end do
+rank(2)
+  if(any(lb/=lbound(a))) stop 21
+  if(any(ub/=ubound(a))) stop 22
+  if(size(a)/=size(b))   stop 23
+  k = 0
+  do j = 1, size(a, dim=2)
+do i = 1, size(a, dim=1)
+  k = k + 1
+  if(a(i,j)/=b(k)) stop 24
+end do
+  end do
+rank(3)
+  if(any(lb/=lbound(a))) stop 31
+  if(any(ub/=ubound(a))) stop 32
+  if(size(a)/=size(b))   stop 33
+  l = 0
+  do k = 1, size(a, dim=3)
+do j = 1, size(a, dim=2)
+  do i = 1, size(a, dim=1)
+l = l + 1
+! print *, a(i,j,k), b(l)
+if(a(i,j,k)/=b(l)) stop 34
+  end do
+end do
+  end do
+rank default
+  stop 171
+end select
+deallocate(lb, ub)
+return
+  end subroutine rnk_s
+  
+end program rnk_p
+


[r12-4632 Regression] FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) on Linux/x86_64

2021-10-22 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 is the first bad commit
commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2
Author: Tobias Burnus 
Date:   Fri Oct 22 23:23:06 2021 +0200

Fortran: Avoid running into assert with -fcheck= + UBSAN

caused

FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4632/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Cannot reproduce – Re: [r12-4632 Regression] FAIL: gfortran.dg/bind-c-intent-out-2.f90 -Os (test for excess errors) on Linux/x86_64

2021-10-22 Thread Tobias Burnus

Hi,

for some reason, I cannot reproduce this. I checked that I am in
sync with master, and I also tried -m32 and -march=cascadelake, running
both manually and via DejaGNU, but it passes here.

Can someone who sees it show the excess error? Or was that a spurious
issue which is now gone?

Tobias

On 23.10.21 06:43, sunil.k.pandey wrote:

On Linux/x86_64,

24e99e6ec1cc57f3660c00ff677c7feb16aa94d2 is the first bad commit
commit 24e99e6ec1cc57f3660c00ff677c7feb16aa94d2
Author: Tobias Burnus 
Date:   Fri Oct 22 23:23:06 2021 +0200

 Fortran: Avoid running into assert with -fcheck= + UBSAN

caused

FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/bind-c-intent-out-2.f90   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4632/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/bind-c-intent-out-2.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)
