Re: [RFC] c-family: Add __builtin_noassoc

2021-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 2:00 PM Matthias Kretz  wrote:
>
> On Friday, 16 July 2021 11:31:29 CEST Richard Biener wrote:
> > On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz  wrote:
> > > On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > > I think implementing it similar to how we do __builtin_shufflevector
> > > > would
> > > > be easily possible.  PAREN_EXPR is a tree code.
> > >
> > > Like this? If you like it, I'll write the missing documentation and do
> > > real
> > > regression testing.
> >
> > Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
> > it prevents a + b + c from being re-associated - but it does not.
> > PAREN_EXPR is a barrier for association, so for 'a + b + c + PAREN_EXPR  > + e + f>' the a+b+c and d+e+f chains will not mix but they individually can
> > be re-associated.  That said __builtin_noassoc might be a bad name,
> > maybe __builtin_assoc_barrier is better?
>
> Yes, I agree with renaming it. And assoc_barrier sounds intuitive to me.
>
> > To fully prevent association of a a + b + d + e chain you need at least
> > two PAREN_EXPRs, for example (a+b) + (d+e) would do.
> >
> > One could of course provide __builtin_noassoc (a+b+c+d) with the
> > implied semantics and insert PAREN_EXPRs around all operands
> > when lowering it.
>
> I wouldn't want to go there. __builtin_noassoc(f(x, y, z))? We probably both
> agree that it would be a no-op, but it reads like f should be evaluated with -
> fno-associative-math.

Ah, true - yeah, let's not go down this rathole.

> > Not sure what's more useful in practice - directly exposing the middle-end
> > PAREN_EXPR or providing a way to mark a whole expression as to be
> > not re-associated?  Maybe both?
>
> I think this is a tool for specialists. Give them the low-level tool and
> they'll build whatever higher level abstractions they need on top of it. Like

OK, fine.

> float sum_noassoc(RangeOfFloats auto x) {
>   float sum = 0;
>   for (float v : x)
> sum = __builtin_assoc_barrier(v + x);
>   return sum;
> }
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──


[PATCH] c-family: Add __builtin_assoc_barrier

2021-07-19 Thread Matthias Kretz
tested on x86_64-pc-linux-gnu with no new failures. OK for master?

New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

* c-c++-common/builtin-assoc-barrier-1.c: New test.

gcc/cp/ChangeLog:

* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_ASSOC_BARRIER.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_ASSOC_BARRIER.

gcc/c-family/ChangeLog:

* c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
* c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.

gcc/c/ChangeLog:

* c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document __builtin_assoc_barrier.
---
 gcc/c-family/c-common.c   |  1 +
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-decl.c|  1 +
 gcc/c/c-parser.c  | 20 
 gcc/cp/cp-objcp-common.c  |  1 +
 gcc/cp/parser.c   | 14 +++
 gcc/doc/extend.texi   | 18 ++
 .../c-c++-common/builtin-assoc-barrier-1.c| 24 +++
 8 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 681fcc972f4..c62a6398a47 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
+  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 50ca8fb6ebd..f34dc47c2ba 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -108,7 +108,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,	 RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 983d65e930c..dcf4a2d7c32 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10557,6 +10557,7 @@ names_builtin_p (const char *name)
 case RID_BUILTIN_HAS_ATTRIBUTE:
 case RID_BUILTIN_SHUFFLE:
 case RID_BUILTIN_SHUFFLEVECTOR:
+case RID_BUILTIN_ASSOC_BARRIER:
 case RID_CHOOSE_EXPR:
 case RID_OFFSETOF:
 case RID_TYPES_COMPATIBLE_P:
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9a56e0c04c6..fffd81f4e5b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -8931,6 +8931,7 @@ c_parser_predefined_identifier (c_parser *parser)
 			 assignment-expression ,
 			 assignment-expression, )
  __builtin_convertvector ( assignment-expression , type-name )
+ __builtin_assoc_barrier ( assignment-expression )
 
offsetof-member-designator:
  identifier
@@ -10076,6 +10077,25 @@ c_parser_postfix_expression (c_parser *parser)
 	  }
 	  }
 	  break;
+	case RID_BUILTIN_ASSOC_BARRIER:
+	  {
+	location_t start_loc = loc;
+	c_parser_consume_token (parser);
+	matching_parens parens;
+	if (!parens.require_open (parser))
+	  {
+		expr.set_error ();
+		break;
+	  }
+	e1 = c_parser_expr_no_commas (parser, NULL);
+	mark_exp_read (e1.value);
+	location_t end_loc = c_parser_peek_token (parser)->get_finish ();
+	parens.skip_until_found_close (parser);
+	expr.value = build1_loc (loc, PAREN_EXPR, TREE_TYPE (e1.value),
+ e1.value);
+	set_c_expr_source_range (&expr, start_loc, end_loc);
+	  }
+	  break;
 	case RID_AT_SELECTOR:
 	  {
 	gcc_assert (c_dialect_objc ());
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index ee255732d5a..04522a23eda 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -395,6 +395,7 @@ names_builtin_p (const char *name)
 case RID_BUILTIN_SH

Re: [PATCH 2/3] Remove last gimple_expr_type uses

2021-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 3:04 PM Richard Biener  wrote:
>
> This removes the last uses of gimple_expr_type.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

The following variant is what I eventually pushed.

Richard.


p
Description: Binary data


[gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' (was: [PATCH libatomic/arm] avoid warning on constant addre

2021-07-19 Thread Thomas Schwinge
Hi!

| On 7/16/21 11:42 AM, Thomas Schwinge wrote:
|> On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches 
 wrote:
|>> The attached tweak avoids the new -Warray-bounds instances when
|>> building libatomic for arm. Christophe confirms it resolves
|>> the problem (thank you!)
|>
|> As Abid has just reported in
|> , similar
|> problem with GCN target libgomp build:
|>
|>  In function ‘gcn_thrs’,
|>  inlined from ‘gomp_thread’ at 
[...]/source-gcc/libgomp/libgomp.h:803:10,
|>  inlined from ‘GOMP_barrier’ at 
[...]/source-gcc/libgomp/barrier.c:34:29:
|>  [...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is 
outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ 
[-Werror=array-bounds]
|>792 |   return *thrs;
|>|  ^
|>
|>  gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS); 
  \
|>
|>  libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
|>  libgomp/libgomp.h-{
|>  libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
|>  libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct 
gomp_thread * __lds *)4;
|>  libgomp/libgomp.h-  return *thrs;
|>  libgomp/libgomp.h-}
|>
|> ..., plus a few more.  Work-around:
|>
|> struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
|>  +# pragma GCC diagnostic push
|>  +# pragma GCC diagnostic ignored "-Warray-bounds"
|> return *thrs;
|>  +# pragma GCC diagnostic pop
|>
|> ..., but it's a bit tedious to add that in all that the other places,
|> too.

Wasn't so bad after all; a lot of duplicates due to 'libgomp.h'.  I've
thus pushed "[gcn] Work-around libgomp 'error: array subscript 0 is
outside array bounds of ‘__lds struct gomp_thread * __lds[0]’
[-Werror=array-bounds]' [PR101484]" to master branch in commit
9f2bc5077debef2b046b6c10d38591ac324ad8b5, see attached.

|> (So I'll consider some GCN-specific '-Wno-array-bounds' if we don't
|> get to resolve this otherwise, soon.)

Now: "Awaiting a different solution, of course."  ;-)


On 2021-07-17T23:28:45+0100, Andrew Stubbs  wrote:
> On 16/07/2021 18:42, Thomas Schwinge wrote:
>> Of course, we may simply re-work the libgomp/GCN code -- but don't we
>> first need to answer the question whether the current code is actually
>> "bad"?  Aren't we going to get a lot of similar reports from
>> kernel/embedded/other low-level software developers, once this is out in
>> the wild?  I mean:
>
> GCN already uses address 4 for this value because address 0 caused
> problems with null-pointer checks.

Ugh.  How much wasted bytes per what is that?  (I haven't looked yet;
hopefully not per GPU thread?)  Because:

> It really ought not be this hard. :-(

I'm sure we can avoid that.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 9f2bc5077debef2b046b6c10d38591ac324ad8b5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 16 Jul 2021 19:12:02 +0200
Subject: [PATCH] =?UTF-8?q?[gcn]=20Work-around=20libgomp=20'error:=20array?=
 =?UTF-8?q?=20subscript=200=20is=20outside=20array=20bounds=20of=20?=
 =?UTF-8?q?=E2=80=98=5F=5Flds=20struct=20gomp=5Fthread=20*=20=5F=5Flds[0]?=
 =?UTF-8?q?=E2=80=99=20[-Werror=3Darray-bounds]'=20[PR101484]?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

... seen as of commit a110855667782dac7b674d3e328b253b3b3c919b "Correct
handling of variable offset minus constant in -Warray-bounds [PR100137]".

Awaiting a different solution, of course.

	libgomp/
	PR target/101484
	* config/gcn/team.c: Apply '-Werror=array-bounds' work-around.
	* libgomp.h [__AMDGCN__]: Likewise.
---
 libgomp/config/gcn/team.c |  3 +++
 libgomp/libgomp.h | 12 
 2 files changed, 15 insertions(+)

diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index 627210ea407..94ce2f2dfeb 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -65,9 +65,12 @@ gomp_gcn_enter_kernel (void)
   void * __lds *arena_start = (void * __lds *)TEAM_ARENA_START;
   void * __lds *arena_free = (void * __lds *)TEAM_ARENA_FREE;
   void * __lds *arena_end = (void * __lds *)TEAM_ARENA_END;
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Warray-bounds" /*TODO PR101484 */
   *arena_start = team_arena;
   *arena_free = team_arena;
   *arena_end = team_arena + TEAM_ARENA_SIZE;
+# pragma GCC diagnostic pop
 
   /* Allocate and initialize the team-local-storage data.  */
   struct gomp_thread *thrs = team_malloc_cleared (sizeof (*thrs)
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8d25dc8e2a8..4159cbe3334 100644
--- a/libg

Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-19 Thread Thomas Schwinge
Hi!

On 2021-07-16T15:11:24-0600, Martin Sebor via Gcc-patches 
 wrote:
> On 7/16/21 11:42 AM, Thomas Schwinge wrote:
>> On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches 
>>  wrote:
>>> The attached tweak avoids the new -Warray-bounds instances when
>>> building libatomic for arm. Christophe confirms it resolves
>>> the problem (thank you!)

>>> As we have discussed, the main goal of this class of warnings
>>> is to detect accesses at addresses derived from null pointers
>>> (e.g., to struct members or array elements at a nonzero offset).
>>
>> (ACK, and thanks for that work!)
>>
>>> Diagnosing accesses at hardcoded addresses is incidental because
>>> at the stage they are detected the two are not distinguishable
>>> from each another.
>>>
>>> I'm planning (hoping) to implement detection of invalid pointer
>>> arithmetic involving null for GCC 12, so this patch is a stopgap
>>> solution to unblock the arm libatomic build without compromising
>>> the warning.  Once the new detection is in place these workarounds
>>> can be removed or replaced with something more appropriate (e.g.,
>>> declaring the objects at the hardwired addresses with an attribute
>>> like AVR's address or io; that would enable bounds checking at
>>> those addresses as well).
>>
>> Of course, we may simply re-work the libgomp/GCN code -- but don't we
>> first need to answer the question whether the current code is actually
>> "bad"?  Aren't we going to get a lot of similar reports from
>> kernel/embedded/other low-level software developers, once this is out in
>> the wild?  I mean:
>>
>>> PR bootstrap/101379 - libatomic arm build failure after r12-2132 due to 
>>> -Warray-bounds on a constant address
>>>
>>> libatomic/ChangeLog:
>>>* /config/linux/arm/host-config.h (__kernel_helper_version): New
>>>function.  Adjust shadow macro.
>>>
>>> diff --git a/libatomic/config/linux/arm/host-config.h 
>>> b/libatomic/config/linux/arm/host-config.h
>>> index 1520f237d73..777d08a2b85 100644
>>> --- a/libatomic/config/linux/arm/host-config.h
>>> +++ b/libatomic/config/linux/arm/host-config.h
>>> @@ -39,8 +39,14 @@ typedef void (__kernel_dmb_t) (void);
>>>   #define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0)
>>>
>>>   /* Kernel helper page version number.  */
>>> -#define __kernel_helper_version (*(unsigned int *)0x0ffc)
>>
>> Are such (not un-common) '#define's actually "bad", and anyhow ought to
>> be replaced by something like the following?
>
> Like all warnings (and especially flow-based ones that depend on
> optimization), this one too involves a trade-off between noise and
> real bugs.  There clearly is some low-level code that intentionally
> accesses memory at hardcoded addresses.  But because null pointers
> are pervasive, there's a lot more code that could end up accessing
> data at some offset from zero by accident (e.g., by writing to
> an array element or a member of a struct).  This affects all code,
> but is an especially big concern for privileged code that can access
> all memory.  So in my view, the trade-off is worthwhile.
>
> The logic the warning relies on isn't new: it was introduced in GCC
> 11.  There have been a handful of reports of this issue (some from
> the kernel) but far fewer than in other warnings.  The recent change
> expose more code to the logic so the numbers of both false and true
> positives are bound to go up, in proportion.  Hopefully, before GCC
> 12 is released, I will have a more robust solution to the null+offset
> problem.

Understood.  And I'll certainly be happy to see all these instances of
hard-coded-address-with-pointer-punning be re-written into "something
proper".  I was just wary of the many instances out there.  (But of
course, a lot of people don't pay attention to compiler diagnostics
anyway, so...)  ;-|

>>> +static inline unsigned*
>>> +__kernel_helper_version ()
>>> +{
>>> +  unsigned *volatile addr = (unsigned int *)0x0ffc;
>>> +  return addr;
>>> +}
>>>
>>> +#define __kernel_helper_version (*__kernel_helper_version())
>>
>> (No 'volatile' in the original code, by the way.)
>
> The volatile is what prevents the warning.

Uhm, but isn't that then a behavioral change (potentially performance
impact)?  That wasn't obvious from the patch posted.  (Not my area of
interest, though, just noting.)

> But I think a better
> solution than the hack above is to introduce a named extern const
> variable for the address.  It avoids the issue without the penalty
> of multiple volatile accesses and if/when an attribute like AVR
> address is introduced it can be more easily adapted to it.  Real
> object declarations with an attribute is also a more appropriate
> mechanism than using hardcoded address in pointers.

Sounds plausible, yes.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: Münche

Re: [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' (was: [PATCH libatomic/arm] avoid warning on constant a

2021-07-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 19, 2021 at 10:46:35AM +0200, Thomas Schwinge wrote:
> |> (So I'll consider some GCN-specific '-Wno-array-bounds' if we don't
> |> get to resolve this otherwise, soon.)
> 
> Now: "Awaiting a different solution, of course."  ;-)

The current state of the warning is unacceptable.  IMNSHO it should stop
warning for non-generic address spaces and for generic address space for
constants larger or equal to typical page size with
-fdelete-null-pointer-checks and not at all with 
-fno-delete-null-pointer-checks.

Large structures are rare and using *(type *)0x1234 is very common
mostly in the embedded world, but with mmap MAP_FIXED also used in
non-embedded world.

We don't want all the people out there to add too many workarounds for badly
designed warning.

Jakub



Re: [PATCH] [AARCH64] Fix PR 101205: csinv does not have an zero_extend version

2021-07-19 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> So the problem is even though there was a csneg with
> a zero_extend in the front, there was not one for csinv.
> This fixes it by extending that pattern.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>   PR target/101205
>   * config/aarch64/aarch64.md (csneg3_uxtw_insn): Rename to ...
>   (*cs3_uxtw_insn4): and extend to NEG_NOT.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/101205
>   * gcc.target/aarch64/csinv-neg-1.c: New test.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.md |   6 +-
>  .../gcc.target/aarch64/csinv-neg-1.c  | 112 ++
>  2 files changed, 115 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/csinv-neg-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index f12a0bebd3d..8cd259fca9c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4203,15 +4203,15 @@ (define_insn "*csinv3_insn"
>[(set_attr "type" "csel")]
>  )
>  
> -(define_insn "csneg3_uxtw_insn"
> +(define_insn "*cs3_uxtw_insn4"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (zero_extend:DI
> (if_then_else:SI
>   (match_operand 1 "aarch64_comparison_operation" "")
> - (neg:SI (match_operand:SI 2 "register_operand" "r"))
> + (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r"))
>   (match_operand:SI 3 "aarch64_reg_or_zero" "rZ"]
>""
> -  "csneg\\t%w0, %w3, %w2, %M1"
> +  "cs\\t%w0, %w3, %w2, %M1"
>[(set_attr "type" "csel")]
>  )
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg-1.c 
> b/gcc/testsuite/gcc.target/aarch64/csinv-neg-1.c
> new file mode 100644
> index 000..e528883198d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg-1.c
> @@ -0,0 +1,112 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/*
> +** inv1:
> +**   cmp w0, 0
> +**   csinv   w0, w1, w2, ne
> +**   ret
> +*/
> +unsigned long long
> +inv1(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned t = a ? b : ~c;
> +  return t;
> +}
> +
> +/*
> +** inv1_local:
> +**   cmp w0, 0
> +**   csinv   w0, w1, w2, ne
> +**   ret
> +*/
> +unsigned long long
> +inv1_local(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned d = ~c;
> +  unsigned t = a ? b : d;
> +  return t;
> +}
> +
> +/*
> +** inv_zero1:
> +**   cmp w0, 0
> +**   csinv   w0, wzr, w1, ne
> +**   ret
> +*/
> +unsigned long long
> +inv_zero1(unsigned a, unsigned b)
> +{
> +  unsigned t = a ? 0 : ~b; 
> +  return t;
> +}
> +
> +/*
> +** inv_zero2:
> +**   cmp w0, 0
> +**   csinv   w0, wzr, w1, eq
> +**   ret
> +*/
> +unsigned long long
> +inv_zero2(unsigned a, unsigned b)
> +{
> +  unsigned t = a ? ~b : 0; 
> +  return t;
> +}
> +
> +
> +/*
> +** inv2:
> +**   cmp w0, 0
> +**   csinv   w0, w2, w1, eq
> +**   ret
> +*/
> +unsigned long long
> +inv2(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned t = a ? ~b : c; 
> +  return t;
> +}
> +
> +/*
> +** inv2_local:
> +**   cmp w0, 0
> +**   csinv   w0, w2, w1, eq
> +**   ret
> +*/
> +unsigned long long
> +inv2_local(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned d = ~b;
> +  unsigned t = a ? d : c; 
> +  return t;
> +}
> +
> +/*
> +** neg1:
> +**   cmp w0, 0
> +**   csneg   w0, w1, w2, ne
> +**   ret
> +*/
> +unsigned long long
> +neg1(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned t = a ? b : -c; 
> +  return t;
> +}
> +
> +
> +/*
> +** neg2:
> +**   cmp w0, 0
> +**   csneg   w0, w2, w1, eq
> +**   ret
> +*/
> +unsigned long long
> +neg2(unsigned a, unsigned b, unsigned c)
> +{
> +  unsigned t = a ? -b : c; 
> +  return t;
> +}
> +
> +/* { dg-final { check-function-bodies "**" "" "" } } */


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-19 Thread Richard Earnshaw via Gcc-patches




On 17/07/2021 22:37, Jason Merrill via Gcc-patches wrote:

On Sat, Jul 17, 2021 at 6:55 AM Matthias Kretz  wrote:


On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:

On Sat, 17 Jul 2021, 09:15 Matthias Kretz,  wrote:

If somebody writes a library with `keep_apart` in the public API/ABI

then

you're right.


Yes, it's fine if those constants don't affect anything across module
boundaries.


I believe a significant fraction of hardware interference size usage will
be
internal.



I would hope for this to be the vast majority of usage.  I want the warning
to discourage people from using the interference size variables in the
public API of a library.



The developer who wants his code to be included in a distro should care
about
binary distribution. If his code has an ABI issue, that's a bug he

needs

to
fix. It's not the fault of the packager.


Yes but in practice it's the packagers who have to deal with the bug
reports, analyze the problem, and often fix the bug too. It might not be
the packager's fault but it's often their problem


I can imagine. But I don't think requiring users to specify the value
according to what -mtune suggests will improve things. Users will write a
configure/cmake/... macro to parse the value -mtune prints and pass that
on
the command line (we'll soon find this solution on SO 😜). I.e. things are
likely to be even more broken.



Simpler would be a flag to say "set them based on -mtune", e.g.
-finterference-tuning or --param destructive-intereference-size=tuning.
That would be just as easy to write as -Wno-interference-size.

Jason



Please be very careful about an option name like that.  The x86 meaning 
and interpretation of -mtune is subtly different to that of Arm and 
AArch64 and possibly other targets as well.


Also, should the behaviour of a compiler configured with --with-cpu=foo 
be handled differently to a command-line option that sets foo 
explicitly?  In the back-end I'm not sure we can really tell the difference.


R.


Re: Need Help: Initialize paddings for -ftrivial-auto-var-init

2021-07-19 Thread Richard Biener
On Fri, 16 Jul 2021, Qing Zhao wrote:

> Hi, 
> 
> After some more study on __builtin_clear_padding and the corresponding 
> testing cases.
> And also considered both Richard Biener and Richard Sandiford’s previous 
> suggestion to use
> __builtin_clear_padding.  I have the following thought on the paddings 
> initialization:
> 
> ** We can insert a call to __builtin_clear_padding (&decl, 0) to all the 
> variables that need to be
> Auto-initialized during gimplification phase.  This include two places:
> 
> A. If the auto-variable does not have an explicit initializer, and we 
> need to add a call to .DEFERRED_INIT.
>  We always add a call to __builtin_clear_padding following this 
> .DEFERRED_INIT call.
> 
> structure_type temp;
> 
> temp = .DEFERRED_INIT ();
> __builtin_clear_padding (&temp, 0);
> 
> 
>NOTE: 
>   ** If temp has a type without paddings, then 
> __builtin_clear_padding will be lowered to a gimple_nop automatically.
>   ** regardless with zero or pattern init,  the paddings will be 
> always initialized to ZEROes, which is compatible with CLANG.
> 
> 
> B. If the auto-variable does HAVE an explicit initializer, then we will 
> add the call to __builtin_clear_padding 
>  In the beginning of “gimplify_init_constructor”.
> 
> 
>   structure_type temp = {…..};
> 
> 
>  __builtin_clear_padding (&temp, 0);
>  Expand_the_constructor;
> 
> NOTE:
> ** if temp has a type without padding, the call to 
> __builtin_clear_padding will be lowed to gimple_nop;
> ** padding will be always initialized to ZEROes. 
> 
> 
> **the major benefit with this approach are:
> 
>   1. Padding initialization will be compatible with CLANG;

Irrelevant IMHO.

>   2. Implemenation will be much more simple and consistent;

Really?  We're block-initializing now, so how is it simpler to
initialize padding separately?

> My questions:
> 
> 1. What do you think of this approach?

I wonder why we're going back to separate padding initialization?

> 2. During implementation, if I want to add the following routine:
> 
> /* Generate padding initialization for automatic vairable DECL.
>C guarantees that brace-init with fewer initializers than members
>aggregate will initialize the rest of the aggregate as-if it were
>static initialization.  In turn static initialization guarantees
>that padding is initialized to zero. So, we always initialize paddings
>to zeroes regardless INIT_TYPE.
>To do the padding initialization, we insert a call to
>__BUILTIN_CLEAR_PADDING (&decl, 0).
>*/
> static void
> gimple_add_padding_init_for_auto_var (tree decl, gimple_seq *seq_p)
> {
>  ?? how to build a addr_of_decl tree node???
>   tree addr_of_decl = ….

 = build_fold_addr_expr (decl);
 
>   gimple *call = gimple_build_call (builtin_decl_implicit 
> (BUILT_IN_CLEAR_PADDING),
> 2, addr_of_decl, build_zero_cst 
> (TREE_TYPE (decl));
>   gimplify_seq_add_stmt (seq_p, call);
> }
> 
> 
> I need help on how to build “addr_of_decl” in the above routine.
> 
> Thanks a lot for your help.
> 
> Qing
>  

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH v4] vect: Recog mul_highpart pattern

2021-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 7:33 AM Kewen.Lin  wrote:
>
> on 2021/7/15 下午7:58, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin  wrote:
> >>
> >> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote:
> >>> Hi Uros,
> >>>
> >>> on 2021/7/15 下午3:17, Uros Bizjak wrote:
>  On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin  wrote:
> >
> > on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> >> on 2021/7/14 下午2:38, Richard Biener wrote:
> >>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin  wrote:
> 
>  on 2021/7/13 下午8:42, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  
> > wrote:
> >>>
>  I guess the proposed IFN would be directly mapped for 
>  [us]mul_highpart?
> >>>
> >>> Yes.
> >>>
> >>
> >> Thanks for confirming!  The related patch v2 is attached and the 
> >> testing
> >> is ongoing.
> >>
> >
> > It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> > aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as 
> > below:
> >
> > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> 
>  These XFAILs should be removed after your patch.
> 
> >>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> >>> for this test case.  As noted above, this case is sensitive on how we
> >>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> >>> only with this mul_highpart pattern support, no matter how we model it
> >>> (x86 part of this patch exists or not).
> >>>
>  This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>  is actually not needed.
> 
> >>>
> >>> Thanks for the information!  The justification for the x86 part is that:
> >>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> >>> optab support, i386 port has already customized costing for
> >>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> >>> support), if we don't follow the same way for IFN_MULH, I'm worried that
> >>> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> >>> a right thing (we shouldn't cost it specially), it at least means we
> >>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> >>> has direct mul_highpart optab support, I think they should be costed
> >>> consistently.  Does it sound reasonable?
> >>>
> >>
> >> Hi Richard(s),
> >>
> >> This possibly inconsistent handling problem seems like a counter example
> >> better to use a new IFN rather than the existing tree_code, it seems hard
> >> to maintain (should remember to keep consistent for its handlings).  ;)
> >> From this perspective, maybe it's better to move backward to use tree_code
> >> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
> >> costing issue Richi pointed out before)?
> >>
> >> What do you think?
> >
> > No, whenever we want to do code generation based on machine
> > capabilities the canonical way to test for those is to look at optabs
> > and then it's most natural to keep that 1:1 relation and emit
> > internal function calls which directly map to supported optabs
> > instead of going back to some tree codes.
> >
> > When targets "lie" and provide expanders for something they can
> > only emulate then they have to compensate in their costing.
> > But as I understand this isn't the case for x86 here.
> >
> > Now, in this case we already have the MULT_HIGHPART_EXPR tree,
> > so yes, it might make sense to use that instead of introducing an
> > alternate way via the direct internal function.  Somebody decided
> > that MULT_HIGHPART is generic enough to warrant this - but I
> > see that expand_mult_highpart can fail unless can_mult_highpart_p
> > and this is exactly one of the cases we want to avoid - either
> > we can handle something generally in which case it can be a
> > tree code or we can't, then it should be 1:1 tied to optabs at best
> > (mult_highpart has scalar support only for the direct optab,
> > vector support also for widen_mult).
> >
>
> Thanks for the detailed explanation!  The attached v4 follows the
> preferred IFN way like v3, just with extra test case updates.
>
> Bootstrapped & regtested again on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> PR tree-optimization/100696
> * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
> * internal-fn.def (IFN_MULH): New internal function.
> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> recog normal multiply highpart as IFN_

Re: [PATCH v2] gcc_update: use human readable name for revision string in gcc/REVISION

2021-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 12:37 PM Serge Belyshev
 wrote:
>
> Based on discussion I've chosen open-coded version without commit hash.

As said I'd prefer one with (shortened) hash, but then I'm not angry
if people agree on sth else.

> >> > > ...  Perhaps rename the r, o, rr and m temporaries.
>
> I like it better with short names, there is no other code in that
> script to clash with.  (Also, two adjacent case branches for hg and svn
> are essentialy dead now).
>
> >> ...  Perhaps also replace both HEAD occurences with $revision
>
> not sure about that: should not they be exactly equivalent in all cases?
>
> ---
> gcc_update: use human readable name for revision string in gcc/REVISION
>
> contrib/Changelog:
>
> * gcc_update: derive human readable name for HEAD using git describe
> like "git gcc-descr" does.  Drop "revision" from gcc/REVISION.
> ---
>  contrib/gcc_update | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/gcc_update b/contrib/gcc_update
> index 80fac9fc995..558926b3a2d 100755
> --- a/contrib/gcc_update
> +++ b/contrib/gcc_update
> @@ -332,7 +332,22 @@ case $vcs_type in
>  exit 1
> fi
>
> -   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
> +   # Open-coded version of "git gcc-descr" from 
> contrib/gcc-git-customization.sh
> +   revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
> +   r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
> +  | sed -n 
> 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
> +   if test -n $r; then
> +   o=`$GCC_GIT config --get gcc-config.upstream`;
> +   rr=`echo $r | sed -n 
> 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
> +   if $GCC_GIT rev-parse --verify --quiet 
> ${o:-origin}/releases/gcc-$rr >/dev/null; then
> +   m=releases/gcc-$rr;
> +   else
> +   m=master;
> +   fi;
> +   if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; 
> then
> +   revision=$r;
> +   fi
> +   fi
> branch=`$GCC_GIT name-rev --name-only HEAD || :`
> ;;
>
> @@ -414,6 +429,6 @@ rm -f LAST_UPDATED gcc/REVISION
>  date
>  echo "`TZ=UTC date` (revision $revision)"
>  } > LAST_UPDATED
> -echo "[$branch revision $revision]" > gcc/REVISION
> +echo "[$branch $revision]" > gcc/REVISION
>
>  touch_files_reexec


Re: [PATCH 0/2] Allow means for targets to opt out of CTF/BTF

2021-07-19 Thread Richard Biener via Gcc-patches
On Sat, Jul 17, 2021 at 4:45 PM Indu Bhagat via Gcc-patches
 wrote:
>
> Hello,
>
> Thanks for your feedback on the previous RFC version of this proposal. This
> patch set is a refined and tested version of the same.
>   - Added changes to tm.texi.in and regenerated tm.texi.
>   - Updated the dejagnu files for redundant checks on AIX platform.
>
> Bootstrapped and reg tested on x86_64-pc-linux-gnu and powerpc-ibm-aix7.2.4.0.

OK.

Thanks,
Richard.

> Thanks,
>
> Indu Bhagat (2):
>   debug: Add new function ctf_debuginfo_p
>   debug: Allow means for targets to opt out of CTF/BTF support
>
>  gcc/config/elfos.h |  8 
>  gcc/doc/tm.texi| 26 ++
>  gcc/doc/tm.texi.in | 26 ++
>  gcc/flags.h|  4 
>  gcc/opts.c |  8 
>  gcc/testsuite/gcc.dg/debug/btf/btf.exp | 16 +---
>  gcc/testsuite/gcc.dg/debug/ctf/ctf.exp | 16 +---
>  gcc/testsuite/lib/gcc-dg.exp   |  1 -
>  gcc/toplev.c   | 11 +--
>  9 files changed, 99 insertions(+), 17 deletions(-)
>
> --
> 1.8.3.1
>


Re: [PATCH v2] gcc_update: use human readable name for revision string in gcc/REVISION

2021-07-19 Thread Iain Sandoe via Gcc-patches



> On 19 Jul 2021, at 11:39, Richard Biener via Gcc-patches 
>  wrote:
> 
> On Fri, Jul 16, 2021 at 12:37 PM Serge Belyshev
>  wrote:
>> 
>> Based on discussion I've chosen open-coded version without commit hash.
> 
> As said I'd prefer one with (shortened) hash,

Likewise, I’ve been using a local change to produce “r12-2447-gcca1e30db142”  
since soon after
change; I suspect that 12 digits is ‘enough’.  It makes it easier for the folks 
who want to find by 
SHA1 as well as folks who want to find by revision number.

> but then I'm not angry
> if people agree on sth else.

also likewise…

> 
>> ...  Perhaps rename the r, o, rr and m temporaries.
>> 
>> I like it better with short names, there is no other code in that
>> script to clash with.  (Also, two adjacent case branches for hg and svn
>> are essentialy dead now).
>> 
 ...  Perhaps also replace both HEAD occurences with $revision
>> 
>> not sure about that: should not they be exactly equivalent in all cases?
>> 
>> ---
>> gcc_update: use human readable name for revision string in gcc/REVISION
>> 
>> contrib/Changelog:
>> 
>>* gcc_update: derive human readable name for HEAD using git describe
>>like "git gcc-descr" does.  Drop "revision" from gcc/REVISION.
>> ---
>> contrib/gcc_update | 19 +--
>> 1 file changed, 17 insertions(+), 2 deletions(-)
>> 
>> diff --git a/contrib/gcc_update b/contrib/gcc_update
>> index 80fac9fc995..558926b3a2d 100755
>> --- a/contrib/gcc_update
>> +++ b/contrib/gcc_update
>> @@ -332,7 +332,22 @@ case $vcs_type in
>> exit 1
>>fi
>> 
>> -   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
>> +   # Open-coded version of "git gcc-descr" from 
>> contrib/gcc-git-customization.sh
>> +   revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
>> +   r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
>> +  | sed -n 
>> 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
>> +   if test -n $r; then
>> +   o=`$GCC_GIT config --get gcc-config.upstream`;
>> +   rr=`echo $r | sed -n 
>> 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
>> +   if $GCC_GIT rev-parse --verify --quiet 
>> ${o:-origin}/releases/gcc-$rr >/dev/null; then
>> +   m=releases/gcc-$rr;
>> +   else
>> +   m=master;
>> +   fi;
>> +   if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; 
>> then
>> +   revision=$r;
>> +   fi
>> +   fi
>>branch=`$GCC_GIT name-rev --name-only HEAD || :`
>>;;
>> 
>> @@ -414,6 +429,6 @@ rm -f LAST_UPDATED gcc/REVISION
>> date
>> echo "`TZ=UTC date` (revision $revision)"
>> } > LAST_UPDATED
>> -echo "[$branch revision $revision]" > gcc/REVISION
>> +echo "[$branch $revision]" > gcc/REVISION
>> 
>> touch_files_reexec



Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-19 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 10:23 PM Hafiz Abid Qadeer
 wrote:
>
> On 15/07/2021 13:09, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
> >  wrote:
> >>
> >> On 15/07/2021 11:33, Thomas Schwinge wrote:
> >>>
>  Note that the "parent" should be abstract but I don't think dwarf has a
>  way to express a fully abstract parent of a concrete instance child - or
>  at least how GCC expresses this causes consumers to "misinterpret"
>  that.  I wonder if adding a DW_AT_declaration to the late DWARF
>  emitted "parent" would fix things as well here?
> >>>
> >>> (I suppose not, Abid?)
> >>>
> >>
> >> Yes, adding DW_AT_declaration does not fix the problem.
> >
> > Does emitting
> >
> > DW_TAG_compile_unit
> >   DW_AT_name("")
> >
> >   DW_TAG_subprogram // notional parent function (foo) with no code range
> > DW_AT_declaration 1
> > a:DW_TAG_subprogram // offload function foo._omp_fn.0
> >   DW_AT_declaration 1
> >
> >   DW_TAG_subprogram // offload function
> >   DW_AT_abstract_origin a
> > ...
> >
> > do the trick?  The following would do this, flattening function definitions
> > for the concrete copies:
> >
> > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> > index 82783c4968b..a9c8bc43e88 100644
> > --- a/gcc/dwarf2out.c
> > +++ b/gcc/dwarf2out.c
> > @@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
> >/* Peel types in the context stack.  */
> >while (ctx && TYPE_P (ctx))
> >  ctx = TYPE_CONTEXT (ctx);
> > +  /* For functions peel the context up to namespace/TU scope.  The abstract
> > + copies reveal the true nesting.  */
> > +  if (TREE_CODE (decl) == FUNCTION_DECL)
> > +while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
> > +  ctx = DECL_CONTEXT (ctx);
> >/* Likewise namespaces in case we do not want to emit DIEs for them.  */
> >if (debug_info_level <= DINFO_LEVEL_TERSE)
> >  while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
> > @@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
> > /* Leave function local entities parent determination to when
> >we process scope vars.  */
> > ;
> > -  else
> > -   parent = lookup_decl_die (ctx);
> > +  parent = lookup_decl_die (ctx);
> >  }
> >else
> >  /* In some cases the FEs fail to set DECL_CONTEXT properly.
> >
>
> Thanks. This solves the problem. Only the first hunk was required. Second hunk
> actually causes an ICE when TREE_CODE (ctx) == BLOCK.
> OK to commit the attached patch?

I think we need to merge the TYPE_P peeling and FUNCTION_DECL peeling into
one loop since I suppose we can have a nested function in class scope.
So sth like

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 82783c4968b..61228410b51 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6073,8 +6073,12 @@ maybe_create_die_with_external_ref (tree decl)
 }
   else
 ctx = DECL_CONTEXT (decl);
-  /* Peel types in the context stack.  */
-  while (ctx && TYPE_P (ctx))
+  /* Peel types in the context stack.  For functions peel the context up
+ to namespace/TU scope.  The abstract copies reveal the true nesting.  */
+  while (ctx
+&& (TYPE_P (ctx)
+|| (TREE_CODE (decl) == FUNCTION_DECL
+&& TREE_CODE (ctx) == FUNCTION_DECL)))
 ctx = TYPE_CONTEXT (ctx);
   /* Likewise namespaces in case we do not want to emit DIEs for them.  */
   if (debug_info_level <= DINFO_LEVEL_TERSE)

if that works it's OK.  Can you run it on the gdb testsuite with -flto added
as well please (you need to do before/after comparison since IIRC adding
-flto will add a few fails).

Thanks,
Richard.


Re: [PATCH v2] gcc_update: use human readable name for revision string in gcc/REVISION

2021-07-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 19, 2021 at 11:47:01AM +0100, Iain Sandoe wrote:
> 
> 
> > On 19 Jul 2021, at 11:39, Richard Biener via Gcc-patches 
> >  wrote:
> > 
> > On Fri, Jul 16, 2021 at 12:37 PM Serge Belyshev
> >  wrote:
> >> 
> >> Based on discussion I've chosen open-coded version without commit hash.
> > 
> > As said I'd prefer one with (shortened) hash,
> 
> Likewise, I’ve been using a local change to produce “r12-2447-gcca1e30db142”  
> since soon after
> change; I suspect that 12 digits is ‘enough’.  It makes it easier for the 
> folks who want to find by 
> SHA1 as well as folks who want to find by revision number.

If the patch is changed from
revision=$r;
to
revision=${r}-g${revision};
then the exact hash length isn't hardcoded to 12 digits, but whatever git
chooses to make it unique (with the current state of repository).
Of course, changes later on in the repository can introduce collisions and
more hash letters might be needed, but that is a general git problem.  And
at least when we have the rXX--gZ revisions, it is still unique
on the release branches, just one might need to use
$(git gcc-descr `echo $rev | sed 's/-g[0-9a-f]*$//'`)
instead of just
$rev
in case such collision happens if git tells rXX--g is ambiguous.

Jakub



Re: [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' (was: [PATCH libatomic/arm] avoid warning on constant a

2021-07-19 Thread Andrew Stubbs

On 19/07/2021 09:46, Thomas Schwinge wrote:

GCN already uses address 4 for this value because address 0 caused
problems with null-pointer checks.


Ugh.  How much wasted bytes per what is that?  (I haven't looked yet;
hopefully not per GPU thread?)  Because:


It's 4 bytes per gang. And that pointer is the only 8 bytes in the whole 
of LDS (OpenMP mostly uses stack and heap), so it's not so bad, but still.


I did investigate the target macro that lets you control null pointer 
behaviour, but it didn't just work, and it wasn't important enough for 
me to spend more time on it so I let it go.


Andrew


Re: [committed] .dir-locals.el: Set 'fill-column' to 80 for c-mode

2021-07-19 Thread Richard Earnshaw via Gcc-patches




On 14/12/2020 11:29, Andrea Corallo via Gcc-patches wrote:

Hi all,

I've committed the attached patch as obvious.

This is to set `fill-column' to 80 in c-mode (Emacs default it to 70) so
now M-q does the right thing.  I think is very handy especially in
comments.

Question: should we update the copyright manually for this file or have
it updated by 'update-copyright.py'?  I think this is not scanning the
root of the repo.

Thanks

   Andrea



Sorry for the very late reply to this, but I've only just tracked it 
down as the cause of why emacs had suddenly started to produce lines 
that were displaying with a line wrap (I'd add an image, but the mailing 
list would likely only strip it off).


The problem is that this allows a character to appear in column 80 and 
emacs then automatically inserts a blank line after it (when the window 
is 80 columns wide), which really messes up the formatting.


The wiki seems to suggest https://gcc.gnu.org/wiki/FormattingCodeForGCC 
that the line length should be 79 columns (see the textwidth setting); 
although that is not what is set contrib/vimrc.


Would anyone object if we reduced this by 1 (to 79) in both places?

R.


[PATCH v3] gcc_update: use human readable name for revision string in gcc/REVISION

2021-07-19 Thread Serge Belyshev


>> > On 19 Jul 2021, at 11:39, Richard Biener via Gcc-patches 
>> >  wrote:
>> > 
>> > On Fri, Jul 16, 2021 at 12:37 PM Serge Belyshev
>> >  wrote:
>> >> 
>> >> Based on discussion I've chosen open-coded version without commit hash.
>> > 
>> > As said I'd prefer one with (shortened) hash,

Oh, I misunderstood then.

>> 
>> Likewise, I’ve been using a local change to produce “r12-2447-gcca1e30db142” 
>>  since soon after
>> change; I suspect that 12 digits is ‘enough’.  It makes it easier for the 
>> folks who want to find by 
>> SHA1 as well as folks who want to find by revision number.
>
> If the patch is changed from
>   revision=$r;
> to
>   revision=${r}-g${revision};
> then the exact hash length isn't hardcoded to 12 digits, but whatever git
> chooses to make it unique (with the current state of repository).
> Of course, changes later on in the repository can introduce collisions and
> more hash letters might be needed, but that is a general git problem.  And
> at least when we have the rXX--gZ revisions, it is still unique
> on the release branches, just one might need to use
> $(git gcc-descr `echo $rev | sed 's/-g[0-9a-f]*$//'`)
> instead of just
> $rev
> in case such collision happens if git tells rXX--g is ambiguous.
>

Right, variant with hash has the advantage that it is understood by git
out of the box without customisations, so be it then.

OK for mainline?

---
contrib/Changelog:

* gcc_update: derive human readable name for HEAD using git describe
like "git gcc-descr" with short commit hash.  Drop "revision" from 
gcc/REVISION.
---
 contrib/gcc_update | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 80fac9fc995..ce472545e25 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -332,7 +332,22 @@ case $vcs_type in
 exit 1
fi
 
-   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
+   # Open-coded version of "git gcc-descr" from 
contrib/gcc-git-customization.sh
+   revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
+   r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
+  | sed -n 
's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
+   if test -n $r; then
+   o=`$GCC_GIT config --get gcc-config.upstream`;
+   rr=`echo $r | sed -n 
's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
+   if $GCC_GIT rev-parse --verify --quiet 
${o:-origin}/releases/gcc-$rr >/dev/null; then
+   m=releases/gcc-$rr;
+   else
+   m=master;
+   fi;
+   if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; then
+   revision=${r}-g${revision};
+   fi
+   fi
branch=`$GCC_GIT name-rev --name-only HEAD || :`
;;
 
@@ -414,6 +429,6 @@ rm -f LAST_UPDATED gcc/REVISION
 date
 echo "`TZ=UTC date` (revision $revision)"
 } > LAST_UPDATED
-echo "[$branch revision $revision]" > gcc/REVISION
+echo "[$branch $revision]" > gcc/REVISION
 
 touch_files_reexec


[PATCH] tree-optimization/101505 - properly determine stmt precision for PHIs

2021-07-19 Thread Richard Biener
Loop vectorization pattern recog fails to walk PHIs when determining
stmt precisions.  This fails to recognize non-mask uses for bools
in PHIs and outer loop vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-19  Richard Biener  

PR tree-optimization/101505
* tree-vect-patterns.c (vect_determine_precisions): Walk
PHIs also for loop vectorization.

* gcc.dg/vect/pr101505.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr101505.c | 16 
 gcc/tree-vect-patterns.c | 14 ++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101505.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101505.c 
b/gcc/testsuite/gcc.dg/vect/pr101505.c
new file mode 100644
index 000..e2b8945a5cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101505.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O1" } */
+
+int n2;
+
+__attribute__ ((simd)) char
+w7 (void)
+{
+  short int xb = n2;
+  int qp;
+
+  for (qp = 0; qp < 2; ++qp)
+xb = xb < 1;
+
+  return xb;
+}
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index c2494446183..44f6c9b2bd6 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -5355,6 +5355,13 @@ vect_determine_precisions (vec_info *vinfo)
   for (unsigned int i = 0; i < nbbs; i++)
{
  basic_block bb = bbs[i];
+ for (auto gsi = gsi_start_phis (bb);
+  !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
+ if (stmt_info)
+   vect_determine_mask_precision (vinfo, stmt_info);
+   }
  for (auto si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
if (!is_gimple_debug (gsi_stmt (si)))
  vect_determine_mask_precision
@@ -5368,6 +5375,13 @@ vect_determine_precisions (vec_info *vinfo)
if (!is_gimple_debug (gsi_stmt (si)))
  vect_determine_stmt_precisions
(vinfo, vinfo->lookup_stmt (gsi_stmt (si)));
+ for (auto gsi = gsi_start_phis (bb);
+  !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
+ if (stmt_info)
+   vect_determine_stmt_precisions (vinfo, stmt_info);
+   }
}
 }
   else
-- 
2.26.2


Re: [PATCH] c-family: Add __builtin_assoc_barrier

2021-07-19 Thread Richard Biener
On Mon, 19 Jul 2021, Matthias Kretz wrote:

> tested on x86_64-pc-linux-gnu with no new failures. OK for master?

I think now that PAREN_EXPR can appear in C++ code you need to
adjust some machiner to expect it (constexpr folding?  template stuff?).
I suggest to add some testcases covering templates and constexpr
functions.

+@deftypefn {Built-in Function} @var{type} __builtin_assoc_barrier
(@var{type} @var{expr})
+This built-in represents a re-association barrier for the floating-point
+expression @var{expr} with operations following the built-in. The
expression
+@var{expr} itself can be reordered, and the whole expression @var{expr} 
can
be
+reordered with operations after the barrier.

What operations follow the built-in also applies to operations leading
the builtin?  Maybe "This built-in represents a re-association barrier
for the floating-point expression @var{expr} with the expression
consuming its value."  But I'm not an english speaker - I guess
I'm mostly confused about "follow" here.

I'm not sure if there are better C/C++ language terms describing what
the builtin does, but basically it appears as opaque operand to the
surrounding expression and the surrounding expression is opaque
to the expression inside the parens.

 The barrier is only relevant
when
+@code{-fassociative-math} is active, since otherwise floating-point is 
not
+treated as associative.
+
+@smallexample
+float x0 = a + b - b;
+float x1 = __builtin_assoc_barrier(a + b) - b;
+@end smallexample
+
+@noindent
+means that, with @code{-fassociative-math}, @code{x0} can be optimized to
+@code{x0 = a} but @code{x1} cannot.
+@end deftypefn
+

Otherwise the patch looks OK, but of course C/C++ frontend maintainers
would want to chime in here (I've CCed two).

Richard.


> New builtin to enable explicit use of PAREN_EXPR in C & C++ code.
> 
> Signed-off-by: Matthias Kretz 
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/builtin-assoc-barrier-1.c: New test.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-objcp-common.c (names_builtin_p): Handle
>   RID_BUILTIN_ASSOC_BARRIER.
>   * parser.c (cp_parser_postfix_expression): Handle
>   RID_BUILTIN_ASSOC_BARRIER.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
>   * c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
>   * c-parser.c (c_parser_postfix_expression): Likewise.
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi: Document __builtin_assoc_barrier.
> ---
>  gcc/c-family/c-common.c   |  1 +
>  gcc/c-family/c-common.h   |  2 +-
>  gcc/c/c-decl.c|  1 +
>  gcc/c/c-parser.c  | 20 
>  gcc/cp/cp-objcp-common.c  |  1 +
>  gcc/cp/parser.c   | 14 +++
>  gcc/doc/extend.texi   | 18 ++
>  .../c-c++-common/builtin-assoc-barrier-1.c| 24 +++
>  8 files changed, 80 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c
> 
> 
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

2021-07-19 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
 wrote:
>
> On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
>  wrote:
> >
> > Hi Prathamesh,
> >
> > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches 
> >  wrote:
> >>
> >>
> >>
> >> > -Original Message-
> >> > From: Prathamesh Kulkarni 
> >> > Sent: 05 July 2021 10:18
> >> > To: gcc Patches ; Kyrylo Tkachov
> >> > 
> >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> >> > intrinsics
> >> >
> >> > Hi Kyrill,
> >> > I assume this patch is OK to commit after bootstrap+testing ?
> >>
> >> Yes.
> >> Thanks,
> >> Kyrill
> >>
> >
> >
> > The updated testcase fails on some configs:
> > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 
> > times
> > FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times 
> > vdup\\.16\\tq[0-9]+, r[0-9]+ 3
> >
> > For instance on arm-none-eabi with default configuration flags 
> > (mode/cpu/fpu)
> > and default runtestflags.
> > The same toolchain config also fails on this test when overriding 
> > runtestflags with:
> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> >
> > Can you fix this please?
> Hi Christophe,
> Sorry for the breakage, I will take a look.
The issue is for the following function;

float16x8_t f2 (float16x8_t __a, float16_t __b) {
  return __a * __b;
}

With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:
f2:
ldrhip, [sp]@ __fp16
vmovd18, r0, r1  @ v8hf
vmovd19, r2, r3
vdup.16 q8, ip
vmul.f16q8, q8, q9
vmovr0, r1, d16  @ v8hf
vmovr2, r3, d17
bx  lr

It correctly generates vdup, but IIUC, r0-r3 are used up in loading
'a' into q9 (d18 / d19),
and it uses ip for loading 'b' and ends up with vdup q8, ip, and thus
the scan for "vdup\\.16\\tq[0-9]+, r[0-9]+" fails.
I tried to adjust the scan to following to accommodate ip:
/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } }  */
but that still FAIL's because log shows:
gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+,
(r[0-9]+|ip) found 6 times

Could you suggest how should I adjust the test, so the second operand
can be either r[0-9]+ or ip register ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> >
> > Christophe
> >
> >> >
> >> > Thanks,
> >> > Prathamesh


Re: [committed] .dir-locals.el: Set 'fill-column' to 80 for c-mode

2021-07-19 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> On 14/12/2020 11:29, Andrea Corallo via Gcc-patches wrote:
>> Hi all,
>> 
>> I've committed the attached patch as obvious.
>> 
>> This is to set `fill-column' to 80 in c-mode (Emacs default it to 70) so
>> now M-q does the right thing.  I think is very handy especially in
>> comments.
>> 
>> Question: should we update the copyright manually for this file or have
>> it updated by 'update-copyright.py'?  I think this is not scanning the
>> root of the repo.
>> 
>> Thanks
>> 
>>Andrea
>> 
>
> Sorry for the very late reply to this, but I've only just tracked it 
> down as the cause of why emacs had suddenly started to produce lines 
> that were displaying with a line wrap (I'd add an image, but the mailing 
> list would likely only strip it off).
>
> The problem is that this allows a character to appear in column 80 and 
> emacs then automatically inserts a blank line after it (when the window 
> is 80 columns wide), which really messes up the formatting.
>
> The wiki seems to suggest https://gcc.gnu.org/wiki/FormattingCodeForGCC 
> that the line length should be 79 columns (see the textwidth setting); 
> although that is not what is set contrib/vimrc.
>
> Would anyone object if we reduced this by 1 (to 79) in both places?

Sounds good to me FWIW.  (Having hit the same issue.)

Richard


Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-19 Thread Richard Earnshaw via Gcc-patches
On 16/07/2021 08:29, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Jul 16, 2021 at 09:06:01AM +0200, Richard Biener via Gcc-patches 
> wrote:
>> On Thu, Jul 15, 2021 at 9:12 PM Serge Belyshev
>>  wrote:
>>>
>>> This is to make development version string more readable, and
>>> to simplify navigation through gcc-testresults.
>>>
>>> Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
>>> generate version string, which is somewhat excessive since conversion
>>> to git because commit hashes are now stable.
>>>
>>> Even better, gcc-git-customization.sh script provides gcc-descr alias
>>> which makes prettier version string, and thus use it instead (or just
>>> abbreviated commit hash when the alias is not available).
>>>
>>> Before: [master revision 
>>> b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
>>> After: [master r12-2331]
>>>
>>> OK for mainline?
>>
>> Can you instead open-code gcc-descr in this script?
> 
> Yeah, that will mean consistency no matter whether one has the
> customizations installed or not.
> And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT gcc-descr 
> HEAD
> (the default is $GCC_GIT gcc-descr master).
> As you want to use gcc-descr without --full, I think
>   revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
>   r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
>  | sed -n 
> 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
>   if test -n $r; then
>   o=`$GCC_GIT config --get gcc-config.upstream`;
>   rr=`echo $r | sed -n 
> 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
>   if $GCC_GIT rev-parse --verify --quiet 
> ${o:-origin}/releases/gcc-$rr >/dev/null; then
>   m=releases/gcc-$rr;
>   else
>   m=master;
>   fi;
>   if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; then
>   revision=$r;
>   fi
>   fi
> will do it.  Perhaps rename the r, o, rr and m temporaries.
> 
>   Jakub
> 

Isn't this going to do the wrong thing on any branch that isn't master
or a release branch?  I don't think git gcc-descr is meaningful outside
of those contexts.

R.


Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jul 2021 at 07:20, Kewen.Lin  wrote:
>
> Hi,
>
> This patch follows Martin's suggestion here[1], to support
> range-based for loops for traversing loops, analogously to
> the patch for vec[2].
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Any comments are appreciated.

In the loops_list::iterator type, this looks a little strange:

+bool
+operator!= (const iterator &rhs) const
+{
+  return this->curr_idx < rhs.curr_idx;
+}
+

This works fine when the iterator type is used implicitly in a
range-based for loop, but it wouldn't work for explicit uses of the
iterator type where somebody does the != comparison with the
past-the-end iterator on on the LHS:

auto&& list ALL_LOOPS(foo);
auto end = list.end();
auto begin = list.begin();
while (--end != begin)


Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 19, 2021 at 02:53:13PM +0100, Richard Earnshaw wrote:
> > Yeah, that will mean consistency no matter whether one has the
> > customizations installed or not.
> > And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT gcc-descr 
> > HEAD
> > (the default is $GCC_GIT gcc-descr master).
> > As you want to use gcc-descr without --full, I think
> > revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
> > r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
> >| sed -n 
> > 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
> > if test -n $r; then
> > o=`$GCC_GIT config --get gcc-config.upstream`;
> > rr=`echo $r | sed -n 
> > 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
> > if $GCC_GIT rev-parse --verify --quiet 
> > ${o:-origin}/releases/gcc-$rr >/dev/null; then
> > m=releases/gcc-$rr;
> > else
> > m=master;
> > fi;
> > if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; then
> > revision=$r;
> > fi
> > fi
> > will do it.  Perhaps rename the r, o, rr and m temporaries.
> > 
> > Jakub
> > 
> 
> Isn't this going to do the wrong thing on any branch that isn't master
> or a release branch?  I don't think git gcc-descr is meaningful outside
> of those contexts.

git merge-base --is-ancestor test should fail on those branches,
origin/releases/gcc-11 or origin/master etc. will then not have your HEAD
as their ancestor.

Jakub



Re: Need Help: Initialize paddings for -ftrivial-auto-var-init

2021-07-19 Thread Qing Zhao via Gcc-patches


> On Jul 19, 2021, at 5:33 AM, Richard Biener  wrote:
> 
> On Fri, 16 Jul 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> After some more study on __builtin_clear_padding and the corresponding 
>> testing cases.
>> And also considered both Richard Biener and Richard Sandiford’s previous 
>> suggestion to use
>> __builtin_clear_padding.  I have the following thought on the paddings 
>> initialization:
>> 
>> ** We can insert a call to __builtin_clear_padding (&decl, 0) to all the 
>> variables that need to be
>> Auto-initialized during gimplification phase.  This include two places:
>> 
>>A. If the auto-variable does not have an explicit initializer, and we 
>> need to add a call to .DEFERRED_INIT.
>> We always add a call to __builtin_clear_padding following this 
>> .DEFERRED_INIT call.
>> 
>>structure_type temp;
>> 
>>temp = .DEFERRED_INIT ();
>>__builtin_clear_padding (&temp, 0);
>> 
>> 
>>   NOTE: 
>>  ** If temp has a type without paddings, then 
>> __builtin_clear_padding will be lowered to a gimple_nop automatically.
>>  ** regardless with zero or pattern init,  the paddings will be 
>> always initialized to ZEROes, which is compatible with CLANG.
>> 
>> 
>>B. If the auto-variable does HAVE an explicit initializer, then we will 
>> add the call to __builtin_clear_padding 
>> In the beginning of “gimplify_init_constructor”.
>> 
>> 
>>  structure_type temp = {…..};
>> 
>> 
>> __builtin_clear_padding (&temp, 0);
>> Expand_the_constructor;
>> 
>>NOTE:
>>** if temp has a type without padding, the call to 
>> __builtin_clear_padding will be lowed to gimple_nop;
>>** padding will be always initialized to ZEROes. 
>> 
>> 
>> **the major benefit with this approach are:
>> 
>>  1. Padding initialization will be compatible with CLANG;
> 
> Irrelevant IMHO.
Even though it’s not very important, I think that it will be nicer if we can 
make it.  

> 
>>  2. Implemenation will be much more simple and consistent;
> 
> Really?  We're block-initializing now, so how is it simpler to
> initialize padding separately?

For pattern initialization, after the block-initializing, we can add the call 
to __builtin_clear_padding to initialize the paddings with zeroes.
Then all the paddings will be initialized with zeroes regardless whether it’s 
zero init or pattern init.

For zero initialization, after the block-initializing, we will Not add the call 
to __builtin_clear_padding. 

> 
>> My questions:
>> 
>> 1. What do you think of this approach?
> 
> I wonder why we're going back to separate padding initialization?

This mainly due to the implementation for initialize paddings of 
fully-initialized structure variables, we can add an additional 
block-initializing for 
paddings initialization before the real constructor initialization.  Or we can 
add a call to __builtin_clear_padding after the real constructor initialization 
to only initialize the separate paddings. 

Both approach should work, but after some study, I think adding call to 
__builtin_clear_padding should be better due to:

1. Easier to be implemented.  For the other approach, we will need to come up 
with a new routine “type_has_padding” to decide whether we need to
add the block initialization for the structure, as both you and Richard 
Sandiford commented before, this new routine “type_has_padding” SHOULE reuse
the current routine “clear_padding_type” in simple-fold.c, after my study, it’s 
quite difficult  to reorg  “clear_padding_type” to be shared between 
“gimple_fold_builtin_clear_padding”
and the new routine “type_has_padding”.   And also after this detailed study, I 
found that directly call “__builtin_clear_padding” might be more simpler
and clean.

2. Padding will be all zeroed regardless wether it’s zero init or pattern init. 
I think that this is nicer than pattern initializing the paddings.


>> 2. During implementation, if I want to add the following routine:
>> 
>> /* Generate padding initialization for automatic vairable DECL.
>>   C guarantees that brace-init with fewer initializers than members
>>   aggregate will initialize the rest of the aggregate as-if it were
>>   static initialization.  In turn static initialization guarantees
>>   that padding is initialized to zero. So, we always initialize paddings
>>   to zeroes regardless INIT_TYPE.
>>   To do the padding initialization, we insert a call to
>>   __BUILTIN_CLEAR_PADDING (&decl, 0).
>>   */
>> static void
>> gimple_add_padding_init_for_auto_var (tree decl, gimple_seq *seq_p)
>> {
>> ?? how to build a addr_of_decl tree node???
>>  tree addr_of_decl = ….
> 
> = build_fold_addr_expr (decl);

Thank you.

Qing
> 
>>  gimple *call = gimple_build_call (builtin_decl_implicit 
>> (BUILT_IN_CLEAR_PADDING),
>>2, addr_of_decl, build_zero_cst 
>> (TREE_TYPE (decl));
>>  gimplify_seq_add_stmt (seq_p, call);
>> }
>> 

[pushed] c++: Add test for DR 2126

2021-07-19 Thread Marek Polacek via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

DR 2126

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-temp2.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C | 6 ++
 1 file changed, 6 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C
new file mode 100644
index 000..28ffd2c86c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C
@@ -0,0 +1,6 @@
+// DR 2126
+// { dg-do compile { target c++11 } }
+
+typedef const int CI[3];
+constexpr CI &ci = CI{11, 22, 33};
+static_assert(ci[1] == 22, "");

base-commit: 8df3ee8f7d85d0708f3c3ca96b55c9230c2ae9f0
-- 
2.31.1



Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-19 Thread Richard Biener via Gcc-patches
On Mon, Jul 19, 2021 at 8:20 AM Kewen.Lin  wrote:
>
> Hi,
>
> This patch follows Martin's suggestion here[1], to support
> range-based for loops for traversing loops, analogously to
> the patch for vec[2].
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Any comments are appreciated.

Since you are touching all FOR_EACH_LOOP please
make implicit 'cfun' uses explicit.  I'm not sure ALL_LOOPS
should scream, I think all_loops (function *, flags) would be
nicer.

Note I'm anticipating iteration over a subset of the loop tree
which would ask for specifying the 'root' of the loop tree to
iterate over so it could be

  loops_list (class loop *root, unsigned flags)

and the "all" cases use loops_list (loops_for_fn (cfun), flags) then.
Providing an overload with struct function is of course OK.

Richard.

> BR,
> Kewen
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573424.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
> -
> gcc/ChangeLog:
>
> * cfgloop.h (class loop_iterator): Rename to ...
> (class loops_list): ... this.
> (loop_iterator::next): Rename to ...
> (loops_list::iterator::fill_curr_loop): ... this and adjust.
> (loop_iterator::loop_iterator): Rename to ...
> (loops_list::loops_list): ... this and adjust.
> (FOR_EACH_LOOP): Rename to ...
> (ALL_LOOPS): ... this.
> (FOR_EACH_LOOP_FN): Rename to ...
> (ALL_LOOPS_FN): this.
> (loops_list::iterator): New class.
> (loops_list::begin): New function.
> (loops_list::end): Likewise.
> * cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with ALL_LOOPS*.
> (sort_sibling_loops): Likewise.
> (disambiguate_loops_with_multiple_latches): Likewise.
> (verify_loop_structure): Likewise.
> * cfgloopmanip.c (create_preheaders): Likewise.
> (force_single_succ_latches): Likewise.
> * config/aarch64/falkor-tag-collision-avoidance.c
> (execute_tag_collision_avoidance): Likewise.
> * config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
> * config/s390/s390.c (s390_adjust_loops): Likewise.
> * doc/loop.texi: Likewise.
> * gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
> * gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
> * gimple-loop-versioning.cc (loop_versioning::analyze_blocks): 
> Likewise.
> (loop_versioning::make_versioning_decisions): Likewise.
> * gimple-ssa-split-paths.c (split_paths): Likewise.
> * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): 
> Likewise.
> * graphite.c (canonicalize_loop_form): Likewise.
> (graphite_transform_loops): Likewise.
> * ipa-fnsummary.c (analyze_function_body): Likewise.
> * ipa-pure-const.c (analyze_function): Likewise.
> * loop-doloop.c (doloop_optimize_loops): Likewise.
> * loop-init.c (loop_optimizer_finalize): Likewise.
> (fix_loop_structure): Likewise.
> * loop-invariant.c (calculate_loop_reg_pressure): Likewise.
> (move_loop_invariants): Likewise.
> * loop-unroll.c (decide_unrolling): Likewise.
> (unroll_loops): Likewise.
> * modulo-sched.c (sms_schedule): Likewise.
> * predict.c (predict_loops): Likewise.
> (pass_profile::execute): Likewise.
> * profile.c (branch_prob): Likewise.
> * sel-sched-ir.c (sel_finish_pipelining): Likewise.
> (sel_find_rgns): Likewise.
> * tree-cfg.c (replace_loop_annotate): Likewise.
> (replace_uses_by): Likewise.
> (move_sese_region_to_fn): Likewise.
> * tree-if-conv.c (pass_if_conversion::execute): Likewise.
> * tree-loop-distribution.c (loop_distribution::execute): Likewise.
> * tree-parloops.c (parallelize_loops): Likewise.
> * tree-predcom.c (tree_predictive_commoning): Likewise.
> * tree-scalar-evolution.c (scev_initialize): Likewise.
> (scev_reset): Likewise.
> * tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
> * tree-ssa-live.c (remove_unused_locals): Likewise.
> * tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
> * tree-ssa-loop-im.c (analyze_memory_references): Likewise.
> (tree_ssa_lim_initialize): Likewise.
> * tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): 
> Likewise.
> * tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize): Likewise.
> * tree-ssa-loop-manip.c (get_loops_exits): Likewise.
> * tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Likewise.
> (free_numbers_of_iterations_estimates): Likewise.
> * tree-ssa-loop-prefetch.c (tree_ssa_prefetch_arrays): Likewise.
> * tree-ssa-loop-split.c (tree_ssa_split_loop

Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-19 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Fri, Jul 16, 2021 at 7:15 AM H.J. Lu  wrote:
>>
>> On Fri, Jul 16, 2021 at 6:24 AM Richard Sandiford
>>  wrote:
>> >
>> > "H.J. Lu via Gcc-patches"  writes:
>> > > On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford
>> > >  wrote:
>> > >>
>> > >> "H.J. Lu via Gcc-patches"  writes:
>> > >> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
>> > >> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
>> > >> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a 
>> > >> > hard
>> > >> > scratch register to avoid stack realignment when expanding memset.
>> > >> >
>> > >> >   PR middle-end/90773
>> > >> >   * builtins.c (gen_memset_value_from_prev): New function.
>> > >> >   (gen_memset_broadcast): Likewise.
>> > >> >   (builtin_memset_read_str): Use gen_memset_value_from_prev
>> > >> >   and gen_memset_broadcast.
>> > >> >   (builtin_memset_gen_str): Likewise.
>> > >> >   * target.def (gen_memset_scratch_rtx): New hook.
>> > >> >   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
>> > >> >   * doc/tm.texi: Regenerated.
>> > >> > ---
>> > >> >  gcc/builtins.c | 123 
>> > >> > +
>> > >> >  gcc/doc/tm.texi|   5 ++
>> > >> >  gcc/doc/tm.texi.in |   2 +
>> > >> >  gcc/target.def |   7 +++
>> > >> >  4 files changed, 116 insertions(+), 21 deletions(-)
>> > >> >
>> > >> > diff --git a/gcc/builtins.c b/gcc/builtins.c
>> > >> > index 39ab139b7e1..c1758ae2efc 100644
>> > >> > --- a/gcc/builtins.c
>> > >> > +++ b/gcc/builtins.c
>> > >> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
>> > >> >return NULL_RTX;
>> > >> >  }
>> > >> >
>> > >> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE 
>> > >> > (MODE)
>> > >> > -   bytes from constant string DATA + OFFSET and return it as target
>> > >> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
>> > >> > +/* Return the RTL of a register in MODE generated from PREV in the
>> > >> > previous iteration.  */
>> > >> >
>> > >> > -rtx
>> > >> > -builtin_memset_read_str (void *data, void *prevp,
>> > >> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
>> > >> > -  scalar_int_mode mode)
>> > >> > +static rtx
>> > >> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
>> > >> >  {
>> > >> > +  rtx target = nullptr;
>> > >> >by_pieces_prev *prev = (by_pieces_prev *) prevp;
>> > >> >if (prev != nullptr && prev->data != nullptr)
>> > >> >  {
>> > >> >/* Use the previous data in the same mode.  */
>> > >> >if (prev->mode == mode)
>> > >> >   return prev->data;
>> > >> > +
>> > >> > +  rtx prev_rtx = prev->data;
>> > >> > +  machine_mode prev_mode = prev->mode;
>> > >> > +  unsigned int word_size = GET_MODE_SIZE (word_mode);
>> > >> > +  if (word_size < GET_MODE_SIZE (prev->mode)
>> > >> > +   && word_size > GET_MODE_SIZE (mode))
>> > >> > + {
>> > >> > +   /* First generate subreg of word mode if the previous mode is
>> > >> > +  wider than word mode and word mode is wider than MODE.  */
>> > >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>> > >> > +   prev_mode, 0);
>> > >> > +   prev_mode = word_mode;
>> > >> > + }
>> > >> > +  if (prev_rtx != nullptr)
>> > >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>> > >> >  }
>> > >> > +  return target;
>> > >> > +}
>> > >> > +
>> > >> > +/* Return the RTL of a register in MODE broadcasted from DATA.  */
>> > >> > +
>> > >> > +static rtx
>> > >> > +gen_memset_broadcast (rtx data, scalar_int_mode mode)
>> > >> > +{
>> > >> > +  /* Skip if regno_reg_rtx isn't initialized.  */
>> > >> > +  if (!regno_reg_rtx)
>> > >> > +return nullptr;
>> > >> > +
>> > >> > +  rtx target = nullptr;
>> > >> > +
>> > >> > +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE 
>> > >> > (QImode);
>> > >> > +  machine_mode vector_mode;
>> > >> > +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>> > >> > +gcc_unreachable ();
>> > >>
>> > >> Sorry, I realise it's a bit late to be raising this objection now,
>> > >> but I don't think it's a good idea to use scalar integer modes as
>> > >> a proxy for vector modes.  In principle there's no reason why a
>> > >> target has to define an integer mode for every vector mode.
>> > >
>> > > A target always defines the largest integer mode.
>> >
>> > Right.  But a target shouldn't *need* to define an integer mode
>> > for every vector mode.
>> >
>> > >> If we want the mode to be a vector then I think the by-pieces
>> > >> infrastructure should be extended to support vectors directly,
>> > >> rather than assuming that each piece can be represented as
>> > >> a scalar_int_mode.
>> > >>
>> > >
>> > > The current by-pieces infrastructure operates on scala

Re: [PATCH 0/13] v2 warning control by group and location (PR 74765)

2021-07-19 Thread Martin Sebor via Gcc-patches

On 7/17/21 2:36 PM, Jan-Benedict Glaw wrote:

Hi Martin!

On Fri, 2021-06-04 15:27:04 -0600, Martin Sebor  wrote:

This is a revised patch series to add warning control by group and
location, updated based on feedback on the initial series.

[...]

My automated checking (in this case: Using Debian's "gcc-snapshot"
package) indicates that between versions 1:20210527-1 and
1:20210630-1, building GDB breaks. Your patch is a likely candidate.
It's a case where a method asks for a nonnull argument and later on
checks for NULLness again. The build log is currently available at
(http://wolf.lug-owl.de:8080/jobs/gdb-vax-linux/5), though obviously
breaks for any target:

configure --target=vax-linux --prefix=/tmp/gdb-vax-linux
make all-gdb

[...]
[all 2021-07-16 19:19:25]   CXXcompile/compile.o
[all 2021-07-16 19:19:30] In file included from 
./../gdbsupport/common-defs.h:126,
[all 2021-07-16 19:19:30]  from ./defs.h:28,
[all 2021-07-16 19:19:30]  from compile/compile.c:20:
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h: In constructor 
'gdb::unlinker::unlinker(const char*)':
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_assert.h:35:4: error: 'nonnull' 
argument 'filename' compared to NULL [-Werror=nonnull-compare]
[all 2021-07-16 19:19:30]35 |   ((void) ((expr) ? 0 :   
\
[all 2021-07-16 19:19:30]   |   
~^~~~
[all 2021-07-16 19:19:30]36 |(gdb_assert_fail (#expr, __FILE__, 
__LINE__, FUNCTION_NAME), 0)))
[all 2021-07-16 19:19:30]   |
~
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h:38:5: note: in 
expansion of macro 'gdb_assert'
[all 2021-07-16 19:19:30]38 | gdb_assert (filename != NULL);
[all 2021-07-16 19:19:30]   | ^~
[all 2021-07-16 19:19:31] cc1plus: all warnings being treated as errors
[all 2021-07-16 19:19:31] make[1]: *** [Makefile:1641: compile/compile.o] Error 
1
[all 2021-07-16 19:19:31] make[1]: Leaving directory 
'/var/lib/laminar/run/gdb-vax-linux/5/binutils-gdb/gdb'
[all 2021-07-16 19:19:31] make: *** [Makefile:11410: all-gdb] Error 2


Code is this:

  31 class unlinker
  32 {
  33  public:
  34
  35   unlinker (const char *filename) ATTRIBUTE_NONNULL (2)
  36 : m_filename (filename)
  37   {
  38 gdb_assert (filename != NULL);
  39   }

I'm quite undecided whether this is bad behavior of GCC or bad coding
style in Binutils/GDB, or both.


A warning should be expected in this case.  Before the recent GCC
change it was inadvertently suppressed in gdb_assert macros by its
operand being enclosed in parentheses.

Martin



Thanks,
   Jan-Benedict





[wwwdocs] gcc-12/changes.html: OpenMP - mention C++11 attributes support

2021-07-19 Thread Tobias Burnus

Update the OpenMP feature list.

Comments? Remarks?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcc-12/changes.html: OpenMP - mention C++11 attributes support

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 9c2799cf..467cc8a5 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -56,7 +56,10 @@ a work-in-progress.
iterator can now also be used with the depend
   clause, defaultmap has been updated for OpenMP 5.0, and the
   loop directive and combined directives
-  involving master directive have been added.
+  involving master directive have been added. Additionally,
+  initial support for expressing OpenMP directives as C++ 11 attributes
+  has been added, which is an OpenMP 5.1 feature; so far, they can be
+  specified on statements and block-local variables, only.
   
   The new warning flag -Wopenacc-parallelism was added for
   OpenACC. It warns about potentially suboptimal choices related to


Re: [wwwdocs] gcc-12/changes.html: OpenMP - mention C++11 attributes support

2021-07-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 19, 2021 at 05:17:10PM +0200, Tobias Burnus wrote:
> Update the OpenMP feature list.
> 
> Comments? Remarks?

I'd defer mentioning it until I actually finish it (hopefully later this
week or next week worst case).

> gcc-12/changes.html: OpenMP - mention C++11 attributes support
> 
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index 9c2799cf..467cc8a5 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -56,7 +56,10 @@ a work-in-progress.
> iterator can now also be used with the 
> depend
>clause, defaultmap has been updated for OpenMP 5.0, and 
> the
>loop directive and combined directives
> -  involving master directive have been added.
> +  involving master directive have been added. Additionally,
> +  initial support for expressing OpenMP directives as C++ 11 attributes
> +  has been added, which is an OpenMP 5.1 feature; so far, they can be
> +  specified on statements and block-local variables, only.
>
>The new warning flag -Wopenacc-parallelism was added for
>OpenACC. It warns about potentially suboptimal choices related to


Jakub



Re: drop va_list from formals if requested

2021-07-19 Thread Martin Jambor
Hi,

On Tue, Jul 13 2021, Alexandre Oliva wrote:
> I'm working on a feature that involves creating wrappers for some
> functions, using alternate means to pass variable argument lists to
> the wrapped versions.  The split-out (wrapped) function shouldn't be
> stdarg, and though comments in m_always_copy_start in
> ipa_param_adjustments suggested that making it negative would cause
> variable arguments to be dropped, this was not the case, and AFAICT
> there was no way to accomplish that.
>
> Perhaps there was no such intent implied by the comments, but that
> sounded like a reasonable plan to me, so I went ahead an implemented
> it.  With this patch, a negative m_always_copy_start drops the
> variable argument list marker from a cloned function's type, and the
> stdarg flag from the cloned cfun data structure.
>

It probably was my general intent, at least the comments really suggest
so :-)

However, my intent also was that functions which should have all their
arguments copied as they are (and only the return value dropped) should
be marked with NULL m_adj_params and m_always_copy_start set to zero.
Unfortunately, I did not make that work in time, I only vaguely remember
that ipa-split was somehow problematic.  I decided that I would not be
making a huge patch even bigger and settled on the idea that, at least
temporarily, m_always_copy_start would always be the initial number of
parameters.

As a consequence I also got lazy and used that assumption in
ipa_param_adjustments::modify_call where m_always_copy_start is meant to
be the original number of PARM_DECLs for which we may attempt to
generate debug info in:

  for (tree old_parm = DECL_ARGUMENTS (old_decl);
   old_parm && i < old_nargs && ((int) i) < m_always_copy_start;
   old_parm = DECL_CHAIN (old_parm), i++)

Fortunately, my recent re-write made that count available through other
means and we can now just use the orig_nargs local variable which is
already correctly computed at the beginning of the function.

> Regstrapped on x86_64-linux-gnu.  Ok to install?

Generally speaking, I agree that relaxing this assumption in general is
a good idea and so the patch is OK with this one additional fix
described above (I did not bootstrap and test it myself but it should
work (TM)).

Note that there are asserts in that check that m_always_copy_start is
either negative or the initial count of arguments so even after this
change m_always_copy_start cannot have an arbitrary value (and with the
exception of allowing zero I think that is a reasonable limitation).

>
> for  gcc/ChangeLog
>
>   * ipa-param-manipulation.c (build_adjusted_function_type): Add
>   skip_valist, obey it.
>   (ipa_param_adjustments::build_new_function_type): Take
>   negative m_always_copy_start as skip_valist.
>   (ipa_param_body_adjustmnets::modify_formal_parameters):
>   Likewise.
>   * tee-inline.c (initialize_cfun): Clear stdarg if function
>   type doesn't support it.

tRee-inline.c.

Thanks,

Martin


Re: [committed] .dir-locals.el: Set 'fill-column' to 80 for c-mode

2021-07-19 Thread Richard Earnshaw via Gcc-patches



On 19/07/2021 14:52, Richard Sandiford via Gcc-patches wrote:

Richard Earnshaw via Gcc-patches  writes:

On 14/12/2020 11:29, Andrea Corallo via Gcc-patches wrote:

Hi all,

I've committed the attached patch as obvious.

This is to set `fill-column' to 80 in c-mode (Emacs default it to 70) so
now M-q does the right thing.  I think is very handy especially in
comments.

Question: should we update the copyright manually for this file or have
it updated by 'update-copyright.py'?  I think this is not scanning the
root of the repo.

Thanks

Andrea



Sorry for the very late reply to this, but I've only just tracked it
down as the cause of why emacs had suddenly started to produce lines
that were displaying with a line wrap (I'd add an image, but the mailing
list would likely only strip it off).

The problem is that this allows a character to appear in column 80 and
emacs then automatically inserts a blank line after it (when the window
is 80 columns wide), which really messes up the formatting.

The wiki seems to suggest https://gcc.gnu.org/wiki/FormattingCodeForGCC
that the line length should be 79 columns (see the textwidth setting);
although that is not what is set contrib/vimrc.

Would anyone object if we reduced this by 1 (to 79) in both places?


Sounds good to me FWIW.  (Having hit the same issue.)

Richard



So how about this?

--

Limit fill-column to 79

The current line-length limit is set to 80, but that allows a character 
to appear in the 80th column, and that causes emacs to display a 
line-wrap followed by a blank line when the display/window width is 80 
columns.  Furthermore, this seems to contradict the coding-style rules 
on the wiki which suggest that the line limit should be 79.


So reduce the line width in both the emacs control file and the contrib 
vimrc file to 79 characters.


ChangeLog:
* .dir-local.el (c-mode): Change fill-column to 79.

contrib:
* vimrc (textwidth): Change non-gitcommit length to 79.

diff --git a/.dir-locals.el b/.dir-locals.el
index 44a0db68241..b07a0dc50d8 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -20,4 +20,4 @@
  (nil . ((bug-reference-url-format . "http://gcc.gnu.org/PR%s";)))
  (c-mode . ((c-file-style . "GNU")
 	(indent-tabs-mode . t)
-	(fill-column . 80
+	(fill-column . 79
diff --git a/contrib/vimrc b/contrib/vimrc
index 356d455b059..b7a8ab7360c 100644
--- a/contrib/vimrc
+++ b/contrib/vimrc
@@ -42,7 +42,7 @@ function! SetStyle()
   if &filetype == "gitcommit"
 setlocal textwidth=72
   else
-setlocal textwidth=80
+setlocal textwidth=79
   endif
   setlocal formatoptions-=ro formatoptions+=cqlt
   if index(l:c_exts, l:ext) != -1 || &filetype == "c" || &filetype == "cpp"


Re: [RFC/PATCH] Use range-based for loops for traversing loops

2021-07-19 Thread Martin Sebor via Gcc-patches

On 7/19/21 12:20 AM, Kewen.Lin wrote:

Hi,

This patch follows Martin's suggestion here[1], to support
range-based for loops for traversing loops, analogously to
the patch for vec[2].

Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Any comments are appreciated.


Thanks for this nice cleanup!  Just a few suggestions:

I would recommend against introducing new macros unless they
offer a significant advantage over alternatives (for the two
macros the patch adds I don't think they do).

If improving const-correctness is one of our a goals
the loops_list iterator type would need to a corresponding
const_iterator type, and const overloads of the begin()
and end() member functions.

Rather than introducing more instances of the loop_p typedef
I'd suggest to use loop *.  It has at least two advantages:
it's clearer (it's obvious it refers to a pointer), and lends
itself more readily to making code const-correct by declaring
the control variable const: for (const class loop *loop: ...)
while avoiding the mistake of using const loop_p loop to
declare a pointer to a const loop.

Martin



BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573424.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
-
gcc/ChangeLog:

* cfgloop.h (class loop_iterator): Rename to ...
(class loops_list): ... this.
(loop_iterator::next): Rename to ...
(loops_list::iterator::fill_curr_loop): ... this and adjust.
(loop_iterator::loop_iterator): Rename to ...
(loops_list::loops_list): ... this and adjust.
(FOR_EACH_LOOP): Rename to ...
(ALL_LOOPS): ... this.
(FOR_EACH_LOOP_FN): Rename to ...
(ALL_LOOPS_FN): this.
(loops_list::iterator): New class.
(loops_list::begin): New function.
(loops_list::end): Likewise.
* cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with ALL_LOOPS*.
(sort_sibling_loops): Likewise.
(disambiguate_loops_with_multiple_latches): Likewise.
(verify_loop_structure): Likewise.
* cfgloopmanip.c (create_preheaders): Likewise.
(force_single_succ_latches): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(execute_tag_collision_avoidance): Likewise.
* config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
* config/s390/s390.c (s390_adjust_loops): Likewise.
* doc/loop.texi: Likewise.
* gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::analyze_blocks): Likewise.
(loop_versioning::make_versioning_decisions): Likewise.
* gimple-ssa-split-paths.c (split_paths): Likewise.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.c (canonicalize_loop_form): Likewise.
(graphite_transform_loops): Likewise.
* ipa-fnsummary.c (analyze_function_body): Likewise.
* ipa-pure-const.c (analyze_function): Likewise.
* loop-doloop.c (doloop_optimize_loops): Likewise.
* loop-init.c (loop_optimizer_finalize): Likewise.
(fix_loop_structure): Likewise.
* loop-invariant.c (calculate_loop_reg_pressure): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unrolling): Likewise.
(unroll_loops): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
* predict.c (predict_loops): Likewise.
(pass_profile::execute): Likewise.
* profile.c (branch_prob): Likewise.
* sel-sched-ir.c (sel_finish_pipelining): Likewise.
(sel_find_rgns): Likewise.
* tree-cfg.c (replace_loop_annotate): Likewise.
(replace_uses_by): Likewise.
(move_sese_region_to_fn): Likewise.
* tree-if-conv.c (pass_if_conversion::execute): Likewise.
* tree-loop-distribution.c (loop_distribution::execute): Likewise.
* tree-parloops.c (parallelize_loops): Likewise.
* tree-predcom.c (tree_predictive_commoning): Likewise.
* tree-scalar-evolution.c (scev_initialize): Likewise.
(scev_reset): Likewise.
* tree-ssa-dce.c (find_obviously_necessary_stmts): Likewise.
* tree-ssa-live.c (remove_unused_locals): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (analyze_memory_references): Likewise.
(tree_ssa_lim_initialize): Likewise.
* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): Likewise.
* tree-ssa-loop-ivopts.c (tree_ssa_iv_optimize): Likewise.
* tree-ssa-loop-manip.c (get_loops_exits): Likewise.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Likewise.
(free_numbers_of_iterations_estimates): Likewise.

Re: [PATCH] libgccjit: add some reflection functions in the jit C api

2021-07-19 Thread Antoni Boucher via Gcc-patches
I'm sending the patch once again for review/approval.

I fixed the doc to use the new function names.

Le vendredi 18 juin 2021 à 16:37 -0400, David Malcolm a écrit :
> On Fri, 2021-06-18 at 15:41 -0400, Antoni Boucher wrote:
> > I have write access now.
> 
> Great.
> 
> > I'm not sure how I'm supposed to send my patches:
> > should I put it in personal branches and you'll merge them?
> 
> Please send them to this mailing list for review; once they're
> approved
> you can merge them.
> 
> > 
> > And for the MAINTAINERS file, should I just push to master right
> > away,
> > after sending it to the mailing list?
> 
> I think people just push the MAINTAINERS change and then let the list
> know, since it makes a good test that write access is working
> correctly.
> 
> Dave
> 
> > 
> > Thanks for your help!
> > 
> > Le vendredi 18 juin 2021 à 12:09 -0400, David Malcolm a écrit :
> > > On Fri, 2021-06-18 at 11:55 -0400, Antoni Boucher wrote:
> > > > Le vendredi 11 juin 2021 à 14:00 -0400, David Malcolm a écrit :
> > > > > On Fri, 2021-06-11 at 08:15 -0400, Antoni Boucher wrote:
> > > > > > Thank you for your answer.
> > > > > > I attached the updated patch.
> > > > > 
> > > > > BTW you (or possibly me) dropped the mailing lists; was that
> > > > > deliberate?
> > > > 
> > > > Oh, my bad.
> > > > 
> > > 
> > > [...]
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > > I have signed the FSF copyright attribution.
> > > > > 
> > > > > I can push changes on your behalf, but I'd prefer it if you
> > > > > did
> > > > > it,
> > > > > especially given that you have various other patches you want
> > > > > to
> > > > > get
> > > > > in.
> > > > > 
> > > > > Instructions on how to get push rights to the git repo are
> > > > > here:
> > > > >   https://gcc.gnu.org/gitwrite.html
> > > > > 
> > > > > I can sponsor you.
> > > > 
> > > > Thanks.
> > > > I did sign up to get push rights.
> > > > Have you accepted my request to get those?
> > > 
> > > I did, but I didn't see any kind of notification.  Did you get an
> > > email
> > > about it?
> > > 
> > > 
> > > Dave
> > > 
> > 
> > 
> 
> 

From aaf77d109d74afb0f46d3e5d4d094914cb1b21de Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Sat, 1 Aug 2020 17:52:17 -0400
Subject: [PATCH] libgccjit: Add some reflection functions [PR96889]

2021-07-19  Antoni Boucher  

gcc/jit/
	PR target/96889
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_16): New ABI tag.
	* docs/topics/functions.rst: Add documentation for the
	functions gcc_jit_function_get_return_type and
	gcc_jit_function_get_param_count
	* docs/topics/types.rst: Add documentation for the functions
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type,
	gcc_jit_type_unqualified, gcc_jit_type_dyncast_array,
	gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type,
	gcc_jit_type_is_integral, gcc_jit_type_is_pointer,
	gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units,
	gcc_jit_struct_get_field, gcc_jit_type_is_struct,
	and gcc_jit_struct_get_field_count
	* libgccjit.c:
	(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
	gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
	gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
	gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
	functions.
	(struct gcc_jit_function_type, struct gcc_jit_vector_type):
	New types.
	* libgccjit.h:
	(gcc_jit_function_get_return_type, gcc_jit_function_get_param_count,
	gcc_jit_function_type_get_return_type,
	gcc_jit_function_type_get_param_count,
	gcc_jit_function_type_get_param_type, gcc_jit_type_unqualified,
	gcc_jit_type_dyncast_array, gcc_jit_type_is_bool,
	gcc_jit_type_dyncast_function_ptr_type, gcc_jit_type_is_integral,
	gcc_jit_type_is_pointer, gcc_jit_type_dyncast_vector,
	gcc_jit_vector_type_get_element_type,
	gcc_jit_vector_type_get_num_units, gcc_jit_struct_get_field,
	gcc_jit_type_is_struct, gcc_jit_struct_get_field_count): New
	function declarations.
	(struct gcc_jit_function_type, struct gcc_jit_vector_type):
	New types.
	* jit-recording.h: New functions (is_struct and is_vector)
	* libgccjit.map (LIBGCCJIT_ABI_16): New ABI tag.

gcc/testsuite/
	PR target/96889
	* jit.dg/all-non-failing-tests.h: Add test-reflection.c.
	* jit.dg/test-reflection.c: New test.
---
 gcc/jit/docs/topics/compatibility.rst|  43 ++-
 gcc/jit/docs/topics/functions.rst|  26 ++
 gcc/jit/docs/topics/types.rst| 122 +
 gcc/jit/jit-recording.h  |   7 +
 gcc/jit/libgccjit.c  | 261 +++
 gcc/jit/libgccjit.h   

Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-19 Thread Hafiz Abid Qadeer
On 19/07/2021 11:45, Richard Biener wrote:
> On Fri, Jul 16, 2021 at 10:23 PM Hafiz Abid Qadeer
>  wrote:
>>
>> On 15/07/2021 13:09, Richard Biener wrote:
>>> On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
>>>  wrote:

 On 15/07/2021 11:33, Thomas Schwinge wrote:
>
>> Note that the "parent" should be abstract but I don't think dwarf has a
>> way to express a fully abstract parent of a concrete instance child - or
>> at least how GCC expresses this causes consumers to "misinterpret"
>> that.  I wonder if adding a DW_AT_declaration to the late DWARF
>> emitted "parent" would fix things as well here?
>
> (I suppose not, Abid?)
>

 Yes, adding DW_AT_declaration does not fix the problem.
>>>
>>> Does emitting
>>>
>>> DW_TAG_compile_unit
>>>   DW_AT_name("")
>>>
>>>   DW_TAG_subprogram // notional parent function (foo) with no code range
>>> DW_AT_declaration 1
>>> a:DW_TAG_subprogram // offload function foo._omp_fn.0
>>>   DW_AT_declaration 1
>>>
>>>   DW_TAG_subprogram // offload function
>>>   DW_AT_abstract_origin a
>>> ...
>>>
>>> do the trick?  The following would do this, flattening function definitions
>>> for the concrete copies:
>>>
>>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>>> index 82783c4968b..a9c8bc43e88 100644
>>> --- a/gcc/dwarf2out.c
>>> +++ b/gcc/dwarf2out.c
>>> @@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
>>>/* Peel types in the context stack.  */
>>>while (ctx && TYPE_P (ctx))
>>>  ctx = TYPE_CONTEXT (ctx);
>>> +  /* For functions peel the context up to namespace/TU scope.  The abstract
>>> + copies reveal the true nesting.  */
>>> +  if (TREE_CODE (decl) == FUNCTION_DECL)
>>> +while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
>>> +  ctx = DECL_CONTEXT (ctx);
>>>/* Likewise namespaces in case we do not want to emit DIEs for them.  */
>>>if (debug_info_level <= DINFO_LEVEL_TERSE)
>>>  while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
>>> @@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
>>> /* Leave function local entities parent determination to when
>>>we process scope vars.  */
>>> ;
>>> -  else
>>> -   parent = lookup_decl_die (ctx);
>>> +  parent = lookup_decl_die (ctx);
>>>  }
>>>else
>>>  /* In some cases the FEs fail to set DECL_CONTEXT properly.
>>>
>>
>> Thanks. This solves the problem. Only the first hunk was required. Second 
>> hunk
>> actually causes an ICE when TREE_CODE (ctx) == BLOCK.
>> OK to commit the attached patch?
> 
> I think we need to merge the TYPE_P peeling and FUNCTION_DECL peeling into
> one loop since I suppose we can have a nested function in class scope.
> So sth like
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 82783c4968b..61228410b51 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -6073,8 +6073,12 @@ maybe_create_die_with_external_ref (tree decl)
>  }
>else
>  ctx = DECL_CONTEXT (decl);
> -  /* Peel types in the context stack.  */
> -  while (ctx && TYPE_P (ctx))
> +  /* Peel types in the context stack.  For functions peel the context up
> + to namespace/TU scope.  The abstract copies reveal the true nesting.  */
> +  while (ctx
> +&& (TYPE_P (ctx)
> +|| (TREE_CODE (decl) == FUNCTION_DECL
> +&& TREE_CODE (ctx) == FUNCTION_DECL)))
>  ctx = TYPE_CONTEXT (ctx);
>/* Likewise namespaces in case we do not want to emit DIEs for them.  */
>if (debug_info_level <= DINFO_LEVEL_TERSE)
> 
This causes an ICE,
internal compiler error: tree check: expected class 'type', have 'declaration' 
(function_decl)

Did you intend something like this:

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 561f8b23517..c61f0041fba 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6121,3 +6121,8 @@ maybe_create_die_with_external_ref (tree decl)
-  /* Peel types in the context stack.  */
-  while (ctx && TYPE_P (ctx))
-ctx = TYPE_CONTEXT (ctx);
+  /* Peel types in the context stack.  For functions peel the context up
+ to namespace/TU scope.  The abstract copies reveal the true nesting.  */
+  while (ctx
+   && (TYPE_P (ctx)
+   || (TREE_CODE (decl) == FUNCTION_DECL
+   && TREE_CODE (ctx) == FUNCTION_DECL)))
+ctx = TYPE_P (ctx) ? TYPE_CONTEXT (ctx) : DECL_CONTEXT (ctx);
+


> if that works it's OK.  Can you run it on the gdb testsuite with -flto added
> as well please (you need to do before/after comparison since IIRC adding
> -flto will add a few fails).
> 
> Thanks,
> Richard.
> 


-- 
Hafiz Abid Qadeer
Mentor, a Siemens Business


Re: sync up new type indices for body adjustments

2021-07-19 Thread Martin Jambor
Hi,

On Tue, Jul 13 2021, Alexandre Oliva wrote:
> The logic in fill_vector_of_new_param_types may skip some parameters
> when pushing into m_new_types, but common_initialization doesn't take
> that into account, and may end up attempting to access the vector past
> its end when IPA_PARAM_OP_(NEW|SPLIT) operands appear after skipped
> _COPY ones.
>
> This patch adjusts the consumer logic to match the indexing in the
> producer.  It came up in libstdc++-v3's testsuite, in
> std/ranges/adaptors/filter.cc, but only with wrappers introduced by a pass
> I'm working on.  The _NEW parameters were reference-typed replacements
> for some by-value ones in my function-wrapping logic, and other IPA
> transformations cause plenty of unused/irrelevant arguments to be
> dropped for certain calls.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

I agree that there is a mismatch but I do not like proposed change.
Indeed fill_vector_of_new_param_types can skip some parameters when they
appear to be copies of something that is not there - but that bail-out
is not supposed to ever happen when fill_vector_of_new_param_types is
called from ipa_param_body_adjustments.

It is there for cases where one compilation unit has a completely bogus
idea of a type defined in another compilation unit and so creates a bad
type for gimple_call_fntype, but then LTO decides to do something with
the parameters and modifying the gimple_call_fntype becomes a
garbage-in-garbage-out operation (but avoids ICE).

That should never happen when you actually have a body and so presumably
the type and parameters match.  So I would first check how come that you
request IPA_PARAM_OP_COPY of something that does not seem to have a
corresponding type but there is a DECL, otherwise the code would have
ICEd when attempting to "carry over" that.

If you believe that what you're doing is correct (but check twice), I
would prefer the following change (only mildly tested) that is a
clean-up and should help you with your problem too.

Thanks,

Martin


diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 26b02d7aa95..c5285b7cdf3 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1067,14 +1067,15 @@ ipa_param_body_adjustments::common_initialization (tree 
old_fndecl,
   auto_vec otypes;
   if (TYPE_ARG_TYPES (TREE_TYPE (old_fndecl)) != NULL_TREE)
 push_function_arg_types (&otypes, TREE_TYPE (old_fndecl));
-  else
+  if (m_oparms.length () != otypes.length ())
 {
-  auto_vec oparms;
-  push_function_arg_decls (&oparms, old_fndecl);
-  unsigned ocount = oparms.length ();
+  /* Parameter type information is not available or there is a mismatch
+between it and the real parameters (probably K&R code or weird LTO
+mismatched decls).  */
+  unsigned ocount = m_oparms.length ();
   otypes.reserve_exact (ocount);
   for (unsigned i = 0; i < ocount; i++)
-   otypes.quick_push (TREE_TYPE (oparms[i]));
+   otypes.quick_push (TREE_TYPE (m_oparms[i]));
 }
   fill_vector_of_new_param_types (&m_new_types, &otypes, m_adj_params, true);


Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-19 Thread Richard Biener via Gcc-patches
On July 19, 2021 6:13:40 PM GMT+02:00, Hafiz Abid Qadeer 
 wrote:
>On 19/07/2021 11:45, Richard Biener wrote:
>> On Fri, Jul 16, 2021 at 10:23 PM Hafiz Abid Qadeer
>>  wrote:
>>>
>>> On 15/07/2021 13:09, Richard Biener wrote:
 On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
  wrote:
>
> On 15/07/2021 11:33, Thomas Schwinge wrote:
>>
>>> Note that the "parent" should be abstract but I don't think
>dwarf has a
>>> way to express a fully abstract parent of a concrete instance
>child - or
>>> at least how GCC expresses this causes consumers to
>"misinterpret"
>>> that.  I wonder if adding a DW_AT_declaration to the late DWARF
>>> emitted "parent" would fix things as well here?
>>
>> (I suppose not, Abid?)
>>
>
> Yes, adding DW_AT_declaration does not fix the problem.

 Does emitting

 DW_TAG_compile_unit
   DW_AT_name("")

   DW_TAG_subprogram // notional parent function (foo) with no code
>range
 DW_AT_declaration 1
 a:DW_TAG_subprogram // offload function foo._omp_fn.0
   DW_AT_declaration 1

   DW_TAG_subprogram // offload function
   DW_AT_abstract_origin a
 ...

 do the trick?  The following would do this, flattening function
>definitions
 for the concrete copies:

 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
 index 82783c4968b..a9c8bc43e88 100644
 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree
>decl)
/* Peel types in the context stack.  */
while (ctx && TYPE_P (ctx))
  ctx = TYPE_CONTEXT (ctx);
 +  /* For functions peel the context up to namespace/TU scope.  The
>abstract
 + copies reveal the true nesting.  */
 +  if (TREE_CODE (decl) == FUNCTION_DECL)
 +while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
 +  ctx = DECL_CONTEXT (ctx);
/* Likewise namespaces in case we do not want to emit DIEs for
>them.  */
if (debug_info_level <= DINFO_LEVEL_TERSE)
  while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
 @@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree
>decl)
 /* Leave function local entities parent determination to
>when
we process scope vars.  */
 ;
 -  else
 -   parent = lookup_decl_die (ctx);
 +  parent = lookup_decl_die (ctx);
  }
else
  /* In some cases the FEs fail to set DECL_CONTEXT properly.

>>>
>>> Thanks. This solves the problem. Only the first hunk was required.
>Second hunk
>>> actually causes an ICE when TREE_CODE (ctx) == BLOCK.
>>> OK to commit the attached patch?
>> 
>> I think we need to merge the TYPE_P peeling and FUNCTION_DECL peeling
>into
>> one loop since I suppose we can have a nested function in class
>scope.
>> So sth like
>> 
>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>> index 82783c4968b..61228410b51 100644
>> --- a/gcc/dwarf2out.c
>> +++ b/gcc/dwarf2out.c
>> @@ -6073,8 +6073,12 @@ maybe_create_die_with_external_ref (tree decl)
>>  }
>>else
>>  ctx = DECL_CONTEXT (decl);
>> -  /* Peel types in the context stack.  */
>> -  while (ctx && TYPE_P (ctx))
>> +  /* Peel types in the context stack.  For functions peel the
>context up
>> + to namespace/TU scope.  The abstract copies reveal the true
>nesting.  */
>> +  while (ctx
>> +&& (TYPE_P (ctx)
>> +|| (TREE_CODE (decl) == FUNCTION_DECL
>> +&& TREE_CODE (ctx) == FUNCTION_DECL)))
>>  ctx = TYPE_CONTEXT (ctx);
>>/* Likewise namespaces in case we do not want to emit DIEs for
>them.  */
>>if (debug_info_level <= DINFO_LEVEL_TERSE)
>> 
>This causes an ICE,
>internal compiler error: tree check: expected class 'type', have
>'declaration' (function_decl)
>
>Did you intend something like this:
>
>diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>index 561f8b23517..c61f0041fba 100644
>--- a/gcc/dwarf2out.c
>+++ b/gcc/dwarf2out.c
>@@ -6121,3 +6121,8 @@ maybe_create_die_with_external_ref (tree decl)
>-  /* Peel types in the context stack.  */
>-  while (ctx && TYPE_P (ctx))
>-ctx = TYPE_CONTEXT (ctx);
>+  /* Peel types in the context stack.  For functions peel the context
>up
>+ to namespace/TU scope.  The abstract copies reveal the true
>nesting.  */
>+  while (ctx
>+   && (TYPE_P (ctx)
>+   || (TREE_CODE (decl) == FUNCTION_DECL
>+   && TREE_CODE (ctx) == FUNCTION_DECL)))
>+ctx = TYPE_P (ctx) ? TYPE_CONTEXT (ctx) : DECL_CONTEXT (ctx);
>+

Yes, of course. 

>
>> if that works it's OK.  Can you run it on the gdb testsuite with
>-flto added
>> as well please (you need to do before/after comparison since IIRC
>adding
>> -flto will add a few fails).
>> 
>> Thanks,
>> Richard.
>> 



[PATCH] aarch64: Refactor TBL/TBX RTL patterns

2021-07-19 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch renames the two-source-register TBL/TBX RTL
patterns so that their names better reflect what they do, rather than
confusing them with tbl3 or tbx4 patterns. Also use the correct
"neon_tbl2" type attribute for both patterns.

Rename single-source-register TBL/TBX patterns for consistency.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Use two variant
generators for all TBL/TBX intrinsics and rename to
consistent forms: qtbl[1234] or qtbx[1234].
* config/aarch64/aarch64-simd.md (aarch64_tbl1):
Rename to...
(aarch64_qtbl1): This.
(aarch64_tbx1): Rename to...
(aarch64_qtbx1): This.
(aarch64_tbl2v16qi): Delete.
(aarch64_tbl3): Rename to...
(aarch64_qtbl2): This.
(aarch64_tbx4): Rename to...
(aarch64_qtbx2): This.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm_1): Use
renamed qtbl1 and qtbl2 RTL patterns.
* config/aarch64/arm_neon.h (vqtbl1_p8): Use renamed qtbl1
RTL pattern.
(vqtbl1_s8): Likewise.
(vqtbl1_u8): Likewise.
(vqtbl1q_p8): Likewise.
(vqtbl1q_s8): Likewise.
(vqtbl1q_u8): Likewise.
(vqtbx1_s8): Use renamed qtbx1 RTL pattern.
(vqtbx1_u8): Likewise.
(vqtbx1_p8): Likewise.
(vqtbx1q_s8): Likewise.
(vqtbx1q_u8): Likewise.
(vqtbx1q_p8): Likewise.
(vtbl1_s8): Use renamed qtbl1 RTL pattern.
(vtbl1_u8): Likewise.
(vtbl1_p8): Likewise.
(vtbl2_s8): Likewise
(vtbl2_u8): Likewise.
(vtbl2_p8): Likewise.
(vtbl3_s8): Use renamed qtbl2 RTL pattern.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.
(vtbx2_s8): Use renamed qtbx2 RTL pattern.
(vtbx2_u8): Likewise.
(vtbx2_p8): Likewise.
(vqtbl2_s8): Use renamed qtbl2 RTL pattern.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbx2_s8): Use renamed qtbx2 RTL pattern.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vtbx4_s8): Likewise.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.


rb14671.patch
Description: rb14671.patch


[committed] amdgcn: Add -mxnack and -msram-ecc [PR 100208]

2021-07-19 Thread Andrew Stubbs
This patch adds two new GCN-specific options: -mxnack and 
-msram-ecc={on,off,any}.


The primary purpose is to ensure that we have an explicit default 
setting for these features and that this is passed to the assembler. 
This will ensure that if LLVM defaults change, again, GCC won't get 
caught out and stop working with attribute mismatches.


The new options will provide a means to adjust these features in future, 
but this patch does not actually add any new support for either XNACK or 
SRAM-ECC.


The XNACK feature has two settings, "on" (-mxnack) and "off" 
(-mno-xnack). The default is "off", and trying to turn it on will give a 
"sorry, unimplemented" message. To implement this will require changes 
to the load/store instruction early-clobber rules (actually, clobbering 
across multiple contiguous load/store instructions is a problem too), 
and a new xnack-enabled multilib for each supported ISA.


The SRAM-ECC feature has three settings, "on", "off" and "any" (in which 
the generated code must work with the device configures to either mode). 
The current implementation is actually "any" already, but as that 
attribute setting is not available in the HSACOv3 binary standard we 
target right now we just set it to "on" or "off" according to which 
makes sense for the configured ISA. We'll have to revisit this when we 
implement HSACOv4 compatibility.


Andrew
amdgcn: Add -mxnack and -msram-ecc [PR 100208]

gcc/ChangeLog:

PR target/100208
* config/gcn/gcn-hsa.h (DRIVER_SELF_SPECS): New.
(ASM_SPEC): Set -mattr for xnack and sram-ecc.
* config/gcn/gcn-opts.h (enum sram_ecc_type): New.
* config/gcn/gcn-valu.md: Add a warning comment.
* config/gcn/gcn.c (gcn_option_override): Add "sorry" for -mxnack.
(output_file_start): Add xnack and sram-ecc state to ".amdgcn_target".
* config/gcn/gcn.md: Add a warning comment.
* config/gcn/gcn.opt: Add -mxnack and -msram-ecc.
* config/gcn/mkoffload.c (EF_AMDGPU_MACH_AMDGCN_GFX908): Remove
SRAM-ECC flag.
(EF_AMDGPU_XNACK): New.
(EF_AMDGPU_SRAM_ECC): New.
(elf_flags): New.
(copy_early_debug_info): Use elf_flags.
(main): Handle -mxnack and -msram-ecc options.
* doc/invoke.texi: Document -mxnack and -msram-ecc.

gcc/testsuite/ChangeLog:

PR target/100208
* gcc.target/gcn/sram-ecc-1.c: New test.
* gcc.target/gcn/sram-ecc-2.c: New test.
* gcc.target/gcn/sram-ecc-3.c: New test.
* gcc.target/gcn/sram-ecc-4.c: New test.
* gcc.target/gcn/sram-ecc-5.c: New test.
* gcc.target/gcn/sram-ecc-6.c: New test.
* gcc.target/gcn/sram-ecc-7.c: New test.
* gcc.target/gcn/sram-ecc-8.c: New test.

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 61cdb312c2e..724e9a381ba 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -75,9 +75,15 @@ extern unsigned int gcn_local_sym_hash (const char *name);
supported for gcn.  */
 #define GOMP_SELF_SPECS ""
 
+#define DRIVER_SELF_SPECS \
+  "%{march=fiji|march=gfx900|march=gfx906:%{!msram-ecc=*:-msram-ecc=off}}"
+
 /* Use LLVM assembler and linker options.  */
 #define ASM_SPEC  "-triple=amdgcn--amdhsa "  \
  "%:last_arg(%{march=*:-mcpu=%*}) " \
+ "-mattr=%{mxnack:+xnack;:-xnack} " \
+ /* FIXME: support "any" when we move to HSACOv4.  */ \
+ "-mattr=%{!msram-ecc=off:+sram-ecc;:-sram-ecc} " \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index ed67d015ff8..b25516060e1 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -34,4 +34,11 @@ extern int gcn_isa;
 #define TARGET_GCN5 (gcn_isa == 5)
 #define TARGET_GCN5_PLUS (gcn_isa >= 5)
 
+enum sram_ecc_type
+{
+  SRAM_ECC_OFF,
+  SRAM_ECC_ON,
+  SRAM_ECC_ANY
+};
+
 #endif
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index beefcf754d7..84ff67508b9 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -703,6 +703,8 @@ (define_expand "vec_init"
 ;; - The address space and glc (volatile) fields are there to replace the
 ;;   fields normally found in a MEM.
 ;; - Multiple forms of address expression are supported, below.
+;;
+;; TODO: implement combined gather and zero_extend, but only for -msram-ecc=on
 
 (define_expand "gather_load"
   [(match_operand:V_ALL 0 "register_operand")
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 6d02a4a02dd..385b90c4b00 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -144,6 +144,10 @@ gcn_option_override (void)
/* 1MB total.  */
stack_size_opt = 1048576;
 }
+
+  /* The xnack option is a placeholder, for now.  */
+  if (flag_xnack)
+sorry ("XNACK support");
 }
 
 /* }}}  */
@@ -5182,11 +5186,16 @@ output_file_start (void)
 case

Re: [PATCH v2] c++: Reject ordered comparison of null pointers [PR99701]

2021-07-19 Thread Marek Polacek via Gcc-patches
On Sat, Jul 17, 2021 at 02:50:28PM -0700, Jason Merrill wrote:
> On 7/16/21 6:34 PM, Jakub Jelinek wrote:
> > On Fri, Jul 16, 2021 at 05:36:13PM -0400, Marek Polacek via Gcc-patches 
> > wrote:
> > > When implementing DR 1512 in r11-467 I neglected to reject ordered
> > > comparison of two null pointers, like nullptr < nullptr.  This patch
> > > fixes that omission.
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > >   DR 1512
> > >   PR c++/99701
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
> > >   a switch.
> > >   * typeck.c (cp_build_binary_op): Reject ordered comparison
> > >   of two null pointers.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
> > >   * g++.dg/cpp0x/nullptr46.C: Add dg-error.
> > >   * g++.dg/expr/ptr-comp4.C: New test.
> > > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > >   * testsuite/20_util/tuple/comparison_operators/overloaded.cc:
> > >   Add dg-error.
> > 
> > Maybe it would be useful to have also a g++.dg/cpp2a/ testcase with
> > nullptr <=> nullptr etc. (nullptr <=> 0, etc. what you test
> > in ptr-comp4.C after #include ).
> 
> Sounds good.

Done.
 
> > +++ 
> > b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc
> > @@ -49,3 +49,5 @@ TwistedLogic operator<(const Compares&, const Compares&) 
> > { return {false}; }
> >  auto a = std::make_tuple(nullptr, Compares{}, 2, 'U');
> >  auto b = a == a;
> >  auto c = a < a;
> > +
> > +// { dg-error "ordered comparison" "" { target *-*-* } 0 }
> 
> If we can't test for the specific problematic line, let's split this
> testcase into separate well-formed and ill-formed tests.

Done, I don't know if there's a trick to check an error that comes
from tuple.  Jon, are you OK with what I'm doing here?  Since
overloaded.cc and overloaded2.cc are essentially the same I could
make the substance of the test into a separate file but since it's
just a test, I didn't bother.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
When implementing DR 1512 in r11-467 I neglected to reject ordered
comparison of two null pointers, like nullptr < nullptr.  This patch
fixes that omission.

DR 1512
PR c++/99701

gcc/cp/ChangeLog:

* cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
a switch.
* typeck.c (cp_build_binary_op): Reject ordered comparison
of two null pointers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
* g++.dg/cpp0x/nullptr46.C: Add dg-error.
* g++.dg/cpp2a/spaceship-err7.C: New test.
* g++.dg/expr/ptr-comp4.C: New test.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Move a line...
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
...here.  New test.
---
 gcc/cp/cp-gimplify.c  |  4 --
 gcc/cp/typeck.c   | 15 ++
 gcc/testsuite/g++.dg/cpp0x/nullptr11.C| 16 --
 gcc/testsuite/g++.dg/cpp0x/nullptr46.C|  3 +-
 gcc/testsuite/g++.dg/cpp2a/spaceship-err7.C   | 14 +
 gcc/testsuite/g++.dg/expr/ptr-comp4.C | 21 
 .../tuple/comparison_operators/overloaded.cc  |  1 -
 .../tuple/comparison_operators/overloaded2.cc | 52 +++
 8 files changed, 92 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-err7.C
 create mode 100644 gcc/testsuite/g++.dg/expr/ptr-comp4.C
 create mode 100644 
libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index ff0bff771df..0520fa45b91 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2433,13 +2433,9 @@ cp_fold (tree x)
  switch (code)
{
case EQ_EXPR:
-   case LE_EXPR:
-   case GE_EXPR:
  x = constant_boolean_node (true, TREE_TYPE (x));
  break;
case NE_EXPR:
-   case LT_EXPR:
-   case GT_EXPR:
  x = constant_boolean_node (false, TREE_TYPE (x));
  break;
default:
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a483e1f988d..738e69a0440 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -5483,7 +5483,9 @@ cp_build_binary_op (const op_location_t &location,
result_type = composite_pointer_type (location,
  type0, type1, op0, op1,
  CPO_COMPARISON, complain);
-  else if (code0 == POINTER_TYPE && null_ptr_cst_p (orig_op1))
+  else if ((code0 == POINTER_TYPE && null_ptr_cst_p (orig_op1))
+  || (code1 == POINTER_TYPE && null_ptr_cst_p (orig_op0))
+  || (null_ptr_cst_p (orig_op0) && null_ptr_cst_p (orig_op1)))
{
  /* Core Issue 1512 made

Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-19 Thread Segher Boessenkool
Hi!

On Thu, Jun 17, 2021 at 10:18:54AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (rbtree.h): New #include.
>   (num_bifs): New variable.
>   (num_ovld_stanzas): Likewise.
>   (num_ovlds): Likewise.
>   (parse_codes): New enum.
>   (bif_rbt): New variable.
>   (ovld_rbt): Likewise.
>   (fntype_rbt): Likewise.
>   (bifo_rbt): Likewise.
>   (parse_bif): New stub function.
>   (create_bif_order): Likewise.
>   (parse_ovld): Likewise.
>   (write_header_file): Likewise.
>   (write_init_file): Likewise.
>   (write_defines_file): Likewise.
>   (delete_output_files): New function.
>   (main): Likewise.

> +/* Parse the built-in file.  */
> +static parse_codes
> +parse_bif (void)
> +{
> +  return PC_OK;
> +}

Baby steps :-)

> +/* Write everything to the header file (rs6000-builtins.h).  */
> +static int
> +write_header_file (void)
> +{
> +  return 1;
> +}

What does the return value mean?  Please document it in a comment.  Same
for other functions (where the function name does not make it obvious
what the return value is).

> +static void
> +delete_output_files (void)
> +{
> +  /* Depending on whence we're called, some of these may already be
> + closed.  Don't check for errors.  */
> +  fclose (header_file);
> +  fclose (init_file);
> +  fclose (defines_file);
> +
> +  unlink (header_path);
> +  unlink (init_path);
> +  unlink (defines_path);
> +}

What are header_path etc.?  It is a very good idea to make sure this is
never something terrible to call unlink on (including making sure the
delete_output_files function is *obviously* never called if creating the
files didn't succeed).

> +/* Main program to convert flat files into built-in initialization code.  */
> +int
> +main (int argc, const char **argv)
> +{
> +  if (argc != 6)
> +{
> +  fprintf (stderr,
> +"Five arguments required: two input file and three output "
> +"files.\n");

Two input file_s_ :-)  (Or s/file //).

> +  pgm_path = argv[0];

This isn't portable (depending on what you use it for -- argv[0] is not
necessarily a path at all).

> +  bif_file = fopen (bif_path, "r");
> +  if (!bif_file)
> +{
> +  fprintf (stderr, "Cannot find input built-in file '%s'.\n", bif_path);
> +  exit (1);
> +}

Say s/find/open/ in the error?

> +  fprintf (stderr, "Cannot find input overload file '%s'.\n", ovld_path);

(more)

Okay with those trivialities, and the unlink stuff looked at.  Thanks!


Segher


Re: [PATCH][gcc] Allow functions without C-style ellipsis to use format attribute

2021-07-19 Thread Martin Sebor via Gcc-patches

On 7/5/21 12:38 AM, Tuan Le Quang wrote:

Hi Martin,

Thank you for your quick response.

 >  The main benefit of variadic functions templates over C vararg
 >  functions is that they make use of the type system for type safety.
 >  I'm not sure I see applying attribute format to them as a very
 >  compelling use case.  (I'd expect the format string in a variadic
 >  function template to use generic conversion specifiers, say %@ or
 >  some such, and only let the caller specify things like flags, width
 >  and precision but not type conversion specifiers).  Is there one
 >  where relying on the type system isn't good enough?

One case we have is wrapping a C API in a C++ one. Hence, the underlying 
API is C and we must specify types for format arguments. It means that 
generic conversion is not possible.


 > Do you have an actual
 > use case for it or did it just fall out of the varaidic template
 > implementation?

Yes, it falls out of the variadic template. It is the only way that 
helps variadic templates use format attribute. And I also feel the same, 
it might be useful sometimes.


Also, your suggestion on the code is helpful! I have drafted a new patch 
here. I have bootstrapped and regression tested it on x86_64-pc-linux-gnu


You've answered my questions about the design (thank you) and I don't
have any objections to the idea, but I'm not in a position to approve
the patch.  I would suggest to get Jason's input on extending
attribute format to variadic function templates, and Joseph's on
extending it to ordinary (non-variadic) functions.  I've CC'd both.

Thanks
Martin



Regards,
Tuan

gcc/c-family/ChangeLog:

* c-attribs.c (positional_argument): allow third argument of format 
attribute to point to parameters of any type if the function is not 
C-style variadic
* c-common.h (enum posargflags): add flag POSARG_ANY for the third 
argument of format attribute
* c-format.c (decode_format_attr): read third argument with 
POSARG_ELLIPSIS only if the function has has a variable argument

(handle_format_attribute): relax explicit checks for non-variadic functions

gcc/testsuite/ChangeLog:

* gcc.dg/format/attr-3.c: modify comment
* objc.dg/attributes/method-format-1.m: errors do not hold anymore, a 
warning is given instead

* g++.dg/warn/format9.C: New test.
* gcc.dg/format/attr-9.c: New test.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 6bf492afcc0..a46d882feba 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -714,6 +714,9 @@ positional_argument (const_tree fntype, const_tree 
atname, tree pos,

    return NULL_TREE;
   }

+      if (flags & POSARG_ANY)
+    return pos;
+
        /* Where the expected code is STRING_CST accept any pointer
   expected by attribute format (this includes possibly qualified
   char pointers and, for targets like Darwin, also pointers to
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index be4b29a017b..391cfa685d4 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1462,7 +1462,10 @@ enum posargflags {
    POSARG_ZERO = 1,
    /* Consider positional attribute argument value valid if it refers
       to the ellipsis (i.e., beyond the last typed argument).  */
-  POSARG_ELLIPSIS = 2
+  POSARG_ELLIPSIS = 2,
+  /* Consider positional attribute argument value valid if it refers
+     to an argument of any type */
+  POSARG_ANY = 4
  };

  extern tree positional_argument (const_tree, const_tree, tree, tree_code,
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index bda3b18fcd0..0cef3152828 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -380,9 +380,16 @@ decode_format_attr (const_tree fntype, tree atname, 
tree args,

    else
      return false;

+  bool has_variable_arg = !type_argument_type(fntype, 
type_num_arguments(fntype) + 1);

+  int extra_flag = 0;
+  if (has_variable_arg)
+    extra_flag = POSARG_ELLIPSIS;
+  else
+    extra_flag = POSARG_ANY;
+
    if (tree val = get_constant (fntype, atname, *first_arg_num_expr,
         3, &info->first_arg_num,
-       (POSARG_ZERO | POSARG_ELLIPSIS), validated_p))
+       (POSARG_ZERO | extra_flag), validated_p))
      *first_arg_num_expr = val;
    else
      return false;
@@ -5193,11 +5200,11 @@ handle_format_attribute (tree *node, tree 
atname, tree args,

    tree arg_type;

    /* Verify that first_arg_num points to the last arg,
-     the ...  */
+     if the last arg is  ... */
    FOREACH_FUNCTION_ARGS (type, arg_type, iter)
      arg_num++;

-  if (arg_num != info.first_arg_num)
+  if (arg_num != info.first_arg_num && !type_argument_type(type, arg_num))
      {
        if (!(flags & (int) ATTR_FLAG_BUILT_IN))
   error ("argument to be formatted is not %<...%>");
diff --git a/gcc/testsuite/g++.dg/warn/format9.C 
b/gcc/testsuite/g++.dg/warn/format9.C

new file mode 100644
index 000..39b615859fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/format9.C
@@ -0,0 +1,16 @@
+// Test format attribute us

Re: Need Help: Initialize paddings for -ftrivial-auto-var-init

2021-07-19 Thread Qing Zhao via Gcc-patches


> On Jul 19, 2021, at 5:33 AM, Richard Biener  wrote:
> 
> On Fri, 16 Jul 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> After some more study on __builtin_clear_padding and the corresponding 
>> testing cases.
>> And also considered both Richard Biener and Richard Sandiford’s previous 
>> suggestion to use
>> __builtin_clear_padding.  I have the following thought on the paddings 
>> initialization:
>> 
>> ** We can insert a call to __builtin_clear_padding (&decl, 0) to all the 
>> variables that need to be
>> Auto-initialized during gimplification phase.  This include two places:
>> 
>>A. If the auto-variable does not have an explicit initializer, and we 
>> need to add a call to .DEFERRED_INIT.
>> We always add a call to __builtin_clear_padding following this 
>> .DEFERRED_INIT call.
>> 
>>structure_type temp;
>> 
>>temp = .DEFERRED_INIT ();
>>__builtin_clear_padding (&temp, 0);
>> 
>> 
>>   NOTE: 
>>  ** If temp has a type without paddings, then 
>> __builtin_clear_padding will be lowered to a gimple_nop automatically.
>>  ** regardless with zero or pattern init,  the paddings will be 
>> always initialized to ZEROes, which is compatible with CLANG.
>> 
>> 
>>B. If the auto-variable does HAVE an explicit initializer, then we will 
>> add the call to __builtin_clear_padding 
>> In the beginning of “gimplify_init_constructor”.
>> 
>> 
>>  structure_type temp = {…..};
>> 
>> 
>> __builtin_clear_padding (&temp, 0);
>> Expand_the_constructor;
>> 
>>NOTE:
>>** if temp has a type without padding, the call to 
>> __builtin_clear_padding will be lowed to gimple_nop;
>>** padding will be always initialized to ZEROes. 
>> 
>> 
>> **the major benefit with this approach are:
>> 
>>  1. Padding initialization will be compatible with CLANG;
> 
> Irrelevant IMHO.

Even though it’s not very important, I think that it will be nicer if we can 
make it.  
> 
>>  2. Implemenation will be much more simple and consistent;
> 
> Really?  We're block-initializing now, so how is it simpler to
> initialize padding separately?

For pattern initialization, after the block-initializing, we can add the call 
to __builtin_clear_padding to initialize the paddings with zeroes.
Then all the paddings will be initialized with zeroes regardless whether it’s 
zero init or pattern init.

For zero initialization, after the block-initializing, we will Not add the call 
to __builtin_clear_padding. 


> 
>> My questions:
>> 
>> 1. What do you think of this approach?
> 
> I wonder why we're going back to separate padding initialization?

This mainly due to the implementation for initialize paddings of 
fully-initialized structure variables, we can add an additional 
block-initializing for 
paddings initialization before the real constructor initialization.  Or we can 
add a call to __builtin_clear_padding after the real constructor initialization 
to only initialize the separate paddings. 

Both approach should work, but after some study, I think adding call to 
__builtin_clear_padding should be better due to:

1. Easier to be implemented.  For the other approach, we will need to come up 
with a new routine “type_has_padding” to decide whether we need to
add the block initialization for the structure, as both you and Richard 
Sandiford commented before, this new routine “type_has_padding” SHOULE reuse
the current routine “clear_padding_type” in simple-fold.c, after my study, it’s 
quite difficult  to reorg  “clear_padding_type” to be shared between 
“gimple_fold_builtin_clear_padding”
and the new routine “type_has_padding”.   And also after this detailed study, I 
found that directly call “__builtin_clear_padding” might be more simpler
and clean.

2. Padding will be all zeroed regardless wether it’s zero init or pattern init. 
I think that this is nicer than pattern initializing the paddings.


>> 2. During implementation, if I want to add the following routine:
>> 
>> /* Generate padding initialization for automatic vairable DECL.
>>  C guarantees that brace-init with fewer initializers than members
>>  aggregate will initialize the rest of the aggregate as-if it were
>>  static initialization.  In turn static initialization guarantees
>>  that padding is initialized to zero. So, we always initialize paddings
>>  to zeroes regardless INIT_TYPE.
>>  To do the padding initialization, we insert a call to
>>  __BUILTIN_CLEAR_PADDING (&decl, 0).
>>  */
>> static void
>> gimple_add_padding_init_for_auto_var (tree decl, gimple_seq *seq_p)
>> {
>> ?? how to build a addr_of_decl tree node???
>> tree addr_of_decl = ….
> 
> = build_fold_addr_expr (decl);

Thank you.

Qing

> 
>> 2. During implementation, if I want to add the following routine:
>> 
>> /* Generate padding initialization for automatic vairable DECL.
>>   C guarantees that brace-init with fewer initializers than members
>>   aggregate will initialize

Re: [PATCH v2] c++: Reject ordered comparison of null pointers [PR99701]

2021-07-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jul 2021 at 19:46, Marek Polacek  wrote:
>
> On Sat, Jul 17, 2021 at 02:50:28PM -0700, Jason Merrill wrote:
> > On 7/16/21 6:34 PM, Jakub Jelinek wrote:
> > > On Fri, Jul 16, 2021 at 05:36:13PM -0400, Marek Polacek via Gcc-patches 
> > > wrote:
> > > > When implementing DR 1512 in r11-467 I neglected to reject ordered
> > > > comparison of two null pointers, like nullptr < nullptr.  This patch
> > > > fixes that omission.
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > >
> > > >   DR 1512
> > > >   PR c++/99701
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > >   * cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
> > > >   a switch.
> > > >   * typeck.c (cp_build_binary_op): Reject ordered comparison
> > > >   of two null pointers.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
> > > >   * g++.dg/cpp0x/nullptr46.C: Add dg-error.
> > > >   * g++.dg/expr/ptr-comp4.C: New test.
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > >   * testsuite/20_util/tuple/comparison_operators/overloaded.cc:
> > > >   Add dg-error.
> > >
> > > Maybe it would be useful to have also a g++.dg/cpp2a/ testcase with
> > > nullptr <=> nullptr etc. (nullptr <=> 0, etc. what you test
> > > in ptr-comp4.C after #include ).
> >
> > Sounds good.
>
> Done.
>
> > > +++ 
> > > b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc
> > > @@ -49,3 +49,5 @@ TwistedLogic operator<(const Compares&, const 
> > > Compares&) { return {false}; }
> > >  auto a = std::make_tuple(nullptr, Compares{}, 2, 'U');
> > >  auto b = a == a;
> > >  auto c = a < a;
> > > +
> > > +// { dg-error "ordered comparison" "" { target *-*-* } 0 }
> >
> > If we can't test for the specific problematic line, let's split this
> > testcase into separate well-formed and ill-formed tests.
>
> Done, I don't know if there's a trick to check an error that comes
> from tuple.

The only way is to hardcode a line number from the header, but then
that breaks every time we add or remove anything in . I prefer
to match line 0 these days, to avoid having to keep adjusting it.

>Jon, are you OK with what I'm doing here?  Since

I might have gone with overloaded_neg.cc instead of overloaded2.cc. A
_neg suffix is documented as a convention for negative tests, but we
don't follow it consistently, so either is fine.


> overloaded.cc and overloaded2.cc are essentially the same I could
> make the substance of the test into a separate file but since it's
> just a test, I didn't bother.

Yeah, the duplication seems fine.

>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

The libstdc++ bit is fine, thanks.



[committed] analyzer: add svalue::can_have_associated_state_p [PR101503]

2021-07-19 Thread David Malcolm via Gcc-patches
PR analyzer/101503 reports an assertion failure due to an unexpected
"UNKNOWN" value (due to using --param analyzer-max-svalue-depth=0).

This patch fixes this by rejecting attempts to purge state involving
unknown/poisoned svalues (in region_model::purge_state_involving),
as these svalues should not have state associated with them - they
are singletons w.r.t each type.

To be more systematic about this, the patch also introduces a new
svalue::can_have_associated_state_p which returns false for
unknown/poisoned svalues, so that we can reject adding constraints
or sm-state on them, or building various kinds of svalue in terms
of them (e.g. unary ops, binary ops, etc).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-2399-ga113b14398f2a4ad2742e6e9c87e25cac60f263e.

gcc/analyzer/ChangeLog:
PR analyzer/101503
* constraint-manager.cc (constraint_manager::add_constraint): Use
can_have_associated_state_p rather than testing for unknown.
(constraint_manager::get_or_add_equiv_class): Likewise.
* program-state.cc (sm_state_map::set_state): Likewise.
(sm_state_map::impl_set_state): Add assertion.
* region-model-manager.cc
(region_model_manager::maybe_fold_unaryop): Handle poisoned
values.
(region_model_manager::maybe_fold_binop): Move handling of unknown
values...
(region_model_manager::get_or_create_binop): ...to here, and
generalize to use can_have_associated_state_p.
(region_model_manager::maybe_fold_sub_svalue): Use
can_have_associated_state_p rather than testing for unknown.
(region_model_manager::maybe_fold_repeated_svalue): Use unknown
when the size or repeated value is "unknown"/"poisoned".
* region-model.cc (region_model::purge_state_involving): Reject
attempts to purge unknown/poisoned svalues, as these svalues
should not have state associated with them.
* svalue.cc (sub_svalue::sub_svalue): Assert that we're building
on top of an svalue with can_have_associated_state_p.
(repeated_svalue::repeated_svalue): Likewise.
(bits_within_svalue::bits_within_svalue): Likewise.
* svalue.h (svalue::can_have_associated_state_p): New.
(unknown_svalue::can_have_associated_state_p): New.
(poisoned_svalue::can_have_associated_state_p): New.
(unaryop_svalue::unaryop_svalue): Assert that we're building on
top of an svalue with can_have_associated_state_p.
(binop_svalue::binop_svalue): Likewise.
(widening_svalue::widening_svalue): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/101503
* gcc.dg/analyzer/pr101503.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/constraint-manager.cc   |  8 +++
 gcc/analyzer/program-state.cc|  6 +++--
 gcc/analyzer/region-model-manager.cc | 28 +---
 gcc/analyzer/region-model.cc |  2 ++
 gcc/analyzer/svalue.cc   |  4 
 gcc/analyzer/svalue.h| 16 ++
 gcc/testsuite/gcc.dg/analyzer/pr101503.c | 11 ++
 7 files changed, 61 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101503.c

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 5b5a9dec0a9..f59929a75ca 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -833,9 +833,9 @@ constraint_manager::add_constraint (const svalue *lhs,
   lhs = lhs->unwrap_any_unmergeable ();
   rhs = rhs->unwrap_any_unmergeable ();
 
-  /* Nothing can be known about unknown values.  */
-  if (lhs->get_kind () == SK_UNKNOWN
-  || rhs->get_kind () == SK_UNKNOWN)
+  /* Nothing can be known about unknown/poisoned values.  */
+  if (!lhs->can_have_associated_state_p ()
+  || !rhs->can_have_associated_state_p ())
 /* Not a contradiction.  */
 return true;
 
@@ -1175,7 +1175,7 @@ constraint_manager::get_or_add_equiv_class (const svalue 
*sval)
 {
   equiv_class_id result (-1);
 
-  gcc_assert (sval->get_kind () != SK_UNKNOWN);
+  gcc_assert (sval->can_have_associated_state_p ());
 
   /* Convert all NULL pointers to (void *) to avoid state explosions
  involving all of the various (foo *)NULL vs (bar *)NULL.  */
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index ccfe7b019b0..5bb86767873 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -453,8 +453,8 @@ sm_state_map::set_state (region_model *model,
   if (model == NULL)
 return;
 
-  /* Reject attempts to set state on UNKNOWN.  */
-  if (sval->get_kind () == SK_UNKNOWN)
+  /* Reject attempts to set state on UNKNOWN/POISONED.  */
+  if (!sval->can_have_associated_state_p ())
 return;
 
   equiv_class &ec = model->get_constraints ()->get_equiv_class (sval);
@@ -492,6 +492,8 @@ sm_state_map::impl_set_state (const s

Re: [PATCH 11/55] rs6000: Parsing built-in input file, part 1 of 3

2021-07-19 Thread Segher Boessenkool
Hi!

On Thu, Jun 17, 2021 at 10:18:55AM -0500, Bill Schmidt wrote:
> +enum bif_stanza
> +{
...
> + NUMBIFSTANZAS
> +};

Doing "NUM" things like this makes it valid to assign the NUM value to
a variable of this enum type, and importantly, the compiler cannot
warn for it then.  So this is a bit of an antipattern.

> +/* Attributes of a builtin function.  */
> +struct attrinfo
> +{
> +  char isinit;
> +  char isset;

[snip]

Is it nicer to have "bool" for these?

> +static int *bif_order;

This one probably can use a comment.

> +static bif_stanza
> +stanza_name_to_stanza (const char *stanza_name)
> +{
> +  for (int i = 0; i < NUMBIFSTANZAS; i++)
> +if (!strcmp (stanza_name, stanza_map[i].stanza_name))
> +  return stanza_map[i].stanza;
> +  assert (false);
> +}

assert() compiles to nothing if NDEBUG is defined at the point you
include the header.  That shouldn't happen, but please just make a
fatal() or similar function for this?

> +  /* Append a number representing the order in which this function
> + was encountered to its name, and save in another lookup
> + structure.  */
> +  int orig_len = strlen (bifs[curr_bif].idname);
> +  char *buf = (char *) malloc (orig_len + 7);
> +  strcpy (buf, bifs[curr_bif].idname);
> +  buf[orig_len] = ':';
> +  char numstr[6];
> +  sprintf (numstr, "%05d", curr_bif);
> +  strcpy (&buf[orig_len + 1], numstr);

char *buf;
asprintf (&buf, "%s:%05d", bifs[curr_bif].idname, curr_bif);

> +#ifdef DEBUG
> +  (*diag) ("pattern name is '%s'.\n", bifs[curr_bif].patname);
> +#endif

Maybe you can do the DEBUG thing inside the diag function itself, so
that you do not need the macro check at every use?  Or use a different
function, even.  (That has nothing to do with this patch in particular
of course).

Okay for trunk.  Thanks!


Segher


Re: [PATCH] build: Implement --with-multilib-list for avr target

2021-07-19 Thread Matt Jacobson via Gcc-patches



> On Jul 5, 2021, at 7:09 PM, Matt Jacobson  wrote:
> 
>> On Jun 7, 2021, at 3:30 AM, Matt Jacobson  wrote:
>> 
>> The AVR target builds a lot of multilib variants of target libraries by 
>> default,
>> and I found myself wanting to use the --with-multilib-list argument to limit
>> what I was building, to shorten build times.  This patch implements that 
>> option
>> for the AVR target.
>> 
>> Tested by configuring and building an AVR compiler and target libs on macOS.
>> 
>> I don't have commit access, so if this patch is suitable, I'd need someone 
>> else
>> to commit it for me.  Thanks.
> 
> Ping.  (Please let me know if I’ve made some process error here; this is my 
> first change to GCC.)

Ping again.  Thanks!

Original mail:


Matt

Re: [PATCH] predcom: Refactor more using auto_vec

2021-07-19 Thread Martin Sebor via Gcc-patches

On 7/19/21 12:28 AM, Kewen.Lin wrote:

Hi Martin & Richard,


A further improvement worth considering (if you're so inclined :)
is replacing the pcom_worker vec members with auto_vec (obviating
having to explicitly release them) and for the same reason also
replacing the comp_ptrs bare pointer members with auto_vecs.
There may be other opportunities to do the same in individual
functions (I'd look to get rid of as many calls to functions
like XNEW()/XNEWVEC() and free() use auto_vec instead).

An unrelated but worthwhile change is to replace the FOR_EACH_
loops with C++ 11 range loops, analogously to:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html

Finally, the only loosely followed naming convention for member
variables is to start them with the m_ prefix.

These just suggestions that could be done in a followup, not
something I would consider prerequisite for accepting the patch
as is if I were in a position to make such a decision.



Sorry for the late update, this patch follows your previous
advices to refactor it more by:
   - Adding m_ prefix for class pcom_worker member variables.
   - Using auto_vec instead of vec among class pcom_worker,
 chain, component and comp_ptrs.

btw, the changes in tree-data-ref.[ch] is required, without
it the destruction of auto_vec instance could try to double
free the memory pointed by m_vec.


Making the vec parameters in tree-data-ref.[ch] references is in line
with other changes like those that we have agreed on in a separate
review recently, so they look good to me.



The suggestion on range loops is addressed by one separated
patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575536.html

Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Is it ok for trunk?


Thanks for taking the suggestions!  At a high-level the patch looks
good.  I spotted a couple of new instances of XDELETE that I couldn't
find corresponding XNEW() calls for but I'll leave that to someone
more familiar with the code, along with a formal review and approval.

Martin



BR,
Kewen
-
gcc/ChangeLog:

* tree-data-ref.c (free_dependence_relations): Adjust to pass vec by
reference.
(free_data_refs): Likewise.
* tree-data-ref.h (free_dependence_relations): Likewise.
(free_data_refs): Likewise.
* tree-predcom.c (struct chain): Use auto_vec instead of vec for
members.
(struct component): Likewise.
(pcom_worker::pcom_worker): Adjust for auto_vec and renaming changes.
(pcom_worker::~pcom_worker): Likewise.
(pcom_worker::release_chain): Adjust as auto_vec changes.
(pcom_worker::loop): Rename to ...
(pcom_worker::m_loop): ... this.
(pcom_worker::datarefs): Rename to ...
(pcom_worker::m_datarefs): ... this.  Use auto_vec instead of vec.
(pcom_worker::dependences): Rename to ...
(pcom_worker::m_dependences): ... this.  Use auto_vec instead of vec.
(pcom_worker::chains): Rename to ...
(pcom_worker::m_chains): ... this.  Use auto_vec instead of vec.
(pcom_worker::looparound_phis): Rename to ...
(pcom_worker::m_looparound_phis): ... this.  Use auto_vec instead of
vec.
(pcom_worker::cache): Rename to ...
(pcom_worker::m_cache): ... this.  Use auto_vec instead of vec.
(pcom_worker::release_chain): Adjust for auto_vec changes.
(pcom_worker::release_chains): Adjust for auto_vec and renaming
changes.
(release_component): Remove.
(release_components): Adjust for release_component removal.
(component_of): Adjust to use vec.
(merge_comps): Likewise.
(pcom_worker::aff_combination_dr_offset): Adjust for renaming changes.
(pcom_worker::determine_offset): Likewise.
(class comp_ptrs): Remove.
(pcom_worker::split_data_refs_to_components): Adjust for renaming
changes, for comp_ptrs removal with auto_vec.
(pcom_worker::suitable_component_p): Adjust for renaming changes.
(pcom_worker::filter_suitable_components): Adjust for release_component
removal.
(pcom_worker::valid_initializer_p): Adjust for renaming changes.
(pcom_worker::find_looparound_phi): Likewise.
(pcom_worker::add_looparound_copies): Likewise.
(pcom_worker::determine_roots_comp): Likewise.
(pcom_worker::single_nonlooparound_use): Likewise.
(pcom_worker::execute_pred_commoning_chain): Likewise.
(pcom_worker::execute_pred_commoning): Likewise.
(pcom_worker::try_combine_chains): Likewise.
(pcom_worker::prepare_initializers_chain): Likewise.
(pcom_worker::prepare_initializers): Likewise.
(pcom_worker::prepare_finalizers_chain): Likewise.
(pcom_worker::prepare_finalizers): Likewise.
(pcom_worker::tree_predictive_commonin

Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-07-19 Thread Victor Tong via Gcc-patches
Gentle ping.

From: Gcc-patches  on 
behalf of Victor Tong via Gcc-patches 
Sent: Monday, June 28, 2021 4:10 PM
To: Richard Biener ; Marc Glisse 

Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176]

​Thanks Richard and Marc.

I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() 
below with my extra pattern being applied. I tested the two functions with all 
of the integers from INT_MIN to INT_MAX.

long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

#pragma GCC push_options
#pragma GCC optimize ("O0")
long
fn1NoOpt (int x)
{
  volatile int y = (42 - x);
  return 42L - (long)y;
}
#pragma GCC pop_options

int main ()
{
for (long i=INT_MIN; i<=INT_MAX;i++)
{
auto valNoOpt = fn1NoOpt(i);
auto valOpt = fn1(i);
if (valNoOpt != valOpt)
printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
}
return 0;
}

I saw that the return values of fn1() and fn1NoOpt() differed when the input 
was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range 
to fn1NoOpt(), a signed overflow is triggered which causes the value to differ 
(undefined behavior). This seems to go in line with what Marc described and I 
think the transformation is correct in the scenario above. I do think that type 
casts that result in truncation (i.e. from a higher precision to a lower one) 
or with unsigned types will result in an incorrect transformation so those 
scenarios need to be avoided.

Given that the extra pattern I'm adding is taking advantage the undefined 
behavior of signed integer overflow, I'm considering keeping the existing 
nop_convert pattern in place and adding a new pattern to cover these new cases. 
I'd also like to avoid touching nop_convert given that it's used in a number of 
other patterns.

This is the pattern I have currently:

  (simplify
(minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
(if (operand_equal_p(@0, @2, 0)
&& INTEGRAL_TYPE_P (type)
&& TYPE_OVERFLOW_UNDEFINED(type)
&& !TYPE_OVERFLOW_SANITIZED(type)
&& INTEGRAL_TYPE_P (TREE_TYPE(@1))
&& TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
&& !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
&& !TYPE_UNSIGNED (TREE_TYPE (@1))
&& !TYPE_UNSIGNED (type)
&& TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
&& INTEGRAL_TYPE_P (TREE_TYPE(@0))
&& TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
&& !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
&& !TYPE_UNSIGNED (TREE_TYPE (@0))
&& TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
&& TREE_TYPE(@1) == TREE_TYPE(@2))
(convert @1)))

Is there a more concise/better way of writing the pattern? I was looking for 
similar checks in match.pd and I couldn't find anything that I could leverage.

I also kept my pattern to the specific scenario I'm seeing with the regression 
to lower the risk of something breaking. I've limited @1 and @2 to have the 
same type.

I'm also in favor of adding/running computer verification to make sure the 
transformation is legal. I've written some tests to verify that the pattern is 
being applied in the right scenarios and not being applied in others, but I 
think there are too many possibilities to manually write them all. Is there 
anything in GCC that can be used to verify that match.pd transformations are 
correct? I'm thinking of something like Alive 
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&data=04%7C01%7Cvitong%40microsoft.com%7Cba7d8f9f9b774462148608d93a8a0471%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637605186726283785%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ugx%2FTw58OPjLzamE9yqQThV5u4EfQ8JLrurnIy00AzQ%3D&reserved=0.

Thanks,
Victor



From: Richard Biener 
Sent: Monday, June 21, 2021 12:08 AM
To: Marc Glisse 
Cc: Victor Tong ; gcc-patches@gcc.gnu.org 

Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176]

On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse  wrote:
>
> On Fri, 18 Jun 2021, Richard Biener wrote:
>
> >> Option 2: Add a new pattern to support scenarios that the existing 
> >> nop_convert pattern bails out on.
> >>
> >> Existing pattern:
> >>
> >> (simplify
> >>(minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) 
> >> @1)))
> >>(view_convert @1))
>
> I tried to check with a program when
>
> T3 x;
> T1 y;
> (T2)x-(T2)((T1)x-y)
>
> can be safely replaced with
>
> (T2)y
>
> From the output, it looks like this is safe when T1 is at least as large
> as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> then) or has 

[PATCH] c++: Improve memory usage of subsumption [PR100828]

2021-07-19 Thread Patrick Palka via Gcc-patches
Constraint subsumption is implemented in two steps.  The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint.   Performing these two
steps separately is problematic because in the first step the
disjunctive normal form can be exponentially larger than the original
constraint, and by computing it ahead of time we'd have to keep all of
it in memory.

This patch fixes this exponential blowup in memory usage by interleaving
these two steps, so that as soon as we decompose one clause we check
implication for it.  In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs.  (Note the
_time_ complexity of subsumption is still exponential in the worst case.)

In order for this to work we need formula::branch to prepend the copy
of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it.  Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
range-v3 and cmcstl2.  Does this look OK for trunk and perhaps 11?

PR c++/100828

gcc/cp/ChangeLog:

* logic.cc (formula::formula): Use emplace_back.
(formula::branch): Insert a copy of m_current in front of
m_current instead of at the end of the list.
(formula::erase): Define.
(decompose_formula): Remove.
(decompose_antecedents): Remove.
(decompose_consequents): Remove.
(derive_proofs): Remove.
(max_problem_size): Remove.
(diagnose_constraint_size): Remove.
(subsumes_constraints_nonnull): Rewrite directly in terms of
decompose_clause and derive_proof, interleaving decomposition
with implication checking.  Use formula::erase to free the
current clause before moving on to the next one.
---
 gcc/cp/logic.cc | 118 ++--
 1 file changed, 35 insertions(+), 83 deletions(-)

diff --git a/gcc/cp/logic.cc b/gcc/cp/logic.cc
index 142457e408a..3f872c11fe2 100644
--- a/gcc/cp/logic.cc
+++ b/gcc/cp/logic.cc
@@ -223,9 +223,7 @@ struct formula
 
   formula (tree t)
   {
-/* This should call emplace_back(). There's an extra copy being
-   invoked by using push_back().  */
-m_clauses.push_back (t);
+m_clauses.emplace_back (t);
 m_current = m_clauses.begin ();
   }
 
@@ -248,8 +246,7 @@ struct formula
   clause& branch ()
   {
 gcc_assert (!done ());
-m_clauses.push_back (*m_current);
-return m_clauses.back ();
+return *m_clauses.insert (std::next (m_current), *m_current);
   }
 
   /* Returns the position of the current clause.  */
@@ -287,6 +284,14 @@ struct formula
 return m_clauses.end ();
   }
 
+  /* Remove the specified clause.  */
+
+  void erase (iterator i)
+  {
+gcc_assert (i != m_current);
+m_clauses.erase (i);
+  }
+
   std::list m_clauses; /* The list of clauses.  */
   iterator m_current; /* The current clause.  */
 };
@@ -659,39 +664,6 @@ decompose_clause (formula& f, clause& c, rules r)
   f.advance ();
 }
 
-/* Decompose the logical formula F according to the logical
-   rules determined by R.  The result is a formula containing
-   clauses that contain only atomic terms.  */
-
-void
-decompose_formula (formula& f, rules r)
-{
-  while (!f.done ())
-decompose_clause (f, *f.current (), r);
-}
-
-/* Fully decomposing T into a list of sequents, each comprised of
-   a list of atomic constraints, as if T were an antecedent.  */
-
-static formula
-decompose_antecedents (tree t)
-{
-  formula f (t);
-  decompose_formula (f, left);
-  return f;
-}
-
-/* Fully decomposing T into a list of sequents, each comprised of
-   a list of atomic constraints, as if T were a consequent.  */
-
-static formula
-decompose_consequents (tree t)
-{
-  formula f (t);
-  decompose_formula (f, right);
-  return f;
-}
-
 static bool derive_proof (clause&, tree, rules);
 
 /* Derive a proof of both operands of T.  */
@@ -744,28 +716,6 @@ derive_proof (clause& c, tree t, rules r)
   }
 }
 
-/* Derive a proof of T from disjunctive clauses in F.  */
-
-static bool
-derive_proofs (formula& f, tree t, rules r)
-{
-  for (formula::iterator i = f.begin(); i != f.end(); ++i)
-if (!derive_proof (*i, t, r))
-  return false;
-  return true;
-}
-
-/* The largest number of clauses in CNF or DNF we accept as input
-   for subsumption. This an upper bound of 2^16 expressions.  */
-static int max_problem_size = 16;
-
-static inline bool
-diagnose_constraint_size (tree t)
-{
-  error_at (input_location, "%qE exceeds the maximum constraint complexity", 
t);
-  return false;
-}
-
 /* Key

Re: [PATCH 12/55] rs6000: Parsing built-in input file, part 2 of 3

2021-07-19 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:18:56AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (parse_args): New function.
>   (parse_prototype): Implement.

> + restr_opnd[restr_cnt] = *nargs + 1;
> + restr[restr_cnt] = argtype->restr;
> + val1[restr_cnt] = argtype->val1;
> + val2[restr_cnt++] = argtype->val2;

Just make the ++ more explicit please (put it in a separate statement).

Okay for trunk.  Thanks!


Segher


Re: [PATCH] c++: Improve memory usage of subsumption [PR100828]

2021-07-19 Thread Patrick Palka via Gcc-patches
On Mon, 19 Jul 2021, Patrick Palka wrote:

> Constraint subsumption is implemented in two steps.  The first step
> computes the disjunctive (or conjunctive) normal form of one of the
> constraints, and the second step verifies that each clause in the
> decomposed form implies the other constraint.   Performing these two
> steps separately is problematic because in the first step the
> disjunctive normal form can be exponentially larger than the original
> constraint, and by computing it ahead of time we'd have to keep all of
> it in memory.
> 
> This patch fixes this exponential blowup in memory usage by interleaving
> these two steps, so that as soon as we decompose one clause we check
> implication for it.  In turn, memory usage during subsumption is now
> worst case linear in the size of the constraints rather than
> exponential, and so we can safely remove the hard limit of 16 clauses
> without introducing runaway memory usage on some inputs.  (Note the
> _time_ complexity of subsumption is still exponential in the worst case.)
> 
> In order for this to work we need formula::branch to prepend the copy
> of the current clause directly after the current clause rather than
> at the end of the list, so that we fully decompose a clause shortly
> after creating it.  Otherwise we'd end up accumulating exponentially
> many (partially decomposed) clauses in memory anyway.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
> range-v3 and cmcstl2.  Does this look OK for trunk and perhaps 11?

Here's a testcase that demonstrates the exponential improvement, because
the DNF/CNF for the below constraints has around 2^23 clauses.  Before
this patch (but after removing the hard limit of 16 clauses), compile
time and memory usage is 7s/2.4GB.  After this patch, it's 3.5s/25MB.

-- >8 --

template concept C = true;

template
requires (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
  || (C && C)
struct k;

template
requires (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && (C || C)
  && true
struct k { };



Re: [PATCH 13/55] rs6000: Parsing built-in input file, part 3 of 3

2021-07-19 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:18:57AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (parse_bif_attrs):
>   Implement.

> +#ifdef DEBUG
> +  (*diag) ("attribute set: init = %d, set = %d, extract = %d, \
> +nosoft = %d, ldvec = %d, stvec = %d, reve = %d, pred = %d, htm = %d, \
> +htmspr = %d, htmcr = %d, mma = %d, quad = %d, pair = %d, no32bit = %d, \
> +32bit = %d, cpu = %d, ldstmask = %d, lxvrse = %d, lxvrze = %d, \
> +endian = %d.\n",

This needs some formatting.  With thatt, kay for trunk.  Thanks!


Segher


[committed] suppress buffer overflow warnings in a test (PR 101520)

2021-07-19 Thread Martin Sebor via Gcc-patches

The gcc.target/powerpc/pr93658.c test uses a loop to write into
consecutive bytes of a char variable.  The loop vectorizer turns
it into a series of assignments of vector(16) char each, which
the warning then points out.  I have committed r12-2401 to
a suppress the warning:

https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350850.html

Martin


Re: [PATCH 14/55] rs6000: Parsing of overload input file

2021-07-19 Thread Segher Boessenkool
On Thu, Jun 17, 2021 at 10:18:58AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (ovld_stanza): New struct.
>   (MAXOVLDSTANZAS): New macro.
>   (ovld_stanzas): New variable.
>   (curr_ovld_stanza): Likewise.
>   (MAXOVLDS): New macro.
>   (ovlddata): New struct.
>   (ovlds): New variable.
>   (curr_ovld): Likewise.
>   (max_ovld_args): Likewise.
>   (parse_ovld_entry): New function.
>   (parse_ovld_stanza): Likewise.
>   (parse_ovld): Implement.

> +  /* Check for an optional overload id.  Usually we use the builtin
> + function id for that purpose, but sometimes we need multiple
> + overload entries for the same builtin id, and it needs to be unique.  */
> +  consume_whitespace ();
> +  if (linebuf[pos] != '\n')
> +{
> +  id = match_identifier ();
> +  ovlds[curr_ovld].ovld_id_name = id;
> +  consume_whitespace ();
> +}

So everything is just strings here, so you do not have any real problem
with conflicting IDs.  Okay.

This is fine for trunk as well.  Thanks!


Segher


Go patch committed: Avoid aliases in receiver types

2021-07-19 Thread Ian Lance Taylor via Gcc-patches
This patch changes the Go frontend to avoid aliases in receiver types.
If a package declares a method on an alias type, the alias would be
used in the export data.  This would then trigger a compiler assertion
on import: we should not be adding methods to aliases.

Fix the problem by ensuring that receiver types do not use alias
types.  This seems preferable to consistently avoiding aliases in
export data, as aliases can cross packages.  And it's painful to try
to patch this while writing the export data, as at that point all the
types are known.

The test case is https://golang.org/cl/335172.  This fixes
https://golang.org/issue/47131.

Committed to mainline and GCC 11 branch.

Ian
4b127b4deca54805da5628839926cd6c9ed2aeac
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 4d0f44f2dd2..5323e186eab 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-adcf10890833026437a94da54934ce50c0018309
+920549b6382a2623538d31001271941f0e9e5a51
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 2872d2ed3ab..95b76bd317c 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -5763,6 +5763,26 @@ Function::check_labels() const
 }
 }
 
+// Set the receiver type.  This is used to remove aliases.
+
+void
+Function::set_receiver_type(Type* rtype)
+{
+  Function_type* oft = this->type_;
+  Typed_identifier* rec = new Typed_identifier(oft->receiver()->name(),
+  rtype,
+  oft->receiver()->location());
+  Typed_identifier_list* parameters = NULL;
+  if (oft->parameters() != NULL)
+parameters = oft->parameters()->copy();
+  Typed_identifier_list* results = NULL;
+  if (oft->results() != NULL)
+results = oft->results()->copy();
+  Function_type* nft = Type::make_function_type(rec, parameters, results,
+   oft->location());
+  this->type_ = nft;
+}
+
 // Swap one function with another.  This is used when building the
 // thunk we use to call a function which calls recover.  It may not
 // work for any other case.
@@ -7285,6 +7305,26 @@ Function_declaration::set_nointerface()
   this->pragmas_ |= GOPRAGMA_NOINTERFACE;
 }
 
+// Set the receiver type.  This is used to remove aliases.
+
+void
+Function_declaration::set_receiver_type(Type* rtype)
+{
+  Function_type* oft = this->fntype_;
+  Typed_identifier* rec = new Typed_identifier(oft->receiver()->name(),
+  rtype,
+  oft->receiver()->location());
+  Typed_identifier_list* parameters = NULL;
+  if (oft->parameters() != NULL)
+parameters = oft->parameters()->copy();
+  Typed_identifier_list* results = NULL;
+  if (oft->results() != NULL)
+results = oft->results()->copy();
+  Function_type* nft = Type::make_function_type(rec, parameters, results,
+   oft->location());
+  this->fntype_ = nft;
+}
+
 // Import an inlinable function.  This is used for an inlinable
 // function whose body is recorded in the export data.  Parse the
 // export data into a Block and create a regular function using that
diff --git a/gcc/go/gofrontend/gogo.h b/gcc/go/gofrontend/gogo.h
index f4155a29edb..c49bc92b3e0 100644
--- a/gcc/go/gofrontend/gogo.h
+++ b/gcc/go/gofrontend/gogo.h
@@ -1724,6 +1724,10 @@ class Function
   set_is_referenced_by_inline()
   { this->is_referenced_by_inline_ = true; }
 
+  // Set the receiver type.  This is used to remove aliases.
+  void
+  set_receiver_type(Type* rtype);
+
   // Swap with another function.  Used only for the thunk which calls
   // recover.
   void
@@ -1990,6 +1994,10 @@ class Function_declaration
   set_is_on_inlinable_list()
   { this->is_on_inlinable_list_ = true; }
 
+  // Set the receiver type.  This is used to remove aliases.
+  void
+  set_receiver_type(Type* rtype);
+
   // Import the function body, creating a function.
   void
   import_function_body(Gogo*, Named_object*);
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index d08cbc96ea1..ab7166b8b1b 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -10416,6 +10416,57 @@ Named_type::finalize_methods(Gogo* gogo)
   return;
 }
 
+  // Remove any aliases in the local method receiver types.
+  Bindings* methods = this->local_methods_;
+  if (methods != NULL)
+{
+  for (Bindings::const_declarations_iterator p =
+methods->begin_declarations();
+  p != methods->end_declarations();
+  ++p)
+   {
+ Named_object* no = p->second;
+ Function_type* fntype;
+ if (no->is_function())
+   fntype = no->func_value()->type();
+ else if (no->is_function_declaration())
+   fntype = no->func_declaration_val

PING 2 [PATCH] handle sanitizer built-ins in -Wuninitialized (PR 101300)

2021-07-19 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/12/21 12:06 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/2/21 1:21 PM, Martin Sebor wrote:

To avoid a class of false negatives for sanitized code
-Wuninitialized recognizes the ASAN_MARK internal function
doesn't modify its argument.  But the warning code doesn't do
the same for any sanitizer built-ins even though they don't
modify user-supplied arguments either.  This leaves another
class of false negatives unresolved.

The attached fix enhances the warning logic to recognize all
sanitizer built-ins as well and treat them as non-modifying.

Tested on x86_64-linux.

Martin






Re: [PATCH 15/55] rs6000: Build and store function type identifiers

2021-07-19 Thread Segher Boessenkool
Hi!

On Thu, Jun 17, 2021 at 10:18:59AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-gen-builtins.c (complete_vector_type): New
>   function.
>   (complete_base_type): Likewise.
>   (construct_fntype_id): Likewise.
>   (parse_bif_entry): Call contruct_fntype_id.
>   (parse_ovld_entry): Likewise.

> +/* Convert a vector type into a mode string.  */
> +static void
> +complete_vector_type (typeinfo *typeptr, char *buf, int *bufi)
> +{
> +  if (typeptr->isbool)
> +buf[(*bufi)++] = 'b';
> +  buf[(*bufi)++] = 'v';
> +  if (typeptr->ispixel)
> +{
> +  memcpy (&buf[*bufi], "p8hi", 4);
> +  *bufi += 4;

  return;
}

> +  else

... and then you don't need this, and everything after this loses a
level of indentation.  The power of early outs :-)

> +/* Build a function type descriptor identifier from the return type
> +   and argument types described by PROTOPTR, and store it if it does
> +   not already exist.  Return the identifier.  */
> +static char *
> +construct_fntype_id (prototype *protoptr)
> +{
> +  /* Determine the maximum space for a function type descriptor id.
> + Each type requires at most 9 characters (6 for the mode*, 1 for
> + the optional 'u' preceding the mode, 1 for the optional 'p'
> + preceding the mode, and 1 for an underscore following the mode).
> + We also need 5 characters for the string "ftype" that separates
> + the return mode from the argument modes.  The last argument doesn't
> + need a trailing underscore, but we count that as the one trailing
> + "ftype" instead.  For the special case of zero arguments, we need 9
> + for the return type and 7 for "ftype_v".  Finally, we need one
> + character for the terminating null.  Thus for a function with N
> + arguments, we need at most 9N+15 characters for N>0, otherwise 17.
> + 
> +   *Worst case is bv16qi for "vector bool char".  */
> +  int len = protoptr->nargs ? (protoptr->nargs + 1) * 9 + 6 : 17;
> +  char *buf = (char *) malloc (len);

I would completely avoid all this by using asprintf.  But that requires
a little restructuring, so *shrug* (it could use some factoring anyway,
but we don't need this to be perfect ;-) )

> +  assert (!argptr);
> +  }

Formatting here is broken...  Probably that last line has two spaces
to many, but please check.

> +  /* Ignore return value, as duplicates are expected.  */
> +  (void) rbt_insert (&fntype_rbt, buf);

Casting to void does not ignore a return value.  But is rbt_insert
marked up as warn_unused_result anyway?  Simply not using the return
value is fine as well.  If you want to be very explicit, you can write

  if (rbt_insert (&fntype_rbt, buf))
; /* Duplicates are fine and expected here.  */

Okay for trunk with or without such improvements.  But do fix that
indentation thing please ;-)  Thanks!


Segher


Re: [PATCH] predcom: Refactor more using auto_vec

2021-07-19 Thread Kewen.Lin via Gcc-patches
on 2021/7/20 上午4:45, Martin Sebor wrote:
> On 7/19/21 12:28 AM, Kewen.Lin wrote:
>> Hi Martin & Richard,
>>
 A further improvement worth considering (if you're so inclined :)
 is replacing the pcom_worker vec members with auto_vec (obviating
 having to explicitly release them) and for the same reason also
 replacing the comp_ptrs bare pointer members with auto_vecs.
 There may be other opportunities to do the same in individual
 functions (I'd look to get rid of as many calls to functions
 like XNEW()/XNEWVEC() and free() use auto_vec instead).

 An unrelated but worthwhile change is to replace the FOR_EACH_
 loops with C++ 11 range loops, analogously to:
 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html

 Finally, the only loosely followed naming convention for member
 variables is to start them with the m_ prefix.

 These just suggestions that could be done in a followup, not
 something I would consider prerequisite for accepting the patch
 as is if I were in a position to make such a decision.

>>
>> Sorry for the late update, this patch follows your previous
>> advices to refactor it more by:
>>    - Adding m_ prefix for class pcom_worker member variables.
>>    - Using auto_vec instead of vec among class pcom_worker,
>>  chain, component and comp_ptrs.
>>
>> btw, the changes in tree-data-ref.[ch] is required, without
>> it the destruction of auto_vec instance could try to double
>> free the memory pointed by m_vec.
> 
> Making the vec parameters in tree-data-ref.[ch] references is in line
> with other changes like those that we have agreed on in a separate
> review recently, so they look good to me.
> 

Nice, thanks for the information.

>>
>> The suggestion on range loops is addressed by one separated
>> patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575536.html
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Is it ok for trunk?
> 
> Thanks for taking the suggestions!  At a high-level the patch looks
> good.  I spotted a couple of new instances of XDELETE that I couldn't
> find corresponding XNEW() calls for but I'll leave that to someone
> more familiar with the code, along with a formal review and approval.
> 

The new XDELETEs are for struct component releasing, which uses "free"
in removed function release_component before.  As its original "new"
adopts "XCNEW" as below:

  comp = XCNEW (struct component);

I thought it might be good to use XDELETE to match when I touched it.

BR,
Kewen

> Martin
> 
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * tree-data-ref.c (free_dependence_relations): Adjust to pass vec by
>> reference.
>> (free_data_refs): Likewise.
>> * tree-data-ref.h (free_dependence_relations): Likewise.
>> (free_data_refs): Likewise.
>> * tree-predcom.c (struct chain): Use auto_vec instead of vec for
>> members.
>> (struct component): Likewise.
>> (pcom_worker::pcom_worker): Adjust for auto_vec and renaming changes.
>> (pcom_worker::~pcom_worker): Likewise.
>> (pcom_worker::release_chain): Adjust as auto_vec changes.
>> (pcom_worker::loop): Rename to ...
>> (pcom_worker::m_loop): ... this.
>> (pcom_worker::datarefs): Rename to ...
>> (pcom_worker::m_datarefs): ... this.  Use auto_vec instead of vec.
>> (pcom_worker::dependences): Rename to ...
>> (pcom_worker::m_dependences): ... this.  Use auto_vec instead of vec.
>> (pcom_worker::chains): Rename to ...
>> (pcom_worker::m_chains): ... this.  Use auto_vec instead of vec.
>> (pcom_worker::looparound_phis): Rename to ...
>> (pcom_worker::m_looparound_phis): ... this.  Use auto_vec instead of
>> vec.
>> (pcom_worker::cache): Rename to ...
>> (pcom_worker::m_cache): ... this.  Use auto_vec instead of vec.
>> (pcom_worker::release_chain): Adjust for auto_vec changes.
>> (pcom_worker::release_chains): Adjust for auto_vec and renaming
>> changes.
>> (release_component): Remove.
>> (release_components): Adjust for release_component removal.
>> (component_of): Adjust to use vec.
>> (merge_comps): Likewise.
>> (pcom_worker::aff_combination_dr_offset): Adjust for renaming changes.
>> (pcom_worker::determine_offset): Likewise.
>> (class comp_ptrs): Remove.
>> (pcom_worker::split_data_refs_to_components): Adjust for renaming
>> changes, for comp_ptrs removal with auto_vec.
>> (pcom_worker::suitable_component_p): Adjust for renaming changes.
>> (pcom_worker::filter_suitable_components): Adjust for release_component
>> removal.
>> (pcom_worker::valid_initializer_p): Adjust for renaming changes.
>> (pcom_worker::find_looparound_phi): Likewise.
>> (pcom_worker::add_looparound_copies): Likewise.
>> (pcom_worker::determine_roots_comp): Likewise.
>> 

Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-19 Thread H.J. Lu via Gcc-patches
On Mon, Jul 19, 2021 at 7:41 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > On Fri, Jul 16, 2021 at 7:15 AM H.J. Lu  wrote:
> >>
> >> On Fri, Jul 16, 2021 at 6:24 AM Richard Sandiford
> >>  wrote:
> >> >
> >> > "H.J. Lu via Gcc-patches"  writes:
> >> > > On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford
> >> > >  wrote:
> >> > >>
> >> > >> "H.J. Lu via Gcc-patches"  writes:
> >> > >> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
> >> > >> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
> >> > >> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a 
> >> > >> > hard
> >> > >> > scratch register to avoid stack realignment when expanding memset.
> >> > >> >
> >> > >> >   PR middle-end/90773
> >> > >> >   * builtins.c (gen_memset_value_from_prev): New function.
> >> > >> >   (gen_memset_broadcast): Likewise.
> >> > >> >   (builtin_memset_read_str): Use gen_memset_value_from_prev
> >> > >> >   and gen_memset_broadcast.
> >> > >> >   (builtin_memset_gen_str): Likewise.
> >> > >> >   * target.def (gen_memset_scratch_rtx): New hook.
> >> > >> >   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
> >> > >> >   * doc/tm.texi: Regenerated.
> >> > >> > ---
> >> > >> >  gcc/builtins.c | 123 
> >> > >> > +
> >> > >> >  gcc/doc/tm.texi|   5 ++
> >> > >> >  gcc/doc/tm.texi.in |   2 +
> >> > >> >  gcc/target.def |   7 +++
> >> > >> >  4 files changed, 116 insertions(+), 21 deletions(-)
> >> > >> >
> >> > >> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> >> > >> > index 39ab139b7e1..c1758ae2efc 100644
> >> > >> > --- a/gcc/builtins.c
> >> > >> > +++ b/gcc/builtins.c
> >> > >> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx 
> >> > >> > target)
> >> > >> >return NULL_RTX;
> >> > >> >  }
> >> > >> >
> >> > >> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE 
> >> > >> > (MODE)
> >> > >> > -   bytes from constant string DATA + OFFSET and return it as target
> >> > >> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
> >> > >> > +/* Return the RTL of a register in MODE generated from PREV in the
> >> > >> > previous iteration.  */
> >> > >> >
> >> > >> > -rtx
> >> > >> > -builtin_memset_read_str (void *data, void *prevp,
> >> > >> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> >> > >> > -  scalar_int_mode mode)
> >> > >> > +static rtx
> >> > >> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
> >> > >> >  {
> >> > >> > +  rtx target = nullptr;
> >> > >> >by_pieces_prev *prev = (by_pieces_prev *) prevp;
> >> > >> >if (prev != nullptr && prev->data != nullptr)
> >> > >> >  {
> >> > >> >/* Use the previous data in the same mode.  */
> >> > >> >if (prev->mode == mode)
> >> > >> >   return prev->data;
> >> > >> > +
> >> > >> > +  rtx prev_rtx = prev->data;
> >> > >> > +  machine_mode prev_mode = prev->mode;
> >> > >> > +  unsigned int word_size = GET_MODE_SIZE (word_mode);
> >> > >> > +  if (word_size < GET_MODE_SIZE (prev->mode)
> >> > >> > +   && word_size > GET_MODE_SIZE (mode))
> >> > >> > + {
> >> > >> > +   /* First generate subreg of word mode if the previous mode 
> >> > >> > is
> >> > >> > +  wider than word mode and word mode is wider than MODE.  
> >> > >> > */
> >> > >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> >> > >> > +   prev_mode, 0);
> >> > >> > +   prev_mode = word_mode;
> >> > >> > + }
> >> > >> > +  if (prev_rtx != nullptr)
> >> > >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >> > >> >  }
> >> > >> > +  return target;
> >> > >> > +}
> >> > >> > +
> >> > >> > +/* Return the RTL of a register in MODE broadcasted from DATA.  */
> >> > >> > +
> >> > >> > +static rtx
> >> > >> > +gen_memset_broadcast (rtx data, scalar_int_mode mode)
> >> > >> > +{
> >> > >> > +  /* Skip if regno_reg_rtx isn't initialized.  */
> >> > >> > +  if (!regno_reg_rtx)
> >> > >> > +return nullptr;
> >> > >> > +
> >> > >> > +  rtx target = nullptr;
> >> > >> > +
> >> > >> > +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE 
> >> > >> > (QImode);
> >> > >> > +  machine_mode vector_mode;
> >> > >> > +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> >> > >> > +gcc_unreachable ();
> >> > >>
> >> > >> Sorry, I realise it's a bit late to be raising this objection now,
> >> > >> but I don't think it's a good idea to use scalar integer modes as
> >> > >> a proxy for vector modes.  In principle there's no reason why a
> >> > >> target has to define an integer mode for every vector mode.
> >> > >
> >> > > A target always defines the largest integer mode.
> >> >
> >> > Right.  But a target shouldn't *need* to define an integer mode
> >> > for every vector mode.
>

[PATCH v2] Add QI vector mode support to by-pieces for memset

2021-07-19 Thread H.J. Lu via Gcc-patches
1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer mode.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(gen_memset_broadcast): Likewise.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and gen_memset_broadcast.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.
---
 gcc/builtins.c  | 137 
 gcc/builtins.h  |   4 +-
 gcc/doc/tm.texi |   7 +
 gcc/doc/tm.texi.in  |   2 +
 gcc/expr.c  | 170 ++--
 gcc/expr.h  |   4 +-
 gcc/target.def  |   9 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c  |   2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4b.c |   2 +-
 9 files changed, 248 insertions(+), 89 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..0d5dee8a72e 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, 
machine_mode target_mode)
 
 static rtx
 builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
-scalar_int_mode mode)
+fixed_size_mode mode)
 {
   /* The REPresentation pointed to by DATA need not be a nul-terminated
  string but the caller guarantees it's large enough for MODE.  */
   const char *rep = (const char *) data;
 
-  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
+  /* NB: Vector mode in the by-pieces infrastructure 

[committed] RISC-V: Detect python and pick best one for calling multilib-generator

2021-07-19 Thread Kito Cheng
gcc/

* config.gcc (riscv*-*-*): Detect which python is available.
---
 gcc/config.gcc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93e2b3219b9..3df9b52cf25 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4730,9 +4730,10 @@ case "${target}" in
echo "--with-multilib-list= can't used with 
--with-multilib-generator= at same time" 1>&2
exit 1
fi
+   PYTHON=`which python || which python3 || which python2`
case "${target}" in
riscv*-*-elf*)
-   if ${srcdir}/config/riscv/multilib-generator \
+   if ${PYTHON} 
${srcdir}/config/riscv/multilib-generator \
`echo ${with_multilib_generator} | sed 
's/;/ /g'`\
> t-multilib-config;
then
-- 
2.31.1



Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-19 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
>
> On Thu, 15 Jul 2021, Richard Biener wrote:
>
> > On Thu, 15 Jul 2021, Richard Biener wrote:
> >
> > > OK, guess I was more looking at
> > >
> > > #define N 32
> > > int foo (unsigned long *a, unsigned long * __restrict b,
> > >  unsigned int *c, unsigned int * __restrict d,
> > >  int n)
> > > {
> > >   unsigned sum = 1;
> > >   for (int i = 0; i < n; ++i)
> > > {
> > >   b[i] += a[i];
> > >   d[i] += c[i];
> > > }
> > >   return sum;
> > > }
> > >
> > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > mask for V16SI.  With SVE I see
> > >
> > > punpklo p1.h, p0.b
> > > punpkhi p2.h, p0.b
> > >
> > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > mask and unpacking that to two V8DI ones.  But I see
> > >
> > > vpbroadcastd%eax, %ymm0
> > > vpaddd  %ymm12, %ymm0, %ymm0
> > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > vpbroadcastd%eax, %xmm0
> > > vpaddd  %xmm10, %xmm0, %xmm0
> > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > kortestb%k1, %k1
> > > jne .L3
> > >
> > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > the magic for SVE and why that doesn't trigger for x86 here.
> >
> > So answer myself, vect_maybe_permute_loop_masks looks for
> > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > QImode so that doesn't play well here.  Not sure if there
> > are proper mask instructions to use (I guess there's a shift
> > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
unpack_lo/hi, only shift.
> > conversion.  Not sure how to better ask the target here - again
> > VnBImode might have been easier here.
>
> So I've managed to "emulate" the unpack_lo/hi for the case of
> !VECTOR_MODE_P masks by using sub-vector select (we're asking
> to turn vector(8)  into two
> vector(4) ) via BIT_FIELD_REF.  That then
> produces the desired single mask producer and
>
>   loop_mask_38 = VIEW_CONVERT_EXPR >(loop_mask_54);
>   loop_mask_37 = BIT_FIELD_REF ;
>
> note for the lowpart we can just view-convert away the excess bits,
> fully re-using the mask.  We generate surprisingly "good" code:
>
> kmovb   %k1, %edi
> shrb$4, %dil
> kmovb   %edi, %k2
>
> besides the lack of using kshiftrb.  I guess we're just lacking
> a mask register alternative for
Yes, we can do it similar as kor/kand/kxor.
>
> (insn 22 20 25 4 (parallel [
> (set (reg:QI 94 [ loop_mask_37 ])
> (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> (const_int 4 [0x4])))
> (clobber (reg:CC 17 flags))
> ]) 724 {*lshrqi3_1}
>  (expr_list:REG_UNUSED (reg:CC 17 flags)
> (nil)))
>
> and so we reload.  For the above cited loop the AVX512 vectorization
> with --param vect-partial-vector-usage=1 does look quite sensible
> to me.  Instead of a SSE vectorized epilogue plus a scalar
> epilogue we get a single fully masked AVX512 "iteration" for both.
> I suppose it's still mostly a code-size optimization (384 bytes
> with the masked epiloge vs. 474 bytes with trunk) since it will
> be likely slower for very low iteration counts but it's good
> for icache usage then and good for less branch predictor usage.
>
> That said, I have to set up SPEC on a AVX512 machine to do
Does patch  land in trunk already, i can have a test on CLX.
> any meaningful measurements (I suspect with just AVX2 we're not
> going to see any benefit from masking).  Hints/help how to fix
> the missing kshiftrb appreciated.
>
> Oh, and if there's only V4DImode and V16HImode data then
> we don't go the vect_maybe_permute_loop_masks path - that is,
> we don't generate the (not used) intermediate mask but end up
> generating two while_ult parts.
>
> Thanks,
> Richard.



-- 
BR,
Hongtao


[PATCH] i386: Remove atomic_storedi_fpu and atomic_loaddi_fpu peepholes [PR100182]

2021-07-19 Thread Uros Bizjak via Gcc-patches
These patterns result in non-atomic sequence.

2021-07-21  Uroš Bizjak  

gcc/
PR target/100182
* config/i386/sync.md (define_peephole2 atomic_storedi_fpu):
Remove.
(define_peephole2 atomic_loaddi_fpu): Ditto.

gcc/testsuite/
PR target/100182
* gcc.target/i386/pr71245-1.c: Remove.
* gcc.target/i386/pr71245-2.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master, will be pushed to all release branches.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 7913b918796..05a835256bb 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -219,82 +219,6 @@
   DONE;
 })
 
-(define_peephole2
-  [(set (match_operand:DF 0 "fp_register_operand")
-   (unspec:DF [(match_operand:DI 1 "memory_operand")]
-  UNSPEC_FILD_ATOMIC))
-   (set (match_operand:DI 2 "memory_operand")
-   (unspec:DI [(match_dup 0)]
-  UNSPEC_FIST_ATOMIC))
-   (set (match_operand:DF 3 "sse_reg_operand")
-   (match_operand:DF 4 "memory_operand"))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(set (match_dup 3) (match_dup 5))
-   (set (match_dup 4) (match_dup 3))]
-  "operands[5] = gen_lowpart (DFmode, operands[1]);")
-
-(define_peephole2
-  [(set (match_operand:DF 0 "fp_register_operand")
-   (unspec:DF [(match_operand:DI 1 "memory_operand")]
-  UNSPEC_FILD_ATOMIC))
-   (set (match_operand:DI 2 "memory_operand")
-   (unspec:DI [(match_dup 0)]
-  UNSPEC_FIST_ATOMIC))
-   (set (mem:BLK (scratch:SI))
-   (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
-   (set (match_operand:DF 3 "sse_reg_operand")
-   (match_operand:DF 4 "memory_operand"))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(const_int 0)]
-{
-  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
-  emit_move_insn (operands[4], operands[3]);
-  emit_insn (gen_memory_blockage ());
-  DONE;
-})
-
-(define_peephole2
-  [(set (match_operand:DF 0 "sse_reg_operand")
-   (unspec:DF [(match_operand:DI 1 "memory_operand")]
-  UNSPEC_LDX_ATOMIC))
-   (set (match_operand:DI 2 "memory_operand")
-   (unspec:DI [(match_dup 0)]
-  UNSPEC_STX_ATOMIC))
-   (set (match_operand:DF 3 "sse_reg_operand")
-   (match_operand:DF 4 "memory_operand"))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(set (match_dup 3) (match_dup 5))
-   (set (match_dup 4) (match_dup 3))]
-  "operands[5] = gen_lowpart (DFmode, operands[1]);")
-
-(define_peephole2
-  [(set (match_operand:DF 0 "sse_reg_operand")
-   (unspec:DF [(match_operand:DI 1 "memory_operand")]
-  UNSPEC_LDX_ATOMIC))
-   (set (match_operand:DI 2 "memory_operand")
-   (unspec:DI [(match_dup 0)]
-  UNSPEC_STX_ATOMIC))
-   (set (mem:BLK (scratch:SI))
-   (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
-   (set (match_operand:DF 3 "sse_reg_operand")
-   (match_operand:DF 4 "memory_operand"))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(const_int 0)]
-{
-  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
-  emit_move_insn (operands[4], operands[3]);
-  emit_insn (gen_memory_blockage ());
-  DONE;
-})
-
 (define_expand "atomic_store"
   [(set (match_operand:ATOMIC 0 "memory_operand")
(unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
@@ -384,82 +308,6 @@
   DONE;
 })
 
-(define_peephole2
-  [(set (match_operand:DF 0 "memory_operand")
-   (match_operand:DF 1 "any_fp_register_operand"))
-   (set (match_operand:DF 2 "fp_register_operand")
-   (unspec:DF [(match_operand:DI 3 "memory_operand")]
-  UNSPEC_FILD_ATOMIC))
-   (set (match_operand:DI 4 "memory_operand")
-   (unspec:DI [(match_dup 2)]
-  UNSPEC_FIST_ATOMIC))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (3, operands[2])
-   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
-  [(set (match_dup 0) (match_dup 1))
-   (set (match_dup 5) (match_dup 1))]
-  "operands[5] = gen_lowpart (DFmode, operands[4]);")
-
-(define_peephole2
-  [(set (match_operand:DF 0 "memory_operand")
-   (match_operand:DF 1 "any_fp_register_operand"))
-   (set (mem:BLK (scratch:SI))
-   (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
-   (set (match_operand:DF 2 "fp_register_operand")
-   (unspec:DF [(match_operand:DI 3 "memory_operand")]
-  UNSPEC_FILD_ATOMIC))
-   (set (match_operand:DI 4 "memory_operand")
-   (unspec:DI [(match_dup 2)]
-  UNSPEC_FIST_ATOMIC))]
-  "!TARGET_64BIT
-   && peep2_reg_dead_p (4, operands[2])
-   && rtx_equal_p (XEXP

Re: [PATCH] Add QI vector mode support to by-pieces for memset

2021-07-19 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
>> > + {
>> > +   /* First generate subreg of word mode if the previous mode is
>> > +  wider than word mode and word mode is wider than MODE.  */
>> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>> > +   prev_mode, 0);
>> > +   prev_mode = word_mode;
>> > + }
>> > +  if (prev_rtx != nullptr)
>> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>>
>> This should be lowpart_subreg, since 0 isn't the right offset for
>> big-endian targets.  Using lowpart_subreg should also avoid the need
>> for the word_size “if” above: lowpart_subreg can handle lowpart subword
>> subregs of multiword values.
>
> I tried it.  It didn't work since it caused the LRA failure.   I replaced
> simplify_gen_subreg with lowpart_subreg instead.

What specifically went wrong?

>> > +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
>> > +   bytes from constant string DATA + OFFSET and return it as target
>> > +   constant.  If PREV isn't nullptr, it has the RTL info from the
>> > +   previous iteration.  */
>> > +
>> > +rtx
>> > +builtin_memset_read_str (void *data, void *prev,
>> > +  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
>> > +  machine_mode mode)
>> > +{
>> > +  rtx target;
>> >const char *c = (const char *) data;
>> > -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
>> > +  char *p;
>> > +  unsigned int size = GET_MODE_SIZE (mode).to_constant ();
>> > +
>> > +  /* Don't use the previous value if size is 1.  */
>>
>> Why not though?  The existing code does, and that seems like the right
>> thing to do when operating on integers.
>>
>> I can see the check would be a good thing to do if MODE isn't a vector
>> mode and the previous mode was.  Is that the case you're worried about?
>> If so, that check could go in gen_memset_value_from_prev instead.
>
> We are storing one byte.  Doing it directly is faster.

But the first thing being protected here is…

>> > +  if (size != 1)
>> > +{
>> > +  target = gen_memset_value_from_prev (prev, mode);
>> > +  if (target != nullptr)
>> > + return target;

…this attempt to use the previous value.  If the target uses, say,
SImode for the first piece and QImode for a final byte, using the QImode
lowpart of the SImode register would avoid having to move the byte value
into a separate QImode register.  Why's that a bad thing to do?  AFAICT
it's what the current code would do, so if we want to change it even for
integer modes, I think it should be a separate patch with a separate
justification.

Like I say, I can understand that using the QImode lowpart of a vector
wouldn't be a good idea.  But if that's specifically what you're trying
to prevent, I think we should test for it.

Thanks,
Richard


Re: [committed] .dir-locals.el: Set 'fill-column' to 80 for c-mode

2021-07-19 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> On 19/07/2021 14:52, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw via Gcc-patches  writes:
>>> On 14/12/2020 11:29, Andrea Corallo via Gcc-patches wrote:
 Hi all,

 I've committed the attached patch as obvious.

 This is to set `fill-column' to 80 in c-mode (Emacs default it to 70) so
 now M-q does the right thing.  I think is very handy especially in
 comments.

 Question: should we update the copyright manually for this file or have
 it updated by 'update-copyright.py'?  I think this is not scanning the
 root of the repo.

 Thanks

 Andrea

>>>
>>> Sorry for the very late reply to this, but I've only just tracked it
>>> down as the cause of why emacs had suddenly started to produce lines
>>> that were displaying with a line wrap (I'd add an image, but the mailing
>>> list would likely only strip it off).
>>>
>>> The problem is that this allows a character to appear in column 80 and
>>> emacs then automatically inserts a blank line after it (when the window
>>> is 80 columns wide), which really messes up the formatting.
>>>
>>> The wiki seems to suggest https://gcc.gnu.org/wiki/FormattingCodeForGCC
>>> that the line length should be 79 columns (see the textwidth setting);
>>> although that is not what is set contrib/vimrc.
>>>
>>> Would anyone object if we reduced this by 1 (to 79) in both places?
>> 
>> Sounds good to me FWIW.  (Having hit the same issue.)
>> 
>> Richard
>> 
>
> So how about this?
>
> --
>
> Limit fill-column to 79
>
> The current line-length limit is set to 80, but that allows a character 
> to appear in the 80th column, and that causes emacs to display a 
> line-wrap followed by a blank line when the display/window width is 80 
> columns.  Furthermore, this seems to contradict the coding-style rules 
> on the wiki which suggest that the line limit should be 79.
>
> So reduce the line width in both the emacs control file and the contrib 
> vimrc file to 79 characters.
>
> ChangeLog:
>   * .dir-local.el (c-mode): Change fill-column to 79.
>
> contrib:
>   * vimrc (textwidth): Change non-gitcommit length to 79.

LGTM, thanks.

Richard


Re: [PATCH] aarch64: Refactor TBL/TBX RTL patterns

2021-07-19 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches  writes:
> Hi,
>
> As subject, this patch renames the two-source-register TBL/TBX RTL
> patterns so that their names better reflect what they do, rather than
> confusing them with tbl3 or tbx4 patterns. Also use the correct
> "neon_tbl2" type attribute for both patterns.
>
> Rename single-source-register TBL/TBX patterns for consistency.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

OK.  Nice clean-up, thanks.

Richard

> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-08  Jonathan Wright  
>
>   * config/aarch64/aarch64-simd-builtins.def: Use two variant
>   generators for all TBL/TBX intrinsics and rename to
>   consistent forms: qtbl[1234] or qtbx[1234].
>   * config/aarch64/aarch64-simd.md (aarch64_tbl1):
>   Rename to...
>   (aarch64_qtbl1): This.
>   (aarch64_tbx1): Rename to...
>   (aarch64_qtbx1): This.
>   (aarch64_tbl2v16qi): Delete.
>   (aarch64_tbl3): Rename to...
>   (aarch64_qtbl2): This.
>   (aarch64_tbx4): Rename to...
>   (aarch64_qtbx2): This.
>   * config/aarch64/aarch64.c (aarch64_expand_vec_perm_1): Use
>   renamed qtbl1 and qtbl2 RTL patterns.
>   * config/aarch64/arm_neon.h (vqtbl1_p8): Use renamed qtbl1
>   RTL pattern.
>   (vqtbl1_s8): Likewise.
>   (vqtbl1_u8): Likewise.
>   (vqtbl1q_p8): Likewise.
>   (vqtbl1q_s8): Likewise.
>   (vqtbl1q_u8): Likewise.
>   (vqtbx1_s8): Use renamed qtbx1 RTL pattern.
>   (vqtbx1_u8): Likewise.
>   (vqtbx1_p8): Likewise.
>   (vqtbx1q_s8): Likewise.
>   (vqtbx1q_u8): Likewise.
>   (vqtbx1q_p8): Likewise.
>   (vtbl1_s8): Use renamed qtbl1 RTL pattern.
>   (vtbl1_u8): Likewise.
>   (vtbl1_p8): Likewise.
>   (vtbl2_s8): Likewise
>   (vtbl2_u8): Likewise.
>   (vtbl2_p8): Likewise.
>   (vtbl3_s8): Use renamed qtbl2 RTL pattern.
>   (vtbl3_u8): Likewise.
>   (vtbl3_p8): Likewise.
>   (vtbl4_s8): Likewise.
>   (vtbl4_u8): Likewise.
>   (vtbl4_p8): Likewise.
>   (vtbx2_s8): Use renamed qtbx2 RTL pattern.
>   (vtbx2_u8): Likewise.
>   (vtbx2_p8): Likewise.
>   (vqtbl2_s8): Use renamed qtbl2 RTL pattern.
>   (vqtbl2_u8): Likewise.
>   (vqtbl2_p8): Likewise.
>   (vqtbl2q_s8): Likewise.
>   (vqtbl2q_u8): Likewise.
>   (vqtbl2q_p8): Likewise.
>   (vqtbx2_s8): Use renamed qtbx2 RTL pattern.
>   (vqtbx2_u8): Likewise.
>   (vqtbx2_p8): Likewise.
>   (vqtbx2q_s8): Likewise.
>   (vqtbx2q_u8): Likewise.
>   (vqtbx2q_p8): Likewise.
>   (vtbx4_s8): Likewise.
>   (vtbx4_u8): Likewise.
>   (vtbx4_p8): Likewise.
>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 063f503ebd96657f017dfaa067cb231991376bda..b7f1237b1ffd0d4ca283c853be1cc94b9fc35260
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -718,37 +718,31 @@
>VAR1 (BINOPP, crypto_pmull, 0, NONE, di)
>VAR1 (BINOPP, crypto_pmull, 0, NONE, v2di)
>  
> -  /* Implemented by aarch64_tbl3.  */
> -  VAR1 (BINOP, tbl3, 0, NONE, v8qi)
> -  VAR1 (BINOP, tbl3, 0, NONE, v16qi)
> +  /* Implemented by aarch64_qtbl1.  */
> +  VAR2 (BINOP, qtbl1, 0, NONE, v8qi, v16qi)
> +  VAR2 (BINOPU, qtbl1, 0, NONE, v8qi, v16qi)
>  
> -  /* Implemented by aarch64_tbl1.  */
> -  VAR2 (BINOP, tbl1, 0, NONE, v8qi, v16qi)
> -  VAR2 (BINOPU, tbl1, 0, NONE, v8qi, v16qi)
> +  /* Implemented by aarch64_qtbl2.  */
> +  VAR2 (BINOP, qtbl2, 0, NONE, v8qi, v16qi)
>  
>/* Implemented by aarch64_qtbl3.  */
> -  VAR1 (BINOP, qtbl3, 0, NONE, v8qi)
> -  VAR1 (BINOP, qtbl3, 0, NONE, v16qi)
> +  VAR2 (BINOP, qtbl3, 0, NONE, v8qi, v16qi)
>  
>/* Implemented by aarch64_qtbl4.  */
> -  VAR1 (BINOP, qtbl4, 0, NONE, v8qi)
> -  VAR1 (BINOP, qtbl4, 0, NONE, v16qi)
> +  VAR2 (BINOP, qtbl4, 0, NONE, v8qi, v16qi)
>  
> -  /* Implemented by aarch64_tbx1.  */
> -  VAR2 (TERNOP, tbx1, 0, NONE, v8qi, v16qi)
> -  VAR2 (TERNOPU, tbx1, 0, NONE, v8qi, v16qi)
> +  /* Implemented by aarch64_qtbx1.  */
> +  VAR2 (TERNOP, qtbx1, 0, NONE, v8qi, v16qi)
> +  VAR2 (TERNOPU, qtbx1, 0, NONE, v8qi, v16qi)
>  
> -  /* Implemented by aarch64_tbx4.  */
> -  VAR1 (TERNOP, tbx4, 0, NONE, v8qi)
> -  VAR1 (TERNOP, tbx4, 0, NONE, v16qi)
> +  /* Implemented by aarch64_qtbx2.  */
> +  VAR2 (TERNOP, qtbx2, 0, NONE, v8qi, v16qi)
>  
>/* Implemented by aarch64_qtbx3.  */
> -  VAR1 (TERNOP, qtbx3, 0, NONE, v8qi)
> -  VAR1 (TERNOP, qtbx3, 0, NONE, v16qi)
> +  VAR2 (TERNOP, qtbx3, 0, NONE, v8qi, v16qi)
>  
>/* Implemented by aarch64_qtbx4.  */
> -  VAR1 (TERNOP, qtbx4, 0, NONE, v8qi)
> -  VAR1 (TERNOP, qtbx4, 0, NONE, v16qi)
> +  VAR2 (TERNOP, qtbx4, 0, NONE, v8qi, v16qi)
>  
>/* Builtins for ARMv8.1-A Adv.SIMD instructions.  */
>  
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
>