Re: [PATCH] i386: Fix up ICE with -mveclibabi={acml,svml} [PR105367]

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, Apr 26, 2022 at 8:54 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following testcase ICEs because conversions between scalar float types
> which have the same mode are useless in GIMPLE, but for mathfn_built_in the
> exact type matters (it treats, say, double and _Float64 or float and _Float32
> differently, using different suffixes, and for the _Float* types it sometimes
> returns NULL when float/double do have a builtin).
>
> In ix86_veclibabi_{svml,acml} we use mathfn_built_in just so that
> we don't have to translate the combined_fn and SFmode vs. DFmode into
> strings ourselves, and we already punt earlier on anything but SFmode and
> DFmode.  So this patch just uses the double or float type depending
> on the mode, rather than the type we actually got, which might be
> _Float64 or _Float32 etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-04-26  Jakub Jelinek  
>
> PR target/105367
> * config/i386/i386.cc (ix86_veclibabi_svml, ix86_veclibabi_acml): Pass
> el_mode == DFmode ? double_type_node : float_type_node instead of
> TREE_TYPE (type_in) as first arguments to mathfn_built_in.
>
> * gcc.target/i386/pr105367.c: New test.
>
> --- gcc/config/i386/i386.cc.jj  2022-04-22 13:36:42.558150777 +0200
> +++ gcc/config/i386/i386.cc 2022-04-25 12:30:09.862736906 +0200
> @@ -18807,7 +18807,8 @@ ix86_veclibabi_svml (combined_fn fn, tre
>        return NULL_TREE;
>      }
>
> -  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
> +  tree fndecl = mathfn_built_in (el_mode == DFmode
> +                                 ? double_type_node : float_type_node, fn);
>    bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>
>    if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
> @@ -18899,7 +18900,8 @@ ix86_veclibabi_acml (combined_fn fn, tre
>        return NULL_TREE;
>      }
>
> -  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
> +  tree fndecl = mathfn_built_in (el_mode == DFmode
> +                                 ? double_type_node : float_type_node, fn);
>    bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>    sprintf (name + 7, "%s", bname+10);
>
> --- gcc/testsuite/gcc.target/i386/pr105367.c.jj 2022-04-25 12:25:16.724809778 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105367.c 2022-04-25 12:24:34.004403339 +0200
> @@ -0,0 +1,12 @@
> +/* PR target/105367 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mveclibabi=acml" } */
> +
> +_Float64 g;
> +
> +void
> +foo (void)
> +{
> +  _Float64 f = __builtin_sin (g);
> +  g = __builtin_fmax (__builtin_sin (f), f);
> +}
>
> Jakub
>
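To make the mode-versus-type point concrete, here is a small standalone C model, not GCC code. The enums and the helpers mode_of, builtin_name and veclib_name are hypothetical, and the "__vrd2_"/"__vrs4_" prefixes are assumed purely for illustration. It mirrors the fixed behaviour, where the builtin is looked up by element mode, so double and _Float64 end up with the same library name. It also copies the trick from the `sprintf (name + 7, "%s", bname+10)` line: a 7-character prefix followed by the builtin name minus its 10-character "__builtin_" prefix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: distinct scalar types that share a machine mode.  */
enum elt_mode { SF_MODE, DF_MODE };
enum scalar_type { TYPE_FLOAT, TYPE_DOUBLE, TYPE_FLOAT32, TYPE_FLOAT64 };

/* float and _Float32 are SFmode; double and _Float64 are DFmode.  */
static enum elt_mode
mode_of (enum scalar_type t)
{
  return (t == TYPE_DOUBLE || t == TYPE_FLOAT64) ? DF_MODE : SF_MODE;
}

/* Stand-in for the fixed lookup: the builtin is picked from the mode
   (as if queried with double_type_node or float_type_node), so double
   and _Float64 both yield "__builtin_<fn>".  */
static void
builtin_name (enum elt_mode m, const char *fn, char *buf, size_t len)
{
  snprintf (buf, len, "__builtin_%s%s", fn, m == DF_MODE ? "" : "f");
}

/* Vector-library name: an assumed 7-character prefix followed by the
   builtin name with its "__builtin_" prefix (10 characters) skipped.  */
static void
veclib_name (enum scalar_type t, const char *fn, char *out, size_t len)
{
  char bname[64];
  enum elt_mode m = mode_of (t);

  builtin_name (m, fn, bname, sizeof bname);
  snprintf (out, len, "%s%s", m == DF_MODE ? "__vrd2_" : "__vrs4_",
            bname + strlen ("__builtin_"));
}
```

With this model, veclib_name (TYPE_FLOAT64, "sin", ...) and veclib_name (TYPE_DOUBLE, "sin", ...) both produce "__vrd2_sin", which is the property the patch restores.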


[committed] libgomp: Fix up two non-GOMP_USE_ALIGNED_WORK_SHARES related issues [PR105358]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

Last fall I changed struct gomp_work_share so that it doesn't have an
__attribute__((aligned (64))) lock member in the middle unless the target has
a non-emulated aligned allocator.  Otherwise it just makes sure the first and
second halves are 64 bytes apart for cache line reasons, but doesn't make
the struct itself 64-byte aligned, so we can use normal allocators for it.

When the struct isn't 64-byte aligned, the amount of tail padding decreases
significantly, to 0 or 4 bytes or so.  The library uses that tail padding for
the ordered_team_ids array (an array of uints) and/or for the memory for
lastprivate conditional temporaries (the latter wants to guarantee long long
alignment).
The problem on ia32 darwin9 is that while the struct contains long long
members, long long is only 4-byte aligned within structs, while
__alignof__ (long long) is 8.  That causes problems in gomp_init_work_share,
where we currently assume that if
offsetof (struct gomp_work_share, inline_ordered_team_ids) is long long
aligned, then the tail array will be aligned at runtime and so no extra
memory for dynamic realignment is needed (that is false when the whole
struct doesn't have long long alignment).  It also causes another problem in
the remaining hunks, where we compute INLINE_ORDERED_TEAM_IDS_OFF as the
above offsetof rounded up to a long long boundary and subtract
INLINE_ORDERED_TEAM_IDS_OFF from sizeof (struct gomp_work_share).
When unlucky, sizeof (struct gomp_work_share) isn't a multiple of 8 and
INLINE_ORDERED_TEAM_IDS_OFF is 4 bigger than it; as the subtraction is done
in size_t, we end up with (size_t) -4, so the comparison doesn't really work.

The fixes add additional conditions to make it work properly, but all of them
should be evaluated at compile time when optimizing and so shouldn't slow
anything.
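The size_t wraparound described above can be reproduced outside libgomp. This is a sketch with made-up numbers (a 60-byte struct with 4-byte alignment whose tail array starts at offset 60). round_up, needs_malloc_old and needs_malloc_new are hypothetical helpers that model the condition in GOMP_loop_start before and after the patch; they are not the libgomp code itself.

```c
#include <stddef.h>

/* Round X up to a multiple of ALIGN (a power of two), the way
   INLINE_ORDERED_TEAM_IDS_OFF rounds with __alignof__ (long long).  */
static size_t
round_up (size_t x, size_t align)
{
  return (x + align - 1) & ~(align - 1);
}

/* Pre-patch test; nonzero means "allocate SIZE bytes with malloc".
   If STRUCT_SIZE < OFF, STRUCT_SIZE - OFF wraps to a huge value and
   the test wrongly claims the inline tail padding is big enough.  */
static int
needs_malloc_old (size_t size, size_t struct_size, size_t off)
{
  return size > struct_size - off;
}

/* Post-patch test: also fall back to malloc when there is no room past
   OFF or when the struct is less aligned than long long.  */
static int
needs_malloc_new (size_t size, size_t struct_size, size_t struct_align,
                  size_t off, size_t ll_align)
{
  return struct_size <= off
         || struct_align < ll_align
         || size > struct_size - off;
}
```

With those numbers round_up (60, 8) is 64, so 60 - 64 in size_t is (size_t) -4: needs_malloc_old (16, 60, 64) returns 0 and would place 16 bytes in tail padding that doesn't exist, while needs_malloc_new (16, 60, 4, 64, 8) returns 1. All the extra operands are compile-time constants in the real code, which is why the added conditions should cost nothing when optimizing.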

Bootstrapped/regtested on x86_64-linux and i686-linux, and Iain said in the
PR that he has tested it on the affected targets; committed to trunk.

2022-04-26  Jakub Jelinek  

PR libgomp/105358
* work.c (gomp_init_work_share): Don't mask off adjustment for
dynamic long long realignment if struct gomp_work_share has smaller
alignof than long long.
* loop.c (GOMP_loop_start): Don't use inline_ordered_team_ids if
struct gomp_work_share has smaller alignof than long long or if
sizeof (struct gomp_work_share) is smaller than
INLINE_ORDERED_TEAM_IDS_OFF.
* loop_ull.c (GOMP_loop_ull_start): Likewise.
* sections.c (GOMP_sections2_start): Likewise.
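The work.c hunk can be modelled the same way. In this simplified sketch, ordered_bytes is a hypothetical helper (the real code then adds the result to a running size); it shows that the LL_ALIGN - 1 bytes of slack kept for runtime realignment are only rounded away when both the tail offset and the struct itself are long long aligned.

```c
#include <stddef.h>

/* Bytes reserved for an ordered_team_ids-style array of NTHREADS
   unsigned ints that has to start on a long long boundary.  */
static size_t
ordered_bytes (size_t nthreads, size_t tail_off, size_t struct_align,
               size_t ll_align)
{
  size_t o = nthreads * sizeof (unsigned int);
  /* Slack for realigning the start of the array at runtime.  */
  o += ll_align - 1;
  /* Alignment is guaranteed statically: the slack is not needed, so
     this just rounds O up to a multiple of LL_ALIGN.  */
  if ((tail_off & (ll_align - 1)) == 0 && struct_align >= ll_align)
    o &= ~(ll_align - 1);
  return o;
}
```

Without the struct_align check (the pre-patch behaviour), a 4-byte-aligned struct with an 8-aligned offsetof would lose the slack even though the array may start on a 4 (mod 8) address at runtime.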

--- libgomp/work.c.jj   2022-01-11 23:11:23.944268316 +0100
+++ libgomp/work.c  2022-04-25 13:42:24.885500128 +0200
@@ -113,7 +113,9 @@ gomp_init_work_share (struct gomp_work_s
  size_t o = nthreads * sizeof (*ws->ordered_team_ids);
  o += __alignof__ (long long) - 1;
  if ((offsetof (struct gomp_work_share, inline_ordered_team_ids)
-  & (__alignof__ (long long) - 1)) == 0)
+  & (__alignof__ (long long) - 1)) == 0
+ && __alignof__ (struct gomp_work_share)
+>= __alignof__ (long long))
o &= ~(__alignof__ (long long) - 1);
  ordered += o - 1;
}
--- libgomp/loop.c.jj   2022-01-11 23:11:23.890269075 +0100
+++ libgomp/loop.c  2022-04-25 13:39:24.266009817 +0200
@@ -270,8 +270,11 @@ GOMP_loop_start (long start, long end, l
 #define INLINE_ORDERED_TEAM_IDS_OFF \
   ((offsetof (struct gomp_work_share, inline_ordered_team_ids) \
 + __alignof__ (long long) - 1) & ~(__alignof__ (long long) - 1))
- if (size > (sizeof (struct gomp_work_share)
- - INLINE_ORDERED_TEAM_IDS_OFF))
+ if (sizeof (struct gomp_work_share)
+ <= INLINE_ORDERED_TEAM_IDS_OFF
+ || __alignof__ (struct gomp_work_share) < __alignof__ (long long)
+ || size > (sizeof (struct gomp_work_share)
+   - INLINE_ORDERED_TEAM_IDS_OFF))
*mem
  = (void *) (thr->ts.work_share->ordered_team_ids
  = gomp_malloc_cleared (size));
--- libgomp/loop_ull.c.jj   2022-01-11 23:11:23.890269075 +0100
+++ libgomp/loop_ull.c  2022-04-25 13:40:49.221829365 +0200
@@ -269,8 +269,11 @@ GOMP_loop_ull_start (bool up, gomp_ull s
 #define INLINE_ORDERED_TEAM_IDS_OFF \
   ((offsetof (struct gomp_work_share, inline_ordered_team_ids) \
 + __alignof__ (long long) - 1) & ~(__alignof__ (long long) - 1))
- if (size > (sizeof (struct gomp_work_share)
- - INLINE_ORDERED_TEAM_IDS_OFF))
+ if (sizeof (struct gomp_work_share)
+ <= INLINE_ORDERED_TEAM_IDS_OFF
+ || __alignof__ (struct gomp_work_share) < __alignof__ (long long)
+ || size > (sizeof (struct gomp_work_share)
+   - INLINE_ORDERED_TEAM_IDS_OFF))
*mem
  = (void *) (thr->ts.work_share->ordered_team_ids
  = gomp_malloc_cleared (size));
--- libgomp/sections.c.jj   2022-01-11 23:11

[PATCH] reassoc: Don't call fold_convert if !fold_convertible_p [PR105374]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, we ICE because maybe_fold_*_comparisons returns
an expression with V4SImode type and we try to fold_convert it to
V4BImode, which isn't allowed.

IMHO, regardless of whether we change maybe_fold_*_comparisons, we should
play it safe on the reassoc side and punt if we can't convert, just as
we punt for many other reasons.  This fixes the testcase on ARM.
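A rough C sketch of the check-then-convert pattern the patch adopts. struct vtype, convertible_p and count_usable are hypothetical, and convertible_p only assumes, for illustration, that a vector conversion needs equal total bit size, which is enough to reject V4SImode to V4BImode. The point is the `continue` that punts on an unconvertible result instead of assuming the conversion must succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector type: lanes and bits per lane
   (V4SImode ~ {4, 32}, V4BImode ~ {4, 1}).  */
struct vtype
{
  unsigned nunits;
  unsigned elt_bits;
};

/* Stand-in for fold_convertible_p on vector types; the equal-size rule
   is an assumption made for this sketch.  */
static bool
convertible_p (struct vtype to, struct vtype from)
{
  return to.nunits * to.elt_bits == from.nunits * from.elt_bits;
}

/* Keep only the folded results that can be converted to the operand
   type, skipping the rest; returns how many were kept.  */
static size_t
count_usable (struct vtype op_type, const struct vtype *folded, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!convertible_p (op_type, folded[i]))
        continue;  /* punt on this candidate, as the patched loop does */
      kept++;
    }
  return kept;
}
```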

Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
ok for trunk?

The testcase is not included, as I'm not exactly sure where it should go in
the gcc.target/arm/ testsuite and what directives it should have.  Christophe,
do you think you could handle that incrementally?

2022-04-26  Jakub Jelinek  

PR tree-optimization/105374
* tree-ssa-reassoc.cc (eliminate_redundant_comparison): Punt if
!fold_convertible_p rather than assuming fold_convert must succeed.

--- gcc/tree-ssa-reassoc.cc.jj  2022-04-14 13:46:59.690140053 +0200
+++ gcc/tree-ssa-reassoc.cc 2022-04-25 15:34:03.811473537 +0200
@@ -2254,7 +2254,11 @@ eliminate_redundant_comparison (enum tre
 BIT_AND_EXPR or BIT_IOR_EXPR was of a wider integer type,
 we need to convert.  */
   if (!useless_type_conversion_p (TREE_TYPE (curr->op), TREE_TYPE (t)))
-   t = fold_convert (TREE_TYPE (curr->op), t);
+   {
+ if (!fold_convertible_p (TREE_TYPE (curr->op), t))
+   continue;
+ t = fold_convert (TREE_TYPE (curr->op), t);
+   }
 
   if (TREE_CODE (t) != INTEGER_CST
  && !operand_equal_p (t, curr->op, 0))

Jakub



[PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase regressed on riscv due to the splitting of critical
edges in the sink pass: similarly to x86_64, compared to GCC 11 we now swap
the edges, i.e. which of the true and false edges goes to an empty forwarder bb.
From the GIMPLE POV those 2 forms are equivalent, but as can be seen here, for
some ifcvt opts it matters one way or another.

On this testcase, noce_try_store_flag_mask used to trigger and transformed
if (pseudo2) pseudo1 = 0;
into
pseudo1 &= -(pseudo2 == 0);
But with the swapped edges ifcvt actually sees
if (!pseudo2) pseudo3 = pseudo1; else pseudo3 = 0;
and noce_try_store_flag_mask punts.  IMHO there is no reason why it
should punt those, it is equivalent to
pseudo3 = pseudo1 & -(pseudo2 == 0);
and especially if the target has 3 operand AND, it shouldn't be any more
costly (and even with 2 operand AND, it might very well happen that RA
can make it happen without any extra moves).
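The equivalence can be checked in plain C. select_branchy is the testcase, select_swapped is the form ifcvt sees after the edge swap, and select_masked is the branch-free AND with -(c == 0); all three names are illustrative only.

```c
/* The testcase's shape: zero A when C is nonzero.  */
static long
select_branchy (long a, long c)
{
  if (c)
    a = 0;
  return a;
}

/* Same logic with the true/false edges swapped, as ifcvt now sees it:
   the result goes to a fresh variable instead of back into A.  */
static long
select_swapped (long a, long c)
{
  long r;
  if (!c)
    r = a;
  else
    r = 0;
  return r;
}

/* Branch-free form: -(c == 0) is all ones when C is zero and 0
   otherwise, so the AND either keeps A or clears it.  */
static long
select_masked (long a, long c)
{
  return a & -(long) (c == 0);
}
```

select_masked needs only a store-flag, a negation and one AND, and its result does not have to live in the same variable as A, which is the case the patch stops punting on.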

Initially I just removed the rtx_equal_p calls from the conditions
and didn't add anything there, but that broke aarch64 bootstrap and
regressed some testcases on x86_64, where if_info->a or if_info->b could be
some larger expression that we can't force into a register.
Furthermore, the case where both if_info->a and if_info->b are constants is
better handled by other ifcvt optimizations like noce_try_store_flag,
noce_try_inverse_constants or noce_try_store_flag_constants.
So I've restricted it to just a REG (perhaps a SUBREG of a REG might be ok
too) in addition to what was handled previously.

Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
and tested it on the testcase in a cross to riscv*-linux, ok for trunk?

2022-04-26  Jakub Jelinek  

PR rtl-optimization/105314
* ifcvt.cc (noce_try_store_flag_mask): Don't require that the non-zero
operand is equal to if_info->x, instead use the non-zero operand
as one of the operands of AND with if_info->x as target.

* gcc.target/riscv/pr105314.c: New test.

--- gcc/ifcvt.cc.jj 2022-03-15 09:11:55.312988179 +0100
+++ gcc/ifcvt.cc 2022-04-25 17:37:11.278924377 +0200
@@ -1678,10 +1678,10 @@ noce_try_store_flag_mask (struct noce_if
   reversep = 0;
 
   if ((if_info->a == const0_rtx
-   && rtx_equal_p (if_info->b, if_info->x))
+   && (REG_P (if_info->b) || rtx_equal_p (if_info->b, if_info->x)))
   || ((reversep = (noce_reversed_cond_code (if_info) != UNKNOWN))
  && if_info->b == const0_rtx
- && rtx_equal_p (if_info->a, if_info->x)))
+ && (REG_P (if_info->a) || rtx_equal_p (if_info->a, if_info->x))))
 {
   start_sequence ();
   target = noce_emit_store_flag (if_info,
@@ -1689,7 +1689,7 @@ noce_try_store_flag_mask (struct noce_if
 reversep, -1);
   if (target)
 target = expand_simple_binop (GET_MODE (if_info->x), AND,
- if_info->x,
+ reversep ? if_info->a : if_info->b,
  target, if_info->x, 0,
  OPTAB_WIDEN);
 
--- gcc/testsuite/gcc.target/riscv/pr105314.c.jj 2022-04-25 17:41:00.958736306 +0200
+++ gcc/testsuite/gcc.target/riscv/pr105314.c 2022-04-25 17:40:46.237940642 +0200
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/105314 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tbeq\t" } } */
+
+long
+foo (long a, long b, long c)
+{
+  if (c)
+    a = 0;
+  return a;
+}

Jakub



Re: [PATCH] reassoc: Don't call fold_convert if !fold_convertible_p [PR105374]

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, we ICE because maybe_fold_*_comparisons returns
> an expression with V4SImode type and we try to fold_convert it to
> V4BImode, which isn't allowed.
> 
> IMHO no matter whether we change maybe_fold_*_comparisons we should
> play safe on the reassoc side and punt if we can't convert like
> we punt for many other reasons.  This fixes the testcase on ARM.
> 
> Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
> ok for trunk?

OK.

Richard.

> Testcase not included, not exactly sure where and what directives it
> should have in gcc.target/arm/ testsuite.  Christophe, do you think you
> could handle that incrementally?
> 
> 2022-04-26  Jakub Jelinek  
> 
>   PR tree-optimization/105374
>   * tree-ssa-reassoc.cc (eliminate_redundant_comparison): Punt if
>   !fold_convertible_p rather than assuming fold_convert must succeed.
> 
> --- gcc/tree-ssa-reassoc.cc.jj2022-04-14 13:46:59.690140053 +0200
> +++ gcc/tree-ssa-reassoc.cc   2022-04-25 15:34:03.811473537 +0200
> @@ -2254,7 +2254,11 @@ eliminate_redundant_comparison (enum tre
>BIT_AND_EXPR or BIT_IOR_EXPR was of a wider integer type,
>we need to convert.  */
>if (!useless_type_conversion_p (TREE_TYPE (curr->op), TREE_TYPE (t)))
> - t = fold_convert (TREE_TYPE (curr->op), t);
> + {
> +   if (!fold_convertible_p (TREE_TYPE (curr->op), t))
> + continue;
> +   t = fold_convert (TREE_TYPE (curr->op), t);
> + }
>  
>if (TREE_CODE (t) != INTEGER_CST
> && !operand_equal_p (t, curr->op, 0))
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] i386: Improve ix86_expand_int_movcc

2022-04-26 Thread Uros Bizjak via Gcc-patches
On Tue, Apr 26, 2022 at 8:44 AM Jakub Jelinek  wrote:
>
> Hi!
>
> When working on PR105338, I've noticed that in some cases we emit an
> unnecessarily long sequence, which then has a higher seq_cost than necessary.
>
> E.g. when ix86_expand_int_movcc is called with
> operands[0] (reg/v:SI 83 [ i ])
> operands[1] (eq (reg/v:SI 83 [ i ]) (const_int 0 [0]))
> operands[2] (reg/v:SI 83 [ i ])
> operands[3] (const_int -2 [0xfffffffffffffffe])
> i.e. r83 = r83 == 0 ? r83 : -2 which with my PR105338 patch is equivalent to
> r83 = r83 == 0 ? 0 : -2, we emit:
> (insn 24 0 25 (set (reg:CC 17 flags)
> (compare:CC (reg/v:SI 83 [ i ])
> (const_int 1 [0x1]))) 11 {*cmpsi_1}
>  (nil))
> (insn 25 24 26 (parallel [
> (set (reg:SI 85)
> (if_then_else:SI (ltu:SI (reg:CC 17 flags)
> (const_int 0 [0]))
> (const_int -1 [0xffffffffffffffff])
> (const_int 0 [0])))
> (clobber (reg:CC 17 flags))
> ]) 1192 {*x86_movsicc_0_m1}
>  (nil))
> (insn 26 25 27 (set (reg:SI 85)
> (not:SI (reg:SI 85))) 683 {*one_cmplsi2_1}
>  (nil))
> (insn 27 26 28 (parallel [
> (set (reg:SI 85)
> (and:SI (reg:SI 85)
> (const_int -2 [0xfffffffffffffffe])))
> (clobber (reg:CC 17 flags))
> ]) 533 {*andsi_1}
>  (nil))
> (insn 28 27 0 (set (reg/v:SI 83 [ i ])
> (reg:SI 85)) 81 {*movsi_internal}
>  (nil))
> which has seq_cost (seq, true) 24.  But it could cost just 20
> if we didn't use a fresh temporary r85 and used r83 instead;
> then we could avoid the copy at the end.
> The reason is the 2 reg_overlap_mentioned_p calls:
> the destination (out) indeed overlaps op0 (it is the same register),
> but I don't see why that is a problem.  This is in a code path where
> we've already called
> ix86_expand_carry_flag_compare (code, op0, op1, &compare_op)
> earlier, so the fact that out overlaps op0 or op1 shouldn't matter,
> because insn 24 above has already been emitted.  We should only care
> whether it overlaps whatever we got from that ix86_expand_carry_flag_compare
> call, i.e. compare_op; otherwise we can overwrite out just fine.
> We also know at that point that the last 2 operands of ?: are constants.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for GCC 13?
>
> 2022-04-26  Jakub Jelinek  
>
> * config/i386/i386-expand.cc (ix86_expand_int_movcc): Create a
> temporary only if out overlaps compare_op, not when it overlaps
> op0 or op1.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj   2022-04-22 14:18:27.0 +0200
> +++ gcc/config/i386/i386-expand.cc  2022-04-22 15:13:47.263829089 +0200
> @@ -3224,8 +3224,7 @@ ix86_expand_int_movcc (rtx operands[])
> }
>   diff = ct - cf;
>
> - if (reg_overlap_mentioned_p (out, op0)
> - || reg_overlap_mentioned_p (out, op1))
> + if (reg_overlap_mentioned_p (out, compare_op))
> tmp = gen_reg_rtx (mode);
>
>   if (mode == DImode)
>
> Jakub
>
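A C model of insns 24 to 27 shows why the overlap with op0 is harmless. movcc_model and movcc_reuse are illustrative names, not GCC functions: once the compare has produced the carry, nothing reads the original value again, so the result can be built directly in the same variable.

```c
#include <stdint.h>

/* Step-by-step model of insns 24-27 computing x == 0 ? 0 : -2.  */
static int32_t
movcc_model (int32_t x)
{
  /* insn 24: compare x with 1; the carry is set when (unsigned) x < 1,
     i.e. when x == 0.  */
  int carry = (uint32_t) x < 1u;
  /* insn 25: materialise the carry as -1 or 0 (sbb-style).  */
  int32_t t = carry ? -1 : 0;
  /* insn 26: one's complement.  */
  t = ~t;
  /* insn 27: AND with -2.  */
  t &= -2;
  return t;
}

/* Same sequence reusing X as the destination: after the carry has been
   computed, X is dead, so overwriting it needs no temporary or copy.  */
static int32_t
movcc_reuse (int32_t x)
{
  int carry = (uint32_t) x < 1u;
  x = carry ? -1 : 0;
  x = ~x;
  x &= -2;
  return x;
}
```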


[committed] testsuite: Fix up g++.target/i386/vec-tmpl1.C testcase [PR65211]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

This test fails on i686-linux:
Excess errors:
.../gcc/testsuite/g++.target/i386/vec-tmpl1.C:13:27: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]

Fixed thusly, tested on x86_64-linux with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64,-m32/-mno-sse\} i386.exp=vec-tmpl*'
and committed to trunk as obvious.

2022-04-26  Jakub Jelinek  

PR c++/65211
* g++.target/i386/vec-tmpl1.C: Add -Wno-psabi as
dg-additional-options.

--- gcc/testsuite/g++.target/i386/vec-tmpl1.C.jj 2022-04-14 13:46:59.621141017 +0200
+++ gcc/testsuite/g++.target/i386/vec-tmpl1.C 2022-04-26 09:51:10.336944403 +0200
@@ -1,4 +1,5 @@
 // PR c++/65211
+// { dg-additional-options "-Wno-psabi" }
 // { dg-final { scan-assembler-not "movdqa" } }
 
 typedef unsigned v4ui __attribute__ ((vector_size(16), aligned (16)));

Jakub



Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase regressed on riscv due to the splitting of critical
> edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> the edges, whether true or false edge goes to an empty forwarded bb.
> From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> some ifcvt opts it matters one way or another.

Do we have evidence that one or the other form is "better"?  There's
the possibility to change CFG cleanup to process the edges in a
defined {true,false} or {false,true} order rather than edge number order.
Likewise we could order fallthru before EH or abnormal edges here.
It would be a bit intrusive (and come at a cost) since currently we
just iterate over all BBs, seeing if they are forwarders while ordering
would require a different iteration scheme.  But doing all this might
also make the CFG cleanup result more stable with respect to IL
representation changes.

> On this testcase, noce_try_store_flag_mask used to trigger and transformed
> if (pseudo2) pseudo1 = 0;
> into
> pseudo1 &= -(pseudo2 == 0);
> But with the swapped edges ifcvt actually sees
> if (!pseudo2) pseudo3 = pseudo1; else pseudo3 = 0;
> and noce_try_store_flag_mask punts.  IMHO there is no reason why it
> should punt those, it is equivalent to
> pseudo3 = pseudo1 & -(pseudo2 == 0);
> and especially if the target has 3 operand AND, it shouldn't be any more
> costly (and even with 2 operand AND, it might very well happen that RA
> can make it happen without any extra moves).
> 
> Initially I've just removed the rtx_equal_p calls from the conditions
> and didn't add anything there, but that broke aarch64 bootstrap and
> regressed some testcases on x86_64, where if_info->a or if_info->b could be
> some larger expression that we can't force into a register.
> Furthermore, the case where both if_info->a and if_info->b are constants is
> better handled by other ifcvt optimizations like noce_try_store_flag
> or noce_try_inverse_constants or noce_try_store_flag_constants.
> So, I've restricted it to just a REG (perhaps SUBREG of REG might be ok too)
> next to what has been handled previously.
> 
> Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
> and tested it on the testcase in a cross to riscv*-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-04-26  Jakub Jelinek  
> 
>   PR rtl-optimization/105314
>   * ifcvt.cc (noce_try_store_flag_mask): Don't require that the non-zero
>   operand is equal to if_info->x, instead use the non-zero operand
>   as one of the operands of AND with if_info->x as target.
> 
>   * gcc.target/riscv/pr105314.c: New test.
> 
> --- gcc/ifcvt.cc.jj   2022-03-15 09:11:55.312988179 +0100
> +++ gcc/ifcvt.cc  2022-04-25 17:37:11.278924377 +0200
> @@ -1678,10 +1678,10 @@ noce_try_store_flag_mask (struct noce_if
>reversep = 0;
>  
>if ((if_info->a == const0_rtx
> -   && rtx_equal_p (if_info->b, if_info->x))
> +   && (REG_P (if_info->b) || rtx_equal_p (if_info->b, if_info->x)))
>|| ((reversep = (noce_reversed_cond_code (if_info) != UNKNOWN))
> && if_info->b == const0_rtx
> -   && rtx_equal_p (if_info->a, if_info->x)))
> +   && (REG_P (if_info->a) || rtx_equal_p (if_info->a, if_info->x))))
>  {
>start_sequence ();
>target = noce_emit_store_flag (if_info,
> @@ -1689,7 +1689,7 @@ noce_try_store_flag_mask (struct noce_if
>reversep, -1);
>if (target)
>  target = expand_simple_binop (GET_MODE (if_info->x), AND,
> -   if_info->x,
> +   reversep ? if_info->a : if_info->b,
> target, if_info->x, 0,
> OPTAB_WIDEN);
>  
> --- gcc/testsuite/gcc.target/riscv/pr105314.c.jj 2022-04-25 17:41:00.958736306 +0200
> +++ gcc/testsuite/gcc.target/riscv/pr105314.c 2022-04-25 17:40:46.237940642 +0200
> @@ -0,0 +1,12 @@
> +/* PR rtl-optimization/105314 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tbeq\t" } } */
> +
> +long
> +foo (long a, long b, long c)
> +{
> +  if (c)
> +    a = 0;
> +  return a;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] reassoc: Don't call fold_convert if !fold_convertible_p [PR105374]

2022-04-26 Thread Christophe Lyon via Gcc-patches




On 4/26/22 09:24, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, we ICE because maybe_fold_*_comparisons returns
an expression with V4SImode type and we try to fold_convert it to
V4BImode, which isn't allowed.

IMHO no matter whether we change maybe_fold_*_comparisons we should
play safe on the reassoc side and punt if we can't convert like
we punt for many other reasons.  This fixes the testcase on ARM.

Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
ok for trunk?

Testcase not included, not exactly sure where and what directives it
should have in gcc.target/arm/ testsuite.  Christophe, do you think you
could handle that incrementally?


Yes, sure, I'll take a look.
Thanks for the fix!



2022-04-26  Jakub Jelinek  

PR tree-optimization/105374
* tree-ssa-reassoc.cc (eliminate_redundant_comparison): Punt if
!fold_convertible_p rather than assuming fold_convert must succeed.

--- gcc/tree-ssa-reassoc.cc.jj  2022-04-14 13:46:59.690140053 +0200
+++ gcc/tree-ssa-reassoc.cc 2022-04-25 15:34:03.811473537 +0200
@@ -2254,7 +2254,11 @@ eliminate_redundant_comparison (enum tre
 BIT_AND_EXPR or BIT_IOR_EXPR was of a wider integer type,
 we need to convert.  */
if (!useless_type_conversion_p (TREE_TYPE (curr->op), TREE_TYPE (t)))
-   t = fold_convert (TREE_TYPE (curr->op), t);
+   {
+ if (!fold_convertible_p (TREE_TYPE (curr->op), t))
+   continue;
+ t = fold_convert (TREE_TYPE (curr->op), t);
+   }
  
 if (TREE_CODE (t) != INTEGER_CST
  && !operand_equal_p (t, curr->op, 0))

Jakub



Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 26, 2022 at 09:58:35AM +0200, Richard Biener wrote:
> > The following testcase regressed on riscv due to the splitting of critical
> > edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> > the edges, whether true or false edge goes to an empty forwarded bb.
> > From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> > some ifcvt opts it matters one way or another.
> 
> Do we have evidence that one or the other form is "better"?  There's
> the possibility to change CFG cleanup to process the edges in a
> defined {true,false} or {false,true} order rather than edge number order.
> Likewise we could order fallthru before EH or abnormal edges here.
> It would be a bit intrusive (and come at a cost) since currently we
> just iterate over all BBs, seeing if they are forwarders while ordering
> would require a different iteration scheme.  But doing all this might
> also make the CFG cleanup result more stable with respect to IL
> representation changes.

I have the feeling that sometimes one order and sometimes the other is
better: sometimes one order means we reuse a pseudo, sometimes the other,
and in some cases reusing a pseudo disables an ifcvt optimization, while in
others it is the only case where it works.
I think it is best to make ifcvt work with either order, but I'm not sure
that will always be possible.  E.g. sometimes the reused pseudo means we need
to allocate an extra temporary, and while, say, RA can generate the same code
from it, during ifcvt the sequence might now have a higher cost.

Also, we have the case of cond_move_process_if_block which doesn't bother
checking any costs, so it can force cases where we otherwise carefully punt
because of costs.  I'm afraid to touch that this late and not really sure
I'll have time for it during stage1 either.

Jakub



Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:

> On Tue, Apr 26, 2022 at 09:58:35AM +0200, Richard Biener wrote:
> > > The following testcase regressed on riscv due to the splitting of critical
> > > edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> > > the edges, whether true or false edge goes to an empty forwarded bb.
> > > From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> > > some ifcvt opts it matters one way or another.
> > 
> > Do we have evidence that one or the other form is "better"?  There's
> > the possibility to change CFG cleanup to process the edges in a
> > defined {true,false} or {false,true} order rather than edge number order.
> > Likewise we could order fallthru before EH or abnormal edges here.
> > It would be a bit intrusive (and come at a cost) since currently we
> > just iterate over all BBs, seeing if they are forwarders while ordering
> > would require a different iteration scheme.  But doing all this might
> > also make the CFG cleanup result more stable with respect to IL
> > representation changes.
> 
> I have the feeling that sometimes one order and sometimes another order is
> better, sometimes one order means we reuse a pseudo, sometimes the other,
> and in some cases reusing a pseudo disables ifcvt optimization, in another
> is the only case where it works.
> I think it is best to make ifcvt work with any orders, but not sure it will
> be always possible.  E.g. sometimes the reused pseudo means we need to
> allocate an extra temporary, and while say RA can generate the same code
> from it, during ifcvt the sequence now might have higher cost.

Agreed.

> Also, we have the case of cond_move_process_if_block which doesn't bother
> checking any costs, so it can force cases where we otherwise carefully punt
> because of costs.  I'm afraid to touch that this late and not really sure
> I'll have time for it during stage1 either.

OK, I'm keeping "sanitize CFG cleanup" on my TODO list for stage1, since I
want to look at the critical edge split / forwarder removal anyway for
other reasons.

Richard.


New Swedish PO file for 'gcc' (version 12.1-b20220403)

2022-04-26 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-12.1-b20220403.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH] lto: use diagnostics_context in print_lto_docs_link

2022-04-26 Thread Martin Liška
Properly parse OPT_fdiagnostics_urls_ (and OPT_fdiagnostics_color_) and
initialize both URLs and colors for global_dc.  This way we follow the
--with-documentation-root-url configure option and respect -fdiagnostics-urls.
We also print colored warning and note messages.
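For readers unfamiliar with how such links reach the terminal: as far as I know, GCC's pretty-printer emits OSC 8 hyperlinks terminated either by ST (ESC \) or by BEL, depending on the URL format. The sketch below is a standalone illustration of the "only emit the link when URLs are enabled" logic; the enum and format_docs_note are hypothetical and are not GCC's API.

```c
#include <stdio.h>

/* Hypothetical URL-format setting: no links, or an OSC 8 hyperlink
   terminated by ST (ESC \) or by BEL.  */
enum url_format { URL_NONE, URL_ST, URL_BEL };

/* Write the "see the ... documentation" note into OUT and return what
   snprintf returns.  The link text is wrapped in an OSC 8 hyperlink only
   when URLs are enabled and a URL is available.  */
static int
format_docs_note (enum url_format fmt, const char *url, char *out, size_t len)
{
  if (fmt == URL_NONE || url == NULL)
    return snprintf (out, len, "see the '-flto' option documentation"
                     " for more information");

  const char *st = fmt == URL_BEL ? "\a" : "\x1b\\";
  return snprintf (out, len,
                   "see the \x1b]8;;%s%s'-flto' option documentation"
                   "\x1b]8;;%s for more information", url, st, st);
}
```

Keeping the plain-text branch separate means a terminal without hyperlink support never sees stray escape sequences, which is the same reason the patch guards pp_begin_url/pp_end_url with print_url.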

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR lto/105364

gcc/ChangeLog:

* lto-wrapper.cc (print_lto_docs_link): Use global_dc.
(run_gcc): Parse OPT_fdiagnostics_urls_.
(main): Initialize global_dc.
---
 gcc/lto-wrapper.cc | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 6027fd9efdd..285e6e96af5 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1364,14 +1364,17 @@ jobserver_active_p (void)
 void
 print_lto_docs_link ()
 {
-  const char *url = get_option_url (NULL, OPT_flto);
+  bool print_url = global_dc->printer->url_format != URL_FORMAT_NONE;
+  const char *url = global_dc->get_option_url (global_dc, OPT_flto);
 
   pretty_printer pp;
   pp.url_format = URL_FORMAT_DEFAULT;
   pp_string (&pp, "see the ");
-  pp_begin_url (&pp, url);
+  if (print_url)
+    pp_begin_url (&pp, url);
   pp_string (&pp, "%<-flto%> option documentation");
-  pp_end_url (&pp);
+  if (print_url)
+    pp_end_url (&pp);
   pp_string (&pp, " for more information");
   inform (UNKNOWN_LOCATION, pp_formatted_text (&pp));
 }
@@ -1573,6 +1576,14 @@ run_gcc (unsigned argc, char *argv[])
  incoming_dumppfx = dumppfx = option->arg;
  break;
 
+   case OPT_fdiagnostics_urls_:
+ diagnostic_urls_init (global_dc, option->value);
+ break;
+
+   case OPT_fdiagnostics_color_:
+ diagnostic_color_init (global_dc, option->value);
+ break;
+
default:
  break;
}
@@ -2130,6 +2141,9 @@ main (int argc, char *argv[])
   gcc_init_libintl ();
 
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
+  diagnostic_urls_init (global_dc);
+  global_dc->get_option_url = get_option_url;
 
   if (atexit (lto_wrapper_cleanup) != 0)
 fatal_error (input_location, "% failed");
-- 
2.36.0



[PATCH] libstdc++: Gate constexpr string and vector on constexpr destructor support

2022-04-26 Thread Jonathan Wakely via Gcc-patches
I was going to push this patch to "fix" our C++20 constexpr string and
vector so they depend on __cpp_constexpr_dynamic_alloc. But then I
realised that Clang supports that since 10.0.0 and I don't think we need
to bother supporting anything older than that for C++20 mode (Clang 9
users can still use C++17 mode).

So I'm just posting this here for the archives, but I don't plan to
push it. We could probably remove the __cpp_constexpr_dynamic_alloc
checks in std::allocator and std::unique_ptr now too.

-- >8 --

Declaring a destructor as constexpr is only supported when the
__cpp_constexpr_dynamic_alloc macro is defined. Introduce a new macro,
_GLIBCXX_CONSTEXPR_DTOR, which can be used on destructors. Adjust the
relevant feature test macros to also depend on
__cpp_constexpr_dynamic_alloc.
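
The gating idea can be sketched outside libstdc++. This is a minimal, hypothetical example: `DEMO_CONSTEXPR_DTOR` and `Counter` are illustrative names, not part of the patch. The only assumption is that the compiler defines `__cpp_constexpr_dynamic_alloc` when it accepts constexpr destructors.

```cpp
#include <cassert>

// DEMO_CONSTEXPR_DTOR is hypothetical: a project-local stand-in for the
// proposed _GLIBCXX_CONSTEXPR_DTOR.  It expands to `constexpr` only when the
// compiler advertises constexpr destructors via __cpp_constexpr_dynamic_alloc.
#ifdef __cpp_constexpr_dynamic_alloc
# define DEMO_CONSTEXPR_DTOR constexpr
#else
# define DEMO_CONSTEXPR_DTOR
#endif

struct Counter
{
  int value;

  constexpr explicit Counter(int v) : value(v) { }

  // User-provided (non-trivial) destructor; constexpr only when supported.
  DEMO_CONSTEXPR_DTOR ~Counter() { }
};

#ifdef __cpp_constexpr_dynamic_alloc
// With a constexpr destructor Counter is a literal type, so it can be
// created and destroyed during constant evaluation.
constexpr int
make_and_destroy()
{
  Counter c(42);
  return c.value;
}
static_assert(make_and_destroy() == 42, "constexpr destructor is usable");
#endif
```

In pre-C++20 modes the macro expands to nothing, so the same header still compiles. Only the constant-evaluation use is compiled out.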

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_CONSTEXPR_DTOR): Define new
macro.
(_GLIBCXX23_CONSTEXPR_DTOR): Likewise.
* include/bits/allocator.h (allocator::~allocator()): Use new
macro.
* include/bits/basic_string.h (__cpp_lib_constexpr_string):
Depend on __cpp_constexpr_dynamic_alloc.
(basic_string::~basic_string()): Use new macro.
* include/bits/stl_vector.h (__cpp_lib_constexpr_vector):
Depend on __cpp_constexpr_dynamic_alloc.
(_Vector_base::~_Vector_base(), vector::~vector()): Use new
macro.
* include/bits/unique_ptr.h (unique_ptr, unique_ptr):
Likewise.
* include/std/version (__cpp_lib_constexpr_string)
(__cpp_lib_constexpr_vector): Depend on
__cpp_constexpr_dynamic_alloc.
---
 libstdc++-v3/include/bits/allocator.h|  4 +---
 libstdc++-v3/include/bits/basic_string.h | 11 ---
 libstdc++-v3/include/bits/c++config  | 12 
 libstdc++-v3/include/bits/stl_vector.h   | 14 --
 libstdc++-v3/include/bits/unique_ptr.h   |  8 ++--
 libstdc++-v3/include/std/version |  6 --
 6 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h
index f7770165273..d94d93546c9 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -168,9 +168,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_GLIBCXX20_CONSTEXPR
allocator(const allocator<_Tp1>&) _GLIBCXX_NOTHROW { }
 
-#if __cpp_constexpr_dynamic_alloc
-  constexpr
-#endif
+  _GLIBCXX_CONSTEXPR_DTOR
   ~allocator() _GLIBCXX_NOTHROW { }
 
 #if __cplusplus > 201703L
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index c3fbc53953c..e5ec7ea794e 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -56,9 +56,14 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
-#ifdef __cpp_lib_is_constant_evaluated
+#if __cpp_lib_is_constant_evaluated
+# if __cpp_constexpr_dynamic_alloc
 // Support P0980R1 in C++20.
-# define __cpp_lib_constexpr_string 201907L
+#  define __cpp_lib_constexpr_string 201907L
+# else
+// Support P1032R1 in C++20.
+#  define __cpp_lib_constexpr_string 201811L
+# endif
 #elif __cplusplus >= 201703L && _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED
 // Support P0426R1 changes to char_traits in C++17.
 # define __cpp_lib_constexpr_string 201611L
@@ -790,7 +795,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   /**
*  @brief  Destroy the string instance.
*/
-  _GLIBCXX20_CONSTEXPR
+  _GLIBCXX_CONSTEXPR_DTOR
   ~basic_string()
   { _M_dispose(); }
 
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 2798b9786dc..6b575505932 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -190,6 +190,18 @@
 # endif
 #endif
 
+#ifdef __cpp_constexpr_dynamic_alloc
+# define _GLIBCXX_CONSTEXPR_DTOR constexpr
+#else
+# define _GLIBCXX_CONSTEXPR_DTOR
+#endif
+
+# if __cplusplus >= 202100L
+# define _GLIBCXX23_CONSTEXPR_DTOR _GLIBCXX_CONSTEXPR_DTOR
+#else
+# define _GLIBCXX23_CONSTEXPR_DTOR
+#endif
+
 #ifndef _GLIBCXX17_INLINE
 # if __cplusplus >= 201703L
 #  define _GLIBCXX17_INLINE inline
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index b4ff3989a5d..a93018f8700 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -64,7 +64,9 @@
 #endif
 #if __cplusplus >= 202002L
 # include 
-#define __cpp_lib_constexpr_vector 201907L
+# ifdef __cpp_constexpr_dynamic_alloc
+#  define __cpp_lib_constexpr_vector 201907L
+# endif
 #endif
 
 #include 
@@ -229,7 +231,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_S_on_dealloc(_M_impl);
  }
 
- _GLIBCXX20_CONSTEXPR
+ _GLIBCXX_CONSTEXPR_DTOR
  ~_Reinit()
  {
// Mark unused capacity as invalid after re

GCN: Make "gang-private data-share memory exhausted" error more verbose (was: [PATCH] [og10] OpenACC: Shared memory layout optimisation)

2022-04-26 Thread Thomas Schwinge
Hi!

On 2020-06-29T13:16:52-0700, Julian Brown  wrote:
> This patch implements an algorithm to lay out local data-share (LDS) space.  
> It currently works for AMD GCN.  At the moment, LDS is used for three things:
>
>   1. Gang-private variables
>   2. Reduction temporaries (accumulators)
>   3. Broadcasting for worker partitioning
>
> After the patch is applied, (2) and (3) are placed at preallocated
> locations in LDS, and (1) continues to be handled by the backend (as it
> is at present prior to this patch being applied). LDS now looks like this:
>
>   +--------------+ (gang local size + 1024, = 1536)
>   | free space   |
>   |     ...      |
>   | - - - - - - -|
>   | worker bcast |
>   +--------------+
>   | reductions   |
>   +--------------+ <<< -mgang-local-size= (def. 512)
>   | gang private |
>   |     vars     |
>   +--------------+ (32)
>   | low LDS vars |
>   +--------------+ LDS base
>
> So, gang-private space is fixed at a constant amount at compile time
> (which can be increased with a command-line switch if necessary
> for some given code). [...]

> --- a/gcc/config/gcn/gcn.c
> +++ b/gcc/config/gcn/gcn.c

> @@ -5240,14 +5286,14 @@ gcn_print_lds_decl (FILE *f, tree var)
>if (size > align && size > 4 && align < 8)
>   align = 8;
>
> -  machfun->lds_allocated = ((machfun->lds_allocated + align - 1)
> - & ~(align - 1));
> +  gangprivate_hwm = ((gangprivate_hwm + align - 1) & ~(align - 1));
>
> -  machfun->lds_allocs->put (var, machfun->lds_allocated);
> -  fprintf (f, "%u", machfun->lds_allocated);
> -  machfun->lds_allocated += size;
> -  if (machfun->lds_allocated > LDS_SIZE)
> - error ("local data-share memory exhausted");
> +  lds_allocs.put (var, gangprivate_hwm);
> +  fprintf (f, "%u", gangprivate_hwm);
> +  gangprivate_hwm += size;
> +  if (gangprivate_hwm > gang_local_size_opt)
> + error ("gang-private data-share memory exhausted (increase with "
> +"-mgang-local-size=)");
>  }
>  }

In a new case (to be discussed later), we're running into this error.
OK to push to master branch the attached
'GCN: Make "gang-private data-share memory exhausted" error more verbose'?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

From 3f57f1d975dcb859a8203bebadb2b2bfbfba24b9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 26 Apr 2022 13:05:19 +0200
Subject: [PATCH] GCN: Make "gang-private data-share memory exhausted" error
 more verbose
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[...]: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)

	gcc/
	* config/gcn/gcn.cc (gcn_print_lds_decl): Make "gang-private
	data-share memory exhausted" error more verbose.
---
 gcc/config/gcn/gcn.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 90cc8edc5b4..19e9f424efc 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5588,8 +5588,9 @@ gcn_print_lds_decl (FILE *f, tree var)
   fprintf (f, "%u", gang_private_hwm);
   gang_private_hwm += size;
   if (gang_private_hwm > gang_private_size_opt)
-	error ("gang-private data-share memory exhausted (increase with "
-	   "%<-mgang-private-size=%>)");
+	error ("%d bytes of gang-private data-share memory exhausted"
+	   " (increase with %<-mgang-private-size=%d%>, for example)",
+	   gang_private_size_opt, gang_private_hwm);
 }
 }
 
-- 
2.25.1
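
The gang-private bookkeeping that this error reports on can be modelled in isolation. The following is a minimal sketch, not the GCN backend code: `round_up_pow2` and `alloc_gang_private` are hypothetical names, and `limit` plays the role of `gang_private_size_opt`. It mirrors the quoted logic: bump the alignment of larger objects to 8, round the high-water mark up, allocate, and report both the limit and the size that would have been needed.

```cpp
#include <cassert>
#include <cstdio>

// Round OFFSET up to the next multiple of ALIGN; ALIGN must be a power of two.
static unsigned
round_up_pow2 (unsigned offset, unsigned align)
{
  return (offset + align - 1) & ~(align - 1);
}

// Hypothetical stand-alone model of the gang-private allocation: HWM is the
// high-water mark and LIMIT the configured gang-private size.  Returns false
// (after printing a message in the style of the new error) if it overflows.
static bool
alloc_gang_private (unsigned &hwm, unsigned size, unsigned align,
                    unsigned limit, unsigned &offset)
{
  // Same alignment bump as the quoted code: larger objects get 8 bytes.
  if (size > align && size > 4 && align < 8)
    align = 8;
  hwm = round_up_pow2 (hwm, align);
  offset = hwm;
  hwm += size;
  if (hwm > limit)
    {
      std::fprintf (stderr,
                    "%u bytes of gang-private data-share memory exhausted"
                    " (increase with -mgang-private-size=%u, for example)\n",
                    limit, hwm);
      return false;
    }
  return true;
}
```

Reporting the current high-water mark as the suggested value gives the user a size that would fit this particular allocation. Later declarations may still need more.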



Re: [PATCH] lto: use diagnostics_context in print_lto_docs_link

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, Apr 26, 2022 at 12:41 PM Martin Liška  wrote:
>
> Properly parse OPT_fdiagnostics_urls_ and then initialize both urls
> and colors for global_dc. Doing that we would follow the configure
> option --with-documentation-root-url, -fdiagnostics-urls is respected.
> Plus we'll print colored warning and note messages.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> PR lto/105364
>
> gcc/ChangeLog:
>
> * lto-wrapper.cc (print_lto_docs_link): Use global_dc.
> (run_gcc): Parse OPT_fdiagnostics_urls_.
> (main): Initialize global_dc.
> ---
>  gcc/lto-wrapper.cc | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index 6027fd9efdd..285e6e96af5 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1364,14 +1364,17 @@ jobserver_active_p (void)
>  void
>  print_lto_docs_link ()
>  {
> -  const char *url = get_option_url (NULL, OPT_flto);
> +  bool print_url = global_dc->printer->url_format != URL_FORMAT_NONE;
> +  const char *url = global_dc->get_option_url (global_dc, OPT_flto);
>
>pretty_printer pp;
>pp.url_format = URL_FORMAT_DEFAULT;
>pp_string (&pp, "see the ");
> -  pp_begin_url (&pp, url);
> +  if (print_url)
> +    pp_begin_url (&pp, url);
>pp_string (&pp, "%<-flto%> option documentation");
> -  pp_end_url (&pp);
> +  if (print_url)
> +    pp_end_url (&pp);
>pp_string (&pp, " for more information");
>inform (UNKNOWN_LOCATION, pp_formatted_text (&pp));
>  }
> @@ -1573,6 +1576,14 @@ run_gcc (unsigned argc, char *argv[])
>   incoming_dumppfx = dumppfx = option->arg;
>   break;
>
> +   case OPT_fdiagnostics_urls_:
> + diagnostic_urls_init (global_dc, option->value);
> + break;
> +
> +   case OPT_fdiagnostics_color_:
> + diagnostic_color_init (global_dc, option->value);
> + break;
> +
> default:
>   break;
> }
> @@ -2130,6 +2141,9 @@ main (int argc, char *argv[])
>gcc_init_libintl ();
>
>diagnostic_initialize (global_dc, 0);
> +  diagnostic_color_init (global_dc);
> +  diagnostic_urls_init (global_dc);
> +  global_dc->get_option_url = get_option_url;
>
>if (atexit (lto_wrapper_cleanup) != 0)
>  fatal_error (input_location, "% failed");
> --
> 2.36.0
>
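
The effect of the `print_url` check can be illustrated with a small model. This is a minimal sketch under stated assumptions: `DemoUrlFormat` and `demo_docs_link` are hypothetical, `https://example.invalid/flto` is a placeholder URL, and the link is written as an OSC 8 terminal hyperlink terminated by either ST (`ESC \`) or BEL. When URLs are disabled, only the plain note is produced. The `url == nullptr` fallback is an extra defensive check added for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical mirror of the URL formats a pretty-printer may use.
enum class DemoUrlFormat { None, St, Bel };

// Format the "see the ... documentation" note, wrapping the link text in an
// OSC 8 hyperlink only when URLs are enabled (cf. print_url in the patch).
static void
demo_docs_link (char *buf, std::size_t n, DemoUrlFormat fmt, const char *url)
{
  if (fmt == DemoUrlFormat::None || url == nullptr)
    {
      std::snprintf (buf, n,
                     "see the '-flto' option documentation"
                     " for more information");
      return;
    }
  const char *term = (fmt == DemoUrlFormat::Bel) ? "\a" : "\033\\";
  std::snprintf (buf, n,
                 "see the \033]8;;%s%s'-flto' option documentation\033]8;;%s"
                 " for more information", url, term, term);
}
```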


Re: GCN: Make "gang-private data-share memory exhausted" error more verbose (was: [PATCH] [og10] OpenACC: Shared memory layout optimisation)

2022-04-26 Thread Julian Brown
On Tue, 26 Apr 2022 13:12:23 +0200
Thomas Schwinge  wrote:

> > @@ -5240,14 +5286,14 @@ gcn_print_lds_decl (FILE *f, tree var)
> >if (size > align && size > 4 && align < 8)
> > align = 8;
> >  
> > -  machfun->lds_allocated = ((machfun->lds_allocated + align - 1)
> > -   & ~(align - 1));
> > +  gangprivate_hwm = ((gangprivate_hwm + align - 1) & ~(align - 1));
> > 
> > -  machfun->lds_allocs->put (var, machfun->lds_allocated);
> > -  fprintf (f, "%u", machfun->lds_allocated);
> > -  machfun->lds_allocated += size;
> > -  if (machfun->lds_allocated > LDS_SIZE)
> > -   error ("local data-share memory exhausted");
> > +  lds_allocs.put (var, gangprivate_hwm);
> > +  fprintf (f, "%u", gangprivate_hwm);
> > +  gangprivate_hwm += size;
> > +  if (gangprivate_hwm > gang_local_size_opt)
> > +   error ("gang-private data-share memory exhausted (increase with "
> > +  "-mgang-local-size=)");
> >  }
> >  }  
> 
> In a new case (to be discussed later), we're running into this error.
> OK to push to master branch the attached
> 'GCN: Make "gang-private data-share memory exhausted" error more
> verbose'?

LGTM, thanks.

Julian


[PATCH] rs6000: Make the has_arch target selectors actually work

2022-04-26 Thread Segher Boessenkool
Tested on powerpc64-linux {-m32,-m64}.  Also manually checked the
gcc.log; it did the wrong thing before, it does the right thing now.
Committing.


Segher


2022-04-26  Segher Boessenkool  

gcc/testsuite/
PR target/105349
* lib/target-supports.exp (check_effective_target_has_arch_pwr5): Use
the specified dg-options.
(check_effective_target_has_arch_pwr6): Ditto.
(check_effective_target_has_arch_pwr7): Ditto.
(check_effective_target_has_arch_pwr8): Ditto.
(check_effective_target_has_arch_pwr9): Ditto.
(check_effective_target_has_arch_pwr10): Ditto.
(check_effective_target_has_arch_ppc64): Ditto.
---
 gcc/testsuite/lib/target-supports.exp | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 902bdae8a441..2d5d0539bb4f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6319,73 +6319,73 @@ proc check_effective_target_powerpc_p9modulo_ok { } {
 # return 1 if our compiler returns the ARCH_PWR defines with the options
 # as provided by the test.
 proc check_effective_target_has_arch_pwr5 { } {
-   return [check_no_compiler_messages arch_pwr5 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr5 assembly {
#ifndef _ARCH_PWR5
#error does not have power5 support.
#else
/* "has power5 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_pwr6 { } {
-   return [check_no_compiler_messages arch_pwr6 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr6 assembly {
#ifndef _ARCH_PWR6
#error does not have power6 support.
#else
/* "has power6 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_pwr7 { } {
-   return [check_no_compiler_messages arch_pwr7 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr7 assembly {
#ifndef _ARCH_PWR7
#error does not have power7 support.
#else
/* "has power7 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_pwr8 { } {
-   return [check_no_compiler_messages arch_pwr8 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr8 assembly {
#ifndef _ARCH_PWR8
#error does not have power8 support.
#else
/* "has power8 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_pwr9 { } {
-   return [check_no_compiler_messages arch_pwr9 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr9 assembly {
#ifndef _ARCH_PWR9
#error does not have power9 support.
#else
/* "has power9 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_pwr10 { } {
-   return [check_no_compiler_messages arch_pwr10 assembly {
+   return [check_no_compiler_messages_nocache arch_pwr10 assembly {
#ifndef _ARCH_PWR10
#error does not have power10 support.
#else
/* "has power10 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 proc check_effective_target_has_arch_ppc64 { } {
-   return [check_no_compiler_messages arch_ppc64 assembly {
+   return [check_no_compiler_messages_nocache arch_ppc64 assembly {
#ifndef _ARCH_PPC64
#error does not have ppc64 support.
#else
/* "has ppc64 support" */
#endif
-   }]
+   } [current_compiler_flags]]
 }
 
 # Return 1 if this is a PowerPC target supporting -mcpu=power10.
-- 
1.8.3.1



[committed] libstdc++: Define std::hash (LWG 3657)

2022-04-26 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This DR was approved at the February 2022 plenary.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (hash): Define.
* testsuite/27_io/filesystem/path/nonmember/hash_value.cc:
Check std::hash specialization.
---
 libstdc++-v3/include/bits/fs_path.h| 10 ++
 .../27_io/filesystem/path/nonmember/hash_value.cc  | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 9e06fa679d8..d6202fd275a 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -1416,6 +1416,16 @@ extern template class __shared_ptr;
 
 /// @endcond
 
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3657. std::hash is not enabled
+template<>
+  struct hash
+  {
+size_t
+operator()(const filesystem::path& __p) const noexcept
+{ return filesystem::hash_value(__p); }
+  };
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/nonmember/hash_value.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/nonmember/hash_value.cc
index 6bc6296b5a8..0dcea6efc64 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/nonmember/hash_value.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/nonmember/hash_value.cc
@@ -42,9 +42,19 @@ test02()
   }
 }
 
+void
+test03()
+{
+  std::hash h;
+  // LWG 3657. std::hash is not enabled
+  for (const path p : __gnu_test::test_paths)
+VERIFY( h(p) == hash_value(p) );
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
 }
-- 
2.34.1
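
The pattern used for the LWG 3657 resolution, a `std::hash` specialization that forwards to an existing `hash_value` overload, works the same way for any type. Here is a minimal sketch with a hypothetical `demo::Path` type (not `std::filesystem::path`), showing that the specialization makes the type usable as an unordered-container key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

namespace demo
{
  // Hypothetical stand-in for a path type that already has hash_value().
  struct Path
  {
    std::string native;
    bool operator==(const Path& other) const { return native == other.native; }
  };

  inline std::size_t
  hash_value(const Path& p) noexcept
  { return std::hash<std::string>{}(p.native); }
}

// Enable std::hash by delegating to hash_value, so both always agree.
namespace std
{
  template<>
    struct hash<demo::Path>
    {
      size_t
      operator()(const demo::Path& p) const noexcept
      { return demo::hash_value(p); }
    };
}
```

Delegating rather than duplicating the hashing logic guarantees that `std::hash<T>{}(x) == hash_value(x)`, which is what the new `test03` checks for `filesystem::path`.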



[committed] libstdc++: Add std::atomic(nullptr_t) constructor (LWG 3661)

2022-04-26 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This DR was approved at the February 2022 plenary.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_atomic.h (atomic): Add
constructor for constant initialization from nullptr_t.
* testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc:
Check for new constructor.
---
 libstdc++-v3/include/bits/shared_ptr_atomic.h | 4 
 .../testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc  | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 9e4df7da7f8..ff86432f0b4 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -573,6 +573,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr atomic() noexcept = default;
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3661. constinit atomic> a(nullptr); should work
+  constexpr atomic(nullptr_t) noexcept : atomic() { }
+
   atomic(shared_ptr<_Tp> __r) noexcept
   : _M_impl(std::move(__r))
   { }
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc
index 1f97224bf6a..a1902745a3e 100644
--- a/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc
@@ -18,6 +18,8 @@
 
 // Check constexpr constructor.
 constinit std::atomic> a;
+// LWG 3661. constinit atomic> a(nullptr); should work
+constinit std::atomic> a2 = nullptr;
 
 void
 test_is_lock_free()
-- 
2.34.1
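
Why a separate `nullptr_t` constructor helps can be shown with a hypothetical `Handle` class template (not `std::atomic`). This is a minimal sketch assuming the only other converting constructor is not constexpr. Without the delegating overload, `= nullptr` would convert through `Handle(T*)` and could not be a constant expression. With it, the exact-match `nullptr_t` overload is chosen and the object is constant-initialized.

```cpp
#include <cassert>
#include <cstddef>

template<typename T>
class Handle
{
public:
  constexpr Handle() noexcept : ptr_(nullptr) { }

  // Delegates to the constexpr default constructor, the same shape as the
  // LWG 3661 fix; an exact match for nullptr, so it beats Handle(T*).
  constexpr Handle(std::nullptr_t) noexcept : Handle() { }

  // Deliberately not constexpr, like the shared_ptr<T> constructor.
  Handle(T* p) noexcept : ptr_(p) { }

  constexpr bool empty() const noexcept { return ptr_ == nullptr; }

private:
  T* ptr_;
};

// Both forms are constant expressions.
constexpr Handle<int> h1;
constexpr Handle<int> h2 = nullptr;
static_assert(h1.empty() && h2.empty(), "constant-initialized handles are empty");
```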



[PATCH] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Mikael Morin

Hello,

this is a fix for the regression I recently introduced with the PR102043 
patch.  It is an infinite recursion problem.  I can’t see the memory 
consumption that Harald reported; maybe he doesn’t use the default 
optimization level to build the compiler.


Regression tested on x86_64-pc-linux-gnu.
I plan to push it tonight.

From 85d57fb88203697d7e52d5f1f148eab35e4f7486 Mon Sep 17 00:00:00 2001
From: Mikael Morin 
Date: Tue, 26 Apr 2022 13:05:32 +0200
Subject: [PATCH] fortran: Avoid infinite self-recursion [PR105381]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Dummy array decls are local decls different from the argument decl
accessible through GFC_DECL_SAVED_DESCRIPTOR.  If the argument decl has
a DECL_LANG_SPECIFIC set, it is copied over to the local decl at the
time the latter is created, so that the DECL_LANG_SPECIFIC object is
shared between local dummy decl and argument decl, and thus the
GFC_DECL_SAVED_DESCRIPTOR of the argument decl is the argument decl
itself.

The r12-8230-g7964ab6c364c410c34efe7ca2eba797d36525349 change introduced
the non_negative_strides_array_p predicate which recurses through
GFC_DECL_SAVED_DESCRIPTOR to avoid seeing dummy decls as purely local
decls.  As the GFC_DECL_SAVED_DESCRIPTOR of the argument decl is itself,
this can cause infinite recursion.

This change adds a check to avoid infinite recursion.
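
The shape of the fix can be illustrated with a small model. This is a minimal sketch, not gfortran code: `Decl` and `non_negative_strides_p` are hypothetical, with `saved_descriptor` standing in for `GFC_DECL_SAVED_DESCRIPTOR`. A node that points at itself models the argument decl. Like the patch, the guard only handles direct self-reference; a longer cycle would still recurse forever.

```cpp
#include <cassert>

// Hypothetical model of a decl with an optional link to its "original" decl.
struct Decl
{
  const Decl* saved_descriptor;
  bool non_negative_strides;
};

static bool
non_negative_strides_p (const Decl* d)
{
  // Follow the link only when it leads somewhere else; without the
  // `!= d` check a self-referencing node recurses forever.
  if (d->saved_descriptor != nullptr && d->saved_descriptor != d)
    return non_negative_strides_p (d->saved_descriptor);
  return d->non_negative_strides;
}
```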

	PR fortran/102043
	PR fortran/105381

gcc/fortran/ChangeLog:

	* trans-array.cc (non_negative_strides_array_p): Don’t recurse
	if the next argument is the same as the current.

gcc/testsuite/ChangeLog:

	* gfortran.dg/character_array_dummy_1.f90: New test.
---
 gcc/fortran/trans-array.cc|  3 ++-
 .../gfortran.dg/character_array_dummy_1.f90   | 21 +++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/character_array_dummy_1.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e4b6270ccf8..e0070aa080d 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3698,7 +3698,8 @@ non_negative_strides_array_p (tree expr)
   if (DECL_P (expr)
   && DECL_LANG_SPECIFIC (expr))
 if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
-  return non_negative_strides_array_p (orig_decl);
+  if (orig_decl != expr)
+	return non_negative_strides_array_p (orig_decl);
 
   return true;
 }
diff --git a/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90 b/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90
new file mode 100644
index 000..da5ed636f4f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+!
+! PR fortran/105381
+! Infinite recursion with array references of character dummy arguments.
+!
+! Contributed by Harald Anlauf 
+
+MODULE m
+  implicit none
+  integer,  parameter :: ncrit  =  8
+  integer,  parameter :: nterm  =  7
+contains
+
+  subroutine new_thin_rule (rule1)
+character(*),intent(in) ,optional :: rule1(ncrit)
+character(len=8) :: rules (ncrit,nterm)
+rules = ''
+if (present (rule1)) rules(:,1) = rule1  ! <-- compile time hog
+  end subroutine new_thin_rule
+
+end module m
-- 
2.35.1



Re: [PATCH] ppc: testsuite: float128-hw{, 4}.c need -mlong-double-128 (was: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128)

2022-04-26 Thread Segher Boessenkool
Hi!

Please don't send patches as replies.

On Sat, Apr 23, 2022 at 10:33:35AM -0300, Alexandre Oliva wrote:
> On Apr 14, 2022, Alexandre Oliva  wrote:
> 
> > * gcc.target/powerpc/pr79004.c: Add -mlong-double-128.
> 
> Just like pr79004, float128-hw.c requires -mlong-double-128 for some of
> the expected asm opcodes to be output on target variants that have
> 64-bit long doubles.  That's because their expanders,
> e.g. floatsi2 for FLOAT128 modes, are conditioned on
> TARGET_LONG_DOUBLE_128, which is not set on target variants that use
> 64-bit long double.
> 
> float128-hw4.c doesn't even compile without -mlong-double-128, on
> 64-bit long double target variants.  The error is "invalid parameter
> combination for AltiVec intrinsic" in get_float128_exponent,
> get_float128_mantissa, and set_float128_exponent_float128, presumably
> caused by rs6000_builtin_type_compatible's refusal to consider
> _Float128 compatible when TARGET_LONG_DOUBLE_128 is not set.
> 
> Since these are compile tests, -mlong-double-128 doesn't hurt even on
> target variants that use 64-bit long double, and enables both tests to
> pass.

This is not okay, sorry.  The testcase uses _Float128; the code generated
for that should not depend on your long double setting.  Please file
a PR instead?


Segher


Re: [PATCH] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Tobias Burnus

LGTM - however:

On 26.04.22 14:38, Mikael Morin wrote:

--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3698,7 +3698,8 @@ non_negative_strides_array_p (tree expr)
if (DECL_P (expr)
&& DECL_LANG_SPECIFIC (expr))
  if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
-  return non_negative_strides_array_p (orig_decl);
+  if (orig_decl != expr)
+ return non_negative_strides_array_p (orig_decl);


Is the if()if()if() cascade really needed? I can see a reason that an
extra 'if' is preferred for the variable declaration of orig_decl, but
can't we at least put the new 'orig_decl != expr' with an '&&' into the
same if as the decl/in the second if? Besides being clearer, it also avoids
further indenting the return line.

Thanks,

Tobias



Re: [PATCH] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 26, 2022 at 03:22:08PM +0200, Tobias Burnus wrote:
> LGTM - however:
> 
> On 26.04.22 14:38, Mikael Morin wrote:
> > --- a/gcc/fortran/trans-array.cc
> > +++ b/gcc/fortran/trans-array.cc
> > @@ -3698,7 +3698,8 @@ non_negative_strides_array_p (tree expr)
> > if (DECL_P (expr)
> > && DECL_LANG_SPECIFIC (expr))
> >   if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
> > -  return non_negative_strides_array_p (orig_decl);
> > +  if (orig_decl != expr)
> > + return non_negative_strides_array_p (orig_decl);
> 
> Is the if()if()if() cascade really needed? I can see a reason that an
> extra 'if' is preferred for the variable declaration of orig_decl, but
> can't we at least put the new 'orig_decl != expr' with an '&&' into the
> same if as the decl/in the second if? Besides being clearer, it also avoids
> further indenting the return line.

I think we can't in C++11/C++14.  One option is to declare orig_decl
earlier; then it can be
  tree orig_decl;
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && (orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
      && orig_decl != expr)
    return non_negative_strides_array_p (orig_decl);
but I think this is generally frowned upon,
or one can repeat it like:
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));
or what Mikael wrote, perhaps with the && on one line:
  if (DECL_P (expr) && DECL_LANG_SPECIFIC (expr))
    if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
      if (orig_decl != expr)
        return non_negative_strides_array_p (orig_decl);
In C++17 and later one can write:
  if (DECL_P (expr) && DECL_LANG_SPECIFIC (expr))
    if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr);
        orig_decl && orig_decl != expr)
      return non_negative_strides_array_p (orig_decl);

Jakub



[PATCH] c++: decltype of non-dependent call of class type [PR105386]

2022-04-26 Thread Patrick Palka via Gcc-patches
We need to pass tf_decltype when instantiating a non-dependent decltype
operand, like tsubst does in the dependent case, so that we avoid
materializing a temporary for a prvalue operand.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/105386

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Pass tf_decltype to
instantiate_non_dependent_expr_sfinae.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype81.C: New test.
---
 gcc/cp/semantics.cc |  2 +-
 gcc/testsuite/g++.dg/cpp0x/decltype81.C | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype81.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f08c0b6281f..ab48f11c9be 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11252,7 +11252,7 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
 }
   else if (processing_template_decl)
 {
-  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  expr = instantiate_non_dependent_expr_sfinae (expr, complain|tf_decltype);
   if (expr == error_mark_node)
return error_mark_node;
   /* Keep processing_template_decl cleared for the rest of the function
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype81.C b/gcc/testsuite/g++.dg/cpp0x/decltype81.C
new file mode 100644
index 000..7d25db39d9c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype81.C
@@ -0,0 +1,15 @@
+// PR c++/105386
+// { dg-do compile { target c++11 } }
+
+template struct NoInst {
+  static_assert(sizeof(T) == , "NoInst instantiated");
+};
+
+template NoInst f(T);
+
+template
+struct A {
+  using type = decltype(f(0));
+};
+
+A a;
-- 
2.36.0.rc2.10.g1ac7422e39
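
The language rule behind this fix is that `decltype` of a function-call prvalue does not materialize a temporary, so the returned class type need not be complete and is not instantiated. Here is a minimal sketch of that rule with hypothetical names (`NeverInstantiated`, `make`). It deliberately uses a non-template context. The reported problem concerned the same kind of operand appearing non-dependently inside a template.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical template whose instantiation is an error.
template<typename T>
struct NeverInstantiated
{
  static_assert(sizeof(T) == 0, "NeverInstantiated<T> was instantiated");
};

template<typename T> NeverInstantiated<T> make(T);  // declared, never defined

// No temporary is introduced for the prvalue make(0), so
// NeverInstantiated<int> is named but never instantiated.
using R = decltype(make(0));
static_assert(std::is_same<R, NeverInstantiated<int>>::value,
              "decltype names the return type");
```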



[committed] libphobos: Don't call free on the TLS array in the emutls destroy function.

2022-04-26 Thread Iain Buclaw via Gcc-patches
Fixes a segfault seen on Darwin when a GC scan is run after a thread has
been destroyed: the global emutlsArrays hash still holds a reference to
the freed array and tries to iterate over all of its elements.

Setting the length to zero frees all allocated elements in the array,
and ensures that it is skipped when _d_emutls_scan is called.
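
The ownership rule can be sketched with a small model. This is a minimal sketch, not druntime code: `TlsArray`, `registry`, `destroy_thread` and `scan_all` are hypothetical stand-ins for the per-thread array, `emutlsArrays`, `emutlsDestroyThread` and `_d_emutls_scan`. On thread exit the slots are freed and the array is emptied, but the array object stays alive because the registry still points at it. A later scan then sees zero elements instead of touching freed memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical per-thread array of heap-allocated TLS slots.
struct TlsArray
{
  std::vector<void*> slots;
};

// Hypothetical global registry that a GC scan walks.
static std::vector<TlsArray*> registry;

// On thread exit: free each slot, then empty (but keep) the array.
static void
destroy_thread (TlsArray* arr)
{
  for (void* p : arr->slots)
    std::free (p);
  arr->slots.clear ();   // length becomes zero; scans now skip it
}

// A scan visits every registered array; an emptied one contributes nothing.
static std::size_t
scan_all ()
{
  std::size_t visited = 0;
  for (TlsArray* arr : registry)
    visited += arr->slots.size ();
  return visited;
}
```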

Bootstrapped and regression tested on x86_64-linux-gnu and
x86_64-apple-darwin20.  Committed to mainline and backported to the
gcc-9/10/11 release branches.

Regards,
Iain.

---
libphobos/ChangeLog:

* libdruntime/gcc/emutls.d (emutlsDestroyThread): Clear the per-thread
TLS array, don't call free().
---
 libphobos/libdruntime/gcc/emutls.d | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libphobos/libdruntime/gcc/emutls.d b/libphobos/libdruntime/gcc/emutls.d
index 6d9fb309a30..ee3603206b6 100644
--- a/libphobos/libdruntime/gcc/emutls.d
+++ b/libphobos/libdruntime/gcc/emutls.d
@@ -223,9 +223,9 @@ void** emutlsAlloc(shared __emutls_object* obj) nothrow @nogc
 }
 
 /*
- * When a thread has finished, remove the TLS array from the GC
- * scan list emutlsArrays, free all allocated TLS variables and
- * finally free the array.
+ * When a thread has finished, free all allocated TLS variables and empty the
+ * array.  The pointer is not freed as it is still referenced by the GC scan
+ * list emutlsArrays, which gets destroyed when druntime is unloaded.
  */
 extern (C) void emutlsDestroyThread(void* ptr) nothrow @nogc
 {
@@ -237,7 +237,7 @@ extern (C) void emutlsDestroyThread(void* ptr) nothrow @nogc
 free(entry[-1]);
 }
 
-free(arr);
+arr.length = 0;
 }
 
 /*
-- 
2.32.0



Re: [PATCH] v2 PR102024 - IBM Z: Add psabi diagnostics

2022-04-26 Thread Ulrich Weigand via Gcc-patches
Andreas Krebbel  wrote:

>gcc/ChangeLog:
>PR target/102024
>* config/s390/s390-protos.h (s390_function_arg_vector): Remove
>prototype.
>* config/s390/s390.cc (s390_single_field_struct_p): New
>function.
>(s390_function_arg_vector): Invoke s390_single_field_struct_p.
>(s390_function_arg_float): Likewise.

This looks good to me.

Bye,
Ulrich



Re: [gcov v2 14/14] gcov: Add section for freestanding environments

2022-04-26 Thread Martin Liška
Hi.

This is fine, except for 2 places where you have trailing whitespace
at the end of a line.

Martin


Re: [gcov v2 00/14] Add merge-stream subcommand to gcov-tool

2022-04-26 Thread Martin Liška
On 4/25/22 09:09, Sebastian Huber wrote:
> This patch set is for GCC 13.
> 
> The aim is to better support gcov in free-standing environments. For example,
> you can run a test executable which dumps all gcov info objects in a serial
> data stream using __gcov_info_to_gcda() and the new __gcov_filename_to_gcfn().
> It could be encoded as base64. It could be also compressed. On the host you
> unpack the encoded data stream and feed it into gcov-tool using the new
> "merge-stream" subcommand:
> 
> gcov-tool --help
> Usage: gcov-tool [OPTION]... SUB_COMMAND [OPTION]...
> 
> Offline tool to handle gcda counts
> 
>   -h, --helpPrint this help, then exit
>   -v, --version Print version number, then exit
>   merge-stream [options] [stream-file]  Merge coverage stream file (or stdin)
> and coverage file contents
> -v, --verbose   Verbose mode
> -w, --weight Set weights (float point values)
> 
> Example:
> 
> base64 -d log.txt | gcov-tool merge-stream
> 
> The patch set does not change the format of .gcda files.
> 
> TODO:
> 
> * Tests for gcov-tool
> 
> v2:
> 
> * Address review comments from v1
> 
> * Simple test for __gcov_filename_to_gcfn()
> 
> * Use xstrerror()
> 
> * Add documentation

Hi.

Thank you for it. Please install the patch set once Stage 1 opens.

Cheers,
Martin

> 
> Sebastian Huber (14):
>   gcov-tool: Allow merging of empty profile lists
>   gcov: Add mode to all gcov_open()
>   gcov: Add open mode parameter to gcov_do_dump()
>   gcov: Make gcov_seek() static
>   gcov: Add __gcov_filename_to_gcfn()
>   gcov-tool: Support file input from stdin
>   gcov: Use xstrdup()
>   gcov: Move prepend to list to read_gcda_file()
>   gcov: Move gcov_open() to caller of read_gcda_file()
>   gcov: Fix integer types in ftw_read_file()
>   gcov: Record EOF error during read
>   gcov-tool: Add merge-stream subcommand
>   gcov: Use xstrerror()
>   gcov: Add section for freestanding environments
> 
>  gcc/doc/gcov-tool.texi                   |  36 +++
>  gcc/doc/gcov.texi                        | 375 +++
>  gcc/doc/invoke.texi                      |  31 +-
>  gcc/gcov-io.cc                           |  79 +++--
>  gcc/gcov-io.h                            |  35 ++-
>  gcc/gcov-tool.cc                         | 107 +++--
>  gcc/testsuite/gcc.dg/gcov-info-to-gcda.c |  36 ++-
>  libgcc/gcov.h                            |  17 +-
>  libgcc/libgcov-driver-system.c           |   7 +-
>  libgcc/libgcov-driver.c                  |  44 ++-
>  libgcc/libgcov-util.c                    | 150 +++--
>  libgcc/libgcov.h                         |   3 -
>  12 files changed, 803 insertions(+), 117 deletions(-)
> 
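The cover letter's target-side step (serialize the gcov info objects, optionally base64-encode them, then decode on the host and pipe the result into `gcov-tool merge-stream`) can be sketched with a small streaming base64 encoder. This is a hosted C++ sketch under assumptions: the class is hypothetical and not part of libgcov, and its `dump` member is written with the parameter shape assumed for the dump callback of `__gcov_info_to_gcda ()` (`const void *`, `unsigned`, `void *`). A freestanding target would write to a fixed buffer or a UART instead of a `std::string`. Bytes are buffered across calls, so the text does not depend on how the stream is chunked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical streaming Base64 encoder (not part of libgcov or gcov-tool).
class Base64Stream {
public:
  // Assumed to match the dump callback shape; `arg` points at the encoder.
  static void dump(const void *data, unsigned length, void *arg) {
    static_cast<Base64Stream *>(arg)->append(data, length);
  }

  // Buffer bytes until a full 3-byte group is available, then emit it.
  void append(const void *data, std::size_t length) {
    const unsigned char *p = static_cast<const unsigned char *>(data);
    for (std::size_t i = 0; i < length; ++i) {
      pending_[npending_++] = p[i];
      if (npending_ == 3) {
        emit_group(3);
        npending_ = 0;
      }
    }
  }

  // Flush the last 1 or 2 bytes with '=' padding and return all text.
  std::string finish() {
    if (npending_ > 0) {
      for (unsigned i = npending_; i < 3; ++i)
        pending_[i] = 0;
      emit_group(npending_);
      npending_ = 0;
    }
    return out_;
  }

private:
  // Encode pending_[0..2]; nbytes real input bytes give nbytes + 1 chars.
  void emit_group(unsigned nbytes) {
    assert(nbytes >= 1 && nbytes <= 3);
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char *g = pending_;
    char quad[4] = {alphabet[g[0] >> 2],
                    alphabet[((g[0] & 0x03) << 4) | (g[1] >> 4)],
                    alphabet[((g[1] & 0x0f) << 2) | (g[2] >> 6)],
                    alphabet[g[2] & 0x3f]};
    for (unsigned i = nbytes + 1; i < 4; ++i)
      quad[i] = '=';
    out_.append(quad, 4);
  }

  unsigned char pending_[3] = {0, 0, 0};
  unsigned npending_ = 0;
  std::string out_;
};
```

The resulting text is the kind of input the cover letter's `base64 -d log.txt | gcov-tool merge-stream` pipeline decodes on the host.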



Re: [PATCH] c++: decltype of non-dependent call of class type [PR105386]

2022-04-26 Thread Jason Merrill via Gcc-patches

On 4/26/22 09:45, Patrick Palka wrote:

We need to pass tf_decltype when instantiating a non-dependent decltype
operand, like tsubst does in the dependent case, so that we avoid
materializing a temporary for a prvalue operand.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?


OK.


PR c++/105386

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Pass tf_decltype to
instantiate_non_dependent_expr_sfinae.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype81.C: New test.
---
 gcc/cp/semantics.cc                     |  2 +-
 gcc/testsuite/g++.dg/cpp0x/decltype81.C | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype81.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f08c0b6281f..ab48f11c9be 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11252,7 +11252,7 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
  }
else if (processing_template_decl)
  {
-  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  expr = instantiate_non_dependent_expr_sfinae (expr, complain|tf_decltype);
if (expr == error_mark_node)
return error_mark_node;
/* Keep processing_template_decl cleared for the rest of the function
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype81.C b/gcc/testsuite/g++.dg/cpp0x/decltype81.C
new file mode 100644
index 000..7d25db39d9c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype81.C
@@ -0,0 +1,15 @@
+// PR c++/105386
+// { dg-do compile { target c++11 } }
+
+template struct NoInst {
+  static_assert(sizeof(T) == , "NoInst instantiated");
+};
+
+template NoInst f(T);
+
+template
+struct A {
+  using type = decltype(f(0));
+};
+
+A a;
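The rationale above (a prvalue operand of `decltype` should not materialize a temporary, so its class type never has to be complete) can be illustrated with a standalone sketch modelled on the new test. The `always_false` helper and the names `outside_template` and `inside_template` are illustrative assumptions, not part of the patch; the sketch assumes a compiler that already contains this fix.

```cpp
#include <type_traits>

// Dependent "false": the static_assert below fires only if NoInst<T> is
// actually instantiated (illustrative helper, not from the patch).
template <class T> struct always_false : std::false_type {};

template <class T> struct NoInst {
  static_assert(always_false<T>::value, "NoInst instantiated");
};

// Declared only, never defined or called.
template <class T> NoInst<T> f(T);

// Outside a template: the operand is a prvalue function call, so no
// temporary is materialized and NoInst<int> need not be complete.
using outside_template = decltype(f(0));

// Inside a template, with a non-dependent operand: the case the patch
// handles by passing tf_decltype in finish_decltype_type.
template <class T> struct A {
  using type = decltype(f(0));
};

// Instantiating A<int> must still leave NoInst<int> uninstantiated.
using inside_template = A<int>::type;
```

If `NoInst<int>` were instantiated anywhere, compilation would stop at its `static_assert`, which is the failure decltype81.C is designed to catch.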




[PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch disables epilogue vectorization when we are peeling for 
alignment in the prologue and we can't guarantee the main vectorized 
loop is entered.  This is to prevent executing vectorized code with an 
unaligned access if the target has indicated it wants to peel for 
alignment. We take this conservative approach as we currently do not 
distinguish between peeling for alignment for correctness or for 
performance.
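To make "we can't guarantee the main vectorized loop is entered" concrete, here is a rough model of how a loop over `n` elements is split into a scalar peeling prologue, an aligned vector main loop, and an epilogue. It is a simplification under stated assumptions (fixed vector length, alignment a multiple of the element size, no masking, no cost model); `LoopSplit` and `split_loop` are hypothetical names, not vectorizer internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model, not GCC code: how n iterations are divided when the
// vectorizer peels scalar iterations to reach an aligned address.
struct LoopSplit {
  std::size_t prologue;     // scalar iterations peeled for alignment
  std::size_t vector_main;  // iterations run by the aligned vector loop
  std::size_t epilogue;     // iterations left over after the main loop
};

// addr: address of the first element; elem_size: bytes per element;
// n: iteration count; vf: elements per vector; align: bytes.
// Assumes addr and align are multiples of elem_size.
LoopSplit split_loop(std::uintptr_t addr, std::size_t elem_size,
                     std::size_t n, std::size_t vf, std::size_t align) {
  assert(elem_size > 0 && vf > 0 && align > 0);
  assert(addr % elem_size == 0 && align % elem_size == 0);

  std::size_t misalign = addr % align;
  std::size_t peel = misalign == 0 ? 0 : (align - misalign) / elem_size;
  if (peel > n)
    peel = n;  // the loop may end before alignment is ever reached

  std::size_t rest = n - peel;
  std::size_t body = rest / vf * vf;  // whole vectors only
  return LoopSplit{peel, body, rest - body};
}
```

For example, assuming 4-byte `int`, 16-byte vectors and a 16-byte-aligned `data`, the call `foo (&data[3], 3)` from pr105219-2.c splits into 1 prologue iteration, 0 main-loop iterations and 2 epilogue iterations. The main loop is skipped entirely, which is the situation the patch guards against by not vectorizing the epilogue.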


A better approach would be to skip to the scalar epilogue when the main
loop isn't entered and alignment peeling is required. However, that
would require a more invasive change to the codebase, which we chose to
avoid at this point of development.  We can revisit this option during
stage 1 if we choose to.


Bootstrapped on aarch64-none-linux and regression tested on 
aarch64-none-elf.


gcc/ChangeLog:

    PR tree-optimization/105219
    * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New
    function.
    (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment
    to determine whether to vectorize epilogue.
    * testsuite/gcc.target/aarch64/pr105219.c: New.
    * testsuite/gcc.target/aarch64/pr105219-2.c: New.
    * testsuite/gcc.target/aarch64/pr105219-3.c: New.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
new file mode 100644
index 
..c97d1dc100181b77af0766e08407e1e352f604fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* PR 105219.  */
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
new file mode 100644
index 
..444352fc051b787369f6f1be6236d1ff0fc2d392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model -fdump-tree-vect-all" } */
+/* PR 105219.  */
+int data[128];
+
+void foo (void)
+{
+  for (int i = 0; i < 9; ++i)
+    data[i + 1] = i;
+}
+
+/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/gcc.target/aarch64/pr105219.c
new file mode 100644
index 
..bbdefb549f6a4e803852f69d20ce1ef9152a526c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a+sve" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=*" } { "-msve-vector-bits=128" } } */
+/* { dg-options "-O3 -march=armv8.2-a+sve -msve-vector-bits=128 -mtune=thunderx" } */
+/* PR 105219.  */
+int a;
+char b[60];
+short c[18];
+short d[4][19];
+long long f;
+void e(int g, int h, short k[][19]) {
+  for (signed i = 0; i < 3; i += 2)
+    for (signed j = 1; j < h + 14; j++) {
+      b[i * 14 + j] = 1;
+      c[i + j] = k[2][j];
+      a = g ? k[i][j] : 0;
+    }
+}
+int main() {
+  e(9, 1, d);
+  for (long l = 0; l < 6; ++l)
+    for (long m = 0; m < 4; ++m)
+      f ^= b[l + m * 4];
+  if (f)
+    __builtin_abort ();
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd52b2f67cdecd3dc16fcff684dba07..a23e6181dec8126bcb691ea9474095bf65483863 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2942,6 +2942,38 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
   return opt_loop_vec_info::success (loop_vinfo);
 }
 
+/* Function vect_epilogue_when_peeling_for_alignment
+
+   PR 105219: If we are peeling for alignment in the prologue then we do not

Re: *PING* [PATCH 0/4] Use pointer arithmetic for array references [PR102043]

2022-04-26 Thread Hans-Peter Nilsson via Gcc-patches
> From: Thomas Koenig via Gcc-patches 
> Date: Fri, 22 Apr 2022 15:59:45 +0200

> Hi Mikael,
> 
> > Ping for the four patches starting at 
> > https://gcc.gnu.org/pipermail/fortran/2022-April/057759.html :
> > https://gcc.gnu.org/pipermail/fortran/2022-April/057757.html
> > https://gcc.gnu.org/pipermail/fortran/2022-April/057760.html
> > https://gcc.gnu.org/pipermail/fortran/2022-April/057758.html
> > https://gcc.gnu.org/pipermail/fortran/2022-April/057761.html
> > 
> > Richi accepted the general direction and the middle-end interaction.
> > I need a fortran frontend ack as well.
> 
> Looks good to me.
> 
> Thanks a lot for taking this on! This would have been a serious
> regression if released with gcc 12.
> 
> Best regards
> 
>   Thomas

These, or specifically r12-8227-g89ca0fffa48b79, "fortran:
Pre-evaluate string pointers. [PR102043]" have further
exposed (the issue existed before but now fails for more
platforms) PR78054 "gfortran.dg/pr70673.f90 FAILs at -O0",
at least for cris-elf and apparently also
s390x-ibm-linux-gnu.

In the PR it is mentioned that running the test through
valgrind shows invalid accesses also on x86_64-linux-gnu.
Could it be that the test-case is invalid and has undefined
behavior?  I don't know fortran so I can't tell.

That this exact commit causes the s390x regression is somewhat
of an assumption based on posting dates and test results, as the
s390x results don't include a git version.  (@Stefansf: I'm
referring to
https://gcc.gnu.org/pipermail/gcc-testresults/2022-April/760060.html
https://gcc.gnu.org/pipermail/gcc-testresults/2022-April/760137.html
Perhaps that tester isn't using the contrib/gcc_update and
contrib/test_summary scripts, thus no LAST_UPDATED
included?)

brgds, H-P


Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch disables epilogue vectorization when we are peeling for 
> alignment in the prologue and we can't guarantee the main vectorized 
> loop is entered.  This is to prevent executing vectorized code with an 
> unaligned access if the target has indicated it wants to peel for 
> alignment. We take this conservative approach as we currently do not 
> distinguish between peeling for alignment for correctness or for 
> performance.
>
> A better codegen would be to make it skip to the scalar epilogue in case 
> the main loop isn't entered when alignment peeling is required. However, 
> that would require a more aggressive change to the codebase which we 
> chose to avoid at this point of development.  We can revisit this option 
> during stage 1 if we choose to.
>
> Bootstrapped on aarch64-none-linux and regression tested on 
> aarch64-none-elf.
>
> gcc/ChangeLog:
>
>      PR tree-optimization/105219
>      * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New
>      function.
>      (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment
>      to determine whether to vectorize epilogue.
>      * testsuite/gcc.target/aarch64/pr105219.c: New.
>      * testsuite/gcc.target/aarch64/pr105219-2.c: New.
>      * testsuite/gcc.target/aarch64/pr105219-3.c: New.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
> new file mode 100644
> index 
> ..c97d1dc100181b77af0766e08407e1e352f604fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */

I think this should be in gcc.dg/vect, with the options forced
for { target aarch64 }.

Are the skips necessary?  It looks like the test should work correctly
for all options/targets.

> +/* PR 105219.  */
> +int data[128];
> +
> +void __attribute((noipa))
> +foo (int *data, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +data[i] = i;
> +}
> +
> +int main()
> +{
> +  for (int start = 0; start < 16; ++start)
> +for (int n = 1; n < 3*16; ++n)
> +  {
> +__builtin_memset (data, 0, sizeof (data));
> +foo (&data[start], n);
> +for (int j = 0; j < n; ++j)
> +  if (data[start + j] != j)
> +__builtin_abort ();
> +  }
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
> new file mode 100644
> index 
> ..444352fc051b787369f6f1be6236d1ff0fc2d392
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model -fdump-tree-vect-all" } */
> +/* PR 105219.  */
> +int data[128];
> +
> +void foo (void)
> +{
> +  for (int i = 0; i < 9; ++i)
> +data[i + 1] = i;
> +}
> +
> +/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/gcc.target/aarch64/pr105219.c
> new file mode 100644
> index 
> ..bbdefb549f6a4e803852f69d20ce1ef9152a526c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run { target aarch64_sve128_hw } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a+sve" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> +/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=*" } { "-msve-vector-bits=128" } } */
> +/* { dg-options "-O3 -march=armv8.2-a+sve -msve-vector-bits=128 -mtune=thunderx" } */

Same here.

> +/* PR 105219.  */
> +int a;
> +char b[60];
> +short c[18];
> +short d[4][19];
> +long long f;
> +void e(int g, int h, short k[][19]) {
> +  for (signed i = 0; i < 3; i += 2)
> +for (signed j = 1; j < h + 14; j++) {
> +  b[i * 14 + j] = 1;
> +  c[i + j] = k[2][j];
> +  a = g ? k[i][j] : 0;
> +}
> +}
> +int main() {
> +  e(9, 1, d);
> +  for (long l = 0; l < 6; ++l)
> +for (long m = 0; m < 4; ++m)
> +  f ^= b[l + m * 4];
> +  if (f)
> +__builtin_abo

[PATCH] avr: add support for tinyAVR 2 family

2022-04-26 Thread Torsten Duwe via Gcc-patches


Signed-off-by: Torsten Duwe 

---
gcc/ChangeLog:

2022-04-26  Torsten Duwe  

* config/avr/avr-mcus.def (AVR_MCU): Add definitions for
attiny{4,8,16,32}2{4,6,7}; the 4K and 8K flash variants use RCALL.

--- a/gcc/config/avr/avr-mcus.def
+++ b/gcc/config/avr/avr-mcus.def
@@ -333,6 +333,20 @@ AVR_MCU ("attiny1617",   ARCH_AVRXME
 AVR_MCU ("attiny3214",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3214__",  0x3800, 0x0, 0x8000, 0x8000)
 AVR_MCU ("attiny3216",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3216__",  0x3800, 0x0, 0x8000, 0x8000)
 AVR_MCU ("attiny3217",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3217__",  0x3800, 0x0, 0x8000, 0x8000)
+/* "tinyAVR 2" family, xmega3 core */
+AVR_MCU ("attiny424",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny424__",   0x3e00, 0x0, 0x1000, 0x8000)
+AVR_MCU ("attiny426",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny426__",   0x3e00, 0x0, 0x1000, 0x8000)
+AVR_MCU ("attiny427",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny427__",   0x3e00, 0x0, 0x1000, 0x8000)
+AVR_MCU ("attiny824",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny824__",   0x3c00, 0x0, 0x2000, 0x8000)
+AVR_MCU ("attiny826",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny826__",   0x3c00, 0x0, 0x2000, 0x8000)
+AVR_MCU ("attiny827",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATtiny827__",   0x3c00, 0x0, 0x2000, 0x8000)
+AVR_MCU ("attiny1624",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny1624__",  0x3800, 0x0, 0x4000, 0x8000)
+AVR_MCU ("attiny1626",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny1626__",  0x3800, 0x0, 0x4000, 0x8000)
+AVR_MCU ("attiny1627",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny1627__",  0x3800, 0x0, 0x4000, 0x8000)
+AVR_MCU ("attiny3224",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3224__",  0x3400, 0x0, 0x8000, 0x8000)
+AVR_MCU ("attiny3226",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3226__",  0x3400, 0x0, 0x8000, 0x8000)
+AVR_MCU ("attiny3227",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATtiny3227__",  0x3400, 0x0, 0x8000, 0x8000)
+
 AVR_MCU ("atmega808",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATmega808__",   0x3c00, 0x0, 0x2000, 0x4000)
 AVR_MCU ("atmega809",    ARCH_AVRXMEGA3, AVR_ISA_RCALL, "__AVR_ATmega809__",   0x3c00, 0x0, 0x2000, 0x4000)
 AVR_MCU ("atmega1608",   ARCH_AVRXMEGA3, AVR_ISA_NONE,  "__AVR_ATmega1608__",  0x3800, 0x0, 0x4000, 0x4000)

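The new rows follow a visible pattern, which the sketch below decodes. It assumes that the fifth `AVR_MCU` argument is the start of the data section and the seventh is the flash size in bytes, and that internal SRAM on these xmega3 parts ends at 0x3FFF; the flash rule only restates the ChangeLog ("4k and 8k flash types use RCALL"). The helper names are hypothetical and not part of GCC.

```cpp
#include <cstdint>

// Hypothetical helpers for reading the tinyAVR 2 rows; not part of GCC.
// Assumption: internal SRAM on these xmega3 devices ends at 0x3FFF.
constexpr std::uint32_t kSramEnd = 0x4000;

// RAM size implied by the data-section start (fifth AVR_MCU argument).
constexpr std::uint32_t ram_bytes(std::uint32_t data_start) {
  return kSramEnd - data_start;
}

// Restates the ChangeLog: 4 KiB and 8 KiB flash parts get AVR_ISA_RCALL.
constexpr bool uses_rcall(std::uint32_t flash_size) {
  return flash_size <= 0x2000;
}

// attiny424: data 0x3e00, flash 0x1000 -> 512 bytes of RAM, RCALL.
static_assert(ram_bytes(0x3e00) == 512 && uses_rcall(0x1000), "attiny424");
// attiny3224: data 0x3400, flash 0x8000 -> 3 KiB of RAM, no RCALL.
static_assert(ram_bytes(0x3400) == 3072 && !uses_rcall(0x8000), "attiny3224");
```

Under the same reading, the 8 KiB parts (data start 0x3c00) have 1 KiB of RAM and the 16 KiB parts (0x3800) have 2 KiB.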

[GCC 11 backport][committed] libphobos: Give _Unwind_Exception an alignment that best resembles __attribute__((aligned))

2022-04-26 Thread Iain Buclaw via Gcc-patches
This patch backports r12-3986 to the GCC 11 branch.

For interoperability with C++ EH, the alignment should match, otherwise
D may not be able to intercept exceptions thrown from C++.
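Why the alignments must match can be sketched in C++: the same field list lands at a different offset inside an enclosing object depending only on the declared alignment, so two runtimes that disagree about the alignment of `_Unwind_Exception` can disagree about the layout around it. The structs below are a simplified stand-in (an assumption), not the real unwind.h declaration; 16 is the value the patch uses for X86_64.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the exception header's fields (assumption).
struct HeaderDefault {  // natural alignment only, like a plain declaration
  std::uint64_t exception_class;
  void (*exception_cleanup)(int, void *);
  std::uintptr_t private_1;
  std::uintptr_t private_2;
};

// Same fields, explicitly over-aligned like align(__aligned__) with 16.
struct alignas(16) HeaderAligned16 {
  std::uint64_t exception_class;
  void (*exception_cleanup)(int, void *);
  std::uintptr_t private_1;
  std::uintptr_t private_2;
};

// A runtime typically embeds the header in its own larger exception object.
struct WrapperDefault {
  int payload;
  HeaderDefault header;
};

struct WrapperAligned16 {
  int payload;
  HeaderAligned16 header;
};

// The over-aligned header always starts on a 16-byte boundary.
static_assert(alignof(HeaderAligned16) == 16, "explicit alignment applies");
static_assert(offsetof(WrapperAligned16, header) % 16 == 0, "aligned offset");
```

On x86_64 (SysV), for instance, the header sits at offset 8 in `WrapperDefault` but at offset 16 in `WrapperAligned16`.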

Bootstrapped and regression tested on x86_64-apple-darwin20.

Regards,
Iain.

---
libphobos/ChangeLog:

* libdruntime/gcc/unwind/generic.d (__aligned__): Define.
(_Unwind_Exception): Align struct to __aligned__.
---
 libphobos/libdruntime/gcc/unwind/generic.d | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/libphobos/libdruntime/gcc/unwind/generic.d b/libphobos/libdruntime/gcc/unwind/generic.d
index 592b3afcb71..68ddd1d5410 100644
--- a/libphobos/libdruntime/gcc/unwind/generic.d
+++ b/libphobos/libdruntime/gcc/unwind/generic.d
@@ -123,7 +123,27 @@ enum : _Unwind_Reason_Code
 // @@@ The IA-64 ABI says that this structure must be double-word aligned.
 // Taking that literally does not make much sense generically.  Instead we
 // provide the maximum alignment required by any type for the machine.
-struct _Unwind_Exception
+     version (ARM)      private enum __aligned__ = 8;
+else version (AArch64)  private enum __aligned__ = 16;
+else version (HPPA)     private enum __aligned__ = 8;
+else version (HPPA64)   private enum __aligned__ = 16;
+else version (MIPS_N32) private enum __aligned__ = 16;
+else version (MIPS_N64) private enum __aligned__ = 16;
+else version (MIPS32)   private enum __aligned__ = 8;
+else version (MIPS64)   private enum __aligned__ = 8;
+else version (PPC)      private enum __aligned__ = 16;
+else version (PPC64)    private enum __aligned__ = 16;
+else version (RISCV32)  private enum __aligned__ = 16;
+else version (RISCV64)  private enum __aligned__ = 16;
+else version (S390)     private enum __aligned__ = 8;
+else version (SPARC)    private enum __aligned__ = 8;
+else version (SPARC64)  private enum __aligned__ = 16;
+else version (SystemZ)  private enum __aligned__ = 8;
+else version (X86)      private enum __aligned__ = 16;
+else version (X86_64)   private enum __aligned__ = 16;
+else static assert( false, "Platform not supported.");
+
+align(__aligned__) struct _Unwind_Exception
 {
 _Unwind_Exception_Class exception_class;
 _Unwind_Exception_Cleanup_Fn exception_cleanup;
-- 
2.32.0



Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 26, 2022 at 03:43:13PM +0100, Richard Sandiford via Gcc-patches wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> 
> I think this should be in gcc.dg/vect, with the options forced
> for { target aarch64 }.

I think not just aarch64, doesn't it need some effective target that
the HW on which it is tested is ARM v8.2-a compatible plus that binutils
can assemble v8.2-a instructions?
Sure, it can be done in gcc.dg/vect too if those effective targets
aren't defined in aarch64.exp.  But probably needs dg-additional-options
there instead of dg-options.

Jakub



Re: [PATCH] rs6000: Move V2DI vec_neg under power8-vector [PR105271]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

On Fri, Apr 15, 2022 at 04:08:15PM +0800, Kewen.Lin via Gcc-patches wrote:
> As PR105271 shows, __builtin_altivec_neg_v2di requires option
> -mpower8-vector as its pattern expansion relies on subv2di which
> has guard VECTOR_UNIT_P8_VECTOR_P (V2DImode).  This fix is to move
> the related lines for __builtin_altivec_neg_v2di to the section
> of stanza power8-vector.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR target/105271
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-builtins.def (NEG_V2DI): Move to [power8-vector]
>   stanza.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr105271.c: New test.

I'd like to ping this patch, one of the last few remaining P1s we have for
GCC 12.

Thanks.

Jakub
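The dependency described in the patch (the V2DI negation expands through `subv2di`, which is only available with power8-vector) can be modelled in scalar C++ as a lane-wise subtraction from zero in wrapping 64-bit arithmetic. This is a model under that assumption, not the rs6000 expander; `V2DI`, `sub_v2di` and `neg_v2di` are hypothetical names.

```cpp
#include <cstdint>

// Two 64-bit lanes standing in for a V2DI vector register (model only).
struct V2DI {
  std::uint64_t lane0;
  std::uint64_t lane1;
};

// Lane-wise subtraction with two's-complement wraparound, like subv2di.
constexpr V2DI sub_v2di(V2DI a, V2DI b) {
  return V2DI{a.lane0 - b.lane0, a.lane1 - b.lane1};
}

// Negation expressed as 0 - x, reusing the subtraction above; in GCC the
// corresponding subtraction pattern requires power8-vector.
constexpr V2DI neg_v2di(V2DI x) {
  return sub_v2di(V2DI{0, 0}, x);
}
```

As with the hardware operation, the most negative 64-bit value maps to itself under this negation.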



[GCC 10 backport][committed] libphobos: Give _Unwind_Exception an alignment that best resembles __attribute__((aligned))

2022-04-26 Thread Iain Buclaw via Gcc-patches
This patch backports r12-3986 to the GCC 10 branch.

For interoperability with C++ EH, the alignment should match, otherwise
D may not be able to intercept exceptions thrown from C++.

Bootstrapped and regression tested on x86_64-linux-gnu.

Regards,
Iain.

---

libphobos/ChangeLog:

* libdruntime/gcc/unwind/generic.d (__aligned__): Define.
(_Unwind_Exception): Align struct to __aligned__.

(cherry picked from commit efa5449a094d3887e124d400ff0410af2c745b2d)
---
 libphobos/libdruntime/gcc/unwind/generic.d | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/libphobos/libdruntime/gcc/unwind/generic.d b/libphobos/libdruntime/gcc/unwind/generic.d
index 9c164b6fbac..cca437df926 100644
--- a/libphobos/libdruntime/gcc/unwind/generic.d
+++ b/libphobos/libdruntime/gcc/unwind/generic.d
@@ -123,7 +123,27 @@ enum : _Unwind_Reason_Code
 // @@@ The IA-64 ABI says that this structure must be double-word aligned.
 // Taking that literally does not make much sense generically.  Instead we
 // provide the maximum alignment required by any type for the machine.
-struct _Unwind_Exception
+     version (ARM)      private enum __aligned__ = 8;
+else version (AArch64)  private enum __aligned__ = 16;
+else version (HPPA)     private enum __aligned__ = 8;
+else version (HPPA64)   private enum __aligned__ = 16;
+else version (MIPS_N32) private enum __aligned__ = 16;
+else version (MIPS_N64) private enum __aligned__ = 16;
+else version (MIPS32)   private enum __aligned__ = 8;
+else version (MIPS64)   private enum __aligned__ = 8;
+else version (PPC)      private enum __aligned__ = 16;
+else version (PPC64)    private enum __aligned__ = 16;
+else version (RISCV32)  private enum __aligned__ = 16;
+else version (RISCV64)  private enum __aligned__ = 16;
+else version (S390)     private enum __aligned__ = 8;
+else version (SPARC)    private enum __aligned__ = 8;
+else version (SPARC64)  private enum __aligned__ = 16;
+else version (SystemZ)  private enum __aligned__ = 8;
+else version (X86)      private enum __aligned__ = 16;
+else version (X86_64)   private enum __aligned__ = 16;
+else static assert( false, "Platform not supported.");
+
+align(__aligned__) struct _Unwind_Exception
 {
 _Unwind_Exception_Class exception_class;
 _Unwind_Exception_Cleanup_Fn exception_cleanup;
-- 
2.32.0



Re: [PATCH] rs6000: Move V2DI vec_neg under power8-vector [PR105271]

2022-04-26 Thread Segher Boessenkool
On Tue, Apr 26, 2022 at 05:16:18PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> On Fri, Apr 15, 2022 at 04:08:15PM +0800, Kewen.Lin via Gcc-patches wrote:
> > As PR105271 shows, __builtin_altivec_neg_v2di requires option
> > -mpower8-vector as its pattern expansion relies on subv2di which
> > has guard VECTOR_UNIT_P8_VECTOR_P (V2DImode).  This fix is to move
> > the related lines for __builtin_altivec_neg_v2di to the section
> > of stanza power8-vector.
> > 
> > Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> > powerpc64le-linux-gnu P9 and P10.
> > 
> > Is it ok for trunk?
> > 
> > BR,
> > Kewen
> > -
> > PR target/105271
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (NEG_V2DI): Move to [power8-vector]
> > stanza.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/pr105271.c: New test.
> 
> I'd like to ping this patch, one of the last few remaining P1s we have for
> GCC 12.

Heh.  I approved it this morning (off-list).  Kewen will commit it
soonish (somewhere during your night probably) :-)


Segher


Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches



On 26/04/2022 15:43, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi,

This patch disables epilogue vectorization when we are peeling for
alignment in the prologue and we can't guarantee the main vectorized
loop is entered.  This is to prevent executing vectorized code with an
unaligned access if the target has indicated it wants to peel for
alignment. We take this conservative approach as we currently do not
distinguish between peeling for alignment for correctness or for
performance.

A better codegen would be to make it skip to the scalar epilogue in case
the main loop isn't entered when alignment peeling is required. However,
that would require a more aggressive change to the codebase which we
chose to avoid at this point of development.  We can revisit this option
during stage 1 if we choose to.

Bootstrapped on aarch64-none-linux and regression tested on
aarch64-none-elf.

gcc/ChangeLog:

      PR tree-optimization/105219
      * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New
      function.
      (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment
      to determine whether to vectorize epilogue.
      * testsuite/gcc.target/aarch64/pr105219.c: New.
      * testsuite/gcc.target/aarch64/pr105219-2.c: New.
      * testsuite/gcc.target/aarch64/pr105219-3.c: New.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
new file mode 100644
index 
..c97d1dc100181b77af0766e08407e1e352f604fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */

I think this should be in gcc.dg/vect, with the options forced
for { target aarch64 }.

Are the skips necessary?  It looks like the test should work correctly
for all options/targets.
I guess the -mtune and -march skips aren't necessary, but if I drop the
-mcpu skip-if I have to drop the -march option from dg-options, as a
user-provided -mcpu might conflict with the -march and the test would fail.

+/* PR 105219.  */
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+data[i] = i;
+}
+
+int main()
+{
+  for (int start = 0; start < 16; ++start)
+for (int n = 1; n < 3*16; ++n)
+  {
+__builtin_memset (data, 0, sizeof (data));
+foo (&data[start], n);
+for (int j = 0; j < n; ++j)
+  if (data[start + j] != j)
+__builtin_abort ();
+  }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
new file mode 100644
index 
..444352fc051b787369f6f1be6236d1ff0fc2d392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model -fdump-tree-vect-all" } */
+/* PR 105219.  */
+int data[128];
+
+void foo (void)
+{
+  for (int i = 0; i < 9; ++i)
+data[i + 1] = i;
+}
+
+/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/gcc.target/aarch64/pr105219.c
new file mode 100644
index 
..bbdefb549f6a4e803852f69d20ce1ef9152a526c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a+sve" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=*" } { "-msve-vector-bits=128" } } */
+/* { dg-options "-O3 -march=armv8.2-a+sve -msve-vector-bits=128 -mtune=thunderx" } */

Same here.
Here the reason is even stronger: if the user provides a different
-msve-vector-bits, the test will fail at run time too (given we are
requesting 128-bit hardware).


Also, these were the conditions required for this test to fail; I could
of course leave this test out altogether and keep only richi's test.



+/* PR 105219.  */
+int a;
+char b[60];
+short c[18];
+short d[4][19];
+long long f;
+void e(int g, int h, short k[][19]) {
+  for (signed i = 0; i < 3;

[GCC 9 backport][committed] libphobos: Give _Unwind_Exception an alignment that best resembles __attribute__((aligned))

2022-04-26 Thread Iain Buclaw via Gcc-patches
Hi,

This patch backports r12-3986 to the GCC 9 branch.

For interoperability with C++ EH, the alignment should match, otherwise
D may not be able to intercept exceptions thrown from C++.

Bootstrapped and regression tested on x86_64-linux-gnu.

Regards,
Iain.

---
libphobos/ChangeLog:

* libdruntime/gcc/unwind/generic.d (__aligned__): Define.
(_Unwind_Exception): Align struct to __aligned__.

(cherry picked from commit efa5449a094d3887e124d400ff0410af2c745b2d)
---
 libphobos/libdruntime/gcc/unwind/generic.d | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/libphobos/libdruntime/gcc/unwind/generic.d b/libphobos/libdruntime/gcc/unwind/generic.d
index eefd90ce353..6508ee7c788 100644
--- a/libphobos/libdruntime/gcc/unwind/generic.d
+++ b/libphobos/libdruntime/gcc/unwind/generic.d
@@ -123,7 +123,27 @@ enum : _Unwind_Reason_Code
 // @@@ The IA-64 ABI says that this structure must be double-word aligned.
 // Taking that literally does not make much sense generically.  Instead we
 // provide the maximum alignment required by any type for the machine.
-struct _Unwind_Exception
+     version (ARM)      private enum __aligned__ = 8;
+else version (AArch64)  private enum __aligned__ = 16;
+else version (HPPA)     private enum __aligned__ = 8;
+else version (HPPA64)   private enum __aligned__ = 16;
+else version (MIPS_N32) private enum __aligned__ = 16;
+else version (MIPS_N64) private enum __aligned__ = 16;
+else version (MIPS32)   private enum __aligned__ = 8;
+else version (MIPS64)   private enum __aligned__ = 8;
+else version (PPC)      private enum __aligned__ = 16;
+else version (PPC64)    private enum __aligned__ = 16;
+else version (RISCV32)  private enum __aligned__ = 16;
+else version (RISCV64)  private enum __aligned__ = 16;
+else version (S390)     private enum __aligned__ = 8;
+else version (SPARC)    private enum __aligned__ = 8;
+else version (SPARC64)  private enum __aligned__ = 16;
+else version (SystemZ)  private enum __aligned__ = 8;
+else version (X86)      private enum __aligned__ = 16;
+else version (X86_64)   private enum __aligned__ = 16;
+else static assert( false, "Platform not supported.");
+
+align(__aligned__) struct _Unwind_Exception
 {
 _Unwind_Exception_Class exception_class;
 _Unwind_Exception_Cleanup_Fn exception_cleanup;
-- 
2.32.0
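On the C/C++ side, the struct this table has to match is declared with a bare `__attribute__((__aligned__))`. As far as I know, that declaration is in GCC's `unwind-generic.h`. GCC resolves the bare attribute to a fixed per-target "maximum" alignment. The sketch below shows one way to check that value on a given host or cross compiler, assuming GCC or Clang. The struct name and member types are hypothetical stand-ins, not the real `_Unwind_Exception` layout:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the C/C++ _Unwind_Exception. Only the trailing
// bare __aligned__ attribute matters here; the member types are illustrative.
struct demo_unwind_exception
{
  std::uint64_t exception_class;
  void (*exception_cleanup) (int, demo_unwind_exception *);
  std::uintptr_t private_1;
  std::uintptr_t private_2;
} __attribute__ ((__aligned__));

// The alignment the compiler picked for a bare __aligned__ on this target.
// The D table's __aligned__ entry for the same target should equal it.
constexpr std::size_t demo_unwind_alignment = alignof (demo_unwind_exception);

// A bare __aligned__ should never be weaker than the natural alignment.
static_assert (demo_unwind_alignment >= alignof (std::uint64_t),
               "bare __aligned__ should not lower the alignment");
```

Building this once per target (for example with `-m32` and `-m64`) and printing `demo_unwind_alignment` gives a number to compare against the matching `version` line.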



Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches

On 26/04/2022 16:12, Jakub Jelinek wrote:

On Tue, Apr 26, 2022 at 03:43:13PM +0100, Richard Sandiford via Gcc-patches 
wrote:

--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
+/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */

I think this should be in gcc.dg/vect, with the options forced
for { target aarch64 }.

I think not just aarch64, doesn't it need some effective target that
the HW on which it is tested is ARM v8.2-a compatible plus that binutils
can assemble v8.2-a instructions?
Sure, it can be done in gcc.dg/vect too if those effective targets
aren't defined in aarch64.exp.  But probably needs dg-additional-options
there instead of dg-options.

Jakub
For some reason I thought richi wasn't able to reproduce this on other 
targets, but from my last read of the PR I think he was... Regardless 
probably worth testing it for all targets for sure.
Question is how do I make it run for all targets but use target specific 
options for each to try and trigger the original issue? Multiple 
dg-additional-options with different target selectors?


Kind regards,
Andre



Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 26, 2022 at 04:45:14PM +0100, Andre Vieira (lists) wrote:
> For some reason I thought richi wasn't able to reproduce this on other
> targets, but from my last read of the PR I think he was... Regardless

Note, a test added as a generic test doesn't strictly need to fail before
the fix on all arches, or even on many of them.  When a test is not itself
target specific, it can be useful to run it on all targets: it will catch
the same bug regressing on the originally failing target, and sometimes
it catches other bugs on other targets (that has happened many times in
the past).

> probably worth testing it for all targets for sure.
> Question is how do I make it run for all targets but use target specific
> options for each to try and trigger the original issue? Multiple
> dg-additional-options with different target selectors?

Yes.  But they really need to be guarded also by effective targets
which guarantee hw support.
Say if you use /* { dg-additional-options "-mavx2" } */,
then it would need to be
/* { dg-additional-options "-mavx2" { target { { i?86-* x86_64-* } && avx2_runtime } } } */
or so where that effective target ensures both that assembler can assemble
avx2 instructions and that the hw it is tested on does support them too.
No idea about aarch64/arm effective targets.

Another way sometimes used is to place just normal test without magic
options into gcc.dg/vect/ , i.e. test that with whatever options user
configured gcc with or asks through RUNTESTFLAGS, and when needed add
gcc.target/*/ additional test that has extra dg-options and renames main
to something else and calls that only after checking hw capabilities.
grep ../../gcc.dg/vect testsuite/gcc.target/i386/*.c
for some examples.

Jakub
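The second approach Jakub describes is a plain `gcc.dg/vect` test plus a `gcc.target` wrapper that checks the hardware before calling the renamed `main`. It can be sketched as follows, assuming an x86 target and using GCC's `__builtin_cpu_init` and `__builtin_cpu_supports` builtins for the runtime check. The loop and both function names are hypothetical placeholders. The existing `gcc.target/i386` tests do this through shared headers such as `avx2-check.h`, which should be preferred over hand-rolled checks:

```cpp
// Hypothetical placeholder for the body of the generic gcc.dg/vect test,
// which would be called main there.
static int
vect_test_body ()
{
  int a[64], b[64];
  for (int i = 0; i < 64; i++)
    b[i] = i;
  // Simple vectorizable loop standing in for the real testcase.
  for (int i = 0; i < 64; i++)
    a[i] = b[i] * 2 + 1;
  for (int i = 0; i < 64; i++)
    if (a[i] != b[i] * 2 + 1)
      return 1;
  return 0;
}

// Runs the body only when the CPU supports the ISA that the extra options
// enable.  Returns 0 both on success and when the test is skipped because
// the hardware lacks support; non-zero means a miscompare.
static int
run_if_hw_supports ()
{
#if defined (__x86_64__) || defined (__i386__)
  __builtin_cpu_init ();
  if (!__builtin_cpu_supports ("avx2"))
    return 0;
#endif
  return vect_test_body ();
}
```

One caveat: if the whole file is compiled with `-mavx2`, the compiler may in principle emit AVX2 instructions before the check runs.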



Ping: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-26 Thread Michael Meissner via Gcc-patches
Ping patch.  The customer really needs this patch.  We need to apply it to the
trunk, and then I will have to refactor it for GCC 10, which the customer is
using.

| Date: Tue, 12 Apr 2022 21:14:55 -0400
| From: Michael Meissner 
| Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593153.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


RE: [PATCH] avr: add support for tinyAVR 2 family

2022-04-26 Thread Joel Holdsworth via Gcc-patches
> From: Gcc-patches  bounces+jholdsworth=nvidia@gcc.gnu.org> On Behalf Of Torsten Duwe
> via Gcc-patches
> Sent: Tuesday, April 26, 2022 4:00 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] avr: add support for tinyAVR 2 family

Note, I also submitted a patch to add support for the AVR-DA and AVR-DB 
families here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592668.html

No response so far.

Best Regards
Joel Holdsworth


Re: [PATCH] ppc: testsuite: float128-hw{,4}.c need -mlong-double-128

2022-04-26 Thread Alexandre Oliva via Gcc-patches
On Apr 26, 2022, Segher Boessenkool  wrote:

> The testcase uses _Float128, what code that
> generates should not depend on your long double setting.

Good, that means my hunch that it shouldn't is on the right track.

> Please file a PR instead?

I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105359 the other
day, how does that look?

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Mikael Morin

Le 26/04/2022 à 15:32, Jakub Jelinek a écrit :

On Tue, Apr 26, 2022 at 03:22:08PM +0200, Tobias Burnus wrote:

LGTM - however:

On 26.04.22 14:38, Mikael Morin wrote:

--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3698,7 +3698,8 @@ non_negative_strides_array_p (tree expr)
 if (DECL_P (expr)
 && DECL_LANG_SPECIFIC (expr))
   if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
-  return non_negative_strides_array_p (orig_decl);
+  if (orig_decl != expr)
+ return non_negative_strides_array_p (orig_decl);


Is the if()if()if() cascade really needed? I can see why an extra 'if' is
preferred for the variable declaration of orig_decl, but can't we at least
put the new 'orig_decl != expr' with an '&&' into the same if as the
declaration, or into the second if? Besides being clearer, it also avoids
further indenting the return line.


I think we can't in C++11/C++14.  The options are: if orig_decl is declared
earlier, it can be
  tree orig_decl;
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && (orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
      && orig_decl != expr)
    return non_negative_strides_array_p (orig_decl);
but I think this is generally frowned upon,
or one can repeat it like:
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));


I think I’ll use that.  There are numerous places where macros are 
repeated like this already and everybody seems to be pleased with it.

Thanks for the feedback, and for the suggestions.


or what Mikael wrote, perhaps with the && on one line:
  if (DECL_P (expr) && DECL_LANG_SPECIFIC (expr))
    if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
      if (orig_decl != expr)
        return non_negative_strides_array_p (orig_decl);
In C++17 and later one can write:
  if (DECL_P (expr) && DECL_LANG_SPECIFIC (expr))
    if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr);
        orig_decl && orig_decl != expr)
      return non_negative_strides_array_p (orig_decl);

Jakub
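The failure mode and both spellings of the guard can be modelled outside GCC. In this sketch, `demo_decl` and its `saved_descriptor` field are hypothetical stand-ins for a decl tree and `GFC_DECL_SAVED_DESCRIPTOR`. The `leaf_non_negative` flag is a placeholder for the real predicate's other checks. The second function needs `-std=c++17` for the init-statement form:

```cpp
// Hypothetical stand-in for a decl: saved_descriptor plays the role of
// GFC_DECL_SAVED_DESCRIPTOR and may point back at the decl itself.
struct demo_decl
{
  bool has_lang_specific;
  const demo_decl *saved_descriptor;
  bool leaf_non_negative;  // placeholder for the predicate's other checks
};

// C++11/C++14 spelling: repeat the accessor instead of declaring a variable.
bool
non_negative_strides_cxx11 (const demo_decl *d)
{
  if (d->has_lang_specific
      && d->saved_descriptor
      && d->saved_descriptor != d)  // without this, a self-reference recurses forever
    return non_negative_strides_cxx11 (d->saved_descriptor);
  return d->leaf_non_negative;
}

// C++17 spelling: declare orig_decl in the if's init-statement.
bool
non_negative_strides_cxx17 (const demo_decl *d)
{
  if (d->has_lang_specific)
    if (const demo_decl *orig_decl = d->saved_descriptor;
        orig_decl && orig_decl != d)
      return non_negative_strides_cxx17 (orig_decl);
  return d->leaf_non_negative;
}
```

A decl whose `saved_descriptor` points to itself now terminates. A local decl that points to a different argument decl still recurses into it.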





Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])

2022-04-26 Thread Thomas Schwinge
Hi!

On 2022-04-25T23:19:26+0200, I wrote:
> On 2022-04-20T19:06:17+0200, Jakub Jelinek  wrote:
>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>> for which that function is invoked need an extra artificial BIND_EXPR
>> around their body so that we move all variables of the bodies.
>>
>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>
>> The following patch does that for OpenACC constructs too.
>> This fixes ICE on the attached testcase.
>
> ACK, thanks.

>> Unfortunately, it also regresses
>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90   -O  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
>> Those tests emit tons of various messages and now there are some extra ones,
>
> I've fixed these up.

One more issue became apparent, where the code changes pushed actually do
lead to a GCN offloading compilation failure:

[...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
[...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
   13 | !$acc parallel
      |              ^

In my configuration, I may indeed fix GCN offloading compilation with
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
think that's generally correct/sufficient, so in the attached
"Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
[PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
value".  This avoids having to route the actual 'sizeof' from GCC build
down to the test suite harness (which ought to be doable, but
non-trivial).  OK to push that:

+! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
+! (see below), we run into a 'gang-private data-share memory exhausted'
+! error: the default '-mgang-private-size' is too small.  Per
+! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
+! 'struct st_parameter_dt', which indeed is rather big.  Instead of
+! working out its exact size (which may vary per GCC configuration),
+! raise '-mgang-private-size' to an arbitrary high value.
+! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }

... to master branch? (This doubles the use/testing of the
'-mgang-private-size' option!)  ;-)

We've currently not been doing OpenACC privatization scanning in
'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
document the issue; no need to review that.

Of course, the issue could alternatively be fixed by adding more logic to
the GCN back end to auto-scale the allocation, or be fixed by adding more
logic to the compiler to avoid gang-privatizing variables such as
'dt_parm.N' in such cases, but that's not something I'm going to look
into at this point.

Or, of course, be avoided by re-writing the test case to not require
gang-privatizing 'dt_parm.N', but the test case is correct as it is.


Grüße
 Thomas


>   PR fortran/104717
>   gcc/fortran/
>   * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
>   in an extra BIND_EXPR.

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -,7 +,9 @@ gfc_trans_oacc_construct (gfc_code *code)
>gfc_start_block (&block);
>oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>   code->loc, false, true);
> +  pushlevel ();
>stmt = gfc_trans_omp_code (code->block->next, true);
> +  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
>stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
>void_type_node, stmt, oacc_clauses);
>gfc_add_expr_to_block (&block, stmt);


-
Siemens Electronic Design Aut

Re: [PATCH] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Apr 26, 2022 at 07:12:13PM +0200, Mikael Morin wrote:
> > I think we can't in C++11/C++14.  The options can be if orig_decl would be 
> > declared
> > earlier, then it can be
> >  tree orig_decl;
> >  if (DECL_P (expr)
> > && DECL_LANG_SPECIFIC (expr)
> > && (orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
> > && orig_decl != expr)
> >return non_negative_strides_array_p (orig_decl);
> > but I think this is generally frowned upon,
> > or one can repeat it like:
> >  if (DECL_P (expr)
> > && DECL_LANG_SPECIFIC (expr)
> > && GFC_DECL_SAVED_DESCRIPTOR (expr)
> > && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
> >return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR 
> > (expr));
> 
> I think I’ll use that.  There are numerous places where macros are repeated
> like this already and everybody seems to be pleased with it.
> Thanks for the feedback, and for the suggestions.

Agreed; in this case GFC_DECL_SAVED_DESCRIPTOR is really just a dereference,
at least in the release compiler.  Doing that when the macro actually calls
some functions is worse.

Jakub



Re: [x86 PATCH] PR target/92578: Peephole2s to tweak cmove register allocation.

2022-04-26 Thread Uros Bizjak via Gcc-patches
On Mon, Apr 25, 2022 at 1:16 PM Roger Sayle  wrote:
>
>
> This patch addresses a (minor) missed-optimization regression revealed
> by Richard Biener's example/variant in comment #1 of PR target/92578.
>
> int foo(int moves, int movecnt, int komove) {
> int newcnt = movecnt;
> if (moves == komove)
> newcnt -= 2;
> return newcnt;
> }
>
> Comparing code generation on godbolt.org shows an interesting evolution
> over time, as changes in register allocation affect the cmove sequence.
>
> GCC 4.1.2 (4 instructions, suboptimal mov after cmov).
>         leal    -2(%rsi), %eax
>         cmpl    %edx, %edi
>         cmove   %eax, %esi
>         movl    %esi, %eax
>
> GCC 4.4-4.7 (3 instructions, optimal)
>         leal    -2(%rsi), %eax
>         cmpl    %edx, %edi
>         cmovne  %esi, %eax
>
> GCC 5-7 (4 instructions, suboptimal mov before cmov)
>         leal    -2(%rsi), %ecx
>         movl    %esi, %eax
>         cmpl    %edx, %edi
>         cmove   %ecx, %eax
>
> GCC 8 (4 instructions, suboptimal mov before cmov, reordered)
>         movl    %esi, %eax
>         leal    -2(%rsi), %ecx
>         cmpl    %edx, %edi
>         cmove   %ecx, %eax
>
> GCC 9-trunk (5 instructions, two suboptimal movs before cmov)
>         movl    %edx, %ecx
>         movl    %esi, %eax
>         leal    -2(%rsi), %edx
>         cmpl    %ecx, %edi
>         cmove   %edx, %eax
>
> The challenge is that x86's two-operand conditional moves, which require
> the destination to be one of the (register) sources, are tricky for reload,
> whose heuristics unify pseudos early (greedily?).  In this case, we have
> the equivalent of "pseudo1 = cond ? pseudo2 : expression", and we'd like
> to see "pseudo1 = expression; pseudo1 = cond ? pseudo1 : pseudo2", but
> alas reload (currently and quite reasonably) prefers to place pseudo1 and
> pseudo2 in the same hard register if possible.  Hence the solution is to
> fixup/tweak the register allocation during peephole2, as previously with
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575998.html
>
> Instead of a single peephole2 to catch just the current idiom (last above),
> I've added the four peephole2s that would catch each of the (historical)
> suboptimal variants above and transform them into the ideal 3 insn form.
> Instrumenting the compiler shows, for example, that the (earliest) movl
> after cmov triggers over 50 times during stage2 of a GCC bootstrap.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?  Or if this regression isn't serious
> enough for stage4 (or these patterns considered too risky), for stage1
> when it reopens?  I suspect the poor interaction between cmove usage
> and register allocation is one source of confusion when comparing code
> generation with vs. without cmove (the other major source of confusion
> being that well-predicted branches are free, but that prediction-quality
> is poorly predictable).
>
>
> 2022-04-25  Roger Sayle  
>
> gcc/ChangeLog
> PR target/92578
> * config/i386/i386.md (peephole2): Eliminate register-to-register
> moves by inverting the condition of a conditional move.
>
> gcc/testsuite/ChangeLog
> PR target/92578
> * gcc.target/i386/pr92758.c: New test case.
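The ideal form computes `movecnt - 2` first and then overwrites it under the inverted condition. This gives the same value as the source function. The C++ sketch below checks that equivalence. `foo_inverted` is a hypothetical hand-lowering of the ideal GCC 4.4-4.7 sequence, not compiler output, and it assumes `movecnt - 2` does not overflow:

```cpp
// The function from the PR, written as in the source.
int
foo_source (int moves, int movecnt, int komove)
{
  int newcnt = movecnt;
  if (moves == komove)
    newcnt -= 2;
  return newcnt;
}

// Hand-lowered mirror of the ideal sequence:
//   leal -2(%rsi), %eax ; cmpl %edx, %edi ; cmovne %esi, %eax
int
foo_inverted (int moves, int movecnt, int komove)
{
  int result = movecnt - 2;  // leal: compute the "equal" value first
  if (moves != komove)       // cmovne: the inverted condition
    result = movecnt;        // overwrite with the untouched operand
  return result;
}
```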

+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#3).
+;; cmov r0,r1; mov r1,r0 -> cmov r1,r0
+(define_peephole2
+ [(set (match_operand:SWI248 0 "general_reg_operand")
+   (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)])
+(match_operand:SWI248 2 "general_reg_operand")
+(match_operand:SWI248 3 "general_reg_operand")))
+  (set (match_operand:SWI248 4 "general_reg_operand")
+   (match_dup 0))]
+  "TARGET_CMOVE
+   && ((REGNO (operands[0]) == REGNO (operands[2])
+&& REGNO (operands[3]) == REGNO (operands[4]))
+   || (REGNO (operands[0]) == REGNO (operands[3])
+   && REGNO (operands[2]) == REGNO (operands[4])))
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4) (if_then_else:SWI248 (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)))])

We have a valid cmov insn here, so no need to match operand 0 with 2
or 3. But it doesn't hurt to have some extra level of safety. OTOH,
splitting this pattern to two and using (match_dup X) instead would be
IMO more comprehensible.
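The semantics of peephole #3 can be checked with a small register-file model. This is a hypothetical sketch, not GCC code:

- Registers are array slots.
- `cmov_then_mov` models the two-insn input and `single_cmov` models the replacement.
- `peephole_applies` mirrors the REGNO conditions but not `TARGET_CMOVE` or the deadness check.

When operand 0 is dead afterwards, the two agree on every live register. As I read it, the REGNO matching is what keeps the replacement a valid two-operand cmov, since the destination must be one of the sources:

```cpp
#include <array>

using demo_regs = std::array<long, 4>;

// Input of the peephole: op0 = cond ? op2 : op3; op4 = op0;
demo_regs
cmov_then_mov (demo_regs r, bool cond, int op0, int op2, int op3, int op4)
{
  r[op0] = cond ? r[op2] : r[op3];
  r[op4] = r[op0];
  return r;
}

// Replacement: op4 = cond ? op2 : op3;
demo_regs
single_cmov (demo_regs r, bool cond, int op2, int op3, int op4)
{
  r[op4] = cond ? r[op2] : r[op3];
  return r;
}

// Mirrors the REGNO part of the insn condition: op4 must be one of the
// sources so that the replacement is still a two-operand cmov.
bool
peephole_applies (int op0, int op2, int op3, int op4)
{
  return (op0 == op2 && op3 == op4) || (op0 == op3 && op2 == op4);
}

// Compares two register files, ignoring op0 unless it is also the
// destination, because peep2_reg_dead_p says nothing reads op0 afterwards.
bool
same_live_state (const demo_regs &a, const demo_regs &b, int op0, int op4)
{
  for (int i = 0; i < (int) a.size (); i++)
    if ((i != op0 || i == op4) && a[i] != b[i])
      return false;
  return true;
}
```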

+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#5).
+;; mov x,r0; mov r1,r2; cmp; cmov r0,r2 -> mov x,r2; cmp; cmov r1,r2
+(define_peephole2
+ [(set (match_operand:SWI248 0 "general_reg_operand")
+   (match_operand:SWI248 1))

You probably want the "general_gr_operand" predicate here. Otherwise,
LEA also fits here, which is probably not what you intended.

I think that we can apply the first pattern (which probably represents
90% of matches), which l

Re: [gcov v2 14/14] gcov: Add section for freestanding environments

2022-04-26 Thread Sebastian Huber

On 26.04.22 15:53, Martin Liška wrote:

This is fine, except for 2 places where you have trailing whitespace
at the end of a line.


Thanks for the review.

Should I use "-ftest-coverage -fprofile-arcs" or "--coverage" in the 
tutorial?


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])

2022-04-26 Thread Thomas Schwinge
Hi!

On 2022-04-26T19:25:31+0200, I wrote:
> On 2022-04-25T23:19:26+0200, I wrote:
>> On 2022-04-20T19:06:17+0200, Jakub Jelinek  wrote:
>>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>>> for which that function is invoked need an extra artificial BIND_EXPR
>>> around their body so that we move all variables of the bodies.
>>>
>>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>>
>>> The following patch does that for OpenACC constructs too.
>>> This fixes ICE on the attached testcase.
>>
>> ACK, thanks.
>
>>> Unfortunately, it also regresses
>>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90   -O  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
>>> Those tests emit tons of various messages and now there are some extra ones,
>>
>> I've fixed these up.
>
> One more issue became apparent, where the code changes pushed actually do
> lead to a GCN offloading compilation failure:
>
> [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
> [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
>    13 | !$acc parallel
>       |              ^
>
> In my configuration, I may indeed fix GCN offloading compilation with
> '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
> think that's generally correct/sufficient, so in the attached
> "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
> [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
> value".  This avoids having to route the actual 'sizeof' from GCC build
> down to the test suite harness (which ought to be doable, but
> non-trivial).  OK to push that:
>
> +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
> +! (see below), we run into a 'gang-private data-share memory exhausted'
> +! error: the default '-mgang-private-size' is too small.  Per
> +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
> +! 'struct st_parameter_dt', which indeed is rather big.  Instead of
> +! working out its exact size (which may vary per GCC configuration),
> +! raise '-mgang-private-size' to an arbitrary high value.
> +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
>
> ... to master branch? (This doubles the use/testing of the
> '-mgang-private-size' option!)  ;-)

Eh.  That only works with the default GCN multilib '-march=fiji', testing
on gfx803 amdfury2 system.  For all of '-march=gfx900' (amdnano2),
'-march=gfx906' (amd_ryzen3), '-march=gfx908' (amd-instinct1), I get:

libgomp: GCN fatal error: Asynchronous queue error
Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.

..., and I still get that if lowering the allocation to the minimum,
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'.

This is a really simple OpenACC 'parallel' construct:

!$acc parallel
  write (0, '("The answer is ", I2)') var
!$acc end parallel

..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so
I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in
fact) is not a problem?

Help?


Grüße
 Thomas


> We've currently not been doing OpenACC privatization scanning in
> 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
> document the issue; no need to review that.
>
> Of course, the issue could alternatively be fixed by adding more logic to
> the GCN back end to auto-scale the allocation, or be fixed by adding more
> logic to the compiler to avoid gang-privatizing variables such as
> 'dt_parm.N' in such cases, but that's not somet

[PATCH] MAINTAINERS: Update email address

2022-04-26 Thread Joel Sherrill
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 15973503722..847df62a934 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -145,7 +145,7 @@ solaris Rainer Orth 

 netbsd Jason Thorpe
 netbsd Krister Walfridsson 
 sh-linux-gnu   Kaz Kojima  
-RTEMS PortsJoel Sherrill   
+RTEMS PortsJoel Sherrill   
 RTEMS PortsRalf Corsepius  
 RTEMS PortsSebastian Huber 

 VMSDouglas Rupp
-- 
2.24.4



[PATCH] fortran: Compare non-constant bound expressions. [PR105379]

2022-04-26 Thread Mikael Morin

Hello,

this fixes a regression caused by my recent PR103662 patch.

Regression tested on x86_64-pc-linux-gnu. OK for master?

From d7309a471c42e51e84c37d5d4a3fd5bb0ed67405 Mon Sep 17 00:00:00 2001
From: Mikael Morin 
Date: Mon, 25 Apr 2022 19:47:04 +0200
Subject: [PATCH] fortran: Compare non-constant bound expressions. [PR105379]

Starting with r12-8235-gfa5cd7102da676dcb1757b1288421f5f3439ae0e,
class descriptor types are compared to detect duplicate declarations.

This caused ICEs as the comparison of array spec supported only constant
explicit bounds, but dummy class variable descriptor types can have a
_data field with non-constant array spec bounds.

This change adds support for non-constant bounds.  For that,
gfc_dep_compare_expr is used.  It does probably more than strictly
necessary, but using it avoids rewriting a specific comparison function,
making mistakes and forgetting cases.

	PR fortran/103662
	PR fortran/105379

gcc/fortran/ChangeLog:

	* array.cc (compare_bounds): Use bool as return type.
	Support non-constant expressions.
	(gfc_compare_array_spec): Update call to compare_bounds.

gcc/testsuite/ChangeLog:

	* gfortran.dg/class_dummy_8.f90: New test.
	* gfortran.dg/class_dummy_9.f90: New test.
---
 gcc/fortran/array.cc| 27 -
 gcc/testsuite/gfortran.dg/class_dummy_8.f90 | 20 +++
 gcc/testsuite/gfortran.dg/class_dummy_9.f90 | 20 +++
 3 files changed, 56 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/class_dummy_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/class_dummy_9.f90

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index 90ea812d699..bbdb5b392fc 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -957,23 +957,28 @@ gfc_copy_array_spec (gfc_array_spec *src)
 }
 
 
-/* Returns nonzero if the two expressions are equal.  Only handles integer
-   constants.  */
+/* Returns nonzero if the two expressions are equal.
+   We should not need to support more than constant values, as that’s what is
+   allowed in derived type component array spec.  However, we may create types
+   with non-constant array spec for dummy variable class container types, for
+   which the _data component holds the array spec of the variable declaration.
+   So we have to support non-constant bounds as well.  */
 
-static int
+static bool
 compare_bounds (gfc_expr *bound1, gfc_expr *bound2)
 {
   if (bound1 == NULL || bound2 == NULL
-  || bound1->expr_type != EXPR_CONSTANT
-  || bound2->expr_type != EXPR_CONSTANT
   || bound1->ts.type != BT_INTEGER
   || bound2->ts.type != BT_INTEGER)
 gfc_internal_error ("gfc_compare_array_spec(): Array spec clobbered");
 
-  if (mpz_cmp (bound1->value.integer, bound2->value.integer) == 0)
-return 1;
-  else
-return 0;
+  /* What qualifies as identical bounds?  We could probably just check that the
+ expressions are exact clones.  We avoid rewriting a specific comparison
+ function and re-use instead the rather involved gfc_dep_compare_expr which
+ is just a bit more permissive, as it can also detect identical values for
+ some mismatching expressions (extra parentheses, swapped operands, unary
+ plus, etc).  It probably only makes a difference in corner cases.  */
+  return gfc_dep_compare_expr (bound1, bound2) == 0;
 }
 
 
@@ -1006,10 +1011,10 @@ gfc_compare_array_spec (gfc_array_spec *as1, gfc_array_spec *as2)
   if (as1->type == AS_EXPLICIT)
 for (i = 0; i < as1->rank + as1->corank; i++)
   {
-	if (compare_bounds (as1->lower[i], as2->lower[i]) == 0)
+	if (!compare_bounds (as1->lower[i], as2->lower[i]))
 	  return 0;
 
-	if (compare_bounds (as1->upper[i], as2->upper[i]) == 0)
+	if (!compare_bounds (as1->upper[i], as2->upper[i]))
 	  return 0;
   }
 
diff --git a/gcc/testsuite/gfortran.dg/class_dummy_8.f90 b/gcc/testsuite/gfortran.dg/class_dummy_8.f90
new file mode 100644
index 000..0976a725866
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_dummy_8.f90
@@ -0,0 +1,20 @@
+! { dg-do compile }
+!
+! PR fortran/105379
+! Type comparison of class containers used to trigger an ICE when one of the
+! class containers had a non-constant array spec.
+!
+! Contributed by Gerhard Steinmetz .
+
+program p
+   type t
+   end type
+contains
+   subroutine s1(x)
+  class(t) :: x(3)
+   end
+   subroutine s2(n, x)
+  integer :: n
+  class(t) :: x(n)
+   end
+end
diff --git a/gcc/testsuite/gfortran.dg/class_dummy_9.f90 b/gcc/testsuite/gfortran.dg/class_dummy_9.f90
new file mode 100644
index 000..0fd98c05be2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_dummy_9.f90
@@ -0,0 +1,20 @@
+! { dg-do compile }
+!
+! PR fortran/105379
+! Type comparison of class containers used to trigger an ICE when one of the
+! class containers had a non-constant array spec.
+!
+! Contributed by Gerhard Steinmetz .
+
+program p
+   type t
+   end type
+   integer :: m = 3
+contains
+   subroutine s1(x)
+

Re: [PATCH] fortran: Compare non-constant bound expressions. [PR105379]

2022-04-26 Thread Thomas Koenig via Gcc-patches



Hi Mikael,


this fixes a regression caused by my recent PR103662 patch.

Regression tested on x86_64-pc-linux-gnu. OK for master?


OK.  Good to see that a bit of optimization can also sneak in with
a bug fix :-)

Best regards

Thomas
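Mikael's comment in `compare_bounds` notes that `gfc_dep_compare_expr` is more permissive than a check for exact clones: it tolerates extra parentheses, swapped operands and unary plus. The C++ sketch below shows that kind of structural comparison on a toy expression tree. `demo_expr`, `mk`, `mk_const`, `mk_var` and `demo_same_value` are all hypothetical and far simpler than gfortran's `gfc_expr` and `gfc_dep_compare_expr`:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression tree, not gfortran's gfc_expr.
struct demo_expr
{
  enum kind { CONSTANT, VARIABLE, PAREN, UPLUS, ADD, MUL } k;
  long value = 0;                   // used by CONSTANT
  std::string name;                 // used by VARIABLE
  std::shared_ptr<demo_expr> a, b;  // operands
};
using demo_ptr = std::shared_ptr<demo_expr>;

demo_ptr
mk (demo_expr::kind k, demo_ptr a = nullptr, demo_ptr b = nullptr)
{
  auto e = std::make_shared<demo_expr> ();
  e->k = k;
  e->a = std::move (a);
  e->b = std::move (b);
  return e;
}

demo_ptr mk_const (long v) { auto e = mk (demo_expr::CONSTANT); e->value = v; return e; }
demo_ptr mk_var (const std::string &n) { auto e = mk (demo_expr::VARIABLE); e->name = n; return e; }

// Skips nodes that do not change the value: parentheses and unary plus.
static const demo_expr *
demo_strip (const demo_expr *e)
{
  while (e->k == demo_expr::PAREN || e->k == demo_expr::UPLUS)
    e = e->a.get ();
  return e;
}

// True if both expressions are known to have the same value.  False means
// "not known to be equal", which a caller like compare_bounds would then
// treat as different bounds.
bool
demo_same_value (const demo_expr *e1, const demo_expr *e2)
{
  e1 = demo_strip (e1);
  e2 = demo_strip (e2);
  if (e1->k != e2->k)
    return false;
  switch (e1->k)
    {
    case demo_expr::CONSTANT:
      return e1->value == e2->value;
    case demo_expr::VARIABLE:
      return e1->name == e2->name;
    case demo_expr::ADD:
    case demo_expr::MUL:
      // Commutative operators: accept the operands in either order.
      return ((demo_same_value (e1->a.get (), e2->a.get ())
               && demo_same_value (e1->b.get (), e2->b.get ()))
              || (demo_same_value (e1->a.get (), e2->b.get ())
                  && demo_same_value (e1->b.get (), e2->a.get ())));
    default:
      return false;
    }
}
```

With this, `(n + 1)` and `1 + +n` compare equal, while `n` and `m` do not.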


[PATCH v2] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Mikael Morin

Le 26/04/2022 à 19:12, Mikael Morin a écrit :

Le 26/04/2022 à 15:32, Jakub Jelinek a écrit :

or one can repeat it like:
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));


I think I’ll use that. 


Here it comes.
Regression tested again. OK?
From 9da696478832bb3fe5ac25542ad9226ce3235368 Mon Sep 17 00:00:00 2001
From: Mikael Morin 
Date: Tue, 26 Apr 2022 13:05:32 +0200
Subject: [PATCH v2] fortran: Avoid infinite self-recursion [PR105381]

Dummy array decls are local decls different from the argument decl
accessible through GFC_DECL_SAVED_DESCRIPTOR.  If the argument decl has
a DECL_LANG_SPECIFIC set, it is copied over to the local decl at the
time the latter is created, so that the DECL_LANG_SPECIFIC object is
shared between local dummy decl and argument decl, and thus the
GFC_DECL_SAVED_DESCRIPTOR of the argument decl is the argument decl
itself.

The r12-8230-g7964ab6c364c410c34efe7ca2eba797d36525349 change introduced
the non_negative_strides_array_p predicate which recurses through
GFC_DECL_SAVED_DESCRIPTOR to avoid seeing dummy decls as purely local
decls.  As the GFC_DECL_SAVED_DESCRIPTOR of the argument decl is itself,
this can cause infinite recursion.

This change adds a check to avoid infinite recursion.

	PR fortran/102043
	PR fortran/105381

gcc/fortran/ChangeLog:

	* trans-array.cc (non_negative_strides_array_p): Inline variable
	orig_decl and merge nested if conditions.  Add condition to not
	recurse if the next argument is the same as the current.

gcc/testsuite/ChangeLog:

	* gfortran.dg/character_array_dummy_1.f90: New test.
---
 gcc/fortran/trans-array.cc|  7 ---
 .../gfortran.dg/character_array_dummy_1.f90   | 21 +++
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/character_array_dummy_1.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e4b6270ccf8..05134952db4 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3696,9 +3696,10 @@ non_negative_strides_array_p (tree expr)
   /* If the array was originally a dummy with a descriptor, strides can be
  negative.  */
   if (DECL_P (expr)
-  && DECL_LANG_SPECIFIC (expr))
-if (tree orig_decl = GFC_DECL_SAVED_DESCRIPTOR (expr))
-  return non_negative_strides_array_p (orig_decl);
+  && DECL_LANG_SPECIFIC (expr)
+  && GFC_DECL_SAVED_DESCRIPTOR (expr)
+  && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
+return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));
 
   return true;
 }
diff --git a/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90 b/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90
new file mode 100644
index 000..da5ed636f4f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/character_array_dummy_1.f90
@@ -0,0 +1,21 @@
+! { dg-do compile }
+!
+! PR fortran/105381
+! Infinite recursion with array references of character dummy arguments.
+!
+! Contributed by Harald Anlauf 
+
+MODULE m
+  implicit none
+  integer,  parameter :: ncrit  =  8
+  integer,  parameter :: nterm  =  7
+contains
+
+  subroutine new_thin_rule (rule1)
+character(*),intent(in) ,optional :: rule1(ncrit)
+character(len=8) :: rules (ncrit,nterm)
+rules = ''
+if (present (rule1)) rules(:,1) = rule1  ! <-- compile time hog
+  end subroutine new_thin_rule
+
+end module m
-- 
2.35.1
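The self-recursion guard in the v2 patch can be illustrated outside the compiler. The following is a hypothetical C sketch, loosely modeled on non_negative_strides_array_p but not GCC code: `struct decl` and its `saved` field are made-up stand-ins for a decl and its GFC_DECL_SAVED_DESCRIPTOR, which may point back at the node itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Fortran array decl.  SAVED plays the role
   of GFC_DECL_SAVED_DESCRIPTOR and may legitimately point back at the
   node itself (the shared DECL_LANG_SPECIFIC case).  */
struct decl
{
  const struct decl *saved;
  bool is_descriptor;   /* descriptor-based: strides may be negative */
};

static bool
non_negative_strides (const struct decl *d)
{
  assert (d != NULL);

  /* Conservatively assume a descriptor-based array may have negative
     strides.  */
  if (d->is_descriptor)
    return false;

  /* Follow the saved-descriptor link, but not when it is a self-link;
     without the second test a self-referencing node recurses forever.  */
  if (d->saved != NULL && d->saved != d)
    return non_negative_strides (d->saved);

  return true;
}
```

A check of this shape only catches a direct self-link, which is the situation described in the commit message; a longer cycle (A pointing to B pointing back to A) would need something like a visited set.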



Re: [PATCH v2] fortran: Avoid infinite self-recursion [PR105381]

2022-04-26 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 26.04.22 um 21:10 schrieb Mikael Morin:

Le 26/04/2022 à 19:12, Mikael Morin a écrit :

Le 26/04/2022 à 15:32, Jakub Jelinek a écrit :

or one can repeat it like:
  if (DECL_P (expr)
      && DECL_LANG_SPECIFIC (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr)
      && GFC_DECL_SAVED_DESCRIPTOR (expr) != expr)
    return non_negative_strides_array_p (GFC_DECL_SAVED_DESCRIPTOR (expr));


I think I’ll use that.


Here it comes.
Regression tested again. OK?


works for me.

Thanks for the quick fix!

Harald


[PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]

2022-04-26 Thread Pat Haugen via Gcc-patches

Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat


2022-04-26  Pat Haugen  

PR testsuite/99685

gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.


diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index f06c692..9d18607 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode,
{
  cum->vregno += n_elts;

+ /* If we are not splitting Complex IEEE 128-bit args then account for
+the fact that they are passed in 2 VSX regs. */
+ if (! targetm.calls.split_complex_arg && type
+ && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode)
+   cum->vregno++;
+
  if (!TARGET_ALTIVEC)
error ("cannot pass argument in vector register because"
   " altivec instructions are disabled, use %qs"


[pushed] c++: pack init-capture of unresolved overload [PR102629]

2022-04-26 Thread Jason Merrill via Gcc-patches
Here we were failing to diagnose that the initializer for the capture pack
is an unresolved overload.  It turns out that the reason we didn't recognize
the deduction failure in do_auto_deduction was that the individual 'auto' in
the expansion of the capture pack was still marked as a parameter pack, so
we were deducing it to an empty pack instead of failing.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/102629

gcc/cp/ChangeLog:

* pt.cc (gen_elem_of_pack_expansion_instantiation): Clear
TEMPLATE_TYPE_PARAMETER_PACK on auto.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-pack-init7.C: New test.
---
 gcc/cp/pt.cc   |  8 +++-
 gcc/testsuite/g++.dg/cpp2a/lambda-pack-init7.C | 18 ++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-pack-init7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a77b3166818..3cf1d7af8d2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -12682,7 +12682,13 @@ gen_elem_of_pack_expansion_instantiation (tree pattern,
 t = tsubst_expr (pattern, args, complain, in_decl,
 /*integral_constant_expression_p=*/false);
   else
-t = tsubst (pattern, args, complain, in_decl);
+{
+  t = tsubst (pattern, args, complain, in_decl);
+  if (is_auto (t) && !ith_elem_is_expansion)
+   /* When expanding the fake auto... pack expansion from add_capture, we
+  need to mark that the expansion is no longer a pack.  */
+   TEMPLATE_TYPE_PARAMETER_PACK (t) = false;
+}
 
   /*  If the Ith argument pack element is a pack expansion, then
   the Ith element resulting from the substituting is going to
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-pack-init7.C b/gcc/testsuite/g++.dg/cpp2a/lambda-pack-init7.C
new file mode 100644
index 000..f3c3899e97a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-pack-init7.C
@@ -0,0 +1,18 @@
+// PR c++/102629
+// { dg-do compile { target c++20 } }
+
+template  T&& forward(T&);
+template  T&& forward(T&&);
+
+struct S {};
+
+template 
+void foo(Args&&... args) {
+  [...args = forward /*(args)*/] { // { dg-error "" }
+[](auto...) { } (forward(args)...);
+  };
+}
+
+void bar( ) {
+  foo(S{});
+}

base-commit: 7d31c678d68d7b6820a958584619ca763b0eb9c5
-- 
2.27.0



[r12-8175 Regression] FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2b (test for excess errors) on Linux/x86_64

2022-04-26 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

a54137c88061c7495728fc6b8dfd0474e812b2cb is the first bad commit
commit a54137c88061c7495728fc6b8dfd0474e812b2cb
Author: Patrick Palka 
Date:   Fri Apr 15 09:34:09 2022 -0400

libstdc++: Optimize integer std::from_chars

caused

FAIL: g++.dg/modules/xtreme-header-4_a.H module-cmi  (gcm.cache/$srcdir/g++.dg/modules/xtreme-header-4_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++17 (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++2a (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++2b (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header-4_a.H -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H module-cmi  (gcm.cache/$srcdir/g++.dg/modules/xtreme-header_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++17 (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2a (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2b (internal compiler error: in insert, at cp/module.cc:4800)
FAIL: g++.dg/modules/xtreme-header_a.H -std=c++2b (test for excess errors)

with GCC configured with

../../gcc/configure \
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-8175/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-4_a.H --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-4_a.H --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-4_a.H --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-4_a.H --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header_a.H --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header_a.H --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header_a.H --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header_a.H --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


[PATCH] asan: Fix up asan_redzone_buffer::emit_redzone_byte [PR105396]

2022-04-26 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase, we have in main's frame 3 variables,
some red zone padding, 4 byte d, followed by 12 bytes of red zone padding, then
8 byte b followed by 24 bytes of red zone padding, then 40 bytes c followed
by some red zone padding.
The intended content of shadow memory for that is (note, each byte describes
8 bytes of memory):
f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
left red    d  mr b  middle r c              right red zone

f1 is left red zone magic
f2 is middle red zone magic
f3 is right red zone magic
00 when all 8 bytes are accessible
01-07 when only 1 to 7 bytes are accessible followed by inaccessible bytes
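To make the encoding concrete, here is a small C sketch that rebuilds the intended shadow bytes for the frame described above. The struct and function names are hypothetical, and the 32-byte left red zone (four f1) and 40-byte right red zone (five f3) are inferred from the 20 shadow bytes shown; this is not how asan.cc computes the frame layout.

```c
#include <assert.h>
#include <stddef.h>

#define GRANULE 8   /* bytes of real memory described by one shadow byte */

/* Hypothetical frame description: an object of SIZE bytes followed by
   PAD_AFTER bytes of red zone.  SIZE + PAD_AFTER is assumed to be a
   multiple of GRANULE.  */
struct frame_obj
{
  unsigned size;
  unsigned pad_after;
};

static size_t
put (unsigned char *shadow, size_t cap, size_t k, unsigned char v)
{
  assert (k < cap);
  shadow[k] = v;
  return k + 1;
}

/* Fill SHADOW (capacity CAP) with one byte per granule of the frame and
   return the number of bytes written.  The padding after the last object
   is treated as the right red zone.  */
static size_t
build_shadow (unsigned char *shadow, size_t cap, unsigned left_rz,
              const struct frame_obj *objs, size_t n)
{
  size_t k = 0;
  for (unsigned i = 0; i < left_rz / GRANULE; i++)
    k = put (shadow, cap, k, 0xf1);            /* left red zone */

  for (size_t j = 0; j < n; j++)
    {
      unsigned sz = objs[j].size, pad = objs[j].pad_after;
      for (; sz >= GRANULE; sz -= GRANULE)
        k = put (shadow, cap, k, 0x00);        /* all 8 bytes accessible */
      if (sz != 0)
        {
          k = put (shadow, cap, k, (unsigned char) sz);  /* 01-07 */
          pad -= GRANULE - sz;   /* rest of this granule is red zone */
        }
      unsigned char magic = (j + 1 < n) ? 0xf2 : 0xf3;
      for (; pad >= GRANULE; pad -= GRANULE)
        k = put (shadow, cap, k, magic);       /* middle or right red zone */
    }
  return k;
}
```

For objects { 4, 12 } (d), { 8, 24 } (b) and { 40, 40 } (c) with a 32-byte left red zone, this yields the 20 bytes listed above; d contributes 04 f2 because its 4-byte tail is padding inside the same granule.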

The -fdump-rtl-expand-details dump makes it clear that it misbehaves:
Flushing rzbuffer at offset -160 with: f1 f1 f1 f1
Flushing rzbuffer at offset -128 with: 04 f2 00 00
Flushing rzbuffer at offset -128 with: 00 00 00 f2
Flushing rzbuffer at offset -96 with: f2 f2 00 00
Flushing rzbuffer at offset -64 with: 00 00 00 f3
Flushing rzbuffer at offset -32 with: f3 f3 f3 f3
In the end we end up with
f1 f1 f1 f1 00 00 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
shadow bytes because at offset -128 there are 2 overlapping stores
as asan_redzone_buffer::emit_redzone_byte has flushed the temporary 4 byte
buffer in the middle.

The function is called with an offset and value.  If the passed offset is
consecutive with the prev_offset + buffer size (off == offset), then
we handle it correctly, similarly if the new offset is far enough from the
old one (we then flush whatever was in the buffer and if needed add up to 3
bytes of 00 before actually pushing value).

But what isn't handled correctly is when the offset isn't consecutive to
what has been added last time, but it is in the same 4 byte word of shadow
memory (32 bytes of actual memory), like the above case where
we have consecutive 04 f2 and then skip one shadow memory byte (aka 8 bytes
of real memory) and then want to emit f2.  Emitting that as a store
of little-endian 0xf204 followed by a store of 0xf200 to the same
address doesn't work, we want to emit 0xf200f204.

The following patch does that by pushing 1 or 2 00 bytes.
Additionally, as a small cleanup, instead of using
  m_shadow_bytes.safe_push (value);
  flush_if_full ();
in all of if, else if and else bodies it sinks those 2 stmts to the end
of the function, as all of them do the same thing.
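The fixed control flow can be modeled outside the compiler. The following is a simplified, hypothetical C model, not GCC's asan_redzone_buffer: it buffers up to four shadow bytes, writes them as one aligned 4-byte store into a simulated shadow array, and, like the patch, pads a small gap inside the pending word with 00 bytes instead of flushing a half-built word. The names rz_buffer, rz_emit and rz_flush are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GRAN 8   /* bytes of real memory per shadow byte */
#define WORD 4   /* shadow bytes per aligned store (32 bytes of frame) */

/* Simplified, hypothetical model of a red-zone buffer (not GCC's code).
   SHADOW simulates shadow memory; SHADOW[0] describes frame offset BASE.  */
struct rz_buffer
{
  unsigned char *shadow;
  long base;
  long prev;                 /* frame offset described by buf[0] */
  unsigned char buf[WORD];
  unsigned len;              /* number of pending bytes in buf */
};

/* Write the pending bytes as one 4-byte store, zero-filling the tail.  */
static void
rz_flush (struct rz_buffer *b)
{
  if (b->len == 0)
    return;
  memset (b->buf + b->len, 0, WORD - b->len);
  memcpy (b->shadow + (size_t) ((b->prev - b->base) / GRAN), b->buf, WORD);
  b->prev += GRAN * (long) WORD;
  b->len = 0;
}

/* Record VALUE as the shadow byte for the granule at frame OFFSET.
   Offsets must be granule-aligned and passed in increasing order.  */
static void
rz_emit (struct rz_buffer *b, long offset, unsigned char value)
{
  long off = b->prev + GRAN * (long) b->len;
  assert (offset >= off);

  if (off == offset)
    {
      /* Consecutive shadow byte.  */
    }
  else if (b->len != 0 && offset < b->prev + GRAN * (long) WORD)
    {
      /* Small gap inside the pending word: fill it with 00 bytes.  */
      for (; off < offset; off += GRAN)
        b->buf[b->len++] = 0;
    }
  else
    {
      rz_flush (b);
      /* Start the new word at an aligned offset, padding with 00.  */
      long align = (offset - b->prev) % (GRAN * (long) WORD);
      b->prev = offset - align;
      for (; align > 0; align -= GRAN)
        b->buf[b->len++] = 0;
    }

  b->buf[b->len++] = value;
  if (b->len == WORD)
    rz_flush (b);
}
```

Feeding this model the red-zone bytes of the testcase frame (f1 at -160..-136, 04 and f2 at -128 and -120, f2 at -104..-88, f3 at -40..-8) with BASE = -160 writes the word at offset -128 once, as 04 f2 00 f2, and the simulated shadow ends up equal to the intended 20 bytes.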

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-04-26  Jakub Jelinek  

PR sanitizer/105396
* asan.cc (asan_redzone_buffer::emit_redzone_byte): Handle the case
where offset is bigger than off but smaller than m_prev_offset + 32
bits by pushing one or more 0 bytes.  Sink the
m_shadow_bytes.safe_push (value); flush_if_full (); statements from
all cases to the end of the function.

* gcc.dg/asan/pr105396.c: New test.

--- gcc/asan.cc.jj  2022-02-19 09:03:50.0 +0100
+++ gcc/asan.cc 2022-04-26 16:57:49.737316329 +0200
@@ -1497,10 +1497,14 @@ asan_redzone_buffer::emit_redzone_byte (
   HOST_WIDE_INT off
 = m_prev_offset + ASAN_SHADOW_GRANULARITY * m_shadow_bytes.length ();
   if (off == offset)
+/* Consecutive shadow memory byte.  */;
+  else if (offset < m_prev_offset + (HOST_WIDE_INT) (ASAN_SHADOW_GRANULARITY
+* RZ_BUFFER_SIZE)
+  && !m_shadow_bytes.is_empty ())
 {
-  /* Consecutive shadow memory byte.  */
-  m_shadow_bytes.safe_push (value);
-  flush_if_full ();
+  /* Shadow memory byte with a small gap.  */
+  for (; off < offset; off += ASAN_SHADOW_GRANULARITY)
+   m_shadow_bytes.safe_push (0);
 }
   else
 {
@@ -1521,9 +1525,9 @@ asan_redzone_buffer::emit_redzone_byte (
   m_shadow_mem = adjust_address (m_shadow_mem, VOIDmode,
 diff >> ASAN_SHADOW_SHIFT);
   m_prev_offset = offset;
-  m_shadow_bytes.safe_push (value);
-  flush_if_full ();
 }
+  m_shadow_bytes.safe_push (value);
+  flush_if_full ();
 }
 
 /* Emit RTX emission of the content of the buffer.  */
--- gcc/testsuite/gcc.dg/asan/pr105396.c.jj 2022-04-26 16:56:34.522348879 +0200
+++ gcc/testsuite/gcc.dg/asan/pr105396.c    2022-04-26 17:00:54.757776387 +0200
@@ -0,0 +1,19 @@
+/* PR sanitizer/105396 */
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+/* { dg-shouldfail "asan" } */
+
+int
+main ()
+{
+  int a;
+  int *b[1];
+  int c[10];
+  int d[1][1];
+  for (a = 0; a < 1; a++)
+d[1][a] = 0;
+  return 0;
+}
+
+/* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow on address.*(\n|\r\n|\r)" } */
+/* { dg-output "WRITE of size.*" } */

Jakub



Re: [PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]

2022-04-26 Thread Segher Boessenkool
Hi!

On Tue, Apr 26, 2022 at 03:06:51PM -0500, Pat Haugen wrote:
> Fix register count when not splitting Complex IEEE 128-bit args.
> 
> For ABI_V4, we do not split complex args.

Because that is what the ABI requires, yes :-)

> This created a problem because
> even though an arg would be passed in two VSX regs, we were only 
> advancing the
> function arg counter by one VSX register. Fixed with this patch.

> gcc/
>   PR testsuite/99685
>   * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
>   register count when not splitting IEEE 128-bit Complex.

Note where the PR marker goes.

> +   /* If we are not splitting Complex IEEE 128-bit args then account 
> for

You sent the patch as format=flawed.  Don't.  It does not work.

> +  the fact that they are passed in 2 VSX regs. */
> +   if (! targetm.calls.split_complex_arg && type

No space after "!" (or any other unary operator not written with
letters).

> +   && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode)
> + cum->vregno++;
> +

With those trivialities fixed, okay for trunk.  Thanks!


Segher


[PATCH] c++: ICE with temporary of class type in DMI [PR100252]

2022-04-26 Thread Marek Polacek via Gcc-patches
Consider

  struct A {
int x;
int y = x;
  };

  struct B {
int x = 0;
int y = A{x}.y; // #1
  };

where for #1 we end up with

  {.x=(&)->x, .y=(&)->x}

that is, two PLACEHOLDER_EXPRs for different types on the same level in
a {}.  This crashes because our CONSTRUCTOR_PLACEHOLDER_BOUNDARY mechanism to
avoid replacing unrelated PLACEHOLDER_EXPRs cannot deal with it.

Here's why we wound up with those PLACEHOLDER_EXPRs: When we're performing
cp_parser_late_parsing_nsdmi for "int y = A{x}.y;" we use finish_compound_literal
on type=A, compound_literal={((struct B *) this)->x}.  When digesting this
initializer, we call get_nsdmi which creates a PLACEHOLDER_EXPR for A -- we don't
have any object to refer to yet.  After digesting, we have

  {.x=((struct B *) this)->x, .y=(&)->x}

and since we've created a PLACEHOLDER_EXPR inside it, we marked the whole ctor
CONSTRUCTOR_PLACEHOLDER_BOUNDARY.  f_c_l creates a TARGET_EXPR and returns

  TARGET_EXPR x, .y=(&)->x}>

Then we get to

  B b = {};

and call store_init_value, which digests the {}, which produces

  {.x=NON_LVALUE_EXPR <0>, .y=(TARGET_EXPR )->x, .y=(&)->x}>).y}

The call to replace_placeholders in store_init_value will not do anything:
we've marked the inner { } CONSTRUCTOR_PLACEHOLDER_BOUNDARY, and it's only
a sub-expression, so replace_placeholders does nothing, so the 
stays even though now is the perfect time to replace it because we have an
object for it: 'b'.

Later, in cp_gimplify_init_expr the *expr_p is

  D.2395 = {.x=(&)->x, .y=(&)->x}

where D.2395 is of type A, but we crash because we hit , which
has a different type.

My idea was to replace  with D.2384 in f_c_l after creating the
TARGET_EXPR because that means we have an object we can refer to.  Then clear
CONSTRUCTOR_PLACEHOLDER_BOUNDARY because we no longer have a PLACEHOLDER_EXPR
in the {}.  Then store_init_value will be able to replace  with
'b', and we should be good to go.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11.4?

PR c++/100252

gcc/cp/ChangeLog:

* semantics.cc (finish_compound_literal): replace_placeholders after
creating the TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/nsdmi-aggr14.C: New test.
---
 gcc/cp/semantics.cc   | 31 +++
 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr14.C | 46 +++
 2 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr14.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index ab48f11c9be..770369458bb 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3296,6 +3296,37 @@ finish_compound_literal (tree type, tree compound_literal,
   if (TREE_CODE (compound_literal) == CONSTRUCTOR)
TREE_HAS_CONSTRUCTOR (compound_literal) = false;
   compound_literal = get_target_expr_sfinae (compound_literal, complain);
+  /* We may have A{} in a NSDMI.  */
+  if (parsing_nsdmi ())
+   {
+ /* Digesting the {} could have introduced a PLACEHOLDER_EXPR
+referring to A.  Now that we've built up a TARGET_EXPR, we
+have an object we can refer to.  The reason we bother doing
+this here is for code like
+
+  struct A {
+int x;
+int y = x;
+  };
+
+  struct B {
+int x = 0;
+int y = A{x}.y; // #1
+  };
+
+where in #1 we don't want to end up with two PLACEHOLDER_EXPRs
+for different types on the same level in a {} as in 100252.  */
+ tree init = TARGET_EXPR_INITIAL (compound_literal);
+ if (TREE_CODE (init) == CONSTRUCTOR
+ && CONSTRUCTOR_PLACEHOLDER_BOUNDARY (init))
+   {
+ tree obj = TARGET_EXPR_SLOT (compound_literal);
+ replace_placeholders (compound_literal, obj);
+ /* We should have dealt with the PLACEHOLDER_EXPRs.  */
+ CONSTRUCTOR_PLACEHOLDER_BOUNDARY (init) = false;
+ gcc_checking_assert (!find_placeholders (init));
+   }
+   }
 }
   else
 /* For e.g. int{42} just make sure it's a prvalue.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr14.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr14.C
new file mode 100644
index 000..7d508f52b48
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr14.C
@@ -0,0 +1,46 @@
+// PR c++/100252
+// { dg-do run { target c++14 } }
+
+#define SA(X) static_assert ((X),#X)
+
+struct A {
+  int x;
+  int y = x;
+};
+
+struct B {
+  int x = 0;
+  int y = A{x}.y;
+};
+
+constexpr B csb1 = { };
+SA(csb1.x == 0 && csb1.y == csb1.x);
+constexpr B csb2 = { 1 };
+SA(csb2.x == 1 && csb2.y == csb2.x);
+constexpr B csb3 = { 1, 2 };
+SA(csb3.x == 1 && csb3.y == 2);
+
+B sb1 = { };
+B sb2 = { 1 };
+B sb3 = { 1, 2};
+
+int
+main ()
+{
+  if (sb1.x != 0 || sb1.x != sb1.y)
+__builtin_abort();
+  if (sb2.x != 1 || sb2.x != sb2.y)
+__builtin_abort()

[PATCH] c++: enum in generic lambda at global scope [PR105398]

2022-04-26 Thread Marek Polacek via Gcc-patches
We crash compiling this test since r11-7993 which changed
lookup_template_class_1 so that we only call tsubst_enum when

  !uses_template_parms (current_nonlambda_scope ())

But here current_nonlambda_scope () is the global NAMESPACE_DECL ::, which
doesn't have a type and is therefore considered type-dependent.  So we don't
call tsubst_enum, and crash in tsubst_copy/CONST_DECL because we didn't
find the e1 enumerator.

I don't think any namespace can depend on any template parameter, so
this patch tweaks uses_template_parms.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR c++/105398

gcc/cp/ChangeLog:

* pt.cc (uses_template_parms): Return false for any NAMESPACE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-enum2.C: New test.
---
 gcc/cp/pt.cc  |  2 +-
 gcc/testsuite/g++.dg/cpp1y/lambda-generic-enum2.C | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-enum2.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3cf1d7af8d2..e785c5db142 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10905,7 +10905,7 @@ uses_template_parms (tree t)
   || uses_template_parms (TREE_CHAIN (t)));
   else if (TREE_CODE (t) == TYPE_DECL)
 dependent_p = dependent_type_p (TREE_TYPE (t));
-  else if (t == error_mark_node)
+  else if (t == error_mark_node || TREE_CODE (t) == NAMESPACE_DECL)
 dependent_p = false;
   else
 dependent_p = instantiation_dependent_expression_p (t);
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-enum2.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-enum2.C
new file mode 100644
index 000..77cf0bb9d02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-enum2.C
@@ -0,0 +1,15 @@
+// PR c++/105398
+// { dg-do compile { target c++14 } }
+
+auto f = [](auto &&m) {
+enum E { _,e3,e2,e1,C4,C3,C2,C1 };
+static constexpr int x_coeffs[3][4] = {
+{e1,C2,C3,C4},
+{e2,C1,C3,C4},
+{e3,C1,C2,C4},
+};
+};
+
+int main() {
+f(0);
+}

base-commit: 9ace5d4dab2ab39072b0f07089621a823580f27c
-- 
2.35.1



Re: [PATCH] rs6000: Move V2DI vec_neg under power8-vector [PR105271]

2022-04-26 Thread Kewen.Lin via Gcc-patches
on 2022/4/26 11:25 PM, Segher Boessenkool wrote:
> On Tue, Apr 26, 2022 at 05:16:18PM +0200, Jakub Jelinek wrote:
>> Hi!
>>
>> On Fri, Apr 15, 2022 at 04:08:15PM +0800, Kewen.Lin via Gcc-patches wrote:
>>> As PR105271 shows, __builtin_altivec_neg_v2di requires option
>>> -mpower8-vector as its pattern expansion relies on subv2di which
>>> has guard VECTOR_UNIT_P8_VECTOR_P (V2DImode).  This fix is to move
>>> the related lines for __builtin_altivec_neg_v2di to the section
>>> of stanza power8-vector.
>>>
>>> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
>>> powerpc64le-linux-gnu P9 and P10.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -
>>> PR target/105271
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000-builtins.def (NEG_V2DI): Move to [power8-vector]
>>> stanza.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr105271.c: New test.
>>
>> I'd like to ping this patch, one of the last few remaining P1s we have for
>> GCC 12.
> 
> Heh.  I approved it this morning (off-list).  Kewen will commit it
> soonish (somewhere during your night probably) :-)
> 

Thank you both!  I just committed it as r12-8275.

BR,
Kewen


Re: *PING* [PATCH 0/4] Use pointer arithmetic for array references [PR102043]

2022-04-26 Thread Thomas Koenig via Gcc-patches



On 26.04.22 16:40, Hans-Peter Nilsson wrote:


These, or specifically r12-8227-g89ca0fffa48b79, "fortran:
Pre-evaluate string pointers. [PR102043]" have further
exposed (the issue existed before but now fails for more
platforms) PR78054 "gfortran.dg/pr70673.f90 FAILs at -O0",
at least for cris-elf and apparently also
s390x-ibm-linux-gnu.

In the PR it is mentioned that running the test through
valgrind shows invalid accesses also on x86_64-linux-gnu.
Could it be that the test-case is invalid and has undefined
behavior?  I don't know fortran so I can't tell.


You're right.  I just looked at the test case and can confirm
what Steve Kargl observed in the PR, it is in fact invalid.
You cannot do

  a = a

after deallocating a.

I've assigned the PR to myself and will commit the change to
dg-do compile soon (unless anybody objects).

Best regards

Thomas


Re: [PATCH] asan: Fix up asan_redzone_buffer::emit_redzone_byte [PR105396]

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, we have in main's frame 3 variables,
> some red zone padding, 4 byte d, followed by 12 bytes of red zone padding, then
> 8 byte b followed by 24 bytes of red zone padding, then 40 bytes c followed
> by some red zone padding.
> The intended content of shadow memory for that is (note, each byte describes
> 8 bytes of memory):
> f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
> left red    d  mr b  middle r c              right red zone
> 
> f1 is left red zone magic
> f2 is middle red zone magic
> f3 is right red zone magic
> 00 when all 8 bytes are accessible
> 01-07 when only 1 to 7 bytes are accessible followed by inaccessible bytes
> 
> The -fdump-rtl-expand-details dump makes it clear that it misbehaves:
> Flushing rzbuffer at offset -160 with: f1 f1 f1 f1
> Flushing rzbuffer at offset -128 with: 04 f2 00 00
> Flushing rzbuffer at offset -128 with: 00 00 00 f2
> Flushing rzbuffer at offset -96 with: f2 f2 00 00
> Flushing rzbuffer at offset -64 with: 00 00 00 f3
> Flushing rzbuffer at offset -32 with: f3 f3 f3 f3
> In the end we end up with
> f1 f1 f1 f1 00 00 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
> shadow bytes because at offset -128 there are 2 overlapping stores
> as asan_redzone_buffer::emit_redzone_byte has flushed the temporary 4 byte
> buffer in the middle.
> 
> The function is called with an offset and value.  If the passed offset is
> consecutive with the prev_offset + buffer size (off == offset), then
> we handle it correctly, similarly if the new offset is far enough from the
> old one (we then flush whatever was in the buffer and if needed add up to 3
> bytes of 00 before actually pushing value).
> 
> But what isn't handled correctly is when the offset isn't consecutive to
> what has been added last time, but it is in the same 4 byte word of shadow
> memory (32 bytes of actual memory), like the above case where
> we have consecutive 04 f2 and then skip one shadow memory byte (aka 8 bytes
> of real memory) and then want to emit f2.  Emitting that as a store
> of little-endian 0xf204 followed by a store of 0xf200 to the same
> address doesn't work, we want to emit 0xf200f204.
> 
> The following patch does that by pushing 1 or 2 00 bytes.
> Additionally, as a small cleanup, instead of using
>   m_shadow_bytes.safe_push (value);
>   flush_if_full ();
> in all of if, else if and else bodies it sinks those 2 stmts to the end
> of the function, as all of them do the same thing.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2022-04-26  Jakub Jelinek  
> 
>   PR sanitizer/105396
>   * asan.cc (asan_redzone_buffer::emit_redzone_byte): Handle the case
>   where offset is bigger than off but smaller than m_prev_offset + 32
>   bits by pushing one or more 0 bytes.  Sink the
>   m_shadow_bytes.safe_push (value); flush_if_full (); statements from
>   all cases to the end of the function.
> 
>   * gcc.dg/asan/pr105396.c: New test.
> 
> --- gcc/asan.cc.jj  2022-02-19 09:03:50.0 +0100
> +++ gcc/asan.cc   2022-04-26 16:57:49.737316329 +0200
> @@ -1497,10 +1497,14 @@ asan_redzone_buffer::emit_redzone_byte (
>HOST_WIDE_INT off
>  = m_prev_offset + ASAN_SHADOW_GRANULARITY * m_shadow_bytes.length ();
>if (off == offset)
> +/* Consecutive shadow memory byte.  */;
> +  else if (offset < m_prev_offset + (HOST_WIDE_INT) (ASAN_SHADOW_GRANULARITY
> +  * RZ_BUFFER_SIZE)
> +&& !m_shadow_bytes.is_empty ())
>  {
> -  /* Consecutive shadow memory byte.  */
> -  m_shadow_bytes.safe_push (value);
> -  flush_if_full ();
> +  /* Shadow memory byte with a small gap.  */
> +  for (; off < offset; off += ASAN_SHADOW_GRANULARITY)
> + m_shadow_bytes.safe_push (0);
>  }
>else
>  {
> @@ -1521,9 +1525,9 @@ asan_redzone_buffer::emit_redzone_byte (
>m_shadow_mem = adjust_address (m_shadow_mem, VOIDmode,
>diff >> ASAN_SHADOW_SHIFT);
>m_prev_offset = offset;
> -  m_shadow_bytes.safe_push (value);
> -  flush_if_full ();
>  }
> +  m_shadow_bytes.safe_push (value);
> +  flush_if_full ();
>  }
>  
>  /* Emit RTX emission of the content of the buffer.  */
> --- gcc/testsuite/gcc.dg/asan/pr105396.c.jj   2022-04-26 16:56:34.522348879 +0200
> +++ gcc/testsuite/gcc.dg/asan/pr105396.c  2022-04-26 17:00:54.757776387 +0200
> @@ -0,0 +1,19 @@
> +/* PR sanitizer/105396 */
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
> +/* { dg-shouldfail "asan" } */
> +
> +int
> +main ()
> +{
> +  int a;
> +  int *b[1];
> +  int c[10];
> +  int d[1][1];
> +  for (a = 0; a < 1; a++)
> +d[1][a] = 0;
> +  return 0;
> +}
> +
> +/* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow on address.*(\n|\r\n|\r)" } */
> +/* { dg-output "WRITE of size.*" } */
> 
>  

Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Apr 2022, Richard Sandiford wrote:

> "Andre Vieira (lists)"  writes:
> > Hi,
> >
> > This patch disables epilogue vectorization when we are peeling for 
> > alignment in the prologue and we can't guarantee the main vectorized 
> > loop is entered.  This is to prevent executing vectorized code with an 
> > unaligned access if the target has indicated it wants to peel for 
> > alignment. We take this conservative approach as we currently do not 
> > distinguish between peeling for alignment for correctness or for 
> > performance.
> >
> > A better codegen would be to make it skip to the scalar epilogue in case 
> > the main loop isn't entered when alignment peeling is required. However, 
> > that would require a more aggressive change to the codebase which we 
> > chose to avoid at this point of development.  We can revisit this option 
> > during stage 1 if we choose to.
> >
> > Bootstrapped on aarch64-none-linux and regression tested on 
> > aarch64-none-elf.
> >
> > gcc/ChangeLog:
> >
> >      PR tree-optimization/105219
> >      * tree-vect-loop.cc (vect_epilogue_when_peeling_for_alignment): New
> >      function.
> >      (vect_analyze_loop): Use vect_epilogue_when_peeling_for_alignment
> >      to determine whether to vectorize epilogue.
> >      * testsuite/gcc.target/aarch64/pr105219.c: New.
> >      * testsuite/gcc.target/aarch64/pr105219-2.c: New.
> >      * testsuite/gcc.target/aarch64/pr105219-3.c: New.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-2.c b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
> > new file mode 100644
> > index ..c97d1dc100181b77af0766e08407e1e352f604fe
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> 
> I think this should be in gcc.dg/vect, with the options forced
> for { target aarch64 }.
> 
> Are the skips necessary?  It looks like the test should work correctly
> for all options/targets.
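One possible reading of the suggestion above, assuming the test moves to gcc.dg/vect: keep the generic run directive and add the AArch64-specific flags through dg-additional-options with a target selector. The dg-skip-if lines are dropped on the assumption, raised in the question above, that the test is valid for any options. This is a sketch of the directive header only, not the revised patch.

```c
/* { dg-do run } */
/* { dg-additional-options "-march=armv8.2-a -mtune=thunderx -fno-vect-cost-model" { target aarch64*-*-* } } */
/* PR 105219.  */
```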
> 
> > +/* PR 105219.  */
> > +int data[128];
> > +
> > +void __attribute((noipa))
> > +foo (int *data, int n)
> > +{
> > +  for (int i = 0; i < n; ++i)
> > +    data[i] = i;
> > +}
> > +
> > +int main()
> > +{
> > +  for (int start = 0; start < 16; ++start)
> > +    for (int n = 1; n < 3*16; ++n)
> > +      {
> > +        __builtin_memset (data, 0, sizeof (data));
> > +        foo (&data[start], n);
> > +        for (int j = 0; j < n; ++j)
> > +          if (data[start + j] != j)
> > +            __builtin_abort ();
> > +      }
> > +  return 0;
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219-3.c b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
> > new file mode 100644
> > index ..444352fc051b787369f6f1be6236d1ff0fc2d392
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-3.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> > +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-cost-model -fdump-tree-vect-all" } */
> > +/* PR 105219.  */
> > +int data[128];
> > +
> > +void foo (void)
> > +{
> > +  for (int i = 0; i < 9; ++i)
> > +    data[i + 1] = i;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "EPILOGUE VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr105219.c b/gcc/testsuite/gcc.target/aarch64/pr105219.c
> > new file mode 100644
> > index ..bbdefb549f6a4e803852f69d20ce1ef9152a526c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr105219.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do run { target aarch64_sve128_hw } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-march=*" } { "-march=armv8.2-a+sve" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mtune=*" } { "-mtune=thunderx" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-mcpu=*" } } */
> > +/* { dg-skip-if "incompatible options" { *-*-* } { "-msve-vector-bits=*" } { "-msve-vector-bits=128" } } */
> > +/* { dg-options "-O3 -march=armv8.2-a+sve -msve-vector-bits=128 -mtune=thunderx" } */
> 
> Same here.
> 
> > +/* PR 105219.  */
> > +int a;
> > +char b[60];
> > +short c[18];
> > +short d[4][19];
> > +long long f;
> > +void e(int g, int h, short k[][19]) {
> > +  for (signed i = 0; i < 3; i 

Re: [PATCH] middle-end/104492 - avoid all equality compare dangling pointer diags

2022-04-26 Thread Jakub Jelinek via Gcc-patches
On Mon, Apr 25, 2022 at 11:54:34AM +0200, Richard Biener wrote:
> The following extends the equality compare dangling pointer diagnostics
> suppression for uses following free or realloc to also cover those
> following invalidation of auto variables via CLOBBERs.  That avoids
> diagnosing idioms like
> 
>   return std::find(std::begin(candidates), std::end(candidates), s)
>          != std::end(candidates);
> 
> for auto candidates which are prone to forwarding of the final
> comparison across the storage invalidation as then seen by the
> late run access warning pass.
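For readers less familiar with the C++ idiom, here is a C analogue written for illustration only (the function and array names are made up). It compares a search result against a pointer one past the end of a local array. Whether GCC actually forwards this particular compare across the end of `candidates`' lifetime depends on inlining and optimization, so this shows the shape of the code rather than a reproducer for PR 104492.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* C analogue of the std::find idiom: search a local array and compare
   the result against its end pointer.  */
static bool
contains (const char *s)
{
  const char *candidates[] = { "alpha", "beta", "gamma" };
  const char **end = candidates + sizeof candidates / sizeof candidates[0];
  const char **p = candidates;

  assert (s != NULL);
  while (p != end && strcmp (*p, s) != 0)
    ++p;

  /* Only the outcome of this equality compare escapes; neither pointer
     is dereferenced after candidates goes out of scope.  If the compare
     were forwarded past the CLOBBER that ends candidates' lifetime, a
     late warning pass would see two pointers into dead storage, which
     is the equality-compare case the patch stops diagnosing.  */
  return p != end;
}
```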
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?
> 
> Thanks,
> Richard.
> 
> 2022-04-25  Richard Biener  
> 
>   PR middle-end/104492
>   * gimple-ssa-warn-access.cc
>   (pass_waccess::warn_invalid_pointer): Exclude equality compare
>   diagnostics for all kinds of invalidations.
> 
>   * c-c++-common/Wdangling-pointer.c: Adjust for changed
>   suppression.
>   * c-c++-common/Wdangling-pointer-2.c: Likewise.

I spoke with Martin on IRC and his comment was that this is ok,
but that it should be accompanied by a doc/invoke.texi change that
documents the new behavior.
I think that is a reasonable request.

Jakub



Re: [PATCH] loongarch: ignore zero-size fields in calling convention

2022-04-26 Thread Lulu Cheng

gcc/

 * config/loongarch/loongarch.cc
 (loongarch_flatten_aggregate_field): Ignore empty fields for
 RECORD_TYPE.

gcc/testsuite/

 * gcc.target/loongarch/zero-size-field-pass.c: New test.
 * gcc.target/loongarch/zero-size-field-ret.c: New test.
---
  gcc/config/loongarch/loongarch.cc     |  3 ++
  .../loongarch/zero-size-field-pass.c  | 30 +++
  .../loongarch/zero-size-field-ret.c   | 28 +
  3 files changed, 61 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f22150a60cc..57e4d9f82ce 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -326,6 +326,9 @@ loongarch_flatten_aggregate_field (const_tree type,
    for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
 if (TREE_CODE (f) == FIELD_DECL)
   {
+   if (DECL_SIZE (f) && integer_zerop (DECL_SIZE (f)))
+     continue;
+

I think the modification should go below this check:

 if (!TYPE_P (TREE_TYPE (f)))
   return -1;
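The ordering question in this review can be illustrated with a simplified, hypothetical model of aggregate flattening in plain C (not GCC's tree API; `struct field`, `has_valid_type`, `size_bits` and `flatten_fields` are invented names). It assumes that an aggregate only qualifies when it flattens to at most two non-empty fields. Checking the field's type first and skipping zero-size fields second means a malformed field still makes the whole aggregate ineligible, even if its size happens to be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a FIELD_DECL; not GCC's tree representation.  */
struct field
{
  bool has_valid_type; /* models TYPE_P (TREE_TYPE (f)) */
  size_t size_bits;    /* models DECL_SIZE (f) */
};

/* Return the number of non-empty fields, or -1 if the aggregate cannot be
   flattened (an invalid field, or more than two non-empty fields).  */
static int
flatten_fields (const struct field *fields, size_t nfields)
{
  int n = 0;

  assert (fields != NULL || nfields == 0);
  for (size_t i = 0; i < nfields; i++)
    {
      /* Reject malformed fields before looking at their size.  */
      if (!fields[i].has_valid_type)
        return -1;
      /* Zero-size fields take no part in the calling convention.  */
      if (fields[i].size_bits == 0)
        continue;
      /* Assumed limit: at most two flattened fields.  */
      if (n == 2)
        return -1;
      n++;
    }
  return n;
}
```

With the opposite order, a malformed zero-size field would be silently skipped instead of rejecting the aggregate.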

Thanks!

Lulu Cheng