Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-25 Thread Thomas Schwinge
Hi!

On 2022-01-24T12:54:27+, Hafiz Abid Qadeer  wrote:
> On 24/01/2022 08:45, Tobias Burnus wrote:
>> On 21.01.22 18:15, Thomas Schwinge wrote:
>>> I'm seeing this test case randomly/non-deterministically FAIL to execute,
>>> differently on different systems and runs, for example: [...]
>>> I'd assume there's some concurrency issue: the problem disappears if I
>>> manually specify a lowerish 'OMP_NUM_THREADS'
>>
>> If one compiles the program with -fsanitize=thread, it shows tons of errors 
>> :-(

Confirmed.

> Did you notice similar behavior with
> libgomp/testsuite/libgomp.c-c++-common/allocate-1.c?

No, this one I always saw PASS.  (... which only means so much, of
course...)

> It was used as base for fortran testcase and it
> shows similar warnings with -fthread=sanitize.

Confirmed.

> I am trying to figure out if the problem you observed
> is a general one or just specific to fortran testcase.

So, unless the '-fsanitize=thread' issues are bogus -- unlikely ;-) -- it
seems a latent issue generally, now fatal with
'libgomp.fortran/allocate-1.f90'.


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PING^3][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Richard Sandiford via Gcc-patches
Dan Li  writes:
>>> +
>>>  if (flag_stack_usage_info)
>>>current_function_static_stack_size = constant_lower_bound 
>>> (frame_size);
>>>
>>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>  RTX_FRAME_RELATED_P (insn) = 1;
>>>}
>>>
>>> +  /* Pop return address from shadow call stack.  */
>>> +  if (aarch64_shadow_call_stack_enabled ())
>>> +   emit_insn (gen_scs_pop ());
>>> +
>> 
>> This looks correct, but following on from the above, I guess we continue
>> to restore x30 from the frame in the traditional way first, and then
>> overwrite it with the SCS value.  I think the output code would be
>> slightly neater if we suppressed the first restore of x30.
>> 
> Yes, the current epilogue is like:
>  ...
>  ldp x29, x30, [sp], #16
> +   ldr x30, [x18, #-8]!
>
> I think may be we can keep the two x30 pops here, for two reasons:
> 1) The implementation of scs in clang is to pop x30 twice, it might be
> better to be consistent with clang here[1].

This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)

> 2) If we keep the first restore of x30, it may provide some flexibility
> for other scenarios. For example, we can directly patch the scs_push/pop
> insns in the binary to turn it into a binary without scs;

Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.

I'd like a definitive ruling on this from the kernel folks before
the patch goes in.

If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

  (set ...mem...
   (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.

>>> +;; Save X30 in the X18-based POST_INC stack (consistent with clang).
>>> +(define_insn "scs_push"
>>> +  [(set (mem:DI (reg:DI R18_REGNUM)) (reg:DI R30_REGNUM))
>>> +   (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int 8)))]
>>> +  ""
>>> +  "str\\tx30, [x18], #8"
>>> +  [(set_attr "type" "store_8")]
>>> +)
>>> +
>>> +;; Load X30 form the X18-based PRE_DEC stack (consistent with clang).
>>> +(define_insn "scs_pop"
>>> +  [(set (reg:DI R18_REGNUM) (minus:DI (reg:DI R18_REGNUM) (const_int 8)))
>>> +   (set (reg:DI R30_REGNUM) (mem:DI (reg:DI R18_REGNUM)))]
>>> +  ""
>>> +  "ldr\\tx30, [x18, #-8]!"
>>> +  [(set_attr "type" "load_8")]
>>> +)
>>> +
>> 
>> The two SETs here work in parallel, with the define_insn as a whole
>> following a "read input operands, act, write output operands" sequence.
>> The (mem: …) address therefore sees the old value of r18 rather than the
>> new one.  Also, RTL rules say that subtractions need to be written as
>> additions of the negative.
>> 
> Oh, sorry, I got it wrong here.
>
>> I think the pattern would therefore be something like:
>> 
>>[(set (reg:DI R30_REGNUM) (mem:DI (plus:DI (reg:DI R18_REGNUM)
>>   (const_int -8]
>> (set (reg:DI R18_REGNUM) (plus:DI (reg:DI R18_REGNUM) (const_int -8)))]
>> 
>> However, since these are normal moves, I think we should be able to use
>> the standard move patterns.  E.g. the push could be:
>> 
>> (define_expand "scs_push"
>>[(set (mem:DI (pre_dec:DI (reg:DI R18_REGNUM)))
>>  (reg:DI R30_REGNUM))])
>> 
>> and similarly for the pop.
>> 
> Thanks, this template looks better.
>
> Since the push/pop of SCS and normal stack in clang are different
> (for example, scs push is post_inc, while normal stack is pre_dec),
> in order to maintain consistency, I think it can be changed to the
> following:
>
> (define_expand "scs_push"
>[(set (mem:DI (post_inc:DI (reg:DI R18_REGNUM)))
>  (reg:DI R30_REGNUM))])
>
> (define_expand "scs_pop"
>[(set (reg:DI R30_REGNUM)
> (mem:DI (pre_dec:DI (reg:DI R18_REGNUM])

Yeah, I copied the wrong name above, sorry.

Thanks,
Richard


Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-25 Thread Tobias Burnus

On 25.01.22 10:19, Thomas Schwinge wrote:

I am trying to figure out if the problem you observed
is a general one or just specific to fortran testcase.

So, unless the '-fsanitize=thread' issues are bogus -- unlikely ;-) -- it
seems a latent issue generally, now fatal with
'libgomp.fortran/allocate-1.f90'.


There is one known issue with libgomp and TSAN (-fsanitize=thread)
that I tend to forget about :-(

That's according to Jakub, who wrote a while ago:

"TSAN doesn't understand what libgomp is doing, unless built with 
--disable-linux-futex"



However, I now tried to disable futex and still get the following.
(First result for libgomp.c-c++-common/allocate-1.c).

On the other hand, I have the feeling that the configure option is
a no op for libgomp. This can also be seen in the configure.ac script,
which only for libstdc++ uses the result and the others have a no-op
call to 'true' (alias ':'):

libgomp/configure.ac:GCC_LINUX_FUTEX(:)
libitm/configure.ac:GCC_LINUX_FUTEX(:)
libstdc++-v3/configure.ac:GCC_LINUX_FUTEX([AC_DEFINE(HAVE_LINUX_FUTEX, 1, 
[Define if futex syscall is available.])])

(The check is not completely pointless as some checks are still done;
e.g. 'SYS_gettid and SYS_futex required'.)

(TSAN did find issues in libgomp in the past, however. But those
habe been fixed.)


Thus, there might or might not be an issue when TSAN reports one.

 * * *

Glancing at the Fortran testcase, I noted the following,
which probably does not cause the problems. But still,
I want to mention it:

  !$omp parallel private (y, v) firstprivate (x) allocate (x, y, v)
  if (x /= 42) then
stop 1
  end if

  v(1) = 7
  if ( (and(fl, 2) /= 0) .and.  &
   ((is_64bit_aligned(x) == 0) .or. &
(is_64bit_aligned(y) == 0) .or. &
(is_64bit_aligned(v(1)) == 0))) then
  stop 2
  end if

If one compares this with the C/C++ testcase, I note that there
is a barrier before the alignment check in C/C++ but not in
Fortran. Additionally, 'v(1) = 7' is set twice and the
alignment check happens earlier than in C/C++. Not that that
should really matter, but I just saw it.


In C/C++:
  int v[x], w[x];
...
v[0] = 7;
v[41] = 8;

In Fortran:
  integer, dimension(x) :: v
...
  v(1) = 7
  v(41) = 8

where 'x == 42'. The Fortran version is not really wrong, but I think
the idea is to set the first and last array element - and that's here
v(42) and not v(41).

BTW: Fortran permits to specify a different lower bound. When converting
C/C++ testcases, it can be useful to use the same lower bound also in
Fortran:   integer :: v(0:x-1)  (or: 'integer, dimension(0:x-1) :: v')
uses then 0 ... 41 for the indices instead of 1 ... 42.

But one has to be careful as Fortran uses the upper bound and C uses the
number of elements. (Same with OpenMP array sections in Fortran vs. C.)


Tobias

PS: The promised data-race warning:
==
WARNING: ThreadSanitizer: data race (pid=4135381)
  Read of size 8 at 0x7ffc0888bdc0 by thread T10:
#0 foo._omp_fn.2 libgomp.c-c++-common/allocate-1.c:47 (a.out+0x402c05)
#1 gomp_thread_start ../../../repos/gcc/libgomp/team.c:129 
(libgomp.so.1+0x1e5ed)

  Previous write of size 8 at 0x7ffc0888bdc0 by main thread:
#0 foo._omp_fn.1 libgomp.c-c++-common/allocate-1.c:47 (a.out+0x402aee)
#1 GOMP_teams_reg ../../../repos/gcc/libgomp/teams.c:51 
(libgomp.so.1+0x3638c)
#2 main libgomp.c-c++-common/allocate-1.c:366 (a.out+0x40273e)

  Location is stack of main thread.

  Location is global '' at 0x ([stack]+0x1ddc0)

  Thread T10 (tid=4135398, running) created by main thread at:
#0 pthread_create 
../../../../repos/gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 
(libtsan.so.2+0x62c76)
#1 gomp_team_start ../../../repos/gcc/libgomp/team.c:858 
(libgomp.so.1+0x1ec18)
#2 main libgomp.c-c++-common/allocate-1.c:366 (a.out+0x40273e)

SUMMARY: ThreadSanitizer: data race libgomp.c-c++-common/allocate-1.c:47 in 
foo._omp_fn.2
==

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] dwarf2out: For ppc64le IEEE quad long double, emit DW_TAG_typedef to _Float128 [PR104194]

2022-01-25 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 24, 2022 at 11:26:27PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Yet another short term solution might be not use DW_TAG_base_type
> for the IEEE quad long double, but instead pretend it is a DW_TAG_typedef
> with DW_AT_name "long double" to __float128 DW_TAG_base_type.
> I bet gdb would even handle it without any changes, but of course, it would
> be larger than the other proposed changes.

Here it is implemented.

Testcases I've played with are e.g.:
__ibm128 a;
long double b;
_Complex long double c;

static __attribute__((noinline)) int
foo (long double d)
{
  long double e = d + 1.0L;
  return 0;
}

int
main ()
{
  a = 1.0;
  b = 2.0;
  c = 5.0 + 6.0i;
  return foo (7.0L);
}
and
  real(kind=16) :: a
  complex(kind=16) :: b
  a = 1.0
  b = 2.0
end

Printing the values of the variables works well,
p &b or p &c shows pointer to the correct type, just
ptype b or ptype c prints _Float128 instead of
long double or complex _Float128 instead of complex long double.
Even worse in fortran where obviously _Float128 or
complex _Float128 aren't valid types, but as GDB knows them by name,
it is just ptype that is weird.

Is this ok for trunk until we get an agreement on which extension
to use (different DW_ATE_*, or DW_AT_GNU_precision or
DW_ATE_GNU_encoding_variant)?

2022-01-25  Jakub Jelinek  

PR debug/104194
* dwarf2out.cc (long_double_as_float128): New function.
(modified_type_die): For powerpc64le IEEE 754 quad long double
and complex long double emit those as DW_TAG_typedef to
_Float128 or complex _Float128 base type.

--- gcc/dwarf2out.cc.jj 2022-01-25 05:47:53.987454934 +0100
+++ gcc/dwarf2out.cc2022-01-25 11:54:25.100522089 +0100
@@ -13568,6 +13568,47 @@ qualified_die_p (dw_die_ref die, int *ma
   return type;
 }
 
+/* If TYPE is long double or complex long double that
+   should be emitted as artificial typedef to _Float128 or
+   complex _Float128, return the type it should be emitted as.
+   This is done in case the target already supports 16-byte
+   composite floating point type (ibm_extended_format).  */
+
+static tree
+long_double_as_float128 (tree type)
+{
+  if (type != long_double_type_node
+  && type != complex_long_double_type_node)
+return NULL_TREE;
+
+  machine_mode mode, fmode;
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+mode = TYPE_MODE (TREE_TYPE (type));
+  else
+mode = TYPE_MODE (type);
+  if (known_eq (GET_MODE_SIZE (mode), 16) && !MODE_COMPOSITE_P (mode))
+FOR_EACH_MODE_IN_CLASS (fmode, MODE_FLOAT)
+  if (known_eq (GET_MODE_SIZE (fmode), 16)
+  && MODE_COMPOSITE_P (fmode))
+   {
+ if (type == long_double_type_node)
+   {
+ if (float128_type_node
+ && (TYPE_MODE (float128_type_node)
+ == TYPE_MODE (type)))
+   return float128_type_node;
+ return NULL_TREE;
+   }
+ for (int i = 0; i < NUM_FLOATN_NX_TYPES; i++)
+   if (COMPLEX_FLOATN_NX_TYPE_NODE (i) != NULL_TREE
+   && (TYPE_MODE (COMPLEX_FLOATN_NX_TYPE_NODE (i))
+   == TYPE_MODE (type)))
+ return COMPLEX_FLOATN_NX_TYPE_NODE (i);
+   }
+
+  return NULL_TREE;
+}
+
 /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
entry that chains the modifiers specified by CV_QUALS in front of the
given type.  REVERSE is true if the type is to be interpreted in the
@@ -13848,7 +13889,32 @@ modified_type_die (tree type, int cv_qua
 }
   else if (is_base_type (type))
 {
-  mod_type_die = base_type_die (type, reverse);
+  /* If a target supports long double as different floating point
+modes with the same 16-byte size, use normal DW_TAG_base_type
+only for the composite (ibm_extended_real_format) type and
+for the other for the time being emit instead a "_Float128"
+or "complex _Float128" DW_TAG_base_type and a "long double"
+or "complex long double" typedef to it.  */
+  if (tree other_type = long_double_as_float128 (type))
+   {
+ dw_die_ref other_die;
+ if (TYPE_NAME (other_type))
+   other_die
+ = modified_type_die (other_type, TYPE_UNQUALIFIED, reverse,
+  context_die);
+ else
+   {
+ other_die = base_type_die (type, reverse);
+ add_child_die (comp_unit_die (), other_die);
+ add_name_attribute (other_die,
+ TREE_CODE (type) == COMPLEX_TYPE
+ ? "complex _Float128" : "_Float128");
+   }
+ mod_type_die = new_die_raw (DW_TAG_typedef);
+ add_AT_die_ref (mod_type_die, DW_AT_type, other_die);
+   }
+  else
+   mod_type_die = base_type_die (type, reverse);
 
   /* The DIE with DW_AT_endianity is placed right after the naked DIE.  */
   if (reverse_base_type)


Jakub



Re: [PATCH] libgcc: Fix _Unwind_Find_FDE for missing unwind data with glibc 2.35

2022-01-25 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 24, 2022 at 06:11:22PM +0100, Florian Weimer via Gcc-patches wrote:
> _dl_find_object returns success even if no unwind information has been
> found, and dlfo_eh_frame is NULL.
> 
> libgcc/ChangeLog:
> 
>   PR libgcc/104207
>   * unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Add NULL check.
> 
> ---
>  libgcc/unwind-dw2-fde-dip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Ok, thanks.

> diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
> index 7de847cb120..3d6f39f5460 100644
> --- a/libgcc/unwind-dw2-fde-dip.c
> +++ b/libgcc/unwind-dw2-fde-dip.c
> @@ -509,7 +509,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
>  #ifdef DLFO_STRUCT_HAS_EH_DBASE
>{
>  struct dl_find_object dlfo;
> -if (_dl_find_object (pc, &dlfo) == 0)
> +if (_dl_find_object (pc, &dlfo) == 0 && dlfo.dlfo_eh_frame != NULL)
>return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
>  # if DLFO_STRUCT_HAS_EH_DBASE
>   (_Unwind_Ptr) dlfo.dlfo_eh_dbase,

Jakub



Re: [PATCH] avoid recomputing PHI results after failure (PR 104203)

2022-01-25 Thread Richard Biener via Gcc-patches
On Mon, Jan 24, 2022 at 11:55 PM Martin Sebor  wrote:
>
> The pointer_query algorithm fails when it's unable to determine
> the provenance of an pointer SSA variable.  A failure prevents it
> from making a record of the variable in the cache.  Although this
> doesn't happen often, one common cause of failures is PHI nodes:
> e.g., because one of its arguments references the PHI itself, or
> an undefined variable.  Because a failed SSA_NAME isn't added to
> the cache, the computation that led up to its failure is repeated
> on each interesting use of the variable. This is computationally
> wasteful and can cause excessive CPU usage in pathological cases
> like in the test case in PR 104203.
>
> To avoid the problem the attached patch changes the logic for PHI
> arguments and nodes to cache the most conservative result on such
> a failure.  It treats such a pointer as pointing into the middle
> of an object of maximum size (same as under some other conditions).
>
> In addition, the patch also adds a new timer variable to
> the warn_access pass to make these problems easier in the future
> to track down.

LGTM.

> There are a number of other conditions under which point_query
> fails.  Even though the same approach could be used to replace
> all those, I did not make the change in this patch.

It's probably worth fixing them.  Btw, I wonder whether caching
fails as true fails would work as well though caching a conservative
answer of course is fine.

> Tested on x86_64-linux.
>
> Martin
>
> PS This solution relies on the pointer_query instance having caching
> enabled.  Two warning passes run without a cache: -Warray-bounds in
> VRP and -Wrestrict in the wrestrict pass.  I can look into enabling
> it for GCC 12 if there's concern that they might be impacted.

Please.  There is IMHO no reason to not use the cache - for PHIs
you can run into the same PHI arg def from multiple edges and thus
you'll do unnecessary work even when you just do a single query
for the toplevel API.

Richard.


[PATCH] tree-optimization/104214 - amend PR100740 fix for pointer compares

2022-01-25 Thread Richard Biener via Gcc-patches
When we have a pointer relational compare we have stronger guarantees
about overflow, in particular rewriting BASE0 + STEP0 cmp BASE1 + STEP1
as BASE0 + STEP0 - STEP1 cmp BASE1 is always valid and the new IV0
does not overflow.  The patch basically reverts the previous change
when pointers are involved, keeping only the more conservative handling
for equality compares which can involve comparing different object
addresses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-01-25  Richard Biener  

PR tree-optimization/104214
* tree-ssa-loop-niter.cc (number_of_iterations_cond): Use
stronger guarantees for relational pointer compares when
rewriting BASE0 + STEP0 cmp BASE1 + STEP1 as
BASE0 + STEP0 - STEP1 cmp BASE1.

* gcc.dg/vect/pr81196-2.c: New variant testcase only
requiring vect_int.
---
 gcc/testsuite/gcc.dg/vect/pr81196-2.c | 16 
 gcc/tree-ssa-loop-niter.cc| 15 ---
 2 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr81196-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr81196-2.c 
b/gcc/testsuite/gcc.dg/vect/pr81196-2.c
new file mode 100644
index 000..8d5ce6bad53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr81196-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void b (int *p)
+{
+  p = (int *)__builtin_assume_aligned(p, __BIGGEST_ALIGNMENT__);
+  int *q = p + 255;
+  for(; p < q; ++p, --q)
+{
+  int t = *p;
+  *p = *q;
+  *q = t;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 04c209561d8..d33095b8e03 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -1915,14 +1915,23 @@ number_of_iterations_cond (class loop *loop,
}
   /* If the new step of IV0 has changed sign or is of greater
 magnitude then we do not know whether IV0 does overflow
-and thus the transform is not valid for code other than NE_EXPR  */
+and thus the transform is not valid for code other than NE_EXPR.  */
   else if (tree_int_cst_sign_bit (step) != tree_int_cst_sign_bit 
(iv0->step)
   || wi::gtu_p (wi::abs (wi::to_widest (step)),
 wi::abs (wi::to_widest (iv0->step
{
- if (code != NE_EXPR)
+ if (POINTER_TYPE_P (type) && code != NE_EXPR)
+   /* For relational pointer compares we have further guarantees
+  that the pointers always point to the same object (or one
+  after it) and that objects do not cross the zero page.  So
+  not only is the transform always valid for relational
+  pointer compares, we also know the resulting IV does not
+  overflow.  */
+   ;
+ else if (code != NE_EXPR)
return false;
- iv0->no_overflow = false;
+ else
+   iv0->no_overflow = false;
}
 
   iv0->step = step;
-- 
2.31.1


Re: [PATCH] Fortran: detect signaling NaNs on targets without issignaling macro in libc

2022-01-25 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 17, 2022 at 12:11:59AM +0100, FX via Gcc-patches wrote:
> This patch is the third in my “signaling NaN” series. For targets with IEEE 
> support but without the issignaling macro in libc (i.e., everywhere except 
> glibc), this allows us to provide a fallback implementation. In order to keep 
> the code in ieee_helper.c relatively readable, I’ve put that new 
> implementation in a separate file, issignaling_fallback.h.
> 
> The logic is borrowed from different routines in glibc, but gathered into a 
> single file and much simpler than the glibc implementation, because we do not 
> need to cover all the cases they have (comments with details are available in 
> issignaling_fallback.h).
> 
> I can’t test this on all the targets I’d like to, obviously. But it was 
> tested on x86_64-pc-linux-gnu (where it doesn’t do anything), on 
> x86_64-pc-linux-gnu by mimicking the lack of a issignaling macro, and on 
> x86_64-apple-darwin (which does not have issignaling).

This doesn't seem to handle the powerpc* IBM double double long double.

__LDBL_IS_IEC_60559__ isn't defined for this type, because it is far from
an IEEE754 type, but it has signaling NaNs - as can be seen in glibc
libc/sysdeps/ieee754/ldbl-128ibm/s_issignalingl.c
the type is a pair of doubles and whether it is a sNaN or qNaN is determined
by whether the first double is a sNaN or qNaN.

Ok for trunk?

2022-01-25  Jakub Jelinek  

* ieee/issignaling_fallback.h (__issignalingl): Define for
IBM extended long double are returning __issignaling on the
first double.

--- libgfortran/ieee/issignaling_fallback.h.jj  2022-01-25 12:14:45.404232320 
+0100
+++ libgfortran/ieee/issignaling_fallback.h 2022-01-25 12:14:52.504131720 
+0100
@@ -137,6 +137,19 @@ __issignalingl (long double x)
   return ret || (((exi & 0x7fff) == 0x7fff) && (hxi > 0xc000));
 }
 
+#elif (__LDBL_DIG__ == 31)
+
+/* Long double is 128-bit IBM extended type.  */
+
+static inline int
+__issignalingl (long double x)
+{
+  union { long double value; double parts[2]; } u;
+
+  u.value = x;
+  return __issignaling (u.parts[0]);
+}
+
 #elif (__LDBL_DIG__ == 33) && __LDBL_IS_IEC_60559__
 
 /* Long double is 128-bit type.  */


Jakub



Re: [PATCH] Fortran: detect signaling NaNs on targets without issignaling macro in libc

2022-01-25 Thread FX via Gcc-patches
Hi Jakub,

> This doesn't seem to handle the powerpc* IBM double double long double.

Do we support the IEEE Fortran modules on this target, despite having a 
non-IEEE long double? I remember we talked about it when I first implemented 
it, but can’t remember what choice we ended up making.


> __LDBL_IS_IEC_60559__ isn't defined for this type, because it is far from
> an IEEE754 type, but it has signaling NaNs - as can be seen in glibc
> libc/sysdeps/ieee754/ldbl-128ibm/s_issignalingl.c
> the type is a pair of doubles and whether it is a sNaN or qNaN is determined
> by whether the first double is a sNaN or qNaN.
> 
> Ok for trunk?

It doesn’t hurt to provide an implementation, in any case. OK to push, and 
thanks for the patch.

FX

Re: [PATCH 1/2] Check negative combined step

2022-01-25 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
>
>> Jiufu Guo  writes:
>> 
>> > Richard Biener  writes:
>> >
>> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
>> > ...
>> >>
>> >>> -  /* No need to check sign of the new step since below code takes 
>> >>> care
>> >>> - of this well.  */
>> >>> +  /* Like cases shown in PR100740/102131, negtive step is not safe. 
>> >>>  */
>> >>> +  if (tree_int_cst_sign_bit (step))
>> >>> +return false;
>> >>> +
>> >>>if (code != NE_EXPR
>> >>>&& (TREE_CODE (step) != INTEGER_CST
>> >>>|| !iv0->no_overflow || !iv1->no_overflow))
>> >>
>> >> I think for NE_EXPR the sign is not relevant.  I think the key is
>> >> that we require that iv0->no_overflow and for non-equality checks
>> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
>> >> does not cause any overflow on either side.  On the LHS this is
>> >
>> > Hi Richard,
>> >
>> > Thanks a lot for your comments and ideas!
>> >
>> > Right! The adjusting is safe only if we can make sure there is
>> > no overflow/wrap on either side or say there is no overflow/wrap
>> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
Hi Richard,

Thanks for your quickly reply, and patient review!
>
> The point is that we may not change the iteration number at which
> overflow occurs since that alters the result of the < compare.
> Only if we know there is no overflow with the present expression
> during the loop evaluation we can do the transform and then only
> if we do not introduce overflow.
Exactly, this is also what I mean :)

>
> We are basically doing the transform that fold_comparison
> in fold-const.cc does:
>
>   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>  X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>  the resulting offset is smaller in absolute value than the
>  original one and has the same sign.  */
>   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
>   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> ...
>
This transform seems not the scenario which we are care about in
number_of_iterations_cond.
For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
no overflow happen.
But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
different with niter for '{X, 0} < {Y, 3}'.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
   in this loop, iv1 walks to overflow faster, step is 4.
vs.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
   in this loop, iv1 overflows slower, step is 3.

Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
may not always correct.  e.g. for below code, X=6, and Y=2147483645
it may output "b0 + 1 < b1 + 4 is true".
```c
int __attribute__ ((noinline))
foo (int b0, int b1)
{
  return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
   b0 + 1 < b1 + 4 ? "true" : "false");
}

int main(int argc, char **argv)
{
  if (argc < 2)
return -1;
  int b0 = atoi(argv[1]);
  int b1 = atoi(argv[2]);  
  return foo (b0, b1);
}
```
>> > Or it may also ok if we can compute an assumption, under which
>> > all three ivs are not overflowed/wrapped.
>> >
>> >> only guaranteed if the absolute value of C1 - C2 is smaller than
>> >> that of C1 and if it has the same sign.
>> > I'm thinking this in another way:
>> > When trying to do this transform in number_of_iterations_cond,
>> > GT/GE is inverted to LT/LE, then the compare code would be:
>> > LT/LE or NE.
>> >
>> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
>> > chasing iv1.  We would able to assume X < Y (may_be_zero would
>> > be set later via number_of_iterations_lt/le).
>> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
>> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
>> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
>> > This indicates, if the sign of (C1 - C1) is negative, then the
>> > transform would be incorrect.
>> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
>> > iv1 and combined iv) are not wrapped.
>> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>> >the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
>> There is still some cases: iv0 step is too large, then iv0 wraps
>> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
>> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
>> 
>> > For C1 < 0, {X,C1} should not down cross MIN
>> >the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
>> 
>> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
>> > it would be almost safe.
>> For this case, we may still add the assumption to avoid wraping
>> at the first iteration.
>> 
>> BR,
>> Jiufu
>> 
>> >
>> > For NE, it seems more interesting. The transformation depends
>> > on 3 things: 1. the relation between X and Y; 2 the sign
>> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
>> > The 3rd 

Re: [PATCH 1/2] Check negative combined step

2022-01-25 Thread Richard Biener via Gcc-patches
On Tue, 25 Jan 2022, Jiufu Guo wrote:

> Richard Biener  writes:
> 
> > On Tue, 25 Jan 2022, Jiufu Guo wrote:
> >
> >> Jiufu Guo  writes:
> >> 
> >> > Richard Biener  writes:
> >> >
> >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> >> > ...
> >> >>
> >> >>> -  /* No need to check sign of the new step since below code takes 
> >> >>> care
> >> >>> -   of this well.  */
> >> >>> +  /* Like cases shown in PR100740/102131, negtive step is not 
> >> >>> safe.  */
> >> >>> +  if (tree_int_cst_sign_bit (step))
> >> >>> +  return false;
> >> >>> +
> >> >>>if (code != NE_EXPR
> >> >>>  && (TREE_CODE (step) != INTEGER_CST
> >> >>>  || !iv0->no_overflow || !iv1->no_overflow))
> >> >>
> >> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> >> >> that we require that iv0->no_overflow and for non-equality checks
> >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> >> >> does not cause any overflow on either side.  On the LHS this is
> >> >
> >> > Hi Richard,
> >> >
> >> > Thanks a lot for your comments and ideas!
> >> >
> >> > Right! The adjusting is safe only if we can make sure there is
> >> > no overflow/wrap on either side or say there is no overflow/wrap
> >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
> Hi Richard,
> 
> Thanks for your quickly reply, and patient review!
> >
> > The point is that we may not change the iteration number at which
> > overflow occurs since that alters the result of the < compare.
> > Only if we know there is no overflow with the present expression
> > during the loop evaluation we can do the transform and then only
> > if we do not introduce overflow.
> Exactly, this is also what I mean :)
> 
> >
> > We are basically doing the transform that fold_comparison
> > in fold-const.cc does:
> >
> >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> >  X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> >  the resulting offset is smaller in absolute value than the
> >  original one and has the same sign.  */
> >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> >   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> > ...
> >
> This transform seems not the scenario which we are care about in
> number_of_iterations_cond.
> For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> no overflow happen.
> But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> different with niter for '{X, 0} < {Y, 3}'.
> for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
>in this loop, iv1 walks to overflow faster, step is 4.
> vs.
> for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
>in this loop, iv1 overflows slower, step is 3.

Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
fails - we should try the other way around as well).

> Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> may not always correct.  e.g. for below code, X=6, and Y=2147483645
> it may output "b0 + 1 < b1 + 4 is true".

But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
and we can ignore this situation.

> ```c
> int __attribute__ ((noinline))
> foo (int b0, int b1)
> {
>   return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
>  b0 + 1 < b1 + 4 ? "true" : "false");
> }
> 
> int main(int argc, char **argv)
> {
>   if (argc < 2)
> return -1;
>   int b0 = atoi(argv[1]);
>   int b1 = atoi(argv[2]);  
>   return foo (b0, b1);
> }
> ```
> >> > Or it may also ok if we can compute an assumption, under which
> >> > all three ivs are not overflowed/wrapped.
> >> >
> >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> >> >> that of C1 and if it has the same sign.
> >> > I'm thinking this in another way:
> >> > When trying to do this transform in number_of_iterations_cond,
> >> > GT/GE is inverted to LT/LE, then the compare code would be:
> >> > LT/LE or NE.
> >> >
> >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> >> > be set later via number_of_iterations_lt/le).
> >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> >> > This indicates, if the sign of (C1 - C1) is negative, then the
> >> > transform would be incorrect.
> >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> >> > iv1 and combined iv) are not wrapped.
> >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> >> >the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> >> There is still some cases: iv0 step is too large, then iv0 wraps
> >> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> >> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> >> 
> >> > For C1 < 0, {X,C1} should not down cross 

Re: [PATCH 1/2] Check negative combined step

2022-01-25 Thread Richard Biener via Gcc-patches
On Tue, 25 Jan 2022, Richard Biener wrote:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
> 
> > Richard Biener  writes:
> > 
> > > On Tue, 25 Jan 2022, Jiufu Guo wrote:
> > >
> > >> Jiufu Guo  writes:
> > >> 
> > >> > Richard Biener  writes:
> > >> >
> > >> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> > >> > ...
> > >> >>
> > >> >>> -  /* No need to check sign of the new step since below code 
> > >> >>> takes care
> > >> >>> - of this well.  */
> > >> >>> +  /* Like cases shown in PR100740/102131, negtive step is not 
> > >> >>> safe.  */
> > >> >>> +  if (tree_int_cst_sign_bit (step))
> > >> >>> +return false;
> > >> >>> +
> > >> >>>if (code != NE_EXPR
> > >> >>>&& (TREE_CODE (step) != INTEGER_CST
> > >> >>>|| !iv0->no_overflow || !iv1->no_overflow))
> > >> >>
> > >> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> > >> >> that we require that iv0->no_overflow and for non-equality checks
> > >> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> > >> >> does not cause any overflow on either side.  On the LHS this is
> > >> >
> > >> > Hi Richard,
> > >> >
> > >> > Thanks a lot for your comments and ideas!
> > >> >
> > >> > Right! The adjusting is safe only if we can make sure there is
> > >> > no overflow/wrap on either side or say there is no overflow/wrap
> > >> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
> > Hi Richard,
> > 
> > Thanks for your quickly reply, and patient review!
> > >
> > > The point is that we may not change the iteration number at which
> > > overflow occurs since that alters the result of the < compare.
> > > Only if we know there is no overflow with the present expression
> > > during the loop evaluation we can do the transform and then only
> > > if we do not introduce overflow.
> > Exactly, this is also what I mean :)
> > 
> > >
> > > We are basically doing the transform that fold_comparison
> > > in fold-const.cc does:
> > >
> > >   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
> > >  X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
> > >  the resulting offset is smaller in absolute value than the
> > >  original one and has the same sign.  */
> > >   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> > >   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> > > ...
> > >
> > This transform seems not the scenario which we are care about in
> > number_of_iterations_cond.
> > For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
> > no overflow happen.
> > But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
> > different with niter for '{X, 0} < {Y, 3}'.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
> >in this loop, iv1 walks to overflow faster, step is 4.
> > vs.
> > for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
> >in this loop, iv1 overflows slower, step is 3.
> 
> Huh?  But we are _exactly_ doing this, analyzing {X, + 1} < {Y, + 4}
> as X < {Y, + 3} (well, OK, we're only trying {X, -3} which now
> fails - we should try the other way around as well).
> 
> > Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
> > may not always correct.  e.g. for below code, X=6, and Y=2147483645
> > it may output "b0 + 1 < b1 + 4 is true".
> 
> But Y + 4 overflows with 2147483645 so X + 1 < Y + 4 invokes UB
> and we can ignore this situation.
> 
> > ```c
> > int __attribute__ ((noinline))
> > foo (int b0, int b1)
> > {
> >   return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
> >b0 + 1 < b1 + 4 ? "true" : "false");
> > }
> > 
> > int main(int argc, char **argv)
> > {
> >   if (argc < 2)
> > return -1;
> >   int b0 = atoi(argv[1]);
> >   int b1 = atoi(argv[2]);  
> >   return foo (b0, b1);
> > }
> > ```
> > >> > Or it may also ok if we can compute an assumption, under which
> > >> > all three ivs are not overflowed/wrapped.
> > >> >
> > >> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> > >> >> that of C1 and if it has the same sign.
> > >> > I'm thinking this in another way:
> > >> > When trying to do this transform in number_of_iterations_cond,
> > >> > GT/GE is inverted to LT/LE, then the compare code would be:
> > >> > LT/LE or NE.
> > >> >
> > >> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> > >> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> > >> > be set later via number_of_iterations_lt/le).
> > >> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> > >> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> > >> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> > >> > This indicates, if the sign of (C1 - C1) is negative, then the
> > >> > transform would be incorrect.
> > >> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> > >> > iv1 and combined iv) are not wrapped.
> > >> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> > >> >the assumption may like

[PATCH] RISC-V: Always pass -misa-spec to assembler [PR104219]

2022-01-25 Thread Kito Cheng
Add -misa-spec to OPTION_DEFAULT_SPECS to make sure -misa-spec will
always pass that into assembler, that prevent GCC and binutils using
different way to interpret the ISA string.

gcc/ChangeLog:

PR target/104219
* config.gcc (riscv*-*-*): Normalize the with_isa_spec value.
(all_defaults): Add isa_spec.
* config/riscv/riscv.h (OPTION_DEFAULT_SPECS): Add isa_spec.
---
 gcc/config.gcc   | 4 +++-
 gcc/config/riscv/riscv.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 90aec3f8f3f..0bb8c63a46e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4643,12 +4643,14 @@ case "${target}" in
case "${with_isa_spec}" in
""|default|20191213|201912)
tm_defines="${tm_defines} 
TARGET_DEFAULT_ISA_SPEC=ISA_SPEC_CLASS_20191213"
+   with_isa_spec=20191213
;;
2.2)
tm_defines="${tm_defines} 
TARGET_DEFAULT_ISA_SPEC=ISA_SPEC_CLASS_2P2"
;;
20190608 | 201906)
tm_defines="${tm_defines} 
TARGET_DEFAULT_ISA_SPEC=ISA_SPEC_CLASS_20190608"
+   with_isa_spec=20190608
;;
*)
echo "--with-isa-spec only accept 2.2, 20191213, 
201912, 20190608 or 201906" 1>&2
@@ -5430,7 +5432,7 @@ case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 
schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls 
lxc1-sxc1 madd4"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 
schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls 
lxc1-sxc1 madd4 isa_spec"
 for option in $all_defaults
 do
eval "val=\$with_"`echo $option | sed s/-/_/g`
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 695668424c3..8a4d2cf7f85 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -60,6 +60,7 @@ extern const char *riscv_default_mtune (int argc, const char 
**argv);
--with-arch is ignored if -march or -mcpu is specified.
--with-abi is ignored if -mabi is specified.
--with-tune is ignored if -mtune or -mcpu is specified.
+   --with-isa-spec is ignored if -misa-spec is specified.
 
But using default -march/-mtune value if -mcpu don't have valid option.  */
 #define OPTION_DEFAULT_SPECS \
@@ -70,6 +71,7 @@ extern const char *riscv_default_mtune (int argc, const char 
**argv);
   "  %{!mcpu=*:-march=%(VALUE)}"   \
   "  %{mcpu=*:%:riscv_expand_arch_from_cpu(%* %(VALUE))}}" },  \
   {"abi", "%{!mabi=*:-mabi=%(VALUE)}" }, \
+  {"isa_spec", "%{!misa-spec=*:-misa-spec=%(VALUE)}" }, \
 
 #ifdef IN_LIBGCC2
 #undef TARGET_64BIT
-- 
2.34.0



Re: [PATCH] Disable -fsplit-stack support on non-glibc targets

2022-01-25 Thread David Edelsohn via Gcc-patches
This patch broke bootstrap on AIX.  It may have broken Darwin.  I have
applied the following patch.  AIX doesn't need to distinguish between
different Linux libc implementations.

Bootstrapped on powerpc-ibm-aix7.2.3.0

Thanks, David

aix: AIX is not GLIBC.

A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h.  This broke bootstrap for non-Linux rs6000
configurations.  This patch defines OPTION_GLIBC as 0.

* config/rs6000/aix.h (OPTION_GLIBC): Define as 0.

diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
index ad3238bf09a..eb7a0c09f72 100644
--- a/gcc/config/rs6000/aix.h
+++ b/gcc/config/rs6000/aix.h
@@ -23,6 +23,7 @@
 #define DEFAULT_ABI ABI_AIX
 #undef  TARGET_AIX
 #define TARGET_AIX 1
+#define OPTION_GLIBC 0


Re: [PATCH v9] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2022-01-25 Thread Raoni Fassina Firmino via Gcc-patches
On Mon, Jan 24, 2022 at 11:35:47PM +0100, AL gcc-patches wrote:
> On Mon, Jan 24, 2022 at 06:24:11PM -0300, Raoni Fassina Firmino wrote:
> > On Mon, Jan 24, 2022 at 02:29:39PM -0600, Bill Schmidt wrote:
> > > Adding the patch author for his information.
> > 
> > Thanks Bill.
> > 
> > > On 1/24/22 2:26 PM, Jakub Jelinek via Gcc-patches wrote:
> > > > expand_builtin_feclear_feraise_except doesn't check if op0 matches
> > > > the predicate of operands[1], the backend requires const_int_operand,

Below is a patch to do just that. In preliminary tests it seems to work.
What do you think aboud it Jakub?

> > > > but because the call isn't done with a constant integer:
> > > > feraiseexcept (t == LLONG_MIN ? FE_INEXACT : FE_INVALID);
> > > > op0 is a REG.
> > > > If CONST_INT is what is expected on all targets, then it should punt if
> > > > op0 isn't one, otherwise it should the predicate.
> > 
> > My expectation was for backend expanders to be free to choose when they
> > expand, to be able to fail and cleanly fallback to libc call, and not
> > enforce one specific constrains to all targets.
> > 
> > I will look into it ASAP and see what can be done.
> > Thanks for the feedback.
> 
> These days the usual way of doing this is through
> maybe_expand_insn and create_{output,input}_operand before that.

I looked into maybe_expand_insn, if I undestood correctly in my reading
of maybe_expand_insn I could remove mostly if not all code in
expand_builtin_feclear_feraise_except with it, even the validate_arglist
part in the beginning?

o/
Raoni

---
 gcc/builtins.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index e84208035dab..88e689993563 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2598,6 +2598,9 @@ expand_builtin_feclear_feraise_except (tree exp, rtx 
target,
   if (icode == CODE_FOR_nothing)
 return NULL_RTX;
 
+  if (!(*insn_data[icode].operand[1].predicate) (op0, GET_MODE(op0)))
+return NULL_RTX;
+
   if (target == 0
   || GET_MODE (target) != target_mode
   || !(*insn_data[icode].operand[0].predicate) (target, target_mode))
-- 
2.34.1



[PR103970, Fortran, Coarray] Multi-image co_broadcast of derived type with allocatable components fails^

2022-01-25 Thread Andre Vehreschild via Gcc-patches
Hi all,

attached patch fixes wrong code generation when broadcasting a derived type
containing allocatable and non-allocatable scalars. Furthermore does it prevent
broadcasting of coarray-tokens, which are always local this_image. Thus having
them on a different image makes no sense.

Bootstrapped and regtested ok on x86_64-linux/F35.

Ok, for trunk and backport to 12 and 11-branch after decent time?

I perceived that 12 is closed for this kind of bugfix, therefore asking ok for
13.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
gcc/fortran/ChangeLog:

2022-01-24  Andre Vehreschild  

	PR fortran/103790
	* trans-array.cc (structure_alloc_comps): Prevent descriptor
	stacking for non-array data; do not broadcast caf-tokens.
	* trans-intrinsic.cc (conv_co_collective): Prevent generation
	of unused descriptor.

gcc/testsuite/ChangeLog:

2022-01-24  Andre Vehreschild  

	PR fortran/103790
	* gfortran.dg/coarray_collectives_18.f90: New test.

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 2f0c8a4d412..1234932aaff 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -9102,6 +9102,10 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
continue;
}

+ /* Do not broadcast a caf_token.  These are local to the image.  */
+ if (attr->caf_token)
+   continue;
+
  add_when_allocated = NULL_TREE;
  if (cmp_has_alloc_comps
  && !c->attr.pointer && !c->attr.proc_pointer)
@@ -9134,10 +9138,13 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
  if (attr->dimension)
{
  tmp = gfc_get_element_type (TREE_TYPE (comp));
- ubound = gfc_full_array_size (&tmpblock, comp,
-   c->ts.type == BT_CLASS
-   ? CLASS_DATA (c)->as->rank
-   : c->as->rank);
+ if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (comp)))
+   ubound = GFC_TYPE_ARRAY_SIZE (TREE_TYPE (comp));
+ else
+   ubound = gfc_full_array_size (&tmpblock, comp,
+ c->ts.type == BT_CLASS
+ ? CLASS_DATA (c)->as->rank
+ : c->as->rank);
}
  else
{
@@ -9145,26 +9152,36 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
  ubound = build_int_cst (gfc_array_index_type, 1);
}

- cdesc = gfc_get_array_type_bounds (tmp, 1, 0, &gfc_index_one_node,
-&ubound, 1,
-GFC_ARRAY_ALLOCATABLE, false);
+ /* Treat strings like arrays.  Or the other way around, do not
+  * generate an additional array layer for scalar components.  */
+ if (attr->dimension || c->ts.type == BT_CHARACTER)
+   {
+ cdesc = gfc_get_array_type_bounds (tmp, 1, 0, &gfc_index_one_node,
+&ubound, 1,
+GFC_ARRAY_ALLOCATABLE, false);

- cdesc = gfc_create_var (cdesc, "cdesc");
- DECL_ARTIFICIAL (cdesc) = 1;
+ cdesc = gfc_create_var (cdesc, "cdesc");
+ DECL_ARTIFICIAL (cdesc) = 1;

- gfc_add_modify (&tmpblock, gfc_conv_descriptor_dtype (cdesc),
- gfc_get_dtype_rank_type (1, tmp));
- gfc_conv_descriptor_lbound_set (&tmpblock, cdesc,
- gfc_index_zero_node,
- gfc_index_one_node);
- gfc_conv_descriptor_stride_set (&tmpblock, cdesc,
- gfc_index_zero_node,
- gfc_index_one_node);
- gfc_conv_descriptor_ubound_set (&tmpblock, cdesc,
- gfc_index_zero_node, ubound);
+ gfc_add_modify (&tmpblock, gfc_conv_descriptor_dtype (cdesc),
+ gfc_get_dtype_rank_type (1, tmp));
+ gfc_conv_descriptor_lbound_set (&tmpblock, cdesc,
+ gfc_index_zero_node,
+ gfc_index_one_node);
+ gfc_conv_descriptor_stride_set (&tmpblock, cdesc,
+ gfc_index_zero_node,
+ gfc_index_one_node);
+ gfc_conv_descriptor_ubound_set (&tmpblock, cdesc,
+ gfc_index_zero_node, ubound);
+   }

  if (attr->dimension)
-   comp = gfc_conv_descriptor_data_get (comp);
+   {
+ if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (comp)))
+   comp = gfc_conv_descriptor_data_get (comp);
+

[PATCH] c++: deleted fn and noexcept inst [PR101532, PR104225]

2022-01-25 Thread Patrick Palka via Gcc-patches
Here when attempting to use B's implicitly deleted default constructor,
mark_used rightfully returns false, but for the wrong reason: it
tries to instantiate the implicit noexcept specifier, which only
silently fails because get_defaulted_eh_spec suppresses diagnostics
for deleted functions.  This causes us to ICE on the first testcase
(thanks to the sanity check in finish_expr_stmt) and accept the second
invalid testcase without complaint.

To fix this, this patch makes mark_used avoid attempting to instantiate
the noexcept specifier on a deleted function, so that mark_used will
instead directly reject (and diagnose) such a function due to its
deletedness.

Bootstrap and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/101532
PR c++/104225

gcc/cp/ChangeLog:

* decl2.cc (mark_used): Don't call with
maybe_instantiate_noexcept on a deleted function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template21.C: New test.
* g++.dg/cpp0x/nsdmi-template22.C: New test.
---
 gcc/cp/decl2.cc   |  1 +
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C | 10 ++
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C | 12 
 3 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 69b97449eac..b68cf96fa81 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -5774,6 +5774,7 @@ mark_used (tree decl, tsubst_flags_t complain)
 used_types_insert (DECL_CONTEXT (decl));
 
   if (TREE_CODE (decl) == FUNCTION_DECL
+  && !DECL_DELETED_FN (decl)
   && !maybe_instantiate_noexcept (decl, complain))
 return false;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
new file mode 100644
index 000..b9d4f4a4b9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
@@ -0,0 +1,10 @@
+// PR c++/101532
+// { dg-do compile { target c++11 } }
+
+class A { ~A(); };
+
+template class B { // { dg-error "private" }
+  A f = A();
+};
+
+B b; // { dg-error "deleted" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
new file mode 100644
index 000..bbcc7d9732e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
@@ -0,0 +1,12 @@
+// PR c++/104225
+// { dg-do compile { target c++11 } }
+
+class A { ~A(); };
+
+template class B { // { dg-error "private" }
+  A f = A();
+};
+
+int main() {
+  new B; // { dg-error "deleted" }
+}
-- 
2.35.0



Re: [PATCH] libgccjit: Add support for register variables [PR104072]

2022-01-25 Thread Antoni Boucher via Gcc-patches
See answers below.

Le lundi 24 janvier 2022 à 18:20 -0500, David Malcolm a écrit :
> On Sat, 2022-01-22 at 19:29 -0500, Antoni Boucher wrote:
> > Hi.
> > 
> > Le mardi 18 janvier 2022 à 18:49 -0500, David Malcolm a écrit :
> > > On Mon, 2022-01-17 at 19:46 -0500, Antoni Boucher via Gcc-patches
> > > wrote:
> > > > I missed the comment about the new define, so here's the
> > > > updated
> > > > patch.
> > > 
> > > Thanks for the patch.
> > > > 
> > > > Le lundi 17 janvier 2022 à 19:24 -0500, Antoni Boucher via Jit
> > > > a
> > > > écrit :
> > > > > Hi.
> > > > > This patch add supports for register variables in libgccjit.
> > > > > 
> > > > > It passes the JIT tests, but since I added a function in
> > > > > reginfo.c,
> > > > > I
> > > > > wonder if I should run the whole testsuite.
> > > 
> > > We're in stage 4 for gcc 12, so we should be especially careful
> > > about
> > > changes right now, and we're not meant to be adding new GCC 12
> > > features.
> > > 
> > > How close is gcc 12's libgccjit to being usable with your rustc
> > > backend?  If we're almost there, I'm willing to make our case for
> > > late-
> > > breaking libgccjit changes to the release managers, but if you
> > > think
> > > rustc users are going to need to build a patched libgccjit, then
> > > let's
> > > queue this up for gcc 13.
> > 
> > As I mentioned in my other email, if the 4 patches currently being
> > reviewed (and listed here:
> > https://github.com/antoyo/libgccjit-patches) were included in gcc
> > 12,
> > I'd be able to build rustc_codegen_gcc with an unpatched gcc.
> 
> Thanks.  Once the relevant patches look good to me, I'll approach the
> release managers with the proposal.
> 
> > 
> > It is to be noted however, that I'll need more patches for future
> > work.
> > Off the top of my head, I'll at least need a patch for the inline
> > attribute, try/catch and target-specific builtins.
> > The last 2 features will probably take some time to implement, so
> > I'll
> > let you judge if you think it's worth merging the 4 patches
> > currently
> > being reviewed for gcc 12.
> 
> Thanks, though I don't know enough about your project's features to
> make the judgement call.  Does rustc_codegen_gcc have releases yet,
> or
> are you just pointing people at the trunk of your repo?  I guess the
> question is - are you hoping to be able to point people at distro
> installs of gcc 12's libgccjit and have some version of
> rustc_codegen_gcc "just work" with that, or are they going to have to
> rebuild their own libgccjit to meaningfully use it?

rustc_codegen_gcc does not have releases. It is merged from time to
time into rustc, so we can as well say that I point them to trunk.

I can feature gate the future features/patches I will need so that
people can still use the project with an unpatched gcc.

There's some interest in having it work with an unpatched gcc because
that would allow people to start working on the infra to get the
project distributed via rustup so that it could be distributed as a
preview component.

> 
> [...snip various corrections...]
> 
> > 
> > > > diff --git a/gcc/testsuite/jit.dg/test-register-variable.c
> > > > b/gcc/testsuite/jit.dg/test-register-variable.c
> > > > new file mode 100644
> > > > index 000..3cea3f1668f
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/jit.dg/test-register-variable.c
> > > > +
> 
> [...snip...]
> 
> > > > +/* { dg-final { jit-verify-output-file-was-created "" } } */
> > > > +/* { dg-final { jit-verify-assembler-output "movl  \\\$1,
> > > > %r12d" } } */
> > > > +/* { dg-final { jit-verify-assembler-output "movl  \\\$2,
> > > > %r13d" } } */
> > > 
> > > How target-specific is this test?
> > 
> > It will only work on x86-64. Should I feature-gate the test
> > somehow?
> 
> 
> Yes; I think you can do this by adding this to the top of the test:
> 
>    /* { dg-do compile { target x86_64-*-* } } */
> 
> like test-asm.c does.

Thanks. I'll update the patch to do this.

> 
> > > 
> > > We should have test coverage for at least these two errors:
> > > 
> > > - gcc_jit_lvalue_set_register_name(global_variable,
> > > "this_is_not_a_register");
> > > - attempting to set the name for a var that doesn't fit in the
> > > given
> > > register (e.g. trying to use a register for an array that's way
> > > too
> > > big)
> > 
> > Done.
> 
> Thanks.
> 
> Is the updated patch available for review? It looks like you didn't
> attach it.
> 
> Dave
> 



Re: [PATCH] c++: deleted fn and noexcept inst [PR101532, PR104225]

2022-01-25 Thread Patrick Palka via Gcc-patches
On Tue, 25 Jan 2022, Patrick Palka wrote:

> Here when attempting to use B's implicitly deleted default constructor,
> mark_used rightfully returns false, but for the wrong reason: it
> tries to instantiate the implicit noexcept specifier, which only
> silently fails because get_defaulted_eh_spec suppresses diagnostics
> for deleted functions.  This causes us to ICE on the first testcase
> (thanks to the sanity check in finish_expr_stmt) and accept the second
> invalid testcase without complaint.
> 
> To fix this, this patch makes mark_used avoid attempting to instantiate
> the noexcept specifier on a deleted function, so that mark_used will
> instead directly reject (and diagnose) such a function due to its
> deletedness.
> 
> Bootstrap and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?

... and perhaps release branches given that the second PR is a
regression relative to GCC 8?

> 
>   PR c++/101532
>   PR c++/104225
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (mark_used): Don't call with
>   maybe_instantiate_noexcept on a deleted function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/nsdmi-template21.C: New test.
>   * g++.dg/cpp0x/nsdmi-template22.C: New test.
> ---
>  gcc/cp/decl2.cc   |  1 +
>  gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C | 10 ++
>  gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C | 12 
>  3 files changed, 23 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
> 
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index 69b97449eac..b68cf96fa81 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -5774,6 +5774,7 @@ mark_used (tree decl, tsubst_flags_t complain)
>  used_types_insert (DECL_CONTEXT (decl));
>  
>if (TREE_CODE (decl) == FUNCTION_DECL
> +  && !DECL_DELETED_FN (decl)
>&& !maybe_instantiate_noexcept (decl, complain))
>  return false;
>  
> diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C 
> b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
> new file mode 100644
> index 000..b9d4f4a4b9a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
> @@ -0,0 +1,10 @@
> +// PR c++/101532
> +// { dg-do compile { target c++11 } }
> +
> +class A { ~A(); };
> +
> +template class B { // { dg-error "private" }
> +  A f = A();
> +};
> +
> +B b; // { dg-error "deleted" }
> diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C 
> b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
> new file mode 100644
> index 000..bbcc7d9732e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
> @@ -0,0 +1,12 @@
> +// PR c++/104225
> +// { dg-do compile { target c++11 } }
> +
> +class A { ~A(); };
> +
> +template class B { // { dg-error "private" }
> +  A f = A();
> +};
> +
> +int main() {
> +  new B; // { dg-error "deleted" }
> +}
> -- 
> 2.35.0
> 
> 



Re: [PATCH] Fix PR 67102: Add libstdc++ dependancy to libffi

2022-01-25 Thread Thomas Schwinge
Hi!

On 2021-09-17T01:01:39-0700, Andrew Pinski via Gcc-patches 
 wrote:
> On Fri, Sep 17, 2021 at 12:46 AM Thomas Schwinge
>  wrote:
>> First, I appreciate you working through all these old PRs!
>>
>>
>> On 2021-09-15T13:56:37-0700, apinski--- via Gcc-patches 
>>  wrote:
>> > The error message is obvious -funconfigured-libstdc++-v3 is used
>> > on the g++ command line.  So we just add the dependancy.
>>
>> > --- a/Makefile.def
>> > +++ b/Makefile.def
>> > @@ -592,6 +592,7 @@ dependencies = { module=configure-target-fastjar; 
>> > on=configure-target-zlib; };
>> >  dependencies = { module=all-target-fastjar; on=all-target-zlib; };
>> >  dependencies = { module=configure-target-libgo; 
>> > on=configure-target-libffi; };
>> >  dependencies = { module=configure-target-libgo; 
>> > on=all-target-libstdc++-v3; };
>> > +dependencies = { module=configure-target-libffi; 
>> > on=all-target-libstdc++-v3; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libffi; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libatomic; };
>>
>> I'm confused, because given that this 'Makefile.def' change only has the
>> following effect:
>>
>> > --- a/Makefile.in
>> > +++ b/Makefile.in
>> > @@ -61261,6 +61261,7 @@ all-bison: maybe-all-intl
>> >  all-flex: maybe-all-intl
>> >  all-m4: maybe-all-intl
>> >  configure-target-libgo: maybe-all-target-libstdc++-v3
>> > +configure-target-libffi: maybe-all-target-libstdc++-v3
>> >  configure-target-liboffloadmic: maybe-configure-target-libgomp
>> >  all-target-liboffloadmic: maybe-all-target-libgomp
>> >  configure-target-newlib: maybe-all-binutils
>>
>> ... isn't that actually a no-op, because we already had such a dependency
>> listed?  Now twice:
>>
>> $ grep -n -F 'configure-target-libffi: maybe-all-target-libstdc++-v3' -- 
>> Makefile.in
>> 61264:configure-target-libffi: maybe-all-target-libstdc++-v3
>> 61372:configure-target-libffi: maybe-all-target-libstdc++-v3
>>
>> Compared to the existing one, the one you've added is additionally
>> restricted by '@unless gcc-bootstrap'.
>>
>> I noticed this as I remembered that on our og[...] development branches
>> we have a patch in the opposite direction: get rid of this dependency via
>> removing 'lang_env_dependencies = { module=libffi; cxx=true; };' from
>> 'Makefile.def'.  See
>> 
>> "Disable libstdc++ dependency for libffi".  (Maciej CCed in case you have
>> any further thoughts on that.)
>
> Oh, I see what happened now, the old bug was actually fixed by r6-5415
> which added cxx=true.
> So yes my patch is actually not needed and can be reverted.
> I tried to look to see if there was a dependency was there but for
> some reason I did not see it.

I have thus pushed 'git revert db1a65d9364fe72c2fff65fb2dec051728b6f3fa'
to master branch in commit aeac414923aa1e87986c7fc6f9b921d89a9b86cf
'Revert "Fix PR 67102: Add libstdc++ dependancy to libffi" [PR67102]',
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From aeac414923aa1e87986c7fc6f9b921d89a9b86cf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 25 Jan 2022 18:44:02 +0100
Subject: [PATCH] Revert "Fix PR 67102: Add libstdc++ dependancy to libffi"
 [PR67102]

This reverts commit db1a65d9364fe72c2fff65fb2dec051728b6f3fa.

On 2021-09-17T01:01:39-0700, Andrew Pinski via Gcc-patches  wrote:
> On Fri, Sep 17, 2021 at 12:46 AM Thomas Schwinge  wrote:
>> On 2021-09-15T13:56:37-0700, apinski--- via Gcc-patches  wrote:
>> > The error message is obvious -funconfigured-libstdc++-v3 is used
>> > on the g++ command line.  So we just add the dependancy.
>>
>> > --- a/Makefile.def
>> > +++ b/Makefile.def
>> > @@ -592,6 +592,7 @@ dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
>> >  dependencies = { module=all-target-fastjar; on=all-target-zlib; };
>> >  dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
>> >  dependencies = { module=configure-target-libgo; on=all-target-libstdc++-v3; };
>> > +dependencies = { module=configure-target-libffi; on=all-target-libstdc++-v3; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libffi; };
>> >  dependencies = { module=all-target-libgo; on=all-target-libatomic; };
>>
>> I'm confused, because given that this 'Makefile.def' change only has the
>> following effect:
>>
>> > --- a/Makefile.in
>> > +++ b/Makefile.in
>> > @@ -61261,6 +61261,7 @@ all-bison: maybe-all-intl
>> >  all-flex: maybe-all-intl
>> >  all-m4: maybe-all-intl
>> >  configure-targ

Re: [PATCH v9] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2022-01-25 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 25, 2022 at 01:28:29PM -0300, Raoni Fassina Firmino wrote:
> Below is a patch to do just that. In preliminary tests it seems to work.
> What do you think aboud it Jakub?

Ok for trunk.

> > These days the usual way of doing this is through
> > maybe_expand_insn and create_{output,input}_operand before that.
> 
> I looked into maybe_expand_insn, if I undestood correctly in my reading
> of maybe_expand_insn I could remove mostly if not all code in
> expand_builtin_feclear_feraise_except with it, even the validate_arglist
> part in the beginning?

validate_arglist should be still performed.  But maybe_expand_insn +
create_*_operand will take care of checking the predicates etc.

> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2598,6 +2598,9 @@ expand_builtin_feclear_feraise_except (tree exp, rtx 
> target,
>if (icode == CODE_FOR_nothing)
>  return NULL_RTX;
>  
> +  if (!(*insn_data[icode].operand[1].predicate) (op0, GET_MODE(op0)))
> +return NULL_RTX;
> +
>if (target == 0
>|| GET_MODE (target) != target_mode
>|| !(*insn_data[icode].operand[0].predicate) (target, target_mode))
> -- 
> 2.34.1

Jakub



Re: [PATCH] dwarf2out: For ppc64le IEEE quad long double, emit DW_TAG_typedef to _Float128 [PR104194]

2022-01-25 Thread Jason Merrill via Gcc-patches

On 1/25/22 06:02, Jakub Jelinek wrote:

On Mon, Jan 24, 2022 at 11:26:27PM +0100, Jakub Jelinek via Gcc-patches wrote:

Yet another short term solution might be not use DW_TAG_base_type
for the IEEE quad long double, but instead pretend it is a DW_TAG_typedef
with DW_AT_name "long double" to __float128 DW_TAG_base_type.
I bet gdb would even handle it without any changes, but of course, it would
be larger than the other proposed changes.


Here it is implemented.

Testcases I've played with are e.g.:
__ibm128 a;
long double b;
_Complex long double c;

static __attribute__((noinline)) int
foo (long double d)
{
   long double e = d + 1.0L;
   return 0;
}

int
main ()
{
   a = 1.0;
   b = 2.0;
   c = 5.0 + 6.0i;
   return foo (7.0L);
}
and
   real(kind=16) :: a
   complex(kind=16) :: b
   a = 1.0
   b = 2.0
end

Printing the values of the variables works well,
p &b or p &c shows pointer to the correct type, just
ptype b or ptype c prints _Float128 instead of
long double or complex _Float128 instead of complex long double.
Even worse in fortran where obviously _Float128 or
complex _Float128 aren't valid types, but as GDB knows them by name,
it is just ptype that is weird.

Is this ok for trunk until we get an agreement on which extension
to use (different DW_ATE_*, or DW_AT_GNU_precision or
DW_ATE_GNU_encoding_variant)?

2022-01-25  Jakub Jelinek  

PR debug/104194
* dwarf2out.cc (long_double_as_float128): New function.
(modified_type_die): For powerpc64le IEEE 754 quad long double
and complex long double emit those as DW_TAG_typedef to
_Float128 or complex _Float128 base type.

--- gcc/dwarf2out.cc.jj 2022-01-25 05:47:53.987454934 +0100
+++ gcc/dwarf2out.cc2022-01-25 11:54:25.100522089 +0100
@@ -13568,6 +13568,47 @@ qualified_die_p (dw_die_ref die, int *ma
return type;
  }
  
+/* If TYPE is long double or complex long double that

+   should be emitted as artificial typedef to _Float128 or
+   complex _Float128, return the type it should be emitted as.
+   This is done in case the target already supports 16-byte
+   composite floating point type (ibm_extended_format).  */
+
+static tree
+long_double_as_float128 (tree type)
+{
+  if (type != long_double_type_node
+  && type != complex_long_double_type_node)
+return NULL_TREE;
+
+  machine_mode mode, fmode;
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+mode = TYPE_MODE (TREE_TYPE (type));
+  else
+mode = TYPE_MODE (type);
+  if (known_eq (GET_MODE_SIZE (mode), 16) && !MODE_COMPOSITE_P (mode))
+FOR_EACH_MODE_IN_CLASS (fmode, MODE_FLOAT)
+  if (known_eq (GET_MODE_SIZE (fmode), 16)
+  && MODE_COMPOSITE_P (fmode))
+   {
+ if (type == long_double_type_node)
+   {
+ if (float128_type_node
+ && (TYPE_MODE (float128_type_node)
+ == TYPE_MODE (type)))
+   return float128_type_node;
+ return NULL_TREE;
+   }
+ for (int i = 0; i < NUM_FLOATN_NX_TYPES; i++)
+   if (COMPLEX_FLOATN_NX_TYPE_NODE (i) != NULL_TREE
+   && (TYPE_MODE (COMPLEX_FLOATN_NX_TYPE_NODE (i))
+   == TYPE_MODE (type)))
+ return COMPLEX_FLOATN_NX_TYPE_NODE (i);
+   }


Do we really need this loop to determine if the target supports two 
different long doubles?  This seems like a complicated way of saying
"if this is IEEE 128-bit long double and the target also supports IBM 
double double", return _Float128."


But OK.


+
+  return NULL_TREE;
+}
+
  /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
 entry that chains the modifiers specified by CV_QUALS in front of the
 given type.  REVERSE is true if the type is to be interpreted in the
@@ -13848,7 +13889,32 @@ modified_type_die (tree type, int cv_qua
  }
else if (is_base_type (type))
  {
-  mod_type_die = base_type_die (type, reverse);
+  /* If a target supports long double as different floating point
+modes with the same 16-byte size, use normal DW_TAG_base_type
+only for the composite (ibm_extended_real_format) type and
+for the other for the time being emit instead a "_Float128"
+or "complex _Float128" DW_TAG_base_type and a "long double"
+or "complex long double" typedef to it.  */
+  if (tree other_type = long_double_as_float128 (type))
+   {
+ dw_die_ref other_die;
+ if (TYPE_NAME (other_type))
+   other_die
+ = modified_type_die (other_type, TYPE_UNQUALIFIED, reverse,
+  context_die);
+ else
+   {
+ other_die = base_type_die (type, reverse);
+ add_child_die (comp_unit_die (), other_die);
+ add_name_attribute (other_die,
+ TREE_CODE (type) == COMPLEX_TYPE
+ ? "complex _Float128" : "_Float128");
+   }
+ mod_type_die = new_die

Re: [PATCH] c++: deleted fn and noexcept inst [PR101532, PR104225]

2022-01-25 Thread Jason Merrill via Gcc-patches

On 1/25/22 12:33, Patrick Palka wrote:

On Tue, 25 Jan 2022, Patrick Palka wrote:


Here when attempting to use B's implicitly deleted default constructor,
mark_used rightfully returns false, but for the wrong reason: it
tries to instantiate the implicit noexcept specifier, which only
silently fails because get_defaulted_eh_spec suppresses diagnostics
for deleted functions.  This causes us to ICE on the first testcase
(thanks to the sanity check in finish_expr_stmt) and accept the second
invalid testcase without complaint.

To fix this, this patch makes mark_used avoid attempting to instantiate
the noexcept specifier on a deleted function, so that mark_used will
instead directly reject (and diagnose) such a function due to its
deletedness.

Bootstrap and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


... and perhaps release branches given that the second PR is a
regression relative to GCC 8?


OK for trunk and release branches.



PR c++/101532
PR c++/104225

gcc/cp/ChangeLog:

* decl2.cc (mark_used): Don't call with
maybe_instantiate_noexcept on a deleted function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template21.C: New test.
* g++.dg/cpp0x/nsdmi-template22.C: New test.
---
  gcc/cp/decl2.cc   |  1 +
  gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C | 10 ++
  gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C | 12 
  3 files changed, 23 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 69b97449eac..b68cf96fa81 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -5774,6 +5774,7 @@ mark_used (tree decl, tsubst_flags_t complain)
  used_types_insert (DECL_CONTEXT (decl));
  
if (TREE_CODE (decl) == FUNCTION_DECL

+  && !DECL_DELETED_FN (decl)
&& !maybe_instantiate_noexcept (decl, complain))
  return false;
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C

new file mode 100644
index 000..b9d4f4a4b9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template21.C
@@ -0,0 +1,10 @@
+// PR c++/101532
+// { dg-do compile { target c++11 } }
+
+class A { ~A(); };
+
+template class B { // { dg-error "private" }
+  A f = A();
+};
+
+B b; // { dg-error "deleted" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
new file mode 100644
index 000..bbcc7d9732e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template22.C
@@ -0,0 +1,12 @@
+// PR c++/104225
+// { dg-do compile { target c++11 } }
+
+class A { ~A(); };
+
+template class B { // { dg-error "private" }
+  A f = A();
+};
+
+int main() {
+  new B; // { dg-error "deleted" }
+}
--
2.35.0








[pushed] c++: assignment to temporary [PR59950]

2022-01-25 Thread Jason Merrill via Gcc-patches
Given build_this of a TARGET_EXPR, cp_build_fold_indirect_ref returns the
TARGET_EXPR.  But that's the wrong value category for the result of the
defaulted class assignment operator, which returns an lvalue, so we need to
actually build the INDIRECT_REF.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/59950

gcc/cp/ChangeLog:

* call.cc (build_over_call): Use cp_build_indirect_ref.

gcc/testsuite/ChangeLog:

* g++.dg/init/assign2.C: New test.
---
 gcc/cp/call.cc  | 5 -
 gcc/testsuite/g++.dg/init/assign2.C | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/init/assign2.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index f7f861cd16e..bc157cdd1fb 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -9789,7 +9789,10 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   && DECL_OVERLOADED_OPERATOR_IS (fn, NOP_EXPR)
   && trivial_fn_p (fn))
 {
-  tree to = cp_build_fold_indirect_ref (argarray[0]);
+  /* Don't use cp_build_fold_indirect_ref, op= returns an lvalue even if
+the object argument isn't one.  */
+  tree to = cp_build_indirect_ref (input_location, argarray[0],
+  RO_ARROW, complain);
   tree type = TREE_TYPE (to);
   tree as_base = CLASSTYPE_AS_BASE (type);
   tree arg = argarray[1];
diff --git a/gcc/testsuite/g++.dg/init/assign2.C 
b/gcc/testsuite/g++.dg/init/assign2.C
new file mode 100644
index 000..72d1264f3c9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/assign2.C
@@ -0,0 +1,6 @@
+// PR c++/59950
+
+ struct Foo {};
+
+ int f(Foo *p);
+ int n = f(&(Foo() = Foo()));

base-commit: aeac414923aa1e87986c7fc6f9b921d89a9b86cf
-- 
2.27.0



Re: [PATCH] dwarf2out: For ppc64le IEEE quad long double, emit DW_TAG_typedef to _Float128 [PR104194]

2022-01-25 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 25, 2022 at 02:24:34PM -0500, Jason Merrill wrote:
> > +  machine_mode mode, fmode;
> > +  if (TREE_CODE (type) == COMPLEX_TYPE)
> > +mode = TYPE_MODE (TREE_TYPE (type));
> > +  else
> > +mode = TYPE_MODE (type);
> > +  if (known_eq (GET_MODE_SIZE (mode), 16) && !MODE_COMPOSITE_P (mode))
> > +FOR_EACH_MODE_IN_CLASS (fmode, MODE_FLOAT)
> > +  if (known_eq (GET_MODE_SIZE (fmode), 16)
> > +  && MODE_COMPOSITE_P (fmode))
> > +   {
> > + if (type == long_double_type_node)
> > +   {
> > + if (float128_type_node
> > + && (TYPE_MODE (float128_type_node)
> > + == TYPE_MODE (type)))
> > +   return float128_type_node;
> > + return NULL_TREE;
> > +   }
> > + for (int i = 0; i < NUM_FLOATN_NX_TYPES; i++)
> > +   if (COMPLEX_FLOATN_NX_TYPE_NODE (i) != NULL_TREE
> > +   && (TYPE_MODE (COMPLEX_FLOATN_NX_TYPE_NODE (i))
> > +   == TYPE_MODE (type)))
> > + return COMPLEX_FLOATN_NX_TYPE_NODE (i);
> > +   }
> 
> Do we really need this loop to determine if the target supports two
> different long doubles?  This seems like a complicated way of saying
> "if this is IEEE 128-bit long double and the target also supports IBM double
> double", return _Float128."

The outermost if and FOR_EACH_MODE_IN_CLASS in it are checking if it is
essentially ppc64le-linux with -mabi=ieeelongdouble, i.e. whether long double
is non-composite 16-byte and the target supports at least one composite
16-byte mode.
The inner loop is because we don't have a complex_float128_type_node macro,
so it needs to be found by iteration or hardcoding which entry it is
(it iterates over those 6 entries for complex _Float{32,64,128}{,x}).

Jakub



[OG11][committed] OpenMP metadirective support

2022-01-25 Thread Kwok Cheung Yeung

Hello

I have backported and committed my metadirective patches onto the 
current OpenMP development branch (devel/omp/gcc-11). These are:


f464df13a44b9814341659be631f051377a2ce25 openmp: Add C support for 
parsing metadirectives
a238b6934b62ce3e8342047e41840c804d83b59d openmp: Add middle-end support 
for metadirectives
7e672d2ba146ca55dfffc36b198fbb3f3200f8f2 openmp: Add support for 
resolving metadirectives during parsing and Gimplification
b6fd3d1a54736c87fcd29a4ed294b31346b3af75 openmp: Add support for 
streaming metadirectives and resolving them after LTO
360db2054413d21399473173a85870da6479ab8c openmp: Add C++ support for 
parsing metadirectives
ceb0beb7ba9357146994895070762f8a9d94ca7c openmp, fortran: Add Fortran 
support for parsing metadirectives
eb4bea483010d91fbeeae9c863e92da873fbeef9 openmp: Add testcases for 
metadirectives

b597c0835ede0067d1b009e0d7381515b44d8753 openmp: Metadirective fixes
21766085775bd52c9db53629636c830fc9dc6fa0 openmp: Add support for 
'target_device' context selector set


The following backport from mainline was also required:

94c179971913b4837ec76a9e02a9a8a5cbf8e024 Expose stable sort algorithm to 
gcc_sort_r and add vec::stablesort


Thanks

Kwok


[pushed] PR/fortran 104227 - [9/10/11/12 Regression] ICE virtual memory exhausted: Cannot allocate memory

2022-01-25 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

committed as obvious after regtesting on x86_64-pc-linux-gnu.

We already had a check for the MOLD argument to the TRANSFER
intrinsic for having storage size 0, which failed if MOLD was
an EXPR_VARIABLE although having rank != 0.  Duh.
Adjusting that check fixed the issue.

I intend to backport to the affected branches after some
waiting time.

Thanks,
Harald

From ec543c9833c2d9283c035cd8430849eb4ec04406 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 25 Jan 2022 21:56:39 +0100
Subject: [PATCH] Fortran: MOLD argument to TRANSFER intrinsic having storage
 size zero

gcc/fortran/ChangeLog:

	PR fortran/104227
	* check.cc (gfc_calculate_transfer_sizes): Fix checking of arrays
	passed as MOLD argument to the TRANSFER intrinsic for having
	storage size zero.

gcc/testsuite/ChangeLog:

	PR fortran/104227
	* gfortran.dg/transfer_check_6.f90: New test.
---
 gcc/fortran/check.cc   |  2 +-
 gcc/testsuite/gfortran.dg/transfer_check_6.f90 | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/transfer_check_6.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 4fa05ee7e9b..d6c6767ae9e 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -6151,7 +6151,7 @@ gfc_calculate_transfer_sizes (gfc_expr *source, gfc_expr *mold, gfc_expr *size,
* If SIZE is present, the result is an array of rank one and size SIZE.
*/
   if (result_elt_size == 0 && *source_size > 0 && !size
-  && mold->expr_type == EXPR_ARRAY)
+  && (mold->expr_type == EXPR_ARRAY || mold->rank))
 {
   gfc_error ("% argument of % intrinsic at %L is an "
 		 "array and shall not have storage size 0 when % "
diff --git a/gcc/testsuite/gfortran.dg/transfer_check_6.f90 b/gcc/testsuite/gfortran.dg/transfer_check_6.f90
new file mode 100644
index 000..dffd0913f0d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/transfer_check_6.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! PR fortran/104227 - ICE virtual memory exhausted
+! Contributed by G.Steinmetz
+
+program p
+  type t
+  end type
+  type(t) :: x(2)
+  print *, transfer(1, x) ! { dg-error "shall not have storage size 0" }
+  x = transfer(1, x)  ! { dg-error "shall not have storage size 0" }
+end
--
2.31.1



[committed] libstdc++: Avoid symlink race in filesystem::remove_all [PR104161]

2022-01-25 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk. Backports to follow.


This adds a new internal flag to the filesystem::directory_iterator
constructor that makes it fail if the path is a symlink that resolves to
a directory. This prevents filesystem::remove_all from following a
symlink to a directory, rather than deleting the symlink itself.

We can also use that new flag in recursive_directory_iterator to ensure
that we don't follow symlinks if the follow_directory_symlink option is
not set.

This also moves an error check in filesystem::remove_all after the while
loop, so that errors from the directory_iterator constructor are
reproted, instead of continuing to the filesystem::remove call below.

libstdc++-v3/ChangeLog:

PR libstdc++/104161
* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check for
fdopendir.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/c++17/fs_dir.cc (_Dir): Add nofollow flag to constructor
and pass it to base class constructor.
(directory_iterator): Pass nofollow flag to _Dir constructor.
(fs::recursive_directory_iterator::increment): Likewise.
* src/c++17/fs_ops.cc (do_remove_all): Use nofollow option for
directory_iterator constructor. Move error check outside loop.
* src/filesystem/dir-common.h (_Dir_base): Add nofollow flag to
constructor and when it's set use ::open with O_NOFOLLOW and
O_DIRECTORY.
* src/filesystem/dir.cc (_Dir): Add nofollow flag to constructor
and pass it to base class constructor.
(directory_iterator): Pass nofollow flag to _Dir constructor.
(fs::recursive_directory_iterator::increment): Likewise.
* src/filesystem/ops.cc (remove_all): Use nofollow option for
directory_iterator constructor. Move error check outside loop.
---
 libstdc++-v3/acinclude.m4| 12 ++
 libstdc++-v3/config.h.in |  3 ++
 libstdc++-v3/configure   | 55 
 libstdc++-v3/src/c++17/fs_dir.cc | 13 --
 libstdc++-v3/src/c++17/fs_ops.cc | 12 +++---
 libstdc++-v3/src/filesystem/dir-common.h | 48 -
 libstdc++-v3/src/filesystem/dir.cc   | 13 --
 libstdc++-v3/src/filesystem/ops.cc   |  6 +--
 8 files changed, 134 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index d996477254c..7b6b807114a 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4735,6 +4735,18 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
 AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in 
.])
   fi
+dnl
+  AC_CACHE_CHECK([for fdopendir],
+glibcxx_cv_fdopendir, [dnl
+GCC_TRY_COMPILE_OR_LINK(
+  [#include ],
+  [::fdopendir(1);],
+  [glibcxx_cv_fdopendir=yes],
+  [glibcxx_cv_fdopendir=no])
+  ])
+  if test $glibcxx_cv_truncate = yes; then
+AC_DEFINE(HAVE_FDOPENDIR, 1, [Define if fdopendir is available in 
.])
+  fi
 dnl
   CXXFLAGS="$ac_save_CXXFLAGS"
   AC_LANG_RESTORE
diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 235d256a1cc..e25b7de318f 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -100,6 +100,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_FCNTL_H
 
+/* Define if fdopendir is available in . */
+#undef HAVE_FDOPENDIR
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_FENV_H
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 4c20c669144..5c6e51f5c11 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -77029,6 +77029,61 @@ $as_echo "$glibcxx_cv_truncate" >&6; }
 
 $as_echo "#define HAVE_TRUNCATE 1" >>confdefs.h
 
+  fi
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for fdopendir" >&5
+$as_echo_n "checking for fdopendir... " >&6; }
+if ${glibcxx_cv_fdopendir+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test x$gcc_no_link = xyes; then
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+::fdopendir(1);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  glibcxx_cv_fdopendir=yes
+else
+  glibcxx_cv_fdopendir=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  if test x$gcc_no_link = xyes; then
+  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
+fi
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+::fdopendir(1);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  glibcxx_cv_fdopendir=yes
+else
+  glibcxx_cv_fdopendir=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $glibcxx_cv_fdopendir" >&5
+$as_echo "$glibcxx_cv_fdopendir" >&6; }
+  if test $glibcxx_cv_truncate = yes; then
+
+$as_echo "#define H

[committed] libstdc++: Define _GNU_SOURCE for secure_getenv on Cygwin [PR104217]

2022-01-25 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.


For GNU/Linux G++ defines _GNU_SOURCE automatically, but not for Cygwin.
This means secure_getenv is not declared by Cygwin's , even
though autoconf detected it is present in the library. Define it in the
source files that want to use secure_getenv.

libstdc++-v3/ChangeLog:

PR libstdc++/104217
* src/c++17/fs_ops.cc (_GNU_SOURCE): Define.
* src/filesystem/dir.cc (_GNU_SOURCE): Define.
* src/filesystem/ops.cc (_GNU_SOURCE): Define.
---
 libstdc++-v3/src/c++17/fs_ops.cc   | 4 
 libstdc++-v3/src/filesystem/dir.cc | 4 
 libstdc++-v3/src/filesystem/ops.cc | 4 
 3 files changed, 12 insertions(+)

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 1d3693af06c..321944d73f7 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -27,6 +27,10 @@
 # define NEED_DO_COPY_FILE
 # define NEED_DO_SPACE
 #endif
+#ifndef _GNU_SOURCE
+// Cygwin needs this for secure_getenv
+# define _GNU_SOURCE 1
+#endif
 
 #include 
 #include 
diff --git a/libstdc++-v3/src/filesystem/dir.cc 
b/libstdc++-v3/src/filesystem/dir.cc
index f1bf96aab50..d5b11f25297 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -25,6 +25,10 @@
 #ifndef _GLIBCXX_USE_CXX11_ABI
 # define _GLIBCXX_USE_CXX11_ABI 1
 #endif
+#ifndef _GNU_SOURCE
+// Cygwin needs this for secure_getenv
+# define _GNU_SOURCE 1
+#endif
 
 #include 
 #include 
diff --git a/libstdc++-v3/src/filesystem/ops.cc 
b/libstdc++-v3/src/filesystem/ops.cc
index 324c6afc7a9..cd98caf73f2 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -27,6 +27,10 @@
 # define NEED_DO_COPY_FILE
 # define NEED_DO_SPACE
 #endif
+#ifndef _GNU_SOURCE
+// Cygwin needs this for secure_getenv
+# define _GNU_SOURCE 1
+#endif
 
 #include 
 #include 
-- 
2.31.1



[committed] libstdc++: Avoid some more warnings [PR104019]

2022-01-25 Thread Jonathan Wakely via Gcc-patches
The change to istream.tcc looks large but is just adding { and } and
re-indenting everything between them.
Tested x86_64-linux, pushed to trunk.


With -fno-exceptions we get a -Wmisleading-indentation warning for:

  if (cond)
__try {}
__catch (...) {}

This is because the __catch(...) expands to if (false), but is indented
as though it is controlled by the preceding 'if'. Surround it in braces.

The new make_shared code triggers a bogus warning due to PR 61596,
which can be disabled with a pragma.

libstdc++-v3/ChangeLog:

PR libstdc++/104019
* include/bits/istream.tcc (basic_istream::sentry): Add braces
around try-block.
* include/bits/shared_ptr_base.h (_Sp_counted_array_base::_M_init):
Add pragmas to disable bogus warnings from PR 61596.
---
 libstdc++-v3/include/bits/istream.tcc   | 58 +++--
 libstdc++-v3/include/bits/shared_ptr_base.h |  3 ++
 2 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/bits/istream.tcc 
b/libstdc++-v3/include/bits/istream.tcc
index cfab5b4f684..a1a7d89335e 100644
--- a/libstdc++-v3/include/bits/istream.tcc
+++ b/libstdc++-v3/include/bits/istream.tcc
@@ -48,36 +48,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   ios_base::iostate __err = ios_base::goodbit;
   if (__in.good())
-   __try
- {
-   if (__in.tie())
- __in.tie()->flush();
-   if (!__noskip && bool(__in.flags() & ios_base::skipws))
- {
-   const __int_type __eof = traits_type::eof();
-   __streambuf_type* __sb = __in.rdbuf();
-   __int_type __c = __sb->sgetc();
+   {
+ __try
+   {
+ if (__in.tie())
+   __in.tie()->flush();
+ if (!__noskip && bool(__in.flags() & ios_base::skipws))
+   {
+ const __int_type __eof = traits_type::eof();
+ __streambuf_type* __sb = __in.rdbuf();
+ __int_type __c = __sb->sgetc();
 
-   const __ctype_type& __ct = __check_facet(__in._M_ctype);
-   while (!traits_type::eq_int_type(__c, __eof)
-  && __ct.is(ctype_base::space,
- traits_type::to_char_type(__c)))
- __c = __sb->snextc();
+ const __ctype_type& __ct = __check_facet(__in._M_ctype);
+ while (!traits_type::eq_int_type(__c, __eof)
+&& __ct.is(ctype_base::space,
+   traits_type::to_char_type(__c)))
+   __c = __sb->snextc();
 
-   // _GLIBCXX_RESOLVE_LIB_DEFECTS
-   // 195. Should basic_istream::sentry's constructor ever
-   // set eofbit?
-   if (traits_type::eq_int_type(__c, __eof))
- __err |= ios_base::eofbit;
- }
- }
-   __catch(__cxxabiv1::__forced_unwind&)
- {
-   __in._M_setstate(ios_base::badbit);
-   __throw_exception_again;
- }
-   __catch(...)
- { __in._M_setstate(ios_base::badbit); }
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 195. Should basic_istream::sentry's constructor ever
+ // set eofbit?
+ if (traits_type::eq_int_type(__c, __eof))
+   __err |= ios_base::eofbit;
+   }
+   }
+ __catch(__cxxabiv1::__forced_unwind&)
+   {
+ __in._M_setstate(ios_base::badbit);
+ __throw_exception_again;
+   }
+ __catch(...)
+   { __in._M_setstate(ios_base::badbit); }
+   }
 
   if (__in.good() && __err == ios_base::goodbit)
_M_ok = true;
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index b2f955b41f7..d9230b72203 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -762,6 +762,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::__uninitialized_fill_n_a(__p, _M_n, *__init, _M_alloc);
  else
{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
  struct _Iter
  {
using value_type = _Up;
@@ -783,6 +785,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
bool operator==(const _Iter& __i) const
{ return _M_pos == __i._M_pos; }
  };
+#pragma GCC diagnostic pop
 
  _Iter __first{_S_first_elem(__init), sizeof(_Tp) / sizeof(_Up)};
  _Iter __last = __first;
-- 
2.31.1



Re: [PATCH] avoid recomputing PHI results after failure (PR 104203)

2022-01-25 Thread Martin Sebor via Gcc-patches

On 1/25/22 04:15, Richard Biener wrote:

On Mon, Jan 24, 2022 at 11:55 PM Martin Sebor  wrote:


The pointer_query algorithm fails when it's unable to determine
the provenance of an pointer SSA variable.  A failure prevents it
from making a record of the variable in the cache.  Although this
doesn't happen often, one common cause of failures is PHI nodes:
e.g., because one of its arguments references the PHI itself, or
an undefined variable.  Because a failed SSA_NAME isn't added to
the cache, the computation that led up to its failure is repeated
on each interesting use of the variable. This is computationally
wasteful and can cause excessive CPU usage in pathological cases
like in the test case in PR 104203.

To avoid the problem the attached patch changes the logic for PHI
arguments and nodes to cache the most conservative result on such
a failure.  It treats such a pointer as pointing into the middle
of an object of maximum size (same as under some other conditions).

In addition, the patch also adds a new timer variable to
the warn_access pass to make these problems easier in the future
to track down.


LGTM.


Committed in r12-6869.




There are a number of other conditions under which point_query
fails.  Even though the same approach could be used to replace
all those, I did not make the change in this patch.


It's probably worth fixing them.  Btw, I wonder whether caching
fails as true fails would work as well though caching a conservative
answer of course is fine.


That's another approach: add a bitset and set a bit for each SSA
variable for which the lookup failed (or didn't find anything
interesting).  It would save space in the cache.  I didn't use
it because it would require more churn in the code, but it's
something to consider in the future (the cache itself could
stand to be further split up, with pointer offsets in one and
sizes in another).




Tested on x86_64-linux.

Martin

PS This solution relies on the pointer_query instance having caching
enabled.  Two warning passes run without a cache: -Warray-bounds in
VRP and -Wrestrict in the wrestrict pass.  I can look into enabling
it for GCC 12 if there's concern that they might be impacted.


Please.  There is IMHO no reason to not use the cache - for PHIs
you can run into the same PHI arg def from multiple edges and thus
you'll do unnecessary work even when you just do a single query
for the toplevel API.


Let me post a patch with that.

Martin



Richard.




Re: [PR103970, Fortran, Coarray] Multi-image co_broadcast of derived type with allocatable components fails^

2022-01-25 Thread Harald Anlauf via Gcc-patches

Hi Andre',

Am 25.01.22 um 17:32 schrieb Andre Vehreschild via Fortran:

Hi all,

attached patch fixes wrong code generation when broadcasting a derived type
containing allocatable and non-allocatable scalars. Furthermore does it prevent
broadcasting of coarray-tokens, which are always local this_image. Thus having
them on a different image makes no sense.

Bootstrapped and regtested ok on x86_64-linux/F35.

Ok, for trunk and backport to 12 and 11-branch after decent time?

I perceived that 12 is closed for this kind of bugfix, therefore asking ok for
13.


I do not think that 12 is closed for bugfixing, especially not for
fortran.  And if my cursory reading of the patch is not misleading,
the impact of the patch is really limited to coarrays.

You may want to wait for another 1-2 days for additional comments.
If not, it is OK from my side.

Thanks for the patch!

Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de




[PATCH] warn-access: Prevent -Wuse-after-free on ARM [PR104213]

2022-01-25 Thread Marek Polacek via Gcc-patches
Here, -Wuse-after-free warns about using 'this' which, on ARM, cdtors
return, as mandated by the EABI.  To be entirely correct, it only
requires it for C1 and C2 ctors and D2 and D1 dtors, but I don't feel
like changing that now and possibly running into issues later on.

This patch uses suppress_warning on 'this' for certain cdtor_returns_this
cases in the C++ FE, and then warn_invalid_pointer makes use of this
information and doesn't warn.

In my first attempt I tried suppress_warning the MODIFY_EXPR or RETURN_EXPR
we build in build_delete_destructor_body, but the complication is that
the suppress_warning bits don't always survive gimplification; see e.g.
gimplify_modify_expr where we do

 6130   if (COMPARISON_CLASS_P (*from_p))
 6131 copy_warning (assign, *from_p);

but here we're not dealing with a comparison.  Removing that check
regresses uninit-pr74762.C.  Adding copy_warning (assign, *expr_p)
regresses c-c++-common/uninit-17.c.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR target/104213

gcc/cp/ChangeLog:

* decl.cc (finish_constructor_body): Suppress -Wuse-after-free.
(finish_destructor_body): Likewise.
* optimize.cc (build_delete_destructor_body): Likewise.

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (pass_waccess::warn_invalid_pointer): Don't
warn when the SSA_NAME_VAR of REF has supressed -Wuse-after-free.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuse-after-free2.C: New test.
---
 gcc/cp/decl.cc   |  2 ++
 gcc/cp/optimize.cc   |  1 +
 gcc/gimple-ssa-warn-access.cc| 14 +++---
 gcc/testsuite/g++.dg/warn/Wuse-after-free2.C | 10 ++
 4 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wuse-after-free2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 22d3dd1e2ad..6534a7fd320 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17315,6 +17315,7 @@ finish_constructor_body (void)
   add_stmt (build_stmt (input_location, LABEL_EXPR, cdtor_label));
 
   val = DECL_ARGUMENTS (current_function_decl);
+  suppress_warning (val, OPT_Wuse_after_free);
   val = build2 (MODIFY_EXPR, TREE_TYPE (val),
DECL_RESULT (current_function_decl), val);
   /* Return the address of the object.  */
@@ -17408,6 +17409,7 @@ finish_destructor_body (void)
   tree val;
 
   val = DECL_ARGUMENTS (current_function_decl);
+  suppress_warning (val, OPT_Wuse_after_free);
   val = build2 (MODIFY_EXPR, TREE_TYPE (val),
DECL_RESULT (current_function_decl), val);
   /* Return the address of the object.  */
diff --git a/gcc/cp/optimize.cc b/gcc/cp/optimize.cc
index 4ad3f1dc9aa..13ab8b7361e 100644
--- a/gcc/cp/optimize.cc
+++ b/gcc/cp/optimize.cc
@@ -166,6 +166,7 @@ build_delete_destructor_body (tree delete_dtor, tree 
complete_dtor)
   if (targetm.cxx.cdtor_returns_this ())
 {
   tree val = DECL_ARGUMENTS (delete_dtor);
+  suppress_warning (val, OPT_Wuse_after_free);
   val = build2 (MODIFY_EXPR, TREE_TYPE (val),
 DECL_RESULT (delete_dtor), val);
   add_stmt (build_stmt (0, RETURN_EXPR, val));
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 8bc33eeb6fa..484bcd34c25 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3880,9 +3880,17 @@ pass_waccess::warn_invalid_pointer (tree ref, gimple 
*use_stmt,
bool maybe, bool equality /* = false */)
 {
   /* Avoid printing the unhelpful "" in the diagnostics.  */
-  if (ref && TREE_CODE (ref) == SSA_NAME
-  && (!SSA_NAME_VAR (ref) || DECL_ARTIFICIAL (SSA_NAME_VAR (ref
-ref = NULL_TREE;
+  if (ref && TREE_CODE (ref) == SSA_NAME)
+{
+  tree var = SSA_NAME_VAR (ref);
+  if (!var)
+   ref = NULL_TREE;
+  /* Don't warn for cases like when a cdtor returns 'this' on ARM.  */
+  else if (warning_suppressed_p (var, OPT_Wuse_after_free))
+   return;
+  else if (DECL_ARTIFICIAL (var))
+   ref = NULL_TREE;
+}
 
   location_t use_loc = gimple_location (use_stmt);
   if (use_loc == UNKNOWN_LOCATION)
diff --git a/gcc/testsuite/g++.dg/warn/Wuse-after-free2.C 
b/gcc/testsuite/g++.dg/warn/Wuse-after-free2.C
new file mode 100644
index 000..6d5f2bf01b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wuse-after-free2.C
@@ -0,0 +1,10 @@
+// PR target/104213
+// { dg-do compile }
+// { dg-options "-Wuse-after-free" }
+
+class C
+{
+virtual ~C();
+};
+
+C::~C() {} // { dg-bogus "used after" }

base-commit: aeac414923aa1e87986c7fc6f9b921d89a9b86cf
-- 
2.34.1



Re: [PATCH] warn-access: Prevent -Wuse-after-free on ARM [PR104213]

2022-01-25 Thread Martin Sebor via Gcc-patches

On 1/25/22 16:33, Marek Polacek via Gcc-patches wrote:

Here, -Wuse-after-free warns about using 'this' which, on ARM, cdtors
return, as mandated by the EABI.  To be entirely correct, it only
requires it for C1 and C2 ctors and D2 and D1 dtors, but I don't feel
like changing that now and possibly running into issues later on.

This patch uses suppress_warning on 'this' for certain cdtor_returns_this
cases in the C++ FE, and then warn_invalid_pointer makes use of this
information and doesn't warn.

In my first attempt I tried suppress_warning the MODIFY_EXPR or RETURN_EXPR
we build in build_delete_destructor_body, but the complication is that
the suppress_warning bits don't always survive gimplification; see e.g.
gimplify_modify_expr where we do

  6130   if (COMPARISON_CLASS_P (*from_p))
  6131 copy_warning (assign, *from_p);

but here we're not dealing with a comparison.  Removing that check
regresses uninit-pr74762.C.  Adding copy_warning (assign, *expr_p)
regresses c-c++-common/uninit-17.c.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


Thanks for picking this up!

My only concern with suppressing the warning for the pointer rather
than just for the return statement is that it not lead to false
negatives.  I played with the patch a little, trying to trigger one
but couldn't so it might be unfounded.  Still, I'd expect targeting
the statement to be a simpler change (the warning already checks
for the suppression bit on the statement).  Did you try it?

For reference, one of the test cases I tried is below.  It would
be good to  add it to the test suite to make sure the bug is caught
(there's a duplicate warning there that should at some point get
squashed).

struct A
{
  virtual ~A ();
  void f ();
};

A::~A ()
{
  operator delete (this);
  f ();
}

Martin



PR target/104213

gcc/cp/ChangeLog:

* decl.cc (finish_constructor_body): Suppress -Wuse-after-free.
(finish_destructor_body): Likewise.
* optimize.cc (build_delete_destructor_body): Likewise.

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (pass_waccess::warn_invalid_pointer): Don't
warn when the SSA_NAME_VAR of REF has supressed -Wuse-after-free.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuse-after-free2.C: New test.
---
  gcc/cp/decl.cc   |  2 ++
  gcc/cp/optimize.cc   |  1 +
  gcc/gimple-ssa-warn-access.cc| 14 +++---
  gcc/testsuite/g++.dg/warn/Wuse-after-free2.C | 10 ++
  4 files changed, 24 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wuse-after-free2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 22d3dd1e2ad..6534a7fd320 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17315,6 +17315,7 @@ finish_constructor_body (void)
add_stmt (build_stmt (input_location, LABEL_EXPR, cdtor_label));
  
val = DECL_ARGUMENTS (current_function_decl);

+  suppress_warning (val, OPT_Wuse_after_free);
val = build2 (MODIFY_EXPR, TREE_TYPE (val),
DECL_RESULT (current_function_decl), val);
/* Return the address of the object.  */
@@ -17408,6 +17409,7 @@ finish_destructor_body (void)
tree val;
  
val = DECL_ARGUMENTS (current_function_decl);

+  suppress_warning (val, OPT_Wuse_after_free);
val = build2 (MODIFY_EXPR, TREE_TYPE (val),
DECL_RESULT (current_function_decl), val);
/* Return the address of the object.  */
diff --git a/gcc/cp/optimize.cc b/gcc/cp/optimize.cc
index 4ad3f1dc9aa..13ab8b7361e 100644
--- a/gcc/cp/optimize.cc
+++ b/gcc/cp/optimize.cc
@@ -166,6 +166,7 @@ build_delete_destructor_body (tree delete_dtor, tree 
complete_dtor)
if (targetm.cxx.cdtor_returns_this ())
  {
tree val = DECL_ARGUMENTS (delete_dtor);
+  suppress_warning (val, OPT_Wuse_after_free);
val = build2 (MODIFY_EXPR, TREE_TYPE (val),
  DECL_RESULT (delete_dtor), val);
add_stmt (build_stmt (0, RETURN_EXPR, val));
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 8bc33eeb6fa..484bcd34c25 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3880,9 +3880,17 @@ pass_waccess::warn_invalid_pointer (tree ref, gimple 
*use_stmt,
bool maybe, bool equality /* = false */)
  {
/* Avoid printing the unhelpful "" in the diagnostics.  */
-  if (ref && TREE_CODE (ref) == SSA_NAME
-  && (!SSA_NAME_VAR (ref) || DECL_ARTIFICIAL (SSA_NAME_VAR (ref
-ref = NULL_TREE;
+  if (ref && TREE_CODE (ref) == SSA_NAME)
+{
+  tree var = SSA_NAME_VAR (ref);
+  if (!var)
+   ref = NULL_TREE;
+  /* Don't warn for cases like when a cdtor returns 'this' on ARM.  */
+  else if (warning_suppressed_p (var, OPT_Wuse_after_free))
+   return;
+  else if (DECL_ARTIFICIAL (var))
+   ref = NULL_TREE;
+}
  
locati

Re: [PATCH] warn-access: Prevent -Wuse-after-free on ARM [PR104213]

2022-01-25 Thread Marek Polacek via Gcc-patches
On Tue, Jan 25, 2022 at 05:35:20PM -0700, Martin Sebor wrote:
> On 1/25/22 16:33, Marek Polacek via Gcc-patches wrote:
> > Here, -Wuse-after-free warns about using 'this' which, on ARM, cdtors
> > return, as mandated by the EABI.  To be entirely correct, it only
> > requires it for C1 and C2 ctors and D2 and D1 dtors, but I don't feel
> > like changing that now and possibly running into issues later on.
> > 
> > This patch uses suppress_warning on 'this' for certain cdtor_returns_this
> > cases in the C++ FE, and then warn_invalid_pointer makes use of this
> > information and doesn't warn.
> > 
> > In my first attempt I tried suppress_warning the MODIFY_EXPR or RETURN_EXPR
> > we build in build_delete_destructor_body, but the complication is that
> > the suppress_warning bits don't always survive gimplification; see e.g.
> > gimplify_modify_expr where we do
> > 
> >   6130   if (COMPARISON_CLASS_P (*from_p))
> >   6131 copy_warning (assign, *from_p);
> > 
> > but here we're not dealing with a comparison.  Removing that check
> > regresses uninit-pr74762.C.  Adding copy_warning (assign, *expr_p)
> > regresses c-c++-common/uninit-17.c.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> Thanks for picking this up!
> 
> My only concern with suppressing the warning for the pointer rather
> than just for the return statement is that it not lead to false
> negatives.  I played with the patch a little, trying to trigger one
> but couldn't so it might be unfounded.  Still, I'd expect targeting
> the statement to be a simpler change (the warning already checks
> for the suppression bit on the statement).  Did you try it?
 
I did -- that is the "In my first attempt..." paragraph in my patch
(we don't use copy_warning to propagate the suppress_warning bits).
Maybe it'd be a more correct way to approach this, that is, somehow
tweak gimplify_modify_expr to copy_warning for this special case.

> For reference, one of the test cases I tried is below.  It would
> be good to  add it to the test suite to make sure the bug is caught
> (there's a duplicate warning there that should at some point get
> squashed).
> 
> struct A
> {
>   virtual ~A ();
>   void f ();
> };
> 
> A::~A ()
> {
>   operator delete (this);
>   f ();
> }

Sure, happy to add it.

> Martin
> 
> > 
> > PR target/104213
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.cc (finish_constructor_body): Suppress -Wuse-after-free.
> > (finish_destructor_body): Likewise.
> > * optimize.cc (build_delete_destructor_body): Likewise.
> > 
> > gcc/ChangeLog:
> > 
> > * gimple-ssa-warn-access.cc (pass_waccess::warn_invalid_pointer): Don't
> > warn when the SSA_NAME_VAR of REF has supressed -Wuse-after-free.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wuse-after-free2.C: New test.
> > ---
> >   gcc/cp/decl.cc   |  2 ++
> >   gcc/cp/optimize.cc   |  1 +
> >   gcc/gimple-ssa-warn-access.cc| 14 +++---
> >   gcc/testsuite/g++.dg/warn/Wuse-after-free2.C | 10 ++
> >   4 files changed, 24 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/warn/Wuse-after-free2.C
> > 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 22d3dd1e2ad..6534a7fd320 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -17315,6 +17315,7 @@ finish_constructor_body (void)
> > add_stmt (build_stmt (input_location, LABEL_EXPR, cdtor_label));
> > val = DECL_ARGUMENTS (current_function_decl);
> > +  suppress_warning (val, OPT_Wuse_after_free);
> > val = build2 (MODIFY_EXPR, TREE_TYPE (val),
> > DECL_RESULT (current_function_decl), val);
> > /* Return the address of the object.  */
> > @@ -17408,6 +17409,7 @@ finish_destructor_body (void)
> > tree val;
> > val = DECL_ARGUMENTS (current_function_decl);
> > +  suppress_warning (val, OPT_Wuse_after_free);
> > val = build2 (MODIFY_EXPR, TREE_TYPE (val),
> > DECL_RESULT (current_function_decl), val);
> > /* Return the address of the object.  */
> > diff --git a/gcc/cp/optimize.cc b/gcc/cp/optimize.cc
> > index 4ad3f1dc9aa..13ab8b7361e 100644
> > --- a/gcc/cp/optimize.cc
> > +++ b/gcc/cp/optimize.cc
> > @@ -166,6 +166,7 @@ build_delete_destructor_body (tree delete_dtor, tree 
> > complete_dtor)
> > if (targetm.cxx.cdtor_returns_this ())
> >   {
> > tree val = DECL_ARGUMENTS (delete_dtor);
> > +  suppress_warning (val, OPT_Wuse_after_free);
> > val = build2 (MODIFY_EXPR, TREE_TYPE (val),
> >   DECL_RESULT (delete_dtor), val);
> > add_stmt (build_stmt (0, RETURN_EXPR, val));
> > diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> > index 8bc33eeb6fa..484bcd34c25 100644
> > --- a/gcc/gimple-ssa-warn-access.cc
> > +++ b/gcc/gimple-ssa-warn-access.cc
> > @@ -3880,9 +3880,17 @@ pass_waccess::warn_

PING^2 [PATCH] rs6000: Fix an assertion in update_target_cost_per_stmt [PR103702]

2022-01-25 Thread Kewen.Lin via Gcc-patches
Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587309.html

BR,
Kewen

> on 2021/12/23 上午10:06, Kewen.Lin via Gcc-patches wrote:
>> Hi, 
>>
>> This patch is to fix one wrong assertion which is too aggressive.
>> Vectorizer can do vec_construct costing for the vector type which
>> only has one unit.  For the failed case, the passed-in vector type
>> is "vector(1) int", though it doesn't end up with any construction
>> eventually.  We have to handle this kind of input in function
>> rs6000_cost_data::update_target_cost_per_stmt.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR target/103702
>>  * config/rs6000/rs6000.c
>>  (rs6000_cost_data::update_target_cost_per_stmt): Fix one wrong
>>  assertion with early return.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/103702
>>  * gcc.target/powerpc/pr103702.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c  |  7 --
>>  gcc/testsuite/gcc.target/powerpc/pr103702.c | 24 +
>>  2 files changed, 29 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103702.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 0b09713b2f5..37f07fe5358 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5461,8 +5461,11 @@ rs6000_cost_data::update_target_cost_per_stmt 
>> (vect_cost_for_stmt kind,
>>  {
>>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>unsigned int nunits = vect_nunits_for_cost (vectype);
>> -  /* We don't expect strided/elementwise loads for just 1 nunit.  */
>> -  gcc_assert (nunits > 1);
>> +  /* As PR103702 shows, it's possible that vectorizer wants to do
>> + costings for only one unit here, it's no need to do any
>> + penalization for it, so simply early return here.  */
>> +  if (nunits == 1)
>> +return;
>>/* i386 port adopts nunits * stmt_cost as the penalized cost
>>   for this kind of penalization, we used to follow it but
>>   found it could result in an unreliable body cost especially
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103702.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr103702.c
>> new file mode 100644
>> index 000..585946fd64b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103702.c
>> @@ -0,0 +1,24 @@
>> +/* We don't have one powerpc.*_ok for Power6, use altivec_ok 
>> conservatively.  */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power6 -O2 -ftree-loop-vectorize 
>> -fno-tree-scev-cprop" } */
>> +
>> +/* Verify there is no ICE.  */
>> +
>> +unsigned short a, e;
>> +int *b, *d;
>> +int c;
>> +extern int fn2 ();
>> +void
>> +fn1 ()
>> +{
>> +  void *f;
>> +  for (;;)
>> +{
>> +  fn2 ();
>> +  b = f;
>> +  e = 0;
>> +  for (; e < a; ++e)
>> +b[e] = d[e * c];
>> +}
>> +}
>> +
>> --
>> 2.27.0
>>



PING^2 [PATCH] rs6000: Disable MMA if no P9 VECTOR support [PR103627]

2022-01-25 Thread Kewen.Lin via Gcc-patches
Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587310.html

BR,
Kewen

> on 2021/12/23 上午10:09, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> As PR103627 shows, there is an unexpected case where !TARGET_VSX
>> and TARGET_MMA co-exist.  As ISA3.1 claims, SIMD is a requirement
>> for MMA.  By looking into the ICE, I noticed that the current
>> MMA implementation depends on vector pairs load/store, but since
>> we don't have a separated option to control Power10 vector, this
>> patch is to check for Power9 vector instead.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR target/103627
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Disable
>>  MMA if !TARGET_P9_VECTOR.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/103627
>>  * gcc.target/powerpc/pr103627-1.c: New test.
>>  * gcc.target/powerpc/pr103627-2.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c| 11 +++
>>  gcc/testsuite/gcc.target/powerpc/pr103627-1.c | 16 
>>  gcc/testsuite/gcc.target/powerpc/pr103627-2.c | 16 
>>  3 files changed, 43 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-2.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index c020947abc8..ec3b46682a7 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4505,6 +4505,17 @@ rs6000_option_override_internal (bool global_init_p)
>>rs6000_isa_flags &= ~OPTION_MASK_MMA;
>>  }
>>
>> +  /* MMA requires SIMD support as ISA 3.1 claims and our implementation
>> + such as "*movoo" uses vector pair access which are only supported
>> + from ISA 3.1.  But since we don't have one separated option to
>> + control Power10 vector, check for Power9 vector instead.  */
>> +  if (TARGET_MMA && !TARGET_P9_VECTOR)
>> +{
>> +  if ((rs6000_isa_flags_explicit & OPTION_MASK_MMA) != 0)
>> +error ("%qs requires %qs", "-mmma", "-mpower9-vector");
>> +  rs6000_isa_flags &= ~OPTION_MASK_MMA;
>> +}
>> +
>>if (!TARGET_PCREL && TARGET_PCREL_OPT)
>>  rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-1.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr103627-1.c
>> new file mode 100644
>> index 000..6c6c16188fb
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-1.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-require-effective-target power10_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power10 -mno-power9-vector" } */
>> +
>> +/* Verify compiler emits error message instead of ICE.  */
>> +
>> +extern float *dest;
>> +extern __vector_quad src;
>> +
>> +int
>> +foo ()
>> +{
>> +  __builtin_mma_disassemble_acc (dest, &src);
>> +  /* { dg-error "'__builtin_mma_disassemble_acc' requires the '-mmma' 
>> option" "" { target *-*-* } .-1 } */
>> +  return 0;
>> +}
>> +
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-2.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr103627-2.c
>> new file mode 100644
>> index 000..6604872c0e8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-2.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-require-effective-target power10_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power10 -mmma -mno-power9-vector" } */
>> +
>> +/* Verify the emitted error message.  */
>> +
>> +extern float *dest;
>> +extern __vector_quad src;
>> +
>> +int
>> +foo ()
>> +{
>> +  __builtin_mma_disassemble_acc (dest, &src);
>> +  /* { dg-error "'-mmma' requires '-mpower9-vector'" "mma" { target *-*-* } 
>> 0 } */
>> +  return 0;
>> +}
>> +
>> --
>> 2.27.0
>>



PING^2 [PATCH] rs6000: Move the hunk affecting VSX/ALTIVEC ahead [PR103627]

2022-01-25 Thread Kewen.Lin via Gcc-patches
Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587311.html

BR,
Kewen

> on 2021/12/23 上午10:12, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> There is one hunk checking for functions with target attribute/pragma
>> have the same altivec abi as the one of main_target_opt, it can update
>> both VSX and ALTIVEC flags.  Meanwhile, we have some codes to check or
>> warn for some isa flags related to VSX and ALTIVEC, that sit where the
>> mentioned hunk is proposed to be moved to in this patch.
>>
>> Since the flags update in the mentioned hunk happen behind those
>> adjustments based on VSX and ALTIVEC flags, it can cause the
>> incompatibility and result in unexpected behaviors, the associated test
>> case is one typical case.
>>
>> Besides, we already have the code which sets TARGET_FLOAT128_TYPE and
>> lays after where the hunk is moved to, and OPTION_MASK_FLOAT128_KEYWORD
>> will rely on TARGET_FLOAT128_TYPE, so this patch just simply removes them.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8 and P7.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR target/103627
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
>>  hunk affecting VSX and ALTIVEC to the appropriate place.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/103627
>>  * gcc.target/powerpc/pr103627-3.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c| 21 ---
>>  gcc/testsuite/gcc.target/powerpc/pr103627-3.c | 20 ++
>>  2 files changed, 29 insertions(+), 12 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103627-3.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index ec3b46682a7..0b09713b2f5 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -3955,6 +3955,15 @@ rs6000_option_override_internal (bool global_init_p)
>>else if (TARGET_ALTIVEC)
>>  rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~ignore_masks);
>>
>> +  /* Disable VSX and Altivec silently if the user switched cpus to power7 
>> in a
>> + target attribute or pragma which automatically enables both options,
>> + unless the altivec ABI was set.  This is set by default for 64-bit, but
>> + not for 32-bit.  Don't move this before the above code using 
>> ignore_masks,
>> + since it can reset the cleared VSX/ALTIVEC flag again.  */
>> +  if (main_target_opt != NULL && !main_target_opt->x_rs6000_altivec_abi)
>> +rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
>> +  & ~rs6000_isa_flags_explicit);
>> +
>>if (TARGET_CRYPTO && !TARGET_ALTIVEC)
>>  {
>>if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO)
>> @@ -4373,18 +4382,6 @@ rs6000_option_override_internal (bool global_init_p)
>>  }
>>  }
>>
>> -  /* Disable VSX and Altivec silently if the user switched cpus to power7 
>> in a
>> - target attribute or pragma which automatically enables both options,
>> - unless the altivec ABI was set.  This is set by default for 64-bit, but
>> - not for 32-bit.  */
>> -  if (main_target_opt != NULL && !main_target_opt->x_rs6000_altivec_abi)
>> -{
>> -  TARGET_FLOAT128_TYPE = 0;
>> -  rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC
>> - | OPTION_MASK_FLOAT128_KEYWORD)
>> -& ~rs6000_isa_flags_explicit);
>> -}
>> -
>>/* Enable Altivec ABI for AIX -maltivec.  */
>>if (TARGET_XCOFF
>>&& (TARGET_ALTIVEC || TARGET_VSX)
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103627-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr103627-3.c
>> new file mode 100644
>> index 000..9df2b73fe85
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103627-3.c
>> @@ -0,0 +1,20 @@
>> +/* There are no error messages for either LE or BE 64bit.  */
>> +/* { dg-require-effective-target be }*/
>> +/* { dg-require-effective-target ilp32 } */
>> +/* We don't have one powerpc.*_ok for Power6, use altivec_ok 
>> conservatively.  */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power6" } */
>> +
>> +/* Verify compiler emits error message instead of ICE.  */
>> +
>> +#pragma GCC target "cpu=power10"
>> +int
>> +main ()
>> +{
>> +  float *b;
>> +  __vector_quad c;
>> +  __builtin_mma_disassemble_acc (b, &c);
>> +  /* { dg-error "'__builtin_mma_disassemble_acc' requires the '-mmma' 
>> option" "" { target *-*-* } .-1 } */
>> +  return 0;
>> +}
>> +
>> --
>> 2.27.0
>>




PING^2 [PATCH] rs6000: Don't turn off VSX for P9 VECTOR when TARGET_AVOID_XFORM set

2022-01-25 Thread Kewen.Lin via Gcc-patches
Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587449.html

BR,
Kewen

> on 2021/12/29 下午5:36, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> When TARGET_AVOID_XFORM is set, we turn off VSX.  But at least from
>> ISA3.0 (Power9), we support DQ form vector load/store.  This patch
>> is to make it not clear VSX when P9 VECTOR supported, it also checks
>> some flags which P9 VECTOR relies on, otherwise those flags could
>> disable P9 VECTOR later.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Consider
>>  P9 VECTOR when determining to disable VSX for TARGET_AVOID_XFORM.
>> ---
>>  gcc/config/rs6000/rs6000.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 66b01e589b0..c020947abc8 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -3865,7 +3865,10 @@ rs6000_option_override_internal (bool global_init_p)
>>rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
>>  }
>>  }
>> -  else if (TARGET_AVOID_XFORM > 0)
>> +  else if (TARGET_AVOID_XFORM > 0
>> +   /* Exclude P9 VECTOR which supports DQ form, but need to check
>> +  some flags which are able to disable it as well.  */
>> +   && !(TARGET_ALTIVEC && TARGET_P8_VECTOR && TARGET_P9_VECTOR))
>>  msg = N_("%<-mvsx%> needs indexed addressing");
>>else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
>> & OPTION_MASK_ALTIVEC))
>> --
>> 2.27.0
>>
> 



PING^2 [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2022-01-25 Thread Kewen.Lin via Gcc-patches
Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587635.html

BR,
Kewen

> 
> on 2022/1/5 下午3:34, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> This patch is to fix the inconsistent behaviors for non-LTO mode
>> and LTO mode.  As Martin pointed out, currently the function
>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>> NULL, but it's unexpected, we should use the command line options
>> from target_option_default_node as default.
>>
>> It replaces rs6000_isa_flags with target_option_default_node when
>> caller_tree is NULL since it's more straightforward and doesn't
>> suffer from some bug not to keep rs6000_isa_flags as default.
>>
>> It also extends the scope of the check for the case that callee
>> has explicit set options, inlining in test case pr102059-5.c can
>> happen unexpectedly before, it's fixed accordingly.
>>
>> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
>> can be neglected for always inlining, this patch also takes some
>> flags when the callee is attributed by always_inline.
>>
>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
>> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html
>>
>> This patch is one re-post of this updated version[1] and also
>> rebased and adjusted on top of the related commit r12-6219.
>>
>> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
>> powerpc64le-linux-gnu P9 and P10.
>>
>> Is it ok for trunk?
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR target/102059
>>  * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>>  target_option_default_node and consider always_inline_safe flags.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/102059
>>  * gcc.target/powerpc/pr102059-4.c: New test.
>>  * gcc.target/powerpc/pr102059-5.c: New test.
>>  * gcc.target/powerpc/pr102059-6.c: New test.
>>  * gcc.target/powerpc/pr102059-7.c: New test.
>>  * gcc.target/powerpc/pr102059-8.c: New test.
>>  * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option.
>>
>>
> 



Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-25 Thread Kewen.Lin via Gcc-patches
on 2022/1/14 上午12:31, David Edelsohn wrote:
> On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin  wrote:
>>
>> on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote:
>>> on 2022/1/13 上午11:44, David Edelsohn wrote:
 On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 上午11:07, David Edelsohn wrote:
>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>>>
>>> Hi,
>>>
>>> This patch is to fix register constraint v with
>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
>>> just like some other existing register constraints with
>>> RS6000_CONSTRAINT_*.
>>>
>>> I happened to see this and hope it's not intentional and just
>>> got neglected.
>>>
>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>>> powerpc64-linux-gnu P8.
>>>
>>> Is it ok for trunk?
>>
>> Why do you want to make this change?
>>
>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>>
>> but all of the patterns that use a "v" constraint are (or should be)
>> protected by TARGET_ALTIVEC, or some final condition that only is
>> active for TARGET_ALTIVEC.  The other constraints are conditionally
>> set because they can be used in a pattern with multiple alternatives
>> where the pattern itself is active but some of the constraints
>> correspond to NO_REGS when some instruction variants for VSX is not
>> enabled.
>>
>
> Good point!  Thanks for the explanation.
>
>> The change isn't wrong, but it doesn't correct a bug and provides no
>> additional benefit nor clarty that I can see.
>>
>
> The original intention is to make it consistent with the other existing
> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> weird (like was neglected).  After you clarified above, 
> RS6000_CONSTRAINT_v
> seems useless at all in the current framework.  Do you prefer to remove
> it to avoid any confusions instead?

 It's used in the reg_class, so there may be some heuristic in the GCC
 register allocator that cares about the number of registers available
 for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
 conditionally, so it seems best to leave it as is.

>>>
>>> I may miss something, but I didn't find it's used for the above purposes.
>>> If it's best to leave it as is, the proposed patch seems to offer better
>>> readability.
>>
>> Two more inputs for maintainers' decision:
>>
>> 1) the original proposed patch fixed one "bug" that is:
>>
>> In function rs6000_debug_reg_global, it tries to print the register class
>> for the register constraint:
>>
>>   fprintf (stderr,
>>"\n"
>>"d  reg_class = %s\n"
>>"f  reg_class = %s\n"
>>"v  reg_class = %s\n"
>>"wa reg_class = %s\n"
>>...
>>"\n",
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>>...
>>
>> It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
>> set here:
>>
>>   /* Add conditional constraints based on various options, to allow us to
>>  collapse multiple insn patterns.  */
>>   if (TARGET_ALTIVEC)
>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>>
>> But the actual register class for register constraint is hardcoded as
>> ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].
> 
> I agree that the information is inaccurate, but it is informal
> debugging output.  And if Altivec is disabled, the value of the
> constraint is irrelevant / garbage.
> 

Yeah, but IMHO it still can confuse new comers at first glance.

>>
>> 2) Bootstrapped and tested one below patch to remove all the code using
>> RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9,
>> powerpc64-linux-gnu P8 and P7 with no regressions.
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 37f07fe5358..3652629c5d0 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
>>"\n"
>>"d  reg_class = %s\n"
>>"f  reg_class = %s\n"
>> -  "v  reg_class = %s\n"
>>"wa reg_class = %s\n"
>>"we reg_class = %s\n"
>>"wr reg_class = %s\n"
>> @@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
>>"\n",
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>> -  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>

[PATCH v3] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2022-01-25 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that aarch64_expand_setmem does not change the alignment
for strict alignment case. This is version 3 of this patch, is is based on
version 2 and moves the check for the number of instructions from the
optimizing for size case to be always and change the cost of libcalls for
the !size case to be max_size/16 + 1 (or 17) which was the same as before
when handling just the max_size. The main change is dealing with strict
alignment case where we only inline a max of 17 instructions as at that
point the call to the memset will be faster and could handle the dynamic
alignment instead of just the static alignment.

Note the reason why it is +1 is to count for the setting of the simd
duplicate.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/103100
gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_expand_setmem): Constraint
copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
true. Also constraint the number of instructions for the !size
case to max_size/16 + 1.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/memset-strict-align-1.c: Update test.
Reduce the size down to 207 and make s1 global and aligned
to 16 bytes.
* gcc.target/aarch64/memset-strict-align-2.c: New test.
---
 gcc/config/aarch64/aarch64.cc | 55 ++-
 .../aarch64/memset-strict-align-1.c   | 20 +++
 .../aarch64/memset-strict-align-2.c   | 14 +
 3 files changed, 53 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-2.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 296145e6008..02ecb2154ea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23831,8 +23831,11 @@ aarch64_expand_setmem (rtx *operands)
 (zero constants can use XZR directly).  */
   unsigned mops_cost = 3 + 1 + cst_val;
   /* A libcall to memset in the worst case takes 3 instructions to prepare
- the arguments + 1 for the call.  */
-  unsigned libcall_cost = 4;
+ the arguments + 1 for the call.
+ In the case of not optimizing for size the cost of doing a libcall
+ is the max_set_size / 16 + 1 or 17 instructions. The one instruction
+ is for the vector dup which may or may not be used.  */
+  unsigned libcall_cost = size_p ? 4 : (max_set_size / 16 + 1);
 
   /* Upper bound check.  For large constant-sized setmem use the MOPS sequence
  when available.  */
@@ -23842,12 +23845,12 @@ aarch64_expand_setmem (rtx *operands)
 
   /* Attempt a sequence with a vector broadcast followed by stores.
  Count the number of operations involved to see if it's worth it
- against the alternatives.  A simple counter simd_ops on the
+ against the alternatives.  A simple counter inlined_ops on the
  algorithmically-relevant operations is used rather than an rtx_insn count
  as all the pointer adjusmtents and mode reinterprets will be optimized
  away later.  */
   start_sequence ();
-  unsigned simd_ops = 0;
+  unsigned inlined_ops = 0;
 
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
@@ -23855,15 +23858,22 @@ aarch64_expand_setmem (rtx *operands)
   /* Prepare the val using a DUP/MOVI v0.16B, val.  */
   src = expand_vector_broadcast (V16QImode, val);
   src = force_reg (V16QImode, src);
-  simd_ops++;
+  inlined_ops++;
   /* Convert len to bits to make the rest of the code simpler.  */
   n = len * BITS_PER_UNIT;
 
   /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
  AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
- & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
- ? GET_MODE_BITSIZE (TImode) : 256;
+  int copy_limit;
+
+  if (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
+copy_limit = GET_MODE_BITSIZE (TImode);
+  else
+copy_limit = 256;
+
+  if (STRICT_ALIGNMENT)
+copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
 
   while (n > 0)
 {
@@ -23878,7 +23888,7 @@ aarch64_expand_setmem (rtx *operands)
 
   mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
   aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
-  simd_ops++;
+  inlined_ops++;
   n -= mode_bits;
 
   /* Do certain trailing copies as overlapping if it's going to be
@@ -23897,24 +23907,17 @@ aarch64_expand_setmem (rtx *operands)
   rtx_insn *seq = get_insns ();
   end_sequence ();
 
-  if (size_p)
-{
-  /* When optimizing for size we have 3 options: the SIMD broadcast 
sequence,
-call to memset or the MOPS expansion.  */
-  if (TARGET_MOPS
- && mops_cost <= libcall_cost
- && mops_cost <= simd_ops)
-   return aarch64_expand_s

[PATCH] MIPS: use 8bit for IPL in Cause register

2022-01-25 Thread YunQiang Su
Since MIPS r2, the IPL section in Cause register has been expand
to 8bit instead of 6bit.

Since __attribute__((interrupt)) is only supported for r2+,
we don't need to detect the target.

gcc/ChangeLog:

* config/mips/mips.cc (mips_expand_prologue):
  IPL is 8bit for r2+.
---
 gcc/config/mips/mips.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 4f9683e8bf4..bde88fb8e5a 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -12255,7 +12255,7 @@ mips_expand_prologue (void)
  if (!cfun->machine->keep_interrupts_masked_p
  && cfun->machine->int_mask == INT_MASK_EIC)
emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM),
-  GEN_INT (6),
+  GEN_INT (8),
   GEN_INT (SR_IPL),
   gen_rtx_REG (SImode, K0_REG_NUM)));
 
-- 
2.30.2



[pushed] c++: alias template and typename [PR103057]

2022-01-25 Thread Jason Merrill via Gcc-patches
Usually we handle DR1558 substitution near the top of tsubst, but in this
case while substituting TYPENAME_TYPE we were passing an alias
specialization to tsubst_aggr_type, which ignored its aliasness.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/103057

gcc/cp/ChangeLog:

* pt.cc (tsubst_aggr_type): Call tsubst for alias template
specialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-void1.C: New test.
---
 gcc/cp/pt.cc  |  8 
 gcc/testsuite/g++.dg/cpp0x/alias-decl-void1.C | 18 ++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-void1.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8f50b9c4d3c..6fbda676527 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -13585,6 +13585,14 @@ tsubst_aggr_type (tree t,
   if (t == NULL_TREE)
 return NULL_TREE;
 
+  /* If T is an alias template specialization, we want to substitute that
+ rather than strip it, especially if it's dependent_alias_template_spec_p.
+ It should be OK not to handle entering_scope in this case, since
+ DECL_CONTEXT will never be an alias template specialization.  We only get
+ here with an alias when tsubst calls us for TYPENAME_TYPE.  */
+  if (alias_template_specialization_p (t, nt_transparent))
+return tsubst (t, args, complain, in_decl);
+
   switch (TREE_CODE (t))
 {
 case RECORD_TYPE:
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-void1.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-void1.C
new file mode 100644
index 000..accc1a45abc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-void1.C
@@ -0,0 +1,18 @@
+// PR c++/103057
+// { dg-do compile { target c++11 } }
+
+template  struct A { };
+template  struct B { using type = A; };
+template  struct C {
+  using type = typename T::foo;// { dg-error "int" }
+};
+template  using L = B;
+
+template 
+typename L::type>::type
+f(T) { };
+
+int main()
+{
+  f(42);   // { dg-error "no match" }
+}

base-commit: fe5cee6f62a0b229d9d51616b7490331d39b5ddd
prerequisite-patch-id: c7ab4056fcbd782232c8dc091597602ecd4a7a48
-- 
2.27.0



Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Dan Li via Gcc-patches




On 1/25/22 02:19, Richard Sandiford wrote:

Dan Li  writes:

+
  if (flag_stack_usage_info)
current_function_static_stack_size = constant_lower_bound (frame_size);

@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)

  RTX_FRAME_RELATED_P (insn) = 1;
}

+  /* Pop return address from shadow call stack.  */

+  if (aarch64_shadow_call_stack_enabled ())
+   emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
  ...
  ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].


This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)



Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldr x29, [sp], 16
## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr x30, [x18, -8]!
## Or may be a non-CFA based directive here
ret


2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;


Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.



Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.


I'd like a definitive ruling on this from the kernel folks before
the patch goes in.



Ok, I'll cc some kernel folks to make sure I didn't miss something.


If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
(unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.



Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan


Re: PING^2 [PATCH] rs6000: Fix an assertion in update_target_cost_per_stmt [PR103702]

2022-01-25 Thread Richard Biener via Gcc-patches
On Wed, Jan 26, 2022 at 3:15 AM Kewen.Lin via Gcc-patches
 wrote:
>
> Gentle ping:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587309.html

OK.

Thanks,
Richard.

> BR,
> Kewen
>
> > on 2021/12/23 上午10:06, Kewen.Lin via Gcc-patches wrote:
> >> Hi,
> >>
> >> This patch is to fix one wrong assertion which is too aggressive.
> >> Vectorizer can do vec_construct costing for the vector type which
> >> only has one unit.  For the failed case, the passed-in vector type
> >> is "vector(1) int", though it doesn't end up with any construction
> >> eventually.  We have to handle this kind of input in function
> >> rs6000_cost_data::update_target_cost_per_stmt.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >>  PR target/103702
> >>  * config/rs6000/rs6000.c
> >>  (rs6000_cost_data::update_target_cost_per_stmt): Fix one wrong
> >>  assertion with early return.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  PR target/103702
> >>  * gcc.target/powerpc/pr103702.c: New test.
> >> ---
> >>  gcc/config/rs6000/rs6000.c  |  7 --
> >>  gcc/testsuite/gcc.target/powerpc/pr103702.c | 24 +
> >>  2 files changed, 29 insertions(+), 2 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103702.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 0b09713b2f5..37f07fe5358 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -5461,8 +5461,11 @@ rs6000_cost_data::update_target_cost_per_stmt 
> >> (vect_cost_for_stmt kind,
> >>  {
> >>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >>unsigned int nunits = vect_nunits_for_cost (vectype);
> >> -  /* We don't expect strided/elementwise loads for just 1 nunit.  */
> >> -  gcc_assert (nunits > 1);
> >> +  /* As PR103702 shows, it's possible that vectorizer wants to do
> >> + costings for only one unit here, it's no need to do any
> >> + penalization for it, so simply early return here.  */
> >> +  if (nunits == 1)
> >> +return;
> >>/* i386 port adopts nunits * stmt_cost as the penalized cost
> >>   for this kind of penalization, we used to follow it but
> >>   found it could result in an unreliable body cost especially
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103702.c 
> >> b/gcc/testsuite/gcc.target/powerpc/pr103702.c
> >> new file mode 100644
> >> index 000..585946fd64b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr103702.c
> >> @@ -0,0 +1,24 @@
> >> +/* We don't have one powerpc.*_ok for Power6, use altivec_ok 
> >> conservatively.  */
> >> +/* { dg-require-effective-target powerpc_altivec_ok } */
> >> +/* { dg-options "-mdejagnu-cpu=power6 -O2 -ftree-loop-vectorize 
> >> -fno-tree-scev-cprop" } */
> >> +
> >> +/* Verify there is no ICE.  */
> >> +
> >> +unsigned short a, e;
> >> +int *b, *d;
> >> +int c;
> >> +extern int fn2 ();
> >> +void
> >> +fn1 ()
> >> +{
> >> +  void *f;
> >> +  for (;;)
> >> +{
> >> +  fn2 ();
> >> +  b = f;
> >> +  e = 0;
> >> +  for (; e < a; ++e)
> >> +b[e] = d[e * c];
> >> +}
> >> +}
> >> +
> >> --
> >> 2.27.0
> >>
>


Re: [PING^3][PATCH,v2,1/1,AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Dan Li via Gcc-patches

Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldp x29, x30, [sp], 16  ## x30 pop
ldr x30, [x18, -8]! ## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html

Thanks a lot!
Dan

On 1/25/22 22:51, Dan Li wrote:



On 1/25/22 02:19, Richard Sandiford wrote:

Dan Li  writes:

+
  if (flag_stack_usage_info)
    current_function_static_stack_size = constant_lower_bound (frame_size);
@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
  RTX_FRAME_RELATED_P (insn) = 1;
    }
+  /* Pop return address from shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+    emit_insn (gen_scs_pop ());
+


This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.


Yes, the current epilogue is like:
  ...
  ldp x29, x30, [sp], #16
+   ldr x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].


This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)



Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str x30, [x18], 8
ldp x29, x30, [sp], 16
..
ldr x29, [sp], 16
     ## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr x30, [x18, -8]!
     ## Or may be a non-CFA based directive here
ret


2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;


Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.



Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.


I'd like a definitive ruling on this from the kernel folks before
the patch goes in.



Ok, I'll cc some kernel folks to make sure I didn't miss something.


If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
    (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.



Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan