date:20211007

Re: [PATCH] ctfc: Free CTF type and CTF variable objects in ctfc_delete_container ()

2021-10-07 Thread Richard Biener via Gcc-patches

On Wed, Oct 6, 2021 at 12:38 AM Indu Bhagat via Gcc-patches
 wrote:
>
> Hello,
>
> This patch fixes an outstanding issue with memory cleanup in the CTF 
> container.
> Earlier the two hash tables in the CTF container were not cleaned up in
> ctfc_delete_container ().  With this patch, we first free up the CTF type and
> CTF variable entries in the hash_table slots, followed by emptying of the hash
> tables.
>
> Bootstrapped and regression tested on x86_64.
>
> Thanks
>
> 
>
> Free up the memory held by CTF type and CTF variable objects after CTF debug
> information has been emitted.  In ctfc_delete_container (), traverse the
> hash_table of CTF types and CTF variables and free the memory held by the
> respective objects.
>
> gcc/ChangeLog:
>
> * ctfc.c (free_ctf_dtdef_cb): New function.
> (free_ctf_dvdef_cb): Likewise.
> (ctfc_delete_container): Free hash table contents.
> ---
>  gcc/ctfc.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/ctfc.c b/gcc/ctfc.c
> index 73c118e..1f961c9 100644
> --- a/gcc/ctfc.c
> +++ b/gcc/ctfc.c
> @@ -179,6 +179,26 @@ ctf_dvd_lookup (const ctf_container_ref ctfc, dw_die_ref 
> die)
>return NULL;
>  }
>
> +/* Callback function to free the CTF type from hash table.  */
> +
> +static int
> +free_ctf_dtdef_cb (ctf_dtdef_ref * slot, void * arg ATTRIBUTE_UNUSED)
> +{
> +  if (slot && *slot)
> +ggc_free (*slot);
> +  return 1;
> +}
> +
> +/* Callback function to free the CTF variable from hash table.  */
> +
> +static int
> +free_ctf_dvdef_cb (ctf_dvdef_ref * slot, void * arg ATTRIBUTE_UNUSED)
> +{
> +  if (slot && *slot)
> +ggc_free (*slot);
> +  return 1;
> +}
> +
>  /* Append member definition to the list.  Member list is a singly-linked list
> with list start pointing to the head.  */
>
> @@ -944,11 +964,16 @@ ctfc_delete_strtab (ctf_strtable_t * strtab)
>  void
>  ctfc_delete_container (ctf_container_ref ctfc)
>  {
> -  /* FIXME - CTF container can be cleaned up now.
> - Will the ggc machinery take care of cleaning up the container structure
> - including the hash_map members etc. ?  */
>if (ctfc)
>  {
> +  ctfc->ctfc_types->traverse (NULL);
> +  ctfc->ctfc_types->empty ();
> +  ctfc->ctfc_types = NULL;
> +
> +  ctfc->ctfc_vars->traverse (NULL);
> +  ctfc->ctfc_vars->empty ();
> +  ctfc->ctfc_types = NULL;
> +

it should be enough to set ctfc_types to NULL, in particular traversing
the table shouldn't be needed.

OK with that change.

Thanks,
Richard.

>ctfc_delete_strtab (&ctfc->ctfc_strtable);
>ctfc_delete_strtab (&ctfc->ctfc_aux_strtable);
>if (ctfc->ctfc_vars_list)
> --
> 1.8.3.1
>

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-07 Thread Aldy Hernandez via Gcc-patches


[Andrew, ranger comment embedded below.]

On 10/7/21 1:06 AM, Jeff Law wrote:



On 10/6/2021 7:47 AM, Aldy Hernandez via Gcc-patches wrote:

The pending patch I have from Richi fixes this.  Even so, it's the
uninit code that's confused.

Sigh...every single change to the threading code shines the light on
some warning bug.

If you take the calls.ii file from the aarch64 bootstrap and break on
the warning, you can see that the uninitalized use is for
const_upper_3934 here:

   [local count: 315357954]:
   # const_upper_3934 = PHI 
   if (_881 != 0)
 goto ; [50.00%]
   else
 goto ; [50.00%]

    [local count: 157678977]:
   if (const_upper_3934 > _6699)
 goto ; [89.00%]
   else
 goto ; [11.00%]

    [local count: 17344687]:

    [local count: 157678977]:
   goto ; [100.00%]

    [local count: 140334290]:
   stack_usage_map.481_3930 = stack_usage_map;
   _6441 = const_upper_3934 - _6699;


PROBLEMATIC READ HERE

   _4819 = stack_usage_map.481_3930 + _6699;
   __builtin_memset (_4819, 1, _6441);
   goto ; [11.00%]

const_upper_3934 could be undefined if it comes from BB101
(const_upper_3937(D)), but it only gets read for _881 != 0, so it
shouldn't warn.

I suggest -Wmaybe-uninitialized be turned off for that file until the
warning is fixed.

And yes, the proposed patch will also cure this, but the underlying
problem in the warning is still there.
You haven't shown enough context for me to agree or disagree.  Are there 
other preds to bb105?   I hate these slimmed down dumps.  I work better 
with the full pred/succ lists. -fdump-tree--blocks-details :-)


It appears to me that for _881 != 0 we certainly flow into the read of 
_const_upper_3934 in bb103 and bb105.  Why do you think that's safe?


My bad, there's some missing context.

The only way to get to BB101->BB102 is through:

  
  if (_6711 != 0)
goto ; [5.50%]
  else
goto ; [94.50%]

And there's an implicit relation between _6711 and _811:


...
  if (_6711 != 0)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 17344687]:
  goto ; [100.00%]

   [local count: 298013267]:

   [local count: 315357954]:
  # _881 = PHI <1(87), 0(287)>

That is, _6711 == !_881.

[Andrew, it'd be neat if we could teach ranger the relationship between 
_6711 and _811 above.  And also, that _881 is [0,1].  Perhaps with the 
relation oracle, the uninit code could notice that a _6711 guard is also 
a !_811 guard.]


Presumably the threader shuffled things sufficiently so that the above 
relationship was difficult to devise.


Your preferred full context follows ;-).

Aldy



Here we see that _881 == 0 when _6711 != 0:

;;   basic block 286, loop depth 1, count 640272214 (estimated locally), 
maybe hot

;;prev block 85, next block 86, flags: (NEW)
;;pred:   85 [67.0% (guessed)]  count:640272214 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)

  goto ; [100.00%]
;;succ:   112 [always]  count:640272214 (estimated locally) 
(FALLTHRU)


;;   basic block 86, loop depth 1, count 315357954 (estimated locally), 
maybe hot

;;prev block 286, next block 287, flags: (NEW, REACHABLE, VISITED)
;;pred:   85 [33.0% (guessed)]  count:315357953 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)

  _4293 = MEM[(long int *)_5338 + 80B];
  _6727 = MEM[(long int *)_5338 + 88B];
  _6715 = MEM[(long int *)_5338 + 32B];
  _4305 = _4293 + _6715;
  _4299 = MEM[(long int *)_5338 + 40B];
  _4300 = _4299 + _6727;
  _6707 = (long unsigned int) _4305;
  _6711 = (long unsigned int) _4300;
  _6699 = (long unsigned int) _4293;
  if (_6711 != 0)
goto ; [5.50%]
  else
goto ; [94.50%]
;;succ:   287 [5.5% (guessed)]  count:17344687 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;87 [94.5% (guessed)]  count:298013267 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)


;;   basic block 287, loop depth 1, count 17344687 (estimated locally), 
maybe hot

;;prev block 86, next block 87, flags: (NEW)
;;pred:   86 [5.5% (guessed)]  count:17344687 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)

  goto ; [100.00%]
;;succ:   88 [always]  count:17344687 (estimated locally) (FALLTHRU)

;;   basic block 87, loop depth 1, count 298013267 (estimated locally), 
maybe hot

;;prev block 287, next block 88, flags: (NEW, VISITED)
;;pred:   86 [94.5% (guessed)]  count:298013267 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)
;;succ:   88 [always]  count:298013267 (estimated locally) 
(FALLTHRU,EXECUTABLE)


;;   basic block 88, loop depth 1, count 315357954 (estimated locally), 
maybe hot

;;prev block 87, next block 89, flags: (NEW, REACHABLE, VISITED)
;;pred:   87 [always]  count:298013267 (estimated locally) 
(FALLTHRU,EXECUTABLE)
;;287 [always]  count:17344687 (estimated locally) 
(FALLTHRU)

  # const_upper_3863 = PHI <_6707(87), 18446744073709551615(287)>
  # _881 = PHI <1(87), 0(287)>

[snip]
[

[PATCH] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Kito Cheng

__builtin___clear_cache was able to accept constant address for the
argument, but it seems no longer accept recently, and it even not
accept constant address which is hold in variable when optimization is
enable:

```
void foo3(){
  void *yy = (void*)0x1000;
  __builtin___clear_cache(yy, yy);
}
```

So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did per
Jim Wilson's suggestion.

```
static cselib_val *
cselib_lookup_mem (rtx x, int create)
{
  ...
  addr_mode = GET_MODE (XEXP (x, 0));
  if (addr_mode == VOIDmode)
addr_mode = Pmode;
```

gcc/ChangeLog:

PR target/100316
* builtins.c (maybe_emit_call_builtin___clear_cache): Allow
VOIDmode for begin and end.

gcc/testsuite/ChangeLog:

PR target/100316
* gcc.c-torture/compile/pr100316.c: New.
---
 gcc/builtins.c |  6 --
 gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3e57eb03af0..960f07121fc 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5163,8 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, rtx 
end)
 void
 maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
 {
-  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
-  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
+  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode
+   && GET_MODE (begin) != VOIDmode)
+  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode
+ && GET_MODE (begin) != VOIDmode))
 {
   error ("both arguments to %<__builtin___clear_cache%> must be pointers");
   return;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
new file mode 100644
index 000..38eca86f49f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
@@ -0,0 +1,18 @@
+void foo(){
+  __builtin___clear_cache(0, 0);
+}
+
+void foo1(){
+  __builtin___clear_cache((void*)0, (void*)0);
+}
+
+void foo2(){
+  void *yy = 0;
+  __builtin___clear_cache(yy, yy);
+}
+
+void foo3(){
+  void *yy = (void*)0x1000;
+  __builtin___clear_cache(yy, yy);
+}
+
-- 
2.33.0

Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-07 Thread Andreas Krebbel via Gcc-patches

On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
> This patch implements the rawmemchr expander as introduced in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
> 
> Bootstrapped and regtested in conjunction with the patch from above on
> IBM Z.  Ok for mainline?
> 

> From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
> From: Stefan Schulze Frielinghaus 
> Date: Mon, 8 Feb 2021 10:35:39 +0100
> Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
>
> gcc/ChangeLog:
>
>   * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
>   (s390_rawmemchrhi): Add prototype.
>   (s390_rawmemchrsi): Add prototype.
>   * config/s390/s390.c (s390_rawmemchr): New function.
>   (s390_rawmemchrqi): New function.
>   (s390_rawmemchrhi): New function.
>   (s390_rawmemchrsi): New function.
>   * config/s390/s390.md (rawmemchr): New expander.
>   (rawmemchr): New expander.
>   * config/s390/vector.md (vec_vfees): Basically a copy of
>   the pattern vfees from vx-builtins.md.
>   * config/s390/vx-builtins.md (*vfees): Remove.

Thanks! Would it make sense to also extend the strlen and movstr expanders
we have to support the additional character modes?

A few style comments below.

>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/s390/rawmemchr-1.c: New test.
> ---
>  gcc/config/s390/s390-protos.h   |  4 +
>  gcc/config/s390/s390.c  | 89 ++
>  gcc/config/s390/s390.md | 20 +
>  gcc/config/s390/vector.md   | 26 ++
>  gcc/config/s390/vx-builtins.md  | 26 --
>  gcc/testsuite/gcc.target/s390/rawmemchr-1.c | 99 +
>  6 files changed, 238 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/rawmemchr-1.c
>
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 4b03c6e99f5..0d9619e8254 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -66,6 +66,10 @@ s390_asm_declare_function_size (FILE *asm_out_file,
>   const char *fnname ATTRIBUTE_UNUSED, tree decl);
>  #endif
>
> +extern void s390_rawmemchrqi(rtx dst, rtx src, rtx pat);
> +extern void s390_rawmemchrhi(rtx dst, rtx src, rtx pat);
> +extern void s390_rawmemchrsi(rtx dst, rtx src, rtx pat);
> +
>  #ifdef RTX_CODE
>  extern int s390_extra_constraint_str (rtx, int, const char *);
>  extern int s390_const_ok_for_constraint_p (HOST_WIDE_INT, int, const char *);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 54dd6332c3a..1435ce156e2 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -16559,6 +16559,95 @@ s390_excess_precision (enum excess_precision_type 
> type)
>  }
>  #endif
>
> +template  +   machine_mode elt_mode,
> +   rtx (*gen_vec_vfees) (rtx, rtx, rtx, rtx)>
> +static void
> +s390_rawmemchr(rtx dst, rtx src, rtx pat) {

I think it would be a bit easier to turn the vec_vfees expander into a
'parameterized name' and add the mode as parameter.  I'll attach a patch
to illustrate how this might look like.

> +  rtx lens = gen_reg_rtx (V16QImode);
> +  rtx pattern = gen_reg_rtx (vec_mode);
> +  rtx loop_start = gen_label_rtx ();
> +  rtx loop_end = gen_label_rtx ();
> +  rtx addr = gen_reg_rtx (Pmode);
> +  rtx offset = gen_reg_rtx (Pmode);
> +  rtx tmp = gen_reg_rtx (Pmode);
> +  rtx loadlen = gen_reg_rtx (SImode);
> +  rtx matchlen = gen_reg_rtx (SImode);
> +  rtx mem;
> +
> +  pat = GEN_INT (trunc_int_for_mode (INTVAL (pat), elt_mode));
> +  emit_insn (gen_rtx_SET (pattern, gen_rtx_VEC_DUPLICATE (vec_mode, pat)));
> +
> +  emit_move_insn (addr, XEXP (src, 0));
> +
> +  // alignment
> +  emit_insn (gen_vlbb (lens, gen_rtx_MEM (BLKmode, addr), GEN_INT (6)));
> +  emit_insn (gen_lcbb (loadlen, addr, GEN_INT (6)));
> +  lens = convert_to_mode (vec_mode, lens, 1);
> +  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (0)));
> +  lens = convert_to_mode (V4SImode, lens, 1);
> +  emit_insn (gen_vec_extractv4sisi (matchlen, lens, GEN_INT (1)));
> +  lens = convert_to_mode (vec_mode, lens, 1);

That back and forth NOP conversion stuff is ugly but I couldn't find a
more elegant way to write this without generating worse code.  Of
course we want to benefit here from the fact that the result operand
of vfees is already zero-extended.  Perhaps factor this out into a
utility function or an extra expander because we appear to need this
frequently?! Not a requirement for this patch though.

> +  emit_cmp_and_jump_insns (matchlen, loadlen, LT, NULL_RTX, SImode, 1, 
> loop_end);
> +  force_expand_binop (Pmode, and_optab, addr, GEN_INT (15), tmp, 1, 
> OPTAB_DIRECT);
> +  force_expand_binop (Pmode, sub_optab, GEN_INT (16), tmp, tmp, 1, 
> OPTAB_DIRECT);
> +  force_expand_binop (Pmode, add_optab, addr, tmp, addr, 1, OPTAB_DIRECT);

Couldn't we just do this as '(addr + 16) & ~0xf' her

Re: [PATCH] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Andrew Pinski via Gcc-patches

On Thu, Oct 7, 2021 at 2:09 AM Kito Cheng  wrote:
>
> __builtin___clear_cache was able to accept constant address for the
> argument, but it seems no longer accept recently, and it even not
> accept constant address which is hold in variable when optimization is
> enable:
>
> ```
> void foo3(){
>   void *yy = (void*)0x1000;
>   __builtin___clear_cache(yy, yy);
> }
> ```
>
> So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did 
> per
> Jim Wilson's suggestion.
>
> ```
> static cselib_val *
> cselib_lookup_mem (rtx x, int create)
> {
>   ...
>   addr_mode = GET_MODE (XEXP (x, 0));
>   if (addr_mode == VOIDmode)
> addr_mode = Pmode;
> ```
>
> gcc/ChangeLog:
>
> PR target/100316
> * builtins.c (maybe_emit_call_builtin___clear_cache): Allow
> VOIDmode for begin and end.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100316
> * gcc.c-torture/compile/pr100316.c: New.
> ---
>  gcc/builtins.c |  6 --
>  gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
>  2 files changed, 22 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 3e57eb03af0..960f07121fc 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5163,8 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, 
> rtx end)
>  void
>  maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
>  {
> -  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
> -  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
> +  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode
> +   && GET_MODE (begin) != VOIDmode)
> +  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode
> + && GET_MODE (begin) != VOIDmode))
>  {
>error ("both arguments to %<__builtin___clear_cache%> must be 
> pointers");
>return;

Seems like this error should be an gcc_assert at this point.  The
error message should have come from expand_builtin___clear_cache
already.
And the check for VOIDmode should really be a check for CONST_INT.

Thanks,
Andrew Pinski

> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> new file mode 100644
> index 000..38eca86f49f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> @@ -0,0 +1,18 @@
> +void foo(){
> +  __builtin___clear_cache(0, 0);
> +}
> +
> +void foo1(){
> +  __builtin___clear_cache((void*)0, (void*)0);
> +}
> +
> +void foo2(){
> +  void *yy = 0;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> +void foo3(){
> +  void *yy = (void*)0x1000;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> --
> 2.33.0
>

Re: [PATCH RFA] vec: Fix --enable-gather-detailed-mem-stats

2021-10-07 Thread Martin Liška


On 10/5/21 20:44, Jason Merrill wrote:

That is the usual approach, yes.  I was giving up on that, but perhaps it's 
better to stick with it.  Martin, want to make that fix for 
save_opt_decoded_options?


Yes, I'm going to install the following patch once it survives regression tests
and bootstrap.

MartinFrom 16f245cc4a738480c5dbdcc700c1859365a0eab5 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 7 Oct 2021 12:29:15 +0200
Subject: [PATCH] build: Fix --enable-gather-detailed-mem-stats

gcc/c-family/ChangeLog:

	* c-common.c (parse_optimize_options): Make
	save_opt_decoded_options a pointer type.

gcc/ChangeLog:

	* toplev.c (toplev::main): Make
	save_opt_decoded_options a pointer type
	* toplev.h: Likewise.
---
 gcc/c-family/c-common.c | 4 ++--
 gcc/toplev.c| 5 +++--
 gcc/toplev.h| 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9d19e352725..32c7e3e8972 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5921,7 +5921,7 @@ parse_optimize_options (tree args, bool attr_p)
   decoded_options_count = j;
 
   /* Merge the decoded options with save_decoded_options.  */
-  unsigned save_opt_count = save_opt_decoded_options.length ();
+  unsigned save_opt_count = save_opt_decoded_options->length ();
   unsigned merged_decoded_options_count
 = save_opt_count + decoded_options_count;
   cl_decoded_option *merged_decoded_options
@@ -5929,7 +5929,7 @@ parse_optimize_options (tree args, bool attr_p)
 
   /* Note the first decoded_options is used for the program name.  */
   for (unsigned i = 0; i < save_opt_count; ++i)
-merged_decoded_options[i + 1] = save_opt_decoded_options[i];
+merged_decoded_options[i + 1] = (*save_opt_decoded_options)[i];
   for (unsigned i = 1; i < decoded_options_count; ++i)
 merged_decoded_options[save_opt_count + i] = decoded_options[i];
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 70769087c13..ecb2b694970 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -117,7 +117,7 @@ struct cl_decoded_option *save_decoded_options;
 unsigned int save_decoded_options_count;
 
 /* Vector of saved Optimization decoded command line options.  */
-auto_vec save_opt_decoded_options;
+vec *save_opt_decoded_options;
 
 /* Used to enable -fvar-tracking, -fweb and -frename-registers according
to optimize in process_options ().  */
@@ -2320,10 +2320,11 @@ toplev::main (int argc, char **argv)
 		&save_decoded_options_count);
 
   /* Save Optimization decoded options.  */
+  save_opt_decoded_options = new vec ();
   for (unsigned i = 1; i < save_decoded_options_count; ++i)
 if (save_decoded_options[i].opt_index < cl_options_count
 	&& cl_options[save_decoded_options[i].opt_index].flags & CL_OPTIMIZATION)
-  save_opt_decoded_options.safe_push (save_decoded_options[i]);
+  save_opt_decoded_options->safe_push (save_decoded_options[i]);
 
   /* Perform language-specific options initialization.  */
   lang_hooks.init_options (save_decoded_options_count, save_decoded_options);
diff --git a/gcc/toplev.h b/gcc/toplev.h
index c44c5ff926a..493f7eb5ad6 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Decoded options, and number of such options.  */
 extern struct cl_decoded_option *save_decoded_options;
 extern unsigned int save_decoded_options_count;
-extern auto_vec save_opt_decoded_options;
+extern vec *save_opt_decoded_options;
 
 class timer;
 
-- 
2.33.0

[committed] amdgcn: Support LLVM 13 assembler syntax

2021-10-07 Thread Andrew Stubbs

I've committed this patch to allow GCC to adapt to the different 
variants of the LLVM amdgcn assembler. Unfortunately they keep making 
changes without maintaining backwards compatibility.


GCC should now work with LLVM 9, LLVM 12, and LLVM 13 in terms of CLI 
usage, however only LLVM 9 is well tested (I'm aware of at least one bug 
in each of LLVM 12 and 13 that affects GCC-generated binaries).


This patch should make no observable change in behaviour on LLVM 9, and 
add nothing new on LLVM 13 (besides actually working).


Andrew Stubbsamdgcn: Support LLVM 13 assembler syntax

The LLVM devs have changed the assembler architecture attribute names on both
CLI and in the ".amdgcn_target" directive, and changed the attribute syntax
inside the directive, without keeping any backwards compatibility. :-(

This patch improves our configure tests to detect what dialect to use, what
attributes are valid, and adjusts the specs to match.

gcc/ChangeLog:

* config.in: Regenerate.
* config/gcn/gcn-hsa.h (X_FIJI): New macro.
(X_900): New macro.
(X_906): New macro.
(X_908): New macro.
(A_FIJI): Rename to ...
(S_FIJI): ... this.
(A_900): Rename to ...
(S_900): ... this.
(A_906): Rename to ...
(S_906): ... this.
(A_908): Rename to ...
(S_908): ... this.
(SRAMOPT): New macro.
(ASM_SPEC): Adjust xnack option usage.
* config/gcn/gcn.c (output_file_start): Adjust amdgcn_target usage.
* configure: Regenerate.
* configure.ac: Detect LLVM assembler dialect.

diff --git a/gcc/config.in b/gcc/config.in
index 61cafe4f6c0f..b5bec3971dc2 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1463,30 +1463,66 @@
 #endif
 
 
-/* Define if your assembler allows -mattr=+sram-ecc for fiji. */
+/* Define if your assembler expects amdgcn_target gfx908+xnack syntax. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_ASM_V3_SYNTAX
+#endif
+
+
+/* Define if your assembler expects amdgcn_target gfx908:xnack+ syntax. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_ASM_V4_SYNTAX
+#endif
+
+
+/* Define if your assembler allows -mattr=+sramecc for fiji. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GCN_SRAM_ECC_FIJI
 #endif
 
 
-/* Define if your assembler allows -mattr=+sram-ecc for gfx900. */
+/* Define if your assembler allows -mattr=+sramecc for gfx900. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GCN_SRAM_ECC_GFX900
 #endif
 
 
-/* Define if your assembler allows -mattr=+sram-ecc for gfx906. */
+/* Define if your assembler allows -mattr=+sramecc for gfx906. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GCN_SRAM_ECC_GFX906
 #endif
 
 
-/* Define if your assembler allows -mattr=+sram-ecc for gfx908. */
+/* Define if your assembler allows -mattr=+sramecc for gfx908. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GCN_SRAM_ECC_GFX908
 #endif
 
 
+/* Define if your assembler allows -mattr=+xnack for fiji. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_XNACK_FIJI
+#endif
+
+
+/* Define if your assembler allows -mattr=+xnack for gfx900. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_XNACK_GFX900
+#endif
+
+
+/* Define if your assembler allows -mattr=+xnack for gfx906. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_XNACK_GFX906
+#endif
+
+
+/* Define if your assembler allows -mattr=+xnack for gfx908. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_XNACK_GFX908
+#endif
+
+
 /* Define to 1 if you have the `getchar_unlocked' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GETCHAR_UNLOCKED
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index fc99c8db7520..60fd40a10b71 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -75,25 +75,64 @@ extern unsigned int gcn_local_sym_hash (const char *name);
supported for gcn.  */
 #define GOMP_SELF_SPECS ""
 
+#ifdef HAVE_GCN_XNACK_FIJI
+#define X_FIJI
+#else
+#define X_FIJI "!march=*:;march=fiji:;"
+#endif
+#ifdef HAVE_GCN_XNACK_GFX900
+#define X_900
+#else
+#define X_900 "march=gfx900:;"
+#endif
+#ifdef HAVE_GCN_XNACK_GFX906
+#define X_906
+#else
+#define X_906 "march=gfx906:;"
+#endif
+#ifdef HAVE_GCN_XNACK_GFX908
+#define X_908
+#else
+#define X_908 "march=gfx908:;"
+#endif
+
 #ifdef HAVE_GCN_SRAM_ECC_FIJI
-#define A_FIJI
+#define S_FIJI
 #else
-#define A_FIJI "!march=*:;march=fiji:;"
+#define S_FIJI "!march=*:;march=fiji:;"
 #endif
 #ifdef HAVE_GCN_SRAM_ECC_GFX900
-#define A_900
+#define S_900
 #else
-#define A_900 "march=gfx900:;"
+#define S_900 "march=gfx900:;"
 #endif
 #ifdef HAVE_GCN_SRAM_ECC_GFX906
-#define A_906
+#define S_906
 #else
-#define A_906 "march=gfx906:;"
+#define S_906 "march=gfx906:;"
 #endif
 #ifdef HAVE_GCN_SRAM_ECC_GFX908
-#define A_908
+#define S_908
+#else
+#define S_908 "march=gfx908:;"
+#endif
+
+#ifdef HAVE_GCN_ASM_V3_SYNTAX
+#define SRAMOPT "sram-ecc"
+#endif
+#ifdef HAVE_GCN_ASM_V4_SYNTAX
+#define SRAMOPT "sramecc"
+#endif
+#if !defined(SRAMOPT) && !defined(IN_LIBGCC2)
+#error "No assembler syntax configured"
+#endif
+
+#ifdef HAVE_GCN_ASM_V

[committed] amdgcn: Implement -msram-ecc=any

2021-10-07 Thread Andrew Stubbs

I've committed this patch to implement the -msram-ecc=any feature that 
has been stubbed out awaiting LLVM support for a while now.


When the LLVM assembler supports the "any" feature (v13+) GCC will now 
make use of it. Otherwise, GCC will continue to treat "any" the same as 
"on".


Using the "any" setting for the "sram-ecc" attribute means that we will 
not need a multilib to support differently configure hardware. The 
compiler has always produced code that runs in any configuration, but 
the ELF flags could not reflect this, until now.


Andrewamdgcn: Implement -msram-ecc=any

The option was already there, but just an alias for -msram-ecc=on.  Now that
LLVM13 supports HSACOv4 and the new ELF flags I can implement the option
properly.

The "any" option is the default in order to ensure that library files work
whichever way the user wants, which means we won't need multilibs to support
the different SRAM ECC hardware configurations.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (SRAMOPT): Include the whole option string.
Adjust for new -msram-ecc=any behaviour.
(ASM_SPEC): Adjust -mxnack and -msram-ecc usage.
* config/gcn/gcn.c (output_file_start): Implement -msram-ecc=any.
* config/gcn/mkoffload.c (EF_AMDGPU_XNACK): Rename to ...
(EF_AMDGPU_XNACK_V3): ... this.
(EF_AMDGPU_SRAM_ECC): Rename to ...
(EF_AMDGPU_SRAM_ECC_V3): ... this.
(EF_AMDGPU_FEATURE_XNACK_V4): New.
(EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4): New.
(EF_AMDGPU_FEATURE_XNACK_ANY_V4): New.
(EF_AMDGPU_FEATURE_XNACK_OFF_V4): New.
(EF_AMDGPU_FEATURE_XNACK_ON_V4): New.
(EF_AMDGPU_FEATURE_SRAMECC_V4): New.
(EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4): New.
(EF_AMDGPU_FEATURE_SRAMECC_ANY_V4): New.
(EF_AMDGPU_FEATURE_SRAMECC_OFF_V4): New.
(EF_AMDGPU_FEATURE_SRAMECC_ON_V4): New.
(SET_XNACK_ON): New.
(SET_XNACK_OFF): New.
(TEST_XNACK): New.
(SET_SRAM_ECC_ON): New.
(SET_SRAM_ECC_ANY): New.
(SET_SRAM_ECC_OFF): New.
(TEST_SRAM_ECC_ANY): New.
(TEST_SRAM_ECC_ON): New.
(main): Implement HSACOv4 and -msram-ecc=any.

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 60fd40a10b71..6a432d17d99f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -118,10 +118,12 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #endif
 
 #ifdef HAVE_GCN_ASM_V3_SYNTAX
-#define SRAMOPT "sram-ecc"
+#define SRAMOPT "!msram-ecc=off:-mattr=+sram-ecc;:-mattr=-sram-ecc"
 #endif
 #ifdef HAVE_GCN_ASM_V4_SYNTAX
-#define SRAMOPT "sramecc"
+/* In HSACOv4 no attribute setting means the binary supports "any" hardware
+   configuration.  The name of the attribute also changed.  */
+#define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc"
 #endif
 #if !defined(SRAMOPT) && !defined(IN_LIBGCC2)
 #error "No assembler syntax configured"
@@ -143,11 +145,9 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #define ASM_SPEC  "-triple=amdgcn--amdhsa "  \
  "%:last_arg(%{march=*:-mcpu=%*}) " \
  HSACO3_SELECT_OPT \
- "-mattr=%{" X_FIJI X_900 X_906 X_908 \
-   "mxnack:+xnack;:-xnack} " \
- /* FIXME: support "any" when we move to HSACOv4.  */ \
- "-mattr=%{" S_FIJI S_900 S_906 S_908 \
-   "!msram-ecc=off:+" SRAMOPT ";:-" SRAMOPT "} " \
+ "%{" X_FIJI X_900 X_906 X_908 \
+"mxnack:-mattr=+xnack;:-mattr=-xnack} " \
+ "%{" S_FIJI S_900 S_906 S_908 SRAMOPT "} " \
  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
 #define LIB_SPEC  "-lc"
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 8517168ff0ad..2e90f327c451 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -5265,9 +5265,13 @@ output_file_start (void)
   const char *sram_ecc = (flag_sram_ecc ? "+sram-ecc" : "");
 #endif
 #if HAVE_GCN_ASM_V4_SYNTAX
+  /* In HSACOv4 no attribute setting means the binary supports "any" hardware
+ configuration.  In GCC binaries, this is true for SRAM ECC, but not
+ XNACK.  */
   const char *xnack = (flag_xnack ? ":xnack+" : ":xnack-");
-  /* FIXME: support "any" when we move to HSACOv4.  */
-  const char *sram_ecc = (flag_sram_ecc ? ":sramecc+" : ":sramecc-");
+  const char *sram_ecc = (flag_sram_ecc == SRAM_ECC_ON ? ":sramecc+"
+ : flag_sram_ecc == SRAM_ECC_OFF ? ":sramecc-"
+ : "");
 #endif
   if (!use_xnack_attr)
 xnack = "";
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 732bdfd98e57..a3b22d059b96 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -54,8 +54,51 @@
 #undef  EF_AMDGPU_MACH_AMDGCN_GFX908
 #define EF_AMDGPU_MACH_AMDGCN_GFX908 0x30
 
-#define EF_AMDGPU_XNACK0x100
-#define EF_AMD

[committed] amdgcn: Fix assembler version incompatibility

2021-10-07 Thread Andrew Stubbs

I've committed this patch to fix another case of LLVM assembler 
incompatibility. Marcel previously posted a patch to fix up the 
global_load and global_store instructions, following a 
non-backwards-compatible change in the assembler.


https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572987.html

This patch fixes up another instance of the same problem, this time in 
the gather and scatter instruction patterns.


The fix is needed for LLVM 12 & 13. There is not change in behaviour for 
LLVM 9.


Andrewamdgcn: Fix assembler version incompatibility

This is another case of the global_load instruction format changing in LLVM
(because they fixed a bug).  The configure test is already in place to detect
what is needed.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (gather_insn_2offsets): Apply
HAVE_GCN_ASM_GLOBAL_LOAD_FIXED.
(scatter_insn_2offsets): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 84ff67508b99..01fdce64d423 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -827,8 +827,12 @@ (define_insn "gather_insn_2offsets"
/* Work around assembler bug in which a 64-bit register is expected,
but a 32-bit value would be correct.  */
int reg = REGNO (operands[2]) - FIRST_VGPR_REG;
-   sprintf (buf, "global_load%%o0\t%%0, v[%d:%d], %%1 offset:%%3%s\;"
- "s_waitcnt\tvmcnt(0)", reg, reg + 1, glc);
+   if (HAVE_GCN_ASM_GLOBAL_LOAD_FIXED)
+ sprintf (buf, "global_load%%o0\t%%0, v%d, %%1 offset:%%3%s\;"
+   "s_waitcnt\tvmcnt(0)", reg, glc);
+   else
+ sprintf (buf, "global_load%%o0\t%%0, v[%d:%d], %%1 offset:%%3%s\;"
+   "s_waitcnt\tvmcnt(0)", reg, reg + 1, glc);
   }
 else
   gcc_unreachable ();
@@ -958,8 +962,12 @@ (define_insn "scatter_insn_2offsets"
/* Work around assembler bug in which a 64-bit register is expected,
but a 32-bit value would be correct.  */
int reg = REGNO (operands[1]) - FIRST_VGPR_REG;
-   sprintf (buf, "global_store%%s3\tv[%d:%d], %%3, %%0 offset:%%2%s",
-reg, reg + 1, glc);
+   if (HAVE_GCN_ASM_GLOBAL_LOAD_FIXED)
+ sprintf (buf, "global_store%%s3\tv%d, %%3, %%0 offset:%%2%s",
+  reg, glc);
+   else
+ sprintf (buf, "global_store%%s3\tv[%d:%d], %%3, %%0 offset:%%2%s",
+  reg, reg + 1, glc);
   }
 else
   gcc_unreachable ();

[committed] MAINTAINERS: Add myself to DCO section

2021-10-07 Thread Siddhesh Poyarekar

Also updated my email address to the one I actually use for gcc, which
unfortunately is not the same as the one I use for glibc.  Oh well...

ChangeLog:

* MAINTAINERS: Add myself to DCO section and update email
address.
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00747ac72b6..bf4006c779f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -569,7 +569,7 @@ Nicolas Pitre   

 Michael Ploujnikov 
 Paul Pluzhnikov
 Antoniu Pop
-Siddhesh Poyarekar 
+Siddhesh Poyarekar 
 Vidya Praveen  
 Thomas Preud'homme 
 Vladimir Prus  
@@ -710,5 +710,6 @@ information.
  Jeff Law  
  Jonathan Wakely   
  Gaius Mulley  
+ Siddhesh Poyarekar
  Trevor Saunders   
  Petter Tomner 
-- 
2.31.1

[PATCH] tree-optimization/102608 - avoid CSEing .DEFERRED_INIT

2021-10-07 Thread Richard Biener via Gcc-patches

This makes VN not CSE .DEFERRED_INIT which confuses uninit
analysis which reports the wrong decl when facing copies
of not initialized data.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-07  Richard Biener  

PR tree-optimization/102608
* tree-ssa-sccvn.c (visit_stmt): Drop .DEFERRED_INIT to
varying.
---
 gcc/tree-ssa-sccvn.c | 45 +++-
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 0d942218279..ae0172a143e 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -5668,27 +5668,30 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = 
false)
  && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL)
extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0));
}
-  if (/* Calls to the same function with the same vuse
-and the same operands do not necessarily return the same
-value, unless they're pure or const.  */
- ((gimple_call_flags (call_stmt) | extra_fnflags)
-  & (ECF_PURE | ECF_CONST))
- /* If calls have a vdef, subsequent calls won't have
-the same incoming vuse.  So, if 2 calls with vdef have the
-same vuse, we know they're not subsequent.
-We can value number 2 calls to the same function with the
-same vuse and the same operands which are not subsequent
-the same, because there is no code in the program that can
-compare the 2 values...  */
- || (gimple_vdef (call_stmt)
- /* ... unless the call returns a pointer which does
-not alias with anything else.  In which case the
-information that the values are distinct are encoded
-in the IL.  */
- && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS)
- /* Only perform the following when being called from PRE
-which embeds tail merging.  */
- && default_vn_walk_kind == VN_WALK))
+  if ((/* Calls to the same function with the same vuse
+ and the same operands do not necessarily return the same
+ value, unless they're pure or const.  */
+  ((gimple_call_flags (call_stmt) | extra_fnflags)
+   & (ECF_PURE | ECF_CONST))
+  /* If calls have a vdef, subsequent calls won't have
+ the same incoming vuse.  So, if 2 calls with vdef have the
+ same vuse, we know they're not subsequent.
+ We can value number 2 calls to the same function with the
+ same vuse and the same operands which are not subsequent
+ the same, because there is no code in the program that can
+ compare the 2 values...  */
+  || (gimple_vdef (call_stmt)
+  /* ... unless the call returns a pointer which does
+ not alias with anything else.  In which case the
+ information that the values are distinct are encoded
+ in the IL.  */
+  && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS)
+  /* Only perform the following when being called from PRE
+ which embeds tail merging.  */
+  && default_vn_walk_kind == VN_WALK))
+ /* Do not process .DEFERRED_INIT since that confuses uninit
+analysis.  */
+ && !gimple_call_internal_p (call_stmt, IFN_DEFERRED_INIT))
changed = visit_reference_op_call (lhs, call_stmt);
   else
changed = defs_to_varying (call_stmt);
-- 
2.31.1

Re: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-07 Thread Richard Earnshaw via Gcc-patches





On 05/10/2021 14:56, Tamar Christina via Gcc-patches wrote:




-Original Message-
From: Richard Earnshaw 
Sent: Tuesday, October 5, 2021 2:52 PM
To: Tamar Christina ; gcc-patches@gcc.gnu.org
Cc: nd ; rguent...@suse.de
Subject: Re: [PATCH]middle-end convert negate + right shift into compare
greater.



On 05/10/2021 14:49, Tamar Christina wrote:

-Original Message-
From: Richard Earnshaw 
Sent: Tuesday, October 5, 2021 2:34 PM
To: Tamar Christina ;
gcc-patches@gcc.gnu.org
Cc: nd ; rguent...@suse.de
Subject: Re: [PATCH]middle-end convert negate + right shift into
compare greater.



On 05/10/2021 14:30, Tamar Christina wrote:




-Original Message-
From: Richard Earnshaw 
Sent: Tuesday, October 5, 2021 1:56 PM
To: Tamar Christina ;
gcc-patches@gcc.gnu.org
Cc: nd ; rguent...@suse.de
Subject: Re: [PATCH]middle-end convert negate + right shift into
compare greater.



On 05/10/2021 13:50, Tamar Christina via Gcc-patches wrote:

Hi All,

This turns an inversion of the sign bit + arithmetic right shift
into a comparison with 0.

i.e.

void fun1(int32_t *x, int n)
{
for (int i = 0; i < (n & -16); i++)
  x[i] = (-x[i]) >> 31;
}


Notwithstanding that I think shifting a negative value right is
unspecified behaviour, I don't think this generates the same result
when x[i] is INT_MIN either, although negating that is also
unspecified since it can't be represented in an int.



You're right that they are implementation defined, but I think most
ISAs do have a sane Implementation of these two cases. At least both
x86 and AArch64 just replicate the signbit and for negate do two

complement negation. So INT_MIN works as expected and results in 0.

Which is not what the original code produces if you have wrapping
ints, because -INT_MIN is INT_MIN, and thus still negative.



True, but then you have a signed overflow which is undefined behaviour
and not implementation defined

" If an exceptional condition occurs during the evaluation of an expression

(that is, if the result is not mathematically defined or not in the range of
representable values for its type), the behavior is undefined."


So it should still be acceptable to do in this case.


-fwrapv


If I understand correctly, you're happy with this is I guard it on ! flag_wrapv 
?


I did some more digging.  Right shift of a negative value is IMP_DEF 
(not UNDEF - this keeps catching me out).  So yes, wrapping this with 
!wrapv would address my concern.


I've not reviewed the patch itself, though.  I've never even written a 
patch for match.pd, so don't feel qualified to do that.


R.



Regards,
Tamar


R.




R.



But I'm happy to guard this behind some sort of target guard.

Regards,
Tamar


R.


now generates:

.L3:
ldr q0, [x0]
cmgtv0.4s, v0.4s, #0
str q0, [x0], 16
cmp x0, x1
bne .L3

instead of:

.L3:
ldr q0, [x0]
neg v0.4s, v0.4s
sshrv0.4s, v0.4s, 31
str q0, [x0], 16
cmp x0, x1
bne .L3

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: New negate+shift pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: New test.
* gcc.dg/signbit-3.c: New test.
* gcc.target/aarch64/signbit-1.c: New test.

--- inline copy of patch --
diff --git a/gcc/match.pd b/gcc/match.pd index





7d2a24dbc5e9644a09968f877e12a824d8ba1caa..581436fe36dbacdcb0c2720b7

190c96d14398143 100644

--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -826,6 +826,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 { tree utype = unsigned_type_for (type); }
 (convert (rshift (lshift (convert:utype @0) @2) @3))

+/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
+(for cst (INTEGER_CST VECTOR_CST)  (simplify
+  (rshift (negate:s @0) cst@1)
+   (with { tree ctype = TREE_TYPE (@0);
+  tree stype = TREE_TYPE (@1);
+  tree bt = truth_type_for (ctype); }
+(switch
+ /* Handle scalar case.  */
+ (if (INTEGRAL_TYPE_P (ctype)
+ && !VECTOR_TYPE_P (ctype)
+ && !TYPE_UNSIGNED (ctype)
+ && canonicalize_math_after_vectorization_p ()
+ && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) -

1))

+  (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
+ /* Handle vector case with a scalar immediate.  */
+ (if (VECTOR_INTEGER_TYPE_P (ctype)
+ && !VECTOR_TYPE_P (stype)
+ && !TYPE_UNSIGNED (ctype)
+  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
+  (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))
+ /* Handle vector case with a vector immediate.   */
+ (if (VECTOR_INTEGER_TYPE_P (ctype)
+ && VECTOR_TYPE_P (stype)
+ && !TYPE_UNSIGNED (ctype)
+ && uniform_vector_p (@1))
+  (with { tree cst = vector_cst_elt

[PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

2021-10-07 Thread Jakub Jelinek via Gcc-patches

Hi!

I believe we need no changes to the compiler for P2316R2, seems we treat
character literals the same between preprocessor and C++ expressions,
here is a testcase that should verify it.

Tested on x86_64-linux, ok for trunk?

Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
but I bet it is very hard (at least for me) to actually test the latter.
I'd guess one needs all system headers to be in EBCDIC and the gcc sources too.
But looking around the source, I'm a little bit worried about the UTF-EBCDIC
case.
One is:
 #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
&& 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
 #  define HOST_CHARSET HOST_CHARSET_ASCII
 #else
 # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
&& 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
 #  define HOST_CHARSET HOST_CHARSET_EBCDIC
 # else
 #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
 # endif
 #endif
in include/safe-ctype.h, does that mean we only support EBCDIC if 
-funsigned-char
and otherwise fail to build gcc?  Because with -fsigned-char, '0' is -0x10
rather than 0xF0, 'A' is -0x3F rather than 0xC1 and 'a' is -0x7F rather than
0x81.
And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the 
libcpp/lex.c
static const cppchar_t utf8_signifier = 0xC0;
...
  if (*buffer->cur >= utf8_signifier)
{
  if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
   state, &s))
return true;
}
work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
multi-byte character, it is more complicated and seems _cpp_valid_utf8
assumes UTF-8 as the host charset.

2021-10-07  Jakub Jelinek  

PR c++/102615
* g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.

--- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj   2021-10-07 
14:34:35.182132411 +0200
+++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C  2021-10-07 
14:34:02.902583774 +0200
@@ -0,0 +1,33 @@
+// PR c++/102615 - P2316R2 - Consistent character literal encoding
+// { dg-do compile }
+
+extern "C" void abort ();
+
+int
+main ()
+{
+#if ' ' == 0x20
+  if (' ' != 0x20)
+abort ();
+#elif ' ' == 0x40
+  if (' ' != 0x40)
+abort ();
+#else
+  if (' ' == 0x20 || ' ' == 0x40)
+abort ();
+#endif
+#if 'a' == 0x61
+  if ('a' != 0x61)
+abort ();
+#elif 'a' == 0x81
+  if ('a' != 0x81)
+abort ();
+#elif 'a' == -0x7F
+  if ('a' != -0x7F)
+abort ();
+#else
+  if ('a' == 0x61 || 'a' == 0x81 || 'a' == -0x7F)
+abort ();
+#endif
+  return 0;
+}

Jakub

Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

2021-10-07 Thread Jason Merrill via Gcc-patches


On 10/7/21 09:00, Jakub Jelinek wrote:

Hi!

I believe we need no changes to the compiler for P2316R2, seems we treat
character literals the same between preprocessor and C++ expressions,
here is a testcase that should verify it.

Tested on x86_64-linux, ok for trunk?

Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
but I bet it is very hard (at least for me) to actually test the latter.
I'd guess one needs all system headers to be in EBCDIC and the gcc sources too.
But looking around the source, I'm a little bit worried about the UTF-EBCDIC
case.
One is:
  #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
 && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
  #  define HOST_CHARSET HOST_CHARSET_ASCII
  #else
  # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
 && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
  #  define HOST_CHARSET HOST_CHARSET_EBCDIC
  # else
  #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
  # endif
  #endif
in include/safe-ctype.h, does that mean we only support EBCDIC if 
-funsigned-char
and otherwise fail to build gcc?  Because with -fsigned-char, '0' is -0x10
rather than 0xF0, 'A' is -0x3F rather than 0xC1 and 'a' is -0x7F rather than
0x81.
And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the 
libcpp/lex.c
static const cppchar_t utf8_signifier = 0xC0;
...
   if (*buffer->cur >= utf8_signifier)
 {
   if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
state, &s))
 return true;
 }
work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
multi-byte character, it is more complicated and seems _cpp_valid_utf8
assumes UTF-8 as the host charset.


Are there any supported platforms that use UTF-EBCDIC?


2021-10-07  Jakub Jelinek  

PR c++/102615
* g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.

--- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj   2021-10-07 
14:34:35.182132411 +0200
+++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C  2021-10-07 
14:34:02.902583774 +0200
@@ -0,0 +1,33 @@
+// PR c++/102615 - P2316R2 - Consistent character literal encoding
+// { dg-do compile }


Doesn't this need to run?  OK with that change.


+extern "C" void abort ();
+
+int
+main ()
+{
+#if ' ' == 0x20
+  if (' ' != 0x20)
+abort ();
+#elif ' ' == 0x40
+  if (' ' != 0x40)
+abort ();
+#else
+  if (' ' == 0x20 || ' ' == 0x40)
+abort ();
+#endif
+#if 'a' == 0x61
+  if ('a' != 0x61)
+abort ();
+#elif 'a' == 0x81
+  if ('a' != 0x81)
+abort ();
+#elif 'a' == -0x7F
+  if ('a' != -0x7F)
+abort ();
+#else
+  if ('a' == 0x61 || 'a' == 0x81 || 'a' == -0x7F)
+abort ();
+#endif
+  return 0;
+}

Jakub

Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

2021-10-07 Thread Jakub Jelinek via Gcc-patches

On Thu, Oct 07, 2021 at 09:12:15AM -0400, Jason Merrill wrote:
> > And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the 
> > libcpp/lex.c
> > static const cppchar_t utf8_signifier = 0xC0;
> > ...
> >if (*buffer->cur >= utf8_signifier)
> >  {
> >if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + 
> > !first,
> > state, &s))
> >  return true;
> >  }
> > work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> > multi-byte character, it is more complicated and seems _cpp_valid_utf8
> > assumes UTF-8 as the host charset.
> 
> Are there any supported platforms that use UTF-EBCDIC?

I have no idea.  From the libcpp/charset.c code, seems there is no built-in
conversion for UTF-EBCDIC, the only internally supported conversions are
  { "UTF-8/UTF-32LE", convert_utf8_utf32, (iconv_t)0 },
  { "UTF-8/UTF-32BE", convert_utf8_utf32, (iconv_t)1 },
  { "UTF-8/UTF-16LE", convert_utf8_utf16, (iconv_t)0 },
  { "UTF-8/UTF-16BE", convert_utf8_utf16, (iconv_t)1 },
  { "UTF-32LE/UTF-8", convert_utf32_utf8, (iconv_t)0 },
  { "UTF-32BE/UTF-8", convert_utf32_utf8, (iconv_t)1 },
  { "UTF-16LE/UTF-8", convert_utf16_utf8, (iconv_t)0 },
  { "UTF-16BE/UTF-8", convert_utf16_utf8, (iconv_t)1 },
and identity, so unless the C library iconv supports conversion to
UTF-EBCDIC, the only case that could be supported is when -finput-charset=
is also UTF-EBCDIC.  E.g. glibc iconv doesn't support that.
Never used z/VM nor OS/390 which I think are the only possible hosts that
could have UTF-EBCDIC.
CCing Andreas if he knows more...

> > --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj   2021-10-07 
> > 14:34:35.182132411 +0200
> > +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C  2021-10-07 
> > 14:34:02.902583774 +0200
> > @@ -0,0 +1,33 @@
> > +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> > +// { dg-do compile }
> 
> Doesn't this need to run?  OK with that change.

Thanks for catching that, fixed, retested and committed.

Jakub

Re: [PATCH] tree-optimization/102608 - avoid CSEing .DEFERRED_INIT

2021-10-07 Thread Jeff Law via Gcc-patches





On 10/7/2021 6:26 AM, Richard Biener via Gcc-patches wrote:

This makes VN not CSE .DEFERRED_INIT which confuses uninit
analysis which reports the wrong decl when facing copies
of not initialized data.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-07  Richard Biener  

PR tree-optimization/102608
* tree-ssa-sccvn.c (visit_stmt): Drop .DEFERRED_INIT to
varying.

Thanks.  mips64 and mips64el kernel builds are happy again.
jeff

Fix ipa-modref ICE

2021-10-07 Thread Jan Hubicka

Hi,
this patch fixes omitted case in contains_p which later trigger a sanity
check since merging is not symmetric.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

gcc/ChangeLog:

2021-10-07  Jan Hubicka  

PR ipa/102581
* ipa-modref-tree.h (modref_access_node::contains_p): Handle offsets
better.
(modref_access_node::try_merge_with): Add sanity check that there
are no redundant entries in the list.

gcc/testsuite/ChangeLog:

2021-10-07  Jan Hubicka  

* g++.dg/torture/pr102581.C: New test.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 6a9ed5ce54b..8e9b89b3e2c 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -110,8 +110,11 @@ struct GTY(()) modref_access_node
   if (!a.parm_offset_known)
 return false;
   /* Accesses are never below parm_offset, so look
- for smaller offset.  */
-  if (!known_le (parm_offset, a.parm_offset))
+ for smaller offset.
+ If access ranges are known still allow merging
+ when bit offsets comparsion passes.  */
+  if (!known_le (parm_offset, a.parm_offset)
+  && !range_info_useful_p ())
 return false;
   aoffset_adj = (a.parm_offset - parm_offset)
 << LOG2_BITS_PER_UNIT;
@@ -618,6 +621,7 @@ private:
found = true;
  if (!found && n->merge (*a, false))
found = restart = true;
+ gcc_checking_assert (found || !a->merge (*n, false));
  if (found)
{
  accesses->unordered_remove (i);
diff --git a/gcc/testsuite/g++.dg/torture/pr102581.C 
b/gcc/testsuite/g++.dg/torture/pr102581.C
new file mode 100644
index 000..7f172d088b8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr102581.C
@@ -0,0 +1,51 @@
+// { dg-do compile }
+/* { dg-additional-options "-fno-strict-aliasing" } */
+enum VkStructureType {
+  VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_BUFFER_DEVICE_ADDRESS_FEATURES_EXT,
+  VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2_KHR
+} typedef VkPhysicalDeviceSparseProperties;
+struct VkPhysicalDeviceProperties {
+  int apiVersion;
+  VkPhysicalDeviceSparseProperties sparseProperties;
+};
+typedef struct {
+  VkStructureType sType;
+  int *pPhysicalDevices;
+} VkPhysicalDeviceFeatures2;
+typedef struct VkPhysicalDeviceProperties2 {
+  VkStructureType sType;
+  void *pNext;
+} VkPhysicalDeviceMemoryProperties2;
+struct VulkanVersion {
+  int major;
+  int minor;
+  int patch;
+};
+int make_vulkan_version_version;
+VulkanVersion make_vulkan_version() {
+  return {make_vulkan_version_version, make_vulkan_version_version,
+  make_vulkan_version_version};
+}
+struct AppGpu {
+  int &inst;
+  int id;
+  int *phys_device = nullptr;
+  VulkanVersion api_version{};
+  VkPhysicalDeviceProperties props{};
+  VkPhysicalDeviceProperties2 props2{};
+  int memory_props{};
+  VkPhysicalDeviceMemoryProperties2 memory_props2{};
+  int features{};
+  VkPhysicalDeviceFeatures2 features2{};
+  int *dev = nullptr;
+  int enabled_features{};
+  int AppGpu_phys_device;
+  int AppGpu_inst;
+  AppGpu() : inst(AppGpu_inst), id() {
+api_version = make_vulkan_version();
+props2.sType = memory_props2.sType = features2.sType =
+VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2_KHR;
+  }
+};
+int
+main() { AppGpu(); return 0; }

[PATCH] c++: NTTP with array/function type after substitution [PR61355]

2021-10-07 Thread Patrick Palka via Gcc-patches

We're implementing [temp.param]/10 at parse time but not also at
substitution time.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/61355

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Perform array/function to
pointer conversion on the substituted type of an NTTP.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.pt/nontype5.C:
* g++.dg/template/param6.C: New test.
---
 gcc/cp/pt.c  |  4 +++
 gcc/testsuite/g++.dg/template/param6.C   | 32 
 gcc/testsuite/g++.old-deja/g++.pt/nontype5.C |  2 +-
 3 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/param6.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5af3a6472f8..170edbd6f1e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8528,6 +8528,10 @@ convert_template_argument (tree parm,
   else
t = tsubst (t, args, complain, in_decl);
 
+  /* Perform array-to-pointer and function-to-pointer conversion
+as per [temp.param]/10.  */
+  t = type_decays_to (t);
+
   if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
 
diff --git a/gcc/testsuite/g++.dg/template/param6.C 
b/gcc/testsuite/g++.dg/template/param6.C
new file mode 100644
index 000..8306e753d70
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/param6.C
@@ -0,0 +1,32 @@
+// PR c++/61355
+// Verify we perform array-to-pointer and function-to-pointer conversion
+// on the substituted/deduced type of an NTTP.
+
+int f();
+int p[5];
+
+namespace cpp98 {
+  template struct X;
+  typedef X ty1;
+  typedef X ty2;
+}
+
+namespace cpp11 {
+#if __cpp_variadic_templates
+  template struct X;
+  using ty1 = X;
+  using ty2 = X;
+#endif
+}
+
+namespace cpp17 {
+#if __cpp_nontype_template_parameter_auto
+  template struct X;
+  using ty1 = X;
+  using ty2 = X;
+
+  template struct Y;
+  using ty3 = Y;
+  using ty4 = Y;
+#endif
+}
diff --git a/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C 
b/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
index 2678cf78a7d..e24dca43622 100644
--- a/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
+++ b/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
@@ -19,5 +19,5 @@ static int g() { return f(); }
 int f() { return 0; }
 
 int main() {
-return B::g();  // { dg-error "" } could not convert arg
+return B::g();
 }
-- 
2.33.0.664.g0785eb7698

Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

2021-10-07 Thread Lewis Hyatt via Gcc-patches

On Thu, Oct 7, 2021 at 9:01 AM Jakub Jelinek via Gcc-patches
 wrote:
> And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the 
> libcpp/lex.c
> static const cppchar_t utf8_signifier = 0xC0;
> ...
>   if (*buffer->cur >= utf8_signifier)
> {
>   if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + 
> !first,
>state, &s))
> return true;
> }
> work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> multi-byte character, it is more complicated and seems _cpp_valid_utf8
> assumes UTF-8 as the host charset.

FWIW, here I was following Joseph's guidance from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c21 ("You can
ignore anything claiming to handle UTF-EBCDIC.")

-Lewis

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-07 Thread Jeff Law via Gcc-patches





On 10/7/2021 2:15 AM, Aldy Hernandez wrote:

[Andrew, ranger comment embedded below.]

On 10/7/21 1:06 AM, Jeff Law wrote:



On 10/6/2021 7:47 AM, Aldy Hernandez via Gcc-patches wrote:

The pending patch I have from Richi fixes this.  Even so, it's the
uninit code that's confused.

Sigh...every single change to the threading code shines the light on
some warning bug.

If you take the calls.ii file from the aarch64 bootstrap and break on
the warning, you can see that the uninitalized use is for
const_upper_3934 here:

   [local count: 315357954]:
   # const_upper_3934 = PHI 
   if (_881 != 0)
 goto ; [50.00%]
   else
 goto ; [50.00%]

    [local count: 157678977]:
   if (const_upper_3934 > _6699)
 goto ; [89.00%]
   else
 goto ; [11.00%]

    [local count: 17344687]:

    [local count: 157678977]:
   goto ; [100.00%]

    [local count: 140334290]:
   stack_usage_map.481_3930 = stack_usage_map;
   _6441 = const_upper_3934 - _6699;


PROBLEMATIC READ HERE

   _4819 = stack_usage_map.481_3930 + _6699;
   __builtin_memset (_4819, 1, _6441);
   goto ; [11.00%]

const_upper_3934 could be undefined if it comes from BB101
(const_upper_3937(D)), but it only gets read for _881 != 0, so it
shouldn't warn.

I suggest -Wmaybe-uninitialized be turned off for that file until the
warning is fixed.

And yes, the proposed patch will also cure this, but the underlying
problem in the warning is still there.
You haven't shown enough context for me to agree or disagree. Are 
there other preds to bb105?   I hate these slimmed down dumps.  I 
work better with the full pred/succ lists. 
-fdump-tree--blocks-details :-)


It appears to me that for _881 != 0 we certainly flow into the read 
of _const_upper_3934 in bb103 and bb105.  Why do you think that's safe?


My bad, there's some missing context.

The only way to get to BB101->BB102 is through:

  
  if (_6711 != 0)
    goto ; [5.50%]
  else
    goto ; [94.50%]

And there's an implicit relation between _6711 and _811:


...
  if (_6711 != 0)
    goto ; [5.50%]
  else
    goto ; [94.50%]

   [local count: 17344687]:
  goto ; [100.00%]

   [local count: 298013267]:

   [local count: 315357954]:
  # _881 = PHI <1(87), 0(287)>

That is, _6711 == !_881.

[Andrew, it'd be neat if we could teach ranger the relationship 
between _6711 and _811 above.  And also, that _881 is [0,1]. Perhaps 
with the relation oracle, the uninit code could notice that a _6711 
guard is also a !_811 guard.]
Ah, yes.  This is one of the most common cases where we fail to detect 
jump threads and/or fail to prune a path in the uninit warnings.  
There's multiple instances of this BZ, though I don't have the #s handy 
and the testcases are much smaller & easier to understand.  I'm pretty 
sure they're linked to the meta bug for threading or uninit warnings.


However, in this instance we may have a way out.   When we record the 
constant boolean equivalence for _881 talking the paths from 87 and 287 
respectively, we could walk up the control dependency graph and derive 
other values such as _6711 in this example.  I'm pretty sure we don't 
have the CDG built for DOM, but I do think we have it in the predicate 
analysis engine, so we could at least prune the warning.



Jeff

[PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-07 Thread Chung-Lin Tang


Hi all,
this patch add support for "strictly-structured blocks" introduced in OpenMP 
5.1,
basically allowing BLOCK constructs to serve as the body for directives:

!$omp target
block
  ...
end block
[!$omp end target]  !! end directive is optional

!$omp parallel
block
  ...
end block
...
!$omp end parallel  !! error, considered as not match to above parallel 
directive

The parsing loop in parse_omp_structured_block() has been modified to allow
a BLOCK construct after the first statement has been detected to be ST_BLOCK.
This is done by a hard modification of the state into (the new) 
COMP_OMP_STRICTLY_STRUCTURED_BLOCK
after the statement is known (I'm not sure if there's a way to 'peek' the next
statement/token in the Fortran FE, open to suggestions on how to better write 
this)

Tested with no regressions on trunk, is this okay to commit?

Thanks,
Chung-Lin

2021-10-07  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Adjust declaration, add
'bool strictly_structured_block' default true parameter, add handling
for strictly-structured block case, adjust recursive calls to
parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index b3c65b7175b..ff66d1f9475 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8445,6 +8445,7 @@ gfc_match_end (gfc_statement *st)
   break;
 
 case COMP_BLOCK:
+case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
   *st = ST_END_BLOCK;
   target = " block";
   eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 7d765a0866d..d78bf9b8fa5 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5451,8 +5451,9 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
-parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
+static gfc_statement
+parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only,
+   bool strictly_structured_block = true)
 {
   gfc_statement st, omp_end_st;
   gfc_code *cp, *np;
@@ -5538,6 +5539,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
   gcc_unreachable ();
 }
 
+  bool block_construct = false;
+  gfc_namespace* my_ns = NULL;
+  gfc_namespace* my_parent = NULL;
+
+  st = next_statement ();
+
+  if (strictly_structured_block && st == ST_BLOCK)
+{
+  /* Adjust state to a strictly-structured block, now that we found that
+the body starts with a BLOCK construct.  */
+  s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK;
+
+  block_construct = true;
+  gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C");
+
+  my_ns = gfc_build_block_ns (gfc_current_ns);
+  gfc_current_ns = my_ns;
+  my_parent = my_ns->parent;
+
+  new_st.op = EXEC_BLOCK;
+  new_st.ext.block.ns = my_ns;
+  new_st.ext.block.assoc = NULL;
+  accept_statement (ST_BLOCK);
+  st = parse_spec (ST_NONE);
+}
+
   do
 {
   if (workshare_stmts_only)
@@ -5554,7 +5581,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
 restrictions apply recursively.  */
  bool cycle = true;
 
- st = next_statement ();
  for (;;)
{
  switch (st)
@@ -5576,17 +5602,20 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
  parse_forall_block ();
  break;
 
+   case ST_OMP_PARALLEL_SECTIONS:
+ st = parse_omp_structured_block (st, false, false);
+ continue;
+
case ST_OMP_PARALLEL:
case ST_OMP_PARALLEL_MASKED:
case ST_OMP_PARALLEL_MASTER:
-   case ST_OMP_PARALLEL_SECTIONS:
- parse_omp_structured_block (st, false);
- break;
+ st = parse_omp_structured_block (st, false);
+ continue;
 
case ST_OMP_PARALLEL_WORKSHARE:
case ST_OMP_CRITICAL:
- parse_omp_structured_block (st, true);
- break;
+ st = parse_omp_structured_block (st, true);
+ continue;
 
case ST_OMP_PARALLEL_DO:
case ST_OMP_PARALLEL_DO_SIMD:
@@ -5609,7 +5638,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
}
}
   else
-   st = parse_executab

[PATCH] tree-object-size: Drop unused pdecl and poff arguments

2021-10-07 Thread Siddhesh Poyarekar

The pdecl and poff arguments were added to allow their use in
compute_objsize in builtins.c.  That use has been gone for a while now
since compute_objsize does its own size estimation, so drop these
arguments to simplify code.

gcc/ChangeLog:

* tree-object-size.c (addr_object_size,
compute_builtin_object_size): Drop PDECL and POFF arguments.
(addr_object_size): Adjust calls.
* tree-object-size.h (compute_builtin_object_size): Drop PDECL
and POFF arguments.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/tree-object-size.c | 42 +++---
 gcc/tree-object-size.h |  3 +--
 2 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 744748d4d9b..6a4dc724f34 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -54,8 +54,7 @@ static const unsigned HOST_WIDE_INT unknown[4] = {
 
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
- const_tree, int, unsigned HOST_WIDE_INT *,
- tree * = NULL, tree * = NULL);
+ const_tree, int, unsigned HOST_WIDE_INT *);
 static unsigned HOST_WIDE_INT alloc_object_size (const gcall *, int);
 static tree pass_through_call (const gcall *);
 static void collect_object_sizes_for (struct object_size_info *, tree);
@@ -209,17 +208,10 @@ decl_init_size (tree decl, bool min)
 
 static bool
 addr_object_size (struct object_size_info *osi, const_tree ptr,
- int object_size_type, unsigned HOST_WIDE_INT *psize,
- tree *pdecl /* = NULL */, tree *poff /* = NULL */)
+ int object_size_type, unsigned HOST_WIDE_INT *psize)
 {
   tree pt_var, pt_var_size = NULL_TREE, var_size, bytes;
 
-  tree dummy_decl, dummy_off = size_zero_node;
-  if (!pdecl)
-pdecl = &dummy_decl;
-  if (!poff)
-poff = &dummy_off;
-
   gcc_assert (TREE_CODE (ptr) == ADDR_EXPR);
 
   /* Set to unknown and overwrite just before returning if the size
@@ -241,7 +233,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
{
  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
-  object_size_type & ~1, &sz, pdecl, poff);
+  object_size_type & ~1, &sz);
}
   else
{
@@ -259,11 +251,6 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  offset_int mem_offset;
  if (mem_ref_offset (pt_var).is_constant (&mem_offset))
{
- if (*poff)
-   *poff = wide_int_to_tree (ptrdiff_type_node,
- mem_offset + wi::to_offset (*poff));
- else
-   *poff = wide_int_to_tree (ptrdiff_type_node, mem_offset);
  offset_int dsz = wi::sub (sz, mem_offset);
  if (wi::neg_p (dsz))
sz = 0;
@@ -281,7 +268,6 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 }
   else if (DECL_P (pt_var))
 {
-  *pdecl = pt_var;
   pt_var_size = decl_init_size (pt_var, object_size_type & 2);
   if (!pt_var_size)
return false;
@@ -418,7 +404,6 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
   if (bytes != error_mark_node)
{
- *poff = bytes;
  if (TREE_CODE (bytes) == INTEGER_CST
  && tree_int_cst_lt (var_size, bytes))
bytes = size_zero_node;
@@ -438,7 +423,6 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
bytes2 = size_zero_node;
  else
bytes2 = size_binop (MINUS_EXPR, pt_var_size, bytes2);
- *poff = size_binop (PLUS_EXPR, *poff, bytes2);
  bytes = size_binop (MIN_EXPR, bytes, bytes2);
}
}
@@ -446,11 +430,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
   else if (!pt_var_size)
 return false;
   else
-{
-  bytes = pt_var_size;
-  if (!*poff)
-   *poff = size_zero_node;
-}
+bytes = pt_var_size;
 
   if (tree_fits_uhwi_p (bytes))
 {
@@ -548,17 +528,10 @@ pass_through_call (const gcall *call)
 
 bool
 compute_builtin_object_size (tree ptr, int object_size_type,
-unsigned HOST_WIDE_INT *psize,
-tree *pdecl /* = NULL */, tree *poff /* = NULL */)
+unsigned HOST_WIDE_INT *psize)
 {
   gcc_assert (object_size_type >= 0 && object_size_type <= 3);
 
-  tree dummy_decl, dummy_off = size_zero_node;
-  if (!pdecl)
-pdecl = &dummy_decl;
-  if (!poff)
-poff = &dummy_off;
-
   /* Set to unknown and overwrite just before returning if the size
  could be determined.  */

Re: [PATCH] tree-object-size: Drop unused pdecl and poff arguments

2021-10-07 Thread Jakub Jelinek via Gcc-patches

On Thu, Oct 07, 2021 at 07:45:04PM +0530, Siddhesh Poyarekar wrote:
> The pdecl and poff arguments were added to allow their use in
> compute_objsize in builtins.c.  That use has been gone for a while now
> since compute_objsize does its own size estimation, so drop these

Since r11-1517-g5acc654e380797bbf402bc3a0a67f9a6ac4c2a83 indeed.

> arguments to simplify code.
> 
> gcc/ChangeLog:
> 
>   * tree-object-size.c (addr_object_size,
>   compute_builtin_object_size): Drop PDECL and POFF arguments.
>   (addr_object_size): Adjust calls.
>   * tree-object-size.h (compute_builtin_object_size): Drop PDECL
>   and POFF arguments.

Ok, thanks.

> 
> Signed-off-by: Siddhesh Poyarekar 

Jakub

Re: [PATCH 2/4] ipa-cp: Propagation boost for recursion generated values

2021-10-07 Thread Martin Jambor

Hi,

On Wed, Oct 06 2021, Jan Hubicka wrote:
>> Recursive call graph edges, even when they are hot and important for
>> the compiled program, can never have frequency bigger than one, even
>> when the actual time savings in the next recursion call are not
>> realized just once but depend on the depth of recursion.  The current
>> IPA-CP effect propagation code did not take that into account and just
>> used the frequency, thus severely underestimating the effect.
>> 
>> This patch artificially boosts values taking part in such calls.  If a
>> value feeds into itself through a recursive call, the frequency of the
>> edge is multiplied by a parameter with default value of 6, basically
>> assuming that the recursion will take place 6 times.  This value can
>> of course be subject to change.
>> 
>> Moreover, values which do not feed into themselves but which were
>> generated for a self-recursive call with an arithmetic
>> pass-function (aka the 548.exchange "hack" which however is generally
>> applicable for recursive functions which count the recursion depth in
>> a parameter) have the edge frequency multiplied as many times as there
>> are generated values in the chain.  In essence, we will assume they
>> are all useful.
>> 
>> This patch partially fixes the current situation when we fail to
>> optimize 548.exchange with PGO.  In the benchmark one recursive edge
>> count overwhelmingly dominates all other counts in the program and so
>> we fail to perform the first cloning (for the nonrecursive entry call)
>> because it looks totally insignificant.
>> 
>> gcc/ChangeLog:
>> 
>> 2021-07-16  Martin Jambor  
>> 
>>  * params.opt (ipa-cp-recursive-freq-factor): New.
>>  * ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
>>  scc_no, self_recursion_generated_level, same_scc and
>>  self_recursion_generated_p.
>>  (ipcp_lattice::add_value): Replaced parameter unlimited with
>>  same_lat_gen_level, usit it determine limit of values and store it to
>>  the value.
>>  (ipcp_lattice::print): Dump the new fileds.
>>  (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
>>  parameter and store it to the new value.
>>  (self_recursively_generated_p): Removed.
>>  (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
>>  instead of self_recursively_generated_p, store self generation level
>>  to such values.
>>  (value_topo_info::add_val): Set scc_no.
>>  (value_topo_info::propagate_effects): Multiply frequencies of
>>  recursively feeding values and self generated values by appropriate
>>  new factors.
>
> If you boost every self fed value by factor of 6, I wonder how quickly
> we run into exponential explosion of the cost (since the frequency
> should be close to 1 and 6^9=10077696

The factor of six is applied once for an entire SCC, so we'd reach this
huge number only if there was a chain of nine different recursive
functions - with this patch we assume each one will recurse six times,
so the result is indeed a huge execution count estimate.

The constant is not used for the "self generated" values like those in
exchange, those are handled by the else branch below.  For those we
expect the recursion happens 8 times, because that is how many values we
generate, but the boost is different depending on the recursion depth.

>
> I think it would be more robust to simply assume that the job will
>distribute evenly across the clones.  How hard is to implement that?

This is not an update of counters.  The code tries to estimate execution
time improvement that is will be possible in callees if we clone for
this particular value and so is based on call graph edge frequencies (so
that if in a callee we can save 5 units of time and the frequency is 5,
we estimate we will save 25).  The code has the advantage that it is
universal for both situations when profile feedback is and is not
available.

Martin

Re: [PATCH] c++: odr-use argument to a function NTTP [PR53164]

2021-10-07 Thread Patrick Palka via Gcc-patches

On Wed, 6 Oct 2021, Jason Merrill wrote:

> On 10/6/21 15:52, Patrick Palka wrote:
> > On Wed, 6 Oct 2021, Patrick Palka wrote:
> > 
> > > On Tue, 5 Oct 2021, Jason Merrill wrote:
> > > 
> > > > On 10/5/21 15:17, Patrick Palka wrote:
> > > > > On Mon, 4 Oct 2021, Patrick Palka wrote:
> > > > > 
> > > > > > When passing a function template as the argument to a function NTTP
> > > > > > inside a template, we resolve it to the right specialization ahead
> > > > > > of
> > > > > > time via resolve_address_of_overloaded_function, though the call to
> > > > > > mark_used within defers odr-using it until instantiation time (as
> > > > > > usual).
> > > > > > But at instantiation time we end up never calling mark_used on the
> > > > > > specialization.
> > > > > > 
> > > > > > This patch fixes this by adding a call to mark_used in
> > > > > > convert_nontype_argument_function.
> > > > > > 
> > > > > > PR c++/53164
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * pt.c (convert_nontype_argument_function): Call mark_used.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > > * g++.dg/template/non-dependent16.C: New test.
> > > > > > ---
> > > > > >gcc/cp/pt.c |  3 +++
> > > > > >gcc/testsuite/g++.dg/template/non-dependent16.C | 16
> > > > > > 
> > > > > >2 files changed, 19 insertions(+)
> > > > > >create mode 100644
> > > > > > gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > > > > > index f950f4a21b7..5e819c9598c 100644
> > > > > > --- a/gcc/cp/pt.c
> > > > > > +++ b/gcc/cp/pt.c
> > > > > > @@ -6668,6 +6668,9 @@ convert_nontype_argument_function (tree type,
> > > > > > tree
> > > > > > expr,
> > > > > >  return NULL_TREE;
> > > > > >}
> > > > > >+  if (!mark_used (fn_no_ptr, complain) && !(complain &
> > > > > > tf_error))
> > > > > > +return NULL_TREE;
> > > > > > +
> > > > > >  linkage = decl_linkage (fn_no_ptr);
> > > > > >  if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage !=
> > > > > > lk_external)
> > > > > >{
> > > > > > diff --git a/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > > > b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > > > new file mode 100644
> > > > > > index 000..b7dca8f6752
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > > > @@ -0,0 +1,16 @@
> > > > > > +// PR c++/53164
> > > > > > +
> > > > > > +template
> > > > > > +void f(T) {
> > > > > > +  T::fail; // { dg-error "not a member" }
> > > > > > +}
> > > > > > +
> > > > > > +template
> > > > > > +struct A { };
> > > > > > +
> > > > > > +template
> > > > > > +void g() {
> > > > > > +  A a;
> > > > > > +}
> > > > > 
> > > > > I should mention that the original testcase in the PR was slightly
> > > > > different than this one in that it also performed a call to the NTTP,
> > > > > e.g.
> > > > > 
> > > > > template
> > > > > struct A {
> > > > >   static void h() {
> > > > > p(0);
> > > > >   }
> > > > > };
> > > > > 
> > > > > template
> > > > > void g() {
> > > > >   A::h();
> > > > > }
> > > > > 
> > > > > templated void g<0>();
> > > > > 
> > > > > and not even the call was enough to odr-use f, apparently because the
> > > > > CALL_EXPR case of tsubst_expr calls mark_used on the callee only when
> > > > > it's a FUNCTION_DECL, but in this case after substitution it's an
> > > > > ADDR_EXPR of a FUNCTION_DECL.  Fixing this by looking through the
> > > > > ADDR_EXPR
> > > > > worked, but IIUC the call isn't necessary for f to be odr-used, simply
> > > > > using f as a template argument should be sufficient, so it seems the
> > > > > above is better fix.
> > > > 
> > > > I agree that pedantically the use happens when substituting into the use
> > > > of
> > > > A, but convert_nontype_argument_function seems like a weird place to
> > > > implement that; it's only called again during instantiation of A,
> > > > when we
> > > > instantiate the injected-class-name.  If A isn't instantiated, e.g.
> > > > if 'a'
> > > > is a pointer to A, we again don't instantiate f.
> > > 
> > > I see, makes sense..  I'm not sure where else we can mark the use, then.
> > > Since we resolve the OVERLOAD f to the FUNCTION_DECL f ahead of
> > > time (during which mark_used doesn't actually instantiate f because
> > > we're inside a template), at instantiation time the type A is already
> > > non-dependent so tsubst_aggr_type avoids doing the work that would end
> > > up calling convert_nontype_argument_function.
> > > 
> > > > 
> > > > I see that clang doesn't reject your testcase, either, but MSVC and icc
> > > > do
> > > > (even with 'a' a pointer): https://godbolt.org/z/MGE6TcMch
> > > 
> > > FWIW although Clang doesn't reject 'A a;', it does reject
> > > 'using type = A;' weirdly enough: https://godbolt.org/

Re: [PATCH 2/4] ipa-cp: Propagation boost for recursion generated values

2021-10-07 Thread Jan Hubicka

> Hi,
> >
> > If you boost every self fed value by factor of 6, I wonder how quickly
> > we run into exponential explosion of the cost (since the frequency
> > should be close to 1 and 6^9=10077696
> 
> The factor of six is applied once for an entire SCC, so we'd reach this
> huge number only if there was a chain of nine different recursive
> functions - with this patch we assume each one will recurse six times,
> so the result is indeed a huge execution count estimate.
> 
> The constant is not used for the "self generated" values like those in
> exchange, those are handled by the else branch below.  For those we
> expect the recursion happens 8 times, because that is how many values we
> generate, but the boost is different depending on the recursion depth.
> 
> >
> > I think it would be more robust to simply assume that the job will
> >distribute evenly across the clones.  How hard is to implement that?
> 
> This is not an update of counters.  The code tries to estimate execution
> time improvement that is will be possible in callees if we clone for
> this particular value and so is based on call graph edge frequencies (so
> that if in a callee we can save 5 units of time and the frequency is 5,
> we estimate we will save 25).  The code has the advantage that it is
> universal for both situations when profile feedback is and is not
> available.

I guess the patch is OK then.

Thanks,
Honza
> 
> Martin
>

Re: [PATCH, Fortran] Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-07 Thread Tobias Burnus


Hi Sandra,

On 06.10.21 23:37, Sandra Loosemore wrote:

This patch is for PR fortran/54753, to add a diagnostic for violations
of this constraint in the 2018 standard:

  C839 If an assumed-size or nonallocatable nonpointer assumed-rank
  array is an actual argument that corresponds to a dummy argument that
  is an INTENT (OUT) assumed-rank array, it shall not be polymorphic,
  finalizable, of a type with an allocatable ultimate component, or of a
  type for which default initialization is specified.

(It now uses an interface instead of an actual subroutine definition,
since Tobias recently committed a patch to fix interfaces in order to
unblock my work on this one.)  That bug is independent of enforcing
this constraint so I'm planning to open a new issue for it with its
own test case, if there isn't already one in Bugzilla.

I concur that that should be in a separate PR.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
...
+  gfc_array_spec *fas, *aas;
+  bool pointer_arg, allocatable_arg;;

Remove either ";" or ";".

@@ -3329,13 +3331,48 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, 
gfc_formal_arglist *formal,
+  if (a->expr->expr_type != EXPR_VARIABLE)
+ {
+   aas = NULL;
+   pointer_arg = false;
+   allocatable_arg = false;


This code is not generic but rather specific.
But it is fine as used in the code.

The question is how to prevent "?" or wrong code for future
code readers and writers.

The issue is that:
* "alloc_array(:)" is not allocatable but allocatable_arg
  would be true.
* For var(5)%comp%comp2 - the aas and
  allocatable_arg/pointer_arg is based on 'var' and not on
  'comp2'.

As those vars are only used with expr->ref == NULL
(or expr->ref == whole-array ref) – and only with
assumed-rank or assumed-size dummys as actual argument,
it works fine as the not-handled code cannot occur.

 * * *

Solution: I think the simplest would be to add a comment.

Alternatively:

* For 'aas', one way might be to move to 'enum array_type'
  as that makes it clearer that 'aas' is for a special purpose,
  only. I mean something like:
enum array_type a_array_type = AS_UNKNOWN;
if (a->expr->expr_type == EXPR_VARIABLE && a->expr->symtree->n.sym
&& a->expr->symtree->n.sym->as && )
  a_array_type = a->expr->symtree->n.sym->as->type;
else if (... BT_CLASS ...)
  ...

For the attribute, either of the following would work:
* symbol_attribute arg_attr = gfc_expr_attr (e);
  This uses the big hammer when a small one would be sufficient,
  but it works in general.
or
* bool nonpointer_nonalloc_arg = ...
  This uses a more specific name. The attributes might not be
  correct, but the chance that it gets misused are reduced.

I think all variants work – and I am not sure what's the best.
There might be also other solutions, which are better/equally
good.


+  if (fas
+   && (fas->type == AS_ASSUMED_SHAPE
+   || fas->type == AS_DEFERRED
+   || (fas->type == AS_ASSUMED_RANK && f->sym->attr.pointer))
+   && aas
+   && aas->type == AS_ASSUMED_SIZE
&& (a->expr->ref == NULL
|| (a->expr->ref->type == REF_ARRAY
&& a->expr->ref->u.ar.type == AR_FULL)))

That's old code – but can you adapt it to handle BT_CLASS? I think
only 'f->sym->attr.pointer' causes the issue as it does not check for
CLASS_DATA()->attr.class_pointer – and the rest is fine, also because
of now using 'aas->type' which already encapsulates the classness.

Testcase:
--
type t
end type t
interface
  subroutine fc2 (x)
import :: t
class(t), pointer, intent(in) :: x(..)
  end
end interface
contains
  subroutine sub1(y)
type(t), target :: y(*)
call fc2 (y)  ! silently accepted
  end
end
--


+  subroutine test_assumed_size_polymorphic (a1, a2)
+class(t1) :: a1(*), a2(*)
+call poly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+call upoly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+  end subroutine

Can you also add a call like involving something like:
a1(5), a2(4:7), a1(:10) or a2(:-5) ? (Here, '(:-5)' is a
rank-1, size-zero array.)

Calls with those are valid as those pass the array size alongside.
From the patch it looks as if they should just work, but it is
still good to test this.


+  subroutine test_assumed_size_unlimited_polymorphic (a1, a2)
+class(*) :: a1(*), a2(*)
+call upoly (a1, a2)  ! { dg-error "(A|a)ssumed.rank" }
+  end subroutine

Likewise.

Otherwise, it looks good to me.

Thanks,

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-10-07 Thread Martin Liška


Hello.

The patch is approved, are you planning committing the changes?

Thanks,
Martin

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-10-07 Thread H.J. Lu via Gcc-patches

On Thu, Oct 7, 2021 at 8:35 AM Martin Liška  wrote:
>
> Hello.
>
> The patch is approved, are you planning committing the changes?
>
> Thanks,
> Martin

Hongtao is on holiday.  He will be back later today.

-- 
H.J.

Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-07 Thread Tobias Burnus


Hi Chung-Lin,

On 07.10.21 15:59, Chung-Lin Tang wrote:

this patch add support for "strictly-structured blocks" introduced in
OpenMP 5.1,
basically allowing BLOCK constructs to serve as the body for directives:

!$omp target
block
  ...
end block
[!$omp end target]  !! end directive is optional


Pre-remark: That OpenMP feature causes ambiguities.
I have filled an OpenMP spec issue to discuss this, Issue 3154.

Namely the following is unclear:

!$omp parallel
 !$omp parallel
  block
x=  x+  1
  end block
 !$omp end parallel

Does the 'end parallel' end the inner strictly structured block
or the outer loosely structured block?

In principle, a compiler could defer this until later during
parsing, but this requires a tremendous state tracking and it
is surely not simple to do in gfortran (and probably all other
compilers). — Thus, I think the spec will change and probably
in the way which this patch has implemented.

NOTE: It takes the kind of directive into account, i.e.
'omp end target' vs. 'omp end parallel' aren't ambiguous.



The parsing loop in parse_omp_structured_block() has been modified to
allow
a BLOCK construct after the first statement has been detected to be
ST_BLOCK.
This is done by a hard modification of the state into (the new)
COMP_OMP_STRICTLY_STRUCTURED_BLOCK
after the statement is known (I'm not sure if there's a way to 'peek'
the next
statement/token in the Fortran FE, open to suggestions on how to
better write this)

Tested with no regressions on trunk, is this okay to commit?


LGTM  – unless Jakub has further comments.

However, I have two requests:

First, can you include an update of the implementation status
in libgomp/libgomp.texi (+ fix the ...ppp... typo):

"@item Suppport of strictly structured blocks in Fortran @tab N @tab"


Secondly, can you extend the testcase a bit to include
nesting a BLOCK inside the other BLOCK – and nesting
the directive with a strictly structured block inside
another (different) directive to ensure that the
'omp end ...' is correctly matched.

I mean something like:

integer :: x
!$omp target map(i)
 !$omp parallel
  block
block
 x = x + 1
end block
  end block
!$omp end target

!$omp target map(i)
 !$omp parallel
  block
block
 x = x + 1
end block
  end block
  !$omp end parallel
!$omp end target
end

Thanks,

Tobias


2021-10-07 Chung-Lin Tang  

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Adjust declaration, add
'bool strictly_structured_block' default true parameter, add handling
for strictly-structured block case, adjust recursive calls to
parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

[PATCH v2] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Kito Cheng

__builtin___clear_cache was able to accept constant address for the
argument, but it seems no longer accept recently, and it even not
accept constant address which is hold in variable when optimization is
enable:

```
void foo3(){
  void *yy = (void*)0x1000;
  __builtin___clear_cache(yy, yy);
}
```

So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did per
Jim Wilson's suggestion.

```
static cselib_val *
cselib_lookup_mem (rtx x, int create)
{
  ...
  addr_mode = GET_MODE (XEXP (x, 0));
  if (addr_mode == VOIDmode)
addr_mode = Pmode;
```

Changes v1 -> v2:
- Check is CONST_INT intead of cehck mode, no new testcase, since
  constant value with other type like CONST_DOUBLE will catched by
  front-end.
e.g.
Code:
```c
void foo(){
  __builtin___clear_cache(1.11, 0);
}
```
Error message:
```
clearcache-double.c: In function 'foo':
clearcache-double.c:2:27: error: incompatible type for argument 1 of 
'__builtin___clear_cache'
2 |   __builtin___clear_cache(1.11, 0);
  |   ^~~~
  |   |
  |   double
clearcache-double.c:2:27: note: expected 'void *' but argument is of type 
'double'
```

gcc/ChangeLog:

PR target/100316
* builtins.c (maybe_emit_call_builtin___clear_cache): Allow
CONST_INT for BEGIN and END.

gcc/testsuite/ChangeLog:

PR target/100316
* gcc.c-torture/compile/pr100316.c: New.
---
 gcc/builtins.c |  6 --
 gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3e57eb03af0..8c3ad302468 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5163,8 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, rtx 
end)
 void
 maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
 {
-  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
-  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
+  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode
+   && !CONST_INT_P (begin))
+  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode
+ && !CONST_INT_P (end)))
 {
   error ("both arguments to %<__builtin___clear_cache%> must be pointers");
   return;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
new file mode 100644
index 000..38eca86f49f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
@@ -0,0 +1,18 @@
+void foo(){
+  __builtin___clear_cache(0, 0);
+}
+
+void foo1(){
+  __builtin___clear_cache((void*)0, (void*)0);
+}
+
+void foo2(){
+  void *yy = 0;
+  __builtin___clear_cache(yy, yy);
+}
+
+void foo3(){
+  void *yy = (void*)0x1000;
+  __builtin___clear_cache(yy, yy);
+}
+
-- 
2.33.0

Re: [PATCH] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Kito Cheng via Gcc-patches

Hi Andrew:

> Seems like this error should be an gcc_assert at this point.  The
> error message should have come from expand_builtin___clear_cache
> already.

Actually it will emit errors here, so put gcc_assert will cause ICE :p

> And the check for VOIDmode should really be a check for CONST_INT.

Yeah, that makes sense, thanks!

Re: [PATCH] c++: odr-use argument to a function NTTP [PR53164]

2021-10-07 Thread Jason Merrill via Gcc-patches


On 10/7/21 11:17, Patrick Palka wrote:

On Wed, 6 Oct 2021, Jason Merrill wrote:


On 10/6/21 15:52, Patrick Palka wrote:

On Wed, 6 Oct 2021, Patrick Palka wrote:


On Tue, 5 Oct 2021, Jason Merrill wrote:


On 10/5/21 15:17, Patrick Palka wrote:

On Mon, 4 Oct 2021, Patrick Palka wrote:


When passing a function template as the argument to a function NTTP
inside a template, we resolve it to the right specialization ahead
of
time via resolve_address_of_overloaded_function, though the call to
mark_used within defers odr-using it until instantiation time (as
usual).
But at instantiation time we end up never calling mark_used on the
specialization.

This patch fixes this by adding a call to mark_used in
convert_nontype_argument_function.

PR c++/53164

gcc/cp/ChangeLog:

* pt.c (convert_nontype_argument_function): Call mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent16.C: New test.
---
gcc/cp/pt.c |  3 +++
gcc/testsuite/g++.dg/template/non-dependent16.C | 16

2 files changed, 19 insertions(+)
create mode 100644
gcc/testsuite/g++.dg/template/non-dependent16.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f950f4a21b7..5e819c9598c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6668,6 +6668,9 @@ convert_nontype_argument_function (tree type,
tree
expr,
  return NULL_TREE;
}
+  if (!mark_used (fn_no_ptr, complain) && !(complain &
tf_error))
+return NULL_TREE;
+
  linkage = decl_linkage (fn_no_ptr);
  if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage !=
lk_external)
{
diff --git a/gcc/testsuite/g++.dg/template/non-dependent16.C
b/gcc/testsuite/g++.dg/template/non-dependent16.C
new file mode 100644
index 000..b7dca8f6752
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent16.C
@@ -0,0 +1,16 @@
+// PR c++/53164
+
+template
+void f(T) {
+  T::fail; // { dg-error "not a member" }
+}
+
+template
+struct A { };
+
+template
+void g() {
+  A a;
+}


I should mention that the original testcase in the PR was slightly
different than this one in that it also performed a call to the NTTP,
e.g.

 template
 struct A {
   static void h() {
 p(0);
   }
 };

 template
 void g() {
   A::h();
 }

 templated void g<0>();

and not even the call was enough to odr-use f, apparently because the
CALL_EXPR case of tsubst_expr calls mark_used on the callee only when
it's a FUNCTION_DECL, but in this case after substitution it's an
ADDR_EXPR of a FUNCTION_DECL.  Fixing this by looking through the
ADDR_EXPR
worked, but IIUC the call isn't necessary for f to be odr-used, simply
using f as a template argument should be sufficient, so it seems the
above is better fix.


I agree that pedantically the use happens when substituting into the use
of
A, but convert_nontype_argument_function seems like a weird place to
implement that; it's only called again during instantiation of A,
when we
instantiate the injected-class-name.  If A isn't instantiated, e.g.
if 'a'
is a pointer to A, we again don't instantiate f.


I see, makes sense..  I'm not sure where else we can mark the use, then.
Since we resolve the OVERLOAD f to the FUNCTION_DECL f ahead of
time (during which mark_used doesn't actually instantiate f because
we're inside a template), at instantiation time the type A is already
non-dependent so tsubst_aggr_type avoids doing the work that would end
up calling convert_nontype_argument_function.



I see that clang doesn't reject your testcase, either, but MSVC and icc
do
(even with 'a' a pointer): https://godbolt.org/z/MGE6TcMch


FWIW although Clang doesn't reject 'A a;', it does reject
'using type = A;' weirdly enough: https://godbolt.org/z/T9qEn6bWW


Shall we just go with the other more specific approach, that makes sure
the CALL_EXPR case of tsubst_expr calls mark_used when the callee is an
ADDR_EXPR?  Something like (bootstrapped and regtested):


Err, this approach is wrong because by stripping the ADDR_EXPR here we
end up checking access of the unwrapped FUNCTION_DECL again after
substituting into the call.  So we incorrectly reject e.g.

template
void g() {
  P(); // error: ‘static void A::h()’ is private within this context
}

struct A {
  void f() {
g();
  }
private:
  static void h();
};

since A::h isn't accessible from g.


I guess you could call mark_used directly instead of stripping the ADDR_EXPR.


That seems to work nicely, how does the below look?  Bootstrapped and
regtested on x86_64-pc-linux-gnu.



Or for the general problem, perhaps we could mark the TEMPLATE_INFO or TI_ARGS
to indicate that we still need to mark_used the arguments when we encounter
A again during instantiation?


That sounds plausible, though I suppose it might not be worth it only to
handle such a corner case..


Indeed.  A lower-overhead possibility would be to remember, for a 
template

Re: [PATCH] c++: NTTP with array/function type after substitution [PR61355]

2021-10-07 Thread Jason Merrill via Gcc-patches


On 10/7/21 09:31, Patrick Palka wrote:

We're implementing [temp.param]/10 at parse time but not also at
substitution time.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/61355

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Perform array/function to
pointer conversion on the substituted type of an NTTP.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.pt/nontype5.C:
* g++.dg/template/param6.C: New test.
---
  gcc/cp/pt.c  |  4 +++
  gcc/testsuite/g++.dg/template/param6.C   | 32 
  gcc/testsuite/g++.old-deja/g++.pt/nontype5.C |  2 +-
  3 files changed, 37 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/param6.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5af3a6472f8..170edbd6f1e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8528,6 +8528,10 @@ convert_template_argument (tree parm,
else
t = tsubst (t, args, complain, in_decl);
  
+  /* Perform array-to-pointer and function-to-pointer conversion

+as per [temp.param]/10.  */
+  t = type_decays_to (t);
+
if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
  
diff --git a/gcc/testsuite/g++.dg/template/param6.C b/gcc/testsuite/g++.dg/template/param6.C

new file mode 100644
index 000..8306e753d70
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/param6.C
@@ -0,0 +1,32 @@
+// PR c++/61355
+// Verify we perform array-to-pointer and function-to-pointer conversion
+// on the substituted/deduced type of an NTTP.
+
+int f();
+int p[5];
+
+namespace cpp98 {
+  template struct X;
+  typedef X ty1;
+  typedef X ty2;
+}
+
+namespace cpp11 {
+#if __cpp_variadic_templates
+  template struct X;
+  using ty1 = X;
+  using ty2 = X;
+#endif
+}
+
+namespace cpp17 {
+#if __cpp_nontype_template_parameter_auto
+  template struct X;
+  using ty1 = X;
+  using ty2 = X;
+
+  template struct Y;
+  using ty3 = Y;
+  using ty4 = Y;
+#endif
+}
diff --git a/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C 
b/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
index 2678cf78a7d..e24dca43622 100644
--- a/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
+++ b/gcc/testsuite/g++.old-deja/g++.pt/nontype5.C
@@ -19,5 +19,5 @@ static int g() { return f(); }
  int f() { return 0; }
  
  int main() {

-return B::g();  // { dg-error "" } could not convert arg
+return B::g();
  }

Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-07 Thread Jakub Jelinek via Gcc-patches

On Thu, Oct 07, 2021 at 09:59:00PM +0800, Chung-Lin Tang wrote:
> this patch add support for "strictly-structured blocks" introduced in OpenMP 
> 5.1,
> basically allowing BLOCK constructs to serve as the body for directives:
> 
> !$omp target
> block
>   ...
> end block
> [!$omp end target]  !! end directive is optional
> 
> !$omp parallel
> block
>   ...
> end block
> ...
> !$omp end parallel  !! error, considered as not match to above parallel 
> directive
> 
> The parsing loop in parse_omp_structured_block() has been modified to allow
> a BLOCK construct after the first statement has been detected to be ST_BLOCK.
> This is done by a hard modification of the state into (the new) 
> COMP_OMP_STRICTLY_STRUCTURED_BLOCK
> after the statement is known (I'm not sure if there's a way to 'peek' the next
> statement/token in the Fortran FE, open to suggestions on how to better write 
> this)

Thanks for working on this.
The workshare/parallel workshare case is unclear, I've filed
https://github.com/OpenMP/spec/issues/3153
for it.  Either don't allow block if workshare_stmts_only for now
until that is clarified, or if we do, we need to make sure that it does the
expected thing, does that gfc_trans_block_construct call ensure it?

Then we have the
https://github.com/OpenMP/spec/issues/3154
issue Tobias discovered, if that issue is resolved to end always applying to
the directive before the block statement, I think your patch handles it that
way but we want testsuite coverage for some of those cases.

For the testcases, I think best would be to split it into two, one that
contains only what we want to accept and another one with dg-errors in it.

I don't think the patch does the right thing for sections/parallel sections.
That is (at least in 5.1) defined as:
!$omp sections clauses...
[!$omp section]
structured-block-sequence
[!$omp section
structured-block-sequence]
...
!$omp end sections
(and similarly for parallel sections).
I believe your patch properly disallows:
!$omp sections
block
...
!$omp section
...
end block
!$omp end sections
- block itself is allowed, e.g.
!$omp sections
block
a=1
b=2
end block
!$omp end sections
with the meaning that the block is after the first implied !$omp section
and there is nothing else.  But does the patch actually check that
!$omp sections
block
...
end block
c=1
!$omp end sections
or
!$omp sections
!$omp section
block
...
end block
c=1
!$omp section
d=1
!$omp end sections
is invalid?  Though, not sure if that was the intended effect, in
OpenMP 5.0 that used to be fine.  But then the other changes are backwards
incompatible too,
!$omp parallel
block
  ...
end block
c=1
!$omp end parallel
used to be valid but no longer is.  structured-block-sequence is fortran
defined as structured-block and structured-block is defined as either
loosely-structured-block or strictly-structured-block, so for sections
in between each !$omp section should be either anything not starting with
block, or, if it starts with block, after end block there should be
immediately !$omp section or !$omp end {,parallel }sections.

Another thing is scan, the wording is similar and newly
!$omp do reduction(+,inscan:a)
do i=1,10
block
  ...
end block
x=1
!$omp scan
block
  ...
end block
x=2
end do
is invalid.

> @@ -5538,6 +5539,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
> workshare_stmts_only)
>gcc_unreachable ();
>  }
>  
> +  bool block_construct = false;
> +  gfc_namespace* my_ns = NULL;
> +  gfc_namespace* my_parent = NULL;

The usual coding conventions put * before variable name instead of after it
(except for libstdc++).

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90
> @@ -0,0 +1,295 @@
> +! { dg-do compile }
> +! { dg-options "-fopenmp" }
> +
> +program main
> +  integer :: x
> +
> +  !$omp parallel
> +  block
> +x = x + 1
> +  end block
> +
> +  !$omp parallel
> +  block
> +x = x + 1
> +  end block
> +  !$omp end parallel
> +
> +  !$omp parallel
> +  block
> +x = x + 1
> +  end block
> +  x = x + 1
> +  !$omp end parallel ! { dg-error "Unexpected !.OMP END PARALLEL statement" }

Other than the splitting into non-dg-error stuff in one testcase and
dg-error in another one, I think we want (probably in yet another pair of
testcases) test what we are not required to do but with your patch we
actually implement, in particular that !$omp master behaves the same way.

> +  !$omp ordered
> +  block
> +x = x + 1
> +  end block
> +
> +  !$omp ordered
> +  block
> +x = x + 1
> +  end block
> +  !$omp end ordered
> +
> +  !$omp ordered
> +  block
> +x = x + 1
> +  end block
> +  x = x + 1
> +  !$omp end ordered ! { dg-error "Unexpected !.OMP END ORDERED statement" }

I believe these 3 can't be done in the program, would either need to be
wrapped in some !$omp do with ordered clause or should go each ordered into
its own orphaned subroutine, because ordered region must bind to a
worksharing-loop with ordered clause.  I th

[committed] libstdc++: Avoid use of hardware interference non-constant [PR102377]

2021-10-07 Thread Jonathan Wakely via Gcc-patches

libstdc++-v3/ChangeLog:

PR libstdc++/102377
* include/bits/atomic_wait.h (__waiter_pool_base:_S_align):
Hardcode to 64 instead of using non-constant constant.

Tested x86_64-linux. Committed to trunk.

commit 0e90799071ee78f712f3b58fca7000bc0a258ade
Author: Jonathan Wakely 
Date:   Thu Oct 7 18:28:04 2021

libstdc++: Avoid use of hardware interference non-constant [PR102377]

libstdc++-v3/ChangeLog:

PR libstdc++/102377
* include/bits/atomic_wait.h (__waiter_pool_base:_S_align):
Hardcode to 64 instead of using non-constant constant.

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 35c92644146..2ac07ccd05e 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -190,11 +190,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 struct __waiter_pool_base
 {
-#ifdef __cpp_lib_hardware_interference_size
-static constexpr auto _S_align = hardware_destructive_interference_size;
-#else
-static constexpr auto _S_align = 64;
-#endif
+  // Don't use std::hardware_destructive_interference_size here because we
+  // don't want the layout of library types to depend on compiler options.
+  static constexpr auto _S_align = 64;
 
   alignas(_S_align) __platform_wait_t _M_wait = 0;

Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-07 Thread Luís Ferreira

On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
> On Tue, Oct 5, 2021 at 1:28 PM Luís Ferreira 
> wrote:
> > 
> > On Tue, 2021-10-05 at 09:00 -0600, Jeff Law wrote:
> > > 
> > > 
> > > On 10/4/2021 10:52 AM, Luís Ferreira wrote:
> > > > On Thu, 2021-09-23 at 09:50 -0600, Jeff Law wrote:
> > > > > 
> > > > > On 9/23/2021 4:16 AM, ibuclaw--- via Gcc-patches wrote:
> > > > > > > On 22/09/2021 03:10 Luís Ferreira
> > > > > > > 
> > > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > Currently a stack/heap overflow may happen if a crafted
> > > > > > > mangle is
> > > > > > > maliciously used to cause denial of service, such as
> > > > > > > intentional
> > > > > > > crashes
> > > > > > > by accessing a reserved memory space.
> > > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > Thanks for this.  Is there a test that could trigger this
> > > > > > code
> > > > > > path?
> > > > > I don't think Luis has commit privs, so I went ahead and
> > > > > committed
> > > > > this
> > > > > patch.
> > > > > 
> > > > > Yea, a testcase would be great.
> > > > > 
> > > > > Jeff
> > > > > 
> > > > Does the test suite runned against address sanitization? if
> > > > yes, I
> > > > can
> > > > submit a patch to make this fail, otherwise it is hard to
> > > > trigger a
> > > > consistent crash for this issue.
> > > Unfortunately, no it doesn't run with sanitization.  If it's too
> > > painful
> > > to create a test, don't worry about it.  It happens from time to
> > > time.
> > > 
> > > jeff
> > 
> > I would like to add address sanitization if I knew how GCC
> > autotools
> > work but I think this is a better fit when I invest some time
> > implementing something to OSS fuzz and build some infrastructure
> > for
> > fuzzing parts of the GCC.
> > 
> 
> I can help with the autotools part if you can say how precisely you'd
> like to use them to add address sanitization. And as for the OSS
> fuzz part, I think someone tried setting up auto-fuzzing for it once,
> but the main bottleneck was getting the bug reports that it generated
> properly triaged, so if you could make sure the bug-submitting
> portion
> of the process is properly streamlined, that'd probably go a long way
> towards helping it be useful.

Bugs are normally reported by email or mailing list. Is there any
writable mailing list to publish bugs or is it strictly needed to open
an entry on bugzilla?

-- 
Sincerely,
Luís Ferreira @ lsferreira.net



signature.asc
Description: This is a digitally signed message part

[r12-4231 Regression] FAIL: gcc.target/i386/sse2-mmx-psubsb-2.c scan-assembler-times movl[ \\t]+\\$-128, 1 on Linux/x86_64

2021-10-07 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

555fa3545efe23393ff21fe0928aa3942e1b90ed is the first bad commit
commit 555fa3545efe23393ff21fe0928aa3942e1b90ed
Author: Roger Sayle 
Date:   Thu Oct 7 15:42:09 2021 +0100

Introduce smul_highpart and umul_highpart RTX for high-part multiplications

caused

FAIL: gcc.target/i386/sse2-mmx-paddsb-2.c scan-assembler-times movl[ 
\\t]+\\$-128, 1
FAIL: gcc.target/i386/sse2-mmx-paddusb-2.c scan-assembler-times movl[ 
\\t]+\\$-1, 1
FAIL: gcc.target/i386/sse2-mmx-psubsb-2.c scan-assembler-times movl[ 
\\t]+\\$-128, 1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4231/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-mmx-paddsb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-mmx-paddusb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse2-mmx-psubsb-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

[committed] wwwdocs: gcc-12/changes.html (PRU): Document __regio_symbol

2021-10-07 Thread Dimitar Dimitrov

Document the new __regio_symbol variable qualifier for PRU target.

Pushed.

Signed-off-by: Dimitar Dimitrov 
---
 htdocs/gcc-12/changes.html | 9 +
 1 file changed, 9 insertions(+)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 4f7bbd33..22839f2d 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -203,6 +203,15 @@ a work-in-progress.
 
 
 
+PRU
+
+  The https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html#PRU-Named-Address-Spaces";
+  >__regio_symbol variable qualifier has been added.
+  It allows easier access in C programs to the __R30 and
+  __R31 CPU I/O registers.
+  
+
 
 
 
-- 
2.31.1

[committed] libstdc++: Move C++14 components to new header

2021-10-07 Thread Jonathan Wakely via Gcc-patches

This moves the "classic" contents of  to a new header, so that
,  etc. can get use durations and clocks without
calendar types, time zones, and chrono I/O.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/std/chrono (duration, time_point, system_clock)
(steady_clock, high_resolution_clock, chrono_literals, sys_time)
(file_clock, file_time): Move to ...
* include/bits/chrono.h: New file.
* include/bits/atomic_futex.h: Include new header instead of
.
* include/bits/atomic_timed_wait.h: Likewise.
* include/bits/fs_fwd.h: Likewise.
* include/bits/semaphore_base.h: Likewise.
* include/bits/this_thread_sleep.h: Likewise.
* include/bits/unique_lock.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/chrono: Likewise.
* include/experimental/io_context: Likewise.
* include/experimental/netfwd: Likewise.
* include/experimental/timer: Likewise.
* include/std/condition_variable: Likewise.
* include/std/mutex: Likewise.
* include/std/shared_mutex: Likewise.

Tested powerpc64le-linux. Committed to trunk.

Nothing in the rest of the library needs the calendar types, so this
saves them from parsing 2000 lines of  just to use durations
and clocks. The saving is going to get bigger soon as I add time zones
and I/O to  (which means it needs to include  and
other headers).

commit 7f78718b7958f603d50d5c30fd8735d73900bd1f
Author: Jonathan Wakely 
Date:   Thu Oct 7 14:51:18 2021

libstdc++: Move C++14  components to new  header

This moves the "classic" contents of  to a new header, so that
,  etc. can get use durations and clocks without
calendar types, time zones, and chrono I/O.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/std/chrono (duration, time_point, system_clock)
(steady_clock, high_resolution_clock, chrono_literals, sys_time)
(file_clock, file_time): Move to ...
* include/bits/chrono.h: New file.
* include/bits/atomic_futex.h: Include new header instead of
.
* include/bits/atomic_timed_wait.h: Likewise.
* include/bits/fs_fwd.h: Likewise.
* include/bits/semaphore_base.h: Likewise.
* include/bits/this_thread_sleep.h: Likewise.
* include/bits/unique_lock.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/chrono: Likewise.
* include/experimental/io_context: Likewise.
* include/experimental/netfwd: Likewise.
* include/experimental/timer: Likewise.
* include/std/condition_variable: Likewise.
* include/std/mutex: Likewise.
* include/std/shared_mutex: Likewise.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 27b548607b9..0e43f147591 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -117,6 +117,7 @@ bits_headers = \
${bits_srcdir}/c++0x_warning.h \
${bits_srcdir}/char_traits.h \
${bits_srcdir}/charconv.h \
+   ${bits_srcdir}/chrono.h \
${bits_srcdir}/codecvt.h \
${bits_srcdir}/concept_check.h \
${bits_srcdir}/cow_string.h \
diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index d4bb32c26de..41c59c4c64d 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -32,13 +32,12 @@
 
 #pragma GCC system_header
 
-#include 
 #include 
-#include 
 #if ! (defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1)
 #include 
 #include 
 #endif
+#include 
 
 #ifndef _GLIBCXX_ALWAYS_INLINE
 #define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index d423a7af7c3..64c1ba62a3e 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -37,8 +37,7 @@
 #if __cpp_lib_atomic_wait
 #include 
 #include 
-
-#include 
+#include 
 
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
 #include  // std::terminate
diff --git a/libstdc++-v3/include/bits/chrono.h 
b/libstdc++-v3/include/bits/chrono.h
new file mode 100644
index 000..956af059ad3
--- /dev/null
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -0,0 +1,1392 @@
+// chrono::duration and chrono::time_point -*- C++ -*-
+
+// Copyright (C) 2008-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by t

Re: [PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402,PR102033,PR102034,PR102039,PR102

2021-10-07 Thread Jason Merrill via Gcc-patches


On 10/3/21 07:14, Nick Huang wrote:

I made a little improvement of my fix by including template
type parameter to not dropping cv-const because they are similar to dependent
type which you cannot determine whether they are top-level cv-qualifier or not
until instantiation.

+ if (processing_template_decl
+   && (TREE_CODE (type) == TYPENAME_TYPE
+  || TREE_CODE (type) == DECLTYPE_TYPE
++  || TREE_CODE (type) == TEMPLATE_TYPE_PARM  // this is new


I think WILDCARD_TYPE_P is what you want, except...


1. It fix your test case of
template 
struct A{
void f(T);
};
template 
void A::f(const T){ }
template<>
void A::f(const int*){}

current GCC mistakenly accepts without considering the gap of missing "const"
between declaration and out of line definition. clang correctly rejects it.
(https://www.godbolt.org/z/qb9Tf99eK) and explains the cause nicely.
My fix also rejects it.


Your patch rejects that testcase even without the specialization on 
int[], which no other compiler I'm aware of does; this is why I think 
this approach is wrong.  I think your approach would make sense for the 
standard to specify, but it doesn't (currently).


You seem to have missed my September 28 mail that argued for fixing the 
bug in determine_specialization that was preventing the 92010 fix from 
handling these cases.  I tried removing the redundant code in 
determine_specialization, but then it turned out that I needed to fix 
the corresponding code in fn_type_unification, as in the patch below. 
This approach does not reject the testcase above, which I think is OK; 
whether the specialization is well-formed depends on which declaration 
we are trying to specialize, and GCC chooses the definition.


Thoughts?From 8671432aac3444603e00c07d0d5d6cebda7d5da8 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 28 Sep 2021 10:02:04 -0400
Subject: [PATCH] c++: array cv-quals and template specialization
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* pt.c (determine_specialization): Remove redundant code.
	(fn_type_unification): Check for mismatched length.
	(type_unification_real): Ignore terminal void.
	(get_bindings): Don't stop at void_list_node.
	* class.c (resolve_address_of_overloaded_function): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/pr101402.C: No errors.
	* g++.dg/template/fnspec2.C: New test.
---
 gcc/cp/class.c  |  2 +-
 gcc/cp/pt.c | 30 -
 gcc/testsuite/g++.dg/parse/pr101402.C   |  4 ++--
 gcc/testsuite/g++.dg/template/fnspec2.C |  9 
 4 files changed, 26 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/fnspec2.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 59611627d18..f16e50b9de9 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -8382,7 +8382,7 @@ resolve_address_of_overloaded_function (tree target_type,
   nargs = list_length (target_arg_types);
   args = XALLOCAVEC (tree, nargs);
   for (arg = target_arg_types, ia = 0;
-	   arg != NULL_TREE && arg != void_list_node;
+	   arg != NULL_TREE;
 	   arg = TREE_CHAIN (arg), ++ia)
 	args[ia] = TREE_VALUE (arg);
   nargs = ia;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 19e03369ffa..2eb641e0304 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2230,7 +2230,6 @@ determine_specialization (tree template_id,
 	{
 	  tree decl_arg_types;
 	  tree fn_arg_types;
-	  tree insttype;
 
 	  /* In case of explicit specialization, we need to check if
 	 the number of template headers appearing in the specialization
@@ -2356,20 +2355,6 @@ determine_specialization (tree template_id,
 	   template argument.  */
 	continue;
 
-  /* Remove, from the set of candidates, all those functions
- whose constraints are not satisfied. */
-  if (flag_concepts && !constraints_satisfied_p (fn, targs))
-continue;
-
-  // Then, try to form the new function type.
-	  insttype = tsubst (TREE_TYPE (fn), targs, tf_fndecl_type, NULL_TREE);
-	  if (insttype == error_mark_node)
-	continue;
-	  fn_arg_types
-	= skip_artificial_parms_for (fn, TYPE_ARG_TYPES (insttype));
-	  if (!compparms (fn_arg_types, decl_arg_types))
-	continue;
-
 	  /* Save this template, and the arguments deduced.  */
 	  templates = tree_cons (targs, fn, templates);
 	}
@@ -21856,6 +21841,15 @@ fn_type_unification (tree fn,
  TREE_VALUE (sarg));
 	goto fail;
 	  }
+  if ((i < nargs || sarg)
+	  /* add_candidates uses DEDUCE_EXACT for x.operator foo(), but args
+	 doesn't contain the trailing void, and conv fns are always ().  */
+	  && !DECL_CONV_FN_P (decl))
+	{
+	  unsigned nsargs = i + list_length (sarg);
+	  unify_arity (explain_p, nargs, nsargs);
+	  goto fail;
+	}
 }
 
   /* After doing deduction with the inherited constructor, actually return an
@@ -22379,6 +22373,10 @@ type_unification_real (tree tparms,
   args = xargs;
   nargs = xn

Re: [PATCH] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Andrew Pinski via Gcc-patches

On Thu, Oct 7, 2021 at 9:47 AM Kito Cheng  wrote:
>
> Hi Andrew:
>
> > Seems like this error should be an gcc_assert at this point.  The
> > error message should have come from expand_builtin___clear_cache
> > already.
>
> Actually it will emit errors here, so put gcc_assert will cause ICE :p

The error message would have been emitted from
expand_builtin___clear_cache and maybe_emit_call_builtin___clear_cache
 would not have been called from user code.
All other uses of maybe_emit_call_builtin___clear_cache are internal
to gcc and should have the correct mode so asserting is the right
thing to do.

Thanks,
Andrew Pinski

>
> > And the check for VOIDmode should really be a check for CONST_INT.
>
> Yeah, that makes sense, thanks!

Re: [PATCH 0/2] libsanitizer: Merge with upstream commit fdf4c035225d

2021-10-07 Thread Gerald Pfeifer

On Wed, 6 Oct 2021, H.J. Lu via Gcc-patches wrote:
> I am checking in these patches to merge with upstream commit:

Thus breaking bootstrap on FreeBSD:

GCC-HEAD/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp:370:36:
 error: 'MD5_CTX' was not declared in this scope
  370 | const unsigned MD5_CTX_sz = sizeof(MD5_CTX);
  |^~~
GCC-HEAD/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp:371:36:
 error: 
'MD5_DIGEST_STRING_LENGTH' was not declared in this scope; did you mean 
'SHA256_DIGEST_STRING_LENGTH'?
  371 | const unsigned MD5_return_length = MD5_DIGEST_STRING_LENGTH;
  |^~~~
  |SHA256_DIGEST_STRING_LENGTH

I stared the the sources for minutes and FreeBSD include files and could
not find what was wrong.

Then I realized: GCC has its own include/md5 which misses some of these!


Looking how old md5 is (and deprecated) I cannot help wonder whether you
merged something that wasn't new, but intentionally left out originally?

Or include paths are broken.


Gerald


PS: At this point I am counting about *seven* distinct bootstrap 
breakages on my nightly testers in the last six weeks or so. :-(

Re: [PATCH 0/2] libsanitizer: Merge with upstream commit fdf4c035225d

2021-10-07 Thread H.J. Lu via Gcc-patches

On Thu, Oct 7, 2021 at 2:41 PM Gerald Pfeifer  wrote:
>
> On Wed, 6 Oct 2021, H.J. Lu via Gcc-patches wrote:
> > I am checking in these patches to merge with upstream commit:
>
> Thus breaking bootstrap on FreeBSD:
>
> GCC-HEAD/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp:370:36:
>  error: 'MD5_CTX' was not declared in this scope
>   370 | const unsigned MD5_CTX_sz = sizeof(MD5_CTX);
>   |^~~
> GCC-HEAD/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp:371:36:
>  error:
> 'MD5_DIGEST_STRING_LENGTH' was not declared in this scope; did you mean
> 'SHA256_DIGEST_STRING_LENGTH'?
>   371 | const unsigned MD5_return_length = MD5_DIGEST_STRING_LENGTH;
>   |^~~~
>   |SHA256_DIGEST_STRING_LENGTH
>
> I stared the the sources for minutes and FreeBSD include files and could
> not find what was wrong.

compiler-rt sync brought in

commit 18a7ebda99044473fdbce6376993714ff54e6690
Author: David Carlier 
Date:   Wed Oct 6 06:01:50 2021 +0100

[Sanitizers] intercept md5 and sha* apis on FreeBSD.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110989

diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp
 b/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp
index bfe3eea464d..64535805e40 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_freebsd.cpp
@@ -69,6 +69,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -361,6 +366,22 @@ const int si_SEGV_MAPERR = SEGV_MAPERR;
 const int si_SEGV_ACCERR = SEGV_ACCERR;
 const int unvis_valid = UNVIS_VALID;
 const int unvis_validpush = UNVIS_VALIDPUSH;
+
+const unsigned MD5_CTX_sz = sizeof(MD5_CTX);
+const unsigned MD5_return_length = MD5_DIGEST_STRING_LENGTH;
+
+#define SHA2_CONST(LEN)  \
+  const unsigned SHA##LEN##_CTX_sz = sizeof(SHA##LEN##_CTX); \
+  const unsigned SHA##LEN##_return_length = SHA##LEN##_DIGEST_STRING_LENGTH; \
+  const unsigned SHA##LEN##_block_length = SHA##LEN##_BLOCK_LENGTH;  \
+  const unsigned SHA##LEN##_digest_length = SHA##LEN##_DIGEST_LENGTH
+
+SHA2_CONST(224);
+SHA2_CONST(256);
+SHA2_CONST(384);
+SHA2_CONST(512);
+
+#undef SHA2_CONST
 }  // namespace __sanitizer

Does compiler-rt compile on FreeBSD?  If not, please file a compiler-rt bug.
If yes, why doesn't it work in GCC?

> Then I realized: GCC has its own include/md5 which misses some of these!
>
>
> Looking how old md5 is (and deprecated) I cannot help wonder whether you
> merged something that wasn't new, but intentionally left out originally?
>
> Or include paths are broken.
>
>
> Gerald
>
>
> PS: At this point I am counting about *seven* distinct bootstrap
> breakages on my nightly testers in the last six weeks or so. :-(



-- 
H.J.

[PATCH 0/8] __builtin_dynamic_object_size and more

2021-10-07 Thread Siddhesh Poyarekar

This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
--

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and type ranging from 0 to 3
and it returns an object size estimate for the pointer based on an
analysis of which objects the pointer could point to.  The actual
properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.

- If __builtin_dynamic_object_size returns a non-constant expression,
  the value of that expression is never (size_t)-1.

Implementation:
---

- The __builtin_dynamic_object_size support is implemented in
  tree-dynamic-object-size.c as two passes, just like tree-object-size.
  In most cases the first pass (early_dynobjsz) is able to evaluate
  object size expressions.  The second phase mainly ends up simplifying
  the __builtin_dynamic_object_size to __builtin_object_size.  Speaking
  of which...

- Currently if __builtin_dynamic_object_size fails to evaluate a precise
  size expression, it replaces the call with __builtin_object_size and
  punts to the last tree-object-size pass.  In future, this could become
  more sophisticated.

- Size expressions returned directly by __builtin_dynamic_object_size
  are bounded in the range of [0, SIZE_MAX - 1] so that equality tests
  with (size_t)-1 fail.  This is necessary to eliminate unnecessary
  branches in _FORTIFY_SOURCE=3.  This could be tightened further in
  future to SIZE_MAX / 2 since there are already assumptions built in
  that prevent object sizes from crossing that limit.

- I have split the implementation into one feature per patchset (calls,
  function parameters, PHI, etc.) to hopefully ease review.

- Some code duplication from tree-object-size is intentional.  The
  immediate need for that is to reduce impact on existing passes.  Over
  the medium/long term though, the intention is to fully replace
  tree-object-size.  See Future work for more information.

- I have marked the new files as Copyright (C) me (since I'm
  contributing under DCO), but I've based some of the code on the
  existing tree-object-size.c, so I need advice on the correct copyright
  notice.

Performance:


Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have also not implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe.  I am
working on that right now and hope to have it ready before stage 1
closes.

Testing:


I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  With that in
place I have run full bootstrap builds on Fedora rawhide by backporting the
patches to the gcc11 branch and x86_64 and i686 have no failures in any
of the builtin-*object-size* tests and no new failures.

I have also built bash, cmake, zfs-fuse and systemtap with
_FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
no issues in any of those builds.  I did some rudimentary analysis of
the generated binaries to see if there was any difference in coverage
and found that there was.  In terms of pure numbers, there were far more
_chk variants of calls than the regular ones due to _FORTIFY_SOURCE
(from about 4% to 70% in bash), but that could well be due to the _chk
variants not being folded into regular variants when safe.  However, on
manual inspection of some of these sites, it was clear that coverage was
increasing significantly where __builtin_object_size was previously
bailing out.

I'm running more tests with _FORTIFY_SOURCE=3 and will do more analysis

[PATCH 1/8] __builtin_dynamic_object_size: Recognize builtin name

2021-10-07 Thread Siddhesh Poyarekar

Recognize the __builtin_dynamic_object_size builtin, but simply
replace it with -1 or 0 for now.

gcc/ChangeLog:

* builtins.c (expand_builtin, fold_builtin_2): Add
BUILT_IN_DYN_OBJECT_SIZE.
(fold_builtin_dyn_object_size): New function.
(valid_object_size_args): New function.
(fold_builtin_object_size): Use it.
* builtins.def (BUILT_IN_DYN_OBJECT_SIZE): New builtin.
---
 gcc/builtins.c   | 63 +++-
 gcc/builtins.def |  1 +
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3e57eb03af0..894d62359b4 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -180,6 +180,7 @@ static rtx expand_builtin_memory_chk (tree, rtx, 
machine_mode,
 static void maybe_emit_chk_warning (tree, enum built_in_function);
 static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function);
 static tree fold_builtin_object_size (tree, tree);
+static tree fold_builtin_dyn_object_size (tree, tree);
 
 unsigned HOST_WIDE_INT target_newline;
 unsigned HOST_WIDE_INT target_percent;
@@ -7910,6 +7911,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
   return const0_rtx;
 
 case BUILT_IN_OBJECT_SIZE:
+case BUILT_IN_DYN_OBJECT_SIZE:
   return expand_builtin_object_size (exp);
 
 case BUILT_IN_MEMCPY_CHK:
@@ -9318,6 +9320,9 @@ fold_builtin_2 (location_t loc, tree expr, tree fndecl, 
tree arg0, tree arg1)
 case BUILT_IN_OBJECT_SIZE:
   return fold_builtin_object_size (arg0, arg1);
 
+case BUILT_IN_DYN_OBJECT_SIZE:
+  return fold_builtin_dyn_object_size (arg0, arg1);
+
 case BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE:
   return fold_builtin_atomic_always_lock_free (arg0, arg1);
 
@@ -9964,7 +9969,10 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
 }
 
 
-/* Expand a call EXP to __builtin_object_size.  */
+/* Expand a call EXP to __builtin_object_size or
+   __builtin_dynamic_object_size.  If the builtin survived up to this point
+   then it means we have failed to get an object size, so replace the call with
+   0 or -1 depending on the object size type argument.  */
 
 static rtx
 expand_builtin_object_size (tree exp)
@@ -10250,27 +10258,40 @@ maybe_emit_sprintf_chk_warning (tree exp, enum 
built_in_function fcode)
access_write_only);
 }
 
-/* Fold a call to __builtin_object_size with arguments PTR and OST,
-   if possible.  */
+/* Validate arguments to __builtin_object_size and
+   __builtin_dynamic_object_size.  If both arguments are valid, return the
+   object size type in OSTP.  */
 
-static tree
-fold_builtin_object_size (tree ptr, tree ost)
+static bool
+valid_object_size_args (const tree ptr, tree ost, int *ostp)
 {
-  unsigned HOST_WIDE_INT bytes;
-  int object_size_type;
-
   if (!validate_arg (ptr, POINTER_TYPE)
   || !validate_arg (ost, INTEGER_TYPE))
-return NULL_TREE;
+return false;
 
   STRIP_NOPS (ost);
 
   if (TREE_CODE (ost) != INTEGER_CST
   || tree_int_cst_sgn (ost) < 0
   || compare_tree_int (ost, 3) > 0)
-return NULL_TREE;
+return false;
 
-  object_size_type = tree_to_shwi (ost);
+  *ostp = tree_to_shwi (ost);
+
+  return true;
+}
+
+/* Fold a call to __builtin_object_size with arguments PTR and OST,
+   if possible.  */
+
+static tree
+fold_builtin_object_size (tree ptr, tree ost)
+{
+  unsigned HOST_WIDE_INT bytes;
+  int object_size_type;
+
+  if (!valid_object_size_args (ptr, ost, &object_size_type))
+return NULL_TREE;
 
   /* __builtin_object_size doesn't evaluate side-effects in its arguments;
  if there are any side-effects, it returns (size_t) -1 for types 0 and 1
@@ -10297,6 +10318,26 @@ fold_builtin_object_size (tree ptr, tree ost)
   return NULL_TREE;
 }
 
+/* Fold a call to __builtin_dynamic_object_size with arguments PTR and OST,
+   if possible.  */
+
+static tree
+fold_builtin_dyn_object_size (tree ptr, tree ost)
+{
+  int object_size_type;
+
+  if (!valid_object_size_args (ptr, ost, &object_size_type))
+return NULL_TREE;
+
+  /* __builtin_dynamic_object_size doesn't evaluate side-effects in its
+ arguments; if there are any side-effects, it returns (size_t) -1 for types
+ 0 and 1 and (size_t) 0 for types 2 and 3.  */
+  if (TREE_SIDE_EFFECTS (ptr))
+return build_int_cst_type (size_type_node, object_size_type < 2 ? -1 : 0);
+
+  return NULL_TREE;
+}
+
 /* Builtins with folding operations that operate on "..." arguments
need special handling; we need to store the arguments in a convenient
data structure before attempting any folding.  Fortunately there are
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 45a09b4d42d..ae94caab921 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -972,6 +972,7 @@ DEF_BUILTIN_STUB (BUILT_IN_STRNCMP_EQ, 
"__builtin_strncmp_eq")
 
 /* Object size checking builtins.  */
 DEF_GCC_BUILTIN   (BUILT_IN_OBJECT_SIZE, "object_size", 
BT_FN_SIZE_CONST_PTR_INT, ATTR_CONST_NOTH

[PATCH 2/8] tree-dynamic-object-size: New pass

2021-10-07 Thread Siddhesh Poyarekar

A new pass is added to execute just before the tree-object-size pass
to recognize and simplify __builtin_dynamic_object_size.  Some key
ideas (such as multipass object size collection to detect reference
loops) have been taken from tree-object-size but is distinct from it
to ensure minimal impact on existing code.

At the moment, the pass only recognizes allocators and passthrough
functions to attempt to derive object size expressions, and replaces
the call site with those expressions.  On failure, it replaces the
__builtin_dynamic_object_size with __builtin_object_size as a
fallback.

gcc/ChangeLog:

* Makefile.in (OBJS): Add tree-dynamic-object-size.o.
(PLUGIN_HEADERS): Add tree-dynamic-object-size.h.
* tree-dynamic-object-size.c: New file.
* tree-dynamic-object-size.h: New file.
* builtins.c: Use it.
(fold_builtin_dyn_object_size): Call
compute_builtin_dyn_object_size for
__builtin_dynamic_object_size builtin.
(passes.def): Add pass_dynamic_object_sizes.
* tree-pass.h: Add ake_pass_dynamic_object_sizes.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c: New test.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/Makefile.in   |  19 +-
 gcc/builtins.c|   8 +
 gcc/doc/extend.texi   |  11 +
 gcc/passes.def|   3 +
 .../gcc.dg/builtin-dynamic-object-size-0.c| 166 ++
 gcc/tree-dynamic-object-size.c| 517 ++
 gcc/tree-dynamic-object-size.h|  25 +
 gcc/tree-pass.h   |   2 +
 8 files changed, 742 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
 create mode 100644 gcc/tree-dynamic-object-size.c
 create mode 100644 gcc/tree-dynamic-object-size.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f36ffa4740b..5189dcfcc0d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1608,6 +1608,7 @@ OBJS = \
tree-diagnostic.o \
tree-diagnostic-path.o \
tree-dump.o \
+   tree-dynamic-object-size.o \
tree-eh.o \
tree-emutls.o \
tree-if-conv.o \
@@ -3647,15 +3648,15 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) 
coretypes.h $(TM_H) \
   ssa-iterators.h $(RESOURCE_H) tree-cfgcleanup.h attribs.h calls.h \
   cfgexpand.h diagnostic-color.h gcc-symtab.h gimple-builder.h gimple-low.h \
   gimple-walk.h gimplify-me.h pass_manager.h print-rtl.h stmt.h \
-  tree-dfa.h tree-hasher.h tree-nested.h tree-object-size.h tree-outof-ssa.h \
-  tree-parloops.h tree-ssa-address.h tree-ssa-coalesce.h tree-ssa-dom.h \
-  tree-ssa-loop.h tree-ssa-loop-ivopts.h tree-ssa-loop-manip.h \
-  tree-ssa-loop-niter.h tree-ssa-ter.h tree-ssa-threadedge.h \
-  tree-ssa-threadupdate.h inchash.h wide-int.h signop.h hash-map.h \
-  hash-set.h dominance.h cfg.h cfgrtl.h cfganal.h cfgbuild.h cfgcleanup.h \
-  lcm.h cfgloopmanip.h file-prefix-map.h builtins.def $(INSN_ATTR_H) \
-  pass-instances.def params.list $(srcdir)/../include/gomp-constants.h \
-  $(EXPR_H)
+  tree-dfa.h tree-dynamic-object-size.h tree-hasher.h tree-nested.h \
+  tree-object-size.h tree-outof-ssa.h tree-parloops.h tree-ssa-address.h \
+  tree-ssa-coalesce.h tree-ssa-dom.h tree-ssa-loop.h tree-ssa-loop-ivopts.h \
+  tree-ssa-loop-manip.h tree-ssa-loop-niter.h tree-ssa-ter.h \
+  tree-ssa-threadedge.h tree-ssa-threadupdate.h inchash.h wide-int.h signop.h \
+  hash-map.h hash-set.h dominance.h cfg.h cfgrtl.h cfganal.h cfgbuild.h \
+  cfgcleanup.h lcm.h cfgloopmanip.h file-prefix-map.h builtins.def \
+  $(INSN_ATTR_H) pass-instances.def params.list \
+  $(srcdir)/../include/gomp-constants.h $(EXPR_H)
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 894d62359b4..d015029765b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "varasm.h"
 #include "tree-object-size.h"
+#include "tree-dynamic-object-size.h"
 #include "tree-ssa-strlen.h"
 #include "realmpfr.h"
 #include "cfgrtl.h"
@@ -10325,6 +10326,7 @@ static tree
 fold_builtin_dyn_object_size (tree ptr, tree ost)
 {
   int object_size_type;
+  tree bytes;
 
   if (!valid_object_size_args (ptr, ost, &object_size_type))
 return NULL_TREE;
@@ -10335,6 +10337,12 @@ fold_builtin_dyn_object_size (tree ptr, tree ost)
   if (TREE_SIDE_EFFECTS (ptr))
 return build_int_cst_type (size_type_node, object_size_type < 2 ? -1 : 0);
 
+  /* If object size expression cannot be evaluated yet, delay folding until
+ later.  Maybe subsequent passes will help determining it.  */
+  if (TREE_CODE (ptr) == SSA_NAME
+  && compute_builtin_dyn_object_size (ptr, object_size_type, &bytes))
+return bytes;
+
   return NULL_TREE;
 }
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/ex

[PATCH 3/8] tree-dynamic-object-size: Handle GIMPLE_PHI

2021-10-07 Thread Siddhesh Poyarekar

Where the target object is a PHI node, set the size to be a PHI node
with sizes of the target object.  All object size information is now
split into an SSA name and its size expression.  This allows non-SSA
size expressions to be evaluated all at once in the end and generated
statements inserted into the right places before returning the SSA
tree representing the size.

This change also adds support for dependency loops since that is a
common occurrence with PHI nodes.  Dependency loop handling is single
pass unlike tree-object-size due to the separation of size SSA names
and actual expressions, which allow for placeholders and unresolved
dependency loop counting in case of self-reference.  The downside
though is that if there's an unresolved dependency loop and the object
size computation fails, all of the computations are scrapped.  Ideally
only computations involved in the unresolved dependency loop should be
scrapped but I don't know if it's worth the extra state that would
have to be managed.  We could revisit this if it ends up being a
bottleneck.

gcc/ChangeLog:

* tree-dynamic-object-size.c: Include gimplify-me.h.
(object_size_info): New member deploop.
(object_size_exprs): New vector of object size expressions.
(emit_size_stmts, emit_size_phi_stmts, eval_size_expr,
gimplify_size_exprs): New functions.
(compute_builtin_dyn_object_size): New argument FUN.
Grow object_size_exprs and call gimplify_size_exprs.
(maybe_update_dependency_loop, cache_object_size,
cache_object_size_copy): New functions.
(set_object_size_ssa, call_object_size): Call
cache_object_size_copy.
(collect_object_sizes_for): Handle GIMPLE_PHI.
(init_dynamic_object_sizes): Allocate object_size_exprs.
(fini_dynamic_object_sizes): Deallocate object_size_exprs.
* gcc/tree-dynamic-object-size.h
(compute_builtin_dyn_object_size): New argument.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c
(test_builtin_malloc_condphi, test_builtin_malloc_condphi2,
test_builtin_malloc_condphi3, test_builtin_calloc_condphi,
test_deploop): New tests.
(main): Call them.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c|  92 +++
 gcc/tree-dynamic-object-size.c| 258 +-
 gcc/tree-dynamic-object-size.h|   3 +-
 3 files changed, 339 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 620e8cbc611..7098ef485c6 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -63,6 +63,48 @@ test_builtin_malloc_cond (int cond)
   return __builtin_dynamic_object_size (ret, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi (int cond)
+{
+  void *ret;
+ 
+  if (cond)
+ret = __builtin_malloc (32);
+  else
+ret = __builtin_malloc (64);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi2 (int cond, size_t in)
+{
+  void *ret;
+ 
+  if (cond)
+ret = __builtin_malloc (in);
+  else
+ret = __builtin_malloc (64);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi3 (int cond, size_t in, size_t in2)
+{
+  void *ret;
+ 
+  if (cond)
+ret = __builtin_malloc (in);
+  else
+ret = __builtin_malloc (in2);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
 /* Calloc-like allocator.  */
 
 size_t
@@ -89,6 +131,21 @@ test_builtin_calloc_cond (int cond1, int cond2)
   return __builtin_dynamic_object_size (ret, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
+{
+  struct
+{
+  int a;
+  char b;
+} bin[cnt];
+
+  char *ch = __builtin_calloc (cnt, sz);
+
+  return __builtin_dynamic_object_size (cond ? ch : (void *) &bin, 0);
+}
+
 /* Passthrough functions.  */
 
 size_t
@@ -120,6 +177,19 @@ test_dynarray_cond (int cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_deploop (size_t sz, size_t cond)
+{
+  char *bin = __builtin_alloca (32);
+
+  for (size_t i = 0; i < sz; i++)
+if (i == cond)
+  bin = __builtin_alloca (sz);
+
+  return __builtin_dynamic_object_size (bin, 0);
+}
+
 int
 main (int argc, char **argv)
 {
@@ -141,6 +211,18 @@ main (int argc, char **argv)
 FAIL ();
   if (test_builtin_malloc_cond (0) != 64)
 FAIL ();
+  if (test_builtin_malloc_condphi (1) != 32)
+FAIL ();
+  if (test_builtin_malloc_condphi (0) != 64)
+FAIL ();
+  if (test_builtin_malloc_condphi2 (1, 128) != 128)
+FAIL ();
+  if (test_builtin_malloc_condphi2 (0, 128) != 64)
+FAIL ();
+  if (test_builtin_malloc_condphi3 (1, 128,

[PATCH 5/8] tree-dynamic-object-size: Handle GIMPLE_ASSIGN

2021-10-07 Thread Siddhesh Poyarekar

Follow assignment operations and pointer-plus operations to get the
effective object size.  Unlike tree-object-size, this pass handles
negative offsets correctly and returns a size within the bounds of the
parent object; offsets > SIZE_MAX / 2 are considered negative to
emulate ssize_t.

To make negative offsets work, all object size computations store the
"whole variable" size to ensure that the negative offsets don't go
beyond bounds of the object within which PTR points.  As mentioned in
the previous patch, this makes the size_for_offset expressions quite
complex, but the aim at the moment is to be more correct than fast.
We can tweak this later if it is found to have a noticeable
performance impact.

When gcc is invoked without optimization, attempt to minimally
recognize GIMPLE_ASSIGN and ADDR_EXPR expressions so that
__builtin_dynamic_object_size is at least minimally useful without
optimization.

gcc/ChangeLog:

* tree-dynamic-object-size.c (object_whole_sizes,
object_whole_size_exprs): New vectors.
(size_for_offset): Support negative offsets.
(whole_var_size): Adjust.
(addr_dyn_object_size): New argument wholesizep.
(emit_size_stmts): New arguments wholesize_ssa and
wholesize_expr.  Emit statements for whole size.
(emit_size_phi_stmts): Likewise.
(size_bound_expr): New function.
(eval_size_expr): Call it.  New arguments wholesize and
wholesize_expr.
(gimplify_size_exprs): Adjust for whole sizes.
(compute_builtin_dyn_object_size): Do simple object size
computation for unoptimized case.
(expr_object_size, cache_object_size,cache_object_size_copy,
set_object_size_ssa, call_object_size): Adjust for wholesize.
(make_object_size_name, plus_stmt_object_size): New functions.
(collect_object_sizes_for): Support GIMPLE_ASSIGN.  Adjust for
wholesize.
(init_dynamic_object_sizes): Allocate object_wholesizes and
object_wholesize_exprs.
(fini_dynamic_object_sizes): Deallocate object_wholesizes and
object_wholesize_exprs.

gcc/testsuite/ChangeLog

* gcc.dg/builtin-dynamic-object-size-0.c (dynarray_struct):
New struct.
(test_dynarray_struct, test_substring,
test_substring_ptrplus): New tests.
(main): Call them.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c|  50 ++
 gcc/tree-dynamic-object-size.c| 435 +++---
 2 files changed, 413 insertions(+), 72 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 22c86190341..3c2c4c84264 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -219,6 +219,42 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+/* Address expressions.  */
+
+struct dynarray_struct
+{
+  long a;
+  char c[16];
+  int b;
+};
+
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct (size_t sz, size_t off)
+{
+  struct dynarray_struct bin[sz];
+
+  return __builtin_dynamic_object_size (&bin[off].c, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_substring (size_t sz, size_t off)
+{
+  char str[sz];
+
+  return __builtin_dynamic_object_size (&str[off], 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_substring_ptrplus (size_t sz, size_t off)
+{
+  int str[sz];
+
+  return __builtin_dynamic_object_size (str + off, 0);
+}
+
 int
 main (int argc, char **argv)
 {
@@ -279,6 +315,20 @@ main (int argc, char **argv)
  FAIL ();
   if (test_dynarray (__builtin_strlen (argv[0])) != __builtin_strlen (argv[0]))
 FAIL ();
+  if (test_dynarray_struct (42, 4) !=
+  ((42 - 4) * sizeof (struct dynarray_struct)
+   - __builtin_offsetof (struct dynarray_struct, c)))
+FAIL ();
+  if (test_dynarray_struct (42, 48) != 0)
+FAIL ();
+  if (test_substring (128, 4) != 128 - 4)
+FAIL ();
+  if (test_substring (128, 142) != 0)
+FAIL ();
+  if (test_substring_ptrplus (128, 4) != (128 - 4) * sizeof (int))
+FAIL ();
+  if (test_substring_ptrplus (128, 142) != 0)
+FAIL ();
   if (test_dynarray_cond (0) != 16)
 FAIL ();
   if (test_dynarray_cond (1) != 8)
diff --git a/gcc/tree-dynamic-object-size.c b/gcc/tree-dynamic-object-size.c
index 6cc63abd010..f143a64777c 100644
--- a/gcc/tree-dynamic-object-size.c
+++ b/gcc/tree-dynamic-object-size.c
@@ -83,13 +83,18 @@ static void collect_object_sizes_for (struct 
object_size_info *, tree);
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.
 
-   These are merely SSA names of the sizes.  The actual size expressions are in
+   object_sizes are SSA names of the sizes.  The actual size expressions are in
object_size_exprs and they need not be SSA.  */
 static vec object_siz

[PATCH 4/8] tree-dynamic-object-size: Support ADDR_EXPR

2021-10-07 Thread Siddhesh Poyarekar

This is the beginnings of support for ADDR_EXPR objects and is partly
based on the implementation in tree-object-size, splitting out
functions for readability.

One key difference from constant-size ADDR_EXPR computation is the
computation of residual sizes based on offset.  This pass attempts to
compute an expression with guards to try and not overflow.  This is
however still rudimentary and a subsequent patch for subobject support
makes it more comprehensive by handling negative offsets as well.

The size expressions based on offsets may look arbitrarily complex but
in practice most parts of the expression tend to fold away due to
being constants.  Despite that it is still a potential bottleneck and
may need some artifical backstop (such as bailing out on computation
if the size expression nests more than an arbitrarily chosen N levels)
to reduce compile time as well as avoid adding too much of a
performance overhead to generated code.

gcc/ChangeLog:

* builtins.c (fold_builtin_dyn_object_size): Handle ADDR_EXPR.
* tree-dynamic-object-size.c (compute_object_offset,
size_for_offset, get_whole_var, whole_var_size,
addr_dyn_object_size): New functions.
(compute_builtin_dyn_object_size): Handle ADDR_EXPR.
(expr_object_size): New function.
(collect_object_sizes_for): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c
(test_builtin_malloc_condphi4, test_builtin_malloc_condphi5,
test_passthrough_nonssa): New tests.
(main): Call them.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/builtins.c|   2 +-
 .../gcc.dg/builtin-dynamic-object-size-0.c|  37 +++
 gcc/tree-dynamic-object-size.c| 255 +-
 3 files changed, 287 insertions(+), 7 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index d015029765b..c1e23324552 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10339,7 +10339,7 @@ fold_builtin_dyn_object_size (tree ptr, tree ost)
 
   /* If object size expression cannot be evaluated yet, delay folding until
  later.  Maybe subsequent passes will help determining it.  */
-  if (TREE_CODE (ptr) == SSA_NAME
+  if ((TREE_CODE (ptr) == SSA_NAME || TREE_CODE (ptr) == ADDR_EXPR)
   && compute_builtin_dyn_object_size (ptr, object_size_type, &bytes))
 return bytes;
 
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 7098ef485c6..22c86190341 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -105,6 +105,25 @@ test_builtin_malloc_condphi3 (int cond, size_t in, size_t 
in2)
   return __builtin_dynamic_object_size (ret, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi4 (size_t sz, int cond)
+{
+  char *a = __builtin_malloc (sz);
+  char b[sz / 2];
+
+  return __builtin_dynamic_object_size (cond ? b : (void *) &a, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi5 (size_t sz, int cond, char *c)
+{
+  char *a = __builtin_malloc (sz);
+
+  return __builtin_dynamic_object_size (cond ? c : (void *) &a, 0);
+}
+
 /* Calloc-like allocator.  */
 
 size_t
@@ -158,6 +177,16 @@ test_passthrough (size_t sz, char *in)
   return __builtin_dynamic_object_size (dest, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_passthrough_nonssa (char *in)
+{
+  char bin[__builtin_strlen (in) + 1];
+  char *dest = __builtin_memcpy (bin, in, __builtin_strlen (in) + 1);
+
+  return __builtin_dynamic_object_size (dest, 0);
+}
+
 /* Variable length arrays.  */
 size_t
 __attribute__ ((noinline))
@@ -223,6 +252,12 @@ main (int argc, char **argv)
 FAIL ();
   if (test_builtin_malloc_condphi3 (0, 128, 256) != 256)
 FAIL ();
+  if (test_builtin_malloc_condphi4 (128, 1) != 64)
+FAIL ();
+  if (test_builtin_malloc_condphi4 (128, 0) != sizeof (void *))
+FAIL ();
+  if (test_builtin_malloc_condphi5 (128, 0, argv[0]) != -1)
+FAIL ();
   if (test_calloc (2048, 4) != 2048 * 4)
 FAIL ();
   if (test_builtin_calloc (2048, 8) != 2048 * 8)
@@ -240,6 +275,8 @@ main (int argc, char **argv)
   if (test_passthrough (__builtin_strlen (argv[0]) + 1, argv[0])
   != __builtin_strlen (argv[0]) + 1)
 FAIL ();
+  if (test_passthrough_nonssa (argv[0]) != __builtin_strlen (argv[0]) + 1)
+ FAIL ();
   if (test_dynarray (__builtin_strlen (argv[0])) != __builtin_strlen (argv[0]))
 FAIL ();
   if (test_dynarray_cond (0) != 16)
diff --git a/gcc/tree-dynamic-object-size.c b/gcc/tree-dynamic-object-size.c
index 483d6782d49..6cc63abd010 100644
--- a/gcc/tree-dynamic-object-size.c
+++ b/gcc/tree-dynamic-object-size.c
@@ -93,6 +93,212 @@ static vecobject_size_exprs[4];
 /* Bitmaps what object sizes have been computed already.  */
 static bitmap computed[4];
 
+bool compute_builtin_dyn_object_size (tree, int, tree *,
+ st

[PATCH 6/8] tree-dynamic-object-size: Handle function parameters

2021-10-07 Thread Siddhesh Poyarekar

Handle either static sizes in function parameters or hints provided by
__attribute__ ((access (...))) to compute sizes for objects.

gcc/ChangeLog:

* tree-dynamic-object-size.c: Include tree-dfa.h.
(emit_size_stmts): New argument osi.  Handle GIMPLE_NOP.
(eval_size_expr, gimplify_size_exprs): Adjust.
(parm_object_size): New function.
(collect_object_sizes_for): Handle GIMPLE_NOP.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz): New
test.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 22 +
 gcc/tree-dynamic-object-size.c| 98 +--
 2 files changed, 112 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 3c2c4c84264..c72fa0508db 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -255,6 +255,15 @@ test_substring_ptrplus (size_t sz, size_t off)
   return __builtin_dynamic_object_size (str + off, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+__attribute__ ((access (__read_write__, 1, 2)))
+test_parmsz (void *obj, size_t sz, size_t off)
+{
+  return __builtin_dynamic_object_size (obj + off, 0);
+}
+
+
 int
 main (int argc, char **argv)
 {
@@ -338,6 +347,19 @@ main (int argc, char **argv)
   if (test_deploop (128, 129) != 32)
 FAIL ();
 
+  if (test_parmsz (argv[0], __builtin_strlen (argv[0]) + 1, -1)!= 0)
+FAIL ();
+
+  if (test_parmsz (argv[0], __builtin_strlen (argv[0]) + 1, 0)
+  != __builtin_strlen (argv[0]) + 1)
+FAIL ();
+  if (test_parmsz (argv[0], __builtin_strlen (argv[0]) + 1,
+  __builtin_strlen (argv[0]))!= 1)
+FAIL ();
+  if (test_parmsz (argv[0], __builtin_strlen (argv[0]) + 1,
+  __builtin_strlen (argv[0]) + 2)!= 0)
+FAIL ();
+
   if (nfails > 0)
 __builtin_abort ();
 
diff --git a/gcc/tree-dynamic-object-size.c b/gcc/tree-dynamic-object-size.c
index f143a64777c..8d7283623dc 100644
--- a/gcc/tree-dynamic-object-size.c
+++ b/gcc/tree-dynamic-object-size.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -456,7 +457,7 @@ pass_through_call (const gcall *call)
 
 
 static void
-emit_size_stmts (gimple *stmt, tree wholesize_ssa,
+emit_size_stmts (object_size_info *osi, gimple *stmt, tree wholesize_ssa,
 tree wholesize_expr, tree size_ssa, tree size_expr)
 {
   gimple_seq seq = NULL;
@@ -481,7 +482,14 @@ emit_size_stmts (gimple *stmt, tree wholesize_ssa,
  statements involved in evaluation of the object size expression precede
  the definition statement.  For parameters, we don't have a definition
  statement, so insert into the first code basic block.  */
-  gimple_stmt_iterator i = gsi_for_stmt (stmt);
+  gimple_stmt_iterator i;
+  if (gimple_code (stmt) == GIMPLE_NOP)
+{
+  basic_block first_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (osi->fun));
+  i = gsi_start_bb (first_bb);
+}
+  else
+i = gsi_for_stmt (stmt);
   gsi_insert_seq_before (&i, seq, GSI_CONTINUE_LINKING);
 }
 
@@ -542,8 +550,8 @@ size_bound_expr (tree sz)
 }
 
 static void
-eval_size_expr (tree var, tree wholesize, tree *wholesize_expr,
-   tree size, tree *size_expr)
+eval_size_expr (struct object_size_info *osi, tree var, tree wholesize,
+   tree *wholesize_expr, tree size, tree *size_expr)
 {
   if (size_expr != NULL)
 {
@@ -560,7 +568,7 @@ eval_size_expr (tree var, tree wholesize, tree 
*wholesize_expr,
}
   else
{
- emit_size_stmts (stmt, wholesize, *wholesize_expr, size,
+ emit_size_stmts (osi, stmt, wholesize, *wholesize_expr, size,
   size_bound_expr (*size_expr));
  delete wholesize_expr;
  delete size_expr;
@@ -611,7 +619,7 @@ gimplify_size_exprs (object_size_info *osi, tree ptr)
   for (int object_size_type = 0; object_size_type <= 3; object_size_type++)
 EXECUTE_IF_SET_IN_BITMAP (computed[object_size_type], 0, i, bi)
   {
-   eval_size_expr (ssa_name (i),
+   eval_size_expr (osi, ssa_name (i),
object_whole_sizes[object_size_type][i],
object_whole_size_exprs[object_size_type][i],
object_sizes[object_size_type][i],
@@ -939,6 +947,71 @@ call_object_size (struct object_size_info *osi, tree ptr, 
gcall *call)
   cache_object_size_copy (osi, SSA_NAME_VERSION (ptr), sz, sz);
 }
 
+/* Find size of an object passed as a parameter to the function.  */
+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  tree parm = SSA_NAME_VAR (var);
+  unsigned varno =

[PATCH 7/8] tree-dynamic-object-size: Get subobject sizes

2021-10-07 Thread Siddhesh Poyarekar

Adapt subobject computation logic from tree-object-size to make it
work with variable sizes.

gcc/ChangeLog:

* tree-dynamic-object-size.c (build_cond_branch): New function.
(compute_object_offset): Use it.
(get_closest_subobject): New function.
(addr_dyn_object_size): Call it.  Support subobject size
computation.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c
(test_dynarray_struct_subobj): New test.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar 
---
 .../gcc.dg/builtin-dynamic-object-size-0.c|  34 
 gcc/tree-dynamic-object-size.c| 172 --
 2 files changed, 194 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index c72fa0508db..94f8e071e2c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -237,6 +237,33 @@ test_dynarray_struct (size_t sz, size_t off)
   return __builtin_dynamic_object_size (&bin[off].c, 0);
 }
 
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct_subobj (size_t sz, size_t off)
+{
+  struct dynarray_struct bin[sz];
+
+  return __builtin_dynamic_object_size (&bin[off].c[4], 1);
+}
+
+size_t
+__attribute__ ((noinline))
+test_dynarray_struct_subobj2 (size_t sz, size_t off, size_t *objsz)
+{
+  struct dynarray_struct2
+{
+  long a;
+  int b;
+  char c[sz];
+};
+
+  struct dynarray_struct2 bin;
+
+  *objsz = sizeof (bin);
+
+  return __builtin_dynamic_object_size (&bin.c[off], 1);
+}
+
 size_t
 __attribute__ ((noinline))
 test_substring (size_t sz, size_t off)
@@ -334,6 +361,13 @@ main (int argc, char **argv)
 FAIL ();
   if (test_substring (128, 142) != 0)
 FAIL ();
+  if (test_dynarray_struct_subobj (42, 4) != 16 - 4)
+FAIL ();
+  if (test_dynarray_struct_subobj (42, 48) != 0)
+FAIL ();
+  size_t objsz = 0;
+  if (test_dynarray_struct_subobj2 (42, 4, &objsz) != objsz - 4 - 12)
+FAIL ();
   if (test_substring_ptrplus (128, 4) != (128 - 4) * sizeof (int))
 FAIL ();
   if (test_substring_ptrplus (128, 142) != 0)
diff --git a/gcc/tree-dynamic-object-size.c b/gcc/tree-dynamic-object-size.c
index 8d7283623dc..ebc2fad7a87 100644
--- a/gcc/tree-dynamic-object-size.c
+++ b/gcc/tree-dynamic-object-size.c
@@ -102,12 +102,21 @@ static bitmap computed[4];
 bool compute_builtin_dyn_object_size (tree, int, tree *,
  struct function *fun = NULL);
 
+static tree
+build_cond_branch (tree t, tree low_bound, tree unit_size)
+{
+  tree br = fold_build2 (MINUS_EXPR, TREE_TYPE (t), t, low_bound);
+  br = fold_convert (sizetype, br);
+  br = fold_build2 (MULT_EXPR, sizetype, unit_size, br);
+
+  return br;
+}
+
 /* Compute offset of EXPR within VAR.  Return error_mark_node if unknown.  */
 
 static tree
 compute_object_offset (const_tree expr, const_tree var)
 {
-  enum tree_code code = PLUS_EXPR;
   tree base, off, t;
 
   if (expr == var)
@@ -150,12 +159,16 @@ compute_object_offset (const_tree expr, const_tree var)
   low_bound = array_ref_low_bound (CONST_CAST_TREE (expr));
   unit_size = array_ref_element_size (CONST_CAST_TREE (expr));
   if (! integer_zerop (low_bound))
-   t = fold_build2 (MINUS_EXPR, TREE_TYPE (t), t, low_bound);
-  if (TREE_CODE (t) == INTEGER_CST && tree_int_cst_sgn (t) < 0)
{
- code = MINUS_EXPR;
- t = fold_build1 (NEGATE_EXPR, TREE_TYPE (t), t);
+ tree cond = fold_build2 (GE_EXPR, TREE_TYPE (t), t, low_bound);
+ tree then_br = build_cond_branch (t, low_bound, unit_size);
+ tree else_br = build_cond_branch (low_bound, t, unit_size);
+
+ then_br = fold_build2 (PLUS_EXPR, sizetype, base, then_br);
+ else_br = fold_build2 (MINUS_EXPR, sizetype, base, else_br);
+ return fold_build3 (COND_EXPR, sizetype, cond, then_br, else_br);
}
+
   t = fold_convert (sizetype, t);
   off = fold_build2 (MULT_EXPR, sizetype, unit_size, t);
   break;
@@ -168,7 +181,7 @@ compute_object_offset (const_tree expr, const_tree var)
   return error_mark_node;
 }
 
-  return fold_build2 (code, sizetype, base, off);
+  return fold_build2 (PLUS_EXPR, sizetype, base, off);
 }
 
 /* Given an object size SZ and offset OFF into it, compute the usable object
@@ -328,6 +341,105 @@ whole_var_size (struct object_size_info *osi, tree pt_var,
   return true;
 }
 
+/* Get the most immediate subobject encapsulating the pointer PTR, given the
+   whole_var object WHOLE_VAR.  */
+
+static tree
+get_closest_subobject (const_tree ptr, tree whole_var)
+{
+  tree var = TREE_OPERAND (ptr, 0);
+
+  while (var != whole_var
+&& TREE_CODE (var) != BIT_FIELD_REF
+&& TREE_CODE (var) != COMPONENT_REF
+&& TREE_CODE (var) != ARRAY_REF
+&& TREE_CODE (var) != ARRAY_RANGE_REF
+&& TREE_CODE (var) != REALPA

[PATCH 8/8] tree-dynamic-object-size: Add test wrappers to extend testing

2021-10-07 Thread Siddhesh Poyarekar

Wrap most tests for __builtin_object_size with
__builtin_dynamic_object_size to get better coverage.  At the moment
only one test does not work correctly with
__builtin_dynamic_object_size, viz. builtin-object-size-5.c.  The test
expects abort() calls to be optimized away, which does not happen with
__builtin_dynamic_object_size since it cannot collapse loop iteration
to a constant.

gcc/testsuite/ChangeLog:

* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Adjust to rebuild with
__builtin_dynamic_object_size.
* gcc.dg/builtin-object-size-16.c: Likewise.
* gcc.dg/builtin-object-size-17.c: Likewise.
* gcc.dg/builtin-object-size-2.c: Likewise.
* gcc.dg/builtin-object-size-3.c: Likewise.
* gcc.dg/builtin-object-size-4.c: Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
 .../g++.dg/ext/builtin-dynamic-object-size1.C |   5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |   5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-1.c|   7 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   9 ++
 .../gcc.dg/builtin-dynamic-object-size-11.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |   6 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |   8 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |   8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   | 104 
 .../gcc.dg/builtin-dynamic-object-size-2.c|   7 +
 .../gcc.dg/builtin-dynamic-object-size-3.c|   7 +
 .../gcc.dg/builtin-dynamic-object-size-4.c|   7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c|   5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c|   5 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 110 +
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |  10 ++
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |   2 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  | 126 +++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 148 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  | 117 ++
 27 files changed, 742 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-objec

Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-10-07 Thread Segher Boessenkool

Hi!

On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke wrote:
> v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2

There should not be a "v3" in the commit message.  The easy way to
achieve this is put it inside the [] in the subject (as you did), and to
mention the version history after a --- (see --notes for git-format-patch
for example).

> Tested ppc64le (POWER9) and ppc64/32 (POWER7).

Please write the full triples -- well at least enough that they are
usable, like, powerpc64-linux.  I'll assume you tested on Linux :-)

Segher

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-07 Thread Segher Boessenkool

On Mon, Aug 30, 2021 at 04:16:43PM -0500, Paul A. Clarke wrote:
> The confusing thing is that _builtin_mffsl is explicitly supported on earlier
> processors, if I read the code right (from gcc/config/rs6000/rs6000.md):

Yes.  It is very simple to implement everywhere, not significantly
slower than mffs.  So allowing this builtin to be used everywhere makes
it easier to use, with no real downsides.

Segher

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-07 Thread Segher Boessenkool

On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible.

__builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
automatically, fwiw.

> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.

Hrm.  Using function-like macros is begging for trouble, as usual.  But
the x86 version does this, so meh.

> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +double __fr;
> +long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +{
> +  /* Save enabled exceptions, disable all exceptions,
> +  and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));

The __volatile__ does likely not do what you want.  As far as I can see
you do not want one here anyway?

"volatile" does not order asm wrt fp insns, which you likely *do* want.

> +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };

You put spaces after only some casts, btw?  Well maybe I found the one
place you did it wrong, heh :-)  And you can avoid having so many parens
by making extra variables -- much more readable.

> +  switch (__rounding)

You do not need any of that __ either.

> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */

"dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
without VSX (so before p7) would have shown that, but do you have access
to any?  This is one of those things we are only told about a year after
it was added, because no one who tests often does that on so old
hardware :-)

So, okay for trunk (and backports after some burn-in) with that vsx_ok
fixed.  That asm needs fixing, but you can do that later.

Thanks!

Segher

RE: [PATCH] [MIPS] Hazard barrier return support

2021-10-07 Thread Maciej W. Rozycki

On Mon, 16 Aug 2021, Dragan Mladjenovic via Gcc-patches wrote:

> I'll respin the tests for mips64r2 and mips64 just in case.
> 
> This check would fail for -mips64. GAS accepts jr.hb for both '.set mips64' 
> and
> '.set mips64r2' and '.set mips32' for that matter. mips-opc.c has the 
> following
> comment:
> /* jr.hb is officially MIPS{32,64}R2, but it works on R1 as jr with
>the same hazard barrier effect.  */

 That it works on R1 is however not universally true, because as I recall 
at least the original 4Kc CPU does trap with RI on non-zero hint values.  
It may have qualified as a processor erratum though.  I just did a quick 
run-time check with a 5Kc CPU and that one does execute the JR.HB encoding 
just fine.  I don't have a specimen of 4Kc that I could readily use, even 
though I do believe I have a 4Kc module along with an Algorithmics board 
that could use it and which could possibly run NetBSD (I haven't checked).

 NB the semantics claim is even less true, because there is no hazard 
barrier effect with the R1 JR or JALR instructions ever, and for example 
the 4Kc does require to have any hazards resolved by placing a suitable 
number of intervening instructions (which do not have to be SSNOP however 
as the 4Kc is not superscalar).  So a typical instruction hazard such as a 
change of the ASID is requires 1 spacing instruction and 3 such for a data 
access and an instruction fetch respectively affected by a change of the 
ASID[1].

 I don't know where the note in mips-opc.c originally came from and at 
this point it cannot be verified.  As it was typical at the time, the 
change was posted with just a vague note referring a "mixed bag of opcode 
table changes" as it was applied[2], so it does not help.

References:

[1] "MIPS32 4K Processor Core Family Software User's Manual", MIPS 
Technologies, Inc., Document Number: MD00016, Revision 01.17, 
September 25, 2002, Table 2-6 "Instruction Hazards", p. 28

[2] "MIPS opcode table update", 

  Maciej

Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-10-07 Thread Paul A. Clarke via Gcc-patches

On Thu, Oct 07, 2021 at 05:25:54PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke wrote:
> > v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
> 
> There should not be a "v3" in the commit message.  The easy way to
> achieve this is put it inside the [] in the subject (as you did), and to
> mention the version history after a --- (see --notes for git-format-patch
> for example).

This is just a cover letter. Does it matter in that context?
(I have done as described in the patches which followed.)

> > Tested ppc64le (POWER9) and ppc64/32 (POWER7).
> 
> Please write the full triples -- well at least enough that they are
> usable, like, powerpc64-linux.  I'll assume you tested on Linux :-)

Yes, sorry.  All are "-linux", and I'll try to remember that for next time.

PC

RE: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-07 Thread Paul A. Clarke via Gcc-patches

On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> __builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
> automatically, fwiw.
> 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> 
> Hrm.  Using function-like macros is begging for trouble, as usual.  But
> the x86 version does this, so meh.
> 
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +double __fr;
> > +long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +{
> > +  /* Save enabled exceptions, disable all exceptions,
> > +and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> 
> The __volatile__ does likely not do what you want.  As far as I can see
> you do not want one here anyway?
> 
> "volatile" does not order asm wrt fp insns, which you likely *do* want.

Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
has no effect at all (6.47.1):

| The optional volatile qualifier has no effect. All basic asm blocks are
| implicitly volatile.

So, it could be removed without concern.

> > +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
> 
> You put spaces after only some casts, btw?  Well maybe I found the one
> place you did it wrong, heh :-)  And you can avoid having so many parens
> by making extra variables -- much more readable.

I'll fix this.

> > +  switch (__rounding)
> 
> You do not need any of that __ either.

I'm surprised that I don't. A .h file needs to be concerned about the
namespace it inherits, no?

> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> 
> "dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
> without VSX (so before p7) would have shown that, but do you have access
> to any?  This is one of those things we are only told about a year after
> it was added, because no one who tests often does that on so old
> hardware :-)
> 
> So, okay for trunk (and backports after some burn-in) with that vsx_ok
> fixed.  That asm needs fixing, but you can do that later.

OK.

Thanks!

PC

Ping: [PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-10-07 Thread Xionghu Luo via Gcc-patches

Ping, thanks.


On 2021/9/17 13:25, Xionghu Luo wrote:
> These two patches are updated version from:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html
> 
> Changes:
> 1. Fix alignment error in md files.
> 2. Replace rtx_equal_p with match_dup.
> 3. Use register_operand instead of gpc_reg_operand to align with
>vperm/xxperm.
> 4. Regression tested pass on P8LE.
> 
> Xionghu Luo (2):
>   rs6000: Fix wrong code generation for vec_sel [PR94613]
>   rs6000: Fold xxsel to vsel since they have same semantics
> 
>  gcc/config/rs6000/altivec.md  | 84 ++-
>  gcc/config/rs6000/rs6000-call.c   | 62 ++
>  gcc/config/rs6000/rs6000.c| 19 ++---
>  gcc/config/rs6000/vector.md   | 26 +++---
>  gcc/config/rs6000/vsx.md  | 25 --
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++
>  7 files changed, 193 insertions(+), 72 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c
> 

-- 
Thanks,
Xionghu

Re: [PATCH] [GCC12] Mention Intel AVX512-FP16 and _Float16 support.

2021-10-07 Thread Hongtao Liu via Gcc-patches

On Fri, Oct 1, 2021 at 6:13 PM Gerald Pfeifer  wrote:
>
> On Fri, 24 Sep 2021, Hongtao Liu via Gcc-patches wrote:
> > +  New ISA extension support for Intel AVX512-FP16 was added to GCC.
> > +  AVX512FP16 intrinsics are available [...]
>
> So, is it AVX512-FP16 or AVX512FP16?
Sorry for the confusion, the official name is AVX512-FP16[1], and for
simplicity, AVX512FP16 is used in the mail thread.
[1] 
https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
>
> Gerald



-- 
BR,
Hongtao

Re: [PATCH] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Kito Cheng

Hi Andrew:

> The error message would have been emitted from
> expand_builtin___clear_cache and maybe_emit_call_builtin___clear_cache
>  would not have been called from user code.
> All other uses of maybe_emit_call_builtin___clear_cache are internal
> to gcc and should have the correct mode so asserting is the right
> thing to do.

I got your point, and agree that, just like when we feed double types,
such type error should already catched by other place   :)

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-10-07 Thread Hongtao Liu via Gcc-patches

On Thu, Oct 7, 2021 at 11:38 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Thu, Oct 7, 2021 at 8:35 AM Martin Liška  wrote:
> >
> > Hello.
> >
> > The patch is approved, are you planning committing the changes?
Committed.
> >
> > Thanks,
> > Martin
>
> Hongtao is on holiday.  He will be back later today.
>
> --
> H.J.



-- 
BR,
Hongtao

[PATCH v3] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Kito Cheng

__builtin___clear_cache was able to accept constant address for the
argument, but it seems no longer accept recently, and it even not
accept constant address which is hold in variable when optimization is
enable:

```
void foo3(){
  void *yy = (void*)0x1000;
  __builtin___clear_cache(yy, yy);
}
```

So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did per
Jim Wilson's suggestion.

```
static cselib_val *
cselib_lookup_mem (rtx x, int create)
{
  ...
  addr_mode = GET_MODE (XEXP (x, 0));
  if (addr_mode == VOIDmode)
addr_mode = Pmode;
```

Changes v2 -> v3:
- Use gcc_assert rather than error, maybe_emit_call_builtin___clear_cache is
internal use only, and we already checked the type in other place.

Changes v1 -> v2:
- Check is CONST_INT intead of cehck mode, no new testcase, since
  constant value with other type like CONST_DOUBLE will catched by
  front-end.
e.g.
Code:
```c
void foo(){
  __builtin___clear_cache(1.11, 0);
}
```
Error message:
```
clearcache-double.c: In function 'foo':
clearcache-double.c:2:27: error: incompatible type for argument 1 of 
'__builtin___clear_cache'
2 |   __builtin___clear_cache(1.11, 0);
  |   ^~~~
  |   |
  |   double
clearcache-double.c:2:27: note: expected 'void *' but argument is of type 
'double'
```

gcc/ChangeLog:

PR target/100316
* builtins.c (maybe_emit_call_builtin___clear_cache): Allow
CONST_INT for BEGIN and END, and use gcc_assert rather than
error.

gcc/testsuite/ChangeLog:

PR target/100316
* gcc.c-torture/compile/pr100316.c: New.
---
 gcc/builtins.c | 10 --
 gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3e57eb03af0..80a1bb191c6 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5163,12 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, rtx 
end)
 void
 maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
 {
-  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
-  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
-{
-  error ("both arguments to %<__builtin___clear_cache%> must be pointers");
-  return;
-}
+  gcc_assert ((GET_MODE (begin) == ptr_mode || GET_MODE (begin) == Pmode
+  || CONST_INT_P (begin))
+ && (GET_MODE (end) == ptr_mode || GET_MODE (end) == Pmode
+ || CONST_INT_P (end)));
 
   if (targetm.have_clear_cache ())
 {
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
new file mode 100644
index 000..38eca86f49f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
@@ -0,0 +1,18 @@
+void foo(){
+  __builtin___clear_cache(0, 0);
+}
+
+void foo1(){
+  __builtin___clear_cache((void*)0, (void*)0);
+}
+
+void foo2(){
+  void *yy = 0;
+  __builtin___clear_cache(yy, yy);
+}
+
+void foo3(){
+  void *yy = (void*)0x1000;
+  __builtin___clear_cache(yy, yy);
+}
+
-- 
2.33.0

libgcc patch committed: Use .init_array for constructor if available

2021-10-07 Thread Ian Lance Taylor via Gcc-patches

This libgcc patch changes the morestack support to use the
.init_array. section for the constructor rather than the
.ctors.65535 section, if the target supports .init_array.  This should
avoid pointless mixing .ctors and .init_array.  Bootstrapped and ran
Go and split-stack tests on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

* config/i386/morestack.S: Use .init_array for constructor if
available.
* config/rs6000/morestack.S: Likewise.
* config/s390/morestack.S: Likewise.
diff --git a/libgcc/config/i386/morestack.S b/libgcc/config/i386/morestack.S
index 718dbb1d5d2..2e748eda8a7 100644
--- a/libgcc/config/i386/morestack.S
+++ b/libgcc/config/i386/morestack.S
@@ -23,6 +23,7 @@
 # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # .
 
+#include "auto-host.h"
 
 # Support for allocating more stack space when using -fsplit-stack.
 # When a function discovers that it needs more stack space, it will
@@ -75,7 +76,7 @@
 # The amount of extra space we ask for.  In general this has to be
 # enough for the dynamic loader to find a symbol and for a signal
 # handler to run.
-   
+
 #ifndef __x86_64__
 #define BACKOFF (1024)
 #else
@@ -160,7 +161,7 @@ __morestack_non_split:
cmpb$0x55,1(%eax)
je  2f
 
-3: 
+3:
movl%eax,4(%esp)# Update return address.
 
popl%eax# Restore %eax and stack.
@@ -426,7 +427,7 @@ __morestack:
 
 # This is the cleanup code called by the stack unwinder when unwinding
 # through the code between .LEHB0 and .LEHE0 above.
-   
+
 .L1:
.cfi_restore_state
subl$16,%esp# Maintain 16 byte alignment.
@@ -600,7 +601,7 @@ __morestack:
 
 # This is the cleanup code called by the stack unwinder when unwinding
 # through the code between .LEHB0 and .LEHE0 above.
-   
+
 .L1:
.cfi_restore_state
subq$16,%rsp# Maintain 16 byte alignment.
@@ -848,7 +849,11 @@ __morestack_make_guard:
 # Make __stack_split_initialize a high priority constructor.  FIXME:
 # This is ELF specific.
 
+#if HAVE_INITFINI_ARRAY_SUPPORT
+   .section.init_array.0,"aw",@progbits
+#else
.section.ctors.65535,"aw",@progbits
+#endif
 
 #ifndef __LP64__
.align  4
diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index a2e255e5c21..509e506a6de 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -24,6 +24,8 @@
 # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # .
 
+#include 
+
 #if _CALL_ELF == 2
.abiversion 2
 #define PARAMS 32
@@ -396,7 +398,11 @@ ENTRY0(__morestack_make_guard)
 
 
 # Make __stack_split_initialize a high priority constructor.
+#if HAVE_INITFINI_ARRAY_SUPPORT
+   .section .init_array.0,"aw",@progbits
+#else
.section .ctors.65535,"aw",@progbits
+#endif
.p2align 3
.quad __stack_split_initialize
.quad __morestack_load_mmap
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
index 80349fd869c..d873de90d12 100644
--- a/libgcc/config/s390/morestack.S
+++ b/libgcc/config/s390/morestack.S
@@ -28,6 +28,8 @@
 # also cover signal frame size.
 #define BACKOFF 0x1000
 
+#include 
+
 # The __morestack function.
 
.global __morestack
@@ -594,7 +596,11 @@ __morestack_make_guard:
 
 # Make __stack_split_initialize a high priority constructor.
 
+#if HAVE_INITFINI_ARRAY_SUPPORT
+   .section .init_array.0,"aw",@progbits
+#else
.section .ctors.65535,"aw",@progbits
+#endif
 
 #ifndef __LP64__
.align  4

Re: [PATCH 2/8] tree-dynamic-object-size: New pass

2021-10-07 Thread Siddhesh Poyarekar


On 10/8/21 03:44, Siddhesh Poyarekar wrote:

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 133b82eef38..082d167cd65 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12777,6 +12777,17 @@ assert (__builtin_object_size (q, 1) == sizeof 
(var.b));
  @end smallexample
  @end deftypefn
  
+@deftypefn {Built-in Function} {size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})

+is similar to @code{__builtin_object_size} in that it returns a number of bytes
+from @var{ptr} to the end of the object @var{ptr} pointer points to, except
+that the size returned may not be a constant.  This results in successful
+evaluation of object size estimates in a wider range of use cases and can be
+more precise than @code{__builtin_object_size}, but it incurs a performance
+penalty since it may add a runtime overhead on size computation.  Semantics of
+@var{type} as well as return values in case it is not possible to determine
+which objects @var{ptr} points to at compile time are the same as in the case
+of @code{__builtin_object_size}.
+


Looks like I had pushed my scratch build without this doc update; this 
fails because I didn't terminate the @deftypefn.  I've fixed it locally 
and will post an updated series along with resolution of feedback I 
receive on the series.


Thanks,
Siddhesh

[PATCH] c++: Comment out announce_function to prevent ICE [PR102426]

2021-10-07 Thread qingzhe huang via Gcc-patches

This "announce_function" is not a purely readonly
function. Instead it calls tsubst which modifies
global variable "current_function_decl". If it is
placed at parser code directory level, recursion
might happen and set/reset variable might interleave
and can cause ICE. No test case is provided because
this segment fault only occurs when compiler option
"-quiet" is not set which is usually set by driver.
That is why usual testcases cannot expose this issue.
Even though this issue usually doesn't affect endusers,
however, it is a headache for development.

gcc/cp/ChangeLog:
PR c++/102426
* decl.c (start_preparsed_function): Comment
out announce_function.
(emit_coro_helper): Comment out
announce_function.

gcc/ChangeLog:
PR c++/102426
* toplev.c (get_src_pwd): Add comment
to note announce_function usage.

Signed-off-by: qingzhe huang 
---
 gcc/cp/decl.c | 4 
 gcc/toplev.c  | 5 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 2d30c790b93..94d3a2c1cba 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -16904,9 +16904,6 @@ start_preparsed_function (tree decl1, tree attrs, int 
flags)
  where store_parm_decls will find them.  */
   tree current_function_parms = DECL_ARGUMENTS (decl1);
 
-  /* Let the user know we're compiling this function.  */
-  announce_function (decl1);
-
   gcc_assert (DECL_INITIAL (decl1));
 
   /* This function may already have been parsed, in which case just
@@ -17472,7 +17469,6 @@ emit_coro_helper (tree helper)
   current_function_decl = helper;
   begin_scope (sk_function_parms, NULL);
   store_parm_decls (DECL_ARGUMENTS (helper));
-  announce_function (helper);
   allocate_struct_function (helper, false);
   cfun->language = ggc_cleared_alloc ();
   poplevel (1, 0, 1);
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 1bb1794be96..4a3ca1aef4a 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -221,7 +221,10 @@ get_src_pwd (void)
 }
 
 /* Called when the start of a function definition is parsed,
-   this function prints on stderr the name of the function.  */
+   this function prints on stderr the name of the function.
+   NOTE: Do not use this function at directory gcc/cp level
+   or below because it might recurse and interleave with
+   function frame parsing which can cause crash. */
 void
 announce_function (tree decl)
 {
-- 
2.17.1

Re: Merge from trunk to gccgo branch

2021-10-07 Thread Ian Lance Taylor via Gcc-patches

I merged trunk revision f49e3d28be44179f07b8a06159139ce77096dda7 to
the gccgo branch.

Ian

[PATCH] c++: Comment out announce_function to prevent ICE [PR102624]

2021-10-07 Thread qingzhe huang via Gcc-patches

This "announce_function" is not a purely readonly
function. Instead it calls tsubst which modifies
global variable "current_function_decl". If it is
placed at parser code directory level, recursion
might happen and set/reset variable might interleave
and can cause ICE. No test case is provided because
this segment fault only occurs when compiler option
"-quiet" is not set which is usually set by driver.
That is why usual testcases cannot expose this issue.
Even though this issue usually doesn't affect endusers,
however, it is a headache for development.

gcc/cp/ChangeLog:
PR c++/102624
* decl.c (start_preparsed_function): Comment
out announce_function.
(emit_coro_helper): Comment out
announce_function.

gcc/ChangeLog:
PR c++/102624
* toplev.c (get_src_pwd): Add comment
to note announce_function usage.

Signed-off-by: qingzhe huang 
---
 gcc/cp/decl.c | 4 
 gcc/toplev.c  | 5 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 2d30c790b93..94d3a2c1cba 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -16904,9 +16904,6 @@ start_preparsed_function (tree decl1, tree attrs, int 
flags)
  where store_parm_decls will find them.  */
   tree current_function_parms = DECL_ARGUMENTS (decl1);
 
-  /* Let the user know we're compiling this function.  */
-  announce_function (decl1);
-
   gcc_assert (DECL_INITIAL (decl1));
 
   /* This function may already have been parsed, in which case just
@@ -17472,7 +17469,6 @@ emit_coro_helper (tree helper)
   current_function_decl = helper;
   begin_scope (sk_function_parms, NULL);
   store_parm_decls (DECL_ARGUMENTS (helper));
-  announce_function (helper);
   allocate_struct_function (helper, false);
   cfun->language = ggc_cleared_alloc ();
   poplevel (1, 0, 1);
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 1bb1794be96..4a3ca1aef4a 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -221,7 +221,10 @@ get_src_pwd (void)
 }
 
 /* Called when the start of a function definition is parsed,
-   this function prints on stderr the name of the function.  */
+   this function prints on stderr the name of the function.
+   NOTE: Do not use this function at directory gcc/cp level
+   or below because it might recurse and interleave with
+   function frame parsing which can cause crash. */
 void
 announce_function (tree decl)
 {
-- 
2.17.1

Re: [PATCH] c++: Comment out announce_function to prevent ICE [PR102426]

2021-10-07 Thread Nick Huang via Gcc-patches

I am terribly sorry for my typo of wrong PR #, it should be 102624.
Please ignore this email thread and I resend the patch in next email
with correct subject:
[PATCH] c++: Comment out announce_function to prevent ICE [PR102624]

Once again my apologies for spam.

On Fri, Oct 8, 2021 at 12:31 AM qingzhe huang  wrote:
>
> This "announce_function" is not a purely readonly
> function. Instead it calls tsubst which modifies
> global variable "current_function_decl". If it is
> placed at parser code directory level, recursion
> might happen and set/reset variable might interleave
> and can cause ICE. No test case is provided because
> this segment fault only occurs when compiler option
> "-quiet" is not set which is usually set by driver.
> That is why usual testcases cannot expose this issue.
> Even though this issue usually doesn't affect endusers,
> however, it is a headache for development.
>
> gcc/cp/ChangeLog:
> PR c++/102426
> * decl.c (start_preparsed_function): Comment
> out announce_function.
> (emit_coro_helper): Comment out
> announce_function.
>
> gcc/ChangeLog:
> PR c++/102426
> * toplev.c (get_src_pwd): Add comment
> to note announce_function usage.
>
> Signed-off-by: qingzhe huang 
> ---
>  gcc/cp/decl.c | 4 
>  gcc/toplev.c  | 5 -
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 2d30c790b93..94d3a2c1cba 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -16904,9 +16904,6 @@ start_preparsed_function (tree decl1, tree attrs, int 
> flags)
>   where store_parm_decls will find them.  */
>tree current_function_parms = DECL_ARGUMENTS (decl1);
>
> -  /* Let the user know we're compiling this function.  */
> -  announce_function (decl1);
> -
>gcc_assert (DECL_INITIAL (decl1));
>
>/* This function may already have been parsed, in which case just
> @@ -17472,7 +17469,6 @@ emit_coro_helper (tree helper)
>current_function_decl = helper;
>begin_scope (sk_function_parms, NULL);
>store_parm_decls (DECL_ARGUMENTS (helper));
> -  announce_function (helper);
>allocate_struct_function (helper, false);
>cfun->language = ggc_cleared_alloc ();
>poplevel (1, 0, 1);
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 1bb1794be96..4a3ca1aef4a 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -221,7 +221,10 @@ get_src_pwd (void)
>  }
>
>  /* Called when the start of a function definition is parsed,
> -   this function prints on stderr the name of the function.  */
> +   this function prints on stderr the name of the function.
> +   NOTE: Do not use this function at directory gcc/cp level
> +   or below because it might recurse and interleave with
> +   function frame parsing which can cause crash. */
>  void
>  announce_function (tree decl)
>  {
> --
> 2.17.1
>

Re: [PATCH 0/8] __builtin_dynamic_object_size and more

2021-10-07 Thread Siddhesh Poyarekar


On 10/8/21 03:44, Siddhesh Poyarekar wrote:

(from about 4% to 70% in bash), but that could well be due to the _chk


I should also clarify that this is for memcpy.  For all fortifiable 
functions, the coverage percentage went from 30.81% to 84.5% for bash. 
Below is the full table.  Please note that this is only based on symbols 
emitted in the end as I didn't want to rebuild the _FORTIFIED_SOURCE=2 
binaries, so it does not take into account the fact that _chk could get 
folded to regular calls if we know at compile time that it's safe to do so.


No more posting patches at 4am; it only leads to more clarification 
follow-ups :/


  Func  F(2)U(2)Cov(2)  F(3)U(3)Cov(3)

 asprintf  1   0100.00%1   0100.00%
*confstr   0   2  0.00%1   1 50.00%
 fdelt10   0100.00%   10   0100.00%
*fgets 0   1  0.00%1   0100.00%
 fprintf  82   0100.00%   82   0100.00%
 getcwd0   3  0.00%0   3  0.00%
*getgroups 0   1  0.00%1   0100.00%
 gethostname   0   1  0.00%0   1  0.00%
 longjmp  23   0100.00%   23   0100.00%
 mbsnrtowcs0   2  0.00%0   2  0.00%
*mbsrtowcs 0   1  0.00%1   0100.00%
*mbstowcs  0  10  0.00%5   5 50.00%
*memcpy7 170  3.95%  116  40 74.36%
*memmove   4  21 16.00%   12  17 41.38%
*memset0  24  0.00%3  21 12.50%
 printf  150   0100.00%  150   0100.00%
*read  0  19  0.00%8  11 42.11%
*readlink  0   3  0.00%0   3  0.00%
 snprintf 11   0100.00%   11   0100.00%
 sprintf  28   0100.00%   28   0100.00%
*strcat0   5  0.00%8   0100.00%
*strcpy6 440  1.35%  413  36 91.98%
*strncpy   2  52  3.70%   33  19 63.46%
 vfprintf 14   0100.00%   14   0100.00%
 vsnprintf 4   0100.00%4   0100.00%
 wcrtomb   0   7  0.00%0   7  0.00%
*wcsrtombs 0   3  0.00%2   1 66.67%
 wctomb0   3  0.00%0   3  0.00%
-
Total342 768 30.81% 927  170 84.50%

Re: [PATCH v2] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Richard Biener via Gcc-patches

On Thu, Oct 7, 2021 at 6:45 PM Kito Cheng  wrote:
>
> __builtin___clear_cache was able to accept constant address for the
> argument, but it seems no longer accept recently, and it even not
> accept constant address which is hold in variable when optimization is
> enable:
>
> ```
> void foo3(){
>   void *yy = (void*)0x1000;
>   __builtin___clear_cache(yy, yy);
> }
> ```
>
> So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did 
> per
> Jim Wilson's suggestion.
>
> ```
> static cselib_val *
> cselib_lookup_mem (rtx x, int create)
> {
>   ...
>   addr_mode = GET_MODE (XEXP (x, 0));
>   if (addr_mode == VOIDmode)
> addr_mode = Pmode;
> ```
>
> Changes v1 -> v2:
> - Check is CONST_INT intead of cehck mode, no new testcase, since
>   constant value with other type like CONST_DOUBLE will catched by
>   front-end.
> e.g.
> Code:
> ```c
> void foo(){
>   __builtin___clear_cache(1.11, 0);
> }
> ```
> Error message:
> ```
> clearcache-double.c: In function 'foo':
> clearcache-double.c:2:27: error: incompatible type for argument 1 of 
> '__builtin___clear_cache'
> 2 |   __builtin___clear_cache(1.11, 0);
>   |   ^~~~
>   |   |
>   |   double
> clearcache-double.c:2:27: note: expected 'void *' but argument is of type 
> 'double'
> ```
>
> gcc/ChangeLog:
>
> PR target/100316
> * builtins.c (maybe_emit_call_builtin___clear_cache): Allow
> CONST_INT for BEGIN and END.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100316
> * gcc.c-torture/compile/pr100316.c: New.
> ---
>  gcc/builtins.c |  6 --
>  gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
>  2 files changed, 22 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 3e57eb03af0..8c3ad302468 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5163,8 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, 
> rtx end)
>  void
>  maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
>  {
> -  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
> -  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
> +  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode
> +   && !CONST_INT_P (begin))
> +  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode
> + && !CONST_INT_P (end)))
>  {
>error ("both arguments to %<__builtin___clear_cache%> must be 
> pointers");
>return;

I don't think it makes sense to emit user-facing diagnostics about
this from here,
user code calling this go through expand_builtin___clear_cache which performs
the check and diagnostic on the argument types which is more appropriate and
the code above is eventually called from special backend code directly but
that's nothing the user can do anything about.

So how about simply removing this check?

Thanks,
Richard.

> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> new file mode 100644
> index 000..38eca86f49f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> @@ -0,0 +1,18 @@
> +void foo(){
> +  __builtin___clear_cache(0, 0);
> +}
> +
> +void foo1(){
> +  __builtin___clear_cache((void*)0, (void*)0);
> +}
> +
> +void foo2(){
> +  void *yy = 0;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> +void foo3(){
> +  void *yy = (void*)0x1000;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> --
> 2.33.0
>

Re: [PATCH v3] [PR/target 100316] Allow constant address for builtin_clear_cache.

2021-10-07 Thread Richard Biener via Gcc-patches

On Fri, Oct 8, 2021 at 4:40 AM Kito Cheng  wrote:
>
> __builtin___clear_cache was able to accept constant address for the
> argument, but it seems no longer accept recently, and it even not
> accept constant address which is hold in variable when optimization is
> enable:
>
> ```
> void foo3(){
>   void *yy = (void*)0x1000;
>   __builtin___clear_cache(yy, yy);
> }
> ```
>
> So this patch make BEGIN and END accept VOIDmode, like cselib_lookup_mem did 
> per
> Jim Wilson's suggestion.
>
> ```
> static cselib_val *
> cselib_lookup_mem (rtx x, int create)
> {
>   ...
>   addr_mode = GET_MODE (XEXP (x, 0));
>   if (addr_mode == VOIDmode)
> addr_mode = Pmode;
> ```
>
> Changes v2 -> v3:
> - Use gcc_assert rather than error, maybe_emit_call_builtin___clear_cache is
> internal use only, and we already checked the type in other place.
>
> Changes v1 -> v2:
> - Check is CONST_INT intead of cehck mode, no new testcase, since
>   constant value with other type like CONST_DOUBLE will catched by
>   front-end.
> e.g.
> Code:
> ```c
> void foo(){
>   __builtin___clear_cache(1.11, 0);
> }
> ```
> Error message:
> ```
> clearcache-double.c: In function 'foo':
> clearcache-double.c:2:27: error: incompatible type for argument 1 of 
> '__builtin___clear_cache'
> 2 |   __builtin___clear_cache(1.11, 0);
>   |   ^~~~
>   |   |
>   |   double
> clearcache-double.c:2:27: note: expected 'void *' but argument is of type 
> 'double'
> ```
>
> gcc/ChangeLog:
>
> PR target/100316
> * builtins.c (maybe_emit_call_builtin___clear_cache): Allow
> CONST_INT for BEGIN and END, and use gcc_assert rather than
> error.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100316
> * gcc.c-torture/compile/pr100316.c: New.
> ---
>  gcc/builtins.c | 10 --
>  gcc/testsuite/gcc.c-torture/compile/pr100316.c | 18 ++
>  2 files changed, 22 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100316.c
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 3e57eb03af0..80a1bb191c6 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5163,12 +5163,10 @@ default_emit_call_builtin___clear_cache (rtx begin, 
> rtx end)
>  void
>  maybe_emit_call_builtin___clear_cache (rtx begin, rtx end)
>  {
> -  if ((GET_MODE (begin) != ptr_mode && GET_MODE (begin) != Pmode)
> -  || (GET_MODE (end) != ptr_mode && GET_MODE (end) != Pmode))
> -{
> -  error ("both arguments to %<__builtin___clear_cache%> must be 
> pointers");
> -  return;
> -}
> +  gcc_assert ((GET_MODE (begin) == ptr_mode || GET_MODE (begin) == Pmode
> +  || CONST_INT_P (begin))
> + && (GET_MODE (end) == ptr_mode || GET_MODE (end) == Pmode
> + || CONST_INT_P (end)));

OK I guess.

I'm not 100% sure we might not ICE here when using
__builtin_clear_cache on a pointer
with some other than the default address-space which might have a mode
that's not
ptr_mode or Pmode?

Thanks,
Richard.

>if (targetm.have_clear_cache ())
>  {
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100316.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> new file mode 100644
> index 000..38eca86f49f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr100316.c
> @@ -0,0 +1,18 @@
> +void foo(){
> +  __builtin___clear_cache(0, 0);
> +}
> +
> +void foo1(){
> +  __builtin___clear_cache((void*)0, (void*)0);
> +}
> +
> +void foo2(){
> +  void *yy = 0;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> +void foo3(){
> +  void *yy = (void*)0x1000;
> +  __builtin___clear_cache(yy, yy);
> +}
> +
> --
> 2.33.0
>

73 matches

Mail list logo