[PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-06-14 Thread Kewen.Lin via Gcc-patches
Hi,

Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
if func->decl is not null but no cgraph node is available for it.
As PR105818 shows, this can give an unexpected result.  In the
case from PR105818, when parsing the bar decl in function foo,
cfun is the function structure for foo, for which there is no
cgraph node yet, so optimize_function_for_size_p returns
OPTIMIZE_SIZE_NO.  That is incorrect: the context is optimizing
for size, and the flag optimize_size is true.

This patch makes optimize_function_for_size_p check optimize_size
in that case, as it already does when func->decl is unavailable.

One regression failure was exposed on aarch64-linux-gnu:

PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
-DPREVENT_OPTIMIZATION  line 21 x == 10 - i

The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
used in function fold_range_test during C parsing.  It uses
optimize_function_for_speed_p, which is the inverse of
optimize_function_for_size_p.  At that point cfun->decl is valid
but has no cgraph node, so without this patch
optimize_function_for_speed_p eventually returns true, while with
this patch it returns false.  Since the command line option -Os
is specified, there is no reason to interpret it as "for speed".
I think this failure is expected, so the test case is adjusted
accordingly.

Is it ok for trunk?

BR,
Kewen
-

PR target/105818

gcc/ChangeLog:

* predict.cc (optimize_function_for_size_p): Check optimize_size when
func->decl is valid but its cgraph node is unavailable.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr105818.c: New test.
* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
---
 gcc/predict.cc  | 2 +-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +-
 gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
 3 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5734e4c8516..6c60a973236 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
   cgraph_node *n = cgraph_node::get (fun->decl);
   if (n)
 return n->optimize_for_size_p ();
-  return OPTIMIZE_SIZE_NO;
+  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
 }

 /* Return true if function FUN should always be optimized for speed.  */
diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c 
b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
index 68aa6c63d71..14ca94ad37d 100644
--- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
+++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
@@ -17,7 +17,7 @@ foo (int x, int y, int z)
   int i = 0;
   while (x > 3 && y > 3 && z > 3)
 {  /* { dg-final { gdb-test .+2 "i" "v + 1" } } */
-   /* { dg-final { gdb-test .+1 "x" "10 - i" } } */
+   /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { 
aarch64*-*-* && { any-opts "-Os" } } } } } */
   bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { 
aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
   i++, x--, y -= 2, z -= 3;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c 
b/gcc/testsuite/gcc.target/powerpc/pr105818.c
new file mode 100644
index 000..18781edbf9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
@@ -0,0 +1,9 @@
+/* { dg-options "-Os -fno-tree-vectorize" } */
+
+#pragma GCC optimize "-fno-tree-vectorize"
+
+void
+foo (void)
+{
+  void bar (void);
+}
--
2.27.0


[PATCH] tree-optimization/105946 - avoid accessing excess args from uninit diag

2022-06-14 Thread Richard Biener via Gcc-patches
The uninit diagnostics use pass-by-reference and access attributes,
but that code iterates over function type arguments, which can in
some cases apparently outrun the actual call arguments, leading to
ICEs.  The following simply ignores arguments that are not present.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-06-14  Richard Biener  

PR tree-optimization/105946
* tree-ssa-uninit.cc (maybe_warn_pass_by_reference):
Do not look at arguments not specified in the function call.
---
 gcc/tree-ssa-uninit.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-uninit.cc b/gcc/tree-ssa-uninit.cc
index b48fcf1a8ba..f326f1775c0 100644
--- a/gcc/tree-ssa-uninit.cc
+++ b/gcc/tree-ssa-uninit.cc
@@ -797,6 +797,9 @@ maybe_warn_pass_by_reference (gcall *stmt, wlimits &wlims)
 {
   ++argno;
 
+  if (argno > nargs)
+   break;
+
   if (!POINTER_TYPE_P (argtype))
continue;
 
-- 
2.35.3


Re: [PATCH] Do not erase warning data in gimple_set_location

2022-06-14 Thread Eric Botcazou via Gcc-patches
> Hmm, I think instead of special-casing UNKNOWN_LOCATION
> what gimple_set_location should probably do is either not copy
> warnings at all or union them.  Btw, gimple_set_location also
> removes a previously set BLOCK (but gimple_set_block preserves
> the location locus and diagnostic override).
> 
> So I'd be tempted to axe the copy_warning () completely here. 

The first thing I tried, but it regressed the original testcase IIRC.

Even my minimal patch manages to break bootstrap on ARM:

buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libcpp/lex.cc:
1523:9: error: pointer used after ‘void operator delete(void*, std::size_t)’ 
[-Werror=use-after-free]
# 00:31:04 make[3]: *** [Makefile:227: lex.o] Error 1
# 00:31:04 make[2]: *** [Makefile:9527: all-stage3-libcpp] Error 2
# 00:31:35 make[1]: *** [Makefile:25887: stage3-bubble] Error 2
# 00:31:35 make: *** [Makefile:1072: all] Error 2

  /* Don't warn for cases like when a cdtor returns 'this' on ARM.  */
  else if (warning_suppressed_p (var, OPT_Wuse_after_free))
return;

because warning-control.cc:copy_warning also clobbers the warning data of the 
destination.  We have in cp/decl.cc:maybe_return_this the lines:

  /* Return the address of the object.  */
  tree val = DECL_ARGUMENTS (current_function_decl);
  suppress_warning (val, OPT_Wuse_after_free);

-Wuse-after-free is suppressed for the location of VAL and the TREE_NO_WARNING 
bit is set on it.  But other expressions may have the same location as VAL with 
the TREE_NO_WARNING bit _not_ set, so when copy_warning (expr, expr) is called 
for them (we do that a lot after failed folding), copy_warning erases the 
warning data of the location.

I have installed the obvious fixlet after testing on x86-64/Linux, but the 
decoupling between TREE_NO_WARNING bit and location looks a bit problematic.
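To make the clobbering scenario concrete, here is a small Python model of the pre-fix behaviour (names and structure are mine; the real code keys nowarn_map by a location_t and pairs it with a per-tree TREE_NO_WARNING bit):

```python
# Model: warning suppressions live in a map keyed by location, plus a
# per-expression no-warning bit.  Two expressions can share one location.
nowarn_map = {}

class Expr:
    def __init__(self, loc, no_warning=False):
        self.loc = loc
        self.no_warning = no_warning

def suppress_warning(expr):
    # like cp/decl.cc:maybe_return_this does for VAL
    expr.no_warning = True
    nowarn_map[expr.loc] = {"-Wuse-after-free"}

def copy_warning_old(to, frm):
    # pre-fix behaviour: when the source's no-warning bit is clear,
    # unconditionally erase the destination location's entry
    if frm.no_warning:
        nowarn_map[to.loc] = set(nowarn_map.get(frm.loc, set()))
    else:
        nowarn_map.pop(to.loc, None)

loc = 42
val = Expr(loc)
suppress_warning(val)           # VAL's location now carries suppression data
other = Expr(loc)               # different tree, same location, bit not set
copy_warning_old(other, other)  # self-copy after failed folding
assert loc not in nowarn_map    # VAL's suppression was clobbered
```

The fix corresponds to making the erasing branch conditional on the source bit being set and returning early when to == from.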


* warning-control.cc (copy_warning) [generic version]: Do not erase
the warning data of the destination location when the no-warning
bit is not set on the source.
(copy_warning) [tree version]: Return early if TO is equal to FROM.
(copy_warning) [gimple version]: Likewise.
testsuite/
* g++.dg/warn/Wuse-after-free5.C: New test.

-- 
Eric Botcazou

diff --git a/gcc/warning-control.cc b/gcc/warning-control.cc
index 0cbb4f079fa..7e9e701cfbe 100644
--- a/gcc/warning-control.cc
+++ b/gcc/warning-control.cc
@@ -191,7 +191,7 @@ void copy_warning (ToType to, FromType from)
 {
   const location_t to_loc = get_location (to);
 
-  bool supp = get_no_warning_bit (from);
+  const bool supp = get_no_warning_bit (from);
 
   nowarn_spec_t *from_spec = get_nowarn_spec (from);
   if (RESERVED_LOCATION_P (to_loc))
@@ -209,7 +209,7 @@ void copy_warning (ToType to, FromType from)
 	  nowarn_spec_t tem = *from_spec;
 	  nowarn_map->put (to_loc, tem);
 	}
-  else
+  else if (supp)
 	{
 	  if (nowarn_map)
 	nowarn_map->remove (to_loc);
@@ -226,6 +226,8 @@ void copy_warning (ToType to, FromType from)
 void
 copy_warning (tree to, const_tree from)
 {
+  if (to == from)
+return;
   copy_warning(to, from);
 }
 
@@ -250,5 +252,7 @@ copy_warning (gimple *to, const_tree from)
 void
 copy_warning (gimple *to, const gimple *from)
 {
+  if (to == from)
+return;
   copy_warning(to, from);
 }
// Check the suppression of -Wuse-after-free for destructors on ARM
// { dg-do compile }
// { dg-options "-Wuse-after-free" }

struct range_label {
  virtual ~range_label();
};

struct unpaired_bidi_rich_location {
  struct custom_range_label : range_label {};
  unpaired_bidi_rich_location(int);
  custom_range_label m_custom_label;
};

void maybe_warn_bidi_on_close() { unpaired_bidi_rich_location(0); }


[PATCH] middle-end/105965 - add missing v_c_e <{ el }> simplification

2022-06-14 Thread Richard Biener via Gcc-patches
When we gained the simplification of bit-field-ref to view-convert,
we lost the ability to detect FMAs, since we cannot look through

  _1 = {_10};
  _11 = VIEW_CONVERT_EXPR(_1);

The following amends the (view_convert CONSTRUCTOR) pattern
to handle this case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-06-14  Richard Biener  

PR middle-end/105965
* match.pd (view_convert CONSTRUCTOR): Handle single-element
CTOR case.

* gcc.target/i386/pr105965.c: New testcase.
---
 gcc/match.pd | 17 +
 gcc/testsuite/gcc.target/i386/pr105965.c | 12 
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105965.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 44a385b912d..776c9c6489a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3669,12 +3669,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && TYPE_UNSIGNED (TREE_TYPE (@1)
(view_convert @1)))
 
-/* Simplify a view-converted empty constructor.  */
+/* Simplify a view-converted empty or single-element constructor.  */
 (simplify
   (view_convert CONSTRUCTOR@0)
-  (if (TREE_CODE (@0) != SSA_NAME
-   && CONSTRUCTOR_NELTS (@0) == 0)
-   { build_zero_cst (type); }))
+  (with
+   { tree ctor = (TREE_CODE (@0) == SSA_NAME
+ ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0); }
+   (switch
+(if (CONSTRUCTOR_NELTS (ctor) == 0)
+ { build_zero_cst (type); })
+(if (CONSTRUCTOR_NELTS (ctor) == 1
+&& VECTOR_TYPE_P (TREE_TYPE (ctor))
+&& operand_equal_p (TYPE_SIZE (type),
+TYPE_SIZE (TREE_TYPE
+  (CONSTRUCTOR_ELT (ctor, 0)->value
+ (view_convert { CONSTRUCTOR_ELT (ctor, 0)->value; })
 
 /* Re-association barriers around constants and other re-association
barriers can be removed.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr105965.c 
b/gcc/testsuite/gcc.target/i386/pr105965.c
new file mode 100644
index 000..5bb53790de8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105965.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfma -mfpmath=sse" } */
+
+typedef float v1sf __attribute__((vector_size(4)));
+
+v1sf
+foo43 (v1sf a, v1sf b, v1sf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler "fmadd" } } */
-- 
2.35.3


Re: [PATCH v2] RISC-V: bitmanip: improve constant-loading for (1ULL << 31) in DImode

2022-06-14 Thread Philipp Tomsich
Thanks, backport applied to releases/gcc-12!


On Tue, 7 Jun 2022 at 12:28, Kito Cheng  wrote:

> > OK for backport?
>
> OK, it seems no issue after a week :)
>
>
> >
> > On Thu, 2 Jun 2022 at 21:23, Philipp Tomsich 
> wrote:
> > >
> > > Thanks, applied to trunk!
> > >
> > > On Thu, 2 Jun 2022 at 15:17, Kito Cheng  wrote:
> > > >
> > > > LGTM
> > > >
> > > > On Mon, May 30, 2022 at 5:52 AM Philipp Tomsich
> > > >  wrote:
> > > > >
> > > > > The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
> > > > > bits above 31 only (to side-step any issues with the negative SImode
> > > > > value 0x8000/(-1ull << 31)/(1 << 31)).  This moves the special
> > > > > handling of this SImode value (i.e. the check for (-1ull << 31)) to
> > > > > riscv.cc and relaxes the SINGLE_BIT_MASK_OPERAND() test.
> > > > >
> > > > > With this, the code-generation for loading (1ULL << 31) from:
> > > > > li  a0,1
> > > > > sllia0,a0,31
> > > > > to:
> > > > > bseti   a0,zero,31
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/riscv/riscv.cc (riscv_build_integer_1): Rewrite
> value as
> > > > > (-1 << 31) for the single-bit case, when operating on (1
> << 31)
> > > > > in SImode.
> > > > > * gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND):
> Allow for
> > > > > any single-bit value, moving the special case for (1 <<
> 31) to
> > > > > riscv_build_integer_1 (in riscv.c).
> > > > >
> > > > > Signed-off-by: Philipp Tomsich 
> > > > >
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > > - Use HOST_WIDE_INT_1U/HOST_WIDE_INT_M1U instead of constants.
> > > > > - Fix some typos in the comment above the rewrite of the value.
> > > > > - Update the comment to clarify that we expect a LUI to be emitted
> for
> > > > >   the SImode case (i.e. sign-extended for RV64) of (1 << 31).
> > > > >
> > > > >  gcc/config/riscv/riscv.cc |  9 +
> > > > >  gcc/config/riscv/riscv.h  | 11 ---
> > > > >  2 files changed, 13 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > > > index f83dc796d88..2e83ca07394 100644
> > > > > --- a/gcc/config/riscv/riscv.cc
> > > > > +++ b/gcc/config/riscv/riscv.cc
> > > > > @@ -420,6 +420,15 @@ riscv_build_integer_1 (struct
> riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
> > > > >/* Simply BSETI.  */
> > > > >codes[0].code = UNKNOWN;
> > > > >codes[0].value = value;
> > > > > +
> > > > > +  /* RISC-V sign-extends all 32bit values that live in a 32bit
> > > > > +register.  To avoid paradoxes, we thus need to use the
> > > > > +sign-extended (negative) representation (-1 << 31) for the
> > > > > +value, if we want to build (1 << 31) in SImode.  This will
> > > > > +then expand to an LUI instruction.  */
> > > > > +  if (mode == SImode && value == (HOST_WIDE_INT_1U << 31))
> > > > > +   codes[0].value = (HOST_WIDE_INT_M1U << 31);
> > > > > +
> > > > >return 1;
> > > > >  }
> > > > >
> > > > > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > > > > index 5083a1c24b0..6f7f4d3fbdc 100644
> > > > > --- a/gcc/config/riscv/riscv.h
> > > > > +++ b/gcc/config/riscv/riscv.h
> > > > > @@ -528,13 +528,10 @@ enum reg_class
> > > > >(((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) -
> IMM_REACH)  \
> > > > > || ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
> > > > >
> > > > > -/* If this is a single bit mask, then we can load it with bseti.
> But this
> > > > > -   is not useful for any of the low 31 bits because we can use
> addi or lui
> > > > > -   to load them.  It is wrong for loading SImode 0x8000 on
> rv64 because it
> > > > > -   needs to be sign-extended.  So we restrict this to the upper
> 32-bits
> > > > > -   only.  */
> > > > > -#define SINGLE_BIT_MASK_OPERAND(VALUE) \
> > > > > -  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
> > > > > +/* If this is a single bit mask, then we can load it with bseti.
> Special
> > > > > +   handling of SImode 0x8000 on RV64 is done in
> riscv_build_integer_1. */
> > > > > +#define SINGLE_BIT_MASK_OPERAND(VALUE)
>  \
> > > > > +  (pow2p_hwi (VALUE))
> > > > >
> > > > >  /* Stack layout; function entry, exit and calling.  */
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >
>
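The sign-extension subtlety that the new comment in riscv_build_integer_1 describes can be sketched numerically; this is a Python model of how RV64 holds SImode values, not the full RISC-V semantics:

```python
# Sketch: why (1 << 31) in SImode must be represented sign-extended on RV64.
def sext32(v):
    """Sign-extend the low 32 bits of v, the way RV64 holds SImode values."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & (1 << 31) else v

# (1 << 31) in SImode lives in a 64-bit register in its sign-extended
# (negative) form, which is exactly (-1 << 31) -- hence the rewrite of
# codes[0].value before the LUI is emitted.
assert sext32(1 << 31) == (-1 << 31) == -2147483648
```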


Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-06-14 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
> if func->decl is not null but no cgraph node is available for it.
> As PR105818 shows, this can give an unexpected result.  In the
> case from PR105818, when parsing the bar decl in function foo,
> cfun is the function structure for foo, for which there is no
> cgraph node yet, so optimize_function_for_size_p returns
> OPTIMIZE_SIZE_NO.  That is incorrect: the context is optimizing
> for size, and the flag optimize_size is true.
> 
> This patch makes optimize_function_for_size_p check optimize_size
> in that case, as it already does when func->decl is unavailable.
> 
> One regression failure was exposed on aarch64-linux-gnu:
> 
> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>   -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
> 
> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
> used in function fold_range_test during C parsing.  It uses
> optimize_function_for_speed_p, which is the inverse of
> optimize_function_for_size_p.  At that point cfun->decl is valid
> but has no cgraph node, so without this patch
> optimize_function_for_speed_p eventually returns true, while with
> this patch it returns false.  Since the command line option -Os
> is specified, there is no reason to interpret it as "for speed".
> I think this failure is expected, so the test case is adjusted
> accordingly.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
> 
>   PR target/105818
> 
> gcc/ChangeLog:
> 
>   * predict.cc (optimize_function_for_size_p): Check optimize_size when
>   func->decl is valid but its cgraph node is unavailable.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr105818.c: New test.
>   * gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
> ---
>  gcc/predict.cc  | 2 +-
>  gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +-
>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
> 
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5734e4c8516..6c60a973236 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>cgraph_node *n = cgraph_node::get (fun->decl);
>if (n)
>  return n->optimize_for_size_p ();
> -  return OPTIMIZE_SIZE_NO;
> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;

We could also do opt_for_fn (cfun->decl, optimize_size), which is
probably better since one can change optimize_size with the optimization
attribute.
However, I think in most cases where we check for optimize_size this
early we are doing something wrong, since at that time there is no
profile available.  Why exactly does PR105818 hit the flag change issue?

Honza
>  }
> 
>  /* Return true if function FUN should always be optimized for speed.  */
> diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c 
> b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> index 68aa6c63d71..14ca94ad37d 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> @@ -17,7 +17,7 @@ foo (int x, int y, int z)
>int i = 0;
>while (x > 3 && y > 3 && z > 3)
>  {/* { dg-final { gdb-test .+2 "i" "v + 1" } } */
> - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */
> + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { 
> aarch64*-*-* && { any-opts "-Os" } } } } } */
>bar (i);   /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
>   /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { 
> aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
>i++, x--, y -= 2, z -= 3;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c 
> b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> new file mode 100644
> index 000..18781edbf9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> @@ -0,0 +1,9 @@
> +/* { dg-options "-Os -fno-tree-vectorize" } */
> +
> +#pragma GCC optimize "-fno-tree-vectorize"
> +
> +void
> +foo (void)
> +{
> +  void bar (void);
> +}
> --
> 2.27.0


Re: [PATCH v1 1/3] RISC-V: add consecutive_bits_operand predicate

2022-06-14 Thread Philipp Tomsich
Thanks, applied to master!


On Tue, 7 Jun 2022 at 12:26, Kito Cheng  wrote:
>
> LGTM
>
>
> On Wed, May 25, 2022 at 5:48 AM Philipp Tomsich
>  wrote:
> >
> > Provide an easy way to test for constants that are a single,
> > consecutive run of ones.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/predicates.md (consecutive_bits_operand):
> >   Implement new predicate.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/riscv/predicates.md | 11 +++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> > index c37caa2502b..90db5dfcdd5 100644
> > --- a/gcc/config/riscv/predicates.md
> > +++ b/gcc/config/riscv/predicates.md
> > @@ -243,3 +243,14 @@ (define_predicate "const63_operand"
> >  (define_predicate "imm5_operand"
> >(and (match_code "const_int")
> > (match_test "INTVAL (op) < 5")))
> > +
> > +;; A CONST_INT operand that consists of a single run of consecutive set 
> > bits.
> > +(define_predicate "consecutive_bits_operand"
> > +  (match_code "const_int")
> > +{
> > +   unsigned HOST_WIDE_INT val = UINTVAL (op);
> > +   if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0)
> > +   return false;
> > +
> > +   return true;
> > +})
> > --
> > 2.34.1
> >
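As a sanity check, the predicate's bit trick can be modelled in a few lines of Python (the helper name is mine; the C version relies on ctz_hwi and exact_log2):

```python
# Python model of consecutive_bits_operand: a value qualifies when, after
# shifting out its trailing zeros, adding 1 yields a power of two.
def consecutive_bits(val):
    if val == 0:
        return False                  # guard; the C predicate sees a CONST_INT
    val &= (1 << 64) - 1              # model unsigned HOST_WIDE_INT
    shifted = val >> ((val & -val).bit_length() - 1)   # val >> ctz_hwi (val)
    return (shifted + 1) & shifted == 0                # exact_log2 (...) >= 0

assert consecutive_bits(0b0111000)          # one run of ones
assert consecutive_bits(0xFFFFFFFF << 3)    # run of ones, trailing zeros
assert not consecutive_bits(0b0101000)      # two separate runs
```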


Re: [PATCH v1 2/3] RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w

2022-06-14 Thread Philipp Tomsich
Thanks, applied to master!

For [3/3], I'll submit a new standalone patch with the requested changes.

On Tue, 7 Jun 2022 at 12:25, Kito Cheng  wrote:
>
> LGTM, you can commit that without [3/3] if you like :)
>
> On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
>  wrote:
> >
> > When encountering a prescaled (biased) value as a candidate for
> > sh[123]add.uw, the combine pass will present this as shifted by the
> > aggregate amount (prescale + shift-amount) with an appropriately
> > adjusted mask constant that has fewer than 32 bits set.
> >
> > E.g., here's the failing expression seen in combine for a prescale of
> > 1 and a shift of 2 (note how 0x3fff8 >> 3 is 0x7fff).
> >   Trying 7, 8 -> 10:
> >   7: r78:SI=r81:DI#0<<0x1
> > REG_DEAD r81:DI
> >   8: r79:DI=zero_extend(r78:SI)
> > REG_DEAD r78:SI
> >  10: r80:DI=r79:DI<<0x2+r82:DI
> > REG_DEAD r79:DI
> > REG_DEAD r82:DI
> >   Failed to match this instruction:
> >   (set (reg:DI 80 [ cD.1491 ])
> >   (plus:DI (and:DI (ashift:DI (reg:DI 81)
> >(const_int 3 [0x3]))
> >(const_int 17179869176 [0x3fff8]))
> >   (reg:DI 82)))
> >
> > To address this, we introduce a splitter handling these cases.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md: Add split to handle opportunities
> >   for slli + sh[123]add.uw
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/zba-shadd.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> > Co-developed-by: Manolis Tsamis 
> >
> > ---
> >
> >  gcc/config/riscv/bitmanip.md   | 44 ++
> >  gcc/testsuite/gcc.target/riscv/zba-shadd.c | 13 +++
> >  2 files changed, 57 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shadd.c
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 0ab9ffe3c0b..6c1ccc6f8c5 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -79,6 +79,50 @@ (define_insn "*shNadduw"
> >[(set_attr "type" "bitmanip")
> > (set_attr "mode" "DI")])
> >
> > +;; During combine, we may encounter an attempt to combine
> > +;;   slli rtmp, rs, #imm
> > +;;   zext.w rtmp, rtmp
> > +;;   sh[123]add rd, rtmp, rs2
> > +;; which will lead to the immediate not satisfying the above constraints.
> > +;; By splitting the compound expression, we can simplify to a slli and a
> > +;; sh[123]add.uw.
> > +(define_split
> > +  [(set (match_operand:DI 0 "register_operand")
> > +   (plus:DI (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> > +   (match_operand:QI 2 
> > "immediate_operand"))
> > +(match_operand:DI 3 "consecutive_bits_operand"))
> > +(match_operand:DI 4 "register_operand")))
> > +   (clobber (match_operand:DI 5 "register_operand"))]
> > +  "TARGET_64BIT && TARGET_ZBA"
> > +  [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 6)))
> > +   (set (match_dup 0) (plus:DI (and:DI (ashift:DI (match_dup 5)
> > + (match_dup 7))
> > +  (match_dup 8))
> > +  (match_dup 4)))]
> > +{
> > +   unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> > +   /* scale: shift within the sh[123]add.uw */
> > +   int scale = 32 - clz_hwi (mask);
> > +   /* bias:  pre-scale amount (i.e. the prior shift amount) */
> > +   int bias = ctz_hwi (mask) - scale;
> > +
> > +   /* If the bias + scale don't add up to operand[2], reject. */
> > +   if ((scale + bias) != UINTVAL (operands[2]))
> > +  FAIL;
> > +
> > +   /* If the shift-amount is out-of-range for sh[123]add.uw, reject. */
> > +   if ((scale < 1) || (scale > 3))
> > +  FAIL;
> > +
> > +   /* If there's no bias, the '*shNadduw' pattern should have matched. 
> > */
> > +   if (bias == 0)
> > +  FAIL;
> > +
> > +   operands[6] = GEN_INT (bias);
> > +   operands[7] = GEN_INT (scale);
> > +   operands[8] = GEN_INT (0xULL << scale);
> > +})
> > +
> >  (define_insn "*add.uw"
> >[(set (match_operand:DI 0 "register_operand" "=r")
> > (plus:DI (zero_extend:DI
> > diff --git a/gcc/testsuite/gcc.target/riscv/zba-shadd.c 
> > b/gcc/testsuite/gcc.target/riscv/zba-shadd.c
> > new file mode 100644
> > index 000..33da2530f3f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/zba-shadd.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=rv64gc_zba -mabi=lp64" } */
> > +
> > +unsigned long foo(unsigned int a, unsigned long b)
> > +{
> > +a = a << 1;
> > +unsigned long c = (unsigned long) a;
> > +unsigned long d = b + (c<<2);
> > +return d;
> > +}
> > +
> > +/* { dg-final { scan-assembler "sh2add.uw" } } */
> > +/* { dg-final { scan-assembler-not "zext" } } */
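The scale/bias arithmetic in the splitter can be checked against the combine example from the commit message (const_int 17179869176, aggregate shift of 3); the helper below is a Python model, not the actual implementation:

```python
# Model of the split's mask arithmetic: recover the sh[123]add.uw scale and
# the prior slli amount (bias) from the adjusted mask constant.
def split_params(mask, shamt):
    clz = 64 - mask.bit_length()                        # clz_hwi on 64-bit HWI
    scale = 32 - clz                                    # shift inside shNadd.uw
    bias = ((mask & -mask).bit_length() - 1) - scale    # ctz_hwi (mask) - scale
    assert scale + bias == shamt                        # otherwise the split FAILs
    return scale, bias

# 17179869176 is 0x3FFFFFFF8: 32 ones pre-shifted by 1 and shifted by 2 more.
scale, bias = split_params(17179869176, 3)
assert (scale, bias) == (2, 1)   # i.e. slli by 1, then sh2add.uw
```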

Re: [PATCH RFA] ubsan: default to trap on unreachable at -O0 and -Og [PR104642]

2022-06-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 13, 2022 at 03:53:13PM -0400, Jason Merrill via Gcc-patches wrote:
> When not optimizing, we can't do anything useful with unreachability in
> terms of code performance, so we might as well improve debugging by turning
> __builtin_unreachable into a trap.  In the PR richi suggested introducing an
> -funreachable-traps flag for this, but this functionality is already
> implemented as -fsanitize=unreachable -fsanitize-undefined-trap-on-error, we
> just need to set those flags by default.
> 
> I think it also makes sense to do this when we're explicitly optimizing for
> the debugging experience.
> 
> I then needed to make options-save handle -fsanitize and
> -fsanitize-undefined-trap-on-error; since -fsanitize has custom parsing,
> that meant handling it explicitly in the awk scripts.  I also noticed we
> weren't setting it in opts_set.
> 
> Do we still want -funreachable-traps as an alias (or separate flag) for this
> behavior that doesn't mention the sanitizer?

I do not like doing it this way, -fsanitize-undefined-trap-on-error is
(unfortunately for compatibility with llvm misdesign, users should have
ways to control which of the enabled sanitizers should be handled which way,
where the 3 ways are runtime diagnostics without abort, runtime diagnostics
with abort and __builtin_trap ()) an all or nothing option which affects all
the sanitizers.  Your patch will e.g. silently enable the sanitization
whenever just -fsanitize-undefined-trap-on-error is passed, but that can be
passed even when users don't really expect any sanitization, just making
sure that if it is enabled (say for selected TUs or functions), it doesn't
need a runtime library to report it.
Furthermore, handling it the UBSan way means we slow down the compiler
(enable a bunch of extra passes, like sanopt, ubsan), which is undesirable
e.g. for -O0 compilation speed.

So, I think -funreachable-traps should be a separate flag and not an alias,
enabled by default for -O0 and -Og, which would be handled elsewhere
(I'd say e.g. in fold_builtin_0 and perhaps gimple_fold_builtin too to
avoid allocating trees unnecessary) and would be done if
flag_unreachable_traps && !sanitize_flag_p (SANITIZE_UNREACHABLE),
just replacing that __builtin_unreachable call with __builtin_trap.
For the function ends in fact under those conditions we could emit
__builtin_trap right away instead of emitting __builtin_unreachable
and waiting on folding it later to __builtin_trap.
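A minimal example of the kind of code the proposal targets (the trap folding itself is hypothetical at this point; the snippet only shows where a __builtin_unreachable typically sits that -O0 currently cannot do anything useful with):

```c
#include <assert.h>

enum dir { UP, DOWN };

/* All enum values are handled, so the fall-through past the switch is dead;
   __builtin_unreachable () tells the compiler so and silences the
   "control reaches end of non-void function" warning.  Under the proposed
   -funreachable-traps, at -O0/-Og this call would instead become
   __builtin_trap (), so an out-of-range argument traps visibly.  */
static int step (enum dir d)
{
  switch (d)
    {
    case UP:   return 1;
    case DOWN: return -1;
    }
  __builtin_unreachable ();  /* candidate for the proposed trap folding */
}
```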

Jakub



Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-06-14 Thread Segher Boessenkool
On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
>   PR target/105818

Change this to something else?  It is not a target bug.  "middle-end"
perhaps?

Thanks for fixing this,


Segher


Re: [PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]

2022-06-14 Thread Richard Biener via Gcc-patches
On Mon, Jun 13, 2022 at 4:27 PM Eric Gallager  wrote:
>
> On Mon, Jun 13, 2022 at 7:02 AM Richard Biener
>  wrote:
> >
> > On Thu, Jun 2, 2022 at 5:54 PM Eric Gallager via Gcc-patches
> >  wrote:
> > >
> > > So, I'm working on fixing PR bootstrap/44425, and have this patch to
> > > have the top-level configure script check in the value passed to
> > > `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that
> > > configuring with just `--prefix=` and none of
> > > `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed
> > > before), but unfortunately it results in a bunch of duplicated
> > > `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that
> > > acceptable or should I try another approach?
> >
> > --prefix is documented to be used for installing (architecture
> > independent) files; I'm not sure it is a good idea to probe it for
> > the gmp/mpfr/mpc installs used for the host.
> >
> > Richard.
> >
> > > Eric
>
> So... I guess that means we should close bug 44425 as INVALID or
> WONTFIX then? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44425

That would be my reaction, yes.


RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Biener via Gcc-patches
On Mon, 13 Jun 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 13, 2022 12:48 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > 
> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> > bitmask
> > 
> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, June 13, 2022 10:39 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > > 
> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
> > > > pow2 bitmask
> > > >
> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > > >
> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > In plenty of image and video processing code it's common to
> > > > > > modify pixel values by a widening operation and then scale them
> > > > > > back into range
> > > > by dividing by 255.
> > > > > >
> > > > > > This patch adds an optab to allow us to emit an optimized
> > > > > > sequence when doing an unsigned division that is equivalent to:
> > > > > >
> > > > > >x = y / (2 ^ (bitsize (y) / 2) - 1)
> > > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > x86_64-pc-linux-gnu and no issues.
> > > > > >
> > > > > > Ok for master?
> > > > >
> > > > > Looking at 2/2 it seems that this is the wrong way to attack the
> > > > > problem.  The ISA doesn't have such instruction so adding an optab
> > > > > looks premature.  I suppose that there's no unsigned vector
> > > > > integer division and thus we open-code that in a different way?
> > > > > Isn't the correct thing then to fixup that open-coding if it is more
> > efficient?
> > > >
> > >
> > > The problem is that even if you fixup the open-coding it would need to
> > > be something target specific? The sequence of instructions we generate
> > > don't have a GIMPLE representation.  So whatever is generated I'd have
> > > to fixup in RTL then.
> > 
> > What's the operation that doesn't have a GIMPLE representation?
> 
> For NEON use two operations:
> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
> Where the + widens and the >> narrows.  So you give it two shorts, get a 
> byte
> 2. Add widening add of lowpart so basically lowpart (a +w b)
> 
> For SVE2 we use a different sequence, we use two back-to-back sequences of:
> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
>select even and odd elements of the vector rather than "top half" and
>"bottom half".
> 
>So this instruction does: Add each vector element of the first source
>vector to the corresponding vector element of the second source vector,
>and place the most significant half of the result in the even-numbered
>half-width destination elements, while setting the odd-numbered elements
>to zero.
> 
> So there's an explicit permute in there. The instructions are sufficiently 
> different that there
> wouldn't be a single GIMPLE representation.

I see.  Are these also useful to express scalar integer division?

I'll defer to others to ack the special udiv_pow2_bitmask optab
or suggest some piecemeal things other targets might be able to do as
well.  It does look very special.  I'd also bikeshed it to
udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming
I interpreted 'bitmask' correctly ;)).  It seems to be even less
general since it is an unary op and the actual divisor is constrained
by the mode itself?

Richard.

> > 
> > I think for costing you could resort to the *_cost functions as used by
> > synth_mult and friends.
> > 
> > > The problem with this is that it seemed fragile. We generate from the
> > > Vectorizer:
> > >
> > >   vect__3.8_35 = MEM  [(uint8_t *)_21];
> > >   vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR <vect__3.8_35, vect_cst__36>;
> > >   vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR <vect__3.8_35, vect_cst__36>;
> > >   vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897,
> > 32897, 32897, 32897, 32897 };
> > >   vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897,
> > 32897, 32897, 32897, 32897 };
> > >   vect_patt_25.11_42 = vect_patt_19.10_40 >> 7;
> > >   vect_patt_25.11_43 = vect_patt_19.10_41 >> 7;
> > >   vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR <vect_patt_25.11_42, vect_patt_25.11_43>;
> > >
> > > and if the magic constants change then we miss the optimization. I
> > > could rewrite the open coding to use shifts alone, but that might be a
> > regression for some uarches I would imagine.
> > 
> > OK, so you do have a highpart multiply.  I suppose the pattern is too deep 
> > to
> > be recognized by combine?  What's the RTL good vs. bad before combine of
> > one of the expressions?
> 
> Yeah combine only tried 2-3 instructions, but to use these sequences we have 
> to
> match the entire chain as the instructions do the narrowing.
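As a numeric sanity check of the magic-constant sequence in the vectorizer dump above (not part of any patch in the thread): the `h* 32897` highpart multiply on 16-bit lanes keeps bits [16:32) of the 32-bit product, and the following `>> 7` makes the total shift 16 + 7 = 23, which computes x / 255 exactly for every 16-bit x.

```python
# Verify the vectorizer's magic-constant sequence for unsigned division
# by 255: highpart multiply by 32897 (0x8081) keeps the top 16 bits of
# the 32-bit product, then the vector ">> 7" completes a 23-bit shift.
for x in range(1 << 16):
    hi = (x * 32897) >> 16   # h* 32897: high half of the widened product
    q = hi >> 7              # the subsequent vector shift right
    assert q == x // 255, (x, q)
print("32897-based sequence matches x // 255 for all 16-bit x")
```

This is why the sequence breaks if the magic constant changes: 32897 is exactly ceil(2^23 / 255), and 32897 * 255 - 2^23 = 127 <= 2^7, the condition for the truncated reciprocal to be exact over [0, 2^16).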

[PATCH V2]rs6000: Store complicated constant into pool

2022-06-14 Thread Jiufu Guo via Gcc-patches
Hi,

This patch is nearly the same as:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595378.html

The concept of this patch is similar to that of the patches
attached in PR63281, e.g.
https://gcc.gnu.org/bugzilla/attachment.cgi?id=42186

I tested perlbench from SPEC2017.  As expected, at -O2 on P9,
there is a ~2% performance improvement with this patch.

This patch reduces the instruction-number threshold for storing a
constant in the pool and updates the costs for constant building and
memory access.  Then, if building the constant needs more than 2
instructions (or more than 1 instruction on P10), we prefer to load it
from the constant pool.

Bootstrap and regtest pass on ppc64le and ppc64.
Is this ok for trunk?  Thanks for comments and sugguestions.


BR,
Jiufu


PR target/63281

gcc/ChangeLog:
2022-06-14  Jiufu Guo  
Alan Modra 

* config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
Exclude rtx with code 'HIGH'.
(rs6000_emit_move): Update threshold of const insn.
(rs6000_rtx_costs): Update cost of constant and mem.

gcc/testsuite/ChangeLog:
2022-06-14  Jiufu Guo  
Alan Modra 

* gcc.target/powerpc/medium_offset.c: Update.
* gcc.target/powerpc/pr93012.c: Update.
* gcc.target/powerpc/pr63281.c: New test.


---
 gcc/config/rs6000/rs6000.cc   | 23 +++
 .../gcc.target/powerpc/medium_offset.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 +
 gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
 4 files changed, 31 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cd291f93019..90c91a8e1ea 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
 static bool
 rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
-  if (GET_CODE (x) == HIGH
-  && GET_CODE (XEXP (x, 0)) == UNSPEC)
+  /* Exclude CONSTANT HIGH part.  e.g.
+ (high:DI (symbol_ref:DI ("var") [flags 0xc0] )).  */
+  if (GET_CODE (x) == HIGH)
 return true;
 
   /* A TLS symbol in the TOC cannot contain a sum.  */
@@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode 
mode)
&& FP_REGNO_P (REGNO (operands[0])))
   || !CONST_INT_P (operands[1])
   || (num_insns_constant (operands[1], mode)
-  > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
+  > (TARGET_PREFIXED ? 1 : 2)))
   && !toc_relative_expr_p (operands[1], false, NULL, NULL)
   && (TARGET_CMODEL == CMODEL_SMALL
   || can_create_pseudo_p ()
@@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 
 case CONST_DOUBLE:
 case CONST_WIDE_INT:
+  /* It may need a few insns for const to SET.  -1 for outer SET code.  */
+  if (outer_code == SET)
+   {
+ *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
+ return true;
+   }
+  /* FALLTHRU */
+
 case CONST:
 case HIGH:
 case SYMBOL_REF:
@@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case MEM:
   /* When optimizing for size, MEM should be slightly more expensive
 than generating address, e.g., (plus (reg) (const)).
-L1 cache latency is about two instructions.  */
-  *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
+L1 cache latency is about two instructions.
+For prefixed load (pld), we would set it slightly faster
+than two instructions. */
+  *total = !speed
+? COSTS_N_INSNS (1) + 1
+: TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2);
   if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x)))
*total += COSTS_N_INSNS (100);
   return true;
diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c 
b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
index f29eba08c38..4889e8fa8ec 100644
--- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
+++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-O" } */
-/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
+/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
 
 static int x;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c 
b/gcc/testsuite/gcc.target/powerpc/pr63281.c
new file mode 100644
index 000..469a8f64400
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
@@ -0,0 +1,11 @@
+/* PR target/63281 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -std=c99" } */
+
+void
+foo (unsigned long long *a)
+{
+  *a = 0x02080500610600

Re: [PATCH] Do not erase warning data in gimple_set_location

2022-06-14 Thread Richard Biener via Gcc-patches
On Tue, Jun 14, 2022 at 12:49 PM Eric Botcazou  wrote:
>
> > Hmm, I think instead of special-casing UNKNOWN_LOCATION
> > what gimple_set_location should probably do is either not copy
> > warnings at all or union them.  Btw, gimple_set_location also
> > removes a previously set BLOCK (but gimple_set_block preserves
> > the location locus and diagnostic override).
> >
> > So I'd be tempted to axe the copy_warning () completely here.
>
> The first thing I tried, but it regressed the original testcase IIRC.
>
> Even my minimal patch manages to break bootstrap on ARM:
>
> buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libcpp/lex.cc:
> 1523:9: error: pointer used after ‘void operator delete(void*, std::size_t)’
> [-Werror=use-after-free]
> # 00:31:04 make[3]: *** [Makefile:227: lex.o] Error 1
> # 00:31:04 make[2]: *** [Makefile:9527: all-stage3-libcpp] Error 2
> # 00:31:35 make[1]: *** [Makefile:25887: stage3-bubble] Error 2
> # 00:31:35 make: *** [Makefile:1072: all] Error 2
>
>   /* Don't warn for cases like when a cdtor returns 'this' on ARM.  */
>   else if (warning_suppressed_p (var, OPT_Wuse_after_free))
> return;
>
> because warning-control.cc:copy_warning also clobbers the warning data of the
> destination.  We have in cp/decl.cc:maybe_return_this the lines:
>
>   /* Return the address of the object.  */
>   tree val = DECL_ARGUMENTS (current_function_decl);
>   suppress_warning (val, OPT_Wuse_after_free);
>
> -Wuse-after-free is suppressed for the location of VAL and the TREE_NO_WARNING
> bit set on it.  But other expressions may have the same location as VAL and
> the TREE_NO_WARNING bit _not_ set, so when you call copy_warning (expr, expr)
> (we do that a lot after failed folding) for them, copy_warning erases the
> warning data of the location.
>
> I have installed the obvious fixlet after testing on x86-64/Linux, but the
> decoupling between TREE_NO_WARNING bit and location looks a bit problematic.

Thanks - that makes more sense.

>
> * warning-control.cc (copy_warning) [generic version]: Do not erase
> the warning data of the destination location when the no-warning
> bit is not set on the source.
> (copy_warning) [tree version]: Return early if TO is equal to FROM.
> (copy_warning) [gimple version]: Likewise.
> testsuite/
> * g++.dg/warn/Wuse-after-free5.C: New test.
>
> --
> Eric Botcazou


RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, June 14, 2022 2:19 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> 
> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> On Mon, 13 Jun 2022, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 13, 2022 12:48 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > 
> > > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> > > pow2 bitmask
> > >
> > > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Monday, June 13, 2022 10:39 AM
> > > > > To: Tamar Christina 
> > > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > > > 
> > > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
> > > > > pow2 bitmask
> > > > >
> > > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > > > >
> > > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > In plenty of image and video processing code it's common to
> > > > > > > modify pixel values by a widening operation and then scale
> > > > > > > them back into range
> > > > > by dividing by 255.
> > > > > > >
> > > > > > > This patch adds an optab to allow us to emit an optimized
> > > > > > > sequence when doing an unsigned division that is equivalent to:
> > > > > > >
> > > > > > >x = y / (2 ^ (bitsize (y) / 2) - 1)
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > > x86_64-pc-linux-gnu and no issues.
> > > > > > >
> > > > > > > Ok for master?
> > > > > >
> > > > > > Looking at 2/2 it seems that this is the wrong way to attack
> > > > > > the problem.  The ISA doesn't have such instruction so adding
> > > > > > an optab looks premature.  I suppose that there's no unsigned
> > > > > > vector integer division and thus we open-code that in a different
> way?
> > > > > > Isn't the correct thing then to fixup that open-coding if it
> > > > > > is more
> > > efficient?
> > > > >
> > > >
> > > > The problem is that even if you fixup the open-coding it would
> > > > need to be something target specific? The sequence of instructions
> > > > we generate don't have a GIMPLE representation.  So whatever is
> > > > generated I'd have to fixup in RTL then.
> > >
> > > What's the operation that doesn't have a GIMPLE representation?
> >
> > For NEON use two operations:
> > 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
> > Where the + widens and the >> narrows.  So you give it two shorts,
> > get a byte.
> > 2. Add widening add of lowpart so basically lowpart (a +w b)
> >
> > For SVE2 we use a different sequence, we use two back-to-back
> sequences of:
> > 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
> >select even and odd elements of the vector rather than "top half" and
> >"bottom half".
> >
> >So this instruction does: Add each vector element of the first source
> >vector to the corresponding vector element of the second source vector,
> >and place the most significant half of the result in the even-numbered
> >half-width destination elements, while setting the odd-numbered elements
> >to zero.
> >
> > So there's an explicit permute in there. The instructions are
> > sufficiently different that there wouldn't be a single GIMPLE
> representation.
> 
> I see.  Are these also useful to express scalar integer division?

Hmm, not these exact instructions, as they only exist for vectors.  Scalar
code may potentially benefit from rewriting this to (x + ((x + 257) >> 8)) >> 8,
which avoids the multiply by the magic constant.  But the problem here is
that, unless undone for vectors, it would likely generate worse code if
vectorized exactly like this on most ISAs compared to what we have now.
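The shift-only rewrite above can be checked exhaustively over the u8*u8 product range, which is the range the NEON sequence actually handles (a sketch using Python integers to model the narrow arithmetic; the two steps correspond to the two add-narrowing-high style operations described earlier in the thread):

```python
# Model of (x + ((x + 257) >> 8)) >> 8 == x // 255, split into the two
# narrowing steps:
#   h = (x + 257) >> 8      -- fits in 8 bits for x <= 255 * 255
#   q = (x + h) >> 8        -- likewise fits in 8 bits
for x in range(255 * 255 + 1):   # all products of two u8 values
    h = (x + 257) >> 8
    q = (x + h) >> 8
    assert h <= 255 and q <= 255   # both intermediates narrow to u8
    assert q == x // 255, (x, q)
print("(x + ((x + 257) >> 8)) >> 8 == x // 255 on [0, 65025]")
```

Note the restriction to x <= 65025: for larger 16-bit inputs the quotient (up to 257) no longer fits the narrowed 8-bit lane, which is fine for the pixel-scaling use case of dividing a u8*u8 product by 255.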

> 
> I'll defer to others to ack the special udiv_pow2_bitmask optab or suggest
> some piecemeal things other targets might be able to do as well.  It does look
> very special.  I'd also bikeshed it to
> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> interpreted 'bitmask' correctly ;)).  It seems to be even less general since 
> it is
> an unary op and the actual divisor is constrained by the mode itself?

I am happy to change the name, and quite happy to add the constant as an
argument.   I had only made it this specific because this was the only fairly
common operation I had found.  Though perhaps it's indeed better to keep
the optab a bit more general?

Thanks,
Tamar

> 
> Richard.
> 
> > >
> > > I think for costing you could resort to the *_cost functions as used
> > > by synth_mult and friends.
> > >
> > > > The problem with this is that it seemed fragile. We generate from
> > > > the
> > > > Vectorizer:
> > > >
> > > >   vect__3.8_35 = MEM  [(uint8_t

Re: [PATCH take #2] Fold truncations of left shifts in match.pd

2022-06-14 Thread Richard Biener via Gcc-patches
On Sun, Jun 5, 2022 at 1:12 PM Roger Sayle  wrote:
>
>
> Hi Richard,
> Many thanks for taking the time to explain how vectorization is supposed
> to work.  I now see that vect_recog_rotate_pattern in tree-vect-patterns.cc
> is supposed to handle lowering of rotations to (vector) shifts, and
> completely agree that adding support for signed types (using appropriate
> casts to unsigned_type_for and casting the result back to the original
> signed type) is a better approach to avoid the regression of pr98674.c.
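The lowering being discussed can be modelled in a few lines (a sketch with illustrative names, assuming 16-bit elements): a rotate of a signed value is done by casting to the unsigned type, combining two logical shifts with an ior, and casting the result back.

```python
MASK16 = (1 << 16) - 1

def rot_right_i16(x, k):
    """Rotate a signed 16-bit value right by k (1 <= k <= 15).

    Models the vect_recog_rotate_pattern lowering: cast to the unsigned
    type, compute (u >> k) | (u << (16 - k)) with logical shifts, then
    cast back to the signed type."""
    u = x & MASK16                                  # cast to unsigned_type_for
    r = ((u >> k) | (u << (16 - k))) & MASK16       # shift/shift/ior lowering
    return r - (1 << 16) if r & (1 << 15) else r    # cast back to signed

assert rot_right_i16(0x0102, 8) == 0x0201
assert rot_right_i16(-1, 3) == -1    # all-ones rotates to all-ones
```

The unsigned detour is what makes the lowering safe for signed types: the logical (not arithmetic) right shift is exactly what a rotate needs.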
>
> I've also implemented your suggestions of combining the proposed new
> (convert (lshift @1 INTEGER_CST@2)) with the existing one, and at the
> same time adding support for folding valid shifts wider than the narrower
> type, such as (short)(x << 20), to constant zero.  Although this optimization
> is already performed during the tree-ssa passes, it's convenient to
> also catch it here during constant folding.
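The two folds can be modelled directly (a sketch, with a 16-bit narrow type standing in for short): a narrowing conversion of a left shift may be performed in the narrow type when the shift count is in range, and folds to zero once the count reaches the narrow width.

```python
def trunc16(v):
    """Truncate to 16 bits (the narrowing conversion to 'short')."""
    return v & 0xFFFF

for x in (0, 1, 0x1234, 0xFFFF, 0x12345678, -7):
    # (short)(x << c) == (short)((short)x << c) for c < 16: the low
    # 16 bits of x << c depend only on the low 16 bits of x.
    for c in range(16):
        assert trunc16(x << c) == trunc16(trunc16(x) << c)
    # (short)(x << 20) and friends: every bit is shifted past the
    # narrow width, so the truncated result is zero.
    for c in range(16, 32):
        assert trunc16(x << c) == 0
print("narrowing left-shift folds hold")
```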
>
> This revised patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32}, with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

> 2022-06-05  Roger Sayle  
> Richard Biener  
>
> gcc/ChangeLog
> * match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
> left shifts by a constant when the result is truncated, and the
> shift constant is well-defined.
> * tree-vect-patterns.cc (vect_recog_rotate_pattern): Add
> support for rotations of signed integer types, by lowering
> using unsigned vector shifts.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/fold-convlshift-4.c: New test case.
> * gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
> * gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
> * gcc.dg/vect/vect-over-widen-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>
>
> Thanks again,
> Roger
> --
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 02 June 2022 12:03
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [PATCH] Fold truncations of left shifts in match.pd
> >
> > On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle 
> > wrote:
> > >
> > >
> > > Hi Richard,
> > > > +  /* RTL expansion knows how to expand rotates using shift/or.  */
> > > > + if (icode == CODE_FOR_nothing
> > > > +  && (code == LROTATE_EXPR || code == RROTATE_EXPR)
> > > > +  && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing
> > > > +  && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing)
> > > > +icode = (int) optab_handler (lshr_optab, vec_mode);
> > > >
> > > > but we then get the vector costing wrong.
> > >
> > > The issue is that we currently get the (relative) vector costing wrong.
> > > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar
> > > code requires two shifts and an ior, so believes its profitable to
> > > vectorize this loop using two vector shifts and an vector ior.  But
> > > once match.pd simplifies the truncate and recognizes the HImode rotate we
> > end up with:
> > >
> > > pr98674.c:6:16: note:   ==> examining statement: _6 = _1 r>> 8;
> > > pr98674.c:6:16: note:   vect_is_simple_use: vectype vector(8) short int
> > > pr98674.c:6:16: note:   vect_is_simple_use: operand 8, type of def: 
> > > constant
> > > pr98674.c:6:16: missed:   op not supported by target.
> > > pr98674.c:8:33: missed:   not vectorized: relevant stmt not supported: _6 
> > > = _1
> > r>> 8;
> > > pr98674.c:6:16: missed:  bad operation or unsupported loop bound.
> > >
> > >
> > > Clearly, it's a win to vectorize HImode rotates, when the backend can
> > > perform
> > > 8 (or 16) rotations at a time, but using 3 vector instructions, even
> > > when a scalar rotate can be performed in a single instruction.
> > > Fundamentally, vectorization may still be desirable/profitable even when 
> > > the
> > backend doesn't provide an optab.
> >
> > Yes, as said it's tree-vect-patterns.cc job to handle this not natively 
> > supported
> > rotate by re-writing it.  Can you check why vect_recog_rotate_pattern does 
> > not
> > do this?  Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why
> > though (for rotates it should not matter and for the lowered sequence we can
> > convert to desired signedness to get arithmetic/logical shifts)?
> >
> > > The current situation where the i386's backend provides expanders to
> > > lower rotations (or vcond) into individual instruction sequences, also
> > > interferes with vector costing.  It's the vector cost function that
> > > needs to be fixed, not the generated code made worse (or the backend
> > > bloated performing its own RTL expansion).

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, 13 Jun 2022, Tamar Christina wrote:
>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Monday, June 13, 2022 12:48 PM
>> > To: Tamar Christina 
>> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>> > 
>> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
>> > bitmask
>> > 
>> > On Mon, 13 Jun 2022, Tamar Christina wrote:
>> > 
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Monday, June 13, 2022 10:39 AM
>> > > > To: Tamar Christina 
>> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>> > > > 
>> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
>> > > > pow2 bitmask
>> > > >
>> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
>> > > >
>> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > In plenty of image and video processing code it's common to
>> > > > > > modify pixel values by a widening operation and then scale them
>> > > > > > back into range
>> > > > by dividing by 255.
>> > > > > >
>> > > > > > This patch adds an optab to allow us to emit an optimized
>> > > > > > sequence when doing an unsigned division that is equivalent to:
>> > > > > >
>> > > > > >x = y / (2 ^ (bitsize (y) / 2) - 1)
>> > > > > >
>> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
>> > > > > > x86_64-pc-linux-gnu and no issues.
>> > > > > >
>> > > > > > Ok for master?
>> > > > >
>> > > > > Looking at 2/2 it seems that this is the wrong way to attack the
>> > > > > problem.  The ISA doesn't have such instruction so adding an optab
>> > > > > looks premature.  I suppose that there's no unsigned vector
>> > > > > integer division and thus we open-code that in a different way?
>> > > > > Isn't the correct thing then to fixup that open-coding if it is more
>> > efficient?
>> > > >
>> > >
>> > > The problem is that even if you fixup the open-coding it would need to
>> > > be something target specific? The sequence of instructions we generate
>> > > don't have a GIMPLE representation.  So whatever is generated I'd have
>> > > to fixup in RTL then.
>> > 
>> > What's the operation that doesn't have a GIMPLE representation?
>> 
>> For NEON use two operations:
>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
>> Where the + widens and the >> narrows.  So you give it two shorts, get a 
>> byte
>> 2. Add widening add of lowpart so basically lowpart (a +w b)
>> 
>> For SVE2 we use a different sequence, we use two back-to-back sequences of:
>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
>>select even and odd elements of the vector rather than "top half" and
>>"bottom half".
>> 
>>So this instruction does: Add each vector element of the first source
>>vector to the corresponding vector element of the second source vector,
>>and place the most significant half of the result in the even-numbered
>>half-width destination elements, while setting the odd-numbered elements
>>to zero.
>> 
>> So there's an explicit permute in there. The instructions are sufficiently 
>> different that there
>> wouldn't be a single GIMPLE representation.
>
> I see.  Are these also useful to express scalar integer division?
>
> I'll defer to others to ack the special udiv_pow2_bitmask optab
> or suggest some piecemeal things other targets might be able to do as
> well.  It does look very special.  I'd also bikeshed it to
> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming
> I interpreted 'bitmask' correctly ;)).  It seems to be even less
> general since it is an unary op and the actual divisor is constrained
> by the mode itself?

Yeah, those were my concerns as well.  For n-bit numbers, the same kind
of arithmetic transformation can be used for any 2^m-1 for m in [n/2, n),
so from a target-independent point of view, m==n/2 isn't particularly
special.  Hard-coding one value of m would make sense if there was an
underlying instruction that did exactly this, but like you say, there
isn't.
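The "any 2^m-1 for m in [n/2, n)" observation can be made concrete (a sketch, not tied to any target): for d = 2^m - 1 an exact multiply-high reciprocal always exists with multiplier ceil(2^(n+m) / d) and shift n + m, because the error term magic*d - 2^(n+m) equals d - 2^(n mod m), which is below the 2^m bound that guarantees exactness over n-bit inputs.

```python
def magic_for(m, n=16):
    """Multiplier so (x * magic) >> (n + m) == x // (2**m - 1) on [0, 2^n)."""
    d = (1 << m) - 1
    return -(-(1 << (n + m)) // d)   # ceil(2^(n+m) / d)

n = 16
for m in range(n // 2, n):           # every m in [n/2, n)
    d = (1 << m) - 1
    mag = magic_for(m, n)
    # Sufficient exactness condition: error term at most 2^((n+m)-n).
    assert mag * d - (1 << (n + m)) <= (1 << m)
    for x in range(1 << n):          # exhaustive over all 16-bit inputs
        assert (x * mag) >> (n + m) == x // d
print("magic multipliers exact for all d = 2^m - 1, m in [8, 16)")
```

For m = n/2 this reduces to the divide-by-255 case in the thread (there with the tighter shift 23 rather than 24), so from this angle m = n/2 indeed isn't special.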

Would a compromise be to define an optab for ADDHN and then add a vector
pattern for this division that (at least initially) prefers ADDHN over
the current approach whenever ADDHN is available?  We could then adapt
the conditions on the pattern if other targets also provide ADDHN but
don't want this transform.  (I think the other instructions in the
pattern already have optabs.)

That still leaves open the question about what to do about SVE2,
but the underlying problem there is that the vectoriser doesn't
know about the B/T layout.

Thanks,
Richard


[PATCH] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR

2022-06-14 Thread Siddhesh Poyarekar
The addr_expr computation does not check for error_mark_node before
returning the size expression.  This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.

Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.

gcc/ChangeLog:

PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.

gcc/testsuite/ChangeLog:

PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3,
test_pr105736): New struct declaration, variable and function to
test PR.
(main): Use them.

Signed-off-by: Siddhesh Poyarekar 
---

Tested:

- x86_64 bootstrap and test
- --with-build-config=bootstrap-ubsan build

May I also backport this to gcc12?

 .../gcc.dg/builtin-dynamic-object-size-0.c| 19 ++
 gcc/tree-object-size.cc   | 20 ++-
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index b5b0b3a677c..90f303ef40e 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, 
int incr)
   return __builtin_dynamic_object_size (ptr, 0);
 }
 
+/* Other tests.  */
+
+struct TV4
+{
+  __attribute__((vector_size (sizeof (int) * 4))) int v;
+};
+
+struct TV4 val3;
+int *
+test_pr105736 (struct TV4 *a)
+{
+  return &a->v[0];
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -633,6 +647,11 @@ main (int argc, char **argv)
 FAIL ();
   if (test_loop (arr, 42, 20, 52, 1) != 0)
 FAIL ();
+  /* pr105736.  */
+  int *t = test_pr105736 (&val3);
+  if (__builtin_dynamic_object_size (t, 0) != -1)
+__builtin_abort ();
+
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 5ca87ae3504..12bc0868b77 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
var_size = pt_var_size;
   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
   if (bytes != error_mark_node)
-   bytes = size_for_offset (var_size, bytes);
-  if (var != pt_var
- && pt_var_size
- && TREE_CODE (pt_var) == MEM_REF
- && bytes != error_mark_node)
{
- tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var);
- if (bytes2 != error_mark_node)
+ bytes = size_for_offset (var_size, bytes);
+ if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF)
{
- bytes2 = size_for_offset (pt_var_size, bytes2);
- bytes = size_binop (MIN_EXPR, bytes, bytes2);
+ tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
+  pt_var);
+ if (bytes2 != error_mark_node)
+   {
+ bytes2 = size_for_offset (pt_var_size, bytes2);
+ bytes = size_binop (MIN_EXPR, bytes, bytes2);
+   }
}
}
+  else
+   bytes = size_unknown (object_size_type);
 
   wholebytes
= object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize;
-- 
2.35.3



[PATCH v1.1] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR

2022-06-14 Thread Siddhesh Poyarekar
The addr_expr computation does not check for error_mark_node before
returning the size expression.  This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.

Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.

gcc/ChangeLog:

PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.

gcc/testsuite/ChangeLog:

PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3,
test_pr105736): New struct declaration, variable and function to
test PR.
(main): Use them.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v1:
- Used FAIL() instead of __builtin_abort() in the test.

Tested:

- x86_64 bootstrap and test
- --with-build-config=bootstrap-ubsan build

May I also backport this to gcc12?

 .../gcc.dg/builtin-dynamic-object-size-0.c| 18 +
 gcc/tree-object-size.cc   | 20 ++-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index b5b0b3a677c..01a280b2d7b 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, 
int incr)
   return __builtin_dynamic_object_size (ptr, 0);
 }
 
+/* Other tests.  */
+
+struct TV4
+{
+  __attribute__((vector_size (sizeof (int) * 4))) int v;
+};
+
+struct TV4 val3;
+int *
+test_pr105736 (struct TV4 *a)
+{
+  return &a->v[0];
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -633,6 +647,10 @@ main (int argc, char **argv)
 FAIL ();
   if (test_loop (arr, 42, 20, 52, 1) != 0)
 FAIL ();
+  /* pr105736.  */
+  int *t = test_pr105736 (&val3);
+  if (__builtin_dynamic_object_size (t, 0) != -1)
+FAIL ();
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 5ca87ae3504..12bc0868b77 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
var_size = pt_var_size;
   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
   if (bytes != error_mark_node)
-   bytes = size_for_offset (var_size, bytes);
-  if (var != pt_var
- && pt_var_size
- && TREE_CODE (pt_var) == MEM_REF
- && bytes != error_mark_node)
{
- tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var);
- if (bytes2 != error_mark_node)
+ bytes = size_for_offset (var_size, bytes);
+ if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF)
{
- bytes2 = size_for_offset (pt_var_size, bytes2);
- bytes = size_binop (MIN_EXPR, bytes, bytes2);
+ tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
+  pt_var);
+ if (bytes2 != error_mark_node)
+   {
+ bytes2 = size_for_offset (pt_var_size, bytes2);
+ bytes = size_binop (MIN_EXPR, bytes, bytes2);
+   }
}
}
+  else
+   bytes = size_unknown (object_size_type);
 
   wholebytes
= object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize;
-- 
2.35.3



RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Tamar Christina via Gcc-patches



> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, June 14, 2022 2:43 PM
> To: Richard Biener 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> nd 
> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> Richard Biener  writes:
> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >
> >> > -Original Message-
> >> > From: Richard Biener 
> >> > Sent: Monday, June 13, 2022 12:48 PM
> >> > To: Tamar Christina 
> >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> >> > 
> >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> >> > pow2 bitmask
> >> >
> >> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >> >
> >> > > > -Original Message-
> >> > > > From: Richard Biener 
> >> > > > Sent: Monday, June 13, 2022 10:39 AM
> >> > > > To: Tamar Christina 
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> >> > > > 
> >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division
> >> > > > by
> >> > > > pow2 bitmask
> >> > > >
> >> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> >> > > >
> >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> >> > > > >
> >> > > > > > Hi All,
> >> > > > > >
> >> > > > > > In plenty of image and video processing code it's common to
> >> > > > > > modify pixel values by a widening operation and then scale
> >> > > > > > them back into range
> >> > > > by dividing by 255.
> >> > > > > >
> >> > > > > > This patch adds an optab to allow us to emit an optimized
> >> > > > > > sequence when doing an unsigned division that is equivalent to:
> >> > > > > >
> >> > > > > >x = y / (2 ^ (bitsize (y)/2) - 1)
> >> > > > > >
> >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> > > > > > x86_64-pc-linux-gnu and no issues.
> >> > > > > >
> >> > > > > > Ok for master?
> >> > > > >
> >> > > > > Looking at 2/2 it seems that this is the wrong way to attack
> >> > > > > the problem.  The ISA doesn't have such instruction so adding
> >> > > > > an optab looks premature.  I suppose that there's no unsigned
> >> > > > > vector integer division and thus we open-code that in a different
> way?
> >> > > > > Isn't the correct thing then to fixup that open-coding if it
> >> > > > > is more
> >> > efficient?
> >> > > >
> >> > >
> >> > > The problem is that even if you fixup the open-coding it would
> >> > > need to be something target specific? The sequence of
> >> > > instructions we generate don't have a GIMPLE representation.  So
> >> > > whatever is generated I'd have to fixup in RTL then.
> >> >
> >> > What's the operation that doesn't have a GIMPLE representation?
> >>
> >> For NEON we use two operations:
> >> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
> >> Where the + widens and the >> narrows.  So you give it two
> >> shorts, get a byte 2. Add widening add of lowpart so basically
> >> lowpart (a +w b)
> >>
> >> For SVE2 we use a different sequence, we use two back-to-back
> sequences of:
> >> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
> select
> >>Even and odd elements of the vector rather than "top half" and "bottom
> half".
> >>
> >>So this instruction does : Add each vector element of the first source
> vector to the
> >>corresponding vector element of the second source vector, and place
> the most
> >> significant half of the result in the even-numbered half-width
> destination elements,
> >> while setting the odd-numbered elements to zero.
> >>
> >> So there's an explicit permute in there. The instructions are
> >> sufficiently different that there wouldn't be a single GIMPLE
> representation.
> >
> > I see.  Are these also useful to express scalar integer division?
> >
> > I'll defer to others to ack the special udiv_pow2_bitmask optab or
> > suggest some piecemeal things other targets might be able to do as
> > well.  It does look very special.  I'd also bikeshed it to
> > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> > interpreted 'bitmask' correctly ;)).  It seems to be even less general
> > since it is an unary op and the actual divisor is constrained by the
> > mode itself?
> 
> Yeah, those were my concerns as well.  For n-bit numbers, the same kind of
> arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so
> from a target-independent point of view, m==n/2 isn't particularly special.
> Hard-coding one value of m would make sense if there was an underlying
> instruction that did exactly this, but like you say, there isn't.
> 
> Would a compromise be to define an optab for ADDHN and then add a vector
> pattern for this division that (at least initially) prefers ADDHN over the
> current approach whenever ADDHN is available?  We could then adapt the
> conditions on the pattern if other targets also provide ADDHN but don't want
> this transform.  (I think the other instructions in the pattern already have
> optabs.)
> 
> That still leaves o

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Biener via Gcc-patches



> Am 14.06.2022 um 17:58 schrieb Tamar Christina via Gcc-patches 
> :
> 
> 
> 
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, June 14, 2022 2:43 PM
>> To: Richard Biener 
>> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
>> nd 
>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
>> bitmask
>> 
>> Richard Biener  writes:
 On Mon, 13 Jun 2022, Tamar Christina wrote:
>>> 
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 13, 2022 12:48 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> 
> Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> pow2 bitmask
> 
> On Mon, 13 Jun 2022, Tamar Christina wrote:
> 
>>> -Original Message-
>>> From: Richard Biener 
>>> Sent: Monday, June 13, 2022 10:39 AM
>>> To: Tamar Christina 
>>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>>> 
>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division
>>> by
>>> pow2 bitmask
>>> 
>>> On Mon, 13 Jun 2022, Richard Biener wrote:
>>> 
 On Thu, 9 Jun 2022, Tamar Christina wrote:
 
> Hi All,
> 
> In plenty of image and video processing code it's common to
> modify pixel values by a widening operation and then scale
> them back into range
>>> by dividing by 255.
> 
> This patch adds an optab to allow us to emit an optimized
> sequence when doing an unsigned division that is equivalent to:
> 
>   x = y / (2 ^ (bitsize (y)/2) - 1)
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
 
 Looking at 2/2 it seems that this is the wrong way to attack
 the problem.  The ISA doesn't have such instruction so adding
 an optab looks premature.  I suppose that there's no unsigned
 vector integer division and thus we open-code that in a different
>> way?
 Isn't the correct thing then to fixup that open-coding if it
 is more
> efficient?
>>> 
>> 
>> The problem is that even if you fixup the open-coding it would
>> need to be something target specific? The sequence of
>> instructions we generate don't have a GIMPLE representation.  So
>> whatever is generated I'd have to fixup in RTL then.
> 
> What's the operation that doesn't have a GIMPLE representation?
 
 For NEON use two operations:
 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
Where the + widens and the >> narrows.  So you give it two
 shorts, get a byte 2. Add widening add of lowpart so basically
 lowpart (a +w b)
 
 For SVE2 we use a different sequence, we use two back-to-back
>> sequences of:
 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
>> select
   Even and odd elements of the vector rather than "top half" and "bottom
>> half".
 
   So this instruction does : Add each vector element of the first source
>> vector to the
   corresponding vector element of the second source vector, and place
>> the most
significant half of the result in the even-numbered half-width
>> destination elements,
while setting the odd-numbered elements to zero.
 
 So there's an explicit permute in there. The instructions are
 sufficiently different that there wouldn't be a single GIMPLE
>> representation.
>>> 
>>> I see.  Are these also useful to express scalar integer division?
>>> 
>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or
>>> suggest some piecemeal things other targets might be able to do as
>>> well.  It does look very special.  I'd also bikeshed it to
>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
>>> interpreted 'bitmask' correctly ;)).  It seems to be even less general
>>> since it is an unary op and the actual divisor is constrained by the
>>> mode itself?
>> 
>> Yeah, those were my concerns as well.  For n-bit numbers, the same kind of
>> arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so
>> from a target-independent point of view, m==n/2 isn't particularly special.
>> Hard-coding one value of m would make sense if there was an underlying
>> instruction that did exactly this, but like you say, there isn't.
>> 
>> Would a compromise be to define an optab for ADDHN and then add a vector
>> pattern for this division that (at least initially) prefers ADDHN over the
>> current approach whenever ADDHN is available?  We could then adapt the
>> conditions on the pattern if other targets also provide ADDHN but don't want
>> this transform.  (I think the other instructions in the pattern already have
>> optabs.)
>> 
>> That still leaves open the question about what to do about SVE2, but

c++: Elide calls to NOP module initializers

2022-06-14 Thread Nathan Sidwell via Gcc-patches
This implements NOP module initializer elision. Each CMI gains a new 
flag informing importers whether its initializer actually does something 
(initializes its own globals, and/or calls initializers of its 
imports).  This allows an importer to determine whether to call it.


nathan


--
Nathan Sidwell


Re: [committed] openmp: Conforming device numbers and omp_{initial, invalid}_device

2022-06-14 Thread Thomas Schwinge
Hi Jakub!

On 2022-06-13T14:06:39+0200, Jakub Jelinek via Gcc-patches 
 wrote:
> OpenMP 5.2 changed once more what device numbers are allowed.

> libgomp/

>   * testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
>   test with omp_initial_device.  Use -5 instead of -1 for negative value
>   test.
>   * testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
>   Likewise.  Reorder stop numbers.

In an offloading configuration, I'm seeing:

PASS: libgomp.fortran/get-mapped-ptr-1.f90   -O  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/get-mapped-ptr-1.f90   -O  execution test

Does that one need similar treatment?

It FAILs in 'STOP 1'; 'libgomp.fortran/get-mapped-ptr-1.f90':

 1 program main
 2   use omp_lib
 3   use iso_c_binding
 4   implicit none (external, type)
 5   integer :: d, id
 6   type(c_ptr) :: p
 7   integer, target :: q
 8
 9   d = omp_get_default_device ()
10   id = omp_get_initial_device ()
11
12   if (d < 0 .or. d >= omp_get_num_devices ()) &
13 d = id
14
15   p = omp_target_alloc (c_sizeof (q), d)
16   if (.not. c_associated (p)) &
17 stop 0  ! okay
18
19   if (omp_target_associate_ptr (c_loc (q), p, c_sizeof (q), &
20 0_c_size_t, d) == 0) then
21
22 if(c_associated (omp_get_mapped_ptr (c_loc (q), -1))) &
23   stop 1
[...]


Regards
 Thomas


> --- libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c.jj
> 2022-05-23 21:44:48.950848384 +0200
> +++ libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c   
> 2022-06-13 13:10:56.471535852 +0200
> @@ -17,7 +17,10 @@ main ()
>if (!omp_target_is_accessible (p, sizeof (int), id))
>  __builtin_abort ();
>
> -  if (omp_target_is_accessible (p, sizeof (int), -1))
> +  if (!omp_target_is_accessible (p, sizeof (int), omp_initial_device))
> +__builtin_abort ();
> +
> +  if (omp_target_is_accessible (p, sizeof (int), -5))
>  __builtin_abort ();
>
>if (omp_target_is_accessible (p, sizeof (int), n + 1))
> --- libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90.jj   
> 2022-05-23 21:44:48.954848343 +0200
> +++ libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90  
> 2022-06-13 13:12:08.133819977 +0200
> @@ -19,12 +19,15 @@ program main
>if (omp_target_is_accessible (p, c_sizeof (d), id) /= 1) &
>  stop 2
>
> -  if (omp_target_is_accessible (p, c_sizeof (d), -1) /= 0) &
> +  if (omp_target_is_accessible (p, c_sizeof (d), omp_initial_device) /= 1) &
>  stop 3
>
> -  if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) &
> +  if (omp_target_is_accessible (p, c_sizeof (d), -5) /= 0) &
>  stop 4
>
> +  if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) &
> +stop 5
> +
>! Currently, a host pointer is accessible if the device supports shared
>! memory or omp_target_is_accessible is executed on the host. This
>! test case must be adapted when unified shared memory is avialable.
> @@ -35,14 +38,14 @@ program main
>  !$omp end target
>
>  if (omp_target_is_accessible (p, c_sizeof (d), d) /= shared_mem) &
> -  stop 5;
> +  stop 6;
>
>  if (omp_target_is_accessible (c_loc (a), 128 * sizeof (a(1)), d) /= 
> shared_mem) &
> -  stop 6;
> +  stop 7;
>
>  do i = 1, 128
>if (omp_target_is_accessible (c_loc (a(i)), sizeof (a(i)), d) /= 
> shared_mem) &
> -stop 7;
> +stop 8;
>  end do
>
>end do
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: Munich; Commercial Register: 
Munich, HRB 106955


Re: [PATCH] opts: improve option suggestion

2022-06-14 Thread Jeff Law via Gcc-patches




On 5/24/2022 2:45 AM, Martin Liška wrote:

PING^1

On 5/12/22 09:10, Martin Liška wrote:

On 5/11/22 20:49, David Malcolm wrote:

On Wed, 2022-05-11 at 16:49 +0200, Martin Liška wrote:

In case where we have 2 equally good candidates like
-ftrivial-auto-var-init=
-Wtrivial-auto-var-init

for -ftrivial-auto-var-init, we should take the candidate that
has a difference in trailing sign symbol.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

 PR driver/105564

gcc/ChangeLog:

 * spellcheck.cc (test_find_closest_string): Add new test.
 * spellcheck.h (class best_match): Prefer a difference in
 trailing sign symbol.

OK
jeff



[PATCH] i386: Disallow sibcall when calling ifunc functions with PIC register

2022-06-14 Thread H.J. Lu via Gcc-patches
Disallow sibcall when calling ifunc functions with the PIC register so that
the PIC register can be restored.

gcc/

PR target/105960
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if PIC register is used when calling ifunc functions.

gcc/testsuite/

PR target/105960
* gcc.target/i386/pr105960.c: New test.
---
 gcc/config/i386/i386.cc  |  9 +
 gcc/testsuite/gcc.target/i386/pr105960.c | 19 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105960.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3d189e124e4..1ca7836e11e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1015,6 +1015,15 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
}
 }
 
+  if (decl && ix86_use_pseudo_pic_reg ())
+{
+  /* When PIC register is used, it must be restored after ifunc
+function returns.  */
+   cgraph_node *node = cgraph_node::get (decl);
+   if (node && node->ifunc_resolver)
+return false;
+}
+
   /* Otherwise okay.  That also includes certain types of indirect calls.  */
   return true;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr105960.c b/gcc/testsuite/gcc.target/i386/pr105960.c
new file mode 100644
index 000..db137a1642d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105960.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O2 -fpic" } */
+
+__attribute__((target_clones("default","fma")))
+static inline double
+expfull_ref(double x)
+{
+  return __builtin_pow(x, 0.1234);
+}
+
+double
+exp_ref(double x)
+{
+  return expfull_ref(x);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*expfull_ref@PLT" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*expfull_ref@PLT" { target ia32 } } } */
-- 
2.36.1



[PATCH] gcc/configure.ac: fix --enable-fixed-point enablement [PR34422]

2022-06-14 Thread Eric Gallager via Gcc-patches
So, in investigating PR target/34422, I discovered that the gcc
subdirectory's configure script had an instance of AC_ARG_ENABLE with its
3rd and 4th arguments reversed: the branch that warns that the
--enable-fixed-point flag is being ignored is the one taken when that flag
hasn't even been passed in the first place. The attached patch puts
the warning in the correct argument to the macro in question. (I'm not
including the regeneration of gcc/configure in the patch this time,
since that confused people last time.) OK to commit, with an
appropriate ChangeLog?


patch-gcc_configure.diff
Description: Binary data


Re: [PATCH 2/5] xtensa: Add support for sibling call optimization

2022-06-14 Thread Max Filippov via Gcc-patches
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -Os  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/packed-array.c execution test
FAIL: gcc.dg/pr20115.c execution test
FAIL: gcc.dg/pr44404.c execution test
FAIL: gcc.dg/pr81292-2.c execution test
FAIL: gcc.dg/strlenopt-31.c execution test
FAIL: gcc.dg/strlenopt-81.c execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O2  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -Os  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/pr56661.c   -Os  execution test
FAIL: gcc.dg/torture/pr65077.c   -O2  execution test
FAIL: gcc.dg/torture/pr65077.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/pr65077.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr65077.c   -Os  execution test
FAIL: gcc.dg/torture/pr65077.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/pr67916.c   -O2  execution test
FAIL: gcc.dg/torture/pr67916.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/pr67916.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr67916.c   -Os  execution test
FAIL: gcc.dg/torture/pr67916.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/tree-ssa/cswtch-3.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-5.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-6.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-7.c execution test

The code generated for e.g. gcc.c-torture/execute/921208-2.c looks like this:

   .file   "921208-2.c"
   .text
   .literal_position
   .align  4
   .global g
   .type   g, @function
g:
   ret.n
   .size   g, .-g
   .literal_position
   .literal .LC1, g@PLT
   .literal .LC3, 1072693248
   .literal .LC4, 1073741824
   .align  4
   .global f
   .type   f, @function
f:
        addi    sp, sp, -16
        s32i.n  a13, sp, 4
        l32r    a13, .LC3
        s32i.n  a12, sp, 8
        s32i.n  a14, sp, 0
        movi.n  a12, 0
        l32r    a14, .LC1
        s32i.n  a0, sp, 12
        mov.n   a3, a13
        mov.n   a4, a12
        mov.n   a5, a13
        mov.n   a2, a12
        callx0  a14
        l32i.n  a0, sp, 12
        l32i.n  a14, sp, 0
        mov.n   a4, a12
        mov.n   a5, a13
        l32i.n  a12, sp, 8
        l32i.n  a13, sp, 4
        l32r    a3, .LC4
        movi.n  a2, 0
        addi    sp, sp, 16
        jx      a14
        .size   f, .-f
        .section        .text.startup,"ax",@progbits
        .literal_position
        .literal .LC5, f@PLT
        .literal .LC6, exit@PLT
        .align  4
        .global main
        .type   main, @function
main:
        addi    sp, sp, -16
        l32r    a2, .LC5
        s32i.n  a0, sp, 12
        callx0  a2
        l32r    a3, .LC6
        movi.n  a2, 0
        callx0  a3
        .size   main, .-main
   .ident  "GCC: (GNU) 13.0.0 20220614 (experimental)"

-- 
Thanks.
-- Max


[committed] libstdc++: Check for size overflow in constexpr allocation [PR105957]

2022-06-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/105957
* include/bits/allocator.h (allocator::allocate): Check for
overflow in constexpr allocation.
* testsuite/20_util/allocator/105975.cc: New test.
---
 libstdc++-v3/include/bits/allocator.h  |  7 ++-
 .../testsuite/20_util/allocator/105975.cc  | 18 ++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/allocator/105975.cc

diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h
index ee1121b080a..aec0b374fd1 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -184,7 +184,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_t __n)
   {
if (std::__is_constant_evaluated())
- return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
+ {
+   if (__builtin_mul_overflow(__n, sizeof(_Tp), &__n))
+ std::__throw_bad_array_new_length();
+   return static_cast<_Tp*>(::operator new(__n));
+ }
+
return __allocator_base<_Tp>::allocate(__n, 0);
   }
 
diff --git a/libstdc++-v3/testsuite/20_util/allocator/105975.cc b/libstdc++-v3/testsuite/20_util/allocator/105975.cc
new file mode 100644
index 000..4342aeade04
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/allocator/105975.cc
@@ -0,0 +1,18 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// PR libstdc++/105957
+
+#include 
+
+consteval bool test_pr105957()
+{
+  std::allocator a;
+  auto n = std::size_t(-1) / (sizeof(long long) - 1);
+  auto p = a.allocate(n); // { dg-error "constexpr" }
+  a.deallocate(p, n);
+  return true;
+}
+static_assert( test_pr105957() );
+
+// { dg-error "throw_bad_array_new_length" "" { target *-*-* } 0 }
-- 
2.34.3



[committed] libstdc++: Fix indentation in allocator base classes

2022-06-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/new_allocator.h: Fix indentation.
* include/ext/malloc_allocator.h: Likewise.
---
 libstdc++-v3/include/bits/new_allocator.h   | 6 +++---
 libstdc++-v3/include/ext/malloc_allocator.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/new_allocator.h b/libstdc++-v3/include/bits/new_allocator.h
index 1a5bc51b956..92ae9847f1c 100644
--- a/libstdc++-v3/include/bits/new_allocator.h
+++ b/libstdc++-v3/include/bits/new_allocator.h
@@ -119,9 +119,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_type __n, const void* = static_cast(0))
   {
 #if __cplusplus >= 201103L
-// _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 3308. std::allocator().allocate(n)
-static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3308. std::allocator().allocate(n)
+   static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
 #endif
 
if (__builtin_expect(__n > this->_M_max_size(), false))
diff --git a/libstdc++-v3/include/ext/malloc_allocator.h b/libstdc++-v3/include/ext/malloc_allocator.h
index b61e9a85bb2..82b3f0a1c6f 100644
--- a/libstdc++-v3/include/ext/malloc_allocator.h
+++ b/libstdc++-v3/include/ext/malloc_allocator.h
@@ -103,9 +103,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_type __n, const void* = 0)
   {
 #if __cplusplus >= 201103L
-// _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 3308. std::allocator().allocate(n)
-static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3308. std::allocator().allocate(n)
+   static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
 #endif
 
if (__builtin_expect(__n > this->_M_max_size(), false))
-- 
2.34.3



[committed] libstdc++: Inline all basic_string::compare overloads [PR59048]

2022-06-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Defining the compare member functions inline allows calls to
traits_type::length and std::min to be inlined, taking advantage of
constant expression arguments. When not inline, the compiler prefers to
use the explicit instantiation definitions in libstdc++.so and can't
take advantage of constant arguments.

libstdc++-v3/ChangeLog:

PR libstdc++/59048
* include/bits/basic_string.h (compare): Define inline.
* include/bits/basic_string.tcc (compare): Remove out-of-line
definitions.
* include/bits/cow_string.h (compare): Define inline.
* testsuite/21_strings/basic_string/operations/compare/char/3.cc:
New test.
---
 libstdc++-v3/include/bits/basic_string.h  | 63 --
 libstdc++-v3/include/bits/basic_string.tcc| 85 ---
 libstdc++-v3/include/bits/cow_string.h| 63 --
 .../basic_string/operations/compare/char/3.cc |  7 ++
 4 files changed, 123 insertions(+), 95 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/3.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index f76ddf970c6..a34b3d9ed28 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3235,7 +3235,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(size_type __pos, size_type __n, const basic_string& __str) const;
+  compare(size_type __pos, size_type __n, const basic_string& __str) const
+  {
+   _M_check(__pos, "basic_string::compare");
+   __n = _M_limit(__pos, __n);
+   const size_type __osize = __str.size();
+   const size_type __len = std::min(__n, __osize);
+   int __r = traits_type::compare(_M_data() + __pos, __str.data(), __len);
+   if (!__r)
+ __r = _S_compare(__n, __osize);
+   return __r;
+  }
 
   /**
*  @brief  Compare substring to a substring.
@@ -3263,7 +3273,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos1, size_type __n1, const basic_string& __str,
- size_type __pos2, size_type __n2 = npos) const;
+ size_type __pos2, size_type __n2 = npos) const
+  {
+   _M_check(__pos1, "basic_string::compare");
+   __str._M_check(__pos2, "basic_string::compare");
+   __n1 = _M_limit(__pos1, __n1);
+   __n2 = __str._M_limit(__pos2, __n2);
+   const size_type __len = std::min(__n1, __n2);
+   int __r = traits_type::compare(_M_data() + __pos1,
+  __str.data() + __pos2, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __n2);
+   return __r;
+  }
 
   /**
*  @brief  Compare to a C string.
@@ -3281,7 +3303,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT;
+  compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT
+  {
+   __glibcxx_requires_string(__s);
+   const size_type __size = this->size();
+   const size_type __osize = traits_type::length(__s);
+   const size_type __len = std::min(__size, __osize);
+   int __r = traits_type::compare(_M_data(), __s, __len);
+   if (!__r)
+ __r = _S_compare(__size, __osize);
+   return __r;
+  }
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 5 String::compare specification questionable
@@ -3306,7 +3338,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(size_type __pos, size_type __n1, const _CharT* __s) const;
+  compare(size_type __pos, size_type __n1, const _CharT* __s) const
+  {
+   __glibcxx_requires_string(__s);
+   _M_check(__pos, "basic_string::compare");
+   __n1 = _M_limit(__pos, __n1);
+   const size_type __osize = traits_type::length(__s);
+   const size_type __len = std::min(__n1, __osize);
+   int __r = traits_type::compare(_M_data() + __pos, __s, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __osize);
+   return __r;
+  }
 
   /**
*  @brief  Compare substring against a character %array.
@@ -3335,7 +3378,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos, size_type __n1, const _CharT* __s,
- size_type __n2) const;
+ size_type __n2) const
+  {
+   __glibcxx_requires_string_len(__s, __n2);
+   _M_check(__pos, "basic_string::compare");
+   __n1 = _M_limit(__pos, __n1);
+   const size_type __len = std::min(__n1, __n2);
+   int __r = traits_type::compare(_M_data() + __pos, __s, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __n2);
+   return __r;
+  }
 
 #if __cplusplus >= 202002L
   constexpr bool
diff --git a/libstdc++-v3/include/bits/b

[committed] libstdc++: Check lengths first in operator== for basic_string [PR62187]

2022-06-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

As confirmed by LWG 2852, the calls to traits_type::compare do not need
to be observable, so we can make operator== compare string lengths
first and return immediately for non-equal lengths. This avoids doing a
slow string comparison for "abc...xyz" == "abc...xy". Previously we only
did this optimization for std::char_traits<char>, but we can enable it
unconditionally thanks to LWG 2852.

For comparisons with a const char* we can call traits_type::length right
away to do the same optimization. That strlen call can be folded away
for constant arguments, making it very efficient.

For the pre-C++20 operator== and operator!= overloads we can swap the
order of the arguments to take advantage of the operator== improvements.

libstdc++-v3/ChangeLog:

PR libstdc++/62187
* include/bits/basic_string.h (operator==): Always compare
lengths before checking string contents.
[!__cpp_lib_three_way_comparison] (operator==, operator!=):
Reorder arguments.
---
 libstdc++-v3/include/bits/basic_string.h | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index a34b3d9ed28..57247e306dc 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3627,17 +3627,10 @@ _GLIBCXX_END_NAMESPACE_CXX11
 operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
 _GLIBCXX_NOEXCEPT
-{ return __lhs.compare(__rhs) == 0; }
-
-  template
-_GLIBCXX20_CONSTEXPR
-inline
-typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value, bool>::__type
-operator==(const basic_string<_CharT>& __lhs,
-  const basic_string<_CharT>& __rhs) _GLIBCXX_NOEXCEPT
-{ return (__lhs.size() == __rhs.size()
- && !std::char_traits<_CharT>::compare(__lhs.data(), __rhs.data(),
-   __lhs.size())); }
+{
+  return __lhs.size() == __rhs.size()
+  && !_Traits::compare(__lhs.data(), __rhs.data(), __lhs.size());
+}
 
   /**
*  @brief  Test equivalence of string and C string.
@@ -3650,7 +3643,10 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
   const _CharT* __rhs)
-{ return __lhs.compare(__rhs) == 0; }
+{
+  return __lhs.size() == _Traits::length(__rhs)
+  && !_Traits::compare(__lhs.data(), __rhs, __lhs.size());
+}
 
 #if __cpp_lib_three_way_comparison
   /**
@@ -3691,7 +3687,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const _CharT* __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
-{ return __rhs.compare(__lhs) == 0; }
+{ return __rhs == __lhs; }
 
   // operator !=
   /**
@@ -3717,7 +3713,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator!=(const _CharT* __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
-{ return !(__lhs == __rhs); }
+{ return !(__rhs == __lhs); }
 
   /**
*  @brief  Test difference of string and C string.
-- 
2.34.3



[PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2022-06-14 Thread Lewis Hyatt via Gcc-patches
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

The attached patch resolves PR preprocessor/103902 as described in the patch
message inline below. bootstrap + regtest all languages was successful on
x86-64 Linux, with no new failures:

FAIL 103 103
PASS 542338 542371
UNSUPPORTED 15247 15250
UNTESTED 136 136
XFAIL 4166 4166
XPASS 17 17

Please let me know if it looks OK?

A few questions I have:

- A difference introduced with this patch is that after lexing something
like `operator ""_abc', then `_abc' is added to the identifier hash map,
whereas previously it was not. I feel like this must be OK because with the
optional space as in `operator "" _abc', it would be added with or without the
patch.

- The behavior of `#pragma GCC poison' is not consistent (including prior to
  my patch). I tried to make it more so, but there is still one thing I want
  to ask about. Leaving aside extended characters for now, the inconsistency
  is that currently the poison is only checked when the suffix appears as a
  standalone token.

  #pragma GCC poison _X
  bool operator ""_X (unsigned long long);   //accepted before the patch,
 //rejected after it
  bool operator "" _X (unsigned long long);  //rejected either before or after
  const char * operator ""_X (const char *, unsigned long); //accepted before,
//rejected after
  const char * operator "" _X (const char *, unsigned long); //rejected either

  const char * s = ""_X; //accepted before the patch, rejected after it
  const bool b = 1_X; //accepted before or after 

I feel like after the patch, the behavior is the expected one for all cases
but the last. Here, we allow the poisoned identifier because it is not lexed
as an identifier; it is lexed as part of a pp-number. Does it seem OK like
this, or does it need to be addressed?

Thanks for taking a look!

-Lewis
Subject: [PATCH] libcpp: Handle extended characters in user-defined literal 
suffix [PR103902]

The PR complains that we do not handle UTF-8 in the suffix for a user-defined
literal, such as:

bool operator ""_π (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the "normal"
way as its own token. But when it is lexed as part of the string token, this
is handled in lex_string() with a one-off loop that is not aware of extended
characters.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token. It is
somewhat duplicative of the code in lex_identifier(), which handles the normal
case, but I think there's no good way to avoid that without pessimizing the
usual case, since lex_identifier() takes advantage of the fact that the first
character of the identifier has already been analyzed. The code duplication is
somewhat offset by factoring out the identifier lexing diagnostics (e.g. for
poisoned identifiers), which were formerly duplicated in two places, and have
been factored into their own function that's used in (now) 3 places.

BTW, the other place that was lexing identifiers is lex_identifier_intern(),
which is used to implement #pragma push_macro and #pragma pop_macro. This does
not support extended characters either. I will add that in a subsequent patch,
because it can't directly reuse the new function, but rather needs to lex from
a string instead of a cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix, and we
check for poisoned identifiers there as well.

PR preprocessor/103902

libcpp/ChangeLog:

* lex.cc (identifier_diagnostics_on_lex): New function refactors
common code from...
(lex_identifier_intern): ...here, and...
(lex_identifier): ...here.
(struct scan_id_result): New struct to hold the result of...
(scan_cur_identifier): ...new function.
(create_literal2): New function.
(is_macro): Removed function that is now handled directly in
lex_string() and lex_raw_string().
(is_macro_not_literal_suffix): Likewise.
(lit_accum::create_literal2): New function.
(lex_raw_string): Make use of new function scan_cur_identifier().
(lex_string): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
* g++.dg/cpp0x/udlit-extended-id-3.C: New test.

diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C 
b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
new file mode 100644
index 000..411d4fdd0ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
@@ -0,0 +1,68 @@
+// { dg-do run 

Re: [PATCH][_Hashtable] Fix insertion of range of type convertible to value_type PR 105714

2022-06-14 Thread Jonathan Wakely via Gcc-patches
On Wed, 25 May 2022 at 06:10, François Dumont  wrote:
>
> Here is the patch to fix just what is described in PR 105714.
>
>  libstdc++: [_Hashtable] Insert range of types convertible to
> value_type PR 105714
>
>  Fix insertion of range of types convertible to value_type.
>
>  libstdc++-v3/ChangeLog:
>
>  PR libstdc++/105714
>  * include/bits/hashtable_policy.h (_ValueTypeEnforcer): New.
>  * include/bits/hashtable.h
> (_Hashtable<>::_M_insert_unique_aux): New.
>  (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&,
> true_type)): Use latters.
>  (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&,
> false_type)): Likewise.
>  (_Hashtable(_InputIterator, _InputIterator, size_type,
> const _Hash&, const _Equal&,
>  const allocator_type&, true_type)): Use this.insert range.
>  (_Hashtable(_InputIterator, _InputIterator, size_type,
> const _Hash&, const _Equal&,
>  const allocator_type&, false_type)): Use _M_insert.
>  * testsuite/23_containers/unordered_map/cons/56112.cc:
> Check how many times conversion
>  is done.
>  * testsuite/23_containers/unordered_map/insert/105714.cc:
> New test.
>  * testsuite/23_containers/unordered_set/insert/105714.cc:
> New test.
>
> Tested under Linux x64, ok to commit ?

I think "_ConvertToValueType" would be a better name than
_ValueTypeEnforcer, and all overloads of _ValueTypeEnforcer can be
const.

OK with that change, thanks.



libgo patch committed: Format the syscall package

2022-06-14 Thread Ian Lance Taylor via Gcc-patches
This Go formatter is starting to format documentation comments in some
cases.  As a step toward that in libgo, this patch adds blank lines
after //sys comments in the syscall package where needed, and then
runs the new formatter on the syscall package files.  This is the
libgo version of https://go.dev/cl/407136.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
3f4a86eef4ebc28e394a7108a2353098d2ca4856
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 2cf7141c4fa..aeada9f8d0c 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-0058658a9efb6e5c5faa6f0f65949beea5ddbc98
+bbb3a4347714faee620dc205674510a0f20b81ae
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/dir_plan9.go b/libgo/go/syscall/dir_plan9.go
index 4ed052de761..1667cbc02f4 100644
--- a/libgo/go/syscall/dir_plan9.go
+++ b/libgo/go/syscall/dir_plan9.go
@@ -184,6 +184,7 @@ func gbit8(b []byte) (uint8, []byte) {
 }
 
 // gbit16 reads a 16-bit number in little-endian order from b and returns it 
with the remaining slice of b.
+//
 //go:nosplit
 func gbit16(b []byte) (uint16, []byte) {
return uint16(b[0]) | uint16(b[1])<<8, b[2:]
diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go
index 6c2441d364d..59f7a82c6d7 100644
--- a/libgo/go/syscall/errstr.go
+++ b/libgo/go/syscall/errstr.go
@@ -4,8 +4,8 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build !hurd
-// +build !linux
+//go:build !hurd && !linux
+// +build !hurd,!linux
 
 package syscall
 
diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go
index 5b19e6f202d..03a327dbc90 100644
--- a/libgo/go/syscall/errstr_glibc.go
+++ b/libgo/go/syscall/errstr_glibc.go
@@ -7,6 +7,7 @@
 // We use this rather than errstr.go because on GNU/Linux sterror_r
 // returns a pointer to the error message, and may not use buf at all.
 
+//go:build hurd || linux
 // +build hurd linux
 
 package syscall
diff --git a/libgo/go/syscall/exec_bsd.go b/libgo/go/syscall/exec_bsd.go
index 86e513efdea..e631593cbd9 100644
--- a/libgo/go/syscall/exec_bsd.go
+++ b/libgo/go/syscall/exec_bsd.go
@@ -49,6 +49,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, 
attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_freebsd.go b/libgo/go/syscall/exec_freebsd.go
index f02f89d1ca0..8e8ecb7e989 100644
--- a/libgo/go/syscall/exec_freebsd.go
+++ b/libgo/go/syscall/exec_freebsd.go
@@ -57,6 +57,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, 
attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_hurd.go b/libgo/go/syscall/exec_hurd.go
index 06df513c55c..a62b3e920e6 100644
--- a/libgo/go/syscall/exec_hurd.go
+++ b/libgo/go/syscall/exec_hurd.go
@@ -49,6 +49,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, 
attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_linux.go b/libgo/go/syscall/exec_linux.go
index 86fb8e84a66..77846af89e4 100644
--- a/libgo/go/syscall/exec_linux.go
+++ b/libgo/go/syscall/exec_linux.go
@@ -80,6 +80,7 @@ func runtime_AfterFork()
 func runtime_AfterForkInChild()
 
 // Implemented in clone_linux.c
+//
 //go:noescape
 func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, ctid *Pid_t, 
regs unsafe.Pointer) _C_long
 
@@ -92,6 +93,7 @@ func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, 
ctid *Pid_t, regs
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, 
attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Set up and fork. This returns immediately in the parent or
diff --git a/libgo/go/syscall/exec_stubs.go b/libgo/go/syscall/exec_stubs.go
index c837cf7a4e2..

Re: [PATCH V2]rs6000: Store complicated constant into pool

2022-06-14 Thread Segher Boessenkool
Hi!

On Tue, Jun 14, 2022 at 09:23:55PM +0800, Jiufu Guo wrote:
> This patch reduces the threshold of instruction number for storing
> constant to pool and update cost for constant and mem accessing.
> And then if building the constant needs more than 2 instructions (or
> more than 1 instruction on P10), then prefer to load it from constant
> pool.

Have you tried with different limits?  And, p10 is a red herring, you
actually test if prefixed insns are used.

>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
>   Exclude rtx with code 'HIGH'.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* Exclude CONSTANT HIGH part.  e.g.
> + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )).  */
> +  if (GET_CODE (x) == HIGH)
>  return true;

So, why is this?  You didn't explain.

> @@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode 
> mode)
>   && FP_REGNO_P (REGNO (operands[0])))
>  || !CONST_INT_P (operands[1])
>  || (num_insns_constant (operands[1], mode)
> -> (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
> +> (TARGET_PREFIXED ? 1 : 2)))
>  && !toc_relative_expr_p (operands[1], false, NULL, NULL)
>  && (TARGET_CMODEL == CMODEL_SMALL
>  || can_create_pseudo_p ()

This is the more obvious part.

> @@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>  
>  case CONST_DOUBLE:
>  case CONST_WIDE_INT:
> +  /* It may needs a few insns for const to SET. -1 for outer SET code.  
> */
> +  if (outer_code == SET)
> + {
> +   *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
> +   return true;
> + }
> +  /* FALLTHRU */
> +
>  case CONST:
>  case HIGH:
>  case SYMBOL_REF:

But this again isn't an obvious improvement at all, and needs
performance testing -- separately of the other changes.

> @@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>  case MEM:
>/* When optimizing for size, MEM should be slightly more expensive
>than generating address, e.g., (plus (reg) (const)).
> -  L1 cache latency is about two instructions.  */
> -  *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
> +  L1 cache latency is about two instructions.
> +  For prefixed load (pld), we would set it slightly faster than
> +  than two instructions. */
> +  *total = !speed
> +  ? COSTS_N_INSNS (1) + 1
> +  : TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2);
>if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x)))
>   *total += COSTS_N_INSNS (100);
>return true;

And this is completely independent of the rest as well.  Cost 5 or 7 are
completely random numbers, why did you pick these?  Does it work better
than 8, or 4, etc.?


> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-options "-O" } */
> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */

Why?  This is still better generated in code, no?  It should never be
loaded from a constant pool (it is hex 4000___, easy to
construct with just one or two insns).

> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0x92349234ULL; }
>  unsigned long long mskl1() { return 0x2bcd2bcdULL; }
>  unsigned long long mskse() { return 0x12341234ULL; }
>  
> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */

Please make this the exact number of times you want to see rldimi and
the number of times you want a load.

Btw, you need to write
  \m(?:rldimi|ld|pld)\M
or it will mean
  \mrldimi
or
  ld
or
  pld\M
(and that "ld" will match anything that "pld$" will match of course).


So no doubt this will improve things, but we need testing of each part
separately.  Also look at code size, or differences in the generated
code in general: this is much more sensitive to detect than performance,
and is not itself sensitive to things like system load, so a) is easier
to measure, and b) has more useful outputs, outputs that tell more of
the whole story.


Segher


[PATCH] Fix ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread liuhongt via Gcc-patches
(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*_blendv_ltint"
>   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
>   (unspec:
> [(match_operand: 1 "register_operand" "0,0,x")
>  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
>  (subreg:
>(lt:VI48_AVX
>  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
>  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp3_ltint_not"
>  [(set (match_operand:VI48_AVX  0 "register_operand")
>(vec_merge:VI48_AVX
>(match_operand:VI48_AVX 1 "vector_operand")
>(match_operand:VI48_AVX 2 "vector_operand")
>(unspec:
>  [(subreg:VI48_AVX
>   (not:
> (match_operand: 3 "vector_operand")) 0)
>   (match_operand:VI48_AVX 4 "const0_operand")
>   (match_operand:SI 5 "const_0_to_7_operand")]
>   UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
>   if (!MEM_P (operands[3]))
> operands[3] = force_reg (mode, operands[3]);
>   operands[3] = lowpart_subreg (mode, operands[3], mode);
>
> Which won't fly when operand 3 is memory operand...
>

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/105953
* config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
operands[3].

gcc/testsuite/ChangeLog:

* g++.target/i386/pr105953.C: New test.
---
 gcc/config/i386/sse.md   | 3 +--
 gcc/testsuite/g++.target/i386/pr105953.C | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 75609eaf9b7..3e3d96fe087 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not"
  gen_lowpart (mode, operands[1]));
   operands[2] = gen_lowpart (mode, operands[2]);
 
-  if (!MEM_P (operands[3]))
-operands[3] = force_reg (mode, operands[3]);
+  operands[3] = force_reg (mode, operands[3]);
   operands[3] = lowpart_subreg (mode, operands[3], mode);
 })
 
diff --git a/gcc/testsuite/g++.target/i386/pr105953.C 
b/gcc/testsuite/g++.target/i386/pr105953.C
new file mode 100644
index 000..b423d2dfdae
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105953.C
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vl -mabi=ms" } */
+
+#include "pr100738-1.C"
-- 
2.18.1



Re: [PATCH] Fix ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 15, 2022 at 12:49 AM liuhongt  wrote:
>
> (In reply to Uroš Bizjak from comment #1)
> > Instruction does not accept memory operand for operand 3:
> >
> > (define_insn_and_split
> > "*_blendv_ltint"
> >   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
> >   (unspec:
> > [(match_operand: 1 "register_operand" "0,0,x")
> >  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
> >  (subreg:
> >(lt:VI48_AVX
> >  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
> >  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> > UNSPEC_BLENDV))]
> >
> > The problematic insn is:
> >
> > (define_insn_and_split "*avx_cmp3_ltint_not"
> >  [(set (match_operand:VI48_AVX  0 "register_operand")
> >(vec_merge:VI48_AVX
> >(match_operand:VI48_AVX 1 "vector_operand")
> >(match_operand:VI48_AVX 2 "vector_operand")
> >(unspec:
> >  [(subreg:VI48_AVX
> >   (not:
> > (match_operand: 3 "vector_operand")) 0)
> >   (match_operand:VI48_AVX 4 "const0_operand")
> >   (match_operand:SI 5 "const_0_to_7_operand")]
> >   UNSPEC_PCMP)))]
> >
> > which gets split to the above pattern.
> >
> > In the preparation statements we have:
> >
> >   if (!MEM_P (operands[3]))
> > operands[3] = force_reg (mode, operands[3]);
> >   operands[3] = lowpart_subreg (mode, operands[3], mode);
> >
> > Which won't fly when operand 3 is memory operand...
> >
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/105953
> * config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
> operands[3].
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr105953.C: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md   | 3 +--
>  gcc/testsuite/g++.target/i386/pr105953.C | 4 
>  2 files changed, 5 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 75609eaf9b7..3e3d96fe087 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not"
>   gen_lowpart (mode, operands[1]));
>operands[2] = gen_lowpart (mode, operands[2]);
>
> -  if (!MEM_P (operands[3]))
> -operands[3] = force_reg (mode, operands[3]);
> +  operands[3] = force_reg (mode, operands[3]);
>operands[3] = lowpart_subreg (mode, operands[3], mode);
>  })
>
> diff --git a/gcc/testsuite/g++.target/i386/pr105953.C 
> b/gcc/testsuite/g++.target/i386/pr105953.C
> new file mode 100644
> index 000..b423d2dfdae
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105953.C
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512vl -mabi=ms" } */
> +
> +#include "pr100738-1.C"
> --
> 2.18.1
>


Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-14 Thread Yonghong Song via Gcc-patches




On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow to associate (to "annotate" or
   to "tag") particular declarations and types with arbitrary strings. As
   explained below, this is intended to be used to, for example, characterize
   certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
   DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
 tags on certain language elements, such as struct fields.

 The purpose of these annotations is to provide additional information about
 types, variables, and function parameters of interest to the kernel. A
 driving use case is to tag pointer types within the linux kernel and eBPF
 programs with additional semantic information, such as '__user' or '__rcu'.

 For example, consider the linux kernel function do_execve with the
 following declaration:

   static int do_execve(struct filename *filename,
  const char __user *const __user *__argv,
  const char __user *const __user *__envp);

 Here, __user could be defined with these annotations to record semantic
 information about the pointer parameters (e.g., they are user-provided) in
 DWARF and BTF information. Other kernel facilities such as the eBPF verifier
 can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

 The main motivation for emitting the tags in DWARF is that the Linux kernel
 generates its BTF information via pahole, using DWARF as a source:

     +--------+  BTF                    BTF   +----------+
     | pahole |-----> vmlinux.btf ----------->| verifier |
     +--------+                               +----------+
         ^                                         ^
         |                                         |
   DWARF |                                     BTF |
         |                                         |
      vmlinux                              +-------------+
      module1.ko                           | BPF program |
      module2.ko                           +-------------+
        ...

 This is because:

 a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

 b)  GCC can generate BTF for whatever target with -gbtf, but there is no
 support for linking/deduplicating BTF in the linker.

 In the scenario above, the verifier needs access to the pointer tags of
 both the kernel types/declarations (conveyed in the DWARF and translated
 to BTF by pahole) and those of the BPF program (available directly in BTF).

 Another motivation for having the tag information in DWARF, unrelated to
 BPF and BTF, is that the drgn project (another DWARF consumer) also wants
 to benefit from these tags in order to differentiate between different
 kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

 This is easy: the main purpose of having this info in BTF is for the
 compiled eBPF programs. The kernel verifier can then access the tags
 of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

   https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
   https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
   https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
===

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. These DIEs link the
arbitrary tag value to the item they annotate.

For example, the following variable declaration:

   #define __typetag1 _

Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-06-14 Thread Kewen.Lin via Gcc-patches
Hi Honza,

Thanks for the comments!  Some replies are inlined below.

on 2022/6/14 19:37, Jan Hubicka wrote:
>> Hi,
>>
>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>> if func->decl is not null but no cgraph node is available for it.
>> As PR105818 shows, this could give unexpected result.  For the
>> case in PR105818, when parsing bar decl in function foo, the cfun
>> is a function structure for foo, for which there is none cgraph
>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>> the context is to optimize for size, the flag optimize_size is
>> true.
>>
>> The patch is to make optimize_function_for_size_p to check
>> optimize_size as what it does when func->decl is unavailable.
>>
>> One regression failure got exposed on aarch64-linux-gnu:
>>
>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>  -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>
>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>> used in function fold_range_test during c parsing, it uses
>> optimize_function_for_speed_p which is equal to the invertion
>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>> but no cgraph node for it, w/o this patch function
>> optimize_function_for_speed_p returns true eventually, while it
>> returns false with this patch.  Since the command line option -Os
>> is specified, there is no reason to interpret it as "for speed".
>> I think this failure is expected and adjust the test case
>> accordingly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>
>>  PR target/105818
>>
>> gcc/ChangeLog:
>>
>>  * predict.cc (optimize_function_for_size_p): Check optimize_size when
>>  func->decl is valid but its cgraph node is unavailable.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr105818.c: New test.
>>  * gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>> ---
>>  gcc/predict.cc  | 2 +-
>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>
>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>> index 5734e4c8516..6c60a973236 100644
>> --- a/gcc/predict.cc
>> +++ b/gcc/predict.cc
>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>cgraph_node *n = cgraph_node::get (fun->decl);
>>if (n)
>>  return n->optimize_for_size_p ();
>> -  return OPTIMIZE_SIZE_NO;
>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
> 
> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
> probably better since one can change optimize_size with optimization
> attribute.

Good point, agree!
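
A sketch of how the hunk might look with that suggestion (hypothetical, not
the committed form):

```c
  /* gcc/predict.cc, optimize_function_for_size_p -- sketch only.  */
  cgraph_node *n = cgraph_node::get (fun->decl);
  if (n)
    return n->optimize_for_size_p ();
  return opt_for_fn (fun->decl, optimize_size)
	 ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
```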

> However I think in most cases we check for optimize_size early I think
> we are doing something wrong, since at that time htere is no profile
> available.  Why exactly PR105818 hits the flag change issue?

For PR105818, the reason why the flag changes is this:

Firstly, the inconsistent flag is the OPTION_MASK_SAVE_TOC_INDIRECT bit
of rs6000_isa_flags_explicit; it's set as below:

/* If we can shrink-wrap the TOC register save separately, then use
   -msave-toc-indirect unless explicitly disabled.  */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
&& flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun))
  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;

Initially, rs6000 initializes target_option_default_node with
OPTION_MASK_SAVE_TOC_INDIRECT unset; at that time cfun is NULL
and optimize_size is true.

Later, when the C parser handles function foo, it builds a target
option node as target_option_default_node in function
handle_optimize_attribute, which saves and verifies the global
options there as well; at that time cfun is NULL, and no issue is
found.  Function store_parm_decls then allocates a struct function
for foo, so cfun becomes the function structure for foo.  When the
C parser continues on to the decl bar inside foo, function
handle_optimize_attribute works as before and tries to restore the
target options at the end; it calls targetm.target_option.restore
(rs6000_function_specific_restore), which calls function
rs6000_option_override_internal again.  At this time cfun is not
NULL, but there is no cgraph node for its decl, so
optimize_function_for_speed_p returns true and the
OPTION_MASK_SAVE_TOC_INDIRECT bit of rs6000_isa_flags gets set
unexpectedly, making it inconsistent with the value saved
previously.

IMHO, both the global context and that of function decl foo here
have optimize_size set, so function optimize_function_for_speed_p
should not return true anyway.

BTW, the failing aarch64 case also gets an unexpected result from
optimize_function_for_speed_p during C parsing
(fold_range_test <- ... <- c_parser_condition).

IIUC, at parsing time we don't have the profile information
available.

BR,
Kewen


Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-06-14 Thread Kewen.Lin via Gcc-patches
on 2022/6/14 20:53, Segher Boessenkool wrote:
> On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
>>  PR target/105818
> 
> Change this to something else?  It is not a target bug.  "middle-end"
> perhaps?
> 

Good catch, will fix this.  Thanks Segher!

BR,
Kewen