date:20240919

Ping^3: [PATCH] warn-access: ignore template parameters when matching operator new/delete [PR109224]

2024-09-19 Thread Arsen Arsenović

Arsen Arsenović  writes:

> Gentle ping again.  Full patch:
> https://inbox.sourceware.org/gcc-patches/86y14ptvdi@aarsen.me/

And again.  To clarify, the above is a v2 of sorts (it has the comment
fixed, I just didn't update the subject).

TIA, have a lovely day.
-- 
Arsen Arsenović


[PATCH] match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]

2024-09-19 Thread Andrew Pinski

For GENERIC, `a != 0 ? a * b : 0` would match even when `b` is an expression
that can trap (in the testcase it is an integer division, but it could be any
trapping expression).

This fixes the issue by adding a condition to the `(a != 0 ? expr : 0)` patterns
that rejects expressions which have side effects or can trap.
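
To illustrate the hazard, here is a minimal sketch that is not part of the patch (the function names are made up for illustration). The identity `b != 0 ? x * b : 0` == `x * b` is fine when `x` is a plain value, because `x * b` is 0 whenever `b` is 0. It stops being safe once `x` stands for something like `a / b`, because dropping the guard would execute the division with a zero divisor.

```c
#include <assert.h>

/* x is a plain value here, so folding the conditional to `x * b' is
   harmless: both forms yield 0 when b is 0.  */
int mult_value (int x, int b)
{
  return b != 0 ? x * b : 0;
}

/* Here the multiplied value comes from a division by b.  The guard has
   to stay, because a / b with b == 0 is undefined and traps on targets
   such as x86.  Written with an explicit `if' so the example does not
   depend on how a given compiler folds the conditional expression.  */
int mult_guarded (int a, int b)
{
  if (b == 0)
    return 0;
  int q = a / b;
  return q * b;
}

/* Hypothetical self-check for the two helpers above.  */
void check_guard_examples (void)
{
  assert (mult_value (5, 0) == 0);
  assert (mult_value (5, 3) == 15);
  assert (mult_guarded (3, 0) == 0);   /* no division is executed */
  assert (mult_guarded (7, 2) == 6);   /* (7 / 2) * 2 */
}
```

The second function is the shape the new checks are meant to protect: the guard is only redundant for the multiplication, not for the division feeding it.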

PR middle-end/116772

gcc/ChangeLog:

* match.pd (`a != 0 ? a / b : 0`): Add a check to make
sure b does not trap or have side effects.
(`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116772-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  | 12 ++--
 gcc/testsuite/gcc.dg/torture/pr116772-1.c | 24 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116772-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index fdb59ff0d44..db46f319c5f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4663,7 +4663,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (simplify
   (cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop )
(if (bitwise_equal_p (@0, @3)
-&& tree_expr_nonzero_p (@1))
+&& tree_expr_nonzero_p (@1)
+   /* Cannot make a trapping expression or one with side
+  effects unconditional. */
+   && !generic_expr_could_trap_p (@3)
+   && (GIMPLE || !TREE_SIDE_EFFECTS (@3)))
 @2)))
 
 /* Note we prefer the != case here
@@ -4673,7 +4677,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for op (mult bit_and)
  (simplify
   (cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop)
-  (if (bitwise_equal_p (@0, @3))
+  (if (bitwise_equal_p (@0, @3)
+   /* Cannot make a trapping expression or one with side
+  effects unconditional. */
+   && !generic_expr_could_trap_p (@1)
+   && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
@2)))
 
 /* Simplifications of shift and rotates.  */
diff --git a/gcc/testsuite/gcc.dg/torture/pr116772-1.c b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
new file mode 100644
index 000..eedd0398af1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* PR middle-end/116772  */
+/* The division by `/b` should not
+   be made unconditional. */
+
+int mult0(int a,int b) __attribute__((noipa));
+
+int mult0(int a,int b){
+  return (b!=0 ? (a/b)*b : 0);
+}
+
+int bit_and0(int a,int b) __attribute__((noipa));
+
+int bit_and0(int a,int b){
+  return (b!=0 ? (a/b)&b : 0);
+}
+
+int main() {
+  if (mult0(3, 0) != 0)
+__builtin_abort();
+  if (bit_and0(3, 0) != 0)
+__builtin_abort();
+  return 0;
+}
-- 
2.43.0

Re: [PATCH] libcpp, v2: Add -Wtrailing-whitespace= warning

2024-09-19 Thread Eric Gallager

On Thu, Sep 19, 2024 at 4:35 PM Jakub Jelinek  wrote:
>
> On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> > On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek  wrote:
> > >
> > > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > > > +1  I'd much rather learn about this kind of error before the code reaches
> > > > a review tool :)
> > > >
> > > > From a quick check, it doesn't look like Clang has this, so there is no
> > > > existing name to follow.
> > >
> > > I was considering also -Wtrailing-whitespace, but
> > > 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> > > vertical tabs
> > > 2) gcc source contains tons of spots with form feed in it (though,
> > > I think pretty much always as the sole character on a line).
> > > And not really sure how people use vertical tabs in the source if at all.
> > > Perhaps form feed could be not warned if at end of line if it isn't the sole
> > > character on a line...
> >
> > Generally I like diagnosing this early.  For the above I'd say
> > -Wtrailing-whitespace= with a set of things to diagnose (and a sane
> > default - just spaces and tabs - for -Wtrailing-whitespace) would be nice.
> > As for naming possibly follow the is{space,blank,cntrl} character
> > classifications?  If those are a good fit, that is.
>
> Here is a patch which currently allows blank (' ' '\t') and space (' ' '\t'
> '\f' '\v'), cntrl not yet added, not anything non-ASCII, but in theory could
> be added later (though, non-ASCII would be just for inside of comments,
> say non-breaking space etc. in the source is otherwise an error).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.
>

I think this is getting too complex now; I preferred the simpler version...

> 2024-09-19  Jakub Jelinek  
>
> libcpp/
> * include/cpplib.h (struct cpp_options): Add
> cpp_warn_trailing_whitespace member.
> (enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE.
> * internal.h (struct _cpp_line_note): Document 'W' line note.
> * lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace
> except for trailing whitespace after backslash.  Formatting fix.
> (_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics.
> Formatting fixes.
> (lex_raw_string): Clear type on 'W' notes.
> gcc/
> * doc/invoke.texi (Wtrailing-whitespace): Document.
> gcc/c-family/
> * c.opt (Wtrailing-whitespace=): New option.
> (Wtrailing-whitespace): New alias.
> gcc/testsuite/
> * c-c++-common/cpp/Wtrailing-whitespace-1.c: New test.
> * c-c++-common/cpp/Wtrailing-whitespace-2.c: New test.
> * c-c++-common/cpp/Wtrailing-whitespace-3.c: New test.
> * c-c++-common/cpp/Wtrailing-whitespace-4.c: New test.
> * c-c++-common/cpp/Wtrailing-whitespace-5.c: New test.
>
> --- libcpp/include/cpplib.h.jj  2024-09-13 16:09:32.690455174 +0200
> +++ libcpp/include/cpplib.h 2024-09-19 16:59:09.674903649 +0200
> @@ -594,6 +594,9 @@ struct cpp_options
>/* True if -finput-charset= option has been used explicitly.  */
>bool cpp_input_charset_explicit;
>
> +  /* -Wtrailing-whitespace= value.  */
> +  unsigned char cpp_warn_trailing_whitespace;
> +
>/* Dependency generation.  */
>struct
>{
> @@ -709,7 +712,8 @@ enum cpp_warning_reason {
>CPP_W_EXPANSION_TO_DEFINED,
>CPP_W_BIDIRECTIONAL,
>CPP_W_INVALID_UTF8,
> -  CPP_W_UNICODE
> +  CPP_W_UNICODE,
> +  CPP_W_TRAILING_WHITESPACE
>  };
>
>  /* Callback for header lookup for HEADER, which is the name of a
> --- libcpp/internal.h.jj2024-09-18 09:45:36.832570227 +0200
> +++ libcpp/internal.h   2024-09-19 16:54:56.610321817 +0200
> @@ -318,8 +318,8 @@ struct _cpp_line_note
>
>/* Type of note.  The 9 'from' trigraph characters represent those
>   trigraphs, '\\' an escaped newline, ' ' an escaped newline with
> - intervening space, 0 represents a note that has already been handled,
> - and anything else is invalid.  */
> + intervening space, 'W' trailing whitespace, 0 represents a note that
> + has already been handled, and anything else is invalid.  */
>unsigned int type;
>  };
>
> --- libcpp/lex.cc.jj2024-09-13 16:09:32.720454758 +0200
> +++ libcpp/lex.cc   2024-09-19 16:58:37.434339128 +0200
> @@ -928,7 +928,7 @@ _cpp_clean_line (cpp_reader *pfile)
>   if (p == buffer->next_line || p[-1] != '\\')
> break;
>
> - add_line_note (buffer, p - 1, p != d ? ' ': '\\');
> + add_line_note (buffer, p - 1, p != d ? ' ' : '\\');
>   d = p - 2;
>   buffer->next_line = p - 1;
> }
> @@ -943,6 +943,20 @@ _cpp_clean_line (cpp_reader *pfile)
> }
> }
> }
> + done:
> +  if (d > buffer->next_line
> + && CPP_OPTION (pfile, cpp_warn_trailing_whitespace))
> +   switch (

Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

2024-09-19 Thread Uros Bizjak

On Thu, Sep 19, 2024 at 10:49 PM Jakub Jelinek  wrote:
>
> Hi!
>
> min/max patterns for intrinsics which on x86 result in the second
> input operand if the two operands are both zeros or one or both of them
> are a NaN shouldn't use SMIN/SMAX RTL, because, as with
> MIN_EXPR/MAX_EXPR, the result of those is undefined in those cases.
>
> The following patch adds an expander which uses either a new pattern with
> UNSPEC_IEEE_M{AX,IN} or the S{MIN,MAX} representation of the same.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE
> (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave
> differently), but it actually doesn't improve anything much, as neither
> simplify_const_relational_operation nor simplify_ternary_operation is
> able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
> with 3 CONST_VECTOR operands.
> So maybe a better approach would be to fold the builtins with constant
> arguments generically (maybe leaving NaNs to runtime).

I think it is still worth it to implement insn patterns with generic
RTXes instead of unspecs. Maybe some future improvement to generic RTX
simplification will be able to handle them.

>
> 2024-09-19  Uros Bizjak  
> Jakub Jelinek  
>
> PR target/116738
> * config/i386/subst.md (mask_scalar_operand_arg34,
> mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
> subst attributes.
> * config/i386/sse.md
> (_vm3):
> Change from define_insn to define_expand, rename the old define_insn
> to ...
> (*_vm3):
> ... this.
> 
> (_ieee_vm3):
> New define_insn.
>
> * gcc.target/i386/sse-pr116738.c: New test.

OK, also for backports.

Thanks,
Uros.

>
> --- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200
> +++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200
> @@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4
>  (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4")
>  (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" 
> "1" "operands[4] == CONST0_RTX(mode)")
>  (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" 
> "1" "operands[3] == CONST0_RTX(V8HFmode)")
> +(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", 
> operands[3], operands[4]")
> +(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5")
>
>  (define_subst "mask_scalar"
>[(set (match_operand:SUBST_V 0)
> @@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar
>  (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" 
> "vm" "v")
>  (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" 
> "vex" "evex")
>  (define_subst_attr "round_saeonly_scalar_nimm_predicate" 
> "round_saeonly_scalar" "nonimmediate_operand" "register_operand")
> +(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" 
> "" ", operands[]")
>
>  (define_subst "round_saeonly_scalar"
>[(set (match_operand:SUBST_V 0)
> --- gcc/config/i386/sse.md.jj   2024-09-10 16:26:02.875151133 +0200
> +++ gcc/config/i386/sse.md  2024-09-19 12:43:31.693030695 +0200
> @@ -,7 +,27 @@ (define_insn "*ieee_3
>(const_string "*")))
> (set_attr "mode" "")])
>
> -(define_insn 
> "_vm3"
> +(define_expand 
> "_vm3"
> +  [(set (match_operand:VFH_128 0 "register_operand")
> +   (vec_merge:VFH_128
> + (smaxmin:VFH_128
> +   (match_operand:VFH_128 1 "register_operand")
> +   (match_operand:VFH_128 2 "nonimmediate_operand"))
> +(match_dup 1)
> +(const_int 1)))]
> +  "TARGET_SSE"
> +{
> +  if (!flag_finite_math_only || flag_signed_zeros)
> +{
> +  emit_insn 
> (gen__ieee_vm3
> +(operands[0], operands[1], operands[2]
> + 
> + ));
> +  DONE;
> +}
> +})
> +
> +(define_insn 
> "*_vm3"
>[(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> (vec_merge:VFH_128
>   (smaxmin:VFH_128
> @@ -3348,6 +3368,25 @@ (define_insn "_vm3[(set_attr "isa" "noavx,avx")
> (set_attr "type" "sse")
> (set_attr "btver2_sse_attr" "maxmin")
> +   (set_attr "prefix" "")
> +   (set_attr "mode" "")])
> +
> +(define_insn 
> "_ieee_vm3"
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +   (vec_merge:VFH_128
> + (unspec:VFH_128
> +   [(match_operand:VFH_128 1 "register_operand" "0,v")
> +(match_operand:VFH_128 2 "nonimmediate_operand" 
> "xm,")]
> +   IEEE_MAXMIN)
> +(match_dup 1)
> +(const_int 1)))]
> +  "TARGET_SSE"
> +  "@
> +   \t{%2, %0|%0, %2}
> +   v\t{%2, 
> %1, %0|%0, %1, 
> %2}"
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "type" "sse")
> +   (set_attr "btver2_sse_attr" "maxmin")
> (set_attr "prefix"

Re: [PATCH] match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]

2024-09-19 Thread Richard Biener

On Fri, Sep 20, 2024 at 3:07 AM Andrew Pinski  wrote:
>
> For GENERIC, `a != 0 ? a * b : 0` would match even when `b` is an expression
> that can trap (in the testcase it is an integer division, but it could be any
> trapping expression).
>
> This fixes the issue by adding a condition to the `(a != 0 ? expr : 0)` patterns
> that rejects expressions which have side effects or can trap.

I think the better fix is to restrict the patterns to GIMPLE - it
doesn't look like they were
moved over from fold-const.cc?  Another option might be to have a way to check
that @3 and @1 are "leaf" (aka non-EXPRs and non-REFERENCEs).

If you think the issue could be more wide-spread and we want to preserve the
folding at GENERIC (it might be useful for SCEV or niter analysis both which
eventually add COND_EXPRs...), then can you, instead of repeating

> +   && !generic_expr_could_trap_p (@3)
> +   && (GIMPLE || !TREE_SIDE_EFFECTS (@3)))

introduce an inline function in {gimple,generic}-match-head.cc for this,
say

static inline bool
no_side_effects (tree t)
{
   return !TREE_SIDE_EFFECTS (t) && !generic_expr_could_trap_p (t);
}

and on the GIMPLE side return true (and checking-assert we have a
is_gimple_val).

Thanks,
Richard.

> PR middle-end/116772
>
> gcc/ChangeLog:
>
> * match.pd (`a != 0 ? a / b : 0`): Add a check to make
> sure b does not trap or have side effects.
> (`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116772-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 12 ++--
>  gcc/testsuite/gcc.dg/torture/pr116772-1.c | 24 +++
>  2 files changed, 34 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116772-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index fdb59ff0d44..db46f319c5f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4663,7 +4663,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop )
> (if (bitwise_equal_p (@0, @3)
> -&& tree_expr_nonzero_p (@1))
> +&& tree_expr_nonzero_p (@1)
> +   /* Cannot make a trapping expression or with one with side
> +  effects unconditional. */
> +   && !generic_expr_could_trap_p (@3)
> +   && (GIMPLE || !TREE_SIDE_EFFECTS (@3)))
>  @2)))
>
>  /* Note we prefer the != case here
> @@ -4673,7 +4677,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (for op (mult bit_and)
>   (simplify
>(cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop)
> -  (if (bitwise_equal_p (@0, @3))
> +  (if (bitwise_equal_p (@0, @3)
> +   /* Cannot make a trapping expression or with one with side
> +  effects unconditional. */
> +   && !generic_expr_could_trap_p (@1)
> +   && (GIMPLE || !TREE_SIDE_EFFECTS (@1)))
> @2)))
>
>  /* Simplifications of shift and rotates.  */
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116772-1.c b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
> new file mode 100644
> index 000..eedd0398af1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116772-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* PR middle-end/116772  */
> +/* The division by `/b` should not
> +   be made unconditional. */
> +
> +int mult0(int a,int b) __attribute__((noipa));
> +
> +int mult0(int a,int b){
> +  return (b!=0 ? (a/b)*b : 0);
> +}
> +
> +int bit_and0(int a,int b) __attribute__((noipa));
> +
> +int bit_and0(int a,int b){
> +  return (b!=0 ? (a/b)&b : 0);
> +}
> +
> +int main() {
> +  if (mult0(3, 0) != 0)
> +__builtin_abort();
> +  if (bit_and0(3, 0) != 0)
> +__builtin_abort();
> +  return 0;
> +}
> --
> 2.43.0
>

Re: [PATCH RFC] build: update bootstrap req to C++14

2024-09-19 Thread Richard Biener

On Thu, Sep 19, 2024 at 4:37 PM Jakub Jelinek  wrote:
>
> On Thu, Sep 19, 2024 at 10:21:15AM -0400, Jason Merrill wrote:
> > On 9/19/24 7:57 AM, Richard Biener wrote:
> > > On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill  wrote:
> > > >
> > > > Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler.  Thoughts?
> > >
> > > I'm fine with this in general - do we need to bump the requirement for
> > > GCC 15 though?  IMO we should bump once we are requiring actual C++14
> > > in some place.
> >
> > Jakub's dwarf2asm patch yesterday uses C++14 if available, and I remember
>
> And libcpp too.
>
> > seeing a couple of other patches that would have been simpler with C++14
> > available.
>
> It was just a few lines and if I removed the now never true
> HAVE_DESIGNATED_INITIALIZERS cases, it wouldn't even add any new lines, just
> change some to others.  Both of those patches were just minor optimizations,
> it is fine if they don't happen during stage1.
>
> We also have some spots with
> #if __cpp_inline_variables < 201606L
> #else
> #endif
> conditionals but that doesn't mean we need to bump to C++17.
>
> Sure, bumping the required C++ version means we can remove the corresponding
> conditionals, and more importantly stop worrying about working around GCC
> 4.8.x/4.9 bugs (I think that is actually more important).
> The price is no longer being able to use some of the cfarm machines for
> testing, or having to use IBM Advanced Toolchain or a hand-built GCC 14 as
> the system compiler there.
> At some point we certainly want to do that, the question is if the benefits
> right now outweigh the pain.
>
> > > As for the version requirement, as you say only some minor versions of the
> > > GCC 5 series are OK, so I would suggest we say we recommend using GCC 6 or
> > > later, but that GCC 5.5 should also work?
> >
> > Aren't we already specifying a minor revision with 4.8.3 for C++11?
> >
> > Another possibility would be to just say GCC 5, and adjust that upward if we
> > run into problems.
>
> I think for the oldest supported version we need some CFarm machines around
> with that compiler so that all people can actually test issues with it.
> Dunno which distros shipped GCC 5 in long term support versions if any and
> at which minor those are.

At this point in time the relevant remaining LTS codestream at SUSE uses GCC 7
(but also has newer GCC available).  The older codestream used GCC 4.8 but
also has newer GCC available - being stuck with GCC 13 there though, no future
updates planned.

So I'm fine with raising the requirement now and documenting the oldest working
release; we'd just have to double-check that it really works.  For example, when
we document that 5.4 works, that might suggest people should go and download and
build 5.4, while of course they should instead download the newest release that
had the same build requirement as 5.4 had.  That's why I suggested documenting
a _recommended_ version plus the oldest version that's known to work if readily
available.

Richard.

>
> Jakub
>

Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

2024-09-19 Thread Jakub Jelinek

On Fri, Sep 20, 2024 at 08:01:58AM +0200, Richard Biener wrote:
> > P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE
> > (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave
> > differently), but it actually doesn't improve anything much, as neither
> > simplify_const_relational_operation nor simplify_ternary_operation is
> > able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
> > with 3 CONST_VECTOR operands.
> > So maybe a better approach would be to fold the builtins with constant
> > arguments generically (maybe leaving NaNs to runtime).
> 
> It would be possible to fold them in the gimple folding hook to VEC_COND_EXPRs
> with the chance the min/max operation being lost when expanding to RTL.

Sure, but we don't actually pattern recognize 
typedef float v4sf __attribute__((vector_size (4 * sizeof (float))));

v4sf
foo (v4sf x, v4sf y)
{
  return x < y ? y : x;
}
back to maxps etc.  So it wouldn't be an optimization in most cases, at
least until we do that; the user was asking for that insn or better with
_mm_max_ps...
Maybe we should.

For scalar ('-Dvector_size(x)=') this is currently matched in ce2.

Exception-wise, it seems the insns raise Invalid on NaN input (either operand)
and, if y is an SNaN, actually propagate it rather than turn it into a QNaN, so
I think it is actually an exact match for x < y ? y : x (or x > y ? y : x).
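
As a scalar sketch of that selection (illustrative only; `select_max` and `is_negative_zero` are made-up names, and this assumes default floating-point flags, i.e. no -ffast-math): the expression returns its first operand whenever the comparison is false, which covers both the two-zeros case and the NaN case.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The scalar selection discussed above: returns x whenever the
   comparison is false, i.e. when the operands compare equal (including
   +0.0 versus -0.0) or when either one is a NaN.  */
float select_max (float x, float y)
{
  return x < y ? y : x;
}

/* Helper (hypothetical name): true only for negative zero.  */
bool is_negative_zero (float v)
{
  return v == 0.0f && signbit (v);
}

/* Hypothetical self-check showing which operand survives.  */
void check_select_max (void)
{
  assert (select_max (1.0f, 2.0f) == 2.0f);
  /* Zeros compare equal, so the first operand is returned as is.  */
  assert (is_negative_zero (select_max (-0.0f, 0.0f)));
  assert (!is_negative_zero (select_max (0.0f, -0.0f)));
  /* Any comparison with a NaN is false, so again the first operand.  */
  assert (isnan (select_max (NAN, 1.0f)));
  assert (select_max (1.0f, NAN) == 1.0f);
}
```

This is why a plain SMIN/SMAX, whose result is unspecified for these inputs, is not a faithful description of the selection.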

Jakub

[PATCH] libcpp, v2: Add -Wtrailing-whitespace= warning

2024-09-19 Thread Jakub Jelinek

On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek  wrote:
> >
> > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > > +1  I'd much rather learn about this kind of error before the code reaches
> > > a review tool :)
> > >
> > > From a quick check, it doesn't look like Clang has this, so there is no
> > > existing name to follow.
> >
> > I was considering also -Wtrailing-whitespace, but
> > 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> > vertical tabs
> > 2) gcc source contains tons of spots with form feed in it (though,
> > I think pretty much always as the sole character on a line).
> > And not really sure how people use vertical tabs in the source if at all.
> > Perhaps form feed could be not warned if at end of line if it isn't the sole
> > character on a line...
> 
> Generally I like diagnosing this early.  For the above I'd say
> -Wtrailing-whitespace= with a set of things to diagnose (and a sane
> default - just spaces and tabs - for -Wtrailing-whitespace) would be nice.
> As for naming possibly follow the is{space,blank,cntrl} character
> classifications?  If those are a good fit, that is.

Here is a patch which currently accepts blank (' ' '\t') and space (' ' '\t'
'\f' '\v').  cntrl is not added yet, nor anything non-ASCII, but in theory those
could be added later (though non-ASCII would only matter inside of comments;
a non-breaking space etc. elsewhere in the source is an error anyway).
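
For readers who want the two classes spelled out, here is a standalone sketch. It is not the libcpp code: the enum and function names are hypothetical, it only looks at the last character of a line, and it ignores the backslash-newline handling the patch does. The values 1 and 2 mirror the two cases in the _cpp_clean_line hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode values; 1 and 2 mirror the two cases in the
   _cpp_clean_line hunk of this patch.  */
enum trailing_ws_mode
{
  TRAILING_WS_OFF = 0,
  TRAILING_WS_BLANK = 1,   /* ' ' and '\t' */
  TRAILING_WS_SPACE = 2    /* ' ', '\t', '\f' and '\v' */
};

static bool
in_blank_class (char c)
{
  return c == ' ' || c == '\t';
}

static bool
in_space_class (char c)
{
  return in_blank_class (c) || c == '\f' || c == '\v';
}

/* Return true if the last character of LINE[0..LEN), with the newline
   already stripped, belongs to the class selected by MODE.  */
bool
has_trailing_whitespace (const char *line, size_t len,
                         enum trailing_ws_mode mode)
{
  if (len == 0)
    return false;
  char last = line[len - 1];
  switch (mode)
    {
    case TRAILING_WS_BLANK:
      return in_blank_class (last);
    case TRAILING_WS_SPACE:
      return in_space_class (last);
    default:
      return false;
    }
}

/* Hypothetical self-check: a lone form feed is only flagged in the
   wider "space" class.  */
void check_trailing_whitespace (void)
{
  assert (has_trailing_whitespace ("int x; ", 7, TRAILING_WS_BLANK));
  assert (!has_trailing_whitespace ("\f", 1, TRAILING_WS_BLANK));
  assert (has_trailing_whitespace ("\f", 1, TRAILING_WS_SPACE));
  assert (!has_trailing_whitespace ("int x;", 6, TRAILING_WS_SPACE));
}
```

The form feed case is the practical difference between the two settings, since GCC sources use a lone form feed as a page separator.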

Bootstrapped/regtested on x86_64-linux and i686-linux.

2024-09-19  Jakub Jelinek  

libcpp/
* include/cpplib.h (struct cpp_options): Add
cpp_warn_trailing_whitespace member.
(enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE.
* internal.h (struct _cpp_line_note): Document 'W' line note.
* lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace
except for trailing whitespace after backslash.  Formatting fix.
(_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics.
Formatting fixes.
(lex_raw_string): Clear type on 'W' notes.
gcc/
* doc/invoke.texi (Wtrailing-whitespace): Document.
gcc/c-family/
* c.opt (Wtrailing-whitespace=): New option.
(Wtrailing-whitespace): New alias.
gcc/testsuite/
* c-c++-common/cpp/Wtrailing-whitespace-1.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-2.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-3.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-4.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-5.c: New test.

--- libcpp/include/cpplib.h.jj  2024-09-13 16:09:32.690455174 +0200
+++ libcpp/include/cpplib.h 2024-09-19 16:59:09.674903649 +0200
@@ -594,6 +594,9 @@ struct cpp_options
   /* True if -finput-charset= option has been used explicitly.  */
   bool cpp_input_charset_explicit;
 
+  /* -Wtrailing-whitespace= value.  */
+  unsigned char cpp_warn_trailing_whitespace;
+
   /* Dependency generation.  */
   struct
   {
@@ -709,7 +712,8 @@ enum cpp_warning_reason {
   CPP_W_EXPANSION_TO_DEFINED,
   CPP_W_BIDIRECTIONAL,
   CPP_W_INVALID_UTF8,
-  CPP_W_UNICODE
+  CPP_W_UNICODE,
+  CPP_W_TRAILING_WHITESPACE
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/internal.h.jj2024-09-18 09:45:36.832570227 +0200
+++ libcpp/internal.h   2024-09-19 16:54:56.610321817 +0200
@@ -318,8 +318,8 @@ struct _cpp_line_note
 
   /* Type of note.  The 9 'from' trigraph characters represent those
  trigraphs, '\\' an escaped newline, ' ' an escaped newline with
- intervening space, 0 represents a note that has already been handled,
- and anything else is invalid.  */
+ intervening space, 'W' trailing whitespace, 0 represents a note that
+ has already been handled, and anything else is invalid.  */
   unsigned int type;
 };
 
--- libcpp/lex.cc.jj2024-09-13 16:09:32.720454758 +0200
+++ libcpp/lex.cc   2024-09-19 16:58:37.434339128 +0200
@@ -928,7 +928,7 @@ _cpp_clean_line (cpp_reader *pfile)
  if (p == buffer->next_line || p[-1] != '\\')
break;
 
- add_line_note (buffer, p - 1, p != d ? ' ': '\\');
+ add_line_note (buffer, p - 1, p != d ? ' ' : '\\');
  d = p - 2;
  buffer->next_line = p - 1;
}
@@ -943,6 +943,20 @@ _cpp_clean_line (cpp_reader *pfile)
}
}
}
+ done:
+  if (d > buffer->next_line
+ && CPP_OPTION (pfile, cpp_warn_trailing_whitespace))
+   switch (CPP_OPTION (pfile, cpp_warn_trailing_whitespace))
+ {
+ case 1:
+   if (ISBLANK (d[-1]))
+ add_line_note (buffer, d - 1, 'W');
+   break;
+ case 2:
+   if (IS_NVSPACE (d[-1]) && d[-1])
+ add_line_note (buffer, d - 1, 'W');
+   break;
+ }
 }
   else
 {
@@ -955,7 +969,6

[PATCH v1 2/2] RISC-V: Add testcases for form 4 of signed scalar SAT_ADD

2024-09-19 Thread pan2 . li

From: Pan Li 

Form 4:
  #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_add_##T##_fmt_4 (T x, T y)   \
  {  \
T sum;   \
bool overflow = __builtin_add_overflow (x, y, &sum); \
return !overflow ? sum : x < 0 ? MIN : MAX;  \
  }

DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
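
For reference, a hand expansion of that instantiation with a few checks of the saturating behaviour. This is only a sketch for readers; the check function is hypothetical and not part of the testsuite changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand expansion of
   DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX).  */
int64_t __attribute__((noinline))
sat_s_add_int64_t_fmt_4 (int64_t x, int64_t y)
{
  int64_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return !overflow ? sum : x < 0 ? INT64_MIN : INT64_MAX;
}

/* Hypothetical self-check: results clamp at both ends of the range.  */
void check_sat_s_add_fmt_4 (void)
{
  assert (sat_s_add_int64_t_fmt_4 (40, 2) == 42);
  assert (sat_s_add_int64_t_fmt_4 (-5, 3) == -2);
  assert (sat_s_add_int64_t_fmt_4 (INT64_MAX, 1) == INT64_MAX);
  assert (sat_s_add_int64_t_fmt_4 (INT64_MIN, -1) == INT64_MIN);
}
```

When the addition overflows, both operands have the same sign, so testing `x < 0` is enough to pick the bound.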

The tests below pass with this patch.
* The full rv64gcv regression test.

This is a test-only patch and fairly obvious, so I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-13.c: New test.
* gcc.target/riscv/sat_s_add-14.c: New test.
* gcc.target/riscv/sat_s_add-15.c: New test.
* gcc.target/riscv/sat_s_add-16.c: New test.
* gcc.target/riscv/sat_s_add-run-13.c: New test.
* gcc.target/riscv/sat_s_add-run-14.c: New test.
* gcc.target/riscv/sat_s_add-run-15.c: New test.
* gcc.target/riscv/sat_s_add-run-16.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 
 gcc/testsuite/gcc.target/riscv/sat_s_add-13.c | 30 +
 gcc/testsuite/gcc.target/riscv/sat_s_add-14.c | 32 +++
 gcc/testsuite/gcc.target/riscv/sat_s_add-15.c | 31 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add-16.c | 29 +
 .../gcc.target/riscv/sat_s_add-run-13.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-14.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-15.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-16.c   | 16 ++
 9 files changed, 200 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-16.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index ab141bb1779..a2617b6db70 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -153,6 +153,17 @@ sat_s_add_##T##_fmt_3 (T x, T y)   \
 #define DEF_SAT_S_ADD_FMT_3_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)
 
+#define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)   \
+T __attribute__((noinline))\
+sat_s_add_##T##_fmt_4 (T x, T y)   \
+{  \
+  T sum;   \
+  bool overflow = __builtin_add_overflow (x, y, &sum); \
+  return !overflow ? sum : x < 0 ? MIN : MAX;  \
+}
+#define DEF_SAT_S_ADD_FMT_4_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)
+
 #define RUN_SAT_S_ADD_FMT_1(T, x, y) sat_s_add_##T##_fmt_1(x, y)
 #define RUN_SAT_S_ADD_FMT_1_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_1(T, x, y)
 
@@ -162,6 +173,9 @@ sat_s_add_##T##_fmt_3 (T x, T y)   \
 #define RUN_SAT_S_ADD_FMT_3(T, x, y) sat_s_add_##T##_fmt_3(x, y)
 #define RUN_SAT_S_ADD_FMT_3_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_3(T, x, y)
 
+#define RUN_SAT_S_ADD_FMT_4(T, x, y) sat_s_add_##T##_fmt_4(x, y)
+#define RUN_SAT_S_ADD_FMT_4_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_4(T, x, y)
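+
 /******************************************************************************/
 /* Saturation Sub (Unsigned and Signed)                                       */
 /******************************************************************************/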
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c
new file mode 100644
index 000..0923764cde4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-13.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_add_int8_t_fmt_4:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** xori\s+[atx][0-9]+,\s*[atx][0-

[PATCH v1 1/2] RISC-V: Add testcases for form 3 of signed scalar SAT_ADD

2024-09-19 Thread pan2 . li

From: Pan Li 

This patch adds testcases for form 3 of the signed scalar SAT_ADD,
i.e.:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_add_##T##_fmt_3 (T x, T y)   \
  {  \
T sum;   \
bool overflow = __builtin_add_overflow (x, y, &sum); \
return overflow ? x < 0 ? MIN : MAX : sum;   \
  }

DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
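
As an illustration of what form 3 computes, the sketch below expands it by hand for int16_t and compares it with a branch-free overflow test built from xor and shift. The second function is an assumption made for illustration, not code from the patch, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 3 written out by hand for int16_t.  */
int16_t sat_add_i16_form3 (int16_t x, int16_t y)
{
  int16_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return overflow ? (x < 0 ? INT16_MIN : INT16_MAX) : sum;
}

/* Hypothetical branch-free formulation of the same test.  The final
   conversion back to int16_t relies on GCC's modulo behaviour for
   out-of-range values, which is implementation-defined in ISO C.  */
int16_t sat_add_i16_bits (int16_t x, int16_t y)
{
  uint16_t ux = (uint16_t) x, uy = (uint16_t) y;
  uint16_t usum = (uint16_t) (ux + uy);
  /* Overflow iff x and y share a sign and the sum's sign differs.  */
  uint16_t overflow = (uint16_t) (((ux ^ usum) & ~(ux ^ uy)) >> 15);
  /* 0x7fff (INT16_MAX) for x >= 0, 0x8000 (INT16_MIN) for x < 0.  */
  uint16_t limit = (uint16_t) ((ux >> 15) + INT16_MAX);
  return (int16_t) (overflow ? limit : usum);
}

/* Hypothetical self-check: both versions agree on a few edge cases.  */
void check_sat_add_i16 (void)
{
  assert (sat_add_i16_form3 (100, 200) == 300);
  assert (sat_add_i16_form3 (INT16_MAX, 1) == INT16_MAX);
  assert (sat_add_i16_form3 (INT16_MIN, -1) == INT16_MIN);
  assert (sat_add_i16_bits (100, 200) == 300);
  assert (sat_add_i16_bits (-5, 3) == -2);
  assert (sat_add_i16_bits (INT16_MAX, 1) == INT16_MAX);
  assert (sat_add_i16_bits (INT16_MIN, -1) == INT16_MIN);
}
```

Forms 3 and 4 differ only in which arm of the conditional carries the overflow case, so the same checks apply to both.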

The tests below pass with this patch.
* The full rv64gcv regression test.

This is a test-only patch and fairly obvious, so I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-10.c: New test.
* gcc.target/riscv/sat_s_add-11.c: New test.
* gcc.target/riscv/sat_s_add-12.c: New test.
* gcc.target/riscv/sat_s_add-9.c: New test.
* gcc.target/riscv/sat_s_add-run-10.c: New test.
* gcc.target/riscv/sat_s_add-run-11.c: New test.
* gcc.target/riscv/sat_s_add-run-12.c: New test.
* gcc.target/riscv/sat_s_add-run-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 14 
 gcc/testsuite/gcc.target/riscv/sat_s_add-10.c | 32 +++
 gcc/testsuite/gcc.target/riscv/sat_s_add-11.c | 31 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add-12.c | 29 +
 gcc/testsuite/gcc.target/riscv/sat_s_add-9.c  | 30 +
 .../gcc.target/riscv/sat_s_add-run-10.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-11.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-12.c   | 16 ++
 .../gcc.target/riscv/sat_s_add-run-9.c| 16 ++
 9 files changed, 200 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index b4fbf5dc662..ab141bb1779 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -142,12 +142,26 @@ sat_s_add_##T##_fmt_2 (T x, T y) \
   return x < 0 ? MIN : MAX;  \
 }
 
+#define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
+T __attribute__((noinline))\
+sat_s_add_##T##_fmt_3 (T x, T y)   \
+{  \
+  T sum;   \
+  bool overflow = __builtin_add_overflow (x, y, &sum); \
+  return overflow ? x < 0 ? MIN : MAX : sum;   \
+}
+#define DEF_SAT_S_ADD_FMT_3_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)
+
 #define RUN_SAT_S_ADD_FMT_1(T, x, y) sat_s_add_##T##_fmt_1(x, y)
 #define RUN_SAT_S_ADD_FMT_1_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_1(T, x, y)
 
 #define RUN_SAT_S_ADD_FMT_2(T, x, y) sat_s_add_##T##_fmt_2(x, y)
 #define RUN_SAT_S_ADD_FMT_2_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_2(T, x, y)
 
+#define RUN_SAT_S_ADD_FMT_3(T, x, y) sat_s_add_##T##_fmt_3(x, y)
+#define RUN_SAT_S_ADD_FMT_3_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_3(T, x, y)
+
 
 /**/
 /* Saturation Sub (Unsigned and Signed)   */
 /**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c
new file mode 100644
index 000..45329619f9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-10.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_add_int16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** li\s+[atx][0-9]+,\s*32768
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+*

RE: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-19 Thread Kong, Lingling

> > "Kong, Lingling"  writes:
> > > Hi,
> > >
> > > This version has added a new optab named 'cfmovcc'. The new optab is
> > > used in the middle end to expand to cfcmov. And simplified my patch
> > > by trying to generate the conditional faulting movcc in the
> > > noce_try_cmove_arith function.
> > >
> > > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu.
> > > We also tested spec with SDE and passed the runtime test.
> > >
> > > Ok for trunk?
> > >
> > >
> > > The APX CFCMOV[1] feature implements conditional faulting: if the
> > > comparison is false, all memory faults are suppressed when loading
> > > or storing a memory operand.  With it, a conditional move can load
> > > or store a memory operand that may trap or fault.
> > >
> > > The middle end currently does not generate a conditional move if it
> > > knows that a load from A or B could trap or fault.  To enable
> > > CFCMOV, we added a new optab named cfmovcc.
> > >
> > > For a conditional memory store, the fault-suppressing conditional
> > > move does not move any arithmetic calculations.  For a conditional
> > > memory load, only the case of one trapping memory operand and one
> > > non-trapping, non-memory operand is supported for now.
> >
> > Sorry if this is going over old ground (I haven't read the earlier
> > versions yet), but:
> > instead of adding a new optab, could we treat CFCMOV as a scalar
> > instance of maskload_optab?  Robin is working on adding an "else"
> > value for when the condition/mask is false.  After that, it would seem
> > to be a pretty close match to CFCMOV.
> >
> > One reason for preferring maskload is that it makes the load an
> > explicit part of the interface.  We could then potentially use it in
> > gimple too, not just expand.
> >
> 
> Yes, a conditional load is like a scalar instance of maskload_optab with
> an else operand.
> I could try to use maskload_optab to generate cfcmov in the RTL ifcvt
> pass, but that is still after expand.
> We currently don't have an if-conversion pass for scalars in gimple; is
> there a plan to add one?
> 

Hi,

I have tried to use maskload/maskstore to generate CFCMOV in the ifcvt pass.
Unlike movcc, maskload/maskstore are not allowed to FAIL, but I need
restrictions for CFCMOV in the backend expander.  Since expanding
maskload/maskstore cannot fail, I can only apply the restrictions in ifcvt
and emit_conditional_move (in optabs.cc).
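
For context, the kind of source-level pattern under discussion can be
sketched as below. The function name select_load is a hypothetical
illustration, not something from the patch. The load of *p happens only
when the condition holds, so turning this into an ordinary conditional move
would require an unconditional load that may fault; a conditionally
faulting move (or a scalar maskload with an "else" value) avoids that.

```cpp
// Hypothetical example, for illustration only.
// Source semantics: *p is evaluated only when cond is true, so p may be
// null or unmapped whenever cond is false.
int select_load(bool cond, const int *p, int fallback) {
  return cond ? *p : fallback;
}
```

Whether a given compiler actually if-converts this function depends on the
target and options; the sketch only shows why the load must not be made
unconditional without fault suppression.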

I'm not sure if this is the right approach, do you have any suggestions?

Thanks,
Lingling

> > Thanks,
> > Richard
> >
> > >
> > >
> > > [1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
> > >
> > > gcc/ChangeLog:
> > >
> > >* doc/md.texi: Add cfmovcc insn pattern explanation.
> > >* ifcvt.cc (can_use_cmove_load_mem_notrap): New func
> > >for conditional faulting movcc for load.
> > >(can_use_cmove_store_mem_notrap): New func for conditional
> > >faulting movcc for store.
> > >(can_use_cfmovcc):  New func for conditional faulting.
> > >(noce_try_cmove_arith): Try to convert to conditional faulting
> > >movcc.
> > >(noce_process_if_block): Ditto.
> > >* optabs.cc (emit_conditional_move): Handle cfmovcc.
> > >(emit_conditional_move_1): Ditto.
> > >* optabs.def (OPTAB_D): New optab.
> > > ---
> > > gcc/doc/md.texi |  10 
> > > gcc/ifcvt.cc| 119 
> > > gcc/optabs.cc   |  14 +-
> > > gcc/optabs.def  |   1 +
> > > 4 files changed, 132 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > a9259112251..5f563787c49 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -8591,6 +8591,16 @@ Return 1 if operand 1 is a normal floating
> > > point number and 0 otherwise.  @var{m} is a scalar floating point
> > > mode.  Operand 0 has mode @code{SImode}, and operand 1 has mode
> > @var{m}.
> > > +@cindex @code{cfmov@var{mode}cc} instruction pattern @item
> > > +@samp{cfmov@var{mode}cc} Similar to @samp{mov@var{mode}cc} but for
> > > +conditional faulting, If the comparison is false, all memory faults
> > > +are suppressed when load or store a memory operand.
> > > +
> > > +Conditionally move operand 2 or operand 3 into operand 0 according
> > > +to the comparison in operand 1.  If the comparison is true, operand
> > > +2 is moved into operand 0, otherwise operand 3 is moved.
> > > +
> > > @end table
> > >  @end ifset
> > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index
> > > 6487574c514..59845390607 100644
> > > --- a/gcc/ifcvt.cc
> > > +++ b/gcc/ifcvt.cc
> > > @@ -778,6 +778,9 @@ static bool noce_try_store_flag_mask (struct
> > > noce_if_info *); static rtx noce_emit_cmove (struct noce_if_info *,
> > > rtx, enum
> > rtx_code, rtx,
> > > rtx, rtx, rtx, rtx =
> > > NULL, rtx = NULL); static bool noce_tr

Re: [PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

2024-09-19 Thread Richard Biener

On Thu, Sep 19, 2024 at 10:50 PM Jakub Jelinek  wrote:
>
> Hi!
>
> min/max patterns for intrinsics which on x86 result in the second
> input operand if the two operands are both zeros or one or both of them
> are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to
> MIN_EXPR/MAX_EXPR undefined what will be the result in those cases.
>
> The following patch adds an expander which uses either a new pattern with
> UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE
> (except for the 3dNOW! PFMIN/MAX, those actually are documented to behave
> differently), but it actually doesn't improve anything much, as
> simplify_const_relational_operation nor simplify_ternary_operation aren't
> able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
> with 3 CONST_VECTOR operands.
> So, maybe better approach will be to generic fold the builtins with constant
> arguments (maybe leaving NaNs to runtime).

It would be possible to fold them in the gimple folding hook to VEC_COND_EXPRs,
at the risk of the min/max operation being lost when expanding to RTL.
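
As a rough scalar model of the semantics described in the quoted text (the
second operand is the result when both operands are zeros or when either is
a NaN), the following sketch may help. The helper names x86_style_min and
x86_style_max are assumptions for illustration and do not come from the
patch; the patch operates on vector modes in the machine description.

```cpp
#include <cmath>

// Hypothetical scalar model, not code from the patch.
// "a < b ? a : b" yields b whenever the comparison is false, which covers
// both the +0.0/-0.0 case and any NaN operand.
inline float x86_style_min(float a, float b) { return a < b ? a : b; }

// Same idea for max: the second operand wins on zeros and NaNs.
inline float x86_style_max(float a, float b) { return a > b ? a : b; }
```

Because the result depends on operand order, the operation is not
commutative, which is why a representation with unspecified behaviour for
these inputs is not a safe stand-in.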

Richard.

>
> 2024-09-19  Uros Bizjak  
> Jakub Jelinek  
>
> PR target/116738
> * config/i386/subst.md (mask_scalar_operand_arg34,
> mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
> subst attributes.
> * config/i386/sse.md
> (_vm3):
> Change from define_insn to define_expand, rename the old define_insn
> to ...
> (*_vm3):
> ... this.
> 
> (_ieee_vm3):
> New define_insn.
>
> * gcc.target/i386/sse-pr116738.c: New test.
>
> --- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200
> +++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200
> @@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4
>  (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4")
>  (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" 
> "1" "operands[4] == CONST0_RTX(mode)")
>  (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" 
> "1" "operands[3] == CONST0_RTX(V8HFmode)")
> +(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", 
> operands[3], operands[4]")
> +(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5")
>
>  (define_subst "mask_scalar"
>[(set (match_operand:SUBST_V 0)
> @@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar
>  (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" 
> "vm" "v")
>  (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" 
> "vex" "evex")
>  (define_subst_attr "round_saeonly_scalar_nimm_predicate" 
> "round_saeonly_scalar" "nonimmediate_operand" "register_operand")
> +(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" 
> "" ", operands[]")
>
>  (define_subst "round_saeonly_scalar"
>[(set (match_operand:SUBST_V 0)
> --- gcc/config/i386/sse.md.jj   2024-09-10 16:26:02.875151133 +0200
> +++ gcc/config/i386/sse.md  2024-09-19 12:43:31.693030695 +0200
> @@ -,7 +,27 @@ (define_insn "*ieee_3
>(const_string "*")))
> (set_attr "mode" "")])
>
> -(define_insn 
> "_vm3"
> +(define_expand 
> "_vm3"
> +  [(set (match_operand:VFH_128 0 "register_operand")
> +   (vec_merge:VFH_128
> + (smaxmin:VFH_128
> +   (match_operand:VFH_128 1 "register_operand")
> +   (match_operand:VFH_128 2 "nonimmediate_operand"))
> +(match_dup 1)
> +(const_int 1)))]
> +  "TARGET_SSE"
> +{
> +  if (!flag_finite_math_only || flag_signed_zeros)
> +{
> +  emit_insn 
> (gen__ieee_vm3
> +(operands[0], operands[1], operands[2]
> + 
> + ));
> +  DONE;
> +}
> +})
> +
> +(define_insn 
> "*_vm3"
>[(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> (vec_merge:VFH_128
>   (smaxmin:VFH_128
> @@ -3348,6 +3368,25 @@ (define_insn "_vm3[(set_attr "isa" "noavx,avx")
> (set_attr "type" "sse")
> (set_attr "btver2_sse_attr" "maxmin")
> +   (set_attr "prefix" "")
> +   (set_attr "mode" "")])
> +
> +(define_insn 
> "_ieee_vm3"
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +   (vec_merge:VFH_128
> + (unspec:VFH_128
> +   [(match_operand:VFH_128 1 "register_operand" "0,v")
> +(match_operand:VFH_128 2 "nonimmediate_operand" 
> "xm,")]
> +   IEEE_MAXMIN)
> +(match_dup 1)
> +(const_int 1)))]
> +  "TARGET_SSE"
> +  "@
> +   \t{%2, %0|%0, %2}
> +   v\t{%2, 
> %1, %0|%0, %1, 
> %2}"
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "type" "sse")
> +   (set_attr "btver2_sse_attr" "maxmin")
> (set_attr "prefix" "")
> (set_attr "mode" "")])
>
> --- gcc/testsuite/gcc.targ

Re: PING [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-09-19 Thread Joseph Myers

On Fri, 13 Sep 2024, Mikael Morin wrote:

> *PING*
> 
> Joseph, could you take a quick look at the handling of the new option?
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661267.html

Individual new options like this are expected to be reviewed by 
maintainers / reviewers for the relevant part of the compiler, not for 
option handling which is more for the generic machinery independent of 
individual options.

-- 
Joseph S. Myers
josmy...@redhat.com

[COMMITTED] testsuite: fix 'do-do' typos

2024-09-19 Thread Sam James

Fix 'do-do' typos (should be 'dg-do'). No change in logs.

gcc/testsuite/ChangeLog:

* g++.dg/other/operator2.C: Fix dg-do directive.
* gcc.dg/Warray-bounds-67.c: Ditto.
* gcc.dg/cpp/builtin-macro-1.c: Ditto.
* gcc.dg/tree-ssa/builtin-snprintf-3.c: Ditto.
* obj-c++.dg/empty-private-1.mm: Ditto.
---
Pushed as obvious.

 gcc/testsuite/g++.dg/other/operator2.C | 2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-67.c| 2 +-
 gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c | 8 
 gcc/testsuite/obj-c++.dg/empty-private-1.mm| 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/g++.dg/other/operator2.C b/gcc/testsuite/g++.dg/other/operator2.C
index 358731127186..cd477a64c3f9 100644
--- a/gcc/testsuite/g++.dg/other/operator2.C
+++ b/gcc/testsuite/g++.dg/other/operator2.C
@@ -1,5 +1,5 @@
 // PR c++/28852
-// { do-do compile }
+// { dg-do compile }
 
 struct A
 {
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-67.c b/gcc/testsuite/gcc.dg/Warray-bounds-67.c
index a9b9ff7d2ab2..354fb89467e3 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-67.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-67.c
@@ -2,7 +2,7 @@
of a struct that's a member of either a struct or a union.  Both
are obviously undefined but GCC relies on these hacks so the test
verifies that -Warray-bounds doesn't trigger for it.
-   { do-do compile }
+   { dg-do compile }
{ dg-options "-O2 -Wall" } */
 
 
diff --git a/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c b/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c
index 0f950038d1bd..6fc3c2602785 100644
--- a/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c
+++ b/gcc/testsuite/gcc.dg/cpp/builtin-macro-1.c
@@ -5,7 +5,7 @@
the function-like macro expansion it's part of.
 
{ dg-do run }
-   { do-options -no-integrated-cpp }  */
+   { dg-options -no-integrated-cpp }  */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c
index e481955ab732..00ea752c1974 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-snprintf-3.c
@@ -1,6 +1,6 @@
 /* Verify the lower and upper bounds of floating directives with
precision whose range crosses zero.
-  { do-do compile }
+  { dg-do compile }
   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
 
 static const double x = 1.23456789;
@@ -72,6 +72,6 @@ int test_g (int p)
   return n;
 }
 
-/* { dg-final { scan-tree-dump-times "snprintf" 4 "optimized"} }
-   { dg-final { scan-tree-dump-not "failure_range" "optimized"} }
-   { dg-final { scan-tree-dump-times "verify_" 8 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "snprintf" 4 "optimized" } }
+   { dg-final { scan-tree-dump-not "failure_range" "optimized" } }
+   { dg-final { scan-tree-dump-times "verify_" 8 "optimized" } } */
diff --git a/gcc/testsuite/obj-c++.dg/empty-private-1.mm b/gcc/testsuite/obj-c++.dg/empty-private-1.mm
index b8b90b07ecda..0bbec921b8ec 100644
--- a/gcc/testsuite/obj-c++.dg/empty-private-1.mm
+++ b/gcc/testsuite/obj-c++.dg/empty-private-1.mm
@@ -1,6 +1,6 @@
 /* Test for no entry after @private token.  */
 
-/* { do-do compile } */
+/* { dg-do compile } */
 
 @interface foo
 {
-- 
2.46.0

Re: RFC PATCH: contrib/test_summary mode for submitting testsuite results to bunsen

2024-09-19 Thread Hans-Peter Nilsson

I'd love for (something like) gcc-testresults@ to be usefully 
searchable (it can be done but... lacks), so please allow me:

On Fri, 13 Sep 2024, Frank Ch. Eigler wrote:
> diff --git a/contrib/test_summary b/contrib/test_summary
> index 5760b053ec27..867ada4d6b81 100755
> --- a/contrib/test_summary
> +++ b/contrib/test_summary
> @@ -39,6 +39,9 @@ if test x"$1" = "x-h"; then
>   should be selected from the log files.
>   -f: force reports to be mailed; if omitted, only reports that differ
>   from the sent.* version are sent.
> + -b: instead of emailing, push test logs into a bunsen git repo
> + -bg REPO: specify the bunsen git repo to override default
> + -bt TAG: specify the bunsen git commit tag to override default
>  _EOF
>exit 0
>  fi
> @@ -57,6 +60,9 @@ fi
>  : ${filesuffix=}; export filesuffix
>  : ${move=true}; export move
>  : ${forcemail=false}; export forcemail
> +: ${bunsen=false};
> +: ${bunsengit=ssh://sourceware.org/git/bunsendb.git/};
> +: ${bunsentag=`whoami`/gcc/`uname -m`-`date +%Y%m%d-%H%M`};

That uname -m looks like it's an assumption that the report is 
for a 1) native build that is 2) the same machine as where the 
git push should happen and 3) all run the same OS.  Also, my 
local account-name may be completely different than what's 
needed in the tag.  Looks like there's a side-question for 
account names for the bunsendb when you don't have a sourceware 
account (are rules needed)? Anyway, please parametrize.
Please instead of uname -m scrape the default target identifier 
from the build.

Use-case: I push cross-build reports from an entirely different 
machine.  My local login may be different.

brgds, H-P

[PATCH] c++: Use type_id_in_expr_sentinel in 6 further spots in the parser

2024-09-19 Thread Jakub Jelinek

Hi!

The following patch uses type_id_in_expr_sentinel in a few spots which
did it all manually.
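
For readers unfamiliar with the idiom, a save/set/restore sentinel can be
sketched as below. This is a simplified stand-in under stated assumptions:
toy_parser and toy_type_id_in_expr_sentinel are hypothetical names, and the
real type_id_in_expr_sentinel in the C++ front end may differ in detail.

```cpp
// Hypothetical stand-in for the parser state; only the one flag matters.
struct toy_parser {
  bool in_type_id_in_expr_p = false;
};

// Sketch of the sentinel: the constructor saves the flag and sets it,
// the destructor restores the saved value when the scope is left.
class toy_type_id_in_expr_sentinel {
  toy_parser *parser_;
  bool saved_;

public:
  explicit toy_type_id_in_expr_sentinel(toy_parser *parser, bool set = true)
    : parser_(parser), saved_(parser->in_type_id_in_expr_p) {
    parser_->in_type_id_in_expr_p = set;
  }
  ~toy_type_id_in_expr_sentinel() { parser_->in_type_id_in_expr_p = saved_; }

  // Copying would restore the flag twice, so forbid it.
  toy_type_id_in_expr_sentinel(const toy_type_id_in_expr_sentinel &) = delete;
  toy_type_id_in_expr_sentinel &
  operator=(const toy_type_id_in_expr_sentinel &) = delete;
};

// Returns the flag value observed while the sentinel is alive.
bool flag_inside_scope(toy_parser *parser) {
  toy_type_id_in_expr_sentinel s(parser);
  return parser->in_type_id_in_expr_p;
}
```

The restore happens on every exit path from the scope, which is why some
hunks below introduce an extra pair of braces: they end the sentinel's
lifetime exactly where the manual restore used to be.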

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-19  Jakub Jelinek  

* parser.cc (cp_parser_postfix_expression): Use
type_id_in_expr_sentinel instead of manually saving+setting/restoring
parser->in_type_id_in_expr_p around cp_parser_type_id calls.
(cp_parser_has_attribute_expression): Likewise.
(cp_parser_cast_expression): Likewise.
(cp_parser_sizeof_operand): Likewise.

--- gcc/cp/parser.cc.jj 2024-09-07 09:31:20.708482757 +0200
+++ gcc/cp/parser.cc2024-09-19 10:46:21.916155154 +0200
@@ -7554,7 +7554,6 @@ cp_parser_postfix_expression (cp_parser
tree type;
cp_expr expression;
const char *saved_message;
-   bool saved_in_type_id_in_expr_p;
 
/* All of these can be handled in the same way from the point
   of view of parsing.  Begin by consuming the token
@@ -7569,11 +7568,11 @@ cp_parser_postfix_expression (cp_parser
/* Look for the opening `<'.  */
cp_parser_require (parser, CPP_LESS, RT_LESS);
/* Parse the type to which we are casting.  */
-   saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
-   parser->in_type_id_in_expr_p = true;
-   type = cp_parser_type_id (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL,
- NULL);
-   parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
+   {
+ type_id_in_expr_sentinel s (parser);
+ type = cp_parser_type_id (parser, CP_PARSER_FLAGS_TYPENAME_OPTIONAL,
+   NULL);
+   }
/* Look for the closing `>'.  */
cp_parser_require_end_of_template_parameter_list (parser);
/* Restore the old message.  */
@@ -7643,7 +7642,6 @@ cp_parser_postfix_expression (cp_parser
   {
tree type;
const char *saved_message;
-   bool saved_in_type_id_in_expr_p;
 
/* Consume the `typeid' token.  */
cp_lexer_consume_token (parser->lexer);
@@ -7658,10 +7656,10 @@ cp_parser_postfix_expression (cp_parser
   expression.  */
cp_parser_parse_tentatively (parser);
/* Try a type-id first.  */
-   saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
-   parser->in_type_id_in_expr_p = true;
-   type = cp_parser_type_id (parser);
-   parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
+   {
+ type_id_in_expr_sentinel s (parser);
+ type = cp_parser_type_id (parser);
+   }
/* Look for the `)' token.  Otherwise, we can't be sure that
   we're not looking at an expression: consider `typeid (int
   (3))', for example.  */
@@ -7916,10 +7914,8 @@ cp_parser_postfix_expression (cp_parser
else
  {
/* Parse the type.  */
-   bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
-   parser->in_type_id_in_expr_p = true;
+   type_id_in_expr_sentinel s (parser);
type = cp_parser_type_id (parser);
-   parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
parens.require_close (parser);
  }
 
@@ -9502,11 +9498,11 @@ cp_parser_has_attribute_expression (cp_p
  expression.  */
   cp_parser_parse_tentatively (parser);
 
-  bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
-  parser->in_type_id_in_expr_p = true;
-  /* Look for the type-id.  */
-  oper = cp_parser_type_id (parser);
-  parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
+  {
+type_id_in_expr_sentinel s (parser);
+/* Look for the type-id.  */
+oper = cp_parser_type_id (parser);
+  }
 
   cp_parser_parse_definitely (parser);
 
@@ -10268,15 +10264,13 @@ cp_parser_cast_expression (cp_parser *pa
cp_parser_simulate_error (parser);
   else
{
- bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
- parser->in_type_id_in_expr_p = true;
+ type_id_in_expr_sentinel s (parser);
  /* Look for the type-id.  */
  type = cp_parser_type_id (parser);
  /* Look for the closing `)'.  */
  cp_token *close_paren = parens.require_close (parser);
  if (close_paren)
close_paren_loc = close_paren->location;
- parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
}
 
   /* Restore the saved message.  */
@@ -34299,13 +34293,11 @@ cp_parser_sizeof_operand (cp_parser* par
cp_parser_simulate_error (parser);
   else
{
- bool saved_in_type_id_in_expr_p = parser->in_type_id_in_expr_p;
- parser->in_type_id_in_expr_p = true;
+ type_id_in_expr_sentinel s (parser);
  /* Look for the type-id.  */
  type = cp_parser_type_id (parser);
  /* Look for the closing `)'.  */
  parens.require_close (parser);
-

[PATCH v2] MIPS: Add some floating point instructions support for MIPSr6

2024-09-19 Thread Jie Mei

This patch adds some of the floating-point instructions from
MIPS32 Release 6 (mips32r6), with their respective built-in
functions and tests:

min_a_s, min_a_d
max_a_s, max_a_d
rint_s, rint_d
class_s, class_d
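
As a rough guide to what the min_a/max_a pair computes, the sketch below
selects an operand by absolute value. The helper names min_by_magnitude and
max_by_magnitude are hypothetical, and the sketch deliberately does not
model how the hardware instructions treat ties (equal magnitudes) or NaN
operands; consult the MIPS Release 6 manual for those cases.

```cpp
#include <cmath>

// Hypothetical reference helpers, not part of the patch.
// Return the operand with the smaller absolute value (sign preserved).
inline double min_by_magnitude(double a, double b) {
  return std::fabs(a) < std::fabs(b) ? a : b;
}

// Return the operand with the larger absolute value (sign preserved).
inline double max_by_magnitude(double a, double b) {
  return std::fabs(a) > std::fabs(b) ? a : b;
}
```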

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_minmax): Include
fclass type.
(i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
for __builtin_mipsr6_xxx.
(MIPSR6_BUILTIN_PURE): Same as above.
(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
New code_aliasing macros.
(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
max_a_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
(ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec.
(UNSPEC_FCLASS): Same as above.
(type): Add frint and fclass.
(fmin_a_): Generates MINA.fmt instructions.
(fmax_a_): Generates MAXA.fmt instructions.
(rint2): Generates RINT.fmt instructions.
(fclass_): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include
frint type.
(p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-class.c: New tests for MIPSr6
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.

Signed-off-by: Jie Mei 
Co-authored-by: Xi Ruoyao 
---
 gcc/config/mips/i6400.md  |  8 +--
 gcc/config/mips/mips.cc   | 24 +
 gcc/config/mips/mips.h|  4 ++
 gcc/config/mips/mips.md   | 52 ++-
 gcc/config/mips/p6600.md  |  8 +--
 gcc/testsuite/gcc.target/mips/mips-class.c| 17 ++
 gcc/testsuite/gcc.target/mips/mips-minamaxa.c | 31 +++
 gcc/testsuite/gcc.target/mips/mips-rint.c | 17 ++
 8 files changed, 151 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-class.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-minamaxa.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-rint.c

diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
index d6f691ee217..48ce980e1c2 100644
--- a/gcc/config/mips/i6400.md
+++ b/gcc/config/mips/i6400.md
@@ -219,16 +219,16 @@
(eq_attr "type" "fabs,fneg,fmove"))
   "i6400_fpu_short, i6400_fpu_apu")
 
-;; min, max
+;; min, max, fclass
 (define_insn_reservation "i6400_fpu_minmax" 2
   (and (eq_attr "cpu" "i6400")
-   (eq_attr "type" "fminmax"))
+   (eq_attr "type" "fminmax,fclass"))
   "i6400_fpu_short+i6400_fpu_logic")
 
-;; fadd, fsub, fcvt
+;; fadd, fsub, fcvt, frint
 (define_insn_reservation "i6400_fpu_fadd" 4
   (and (eq_attr "cpu" "i6400")
-   (eq_attr "type" "fadd,fcvt"))
+   (eq_attr "type" "fadd,fcvt,frint"))
   "i6400_fpu_long, i6400_fpu_apu")
 
 ;; fmul
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 173f792bf55..bf1d15b9700 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -15775,6 +15775,7 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
 AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
 AVAIL_NON_MIPS16 (msa, TARGET_MSA)
+AVAIL_NON_MIPS16 (r6, mips_isa_rev >= 6)
 
 /* Construct a mips_builtin_description from the given arguments.
 
@@ -15940,6 +15941,14 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 "__builtin_msa_" #INSN,  MIPS_BUILTIN_DIRECT_NO_TARGET,\
 FUNCTION_TYPE, mips_builtin_avail_msa, false }
 
+/* Define a MIPSr6 MIPS_BUILTIN_DIRECT pure function __builtin_mipsr6_
+   for instruction CODE_FOR_mipsr6_.  FUNCTION_TYPE is a builtin_description
+   field.  */
+#define MIPSR6_BUILTIN_PURE(INSN, FUNCTION_TYPE)   \
+{ CODE_FOR_mipsr6_ ## INSN, MIPS_FP_COND_f,   \
+"__builtin_mipsr6_" #INSN,  MIPS_BUILTIN_DIRECT,   \
+FUNCTION_TYPE, mips_builtin_avail_r6, true }
+
 #define CODE_FOR_mips_sqrt_ps CODE_FOR_sqrtv2sf2
 #define CODE_FOR_mips_addq_ph CODE_FOR_addv2hi3
 #define CODE_FOR_mips_addu_qb CODE_FOR_addv4qi3
@@ -16177,6 +16186,13 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 #define CODE_FOR_msa_ldi_w CODE_FOR_msa_ldiv4si
 #define CODE_FOR_msa_ldi_d CODE_FOR_msa_ldiv2di
 
+#define CODE_FOR_mipsr6_min_a_s CODE_FOR_fmin_a_sf
+#define CODE_FOR_mipsr6_min_a_d CODE_FOR_fmin_a_df
+#define CODE_FOR_mipsr6_max_a_s CODE_FOR_fmax_a_sf
+#define CODE_FOR_mipsr6_max_a_d CODE_FOR_fmax_a_df
+#define CODE_FOR_mipsr6_class_s CODE_FOR_fclass_sf
+#define CODE_FOR_mipsr6_class_d CODE_FOR_fclass_df
+
 static const struct mips_builtin_description mips_builtins[] = {
 #define MIPS_GET_FCSR 0

GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated

2024-09-19 Thread Thomas Schwinge

Hi!

Regarding ongoing maintenance efforts, and avoiding building multilib
variants that probably nobody uses apart from a few of us testing these
out of routine (via building/linking with explicit '-mptx=3.1'), I
propose: "GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated",
see attached, "[...], and will be removed in GCC 16".  Any objections?
If not, then I'll push this before the GCC 15 release, and timely after
the GCC 15 release apply the corresponding code changes (yet to be
implemented).  (That is, no actual change for GCC release users for
another 1.5 years.)

These '-mptx=3.1' multilib variants are only useful for users of ancient
CUDA/Nvidia Driver, which doesn't support GCC's default PTX ISA 6.0
multilib variants; PTX ISA 6.0 is supported as of CUDA 9, 2017-09.


Grüße
 Thomas


From 8c099b2c4fed4f0745ef913c865868e76c061232 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 19 Sep 2024 22:04:28 +0200
Subject: [PATCH] GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated

---
 htdocs/gcc-15/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..99242d2c 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -191,6 +191,10 @@ a work-in-progress.
   For this, a recent version of https://gcc.gnu.org/install/specific.html#nvptx-x-none";
   >nvptx-tools is required.
+  
+The -mptx=3.1 multilib variants are deprecated and will be
+removed in GCC 16.
+  
 
 
 
-- 
2.45.2

Re: Re: PING [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-09-19 Thread Jakub Jelinek

On Mon, Sep 16, 2024 at 10:52:43AM +0200, Mikael Morin wrote:
> > While I understand the intent of 'positive form' vs 'negative form', the
> > above might be clearer as
> > 
> > Usage of intrinsics can be implemented either by generating a call
> > to the libgfortran library function or by directly generating inline
> > code.  For most intrinsics, only a single variant is available, and
> > there is no choice of implementation.  However, some intrinsics can
> > use a library function or inline code, where inline code typically offers
> > opportunities for additional optimization over a library function.
> > With @code{-finline-intrinsics=...} or
> > @code{-fno-inline-intrinsics=...}, the choice applies only to the
> > intrinsics present in the comma-separated list provided as argument.
> > 
> > > > +For each intrinsic, if no choice of implementation was made through either of
> > > > +the flag variants, a default behaviour is chosen depending on optimization:
> > > > +library calls are generated when not optimizing or when optimizing for size;
> > > > +otherwise inline code is preferred.
> > > > +
> > 
> > 
> > OK with consideration of the above comments.
> > 
> 
> Harald actually gave a partial green light on this already, but obviously
> there was still room for improvement.
> Thanks for the review, I'm incorporating the changes you suggested.
> 
> I was (and still am) waiting for a review from someone knowledgeable in the
> options system.  I'm considering proceeding without, as I prefer seeing this
> pushed sooner than later.

Just note that lang.opt.urls will need to be updated: either do it right
away with make regenerate-opt-urls, or commit, wait for a nag-mail from CI,
and then commit the patch it creates as an incremental change.

Jakub

[PATCH 1/2] c++: Don't strip USING_DECLs when updating local bindings [PR116748]

2024-09-19 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Alternatively I could solve this the other way around (have a new
'old_target = strip_using_decl (old)' and replace all usages of 'old'
except the usages in this patch); this is more churn but probably better
matches how other functions are structured.

-- >8 --

Currently update_binding strips USING_DECLs too eagerly, leading to ICEs
in pop_local_decl as it can't find the decl it's popping in the binding
list.  Let's rather try to keep the original USING_DECL around.

This also means that using59.C can point to the location of the
using-decl rather than the underlying object directly; this is in the
direction required to fix PR c++/106851 (though more work is needed to
emit properly helpful diagnostics here).

PR c++/116748

gcc/cp/ChangeLog:

* name-lookup.cc (update_binding): Maintain USING_DECLs in the
binding slots.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/using59.C: Update location.
* g++.dg/lookup/using69.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc | 12 +++-
 gcc/testsuite/g++.dg/lookup/using59.C |  4 ++--
 gcc/testsuite/g++.dg/lookup/using69.C | 10 ++
 3 files changed, 19 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lookup/using69.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index c7a693e02d5..94b031e6be2 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3005,6 +3005,8 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot,
 
   if (old == error_mark_node)
 old = NULL_TREE;
+
+  tree old_bval = old;
   old = strip_using_decl (old);
 
   if (DECL_IMPLICIT_TYPEDEF_P (decl))
@@ -3021,7 +3023,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot,
  gcc_checking_assert (!to_type);
  hide_type = hiding;
  to_type = decl;
- to_val = old;
+ to_val = old_bval;
}
   else
hide_value = hiding;
@@ -3034,7 +3036,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
   /* OLD is an implicit typedef.  Move it to to_type.  */
   gcc_checking_assert (!to_type);
 
-  to_type = old;
+  to_type = old_bval;
   hide_type = hide_value;
   old = NULL_TREE;
   hide_value = false;
@@ -3093,7 +3095,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
{
  if (same_type_p (TREE_TYPE (old), TREE_TYPE (decl)))
/* Two type decls to the same type.  Do nothing.  */
-   return old;
+   return old_bval;
  else
goto conflict;
}
@@ -3106,7 +3108,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
 
  /* The new one must be an alias at this point.  */
  gcc_assert (DECL_NAMESPACE_ALIAS (decl));
- return old;
+ return old_bval;
}
   else if (TREE_CODE (old) == VAR_DECL)
{
@@ -3121,7 +3123,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
   else
{
conflict:
- diagnose_name_conflict (decl, old);
+ diagnose_name_conflict (decl, old_bval);
  to_val = NULL_TREE;
}
 }
diff --git a/gcc/testsuite/g++.dg/lookup/using59.C 
b/gcc/testsuite/g++.dg/lookup/using59.C
index 3c3a73c28d5..b7ec325d234 100644
--- a/gcc/testsuite/g++.dg/lookup/using59.C
+++ b/gcc/testsuite/g++.dg/lookup/using59.C
@@ -1,10 +1,10 @@
 
 namespace Y
 {
-  extern int I; //  { dg-message "previous declaration" }
+  extern int I;
 }
 
-using Y::I;
+using Y::I; // { dg-message "previous declaration" }
 extern int I; // { dg-error "conflicts with a previous" }
 
 extern int J;
diff --git a/gcc/testsuite/g++.dg/lookup/using69.C 
b/gcc/testsuite/g++.dg/lookup/using69.C
new file mode 100644
index 000..7d52b73b9ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/using69.C
@@ -0,0 +1,10 @@
+// PR c++/116748
+
+namespace ns {
+  struct empty;
+}
+
+void foo() {
+  using ns::empty;
+  int empty;
+}
-- 
2.46.0

[PATCH 2/2] c++: Implement resolution for DR 36 [PR116160]

2024-09-19 Thread Nathaniel Shead

Noticed how to fix this while working on the other patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This implements part of P1787 to no longer complain about redeclaring an
entity via using-decl other than in a class scope.

PR c++/116160

gcc/cp/ChangeLog:

* name-lookup.cc (supplement_binding): Allow redeclaration via
USING_DECL if not in class scope.
(do_nonmember_using_decl): Remove function-scope exemption.
(push_using_decl_bindings): Remove outdated comment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/using-enum-3.C: No longer expect an error.
* g++.dg/lookup/using53.C: Remove XFAIL.
* g++.dg/cpp2a/using-enum-11.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc  | 12 +++-
 gcc/testsuite/g++.dg/cpp0x/using-enum-3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp2a/using-enum-11.C |  9 +
 gcc/testsuite/g++.dg/lookup/using53.C  |  2 +-
 4 files changed, 18 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-11.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 94b031e6be2..22a1c6aac8c 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -2874,6 +2874,12 @@ supplement_binding (cxx_binding *binding, tree decl)
 "%<-std=c++2c%> or %<-std=gnu++2c%>");
   binding->value = name_lookup::ambiguous (decl, binding->value);
 }
+  else if (binding->scope->kind != sk_class
+  && TREE_CODE (decl) == USING_DECL
+  && decls_match (target_bval, target_decl))
+/* Since P1787 (DR 36) it is OK to redeclare entities via using-decl,
+   except in class scopes.  */
+ok = false;
   else
 {
   if (!error_operand_p (bval))
@@ -5375,8 +5381,7 @@ do_nonmember_using_decl (name_lookup &lookup, bool 
fn_scope_p,
   else if (value
   /* Ignore anticipated builtins.  */
   && !anticipated_builtin_p (value)
-  && (fn_scope_p
-  || !decls_match (lookup.value, strip_using_decl (value))))
+  && !decls_match (lookup.value, strip_using_decl (value)))
 {
   diagnose_name_conflict (lookup.value, value);
   failed = true;
@@ -6648,9 +6653,6 @@ push_using_decl_bindings (name_lookup *lookup, tree name, 
tree value)
   type = binding->type;
 }
 
-  /* DR 36 questions why using-decls at function scope may not be
- duplicates.  Disallow it, as C++11 claimed and PR 20420
- implemented.  */
   if (lookup)
 do_nonmember_using_decl (*lookup, true, true, &value, &type);
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C 
b/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C
index 34f8bf4fa0b..4638181c63c 100644
--- a/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/using-enum-3.C
@@ -9,7 +9,7 @@
 void f ()
 {
   enum e { a };
-  using e::a;  // { dg-error "redeclaration" }
+  using e::a;  // { dg-bogus "redeclaration" "P1787" }
   // { dg-error "enum" "" { target { ! c++2a } } .-1 }
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C 
b/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C
new file mode 100644
index 000..ff99ed422d5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/using-enum-11.C
@@ -0,0 +1,9 @@
+// PR c++/116160
+// { dg-do compile { target c++20 } }
+
+enum class Blah { b };
+void foo() {
+  using Blah::b;
+  using Blah::b;
+  using enum Blah;
+}
diff --git a/gcc/testsuite/g++.dg/lookup/using53.C 
b/gcc/testsuite/g++.dg/lookup/using53.C
index e91829e939a..8279c73bfc4 100644
--- a/gcc/testsuite/g++.dg/lookup/using53.C
+++ b/gcc/testsuite/g++.dg/lookup/using53.C
@@ -52,5 +52,5 @@ void
 f ()
 {
   using N::i;
-  using N::i;   // { dg-bogus "conflicts" "See P1787 (CWG36)" { xfail 
*-*-* } }
+  using N::i;   // { dg-bogus "conflicts" "See P1787 (CWG36)" }
 }
-- 
2.46.0

Re: [PATCH] Remove PHI_RESULT_PTR and change some PHI_RESULT to be gimple_phi_result [PR116643]

2024-09-19 Thread Richard Biener




> On 20.09.2024 at 06:02, Andrew Pinski wrote:
> 
> There were only a few uses of PHI_RESULT_PTR, so let's remove it and use
> gimple_phi_result_ptr or gimple_phi_result directly instead.
> Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change
> the use of PHI_RESULT there to be gimple_phi_result instead.
> 
> This also removes one extra indirection that was done for PHI_RESULT, so
> stage2 building should be slightly faster.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

>PR middle-end/116643
> 
> gcc/ChangeLog:
> 
>* ssa-iterators.h (single_phi_def): Use gimple_phi_result
>instead of PHI_RESULT.
>(op_iter_init_phidef): Use gimple_phi_result/gimple_phi_result_ptr
>instead of PHI_RESULT/PHI_RESULT_PTR.
>* tree-ssa-operands.h (PHI_RESULT_PTR): Remove.
>(PHI_RESULT): Use gimple_phi_result directly.
>(SET_PHI_RESULT): Use gimple_phi_result_ptr directly.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/ssa-iterators.h | 6 +++---
> gcc/tree-ssa-operands.h | 5 ++---
> 2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
> index b7b01fd018a..e0e555cc472 100644
> --- a/gcc/ssa-iterators.h
> +++ b/gcc/ssa-iterators.h
> @@ -768,7 +768,7 @@ num_ssa_operands (gimple *stmt, int flags)
> inline tree
> single_phi_def (gphi *stmt, int flags)
> {
> -  tree def = PHI_RESULT (stmt);
> +  tree def = gimple_phi_result (stmt);
>   if ((flags & SSA_OP_DEF) && is_gimple_reg (def))
> return def;
>   if ((flags & SSA_OP_VIRTUAL_DEFS) && !is_gimple_reg (def))
> @@ -811,7 +811,7 @@ op_iter_init_phiuse (ssa_op_iter *ptr, gphi *phi, int 
> flags)
> inline def_operand_p
> op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int flags)
> {
> -  tree phi_def = PHI_RESULT (phi);
> +  tree phi_def = gimple_phi_result (phi);
>   int comp;
> 
>   clear_and_done_ssa_iter (ptr);
> @@ -833,7 +833,7 @@ op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int 
> flags)
>   /* The first call to op_iter_next_def will terminate the iterator since
>  all the fields are NULL.  Simply return the result here as the first and
>  therefore only result.  */
> -  return PHI_RESULT_PTR (phi);
> +  return gimple_phi_result_ptr (phi);
> }
> 
> /* Return true is IMM has reached the end of the immediate use stmt list.  */
> diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
> index 8072932564a..b6534f18c66 100644
> --- a/gcc/tree-ssa-operands.h
> +++ b/gcc/tree-ssa-operands.h
> @@ -72,9 +72,8 @@ struct GTY(()) ssa_operands {
> #define USE_OP_PTR(OP)(&((OP)->use_ptr))
> #define USE_OP(OP)(USE_FROM_PTR (USE_OP_PTR (OP)))
> 
> -#define PHI_RESULT_PTR(PHI)gimple_phi_result_ptr (PHI)
> -#define PHI_RESULT(PHI)DEF_FROM_PTR (PHI_RESULT_PTR (PHI))
> -#define SET_PHI_RESULT(PHI, V)SET_DEF (PHI_RESULT_PTR (PHI), (V))
> +#define PHI_RESULT(PHI)gimple_phi_result (PHI)
> +#define SET_PHI_RESULT(PHI, V)SET_DEF (gimple_phi_result_ptr (PHI), (V))
> /*
> #define PHI_ARG_DEF(PHI, I)USE_FROM_PTR (PHI_ARG_DEF_PTR ((PHI), (I)))
> */
> --
> 2.34.1
>

[COMMITTED] testsuite: debug: fix errant whitespace

2024-09-19 Thread Sam James

I added some whitespace unintentionally in r15-3723-g284c03ec79ec20,
fix that.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-datasec-1.c: Fix whitespace.
---
Pushed as obvious.

 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
index 781f80774e2f..4a46479397a6 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c
@@ -1,4 +1,3 @@
-
 /* BTF generation of BTF_KIND_DATASEC records.
 
We expect 3 DATASEC records: one for each of .data, .rodata and .bss.
-- 
2.46.0

Re: [PATCH v5] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-19 Thread Jason Merrill


On 9/19/24 5:35 PM, Marek Polacek wrote:

On Tue, Sep 17, 2024 at 12:50:46PM -0400, Jason Merrill wrote:

On 9/16/24 7:14 PM, Marek Polacek wrote:

+/* Mark an explicitly defaulted function FN as =deleted and warn.
+   IMPLICIT_FN is the corresponding special member function that
+   would have been implicitly declared.  */
+
+void
+maybe_delete_defaulted_fn (tree fn, tree implicit_fn)
+{
+  if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn))
+return;
+
+  DECL_DELETED_FN (fn) = true;
+
+  if (!warn_defaulted_fn_deleted)
+return;


The flag shouldn't affect the error cases; I'd drop this check.


Dropped.


+  auto_diagnostic_group d;
+  const special_function_kind kind = special_function_p (fn);
+  tree parmtype
+= TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn)
+ ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn)))
+ : FUNCTION_FIRST_USER_PARMTYPE (fn));
+  const bool illformed_p
+/* [dcl.fct.def.default] "if F1 is an assignment operator"...  */
+= (SFK_ASSIGN_P (kind)
+   /* "and the return type of F1 differs from the return type of F2"  */
+   && (!same_type_p (TREE_TYPE (TREE_TYPE (fn)),
+TREE_TYPE (TREE_TYPE (implicit_fn)))
+  /* "or F1's non-object parameter type is not a reference,
+ the program is ill-formed"  */
+  || !TYPE_REF_P (parmtype)));
+  /* Decide if we want to emit a pedwarn, error, or a warning.  */
+  diagnostic_t diag_kind;
+  if (cxx_dialect >= cxx20)
+diag_kind = illformed_p ? DK_ERROR : DK_WARNING;
+  else
+diag_kind = DK_PEDWARN;


Errors should be errors in all standard modes; it doesn't make sense to have
a softer diagnostic in an older mode when the code is ill-formed in all of them.

Non-errors should be warnings or pedwarns depending on the standard mode.


Aaah, I misunderstood.  Hopefully I got it right this time.
  

+  /* Don't warn for template instantiations.  */
+  if (DECL_TEMPLATE_INSTANTIATION (fn) && diag_kind == DK_WARNING)
+return;
+
+  const char *wmsg;
+  switch (kind)
+{
+case sfk_copy_constructor:
+  wmsg = G_("explicitly defaulted copy constructor is implicitly deleted "
+   "because its declared type does not match the type of an "
+   "implicit copy constructor");
+  break;
+case sfk_move_constructor:
+  wmsg = G_("explicitly defaulted move constructor is implicitly deleted "
+   "because its declared type does not match the type of an "
+   "implicit move constructor");
+  break;
+case sfk_copy_assignment:
+  wmsg = G_("explicitly defaulted copy assignment operator is implicitly "
+   "deleted because its declared type does not match the type "
+   "of an implicit copy assignment operator");
+  break;
+case sfk_move_assignment:
+  wmsg = G_("explicitly defaulted move assignment operator is implicitly "
+   "deleted because its declared type does not match the type "
+   "of an implicit move assignment operator");
+  break;
+default:
+  gcc_unreachable ();
+}
+  if (emit_diagnostic (diag_kind, DECL_SOURCE_LOCATION (fn),
+  OPT_Wdefaulted_function_deleted, wmsg))


Let's not pass the OPT when DK_ERROR.


Done.

I've added new tests to cover -Wno-defaulted-function-deleted.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

   struct C {
  C(const C&&) = default;
   };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.

When the code is ill-formed, we emit an error in all modes.  Otherwise,
we emit a pedwarn in C++17 and a warning in C++20.

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here,
leave it to defaulted_late_check.
* cp-tree.h (maybe_delete_defaulted_fn): Declare.
(defaulted_late_check): Add a tristate parameter.
* method.cc (maybe_delete_defaulted_fn): New.
(defaulted_late_check): Add a tristate parameter.  Call
maybe_delete_defaulted_fn instead of giving an error.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error.
* g++.dg/cpp0x/defaulted51.C: Likewi

[PATCH] i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

2024-09-19 Thread Jakub Jelinek

Hi!

The min/max patterns for intrinsics which on x86 return the second
input operand when both operands are zeros or when one or both of them
are NaN shouldn't use SMIN/SMAX RTL because, as with MIN_EXPR/MAX_EXPR,
the result of SMIN/SMAX is undefined in those cases.
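
As a rough illustration of the operand-order rule described above, here is a scalar C++ sketch. It is only a model under stated assumptions: the helper names (model_min_ss, model_max_ss, model_clamp) are hypothetical and are not part of the patch, and the model assumes default IEEE semantics (no -ffast-math). The comparison is false for NaN operands and for two zeros of either sign, so the second operand is selected in exactly those cases.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of the selection rule: return the first operand only when the
// comparison is true; otherwise (NaN involved, or both zeros) return the second.
static float model_min_ss(float a, float b) { return a < b ? a : b; }
static float model_max_ss(float a, float b) { return a > b ? a : b; }

// Clamp in the style of the testcase: min against FLT_MAX, then max against 0.
static float model_clamp(float f) {
  float v = model_min_ss(f, std::numeric_limits<float>::max());
  return model_max_ss(v, 0.0f);
}
```

Because the result depends on which operand comes second, swapping the arguments changes the outcome for NaNs and signed zeros, which is why a symmetric SMIN/SMAX representation cannot describe it.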

The following patch adds an expander which uses either a new pattern with
UNSPEC_IEEE_M{AX,IN} or the S{MIN,MAX} representation of the same.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

P.S. I have a patch to replace UNSPEC_IEEE_M{AX,IN} with IF_THEN_ELSE
(except for the 3dNOW! PFMIN/MAX, those actually are documented to behave
differently), but it doesn't improve much, as neither
simplify_const_relational_operation nor simplify_ternary_operation is
able to fold comparisons with two CONST_VECTOR operands or IF_THEN_ELSE
with 3 CONST_VECTOR operands.
So, maybe a better approach would be to generic fold the builtins with
constant arguments (maybe leaving NaNs to runtime).

2024-09-19  Uros Bizjak  
Jakub Jelinek  

PR target/116738
* config/i386/subst.md (mask_scalar_operand_arg34,
mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
subst attributes.
* config/i386/sse.md
(_vm3):
Change from define_insn to define_expand, rename the old define_insn
to ...
(*_vm3):
... this.

(_ieee_vm3):
New define_insn.

* gcc.target/i386/sse-pr116738.c: New test.

--- gcc/config/i386/subst.md.jj 2024-09-18 15:49:42.200791315 +0200
+++ gcc/config/i386/subst.md2024-09-19 12:32:51.048626421 +0200
@@ -366,6 +366,8 @@ (define_subst_attr "mask_scalar_operand4
 (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4")
 (define_subst_attr "mask_scalar4_dest_false_dep_for_glc_cond" "mask_scalar" 
"1" "operands[4] == CONST0_RTX(mode)")
 (define_subst_attr "mask_scalarc_dest_false_dep_for_glc_cond" "mask_scalarc" 
"1" "operands[3] == CONST0_RTX(V8HFmode)")
+(define_subst_attr "mask_scalar_operand_arg34" "mask_scalar" "" ", 
operands[3], operands[4]")
+(define_subst_attr "mask_scalar_expand_op3" "mask_scalar" "3" "5")
 
 (define_subst "mask_scalar"
   [(set (match_operand:SUBST_V 0)
@@ -473,6 +475,7 @@ (define_subst_attr "round_saeonly_scalar
 (define_subst_attr "round_saeonly_scalar_constraint" "round_saeonly_scalar" 
"vm" "v")
 (define_subst_attr "round_saeonly_scalar_prefix" "round_saeonly_scalar" "vex" 
"evex")
 (define_subst_attr "round_saeonly_scalar_nimm_predicate" 
"round_saeonly_scalar" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_scalar_mask_arg3" "round_saeonly_scalar" "" 
", operands[]")
 
 (define_subst "round_saeonly_scalar"
   [(set (match_operand:SUBST_V 0)
--- gcc/config/i386/sse.md.jj   2024-09-10 16:26:02.875151133 +0200
+++ gcc/config/i386/sse.md  2024-09-19 12:43:31.693030695 +0200
@@ -,7 +,27 @@ (define_insn "*ieee_3
   (const_string "*")))
(set_attr "mode" "")])
 
-(define_insn 
"_vm3"
+(define_expand 
"_vm3"
+  [(set (match_operand:VFH_128 0 "register_operand")
+   (vec_merge:VFH_128
+ (smaxmin:VFH_128
+   (match_operand:VFH_128 1 "register_operand")
+   (match_operand:VFH_128 2 "nonimmediate_operand"))
+(match_dup 1)
+(const_int 1)))]
+  "TARGET_SSE"
+{
+  if (!flag_finite_math_only || flag_signed_zeros)
+{
+  emit_insn 
(gen__ieee_vm3
+(operands[0], operands[1], operands[2]
+ 
+ ));
+  DONE;
+}
+})
+
+(define_insn 
"*_vm3"
   [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
(vec_merge:VFH_128
  (smaxmin:VFH_128
@@ -3348,6 +3368,25 @@ (define_insn "_vm3")
+   (set_attr "mode" "")])
+
+(define_insn 
"_ieee_vm3"
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+   (vec_merge:VFH_128
+ (unspec:VFH_128
+   [(match_operand:VFH_128 1 "register_operand" "0,v")
+(match_operand:VFH_128 2 "nonimmediate_operand" 
"xm,")]
+   IEEE_MAXMIN)
+(match_dup 1)
+(const_int 1)))]
+  "TARGET_SSE"
+  "@
+   \t{%2, %0|%0, %2}
+   v\t{%2, 
%1, %0|%0, %1, 
%2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "btver2_sse_attr" "maxmin")
(set_attr "prefix" "")
(set_attr "mode" "")])
 
--- gcc/testsuite/gcc.target/i386/sse-pr116738.c.jj 2024-09-19 
12:52:33.502681950 +0200
+++ gcc/testsuite/gcc.target/i386/sse-pr116738.c2024-09-19 
12:54:20.938219741 +0200
@@ -0,0 +1,28 @@
+/* PR target/116738 */
+/* { dg-do run } */
+/* { dg-options "-O2 -msse" } */
+/* { dg-require-effective-target sse } */
+
+#include "sse-check.h"
+
+static inline float
+clamp (float f)
+{
+  __m128 v = _mm_set_ss (f);
+  __m128 zero = _mm_setzero_ps ();
+  __m128 greatest = _mm_set_ss (__FLT_MAX__);
+  v = _mm_min_ss (v, greatest);
+  v = _mm_max_ss (v, zero);
+  return _mm_cvtss_f32 (v);
+}
+
+static void
+sse

Re: [PATCH] ltmain.sh: allow more flags at link-time

2024-09-19 Thread Sam James

Sam James  writes:

> Sam James  writes:
>
>> libtool defaults to filtering flags passed at link-time.
>>
>> This brings the filtering in GCC's 'fork' of libtool into sync with
>> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.
>>
>> In particular, this now allows some harmless diagnostic flags (especially
>> useful for things like -Werror=odr), more optimization flags, and some
>> Clang-specific options.
>>
>> GCC's -flto documentation mentions:
>>> To use the link-time optimizer, -flto and optimization options should be
>>> specified at compile time and during the final link. It is recommended
>>> that you compile all the files participating in the same link with the
>>> same options and also specify those options at link time.
>>
>> This allows compliance with that.
>>
>>  * ltmain.sh (func_mode_link): Allow various flags through filter.
>> ---
>> We have been using this for a while now downstream.
>>
>> H.J., please take a look.
>>
>> I think this also explains 
>> https://src.fedoraproject.org/rpms/binutils/blob/rawhide/f/binutils.spec#_947.
>>
>>  ltmain.sh | 46 ++
>>  1 file changed, 34 insertions(+), 12 deletions(-)
>
> Ping. The change should be harmless given the flags should be filtered
> out earlier if anything is wrong, and we've been using it internally for
> quite some time (i.e. it doesn't *add* the flags, just means that _if
> they arrive_ at libtool, they're not dropped at link-time).
>

Ping.

> [...]

Re: [PATCH] toplevel: Error out if using --disable-libstdcxx with bootstrap [PR105474]

2024-09-19 Thread Andrew Pinski

On Thu, Aug 22, 2024 at 2:45 PM Andrew Pinski  wrote:
>
> Bootstrapping and using --disable-libstdcxx will cause a build failure deep 
> in compiling
> stage2 so instead error out early in the toplevel configure so it is more 
> user friendly.
>
> Bootstrapped and tested on x86_64-linux-gnu.
> Also made sure --disable-libstdcxx without --disable-bootstrap failed.

Ping? This is just a simple patch to make it more user friendly and
fail early on rather than waiting until the build fails.

Thanks,
Andrew

>
> PR bootstrap/105474
>
> ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Error out if libstdc++ is not enabled
> with bootstrapping.
>
> Signed-off-by: Andrew Pinski 
> ---
>  configure| 9 +
>  configure.ac | 9 +
>  2 files changed, 18 insertions(+)
>
> diff --git a/configure b/configure
> index 51bf1d1add1..0722242389d 100755
> --- a/configure
> +++ b/configure
> @@ -10235,6 +10235,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs 
> :,$stage1_languages," in
>  ;;
>  esac
>
> +# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is 
> disabled with bootstrapping
> +# Note C++ is always enabled for stage1 now.
> +case "$enable_bootstrap:${noconfigdirs}" in
> +  yes:*target-libstdc++-v3*)
> +as_fn_error $? "bootstrapping with --disable-libstdcxx is not supported" 
> "$LINENO" 5
> +;;
> +esac
> +
> +
>  extrasub_build=
>  for module in ${build_configdirs} ; do
>if test -z "${no_recursion}" \
> diff --git a/configure.ac b/configure.ac
> index 20457005e29..8be11e84db8 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -3191,6 +3191,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs 
> :,$stage1_languages," in
>  ;;
>  esac
>
> +# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is 
> disabled with bootstrapping
> +# Note C++ is always enabled for stage1 now.
> +case "$enable_bootstrap:${noconfigdirs}" in
> +  yes:*target-libstdc++-v3*)
> +AC_MSG_ERROR([bootstrapping with --disable-libstdcxx is not supported])
> +;;
> +esac
> +
> +
>  extrasub_build=
>  for module in ${build_configdirs} ; do
>if test -z "${no_recursion}" \
> --
> 2.43.0
>

[PATCH] Remove PHI_RESULT_PTR and change some PHI_RESULT to be gimple_phi_result [PR116643]

2024-09-19 Thread Andrew Pinski

There were only a few uses of PHI_RESULT_PTR, so let's remove it and use
gimple_phi_result_ptr or gimple_phi_result directly instead.
Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change
the use of PHI_RESULT there to be gimple_phi_result instead.

This also removes one extra indirection that was done for PHI_RESULT, so
stage2 building should be slightly faster.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116643

gcc/ChangeLog:

* ssa-iterators.h (single_phi_def): Use gimple_phi_result
instead of PHI_RESULT.
(op_iter_init_phidef): Use gimple_phi_result/gimple_phi_result_ptr
instead of PHI_RESULT/PHI_RESULT_PTR.
* tree-ssa-operands.h (PHI_RESULT_PTR): Remove.
(PHI_RESULT): Use gimple_phi_result directly.
(SET_PHI_RESULT): Use gimple_phi_result_ptr directly.

Signed-off-by: Andrew Pinski 
---
 gcc/ssa-iterators.h | 6 +++---
 gcc/tree-ssa-operands.h | 5 ++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index b7b01fd018a..e0e555cc472 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -768,7 +768,7 @@ num_ssa_operands (gimple *stmt, int flags)
 inline tree
 single_phi_def (gphi *stmt, int flags)
 {
-  tree def = PHI_RESULT (stmt);
+  tree def = gimple_phi_result (stmt);
   if ((flags & SSA_OP_DEF) && is_gimple_reg (def))
 return def;
   if ((flags & SSA_OP_VIRTUAL_DEFS) && !is_gimple_reg (def))
@@ -811,7 +811,7 @@ op_iter_init_phiuse (ssa_op_iter *ptr, gphi *phi, int flags)
 inline def_operand_p
 op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int flags)
 {
-  tree phi_def = PHI_RESULT (phi);
+  tree phi_def = gimple_phi_result (phi);
   int comp;
 
   clear_and_done_ssa_iter (ptr);
@@ -833,7 +833,7 @@ op_iter_init_phidef (ssa_op_iter *ptr, gphi *phi, int flags)
   /* The first call to op_iter_next_def will terminate the iterator since
  all the fields are NULL.  Simply return the result here as the first and
  therefore only result.  */
-  return PHI_RESULT_PTR (phi);
+  return gimple_phi_result_ptr (phi);
 }
 
 /* Return true is IMM has reached the end of the immediate use stmt list.  */
diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
index 8072932564a..b6534f18c66 100644
--- a/gcc/tree-ssa-operands.h
+++ b/gcc/tree-ssa-operands.h
@@ -72,9 +72,8 @@ struct GTY(()) ssa_operands {
 #define USE_OP_PTR(OP) (&((OP)->use_ptr))
 #define USE_OP(OP) (USE_FROM_PTR (USE_OP_PTR (OP)))
 
-#define PHI_RESULT_PTR(PHI)gimple_phi_result_ptr (PHI)
-#define PHI_RESULT(PHI)DEF_FROM_PTR (PHI_RESULT_PTR (PHI))
-#define SET_PHI_RESULT(PHI, V) SET_DEF (PHI_RESULT_PTR (PHI), (V))
+#define PHI_RESULT(PHI)gimple_phi_result (PHI)
+#define SET_PHI_RESULT(PHI, V) SET_DEF (gimple_phi_result_ptr (PHI), (V))
 /*
 #define PHI_ARG_DEF(PHI, I)USE_FROM_PTR (PHI_ARG_DEF_PTR ((PHI), (I)))
 */
-- 
2.34.1
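
The "one extra indirection" removed by the patch above can be pictured with a small mock. This is a sketch under loud assumptions: none of the types or helpers below are GCC's real ones. mock_tree, mock_phi, mock_phi_result_ptr, mock_phi_result and mock_def_from_ptr are hypothetical stand-ins that only mirror the shape of the macros in the diff.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT GCC's real types.
struct mock_tree_node { int id; };
using mock_tree = mock_tree_node *;
struct mock_phi { mock_tree result; };

// Accessors in the spirit of a "result pointer" and a "result value" getter.
inline mock_tree *mock_phi_result_ptr(mock_phi *phi) { return &phi->result; }
inline mock_tree mock_phi_result(const mock_phi *phi) { return phi->result; }

// Old layering: reading the value goes through the pointer accessor and then
// through a separate dereference helper.
inline mock_tree mock_def_from_ptr(mock_tree *def) { return *def; }
#define OLD_PHI_RESULT(PHI) mock_def_from_ptr(mock_phi_result_ptr(PHI))

// New layering: the value getter is used directly; only the setter still
// needs the pointer form.
#define NEW_PHI_RESULT(PHI) mock_phi_result(PHI)
#define NEW_SET_PHI_RESULT(PHI, V) (*mock_phi_result_ptr(PHI) = (V))
```

Both read macros yield the same value; the new one simply skips the intermediate pointer round trip, which is the kind of saving the commit message refers to.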

Re: [patch, fortran] Implement IANY, IALL and IPARITY for unsigned

2024-09-19 Thread Jerry D


On 9/18/24 1:20 PM, Thomas Koenig wrote:

OK for trunk?


OK and thanks.

Jerry
--- snip ---

Re: [PATCH] vect: Use simple_dce_worklist in the vectorizer [PR116711]

2024-09-19 Thread Andrew Pinski

On Tue, Sep 17, 2024 at 11:53 PM Richard Biener
 wrote:
>
> On Tue, Sep 17, 2024 at 4:36 AM Andrew Pinski  
> wrote:
> >
> > This adds simple_dce_worklist to both the SLP vectorizer and the
> > loop-based vectorizer.
> > This is a step towards removing the DCE after the loop-based vectorizer.
> > That DCE still does a few things, such as removing some of the induction
> > variables which have become unused. That is something which can be
> > improved afterwards.
> >
> > Note this adds it to the SLP BB vectorizer too, as it is sometimes used
> > from the loop-based one.
> > In the case of the BB SLP vectorizer, the dead statements don't get
> > removed until much later in DSE, so removing them much earlier is
> > important.
> >
> > Note on the new testcase: it came up during bootstrap, where the SLP pass
> > would cause the need to invalidate the scev caches, but there was no
> > testcase for this beforehand, so adding one is a good idea.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> In the places you add to the worklist in vectorizable_* can you please
> see to do that in a place
> where we could actually remove the stmt (and release the def)?  Please
> also add a
> (inline) function like vect_remove_scalar_stmt (vinfo *, X) with X
> either a stmt_vec_info (preferred)
> or a gimple *.

I was thinking about this and the only place where I know 100% that we
might be removing
the statement is `vec_info::remove_stmt` which also might be just
enough to remove all of the scalar
cases. Let me try removing the places which call bitmap_set_bit except
for that one and report back.
Though induction variables might still need to be removed too; I have
to dig into that.

Thanks,
Andrew
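
For readers unfamiliar with the worklist idea discussed in this thread, here is a toy C++ sketch. It is not GCC's simple_dce_from_worklist and makes no claim about its internals: toy_stmt, count_uses and toy_dce_from_worklist are hypothetical names. The only properties borrowed from the thread are that candidates are recorded on a worklist, that removing a statement can make its operands dead in turn, and that the function reports whether anything was removed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy statement: defines value `def` from the values listed in `uses`.
struct toy_stmt {
  int def;
  std::vector<int> uses;
  bool removed = false;
  bool has_side_effects = false;
};

// Count uses of `value` in statements that have not been removed.
static std::size_t count_uses(const std::vector<toy_stmt> &stmts, int value) {
  std::size_t n = 0;
  for (const toy_stmt &s : stmts) {
    if (s.removed)
      continue;
    for (int u : s.uses)
      if (u == value)
        ++n;
  }
  return n;
}

// Remove side-effect-free statements whose definition is on the worklist and
// has no remaining uses; operands of a removed statement become candidates.
// Returns true if at least one statement was removed.
static bool toy_dce_from_worklist(std::vector<toy_stmt> &stmts,
                                  std::set<int> worklist) {
  bool changed = false;
  while (!worklist.empty()) {
    int value = *worklist.begin();
    worklist.erase(worklist.begin());
    for (toy_stmt &s : stmts) {
      if (s.removed || s.def != value || s.has_side_effects)
        continue;
      if (count_uses(stmts, value) != 0)
        continue;
      s.removed = true;
      changed = true;
      worklist.insert(s.uses.begin(), s.uses.end());
    }
  }
  return changed;
}
```

In this model, the boolean result is what a caller would consult before discarding cached analyses, in the same spirit as the ChangeLog entry about freeing the loop iteration and scev caches only if something was removed.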

>
> Thanks,
> Richard.
>
> > PR tree-optimization/116711
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-dce.cc (simple_dce_from_worklist): Returns
> > true if something was removed.
> > * tree-ssa-dce.h (simple_dce_from_worklist): Change return
> > type to bool.
> > * tree-vect-loop.cc (vectorizable_induction): Add phi result
> > to the dce worklist.
> > * tree-vect-slp.cc: Add includes of tree-ssa-dce.h,
> > tree-ssa-loop-niter.h and tree-scalar-evolution.h.
> > (vect_slp_region): Add DCE_WORKLIST argument. Copy
> > the dce_worklist from the bb vectorization info.
> > (vect_slp_bbs): Add DCE_WORKLIST argument. Update call to
> > vect_slp_region.
> > (vect_slp_if_converted_bb): Add DCE_WORKLIST argument. Update
> > call to vect_slp_bbs.
> > (vect_slp_function): Update call to vect_slp_bbs and call
> > simple_dce_from_worklist. Also free the loop iteration and
> > scev cache if something was removed.
> > * tree-vect-stmts.cc (vectorizable_bswap): Add the lhs of the 
> > scalar stmt
> > to the dce work list.
> > (vectorizable_call): Likewise.
> > (vectorizable_simd_clone_call): Likewise.
> > (vectorizable_conversion): Likewise.
> > (vectorizable_assignment): Likewise.
> > (vectorizable_shift): Likewise.
> > (vectorizable_operation): Likewise.
> > (vectorizable_condition): Likewise.
> > (vectorizable_comparison_1): Likewise.
> > * tree-vectorizer.cc: Include tree-ssa-dce.h.
> > (vec_info::remove_stmt): Add all of the uses of the store to the
> > dce work list.
> > (try_vectorize_loop_1): Update call to vect_slp_if_converted_bb.
> > Copy the dce worklist into the loop's vectinfo dce worklist.
> > (pass_vectorize::execute): Copy loops' vectinfo dce worklist 
> > locally.
> > Add call to simple_dce_from_worklist.
> > * tree-vectorizer.h (vec_info): Add dce_worklist field.
> > (vect_slp_if_converted_bb): Add bitmap argument.
> > * tree-vectorizer.h (vect_slp_if_converted_bb): Add bitmap argument.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/bb-slp-77.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/gcc.dg/vect/bb-slp-77.c | 15 +
> >  gcc/tree-ssa-dce.cc   |  5 +++--
> >  gcc/tree-ssa-dce.h|  2 +-
> >  gcc/tree-vect-loop.cc |  3 +++
> >  gcc/tree-vect-slp.cc  | 32 ---
> >  gcc/tree-vect-stmts.cc| 16 +-
> >  gcc/tree-vectorizer.cc| 21 +-
> >  gcc/tree-vectorizer.h |  5 -
> >  8 files changed, 85 insertions(+), 14 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-77.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-77.c 
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c
> > new file mode 100644
> > index 000..a74bb17e25c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +
> > +/* Make sure SLP vectoriz

[PATCH] rs6000, Fix test builtins-1-p10-runnable.c

2024-09-19 Thread Carl Love


GCC maintainers:

This patch removes an expected value change that was made to verify the 
error checking for the test was working.  Apparently, it didn't get 
removed from the final patch.


The patch fixes the single test error in the builtins-1-p10-runnable.c test.

The patch was run on a Power 10.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love

-
rs6000, Fix test builtins-1-p10-runnable.c

The first element of the expected result was apparently changed
for testing purposes.  The change didn't get removed before the
commit.

The issue was introduced in commit:

  commit f1ad419ebfdcfaf26117e069b10bd1b154276049
  Author: Carl Love 
  Date:   Fri Sep 4 19:24:22 2020 -0500

  rs6000, vector integer multiply/divide/modulo instructions

Remove the test input.

gcc/testsuite/ChangeLog:

    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove
    expected value for testing.  Uncomment correct expected
    result.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

index 222c8b3a409..5402852f82b 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -281,8 +281,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 
2147483648 };

 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);

--
2.46.0
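
For reference, the restored expected vector can be reproduced with a scalar model of a signed multiply-high on 32-bit words.  This is only a sketch under stated assumptions: `mulh_s32` is a hypothetical helper (it is not part of the test or of the `vec_mulh` implementation), each lane is assumed to return the upper 32 bits of the 64-bit signed product, and the literal 2147483648 is assumed to end up as INT32_MIN once stored in a signed lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of one vec_mulh lane: the high 32 bits of the
   64-bit signed product, i.e. floor (a * b / 2^32).  */
static int32_t
mulh_s32 (int32_t a, int32_t b)
{
  int64_t product = (int64_t) a * (int64_t) b;
  int64_t high = product / 4294967296LL;  /* Division truncates toward zero.  */
  if (product % 4294967296LL < 0)
    high -= 1;                            /* Adjust to floor for negatives.  */
  return (int32_t) high;
}
```

With a = INT32_MIN and b = 2, 3, 4, 5 this model gives -1, -2, -2, -3, which matches the uncommented `vec_i_expected` line in the diff above.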

Re: [PATCH] rs6000, Fix test builtins-1-p10-runnable.c

2024-09-19 Thread Carl Love




GCC maintainers:

Please ignore this patch.  Attached the wrong patch to the message.   
Sorry for the noise.


 Carl


On 9/19/24 4:40 PM, Carl Love wrote:

[...]

[COMMITTED] testsuite: debug: fix dejagnu directive syntax

2024-09-19 Thread Sam James

In this case, they were all harmless in reality (no diff in test logs).

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-array-1.c: Fix dg-do directive syntax.
* gcc.dg/debug/btf/btf-bitfields-1.c: Ditto.
* gcc.dg/debug/btf/btf-bitfields-2.c: Ditto.
* gcc.dg/debug/btf/btf-datasec-1.c: Ditto.
* gcc.dg/debug/btf/btf-union-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-anonymous-struct-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-anonymous-union-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-array-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-array-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-array-4.c: Ditto.
* gcc.dg/debug/ctf/ctf-array-5.c: Ditto.
* gcc.dg/debug/ctf/ctf-array-6.c: Ditto.
* gcc.dg/debug/ctf/ctf-attr-mode-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-attr-used-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-bitfields-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-bitfields-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-bitfields-3.c: Ditto.
* gcc.dg/debug/ctf/ctf-bitfields-4.c: Ditto.
* gcc.dg/debug/ctf/ctf-complex-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-cvr-quals-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-cvr-quals-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-cvr-quals-3.c: Ditto.
* gcc.dg/debug/ctf/ctf-cvr-quals-4.c: Ditto.
* gcc.dg/debug/ctf/ctf-enum-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-enum-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-file-scope-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-float-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-forward-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-forward-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-func-index-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-function-pointers-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-function-pointers-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-function-pointers-3.c: Ditto.
* gcc.dg/debug/ctf/ctf-function-pointers-4.c: Ditto.
* gcc.dg/debug/ctf/ctf-functions-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-int-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-objt-index-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-pointers-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-pointers-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-preamble-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-str-table-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-struct-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-struct-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-struct-array-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-struct-array-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-3.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-struct-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-struct-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-typedef-struct-3.c: Ditto.
* gcc.dg/debug/ctf/ctf-union-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-variables-1.c: Ditto.
* gcc.dg/debug/ctf/ctf-variables-2.c: Ditto.
* gcc.dg/debug/ctf/ctf-variables-3.c: Ditto.
---
Pushed as obvious.

 gcc/testsuite/gcc.dg/debug/btf/btf-array-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c | 2 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c   | 3 ++-
 gcc/testsuite/gcc.dg/debug/btf/btf-union-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-struct-1.c  | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-anonymous-union-1.c   | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-2.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-4.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-5.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-array-6.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-attr-mode-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-attr-used-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-2.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-3.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-bitfields-4.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-complex-1.c   | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-2.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-3.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-cvr-quals-4.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-1.c  | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-enum-2.c  | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c| 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-float-1.c | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-1.c   | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-forward-2.c   | 2 +-
 gcc/testsuite/gcc.dg/debug/ctf/ctf-func-index-1.c|

Re: [C PATCH] fix crash when checking for compatibility of structures [PR116726]

2024-09-19 Thread Joseph Myers

On Tue, 17 Sep 2024, Martin Uecker wrote:

> Here is a fix for a mistake I made when recursively checking for
> type compatibility.
> 
> 
> Bootstrapped and regression tested on x86-64.
> 
> 
> c: fix crash when checking for compatibility of structures [PR116726]
> 
> When checking for compatibility of structure or union types in
> tagged_types_tu_compatible_p, restore the old value of the pointer to
> the top of the temporary cache after recursively calling 
> comptypes_internal
> when looping over the members of a structure or union.  While the next
> iteration of the loop overwrites the pointer, I missed the fact that it 
> can
> be accessed again when types of function arguments are compared as part
> of recursive type checking and the function is entered again.
> 
> PR c/116726
> 
> gcc/Changelog:
> * c/c-typeck.cc (tagged_types_tu_compatible_p): Restore value
> of the cache after recursing into comptypes_internal.
> 
> gcc/testsuite/Changelog:
> * pr116726.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

2024-09-19 Thread Joseph Myers

On Thu, 19 Sep 2024, mmalcom...@nvidia.com wrote:

> 6) Anything special about floating point maths that I'm tripping up on?

Correct atomic operations with floating-point operands should ensure that 
exceptions raised exactly correspond to the operands for which the 
operation succeeded, and not to the operands for any previous attempts 
where the compare-exchange failed.  There is a lengthy note in the C 
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in 
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses 
appropriate code sequences to achieve this.  In GCC the implementation of 
this is in c-typeck.cc:build_atomic_assign, which in turn calls 
targetm.atomic_assign_expand_fenv (note that we have the complication for 
C of not introducing libm dependencies in code that only uses standard 
language features and not ,  or , so direct use 
of  functions is inappropriate here).

I would expect such built-in functions to follow the same semantics for 
floating-point exceptions as _Atomic compound assignment does.  (Note that 
_Atomic compound assignment is more general in the allowed operands, 
because compound assignment is a heterogeneous operation - for example, 
the special floating-point logic in build_atomic_assign includes the case 
where the LHS of the compound assignment is of atomic integer type but the 
RHS is of floating type.  However, built-in functions allow memory orders 
other than seq_cst to be used, whereas _Atomic compound assignment is 
limited to the seq_cst case.)

So it would seem appropriate for the implementation of such built-in 
functions to make use of targetm.atomic_assign_expand_fenv for 
floating-point environment handling, and for testcases to include tests 
analogous to c11-atomic-exec-5.c that exceptions are being handled 
correctly.

Cf. N2329 which suggested such operations for C in  (but 
tried to do too many things in one paper to be accepted into C); it didn't 
go into the floating-point exceptions semantics but simple correctness 
would indicate avoiding spurious exceptions from discarded computations.

-- 
Joseph S. Myers
josmy...@redhat.com

[PATCH] testsuite/116784 - match up SLP scan and vectorized scan

2024-09-19 Thread Richard Biener

The test used vect_perm_short for the vectorized scanning but
vect_perm3_short for whether that's done with SLP.  We're now
generally expecting SLP to be used - even as fallback, so the
following adjusts both to match up, fixing the powerpc64 reported
testsuite issue.

Tested on x86_64-unknown-linux-gnu, aarch64, riscv and powerpc64, pushed.

PR testsuite/116784
* gcc.dg/vect/slp-perm-9.c: Use vect_perm_short also for
the SLP check.
---
 gcc/testsuite/gcc.dg/vect/slp-perm-9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
index c9468d81a9d..0c3feabf190 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
@@ -58,5 +58,5 @@ int main (int argc, const char* argv[])
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
{ vect_perm_short || vect32 } || vect_load_lanes } } } } */
 /* We don't try permutes with a group size of 3 for variable-length
vectors.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { vect_perm3_short || { vect32 || vect_load_lanes } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm3_short || { vect32 || vect_load_lanes } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { vect_perm_short || { vect32 || vect_load_lanes } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm_short || { vect32 || vect_load_lanes } } } } } */
-- 
2.43.0

Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-09-19 Thread Marc Poulhiès

Marc Poulhiès  writes:

> Nicolas Boulenguez  writes:
>
>> PR ada/114065
>>
>> Hello.
>> Any news about these patches?
>
> Hello,
>
> Sorry about the delay. Arnaud already replied on BZ, but I'll add a few
> remarks.

Also, I forgot to mention that your changes don't include any changelog
(required for any change in GCC -- see
https://gcc.gnu.org/contribute.html for more details).

You can scaffold these sections by piping the diff in the
`contrib/mklog.py` script and then filling the gaps.

Marc

[PATCH v4] match: Fix A || B not optimized to true when !B implies A [PR114326]

2024-09-19 Thread Konstantinos Eleftheriou

From: kelefth 

In expressions like (a != b || ((a ^ b) & c) == d) and
(a != b || (a ^ b) == c), (a ^ b) is folded to false.
In the equivalent expressions (((a ^ b) & c) == d || a != b) and
((a ^ b) == c || a != b) this is not happening.

This patch adds the following simplifications in match.pd:
((a ^ b) & c) cmp d || a != b --> 0 cmp d || a != b
(a ^ b) cmp c || a != b --> 0 cmp c || a != b
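
The reasoning behind the first simplification can be checked by brute force.  The sketch below is an illustration only, not part of the patch: the function names are hypothetical, `cmp` is fixed to `==`, and the operands are restricted to 4-bit values to keep the search small.  When a != b both sides are true; when a == b the XOR is zero, so only `0 cmp d` remains.

```c
#include <stdint.h>

/* Left-hand side of the pattern, with cmp chosen as ==.  */
static int
before_fold (uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  return (((a ^ b) & c) == d) || (a != b);
}

/* Right-hand side: (a ^ b) & c has been replaced by 0.  */
static int
after_fold (uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  (void) c;  /* c no longer matters once the XOR is known to be zero.  */
  return (0u == d) || (a != b);
}

/* Count the inputs in [0, 16)^4 on which the two forms disagree.  */
static unsigned
count_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t a = 0; a < 16; a++)
    for (uint32_t b = 0; b < 16; b++)
      for (uint32_t c = 0; c < 16; c++)
        for (uint32_t d = 0; d < 16; d++)
          if (before_fold (a, b, c, d) != after_fold (a, b, c, d))
            mismatches++;
  return mismatches;
}
```

`count_mismatches ()` returns 0 for this range, which is consistent with the fold being valid regardless of operand order in the `||`.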

PR tree-optimization/114326

gcc/ChangeLog:

* match.pd: Add two patterns to fold a ^ b to 0, when a == b.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fold-xor-and-or.c: New test.
* gcc.dg/tree-ssa/fold-xor-or.c: New test.

Tested-by: Christoph Müllner 
Signed-off-by: Philipp Tomsich 
Signed-off-by: Konstantinos Eleftheriou 
---
 gcc/match.pd  | 32 ++-
 .../gcc.dg/tree-ssa/fold-xor-and-or.c | 55 +++
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c   | 55 +++
 3 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4aa610e2270..3c4f9b5f774 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3761,6 +3761,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (types_match (type, TREE_TYPE (@0)))
  (bit_xor @0 { build_one_cst (type); } ))
 
+/* ((a ^ b) & c) cmp d || a != b --> (0 cmp d || a != b). */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp:c
+   (bit_and:c
+ (bit_xor:c @0 @1)
+ tree_expr_nonzero_p@2)
+   @3)
+  (ne:c@4 @0 @1))
+(bit_ior
+  (cmp
+   { build_zero_cst (TREE_TYPE (@0)); }
+   @3)
+  @4)))
+
+/* (a ^ b) cmp c || a != b --> (0 cmp c || a != b). */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp:c
+   (bit_xor:c @0 @1)
+   @2)
+  (ne:c@3 @0 @1))
+(bit_ior
+  (cmp
+   { build_zero_cst (TREE_TYPE (@0)); }
+   @2)
+  @3)))
+
 /* We can't reassociate at all for saturating types.  */
 (if (!TYPE_SATURATING (type))
 
@@ -10763,4 +10793,4 @@ and,
 }
   }
   (if (full_perm_p)
-   (vec_perm (op@3 @0 @1) @3 @2))
+   (vec_perm (op@3 @0 @1) @3 @2))
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c 
b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
new file mode 100644
index 000..e5dc98e7541
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(int d1, int d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+int cmp3(int d1, int d2) {
+  if (10 > (0xabcd & (d2 ^ d1)) || d2 != d1)
+return 0;
+  return 1;
+}
+
+int cmp4(int d1, int d2) {
+  if (d2 != d1 || 10 > (0xabcd & (d2 ^ d1)))
+return 0;
+  return 1;
+}
+
+int cmp1_64(uint64_t d1, uint64_t d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2_64(uint64_t d1, uint64_t d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+int cmp3_64(uint64_t d1, uint64_t d2) {
+  if (10 > (0xabcd & (d2 ^ d1)) || d2 != d1)
+return 0;
+  return 1;
+}
+
+int cmp4_64(uint64_t d1, uint64_t d2) {
+  if (d2 != d1 || 10 > (0xabcd & (d2 ^ d1)))
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." 
"optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c 
b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c
new file mode 100644
index 000..c55cfbcc84c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(int d1, int d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || (d1 ^ d2) == 0xabcd)
+return 0;
+  return 1;
+}
+
+int cmp3(int d1, int d2) {
+  if (0xabcd > (d2 ^ d1) || d2 != d1)
+return 0;
+  return 1;
+}
+
+int cmp4(int d1, int d2) {
+  if (d2 != d1 || 0xabcd > (d2 ^ d1))
+return 0;
+  return 1;
+}
+
+int cmp1_64(uint64_t d1, uint64_t d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2_64(uint64_t d1, uint64_t d2) {
+  if (d1 != d2 || (d1 ^ d2) == 0xabcd)
+return 0;
+  return 1;
+}
+
+int cmp3_64(uint64_t d1, uint64_t d2) {
+  if (0xabcd > (d2 ^ d1) || d2 != d1)
+return 0;
+  return 1;
+}
+
+int cmp4_64(uint64_t d1, uint64_t d2) {
+  if (d2 != d1 || 0xabcd > (d2 ^ d1))
+return 0;
+  return 1;
+}
+
+/* The if should

[Fortran, Patch, PR101100, v1] Fix ICE when compiling with caf-lib and using proc_pointer component.

2024-09-19 Thread Andre Vehreschild

Hi all,

the attached patch fixes an ICE when compiling with -fcoarray=lib and using
(proc_-)pointer component in a coarray. The code was looking at the wrong
location for the caf-token.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 5115201ea3eb9caf673adce89c49e953cb46c375 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 18 Sep 2024 15:55:28 +0200
Subject: [PATCH] Fortran: Allow to nullify caf token when not in ultimate
 component. [PR101100]

gcc/fortran/ChangeLog:

	PR fortran/101100

	* trans-expr.cc (trans_caf_token_assign): Take caf-token from
	decl for non ultimate coarray components.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/proc_pointer_assign_1.f90: New test.
---
 gcc/fortran/trans-expr.cc |  8 -
 .../coarray/proc_pointer_assign_1.f90 | 29 +++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 54901c33139..18ef5e246ce 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -10371,7 +10371,13 @@ trans_caf_token_assign (gfc_se *lse, gfc_se *rse, gfc_expr *expr1,
   else if (lhs_attr.codimension)
 {
   lhs_tok = gfc_get_ultimate_alloc_ptr_comps_caf_token (lse, expr1);
-  lhs_tok = build_fold_indirect_ref (lhs_tok);
+  if (!lhs_tok)
+	{
+	  lhs_tok = gfc_get_tree_for_caf_expr (expr1);
+	  lhs_tok = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (lhs_tok));
+	}
+  else
+	lhs_tok = build_fold_indirect_ref (lhs_tok);
   tmp = build2_loc (input_location, MODIFY_EXPR, void_type_node,
 			lhs_tok, null_pointer_node);
   gfc_prepend_expr_to_block (&lse->post, tmp);
diff --git a/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90 b/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90
new file mode 100644
index 000..81f0c3b19cf
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/proc_pointer_assign_1.f90
@@ -0,0 +1,29 @@
+!{ dg-do run }
+
+! Check that PR101100 is fixed.
+
+! Contributed by G. Steinmetz  
+
+program p
+  type t
+procedure(), pointer, nopass :: f
+  end type
+
+  integer :: i = 0
+  type(t) :: x[*]
+
+  x%f => null()
+  if ( associated(x%f) ) stop 1
+
+  x%f => g
+  if (.not. associated(x%f) ) stop 2
+
+  call x%f()
+  if ( i /= 1 ) stop 3
+
+contains
+  subroutine g()
+i = 1
+  end subroutine
+end
+
--
2.46.0

[PATCH 1/8] [RFC] Define new floating point builtin fetch_add functions

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

This commit just defines the new names; it does not implement them yet.
Saving this commit because this is one decision, and recording
what the decision was and why:

Adding new floating point builtins for each floating point type that
is defined in the general code *except* f128x (which would have a size
greater than 16 bytes -- the largest integral atomic operation we
currently support).

We have to base our naming on floating point *types* rather than sizes
since different types can have the same size and the operations need
to be distinguished based on type.  N.b. one could make size-suffixed
builtins that are still overloaded based on types but I thought that
this was the cleaner approach.
(Actual requirement is distinction based on mode, this is how I choose
which internal function to use in a later patch.  I believe that
defining the function in terms of types and internally mapping to modes
is a sensible split between user interface and internal implementation).

N.b. in order to choose whether these operations are available or not
in something like libstdc++ I use something like
`__has_builtin(__atomic_fetch_add_fp)`.  This happens to be the
builtin for implementing the relevant operation on doubles, but it
also seems like a nice name to check.
- This would require that all compiler implementations have floating
  point atomics for all floating point types they support available at
  the same time.  I don't expect this is much of a problem but invite
  dissent.
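
A minimal sketch of how such a feature check might look in user code follows.  The wrapper name `fetch_add_double` and the macro `HAVE_ATOMIC_FETCH_ADD_FP` are hypothetical, and the call signature of `__atomic_fetch_add_fp` (pointer, value, memory order) is an assumption taken from the function types added in this RFC; compilers without the builtin take the C11 compare-exchange fallback.

```c
#include <stdatomic.h>

/* Detect the builtin proposed in this RFC, if the compiler knows it.  */
#if defined (__has_builtin)
#  if __has_builtin (__atomic_fetch_add_fp)
#    define HAVE_ATOMIC_FETCH_ADD_FP 1
#  endif
#endif

/* Atomically add VAL to *OBJ and return the value held before the add.  */
static double
fetch_add_double (_Atomic double *obj, double val)
{
#ifdef HAVE_ATOMIC_FETCH_ADD_FP
  /* Assumed signature: (volatile void *, double, int memorder).  */
  return __atomic_fetch_add_fp (obj, val, __ATOMIC_SEQ_CST);
#else
  /* Fallback: retry until no other thread changed *OBJ in between.
     On failure OLD is refreshed with the current value.  */
  double old = atomic_load (obj);
  while (!atomic_compare_exchange_weak (obj, &old, old + val))
    ;
  return old;
#endif
}
```

Checking one resolved builtin and assuming the rest exist keeps the test short, at the cost of the coupling between types that the paragraph above describes.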

N.b. I used the below type suffixes (following what seems like the
existing convention for builtins):
  - float  -> f
  - double -> (no suffix)
  - long double -> l
  - _FloatN -> fN   (for N <- (16, 32, 64, 128))
  - _FloatNx -> fNx (for N <- (32, 64))

Richi suggested doing this expansion generally for all these builtins
following Cxy _Atomic semantics on IRC.
Since C hasn't specified any fetch_ semantics for floating point
types, C++ has only specified `atomic<>::fetch_{add,sub}`, and the
operations other than these are all bitwise operations (which don't
map well to floating point), I believe I have followed that
suggestion by implementing all fetch_{sub,add}/{add,sub}_fetch
operations.

I have not implemented anything for the __sync_* builtins on the
belief that these are legacy and new code should use the __atomic_*
builtins.  Happy to adjust if that is a bad choice.

Only the new function types were needed for most cases.
The Fortran frontend does not use `builtin-types.def` so it needed the
Fortran `types.def` to be updated to include the floating point built-in
types in `enum builtin_type` local to `gfc_init_builtin_functions`.
- N.b. these types are already available in the fortran
  frontend (being defined by `build_common_tree_nodes`), it's just
  that they were not available for sync-builtins.def functions until
  this commit.

--
N.b. for this RFC I've not checked that any other frontends can access
these builtins.  Even the fortran frontend I've only adjusted things to
ensure stuff builds.

Signed-off-by: Matthew Malcomson 
---
 gcc/builtin-types.def | 20 ++
 gcc/fortran/types.def | 48 +++
 gcc/sync-builtins.def | 40 
 3 files changed, 108 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..97ccd945b55 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -802,6 +802,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, 
BT_VOLATILE_PTR, BT_I2, BT
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, 
BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, 
BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, 
BT_I16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT_VPTR_FLOAT_INT, BT_FLOAT, BT_VOLATILE_PTR,
+BT_FLOAT, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_DOUBLE_VPTR_DOUBLE_INT, BT_DOUBLE, BT_VOLATILE_PTR,
+BT_DOUBLE, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_LONGDOUBLE_VPTR_LONGDOUBLE_INT, BT_LONGDOUBLE,
+BT_VOLATILE_PTR, BT_LONGDOUBLE, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_BFLOAT16_VPTR_BFLOAT16_INT, BT_BFLOAT16, 
BT_VOLATILE_PTR,
+BT_BFLOAT16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT16_VPTR_FLOAT16_INT, BT_FLOAT16, 
BT_VOLATILE_PTR,
+BT_FLOAT16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32_VPTR_FLOAT32_INT, BT_FLOAT32, 
BT_VOLATILE_PTR,
+BT_FLOAT32, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT64_VPTR_FLOAT64_INT, BT_FLOAT64, 
BT_VOLATILE_PTR,
+BT_FLOAT64, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT128_VPTR_FLOAT128_INT, BT_FLOAT128, 
BT_VOLATILE_PTR,
+BT_FLOAT128, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_FLOAT32X_VPTR_FLOAT32X_INT, BT_FLOAT32X, 
BT_VOLATILE_PTR,
+BT_FLOAT32X, BT_INT)
+DEF_FUNCTION

[PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Hello, this is an RFC for adding an atomic floating point fetch_add builtin
(and variants) to GCC.  The atomic fetch_add operation is defined to work
on the base floating point types in the C++20 standard chapter 31.7.3, and
extended to work for all cv-unqualified floating point types in C++23
chapter 33.5.7.4.

Honestly not sure who to Cc, please do point me to someone else if that's
better.

This is nowhere near complete (for one thing even the tests I've added
don't fully pass), but I think I have a complete enough idea that it's
worth checking if this is something that could be agreed on.

As it stands no target except the nvptx backend would natively support
these operations.

Main questions that I'm looking to resolve with this RFC:
1) Would GCC be OK accepting this implementation even though no backend
   would be implementing these yet?
   - AIUI only the nvptx backend could theoretically implement this.
   - Even without a backend implementing it natively, the ability to use
 this in code (especially libstdc++) enables other compilers to
     generate better code for GPUs using standard C++.
2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
   (i.e. a check on the resolved builtin rather than the more user-facing
   one) in order to determine whether floating point atomic fetch_add is
   available.
   - N.b. this builtin is actually the builtin working on the "double"
 type, one would have to rely on any compilers implementing that
 particular resolved builtin to also implement the other floating point
 atomic fetch_add builtins that they would want to support in libstdc++
 `atomic<[floating_point_type]>::fetch_add`.

More specific questions about the choice of which builtins to implement and
whether the types are OK:
1) Is it OK to not implement the `__sync_*` versions?
   Since these are deprecated and the `__atomic_*` versions are there to
   match the C/C++ code atomic operations (which is a large part of the
   reason for the new floating point operations).
2) Is it OK to not implement all the `fetch_*` operations?
   None of the bitwise operations are specified for C++ and bitwise
   operations are (AIUI) rarely used on floating point values.
3) Wanting to be able to farm out to libatomic meant that we need constant names
   for the specialised functions.
   - This led to the naming convention based on floating point type.
   - That naming convention *could* be updated to include the special backend
 floating point types if needed.  I have not done this mostly because I
 thought it would not add much, though I have not looked into this very
 closely.
4) Wanting to name the functions based on floating point type rather than size
   meant that the mapping from type passed to the overloaded version to
   specialised builtin was less direct than for the integral versions.
   - Ended up with a hard-coded table in the source to check this.
   - Would very much like some better approach, not certain what better approach
 I could find.
   - Will eventually at least share the hard-coded tables (which isn't
 happening yet because this is at RFC level).
5) Are there any other types that I should use?
   Similarly are there any types that I'm trying to use that I shouldn't?
   I *believe* I've implemented all the types that make sense and are
   general builtin types.  Could easily have missed some (e.g. left
   `_Float128x` alone because AIUI it's larger than 128bits which means we
   don't have any other atomic operations on such data), could also easily
   be misunderstanding the mention in the C++ standards of "extended" types
   (which I believe is the `_Float*` and `bfloat16` types).
6) Anything special about floating point maths that I'm tripping up on?
   (Especially around CAS loop where we expand the operation directly,
   sometimes convert a PLUS into a MINUS of a negative value ...).
   Don't know of anything myself, but also a bit wary of floating point
   maths.

N.b. I know that there's a reasonable amount of work left in:
1) Ensuring that all the places which use atomic types are updated
   (e.g. asan), 
2) Ensuring that all frontends can use these to the level that they could
   use the integral atomics.
3) Ensuring the major backends can still compile libatomic (I had to do
   something in AArch64 libatomic backend to make it compile, seems like
   there might be more to do for others).


Matthew Malcomson (8):
  Define new floating point builtin fetch_add functions
  Add FP types for atomic builtin overload resolution
  Tie the new atomic builtins to the backend
  Have libatomic working as first draft
  Use new builtins in libstdc++
  First attempt at testsuite
  Mention floating point atomic fetch_add etc in docs
  Add demo implementation of one of the operations

 gcc/builtin-types.def |   20 +
 gcc/builtins.cc   |  153 ++-
 gcc/c-family/c-c

[PATCH 7/8] [RFC] Mention floating point atomic fetch_add etc in docs

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Signed-off-by: Matthew Malcomson 
---
 gcc/doc/extend.texi | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 66c99ef7a66..a3e3e7da5d6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13501,6 +13501,18 @@ the same format with the addition of a @samp{size_t} 
parameter inserted
 as the first parameter indicating the size of the object being pointed to.
 All objects must be the same size.
 
+Moreover, the @samp{__atomic_fetch_add}, @samp{__atomic_fetch_sub},
+@samp{__atomic_add_fetch} and @samp{__atomic_sub_fetch} builtins can all
+accept floating point types of @code{float}, @code{double}, @code{long double},
+@code{bfloat16}, @code{_Float16}, @code{_Float32}, @code{_Float64},
+@code{_Float128}, @code{_Float32x} and @code{_Float64x}.  These use a lock-free
+built-in function if the size of the floating point type makes that possible
+and otherwise leave an external call to be resolved at run time.  This external
+call is of the same format but specialised to the given floating point type.
+The specialised versions of these functions are denoted by one of the
+suffixes @code{_fpf}, @code{_fp}, @code{_fpl}, @code{_fpf16b}, @code{_fpf16},
+@code{_fpf32}, @code{_fpf64}, @code{_fpf128}, @code{_fpf32x}, @code{_fpf64x}.
+
 There are 6 different memory orders that can be specified.  These map
 to the C++11 memory orders with the same names, see the C++11 standard
 or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
-- 
2.43.0

[PATCH 3/8] [RFC] Tie the new atomic builtins to the backend

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Need to implement something in the

Things implemented in this patch:
1) Update the optabs definitions to include floating point versions of
   atomic fetch_add variants.
2) When expanding into a CAS loop in RTL because the floating point
   optab is not implemented, there are now two different modes.  One is
   the integral mode in which the atomic CAS (and load) should be
   performed, and one is the floating point mode in which the operation
   should be performed.
   - Extra handling of modes etc in `expand_atomic_fetch_op`.

Things to highlight to any reviewer:
1) Needed another mapping from builtin to mode.
   This is *almost* shared code between this and the frontend.  Looks
   like this could be shared if I put some effort into it.
2) I do not always expand into the modify before version, but also use
   the modify after version when unable to inline.
   - From looking at the dates on different parts of the code, it seems
 that this used to be needed before libatomic was added as a target
 library.  Since libatomic currently implements both the fetch_
 and _fetch versions I don't believe it's needed any more.
3) I `extract_bit_field` to convert between representations when
   expanding as a fallback (because fallback CAS loop loads in integral
   register and we want to reinterpret that as a floating point type
   before the intermediate operation).
   - Not sure if there's a better way I don't know about.
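
A user-level sketch of the fallback described above may help reviewers picture the two modes.  This is not the RTL expansion from the patch, only an illustration under stated assumptions: `fetch_add_float_via_cas` is a hypothetical name, the cell is modelled as a `std::atomic<std::uint32_t>`, and `memcpy` stands in for the bit reinterpretation that the patch does with `extract_bit_field`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper (not part of the patch): atomically add VAL to the
// float whose bits live in BITS and return the previous value.
inline float fetch_add_float_via_cas(std::atomic<std::uint32_t>& bits, float val)
{
  // The load and the CAS both happen in the integral "mode".
  std::uint32_t old_bits = bits.load(std::memory_order_relaxed);
  for (;;)
    {
      float old_val;
      std::memcpy(&old_val, &old_bits, sizeof old_val);   // integral -> FP view
      float new_val = old_val + val;                      // operation in FP "mode"
      std::uint32_t new_bits;
      std::memcpy(&new_bits, &new_val, sizeof new_bits);  // FP -> integral view
      if (bits.compare_exchange_weak(old_bits, new_bits,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed))
        return old_val;
      // On failure old_bits now holds the current contents, so just retry.
    }
}
```

Returning `new_val` instead of `old_val` would give the add_fetch flavour, which is the only difference between the "modify before" and "modify after" variants at this level.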

Other than that everything seems mostly straight-forwardly following
what is already done.

Signed-off-by: Matthew Malcomson 
---
 gcc/builtins.cc | 153 +---
 gcc/optabs.cc   | 101 
 gcc/optabs.def  |   6 +-
 gcc/optabs.h|   2 +-
 4 files changed, 241 insertions(+), 21 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..0ffd7d0b211 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6394,6 +6394,46 @@ get_builtin_sync_mode (int fcode_diff)
   return int_mode_for_size (BITS_PER_UNIT << fcode_diff, 0).require ();
 }
 
+/* Reconstitute the machine modes relevant for this builtin operation from the
+   builtin difference from the _N version of a fetch_add atomic.
+
+   Only works for floating point atomic builtins.
+   FCODE_DIFF should be fcode - base, where base is the FOO_N code for the
+   group of builtins.  N.b. this is a different base to that used by
+   `get_builtin_sync_mode` because that matches the builtin enum offset used in
+   c-common.cc to find the builtin enum from a given MODE.
+
+   TODO Really do need to figure out a bit neater code here.  Should not be
+   inlining the mapping from type to offset in two different places.  */
+static inline machine_mode
+get_builtin_fp_sync_mode (int fcode_diff, machine_mode *mode)
+{
+  struct type_to_offset { tree type; size_t offset; };
+  static const struct type_to_offset fp_type_mappings[] = {
+{ float_type_node, 6 },
+{ double_type_node, 7 },
+{ long_double_type_node, 8 },
+{ bfloat16_type_node ? bfloat16_type_node : error_mark_node, 9 },
+{ float16_type_node ? float16_type_node : error_mark_node, 10 },
+{ float32_type_node ? float32_type_node : error_mark_node, 11 },
+{ float64_type_node ? float64_type_node : error_mark_node, 12 },
+{ float128_type_node ? float128_type_node : error_mark_node, 13 },
+{ float32x_type_node ? float32x_type_node : error_mark_node, 14 },
+{ float64x_type_node ? float64x_type_node : error_mark_node, 15 }
+  };
+ gcc_assert (fcode_diff <= 15 && fcode_diff >= 6);
+ for (size_t i = 0; i < sizeof(fp_type_mappings)/sizeof(fp_type_mappings[0]); i++)
+   {
+ if ((size_t)fcode_diff == fp_type_mappings[i].offset)
+   {
+*mode = TYPE_MODE (fp_type_mappings[i].type);
+return int_mode_for_size (GET_MODE_SIZE (*mode) * BITS_PER_UNIT, 0)
+  .require ();
+   }
+  } 
+ gcc_unreachable ();
+}
+
 /* Expand the memory expression LOC and return the appropriate memory operand
for the builtin_sync operations.  */
 
@@ -6886,9 +6926,10 @@ expand_builtin_atomic_store (machine_mode mode, tree exp)
resolved to an instruction sequence.  */
 
 static rtx
-expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target,
+expand_builtin_atomic_fetch_op (machine_mode expand_mode, tree exp, rtx target,
enum rtx_code code, bool fetch_after,
-   bool ignore, enum built_in_function ext_call)
+   bool ignore, enum built_in_function ext_call,
+   machine_mode load_mode = VOIDmode)
 {
   rtx val, mem, ret;
   enum memmodel model;
@@ -6898,13 +6939,13 @@ expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target,
   model = get_memmodel (CALL_EXPR_ARG (exp, 2));
 
   /* Expand the operands.  */
-  mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
-  val = expand_expr_force_mod

[PATCH 8/8] [RFC] Add demo implementation of one of the operations

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Do demo implementation in AArch64 since that's the backend I'm most
familiar with.

Nothing much else to say -- nice to see that the demo implementation
seems to work as expected (being used for fetch_add, add_fetch and
sub_fetch even though it's only defined for fetch_sub).

Demo implementation ensures that I can run some execution tests.

Demo is added behind a flag in order to be able to run the testsuite
with different variants (with the flag and without).
Ensuring that the functionality worked for both the fallback and when
this optab was implemented (also check with the two different fallbacks
of either using libatomic or inlining a CAS loop).

In order to run with both this and the fallback implementation I use the
following flag in RUNTESTFLAGS:
--target_board='unix {unix/-mtesting-fp-atomics}'

Signed-off-by: Matthew Malcomson 
---
 gcc/config/aarch64/aarch64.h   |  2 ++
 gcc/config/aarch64/aarch64.opt |  5 +
 gcc/config/aarch64/atomics.md  | 15 +++
 3 files changed, 22 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index fac1882bcb3..c2f37545cd7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -119,6 +119,8 @@
of LSE instructions.  */
 #define TARGET_OUTLINE_ATOMICS (aarch64_flag_outline_atomics)
 
+#define TARGET_TESTING_FP_ATOMICS (aarch64_flag_testing_fp_atomics)
+
 /* Align definitions of arrays, unions and structures so that
initializations and copies can be made more efficient.  This is not
ABI-changing, so it only affects places where we can see the
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6356c419399..ed031258575 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -332,6 +332,11 @@ moutline-atomics
 Target Var(aarch64_flag_outline_atomics) Init(2) Save
 Generate local calls to out-of-line atomic operations.
 
+mtesting-fp-atomics
+Target Var(aarch64_flag_testing_fp_atomics) Init(0) Save
+Use the demonstration implementation of atomic_fetch_sub_ for floating
+point modes.
+
 -param=aarch64-vect-compare-costs=
 Target Joined UInteger Var(aarch64_vect_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing, consider using multiple different approaches and use
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 32a0a723732..ee8fbcd6c58 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -368,6 +368,21 @@
 ;; However we also implement the acquire memory barrier with DMB LD,
 ;; and so the ST is not blocked by the barrier.
 
+(define_insn "atomic_fetch_sub"
+  [(set (match_operand:GPF 0 "register_operand" "=&w")
+(match_operand:GPF 1 "aarch64_sync_memory_operand" "+Q"))
+(set (match_dup 1)
+(unspec_volatile:GPF
+[(minus:GPF (match_dup 1)
+   (match_operand:GPF 2 "register_operand" "w"))
+ (match_operand:SI 3 "const_int_operand")]
+ UNSPECV_ATOMIC_LDOP_PLUS))
+(clobber (match_scratch:GPF 4 "=w"))]
+"TARGET_TESTING_FP_ATOMICS"
+"// Here's your sandwich.\;ldr %0, %1\;fsub %4, %0, %2\;str %4, %1\;// END"
+)
+
+
 (define_insn "aarch64_atomic__lse"
   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
(unspec_volatile:ALLI
-- 
2.43.0

[PATCH 2/8] [RFC] Add FP types for atomic builtin overload resolution

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Have a bit of an ugly mapping from floating point type to the builtin
using that type.  Would like to find some code-sharing between this, the
function (in a later patch in this series) that finds the relevant mode
from a given builtin, and the general sync-builtins.def file.
As yet don't have a nice way to do that, but haven't looked that hard.

Other than that, seems we can cleanly emit the functions that we need.

N.b. we match which function to use based on the MODE of the type for
two reasons:
1) Can't match directly on type as otherwise `typedef float x` would
   mean that `x` could no longer be used with that intrinsic.
2) MODE (i.e. the type's ABI) is the thing that we need to distinguish
   between when deciding which fundamental operation needs to be
   applied.
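
To make the mode-based matching concrete, here is a toy model of the lookup.  It is only a sketch under stated assumptions: `ToyMode`, `ToyType` and `fp_specialisation_offset_for` are hypothetical stand-ins for machine modes, tree types and the loop in `sync_resolve_size`, and the toy target is assumed to give `_Float32` the same mode as `float`.  Only the offsets 6, 7 and 8 are taken from the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a machine mode (not GCC's machine_mode).
enum class ToyMode { SF, DF, TF, None };

// Hypothetical stand-in for a type: a spelling plus the mode it is laid out in.
struct ToyType { std::string name; ToyMode mode; };

struct ModeToOffset { ToyMode mode; std::size_t offset; };

// Return the builtin offset for TYPE, or 0 when no floating point
// specialisation applies (0 is also the "not found" value in the patch).
inline std::size_t fp_specialisation_offset_for(const ToyType& type)
{
  static const ModeToOffset mappings[] = {
    { ToyMode::SF, 6 },  // float
    { ToyMode::DF, 7 },  // double
    { ToyMode::TF, 8 },  // long double on this toy target
  };
  for (const ModeToOffset& m : mappings)
    if (m.mode == type.mode)  // compare modes, never type identity
      return m.offset;        // first match wins
  return 0;
}
```

With this shape a typedef of `float` resolves to offset 6 because its mode matches, and so does any other type that shares that mode, since the first entry with a matching mode wins.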

Signed-off-by: Matthew Malcomson 
---
 gcc/c-family/c-common.cc | 88 
 1 file changed, 70 insertions(+), 18 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e7e371fd26f..c0a2b136d67 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -7360,13 +7360,15 @@ speculation_safe_value_resolve_return (tree first_param, tree result)
 
 static int
 sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch,
-  bool orig_format)
+  bool orig_format,
+   int *fp_specialisation_offset)
 {
   /* Type of the argument.  */
   tree argtype;
   /* Type the argument points to.  */
   tree type;
   int size;
+  bool valid_float = false;
 
   if (vec_safe_is_empty (params))
 {
@@ -7385,7 +7387,8 @@ sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch,
 goto incompatible;
 
   type = TREE_TYPE (type);
-  if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type))
+  valid_float = fp_specialisation_offset && fetch && SCALAR_FLOAT_TYPE_P (type);
+  if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type) && !valid_float)
 goto incompatible;
 
   if (!COMPLETE_TYPE_P (type))
@@ -7402,6 +7405,40 @@ sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch,
   && !targetm.scalar_mode_supported_p (TImode))
 return -1;
 
+  if (valid_float)
+{
+  tree fp_type = type;
+  /* TODO Want a better reverse-mapping between an argument type and
+ the builtin enum.  */
+  struct type_to_offset { tree type; size_t offset; };
+  static const struct type_to_offset fp_type_mappings[] = {
+{ float_type_node, 6 },
+{ double_type_node, 7 },
+{ long_double_type_node, 8 },
+{ bfloat16_type_node ? bfloat16_type_node : error_mark_node, 9 },
+{ float16_type_node ? float16_type_node : error_mark_node, 10 },
+{ float32_type_node ? float32_type_node : error_mark_node, 11 },
+{ float64_type_node ? float64_type_node : error_mark_node, 12 },
+{ float128_type_node ? float128_type_node : error_mark_node, 13 },
+{ float32x_type_node ? float32x_type_node : error_mark_node, 14 },
+{ float64x_type_node ? float64x_type_node : error_mark_node, 15 }
+  };
+  size_t offset = 0;
+  for (size_t i = 0;
+   i < sizeof(fp_type_mappings)/sizeof(fp_type_mappings[0]);
+   ++i) {
+if (TYPE_MODE (fp_type) == TYPE_MODE (fp_type_mappings[i].type))
+  {
+offset = fp_type_mappings[i].offset;
+break;
+  }
+  }
+  if (offset == 0)
+goto incompatible;
+  *fp_specialisation_offset = offset;
+  return -1;
+}
+
   if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16)
 return size;
 
@@ -7462,9 +7499,10 @@ sync_resolve_params (location_t loc, tree orig_function, tree function,
 arguments (e.g. EXPECTED argument of __atomic_compare_exchange_n),
 bool arguments (e.g. WEAK argument) or signed int arguments (memmodel
 kinds).  */
-  if (TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type))
+  if ((TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type))
+  || SCALAR_FLOAT_TYPE_P (arg_type))
{
- /* Ideally for the first conversion we'd use convert_for_assignment
+ /* Ideally for the first conversion we'd use convert_for_assignment
 so that we get warnings for anything that doesn't match the pointer
 type.  This isn't portable across the C and C++ front ends atm.  */
  val = (*params)[parmnum];
@@ -8256,7 +8294,6 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
 NULL_TREE);
 }
 
-
 /* Some builtin functions are placeholders for other expressions.  This
function should be called immediately after parsing the call expression
before surrounding code has committed to the type of the expression.
@@ -8277,6 +8314,9 @@ resolve_overloaded_builtin (location_t loc, tree function,
  and so must be rejected.  */
   bool fetch_op = true;
   bool orig_format = true;
+  /* Is this fun

[PATCH 5/8] [RFC] Use new builtins in libstdc++

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

Points to question here are:
1) Whether checking for this particular internal builtin is OK (this one
   happens to be the one implementing the operation for a `double`, we
   would have to rely on the approach that if anyone implements this
   operation for a `double` they implement it for all the floating point
   types that their C++ frontend and libstdc++ handle).
2) Whether the `#if` bit should be somewhere else instead of put in the
   `__fetch_add_flt` function.  I put it there because that's where it
   seemed natural, but am not familiar enough with libstdc++ to be
   confident in that decision.

We still need the CAS loop fallback for any compiler that doesn't
implement this builtin, and hence will still need some extra choice to
be made for floating point types.  Once all compilers we care about
implement this we can remove this special handling and merge the
floating point and integral operations into the same template.
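
The shape of that choice can be sketched outside libstdc++ as follows.  This is an illustration only: `fetch_add_flt_sketch` is a hypothetical name, `__atomic_fetch_add_fp` is the builtin proposed by this series (so the first branch is only compiled by a compiler that has it), and the fallback uses the existing generic `__atomic_load` and `__atomic_compare_exchange` builtins rather than the `__atomic_impl` helpers in the real header.

```cpp
#include <atomic>

#ifndef __has_builtin
# define __has_builtin(x) 0  // compilers without __has_builtin take the fallback
#endif

// Hypothetical free-standing version of the pattern in __fetch_add_flt.
template<typename T>
T fetch_add_flt_sketch(T* ptr, T inc, std::memory_order m) noexcept
{
#if __has_builtin(__atomic_fetch_add_fp)
  // Compiler provides the floating point overloads proposed in this series.
  return __atomic_fetch_add(ptr, inc, int(m));
#else
  // CAS loop fallback for every other compiler.
  T oldval;
  __atomic_load(ptr, &oldval, __ATOMIC_RELAXED);
  T newval = oldval + inc;
  while (!__atomic_compare_exchange(ptr, &oldval, &newval, /*weak=*/true,
                                    int(m), __ATOMIC_RELAXED))
    newval = oldval + inc;  // oldval was refreshed by the failed exchange
  return oldval;
#endif
}
```

Probing one builtin and assuming the rest follow is what question 1 above is about; the sketch inherits that assumption rather than resolving it.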

Signed-off-by: Matthew Malcomson 
---
 libstdc++-v3/include/bits/atomic_base.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 1c2367b39b6..d3b1a022db2 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1217,30 +1217,41 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Tp
   __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) noexcept
   {
+#if __has_builtin(__atomic_fetch_add_fp)
+   return __atomic_fetch_add(__ptr, __i, int(__m));
+#else
_Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
_Val<_Tp> __newval = __oldval + __i;
while (!compare_exchange_weak(__ptr, __oldval, __newval, __m,
  memory_order_relaxed))
  __newval = __oldval + __i;
return __oldval;
+#endif
   }
 
 template<typename _Tp>
   _Tp
   __fetch_sub_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) noexcept
   {
+#if __has_builtin(__atomic_fetch_sub)
+   return __atomic_fetch_sub(__ptr, __i, int(__m));
+#else
_Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
_Val<_Tp> __newval = __oldval - __i;
while (!compare_exchange_weak(__ptr, __oldval, __newval, __m,
  memory_order_relaxed))
  __newval = __oldval - __i;
return __oldval;
+#endif
   }
 
 template<typename _Tp>
   _Tp
   __add_fetch_flt(_Tp* __ptr, _Val<_Tp> __i) noexcept
   {
+#if __has_builtin(__atomic_add_fetch)
+   return __atomic_add_fetch(__ptr, __i, __ATOMIC_SEQ_CST);
+#else
_Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
_Val<_Tp> __newval = __oldval + __i;
while (!compare_exchange_weak(__ptr, __oldval, __newval,
@@ -1248,12 +1259,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  memory_order_relaxed))
  __newval = __oldval + __i;
return __newval;
+#endif
   }
 
 template<typename _Tp>
   _Tp
   __sub_fetch_flt(_Tp* __ptr, _Val<_Tp> __i) noexcept
   {
+#if __has_builtin(__atomic_sub_fetch)
+  return __atomic_sub_fetch(__ptr, __i, __ATOMIC_SEQ_CST);
+#else
_Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
_Val<_Tp> __newval = __oldval - __i;
while (!compare_exchange_weak(__ptr, __oldval, __newval,
@@ -1261,6 +1276,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  memory_order_relaxed))
  __newval = __oldval - __i;
return __newval;
+#endif
   }
   } // namespace __atomic_impl
 
-- 
2.43.0

[PATCH 4/8] [RFC] Have libatomic working as first draft

2024-09-19 Thread mmalcomson

From: Matthew Malcomson 

As it stands there are still a few things to look at whether they could
be improved:
1) Need to find the exact version of automake to use.  I'm using
   automake 1.15.1 from https://ftp.gnu.org/gnu/automake/ but the
   header is claiming I'm using automake 1.15.
2) The internal naming is all a little "not right" for floating
   point.  E.g. the SIZE() macro is no longer adding a SIZE integer
   suffix to something but instead adding a suffix representing a type.
   Not sure whether the churn to fix this is worth it -- will ask
   upstream.
3) Have not implemented the word-size compare and swap loop fallback.
   This is because the implementation uses a mask and the mask is not
   always the same for any given architecture.  Hence the existing
   approach in code would not work for all floating point types.
   - I would appreciate some feedback about whether this is OK to not
 implement.  Seems reasonable to me.
4) In the existing test for the availability of an atomic fetch
   operation there are two things that I do not know why they are needed
   and hence didn't add them to the check for atomic floating point
   fetch_{add,sub}.
   I just wanted to highlight this in case I missed something.
   1) I only put the `x` register into a register with an `asm` call.
  To be honest I don't know why anything need be put into a
  register, but I didn't put the floating point value into a
  register because I didn't know of a standard GCC floating point
  register constraint that worked across all architectures.
  - Is there any need for this `asm` line (I copied from existing
libatomic configure code without understanding).
  - Is there any need for the constant addition to be applied?
   2) I used a cast of a 1.0 floating point literal as the "addition"
  for all floating point types in the configury check.
  - Is there something subtle I'm missing about this?
(I *think* it's fine, but felt like this seemed to be a place
where I could trip up without knowing).

Description of things done in this commit:

We implement the new floating point builtins around fetch_add.  This is
mostly a configure/makefile change.  The main overview of the changes is
that we create a new list of suffixes (_fpf, _fp, _fpl, _fp16b, _fp16,
_fp32, _fp64, _fp128, _fp32x, _fp64x) and re-compile fadd_n.c and
fsub_n.c for these suffixes.
The existing machinery for checking whether a given atomic builtin is
implemented is extended to check for these same suffixes on the atomic
builtins.  The existing machinery for generating atomic fetch_
implementations using a given suffix and general patterns is also
re-used (n.b. with the exception that the implementation based on
a compare and exchange of a word is not implemented because the
pre-processor does not know the size of the floating point types).
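
As a rough picture of "one source file compiled once per suffix", the following sketch stamps out one function per (suffix, type) pair from a single macro body.  It is a toy under stated assumptions: the `sketch::` names and `SKETCH_DEFINE_FETCH_OP` are hypothetical, only three of the suffixes are shown, and a single spin lock stands in for whichever strategy the real build selects for a given type.

```cpp
#include <atomic>

namespace sketch {

// One process-wide lock, standing in for the real fallback machinery.
inline std::atomic<bool>& fallback_lock()
{
  static std::atomic<bool> locked{false};
  return locked;
}

// Generate NAME##SUFFIX operating on TYPE: take the lock, apply OP,
// release the lock and return the value seen before the update.
#define SKETCH_DEFINE_FETCH_OP(NAME, SUFFIX, TYPE, OP)                  \
  inline TYPE NAME##SUFFIX(TYPE* ptr, TYPE val)                         \
  {                                                                     \
    std::atomic<bool>& lock = fallback_lock();                          \
    while (lock.exchange(true, std::memory_order_acquire)) { }          \
    TYPE old = *ptr;                                                    \
    *ptr = old OP val;                                                  \
    lock.store(false, std::memory_order_release);                       \
    return old;                                                         \
  }

// The same body instantiated for several suffixes, mirroring how
// fadd_n.c and fsub_n.c are re-compiled per suffix.
SKETCH_DEFINE_FETCH_OP(fetch_add, _fpf, float, +)
SKETCH_DEFINE_FETCH_OP(fetch_add, _fp, double, +)
SKETCH_DEFINE_FETCH_OP(fetch_add, _fpl, long double, +)
SKETCH_DEFINE_FETCH_OP(fetch_sub, _fpf, float, -)
SKETCH_DEFINE_FETCH_OP(fetch_sub, _fp, double, -)
SKETCH_DEFINE_FETCH_OP(fetch_sub, _fpl, long double, -)

#undef SKETCH_DEFINE_FETCH_OP

} // namespace sketch
```

The suffix here names a type rather than a byte count, which is the naming mismatch raised in point 2 above.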

The AArch64 backend is updated slightly.  It didn't build because it
assumed there was some IFUNC for all operations implemented (and didn't
have any IFUNC for the new floating point operations).

The new functions are advertised as LIBATOMIC_1.3 in the linker map for
the dynamic library.

Signed-off-by: Matthew Malcomson 
---
 libatomic/Makefile.am|6 +-
 libatomic/Makefile.in|   12 +-
 libatomic/acinclude.m4   |   49 +
 libatomic/auto-config.h.in   |   84 +-
 libatomic/config/linux/aarch64/host-config.h |2 +
 libatomic/configure  | 1153 +-
 libatomic/configure.ac   |4 +
 libatomic/fadd_n.c   |   23 +
 libatomic/fop_n.c|5 +-
 libatomic/fsub_n.c   |   23 +
 libatomic/libatomic.map  |   44 +
 libatomic/libatomic_i.h  |   58 +
 libatomic/testsuite/Makefile.in  |1 +
 13 files changed, 1392 insertions(+), 72 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index efadd9dcd48..ec24f1da86b 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -110,6 +110,7 @@ IFUNC_OPT   = $(word $(PAT_S),$(IFUNC_OPTIONS))
 M_SIZE = -DN=$(PAT_N)
 M_IFUNC= $(if $(PAT_S),$(IFUNC_DEF) $(IFUNC_OPT))
 M_FILE = $(PAT_BASE)_n.c
+M_FLOATING  = $(if $(findstring $(PAT_N),$(FPSUFFIXES)),-DFLOATING)
 
 # The lack of explicit dependency on the source file means that VPATH cannot
 # work properly.  Instead, perform this operation by hand.  First, collect a
@@ -120,10 +121,13 @@ all_c_files   := $(foreach dir,$(search_path),$(wildcard $(dir)/*.c))
 M_SRC  = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 
 %_.lo: Makefile
-   $(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC)
+   $(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_FLOATING) $(M_IFUNC) -c -o $@ $(M_SRC)
 
 ## Include all of the sizes in the "normal" set of c

Re: [PATCH v2] aarch64: Add fp8 scalar types

2024-09-19 Thread Kyrylo Tkachov

Hi Claudio,

> On 19 Sep 2024, at 15:09, Claudio Bantaloukas wrote:
> 
> 
> The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
> can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
> available in arm_neon.h and arm_sve.h as an alias of the same.
> 
> This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
> represent __mfp8. Conversions to int and other types are disabled via the
> TARGET_INVALID_CONVERSION hook.
> Additionally, operations that are typically available to integer types are
> disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node
>for __mfp8 type.
>(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
>(aarch64_init_fp8_types): New function to initialise fp8 types and
>register with language backends.
>* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for
>new type.
>(aarch64_invalid_conversion): Add function implementing
>TARGET_INVALID_CONVERSION hook that blocks conversion to and from the
>__mfp8 type.
>(aarch64_invalid_unary_op): Add function implementing TARGET_UNARY_OP
>hook that blocks operations on __mfp8 other than &.
>(aarch64_invalid_binary_op): Extend TARGET_BINARY_OP hook to disallow
>operations on __mfp8 type.
>(TARGET_INVALID_CONVERSION): Add define.
>(TARGET_INVALID_UNARY_OP): Likewise.
>* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8
>type.
>(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
>* config/aarch64/arm_neon.h (mfloat8_t): Add typedef.
>* config/aarch64/arm_sve.h (mfloat8_t): Likewise.

Looks like this typedef is a good candidate to go into arm_private_fp8.h so 
that arm_neon.h, arm_sve.h and arm_sme.h inherit it.

Thanks,
Kyrill


> 
> gcc/testsuite/ChangeLog:
> 
>* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling.
>* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++.
>* gcc.target/aarch64/fp8_scalar_1.c: New tests in C.
>* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise.
> ---
> Hi,
> Is this ok for master? I do not have commit rights yet, if ok, can someone 
> commit it on my behalf?
> 
> Regression tested with aarch64-unknown-linux-gnu.
> 
> Compared to V1 of the patch, in version 2:
> - mangling for the __mfp8 type was added along with tests
> - unneeded comments were removed
> - simplified type checks in hooks
> - simplified initialization of aarch64_mfp8_type_node
> - separated mfloat8_t define from other fp types in arm_sve.h
> - C++ tests were moved to g++.target/aarch64
> - added more tests around binary operations, function declaration,
>  type traits
> - added tests exercising loads and stores from floating point registers
> 
> 
> Thanks,
> Claudio Bantaloukas
> 
> gcc/config/aarch64/aarch64-builtins.cc|  20 +
> gcc/config/aarch64/aarch64.cc |  54 ++-
> gcc/config/aarch64/aarch64.h  |   5 +
> gcc/config/aarch64/arm_neon.h |   2 +
> gcc/config/aarch64/arm_sve.h  |   2 +
> .../g++.target/aarch64/fp8_mangling.C |  44 ++
> .../aarch64/fp8_scalar_typecheck_2.C  | 381 ++
> .../gcc.target/aarch64/fp8_scalar_1.c | 134 ++
> .../aarch64/fp8_scalar_typecheck_1.c  | 356 
> 9 files changed, 996 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C
> create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C
> create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index eb878b933fe..7d17df05a0f 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -961,6 +961,11 @@ static GTY(()) tree aarch64_simd_intOI_type_node = 
> NULL_TREE;
> static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE;
> static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE;
> 
> +/* The user-visible __mfp8 type, and a pointer to that type.  Used
> +   across the back-end.  */
> +tree aarch64_mfp8_type_node = NULL_TREE;
> +tree aarch64_mfp8_ptr_type_node = NULL_TREE;
> +
> /* The user-visible __fp16 type, and a pointer to that type.  Used
>across the back-end.  */
> tree aarch64_fp16_type_node = NULL_TREE;
> @@ -1721,6 +1726,19 @@ aarch64_init_builtin_rsqrt (void)
>   }
> }
> 
> +/* Initialize the backend type that supports the user-visible __mfp8
> +   type and its relative pointer type.  */
> +
> +stat

[Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Hi all,

In order to identify and potentially re-use a specific offload device
(for reproducibility, or because it is affinity-wise close to a CPU (socket), …),
a mapping between a (universally?) unique identifier and the OpenMP device
number is useful. Thus, TR13 added support for it.

This is a collateral patch that came out of looking at the API routines for
other reasons and looking at that part of the spec during the OpenMP F2F.

Besides the added API routines, the UID will be used elsewhere:
* In context selectors: 'target_device' supports 'uid()'.
* In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.

@Sandra: Besides the usual .texi part, for the 'target_device' trait set:
if you add a new GOMP routine for kind/arch/isa, can you also add a
UID argument so that we don't have to update the API when it is needed in
the not-so-distant future?

@Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin +
.texi)?

@Jakub or anyone else — any comments, suggestions, remarks?

[The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
and seems to work fine.]

Tobias
OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique;
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): Add.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc   |  4 +-
 include/cuda/cuda.h  |  7 ++
 libgomp/config/gcn/target.c  | 14 
 libgomp/config/nvptx/target.c| 14 
 libgomp/fortran.c| 15 +
 libgomp/libgomp-plugin.h |  1 +
 libgomp/libgomp.h|  2 +
 libgomp/libgomp.map  |  8 +++
 libgomp/libgomp.texi | 81 +++-
 libgomp/omp.h.in |  3 +
 libgomp/omp_lib.f90.in   | 23 +++
 libgomp/omp_lib.h.in | 23 +++
 libgomp/plugin/cuda-lib.def  |  2 +
 libgomp/plugin/plugin-gcn.c  | 16 +
 libgomp/plugin/plugin-nvptx.c| 34 ++
 libgomp/target.c | 56 
 libgomp/testsuite/libgomp.c/device_uid.c | 38 +++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 
 18 files changed, 379 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_device_from_uid",
   "get_interop_int",
   "get_interop_ptr",
   "get_mapped_ptr",
@@ -3338,12 +3339,13 @@ omp_runtime_api_procname (const char *name)
 	 as DECL_NAME only omp_* and omp_*_8 appear.  */
   "display_env",
   "get_ancestor_thread_num",
-  "init_allocator",
+  "omp_get_uid_from_device",
   "get_partition_place_nums",
   "get_place_num_procs",
   "get_place_proc_ids",
   "get_schedule",
   "get_team_size",
+  "init_allocator",

Re: [PATCH] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-09-19 Thread Richard Sandiford

Pengxuan Zheng  writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
>   PR target/113860
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (popcount2): Update pattern to
>   also support V1DI mode.
>   * config/aarch64/aarch64.md (popcount2): Add TARGET_SVE support.
>   * config/aarch64/iterators.md (VDQHSD_V1DI): New mode iterator.
>   (SVE_VDQ_I): Add V1DI.
>   (bitsize): Likewise.
>   (VPRED): Likewise.
>   (VEC_POP_MODE): New mode attribute.
>   (vec_pop_mode): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/popcnt11.c: New test.

Sorry for the slow review of this.  The main reason for putting it off
was the use of V1DI, which always makes me nervous.

In particular:

> @@ -2284,7 +2286,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI 
> "VNx8BI")
>(VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
>(V8QI "VNx8BI") (V16QI "VNx16BI")
>(V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
> -  (V4SI "VNx4BI") (V2DI "VNx2BI")])
> +  (V4SI "VNx4BI") (V2DI "VNx2BI") (V1DI "VNx2BI")])
>  

it seems odd to have a predicate mode that contains more elements than
the associated single-vector data mode.

The patch also extends the non-SVE SIMD popcount pattern for V1DI,
but it doesn't look like that path works.  E.g. try the following
with -march=armv8-a -fgimple -O2:

__Uint64x1_t __GIMPLE
foo (__Uint64x1_t x)
{
  __Uint64x1_t z;

  z = .POPCOUNT (x);
  return z;
}

Thanks,
Richard


>  ;; ...and again in lower case.
>  (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
> @@ -2318,6 +2320,14 @@ (define_mode_attr VDOUBLE [(VNx16QI "VNx32QI")
>  (VNx4SI "VNx8SI") (VNx4SF "VNx8SF")
>  (VNx2DI "VNx4DI") (VNx2DF "VNx4DF")])
>  
> +;; The Advanced SIMD modes of popcount corresponding to scalar modes.
> +(define_mode_attr VEC_POP_MODE [(QI "V8QI") (HI "V4HI")
> + (SI "V2SI") (DI "V1DI")])
> +
> +;; ...and again in lower case.
> +(define_mode_attr vec_pop_mode [(QI "v8qi") (HI "v4hi")
> + (SI "v2si") (DI "v1di")])
> +
>  ;; On AArch64 the By element instruction doesn't have a 2S variant.
>  ;; However because the instruction always selects a pair of values
>  ;; The normal 3SAME instruction can be used here instead.
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt11.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> new file mode 100644
> index 000..595b2f9eb93
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt11.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.2-a+sve" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f_qi:
> +**   ldr b([0-9]+), \[x0\]
> +**   cnt v\1.8b, v\1.8b
> +**   smov    w0, v\1.b\[0\]
> +**   ret
> +*/
> +unsigned
> +f_qi (unsigned char *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_hi:
> +**   ldr h([0-9]+), \[x0\]
> +**   ptrue   (p[0-7]).b, all
> +**   cnt z\1.h, \2/m, z\1.h
> +**   smov    w0, v\1.h\[0\]
> +**   ret
> +*/
> +unsigned
> +f_hi (unsigned short *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_si:
> +**   ldr s([0-9]+), \[x0\]
> +**   ptrue   (p[0-7]).b, all
> +**   cnt z\1.s, \2/m, z\1.s
> +**   umov    x0, v\1.d\[0\]
> +**   ret
> +*/
> +unsigned
> +f_si (unsigned int *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}
> +
> +/*
> +** f_di:
> +**   ldr d([0-9]+), \[x0\]
> +**   ptrue   (p[0-7])\.b, all
> +**   cnt z\1\.d, \2/m, z\1\.d
> +**   fmov    x0, d\1
> +**   ret
> +*/
> +unsigned
> +f_di (unsigned long *a)
> +{
> +  return __builtin_popcountg (a[0]);
> +}

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-19 Thread Richard Sandiford

Martin Storsjö  writes:
> On Thu, 12 Sep 2024, Evgeny Karpov wrote:
>
>> The current binutils implementation does not support offset up to 4GB in
>> IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB.
>> This is related to differences in ELF and COFF relocation records.
>
> Yes, I agree.
>
> But I would not consider this a limitation of the binutils implementation, 
> this is a limitation of the object file format. It can't be worked around 
> by inventing your own custom relocations, but should instead be worked around 
> on the code generation side, to avoid needing such large offsets.
>
> This approach is one such, quite valid. Another one is to generate extra 
> symbols to allow addressing anything with a smaller offset.

Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
symbols isn't generally valid for ELF when Y is a global symbol, since
interposition rules, comdat, weak symbols, and various other reasons,
could mean that the local definition of Y isn't the one that gets used.
Does COFF cope with that in some other way?  If not, I would have
expected that there would need to be a fallback path that didn't
involve defining extra symbols.

Thanks,
Richard

Re: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero vector

2024-09-19 Thread Richard Sandiford

Jennifer Schmitz  writes:
>> On 18 Sep 2024, at 20:33, Richard Sandiford  
>> wrote:
>> 
>> Jennifer Schmitz  writes:
>>> From 05e010a4ad5ef8df082b3e03b253aad85e2a270c Mon Sep 17 00:00:00 2001
>>> From: Jennifer Schmitz 
>>> Date: Tue, 17 Sep 2024 00:15:38 -0700
>>> Subject: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero
>>> vector
>>> 
>>> As recently implemented for svdiv, this patch folds svmul to a zero
>>> vector if one of the operands is a zero vector. This transformation is
>>> applied if at least one of the following conditions is met:
>>> - the first operand is all zeros or
>>> - the second operand is all zeros, and the predicate is ptrue or the
>>> predication is _x or _z.
>>> 
>>> In contrast to constant folding, which was implemented in a previous
>>> patch, this transformation is applied as soon as one of the operands is
>>> a zero vector, while the other operand can be a variable.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>> 
>> OK, thanks.
>> 
>> If you're planning any more work in this area, I think the next logical
>> step would be to extend the current folds to all predication types,
>> before going on to support other mul/div cases or other operations.
>> 
>> In principle, the mul and div cases correspond to:
>> 
>>  if (integer_zerop (op1) || integer_zerop (op2))
>>return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>> It would then be up to fold_active_lanes_to(X) to work out how to apply
>> predication to X.  The general case would be:
>> 
>>  - For x predication and unpredicated operations, fold to X.
>> 
>>  - For m and z, calculate a vector that supplies the values of inactive
>>    lanes (the first vector argument for m and a zero vector for z).
>> 
>>- If X is equal to the inactive lanes vector, fold directly to X.
>> 
>>- Otherwise fold to VEC_COND_EXPR 
> Dear Richard,
> I pushed it to trunk with 08aba2dd8c9390b6131cca0aac069f97eeddc9d2.
> Thank you also for the good suggestion, I will do that. During the last days, 
> I have been working on a patch that folds multiplication by powers of 2 to 
> left-shifts (svlsl), similar to for division. As I see it, that is 
> independent from what you proposed, because it is a change of the function 
> type. Can I submit it for review before starting on the patch you suggested?

Sure!  I agree the power-of-two fold is independent.  I was just worried
about building up technical debt if we added more fold-to-constant cases.

Thanks,
Richard
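
To make the predication rules quoted above concrete, here is a minimal scalar sketch in C.  It is not GCC code: the names `pred_type`, `NLANES` and this stand-alone `fold_active_lanes_to` are assumptions made for illustration, and vectors are modelled as small arrays.  It folds the active lanes to X and takes the inactive lanes from the first argument (for _m) or from zero (for _z).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model types; none of these names exist in GCC.  */
enum pred_type { PRED_NONE, PRED_X, PRED_M, PRED_Z };

#define NLANES 4

/* Fold a predicated operation whose active lanes are known to be X.
   PG is the governing predicate and OP1 the first vector argument,
   which supplies the inactive lanes for _m predication.  */
static void
fold_active_lanes_to (enum pred_type pred, const bool pg[NLANES],
                      const int32_t op1[NLANES], const int32_t x[NLANES],
                      int32_t out[NLANES])
{
  int32_t inactive[NLANES] = { 0 };   /* _z: inactive lanes are zero.  */

  assert (pred == PRED_NONE || pred == PRED_X
          || pred == PRED_M || pred == PRED_Z);

  /* _x and unpredicated operations: fold to X directly.  */
  if (pred == PRED_NONE || pred == PRED_X)
    {
      memcpy (out, x, NLANES * sizeof (int32_t));
      return;
    }

  /* _m: inactive lanes come from the first vector argument.  */
  if (pred == PRED_M)
    memcpy (inactive, op1, sizeof inactive);

  /* If X already equals the inactive-lanes vector, no select is needed.  */
  if (memcmp (inactive, x, sizeof inactive) == 0)
    {
      memcpy (out, x, NLANES * sizeof (int32_t));
      return;
    }

  /* Otherwise select per lane, as a VEC_COND_EXPR would.  */
  for (int i = 0; i < NLANES; i++)
    out[i] = pg[i] ? x[i] : inactive[i];
}
```

In this model, svmul_m with an all-zero first operand takes the "equal to the inactive lanes" shortcut, which matches the first condition listed in the patch description.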

Re: [Fortran, Patch, PR106606, v1] Fortran: Break recursion building recursive types. [PR106606]

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

thanks for review. Committed with the changes requested as:
gcc-15-3711-gde915fbe3cb

Thanks again.

Regards,
Andre

On Wed, 18 Sep 2024 18:24:19 +0200
Thomas Koenig  wrote:

> Hi Andre,
>
> > Regtested ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> Extremely minor nit: In the commit message and ChangeLog entry,
>
> Build a derived type component's type only, when it is not already being
> build and the component uses pointer semantics.
>
> I believe that should be "being built".
>
> In the ChangeLog entry
>
>   derived types as component's types when they are not yet build.
>
> s/build/built/
>
> OK for trunk.
>
> Thanks for the patch!
>
> Best regards
>
>   Thomas
>
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

RE: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-19 Thread Li, Pan2

> So for the future I'd suggest you post those with a remark that you think
> they're obvious and going to commit in a day (or some other reasonable
> timeframe) if there are no complaints.

Oh, I see. Thanks Robin for reminding.

That would be perfect. Do you have any best practices for marking a patch as "obvious"?
Something like [NFC] in the subject gives a hint for no-functional-change; maybe
[TBO] could stand for to-be-obvious, or something like that.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Thursday, September 19, 2024 4:26 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp 
Subject: Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to 
middle-end change

> This patch would like fix the dump check times of vector SAT_ADD.  The
> middle-end change makes the match times from 2 to 4 times.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.

That's OK.  And I think testsuite fixup patches like this you can consider
"obvious" as long as you're sure the underlying reason is understood.
In particular as you have been working in the saturating space for a while now.

So for the future I'd suggest you post those with a remark that you think
they're obvious and going to commit in a day (or some other reasonable
timeframe) if there are no complaints.

-- 
Regards
 Robin

Re: [Ping, Fortran, Patch, PR85002, v1] Fix deep-copy of alloc. comps. in coarrays ICEing and crashing w/ lib.

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

committed as gcc-15-3707-g361903ad1af.

Thanks for the review. I am reviewing your unsigned work at the moment.

Thanks again and regards,
Andre

On Wed, 18 Sep 2024 18:18:20 +0200
Thomas Koenig  wrote:

> Am 18.09.24 um 12:31 schrieb Andre Vehreschild:
> > Regtested ok on x86_64-pc-linux-gnu / F39. Ok for mainline?
>
> OK.
>
> Thanks for the patch!
>
> Best regards
>
>   Thomas


--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [gcc-wwwdocs PATCH] gcc-14: Mention -march=gracemont support in x86_64

2024-09-19 Thread Gerald Pfeifer

On Thu, 19 Sep 2024, Haochen Jiang wrote:
> When I was backporting my doc patch in gcc trunk today, I found when 
> adding -march=gracemont in GCC14, the corresponding wwwdoc is missing. 
> This patch is adding that.

This looks fine, thank you.

Gerald

Re: [PATCH v2 5/9] aarch64: Multiple adjustments to support the SMALL code model correctly

2024-09-19 Thread Richard Sandiford

Evgeny Karpov  writes:
> LOCAL_LABEL_PREFIX has been changed to help the assembly
> compiler recognize local labels. Emitting locals has been
> replaced with the .lcomm directive to declare uninitialized
> data without defining an exact section. Functions and objects
> were missing declarations. Binutils was not able to distinguish
> static from external, or an object from a function.
> mingw_pe_declare_object_type has been added to have type
> information for relocation on AArch64, which is not the case
> for ix86.
>
> This fix relies on changes in binutils.
> aarch64: Relocation fixes and LTO
> https://sourceware.org/pipermail/binutils/2024-August/136481.html
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-coff.h (LOCAL_LABEL_PREFIX):
>   Use "." as the local label prefix.
>   (ASM_OUTPUT_ALIGNED_LOCAL): Remove.
>   (ASM_OUTPUT_LOCAL): New.
>   * config/aarch64/cygming.h (ASM_OUTPUT_EXTERNAL_LIBCALL):
>   Update.
>   (ASM_DECLARE_OBJECT_NAME): New.
>   (ASM_DECLARE_FUNCTION_NAME): New.
>   * config/i386/cygming.h (ASM_DECLARE_COLD_FUNCTION_NAME):
>   Update.
>   (ASM_OUTPUT_EXTERNAL_LIBCALL): Update.
>   * config/mingw/winnt.cc (mingw_pe_declare_function_type):
>   Rename into ...
>   (mingw_pe_declare_type): ... this.
>   (i386_pe_start_function): Update.
>   * config/mingw/winnt.h (mingw_pe_declare_function_type):
>   Rename into ...
>   (mingw_pe_declare_type): ... this.
> ---
>  gcc/config/aarch64/aarch64-coff.h | 22 ++
>  gcc/config/aarch64/cygming.h  | 18 +-
>  gcc/config/i386/cygming.h |  8 
>  gcc/config/mingw/winnt.cc | 18 +-
>  gcc/config/mingw/winnt.h  |  3 +--
>  5 files changed, 37 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-coff.h 
> b/gcc/config/aarch64/aarch64-coff.h
> index 81fd9954f75..17f346fe540 100644
> --- a/gcc/config/aarch64/aarch64-coff.h
> +++ b/gcc/config/aarch64/aarch64-coff.h
> @@ -20,9 +20,8 @@
>  #ifndef GCC_AARCH64_COFF_H
>  #define GCC_AARCH64_COFF_H
>  
> -#ifndef LOCAL_LABEL_PREFIX
> -# define LOCAL_LABEL_PREFIX  ""
> -#endif
> +#undef LOCAL_LABEL_PREFIX
> +#define LOCAL_LABEL_PREFIX  "."
>  
>  /* Using long long breaks -ansi and -std=c90, so these will need to be
> made conditional for an LLP64 ABI.  */
> @@ -54,19 +53,10 @@
>  }
>  #endif
>  
> -/* Output a local common block.  /bin/as can't do this, so hack a
> -   `.space' into the bss segment.  Note that this is *bad* practice,
> -   which is guaranteed NOT to work since it doesn't define STATIC
> -   COMMON space but merely STATIC BSS space.  */
> -#ifndef ASM_OUTPUT_ALIGNED_LOCAL
> -# define ASM_OUTPUT_ALIGNED_LOCAL(STREAM, NAME, SIZE, ALIGN) \
> -{                                                                   \
> -  switch_to_section (bss_section);                                  \
> -  ASM_OUTPUT_ALIGN (STREAM, floor_log2 (ALIGN / BITS_PER_UNIT));    \
> -  ASM_OUTPUT_LABEL (STREAM, NAME);                                  \
> -  fprintf (STREAM, "\t.space\t%d\n", (int)(SIZE));                  \
> -}
> -#endif
> +#define ASM_OUTPUT_LOCAL(FILE, NAME, SIZE, ROUNDED)  \
> +( fputs (".lcomm ", (FILE)), \
> +  assemble_name ((FILE), (NAME)),\
> +  fprintf ((FILE), ",%lu\n", (ROUNDED)))

I'd expect this to be:

  "," HOST_WIDE_INT_PRINT_DEC "\n"

rather than ",%lu\n".  "long" generally shouldn't be used in GCC code,
since it's such an ambiguous type.

LGTM otherwise.

Thanks,
Richard
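
As an illustration of the format-string point, here is a small stand-alone sketch in C.  It assumes that `int64_t` and `"%" PRId64` can stand in for GCC's HOST_WIDE_INT and HOST_WIDE_INT_PRINT_DEC; `hwi`, `HWI_PRINT_DEC` and `format_lcomm` are names invented for this sketch, not GCC identifiers.  The idea is that the conversion follows the 64-bit type rather than "long", which is only 32 bits wide on LLP64 hosts.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Stand-ins for HOST_WIDE_INT and HOST_WIDE_INT_PRINT_DEC (assumption
   of this sketch; the real definitions live in GCC's hwint.h).  */
typedef int64_t hwi;
#define HWI_PRINT_DEC "%" PRId64

/* Format a ".lcomm NAME,SIZE" directive into BUF.  The width of the
   conversion is tied to the 64-bit type, not to the host's "long".
   Returns the snprintf result.  */
static int
format_lcomm (char *buf, size_t len, const char *name, hwi rounded)
{
  assert (buf != NULL && name != NULL);
  return snprintf (buf, len, ".lcomm %s," HWI_PRINT_DEC "\n", name, rounded);
}
```

With "%lu" the same call would pass a 64-bit argument where a 32-bit one is expected on hosts with a 32-bit long, which is undefined behaviour for sizes such as 5000000000.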

Re: [PATCH] libcpp: Add -Wtrailing-blanks warning

2024-09-19 Thread Jakub Jelinek

On Thu, Sep 19, 2024 at 09:07:06AM +0200, Jakub Jelinek wrote:
> space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale,
> blank is ' ' '\t'
> cntrl is a lot of chars but not ' '
> if we extend by the safe-ctype
> vspace '\r' '\n'
> nvspace ' ' '\t' '\f' '\v' '\0'
> Obviously, we shouldn't look at '\r' and '\n', those aren't trailing
> characters, those are line separators.
> 
> Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters as
> cntrl?
> ..0009; Control # Cc  [10] ..
> 000B..000C; Control # Cc   [2] ..
> 000E..001F; Control # Cc  [18] ..
> 007F..009F; Control # Cc  [33] ..
> 00AD  ; Control # Cf   SOFT HYPHEN
> 061C  ; Control # Cf   ARABIC LETTER MARK
> 180E  ; Control # Cf   MONGOLIAN VOWEL SEPARATOR
> 200B  ; Control # Cf   ZERO WIDTH SPACE
> 200E..200F; Control # Cf   [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
> 2028  ; Control # Zl   LINE SEPARATOR
> 2029  ; Control # Zp   PARAGRAPH SEPARATOR
> 202A..202E; Control # Cf   [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT 
> OVERRIDE
> 2060..2064; Control # Cf   [5] WORD JOINER..INVISIBLE PLUS
> 2065  ; Control # Cn   
> 2066..206F; Control # Cf  [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES
> FEFF  ; Control # Cf   ZERO WIDTH NO-BREAK SPACE
> FFF0..FFF8; Control # Cn   [9] ..
> FFF9..FFFB; Control # Cf   [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR 
> ANNOTATION TERMINATOR
> 13430..1343F  ; Control # Cf  [16] EGYPTIAN HIEROGLYPH VERTICAL 
> JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
> 1BCA0..1BCA3  ; Control # Cf   [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND 
> FORMAT UP STEP
> 1D173..1D17A  ; Control # Cf   [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL 
> END PHRASE
> E ; Control # Cn   
> E0001 ; Control # Cf   LANGUAGE TAG
> E0002..E001F  ; Control # Cn  [30] ..
> E0080..E00FF  ; Control # Cn [128] ..
> E01F0..E0FFF  ; Control # Cn [3600] ..
> 
> Wonder why anybody would be interested to find just trailing spaces and not
> trailing tabs or vice versa, so if we have categories, blank would be one,
> then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f'
> '\v' and if really needed, control characters with added ' ', but how to
> call that and would it really need to parse UTF-8/EBCDIC and look at
> pregenerated tables?

And there are also:
0009..000D; White_Space # Cc   [5] ..
0020  ; White_Space # Zs   SPACE
0085  ; White_Space # Cc   
00A0  ; White_Space # Zs   NO-BREAK SPACE
1680  ; White_Space # Zs   OGHAM SPACE MARK
2000..200A; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028  ; White_Space # Zl   LINE SEPARATOR
2029  ; White_Space # Zp   PARAGRAPH SEPARATOR
202F  ; White_Space # Zs   NARROW NO-BREAK SPACE
205F  ; White_Space # Zs   MEDIUM MATHEMATICAL SPACE
3000  ; White_Space # Zs   IDEOGRAPHIC SPACE

Jakub
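
For the ASCII-only categories discussed above, a check could look roughly like the following C sketch.  The names `trailing_kind`, `is_trailing_char` and `count_trailing` are invented for illustration and are not the libcpp implementation; the sketch skips '\r' and '\n' as line separators and ignores the Unicode tables entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories: "blank" is ' ' and '\t'; the wider one is
   nvspace without '\0', i.e. ' ', '\t', '\f' and '\v'.  */
enum trailing_kind { TRAIL_BLANK, TRAIL_NVSPACE };

static bool
is_trailing_char (unsigned char c, enum trailing_kind kind)
{
  if (c == ' ' || c == '\t')
    return true;
  return kind == TRAIL_NVSPACE && (c == '\f' || c == '\v');
}

/* Return how many characters of category KIND end LINE[0..LEN),
   not counting the '\r' / '\n' line separators themselves.  */
static size_t
count_trailing (const char *line, size_t len, enum trailing_kind kind)
{
  assert (line != NULL || len == 0);

  size_t end = len;
  while (end > 0 && (line[end - 1] == '\n' || line[end - 1] == '\r'))
    end--;

  size_t start = end;
  while (start > 0
         && is_trailing_char ((unsigned char) line[start - 1], kind))
    start--;

  return end - start;
}
```

A nonzero result would be the point at which a warning could be issued; whether more categories are worth the cost of parsing UTF-8 is the open question in the message above.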

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-19 Thread Richard Sandiford

Evgeny Karpov  writes:
> The current binutils implementation does not support offset up to 4GB in
> IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB.
> This is related to differences in ELF and COFF relocation records.
> There are ways to fix this. This work on relocation change will be extracted 
> to
> a separate binutils patch series and discussion.
>
> To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
> relocation will remain unchanged, and the workaround below will be applied to
> bypass the 1MB offset limitation.
>
> Regards,
> Evgeny
>
>
> The patch will be replaced by this change.

Seems like a reasonable workaround to me FWIW, but some comments on the
implementation below:

>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 03362a975c0..5f17936df1f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
> if (can_create_pseudo_p ())
>   tmp_reg = gen_reg_rtx (mode);
>
> -   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> +   do
> + {
> +   if (TARGET_PECOFF)
> + {
> +   poly_int64 offset;
> +   HOST_WIDE_INT const_offset;
> +   strip_offset (imm, &offset);
> +
> +   if (offset.is_constant (&const_offset)
> +   && abs(const_offset) >= 1 << 20)

abs_hwi (const_offset) (since const_offset has HOST_WIDE_INT type).

> + {
> +   rtx const_int = imm;
> +   const_int = XEXP (const_int, 0);
> +   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));

CONST_INTs are shared objects, so we can't modify their value in-place.

It might be easier to pass base and const_offset from the caller
(aarch64_expand_mov_immediate).  We are then guaranteed that the
offset is constant and don't need to worry about the SVE case.
The new SYM+OFF expression can be calculated using plus_constant.

I think it'd be worth asserting that the offset fits in 32 bits,
since if by some bug the offset is larger, we'd generate silent
wrong code (in the sense that the compiler would truncate the offset
before the assembler sees it).

> +
> +   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, 
> copy_rtx(imm)));
> +   emit_insn (gen_add_hioffset (tmp_reg, 
> GEN_INT(const_offset)));

I think the normal addition patterns can handle this, if we pass the
result of the ~0xf calculation.  There should be no need for a
dedicated pattern.

> +   break;
> + }
> + }
> +
> + emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> + } while(0);

I think it'd be clearer to duplicate the gen_add_losym and avoid the
do...while(0).

Thanks,
Richard

> +
> emit_insn (gen_add_losym (dest, tmp_reg, imm));
> return;
>}
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 665a333903c..072110f93e7 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7405,6 +7405,13 @@
>DONE;
>  })
>
> +(define_insn "add_hioffset"
> +  [(match_operand 0 "register_operand")
> +   (match_operand 1 "const_int_operand")]
> +  ""
> +  "add %0, %0, (%1 & ~0xf) >> 12, lsl #12"
> +)
> +
>  (define_insn "add_losym_"
>[(set (match_operand:P 0 "register_operand" "=r")
> (lo_sum:P (match_operand:P 1 "register_operand" "r")
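
The arithmetic behind the review comments can be sketched in plain C as follows.  This is only an illustration under assumptions: `hwi` stands in for HOST_WIDE_INT, `abs_hwi_sketch` mimics what GCC's abs_hwi is used for, and `split_symbol_offset` is a hypothetical helper, not code from the patch.  It keeps the part of the offset that stays below 1MB with the symbol and moves the rest into a separate addend, after asserting that the offset fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hwi;                    /* stand-in for HOST_WIDE_INT */
#define PAGEBASE_LIMIT ((hwi) 1 << 20)  /* the 1MB limit from the thread */

/* 64-bit absolute value, so the argument is never narrowed to int.  */
static hwi
abs_hwi_sketch (hwi x)
{
  assert (x != INT64_MIN);
  return x >= 0 ? x : -x;
}

struct split_offset
{
  hwi low;    /* stays with the symbol, |low| < 1MB */
  hwi high;   /* added separately, a multiple of 1MB */
};

/* Split OFFSET so that low + high == OFFSET.  Fail loudly if OFFSET
   does not fit in 32 bits instead of silently truncating it.  */
static struct split_offset
split_symbol_offset (hwi offset)
{
  assert (offset >= INT32_MIN && offset <= INT32_MAX);

  struct split_offset s = { offset, 0 };
  if (abs_hwi_sketch (offset) >= PAGEBASE_LIMIT)
    {
      /* C's % truncates toward zero, so LOW keeps the sign of OFFSET.  */
      s.low = offset % PAGEBASE_LIMIT;
      s.high = offset - s.low;
    }
  return s;
}
```

In the compiler itself the new SYM+OFF expression would be built with plus_constant rather than by modifying an existing CONST_INT, as noted in the review.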

Re: [patch, fortran] Add random numbers and fix some bugs.

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

submitting your patch as part of the mail got it corrupted by some mailer
adding line breaks. It does not apply for me. Because I can't test it, I have
more questions, see below:

On Wed, 18 Sep 2024 22:22:15 +0200
Thomas Koenig  wrote:

> This patch adds random number support for UNSIGNED, plus fixes
> two bugs, with array I/O where the type used to be set to BT_INTEGER,
> and for division with the divisor being a constant.
>
> Again, depends on prevous submissions.
>
> OK for trunk?
>
> gcc/fortran/ChangeLog:
>
>   * check.cc (gfc_check_random_number): Adjust for unsigned.
>   * iresolve.cc (gfc_resolve_random_number): Handle unsinged.

Hihi, I do this typo, too, over and over again: s/unsinged/unsigned/

>   * trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
>   * trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
>   * gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.
>



> diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
> index 533c9d7d343..1851cfb8d4a 100644
> --- a/gcc/fortran/check.cc
> +++ b/gcc/fortran/check.cc
> @@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable,
> gfc_expr *image_distinct)
>   bool
>   gfc_check_random_number (gfc_expr *harvest)
>   {
> -  if (!type_check (harvest, 0, BT_REAL))
> -return false;
> +  if (flag_unsigned)
> +{
> +  if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED))
> + return false;

When the second argument is a BT_INTEGER, does this fail here?

> +}
> +  else
> +if (!type_check (harvest, 0, BT_REAL))
> +  return false;
>
> if (!variable_check (harvest, 0, false))
>   return false;



Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Andre Vehreschild

Hi Tobias,

in the changelog of libgomp:

* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): Add.

"Add." what? Can you be more specific, i.e. is it just a dummy or prototype?

In the libgomp/libgomp.texi

+@node omp_get_uid_from_device
+@subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device
+@table @asis
+@item @emph{Description}:
+This function returns a pointer to _a_ string that represents a unique
identifier 
^^^

+(UID) for the device specified by @var{device_num}.  It returns a ...



@@ -6604,6 +6673,12 @@ The implementation remark:
   @code{omp_thread_mem_alloc}, all use low-latency memory as first
   preference, and fall back to main graphics memory when the low-latency
   pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routine, consists
+  of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
+  the CUDA runtime library.  This UUID is output in grouped lower-case
+  hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
+  4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
+  format used by @code{nvidia-smi}.
 @end itemize

Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then
why is the "normal" UUID display format described here? This confuses me. (Just
curiosity.)

Er, and when I read further on, I find the nvptx implementation and that
contradicts the description. There a "normal" UUID is added to the GPU- id. So
you might want to make that implementation remark clearer.

Sorry for the bickering. I just stumbled over that while waiting for a
regression test.

The remainder looks reasonable to me.

Regards,
Andre

On Thu, 19 Sep 2024 15:23:54 +0200
Tobias Burnus  wrote:

> Hi all,
> 
> in order to know and potentially re-use a specific offload device
> (reproducibility, affinity wise close to a CPU (socket), …) a mapping between
> an (universal?) unique identifier and the OpenMP device number is useful.
> Thus, TR13 added support for it.
> 
> This is a collateral patch caused by looking at the API routines for other
> reasons and looking at that part of the spec during the OpenMP F2F.
> 
> Besides the added API routines, the UID will be used elsewhere:
> * In context selectors: 'target_device' supports 'uid()'.
> * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.
> 
> @Sandra: Besides the usual .texi part, for the 'target_device' trait set:
> if you add a new GOMP routine for kind/arch/isa - can you also add an
> UID argument such that we don't have to update the API when needing in the
> not so far future.
> 
> @Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin +
> .texi)?
> 
> @Jakub or anyone else — any comments, suggestions, remarks?
> 
> [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
> and seems to work fine.]
> 
> Tobias


-- 
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

2024-09-19 Thread Jeff Law





On 9/19/24 7:24 AM, Robin Dapp wrote:
> Hi,
>
> this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
> SLP with SELECT_VL.
>
> Assisted by sed and regtested on rv64gcv_zvfh_zvbb.
>
> Rather lengthy but obvious, so going to commit after a while if the CI is
> happy.  I think those tests don't really need to check for vsetvl anyway,
> not all of them at least but I didn't change that for now.

Methodology works for me.
jeff

[Fortran, Patch, PR84870, v1] Fix ICE and allocated memory not assigned correctly.

2024-09-19 Thread Andre Vehreschild

Hi all,

in PR84870 an ICE was reported that has been fixed in the meantime by some
other patch. Nevertheless, a testcase revealed that the memory handling was
still not correct, i.e. the test case in the patch yielded 2 for both x.b.a
and y.b.a, which is not correct.

For a coarray all memory is allocated using an array descriptor. For scalars
just a temporary descriptor is created and handed to the caf-register routine.
The error here was that the memory handed back in the temporary descriptor
was not used as the memory of the component, i.e. the pointer in the component
was not updated. The patch fixes this.
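
The following C sketch models the situation in a simplified way.  It is an analogy only: `descriptor`, `caf_register_sketch` and `duplicate_scalar` are hypothetical names, and the real code works on GCC trees in trans-array.cc.  The point is the single assignment that copies the pointer from the temporary descriptor back into the destination.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal stand-in for an array descriptor.  */
struct descriptor
{
  void *data;
};

/* Stand-in for the caf-register routine: allocates SIZE bytes and
   hands the memory back through DESC->data.  */
static void
caf_register_sketch (size_t size, struct descriptor *desc)
{
  desc->data = calloc (1, size);
  assert (desc->data != NULL);
}

/* Duplicate the scalar *SRC into freshly registered memory and make
   *DEST point at it.  */
static void
duplicate_scalar (int **dest, const int *src)
{
  struct descriptor dummy_desc = { NULL };

  caf_register_sketch (sizeof (int), &dummy_desc);

  /* The step corresponding to the added gfc_add_modify: without it
     *DEST would keep its old value and the copy would go elsewhere.  */
  *dest = dummy_desc.data;

  memcpy (*dest, src, sizeof (int));
}
```

With the assignment in place, modifying the copy leaves the source untouched, which is what the new test alloc_comp_10.f90 checks on the Fortran side.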

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From c26e97a8196fc26abf36a0bad6ffd6f9da7ba5d8 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 19 Sep 2024 15:09:52 +0200
Subject: [PATCH] Fortran: Assign allocated caf-memory to scalar members
 [PR84870]

Allocating a coarray required an array-descriptor.  For scalars a
temporary descriptor was created.  Assigning the allocated memory from
the temporary descriptor back to the scalar is now added.

gcc/fortran/ChangeLog:

	PR fortran/84870

	* trans-array.cc (duplicate_allocatable_coarray): For scalar
	allocatable components the memory allocated is now assigned to
	the component's pointer.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/alloc_comp_10.f90: New test.
---
 gcc/fortran/trans-array.cc|  2 ++
 .../gfortran.dg/coarray/alloc_comp_10.f90 | 24 +++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 838b6d3da80..3da7479fd10 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -9451,6 +9451,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src, tree type,
   gfc_build_addr_expr (NULL_TREE, dest_tok),
   NULL_TREE, NULL_TREE, NULL_TREE,
   GFC_CAF_COARRAY_ALLOC_REGISTER_ONLY);
+  gfc_add_modify (&block, dest, gfc_conv_descriptor_data_get (dummy_desc));
   null_data = gfc_finish_block (&block);

   gfc_init_block (&block);
@@ -9460,6 +9461,7 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src, tree type,
   gfc_build_addr_expr (NULL_TREE, dest_tok),
   NULL_TREE, NULL_TREE, NULL_TREE,
   GFC_CAF_COARRAY_ALLOC);
+  gfc_add_modify (&block, dest, gfc_conv_descriptor_data_get (dummy_desc));

   tmp = builtin_decl_explicit (BUILT_IN_MEMCPY);
   tmp = build_call_expr_loc (input_location, tmp, 3, dest, src,
diff --git a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90
new file mode 100644
index 000..a31d005498c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_10.f90
@@ -0,0 +1,24 @@
+!{ dg-do run }
+
+! Check that copying of memory for allocated scalar is assigned
+! to coarray object.
+
+! Contributed by G. Steinmetz  
+
+program p
+  type t
+integer, allocatable :: a
+  end type
+  type t2
+type(t), allocatable :: b
+  end type
+  type(t2) :: x, y[*]
+
+  x%b = t(1)
+  y = x
+  y%b%a = 2
+
+  if (x%b%a /= 1) stop 1
+  if (y%b%a /= 2) stop 2
+end
+
--
2.46.0

[PATCH] s390: Remove -m{,no-}lra option

2024-09-19 Thread Stefan Schulze Frielinghaus

I had missed the two test cases; this version removes them since they
depend on -mno-lra.

-- 8< --

Since the old reload pass is about to be removed and we defaulted to LRA
for over a decade, remove option -m{,no-}lra.

PR target/113953

gcc/ChangeLog:

* config/s390/s390.cc (s390_lra_p): Remove.
(TARGET_LRA_P): Remove.
* config/s390/s390.opt (mlra): Remove.
* config/s390/s390.opt.urls (mlra): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/s390/TI-constants-nolra.c: Removed.
* gcc.target/s390/pr79895.c: Removed.
---
 gcc/config/s390/s390.cc   | 10 
 gcc/config/s390/s390.opt  |  4 --
 gcc/config/s390/s390.opt.urls |  2 -
 .../gcc.target/s390/TI-constants-nolra.c  | 47 ---
 gcc/testsuite/gcc.target/s390/pr79895.c   |  9 
 5 files changed, 72 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
 delete mode 100644 gcc/testsuite/gcc.target/s390/pr79895.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c9172d1153a..25d43ae3e13 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode,
   return true;
 }
 
-/* Return true if we use LRA instead of reload pass.  */
-static bool
-s390_lra_p (void)
-{
-  return s390_lra_flag;
-}
-
 /* Return true if register FROM can be eliminated via register TO.  */
 
 static bool
@@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P s390_lra_p
-
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE s390_can_eliminate
 
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index a5b5aa95a12..23ea4b8232d 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -229,10 +229,6 @@ Set the branch costs for conditional branch instructions.  
Reasonable
 values are small, non-negative integers.  The default branch cost is
 1.
 
-mlra
-Target Var(s390_lra_flag) Init(1) Save
-Use LRA instead of reload.
-
 mpic-data-is-text-relative
 Target Var(s390_pic_data_is_text_relative) 
Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE)
 Assume data segments are relative to text segment.
diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls
index ab1e761efa8..bc772d2ffc7 100644
--- a/gcc/config/s390/s390.opt.urls
+++ b/gcc/config/s390/s390.opt.urls
@@ -74,8 +74,6 @@ UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch)
 
 ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs
 
-; skipping UrlSuffix for 'mlra' due to finding no URLs
-
 ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs
 
 ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs
diff --git a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c 
b/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
deleted file mode 100644
index b9948fc4aa5..000
--- a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/* { dg-do compile { target int128 } } */
-/* { dg-options "-O3 -mno-lra" } */
-
-/* 2x lghi */
-__int128 a() {
-  return 0;
-}
-
-/* 2x lghi */
-__int128 b() {
-  return -1;
-}
-
-/* 2x lghi */
-__int128 c() {
-  return -2;
-}
-
-/* lghi + llilh */
-__int128 d() {
-  return 16000 << 16;
-}
-
-/* lghi + llihf */
-__int128 e() {
-  return (unsigned long long)8 << 32;
-}
-
-/* lghi + llihf */
-__int128 f() {
-  return (unsigned __int128)8 << 96;
-}
-
-/* llihf + llihf - this is handled via movti_bigconst pattern */
-__int128 g() {
-  return ((unsigned __int128)8 << 96) | ((unsigned __int128)8 << 32);
-}
-
-/* Literal pool */
-__int128 h() {
-  return ((unsigned __int128)8 << 32) | 1;
-}
-
-/* Literal pool */
-__int128 i() {
-  return (((unsigned __int128)8 << 32) | 1) << 64;
-}
diff --git a/gcc/testsuite/gcc.target/s390/pr79895.c 
b/gcc/testsuite/gcc.target/s390/pr79895.c
deleted file mode 100644
index 02374e4b8a8..000
--- a/gcc/testsuite/gcc.target/s390/pr79895.c
+++ /dev/null
@@ -1,9 +0,0 @@
-/* { dg-do compile { target int128 } } */
-/* { dg-options "-O1 -mno-lra" } */
-
-unsigned __int128 g;
-void
-foo ()
-{
-  g = (unsigned __int128)1 << 127;
-}
-- 
2.45.2

[PATCH] s390: Remove -m{,no-}lra option

2024-09-19 Thread Stefan Schulze Frielinghaus

Since the old reload pass is about to be removed and we defaulted to LRA
for over a decade, remove option -m{,no-}lra.

PR target/113953

gcc/ChangeLog:

* config/s390/s390.cc (s390_lra_p): Remove.
(TARGET_LRA_P): Remove.
* config/s390/s390.opt (mlra): Remove.
* config/s390/s390.opt.urls (mlra): Remove.
---
 Assuming that bootstrap and regtest (which are still running) finish
 successfully, ok for mainline?

 gcc/config/s390/s390.cc   | 10 --
 gcc/config/s390/s390.opt  |  4 
 gcc/config/s390/s390.opt.urls |  2 --
 3 files changed, 16 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c9172d1153a..25d43ae3e13 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode,
   return true;
 }
 
-/* Return true if we use LRA instead of reload pass.  */
-static bool
-s390_lra_p (void)
-{
-  return s390_lra_flag;
-}
-
 /* Return true if register FROM can be eliminated via register TO.  */
 
 static bool
@@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti)
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P s390_lra_p
-
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE s390_can_eliminate
 
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index a5b5aa95a12..23ea4b8232d 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -229,10 +229,6 @@ Set the branch costs for conditional branch instructions.  
Reasonable
 values are small, non-negative integers.  The default branch cost is
 1.
 
-mlra
-Target Var(s390_lra_flag) Init(1) Save
-Use LRA instead of reload.
-
 mpic-data-is-text-relative
 Target Var(s390_pic_data_is_text_relative) 
Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE)
 Assume data segments are relative to text segment.
diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls
index ab1e761efa8..bc772d2ffc7 100644
--- a/gcc/config/s390/s390.opt.urls
+++ b/gcc/config/s390/s390.opt.urls
@@ -74,8 +74,6 @@ UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch)
 
 ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs
 
-; skipping UrlSuffix for 'mlra' due to finding no URLs
-
 ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs
 
 ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs
-- 
2.45.2

Re: [patch, fortran] Implement IANY, IALL and IPARITY for unsigned

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

this looks fine to me. OK for trunk.

Thanks for the patch,
Andre

On Wed, 18 Sep 2024 22:20:44 +0200
Thomas Koenig  wrote:

> OK for trunk?
>
> This is based on the previous submissions. Again, this does not
> generate a new library version; rather it re-uses the signed
> integer version already present in the library.
>
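Reusing the signed library routines is plausible because AND, OR and XOR
work bit by bit and never look at the sign. A minimal C++ sketch of that
idea follows; the helper names are illustrative only and are not part of
libgfortran.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative stand-in for a signed library reduction such as IALL:
// bitwise AND over all elements.  Not the libgfortran implementation.
std::int32_t iall_signed (const std::vector<std::int32_t> &v)
{
  std::int32_t acc = -1;   // all bits set, the identity for AND
  for (std::int32_t x : v)
    acc &= x;
  return acc;
}

// Hypothetical unsigned entry point that reuses the signed routine by
// reinterpreting the bit patterns, mirroring what the patch describes.
std::uint32_t iall_unsigned (const std::vector<std::uint32_t> &v)
{
  std::vector<std::int32_t> as_signed (v.size ());
  if (!v.empty ())
    std::memcpy (as_signed.data (), v.data (),
                 v.size () * sizeof (std::uint32_t));
  std::int32_t r = iall_signed (as_signed);
  std::uint32_t out;
  std::memcpy (&out, &r, sizeof out);
  return out;
}
```

The same reasoning would apply to OR (IANY) and XOR (IPARITY), with 0 as
the starting value instead of all-ones.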
> OK for trunk?
>
> Previous submissions (without which this will not work):
>
> https://gcc.gnu.org/pipermail/fortran/2024-September/060975.html
> https://gcc.gnu.org/pipermail/fortran/2024-September/060987.html
>
> gcc/fortran/ChangeLog:
>
>   * check.cc (gfc_check_transf_bit_intrins): Handle unsigned.
>   * gfortran.texi: Document IANY, IALL and IPARITY for unsigned.
>   * iresolve.cc (gfc_resolve_iall): Set flag to use integer
>   if type is BT_UNSIGNED.
>   (gfc_resolve_iany): Likewise.
>   (gfc_resolve_iparity): Likewise.
>   * simplify.cc (do_bit_and): Adjust asserts for BT_UNSIGNED.
>   (do_bit_ior): Likewise.
>   (do_bit_xor): Likewise
>
> gcc/testsuite/ChangeLog:
>
>   * gfortran.dg/unsigned_29.f90: New test.
>
>   gcc/fortran/check.cc  | 14 ++-
>   gcc/fortran/gfortran.texi |  1 +
>   gcc/fortran/iresolve.cc   |  6 +--
>   gcc/fortran/simplify.cc   | 51 +++
>   gcc/testsuite/gfortran.dg/unsigned_29.f90 | 40 ++
>   5 files changed, 99 insertions(+), 13 deletions(-)
>   create mode 100644 gcc/testsuite/gfortran.dg/unsigned_29.f90
>
> diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
> index 7c630dd73f4..533c9d7d343 100644
> --- a/gcc/fortran/check.cc
> +++ b/gcc/fortran/check.cc
> @@ -4430,7 +4430,19 @@ gfc_check_mask (gfc_expr *i, gfc_expr *kind)
>   bool
>   gfc_check_transf_bit_intrins (gfc_actual_arglist *ap)
>   {
> -  if (ap->expr->ts.type != BT_INTEGER)
> +  bt type = ap->expr->ts.type;
> +
> +  if (flag_unsigned)
> +{
> +  if (type != BT_INTEGER && type != BT_UNSIGNED)
> + {
> +   gfc_error ("%qs argument of %qs intrinsic at %L must be INTEGER "
> +  "or UNSIGNED", gfc_current_intrinsic_arg[0]->name,
> +  gfc_current_intrinsic, &ap->expr->where);
> +   return false;
> + }
> +}
> +  else if (ap->expr->ts.type != BT_INTEGER)
>   {
> gfc_error ("%qs argument of %qs intrinsic at %L must be INTEGER",
>gfc_current_intrinsic_arg[0]->name,
> diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
> index e5ffe67..3eb8039c09f 100644
> --- a/gcc/fortran/gfortran.texi
> +++ b/gcc/fortran/gfortran.texi
> @@ -2789,6 +2789,7 @@ As of now, the following intrinsics take unsigned
> arguments:
>   @item @code{RANGE}
>   @item @code{TRANSFER}
>   @item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}
> +@item @code{IANY}, @code{IALL} and @code{IPARITY}
>   @end itemize
>   This list will grow in the near future.
>   @c -
> diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc
> index 92a591cf6d7..58a1821ef10 100644
> --- a/gcc/fortran/iresolve.cc
> +++ b/gcc/fortran/iresolve.cc
> @@ -1195,7 +1195,7 @@ gfc_resolve_hypot (gfc_expr *f, gfc_expr *x,
> gfc_expr *y ATTRIBUTE_UNUSED)
>   void
>   gfc_resolve_iall (gfc_expr *f, gfc_expr *array, gfc_expr *dim,
> gfc_expr *mask)
>   {
> -  resolve_transformational ("iall", f, array, dim, mask);
> +  resolve_transformational ("iall", f, array, dim, mask, true);
>   }
>
>
> @@ -1223,7 +1223,7 @@ gfc_resolve_iand (gfc_expr *f, gfc_expr *i,
> gfc_expr *j)
>   void
>   gfc_resolve_iany (gfc_expr *f, gfc_expr *array, gfc_expr *dim,
> gfc_expr *mask)
>   {
> -  resolve_transformational ("iany", f, array, dim, mask);
> +  resolve_transformational ("iany", f, array, dim, mask, true);
>   }
>
>
> @@ -1429,7 +1429,7 @@ gfc_resolve_long (gfc_expr *f, gfc_expr *a)
>   void
>   gfc_resolve_iparity (gfc_expr *f, gfc_expr *array, gfc_expr *dim,
> gfc_expr *mask)
>   {
> -  resolve_transformational ("iparity", f, array, dim, mask);
> +  resolve_transformational ("iparity", f, array, dim, mask, true);
>   }
>
>
> diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
> index e5681c42a48..bd2f6485c95 100644
> --- a/gcc/fortran/simplify.cc
> +++ b/gcc/fortran/simplify.cc
> @@ -3401,9 +3401,20 @@ gfc_simplify_iachar (gfc_expr *e, gfc_expr *kind)
>   static gfc_expr *
>   do_bit_and (gfc_expr *result, gfc_expr *e)
>   {
> -  gcc_assert (e->ts.type == BT_INTEGER && e->expr_type == EXPR_CONSTANT);
> -  gcc_assert (result->ts.type == BT_INTEGER
> -   && result->expr_type == EXPR_CONSTANT);
> +  if (flag_unsigned)
> +{
> +  gcc_assert ((e->ts.type == BT_INTEGER || e->ts.type == BT_UNSIGNED)
> +   && e->expr_type == EXPR_CONSTANT);
> +  gcc_assert ((result->ts.type == BT_INTEGER
> +|| result->ts.type == BT_UNSIGNED)
> +   && result->expr_type == E

Re: [Patch, Fortran] Implement Unsigned for SUM and PRODUCT

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

thanks for the patch. I have one proposal/question and one missing verb (IMO).
Else the patch looks fine to me. Ok for trunk.

> diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
> index 829ab00c665..e5ffe67 100644
> --- a/gcc/fortran/gfortran.texi
> +++ b/gcc/fortran/gfortran.texi
> @@ -2788,7 +2788,7 @@ As of now, the following intrinsics take unsigned
> arguments: @item @code{MVBITS}
>  @item @code{RANGE}
>  @item @code{TRANSFER}
> -@item @code{MATMUL} and @code{DOT_PRODUCT}
> +@item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}

How about sorting those alphabetically and putting each on a separate line?
This might make it more viewable. Just a suggestion.

>  @end itemize
>  This list will grow in the near future.
>  @c -
> diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc
> index 32b31432e58..92a591cf6d7 100644
> --- a/gcc/fortran/iresolve.cc
> +++ b/gcc/fortran/iresolve.cc
> @@ -175,9 +175,11 @@ resolve_bound (gfc_expr *f, gfc_expr *array, gfc_expr
> *dim, gfc_expr *kind,
>  static void
>  resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
> -   gfc_expr *dim, gfc_expr *mask)
> +   gfc_expr *dim, gfc_expr *mask,
> +   bool use_integer = false)
>  {
>const char *prefix;
> +  bt type;
>
>f->ts = array->ts;
>
> @@ -200,9 +202,18 @@ resolve_transformational (const char *name, gfc_expr *f,
> gfc_expr *array, gfc_resolve_dim_arg (dim);
>  }
>
> +  /* For those intrinsic like SUM where we the integer version

There is a verb missing here, IMO. ... where we _use_ the ... ???

> + actually uses unsigned, but we call it as the integer
> + version.  */
> +
> +  if (use_integer && array->ts.type == BT_UNSIGNED)
> +type = BT_INTEGER;
> +  else
> +type = array->ts.type;
> +
>f->value.function.name
>  = gfc_get_string (PREFIX ("%s%s_%c%d"), prefix, name,
> -   gfc_type_letter (array->ts.type),
> +   gfc_type_letter (type),
> gfc_type_abi_kind (&array->ts));
>  }
>
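To illustrate the effect of the new use_integer argument, here is a
small C++ sketch of the name construction. The enum, the type letters
and the fixed "_gfortran_" prefix are assumptions made for the example;
the real code also inserts a variant prefix, which is left out here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical miniature of the basic-type enum; only two values needed.
enum class BasicType { Integer, Unsigned };

// Assumed type letters for illustration; the real gfc_type_letter may
// use different characters.
char type_letter (BasicType t)
{
  return t == BasicType::Integer ? 'i' : 'u';
}

// Build "_gfortran_<name>_<letter><kind>", substituting the integer
// letter for unsigned arrays when use_integer is set, as in the hunk.
std::string library_name (const char *name, BasicType array_type, int kind,
                          bool use_integer = false)
{
  BasicType type = (use_integer && array_type == BasicType::Unsigned)
                     ? BasicType::Integer : array_type;
  char buf[64];
  std::snprintf (buf, sizeof buf, "_gfortran_%s_%c%d", name,
                 type_letter (type), kind);
  return buf;
}
```

With use_integer set, an unsigned kind-4 array resolves to the same name
as the integer version, so no new library entry point is needed.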

Regards and thanks for the patch,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-19 Thread Richard Sandiford

Richard Sandiford  writes:
> Evgeny Karpov  writes:
>> + {
>> +   rtx const_int = imm;
>> +   const_int = XEXP (const_int, 0);
>> +   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
>
> CONST_INTs are shared objects, so we can't modify their value in-place.

Gah, sorry, I misread.  The patch was only modifying the PLUS, which should
be valid.  My comment below still stands though.

> It might be easier to pass base and const_offset from the caller
> (aarch64_expand_mov_immediate).  We are then guaranteed that the
> offset is constant and don't need to worry about the SVE case.
> The new SYM+OFF expression can be calculated using plus_constant.
>
> I think it'd be worth asserting that the offset fits in 32 bits,
> since if by some bug the offset is larger, we'd generate silent
> wrong code (in the sense that the compiler would truncate the offset
> before the assembler sees it).
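As a rough sketch of the suggested check (not the patch's code; the
struct and function names are made up for illustration), the offset
could be validated and split like this:

```cpp
#include <cassert>
#include <cstdint>

// True if OFFSET is representable as a signed 32-bit value.
bool fits_in_32_bits (std::int64_t offset)
{
  return offset >= INT32_MIN && offset <= INT32_MAX;
}

struct SplitOffset
{
  std::int64_t in_symbol;   // part kept in the SYM+OFF expression
  std::int64_t remainder;   // part that has to be added separately
};

// Split OFFSET so that the part attached to the symbol stays below
// 1 << 20 in magnitude, using the same modulus as the quoted hunk.
// Asserting first turns a silent truncation into a visible failure.
SplitOffset split_offset (std::int64_t offset)
{
  assert (fits_in_32_bits (offset));
  std::int64_t low = offset % (INT64_C (1) << 20);
  return { low, offset - low };
}
```

Note that % truncates toward zero in C++, so a negative offset yields a
negative in_symbol part; whether that is what the backend wants is
outside the scope of this sketch.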

Re: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero vector

2024-09-19 Thread Jennifer Schmitz



> On 18 Sep 2024, at 20:33, Richard Sandiford  wrote:
> 
> Jennifer Schmitz  writes:
>> From 05e010a4ad5ef8df082b3e03b253aad85e2a270c Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz 
>> Date: Tue, 17 Sep 2024 00:15:38 -0700
>> Subject: [PATCH] SVE intrinsics: Fold svmul with all-zero operands to zero
>> vector
>> 
>> As recently implemented for svdiv, this patch folds svmul to a zero
>> vector if one of the operands is a zero vector. This transformation is
>> applied if at least one of the following conditions is met:
>> - the first operand is all zeros or
>> - the second operand is all zeros, and the predicate is ptrue or the
>> predication is _x or _z.
>> 
>> In contrast to constant folding, which was implemented in a previous
>> patch, this transformation is applied as soon as one of the operands is
>> a zero vector, while the other operand can be a variable.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
> 
> OK, thanks.
> 
> If you're planning any more work in this area, I think the next logical
> step would be to extend the current folds to all predication types,
> before going on to support other mul/div cases or other operations.
> 
> In principle, the mul and div cases correspond to:
> 
>  if (integer_zerop (op1) || integer_zerop (op2))
>return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
> 
> It would then be up to fold_active_lanes_to(X) to work out how to apply
> predication to X.  The general case would be:
> 
>  - For x predication and unpredicated operations, fold to X.
> 
>  - For m and z, calculate a vector that supplies the values of inactive
>lanes (the first vector argument for m and a zero vector from z).
> 
>- If X is equal to the inactive lanes vector, fold directly to X.
> 
>- Otherwise fold to VEC_COND_EXPR 
Dear Richard,
I pushed it to trunk with 08aba2dd8c9390b6131cca0aac069f97eeddc9d2.
Thank you also for the good suggestion, I will do that. Over the last few
days, I have been working on a patch that folds multiplication by powers
of 2 to left-shifts (svlsl), similar to what is done for division. As I
see it, that is independent of what you proposed, because it changes the
function type. Can I submit it for review before starting on the patch
you suggested?
Best, Jennifer
> 
> Richard
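The predication rules quoted above can be modelled per lane. The
following C++ sketch is only a scalar model with made-up types, not the
aarch64 backend code; in particular, the name fold_active_lanes_to is
borrowed from the suggestion and its signature here is an assumption.

```cpp
#include <cstddef>
#include <vector>

enum class Pred { X, M, Z };   // _x, _m and _z predication

// Per-lane model of folding an operation to the known vector X:
// active lanes take X; inactive lanes take the first vector argument
// for _m, zero for _z, and (as one permitted choice) X itself for _x.
std::vector<int> fold_active_lanes_to (Pred pred,
                                       const std::vector<bool> &pg,
                                       const std::vector<int> &first_arg,
                                       const std::vector<int> &x)
{
  std::vector<int> result (x.size ());
  for (std::size_t i = 0; i < x.size (); ++i)
    {
      if (pg[i] || pred == Pred::X)
        result[i] = x[i];
      else
        result[i] = (pred == Pred::M) ? first_arg[i] : 0;
    }
  return result;
}
```

This also shows why a zero second operand under _m predication does not
simply fold to a zero vector: inactive lanes keep the first argument.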





[PATCH] Always dump generated distance vectors

2024-09-19 Thread Richard Biener

There's special-casing for equal access functions which bypasses
printing the distance vectors.  The following makes sure we print
them always which helps debugging.
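For readers unfamiliar with the term, a distance vector records how many
iterations separate two dependent accesses. A toy C++ sketch for a
single loop with accesses a[i + write_off] and a[i + read_off] (the
function names are made up; this is not the tree-data-ref.cc algorithm):

```cpp
// Closed form: the element written in iteration i is read again
// write_off - read_off iterations later.
int dependence_distance (int write_off, int read_off)
{
  return write_off - read_off;
}

// Brute-force cross-check: smallest positive d below n such that
// iteration d reads the element written by iteration 0, or 0 if none.
int brute_force_distance (int write_off, int read_off, int n)
{
  for (int d = 1; d < n; ++d)
    if (write_off == d + read_off)
      return d;
  return 0;
}
```

For a loop body like a[i + 2] = a[i] + 1, both functions give 2, which
corresponds to a dist_vector of (2) in the dump.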

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-data-ref.cc (build_classic_dist_vector): Move
distance vector dumping to single caller ...
(subscript_dependence_tester): ... here, dumping always
when we succeed computing it.
---
 gcc/tree-data-ref.cc | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 26e6d9a5657..0f173e8803a 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5547,21 +5547,6 @@ build_classic_dist_vector (struct 
data_dependence_relation *ddr,
   DDR_NB_LOOPS (ddr), 0));
 }
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-{
-  unsigned i;
-
-  fprintf (dump_file, "(build_classic_dist_vector\n");
-  for (i = 0; i < DDR_NUM_DIST_VECTS (ddr); i++)
-   {
- fprintf (dump_file, "  dist_vector = (");
- print_lambda_vector (dump_file, DDR_DIST_VECT (ddr, i),
-  DDR_NB_LOOPS (ddr));
- fprintf (dump_file, "  )\n");
-   }
-  fprintf (dump_file, ")\n");
-}
-
   return true;
 }
 
@@ -5673,7 +5658,24 @@ subscript_dependence_tester (struct 
data_dependence_relation *ddr,
 
   compute_subscript_distance (ddr);
   if (build_classic_dist_vector (ddr, loop_nest))
-build_classic_dir_vector (ddr);
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ unsigned i;
+
+ fprintf (dump_file, "(build_classic_dist_vector\n");
+ for (i = 0; i < DDR_NUM_DIST_VECTS (ddr); i++)
+   {
+ fprintf (dump_file, "  dist_vector = (");
+ print_lambda_vector (dump_file, DDR_DIST_VECT (ddr, i),
+  DDR_NB_LOOPS (ddr));
+ fprintf (dump_file, "  )\n");
+   }
+ fprintf (dump_file, ")\n");
+   }
+
+  build_classic_dir_vector (ddr);
+}
 }
 
 /* Returns true when all the access functions of A are affine or
-- 
2.43.0

Re: [PATCH][v2] tree-optimization/116573 - .SELECT_VL for SLP

2024-09-19 Thread Richard Biener

On Thu, 19 Sep 2024, Robin Dapp wrote:

> > On Tue, 17 Sep 2024, Richard Biener wrote:
> >
> > > The following restores the use of .SELECT_VL for testcases where it
> > > is safe to use even when using SLP.  I've for now restricted it
> > > to single-lane SLP plus optimistically allow store-lane nodes
> > > and assume single-lane roots are not widened but at most to
> > > load-lane which should be fine.
> > > 
> > > v2 fixes latent issues in vectorizable_load/store.
> > > 
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > So while this fixes the earlier observed 80 regressions from not using
> > SLP this now introduces many more from the CI (800), all in other
> > scan-assembler tests where after checking a sample of one (sic!)
> > we seem to use .SELECT_VL more now but expect not to.  Unfortunately
> > none of the affected tests are runtime tests but at least for the
> > single test I investigated there is nothing wrong with using .SELECT_VL.
> >
> > I've checked the full CI results and as far as I can see there are no
> > execute fails caused by this patch (I have locally done a full
> > check-gcc as well with a similar result).
> >
> > So I'm asking for explicit approval here.
> >
> > OK for trunk?
> 
> Odd.  With my testing, rv64 only though, I haven't observed any
> additional fallout.  But the CI knows better, usually.
> 
> While I worked on my patch (which ended up looking similar to yours) I also
> noticed that some examples now use SELECT_VL where we didn't before, and,
> they appeared reasonable to me.  Definitely saw no execution failures either.
> 
> So, I'd say let's go ahead.  Once it is in we can deal with the fallout.
> Same as the LOAD_LANES fallout that I wanted to take care of as soon as
> our internal matters permit.
> Thanks for fixing it.

r15-3712-g5e3a4a01785e2d

Richard.

Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-09-19 Thread Marc Poulhiès

Nicolas Boulenguez  writes:

> PR ada/114065
>
> Hello.
> Any news about these patches?

Hello,

Sorry about the delay. Arnaud already replied on BZ, but I'll add a few
remarks.

In 0001-Ada-merge-all-timeval-and-timespec-definitions-and-c.patch:

> -   --  C timeval represent a duration (used in Select for example). This
> -   --  structure is composed of a number of seconds and a number of micro
> -   --  seconds. The timeval structure is not exposed here because its
> -   --  definition is target dependent. Interface to C programs is done via a
> -   --  pointer to timeval structure.
> +   function To_Duration (T : not null access timeval)
> +return System.C_Time.Non_Negative_Duration
> + with Inline
> +   is (System.C_Time.To_Duration (T.all));
> +   --  Deprecated.  Please use C_Time directly.

The aspect "with Inline" is incorrect, it should be last, after the
return expression. The above does not build.

The obvious fix would be:

   function To_Duration (T : not null access timeval)
return System.C_Time.Non_Negative_Duration
   is (System.C_Time.To_Duration (T.all))
 with Inline;

In 0007-Ada-drop-unneeded-darwin-solaris-x32-variants-of-Sys.patch:

> diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
> index 82d01b2..1f339f3 100644
> @@ -2619,7 +2619,7 @@ ifeq ($(strip $(filter-out %x32 linux%,$(target_cpu) 
> $(target_os))),)
>s-mudido.adbs-osinte.adss-osinte.adb -  s-osprim.adb +  s-osprim.adbs-taprop.adbs-tasinf.adss-tasinf.adb @@ -2703,7 +2703,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
>ifeq ($(strip $(filter-out %86,$(target_cpu))),)
>  LIBGNAT_TARGET_PAIRS += \
>s-intman.adb -  s-osprim.adb +  s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
>system.ads
> @@ -2722,7 +2722,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
>ifeq ($(strip $(filter-out %x86_64,$(target_cpu))),)
>  LIBGNAT_TARGET_PAIRS += \
>s-intman.adb -  s-osprim.adb +  s-osprim.adba-exetim.adsa-exetim.adb$(ATOMICS_TARGET_PAIRS) \
> @@ -2769,7 +2769,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
>ifeq ($(strip $(filter-out arm,$(target_cpu))),)
>  LIBGNAT_TARGET_PAIRS += \
>s-intman.adb -  s-osprim.adb +  s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
>$(ATOMICS_BUILTINS_TARGET_PAIRS)
>
> @@ -2782,7 +2782,7 @@ ifeq ($(strip $(filter-out darwin%,$(target_os))),)
>a-nallfl.adss-intman.adbs-dorepr.adb -  s-osprim.adb +  s-osprim.adb$(ATOMICS_TARGET_PAIRS) \
>$(ATOMICS_BUILTINS_TARGET_PAIRS) \
>$(GNATRTL_128BIT_PAIRS)
>
> diff --git a/gcc/ada/libgnat/s-osprim__rtems.adb 
> b/gcc/ada/libgnat/s-osprim__rtems.adb
> index f7b607a..6116345 100644
> --- a/gcc/ada/libgnat/s-osprim__rtems.adb
> +++ b/gcc/ada/libgnat/s-osprim__rtems.adb
> @@ -29,7 +29,7 @@
>  --  
> --
>  
> --
>
> ---  This version is for POSIX-like operating systems
> +--  This version is for POSIX-like operating systems, Darwin and Linux/x32.
>
>  with System.C_Time;

It may be surprising to have the RTEMS file used by other OSes. The
original comment should have mentioned that in the first place, but the
file was only used with RTEMS. With your change, the file is effectively
shared, so it would be best to rename it.

Tests are running and I'll report any issue we may find.

Thank you for your patience,
Marc

Re: [PATCH v11] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-09-19 Thread Arnaud Charlet

> In 0001-Ada-merge-all-timeval-and-timespec-definitions-and-c.patch:
> 
> > -   --  C timeval represent a duration (used in Select for example). This
> > -   --  structure is composed of a number of seconds and a number of micro
> > -   --  seconds. The timeval structure is not exposed here because its
> > -   --  definition is target dependent. Interface to C programs is done via 
> > a
> > -   --  pointer to timeval structure.
> > +   function To_Duration (T : not null access timeval)
> > +return System.C_Time.Non_Negative_Duration
> > + with Inline
> > +   is (System.C_Time.To_Duration (T.all));
> > +   --  Deprecated.  Please use C_Time directly.
> 
> The aspect "with Inline" is incorrect, it should be last, after the
> return expression. The above does not build.
> 
> The obvious fix would be:
> 
>function To_Duration (T : not null access timeval)
> return System.C_Time.Non_Negative_Duration
>is (System.C_Time.To_Duration (T.all))
>  with Inline;

Note that expression functions are already marked inline-unless-impossible,
so you should simply drop the Inline aspect.

> It may be surprising to have the RTEMS file used by other OSes. The
> original comment should have mentioned that in the first place, but the
> file was only used with RTEMS. With your change, the file is effectively
> shared, so it would be best to rename it.

Agreed.

RE: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

2024-09-19 Thread Li, Pan2

Thanks Robin.

> I think those tests don't really need to check for vsetvl anyway.
It looks like scanning the asm only for the RVV fixed-point insn is good
enough for the vector part, which is somewhat different from the scalar
tests. I will make the change after this patch is pushed.
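For context, the scalar shape that such saturating-add tests are
typically built around can be sketched as below. This is an assumption
about the general pattern, not a copy of the test sources.

```cpp
#include <cstdint>
#include <type_traits>

// Branchless unsigned saturating add: if the sum wraps around, the
// comparison yields 1, negation turns it into an all-ones mask, and
// the OR forces the result to the maximum value.
template <typename T>
T sat_u_add (T x, T y)
{
  static_assert (std::is_unsigned<T>::value, "unsigned types only");
  T sum = static_cast<T> (x + y);
  return static_cast<T> (sum | static_cast<T> (-static_cast<T> (sum < x)));
}
```

The vectorized form of this pattern is what would map to a single
saturating vector instruction, independent of how the loop length is set.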

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Thursday, September 19, 2024 9:25 PM
To: gcc-patches 
Cc: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; Li, Pan2 ; rdapp@gmail.com
Subject: [PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

Hi,

this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL.

Assisted by sed and regtested on rv64gcv_zvfh_zvbb.

Rather lengthy but obvious, so going to commit after a while if the CI is
happy.  I think those tests don't really need to check for vsetvl anyway,
not all of them at least but I didn't change that for now.
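As a reminder of what a length-controlled loop looks like, here is a
scalar C++ model. select_vl below is only a stand-in for .SELECT_VL and
simply returns min(remaining, vf); a real target may pick fewer
elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for .SELECT_VL: how many elements this iteration handles.
std::size_t select_vl (std::size_t remaining, std::size_t vf)
{
  return std::min (remaining, vf);
}

// Length-controlled loop: no scalar epilogue, each iteration processes
// 'vl' elements and advances by exactly that amount.
void add_arrays (std::vector<int> &dst, const std::vector<int> &a,
                 const std::vector<int> &b, std::size_t vf)
{
  assert (vf > 0 && a.size () == dst.size () && b.size () == dst.size ());
  std::size_t n = dst.size ();
  for (std::size_t i = 0; i < n; )
    {
      std::size_t vl = select_vl (n - i, vf);
      for (std::size_t lane = 0; lane < vl; ++lane)   // models one vector op
        dst[i + lane] = a[i + lane] + b[i + lane];
      i += vl;
    }
}
```

Because the induction variable advances by vl rather than by a constant,
the generated code sets the vector length inside the loop, which is what
the adjusted scan patterns have to expect.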

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect
length-controlled loop.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.

Re: [PATCH RFC] build: update bootstrap req to C++14

2024-09-19 Thread Jason Merrill


On 9/19/24 7:57 AM, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill  wrote:
>>
>> Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler.  Thoughts?
>
> I'm fine with this in general - do we have needs of bumping the requirement for
> GCC 15 though?  IMO we should bump once we are requiring actual C++14
> in some place.

Jakub's dwarf2asm patch yesterday uses C++14 if available, and I 
remember seeing a couple of other patches that would have been simpler 
with C++14 available.

> As of the version requirement as you say only some minor versions of the GCC 5
> series are OK I would suggest to say we recommend using GCC 6 or later
> but GCC 5.5 should also work?

Aren't we already specifying a minor revision with 4.8.3 for C++11?

Another possibility would be to just say GCC 5, and adjust that upward 
if we run into problems.
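To make the "simpler with C++14" point concrete, here is a generic
sketch (not taken from any of the patches mentioned) of code that has to
carry two variants as long as C++11 is the baseline:

```cpp
#if __cplusplus >= 201402L
// C++14: loops and local variables are allowed in constexpr functions.
constexpr unsigned count_bits (unsigned v)
{
  unsigned n = 0;
  for (; v != 0; v >>= 1)
    n += v & 1u;
  return n;
}
#else
// C++11: the body must be a single return statement, hence recursion.
constexpr unsigned count_bits (unsigned v)
{
  return v == 0 ? 0 : (v & 1u) + count_bits (v >> 1);
}
#endif

static_assert (count_bits (0xF0u) == 4, "evaluated at compile time");
```

With C++14 as the bootstrap requirement, the #else branch and the
conditional could simply be dropped.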


Jason

Re: [Patch] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Hi Andre,

thanks for reading the patch + commenting.

Andre Vehreschild wrote:

> in the changelog of libgomp:
>
> * fortran.c (omp_get_uid_from_device_,
> omp_get_uid_from_device_8_): Add.
>
> "Add." what? Can you be more specific, i.e. is it just a dummy or prototype?


Neither. It is a full implementation (that is a wrapper to the target.c 
function, directly called by C/C++).


The prototype used by fortran.c is 'omp.h.in' (i.e. the C/C++ header 
file, also used by user code) and for Fortran code of users, it is the 
module generated from 'omp_lib.f90.in' and the (deprecated) include file 
'omp_lib.h.in'.


The purpose of fortran.c in general – and also for the added code – is 
to be a wrapper between the Fortran API/ABI and the C ABI. In the 
current case, there are two reasons for the two functions:


(a) The result type is 'character(:), pointer' – but the C function just 
returns a '\0' terminated const char*. Hence, the wrapper function 
contains a '*result_len = strlen (*result);' besides the '*result = 
'


(b) The argument is an 'integer'. As we want to be compatible with 
-fdefault-integer-8, previously somewhat fashionable, we have an 
'int32_t' and an 'int64_t' version of the function – which needs a 
second wrapper function.


As for the other API routine, as a BIND(C) makes it call the C function, 
no wrapper is needed.


* * *

[Typo: missing 'a' – noted + will fix.]

* * *


> +@item The unique identifier (UID), used with OpenMP's API UID routine, consists
> +  of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
> +  the CUDA runtime library.  This UUID is output in grouped lower-case
> +  hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
> +  4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
> +  format used by @code{nvidia-smi}.
>   @end itemize
>
> Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then
> why is the "normal" UUID display format described here? This confuses me. (Just
> curiosity.)


For AMD, it is the following type of string, which contains an 8 bytes/16 
hex-digits UUID part: 'GPU-abcef0123456789'.


While for Nvidia it is 'GPU-abcdef12-1234-1234-01234567890abcd', 
consisting of a 16 bytes/32 hex-digits UUID.


For AMD, we directly get the string, matching what "rocminfo" shows as UUID.

For Nvidia, we don't get a string but a 'char bytes[16]' array filled 
with the values, which we print each as '%02x' hex digit. For the 
output, additionally, a "GPU-" prefix is added + a few hyphens. That's 
to mimic what 'nvidia-smi -a' outputs.
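A minimal C++ sketch of that formatting step, following the 8-4-4-16
grouping given in the quoted documentation text. The function name is
made up, and the actual plugin code may group or store the bytes
differently.

```cpp
#include <cstdio>
#include <string>

// Format 16 raw bytes as "GPU-" followed by 32 lower-case hex digits,
// grouped 8-4-4-16 as the quoted documentation text describes.
// unsigned char is used so that "%02x" never sees a sign-extended value.
std::string format_gpu_uid (const unsigned char (&bytes)[16])
{
  std::string out = "GPU-";
  for (int i = 0; i < 16; ++i)
    {
      char hex[3];
      std::snprintf (hex, sizeof hex, "%02x",
                     static_cast<unsigned> (bytes[i]));
      out += hex;
      // Hyphens after 8, 12 and 16 hex digits, i.e. after bytes 4, 6, 8.
      if (i == 3 || i == 5 || i == 7)
        out += '-';
    }
  return out;
}
```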


I admit it is slightly confusing – and when reading the .texi, it is 
also easy to miss that one part talks about AMD ("GCN") GPUs and the 
other about NVidia GPUs.


→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html

(In terms of OpenMP, it is only a unique identifier; it does not need to 
be universally unique [and also isn't for the host]; AMD and Nvidia call 
it UUID and it looks rather unique for the GPU; rocminfo also outputs an 
"UUID" for the CPU but that's just "CPU-XX" (twice for a dual socket 
system, i.e. not even unique), but we don't use this output.)



> Er, and when I read further on, I find the nvptx implementation and that
> contradicts the description. There a "normal" UUID is added to the GPU- id.


Now I am confused. What description contradicts which one?

Tobias

Re: [PATCH] c++: ICE with structured bindings and m-d array [PR102594]

2024-09-19 Thread Marek Polacek

Ping.

On Thu, Sep 05, 2024 at 06:32:28PM -0400, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?
> 
> -- >8 --
> We ICE in decay_conversion with this test:
> 
>   struct S {
> S() {}
>   };
>   S arr3[1][1];
>   auto [m](arr3);
> 
> But not when the last line is:
> 
>   auto [n] = arr3;
> 
> Therefore the difference is between copy- and direct-init.  In
> particular, in build_vec_init we have:
> 
>   if (direct_init)
> from = build_tree_list (NULL_TREE, from);
> 
> and then we call build_vec_init again with init==from.  Then
> decay_conversion gets the TREE_LIST and it crashes.
> 
> build_aggr_init has:
> 
>   /* Wrap the initializer in a CONSTRUCTOR so that build_vec_init
>  recognizes it as direct-initialization.  */
>   init = build_constructor_single (init_list_type_node,
>NULL_TREE, init);
>   CONSTRUCTOR_IS_DIRECT_INIT (init) = true;
> 
> so I propose to do the same in build_vec_init.
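For reference, the two declaration forms being compared look like this
in isolation. The sketch deliberately uses a one-dimensional int array
so that it also compiles with an unfixed compiler; it only shows the
syntax, not the ICE.

```cpp
// Both declarations copy the array element-wise into a hidden variable
// and bind the names to that copy's elements (C++17).
int sum_direct_init ()
{
  int arr[2] = { 1, 2 };
  auto [a, b](arr);       // direct-initialization form
  return a + b;
}

int sum_copy_init ()
{
  int arr[2] = { 3, 4 };
  auto [c, d] = arr;      // copy-initialization form
  return c + d;
}
```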
> 
>   PR c++/102594
> 
> gcc/cp/ChangeLog:
> 
>   * init.cc (build_vec_init): Build up a CONSTRUCTOR to signal
>   direct-initialization rather than a TREE_LIST.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1z/decomp61.C: New test.
> ---
>  gcc/cp/init.cc|  8 +++-
>  gcc/testsuite/g++.dg/cpp1z/decomp61.C | 28 +++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp61.C
> 
> diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> index be7fdb40dd6..f785015e477 100644
> --- a/gcc/cp/init.cc
> +++ b/gcc/cp/init.cc
> @@ -4958,7 +4958,13 @@ build_vec_init (tree base, tree maxindex, tree init,
> if (xvalue)
>   from = move (from);
> if (direct_init)
> - from = build_tree_list (NULL_TREE, from);
> + {
> +   /* Wrap the initializer in a CONSTRUCTOR so that
> +  build_vec_init recognizes it as direct-initialization.  */
> +   from = build_constructor_single (init_list_type_node,
> +NULL_TREE, from);
> +   CONSTRUCTOR_IS_DIRECT_INIT (from) = true;
> + }
>   }
> else
>   from = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp61.C 
> b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
> new file mode 100644
> index 000..ad0a20c1add
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1z/decomp61.C
> @@ -0,0 +1,28 @@
> +// PR c++/102594
> +// { dg-do compile { target c++17 } }
> +
> +struct S {
> +  S() {}
> +};
> +S arr1[2];
> +S arr2[2][1];
> +S arr3[1][1];
> +auto [m](arr3);
> +auto [n] = arr3;
> +
> +struct X {
> +  int i;
> +};
> +
> +void
> +g (X x)
> +{
> +  auto [a, b](arr2);
> +  auto [c, d] = arr2;
> +  auto [e, f] = (arr2);
> +  auto [i, j](arr1);
> +  auto [k, l] = arr1;
> +  auto [m, n] = (arr1);
> +  auto [z] = x;
> +  auto [y](x);
> +}
> 
> base-commit: b567e5ead5d54f022c57b48f31653f6ae6ece007
> -- 
> 2.46.0
> 

Marek
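
For readers following along: the two forms contrasted in the commit message
are direct-initialization, auto [a, b](arr), and copy-initialization,
auto [a, b] = arr. The sketch below is not the PR's testcase. It assumes a
plain one-dimensional int array, and the function names sum_direct and
sum_copy are made up for illustration. It only shows that both forms bind
to a copy of the array, so writing through a binding leaves the original
untouched.

```cpp
// Requires C++17 (structured bindings).

// Direct-initialization form: the hidden array object is initialized
// element-wise from arr, so a and b name elements of a copy.
int sum_direct()
{
  int arr[2] = {1, 2};
  auto [a, b](arr);
  a = 10;                 // modifies the copy, not arr
  return arr[0] + a + b;  // arr[0] is still 1
}

// Copy-initialization form: same semantics for arrays, different syntax.
int sum_copy()
{
  int arr[2] = {1, 2};
  auto [c, d] = arr;
  c = 10;                 // modifies the copy, not arr
  return arr[0] + c + d;
}
```

The ICE in the PR needs a multi-dimensional array of a class type on top of
the direct-initialization form; that combination is left out here so the
sketch compiles with unpatched compilers too.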

[Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-19 Thread Tobias Burnus


Minor update – addressing the issues that Andre raised (thanks!):

'Add.' → 'New functions.' in the ChangeLog for 'fortran.c'; otherwise,
only libgomp.texi changes:


A bunch of typo fixes (preexisting and in the new text). I also added a
made-up example UUID for the GPUs, which should help to reduce confusion.


Any additional comments or suggestions?

Tobias

Tobias Burnus wrote:
In order to know and potentially re-use a specific offload device
(reproducibility, affinity-wise close to a CPU (socket), …), a mapping
between a (universally?) unique identifier and the OpenMP device number
is useful. Thus, TR13 added support for it.


This is a collateral patch caused by looking at the API routines for
other reasons and looking at that part of the spec during the OpenMP F2F.

Besides the added API routines, the UID will be used elsewhere:
* In context selectors: 'target_device' supports 'uid()'.
* In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.

@Sandra: Besides the usual .texi part, for the 'target_device' trait set:
if you add a new GOMP routine for kind/arch/isa, can you also add a UID
argument so that we don't have to update the API when it is needed in
the not-so-far future?

@Andrew + @Thomas: Any comments? Especially on the nvptx/gcn side
(plugin + .texi)?

@Jakub or anyone else — any comments, suggestions, remarks?

[The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
and seems to work fine.]

OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): New functions.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc   |  4 +-
 include/cuda/cuda.h  |  7 ++
 libgomp/config/gcn/target.c  | 14 
 libgomp/config/nvptx/target.c| 14 
 libgomp/fortran.c| 15 
 libgomp/libgomp-plugin.h |  1 +
 libgomp/libgomp.h|  2 +
 libgomp/libgomp.map  |  8 +++
 libgomp/libgomp.texi | 89 ++--
 libgomp/omp.h.in |  3 +
 libgomp/omp_lib.f90.in   | 23 ++
 libgomp/omp_lib.h.in | 23 ++
 libgomp/plugin/cuda-lib.def  |  2 +
 libgomp/plugin/plugin-gcn.c  | 16 +
 libgomp/plugin/plugin-nvptx.c| 34 +
 libgomp/target.c | 56 +++
 libgomp/testsuite/libgomp.c/device_uid.c | 38 ++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 +++
 18 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
   "alloc",
   "calloc",
   "free",
+  "get_device_from_uid",
   "get_interop_int",
   "get_interop_ptr",
   "get_mapped_ptr",
@@ -3338,12 +3339,13 @

Re: [Patch, Fortran] Implement Unsigned for SUM and PRODUCT

2024-09-19 Thread Thomas Koenig


Am 19.09.24 um 11:55 schrieb Andre Vehreschild:

Hi Thomas,

thanks for the patch. I have one proposal/question and one missing verb (IMO).
Else the patch looks fine to me. Ok for trunk.


diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 829ab00c665..e5ffe67 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2788,7 +2788,7 @@ As of now, the following intrinsics take unsigned
arguments: @item @code{MVBITS}
  @item @code{RANGE}
  @item @code{TRANSFER}
-@item @code{MATMUL} and @code{DOT_PRODUCT}
+@item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}


How about sorting those alphabetically and putting each on a separate line?
This might make it more viewable. Just a suggestion.


I tried to group them somewhat logically, but you're right, this may be
better.  Eventually, I want to document the UNSIGNED arguments to
all intrinsics so they are in the right place.

I think I will re-sort this after all intrinsics have been finished.



  @end itemize
  This list will grow in the near future.
  @c -
diff --git a/gcc/fortran/iresolve.cc b/gcc/fortran/iresolve.cc
index 32b31432e58..92a591cf6d7 100644
--- a/gcc/fortran/iresolve.cc
+++ b/gcc/fortran/iresolve.cc
@@ -175,9 +175,11 @@ resolve_bound (gfc_expr *f, gfc_expr *array, gfc_expr
*dim, gfc_expr *kind,
  static void
  resolve_transformational (const char *name, gfc_expr *f, gfc_expr *array,
- gfc_expr *dim, gfc_expr *mask)
+ gfc_expr *dim, gfc_expr *mask,
+ bool use_integer = false)
  {
const char *prefix;
+  bt type;

f->ts = array->ts;

@@ -200,9 +202,18 @@ resolve_transformational (const char *name, gfc_expr *f,
gfc_expr *array, gfc_resolve_dim_arg (dim);
  }

+  /* For those intrinsic like SUM where we the integer version


There is a verb missing here, IMO. ... where we _use_ the ... ???


This sentence no verb, correct :-)


+ actually uses unsigned, but we call it as the integer
+ version.  */
+
+  if (use_integer && array->ts.type == BT_UNSIGNED)
+type = BT_INTEGER;
+  else
+type = array->ts.type;
+
f->value.function.name
  = gfc_get_string (PREFIX ("%s%s_%c%d"), prefix, name,
- gfc_type_letter (array->ts.type),
+ gfc_type_letter (type),
  gfc_type_abi_kind (&array->ts));
  }



Regards and thanks for the patch,
Andre


Thanks!

Best regards

Thomas
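
As an aside, the effect of the use_integer flag on the generated library
name can be sketched outside the compiler. Everything below is a toy model,
not gfortran code: the BaseType enum, the type_letter helper, the letters it
returns and the "_gfortran_" prefix are assumptions made for illustration,
and the mask prefix from the "%s%s_%c%d" format is left empty.

```cpp
#include <string>

// Toy stand-in for the front end's basic type enumeration.
enum class BaseType { Integer, Unsigned, Real };

// Hypothetical counterpart of gfc_type_letter; the letters are assumed.
char type_letter(BaseType t)
{
  switch (t)
    {
    case BaseType::Integer:  return 'i';
    case BaseType::Unsigned: return 'm';
    case BaseType::Real:     return 'r';
    }
  return '?';
}

// Build "<prefix><name>_<letter><kind>".  With use_integer set, an
// unsigned argument is mapped to the integer variant of the routine,
// mirroring the "type = BT_INTEGER" branch in the quoted patch.
std::string library_name(const std::string &name, BaseType type, int kind,
                         bool use_integer = false)
{
  if (use_integer && type == BaseType::Unsigned)
    type = BaseType::Integer;
  return "_gfortran_" + name + "_" + type_letter(type) + std::to_string(kind);
}
```

With these assumptions, an unsigned kind-4 SUM resolved with use_integer
shares the integer routine's name, while an intrinsic resolved without the
flag keeps a distinct unsigned name.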

[PATCH] tree-optimization/116768 - wrong dependence analysis

2024-09-19 Thread Richard Biener

The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types.  The generic code doesn't seem to handle
loop invariant dependences.  The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.

PR tree-optimization/116768
* tree-data-ref.cc (build_classic_dist_vector_1): Revert
PR101009 change.
* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
and (int)1 compare equal.

* gcc.dg/torture/pr116768.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr116768.c | 32 +
 gcc/tree-chrec.cc   |  4 ++--
 gcc/tree-data-ref.cc|  2 --
 3 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116768.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116768.c 
b/gcc/testsuite/gcc.dg/torture/pr116768.c
new file mode 100644
index 000..57b5d00e7b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116768.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#define numwords 2
+
+typedef struct {
+  unsigned words[numwords];
+} Child;
+
+typedef struct {
+  Child child;
+} Parent;
+
+Parent my_or(Parent x, const Parent *y) {
+  const Child *y_child = &y->child;
+  for (int i = 0; i < numwords; i++) {
+x.child.words[i] |= y_child->words[i];
+  }
+  return x;
+}
+
+int main() {
+  Parent bs[4];
+  __builtin_memset(bs, 0, sizeof(bs));
+
+  bs[0].child.words[0] = 1;
+  for (int i = 1; i <= 3; i++) {
+bs[i] = my_or(bs[i], &bs[i - 1]);
+  }
+  if (bs[2].child.words[0] != 1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index 8b7982a2dbe..9b272074a2e 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -1716,7 +1716,7 @@ eq_evolutions_p (const_tree chrec0, const_tree chrec1)
   || TREE_CODE (chrec0) != TREE_CODE (chrec1))
 return false;
 
-  if (chrec0 == chrec1)
+  if (operand_equal_p (chrec0, chrec1, 0))
 return true;
 
   if (! types_compatible_p (TREE_TYPE (chrec0), TREE_TYPE (chrec1)))
@@ -1743,7 +1743,7 @@ eq_evolutions_p (const_tree chrec0, const_tree chrec1)
  TREE_OPERAND (chrec1, 0));
 
 default:
-  return operand_equal_p (chrec0, chrec1, 0);
+  return false;
 }
 }
 
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 0f173e8803a..de234c65e94 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5223,8 +5223,6 @@ build_classic_dist_vector_1 (struct 
data_dependence_relation *ddr,
  non_affine_dependence_relation (ddr);
  return false;
}
-  else
-   *init_b = true;
 }
 
   return true;
-- 
2.43.0

Re: [PATCH RFC] build: update bootstrap req to C++14

2024-09-19 Thread Xi Ruoyao

On Thu, 2024-09-19 at 10:21 -0400, Jason Merrill wrote:

/* snip */

> Another possibility would be to just say GCC 5, and adjust that upward
> if we run into problems.

I'd note that GCC 5.1 is known to be incapable of bootstrapping recent GCC
releases due to PR 65801.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

2024-09-19 Thread Jonathan Wakely

On Thu, 19 Sept 2024 at 14:12,  wrote:
>
> From: Matthew Malcomson 
>
> Hello, this is an RFC for adding an atomic floating point fetch_add builtin
> (and variants) to GCC.  The atomic fetch_add operation is defined to work
> on the base floating point types in the C++20 standard chapter 31.7.3, and
> extended to work for all cv-unqualified floating point types in C++23
> chapter 33.5.7.4.
>
> Honestly not sure who to Cc, please do point me to someone else if that's
> better.
>
> This is nowhere near complete (for one thing even the tests I've added
> don't fully pass), but I think I have a complete enough idea that it's
> worth checking if this is something that could be agreed on.
>
> As it stands no target except the nvptx backend would natively support
> these operations.
>
> Main questions that I'm looking to resolve with this RFC:
> 1) Would GCC be OK accepting this implementation even though no backend
>would be implementing these yet?
>- AIUI only the nvptx backend could theoretically implement this.
>- Even without a backend implementing it natively, the ability to use
>  this in code (especially libstdc++) enables other compilers to
>  generate better code for GPU's using standard C++.
> 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
>(i.e. a check on the resolved builtin rather than the more user-facing
>one) in order to determine whether floating point atomic fetch_add is
>available.

Yes, if that name is what other compilers will also use (have you
discussed this with Clang?)

It looks like PATCH 5/8 only uses the _fp name for fetch_add though,
and just uses fetch_sub etc. for the other functions, is that a
mistake?

>- N.b. this builtin is actually the builtin working on the "double"

OK, so the library code just calls the generic __atomic_fetch_add that
accepts any types, but then that gets expanded to a more specific form
for float, double etc.?
And the more specific form has to exist at some level, because we need
an extern symbol from libatomic, so either we include the type as an
explicit suffix on the name, or we use some kind of name mangling like
_Z18__atomic_fetch_addPdS_S_, which is obviously nasty.

>  type, one would have to rely on any compilers implementing that
>  particular resolved builtin to also implement the other floating point
>  atomic fetch_add builtins that they would want to support in libstdc++
>  `atomic<[floating_point_type]>::fetch_add`.

This seems a bit concerning. I can imagine somebody implementing these
for float and double first, but leaving long double, _Float64,
_Float32, _Float128 etc. for later. In that case, libstdc++ would not
work if somebody tries to use std::atomic, or whichever
types aren't supported yet. It's OK if we can be *sure* that won't
happen i.e. that Clang will either implement the new built-in for
*all* FP types, or none.
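
For context, where no native instruction (and no dedicated built-in) is
available, a floating-point fetch_add is usually emulated with a
compare-and-swap loop. The sketch below is a generic fallback written
against std::atomic, not the implementation from the patch series;
the name fetch_add_cas is made up for illustration.

```cpp
#include <atomic>

// Emulate fetch_add for a floating-point atomic with a CAS loop.
// Returns the value the object held immediately before the addition.
template <typename T>
T fetch_add_cas(std::atomic<T> &obj, T arg,
                std::memory_order order = std::memory_order_seq_cst)
{
  T expected = obj.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the current value into
  // `expected`, so the sum is recomputed from fresh data each time.
  while (!obj.compare_exchange_weak(expected, expected + arg, order,
                                    std::memory_order_relaxed))
    {
    }
  return expected;
}
```

Because the template works for any type std::atomic accepts, it does not
have the "some FP types but not others" problem; the concern above is only
about relying on a type-specific built-in as the feature check.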

>
> More specific questions about the choice of which builtins to implement and
> whether the types are OK:
> 1) Is it OK to not implement the `__sync_*` versions?
>Since these are deprecated and the `__atomic_*` versions are there to
>match the C/C++ code atomic operations (which is a large part of the
>reason for the new floating point operations).
> 2) Is it OK to not implement all the `fetch_*` operations?
>None of the bitwise operations are specified for C++ and bitwise
>operations are (AIUI) rarely used on floating point values.

That seems OK (entirely correct even) to me.


> 3) Wanting to be able to farm out to libatomic meant that we need constant names
>    for the specialised functions.
>- This led to the naming convention based on floating point type.
>- That naming convention *could* be updated to include the special backend
>  floating point types if needed.  I have not done this mostly because I
>  thought it would not add much, though I have not looked into this very
>  closely.
> 4) Wanting to name the functions based on floating point type rather than size
>meant that the mapping from type passed to the overloaded version to
>specialised builtin was less direct than for the integral versions.
>- Ended up with a hard-coded table in the source to check this.
>    - Would very much like some better approach, not certain what better approach
>      I could find.
>- Will eventually at least share the hard-coded tables (which isn't
>  happening yet because this is at RFC level).
> 5) Are there any other types that I should use?
>Similarly are there any types that I'm trying to use that I shouldn't?
>I *believe* I've implemented all the types that make sense and are
>general builtin types.  Could easily have missed some (e.g. left
>`_Float128x` alone because AIUI it's larger than 128bits which means we
>don't have any other atomic operations on such data), could also easily

That seems like a problem though - it means that GCC could be in
exac

Re: [PATCH RFC] build: update bootstrap req to C++14

2024-09-19 Thread Jakub Jelinek

On Thu, Sep 19, 2024 at 10:21:15AM -0400, Jason Merrill wrote:
> On 9/19/24 7:57 AM, Richard Biener wrote:
> > On Wed, Sep 18, 2024 at 6:22 PM Jason Merrill  wrote:
> > > 
> > > Tested x86_64-pc-linux-gnu with 5.5.0 bootstrap compiler.  Thoughts?
> > 
> > I'm fine with this in general - do we have needs of bumping the requirement for
> > GCC 15 though?  IMO we should bump once we are requiring actual C++14
> > in some place.
> 
> Jakub's dwarf2asm patch yesterday uses C++14 if available, and I remember

And libcpp too.

> seeing a couple of other patches that would have been simpler with C++14
> available.

It was just a few lines and if I removed the now never true
HAVE_DESIGNATED_INITIALIZERS cases, it wouldn't even add any new lines, just
change some to others.  Both of those patches were just minor optimizations,
it is fine if they don't happen during stage1.

We also have some spots with
#if __cpp_inline_variables < 201606L
#else
#endif
conditionals but that doesn't mean we need to bump to C++17.

Sure, bumping the required C++ version means we can remove the corresponding
conditionals, and more importantly stop worrying about working around GCC
4.8.x/4.9 bugs (I think that is actually more important).
The price is no longer being able to use some of the cfarm machines for
testing, or having to use IBM Advanced Toolchain or a hand-built GCC 14 as
the system compiler there.
At some point we certainly want to do that, the question is whether the
benefits right now outweigh the pain.

> > As of the version requirement as you say only some minor versions of the GCC 5
> > series are OK I would suggest to say we recommend using GCC 6 or later
> > but GCC 5.5 should also work?
> 
> Aren't we already specifying a minor revision with 4.8.3 for C++11?
> 
> Another possibility would be to just say GCC 5, and adjust that upward if we
> run into problems.

I think for the oldest supported version we need some CFarm machines around
with that compiler so that all people can actually test issues with it.
Dunno which distros shipped GCC 5 in long term support versions if any and
at which minor those are.

Jakub
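
To illustrate the kind of conditional mentioned above: code can use a newer
language feature when the bootstrap compiler offers it and fall back
otherwise, without raising the minimum standard. This is a generic sketch,
not taken from the GCC sources; the names default_alignment,
have_inline_variables and alignment are invented for the example.

```cpp
// Feature-test macros are simply undefined on compilers lacking the
// feature, and an undefined identifier evaluates to 0 inside #if.
#if __cpp_inline_variables >= 201606L
// C++17 and later: one definition shared across translation units.
inline constexpr int default_alignment = 16;
#else
// Older standards: internal linkage, one copy per translation unit.
static constexpr int default_alignment = 16;
#endif

// Reports which branch was taken, e.g. for a configure-style check.
constexpr bool have_inline_variables()
{
#if __cpp_inline_variables >= 201606L
  return true;
#else
  return false;
#endif
}

int alignment() { return default_alignment; }
```

Raising the required standard lets the #else branch be deleted; keeping the
conditional costs a few lines but keeps older bootstrap compilers usable.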

[PATCH v5] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-19 Thread Marek Polacek

On Tue, Sep 17, 2024 at 12:50:46PM -0400, Jason Merrill wrote:
> On 9/16/24 7:14 PM, Marek Polacek wrote:
> > +/* Mark an explicitly defaulted function FN as =deleted and warn.
> > +   IMPLICIT_FN is the corresponding special member function that
> > +   would have been implicitly declared.  */
> > +
> > +void
> > +maybe_delete_defaulted_fn (tree fn, tree implicit_fn)
> > +{
> > +  if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn))
> > +return;
> > +
> > +  DECL_DELETED_FN (fn) = true;
> > +
> > +  if (!warn_defaulted_fn_deleted)
> > +return;
> 
> The flag shouldn't affect the error cases; I'd drop this check.

Dropped.

> > +  auto_diagnostic_group d;
> > +  const special_function_kind kind = special_function_p (fn);
> > +  tree parmtype
> > += TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn)
> > + ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn)))
> > + : FUNCTION_FIRST_USER_PARMTYPE (fn));
> > +  const bool illformed_p
> > +/* [dcl.fct.def.default] "if F1 is an assignment operator"...  */
> > += (SFK_ASSIGN_P (kind)
> > +   /* "and the return type of F1 differs from the return type of F2"  */
> > +   && (!same_type_p (TREE_TYPE (TREE_TYPE (fn)),
> > +TREE_TYPE (TREE_TYPE (implicit_fn)))
> > +  /* "or F1's non-object parameter type is not a reference,
> > + the program is ill-formed"  */
> > +  || !TYPE_REF_P (parmtype)));
> > +  /* Decide if we want to emit a pedwarn, error, or a warning.  */
> > +  diagnostic_t diag_kind;
> > +  if (cxx_dialect >= cxx20)
> > +diag_kind = illformed_p ? DK_ERROR : DK_WARNING;
> > +  else
> > +diag_kind = DK_PEDWARN;
> 
> Errors should be errors in all standard modes; it doesn't make sense to have
> a softer diagnostic in an older mode when it's ill-formed in all.
> 
> Non-errors should be warnings or pedwarns depending on the standard mode.

Aaah, I misunderstood.  Hopefully I got it right this time. 
 
> > +  /* Don't warn for template instantiations.  */
> > +  if (DECL_TEMPLATE_INSTANTIATION (fn) && diag_kind == DK_WARNING)
> > +return;
> > +
> > +  const char *wmsg;
> > +  switch (kind)
> > +{
> > +case sfk_copy_constructor:
> > +  wmsg = G_("explicitly defaulted copy constructor is implicitly deleted "
> > +   "because its declared type does not match the type of an "
> > +   "implicit copy constructor");
> > +  break;
> > +case sfk_move_constructor:
> > +  wmsg = G_("explicitly defaulted move constructor is implicitly deleted "
> > +   "because its declared type does not match the type of an "
> > +   "implicit move constructor");
> > +  break;
> > +case sfk_copy_assignment:
> > +  wmsg = G_("explicitly defaulted copy assignment operator is implicitly "
> > +   "deleted because its declared type does not match the type "
> > +   "of an implicit copy assignment operator");
> > +  break;
> > +case sfk_move_assignment:
> > +  wmsg = G_("explicitly defaulted move assignment operator is implicitly "
> > +   "deleted because its declared type does not match the type "
> > +   "of an implicit move assignment operator");
> > +  break;
> > +default:
> > +  gcc_unreachable ();
> > +}
> > +  if (emit_diagnostic (diag_kind, DECL_SOURCE_LOCATION (fn),
> > +  OPT_Wdefaulted_function_deleted, wmsg))
> 
> Let's not pass the OPT when DK_ERROR.

Done.

I've added new tests to cover -Wno-defaulted-function-deleted.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

  struct C {
 C(const C&&) = default;
  };

where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.

When the code is ill-formed, we emit an error in all modes.  Otherwise,
we emit a pedwarn in C++17 and a warning in C++20.
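
The policy in the previous paragraph can be written down as a small decision
function. This is only a model of the stated rule, not the code from the
patch: the Diag enum and choose_diag are hypothetical names, and the
standard mode is represented by its year (11, 14, 17, 20, ...).

```cpp
// Severity of the diagnostic for an explicitly defaulted function whose
// type does not match the implicit declaration.
enum class Diag { Error, Pedwarn, Warning };

// ill_formed: the assignment-operator cases that the standard makes
//             ill-formed rather than deleted.
// cxx_std:    language mode as a year, e.g. 17 for C++17.
Diag choose_diag(bool ill_formed, int cxx_std)
{
  if (ill_formed)
    return Diag::Error;  // an error in every standard mode
  // Otherwise the function is just defined as deleted: a plain warning
  // from C++20 on, a pedwarn in earlier modes.
  return cxx_std >= 20 ? Diag::Warning : Diag::Pedwarn;
}
```

The model leaves out the suppression of the plain warning for template
instantiations that appears in the quoted review.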

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here,
leave it to defaulted_late_check.
* cp-tree.h (maybe_delete_defaulted_fn): Declare.
(defaulted_late_check): Add a tristate parameter.
* method.cc (maybe_delete_defaulted_fn): New.
(defaulted_late_check): Add a tristate parameter.  Call
maybe_delete_defaulted_fn instead of giving an error.

gcc/ChangeLog:

*

Re: [PATCH 5/5] arm: [MVE intrinsics] Rework MVE vld/vst intrinsics

2024-09-19 Thread Christophe Lyon


Hi!

I had not noticed that this patch makes gcc.target/arm/pr112337.c fail 
because __builtin_mve_vldrwq_sv4si is no longer available.


Adding this fixes the problem:
diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c 
b/gcc/testsuite/gcc.target/arm/pr112337.c

index 10b7881b9f9..599229c1db0 100644
--- a/gcc/testsuite/gcc.target/arm/pr112337.c
+++ b/gcc/testsuite/gcc.target/arm/pr112337.c
@@ -4,7 +4,9 @@
 /* { dg-add-options arm_v8_1m_mve } */

 #pragma GCC arm "arm_mve_types.h"
-int32x4_t h(void *p) { return __builtin_mve_vldrwq_sv4si(p); }
+#pragma GCC arm "arm_mve.h" false
+
+int32x4_t h(void *p) { return vldrwq_s32(p); }
 void g(int32x4_t);
 void f(int, int, int, short, int *p) {
   int *bias = p;


I hope that's simple enough not to need a v2 of the patch series if 
everything else is OK?


Thanks,

Christophe


On 9/16/24 11:38, Christophe Lyon wrote:

From: Alfie Richards 

Implement the mve vld and vst intrinsics using the MVE builtins framework.

The main part of the patch is to reimplement the vstr/vldr patterns
such that we now have far fewer of them:
- non-truncating stores
- predicated non-truncating stores
- truncating stores
- predicated truncating stores
- non-extending loads
- predicated non-extending loads
- extending loads
- predicated extending loads

This enables us to update the implementation of vld1/vst1 and use the
new vldr/vstr builtins.

The patch also adds support for the predicated vld1/vst1 versions.

2024-09-11  Alfie Richards  
Christophe Lyon  

gcc/

* config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support
for predicated version.
(vst1q_impl): Likewise.
(vstrq_impl): New class.
(vldrq_impl): New class.
(vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
* config/arm/arm-mve-builtins-base.def (vld1q): Add predicated
version.
(vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vst1q): Add predicated version.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
(vrev32q): Update types to float_16.
* config/arm/arm-mve-builtins-base.h (vldrbq): New.
(vldrhq): New.
(vldrwq): New.
(vstrbq): New.
(vstrhq): New.
(vstrwq): New.
* config/arm/arm-mve-builtins-functions.h (memory_vector_mode):
Remove conversion of floating point vectors to integer.
* config/arm/arm-mve-builtins.cc (TYPES_float16): Change to...
(TYPES_float_16): ...this.
(TYPES_float_32): New.
(float16): Change to...
(float_16): ...this.
(float_32): New.
(preds_z_or_none): New.
(function_resolver::check_gp_argument): Add support for _z
predicate.
* config/arm/arm_mve.h (vstrbq): Remove.
(vstrbq_p): Likewise.
(vstrhq): Likewise.
(vstrhq_p): Likewise.
(vstrwq): Likewise.
(vstrwq_p): Likewise.
(vst1q_p): Likewise.
(vld1q_z): Likewise.
(vldrbq_s8): Likewise.
(vldrbq_u8): Likewise.
(vldrbq_s16): Likewise.
(vldrbq_u16): Likewise.
(vldrbq_s32): Likewise.
(vldrbq_u32): Likewise.
(vstrbq_p_s8): Likewise.
(vstrbq_p_s32): Likewise.
(vstrbq_p_s16): Likewise.
(vstrbq_p_u8): Likewise.
(vstrbq_p_u32): Likewise.
(vstrbq_p_u16): Likewise.
(vldrbq_z_s16): Likewise.
(vldrbq_z_u8): Likewise.
(vldrbq_z_s8): Likewise.
(vldrbq_z_s32): Likewise.
(vldrbq_z_u16): Likewise.
(vldrbq_z_u32): Likewise.
(vldrhq_s32): Likewise.
(vldrhq_s16): Likewise.
(vldrhq_u32): Likewise.
(vldrhq_u16): Likewise.
(vldrhq_z_s32): Likewise.
(vldrhq_z_s16): Likewise.
(vldrhq_z_u32): Likewise.
(vldrhq_z_u16): Likewise.
(vldrwq_s32): Likewise.
(vldrwq_u32): Likewise.
(vldrwq_z_s32): Likewise.
(vldrwq_z_u32): Likewise.
(vldrhq_f16): Likewise.
(vldrhq_z_f16): Likewise.
(vldrwq_f32): Likewise.
(vldrwq_z_f32): Likewise.
(vstrhq_f16): Likewise.
(vstrhq_s32): Likewise.
(vstrhq_s16): Likewise.
(vstrhq_u32): Likewise.
(vstrhq_u16): Likewise.
(vstrhq_p_f16): Likewise.
(vstrhq_p_s32): Likewise.
(vstrhq_p_s16): Likewise.
(vstrhq_p_u32): Likewise.
(vstrhq_p_u16): Likewise.
(vstrwq_f32): Likewise.
(vstrwq_s32): Likewise.
(vstrwq_u32): Likewise.
(vstrwq_p_f32): Likewise.
(vstrwq_p_s32): Likewise.
(vstrwq_p_u32): Likewise.
(vst1q_p_u8): Likewise.
(vst1q_p_s8): Likewise.
(vld1q_z_u8): Likewise.
(vld1q_z_s8): Likewise.
(vst1q_p_u16): Likewise.
(vst1q_p_s16): Likewise.
(vld1q_z_u16): Likewise.
(vld1q_z_s16): Likewise

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-19 Thread Martin Storsjö


On Thu, 19 Sep 2024, Richard Sandiford wrote:


Martin Storsjö  writes:

On Thu, 12 Sep 2024, Evgeny Karpov wrote:


The current binutils implementation does not support offset up to 4GB in
IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1MB.
This is related to differences in ELF and COFF relocation records.


Yes, I agree.

But I would not consider this a limitation of the binutils implementation,
this is a limitation of the object file format. It can't be worked around
by inventing your own custom relocations, but should instead be worked around
on the code generation side, to avoid needing such large offsets.

This approach is one such workaround, and quite valid. Another one is to generate extra
symbols to allow addressing anything with a smaller offset.


Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
symbols isn't generally valid for ELF when Y is a global symbol, since
interposition rules, comdat, weak symbols, and various other reasons,
could mean that the local definition of Y isn't the one that gets used.
Does COFF cope with that in some other way?  If not, I would have
expected that there would need to be a fallback path that didn't
involve defining extra symbols.


That's indeed a fair point. COFF doesn't cope with that in other ways - so 
defining such extra symbols to cope for the offsets, for global symbols 
that can be interposed or swapped out at linking stage, would indeed be 
wrong.


In practice, I think it's rare to reference such an interposable symbol 
with an offset overall - even more so to reference it with an offset over 
1 MB.


The practical cases where one mostly runs into the limitation, is when you 
have large sections, and use temporary labels to reference positions 
within those sections. As the temporary labels don't persist into the 
object file, the references against temporary labels end up as against 
section base, plus an offset. And those symbols (the section base) aren't 
global.


The workaround I did for this within LLVM, 
https://github.com/llvm/llvm-project/commit/06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3, 
deals specifically only with temporary symbols.


// Martin
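
As a rough illustration of the 1 MB limit discussed in this thread: a code
generator can test whether a symbol offset is representable before emitting
a symbol+offset reference. The sketch assumes the addend is held in a 21-bit
signed field counted in bytes, which is my reading of the limit described
above; fits_signed and addend_fits_pagebase_rel21 are made-up helper names,
not binutils or LLVM functions.

```cpp
#include <cstdint>

// True if `value` is representable as a two's-complement integer of
// `bits` bits.  Valid for 1 <= bits <= 63.
constexpr bool fits_signed(std::int64_t value, unsigned bits)
{
  return value >= -(std::int64_t{1} << (bits - 1))
         && value <= (std::int64_t{1} << (bits - 1)) - 1;
}

// Assumption: the relocation addend occupies a 21-bit signed byte
// offset, i.e. the range [-1 MiB, 1 MiB - 1].
constexpr bool addend_fits_pagebase_rel21(std::int64_t addend)
{
  return fits_signed(addend, 21);
}
```

When the check fails, the options are the ones named in the thread: reference
the symbol without an offset and add the offset separately, or (for
non-interposable symbols only) introduce an extra symbol closer to the target.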

Re: [patch, fortran] Matmul and dot_product for unsigned

2024-09-19 Thread Andre Vehreschild

Hi Thomas,

unfortunately I have some questions. Most of them are for my understanding.

> diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
> index 66a3635404a..a214b8bc1b3 100644
> --- a/gcc/fortran/arith.cc
> +++ b/gcc/fortran/arith.cc
> @@ -711,17 +711,9 @@ gfc_arith_uminus (gfc_expr *op1, gfc_expr **resultp)
>  case BT_UNSIGNED:
>{
>   if (pedantic)
> -   return ARITH_UNSIGNED_NEGATIVE;
> +   return check_result (ARITH_UNSIGNED_NEGATIVE, op1, result, resultp);

What is the need for this check? ARITH_UNSIGNED_NEGATIVE is, when I read the
code correctly, never triggering an error here. What do I not see?

>
> - arith neg_rc;
>   mpz_neg (result->value.integer, op1->value.integer);
> - neg_rc = gfc_range_check (result);



> diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
> index cfafdb7974f..7c630dd73f4 100644
> --- a/gcc/fortran/check.cc
> +++ b/gcc/fortran/check.cc
> @@ -2804,6 +2804,10 @@ gfc_check_dot_product (gfc_expr *vector_a, gfc_expr
> *vector_b) return false;
>break;
>
> +case BT_UNSIGNED:
> +  /* Check comes later.  */
> +  break;
> +
>  default:
>gfc_error ("%qs argument of %qs intrinsic at %L must be numeric "
>"or LOGICAL", gfc_current_intrinsic_arg[0]->name,
> @@ -2811,6 +2815,14 @@ gfc_check_dot_product (gfc_expr *vector_a, gfc_expr
> *vector_b) return false;
>  }
>
> +  if (gfc_invalid_unsigned_ops (vector_a, vector_b))

I haven't read the proposal (shame on me), but why would one not want to
combine an unsigned vector with a signed one? This all depends on the data type
of the result (variable). So why is this needed here? (I know we don't have the
result available here.) It just feels odd to me.

> +{
> +  gfc_error ("Argument types of %qs intrinsic at %L must match (%s/%s)",
> +  gfc_current_intrinsic, &vector_a->where,
> +  gfc_typename(&vector_a->ts), gfc_typename(&vector_b->ts));
> +   return false;
> +}
> +
>if (!rank_check (vector_a, 0, 1))
>  return false;
>
> @@ -4092,7 +4104,8 @@ gfc_check_matmul (gfc_expr *matrix_a, gfc_expr *matrix_b)
>  }
>
>if ((matrix_a->ts.type == BT_LOGICAL && gfc_numeric_ts (&matrix_b->ts))
> -  || (gfc_numeric_ts (&matrix_a->ts) && matrix_b->ts.type == BT_LOGICAL))
> +  || (gfc_numeric_ts (&matrix_a->ts) && matrix_b->ts.type == BT_LOGICAL)
> +  || gfc_invalid_unsigned_ops (matrix_a, matrix_b))

Same here.

>  {
>gfc_error ("Argument types of %qs intrinsic at %L must match (%s/%s)",
>gfc_current_intrinsic, &matrix_a->where,
> diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
> index 81c641e2322..cef971894ea 100644
> --- a/gcc/fortran/expr.cc
> +++ b/gcc/fortran/expr.cc
> @@ -224,7 +224,19 @@ gfc_get_int_expr (int kind, locus *where, HOST_WIDE_INT value)
>    return p;
>  }
>
> +/* Get a new expression node that is an unsigned constant.  */
>
> +gfc_expr *
> +gfc_get_unsigned_expr (int kind, locus *where, HOST_WIDE_INT value)
> +{
> +  gfc_expr *p;
> +  p = gfc_get_constant_expr (BT_UNSIGNED, kind,
> +  where ? where : &gfc_current_locus);
> +  const wide_int w = wi::shwi (value, kind * BITS_PER_UNIT);
> +  wi::to_mpz (w, p->value.integer, UNSIGNED);
> +
> +  return p;
> +}

Newline please :-)

>  /* Get a new expression node that is a logical constant.  */
>
>  gfc_expr *



> diff --git a/libgfortran/m4/iparm.m4 b/libgfortran/m4/iparm.m4
> index b474620424b..0c4c76c2428 100644
> --- a/libgfortran/m4/iparm.m4
> +++ b/libgfortran/m4/iparm.m4
> @@ -4,7 +4,7 @@ dnl This file is part of the GNU Fortran 95 Runtime Library (libgfortran)
>  dnl Distributed under the GNU GPL with exception.  See COPYING for details.
>  dnl M4 macro file to get type names from filenames
>  define(get_typename2, `GFC_$1_$2')dnl
> -define(get_typename, `get_typename2(ifelse($1,i,INTEGER,ifelse($1,r,REAL,ifelse($1,l,LOGICAL,ifelse($1,c,COMPLEX,ifelse($1,s,UINTEGER,unknown),`$2')')dnl
> +define(get_typename, `get_typename2(ifelse($1,i,INTEGER,ifelse($1,r,REAL,ifelse($1,l,LOGICAL,ifelse($1,c,COMPLEX,ifelse($1,m,UINTEGER,ifelse($1,s,UINTEGER,unknown)),`$2')')dnl

Curiosity killed the cat: So type letters 's' and 'm' both signify an unsigned
integer, right? Is there a notable difference anywhere? I mean, keeping it
simple is usually what we want, and having two type letters with identical
meaning is not simple, right?

>  define(get_arraytype, `gfc_array_$1$2')dnl
>  define(define_type, `dnl
>  ifelse(regexp($2,`^[0-9]'),-1,`dnl
> diff --git a/libgfortran/m4/matmul.m4 b/libgfortran/m4/matmul.m4
> index 7fc1f5fa75f..cd804e8be06 100644
> --- a/libgfortran/m4/matmul.m4
> +++ b/libgfortran/m4/matmul.m4
> @@ -28,6 +28,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.
> If not, see #include '
>
>  include(iparm.m4)dnl
> +ifelse(index(rtype_name,`GFC_INTEGER'),`0',dnl
> +define(`rtype_name',patsubst(rtype_name,`GFC_INTEGER',`GFC_UINTEGER'))dnl
> +define(`rtype',patsubst(

Re: [PATCH] libcpp: Add -Wtrailing-blanks warning

2024-09-19 Thread Jakub Jelinek

On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> On Wed, Sep 18, 2024 at 7:33 PM Jakub Jelinek  wrote:
> >
> > On Wed, Sep 18, 2024 at 06:17:58PM +0100, Richard Sandiford wrote:
> > > +1  I'd much rather learn about this kind of error before the code reaches
> > > a review tool :)
> > >
> > > From a quick check, it doesn't look like Clang has this, so there is no
> > > existing name to follow.
> >
> > I was considering also -Wtrailing-whitespace, but
> > 1) git diff really warns just about trailing spaces/tabs, not form feeds or
> > vertical tabs
> > 2) gcc source contains tons of spots with form feed in it (though,
> > I think pretty much always as the sole character on a line).
> > And not really sure how people use vertical tabs in the source if at all.
> > Perhaps form feed could be not warned if at end of line if it isn't the sole
> > character on a line...
> 
> Generally I like diagnosing this early.  For the above I'd say
> -Wtrailing-whitespace=
> with a set of things to diagnose (and a sane default - just spaces and
> tabs - for
> -Wtrailing-whitespace) would be nice.  As for naming possibly follow the
> is{space,blank,cntrl} character classifications?  If those are a good
> fit, that is.

I think the character classifications risk problems.

space is ' ' '\t' '\n' '\r' '\f' '\v' in the C locale,
blank is ' ' '\t'
cntrl is a lot of chars but not ' '
if we extend by the safe-ctype
vspace '\r' '\n'
nvspace ' ' '\t' '\f' '\v' '\0'
Obviously, we shouldn't look at '\r' and '\n', those aren't trailing
characters, those are line separators.

Would we need to consider all UTF-8 (or EBCDIC-UTF) control characters as
cntrl?
..0009; Control # Cc  [10] ..
000B..000C; Control # Cc   [2] ..
000E..001F; Control # Cc  [18] ..
007F..009F; Control # Cc  [33] ..
00AD  ; Control # Cf   SOFT HYPHEN
061C  ; Control # Cf   ARABIC LETTER MARK
180E  ; Control # Cf   MONGOLIAN VOWEL SEPARATOR
200B  ; Control # Cf   ZERO WIDTH SPACE
200E..200F; Control # Cf   [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
2028  ; Control # Zl   LINE SEPARATOR
2029  ; Control # Zp   PARAGRAPH SEPARATOR
202A..202E; Control # Cf   [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
2060..2064; Control # Cf   [5] WORD JOINER..INVISIBLE PLUS
2065  ; Control # Cn
2066..206F; Control # Cf  [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES
FEFF  ; Control # Cf   ZERO WIDTH NO-BREAK SPACE
FFF0..FFF8; Control # Cn   [9] ..
FFF9..FFFB; Control # Cf   [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
13430..1343F  ; Control # Cf  [16] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE
1BCA0..1BCA3  ; Control # Cf   [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
1D173..1D17A  ; Control # Cf   [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
E ; Control # Cn   
E0001 ; Control # Cf   LANGUAGE TAG
E0002..E001F  ; Control # Cn  [30] ..
E0080..E00FF  ; Control # Cn [128] ..
E01F0..E0FFF  ; Control # Cn [3600] ..

Wonder why anybody would be interested to find just trailing spaces and not
trailing tabs or vice versa, so if we have categories, blank would be one,
then perhaps nvspace as something not including '\0', so just ' ' '\t' '\f'
'\v' and if really needed, control characters with added ' ', but how to
call that and would it really need to parse UTF-8/EBCDIC and look at
pregenerated tables?
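
To make the two ASCII-only categories concrete, here is a minimal sketch of
a per-line check. It is not taken from any patch: the enum, its names and
count_trailing are hypothetical, and it deliberately ignores the
UTF-8/EBCDIC question raised above.

```c
#include <stddef.h>

/* Illustrative categories only; the names are not from any patch.  */
enum trailing_class
{
  TRAILING_BLANK,   /* ' ' and '\t' */
  TRAILING_SPACE    /* ' ', '\t', '\f' and '\v' (nvspace without '\0') */
};

/* Return nonzero if C belongs to the category CLS.  */
static int
in_class (char c, enum trailing_class cls)
{
  if (c == ' ' || c == '\t')
    return 1;
  return cls == TRAILING_SPACE && (c == '\f' || c == '\v');
}

/* Return the number of trailing characters of LINE (LEN bytes, line
   terminator possibly still attached) that belong to CLS.  */
size_t
count_trailing (const char *line, size_t len, enum trailing_class cls)
{
  /* '\n' and '\r' are line separators, not trailing characters.  */
  while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
    len--;

  size_t n = 0;
  while (n < len && in_class (line[len - 1 - n], cls))
    n++;
  return n;
}
```

A line consisting only of a form feed counts as clean under TRAILING_BLANK
and as one trailing character under TRAILING_SPACE, which is where the
"sole character on a line" exception would have to be added.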

Jakub

Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-19 Thread Robin Dapp

> This patch fixes the expected dump-check counts for vector SAT_ADD.  The
> middle-end change raises the match count from 2 to 4.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.

That's OK.  And I think testsuite fixup patches like this you can consider
"obvious" as long as you're sure the underlying reason is understood.
In particular as you have been working in the saturating space for a while now.

So for the future I'd suggest you post those with a remark that you think
they're obvious and going to commit in a day (or some other reasonable
timeframe) if there are no complaints.

-- 
Regards
 Robin

Re: [PATCH][v2] tree-optimization/116573 - .SELECT_VL for SLP

2024-09-19 Thread Robin Dapp

> On Tue, 17 Sep 2024, Richard Biener wrote:
>
> > The following restores the use of .SELECT_VL for testcases where it
> > is safe to use even when using SLP.  I've for now restricted it
> > to single-lane SLP plus optimistically allow store-lane nodes
> > and assume single-lane roots are not widened but at most to
> > load-lane who should be fine.
> > 
> > v2 fixes latent issues in vectorizable_load/store.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> So while this fixes the earlier observed 80 regressions from not using
> SLP this now introduces many more from the CI (800), all in other
> scan-assembler tests where after checking a sample of one (sic!)
> we seem to use .SELECT_VL more now but expect not to.  Unfortunately
> none of the affected tests are runtime tests but at least for the
> single test I investigated there is nothing wrong with using .SELECT_VL.
>
> I've checked the full CI results and as far I can see there are no
> execute fails caused by this patch (I have locally done a full
> check-gcc as well with a similar result).
>
> So I'm asking for explicit approval here.
>
> OK for trunk?

Odd.  With my testing, rv64 only though, I haven't observed any
additional fallout.  But the CI knows better, usually.

While I worked on my patch (which ended up looking similar to yours) I also
noticed that some examples now use SELECT_VL where we didn't before, and,
they appeared reasonable to me.  Definitely saw no execution failures either.

So, I'd say let's go ahead.  Once it is in we can deal with the fallout.
Same as the LOAD_LANES fallout that I wanted to take care of as soon as
our internal matters permit.
Thanks for fixing it.

-- 
Regards
 Robin

Re: [PATCH] RISC-V: Align vconfig for TARGER_SFB_ALU

2024-09-19 Thread Robin Dapp

Hi Dusan,

sorry for the late reply.

>   This patch addresses a missed opportunity to fuse vsetvl_infos.
> Instead of checking whether demands for merging configurations of
> vsetvl_info are all met, the demands are checked individually.
>
>   The case in question occurs because of the conditional move
> instruction which sifive-7, sifive-p400 and sifive-p600 support.
> Firstly, the conditional move generated rearranges the CFG.
> Secondly, because the conditional move generated uses
> the same register in the if_then_else pattern as vsetvli before it
> curr_info and prev_info won't be merged.

Can you elaborate a bit on that?  Rearranging the CFG shouldn't matter
in general and relying on the specific TARGET_SFB_ALU feels overly
specific.
Why does the same register in the if_then_else interfere with vsetvl?

BTW Bohan Lei has since fixed a bug regarding non-RVV uses.  Does the
situation change with that applied?

-- 
Regards
 Robin

Re: [PATCH v2 0/9] SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement

2024-09-19 Thread Richard Sandiford

Evgeny Karpov  writes:
> Hello,
>
> Thank you for reviewing v1!
>
> v2 Changes:
> - Add extra comments and extend patch descriptions.
> - Extract libstdc++ changes to a separate patch.
> - Minor style refactoring based on the reviews.
> - Unify mingw_pe_declare_type for functions and objects.

Thanks for the update.  Aside from the points raised in the discussion
about patches 5, 6, and 9 (and taking into account what you said about
patch 7), the series looks good.

Thanks,
Richard

>
> Regards,
> Evgeny
>
> Evgeny Karpov (9):
>   Support weak references
>   aarch64: Add debugging information
>   aarch64: Add minimal C++ support
>   aarch64: Exclude symbols using GOT from code models
>   aarch64: Multiple adjustments to support the SMALL code model
> correctly
>   aarch64: Use symbols without offset to prevent relocation issues
>   aarch64: Disable the anchors
>   Add LTO support
>   aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT
>
>  gcc/config.gcc|  1 +
>  gcc/config/aarch64/aarch64-coff.h | 32 +++---
>  gcc/config/aarch64/aarch64.cc | 43 ---
>  gcc/config/aarch64/cygming.h  | 69 +--
>  gcc/config/i386/cygming.h | 16 +++
>  gcc/config/i386/i386-protos.h |  2 -
>  gcc/config/mingw/winnt-dll.cc |  4 +-
>  gcc/config/mingw/winnt.cc | 33 ++-
>  gcc/config/mingw/winnt.h  |  7 ++--
>  libiberty/simple-object-coff.c|  4 +-
>  10 files changed, 158 insertions(+), 53 deletions(-)

Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-09-19 Thread Richard Sandiford

Evgeny Karpov  writes:
> In some cases, the alignment can be bigger than BIGGEST_ALIGNMENT.
>
> The issue was detected while building FFmpeg.
> It creates structures, most likely for AVX optimization.
>
> For instance:
> float __attribute__((aligned (32))) large_aligned_array[3];
>
> BIGGEST_ALIGNMENT could be up to 512 bits on x64.
> This patch has been added to cover this case without needing to
> change the FFmpeg code.

What goes wrong if we don't do this?  I'm not sure from the description
whether it's a correctness fix, a performance fix, or whether it's about
avoiding wasted space.

> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-coff.h (ASM_OUTPUT_ALIGNED_LOCAL):
>   Change alignment.
> ---
>  gcc/config/aarch64/aarch64-coff.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-coff.h b/gcc/config/aarch64/aarch64-coff.h
> index 17f346fe540..bf8e30b9c08 100644
> --- a/gcc/config/aarch64/aarch64-coff.h
> +++ b/gcc/config/aarch64/aarch64-coff.h
> @@ -58,6 +58,16 @@
>assemble_name ((FILE), (NAME)),\
>fprintf ((FILE), ",%lu\n", (ROUNDED)))
>  
> +#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
> +  { \
> +unsigned HOST_WIDE_INT rounded = MAX ((SIZE), 1); \
> +unsigned HOST_WIDE_INT alignment = MAX ((ALIGNMENT), BIGGEST_ALIGNMENT); \
> +rounded += (alignment / BITS_PER_UNIT) - 1; \
> +rounded = (rounded / (alignment / BITS_PER_UNIT) \
> +  * (alignment / BITS_PER_UNIT)); \

There's a ROUND_UP macro that could be used here.
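
As a sketch of what that would look like: the open-coded add/divide/multiply
above and a round-up helper compute the same thing. The SKETCH_* macros
below are local stand-ins written for this example (GCC's own ROUND_UP may
be defined differently), and align_bytes stands in for
alignment / BITS_PER_UNIT.

```c
#include <assert.h>

/* Local stand-ins for illustration; GCC's own definitions may differ.  */
#define SKETCH_CEIL(x, y) (((x) + (y) - 1) / (y))
#define SKETCH_ROUND_UP(x, f) (SKETCH_CEIL (x, f) * (f))

/* The open-coded form from the quoted hunk, with SIZE in bytes.  */
unsigned long
round_open_coded (unsigned long size, unsigned long align_bytes)
{
  assert (align_bytes > 0);
  unsigned long rounded = size > 1 ? size : 1;   /* MAX ((SIZE), 1) */
  rounded += align_bytes - 1;
  rounded = rounded / align_bytes * align_bytes;
  return rounded;
}

/* The same computation expressed through a round-up helper.  */
unsigned long
round_with_macro (unsigned long size, unsigned long align_bytes)
{
  assert (align_bytes > 0);
  unsigned long rounded = size > 1 ? size : 1;
  return SKETCH_ROUND_UP (rounded, align_bytes);
}
```

For the 12-byte array from the description and a 32-byte alignment, both
forms give 32.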

Thanks,
Richard


> +ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded); \
> +  }
> +
>  #define ASM_OUTPUT_SKIP(STREAM, NBYTES)  \
>fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))

Re: [PATCH v3] match: Fix A || B not optimized to true when !B implies A [PR114326]

2024-09-19 Thread Konstantinos Eleftheriou

I have sent a new version
(https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663350.html).
I also added :c to the ne operations.
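
For readers following along, the fold in the quoted patch below rests on a
simple fact: when a != b the whole expression is true anyway, and when
a == b the xor is 0. The following is only a brute-force sanity check of
that identity in plain C, not part of the patch; the enum and function names
are made up, and it covers the six ordinary comparisons on a small range of
ints.

```c
#include <stdbool.h>

/* Comparison codes used only by this sketch.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE, CMP_COUNT };

static bool
apply_cmp (enum cmp_code code, int x, int y)
{
  switch (code)
    {
    case CMP_LT: return x < y;
    case CMP_LE: return x <= y;
    case CMP_GT: return x > y;
    case CMP_GE: return x >= y;
    case CMP_EQ: return x == y;
    case CMP_NE: return x != y;
    default: return false;
    }
}

/* Original form: (a ^ b) cmp c || a != b.  */
bool
original_form (enum cmp_code code, int a, int b, int c)
{
  return apply_cmp (code, a ^ b, c) || a != b;
}

/* Folded form: 0 cmp c || a != b.  */
bool
folded_form (enum cmp_code code, int a, int b, int c)
{
  return apply_cmp (code, 0, c) || a != b;
}

/* Return true if both forms agree for every comparison and every
   a, b, c in [-LIMIT, LIMIT].  */
bool
forms_agree (int limit)
{
  for (int code = CMP_LT; code < CMP_COUNT; code++)
    for (int a = -limit; a <= limit; a++)
      for (int b = -limit; b <= limit; b++)
        for (int c = -limit; c <= limit; c++)
          if (original_form ((enum cmp_code) code, a, b, c)
              != folded_form ((enum cmp_code) code, a, b, c))
            return false;
  return true;
}
```

The variant with the extra mask, ((a ^ b) & c) cmp d, follows the same
argument since 0 & c is still 0.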

Thanks,
Konstantinos

On Wed, Sep 18, 2024 at 1:52 PM Richard Biener
 wrote:
>
> On Wed, Sep 18, 2024 at 10:42 AM Konstantinos Eleftheriou
>  wrote:
> >
> > On Mon, Sep 9, 2024 at 3:11 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Aug 29, 2024 at 9:03 AM  wrote:
> > > >
> > > > From: kelefth 
> > > >
> > > > In expressions like (a != b || ((a ^ b) & c) == d) and
> > > > (a != b || (a ^ b) == c), (a ^ b) is folded to false.
> > > > In the equivalent expressions (((a ^ b) & c) == d || a != b) and
> > > > ((a ^ b) == c || a != b) this is not happening.
> > > >
> > > > This patch adds the following simplifications in match.pd:
> > > > ((a ^ b) & c) == d || a != b --> 0 == d || a != b
> > > > (a ^ b) == c || a != b --> 0 == c || a != b
> > > >
> > > > PR tree-optimization/114326
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd: Add two patterns to fold a ^ b to 0, when a == b.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/fold-xor-and-or.c: New test.
> > > > * gcc.dg/tree-ssa/fold-xor-or.c: New test.
> > > >
> > > > Reviewed-by: Christoph Müllner 
> > > > Signed-off-by: Philipp Tomsich 
> > > > Signed-off-by: Konstantinos Eleftheriou 
> > > > 
> > > > ---
> > > >  gcc/match.pd  | 30 ++
> > > >  .../gcc.dg/tree-ssa/fold-xor-and-or.c | 31 +++
> > > >  gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c   | 31 +++
> > > >  3 files changed, 92 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index be211535a49..6bab3cfbde1 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -10727,3 +10727,33 @@ and,
> > > >}
> > > >(if (full_perm_p)
> > > > (vec_perm (op@3 @0 @1) @3 @2))
> > >
> > >> Can you please place those patterns next to related ones?  I suggest
> > >> after (type)([0,1]@a != 0) -> (type)a and before
> > >> /* We can't reassociate at all for saturating types.  */
> >
> > Yes, I will fix that.
> > >
> > >
> > > > +/* ((a ^ b) & c) == d || a != b --> (0 == d || a != b). */
> > >
> > >> The comment indicates == d but you also test for other
> > >> comparison ops.  As far as I can see your testcases also
> > >> only cover ==.
> >
> > I will change "=="  to "cmp" in the comments and add additional testcases.
> > My intention is to cover all operations included in "simple_comparison".
> > >
> > >
> > > > +(for cmp (simple_comparison)
> > > > +  (simplify
> > > > +(bit_ior
> > > > +  (cmp
> > > > +   (bit_and
> > >
> > > This needs :c
> > >
> > > > + (bit_xor @0 @1)
> > >
> > > Likewise.
> >
> > Right, I will fix these cases.
> > >
> > >
> > >> I think you also need :c on the comparison to match
> > >> d == (...).
> >
> > In that case, I would need to handle non-commutative operations
> > (e.g. >) separately, right?
>
> The non-commutative operations can be handled as well, the tree code
> will be inverted accordingly.
>
> Richard.
>
> > >
> > >
> > > > + tree_expr_nonzero_p@2)
> > > > +   @3)
> > > > +  (ne@4 @0 @1))
> > > > +(bit_ior
> > > > +  (cmp
> > > > +   { build_zero_cst (TREE_TYPE (@0)); }
> > > > +   @3)
> > > > +  @4)))
> > > > +
> > > > +/* (a ^ b) == c || a != b --> (0 == c || a != b). */
> > > > +(for cmp (simple_comparison)
> > > > +  (simplify
> > > > +(bit_ior
> > > > +  (cmp
> > > > +   (bit_xor @0 @1)
> > >
> > > similar, :c here and also on the comparison.  Same
> > > question with regard to == c.
> > >
> > > > +   @2)
> > > > +  (ne@3 @0 @1))
> > > > +(bit_ior
> > > > +  (cmp
> > > > +   { build_zero_cst (TREE_TYPE (@0)); }
> > > > +   @2)
> > > > +  @3)))
> > > > \ No newline at end of file
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c 
> > > > b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
> > > > new file mode 100644
> > > > index 000..ec327e62f6e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > > > +
> > > > +int cmp1(int d1, int d2) {
> > > > +  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
> > > > +return 0;
> > > > +  return 1;
> > > > +}
> > > > +
> > > > +int cmp2(int d1, int d2) {
> > > > +  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
> > > > +return 0;
> > > > +  return 1;
> > > > +}
> > > > +
> > > > +typedef unsigned long int uint64_t;
> > > > +
> > > > +int cmp1_64(uint64_t d1, uint64_t d2) {
> > > > +  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
> > > > +return 0;
> > > > +  return 1;
>

[PATCH v2] aarch64: Add fp8 scalar types

2024-09-19 Thread Claudio Bantaloukas


The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
available in arm_neon.h and arm_sve.h as an alias of the same.

This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
represent __mfp8. Conversions to int and other types are disabled via the
TARGET_INVALID_CONVERSION hook.
Additionally, operations that are typically available to integer types are
disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.
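
To illustrate what "opaque" means for user code, here is a portable analogy.
It is not how the patch represents the type (the patch uses an unsigned
INTEGER_TYPE plus the target hooks named above); opaque8_t and its helpers
are hypothetical names used only for this sketch.

```c
/* Hypothetical stand-in, not the patch's representation: wrapping the
   byte in a struct makes a C compiler reject arithmetic and scalar
   conversions on it while still allowing copies and address-of.  */
typedef struct { unsigned char bits; } opaque8_t;

/* Load an opaque value from raw storage.  */
opaque8_t
opaque8_load (const unsigned char *p)
{
  opaque8_t v;
  v.bits = *p;
  return v;
}

/* Store an opaque value to raw storage.  */
void
opaque8_store (unsigned char *p, opaque8_t v)
{
  *p = v.bits;
}

/* None of the following would compile, which mirrors what the hooks
   are meant to reject for __mfp8:
     opaque8_t a, b;
     int i = a;        conversion to int
     a = a + b;        binary operation
     a = -a;           unary operation  */
```

Values can be moved around and passed to functions, but nothing can be
computed with them directly.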

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node
for __mfp8 type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
(aarch64_init_fp8_types): New function to initialise fp8 types and
register with language backends.
* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for
new type.
(aarch64_invalid_conversion): Add function implementing
TARGET_INVALID_CONVERSION hook that blocks conversion to and from the
__mfp8 type.
(aarch64_invalid_unary_op): Add function implementing
TARGET_INVALID_UNARY_OP hook that blocks operations on __mfp8 other than &.
(aarch64_invalid_binary_op): Extend TARGET_INVALID_BINARY_OP hook to
disallow operations on __mfp8 type.
(TARGET_INVALID_CONVERSION): Add define.
(TARGET_INVALID_UNARY_OP): Likewise.
* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8
type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
* config/aarch64/arm_neon.h (mfloat8_t): Add typedef.
* config/aarch64/arm_sve.h (mfloat8_t): Likewise.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling.
* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++.
* gcc.target/aarch64/fp8_scalar_1.c: New tests in C.
* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise.
---
Hi, 
Is this ok for master? I do not have commit rights yet, if ok, can someone 
commit it on my behalf?

Regression tested with aarch64-unknown-linux-gnu.

Compared to V1 of the patch, in version 2:
- mangling for the __mfp8 type was added along with tests
- unneeded comments were removed
- simplified type checks in hooks
- simplified initialization of aarch64_mfp8_type_node
- separated mfloat8_t define from other fp types in arm_sve.h
- C++ tests were moved to g++.target/aarch64
- added more tests around binary operations, function declaration,
  type traits
- added tests exercising loads and stores from floating point registers


Thanks,
Claudio Bantaloukas

 gcc/config/aarch64/aarch64-builtins.cc|  20 +
 gcc/config/aarch64/aarch64.cc |  54 ++-
 gcc/config/aarch64/aarch64.h  |   5 +
 gcc/config/aarch64/arm_neon.h |   2 +
 gcc/config/aarch64/arm_sve.h  |   2 +
 .../g++.target/aarch64/fp8_mangling.C |  44 ++
 .../aarch64/fp8_scalar_typecheck_2.C  | 381 ++
 .../gcc.target/aarch64/fp8_scalar_1.c | 134 ++
 .../aarch64/fp8_scalar_typecheck_1.c  | 356 
 9 files changed, 996 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index eb878b933fe..7d17df05a0f 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -961,6 +961,11 @@ static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE;
 static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE;
 static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE;
 
+/* The user-visible __mfp8 type, and a pointer to that type.  Used
+   across the back-end.  */
+tree aarch64_mfp8_type_node = NULL_TREE;
+tree aarch64_mfp8_ptr_type_node = NULL_TREE;
+
 /* The user-visible __fp16 type, and a pointer to that type.  Used
across the back-end.  */
 tree aarch64_fp16_type_node = NULL_TREE;
@@ -1721,6 +1726,19 @@ aarch64_init_builtin_rsqrt (void)
   }
 }
 
+/* Initialize the backend type that supports the user-visible __mfp8
+   type and its relative pointer type.  */
+
+static void
+aarch64_init_fp8_types (void)
+{
+  aarch64_mfp8_type_node = make_unsigned_type (8);
+  SET_TYPE_MODE (aarch64_mfp8_type_node, QImode);
+
+  lang_hooks.types.register_builtin_type (aarch64_mfp8_type_node, "__mfp8");
+  aarch64_mfp8_ptr_type_node = build_pointer_type (aarch64_mfp8_type_node);
+}
+
 /* Initialize the backend types that support the user-visible __fp16
type, also initialize a pointer to that type, to be used when
fo

[PATCH] RISC-V: testsuite: Fix SELECT_VL SLP fallout.

2024-09-19 Thread Robin Dapp

Hi,

this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL.

Assisted by sed and regtested on rv64gcv_zvfh_zvbb.

Rather lengthy but obvious, so going to commit after a while if the CI is
happy.  I think those tests don't really need to check for vsetvl anyway,
not all of them at least but I didn't change that for now.
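
For context, "length-controlled loop" here means a loop that asks on every
iteration how many elements to handle, rather than running a fixed-width
body plus an epilogue. The following is a scalar sketch of that shape for a
saturating unsigned add; select_vl is a stand-in that simply takes the
minimum of the remaining count and vf, whereas real hardware may pick a
different legal length, and all names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for the per-iteration length choice in this sketch: how
   many elements the current iteration handles, at most VF.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Saturating unsigned add written as a length-controlled loop.
   Returns the number of loop iterations that were needed.  */
size_t
sat_add_loop (unsigned char *out, const unsigned char *x,
              const unsigned char *y, size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iterations = 0;
  for (size_t i = 0; i < n; )
    {
      size_t vl = select_vl (n - i, vf);
      /* One "vector" step over VL elements.  */
      for (size_t k = 0; k < vl; k++)
        {
          unsigned int sum = (unsigned int) x[i + k] + y[i + k];
          out[i + k] = sum > UCHAR_MAX ? UCHAR_MAX : (unsigned char) sum;
        }
      i += vl;
      iterations++;
    }
  return iterations;
}
```

With 10 elements and vf = 4 the loop runs three times (4, 4 and 2
elements) and needs no separate tail loop.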

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Expect
length-controlled loop.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/

Re: [PATCH v2] aarch64: Add fp8 scalar types

2024-09-19 Thread Claudio Bantaloukas





On 9/19/2024 2:18 PM, Kyrylo Tkachov wrote:

Hi Claudio,


On 19 Sep 2024, at 15:09, Claudio Bantaloukas  
wrote:


The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
available in arm_neon.h and arm_sve.h as an alias of the same.

This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
represent __mfp8. Conversions to int and other types are disabled via the
TARGET_INVALID_CONVERSION hook.
Additionally, operations that are typically available to integer types are
disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node
for __mfp8 type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
(aarch64_init_fp8_types): New function to initialise fp8 types and
register with language backends.
* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for
new type.
(aarch64_invalid_conversion): Add function implementing
TARGET_INVALID_CONVERSION hook that blocks conversion to and from the
__mfp8 type.
(aarch64_invalid_unary_op): Add function implementing
TARGET_INVALID_UNARY_OP hook that blocks operations on __mfp8 other than &.
(aarch64_invalid_binary_op): Extend TARGET_INVALID_BINARY_OP hook to
disallow operations on __mfp8 type.
(TARGET_INVALID_CONVERSION): Add define.
(TARGET_INVALID_UNARY_OP): Likewise.
* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8
type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
* config/aarch64/arm_neon.h (mfloat8_t): Add typedef.
* config/aarch64/arm_sve.h (mfloat8_t): Likewise.


Looks like this typedef is a good candidate to go into arm_private_fp8.h so 
that arm_neon.h, arm_sve.h and arm_sme.h inherit it.


Hi Kyrill,
thanks for the quick review. The thought of using arm_private_fp8.h 
crossed my mind but I thought that ultimately it made more sense to 
follow existing practice and place the typedef near existing ones for 
bfloat types.
If you feel strongly about this, I'll make the suggested change, but I'd 
rather keep it as is. As you can see, the rest of the patch borrows 
heavily in style from the bfloat implementation and my hope is that the 
closeness in code will aid in maintainability.


Let me know :)

Cheers,
Claudio



Thanks,
Kyrill




gcc/testsuite/ChangeLog:

* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling.
* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++.
* gcc.target/aarch64/fp8_scalar_1.c: New tests in C.
* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise.
---
Hi,
Is this ok for master? I do not have commit rights yet, if ok, can someone 
commit it on my behalf?

Regression tested with aarch64-unknown-linux-gnu.

Compared to V1 of the patch, in version 2:
- mangling for the __mfp8 type was added along with tests
- unneeded comments were removed
- simplified type checks in hooks
- simplified initialization of aarch64_mfp8_type_node
- separated mfloat8_t define from other fp types in arm_sve.h
- C++ tests were moved to g++.target/aarch64
- added more tests around binary operations, function declaration,
  type traits
- added tests exercising loads and stores from floating point registers


Thanks,
Claudio Bantaloukas

gcc/config/aarch64/aarch64-builtins.cc|  20 +
gcc/config/aarch64/aarch64.cc |  54 ++-
gcc/config/aarch64/aarch64.h  |   5 +
gcc/config/aarch64/arm_neon.h |   2 +
gcc/config/aarch64/arm_sve.h  |   2 +
.../g++.target/aarch64/fp8_mangling.C |  44 ++
.../aarch64/fp8_scalar_typecheck_2.C  | 381 ++
.../gcc.target/aarch64/fp8_scalar_1.c | 134 ++
.../aarch64/fp8_scalar_typecheck_1.c  | 356 
9 files changed, 996 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_mangling.C
create mode 100644 gcc/testsuite/g++.target/aarch64/fp8_scalar_typecheck_2.C
create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/fp8_scalar_typecheck_1.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index eb878b933fe..7d17df05a0f 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -961,6 +961,11 @@ static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE;
static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE;
static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE;

+/* The user-visible __mfp8 type, and a pointer to that type.  Used
+   across the back-end.  */
+tree aarch64_mfp8_

Re: [PATCH v5 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-19 Thread Jeff Law





On 9/19/24 4:11 AM, Li, Pan2 wrote:
>> So for the future I'd suggest you post those with a remark that you think
>> they're obvious and that you're going to commit them in a day (or some
>> other reasonable timeframe) if there are no complaints.
>
> Oh, I see. Thanks Robin for the reminder.
>
> That would be perfect. Do you have any best practices for the remark "obvious"?
> Like [NFC] in the subject to hint at no functional change; maybe [TBO] could
> stand for to-be-obvious, or something like that.

Typically we say something like "pushing as obvious".

jeff

Re: [PATCH] s390: Remove -m{,no-}lra option

2024-09-19 Thread Sam James

Stefan Schulze Frielinghaus  writes:

> I had missed the two test cases; I have now removed them since they
> depend on -mno-lra.

Can't approve but it looks right. Thanks for handling it, especially so quickly!

>
> -- 8< --
>
> Since the old reload pass is about to be removed and we defaulted to LRA
> for over a decade, remove option -m{,no-}lra.
>
> PR target/113953
>
> gcc/ChangeLog:
>
>   * config/s390/s390.cc (s390_lra_p): Remove.
>   (TARGET_LRA_P): Remove.
>   * config/s390/s390.opt (mlra): Remove.
>   * config/s390/s390.opt.urls (mlra): Remove.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/s390/TI-constants-nolra.c: Removed.
>   * gcc.target/s390/pr79895.c: Removed.
> ---
>  gcc/config/s390/s390.cc   | 10 
>  gcc/config/s390/s390.opt  |  4 --
>  gcc/config/s390/s390.opt.urls |  2 -
>  .../gcc.target/s390/TI-constants-nolra.c  | 47 ---
>  gcc/testsuite/gcc.target/s390/pr79895.c   |  9 
>  5 files changed, 72 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
>  delete mode 100644 gcc/testsuite/gcc.target/s390/pr79895.c
>
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index c9172d1153a..25d43ae3e13 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -11342,13 +11342,6 @@ s390_can_change_mode_class (machine_mode from_mode,
>return true;
>  }
>  
> -/* Return true if we use LRA instead of reload pass.  */
> -static bool
> -s390_lra_p (void)
> -{
> -  return s390_lra_flag;
> -}
> -
>  /* Return true if register FROM can be eliminated via register TO.  */
>  
>  static bool
> @@ -18444,9 +18437,6 @@ s390_c_mode_for_floating_type (enum tree_index ti)
>  #undef TARGET_LEGITIMATE_CONSTANT_P
>  #define TARGET_LEGITIMATE_CONSTANT_P s390_legitimate_constant_p
>  
> -#undef TARGET_LRA_P
> -#define TARGET_LRA_P s390_lra_p
> -
>  #undef TARGET_CAN_ELIMINATE
>  #define TARGET_CAN_ELIMINATE s390_can_eliminate
>  
> diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
> index a5b5aa95a12..23ea4b8232d 100644
> --- a/gcc/config/s390/s390.opt
> +++ b/gcc/config/s390/s390.opt
> @@ -229,10 +229,6 @@ Set the branch costs for conditional branch instructions.  Reasonable
>  values are small, non-negative integers.  The default branch cost is
>  1.
>  
> -mlra
> -Target Var(s390_lra_flag) Init(1) Save
> -Use LRA instead of reload.
> -
>  mpic-data-is-text-relative
>  Target Var(s390_pic_data_is_text_relative) Init(TARGET_DEFAULT_PIC_DATA_IS_TEXT_RELATIVE)
>  Assume data segments are relative to text segment.
> diff --git a/gcc/config/s390/s390.opt.urls b/gcc/config/s390/s390.opt.urls
> index ab1e761efa8..bc772d2ffc7 100644
> --- a/gcc/config/s390/s390.opt.urls
> +++ b/gcc/config/s390/s390.opt.urls
> @@ -74,8 +74,6 @@ UrlSuffix(gcc/S_002f390-and-zSeries-Options.html#index-mzarch)
>  
>  ; skipping UrlSuffix for 'mbranch-cost=' due to finding no URLs
>  
> -; skipping UrlSuffix for 'mlra' due to finding no URLs
> -
>  ; skipping UrlSuffix for 'mpic-data-is-text-relative' due to finding no URLs
>  
>  ; skipping UrlSuffix for 'mindirect-branch=' due to finding no URLs
> diff --git a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c b/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
> deleted file mode 100644
> index b9948fc4aa5..000
> --- a/gcc/testsuite/gcc.target/s390/TI-constants-nolra.c
> +++ /dev/null
> @@ -1,47 +0,0 @@
> -/* { dg-do compile { target int128 } } */
> -/* { dg-options "-O3 -mno-lra" } */
> -
> -/* 2x lghi */
> -__int128 a() {
> -  return 0;
> -}
> -
> -/* 2x lghi */
> -__int128 b() {
> -  return -1;
> -}
> -
> -/* 2x lghi */
> -__int128 c() {
> -  return -2;
> -}
> -
> -/* lghi + llilh */
> -__int128 d() {
> -  return 16000 << 16;
> -}
> -
> -/* lghi + llihf */
> -__int128 e() {
> -  return (unsigned long long)8 << 32;
> -}
> -
> -/* lghi + llihf */
> -__int128 f() {
> -  return (unsigned __int128)8 << 96;
> -}
> -
> -/* llihf + llihf - this is handled via movti_bigconst pattern */
> -__int128 g() {
> -  return ((unsigned __int128)8 << 96) | ((unsigned __int128)8 << 32);
> -}
> -
> -/* Literal pool */
> -__int128 h() {
> -  return ((unsigned __int128)8 << 32) | 1;
> -}
> -
> -/* Literal pool */
> -__int128 i() {
> -  return (((unsigned __int128)8 << 32) | 1) << 64;
> -}
> diff --git a/gcc/testsuite/gcc.target/s390/pr79895.c b/gcc/testsuite/gcc.target/s390/pr79895.c
> deleted file mode 100644
> index 02374e4b8a8..000
> --- a/gcc/testsuite/gcc.target/s390/pr79895.c
> +++ /dev/null
> @@ -1,9 +0,0 @@
> -/* { dg-do compile { target int128 } } */
> -/* { dg-options "-O1 -mno-lra" } */
> -
> -unsigned __int128 g;
> -void
> -foo ()
> -{
> -  g = (unsigned __int128)1 << 127;
> -}

Re: [patch, fortran] Add random numbers and fix some bugs.

2024-09-19 Thread Thomas Koenig


On 19.09.24 at 12:16, Andre Vehreschild wrote:

Hi Thomas,

submitting your patch as part of the mail got it corrupted by some mailer
adding line breaks. It does not apply for me. Because I can't test it, I have
more questions, see below:


I have attached it.



On Wed, 18 Sep 2024 22:22:15 +0200
Thomas Koenig  wrote:


This patch adds random number support for UNSIGNED, plus fixes
two bugs, with array I/O where the type used to be set to BT_INTEGER,
and for division with the divisor being a constant.

Again, depends on previous submissions.

OK for trunk?

gcc/fortran/ChangeLog:

* check.cc (gfc_check_random_number): Adjust for unsigned.
* iresolve.cc (gfc_resolve_random_number): Handle unsinged.


Hihi, I do this typo, too, over and over again: s/unsinged/unsigned/


Yep :-) It's like it is burned into my fingers or something.


* trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
* trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
* gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.






diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 533c9d7d343..1851cfb8d4a 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable,
gfc_expr *image_distinct)
   bool
   gfc_check_random_number (gfc_expr *harvest)
   {
-  if (!type_check (harvest, 0, BT_REAL))
-return false;
+  if (flag_unsigned)
+{
+  if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED))
+   return false;


When the second argument is a BT_INTEGER, does this fail here?


As it should.  RANDOM_NUMBER usually is for REALs only.  I thought
it an obvious idea to extend it to unsigned integers, but only got
the idea after the document was finalized, so I'm implementing
it anyway.


+}
+  else
+if (!type_check (harvest, 0, BT_REAL))
+  return false;

 if (!variable_check (harvest, 0, false))
   return false;




Best regards

Thomas
From 898be1e536614f6a8eb2cb59c3dbcd8277922d8f Mon Sep 17 00:00:00 2001
From: Thomas Koenig 
Date: Wed, 18 Sep 2024 22:02:03 +0200
Subject: [PATCH 2/2] Add random numbers and fix some bugs.

This patch adds random number support for UNSIGNED, plus fixes
two bugs, with array I/O where the type used to be set to BT_INTEGER,
and for division with the divisor being a constant.

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_random_number): Adjust for unsigned.
	* iresolve.cc (gfc_resolve_random_number): Handle unsigned.
	* trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
	* trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
	* gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.

libgfortran/ChangeLog:

	* gfortran.map: Add _gfortran_random_m1, _gfortran_random_m2,
	_gfortran_random_m4, _gfortran_random_m8 and _gfortran_random_m16.
	* intrinsics/random.c (random_m1): New function.
	(random_m2): New function.
	(random_m4): New function.
	(random_m8): New function.
	(random_m16): New function.
	(arandom_m1): New function.
	(arandom_m2): New function.
	(arandom_m4): New function.
	(arandom_m8): New function.
	(arandom_m16): New function.

gcc/testsuite/ChangeLog:

	* gfortran.dg/unsigned_30.f90: New test.
---
 gcc/fortran/check.cc  |  10 +-
 gcc/fortran/gfortran.texi |   1 +
 gcc/fortran/iresolve.cc   |   6 +-
 gcc/fortran/trans-expr.cc |   4 +-
 gcc/fortran/trans-types.cc|   7 +-
 gcc/testsuite/gfortran.dg/unsigned_30.f90 |  63 
 libgfortran/gfortran.map  |  10 +
 libgfortran/intrinsics/random.c   | 440 ++
 8 files changed, 534 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/unsigned_30.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 533c9d7d343..1851cfb8d4a 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -7007,8 +7007,14 @@ gfc_check_random_init (gfc_expr *repeatable, gfc_expr *image_distinct)
 bool
 gfc_check_random_number (gfc_expr *harvest)
 {
-  if (!type_check (harvest, 0, BT_REAL))
-    return false;
+  if (flag_unsigned)
+    {
+      if (!type_check2 (harvest, 0, BT_REAL, BT_UNSIGNED))
+	return false;
+    }
+  else
+    if (!type_check (harvest, 0, BT_REAL))
+      return false;
 
   if (!variable_check (harvest, 0, false))
     return false;
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 3eb8039c09f..a5ebadff3bb 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2790,6 +2790,7 @@ As of now, the following intrinsics take unsigned arguments:
 @item @code{TRANSFER}
 @item @code{SUM}, @code{PRODUCT}, @code{MATMUL} and @code{DOT_PRODUCT}
 @item @code{IANY}, @code{IALL} and @code{IPARITY}
+@item @code{RANDOM_NUMBER}.
 @end itemize
 This list will grow in the near future.
 @c -
diff --git a/gcc/fortran/iresolve.cc b
