date:20240107

Re: [PATCH 3/8] libgomp: runtime support for target_device selector

2024-01-07 Thread Tobias Burnus


Tobias Burnus wrote:

Sandra Loosemore wrote:

From: Kwok Cheung Yeung

This patch implements the libgomp runtime support for the dynamic
target_device selector via the GOMP_evaluate_target_device function.


...


+GOMP_evaluate_target_device (int device_num, const char *kind,
+const char *arch, const char *isa)
+{
+  bool result = true;
+
+  if (device_num < 0)
+device_num = omp_get_default_device ();
+
+  if (kind && strcmp (kind, "any") == 0)
+kind = NULL;


I wonder whether we shouldn't be able to do an early return here,
given that:

"If trait-property 'any' is specified in the 'kind' trait-selector of 
the device selector set or the target_device selector sets, no other 
trait-property may be specified in the same selector set."


[From "Restrictions to context selectors are as follows:", here quoting 
TR12]
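
A minimal sketch of the early return suggested above, assuming the restriction quoted from TR12 has already been enforced before the runtime is called. The names `evaluate_target_device_sketch` and `device_matches` are hypothetical stand-ins, and the device_num handling is left out; this is not the libgomp implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-device trait check.  This sketch
   pretends the selected device is an x86_64 CPU.  */
static bool
device_matches (const char *kind, const char *arch, const char *isa)
{
  if (kind && strcmp (kind, "cpu") != 0)
    return false;
  if (arch && strcmp (arch, "x86_64") != 0)
    return false;
  (void) isa;  /* ISA matching is left out of this sketch.  */
  return true;
}

/* Sketch of an evaluation routine that returns early for kind "any".  */
static bool
evaluate_target_device_sketch (const char *kind, const char *arch,
                               const char *isa)
{
  if (kind && strcmp (kind, "any") == 0)
    {
      /* Assumption: any other trait-property next to "any" has
         already been rejected at compile time.  */
      assert (arch == NULL && isa == NULL);
      return true;
    }
  return device_matches (kind, arch, isa);
}
```

Whether the early return is valid depends on the reading of the restriction; if arch or isa can still reach the runtime together with "any", the assertion above would have to become a regular check.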


Tobias

[patch,testsuite,applied] PR52641: Fix more fallout from sloppy tests.

2024-01-07 Thread Georg-Johann Lay


This patch rectifies more tests that make assumptions on
sizeof(int), sizeof(void*), etc.
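
As an illustration of the kind of change involved, here is a small sketch (not taken from the patch) that avoids assuming a 32-bit int or a pointer-sized long by using the type macros GCC and compatible compilers predefine. The typedef and function names are made up for the example.

```c
#include <assert.h>

/* Predefined by GCC and compatible compilers; no header is needed.  */
typedef __INT32_TYPE__ int32_sketch_t;      /* exactly 32 bits wide */
typedef __UINTPTR_TYPE__ uintptr_sketch_t;  /* wide enough for a pointer */

/* 100000 does not fit a 16-bit int, so the arithmetic is done in the
   32-bit type instead of plain int.  */
static int32_sketch_t
scale (int32_sketch_t x)
{
  return x * 100000;
}

/* Round-trip a pointer through an integer of matching width, which
   avoids -Wpointer-to-int-cast and -Wint-to-pointer-cast.  */
static int
pointer_round_trips (void *p)
{
  uintptr_sketch_t bits = (uintptr_sketch_t) p;
  return (void *) bits == p;
}
```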

Johann

--

testsuite/52641: Fix fallout from sloppy tests.

gcc/testsuite/
PR testsuite/52641
* gcc.dg/memchr-3.c [avr]: Anticipate -Wbuiltin-declaration-mismatch.
* gcc.dg/pr103207.c: Use __INT32_TYPE__ instead of int.
* gcc.dg/pr103451.c [void* != long]: Anticipate -Wpointer-to-int-cast.
* gcc.dg/pr110496.c [void* != long]: Anticipate -Wint-to-pointer-cast.
* gcc.dg/pr109977.c: Use __SIZEOF_DOUBLE__ instead of 8.
* gcc.dg/pr110506-2.c: Use __UINT32_TYPE__ for uint32_t.
* gcc.dg/pr110582.c: Require int32plus.
* gcc.dg/pr111039.c: [sizeof(int) < 4]: Use __INT32_TYPE__.
* gcc.dg/pr111599.c: Same.
* gcc.dg/builtin-dynamic-object-size-0.c: Require size20plus.
* gcc.dg/builtin-object-size-1.c [avr]: Skip tests with strndup.
* gcc.dg/builtin-object-size-2.c: Same.
* gcc.dg/builtin-object-size-3.c: Same.
* gcc.dg/builtin-object-size-4.c: Same.
* gcc.dg/pr111070.c: Use __UINTPTR_TYPE__ instead of unsigned long.
* gcc.dg/debug/btf/btf-pr106773.c: Same.
* gcc.dg/debug/btf/btf-bitfields-2.c: [sizeof(int) < 4]: Use
__UINT32_TYPE__.

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 07e3da6f254..c3ac6230d4d 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
+/* { dg-require-effective-target size20plus } */
 
 #include "builtin-object-size-common.h"
 
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
index db325801f93..64c4bc4da39 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
@@ -621,6 +621,7 @@ test10 (void)
 }
 }
 
+#ifndef __AVR__ /* avr has no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -708,6 +709,7 @@ test11 (void)
 FAIL ();
   free (res);
 }
+#endif /* avr */
 
 int
 main (void)
@@ -724,6 +726,8 @@ main (void)
   test8 ();
   test9 (1);
   test10 ();
+#ifndef __AVR__ /* avr has no strndup */
   test11 ();
+#endif
   DONE ();
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-2.c b/gcc/testsuite/gcc.dg/builtin-object-size-2.c
index 4c71b1f6a37..da10b6b0632 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-2.c
@@ -536,6 +536,7 @@ test8 (unsigned cond)
 #endif
 }
 
+#ifndef __AVR__ /* avr has no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -623,6 +624,7 @@ test9 (void)
 FAIL ();
   free (res);
 }
+#endif /* avr */
 
 int
 main (void)
@@ -637,6 +639,8 @@ main (void)
   test6 ();
   test7 ();
   test8 (1);
+#ifndef __AVR__ /* avr has no strndup */
   test9 ();
+#endif
   DONE ();
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
index 3d907ef4814..f23873bec38 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
@@ -628,6 +628,7 @@ test10 (void)
 }
 }
 
+#ifndef __AVR__ /* avr has no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -716,6 +717,7 @@ test11 (void)
 FAIL ();
   free (res);
 }
+#endif /* avr */
 
 int
 main (void)
@@ -732,6 +734,8 @@ main (void)
   test8 ();
   test9 (1);
   test10 ();
+#ifndef __AVR__ /* avr has no strndup */
   test11 ();
+#endif
   DONE ();
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
index c9af07499a4..dcb042f34b6 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
@@ -509,6 +509,7 @@ test8 (unsigned cond)
 #endif
 }
 
+#ifndef __AVR__ /* avr has no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -596,6 +597,7 @@ test9 (void)
 FAIL ();
   free (res);
 }
+#endif /* avr */
 
 int
 main (void)
@@ -610,6 +612,8 @@ main (void)
   test6 ();
   test7 ();
   test8 (1);
+#ifndef __AVR__ /* avr has no strndup */
   test9 ();
+#endif
   DONE ();
 }
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c b/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c
index 03c323a6d49..2ec00dc6796 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-bitfields-2.c
@@ -18,6 +18,10 @@
 /* Only 2 members.  */
 /* { dg-final { scan-assembler-times "MEMBER" 2 } } */
 
+#if __SIZEOF_INT__ < 4
+#define unsigned __UINT32_TYPE__
+#endif
+
 struct foo
 {
   unsigned a : 31;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-pr106773.c b/gcc/testsuite/gcc.dg/debug/btf/btf-pr106773.c
index f90fa773a4b..511a54f800d 100644
--- a/gcc/testsuite/gcc

Re: [PATCH 1/8] OpenMP: metadirective tree data structures and front-end interfaces

2024-01-07 Thread Tobias Burnus


Hi Sandra,

Tobias Burnus wrote:

(I now have an errand to run - and will continue later with the review.)


First, something a bit unrelated to this patch but affecting related 
code (quoting old, existing code):


int   
omp_context_selector_matches (tree ctx)

...

case OMP_TRAIT_DEVICE_KIND:
  if (set == OMP_TRAIT_SET_DEVICE) 
for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))

  {
const char *prop = omp_context_name_list_prop (p);
if (prop == NULL)
  return 0;
if (!strcmp (prop, "any"))
  continue;


[Cf. also comment to 3/8] As OpenMP states:

"If trait-property 'any' is specified in the 'kind' trait-selector of 
the device selector set or the target_device selector sets, no other 
trait-property may be specified in the same selector set."


[From "Restrictions to context selectors are as follows:", here quoting 
TR12]


It seems as if we can avoid run-time evaluation for 'device' if there is 
a kind(any) - and likewise for 'target_device' if there is 'kind(any)', 
but we want to have an error in that case if any other trait has been 
specified, I guess.
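
The compile-time check hinted at here could look roughly like the following sketch. It assumes the properties of the 'kind' selector are available as plain strings and that the number of properties from the other trait selectors in the same set is known; `selector_set_valid` and its parameters are hypothetical and do not correspond to the GCC front-end data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the selector set obeys the quoted restriction:
   when "any" appears among the KINDS properties, it must be the only
   trait-property in the whole selector set.  N_OTHER_PROPS counts the
   properties of the other trait selectors (arch, isa, ...).  */
static bool
selector_set_valid (const char *const *kinds, size_t nkinds,
                    size_t n_other_props)
{
  bool has_any = false;

  assert (kinds != NULL || nkinds == 0);
  for (size_t i = 0; i < nkinds; i++)
    if (strcmp (kinds[i], "any") == 0)
      has_any = true;

  return !has_any || (nkinds == 1 && n_other_props == 0);
}
```

A caller would emit the error when this returns false and could otherwise skip run-time evaluation whenever "any" is present.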


* * *


Sandra Loosemore wrote:

@@ -2194,12 +2308,21 @@ omp_context_compute_score (tree ctx, score_wide_int 
*score, bool declare_simd)

...

  int *scores
    = (int *) alloca ((2 * nconstructs + 2) * sizeof (int));


That's not new, but I have the feeling it should be '+ 3' and not '+ 2'
for device or target_device; with both device and target_device present,
it might even need to be '+ 6'.


I also wonder whether 'alloca' will really work - or only when inlined 
at all call sites.


Ignore the last sentence. ['scores' is used locally, while 'score' is 
passed by the caller.] Still, I wonder about the + 2 vs. +3 or +6.


Otherwise, I have not spotted anything.

Tobias

[PATCH] Add __cow_string C string constructor

2024-01-07 Thread François Dumont


Hi

While working on the patch to use the cxx11 abi in gnu version namespace 
mode I got a small problem with this missing constructor. I'm not sure 
that the main patch will be integrated in gcc 14 so I think it is better 
if I propose this patch independently.


    libstdc++: Add __cow_string constructor from C string

    The __cow_string is instantiated from a C string in cow-stdexcept.cc.
    At the moment the constructor from std::string is being used with the
    drawback of an intermediate potential allocation/deallocation and copy.
    With the C string constructor we bypass all those operations.

    libstdc++-v3/ChangeLog:

    * include/std/stdexcept (__cow_string(const char*)): New definition.
    * src/c++11/cow-stdexcept.cc (__cow_string(const char*)): New
    definition and declaration.

Tested under Linux x64, ok to commit ?

François

diff --git a/libstdc++-v3/include/std/stdexcept 
b/libstdc++-v3/include/std/stdexcept
index 66c8572d0cd..2e3c9f3bf71 100644
--- a/libstdc++-v3/include/std/stdexcept
+++ b/libstdc++-v3/include/std/stdexcept
@@ -54,6 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 __cow_string();
 __cow_string(const std::string&);
+__cow_string(const char*);
 __cow_string(const char*, size_t);
 __cow_string(const __cow_string&) _GLIBCXX_NOTHROW;
 __cow_string& operator=(const __cow_string&) _GLIBCXX_NOTHROW;
diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc 
b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index 8d1cc4605d4..12b189b43b5 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -127,6 +127,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 __cow_string();
 __cow_string(const std::string& s);
+__cow_string(const char*);
 __cow_string(const char*, size_t n);
 __cow_string(const __cow_string&) noexcept;
 __cow_string& operator=(const __cow_string&) noexcept;
@@ -139,6 +140,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   __cow_string::__cow_string(const std::string& s) : _M_str(s) { }
 
+  __cow_string::__cow_string(const char* s) : _M_str(s) { }
+
   __cow_string::__cow_string(const char* s, size_t n) : _M_str(s, n) { }
 
   __cow_string::__cow_string(const __cow_string& s) noexcept

Re: [PATCH] c++/modules: Prevent overwriting arguments for duplicates [PR112588]

2024-01-07 Thread Nathaniel Shead

On Sat, Jan 06, 2024 at 05:32:37PM -0500, Nathan Sidwell wrote:
> I'm not sure about this, there was clearly a reason I did it the way it is,
> but perhaps that reasoning became obsolete -- something about an existing
> declaration and reading in a definition maybe?
> 
> nathan

So I took a bit of a closer look and this is actually a regression,
seeming to start with r13-3134-g09df0d8b14dda6. I haven't yet looked more
closely at the actual change, though, to see whether this implies a
different fix.

Nathaniel

> On 11/22/23 06:33, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> > access.
> > 
> > -- >8 --
> > 
> > When merging duplicate instantiations of function templates, currently
> > read_function_def overwrites the arguments with that of the existing
> > duplicate. This is problematic, however, since this means that the
> > PARM_DECLs in the body of the function definition no longer match with
> > the PARM_DECLs in the argument list, which causes issues when it comes
> > to generating RTL.
> > 
> > There doesn't seem to be any reason to do this replacement, so this
> > patch removes that logic.
> > 
> > PR c++/112588
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_in::read_function_def): Don't overwrite
> > arguments.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/merge-16.h: New test.
> > * g++.dg/modules/merge-16_a.C: New test.
> > * g++.dg/modules/merge-16_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc  |  2 --
> >   gcc/testsuite/g++.dg/modules/merge-16.h   | 10 ++
> >   gcc/testsuite/g++.dg/modules/merge-16_a.C |  7 +++
> >   gcc/testsuite/g++.dg/modules/merge-16_b.C |  5 +
> >   4 files changed, 22 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h
> >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C
> >   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 4f5b6e2747a..2520ab659cc 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree 
> > maybe_template)
> > DECL_RESULT (decl) = result;
> > DECL_INITIAL (decl) = initial;
> > DECL_SAVED_TREE (decl) = saved;
> > -  if (maybe_dup)
> > -   DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup);
> > if (context)
> > SET_DECL_FRIEND_CONTEXT (decl, context);
> > diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h 
> > b/gcc/testsuite/g++.dg/modules/merge-16.h
> > new file mode 100644
> > index 000..fdb38551103
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16.h
> > @@ -0,0 +1,10 @@
> > +// PR c++/112588
> > +
> > +void f(int*);
> > +
> > +template 
> > +struct S {
> > +  void g(int n) { f(&n); }
> > +};
> > +
> > +template struct S;
> > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C 
> > b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > new file mode 100644
> > index 000..c243224c875
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > @@ -0,0 +1,7 @@
> > +// PR c++/112588
> > +// { dg-additional-options "-fmodules-ts" }
> > +// { dg-module-cmi merge16 }
> > +
> > +module;
> > +#include "merge-16.h"
> > +export module merge16;
> > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C 
> > b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > new file mode 100644
> > index 000..8c7b1f0511f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > @@ -0,0 +1,5 @@
> > +// PR c++/112588
> > +// { dg-additional-options "-fmodules-ts" }
> > +
> > +#include "merge-16.h"
> > +import merge16;
> 
> -- 
> Nathan Sidwell
>

Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-01-07 Thread Eli Zaretskii

[I re-added the other addressees, as I don't think you meant to make
this discussion private between the two of us.]

> Date: Sun, 7 Jan 2024 12:58:29 +0100
> From: Björn Schäpers 
> 
> Am 07.01.2024 um 07:50 schrieb Eli Zaretskii:
> >> Date: Sat, 6 Jan 2024 23:15:24 +0100
> >> From: Björn Schäpers 
> >> Cc: gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> >>
> >> This patch adds libraries which are loaded after backtrace_initialize, like
> >> plugins or similar.
> >>
> >> I don't know what style is preferred for the Win32 typedefs, should the 
> >> code use
> >> PVOID or void*?
> > 
> > It doesn't matter, at least not if the source file includes the
> > Windows header files (where PVOID is defined).
> > 
> >> +  if (reason != /*LDR_DLL_NOTIFICATION_REASON_LOADED*/1)
> > 
> > IMO, it would be better to supply a #define if undefined:
> > 
> > #ifndef LDR_DLL_NOTIFICATION_REASON_LOADED
> > # define LDR_DLL_NOTIFICATION_REASON_LOADED 1
> > #endif
> > 
> 
> I surely can define it. But the ifndef is not needed, since there are no 
> headers 
> containing the function signatures, structures or the defines:
> https://learn.microsoft.com/en-us/windows/win32/devnotes/ldrregisterdllnotification

OK, I wasn't sure about that.

> >> +  if (!GetModuleHandleEx (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS
> >> +| GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
> >> +(TCHAR*) notification_data->dll_base,
> > 
> > Is TCHAR correct here? Does libbacktrace indeed use TCHAR and rely
> > on compile-time definition of UNICODE?  (I'm not familiar with the
> > internals of libbacktrace, so apologies if this is a silly question.)
> > 
> > Thanks.
> 
> As far as I can see it's the first time for TCHAR, I would've gone for 
> GetModuleHandleExW, but 
> https://gcc.gnu.org/pipermail/gcc/2023-January/240534.html

That was about GetModuleHandle, not about GetModuleHandleEx.  For the
latter, all Windows versions that support it also support "wide" APIs.
So my suggestion is to use GetModuleHandleExW here.  However, you will
need to make sure that notification_data->dll_base is declared as
'wchar_t *', not 'char *'.  If dll_base is declared as 'char *', then
only GetModuleHandleExA will work, and you will lose the ability to
support file names with non-ASCII characters outside of the current
system codepage.

> But I didn't want to force GetModuleHandleExA, so I went for TCHAR and 
> GetModuleHandleEx so it automatically chooses which to use. Same for 
> GetModuleHandle of ntdll.dll.

The considerations for GetModuleHandle and for GetModuleHandleEx are
different: the former is also available on old versions of Windows
that don't support "wide" APIs.

Re: [PATCH 1/8] OpenMP: lvalue parsing for map/to/from clauses (C++)

2024-01-07 Thread Tobias Burnus


Am 05.01.24 um 13:23 schrieb Julian Brown:

On Wed, 20 Dec 2023 15:31:15 +0100
Tobias Burnus  wrote:
Here's a rebased/retested version which fixes those bits (I haven't
adjusted the libgomp.texi bit you noted yet, though).

How does this look now?




--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13499,7 +13499,11 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
   if (TREE_CODE (dtype) == REFERENCE_TYPE)
dtype = TREE_TYPE (dtype);
   /* FIRSTPRIVATE_POINTER doesn't work well if we have a
-multiply-indirected pointer.  */
+multiply-indirected pointer.  If we have a reference to a pointer to
+a pointer, it's possible that this should really be
+GOMP_MAP_FIRSTPRIVATE_REFERENCE -- but that also doesn't work at the
+moment, so stick with this.  (See testcase
+baseptrs-4.C:ref2ptrptr_offset_decl_member_slice).  */


Looks as if we should have a tracking PR about this; can you file one?

* * *


+  if (processing_template_decl)
+{
+  if (type_dependent_expression_p (array_expr)
+ || type_dependent_expression_p (index)
+ || type_dependent_expression_p (length))
+   return build_min_nt_loc (loc, OMP_ARRAY_SECTION, array_expr, index,
+length);
+}


I personally find it more readable if combined in a single 'if' condition.


+ /* Turn *foo into foo[0:1].  */
+ decl = TREE_OPERAND (decl, 0);
+ STRIP_NOPS (decl);
+
+ /* If we have "*foo" and
+- it's an indirection of a reference, "unconvert" it, i.e.
+  strip the indirection (to just "foo").
+- it's an indirection of a pointer, turn it into
+  "foo[0:1]".  */
+ if (!ref_p)
+   decl = grok_omp_array_section (loc, decl, integer_zero_node,
+  integer_one_node);


I would remove the first comment and remove the two succeeding lines 
below the second comment.




+ /* This code rewrites a parsed expression containing various tree
+codes used to represent array accesses into a more uniform nest of
+OMP_ARRAY_SECTION nodes before it is processed by
+semantics.cc:handle_omp_array_sections_1.  It might be more
+efficient to move this logic to that function instead, analysing
+the parsed expression directly rather than this preprocessed
+form.  */


Or to do this transformation in handle_omp_array_sections to still get a 
unified result in the middle end. I see advantages of all three 
solutions. (Doing this in parse.cc (as currently done) feels a bit odd, 
though.)


* * *


build_omp_array_section (location_t loc, tree array_expr, tree index,
+tree length)
+{
+  tree idxtype;
+
+  /* If we know the integer bounds, create an index type with exact
+ low/high (or zero/length) bounds.  Otherwise, create an incomplete
+ array type.  (This mostly only affects diagnostics.)  */
+  if (index != NULL_TREE
+  && length != NULL_TREE
+  && TREE_CODE (index) == INTEGER_CST
+  && TREE_CODE (length) == INTEGER_CST)
+{
+  tree low = fold_convert (sizetype, index);
+  tree high = fold_convert (sizetype, length);
+  high = size_binop (PLUS_EXPR, low, high);
+  high = size_binop (MINUS_EXPR, high, size_one_node);
+  idxtype = build_range_type (sizetype, low, high);
+}
+  else if ((index == NULL_TREE || integer_zerop (index))
+  && length != NULL_TREE
+  && TREE_CODE (length) == INTEGER_CST)
+idxtype = build_index_type (length);
+  else
+idxtype = NULL_TREE;
+
+  tree type = TREE_TYPE (array_expr);
+  gcc_assert (type);
+  type = non_reference (type);
+
+  tree sectype, eltype = TREE_TYPE (type);
+
+  /* It's not an array or pointer type.  Just reuse the type of the
+ original expression as the type of the array section (an error will be
+ raised anyway, later).  */
+  if (eltype == NULL_TREE)
+sectype = TREE_TYPE (array_expr);
+  else
+sectype = build_array_type (eltype, idxtype);
+
+  return build3_loc (loc, OMP_ARRAY_SECTION, sectype, array_expr, index,
+length);
+}


I wonder whether it would be more readable if one moves all the 
'idxtype' handling into the last 'else' branch.
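
For reference, the low/high arithmetic of the constant-bounds branch in the quoted function can be modelled on its own as below. This is only a sketch with plain size_t values in place of trees; `constant_section_bounds` and `struct section_bounds` are hypothetical names, and the other two branches of the original are not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct section_bounds { size_t low, high; };

/* Inclusive index range covered by the array section [index:length],
   following the patch: low = index, high = index + length - 1.
   LENGTH must be nonzero so that HIGH does not wrap around.  */
static struct section_bounds
constant_section_bounds (size_t index, size_t length)
{
  struct section_bounds b;

  assert (length > 0);
  b.low = index;
  b.high = index + length - 1;
  return b;
}
```

For example, the section [2:5] covers indices 2 through 6.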


* * *

LGTM - please file the PR and consider the readability items above.

Thanks,

Tobias

[patch, testsuite, applied] PR52641 Fix more fallout from sloppy tests.

2024-01-07 Thread Georg-Johann Lay


Made some tests more generic so they can pass on more targets.
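
One of the patterns used, sketched stand-alone: derive the shift count for the sign bit from the width of int instead of hard-coding 31. The function below is an illustration in the spirit of the pr110838.c change, not a copy of the test; it relies on GCC's arithmetic right shift of negative values.

```c
#include <assert.h>

/* Index of the most significant bit of int, whatever its width.  */
#define MSB (__CHAR_BIT__ * __SIZEOF_INT__ - 1)

/* Return -1, 0 or 1 according to the sign of X.  */
static int
sign_of (int x)
{
  /* x >> MSB is -1 for negative x (arithmetic shift, as GCC does);
     the second term is 1 for positive x.  The negation is done in
     unsigned arithmetic to avoid overflow for the minimum int.  */
  return (x >> MSB) | (int) ((0u - (unsigned) x) >> MSB);
}
```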

Johann

--

testsuite/52641: Fix fallout from sloppy tests.

gcc/testsuite/
PR testsuite/52641
* gcc.dg/torture/pr110838.c: Use proper shift offset to get MSB or int.
* gcc.dg/torture/pr112282.c: Use at least 32 bits for :20 bit-fields.
* gcc.dg/tree-ssa/bitcmp-5.c: Use integral type with 32 bits or more.
* gcc.dg/tree-ssa/bitcmp-6.c: Same.
* gcc.dg/tree-ssa/cltz-complement-max.c: Same.
* gcc.dg/tree-ssa/cltz-max.c: Same.
* gcc.dg/tree-ssa/if-to-switch-8.c: Use literals that fit int.
* gcc.dg/tree-ssa/if-to-switch-9.c [avr]: Set case-values-threshold=3.
* gcc.dg/tree-ssa/negneg-3.c: Discriminate [not] large_double.
* gcc.dg/tree-ssa/phi-opt-25b.c: Use types of correct widths for
__builtin_bswapN.
* gcc.dg/tree-ssa/pr55177-1.c: Same.
* gcc.dg/tree-ssa/popcount-max.c: Use int32_t where required.
* gcc.dg/tree-ssa/pr111583-1.c: Use intptr_t as needed.
* gcc.dg/tree-ssa/pr111583-2.c: Same.

diff --git a/gcc/testsuite/gcc.dg/torture/pr110838.c b/gcc/testsuite/gcc.dg/torture/pr110838.c
index f039bd6c8ea..ae2874b6d0d 100644
--- a/gcc/testsuite/gcc.dg/torture/pr110838.c
+++ b/gcc/testsuite/gcc.dg/torture/pr110838.c
@@ -5,10 +5,12 @@ typedef __UINT8_TYPE__ uint8_t;
 typedef __INT8_TYPE__ int8_t;
 typedef uint8_t pixel;
 
+#define MSB (__CHAR_BIT__ * __SIZEOF_INT__ - 1)
+
 /* get the sign of input variable (TODO: this is a dup, make common) */
 static inline int8_t signOf(int x)
 {
-  return (x >> 31) | ((int)uint32_t)-x)) >> 31));
+  return (x >> MSB) | ((int)uint32_t)-x)) >> MSB));
 }
 
 __attribute__((noipa))
diff --git a/gcc/testsuite/gcc.dg/torture/pr112282.c b/gcc/testsuite/gcc.dg/torture/pr112282.c
index 6190b90cf66..cfe364f9a84 100644
--- a/gcc/testsuite/gcc.dg/torture/pr112282.c
+++ b/gcc/testsuite/gcc.dg/torture/pr112282.c
@@ -1,5 +1,11 @@
 /* { dg-do run } */
 
+#if __SIZEOF_INT__ < 4
+#define Xint __INT32_TYPE__
+#else
+#define Xint int
+#endif
+
 int printf(const char *, ...);
 void abort ();
 /* We need an abort that isn't noreturn.  */
@@ -10,8 +16,8 @@ void __attribute__((noipa)) my_abort ()
 int a, g, h, i, v, w = 2, x, y, ab, ac, ad, ae, af, ag;
 static int f, j, m, n, p, r, u, aa;
 struct b {
-  int c : 20;
-  int d : 20;
+  Xint c : 20;
+  Xint d : 20;
   int e : 10;
 };
 static struct b l, o, q = {3, 3, 5};
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c
index a6be14294b4..8def5ad3cca 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-5.c
@@ -6,6 +6,9 @@
of `(a & b) CMP a` and `(a | b) CMP a`
which can be optimized to 1. */
 
+#if __SIZEOF_INT__ < 4
+#define int __INT32_TYPE__
+#endif
 
 /* For `&`, the non-negativeness of b is not taken into account. */
 int f_and_le(int len) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c
index a86a19fbef2..cea377489eb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitcmp-6.c
@@ -6,6 +6,10 @@
of `(a & b) CMP a` and `(a | b) CMP a`
which can be optimized to 0. */
 
+#if __SIZEOF_INT__ < 4
+#define int __INT32_TYPE__
+#endif
+
 /* For `&`, the non-negativeness of b is not taken into account. */
 int f_and_gt(int len) {
   len &= 0xf;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
index 1a29ca52e42..7b3599a8a4e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c
@@ -3,6 +3,10 @@
 
 #define PREC (__CHAR_BIT__)
 
+#if __SIZEOF_INT__ < 4
+#define int __INT32_TYPE__
+#endif
+
 int clz_complement_count1 (unsigned char b) {
 int c = 0;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
index a6bea3d3389..78b0d017be8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-max.c
@@ -3,6 +3,10 @@
 
 #define PREC (__CHAR_BIT__)
 
+#if __SIZEOF_INT__ < 4
+#define int __INT32_TYPE__
+#endif
+
 int clz_count1 (unsigned char b) {
 int c = 0;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
index f4d06fed2b6..36cb74b7279 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
@@ -20,7 +20,7 @@ int foo(int a, int b)
 else if (a == 10)
   global2 = 12345;
 else if (a == 1)
-  global2 = 123456;
+  global2 = 23456;
   }
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-9.c b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-9.c
index e67198bf8c3..ce6dc341ded 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-9.c
@@ -1,6 +1,7 @@
 /* PR tree-optimizatio

Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-01-07 Thread Björn Schäpers


Am 07.01.2024 um 15:46 schrieb Eli Zaretskii:

[I re-added the other addressees, as I don't think you meant to make
this discussion private between the two of us.]



Yeah, that was a mistake.


Date: Sun, 7 Jan 2024 12:58:29 +0100
From: Björn Schäpers 

Am 07.01.2024 um 07:50 schrieb Eli Zaretskii:

Date: Sat, 6 Jan 2024 23:15:24 +0100
From: Björn Schäpers 
Cc: gcc-patches@gcc.gnu.org, g...@gcc.gnu.org

This patch adds libraries which are loaded after backtrace_initialize, like
plugins or similar.

I don't know what style is preferred for the Win32 typedefs, should the code use
PVOID or void*?


It doesn't matter, at least not if the source file includes the
Windows header files (where PVOID is defined).


+  if (reason != /*LDR_DLL_NOTIFICATION_REASON_LOADED*/1)


IMO, it would be better to supply a #define if undefined:

#ifndef LDR_DLL_NOTIFICATION_REASON_LOADED
# define LDR_DLL_NOTIFICATION_REASON_LOADED 1
#endif



I surely can define it. But the ifndef is not needed, since there are no headers
containing the function signatures, structures or the defines:
https://learn.microsoft.com/en-us/windows/win32/devnotes/ldrregisterdllnotification


OK, I wasn't sure about that.


+  if (!GetModuleHandleEx (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS
+ | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
+ (TCHAR*) notification_data->dll_base,


Is TCHAR correct here? Does libbacktrace indeed use TCHAR and rely
on compile-time definition of UNICODE?  (I'm not familiar with the
internals of libbacktrace, so apologies if this is a silly question.)

Thanks.


As far as I can see it's the first time for TCHAR, I would've gone for
GetModuleHandleExW, but 
https://gcc.gnu.org/pipermail/gcc/2023-January/240534.html


That was about GetModuleHandle, not about GetModuleHandleEx.  For the
latter, all Windows versions that support it also support "wide" APIs.
So my suggestion is to use GetModuleHandleExW here.  However, you will
need to make sure that notification_data->dll_base is declared as
'wchar_t *', not 'char *'.  If dll_base is declared as 'char *', then
only GetModuleHandleExA will work, and you will lose the ability to
support file names with non-ASCII characters outside of the current
system codepage.


The dll_base is a PVOID. With the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag 
GetModuleHandleEx does not look for a name, but uses an address in the module to 
get the HMODULE, so you cast it to char* or wchar_t* depending on which function 
you call. Actually one could just cast the dll_base to HMODULE, at least in 
win32 on x86 the HMODULE of a dll is always its base address. But to make it 
safer and future proof I went the way through GetModuleHandleEx.





But I didn't want to force GetModuleHandleExA, so I went for TCHAR and
GetModuleHandleEx so it automatically chooses which to use. Same for
GetModuleHandle of ntdll.dll.


The considerations for GetModuleHandle and for GetModuleHandleEx are
different: the former is also available on old versions of Windows
that don't support "wide" APIs.

Re: [x86 PATCH] PR target/113231: Improved costs in Scalar-To-Vector (STV) pass.

2024-01-07 Thread Uros Bizjak

On Sat, Jan 6, 2024 at 2:30 PM Roger Sayle  wrote:
>
>
> This patch improves the cost/gain calculation used during the i386 backend's
> SImode/DImode scalar-to-vector (STV) conversion pass.  The current code
> handles loads and stores, but doesn't consider that converting other
> scalar operations with a memory destination, requires an explicit load
> before and an explicit store after the vector equivalent.
>
> To ease the review, the significant change looks like:
>
>  /* For operations on memory operands, include the overhead
> of explicit load and store instructions.  */
>  if (MEM_P (dst))
>igain += !optimize_insn_for_size_p ()
> ? (m * (ix86_cost->int_load[2]
> + ix86_cost->int_store[2])
>- (ix86_cost->sse_load[sse_cost_idx] +
>   ix86_cost->sse_store[sse_cost_idx]))
> : -COSTS_N_BYTES (8);

Please just swap true and false statements to avoid negative test.
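
A stand-alone sketch of the swapped form, so that the condition is tested positively. The struct, the function name and the flat cost fields are hypothetical simplifications of the ix86_cost tables, and the COSTS_N_BYTES definition is assumed to mirror the one in the i386 back end.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: same definition as in the i386 back end.  */
#define COSTS_N_BYTES(N) ((N) * 2)

/* Hypothetical, flattened stand-in for the relevant cost entries.  */
struct cost_sketch
{
  int int_load, int_store, sse_load, sse_store;
};

/* Gain contribution for an operation with a memory destination.
   The size-optimizing case comes first, so no negation is needed.  */
static int
mem_dest_gain (bool for_size, int m, const struct cost_sketch *c)
{
  return for_size
         ? -COSTS_N_BYTES (8)
         : (m * (c->int_load + c->int_store)
            - (c->sse_load + c->sse_store));
}
```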

> however the patch itself is complicated by a change in indentation
> which leads to a number of lines with only whitespace changes.

'git diff -w' to the rescue ;)

> For architectures where integer load/store costs are the same as
> vector load/store costs, there should be no change without -Os/-Oz.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-01-06  Roger Sayle  
>
> gcc/ChangeLog
> PR target/113231
> * config/i386/i386-features.cc (compute_convert_gain): Include
> the overhead of explicit load and store (movd) instructions when
> converting non-store scalar operations with memory destinations.
>
> gcc/testsuite/ChangeLog
> PR target/113231
> * gcc.target/i386/pr113231.c: New test case.

OK with the above proposed change.

Thanks,
Uros.

Re: [PATCH] Add __cow_string C string constructor

2024-01-07 Thread Jonathan Wakely

On Sun, 7 Jan 2024 at 12:57, François Dumont  wrote:
>
> Hi
>
> While working on the patch to use the cxx11 abi in gnu version namespace
> mode I got a small problem with this missing constructor. I'm not sure
> that the main patch will be integrated in gcc 14 so I think it is better
> if I propose this patch independently.
>
>  libstdc++: Add __cow_string constructor from C string
>
>  The __cow_string is instantiated from a C string in cow-stdexcept.cc.
>  At the moment the constructor from std::string is being used with the
>  drawback of an intermediate potential allocation/deallocation and copy.
>  With the C string constructor we bypass all those operations.

But in that file, the std::string is the COW string, which means that
when we construct a std::string and copy it, it's cheap. It's just a
reference count increment/decrement. There should be no additional
allocation or deallocation.

Am I missing something?
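
To illustrate the point about copies being cheap, here is a minimal reference-counted string in C. It is only a model of the copy-on-write idea, with hypothetical names (`rc_string`, `rc_new`, `rc_copy`, `rc_release`); it is not how libstdc++ lays out its COW std::string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared representation: one reference count in front of the text.  */
struct rc_string
{
  size_t refs;
  char text[];
};

/* Constructing from a C string is the only step that allocates.  */
static struct rc_string *
rc_new (const char *s)
{
  size_t n = strlen (s) + 1;
  struct rc_string *p = malloc (sizeof *p + n);
  if (p == NULL)
    return NULL;
  p->refs = 1;
  memcpy (p->text, s, n);
  return p;
}

/* "Copying" shares the buffer and bumps the count; no allocation.  */
static struct rc_string *
rc_copy (struct rc_string *p)
{
  p->refs++;
  return p;
}

/* Dropping a reference frees the buffer only for the last owner.  */
static void
rc_release (struct rc_string *p)
{
  if (--p->refs == 0)
    free (p);
}
```

In this model, building a temporary and then copying it costs one allocation plus one increment and one decrement, which is the behaviour described above.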


>
>  libstdc++-v3/ChangeLog:
>
>  * include/std/stdexcept (__cow_string(const char*)): New
> definition.
>  * src/c++11/cow-stdexcept.cc (__cow_string(const char*)):
> New definition and
>  declaration.
>
> Tested under Linux x64, ok to commit ?
>
> François
>

Re: [PATCH RFC] c++: mangle function template constraints

2024-01-07 Thread Patrick Palka

On Tue, 5 Dec 2023, Jonathan Wakely wrote:

> On Wed, 22 Nov 2023 at 14:50, Jonathan Wakely  wrote:
> >
> > On Mon, 20 Nov 2023 at 02:56, Jason Merrill wrote:
> > >
> > > Tested x86_64-pc-linux-gnu.  Are the library bits OK?  Any comments
> > > before I push this?
> >
> > The library parts are OK.
> >
> > The variable template is_trivially_copyable_v just uses
> > __is_trivially_copyable so should be just as efficient, and the change
> > to  is fine.
> >
> > The variable template is_trivially_destructible_v instantiates the
> > is_trivially_destructible type trait, which instantiates
> > __is_destructible_safe and __is_destructible_impl, which is probably
> > why we used the built-in directly in . But that's an
> > acceptable overhead to avoid using the built-in in a mangled context,
> > and it would be good to optimize the variable template anyway, as a
> > separate change.
> 
> This actually causes a regression:
> 
> FAIL: 20_util/variant/87619.cc  -std=gnu++20 (test for excess errors)
> FAIL: 20_util/variant/87619.cc  -std=gnu++23 (test for excess errors)
> FAIL: 20_util/variant/87619.cc  -std=gnu++26 (test for excess errors)
> 
> It's OK for C++17 because the changed code is only used for C++20 and later.
> 
> That test instantiates a very large variant to check that we don't hit
> our template instantiation depth limit. Using the variable template
> (which uses the class template) instead of the built-in causes it to
> fail now.

Could we pass down __trivially_destructible from _Variadic_storage to
_Variadic_union and use that as the dtor's constraint instead of
recursively re-computing it?  This reduces the maximum template
instantiation depth for 87619.cc to ~270 from ~780 so that the depth is
roughly #variants rather than 4 * #variants.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 20a76c8aa87..4b9002e0917 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -392,7 +392,7 @@ namespace __variant
 };
 
   // Defines members and ctors.
-  template
+  template
 union _Variadic_union
 {
   _Variadic_union() = default;
@@ -401,8 +401,8 @@ namespace __variant
_Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
 };
 
-  template
-union _Variadic_union<_First, _Rest...>
+  template
+union _Variadic_union<__trivially_destructible, _First, _Rest...>
 {
   constexpr _Variadic_union() : _M_rest() { }
 
@@ -427,13 +427,12 @@ namespace __variant
   ~_Variadic_union() = default;
 
   constexpr ~_Variadic_union()
-   requires (!is_trivially_destructible_v<_First>)
- || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
+   requires (!__trivially_destructible)
   { }
 #endif
 
   _Uninitialized<_First> _M_first;
-  _Variadic_union<_Rest...> _M_rest;
+  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
 };
 
   // _Never_valueless_alt is true for variant alternatives that can
@@ -514,7 +513,7 @@ namespace __variant
return this->_M_index != __index_type(variant_npos);
   }
 
-  _Variadic_union<_Types...> _M_u;
+  _Variadic_union _M_u;
   using __index_type = __select_index<_Types...>;
   __index_type _M_index;
 };
@@ -552,7 +551,7 @@ namespace __variant
return this->_M_index != static_cast<__index_type>(variant_npos);
   }
 
-  _Variadic_union<_Types...> _M_u;
+  _Variadic_union _M_u;
   using __index_type = __select_index<_Types...>;
   __index_type _M_index;
 };
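
The idea in the patch above, computing the property once at the top and threading it down as a bool template parameter, can be sketched in isolation as follows. All names here (SketchUnion, SketchStorage) are hypothetical, and the raw byte slot stands in for _Uninitialized; this is a simplified model under those assumptions, not the libstdc++ code. The constrained destructor is guarded by a feature-test macro so the sketch also compiles before C++20.

```cpp
#include <string>
#include <type_traits>

// Primary template: the empty tail of the recursion.
template<bool TriviallyDestructible, typename... Ts>
union SketchUnion
{
  SketchUnion() = default;
};

template<bool TriviallyDestructible, typename First, typename... Rest>
union SketchUnion<TriviallyDestructible, First, Rest...>
{
  constexpr SketchUnion() : rest() { }

  ~SketchUnion() = default;

#if __cpp_concepts >= 201907L
  // The constraint only reads the bool that was passed in; nothing is
  // re-derived from First or from the nested union type.
  constexpr ~SketchUnion()
    requires (!TriviallyDestructible)
  { }
#endif

  // Raw storage standing in for the real uninitialized-member wrapper.
  struct Slot { alignas(First) unsigned char bytes[sizeof(First)]; };

  Slot first;
  SketchUnion<TriviallyDestructible, Rest...> rest;
};

template<typename... Ts>
struct SketchStorage
{
  // Computed once, with a flat fold instead of a recursive chain.
  static constexpr bool trivially_destructible
    = (std::is_trivially_destructible_v<Ts> && ...);

  SketchUnion<trivially_destructible, Ts...> u;
};

#if __cpp_concepts >= 201907L
static_assert(std::is_trivially_destructible_v<SketchUnion<true, int, long>>);
static_assert(!std::is_trivially_destructible_v<SketchUnion<false, int, std::string>>);
#endif
```

Because each level of the union only forwards the bool, checking the destructor's constraint at one level does not require instantiating a trait for the next level down.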

> 
> So optimizing the variable template is now a priority.
> 
>

Re: [patch, testsuite, applied] PR52641 Fix more fallout from sloppy tests.

2024-01-07 Thread Jeff Law





On 1/7/24 08:53, Georg-Johann Lay wrote:

Made some tests more generic so they can pass on more targets.

Johann

--

testsuite/52641: Fix fallout from sloppy tests.

gcc/testsuite/
 PR testsuite/52641
 * gcc.dg/torture/pr110838.c: Use proper shift offset to get MSB of int.
 * gcc.dg/torture/pr112282.c: Use at least 32 bits for :20 bit-fields.
 * gcc.dg/tree-ssa/bitcmp-5.c: Use integral type with 32 bits or more.
 * gcc.dg/tree-ssa/bitcmp-6.c: Same.
 * gcc.dg/tree-ssa/cltz-complement-max.c: Same.
 * gcc.dg/tree-ssa/cltz-max.c: Same.
 * gcc.dg/tree-ssa/if-to-switch-8.c: Use literals that fit int.
 * gcc.dg/tree-ssa/if-to-switch-9.c [avr]: Set case-values-threshold=3.
 * gcc.dg/tree-ssa/negneg-3.c: Discriminate [not] large_double.
 * gcc.dg/tree-ssa/phi-opt-25b.c: Use types of correct widths for
 __builtin_bswapN.
 * gcc.dg/tree-ssa/pr55177-1.c: Same.
 * gcc.dg/tree-ssa/popcount-max.c: Use int32_t where required.
 * gcc.dg/tree-ssa/pr111583-1.c: Use intptr_t as needed.
 * gcc.dg/tree-ssa/pr111583-2.c: Same.

Are you checking this on other targets?  My tester just started
complaining about these (ft32-elf, fr30-elf); more are expected as
today's run progresses.




Tests that now fail, but worked before (2 tests):

ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)
ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)


Jeff

Re: [PATCH RFC] c++: mangle function template constraints

2024-01-07 Thread Jonathan Wakely

On Sun, 7 Jan 2024 at 16:40, Patrick Palka  wrote:
>
> On Tue, 5 Dec 2023, Jonathan Wakely wrote:
>
> > On Wed, 22 Nov 2023 at 14:50, Jonathan Wakely  wrote:
> > >
> > > On Mon, 20 Nov 2023 at 02:56, Jason Merrill wrote:
> > > >
> > > > Tested x86_64-pc-linux-gnu.  Are the library bits OK?  Any comments
> > > > before I push this?
> > >
> > > The library parts are OK.
> > >
> > > The variable template is_trivially_copyable_v just uses
> > > __is_trivially_copyable so should be just as efficient, and the change
> > > to  is fine.
> > >
> > > The variable template is_trivially_destructible_v instantiates the
> > > is_trivially_destructible type trait, which instantiates
> > > __is_destructible_safe and __is_destructible_impl, which is probably
> > > why we used the built-in directly in . But that's an
> > > acceptable overhead to avoid using the built-in in a mangled context,
> > > and it would be good to optimize the variable template anyway, as a
> > > separate change.
> >
> > This actually causes a regression:
> >
> > FAIL: 20_util/variant/87619.cc  -std=gnu++20 (test for excess errors)
> > FAIL: 20_util/variant/87619.cc  -std=gnu++23 (test for excess errors)
> > FAIL: 20_util/variant/87619.cc  -std=gnu++26 (test for excess errors)
> >
> > It's OK for C++17 because the changed code is only used for C++20 and later.
> >
> > That test instantiates a very large variant to check that we don't hit
> > our template instantiation depth limit. Using the variable template
> > (which uses the class template) instead of the built-in causes it to
> > fail now.
>
> Could we pass down __trivially_destructible from _Variadic_storage to
> _Variadic_union and use that as the dtor's constraint instead of
> recursively re-computing it?  This reduces the maximum template
> instantiation depth for 87619.cc to ~270 from ~780 so that the depth is
> roughly #variants rather than 4 * #variants.

LGTM.

I think __trivially_destructible should be safe from collisions with
built-ins, as I would expect any such built-in to be
__is_trivially_destructible, not __trivially_destructible (we already
have __has_trivial_destructor which we use for that, but that requires
some additional code to use it for the std::is_trivially_destructible
trait, so a built-in to do it directly isn't far-fetched).


>
> diff --git a/libstdc++-v3/include/std/variant 
> b/libstdc++-v3/include/std/variant
> index 20a76c8aa87..4b9002e0917 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -392,7 +392,7 @@ namespace __variant
>  };
>
>// Defines members and ctors.
> -  template
> +  template
>  union _Variadic_union
>  {
>_Variadic_union() = default;
> @@ -401,8 +401,8 @@ namespace __variant
> _Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
>  };
>
> -  template
> -union _Variadic_union<_First, _Rest...>
> +  template
> +union _Variadic_union<__trivially_destructible, _First, _Rest...>
>  {
>constexpr _Variadic_union() : _M_rest() { }
>
> @@ -427,13 +427,12 @@ namespace __variant
>~_Variadic_union() = default;
>
>constexpr ~_Variadic_union()
> -   requires (!is_trivially_destructible_v<_First>)
> - || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
> +   requires (!__trivially_destructible)
>{ }
>  #endif
>
>_Uninitialized<_First> _M_first;
> -  _Variadic_union<_Rest...> _M_rest;
> +  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
>  };
>
>// _Never_valueless_alt is true for variant alternatives that can
> @@ -514,7 +513,7 @@ namespace __variant
> return this->_M_index != __index_type(variant_npos);
>}
>
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
> @@ -552,7 +551,7 @@ namespace __variant
> return this->_M_index != static_cast<__index_type>(variant_npos);
>}
>
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
>
> >
> > So optimizing the variable template is now a priority.
> >
> >
>

[committed] Fix typo in last change

2024-01-07 Thread Jeff Law



Tester started complaining about this change as soon as it went in. 
Clearly there's an extraneous "short" in the testcase.


Pushed to the trunk.

Jeff
commit 66d82874d2254bcb0124f77e6be220d299eab5f1
Author: Jeff Law 
Date:   Sun Jan 7 09:52:44 2024 -0700

Fix typo in last change

gcc/testsuite
* gcc.dg/tree-ssa/phi-opt-25b.c: Remove extraneous "short".

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
index 2cb4361dc00..5d557360ab3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
@@ -5,7 +5,7 @@
 /* Test to make sure unrelated arguments and comparisons
don't get optimized incorrectly. */
 
-__UINT16_TYPE__ short test_bswap16(__UINT16_TYPE__ x, __UINT16_TYPE__ y)
+__UINT16_TYPE__ test_bswap16(__UINT16_TYPE__ x, __UINT16_TYPE__ y)
 {
   return x ? __builtin_bswap16(y) : 0;
 }

Re: [PATCH 6/4] libbacktrace: Add loaded dlls after initialize

2024-01-07 Thread Eli Zaretskii

> Date: Sun, 7 Jan 2024 17:07:06 +0100
> Cc: i...@google.com, gcc-patches@gcc.gnu.org, g...@gcc.gnu.org
> From: Björn Schäpers 
> 
> > That was about GetModuleHandle, not about GetModuleHandleEx.  For the
> > latter, all Windows versions that support it also support "wide" APIs.
> > So my suggestion is to use GetModuleHandleExW here.  However, you will
> > need to make sure that notification_data->dll_base is declared as
> > 'wchar_t *', not 'char *'.  If dll_base is declared as 'char *', then
> > only GetModuleHandleExA will work, and you will lose the ability to
> > support file names with non-ASCII characters outside of the current
> > system codepage.
> 
> The dll_base is a PVOID. With the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag
> GetModuleHandleEx does not look for a name, but uses an address in the module
> to get the HMODULE, so you cast it to char* or wchar_t* depending on which
> function you call. Actually one could just cast the dll_base to HMODULE, at
> least in win32 on x86 the HMODULE of a dll is always its base address. But to
> make it safer and future proof I went the way through GetModuleHandleEx.

In that case, you can call either GetModuleHandleExA or
GetModuleHandleExW; the difference is minor.
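
For readers unfamiliar with the flag, a minimal sketch of the address-based lookup follows. The helper name sketch_module_from_address is hypothetical and this is not the libbacktrace code; the non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <cassert>
#ifdef _WIN32
# include <windows.h>
#endif

// Hypothetical helper: map an address inside a loaded module to that
// module's handle.  Returns nullptr on failure or on non-Windows hosts.
void* sketch_module_from_address(const void* addr)
{
  assert(addr != nullptr);
#ifdef _WIN32
  HMODULE mod = nullptr;
  // With FROM_ADDRESS the second argument is an address, not a name, so
  // the A and W variants differ only in the pointer type of the cast.
  if (!GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS
                          | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                          static_cast<LPCWSTR>(addr), &mod))
    return nullptr;
  return reinterpret_cast<void*>(mod);
#else
  (void) addr;
  return nullptr;
#endif
}
```

The UNCHANGED_REFCOUNT flag is a choice made for this sketch so that the caller does not need to call FreeLibrary; whether the actual patch wants the reference count incremented is not stated in the thread.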

Re: [patch, testsuite, applied] PR52641 Fix more fallout from sloppy tests.

2024-01-07 Thread Georg-Johann Lay





Am 07.01.24 um 17:45 schrieb Jeff Law:



On 1/7/24 08:53, Georg-Johann Lay wrote:

Made some tests more generic so they can pass on more targets.

Johann

--

testsuite/52641: Fix fallout from sloppy tests.

gcc/testsuite/
 PR testsuite/52641
 * gcc.dg/torture/pr110838.c: Use proper shift offset to get MSB of int.
 * gcc.dg/torture/pr112282.c: Use at least 32 bits for :20 bit-fields.
 * gcc.dg/tree-ssa/bitcmp-5.c: Use integral type with 32 bits or more.
 * gcc.dg/tree-ssa/bitcmp-6.c: Same.
 * gcc.dg/tree-ssa/cltz-complement-max.c: Same.
 * gcc.dg/tree-ssa/cltz-max.c: Same.
 * gcc.dg/tree-ssa/if-to-switch-8.c: Use literals that fit int.
 * gcc.dg/tree-ssa/if-to-switch-9.c [avr]: Set case-values-threshold=3.
 * gcc.dg/tree-ssa/negneg-3.c: Discriminate [not] large_double.
 * gcc.dg/tree-ssa/phi-opt-25b.c: Use types of correct widths for
 __builtin_bswapN.
 * gcc.dg/tree-ssa/pr55177-1.c: Same.
 * gcc.dg/tree-ssa/popcount-max.c: Use int32_t where required.
 * gcc.dg/tree-ssa/pr111583-1.c: Use intptr_t as needed.
 * gcc.dg/tree-ssa/pr111583-2.c: Same.

Are you checking this on other targets?  My tester just started
complaining about these (ft32-elf, fr30-elf); more are expected as
today's run progresses.




Tests that now fail, but worked before (2 tests):

ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)
ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)


Jeff


Hi Jeff, thanks for fixing the typo.

It slipped through because "int short" works in that place.
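
A short sketch of why the stray "short" can go unnoticed on one target and fail on another. The macro below is a stand-in for what __UINT16_TYPE__ is assumed to expand to on a target with 16-bit int (plain "unsigned int"); that expansion is an assumption made for illustration, and the SKETCH_ names are hypothetical.

```cpp
#include <type_traits>

// Declaration specifiers may appear in any order, so "int short" is
// simply another spelling of "short".
static_assert(std::is_same<int short, short>::value, "same type");

// Assumed expansion of __UINT16_TYPE__ where int is 16 bits wide.
#define SKETCH_UINT16_ON_16BIT_INT unsigned int

// "unsigned int short" is accepted and names unsigned short.
typedef SKETCH_UINT16_ON_16BIT_INT short sketch_u16;
static_assert(std::is_same<sketch_u16, unsigned short>::value, "same type");

// Where the 16-bit type is already spelled with "short", for example
// "short unsigned int", appending another "short" repeats the specifier
// and is rejected:
//   typedef short unsigned int short sketch_bad;   // error
```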

Usually when going after PR52641 I used dg-require, dg-skip or
dg-xfail for tests that fail on 16-bit int etc.

The take above was more ambitious in that it tried to make some
tests work without breaking other platforms of course.

It's not always easy to get the intent of a test case and how
to make it more generic though.

Johann

[PATCH]middle-end: thread through existing LCSSA variable for alternative exits too [PR113237]

2024-01-07 Thread Tamar Christina

Hi All,

Building on top of the previous patch: similar to the single-exit case, if
all exits are considered early exits and there are existing non-virtual PHIs,
then in order to maintain LCSSA we have to use the existing PHI variables.
We can't simply clear them and rebuild them, because the order of the PHIs
in the main exit must match the original exit for when we add the
skip_epilog guard.

But the infrastructure is already in place to maintain them, we just have to use
the right value.
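
The selection rule described above can be modelled outside GCC as a lookup with a fallback. The sketch below uses std::unordered_map and strings in place of GCC's hash_map and tree nodes; sketch_pick_exit_arg and the SSA-like names are hypothetical, and the model only mirrors the "use the existing LCSSA variable if there is one, otherwise the PHI result itself" decision.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Model: 'existing' maps a loop PHI result to the LCSSA variable that
// already carries it out of the loop, if any.
std::string
sketch_pick_exit_arg(const std::unordered_map<std::string, std::string>& existing,
                     const std::string& phi_result)
{
  assert(!phi_result.empty());
  auto it = existing.find(phi_result);
  if (it != existing.end())
    return it->second;   // thread through the existing LCSSA variable
  return phi_result;     // none exists: use the PHI result directly
}
```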

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues, both normally and with --enable-checking=release --enable-lto
--with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/113237
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
existing LCSSA variable for exit when all exits are early break.

gcc/testsuite/ChangeLog:

PR tree-optimization/113237
* gcc.dg/vect/vect-early-break_98-pr113237.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_98-pr113237.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_98-pr113237.c
new file mode 100644
index 
..e6d150b571f753e9eb3859f06f62b371817494a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_98-pr113237.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+long Perl_pp_split_limit;
+int Perl_block_gimme();
+int Perl_pp_split() {
+  char strend;
+  long iters;
+  int gimme = Perl_block_gimme();
+  while (--Perl_pp_split_limit) {
+if (gimme)
+  iters++;
+if (strend)
+  break;
+  }
+  if (iters)
+return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
7fd6566341b4893a1e209d1f8ff65d6d180f1190..77649b84f45b9e5dacec2809e0c854c8fcc17ce1
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1700,7 +1700,12 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
  if (peeled_iters && !virtual_operand_p (new_arg))
{
  tree tmp_arg = gimple_phi_result (from_phi);
- if (!new_phi_args.get (tmp_arg))
+ /* Similar to the single exit case, If we have an existing
+LCSSA variable thread through the original value otherwise
+skip it and directly use the final value.  */
+ if (tree *res = new_phi_args.get (tmp_arg))
+   new_arg = *res;
+ else
new_arg = tmp_arg;
}
 





[libgfortran, patch] PR113223 NAMELIST internal write missing leading blank character

2024-01-07 Thread Jerry D


Committed as simple and obvious. Initial patch thanks to Steve.

When using git gcc-commit-mklog how does one add in the coauthor?

The master branch has been updated by Jerry DeLisle :

https://gcc.gnu.org/g:add995ec117d756e61d207041cd32f937c1a1cd9

commit r14-6986-gadd995ec117d756e61d207041cd32f937c1a1cd9
Author: Jerry DeLisle 
Date:   Sun Jan 7 10:22:19 2024 -0800

libgfortran: Emit a space at beginning of internal unit NML.

PR libgfortran/113223

libgfortran/ChangeLog:

* io/write.c (namelist_write): If internal_unit precede 
with space.


gcc/testsuite/ChangeLog:

* gfortran.dg/dtio_25.f90: Update.
* gfortran.dg/namelist_57.f90: Update.
* gfortran.dg/namelist_65.f90: Update.

Re: [PATCH] Add __cow_string C string constructor

2024-01-07 Thread François Dumont




On 07/01/2024 17:34, Jonathan Wakely wrote:

On Sun, 7 Jan 2024 at 12:57, François Dumont  wrote:

Hi

While working on the patch to use the cxx11 abi in gnu version namespace
mode I got a small problem with this missing constructor. I'm not sure
that the main patch will be integrated in gcc 14 so I think it is better
if I propose this patch independently.

  libstdc++: Add __cow_string constructor from C string

  The __cow_string is instantiated from a C string in cow-stdexcept.cc.
  At the moment the constructor from std::string is being used with the
  drawback of an intermediate potential allocation/deallocation and copy.
  With the C string constructor we bypass all those operations.

But in that file, the std::string is the COW string, which means that
when we construct a std::string and copy it, it's cheap. It's just a
reference count increment/decrement. There should be no additional
allocation or deallocation.


Good remark, but as far as I understand, in this case std::string is the
cxx11 one. I'll take a second look.


Clearly in my gnu version namespace patch it is the cxx11 implementation.

Even if so, why do we want to do those additional operations ? Adding 
this C string constructor will make sure that no useless operations will 
be done.




Am I missing something?



  libstdc++-v3/ChangeLog:

  * include/std/stdexcept (__cow_string(const char*)): New
definition.
  * src/c++11/cow-stdexcept.cc (__cow_string(const char*)):
New definition and
  declaration.

Tested under Linux x64, ok to commit ?

François

Re: [patch, testsuite, applied] PR52641 Fix more fallout from sloppy tests.

2024-01-07 Thread Jeff Law





On 1/7/24 10:17, Georg-Johann Lay wrote:



Am 07.01.24 um 17:45 schrieb Jeff Law:



On 1/7/24 08:53, Georg-Johann Lay wrote:

Made some tests more generic so they can pass on more targets.

Johann

--

testsuite/52641: Fix fallout from sloppy tests.

gcc/testsuite/
 PR testsuite/52641
 * gcc.dg/torture/pr110838.c: Use proper shift offset to get MSB of int.
 * gcc.dg/torture/pr112282.c: Use at least 32 bits for :20 bit-fields.
 * gcc.dg/tree-ssa/bitcmp-5.c: Use integral type with 32 bits or more.
 * gcc.dg/tree-ssa/bitcmp-6.c: Same.
 * gcc.dg/tree-ssa/cltz-complement-max.c: Same.
 * gcc.dg/tree-ssa/cltz-max.c: Same.
 * gcc.dg/tree-ssa/if-to-switch-8.c: Use literals that fit int.
 * gcc.dg/tree-ssa/if-to-switch-9.c [avr]: Set case-values-threshold=3.
 * gcc.dg/tree-ssa/negneg-3.c: Discriminate [not] large_double.
 * gcc.dg/tree-ssa/phi-opt-25b.c: Use types of correct widths for
 __builtin_bswapN.
 * gcc.dg/tree-ssa/pr55177-1.c: Same.
 * gcc.dg/tree-ssa/popcount-max.c: Use int32_t where required.
 * gcc.dg/tree-ssa/pr111583-1.c: Use intptr_t as needed.
 * gcc.dg/tree-ssa/pr111583-2.c: Same.

Are you checking this on other targets?  My tester just started
complaining about these (ft32-elf, fr30-elf); more are expected as
today's run progresses.




Tests that now fail, but worked before (2 tests):

ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)
ft32-sim: gcc: gcc.dg/tree-ssa/phi-opt-25b.c (test for excess errors)


Jeff


Hi Jeff, thanks for fixing the typo.

It slipped through because "int short" works in that place.

Usually when going after PR52641 I used dg-require, dg-skip or
dg-xfail for tests that fail on 16-bit int etc.

Yea.  We're not terribly good about keeping the testsuite 16-bit clean.
Just not that much interest in such ports.  Just to be clear -- I'm
happy to see the testsuite improved by reducing noise, even on these
lesser used platforms.




The take above was more ambitious in that it tried to make some
tests work without breaking other platforms of course.

ACK.  It happens.  Just a reminder to be careful.  I've already fixed it
on the trunk.






It's not always easy to get the intent of a test case and how
to make it more generic though.

Absolutely true.

Thanks again,

jeff

Re: [PATCH]middle-end: thread through existing LCSSA variable for alternative exits too [PR113237]

2024-01-07 Thread Toon Moene


On 1/7/24 18:29, Tamar Christina wrote:


gcc/ChangeLog:

PR tree-optimization/113237
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
existing LCSSA variable for exit when all exits are early break.


Might that be the same error as I got here when building with 
bootstrap-lto and bootstrap-O3:


https://gcc.gnu.org/pipermail/gcc-testresults/2024-January/804807.html

?

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands

Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-07 Thread Jeff Law





On 1/6/24 17:36, Juzhe-Zhong wrote:

Obvious fix, Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: replace std::max by MAX.

Curious why you made this change -- in general we're moving to
std::{min,max,swap} and away from macro-ized min/max/swap.
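
For context on the difference between the two forms, here is a small sketch. SKETCH_MAX is a hypothetical macro of the usual ternary shape; it is assumed, not confirmed by this thread, to behave like GCC's MAX. The sketch only counts how often an argument with a side effect is evaluated.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ternary-style macro, standing in for a MAX-like macro.
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

// The macro evaluates the winning argument twice: once in the
// comparison and once more as the result.
inline int sketch_evals_with_macro()
{
  int n = 0;
  auto bump = [&n] { return ++n; };
  int r = SKETCH_MAX(bump(), 0);
  (void) r;
  return n;
}

// std::max is a function, so each argument is evaluated exactly once.
inline int sketch_evals_with_std_max()
{
  int n = 0;
  auto bump = [&n] { return ++n; };
  int r = std::max(bump(), 0);
  (void) r;
  return n;
}
```

The other practical difference is that std::max deduces a single type for both arguments, so mixed-type calls need an explicit template argument or a cast, whereas the macro accepts them silently.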


Jeff

[Patch] GCN: Add pre-initial support for gfx1100

2024-01-07 Thread Tobias Burnus

ROCm meanwhile also supports some consumer cards; besides the semi-new
gfx1030, support for gfx1100 was added more recently (in ROCm 5.7.1 for
"Ubuntu 22.04 only" and without that qualifier since ROCm 6.0.0).


GCC already has very limited support for gfx1030, whose multilib support
is, on purpose, not yet enabled by default and is WIP.


The attached patch now adds gfx1100 on top of it, assuming that it 
mostly behaves the same as gfx1030. This is really WIP as there are 
known build (assembly) issues (see below) and not only "just" runtime 
issues.


gfx1100 differs at least in the following aspects from the previously 
supported cards:


* gfx1100 has an 'architected flat scratch' which is different from the
  'absolute flat scratch' which all others (except fiji: 'offset flat
  scratch') have. Hence, '.amdhsa_reserve_flat_scratch 0' has to be
  excluded to avoid assembly errors.

* gfx1100 also does not support 'v_mov_b32_sdwa', failing to assemble
  libc/argz/libc_a-argz_stringify.o with:
  "sdwa variant of this instruction is not supported"
  → This has not been addressed in the patch; hence, specifying gfx1100 in
  --with-multilib-list= will fail to build when an in-tree newlib is built.


* * *

The attached patch fixes in addition one issue in libgomp (string-length 
len constant is too short for gfx1030 (and gfx1100) = 7 characters) and 
it includes the fix that __gfx1030__ is not defined, which I have 
submitted separately (yesterday).
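
To show why a too-short length constant matters, here is a sketch of a length-limited name comparison. It assumes the plugin compares ISA names with strncmp and a fixed length; that usage, and the names sketch_gfx1030 and sketch_isa_matches, are assumptions made for illustration rather than a copy of plugin-gcn.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "gfx1030" is 7 characters long.
static const char sketch_gfx1030[] = "gfx1030";

// Compare only the first 'len' characters of the two names.
inline bool sketch_isa_matches(const char* isa, const char* name,
                               std::size_t len)
{
  assert(isa != nullptr && name != nullptr);
  return std::strncmp(isa, name, len) == 0;
}
```

With a length of 6, a name such as "gfx1031" would compare equal to "gfx1030" because the last character is never examined; with 7 the two are told apart.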


With the caveat that gfx1100 is even less usable than gfx1030 and it 
won't build newlib, is it nonetheless


  OK for mainline ?

(As gfx1100 is not enabled by default in multilib, a regular build will
not fail and I think the *.md issue can be addressed separately.)


Tobias

GCN: Add pre-initial support for gfx1100

ROCm since 5.7.1 supports gfx1100 (RDNA3) cards. This commit adds support
for it, mostly by assuming gfx1100 behaves identically to gfx1030.  Like gfx1030,
gfx1100 support is neither documented nor is the build of the multilib enabled
by default.

But contrary to gfx1030, gfx1100 has a known issue causing some libraries not
to build, including newlib: The sdwa variant of v_mov_b32_sdwa is not supported
by the hardware but GCC currently generates this instruction.
This will be addressed in a later commit.

gcc/ChangeLog:

	* config.gcc (amdgcn-*-amdhsa): Accept --with-arch=gfx1100.
	* config/gcn/gcn-hsa.h (NO_XNACK): Add gfx1100.
	(ASM_SPEC): Handle gfx1100.
	* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1100.
	(enum gcn_isa): Add ISA_RDNA3.
	(TARGET_GFX1100, TARGET_RDNA2_PLUS, TARGET_RDNA3): Define.
	* config/gcn/gcn-valu.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
	* config/gcn/gcn.cc (gcn_option_override,
	gcn_omp_device_kind_arch_isa, output_file_start): Handle gfx1100.
	(gcn_global_address_p, gcn_addr_space_legitimate_address_p): Change
	TARGET_RDNA2 to TARGET_RDNA2_PLUS.
	(gcn_hsa_declare_function_name): Don't use '.amdhsa_reserve_flat_scratch'
	with gfx1100.
	* config/gcn/gcn.h (ASSEMBLER_DIALECT): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Define __RDNA3__, __gfx1030__ and
	__gfx1100__.
	* config/gcn/gcn.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
	* config/gcn/gcn.opt (Enum gpu_type): Add gfx1100.
	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1100): Define.
	(isa_has_combined_avgprs, main): Handle gfx1100.
	* config/gcn/t-omp-device (isa): Add gfx1100.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (gcn_gfx1100_s): New const string.
	(gcn_isa_name_len): Fix length.
	(isa_hsa_name, isa_code, max_isa_vgprs): Handle gfx1100.

 gcc/config.gcc  |  2 +-
 gcc/config/gcn/gcn-hsa.h|  4 ++--
 gcc/config/gcn/gcn-opts.h   |  7 +-
 gcc/config/gcn/gcn-valu.md  | 10 
 gcc/config/gcn/gcn.cc   | 29 ---
 gcc/config/gcn/gcn.h| 10 +---
 gcc/config/gcn/gcn.md   | 32 -
 gcc/config/gcn/gcn.opt  |  3 +++
 gcc/config/gcn/mkoffload.cc |  5 
 gcc/config/gcn/t-omp-device |  2 +-
 gcc/tree-vect-loop-manip.cc | 16 +
 gcc/tree-vect-loop.cc   | 58 ++---
 libgomp/plugin/plugin-gcn.c |  9 ++-
 13 files changed, 119 insertions(+), 68 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ce40b7758dd..7e583390024 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4548,7 +4548,7 @@ case "${target}" in
 		for which in arch tune; do
 			eval "val=\$with_$which"
 			case ${val} in
-			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030)
+			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1100)
 # OK
 ;;
 			*)
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 43bbe0411a3..bf7079fbbc6 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -75,7 +75,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
supported for gcn.  */
 #define GOMP_SELF_SPECS ""
 
-#define NO_XNACK "march=fiji:;march=gfx1030:;" \
+#define NO_XNACK "marc

Re: [libgfortran, patch] PR113223 NAMELIST internal write missing leading blank character

2024-01-07 Thread Harald Anlauf


Hi Jerry!

On 1/7/24 19:40, Jerry D wrote:

Committed as simple and obvious. Initial patch thanks to Steve.

When using git gcc-commit-mklog how does one add in the coauthor?


% git help gcc-commit-mklog
...
  --co CO   Add Co-Authored-By trailer (comma separated)

However, I usually add this line manually later.

Regarding the format, have a look at existing log messages.

Cheers,
Harald


The master branch has been updated by Jerry DeLisle
:

https://gcc.gnu.org/g:add995ec117d756e61d207041cd32f937c1a1cd9

commit r14-6986-gadd995ec117d756e61d207041cd32f937c1a1cd9
Author: Jerry DeLisle 
Date:   Sun Jan 7 10:22:19 2024 -0800

     libgfortran: Emit a space at beginning of internal unit NML.

     PR libgfortran/113223

     libgfortran/ChangeLog:

     * io/write.c (namelist_write): If internal_unit precede
with space.

     gcc/testsuite/ChangeLog:

     * gfortran.dg/dtio_25.f90: Update.
     * gfortran.dg/namelist_57.f90: Update.
     * gfortran.dg/namelist_65.f90: Update.

[patch,avr,applied] Fix some avr test cases

2024-01-07 Thread Georg-Johann Lay


The patch below fixes some obvious problems in gcc.target/avr:

* Remove duplicate -mmcu=

* Skip tests with address spaces on Reduced Tiny which does not support 
address spaces at all.


* Address spaces are GNU-C, but some tests were missing -std=gnu*

* Don't test address-space __flash1 on devices that don't have it.

Johann

--

AVR: Fix some test options. Skip tests with address-space on Reduced Tiny.

gcc/testsuite/
* gcc.target/avr/lra-cpymem_qi.c: Remove duplicate -mmcu=.
* gcc.target/avr/lra-elim.c: Same.
* gcc.target/avr/pr112830.c: Skip for Reduced Tiny.
* gcc.target/avr/pr46779-1.c: Same.
* gcc.target/avr/pr46779-2.c: Same.
* gcc.target/avr/pr86869.c: Skip for Reduced Tiny and add -std=gnu99
for GNU-C due to address spaces.
* gcc.target/avr/pr89270.c: Same.
* gcc.target/avr/torture/builtins-2-flash.c: Only test address
space __flash1 if we have it.
* gcc.target/avr/torture/addr-space-1-1.c: Same.
* gcc.target/avr/torture/addr-space-2-1.c: Same.

diff --git a/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
index fdffb445b45..31cf2003c43 100644
--- a/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
+++ b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mmcu=avr51 -Os" } */
+/* { dg-options "-Os" } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/avr/lra-elim.c b/gcc/testsuite/gcc.target/avr/lra-elim.c
index d5086a7fd5d..8d5dbf8ac4e 100644
--- a/gcc/testsuite/gcc.target/avr/lra-elim.c
+++ b/gcc/testsuite/gcc.target/avr/lra-elim.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mmcu=avr25 -Os" } */
+/* { dg-options "-Os" } */
 
 typedef int HItype __attribute__ ((mode (HI)));
 HItype
diff --git a/gcc/testsuite/gcc.target/avr/pr112830.c b/gcc/testsuite/gcc.target/avr/pr112830.c
index c305daed06c..dd70dd0ea39 100644
--- a/gcc/testsuite/gcc.target/avr/pr112830.c
+++ b/gcc/testsuite/gcc.target/avr/pr112830.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! avr_tiny } } } */
 /* { dg-options "" } */
 
 typedef __SIZE_TYPE__ size_t;
diff --git a/gcc/testsuite/gcc.target/avr/pr46779-1.c b/gcc/testsuite/gcc.target/avr/pr46779-1.c
index 24522f175be..e3e0b292114 100644
--- a/gcc/testsuite/gcc.target/avr/pr46779-1.c
+++ b/gcc/testsuite/gcc.target/avr/pr46779-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { ! avr_tiny } } } */
 /* { dg-options "-Os -fsplit-wide-types" } */
 
 /* This testcase should uncover bugs like
diff --git a/gcc/testsuite/gcc.target/avr/pr46779-2.c b/gcc/testsuite/gcc.target/avr/pr46779-2.c
index 682070b5ef9..557cc749c75 100644
--- a/gcc/testsuite/gcc.target/avr/pr46779-2.c
+++ b/gcc/testsuite/gcc.target/avr/pr46779-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { ! avr_tiny } } } */
 /* { dg-options "-Os -fno-split-wide-types" } */
 
 /* This testcase should uncover bugs like
diff --git a/gcc/testsuite/gcc.target/avr/pr86869.c b/gcc/testsuite/gcc.target/avr/pr86869.c
index fbfb378e8c9..a5de4cc6510 100644
--- a/gcc/testsuite/gcc.target/avr/pr86869.c
+++ b/gcc/testsuite/gcc.target/avr/pr86869.c
@@ -1,4 +1,5 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99 -w" } */
 
 struct S {
   char y[2];
diff --git a/gcc/testsuite/gcc.target/avr/pr89270.c b/gcc/testsuite/gcc.target/avr/pr89270.c
index 2b6e4a8aa5b..5b43218eddb 100644
--- a/gcc/testsuite/gcc.target/avr/pr89270.c
+++ b/gcc/testsuite/gcc.target/avr/pr89270.c
@@ -1,4 +1,5 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! avr_tiny } } } */
+/* { dg-additional-options "-std=gnu99" } */
 
 void test()
 {
diff --git a/gcc/testsuite/gcc.target/avr/torture/addr-space-1-1.c b/gcc/testsuite/gcc.target/avr/torture/addr-space-1-1.c
index e90bdcb5bfb..4812f67e2f1 100644
--- a/gcc/testsuite/gcc.target/avr/torture/addr-space-1-1.c
+++ b/gcc/testsuite/gcc.target/avr/torture/addr-space-1-1.c
@@ -1,6 +1,10 @@
 /* { dg-options "-std=gnu99 -Tavr51-flash1.x" } */
 /* { dg-do run { target { ! avr_tiny } } } */
 
+#ifdef __FLASH1
 #define __as __flash1
+#else
+#define __as __flash
+#endif
 
 #include "addr-space-1.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/addr-space-2-1.c b/gcc/testsuite/gcc.target/avr/torture/addr-space-2-1.c
index 327124aff27..d5fcf0a5520 100644
--- a/gcc/testsuite/gcc.target/avr/torture/addr-space-2-1.c
+++ b/gcc/testsuite/gcc.target/avr/torture/addr-space-2-1.c
@@ -1,6 +1,10 @@
 /* { dg-options "-std=gnu99 -Tavr51-flash1.x" } */
 /* { dg-do run { target { ! avr_tiny } } } */
 
+#ifdef __FLASH1
 #define __as __flash1
+#else
+#define __as __flash
+#endif
 
 #include "addr-space-2.h"
diff --git a/gcc/testsuite/gcc.target/avr/torture/builtins-2-flash.c b/gcc/testsuite/gcc.target/avr/torture/builtins-2-flash.c
index 318551d5ccf..11dba67b85a 100644
--- a/gcc/testsuite/gcc.target/avr/tort

[PATCH] libstdc++: reduce std::variant template instantiation depth

2024-01-07 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

The recursively defined constraints on _Variadic_union's user-defined
destructor (necessary for maintaining trivial destructibility of the
variant iff all of its alternatives are) effectively require a template
instantiation depth of 3x the number of variants, with the instantiation
stack looking like

  ...
  _Variadic_union
  std::is_trivially_destructible_v<_Variadic_union>
  _Variadic_union::~_Variadic_union()
  _Variadic_union
  ...

Ideally the template depth should be ~equal to the number of variants
(plus a constant).  Luckily it seems we don't need to compute trivial
destructibility of the alternatives at all from _Variadic_union, since
its only user _Variant_storage already has that information.  To that
end this patch removes these recursive constraints and instead passes
this information down from _Variant_storage.  After this patch, the
template instantiation depth for 87619.cc is ~270 instead of ~780.
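
For readers who want the shape of the change outside the diff, here is a minimal sketch of passing the trivial-destructibility flag down from the storage class. All names (VariadicUnion, VariantStorage) are hypothetical stand-ins, not the libstdc++ definitions. The sketch also selects the destructor with a partial specialization on the flag rather than the requires-clause used in the patch, purely so that it builds as C++17.

```cpp
#include <string>
#include <type_traits>

// Primary template: the empty tail of the recursion.
template<bool TriviallyDestructible, typename... Ts>
union VariadicUnion { };

// Flag is true: the implicit destructor is used and stays trivial.
template<typename First, typename... Rest>
union VariadicUnion<true, First, Rest...>
{
  VariadicUnion() : rest() { }

  First first;
  VariadicUnion<true, Rest...> rest;
};

// Flag is false: an empty user-provided destructor is required, because
// some alternative is not trivially destructible.
template<typename First, typename... Rest>
union VariadicUnion<false, First, Rest...>
{
  VariadicUnion() : rest() { }
  ~VariadicUnion() { }

  First first;
  VariadicUnion<false, Rest...> rest;
};

// The only user of the union computes the flag once, with a single fold
// expression over the alternatives, and hands it to every level.
template<typename... Ts>
struct VariantStorage
{
  static constexpr bool trivially_destructible
    = (std::is_trivially_destructible_v<Ts> && ...);

  VariadicUnion<trivially_destructible, Ts...> u;
};
```

In this sketch no level of the union asks is_trivially_destructible about its own tail, which is the recursion the patch description identifies as the source of the extra instantiation depth.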

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::_Variadic_union):
Add bool __trivially_destructible template parameter.
(__detail::__variant::_Variadic_union::~_Variadic_union):
Use __trivially_destructible in constraints instead.
(_Variant_storage): Pass __trivially_destructible value to
_Variadic_union.
---
 libstdc++-v3/include/std/variant | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 20a76c8aa87..4b9002e0917 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -392,7 +392,7 @@ namespace __variant
 };
 
   // Defines members and ctors.
-  template
+  template
 union _Variadic_union
 {
   _Variadic_union() = default;
@@ -401,8 +401,8 @@ namespace __variant
_Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
 };
 
-  template
-union _Variadic_union<_First, _Rest...>
+  template
+union _Variadic_union<__trivially_destructible, _First, _Rest...>
 {
   constexpr _Variadic_union() : _M_rest() { }
 
@@ -427,13 +427,12 @@ namespace __variant
   ~_Variadic_union() = default;
 
   constexpr ~_Variadic_union()
-   requires (!is_trivially_destructible_v<_First>)
- || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
+   requires (!__trivially_destructible)
   { }
 #endif
 
   _Uninitialized<_First> _M_first;
-  _Variadic_union<_Rest...> _M_rest;
+  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
 };
 
   // _Never_valueless_alt is true for variant alternatives that can
@@ -514,7 +513,7 @@ namespace __variant
return this->_M_index != __index_type(variant_npos);
   }
 
-  _Variadic_union<_Types...> _M_u;
+  _Variadic_union _M_u;
   using __index_type = __select_index<_Types...>;
   __index_type _M_index;
 };
@@ -552,7 +551,7 @@ namespace __variant
return this->_M_index != static_cast<__index_type>(variant_npos);
   }
 
-  _Variadic_union<_Types...> _M_u;
+  _Variadic_union _M_u;
   using __index_type = __select_index<_Types...>;
   __index_type _M_index;
 };
-- 
2.43.0.254.ga26002b628

Re: [PATCH] libstdc++: reduce std::variant template instantiation depth

2024-01-07 Thread Patrick Palka

On Sun, 7 Jan 2024, Patrick Palka wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> -- >8 --
> 
> The recursively defined constraints on _Variadic_union's user-defined
> destructor (necessary for maintaining trivial destructibility of the
> variant iff all of its alternatives are) effectively require a template
> instantiation depth of 3x the number of variants, with the instantiation
> stack looking like
> 
>   ...
>   _Variadic_union
>   std::is_trivially_destructible_v<_Variadic_union>
>   _Variadic_union::~_Variadic_union()
>   _Variadic_union
>   ...
> 
> Ideally the template depth should be ~equal to the number of variants
> (plus a constant).  Luckily it seems we don't need to compute trivial
> destructibility of the alternatives at all from _Variadic_union, since
> its only user _Variant_storage already has that information.  To that
> end this patch removes these recursive constraints and instead passes
> this information down from _Variant_storage.  After this patch, the
> template instantiation depth for 87619.cc is ~270 instead of ~780.

Perhaps we should also test this change by setting -ftemplate-depth
to something between 256 and 512:

 libstdc++-v3/testsuite/20_util/variant/87619.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/20_util/variant/87619.cc b/libstdc++-v3/testsuite/20_util/variant/87619.cc
index 45418e16ca8..b7cfc20858a 100644
--- a/libstdc++-v3/testsuite/20_util/variant/87619.cc
+++ b/libstdc++-v3/testsuite/20_util/variant/87619.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-do compile { target c++17 } }
+// { dg-additional-options "-ftemplate-depth=300" }
 
 #include 
 #include 

> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/variant (__detail::__variant::_Variadic_union):
>   Add bool __trivially_destructible template parameter.
>   (__detail::__variant::_Variadic_union::~_Variadic_union):
>   Use __trivially_destructible in constraints instead.
>   (_Variant_storage): Pass __trivially_destructible value to
>   _Variadic_union.
> ---
>  libstdc++-v3/include/std/variant | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/variant 
> b/libstdc++-v3/include/std/variant
> index 20a76c8aa87..4b9002e0917 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -392,7 +392,7 @@ namespace __variant
>  };
>  
>// Defines members and ctors.
> -  template
> +  template
>  union _Variadic_union
>  {
>_Variadic_union() = default;
> @@ -401,8 +401,8 @@ namespace __variant
>   _Variadic_union(in_place_index_t<_Np>, _Args&&...) = delete;
>  };
>  
> -  template
> -union _Variadic_union<_First, _Rest...>
> +  template
> +union _Variadic_union<__trivially_destructible, _First, _Rest...>
>  {
>constexpr _Variadic_union() : _M_rest() { }
>  
> @@ -427,13 +427,12 @@ namespace __variant
>~_Variadic_union() = default;
>  
>constexpr ~_Variadic_union()
> - requires (!is_trivially_destructible_v<_First>)
> -   || (!is_trivially_destructible_v<_Variadic_union<_Rest...>>)
> + requires (!__trivially_destructible)
>{ }
>  #endif
>  
>_Uninitialized<_First> _M_first;
> -  _Variadic_union<_Rest...> _M_rest;
> +  _Variadic_union<__trivially_destructible, _Rest...> _M_rest;
>  };
>  
>// _Never_valueless_alt is true for variant alternatives that can
> @@ -514,7 +513,7 @@ namespace __variant
>   return this->_M_index != __index_type(variant_npos);
>}
>  
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
> @@ -552,7 +551,7 @@ namespace __variant
>   return this->_M_index != static_cast<__index_type>(variant_npos);
>}
>  
> -  _Variadic_union<_Types...> _M_u;
> +  _Variadic_union _M_u;
>using __index_type = __select_index<_Types...>;
>__index_type _M_index;
>  };
> -- 
> 2.43.0.254.ga26002b628
> 
>

Re: [PATCH] Add __cow_string C string constructor

2024-01-07 Thread Jonathan Wakely

On Sun, 7 Jan 2024 at 18:50, François Dumont  wrote:
>
>
> On 07/01/2024 17:34, Jonathan Wakely wrote:
> > On Sun, 7 Jan 2024 at 12:57, François Dumont  wrote:
> >> Hi
> >>
> >> While working on the patch to use the cxx11 abi in gnu version namespace
> >> mode I got a small problem with this missing constructor. I'm not sure
> >> that the main patch will be integrated in gcc 14 so I think it is better
> >> if I propose this patch independently.
> >>
> >>   libstdc++: Add __cow_string constructor from C string
> >>
> >>   The __cow_string is instantiated from a C string in
> >> cow-stdexcept.cc. At the moment
> >>   the constructor from std::string is being used with the drawback of
> >> an intermediate
> >>   potential allocation/deallocation and copy. With the C string
> >> constructor we bypass
> >>   all those operations.
> > But in that file, the std::string is the COW string, which means that
> > when we construct a std::string and copy it, it's cheap. It's just a
> > reference count increment/decrement. There should be no additional
> > allocation or deallocation.
>
> Good remark but AFAI understand in this case std::string is the cxx11
> one. I'll take a second look.
>
> Clearly in my gnu version namespace patch it is the cxx11 implementation.

I hope not! The whole point of that type is to always be a COW string,
which it does by storing a COW std::basic_string in the union, but
wrapping it in a class with a different name, __cow_string.

If your patch to use the SSO string in the versioned namespace doesn't
change that file to guarantee that __cow_string is still a
copy-on-write type then the patch is wrong and must be fixed.

>
> Even if so, why do we want to do those additional operations ? Adding
> this C string constructor will make sure that no useless operations will
> be done.

Yes, we could avoid an atomic increment and decrement, but that type
is only used when throwing an exception so the overhead of allocating
memory and calling __cxa_throw etc. is far higher than an atomic
inc/dec pair.
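
To illustrate why copying a copy-on-write string costs only a reference-count update, here is a toy sketch. ToyCowString is a hypothetical type built on std::shared_ptr; it is not how __cow_string or the COW std::basic_string is implemented, and it only models the sharing behaviour discussed above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy type: copies share one heap buffer until one of them
// is modified.
class ToyCowString
{
  std::shared_ptr<std::string> rep_;

public:
  explicit ToyCowString(const char* s)
  : rep_(std::make_shared<std::string>(s)) { }

  // The implicit copy constructor copies the shared_ptr: no new buffer,
  // just an atomic reference-count increment.

  const std::string& str() const { return *rep_; }
  long owners() const { return rep_.use_count(); }
  bool shares_buffer_with(const ToyCowString& other) const
  { return rep_ == other.rep_; }

  // Writing is the only operation that may allocate: unshare first.
  void append(const char* s)
  {
    if (rep_.use_count() > 1)
      rep_ = std::make_shared<std::string>(*rep_);
    rep_->append(s);
  }
};

// Small demonstration of the sharing behaviour.
inline bool demo_cow_sharing()
{
  ToyCowString a("out of range");
  ToyCowString b = a;                       // cheap: shares a's buffer
  bool shared = b.shares_buffer_with(a) && a.owners() == 2;
  b.append("!");                            // b now owns a private copy
  return shared && !b.shares_buffer_with(a) && a.str() == "out of range";
}
```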

I was going to say that the new constructor would need to be exported
from the shared lib, but I think the new constructor is only ever used
in these two places, both defined in that same file:

  logic_error::logic_error(const char* __arg)
  : exception(), _M_msg(__arg) { }

  runtime_error::runtime_error(const char* __arg)
  : exception(), _M_msg(__arg) { }

So I think the change is safe, but I don't think it's urgent, and
certainly not needed for the reasons claimed in the patch description.

[PATCH] Fortran: SIZE optional DIM argument having OPTIONAL+VALUE attributes [PR113245]

2024-01-07 Thread Harald Anlauf

Dear all,

the attached, actually rather obvious patch fixes an issue when
an optional dummy with the value attribute was passed as DIM
argument to the SIZE intrinsic.  Instead of some hand-crafted,
incomplete presence check for the argument, it makes more sense
to rely on gfc_conv_expr_present().

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 49f5c89f6bdddbb225ca70f8df78a75252b0b2d5 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 7 Jan 2024 22:24:25 +0100
Subject: [PATCH] Fortran: SIZE optional DIM argument having OPTIONAL+VALUE
 attributes [PR113245]

gcc/fortran/ChangeLog:

	PR fortran/113245
	* trans-intrinsic.cc (gfc_conv_intrinsic_size): Use
	gfc_conv_expr_present() for proper check of optional DIM argument.

gcc/testsuite/ChangeLog:

	PR fortran/113245
	* gfortran.dg/size_optional_dim_2.f90: New test.
---
 gcc/fortran/trans-intrinsic.cc|  4 +--
 .../gfortran.dg/size_optional_dim_2.f90   | 31 +++
 2 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/size_optional_dim_2.f90

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index d973c49380c..74139262657 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -8025,9 +8025,6 @@ gfc_conv_intrinsic_size (gfc_se * se, gfc_expr * expr)
 	  argse.data_not_needed = 1;
 	  gfc_conv_expr (&argse, actual->expr);
 	  gfc_add_block_to_block (&se->pre, &argse.pre);
-	  cond = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
-  argse.expr, null_pointer_node);
-	  cond = gfc_evaluate_now (cond, &se->pre);
 	  /* 'block2' contains the arg2 absent case, 'block' the arg2 present
 	  case; size_var can be used in both blocks. */
 	  tree size_var = gfc_create_var (TREE_TYPE (size), "size");
@@ -8038,6 +8035,7 @@ gfc_conv_intrinsic_size (gfc_se * se, gfc_expr * expr)
 	  tmp = fold_build2_loc (input_location, MODIFY_EXPR,
  TREE_TYPE (size_var), size_var, size);
 	  gfc_add_expr_to_block (&block2, tmp);
+	  cond = gfc_conv_expr_present (actual->expr->symtree->n.sym);
 	  tmp = build3_v (COND_EXPR, cond, gfc_finish_block (&block),
 			  gfc_finish_block (&block2));
 	  gfc_add_expr_to_block (&se->pre, tmp);
diff --git a/gcc/testsuite/gfortran.dg/size_optional_dim_2.f90 b/gcc/testsuite/gfortran.dg/size_optional_dim_2.f90
new file mode 100644
index 000..698702b0974
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/size_optional_dim_2.f90
@@ -0,0 +1,31 @@
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original" }
+! PR fortran/113245 - SIZE, optional DIM argument, w/ OPTIONAL+VALUE attributes
+
+program p
+  implicit none
+  real:: a(2,3)
+  integer :: expect
+  expect = size (a,2)
+  call ref (a,2)
+  call val (a,2)
+  expect = size (a)
+  call ref (a)
+  call val (a)
+contains
+  subroutine ref (x, dim)
+real,  intent(in) :: x(:,:)
+integer, optional, intent(in) :: dim
+print *, "present(dim), size(a,dim) =", present (dim), size (x,dim=dim)
+if (size (x,dim=dim) /= expect) stop 1
+  end
+  subroutine val (x, dim)
+real,  intent(in) :: x(:,:)
+integer, optional, value  :: dim
+print *, "present(dim), size(a,dim) =", present (dim), size (x,dim=dim)
+if (size (x,dim=dim) /= expect) stop 2
+  end
+end
+
+! Ensure inline code is generated:
+! { dg-final { scan-tree-dump-not "_gfortran_size" "original" } }
--
2.35.3

Re: Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

2024-01-07 Thread 钟居哲

Since Robin asked me in a previous review to change std::max into MAX,
I thought the policy was to prefer MAX over std::max.

I changed the code to make it consistent, but it seems I was wrong.

So is it reasonable for me to change all RVV-related code back to
std::max/min?

If yes, I can send a patch that adapts all of the RVV-related code.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-01-08 03:11
To: Juzhe-Zhong; gcc-patches
Subject: Re: [Committed] RISC-V: Use MAX instead of std::max [VSETVL PASS]

On 1/6/24 17:36, Juzhe-Zhong wrote:
> Obvious fix, Committed.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-vsetvl.cc: replace std::max by MAX.
Curious why you made this change -- in general we're moving to 
std::{min,max,swap} and away from macro-ized min/max/swap.

Jeff

[PATCH v8 5/4] c++: P0847R7 (deducing this) - CWG2586. [PR102609]

2024-01-07 Thread waffl3x

Bootstrapped and tested on x86_64-linux with no regressions.

Not as hard as I thought it would be! As noted in the commit message, I believe
this makes explicit object member functions feature complete.

From a5f947d411b5e19ce7efbb4d766a2792b02c9626 Mon Sep 17 00:00:00 2001
From: Waffl3x 
Date: Sun, 7 Jan 2024 15:02:57 -0700
Subject: [PATCH] C++23 P0847R7 (deducing this) - CWG2586. [PR102609]

This adds support for defaulted comparison operators and copy/move assignment
operators, as well as allowing user defined xobj copy/move assignment
operators. It turns out defaulted comparison operators already worked though,
so this just adds a test for them. Defaulted assignment operators were not so
nice and required a bit of a hack. Should work fine though!

The diagnostics leave something to be desired, and there are some things that
could be improved with more extensive design changes. There are a few notes
left indicating where I think we could make improvements.

Aside from some small bugs, with this commit xobj member functions should be
feature complete.
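
The object-parameter checks that this patch adds to copy_fn_p and move_signature_fn_p can be restated at the language level. The following is only a rough sketch of those conditions as a type trait; the trait name and the Widget/Gadget types are hypothetical, and this is not how the front end implements the check.

```cpp
#include <type_traits>

// Hypothetical trait: is ObjParam an acceptable explicit object parameter
// type for a copy/move assignment operator of Class?  Mirrors the three
// rejections in the patch: not an lvalue reference, const-qualified, or
// referring to an unrelated class.
template<typename Class, typename ObjParam>
constexpr bool xobj_assign_param_ok
  = std::is_lvalue_reference_v<ObjParam>
    && !std::is_const_v<std::remove_reference_t<ObjParam>>
    && std::is_same_v<std::remove_cv_t<std::remove_reference_t<ObjParam>>,
		      Class>;

struct Widget { };
struct Gadget { };
```

Under these assumptions only `Widget&` (or `volatile Widget&`) qualifies for Widget; by-value, rvalue-reference, const and unrelated object parameters are all rejected.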

	PR c++/102609

gcc/cp/ChangeLog:

	PR c++/102609
	C++23 P0847R7 (deducing this) - CWG2586.
	* decl.cc (copy_fn_p): Accept xobj copy assignment functions.
	(move_signature_fn_p): Accept xobj move assignment functions.
	* method.cc (do_build_copy_assign): Handle defaulted xobj member
	functions.
	(defaulted_late_check): Comment.
	(defaultable_fn_check): Comment.

gcc/testsuite/ChangeLog:

	PR c++/102609
	C++23 P0847R7 (deducing this) - CWG2586.
	* g++.dg/cpp23/explicit-obj-basic6.C: New test.
	* g++.dg/cpp23/explicit-obj-default1.C: New test.
	* g++.dg/cpp23/explicit-obj-default2.C: New test.

Signed-off-by: Waffl3x 
---
 gcc/cp/decl.cc| 28 +++-
 gcc/cp/method.cc  | 55 ++--
 .../g++.dg/cpp23/explicit-obj-basic6.C| 51 +++
 .../g++.dg/cpp23/explicit-obj-default1.C  | 57 
 .../g++.dg/cpp23/explicit-obj-default2.C  | 65 +++
 5 files changed, 248 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-basic6.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-default1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-default2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7f267055c29..b10a72a87bf 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -15663,7 +15663,19 @@ copy_fn_p (const_tree d)
   && DECL_NAME (d) != assign_op_identifier)
 return 0;
 
-  args = FUNCTION_FIRST_USER_PARMTYPE (d);
+  if (DECL_XOBJ_MEMBER_FUNCTION_P (d))
+{
+  tree object_param = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (d)));
+  if (!TYPE_REF_P (object_param)
+	  || TYPE_REF_IS_RVALUE (object_param)
+	  /* Reject unrelated object parameters. */
+	  || TYPE_MAIN_VARIANT (TREE_TYPE (object_param)) != DECL_CONTEXT (d)
+	  || CP_TYPE_CONST_P (TREE_TYPE (object_param)))
+	return 0;
+  args = TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (d)));
+}
+  else
+args = FUNCTION_FIRST_USER_PARMTYPE (d);
   if (!args)
 return 0;
 
@@ -15738,7 +15750,19 @@ move_signature_fn_p (const_tree d)
   && DECL_NAME (d) != assign_op_identifier)
 return 0;
 
-  args = FUNCTION_FIRST_USER_PARMTYPE (d);
+  if (DECL_XOBJ_MEMBER_FUNCTION_P (d))
+{
+  tree object_param = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (d)));
+  if (!TYPE_REF_P (object_param)
+	  || TYPE_REF_IS_RVALUE (object_param)
+	  /* Reject unrelated object parameters. */
+	  || TYPE_MAIN_VARIANT (TREE_TYPE (object_param)) != DECL_CONTEXT (d)
+	  || CP_TYPE_CONST_P (TREE_TYPE (object_param)))
+	return 0;
+  args = TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (d)));
+}
+  else
+args = FUNCTION_FIRST_USER_PARMTYPE (d);
   if (!args)
 return 0;
 
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index aa5a044883e..da6a08a0304 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -795,13 +795,19 @@ do_build_copy_assign (tree fndecl)
   compound_stmt = begin_compound_stmt (0);
   parm = convert_from_reference (parm);
 
+  /* If we are building a defaulted xobj copy/move assignment operator then
+ current_class_ref will not have been set up.
+ Kind of an icky hack, but what can ya do?  */
+  tree const class_ref = DECL_XOBJ_MEMBER_FUNCTION_P (fndecl)
+? cp_build_fold_indirect_ref (DECL_ARGUMENTS (fndecl)) : current_class_ref;
+
   if (trivial
   && is_empty_class (current_class_type))
 /* Don't copy the padding byte; it might not have been allocated
if *this is a base subobject.  */;
   else if (trivial)
 {
-  tree t = build2 (MODIFY_EXPR, void_type_node, current_class_ref, parm);
+  tree t = build2 (MODIFY_EXPR, void_type_node, class_ref, parm);
   finish_expr_stmt (t);
 }
   else
@@ -826,7 +832,7 @@ do_build_copy_assign (tree fndecl)
 	  /* Call the base class assignment operator.  */
 	  releasing_vec parmvec (make_tree_vector_single (converted_parm));
 	  fi

Re: [RFA] [V3] new pass for sign/zero extension elimination

2024-01-07 Thread Jeff Law





On 1/3/24 05:07, Richard Sandiford wrote:


+
+ if (GET_CODE (x) == ZERO_EXTRACT)
+   {
+ /* If either the size or the start position is unknown,
+then assume we know nothing about what is overwritten.
+This is overly conservative, but safe.  */
+ if (!CONST_INT_P (XEXP (x, 1)) || !CONST_INT_P (XEXP (x, 2)))
+   continue;
+ mask = (1ULL << INTVAL (XEXP (x, 1))) - 1;
+ bit = INTVAL (XEXP (x, 2));
+ if (BITS_BIG_ENDIAN)
+   bit = (GET_MODE_BITSIZE (GET_MODE (x))
+  - INTVAL (XEXP (x, 1)) - bit).to_constant ();
+ x = XEXP (x, 0);


Should mask be shifted by bit here, like it is for the subregs?  E.g.:

   (set (zero_extract:SI R (const_int 2) (const_int 7)))

would currently give a mask of 0x3 and a bit of 7, whereas I think it
should be a mask of 0x180.  Without that we would only treat the low 8
bits of R as live, rather than the low 16 bits.
Seems like it should to me.  But as you note later, the semantics for 
ZERO_EXTRACT in a SET are a bit special in that only the written bits 
are affected.
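
The arithmetic in Richard's example can be checked with a small sketch. The helper names are hypothetical, and the code assumes !BITS_BIG_ENDIAN and a field that fits in 64 bits; it is not the code from the pass.

```cpp
#include <cstdint>

// Hypothetical helper: bits written by
//   (zero_extract X (const_int size) (const_int pos))
// assuming little-endian bit numbering, size < 64 and size + pos <= 64.
constexpr std::uint64_t
zero_extract_mask (unsigned size, unsigned pos)
{
  return ((std::uint64_t{1} << size) - 1) << pos;
}

// Hypothetical helper: how many low-order bytes are needed to cover
// every set bit of MASK.
constexpr unsigned
live_low_bytes (std::uint64_t mask)
{
  unsigned n = 0;
  while (mask != 0)
    {
      mask >>= 8;
      ++n;
    }
  return n;
}
```

For size 2 and position 7 the unshifted mask 0x3 covers one byte, while the shifted mask 0x180 reaches into the second byte, which matches the "low 8 bits" versus "low 16 bits" distinction above.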


So we can't just continue iteration during SET processing for 
ZERO_EXTRACT (which would see the inner object as set and mark it as no 
longer live).  Instead we should skip the sub-rtxs which will leave the 
entire destination live.


On the use processing, we need to make the destination live (since it's 
really a read-modify-write operation) and make sure we process the 
size/start since those could be REGs.






+
+/* INSN has a sign/zero extended source inside SET that we will
+   try to turn into a SUBREG.  */
+static void
+ext_dce_try_optimize_insn (rtx_insn *insn, rtx set)
+{
+  rtx src = SET_SRC (set);
+  rtx inner = XEXP (src, 0);
+
+  /* Avoid (subreg (mem)) and other constructs which may be valid RTL, but
+ not useful for this optimization.  */
+  if (!REG_P (inner) && !SUBREG_P (inner))
+return;


It doesn't look like this does avoid (subreg (mem)) directly.  We'd need
to check REG_P (SUBREG_REG (inner)) for that.

Yea, Jivan fixed that.



+
+case SMUL_HIGHPART:
+case UMUL_HIGHPART:
+  if (XEXP (x, 1) == const0_rtx)
+   return 0;
+  if (CONSTANT_P (XEXP (x, 1)))
+   {
 if (pow2p_hwi (INTVAL (XEXP (x, 1))))
   return mmask & (mask << (GET_MODE_BITSIZE (mode).to_constant ()
- exact_log2 (INTVAL (XEXP (x, 1)))));


This will give a UB shift for x==1 (log2==0).  For that I guess the
mask should be the sign bit for SMUL_HIGHPART and 0 for UMUL_HIGHPART.
Or we could just punt to mmask, since we shouldn't see unfolded mul_highparts.
I think just punting to mmask is reasonable.   While we shouldn't (in 
theory) see this unfolded, I've actually seen a patch in the last year 
or so that claimed we could see a (const_int 0) operand in a vector 
highpart multiply.


Essentially it was exposed via an intrinsic and never simplified, 
probably because the patterns were too complex and the generic 
optimizers didn't know how to turn it into a vector splat of 0.  I'd bet 
the same thing could happen for const_int 1, but we wouldn't simplify 
because we wouldn't know how to turn it into a vector copy.


Point being there may be ways for the oddball cases to slip through and 
we should do something sensible for them.
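
A sketch of the "punt to mmask" option, to make the undefined shift concrete. The function and its helpers are hypothetical; they mirror the quoted expression on plain 64-bit integers rather than RTL. Returning mmask for non-power-of-2 multipliers is an assumption, since the quoted hunk does not show that case.

```cpp
#include <cstdint>

constexpr bool
is_pow2 (std::uint64_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

constexpr unsigned
log2_exact (std::uint64_t v)   // V must be a power of 2.
{
  unsigned n = 0;
  while (v > 1)
    {
      v >>= 1;
      ++n;
    }
  return n;
}

// Hypothetical stand-in for the *MUL_HIGHPART case.  MASK and MMASK play
// the same roles as in the quoted code; MODE_BITS is the mode's bit size.
constexpr std::uint64_t
highpart_live_mask (std::uint64_t mask, std::uint64_t mmask,
		    unsigned mode_bits, std::uint64_t multiplier)
{
  if (multiplier == 0)
    return 0;
  if (!is_pow2 (multiplier))
    return mmask;	// Assumption: conservative answer.
  unsigned shift = mode_bits - log2_exact (multiplier);
  // A multiplier of 1 in a 64-bit mode gives shift == 64, which would be
  // an undefined shift on a 64-bit type, so punt instead.
  if (shift >= 64)
    return mmask;
  return mmask & (mask << shift);
}
```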




+
+ /* ?!? How much of this should mirror SET handling, potentially
+being shared?   */
+ if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
+   {
+ bit = subreg_lsb (dst).to_constant ();
+ if (bit >= HOST_BITS_PER_WIDE_INT)
+   bit = HOST_BITS_PER_WIDE_INT - 1;
+ dst = SUBREG_REG (dst);
+   }
+ else if (GET_CODE (dst) == ZERO_EXTRACT
+  || GET_CODE (dst) == STRICT_LOW_PART)
+   dst = XEXP (dst, 0);


Seems like we should handle the bit position in the ZERO_EXTRACT, since
if we look through to the first operand and still use live_tmp on it,
we're effectively assuming that XEXP (dst, 2) specifies the lsb.

But for now it might be simpler to remove ZERO_EXTRACT from the "else if"
and leave a FIXME, since continuing will be conservatively correct.
I'd leave it conservatively correct for gcc-14.  If we find cases where 
it's valuable to handle it better, we can add that later.





+
+ /* We're inside a SET and want to process the source operands
+making things live.  Breaking from this loop will cause
+the iterator to work on sub-rtxs, so it is safe to break
+if we see something we don't know how to handle.  */
+ for (;;)
+   {
+ /* Strip an outer paradoxical subreg.  The bits outside
+the inner mode are don't cares.  So we can just strip
+and process the inner object.  */
+

[PATCH] RISC-V: Also handle sign extension in branch costing

2024-01-07 Thread Maciej W. Rozycki

Complement commit c1e8cb3d9f94 ("RISC-V: Rework branch costing model for 
if-conversion") and also handle extraneous sign extend operations that 
are sometimes produced by `noce_try_cmove_arith' instead of zero extend 
operations, making branch costing consistent.  It is unclear what the 
condition is for the middle end to choose between the zero extend and 
sign extend operation, but the test case included uses sign extension 
with 64-bit targets, preventing if-conversion from triggering across all 
the architectural variants.

There are further anomalies revealed by the test case, specifically the 
exceedingly high branch cost of 6 required for the `-mmovcc' variant 
despite that the final branchless sequence only uses 4 instructions, the 
missed conversion at -O1 for 32-bit targets even though code is machine 
word size agnostic, and the missed conversion at -Os and -Oz for 32-bit 
Zicond targets even though the branchless sequence would be shorter than 
the branched one.  These will have to be handled separately.

gcc/
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Also handle sign extension.

gcc/testsuite/
* gcc.target/riscv/cset-sext-sfb.c: New test.
* gcc.target/riscv/cset-sext-thead.c: New test.
* gcc.target/riscv/cset-sext-ventana.c: New test.
* gcc.target/riscv/cset-sext-zicond.c: New test.
* gcc.target/riscv/cset-sext.c: New test.
---
Hi,

 This is still in regression-testing, but as a branch costing adjustment 
only I don't expect any code correctness issues, and the performance 
advantage seems very obvious as the sign extend operation applied to the 
result of a conditional set instruction is always a no-op, just as with 
the zero extension.
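
To make the no-op claim concrete: a conditional set only ever produces 0 or 1, and for those two values sign extension and zero extension give the same result. The following sketch models this for a 32-bit to 64-bit extension; the function names are hypothetical and the code models the arithmetic only, not the RTL.

```cpp
#include <cstdint>

// Hypothetical model of a conditional set: the result is always 0 or 1.
constexpr std::uint32_t
cset (long a)
{
  return a != 0 ? 1u : 0u;
}

constexpr std::uint64_t
zero_extend_32_to_64 (std::uint32_t v)
{
  return v;
}

constexpr std::uint64_t
sign_extend_32_to_64 (std::uint32_t v)
{
  // Replicate bit 31 into the upper half.
  return (v & 0x80000000u) ? (std::uint64_t{0xffffffff00000000} | v) : v;
}
```

The two extensions only disagree when bit 31 is set, which cannot happen for a value restricted to 0 or 1.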

 Depending on how you look at it you may qualify this as a bug fix (for 
the commit referred; it's surely rare enough a case I missed in original 
testing) or a missed optimisation.  Either way it's a narrow-scoped very 
small change, almost an obviously correct one.  I'll be very happy to get 
it off my plate now, but if it has to wait for GCC 15, I'll accept the 
decision.

 OK to apply then or shall I wait?

  Maciej
---
 gcc/config/riscv/riscv.cc  |5 ++-
 gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c |   28 +
 gcc/testsuite/gcc.target/riscv/cset-sext-thead.c   |   26 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-ventana.c |   26 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-zicond.c  |   26 +++
 gcc/testsuite/gcc.target/riscv/cset-sext.c |   27 
 6 files changed, 136 insertions(+), 2 deletions(-)

gcc-riscv-noce-conversion-profitable-p-sign-extend.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3555,7 +3555,7 @@ riscv_noce_conversion_profitable_p (rtx_
  this redundant zero extend operation counts towards the cost of
  the replacement sequence.  Compensate for that by incrementing the
  cost of the original sequence as well as the maximum sequence cost
- accordingly.  */
+ accordingly.  Likewise for sign extension.  */
   rtx last_dest = NULL_RTX;
   for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
 {
@@ -3567,8 +3567,9 @@ riscv_noce_conversion_profitable_p (rtx_
  && GET_CODE (x) == SET)
{
  rtx src = SET_SRC (x);
+ enum rtx_code code = GET_CODE (src);
  if (last_dest != NULL_RTX
- && GET_CODE (src) == ZERO_EXTEND
+ && (code == SIGN_EXTEND || code == ZERO_EXTEND)
  && REG_P (XEXP (src, 0))
  && REGNO (XEXP (src, 0)) == REGNO (last_dest))
{
Index: gcc/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/cset-sext-sfb.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv32gc -mtune=sifive-7-series -mbranch-cost=1 -fdump-rtl-ce1" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 -fdump-rtl-ce1" { target { rv64 } } } */
+
+int
+foo (long a, long b)
+{
+  if (!b)
+return 0;
+  else if (a)
+return 1;
+  else
+return 0;
+}
+
+/* Expect short forward branch assembly like:
+
+   snez a0,a0
+   bne a1,zero,1f  # movcc
+   mv  a0,zero
+1:
+ */
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove_arith" 1 "ce1" { xfail { rv32 && { any-opts "-O1" } } } } } */
+/* { dg-final { scan-assembler-times "\\ssnez\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sbne\\s\[^\\s\]+\\s# movcc\\s" 1 { xfail { rv32 && { any-opts "-O1" } } } } } */
+/* { dg-final { scan-assembler-not "\\sbeq\

[committed] RISC-V: Fix avl-type operand index error for ZVBC

2024-01-07 Thread Feng Wang

This patch fixes the rtl-checking error for crypto vector.  The root
cause is that the avl-type operand index of the zvbc instructions is
wrong: it should be operand[8], not operand[5].
gcc/ChangeLog:

* config/riscv/vector.md: Modify avl_type operand index of zvbc ins.
---
 gcc/config/riscv/vector.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index be5beb5ab64..24b7b4394be 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -864,9 +864,9 @@
  
vnclip,vicmp,vfalu,vfmul,vfminmax,vfdiv,vfwalu,vfwmul,\
  vfsgnj,vfcmp,vslideup,vslidedown,vislide1up,\
  
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
- vlsegds,vlsegdux,vlsegdox,vandn,vrol,vror,vwsll")
+ 
vlsegds,vlsegdux,vlsegdox,vandn,vrol,vror,vclmul,vclmulh,vwsll")
   (const_int 8)
-(eq_attr "type" "vstux,vstox,vssegts,vssegtux,vssegtox,vclmul,vclmulh")
+(eq_attr "type" "vstux,vstox,vssegts,vssegtux,vssegtox")
   (const_int 5)
 
 (eq_attr "type" "vimuladd,vfmuladd")
-- 
2.17.1

[PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-07 Thread xndcn

Hi, I found that the __atomic_float constructor does not clear padding,
while __compare_exchange assumes the padding is zeroed. So it is easy to
reproduce an infinite loop on x86-64 with the long double type, like:
---
-O0 -std=c++23 -mlong-double-80
#include <atomic>
#include <cstdio>

#define T long double
int main() {
std::atomic<T> t(0.5);
t.fetch_add(0.5);
float x = t;
printf("%f\n", x);
}
---

So we should add __builtin_clear_padding in __atomic_float constructor,
just like the generic atomic struct.
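
To illustrate why uncleared padding matters for a bytewise compare-and-swap,
here is a minimal sketch. It assumes a small struct with padding (the type
Padded and the helper names are hypothetical) instead of an 80-bit long
double, so that it behaves the same on targets where long double has no
padding. The fallback branch is only there for compilers that lack
__builtin_clear_padding.

```cpp
#include <cassert>
#include <cstring>

#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Illustrative type: on typical ABIs there are 3 padding bytes after `tag`.
struct Padded
{
  char tag;
  int value;
};

// Fill the whole object (padding included) with `fill`, then set the members.
// The padding bytes normally keep the fill pattern afterwards.
void init_with_dirty_padding(Padded& p, unsigned char fill)
{
  std::memset(&p, fill, sizeof p);
  p.tag = 'x';
  p.value = 42;
}

// Zero the padding bytes of `p`, leaving the member values untouched.
void clear_padding(Padded& p)
{
#if __has_builtin(__builtin_clear_padding)
  __builtin_clear_padding(&p);
#else
  // Fallback for compilers without the builtin: rebuild from a zeroed object.
  const Padded copy = p;
  std::memset(&p, 0, sizeof p);
  p.tag = copy.tag;
  p.value = copy.value;
#endif
}

// Bytewise comparison, which is what a compare-and-swap on the object
// representation effectively performs.
bool same_bytes(const Padded& a, const Padded& b)
{ return std::memcmp(&a, &b, sizeof(Padded)) == 0; }
```

Two objects initialised with different fill patterns hold equal values but
may differ bytewise; once clear_padding has run on both, same_bytes returns
true. That is the invariant the compare-exchange loop relies on.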

regtested on x86_64-linux. Is it OK for trunk?

---
libstdc++: atomic: Add missing clear_padding in __atomic_float constructor.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h: add __builtin_clear_padding in __atomic_float
constructor.
---
 libstdc++-v3/include/bits/atomic_base.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/atomic_base.h
b/libstdc++-v3/include/bits/atomic_base.h
index f4ce0fa53..d59c2209e 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1283,7 +1283,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

   constexpr
   __atomic_float(_Fp __t) : _M_fp(__t)
-  { }
+  {
+#if __has_builtin(__builtin_clear_padding)
+ if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Fp>())
+  __builtin_clear_padding(std::__addressof(_M_fp));
+#endif
+  }

   __atomic_float(const __atomic_float&) = delete;
   __atomic_float& operator=(const __atomic_float&) = delete;
-- 
2.25.1

[committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20

2024-01-07 Thread Jonathan Wakely

Tested x86_64-linux and aarch64-linux. Pushed to trunk.

-- >8 --

This change ensures that char and wchar_t arguments are formatted
consistently when using integer presentation types. This avoids
non-portable std::format output that depends on whether char and wchar_t
happen to be signed or unsigned on the target. Formatting '\xff' as an
integer will now always format 255 and not sometimes -1. This was
approved in Kona 2023 as a DR for C++20 so the change is implemented
unconditionally.

Also make character formatters check for _Pres_c explicitly and call
_M_format_character directly. This avoids the overhead of calling format
and _S_to_character and then calling _M_format_character anyway.
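
As a rough illustration of the "cast to the unsigned equivalent" idea (not
the libstdc++ implementation), the sketch below converts a character to its
unsigned counterpart before rendering it as an integer. The helper name
code_unit_as_integer is hypothetical, and an 8-bit char is assumed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: render a character's code unit as a decimal integer,
// going through the unsigned counterpart of the character type first so the
// result does not depend on whether CharT is signed on the target.
template<typename CharT>
std::string code_unit_as_integer(CharT c)
{
  using Unsigned = std::make_unsigned_t<CharT>;
  return std::to_string(
      static_cast<unsigned long long>(static_cast<Unsigned>(c)));
}
```

With this, code_unit_as_integer('\xff') yields "255" whether plain char is
signed or unsigned.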

libstdc++-v3/ChangeLog:

* include/bits/version.def (format_uchar): Define.
* include/bits/version.h: Regenerate.
* include/std/format (formatter::format): Check for
_Pres_c and call _M_format_character directly. Cast C to its
unsigned equivalent for formatting as an integer.
(formatter::format): Likewise.
(basic_format_arg(T&)): Store char arguments as unsigned char
for formatting to a wide string.
* testsuite/std/format/functions/format.cc: Adjust test. Check
formatting of
---
 libstdc++-v3/include/bits/version.def |   9 ++
 libstdc++-v3/include/bits/version.h   | 141 ++
 libstdc++-v3/include/std/format   |  14 +-
 .../testsuite/std/format/functions/format.cc  |  27 +++-
 4 files changed, 118 insertions(+), 73 deletions(-)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index fa304146c65..7c7ba066161 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1166,6 +1166,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = format_uchar;
+  values = {
+v = 202311;
+cxxmin = 20;
+hosted = yes;
+  };
+};
+
 // FIXME: #define __glibcxx_execution 201902L
 
 ftms = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 1e1da11e085..65d5164347e 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1422,7 +1422,18 @@
 #endif /* !defined(__cpp_lib_format) && defined(__glibcxx_want_format) */
 #undef __glibcxx_want_format
 
-// from version.def line 1172
+// from version.def line 1170
+#if !defined(__cpp_lib_format_uchar)
+# if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
+#  define __glibcxx_format_uchar 202311L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_format_uchar)
+#   define __cpp_lib_format_uchar 202311L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_format_uchar) && 
defined(__glibcxx_want_format_uchar) */
+#undef __glibcxx_want_format_uchar
+
+// from version.def line 1181
 #if !defined(__cpp_lib_constexpr_complex)
 # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
 #  define __glibcxx_constexpr_complex 201711L
@@ -1433,7 +1444,7 @@
 #endif /* !defined(__cpp_lib_constexpr_complex) && 
defined(__glibcxx_want_constexpr_complex) */
 #undef __glibcxx_want_constexpr_complex
 
-// from version.def line 1181
+// from version.def line 1190
 #if !defined(__cpp_lib_constexpr_dynamic_alloc)
 # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
 #  define __glibcxx_constexpr_dynamic_alloc 201907L
@@ -1444,7 +1455,7 @@
 #endif /* !defined(__cpp_lib_constexpr_dynamic_alloc) && 
defined(__glibcxx_want_constexpr_dynamic_alloc) */
 #undef __glibcxx_want_constexpr_dynamic_alloc
 
-// from version.def line 1190
+// from version.def line 1199
 #if !defined(__cpp_lib_constexpr_string)
 # if (__cplusplus >= 202002L) && _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_HOSTED && 
(defined(__glibcxx_is_constant_evaluated))
 #  define __glibcxx_constexpr_string 201907L
@@ -1465,7 +1476,7 @@
 #endif /* !defined(__cpp_lib_constexpr_string) && 
defined(__glibcxx_want_constexpr_string) */
 #undef __glibcxx_want_constexpr_string
 
-// from version.def line 1214
+// from version.def line 1223
 #if !defined(__cpp_lib_constexpr_vector)
 # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
 #  define __glibcxx_constexpr_vector 201907L
@@ -1476,7 +1487,7 @@
 #endif /* !defined(__cpp_lib_constexpr_vector) && 
defined(__glibcxx_want_constexpr_vector) */
 #undef __glibcxx_want_constexpr_vector
 
-// from version.def line 1223
+// from version.def line 1232
 #if !defined(__cpp_lib_erase_if)
 # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
 #  define __glibcxx_erase_if 202002L
@@ -1487,7 +1498,7 @@
 #endif /* !defined(__cpp_lib_erase_if) && defined(__glibcxx_want_erase_if) */
 #undef __glibcxx_want_erase_if
 
-// from version.def line 1232
+// from version.def line 1241
 #if !defined(__cpp_lib_generic_unordered_lookup)
 # if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
 #  define __glibcxx_generic_unordered_lookup 201811L
@@ -1498,7 +1509,7 @@
 #endif /* !defined(__cpp_lib_generic_unordered_lookup) && 
defined(__glibcxx_want_generic_unordered_lookup) */
 #undef __glibcxx

[committed 2/2] libstdc++: Implement P2918R0 "Runtime format strings II" for C++26

2024-01-07 Thread Jonathan Wakely

Tested x86_64-linux and aarch64-linux. Pushed to trunk.

-- >8 --

This adds std::runtime_format for C++26. These new overloaded functions
enhance the std::format API so that it isn't necessary to use the less
ergonomic std::vformat and std::make_format_args (which are meant to be
implementation details). This was approved in Kona 2023 for C++26.
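
The overload technique can be sketched with a toy model. This is not the
libstdc++ code: RuntimeFormat, FormatString and used_runtime_path are
hypothetical stand-ins, and the literal constructor here is only constexpr,
whereas the real one is consteval and validates the format string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for __format::_Runtime_format_string.
struct RuntimeFormat
{
  std::string_view str;
};

// Hypothetical stand-in for basic_format_string.
class FormatString
{
public:
  // Literal path: the real constructor is consteval and checks the string
  // at compile time; this toy version only records which path was taken.
  template<std::size_t N>
  constexpr FormatString(const char (&literal)[N])
  : str_(literal, N - 1), from_runtime_(false)
  { }

  // Runtime path: accepts only an rvalue wrapper, so callers must opt in
  // explicitly by calling runtime_format.
  FormatString(RuntimeFormat&& r)
  : str_(r.str), from_runtime_(true)
  { }

  constexpr std::string_view get() const noexcept { return str_; }
  constexpr bool from_runtime() const noexcept { return from_runtime_; }

private:
  std::string_view str_;
  bool from_runtime_;
};

// Wraps a string known only at run time.
inline RuntimeFormat runtime_format(std::string_view fmt)
{ return {fmt}; }

// A function taking the format-string type by value, as std::format does.
inline bool used_runtime_path(FormatString f)
{ return f.from_runtime(); }
```

A string literal selects the checked constructor, while
used_runtime_path(runtime_format(fmt)) with a std::string fmt selects the
unchecked one. A bare std::string is not accepted at all.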

libstdc++-v3/ChangeLog:

* include/std/format (__format::_Runtime_format_string): Define
new class template.
(basic_format_string): Add non-consteval constructor for runtime
format strings.
(runtime_format): Define new function for C++26.
* testsuite/std/format/runtime_format.cc: New test.
---
 libstdc++-v3/include/std/format   | 22 +++
 .../testsuite/std/format/runtime_format.cc| 37 +++
 2 files changed, 59 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/std/format/runtime_format.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 160efa5155c..b3b5a0bbdbc 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -81,6 +81,9 @@ namespace __format
 
   template
 using __format_context = basic_format_context<_Sink_iter<_CharT>, _CharT>;
+
+  template
+struct _Runtime_format_string { basic_string_view<_CharT> _M_str; };
 } // namespace __format
 /// @endcond
 
@@ -115,6 +118,11 @@ namespace __format
consteval
basic_format_string(const _Tp& __s);
 
+  [[__gnu__::__always_inline__]]
+  basic_format_string(__format::_Runtime_format_string<_CharT>&& __s)
+  : _M_str(__s._M_str)
+  { }
+
   [[__gnu__::__always_inline__]]
   constexpr basic_string_view<_CharT>
   get() const noexcept
@@ -133,6 +141,20 @@ namespace __format
   = basic_format_string...>;
 #endif
 
+#if __cplusplus > 202302L
+  [[__gnu__::__always_inline__]]
+  inline __format::_Runtime_format_string
+  runtime_format(string_view __fmt)
+  { return {__fmt}; }
+
+#ifdef _GLIBCXX_USE_WCHAR_T
+  [[__gnu__::__always_inline__]]
+  inline __format::_Runtime_format_string
+  runtime_format(wstring_view __fmt)
+  { return {__fmt}; }
+#endif
+#endif // C++26
+
   // [format.formatter], formatter
 
   /// The primary template of std::formatter is disabled.
diff --git a/libstdc++-v3/testsuite/std/format/runtime_format.cc 
b/libstdc++-v3/testsuite/std/format/runtime_format.cc
new file mode 100644
index 000..174334c7676
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/runtime_format.cc
@@ -0,0 +1,37 @@
+// { dg-do run { target c++26 } }
+
+#include <format>
+#include <testsuite_hooks.h>
+
+void
+test_char()
+{
+  std::string fmt = "{}";
+  auto s = std::format(std::runtime_format(fmt), 123);
+  VERIFY( s == "123" );
+}
+
+void
+test_wchar()
+{
+  std::wstring fmt = L"{:#o}";
+  auto s = std::format(std::runtime_format(fmt), 456);
+  VERIFY( s == L"0710" );
+}
+
+void
+test_internal_api()
+{
+  // Using _Runtime_format_string directly works even in C++20 mode.
+  // This can be used internally by libstdc++.
+  std::string fmt = "{:#x}";
+  auto s = std::format(std::__format::_Runtime_format_string(fmt), 789);
+  VERIFY( s == "0x315" );
+}
+
+int main()
+{
+  test_char();
+  test_wchar();
+  test_internal_api();
+}
-- 
2.43.0

[PATCH v2 3/4] LoongArch: Use enums for constants

2024-01-07 Thread Yang Yujie

Target feature constants from loongarch-def.h are currently defined as macros.
Switch to enums so that they are displayed better in the debugger.
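
A small sketch of the pattern follows. The enumerator names and values are
taken from the patch, but the name table and the cmodel_name helper are
illustrative assumptions, not the contents of loongarch-def.cc. Enumerators
are real symbols that a debugger can typically resolve by name, whereas
macros are only visible when macro debug info is emitted (for example with
-g3).

```cpp
#include <cassert>
#include <cstring>

// Enumerators as in the patch; the trailing N_* sentinel counts the entries.
enum {
  CMODEL_NORMAL      = 0,
  CMODEL_TINY        = 1,
  CMODEL_TINY_STATIC = 2,
  CMODEL_MEDIUM      = 3,
  CMODEL_LARGE       = 4,
  CMODEL_EXTREME     = 5,
  N_CMODEL_TYPES     = 6
};

// Illustrative name table sized by the sentinel (spellings are assumptions).
static const char *const cmodel_names[N_CMODEL_TYPES] = {
  "normal", "tiny", "tiny-static", "medium", "large", "extreme"
};

static_assert(sizeof cmodel_names / sizeof cmodel_names[0] == N_CMODEL_TYPES,
              "one name per code model");

// Hypothetical lookup helper with a bounds check.
inline const char *cmodel_name(int cmodel)
{
  return (cmodel >= 0 && cmodel < N_CMODEL_TYPES) ? cmodel_names[cmodel]
                                                  : "unknown";
}
```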

gcc/ChangeLog:

* config/loongarch/loongarch-def.h: Define constants with
enums instead of Macros.
---
 gcc/config/loongarch/loongarch-def.h | 115 ---
 1 file changed, 67 insertions(+), 48 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index f8cb3adf509..a1237ecf1fd 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -23,12 +23,10 @@ along with GCC; see the file COPYING3.  If not see
 - ISA extensions   (isa_ext),
 - base ABI types   (abi_base),
 - ABI extension types  (abi_ext).
-
-- code models(cmodel)
-- other command-line switches (switch)
+- code models  (cmodel)
 
These values are primarily used for implementing option handling
-   logic in "loongarch.opt", "loongarch-driver.c" and "loongarch-opt.c".
+   logic in "loongarch.opt", "loongarch-driver.cc" and "loongarch-opt.cc".
 
As for the result of this option handling process, the following
scheme is adopted to represent the final configuration:
@@ -53,30 +51,40 @@ along with GCC; see the file COPYING3.  If not see
 #include "loongarch-def-array.h"
 #include "loongarch-tune.h"
 
-/* enum isa_base */
 
-/* LoongArch64 */
-#define ISA_BASE_LA64     0
-#define N_ISA_BASE_TYPES  1
+/* ISA base */
+enum {
+  ISA_BASE_LA64= 0,  /* LoongArch64 */
+  N_ISA_BASE_TYPES = 1
+};
+
 extern loongarch_def_array
   loongarch_isa_base_strings;
 
-/* enum isa_ext_* */
-#define ISA_EXT_NONE          0
-#define ISA_EXT_FPU32         1
-#define ISA_EXT_FPU64         2
-#define N_ISA_EXT_FPU_TYPES   3
-#define ISA_EXT_SIMD_LSX      3
-#define ISA_EXT_SIMD_LASX     4
-#define N_ISA_EXT_TYPES       5
+
+/* ISA extensions */
+enum {
+  ISA_EXT_NONE = 0,
+  ISA_EXT_FPU32= 1,
+  ISA_EXT_FPU64= 2,
+  N_ISA_EXT_FPU_TYPES   = 3,
+  ISA_EXT_SIMD_LSX  = 3,
+  ISA_EXT_SIMD_LASX = 4,
+  N_ISA_EXT_TYPES  = 5
+};
+
 extern loongarch_def_array
   loongarch_isa_ext_strings;
 
-/* enum abi_base */
-#define ABI_BASE_LP64D   0
-#define ABI_BASE_LP64F   1
-#define ABI_BASE_LP64S   2
-#define N_ABI_BASE_TYPES  3
+
+/* Base ABI */
+enum {
+  ABI_BASE_LP64D   = 0,
+  ABI_BASE_LP64F   = 1,
+  ABI_BASE_LP64S   = 2,
+  N_ABI_BASE_TYPES = 3
+};
+
 extern loongarch_def_array
   loongarch_abi_base_strings;
 
@@ -90,28 +98,38 @@ extern loongarch_def_array
   (abi_base == ABI_BASE_LP64S)
 
 
-/* enum abi_ext */
-#define ABI_EXT_BASE 0
-#define N_ABI_EXT_TYPES  1
+/* ABI Extension */
+enum {
+  ABI_EXT_BASE = 0,
+  N_ABI_EXT_TYPES  = 1
+};
+
 extern loongarch_def_array
   loongarch_abi_ext_strings;
 
-/* enum cmodel */
-#define CMODEL_NORMAL         0
-#define CMODEL_TINY           1
-#define CMODEL_TINY_STATIC    2
-#define CMODEL_MEDIUM         3
-#define CMODEL_LARGE          4
-#define CMODEL_EXTREME        5
-#define N_CMODEL_TYPES        6
+
+/* Code Model */
+enum {
+  CMODEL_NORMAL= 0,
+  CMODEL_TINY  = 1,
+  CMODEL_TINY_STATIC   = 2,
+  CMODEL_MEDIUM= 3,
+  CMODEL_LARGE = 4,
+  CMODEL_EXTREME   = 5,
+  N_CMODEL_TYPES   = 6
+};
+
 extern loongarch_def_array
   loongarch_cmodel_strings;
 
-/* enum explicit_relocs */
-#define EXPLICIT_RELOCS_AUTO      0
-#define EXPLICIT_RELOCS_NONE      1
-#define EXPLICIT_RELOCS_ALWAYS    2
-#define N_EXPLICIT_RELOCS_TYPES   3
+
+/* Explicit Reloc Type */
+enum {
+  EXPLICIT_RELOCS_AUTO = 0,
+  EXPLICIT_RELOCS_NONE = 1,
+  EXPLICIT_RELOCS_ALWAYS= 2,
+  N_EXPLICIT_RELOCS_TYPES   = 3
+};
 
 /* The common default value for variables whose assignments
are triggered by command-line options.  */
@@ -159,17 +177,18 @@ struct loongarch_target
   int cmodel;  /* CMODEL_ */
 };
 
-/* CPU properties.  */
-/* index */
-#define CPU_NATIVE        0
-#define CPU_ABI_DEFAULT   1
-#define CPU_LOONGARCH64   2
-#define CPU_LA464         3
-#define CPU_LA664         4
-#define N_ARCH_TYPES      5
-#define N_TUNE_TYPES      5
-
-/* parallel tables.  */
+/* CPU model */
+enum {
+  CPU_NATIVE   = 0,
+  CPU_ABI_DEFAULT   = 1,
+  CPU_LOONGARCH64   = 2,
+  CPU_LA464= 3,
+  CPU_LA664= 4,
+  N_ARCH_TYPES = 5,
+  N_TUNE_TYPES = 5
+};
+
+/* CPU model properties */
 extern loongarch_def_array
   loongarch_cpu_strings;
 extern loongarch_def_array
-- 
2.43.0

[PATCH v2 4/4] LoongArch: Simplify -mexplicit-reloc definitions

2024-01-07 Thread Yang Yujie

Since we do not need printing or manual parsing of this option
(whether in the driver or for target attributes to be supported later),
it can be handled in the .opt file framework.
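
For reference, the mapping that the Alias(mexplicit-relocs=, always, none)
entry is expected to provide can be modelled as below. This is a sketch
under the assumption that the positive form of the legacy flag maps to
"always" and the negative form to "none"; the helper name is hypothetical
and the enumerator values are those from patch 3/4.

```cpp
#include <cassert>

enum {
  EXPLICIT_RELOCS_AUTO   = 0,
  EXPLICIT_RELOCS_NONE   = 1,
  EXPLICIT_RELOCS_ALWAYS = 2
};

// Hypothetical model of the alias: -mexplicit-relocs (positive form) selects
// "always", and -mno-explicit-relocs (negative form) selects "none".
constexpr int explicit_relocs_from_legacy_flag(bool positive_form)
{
  return positive_form ? EXPLICIT_RELOCS_ALWAYS : EXPLICIT_RELOCS_NONE;
}
```

This is the same translation that the removed block in
loongarch_option_override_internal performed by hand through
la_opt_explicit_relocs_backward.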

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Remove explicit-reloc
argument string definitions.
* config/loongarch/loongarch-str.h: Same.
* config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]explicit-relocs
as aliases to -mexplicit-relocs={always,none}
* config/loongarch/genopts/loongarch.opt: Same.
* config/loongarch/loongarch.cc: Same.
---
 gcc/config/loongarch/genopts/loongarch-strings |  6 --
 gcc/config/loongarch/genopts/loongarch.opt.in  |  8 
 gcc/config/loongarch/loongarch-str.h   |  5 -
 gcc/config/loongarch/loongarch.cc  | 12 
 gcc/config/loongarch/loongarch.opt |  2 +-
 5 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index ba47be31227..e434a89c9ee 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -64,9 +64,3 @@ STR_CMODEL_TS   tiny-static
 STR_CMODEL_MEDIUM medium
 STR_CMODEL_LARGE  large
 STR_CMODEL_EXTREMEextreme
-
-# -mexplicit-relocs
-OPTSTR_EXPLICIT_RELOCS explicit-relocs
-STR_EXPLICIT_RELOCS_AUTO   auto
-STR_EXPLICIT_RELOCS_NONE   none
-STR_EXPLICIT_RELOCS_ALWAYS always
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 38ac347c660..1dbd3ad1e3f 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -181,20 +181,20 @@ Name(explicit_relocs) Type(int)
 The code model option names for -mexplicit-relocs:
 
 EnumValue
-Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_AUTO@@) 
Value(EXPLICIT_RELOCS_AUTO)
+Enum(explicit_relocs) String(auto) Value(EXPLICIT_RELOCS_AUTO)
 
 EnumValue
-Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_NONE@@) 
Value(EXPLICIT_RELOCS_NONE)
+Enum(explicit_relocs) String(none) Value(EXPLICIT_RELOCS_NONE)
 
 EnumValue
-Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_ALWAYS@@) 
Value(EXPLICIT_RELOCS_ALWAYS)
+Enum(explicit_relocs) String(always) Value(EXPLICIT_RELOCS_ALWAYS)
 
 mexplicit-relocs=
 Target RejectNegative Joined Enum(explicit_relocs) Var(la_opt_explicit_relocs) 
Init(M_OPT_UNSET)
 Use %reloc() assembly operators.
 
 mexplicit-relocs
-Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
+Target Alias(mexplicit-relocs=, always, none)
 Use %reloc() assembly operators (for backward compatibility).
 
 mrecip
diff --git a/gcc/config/loongarch/loongarch-str.h 
b/gcc/config/loongarch/loongarch-str.h
index 0a6a36c5783..20da2b169ed 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -63,11 +63,6 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_CMODEL_LARGE "large"
 #define STR_CMODEL_EXTREME "extreme"
 
-#define OPTSTR_EXPLICIT_RELOCS "explicit-relocs"
-#define STR_EXPLICIT_RELOCS_AUTO "auto"
-#define STR_EXPLICIT_RELOCS_NONE "none"
-#define STR_EXPLICIT_RELOCS_ALWAYS "always"
-
 #define OPTSTR_FRECIPE "frecipe"
 #define OPTSTR_DIV32   "div32"
 #define OPTSTR_LAM_BH  "lam-bh"
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 95517ec61da..594e5a00c98 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7533,18 +7533,6 @@ loongarch_option_override_internal (struct gcc_options 
*opts,
   loongarch_update_gcc_opt_status (&la_target, opts, opts_set);
   loongarch_cpu_option_override (&la_target, opts, opts_set);
 
-  if (la_opt_explicit_relocs != M_OPT_UNSET
-  && la_opt_explicit_relocs_backward != M_OPT_UNSET)
-error ("do not use %qs (with %qs) and %qs (without %qs) together",
-  "-mexplicit-relocs=", "=",
-  la_opt_explicit_relocs_backward ? "-mexplicit-relocs"
-  : "-mno-explicit-relocs", "=");
-
-  if (la_opt_explicit_relocs_backward != M_OPT_UNSET)
-la_opt_explicit_relocs = (la_opt_explicit_relocs_backward
- ? EXPLICIT_RELOCS_ALWAYS
- : EXPLICIT_RELOCS_NONE);
-
   if (la_opt_explicit_relocs == M_OPT_UNSET)
 la_opt_explicit_relocs = (HAVE_AS_EXPLICIT_RELOCS
  ? (loongarch_mrelax
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index 76b42d51d09..adb2304fbd5 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -202,7 +202,7 @@ Target RejectNegative Joined Enum(explicit_relocs) 
Var(la_opt_explicit_relocs) I
 Use %reloc() assembly operators.
 
 mexplicit-relocs
-Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
+Target Alias(mexplicit-reloc

[committed 1/2] libstdc++: Implement P2905R2 "Runtime format strings" for C++20

2024-01-07 Thread Jonathan Wakely

Tested x86_64-linux and aarch64-linux. Pushed to trunk.

-- >8 --

This change makes std::make_format_args refuse to create dangling
references to temporaries. This makes the std::vformat API safer. This
was approved in Kona 2023 as a DR for C++20 so the change is implemented
unconditionally.
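
The effect of switching from forwarding references to lvalue references can
be shown with a toy function object. MakeArgRefs is a hypothetical stand-in
for make_format_args that stores pointers to its arguments; it is not the
libstdc++ implementation.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for make_format_args: it keeps pointers to its
// arguments, so the arguments must outlive the returned object.
struct MakeArgRefs
{
  template<typename... Args>
  std::tuple<const Args*...> operator()(Args&... args) const noexcept
  { return std::tuple<const Args*...>(&args...); }
};

// Lvalues are accepted; the caller controls their lifetime.
static_assert(std::is_invocable_v<MakeArgRefs, int&>, "lvalue accepted");

// A non-const temporary cannot bind to Args&, so the call that would leave
// a dangling pointer is rejected at compile time.
static_assert(!std::is_invocable_v<MakeArgRefs, int>, "rvalue rejected");
```

This is why the chrono_io.h hunks in the patch introduce named locals such
as __u and __c before calling make_format_args.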

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono): Always use
lvalue arguments to make_format_args.
* include/std/format (make_format_args): Change parameter pack
from forwarding references to lvalue references. Remove use of
remove_reference_t which is now unnecessary.
(format_to, formatted_size): Remove incorrect forwarding of
arguments.
* include/std/ostream (print): Remove forwarding of arguments.
* include/std/print (print): Likewise.
* testsuite/20_util/duration/io.cc: Use lvalues as arguments to
make_format_args.
* testsuite/std/format/arguments/args.cc: Likewise.
* testsuite/std/format/arguments/lwg3810.cc: Likewise.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/vformat_to.cc: Likewise.
* testsuite/std/format/string.cc: Likewise.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/format/arguments/args_neg.cc: New test.
---
 libstdc++-v3/include/bits/chrono_io.h | 15 ++
 libstdc++-v3/include/std/format   | 30 +--
 libstdc++-v3/include/std/ostream  |  2 +-
 libstdc++-v3/include/std/print|  2 +-
 libstdc++-v3/testsuite/20_util/duration/io.cc |  3 +-
 .../testsuite/std/format/arguments/args.cc| 26 
 .../std/format/arguments/args_neg.cc  | 12 
 .../testsuite/std/format/arguments/lwg3810.cc |  8 +++--
 .../testsuite/std/format/functions/format.cc  |  6 ++--
 .../std/format/functions/vformat_to.cc|  9 --
 libstdc++-v3/testsuite/std/format/string.cc   |  7 +++--
 libstdc++-v3/testsuite/std/time/day/io.cc |  4 +--
 libstdc++-v3/testsuite/std/time/month/io.cc   |  4 +--
 libstdc++-v3/testsuite/std/time/weekday/io.cc |  4 +--
 libstdc++-v3/testsuite/std/time/year/io.cc|  4 +--
 .../testsuite/std/time/year_month_day/io.cc   |  4 +--
 16 files changed, 93 insertions(+), 47 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/arguments/args_neg.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index c30451651ea..ec2ae9d53cc 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2273,7 +2273,8 @@ namespace __detail
   _Str __s = _GLIBCXX_WIDEN("{:02d} is not a valid day");
   if (__d.ok())
__s = __s.substr(0, 6);
-  __os << std::vformat(__s, make_format_args<_Ctx>((unsigned)__d));
+  auto __u = (unsigned)__d;
+  __os << std::vformat(__s, make_format_args<_Ctx>(__u));
   return __os;
 }
 
@@ -2302,8 +2303,10 @@ namespace __detail
__os << std::vformat(__os.getloc(), __s.substr(0, 6),
 make_format_args<_Ctx>(__m));
   else
-   __os << std::vformat(__s.substr(6),
-make_format_args<_Ctx>((unsigned)__m));
+   {
+ auto __u = (unsigned)__m;
+ __os << std::vformat(__s.substr(6), make_format_args<_Ctx>(__u));
+   }
   return __os;
 }
 
@@ -2364,8 +2367,10 @@ namespace __detail
__os << std::vformat(__os.getloc(), __s.substr(0, 6),
 make_format_args<_Ctx>(__wd));
   else
-   __os << std::vformat(__s.substr(6),
-make_format_args<_Ctx>(__wd.c_encoding()));
+   {
+ auto __c = __wd.c_encoding();
+ __os << std::vformat(__s.substr(6), make_format_args<_Ctx>(__c));
+   }
   return __os;
 }
 
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 0d9f70ee555..160efa5155c 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3413,7 +3413,7 @@ namespace __format
 
   template
friend auto
-   make_format_args(_Argz&&...) noexcept;
+   make_format_args(_Argz&...) noexcept;
 
   template
friend decltype(auto)
@@ -3583,7 +3583,7 @@ namespace __format
 
   template
friend auto
-   make_format_args(_Args&&...) noexcept;
+   make_format_args(_Args&...) noexcept;
 
   // An array of _Arg_t enums corresponding to _Args...
   template
@@ -3621,7 +3621,7 @@ namespace __format
 
   template
 auto
-make_format_args(_Args&&... __fmt_args) noexcept;
+make_format_args(_Args&... __fmt_args) noexcept;
 
   // An array of type-erased format

[PATCH v2 1/4] LoongArch: Handle ISA evolution switches along with other options

2024-01-07 Thread Yang Yujie

gcc/ChangeLog:

* config/loongarch/genopts/genstr.sh: Prepend the isa_evolution
variable with the common la_ prefix.
* config/loongarch/genopts/loongarch.opt.in: Mark ISA evolution
flags as saved using TargetVariable.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-def.h: Define evolution_set to
mark changes to the -march default.
* config/loongarch/loongarch-driver.cc: Same.
* config/loongarch/loongarch-opts.cc: Same.
* config/loongarch/loongarch-opts.h: Define and use ISA evolution
conditions around the la_target structure.
* config/loongarch/loongarch.cc: Same.
* config/loongarch/loongarch.md: Same.
* config/loongarch/loongarch-builtins.cc: Same.
* config/loongarch/loongarch-c.cc: Same.
* config/loongarch/lasx.md: Same.
* config/loongarch/lsx.md: Same.
* config/loongarch/sync.md: Same.
---
 gcc/config/loongarch/genopts/genstr.sh|  2 +-
 gcc/config/loongarch/genopts/loongarch.opt.in |  6 ++---
 gcc/config/loongarch/lasx.md  |  4 ++--
 gcc/config/loongarch/loongarch-builtins.cc|  6 ++---
 gcc/config/loongarch/loongarch-c.cc   |  2 +-
 gcc/config/loongarch/loongarch-def.h  |  5 +++-
 gcc/config/loongarch/loongarch-driver.cc  |  5 ++--
 gcc/config/loongarch/loongarch-opts.cc| 17 -
 gcc/config/loongarch/loongarch-opts.h | 24 +++
 gcc/config/loongarch/loongarch.cc | 24 ---
 gcc/config/loongarch/loongarch.md | 12 +-
 gcc/config/loongarch/loongarch.opt| 16 ++---
 gcc/config/loongarch/lsx.md   |  4 ++--
 gcc/config/loongarch/sync.md  | 22 -
 14 files changed, 90 insertions(+), 59 deletions(-)

diff --git a/gcc/config/loongarch/genopts/genstr.sh 
b/gcc/config/loongarch/genopts/genstr.sh
index 5865b87d516..724c9aaedac 100755
--- a/gcc/config/loongarch/genopts/genstr.sh
+++ b/gcc/config/loongarch/genopts/genstr.sh
@@ -107,7 +107,7 @@ EOF
   print("")
   print("m"$3)
   gsub(/-/, "_", $3)
-  print("Target Mask(ISA_"toupper($3)") Var(isa_evolution)")
+  print("Target Mask(ISA_"toupper($3)") Var(la_isa_evolution)")
   $1=""; $2=""; $3=""
   sub(/^ */, "", $0)
   print($0)
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index f2e7ea2ef2f..e643deacd21 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -259,6 +259,6 @@ default value is 4.
 ; Features added during ISA evolution.  This concept is different from ISA
 ; extension, read Section 1.5 of LoongArch v1.10 Volume 1 for the
 ; explanation.  These features may be implemented and enumerated with
-; CPUCFG independantly, so we use bit flags to specify them.
-Variable
-HOST_WIDE_INT isa_evolution = 0
+; CPUCFG independently, so we use bit flags to specify them.
+TargetVariable
+HOST_WIDE_INT la_isa_evolution = 0
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 027021b45d5..429c59504b9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1539,7 +1539,7 @@ (define_insn "lasx_xvfrecipe_"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
  UNSPEC_LASX_XVFRECIPE))]
-  "ISA_HAS_LASX && TARGET_FRECIPE"
+  "ISA_HAS_LASX && ISA_HAS_FRECIPE"
   "xvfrecipe.\t%u0,%u1"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
@@ -1572,7 +1572,7 @@ (define_insn "lasx_xvfrsqrte_"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
  UNSPEC_LASX_XVFRSQRTE))]
-  "ISA_HAS_LASX && TARGET_FRECIPE"
+  "ISA_HAS_LASX && ISA_HAS_FRECIPE"
   "xvfrsqrte.\t%u0,%u1"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index 45ec6aca030..efe7e5e5ebc 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -120,9 +120,9 @@ struct loongarch_builtin_description
 AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI)
 AVAIL_ALL (lsx, ISA_HAS_LSX)
 AVAIL_ALL (lasx, ISA_HAS_LASX)
-AVAIL_ALL (frecipe, TARGET_FRECIPE && TARGET_HARD_FLOAT_ABI)
-AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && TARGET_FRECIPE)
-AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
+AVAIL_ALL (frecipe, ISA_HAS_FRECIPE && TARGET_HARD_FLOAT_ABI)
+AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && ISA_HAS_FRECIPE)
+AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && ISA_HAS_FRECIPE)
 
 /* Construct a loongarch_builtin_description from the given arguments.
 
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 118b15

[PATCH v2 2/4] LoongArch: Rename ISA_BASE_LA64V100 to ISA_BASE_LA64

2024-01-07 Thread Yang Yujie

LoongArch ISA manual v1.10 suggests that software should not depend on
the ISA version number for marking processor features.  The ISA version
number is now defined as a collective name of individual ISA evolutions.
Since there is an independent ISA evolution mask now, we can drop the
version information from the base ISA.
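
A minimal sketch of querying features through an evolution bit mask rather
than a version number is shown below. The bit positions, the Target struct
and the isa_has helper are hypothetical; in GCC the OPTION_MASK_ISA_* values
are generated from the .opt file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
constexpr std::int64_t MASK_ISA_DIV32     = std::int64_t(1) << 0;
constexpr std::int64_t MASK_ISA_FRECIPE   = std::int64_t(1) << 1;
constexpr std::int64_t MASK_ISA_LD_SEQ_SA = std::int64_t(1) << 2;

// Hypothetical stand-in for the part of the target description that holds
// the evolution mask.
struct Target
{
  std::int64_t isa_evolution = 0;
};

// Each feature is tested independently; no ISA version number is involved.
constexpr bool isa_has(const Target& t, std::int64_t mask)
{ return (t.isa_evolution & mask) != 0; }
```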

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Rename.
* config/loongarch/genopts/loongarch.opt.in: Same.
* config/loongarch/loongarch-cpu.cc: Same.
* config/loongarch/loongarch-def.cc: Same.
* config/loongarch/loongarch-def.h: Same.
* config/loongarch/loongarch-opts.cc: Same.
* config/loongarch/loongarch-opts.h: Same.
* config/loongarch/loongarch-str.h: Same.
* config/loongarch/loongarch.opt: Same.
---
 gcc/config/loongarch/genopts/loongarch-strings |  2 +-
 gcc/config/loongarch/genopts/loongarch.opt.in  |  2 +-
 gcc/config/loongarch/loongarch-cpu.cc  |  2 +-
 gcc/config/loongarch/loongarch-def.cc  | 14 +++---
 gcc/config/loongarch/loongarch-def.h   |  6 +++---
 gcc/config/loongarch/loongarch-opts.cc | 10 +-
 gcc/config/loongarch/loongarch-opts.h  |  2 +-
 gcc/config/loongarch/loongarch-str.h   |  2 +-
 gcc/config/loongarch/loongarch.opt |  2 +-
 9 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index f40b014f017..ba47be31227 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -29,7 +29,7 @@ STR_CPU_LA464   la464
 STR_CPU_LA664la664
 
 # Base architecture
-STR_ISA_BASE_LA64V100 la64
+STR_ISA_BASE_LA64 la64
 
 # -mfpu
 OPTSTR_ISA_EXT_FPUfpu
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index e643deacd21..38ac347c660 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -33,7 +33,7 @@ Name(isa_base) Type(int)
 Basic ISAs of LoongArch:
 
 EnumValue
-Enum(isa_base) String(@@STR_ISA_BASE_LA64V100@@) Value(ISA_BASE_LA64V100)
+Enum(isa_base) String(@@STR_ISA_BASE_LA64@@) Value(ISA_BASE_LA64)
 
 ;; ISA extensions / adjustments
 Enum
diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index e1771fc0b4f..97ac5fed9d8 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -133,7 +133,7 @@ fill_native_cpu_config (struct loongarch_target *tgt)
switch (cpucfg_cache[1] & 0x3)
  {
case 0x02:
- tmp = ISA_BASE_LA64V100;
+ tmp = ISA_BASE_LA64;
  break;
 
default:
diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index 48d28315064..e8c129ce643 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -48,16 +48,16 @@ array_arch loongarch_cpu_default_isa =
   array_arch ()
 .set (CPU_LOONGARCH64,
  loongarch_isa ()
-   .base_ (ISA_BASE_LA64V100)
+   .base_ (ISA_BASE_LA64)
.fpu_ (ISA_EXT_FPU64))
 .set (CPU_LA464,
  loongarch_isa ()
-   .base_ (ISA_BASE_LA64V100)
+   .base_ (ISA_BASE_LA64)
.fpu_ (ISA_EXT_FPU64)
.simd_ (ISA_EXT_SIMD_LASX))
 .set (CPU_LA664,
  loongarch_isa ()
-   .base_ (ISA_BASE_LA64V100)
+   .base_ (ISA_BASE_LA64)
.fpu_ (ISA_EXT_FPU64)
.simd_ (ISA_EXT_SIMD_LASX)
.evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA
@@ -153,7 +153,7 @@ array_tune loongarch_cpu_multipass_dfa_lookahead = 
array_tune ()
 
 array loongarch_isa_base_strings =
   array ()
-.set (ISA_BASE_LA64V100, STR_ISA_BASE_LA64V100);
+.set (ISA_BASE_LA64, STR_ISA_BASE_LA64);
 
 array loongarch_isa_ext_strings =
   array ()
@@ -189,15 +189,15 @@ array, 
N_ABI_BASE_TYPES>
  array ()
.set (ABI_EXT_BASE,
  loongarch_isa ()
-   .base_ (ISA_BASE_LA64V100)
+   .base_ (ISA_BASE_LA64)
.fpu_ (ISA_EXT_FPU64)))
 .set (ABI_BASE_LP64F,
  array ()
.set (ABI_EXT_BASE,
  loongarch_isa ()
-   .base_ (ISA_BASE_LA64V100)
+   .base_ (ISA_BASE_LA64)
.fpu_ (ISA_EXT_FPU32)))
 .set (ABI_BASE_LP64S,
  array ()
.set (ABI_EXT_BASE,
- loongarch_isa ().base_ (ISA_BASE_LA64V100)));
+ loongarch_isa ().base_ (ISA_BASE_LA64)));
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 1fab4f4d315..f8cb3adf509 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -55,9 +55,9 @@ a

[PATCH v2 0/4] Adjust option handling code

2024-01-07 Thread Yang Yujie

This patchset performs some code cleanup, and is bootstrapped and regtested
on loongarch64-linux-gnu.

Changes from v1 -> v2:
* Replaced all TARGET_ macros from .opt.
* Fixed definition of ISA_HAS_LAMCAS.

Yang Yujie (4):
  LoongArch: Handle ISA evolution switches along with other options
  LoongArch: Rename ISA_BASE_LA64V100 to ISA_BASE_LA64
  LoongArch: Use enums for constants
  LoongArch: Simplify -mexplicit-reloc definitions

 gcc/config/loongarch/genopts/genstr.sh|   2 +-
 .../loongarch/genopts/loongarch-strings   |   8 +-
 gcc/config/loongarch/genopts/loongarch.opt.in |  16 +--
 gcc/config/loongarch/lasx.md  |   4 +-
 gcc/config/loongarch/loongarch-builtins.cc|   6 +-
 gcc/config/loongarch/loongarch-c.cc   |   2 +-
 gcc/config/loongarch/loongarch-cpu.cc |   2 +-
 gcc/config/loongarch/loongarch-def.cc |  14 +-
 gcc/config/loongarch/loongarch-def.h  | 120 +++---
 gcc/config/loongarch/loongarch-driver.cc  |   5 +-
 gcc/config/loongarch/loongarch-opts.cc|  27 +++-
 gcc/config/loongarch/loongarch-opts.h |  26 +++-
 gcc/config/loongarch/loongarch-str.h  |   7 +-
 gcc/config/loongarch/loongarch.cc |  36 ++
 gcc/config/loongarch/loongarch.md |  12 +-
 gcc/config/loongarch/loongarch.opt|  20 +--
 gcc/config/loongarch/lsx.md   |   4 +-
 gcc/config/loongarch/sync.md  |  22 ++--
 18 files changed, 180 insertions(+), 153 deletions(-)

-- 
2.43.0

Re: [PATCH v2] libstdc++: Add Unicode-aware width estimation for std::format

2024-01-07 Thread Jonathan Wakely

On Mon, 8 Jan 2024 at 01:22, Jonathan Wakely  wrote:
>
> On Mon, 8 Jan 2024 at 01:13, Jonathan Wakely  wrote:
> >
> > This V2 patch failed CI:
> > https://patchwork.sourceware.org/project/gcc/patch/20240106151802.3356059-1-jwak...@redhat.com/
> >
> > But that's because the UTF-8 characters in the patch got garbled in
> > patchwork, so what got tested is not what I intend to push. I have
>
> Hmm, it's garbled in the public inbox too:
> https://inbox.sourceware.org/libstdc++/20240108011829.3670492-1-jwak...@redhat.com/T/#u
> See the additions to
> libstdc++-v3/testsuite/std/format/functions/format.cc which should be
> valid UTF-8. I wonder what my MUA is doing wrong. I'll have to
> investigate.

It seems that git send-email doesn't specify UTF-8 when I hit Enter at
this prompt:

The following files are 8bit, but do not declare a Content-Transfer-Encoding.

/tmp/OEXvA2gRe9/0001-libstdc-Add-Unicode-aware-width-estimation-for-std-f.patch
Which 8bit encoding should I declare [UTF-8]?
OK. Log says:
...
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-07 Thread Hongtao Liu

On Sun, Jan 7, 2024 at 6:53 AM Roger Sayle  wrote:
>
> Hi Hongtao,
>
> Many thanks for the review.  This revised patch implements several
> of your suggestions, specifically to use pshufd for V4SImode and
> punpcklqdq for V2DImode.  These changes are demonstrated by the
> examples below:
>
> typedef unsigned int v4si __attribute((vector_size(16)));
> typedef unsigned long long v2di __attribute((vector_size(16)));
>
> v4si foo() { return (v4si){1,1,1,1}; }
> v2di bar() { return (v2di){1,1}; }
>
> The previous version of my patch generated:
>
> foo:movdqa  .LC0(%rip), %xmm0
> ret
> bar:movdqa  .LC1(%rip), %xmm0
> ret
>
> with this revised version, -O2 generates:
>
> foo:movl$1, %eax
> movd%eax, %xmm0
> pshufd  $0, %xmm0, %xmm0
> ret
> bar:movl$1, %eax
> movq%rax, %xmm0
> punpcklqdq  %xmm0, %xmm0
> ret
>
> However, if it's OK with you, I'd prefer to allow this function to
> return false, safely falling back to emitting a vector load from
> the constant pool rather than ICEing from a gcc_assert.  For one
Sure, that makes sense.
> thing this isn't an unrecoverable correctness issue, but at worst
> a missed optimization.  The deeper reason is that this usefully
> provides a handle for tuning on different microarchitectures.
> On some (AMD?) machines, where !TARGET_INTER_UNIT_MOVES_TO_VEC,
> the first form above may be preferable to the second.  Currently
> the start of ix86_convert_const_wide_int_to_broadcast disables
> broadcasts for !TARGET_INTER_UNIT_MOVES_TO_VEC even when an
> implementation doesn't require an inter-unit move, such as a
> broadcast from memory.  I plan follow-up patches that benefit
> from this flexibility.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
Ok.
>
> gcc/ChangeLog
> PR target/112992
> * config/i386/i386-expand.cc
> (ix86_convert_const_wide_int_to_broadcast): Allow call to
> ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
> (ix86_broadcast_from_constant): Revert recent change; Return a
> suitable MEMREF independently of mode/target combinations.
> (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
> to decide whether expansion is possible/preferable.  Only try
> forcing DImode constants to memory (and trying again) if calling
> ix86_expand_vector_init_duplicate fails with a DImode immediate
> constant.
> (ix86_expand_vector_init_duplicate) : Try using
> V4SImode for suitable immediate constants.
> : Try using V8SImode for suitable constants.
> : Fail for CONST_INT_P, i.e. use constant pool.
> : Likewise.
> : For CONST_INT_P try using V4SImode via widen.
> : For CONST_INT_P try using V8HImode via widen.
> : Handle CONST_INTs via simplify_binary_operation.
> Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
> : For CONST_INT_P try V8SImode via widen.
> : For CONST_INT_P try V16HImode via widen.
> (ix86_expand_vector_init): Move try using a broadcast for all_same
> with ix86_expand_vector_init_duplicate before using constant pool.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/auto-init-8.c: Update test case.
> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512fp16-13.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/pr100865-1.c: Likewise.
> * gcc.target/i386/pr100865-10a.c: Likewise.
> * gcc.target/i386/pr100865-10b.c: Likewise.
> * gcc.target/i386/pr100865-2.c: Likewise.
> * gcc.target/i386/pr100865-3.c: Likewise.
> * gcc.target/i386/pr100865-4a.c: Likewise.
> * gcc.target/i386/pr100865-4b.c: Likewise.
> * gcc.target/i386/pr100865-5a.c: Likewise.
> * gcc.target/i386/pr100865-5b.c: Likewise.
> * gcc.target/i386/pr100865-9a.c: Likewise.
> * gcc.target/i386/pr100865-9b.c: Likewise.
> * gcc.target/i386/pr102021.c: Likewise.
> * gcc.target/i386/pr90773-17.c: Likewise.
>
> Thanks in advance.
> Roger
> --
>
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: 02 January 2024 05:40
> > To: Roger Sayle 
> > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> > Subject: Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of
> > constants.
> >
> > On Fri, Dec 22, 2023 at 6:25 PM Roger Sayle 
> > wrote:
> > >
> > >
> > > This patch resolves the second part of PR target/112992, building upon
> > > Hongtao Liu's solution to the first part.
> > >
> > > The issue

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread joshua

Hi Juzhe,

Stage 3 closes today, and some patches are still awaiting
review.
So is it possible to get xtheadvector merged in GCC-14?
We emailed Kito regarding this, but haven't got any reply yet.

Joshua

--
From: juzhe.zh...@rivai.ai
Sent: Thursday, January 4, 2024 17:18
To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

\ No newline at end of file
Each file needs newline.

I am not able to review arch stuff. This needs kito.

Besides, Andrew Pinski wants us to defer theadvector to GCC-15.

I have no strong opinion here.

juzhe.zh...@rivai.ai

From: joshua
Sent: 2024-01-04 17:15
To: 钟居哲; Jeff Law; gcc-patches
Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

Hi Juzhe,

So is the following patch that this patch relies on OK to commit?
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html

Joshua

--
From: 钟居哲
Sent: Tuesday, January 2, 2024 06:57
To: Jeff Law; "cooper.joshua"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

This is Ok from my side.
But before committing this patch, I think we need this one first:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html 

I will be back to work so I will take a look at other patches today.
juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2024-01-01 01:43
To: Jun Sha (Joshua); gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.

On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> This patch adds th. prefix to all XTheadVector instructions by
> implementing new assembly output functions. We only check the
> prefix is 'v', so that no extra attribute is needed.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
>   New function to add assembler insn code prefix/suffix.
>   * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
>   * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> 
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
>   gcc/config/riscv/riscv-protos.h    |  1 +
>   gcc/config/riscv/riscv.cc  | 14 ++
>   gcc/config/riscv/riscv.h   |  4 
>   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
>   4 files changed, 31 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 31049ef7523..5ea54b45703 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -102,6 +102,7 @@ struct riscv_address_info {
>   };
>   
>   /* Routines implemented in riscv.cc.  */
> +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> *p);
>   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
>   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
>   extern int riscv_float_const_rtx_index_for_fli (rtx);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 0d1cbc5cb5f..ea1d59d9cf2 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> return lmul;
>   }
>   
> +/* Define ASM_OUTPUT_OPCODE to do anything special before
> +   emitting an opcode.  */
> +const char *
> +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> +{
> +  /* We need to add th. prefix to all the xtheadvector
> + insturctions here.*/
> +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> +  p[0] == 'v')
> +    fputs ("th.", asm_out_file);
> +
> +  return p;
Just a formatting nit. The GNU standards break lines before the 
operator, not after.  So
   if (TARGET_XTHEADVECTOR
   && current_output_insn != NULL
   && p[0] == 'v')

Note that current_output_insn is "extern rtx_insn *", so use NULL, not 
NULL_RTX.

Neither of these nits require a new version for review.  Just fix them.

If Juzhe is fine with this, so am I.  We can refine it if necessary later.

jeff
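
As a side illustration (not part of the patch), the prefix check and the
line-break style suggested above can be sketched as a standalone function.
The name sketch_emit_opcode, its boolean parameters and the buffer-based
interface are hypothetical stand-ins for TARGET_XTHEADVECTOR,
current_output_insn and asm_out_file; this is a minimal sketch under those
assumptions, not the code in riscv.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the hook: write the mnemonic P into BUF and
   put "th." in front of it when XTheadVector is enabled, an insn is
   being output, and the mnemonic starts with 'v'.  */
static void
sketch_emit_opcode (char *buf, size_t len, bool xtheadvector,
                    bool have_insn, const char *p)
{
  assert (buf != NULL && p != NULL);

  const char *prefix = "";
  /* GNU style: break the line before each operator, not after.  */
  if (xtheadvector
      && have_insn
      && p[0] == 'v')
    prefix = "th.";

  snprintf (buf, len, "%s%s", prefix, p);
}
```

With both flags set, "vadd.vv" would come out as "th.vadd.vv" while "add"
would stay unchanged; the check only looks at the first character, which is
why no extra insn attribute is needed.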

[PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-07 Thread Kewen.Lin

Hi,

As PR113100 shows, the unbiasing introduced by r14-6737 can
cause the scrubbing to overrun and clobber critical data on
the stack, such as the saved TOC base, and consequently cause
a segfault on Power.

Having checked PR112917, IMHO we should guard this unbiasing
with SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 &&
TARGET_STACK_BIAS), similar to existing code that
special-cases the SPARC stack bias.
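
To illustrate the guard pattern outside of GCC, here is a minimal
standalone sketch.  All SKETCH_* names and the offset value are
hypothetical; the real macros come from the target headers.  The point is
only that the #ifdef keeps the adjustment out of ports that never define
the macro, and the run-time test keeps it off where the macro expands to
false.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offset; real ports define STACK_POINTER_OFFSET in
   their target headers.  */
#define SKETCH_STACK_POINTER_OFFSET 2047

/* SKETCH_STACK_BOUNDARY_HACK is deliberately left undefined here,
   modelling a port that never defines it.  */

static intptr_t
sketch_stack_address (intptr_t sp)
{
  intptr_t ret = sp;

#ifdef SKETCH_STACK_BOUNDARY_HACK
  /* Only unbias when the port asks for it.  */
  if (SKETCH_STACK_BOUNDARY_HACK)
    ret += SKETCH_STACK_POINTER_OFFSET;
#endif

  return ret;
}

/* Without the macro, the address is returned unadjusted.  */
static void
sketch_check_stack (void)
{
  assert (sketch_stack_address (1000) == 1000);
}
```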

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.  All reported failures in
PR113100 are gone.  I also expect the culprit commit can
affect those ports with nonzero STACK_POINTER_OFFSET.

Is it ok for trunk?

BR,
Kewen
-
PR middle-end/113100

gcc/ChangeLog:

* builtins.cc (expand_builtin_stack_address): Guard stack point
adjustment with SPARC_STACK_BOUNDARY_HACK.
---
 gcc/builtins.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 125ea158ebf..9bad1e962b4 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5450,6 +5450,7 @@ expand_builtin_stack_address ()
   rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
 STACK_UNSIGNED);

+#ifdef SPARC_STACK_BOUNDARY_HACK
   /* Unbias the stack pointer, bringing it to the boundary between the
  stack area claimed by the active function calling this builtin,
  and stack ranges that could get clobbered if it called another
@@ -5476,7 +5477,9 @@ expand_builtin_stack_address ()
  (caller) function's active area as well, whereas those pushed or
  allocated temporarily for a call are regarded as part of the
  callee's stack range, rather than the caller's.  */
-  ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
+  if (SPARC_STACK_BOUNDARY_HACK)
+ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
+#endif

   return force_reg (ptr_mode, ret);
 }
--
2.39.3

[PATCH] testsuite, rs6000: Adjust pcrel-sibcall-1.c with noipa [PR112751]

2024-01-07 Thread Kewen.Lin

Hi,

As PR112751 shows, commit r14-5628 caused pcrel-sibcall-1.c
to fail as it enables ipa-vrp, which makes the return values of
functions {x,y,xx} known and propagates them.  This patch
switches the test to noipa to make it more robust.

Tested well on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9/P10.

I'm going to push this soon.

BR,
Kewen
-
PR testsuite/112751

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pcrel-sibcall-1.c: Replace noinline with noipa.
---
 gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
index 9197788f98f..1b6dffd6073 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-sibcall-1.c
@@ -8,10 +8,10 @@
generated when the caller preserves the TOC but the callee does not.  */

 #pragma GCC target ("cpu=power10,pcrel")
-int x (void) __attribute__((noinline));
-int y (void) __attribute__((noinline));
-int xx (void) __attribute__((noinline));
-
+int x (void) __attribute__((noipa));
+int y (void) __attribute__((noipa));
+int xx (void) __attribute__((noipa));
+
 int x (void)
 {
   return 1;
--
2.39.3

[PATCH] rs6000: Eliminate zext fed by vclzlsbb [PR111480]

2024-01-07 Thread Kewen.Lin

Hi,

As PR111480 shows, commit r14-4079 only optimizes the case
of vctzlsbb but not the similar vclzlsbb.  This patch
handles vclzlsbb as well and avoids the failure on
the reported test case.  It also simplifies the patterns
with an int iterator and attribute.
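
As background, a scalar model of the two instructions may help show why
the zero extension is redundant.  This is only a sketch: the function names
are hypothetical, and it assumes that array index 0 is the leading byte
element.  Both counts are at most 16, so the upper bits of a wider result
are already zero.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading byte elements whose least-significant bit is 0.  */
static unsigned
sketch_vclzlsbb (const uint8_t v[16])
{
  unsigned n = 0;
  while (n < 16 && (v[n] & 1) == 0)
    n++;
  return n;
}

/* Count trailing byte elements whose least-significant bit is 0.  */
static unsigned
sketch_vctzlsbb (const uint8_t v[16])
{
  unsigned n = 0;
  while (n < 16 && (v[15 - n] & 1) == 0)
    n++;
  return n;
}

/* Widening the result never needs an explicit zero extension, since
   the value is always in the range 0..16.  */
static void
sketch_check_vczlsbb (void)
{
  uint8_t v[16] = { 0 };
  v[3] = 1;
  assert ((uint64_t) sketch_vclzlsbb (v) == 3);
  assert ((uint64_t) sketch_vctzlsbb (v) == 12);
}
```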

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon.

BR,
Kewen
-
PR target/111480

gcc/ChangeLog:

* config/rs6000/vsx.md (VCZLSBB): New int iterator.
(vczlsbb_char): New int attribute.
(vclzlsbb_, vctzlsbb_): Merge to ...
(vczlsbb_): ... this.
(*vctzlsbb_zext_): Rename to ...
(*vczlsbb_zext_): ... this, and extend it to
cover vclzlsbb.
---
 gcc/config/rs6000/vsx.md | 41 ++--
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4c1725a7ecd..6111cc90eb7 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -411,6 +411,12 @@ (define_mode_attr VM3_char [(V2DI "d")
   (V2DF  "d")
   (V4SF  "w")])

+;; Iterator and attribute for vector count leading/trailing
+;; zero least-significant bits byte
+(define_int_iterator VCZLSBB [UNSPEC_VCLZLSBB
+ UNSPEC_VCTZLSBB])
+(define_int_attr vczlsbb_char [(UNSPEC_VCLZLSBB "l")
+  (UNSPEC_VCTZLSBB "t")])

 ;; VSX moves

@@ -5855,35 +5861,24 @@ (define_insn "vcmpnezw"
   "vcmpnezw %0,%1,%2"
   [(set_attr "type" "vecsimple")])

-;; Vector Count Leading Zero Least-Significant Bits Byte
-(define_insn "vclzlsbb_"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI
-[(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
-UNSPEC_VCLZLSBB))]
-  "TARGET_P9_VECTOR"
-  "vclzlsbb %0,%1"
-  [(set_attr "type" "vecsimple")])
-
-;; Vector Count Trailing Zero Least-Significant Bits Byte
-(define_insn "*vctzlsbb_zext_"
+;; Vector Count Leading/Trailing Zero Least-Significant Bits Byte
+(define_insn "*vczlsbb_zext_"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (zero_extend:DI
-   (unspec:SI
-[(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
-UNSPEC_VCTZLSBB)))]
+ (zero_extend:DI
+   (unspec:SI
+ [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
+ VCZLSBB)))]
   "TARGET_P9_VECTOR"
-  "vctzlsbb %0,%1"
+  "vczlsbb %0,%1"
   [(set_attr "type" "vecsimple")])

-;; Vector Count Trailing Zero Least-Significant Bits Byte
-(define_insn "vctzlsbb_"
+(define_insn "vczlsbb_"
   [(set (match_operand:SI 0 "register_operand" "=r")
-(unspec:SI
- [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
- UNSPEC_VCTZLSBB))]
+ (unspec:SI
+   [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
+   VCZLSBB))]
   "TARGET_P9_VECTOR"
-  "vctzlsbb %0,%1"
+  "vczlsbb %0,%1"
   [(set_attr "type" "vecsimple")])

 ;; Vector Extract Unsigned Byte Left-Indexed
--
2.42.0

[PATCH] rs6000: Make copysign (x, -1) back to -abs (x) for IEEE128 float [PR112606]

2024-01-07 Thread Kewen.Lin

Hi,

I noticed that commit r14-6192 can't help PR112606 #c3 as
it only takes care of SF/DF but TF/KF can still suffer the
issue.  Similar to commit r14-6192, this patch is to take
care of copysign3 with IEEE128 as well.
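
The identity behind this can be checked with a small bit-level sketch.
It uses double rather than an IEEE128 type and assumes double is IEEE
binary64; the helper names are hypothetical.  Both copysign (x, -1) and
-fabs (x) keep the magnitude of x and force the sign bit to 1, so their
bit patterns agree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SIGN (UINT64_C (1) << 63)

/* Reinterpret a double as its 64-bit pattern.  */
static uint64_t
sketch_bits (double x)
{
  uint64_t b;
  memcpy (&b, &x, sizeof b);
  return b;
}

/* Model of copysign (x, y): magnitude of X, sign of Y.  */
static uint64_t
sketch_copysign_bits (double x, double y)
{
  return (sketch_bits (x) & ~SKETCH_SIGN) | (sketch_bits (y) & SKETCH_SIGN);
}

/* Model of -fabs (x): clear the sign bit, then set it.  */
static uint64_t
sketch_neg_abs_bits (double x)
{
  return (sketch_bits (x) & ~SKETCH_SIGN) | SKETCH_SIGN;
}

static void
sketch_check_copysign (void)
{
  assert (sketch_copysign_bits (3.5, -1.0) == sketch_neg_abs_bits (3.5));
  assert (sketch_copysign_bits (-3.5, -1.0) == sketch_neg_abs_bits (-3.5));
}
```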

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon.

BR,
Kewen
-
PR target/112606

gcc/ChangeLog:

* config/rs6000/rs6000.md (copysign3 IEEE128): Change predicate
of the last argument from altivec_register_operand to any_operand.  If
operands[2] is CONST_DOUBLE, emit abs or neg abs depending on its sign
otherwise if it doesn't satisfy altivec_register_operand, force it to
REG using copy_to_mode_reg.
---
 gcc/config/rs6000/rs6000.md | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c880cec33a2..bc8bc6ab060 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -15020,9 +15020,27 @@ (define_insn "sqrt2"
 (define_expand "copysign3"
   [(use (match_operand:IEEE128 0 "altivec_register_operand"))
(use (match_operand:IEEE128 1 "altivec_register_operand"))
-   (use (match_operand:IEEE128 2 "altivec_register_operand"))]
+   (use (match_operand:IEEE128 2 "any_operand"))]
   "FLOAT128_IEEE_P (mode)"
 {
+  /* Middle-end canonicalizes -fabs (x) to copysign (x, -1),
+ but PowerPC prefers -fabs (x).  */
+  if (CONST_DOUBLE_AS_FLOAT_P (operands[2]))
+{
+  if (real_isneg (CONST_DOUBLE_REAL_VALUE (operands[2])))
+   {
+ rtx abs_res = gen_reg_rtx (mode);
+ emit_insn (gen_abs2 (abs_res, operands[1]));
+ emit_insn (gen_neg2 (operands[0], abs_res));
+   }
+  else
+   emit_insn (gen_abs2 (operands[0], operands[1]));
+  DONE;
+}
+
+  if (!altivec_register_operand (operands[2], mode))
+operands[2] = copy_to_mode_reg (mode, operands[2]);
+
   if (TARGET_FLOAT128_HW)
 emit_insn (gen_copysign3_hard (operands[0], operands[1],
 operands[2]));
--
2.42.0

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread Kito Cheng

I am OK with merging this for GCC 14. We discussed it several times in
the RISC-V GCC sync-up meeting, and I think at least Jeff Law, Palmer
Dabbelt and I have reached consensus.

But please be careful: don't break anything for standard vector stuff.

On Mon, Jan 8, 2024 at 10:11 AM joshua  wrote:
>
> Hi Juzhe,
>
> Stage 3 closes today, and some patches are still awaiting
> review.
> So is it possible to get xtheadvector merged in GCC-14?
> We emailed Kito regarding this, but haven't got any reply yet.
>
> Joshua
>
>
>
>
>
>
> --
> From: juzhe.zh...@rivai.ai
> Sent: Thursday, January 4, 2024 17:18
> To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
> Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> \ No newline at end of file
> Each file needs newline.
>
>
> I am not able to review arch stuff. This needs kito.
>
>
> Besides, Andrew Pinski wants us to defer theadvector to GCC-15.
>
>
> I have no strong opinion here.
>
>
> juzhe.zh...@rivai.ai
>
>
> From: joshua
> Sent: 2024-01-04 17:15
> To: 钟居哲; Jeff Law; gcc-patches
> Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
> Hi Juzhe,
>
> So is the following patch that this patch relies on OK to commit?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> Joshua
>
>
>
>
> --
> From: 钟居哲
> Sent: Tuesday, January 2, 2024 06:57
> To: Jeff Law; "cooper.joshua"; "gcc-patches"
> Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> This is Ok from my side.
> But before committing this patch, I think we need this one first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
>
> I will be back to work so I will take a look at other patches today.
> juzhe.zh...@rivai.ai
>
>
> From: Jeff Law
> Date: 2024-01-01 01:43
> To: Jun Sha (Joshua); gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> juzhe.zhong; Jin Ma; Xianmiao Qu
> Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
> XTheadVector.
>
>
>
> On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > This patch adds th. prefix to all XTheadVector instructions by
> > implementing new assembly output functions. We only check the
> > prefix is 'v', so that no extra attribute is needed.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> >   New function to add assembler insn code prefix/suffix.
> >   * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> >   * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> >
> > Co-authored-by: Jin Ma 
> > Co-authored-by: Xianmiao Qu 
> > Co-authored-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv-protos.h|  1 +
> >   gcc/config/riscv/riscv.cc  | 14 ++
> >   gcc/config/riscv/riscv.h   |  4 
> >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
> >   4 files changed, 31 insertions(+)
> >   create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index 31049ef7523..5ea54b45703 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -102,6 +102,7 @@ struct riscv_address_info {
> >   };
> >
> >   /* Routines implemented in riscv.cc.  */
> > +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> > *p);
> >   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
> >   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
> >   extern int riscv_float_const_rtx_index_for_fli (rtx);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 0d1cbc5cb5f..ea1d59d9cf2 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> > return lmul;
> >   }
> >
> > +/* Define ASM_OUTPUT_OPCODE to do anything special before
> > +   emitting an opcode.  */
> > +const char *
> > +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> > +{
> > +  /* We need to add th. prefix to all the xtheadvector
> > + insturctions here.*/
> > +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> > +  p[0] == 'v')
> > +fputs ("th.", asm_out_file);
> > +
> > +  return p;
> Just a formatting nit. The GNU standards br

Re: Disable FMADD in chains for Zen4 and generic

2024-01-07 Thread Hongtao Liu

On Thu, Dec 14, 2023 at 12:03 AM Jan Hubicka  wrote:
>
> > > > The difference is that Cores understand the fact that fmadd does not need
> > > > all three parameters to start computation, while Zen cores don't.
> > > >
> > > > Since this seems a noticeable win on Zen and no loss on Core, it seems like a
> > > > good default for generic.
> > > >
> > > > I plan to commit the patch next week if there are no complaints.
> > The generic part LGTM.(It's exactly what we proposed in [1])
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637721.html
>
> Thanks.  I wonder if we can think of other generic changes that would make
> sense to do?
> Concerning zen4 and FMA, it is not really a win with AVX512 enabled
> (which is what I was benchmarking for znver4 tuning), but indeed it is
> a win with AVX256 where the extra latency is not hidden by the parallelism
> exposed by doing everything twice.
>
> I re-benchmarked zen4 and it behaves similarly to zen3 with avx256, so
> for x86-64-v3 this makes sense.
>
> Honza
> > >
> > > Honza
> > >
> > > > #include <stdio.h>
> > > > #include <time.h>
> > > >
> > > > #define SIZE 1000
> > > >
> > > > float a[SIZE][SIZE];
> > > > float b[SIZE][SIZE];
> > > > float c[SIZE][SIZE];
> > > >
> > > > void init(void)
> > > > {
> > > >    int i, j, k;
> > > >    for(i=0; i<SIZE; i++)
> > > >    {
> > > >       for(j=0; j<SIZE; j++)
> > > >       {
> > > >          a[i][j] = (float)i + j;
> > > >          b[i][j] = (float)i - j;
> > > >          c[i][j] = 0.0f;
> > > >       }
> > > >    }
> > > > }
> > > >
> > > > void mult(void)
> > > > {
> > > >    int i, j, k;
> > > >
> > > >    for(i=0; i<SIZE; i++)
> > > >    {
> > > >       for(j=0; j<SIZE; j++)
> > > >       {
> > > >          for(k=0; k<SIZE; k++)
> > > >          {
> > > >             c[i][j] += a[i][k] * b[k][j];
> > > >          }
> > > >       }
> > > >    }
> > > > }
> > > >
> > > > int main(void)
> > > > {
> > > >    clock_t s, e;
> > > >
> > > >    init();
> > > >    s=clock();
> > > >    mult();
> > > >    e=clock();
> > > >    printf("mult took %10d clocks\n", (int)(e-s));
> > > >
> > > >    return 0;
> > > >
> > > > }
> > >
> > > > * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS, 
> > > X86_TUNE_AVOID_256FMA_CHAINS)
> > > Enable for znver4 and Core.
> > >
> > > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> > > index 43fa9e8fd6d..74b03cbcc60 100644
> > > --- a/gcc/config/i386/x86-tune.def
> > > +++ b/gcc/config/i386/x86-tune.def
> > > @@ -515,13 +515,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, 
> > > "use_scatter_8parts",
> > >
> > >  /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit 
> > > or
> > > smaller FMA chain.  */
> > > -DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | 
> > > m_ZNVER2 | m_ZNVER3
> > > -  | m_YONGFENG)
> > > +DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | 
> > > m_ZNVER2 | m_ZNVER3 | m_ZNVER4
> > > +  | m_YONGFENG | m_GENERIC)
> > >
> > >  /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit 
> > > or
> > > smaller FMA chain.  */
> > > -DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 
> > > | m_ZNVER3
> > > - | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
> > > +DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 
> > > | m_ZNVER3 | m_ZNVER4
> > > + | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
Can we backport the patch (at least the generic part) to the
GCC 11/GCC 12/GCC 13 release branches?
> > >
> > >  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit 
> > > or
> > > smaller FMA chain.  */
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

[PATCH] i386: [APX] Add missing document for APX

2024-01-07 Thread Hongyu Wang

Hi,

The supported sub-features for APX were missing from the option documentation
and the target attribute section. Add the missing ones.

Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.opt: Add supported sub-features.
* doc/extend.texi: Add description for target attribute.
---
 gcc/config/i386/i386.opt | 3 ++-
 gcc/doc/extend.texi  | 6 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1bfff1e0d82..a38e92baf92 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1328,7 +1328,8 @@ Enable vectorization for scatter instruction.
 
 mapxf
 Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
-Support APX code generation.
+Support code generation for APX features, including EGPR, PUSH2POP2,
+NDD and PPX.
 
 mapx-features=
 Target Undocumented Joined Enum(apx_features) EnumSet Var(ix86_apx_features) 
Init(apx_none) Save
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9e61ba9507d..84eef411e2d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7344,6 +7344,12 @@ Enable/disable the generation of the SM4 instructions.
 @itemx no-usermsr
 Enable/disable the generation of the USER_MSR instructions.
 
+@cindex @code{target("apxf")} function attribute, x86
+@item apxf
+@itemx no-apxf
+Enable/disable the generation of the APX features, including
+EGPR, PUSH2POP2, NDD and PPX.
+
 @cindex @code{target("avx10.1")} function attribute, x86
 @item avx10.1
 @itemx no-avx10.1
-- 
2.31.1

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread joshua

Hi Kito,

Thank you for your support.
So can we still merge this for GCC 14 even during stage 4?





--
From: Kito Cheng
Sent: Monday, January 8, 2024 11:06
To: joshua
Cc: "juzhe.zh...@rivai.ai"; jeffreyalaw; "gcc-patches"; Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.


I am OK with merging this for GCC 14. We discussed it several times in
the RISC-V GCC sync-up meeting, and I think at least Jeff Law, Palmer
Dabbelt and I have reached consensus.

But please be careful: don't break anything for standard vector stuff.

On Mon, Jan 8, 2024 at 10:11 AM joshua  wrote:
>
> Hi Juzhe,
>
> Stage 3 closes today, and some patches are still awaiting
> review.
> So is it possible to get xtheadvector merged in GCC-14?
> We emailed Kito regarding this, but haven't got any reply yet.
>
> Joshua
>
>
>
>
>
>
> --
> From: juzhe.zh...@rivai.ai
> Sent: Thursday, January 4, 2024 17:18
> To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
> Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> \ No newline at end of file
> Each file needs newline.
>
>
> I am not able to review arch stuff. This needs kito.
>
>
> Besides, Andrew Pinski wants us to defer theadvector to GCC-15.
>
>
> I have no strong opinion here.
>
>
> juzhe.zh...@rivai.ai
>
>
> From: joshua
> Sent: 2024-01-04 17:15
> To: 钟居哲; Jeff Law; gcc-patches
> Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
> Hi Juzhe,
>
> So is the following patch that this patch relies on OK to commit?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> Joshua
>
>
>
>
> --
> From: 钟居哲
> Sent: Tuesday, January 2, 2024 06:57
> To: Jeff Law; "cooper.joshua"; "gcc-patches"
> Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> This is Ok from my side.
> But before committing this patch, I think we need this patch first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
>
> I will be back to work so I will take a look at other patches today.
> juzhe.zh...@rivai.ai
>
>
> From: Jeff Law
> Date: 2024-01-01 01:43
> To: Jun Sha (Joshua); gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> juzhe.zhong; Jin Ma; Xianmiao Qu
> Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
> XTheadVector.
>
>
>
> On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > This patch adds th. prefix to all XTheadVector instructions by
> > implementing new assembly output functions. We only check the
> > prefix is 'v', so that no extra attribute is needed.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> >       New function to add assembler insn code prefix/suffix.
> >       * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> >       * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> >
> > Co-authored-by: Jin Ma 
> > Co-authored-by: Xianmiao Qu 
> > Co-authored-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv-protos.h                    |  1 +
> >   gcc/config/riscv/riscv.cc                          | 14 ++
> >   gcc/config/riscv/riscv.h                           |  4 
> >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c     | 12 
> >   4 files changed, 31 insertions(+)
> >   create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index 31049ef7523..5ea54b45703 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -102,6 +102,7 @@ struct riscv_address_info {
> >   };
> >
> >   /* Routines implemented in riscv.cc.  */
> > +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> > *p);
> >   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
> >   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
> >   extern int riscv_float_const_rtx_index_for_fli (rtx);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 0d1cbc5cb5f..ea1d59d9cf2 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> >     return lmul;
> >   }
> >
> > +/* Define
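
The rule stated in the quoted patch description ("We only check the
prefix is 'v'") can be sketched as below. This is a minimal illustration
under stated assumptions, not the code from the patch: the helper name
th_prefixed_opcode and its bool parameter are hypothetical, and the real
riscv_asm_output_opcode writes to a FILE * rather than returning a string.

```cpp
#include <string>

// Hypothetical helper, not the GCC function: returns the mnemonic that
// would be printed for OPCODE when XTheadVector output is requested.
std::string th_prefixed_opcode(const std::string& opcode, bool xtheadvector)
{
    // Mirror the rule from the patch description: only the first
    // character is inspected, and only 'v' triggers the "th." prefix.
    if (xtheadvector && !opcode.empty() && opcode[0] == 'v')
        return "th." + opcode;
    return opcode;
}
```

With this sketch, a vector mnemonic such as vadd.vv gains the prefix
while a scalar mnemonic such as add is passed through unchanged.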

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread Kito Cheng

It depends on when you sent the v1 patch to the mailing list, not on
when it is merged. It is of course case by case: for a big patch set
like this one, I would say no if it is still not ready by the end of
February.

On Mon, Jan 8, 2024 at 11:17 AM joshua  wrote:
>
> Hi Kito,
>
> Thank you for your support.
> So even during stage 4, we can merge this for GCC 14?
>
>
>
>
>
> --
> From: Kito Cheng
> Sent: Monday, January 8, 2024 11:06
> To: joshua
> Cc: "juzhe.zh...@rivai.ai"; jeffreyalaw; "gcc-patches"; Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> I am OK with merging this for GCC 14. We discussed it several times in
> the RISC-V GCC sync-up meeting, and I think we have at least reached
> consensus among Jeff Law, Palmer Dabbelt and me.
>
> But please be careful: don't break anything for standard vector stuff.
>
> On Mon, Jan 8, 2024 at 10:11 AM joshua  
> wrote:
> >
> > Hi Juzhe,
> >
> > Stage 3 will close today and there are still some patches left
> > that haven't been reviewed.
> > So is it possible to get xtheadvector merged in GCC-14?
> > We emailed Kito regarding this, but haven't got any reply yet.
> >
> > Joshua
> >
> >
> >
> >
> >
> >
> > --
> > From: juzhe.zh...@rivai.ai
> > Sent: Thursday, January 4, 2024 17:18
> > To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
> > Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
> >
> >
> > \ No newline at end of file
> > Each file needs newline.
> >
> >
> > I am not able to review arch stuff. This needs kito.
> >
> >
> > Besides, Andrew Pinski wants us to defer theadvector to GCC-15.
> >
> >
> > I have no strong opinion here.
> >
> >
> > juzhe.zh...@rivai.ai
> >
> >
> > From: joshua
> > Sent: 2024-01-04 17:15
> > To: 钟居哲; Jeff Law; gcc-patches
> > Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
> >
> > Hi Juzhe,
> >
> > So is the following patch that this patch relies on OK to commit?
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
> >
> > Joshua
> >
> >
> >
> >
> > --
> > From: 钟居哲
> > Sent: Tuesday, January 2, 2024 06:57
> > To: Jeff Law; "cooper.joshua"; "gcc-patches"
> > Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
> > Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
> >
> >
> > This is Ok from my side.
> > But before committing this patch, I think we need this patch first:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
> >
> >
> > I will be back to work so I will take a look at other patches today.
> > juzhe.zh...@rivai.ai
> >
> >
> > From: Jeff Law
> > Date: 2024-01-01 01:43
> > To: Jun Sha (Joshua); gcc-patches
> > CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> > juzhe.zhong; Jin Ma; Xianmiao Qu
> > Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
> > of XTheadVector.
> >
> >
> >
> > On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > > This patch adds th. prefix to all XTheadVector instructions by
> > > implementing new assembly output functions. We only check the
> > > prefix is 'v', so that no extra attribute is needed.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> > >   New function to add assembler insn code prefix/suffix.
> > >   * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> > >   * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> > >
> > > Co-authored-by: Jin Ma 
> > > Co-authored-by: Xianmiao Qu 
> > > Co-authored-by: Christoph Müllner 
> > > ---
> > >   gcc/config/riscv/riscv-protos.h|  1 +
> > >   gcc/config/riscv/riscv.cc  | 14 ++
> > >   gcc/config/riscv/riscv.h   |  4 
> > >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
> > >   4 files changed, 31 insertions(+)
> > >   create mode 100644 
> > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> > >
> > > diff --git a/gcc/config/riscv/riscv-protos.h 
> > > b/gcc/config/riscv/riscv-protos.h
> > > index 31049ef7523..5ea54b45703 100644
> > > --- a/gcc/config/riscv/riscv-protos.h
> > > +++ b/gcc/config/riscv/riscv-protos.h
> > > @@ -102,6 +102,7 @@ struct riscv_address_info {
> > >   };
> > >
> > >   /* Routines implemented in riscv.cc.  */
> > > +exter

Re: [committed] RISC-V: Clean up testsuite for multi-lib testing [NFC]

2024-01-07 Thread Kito Cheng

ack, I am fixing this, and running a few more tests, thanks for reporting this!

On Sat, Jan 6, 2024 at 10:06 AM 钟居哲  wrote:
>
> Hi, kito.
>
> This patch causes these following regression FAILs:
>
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
>
> spawn -ignore SIGHUP 
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/build/dev-rv64gcv-lp64d-medany-newlib-spike-release-m1-scalable/build-gcc-newlib-stage2/gcc/xgcc
>  
> -B/work/home/jzzhong/work/docker/riscv-gnu-toolchain/build/dev-rv64gcv-lp64d-medany-newlib-spike-release-m1-scalable/build-gcc-newlib-stage2/gcc/
>  
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c
>  -march=rv64gcv -mabi=lp64d -mcmodel=medany -fdiagnostics-plain-output 
> -ftree-vectorize -O2 --param riscv-autovec-lmul=m1 --param 
> riscv-autovec-preference=scalable -lm -o ./single_rgroup_run-3.exe^M
> In file included from 
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c:4,^M
>  from 
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:4:^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:
>  In function 'main':^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9:
>  error: implicit declaration of function 'assert' 
> [-Wimplicit-function-declaration]^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:174:3:
>  note: in expansion of macro 'run_6'^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:16:3:
>  note: in expansion of macro 'TEST_ALL'^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9:
>  note: 'assert' is defined in header '<assert.h>'; this is probably fixable 
> by adding '#include <assert.h>'^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:174:3:
>  note: in expansion of macro 'run_6'^M
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c:16:3:
>  note: in expansion of macro 'TEST_ALL'^M
> compiler exited with status 1
> FAIL: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c (test for 
> excess errors)
> Excess errors:
> /work/home/jzzhong/work/docker/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h:108:9:
>  error: implicit declaration of function 'assert' 
> [-Wimplicit-function-declaration]
>
> UNRESOLVED: gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c 
> compilation failed to produce executable
>
>
> Could you fix it ?
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2024-01-05 16:39
> To: gcc-patches; kito.cheng; juzhe.zhong
> CC: Kito Cheng
> Subject: [committed] RISC-V: Clean up testsuite for multi-lib testing [NFC]
> - Drop unnecessary including for stdlib.h and math.h
> - Drop assert.h / assert, use __builtin_abort instead.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h:
> Use __builtin_abort instead of assert.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Drop math.h.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
> * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Dit

Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-07 Thread juzhe.zhong

I am on vacation today. I will be back tomorrow or late tonight. I think we can land theadvector before Spring Festival as long as it is not invasive to RVV1.0.

Replied Message
From: joshua
Date: 01/08/2024 11:17
To: Kito Cheng
Cc: juzhe.zh...@rivai.ai, jeffreyalaw, gcc-patches, Jim Wilson, palmer, andrew, philipp.tomsich, christoph.muellner, jinma, cooper.qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

Hi Kito,

Thank you for your support.
So even during stage 4, we can merge this for GCC 14?





--
From: Kito Cheng
Sent: Monday, January 8, 2024 11:06
To: joshua
Cc: "juzhe.zh...@rivai.ai"; jeffreyalaw; "gcc-patches"; Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.


I am OK with merging this for GCC 14. We discussed it several times in
the RISC-V GCC sync-up meeting, and I think we have at least reached
consensus among Jeff Law, Palmer Dabbelt and me.

But please be careful: don't break anything for standard vector stuff.

On Mon, Jan 8, 2024 at 10:11 AM joshua  wrote:
>
> Hi Juzhe,
>
> Stage 3 will close today and there are still some patches left
> that haven't been reviewed.
> So is it possible to get xtheadvector merged in GCC-14?
> We emailed Kito regarding this, but haven't got any reply yet.
>
> Joshua
>
>
>
>
>
>
> --
> From: juzhe.zh...@rivai.ai
> Sent: Thursday, January 4, 2024 17:18
> To: "cooper.joshua"; jeffreyalaw; "gcc-patches"
> Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; jinma; "cooper.qu"
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> \ No newline at end of file
> Each file needs newline.
>
>
> I am not able to review arch stuff. This needs kito.
>
>
> Besides, Andrew Pinski wants us to defer theadvector to GCC-15.
>
>
> I have no strong opinion here.
>
>
> juzhe.zh...@rivai.ai
>
>
> From: joshua
> Sent: 2024-01-04 17:15
> To: 钟居哲; Jeff Law; gcc-patches
> Cc: jim.wilson.gcc; palmer; andrew; philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
> Hi Juzhe,
>
> So is the following patch that this patch relies on OK to commit?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> Joshua
>
>
>
>
> --
> From: 钟居哲
> Sent: Tuesday, January 2, 2024 06:57
> To: Jeff Law; "cooper.joshua"; "gcc-patches"
> Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "Christoph Müllner"; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
> This is Ok from my side.
> But before committing this patch, I think we need this patch first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
>
> I will be back to work so I will take a look at other patches today.
> juzhe.zh...@rivai.ai
>
>
> From: Jeff Law
> Date: 2024-01-01 01:43
> To: Jun Sha (Joshua); gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; juzhe.zhong; Jin Ma; Xianmiao Qu
> Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
>
>
>
> On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > This patch adds th. prefix to all XTheadVector instructions by
> > implementing new assembly output functions. We only check the
> > prefix is 'v', so that no extra attribute is needed.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> >       New function to add assembler insn code prefix/suffix.
> >       * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> >       * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> >
> > Co-authored-by: Jin Ma 
> > Co-authored-by: Xianmiao Qu 
> > Co-authored-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv-protos.h                    |  1 +
> >   gcc/config/riscv/riscv.cc                          | 14 ++
> >   gcc/config/riscv/riscv.h                           |  4 
> >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c     | 12 
> >   4 files changed, 31 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index 31049ef7523..5ea54b45703 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -102,6 +102,7 @@ struct riscv_address_info {
> >   };
> >
> >   /* Routines implemented in riscv.cc.  */
> > +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
> >   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
> >   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_typ

Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-07 Thread Jeff Law





On 1/5/24 10:35, Richard Sandiford wrote:

Jeff Law  writes:

On 10/24/23 12:49, Richard Sandiford wrote:

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

So have you done any investigation on cases caught by your new pass
between combine and split1 to characterize them?  In particular do they
point at solvable problems in combine?  Or do you foresee this subsuming
the old combiner pass at some point in the distant future?


Examples like the PR are the main motivation for the pre-RA pass.
There we had an extension that could be combined into an address,
but no longer was after GCC 13.

The PR itself could in principle be fixed in combine (various
approaches were suggested, but not accepted).  But the same problem
applies to multiple uses of extensions.  fwprop can't handle it because
individual propagations are not a win in isolation.  And combine has
a limit of combining 4 insns (with a maximum of 2 output insns, IIRC).
So I don't think either of the existing passes scale to the general case.

Oh, that discussion :(






rth and I sketched out an SSA based RTL combine at some point in the
deep past.  The key goal we were trying to achieve was combining across
blocks.  We didn't have a functioning RTL SSA form at the time, so it
never went to any implementation work.  It looks like yours would solve
the class of problems rth and I were considering.


Yeah, I do see some cases where combining across blocks helps.
The case above is one example of that.  Another is:

Great.  The cases I think rth and I were looking at were inspired by a 
talk at one of the early Cauldrons -- whichever one we had in 
California.  Someone did a talk on cross-block combinations, mostly 
motivated by missed combinations on ARM.






The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

I'm going to adjust this slightly so that it's enabled across the board
and throw it into the tester tomorrow (tester is busy tonight).  Even if
we make it opt-in on a per-port basis, the alternate target testing does
seem to find stuff that needs fixing ;-)


Thanks!  As per our off-list discussion, the cris-elf failures showed
up a bug in the handling of call arguments.  Here's an updated version
with that fixed.

Perfect.  I'll spin that in the tester overnight.  Hopefully that fixes 
the other ports that failed to build (as opposed to the testsuite 
failures which I think you've covered as not an issue with the new 
combine pass, but which are instead port issues).







Yeah.  If I'd posted this earlier in stage 1 (rather than October),
I might have tried teaching shorten_branches how to handle this.
But it felt like it could be a bit of a rabbit hole at this stage.


Nothing jumps out as horribly wrong.  You might want/need to reject
frame related insns in optimizable_set, though I guess if the dwarf2
writer isn't complaining, then we haven't mucked things up too bad.


Ah, yeah, I wondered about that.  But I suppose most prologue insns
don't really create combination opportunities, since most of them
either set up the stack (which we wouldn't combine anyway) or sink
incoming data.  So if there are cases where this is a problem, it might
unfortunately be a case of "see what breaks".

The other issue that's been in the back of my mind is costing.  But I 
think the model here is combine without regards to cost.


Let's let the tester chew on the updated version overnight, see what 
else may pop out, but barring any big surprises, I think we're good to go.




jeff

Re: [PATCH] i386: [APX] Add missing document for APX

2024-01-07 Thread Hongtao Liu

On Mon, Jan 8, 2024 at 11:09 AM Hongyu Wang  wrote:
>
> Hi,
>
> The supported sub-features for APX were missing from the option
> documentation and the target attribute section. Add the missing ones.
>
> Ok for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.opt: Add supported sub-features.
> * doc/extend.texi: Add description for target attribute.
> ---
>  gcc/config/i386/i386.opt | 3 ++-
>  gcc/doc/extend.texi  | 6 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 1bfff1e0d82..a38e92baf92 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1328,7 +1328,8 @@ Enable vectorization for scatter instruction.
>
>  mapxf
>  Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
> -Support APX code generation.
> +Support code generation for APX features, including EGPR, PUSH2POP2,
> +NDD and PPX.
>
>  mapx-features=
>  Target Undocumented Joined Enum(apx_features) EnumSet Var(ix86_apx_features) 
> Init(apx_none) Save
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 9e61ba9507d..84eef411e2d 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -7344,6 +7344,12 @@ Enable/disable the generation of the SM4 instructions.
>  @itemx no-usermsr
>  Enable/disable the generation of the USER_MSR instructions.
>
> +@cindex @code{target("apxf")} function attribute, x86
> +@item apxf
> +@itemx no-apxf
> +Enable/disable the generation of the APX features, including
> +EGPR, PUSH2POP2, NDD and PPX.
> +
>  @cindex @code{target("avx10.1")} function attribute, x86
>  @item avx10.1
>  @itemx no-avx10.1
> --
> 2.31.1
>


-- 
BR,
Hongtao

[PING 3][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2024-01-07 Thread Surya Kumari Jangala

Ping

On 28/11/23 6:24 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>>> Hi Segher,
>>> I have incorporated changes in the code as per the review comments provided 
>>> by you 
>>> for version 2 of the patch. Please review.
>>>
>>> Regards,
>>> Surya
>>>
>>>
>>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>
>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>> instructions is done even if the webs that contain the swappable 
>>> instructions
>>> are not optimized, i.e., the webs do not contain any permuting load/store
>>> instructions along with the associated register swap instructions. Doing 
>>> special
>>> handling in such webs will result in the extracted lane being adjusted
>>> unnecessarily for vec_extract.
>>>
>>> Another issue is that existing code treats non-permuting loads/stores as 
>>> special
>>> swappables. Non-permuting loads/stores (that have not yet been split into a
>>> permuting load/store and a swap) are handled by converting them into a 
>>> permuting
>>> load/store (which effectively removes the swap). As a result, if special
>>> swappables are handled only in webs containing permuting loads/stores, then
>>> non-optimal code is generated for non-permuting loads/stores.
>>>
>>> Hence, in this patch, all webs containing either permuting loads/ stores or
>>> non-permuting loads/stores are marked as requiring special handling of
>>> swappables. Swaps associated with permuting loads/stores are marked for 
>>> removal,
>>> and non-permuting loads/stores are converted to permuting loads/stores. 
>>> Then the
>>> special swappables in the webs are fixed up.
>>>
>>> This patch also ensures that swappable instructions are not modified in the
>>> following webs as it is incorrect to do so:
>>>  - webs containing permuting load/store instructions and associated swap
>>>instructions that are transformed by converting the permuting memory
>>>instructions into non-permuting instructions and removing the swap
>>>instructions.
>>>  - webs where swap(load(vector constant)) instructions are replaced with
>>>load(swapped vector constant).
>>>
>>> 2023-09-10  Surya Kumari Jangala  
>>>
>>> gcc/
>>> PR rtl-optimization/PR106770
>>> * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>> (handle_non_permuting_mem_insn): New function.
>>> (rs6000_analyze_swaps): Handle swappable instructions only in certain
>>> webs.
>>> (web_requires_special_handling): New instance variable.
>>> (handle_special_swappables): Remove handling of non-permuting load/store
>>> instructions.
>>>
>>> gcc/testsuite/
>>> PR rtl-optimization/PR106770
>>> * gcc.target/powerpc/pr106770.c: New test.
>>> ---
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>>> b/gcc/config/rs6000/rs6000-p8swap.cc
>>> index 0388b9bd736..02ea299bc3d 100644
>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>unsigned int special_handling : 4;
>>>/* Set if the web represented by this entry cannot be optimized.  */
>>>unsigned int web_not_optimizable : 1;
>>> +  /* Set if the swappable insns in the web represented by this entry
>>> + have to be fixed. Swappable insns have to be fixed in:
>>> +   - webs containing permuting loads/stores and the swap insns
>>> +in such webs have been marked for removal
>>> +   - webs where non-permuting loads/stores have been converted
>>> +to permuting loads/stores  */
>>> +  unsigned int web_requires_special_handling : 1;
>>>/* Set if this insn should be deleted.  */
>>>unsigned int will_delete : 1;
>>>  };
>>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>>> *insn_entry, unsigned i)
>>>if (dump_file)
>>> fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>>break;
>>> -case SH_NOSWAP_LD:
>>> -  /* Convert a non-permuting load to a permuting one.  */
>>> -  permute_load (insn);
>>> -  break;
>>> -case SH_NOSWAP_ST:
>>> -  /* Convert a non-permuting store to a permuting one.  */
>>> -  permute_store (insn);
>>> -  break;
>>>  case SH_EXTRACT:
>>>/* Change the lane on an extract operation.  */
>>>adjust_extract (insn);
>>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>>>free (to_delete);
>>>  }
>>>  
>>> +/* Return true if insn is a non-permuting load/store.  */
>>> +static bool
>>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>>> +{
>>> +  return insn_entry[i].special_handling == SH_NOSWAP_LD
>>> +|| insn_entry[i].special_handling == SH_NOSWAP_ST;
>>> +}
>>> +
>>> +/* Convert a non-permuting load/store insn to a permuting one.  */
>>> +static void
>>> +convert_mem_insn (swap_web_entr
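
A small model of why the extracted lane matters, not taken from the
patch: assume that a web whose swaps have been removed leaves each vector
with its two doublewords exchanged, so a vec_extract has to read from the
opposite half. If the web is not optimized and the swaps stay in place,
applying the same adjustment selects the wrong element, which is the
failure the description above refers to. The names swap_doublewords and
adjusted_lane are hypothetical.

```cpp
#include <array>

// Illustrative model only: a 4-element vector whose two doublewords
// (element pairs) are exchanged, as a permuting load would leave them.
std::array<int, 4> swap_doublewords(const std::array<int, 4>& v)
{
    std::array<int, 4> r = {{v[2], v[3], v[0], v[1]}};
    return r;
}

// Lane to read from the swapped register in order to fetch what was
// logically requested as LANE (assumed rule: move to the other half).
int adjusted_lane(int lane, int nunits)
{
    const int half = nunits / 2;
    return lane >= half ? lane - half : lane + half;
}
```

In this model the adjustment is only correct when the register really
holds swapped doublewords; on an unswapped register it reads the element
from the wrong half.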

Re: [PATCH] Add __cow_string C string constructor

2024-01-07 Thread François Dumont




On 07/01/2024 21:53, Jonathan Wakely wrote:

On Sun, 7 Jan 2024 at 18:50, François Dumont  wrote:


On 07/01/2024 17:34, Jonathan Wakely wrote:

On Sun, 7 Jan 2024 at 12:57, François Dumont  wrote:

Hi

While working on the patch to use the cxx11 abi in gnu version namespace
mode I got a small problem with this missing constructor. I'm not sure
that the main patch will be integrated in gcc 14 so I think it is better
if I propose this patch independently.

   libstdc++: Add __cow_string constructor from C string

   The __cow_string is instantiated from a C string in cow-stdexcept.cc.
   At the moment the constructor from std::string is being used with the
   drawback of an intermediate potential allocation/deallocation and copy.
   With the C string constructor we bypass all those operations.

But in that file, the std::string is the COW string, which means that
when we construct a std::string and copy it, it's cheap. It's just a
reference count increment/decrement. There should be no additional
allocation or deallocation.

Good remark but AFAI understand in this case std::string is the cxx11
one. I'll take a second look.

Clearly in my gnu version namespace patch it is the cxx11 implementation.

I hope not! The whole point of that type is to always be a COW string,
which it does by storing a COW std::basic_string in the union, but
wrapping it in a class with a different name, __cow_string.

If your patch to use the SSO string in the versioned namespace doesn't
change that file to guarantee that __cow_string is still a
copy-on-write type then the patch is wrong and must be fixed.


Don't worry, __cow_string is indeed wrapping a COW string.

What I meant is that in this constructor in <stdexcept>:

__cow_string(const std::string&);

The std::string parameter is the SSO string.

However, as you said, in cow-stdexcept.cc the similar constructor is in 
fact taking a COW string so it has less importance. It's just an ODR issue.


In my gnu version namespace patch however this type is still the SSO 
string in cow-stdexcept.cc so I'll keep it in this context.




Even if so, why do we want to do those additional operations ? Adding
this C string constructor will make sure that no useless operations will
be done.

Yes, we could avoid an atomic increment and decrement, but that type
is only used when throwing an exception so the overhead of allocating
memory and calling __cxa_throw etc. is far higher than an atomic
inc/dec pair.

I was going to say that the new constructor would need to be exported
from the shared lib, but I think the new constructor is only ever used
in these two places, both defined in that same file:

   logic_error::logic_error(const char* __arg)
   : exception(), _M_msg(__arg) { }

   runtime_error::runtime_error(const char* __arg)
   : exception(), _M_msg(__arg) { }

So I think the change is safe, but I don't think it's urgent, and
certainly not needed for the reasons claimed in the patch description.

The ODR violation has no side effect, which confirms your statement. It 
looks like the __cow_string(const std::string&) could be removed from 
<stdexcept>.
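
The point made in the thread, that copying a copy-on-write string costs
only a reference-count update and no new allocation, can be illustrated
with a toy class. This is a sketch under stated assumptions and not
libstdc++'s __cow_string: ToyCowString, its allocation counter and
use_count are hypothetical, and the count is not atomic, unlike the real
implementation discussed above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, simplified copy-on-write string for illustration only.
class ToyCowString {
    struct Rep { long refs; std::size_t len; char* data; };
    Rep* rep_;
    // Counts buffer allocations so the cost of each operation is visible.
    static long& allocs() { static long n = 0; return n; }
public:
    static long allocation_count() { return allocs(); }

    // Constructing from a C string allocates exactly one buffer.
    explicit ToyCowString(const char* s) : rep_(new Rep) {
        ++allocs();
        rep_->refs = 1;
        rep_->len = std::strlen(s);
        rep_->data = new char[rep_->len + 1];
        std::memcpy(rep_->data, s, rep_->len + 1);
    }
    // Copying shares the buffer: only the reference count changes.
    ToyCowString(const ToyCowString& o) : rep_(o.rep_) { ++rep_->refs; }
    ToyCowString& operator=(const ToyCowString&) = delete;
    ~ToyCowString() {
        if (--rep_->refs == 0) { delete[] rep_->data; delete rep_; }
    }
    const char* c_str() const { return rep_->data; }
    long use_count() const { return rep_->refs; }
};
```

Constructing one ToyCowString from a literal and then copying it performs
a single allocation in this model, which matches the argument that the
extra copy is cheap compared with throwing the exception.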

[committed] RISC-V: Fix testsuite

2024-01-07 Thread Kito Cheng

Don't use assert; it does not work well with multilib testing.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h: Use
check + abort rather than assert.
---
 .../riscv/rvv/autovec/partial/single_rgroup-3.h   | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h
index 604e9048055..dfe48d6dae1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h
@@ -105,7 +105,9 @@ int cond[N] = {0};
   if (b_##TYPE[i] != a_##TYPE[i]) __builtin_abort(); \
 } \
   else \
-   assert (b_##TYPE[i] == 0); \
+{ \
+  if (b_##TYPE[i] != 0) __builtin_abort(); \
+} \
 }
 
 #define run_7(TYPE) \
@@ -151,7 +153,9 @@ int cond[N] = {0};
   if (b_##TYPE[i] != a_##TYPE[i]) __builtin_abort(); \
 } \
   else \
-   assert (b_##TYPE[i] == 0); \
+{ \
+  if (b_##TYPE[i] != 0) __builtin_abort(); \
+} \
 }
 
 #define run_10(TYPE) \
-- 
2.34.1
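
The check-and-abort pattern the patch switches to can be sketched on its own. This is a hypothetical reduction, not the contents of single_rgroup-3.h: DEFINE_CHECK, check_int and the cond-based branch condition are assumptions made for illustration; only the "if (...) __builtin_abort ();" form in both branches is taken from the patch. __builtin_abort is a compiler builtin, so the check needs no header.

```cpp
// Hypothetical reduction of the test macro: one checker per element type.
// The failing case calls the compiler builtin instead of assert, so no
// header is required for the check itself.
#define DEFINE_CHECK(TYPE)                                   \
  void check_##TYPE (const TYPE *a, const TYPE *b,           \
                     const int *cond, int n)                 \
  {                                                          \
    for (int i = 0; i < n; i++)                              \
      {                                                      \
        if (cond[i])                                         \
          {                                                  \
            /* Selected lanes must match the source.  */     \
            if (b[i] != a[i]) __builtin_abort ();            \
          }                                                  \
        else                                                 \
          {                                                  \
            /* Unselected lanes must stay zero.  */          \
            if (b[i] != 0) __builtin_abort ();               \
          }                                                  \
      }                                                      \
  }

// Instantiate the checker for int, giving check_int.
DEFINE_CHECK (int)
```

A call such as check_int (a, b, cond, n) returns normally when every lane passes and terminates the program otherwise, which is the behaviour the testsuite interprets as a failure.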

[PATCH 1/2] arm: Add cortex-m52 core

2024-01-07 Thread Chung-Ju Wu


Hi,

Recently, Arm announced the Cortex-M52, delivering increased performance
in DSP and ML along with a range of other features and benefits.
For the completeness of the Arm ecosystem, we hope that cortex-m52 support
could be available in gcc-14.

Attached is the patch to support the cortex-m52 CPU with MVE and PACBTI enabled
in GCC.
Bootstrapped and tested on arm-none-eabi.

Is it OK for trunk?

Regards,
jasonwucj

From d0856b516c5d270a852f3edd9df5dadccde5b94e Mon Sep 17 00:00:00 2001
From: Chung-Ju Wu 
Date: Wed, 6 Dec 2023 15:49:58 +0800
Subject: [PATCH 1/2] arm: Add support for Arm Cortex-M52 CPU.

This patch adds the -mcpu support for the Arm Cortex-M52 CPU which is
an Armv8.1-M Mainline CPU supporting MVE and PACBTI by default.

By default, the -mcpu=cortex-m52 switch corresponds to
-march=armv8.1-m.main+pacbti+mve.fp+fp.dp.

The CDE feature is enabled by specifying +cdecpN (e.g.
-mcpu=cortex-m52+cdecp0), where N is the coprocessor number 0 to 7.

The following options are also provided to disable default features:
+nomve.fp (disables MVE Floating point)
+nomve (disables MVE Integer and MVE Floating point)
+nodsp (disables dsp, MVE Integer and MVE Floating point)
+nopacbti (disables pacbti)
+nofp (disables floating point and MVE floating point)

Signed-off-by: Chung-Ju Wu 

gcc/ChangeLog:

* config/arm/arm-cpus.in (cortex-m52): New cpu.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
---
 gcc/config/arm/arm-cpus.in| 21 +
 gcc/config/arm/arm-tables.opt |  3 +++
 gcc/config/arm/arm-tune.md|  6 +++---
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 6fa7e315ef0..451b15fe9f9 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1641,6 +1641,27 @@ begin cpu cortex-m35p
  costs v7m
 end cpu cortex-m35p
 
+begin cpu cortex-m52
+ cname cortexm52
+ tune flags LDSCHED
+ architecture armv8.1-m.main+pacbti+mve.fp+fp.dp
+ option nopacbti remove pacbti
+ option nomve.fp remove mve_float
+ option nomve remove mve mve_float
+ option nofp remove ALL_FP mve_float
+ option nodsp remove MVE mve_float
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
+ isa quirk_no_asmcpu quirk_vlldm
+ costs v7m
+end cpu cortex-m52
+
 begin cpu cortex-m55
  cname cortexm55
  tune flags LDSCHED
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 9d6ae875ede..d3eb9a97739 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -282,6 +282,9 @@ Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33)
 EnumValue
 Enum(processor_type) String(cortex-m35p) Value( TARGET_CPU_cortexm35p)
 
+EnumValue
+Enum(processor_type) String(cortex-m52) Value( TARGET_CPU_cortexm52)
+
 EnumValue
 Enum(processor_type) String(cortex-m55) Value( TARGET_CPU_cortexm55)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 7318f03b97e..6a631d82966 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -49,7 +49,7 @@
cortexa710,cortexx1,cortexx1c,
neoversen1,cortexa75cortexa55,cortexa76cortexa55,
neoversev1,neoversen2,cortexm23,
-   cortexm33,cortexm35p,cortexm55,
-   starmc1,cortexm85,cortexr52,
-   cortexr52plus"
+   cortexm33,cortexm35p,cortexm52,
+   cortexm55,starmc1,cortexm85,
+   cortexr52,cortexr52plus"
(const (symbol_ref "((enum attr_tune) arm_tune)")))
-- 
2.34.3
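
The effect of the "+no..." extensions listed in the commit message can be modelled as removals from a default feature set. The following is only a sketch under stated assumptions: the Feature bits, kDefaultFeatures, removed_by and features_for are hypothetical names, the five-bit feature model is a simplification of the description above, and none of this is how GCC's arm-cpus.in machinery is implemented. Options that add features, such as +cdecp0, are ignored by this model.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical feature bits, named after the features in the commit message.
enum Feature : std::uint32_t {
  kPacbti   = 1u << 0,
  kDsp      = 1u << 1,
  kMveInt   = 1u << 2,
  kMveFloat = 1u << 3,
  kFp       = 1u << 4,
};

// Assumed default set for -mcpu=cortex-m52 in this simplified model.
constexpr std::uint32_t kDefaultFeatures =
    kPacbti | kDsp | kMveInt | kMveFloat | kFp;

// Features removed by each extension, following the commit message.
inline std::uint32_t removed_by(std::string_view opt) {
  if (opt == "nomve.fp") return kMveFloat;
  if (opt == "nomve")    return kMveInt | kMveFloat;
  if (opt == "nodsp")    return kDsp | kMveInt | kMveFloat;
  if (opt == "nopacbti") return kPacbti;
  if (opt == "nofp")     return kFp | kMveFloat;
  return 0;  // Unknown or additive options remove nothing here.
}

// Parse a spec such as "cortex-m52+nomve+nopacbti" and apply each removal.
inline std::uint32_t features_for(std::string_view spec) {
  std::uint32_t features = kDefaultFeatures;
  std::size_t pos = spec.find('+');
  while (pos != std::string_view::npos) {
    std::size_t next = spec.find('+', pos + 1);
    std::size_t len = (next == std::string_view::npos)
                          ? std::string_view::npos
                          : next - pos - 1;
    features &= ~removed_by(spec.substr(pos + 1, len));
    pos = next;
  }
  return features;
}
```

For example, in this model features_for("cortex-m52+nodsp+nopacbti") leaves only the floating-point bit set, matching the description that +nodsp also disables MVE integer and MVE floating point.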

[PATCH 2/2] arm: Add cortex-m52 doc

2024-01-07 Thread Chung-Ju Wu


Hi,

This is the patch to add cortex-m52 in the Arm-related options
sections of the gcc invoke.texi documentation.

Is it OK for trunk?

Regards,
jasonwucj

From b7ce3d499d4bf087ec54a5f834876c9108d46c3d Mon Sep 17 00:00:00 2001
From: Chung-Ju Wu 
Date: Thu, 7 Dec 2023 11:26:25 +0800
Subject: [PATCH 2/2] arm: Add Arm Cortex-M52 CPU documentation.

Signed-off-by: Chung-Ju Wu 

gcc/ChangeLog:

* doc/invoke.texi: Update docs.
---
 gcc/doc/invoke.texi | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d71583853f0..bdbe0074cb4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23094,7 +23094,7 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-r7}, @samp{cortex-r8}, @samp{cortex-r52}, @samp{cortex-r52plus},
 @samp{cortex-m0}, @samp{cortex-m0plus}, @samp{cortex-m1}, @samp{cortex-m3},
 @samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m23}, @samp{cortex-m33},
-@samp{cortex-m35p}, @samp{cortex-m55}, @samp{cortex-m85}, @samp{cortex-x1},
+@samp{cortex-m35p}, @samp{cortex-m52}, @samp{cortex-m55}, @samp{cortex-m85}, @samp{cortex-x1},
 @samp{cortex-x1c}, @samp{cortex-m1.small-multiply}, @samp{cortex-m0.small-multiply},
 @samp{cortex-m0plus.small-multiply}, @samp{exynos-m1}, @samp{marvell-pj4},
 @samp{neoverse-n1}, @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{xscale},
@@ -23160,34 +23160,34 @@ The following extension options are common to the listed CPUs:
 @table @samp
 @item +nodsp
 Disable the DSP instructions on @samp{cortex-m33}, @samp{cortex-m35p},
-@samp{cortex-m55} and @samp{cortex-m85}. Also disable the M-Profile Vector
-Extension (MVE) integer and single precision floating-point instructions on
-@samp{cortex-m55} and @samp{cortex-m85}.
+@samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}.
+Also disable the M-Profile Vector Extension (MVE) integer and
+single precision floating-point instructions on
+@samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}.
 
 @item +nopacbti
 Disable the Pointer Authentication and Branch Target Identification Extension
-on @samp{cortex-m85}.
+on @samp{cortex-m52} and @samp{cortex-m85}.
 
 @item +nomve
 Disable the M-Profile Vector Extension (MVE) integer and single precision
-floating-point instructions on @samp{cortex-m55} and @samp{cortex-m85}.
+floating-point instructions on @samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}.
 
 @item +nomve.fp
 Disable the M-Profile Vector Extension (MVE) single precision floating-point
-instructions on @samp{cortex-m55} and @samp{cortex-m85}.
+instructions on @samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}.
 
 @item +cdecp0, +cdecp1, ... , +cdecp7
 Enable the Custom Datapath Extension (CDE) on selected coprocessors according
-to the numbers given in the options in the range 0 to 7 on @samp{cortex-m55}.
+to the numbers given in the options in the range 0 to 7 on @samp{cortex-m52} and @samp{cortex-m55}.
 
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},
 @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
 @samp{arm1020e}, @samp{arm1022e}, @samp{arm926ej-s},
 @samp{arm1026ej-s}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8},
-@samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m33}, @samp{cortex-m35p}
 @samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m33}, @samp{cortex-m35p},
-@samp{cortex-m55} and @samp{cortex-m85}.
+@samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}.
 Disables the floating-point and SIMD instructions on
 @samp{generic-armv7-a}, @samp{cortex-a5}, @samp{cortex-a7},
 @samp{cortex-a8}, @samp{cortex-a9}, @samp{cortex-a12},
@@ -23530,9 +23530,9 @@ Development Tools Engineering Specification", which can be found on
 Mitigate against a potential security issue with the @code{VLLDM} instruction
 in some M-profile devices when using CMSE (CVE-2021-365465).  This option is
 enabled by default when the option @option{-mcpu=} is used with
-@code{cortex-m33}, @code{cortex-m35p}, @code{cortex-m55}, @code{cortex-m85}
-or @code{star-mc1}. The option @option{-mno-fix-cmse-cve-2021-35465} can be used
-to disable the mitigation.
+@code{cortex-m33}, @code{cortex-m35p}, @code{cortex-m52}, @code{cortex-m55},
+@code{cortex-m85} or @code{star-mc1}. The option @option{-mno-fix-cmse-cve-2021-35465}
+can be used to disable the mitigation.
 
 @opindex mstack-protector-guard
 @opindex mstack-protector-guard-offset
-- 
2.34.3
