date:20250424

[Patch] libgomp: Add additional OpenMP interop runtime tests

2025-04-24 Thread Tobias Burnus


The attached patch adds a bunch of tests for OpenMP's interop;
namely:

* One test checks whether nowait/depend works
* The rest checks that the returns cuda/cuda_driver and hip
  objects work.
* This requires that he CUDA and HIP runtimes are found,
  best also the header files and/or hipfort Fortran module,
  but if not, it still works in fallback mode.

Due to the latter, several new check-effective-target checks
have been added. (And the test files contain fallback
implementations for the header/module files.)

Still, best results can be reached when not only the libraries
but also the header/module files are found.

I tested it with AMD GPU and Nvidia GPU offloading, first without
finding the libs, then only the libs and then also the
headers/module files.

Comments? If not, I intent to commit it later today.

Tobias
libgomp: Add additional OpenMP interop runtime tests

Add checks for nowait/depend and for checks that the returned
CUDA, CUDA_DRIVER and HIP interop objects actually work.

While the former two are only for Nvidia GPUs, HIP applies to both
AMD and Nvidia GPUs, where on Nvidia GPUs, it is a very thin wrapper
around CUDA.

For Fortran, only a HIP test has been added - using hipfort.

While libgomp.c-c++-common/interop-2.c always works and checks for
depend / nowait, all others require that runtime libraries are found:
For Nvidia GPUs, libcuda + libcudart or libcublas,
For AMD GPUs, libamdhip64 or libhipblas.

The header files and hipfort modules do not need to be present as a
fallback has been implemented, but if they are, they get used.

Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests
yield 1x C/C++, 14x C and 4 Fortran run-test files.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas,
	check_effective_target_openacc_cudart): Update description as
	the check requires more.
	(check_effective_target_openacc_libcuda,
	check_effective_target_openacc_libcublas,
	check_effective_target_openacc_libcudart,
	check_effective_target_gomp_hip_header_amd,
	check_effective_target_gomp_hip_header_nvidia,
	check_effective_target_gomp_hipfort_module,
	check_effective_target_gomp_libamdhip64,
	check_effective_target_gomp_libhipblas): New.
	* testsuite/libgomp.c-c++-common/interop-2.c: New test.
	* testsuite/libgomp.c/interop-cublas-full.c: New test.
	* testsuite/libgomp.c/interop-cublas-libonly.c: New test.
	* testsuite/libgomp.c/interop-cuda-full.c: New test.
	* testsuite/libgomp.c/interop-cuda-libonly.c: New test.
	* testsuite/libgomp.c/interop-hip-amd-full.c: New test.
	* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-full.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test.
	* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hip.h: New test.
	* testsuite/libgomp.c/interop-hipblas-amd-full.c: New test.
	* testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test.
	* testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test.
	* testsuite/libgomp.c/interop-hipblas.h: New test.
	* testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test.
	* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test.
	* testsuite/libgomp.fortran/interop-hip.h: New test.

 libgomp/testsuite/lib/libgomp.exp  | 133 +++-
 libgomp/testsuite/libgomp.c-c++-common/interop-2.c | 129 
 libgomp/testsuite/libgomp.c/interop-cublas-full.c  | 176 
 .../testsuite/libgomp.c/interop-cublas-libonly.c   |   7 +
 libgomp/testsuite/libgomp.c/interop-cuda-full.c| 159 ++
 libgomp/testsuite/libgomp.c/interop-cuda-libonly.c |   8 +
 libgomp/testsuite/libgomp.c/interop-hip-amd-full.c |   7 +
 .../libgomp.c/interop-hip-amd-no-hip-header.c  |   8 +
 .../testsuite/libgomp.c/interop-hip-nvidia-full.c  |   8 +
 .../libgomp.c/interop-hip-nvidia-no-headers.c  |  10 +
 .../libgomp.c/interop-hip-nvidia-no-hip-header.c   |   9 +
 libgomp/testsuite/libgomp.c/interop-hip.h  | 234 +
 .../testsuite/libgomp.c/interop-hipblas-amd-full.c |   7 +
 .../libgomp.c/interop-hipblas-amd-no-hip-header.c  |   8 +
 .../libgomp.c/interop-hipblas-nvidia-full.c|   7 +
 .../libgomp.c/interop-hipblas-nvidia-no-headers.c  |   9 +
 .../interop-hipblas-nvidia-no-hip-header.c |   8 +
 libgomp/testsuite/libgomp.c/interop-hipblas.h  | 228 
 .../libgomp.fortran/interop-hip-amd-full.F90   |   7 +
 .../libgomp.fortran/interop-hip-amd-no-module.F90  |   6 +
 .../libgomp.fortran/interop-hip-nvidia-full.F90|   9 +
 .../interop-hip-nvidia-no-module.F90

[PATCH v2] libstdc++: Add lvalue overload for generator::yield_value

2025-04-24 Thread Jonathan Wakely

This was approved in Wrocław as LWG 3899.

This avoids creating a new coroutine frame to co_yield the elements of
an lvalue generator.

libstdc++-v3/ChangeLog:

* include/std/generator (generator::yield_value): Add overload
taking lvalue element_of view, as per LWG 3899.
* testsuite/24_iterators/range_generators/lwg3899.cc: New test.
---

Added a testcase (thanks to Arsen).

Tested x86_64-linux.

 libstdc++-v3/include/std/generator| 10 
 .../24_iterators/range_generators/lwg3899.cc  | 57 +++
 2 files changed, 67 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc

diff --git a/libstdc++-v3/include/std/generator 
b/libstdc++-v3/include/std/generator
index 3f781f1bb29..7ab2c9e7ce9 100644
--- a/libstdc++-v3/include/std/generator
+++ b/libstdc++-v3/include/std/generator
@@ -153,6 +153,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  noexcept
{ return _Recursive_awaiter { std::move(__r.range) }; }
 
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3899. co_yielding elements of an lvalue generator is
+   // unnecessarily inefficient
+   template
+   requires std::same_as<_Yield2_t<_R2, _V2>, _Yielded>
+   auto
+   yield_value(ranges::elements_of&, _U2> __r)
+ noexcept
+   { return _Recursive_awaiter { std::move(__r.range) }; }
+
template
requires convertible_to, _Yielded>
auto
diff --git a/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc 
b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
new file mode 100644
index 000..5a812ecf9d9
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
@@ -0,0 +1,57 @@
+// { dg-do run { target c++23 } }
+
+// LWG 3899.
+// co_yielding elements of an lvalue generator is unnecessarily inefficient
+
+#include 
+#include 
+#include 
+
+struct memory_resource : std::pmr::memory_resource
+{
+  std::size_t count = 0;
+
+  void* do_allocate(std::size_t n, std::size_t a) override
+  {
+count += n;
+return std::pmr::new_delete_resource()->allocate(n, a);
+  }
+
+  void do_deallocate(void* p, std::size_t n, std::size_t a) override
+  {
+return std::pmr::new_delete_resource()->deallocate(p, n, a);
+  }
+
+  bool do_is_equal(const std::pmr::memory_resource& mr) const noexcept override
+  { return this == &mr; }
+};
+
+std::pmr::generator
+f(std::allocator_arg_t, std::pmr::polymorphic_allocator<>, int init)
+{
+  co_yield init + 0;
+  co_yield init + 1;
+}
+
+std::pmr::generator
+g(std::allocator_arg_t, std::pmr::polymorphic_allocator<> alloc)
+{
+  auto gen = f(std::allocator_arg, alloc, 0);
+  auto gen2 = f(std::allocator_arg, alloc, 2);
+  co_yield std::ranges::elements_of(std::move(gen), alloc);
+  co_yield std::ranges::elements_of(gen2, alloc);
+}
+
+int
+main()
+{
+  std::size_t counts[4];
+  memory_resource mr;
+  for (auto d : g(std::allocator_arg , &mr))
+counts[d] = mr.count;
+  VERIFY(counts[0] != 0);
+  // No allocations after the first one:
+  VERIFY(counts[1] == counts[0]);
+  VERIFY(counts[2] == counts[0]);
+  VERIFY(counts[3] == counts[0]);
+}
-- 
2.49.0

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Qing Zhao

Hi, 

Kees reported a segmentation failure when he used the patch to compiler kernel, 
and the reduced the testing case is something like the following:

struct f {
 void *g __attribute__((__counted_by__(h)));
 long h;
};

extern struct f *my_alloc (int);

int i(void) {
 struct f *iov = my_alloc (10);
 int *j = (int *)iov->g;
 return __builtin_dynamic_object_size(iov->g, 0);
}

Basically, the problem is relating to the pointee type of the pointer array 
being “void”, 
As a result, the element size of the array is not available in the IR. 
Therefore segmentation
fault when calculating the size of the whole object. 

Although it’s easy to fix this segmentation failure, I am not quite sure what’s 
the best
solution to this issue:

1. Reject such usage of “counted_by” in the very beginning by reporting warning 
to the
User, and delete the counted_by attribute from the field.

Or:

2. Accept such usage, but issue warnings when calculating the object_size in 
Middle-end.

Personally, I prefer the above 1 since I think that when the pointee type is 
void, we don’t know
The type of the element of the pointer array, there is no way to decide the 
size of the pointer array. 

So, the counted_by information is not useful for the 
__builtin_dynamic_object_size.

But I am not sure whether the counted_by still can be used for bound sanitizer?

Thanks for suggestions and help.

Qing




> On Apr 23, 2025, at 15:45, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 2nd version of the patch set to extend "counted_by" attribute
> to pointer fields of structures.
> 
> the first version was submitted 3 months ago on 1/16/2025, and triggered
> a lot of discussion on whether we need a new syntax for counted_by
> attribute.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
> 
> After a long discussion since then: 
> (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
> 
> We agreed to the following compromised solution:
> 
> 1. Keep the current syntax of counted_by for lone identifier;
> 2. Add a new attribute "counted_by_exp" for expressions.
> 
> Although there are still some discussion going on for the new 
> counted_by_exp attribute (In Clang community) 
> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
> 
> The syntax for the lone identifier is kept the same as before.
> 
> So, I'd like to resubmit my previous patch of extending "counted_by"
> to pointer fields of structures. 
> 
> The whole patch set has been rebased on the latest trunk, some testing case
> adjustment,  bootstrapped  and regression tested on both aarch64 and x86.
> 
> There will be a seperate patch set for the new "counted_by_exp" 
> attribute later to cover the expressions cases.
> 
> The following are more details on this patch set:
> 
> For example:
> 
> struct PP {
>  size_t count2;
>  char other1;
>  char *array2 __attribute__ ((counted_by (count2)));
>  int other2;
> } *pp;
> 
> specifies that the "array2" is an array that is pointed by the
> pointer field, and its number of elements is given by the field
> "count2" in the same structure.
> 
> There are the following importand facts about "counted_by" on pointer
> fields compared to the "counted_by" on FAM fields:
> 
> 1. one more new requirement for pointer fields with "counted_by" attribute:
>   pp->array2 and pp->count2 can ONLY be changed by changing the whole 
> structure
>   at the same time.
> 
> 2. the following feature for FAM field with "counted_by" attribute is NOT
>   valid for the pointer field any more:
> 
>" One important feature of the attribute is, a reference to the
> flexible array member field uses the latest value assigned to the
> field that represents the number of the elements before that
> reference.  For example,
> 
>p->count = val1;
>p->array[20] = 0;  // ref1 to p->array
>p->count = val2;
>p->array[30] = 0;  // ref2 to p->array
> 
> in the above, 'ref1' uses 'val1' as the number of the elements in
> 'p->array', and 'ref2' uses 'val2' as the number of elements in
> 'p->array'. "
> 
> This patch set includes 3 parts:
> 
> 1.Extend "counted_by" attribute to pointer fields of structures. 
> 2.Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
>and use it in builtinin-object-size.
> 3.Use the counted_by attribute of pointers in array bound checker.
> 
> In which, the patch 1 and 2 are simple and straightforward, however, the 
> patch 3  
> is a little complicate due to the following reason:
> 
>Current array bound checker only instruments ARRAY_REF, and the INDEX
>information is the 2nd operand of the ARRAY_REF.
> 
>When extending the array bound checker to pointer references with
>counted_by attributes, the hardest part is to get the INDEX of the
>corresponding array ref from the offset computation expression of
>the pointer ref. 
> 
> I do need some careful review on the 3rd part

[PATCH] libstdc++: Constrain formatter for thread:id [PR119918]

2025-04-24 Thread Tomasz Kamiński

This patch add constrains __formatter::__char to _CharT type parameter
of formatter specialization, matching the constrains
of formatting of integer/pointers that are used as native handles.

The dependency on  header, is changed to .
To achieve that, formatting of pointers is extraced from void const*
specialization to internal __formatter_ptr<_CharT>, that can be forward
declared.

Finally, the handle representation is now printed directly to __fc.out(),
by the formatter for handle type. To support this, internal formatters
can now be constructed from _Spec object as alternative to invoking parse
method.

PR libstdc++/119918

libstdc++-v3/ChangeLog:

* include/bits/formatfwd.h (__format::_Align): Moved from std/format.
(std::__throw_format_error, __format::__formatter_str)
(__format::__formatter_ptr): Declare.
* include/std/format (__format::_Align): Moved to bits/formatfwd.h.
(__formatter_int::__formatter_int): Define.
(__format::__formatter_ptr): Extracted from formatter for const void*.
(std::formatter, formatter)
(std::formatter): Delegate to 
__formatter_ptr<_CharT>.
* include/std/thread (std::formatter): Constrain
_CharT template parameter.
(formatter::parse): Specify default aligment, and
qualify __throw_format_error to disable ADL.
(formatter::format): Use formatters to write 
directly
to output.
* testsuite/30_threads/thread/id/output.cc: Tests for formatting 
thread::id
representing not-a-thread with padding and formattable concept.
---
Tested on x86_64-linux. OK for trunk, after GCC 15.1?

 libstdc++-v3/include/bits/formatfwd.h |  19 +-
 libstdc++-v3/include/std/format   | 257 ++
 libstdc++-v3/include/std/thread   |  46 ++--
 .../testsuite/30_threads/thread/id/output.cc  |  30 ++
 4 files changed, 211 insertions(+), 141 deletions(-)

diff --git a/libstdc++-v3/include/bits/formatfwd.h 
b/libstdc++-v3/include/bits/formatfwd.h
index a6b5ac8c8ce..8a84af4d6e4 100644
--- a/libstdc++-v3/include/bits/formatfwd.h
+++ b/libstdc++-v3/include/bits/formatfwd.h
@@ -50,6 +50,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // [format.formatter], formatter
   template struct formatter;
 
+  /// @cond undocumented
+  [[noreturn]]
+  inline void
+  __throw_format_error(const char* __what);
+
 namespace __format
 {
 #ifdef _GLIBCXX_USE_WCHAR_T
@@ -60,8 +65,18 @@ namespace __format
 concept __char = same_as<_CharT, char>;
 #endif
 
-  template<__char _CharT>
-struct __formatter_int;
+  enum _Align {
+_Align_default,
+_Align_left,
+_Align_right,
+_Align_centre,
+  };
+
+  template struct _Spec;
+
+  template<__char _CharT> struct __formatter_str;
+  template<__char _CharT> struct __formatter_int;
+  template<__char _CharT> struct __formatter_ptr;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index e557e104d74..8f47e5acb80 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -485,13 +485,6 @@ namespace __format
 _Pres_esc = 0xf,  // For strings, charT and ranges
   };
 
-  enum _Align {
-_Align_default,
-_Align_left,
-_Align_right,
-_Align_centre,
-  };
-
   enum _Sign {
 _Sign_default,
 _Sign_plus,
@@ -1434,6 +1427,13 @@ namespace __format
   static constexpr _Pres_type _AsBool = _Pres_s;
   static constexpr _Pres_type _AsChar = _Pres_c;
 
+  __formatter_int() = default;
+
+  constexpr
+  __formatter_int(_Spec<_CharT> __spec)
+  : _M_spec(__spec)
+  { }
+
   constexpr typename basic_format_parse_context<_CharT>::iterator
   _M_do_parse(basic_format_parse_context<_CharT>& __pc, _Pres_type __type)
   {
@@ -2381,6 +2381,134 @@ namespace __format
   _Spec<_CharT> _M_spec{};
 };
 
+  template<__format::__char _CharT>
+struct __formatter_ptr
+{
+  __formatter_ptr() = default;
+
+  constexpr
+  __formatter_ptr(_Spec<_CharT> __spec)
+  : _M_spec(__spec)
+  { }
+
+  constexpr typename basic_format_parse_context<_CharT>::iterator
+  parse(basic_format_parse_context<_CharT>& __pc)
+  {
+   __format::_Spec<_CharT> __spec{};
+   const auto __last = __pc.end();
+   auto __first = __pc.begin();
+
+   auto __finalize = [this, &__spec] {
+ _M_spec = __spec;
+   };
+
+   auto __finished = [&] {
+ if (__first == __last || *__first == '}')
+   {
+ __finalize();
+ return true;
+   }
+ return false;
+   };
+
+   if (__finished())
+ return __first;
+
+   __first = __spec._M_parse_fill_and_align(__first, __last);
+   if (__finished())
+ return __first;
+
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// P2510R3 Formatting pointers
+#if __glibcxx_format >= 202304L
+   __first = __spec._M_parse_zero_fill(__firs

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 12:05:06PM +, Kyrylo Tkachov wrote:
> >>> On 24 Apr 2025, at 12:09, Jakub Jelinek  wrote:
> >>> 
> >>> On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
> > I'd have expected instead of the LTO_PARTITION_DEFAULT checks one 
> > should be
> > testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
> > should be the default, but when not specified explicitly, it would 
> > really
> > match that
> > error ("%<-fipa-reorder-for-locality%> is incompatible with"
> > 
> >   
> >" an explicit %qs option", "-flto-partition");   
> > 
> >   
> > error wording (right now -fipa-reorder-for-locality 
> > -flto-partition=default
> > is explicit, yet no error is emittetd).
>  
>  Yes, I think I was confused about the use of opts_set. This is overly 
>  convoluted.
>  So something like this?
> >>> 
> >>> Yes, though I'd also remove the trailing comma after
> >>>  LTO_PARTITION_CACHE = 5
> >> 
> >> Right, ok to commit this to trunk and GCC 15 branch once bootstrap is 
> >> successful?
> > 
> > Yes.
> 
> Thanks, I’ve now pushed the fix to trunk and the branch.

Actually, sorry, I'm afraid the x_flag_ipa_reorder_for_locality case should
be using opts and not opts_set.

Because right now I see
$ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced -fipa-reorder-for-locality 
cc1: error: ‘-fipa-reorder-for-locality’ is incompatible with an explicit 
‘-flto-partition’ option
$ ./xgcc -B ./ -o abc abc.c -fipa-reorder-for-locality 
$ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced 
-fno-ipa-reorder-for-locality 
cc1: error: ‘-fipa-reorder-for-locality’ is incompatible with an explicit 
‘-flto-partition’ option
$ ./xgcc -B ./ -o abc abc.c -fno-ipa-reorder-for-locality 
$ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced
$ ./xgcc -B ./ -o abc abc.c

The third case looks undesirable, -fno-ipa-reorder-for-locality is the
default and shouldn't affect anything, whether explicit or implicit.

Jakub

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 12:39:59PM +, Kyrylo Tkachov wrote:
> > The third case looks undesirable, -fno-ipa-reorder-for-locality is the
> > default and shouldn't affect anything, whether explicit or implicit.
> 
> I see.  With this patch I don’t get a complaint on
> -flto-partition=balanced -fno-ipa-reorder-for-locality while still getting
> the error for  -fipa-reorder-for-locality  with any -flto-partition=
> value.  Thanks, Kyrill

Ok for both trunk and 15 if it passes testing.

Jakub

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 12:18, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 10:15:08AM +, Kyrylo Tkachov wrote:
>> 
>> 
>>> On 24 Apr 2025, at 12:09, Jakub Jelinek  wrote:
>>> 
>>> On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should 
> be
> testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
> should be the default, but when not specified explicitly, it would really
> match that
> error ("%<-fipa-reorder-for-locality%> is incompatible with"  
>   
>   
>" an explicit %qs option", "-flto-partition"); 
>   
>   
> error wording (right now -fipa-reorder-for-locality 
> -flto-partition=default
> is explicit, yet no error is emittetd).
 
 Yes, I think I was confused about the use of opts_set. This is overly 
 convoluted.
 So something like this?
>>> 
>>> Yes, though I'd also remove the trailing comma after
>>>  LTO_PARTITION_CACHE = 5
>> 
>> Right, ok to commit this to trunk and GCC 15 branch once bootstrap is 
>> successful?
> 
> Yes.

Thanks, I’ve now pushed the fix to trunk and the branch.
Kyrill

> 
> Jakub
>

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 14:28, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 12:05:06PM +, Kyrylo Tkachov wrote:
> On 24 Apr 2025, at 12:09, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
>>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one 
>>> should be
>>> testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
>>> should be the default, but when not specified explicitly, it would 
>>> really
>>> match that
>>>error ("%<-fipa-reorder-for-locality%> is incompatible with" 
>>> 
>>>  
>>>   " an explicit %qs option", "-flto-partition");
>>> 
>>>  
>>> error wording (right now -fipa-reorder-for-locality 
>>> -flto-partition=default
>>> is explicit, yet no error is emittetd).
>> 
>> Yes, I think I was confused about the use of opts_set. This is overly 
>> convoluted.
>> So something like this?
> 
> Yes, though I'd also remove the trailing comma after
> LTO_PARTITION_CACHE = 5
 
 Right, ok to commit this to trunk and GCC 15 branch once bootstrap is 
 successful?
>>> 
>>> Yes.
>> 
>> Thanks, I’ve now pushed the fix to trunk and the branch.
> 
> Actually, sorry, I'm afraid the x_flag_ipa_reorder_for_locality case should
> be using opts and not opts_set.
> 
> Because right now I see
> $ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced 
> -fipa-reorder-for-locality 
> cc1: error: ‘-fipa-reorder-for-locality’ is incompatible with an explicit 
> ‘-flto-partition’ option
> $ ./xgcc -B ./ -o abc abc.c -fipa-reorder-for-locality 
> $ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced 
> -fno-ipa-reorder-for-locality 
> cc1: error: ‘-fipa-reorder-for-locality’ is incompatible with an explicit 
> ‘-flto-partition’ option
> $ ./xgcc -B ./ -o abc abc.c -fno-ipa-reorder-for-locality 
> $ ./xgcc -B ./ -o abc abc.c -flto-partition=balanced
> $ ./xgcc -B ./ -o abc abc.c
> 
> The third case looks undesirable, -fno-ipa-reorder-for-locality is the
> default and shouldn't affect anything, whether explicit or implicit.

I see. With this patch I don’t get a complaint on -flto-partition=balanced 
-fno-ipa-reorder-for-locality while still getting the error for  
-fipa-reorder-for-locality  with any -flto-partition= value.
Thanks,
Kyrill

> 
> Jakub
> 



0001-opts.cc-Use-opts-rather-than-opts_set-for-validating.patch
Description: 0001-opts.cc-Use-opts-rather-than-opts_set-for-validating.patch

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 14:44, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 12:39:59PM +, Kyrylo Tkachov wrote:
>>> The third case looks undesirable, -fno-ipa-reorder-for-locality is the
>>> default and shouldn't affect anything, whether explicit or implicit.
>> 
>> I see.  With this patch I don’t get a complaint on
>> -flto-partition=balanced -fno-ipa-reorder-for-locality while still getting
>> the error for  -fipa-reorder-for-locality  with any -flto-partition=
>> value.  Thanks, Kyrill
> 
> Ok for both trunk and 15 if it passes testing.
> 

Thanks again for your help.
Pushed to trunk and 15 after a normal and lto-locality bootstrap on 
aarch64-linux.
Kyrill

> Jakub
>

Re: [PATCH] libstdc++: hashing support for chrono value classes (P2592R2)

2025-04-24 Thread Tomasz Kaminski

Hi,

I am reattaching the original patch below, as I wasn't on the mailing list
when it was sent.
Thank you for submitting the patch and apologies for the late response.

The major comment I have is that these are new C++26 classes, so we can use
requires __is_hash_enabled_for<_Tp> and define only enabled specialization.
In other cases we will be using a primary template that is disabled.

Also, argument_type and result_type typedefs are no longer present since
C++20, so you can remove them, and also remove __hash_base base classes
that were responsible for providing them.

To simplify searching, I have added `## COMMENT` before comments.

-
> This commit implements [time.hash], added by P2592 for C++26.
> The implementation of the various hash specializations is
> mostly straightforward:
>
> * duration hashes its representation (not the period);
> * time_point hashes its duration;
> * the calendaring classes (year, month, day, etc.) hash their
>   values;
> * zoned_time hashes the time zone pointer and its time point.
>
> There are however a couple of challenges:
>
> * The noexcept specifications for hashing duration, time_point,
>   zoned_time are slightly more convoluted than expected, as
>   their getters are noexcept(false) (e.g. calling count() on a
>   duration will copy the representation and that may throw);
>
> * [time.duration] says that "Rep shall be an arithmetic type or a
>   class emulating an arithmetic type". Technically speaking, this
>   means that `const int` is a valid Rep; but we can't use
>   hash.
>
>   I'm not sure if this is deliberate or not (cf. LWG951, LWG953),
>   but I've decided to support it nonetheless.
## COMMENT
We support them for optional, so this is consistent.

> * zoned_time and the calendar classes that combine several
>   parts (e.g. month_day) need a hash combiner. The one
>   available in _Hash_impl works on objects representations,
>   not values, and given the nature of the calendar classes
>   I'm very afraid that I may accidentally be hashing padding
>   bits. Therefore, I've added a helper convenience combiner.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/functional_hash.h: Add a convenience hash
>  combiner.
## COMMENT
The format here is to mention modified entity name in parenthesis:
include/bits/functional_hash.h (_Hash_combiner): Define.
https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs

> * include/bits/version.def: Bump the feature-testing macro.
> * include/bits/version.h: Regenerate.
> * include/std/chrono: Add std::hash specializations for the
>  value classes in namespace chrono.
> * testsuite/std/time/hash.cc: New test.
>
> Signed-off-by: Giuseppe D'Angelo 
> ---
>  libstdc++-v3/include/bits/functional_hash.h |  31 ++
>  libstdc++-v3/include/bits/version.def   |   6 +
>  libstdc++-v3/include/bits/version.h |   7 +-
>  libstdc++-v3/include/std/chrono | 299 
>  libstdc++-v3/testsuite/std/time/hash.cc | 225 +++
>  5 files changed, 567 insertions(+), 1 deletion(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/hash.cc
>
> diff --git a/libstdc++-v3/include/bits/functional_hash.h
b/libstdc++-v3/include/bits/functional_hash.h
> index 3626ebe816b..c0f82601b4b 100644
> --- a/libstdc++-v3/include/bits/functional_hash.h
> +++ b/libstdc++-v3/include/bits/functional_hash.h
> @@ -235,6 +235,37 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return hash(&__val, sizeof(__val), __hash); }
>};
>
> +#if __cplusplus >= 201103L // C++11
> +  // A convenience hash combiner
> +  struct _Hash_combiner
> +  {
> +static void __hash_combine(size_t&) {}
> +
> +template
> +  static void
> +  __hash_combine(size_t& __result,
> + const _Arg& __arg,
> + const _Args&... __args)
> +  {
> + const size_t __arg_hash = hash<_Arg>{}(__arg);
> + __result = _Hash_impl::__hash_combine(__arg_hash, __result);
> + __hash_combine(__result, __args...);
> +  }
> +
> +static size_t __hash()
> +  { return 0; }
## COMMENT
I hope we never call this overload, so I would remove it.
+
> +template
> +  static size_t __hash(const _Arg& __arg, const _Args&... __args)
> +  {
> + const size_t __arg_hash = hash<_Arg>{}(__arg);
> + size_t __result = _Hash_impl::hash(__arg_hash);
## COMMENT
I think you can implement this using fold expression, as:
(_Hash_impl::__hash_combine(hash<_Arg>{}(__args), __result), ...);

> + __hash_combine(__result, __args...);
> + return __result;
> +  }
> +  };
> +#endif // C++11
> +
>/// Specialization for float.
>template<>
>  struct hash : public __hash_base
> diff --git a/libstdc++-v3/include/bits/version.def
b/libstdc++-v3/include/bits/version.def
> index 59b028c44af..2a2bbf54b3f 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -575,6 +575,12 @@  ftms = {
>
>  ftms = {
>name = chrono;
> +  values = {
> +v = 202306;
> +cxxmin

Re: [PATCH] s390, v2: Allow 5+ argument tail-calls in some special cases [PR119873]

2025-04-24 Thread Stefan Schulze Frielinghaus

On Thu, Apr 24, 2025 at 12:49:39PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 24, 2025 at 09:44:45AM +0200, Stefan Schulze Frielinghaus wrote:
> > Yes, every parameter is sign or zero extended if its type is smaller
> > than 64bit.
> 
> > Note, on s390 a parameter is either passed in a register (pair) or via
> > memory, but not partly in a register and memory.
> 
> Ok, so like this then if it passes bootstrap/regtest?

Just for the records: the parallel part is coming from
s390_function_arg() where we have that if a parameter is supposed to be
passed via a register pair then for -mesa we return a REG with DImode
whereas for -mzarch we return a parallel containing two REGs each with
SImode.  That means with this patch

extern void bar (int p1, int p2, int p3, long long p4);
void foo (int p1, int p2, int p3, long long p4)
{
[[gnu::musttail]] return bar (p1, p2, p3, p4);
}

fails to compile for -m31 -mzarch and succeeds for -m31 -mesa due to the
missing handling of the parallel case.  IMHO we can ignore this
difference for the moment and install this patch.  Especially if this
patch is supposed to go into gcc15, then I don't want to hold it back,
i.e., please apply.

Cheers,
Stefan

Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-24 Thread Kyrylo Tkachov



> On 23 Apr 2025, at 13:47, Richard Sandiford  wrote:
> 
> Thanks for all the feedback.  I've tried to address it in the version
> below.  I'll push later today if there are no further comments.
> 
> Richard
> 
> 
> The list is structured as:
> 
> - new configurations
> - command-line changes
> - ACLE changes
> - everything else
> 
> As usual, the list of new architectures, CPUs, and features is from a
> purely mechanical trawl of the associated .def files.  I've identified
> features by their architectural name to try to improve searchability.
> Similarly, the list of ACLE changes includes the associated ACLE
> feature macros, again to try to improve searchability.
> 
> The list summarises some of the target-specific optimisations because
> it sounded like Tamar had received feedback that people found such
> information interesting.
> 
> I've used the passive tense for most entries, to try to follow the
> style used elsewhere.
> 
> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
> separately.

Thanks again for doing this...

> 
> +  Support has been added for the following features of the Arm C
> +Language Extensions
> +(https://github.com/ARM-software/acle";>ACLE):
> +
> +  guarded control stacks
> +  lookup table instructions with 2-bit and 4-bit indices
> +(predefined macro
> +__ARM_FEATURE_LUT, enabled by +lut)
> +  
> +  floating-point absolute minimum and maximum instructions
> +(predefined macro __ARM_FEATURE_FAMINMAX,
> +enabled by +faminmax)
> +  
> +  FP8 conversions (predefined macro
> +__ARM_FEATURE_FP8, enabled by +fp8)
> +  
> +  FP8 2-way dot product to half precision instructions
> +(predefined macro __ARM_FEATURE_FP8DOT2,
> +enabled by +fp8dot2)
> +  
> +  FP8 4-way dot product to single precision instructions
> +(predefined macro __ARM_FEATURE_FP8DOT4,
> +enabled by +fp8dot4)
> +  
> +  FP8 multiply-accumulate to half precision and single precision
> +instructions (predefined macro __ARM_FEATURE_FP8FMA,
> +enabled by +fp8fma)
> +  
> +  SVE FP8 2-way dot product to half precision instructions
> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT2,
> +enabled by +ssve-fp8dot2)
> +  
> +  SVE FP8 4-way dot product to single precision instructions
> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT4,
> +enabled by +ssve-fp8dot4)
> +  
> +  SVE FP8 multiply-accumulate to half precision and single precision
> +instructions (predefined macro 
> __ARM_FEATURE_SSVE_FP8FMA,
> +enabled by +ssve-fp8fma)
> +  

… Should these say “SSVE FP8” rather than “SVE FP8”?
Thanks,
Kyrill

> +  SVE2.1 instructions (predefined macro
> +__ARM_FEATURE_SVE2p1, enabled by +sve2p1)
> +  
> +  SVE non-widening bfloat16 instructions
> +(predefined macro __ARM_FEATURE_SVE_B16B16,
> +enabled by +sve-b16b16)
> +  
> +  SME2.1 instructions (predefined macro
> +__ARM_FEATURE_SME2p1, enabled by +sme2p1)
> +  
> +  SME non-widening bfloat16 instructions
> +(predefined macro __ARM_FEATURE_SME_B16B16,
> +enabled by +sme-b16b16)
> +  
> +  SME half-precision instructions
> +(predefined macro __ARM_FEATURE_SME_F16F16,
> +enabled by +sme-f16f16)
> +  
> +  using C and C++ prefix operators, infix operators, and postfix
> +operators with scalable SVE ACLE types
> +(predefined macro __ARM_FEATURE_SVE_VECTOR_OPERATORS==2,
> +enabled by +sve)
> +  
> +  __fma (in arm_acle.h)
> +  __fmaf (in arm_acle.h)
> +  __chkfeat (in arm_acle.h)
> +
> +  
> +  In addition, the following changes have been made to preexisting
> +ACLE features:
> +
> +  The macros __ARM_FEATURE_BF16 and
> +__ARM_FEATURE_SVE_BF16 are now predefined when the
> +associated support is available.  Previous versions of GCC provided
> +the associated intrinsics but did not predefine the macros.
> +  
> +  OpenMP tasks can now share scalable SVE vectors and predicates.
> +However, offloading of scalable vectors and predicates is not
> +supported.
> +  
> +  ACLE system register functions (such as __arm_rsr
> +and __arm_wsr) no longer try to enforce the minimum
> +architectural requirement.
> +  
> +  A warning is reported if code attempts to use the Function
> +Multi-Versioning feature.  GCC's current implementation of this
> +feature is still experimental and it does not conform to the
> +ACLE specification.
> +  
> +
> +  
> +  Support has been added for the indirect_return
> +function-type attribute, which indicates that a function might return
> +via an indirect branch instead of via a normal return instruction.
> +  
> +  128-bit atomic operations have been extended t

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-24 Thread Robin Dapp

Ah, I see, thanks.  So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 
while vadd.vx costs 1 + 2, making it cheaper?


Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff 
cost here to see
the impact to late-combine.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0 {
+cost_val = get_vector_costs ()->regmove->GR2VR;
+  }

 cut line 

If GR2VR is 2, we will perform the combine as below.

 51 trying to combine definition of r135 in:
 5211: r135:RVVM1DI=vec_duplicate(r150:DI)
 53 into:
 5418: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
 55   REG_DEAD r146:RVVM1DI
 56 successfully matched this instruction to *add_vx_rvvm1di:
 57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
 58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
 59 (reg:RVVM1DI 146)))
 60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 
(weighted: 32.363637); keeping replacement
 61 rescanning insn with uid = 18.
 62 updating insn 18 in-place
 63 verify found no changes in insn with uid = 18.
 64 deleting insn 11
 65 deleting insn with uid = 11.

 cut line 

If GR2VR is 1, we will perform the combine as below.

  51   │ trying to combine definition of r135 in:
  52   │11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │   REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │ (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 
(weighted: 32.363637); keeping replacement
  61   │ rescanning insn with uid = 18.
  62   │ updating insn 18 in-place
  63   │ verify found no changes in insn with uid = 18.
  64   │ deleting insn 11
  65   │ deleting insn with uid = 11.

 cut line 

If GR2VR is 0, it will be normalized to 1 as below, thus the combine log looks 
like the same as above.


IMHO this is how it should roughly look like:

With GR2VR=2:
vadd.vv: cost 4 = COST_N_INSNS (1)
vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12

With GR2VR=1:
vadd.vv: cost 4
vmv.v.x: cost 4
vadd.vx: cost 4 + 4 = 8

With GR2VR=0:
vadd.vv: cost 4
vmv.v.x: cost 4 (or less?)
vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4

So with GR2VR > 0 we would perform the replacement when the frequency is 
similar.  With GR2VR == 0 we should always do.


vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 
would also seem off.


--
Regards
Robin

Re: [PATCH v2] libstdc++: Add lvalue overload for generator::yield_value

2025-04-24 Thread Tomasz Kaminski

On Thu, Apr 24, 2025 at 2:41 PM Jonathan Wakely  wrote:

> This was approved in Wrocław as LWG 3899.
>
> This avoids creating a new coroutine frame to co_yield the elements of
> an lvalue generator.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/generator (generator::yield_value): Add overload
> taking lvalue element_of view, as per LWG 3899.
> * testsuite/24_iterators/range_generators/lwg3899.cc: New test.
> ---
>
> Added a testcase (thanks to Arsen).
>
> Tested x86_64-linux.
>
LGTM

>
>  libstdc++-v3/include/std/generator| 10 
>  .../24_iterators/range_generators/lwg3899.cc  | 57 +++
>  2 files changed, 67 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
>
> diff --git a/libstdc++-v3/include/std/generator
> b/libstdc++-v3/include/std/generator
> index 3f781f1bb29..7ab2c9e7ce9 100644
> --- a/libstdc++-v3/include/std/generator
> +++ b/libstdc++-v3/include/std/generator
> @@ -153,6 +153,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   noexcept
> { return _Recursive_awaiter { std::move(__r.range) }; }
>
> +   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +   // 3899. co_yielding elements of an lvalue generator is
> +   // unnecessarily inefficient
> +   template
> +   requires std::same_as<_Yield2_t<_R2, _V2>, _Yielded>
> +   auto
> +   yield_value(ranges::elements_of&, _U2>
> __r)
> + noexcept
> +   { return _Recursive_awaiter { std::move(__r.range) }; }
> +
> template
> requires convertible_to, _Yielded>
> auto
> diff --git
> a/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
> b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
> new file mode 100644
> index 000..5a812ecf9d9
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3899.cc
> @@ -0,0 +1,57 @@
> +// { dg-do run { target c++23 } }
> +
> +// LWG 3899.
> +// co_yielding elements of an lvalue generator is unnecessarily
> inefficient
> +
> +#include 
> +#include 
> +#include 
> +
> +struct memory_resource : std::pmr::memory_resource
> +{
> +  std::size_t count = 0;
> +
> +  void* do_allocate(std::size_t n, std::size_t a) override
> +  {
> +count += n;
> +return std::pmr::new_delete_resource()->allocate(n, a);
> +  }
> +
> +  void do_deallocate(void* p, std::size_t n, std::size_t a) override
> +  {
> +return std::pmr::new_delete_resource()->deallocate(p, n, a);
> +  }
> +
> +  bool do_is_equal(const std::pmr::memory_resource& mr) const noexcept
> override
> +  { return this == &mr; }
> +};
> +
> +std::pmr::generator
> +f(std::allocator_arg_t, std::pmr::polymorphic_allocator<>, int init)
> +{
> +  co_yield init + 0;
> +  co_yield init + 1;
> +}
> +
> +std::pmr::generator
> +g(std::allocator_arg_t, std::pmr::polymorphic_allocator<> alloc)
> +{
> +  auto gen = f(std::allocator_arg, alloc, 0);
> +  auto gen2 = f(std::allocator_arg, alloc, 2);
> +  co_yield std::ranges::elements_of(std::move(gen), alloc);
> +  co_yield std::ranges::elements_of(gen2, alloc);
> +}
> +
> +int
> +main()
> +{
> +  std::size_t counts[4];
> +  memory_resource mr;
> +  for (auto d : g(std::allocator_arg , &mr))
> +counts[d] = mr.count;
> +  VERIFY(counts[0] != 0);
> +  // No allocations after the first one:
> +  VERIFY(counts[1] == counts[0]);
> +  VERIFY(counts[2] == counts[0]);
> +  VERIFY(counts[3] == counts[0]);
> +}
> --
> 2.49.0
>
>

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Jan Hubicka

> Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> or vpandn.
> Current register_operand/vector_operand could lose some optimization
> opportunity.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   * config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
>   (nonimm_or_0_or_1s_operand): Ditto.
>   * config/i386/sse.md (vcond_mask_):
>   Extend the predicate of operands1 to accept 0 or allones
>   operands.
>   (vcond_mask_): Ditto.
>   (vcond_mask_v1tiv1ti): Ditto.
>   (vcond_mask_): Ditto.
>   * config/i386/i386.md (movcc): Ditto for operands[2] and
>   operands[3].
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/blendv-to-maxmin.c: New test.
>   * gcc.target/i386/blendv-to-pand.c: New test.

> diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> new file mode 100644
> index 000..042eb7d8f24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> +
> +double
> +foo (double a)
> +{
> +  if (a > 0.0)
> +return a;
> +  return 0.0;
> +}

With -ffast-math this is matched as MAX_EXPR at gimple level. Without
-ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
explicitely documented as unspecified when one of parameters is nan.

So without -ffast-math at combine time we see:
(insn 6 3 7 2 (set (reg:DF 103)
(const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
 (nil))
(insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
(unspec:DF [
(reg:DF 104 [ a ])
(reg:DF 103)
] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
 (expr_list:REG_DEAD (reg:DF 104 [ a ])
(expr_list:REG_DEAD (reg:DF 103)
(nil

maxss is defined as:

MAX(SRC1, SRC2)
{
IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
ELSE DEST := SRC2;
FI;
}

which I think translates to
  SRC1 > SRC1 : SRC1 : SRC2

If SRC1 and SRC2 are both 0, this should evaulate to false and return RC2
if one of them is NaN this should evaulate to false and return SRC2

so it seems to do right side cases and has direct RTL equivalent.  So
why we need UNSPEC_IEEE_MAX at all? Expressing this in RTL directly
would enable RTL passes to do better job.
Similarly for BLENDV...

Honza

Re: [PATCH] arc: testsuite: Scan "rlc" instead of "mov.hs".

2025-04-24 Thread Claudiu Zissulescu Ianculescu

Hi Jeff,

There is one patch missing, I'll add it to mainline as soon as the
main is open for commits.

Best,
Claudiu

On Fri, Apr 18, 2025 at 12:10 AM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > Due to the patch by Roger Sayle,
> > 09881218137f4af9b7c894c2d350cf2ff8e0ee23, which
> > introduces the use of the `rlc rX,0` instruction in place
> > of the `mov.hs`, the add overflow test case needs to be
> > updated.  The previous test case was validating the `mov.hs`
> > instruction, but now it must validate the `rlc` instruction
> > as the new behavior.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/overflow-1.c: Replace mov.hs with rlc.
> I don't see any test named "overflow-1.c" in the arc subdirectory?!?
>
> Is it possible that's a change in your local repo?
>
> jeff

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Kees Cook

On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
> 
> > On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
> > 
> > Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
> >> Hi, 
> >> 
> >> Kees reported a segmentation failure when he used the patch to compiler 
> >> kernel, 
> >> and the reduced the testing case is something like the following:
> >> 
> >> struct f {
> >> void *g __attribute__((__counted_by__(h)));
> >> long h;
> >> };
> >> 
> >> extern struct f *my_alloc (int);
> >> 
> >> int i(void) {
> >> struct f *iov = my_alloc (10);
> >> int *j = (int *)iov->g;
> >> return __builtin_dynamic_object_size(iov->g, 0);
> >> }
> >> 
> >> Basically, the problem is relating to the pointee type of the pointer 
> >> array being “void”, 
> >> As a result, the element size of the array is not available in the IR. 
> >> Therefore segmentation
> >> fault when calculating the size of the whole object. 
> >> 
> >> Although it’s easy to fix this segmentation failure, I am not quite sure 
> >> what’s the best
> >> solution to this issue:
> >> 
> >> 1. Reject such usage of “counted_by” in the very beginning by reporting 
> >> warning to the
> >> User, and delete the counted_by attribute from the field.
> >> 
> >> Or:
> >> 
> >> 2. Accept such usage, but issue warnings when calculating the object_size 
> >> in Middle-end.
> >> 
> >> Personally, I prefer the above 1 since I think that when the pointee type 
> >> is void, we don’t know
> >> The type of the element of the pointer array, there is no way to decide 
> >> the size of the pointer array. 
> >> 
> >> So, the counted_by information is not useful for the 
> >> __builtin_dynamic_object_size.
> >> 
> >> But I am not sure whether the counted_by still can be used for bound 
> >> sanitizer?
> >> 
> >> Thanks for suggestions and help.
> > 
> > GNU C allows pointer arithmetic and sizeof on void pointers and
> > that treats void as having size 1.  So you could also allow counted_by
> > and assume as size 1 for void.
> > 
> > https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
> 
> Okay, thanks for the info.
> So, 
> 1. should we issue warnings when doing this?

Please don't, Linux would very much like to track these allocation sizes
still. Performing pointer arithmetic and bounds checking (via __bdos) on
"void *" is wanted (and such a calculation was what tripped the
segfault).

> 2. If the compilation option is explicitly asking for standard C,
> shall we issue warning and delete the counted_by attribute from the field?

I think it needs to stay attached for __bdos. And from the looks of it,
even array access works with 1-byte values too:

extern void *ptr;
void *foo(int num) {
return &ptr[num];
}

The assembly output of this shows it's doing byte addition. Clang
doesn't warn about this, but GCC does:

:5:16: warning: dereferencing 'void *' pointer
5 | return &ptr[num];
  |^

So, I think even the bounds sanitizer should handle it, even if a
warning ultimately gets emitted.

-Kees

-- 
Kees Cook

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Qing Zhao



> On Apr 24, 2025, at 14:31, Kees Cook  wrote:
> 
> On Thu, Apr 24, 2025 at 06:06:03PM +, Qing Zhao wrote:
>> 
>> 
>>> On Apr 24, 2025, at 13:07, Kees Cook  wrote:
>>> 
>>> On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
 
> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
>> Hi, 
>> 
>> Kees reported a segmentation failure when he used the patch to compiler 
>> kernel, 
>> and the reduced the testing case is something like the following:
>> 
>> struct f {
>> void *g __attribute__((__counted_by__(h)));
>> long h;
>> };
>> 
>> extern struct f *my_alloc (int);
>> 
>> int i(void) {
>> struct f *iov = my_alloc (10);
>> int *j = (int *)iov->g;
>> return __builtin_dynamic_object_size(iov->g, 0);
>> }
>> 
>> Basically, the problem is relating to the pointee type of the pointer 
>> array being “void”, 
>> As a result, the element size of the array is not available in the IR. 
>> Therefore segmentation
>> fault when calculating the size of the whole object. 
>> 
>> Although it’s easy to fix this segmentation failure, I am not quite sure 
>> what’s the best
>> solution to this issue:
>> 
>> 1. Reject such usage of “counted_by” in the very beginning by reporting 
>> warning to the
>> User, and delete the counted_by attribute from the field.
>> 
>> Or:
>> 
>> 2. Accept such usage, but issue warnings when calculating the 
>> object_size in Middle-end.
>> 
>> Personally, I prefer the above 1 since I think that when the pointee 
>> type is void, we don’t know
>> The type of the element of the pointer array, there is no way to decide 
>> the size of the pointer array. 
>> 
>> So, the counted_by information is not useful for the 
>> __builtin_dynamic_object_size.
>> 
>> But I am not sure whether the counted_by still can be used for bound 
>> sanitizer?
>> 
>> Thanks for suggestions and help.
> 
> GNU C allows pointer arithmetic and sizeof on void pointers and
> that treats void as having size 1.  So you could also allow counted_by
> and assume as size 1 for void.
> 
> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
 
 Okay, thanks for the info.
 So, 
 1. should we issue warnings when doing this?
>>> 
>>> Please don't, Linux would very much like to track these allocation sizes
>>> still. Performing pointer arithmetic and bounds checking (via __bdos) on
>>> "void *" is wanted (and such a calculation was what tripped the
>>> segfault).
>> 
>> My previous question was: -:)
>> 
>> When we accept the “void” pointee type and provide 
>> __builtin_dynamic_object_size for 
>> such pointers (treating it as 1 byte) shall we issue a warning to users to 
>> warn them that the void pointee type is
>> Accepted and treated as size 1? 
>> 
>> Or just silently handle such case as normal?
> 
> I think it should be silently handled. Other such "void is 1 byte" cases
> don't warn.

Okay, that makes sense.
Then I will handle it by default without warning.

> 
 2. If the compilation option is explicitly asking for standard C,
   shall we issue warning and delete the counted_by attribute from the 
 field?
>>> 
>>> I think it needs to stay attached for __bdos. And from the looks of it,
>>> even array access works with 1-byte values too:
>>> 
>>> extern void *ptr;
>>> void *foo(int num) {
>>>   return &ptr[num];
>>> }
>>> 
>>> The assembly output of this shows it's doing byte addition. Clang
>>> doesn't warn about this, but GCC does:
>>> 
>>> :5:16: warning: dereferencing 'void *' pointer
>>>   5 | return &ptr[num];
>>> |^
>>> 
>>> So, I think even the bounds sanitizer should handle it, even if a
>>> warning ultimately gets emitted.
>> 
>> Okay. I will also handle the void in bounds sanitizer by treating element 
>> size as 1 byte.
> 
> Great, that should work well for Linux, and is, I think, the least
> surprising result. :)
> 
>> 
>> My previous question was:
>> 
>> Since this is only a GNU extension, I am wondering under the situation that 
>> No GNU extension
>> Is allowed, shall we issue warnings and delete the counted_by attribute?
> 
> Oh, like, generally? What GCC option disables GNU extensions?

Actually, I am not sure either on what GCC option disables GNU extensions.  -:)

Just a natural question.  I tried to add -std=c99 or something, but I didn’t 
see behavior change with the testing case.

Qing
> 
> -Kees
> 
> -- 
> Kees Cook

Re: [PATCH] Introduce -flto-partition=locality

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 09:17, Feng Xue OS  wrote:
> 
>> validate_ipa_reorder_locality_lto_partition (opts, opts_set);
> 
> I know this patch has already been merged into the trunk. But I think the 
> below piece of code change in opts.cc is questionable, it would completely 
> override any user-specified partition model, suppose that user wants a 
> traditional all-in-one lto compilation like "-flto-partition=none", without 
> "-fipa-reorder-for-locality".
> 
>> if (opts_set->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
>>  opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = 
>> LTO_PARTITION_BALANCED;
> 

Hmm, yes I think the condition should be == instead of !=. I’ll test a patch 
momentarily.
Thanks,
Kyrill

> Regards,
> Feng
> 
> From: Kyrylo Tkachov 
> Sent: Saturday, November 16, 2024 1:04 AM
> To: GCC Patches
> Cc: Jan Hubicka; Martin Jambor; Richard Biener
> Subject: [PATCH] Introduce -flto-partition=locality
> 
> Hi all,
> 
> This is a patch submission following-up from the RFC at:
> https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
> The patch is rebased and retested against current trunk, some debugging code
> removed, comments improved and some fixes added as I've we've done more
> testing.
> 
> >8-
> Implement partitioning and cloning in the callgraph to help locality.
> A new -flto-partition=locality flag is used to enable this.
> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
> The optimization has two components:
> * Partitioning the callgraph so as to group callers and callees that 
> frequently
> call each other in the same partition
> * Cloning functions that straddle multiple callchains and allowing each clone
> to be local to the partition of its callchain.
> 
> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
> It creates a partitioning plan and does the prerequisite cloning.
> The partitioning is then implemented during the existing LTO partitioning 
> pass.
> 
> To guide these locality heuristics we use PGO data.
> In the absence of PGO data we use a static heuristic that uses the accumulated
> estimated edge frequencies of the callees for each function to guide the
> reordering.
> We are investigating some more elaborate static heuristics, in particular 
> using
> the demangled C++ names to group template instantiatios together.
> This is promising but we are working out some kinks in the implementation
> currently and want to send that out as a follow-up once we're more confident
> in it.
> 
> A new bootstrap-lto-locality bootstrap config is added that allows us to test
> this on GCC itself with either static or PGO heuristics.
> GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
> 
> With this optimization we are seeing good performance gains on some large
> internal workloads that stress the parts of the processor that is sensitive
> to code locality, but we'd appreciate wider performance evaluation.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
> Thanks,
> Kyrill
> 
> Signed-off-by: Prachi Godbole 
> Co-authored-by: Kyrylo Tkachov 
> 
>config/ChangeLog:
> * bootstrap-lto-locality.mk: New file.
> 
> gcc/ChangeLog:
>* Makefile.in (OBJS): Add ipa-locality-cloning.o
>(GTFILES): Add ipa-locality-cloning.cc dependency.
>* common.opt (lto_partition_model): Add locality value.
>* flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY 
> value.
>(enum lto_locality_cloning_model): Define.
>* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping 
> of node
>and index.
>* params.opt (lto_locality_cloning_model): New enum.
>(lto-partition-locality-cloning): New param.
>(lto-partition-locality-frequency-cutoff): Likewise.
>(lto-partition-locality-size-cutoff): Likewise.
>(lto-max-locality-partition): Likewise.
>* passes.def: Add pass_ipa_locality_cloning.
>* timevar.def (TV_IPA_LC): New timevar.
>* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
>* ipa-locality-cloning.cc: New file.
>* ipa-locality-cloning.h: New file.
> 
>  gcc/lto/ChangeLog:
> * lto-partition.cc: Include ipa-locality-cloning.h
>(add_node_references_to_partition): Define.
>(create_partition): Likewise.
>(lto_locality_map): Likewise.
>(lto_promote_cross_file_statics): Add extra dumping.
>* lto-partition.h (lto_locality_map): Declare.
>* lto.cc (do_whole_program_analysis): Handle 
> LTO_PARTITION_LOCALITY.
>

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 09:10:37AM +, Kyrylo Tkachov wrote:
> [resending to the list, I think there might have been an error with the 
> previous message]
> 
> Hi all,
> 
> This is a thinko in the logic for handling the default -flto-partition=
> arguments. We should override it to balanced only if it stayed as default
> up to that point. We should also be testing opts instead of opts_set here.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 
> 15 branch as well.
> Thanks,
> Kyrill
> 
> Signed-off-by: Kyrylo Tkachov 
> 
> gcc/
> 
>   * opts.cc (finish_options): Check for == against LTO_PARTITION_DEFAULT
>   before setting LTO_PARTITION_BALANCED.

I think
 opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = 
LTO_PARTITION_BALANCED; 
   
is also wrong.
opts_set should have bool in whatever type the field has, so I think it
should be
{
  opts->x_flag_lto_partition = LTO_PARTITION_BALANCED;
  opts_set->x_flag_lto_partition = true;
}
(or maybe just opts->x_flag_lto_partition = LTO_PARTITION_BALANCED; alone?).
LTO_PARTITION_BALANCED is 2...

Jakub

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 11:27:36AM +0200, Jakub Jelinek wrote:
> > This is a thinko in the logic for handling the default -flto-partition=
> > arguments. We should override it to balanced only if it stayed as default
> > up to that point. We should also be testing opts instead of opts_set here.
> > 
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> > 
> > Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 
> > 15 branch as well.
> > Thanks,
> > Kyrill
> > 
> > Signed-off-by: Kyrylo Tkachov 
> > 
> > gcc/
> > 
> > * opts.cc (finish_options): Check for == against LTO_PARTITION_DEFAULT
> > before setting LTO_PARTITION_BALANCED.
> 
> I think
>  opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = 
> LTO_PARTITION_BALANCED;   
>  
> is also wrong.
> opts_set should have bool in whatever type the field has, so I think it
> should be
> {
>   opts->x_flag_lto_partition = LTO_PARTITION_BALANCED;
>   opts_set->x_flag_lto_partition = true;
> }
> (or maybe just opts->x_flag_lto_partition = LTO_PARTITION_BALANCED; alone?).
> LTO_PARTITION_BALANCED is 2...

In fact, I wonder why LTO_PARTITION_DEFAULT and -flto-partition=default has
been added at all (and it isn't documented).
I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be
testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
should be the default, but when not specified explicitly, it would really
match that
   error ("%<-fipa-reorder-for-locality%> is incompatible with" 

  " an explicit %qs option", "-flto-partition");

error wording (right now -fipa-reorder-for-locality -flto-partition=default
is explicit, yet no error is emittetd).

Jakub

Re: [PATCH] GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]

2025-04-24 Thread Andrew Stubbs


On 23/04/2025 20:49, Thomas Schwinge wrote:

'__dso_handle' for '__cxa_atexit', '__cxa_finalize'.  See
.

PR target/119853
PR target/119854
libgcc/
* config/gcn/crt0.c (_fini_array): Call
'__GCC_offload___cxa_finalize'.
* config/nvptx/gbl-ctors.c (__static_do_global_dtors): Likewise.
libgomp/
* target-cxa-dso-dtor.c: New.
* config/accel/target-cxa-dso-dtor.c: Likewise.
* Makefile.am (libgomp_la_SOURCES): Add it.
* Makefile.in: Regenerate.
* testsuite/libgomp.c++/target-cdtor-1.C: New.
* testsuite/libgomp.c++/target-cdtor-2.C: Likewise.
---
  libgcc/config/gcn/crt0.c  |  32 
  libgcc/config/nvptx/gbl-ctors.c   |  16 ++
  libgomp/Makefile.am   |   2 +-
  libgomp/Makefile.in   |   7 +-
  libgomp/config/accel/target-cxa-dso-dtor.c|  62 
  libgomp/target-cxa-dso-dtor.c |   3 +
  .../testsuite/libgomp.c++/target-cdtor-1.C| 104 +
  .../testsuite/libgomp.c++/target-cdtor-2.C| 138 ++
  8 files changed, 361 insertions(+), 3 deletions(-)
  create mode 100644 libgomp/config/accel/target-cxa-dso-dtor.c
  create mode 100644 libgomp/target-cxa-dso-dtor.c
  create mode 100644 libgomp/testsuite/libgomp.c++/target-cdtor-1.C
  create mode 100644 libgomp/testsuite/libgomp.c++/target-cdtor-2.C

diff --git a/libgcc/config/gcn/crt0.c b/libgcc/config/gcn/crt0.c
index dbd6749a47f..cc23e214cf9 100644
--- a/libgcc/config/gcn/crt0.c
+++ b/libgcc/config/gcn/crt0.c
@@ -24,6 +24,28 @@ typedef long long size_t;
  /* Provide an entry point symbol to silence a linker warning.  */
  void _start() {}
  
+

+#define PR119369_fixed 0
+
+
+/* Host/device compatibility: '__cxa_finalize'.  Dummy; if necessary,
+   overridden via libgomp 'target-cxa-dso-dtor.c'.  */
+
+#if PR119369_fixed
+extern void __GCC_offload___cxa_finalize (void *) __attribute__((weak));
+#else
+void __GCC_offload___cxa_finalize (void *) __attribute__((weak));
+
+void __attribute__((weak))
+__GCC_offload___cxa_finalize (void *dso_handle __attribute__((unused)))
+{
+}
+#endif
+
+/* There are no DSOs; this is the main program.  */
+static void * const __dso_handle = 0;
+
+
  #ifdef USE_NEWLIB_INITFINI
  
  extern void __libc_init_array (void) __attribute__((weak));

@@ -38,6 +60,11 @@ void _init_array()
  __attribute__((amdgpu_hsa_kernel ()))
  void _fini_array()
  {
+#if PR119369_fixed
+  if (__GCC_offload___cxa_finalize)
+#endif
+__GCC_offload___cxa_finalize (__dso_handle);
+
__libc_fini_array ();
  }
  
@@ -70,6 +97,11 @@ void _init_array()

  __attribute__((amdgpu_hsa_kernel ()))
  void _fini_array()
  {
+#if PR119369_fixed
+  if (__GCC_offload___cxa_finalize)
+#endif
+__GCC_offload___cxa_finalize (__dso_handle);
+
size_t count;
size_t i;
  



The GCN part seems OK, albeit the conditionals are confusing.

Andrew

[PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov

[resending to the list, I think there might have been an error with the 
previous message]

Hi all,

This is a thinko in the logic for handling the default -flto-partition=
arguments. We should override it to balanced only if it stayed as default
up to that point. We should also be testing opts instead of opts_set here.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 15 
branch as well.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

gcc/

* opts.cc (finish_options): Check for == against LTO_PARTITION_DEFAULT
before setting LTO_PARTITION_BALANCED.



0001-opts.cc-Fix-thinko-with-default-handling-of-flto-par.patch
Description: 0001-opts.cc-Fix-thinko-with-default-handling-of-flto-par.patch

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 12:09, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
>>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be
>>> testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
>>> should be the default, but when not specified explicitly, it would really
>>> match that
>>>  error ("%<-fipa-reorder-for-locality%> is incompatible with"   
>>> 
>>>
>>> " an explicit %qs option", "-flto-partition");  
>>> 
>>>
>>> error wording (right now -fipa-reorder-for-locality -flto-partition=default
>>> is explicit, yet no error is emittetd).
>> 
>> Yes, I think I was confused about the use of opts_set. This is overly 
>> convoluted.
>> So something like this?
> 
> Yes, though I'd also remove the trailing comma after
>   LTO_PARTITION_CACHE = 5

Right, ok to commit this to trunk and GCC 15 branch once bootstrap is 
successful?
Thanks,
Kyrill


> 
>> Checked manually that the required error is given, the default selected 
>> partitioning is balanced, and an explicit -flto-partition=max selects it 
>> properly.
>> Running a full bootstrap cycle now.
> 
> Jakub
> 



0001-opts.cc-Simplify-handling-of-explicit-flto-partition.patch
Description: 0001-opts.cc-Simplify-handling-of-explicit-flto-partition.patch

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
> > I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be
> > testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
> > should be the default, but when not specified explicitly, it would really
> > match that
> >   error ("%<-fipa-reorder-for-locality%> is incompatible with"  
> > 
> > 
> >  " an explicit %qs option", "-flto-partition"); 
> > 
> > 
> > error wording (right now -fipa-reorder-for-locality -flto-partition=default
> > is explicit, yet no error is emittetd).
> 
> Yes, I think I was confused about the use of opts_set. This is overly 
> convoluted.
> So something like this?

Yes, though I'd also remove the trailing comma after
   LTO_PARTITION_CACHE = 5

> Checked manually that the required error is given, the default selected 
> partitioning is balanced, and an explicit -flto-partition=max selects it 
> properly.
> Running a full bootstrap cycle now.

Jakub

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 10:15:08AM +, Kyrylo Tkachov wrote:
> 
> 
> > On 24 Apr 2025, at 12:09, Jakub Jelinek  wrote:
> > 
> > On Thu, Apr 24, 2025 at 09:54:09AM +, Kyrylo Tkachov wrote:
> >>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should 
> >>> be
> >>> testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
> >>> should be the default, but when not specified explicitly, it would really
> >>> match that
> >>>  error ("%<-fipa-reorder-for-locality%> is incompatible with" 
> >>>   
> >>>
> >>> " an explicit %qs option", "-flto-partition");
> >>>   
> >>>
> >>> error wording (right now -fipa-reorder-for-locality 
> >>> -flto-partition=default
> >>> is explicit, yet no error is emittetd).
> >> 
> >> Yes, I think I was confused about the use of opts_set. This is overly 
> >> convoluted.
> >> So something like this?
> > 
> > Yes, though I'd also remove the trailing comma after
> >   LTO_PARTITION_CACHE = 5
> 
> Right, ok to commit this to trunk and GCC 15 branch once bootstrap is 
> successful?

Yes.

Jakub

Re: [PATCH] s390: Allow 5+ argument tail-calls in some special cases [PR119873]

2025-04-24 Thread Stefan Schulze Frielinghaus

On Wed, Apr 23, 2025 at 04:46:17PM +0200, Jakub Jelinek wrote:
[...]
> > > It won't really work at -O0 but should work for -O1 and above, at least 
> > > when
> > > one doesn't really try to modify the parameter conditionally and hope it 
> > > will
> > > be optimized away in the end.
> > 
> > It also fails for
> > 
> > extern int bar (int p1, int p2, int p3, int p4, int p5);
> > int foo (int p1, int p2, int p3, int p4, int p5)
> > {
> >   [[gnu::musttail]] return bar (p1, p2, p3, p4, p5);
> > }
> > 
> > since rtx_equal_p (parm, parm_rtx) does not hold for p5
> > 
> > (gdb) call debug_rtx(parm_rtx)
> > (reg:SI 6 %r6)
> > (gdb) call debug_rtx(parm)
> > (reg:DI 6 %r6 [ p5+-4 ])
> > 
> > due to a missmatch between extended and non-extended values.  Maybe a
> > check like
> > 
> > REGNO (parm) == REGNO (parm_rtx)
> > && REG_NREGS (parm) == REG_NREGS (parm_rtx)
> > 
> > is sufficient?
> 
> I think it depends the details of the ABI.
> I believe when the argument is SSA_NAME (D) of a PARM_DECL it means
> the parameter and the argument have (GIMPLE) compatible types.
> Are the upper 32 bits always sign extended (or zero extended, or
> sign vs. zero extended depending on if it is signed or unsigned < 64bit
> type)?

Yes, every parameter is sign or zero extended if its type is smaller
than 64bit.

> If it must be extended, then yes, testing REGNO and REG_NREGS might be
> enough, if the upper bits are unspecified, I'd worry we could sometimes
> try to change those bits and break the caller that way.
> If {un,}signed {char,short,int} are always zero/sign extended to 64-bit,
> what about BITINT_TYPEs?

The _BitInt ABI isn't fixed yet, however, we plan to zero/sign extend
excess bits.  In particular for _BitInt(N) with N <= 64, the argument is
sign/zero extended to a 64bit value, if passed via a register.

> Another question is if it shouldn't be handled in the PARALLEL case as well
> (e.g. if some -m32 long long parameter could be partly passed in %r6 and
> partly on the stack).

Yea, that is something I still try to figure out.  The PARALLEL handling
was introduced by r0-99328-g9602b6a1b8de60.  I guess a PARALLEL is used
exclusively in case a register pair is required.  Though, I'm not sure
whether this is correct or not.  There doesn't seem to exist a test for
this.  I will look into this.

Note, on s390 a parameter is either passed in a register (pair) or via
memory, but not partly in a register and memory.

Cheers,
Stefan

Re: [PATCH] libstdc++: centralize and improve testing of shared_ptr/weak_ptr conversions

2025-04-24 Thread Jonathan Wakely

On Sat, 15 Mar 2025 at 20:02, Giuseppe D'Angelo wrote:
>
> Hi,
>
> The attached patch is a cleanup and improvement of a test that I've
> added in r15-8048-gdf0e6509bf7442. Since the test is identical for
> shared_ptr and weak_ptr, I've centralized it to reduce future
> maintenance, and extended it to cover more cases.

OK for trunk, thanks.

Re: [PATCH] Introduce -flto-partition=locality

2025-04-24 Thread Feng Xue OS

> validate_ipa_reorder_locality_lto_partition (opts, opts_set);

I know this patch has already been merged into the trunk. But I think the below 
piece of code change in opts.cc is questionable, it would completely override 
any user-specified partition model, suppose that user wants a traditional 
all-in-one lto compilation like "-flto-partition=none", without 
"-fipa-reorder-for-locality".

> if (opts_set->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
>   opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = 
> LTO_PARTITION_BALANCED;

Regards,
Feng

From: Kyrylo Tkachov 
Sent: Saturday, November 16, 2024 1:04 AM
To: GCC Patches
Cc: Jan Hubicka; Martin Jambor; Richard Biener
Subject: [PATCH] Introduce -flto-partition=locality

Hi all,

This is a patch submission following-up from the RFC at:
https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
The patch is rebased and retested against current trunk, some debugging code
removed, comments improved and some fixes added as I've we've done more
testing.

>8-
Implement partitioning and cloning in the callgraph to help locality.
A new -flto-partition=locality flag is used to enable this.
The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that frequently
call each other in the same partition
* Cloning functions that straddle multiple callchains and allowing each clone
to be local to the partition of its callchain.

The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.

To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiatios together.
This is promising but we are working out some kinks in the implementation
currently and want to send that out as a follow-up once we're more confident
in it.

A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).

With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that is sensitive
to code locality, but we'd appreciate wider performance evaluation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Prachi Godbole 
Co-authored-by: Kyrylo Tkachov 

config/ChangeLog:
 * bootstrap-lto-locality.mk: New file.

 gcc/ChangeLog:
* Makefile.in (OBJS): Add ipa-locality-cloning.o
(GTFILES): Add ipa-locality-cloning.cc dependency.
* common.opt (lto_partition_model): Add locality value.
* flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY 
value.
(enum lto_locality_cloning_model): Define.
* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping 
of node
and index.
* params.opt (lto_locality_cloning_model): New enum.
(lto-partition-locality-cloning): New param.
(lto-partition-locality-frequency-cutoff): Likewise.
(lto-partition-locality-size-cutoff): Likewise.
(lto-max-locality-partition): Likewise.
* passes.def: Add pass_ipa_locality_cloning.
* timevar.def (TV_IPA_LC): New timevar.
* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
* ipa-locality-cloning.cc: New file.
* ipa-locality-cloning.h: New file.

  gcc/lto/ChangeLog:
 * lto-partition.cc: Include ipa-locality-cloning.h
(add_node_references_to_partition): Define.
(create_partition): Likewise.
(lto_locality_map): Likewise.
(lto_promote_cross_file_statics): Add extra dumping.
* lto-partition.h (lto_locality_map): Declare.
* lto.cc (do_whole_program_analysis): Handle LTO_PARTITION_LOCALITY.

[PATCH] Fix size_t in id-15.c and infoleak-net-ethtool-ioctl.c for llp64

2025-04-24 Thread Jonathan Yong


Attached patch OK for master branch?
Will push soon if there are no objections.

gcc/testsuite/ChangeLog:

* gcc.dg/graphite/id-15.c: Use __SIZE_TYPE__ instead of
  unsigned long.
* gcc.dg/plugin/infoleak-net-ethtool-ioctl.c: ditto.From 7b1176589124eb25bf23ec0c05faa947aaabfdbf Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Thu, 24 Apr 2025 07:42:17 +
Subject: [PATCH] Fix size_t in id-15.c and infoleak-net-ethtool-ioctl.c for
 llp64

Use __SIZE_TYPE__ for size_t types so that it works for
llp64.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/id-15.c: Use __SIZE_TYPE__ instead of
	  unsigned long.
	* gcc.dg/plugin/infoleak-net-ethtool-ioctl.c: ditto.
---
 gcc/testsuite/gcc.dg/graphite/id-15.c| 2 +-
 gcc/testsuite/gcc.dg/plugin/infoleak-net-ethtool-ioctl.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/id-15.c b/gcc/testsuite/gcc.dg/graphite/id-15.c
index d0a804c876b..d258ef57688 100644
--- a/gcc/testsuite/gcc.dg/graphite/id-15.c
+++ b/gcc/testsuite/gcc.dg/graphite/id-15.c
@@ -1,7 +1,7 @@
 /* { dg-additional-options "-Wno-old-style-definition" } */
 /* { dg-require-effective-target int32plus } */
 
-typedef long unsigned int size_t;
+typedef __SIZE_TYPE__ size_t;
 extern void *memset (void *__s, int __c, size_t __n) __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
 
 static void
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-net-ethtool-ioctl.c b/gcc/testsuite/gcc.dg/plugin/infoleak-net-ethtool-ioctl.c
index 52846c40f86..afb4a5714e9 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-net-ethtool-ioctl.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-net-ethtool-ioctl.c
@@ -11,8 +11,7 @@ typedef unsigned int __u32;
 typedef __s8 s8;
 typedef __u32 u32;
 enum { false = 0, true = 1 };
-typedef unsigned long __kernel_ulong_t;
-typedef __kernel_ulong_t __kernel_size_t;
+typedef __SIZE_TYPE__ __kernel_size_t;
 typedef _Bool bool;
 typedef __kernel_size_t size_t;
 
-- 
2.49.0

Re: [PATCH] Add std::deque shrink_to_fit test

2025-04-24 Thread Jonathan Wakely

On Wed, 23 Apr 2025 at 21:10, François Dumont  wrote:

> AFAICT I've never got proper validation for this small patch.
>
> Is it ok to commit ?
>

Yes, OK for trunk, thanks.



> Thanks
>
>
> On 14/04/2025 22:25, François Dumont wrote:
>
>
> On 14/04/2025 08:29, Tomasz Kaminski wrote:
>
>
>
> On Sun, Apr 13, 2025 at 12:13 PM François Dumont 
> wrote:
>
>>
>> On 11/04/2025 08:36, Tomasz Kaminski wrote:
>>
>>
>>
>> On Thu, Apr 10, 2025 at 10:47 PM Jonathan Wakely 
>> wrote:
>>
>>> On 10/04/25 22:36 +0200, François Dumont wrote:
>>> >After running the test with -fno-exceptions option we rather need this
>>> >patch.
>>> >
>>> >Ok to commit ?
>>> >
>>> >François
>>> >
>>> >
>>> >On 10/04/2025 21:08, François Dumont wrote:
>>> >>Hi
>>> >>
>>> >>No problem detected now that we really test std::deque
>>> >>shrink_to_fit implementation.
>>> >>
>>> >>libstdc++: Add std::deque<>::shrink_to_fit test
>>> >>
>>> >>The existing test is currently testing std::vector. Make it test
>>> >>std::deque.
>>> >>
>>> >>libstdc++-v3/ChangeLog:
>>> >>
>>> >>*
>>> >>testsuite/23_containers/deque/capacity/shrink_to_fit.cc: Adapt test
>>> >>to check std::deque shrink_to_fit method.
>>> >>
>>> >>Tested under Linux x64.
>>> >>
>>> >>Ok to commit ?
>>> >>
>>> >>François
>>>
>>> >diff --git
>>> a/libstdc++-v3/testsuite/23_containers/deque/capacity/shrink_to_fit.cc
>>> b/libstdc++-v3/testsuite/23_containers/deque/capacity/shrink_to_fit.cc
>>> >index 7cb67079214..9c8b3a926e8 100644
>>> >---
>>> a/libstdc++-v3/testsuite/23_containers/deque/capacity/shrink_to_fit.cc
>>> >+++
>>> b/libstdc++-v3/testsuite/23_containers/deque/capacity/shrink_to_fit.cc
>>> >@@ -1,4 +1,5 @@
>>> > // { dg-do run { target c++11 } }
>>> >+// { dg-add-options no_pch }
>>>
>>> Tests using replacement_memory_operators.h need:
>>>
>>> // { dg-require-effective-target std_allocator_new }
>>> // { dg-xfail-run-if "AIX operator new" { powerpc-ibm-aix* } }
>>>
>>> See e.g. 23_containers/unordered_set/96088.cc
>>>
>> Thanks, I new I needed to add something like that but then forgot to
>> amend my test.
>>
>>
>>
>>> >
>>> > // 2010-01-08  Paolo Carlini  
>>> >
>>> >@@ -19,18 +20,39 @@
>>> > // with this library; see the file COPYING3.  If not see
>>> > // .
>>> >
>>> >-#include 
>>> >+#define _GLIBCXX_DEQUE_BUF_SIZE sizeof(int) * 3
>>>
>>> Couldn't the test just create more elements, instead of modifying the
>>> internals? We should test it using the default parameters, no?
>>>
>> Sounds better indeed, done using std::__deque_buf_size extension.
>>
>>
>> In from_range test I have used a class that contains some padding,
>> to be able to fill the deque buffer, while still test with few elements,
>>
>> struct EightInBuf
>>
>> {
>>   EightInBuf(int x) : elems{x}
>>   { }
>>
>>  private:
>>int elems[512 / (sizeof(int) * 8)];
>>
>>   friend constexpr bool operator==(EightInBuf const& lhs, int rhs)
>>   { return lhs.elems[0] == rhs; }
>>  };
>>
>>
>>
>>
>> It is a nice alternative even if you are still relying on the 512
>> implementation detail hidden by _GLIBCXX_DEQUE_BUF_SIZE macro.
>>
>> This is why I preferred to use__deque_buf_size.
>>
> When using _GLIBCXX_DEBUG this function will be defined inside
> std::__cxx1998 namespace, not in the std directly.
> There is a macro _GLIBCXX_STD_C that can be used to refer to it, but it
> works only inside std namespace.
>
> Fixed in this new version.
>
> Ok to commit ?
>
> François
>
>

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov



> On 24 Apr 2025, at 11:34, Jakub Jelinek  wrote:
> 
> On Thu, Apr 24, 2025 at 11:27:36AM +0200, Jakub Jelinek wrote:
>>> This is a thinko in the logic for handling the default -flto-partition=
>>> arguments. We should override it to balanced only if it stayed as default
>>> up to that point. We should also be testing opts instead of opts_set here.
>>> 
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> 
>>> Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 
>>> 15 branch as well.
>>> Thanks,
>>> Kyrill
>>> 
>>> Signed-off-by: Kyrylo Tkachov 
>>> 
>>> gcc/
>>> 
>>> * opts.cc (finish_options): Check for == against LTO_PARTITION_DEFAULT
>>> before setting LTO_PARTITION_BALANCED.
>> 
>> I think
>> opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = 
>> LTO_PARTITION_BALANCED;  
>>   
>> is also wrong.
>> opts_set should have bool in whatever type the field has, so I think it
>> should be
>>{
>>  opts->x_flag_lto_partition = LTO_PARTITION_BALANCED;
>>  opts_set->x_flag_lto_partition = true;
>>}
>> (or maybe just opts->x_flag_lto_partition = LTO_PARTITION_BALANCED; alone?).
>> LTO_PARTITION_BALANCED is 2...
> 
> In fact, I wonder why LTO_PARTITION_DEFAULT and -flto-partition=default has
> been added at all (and it isn't documented).

It was not intended to be exposed to users, it was used as a mechanism to check 
if flto-partition is explicitly set.

> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be
> testing !opts_set->x_flag_lto_partition (i.e. -flto-partition=balanced
> should be the default, but when not specified explicitly, it would really
> match that
>   error ("%<-fipa-reorder-for-locality%> is incompatible with"
>   
> 
>  " an explicit %qs option", "-flto-partition");   
>   
> 
> error wording (right now -fipa-reorder-for-locality -flto-partition=default
> is explicit, yet no error is emittetd).

Yes, I think I was confused about the use of opts_set. This is overly 
convoluted.
So something like this?
Checked manually that the required error is given, the default selected 
partitioning is balanced, and an explicit -flto-partition=max selects it 
properly.
Running a full bootstrap cycle now.

Thanks,
Kyrill



0001-opts.cc-Simplify-handling-of-explicit-flto-partition.patch
Description: 0001-opts.cc-Simplify-handling-of-explicit-flto-partition.patch

[PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-04-24 Thread Jakub Jelinek

Hi!

The -Wunused-but-set-* warnings work by using 2 bits on VAR_DECLs &
PARM_DECLs, TREE_USED and DECL_READ_P.  If neither is set, we typically
emit -Wunused-variable or -Wunused-parameter warning, that is for variables
which are just declared (including initializer) and completely unused.
If TREE_USED is set and DECL_READ_P is unset, -Wunused-but-set-* warnings
are emitted, i.e. for variables which can appear on the lhs of an assignment
expression but aren't actually used elsewhere.  The DECL_READ_P marking is
done through mark_exp_read called from lots of places (e.g. lvalue to rvalue
conversions etc.).

LLVM has an extension on top of that in that it doesn't count pre/post
inc/decrements as use (i.e. DECL_READ_P for GCC).

The following patch does that too, though because we had the current
behavior for 11+ years already and lot of people is -Wunused-but-set-*
warning free in the current GCC behavior and not in the clang one (including
GCC sources), it allows users to choose.
Furthermore, it implements another level, where also var @= expr uses of var
(except when it is also used in expr) aren't counted as DECL_READ_P.

I think it would be nice to also handle var = var @ expr or var = expr @ var
but unfortunately mark_exp_read is then done in both FEs during parsing of
var @ expr or expr @ var and the code doesn't know it is rhs of an
assignment with var as lhs.

The patch works mostly by checking if DECL_READ_P is clear at some point and
then clearing it again after some operation which might have set it.

-Wunused or -Wall or -Wunused -Wextra or -Wall -Wextra turn on the 3 level
of the new warning (i.e. the one which ignores also var++, ++var etc. as
well as var @= expr), so does -Wunused-but-set-{variable,parameter}, but
users can use explicit -Wunused-but-set-{variable,parameter}={1,2} to select
a different level.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, I'm afraid the gofrontend part will need to go in separately through
upstream.

2025-04-24  Jakub Jelinek  

PR c/44677
gcc/
* common.opt (Wunused-but-set-parameter=, Wunused-but-set-variable=):
New options.
(Wunused-but-set-parameter, Wunused-but-set-variable): Turn into
aliases.
* common.opt.urls: Regenerate.
* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Use
OPT_Wunused_but_set_variable_ instead of OPT_Wunused_but_set_variable
and OPT_Wunused_but_set_parameter_ instead of
OPT_Wunused_but_set_parameter.
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Remove unused
but set variable tmp.
* ipa-strub.cc (pass_ipa_strub::execute): Cast named_args to
(void) if ATTR_FNSPEC_DECONST_WATERMARK is not defined.
* doc/invoke.texi (Wunused-but-set-parameter=,
Wunused-but-set-variable=): Document new options.
(Wunused-but-set-parameter, Wunused-but-set-variable): Adjust
documentation now that they are just aliases.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change
warn_unused_but_set_parameter and warn_unused_but_set_variable
from 1 to 3 if they were set only implicitly.
* c-attribs.cc (build_attr_access_from_parms): Remove unused
but set variable nelts.
gcc/c/
* c-parser.cc (c_parser_unary_expression): Clear DECL_READ_P
after default_function_array_read_conversion for
-Wunused-but-set-{parameter,variable}={2,3} on
PRE{IN,DE}CREMENT_EXPR argument.
(c_parser_postfix_expression_after_primary): Similarly for
POST{IN,DE}CREMENT_EXPR.
* c-decl.cc (pop_scope): Use OPT_Wunused_but_set_variable_
instead of OPT_Wunused_but_set_variable.
(finish_function): Use OPT_Wunused_but_set_parameter_
instead of OPT_Wunused_but_set_parameter.
* c-typeck.cc (mark_exp_read): Handle {PRE,POST}{IN,DE}CREMENT_EXPR
and don't handle it when cast to void.
(build_modify_expr): Clear DECL_READ_P after build_binary_op
for -Wunused-but-set-{parameter,variable}=3.
gcc/cp/
* cp-gimplify.cc (cp_fold): Clear DECL_READ_P after cp_fold_rvalue
for x = x OP y.
* decl.cc (poplevel): Use OPT_Wunused_but_set_variable_
instead of OPT_Wunused_but_set_variable.
(finish_function): Use OPT_Wunused_but_set_parameter_
instead of OPT_Wunused_but_set_parameter.
* expr.cc (mark_use): Clear read_p for {PRE,POST}{IN,DE}CREMENT_EXPR
cast to void on {VAR,PARM}_DECL for
-Wunused-but-set-{parameter,variable}={2,3}.
(mark_exp_read): Handle {PRE,POST}{IN,DE}CREMENT_EXPR and don't handle
it when cast to void.
* module.cc (trees_in::fn_parms_fini): Remove unused but set variable
ix.
* semantics.cc (finish_unary_op_expr): Clear DECL_READ_P
after cp_fully_fold for -Wunused-but-set-{parameter,variable}={2,3} on
PRE{IN,DE}CREMENT_EXPR argument.
*

[PATCH] tailc: Improve tail recursion handling [PR119493]

2025-04-24 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 11:51:49AM +0200, Jakub Jelinek wrote:
> Here it is, ok if it passes bootstrap/regtest?  I'll queue the interdiff
> between this patch and the previous one for GCC 16.

Here is the interdiff to improve the tail recursion handling also for
non-musttail calls.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-24  Jakub Jelinek  

PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Handle non-gimple_reg_type
arguments which aren't just passed through for tail recursions
even for non-musttail calls.

--- gcc/tree-tailcall.cc.jj 2025-04-01 16:47:30.373502796 +0200
+++ gcc/tree-tailcall.cc2025-04-01 20:08:34.578787921 +0200
@@ -685,8 +685,7 @@ find_tail_calls (basic_block bb, struct
  ? !is_gimple_reg (param)
  : (!is_gimple_variable (param)
 || TREE_THIS_VOLATILE (param)
-|| may_be_aliased (param)
-|| !gimple_call_must_tail_p (call)))
+|| may_be_aliased (param)))
break;
}
}

Jakub

[PATCH] RISC-V: Add tt-ascalon-d8 integer and floating point scheduling model

2025-04-24 Thread Anton Blanchard

Add integer and floating point scheduling models for the Tenstorrent
Ascalon 8 wide CPU.

gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
  Add tt_ascalon_d8.
* config/riscv/riscv.md: Update tune attribute and include
  tt-ascalon-d8.md.
* config/riscv/tenstorrent-ascalon.md: New file.

Signed-off-by: Anton Blanchard 
---
 gcc/config/riscv/riscv-cores.def  |   2 +-
 gcc/config/riscv/riscv-opts.h |   1 +
 gcc/config/riscv/riscv.md |   3 +-
 gcc/config/riscv/tt-ascalon-d8.md | 154 ++
 4 files changed, 158 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/tt-ascalon-d8.md

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index e31afc3fe70..33d93080eca 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -39,7 +39,7 @@ RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
 RISCV_TUNE("sifive-p400-series", sifive_p400, sifive_p400_tune_info)
 RISCV_TUNE("sifive-p600-series", sifive_p600, sifive_p600_tune_info)
-RISCV_TUNE("tt-ascalon-d8", generic_ooo, tt_ascalon_d8_tune_info)
+RISCV_TUNE("tt-ascalon-d8", tt_ascalon_d8, tt_ascalon_d8_tune_info)
 RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
 RISCV_TUNE("xt-c908", generic, generic_ooo_tune_info)
 RISCV_TUNE("xt-c908v", generic, generic_ooo_tune_info)
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 26fe228e0f8..e921c71679f 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -57,6 +57,7 @@ enum riscv_microarchitecture_type {
   sifive_7,
   sifive_p400,
   sifive_p600,
+  tt_ascalon_d8,
   xiangshan,
   generic_ooo
 };
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eec96875f96..fac9eb9292c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -669,7 +669,7 @@
 ;; Microarchitectures we know how to tune for.
 ;; Keep this in sync with enum riscv_microarchitecture.
 (define_attr "tune"
-  "generic,sifive_7,sifive_p400,sifive_p600,xiangshan,generic_ooo"
+  
"generic,sifive_7,sifive_p400,sifive_p600,tt_ascalon_d8,xiangshan,generic_ooo"
   (const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
 
 ;; Describe a user's asm statement.
@@ -4832,6 +4832,7 @@
 (include "thead.md")
 (include "generic-vector-ooo.md")
 (include "generic-ooo.md")
+(include "tt-ascalon-d8.md")
 (include "vector.md")
 (include "vector-crypto.md")
 (include "vector-bfloat16.md")
diff --git a/gcc/config/riscv/tt-ascalon-d8.md 
b/gcc/config/riscv/tt-ascalon-d8.md
new file mode 100644
index 000..513608cea79
--- /dev/null
+++ b/gcc/config/riscv/tt-ascalon-d8.md
@@ -0,0 +1,154 @@
+;; Tenstorrent Ascalon code scheduling model.
+;; Copyright (C) 2023-2025 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "tt_ascalon_d8")
+
+;; Ascalon has more issue/execution bandwidth than decode/retire bandwidth,
+;; so we model decode to place an upper limit on what we can achieve.
+(define_cpu_unit 
"asc-decode0,asc-decode1,asc-decode2,asc-decode3,asc-decode4,asc-decode5,asc-decode6,asc-decode7"
 "tt_ascalon_d8")
+
+(define_cpu_unit "asc-lsu0,asc-lsu1,asc-lsu2" "tt_ascalon_d8")
+(define_cpu_unit "asc-fxu0,asc-fxu1,asc-fxu2,asc-fxu3,asc-fxu4,asc-fxu5" 
"tt_ascalon_d8")
+(define_cpu_unit "asc-fpu0,asc-fpu1" "tt_ascalon_d8")
+
+;; Shortcuts
+(define_reservation "tt_ascalon_d8_decode" 
"asc-decode0|asc-decode1|asc-decode2|asc-decode3|asc-decode4|asc-decode5|asc-decode6|asc-decode7")
+(define_reservation "tt_ascalon_d8_ls" "asc-lsu0|asc-lsu1|asc-lsu2")
+(define_reservation "tt_ascalon_d8_alu" 
"asc-fxu0|asc-fxu1|asc-fxu2|asc-fxu3|asc-fxu4|asc-fxu5")
+(define_reservation "tt_ascalon_d8_mul" "asc-fxu0")
+(define_reservation "tt_ascalon_d8_div" "asc-fxu0")
+(define_reservation "tt_ascalon_d8_br" "asc-fxu2|asc-fxu3")
+(define_reservation "tt_ascalon_d8_fp" "asc-fpu0|asc-fpu1")
+
+;; Integer load/store
+(define_insn_reservation "tt_ascalon_d8_int_load" 4
+  (and (eq_attr "tune" "tt_ascalon_d8")
+   (eq_attr "type" "load"))
+  "tt_ascalon_d8_decode,tt_ascalon_d8_ls")
+
+(define_insn_reservation "tt_ascalon_

[PATCH] s390, v2: Allow 5+ argument tail-calls in some special cases [PR119873]

2025-04-24 Thread Jakub Jelinek

On Thu, Apr 24, 2025 at 09:44:45AM +0200, Stefan Schulze Frielinghaus wrote:
> Yes, every parameter is sign or zero extended if its type is smaller
> than 64bit.

> Note, on s390 a parameter is either passed in a register (pair) or via
> memory, but not partly in a register and memory.

Ok, so like this then if it passes bootstrap/regtest?
So far tested with
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m31\} 
s390.exp=pr119873*'
in a cross-compiler.

2025-04-24  Jakub Jelinek  
Stefan Schulze Frielinghaus  

PR target/119873
* config/s390/s390.cc (s390_call_saved_register_used): Don't return
true if default definition of PARM_DECL SSA_NAME of the same register
is passed in call saved register.
(s390_function_ok_for_sibcall): Adjust comment.

* gcc.target/s390/pr119873-1.c: New test.
* gcc.target/s390/pr119873-2.c: New test.
* gcc.target/s390/pr119873-3.c: New test.
* gcc.target/s390/pr119873-4.c: New test.

--- gcc/config/s390/s390.cc.jj  2025-04-22 09:45:29.907576749 +0200
+++ gcc/config/s390/s390.cc 2025-04-24 12:22:48.897695685 +0200
@@ -14496,7 +14496,21 @@ s390_call_saved_register_used (tree call
 
  for (reg = 0; reg < nregs; reg++)
if (!call_used_or_fixed_reg_p (reg + REGNO (parm_rtx)))
- return true;
+ {
+   rtx parm;
+   /* Allow passing through unmodified value from caller,
+  see PR119873.  */
+   if (TREE_CODE (parameter) == SSA_NAME
+   && SSA_NAME_IS_DEFAULT_DEF (parameter)
+   && SSA_NAME_VAR (parameter)
+   && TREE_CODE (SSA_NAME_VAR (parameter)) == PARM_DECL
+   && (parm = DECL_INCOMING_RTL (SSA_NAME_VAR (parameter)))
+   && REG_P (parm)
+   && REGNO (parm) == REGNO (parm_rtx)
+   && REG_NREGS (parm) == REG_NREGS (parm_rtx))
+ break;
+   return true;
+ }
}
   else if (GET_CODE (parm_rtx) == PARALLEL)
{
@@ -14543,8 +14557,9 @@ s390_function_ok_for_sibcall (tree decl,
 return false;
 
   /* Register 6 on s390 is available as an argument register but unfortunately
- "caller saved". This makes functions needing this register for arguments
- not suitable for sibcalls.  */
+ "caller saved".  This makes functions needing this register for arguments
+ not suitable for sibcalls, unless the same value is passed from the
+ caller.  */
   return !s390_call_saved_register_used (exp);
 }
 
--- gcc/testsuite/gcc.target/s390/pr119873-1.c.jj   2025-04-24 
12:19:05.617685023 +0200
+++ gcc/testsuite/gcc.target/s390/pr119873-1.c  2025-04-24 12:19:05.617685023 
+0200
@@ -0,0 +1,11 @@
+/* PR target/119873 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+const char *foo (void *, void *, void *, void *, unsigned long, unsigned long);
+
+const char *
+bar (void *a, void *b, void *c, void *d, unsigned long e, unsigned long f)
+{
+  [[gnu::musttail]] return foo (a, b, c, d, e, f);
+}
--- gcc/testsuite/gcc.target/s390/pr119873-2.c.jj   2025-04-24 
12:19:05.617685023 +0200
+++ gcc/testsuite/gcc.target/s390/pr119873-2.c  2025-04-24 12:19:05.617685023 
+0200
@@ -0,0 +1,17 @@
+/* PR target/119873 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+const char *foo (void *, void *, void *, void *, unsigned long, unsigned long);
+
+const char *
+bar (void *a, void *b, void *c, void *d, unsigned long e, unsigned long f)
+{
+  [[gnu::musttail]] return foo (a, b, c, d, e + 1, f); /* { dg-error "cannot 
tail-call: target is not able to optimize the call into a sibling call" } */
+}
+
+const char *
+baz (void *a, void *b, void *c, void *d, unsigned long e, unsigned long f)
+{
+  [[gnu::musttail]] return foo (a, b, c, d, f, e); /* { dg-error "cannot 
tail-call: target is not able to optimize the call into a sibling call" } */
+}
--- gcc/testsuite/gcc.target/s390/pr119873-3.c.jj   2025-04-24 
12:32:13.550135958 +0200
+++ gcc/testsuite/gcc.target/s390/pr119873-3.c  2025-04-24 12:44:13.744520302 
+0200
@@ -0,0 +1,27 @@
+/* PR target/119873 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern int foo (int, int, int, long long, int);
+
+int
+bar (int u, int v, int w, long long x, int y)
+{
+  [[gnu::musttail]] return foo (u, v, w, x, y);
+}
+
+extern int baz (int, int, int, int, int);
+
+int
+qux (int u, int v, int w, int x, int y)
+{
+  [[gnu::musttail]] return baz (u, v, w, x, y);
+}
+
+extern int corge (int, int, int, int, unsigned short);
+
+int
+garply (int u, int v, int w, int x, unsigned short y)
+{
+  [[gnu::musttail]] return corge (u, v, w, x, y);
+}
--- gcc/testsuite/gcc.target/s390/pr119873-4.c.jj   2025-04-24 
12:45:21.162620820 +0200
+++ gcc/testsuite/gcc.target/s390/pr119873-4.c  2025-04-24 12:46:02.169073715 
+0200
@@ -0,0 +1,27 @@
+/* PR target/119873 */
+/* { dg-do compile } */
+/* { d

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Uros Bizjak

On Thu, Apr 24, 2025 at 8:10 PM Uros Bizjak  wrote:
>
> On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka  wrote:
> >
> > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > or vpandn.
> > > Current register_operand/vector_operand could lose some optimization
> > > opportunity.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > > predicate.
> > >   (nonimm_or_0_or_1s_operand): Ditto.
> > >   * config/i386/sse.md (vcond_mask_):
> > >   Extend the predicate of operands1 to accept 0 or allones
> > >   operands.
> > >   (vcond_mask_): Ditto.
> > >   (vcond_mask_v1tiv1ti): Ditto.
> > >   (vcond_mask_): Ditto.
> > >   * config/i386/i386.md (movcc): Ditto for operands[2] and
> > >   operands[3].
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/i386/blendv-to-maxmin.c: New test.
> > >   * gcc.target/i386/blendv-to-pand.c: New test.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > new file mode 100644
> > > index 000..042eb7d8f24
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > +
> > > +double
> > > +foo (double a)
> > > +{
> > > +  if (a > 0.0)
> > > +return a;
> > > +  return 0.0;
> > > +}
> >
> > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> > explicitely documented as unspecified when one of parameters is nan.
> >
> > So without -ffast-math at combine time we see:
> > (insn 6 3 7 2 (set (reg:DF 103)
> > (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> >  (nil))
> > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> > (unspec:DF [
> > (reg:DF 104 [ a ])
> > (reg:DF 103)
> > ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> >  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> > (expr_list:REG_DEAD (reg:DF 103)
> > (nil
> >
> > maxss is defined as:
> >
> > MAX(SRC1, SRC2)
> > {
> > IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> > ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> > ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> > ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> > ELSE DEST := SRC2;
> > FI;
> > }
>
> Please see [1], "Maximum and minimum functions", which says:
>
> "The maxNum and minNum functions defined in the 2008 standard
> propagate a non-NaN when one input is NaN and the other input is a
> normal number.
>
> This problem will be fixed by the forthcoming revision of the
> standard. The new functions named maximum and minimum are certain to
> propagate NaNs.
> Some current implementations are deviating from both of these
> definitions. Max and min instructions in the x86 instruction set are
> implemented so that max(a,b) and min(a,b) give b if one of the inputs
> is NaN. This is useful because it corresponds to the behavior of the
> code expression a > b ? a : b. A compiler can translate this common
> high-level language expression into a single instruction."
>
> Unfortunately, SSE max and min instructions are incompatible with both
> standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> FI;"

Ehm, SSE max and min instructions are incompatible with -2019 because
of "ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;" and with -2008
because of "ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;".

Uros.

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Kees Cook

On Thu, Apr 24, 2025 at 06:06:03PM +, Qing Zhao wrote:
> 
> 
> > On Apr 24, 2025, at 13:07, Kees Cook  wrote:
> > 
> > On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
> >> 
> >>> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
> >>> 
> >>> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
>  Hi, 
>  
>  Kees reported a segmentation failure when he used the patch to compiler 
>  kernel, 
>  and the reduced the testing case is something like the following:
>  
>  struct f {
>  void *g __attribute__((__counted_by__(h)));
>  long h;
>  };
>  
>  extern struct f *my_alloc (int);
>  
>  int i(void) {
>  struct f *iov = my_alloc (10);
>  int *j = (int *)iov->g;
>  return __builtin_dynamic_object_size(iov->g, 0);
>  }
>  
>  Basically, the problem is relating to the pointee type of the pointer 
>  array being “void”, 
>  As a result, the element size of the array is not available in the IR. 
>  Therefore segmentation
>  fault when calculating the size of the whole object. 
>  
>  Although it’s easy to fix this segmentation failure, I am not quite sure 
>  what’s the best
>  solution to this issue:
>  
>  1. Reject such usage of “counted_by” in the very beginning by reporting 
>  warning to the
>  User, and delete the counted_by attribute from the field.
>  
>  Or:
>  
>  2. Accept such usage, but issue warnings when calculating the 
>  object_size in Middle-end.
>  
>  Personally, I prefer the above 1 since I think that when the pointee 
>  type is void, we don’t know
>  The type of the element of the pointer array, there is no way to decide 
>  the size of the pointer array. 
>  
>  So, the counted_by information is not useful for the 
>  __builtin_dynamic_object_size.
>  
>  But I am not sure whether the counted_by still can be used for bound 
>  sanitizer?
>  
>  Thanks for suggestions and help.
> >>> 
> >>> GNU C allows pointer arithmetic and sizeof on void pointers and
> >>> that treats void as having size 1.  So you could also allow counted_by
> >>> and assume as size 1 for void.
> >>> 
> >>> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
> >> 
> >> Okay, thanks for the info.
> >> So, 
> >> 1. should we issue warnings when doing this?
> > 
> > Please don't, Linux would very much like to track these allocation sizes
> > still. Performing pointer arithmetic and bounds checking (via __bdos) on
> > "void *" is wanted (and such a calculation was what tripped the
> > segfault).
> 
> My previous question was: -:)
> 
> When we accept the “void” pointee type and provide 
> __builtin_dynamic_object_size for 
> such pointers (treating it as 1 byte) shall we issue a warning to users to 
> warn them that the void pointee type is
> Accepted and treated as size 1? 
> 
> Or just silently handle such case as normal?

I think it should be silently handled. Other such "void is 1 byte" cases
don't warn.

> >> 2. If the compilation option is explicitly asking for standard C,
> >>shall we issue warning and delete the counted_by attribute from the 
> >> field?
> > 
> > I think it needs to stay attached for __bdos. And from the looks of it,
> > even array access works with 1-byte values too:
> > 
> > extern void *ptr;
> > void *foo(int num) {
> >return &ptr[num];
> > }
> > 
> > The assembly output of this shows it's doing byte addition. Clang
> > doesn't warn about this, but GCC does:
> > 
> > :5:16: warning: dereferencing 'void *' pointer
> >5 | return &ptr[num];
> >  |^
> > 
> > So, I think even the bounds sanitizer should handle it, even if a
> > warning ultimately gets emitted.
> 
> Okay. I will also handle the void in bounds sanitizer by treating element 
> size as 1 byte.

Great, that should work well for Linux, and is, I think, the least
surprising result. :)

> 
> My previous question was:
> 
> Since this is only a GNU extension, I am wondering under the situation that 
> No GNU extension
> Is allowed, shall we issue warnings and delete the counted_by attribute?

Oh, like, generally? What GCC option disables GNU extensions?

-Kees

-- 
Kees Cook

[committed] libgomp/testsuite: Fix hip_header_nvidia check, add workaround to test

2025-04-24 Thread Tobias Burnus


This is a follow up to my previous commit - now trying more
seriously using the real HIP (not some own wrapper) with Nvidia.

Well, that failed first due to some deprecation warning, which
could be silenced. And then due to errors which seem to be bugs
in the HIP implementation for Nvidia/CUDA.

In any case, with the attached changes, this testcase now PASS
when I use the HIP header files from ROCm 6.4.0 (just copied
/opt/rocm/include from the AMD GPU system to my Nvidia GPU
system). Thus, in principle, this test now works.

I guess, the more likely actual testing will either skip the
test completely or use the HIP-header-not-available fallback,
which is fine as well.
(The tests use different file names + #include such that
the test filename tells you which variant worked and which
was skipped.)

Committed asr16-115-g8ef0518bce489c Tobias PS: The affected testcases are 
(deprecation warning, only) 
libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c and (work around 
required in addition) testsuite/libgomp.c/interop-hipblas-nvidia-full.c
commit 8ef0518bce489c4c0c252a0e0c44193c5f7cf777
Author: Tobias Burnus 
Date:   Thu Apr 24 18:26:30 2025 +0200

libgomp/testsuite: Fix hip_header_nvidia check, add workaround to test

This is all about using the AMD's HIP header files with
__HIP_PLATFORM_NVIDIA__ defined, i.e. HIP with Nvidia/CUDA; in that case,
HIP is a thin layer on top of CUDA.

First, the check_effective_target_gomp_hip_header_nvidia check failed;
to fix it, -Wno-deprecated-declarations was added - and likewise to the
two affected testcases that actually used the HIP headers on Nvidia.

Doing so, the HIP tested was successful but the HIP-BLAS one showed two
issues:

* One seems to be related to include search paths as the HIP header uses
  #include "library_types.h" to include that CUDA header. Seemingly, it
  tried to included (again) the HIP header hip/library_types.h, not the
  CUDA one. I guess, some tweaking of -isystem vs. -I could have
  prevented this, but the simpler workaround was to just explicitly
  include the CUDA one before the HIP header files.

* Once done, everything compiled but linking failed as the association
  between three HIP-BLAS functions and their CUDA-BLAS ones did not
  work. Solution: Just add three #define for mapping them.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp
(check_effective_target_gomp_hip_header_nvidia): Compile with
"-Wno-deprecated-declarations".
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas.h: Add workarounds
when using the HIP headers with __HIP_PLATFORM_NVIDIA__.
---
 libgomp/testsuite/lib/libgomp.exp |  2 +-
 libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c |  2 +-
 libgomp/testsuite/libgomp.c/interop-hipblas-nvidia-full.c |  2 +-
 libgomp/testsuite/libgomp.c/interop-hipblas.h | 14 +-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index a057394ca13..54f2f708b1b 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -664,21 +664,21 @@ int main() {
 proc check_effective_target_gomp_hip_header_nvidia { } {
 return [check_no_compiler_messages gomp_hip_header_nvidia assembly {
 #define __HIP_PLATFORM_NVIDIA__
 #include 
 int main() {
 hipDevice_t dev;
 hipError_t r = hipDeviceGet (&dev, 0);
 if (r != hipSuccess)
 	return 1;
 return 0;
-} }]
+} } "-Wno-deprecated-declarations"]
 }
 
 # Return 1 if the Fortran hipfort module is available (no link check)
 
 proc check_effective_target_gomp_hipfort_module { } {
 return [check_no_compiler_messages gomp_hipfort_module assembly {
 ! Fortran
 use hipfort
 implicit none
 integer(kind(hipSuccess)) :: r
diff --git a/libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c b/libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c
index 324504feb22..79af47dc076 100644
--- a/libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c
+++ b/libgomp/testsuite/libgomp.c/interop-hip-nvidia-full.c
@@ -1,8 +1,8 @@
 /* { dg-require-effective-target openacc_cudart } */
 /* { dg-require-effective-target openacc_cuda } */
 /* { dg-require-effective-target gomp_hip_header_nvidia } */
-/* { dg-additional-options "-lcuda -lcudart" } */
+/* { dg-additional-options "-lcuda -lcudart -Wno-deprecated-declarations" } */
 
 #define __HIP_PLATFORM_NVIDIA__ 1
 
 #include "interop-hip.h"
diff --git a/libgomp/testsuite/libgomp.c/interop-hipblas-nvidia-full.c b/libgomp/testsuite/libgomp.c/interop-hipblas-nvidia-full.c
index c195d2486f6..ed428c6c760 100644
--- a/libgomp/testsuite/libgomp.c/interop-hipblas-nvidia-full.c
+++ b/libgomp/testsuite/libgomp.c/int

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Qing Zhao

> On Apr 24, 2025, at 13:07, Kees Cook  wrote:
> 
> On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
>> 
>>> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
 Hi, 

 Kees reported a segmentation failure when he used the patch to compiler 
 kernel, 
 and the reduced the testing case is something like the following:

 struct f {
 void *g __attribute__((__counted_by__(h)));
 long h;
 };

 extern struct f *my_alloc (int);

 int i(void) {
 struct f *iov = my_alloc (10);
 int *j = (int *)iov->g;
 return __builtin_dynamic_object_size(iov->g, 0);
 }

 Basically, the problem is relating to the pointee type of the pointer 
 array being “void”, 
 As a result, the element size of the array is not available in the IR. 
 Therefore segmentation
 fault when calculating the size of the whole object. 

 Although it’s easy to fix this segmentation failure, I am not quite sure 
 what’s the best
 solution to this issue:

 1. Reject such usage of “counted_by” in the very beginning by reporting 
 warning to the
 User, and delete the counted_by attribute from the field.

 Or:

 2. Accept such usage, but issue warnings when calculating the object_size 
 in Middle-end.

 Personally, I prefer the above 1 since I think that when the pointee type 
 is void, we don’t know
 The type of the element of the pointer array, there is no way to decide 
 the size of the pointer array. 

 So, the counted_by information is not useful for the 
 __builtin_dynamic_object_size.

 But I am not sure whether the counted_by still can be used for bound 
 sanitizer?

 Thanks for suggestions and help.
>>> 
>>> GNU C allows pointer arithmetic and sizeof on void pointers and
>>> that treats void as having size 1.  So you could also allow counted_by
>>> and assume as size 1 for void.
>>> 
>>> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
>> 
>> Okay, thanks for the info.
>> So, 
>> 1. should we issue warnings when doing this?
> 
> Please don't, Linux would very much like to track these allocation sizes
> still. Performing pointer arithmetic and bounds checking (via __bdos) on
> "void *" is wanted (and such a calculation was what tripped the
> segfault).

My previous question was: -:)

When we accept the “void” pointee type and provide 
__builtin_dynamic_object_size for 
such pointers (treating it as 1 byte) shall we issue a warning to users to warn 
them that the void pointee type is
Accepted and treated as size 1? 

Or just silently handle such case as normal?

> 
>> 2. If the compilation option is explicitly asking for standard C,
>>shall we issue warning and delete the counted_by attribute from the field?
> 
> I think it needs to stay attached for __bdos. And from the looks of it,
> even array access works with 1-byte values too:
> 
> extern void *ptr;
> void *foo(int num) {
>return &ptr[num];
> }
> 
> The assembly output of this shows it's doing byte addition. Clang
> doesn't warn about this, but GCC does:
> 
> :5:16: warning: dereferencing 'void *' pointer
>5 | return &ptr[num];
>  |^
> 
> So, I think even the bounds sanitizer should handle it, even if a
> warning ultimately gets emitted.

Okay. I will also handle the void in bounds sanitizer by treating element size 
as 1 byte.

My previous question was:

Since this is only a GNU extension, I am wondering under the situation that No 
GNU extension
Is allowed, shall we issue warnings and delete the counted_by attribute?

Qing

> 
> -Kees
> 
> -- 
> Kees Cook

[committed] cobol: Repair some exception processing logic.

2025-04-24 Thread Robert Dubner

>From 5faa0313bd82827f86768553932d55f7b2bc05a2 Mon Sep 17 00:00:00 2001
From: Robert Dubner 
Date: Thu, 24 Apr 2025 16:26:58 -0400
Subject: [PATCH] cobol: Repair some exception processing logic.

This patch changes the exception processing logic for the calculation of
reference modifications and table subscripts to be more in accordance with
ISO specifications.

It also adjusts the processing of RETURN-CODE when calling routines that
have no CALL ... RETURNING phrase.

gcc/cobol

* genapi.cc: (initialize_variable_internal): Change TRACE1
formatting.
(create_and_call): Repair RETURN-CODE processing.
(mh_source_is_group): Repair run-time IF type comparison.
(psa_FldLiteralA): Change TRACE1 formatting.
(parser_symbol_add): Eliminate unnecessary code.
* genutil.cc: Eliminate SET_EXCEPTION_CODE macro.
(get_data_offset_dest): Repair set_exception_code logic.
(get_data_offset_source): Likewise.
(get_binary_value): Likewise.
(refer_refmod_length): Likewise.
(refer_fill_depends): Likewise.
(refer_offset_dest): Likewise.
(refer_size_dest): Likewise.
(refer_offset_source): Likewise.

gcc/testsuite

* cobol.dg/group1/declarative_1.cob: Adjust for repaired exception
logic.
---
 gcc/cobol/genapi.cc   |  99 +--
 gcc/cobol/genutil.cc  | 779 ++
 .../cobol.dg/group1/declarative_1.cob |   6 +-
 3 files changed, 295 insertions(+), 589 deletions(-)

diff --git a/gcc/cobol/genapi.cc b/gcc/cobol/genapi.cc
index c8911f964d59..e44364a1b482 100644
--- a/gcc/cobol/genapi.cc
+++ b/gcc/cobol/genapi.cc
@@ -1229,7 +1229,40 @@ initialize_variable_internal( cbl_refer_t refer,
   }
 else
   {
-  TRACE1_FIELD_VALUE("", parsed_var, "")
+  // Convert strings of spaces to ""
+  tree spaces = gg_define_int(0);
+  if(   parsed_var->type == FldGroup
+ || parsed_var->type == FldAlphanumeric
+ || parsed_var->type == FldAlphaEdited
+ || parsed_var->type == FldLiteralA )
+{
+gg_assign(spaces, integer_one_node);
+tree counter = gg_define_int(parsed_var->data.capacity);
+WHILE(counter, gt_op, integer_zero_node)
+  {
+  gg_decrement(counter);
+  IF( gg_indirect(member(parsed_var->var_decl_node, "data"),
counter),
+  ne_op,
+  build_int_cst_type(UCHAR, ' ') )
+  {
+  gg_assign(spaces, integer_zero_node);
+  }
+  ELSE
+{
+}
+  ENDIF
+  }
+  WEND
+}
+  IF(spaces, eq_op, integer_one_node)
+{
+TRACE1_TEXT(" ")
+}
+  ELSE
+{
+TRACE1_FIELD_VALUE("", parsed_var, "")
+}
+  ENDIF
   }
 TRACE1_END
 }
@@ -12341,7 +12374,7 @@ create_and_call(size_t narg,
 
 // Because the CALL had a RETURNING clause, RETURN-CODE doesn't
return a
 // value.  So, we make sure it is zero
-gg_assign(var_decl_return_code, build_int_cst_type(SHORT, 0));
+gg_assign(var_decl_return_code, build_int_cst_type(SHORT, 0));
 
 if( returned_value_type == CHAR_P )
   {
@@ -12352,7 +12385,7 @@ create_and_call(size_t narg,
 gg_add( member(returned.field->var_decl_node, "data"),
 refer_offset_dest(returned)));
   gg_assign(returned_length,
-refer_size_dest(returned));
+gg_cast(TREE_TYPE(returned_length),
refer_size_dest(returned)));
 
   // The returned value is a string of nbytes, which by specification
   // has to be at least as long as the returned_length of the target:
@@ -12442,28 +12475,9 @@ create_and_call(size_t narg,
 }
   else
 {
-// Because no explicit returning value is expected, we switch to
-// the IBM default behavior, where the returned INT value is assigned
-// to our RETURN-CODE:
-returned_value = gg_define_variable(SHORT);
-
-// Before doing the call, we save the COBOL program_state:
-push_program_state();
-gg_assign(returned_value, gg_cast(SHORT, call_expr));
-// And after the call, we restore it:
-pop_program_state();
-
-// We know that the returned value is a 2-byte little-endian INT:
-gg_assign(  var_decl_return_code,
-returned_value);
-TRACE1
-  {
-  TRACE1_HEADER
-  gg_printf("returned value: %d",
-gg_cast(INT, var_decl_return_code),
-NULL_TREE);
-  TRACE1_END
-  }
+// Because no explicit returning value is expected, we just call it.
We
+// expect COBOL routines to set RETURN-CODE when they think it
necessary.
+gg_append_statement(call_expr);
 }
 
   for( size_t i=0; ivar_decl_node) = 1;
 nvar += 1;
 }
-  TRACE1
-{
-TRACE1_INDENT
-TRACE1_TEXT("Finished")
-TRACE1_END
-}
+//  TRACE1
+//{
+//TRACE1_INDENT
+//

Re: [PATCH] Refactor msse4 and mno-sse4.

2025-04-24 Thread Uros Bizjak

On Fri, Apr 25, 2025 at 8:14 AM liuhongt  wrote:
>
> This is originally from [1]
> 
> For the command line, or target attribute, the actual operation goes
> into ix86_handle_option, and as long as we get it right in this
> ix86_handle_option, everything else should be fine.
> As for the macros generated by the mask name (TARGET_SSE4_1_P), their
> meanings remain the same, so in my opinion they can be handled this
> way. But to be on the safe side, I agree that it should be adjusted in
> ix86_valid_target_attribute_inner_p first.
> In GCC16, we can do a refactor for this.
> --
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-April/679880.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

LGTM.

Thanks,
Uros.

>
> gcc/ChangeLog:
>
> PR target/119549
> * common/config/i386/i386-common.cc (ix86_handle_option):
> Refactor msse4 and mno-sse4.
> * config/i386/i386.opt (msse4): Remove RejectNegative.
> (mno-sse4): Remove the entry.
> * config/i386/i386-expand.cc
> (ix86_valid_target_attribute_inner_p): Remove special code
> which handles mno-sse4.
> ---
>  gcc/common/config/i386/i386-common.cc | 23 ---
>  gcc/config/i386/i386-options.cc   |  7 ---
>  gcc/config/i386/i386.opt  |  6 +-
>  3 files changed, 13 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 4815fbc4d35..296df3b3230 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1519,17 +1519,18 @@ ix86_handle_option (struct gcc_options *opts,
>return true;
>
>  case OPT_msse4:
> -  gcc_assert (value != 0);
> -  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
> -  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
> -  return true;
> -
> -case OPT_mno_sse4:
> -  gcc_assert (value != 0);
> -  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET;
> -  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET;
> -  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET;
> -  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_SSE4_UNSET;
> +  if (value)
> +   {
> + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
> + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
> +   }
> +  else
> +   {
> + opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET;
> + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET;
> + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET;
> + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_SSE4_UNSET;
> +   }
>return true;
>
>  case OPT_msse4a:
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 964449fa8cd..45aa9b4b732 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1271,13 +1271,6 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
> args, char *p_strings[],
> }
> }
>
> -  /* Fixup -msse4 which is RejectNegative to -mno-sse4 when negated.  */
> -  if (opt == OPT_msse4 && !opt_set_p)
> -   {
> - opt = OPT_mno_sse4;
> - opt_set_p = true;
> -   }
> -
>/* Process the option.  */
>if (opt == N_OPTS)
> {
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 27d34bd64ea..0abf13480f5 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -721,13 +721,9 @@ Target Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
>  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions 
> and code generation.
>
>  msse4
> -Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
> +Target Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
>  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions 
> and code generation.
>
> -mno-sse4
> -Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
> -Do not support SSE4.1 and SSE4.2 built-in functions and code generation.
> -
>  msse5
>  Target Undocumented Alias(mavx) Warn(%<-msse5%> was removed)
>  ;; Deprecated
> --
> 2.34.1
>

Re: [v2 PATCH 1/2] RISC-V: Support RISC-V Profiles 20/22.

2025-04-24 Thread Jeff Law





On 1/20/25 8:59 PM, Jiawei wrote:

This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0

Version log:
Using lowercase letters to present Profiles.
Using '_' as divsor between Profiles and other RISC-V extension.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles): New 
struct.
(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.
* gcc.target/riscv/arch-47.c: New test.
So I think we probably need some documentation on the profile options in 
invoke.texi.


I realize you may not necessarily be a native english speaker, so what I 
would suggest is you do your best.  If adjustments need to be made on 
the documentation, I can certainly help with wordsmithing.


So while I think the patches are fine, let's get the doc situation 
sorted out before pushing these two patches to the trunk.


Thanks!

jeff

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu

On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka  wrote:
>
> > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka  wrote:
> > >
> > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > > or vpandn.
> > > > Current register_operand/vector_operand could lose some optimization
> > > > opportunity.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > > > predicate.
> > > >   (nonimm_or_0_or_1s_operand): Ditto.
> > > >   * config/i386/sse.md (vcond_mask_):
> > > >   Extend the predicate of operands1 to accept 0 or allones
> > > >   operands.
> > > >   (vcond_mask_): Ditto.
> > > >   (vcond_mask_v1tiv1ti): Ditto.
> > > >   (vcond_mask_): Ditto.
> > > >   * config/i386/i386.md (movcc): Ditto for operands[2] and
> > > >   operands[3].
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.target/i386/blendv-to-maxmin.c: New test.
> > > >   * gcc.target/i386/blendv-to-pand.c: New test.
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > > > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > > new file mode 100644
> > > > index 000..042eb7d8f24
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > > +
> > > > +double
> > > > +foo (double a)
> > > > +{
> > > > +  if (a > 0.0)
> > > > +return a;
> > > > +  return 0.0;
> > > > +}
> > >
> > > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > > -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> > > explicitely documented as unspecified when one of parameters is nan.
> > >
> > > So without -ffast-math at combine time we see:
> > > (insn 6 3 7 2 (set (reg:DF 103)
> > > (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> > >  (nil))
> > > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> > > (unspec:DF [
> > > (reg:DF 104 [ a ])
> > > (reg:DF 103)
> > > ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> > >  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> > > (expr_list:REG_DEAD (reg:DF 103)
> > > (nil
> > >
> > > maxss is defined as:
> > >
> > > MAX(SRC1, SRC2)
> > > {
> > > IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> > > ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> > > ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> > > ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> > > ELSE DEST := SRC2;
> > > FI;
> > > }
> >
> > Please see [1], "Maximum and minimum functions", which says:
> >
> > "The maxNum and minNum functions defined in the 2008 standard
> > propagate a non-NaN when one input is NaN and the other input is a
> > normal number.
> >
> > This problem will be fixed by the forthcoming revision of the
> > standard. The new functions named maximum and minimum are certain to
> > propagate NaNs.
> > Some current implementations are deviating from both of these
> > definitions. Max and min instructions in the x86 instruction set are
> > implemented so that max(a,b) and min(a,b) give b if one of the inputs
> > is NaN. This is useful because it corresponds to the behavior of the
> > code expression a > b ? a : b. A compiler can translate this common
> > high-level language expression into a single instruction."
> >
> > Unfortunately, SSE max and min instructions are incompatible with both
> > standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> > FI;"
>
> I may be missing something obvious, but it still seems to me that RTL's
>
>   (if_then_else:SF
>  (gt (reg:SF SRC1) (reg:SF SRC2))
>  (reg:SF SRC1)
>  (reg:SF SRC2))
>
> Behaves same as the sse's max spec:
>
> MAX(SRC1, SRC2)
> {
> IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> ELSE DEST := SRC2;
> FI;
> }
>
> 1) If SRC1 and SRC2 are both 0 (possibly with different signs) then GT is 
> false and we pick SRC2
> 2) If SRC1 or SRC2 is NaN then GT is false and we pick SRC2
> 3) if SRC1 > SRC2 then GT is true and we pick SRC1
> otherwise GT is false and we pick SRC2
>
> And thus it may be more RTL friendly to represent it this way instead of
> current unspec called UNSPEC_IEEE_MAX...

There's a patch proposed for that [1], and Jakub has some comments.

Jakub Jelinek  于2024年11月15日周五 16:20写道：
>
> On Fri, Nov 15, 2024 at 04:04:55PM +0800, Hongyu Wang wrote:
> > Following the discussion in pr116738, the insn for UNSPEC_IEEE_MAXMIN
> > actually matches the

Re: [PATCH] Fortran: fix procedure pointer handling with -fcheck=pointer [PR102900]

2025-04-24 Thread Jerry D


On 4/24/25 12:59 PM, Harald Anlauf wrote:

Dear all,

the attached patch is the result of my attempts to fix an ICE when
compiling gfortran.dg/proc_ptr_52.f90 with -fcheck=all.  While
trying to reduce this, I found several oddities with functions
returning class(*), pointer that ICE'd too.

The original ICE in the PR turned out to be a bug in the pointer
checking code when passing a procedure pointer to a CLASS procedure
dummy that tried to access the container of the procedure pointer.
I believe that this should not be done, and one should only check
that the procedure pointer is not null.
I am not too experienced which class-valued functions, so if any
of the experts (Paul, Andre', ...) could have a look?

(After fixing the issue with -fcheck=pointer, I ran into a bogus
error with -Wexternal-argument-mismatch for the same testcase.
This is now pr119928.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Cheers,
Harald


OK Harald,

Jerry

Re: [PATCH] testsuite: Skip tests incompatible with generic thunk support

2025-04-24 Thread Bernhard Reutner-Fischer



>>  * lib/target-supports.exp
>>  (check_effective_target_variadic_mi_thunk): New function.
>OK.
>jeff
>

Please document new effective_target checks in sourcebuild.texi

thanks

[committed] libstdc++: Remove unnecessary dg-prune-output from tests

2025-04-24 Thread Jonathan Wakely

There are no errors matching this pattern in these tests (only in the
deque/48101_neg.cc and vector/48101_neg.cc tests).

libstdc++-v3/ChangeLog:

* testsuite/23_containers/forward_list/48101_neg.cc: Remove
dg-prune-output that doesn't match anything.
* testsuite/23_containers/list/48101_neg.cc: Likewise.
* testsuite/23_containers/multiset/48101_neg.cc: Likewise.
* testsuite/23_containers/set/48101_neg.cc: Likewise.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/testsuite/23_containers/forward_list/48101_neg.cc | 1 -
 libstdc++-v3/testsuite/23_containers/list/48101_neg.cc | 1 -
 libstdc++-v3/testsuite/23_containers/multiset/48101_neg.cc | 1 -
 libstdc++-v3/testsuite/23_containers/set/48101_neg.cc  | 1 -
 4 files changed, 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/23_containers/forward_list/48101_neg.cc 
b/libstdc++-v3/testsuite/23_containers/forward_list/48101_neg.cc
index 2f2ea2afb19..d18195ed354 100644
--- a/libstdc++-v3/testsuite/23_containers/forward_list/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/forward_list/48101_neg.cc
@@ -26,6 +26,5 @@ test01()
 }
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
 // { dg-prune-output "rebind_alloc" }
diff --git a/libstdc++-v3/testsuite/23_containers/list/48101_neg.cc 
b/libstdc++-v3/testsuite/23_containers/list/48101_neg.cc
index 8b2e075ca6a..cc51705dcc6 100644
--- a/libstdc++-v3/testsuite/23_containers/list/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/list/48101_neg.cc
@@ -26,5 +26,4 @@ test01()
 }
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
diff --git a/libstdc++-v3/testsuite/23_containers/multiset/48101_neg.cc 
b/libstdc++-v3/testsuite/23_containers/multiset/48101_neg.cc
index f0786cfc7fb..3cc06587526 100644
--- a/libstdc++-v3/testsuite/23_containers/multiset/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/multiset/48101_neg.cc
@@ -29,7 +29,6 @@ test01()
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
 // { dg-error "comparison object must be invocable" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
 // { dg-prune-output "no match for call" }
 // { dg-prune-output "invalid conversion" }
diff --git a/libstdc++-v3/testsuite/23_containers/set/48101_neg.cc 
b/libstdc++-v3/testsuite/23_containers/set/48101_neg.cc
index e8dec72cce5..fe38d1af7ef 100644
--- a/libstdc++-v3/testsuite/23_containers/set/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/set/48101_neg.cc
@@ -29,7 +29,6 @@ test01()
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
 // { dg-error "comparison object must be invocable" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
 // { dg-prune-output "no match for call" }
 // { dg-prune-output "invalid conversion" }
-- 
2.49.0

Re: [PATCH] testsuite: Add require target for SJLJ exception implementation

2025-04-24 Thread Jeff Law





On 4/24/25 12:22 PM, Dimitar Dimitrov wrote:

Testcases for musttail call optimization fail on pru-unknown-elf:
   FAIL: c-c++-common/musttail14.c  -std=gnu++17 (test for excess errors)
   Excess errors:
   .../gcc/gcc/testsuite/c-c++-common/musttail14.c:37:14: error: cannot 
tail-call: caller uses sjlj exceptions

Silence these errors by disabling the tests if target uses SJLJ for
implementing exceptions.  Use a new effective target check for this.

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Ok for trunk?

gcc/testsuite/ChangeLog:

* c-c++-common/musttail14.c: Disable test if effective target
using_sjlj_exceptions.
* c-c++-common/musttail22.c: Ditto.
* g++.dg/musttail8.C: Ditto.
* g++.dg/musttail9.C: Ditto.
* g++.dg/opt/musttail3.C: Ditto.
* g++.dg/opt/musttail4.C: Ditto.
* g++.dg/opt/musttail5.C: Ditto.
* g++.dg/opt/pr119613.C: Ditto.
* lib/target-supports.exp
(check_effective_target_using_sjlj_exceptions): New check.

OK, but please make sure to document the new dejagnu target test.

Jeff

[PATCH 1/2] libstdc++: Add _M_key_compare helper to associative containers

2025-04-24 Thread Jonathan Wakely

In r10-452-ge625ccc21a91f3 I noted that we don't have an accessor for
invoking _M_impl._M_key_compare in the associative containers. That
meant that the static assertions to check for valid comparison functions
were squirrelled away in _Rb_tree::_S_key instead. As Jason noted in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681436.html this
means that the static assertions fail later than we'd like.

This change adds a new _Rb_tree::_M_key_compare member function which
invokes the _M_impl._M_key_compare function object, and then moves the
static_assert from _S_key into _M_key_compare.

Because the new function is const-qualified, we now treat LWG 2542 as a
DR for older standards, requiring the comparison function to be const
invocable. Previously we only enforced the LWG 2542 rule for C++17 and
later.

I did consider deprecating support for comparisons which aren't const
invocable, something like this:

  // Before LWG 2542 it wasn't strictly necessary for _Compare to be
  // const invocable, if you only used non-const container members.
  // Define a non-const overload for pre-C++17, deprecated for C++11/14.
  #if __cplusplus < 201103L
  bool
  _M_key_compare(const _Key& __k1, const _Key& __k2)
  { return _M_impl._M_key_compare(__k1, __k2); }
  #elif __cplusplus < 201703L
  template
[[__deprecated__("support for comparison functions that are not "
 "const invocable is deprecated")]]
__enable_if_t<
__and_<__is_invocable<_Compare&, const _Key1&, const _Key2&>,
   __not_<__is_invocable>>::value,
   bool>
_M_key_compare(const _Key1& __k1, const _Key2& __k2)
{
  static_assert(
__is_invocable<_Compare&, const _Key&, const _Key&>::value,
"comparison object must be invocable with two arguments of key type"
  );
  return _M_impl._M_key_compare(__k1, __k2);
}
  #endif // < C++17

But I decided that this isn't necessary, because we've been enforcing
the C++17 rule since GCC 8.4 and 9.2, and C++17 has been the default
since GCC 11.1. Users have had plenty of time to fix their invalid
comparison functions.

libstdc++-v3/ChangeLog:

* include/bits/stl_tree.h (_Rb_tree::_M_key_compare): New member
function to invoke comparison function.
(_Rb_tree): Use new member function instead of accessing the
comparison function directly.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/stl_tree.h | 108 +--
 1 file changed, 50 insertions(+), 58 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h
index 6b35f99a25a..c7352093933 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1390,27 +1390,25 @@ namespace __rb_tree
   _M_end() const _GLIBCXX_NOEXCEPT
   { return this->_M_impl._M_header._M_base_ptr(); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2542. Missing const requirements for associative containers
+  template
+   bool
+   _M_key_compare(const _Key1& __k1, const _Key2& __k2) const
+   {
+#if __cplusplus >= 201103L
+ // Enforce this here with a user-friendly message.
+ static_assert(
+   __is_invocable::value,
+   "comparison object must be invocable with two arguments of key type"
+ );
+# endif // C++17
+ return _M_impl._M_key_compare(__k1, __k2);
+   }
+
   static const _Key&
   _S_key(const _Node& __node)
-  {
-#if __cplusplus >= 201103L
-   // If we're asking for the key we're presumably using the comparison
-   // object, and so this is a good place to sanity check it.
-   static_assert(__is_invocable<_Compare&, const _Key&, const _Key&>{},
- "comparison object must be invocable "
- "with two arguments of key type");
-# if __cplusplus >= 201703L
-   // _GLIBCXX_RESOLVE_LIB_DEFECTS
-   // 2542. Missing const requirements for associative containers
-   if constexpr (__is_invocable<_Compare&, const _Key&, const _Key&>{})
- static_assert(
- is_invocable_v,
- "comparison object must be invocable as const");
-# endif // C++17
-#endif // C++11
-
-   return _KeyOfValue()(*__node._M_valptr());
-  }
+  { return _KeyOfValue()(*__node._M_valptr()); }
 
   static const _Key&
   _S_key(_Base_ptr __x)
@@ -1933,7 +1931,7 @@ namespace __rb_tree
_M_find_tr(const _Kt& __k) const
{
  const_iterator __j(_M_lower_bound_tr(__k));
- if (__j != end() && _M_impl._M_key_compare(__k, _S_key(__j._M_node)))
+ if (__j != end() && _M_key_compare(__k, _S_key(__j._M_node)))
__j = end();
  return __j;
}
@@ -1955,7 +1953,7 @@ namespace __rb_tree
  auto __x = _M_begin();
  auto __y = _M_end();
  while (__x)
-   if (!_M_impl._M_key_compare(_S_key(__x), __k))
+   if (!_M_key_compare(_S_key(__x), __k

[PATCH 2/2] libstdc++: Improve diagnostics for std::packaged_task invocable checks

2025-04-24 Thread Jonathan Wakely

Moving the static_assert that checks is_invocable_r_v into _Task_state
means it is checked when we instantiate that class template.

Replacing the __create_task_state function with a static member function
_Task_state::_S_create ensures we instantiate _Task_state and trigger
the static_assert immediately, not deep inside the implementation of
std::allocate_shared. This results in shorter diagnostics that don't
show deeply-nested template instantiations before the static_assert
failure.

Placing the static_assert at class scope also helps us to fail earlier
than waiting until when the _Task_state::_M_run virtual function is
instantiated. That also makes the diagnostics shorter and easier to read
(although for C++11 and C++14 modes the class-scope static_assert
doesn't check is_invocable_r, so dangling references aren't detected
until _M_run is instantiated).

libstdc++-v3/ChangeLog:

* include/std/future (__future_base::_Task_state): Check
invocable requirement here.
(__future_base::_Task_state::_S_create): New static member
function.
(__future_base::_Task_state::_M_reset): Use _S_create.
(__create_task_state): Remove.
(packaged_task): Use _Task_state::_S_create instead of
__create_task_state.
* testsuite/30_threads/packaged_task/cons/dangling_ref.cc:
Adjust dg-error patterns.
* testsuite/30_threads/packaged_task/cons/lwg4154_neg.cc:
Likewise.
---

Tested x86_64-linux.

 libstdc++-v3/include/std/future   | 66 +--
 .../packaged_task/cons/dangling_ref.cc|  3 +-
 .../packaged_task/cons/lwg4154_neg.cc | 10 +--
 3 files changed, 38 insertions(+), 41 deletions(-)

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index b7ab233b85f..080690064a9 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -1486,12 +1486,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __future_base::_Task_state<_Fn, _Alloc, _Res(_Args...)> final
 : __future_base::_Task_state_base<_Res(_Args...)>
 {
+#ifdef __cpp_lib_is_invocable // C++ >= 17
+  static_assert(is_invocable_r_v<_Res, _Fn&, _Args...>);
+#else
+  static_assert(__is_invocable<_Fn&, _Args...>::value,
+   "_Fn& is invocable with _Args...");
+#endif
+
   template
_Task_state(_Fn2&& __fn, const _Alloc& __a)
: _Task_state_base<_Res(_Args...)>(__a),
  _M_impl(std::forward<_Fn2>(__fn), __a)
{ }
 
+  template
+   static shared_ptr<_Task_state_base<_Res(_Args...)>>
+   _S_create(_Fn2&& __fn, const _Alloc& __a)
+   {
+ return std::allocate_shared<_Task_state>(__a,
+  std::forward<_Fn2>(__fn),
+  __a);
+   }
+
 private:
   virtual void
   _M_run(_Args&&... __args)
@@ -1515,7 +1531,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   virtual shared_ptr<_Task_state_base<_Res(_Args...)>>
-  _M_reset();
+  _M_reset()
+  { return _S_create(std::move(_M_impl._M_fn), _M_impl); }
 
   struct _Impl : _Alloc
   {
@@ -1525,38 +1542,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Fn _M_fn;
   } _M_impl;
 };
-
-  template>
-shared_ptr<__future_base::_Task_state_base<_Signature>>
-__create_task_state(_Fn&& __fn, const _Alloc& __a = _Alloc())
-{
-  typedef typename decay<_Fn>::type _Fn2;
-  typedef __future_base::_Task_state<_Fn2, _Alloc, _Signature> _State;
-  return std::allocate_shared<_State>(__a, std::forward<_Fn>(__fn), __a);
-}
-
-  template
-shared_ptr<__future_base::_Task_state_base<_Res(_Args...)>>
-__future_base::_Task_state<_Fn, _Alloc, _Res(_Args...)>::_M_reset()
-{
-  return __create_task_state<_Res(_Args...)>(std::move(_M_impl._M_fn),
-static_cast<_Alloc&>(_M_impl));
-}
   /// @endcond
 
   /// packaged_task
   template
 class packaged_task<_Res(_ArgTypes...)>
 {
-  typedef __future_base::_Task_state_base<_Res(_ArgTypes...)> _State_type;
+  using _State_type = __future_base::_Task_state_base<_Res(_ArgTypes...)>;
   shared_ptr<_State_type>   _M_state;
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3039. Unnecessary decay in thread and packaged_task
   template>
-   using __not_same
- = typename enable_if::value>::type;
+   using __not_same = __enable_if_t::value>;
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4154. The Mandates for std::packaged_task's constructor
+  // from a callable entity should consider decaying.
+  template>
+   using _Task_state
+ = __future_base::_Task_state<__decay_t<_Fn>, _Alloc,
+  _Res(_ArgTypes...)>;
 
 public:
   // Construction and destruction
@@ -1565,16 +1571,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   te

[RFC] RISC-V: Implment H modifier for printing the next register name

2025-04-24 Thread Jin Ma

For RV32 inline assembly, when handling 64-bit integer data, it is
often necessary to process the lower and upper 32 bits separately.
Unfortunately, we can only output the current register name
(lower 32 bits) but not the next register name (upper 32 bits).

To address this, the modifier 'H' has been added to allow users
to handle the upper 32 bits of the data. While I believe the
modifier 'N' (representing the next register name) might be more
suitable for this functionality, 'N' is already in use.
Therefore, 'H' (representing the high register) was chosen instead.

Does anyone have any comments on this?

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add H.
* doc/extend.texi: Document for H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/modifier-H.c: New test.
---
 gcc/config/riscv/riscv.cc   | 12 +++
 gcc/doc/extend.texi |  1 +
 gcc/testsuite/gcc.target/riscv/modifier-H.c | 22 +
 3 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/modifier-H.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bad59e248d0..4ef96532f35 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6879,6 +6879,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char 
*p)
'T' Print shift-index of inverted single-bit mask OP.
'~' Print w if TARGET_64BIT is true; otherwise not print anything.
'N'  Print register encoding as integer (0-31).
+   'H'  Print the name of the high register for OP, which is the next register.
 
Note please keep this list and the list in riscv.md in sync.  */
 
@@ -7174,6 +7175,17 @@ riscv_print_operand (FILE *file, rtx op, int letter)
asm_fprintf (file, "%u", (regno - offset));
break;
   }
+case 'H':
+  {
+   if (!REG_P (op))
+ {
+   output_operand_lossage ("modifier 'H' require register operand");
+   break;
+ }
+   fputs (reg_names[REGNO (op) + 1], file);
+   break;
+  }
+
 default:
   switch (code)
{
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0978c4c41b2..beda06ecb0b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12585,6 +12585,7 @@ The list below describes the supported modifiers and 
their effects for RISC-V.
 @item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an 
immediate with a value of zero.
 @item @code{i} @tab Print the character ''@code{i}'' if the operand is an 
immediate.
 @item @code{N} @tab Print the register encoding as integer (0 - 31).
+@item @code{H} @tab Print the name of the high register for OP, which is the 
next register.
 @end multitable
 
 @anchor{shOperandmodifiers}
diff --git a/gcc/testsuite/gcc.target/riscv/modifier-H.c 
b/gcc/testsuite/gcc.target/riscv/modifier-H.c
new file mode 100644
index 000..095af79c0a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/modifier-H.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+typedef long long __int64;
+
+__int64 foo ()
+{
+/*
+** foo:
+**   ...
+**   li\t[atx][0-9]+,1
+**   li\t[atx][0-9]+,1
+**   ...
+*/
+  __int64 ret;
+  asm ("li\t%0,1\n\tli\t%H0,1\n\t":"=r"(ret));
+
+  return ret;
+}
+
-- 
2.25.1

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Jan Hubicka

> On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka  wrote:
> >
> > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > or vpandn.
> > > Current register_operand/vector_operand could lose some optimization
> > > opportunity.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > > predicate.
> > >   (nonimm_or_0_or_1s_operand): Ditto.
> > >   * config/i386/sse.md (vcond_mask_):
> > >   Extend the predicate of operands1 to accept 0 or allones
> > >   operands.
> > >   (vcond_mask_): Ditto.
> > >   (vcond_mask_v1tiv1ti): Ditto.
> > >   (vcond_mask_): Ditto.
> > >   * config/i386/i386.md (movcc): Ditto for operands[2] and
> > >   operands[3].
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/i386/blendv-to-maxmin.c: New test.
> > >   * gcc.target/i386/blendv-to-pand.c: New test.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > new file mode 100644
> > > index 000..042eb7d8f24
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > +
> > > +double
> > > +foo (double a)
> > > +{
> > > +  if (a > 0.0)
> > > +return a;
> > > +  return 0.0;
> > > +}
> >
> > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> > explicitely documented as unspecified when one of parameters is nan.
> >
> > So without -ffast-math at combine time we see:
> > (insn 6 3 7 2 (set (reg:DF 103)
> > (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> >  (nil))
> > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> > (unspec:DF [
> > (reg:DF 104 [ a ])
> > (reg:DF 103)
> > ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> >  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> > (expr_list:REG_DEAD (reg:DF 103)
> > (nil
> >
> > maxss is defined as:
> >
> > MAX(SRC1, SRC2)
> > {
> > IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> > ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> > ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> > ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> > ELSE DEST := SRC2;
> > FI;
> > }
> 
> Please see [1], "Maximum and minimum functions", which says:
> 
> "The maxNum and minNum functions defined in the 2008 standard
> propagate a non-NaN when one input is NaN and the other input is a
> normal number.
> 
> This problem will be fixed by the forthcoming revision of the
> standard. The new functions named maximum and minimum are certain to
> propagate NaNs.
> Some current implementations are deviating from both of these
> definitions. Max and min instructions in the x86 instruction set are
> implemented so that max(a,b) and min(a,b) give b if one of the inputs
> is NaN. This is useful because it corresponds to the behavior of the
> code expression a > b ? a : b. A compiler can translate this common
> high-level language expression into a single instruction."
> 
> Unfortunately, SSE max and min instructions are incompatible with both
> standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> FI;"

I may be missing something obvious, but it still seems to me that RTL's

  (if_then_else:SF
 (gt (reg:SF SRC1) (reg:SF SRC2))
 (reg:SF SRC1)
 (reg:SF SRC2))

Behaves same as the sse's max spec:

MAX(SRC1, SRC2)
{
IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
ELSE DEST := SRC2;
FI;
}

1) If SRC1 and SRC2 are both 0 (possibly with different signs) then GT is false 
and we pick SRC2
2) If SRC1 or SRC2 is NaN then GT is false and we pick SRC2
3) if SRC1 > SRC2 then GT is true and we pick SRC1
otherwise GT is false and we pick SRC2

And thus it may be more RTL friendly to represent it this way instead of
current unspec called UNSPEC_IEEE_MAX

Honza

[PATCH] Refactor msse4 and mno-sse4.

2025-04-24 Thread liuhongt

This is originally from [1]

For the command line, or target attribute, the actual operation goes
into ix86_handle_option, and as long as we get it right in this
ix86_handle_option, everything else should be fine.
As for the macros generated by the mask name (TARGET_SSE4_1_P), their
meanings remain the same, so in my opinion they can be handled this
way. But to be on the safe side, I agree that it should be adjusted in
ix86_valid_target_attribute_inner_p first.
In GCC16, we can do a refactor for this.
--

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-April/679880.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Refactor msse4 and mno-sse4.
* config/i386/i386.opt (msse4): Remove RejectNegative.
(mno-sse4): Remove the entry.
* config/i386/i386-expand.cc
(ix86_valid_target_attribute_inner_p): Remove special code
which handles mno-sse4.
---
 gcc/common/config/i386/i386-common.cc | 23 ---
 gcc/config/i386/i386-options.cc   |  7 ---
 gcc/config/i386/i386.opt  |  6 +-
 3 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 4815fbc4d35..296df3b3230 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1519,17 +1519,18 @@ ix86_handle_option (struct gcc_options *opts,
   return true;
 
 case OPT_msse4:
-  gcc_assert (value != 0);
-  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
-  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
-  return true;
-
-case OPT_mno_sse4:
-  gcc_assert (value != 0);
-  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET;
-  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET;
-  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET;
-  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_SSE4_UNSET;
+  if (value)
+   {
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
+   }
+  else
+   {
+ opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET;
+ opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET;
+ opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_SSE4_UNSET;
+   }
   return true;
 
 case OPT_msse4a:
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 964449fa8cd..45aa9b4b732 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1271,13 +1271,6 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
args, char *p_strings[],
}
}
 
-  /* Fixup -msse4 which is RejectNegative to -mno-sse4 when negated.  */
-  if (opt == OPT_msse4 && !opt_set_p)
-   {
- opt = OPT_mno_sse4;
- opt_set_p = true;
-   }
-
   /* Process the option.  */
   if (opt == N_OPTS)
{
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 27d34bd64ea..0abf13480f5 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -721,13 +721,9 @@ Target Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and 
code generation.
 
 msse4
-Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
+Target Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and 
code generation.
 
-mno-sse4
-Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
-Do not support SSE4.1 and SSE4.2 built-in functions and code generation.
-
 msse5
 Target Undocumented Alias(mavx) Warn(%<-msse5%> was removed)
 ;; Deprecated
-- 
2.34.1

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Jan Hubicka

> Note for blendv, it checks the significant bit of the mask, not simple
>  if_then_else
>   mask
>   if_true 
>   if_false
> 
> It should be 
> if_then_else
>ashiftrt mask 31
>if_true
>if_false

I think canonical form (produced by combine) would be

if_then_else
  ge mask 0
  if_false
  if_true
> 
> Maybe not very useful in practice, just like why there's UNSPEC_FMADDSUB
> 
> 6334
>  6335;; It would be possible to represent these without the UNSPEC as
>  6336;;
>  6337;; (vec_merge
>  6338;;   (fma op1 op2 op3)
>  6339;;   (fma op1 op2 (neg op3))
>  6340;;   (merge-const))
>  6341;;
>  6342;; But this doesn't seem useful in practice.

I am not so sure about this when it come to relatively common
instructions.  Hiding things in unspec prevents combine and other RTL
passes from doing their job. I would say that it only makes sense for
siutations where RTL equivalent is very inconvenient.

I noticed that we miss other optimizations of conditional moves. For
example:

int a[1000];
int b[1000];

int test()
{
for (int i = 0; i < 1000; i++)
a[i] = b[i] > 10 ? 2 : 3;
}

is compiled by clang to:
pcmpgtd %xmm0, %xmm2
paddd   %xmm1, %xmm2

while we do
pcmpgtd %xmm4, %xmm0
pand%xmm0, %xmm1
pandn   %xmm2, %xmm0
por %xmm1, %xmm0

Here I guess combine is out of luck. I fails to simplify the three
logcal operations:

Trying 17, 16 -> 18:
   17: r112:V4SI=~r110:V4SI&r104:V4SI
  REG_DEAD r110:V4SI
   16: r111:V4SI=r107:V4SI&r110:V4SI
   18: r100:V4SI=r112:V4SI|r111:V4SI
  REG_DEAD r112:V4SI
  REG_DEAD r111:V4SI
Failed to match this instruction:
(set (reg:V4SI 100 [ vect_iftmp.10 ])
(ior:V4SI (and:V4SI (not:V4SI (reg:V4SI 110))
(reg:V4SI 104))
(and:V4SI (reg:V4SI 107)
(reg:V4SI 110

Here reg 110 is set by compare:

(insn 15 14 16 3 (set (reg:V4SI 110)
(gt:V4SI (reg:V4SI 109 [ MEM  [(int *)&b + 
ivtmp.17_13 * 1] ])
(reg:V4SI 104))) 6997 {*sse2_gtv4si3}
 (expr_list:REG_DEAD (reg:V4SI 109 [ MEM  [(int *)&b 
+ ivtmp.17_13 * 1] ])
(nil)))

and I think it misses the fact that the mask is either all 0 or all 1
for each lane (which is a value rango info it does not track).

Similarly one can simplify i.e.
a[i] = b[i] > 10 ? 2 : 5;
into and and or...

Honza

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Jan Hubicka

> > And thus it may be more RTL friendly to represent it this way instead of
> > current unspec called UNSPEC_IEEE_MAX...
> 
> There's a patch proposed for that [1], and Jakub has some comments.
> 
> Jakub Jelinek  于2024年11月15日周五 16:20写道：
> >
> > On Fri, Nov 15, 2024 at 04:04:55PM +0800, Hongyu Wang wrote:
> > > Following the discussion in pr116738, the insn for UNSPEC_IEEE_MAXMIN
> > > actually matches the behavior of if_then_else, so remove the UNSPEC and
> > > rewrite related pattern with if_then_else.
> >
> > I'm not sure if it is necessarily always a win.
> > It has an advantage that simplify-rtx can simplify it if the operands are
> > constant and better describes what's going on.
> > The disadvantage I see is that it has multiple uses of the operands, each
> > operand is used twice rather than once in the pattern, which could lead to
> > combiner or other optimizations giving up on it.
> >
> > So, do you see any code generation changes with this patch on something
> > larger?
> 
> with Hongyu's replied
> ---
> I ran spec2017 with O2 config and found some bad changes such as vmaxsd
> does not use mem operands. The reload creates extra load for the new pattern.
> So maybe we need to preserve current UNSPEC.
> ---

Thanks, that is a fair point I did not think of.
In a way I wonder if this justifies adding the IEEE and C-style floting
point min/max as native RTL codes (to rtl.def). Support for SMIN/SMAX in
RTL land seems rather easy.

Honza

Re: [PATCH] RISC-V: Imply C from Zca whenever possible [PR119122]

2025-04-24 Thread Jeff Law





On 3/5/25 5:05 AM, Yuriy Kolerov wrote:

GCC must imply C extension from Zca extension when it's
possible. It's necessary for achieving compatibility
between different march strings which in fact may be
the same.

E.g., if rv32ic multilib configuration is presented in
GCC, then GCC will not choose this configuration for
linking if -march=rv32i_zca is passed.

Here is a more practical example. From RISC-V
Instruction Set Manual:

 Therefore common ISA strings can be updated as follows
 to include the relevant Zc extensions, for example:
 - RV32IMC becomes RV32IM_Zce
 - RV32IMCF becomes RV32IMF_Zce

With current implication rules this will not work well
if rv32imc configuration is presented and a user
passes -march=rv32im_zce. This is how we can check
this with a simple empty test.c source file:

$ riscv64-unknown-elf-gcc -march=rv32ic -mabi=ilp32 -mriscv-attribute -S test.c
$ grep "attribute arch" test.s
 .attribute arch, "rv32i2p1_c2p0_zca1p0"
$ riscv64-unknown-elf-gcc -march=rv32i_zce -mabi=ilp32 -mriscv-attribute -S 
test.c
$ grep "attribute arch" test.s
 .attribute arch, 
"rv32i2p1_zicsr2p0_zca1p0_zcb1p0_zce1p0_zcmp1p0_zcmt1p0"

According to current GCC these march strings are
incompatible: the first one contains c2p0 and the
second on doesn't.

To introduce such implication rule we need to carefully
cover all possible combinations with these extensions:
zca, zcf, zcd, F and D.

According to the same manual:

 As C defines the same instructions as Zca, Zcf and
 Zcd, the rule is that:
 - C always implies Zca
 - C+F implies Zcf (RV32 only)
 - C+D implies Zcd

Here is a full list of cases:

 1. rv32i_zca implies C.
 2. rv32if_zca_zcf implies C.
 3. rv32ifd_zca_zcf_zcd implies C.
 4. rv64i_zca implies C.
 5. rv64ifd_zca_zcd implies C.

PR target/119122

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: add a rule for
Zca to C implication.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-25.c: Fix dg-error expectation.
* gcc.target/riscv/attribute-c-1.c: New test.
* gcc.target/riscv/attribute-c-2.c: New test.
* gcc.target/riscv/attribute-c-3.c: New test.
* gcc.target/riscv/attribute-c-4.c: New test.
* gcc.target/riscv/attribute-c-5.c: New test.
* gcc.target/riscv/attribute-c-6.c: New test.
* gcc.target/riscv/attribute-c-7.c: New test.
* gcc.target/riscv/attribute-c-8.c: New test.
* gcc.target/riscv/attribute-zce-1.c: Update Zce tests.
* gcc.target/riscv/attribute-zce-2.c: Likewise.
* gcc.target/riscv/attribute-zce-3.c: Likewise
* gcc.target/riscv/attribute-zce-4.c: Likewise.

Thanks.  I've pushed this to the trunk.

jeff

Re: [PATCH 1/2] arc: Add commutative multiplication patterns.

2025-04-24 Thread Claudiu Zissulescu Ianculescu

Hi Jeff,

Indeed, Luis should have been using "umulti". The other attributes are
not required. I'll fix it before pushing to the mainline.

Thanks,
Claudiu

On Fri, Apr 18, 2025 at 8:41 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:22 AM, Luis Silva wrote:
> > This patch introduces two new instruction patterns:
> >
> >  `*mulsi3_cmp0`:  This pattern performs a multiplication
> >  and sets the CC_Z register based on the result, while
> >  also storing the result of the multiplication in a
> >  general-purpose register.
> >
> >  `*mulsi3_cmp0_noout`:  This pattern performs a
> >  multiplication and sets the CC_Z register based on the
> >  result without storing the result in a general-purpose
> >  register.
> >
> > These patterns are optimized to generate code using the `mpy.f`
> > instruction, specifically used where the result is compared to zero.
> >
> > In addition, the previous commutative multiplication implementation
> > was removed.  It incorrectly took into account the negative flag,
> > which is wrong.  This new implementation only considers the zero
> > flag.
> >
> > A test case has been added to verify the correctness of these
> > changes.
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.cc (arc_select_cc_mode): Handle multiplication
> >  results compared against zero, selecting CC_Zmode.
> >  * config/arc/arc.md (*mulsi3_cmp0): New define_insn.
> >  (*mulsi3_cmp0_noout): New define_insn.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/arc/mult-cmp0.c: New test.
> So I'm not well versed in the ARC port, but a couple questions.
>
> First your new patterns use a new type "mpy".  Do you want/need to add
> that to the pipeline descriptions?  It would seem advisable to do so.
>
> Do the new patterns need to set "cond" and "predicable" attributes?
>
> Jeff

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Uros Bizjak

On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka  wrote:
>
> > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > or vpandn.
> > Current register_operand/vector_operand could lose some optimization
> > opportunity.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > predicate.
> >   (nonimm_or_0_or_1s_operand): Ditto.
> >   * config/i386/sse.md (vcond_mask_):
> >   Extend the predicate of operands1 to accept 0 or allones
> >   operands.
> >   (vcond_mask_): Ditto.
> >   (vcond_mask_v1tiv1ti): Ditto.
> >   (vcond_mask_): Ditto.
> >   * config/i386/i386.md (movcc): Ditto for operands[2] and
> >   operands[3].
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/blendv-to-maxmin.c: New test.
> >   * gcc.target/i386/blendv-to-pand.c: New test.
>
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > new file mode 100644
> > index 000..042eb7d8f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > +
> > +double
> > +foo (double a)
> > +{
> > +  if (a > 0.0)
> > +return a;
> > +  return 0.0;
> > +}
>
> With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> explicitely documented as unspecified when one of parameters is nan.
>
> So without -ffast-math at combine time we see:
> (insn 6 3 7 2 (set (reg:DF 103)
> (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
>  (nil))
> (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> (unspec:DF [
> (reg:DF 104 [ a ])
> (reg:DF 103)
> ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
>  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> (expr_list:REG_DEAD (reg:DF 103)
> (nil
>
> maxss is defined as:
>
> MAX(SRC1, SRC2)
> {
> IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> ELSE DEST := SRC2;
> FI;
> }

Please see [1], "Maximum and minimum functions", which says:

"The maxNum and minNum functions defined in the 2008 standard
propagate a non-NaN when one input is NaN and the other input is a
normal number.

This problem will be fixed by the forthcoming revision of the
standard. The new functions named maximum and minimum are certain to
propagate NaNs.
Some current implementations are deviating from both of these
definitions. Max and min instructions in the x86 instruction set are
implemented so that max(a,b) and min(a,b) give b if one of the inputs
is NaN. This is useful because it corresponds to the behavior of the
code expression a > b ? a : b. A compiler can translate this common
high-level language expression into a single instruction."

Unfortunately, SSE max and min instructions are incompatible with both
standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
FI;"

[1] 
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/nan-propagation.pdf

Uros.

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Qing Zhao



> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
>> Hi, 
>> 
>> Kees reported a segmentation failure when he used the patch to compiler 
>> kernel, 
>> and the reduced the testing case is something like the following:
>> 
>> struct f {
>> void *g __attribute__((__counted_by__(h)));
>> long h;
>> };
>> 
>> extern struct f *my_alloc (int);
>> 
>> int i(void) {
>> struct f *iov = my_alloc (10);
>> int *j = (int *)iov->g;
>> return __builtin_dynamic_object_size(iov->g, 0);
>> }
>> 
>> Basically, the problem is relating to the pointee type of the pointer array 
>> being “void”, 
>> As a result, the element size of the array is not available in the IR. 
>> Therefore segmentation
>> fault when calculating the size of the whole object. 
>> 
>> Although it’s easy to fix this segmentation failure, I am not quite sure 
>> what’s the best
>> solution to this issue:
>> 
>> 1. Reject such usage of “counted_by” in the very beginning by reporting 
>> warning to the
>> User, and delete the counted_by attribute from the field.
>> 
>> Or:
>> 
>> 2. Accept such usage, but issue warnings when calculating the object_size in 
>> Middle-end.
>> 
>> Personally, I prefer the above 1 since I think that when the pointee type is 
>> void, we don’t know
>> The type of the element of the pointer array, there is no way to decide the 
>> size of the pointer array. 
>> 
>> So, the counted_by information is not useful for the 
>> __builtin_dynamic_object_size.
>> 
>> But I am not sure whether the counted_by still can be used for bound 
>> sanitizer?
>> 
>> Thanks for suggestions and help.
> 
> GNU C allows pointer arithmetic and sizeof on void pointers and
> that treats void as having size 1.  So you could also allow counted_by
> and assume as size 1 for void.
> 
> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html

Okay, thanks for the info.
So, 
1. should we issue warnings when doing this?
2. If the compilation option is explicitly asking for standard C,
shall we issue warning and delete the counted_by attribute from the field?

Qing
> 
> 
> Martin
> 
>> 
>> Qing
>> 
>> 
>> 
>> 
>>> On Apr 23, 2025, at 15:45, Qing Zhao  wrote:
>>> 
>>> Hi,
>>> 
>>> This is the 2nd version of the patch set to extend "counted_by" attribute
>>> to pointer fields of structures.
>>> 
>>> the first version was submitted 3 months ago on 1/16/2025, and triggered
>>> a lot of discussion on whether we need a new syntax for counted_by
>>> attribute.
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
>>> 
>>> After a long discussion since then: 
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
>>> 
>>> We agreed to the following compromised solution:
>>> 
>>> 1. Keep the current syntax of counted_by for lone identifier;
>>> 2. Add a new attribute "counted_by_exp" for expressions.
>>> 
>>> Although there are still some discussion going on for the new 
>>> counted_by_exp attribute (In Clang community) 
>>> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
>>> 
>>> The syntax for the lone identifier is kept the same as before.
>>> 
>>> So, I'd like to resubmit my previous patch of extending "counted_by"
>>> to pointer fields of structures. 
>>> 
>>> The whole patch set has been rebased on the latest trunk, some testing case
>>> adjustment,  bootstrapped  and regression tested on both aarch64 and x86.
>>> 
>>> There will be a seperate patch set for the new "counted_by_exp" 
>>> attribute later to cover the expressions cases.
>>> 
>>> The following are more details on this patch set:
>>> 
>>> For example:
>>> 
>>> struct PP {
>>> size_t count2;
>>> char other1;
>>> char *array2 __attribute__ ((counted_by (count2)));
>>> int other2;
>>> } *pp;
>>> 
>>> specifies that the "array2" is an array that is pointed by the
>>> pointer field, and its number of elements is given by the field
>>> "count2" in the same structure.
>>> 
>>> There are the following importand facts about "counted_by" on pointer
>>> fields compared to the "counted_by" on FAM fields:
>>> 
>>> 1. one more new requirement for pointer fields with "counted_by" attribute:
>>>  pp->array2 and pp->count2 can ONLY be changed by changing the whole 
>>> structure
>>>  at the same time.
>>> 
>>> 2. the following feature for FAM field with "counted_by" attribute is NOT
>>>  valid for the pointer field any more:
>>> 
>>>   " One important feature of the attribute is, a reference to the
>>>flexible array member field uses the latest value assigned to the
>>>field that represents the number of the elements before that
>>>reference.  For example,
>>> 
>>>   p->count = val1;
>>>   p->array[20] = 0;  // ref1 to p->array
>>>   p->count = val2;
>>>   p->array[30] = 0;  // ref2 to p->array
>>> 
>>>in the above, 'ref1' uses 'val1' as the number of the elements in
>>>'p->array', and 'ref2' uses 'val2' as the number of

Fix ICE building deepsjeng with -fprofile-use

2025-04-24 Thread Jan Hubicka

Hi,
the problem here is division by zero, since adjusted 0 > precise 0. Fixed by
using right test.

gcc/ChangeLog:

PR ipa/119924
* ipa-cp.cc (update_counts_for_self_gen_clones): Use nonzero_p.
(update_profiling_info): Likewise.
(update_specialized_profile): Likewise.

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index abde64b6f29..b4b96997d75 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4639,7 +4639,7 @@ update_counts_for_self_gen_clones (cgraph_node *orig_node,
   const vec &self_gen_clones)
 {
   profile_count redist_sum = orig_node->count.ipa ();
-  if (!(redist_sum > profile_count::zero ()))
+  if (!redist_sum.nonzero_p ())
 return;
 
   if (dump_file)
@@ -4710,7 +4710,7 @@ update_counts_for_self_gen_clones (cgraph_node *orig_node,
  it.  */
   for (cgraph_node *n : self_gen_clones)
 {
-  if (!(n->count.ipa () > profile_count::zero ()))
+  if (!n->count.ipa ().nonzero_p ())
continue;
 
   desc_incoming_count_struct desc;
@@ -4756,7 +4756,7 @@ update_profiling_info (struct cgraph_node *orig_node,
   profile_count new_sum;
   profile_count remainder, orig_node_count = orig_node->count.ipa ();
 
-  if (!(orig_node_count > profile_count::zero ()))
+  if (!orig_node_count.nonzero_p ())
 return;
 
   if (dump_file)
@@ -4920,7 +4920,7 @@ update_specialized_profile (struct cgraph_node *new_node,
   orig_node_count.dump (dump_file);
   fprintf (dump_file, "\n");
 }
-  if (!(orig_node_count > profile_count::zero ()))
+  if (!orig_node_count.nonzero_p ())
 return;
 
   new_node_count = new_node->count;

Re: [PATCH] c: Allow $@` in GNU23/GNU2Y raw string delimiters [PR110343]

2025-04-24 Thread Marek Polacek

On Wed, Apr 16, 2025 at 09:25:00PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Aaron mentioned in the PR that late in C23 N3124 was adopted and
> $@` are now part of basic character set.  The paper has been implemented
> in GCC from what I can see, but we should allow for GNU23/2Y $@` in
> raw string delimiters as well, like they are allowed for C++26, because
> the delimiters can contain anything from basic character set but space,
> ()\, tab, form-feed, newline and backspace.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2025-04-16  Jakub Jelinek  
> 
>   PR c++/110343
>   * lex.cc (lex_raw_string): For C allow $@` in raw string delimiters
>   if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later.
> 
>   * gcc.dg/raw-string-1.c: New test.
> 
> --- libcpp/lex.cc.jj  2025-04-08 14:09:47.173503355 +0200
> +++ libcpp/lex.cc 2025-04-16 17:18:04.556931275 +0200
> @@ -2711,8 +2711,9 @@ lex_raw_string (cpp_reader *pfile, cpp_t
>  || c == '!' || c == '=' || c == ','
>  || c == '"' || c == '\''
>  || ((c == '$' || c == '@' || c == '`')
> -&& CPP_OPTION (pfile, cplusplus)
> -&& CPP_OPTION (pfile, lang) > CLK_CXX23)))
> +&& (CPP_OPTION (pfile, cplusplus)
> +? CPP_OPTION (pfile, lang) > CLK_CXX23
> +: CPP_OPTION (pfile, low_ucns)
>   prefix[prefix_len++] = c;
> else
>   {
> --- gcc/testsuite/gcc.dg/raw-string-1.c.jj2025-04-16 17:32:20.595541753 
> +0200
> +++ gcc/testsuite/gcc.dg/raw-string-1.c   2025-04-16 17:35:09.266302136 
> +0200
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu23" } */

Maybe mention PR c++/110343?

I think the patch is OK though.  Thanks.
 
> +const void *s0 = R"0123456789abcdefg()0123456789abcdefg" 0;
> + /* { dg-error "raw string delimiter longer" "longer" { target *-*-* } 
> .-1 } */
> + /* { dg-error "stray" "stray" { target *-*-* } .-2 } */
> +const void *s1 = R" () " 0;
> + /* { dg-error "invalid character" "invalid" { target *-*-* } .-1 } */
> + /* { dg-error "stray" "stray" { target *-*-* } .-2 } */
> +const void *s2 = R"  ()  " 0;
> + /* { dg-error "invalid character" "invalid" { target *-*-* } .-1 } */
> + /* { dg-error "stray" "stray" { target *-*-* } .-2 } */
> +const void *s3 = R")())" 0;
> + /* { dg-error "invalid character" "invalid" { target *-*-* } .-1 } */
> + /* { dg-error "stray" "stray" { target *-*-* } .-2 } */
> +const char *s4 = R"@()@";
> +const char *s5 = R"$()$";
> +const char *s6 = R"`()`";
> +const void *s7 = R"\u0040()\u0040" 0;
> + /* { dg-error "invalid character" "invalid" { target *-*-* } .-1 } */
> + /* { dg-error "stray" "stray" { target *-*-* } .-2 } */
> +const char *s8 = R"`@$$@`@`$()`@$$@`@`$";
> +
> +int main () {}

Marek

[PATCH] testsuite: Add require target for SJLJ exception implementation

2025-04-24 Thread Dimitar Dimitrov

Testcases for musttail call optimization fail on pru-unknown-elf:
  FAIL: c-c++-common/musttail14.c  -std=gnu++17 (test for excess errors)
  Excess errors:
  .../gcc/gcc/testsuite/c-c++-common/musttail14.c:37:14: error: cannot 
tail-call: caller uses sjlj exceptions

Silence these errors by disabling the tests if target uses SJLJ for
implementing exceptions.  Use a new effective target check for this.

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Ok for trunk?

gcc/testsuite/ChangeLog:

* c-c++-common/musttail14.c: Disable test if effective target
using_sjlj_exceptions.
* c-c++-common/musttail22.c: Ditto.
* g++.dg/musttail8.C: Ditto.
* g++.dg/musttail9.C: Ditto.
* g++.dg/opt/musttail3.C: Ditto.
* g++.dg/opt/musttail4.C: Ditto.
* g++.dg/opt/musttail5.C: Ditto.
* g++.dg/opt/pr119613.C: Ditto.
* lib/target-supports.exp
(check_effective_target_using_sjlj_exceptions): New check.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/c-c++-common/musttail14.c |  2 +-
 gcc/testsuite/c-c++-common/musttail22.c |  2 +-
 gcc/testsuite/g++.dg/musttail8.C|  2 +-
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail3.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail4.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail5.C|  2 +-
 gcc/testsuite/g++.dg/opt/pr119613.C |  2 +-
 gcc/testsuite/lib/target-supports.exp   | 12 
 9 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/musttail14.c 
b/gcc/testsuite/c-c++-common/musttail14.c
index 56a52b8638b..5bda742b36f 100644
--- a/gcc/testsuite/c-c++-common/musttail14.c
+++ b/gcc/testsuite/c-c++-common/musttail14.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/118430 */
-/* { dg-do compile { target musttail } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = bar \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = freddy \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/musttail22.c 
b/gcc/testsuite/c-c++-common/musttail22.c
index eb812494f44..7dc0f199ea9 100644
--- a/gcc/testsuite/c-c++-common/musttail22.c
+++ b/gcc/testsuite/c-c++-common/musttail22.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/118430 */
-/* { dg-do compile { target musttail } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = bar \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = freddy \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
diff --git a/gcc/testsuite/g++.dg/musttail8.C b/gcc/testsuite/g++.dg/musttail8.C
index 0f1b68bd269..18de9c87935 100644
--- a/gcc/testsuite/g++.dg/musttail8.C
+++ b/gcc/testsuite/g++.dg/musttail8.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { musttail } } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/g++.dg/musttail9.C b/gcc/testsuite/g++.dg/musttail9.C
index 85937dcdcd3..1c3a744a4e4 100644
--- a/gcc/testsuite/g++.dg/musttail9.C
+++ b/gcc/testsuite/g++.dg/musttail9.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { musttail } } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/g++.dg/opt/musttail3.C 
b/gcc/testsuite/g++.dg/opt/musttail3.C
index 1c4e54952b1..a2db4479ec1 100644
--- a/gcc/testsuite/g++.dg/opt/musttail3.C
+++ b/gcc/testsuite/g++.dg/opt/musttail3.C
@@ -1,5 +1,5 @@
 // PR tree-optimization/119491
-// { dg-do compile { target { external_musttail && c++11 } } }
+// { dg-do compile { target { external_musttail && { c++11 && { ! 
using_sjlj_exceptions } } } } }
 // { dg-options "-O2" }
 
 struct A {
diff --git a/gcc/testsuite/g++.dg/opt/musttail4.C 
b/gcc/testsuite/g++.dg/opt/musttail4.C
index ede2959f7d7..3362ccc5e01 100644
--- a/gcc/testsuite/g++.dg/opt/musttail4.C
+++ b/gcc/testsuite/g++.dg/opt/musttail4.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target { external_musttail && c++11 } } }
+// { dg-do compile { target { external_musttail && { c++11 && { ! 
using_sjlj_exceptions } } } } }
 // { dg-options "-O2 -fexceptions" }
 
 struct S { ~S (); };
diff --git a/gcc/testsuite/g++.dg/opt/musttail5.C 
b/gcc/testsuite/g++.dg/opt/musttail5.C
index 604dd6907aa..10e8d940dbb 100644
--- a/

[PATCH] testsuite: Skip tests incompatible with generic thunk support

2025-04-24 Thread Dimitar Dimitrov

Some backends do not define TARGET_ASM_OUTPUT_MI_THUNK.  But the generic
thunk support cannot emit code for calling variadic methods of
multiple-inheritance classes.  Example error for pru-unknown-elf:

 .../gcc/gcc/testsuite/g++.dg/ipa/pr83549.C:7:24: error: generic thunk code 
fails for method 'virtual void C::_ZThn4_N1C3fooEz(...)' which uses '...'

Disable the affected tests for all targets which do not define
TARGET_ASM_OUTPUT_MI_THUNK.

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Ok for trunk?

gcc/testsuite/ChangeLog:

* g++.dg/ipa/pr83549.C: Require effective target
variadic_mi_thunk.
* g++.dg/ipa/pr83667.C: Ditto.
* g++.dg/torture/pr81812.C: Ditto.
* g++.old-deja/g++.jason/thunk3.C: Ditto.
* lib/target-supports.exp
(check_effective_target_variadic_mi_thunk): New function.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/g++.dg/ipa/pr83549.C|  1 +
 gcc/testsuite/g++.dg/ipa/pr83667.C|  1 +
 gcc/testsuite/g++.dg/torture/pr81812.C|  1 +
 gcc/testsuite/g++.old-deja/g++.jason/thunk3.C |  3 +-
 gcc/testsuite/lib/target-supports.exp | 31 +++
 5 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/ipa/pr83549.C 
b/gcc/testsuite/g++.dg/ipa/pr83549.C
index 90cf8fe7e0d..3b4547b71df 100644
--- a/gcc/testsuite/g++.dg/ipa/pr83549.C
+++ b/gcc/testsuite/g++.dg/ipa/pr83549.C
@@ -1,5 +1,6 @@
 // PR ipa/83549
 // { dg-do compile }
+// { dg-require-effective-target variadic_mi_thunk }
 // { dg-options "-O2" }
 
 struct A { virtual ~A (); };
diff --git a/gcc/testsuite/g++.dg/ipa/pr83667.C 
b/gcc/testsuite/g++.dg/ipa/pr83667.C
index a8a5a5adb3a..ec91a2ebd33 100644
--- a/gcc/testsuite/g++.dg/ipa/pr83667.C
+++ b/gcc/testsuite/g++.dg/ipa/pr83667.C
@@ -1,6 +1,7 @@
 // { dg-require-alias "" }
 // { dg-options "-fdump-ipa-inline" }
 // c++/83667 ICE dumping a static thunk when TARGET_USE_LOCAL_THUNK_ALIAS_P
+// { dg-require-effective-target variadic_mi_thunk }
 
 
 struct a
diff --git a/gcc/testsuite/g++.dg/torture/pr81812.C 
b/gcc/testsuite/g++.dg/torture/pr81812.C
index b5c621d2beb..80aed8eb2b6 100644
--- a/gcc/testsuite/g++.dg/torture/pr81812.C
+++ b/gcc/testsuite/g++.dg/torture/pr81812.C
@@ -1,4 +1,5 @@
 // { dg-xfail-if "PR108277" { arm_thumb1 } }
+// { dg-require-effective-target variadic_mi_thunk }
 
 struct Error {
   virtual void error(... ) const;
diff --git a/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C 
b/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
index 4e684f9367d..e894194db1f 100644
--- a/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
+++ b/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
@@ -1,5 +1,6 @@
 // { dg-do run }
-// { dg-skip-if "fails with generic thunk support" { rs6000-*-* powerpc-*-eabi 
v850-*-* sh-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* } }
+// { dg-skip-if "fails with generic thunk support" { rs6000-*-* powerpc-*-eabi 
sh-*-* xtensa*-*-* } }
+// { dg-require-effective-target variadic_mi_thunk }
 // Test that variadic function calls using thunks work right.
 // Note that this will break on any target that uses the generic thunk
 //  support, because it doesn't support variadic functions.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 16bb2ae4426..4ccac18accc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -659,6 +659,37 @@ proc check_effective_target_trampolines { } {
 return 1
 }
 
+# Return 1 if target supports calling variadic methods
+# of multi-inheritance classes.
+
+proc check_effective_target_variadic_mi_thunk { } {
+# These targets do not implement TARGET_ASM_OUTPUT_MI_THUNK.
+if { [istarget avr-*-*]
+|| [istarget bpf-*-*]
+|| [istarget fr30-*-*]
+|| [istarget ft32-*-*]
+|| [istarget amdgcn-*-*]
+|| [istarget h8300-*-*]
+|| [istarget iq2000-*-*]
+|| [istarget lm32-*-*]
+|| [istarget m32c-*-*]
+|| [istarget m32r-*-*]
+|| [istarget mcore-*-*]
+|| [istarget moxie-*-*]
+|| [istarget msp430-*-*]
+|| [istarget nvptx-*-*]
+|| [istarget pdp11-*-*]
+|| [istarget pru-*-*]
+|| [istarget rl78-*-*]
+|| [istarget rx-*-*]
+|| [istarget v850-*-*]
+|| [istarget visium-*-*] } {
+
+   return 0;
+}
+return 1
+}
+
 # Return 1 if target has limited stack size.
 
 proc check_effective_target_stack_size { } {
-- 
2.49.0

Re: [PATCH] testsuite: Skip tests incompatible with generic thunk support

2025-04-24 Thread Jeff Law





On 4/24/25 12:23 PM, Dimitar Dimitrov wrote:

Some backends do not define TARGET_ASM_OUTPUT_MI_THUNK.  But the generic
thunk support cannot emit code for calling variadic methods of
multiple-inheritance classes.  Example error for pru-unknown-elf:

  .../gcc/gcc/testsuite/g++.dg/ipa/pr83549.C:7:24: error: generic thunk code 
fails for method 'virtual void C::_ZThn4_N1C3fooEz(...)' which uses '...'

Disable the affected tests for all targets which do not define
TARGET_ASM_OUTPUT_MI_THUNK.

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Ok for trunk?

gcc/testsuite/ChangeLog:

* g++.dg/ipa/pr83549.C: Require effective target
variadic_mi_thunk.
* g++.dg/ipa/pr83667.C: Ditto.
* g++.dg/torture/pr81812.C: Ditto.
* g++.old-deja/g++.jason/thunk3.C: Ditto.
* lib/target-supports.exp
(check_effective_target_variadic_mi_thunk): New function.

OK.
jeff

Re: [PATCH v2 2/3] RISC-V: Adjust the testcases after vec_duplicate + vadd.vv combine

2025-04-24 Thread Jeff Law





On 4/19/25 5:24 AM, pan2...@intel.com wrote:

From: Pan Li 

After we support the vec_duplicate + vadd.vv combine to vadd.vx, the
existing testcases need some adjust for asm dump check times.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust
the asm dump check times.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/struct_vect_24.c: Ditto.
So this patch or something similar is OK once we've got patch #1 of the 
series under control.


Jeff

Re: [PATCH v2 3/3] RISC-V: Add testcases for vec_duplicate + vadd.vv combine to vadd.vx

2025-04-24 Thread Jeff Law





On 4/19/25 5:24 AM, pan2...@intel.com wrote:

From: Pan Li 

Add asm dump check and run test for vec_duplicate + vadd.vv combine
to vadd.vx.  Introduce new folder to hold all related testcases.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add new folder vx_vf for all
vec_dup + vv to vx testcases.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: New test.
Similarly, this patch or something close to it is fine once we've got 
the first patch in the series under control.


jeff

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Kees Cook




On April 24, 2025 1:44:23 PM PDT, Qing Zhao  wrote:
>
>
>> On Apr 24, 2025, at 15:43, Bill Wendling  wrote:
>> 
>> On Thu, Apr 24, 2025 at 8:15 AM Qing Zhao  wrote:
>>> 
>>> Hi,
>>> 
>>> Kees reported a segmentation failure when he used the patch to compiler 
>>> kernel,
>>> and the reduced the testing case is something like the following:
>>> 
>>> struct f {
>>> void *g __attribute__((__counted_by__(h)));
>>> long h;
>>> };
>>> 
>>> extern struct f *my_alloc (int);
>>> 
>>> int i(void) {
>>> struct f *iov = my_alloc (10);
>>> int *j = (int *)iov->g;
>>> return __builtin_dynamic_object_size(iov->g, 0);
>>> }
>>> 
>>> Basically, the problem is relating to the pointee type of the pointer array 
>>> being “void”,
>>> As a result, the element size of the array is not available in the IR. 
>>> Therefore segmentation
>>> fault when calculating the size of the whole object.
>>> 
>>> Although it’s easy to fix this segmentation failure, I am not quite sure 
>>> what’s the best
>>> solution to this issue:
>>> 
>>> 1. Reject such usage of “counted_by” in the very beginning by reporting 
>>> warning to the
>>> User, and delete the counted_by attribute from the field.
>>> 
>>> Or:
>>> 
>>> 2. Accept such usage, but issue warnings when calculating the object_size 
>>> in Middle-end.
>>> 
>>> Personally, I prefer the above 1 since I think that when the pointee type 
>>> is void, we don’t know
>>> The type of the element of the pointer array, there is no way to decide the 
>>> size of the pointer array.
>>> 
>>> So, the counted_by information is not useful for the 
>>> __builtin_dynamic_object_size.
>>> 
>>> But I am not sure whether the counted_by still can be used for bound 
>>> sanitizer?
>>> 
>>> Thanks for suggestions and help.
>>> 
>> Clang supports __sized_by that can handle a 'void *', where it defaults to 
>> 'u8'.

I would like to be able to use counted_by (and not sized_by) so that users of 
the annotation don't need to have to change the marking just because it's "void 
*". Everything else operates on "void *" as if it were u8 ...

Regardless, ignoring "void *", the rest of my initial testing (of both GCC and 
Clang) is positive. The test cases are all behaving as expected! Yay! :) I will 
try to construct some more goofy stuff to find more corner cases.

And at some future point we may want to think about -fsanitize=pointer-overflow 
using this information too, to catch arithmetic and increments past the 
bounds...

struct foo {
  u8 *buf __counted_by(len);
  int len;
} str;
u8 *walk;
str->buf = malloc(10);
str->len = 10;

walk = str->buf + 12; // trip!
for (walk = str->buf; ; walk++) // trip after 10 loops
   ;


-Kees

-- 
Kees Cook

[PATCH] gimple-verifier: Add check that comparison in GIMPLE_COND does not throw

2025-04-24 Thread Andrew Pinski

While working on PR 119903, I noticed that there is code in 
replace_stmt_with_simplification
which makes sure that the comparison of a GIMPLE_COND does not throw
(non-call exceptions and trapping math) but the gimple verifier does not
verify this. So let's add it.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-cfg.cc (verify_gimple_cond): Error out if the comparison
throws.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-cfg.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 6a29a56ca9a..712bda1f8ca 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -5113,6 +5113,19 @@ verify_gimple_cond (gcond *stmt)
   return true;
 }
 
+  tree lhs = gimple_cond_lhs (stmt);
+
+  /* GIMPLE_CONDs condition may not throw.  */
+  if (flag_exceptions
+  && cfun->can_throw_non_call_exceptions
+  && operation_could_trap_p (gimple_cond_code (stmt),
+FLOAT_TYPE_P (TREE_TYPE (lhs)),
+false, NULL_TREE))
+{
+  error ("gimple cond condition cannot throw");
+  return true;
+}
+
   return verify_gimple_comparison (boolean_type_node,
   gimple_cond_lhs (stmt),
   gimple_cond_rhs (stmt),
-- 
2.43.0

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Martin Uecker

Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
> Hi, 
> 
> Kees reported a segmentation failure when he used the patch to compiler 
> kernel, 
> and the reduced the testing case is something like the following:
> 
> struct f {
>  void *g __attribute__((__counted_by__(h)));
>  long h;
> };
> 
> extern struct f *my_alloc (int);
> 
> int i(void) {
>  struct f *iov = my_alloc (10);
>  int *j = (int *)iov->g;
>  return __builtin_dynamic_object_size(iov->g, 0);
> }
> 
> Basically, the problem is relating to the pointee type of the pointer array 
> being “void”, 
> As a result, the element size of the array is not available in the IR. 
> Therefore segmentation
> fault when calculating the size of the whole object. 
> 
> Although it’s easy to fix this segmentation failure, I am not quite sure 
> what’s the best
> solution to this issue:
> 
> 1. Reject such usage of “counted_by” in the very beginning by reporting 
> warning to the
> User, and delete the counted_by attribute from the field.
> 
> Or:
> 
> 2. Accept such usage, but issue warnings when calculating the object_size in 
> Middle-end.
> 
> Personally, I prefer the above 1 since I think that when the pointee type is 
> void, we don’t know
> The type of the element of the pointer array, there is no way to decide the 
> size of the pointer array. 
> 
> So, the counted_by information is not useful for the 
> __builtin_dynamic_object_size.
> 
> But I am not sure whether the counted_by still can be used for bound 
> sanitizer?
> 
> Thanks for suggestions and help.

GNU C allows pointer arithmetic and sizeof on void pointers and
that treats void as having size 1.  So you could also allow counted_by
and assume as size 1 for void.

https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html


Martin

> 
> Qing
> 
> 
> 
> 
> > On Apr 23, 2025, at 15:45, Qing Zhao  wrote:
> > 
> > Hi,
> > 
> > This is the 2nd version of the patch set to extend "counted_by" attribute
> > to pointer fields of structures.
> > 
> > the first version was submitted 3 months ago on 1/16/2025, and triggered
> > a lot of discussion on whether we need a new syntax for counted_by
> > attribute.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
> > 
> > After a long discussion since then: 
> > (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
> > 
> > We agreed to the following compromised solution:
> > 
> > 1. Keep the current syntax of counted_by for lone identifier;
> > 2. Add a new attribute "counted_by_exp" for expressions.
> > 
> > Although there are still some discussion going on for the new 
> > counted_by_exp attribute (In Clang community) 
> > https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
> > 
> > The syntax for the lone identifier is kept the same as before.
> > 
> > So, I'd like to resubmit my previous patch of extending "counted_by"
> > to pointer fields of structures. 
> > 
> > The whole patch set has been rebased on the latest trunk, some testing case
> > adjustment,  bootstrapped  and regression tested on both aarch64 and x86.
> > 
> > There will be a seperate patch set for the new "counted_by_exp" 
> > attribute later to cover the expressions cases.
> > 
> > The following are more details on this patch set:
> > 
> > For example:
> > 
> > struct PP {
> >  size_t count2;
> >  char other1;
> >  char *array2 __attribute__ ((counted_by (count2)));
> >  int other2;
> > } *pp;
> > 
> > specifies that the "array2" is an array that is pointed by the
> > pointer field, and its number of elements is given by the field
> > "count2" in the same structure.
> > 
> > There are the following importand facts about "counted_by" on pointer
> > fields compared to the "counted_by" on FAM fields:
> > 
> > 1. one more new requirement for pointer fields with "counted_by" attribute:
> >   pp->array2 and pp->count2 can ONLY be changed by changing the whole 
> > structure
> >   at the same time.
> > 
> > 2. the following feature for FAM field with "counted_by" attribute is NOT
> >   valid for the pointer field any more:
> > 
> >" One important feature of the attribute is, a reference to the
> > flexible array member field uses the latest value assigned to the
> > field that represents the number of the elements before that
> > reference.  For example,
> > 
> >p->count = val1;
> >p->array[20] = 0;  // ref1 to p->array
> >p->count = val2;
> >p->array[30] = 0;  // ref2 to p->array
> > 
> > in the above, 'ref1' uses 'val1' as the number of the elements in
> > 'p->array', and 'ref2' uses 'val2' as the number of elements in
> > 'p->array'. "
> > 
> > This patch set includes 3 parts:
> > 
> > 1.Extend "counted_by" attribute to pointer fields of structures. 
> > 2.Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
> >and use it in builtinin-object-size.
> > 3.Use the counted_by attribute of pointers in array bound che

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-24 Thread Jan Hubicka

Hi,
> With this patch 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681503.html
> scalar version can also be optimized to vcmpnltsd + vpandn

this is nice.  Would be nice if this was also caught by combiner...

> > Can we also check if_true/if_false, if they're const0, or
> > constm1(integral modes), ninsns  should be 1 since it will be
> > simplified to simple pand or pandn.

I am testing the following. 0 is a common case so it is definitely worth
to check. Though the main problem in the testcase (derived from
exchange2) is misaccounting open-coded gather.

gcc/ChangeLog:

PR target/119919
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Account
correctly cond_expr and min/max when one of operands is 0 or -1.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr119919.c: New test.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3b4dfd9a990..7fb88bd32d8 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25375,14 +25375,32 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
case COND_EXPR:
  {
/* SSE2 conditinal move sequence is:
-pcmpgtd %xmm5, %xmm0
+pcmpgtd %xmm5, %xmm0 (accounted separately)
 pand%xmm0, %xmm2
 pandn   %xmm1, %xmm0
 por %xmm2, %xmm0
   while SSE4 uses cmp + blend
-  and AVX512 masked moves.  */
-
-   int ninsns = TARGET_SSE4_1 ? 2 : 4;
+  and AVX512 masked moves.
+
+  The condition is accounted separately since we usually have
+p = a < b
+c = p ? x : y
+  and we will account first statement as setcc.  Exception is when
+  p is loaded from memory as bool and then we will not acocunt
+  the compare, but there is no way to check for this.  */
+
+   int ninsns = TARGET_SSE4_1 ? 1 : 3;
+
+   /* If one of parameters is 0 or -1 the sequence will be simplified:
+  (if_true & mask) | (if_false & ~mask) -> if_true & mask  */
+   if (ninsns > 1
+   && (zerop (gimple_assign_rhs2 (stmt_info->stmt))
+   || zerop (gimple_assign_rhs3 (stmt_info->stmt))
+   || integer_minus_onep
+   (gimple_assign_rhs2 (stmt_info->stmt))
+   || integer_minus_onep
+   (gimple_assign_rhs3 (stmt_info->stmt
+ ninsns = 1;
 
if (SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode))
  stmt_cost = ninsns * ix86_cost->sse_op;
@@ -25393,8 +25411,8 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
else if (VECTOR_MODE_P (mode))
  stmt_cost = ix86_vec_cost (mode, ninsns * ix86_cost->sse_op);
else
- /* compare + cmov.  */
- stmt_cost = ix86_cost->add * 2;
+ /* compare (accounted separately) + cmov.  */
+ stmt_cost = ix86_cost->add;
  }
  break;
 
@@ -25416,9 +25434,18 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
{
  stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
  /* vpmin was introduced in SSE3.
-SSE2 needs pcmpgtd + pand + pandn + pxor.  */
+SSE2 needs pcmpgtd + pand + pandn + pxor.
+If one of parameters is 0 or -1 the sequence is simplified
+to pcmpgtd + pand.  */
  if (!TARGET_SSSE3)
-   stmt_cost *= 4;
+   {
+ if (zerop (gimple_assign_rhs2 (stmt_info->stmt))
+ || integer_minus_onep
+   (gimple_assign_rhs2 (stmt_info->stmt)))
+   stmt_cost *= 2;
+ else
+   stmt_cost *= 4;
+   }
}
  else
/* cmp + cmov.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr119919.c 
b/gcc/testsuite/gcc.target/i386/pr119919.c
new file mode 100644
index 000..ed646561bd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr119919.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -fdump-tree-vect-details" } */
+int a[9*9];
+bool b[9];
+void test()
+{
+for (int i = 0; i < 9; i++)
+{
+b[i] = a[i*9] != 0;
+}
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized using 8 byte vectors" "vect" } 
} */

[pushed] c++: attribute duplication [PR116954]

2025-04-24 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

As a followup to the previous patch for 116954, there's no reason to do
anything in remove_contract_attributes if contracts aren't enabled.

PR c++/116954

gcc/cp/ChangeLog:

* contracts.cc (remove_contract_attributes): Return early if
not enabled.
---
 gcc/cp/contracts.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 3ca2102e866..d0cfd2efd55 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -860,6 +860,9 @@ cp_contract_assertion_p (const_tree attr)
 void
 remove_contract_attributes (tree fndecl)
 {
+  if (!flag_contracts)
+return;
+
   tree list = NULL_TREE;
   for (tree p = DECL_ATTRIBUTES (fndecl); p; p = TREE_CHAIN (p))
 if (!cxx_contract_attribute_p (p))

base-commit: fa61afef18a8566d1907a5ae0e7754e1eac207d9
-- 
2.49.0

Re: [PATCH] c++: Fix OpenMP support with C++20 modules [PR119864]

2025-04-24 Thread Jason Merrill


On 4/22/25 4:48 PM, Jason Merrill wrote:

On 4/22/25 1:21 PM, Tobias Burnus wrote:

Jason Merrill wrote:

On 4/22/25 11:04 AM, Tobias Burnus wrote:

The question is why does this code trigger at all, given
that there is OpenMP but no offload code at all? And how
to fix it in case there is offload code and modules are used.


This seems to be because of:


  if (module_global_init_needed ())
    {
  // Make sure there's a default priority entry.   if (! 
static_init_fini_fns[true])

    static_init_fini_fns[true] = priority_map_t::create_ggc ();
  if (static_init_fini_fns[true]->get_or_insert 
(DEFAULT_INIT_PRIORITY))

    has_module_inits = true;

  if (flag_openmp)
    {
  if (!static_init_fini_fns[2 + true])
    static_init_fini_fns[2 + true] = 
priority_map_t::create_ggc ();
  static_init_fini_fns[2 + true]->get_or_insert 
(DEFAULT_INIT_PRIORITY);

    }
    }


Here we're forcing a target module init function as well as host. If 
we remove the flag_openmp block, Nathaniel's patch is unnecessary 
(but may still be desirable).


I currently do not see whether the code is needed in this case or not, 
but I assume it is, if we want to support static initializers?!?


I don't think so.  For the host, we force create a map with a single 
entry because we always want to emit a module init function.  The openmp 
block is saying we also always want a target init function in a module, 
even if it's empty, which I don't think is correct.  Or if it is, we 
need to specify how to mangle it and agree that that's part of the 
module ABI.


So, for now in addition to Nathaniel's patch I'd remove this flag_openmp 
block so we don't get an unneeded empty function, and maybe add 
something back later after more discussion.


Jason

Re: [PATCH RFC] c++: bad pending_template recursion

2025-04-24 Thread Jonathan Wakely

On Fri, 18 Apr 2025 at 23:08, Jason Merrill  wrote:
>
> limit_bad_template_recursion currently avoids immediate instantiation of
> templates from uses in an already ill-formed instantiation, but we still can
> get unnecessary recursive instantiation in pending_templates if the
> instantiation was queued before the error.
>
> Currently this regresses several libstdc++ tests which seem to rely on a
> static_assert in a function called from another that is separately ill-formed.
> For instance, in the 48101_neg.cc tests, we first get an error in find(), then
> later instantiate _S_key() (called from find) and get the static_assert error
> from there.
>
> Thoughts?  Is this a desirable change, or is the fact that the use precedes 
> the
> error reason to go ahead with the instantiation?
>
> > FAIL: 23_containers/map/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 23_containers/multimap/48101_neg.cc  -std=gnu++17  (test for errors, 
> > line )
> > FAIL: 23_containers/multiset/48101_neg.cc  -std=gnu++17  (test for errors, 
> > line )
> > FAIL: 23_containers/set/48101_neg.cc  -std=gnu++17  (test for errors, line )

As discussed the other day, I will tweak the diagnostics for the
associative containers so that these tests don't regress. I'll post
that for review shortly.

> > FAIL: 30_threads/packaged_task/cons/dangling_ref.cc  -std=gnu++17  (test 
> > for errors, line )
> > FAIL: 30_threads/packaged_task/cons/lwg4154_neg.cc  -std=gnu++17  (test for 
> > errors, line )

These two don't strictly need to use dg-error for the messages that
disappear with your patch, they could have used dg-prune-output
instead. What the test intends to check is that we get a failed
static_assert for std::is_invocable_r_v, and that's still the case.
The other errors are collateral damage, and your patch prevents that.
Which is a good thing. So we can just remove the failing dg-error
lines.

I'll finish testing my associative container changes, then push your
patch after mine lands.

>
>
> gcc/cp/ChangeLog:
>
> * cp-tree.h (struct tinst_level): Add had_errors bit.
> * pt.cc (push_tinst_level_loc): Clear it.
> (pop_tinst_level): Set it.
> (reopen_tinst_level): Check it.
> (instantiate_pending_templates): Call limit_bad_template_recursion.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/recurse5.C: New test.
> ---
>  gcc/cp/cp-tree.h | 10 --
>  gcc/cp/pt.cc | 10 --
>  gcc/testsuite/g++.dg/template/recurse5.C | 17 +
>  3 files changed, 33 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/recurse5.C
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 7798efba3db..856202c65dd 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -6755,8 +6755,14 @@ struct GTY((chain_next ("%h.next"))) tinst_level {
>/* The location where the template is instantiated.  */
>location_t locus;
>
> -  /* errorcount + sorrycount when we pushed this level.  */
> -  unsigned short errors;
> +  /* errorcount + sorrycount when we pushed this level.  If the value
> + overflows, it will always seem like we currently have more errors, so we
> + will limit template recursion even from non-erroneous templates.  In a 
> TU
> + with over 32k errors, that's fine.  */
> +  unsigned short errors : 15;
> +
> +  /* set in pop_tinst_level if there have been errors since we pushed.  */
> +  bool had_errors : 1;
>
>/* Count references to this object.  If refcount reaches
>   refcount_infinity value, we don't increment or decrement the
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index a71705fd085..e8d342f99f6 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -11418,6 +11418,7 @@ push_tinst_level_loc (tree tldcl, tree targs, 
> location_t loc)
>new_level->targs = targs;
>new_level->locus = loc;
>new_level->errors = errorcount + sorrycount;
> +  new_level->had_errors = false;
>new_level->next = NULL;
>new_level->refcount = 0;
>new_level->path = new_level->visible = nullptr;
> @@ -11468,6 +11469,9 @@ pop_tinst_level (void)
>/* Restore the filename and line number stashed away when we started
>   this instantiation.  */
>input_location = current_tinst_level->locus;
> +  if (unsigned errs = errorcount + sorrycount)
> +if (errs > current_tinst_level->errors)
> +  current_tinst_level->had_errors = true;
>set_refcount_ptr (current_tinst_level, current_tinst_level->next);
>--tinst_depth;
>  }
> @@ -11487,7 +11491,7 @@ reopen_tinst_level (struct tinst_level *level)
>
>set_refcount_ptr (current_tinst_level, level);
>pop_tinst_level ();
> -  if (current_tinst_level)
> +  if (current_tinst_level && !current_tinst_level->had_errors)
>  current_tinst_level->errors = errorcount+sorrycount;
>
>tree decl = level->maybe_get_node ();
> @@ -28072,7 +28076,9 @@ instantiate_pending_templates (int retries)
>

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Qing Zhao



> On Apr 24, 2025, at 15:43, Bill Wendling  wrote:
> 
> On Thu, Apr 24, 2025 at 8:15 AM Qing Zhao  wrote:
>> 
>> Hi,
>> 
>> Kees reported a segmentation failure when he used the patch to compiler 
>> kernel,
>> and the reduced the testing case is something like the following:
>> 
>> struct f {
>> void *g __attribute__((__counted_by__(h)));
>> long h;
>> };
>> 
>> extern struct f *my_alloc (int);
>> 
>> int i(void) {
>> struct f *iov = my_alloc (10);
>> int *j = (int *)iov->g;
>> return __builtin_dynamic_object_size(iov->g, 0);
>> }
>> 
>> Basically, the problem is relating to the pointee type of the pointer array 
>> being “void”,
>> As a result, the element size of the array is not available in the IR. 
>> Therefore segmentation
>> fault when calculating the size of the whole object.
>> 
>> Although it’s easy to fix this segmentation failure, I am not quite sure 
>> what’s the best
>> solution to this issue:
>> 
>> 1. Reject such usage of “counted_by” in the very beginning by reporting 
>> warning to the
>> User, and delete the counted_by attribute from the field.
>> 
>> Or:
>> 
>> 2. Accept such usage, but issue warnings when calculating the object_size in 
>> Middle-end.
>> 
>> Personally, I prefer the above 1 since I think that when the pointee type is 
>> void, we don’t know
>> The type of the element of the pointer array, there is no way to decide the 
>> size of the pointer array.
>> 
>> So, the counted_by information is not useful for the 
>> __builtin_dynamic_object_size.
>> 
>> But I am not sure whether the counted_by still can be used for bound 
>> sanitizer?
>> 
>> Thanks for suggestions and help.
>> 
> Clang supports __sized_by that can handle a 'void *', where it defaults to 
> 'u8'.

Thanks for the info.

Qing
> 
> -bw
> 
>> Qing
>> 
>> 
>> 
>> 
>>> On Apr 23, 2025, at 15:45, Qing Zhao  wrote:
>>> 
>>> Hi,
>>> 
>>> This is the 2nd version of the patch set to extend "counted_by" attribute
>>> to pointer fields of structures.
>>> 
>>> the first version was submitted 3 months ago on 1/16/2025, and triggered
>>> a lot of discussion on whether we need a new syntax for counted_by
>>> attribute.
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
>>> 
>>> After a long discussion since then:
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
>>> 
>>> We agreed to the following compromised solution:
>>> 
>>> 1. Keep the current syntax of counted_by for lone identifier;
>>> 2. Add a new attribute "counted_by_exp" for expressions.
>>> 
>>> Although there are still some discussion going on for the new
>>> counted_by_exp attribute (In Clang community)
>>> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
>>> 
>>> The syntax for the lone identifier is kept the same as before.
>>> 
>>> So, I'd like to resubmit my previous patch of extending "counted_by"
>>> to pointer fields of structures.
>>> 
>>> The whole patch set has been rebased on the latest trunk, some testing case
>>> adjustment,  bootstrapped  and regression tested on both aarch64 and x86.
>>> 
>>> There will be a seperate patch set for the new "counted_by_exp"
>>> attribute later to cover the expressions cases.
>>> 
>>> The following are more details on this patch set:
>>> 
>>> For example:
>>> 
>>> struct PP {
>>> size_t count2;
>>> char other1;
>>> char *array2 __attribute__ ((counted_by (count2)));
>>> int other2;
>>> } *pp;
>>> 
>>> specifies that the "array2" is an array that is pointed by the
>>> pointer field, and its number of elements is given by the field
>>> "count2" in the same structure.
>>> 
>>> There are the following importand facts about "counted_by" on pointer
>>> fields compared to the "counted_by" on FAM fields:
>>> 
>>> 1. one more new requirement for pointer fields with "counted_by" attribute:
>>>  pp->array2 and pp->count2 can ONLY be changed by changing the whole 
>>> structure
>>>  at the same time.
>>> 
>>> 2. the following feature for FAM field with "counted_by" attribute is NOT
>>>  valid for the pointer field any more:
>>> 
>>>   " One important feature of the attribute is, a reference to the
>>>flexible array member field uses the latest value assigned to the
>>>field that represents the number of the elements before that
>>>reference.  For example,
>>> 
>>>   p->count = val1;
>>>   p->array[20] = 0;  // ref1 to p->array
>>>   p->count = val2;
>>>   p->array[30] = 0;  // ref2 to p->array
>>> 
>>>in the above, 'ref1' uses 'val1' as the number of the elements in
>>>'p->array', and 'ref2' uses 'val2' as the number of elements in
>>>'p->array'. "
>>> 
>>> This patch set includes 3 parts:
>>> 
>>> 1.Extend "counted_by" attribute to pointer fields of structures.
>>> 2.Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
>>>   and use it in builtinin-object-size.
>>> 3.Use the counted_by attribute of pointers in array bound checker.
>>> 
>>> In which, the patch 1

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-24 Thread Bill Wendling

On Thu, Apr 24, 2025 at 8:15 AM Qing Zhao  wrote:
>
> Hi,
>
> Kees reported a segmentation failure when he used the patch to compiler 
> kernel,
> and the reduced the testing case is something like the following:
>
> struct f {
>  void *g __attribute__((__counted_by__(h)));
>  long h;
> };
>
> extern struct f *my_alloc (int);
>
> int i(void) {
>  struct f *iov = my_alloc (10);
>  int *j = (int *)iov->g;
>  return __builtin_dynamic_object_size(iov->g, 0);
> }
>
> Basically, the problem is relating to the pointee type of the pointer array 
> being “void”,
> As a result, the element size of the array is not available in the IR. 
> Therefore segmentation
> fault when calculating the size of the whole object.
>
> Although it’s easy to fix this segmentation failure, I am not quite sure 
> what’s the best
> solution to this issue:
>
> 1. Reject such usage of “counted_by” in the very beginning by reporting 
> warning to the
> User, and delete the counted_by attribute from the field.
>
> Or:
>
> 2. Accept such usage, but issue warnings when calculating the object_size in 
> Middle-end.
>
> Personally, I prefer the above 1 since I think that when the pointee type is 
> void, we don’t know
> The type of the element of the pointer array, there is no way to decide the 
> size of the pointer array.
>
> So, the counted_by information is not useful for the 
> __builtin_dynamic_object_size.
>
> But I am not sure whether the counted_by still can be used for bound 
> sanitizer?
>
> Thanks for suggestions and help.
>
Clang supports __sized_by that can handle a 'void *', where it defaults to 'u8'.

-bw

> Qing
>
>
>
>
> > On Apr 23, 2025, at 15:45, Qing Zhao  wrote:
> >
> > Hi,
> >
> > This is the 2nd version of the patch set to extend "counted_by" attribute
> > to pointer fields of structures.
> >
> > the first version was submitted 3 months ago on 1/16/2025, and triggered
> > a lot of discussion on whether we need a new syntax for counted_by
> > attribute.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
> >
> > After a long discussion since then:
> > (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
> >
> > We agreed to the following compromised solution:
> >
> > 1. Keep the current syntax of counted_by for lone identifier;
> > 2. Add a new attribute "counted_by_exp" for expressions.
> >
> > Although there are still some discussion going on for the new
> > counted_by_exp attribute (In Clang community)
> > https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
> >
> > The syntax for the lone identifier is kept the same as before.
> >
> > So, I'd like to resubmit my previous patch of extending "counted_by"
> > to pointer fields of structures.
> >
> > The whole patch set has been rebased on the latest trunk, some testing case
> > adjustment,  bootstrapped  and regression tested on both aarch64 and x86.
> >
> > There will be a seperate patch set for the new "counted_by_exp"
> > attribute later to cover the expressions cases.
> >
> > The following are more details on this patch set:
> >
> > For example:
> >
> > struct PP {
> >  size_t count2;
> >  char other1;
> >  char *array2 __attribute__ ((counted_by (count2)));
> >  int other2;
> > } *pp;
> >
> > specifies that the "array2" is an array that is pointed by the
> > pointer field, and its number of elements is given by the field
> > "count2" in the same structure.
> >
> > There are the following importand facts about "counted_by" on pointer
> > fields compared to the "counted_by" on FAM fields:
> >
> > 1. one more new requirement for pointer fields with "counted_by" attribute:
> >   pp->array2 and pp->count2 can ONLY be changed by changing the whole 
> > structure
> >   at the same time.
> >
> > 2. the following feature for FAM field with "counted_by" attribute is NOT
> >   valid for the pointer field any more:
> >
> >" One important feature of the attribute is, a reference to the
> > flexible array member field uses the latest value assigned to the
> > field that represents the number of the elements before that
> > reference.  For example,
> >
> >p->count = val1;
> >p->array[20] = 0;  // ref1 to p->array
> >p->count = val2;
> >p->array[30] = 0;  // ref2 to p->array
> >
> > in the above, 'ref1' uses 'val1' as the number of the elements in
> > 'p->array', and 'ref2' uses 'val2' as the number of elements in
> > 'p->array'. "
> >
> > This patch set includes 3 parts:
> >
> > 1.Extend "counted_by" attribute to pointer fields of structures.
> > 2.Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE
> >and use it in builtinin-object-size.
> > 3.Use the counted_by attribute of pointers in array bound checker.
> >
> > In which, the patch 1 and 2 are simple and straightforward, however, the 
> > patch 3
> > is a little complicate due to the following reason:
> >
> >Current array bound checker only instrument

[PATCH] Fortran: fix procedure pointer handling with -fcheck=pointer [PR102900]

2025-04-24 Thread Harald Anlauf


Dear all,

the attached patch is the result of my attempts to fix an ICE when
compiling gfortran.dg/proc_ptr_52.f90 with -fcheck=all.  While
trying to reduce this, I found several oddities with functions
returning class(*), pointer that ICE'd too.

The original ICE in the PR turned out to be a bug in the pointer
checking code when passing a procedure pointer to a CLASS procedure
dummy that tried to access the container of the procedure pointer.
I believe that this should not be done, and one should only check
that the procedure pointer is not null.
I am not too experienced which class-valued functions, so if any
of the experts (Paul, Andre', ...) could have a look?

(After fixing the issue with -fcheck=pointer, I ran into a bogus
error with -Wexternal-argument-mismatch for the same testcase.
This is now pr119928.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Cheers,
Harald

From a6ec26a7d7a92a5e2cefedf08a4616060783050e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 24 Apr 2025 21:28:35 +0200
Subject: [PATCH] Fortran: fix procedure pointer handling with -fcheck=pointer
 [PR102900]

	PR fortran/102900

gcc/fortran/ChangeLog:

	* trans-decl.cc (gfc_generate_function_code): Use sym->result
	when generating fake result decl for functions returning
	allocatable or pointer results.
	* trans-expr.cc (gfc_conv_procedure_call): When checking the
	pointer status of an actual argument passed to a non-allocatable,
	non-pointer dummy which is of type CLASS, do not check the
	class container of the actual if it is just a procedure pointer.
	(gfc_trans_pointer_assignment): Fix treatment of assignment to
	NULL of a procedure pointer.

gcc/testsuite/ChangeLog:

	* gfortran.dg/proc_ptr_52.f90: Add -fcheck=pointer to options.
	* gfortran.dg/proc_ptr_57.f90: New test.
---
 gcc/fortran/trans-decl.cc |  6 ++--
 gcc/fortran/trans-expr.cc | 10 ---
 gcc/testsuite/gfortran.dg/proc_ptr_52.f90 |  1 +
 gcc/testsuite/gfortran.dg/proc_ptr_57.f90 | 36 +++
 4 files changed, 46 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/proc_ptr_57.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index ee48a820f28..43bd7be54cb 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -8079,13 +8079,13 @@ gfc_generate_function_code (gfc_namespace * ns)
 		   || sym->result->ts.u.derived->attr.alloc_comp
 		   || sym->result->ts.u.derived->attr.pointer_comp))
 	  || (sym->result->ts.type == BT_CLASS
-		  && (CLASS_DATA (sym)->attr.allocatable
-		  || CLASS_DATA (sym)->attr.class_pointer
+		  && (CLASS_DATA (sym->result)->attr.allocatable
+		  || CLASS_DATA (sym->result)->attr.class_pointer
 		  || CLASS_DATA (sym->result)->attr.alloc_comp
 		  || CLASS_DATA (sym->result)->attr.pointer_comp
 	{
 	  artificial_result_decl = true;
-	  result = gfc_get_fake_result_decl (sym, 0);
+	  result = gfc_get_fake_result_decl (sym->result, 0);
 	}
 
   if (result != NULL_TREE && sym->attr.function && !sym->attr.pointer)
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 19e5669b9ee..8d9448eb9b6 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -8145,7 +8145,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		goto end_pointer_check;
 
 	  tmp = parmse.expr;
-	  if (fsym && fsym->ts.type == BT_CLASS)
+	  if (fsym && fsym->ts.type == BT_CLASS && !attr.proc_pointer)
 		{
 		  if (POINTER_TYPE_P (TREE_TYPE (tmp)))
 		tmp = build_fold_indirect_ref_loc (input_location, tmp);
@@ -10912,9 +10912,11 @@ gfc_trans_pointer_assignment (gfc_expr * expr1, gfc_expr * expr2)
   gfc_init_se (&lse, NULL);
 
   /* Usually testing whether this is not a proc pointer assignment.  */
-  non_proc_ptr_assign = !(gfc_expr_attr (expr1).proc_pointer
-			&& expr2->expr_type == EXPR_VARIABLE
-			&& expr2->symtree->n.sym->attr.flavor == FL_PROCEDURE);
+  non_proc_ptr_assign
+= !(gfc_expr_attr (expr1).proc_pointer
+	&& ((expr2->expr_type == EXPR_VARIABLE
+	 && expr2->symtree->n.sym->attr.flavor == FL_PROCEDURE)
+	|| expr2->expr_type == EXPR_NULL));
 
   /* Check whether the expression is a scalar or not; we cannot use
  expr1->rank as it can be nonzero for proc pointers.  */
diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_52.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_52.f90
index cb7cf7040a9..421d2479cd6 100644
--- a/gcc/testsuite/gfortran.dg/proc_ptr_52.f90
+++ b/gcc/testsuite/gfortran.dg/proc_ptr_52.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-additional-options "-fcheck=pointer" }
 !
 ! Test the fix for PRs93924 & 93925.
 !
diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_57.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_57.f90
new file mode 100644
index 000..7ecb88f172c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/proc_ptr_57.f90
@@ -0,0 +1,36 @@
+! { dg-do compile }
+! { dg-additional-options "-fcheck=pointer" }
+!
+! PR fortran/102900
+
+m

Re: [PATCH] RISC-V: Add tt-ascalon-d8 integer and floating point scheduling model

2025-04-24 Thread Jeff Law





On 4/24/25 2:37 AM, Anton Blanchard wrote:

Add integer and floating point scheduling models for the Tenstorrent
Ascalon 8 wide CPU.

gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
  Add tt_ascalon_d8.
* config/riscv/riscv.md: Update tune attribute and include
  tt-ascalon-d8.md.
* config/riscv/tenstorrent-ascalon.md: New file.
This looks pretty sensible.  The only worry would be insns types that 
don't have a mapping to anything in the DFA -- those will cause an ICE 
in the scheduler as we require all insns to have a type and map to a 
reservation in the DFA.


So for example someone could ask for rv64gcv code generation, but 
ascalon-d8 scheduling.  The compiler will ultimately fault in the 
scheduler because you don't have a mapping of vector insns to a 
reservation in the DFA.


Unfortunately there is no "default" concept or inheritance in the DFA 
description.  So the only thing to do is just verify that every insn 
type mentioned near the top of riscv.md maps to a reservation of some sort.


Feel free to create a dummy reservation for the ascalon-d8 and throw 
everything not handled elsewhere into that bucket.  If you've done a 
good job at covering the insns that actually get used, then it won't 
matter from a performance.


Jeff

Re: [PATCH 0/2] Improve b4 workflow

2025-04-24 Thread Jiaxun Yang




在2025年1月2日周四 下午11:07，Jiaxun Yang写道：
> Hi all,
>
> This series improved b4 working flow by wire up code style
> and changelog checking scripts in b4's automation.
>
> Please help with review and apply.

Ping on this?

Thanks!

>
> Thanks!
>
> Signed-off-by: Jiaxun Yang 
> ---
> Jiaxun Yang (2):
>   contrib/gcc-changelog/git_email.py: Rework the script
>   b4-config: Add useful options
>
>  .b4-config |   7 +++
>  contrib/gcc-changelog/git_email.py | 125 
> +
>  2 files changed, 93 insertions(+), 39 deletions(-)
> ---
> base-commit: 81d4707a00a2d74a9caf2d806e5b0ebe13e1247c
> change-id: 20241231-b4-workflow-2ca134652c07
>
> Best regards,
> -- 
> Jiaxun Yang 

-- 
- Jiaxun

Fwd: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()

2025-04-24 Thread Claudiu Zissulescu Ianculescu

Adding missing email addresses.

-- Forwarded message -
From: Claudiu Zissulescu Ianculescu 
Date: Thu, Apr 24, 2025 at 8:48 PM
Subject: Re: [PATCH 2/2] arc: Use intrinsics for __builtin_mul_overflow ()
To: Jeff Law 


Hi Jeff,

The other attributes are not required as the pattern doesn't allow it
to be used in a predicated execution.  Thus, the default values for
the missing predicates are ok.

Best,
Claudiu

On Fri, Apr 18, 2025 at 8:43 PM Jeff Law  wrote:
>
>
>
> On 3/18/25 10:23 AM, Luis Silva wrote:
> > This patch handles both signed and unsigned
> > builtin multiplication overflow.
> >
> > Uses the "mpy.f" instruction to set the condition
> > codes based on the result.  In the event of an
> > overflow, the V flag is set, triggering a
> > conditional move depending on the V flag status.
> >
> > For example, set "1" to "r0" in case of overflow:
> >
> >   mov_s   r0,1
> >   mpy.f   r0,r0,r1
> >   j_s.d   [blink]
> >   mov.nv  r0,0
> >
> > gcc/ChangeLog:
> >
> >  * config/arc/arc.md (mulvsi4): New define_expand.
> >  (mulsi3_Vcmp): New define_insn.
> So similar to your other patch, there are other attributes (cond and
> predicable) that you may need to set.  I just don't know the port well
> enough to judge that.
>
> jeff
>

Re: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-24 Thread Jeff Law





On 4/18/25 9:28 AM, Li, Pan2 wrote:

Thanks Jeff for comments.


So we've got 3 patches all touching on the same basic area, so we need
to be careful about staging in.


Agree, thanks Jeff for paying attention.


So don't be surprised if most review time is focused on how the costing
model works since that's a common theme across these patches.


I see, Robin also mentioned this last year.


This feels very much like a hack to me.  Why not just handle
VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs?
Ultimately all that code needs to work together rather than having
separate paths for vector/scalar.


The idea is to separate vector related code into another sub function
for readability, instead of unroll the vector cost logic in riscv_rtx_costs. 
Given the
riscv_rtx_costs function body is quite long already. Currently we may have
Vec_dup but it may introduce more cases.
All true and understandable.  The problem is rtx_cost isn't really 
separable like that.  RTL in there can be fairly arbitrary and mixed, 
just because we have a vector mode doesn't mean we won't have scalar 
ops.  Worse yet those scalar ops might be something more complex than a 
simple register.


You should expect to get arbitrary RTL in there with arbitrary operands 
-- it doesn't have to match anything actually supported by the target.


The point is the separation isn't neat and clean in rtx_costs.   We've 
kind of let it go as-is with the vector short-cut out, but I don't think 
that's really where we want to be, it was just an expedient decision to 
allow us to focus on more important stuff.  Now that we're looking at 
using rtx costing in more meaningful ways we probably need to rethink 
the hack we've got in place.



Jeff

Re: [PATCH v2] libstdc++: Add lvalue overload for generator::yield_value

2025-04-24 Thread Arsen Arsenović

Jonathan Wakely  writes:

> This was approved in Wrocław as LWG 3899.
>
> This avoids creating a new coroutine frame to co_yield the elements of
> an lvalue generator.
>
> libstdc++-v3/ChangeLog:
>
>   * include/std/generator (generator::yield_value): Add overload
>   taking lvalue element_of view, as per LWG 3899.
>   * testsuite/24_iterators/range_generators/lwg3899.cc: New test.
> ---
>
> Added a testcase (thanks to Arsen).
>
> Tested x86_64-linux.
>

LGTM also, assuming the test still passes :-)

Thank you again!
-- 
Arsen Arsenović


signature.asc
Description: PGP signature

RE: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Liu, Hongtao




> -Original Message-
> From: Jan Hubicka 
> Sent: Friday, April 25, 2025 12:27 AM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.
> 
> > Since ix86_expand_sse_movcc will simplify them into a simple vmov,
> > vpand or vpandn.
> > Current register_operand/vector_operand could lose some optimization
> > opportunity.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/predicates.md (vector_or_0_or_1s_operand): New
> predicate.
> > (nonimm_or_0_or_1s_operand): Ditto.
> > * config/i386/sse.md (vcond_mask_):
> > Extend the predicate of operands1 to accept 0 or allones
> > operands.
> > (vcond_mask_): Ditto.
> > (vcond_mask_v1tiv1ti): Ditto.
> > (vcond_mask_): Ditto.
> > * config/i386/i386.md (movcc): Ditto for operands[2] and
> > operands[3].
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/blendv-to-maxmin.c: New test.
> > * gcc.target/i386/blendv-to-pand.c: New test.
> 
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > new file mode 100644
> > index 000..042eb7d8f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > +
> > +double
> > +foo (double a)
> > +{
> > +  if (a > 0.0)
> > +return a;
> > +  return 0.0;
> > +}
> 
> With -ffast-math this is matched as MAX_EXPR at gimple level. Without -ffast-
> math we can not do that since MAX_EXPR (and RTL SMAX) are explicitely
> documented as unspecified when one of parameters is nan.
> 
> So without -ffast-math at combine time we see:
> (insn 6 3 7 2 (set (reg:DF 103)
> (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
>  (nil))
> (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> (unspec:DF [
> (reg:DF 104 [ a ])
> (reg:DF 103)
> ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
>  (expr_list:REG_DEAD (reg:DF 104 [ a ])
> (expr_list:REG_DEAD (reg:DF 103)
> (nil
> 
> maxss is defined as:
> 
> MAX(SRC1, SRC2)
> {
> IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> ELSE DEST := SRC2;
> FI;
> }
> 
> which I think translates to
>   SRC1 > SRC1 : SRC1 : SRC2
Yes, for minss/maxss
> 
> If SRC1 and SRC2 are both 0, this should evaulate to false and return RC2 if
> one of them is NaN this should evaulate to false and return SRC2
> 
> so it seems to do right side cases and has direct RTL equivalent.  So why we
> need UNSPEC_IEEE_MAX at all? Expressing this in RTL directly would enable
> RTL passes to do better job.
> Similarly for BLENDV...
Note for blendv, it checks the significant bit of the mask, not simple
 if_then_else
  mask
  if_true 
  if_false

It should be 
if_then_else
   ashiftrt mask 31
   if_true
   if_false

Maybe not very useful in practice, just like why there's UNSPEC_FMADDSUB

6334
 6335;; It would be possible to represent these without the UNSPEC as
 6336;;
 6337;; (vec_merge
 6338;;   (fma op1 op2 op3)
 6339;;   (fma op1 op2 (neg op3))
 6340;;   (merge-const))
 6341;;
 6342;; But this doesn't seem useful in practice.
 6343
 6344(define_expand "vec_fmaddsub4"
 6345  [(set (match_operand:VFH 0 "register_operand")
 6346(unspec:VFH
 6347  [(match_operand:VFH 1 "nonimmediate_operand")
 6348   (match_operand:VFH 2 "nonimmediate_operand")
 6349   (match_operand:VFH 3 "nonimmediate_operand")]
 6350  UNSPEC_FMADDSUB))]
 6351  "TARGET_FMA || TARGET_FMA4 || ( == 64 || TARGET_AVX512VL)")
 6352

> 
> Honza

82 matches

Mail list logo