[PATCH 1/2] libstdc++: Use ranges::iter_move in ranges::unique [PR120789]

2025-06-26 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
15/14?

-- >8 --

PR libstdc++/120789

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__unique_fn::operator()): Use
ranges::iter_move(iter) instead of std::move(*iter).
* testsuite/25_algorithms/unique/120789.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   |  2 +-
 .../testsuite/25_algorithms/unique/120789.cc  | 36 +++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/unique/120789.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 83eaa7da28b9..3590c501c4cd 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1454,7 +1454,7 @@ namespace ranges
  if (!std::__invoke(__comp,
 std::__invoke(__proj, *__dest),
 std::__invoke(__proj, *__first)))
-   *++__dest = std::move(*__first);
+   *++__dest = ranges::iter_move(__first);
return {++__dest, __first};
   }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/unique/120789.cc 
b/libstdc++-v3/testsuite/25_algorithms/unique/120789.cc
new file mode 100644
index ..24b107132473
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/unique/120789.cc
@@ -0,0 +1,36 @@
+// PR libstdc++/120789 - ranges::unique should use ranges::iter_move
+// { dg-do compile { target c++20 } }
+
+#include <algorithm>
+
+struct A
+{
+  bool operator==(const A&) const;
+};
+
+struct B
+{
+  B(B&&) = delete;
+  B& operator=(const A&) const;
+
+  operator A() const;
+  bool operator==(const B&) const;
+};
+
+struct I
+{
+  using value_type = A;
+  using difference_type = int;
+  B operator*() const;
+  I& operator++();
+  I operator++(int);
+  bool operator==(const I&) const;
+  friend A iter_move(const I&);
+};
+
+void
+test01()
+{
+  std::ranges::subrange<I> r;
+  auto [begin, end] = std::ranges::unique(r);
+}
-- 
2.50.0.131.gcf6f63ea6b



[PATCH 2/2] libstdc++: Use ranges::iter_move in ranges::remove_if [PR120789]

2025-06-26 Thread Patrick Palka
PR libstdc++/120789

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__remove_if_fn::operator()): Use
ranges::iter_move(iter) instead of std::move(*iter).
* testsuite/25_algorithms/remove_if/120789.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   |  2 +-
 .../25_algorithms/remove_if/120789.cc | 36 +++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/remove_if/120789.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 3590c501c4cd..7ef761f9c977 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1294,7 +1294,7 @@ namespace ranges
for (; __first != __last; ++__first)
  if (!std::__invoke(__pred, std::__invoke(__proj, *__first)))
{
- *__result = std::move(*__first);
+ *__result = ranges::iter_move(__first);
  ++__result;
}
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/remove_if/120789.cc 
b/libstdc++-v3/testsuite/25_algorithms/remove_if/120789.cc
new file mode 100644
index ..c1f4eeb9b4dd
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/remove_if/120789.cc
@@ -0,0 +1,36 @@
+// PR libstdc++/120789 - ranges::remove_if should use ranges::iter_move
+// { dg-do compile { target c++20 } }
+
+#include <algorithm>
+
+struct A
+{
+  bool operator==(const A&) const;
+};
+
+struct B
+{
+  B(B&&) = delete;
+  B& operator=(const A&) const;
+
+  operator A() const;
+  bool operator==(const B&) const;
+};
+
+struct I
+{
+  using value_type = A;
+  using difference_type = int;
+  B operator*() const;
+  I& operator++();
+  I operator++(int);
+  bool operator==(const I&) const;
+  friend A iter_move(const I&);
+};
+
+void
+test01()
+{
+  std::ranges::subrange<I> r;
+  auto [begin, end] = std::ranges::remove_if(r, [](auto&&) { return true; });
+}
-- 
2.50.0.131.gcf6f63ea6b



Re: [PATCH] RISC-V: Fix CFA offsets for stack probes in loop [PR119944]

2025-06-26 Thread Jeff Law




On 6/26/25 7:51 AM, Raphael Moreira Zinsly wrote:

The CFI output when we do stack probing in a loop was not
accounting for the first sp adjustments; we can fix that by using the
frame's total size.
This is already being tested by g++.dg/torture/pr119610.C.

gcc/ChangeLog:
gcc/config/riscv/riscv.cc
(riscv_allocate_and_probe_stack_space): Use the total frame size
instead of the current adjustment size to set the CFI.

gcc/testsuite/ChangeLog:
gcc.target/riscv/stack-check-cfa-2.c: Fix expected output.

OK.  Thanks for jumping on this.

Jeff




Re: [PATCH] RISC-V: Add pipeline-checker script

2025-06-26 Thread Jeff Law




On 6/26/25 3:27 AM, Kito Cheng wrote:

Pipeline checker utility for RISC-V architecture that validates processor
pipeline models. This tool analyzes machine description files to ensure all
instruction types are properly handled by pipeline scheduling models.

I wrote this tool because I am implementing vector pipeline support for
a SiFive core, and it is hard to find which instruction types are not
handled by the pipeline scheduling models. This tool helps me find the
unhandled instruction types so I can fix them.
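
The core check can be thought of as a set difference. The sketch below is
only an illustration of that idea and not the script from this patch: the
function name unhandled_types is hypothetical, and the type names used when
calling it are placeholder strings. It assumes the instruction types and the
types matched by a pipeline model have already been extracted from the
machine description files.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: report every instruction type that no reservation
// in the pipeline model matches.
std::vector<std::string>
unhandled_types(const std::set<std::string>& all_types,
                const std::set<std::string>& handled)
{
  std::vector<std::string> missing;
  // std::set iterates in sorted order, as set_difference requires.
  std::set_difference(all_types.begin(), all_types.end(),
                      handled.begin(), handled.end(),
                      std::back_inserter(missing));
  return missing;
}
```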

And I think it may be useful for other RISC-V core developers, so I
decided to upstream that :)
Sounds fantastic!  I'll have to run it internally on our scheduler 
models and I'll have Austin do it on the upcoming spacemit x60 model.


Jeff



[PATCH 6/8] libstdc++: Directly implement ranges::nth_element [PR100795]

2025-06-26 Thread Patrick Palka
PR libstdc++/100795

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__introselect): New,
based on the stl_algo.h implementation.
(nth_element_fn::operator()): Reimplement in terms of the above.
* testsuite/25_algorithms/nth_element/constrained.cc (test03):
New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 47 +--
 .../25_algorithms/nth_element/constrained.cc  | 31 
 2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index a9924cd9c49e..b12da2af1263 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -2805,6 +2805,33 @@ namespace ranges
 
   inline constexpr __is_sorted_fn is_sorted{};
 
+  namespace __detail
+  {
+template
+  constexpr void
+  __introselect(_Iter __first, _Iter __nth, _Iter __last,
+   iter_difference_t<_Iter> __depth_limit, _Comp __comp)
+  {
+   while (__last - __first > 3)
+ {
+   if (__depth_limit == 0)
+ {
+   __detail::__heap_select(__first, __nth + 1, __last, __comp);
+   // Place the nth largest element in its final position.
+   ranges::iter_swap(__first, __nth);
+   return;
+ }
+   --__depth_limit;
+   _Iter __cut = __detail::__unguarded_partition_pivot(__first, 
__last, __comp);
+   if (__cut <= __nth)
+ __first = __cut;
+   else
+ __last = __cut;
+ }
+   __detail::__insertion_sort(__first, __last, __comp);
+  }
+  } // namespace __detail
+
   struct __nth_element_fn
   {
 template _Sent,
@@ -2814,11 +2841,21 @@ namespace ranges
   operator()(_Iter __first, _Iter __nth, _Sent __last,
 _Comp __comp = {}, _Proj __proj = {}) const
   {
-   auto __lasti = ranges::next(__first, __last);
-   _GLIBCXX_STD_A::nth_element(std::move(__first), std::move(__nth),
-   __lasti,
-   __detail::__make_comp_proj(__comp, __proj));
-   return __lasti;
+   if constexpr (!same_as<_Iter, _Sent>)
+ return (*this)(__first, __nth, ranges::next(__first, __last),
+std::move(__comp), std::move(__proj));
+   else
+ {
+   if (__first == __last || __nth == __last)
+ return __last;
+
+   auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
+   auto __n = __detail::__to_unsigned_like(__last - __first);
+   __detail::__introselect(__first, __nth, __last,
+   std::__bit_width(__n) * 2,
+   __comp_proj);
+   return __last;
+ }
   }
 
 template
 #include 
+#include 
 #include 
 #include 
 
@@ -67,9 +68,39 @@ test02()
   return x[3] == 4;
 }
 
+constexpr bool
+test03()
+{
+  // PR libstdc++/100795 - ranges::nth_element should not use std::nth_element directly
+#if __SIZEOF_INT128__
+  auto v = std::views::iota(__int128(0), __int128(20));
+#else
+  auto v = std::views::iota(0ll, 20ll);
+#endif
+
+  int storage[20] = {2,5,4,3,1,6,7,9,10,8,11,14,12,13,15,16,18,0,19,17};
+  auto w = v | std::views::transform([&](auto i) -> int& { return storage[i]; 
});
+  using type = decltype(w);
+  using cat = 
std::iterator_traits>::iterator_category;
+  static_assert( std::same_as );
+  static_assert( std::ranges::random_access_range );
+
+  ranges::nth_element(w, w.begin() + 10);
+  VERIFY( w[10] == 10 );
+
+  ranges::nth_element(w, w.begin() + 5, std::ranges::greater{});
+  VERIFY( w[5] == 19 - 5 );
+
+  ranges::nth_element(w, w.begin() + 15, std::ranges::greater{}, 
std::negate{});
+  VERIFY( w[15] == 15 );
+
+  return true;
+}
+
 int
 main()
 {
   test01();
   static_assert(test02());
+  static_assert(test03());
 }
-- 
2.50.0.131.gcf6f63ea6b



Re: [PATCH 7/8] libstdc++: Directly implement ranges::sample [PR100795]

2025-06-26 Thread Patrick Palka
On Thu, 26 Jun 2025, Patrick Palka wrote:

>   PR libstdc++/100795
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/ranges_algo.h (__sample_fn::operator()):
>   Reimplement the forward_iterator branch directly.
>   * testsuite/25_algorithms/sample/constrained.cc (test02):
>   New test.
> ---
>  libstdc++-v3/include/bits/ranges_algo.h   | 70 +--
>  .../25_algorithms/sample/constrained.cc   | 28 
>  2 files changed, 91 insertions(+), 7 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
> b/libstdc++-v3/include/bits/ranges_algo.h
> index b12da2af1263..672a0ebce0de 100644
> --- a/libstdc++-v3/include/bits/ranges_algo.h
> +++ b/libstdc++-v3/include/bits/ranges_algo.h
> @@ -1839,14 +1839,70 @@ namespace ranges
>operator()(_Iter __first, _Sent __last, _Out __out,
>iter_difference_t<_Iter> __n, _Gen&& __g) const
>{
> + // FIXME: Correctly handle integer-class difference types.

On second thought maybe we don't need to teach uniform_int_distribution
to handle integer-class difference types.  We could just assert that
__n fits inside a long long and use that as the difference type?  Same
for shuffle.
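
A rough sketch of that idea, outside libstdc++ and with hypothetical names
(sample_forward, sample_ints): the one-at-a-time selection sampling loop
from the patch below, but with long long as the working difference type. It
simply assumes the range length fits in long long rather than asserting it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper, not libstdc++ code: selection sampling over a
// forward range, preserving the relative order of the chosen elements.
template<typename It, typename Out, typename Gen>
Out sample_forward(It first, It last, Out out, long long n, Gen& g)
{
  using distrib = std::uniform_int_distribution<long long>;
  using param = distrib::param_type;

  // Assumption: the range length is representable as long long.
  long long unsampled = static_cast<long long>(std::distance(first, last));
  n = std::clamp(n, 0LL, unsampled);

  distrib d;
  for (; n != 0; ++first)
    // Pick the current element with probability n / (elements left).
    if (d(g, param{0, --unsampled}) < n)
      {
        *out = *first;
        ++out;
        --n;
      }
  return out;
}

// Convenience wrapper with a seeded engine so results are reproducible.
std::vector<int>
sample_ints(const std::vector<int>& src, long long n, unsigned seed)
{
  std::mt19937 gen(seed);
  std::vector<int> out;
  sample_forward(src.begin(), src.end(), std::back_inserter(out), n, gen);
  return out;
}
```
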

>   if constexpr (forward_iterator<_Iter>)
> {
> - // FIXME: Forwarding to std::sample here requires computing __lasti
> - // which may take linear time.
> - auto __lasti = ranges::next(__first, __last);
> - return _GLIBCXX_STD_A::
> -   sample(std::move(__first), std::move(__lasti), std::move(__out),
> -  __n, std::forward<_Gen>(__g));
> + using _Size = iter_difference_t<_Iter>;
> + using __distrib_type = uniform_int_distribution<_Size>;
> + using __param_type = typename __distrib_type::param_type;
> + using _USize = __detail::__make_unsigned_like_t<_Size>;
> + using __uc_type
> +   = common_type_t::result_type, 
> _USize>;
> +
> + if (__first == __last)
> +   return __out;
> +
> + __distrib_type __d{};
> + _Size __unsampled_sz = ranges::distance(__first, __last);
> + __n = std::min(__n, __unsampled_sz);
> +
> + // If possible, we use __gen_two_uniform_ints to efficiently produce
> + // two random numbers using a single distribution invocation:
> +
> + const __uc_type __urngrange = __g.max() - __g.min();
> + if (__urngrange / __uc_type(__unsampled_sz) >= 
> __uc_type(__unsampled_sz))
> +   // I.e. (__urngrange >= __unsampled_sz * __unsampled_sz) but 
> without
> +   // wrapping issues.
> +   {
> + while (__n != 0 && __unsampled_sz >= 2)
> +   {
> + const pair<_Size, _Size> __p =
> +   __gen_two_uniform_ints(__unsampled_sz, __unsampled_sz - 
> 1, __g);
> +
> + --__unsampled_sz;
> + if (__p.first < __n)
> +   {
> + *__out = *__first;
> + ++__out;
> + --__n;
> +   }
> +
> + ++__first;
> +
> + if (__n == 0) break;
> +
> + --__unsampled_sz;
> + if (__p.second < __n)
> +   {
> + *__out = *__first;
> + ++__out;
> + --__n;
> +   }
> +
> + ++__first;
> +   }
> +   }
> +
> + // The loop above is otherwise equivalent to this one-at-a-time 
> version:
> +
> + for (; __n != 0; ++__first)
> +   if (__d(__g, __param_type{0, --__unsampled_sz}) < __n)
> + {
> +   *__out = *__first;
> +   ++__out;
> +   --__n;
> + }
> + return __out;
> }
>   else
> {
> @@ -1867,7 +1923,7 @@ namespace ranges
>   if (__k < __n)
> __out[__k] = *__first;
> }
> - return __out + __sample_sz;
> + return __out + iter_difference_t<_Out>(__sample_sz);
> }
>}
>  
> diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc 
> b/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
> index b9945b164903..150e2d2036e0 100644
> --- a/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
> +++ b/libstdc++-v3/testsuite/25_algorithms/sample/constrained.cc
> @@ -20,6 +20,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -59,9 +60,36 @@ test01()
>  }
>  }
>  
> +void
> +test02()
> +{
> +  // PR libstdc++/100795 - ranges::sample should not use std::sample
> +#if 0 // FIXME: ranges::sample rejects integer-class difference types.
> +#if __SIZEOF_INT128__
> +  auto v = std::views::iota(__int128(0), __int128(20));
> +#else
> +  auto v = std::views::iota(0ll, 20ll);
> +#endif
> +#else
> +  auto v = std::views::iota(0, 20);
> +#endif
> +
> +  int storage[20]

Re: [committed] i386: Introduce crc_revsi4 expanders [PR120719]

2025-06-26 Thread Andi Kleen
Uros Bizjak  writes:

> Introduce crc_revsi4 expanders to generate CRC32 instruction when using
> __builtin_rev_crc32_data* builtins with 0x1EDC6F41 polynomial and -mcrc32.
>
> PR target/120719
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (crc_revsi4): New expander.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/crc-builtin-crc32.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

This doesn't enable the crc-crc32c* test cases for the CRC pattern
matching pass; they are currently enabled only on aarch64/loongarch.

So we can't be sure it actually works for that.

Also, of course, it would be nice to support PCLMULQDQ too, as is done
for ARM.

-Andi
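
For reference, a bit-at-a-time software version of the reversed CRC with
the 0x1EDC6F41 polynomial (commonly called CRC-32C) looks roughly like the
sketch below. This is an illustration only, not the builtin's
implementation: the function names reflect32 and crc32c are hypothetical,
and the sketch applies the conventional initial value and final inversion,
which the raw CRC32 instruction does not do by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the bit order of a 32-bit value.
constexpr std::uint32_t reflect32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i, x >>= 1)
    r = (r << 1) | (x & 1u);
  return r;
}

// Conventional CRC-32C: reflected processing, init and final XOR of ~0.
std::uint32_t crc32c(const unsigned char* data, std::size_t len)
{
  constexpr std::uint32_t poly = reflect32(0x1EDC6F41u);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; ++bit)
        // Shift right; XOR in the polynomial when the dropped bit was set.
        crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
  return ~crc;
}
```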



[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li 

This patch series introduces the combine of vec_dup + vssubu.vv into
vssubu.vx, based on the cost of GR2VR.  The late-combine pass performs
the combine if the cost of GR2VR is zero, and rejects it if the cost is
non-zero (1, 2 and 15 in the tests).  There are two cases for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vssubu.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vssubu.vv
 |   J L1
 |   ...

Both are combined into the following if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vssubu.vx
 |   J L1
 |   ...

The following test suite passed for this patch series.
* The full rv64gcv regression test.

Pan Li (4):
  RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 
0, 2 and 15
  RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 
0, 1 and 2
  RISC-V: Reconcile the existing test due to cost model change

 gcc/config/riscv/riscv-v.cc   |   1 +
 gcc/config/riscv/riscv.cc |   1 +
 gcc/config/riscv/vector-iterators.md  |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u16.c   |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u32.c   |   2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u8.c|   2 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  18 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u8.c |  17 ++
 36 files changed, 323 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c

-- 
2.43.0



[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li 

This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than
zero.

Assume we have example code like the following, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, FUNC)  \
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = FUNC (in[i], x);   \
  }

  T sat_sub(T a, T b)
  {
return (a - b) & (-(T)(a >= b));
  }

  DEF_VX_BINARY(uint32_t, sat_sub)
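
As a standalone illustration of the branchless form above (a sketch, not
taken from the testsuite), the same expression can be written as a C++
template and checked at the saturation boundary. The mask -(T)(a >= b) is
all ones when a >= b and zero otherwise, so the difference is kept or
cleared without a branch. The name vx_sat_sub is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Unsigned saturating subtraction: a - b, clamped to 0 when a < b.
template<typename T>
constexpr T sat_sub(T a, T b)
{
  static_assert(std::is_unsigned_v<T>, "unsigned types only");
  return static_cast<T>((a - b) & (-(T)(a >= b)));
}

// Hypothetical vector-scalar loop with the same shape as DEF_VX_BINARY:
// every element of in is combined with the single scalar x.
template<typename T>
void vx_sat_sub(T* out, const T* in, T x, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = sat_sub(in[i], x);
}
```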

Before this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vssubu.vv v1,v1,v2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vssubu.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add
new case US_MINUS.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add us_minus to the
any_int_binop_no_shift_v_vdup iterator.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 1 +
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 45dd9256d02..76fb1c36357 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5581,6 +5581,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx 
op_2,
 case SMIN:
 case UMIN:
 case US_PLUS:
+case US_MINUS:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bbc7547d385..f5d2b2e74ae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3996,6 +3996,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
case MOD:
case UMOD:
case US_PLUS:
+   case US_MINUS:
  *total = get_vector_binary_rtx_cost (op, scalar2vr_cost);
  break;
default:
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 0e1318d1447..782544423c4 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,7 +4042,7 @@ (define_code_iterator any_int_binop [plus minus and ior 
xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_v_vdup [
-  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus
+  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus 
us_minus
 ])
 
 (define_code_iterator any_int_binop_no_shift_vdup_v [
-- 
2.43.0



[PATCH v1 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-26 Thread pan2 . li
From: Pan Li 

The cost model change makes the default cost of vx 2, so reconcile
the asm checks with this change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c:
Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c:
Ditto.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c   | 2 +-
 .../riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
index 2261872e3de..b32907afcbb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
index 4250567686a..344080cb93a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
index 656aad70165..492c3168216 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c
@@ -6,5 +6,5 @@
 DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
 
 /* { dg-final { scan-tree-dump-times ".SAT_SUB " 1 "optimized" } } */
-/* { dg-final { scan-assembler-times {vssubu\.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu\.vv} 1 } } */
 /* { dg-final { scan-assembler-times {vnsrl\.wi} 1 } } */
-- 
2.43.0



[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-26 Thread pan2 . li
From: Pan Li 

Add asm dump check and run test for the vec_duplicate + vssubu.vv
combine to vssubu.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  18 +-
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vssub-run-1-u8.c |  17 ++
 18 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vssub-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
index 21a207edce7..b064748fc14 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
index d1063adb0d6..e334bb3690b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
index 3d96503fd9a..3e8ca0570cd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vminu.vx} 2 } } */
 /* { dg-final { scan-assembler-times {vsaddu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vssubu.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
index 339a35c3f42..1f995cd8dc1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c
@@ -18,3 +18,4 @@ TEST_BINARY_VX_UNSIGNED_0(T)
 /

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-26 Thread pan2 . li
From: Pan Li 

Add asm dump check test for the vec_duplicate + vssubu.vv combine to
vssubu.vx, with GR2VR costs of 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 12 files changed, 24 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
index de10d66a1b2..afb5a8513a9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, 
VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X8)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BODY_X8)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, 
VX_BINARY_FUNC_BODY_X8)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, 
VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
 /* { dg-final { scan-assembler {vsaddu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
index 2e59da06c97..a907e9b7222 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY_X4)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BODY_X4)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, VX_BINARY_FUNC_BODY_X4)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -29,3 +30,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BOD
 /* { dg-final { scan-assembler {vremu.vx} } } */
 /* { dg-final { scan-assembler {vmaxu.vx} } } */
 /* { dg-final { scan-assembler {vminu.vx} } } */
+/* { dg-final { scan-assembler {vssubu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
index 064ed1f2e89..efabf9930f0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
@@ -18,6 +18,7 @@ DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_0_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, MIN_FUNC_1_WARP(T), min, VX_BINARY_FUNC_BODY)
 DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_WRAP(T), sat_add, VX_BINARY_FUNC_BODY)
+DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_SUB_FUNC_WRAP(T), sat_sub, VX_BINARY_FUNC_BODY)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -30,3 +31,4 @@ DEF_VX_BINARY_CASE_3_WRAP(T, SAT_U_ADD_FUNC_W

Re: [PATCH v4 1/4] Hard register constraints

2025-06-26 Thread Jeff Law




On 6/26/25 10:38 AM, Stefan Schulze Frielinghaus wrote:


So you need a ChangeLog, but this is OK once the ChangeLog is cobbled
together.  I think you should wait to commit until all 4 patches in this
series are ACK'd though.


Thanks for reviewing/commenting all four patches.  Very much appreciated!

Do I need approval of target maintainers, too?  Maybe for
cris_md_asm_adjust() from cris.cc and map_egpr_constraints() from
i386.cc and maybe the new target tests?

No additional approval needed as I'm a global maintainer :-)

jeff



Re: [PATCH v4 2/4] Error handling for hard register constraints

2025-06-26 Thread Jeff Law




On 6/26/25 10:46 AM, Stefan Schulze Frielinghaus wrote:

On Sat, Jun 21, 2025 at 09:18:43AM -0600, Jeff Law wrote:



On 5/20/25 1:22 AM, Stefan Schulze Frielinghaus wrote:

This implements error handling for hard register constraints including
potential conflicts with register asm operands.

In contrast to register asm operands, hard register constraints allow
more than just one register per operand.  Even more than just one
register per alternative.  For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}".  However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs.  For asm statements this
is done by this patch during gimplification.  For hard register
constraints used in machine description, error handling is still a todo
and I haven't investigated this so far and consider this rather a low
priority.
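To make the uniqueness rule above concrete, here is a minimal sketch of such a
check.  It is not the code from the patch: the helper names
hard_regs_per_alternative and has_conflict are hypothetical, and the sketch
assumes that hard registers are written as {name} and that alternatives are
separated by commas, as in the "{r0}{r1}m,{r2}" example.  It also treats a
register repeated inside a single operand as a conflict, which is an
assumption of this sketch.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Split a constraint string such as "{r0}{r1}m,{r2}" into alternatives and
// collect the hard register names written as {name} in each alternative.
// Hypothetical helper for illustration; this is not GCC's constraint parser.
std::vector<std::vector<std::string>>
hard_regs_per_alternative (const std::string &constraint)
{
  std::vector<std::vector<std::string>> alts (1);
  for (std::size_t i = 0; i < constraint.size (); ++i)
    {
      if (constraint[i] == ',')
        alts.emplace_back ();
      else if (constraint[i] == '{')
        {
          std::size_t end = constraint.find ('}', i);
          if (end == std::string::npos)
            break;  // Malformed input: ignore the unterminated brace.
          alts.back ().push_back (constraint.substr (i + 1, end - i - 1));
          i = end;
        }
    }
  return alts;
}

// Return true if some hard register is named more than once within the same
// alternative across the given operands (all outputs, or all inputs).
bool
has_conflict (const std::vector<std::string> &operands)
{
  std::vector<std::set<std::string>> seen;
  for (const std::string &op : operands)
    {
      auto alts = hard_regs_per_alternative (op);
      if (seen.size () < alts.size ())
        seen.resize (alts.size ());
      for (std::size_t a = 0; a < alts.size (); ++a)
        for (const std::string &reg : alts[a])
          if (!seen[a].insert (reg).second)
            return true;
    }
  return false;
}
```

With these helpers, the operands "{r0}{r1}m,{r2}" and "{r1},{r3}" conflict
because r1 is named twice in the first alternative, while "{r0}{r1}m,{r2}" and
"{r2},{r3}" do not, since r2 appears in different alternatives.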

There are 9/10 call sites for parse_{input,output}_constraint() which I
didn't dare to touch in the first run.  If this patch is about to be
accepted I could change those call sites and explicitly pass a null
pointer instead of overloading those functions as it is done right now.
I consider this an implementation nit and didn't want to clutter the
patch for reviewing.

Makes sense. I tend to prefer the overloads when we can easily do so, so
please make that change.  You're going to need a ChangeLog as well.


With those changes this is OK as well.


Just to get this right, you prefer the overloads which means I leave the
patch as is, right?

Sorry there was a missing "avoiding".

I tend to prefer avoiding the overloads when we can easily do so.




As promised in the cover letter I will also look into the last failing
test.  What is a bit annoying now is that some errors are thrown during
gimplification and some during expand.  For example, from pr87600-3.c
test1 fails during expand_asm_stmt() and test{2,3} fail during
parse_input_constraint().  Since we do not reach expand anymore after
throwing an error during gimplification, the dg-warnings for
those fail.  Currently I see two ways to fix this.  Simply split the
test into two files, or move the error handling part from expand to
gimplification.  For the moment I went with the former since that can be
quickly done ;-) and also without my patches some errors are thrown
during gimplification and some during expand, i.e., this kind of problem
already exists.  Thus, I don't have a strong opinion here but wanted to
make it clear upfront.
Understood.  In general I prefer diagnostics as early as we can 
reasonably do them.  So perhaps splitting the test is the way to go.



Jeff



Re: [PATCH v4 4/4] Rewrite register asm into hard register constraints

2025-06-26 Thread Jeff Law




On 6/26/25 10:51 AM, Stefan Schulze Frielinghaus wrote:



I didn't do any demotion of clobbers since I didn't see any value in it.
If a clobbered register gets accidentally clobbered as e.g. by an
implicitly introduced call, I wouldn't mind.
ACK.  I hadn't really thought much about it, just something I noticed 
walking through the changes.




Again, thanks for reviewing all this :)

This wasn't terribly hard, just sorry it took so damn long!

jeff



[PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-26 Thread Yuao Ma
Hi Dave,

> but the testcases don't seem to be conditionalized on this. Would the
> new tests fail if gcc is built against an insufficiently recent version
> of mpfr, and is/should there be some kind of dg-requires for this, so
> that the new tests gracefully are "UNSUPPORTED" on such configurations?

The test case is indeed conditionalized, though in a different manner than you
might expect. The condition depends on the version of MPFR we're using, and
unfortunately, I haven't found a predefined macro that indicates which MPFR
version GCC is linked against. I tried `gcc -E -dM - < /dev/null`, but didn't
find any relevant macros.

My current approach uses `__builtin_constant_p(acospi(0.5))`. If we're using a
newer MPFR version, acospi will be constant-folded, causing the condition to
evaluate to true and enabling the rest of the test. Otherwise, the condition
will be false, and the entire test case will be omitted.
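The guard described above can be sketched as follows.  This is only an
illustration of the idea, not the test from the patch: __builtin_sqrt stands
in for acospi so that the sketch does not depend on a particular MPFR or libm
version, and the function name run_guarded_checks is made up.

```cpp
// Returns 1 when the guarded checks ran and 0 when they were skipped.
// __builtin_sqrt is only a stand-in for acospi from the thread.
int
run_guarded_checks ()
{
  if (__builtin_constant_p (__builtin_sqrt (4.0)))
    {
      // Reached only if the compiler evaluated the call at compile time,
      // so the folded value can be checked here.
      if (__builtin_sqrt (4.0) != 2.0)
        __builtin_abort ();
      return 1;
    }
  // The call was not folded (for example because folding support is
  // missing), so the checks are skipped silently.
  return 0;
}
```

Whether the condition is true can depend on the compiler and the optimization
level, so the sketch deliberately accepts both outcomes.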

Do you see any other parts of the patch that require further revision?

Thanks,
Yuao




[PATCH] expand: Allow reuse of local memory for tail call argument [PR42909]

2025-06-26 Thread Andrew Pinski
After a call in tail position (even if it is not expanded as a tail call in
the end), the current function no longer cares about its local memory, so
there is no reason to copy the argument. This is only true for the
first use of the decl; the remaining uses require a copy
(c-c++-common/pr42909-3.c checks that).
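A minimal sketch of the kind of source this is about, assuming a struct large
enough to be passed in memory.  The names big, sum, sum_both, forward and
forward_twice are made up for illustration and are not taken from the
pr42909-*.c tests; whether a copy is actually avoided depends on the target
ABI and the compiler version.

```cpp
struct big
{
  int data[64];
};

// noinline (a GCC/Clang attribute) keeps the calls below visible as calls.
__attribute__ ((noinline)) int
sum (big b)
{
  int total = 0;
  for (int v : b.data)
    total += v;
  return total;
}

__attribute__ ((noinline)) int
sum_both (big x, big y)
{
  return sum (x) + sum (y);
}

// b is an incoming argument, its address is not taken, and it is used once
// in a call in tail position: the case where the copy could be skipped.
int
forward (big b)
{
  return sum (b);
}

// b is passed twice: per the description, only the first use could reuse
// the existing storage, the second use still needs its own copy.
int
forward_twice (big b)
{
  return sum_both (b, b);
}
```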

Bootstrapped and tested on aarch64-linux-gnu.

PR middle-end/42909
gcc/ChangeLog:

* calls.cc (initialize_argument_information): For tail
calls, allow reusing the argument if it is neither addressable
nor static and this is the first use of the decl. Disallow tail
calls if that argument is not an incoming argument.

gcc/testsuite/ChangeLog:

* c-c++-common/pr42909-1.c: New testcase
* c-c++-common/pr42909-2.c: New testcase
* c-c++-common/pr42909-3.c: New testcase
* c-c++-common/pr42909-4.c: New testcase

Signed-off-by: Andrew Pinski 
---
 gcc/calls.cc   | 33 ++
 gcc/testsuite/c-c++-common/pr42909-1.c | 19 +
 gcc/testsuite/c-c++-common/pr42909-2.c | 19 +
 gcc/testsuite/c-c++-common/pr42909-3.c | 38 ++
 gcc/testsuite/c-c++-common/pr42909-4.c | 20 ++
 5 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr42909-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr42909-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr42909-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr42909-4.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index ffb57622389..c0416a719d7 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "stringpool.h"
 #include "hash-map.h"
+#include "hash-set.h"
 #include "hash-traits.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -1337,6 +1338,7 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
 {
   CUMULATIVE_ARGS *args_so_far_pnt = get_cumulative_args (args_so_far);
   location_t loc = EXPR_LOCATION (exp);
+  hash_set<tree> decl_reused;
 
   /* Count arg position in order args appear.  */
   int argpos;
@@ -1428,15 +1430,37 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
  const bool callee_copies
= reference_callee_copied (args_so_far_pnt, arg);
  tree base;
-
+ bool can_reuse_arg = false;
+ bool can_reuse_with_tail_call = call_from_thunk_p;
  /* If we're compiling a thunk, pass directly the address of an object
 already in memory, instead of making a copy.  Likewise if we want
-to make the copy in the callee instead of the caller.  */
+to make the copy in the callee instead of the caller. */
  if ((call_from_thunk_p || callee_copies)
  && TREE_CODE (args[i].tree_value) != WITH_SIZE_EXPR
  && ((base = get_base_address (args[i].tree_value)), true)
  && TREE_CODE (base) != SSA_NAME
  && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
+   can_reuse_arg = true;
+
+ /* If we're compiling a tail call, pass the address of an object
+already in memory, instead of a copy if the argument is a local
+variable. Since after the tail call, the memory belongs to the caller,
+this is safe even if we don't expand the call in the end as a tail call.
+The first use of the decl can reuse it, the rest uses requires a copy.  */
+ if (*may_tailcall
+ && TREE_CODE (args[i].tree_value) != WITH_SIZE_EXPR
+ && ((base = get_base_address (args[i].tree_value)), true)
+ && TREE_CODE (base) != SSA_NAME
+ && DECL_P (base)
+ && !TREE_STATIC (base)
+ && !TREE_ADDRESSABLE (base)
+ && !decl_reused.add (base))
+   {
+ can_reuse_arg = true;
+ can_reuse_with_tail_call = TREE_CODE (base) == PARM_DECL;
+   }
+
+ if (can_reuse_arg)
{
  /* We may have turned the parameter value into an SSA name.
 Go back to the original parameter so we can take the
@@ -1461,11 +1485,10 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
 
  /* We can't use sibcalls if a callee-copied argument is
 stored in the current function's frame.  */
- if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
+ if (!can_reuse_with_tail_call && DECL_P (base) && !TREE_STATIC (base))
{
  *may_tailcall = false;
- maybe_complain_about_tail_call (exp, _("a callee-copied "
-"argument is stored "
+ maybe_complain_about_tail_call (exp, _("an argument is stored "
 "in the current "
   

Re: [committed] i386: Introduce crc_revsi4 expanders [PR120719]

2025-06-26 Thread Uros Bizjak
On Fri, Jun 27, 2025 at 7:27 AM Andi Kleen  wrote:
>
> Uros Bizjak  writes:
>
> > Introduce crc_revsi4 expanders to generate CRC32 instruction when using
> > __builtin_rev_crc32_data* builtins with 0x1EDC6F41 polynomial and -mcrc32.
> >
> > PR target/120719
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (crc_revsi4): New expander.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/crc-builtin-crc32.c: New test.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> This is missing enabling the test cases crc-crc32c* for the CRC pattern
> matching pass, which are currently only on aarch64/loongarch.
>
> So we're not sure if it actually works for that.

Included are target-dependent tests that result in crc32 instructions.
Generic tests are performed elsewhere (please see
gcc.dg/crc-builtin-target{32,64}.c).

> Also of course it would be nice to support PCLMULQDQ too like ARM.

As the saying goes: "Patches welcome!".

Uros.


Re: [PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-26 Thread Tobias Burnus

Hi Yuao,


Yuao Ma wrote:

>//but the testcases don't seem to be conditionalized on this. Would the
>//new tests fail if gcc is built against an insufficiently recent version
>//of mpfr,

…
The test case is indeed conditionalized, though in a different manner than you
might expect. The condition depends on the version of MPFR we're using, and
unfortunately, I haven't found a predefined macro that indicates which MPFR
version GCC is linked against.


I think there is a workaround: the 'print_version' function (toplev.cc) 
prints the MPFR version, but only when not printing to stderr.



Thus, I get the desired output with:


gcc -S -fverbose-asm -o - -x c - < /dev/null


[I think the /dev/null is not quite portable; possibly a pipe ("echo 
|...") or an empty file is more portable.]



The output contains here "... MPFR version 4.2.2 ..."


Thus, you could wrap this into an effective target check, similar to the 
others in gcc/testsuite/lib/target-supports.exp + then use a '{ target 
... }' in the test case.



My current approach uses `__builtin_constant_p(acospi(0.5))`. If we're using a
newer MPFR version, acospi will be constant-folded, causing the condition to
evaluate to true and enabling the rest of the test. Otherwise, the condition
will be false, and the entire test case will be omitted.


... which might well be sufficient. But if so, I think it deserves a 
comment in the testcase because it is not obvious at a glance.



Tobias



Re: [PATCH 2/2] Fixup vector epilog analysis skipping when not using partial vectors

2025-06-26 Thread Richard Biener
On Thu, 26 Jun 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following avoids re-analyzing the loop as epilogue when not
> > using partial vectors and the mode is the same as the autodetected
> > vector mode and that has a too high VF for a non-predicated loop.
> > This situation occurs almost always on x86 and saves us one
> > re-analysis unless --param vect-partial-vector-usage is non-default.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> >
> > Thanks,
> > Richard.
> >
> > * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
> > analysis further when not using partial vectors.
> > ---
> >  gcc/tree-vect-loop.cc | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index b91ef4a2325..d9091c6c705 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3770,6 +3770,26 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> > break;
> >   continue;
> > }
> > + /* We would need an exhaustive search to find all modes we
> > +skipped but that would lead to the same result as another
> > +and where we could check cached_vf_per_mode against.
> 
> I didn't really follow this.  Is there a missing word around "another"?

I've reworded it to

  /* We would need an exhaustive search to find all modes we
 skipped but that would lead to the same result as the
     analysis it was skipped for and where we could check 
 cached_vf_per_mode against.
 Check for the autodetected mode, which is the common
 situation on x86 which does not perform cost comparison.  */

basically the mode skipping logic in vect_analyze_loop_1 leaves us
with unfilled (zero) cached_vf_per_mode[], and we'd ideally skip
the very same modes when analyzing the epilogue with the extra
maybe_ge (cached_vf_per_mode[mode_i], first_vinfo_vf) when not
using partial vectors.

> > +Check for the autodetected mode, which is the common
> > +situation on x86 which does not perform cost comparison.  */
> > + if (!supports_partial_vectors
> > + && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
> > + && VECTOR_MODE_P (autodetected_vector_mode)
> > + && (related_vector_mode (vector_modes[mode_i],
> > +  GET_MODE_INNER (autodetected_vector_mode))
> > + == autodetected_vector_mode)
> > + && (related_vector_mode (autodetected_vector_mode,
> > +  GET_MODE_INNER (vector_modes[mode_i]))
> > + == vector_modes[mode_i]))
> 
> Not too keen on cutting-&-pasting all this :-)  Could we split the
> VECTOR_MODE_P onwards into a subroutine that's shared with
> vect_analyze_loop_1?

Done like below.  I do wonder in which case the different variants
of vect_chooses_same_modes_p get to different answers?

Queued for re-testing with a proposed adjustment to [1/2], see other
mail I'll send out soon.

Richard.

>From 4bbf86e65f4a761d5081daf6216dc516e8717e31 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 26 Jun 2025 11:38:47 +0200
Subject: [PATCH] Fixup vector epilog analysis skipping when not using partial
 vectors
To: gcc-patches@gcc.gnu.org

The following avoids re-analyzing the loop as epilogue when not
using partial vectors and the mode is the same as the autodetected
vector mode and that has a too high VF for a non-predicated loop.
This situation occurs almost always on x86 and saves us one
re-analysis unless --param vect-partial-vector-usage is non-default.

* tree-vectorizer.h (vect_chooses_same_modes_p): New
overload.
* tree-vect-stmts.cc (vect_chooses_same_modes_p): Likewise.
* tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
analysis further when not using partial vectors.
---
 gcc/tree-vect-loop.cc  | 25 ++---
 gcc/tree-vect-stmts.cc | 17 +
 gcc/tree-vectorizer.h  |  1 +
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b91ef4a2325..81a9716d51d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3535,13 +3535,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
   mode_i += 1;
 }
   if (mode_i + 1 < vector_modes.length ()
-  && VECTOR_MODE_P (autodetected_vector_mode)
-  && (related_vector_mode (vector_modes[mode_i + 1],
-  GET_MODE_INNER (autodetected_vector_mode))
- == autodetected_vector_mode)
-  && (related_vector_mode (autodetected_vector_mode,
-  GET_MODE_INNER (vector_modes[mode_i + 1]))
- == vector_modes[mode_i + 1]))
+  && vect_chooses_same_modes_p (autodetected_vector_mode,
+   vector_modes[mode_i + 1]))
 {
  

Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-26 Thread Paul Richard Thomas
Hi Andre,

I used a clean build directory but don't recall if I reconfigured. I was 10
minutes away from leaving for the airport! I'll try again when I am back at
base.

Please, everyone else, don't hesitate to review and test.

Regards

Paul


On Tue, 24 Jun 2025, 23:47 Andre Vehreschild,  wrote:

> Hi Paul,
>
> thanks for trying it. I can only affirm the need of whitespace fixes for the
> fifth patch. But that got me only a warning on a clean trunk. So application
> should be fine. Attached is an updated patch of part 5 of the patch series with
> all whitespace errors removed. Sorry for the bother.
>
> As to your compile error: Did you build in a clean directory? The patch adds
> another library and therefore makefiles are changed. This may need another
> configure run. I can only imagine that something went wrong there and the
> dependencies in the build system have not been updated, because that is what
> the error message indicates. Can you try again?
>
> Because "works for me" using a clean build directory. How can I help you
> further?
>
> Regards,
> Andre
>
> On Tue, 24 Jun 2025 17:23:31 +0100
> Paul Richard Thomas  wrote:
>
> > Hi Andre,
> >
> > All six patches require git apply --whitespace=fix --ignore-space-change <
> > ~/prs/Shared_Memory/pr88076_v1_x.patch to apply.
> >
> >  The build fails with:
> > Makefile:3848: caf/.deps/caf_error.Plo: No such file or directory
> > make[2]: *** No rule to make target 'caf/.deps/caf_error.Plo'.  Stop.
> > make[2]: Leaving directory
> > '/home/pault/gitsources/build/x86_64-pc-linux-gnu/libgfortran'
> > make[1]: *** [Makefile:16529: install-target-libgfortran] Error 2
> > make[1]: Leaving directory '/home/pault/gitsources/build'
> > make: *** [Makefile:2668: install] Error 2
> >
> > I am afraid that I have timed out for the next two weeks - sorry.
> >
> > Regards
> >
> > Paul
> >
> >
> > On Tue, 24 Jun 2025 at 14:10, Andre Vehreschild  wrote:
> >
> > > Hi all,
> > >
> > > this series of patches (six in total) adds a new coarray backend library to
> > > libgfortran.  The library uses shared memory and processes to implement
> > > running multiple images on the same node.  The work is based on work started
> > > by Thomas and Nicolas Koenig. No changes to the gfortran compile part are
> > > required for this.
> > >
> > > Unfortunately I found some defects in the gfortran compiler that needed to
> > > be fixed. These are the first four tiny patches. The fifth patch then adds
> > > the library and the sixth patches the testcases in
> > > gcc/testsuite/gfortran.dg/coarray to also run (and pass) when linked against
> > > caf_shmem.
> > >
> > > The development has been done on x86_64-pc-linux-gnu / Fedora 41. I am
> > > curious to learn which fixes will be needed for other platforms.
> > >
> > > This will be the last big patch that was funded by the STF/STA. My funding
> > > has run out and I will only be available for a few days before a new project
> > > will consume my attention. Therefore please bring any deficiencies to my
> > > attention as soon as possible.
> > >
> > > I have done some performance measurement against OpenCoarrays measuring
> > > coarray_icar from https://github.com/gutmann/coarray_icar . The figures are:
> > >
> > > OpenCoarrays (mpich4-backend): 165.578s (real 2m59,947s)
> > > --
> > > caf_shmem (16-trunk): 61.489s (real 1m3,681s)
> > >
> > > The first number is the "Model run time:" as reported by the program. In the
> > > parentheses the real run time as reported by the bash command `time` is
> > > given.
> > >
> > > Both are done using a debug build of coarray_icar on an Intel Core i7-5775C
> > > CPU @ 3.30GHz having 24GB, and running Fedora Linux 41 with all recent
> > > patches.
> > >
> > > Regards,
> > > Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> > >
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kaminski
On Thu, Jun 26, 2025 at 12:31 PM Tomasz Kamiński wrote:

> This patch extracts calls to _M_locale_fmt and construction of the struct tm,
> from the functions dedicated to each specifier, to the main format loop in
> _M_format_to functions. This removes duplicated code repeated for specifiers.
>
> To allow _M_locale_fmt to only be called if localized formatting is enabled
> ('L' is present in chrono-format-spec), we provide implementations for
> locale specific specifiers (%c, %r, %x, %X) that produce the same result
> as locale::classic():
>  * %c is implemented as separate _M_c method
>  * %r is implemented as separate _M_r method
>  * %x is implemented together with %D, as they provide same behavior,
>  * %X is implemented together with %R as _M_R_X, as both of them do not
>    include subseconds.
>
> The handling of subseconds was also extracted to _M_subsecs function that is
> used by _M_S and _M_T specifier. The _M_T is now implemented in terms of
> _M_R_X (printing time without subseconds) and _M_subsecs.
>
> The __mod responsible for triggering localized formatting was removed from
> method handling most of specifiers, except:
>  * _M_S (for %S) for which it determines if subseconds should be included,
>  * _M_z (for %z) for which it determines if ':' is used as separator.
>
> PR libstdc++/110739
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
> Define.
> (__formatter_chrono::_M_locale_fmt): Moved to front of the class.
> (__formatter_chrono::_M_format_to): Construct and initialize
> struct tm and call _M_locale_fmt if needed.
> (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
> (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
> (__formatter_chrono::_M_D): Renamed to _M_D_x.
> (__formatter_chrono::_M_D_x): Renamed from _M_D.
> (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
> (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
> (__formatter_chrono::_M_T): Define in terms of _M_R_X and
> _M_subsecs.
> (__formatter_chrono::_M_subsecs): Extracted from _M_S.
> (__formatter_chrono::_M_S): Replaced __mod with __subs argument,
> removed _M_locale_fmt call, and delegate to _M_subsecs.
> (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
> (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
> (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
> __mod argument and call to _M_locale_fmt.
> ---
>  libstdc++-v3/include/bits/chrono_io.h | 340 +-
>  1 file changed, 172 insertions(+), 168 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
> index 35e95906e6a..d451bde722d 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -906,6 +906,40 @@ namespace __format
>   return __format::__write(std::move(__out), __s);
> }
>
> +  [[__gnu__::__always_inline__]]
> +  static bool
> +  _S_localized_spec(_CharT __conv, _CharT __mod)
> +  {
> +   switch (__conv)
> + {
> + case 'c':
> + case 'r':
> + case 'x':
> + case 'X':
> +   return true;
> + case 'z':
> +   return false;
> + default:
> +   return (bool)__mod;
> + };
> +  }
> +
> +  // Use the formatting locale's std::time_put facet to produce
> +  // a locale-specific representation.
> +  template<typename _Iter>
> +   _Iter
> +   _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm& __tm,
> + char __fmt, char __mod) const
> +   {
> + basic_ostringstream<_CharT> __os;
> + __os.imbue(__loc);
> + const auto& __tp = use_facet<time_put<_CharT>>(__loc);
> + __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
> + if (__os)
> +   __out = _M_write(std::move(__out), __loc, __os.view());
> + return __out;
> +   }
> +
>template
> _Out
> _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
> @@ -923,6 +957,36 @@ namespace __format
> return std::move(__out);
>   };
>
> + struct tm __tm{};
> + bool __use_locale_fmt = false;
> + if (_M_spec._M_localized && _M_spec._M_locale_specific)
> +   if (__fc.locale() != locale::classic())
> + {
> +   __use_locale_fmt = true;
> +
> +   __tm.tm_year = (int)__t._M_year - 1900;
> +   __tm.tm_yday = __t._M_day_of_year.count();
> +   __tm.tm_mon = (unsigned)__t._M_month - 1;
> +   __tm.tm_mday = (unsigned)__t._M_day;
> +   __tm.tm_wday = __t._M_weekday.c_encoding();
> +   __tm.tm_hour = __t._M_hours.count();
> +   __tm.tm_min = __t._M_minutes.count();
> 

[RFC PATCH] c++, v2: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-26 Thread Jakub Jelinek
On Wed, Jun 25, 2025 at 04:29:41PM -0400, Jason Merrill wrote:
> > (and
> > whether perhaps cp_build_addr_expr isn't undesirable for that case, because
> > that can make vars odr-used etc.; are odr uses in unevaluated context
> > also supposed to make vars odr-used?).
> 
> That's fine, mark_used handles not actually odr-using things in unevaluated
> context.

Thanks for the patch.

Here is an updated patch, interdiff from the last posted patch attached
(except for testsuite changes).  Had to add some tweak in the dynamic_cast
handling because the code wasn't expecting obj like
(struct S0 *)(&v0 + 16)
which is only folded into v0.D.1234 if dereferenced and the code expects
obj of the &v0.D.1234 form.

Also had to revert Marek's patch to add indexes to some vtable CONSTRUCTORs,
it added them only to some CONSTRUCTOR_ELTs and not to others; at that time
find_array_ctor_elt wasn't able to deal with index-less CONSTRUCTOR_ELTs,
now it can deal with them but only if the CONSTRUCTOR is consistent, either
all indexes or none or at least first part without indexes, then with
indexes up to the end.  Now the vtables consistently don't have any indexes,
which is something the code handles well, it can do just direct access and
for flag_checking it verifies if last elt has no index, then none of them
have and there is no RAW_DATA_CST.

I get some regressions (which I didn't get with the earlier patch, but
it isn't obvious by what it has been caused):
+FAIL: g++.dg/abi/mangle1.C  -std=gnu++26  scan-assembler \\n_?_ZN1AC2Ev[: \\t\\n]
+FAIL: g++.dg/abi/mangle1.C  -std=gnu++26  scan-assembler \\n_?_ZN1BC2Ev[: \\t\\n]
+FAIL: g++.dg/abi/mangle1.C  -std=gnu++26  scan-assembler \\n_?_ZN1CC1Ev[: \\t\\n]
+FAIL: g++.dg/abi/mangle1.C  -std=gnu++26  scan-assembler \\n_?_ZTV1A[: \\t\\n]
+FAIL: g++.dg/abi/vbase15.C  -std=c++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.dg/abi/vbase15.C  -std=c++26 (test for excess errors)
+UNRESOLVED: g++.dg/abi/vbase15.C  -std=c++26 compilation failed to produce executable
+FAIL: g++.dg/abi/vbase8-10.C  -std=gnu++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.dg/abi/vbase8-10.C  -std=gnu++26 (test for excess errors)
+UNRESOLVED: g++.dg/abi/vbase8-10.C  -std=gnu++26 compilation failed to produce executable
+FAIL: g++.dg/abi/vbase8-21.C  -std=gnu++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.dg/abi/vbase8-21.C  -std=gnu++26 (test for excess errors)
+UNRESOLVED: g++.dg/abi/vbase8-21.C  -std=gnu++26 compilation failed to produce executable
+FAIL: g++.dg/abi/vbase8-22.C  -std=gnu++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.dg/abi/vbase8-22.C  -std=gnu++26 (test for excess errors)
+UNRESOLVED: g++.dg/abi/vbase8-22.C  -std=gnu++26 compilation failed to produce executable
+FAIL: g++.dg/cpp2a/constexpr-dtor3.C  -std=c++26  (test for errors, line 26)
+FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++26  scan-ipa-dump icf "(Unified; Variable alias has been created)|(Symbol aliases are not supported by target)"
+FAIL: g++.dg/ipa/ipa-icf-4.C  -std=gnu++26  scan-ipa-dump icf "Equal symbols: [67]"
+FAIL: g++.old-deja/g++.abi/primary3.C  -std=c++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.old-deja/g++.abi/primary3.C  -std=c++26 (test for excess errors)
+UNRESOLVED: g++.old-deja/g++.abi/primary3.C  -std=c++26 compilation failed to produce executable
+FAIL: g++.old-deja/g++.abi/primary4.C  -std=c++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.old-deja/g++.abi/primary4.C  -std=c++26 (test for excess errors)
+UNRESOLVED: g++.old-deja/g++.abi/primary4.C  -std=c++26 compilation failed to produce executable
+FAIL: g++.old-deja/g++.abi/vbase8-5.C  -std=gnu++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.old-deja/g++.abi/vbase8-5.C  -std=gnu++26 (test for excess errors)
+UNRESOLVED: g++.old-deja/g++.abi/vbase8-5.C  -std=gnu++26 compilation failed to produce executable
+FAIL: g++.old-deja/g++.abi/vtable2.C  -std=gnu++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.old-deja/g++.abi/vtable2.C  -std=gnu++26 (test for excess errors)
+UNRESOLVED: g++.old-deja/g++.abi/vtable2.C  -std=gnu++26 compilation failed to produce executable
+FAIL: g++.old-deja/g++.pt/mi1.C  -std=c++26 (internal compiler error: in cxx_fold_indirect_ref, at cp/constexpr.cc:6154)
+FAIL: g++.old-deja/g++.pt/mi1.C  -std=c++26 (test for excess errors)
+UNRESOLVED: g++.old-deja/g++.pt/mi1.C  -std=c++26 compilation failed to produce executable

The ICEs are all in the same spot:
  tree off = integer_zero_node;
  canonicalize_obj_off (op, off);
  gcc_assert (integer_zerop (off));
  return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
maybe will j

Re: [PATCH v2 1/2] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-26 Thread Jonathan Wakely

On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:

This patch reworks the formatting for the chrono types, such that they are all
formatted in terms of _ChronoData class, that includes all required fields.
Populating each required field is performed in formatter for specific type,
based on the chrono-spec used.

To facilitate above, the _ChronoSpec now includes additional _M_needed field,
that represents the chrono data that is referenced by format spec (this value
is also configured for __defSpec). This value differs from the value of
__parts passed to _M_parse, which does include all fields that can be computed
from input (e.g. weekday_indexed can be computed for year_month_day). Later
it is used to fill _ChronoData, in particular _M_fill_* family of functions,
to determine if given field needs to be set, and thus it's value needs to be


"its"


computed.

In consequence _ChronoParts enum was exteneded with additional values,


"extended"


that allows more fine grained indentification:


"identification"


* _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
* _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
* _LocalDays, _WeekdayIndex are defiend in included in _Date,


"defined"


* _Duration is removed, and instead _EpochUnits and _UnitSuffix are
  introduced.
Furthermore, to avoid name conflicts _ChronoParts is now defined as enum class,
with additional operators that simplify uses.


I don't love overloading operator- to mean clearing bits, but it does
make clearing the bits very convenient. Maybe just add a comment
before operator-(_ChronoParts x, _ChronoParts y) saying that it
returns a copy of x with all bits from y unset. That comment will be 
know that's what the function body 


(Which is x&(x^y) I think, right?)
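
As a minimal sketch of the operator- semantics discussed above (a copy of x
with all bits from y unset), assuming a hypothetical Parts enum that stands in
for _ChronoParts; the enumerator names and values are illustrative only and are
not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for _ChronoParts; names and values are assumptions.
enum class Parts : std::uint8_t {
  None = 0, Year = 1, Month = 2, Day = 4, Date = Year | Month | Day
};

constexpr unsigned bits(Parts p) { return static_cast<unsigned>(p); }

constexpr Parts operator|(Parts x, Parts y)
{ return static_cast<Parts>(bits(x) | bits(y)); }

constexpr Parts operator&(Parts x, Parts y)
{ return static_cast<Parts>(bits(x) & bits(y)); }

constexpr Parts operator^(Parts x, Parts y)
{ return static_cast<Parts>(bits(x) ^ bits(y)); }

// Returns a copy of x with all bits from y unset.
// x & (x ^ y) keeps exactly the bits that are set in x and not set in y.
constexpr Parts operator-(Parts x, Parts y)
{ return x & (x ^ y); }
```

With these definitions, Parts::Date - Parts::Month yields Year | Day, and
subtracting a bit that was never set leaves the value unchanged.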




In addition to fields that can be printed using chron-spec, _ChronoData stores:


"chrono-spec"


* Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
  struct tm construction, and for ISO calendar computation.
* Total seconds in wall time (_M_lseconds) - this value may be different from
  sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
  to allow future extension, like printing total minutes.
* Total seconds since epoch - due offset different from above. Again to be
  used with future extension (e.g. %s as proposed in P2945R1).
* Subseconds - count of attoseconds (10^(-18)), in addition to priting can


"printing"


  be used to  compute fractional hours, minutes.
The both total seconds fielkds we use single _TotalSeconds enumerator in


"fields"


_ChronoParts, that when present in combination with _EpochUnits or _LocalDays
indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are
provided/required.

To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
format_args mechanism, where the result of +d.count() (see LWG4118) is erased
into make_format_args to local __arg_store, that is later referenced by
_M_ereps (_M_ereps.get(0)).

To handle precision values, and in preparation to allow user to configure ones,
we store the precision as third element of _M_ereps (_M_ereps.get(2)), this
allows duration with precision to be printed using "{0:{2}}". For subseconds
the precision is handled differently depending on the representation:
* for integral reps, _M_subseconds value is used to determine fractional value,
  precision is trimmed to 18 digits;
* for floating-points, we _M_ereps stores duration initialized with only


Strike "we"?


  fractional seconds, that is later formatted with precision.
Always using _M_subseconds fields for integral duration, means that we do not
use formatter for user-defined durations that are considered to be integral
(see empty_spec.cc file change). To avoid potentially expensive computation
of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if
_Subseconds are needed. In particular we remove this flag for localized output
in _M_parse.

Construction the _M_ereps as described above is handled by __formatter_duration,


"Construction of the"


that is then used to format duration, hh_mm_ss and time_points specialization.


"specializations"


This class also handles _UnitSuffix, the _M_units_suffix field is populated
either with predefined suffix (chrono::__detail::__units_suffix) or one produced
locally.

Finally, formatters for types listed below contain type-specific logic:
* hh_mm_ss - we do not compute total duration and seconds, unless explicitly
  requested, as such computation may overflow;
* utc_time - for time during leap second insertion, the _M_seconds field is
  increased to 60;
* __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
  abbreviation (_ZoneAbbrev) is requested, but corresponding pointer is null,
  furthermore conversion from `char` to `wchar_t` for abbreviation is performed
  if needed.



I had other comments about the benefits of this change, and the
unintended change that Michae

[PATCH] RISC-V: Fix CFA offsets for stack probes in loop [PR119944]

2025-06-26 Thread Raphael Moreira Zinsly
The CFI output when we do stack probing in a loop was not accounting
for the first sp adjustments; we can fix that by using the frame's
total size.
This is already being tested by g++.dg/torture/pr119610.C.

gcc/ChangeLog:
gcc/config/riscv/riscv.cc
(riscv_allocate_and_probe_stack_space): Use the total frame size
instead of the current adjustment size to set the CFI.

gcc/testsuite/ChangeLog:
gcc.target/riscv/stack-check-cfa-2.c: Fix expected output.
---
 gcc/config/riscv/riscv.cc  | 5 +++--
 gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bbc7547d385..3e31438c50c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8977,12 +8977,13 @@ riscv_allocate_and_probe_stack_space (rtx temp1, 
HOST_WIDE_INT size)
   temp2 = riscv_force_temporary (temp2, gen_int_mode (rounded_size, 
Pmode));
   insn = emit_insn (gen_sub3_insn (temp2, stack_pointer_rtx, temp2));
 
+  auto cfa_offset = cfun->machine->frame.total_size;
   if (!frame_pointer_needed)
{
  /* We want the CFA independent of the stack pointer for the
 duration of the loop.  */
  add_reg_note (insn, REG_CFA_DEF_CFA,
-   plus_constant (Pmode, temp1, rounded_size));
+   plus_constant (Pmode, temp1, cfa_offset));
  RTX_FRAME_RELATED_P (insn) = 1;
}
 
@@ -8995,7 +8996,7 @@ riscv_allocate_and_probe_stack_space (rtx temp1, 
HOST_WIDE_INT size)
{
  insn = get_last_insn ();
  add_reg_note (insn, REG_CFA_DEF_CFA,
-   plus_constant (Pmode, stack_pointer_rtx, rounded_size));
+   plus_constant (Pmode, stack_pointer_rtx, cfa_offset));
  RTX_FRAME_RELATED_P (insn) = 1;
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c 
b/gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c
index 9d36a30..3649bd1a9ce 100644
--- a/gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack-check-cfa-2.c
@@ -5,9 +5,9 @@
 #define SIZE 80*1024 + 512
 #include "stack-check-prologue.h"
 
-/* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 81920} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 82432} 1 } } */
 /* { dg-final { scan-assembler-times {\.cfi_def_cfa_register 2} 1 } } */
-/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 82432} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 82944} 1 } } */
 /* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
 
 /* Checks that the CFA notes are correct for every sp adjustment.  */
-- 
2.47.0



[committed] i386: Introduce crc_revsi4 expanders [PR120719]

2025-06-26 Thread Uros Bizjak
Introduce crc_revsi4 expanders to generate CRC32 instruction when using
__builtin_rev_crc32_data* builtins with 0x1EDC6F41 polynomial and -mcrc32.

PR target/120719

gcc/ChangeLog:

* config/i386/i386.md (crc_revsi4): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/i386/crc-builtin-crc32.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 41a86544bbf..adff2af4563 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -29523,6 +29523,23 @@ (define_insn "sse4_2_crc32di"
(set_attr "prefix_extra" "1")
(set_attr "mode" "DI")])
 
+(define_expand "crc_revsi4"
+  [(match_operand:SI 0 "register_operand")
+   (match_operand:SI 1 "register_operand")
+   (match_operand:SWI124 2 "nonimmediate_operand")
+   (match_operand:SI 3)]
+  "TARGET_CRC32"
+{
+  /* crc32 uses iSCSI polynomial */
+  if (INTVAL (operands[3]) == 0x1EDC6F41)
+emit_insn (gen_sse4_2_crc32 (operands[0], operands[1], operands[2]));
+  else
+expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
+operands[3], mode,
+generate_reflecting_code_standard);
+  DONE;
+})
+
 (define_insn "rdpmc"
   [(set (match_operand:DI 0 "register_operand" "=A")
(unspec_volatile:DI [(match_operand:SI 1 "register_operand" "c")]
diff --git a/gcc/testsuite/gcc.target/i386/crc-builtin-crc32.c 
b/gcc/testsuite/gcc.target/i386/crc-builtin-crc32.c
new file mode 100644
index 000..0b4ff978817
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/crc-builtin-crc32.c
@@ -0,0 +1,22 @@
+/* PR target/120719 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcrc32" } */
+
+#include 
+
+int32_t rev_crc32_data8 (int8_t v)
+{
+  return __builtin_rev_crc32_data8 (0x, v, 0x1EDC6F41);
+}
+
+int32_t rev_crc32_data16 (int16_t v)
+{
+  return __builtin_rev_crc32_data16 (0x, v, 0x1EDC6F41);
+}
+
+int32_t rev_crc32_data32 (int32_t v)
+{
+  return __builtin_rev_crc32_data32 (0x, v, 0x1EDC6F41);
+} 
+
+/* { dg-final { scan-assembler-times "\tcrc32" 3 } } */
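
For reference, the reflected CRC update with the 0x1EDC6F41 (iSCSI, CRC-32C)
polynomial can be modelled in software as below. This is only a sketch: the
function names are hypothetical, and the assumption that it matches the
builtin's semantics bit for bit is not confirmed by the patch. The wrapper
with the 0xFFFFFFFF initial value and final inversion is the conventional
CRC-32C framing and is not part of the per-byte update itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the bit order of a 32-bit value.
std::uint32_t reflect32(std::uint32_t v)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (v & 1u);
      v >>= 1;
    }
  return r;
}

// Hypothetical software model of one reflected CRC step over 8 data bits.
// POLY is given in normal (non-reflected) form, as in the test above.
std::uint32_t rev_crc32_data8_sketch(std::uint32_t crc, std::uint8_t data,
                                     std::uint32_t poly)
{
  const std::uint32_t rpoly = reflect32(poly);
  crc ^= data;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc & 1u) ? (crc >> 1) ^ rpoly : (crc >> 1);
  return crc;
}

// Conventional CRC-32C over a buffer: initial value and final XOR of all ones.
std::uint32_t crc32c_sketch(const char* s, std::size_t n)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < n; ++i)
    crc = rev_crc32_data8_sketch(crc, static_cast<std::uint8_t>(s[i]),
                                 0x1EDC6F41u);
  return ~crc;
}
```

The standard check string "123456789" gives 0xE3069283 for CRC-32C, which is a
convenient sanity test for any implementation of this polynomial.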


Re: [PATCH] c++, libstdc++, v2: Implement C++26 P2830R10 - Constexpr Type Ordering

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 11:33, Jakub Jelinek  wrote:
>
> On Wed, Jun 25, 2025 at 10:58:59PM +0200, Maciej Cencora wrote:
> > update of std module is missing.
>
> Here is an updated patch which adds the std module part and while I was
> changing the patch, I've also added value_type/type and the 2 operators
> to std::type_order.
>
> Interdiff from the last patch is:
> --- libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> +++ libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> @@ -1271,6 +1271,10 @@
>  struct type_order
>  {
>static constexpr strong_ordering value = __builtin_type_order(_Tp, 
> _Up);
> +  using value_type = strong_ordering;
> +  using type = type_order<_Tp, _Up>;
> +  constexpr operator value_type() const noexcept { return value; }
> +  constexpr value_type operator()() const noexcept { return value; }

It's unclear which of these members will end up in the final standard,
but this is OK for now.

The library parts are OK for trunk, thanks.

>  };
>
>/// @ingroup variable_templates
> --- libstdc++-v3/src/c++23/std.cc.in.jj 2025-06-12 15:50:51.400821105 +0200
> +++ libstdc++-v3/src/c++23/std.cc.in2025-06-26 07:37:06.90208 +0200
> @@ -888,6 +888,10 @@ export namespace std
>using std::partial_order;
>using std::strong_order;
>using std::weak_order;
> +#if __glibcxx_type_order >= 202506L
> +  using std::type_order;
> +  using std::type_order_v;
> +#endif
>  }
>
>  // 28.4 
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Though, now that I look at it again, perhaps both
> #if __glibcxx_type_order >= 202506L
> in the patch should have been
> #if __cpp_lib_type_order >= 202506L
>
> Can change that.
>
> 2025-06-26  Jakub Jelinek  
>
> gcc/cp/
> * cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering.
> (TYPE_ORDER): New.
> * method.cc (type_order_value): Define.
> * cp-tree.h (type_order_value): Declare.
> * semantics.cc (trait_expr_value): Use gcc_unreachable also
> for CPTK_TYPE_ORDER, adjust comment.
> (finish_trait_expr): Handle CPTK_TYPE_ORDER.
> * constraint.cc (diagnose_trait_expr): Likewise.
> gcc/testsuite/
> * g++.dg/cpp26/type-order1.C: New test.
> * g++.dg/cpp26/type-order2.C: New test.
> * g++.dg/cpp26/type-order3.C: New test.
> libstdc++-v3/
> * include/bits/version.def (type_order): New.
> * include/bits/version.h: Regenerate.
> * libsupc++/compare: Define __glibcxx_want_type_order before
> including bits/version.h.
> (std::type_order, std::type_order_v): New trait and template variable.
> * src/c++23/std.cc.in (std::type_order, std::type_order_v): Export.
> * testsuite/18_support/comparisons/type_order/1.cc: New test.
>
> --- gcc/cp/method.cc.jj 2025-06-25 16:04:51.611158952 +0200
> +++ gcc/cp/method.cc2025-06-25 16:09:32.017556551 +0200
> @@ -3951,5 +3951,26 @@ num_artificial_parms_for (const_tree fn)
>return count;
>  }
>
> +/* Return value of the __builtin_type_order trait.  */
> +
> +tree
> +type_order_value (tree type1, tree type2)
> +{
> +  tree rettype = lookup_comparison_category (cc_strong_ordering);
> +  if (rettype == error_mark_node)
> +return rettype;
> +  int ret;
> +  if (type1 == type2)
> +ret = 0;
> +  else
> +{
> +  const char *name1 = ASTRDUP (mangle_type_string (type1));
> +  const char *name2 = mangle_type_string (type2);
> +  ret = strcmp (name1, name2);
> +}
> +  return lookup_comparison_result (cc_strong_ordering, rettype,
> +  ret == 0 ? 0 : ret > 0 ? 1 : 2);
> +}
> +
>
>  #include "gt-cp-method.h"
> --- gcc/cp/cp-tree.h.jj 2025-06-25 16:04:51.610158965 +0200
> +++ gcc/cp/cp-tree.h2025-06-25 16:09:32.019556525 +0200
> @@ -7557,6 +7557,8 @@ extern bool ctor_omit_inherited_parms (
>  extern tree locate_ctor(tree);
>  extern tree implicitly_declare_fn   (special_function_kind, tree,
>  bool, tree, tree);
> +extern tree type_order_value   (tree, tree);
> +
>  /* In module.cc  */
>  class module_state; /* Forward declare.  */
>  inline bool modules_p () { return flag_modules != 0; }
> --- gcc/cp/semantics.cc.jj  2025-06-25 16:04:51.633158669 +0200
> +++ gcc/cp/semantics.cc 2025-06-25 16:09:32.021556500 +0200
> @@ -13593,8 +13593,10 @@ trait_expr_value (cp_trait_kind kind, tr
>  case CPTK_IS_DEDUCIBLE:
>return type_targs_deducible_from (type1, type2);
>
> -/* __array_rank is handled in finish_trait_expr. */
> +/* __array_rank and __builtin_type_order are handled in
> +   finish_trait_expr.  */
>  case CPTK_RANK:
> +case CPTK_TYPE_ORDER:
>gcc_unreachable ();
>
>  #define DEFTRAIT_TYPE(CODE, NAME, ARITY) \
> @@ -13724,6 +13726,12 @@ finish_trait_expr (
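
The comparison in type_order_value quoted above can be sketched outside the
compiler as follows. This is an illustration only: the Ordering enum and the
function name are hypothetical, the strings are passed in directly rather than
produced by mangle_type_string, and the 0/1/2 encoding simply mirrors the
third argument passed to lookup_comparison_result in the quoted code.

```cpp
#include <cassert>
#include <cstring>

// Mirrors the `ret == 0 ? 0 : ret > 0 ? 1 : 2` selection in the quoted code.
enum class Ordering { Equal = 0, Greater = 1, Less = 2 };

// Order two types by comparing their mangled names as C strings.
Ordering order_by_mangled_name(const char* name1, const char* name2)
{
  const int ret = std::strcmp(name1, name2);
  return ret == 0 ? Ordering::Equal
       : ret > 0  ? Ordering::Greater
                  : Ordering::Less;
}
```

For example, with the Itanium ABI manglings "i" (int) and "d" (double), int
orders after double because 'i' compares greater than 'd'.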

Re: [PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 13:30, Tomasz Kaminski  wrote:
>
>
>
> On Thu, Jun 26, 2025 at 2:13 PM Tomasz Kaminski  wrote:
>>
>>
>>
>> On Thu, Jun 26, 2025 at 2:09 PM Jonathan Wakely  wrote:
>>>
>>> On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:
>>> >This patch extracts calls to _M_locale_fmt and construction of the struct tm,
>>> >from the functions dedicated to each specifier, to main format loop in
>>> >_M_format_to functions. This removes duplicated code repeated for specifiers.
>>>
>>> Great, this is exactly what I wanted to do. Removing all the branches
>>> to call _M_locale_fmt from each of the _M_xxx member functions makes
>>> them smaller and potentially faster.
>>>
>>> >To allow _M_locale_fmt to only be called if localized formatting is enabled
>>> >('L' is present in chrono-format-spec), we provide implementations for
>>> >locale specific specifiers (%c, %r, %x, %X) that produce the same result
>>> >as locale::classic():
>>> > * %c is implemented as separate _M_c method
>>> > * %r is implemented as separate _M_r method
>>> > * %x is implemented together with %D, as they provide same behavior,
>>> > * %X is implemented together with %R as _M_R_X, as both of them do not include
>>> >   subseconds.
>>>
>>> Nice.
>>>
>>> >The handling of subseconds was also extracted to _M_subsecs function that is
>>> >used by _M_S and _M_T specifier. The _M_T is now implemented in terms of
>>> >_M_R_X (printing time without subseconds) and _M_subsecs.
>>> >
>>> >The __mod responsible for triggering localized formatting was removed from
>>> >method handling most of specifiers, except:
>>> > * _M_S (for %S) for which it determines if subseconds should be included,
>>> > * _M_z (for %z) for which it determines if ':' is used as separator.
>>> >
>>> >   PR libstdc++/110739
>>> >
>>> >libstdc++-v3/ChangeLog:
>>> >
>>> >   * include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
>>> >   Define.
>>> >   (__formatter_chrono::_M_locale_fmt): Moved to front of the class.
>>> >   (__formatter_chrono::_M_format_to): Construct and initialize
>>> >   struct tm and call _M_locale_fmt if needed.
>>> >   (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
>>> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
>>> >   (__formatter_chrono::_M_D): Renamed to _M_D_x.
>>> >   (__formatter_chrono::_M_D_x): Renamed from _M_D.
>>> >   (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
>>> >   (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
>>> >   (__formatter_chrono::_M_T): Define in terms of _M_R_X and 
>>> > _M_subsecs.
>>> >   (__formatter_chrono::_M_subsecs): Extracted from _M_S.
>>> >   (__formatter_chrono::_M_S): Replaced __mod with __subs argument,
>>> >   removed _M_locale_fmt call, and delegate to _M_subsecs.
>>> >   (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
>>> >   (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
>>> >   (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
>>> >   __mod argument and call to _M_locale_fmt.
>>> >---
>>> > libstdc++-v3/include/bits/chrono_io.h | 340 +-
>>> > 1 file changed, 172 insertions(+), 168 deletions(-)
>>> >
>>> >diff --git a/libstdc++-v3/include/bits/chrono_io.h 
>>> >b/libstdc++-v3/include/bits/chrono_io.h
>>> >index 35e95906e6a..d451bde722d 100644
>>> >--- a/libstdc++-v3/include/bits/chrono_io.h
>>> >+++ b/libstdc++-v3/include/bits/chrono_io.h
>>> >@@ -906,6 +906,40 @@ namespace __format
>>> > return __format::__write(std::move(__out), __s);
>>> >   }
>>> >
>>> >+  [[__gnu__::__always_inline__]]
>>> >+  static bool
>>> >+  _S_localized_spec(_CharT __conv, _CharT __mod)
>>> >+  {
>>> >+  switch (__conv)
>>> >+{
>>> >+case 'c':
>>> >+case 'r':
>>> >+case 'x':
>>> >+case 'X':
>>> >+  return true;
>>> >+case 'z':
>>> >+  return false;
>>> >+default:
>>> >+  return (bool)__mod;
>>> >+};
>>> >+  }
>>> >+
>>> >+  // Use the formatting locale's std::time_put facet to produce
>>> >+  // a locale-specific representation.
>>> >+  template
>>> >+  _Iter
>>> >+  _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm& 
>>> >__tm,
>>> >+char __fmt, char __mod) const
>>> >+  {
>>> >+basic_ostringstream<_CharT> __os;
>>> >+__os.imbue(__loc);
>>> >+const auto& __tp = use_facet>(__loc);
>>> >+__tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
>>> >+if (__os)
>>> >+  __out = _M_write(std::move(__out), __loc, __os.view());
>>> >+return __out;
>>> >+  }
>>> >+
>>> >   template
>>> >   _Out
>>> >   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
>>> >@@ -923,6 +957,36 @@ namespace __format
>>> >   return std::move(__out);
>>> > };
>>>

Re: [PATCH] vect: Misalign checks for gather/scatter.

2025-06-26 Thread Robin Dapp

+  bool is_misaligned = scalar_align < inner_vectype_sz;
+  bool is_packed = scalar_align > 1 && is_misaligned;
+
+  *misalignment = !is_misaligned ? 0 : inner_vectype_sz - 
scalar_align;

+
+  if (targetm.vectorize.support_vector_misalignment
+ (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))

the misalignment argument is meaningless, I think you want to
pass DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed
if the scalar accesses are not at least size aligned.


At least aarch64's (and loongarch's) support_vector_misalignment gives up
right away if misalignment == -1 (before checking for !is_packed)
and would thus get dr_unaligned_unsupported in case of strict alignment.

I used the same logic for riscv which made a proper value in *misalignment 
necessary.


We only have one other invocation of support_vector_misalignment in 
tree-vect-data-refs which only sets packed if DR_MISALIGNMENT_UNKNOWN.

So ISTM that
if (!is_packed)
  return true;
should always be done before acting on DR_MISALIGNMENT_UNKNOWN?

Or can there be instances where is_packed == false && DR_MISALIGNMENT_UNKNOWN  
and we don't support the misalignment?  Like if the target requires 
vector-sized alignment?


So my current plan would be to adjust the riscv hook to always support 
misalignment if !is_packed regardless of DR_MISALIGNMENT_UNKNOWN and do the 
same for aarch64, loongarch?
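
To make the difference in check ordering concrete, here is a deliberately
simplified model of the two variants for the strict-alignment case. The
function names are hypothetical and this is not the actual hook code of any
target; -1 stands in for DR_MISALIGNMENT_UNKNOWN, and everything beyond the
two checks under discussion is reduced to "reject if packed".

```cpp
#include <cassert>

// Stand-in for DR_MISALIGNMENT_UNKNOWN.
constexpr int kMisalignmentUnknown = -1;

// Ordering described above for aarch64/loongarch: give up on an unknown
// misalignment before looking at is_packed.
bool supports_misalignment_unknown_first(int misalignment, bool is_packed)
{
  if (misalignment == kMisalignmentUnknown)
    return false;
  return !is_packed;  // simplification: the rest of the hook is not modelled
}

// Planned ordering: an access that is not packed is always supported,
// regardless of whether the misalignment is known.
bool supports_misalignment_packed_first(int misalignment, bool is_packed)
{
  if (!is_packed)
    return true;
  (void) misalignment;  // packed accesses are rejected in this simplified model
  return false;
}
```

The two variants only disagree for an unknown misalignment with
is_packed == false, which is exactly the case asked about above.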


I'll also change the hook docs to something like

diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..94ccf86233c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1926,7 +1926,8 @@ DEFHOOK
store/load of a specific factor denoted in the @var{misalignment}\n\
parameter.  The vector store/load should be of machine mode @var{mode} and\n\
the elements in the vectors should be of type @var{type}.  @var{is_packed}\n\
-parameter is true if the memory access is defined in a packed struct.",
+parameter is true if the misalignment is unknown and the memory access is\n\
+defined in a packed struct."



Note the hook really doesn't know whether you ask it for gather/scatter
or a contiguous vector load so I wonder whether the above fits
constraints on other platforms where scalar accesses might be
allowed to be packed but all unaligned vector accesses would need
to be element aligned?


We actually can have all four combinations of scalar and vector misalignment 
support on riscv :/


--
Regards
Robin



Re: [PATCH] libstdc++, v2: Implement C++26 P2927R3 - Inspecting exception_ptr

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 11:35, Jakub Jelinek  wrote:
>
> On Wed, Jun 25, 2025 at 08:20:55PM +0100, Jonathan Wakely wrote:
> > This won't work for -fno-rtti
>
> I missed the || __cpp_exceptions part in there; I thought it was &&.
>
> Here is an updated patch which uses just one definition of
> std::exception_ptr_cast and additionally exports it from std.cc.in as well.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks


>
> 2025-06-26  Jakub Jelinek  
>
> * include/bits/version.def (exception_ptr_cast): Add.
> * include/bits/version.h: Regenerate.
> * libsupc++/exception: Define __glibcxx_want_exception_ptr_cast before
> including bits/version.h.
> * libsupc++/exception_ptr.h (std::exception_ptr_cast): Define.
> (std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Declare.
> * libsupc++/eh_ptr.cc
> (std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Define.
> * src/c++23/std.cc.in (std::exception_ptr_cast): Export.
> * config/abi/pre/gnu.ver: Export
> 
> _ZNKSt15__exception_ptr13exception_ptr21_M_exception_ptr_castERKSt9type_info
> at CXXABI_1.3.17.
> * testsuite/util/testsuite_abi.cc (check_version): Allow 
> CXXABI_1.3.17.
> * testsuite/18_support/exception_ptr/exception_ptr_cast.cc: New test.
>
> --- libstdc++-v3/include/bits/version.def.jj2025-06-24 18:53:13.751807828 
> +0200
> +++ libstdc++-v3/include/bits/version.def   2025-06-25 12:52:41.844921595 
> +0200
> @@ -2012,6 +2012,14 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = exception_ptr_cast;
> +  values = {
> +v = 202506;
> +cxxmin = 26;
> +  };
> +};
> +
>  // Standard test specifications.
>  stds[97] = ">= 199711L";
>  stds[03] = ">= 199711L";
> --- libstdc++-v3/include/bits/version.h.jj  2025-06-24 18:53:13.751807828 
> +0200
> +++ libstdc++-v3/include/bits/version.h 2025-06-25 12:52:47.754691329 +0200
> @@ -2253,4 +2253,14 @@
>  #endif /* !defined(__cpp_lib_sstream_from_string_view) && 
> defined(__glibcxx_want_sstream_from_string_view) */
>  #undef __glibcxx_want_sstream_from_string_view
>
> +#if !defined(__cpp_lib_exception_ptr_cast)
> +# if (__cplusplus >  202302L)
> +#  define __glibcxx_exception_ptr_cast 202506L
> +#  if defined(__glibcxx_want_all) || 
> defined(__glibcxx_want_exception_ptr_cast)
> +#   define __cpp_lib_exception_ptr_cast 202506L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_exception_ptr_cast) && 
> defined(__glibcxx_want_exception_ptr_cast) */
> +#undef __glibcxx_want_exception_ptr_cast
> +
>  #undef __glibcxx_want_all
> --- libstdc++-v3/libsupc++/exception.jj 2025-06-12 09:49:19.924910752 +0200
> +++ libstdc++-v3/libsupc++/exception2025-06-25 12:53:09.924564775 +0200
> @@ -38,6 +38,7 @@
>  #include 
>
>  #define __glibcxx_want_uncaught_exceptions
> +#define __glibcxx_want_exception_ptr_cast
>  #include 
>
>  extern "C++" {
> --- libstdc++-v3/libsupc++/exception_ptr.h.jj   2025-06-02 11:00:06.267523918 
> +0200
> +++ libstdc++-v3/libsupc++/exception_ptr.h  2025-06-26 07:53:12.966100732 
> +0200
> @@ -80,6 +80,13 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>/// Throw the object pointed to by the exception_ptr.
>void rethrow_exception(exception_ptr) __attribute__ ((__noreturn__));
>
> +#if __cpp_lib_exception_ptr_cast >= 202506L
> +  template
> +  const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
> +  template
> +  void exception_ptr_cast(const exception_ptr&&) = delete;
> +#endif
> +
>namespace __exception_ptr
>{
>  using std::rethrow_exception; // So that ADL finds it.
> @@ -109,6 +116,13 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>friend void std::rethrow_exception(exception_ptr);
>template
>friend exception_ptr std::make_exception_ptr(_Ex) 
> _GLIBCXX_USE_NOEXCEPT;
> +#if __cpp_lib_exception_ptr_cast >= 202506L
> +  template
> +  friend const _Ex* std::exception_ptr_cast(const exception_ptr&) 
> noexcept;
> +#endif
> +
> +  const void* _M_exception_ptr_cast(const type_info&) const
> +   _GLIBCXX_USE_NOEXCEPT;
>
>  public:
>exception_ptr() _GLIBCXX_USE_NOEXCEPT;
> @@ -283,6 +299,20 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>  { return exception_ptr(); }
>  #endif
>
> +#if __cpp_lib_exception_ptr_cast >= 202506L
> +  template
> +[[__gnu__::__always_inline__]]
> +inline const _Ex* exception_ptr_cast(const exception_ptr& __p) noexcept
> +{
> +#ifdef __cpp_rtti
> +  const type_info &__id = typeid(const _Ex&);
> +  return static_cast(__p._M_exception_ptr_cast(__id));
> +#else
> +  return nullptr;
> +#endif
> +}
> +#endif
> +
>  #undef _GLIBCXX_EH_PTR_USED
>
>/// @} group exceptions
> --- libstdc++-v3/libsupc++/eh_ptr.cc.jj 2025-04-08 14:10:30.518900025 +0200
> +++ libstdc++-v3/libsupc++/eh_ptr.cc2025-06-25 15:29:17.416393720 +0200
> @@ -220,4 +220,20 @@ std::rethrow_exception(std::exception_pt
>  
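
For comparison, without std::exception_ptr_cast the usual way to inspect the
exception held by an exception_ptr is to rethrow it and catch the type of
interest. The helper below is a hypothetical sketch of that older technique
(it is not part of the patch) and only reports whether the stored exception
matches, rather than returning a pointer to it:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical helper: true if P holds an exception that a handler of type
// `const E&` would catch.  Uses rethrow-and-catch, so it needs exceptions
// enabled and pays the cost of a throw on every call.
template <typename E>
bool holds_exception(const std::exception_ptr& p) noexcept
{
  if (!p)
    return false;  // rethrowing a null exception_ptr is not allowed
  try
    {
      std::rethrow_exception(p);
    }
  catch (const E&)
    {
      return true;
    }
  catch (...)
    {
      return false;
    }
}
```

A call such as holds_exception<std::runtime_error>(p) therefore answers the
same question as a non-null result from exception_ptr_cast, at the price of
actually throwing.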

Re: [PATCH] c: Suppress -Wdeprecated-non-prototype warnings for builtins

2025-06-26 Thread Simon Marchi



On 2025-04-30 16:37, Joseph Myers wrote:
> On Sat, 26 Apr 2025, Florian Weimer wrote:
> 
>> Builtins defined with BT_FN_INT_VAR etc. show as functions without
>> a prototype and trigger the warning.
>>
>> gcc/c/
>>
>>  PR c/119950
>>  * c-typeck.cc (convert_arguments): Check for built-in
>>  function declaration before warning.
>>
>> gcc/testsuite/
>>
>>  * gcc.dg/Wdeprecated-non-prototype-5.c: New test.
> 
> OK.
> 

I keep hitting the bug fixed by this patch with gcc 15.1:

make[3]: Entering directory '/home/simark/build/binutils-gdb-all-targets/sim'
  CC   bfin/gui.o
In file included from /usr/include/SDL2/SDL_main.h:25,
 from /usr/include/SDL2/SDL.h:31,
 from /home/simark/src/binutils-gdb/sim/bfin/gui.c:25:
/usr/include/SDL2/SDL_stdinc.h: In function ‘_SDL_size_mul_overflow_builtin’:
/usr/include/SDL2/SDL_stdinc.h:830:12: error: ISO C23 does not allow arguments 
for function ‘__builtin_mul_overflow’ declared without parameters 
[-Werror=deprecated-non-prototype]
  830 | return __builtin_mul_overflow(a, b, ret) == 0 ? 0 : -1;
  |^~

Should this patch be cherry-picked into the gcc 15 branch, to be
included in gcc 15.2?

Simon


Re: [RFC] [lra] catch all to-sp eliminations [PR120424]

2025-06-26 Thread Vladimir Makarov



On 6/26/25 1:51 AM, Alexandre Oliva wrote:

On Jun 25, 2025, Vladimir Makarov  wrote:

This patch is ok for me.  I am a big fan of asserts. They have helped to
catch so many bugs at early stages.


Thank you, Alex.





for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (elimination_2sp_occurred_p): Rename
from...
(elimination_fp2sp_occured_p): ... this.  Adjust all uses.
(lra_eliminate_regs_1): Don't require a from-frame-pointer
elimination to set it.
(update_reg_eliminate): Likewise to test it.
---
  gcc/lra-eliminations.cc |   46 +-
  1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 9cdd0c5ff53a2..045f2dcf23ef7 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -309,8 +309,18 @@ move_plus_up (rtx x)
return x;
  }
  
-/* Flag that we already did frame pointer to stack pointer elimination.  */

-static bool elimination_fp2sp_occured_p = false;
+/* Flag that we already applied nonzero stack pointer elimination
+   offset; such sp updates cannot currently be undone.  */
+static bool elimination_2sp_occurred_p = false;
+
+/* Take note of any nonzero sp-OFFSET used in eliminations to sp.  */
+static inline poly_int64
+note_spoff (poly_int64 offset)
+{
+  if (maybe_ne (offset, 0))
+elimination_2sp_occurred_p = true;
+  return offset;
+}
  
  /* Scan X and replace any eliminable registers (such as fp) with a

 replacement (such as sp) if SUBST_P, plus an offset.  The offset is
@@ -369,13 +379,10 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
-
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
-   return plus_constant (Pmode, to, update_sp_offset);
+   return plus_constant (Pmode, to, note_spoff (update_sp_offset));
  return to;
}
  else if (update_p)
@@ -385,7 +392,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  ep->offset
  - (insn != NULL_RTX
 && ep->to_rtx == stack_pointer_rtx
-? lra_get_insn_recog_data (insn)->sp_offset
+? note_spoff (lra_get_insn_recog_data
+  (insn)->sp_offset)
 : 0));
  else
return to;
@@ -402,19 +410,18 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
-
  if (! update_p && ! full_p)
return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1));
  
  	  if (maybe_ne (update_sp_offset, 0))

-   offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
+   offset = (ep->to_rtx == stack_pointer_rtx
+ ? note_spoff (update_sp_offset)
+ : 0);
  else
offset = (update_p
  ? ep->offset - ep->previous_offset : ep->offset);
  if (full_p && insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx)
-   offset -= lra_get_insn_recog_data (insn)->sp_offset;
+   offset -= note_spoff (lra_get_insn_recog_data 
(insn)->sp_offset);
  if (poly_int_rtx_p (XEXP (x, 1), &curr_offset)
  && known_eq (curr_offset, -offset))
return to;
@@ -465,15 +472,13 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
  
-	  if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)

-   elimination_fp2sp_occured_p = true;
-
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
return plus_constant (Pmode,
  gen_rtx_MULT (Pmode, to, XEXP (x, 1)),
- update_sp_offset * INTVAL (XEXP (x, 1)));
+ note_spoff (update_sp_offset)
+ * INTVAL (XEXP (x, 1)));
  return gen_rtx_MULT (Pmode, to, XEXP (x, 1));
}
  else if (update_p)
@@ -486,7 +491,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  poly_int64 offset = ep->off

[PATCH v2] c++: fix ICE with [[deprecated]] [PR120756]

2025-06-26 Thread Marek Polacek
On Wed, Jun 25, 2025 at 03:13:25PM -0400, Jason Merrill wrote:
> On 6/25/25 1:28 PM, Marek Polacek wrote:
> > @@ -24604,7 +24604,7 @@ resolve_nondeduced_context (tree orig_expr, 
> > tsubst_flags_t complain)
> > }
> > if (good == 1)
> > {
> > - mark_used (goodfn);
> > + mark_used (goodfn, complain);
> 
> Actually, if we're going to pass complain, we should also check the return
> value; the usual pattern is
> 
> >   if (!mark_used (fn, complain) && !(complain & tf_error))
> > return error_mark_node;

OK, done here:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we end up with "error reporting routines re-entered" because
resolve_nondeduced_context isn't passing complain to mark_used.

PR c++/120756

gcc/cp/ChangeLog:

* pt.cc (resolve_nondeduced_context): Pass complain to mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/warn/deprecated-22.C: New test.
---
 gcc/cp/pt.cc  |  3 ++-
 gcc/testsuite/g++.dg/warn/deprecated-22.C | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/deprecated-22.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index deb0106b158..c7a0066a11a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24604,7 +24604,8 @@ resolve_nondeduced_context (tree orig_expr, 
tsubst_flags_t complain)
}
   if (good == 1)
{
- mark_used (goodfn);
+ if (!mark_used (goodfn, complain) && !(complain & tf_error))
+   return error_mark_node;
  expr = goodfn;
  if (baselink)
expr = build_baselink (BASELINK_BINFO (baselink),
diff --git a/gcc/testsuite/g++.dg/warn/deprecated-22.C 
b/gcc/testsuite/g++.dg/warn/deprecated-22.C
new file mode 100644
index 000..60ee607f717
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/deprecated-22.C
@@ -0,0 +1,13 @@
+// PR c++/120756
+// { dg-do compile { target c++11 } }
+
+struct A {
+template  [[deprecated]] void foo ();
+};
+
+template  [[deprecated]] auto bar () -> decltype (&A::foo);
+
+void foo ()
+{
+  bar<0> ();  // { dg-warning "deprecated" }
+}

base-commit: 5aca8510abea6c3fac3336a7445863db07fd4a06
-- 
2.50.0
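
The complain/return-value pattern discussed in this thread can be sketched outside the compiler. The following is a minimal standalone model, not GCC code: `complain_flags`, `Decl`, `mark_used_model` and `resolve_model` are hypothetical stand-ins for `tsubst_flags_t`, a declaration, `mark_used` and a caller such as `resolve_nondeduced_context`. It only illustrates why the caller both passes the flags down and checks the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's tsubst_flags_t bits; only the shape matters.
enum complain_flags : unsigned { tf_none = 0, tf_error = 1u << 0 };

struct Decl { std::string name; bool usable; };

static std::vector<std::string> diagnostics;  // collects emitted errors

// Model of a mark_used-style helper: returns false on failure and only
// emits a diagnostic when the caller asked for errors.
bool mark_used_model(const Decl& d, unsigned complain)
{
  if (d.usable)
    return true;
  if (complain & tf_error)
    diagnostics.push_back("use of unusable '" + d.name + "'");
  return false;
}

// Caller following the quoted pattern: without tf_error (a SFINAE-like
// context) a failure is propagated as an error marker; with tf_error the
// diagnostic has already been issued, so processing carries on.
const Decl* resolve_model(const Decl& d, unsigned complain)
{
  static const Decl error_mark{"<error>", false};
  if (!mark_used_model(d, complain) && !(complain & tf_error))
    return &error_mark;
  return &d;
}
```

In this model a quiet caller gets the error marker and no diagnostic, while a caller that requested errors gets exactly one diagnostic and the original declaration back.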



Re: [PATCH] Add _GLIBCXX_USE_ALLOC_PTR macro

2025-06-26 Thread François Dumont



On 26/06/2025 21:47, Jonathan Wakely wrote:

On 26/06/25 19:30 +0200, François Dumont wrote:

I find it quite convenient so maybe you'll accept it.

Note that looking for existence of this macro I noticed that 
ChangeLog-2024 is wrongly talking about 
_GLIBCXX_USE_ALLOC_PTR_FOR_LIST in  header. Should it 
be fixed ?


No, I don't see any point. The git commit message will still be wrong,
and that's surely what most people care about.


    libstdc++: Add _GLIBCXX_USE_ALLOC_PTR macro to rule them all

    Provide a unique way to control usage of the allocator pointer 
type through a single
    macro: _GLIBCXX_USE_ALLOC_PTR. If defined is used to set the 
value of the following
    macros: _GLIBCXX_USE_ALLOC_PTR_FOR_LIST, 
_GLIBCXX_USE_ALLOC_PTR_FOR_FORWARD_LIST

    and _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE.


I thought about this at the time, and I decided it's not really
important. Does it have any real use except for testing?


With the addition of the same feature for Hashtable it will be a nice way to 
control them all at once.


Otherwise yes, mainly testing.
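
As an illustration of the "one macro to rule them all" idea, here is a small preprocessor sketch. The `DEMO_*` names are hypothetical and merely mirror the shape of the `_GLIBCXX_USE_ALLOC_PTR*` family; the default of 1 is assumed from the existing per-container definition shown in the patch. Each per-container macro keeps an explicit user setting, otherwise follows the master macro if it is defined, otherwise falls back to the default.

```cpp
#include <cassert>

// Hypothetical names; normally these would come from the command line,
// e.g. -DDEMO_USE_ALLOC_PTR=0.
#define DEMO_USE_ALLOC_PTR 0               // master switch
#define DEMO_USE_ALLOC_PTR_FOR_RB_TREE 1   // explicit per-container override

#ifndef DEMO_USE_ALLOC_PTR_FOR_LIST
# ifdef DEMO_USE_ALLOC_PTR
#  define DEMO_USE_ALLOC_PTR_FOR_LIST DEMO_USE_ALLOC_PTR
# else
#  define DEMO_USE_ALLOC_PTR_FOR_LIST 1
# endif
#endif

#ifndef DEMO_USE_ALLOC_PTR_FOR_RB_TREE
# ifdef DEMO_USE_ALLOC_PTR
#  define DEMO_USE_ALLOC_PTR_FOR_RB_TREE DEMO_USE_ALLOC_PTR
# else
#  define DEMO_USE_ALLOC_PTR_FOR_RB_TREE 1
# endif
#endif

// Expose the resulting settings as constants so they can be inspected.
constexpr bool list_uses_alloc_ptr = DEMO_USE_ALLOC_PTR_FOR_LIST;
constexpr bool tree_uses_alloc_ptr = DEMO_USE_ALLOC_PTR_FOR_RB_TREE;
```

With these settings the list macro follows the master switch, while the tree macro keeps its explicit value.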




[PATCH v3] Use incoming small integer argument value as if promoted

2025-06-26 Thread H.J. Lu
On Mon, May 5, 2025 at 7:32 AM H.J. Lu  wrote:

> Here is the v2 patch.   ix86_get_small_integer_argument_value was moved to
> calls.cc.  I added a target hook, TARGET_SAME_FUNCTION_ARGUMENT_ORDER_P,
> to verify that caller and callee have the same incoming argument
> order.  The default returns
> true.  The x86 hook has
>
> /* Implement TARGET_SAME_INCOMING_ARGUMENT_ORDER_P.  */
>
> static bool
> ix86_same_incoming_argument_order_p (const_tree fndecl)
> {
>   return (!TARGET_64BIT
>   || (ix86_function_abi (current_function_decl)
>   == ix86_function_abi (fndecl)));
> }
>
> since 64-bit SYSV ABI and 64-bit MS ABI have different argument orders.
> Copying one incoming argument register to another outgoing argument register
> may override the other incoming argument register.
>
> --
> H.J.
> ---
> For targets, like x86, which define TARGET_PROMOTE_PROTOTYPES to return
> true, all integer arguments smaller than int are passed as int:
>
> [hjl@gnu-tgl-3 pr14907]$ cat x.c
> extern int baz (char c1);
>
> int
> foo (char c1)
> {
>   return baz (c1);
> }
> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> [hjl@gnu-tgl-3 pr14907]$ cat x.s
> .file "x.c"
> .text
> .p2align 4
> .globl foo
> .type foo, @function
> foo:
> .LFB0:
> .cfi_startproc
> movsbl 4(%esp), %eax
> movl %eax, 4(%esp)
> jmp baz
> .cfi_endproc
> .LFE0:
> .size foo, .-foo
> .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> .section .note.GNU-stack,"",@progbits
> [hjl@gnu-tgl-3 pr14907]$
>
> But integer promotion:
>
> movsbl 4(%esp), %eax
> movl %eax, 4(%esp)
>
> isn't necessary if incoming arguments are copied to outgoing arguments
> directly.
>
> We can use the incoming argument value as the outgoing argument as if it
> has been promoted if
>
> 1. Caller and callee are not nested functions.
> 2. Caller and callee have the same incoming argument order.  Add a new
> target hook, TARGET_SAME_FUNCTION_ARGUMENT_ORDER_P, which returns true
> if caller and callee have the same incoming argument order.  If the
> incoming argument order of the caller differs from that of the callee,
> the same register may be used for different incoming arguments in
> caller and callee, so copying from one incoming argument in the caller
> to an outgoing argument may override another incoming argument.
> 3. The incoming argument is unchanged before call expansion.
>
> Otherwise, using the incoming argument as the outgoing argument may change
> values of other incoming arguments or the wrong outgoing argument value
> may be used.
>
> If callee is a global function, we always properly extend the incoming
> small integer arguments in callee.  If callee is a local function, since
> DECL_ARG_TYPE has the original small integer type, we will extend the
> incoming small integer arguments in callee if needed.
>
> Tested on x86-64, x32 and i686.
>
> NB: I tried to elide all incoming argument copying for all types, not
> just integer arguments smaller than int.  But GCC was miscompiled which
> is related to function inlining.  There is
>
> foo
>   call baz
>
> bar
>   call foo
>
> when foo is inlined
>
> bar
>call baz
>
> the incoming arguments, which aren't integer arguments smaller than int,
> for baz get the wrong values sometimes.

Here is the v3 patch.  The difference from v2 is to use

  if (MEM_P (src)
  && MEM_EXPR (src)
  && (TREE_CODE (get_base_address (MEM_EXPR (src)))
  == PARM_DECL))
continue;

to check incoming arguments on stack.

OK for master?

Thanks.

-- 
H.J.
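
The redundancy of the second promotion can be checked with a small model. This is only a sketch under stated assumptions: a 32-bit argument slot is represented by `std::uint32_t`, and `promote_for_call` and `repromote_slot` are hypothetical helpers that mimic what the caller and the `movsbl`/`movl` pair in `foo` do. Nothing here is GCC code.

```cpp
#include <cassert>
#include <cstdint>

// What a caller does when small integer arguments are passed as int:
// the char value is sign-extended into a full 32-bit slot.
std::uint32_t promote_for_call(signed char c)
{
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(c));
}

// What foo does before the tail call: reload the low byte of the slot,
// sign-extend it again (movsbl) and store it back (movl).
std::uint32_t repromote_slot(std::uint32_t slot)
{
  auto low = static_cast<signed char>(slot & 0xffu);
  return promote_for_call(low);
}
```

For every slot that was written by a promoting caller, re-promotion returns the same bits, so the store is a no-op. A slot whose upper bits were not produced by promotion (for example 0x12345680) does change, which is why the conditions listed in the message matter.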


Re: [PATCH][RFC] c/96570 - diagnostics for conversions to/from time_t

2025-06-26 Thread Joseph Myers
On Thu, 26 Jun 2025, Richard Biener wrote:

> The following prototypes diagnostics for conversions to/from time_t
> where the source/destination does not have sufficient precision for it.
> I've lumped this into -Wconversion for the moment and didn't bother
> fixing up the testcase for !ilp32 or the -Wconversion diagnostics that
> happen.
> 
> Would -Wtime-conversion (or -Wtime_t-conversion?) be an appropriate
> option?  I'd enable it with -Wconversion.

I think such a warning should be based on an attribute on the time_t type 
that means "warn for implicit truncation of this type" (I'm less clear on 
why warnings for implicit widening conversions *to* time_t are supposed to 
be useful), rather than hardcoding it to be based on the time_t name.  
It's hardly just time_t for which a warning about such implicit truncation 
might be useful.

Such an attribute would of course be preserved by e.g. "typedef time_t 
my_time_t;".  It would need composite type rules defined (probably the 
composite type has the attribute if either of the two types does), and 
rules for what happens to the attribute in integer promotions / usual 
arithmetic conversions (I'm guessing that given "time_t x;", it's desired 
to warn about truncation of x+1, for example, so the process of applying 
usual arithmetic conversions to determine the type of x+1 should not have 
lost the attribute; what's less clear is e.g. x+1LL if time_t is narrower 
than long long).
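
To make the truncation case concrete, here is a small sketch of the kind of value loss such a warning would be about. It assumes a 64-bit signed time value, modelled with `std::int64_t` rather than `time_t` itself so the behaviour does not depend on the platform; `narrows_losslessly` is a hypothetical helper, not an existing library function.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// x + 1 keeps the 64-bit type under the usual arithmetic conversions, so
// truncating x + 1 loses information just like truncating x does.  (With
// 1LL instead of 1 the result type may be long long, which is the unclear
// case mentioned above.)
static_assert(std::is_same_v<decltype(std::int64_t{} + 1), std::int64_t>);

// Returns true if converting v to To and back yields the same value.
template <typename To, typename From>
constexpr bool narrows_losslessly(From v)
{
  return static_cast<From>(static_cast<To>(v)) == v
         && ((v < From{}) == (static_cast<To>(v) < To{}));
}
```

A timestamp such as 1700000000 still fits in 32 bits, whereas 2147483648 (one second past the 32-bit signed limit) does not.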

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] RISC-V: update prepare_ternary_operands to handle the vector-scalar case [PR120828]

2025-06-26 Thread Robin Dapp
I guess I missed it when I first ran the testsuite before sending the patch 
for review. I rebased and re-ran the testsuite after getting approved and saw 
the regression. But at that point I realised Jeff had already merged it.

Anyway, I'll regtest more carefully next time!


The CI helps with that but as we saw before it doesn't pick up patches in sub 
threads.  So if you want to ensure it's tested just send a new version in a 
separate thread.


--
Regards
Robin



Re: [committed] RISC-V: Add comment and reorder the include files in riscv.md [NFC]

2025-06-26 Thread Robin Dapp

Hi Kito,


This patch adds a comment to the riscv.md file to clarify the purpose of
the file and reorders the include files for better organization.


this seems to have broken the build.  I believe that's due to


-(include "vector.md")
 (include "vector-crypto.md")


because vector crypto depends on modes defined/included in vector.md.

--
Regards
Robin



Re: [PATCH 0/1] contrib: add vmtest-tool to test BPF programs

2025-06-26 Thread Jose E. Marchesi


Hello Piyush.

Sorry for the delay in reviewing.  It's been quite a busy week at
work...

> This patch adds initial version of vmtest-tool script to test BPF
> programs on live kernel
>
> For now, the tool is standalone, but it is intended to be integrated with the
> DejaGnu testsuite to run BPF testcases in future patches.

Very nice.

> Current Limitations:
> - Only x86_64 is supported. Support for additional architectures will
> be added soon.

This is a very reasonable way of proceeding.

> - When testing BPF programs with --bpf-src or --bpf-obj, only the host's root
>   directory can be used as the VM root filesystem. This will also be improved
>   in future updates.

Ok.

>
> Thank you,
> Piyush Raj
>
> Piyush Raj (1):
>   contrib: add vmtest-tool to test BPF programs
>
>  contrib/vmtest-tool/.gitignore  |  23 ++
>  contrib/vmtest-tool/.pre-commit-config.yaml |  32 ++
>  contrib/vmtest-tool/.python-version |   1 +
>  contrib/vmtest-tool/README  |  75 
>  contrib/vmtest-tool/__init__.py |   0
>  contrib/vmtest-tool/bpf.py  | 193 ++
>  contrib/vmtest-tool/config.py   |  11 +
>  contrib/vmtest-tool/kernel.py   | 209 +++
>  contrib/vmtest-tool/main.py | 101 ++
>  contrib/vmtest-tool/pyproject.toml  |  36 ++
>  contrib/vmtest-tool/requirements-dev.txt| 198 ++
>  contrib/vmtest-tool/tests/test_cli.py   | 170 +
>  contrib/vmtest-tool/utils.py|  26 ++
>  contrib/vmtest-tool/uv.lock | 380 
>  contrib/vmtest-tool/vm.py   | 154 
>  15 files changed, 1609 insertions(+)
>  create mode 100644 contrib/vmtest-tool/.gitignore
>  create mode 100644 contrib/vmtest-tool/.pre-commit-config.yaml
>  create mode 100644 contrib/vmtest-tool/.python-version
>  create mode 100644 contrib/vmtest-tool/README
>  create mode 100644 contrib/vmtest-tool/__init__.py
>  create mode 100644 contrib/vmtest-tool/bpf.py
>  create mode 100644 contrib/vmtest-tool/config.py
>  create mode 100644 contrib/vmtest-tool/kernel.py
>  create mode 100644 contrib/vmtest-tool/main.py
>  create mode 100644 contrib/vmtest-tool/pyproject.toml
>  create mode 100644 contrib/vmtest-tool/requirements-dev.txt
>  create mode 100644 contrib/vmtest-tool/tests/test_cli.py
>  create mode 100644 contrib/vmtest-tool/utils.py
>  create mode 100644 contrib/vmtest-tool/uv.lock
>  create mode 100644 contrib/vmtest-tool/vm.py


Re: [PATCH] [genoutput] mark scratch outputs as eliminable [PR120424]

2025-06-26 Thread Vladimir Makarov



On 6/22/25 11:54 PM, Alexandre Oliva wrote:


Regstrapped on x86_64-linux-gnu, bootstrapped on arm-linux-gnueabihf
(arm and thumb modes), also tested with gcc-14 on arm-vx7r2 and
arm-linux-gnueabihf.  Ok to install?


It is OK for me.  Thank you, Alex

for  gcc/ChangeLog

PR rtl-optimization/120424
* genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.
---
  gcc/genoutput.cc |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index dd4e7b80c2a91..25d0b8b864676 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -478,7 +478,7 @@ scan_operands (class data *d, rtx part, int this_address_p,
d->operand[opno].n_alternatives
= n_occurrences (',', d->operand[opno].constraint) + 1;
d->operand[opno].address_p = 0;
-  d->operand[opno].eliminable = 0;
+  d->operand[opno].eliminable = 1;
return;
  
  case MATCH_OPERATOR:






Re: [PATCH] [lra] rework deactivation of fp2sp elimination [PR120424]

2025-06-26 Thread Vladimir Makarov



On 6/22/25 11:59 PM, Alexandre Oliva wrote:

On Jun 13, 2025, Vladimir Makarov  wrote:


* lra-eliminations.cc (lra_update_fp2sp_elimination):
Inactivate the unused fp2sp elimination right away.

Alas, this seems to cause trouble on arm-linux-gnueabihf bootstraps.
This is OK. In many cases it is difficult to get a solution for an RA 
problem for all targets on the first try.  I personally try to test 
non-obvious patches on as many targets as possible but still the progress 
looks like two steps forward, one step back.  RA in GCC is too 
complicated as RTL has too many features and RA-related hooks and 
descriptions.

Here's an alternate approach that builds on it to solve the earlier
problem without making for a new one.


Deactivating the fp2sp elimination in lra_update_fp2sp_elimination
prevents update_reg_eliminate from propagating the fp2sp elimination
offset to the next chosen elimination, so it may retain -1 as the
prev_offset, and prev_offset will be taken as an already-applied
offset that needs to be compensated in the next round of spilling and
reloading.  This affects, for example, crtbegin.o's
__do_global_dtors_aux on arm-linux-gnueabihf in a {BOOT_C,T}FLAGS='-O2
-g -fnon-call-exceptions -fstack-clash-protection' bootstrap.

Alas, just retaining that elimination causes spills to use the fp2sp
elimination, including applying sp offsets, which breaks e.g. an
x86_64-linux-gnu native bootstrap with ix86_frame_pointer_required
modified to return true on nonzero frame size.

The middle-ground solution is to keep the elimination active, so that
its offsets are applied and propagated on to the subsequent fp
elimination, but without introducing sp offsets, so that
e.g. pr103973-18.c on the modified x86_64-linux-gnu doesn't get
adjacent argument pushes of two adjacent on-stack temporaries ending
up pushing the same temporary because of undesired adjustments.

Regstrapped on x86_64-linux-gnu, bootstrapped on arm-linux-gnueabihf
(arm and thumb modes), also tested with gcc-14 on arm-vx7r2 and
arm-linux-gnueabihf.  Ok to install?


Yes.  Thank you.

for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_update_fp2sp_elimination):
Avoid sp offsets in further fp2sp eliminations...
(update_reg_eliminate): ... and restore to_rtx before assert
checking.
---
  gcc/lra-eliminations.cc |   18 --
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 6663d1c37e8ba..0a702a43a5a17 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1172,7 +1172,16 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
/* If it is a currently used elimination: update the previous
 offset.  */
if (elimination_map[ep->from] == ep)
-   ep->previous_offset = ep->offset;
+   {
+ ep->previous_offset = ep->offset;
+ /* Restore the stack_pointer_rtx into to_rtx, that
+lra_update_fp2sp_elimination set to from_rtx, so that the assert
+below still checks what it was supposed to check.  */
+ if (ep->from_rtx == ep->to_rtx
+ && ep->from != ep->to
+ && ep->from == FRAME_POINTER_REGNUM)
+   ep->to_rtx = stack_pointer_rtx;
+   }
  
prev = ep->prev_can_eliminate;

setup_can_eliminate (ep, targetm.can_eliminate (ep->from, ep->to));
@@ -1418,7 +1427,12 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
ep = elimination_map[FRAME_POINTER_REGNUM];
if (ep->to == STACK_POINTER_REGNUM)
  {
-  elimination_map[FRAME_POINTER_REGNUM] = NULL;
+  /* Prevent any further uses of fp, say in spill addresses, from being
+eliminated to sp and affected by sp offsets.  Alas, deactivating the
+elimination altogether causes the next chosen fp elimination to miss
+the offset propagation, so it may keep -1 as its prev_offset, and that
+will make subsequent offsets incorrect.  */
+  ep->to_rtx = ep->from_rtx;
setup_can_eliminate (ep, false);
  }
else






Re: Remove early inlining from afdo pass

2025-06-26 Thread Jan Hubicka
> 
> 
> > On 24 Jun 2025, at 7:43 pm, Jan Hubicka  wrote:
> > 
> > 
> > Hi,
> > this patch removes early-inlining from afdo pass since all inlining
> > should now happen from early inliner.  I tested this on spec and there
> > are 3 inlines happening here which are blocked at early-inline time by
> > hitting large function growth limit.  We probably want to bypass that
> > limit, I will look into that incrementally.
> 
> Thanks for doing this. Is the inlining difference here due to the annotation 
> that happens in auto-profile pass in the earlier implementation?

The inliner has a limit on large function growth, which is mostly about
GCC being non-linear in function size.
Each time inlining is done, a large function is allowed to double in size.
Since the old code ran the inliner many times, it bypassed this limit.

The early inliner is really designed to make win-win decisions only.
Originally it only inlined when it could prove that the resulting code
would shrink, but eventually some extra buffer (--param
early-inlining-insns) became necessary; still, the early inliner is not
supposed to hit the code growth limits much.

On the other hand, the afdo inliner, which replicates what the late inliner
did and may inadvertently inline more (since it is organized bottom up and
inlining is non-transitive), may cause quite some code bloat.
> 
> One unrelated question about scaling profiles. We seem to scale-up AFDO  with 
> and_count_scale and scale down local_profile in some other cases. Should we 
> instead scale up AFDO profile to local_profile scale. Lot of the inlining and 
> other parameters seem to work well with that.

Profiles are either local or global.
Guessed profiles are local, which means that one can compare counts of
basic blocks within a single function, but there is no meaning in comparing
them across functions.

AFDO or FDO profiles are global, so one can compare frequencies across
functions, which is very useful, e.g. to drive the greedy inliner.

No heuristics should depend on absolute values of counters. They are
only meaningful in comparison with other counts (relative frequencies).
Scaling is mostly done to reduce the effect of roundoff errors: the more
bits we use, the less likely roundoff errors are to accumulate into
something significant.

So scaling the AFDO profile to the local profile makes no sense.  If heuristics
are confused, it means that the profile is wrong and we need to figure out
why and fix that.
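
The roundoff argument can be illustrated with plain integer arithmetic. This is a toy sketch, not GCC's profile_count implementation: `apply_prob` and `three_halvings` are hypothetical helpers, and the scale factor 1024 is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstdint>

// Apply a branch probability num/den to an integer count (truncating).
constexpr std::uint64_t apply_prob(std::uint64_t count,
                                   std::uint64_t num, std::uint64_t den)
{
  return count * num / den;
}

// Push a count through three nested branches, each taken half of the time.
constexpr std::uint64_t three_halvings(std::uint64_t count)
{
  for (int i = 0; i < 3; ++i)
    count = apply_prob(count, 1, 2);
  return count;
}
```

Starting from a raw count of 7, the truncation at each step drives the result to 0, although the exact value is 7/8. Starting from the same count scaled by 1024, the result is 896, which is exactly 7/8 of 1024: the relative frequency survives because more bits were available.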

Looking at today's LNT runs, compared to no-FDO there are the following
improvements:

SPEC/SPEC2017/FP/511.povray_r       -13.27%
SPEC/SPEC2017/FP/544.nab_r          -12.17%
SPEC/SPEC2017/INT/500.perlbench_r    -6.59%
SPEC/SPEC2017/INT/520.omnetpp_r      -6.21%
SPEC/SPEC2017/FP/519.lbm_r           -2.77%
SPEC/SPEC2017/INT/502.gcc_r          -2.59%

It is not that bad. Regressions are:

SPEC/SPEC2017/FP/549.fotonik3d_r     17.01%
SPEC/SPEC2017/FP/554.roms_r          16.82%
SPEC/SPEC2017/FP/510.parest_r        16.01%
SPEC/SPEC2017/FP/527.cam4_r           9.99%
SPEC/SPEC2017/INT/531.deepsjeng_r     9.99%
SPEC/SPEC2017/FP/503.bwaves_r         8.43%
SPEC/SPEC2017/INT/541.leela_r         7.59%
SPEC/SPEC2017/INT/525.x264_r          5.05%
SPEC/SPEC2017/INT/548.exchange2_r     3.67%
SPEC/SPEC2017/FP/508.namd_r           3.44%

Fotonik seems to be random and caused by a too small train run.
I will look into modifying the config to run the train runs multiple
times.

With my hacked setup running the ref run for training, I now get a SPECfp
improvement for auto-fdo.  I still train w/o LTO.

roms and parest are caused by disabled vectorization since the loop header
profile is too low.  So I guess it is something to debug next.
It seems that the main inlining and ipa-cp issues are under control now
(exchange and x264 may be caused by this, but the regressions are small
and the benchmarks are quite sensitive) and most of the problems are now in
FP benchmarks and thus likely related to loop optimization messing up
the profile.

I implemented offlining of functions that have not been inlined, so we can
benchmark -fno-auto-profile-inlining, too.

I think it would be useful to add a tool to compare AFDO and profile-use
profiles so we can spot bugs without having to debug performance
regressions, but I am still travelling so I am not sure how soon I can
look into implementing this.

Honza


Re: [PATCH] [lra] apply elimination offsets to MEM in autoinc address [PR120424]

2025-06-26 Thread Vladimir Makarov



On 6/23/25 12:01 AM, Alexandre Oliva wrote:

When attempting to bootstrap arm-linux-gnueabihf with
{BOOT_C,T}FLAGS='-g -O2 -fnon-call-exceptions
-fstack-clash-protection', gmp fails to build in stage2: gen-fac's
mpz_and gets miscompiled.

A pseudo is initialized before a loop and used in a PRE_INC load
inside a loop.  It gets spilled just as the fp2sp elimination is
disabled, and only the initialization gets adjusted with elimination
offsets.  The unadjusted stack slot within the PRE_INC load ends up
reloaded later, but only when the FP offset has already missed its
chance to be adjusted.

Arrange for lra_eliminate_regs_1 to adjust autoinc addresses that are
MEMs themselves.

Regstrapped on x86_64-linux-gnu, bootstrapped on arm-linux-gnueabihf
(arm and thumb modes), also tested with gcc-14 on arm-vx7r2 and
arm-linux-gnueabihf.  Ok to install?


Yes. Thank you, Alex

for  gcc/ChangeLog

PR rtl-optimization/120424
* lra-eliminations.cc (lra_eliminate_regs_1): Adjust autoinc
addresses that are MEMs.
---
  gcc/lra-eliminations.cc |6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 5713a96805233..9cdd0c5ff53a2 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -571,6 +571,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  case POST_INC:
  case PRE_DEC:
  case POST_DEC:
+  /* Recurse to adjust elimination offsets in a spilled pseudo.  */
+  if (GET_CODE (XEXP (x, 0)) == MEM)
+   break;
/* We do not support elimination of a register that is modified.
 elimination_effects has already make sure that this does not
 happen.  */
@@ -578,6 +581,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
  
  case PRE_MODIFY:

  case POST_MODIFY:
+  /* Recurse to adjust elimination offsets in a spilled pseudo.  */
+  if (GET_CODE (XEXP (x, 0)) == MEM)
+   break;
/* We do not support elimination of a hard register that is
 modified.  LRA has already make sure that this does not
 happen. The only remaining case we need to consider here is
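
The effect of recursing into a MEM operand of an autoinc address can be shown on a toy expression tree. This is a rough sketch and not GCC's RTL API: `Kind`, `Expr`, `eliminate` and `show` are hypothetical, the register names "fp" and "sp" are plain strings, and the elimination offset is passed in as a number.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Node kinds loosely mirror REG, PLUS with a constant, MEM and PRE_INC.
enum class Kind { Reg, PlusConst, Mem, PreInc };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Kind kind;
  std::string reg;   // register name (Reg only)
  long offset;       // constant addend (PlusConst only)
  ExprPtr op;        // operand (all kinds except Reg)
};

ExprPtr make(Kind k, std::string r, long off, ExprPtr op)
{
  return std::make_shared<Expr>(Expr{k, std::move(r), off, std::move(op)});
}
ExprPtr reg(const std::string& n) { return make(Kind::Reg, n, 0, nullptr); }
ExprPtr plus_const(ExprPtr e, long off) { return make(Kind::PlusConst, "", off, std::move(e)); }
ExprPtr mem(ExprPtr addr) { return make(Kind::Mem, "", 0, std::move(addr)); }
ExprPtr pre_inc(ExprPtr e) { return make(Kind::PreInc, "", 0, std::move(e)); }

// Replace fp by sp + fp_to_sp.  A PRE_INC of a register is left alone,
// but a PRE_INC whose operand is a MEM (a spilled pseudo) is recursed into
// so that the stack slot address inside it gets adjusted as well.
ExprPtr eliminate(const ExprPtr& x, long fp_to_sp)
{
  switch (x->kind) {
  case Kind::Reg:
    return x->reg == "fp" ? plus_const(reg("sp"), fp_to_sp) : x;
  case Kind::PlusConst: {
    ExprPtr inner = eliminate(x->op, fp_to_sp);
    if (inner->kind == Kind::PlusConst)   // fold (sp + a) + b into sp + (a + b)
      return plus_const(inner->op, inner->offset + x->offset);
    return plus_const(inner, x->offset);
  }
  case Kind::Mem:
    return mem(eliminate(x->op, fp_to_sp));
  case Kind::PreInc:
    if (x->op->kind == Kind::Mem)
      return pre_inc(eliminate(x->op, fp_to_sp));
    return x;
  }
  return x;
}

// Render an expression as text for inspection.
std::string show(const ExprPtr& x)
{
  switch (x->kind) {
  case Kind::Reg:       return x->reg;
  case Kind::PlusConst: return show(x->op) + "+" + std::to_string(x->offset);
  case Kind::Mem:       return "mem(" + show(x->op) + ")";
  case Kind::PreInc:    return "pre_inc(" + show(x->op) + ")";
  }
  return "";
}
```

In this model, a pre-increment of the slot at fp+8 with an elimination offset of 16 becomes a pre-increment of the slot at sp+24, whereas a pre-increment of an ordinary register is returned unchanged.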





Re: [PATCH] c++, libstdc++, v2: Implement C++26 P2830R10 - Constexpr Type Ordering

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 15:38, Jonathan Wakely  wrote:
>
> On Thu, 26 Jun 2025 at 11:33, Jakub Jelinek  wrote:
> >
> > On Wed, Jun 25, 2025 at 10:58:59PM +0200, Maciej Cencora wrote:
> > > update of std module is missing.
> >
> > Here is an updated patch which adds the std module part and while I was
> > changing the patch, I've also added value_type/type and the 2 operators
> > to std::type_order.
> >
> > Interdiff from the last patch is:
> > --- libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> > +++ libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> > @@ -1271,6 +1271,10 @@
> >  struct type_order
> >  {
> >static constexpr strong_ordering value = __builtin_type_order(_Tp, 
> > _Up);
> > +  using value_type = strong_ordering;
> > +  using type = type_order<_Tp, _Up>;
> > +  constexpr operator value_type() const noexcept { return value; }
> > +  constexpr value_type operator()() const noexcept { return value; }
> >  };
> >
> >/// @ingroup variable_templates
> > --- libstdc++-v3/src/c++23/std.cc.in.jj 2025-06-12 15:50:51.400821105 +0200
> > +++ libstdc++-v3/src/c++23/std.cc.in2025-06-26 07:37:06.90208 +0200
> > @@ -888,6 +888,10 @@ export namespace std
> >using std::partial_order;
> >using std::strong_order;
> >using std::weak_order;
> > +#if __glibcxx_type_order >= 202506L
> > +  using std::type_order;
> > +  using std::type_order_v;
> > +#endif
> >  }
> >
> >  // 28.4 
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > Though, now that I look at it again, perhaps both
> > #if __glibcxx_type_order >= 202506L
> > in the patch should have been
> > #if __cpp_lib_type_order >= 202506L
> >
> > Can change that.
>
> I would leave it. Testing the internal __glibcxx_foo macro is always
> correct. Testing the standard __cpp_lib_foo macro is only correct in
> the main header that defines the __glibcxx_want_foo macro. In this
> case both are defined, because it's in the same header as the "want"
> macro, but if we decide we need std::type_order to be available in
> other headers and move it to some  then we'd need
> to change it to test __glibcxx_type_order instead. Using the internal
> macro is a bit more robust.

Actually for std.cc.in for consistency it should be the
__cpp_lib_type_order macro (since that file is really a consumer of
the headers, not part of the headers themselves).

But in libsupc++/compare it can stay as __glibcxx_type_order.


>
> >
> > 2025-06-26  Jakub Jelinek  
> >
> > gcc/cp/
> > * cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering.
> > (TYPE_ORDER): New.
> > * method.cc (type_order_value): Define.
> > * cp-tree.h (type_order_value): Declare.
> > * semantics.cc (trait_expr_value): Use gcc_unreachable also
> > for CPTK_TYPE_ORDER, adjust comment.
> > (finish_trait_expr): Handle CPTK_TYPE_ORDER.
> > * constraint.cc (diagnose_trait_expr): Likewise.
> > gcc/testsuite/
> > * g++.dg/cpp26/type-order1.C: New test.
> > * g++.dg/cpp26/type-order2.C: New test.
> > * g++.dg/cpp26/type-order3.C: New test.
> > libstdc++-v3/
> > * include/bits/version.def (type_order): New.
> > * include/bits/version.h: Regenerate.
> > * libsupc++/compare: Define __glibcxx_want_type_order before
> > including bits/version.h.
> > (std::type_order, std::type_order_v): New trait and template 
> > variable.
> > * src/c++23/std.cc.in (std::type_order, std::type_order_v): Export.
> > * testsuite/18_support/comparisons/type_order/1.cc: New test.
> >
> > --- gcc/cp/method.cc.jj 2025-06-25 16:04:51.611158952 +0200
> > +++ gcc/cp/method.cc2025-06-25 16:09:32.017556551 +0200
> > @@ -3951,5 +3951,26 @@ num_artificial_parms_for (const_tree fn)
> >return count;
> >  }
> >
> > +/* Return value of the __builtin_type_order trait.  */
> > +
> > +tree
> > +type_order_value (tree type1, tree type2)
> > +{
> > +  tree rettype = lookup_comparison_category (cc_strong_ordering);
> > +  if (rettype == error_mark_node)
> > +return rettype;
> > +  int ret;
> > +  if (type1 == type2)
> > +ret = 0;
> > +  else
> > +{
> > +  const char *name1 = ASTRDUP (mangle_type_string (type1));
> > +  const char *name2 = mangle_type_string (type2);
> > +  ret = strcmp (name1, name2);
> > +}
> > +  return lookup_comparison_result (cc_strong_ordering, rettype,
> > +  ret == 0 ? 0 : ret > 0 ? 1 : 2);
> > +}
> > +
> >
> >  #include "gt-cp-method.h"
> > --- gcc/cp/cp-tree.h.jj 2025-06-25 16:04:51.610158965 +0200
> > +++ gcc/cp/cp-tree.h2025-06-25 16:09:32.019556525 +0200
> > @@ -7557,6 +7557,8 @@ extern bool ctor_omit_inherited_parms (
> >  extern tree locate_ctor(tree);
> >  extern tree implicitly_declare_fn   (special_function_kind, 
> > tree,
> > 
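
The ordering rule in the quoted type_order_value (compare the mangled names with strcmp and map the sign to an ordering) can be sketched standalone. This is an assumption-laden model, not the compiler code: `Ordering` is a hypothetical stand-in for std::strong_ordering, and the strings passed in are assumed to be mangled type names (in the Itanium C++ ABI, for instance, int mangles as "i" and long as "l").

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for std::strong_ordering's three values.
enum class Ordering { less = -1, equal = 0, greater = 1 };

// Order two types by comparing their (assumed) mangled names.
Ordering order_from_mangled(const char* name1, const char* name2)
{
  int ret = std::strcmp(name1, name2);
  if (ret == 0)
    return Ordering::equal;
  return ret > 0 ? Ordering::greater : Ordering::less;
}
```

With the mangled names above, int orders before long, and any type compares equal to itself. The order is whatever strcmp yields on the mangled strings; it is consistent but not otherwise meaningful.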

[PATCH] c++, v3: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]

2025-06-26 Thread Jakub Jelinek
On Thu, Jun 26, 2025 at 01:33:08PM +0200, Jakub Jelinek wrote:
> I get some regressions (which I didn't get with the earlier patch, but
> it isn't obvious by what it has been caused):

The ICEs were caused by the canonicalize_obj_off change and indeed
> The ICEs are all in the same spot:
>   tree off = integer_zero_node;
>   canonicalize_obj_off (op, off);
>   gcc_assert (integer_zerop (off));
>   return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
fixes that.  The remaining FAILs were about constant evaluation of ctors
resulting in copies of construction vtables etc. no longer being needed because
all the construction was compile time evaluated.

Here is what bootstrapped/regtested on x86_64-linux and i686-linux without
any regressions (plus your patch is of course desirable and maybe some
incremental change to the CALL_EXPR handling of fixed_type_or_null but I
don't know what exactly).

Ok for trunk?

2025-06-26  Jakub Jelinek  

PR c++/120777
gcc/
* gimple-fold.cc (gimple_get_virt_method_for_vtable): Revert
2018-09-18 changes.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_constexpr_virtual_inheritance=202506L for C++26.
gcc/cp/
* constexpr.cc: Implement C++26 P3533R2 - constexpr virtual
inheritance.
(is_valid_constexpr_fn): Don't reject constexpr cdtors in classes
with virtual bases for C++26, adjust error wording.
(cxx_bind_parameters_in_call): Add ORIG_FUN argument, add
values for __in_chrg and __vtt_parm arguments when needed.
(cxx_eval_dynamic_cast_fn): Adjust function comment, HINT -1
should be possible.  For C++26 if obj is cast from POINTER_PLUS_EXPR,
attempt to use cxx_fold_indirect_ref to simplify it and if successful,
build ADDR_EXPR of that.
(cxx_eval_call_expression): Add orig_fun variable, set it to
fun before looking through clones, pass it to
cxx_bind_parameters_in_call.
(reduced_constant_expression_p): Add SZ argument, pass DECL_SIZE
of FIELD_DECL e.index to recursive calls and don't return false
if SZ is non-NULL and there are unfilled fields with bit position
at or above SZ.
(cxx_fold_indirect_ref_1): Handle reading of vtables using
ptrdiff_t dynamic type instead of some pointer type.  Set el_sz
to DECL_SIZE_UNIT value rather than TYPE_SIZE_UNIT of
DECL_FIELD_IS_BASE fields in classes with virtual bases.
(cxx_fold_indirect_ref): In canonicalize_obj_off lambda look
through COMPONENT_REFs with DECL_FIELD_IS_BASE in classes with
virtual bases and adjust off correspondingly.  Remove assertion that
off is integer_zerop, pass tree_to_uhwi (off) instead of 0 to the
cxx_fold_indirect_ref_1 call.
* cp-tree.h (publicly_virtually_derived_p): Declare.
(reduced_constant_expression_p): Add another tree argument defaulted
to NULL_TREE.
* method.cc (synthesized_method_walk): Don't clear *constexpr_p
if there are virtual bases for C++26.
* class.cc (build_base_path): Compute fixed_type_p and
virtual_access before checks for build_simple_base_path instead of
after that and conditional cp_build_addr_expr.  Use build_simple_path
if !virtual_access even when v_binfo is non-NULL.
(layout_virtual_bases): For build_base_field calls use
access_public_node rather than access_private_node if
publicly_virtually_derived_p.
(build_vtbl_initializer): Revert 2018-09-18 and 2018-12-11 changes.
(publicly_virtually_derived_p): New function.
gcc/testsuite/
* g++.dg/cpp26/constexpr-virt-inherit1.C: New test.
* g++.dg/cpp26/constexpr-virt-inherit2.C: New test.
* g++.dg/cpp26/constexpr-virt-inherit3.C: New test.
* g++.dg/cpp26/feat-cxx26.C: Add __cpp_constexpr_virtual_inheritance
tests.
* g++.dg/cpp2a/constexpr-dtor3.C: Don't expect one error for C++26.
* g++.dg/cpp2a/constexpr-dtor16.C: Don't expect errors for C++26.
* g++.dg/cpp2a/constexpr-dynamic10.C: Likewise.
* g++.dg/cpp0x/constexpr-ice21.C: Likewise.
* g++.dg/cpp0x/constexpr-ice4.C: Likewise.
* g++.dg/abi/mangle1.C: Guard the test on c++23_down.
* g++.dg/abi/mangle81.C: New test.
* g++.dg/ipa/ipa-icf-4.C (A::A): For
__cpp_constexpr_virtual_inheritance >= 202506L add user provided
non-constexpr constructor.
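
For readers unfamiliar with the feature, here is a minimal sketch of what the __cpp_constexpr_virtual_inheritance feature-test macro gates. The type and function names (Base, Left, Right, Derived, read_value) are made up for illustration and are not taken from the patch or its tests. The guard lets the snippet fall back to a plain runtime function on compilers that do not advertise the feature.

```cpp
// Hypothetical diamond hierarchy sharing one virtual base.
struct Base { int value = 42; };
struct Left : virtual Base {};
struct Right : virtual Base {};
struct Derived : Left, Right {};

#if defined(__cpp_constexpr_virtual_inheritance) \
    && __cpp_constexpr_virtual_inheritance >= 202506L
// With the feature, constructing and reading an object that has a
// virtual base is expected to be usable during constant evaluation.
constexpr int read_value ()
{
  Derived d;
  return d.value;
}
static_assert (read_value () == 42, "evaluated at compile time");
#else
// Without the feature the same code is only valid at run time.
inline int read_value ()
{
  Derived d;
  return d.value;
}
#endif
```

Either branch gives callers the same read_value () interface; only the point of evaluation differs.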

--- gcc/gimple-fold.cc.jj   2025-06-26 13:49:44.433654295 +0200
+++ gcc/gimple-fold.cc  2025-06-26 13:51:29.230355538 +0200
@@ -10276,13 +10276,12 @@ gimple_get_virt_method_for_vtable (HOST_
   access_index = offset / BITS_PER_UNIT / elt_size;
   gcc_checking_assert (offset % (elt_size * BITS_PER_UNIT) == 0);
 
-  /* The C++ FE can now produce indexed fields, and we check if the indexes
- match.  */
+  /* This code makes

[PATCH] Add _GLIBCXX_USE_ALLOC_PTR macro

2025-06-26 Thread François Dumont

I find it quite convenient so maybe you'll accept it.

Note that while checking whether this macro already existed, I noticed that
ChangeLog-2024 wrongly mentions _GLIBCXX_USE_ALLOC_PTR_FOR_LIST
in  header. Should that be fixed?


    libstdc++: Add _GLIBCXX_USE_ALLOC_PTR macro to rule them all

    Provide a single way to control usage of the allocator pointer type
    through one macro: _GLIBCXX_USE_ALLOC_PTR. If defined, it is used as the
    default value of the following macros: _GLIBCXX_USE_ALLOC_PTR_FOR_LIST,
    _GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST and _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE.

    libstdc++-v3/ChangeLog:

    * include/bits/forward_list.h [_GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST]:
    Default to _GLIBCXX_USE_ALLOC_PTR if defined.
    * include/bits/stl_list.h [_GLIBCXX_USE_ALLOC_PTR_FOR_LIST]:
    Likewise.
    * include/bits/stl_tree.h [_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE]:
    Likewise.

Tested under Linux x86_64,

ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 8bcfb809319..b79fc7fea86 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -52,7 +52,11 @@
 #endif
 
 #if ! defined _GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST
-# define _GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST 1
+# ifdef _GLIBCXX_USE_ALLOC_PTR
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST _GLIBCXX_USE_ALLOC_PTR
+# else
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_FWD_LIST 1
+# endif
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index d27824c0a7a..ab4271f0e26 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -75,7 +75,11 @@
 # undef _GLIBCXX_USE_ALLOC_PTR_FOR_LIST
 # define _GLIBCXX_USE_ALLOC_PTR_FOR_LIST 0
 #elif ! defined _GLIBCXX_USE_ALLOC_PTR_FOR_LIST
-# define _GLIBCXX_USE_ALLOC_PTR_FOR_LIST 1
+# ifdef _GLIBCXX_USE_ALLOC_PTR
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_LIST _GLIBCXX_USE_ALLOC_PTR
+# else
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_LIST 1
+# endif
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index 4b7f482e794..4fa2be1b7b7 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -79,7 +79,11 @@
 # undef _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE
 # define _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE 0
 #elif ! defined _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE
-# define _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE 1
+# ifdef _GLIBCXX_USE_ALLOC_PTR
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE _GLIBCXX_USE_ALLOC_PTR
+# else
+#  define _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE 1
+# endif
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
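
The defaulting scheme in the three hunks above can be tried out in isolation. The following is only a stand-alone mock: the MOCK_* macro names and the two helper functions are hypothetical and merely mirror the shape of the preprocessor logic in the patch, without including any libstdc++ header.

```cpp
// Stand-in for the global switch a user would pass with -D.
#define MOCK_USE_ALLOC_PTR 0

// Same shape as the patched headers: the per-container macro takes the
// global value when it has not been defined explicitly.
#if ! defined MOCK_USE_ALLOC_PTR_FOR_LIST
# ifdef MOCK_USE_ALLOC_PTR
#  define MOCK_USE_ALLOC_PTR_FOR_LIST MOCK_USE_ALLOC_PTR
# else
#  define MOCK_USE_ALLOC_PTR_FOR_LIST 1
# endif
#endif

// A per-container macro that is already defined keeps its own value,
// so it still overrides the global switch.
#define MOCK_USE_ALLOC_PTR_FOR_RB_TREE 1
#if ! defined MOCK_USE_ALLOC_PTR_FOR_RB_TREE
# ifdef MOCK_USE_ALLOC_PTR
#  define MOCK_USE_ALLOC_PTR_FOR_RB_TREE MOCK_USE_ALLOC_PTR
# else
#  define MOCK_USE_ALLOC_PTR_FOR_RB_TREE 1
# endif
#endif

// Hypothetical helpers exposing the resulting settings.
constexpr int list_setting () { return MOCK_USE_ALLOC_PTR_FOR_LIST; }
constexpr int rb_tree_setting () { return MOCK_USE_ALLOC_PTR_FOR_RB_TREE; }

static_assert (list_setting () == 0, "inherits the global value");
static_assert (rb_tree_setting () == 1, "explicit definition wins");
```

The point of the design is that existing per-container overrides keep working, while a single -D covers every container that was not configured individually.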


Re: [PATCH] _Hashtable fancy pointer support

2025-06-26 Thread François Dumont
Any chance of an answer to the question below about how conservative I
should be?


Thanks

On 02/06/2025 07:07, François Dumont wrote:

Hi

It would be nice if someone got some time to review this PR.

Compared to the other containers for which support for fancy allocator
pointer types has been added, the main difference is that in
std::_Hashtable using the new node types is an ABI-breaking change,
no matter how fancy the allocator's pointer type is. This is
because in this case the hash code is always cached and is stored in memory
just after the pointer to the next node. Let me know if I need to be more
conservative.


    libstdc++: Add fancy pointer support to std::_Hashtable [PR57272]

    The fancy allocator pointer type support is added to std::unordered_map,
    std::unordered_multimap, std::unordered_multiset and std::unordered_set
    through the underlying std::_Hashtable class.

    To respect ABI a new parallel hierarchy of node types has been added.
    This change introduces a new class template parameterized on the
    allocator's void_pointer type, __hashtable::_Node_base, and new class
    templates parameterized on the allocator's pointer type,
    __hashtable::_Node, __hashtable::_Iterator, __hashtable::_Local_iterator.
    The _Iterator class template is used for both iterator and const_iterator.
    The _Local_iterator class template is used for both local_iterator and
    const_local_iterator.
    Whether std::_Hashtable should use the old
    __detail::_Hash_node or new
    __hashtable::_Node type family internally is controlled by a new
    __hashtable::_Node_traits traits template. Whether it should use the old
    __detail::_Node_iteratorT::__hash_cached::value>
    or new __hashtable::_IteratorT::__constant_iterators::value, A::pointer>
    type family internally is controlled by the __hashtable::_Iterator_traits
    traits template.

    In case anybody is currently using std::_Hashtable with an allocator that
    has a fancy pointer, this change will be an ABI break, because their
    std::_Hashtable instantiations would start to (correctly) use the fancy
    pointer type. Note that the new type family used in this case always
    caches the hash code at node level.

    Because std::_Hashtable will never use fancy pointers in C++98 mode,
    recompiling everything to use fancy pointers isn't even possible if mixing
    C++98 and C++11 code that uses std::_Hashtable. To alleviate this problem,
    compiling with -D_GLIBCXX_USE_ALLOC_PTR_FOR_HASHTABLE=0 will force
    std::_Hashtable to have the old, non-conforming behaviour and use raw
    pointers internally. For testing purposes, compiling with
    -D_GLIBCXX_USE_ALLOC_PTR_FOR_HASHTABLE=9001 will force std::_Hashtable to
    always use the new node types.

    This macro is currently undocumented, which needs to be fixed.
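
As a rough illustration of the node hierarchy described above, the sketch below shows a base node parameterized on a void pointer type and a derived node that always caches the hash code next to the link. The names node_base and node are hypothetical and are not the libstdc++ definitions; the sketch only assumes std::pointer_traits from the standard library.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical base node: holds only the link, expressed in terms of
// the allocator's void pointer type.
template<typename VoidPtr>
struct node_base
{
  using base_ptr =
    typename std::pointer_traits<VoidPtr>::template rebind<node_base>;
  base_ptr next{};
};

// Hypothetical value node: the hash code is always cached and sits
// directly after the inherited link, followed by the value.
template<typename ValPtr>
struct node
  : node_base<typename std::pointer_traits<ValPtr>::template rebind<void>>
{
  using value_type = typename std::pointer_traits<ValPtr>::element_type;
  std::size_t hash_code = 0;
  value_type value{};
};

// With raw pointers the rebinds collapse to ordinary pointer types.
static_assert (std::is_same<node_base<void*>::base_ptr,
                            node_base<void*>*>::value,
               "raw pointer rebind yields a raw pointer");
static_assert (std::is_base_of<node_base<void*>, node<int*>>::value,
               "value node derives from the link-only base");
```

Because the cached hash code is unconditional in such a layout, any instantiation that switches to it changes object layout even for plain T* allocators, which is the ABI concern raised in the message.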

    libstdc++-v3/ChangeLog:

    PR libstdc++/57272
    * include/bits/hashtable.h
    (__hashtable::_Node_base<>, __hashtable::_Node<>): New.
    (__hashtable::_Iterator_base<>, __hashtable::_Iterator<>): New.
    (__hashtable::_Local_iterator<>): New.
    (__hashtable::_Node_traits<>, __hashtable::_Iterator_traits<>): New.
    (__hashtable::__alloc_ptr<>): New template alias.
    (_Hashtable<>::__node_type, __node_alloc_type, __node_alloc_traits)
    (__node_ptr, __node_base, __node_base_ptr, __buckets_ptr): Rename
    respectively into...
    (_Hashtable<>::_Node, _Node_alloc, _Node_alloc_traits, _Node_ptr)
    (_Node_base, _Base_ptr, _Buckets_ptr): ... those.
    (_Hashtable<>::_Buckets_ptr_traits): New.
    (_Hashtable<>::__hash_code_base_access): Remove.
    (_Hashtable<>::_S_v, _S_next, _M_single_bucket_ptr): New.
    (_Hashtable<>::_M_node_hash_code, _M_node_hash_code_ext)
    (_M_bucket_index, _M_bucket_index_ext, _M_key_equals)
    (_M_key_equals_tr, _M_equals, _M_equals_tr, _M_node_equals): New.
    (_Hashtable<>::__location_type::_M_base): New.
    (_Hashtable<>::_S_adapt): New.
    (_Hashtable<>): Adapt.
    * include/bits/hashtable_policy.h: Include .
    Include .
    (_GLIBCXX_USE_ALLOC_PTR_FOR_HASHTABLE): New macro, default to 1.
    (_Hash_node_base::_M_base_ptr): New.
    (_Hash_node<>::_M_node_ptr): New.
    (_Hash_code_base<>::_M_hash_code, _M_bucket_index): Remove.
    (_Hashtable_base<>::_M_key_equals, _M_key_equals_tr): Adapt to only
    take key type.
    (_Hashtable_base<>::_M_equals, _M_equals_tr, _M_node_equals): Remove.
    (_Hashtable_alloc<>): Adapt.
    * testsuite/23_containers/unordered_map/115939.cc: #undef
    _GLIBCXX_USE_ALLOC_PTR_FOR_HASHTABLE as types associated with fancy
    pointer support do not suffer from the ambiguity issue tested here.
    * 
testsuite/23_conta

Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-26 Thread Andre Vehreschild

Hi Jerry,

for the moment only the static library is configured in the build scripts.
Therefore only that is built, named libcaf_shmem.a. That's completely correct
and desired.


I have asked the same question about performance or stress tests and got 
only the coarray_icar (link in the 0/6 mail). So when you find something 
suitable, please throw it at caf_shmem.


Regards,
Andre
Andre Vehreschild * ve...@gmx.de
On 26 June 2025 at 19:15:06, Jerry D wrote:


On 6/26/25 12:22 AM, Andre Vehreschild wrote:

Hi Jerry,

thanks for testing. I have fixed IMO most of the whitespace issues in the
patch attached to this mail:
https://gcc.gnu.org/pipermail/fortran/2025-June/062349.html

About the 32 vs. 64 bit versions of the libraries: I have never dealt with
that. I am doing the same as for caf_single. In fact I copied the Makefile.am
portion of caf_single and changed it to generate caf_shmem. Do you get both
versions for caf_single? Did you try a clean rebuild? Can anyone give me a
pointer on what I am doing wrong here?

Regards,
Andre

--- snip ---


I was able to apply the patches without any issues.  I did see some
trailing white space in a few places.

When running the testsuite, the lock_1.f90 test fails, unable to link
to the new library.


Hi all, I have seen oddities before.  I started all over. My build
script always completely erases the previous build directory.
Regardless, after a fresh build, I am unable to reproduce the lock_1.f90
failure.

Regarding where the libraries get installed, I was expecting to see the
various libcaf_shmem libraries alongside the libgfortran libraries.  If
someone could clarify that, I would appreciate it.

I think the next step is to do some hard testing. I was thinking about
the Opencoarrays Burgers test. Any other suggestions?

Andre, do you have any hard performance tests to suggest? Some sort of
stress test?

Jerry




[PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-26 Thread Yuao Ma
Hi all,

This patch, a follow-up to r16-1652-g0606d2b979f401, implements middle-end
optimizations (e.g., constant folding) for our trigonometric pi-based function
built-ins.

This patch is part of
https://gcc.gnu.org/pipermail/fortran/attachments/20250607/4a4a9cb6/attachment.obj

Please take a look when you are available. Thanks!

Please disregard the previous email as I forgot to attach the patch...

Best regards,
Yuao



0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch
Description: 0001-gcc-middle-end-opt-for-trigonometric-pi-based-functi.patch


[pushed: r16-1714] diagnostics: refactor sarif_scheme_handler::make_sink

2025-06-26 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-1714-g5bf213d4ad648f.

gcc/ChangeLog:
* diagnostic-output-spec.cc (sarif_scheme_handler::make_sink):
Split out creation of sarif_generation_options and
sarif_serialization_format into...
(sarif_scheme_handler::make_sarif_gen_opts): ...this...
(sarif_scheme_handler::make_sarif_serialization_object): ...and
this.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-output-spec.cc | 43 ++-
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/gcc/diagnostic-output-spec.cc b/gcc/diagnostic-output-spec.cc
index e58f0c40fc01..25ef86f5f224 100644
--- a/gcc/diagnostic-output-spec.cc
+++ b/gcc/diagnostic-output-spec.cc
@@ -179,6 +179,14 @@ public:
 diagnostic_context &dc,
 const char *unparsed_arg,
 const scheme_name_and_params &parsed_arg) const final override;
+
+private:
+  static sarif_generation_options
+  make_sarif_gen_opts (enum sarif_version version,
+  bool xml_state);
+
+  static std::unique_ptr
+  make_sarif_serialization_object (enum sarif_serialization_kind);
 };
 
 class html_scheme_handler : public output_factory::scheme_handler
@@ -505,27 +513,40 @@ sarif_scheme_handler::make_sink (const context &ctxt,
   if (!output_file)
 return nullptr;
 
+  auto sarif_gen_opts = make_sarif_gen_opts (version, xml_state);
+
+  auto serialization_obj = make_sarif_serialization_object (serialization_kind);
+
+  auto sink = make_sarif_sink (dc,
+  *ctxt.get_affected_location_mgr (),
+  std::move (serialization_obj),
+  sarif_gen_opts,
+  std::move (output_file));
+  return sink;
+}
+
+sarif_generation_options
+sarif_scheme_handler::make_sarif_gen_opts (enum sarif_version version,
+  bool xml_state)
+{
   sarif_generation_options sarif_gen_opts;
   sarif_gen_opts.m_version = version;
   sarif_gen_opts.m_xml_state = xml_state;
+  return sarif_gen_opts;
+}
 
-  std::unique_ptr serialization_obj;
-  switch (serialization_kind)
+std::unique_ptr
+sarif_scheme_handler::
+make_sarif_serialization_object (enum sarif_serialization_kind kind)
+{
+  switch (kind)
 {
 default:
   gcc_unreachable ();
 case sarif_serialization_kind::json:
-  serialization_obj
-   = std::make_unique (true);
+  return std::make_unique (true);
   break;
 }
-
-  auto sink = make_sarif_sink (dc,
-  *ctxt.get_affected_location_mgr (),
-  std::move (serialization_obj),
-  sarif_gen_opts,
-  std::move (output_file));
-  return sink;
 }
 
 /* class html_scheme_handler : public output_factory::scheme_handler.  */
-- 
2.26.3



[PING] [PATCH v4 0/6] c, dwarf, btf: Add btf_decl_tag and btf_type_tag C attributes

2025-06-26 Thread David Faust
Ping for the series

https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686373.html

Thanks

On 6/10/25 14:41, David Faust wrote:
> [v3: https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682340.html
>  Changes v3->v4:
>  - Only patch 2 (DWARF generation) is changed; update based on Richard's
>review comments.
>  - Fix an issue with generating DWARF for type_tags when a typedef involves
>type_tags and the use of the typedef adds additional type_tags, where the
>DW_AT_GNU_annotation chain added to the (cloned) typedef DIE could include
>tags accumulated from the other side of the typedef.
>e.g. for:
>  typedef int __attribute__((btf_type_tag ("tag1"))) foo;
>  foo __attribute__((btf_type_tag ("tag2"))) x;
>The duplicate "foo" typedef DIE for the type of variable 'x' would include
>via DW_AT_GNU_annotation chain information for "tag1" as well as "tag2",
>but "tag1" should only appear in the annotations for the base integer
>type referred to by the typedef not on the typedef DIE itself.
>  - Remove some unnecessary code dealing with type_tags in modified_type_die.
>  - Expand a couple of the added dwarf-btf-type-tag-N tests and also added
>more new tests, particularly to check known (but not necessarily ideal)
>behavior in corner cases such as when the parser ignores a type_tag
>attribute.
> 
>  Changes v2->v3:
>  - Change BTF format to match what is currently in use by clang, pahole and
>the linux kernel.  The format in prior versions of this series was a new
>format meant to address issues with the existing one.  However, during
>discussion at LSFMM/BPF in March, it was decided that it is not desirable
>to change the BTF format at this time, and the issues are not problematic
>in practice for current use cases.  Therefore this version of the series
>reverts to the 'old' BTF format, where type_tag can only be represented
>on pointer types.  This 'old' format is described below.
>  - Address review comments on v2, including new patch 6 with tests for some
>BPF-target specific interactions.  ]
> 
> This patch series adds support for the btf_decl_tag and btf_type_tag
> attributes to GCC. This entails:
> 
> - Two new C-family attributes that allow associating ("tagging") particular
>   declarations and types with arbitrary strings. As explained below, this is
>   intended to be used to, for example, characterize certain pointer types.  A
>   single declaration or type may have multiple occurrences of these
>   attributes.
> 
> - The conveyance of that information in the DWARF output in the form of a new
>   DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.
> 
> - The conveyance of that information in the BTF output in the form of two new
>   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
>   kinds are already supported by LLVM and other tools in the BPF ecosystem.
> 
> Both of these attributes are already supported by clang, and are already being
> used in various ways by BPF support inside the Linux kernel.
> 
> It is worth noting that while the Linux kernel and BPF/BTF is the motivating 
> use
> case of this feature, the format of the new DWARF extension is generic.  This
> work could be easily adapted to provide a general way for program authors to
> annotate types and declarations with arbitrary information for any
> post-compilation analysis needs, not just the Linux kernel BPF verifier.  For
> example, these annotations could be used to aid in ABI analysis.
> 
> Purpose
> ===
> 
> 1)  Addition of C-family language constructs (attributes) to specify free-text
> tags on certain language elements, such as struct fields.
> 
> The purpose of these annotations is to provide additional information 
> about
> types, variables, and function parameters of interest to the kernel. A
> driving use case is to tag pointer types within the Linux kernel and BPF
> programs with additional semantic information, such as '__user' or 
> '__rcu'.
> 
> For example, consider the Linux kernel function do_execve with the
> following declaration:
> 
>   static int do_execve(struct filename *filename,
>  const char __user *const __user *__argv,
>  const char __user *const __user *__envp);
> 
> Here, __user could be defined with these annotations to record semantic
> information about the pointer parameters (e.g., they are user-provided) in
> DWARF and BTF information. Other kernel facilities such as the BPF 
> verifier
> can read the tags and make use of the information.
> 
> 2)  Conveying the tags in the generated DWARF debug info.
> 
> The main motivation for emitting the tags in DWARF is that the Linux 
> kernel
> generates its BTF information via pahole, using DWARF as a source:
> 
> ++  BTF  BTF   +--+
> | pahole |---> vmlinux.btf --->| verifier |
> +-

[pushed: r16-1716] diagnostics: make 5 more fields of diagnostic_context private

2025-06-26 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-1716-gc4d211bba2a86b.

gcc/ada/ChangeLog:
* gcc-interface/misc.cc (gnat_init): Use
diagnostic_context::set_internal_error_callback.

gcc/c-family/ChangeLog:
* c-opts.cc (c_common_diagnostics_set_defaults): Use
diagnostic_context::set_permissive_option.

gcc/cp/ChangeLog:
* error.cc (cxx_initialize_diagnostics): Use
diagnostic_context::set_adjust_diagnostic_info_callback.

gcc/ChangeLog:
* diagnostic.h (diagnostic_context::set_permissive_option): New.
(diagnostic_context::set_fatal_errors): New.
(diagnostic_context::set_internal_error_callback): New.
(diagnostic_context::set_adjust_diagnostic_info_callback): New.
(diagnostic_context::inhibit_notes): New.
(diagnostic_context::m_opt_permissive): Make private.
(diagnostic_context::m_fatal_errors): Likewise.
(diagnostic_context::m_internal_error): Likewise.
(diagnostic_context::m_adjust_diagnostic_info): Likewise.
(diagnostic_context::m_inhibit_notes_p): Likewise.
(diagnostic_inhibit_notes): Delete.
* opts.cc (common_handle_option): Use
diagnostic_context::set_fatal_errors.
* toplev.cc (internal_error_function): Use
diagnostic_context::set_internal_error_callback.
(general_init): Likewise.
(process_options): Use diagnostic_context::inhibit_notes.

Signed-off-by: David Malcolm 
---
 gcc/ada/gcc-interface/misc.cc |  2 +-
 gcc/c-family/c-opts.cc|  2 +-
 gcc/cp/error.cc   |  2 +-
 gcc/diagnostic.h  | 43 +++
 gcc/opts.cc   |  2 +-
 gcc/toplev.cc |  6 ++---
 6 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index ca5c9a2163ee..128040e4d90d 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -377,7 +377,7 @@ gnat_init (void)
   line_table->default_range_bits = 0;
 
   /* Register our internal error function.  */
-  global_dc->m_internal_error = &internal_error_function;
+  global_dc->set_internal_error_callback (&internal_error_function);
 
   return true;
 }
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 697518637df3..3ee9444cbefe 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -192,7 +192,7 @@ void
 c_common_diagnostics_set_defaults (diagnostic_context *context)
 {
   diagnostic_text_finalizer (context) = c_diagnostic_text_finalizer;
-  context->m_opt_permissive = OPT_fpermissive;
+  context->set_permissive_option (OPT_fpermissive);
 }
 
 /* Input charset configuration for diagnostics.  */
diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 69da381a355b..abeb0285eec6 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -308,7 +308,7 @@ cxx_initialize_diagnostics (diagnostic_context *context)
   diagnostic_text_starter (context) = cp_diagnostic_text_starter;
   /* diagnostic_finalizer is already c_diagnostic_text_finalizer.  */
   context->set_format_decoder (cp_printer);
-  context->m_adjust_diagnostic_info = cp_adjust_diagnostic_info;
+  context->set_adjust_diagnostic_info_callback (cp_adjust_diagnostic_info);
 }
 
 /* Dump an '@module' name suffix for DECL, if it's attached to an import.  */
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index f9c8253395b9..9df429275f0b 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -842,6 +842,36 @@ public:
 
   void set_main_input_filename (const char *filename);
 
+  void
+  set_permissive_option (diagnostic_option_id opt_permissive)
+  {
+m_opt_permissive = opt_permissive;
+  }
+
+  void
+  set_fatal_errors (bool fatal_errors)
+  {
+m_fatal_errors = fatal_errors;
+  }
+
+  void
+  set_internal_error_callback (void (*cb) (diagnostic_context *,
+  const char *,
+  va_list *))
+  {
+m_internal_error = cb;
+  }
+
+  void
+  set_adjust_diagnostic_info_callback (void (*cb) (diagnostic_context *,
+  diagnostic_info *))
+  {
+m_adjust_diagnostic_info = cb;
+  }
+
+  void
+  inhibit_notes () { m_inhibit_notes_p = true; }
+
 private:
   void error_recursion () ATTRIBUTE_NORETURN;
 
@@ -911,6 +941,7 @@ public:
   /* True if permerrors are warnings.  */
   bool m_permissive;
 
+private:
   /* The option to associate with turning permerrors into warnings,
  if any.  */
   diagnostic_option_id m_opt_permissive;
@@ -918,6 +949,7 @@ public:
   /* True if errors are fatal.  */
   bool m_fatal_errors;
 
+public:
   /* True if all warnings should be disabled.  */
   bool m_inhibit_warnings;
 
@@ -949,7 +981,6 @@ private:
 diagnostic_text_finalizer_fn m_end_diagnostic;
   } m_text_callbacks;
 
-public:
   /* Client hook to report an internal error.  

[pushed: r16-1715] diagnostics, testsuite: don't assume host has "dot" [PR120809]

2025-06-26 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-1715-g0e7296540be358.

gcc/ChangeLog:
PR analyzer/120809
* diagnostic-format-html.cc
(html_builder::maybe_make_state_diagram): Bulletproof against the
SVG generation failing.
* xml.cc (xml::printer::push_element): Assert that the ptr is
nonnull.
(xml::printer::append): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/120809
* gcc.dg/analyzer/state-diagram-5.c: Split out into...
* gcc.dg/analyzer/state-diagram-5-html.c: ...this, adding
dg-require-dot...
* gcc.dg/analyzer/state-diagram-5-sarif.c: ...and this.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-html.cc |  3 +-
 ...ate-diagram-5.c => state-diagram-5-html.c} | 11 ++
 .../gcc.dg/analyzer/state-diagram-5-sarif.c   | 35 +++
 gcc/xml.cc|  2 ++
 4 files changed, 41 insertions(+), 10 deletions(-)
 rename gcc/testsuite/gcc.dg/analyzer/{state-diagram-5.c => state-diagram-5-html.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/state-diagram-5-sarif.c

diff --git a/gcc/diagnostic-format-html.cc b/gcc/diagnostic-format-html.cc
index c397c9f088d0..9b4c8bcdf31f 100644
--- a/gcc/diagnostic-format-html.cc
+++ b/gcc/diagnostic-format-html.cc
@@ -632,7 +632,8 @@ html_builder::maybe_make_state_diagram (const diagnostic_event &event)
 
   // Turn the .dot into SVG and splice into place
   auto svg = dot::make_svg_from_graph (*graph);
-  xp.append (std::move (svg));
+  if (svg)
+xp.append (std::move (svg));
 
   return wrapper;
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/state-diagram-5.c b/gcc/testsuite/gcc.dg/analyzer/state-diagram-5-html.c
similarity index 64%
rename from gcc/testsuite/gcc.dg/analyzer/state-diagram-5.c
rename to gcc/testsuite/gcc.dg/analyzer/state-diagram-5-html.c
index 8e00cac06863..274a951769e9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/state-diagram-5.c
+++ b/gcc/testsuite/gcc.dg/analyzer/state-diagram-5-html.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fdiagnostics-add-output=sarif:xml-state=yes" } */
+/* { dg-require-dot "" } */
 /* { dg-additional-options "-fdiagnostics-add-output=experimental-html:javascript=no,show-state-diagrams=yes" } */
 /* { dg-additional-options "-fdiagnostics-show-caret" } */
 
@@ -36,13 +36,6 @@ void test (void)
   __analyzer_dump_path ();
{ dg-end-multiline-output "" } */
 
-/* Verify that some JSON was written to a file with the expected name.  */
-/* { dg-final { verify-sarif-file } } */
-
-/* Use a Python script to verify various properties about the generated
-   .sarif file:
-   { dg-final { run-sarif-pytest state-diagram-5.c "state-diagram-5-sarif.py" } } */
-
 /* Use a Python script to verify various properties about the generated
.html file:
-   { dg-final { run-html-pytest state-diagram-5.c "state-diagram-5-html.py" } } */
+   { dg-final { run-html-pytest state-diagram-5-html.c "state-diagram-5-html.py" } } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/state-diagram-5-sarif.c b/gcc/testsuite/gcc.dg/analyzer/state-diagram-5-sarif.c
new file mode 100644
index ..28cf58042306
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/state-diagram-5-sarif.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-fdiagnostics-add-output=sarif:xml-state=yes" } */
+
+#include "analyzer-decls.h"
+
+struct foo
+{
+  int m_ints[4];
+};
+
+struct bar
+{
+  struct foo m_foos[3];
+  int m_int;
+  char m_ch;
+};
+
+struct baz
+{
+  struct bar m_bars[2];
+  struct foo m_foos[5];
+};
+
+void test (void)
+{
+  struct baz baz_arr[2];
+  baz_arr[1].m_bars[1].m_foos[2].m_ints[1] = 42;
+  __analyzer_dump_path (); /* { dg-message "path" } */
+}
+
+/* Verify that some JSON was written to a file with the expected name.  */
+/* { dg-final { verify-sarif-file } } */
+
+/* Use a Python script to verify various properties about the generated
+   .sarif file:
+   { dg-final { run-sarif-pytest state-diagram-5-sarif.c "state-diagram-5-sarif.py" } } */
diff --git a/gcc/xml.cc b/gcc/xml.cc
index 6bb269a2a19e..8e11c6783425 100644
--- a/gcc/xml.cc
+++ b/gcc/xml.cc
@@ -317,6 +317,7 @@ printer::add_raw (std::string text)
 void
 printer::push_element (std::unique_ptr new_element)
 {
+  gcc_assert (new_element.get ());
   element *parent = m_open_tags.back ();
   m_open_tags.push_back (new_element.get ());
   parent->add_child (std::move (new_element));
@@ -325,6 +326,7 @@ printer::push_element (std::unique_ptr new_element)
 void
 printer::append (std::unique_ptr new_node)
 {
+  gcc_assert (new_node.get ());
   element *parent = m_open_tags.back ();
   parent->add_child (std::move (new_node));
 }
-- 
2.26.3
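
The pattern in this patch (the caller tolerates a failed generation step, while the callee asserts that it is never handed a null pointer) can be sketched on its own. Everything below is a hypothetical stand-in: node, printer, make_svg and splice_diagram are illustrative names, not the GCC xml:: or dot:: interfaces, and plain assert stands in for gcc_assert.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML node.
struct node { std::string text; };

// Hypothetical stand-in for the printer: appending requires non-null.
struct printer
{
  std::vector<std::unique_ptr<node>> children;

  void append (std::unique_ptr<node> n)
  {
    assert (n);  // callee-side contract, like the added gcc_assert
    children.push_back (std::move (n));
  }
};

// Generation may fail, for instance when an external tool is missing;
// failure is reported as a null pointer.
std::unique_ptr<node> make_svg (bool tool_available)
{
  if (!tool_available)
    return nullptr;
  return std::unique_ptr<node> (new node{"<svg/>"});
}

// Caller-side guard: only splice the result in when there is one.
// Returns the number of children after the attempt.
std::size_t splice_diagram (printer &xp, bool tool_available)
{
  auto svg = make_svg (tool_available);
  if (svg)
    xp.append (std::move (svg));
  return xp.children.size ();
}
```

Checking at the call site keeps the output usable when the tool is absent, while the assertion in the callee turns any remaining null into an immediate, easy-to-locate failure.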



Re: [PATCH] gcc: middle-end opt for trigonometric pi-based functions builtins

2025-06-26 Thread David Malcolm
On Thu, 2025-06-26 at 17:45 +, Yuao Ma wrote:
> Hi all,
> 
> This patch, a follow-up to r16-1652-g0606d2b979f401, implements
> middle-end
> optimizations (e.g., constant folding) for our trigonometric pi-based
> function
> built-ins.
> 
> This patch is part of
> https://gcc.gnu.org/pipermail/fortran/attachments/20250607/4a4a9cb6/attachment.obj
> 
> Please take a look when you are available. Thanks!

I'm not an expert on this part of the code, but I noticed that the code
part of the patch has these guards:

  #if MPFR_VERSION >= MPFR_VERSION_NUM(4, 2, 0)

but the testcases don't seem to be conditionalized on this.  Would the
new tests fail if gcc is built against an insufficiently recent version
of mpfr, and is/should there be some kind of dg-requires for this, so
that the new tests gracefully are "UNSUPPORTED" on such configurations?

Hope this is constructive; sorry if I'm missing something here.
Dave
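
For context on the guard quoted above, a small sketch of how such a version comparison behaves. It deliberately does not include mpfr.h: version_num is a hypothetical helper that assumes the usual major/minor/patch packing into one integer (major << 16 | minor << 8 | patch), and guard_passes is likewise an illustrative name.

```cpp
// Hypothetical mirror of the major/minor/patch packing assumed for
// MPFR_VERSION_NUM; not taken from mpfr.h itself.
constexpr long version_num (long major, long minor, long patch)
{
  return (major << 16) | (minor << 8) | patch;
}

// Models the "#if MPFR_VERSION >= MPFR_VERSION_NUM(4, 2, 0)" guard as
// an ordinary predicate on a packed version number.
constexpr bool guard_passes (long mpfr_version)
{
  return mpfr_version >= version_num (4, 2, 0);
}

static_assert (guard_passes (version_num (4, 2, 1)), "newer release passes");
static_assert (!guard_passes (version_num (4, 1, 1)), "older release fails");
```

A testsuite-side requirement would need to answer the same question as guard_passes, so that tests relying on the guarded code are reported as UNSUPPORTED rather than failing when the guard is false.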



Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Biener
On Thu, 26 Jun 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following fixes the computation of supports_partial_vectors which
> > is used to prune the set of modes to iterate over for epilog
> > vectorization.  The used partial_vectors_supported_p predicate
> > only looks for while_ult while also support predication when
> > mask modes are integer modes as for AVX512.
> >
> > I've noticed this isn't very effective on x86_64 anyway since
> > if the main loop mode is autodetected we skip re-analyzing
> > mode_i == 0, but then mode_i == 1 is usually the very same
> > large mode.
> >
> > Thus I do wonder if we should instead always (or when
> > --param vect-partial-vector-usage != 0, or when the target would
> > support predication in principle) perform main loop analysis
> > with partial vectors in mind (start with can_use_partial_vectors_p =
> > true), but only at the end honor the --param when deciding on
> > using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
> > for each analyzed mode and use that more specific info for the
> > pruning?
> 
> Yeah, sounds like that could work.  In principle, epilogue loops should
> be strictly easier to vectorise than main loops.  If you know that the
> epilogue "loop" never iterates, there could in principle be cases
> where we'd need to clear can_use_partial_vectors_p for the main loop
> but not for the epilogue loop.  I can't think of any situation like
> that off-hand though.  Likewise for unrolling.

So we already do analyze the main loop for partial vector usage when
--param vect-partial-vector-usage != 0, so for the purpose of
pruning epilogue analysis we should be able to use
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.

As you say there might in theory be corner cases, like when
applying a suggested unroll factor to the main loop.  I can't
think of a case where the result would differ, so in principle we can
just remember the analysis result from without unrolling if required.

But basically it would be like below, I'll post this separately
again so the CI can pick it up.

Would that be OK as-is or do you think we should be looking
to deal with the unrolled main loop case preventively?

Thanks,
Richard.

>From ef60826a888247da723385c84c1dca2aead7b2e4 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 26 Jun 2025 11:08:04 +0200
Subject: [PATCH] Fixup partial_vectors_supported_p use
To: gcc-patches@gcc.gnu.org

The following fixes the computation of supports_partial_vectors which
is used to prune the set of modes to iterate over for epilog
vectorization.  The used partial_vectors_supported_p predicate
only looks for while_ult, while we also support predication when
mask modes are integer modes, as for AVX512.

I've noticed this isn't very effective on x86_64 anyway since
if the main loop mode is autodetected we skip re-analyzing
mode_i == 0, but then mode_i == 1 is usually the very same
large mode.  This is fixed by the next patch.

The following simplifies the logic by simply re-using the
already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
the main loop to decide whether we can possibly use partial
vectors for the epilogue (for the case of having the same VF).

* tree-vect-loop.cc (vect_analyze_loop): Use the main
loop partial vector analysis result to decide if epilogues
with the same VF can use partial vectors.
---
 gcc/tree-vect-loop.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..603d60d8977 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3742,8 +3742,9 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
 vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  bool supports_partial_vectors
+= (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (first_loop_vinfo)
+   && param_vect_partial_vector_usage != 0);
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
 
   loop_vec_info orig_loop_vinfo = first_loop_vinfo;
-- 
2.43.0



[PATCH] [RISC-V] Correct CFA notes for stack-clash protection [PR120714]

2025-06-26 Thread Alexey Merzlyakov
Fix incorrect SP addresses used in CFA notes for stack probes that are
not relative to the top of the frame.  This applies to RISC-V code
generation when stack-clash protection is enabled.
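
As a small worked illustration of the arithmetic in the patch below (a
standalone sketch, not the GCC code: the function name and the numbers
are made up, and a 4 KiB probe interval is assumed for the rounding),
the constant used in the REG_CFA_DEF_CFA note becomes the part of the
frame that was already allocated plus the rounded size:

```cpp
#include <cassert>
#include <cstdint>

// Constant added to the base register in the CFA note, following the
// patch: the part of the frame allocated before this call
// (total_size - size) plus the rounded size handled by the probe loop.
constexpr std::int64_t
cfa_offset_for_probe (std::int64_t total_size, std::int64_t size,
                      std::int64_t rounded_size)
{
  const std::int64_t initial_cfa_offset = total_size - size;
  return initial_cfa_offset + rounded_size;
}

// Hypothetical frame: 32048 bytes in total, 48 of which were allocated
// earlier in the prologue, so size == 32000, rounded down to 28672.
static_assert (cfa_offset_for_probe (32048, 32000, 28672) == 28720);
// With nothing allocated beforehand the old and new formulas agree.
static_assert (cfa_offset_for_probe (32000, 32000, 28672) == 28672);
```

The second assertion shows why the problem only appears when SP has
already been lowered before the probing code runs.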

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_allocate_and_probe_stack_space):
  Fix SP-addresses in REG_CFA_DEF_CFA notes for stack-clash case.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr120714.c: New test.

Signed-off-by: Alexey Merzlyakov 
---
 gcc/config/riscv/riscv.cc | 13 ++--
 gcc/testsuite/gcc.target/riscv/pr120714.c | 40 +++
 2 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr120714.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3c1bb74675a..4ac5ad998fb 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8976,12 +8976,20 @@ riscv_allocate_and_probe_stack_space (rtx temp1, 
HOST_WIDE_INT size)
   temp2 = riscv_force_temporary (temp2, gen_int_mode (rounded_size, 
Pmode));
   insn = emit_insn (gen_sub3_insn (temp2, stack_pointer_rtx, temp2));
 
+  /* The size does not represent actual stack pointer address shift
+from the top of the frame, as it might be lowered before.
+To consider the correct SP addresses for the CFA notes, it is needed
+to correct them with the initial offset value.  */
+  HOST_WIDE_INT initial_cfa_offset
+   = cfun->machine->frame.total_size.to_constant () - size;
+
   if (!frame_pointer_needed)
{
  /* We want the CFA independent of the stack pointer for the
 duration of the loop.  */
  add_reg_note (insn, REG_CFA_DEF_CFA,
-   plus_constant (Pmode, temp1, rounded_size));
+   plus_constant (Pmode, temp1,
+  initial_cfa_offset + rounded_size));
  RTX_FRAME_RELATED_P (insn) = 1;
}
 
@@ -8994,7 +9002,8 @@ riscv_allocate_and_probe_stack_space (rtx temp1, 
HOST_WIDE_INT size)
{
  insn = get_last_insn ();
  add_reg_note (insn, REG_CFA_DEF_CFA,
-   plus_constant (Pmode, stack_pointer_rtx, rounded_size));
+   plus_constant (Pmode, stack_pointer_rtx,
+  initial_cfa_offset + rounded_size));
  RTX_FRAME_RELATED_P (insn) = 1;
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr120714.c 
b/gcc/testsuite/gcc.target/riscv/pr120714.c
new file mode 100644
index 000..dd71a3e11d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr120714.c
@@ -0,0 +1,40 @@
+/* Test checking that the backtrace on large frame size with additional
+   SP shift in the prologue won't broken when compiled with the
+   -fstack-clash-protection option.  */
+/* { dg-do run { target { *-*-linux* } } } */
+/* -O0 does not have enough optimizations.
+   -O2/-O3 does inline and reduces number of addresses in the backtrace.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O2" "-O3" } } */
+/* { dg-options "-g -fstack-clash-protection" } */
+
+#include <execinfo.h>
+
+#define MAX 4000
+
+void goo ()
+{
+  int addresses;
+  void *buffer[10];
+
+  addresses = backtrace (buffer, 10);
+  if (addresses != 6)
+__builtin_abort ();
+}
+
+int foo (int a)
+{
+  long long A[MAX];
+  for (int i = 0; i < MAX; i++)
+A[i] = i;
+
+  goo ();
+
+  return A[a % MAX];
+}
+
+int main ()
+{
+  if (foo (20) != 20)
+__builtin_abort ();
+  return 0;
+}
-- 
2.34.1



Re: [PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-26 Thread Robin Dapp

Hi Pan,

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h

index 2932e189186..0af8b969f47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -282,9 +282,24 @@ DEF_SAT_U_ADD(uint16_t)
 DEF_SAT_U_ADD(uint32_t)
 DEF_SAT_U_ADD(uint64_t)
 
+#define DEF_SAT_U_SUB(T)   \
+T  \
+test_##T##_sat_sub (T a, T b)  \
+{  \
+  return (a - b) & (-(T)(a >= b)); \
+}
+
+DEF_SAT_U_SUB(uint8_t)
+DEF_SAT_U_SUB(uint16_t)
+DEF_SAT_U_SUB(uint32_t)
+DEF_SAT_U_SUB(uint64_t)
+
 #define SAT_U_ADD_FUNC(T) test_##T##_sat_add
 #define SAT_U_ADD_FUNC_WRAP(T) SAT_U_ADD_FUNC(T)
 
+#define SAT_U_SUB_FUNC(T) test_##T##_sat_sub
+#define SAT_U_SUB_FUNC_WRAP(T) SAT_U_SUB_FUNC(T)
+
 #define TEST_BINARY_VX_SIGNED_0(T)  \
   DEF_VX_BINARY_CASE_0_WRAP(T, +, add)  \
   DEF_VX_BINARY_CASE_0_WRAP(T, -, sub)  \
@@ -313,6 +328,7 @@ DEF_SAT_U_ADD(uint64_t)
   DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_1_WARP(T), max)\
   DEF_VX_BINARY_CASE_2_WRAP(T, MIN_FUNC_0_WARP(T), min)\
   DEF_VX_BINARY_CASE_2_WRAP(T, MIN_FUNC_1_WARP(T), min)\
-  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_ADD_FUNC(T), sat_add)
+  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_ADD_FUNC(T), sat_add) \
+  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_U_SUB_FUNC(T), sat_add) \


Shouldn't that be sat_sub here?
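
For reference, the expression in DEF_SAT_U_SUB is a branchless unsigned
saturating subtraction.  A minimal standalone sketch (the template name
sat_u_sub is made up here, it is not part of the test header) shows
that it clamps to zero instead of wrapping:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same expression as the DEF_SAT_U_SUB macro: the mask is all ones when
// a >= b and zero otherwise, so the wrapped difference is discarded.
template<typename T>
constexpr T
sat_u_sub (T a, T b)
{
  static_assert (std::is_unsigned_v<T>, "unsigned types only");
  return static_cast<T> ((a - b) & (-static_cast<T> (a >= b)));
}

static_assert (sat_u_sub<std::uint8_t> (5, 3) == 2);
static_assert (sat_u_sub<std::uint8_t> (3, 5) == 0);   // clamps, no wrap
static_assert (sat_u_sub<std::uint64_t> (0, 1) == 0);
```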

--
Regards
Robin



[PATCH][v2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Biener
The following fixes the computation of supports_partial_vectors, which
is used to prune the set of modes to iterate over for epilog
vectorization.  The partial_vectors_supported_p predicate used there
only looks for while_ult, while we also support predication when
mask modes are integer modes, as for AVX512.

I've noticed this isn't very effective on x86_64 anyway since
if the main loop mode is autodetected we skip re-analyzing
mode_i == 0, but then mode_i == 1 is usually the very same
large mode.  This is fixed by the next patch.

The following simplifies the logic by simply re-using the
already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
the main loop to decide whether we can possibly use partial
vectors for the epilogue (for the case of having the same VF).
We remember the main loop analysis before a suggested unroll
factor is applied to avoid possible differences from that.
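
A rough standalone sketch of that logic follows.  It is not the GCC
code: analysis_result, both function names and the plain int parameter
are hypothetical stand-ins for loop_vec_info, vect_analyze_loop_1 and
param_vect_partial_vector_usage, and the cost-based replacement of the
main loop candidate is left out.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for the outcome of analyzing one vector mode.
struct analysis_result
{
  bool vectorizable;             // did the analysis succeed?
  bool can_use_partial_vectors;  // value before a suggested unroll is applied
};

// Walk the candidate modes in order and remember the partial-vector
// capability of the first mode that works (the "main loop").
inline std::optional<bool>
main_loop_partial_vector_capability (const std::vector<analysis_result> &modes)
{
  for (const analysis_result &r : modes)
    if (r.vectorizable)
      return r.can_use_partial_vectors;
  return std::nullopt;  // nothing vectorizable, so no epilogue to prune
}

// Epilogues with the same VF are only worth analyzing when the main loop
// could use partial vectors and the --param does not disable them.
inline bool
epilogue_may_use_partial_vectors (const std::vector<analysis_result> &modes,
                                  int partial_vector_usage_param)
{
  std::optional<bool> cap = main_loop_partial_vector_capability (modes);
  return cap.value_or (false) && partial_vector_usage_param != 0;
}
```

The point of the sketch is the ordering: the capability is recorded
during the main loop analysis and the --param is only applied at the
end, when deciding whether same-VF epilogue modes need to be tried.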

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vect_analyze_loop_1): New parameter
to output whether the not unrolled loop can use partial
vectors.
(vect_analyze_loop): Use the main loop partial vector
analysis result to decide if epilogues with the same VF
can use partial vectors.
---
 gcc/tree-vect-loop.cc | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..fa022dfad42 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3474,7 +3474,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
 loop_vec_info orig_loop_vinfo,
 const vector_modes &vector_modes, unsigned &mode_i,
 machine_mode &autodetected_vector_mode,
-bool &fatal)
+bool &fatal,
+bool &loop_as_epilogue_supports_partial_vectors)
 {
   loop_vec_info loop_vinfo
 = vect_create_loop_vinfo (loop, shared, loop_form_info, orig_loop_vinfo);
@@ -3488,6 +3489,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
   opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal,
&suggested_unroll_factor,
slp_done_for_suggested_uf);
+  loop_as_epilogue_supports_partial_vectors
+= LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
 "* Analysis %s with vector mode %s\n",
@@ -3633,6 +3636,8 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
   for (unsigned i = 0; i < vector_modes.length (); ++i)
 cached_vf_per_mode.safe_push (0);
 
+  bool supports_partial_vectors = false;
+
   /* First determine the main loop vectorization mode, either the first
  one that works, starting with auto-detecting the vector mode and then
  following the targets order of preference, or the one with the
@@ -3644,10 +3649,12 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
   /* Set cached VF to -1 prior to analysis, which indicates a mode has
 failed.  */
   cached_vf_per_mode[last_mode_i] = -1;
+  bool loop_as_epilogue_supports_partial_vectors;
   opt_loop_vec_info loop_vinfo
= vect_analyze_loop_1 (loop, shared, &loop_form_info,
   NULL, vector_modes, mode_i,
-  autodetected_vector_mode, fatal);
+  autodetected_vector_mode, fatal,
+  loop_as_epilogue_supports_partial_vectors);
   if (fatal)
break;
 
@@ -3677,7 +3684,11 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
  first_loop_vinfo = opt_loop_vec_info::success (NULL);
}
  if (first_loop_vinfo == NULL)
-   first_loop_vinfo = loop_vinfo;
+   {
+ first_loop_vinfo = loop_vinfo;
+ supports_partial_vectors
+   = loop_as_epilogue_supports_partial_vectors;
+   }
  else
{
  delete loop_vinfo;
@@ -3742,8 +3753,7 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
 vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  supports_partial_vectors &= param_vect_partial_vector_usage != 0;
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
 
   loop_vec_info orig_loop_vinfo = first_loop_vinfo;
@@ -3769,12 +3779,13 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
 "* Re-trying epilogue analysis with vector "
 "mode %s\n", GET_MODE_NAME (vector_modes[mode_i]));
 
- bool fatal;
+ bool fatal, loop_as_epilogue_supports_partial_vectors;
 

Re: [PATCH v2 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-26 Thread Tomasz Kaminski
On Thu, Jun 26, 2025 at 3:40 PM Luc Grosheintz 
wrote:

>
>
> On 6/13/25 12:40, Luc Grosheintz wrote:
> > libstdc++-v3/ChangeLog:
> >
> >   * include/std/mdspan (default_accessor): New class.
> >   * src/c++23/std.cc.in: Register default_accessor.
> >   * testsuite/23_containers/mdspan/accessors/default.cc: New test.
> >
> > Signed-off-by: Luc Grosheintz 
> > ---
> >   libstdc++-v3/include/std/mdspan   | 26 
> >   libstdc++-v3/src/c++23/std.cc.in  |  3 +-
> >   .../23_containers/mdspan/accessors/default.cc | 59 +++
> >   3 files changed, 87 insertions(+), 1 deletion(-)
> >   create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> >
> > diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> > index 6dc2441f80b..2e85ba8e6cb 100644
> > --- a/libstdc++-v3/include/std/mdspan
> > +++ b/libstdc++-v3/include/std/mdspan
> > @@ -1004,6 +1004,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > [[no_unique_address]] _S_strides_t _M_strides;
> >   };
> >
> > +  template<typename _ElementType>
> > +struct default_accessor
> > +{
>
> It would be easy to check the two mandates: not abstract, not array
> here. Would you like a v3, with the change?
>
> https://eel.is/c++draft/views.multidim#mdspan.accessor.default.overview-2

Yes, I think that makes sense. Thanks.
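
Something along these lines would do, shown here as a standalone sketch
rather than the libstdc++ code (sketch_accessor and the helper function
are made-up names, and the messages are only illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical accessor, modelled on the quoted patch, that rejects
// array and abstract element types when the class is instantiated.
template<typename ElementType>
struct sketch_accessor
{
  static_assert (!std::is_array_v<ElementType>,
                 "ElementType must not be an array type");
  static_assert (!std::is_abstract_v<ElementType>,
                 "ElementType must not be an abstract class type");

  using element_type = ElementType;
  using reference = element_type &;
  using data_handle_type = element_type *;

  constexpr reference
  access (data_handle_type p, std::size_t i) const noexcept
  { return p[i]; }

  constexpr data_handle_type
  offset (data_handle_type p, std::size_t i) const noexcept
  { return p + i; }
};

// Usage helper: reads element i through the accessor.
inline int
read_through_accessor (int *data, std::size_t i)
{
  sketch_accessor<int> acc;
  return acc.access (data, i);
}
```

With this, sketch_accessor<int[3]> or an accessor over an abstract
class fails to compile with a readable diagnostic.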

>
>
> > +  using offset_policy = default_accessor;
> > +  using element_type = _ElementType;
> > +  using reference = element_type&;
> > +  using data_handle_type = element_type*;
> > +
> > +  constexpr
> > +  default_accessor() noexcept = default;
> > +
> > +  template<typename _OElementType>
> > + requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
> > + constexpr
> > + default_accessor(default_accessor<_OElementType>) noexcept
> > + { }
> > +
> > +  constexpr reference
> > +  access(data_handle_type __p, size_t __i) const noexcept
> > +  { return __p[__i]; }
> > +
> > +  constexpr data_handle_type
> > +  offset(data_handle_type __p, size_t __i) const noexcept
> > +  { return __p + __i; }
> > +};
> > +
> >   _GLIBCXX_END_NAMESPACE_VERSION
> >   }
> >   #endif
> > diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/
> std.cc.in
> > index 109f590f1d1..e671aff68f8 100644
> > --- a/libstdc++-v3/src/c++23/std.cc.in
> > +++ b/libstdc++-v3/src/c++23/std.cc.in
> > @@ -1843,7 +1843,8 @@ export namespace std
> > using std::layout_left;
> > using std::layout_right;
> > using std::layout_stride;
> > -  // FIXME layout_left_padded, layout_right_padded, default_accessor
> and mdspan
> > +  using std::default_accessor;
> > +  // FIXME layout_left_padded, layout_right_padded and mdspan
> >   }
> >   #endif
> >
> > diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> > new file mode 100644
> > index 000..303833d4857
> > --- /dev/null
> > +++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
> > @@ -0,0 +1,59 @@
> > +// { dg-do run { target c++23 } }
> > +#include 
> > +
> > +#include 
> > +
> > +constexpr size_t dyn = std::dynamic_extent;
> > +
> > +template
> > +  constexpr void
> > +  test_accessor_policy()
> > +  {
> > +static_assert(std::copyable);
> > +static_assert(std::is_nothrow_move_constructible_v);
> > +static_assert(std::is_nothrow_move_assignable_v);
> > +static_assert(std::is_nothrow_swappable_v);
> > +  }
> > +
> > +constexpr bool
> > +test_access()
> > +{
> > +  std::default_accessor accessor;
> > +  std::array a{10, 11, 12, 13, 14};
> > +  VERIFY(accessor.access(a.data(), 0) == 10);
> > +  VERIFY(accessor.access(a.data(), 4) == 14);
> > +  return true;
> > +}
> > +
> > +constexpr bool
> > +test_offset()
> > +{
> > +  std::default_accessor accessor;
> > +  std::array a{10, 11, 12, 13, 14};
> > +  VERIFY(accessor.offset(a.data(), 0) == a.data());
> > +  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
> > +  return true;
> > +}
> > +
> > +constexpr void
> > +test_ctor()
> > +{
> > +
> static_assert(std::is_nothrow_constructible_v,
> > +
>  std::default_accessor>);
> > +  static_assert(std::is_convertible_v,
> > +   std::default_accessor>);
> > +  static_assert(!std::is_constructible_v,
> > +  std::default_accessor>);
> > +}
> > +
> > +int
> > +main()
> > +{
> > +  test_accessor_policy>();
> > +  test_access();
> > +  static_assert(test_access());
> > +  test_offset();
> > +  static_assert(test_offset());
> > +  test_ctor();
> > +  return 0;
> > +}
>
>


Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Biener
On Fri, 27 Jun 2025, Richard Biener wrote:

> On Thu, 26 Jun 2025, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > The following fixes the computation of supports_partial_vectors, which
> > > is used to prune the set of modes to iterate over for epilog
> > > vectorization.  The partial_vectors_supported_p predicate used there
> > > only looks for while_ult, while we also support predication when
> > > mask modes are integer modes, as for AVX512.
> > >
> > > I've noticed this isn't very effective on x86_64 anyway since
> > > if the main loop mode is autodetected we skip re-analyzing
> > > mode_i == 0, but then mode_i == 1 is usually the very same
> > > large mode.
> > >
> > > Thus I do wonder if we should instead always (or when
> > > --param vect-partial-vector-usage != 0, or when the target would
> > > support predication in principle) perform main loop analysis
> > > with partial vectors in mind (start with can_use_partial_vectors_p =
> > > true), but only at the end honor the --param when deciding on
> > > using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
> > > for each analyzed mode and use that more specific info for the
> > > pruning?
> > 
> > Yeah, sounds like that could work.  In principle, epilogue loops should
> > be strictly easier to vectorise than main loops.  If you know that the
> > epilogue "loop" never iterates, there could in principle be cases
> > where we'd need to clear can_use_partial_vectors_p for the main loop
> > but not for the epilogue loop.  I can't think of any situation like
> > that off-hand though.  Likewise for unrolling.
> 
> So we already do analyze the main loop for partial vector usage when
> --param vect-partial-vector-usage != 0, so for the purpose of
> pruning epilogue analysis we should be able to use
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
> 
> As you say there might in theory be corner cases, like when
> applying a suggested unroll factor to the main loop.  I can't
> think of a reason for when we don't, so we can in principle
> just remember the analysis result without if required.
> 
> But basically it would be like below, I'll post this separately
> again so the CI can pick it up.
> 
> Would that be OK as-is or do you think we should be looking
> to deal with the unrolled main loop case preventively?

It's easy enough to do, like with the following.  So that's what
I'm going to test.

Richard.

>From b0ae2522e8ddb3381e7e22995c0ce3e700c53755 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 26 Jun 2025 11:08:04 +0200
Subject: [PATCH] Fixup partial_vectors_supported_p use
To: gcc-patches@gcc.gnu.org

The following fixes the computation of supports_partial_vectors, which
is used to prune the set of modes to iterate over for epilog
vectorization.  The partial_vectors_supported_p predicate used there
only looks for while_ult, while we also support predication when
mask modes are integer modes, as for AVX512.

I've noticed this isn't very effective on x86_64 anyway since
if the main loop mode is autodetected we skip re-analyzing
mode_i == 0, but then mode_i == 1 is usually the very same
large mode.  This is fixed by the next patch.

The following simplifies the logic by simply re-using the
already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
the main loop to decide whether we can possibly use partial
vectors for the epilogue (for the case of having the same VF).
We remember the main loop analysis before a suggested unroll
factor is applied to avoid possible differences from that.

* tree-vect-loop.cc (vect_analyze_loop_1): New parameter
to output whether the not unrolled loop can use partial
vectors.
(vect_analyze_loop): Use the main loop partial vector
analysis result to decide if epilogues with the same VF
can use partial vectors.
---
 gcc/tree-vect-loop.cc | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..fa022dfad42 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3474,7 +3474,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
 loop_vec_info orig_loop_vinfo,
 const vector_modes &vector_modes, unsigned &mode_i,
 machine_mode &autodetected_vector_mode,
-bool &fatal)
+bool &fatal,
+bool &loop_as_epilogue_supports_partial_vectors)
 {
   loop_vec_info loop_vinfo
 = vect_create_loop_vinfo (loop, shared, loop_form_info, orig_loop_vinfo);
@@ -3488,6 +3489,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared 
*shared,
   opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal,
&suggested_unroll_factor,
slp_done_for_suggested_uf);
+  loop_as_epilogue_supports_partial_vectors
+= LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
   if (dump_enabl

Re: [PATCH 02/17] Mark pass_sccopy gate and execute functions as final override

2025-06-26 Thread Filip Kastl
Thanks for spotting this.

Filip

On Wed 2025-06-25 16:03:20, Martin Jambor wrote:
> Hi,
> 
> It is customary to mark the gate and execute functions of the classes
> representing passes as final override but this is missing in
> pass_sccopy.  This patch adds it which also silences clang warnings
> about it.
> 
> Bootstrapped and tested on x86_64-linux. Because of the precedent
> elsewhere I consider this obvious and will commit it shortly.
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2025-06-24  Martin Jambor  
> 
>   * gimple-ssa-sccopy.cc (class pass_sccopy): Mark member functions
>   gate and execute as final override.
> ---
>  gcc/gimple-ssa-sccopy.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> index c93374572a9..341bae46080 100644
> --- a/gcc/gimple-ssa-sccopy.cc
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -699,8 +699,8 @@ public:
>{}
>  
>/* opt_pass methods: */
> -  virtual bool gate (function *) { return true; }
> -  virtual unsigned int execute (function *);
> +  virtual bool gate (function *) final override { return true; }
> +  virtual unsigned int execute (function *) final override;
>opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
>  }; // class pass_sccopy
>  
> -- 
> 2.49.0
> 
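
For anyone reading along, a minimal standalone sketch of what the
specifiers buy us (base_pass, sketch_pass and run_if_enabled are
made-up names, not the GCC pass classes):

```cpp
#include <cassert>

struct base_pass
{
  virtual ~base_pass () = default;
  virtual bool gate (int) { return false; }
  virtual unsigned execute (int) { return 0; }
};

struct sketch_pass : base_pass
{
  // 'override' makes the compiler reject a signature that does not match
  // the base class; 'final' forbids further overriding in derived classes.
  bool gate (int) final override { return true; }
  unsigned execute (int n) final override
  { return static_cast<unsigned> (n) + 1; }
};

// Calls through the base class reference still dispatch virtually.
inline unsigned
run_if_enabled (base_pass &p, int n)
{
  return p.gate (n) ? p.execute (n) : 0;
}
```

Changing a parameter type in sketch_pass::gate would then be a compile
error instead of silently declaring an unrelated function.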


Re: Remove early inlining from afdo pass

2025-06-26 Thread Kugan Vivekanandarajah


> On 24 Jun 2025, at 7:43 pm, Jan Hubicka  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi,
> this patch removes early inlining from the afdo pass since all inlining
> should now happen from the early inliner.  I tested this on SPEC and there
> are 3 inlines happening here which are blocked at early-inline time by
> hitting the large function growth limit.  We probably want to bypass that
> limit; I will look into that incrementally.

Thanks for doing this. Is the inlining difference here due to the annotation
that happens in the auto-profile pass in the earlier implementation?

One unrelated question about scaling profiles. We seem to scale up AFDO with
and_count_scale and scale down local_profile in some other cases. Should we
instead scale up the AFDO profile to the local_profile scale? A lot of the
inlining and other parameters seem to work well with that.

Thanks,
Kugan
> 
> This should make the non-inlined function profile merging hopefully
> easier.
> 
> It may still make sense to separate the afdo inliner from the early inliner
> to solve the non-transitivity issues, which is not that hard to do with the
> current code organization. However, this should be a separate IPA pass
> rather than another part of the afdo pass, since it can be conceptually
> separate.
> 
> Bootstrapped/regtested x86_64-linux, will commit it shortly.

> 
> Honza
> 
> gcc/ChangeLog:
> 
>* auto-profile.cc: Update toplevel comment.
>(early_inline): Remove.
>(auto_profile): Don't do early inlining.

> 
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 8a1d9f878c6..3f8310e6324 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -76,21 +76,30 @@ along with GCC; see the file COPYING3.  If not see
>  standalone symbol, or a clone of a function that is inlined into another
>  function.
> 
> -   Phase 2: Early inline + value profile transformation.
> - Early inline uses autofdo_source_profile to find if a callsite is:
> +   Phase 2: AFDO inline + value profile transformation.
> + This happens during early optimization.
> + During early inlning AFDO inliner is executed which
> + uses autofdo_source_profile to find if a callsite is:
> * inlined in the profiled binary.
> * callee body is hot in the profiling run.
>  If both condition satisfies, early inline will inline the callsite
>  regardless of the code growth.
> - Phase 2 is an iterative process. During each iteration, we also check
> - if an indirect callsite is promoted and inlined in the profiling run.
> - If yes, vpt will happen to force promote it and in the next iteration,
> - einline will inline the promoted callsite in the next iteration.
> +
> + Performing this early has benefit of doing early optimizations
> + before read IPA passe and getting more "context sensitivity" of
> + the profile read.  Profile of inlined functions may differ
> + significantly form one inline instance to another and from the
> + offline version.
> +
> + This is controlled by -fauto-profile-inlinig and is independent
> + of -fearly-inlining.
> 
>Phase 3: Annotate control flow graph.
>  AutoFDO uses a separate pass to:
> * Annotate basic block count
> * Estimate branch probability
> +   * Use earlier static profile to fill in the gaps
> + if AFDO profile is ambigous
> 
>After the above 3 phases, all profile is readily annotated on the GCC IR.
>AutoFDO tries to reuse all FDO infrastructure as much as possible to make
> @@ -2217,18 +2226,6 @@ afdo_annotate_cfg (void)
>   free_dominance_info (CDI_POST_DOMINATORS);
> }
> 
> -/* Wrapper function to invoke early inliner.  */
> -
> -static unsigned int
> -early_inline ()
> -{
> -  compute_fn_summary (cgraph_node::get (current_function_decl), true);
> -  unsigned int todo = early_inliner (cfun);
> -  if (todo & TODO_update_ssa_any)
> -update_ssa (TODO_update_ssa);
> -  return todo;
> -}
> -
> /* Use AutoFDO profile to annoate the control flow graph.
>Return the todo flag.  */
> 
> @@ -2254,15 +2251,9 @@ auto_profile (void)
> 
> push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> 
> -unsigned int todo = early_inline ();
> autofdo::afdo_annotate_cfg ();
> compute_function_frequency ();
> 
> -/* Local pure-const may imply need to fixup the cfg.  */
> -todo |= execute_fixup_cfg ();
> -if (todo & TODO_cleanup_cfg)
> -  cleanup_tree_cfg ();
> -
> free_dominance_info (CDI_DOMINATORS);
> free_dominance_info (CDI_POST_DOMINATORS);
> cgraph_edge::rebuild_edges ();



[PATCH] RISC-V: Add pipeline-checker script

2025-06-26 Thread Kito Cheng
Pipeline checker utility for RISC-V architecture that validates processor
pipeline models. This tool analyzes machine description files to ensure all
instruction types are properly handled by pipeline scheduling models.

I wrote this tool because I am implementing the vector pipeline model for a
SiFive core, and it was hard to find which instruction types are not
handled by the pipeline scheduling model.  The tool reports exactly those
types, so that they can be fixed.

I think it may be useful for other RISC-V core developers as well, so I
decided to upstream it :)

Usage:
```
./pipeline-checker 
```
Example:
```
$ ./pipeline-checker sifive-7.md
Error: Some types are not consumed by the pipemodel
Missing types:
 {'vfclass', 'vimovxv', 'vmov', 'rdfrm', 'wrfrm', 'ghost', 'wrvxrm', 'crypto', 
'vwsll', 'vfmovfv', 'vimovvx', 'sf_vc', 'vfmovvf', 'sf_vc_se', 'rdvlenb', 
'vbrev', 'vrev8', 'sf_vqmacc', 'sf_vfnrclip', 'vsetvl_pre', 'rdvl', 'vsetvl'}
```
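
Conceptually the report is a set difference.  The following standalone
sketch only illustrates that idea and is not how the script itself is
written (the script is Python; missing_types and both parameter names
are made up, and how a type counts as "consumed" is left to the real
tool):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Types declared by the "type" attribute minus the types matched by the
// pipeline model; whatever remains is what gets reported as missing.
inline std::set<std::string>
missing_types (const std::set<std::string> &declared,
               const std::set<std::string> &consumed)
{
  std::set<std::string> missing;
  std::set_difference (declared.begin (), declared.end (),
                       consumed.begin (), consumed.end (),
                       std::inserter (missing, missing.end ()));
  return missing;
}
```

An empty result corresponds to a pipeline model that covers every
instruction type.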

gcc/ChangeLog:

* config/riscv/pipeline-checker: New file.
---
 gcc/config/riscv/pipeline-checker | 191 ++
 1 file changed, 191 insertions(+)
 create mode 100755 gcc/config/riscv/pipeline-checker

diff --git a/gcc/config/riscv/pipeline-checker 
b/gcc/config/riscv/pipeline-checker
new file mode 100755
index 000..815698b0e20
--- /dev/null
+++ b/gcc/config/riscv/pipeline-checker
@@ -0,0 +1,191 @@
+#!/usr/bin/env python3
+
+# RISC-V pipeline model checker.
+# Copyright (C) 2025 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+import re
+import sys
+import argparse
+from pathlib import Path
+from typing import List
+import pprint
+
+def remove_line_comments(text: str) -> str:
+    # Remove ';;' and everything after it on each line
+    cleaned_lines = []
+    for line in text.splitlines():
+        comment_index = line.find(';;')
+        if comment_index != -1:
+            line = line[:comment_index]
+        cleaned_lines.append(line)
+    return '\n'.join(cleaned_lines)
+
+
+def tokenize_sexpr(s: str) -> List[str]:
+    # Tokenize input string, including support for balanced {...} C blocks
+    tokens = []
+    i = 0
+    while i < len(s):
+        c = s[i]
+        if c.isspace():
+            i += 1
+        elif c == '(' or c == ')':
+            tokens.append(c)
+            i += 1
+        elif c == '"':
+            # Parse quoted string
+            j = i + 1
+            while j < len(s) and s[j] != '"':
+                if s[j] == '\\':
+                    j += 1  # Skip escape
+                j += 1
+            tokens.append(s[i:j+1])
+            i = j + 1
+        elif c == '{':
+            # Parse balanced C block
+            depth = 1
+            j = i + 1
+            while j < len(s) and depth > 0:
+                if s[j] == '{':
+                    depth += 1
+                elif s[j] == '}':
+                    depth -= 1
+                j += 1
+            tokens.append(s[i:j])  # Include enclosing braces
+            i = j
+        else:
+            # Parse atom
+            j = i
+            while j < len(s) and not s[j].isspace() and s[j] not in '()"{}':
+                j += 1
+            tokens.append(s[i:j])
+            i = j
+    return tokens
+
+
+def parse_sexpr(tokens: List[str]) -> any:
+    # Recursively parse tokenized S-expression
+    token = tokens.pop(0)
+    if token == '(':
+        lst = []
+        while tokens[0] != ')':
+            lst.append(parse_sexpr(tokens))
+        tokens.pop(0)  # Discard closing parenthesis
+        return lst
+    elif token.startswith('"') and token.endswith('"'):
+        return token[1:-1]  # Remove surrounding quotes
+    elif token.startswith('{') and token.endswith('}'):
+        return token  # Keep C code block as-is
+    else:
+        return token
+
+
+def find_define_attr_type(ast: any) -> List[List[str]]:
+    # Traverse AST to find all (define_attr "type" ...) entries
+    result = []
+    if isinstance(ast, list):
+        if ast and ast[0] == 'define_attr' and len(ast) >= 2 and ast[1] == 'type':
+            result.append(ast)
+        for elem in ast:
+            result.extend(find_define_attr_type(elem))
+    return result
+
+
+def parse_md_file(path: Path):
+    # Read file, remove comments, and parse all top-level S-expressions
+    with open(path

Re: [PATCH 13/17] lto-ltrans-cache: Remove unused private member

2025-06-26 Thread Martin Jambor
Hi,

On Thu, Jun 26 2025, Michal Jireš wrote:
> On 6/25/25 4:14 PM, Martin Jambor wrote:
>> Hi,
>> 
>> when building GCC with clang, it warns that the private member suffix
>> in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used
>> which indeed looks like it is the case.  This patch therefore removes
>> it along with its initialization in the constructor.
>> 
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>> 
>> Alternatively, as with all of these clang warning issues, I'm
>> perfectly happy to add an entry to contrib/filter-clang-warnings.py to
>> ignore the warning instead.
>> 
>> Thanks,
>> 
>> Martin
>> 
>
> Thanks, I am ok with this,

thanks for having a look.  With the above, I'll declare the change
obvious and go ahead and commit it.

Martin



[committed] RISC-V: Add comment and reorder the include files in riscv.md [NFC]

2025-06-26 Thread Kito Cheng
This patch adds a comment to the riscv.md file to clarify the purpose of
the file and reorders the include files for better organization.

gcc/ChangeLog:

* config/riscv/riscv.md: Add comment and reorder include
files.
---
 gcc/config/riscv/riscv.md | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3406b50518e..04b8cc92cd6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4860,6 +4860,7 @@ (define_split
   { operands[3] = GEN_INT (BITS_PER_WORD
   - exact_log2 (INTVAL (operands[3]) + 1)); })
 
+;; Standard extensions and pattern for optimization
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
@@ -4867,19 +4868,21 @@ (define_split
 (include "sync-ztso.md")
 (include "peephole.md")
 (include "pic.md")
-(include "generic.md")
-(include "sifive-7.md")
-(include "sifive-p400.md")
-(include "sifive-p600.md")
-(include "thead.md")
-(include "generic-vector-ooo.md")
-(include "generic-ooo.md")
-(include "vector.md")
 (include "vector-crypto.md")
 (include "vector-bfloat16.md")
 (include "zicond.md")
 (include "sfb.md")
 (include "zc.md")
+(include "vector.md")
+;; Vendor extensions
+(include "thead.md")
 (include "corev.md")
+;; Pipeline models
+(include "generic.md")
 (include "xiangshan.md")
 (include "mips-p8700.md")
+(include "sifive-7.md")
+(include "sifive-p400.md")
+(include "sifive-p600.md")
+(include "generic-vector-ooo.md")
+(include "generic-ooo.md")
-- 
2.34.1



[PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Biener
The following fixes the computation of supports_partial_vectors which
is used to prune the set of modes to iterate over for epilog
vectorization.  The partial_vectors_supported_p predicate used there
only looks for while_ult, but we also support predication when
mask modes are integer modes, as for AVX512.
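
A minimal sketch of the condition described above, written as standalone
C++ rather than GCC internals.  The TargetCaps struct and its field names
are hypothetical stand-ins for the real target queries; only the boolean
structure is meant to mirror the description.

```cpp
// Hypothetical stand-in for the target queries; not GCC's actual types.
struct TargetCaps
{
  bool generic_predicate;   // models partial_vectors_supported_p ()
  bool is_vector_mode;      // models VECTOR_MODE_P (vector_mode)
  bool has_mask_mode;       // models get_mask_mode (vector_mode).exists ()
  bool mask_is_scalar_int;  // models SCALAR_INT_MODE_P (mask_mode)
};

// param_usage models --param vect-partial-vector-usage.
inline bool
supports_partial_vectors (const TargetCaps &t, int param_usage)
{
  if (param_usage == 0)
    return false;               // partial vectors disabled by the param
  if (t.generic_predicate)
    return true;                // while_ult style support
  // AVX512 style masking: a vector mode whose mask mode is an integer mode.
  return t.is_vector_mode && t.has_mask_mode && t.mask_is_scalar_int;
}
```

With this shape, a target that lacks while_ult but has integer mask modes
is still counted as supporting partial vectors.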

I've noticed this isn't very effective on x86_64 anyway since
if the main loop mode is autodetected we skip re-analyzing
mode_i == 0, but then mode_i == 1 is usually the very same
large mode.

Thus I do wonder if we should instead always (or when
--param vect-partial-vector-usage != 0, or when the target would
support predication in principle) perform main loop analysis
with partial vectors in mind (start with can_use_partial_vectors_p =
true), but only at the end honor the --param when deciding on
using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
for each analyzed mode and use that more specific info for the
pruning?  For the missed skipping we probably want to increment
mode_i based on vect_chooses_same_modes_p, like we do in
vect_analyze_loop_1.  I'll propose a patch for this - but this
would regress --param vect-partial-vector-usage=1 on x86 without
the patch below.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

* tree-vect-loop.cc (vect_analyze_loop): Consider AVX512
style masking when computing supports_partial_vectors.
---
 gcc/tree-vect-loop.cc | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..b91ef4a2325 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3742,8 +3742,15 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
 vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  bool supports_partial_vectors = param_vect_partial_vector_usage != 0;
+  machine_mode mask_mode;
+  if (supports_partial_vectors
+  && !partial_vectors_supported_p ()
+  && !(VECTOR_MODE_P (first_loop_vinfo->vector_mode)
+  && targetm.vectorize.get_mask_mode
+   (first_loop_vinfo->vector_mode).exists (&mask_mode)
+  && SCALAR_INT_MODE_P (mask_mode)))
+supports_partial_vectors = false;
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
 
   loop_vec_info orig_loop_vinfo = first_loop_vinfo;
-- 
2.43.0



[PATCH 2/2] Fixup vector epilog analysis skipping when not using partial vectors

2025-06-26 Thread Richard Biener
The following avoids re-analyzing the loop as epilogue when not
using partial vectors and the mode is the same as the autodetected
vector mode and that has a too high VF for a non-predicated loop.
This situation occurs almost always on x86 and saves us one
re-analysis unless --param vect-partial-vector-usage is non-default.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

* tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
analysis further when not using partial vectors.
---
 gcc/tree-vect-loop.cc | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b91ef4a2325..d9091c6c705 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3770,6 +3770,26 @@ vect_analyze_loop (class loop *loop, gimple 
*loop_vectorized_call,
break;
  continue;
}
+ /* We would need an exhaustive search to find all modes we
+skipped but that would lead to the same result as another
+and where we'd could check cached_vf_per_mode against.
+Check for the autodetected mode, which is the common
+situation on x86 which does not perform cost comparison.  */
+ if (!supports_partial_vectors
+ && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
+ && VECTOR_MODE_P (autodetected_vector_mode)
+ && (related_vector_mode (vector_modes[mode_i],
+  GET_MODE_INNER (autodetected_vector_mode))
+ == autodetected_vector_mode)
+ && (related_vector_mode (autodetected_vector_mode,
+  GET_MODE_INNER (vector_modes[mode_i]))
+ == vector_modes[mode_i]))
+   {
+ mode_i++;
+ if (mode_i == vector_modes.length ())
+   break;
+ continue;
+   }
 
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.43.0


[PATCH v2 1/2] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-26 Thread Tomasz Kamiński
This patch reworks the formatting for the chrono types, such that they are all
formatted in terms of _ChronoData class, that includes all required fields.
Populating each required field is performed in formatter for specific type,
based on the chrono-spec used.

To facilitate the above, _ChronoSpec now includes an additional _M_needed field
that represents the chrono data referenced by the format spec (this value
is also configured for __defSpec). This value differs from the value of
__parts passed to _M_parse, which includes all fields that can be computed
from the input (e.g. weekday_indexed can be computed for year_month_day). Later
it is used when filling _ChronoData, in particular by the _M_fill_* family of
functions, to determine whether a given field needs to be set, and thus whether
its value needs to be computed.

In consequence the _ChronoParts enum was extended with additional values
that allow more fine-grained identification:
 * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
 * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
 * _LocalDays, _WeekdayIndex are defined and included in _Date,
 * _Duration is removed, and instead _EpochUnits and _UnitSuffix are
   introduced.
Furthermore, to avoid name conflicts _ChronoParts is now defined as an enum
class, with additional operators that simplify uses.
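
As an illustration of the enum class plus operators approach, here is a
small sketch.  The Parts enum, its enumerators and their values are
hypothetical and much smaller than the real _ChronoParts; operator| is
added here only for convenience and is not taken from the patch.

```cpp
#include <cstddef>

// Hypothetical flag set; names and values are illustrative only.
enum class Parts : unsigned
{
  None = 0,
  Year = 1, Month = 2, Day = 4,
  Date = Year | Month | Day,
  HoursMinutesSeconds = 8, Subseconds = 16,
  TimeOfDay = HoursMinutesSeconds | Subseconds,
};

constexpr Parts
operator| (Parts a, Parts b)
{ return Parts (unsigned (a) | unsigned (b)); }

// Intersection of two flag sets.
constexpr Parts
operator& (Parts a, Parts b)
{ return Parts (unsigned (a) & unsigned (b)); }

constexpr Parts &
operator&= (Parts &a, Parts b)
{ return a = a & b; }

// Set difference: the flags of a that are not in b.
constexpr Parts
operator- (Parts a, Parts b)
{ return Parts (unsigned (a) & ~unsigned (b)); }

constexpr Parts &
operator-= (Parts &a, Parts b)
{ return a = a - b; }

// Allows "(needed & Parts::Day) == nullptr" as an emptiness test.
constexpr bool
operator== (Parts a, std::nullptr_t)
{ return unsigned (a) == 0; }
```

A scoped enum keeps the enumerator names out of the enclosing namespace,
at the cost of having to spell out the bitwise operators by hand.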

In addition to fields that can be printed using chrono-spec, _ChronoData stores:
 * Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
   struct tm construction, and for ISO calendar computation.
 * Total seconds in wall time (_M_lseconds) - this value may be different from
   sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
   to allow future extension, like printing total minutes.
 * Total seconds since epoch - differs from the above due to the offset. Again
   to be used with future extension (e.g. %s as proposed in P2945R1).
 * Subseconds - count of attoseconds (10^(-18)), in addition to printing can
   be used to compute fractional hours, minutes.
For both total seconds fields we use a single _TotalSeconds enumerator in
_ChronoParts, that when present in combination with _EpochUnits or _LocalDays
indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are
provided/required.

To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
format_args mechanism, where the result of +d.count() (see LWG4118) is erased
into make_format_args to local __arg_store, that is later referenced by
_M_ereps (_M_ereps.get(0)).

To handle precision values, and in preparation for allowing the user to
configure them, we store the precision as the third element of _M_ereps
(_M_ereps.get(2)); this allows a duration with precision to be printed using
"{0:{2}}". For subseconds the precision is handled differently depending on
the representation:
 * for integral reps, _M_subseconds value is used to determine fractional value,
   precision is trimmed to 18 digits;
 * for floating-points, _M_ereps stores a duration initialized with only
   fractional seconds, that is later formatted with precision.
Always using the _M_subseconds field for integral durations means that we do
not use the formatter for user-defined durations that are considered to be
integral (see empty_spec.cc file change). To avoid potentially expensive
computation of _M_subseconds, we make sure that _ChronoParts::_Subseconds is
set only if _Subseconds are needed. In particular we remove this flag for
localized output in _M_parse.
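
To make the integral case concrete, here is a sketch of producing
fractional digits from an attosecond count.  The function name and the
truncating behaviour are assumptions made for illustration; the patch's
actual code is not shown in this message.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Returns the first `precision` fractional digits of a subsecond value
// given as a count of attoseconds (10^-18 s).  Precision is capped at 18,
// the number of digits an attosecond count can carry.
inline std::string
subsecond_digits (std::uint64_t attoseconds, int precision)
{
  constexpr std::uint64_t one_second = 1'000'000'000'000'000'000ULL; // 10^18
  attoseconds %= one_second;               // keep only the fractional part
  precision = std::clamp (precision, 0, 18);
  std::string digits = std::to_string (attoseconds);
  digits.insert (0, 18 - digits.size (), '0'); // left-pad to 18 digits
  digits.resize (precision);               // truncate, no rounding
  return digits;
}
```

Half a second at precision 3 gives "500", and one picosecond at
precision 12 gives "000000000001".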

Construction of _M_ereps as described above is handled by __formatter_duration,
that is then used to format duration, hh_mm_ss and time_point specializations.
This class also handles _UnitSuffix, the _M_units_suffix field is populated
either with predefined suffix (chrono::__detail::__units_suffix) or one produced
locally.

Finally, formatters for the types listed below contain type-specific logic:
 * hh_mm_ss - we do not compute total duration and seconds, unless explicitly
   requested, as such computation may overflow;
 * utc_time - for time during leap second insertion, the _M_seconds field is
   increased to 60;
 * __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
   abbreviation (_ZoneAbbrev) is requested, but corresponding pointer is null,
   furthermore conversion from `char` to `wchar_t` for abbreviation is performed
   if needed.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::__no_timezone_available):
Removed, replaced with separate throws in formatter for
__local_time_fmt
(__format::_ChronoParts): Defined additional enumerators and
declared as enum class.
(__format::operator&(_ChronoParts, _ChronoParts))
(__format::operator&=(_ChronoParts&, _ChronoParts))
(__format::operator-(_ChronoParts, _ChronoParts))
(__format::operator-=(_ChronoParts&, _ChronoParts))
(__format::operator==(_ChronoParts, decltype(nullptr)))
(_ChronoSpec::

[PATCH v2 0/2] libstdc++: Reduce compilation times for chrono-formatting

2025-06-26 Thread Tomasz Kamiński
This is now a series of two patches that aim to reduce compilation times
for chrono formatting:
* first patch implements formatting of chrono types in terms of single
  _ChronoData type, instead of instantiating it for all calendar types
* second patch deduplicates localized chrono formatting code, by putting
  it into the main formatting loop.

Tested on x86_64-linux locally. The std/time* tests passed with
-D_GLIBCXX_USE_CXX11_ABI=0 and -D_GLIBCXX_DEBUG.



[PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kamiński
This patch extracts calls to _M_locale_fmt and construction of the struct tm
from the functions dedicated to each specifier into the main format loop in
the _M_format_to function. This removes code duplicated across specifiers.

To allow _M_locale_fmt to be called only if localized formatting is enabled
('L' is present in chrono-format-spec), we provide implementations for
locale-specific specifiers (%c, %r, %x, %X) that produce the same result
as locale::classic():
 * %c is implemented as separate _M_c method
 * %r is implemented as separate _M_r method
 * %x is implemented together with %D, as they provide same behavior,
 * %X is implemented together with %R as _M_R_X, as neither of them includes
   subseconds.
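
The equivalences that make this possible can be checked with plain
strftime, which starts out in the "C" locale.  This is only an
illustration using the C library, under the assumption that the "C"
locale formats are the ones the C standard specifies; format_tm and
sample_tm are hypothetical helpers, not part of chrono_io.h.

```cpp
#include <ctime>
#include <string>

// Formats a broken-down time with a strftime conversion string.
inline std::string
format_tm (const std::tm &tm, const char *fmt)
{
  char buf[128];
  std::size_t n = std::strftime (buf, sizeof buf, fmt, &tm);
  return std::string (buf, n);
}

// Thursday 2025-06-26 14:05:09, filled in by hand.
inline std::tm
sample_tm ()
{
  std::tm tm{};
  tm.tm_year = 2025 - 1900;
  tm.tm_mon = 5;      // June, months are zero-based
  tm.tm_mday = 26;
  tm.tm_wday = 4;     // Thursday
  tm.tm_yday = 176;   // zero-based day of year
  tm.tm_hour = 14;
  tm.tm_min = 5;
  tm.tm_sec = 9;
  return tm;
}
```

In the "C" locale, %x matches %D, %X matches %H:%M:%S, %r matches
"%I:%M:%S %p" and %c matches "%a %b %e %H:%M:%S %Y", so none of them
needs the time_put facet unless a different locale is in use.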

The handling of subseconds was also extracted to the _M_subsecs function that
is used by _M_S and _M_T. _M_T is now implemented in terms of
_M_R_X (printing time without subseconds) and _M_subsecs.

The __mod responsible for triggering localized formatting was removed from
the methods handling most specifiers, except:
 * _M_S (for %S) for which it determines if subseconds should be included,
 * _M_z (for %z) for which it determines if ':' is used as separator.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
Define.
(__formatter_chrono::_M_locale_fmt): Moved to front of the class.
(__formatter_chrono::_M_format_to): Construct and initialize
struct tm and call _M_locale_fmt if needed.
(__formatter_chrono::_M_c_r_x_X): Split into separate methods.
(__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
(__formatter_chrono::_M_D): Renamed to _M_D_x.
(__formatter_chrono::_M_D_x): Renamed from _M_D.
(__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
(__formatter_chrono::_M_R_X): Extracted from _M_R_T.
(__formatter_chrono::_M_T): Define in terms of _M_R_X and _M_subsecs.
(__formatter_chrono::_M_subsecs): Extracted from _M_S.
(__formatter_chrono::_M_S): Replaced __mod with __subs argument,
removed _M_locale_fmt call, and delegate to _M_subsecs.
(__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
(__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
__mod argument and call to _M_locale_fmt.
---
 libstdc++-v3/include/bits/chrono_io.h | 340 +-
 1 file changed, 172 insertions(+), 168 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 35e95906e6a..d451bde722d 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -906,6 +906,40 @@ namespace __format
  return __format::__write(std::move(__out), __s);
}
 
+  [[__gnu__::__always_inline__]]
+  static bool
+  _S_localized_spec(_CharT __conv, _CharT __mod)
+  {
+   switch (__conv)
+ {
+ case 'c':
+ case 'r':
+ case 'x':
+ case 'X':
+   return true;
+ case 'z':
+   return false;
+ default:
+   return (bool)__mod;
+ };
+  }
+
+  // Use the formatting locale's std::time_put facet to produce
+  // a locale-specific representation.
+  template
+   _Iter
+   _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm& __tm,
+ char __fmt, char __mod) const
+   {
+ basic_ostringstream<_CharT> __os;
+ __os.imbue(__loc);
+ const auto& __tp = use_facet>(__loc);
+ __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
+ if (__os)
+   __out = _M_write(std::move(__out), __loc, __os.view());
+ return __out;
+   }
+
   template
_Out
_M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
@@ -923,6 +957,36 @@ namespace __format
return std::move(__out);
  };
 
+ struct tm __tm{};
+ bool __use_locale_fmt = false;
+ if (_M_spec._M_localized && _M_spec._M_locale_specific)
+   if (__fc.locale() != locale::classic())
+ {
+   __use_locale_fmt = true;
+
+   __tm.tm_year = (int)__t._M_year - 1900;
+   __tm.tm_yday = __t._M_day_of_year.count();
+   __tm.tm_mon = (unsigned)__t._M_month - 1;
+   __tm.tm_mday = (unsigned)__t._M_day;
+   __tm.tm_wday = __t._M_weekday.c_encoding();
+   __tm.tm_hour = __t._M_hours.count();
+   __tm.tm_min = __t._M_minutes.count();
+   __tm.tm_sec = __t._M_seconds.count();
+
+   // Some locales use %Z in their %c format but we don't want 
strftime
+   // to use the system's local time zone (from /etc/localtime or 
$TZ)
+   // as the output for %Z. Setting tm_isdst to 

[PATCH] c++, libstdc++, v2: Implement C++26 P2830R10 - Constexpr Type Ordering

2025-06-26 Thread Jakub Jelinek
On Wed, Jun 25, 2025 at 10:58:59PM +0200, Maciej Cencora wrote:
> update of std module is missing.

Here is an updated patch which adds the std module part and while I was
changing the patch, I've also added value_type/type and the 2 operators
to std::type_order.

Interdiff from the last patch is:
--- libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
+++ libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
@@ -1271,6 +1271,10 @@
 struct type_order
 {
   static constexpr strong_ordering value = __builtin_type_order(_Tp, _Up);
+  using value_type = strong_ordering;
+  using type = type_order<_Tp, _Up>;
+  constexpr operator value_type() const noexcept { return value; }
+  constexpr value_type operator()() const noexcept { return value; }
 };
 
   /// @ingroup variable_templates
--- libstdc++-v3/src/c++23/std.cc.in.jj 2025-06-12 15:50:51.400821105 +0200
+++ libstdc++-v3/src/c++23/std.cc.in 2025-06-26 07:37:06.90208 +0200
@@ -888,6 +888,10 @@ export namespace std
   using std::partial_order;
   using std::strong_order;
   using std::weak_order;
+#if __glibcxx_type_order >= 202506L
+  using std::type_order;
+  using std::type_order_v;
+#endif
 }
 
 // 28.4 

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Though, now that I look at it again, perhaps both
#if __glibcxx_type_order >= 202506L
in the patch should have been
#if __cpp_lib_type_order >= 202506L

Can change that.

2025-06-26  Jakub Jelinek  

gcc/cp/
* cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering.
(TYPE_ORDER): New.
* method.cc (type_order_value): Define.
* cp-tree.h (type_order_value): Declare.
* semantics.cc (trait_expr_value): Use gcc_unreachable also
for CPTK_TYPE_ORDER, adjust comment.
(finish_trait_expr): Handle CPTK_TYPE_ORDER.
* constraint.cc (diagnose_trait_expr): Likewise.
gcc/testsuite/
* g++.dg/cpp26/type-order1.C: New test.
* g++.dg/cpp26/type-order2.C: New test.
* g++.dg/cpp26/type-order3.C: New test.
libstdc++-v3/
* include/bits/version.def (type_order): New.
* include/bits/version.h: Regenerate.
* libsupc++/compare: Define __glibcxx_want_type_order before
including bits/version.h.
(std::type_order, std::type_order_v): New trait and template variable.
* src/c++23/std.cc.in (std::type_order, std::type_order_v): Export.
* testsuite/18_support/comparisons/type_order/1.cc: New test.

--- gcc/cp/method.cc.jj 2025-06-25 16:04:51.611158952 +0200
+++ gcc/cp/method.cc 2025-06-25 16:09:32.017556551 +0200
@@ -3951,5 +3951,26 @@ num_artificial_parms_for (const_tree fn)
   return count;
 }
 
+/* Return value of the __builtin_type_order trait.  */
+
+tree
+type_order_value (tree type1, tree type2)
+{
+  tree rettype = lookup_comparison_category (cc_strong_ordering);
+  if (rettype == error_mark_node)
+return rettype;
+  int ret;
+  if (type1 == type2)
+ret = 0;
+  else
+{
+  const char *name1 = ASTRDUP (mangle_type_string (type1));
+  const char *name2 = mangle_type_string (type2);
+  ret = strcmp (name1, name2);
+}
+  return lookup_comparison_result (cc_strong_ordering, rettype,
+  ret == 0 ? 0 : ret > 0 ? 1 : 2);
+}
+
 
 #include "gt-cp-method.h"
--- gcc/cp/cp-tree.h.jj 2025-06-25 16:04:51.610158965 +0200
+++ gcc/cp/cp-tree.h 2025-06-25 16:09:32.019556525 +0200
@@ -7557,6 +7557,8 @@ extern bool ctor_omit_inherited_parms (
 extern tree locate_ctor(tree);
 extern tree implicitly_declare_fn   (special_function_kind, tree,
 bool, tree, tree);
+extern tree type_order_value   (tree, tree);
+
 /* In module.cc  */
 class module_state; /* Forward declare.  */
 inline bool modules_p () { return flag_modules != 0; }
--- gcc/cp/semantics.cc.jj  2025-06-25 16:04:51.633158669 +0200
+++ gcc/cp/semantics.cc 2025-06-25 16:09:32.021556500 +0200
@@ -13593,8 +13593,10 @@ trait_expr_value (cp_trait_kind kind, tr
 case CPTK_IS_DEDUCIBLE:
   return type_targs_deducible_from (type1, type2);
 
-/* __array_rank is handled in finish_trait_expr. */
+/* __array_rank and __builtin_type_order are handled in
+   finish_trait_expr.  */
 case CPTK_RANK:
+case CPTK_TYPE_ORDER:
   gcc_unreachable ();
 
 #define DEFTRAIT_TYPE(CODE, NAME, ARITY) \
@@ -13724,6 +13726,12 @@ finish_trait_expr (location_t loc, cp_tr
   tree trait_expr = make_node (TRAIT_EXPR);
   if (kind == CPTK_RANK)
TREE_TYPE (trait_expr) = size_type_node;
+  else if (kind == CPTK_TYPE_ORDER)
+   {
+ tree val = type_order_value (type1, type1);
+ if (val != error_mark_node)
+   TREE_TYPE (trait_expr) = TREE_TYPE (val);
+   }
   else
TREE_TYPE (trait_expr) = boolean_type_node;

[PATCH] Fix misoptimization of CONSTRUCTOR with reverse SSO

2025-06-26 Thread Eric Botcazou
Hi,

fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse 
storage order, but it can be invoked in a couple of places on a CONSTRUCTOR 
with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a 
type with reverse storage order; this would require a post adjustment that 
does not currently exist, thus yield wrong code for this admittedly quite 
pathological (but supported) case.

Technically, this is a regression in GCC 10.x and later but, being quite 
pathological, at least in Ada, I don't think that we need to bother about it 
on earlier branches than gcc-13.

Tested on x86-64/Linux, OK for mainline down to the gcc-13 branch?


2025-06-26  Eric Botcazou  

* gimple-fold.cc (fold_const_aggregate_ref_1) :
Bail out immediately if the reference has reverse storage order.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise.


2025-06-26  Eric Botcazou  

* gnat.dg/sso20.adb: New test.

-- 
Eric Botcazou

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 729080ad6e5..e9635d1005d 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -10117,19 +10117,21 @@ fold_const_aggregate_ref_1 (tree t, tree (*valueize) (tree))
   base = get_ref_base_and_extent (t, &offset, &size, &max_size, &reverse);
   ctor = get_base_constructor (base, &offset, valueize);
 
+  /* We cannot determine ctor.  */
+  if (!ctor)
+	return NULL_TREE;
   /* Empty constructor.  Always fold to 0.  */
   if (ctor == error_mark_node)
 	return build_zero_cst (TREE_TYPE (t));
-  /* We do not know precise address.  */
+  /* We do not know precise access.  */
   if (!known_size_p (max_size) || maybe_ne (max_size, size))
 	return NULL_TREE;
-  /* We cannot determine ctor.  */
-  if (!ctor)
-	return NULL_TREE;
-
   /* Out of bound array access.  Value is undefined, but don't fold.  */
   if (maybe_lt (offset, 0))
 	return NULL_TREE;
+  /* Access with reverse storage order.  */
+  if (reverse)
+	return NULL_TREE;
 
   tem = fold_ctor_reference (TREE_TYPE (t), ctor, offset, size, base);
   if (tem)
@@ -10149,7 +10151,6 @@ fold_const_aggregate_ref_1 (tree t, tree (*valueize) (tree))
 	  && offset.is_constant (&coffset)
 	  && (coffset % BITS_PER_UNIT != 0
 		  || csize % BITS_PER_UNIT != 0)
-	  && !reverse
 	  && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
 	{
 	  poly_int64 bitoffset;
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index f7f50c3de99..9cdbf3da772 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -1615,6 +1615,8 @@ fully_constant_vn_reference_p (vn_reference_t ref)
 	  ++i;
 	  break;
 	}
+	  if (operands[i].reverse)
+	return NULL_TREE;
 	  if (known_eq (operands[i].off, -1))
 	return NULL_TREE;
 	  off += operands[i].off;
--  { dg-do run }
--  { dg-options "-O" }

with Ada.Unchecked_Conversion;
with Interfaces;  use Interfaces;
with System;  use System;

procedure SSO20 is

  type Bytes_Ref is array (1 .. 4) of Unsigned_8
with Convention => Ada_Pass_By_Reference;

  type U32_BE is record
Value : Unsigned_32;
  end record
with
  Pack,
  Bit_Order=> High_Order_First,
  Scalar_Storage_Order => High_Order_First;

  function Conv is new Ada.Unchecked_Conversion (Bytes_Ref, U32_BE);

  function Value (B : Bytes_Ref) return Unsigned_32 is (Conv (B).Value);

begin
  if Value ((16#11#, 16#22#, 16#33#, 16#44#)) /= 16#11223344# then
 raise Program_Error;
  end if;
end;


[PATCH] libstdc++, v2: Implement C++26 P2927R3 - Inspecting exception_ptr

2025-06-26 Thread Jakub Jelinek
On Wed, Jun 25, 2025 at 08:20:55PM +0100, Jonathan Wakely wrote:
> This won't work for -fno-rtti

I missed the || __cpp_exceptions part in there; I thought it was &&.

Here is an updated patch which uses just one definition of
std::exception_ptr_cast and additionally exports it from std.cc.in as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-06-26  Jakub Jelinek  

* include/bits/version.def (exception_ptr_cast): Add.
* include/bits/version.h: Regenerate.
* libsupc++/exception: Define __glibcxx_want_exception_ptr_cast before
including bits/version.h.
* libsupc++/exception_ptr.h (std::exception_ptr_cast): Define.
(std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Declare.
* libsupc++/eh_ptr.cc
(std::__exception_ptr::exception_ptr::_M_exception_ptr_cast): Define.
* src/c++23/std.cc.in (std::exception_ptr_cast): Export.
* config/abi/pre/gnu.ver: Export
_ZNKSt15__exception_ptr13exception_ptr21_M_exception_ptr_castERKSt9type_info
at CXXABI_1.3.17.
* testsuite/util/testsuite_abi.cc (check_version): Allow CXXABI_1.3.17.
* testsuite/18_support/exception_ptr/exception_ptr_cast.cc: New test.

--- libstdc++-v3/include/bits/version.def.jj 2025-06-24 18:53:13.751807828 +0200
+++ libstdc++-v3/include/bits/version.def   2025-06-25 12:52:41.844921595 +0200
@@ -2012,6 +2012,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = exception_ptr_cast;
+  values = {
+v = 202506;
+cxxmin = 26;
+  };
+};
+
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
--- libstdc++-v3/include/bits/version.h.jj  2025-06-24 18:53:13.751807828 +0200
+++ libstdc++-v3/include/bits/version.h 2025-06-25 12:52:47.754691329 +0200
@@ -2253,4 +2253,14 @@
 #endif /* !defined(__cpp_lib_sstream_from_string_view) && 
defined(__glibcxx_want_sstream_from_string_view) */
 #undef __glibcxx_want_sstream_from_string_view
 
+#if !defined(__cpp_lib_exception_ptr_cast)
+# if (__cplusplus >  202302L)
+#  define __glibcxx_exception_ptr_cast 202506L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_exception_ptr_cast)
+#   define __cpp_lib_exception_ptr_cast 202506L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_exception_ptr_cast) && 
defined(__glibcxx_want_exception_ptr_cast) */
+#undef __glibcxx_want_exception_ptr_cast
+
 #undef __glibcxx_want_all
--- libstdc++-v3/libsupc++/exception.jj 2025-06-12 09:49:19.924910752 +0200
+++ libstdc++-v3/libsupc++/exception 2025-06-25 12:53:09.924564775 +0200
@@ -38,6 +38,7 @@
 #include 
 
 #define __glibcxx_want_uncaught_exceptions
+#define __glibcxx_want_exception_ptr_cast
 #include 
 
 extern "C++" {
--- libstdc++-v3/libsupc++/exception_ptr.h.jj   2025-06-02 11:00:06.267523918 +0200
+++ libstdc++-v3/libsupc++/exception_ptr.h  2025-06-26 07:53:12.966100732 +0200
@@ -80,6 +80,13 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   /// Throw the object pointed to by the exception_ptr.
   void rethrow_exception(exception_ptr) __attribute__ ((__noreturn__));
 
+#if __cpp_lib_exception_ptr_cast >= 202506L
+  template
+  const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
+  template
+  void exception_ptr_cast(const exception_ptr&&) = delete;
+#endif
+
   namespace __exception_ptr
   {
 using std::rethrow_exception; // So that ADL finds it.
@@ -109,6 +116,13 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   friend void std::rethrow_exception(exception_ptr);
   template
   friend exception_ptr std::make_exception_ptr(_Ex) _GLIBCXX_USE_NOEXCEPT;
+#if __cpp_lib_exception_ptr_cast >= 202506L
+  template
+  friend const _Ex* std::exception_ptr_cast(const exception_ptr&) noexcept;
+#endif
+
+  const void* _M_exception_ptr_cast(const type_info&) const
+   _GLIBCXX_USE_NOEXCEPT;
 
 public:
   exception_ptr() _GLIBCXX_USE_NOEXCEPT;
@@ -283,6 +299,20 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 { return exception_ptr(); }
 #endif
 
+#if __cpp_lib_exception_ptr_cast >= 202506L
+  template
+[[__gnu__::__always_inline__]]
+inline const _Ex* exception_ptr_cast(const exception_ptr& __p) noexcept
+{
+#ifdef __cpp_rtti
+  const type_info &__id = typeid(const _Ex&);
+  return static_cast(__p._M_exception_ptr_cast(__id));
+#else
+  return nullptr;
+#endif
+}
+#endif
+
 #undef _GLIBCXX_EH_PTR_USED
 
   /// @} group exceptions
--- libstdc++-v3/libsupc++/eh_ptr.cc.jj 2025-04-08 14:10:30.518900025 +0200
+++ libstdc++-v3/libsupc++/eh_ptr.cc 2025-06-25 15:29:17.416393720 +0200
@@ -220,4 +220,20 @@ std::rethrow_exception(std::exception_pt
   std::terminate();
 }
 
+const void*
+std::__exception_ptr::exception_ptr::_M_exception_ptr_cast(const type_info& t)
+  const noexcept
+{
+  void *ptr = _M_exception_object;
+  if (__builtin_expect(ptr == nullptr, false))
+return nullptr;
+  __cxa_refcounted_exception *eh
+= __get_refcounted_exception_header_from_ob

Re: [PATCH 2/2] Fixup vector epilog analysis skipping when not using partial vectors

2025-06-26 Thread Richard Sandiford
Richard Biener  writes:
> The following avoids re-analyzing the loop as epilogue when not
> using partial vectors and the mode is the same as the autodetected
> vector mode and that has a too high VF for a non-predicated loop.
> This situation occurs almost always on x86 and saves us one
> re-analysis unless --param vect-partial-vector-usage is non-default.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
>   * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
>   analysis further when not using partial vectors.
> ---
>  gcc/tree-vect-loop.cc | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b91ef4a2325..d9091c6c705 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3770,6 +3770,26 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>   break;
> continue;
>   }
> +   /* We would need an exhaustive search to find all modes we
> +  skipped but that would lead to the same result as another
> +  and where we'd could check cached_vf_per_mode against.

I didn't really follow this.  Is there a missing word around "another"?

> +  Check for the autodetected mode, which is the common
> +  situation on x86 which does not perform cost comparison.  */
> +   if (!supports_partial_vectors
> +   && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
> +   && VECTOR_MODE_P (autodetected_vector_mode)
> +   && (related_vector_mode (vector_modes[mode_i],
> +GET_MODE_INNER 
> (autodetected_vector_mode))
> +   == autodetected_vector_mode)
> +   && (related_vector_mode (autodetected_vector_mode,
> +GET_MODE_INNER (vector_modes[mode_i]))
> +   == vector_modes[mode_i]))

Not too keen on cutting-&-pasting all this :-)  Could we split the
VECTOR_MODE_P onwards into a subroutine that's shared with
vect_analyze_loop_1?

LGTM otherwise FWIW.

Thanks,
Richard

> + {
> +   mode_i++;
> +   if (mode_i == vector_modes.length ())
> + break;
> +   continue;
> + }
>  
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_NOTE, vect_location,


[PATCH v1] rs6000: Fix UBSAN runtime errors for powerpc64le-unknown-linux-gnu

2025-06-26 Thread Kishan Parmar
Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

While building GCC with --with-build-config=bootstrap-ubsan on
powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were
encountered in rs6000.cc and rs6000.md due to undefined behavior
involving left shifts on negative values and shift exponents equal to
or exceeding the type width.

The issue was in the bit pattern recognition code
(in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic),
where signed values were shifted without handling negative inputs or
guarding against shift counts equal to the type width, causing UB.
The fix ensures shifts and rotations are done on unsigned HOST_WIDE_INT
values, casting back to signed only where needed (like for arithmetic
right shifts), with proper guards to prevent shift-by-64.
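
The two patterns can be shown in isolation with fixed-width types.
This is a sketch with hypothetical helper names, using int64_t and
uint64_t in place of HOST_WIDE_INT; it is not the rs6000 code itself.

```cpp
#include <cstdint>

// Left shift of a possibly negative value, done on the unsigned
// representation so that no signed overflow occurs.  n must be < 64.
// The conversion back to signed is modular in C++20 (and in GCC for
// earlier standards, where it is implementation-defined).
inline std::int64_t
shift_left_wrapping (std::int64_t c, unsigned n)
{
  return static_cast<std::int64_t> (static_cast<std::uint64_t> (c) << n);
}

// Rotate right with a guard: for n == 0 the naive formula would
// evaluate x << 64, which is undefined for a 64-bit type.
inline std::uint64_t
rotate_right (std::uint64_t x, unsigned n)
{
  n %= 64;
  if (n == 0)
    return x;
  return (x >> n) | (x << (64 - n));
}
```

For instance, shift_left_wrapping (-1, 4) gives -16 without tripping
UBSAN, and rotate_right (x, 0) returns x unchanged.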

2025-06-26  Kishan Parmar  

gcc:
PR target/118890
* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis):
Avoid left shift of negative value and guard shift count.
(can_be_built_by_li_and_rldic): Likewise.
(rs6000_emit_set_long_const): Likewise.
* config/rs6000/rs6000.md : Avoid signed overflow.
---
 gcc/config/rs6000/rs6000.cc | 24 ++--
 gcc/config/rs6000/rs6000.md |  4 +++-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 7ee26e52b13..e7e30fa95ba 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10309,15 +10309,18 @@ can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
 
   /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
  rotated over the highest bit.  */
-  int pos_one = clz_hwi ((c << 16) >> 16);
-  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
-  int middle_ones = clz_hwi (~(c << pos_one));
-  if (middle_zeros >= 16 && middle_ones >= 33)
+  unsigned HOST_WIDE_INT uc = (unsigned HOST_WIDE_INT)c;
+  int pos_one = clz_hwi ((HOST_WIDE_INT)(uc << 16) >> 16);
+  if (pos_one != 0)
 {
-  *rot = pos_one;
-  return true;
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
+  int middle_ones = clz_hwi (~(uc << pos_one));
+  if (middle_zeros >= 16 && middle_ones >= 33)
+   {
+ *rot = pos_one;
+ return true;
+   }
 }
-
   return false;
 }
 
@@ -10434,7 +10437,7 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   if (lz >= HOST_BITS_PER_WIDE_INT)
 return false;
 
-  int middle_ones = clz_hwi (~(c << lz));
+  int middle_ones = clz_hwi (~(((unsigned HOST_WIDE_INT)c) << lz));
   if (tz + lz + middle_ones >= ones
   && (tz - lz) < HOST_BITS_PER_WIDE_INT
   && tz < HOST_BITS_PER_WIDE_INT)
@@ -10468,7 +10471,7 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   if (!IN_RANGE (pos_first_1, 1, HOST_BITS_PER_WIDE_INT-1))
 return false;
 
-  middle_ones = clz_hwi (~c << pos_first_1);
+  middle_ones = clz_hwi ((~(unsigned HOST_WIDE_INT)c) << pos_first_1);
   middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
   if (pos_first_1 < HOST_BITS_PER_WIDE_INT
   && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
@@ -10570,7 +10573,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns)
 {
   /* li/lis; rldicX */
   unsigned HOST_WIDE_INT imm = (c | ~mask);
-  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+  if (shift != 0)
+   imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
 
   count_or_emit_insn (temp, GEN_INT (imm));
   if (shift != 0)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9c718ca2a22..8fc079a4297 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1971,7 +1971,9 @@
 {
   HOST_WIDE_INT val = INTVAL (operands[2]);
   HOST_WIDE_INT low = sext_hwi (val, 16);
-  HOST_WIDE_INT rest = trunc_int_for_mode (val - low, mode);
+  /* Avoid signed overflow by computing difference in unsigned domain.  */
+  unsigned HOST_WIDE_INT urest = (unsigned HOST_WIDE_INT)val - (unsigned HOST_WIDE_INT)low;
+  HOST_WIDE_INT rest = trunc_int_for_mode (urest, mode);
 
   operands[4] = GEN_INT (low);
   if (mode == SImode || satisfies_constraint_L (GEN_INT (rest)))
-- 
2.43.5



[PATCH][RFC] c/96570 - diagnostics for conversions to/from time_t

2025-06-26 Thread Richard Biener
The following prototypes diagnostics for conversions to/from time_t
where the source/destination does not have sufficient precision for it.
I've lumped this into -Wconversion for the moment and didn't bother
fixing up the testcase for !ilp32 or the -Wconversion diagnostics that
happen.
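
To make the intended rule concrete, here is a small standalone model of the two checks, assuming precisions are plain bit widths. The names TimeConvDiag and classify_time_conversion are made up for illustration; the model ignores the INTEGER_CST case and anything conversion_warning has already diagnosed.

```cpp
#include <cassert>

// Hypothetical result type for the two new diagnostics.
enum class TimeConvDiag { None, LosesPrecision, SourceLacksPrecision };

// Model of the checks: a conversion from time_t to a narrower integer type
// loses precision, and a conversion to time_t from a narrower integer type
// means the source lacks precision.
inline TimeConvDiag
classify_time_conversion(bool src_is_time_t, int src_bits,
                         bool dst_is_time_t, int dst_bits) {
  assert(src_bits > 0 && dst_bits > 0);
  if (src_is_time_t && dst_bits < src_bits)
    return TimeConvDiag::LosesPrecision;
  if (dst_is_time_t && dst_bits > src_bits)
    return TimeConvDiag::SourceLacksPrecision;
  return TimeConvDiag::None;
}
```

With a 64-bit time_t on an ilp32 target, a conversion to a 32-bit int would fall into the first case and a conversion from a 32-bit long into the second.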

Would -Wtime-conversion (or -Wtime_t-conversion?) be an appropriate
option?  I'd enable it with -Wconversion.

This does not diagnose time_t to long conversion on 64bit long
platforms (in anticipation of a problem with -m32), so actual audits
would need to build for 32bit long targets and with 64bit time_t.

The alternative is to implement this with 64bit time_t in mind
(even when it's actually 32bit) and base it solely on types
that would be safe on targets.  This gets hard for long vs.
long long then, esp. if typedefs are involved.

Any known problematic constructs out in the wild we'd like to
have test coverage for?

Thanks,
Richard.

PR c/96570
gcc/c-family/
* c-warn.cc (is_time_t): New.
(warnings_for_convert_and_check): When not otherwise diagnosed
diagnose conversions to/from time_t and loss/lack of precision.

* c-c++-common/Wtime_t-1.c: New testcase.
---
 gcc/c-family/c-warn.cc | 30 +-
 gcc/testsuite/c-c++-common/Wtime_t-1.c | 83 ++
 2 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wtime_t-1.c

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index d547b08f55d..7184a525d42 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -1394,6 +1394,17 @@ conversion_warning (location_t loc, tree type, tree expr, tree result)
   return false;
 }
 
+static bool
+is_time_t (tree type)
+{
+  tree name = TYPE_NAME (type);
+  if (name
+  && TREE_CODE (name) == TYPE_DECL
+  && strcmp (IDENTIFIER_POINTER (DECL_NAME (name)), "time_t") == 0)
+return true;
+  return false;
+}
+
 /* Produce warnings after a conversion. RESULT is the result of
converting EXPR to TYPE.  This is a helper function for
convert_and_check and cp_convert_and_check.  */
@@ -1506,7 +1517,24 @@ warnings_for_convert_and_check (location_t loc, tree type, tree expr,
exprtype, type, expr);
 }
   else
-conversion_warning (loc, type, expr, result);
+{
+  if (conversion_warning (loc, type, expr, result))
+   return;
+
+  if (TREE_CODE (result) == INTEGER_CST)
+   ;
+  else if (is_time_t (TREE_TYPE (expr))
+  && INTEGRAL_TYPE_P (type)
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (expr)))
+   warning_at (loc, OPT_Wconversion,
+   "conversion from % to %qT loses precision", type);
+  else if (is_time_t (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (expr))
+  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (expr)))
+   warning_at (loc, OPT_Wconversion,
+   "source %qE of conversion to % lacks precision",
+   expr);
+}
 }
 
 /* Subroutines of c_do_switch_warnings, called via splay_tree_foreach.
diff --git a/gcc/testsuite/c-c++-common/Wtime_t-1.c b/gcc/testsuite/c-c++-common/Wtime_t-1.c
new file mode 100644
index 000..7832a13d239
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wtime_t-1.c
@@ -0,0 +1,83 @@
+/* Test for diagnostics for conversions between time_t and integer types
+   These tests are based on gcc.dg/Wconversion-integer.c   */
+
+/* { dg-do compile { target ilp32 } } */
+/* { dg-options "-std=c99 -fsigned-char -Wconversion" } */
+
+#define __USE_TIME_BITS64
+#include 
+
+void fsc (signed char sc);
+void fuc (unsigned char uc);
+unsigned fui (unsigned int  ui);
+int fsi (signed int si);
+unsigned long ful (unsigned long  ul);
+signed long fsl (signed long  sl);
+time_t ft(time_t t);
+
+void h (int x)
+{
+  unsigned char uc = 3;
+  signed char   sc = 3;
+  unsigned short us = 3;
+  signed short   ss = 3;
+  unsigned int ui = 3;
+  signed int   si = 3;
+  unsigned long int ul = 3;
+  signed long int   sl = 3;
+  unsigned long long int ull = 3;
+  signed long long int   sll = 3;
+  time_t t = 3;
+  time_t t2 = -3;
+
+  uc = t; /* { dg-warning "conversion" } */
+  sc = t; /* { dg-warning "conversion" } */
+  us = t; /* { dg-warning "conversion" } */
+  ss = t; /* { dg-warning "conversion" } */
+  si = t; /* { dg-warning "conversion" } */
+  ui = t; /* { dg-warning "conversion" } */
+  sl = t; /* { dg-warning "conversion" } */
+  ul = t; /* { dg-warning "conversion" } */
+  ull = t; /* { dg-warning "sign" } */
+  t = uc; /* { dg-warning "conversion" } */
+  t = sc; /* { dg-warning "conversion" } */
+  t = si; /* { dg-warning "conversion" } */
+  t = ui; /* { dg-warning "conversion" } */
+  t = sl; /* { dg-warning "conversion" } */
+  t = ul; /* { dg-warning "conversion" } */
+  fuc (t); /* { dg-warning "conversion" } */
+  fuc (t); /* { dg-warning "conversion" } */
+  fsc (t); /* { dg-warning "conversion" } */

Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-26 Thread Andre Vehreschild
Hi Thomas,

> I have a few questions.
> 
> First, I see that your patch series does not use gfortran's descriptors
> for accessing coarrays via shared memory, as the original work by
> Nicolas did.  Can you comment on that?

The ABI for invoking coarray functionality is sufficient for doing the job.
Modifying the compiler to access coarrays directly, i.e., putting implementation
details of a particular library into the compiler, did not appeal to me.
Furthermore, the new library, in conjunction with the other library available,
has the potential to lead to a stable and maintained ABI. Having another ABI in
the compiler would have led to two badly maintained ones (in my opinion). I
therefore decided to just have one ABI and figured that all that is needed can
be done in a library. This also allows link-time polymorphism. And last but not
least, there is a major rework of the array descriptor going on, and that would
have had the potential to conflict with my work.

> Second, how did you ensure that the library is free from race
> conditions?

The library itself uses mutexes where needed. Additionally, there are no
segments where two locks are held at the same time (if I remember correctly).

If you mean races on a coarray's data: it is not. That is the user's
responsibility, as it is with the other coarray implementation.

> Third, the code "as is" (looking at this in a cursory way)
> will probably fail on systems where you cannot increase mmap()ed
> regions, such as macOS or Windows.  Dominique pointed this out
> back then.

The code has no support for increasing the shared (mmap()ed) region. And I
don't think it is necessary at this stage of development. One can choose a
very large shared memory size on program startup. To my knowledge most systems
will map in/allocate the pages as needed. When there is no space left in main
memory, growing a shared region would not have worked either. Therefore I
think this is fine. Do you have different knowledge?

I deem this library fit for educational and research use, where small to medium
sized problems are researched. I do not expect it to support a long-running
application, because it does not join adjacent blocks in the shared memory upon
free. I.e., the shared memory will get fragmented and at some point no shared
memory can be allocated anymore. Furthermore, there is no check in the library
for when it tries to use memory beyond the allocated size. This is the first shot
at having a shared memory coarray library. Therefore, in my opinion, it should be
usable but not perfect in all aspects. Yes, there will be bugs, and yes, it can
be improved when need be. But for the time being, I think what I presented is
quite usable.
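
To illustrate the fragmentation effect mentioned above, here is a toy first-fit arena that splits blocks on allocation but never joins adjacent free blocks. It is not the caf_shmem allocator; the class NoCoalesceArena and the constant kNoSpace are invented for this sketch, and it only tracks offsets rather than real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel returned when no single free block is large enough.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Toy arena: first-fit allocation with block splitting, free without merging.
class NoCoalesceArena {
public:
  explicit NoCoalesceArena(std::size_t size) {
    blocks_.push_back(Block{0, size, true});
  }

  // Return the offset of a block of N bytes, or kNoSpace on failure.
  std::size_t alloc(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < blocks_.size(); ++i) {
      if (!blocks_[i].free || blocks_[i].size < n)
        continue;
      const std::size_t offset = blocks_[i].offset;
      const std::size_t remainder = blocks_[i].size - n;
      blocks_[i].size = n;
      blocks_[i].free = false;
      if (remainder > 0)  // split: keep the tail as a separate free block
        blocks_.insert(blocks_.begin() + i + 1,
                       Block{offset + n, remainder, true});
      return offset;
    }
    return kNoSpace;
  }

  // Mark the block at OFFSET as free; neighbours are deliberately not merged.
  void release(std::size_t offset) {
    for (Block &b : blocks_)
      if (b.offset == offset && !b.free) {
        b.free = true;
        return;
      }
  }

private:
  struct Block {
    std::size_t offset;
    std::size_t size;
    bool free;
  };
  std::vector<Block> blocks_;
};
```

After four 256-byte allocations in a 1024-byte arena are all released, a 512-byte request fails even though the whole arena is free, because it is still tracked as four separate 256-byte blocks.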

Did that answer your questions?

Regards,
Andre

-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [Fortran, Patch, PR120711, v1] 1/(3) Fix out of bounds access in cleanup of array constructor

2025-06-26 Thread Andre Vehreschild
Hi Harald,

thanks for the review. Pushed all three parts as gcc-16-1698-g24940ad1534.

A backport to gcc-15 of the first part of the patch, aka this one, seems to be
feasible. I'd like to give the patch a bit of time to mature here in gcc-16 and
backport in about a week, if I do not forget it.

Thanks again,
Andre


On Wed, 25 Jun 2025 22:24:46 +0200
Harald Anlauf  wrote:

> Am 25.06.25 um 13:39 schrieb Andre Vehreschild:
> > Hi all,
> > 
> > attached patch fixes an out of bounds access in the clean up code of a
> > concatenating array constructor. A fragment like
> > 
> > list = [ list, something() ]
> > 
> > led to clean up using an offset (of the list array) that was manipulated in
> > the loop copying the existing array elements and at the end pointing to one
> > element past the list (after the concatenation).
> > 
> > This fixes a 15-regression. Releases prior to 15 do not have the out
> > of bounds access in the (non existing) clean up code. They have a memory
> > leak instead.
> > 
> > Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline?  
> 
> This looks good to me.
> 
> Given the severity of the bug, do you plan to backport to 15-branch?
> 
> Thanks for the patch!
> 
> Harald
> 
> > The subject says that there will be 3 patches. Only this one fixes the bug.
> > I found the other fixes while hunting this issue, and because they play in
> > the same general area, I don't want to lose them. I therefore publish them
> > in this context.
> > 
> > Regards,
> > Andre  
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH 13/17] lto-ltrans-cache: Remove unused private member

2025-06-26 Thread Michal Jireš

On 6/25/25 4:14 PM, Martin Jambor wrote:

Hi,

when building GCC with clang, it warns that the private member prefix
in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used
which indeed looks like it is the case.  This patch therefore removes
it along with its initialization in the constructor.

Bootstrapped and tested on x86_64-linux.  OK for master?

Alternatively, as with all of these clang warning issues, I'm
perfectly happy to add an entry to contrib/filter-clang-warnings.py to
ignore the warning instead.

Thanks,

Martin



Thanks, I am ok with this,
Michal



gcc/ChangeLog:

2025-06-24  Martin Jambor  

* lto-ltrans-cache.h (class ltrans_file_cache): Remove member prefix.
* lto-ltrans-cache.cc (ltrans_file_cache::ltrans_file_cache): Do
not initialize member prefix.
---
  gcc/lto-ltrans-cache.cc | 3 +--
  gcc/lto-ltrans-cache.h  | 3 +--
  2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
index c57775fae85..91af6ed6f82 100644
--- a/gcc/lto-ltrans-cache.cc
+++ b/gcc/lto-ltrans-cache.cc
@@ -210,8 +210,7 @@ write_cache_item (FILE* f, ltrans_file_cache::item *item, const char* dir)
  ltrans_file_cache::ltrans_file_cache (const char* dir, const char* prefix,
  const char* suffix,
  size_t soft_cache_size):
-  dir (dir), prefix (prefix), suffix (suffix),
-  soft_cache_size (soft_cache_size)
+  dir (dir), suffix (suffix),  soft_cache_size (soft_cache_size)
  {
if (!dir) return;
  
diff --git a/gcc/lto-ltrans-cache.h b/gcc/lto-ltrans-cache.h
index 5fef44bae53..fdb7a389435 100644
--- a/gcc/lto-ltrans-cache.h
+++ b/gcc/lto-ltrans-cache.h
@@ -122,8 +122,7 @@ private:
std::map map_checksum;
std::map map_input;
  
-  /* Cached filenames are in format "prefix%d[.ltrans]suffix".  */
-  const char* prefix;
+  /* Cached filenames are in format "cache_prefix%d[.ltrans]suffix".  */
const char* suffix;
  
/* If cache items count is larger, prune deletes old items.  */




Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.

2025-06-26 Thread Andre Vehreschild
Hi Jerry,

thanks for testing. I have fixed IMO most of the whitespace issues in the
patch attached to this mail:
https://gcc.gnu.org/pipermail/fortran/2025-June/062349.html

About the 32 vs. 64 bit versions of the libraries: I never got in touch with
that. I am doing the same as for caf_single. In fact I copied the Makefile.am
portion of caf_single and changed it to generate caf_shmem. Do you get both
versions for caf_single? Did you try a clean rebuild? Can anyone give me a
pointer on what I do wrong here?

Regards,
Andre

On Wed, 25 Jun 2025 13:21:29 -0700
Jerry D  wrote:

> On 6/24/25 11:49 PM, Andre Vehreschild wrote:
> > Hi Jerry,
> > 
> > thank you very much. Just try it. I can only imagine that Paul had a somehow
> > corrupted build directory or left overs from some previous build. I am still
> > wondering, that I got no automated mail from the build hosts, but I can
> > imagine, that they get issues with a series of patches, that build upon each
> > other.
> > 
> > Just try it. The more feedback, the better.
> > 
> > Regards,
> > Andre
> > 
> > On Tue, 24 Jun 2025 11:07:23 -0700
> > Jerry D  wrote:
> >   
> >> On 6/24/25 6:09 AM, Andre Vehreschild wrote:  
> >>> Hi all,
> >>>
> >>> this series of patches (six in total) adds a new coarray backend library
> >>> to libgfortran.  The library uses shared memory and processes to implement
> >>> running multiple images on the same node.  The work is based on work
> >>> started by Thomas and Nicolas Koenig. No changes to the gfortran compile
> >>> part are required for this.  
> >>
> >> --- snip ---
> >>
> >> Hi Andre,
> >>
> >> Thank you for this work. I have been wanting this functionality for
> >> several years!
> >>
> >> I will begin reviewing as best I can.  I did see Paul's initial comment
> >> so your feedback on that would be appreciated.
> >>
> >> Best regards,
> >>
> >> Jerry  
> > 
> >   
> 
> I was able to apply the patches without any issues.  I did see some 
> trailing white space in a few places.
> 
> In running the testsuite the test lock_1.f90 test fails, unable to link 
> to the new library.
> 
> After some brief investigation, it appears that the 64-bit version of the 
> new library is not created or installed.  I did find the 32-bit version.
> 
> So something not right in the make mechanisms.
> 
> Looking ahead a bit I was wondering if one could enable co-array if 
> co-array syntax is seen at the parsing phase of the compiler, if no 
> -fcoarray= has been seen, default it to 'single' and issue a NOTE to 
> the user "-fcoarray=single enabled, use -fcoarray=[none, shmem, lib] to 
> override"
> 
> Regards,
> 
> Jerry
> 
> 
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Avoid some lost AFDO profiles with LTO

2025-06-26 Thread Jan Hubicka
Hi,
This patch fixes some of the cases where we lose profile info because we do not
perform inlining that happened at train run before AFDO annotation is done.
This is a common problem with LTO in the case cross-module inlining happened.

I added afdo_offline pass that does two things:
 1) collect set of all functions defined in current unit
 2) walk all toplevel function instances.  If a function instance corresponds
to a defined symbol, walk everything inlined to it.  If cross-module
inlining is seen, remove the inline instances and recursively look into
inline instances that go back to the current unit and turn them into offline
ones

If a function instance corresponds to an external symbol, remove it but
also look for functions inlined to it that belong to the current module.

When merging profiles we also need to recursively merge profiles of inlined
functions and, if the inlining decisions do not match, offline the bodies.
This is somewhat fragile since recursive calls may trigger modifications of
functions currently being merged, but I hope I chased away problems with that -
will give it a second thought to see if this can be reorganized into a worklist
fashion that is safer.

I noticed that functions may appear in the afdo data either as their
symbol name or dwarf name (since inline functions may not have known symbol
name).  There is already some logic to handle that but it is broken in the
case both names are used.

To mitigate the problem I also added logic to translate dwarf names
to symbol names in case both are used.  This prevents profile loss, e.g.
in exchange2.  Here the digits_2 function appears by its dwarf name (digits_2)
but is also cloned, which makes it appear by its symbol name (__*digits_2)
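
A minimal sketch of the name-unification idea, not the auto-profile.cc code: the Profile type, unify_names and the rule of simply adding the counts are all assumptions made for illustration. Entries keyed by a dwarf name are folded into the entry keyed by the symbol name only when both spellings are present.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical profile: function name -> sample count.
using Profile = std::map<std::string, long>;

// For each (dwarf name, symbol name) pair, fold the dwarf-keyed entry into
// the symbol-keyed one when both occur in the profile.  Adding the counts is
// an assumption about what "merging" means here.
inline void
unify_names(Profile &profile,
            const std::map<std::string, std::string> &dwarf_to_symbol) {
  for (const auto &kv : dwarf_to_symbol) {
    if (kv.first == kv.second)
      continue;
    auto dwarf_it = profile.find(kv.first);
    auto sym_it = profile.find(kv.second);
    if (dwarf_it == profile.end() || sym_it == profile.end())
      continue;
    assert(dwarf_it != sym_it);
    sym_it->second += dwarf_it->second;
    profile.erase(dwarf_it);  // erasing from std::map keeps sym_it valid
  }
}
```

If only one of the two spellings is present, the profile is left untouched, which matches the "in case both are used" wording above.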

All profile massaging is done before early optimization so the VPT targets of
offline bodies are correct.  We will still lose the profile if early inlining
fails.  I will add a second pass to afdo to offline these.

The last problem is that in case we early inlined more than expected (which now
happens more often due to offlining) the profile will be lost and filled by
the static profile.  The problem here is that we need to somehow scale the profile
of the inline instance but I do not see how to determine invocation counts.  Will try
to look into that incrementally - perhaps we can keep some info from offlining.

There is also now a dump infrastructure that prints the profile in
the same format as the dump_gcov tool.

Autoprofiledbootstrapped and regtested on x86_64-linux; will commit it shortly.

Honza

gcc/ChangeLog:

* auto-profile.cc (name_index_set, name_index_map): New types.
(dump_afdo_loc): New function.
(dump_inline_stack): Simplify.
(function_instance::merge): Merge recursively inlined functions;
offline if necessary; collect new functions.
(function_instance::offline): New member function.
(function_instance::offline_if_in_set): New member function.
(function_instance::remove_external_functions): New member function.
(function_instance::dump): New member function.
(function_instance::debug): New member function.
(function_instance::dump_inline_stack): New member function.
(function_instance::find_icall_target_map): Use removed_icall_target.
(function_instance::remove_icall_target): Only mark icall target 
removed.
(autofdo_source_profile::offline_external_functions): New function.
(function_instance::read_function_instance): Record inlined_to pointers;
use -1 for unknown head counts.
(autofdo_source_profile::get_function_instance_by_name_index): New
function.
(autofdo_source_profile::add_function_instance): New member function.
(autofdo_source_profile::read): Do not leak memory; fix formatting.
(read_profile): Fix formatting.
(afdo_annotate_cfg): Likewise.
(class pass_ipa_auto_profile_offline): New pass.
(make_pass_ipa_auto_profile_offline): New function.
* passes.def (pass_ipa_auto_profile_offline): Add
* tree-pass.h (make_pass_ipa_auto_profile): Declare

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/indir-call-prof-2.c: Update template.

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 3f8310e6324..12bcba8a03b 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -147,6 +147,10 @@ typedef std::map icall_target_map;
to direct call.  */
 typedef std::set stmt_set;
 
+/* Set and map used to translate name indexes.  */
+typedef hash_set> name_index_set;
+typedef hash_map, int> name_index_map;
+
 /* Represent count info of an inline stack.  */
 class count_info
 {
@@ -233,6 +237,8 @@ public:
   {
 return total_count_;
   }
+
+  /* Return head count or -1 if unknown.  */
   gcov_type
   head_count () const
   {
@@ -246,7 +252,24 @@ public:
 
   /* Merge profile of clones.  Note that cloning hasnt been performed when
  we annotate the CFG (at this stage).  */
-  void merge 

[PATCH] [testsuite] restore default action from dfp.exp [PR120631]

2025-06-26 Thread Alexandre Oliva


dfp.exp tests for dfprt before deciding whether to default to run or
compile, and the PR120631 tests override that without checking for
dfprt.  Rework them to avoid attempting to link and run programs
when dfp runtime support isn't available.

Tested on x86_64-linux-gnu, and, with gcc-14, on aarch64-elf.
Ok to install?


for  gcc/testsuite/ChangeLog

PR middle-end/120631
* pr120631.c: Drop overrider of dg-do default action.
* bitint-9.c: Likewise.
* bitint-10.c: Likewise.
---
 gcc/testsuite/gcc.dg/dfp/bitint-10.c |2 +-
 gcc/testsuite/gcc.dg/dfp/bitint-9.c  |2 +-
 gcc/testsuite/gcc.dg/dfp/pr120631.c  |1 -
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-10.c b/gcc/testsuite/gcc.dg/dfp/bitint-10.c
index b48f0ea6c277e..4a73aebe095c9 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-10.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-10.c
@@ -1,5 +1,5 @@
 /* PR middle-end/120631 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2" } */
 
 #if __BITINT_MAXWIDTH__ >= 128
diff --git a/gcc/testsuite/gcc.dg/dfp/bitint-9.c b/gcc/testsuite/gcc.dg/dfp/bitint-9.c
index 72155a0124753..31614876a12f9 100644
--- a/gcc/testsuite/gcc.dg/dfp/bitint-9.c
+++ b/gcc/testsuite/gcc.dg/dfp/bitint-9.c
@@ -1,5 +1,5 @@
 /* PR middle-end/120631 */
-/* { dg-do run { target bitint } } */
+/* { dg-require-effective-target bitint } */
 /* { dg-options "-O2" } */
 
 #if __BITINT_MAXWIDTH__ >= 2048
diff --git a/gcc/testsuite/gcc.dg/dfp/pr120631.c b/gcc/testsuite/gcc.dg/dfp/pr120631.c
index 2085ff7ba5a72..2533e9de29f81 100644
--- a/gcc/testsuite/gcc.dg/dfp/pr120631.c
+++ b/gcc/testsuite/gcc.dg/dfp/pr120631.c
@@ -1,5 +1,4 @@
 /* PR middle-end/120631 */
-/* { dg-do run } */
 /* { dg-options "-O2" } */
 
 _Decimal64 a = 123456789135790.0dd;


-- 
Alexandre Oliva, happy hacker https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!


Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Sandiford
Richard Biener  writes:
> The following fixes the computation of supports_partial_vectors which
> is used to prune the set of modes to iterate over for epilog
> vectorization.  The used partial_vectors_supported_p predicate
> only looks for while_ult, while we also support predication when
> mask modes are integer modes as for AVX512.
>
> I've noticed this isn't very effective on x86_64 anyway since
> if the main loop mode is autodetected we skip re-analyzing
> mode_i == 0, but then mode_i == 1 is usually the very same
> large mode.
>
> Thus I do wonder if we should instead always (or when
> --param vect-partial-vector-usage != 0, or when the target would
> support predication in principle) perform main loop analysis
> with partial vectors in mind (start with can_use_partial_vectors_p =
> true), but only at the end honor the --param when deciding on
> using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
> for each analyzed mode and use that more specific info for the
> pruning?

Yeah, sounds like that could work.  In principle, epilogue loops should
be strictly easier to vectorise than main loops.  If you know that the
epilogue "loop" never iterates, there could in principle be cases
where we'd need to clear can_use_partial_vectors_p for the main loop
but not for the epilogue loop.  I can't think of any situation like
that off-hand though.  Likewise for unrolling.

> For the missed skipping we probably want to increment
> mode_i based on vect_chooses_same_modes_p, like we do in
> vect_analyze_loop_1.  I'll propose a patch for this - but this
> would regress --param vect-partial-vector-usage=1 on x86 without
> the patch below.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?
>
>   * tree-vect-loop.cc (vect_analyze_loop): Consider AVX512
>   style masking when computing supports_partial_vectors.
> ---
>  gcc/tree-vect-loop.cc | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index c824b5abaaf..b91ef4a2325 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3742,8 +3742,15 @@ vect_analyze_loop (class loop *loop, gimple 
> *loop_vectorized_call,
>  vector_modes[0] = autodetected_vector_mode;
>mode_i = 0;
>  
> -  bool supports_partial_vectors =
> -partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
> +  bool supports_partial_vectors = param_vect_partial_vector_usage != 0;
> +  machine_mode mask_mode;
> +  if (supports_partial_vectors
> +  && !partial_vectors_supported_p ()
> +  && !(VECTOR_MODE_P (first_loop_vinfo->vector_mode)
> +&& targetm.vectorize.get_mask_mode
> + (first_loop_vinfo->vector_mode).exists (&mask_mode)
> +&& SCALAR_INT_MODE_P (mask_mode)))
> +supports_partial_vectors = false;

LGTM FWIW.

I suppose an alternative would be to do this check within the loop
and use vector_modes[mode_i] rather than first_loop_vinfo->vector_mode,
so that we test the mode that we intend to use.  But maybe the extra
precision (if that's what it is) isn't useful in practice.

Thanks,
Richard

>poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
>  
>loop_vec_info orig_loop_vinfo = first_loop_vinfo;


[PATCH] openmp: Allocate memory for private/firstprivate clauses as directed by allocate clauses in target constructs [PR113436]

2025-06-26 Thread Kwok Cheung Yeung
Currently, GCC accepts an allocate clause (to use a specific memory 
allocator and alignment) on the OpenMP target construct, but it has no 
effect - memory is always allocated with the defaults.


This patch causes memory for privatized variables (i.e. variables in 
private and firstprivate clauses) to be allocated with the specified 
allocator and alignment in a similar fashion to how it is done for 
parallel constructs, reusing the lower_private_allocate function.


As the allocated memory is addressed via a pointer, references to the 
variables in the target code need to be adjusted to refer to it, which 
is done by adjusting the DECL_VALUE_EXPR of the version of the variable 
in the target region.


For firstprivate variables, the allocated memory needs to be 
initialized. For most part this is done using the existing mechanisms 
but to a different target. Arrays need an additional copy of their 
contents to the allocated region. C++ references do not need to create a 
temporary to hold the referred-to object as the allocated memory 
fulfills the role already.


VLAs have a non-constant size which is passed in another variable, so 
they cannot be allocated until the size variable is available in the 
target region. Similarly to how private VLAs are handled, the allocation 
and initialisation is delayed until the size variable is set up.


Tested on a x86_64 host with offloading to nvptx. Okay for trunk?


Kwok
From 84adc8bf84974529e5e73d28c7e0abfd7f421364 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 11 Jun 2025 12:46:44 +0100
Subject: [PATCH] openmp: Allocate memory for private/firstprivate clauses as
 directed by allocate clauses in target constructs [PR113436]

This patch generates calls to GOMP_alloc to allocate memory for firstprivate
and private clauses on target constructs with an allocator and alignment
as specified by the allocate clause.

The decl values of the clause need to be adjusted to refer to the allocated
memory, and the initial values of variables need to be copied into the
allocated space for firstprivate variables.

For variable-length arrays, the size of the array is stored in a separate
variable, so the allocation and initialization need to be delayed until the
size is made available on the target.

gcc/

PR middle-end/113436
* omp-low.cc (lower_omp_target): Call lower_private_allocate to
generate code to allocate memory for firstprivate/private clauses
with allocators, and insert code after dependent variables have
been initialized.  Construct calls to free allocated memory and insert
after target block.  Adjust decl values for clause variables.  Copy
value of firstprivate variables to allocated memory.

gcc/testsuite/

PR middle-end/113436
* c-c++-common/gomp/pr113436-1.c: New.
* c-c++-common/gomp/pr113436-2.c: New.

libgomp/

PR middle-end/113436
* testsuite/libgomp.c++/firstprivate-1.C: Enable alignment check.
* testsuite/libgomp.c++/pr113436-1.C: New.
* testsuite/libgomp.c++/pr113436-2.C: New.
* testsuite/libgomp.c++/private-1.C: Enable alignment check.
* testsuite/libgomp.c-c++-common/pr113436-1.c: New.
* testsuite/libgomp.c-c++-common/pr113436-2.c: New.
* testsuite/libgomp.fortran/pr113436-1.f90: New.
* testsuite/libgomp.fortran/pr113436-2.f90: New.
---
 gcc/omp-low.cc| 203 +++---
 gcc/testsuite/c-c++-common/gomp/pr113436-1.c  |  39 
 gcc/testsuite/c-c++-common/gomp/pr113436-2.c  |  40 
 .../testsuite/libgomp.c++/firstprivate-1.C|   6 +-
 libgomp/testsuite/libgomp.c++/pr113436-1.C|  27 +++
 libgomp/testsuite/libgomp.c++/pr113436-2.C|  25 +++
 libgomp/testsuite/libgomp.c++/private-1.C |   3 +-
 .../libgomp.c-c++-common/pr113436-1.c |  94 
 .../libgomp.c-c++-common/pr113436-2.c |  80 +++
 .../testsuite/libgomp.fortran/pr113436-1.f90  |  43 
 .../testsuite/libgomp.fortran/pr113436-2.f90  |  38 
 11 files changed, 563 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/pr113436-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/pr113436-2.c
 create mode 100644 libgomp/testsuite/libgomp.c++/pr113436-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/pr113436-2.C
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/pr113436-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/pr113436-2.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/pr113436-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/pr113436-2.f90

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index e1036adab28..8efe7c5d2ab 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -12752,10 +12752,16 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
 
   ilist = NULL;
   olist = NULL;
+
+  gimple_seq alloc_dlist = NULL;
+  hash_map alloc_map;
+  hash_map alloc_seq_map;
+

[PATCH v3 1/2] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-26 Thread Tomasz Kamiński
This patch reworks the formatting for the chrono types, such that they are all
formatted in terms of _ChronoData class, that includes all required fields.
Populating each required field is performed in formatter for specific type,
based on the chrono-spec used.

To facilitate above, the _ChronoSpec now includes additional _M_needed field,
that represents the chrono data that is referenced by format spec (this value
is also configured for __defSpec). This value differs from the value of
__parts passed to _M_parse, which does include all fields that can be computed
from input (e.g. weekday_indexed can be computed for year_month_day). Later
it is used to fill _ChronoData, in particular _M_fill_* family of functions,
to determine if given field needs to be set, and thus its value needs to be
computed.

As a consequence, the _ChronoParts enum was extended with additional values that
allow more fine-grained identification:
 * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
 * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
 * _LocalDays, _WeekdayIndex are defined and included in _Date,
 * _Duration is removed, and instead _EpochUnits and _UnitSuffix are
   introduced.
Furthermore, to avoid name conflicts _ChronoParts is now defined as enum class,
with additional operators that simplify uses.

In addition to fields that can be printed using chrono-spec, _ChronoData stores:
 * Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by
   struct tm construction, and for ISO calendar computation.
 * Total seconds in wall time (_M_lseconds) - this value may be different from
   sum of days, hours, minutes, seconds (e.g. see utc_time below). Included
   to allow future extension, like printing total minutes.
 * Total seconds since epoch - differs from the above due to the offset. Again
   to be used with future extension (e.g. %s as proposed in P2945R1).
 * Subseconds - count of attoseconds (10^(-18)), in addition to printing can
   be used to compute fractional hours, minutes.
For both total seconds fields we use a single _TotalSeconds enumerator in
_ChronoParts, that when present in combination with _EpochUnits or _LocalDays
indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are
provided/required.

To handle type formatting of time since epoch ('%Q'|_EpochUnits), we use the
format_args mechanism, where the result of +d.count() (see LWG4118) is erased
into make_format_args to local __arg_store, that is later referenced by
_M_ereps (_M_ereps.get(0)).

To handle precision values, and in preparation for allowing the user to
configure them, we store the precision as the third element of _M_ereps
(_M_ereps.get(2)); this allows duration with precision to be printed using
"{0:{2}}". For subseconds
the precision is handled differently depending on the representation:
 * for integral reps, _M_subseconds value is used to determine fractional value,
   precision is trimmed to 18 digits;
 * for floating-points, _M_ereps stores duration initialized with only
   fractional seconds, that is later formatted with precision.
Always using the _M_subseconds field for integral durations means that we do not
use the formatter for user-defined durations that are considered to be integral
(see empty_spec.cc file change). To avoid potentially expensive computation
of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if
_Subseconds are needed. In particular we remove this flag for localized output
in _M_parse.

Construction of the _M_ereps as described above is handled by
__formatter_duration,
that is then used to format duration, hh_mm_ss and time_points specializations.
This class also handles _UnitSuffix, the _M_units_suffix field is populated
either with predefined suffix (chrono::__detail::__units_suffix) or one produced
locally.

Finally, formatters for the types listed below contain type-specific logic:
 * hh_mm_ss - we do not compute total duration and seconds, unless explicitly
   requested, as such computation may overflow;
 * utc_time - for time during leap second insertion, the _M_seconds field is
   increased to 60;
 * __local_time_fmt - exception is thrown if zone offset (_ZoneOffset) or
   abbreviation (_ZoneAbbrev) is requested, but corresponding pointer is null,
   furthermore conversion from `char` to `wchar_t` for abbreviation is performed
   if needed.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__format::__no_timezone_available):
Removed, replaced with separate throws in formatter for
__local_time_fmt
(__format::_ChronoParts): Defined additional enumerators and
declared as enum class.
(__format::operator&(_ChronoParts, _ChronoParts))
(__format::operator&=(_ChronoParts&, _ChronoParts))
(__format::operator-(_ChronoParts, _ChronoParts))
(__format::operator-=(_ChronoParts&, _ChronoParts))
(__format::operator==(_ChronoParts, decltype(nullptr)))
(_ChronoSp

[PATCH v3 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kamiński
This patch extracts the calls to _M_locale_fmt and the construction of the
struct tm from the functions dedicated to each specifier into the main format
loop in _M_format_to functions. This removes code duplicated across specifiers.

To allow _M_locale_fmt to be called only if localized formatting is enabled
('L' is present in chrono-format-spec), we provide implementations for the
locale-specific specifiers (%c, %r, %x, %X) that produce the same result
as locale::classic():
 * %c is implemented as separate _M_c method
 * %r is implemented as separate _M_r method
 * %x is implemented together with %D, as they provide same behavior,
 * %X is implemented together with %R as _M_R_X, as both of them do not include
   subseconds.

The handling of subseconds was also extracted to the _M_subsecs function that
is used by the _M_S and _M_T specifiers. The _M_T is now implemented in terms of
_M_R_X (printing time without subseconds) and _M_subsecs.
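
The classic-locale equivalences listed above can be checked outside libstdc++
with plain strftime, since the C standard fixes what %c, %r, %x and %X mean in
the "C" locale. The following is only a sketch under that assumption: the
helper names format_tm and sample_tm are hypothetical and are not part of
chrono_io.h, and the program is assumed to run in the default "C" locale (no
setlocale call).

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format a broken-down time with a strftime format such as "%x".
// Assumes the result fits in the local buffer, which holds for the
// short conversions used here.
inline std::string
format_tm(const std::tm& tm, const char* fmt)
{
  char buf[128];
  std::size_t n = std::strftime(buf, sizeof buf, fmt, &tm);
  return std::string(buf, n);
}

// 2025-06-26 14:09:05, a Thursday (day 176 of the year, zero-based).
inline std::tm
sample_tm()
{
  std::tm tm{};
  tm.tm_year = 2025 - 1900;
  tm.tm_mon = 6 - 1;
  tm.tm_mday = 26;
  tm.tm_hour = 14;
  tm.tm_min = 9;
  tm.tm_sec = 5;
  tm.tm_wday = 4;
  tm.tm_yday = 176;
  return tm;
}
```

In the "C" locale, format_tm(sample_tm(), "%x") and format_tm(sample_tm(), "%D")
give the same string, and "%X" gives the "%R" hours and minutes followed by
whole seconds, which is why each pair can plausibly share a helper.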

The __mod responsible for triggering localized formatting was removed from
the methods handling most specifiers, except:
 * _M_S (for %S) for which it determines if subseconds should be included,
 * _M_z (for %z) for which it determines if ':' is used as separator.

PR libstdc++/110739

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
Define.
(__formatter_chrono::_M_locale_fmt): Moved to front of the class.
(__formatter_chrono::_M_format_to): Construct and initialize
struct tm and call _M_locale_fmt if needed.
(__formatter_chrono::_M_c_r_x_X): Split into separate methods.
(__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
(__formatter_chrono::_M_D): Renamed to _M_D_x.
(__formatter_chrono::_M_D_x): Renamed from _M_D.
(__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
(__formatter_chrono::_M_R_X): Extracted from _M_R_T.
(__formatter_chrono::_M_T): Define in terms of _M_R_X and _M_subsecs.
(__formatter_chrono::_M_subsecs): Extracted from _M_S.
(__formatter_chrono::_M_S): Replaced __mod with __subs argument,
removed _M_locale_fmt call, and delegate to _M_subsecs.
(__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
(__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
(__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
__mod argument and call to _M_locale_fmt.

Reviewed-by: Jonathan Wakely 
Signed-off-by: Tomasz Kamiński 
---
Changes in v3:
 - restored missing comment in _M_S
 - increment __out before calling _M_C_y_Y in _M_c

 libstdc++-v3/include/bits/chrono_io.h | 338 +-
 1 file changed, 171 insertions(+), 167 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 8811eaa5b3b..9e21152e398 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -908,6 +908,40 @@ namespace __format
  return __format::__write(std::move(__out), __s);
}
 
+  [[__gnu__::__always_inline__]]
+  static bool
+  _S_localized_spec(_CharT __conv, _CharT __mod)
+  {
+   switch (__conv)
+ {
+ case 'c':
+ case 'r':
+ case 'x':
+ case 'X':
+   return true;
+ case 'z':
+   return false;
+ default:
+   return (bool)__mod;
+ };
+  }
+
+  // Use the formatting locale's std::time_put facet to produce
+  // a locale-specific representation.
+  template
+   _Iter
+   _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm& __tm,
+ char __fmt, char __mod) const
+   {
+ basic_ostringstream<_CharT> __os;
+ __os.imbue(__loc);
+ const auto& __tp = use_facet>(__loc);
+ __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
+ if (__os)
+   __out = _M_write(std::move(__out), __loc, __os.view());
+ return __out;
+   }
+
   template
_Out
_M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
@@ -925,6 +959,36 @@ namespace __format
return std::move(__out);
  };
 
+ struct tm __tm{};
+ bool __use_locale_fmt = false;
+ if (_M_spec._M_localized && _M_spec._M_locale_specific)
+   if (__fc.locale() != locale::classic())
+ {
+   __use_locale_fmt = true;
+
+   __tm.tm_year = (int)__t._M_year - 1900;
+   __tm.tm_yday = __t._M_day_of_year.count();
+   __tm.tm_mon = (unsigned)__t._M_month - 1;
+   __tm.tm_mday = (unsigned)__t._M_day;
+   __tm.tm_wday = __t._M_weekday.c_encoding();
+   __tm.tm_hour = __t._M_hours.count();
+   __tm.tm_min = __t._M_minutes.count();
+   __tm.tm_sec = __t._M_seconds.count();
+
+   // Some locales use %Z in their %c format but we d

Re: [PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kaminski
On Thu, Jun 26, 2025 at 2:13 PM Tomasz Kaminski  wrote:

>
>
> On Thu, Jun 26, 2025 at 2:09 PM Jonathan Wakely 
> wrote:
>
>> On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:
>> >This patch extract calls to _M_locale_fmt and construction of the struct
>> tm,
>> >from the functions dedicated to each specifier, to main format loop in
>> >_M_format_to functions. This removes duplicated code repeated for
>> specifiers.
>>
>> Great, this is exactly what I wanted to do. Removing all the branches
>> to call _M_locale_fmt from each of the _M_xxx member functions makes
>> them smaller and potentially faster.
>>
>> >To allow _M_locale_fmt to only be called if localized formatting is
>> enabled
>> >('L' is present in chrono-format-spec), we provide a implementations for
>> >locale specific specifiers (%c, %r, %x, %X) that produces the same result
>> >as locale::classic():
>> > * %c is implemented as separate _M_c method
>> > * %r is implemented as separate _M_r method
>> > * %x is implemented together with %D, as they provide same behavior,
>> > * %X is implemented together with %R as _M_R_X, as both of them do not
>> include
>> >   subseconds.
>>
>> Nice.
>>
>> >The handling of subseconds was also extracted to _M_subsecs function
>> that is
>> >used by _M_S and _M_T specifier. The _M_T is now implemented in terms of
>> >_M_R_X (printing time without subseconds) and _M_subs.
>> >
>> >The __mod responsible for triggering localized formatting was removed
>> from
>> >method handling most of specifiers, except:
>> > * _M_S (for %S) for which it determines if subseconds should be
>> included,
>> > * _M_z (for %z) for which it determines if ':' is used as separator.
>> >
>> >   PR libstdc++/110739
>> >
>> >libstdc++-v3/ChangeLog:
>> >
>> >   * include/bits/chrono_io.h
>> (__formatter_chrono::_M_use_locale_fmt):
>> >   Define.
>> >   (__formatter_chrono::_M_locale_fmt): Moved to front of the class.
>> >   (__formatter_chrono::_M_format_to): Construct and initialize
>> >   struct tm and call _M_locale_fmt if needed.
>> >   (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
>> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
>> >   (__formatter_chrono::_M_D): Renamed to _M_D_x.
>> >   (__formatter_chrono::_M_D_x): Renamed from _M_D.
>> >   (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
>> >   (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
>> >   (__formatter_chrono::_M_T): Define in terms of _M_R_X and
>> _M_subsecs.
>> >   (__formatter_chrono::_M_subsecs): Extracted from _M_S.
>> >   (__formatter_chrono::_M_S): Replaced __mod with __subs argument,
>> >   removed _M_locale_fmt call, and delegate to _M_subsecs.
>> >   (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
>> >   (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
>> >   (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
>> >   __mod argument and call to _M_locale_fmt.
>> >---
>> > libstdc++-v3/include/bits/chrono_io.h | 340 +-
>> > 1 file changed, 172 insertions(+), 168 deletions(-)
>> >
>> >diff --git a/libstdc++-v3/include/bits/chrono_io.h
>> b/libstdc++-v3/include/bits/chrono_io.h
>> >index 35e95906e6a..d451bde722d 100644
>> >--- a/libstdc++-v3/include/bits/chrono_io.h
>> >+++ b/libstdc++-v3/include/bits/chrono_io.h
>> >@@ -906,6 +906,40 @@ namespace __format
>> > return __format::__write(std::move(__out), __s);
>> >   }
>> >
>> >+  [[__gnu__::__always_inline__]]
>> >+  static bool
>> >+  _S_localized_spec(_CharT __conv, _CharT __mod)
>> >+  {
>> >+  switch (__conv)
>> >+{
>> >+case 'c':
>> >+case 'r':
>> >+case 'x':
>> >+case 'X':
>> >+  return true;
>> >+case 'z':
>> >+  return false;
>> >+default:
>> >+  return (bool)__mod;
>> >+};
>> >+  }
>> >+
>> >+  // Use the formatting locale's std::time_put facet to produce
>> >+  // a locale-specific representation.
>> >+  template
>> >+  _Iter
>> >+  _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm&
>> __tm,
>> >+char __fmt, char __mod) const
>> >+  {
>> >+basic_ostringstream<_CharT> __os;
>> >+__os.imbue(__loc);
>> >+const auto& __tp = use_facet>(__loc);
>> >+__tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
>> >+if (__os)
>> >+  __out = _M_write(std::move(__out), __loc, __os.view());
>> >+return __out;
>> >+  }
>> >+
>> >   template
>> >   _Out
>> >   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
>> >@@ -923,6 +957,36 @@ namespace __format
>> >   return std::move(__out);
>> > };
>> >
>> >+struct tm __tm{};
>> >+bool __use_locale_fmt = false;
>> >+if (_M_spec._M_localized && _M_spec._M_locale_specific)
>> >+  if (__fc.locale() != l

Re: [PATCH V1] RISC-V:Add the MIPS P8700 conditional move extension instruction support.

2025-06-26 Thread Umesh Kalappa
Hi @Jeff Law and all,

Please have a look at the below changes that were suggested and tested.

Thank you
~U

On Fri, Jun 13, 2025 at 8:31 PM Umesh Kalappa 
wrote:

> Addressed the most of comments and tried to refactor the
> riscv_expand_conditional_move() to some extent.
>
> No regressions are found for "runtest --tool gcc
> --target_board='riscv-sim/-mabi=lp64d/-mtune=mips-p8700/-O2 ' riscv.exp"
>
> *config/riscv/riscv-cores.def(RISCV_CORE):Updated the supported
> march.
> *config/riscv/riscv-ext-mips.def(DEFINE_RISCV_EXT):
>  New file added for mips conditional mov extension.
> *config/riscv/riscv-ext.def: Likewise.
> *config/riscv/t-riscv:Generates riscv-ext.opt
> *config/riscv/riscv-ext.opt: Generated file.
> *config/riscv/riscv.cc(riscv_expand_conditional_move):Updated for
> mips cmov
>  and outlined some code that handle arch cond move.
> *config/riscv/riscv.md(movcc):updated expand for MIPS CCMOV.
> *config/riscv/mips-insn.md:New file for mips-p8700 ccmov insn.
> *testsuite/gcc.target/riscv/mipscondmov.c:Test file for mips.ccmov
> insn.
> *gcc/doc/riscv-ext.texi:Updated for mips cmov.
> ---
>  gcc/config/riscv/mips-insn.md|  37 ++
>  gcc/config/riscv/riscv-cores.def |   3 +-
>  gcc/config/riscv/riscv-ext-mips.def  |  35 +
>  gcc/config/riscv/riscv-ext.def   |   1 +
>  gcc/config/riscv/riscv-ext.opt   |   4 +
>  gcc/config/riscv/riscv.cc| 131 ---
>  gcc/config/riscv/riscv.md|   3 +-
>  gcc/config/riscv/t-riscv |   3 +-
>  gcc/doc/riscv-ext.texi   |   4 +
>  gcc/testsuite/gcc.target/riscv/mipscondmov.c |  30 +
>  10 files changed, 202 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/config/riscv/mips-insn.md
>  create mode 100644 gcc/config/riscv/riscv-ext-mips.def
>  create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c
>
> diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
> new file mode 100644
> index 000..e36b7d78796
> --- /dev/null
> +++ b/gcc/config/riscv/mips-insn.md
> @@ -0,0 +1,37 @@
> +;; Machine description for MIPS custom instructions.
> +;; Copyright (C) 2025 Free Software Foundation, Inc.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; .
> +
> +(define_insn "*movcc_bitmanip"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> +   (if_then_else:GPR
> +(match_operator 5 "equality_operator"
> +   [(match_operand:X 1 "register_operand" "r")
> +(match_operand:X 2 "const_0_operand" "J")])
> +(match_operand:GPR 3 "reg_or_0_operand" "rJ")
> +(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
> +  "TARGET_XMIPSCMOV"
> +{
> +  enum rtx_code code = GET_CODE (operands[5]);
> +  if (code == NE)
> +return "mips.ccmov\t%0,%1,%z3,%z4";
> +  else
> +return "mips.ccmov\t%0,%1,%z4,%z3";
> +}
> +  [(set_attr "type" "condmove")
> +   (set_attr "mode" "")])
> diff --git a/gcc/config/riscv/riscv-cores.def
> b/gcc/config/riscv/riscv-cores.def
> index cff7c77a0bd..111ee02260e 100644
> --- a/gcc/config/riscv/riscv-cores.def
> +++ b/gcc/config/riscv/riscv-cores.def
> @@ -168,7 +168,6 @@ RISCV_CORE("xiangshan-kunminghu",
>  "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
>   "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
>   "xiangshan-kunminghu")
>
> -RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
> - "zaamo_zalrsc_zba_zbb",
> +RISCV_CORE("mips-p8700",  "rv64imafd_zaamo_zalrsc_zba_zbb",
>   "mips-p8700")
>  #undef RISCV_CORE
> diff --git a/gcc/config/riscv/riscv-ext-mips.def
> b/gcc/config/riscv/riscv-ext-mips.def
> new file mode 100644
> index 000..f24507139f6
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-ext-mips.def
> @@ -0,0 +1,35 @@
> +/* MIPS extension definition file for RISC-V.
> +   Copyright (C) 2025 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your opti

Re: [PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kaminski
On Thu, Jun 26, 2025 at 2:09 PM Jonathan Wakely  wrote:

> On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:
> >This patch extract calls to _M_locale_fmt and construction of the struct
> tm,
> >from the functions dedicated to each specifier, to main format loop in
> >_M_format_to functions. This removes duplicated code repeated for
> specifiers.
>
> Great, this is exactly what I wanted to do. Removing all the branches
> to call _M_locale_fmt from each of the _M_xxx member functions makes
> them smaller and potentially faster.
>
> >To allow _M_locale_fmt to only be called if localized formatting is
> enabled
> >('L' is present in chrono-format-spec), we provide a implementations for
> >locale specific specifiers (%c, %r, %x, %X) that produces the same result
> >as locale::classic():
> > * %c is implemented as separate _M_c method
> > * %r is implemented as separate _M_r method
> > * %x is implemented together with %D, as they provide same behavior,
> > * %X is implemented together with %R as _M_R_X, as both of them do not
> include
> >   subseconds.
>
> Nice.
>
> >The handling of subseconds was also extracted to _M_subsecs function that
> is
> >used by _M_S and _M_T specifier. The _M_T is now implemented in terms of
> >_M_R_X (printing time without subseconds) and _M_subs.
> >
> >The __mod responsible for triggering localized formatting was removed from
> >method handling most of specifiers, except:
> > * _M_S (for %S) for which it determines if subseconds should be included,
> > * _M_z (for %z) for which it determines if ':' is used as separator.
> >
> >   PR libstdc++/110739
> >
> >libstdc++-v3/ChangeLog:
> >
> >   * include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
> >   Define.
> >   (__formatter_chrono::_M_locale_fmt): Moved to front of the class.
> >   (__formatter_chrono::_M_format_to): Construct and initialize
> >   struct tm and call _M_locale_fmt if needed.
> >   (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
> >   (__formatter_chrono::_M_D): Renamed to _M_D_x.
> >   (__formatter_chrono::_M_D_x): Renamed from _M_D.
> >   (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
> >   (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
> >   (__formatter_chrono::_M_T): Define in terms of _M_R_X and
> _M_subsecs.
> >   (__formatter_chrono::_M_subsecs): Extracted from _M_S.
> >   (__formatter_chrono::_M_S): Replaced __mod with __subs argument,
> >   removed _M_locale_fmt call, and delegate to _M_subsecs.
> >   (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
> >   (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
> >   (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
> >   __mod argument and call to _M_locale_fmt.
> >---
> > libstdc++-v3/include/bits/chrono_io.h | 340 +-
> > 1 file changed, 172 insertions(+), 168 deletions(-)
> >
> >diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> >index 35e95906e6a..d451bde722d 100644
> >--- a/libstdc++-v3/include/bits/chrono_io.h
> >+++ b/libstdc++-v3/include/bits/chrono_io.h
> >@@ -906,6 +906,40 @@ namespace __format
> > return __format::__write(std::move(__out), __s);
> >   }
> >
> >+  [[__gnu__::__always_inline__]]
> >+  static bool
> >+  _S_localized_spec(_CharT __conv, _CharT __mod)
> >+  {
> >+  switch (__conv)
> >+{
> >+case 'c':
> >+case 'r':
> >+case 'x':
> >+case 'X':
> >+  return true;
> >+case 'z':
> >+  return false;
> >+default:
> >+  return (bool)__mod;
> >+};
> >+  }
> >+
> >+  // Use the formatting locale's std::time_put facet to produce
> >+  // a locale-specific representation.
> >+  template
> >+  _Iter
> >+  _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm&
> __tm,
> >+char __fmt, char __mod) const
> >+  {
> >+basic_ostringstream<_CharT> __os;
> >+__os.imbue(__loc);
> >+const auto& __tp = use_facet>(__loc);
> >+__tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
> >+if (__os)
> >+  __out = _M_write(std::move(__out), __loc, __os.view());
> >+return __out;
> >+  }
> >+
> >   template
> >   _Out
> >   _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
> >@@ -923,6 +957,36 @@ namespace __format
> >   return std::move(__out);
> > };
> >
> >+struct tm __tm{};
> >+bool __use_locale_fmt = false;
> >+if (_M_spec._M_localized && _M_spec._M_locale_specific)
> >+  if (__fc.locale() != locale::classic())
> >+{
> >+  __use_locale_fmt = true;
> >+
> >+  __tm.tm_year = (int)__t._M_year - 1900;
> >+  __tm.tm_yday = __t._M_day_of_year

Re: [PATCH v2 1/2] libstdc++: Type-erase chrono-data for formatting [PR110739]

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 13:52, Tomasz Kaminski  wrote:
>
>
>
> On Thu, Jun 26, 2025 at 2:26 PM Jonathan Wakely  wrote:
>>
>> On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:
>> >This patch reworks the formatting for the chrono types, such that they are 
>> >all
>> >formatted in terms of _ChronoData class, that includes all required fields.
>> >Populating each required field is performed in formatter for specific type,
>> >based on the chrono-spec used.
>> >
>> >To facilitate above, the _ChronoSpec now includes additional _M_needed 
>> >field,
>> >that represnts the chrono data that is referenced by format spec (this value
>> >is also configured for __defSpec). This value differs from the value of
>> >__parts passed to _M_parse, which does include all fields that can be 
>> >computed
>> >from input (e.g. weekday_indexed can be computed for year_month_day). Later
>> >it is used to fill _ChronoData, in particular _M_fill_* family of functions,
>> >to determine if given field needs to be set, and thus it's value needs to be
>>
>> "its"
>>
>> >computed.
>> >
>> >In consequence _ChronoParts enum was exteneded with additional values,
>>
>> "extended"
>>
>> >that allows more fine grained indentification:
>>
>> "identification"
>>
>> > * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
>> > * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
>> > * _LocalDays, _WeekdayIndex are defiend in included in _Date,
>>
>> "defined"
>>
>> > * _Duration is removed, and instead _EpochUnits and _UnitSuffix are
>> >   introduced.
>> >Furthermore, to avoid name conflicts _ChonoParts is now defined as enum 
>> >class,
>> >with additional operators that simplify uses.
>>
>> I don't love overloading operator- to mean clearing bits, but it does
>> make clearing the bits very convenient. Maybe just add a comment
>> before operator-(_ChronoParts x, _ChronoParts y) saying that it
>> returns a copy of x with all bits from y unset.
>
> I have added a comment. I actually think operator- makes the filling code
> pretty readable, and intuitive. We have set some files, and remove it from
> parts.

Yes, it definitely makes the code more readable. I'm just generally
cautious about overloading operators to give them meanings which are
different from their conventional meanings. Here we have operator-
which does not produce the same result as (int)x - (int)y, but this is
not a user-facing type so it's OK.



>>
>> That comment will be
>> know that's what the function body


Sorry, I'm not sure what happened to the sentence above! What I was
trying to say is that a comment will make it easier to see at a glance
that operator- unsets bits, without having to parse the function body.


>>
>> (Which is x&(x^y) I think, right?)
>
> I think both x&(x^y) and x & ~y gives the same result. I prefer the later.

I agree.
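
For readers following the operator- discussion, here is a minimal sketch of a
bit-clearing operator- on a scoped enum. The enum Parts and its enumerators
are hypothetical stand-ins and are not the names used in chrono_io.h. It also
checks that x & (x ^ y) and x & ~y agree, and that the result differs from
arithmetic subtraction.

```cpp
#include <cassert>

// Hypothetical bitmask enum; the enumerators are illustrative only.
enum class Parts : unsigned
{
  None = 0,
  Year = 1u << 0,
  Month = 1u << 1,
  Day = 1u << 2,
  Date = Year | Month | Day
};

// Returns a copy of x with every bit that is set in y cleared.
// This is set difference, not arithmetic subtraction.
constexpr Parts
operator-(Parts x, Parts y)
{
  return static_cast<Parts>(static_cast<unsigned>(x)
			    & ~static_cast<unsigned>(y));
}

// The alternative spelling mentioned in the thread: x & (x ^ y).
constexpr Parts
clear_bits_xor(Parts x, Parts y)
{
  return static_cast<Parts>(static_cast<unsigned>(x)
			    & (static_cast<unsigned>(x)
			       ^ static_cast<unsigned>(y)));
}

// Date without Month leaves the Year and Day bits (1 | 4 == 5).
static_assert(Parts::Date - Parts::Month == static_cast<Parts>(5), "");
// Clearing a bit that is not set is a no-op, whereas (int)1 - (int)2 is -1.
static_assert(Parts::Year - Parts::Month == Parts::Year, "");
```

The x & ~y form needs one operation fewer and reads directly as "x without
the bits of y".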



Re: [PATCH v2 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Tomasz Kaminski
On Thu, Jun 26, 2025 at 2:52 PM Jonathan Wakely  wrote:

> On Thu, 26 Jun 2025 at 13:30, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Thu, Jun 26, 2025 at 2:13 PM Tomasz Kaminski 
> wrote:
> >>
> >>
> >>
> >> On Thu, Jun 26, 2025 at 2:09 PM Jonathan Wakely 
> wrote:
> >>>
> >>> On 26/06/25 11:39 +0200, Tomasz Kamiński wrote:
> >>> >This patch extract calls to _M_locale_fmt and construction of the
> struct tm,
> >>> >from the functions dedicated to each specifier, to main format loop in
> >>> >_M_format_to functions. This removes duplicated code repeated for
> specifiers.
> >>>
> >>> Great, this is exactly what I wanted to do. Removing all the branches
> >>> to call _M_locale_fmt from each of the _M_xxx member functions makes
> >>> them smaller and potentially faster.
> >>>
> >>> >To allow _M_locale_fmt to only be called if localized formatting is
> enabled
> >>> >('L' is present in chrono-format-spec), we provide a implementations
> for
> >>> >locale specific specifiers (%c, %r, %x, %X) that produces the same
> result
> >>> >as locale::classic():
> >>> > * %c is implemented as separate _M_c method
> >>> > * %r is implemented as separate _M_r method
> >>> > * %x is implemented together with %D, as they provide same behavior,
> >>> > * %X is implemented together with %R as _M_R_X, as both of them do
> not include
> >>> >   subseconds.
> >>>
> >>> Nice.
> >>>
> >>> >The handling of subseconds was also extracted to _M_subsecs function
> that is
> >>> >used by _M_S and _M_T specifier. The _M_T is now implemented in terms
> of
> >>> >_M_R_X (printing time without subseconds) and _M_subs.
> >>> >
> >>> >The __mod responsible for triggering localized formatting was removed
> from
> >>> >method handling most of specifiers, except:
> >>> > * _M_S (for %S) for which it determines if subseconds should be
> included,
> >>> > * _M_z (for %z) for which it determines if ':' is used as separator.
> >>> >
> >>> >   PR libstdc++/110739
> >>> >
> >>> >libstdc++-v3/ChangeLog:
> >>> >
> >>> >   * include/bits/chrono_io.h
> (__formatter_chrono::_M_use_locale_fmt):
> >>> >   Define.
> >>> >   (__formatter_chrono::_M_locale_fmt): Moved to front of the
> class.
> >>> >   (__formatter_chrono::_M_format_to): Construct and initialize
> >>> >   struct tm and call _M_locale_fmt if needed.
> >>> >   (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
> >>> >   (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
> >>> >   (__formatter_chrono::_M_D): Renamed to _M_D_x.
> >>> >   (__formatter_chrono::_M_D_x): Renamed from _M_D.
> >>> >   (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
> >>> >   (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
> >>> >   (__formatter_chrono::_M_T): Define in terms of _M_R_X and
> _M_subsecs.
> >>> >   (__formatter_chrono::_M_subsecs): Extracted from _M_S.
> >>> >   (__formatter_chrono::_M_S): Replaced __mod with __subs
> argument,
> >>> >   removed _M_locale_fmt call, and delegate to _M_subsecs.
> >>> >   (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
> >>> >   (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
> >>> >   (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W):
> Remove
> >>> >   __mod argument and call to _M_locale_fmt.
> >>> >---
> >>> > libstdc++-v3/include/bits/chrono_io.h | 340
> +-
> >>> > 1 file changed, 172 insertions(+), 168 deletions(-)
> >>> >
> >>> >diff --git a/libstdc++-v3/include/bits/chrono_io.h
> b/libstdc++-v3/include/bits/chrono_io.h
> >>> >index 35e95906e6a..d451bde722d 100644
> >>> >--- a/libstdc++-v3/include/bits/chrono_io.h
> >>> >+++ b/libstdc++-v3/include/bits/chrono_io.h
> >>> >@@ -906,6 +906,40 @@ namespace __format
> >>> > return __format::__write(std::move(__out), __s);
> >>> >   }
> >>> >
> >>> >+  [[__gnu__::__always_inline__]]
> >>> >+  static bool
> >>> >+  _S_localized_spec(_CharT __conv, _CharT __mod)
> >>> >+  {
> >>> >+  switch (__conv)
> >>> >+{
> >>> >+case 'c':
> >>> >+case 'r':
> >>> >+case 'x':
> >>> >+case 'X':
> >>> >+  return true;
> >>> >+case 'z':
> >>> >+  return false;
> >>> >+default:
> >>> >+  return (bool)__mod;
> >>> >+};
> >>> >+  }
> >>> >+
> >>> >+  // Use the formatting locale's std::time_put facet to produce
> >>> >+  // a locale-specific representation.
> >>> >+  template
> >>> >+  _Iter
> >>> >+  _M_locale_fmt(_Iter __out, const locale& __loc, const struct
> tm& __tm,
> >>> >+char __fmt, char __mod) const
> >>> >+  {
> >>> >+basic_ostringstream<_CharT> __os;
> >>> >+__os.imbue(__loc);
> >>> >+const auto& __tp = use_facet>(__loc);
> >>> >+__tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
> >>> >+if (__os)
> >>> >+  __out = _M_write(std::move(__out), __loc, __os.view()

Re: [PATCH 10/17] rust: Silence a clang warning in borrow-checker-diagnostics

2025-06-26 Thread Martin Jambor
On Wed, Jun 25 2025, Martin Jambor wrote:
> Hi,
>
> when compiling
> gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
> with clang, it emits the following warning:
>
>   gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc:145:46: 
> warning: non-constant-expression cannot be narrowed from type 
> 'Polonius::Loan' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') in 
> initializer list [-Wc++11-narrowing]
>
> I'd hope that for indexing that is never really a problem,
> nevertheless if narrowing is taking place, I guess it can be argued it
> should be made explicit.
>
> I have so far only tested this with the clang compile, I will try to
> do a bootstrap with rust-enabled too.
>
> Philip, Pierre, would you be willing to incorporate this into your
> tree and commit it to master at gcc.gnu.org from there?  Or should I
> commit it to master at gcc.gnu.org and you'll merge it from there?

This has been approved on Zulip and so I have pushed this as
1e69c565589.

Thanks,

Martin


>
> Thanks,
>
> Martin
>
>
> gcc/rust/ChangeLog:
>
> 2025-06-23  Martin Jambor  
>
>   * checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
>   (BorrowCheckerDiagnostics::get_loan): Type cast loan to uint32_t.
> ---
>  .../checks/errors/borrowck/rust-borrow-checker-diagnostics.cc   | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git 
> a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc 
> b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
> index 6c67706780b..adf1448791e 100644
> --- a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
> +++ b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
> @@ -142,7 +142,7 @@ BorrowCheckerDiagnostics::get_statement (Polonius::Point 
> point)
>  const BIR::Loan &
>  BorrowCheckerDiagnostics::get_loan (Polonius::Loan loan)
>  {
> -  return bir_function.place_db.get_loans ()[{loan}];
> +  return bir_function.place_db.get_loans ()[{(uint32_t) loan}];
>  }
>  
>  const HIR::LifetimeParam *
> -- 
> 2.49.0
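
For readers unfamiliar with the diagnostic, the narrowing rule behind the
clang warning can be reproduced in isolation.  This is only a minimal sketch:
`LoanIndex` and `make_index` are hypothetical names, not the types used in the
Rust front end, and it assumes a target where `unsigned long` is wider than
32 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a strongly typed index wrapping a 32-bit value.
struct LoanIndex
{
  std::uint32_t value;
};

LoanIndex
make_index (unsigned long loan)
{
  // LoanIndex bad{loan};  // narrowing of a non-constant in list-initialization
  // An explicit conversion states the intent and silences the diagnostic.
  return LoanIndex{static_cast<std::uint32_t> (loan)};
}
```

Calling `make_index (3UL)` yields an index whose `value` is 3; values above
the 32-bit range would be truncated, which is why the conversion is worth
making visible.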


Re: [PATCH] c++, libstdc++, v2: Implement C++26 P2830R10 - Constexpr Type Ordering

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 11:33, Jakub Jelinek  wrote:
>
> On Wed, Jun 25, 2025 at 10:58:59PM +0200, Maciej Cencora wrote:
> > update of std module is missing.
>
> Here is an updated patch which adds the std module part and while I was
> changing the patch, I've also added value_type/type and the 2 operators
> to std::type_order.
>
> Interdiff from the last patch is:
> --- libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> +++ libstdc++-v3/libsupc++/compare  2025-06-25 16:18:25.221710493 +0200
> @@ -1271,6 +1271,10 @@
>  struct type_order
>  {
>static constexpr strong_ordering value = __builtin_type_order(_Tp, 
> _Up);
> +  using value_type = strong_ordering;
> +  using type = type_order<_Tp, _Up>;
> +  constexpr operator value_type() const noexcept { return value; }
> +  constexpr value_type operator()() const noexcept { return value; }
>  };
>
>/// @ingroup variable_templates
> --- libstdc++-v3/src/c++23/std.cc.in.jj 2025-06-12 15:50:51.400821105 +0200
> +++ libstdc++-v3/src/c++23/std.cc.in2025-06-26 07:37:06.90208 +0200
> @@ -888,6 +888,10 @@ export namespace std
>using std::partial_order;
>using std::strong_order;
>using std::weak_order;
> +#if __glibcxx_type_order >= 202506L
> +  using std::type_order;
> +  using std::type_order_v;
> +#endif
>  }
>
>  // 28.4 
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Though, now that I look at it again, perhaps both
> #if __glibcxx_type_order >= 202506L
> in the patch should have been
> #if __cpp_lib_type_order >= 202506L
>
> Can change that.

I would leave it. Testing the internal __glibcxx_foo macro is always
correct. Testing the standard __cpp_lib_foo macro is only correct in
the main header that defines the __glibcxx_want_foo macro. In this
case both are defined, because it's in the same header as the "want"
macro, but if we decide we need std::type_order to be available in
other headers and move it to some  then we'd need
to change it to test __glibcxx_type_order instead. Using the internal
macro is a bit more robust.
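
The distinction only matters inside libstdc++ itself.  As a hedged sketch of
the user-facing side, code outside the library would test the standard
`__cpp_lib_type_order` macro, for example after including `<version>`.  The
`have_type_order` variable is a name made up for this illustration.

```cpp
#include <compare>
#include <version>

// User code tests the standard macro; the __glibcxx_* macros are internal.
#if defined(__cpp_lib_type_order) && __cpp_lib_type_order >= 202506L
constexpr bool have_type_order = true;
// A type always compares equal to itself under std::type_order.
static_assert (std::type_order_v<int, int> == std::strong_ordering::equal);
#else
constexpr bool have_type_order = false;
#endif
```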

>
> 2025-06-26  Jakub Jelinek  
>
> gcc/cp/
> * cp-trait.def: Implement C++26 P2830R10 - Constexpr Type Ordering.
> (TYPE_ORDER): New.
> * method.cc (type_order_value): Define.
> * cp-tree.h (type_order_value): Declare.
> * semantics.cc (trait_expr_value): Use gcc_unreachable also
> for CPTK_TYPE_ORDER, adjust comment.
> (finish_trait_expr): Handle CPTK_TYPE_ORDER.
> * constraint.cc (diagnose_trait_expr): Likewise.
> gcc/testsuite/
> * g++.dg/cpp26/type-order1.C: New test.
> * g++.dg/cpp26/type-order2.C: New test.
> * g++.dg/cpp26/type-order3.C: New test.
> libstdc++-v3/
> * include/bits/version.def (type_order): New.
> * include/bits/version.h: Regenerate.
> * libsupc++/compare: Define __glibcxx_want_type_order before
> including bits/version.h.
> (std::type_order, std::type_order_v): New trait and template variable.
> * src/c++23/std.cc.in (std::type_order, std::type_order_v): Export.
> * testsuite/18_support/comparisons/type_order/1.cc: New test.
>
> --- gcc/cp/method.cc.jj 2025-06-25 16:04:51.611158952 +0200
> +++ gcc/cp/method.cc2025-06-25 16:09:32.017556551 +0200
> @@ -3951,5 +3951,26 @@ num_artificial_parms_for (const_tree fn)
>return count;
>  }
>
> +/* Return value of the __builtin_type_order trait.  */
> +
> +tree
> +type_order_value (tree type1, tree type2)
> +{
> +  tree rettype = lookup_comparison_category (cc_strong_ordering);
> +  if (rettype == error_mark_node)
> +return rettype;
> +  int ret;
> +  if (type1 == type2)
> +ret = 0;
> +  else
> +{
> +  const char *name1 = ASTRDUP (mangle_type_string (type1));
> +  const char *name2 = mangle_type_string (type2);
> +  ret = strcmp (name1, name2);
> +}
> +  return lookup_comparison_result (cc_strong_ordering, rettype,
> +  ret == 0 ? 0 : ret > 0 ? 1 : 2);
> +}
> +
>
>  #include "gt-cp-method.h"
> --- gcc/cp/cp-tree.h.jj 2025-06-25 16:04:51.610158965 +0200
> +++ gcc/cp/cp-tree.h2025-06-25 16:09:32.019556525 +0200
> @@ -7557,6 +7557,8 @@ extern bool ctor_omit_inherited_parms (
>  extern tree locate_ctor(tree);
>  extern tree implicitly_declare_fn   (special_function_kind, tree,
>  bool, tree, tree);
> +extern tree type_order_value   (tree, tree);
> +
>  /* In module.cc  */
>  class module_state; /* Forward declare.  */
>  inline bool modules_p () { return flag_modules != 0; }
> --- gcc/cp/semantics.cc.jj  2025-06-25 16:04:51.633158669 +0200
> +++ gcc/cp/semantics.cc 2025-06-25 16:09:32.021556500 +0200
> @@ -13593,8 +13593,10 @@ trait_expr_value (cp_trait_kind kind, tr
>  case CPTK_IS_DEDUCIBLE:
>return type_targs_deducible_from
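
The quoted type_order_value hunk compares the mangled names of the two types
with strcmp and collapses the result to one of three outcomes.  A toy model of
that idea, assuming the caller already has the two mangled-name strings, could
look like the following.  `order_by_mangled_name` is a hypothetical helper, and
the plain int result (-1, 0, 1) merely stands in for less, equal and greater;
how the front end maps its indices to enumerators is not visible in the hunk.

```cpp
#include <cstring>

// Toy model: order two types by comparing their mangled-name strings.
// Returns -1, 0 or 1 as a stand-in for less, equal and greater.
int
order_by_mangled_name (const char *name1, const char *name2)
{
  int ret = std::strcmp (name1, name2);
  if (ret == 0)
    return 0;
  return ret > 0 ? 1 : -1;
}
```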

Re: [PATCH] mklog.py: Add main function

2025-06-26 Thread Alex Coplan
On 21/06/2025 12:35, Filip Kastl wrote:
> On Fri 2025-06-20 10:46:08, Alex Coplan wrote:
> > Hi,
> > 
> > This adds a main() function to mklog.py (like e.g. check_GNU_style.py
> > has), which makes it easier to import and invoke from another python
> > script.  This is useful when using a wrapper script to set up the python
> > environment.
> > 
> > Smoke tested by using the modified mklog.py to generate the ChangeLog
> > for this patch.
> > 
> > OK to install?
> 
> It is a small change and doing this is considered good practice for Python
> scripts anyway AFAIK.  So LGTM.  I'm not a maintainer though.

Thanks, I've now pushed this as obvious (in hindsight I probably could
have done that straight away).

Pushed as g:ca8ea1d23e8b6798b6eb8c018957b25aa6f0db95.

Alex

> 
> Filip Kastl
> 
> > 
> > Thanks,
> > Alex
> > 
> > contrib/ChangeLog:
> > 
> > * mklog.py: Add main() function.
> 
> > diff --git a/contrib/mklog.py b/contrib/mklog.py
> > index dcf7dde6333..26d4156b034 100755
> > --- a/contrib/mklog.py
> > +++ b/contrib/mklog.py
> > @@ -360,7 +360,7 @@ def skip_line_in_changelog(line):
> >  return FIRST_LINE_OF_END_RE.match(line) is None
> >  
> >  
> > -if __name__ == '__main__':
> > +def main():
> >  extra_args = os.getenv('GCC_MKLOG_ARGS')
> >  if extra_args:
> >  sys.argv += json.loads(extra_args)
> > @@ -447,3 +447,6 @@ if __name__ == '__main__':
> >  f.write('\n'.join(end))
> >  else:
> >  print(output, end='')
> > +
> > +if __name__ == '__main__':
> > +main()
> 


[PATCH] RISC-V: Vector-scalar negate-multiply-(subtract-)accumulate [PR119100]

2025-06-26 Thread Paul-Antoine Arras
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a (possibly negated) minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f    v6,fa0
  vfnmacc.vv  v2,v6,v4

After, we get only one:
  vfnmacc.vf  v2,fa0,v4
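
As an illustration of the kind of source loop involved, here is a scalar
reference for the vfnmacc semantics, vd[i] = -(f * vs2[i]) - vd[i].  This is
a sketch only; the function name `nmacc` is made up here and is not taken from
the testsuite.

```cpp
#include <cstddef>

// Scalar reference for vfnmacc.vf: acc[i] = -(f * b[i]) - acc[i].
// The scalar f is what would otherwise be broadcast with vfmv.v.f.
void
nmacc (float *acc, const float *b, float f, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    acc[i] = -(f * b[i]) - acc[i];
}
```

The vfnmsac variant differs only in the sign of the accumulator:
acc[i] = -(f * b[i]) + acc[i].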

PR target/119100

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vfnmsub_,*vfnmadd_): Handle
both add and acc variants.
* config/riscv/vector.md (*pred_mul_neg__scalar_undef): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfnmacc and
vfnmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h (DEF_VF_MULOP_CASE_1):
Fix return type.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f64.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 30 ---
 gcc/config/riscv/vector.md| 38 ++-
 .../riscv/rvv/autovec/vx_vf/vf-1-f16.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-1-f32.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-1-f64.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-2-f16.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf-2-f32.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf-2-f64.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf-3-f16.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-3-f32.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-3-f64.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-4-f16.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf-4-f32.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf-4-f64.c|  2 +
 .../riscv/rvv/autovec/vx_vf/vf_mulop.h|  5 ++-
 .../rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c  | 16 
 .../rvv/autovec/vx_vf/vf_vfnmacc-run-1-f32.c  | 16 
 .../rvv/autovec/vx_vf/vf_vfnmacc-run-1-f64.c  | 16 
 .../rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c  | 16 
 .../rvv/autovec/vx_vf/vf_vfnmsac-run-1-f32.c  | 16 
 .../rvv/autovec/vx_vf/vf_vfnmsac-run-1-f64.c  | 16 
 21 files changed, 187 insertions(+), 18 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f64.c

diff --git gcc/config/riscv/autovec-opt.md gcc/config/riscv/autovec-opt.md
index bb15d14b4..8df7f6494 100644
--- gcc/config/riscv/autovec-opt.md
+++ gcc/config/riscv/autovec-opt.md
@@ -1723,6 +1723,8 @@ (define_insn_and_split "*_vx_"
 ;; - vfnmsub.vf
 ;; - vfmacc.vf
 ;; - vfmsac.vf
+;; - vfnmacc.vf
+;; - vfnmsac.vf
 ;; 
=
 
 ;; vfmadd.vf, vfmsub.vf, vfmacc.vf, vfmsac.vf
@@ -1748,22 +1750,22 @@ (define_insn_and_split "*_vf_"
   [(set_attr "type" "vfmuladd")]
 )
 
-;; vfnmsub.vf
+;; vfnmsub.vf, vfnmsac.vf
 (define_insn_and_split "*vfnmsub_"
-  [(set (match_operand:V_VLSF 0 "register_operand" "=vd")
+  [(set (match_operand:V_VLSF 0 "register_operand")
 (minus:V_VLSF
-   (match_operand:V_VLSF 3 "register_operand"  " vr")
-   (mult:V_VLSF
- (vec_duplicate:V_VLSF
-   (match_operand: 1 "register_operand"   "  f"))
- (match_operand:V_VLSF 2 "register_operand""  0"]
+  (match_operand:V_VLSF 3 "register_operand")
+  (mult:V_VLSF
+   (vec_duplicate:V_VLSF
+ (match_operand: 1 "register_operand"))

Re: [PATCH v3 2/2] libstdc++: Lift chrono localized formatting to main chrono format loop [PR110739]

2025-06-26 Thread Jonathan Wakely
On Thu, 26 Jun 2025 at 14:19, Tomasz Kamiński  wrote:
>
> This patch extracts the calls to _M_locale_fmt and the construction of the
> struct tm from the functions dedicated to each specifier into the main format
> loop in _M_format_to. This removes code that was duplicated across specifiers.
>
> To allow _M_locale_fmt to be called only if localized formatting is enabled
> ('L' is present in chrono-format-spec), we provide implementations for the
> locale-specific specifiers (%c, %r, %x, %X) that produce the same result
> as locale::classic():
>  * %c is implemented as separate _M_c method
>  * %r is implemented as separate _M_r method
>  * %x is implemented together with %D, as they provide the same behavior,
>  * %X is implemented together with %R as _M_R_X, as neither of them
>    includes subseconds.
>
> The handling of subseconds was also extracted to the _M_subsecs function,
> which is used by _M_S and _M_T. _M_T is now implemented in terms of
> _M_R_X (printing time without subseconds) and _M_subsecs.
>
> The __mod argument responsible for triggering localized formatting was
> removed from the methods handling most specifiers, except:
>  * _M_S (for %S), for which it determines if subseconds should be included,
>  * _M_z (for %z), for which it determines if ':' is used as a separator.
> PR libstdc++/110739
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__formatter_chrono::_M_use_locale_fmt):
> Define.
> (__formatter_chrono::_M_locale_fmt): Moved to front of the class.
> (__formatter_chrono::_M_format_to): Construct and initialize
> struct tm and call _M_locale_fmt if needed.
> (__formatter_chrono::_M_c_r_x_X): Split into separate methods.
> (__formatter_chrono::_M_c, __formatter_chrono::_M_r): Define.
> (__formatter_chrono::_M_D): Renamed to _M_D_x.
> (__formatter_chrono::_M_D_x): Renamed from _M_D.
> (__formatter_chrono::_M_R_T): Split into _M_R_X and _M_T.
> (__formatter_chrono::_M_R_X): Extracted from _M_R_T.
> (__formatter_chrono::_M_T): Define in terms of _M_R_X and _M_subsecs.
> (__formatter_chrono::_M_subsecs): Extracted from _M_S.
> (__formatter_chrono::_M_S): Replaced __mod with __subs argument,
> removed _M_locale_fmt call, and delegate to _M_subsecs.
> (__formatter_chrono::_M_C_y_Y, __formatter_chrono::_M_d_e)
> (__formatter_chrono::_M_H_I, __formatter_chrono::_M_m)
> (__formatter_chrono::_M_u_w, __formatter_chrono::_M_U_V_W): Remove
> __mod argument and call to _M_locale_fmt.
>
> Reviewed-by: Jonathan Wakely 
> Signed-off-by: Tomasz Kamiński 
> ---
> Changes in v3:
>  - restored missing comment in _M_S
>  - increment __out before calling _M_C_y_Y in _M_c

OK for trunk, thanks
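
For readers following the thread, the decision made by _S_localized_spec in
the quoted hunk can be sketched as a standalone function.  This is a
simplified model restricted to plain char; `is_localized_spec` is a
hypothetical name, and a modifier of '\0' stands for "no E or O modifier".

```cpp
// Simplified model of the predicate in the quoted patch: does this
// conversion specifier (with optional modifier) depend on the locale?
bool
is_localized_spec (char conv, char mod)
{
  switch (conv)
    {
    case 'c':
    case 'r':
    case 'x':
    case 'X':
      return true;        // always locale-dependent
    case 'z':
      return false;       // the modifier only selects the ':' separator
    default:
      return mod != '\0'; // locale-dependent only with a modifier
    }
}
```

With this model, `is_localized_spec ('Y', '\0')` is false while
`is_localized_spec ('Y', 'E')` is true, matching the `(bool)__mod` fallback in
the patch.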


>
>  libstdc++-v3/include/bits/chrono_io.h | 338 +-
>  1 file changed, 171 insertions(+), 167 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h 
> b/libstdc++-v3/include/bits/chrono_io.h
> index 8811eaa5b3b..9e21152e398 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -908,6 +908,40 @@ namespace __format
>   return __format::__write(std::move(__out), __s);
> }
>
> +  [[__gnu__::__always_inline__]]
> +  static bool
> +  _S_localized_spec(_CharT __conv, _CharT __mod)
> +  {
> +   switch (__conv)
> + {
> + case 'c':
> + case 'r':
> + case 'x':
> + case 'X':
> +   return true;
> + case 'z':
> +   return false;
> + default:
> +   return (bool)__mod;
> + };
> +  }
> +
> +  // Use the formatting locale's std::time_put facet to produce
> +  // a locale-specific representation.
> +  template
> +   _Iter
> +   _M_locale_fmt(_Iter __out, const locale& __loc, const struct tm& __tm,
> + char __fmt, char __mod) const
> +   {
> + basic_ostringstream<_CharT> __os;
> + __os.imbue(__loc);
> + const auto& __tp = use_facet>(__loc);
> + __tp.put(__os, __os, _S_space, &__tm, __fmt, __mod);
> + if (__os)
> +   __out = _M_write(std::move(__out), __loc, __os.view());
> + return __out;
> +   }
> +
>template
> _Out
> _M_format_to(const _ChronoData<_CharT>& __t, _Out __out,
> @@ -925,6 +959,36 @@ namespace __format
> return std::move(__out);
>   };
>
> + struct tm __tm{};
> + bool __use_locale_fmt = false;
> + if (_M_spec._M_localized && _M_spec._M_locale_specific)
> +   if (__fc.locale() != locale::classic())
> + {
> +   __use_locale_fmt = true;
> +
> +   __tm.tm_year = (int)__t._M_year - 1900;
> +   __tm.tm_yday = __t._M_day_of_year.count();
> +   __tm.tm_mon = (unsigned)__t._M_month - 1;
> +   __tm.tm_mday = (unsi

[PATCH] fold-mem-offsets: Convert from DF to RTL-SSA

2025-06-26 Thread Christoph Müllner
This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked.  Instead of visiting each instruction multiple
times, this is now done only once.

Most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* Single iteration over all nondebug INSNs for identification
  of fold-mem-roots.  Then walk of the fold-mem-roots' DEF-chain
  to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
  the to-be-folded INSNs (fold_agnostic_insns, which don't need
  to be adjusted, and fold_insns, which need their constant to
  be set to zero).
* Introduction of a single-USE mode, which only collects DEFs,
  that have a single USE and therefore are safe to transform
  (the fold-mem-root will be the final USE).  This mode is fast
  and will always run (unless disabled via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
  multiple USEs, but all USEs must be part of any fold-mem-root's
  DEF-chain.  The analysis of all USEs is expensive and therefore,
  this mode is disabled for highly connected CFGs.  Note, that
  multi-USE mode will miss some opportunities that the single-USE
  mode finds (e.g. multi-USE mode fails for fold-mem-offsets-3.c).
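
To make the single-USE idea concrete, here is a deliberately tiny model.  It
is not the pass's actual data structure: `AddImm` and `fold_single_use` are
hypothetical names, and the DEF-chain is reduced to a vector of
"reg = prev + imm" steps ordered from the earliest definition to the one
feeding the memory access.

```cpp
#include <vector>

// Toy DEF-chain entry: result = previous + imm, with the number of USEs
// of that result.
struct AddImm
{
  long imm;
  int uses;
};

// Walk the chain backwards from the memory root.  Each add whose result has
// exactly one USE is folded into the memory offset and its constant is set
// to zero; the walk stops at the first definition with several USEs.
long
fold_single_use (std::vector<AddImm> &chain, long mem_offset)
{
  for (auto it = chain.rbegin (); it != chain.rend (); ++it)
    {
      if (it->uses != 1)
        break;
      mem_offset += it->imm;
      it->imm = 0;
    }
  return mem_offset;
}
```

In this model a chain of +8 and +16, each used once, turns an access at
offset 4 into one at offset 28, while a shared first definition leaves its +8
in place.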

The following testing was done:
* Bootstrapped and regtested on aarch64-linux and x86-64-linux.
* Regtested on riscv64-linux.
* SPEC CPU 2017 tested on aarch64 and riscv64-linux.

The number of applied optimizations of different versions/modes
of fold-mem-offsets in SPEC CPU2017 on RISC-V rv64gc_zba_zbb_zbs
is as follows:

Upstream:
  500.perlbench_r: 169
  502.gcc_r: 624
  520.omnetpp_r: 1301
  523.xalancbmk_r: 23
  525.x264_r: 705
  531.deepsjeng_r: 36
  541.leela_r: 19
  548.exchange2: 90
  557.xz_r: 11
  SUM: 2151

New single-USE:
  500.perlbench_r: 70
  502.gcc_r: 263
  520.omnetpp_r: 1100
  523.xalancbmk_r: 10
  525.x264_r: 95
  531.deepsjeng_r: 19
  541.leela_r: 252
  548.exchange2: 13
  557.xz_r: 11
  SUM: 1833

New multi-USE:
  500.perlbench_r: 186
  502.gcc_r: 744
  520.omnetpp_r: 1187
  523.xalancbmk_r: 22
  525.x264_r: 985
  531.deepsjeng_r: 21
  541.leela_r: 87
  548.exchange2: 63
  557.xz_r: 23
  SUM: 3318

New single-USE then multi-USE:
  500.perlbench_r: 192
  502.gcc_r: 761
  520.omnetpp_r: 1673
  523.xalancbmk_r: 22
  525.x264_r: 995
  531.deepsjeng_r: 21
  541.leela_r: 252
  548.exchange2: 63
  557.xz_r: 23
  SUM: 4002

A compile time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets: 289 s (user time) / 15572436 kBytes (max resident set size)
* -ffold-mem-offsets:    339 s (user time) / 23606516 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode does not have
an impact on the duration or the memory footprint.

SPEC CPU 2017 showed no significant performance impact on aarch64-linux.

gcc/ChangeLog:

PR rtl-optimization/117922
* fold-mem-offsets.cc (INCLUDE_ALGORITHM): Added definition.
(INCLUDE_FUNCTIONAL): Likewise.
(INCLUDE_ARRAY): Likewise.
(class pass_fold_mem_offsets): Moved to bottom of file.
(get_fold_mem_offset_root): Converted to RTL-SSA.
(get_single_def_in_bb): Converted to RTL-SSA.
(get_uses): New.
(has_foldable_uses_p): Converted to RTL-SSA.
(fold_offsets): Converted to RTL-SSA.
(fold_offsets_1): Converted to RTL-SSA.
(get_fold_mem_root): Removed.
(do_check_validity): New.
(do_analysis): Removed.
(insn_uses_not_in_bitmap): New.
(do_fold_info_calculation): Removed.
(drop_unsafe_candidates): New.
(do_commit_offset): Converted to RTL-SSA.
(compute_validity_closure): Removed.
(do_commit_insn): Changed to change INSN in place.
(fold_mem_offsets_single_use): New.
(fold_mem_offsets_multi_use): New.
(pass_fold_mem_offsets::execute): Moved to bottom of file.
(fold_mem_offsets): New.

Signed-off-by: Christoph Müllner 
---
 gcc/fold-mem-offsets.cc | 1138 ---
 1 file changed, 596 insertions(+), 542 deletions(-)

diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index c1c94472a071..0e777c32ee31 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -17,24 +17,34 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "tm.h"
+#include "backend.h"
 #include "rtl.h"
+#include "rtlanal.h"
+#include "df.h"
+#include "rtl-ssa.h"
+
+#include "predict.h"
+#include "cfgrtl.h"
+#include "cfgcleanup.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target

Re: [PATCH v2 1/1] libstdc++: Implement default_accessor from mdspan.

2025-06-26 Thread Luc Grosheintz




On 6/13/25 12:40, Luc Grosheintz wrote:

libstdc++-v3/ChangeLog:

* include/std/mdspan (default_accessor): New class.
* src/c++23/std.cc.in: Register default_accessor.
* testsuite/23_containers/mdspan/accessors/default.cc: New test.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan   | 26 
  libstdc++-v3/src/c++23/std.cc.in  |  3 +-
  .../23_containers/mdspan/accessors/default.cc | 59 +++
  3 files changed, 87 insertions(+), 1 deletion(-)
  create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 6dc2441f80b..2e85ba8e6cb 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1004,6 +1004,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _S_strides_t _M_strides;
  };
  
+  template

+struct default_accessor
+{


It would be easy to check the two mandates: not abstract, not array
here. Would you like a v3, with the change?

https://eel.is/c++draft/views.multidim#mdspan.accessor.default.overview-2


+  using offset_policy = default_accessor;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  constexpr
+  default_accessor() noexcept = default;
+
+  template
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr
+   default_accessor(default_accessor<_OElementType>) noexcept
+   { }
+
+  constexpr reference
+  access(data_handle_type __p, size_t __i) const noexcept
+  { return __p[__i]; }
+
+  constexpr data_handle_type
+  offset(data_handle_type __p, size_t __i) const noexcept
+  { return __p + __i; }
+};
+
  _GLIBCXX_END_NAMESPACE_VERSION
  }
  #endif
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 109f590f1d1..e671aff68f8 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1843,7 +1843,8 @@ export namespace std
using std::layout_left;
using std::layout_right;
using std::layout_stride;
-  // FIXME layout_left_padded, layout_right_padded, default_accessor and mdspan
+  using std::default_accessor;
+  // FIXME layout_left_padded, layout_right_padded and mdspan
  }
  #endif
  
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc

new file mode 100644
index 000..303833d4857
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
@@ -0,0 +1,59 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+  }
+
+constexpr bool
+test_access()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.access(a.data(), 0) == 10);
+  VERIFY(accessor.access(a.data(), 4) == 14);
+  return true;
+}
+
+constexpr bool
+test_offset()
+{
+  std::default_accessor accessor;
+  std::array a{10, 11, 12, 13, 14};
+  VERIFY(accessor.offset(a.data(), 0) == a.data());
+  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
+  return true;
+}
+
+constexpr void
+test_ctor()
+{
+  static_assert(std::is_nothrow_constructible_v,
+   std::default_accessor>);
+  static_assert(std::is_convertible_v,
+ std::default_accessor>);
+  static_assert(!std::is_constructible_v,
+std::default_accessor>);
+}
+
+int
+main()
+{
+  test_accessor_policy>();
+  test_access();
+  static_assert(test_access());
+  test_offset();
+  static_assert(test_offset());
+  test_ctor();
+  return 0;
+}
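
The two mandates mentioned in the review comment above (element type is
neither an abstract class nor an array) could be checked with static
assertions along these lines.  This is a sketch under assumptions, not the
proposed v3: `checked_accessor` is a hypothetical stand-in for the libstdc++
class, and the messages are invented.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical accessor mirroring the shape of default_accessor, with the
// two mandates enforced at instantiation time.
template<typename ElementType>
  struct checked_accessor
  {
    static_assert (!std::is_array<ElementType>::value,
                   "ElementType must not be an array type");
    static_assert (!std::is_abstract<ElementType>::value,
                   "ElementType must not be an abstract class type");

    using element_type = ElementType;
    using reference = element_type&;
    using data_handle_type = element_type*;

    constexpr reference
    access (data_handle_type p, std::size_t i) const noexcept
    { return p[i]; }

    constexpr data_handle_type
    offset (data_handle_type p, std::size_t i) const noexcept
    { return p + i; }
  };
```

Instantiating `checked_accessor<int[3]>` or an accessor over an abstract class
would then fail to compile with a readable message.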




Re: [PATCH] RISC-V: update prepare_ternary_operands to handle the vector-scalar case [PR120828]

2025-06-26 Thread Robin Dapp

This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar
multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult
operands were wrongly swapped.
This patch ensures that operands are not swapped in the vector-scalar case.


This looks reasonable, so OK for the trunk, but how did that slip through in
the first place?


--
Regards
Robin


