[llvm-branch-commits] [libc++] Introduce __force_nonstandard_layout base class for pointer field protection (PR #151652)

2025-08-02 Thread A. Jiang via llvm-branch-commits

frederick-vs-ja wrote:

> That's right. For example, the standard would require that a standard-layout 
> `std::unique_ptr` has the same representation as `int *` (assuming the 
> obvious implementation), and it was not considered practical to change the 
> representation of all pointers for compatibility reasons.

However, even if we make `std::unique_ptr` non-standard-layout, as long as 
the `int*` subobject is at offset 0, it's still well-defined to access the 
`int*` subobject via `*std::launder(reinterpret_cast<int**>(&up))`. I'm not sure
whether merely requiring `launder` would be safe enough.

I think a "safer" way (defending against `reinterpret_cast`) would be making 
the `int*` subobject (and its friends) at some positive offset. But this is 
theoretically orthogonal to the standard-layout property.
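
To make the distinction in this comment concrete, here is a minimal sketch. It is not libc++ code: `PlainBox`, `Pad`, `ShiftedBox`, `member_offset`, and `read_through_cast` are hypothetical names chosen for illustration. It assumes a mainstream ABI in which a non-empty base class occupies the start of the object, so the derived class's pointer member lands at a positive offset.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a standard-layout smart pointer: one pointer member.
struct PlainBox {
  int *ptr;
};

// Hypothetical non-standard-layout variant: non-static data members are
// declared in both the base and the derived class, which breaks the
// standard-layout requirements.
struct Pad {
  char pad;
};
struct ShiftedBox : Pad {
  int *ptr;
};

static_assert(std::is_standard_layout<PlainBox>::value,
              "single-member struct is standard-layout");
static_assert(!std::is_standard_layout<ShiftedBox>::value,
              "members split across base and derived are not");

// Byte distance from the start of an object to one of its members.
template <class Object, class Member>
std::ptrdiff_t member_offset(const Object &obj, const Member &member) {
  const char *base = reinterpret_cast<const char *>(&obj);
  const char *field = reinterpret_cast<const char *>(&member);
  assert(field >= base && "member must live inside the object");
  return field - base;
}

// A standard-layout object is pointer-interconvertible with its first
// member, so this cast reaches `ptr` without needing std::launder.
inline int read_through_cast(PlainBox &box) {
  return **reinterpret_cast<int **>(&box);
}
```

In this sketch the pointer in `ShiftedBox` is no longer reachable by casting the object's own address, which is the "positive offset" defence described above; whether a class is standard-layout and where its pointer sits are separate properties.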

https://github.com/llvm/llvm-project/pull/151652
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [RISCV] vsha2cl intrinsics should select vsha2cl instructions. (PR #151834)

2025-08-02 Thread Brandon Wu via llvm-branch-commits

https://github.com/4vtomat approved this pull request.

LGTM~

https://github.com/llvm/llvm-project/pull/151834


[llvm-branch-commits] [lld] [llvm] release/21.x: [DTLTO][LLD][ELF] Support bitcode members of thin archives (#149425) (PR #151674)

2025-08-02 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay approved this pull request.


https://github.com/llvm/llvm-project/pull/151674


[llvm-branch-commits] [llvm] RuntimeLibcalls: Remove target check for sjlj config (PR #148792)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/148792

From 402b5c4a8db38647fae58db8b19749c3a2b566c6 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 15 Jul 2025 16:01:44 +0900
Subject: [PATCH] RuntimeLibcalls: Remove target check for sjlj config

I'm assuming this was the set of targets that were relevant
for sjlj handling. Just take the raw exception setting instead,
and assume it makes sense for the target.
---
 llvm/lib/IR/RuntimeLibcalls.cpp | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index bfe2a3d6af867..1ba2fa7f93b03 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -73,10 +73,8 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT,
                                        EABI EABIVersion, StringRef ABIName) {
   setTargetRuntimeLibcallSets(TT, FloatABI);
 
-  if (TT.isX86() || TT.isVE() || TT.isARM() || TT.isThumb()) {
-    if (ExceptionModel == ExceptionHandling::SjLj)
-      setLibcallImpl(RTLIB::UNWIND_RESUME, RTLIB::_Unwind_SjLj_Resume);
-  }
+  if (ExceptionModel == ExceptionHandling::SjLj)
+    setLibcallImpl(RTLIB::UNWIND_RESUME, RTLIB::_Unwind_SjLj_Resume);
 
   if (TT.isOSOpenBSD())
     setLibcallImpl(RTLIB::STACK_SMASH_HANDLER, RTLIB::__stack_smash_handler);
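
The behavioural change in this patch can be modelled with a small standalone sketch. Everything here is a hypothetical simplification rather than LLVM code: the enums, `unwindResumeBefore`, and `unwindResumeAfter` are illustrative names, and the sketch assumes the non-SjLj default is a plain unwind-resume entry.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the LLVM types named in the patch.
enum class ExceptionHandling { None, DwarfCFI, SjLj };
enum class Arch { X86, VE, ARM, Thumb, RISCV };
enum class UnwindResumeImpl { UnwindResume, UnwindSjLjResume };

// Before the patch: the SjLj override applied only on a fixed target list.
inline UnwindResumeImpl unwindResumeBefore(Arch arch, ExceptionHandling model) {
  bool listed = arch == Arch::X86 || arch == Arch::VE || arch == Arch::ARM ||
                arch == Arch::Thumb;
  if (listed && model == ExceptionHandling::SjLj)
    return UnwindResumeImpl::UnwindSjLjResume;
  return UnwindResumeImpl::UnwindResume; // assumed default for this sketch
}

// After the patch: only the exception model is consulted.
inline UnwindResumeImpl unwindResumeAfter(ExceptionHandling model) {
  return model == ExceptionHandling::SjLj ? UnwindResumeImpl::UnwindSjLjResume
                                          : UnwindResumeImpl::UnwindResume;
}

// Usage: the two versions agree on listed targets and differ elsewhere.
inline void checkSjLjSketch() {
  assert(unwindResumeBefore(Arch::ARM, ExceptionHandling::SjLj) ==
         unwindResumeAfter(ExceptionHandling::SjLj));
  assert(unwindResumeBefore(Arch::RISCV, ExceptionHandling::SjLj) ==
         UnwindResumeImpl::UnwindResume);
  assert(unwindResumeAfter(ExceptionHandling::SjLj) ==
         UnwindResumeImpl::UnwindSjLjResume);
}
```

The only observable difference in this model is for targets outside the old list that request SjLj, which matches the commit message's stated assumption that the raw setting makes sense for the target.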



[llvm-branch-commits] [llvm] RuntimeLibcalls: Remove darwin override of half convert libcalls (PR #148782)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/148782

From 9404c93a6e6bed8ca8572b2a1c3f4b96d7c2d895 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 15 Jul 2025 14:53:15 +0900
Subject: [PATCH] RuntimeLibcalls: Remove darwin override of half convert
 libcalls

These are already the default calls set for these conversions, so
they should not require explicit setting. The non-default cases are
currently overridden in ARMISelLowering. Just delete this until
the list of calls and lowering decisions are separated.

This was added back in 6402ad27c01c9503a12d41d7e40646cf0d1f919f. It
appears to not be relevant for AArch64, where calls appear to never
be used for these. It also appears to not be relevant for x86, where
the default calls seem to always end up used anyway.
---
 llvm/lib/IR/RuntimeLibcalls.cpp | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 8c90c52141dc7..5340f11e0c112 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -78,15 +78,6 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT,
       setLibcallImpl(RTLIB::UNWIND_RESUME, RTLIB::_Unwind_SjLj_Resume);
   }
 
-  // A few names are different on particular architectures or environments.
-  if (TT.isOSDarwin()) {
-    // For f16/f32 conversions, Darwin uses the standard naming scheme,
-    // instead of the gnueabi-style __gnu_*_ieee.
-    // FIXME: What about other targets?
-    setLibcallImpl(RTLIB::FPEXT_F16_F32, RTLIB::__extendhfsf2);
-    setLibcallImpl(RTLIB::FPROUND_F32_F16, RTLIB::__truncsfhf2);
-  }
-
   if (TT.isOSOpenBSD()) {
     setLibcallImpl(RTLIB::STACKPROTECTOR_CHECK_FAIL, RTLIB::Unsupported);
     setLibcallImpl(RTLIB::STACK_SMASH_HANDLER, RTLIB::__stack_smash_handler);
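
The commit message's argument is that the removed block re-assigned values that were already the defaults. A toy model of that reasoning follows; the two-entry table, `defaultTable`, and `applyDarwinOverride` are hypothetical, and the default values are taken from the commit message rather than verified against the generated tables.

```cpp
#include <array>
#include <cassert>

// Hypothetical miniature of the libcall table; the real one is generated.
enum Libcall { FPEXT_F16_F32, FPROUND_F32_F16, NumLibcalls };
enum class Impl { Unsupported, extendhfsf2, truncsfhf2 };
using Table = std::array<Impl, NumLibcalls>;

// Defaults as described in the commit message (assumed for this sketch).
inline Table defaultTable() {
  Table t{};
  t[FPEXT_F16_F32] = Impl::extendhfsf2;
  t[FPROUND_F32_F16] = Impl::truncsfhf2;
  return t;
}

// The removed Darwin block, modelled on the toy table.
inline void applyDarwinOverride(Table &t) {
  t[FPEXT_F16_F32] = Impl::extendhfsf2;
  t[FPROUND_F32_F16] = Impl::truncsfhf2;
}

// True when applying the override leaves the defaults unchanged.
inline bool overrideIsNoOp() {
  Table before = defaultTable();
  Table after = before;
  applyDarwinOverride(after);
  return before == after;
}

// Usage: under the stated assumption the override changes nothing.
inline void checkOverrideIsNoOp() { assert(overrideIsNoOp()); }
```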



[llvm-branch-commits] [llvm] RuntimeLibcalls: Move __stack_chk_fail config to tablegen (PR #148789)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/148789

From cdfde2310d5dabadae82e4cd64d385d6919da66d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 15 Jul 2025 15:47:10 +0900
Subject: [PATCH 1/2] RuntimeLibcalls: Move __stack_chk_fail config to tablegen

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 44 -
 llvm/lib/IR/RuntimeLibcalls.cpp |  4 +--
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td b/llvm/include/llvm/IR/RuntimeLibcalls.td
index f8782d71ddf37..02b5d603a29f4 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -18,6 +18,7 @@ class DuplicateLibcallImplWithPrefix
 /// Libcall Predicates
 def isOSDarwin : RuntimeLibcallPredicate<"TT.isOSDarwin()">;
 def isOSOpenBSD : RuntimeLibcallPredicate<"TT.isOSOpenBSD()">;
+def isNotOSOpenBSD : RuntimeLibcallPredicate<"!TT.isOSOpenBSD()">;
 def isOSWindows : RuntimeLibcallPredicate<"TT.isOSWindows()">;
 def isNotOSWindows : RuntimeLibcallPredicate<"!TT.isOSWindows()">;
 def isNotOSMSVCRT : RuntimeLibcallPredicate<"!TT.isOSMSVCRT()">;
@@ -705,9 +706,6 @@ foreach lc = LibCalls__atomic in {
   def __#!tolower(!cast(lc)) : RuntimeLibcallImpl;
 }
 
-// Stack Protector Fail
-def __stack_chk_fail : RuntimeLibcallImpl;
-
 // Safe stack.
 def __safestack_pointer_address : 
RuntimeLibcallImpl;
 
@@ -955,6 +953,9 @@ def exp10l_f80 : RuntimeLibcallImpl;
 def exp10l_f128 : RuntimeLibcallImpl;
 def exp10l_ppcf128 : RuntimeLibcallImpl;
 
+// Stack Protector Fail
+def __stack_chk_fail : RuntimeLibcallImpl;
+
 //
 // compiler-rt/libgcc but 64-bit only, not available by default
 //
@@ -1149,6 +1150,8 @@ defvar LibmHasLdexpF80 = LibcallImpls<(add ldexp_f80), isNotOSWindowsOrIsCygwinM
 defvar LibmHasFrexpF128 = LibcallImpls<(add frexp_f128), isNotOSWindowsOrIsCygwinMinGW>;
 defvar LibmHasLdexpF128 = LibcallImpls<(add ldexp_f128), isNotOSWindowsOrIsCygwinMinGW>;
 
+defvar has__stack_chk_fail = LibcallImpls<(add __stack_chk_fail), isNotOSOpenBSD>;
+
 //===----------------------------------------------------------------------===//
 // Objective-C Runtime Libcalls
 //===----------------------------------------------------------------------===//
@@ -1225,7 +1228,8 @@ def AArch64SystemLibrary : SystemRuntimeLibrary<
LibcallImpls<(add bzero), isOSDarwin>,
DarwinExp10, DarwinSinCosStret,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-   DefaultLibmExp10)
+   DefaultLibmExp10,
+   has__stack_chk_fail)
 >;
 
 // Prepend a # to every name
@@ -1241,7 +1245,7 @@ defset list 
WinArm64ECDefaultRuntimeLibcallImpls = {
 
 def WindowsARM64ECSystemLibrary
 : SystemRuntimeLibrary;
+   (add WinArm64ECDefaultRuntimeLibcallImpls, 
__stack_chk_fail)>;
 
 
//===--===//
 // AMDGPU Runtime Libcalls
@@ -1501,7 +1505,8 @@ def ARMSystemLibrary
// Use divmod compiler-rt calls for iOS 5.0 and later.
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&
-  (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>)> {
+  (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>,
+   has__stack_chk_fail)> {
   let DefaultLibcallCallingConv = LibcallCallingConv<[{
  (!TT.isOSDarwin() && !TT.isiOS() && !TT.isWatchOS() && !TT.isDriverKit()) 
?
 (FloatABI == FloatABI::Hard ? CallingConv::ARM_AAPCS_VFP
@@ -1612,7 +1617,7 @@ def HexagonSystemLibrary
 __umoddi3, __divdf3, __muldf3, __divsi3, __subdf3, sqrtf,
 __divdi3, __umodsi3, __moddi3, __modsi3), HexagonLibcalls,
 LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-exp10f, exp10, exp10l_f128)>;
+exp10f, exp10, exp10l_f128, __stack_chk_fail)>;
 
 
//===--===//
 // Lanai Runtime Libcalls
@@ -1622,7 +1627,8 @@ def isLanai : RuntimeLibcallPredicate<"TT.getArch() == Triple::lanai">;
 
 // Use fast calling convention for library functions.
 def LanaiSystemLibrary
-: SystemRuntimeLibrary {
+: SystemRuntimeLibrary {
   let DefaultLibcallCallingConv = FASTCC;
 }
 
@@ -1914,8 +1920,10 @@ def MSP430SystemLibrary
   // TODO: __mspabi_[srli/srai/slli] ARE implemented in libgcc
   __mspabi_srll,
   __mspabi_sral,
-  __mspabi_slll
+  __mspabi_slll,
   // __mspabi_[srlll/srall/s/rlli/rlll] are NOT implemented in libgcc
+
+  __stack_chk_fail
   )
 >;
 
@@ -2006,7 +2014,8 @@ def PPCSystemLibrary
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
LibmHasSinCosPPCF128,
Availabl

[llvm-branch-commits] [llvm] RuntimeLibcalls: Add bitset for available libcalls (PR #150869)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/150869

From 14813069a5bb931b2d93edde952dabed87d56a4d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 27 Jul 2025 23:26:20 +0900
Subject: [PATCH] RuntimeLibcalls: Add bitset for available libcalls

This is a step towards separating the set of available libcalls
from the lowering decision of which call to use. Libcall recognition
now directly checks availability instead of indirectly checking through
the lowering table.
---
 llvm/include/llvm/IR/RuntimeLibcalls.h| 64 +++
 llvm/lib/IR/RuntimeLibcalls.cpp   |  8 +--
 .../RuntimeLibcallEmitter-calling-conv.td | 22 +++
 .../RuntimeLibcallEmitter-conflict-warning.td | 15 +
 llvm/test/TableGen/RuntimeLibcallEmitter.td   | 27 
 .../TableGen/Basic/RuntimeLibcallsEmitter.cpp | 36 ++-
 6 files changed, 164 insertions(+), 8 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index f39e2e3c26900..8a5f953b68f9d 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -53,8 +53,64 @@ static inline auto libcall_impls() {
   return enum_seq(static_cast(1), RTLIB::NumLibcallImpls);
 }
 
+/// Manage a bitset representing the list of available libcalls for a module.
+///
+/// Most of this exists because std::bitset cannot be statically constructed in
+/// a size large enough before c++23
+class LibcallImplBitset {
+private:
+  using BitWord = uint64_t;
+  static constexpr unsigned BitWordSize = sizeof(BitWord) * CHAR_BIT;
+  static constexpr size_t NumArrayElts =
+      divideCeil(RTLIB::NumLibcallImpls, BitWordSize);
+  using Storage = BitWord[NumArrayElts];
+
+  Storage Bits = {};
+
+  /// Get bitmask for \p Impl in its Bits element.
+  static constexpr BitWord getBitmask(RTLIB::LibcallImpl Impl) {
+    unsigned Idx = static_cast<unsigned>(Impl);
+    return BitWord(1) << (Idx % BitWordSize);
+  }
+
+  /// Get index of array element of Bits for \p Impl
+  static constexpr unsigned getArrayIdx(RTLIB::LibcallImpl Impl) {
+    return static_cast<unsigned>(Impl) / BitWordSize;
+  }
+
+public:
+  constexpr LibcallImplBitset() = default;
+  constexpr LibcallImplBitset(const Storage &Src) {
+    for (size_t I = 0; I != NumArrayElts; ++I)
+      Bits[I] = Src[I];
+  }
+
+  /// Check if a LibcallImpl is available.
+  constexpr bool test(RTLIB::LibcallImpl Impl) const {
+    BitWord Mask = getBitmask(Impl);
+    return (Bits[getArrayIdx(Impl)] & Mask) != 0;
+  }
+
+  /// Mark a LibcallImpl as available
+  void set(RTLIB::LibcallImpl Impl) {
+    assert(Impl != RTLIB::Unsupported && "cannot enable unsupported libcall");
+    Bits[getArrayIdx(Impl)] |= getBitmask(Impl);
+  }
+
+  /// Mark a LibcallImpl as unavailable
+  void unset(RTLIB::LibcallImpl Impl) {
+    assert(Impl != RTLIB::Unsupported && "cannot enable unsupported libcall");
+    Bits[getArrayIdx(Impl)] &= ~getBitmask(Impl);
+  }
+};
+
 /// A simple container for information about the supported runtime calls.
 struct RuntimeLibcallsInfo {
+private:
+  /// Bitset of libcalls a module may emit a call to.
+  LibcallImplBitset AvailableLibcallImpls;
+
+public:
   explicit RuntimeLibcallsInfo(
       const Triple &TT,
       ExceptionHandling ExceptionModel = ExceptionHandling::None,
@@ -132,6 +188,14 @@ struct RuntimeLibcallsInfo {
     return ImplToLibcall[Impl];
   }
 
+  bool isAvailable(RTLIB::LibcallImpl Impl) const {
+    return AvailableLibcallImpls.test(Impl);
+  }
+
+  void setAvailable(RTLIB::LibcallImpl Impl) {
+    AvailableLibcallImpls.set(Impl);
+  }
+
   /// Check if this is valid libcall for the current module, otherwise
   /// RTLIB::Unsupported.
   LLVM_ABI RTLIB::LibcallImpl getSupportedLibcallImpl(StringRef FuncName) const;
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 8c90c52141dc7..569f63f28db77 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -114,12 +114,8 @@ RuntimeLibcallsInfo::getSupportedLibcallImpl(StringRef FuncName) const {
   for (auto I = Range.begin(); I != Range.end(); ++I) {
     RTLIB::LibcallImpl Impl =
         static_cast<RTLIB::LibcallImpl>(I - RuntimeLibcallNameOffsets.begin());
-
-    // FIXME: This should not depend on looking up ImplToLibcall, only the list
-    // of libcalls for the module.
-    RTLIB::LibcallImpl Recognized = LibcallImpls[ImplToLibcall[Impl]];
-    if (Recognized != RTLIB::Unsupported)
-      return Recognized;
+    if (isAvailable(Impl))
+      return Impl;
   }
 
   return RTLIB::Unsupported;
diff --git a/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td b/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
index 49d5ecaa0e5c5..14c811da64910 100644
--- a/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
+++ b/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
@@ -48,12 +48,18 @@ def MSP430LibraryWithCondCC : SystemRuntimeLibrary;
 // func_a and func_b b
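
The availability bitset added by this patch can be tried in isolation with the following sketch. It is a reduced model, not the LLVM class: the `Impl` enum and its enumerators are hypothetical, `divideCeil` is replaced by inline arithmetic, and the enum is padded so that the set spans two 64-bit words.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for RTLIB::LibcallImpl; the real enum is generated
// by TableGen. ImplFar is placed past bit 63 to exercise the second word.
enum class Impl : unsigned { Unsupported = 0, ImplA, ImplB, ImplFar = 70, NumImpls };

class ImplBitset {
  using BitWord = std::uint64_t;
  static constexpr unsigned BitWordSize = sizeof(BitWord) * CHAR_BIT;
  // Round up so every enumerator has a bit.
  static constexpr std::size_t NumWords =
      (static_cast<std::size_t>(Impl::NumImpls) + BitWordSize - 1) / BitWordSize;

  BitWord Bits[NumWords] = {};

  // Bit position of I inside its word.
  static constexpr BitWord mask(Impl I) {
    return BitWord(1) << (static_cast<unsigned>(I) % BitWordSize);
  }
  // Index of the word that holds I.
  static constexpr std::size_t word(Impl I) {
    return static_cast<unsigned>(I) / BitWordSize;
  }

public:
  // Check whether I is marked available.
  constexpr bool test(Impl I) const { return (Bits[word(I)] & mask(I)) != 0; }

  // Mark I as available.
  void set(Impl I) {
    assert(I != Impl::Unsupported && "cannot mark Unsupported as available");
    Bits[word(I)] |= mask(I);
  }

  // Mark I as unavailable.
  void unset(Impl I) { Bits[word(I)] &= ~mask(I); }
};
```

With this in place, recognising a libcall reduces to a single `test` call on the candidate implementation, which is the shape of the new `getSupportedLibcallImpl` loop in the patch.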

[llvm-branch-commits] [llvm] RuntimeLibcalls: Move __stack_smash_handler config to tablegen (PR #150870)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/150870

From 980e5523cdc1ade16f0970ba7872dac6fcf819d9 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 28 Jul 2025 11:47:04 +0900
Subject: [PATCH 1/2] RuntimeLibcalls: Move __stack_smash_handler config to
 tablegen

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 17 ++---
 llvm/lib/IR/RuntimeLibcalls.cpp |  3 ---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td b/llvm/include/llvm/IR/RuntimeLibcalls.td
index e31573b04cb4b..8ae1882ab9475 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -1151,6 +1151,7 @@ defvar LibmHasFrexpF128 = LibcallImpls<(add frexp_f128), isNotOSWindowsOrIsCygwi
 defvar LibmHasLdexpF128 = LibcallImpls<(add ldexp_f128), isNotOSWindowsOrIsCygwinMinGW>;
 
 defvar has__stack_chk_fail = LibcallImpls<(add __stack_chk_fail), isNotOSOpenBSD>;
+defvar has__stack_smash_handler = LibcallImpls<(add __stack_smash_handler), isOSOpenBSD>;
 
 //===----------------------------------------------------------------------===//
 // Objective-C Runtime Libcalls
 // Objective-C Runtime Libcalls
@@ -1229,7 +1230,7 @@ def AArch64SystemLibrary : SystemRuntimeLibrary<
DarwinExp10, DarwinSinCosStret,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
DefaultLibmExp10,
-   has__stack_chk_fail)
+   has__stack_chk_fail, has__stack_smash_handler)
 >;
 
 // Prepend a # to every name
@@ -1506,7 +1507,7 @@ def ARMSystemLibrary
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&
   (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>,
-   has__stack_chk_fail)> {
+   has__stack_chk_fail, has__stack_smash_handler)> {
   let DefaultLibcallCallingConv = LibcallCallingConv<[{
  (!TT.isOSDarwin() && !TT.isiOS() && !TT.isWatchOS() && !TT.isDriverKit()) 
?
 (FloatABI == FloatABI::Hard ? CallingConv::ARM_AAPCS_VFP
@@ -2015,7 +2016,7 @@ def PPCSystemLibrary
LibmHasSinCosPPCF128,
AvailableIf,
LibcallImpls<(add Int128RTLibcalls), isPPC64>,
-   has__stack_chk_fail)>;
+   has__stack_chk_fail, has__stack_smash_handler)>;
 
 
//===--===//
 // RISCV Runtime Libcalls
@@ -2030,7 +2031,7 @@ def RISCVSystemLibrary
exp10f, exp10, exp10l_f128,
__riscv_flush_icache,
LibcallImpls<(add Int128RTLibcalls), isRISCV64>,
-   has__stack_chk_fail)>;
+   has__stack_chk_fail, has__stack_smash_handler)>;
 
 
//===--===//
 // SPARC Runtime Libcalls
@@ -2098,7 +2099,7 @@ def SPARCSystemLibrary
LibcallImpls<(add _Q_qtoll, _Q_qtoull, _Q_lltoq, _Q_ulltoq), isSPARC32>,
LibcallImpls<(add SPARC64_MulDivCalls, Int128RTLibcalls), isSPARC64>,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-   has__stack_chk_fail)
+   has__stack_chk_fail, has__stack_smash_handler)
 >;
 
 
//===--===//
@@ -2159,7 +2160,8 @@ defvar X86CommonLibcalls =
// FIXME: MSVCRT doesn't have powi. The f128 case is added as a
// hack for one test relying on it.
__powitf2_f128,
-   has__stack_chk_fail
+   has__stack_chk_fail,
+   has__stack_smash_handler
 );
 
 defvar Windows32DivRemMulCalls =
@@ -2328,5 +2330,6 @@ def LegacyDefaultSystemLibrary
  exp10f, exp10, exp10l_f128,
  __powisf2, __powidf2, __powitf2_f128,
  LibcallImpls<(add Int128RTLibcalls), isArch64Bit>,
- has__stack_chk_fail
+ has__stack_chk_fail,
+ has__stack_smash_handler
 )>;
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index bfe2a3d6af867..a930414d177c5 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -78,9 +78,6 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT,
       setLibcallImpl(RTLIB::UNWIND_RESUME, RTLIB::_Unwind_SjLj_Resume);
   }
 
-  if (TT.isOSOpenBSD())
-    setLibcallImpl(RTLIB::STACK_SMASH_HANDLER, RTLIB::__stack_smash_handler);
-
   if (TT.isARM() || TT.isThumb()) {
     setARMLibcallNames(*this, TT, FloatABI, EABIVersion);
     return;

From 30e707c021c8af8c2f21ccd1c5d47488716ab40f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 3 Aug 2025 07:33:05 +0900
Subject: [PATCH 2/2] DefaultStackProtector

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 8ae1882ab9475..384b7f29b9073 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLi
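
The two predicate-guarded sets introduced in patch 1/2 (`has__stack_chk_fail` under `isNotOSOpenBSD`, `has__stack_smash_handler` under `isOSOpenBSD`) are mutually exclusive. The sketch below models that selection in plain C++; the `OS` enum and `stackProtectorLibcalls` are hypothetical, and the real selection is generated by TableGen from the `.td` definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal triple: only the OS matters for these predicates.
enum class OS { Linux, OpenBSD, Darwin };

// Mirrors the two guards: isNotOSOpenBSD selects __stack_chk_fail and
// isOSOpenBSD selects __stack_smash_handler.
inline std::vector<std::string> stackProtectorLibcalls(OS os) {
  std::vector<std::string> available;
  if (os != OS::OpenBSD)
    available.push_back("__stack_chk_fail");
  if (os == OS::OpenBSD)
    available.push_back("__stack_smash_handler");
  // The predicates are complements, so exactly one entry is selected.
  assert(available.size() == 1 && "exactly one handler per OS");
  return available;
}
```

Listing both sets in a system library, as the patch does for AArch64, ARM, PPC, RISCV, SPARC, and X86, therefore yields one handler per triple without any imperative check in `initLibcalls`.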

[llvm-branch-commits] [llvm] RuntimeLibcalls: Really move default libcall handling to tablegen (PR #148780)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/148780

From a8d3505cf190e60ffba12580456a785a9c313063 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 14 Jul 2025 19:53:22 +0900
Subject: [PATCH] RuntimeLibcalls: Really move default libcall handling to
 tablegen

Hack in the default setting so it's consistently generated like
the other cases. Maintain a list of targets where this applies.
The alternative would require new infrastructure to sort the system
library initialization in some way.

I wanted the unhandled target case to be treated as a fatal
error, but it turns out there's a hack in IRSymtab using
RuntimeLibcalls, which will fail out in many tests that
do not have a triple set. Many of the failures are simply
running llvm-as with no triple, which probably should not
depend on knowing an accurate set of calls.
---
 llvm/include/llvm/IR/RuntimeLibcalls.h|5 -
 llvm/include/llvm/IR/RuntimeLibcalls.td   |   36 +-
 llvm/lib/IR/RuntimeLibcalls.cpp   |   42 +-
 llvm/test/CodeGen/AVR/llvm.sincos.ll  | 1183 +
 llvm/test/TableGen/RuntimeLibcallEmitter.td   |   14 +-
 .../TableGen/Basic/RuntimeLibcallsEmitter.cpp |   35 +-
 6 files changed, 398 insertions(+), 917 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h b/llvm/include/llvm/IR/RuntimeLibcalls.h
index eb882c48270cf..f39e2e3c26900 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -137,9 +137,6 @@ struct RuntimeLibcallsInfo {
   LLVM_ABI RTLIB::LibcallImpl getSupportedLibcallImpl(StringRef FuncName) 
const;
 
 private:
-  static const RTLIB::LibcallImpl
-  DefaultLibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1];
-
   /// Stores the implementation choice for each each libcall.
   RTLIB::LibcallImpl LibcallImpls[RTLIB::UNKNOWN_LIBCALL + 1] = {
   RTLIB::Unsupported};
@@ -197,8 +194,6 @@ struct RuntimeLibcallsInfo {
 return hasSinCos(TT) || TT.isPS();
   }
 
-  LLVM_ABI void initDefaultLibCallImpls();
-
   /// Generated by tablegen.
   void setTargetRuntimeLibcallSets(const Triple &TT,
FloatABI::ABIType FloatABI);
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td b/llvm/include/llvm/IR/RuntimeLibcalls.td
index bae0020e0aead..f8782d71ddf37 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -1610,7 +1610,9 @@ def HexagonSystemLibrary
 (add (sub DefaultLibcallImpls32,
 __adddf3, __divsf3, __udivsi3, __udivdi3,
 __umoddi3, __divdf3, __muldf3, __divsi3, __subdf3, sqrtf,
-__divdi3, __umodsi3, __moddi3, __modsi3), HexagonLibcalls)>;
+__divdi3, __umodsi3, __moddi3, __modsi3), HexagonLibcalls,
+LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
+exp10f, exp10, exp10l_f128)>;
 
 
//===--===//
 // Lanai Runtime Libcalls
@@ -1823,6 +1825,7 @@ defvar MSP430DefaultOptOut = [
 def MSP430SystemLibrary
 : SystemRuntimeLibrary;
 
 def isXCore : RuntimeLibcallPredicate<"TT.getArch() == Triple::xcore">;
 def XCoreSystemLibrary
-: SystemRuntimeLibrary;
+: SystemRuntimeLibrary
+)>;
 
 
//===--===//
 // ZOS Runtime Libcalls
@@ -2286,3 +2293,26 @@ def WasmSystemLibrary
CompilerRTOnlyInt64Libcalls, CompilerRTOnlyInt128Libcalls,
exp10f, exp10,
emscripten_return_address)>;
+
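+//===----------------------------------------------------------------------===//
+// Legacy Default Runtime Libcalls
+//===----------------------------------------------------------------------===//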
+
+// TODO: Should make every target explicit.
+def isDefaultLibcallArch : RuntimeLibcallPredicate<[{
+  TT.isMIPS() || TT.isLoongArch() || TT.isVE() || TT.isBPF() ||
+  TT.getArch() == Triple::csky || TT.getArch() == Triple::arc ||
+  TT.getArch() == Triple::m68k ||  TT.getArch() == Triple::xtensa ||
+  (TT.isSystemZ() && !TT.isOSzOS())
+}]>;
+
+
+def isArch64Bit : RuntimeLibcallPredicate<[{TT.isArch64Bit()}]>;
+def LegacyDefaultSystemLibrary
+: SystemRuntimeLibrary
+)>;
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 1ca5878787979..8c90c52141dc7 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -8,6 +8,9 @@
 
 #include "llvm/IR/RuntimeLibcalls.h"
 #include "llvm/ADT/StringTable.h"
+#include "llvm/Support/Debug.h"
+
+#define DEBUG_TYPE "runtime-libcalls-info"
 
 using namespace llvm;
 using namespace RTLIB;
@@ -62,12 +65,6 @@ static void setARMLibcallNames(RuntimeLibcallsInfo &Info, 
const Triple &TT,
 Info.setLibcallImplCallingConv(Impl, CallingConv::ARM_AAPCS);
 }
 
-void RTLIB::RuntimeLibcallsInfo::initDefaultLibCallImpls() {
-  std::memcpy(LibcallImpls, DefaultLibcallImpls, sizeof(LibcallImpls));
-  static_assert(sizeof(LibcallImpls) == sizeof(DefaultLibcallImpls),
-  
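
The `isDefaultLibcallArch` predicate in this patch lists the targets that still fall back to the legacy default set. A plain C++ rendering of the same condition is sketched below; `Arch`, `OS`, and `ToyTriple` are hypothetical stand-ins for `llvm::Triple`, reduced to what the predicate inspects.

```cpp
#include <cassert>

// Hypothetical stand-ins for the architecture and OS queries on llvm::Triple.
enum class Arch { MIPS, LoongArch, VE, BPF, CSKY, ARC, M68k, Xtensa, SystemZ, X86 };
enum class OS { Linux, ZOS };

struct ToyTriple {
  Arch arch;
  OS os;
};

// Mirrors the TableGen predicate: a fixed architecture list, plus SystemZ
// unless the OS is z/OS.
inline bool isDefaultLibcallArch(const ToyTriple &TT) {
  switch (TT.arch) {
  case Arch::MIPS:
  case Arch::LoongArch:
  case Arch::VE:
  case Arch::BPF:
  case Arch::CSKY:
  case Arch::ARC:
  case Arch::M68k:
  case Arch::Xtensa:
    return true;
  case Arch::SystemZ:
    return TT.os != OS::ZOS;
  case Arch::X86:
    return false;
  }
  return false;
}

// Usage: SystemZ takes the legacy path only when it is not z/OS.
inline void checkLegacyDefaultSketch() {
  ToyTriple linuxZ{Arch::SystemZ, OS::Linux};
  ToyTriple zos{Arch::SystemZ, OS::ZOS};
  assert(isDefaultLibcallArch(linuxZ));
  assert(!isDefaultLibcallArch(zos));
}
```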

[llvm-branch-commits] [llvm] RuntimeLibcalls: Move __stack_smash_handler config to tablegen (PR #150870)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/150870

>From 980e5523cdc1ade16f0970ba7872dac6fcf819d9 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 28 Jul 2025 11:47:04 +0900
Subject: [PATCH 1/2] RuntimeLibcalls: Move __stack_smash_handler config to
 tablegen

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 17 ++---
 llvm/lib/IR/RuntimeLibcalls.cpp |  3 ---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index e31573b04cb4b..8ae1882ab9475 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -1151,6 +1151,7 @@ defvar LibmHasFrexpF128 = LibcallImpls<(add frexp_f128), 
isNotOSWindowsOrIsCygwi
 defvar LibmHasLdexpF128 = LibcallImpls<(add ldexp_f128), 
isNotOSWindowsOrIsCygwinMinGW>;
 
 defvar has__stack_chk_fail = LibcallImpls<(add __stack_chk_fail), 
isNotOSOpenBSD>;
+defvar has__stack_smash_handler = LibcallImpls<(add __stack_smash_handler), 
isOSOpenBSD>;
 
 
//===--===//
 // Objective-C Runtime Libcalls
@@ -1229,7 +1230,7 @@ def AArch64SystemLibrary : SystemRuntimeLibrary<
DarwinExp10, DarwinSinCosStret,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
DefaultLibmExp10,
-   has__stack_chk_fail)
+   has__stack_chk_fail, has__stack_smash_handler)
 >;
 
 // Prepend a # to every name
@@ -1506,7 +1507,7 @@ def ARMSystemLibrary
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&
   (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>,
-   has__stack_chk_fail)> {
+   has__stack_chk_fail, has__stack_smash_handler)> {
   let DefaultLibcallCallingConv = LibcallCallingConv<[{
  (!TT.isOSDarwin() && !TT.isiOS() && !TT.isWatchOS() && !TT.isDriverKit()) 
?
 (FloatABI == FloatABI::Hard ? CallingConv::ARM_AAPCS_VFP
@@ -2015,7 +2016,7 @@ def PPCSystemLibrary
LibmHasSinCosPPCF128,
AvailableIf,
LibcallImpls<(add Int128RTLibcalls), isPPC64>,
-   has__stack_chk_fail)>;
+   has__stack_chk_fail, has__stack_smash_handler)>;
 
 
//===--===//
 // RISCV Runtime Libcalls
@@ -2030,7 +2031,7 @@ def RISCVSystemLibrary
exp10f, exp10, exp10l_f128,
__riscv_flush_icache,
LibcallImpls<(add Int128RTLibcalls), isRISCV64>,
-   has__stack_chk_fail)>;
+   has__stack_chk_fail, has__stack_smash_handler)>;
 
 
//===--===//
 // SPARC Runtime Libcalls
@@ -2098,7 +2099,7 @@ def SPARCSystemLibrary
LibcallImpls<(add _Q_qtoll, _Q_qtoull, _Q_lltoq, _Q_ulltoq), isSPARC32>,
LibcallImpls<(add SPARC64_MulDivCalls, Int128RTLibcalls), isSPARC64>,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-   has__stack_chk_fail)
+   has__stack_chk_fail, has__stack_smash_handler)
 >;
 
 
//===--===//
@@ -2159,7 +2160,8 @@ defvar X86CommonLibcalls =
// FIXME: MSVCRT doesn't have powi. The f128 case is added as a
// hack for one test relying on it.
__powitf2_f128,
-   has__stack_chk_fail
+   has__stack_chk_fail,
+   has__stack_smash_handler
 );
 
 defvar Windows32DivRemMulCalls =
@@ -2328,5 +2330,6 @@ def LegacyDefaultSystemLibrary
  exp10f, exp10, exp10l_f128,
  __powisf2, __powidf2, __powitf2_f128,
  LibcallImpls<(add Int128RTLibcalls), isArch64Bit>,
- has__stack_chk_fail
+ has__stack_chk_fail,
+ has__stack_smash_handler
 )>;
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index bfe2a3d6af867..a930414d177c5 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -78,9 +78,6 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT,
   setLibcallImpl(RTLIB::UNWIND_RESUME, RTLIB::_Unwind_SjLj_Resume);
   }
 
-  if (TT.isOSOpenBSD())
-setLibcallImpl(RTLIB::STACK_SMASH_HANDLER, RTLIB::__stack_smash_handler);
-
   if (TT.isARM() || TT.isThumb()) {
 setARMLibcallNames(*this, TT, FloatABI, EABIVersion);
 return;

>From 30e707c021c8af8c2f21ccd1c5d47488716ab40f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 3 Aug 2025 07:33:05 +0900
Subject: [PATCH 2/2] DefaultStackProtector

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index 8ae1882ab9475..384b7f29b9073 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLi
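
Editor's sketch (not part of the patch): the first commit above replaces the C++ check `if (TT.isOSOpenBSD())` with two complementary TableGen predicates, `isOSOpenBSD` for `__stack_smash_handler` and `isNotOSOpenBSD` for `__stack_chk_fail`. The snippet below is a minimal illustration of that selection, assuming a hypothetical `OS` enum in place of `llvm::Triple`; the function name `stackProtectorFailName` is invented for the example and is not an LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the OS queries on llvm::Triple.
enum class OS { Linux, OpenBSD, Darwin };

// Simplified view of the two predicates in the patch:
//   isOSOpenBSD    -> __stack_smash_handler is available
//   isNotOSOpenBSD -> __stack_chk_fail is available
// The real tables track two distinct libcalls; this sketch only returns the
// name of whichever routine the predicates make available for the target.
inline std::string stackProtectorFailName(OS Target) {
  return Target == OS::OpenBSD ? "__stack_smash_handler" : "__stack_chk_fail";
}
```

Expressing the condition as a pair of predicates lets each system library list both entries unconditionally, as the `has__stack_chk_fail, has__stack_smash_handler` additions in the diff do, while the generated code decides per triple which one is enabled.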

[llvm-branch-commits] [libcxx] release/21.x: [libc++] Implement comparison operators for `tuple` added in C++23 (#148799) (PR #151808)

2025-08-02 Thread A. Jiang via llvm-branch-commits

frederick-vs-ja wrote:

> Why do we want to back-port this? This looks to me very much like feature 
> work.

See also 
https://github.com/llvm/llvm-project/pull/148799#pullrequestreview-3079726788. 
I'm inclined to backport this, but I will also be fine if the backport is 
eventually rejected.

https://github.com/llvm/llvm-project/pull/151808


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite.

* **#151811** 👈
* **#151810**
* **#151807**
* **#151804**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). 
Learn more about stacking at https://stacking.dev/.


https://github.com/llvm/llvm-project/pull/151811


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/151811

None

>From 7d26f4bb0efd6e38ef03ebfbe0b1c488435f3fb7 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:39:28 -0700
Subject: [PATCH] [AMDGPU] Add missing v_permlane_up_b32 test. NFC.

---
 llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 



[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Stanislav Mekhanoshin (rampitec)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/151811.diff


1 Files Affected:

- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt (+3) 


```diff
diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 

```




https://github.com/llvm/llvm-project/pull/151811


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/151811.diff


1 Files Affected:

- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt (+3) 


```diff
diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 

```




https://github.com/llvm/llvm-project/pull/151811


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/151811


[llvm-branch-commits] [lld] release/21.x: [lld] Add thunks for hexagon (#111217) (PR #149723)

2025-08-02 Thread Tom Stellard via llvm-branch-commits

tstellar wrote:

@androm3da The hexagon-thunks-packets.s test added by this change is failing on 
our s390x builds.

See 
https://download.copr.fedorainfracloud.org/results/@fedora-llvm-team/llvm21/fedora-rawhide-s390x/09365945-llvm/builder-live.log.gz

https://github.com/llvm/llvm-project/pull/149723


[llvm-branch-commits] [llvm] RuntimeLibcalls: Add bitset for available libcalls (PR #150869)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/150869

>From 14813069a5bb931b2d93edde952dabed87d56a4d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 27 Jul 2025 23:26:20 +0900
Subject: [PATCH] RuntimeLibcalls: Add bitset for available libcalls

This is a step towards separating the set of available libcalls
from the lowering decision of which call to use. Libcall recognition
now directly checks availability instead of indirectly checking through
the lowering table.
---
 llvm/include/llvm/IR/RuntimeLibcalls.h| 64 +++
 llvm/lib/IR/RuntimeLibcalls.cpp   |  8 +--
 .../RuntimeLibcallEmitter-calling-conv.td | 22 +++
 .../RuntimeLibcallEmitter-conflict-warning.td | 15 +
 llvm/test/TableGen/RuntimeLibcallEmitter.td   | 27 
 .../TableGen/Basic/RuntimeLibcallsEmitter.cpp | 36 ++-
 6 files changed, 164 insertions(+), 8 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.h 
b/llvm/include/llvm/IR/RuntimeLibcalls.h
index f39e2e3c26900..8a5f953b68f9d 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.h
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.h
@@ -53,8 +53,64 @@ static inline auto libcall_impls() {
   return enum_seq(static_cast<RTLIB::LibcallImpl>(1), RTLIB::NumLibcallImpls);
 }
 
+/// Manage a bitset representing the list of available libcalls for a module.
+///
+/// Most of this exists because std::bitset cannot be statically constructed in
+/// a size large enough before c++23
+class LibcallImplBitset {
+private:
+  using BitWord = uint64_t;
+  static constexpr unsigned BitWordSize = sizeof(BitWord) * CHAR_BIT;
+  static constexpr size_t NumArrayElts =
+  divideCeil(RTLIB::NumLibcallImpls, BitWordSize);
+  using Storage = BitWord[NumArrayElts];
+
+  Storage Bits = {};
+
+  /// Get bitmask for \p Impl in its Bits element.
+  static constexpr BitWord getBitmask(RTLIB::LibcallImpl Impl) {
+unsigned Idx = static_cast<unsigned>(Impl);
+return BitWord(1) << (Idx % BitWordSize);
+  }
+
+  /// Get index of array element of Bits for \p Impl
+  static constexpr unsigned getArrayIdx(RTLIB::LibcallImpl Impl) {
+return static_cast<unsigned>(Impl) / BitWordSize;
+  }
+
+public:
+  constexpr LibcallImplBitset() = default;
+  constexpr LibcallImplBitset(const Storage &Src) {
+for (size_t I = 0; I != NumArrayElts; ++I)
+  Bits[I] = Src[I];
+  }
+
+  /// Check if a LibcallImpl is available.
+  constexpr bool test(RTLIB::LibcallImpl Impl) const {
+BitWord Mask = getBitmask(Impl);
+return (Bits[getArrayIdx(Impl)] & Mask) != 0;
+  }
+
+  /// Mark a LibcallImpl as available
+  void set(RTLIB::LibcallImpl Impl) {
+assert(Impl != RTLIB::Unsupported && "cannot enable unsupported libcall");
+Bits[getArrayIdx(Impl)] |= getBitmask(Impl);
+  }
+
+  /// Mark a LibcallImpl as unavailable
+  void unset(RTLIB::LibcallImpl Impl) {
+assert(Impl != RTLIB::Unsupported && "cannot enable unsupported libcall");
+Bits[getArrayIdx(Impl)] &= ~getBitmask(Impl);
+  }
+};
+
 /// A simple container for information about the supported runtime calls.
 struct RuntimeLibcallsInfo {
+private:
+  /// Bitset of libcalls a module may emit a call to.
+  LibcallImplBitset AvailableLibcallImpls;
+
+public:
   explicit RuntimeLibcallsInfo(
   const Triple &TT,
   ExceptionHandling ExceptionModel = ExceptionHandling::None,
@@ -132,6 +188,14 @@ struct RuntimeLibcallsInfo {
 return ImplToLibcall[Impl];
   }
 
+  bool isAvailable(RTLIB::LibcallImpl Impl) const {
+return AvailableLibcallImpls.test(Impl);
+  }
+
+  void setAvailable(RTLIB::LibcallImpl Impl) {
+AvailableLibcallImpls.set(Impl);
+  }
+
   /// Check if this is valid libcall for the current module, otherwise
   /// RTLIB::Unsupported.
   LLVM_ABI RTLIB::LibcallImpl getSupportedLibcallImpl(StringRef FuncName) 
const;
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index 8c90c52141dc7..569f63f28db77 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -114,12 +114,8 @@ RuntimeLibcallsInfo::getSupportedLibcallImpl(StringRef 
FuncName) const {
   for (auto I = Range.begin(); I != Range.end(); ++I) {
 RTLIB::LibcallImpl Impl =
 static_cast<RTLIB::LibcallImpl>(I - RuntimeLibcallNameOffsets.begin());
-
-// FIXME: This should not depend on looking up ImplToLibcall, only the list
-// of libcalls for the module.
-RTLIB::LibcallImpl Recognized = LibcallImpls[ImplToLibcall[Impl]];
-if (Recognized != RTLIB::Unsupported)
-  return Recognized;
+if (isAvailable(Impl))
+  return Impl;
   }
 
   return RTLIB::Unsupported;
diff --git a/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td 
b/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
index 49d5ecaa0e5c5..14c811da64910 100644
--- a/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
+++ b/llvm/test/TableGen/RuntimeLibcallEmitter-calling-conv.td
@@ -48,12 +48,18 @@ def MSP430LibraryWithCondCC : SystemRuntimeLibrary;
 // func_a and func_b b
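
Editor's sketch (not part of the patch): the `LibcallImplBitset` class added above packs one availability bit per `RTLIB::LibcallImpl` into an array of 64-bit words, so that `getSupportedLibcallImpl` can ask `isAvailable(Impl)` directly. The standalone example below illustrates the same word-index and bit-mask arithmetic under stated assumptions: the `Impl` enum and its values are hypothetical stand-ins for the TableGen-generated enum, and `ImplBitset` is an illustrative name, not the LLVM class.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for RTLIB::LibcallImpl. The real enum is generated by
// TableGen; these enumerators and values are chosen only for illustration.
enum class Impl : unsigned {
  Unsupported = 0,
  Memcpy = 1,
  Sqrtf = 2,
  StackChkFail = 129, // deliberately placed beyond the first 64-bit word
  NumImpls = 130
};

class ImplBitset {
  using BitWord = std::uint64_t;
  static constexpr unsigned BitWordSize = sizeof(BitWord) * CHAR_BIT;
  // Round up so every enumerator has a bit: ceil(NumImpls / BitWordSize).
  static constexpr std::size_t NumWords =
      (static_cast<unsigned>(Impl::NumImpls) + BitWordSize - 1) / BitWordSize;

  BitWord Bits[NumWords] = {};

  // Mask selecting the bit for I inside its word.
  static constexpr BitWord mask(Impl I) {
    return BitWord(1) << (static_cast<unsigned>(I) % BitWordSize);
  }
  // Index of the word that holds the bit for I.
  static constexpr std::size_t word(Impl I) {
    return static_cast<unsigned>(I) / BitWordSize;
  }

public:
  constexpr ImplBitset() = default;

  // Check whether I is marked available.
  constexpr bool test(Impl I) const { return (Bits[word(I)] & mask(I)) != 0; }
  // Mark I as available.
  void set(Impl I) { Bits[word(I)] |= mask(I); }
  // Mark I as unavailable.
  void unset(Impl I) { Bits[word(I)] &= ~mask(I); }
};
```

With 130 enumerators and 64-bit words the storage is three words; `StackChkFail` (index 129) lands in the third word at bit 1, which exercises the division and modulo paths that a single-word example would not.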

[llvm-branch-commits] [llvm] RuntimeLibcalls: Move __stack_chk_fail config to tablegen (PR #148789)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/148789

>From cdfde2310d5dabadae82e4cd64d385d6919da66d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 15 Jul 2025 15:47:10 +0900
Subject: [PATCH 1/2] RuntimeLibcalls: Move __stack_chk_fail config to tablegen

---
 llvm/include/llvm/IR/RuntimeLibcalls.td | 44 -
 llvm/lib/IR/RuntimeLibcalls.cpp |  4 +--
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.td 
b/llvm/include/llvm/IR/RuntimeLibcalls.td
index f8782d71ddf37..02b5d603a29f4 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.td
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.td
@@ -18,6 +18,7 @@ class DuplicateLibcallImplWithPrefix
 /// Libcall Predicates
 def isOSDarwin : RuntimeLibcallPredicate<"TT.isOSDarwin()">;
 def isOSOpenBSD : RuntimeLibcallPredicate<"TT.isOSOpenBSD()">;
+def isNotOSOpenBSD : RuntimeLibcallPredicate<"!TT.isOSOpenBSD()">;
 def isOSWindows : RuntimeLibcallPredicate<"TT.isOSWindows()">;
 def isNotOSWindows : RuntimeLibcallPredicate<"!TT.isOSWindows()">;
 def isNotOSMSVCRT : RuntimeLibcallPredicate<"!TT.isOSMSVCRT()">;
@@ -705,9 +706,6 @@ foreach lc = LibCalls__atomic in {
   def __#!tolower(!cast(lc)) : RuntimeLibcallImpl;
 }
 
-// Stack Protector Fail
-def __stack_chk_fail : RuntimeLibcallImpl;
-
 // Safe stack.
 def __safestack_pointer_address : 
RuntimeLibcallImpl;
 
@@ -955,6 +953,9 @@ def exp10l_f80 : RuntimeLibcallImpl;
 def exp10l_f128 : RuntimeLibcallImpl;
 def exp10l_ppcf128 : RuntimeLibcallImpl;
 
+// Stack Protector Fail
+def __stack_chk_fail : RuntimeLibcallImpl;
+
 //
 // compiler-rt/libgcc but 64-bit only, not available by default
 //
@@ -1149,6 +1150,8 @@ defvar LibmHasLdexpF80 = LibcallImpls<(add ldexp_f80), 
isNotOSWindowsOrIsCygwinM
 defvar LibmHasFrexpF128 = LibcallImpls<(add frexp_f128), 
isNotOSWindowsOrIsCygwinMinGW>;
 defvar LibmHasLdexpF128 = LibcallImpls<(add ldexp_f128), 
isNotOSWindowsOrIsCygwinMinGW>;
 
+defvar has__stack_chk_fail = LibcallImpls<(add __stack_chk_fail), 
isNotOSOpenBSD>;
+
 
//===--===//
 // Objective-C Runtime Libcalls
 
//===--===//
@@ -1225,7 +1228,8 @@ def AArch64SystemLibrary : SystemRuntimeLibrary<
LibcallImpls<(add bzero), isOSDarwin>,
DarwinExp10, DarwinSinCosStret,
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-   DefaultLibmExp10)
+   DefaultLibmExp10,
+   has__stack_chk_fail)
 >;
 
 // Prepend a # to every name
@@ -1241,7 +1245,7 @@ defset list 
WinArm64ECDefaultRuntimeLibcallImpls = {
 
 def WindowsARM64ECSystemLibrary
 : SystemRuntimeLibrary;
+   (add WinArm64ECDefaultRuntimeLibcallImpls, 
__stack_chk_fail)>;
 
 
//===--===//
 // AMDGPU Runtime Libcalls
@@ -1501,7 +1505,8 @@ def ARMSystemLibrary
// Use divmod compiler-rt calls for iOS 5.0 and later.
LibcallImpls<(add __divmodsi4, __udivmodsi4),
 RuntimeLibcallPredicate<[{TT.isOSBinFormatMachO() &&
-  (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>)> {
+  (!TT.isiOS() || 
!TT.isOSVersionLT(5, 0))}]>>,
+   has__stack_chk_fail)> {
   let DefaultLibcallCallingConv = LibcallCallingConv<[{
  (!TT.isOSDarwin() && !TT.isiOS() && !TT.isWatchOS() && !TT.isDriverKit()) 
?
 (FloatABI == FloatABI::Hard ? CallingConv::ARM_AAPCS_VFP
@@ -1612,7 +1617,7 @@ def HexagonSystemLibrary
 __umoddi3, __divdf3, __muldf3, __divsi3, __subdf3, sqrtf,
 __divdi3, __umodsi3, __moddi3, __modsi3), HexagonLibcalls,
 LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
-exp10f, exp10, exp10l_f128)>;
+exp10f, exp10, exp10l_f128, __stack_chk_fail)>;
 
 
//===--===//
 // Lanai Runtime Libcalls
@@ -1622,7 +1627,8 @@ def isLanai : RuntimeLibcallPredicate<"TT.getArch() == 
Triple::lanai">;
 
 // Use fast calling convention for library functions.
 def LanaiSystemLibrary
-: SystemRuntimeLibrary {
+: SystemRuntimeLibrary {
   let DefaultLibcallCallingConv = FASTCC;
 }
 
@@ -1914,8 +1920,10 @@ def MSP430SystemLibrary
   // TODO: __mspabi_[srli/srai/slli] ARE implemented in libgcc
   __mspabi_srll,
   __mspabi_sral,
-  __mspabi_slll
+  __mspabi_slll,
   // __mspabi_[srlll/srall/s/rlli/rlll] are NOT implemented in libgcc
+
+  __stack_chk_fail
   )
 >;
 
@@ -2006,7 +2014,8 @@ def PPCSystemLibrary
LibmHasSinCosF32, LibmHasSinCosF64, LibmHasSinCosF128,
LibmHasSinCosPPCF128,
Availabl

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/151807

None

>From 7bc4dd4b6d3bf909475316a44f7ef42c11ef4160 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 01:48:31 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  36 +++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.pk16.gfx1250.ll  | 303 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 480 insertions(+)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk16.gfx1250.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 3773031e187c1..9125315310306 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -731,6 +731,12 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_bf8_f32, 
"V2UiV8ff", "nc", "gfx
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f32, "UiV8ff", "nc", 
"gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f16, "UiV8hf", "nc", 
"gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_bf16, "UiV8yf", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f32, "V3UiV16ff", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f32, "V3UiV16ff", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f16, "V3UiV16hf", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f16, "V3UiV16hf", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_bf16, "V3UiV16yf", "nc", 
"gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_bf16, "V3UiV16yf", "nc", 
"gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_bf16, "V2UiV8yUif", 
"nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_bf16, "V2UiV8yUif", 
"nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_f16, "V2UiV8hUif", 
"nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index c25aaf11bb0e1..e50ab77f48c79 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -787,6 +787,36 @@ void test_cvt_scale_pk(global half8 *outh8, global bfloat8 
*outy8, uint2 src2,
 // CHECK-NEXT:[[TMP34:%.*]] = call i32 
@llvm.amdgcn.cvt.scalef32.pk8.fp4.bf16(<8 x bfloat> [[TMP32]], float [[TMP33]])
 // CHECK-NEXT:[[TMP35:%.*]] = load ptr addrspace(1), ptr 
[[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT:store i32 [[TMP34]], ptr addrspace(1) [[TMP35]], align 4
+// CHECK-NEXT:[[TMP36:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP37:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP38:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.bf6.bf16(<16 x bfloat> [[TMP36]], float 
[[TMP37]])
+// CHECK-NEXT:[[TMP39:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP38]], ptr addrspace(1) [[TMP39]], align 
16
+// CHECK-NEXT:[[TMP40:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP41:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP42:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.bf6.f16(<16 x half> [[TMP40]], float [[TMP41]])
+// CHECK-NEXT:[[TMP43:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP42]], ptr addrspace(1) [[TMP43]], align 
16
+// CHECK-NEXT:[[TMP44:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP45:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP46:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.fp6.bf16(<16 x bfloat> [[TMP44]], float 
[[TMP45]])
+// CHECK-NEXT:[[TMP47:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP46]], ptr addrspace(1) [[TMP47]], align 
16
+// CHECK-NEXT:[[TMP48:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP49:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP50:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.fp6.f16(<16 x half> [[TMP48]], float [[TMP49]])
+// CHECK-NEXT:[[TMP51:%.*]] = load 

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite.

* **#151807** 👈
* **#151804**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). 
Learn more about stacking at https://stacking.dev/.


https://github.com/llvm/llvm-project/pull/151807


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151811

>From e911c22a6243b6cbe2932202e6c27c9d75b6e8c9 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:39:28 -0700
Subject: [PATCH] [AMDGPU] Add missing v_permlane_up_b32 test. NFC.

---
 llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 



[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151810

>From ff0a7f87901b9dc50e0fd5ec09f6000c25b5e91f Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:11:34 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  42 
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.sr.pk16.ll   | 232 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 415 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.sr.pk16.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 9125315310306..ced758c814105 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -746,6 +746,12 @@ 
TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_f32, "V2UiV8fUif", "nc",
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f32, "UiV8fUif", "nc", 
"gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f16, "UiV8hUif", "nc", 
"gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_bf16, "UiV8yUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_bf16, "V3UiV16yUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f16, "V3UiV16hUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f32, "V3UiV16fUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_bf16, "V3UiV16yUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f16, "V3UiV16hUif", 
"nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f32, "V3UiV16fUif", 
"nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32_e5m3, "iffiIb", "nc", 
"fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32_e5m3, "ifiiIi", "nc", 
"fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sat_pk4_i4_i8, "UsUi", "nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index e50ab77f48c79..4ff0571239e71 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -929,6 +929,42 @@ void test_cvt_scalef32_pk(global uint2 *out2, bfloat8 
srcbf8, half8 srch8, float
 // CHECK-NEXT:[[TMP43:%.*]] = call i32 
@llvm.amdgcn.cvt.scalef32.sr.pk8.fp4.bf16(<8 x bfloat> [[TMP40]], i32 
[[TMP41]], float [[TMP42]])
 // CHECK-NEXT:[[TMP44:%.*]] = load ptr addrspace(1), ptr 
[[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT:store i32 [[TMP43]], ptr addrspace(1) [[TMP44]], align 4
+// CHECK-NEXT:[[TMP45:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP46:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP47:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP48:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.bf16(<16 x bfloat> [[TMP45]], i32 
[[TMP46]], float [[TMP47]])
+// CHECK-NEXT:[[TMP49:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP48]], ptr addrspace(1) [[TMP49]], align 
16
+// CHECK-NEXT:[[TMP50:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP51:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP52:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP53:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.f16(<16 x half> [[TMP50]], i32 [[TMP51]], 
float [[TMP52]])
+// CHECK-NEXT:[[TMP54:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP53]], ptr addrspace(1) [[TMP54]], align 
16
+// CHECK-NEXT:[[TMP55:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP56:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP57:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP58:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.fp6.bf16(<16 x bfloat> [[TMP55]], i32 
[[TMP56]], float [[TMP57]])
+// CHECK-NEXT:[[TMP59:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP58]], ptr addrspace(1) [[TMP59]], align 
16
+// CHECK-NEXT:[[TMP60:%.*]] = load <16 x h

[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151811

From e911c22a6243b6cbe2932202e6c27c9d75b6e8c9 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:39:28 -0700
Subject: [PATCH] [AMDGPU] Add missing v_permlane_up_b32 test. NFC.

---
 llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [RISCV] vsha2cl intrinsics should select vsha2cl instructions. (PR #151834)

2025-08-02 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/151834

Backport a585d5758847dd7e4cd7d8137bea6c1577c53009

Requested by: @topperc

From 3c543929b1b5e11da820e207960e2f286ddb33ef Mon Sep 17 00:00:00 2001
From: Craig Topper 
Date: Sat, 2 Aug 2025 11:07:34 -0700
Subject: [PATCH] [RISCV] vsha2cl intrinsics should select vsha2cl
 instructions.

Fixes #151814.

(cherry picked from commit a585d5758847dd7e4cd7d8137bea6c1577c53009)
---
 llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td |  4 ++--
 llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td 
b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
index 4147c97a7a23a..92bc3ee8bdacc 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
@@ -1130,13 +1130,13 @@ let Predicates = [HasStdExtZvkned] in {
 
 let Predicates = [HasStdExtZvknha] in {
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ch", "PseudoVSHA2CH", 
I32IntegerVectors>;
-  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CH", 
I32IntegerVectors>;
+  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CL", 
I32IntegerVectors>;
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ms", "PseudoVSHA2MS", 
I32IntegerVectors, isSEWAware=true>;
 } // Predicates = [HasStdExtZvknha]
 
 let Predicates = [HasStdExtZvknhb] in {
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ch", "PseudoVSHA2CH", 
I32I64IntegerVectors>;
-  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CH", 
I32I64IntegerVectors>;
+  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CL", 
I32I64IntegerVectors>;
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ms", "PseudoVSHA2MS", 
I32I64IntegerVectors, isSEWAware=true>;
 } // Predicates = [HasStdExtZvknhb]
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll 
b/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
index f29c74ae69bf6..697c582dcb38b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
@@ -21,7 +21,7 @@ define  
@intrinsic_vsha2cl_vv_nxv4i32_nxv4i32( @llvm.riscv.vsha2cl.nxv4i32.nxv4i32(
@@ -45,7 +45,7 @@ define  
@intrinsic_vsha2cl_vv_nxv8i32_nxv8i32( @llvm.riscv.vsha2cl.nxv8i32.nxv8i32(
@@ -70,7 +70,7 @@ define  
@intrinsic_vsha2cl_vv_nxv16i32_nxv16i32( @llvm.riscv.vsha2cl.nxv16i32.nxv16i32(
@@ -94,7 +94,7 @@ define  
@intrinsic_vsha2cl_vv_nxv4i64_nxv4i64( @llvm.riscv.vsha2cl.nxv4i64.nxv4i64(
@@ -119,7 +119,7 @@ define  
@intrinsic_vsha2cl_vv_nxv8i64_nxv8i64( @llvm.riscv.vsha2cl.nxv8i64.nxv8i64(

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [RISCV] vsha2cl intrinsics should select vsha2cl instructions. (PR #151834)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-risc-v

Author: None (llvmbot)


Changes

Backport a585d5758847dd7e4cd7d8137bea6c1577c53009

Requested by: @topperc

---
Full diff: https://github.com/llvm/llvm-project/pull/151834.diff


2 Files Affected:

- (modified) llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td (+2-2) 
- (modified) llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll (+5-5) 


``diff
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td 
b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
index 4147c97a7a23a..92bc3ee8bdacc 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
@@ -1130,13 +1130,13 @@ let Predicates = [HasStdExtZvkned] in {
 
 let Predicates = [HasStdExtZvknha] in {
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ch", "PseudoVSHA2CH", 
I32IntegerVectors>;
-  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CH", 
I32IntegerVectors>;
+  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CL", 
I32IntegerVectors>;
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ms", "PseudoVSHA2MS", 
I32IntegerVectors, isSEWAware=true>;
 } // Predicates = [HasStdExtZvknha]
 
 let Predicates = [HasStdExtZvknhb] in {
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ch", "PseudoVSHA2CH", 
I32I64IntegerVectors>;
-  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CH", 
I32I64IntegerVectors>;
+  defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2cl", "PseudoVSHA2CL", 
I32I64IntegerVectors>;
   defm : VPatBinaryV_VV_NoMask<"int_riscv_vsha2ms", "PseudoVSHA2MS", 
I32I64IntegerVectors, isSEWAware=true>;
 } // Predicates = [HasStdExtZvknhb]
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll 
b/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
index f29c74ae69bf6..697c582dcb38b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsha2cl.ll
@@ -21,7 +21,7 @@ define  
@intrinsic_vsha2cl_vv_nxv4i32_nxv4i32( @llvm.riscv.vsha2cl.nxv4i32.nxv4i32(
@@ -45,7 +45,7 @@ define  
@intrinsic_vsha2cl_vv_nxv8i32_nxv8i32( @llvm.riscv.vsha2cl.nxv8i32.nxv8i32(
@@ -70,7 +70,7 @@ define  
@intrinsic_vsha2cl_vv_nxv16i32_nxv16i32( @llvm.riscv.vsha2cl.nxv16i32.nxv16i32(
@@ -94,7 +94,7 @@ define  
@intrinsic_vsha2cl_vv_nxv4i64_nxv4i64( @llvm.riscv.vsha2cl.nxv4i64.nxv4i64(
@@ -119,7 +119,7 @@ define  
@intrinsic_vsha2cl_vv_nxv8i64_nxv8i64( @llvm.riscv.vsha2cl.nxv8i64.nxv8i64(

``




https://github.com/llvm/llvm-project/pull/151834
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: [RISCV] vsha2cl intrinsics should select vsha2cl instructions. (PR #151834)

2025-08-02 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/151834
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (PR #151846)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix edited 
https://github.com/llvm/llvm-project/pull/151846
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (PR #151846)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

bassiounix wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite
> (https://app.graphite.dev/github/pr/llvm/llvm-project/151846?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#151846** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/151846?utm_source=stack-comment-view-in-graphite)
* **#151837**
* **#151779**
* **#151399**
* **#151012**
* **#150993**
* **#150968**
* **#150868**
* **#150854**
* **#150852**
* **#150849**
* **#150843**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn 
more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/151846
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (PR #151846)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix ready_for_review 
https://github.com/llvm/llvm-project/pull/151846
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (PR #151846)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix created 
https://github.com/llvm/llvm-project/pull/151846

None

From ea293895fc08b57dd21e46744298e4631c4c9bc7 Mon Sep 17 00:00:00 2001
From: bassiounix 
Date: Sun, 3 Aug 2025 07:05:00 +0300
Subject: [PATCH] [libc][math] Refactor cbrtf implementation to header-only in
 src/__support/math folder.

---
 libc/shared/math.h|   1 +
 libc/shared/math/cbrtf.h  |  23 +++
 libc/src/__support/math/CMakeLists.txt|  11 ++
 libc/src/__support/math/cbrtf.h   | 161 ++
 libc/src/math/generic/CMakeLists.txt  |   6 +-
 libc/src/math/generic/cbrtf.cpp   | 147 +---
 libc/test/shared/CMakeLists.txt   |   1 +
 libc/test/shared/shared_math_test.cpp |   1 +
 .../llvm-project-overlay/libc/BUILD.bazel |  14 +-
 9 files changed, 214 insertions(+), 151 deletions(-)
 create mode 100644 libc/shared/math/cbrtf.h
 create mode 100644 libc/src/__support/math/cbrtf.h

diff --git a/libc/shared/math.h b/libc/shared/math.h
index 3714f380a27dc..ea645f0afedbc 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -31,6 +31,7 @@
 #include "math/atanhf.h"
 #include "math/atanhf16.h"
 #include "math/cbrt.h"
+#include "math/cbrtf.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cbrtf.h b/libc/shared/math/cbrtf.h
new file mode 100644
index 0..09b86bed3fb7e
--- /dev/null
+++ b/libc/shared/math/cbrtf.h
@@ -0,0 +1,23 @@
+//===-- Shared cbrtf function ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SHARED_MATH_CBRTF_H
+#define LIBC_SHARED_MATH_CBRTF_H
+
+#include "shared/libc_common.h"
+#include "src/__support/math/cbrtf.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cbrtf;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_SHARED_MATH_CBRTF_H
diff --git a/libc/src/__support/math/CMakeLists.txt 
b/libc/src/__support/math/CMakeLists.txt
index e1076edf1e61c..fe928a8fadd5e 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -346,6 +346,17 @@ add_header_library(
 libc.src.__support.integer_literals
 )
 
+add_header_library(
+  cbrtf
+  HDRS
+cbrtf.h
+  DEPENDS
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.macros.optimization
+)
+
 add_header_library(
   erff
   HDRS
diff --git a/libc/src/__support/math/cbrtf.h b/libc/src/__support/math/cbrtf.h
new file mode 100644
index 0..f82892bbbe61b
--- /dev/null
+++ b/libc/src/__support/math/cbrtf.h
@@ -0,0 +1,161 @@
+//===-- Implementation header for cbrtf -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SRC___SUPPORT_MATH_CBRTF_H
+#define LIBC_SRC___SUPPORT_MATH_CBRTF_H
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {
+
+LIBC_INLINE static constexpr float cbrtf(float x) {
+  // Look up table for 2^(i/3) for i = 0, 1, 2.
+  constexpr double CBRT2[3] = {1.0, 0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0};
+
+  // Degree-7 polynomials approximation of ((1 + x)^(1/3) - 1)/x for 0 <= x <= 
1
+  // generated by Sollya with:
+  // > for i from 0 to 15 do {
+  // P = fpminimax(((1 + x)^(1/3) - 1)/x, 6, [|D...|], [i/16, (i + 1)/16]);
+  // print("{", coeff(P, 0), ",", coeff(P, 1), ",", coeff(P, 2), ",",
+  //   coeff(P, 3), ",", coeff(P, 4), ",", coeff(P, 5), ",",
+  //   coeff(P, 6), "},");
+  // };
+  // Then (1 + x)^(1/3) ~ 1 + x * P(x).
+  constexpr double COEFFS[16][7] = {
+  {0x1.554ebp-2, -0x1.c71c71c678c0cp-4, 0x1.f9add2776de81p-5,
+   -0x1.511e10aa964a7p-5, 0x1.ee44165937fa2p-6, -0x1.7c5c9e059345dp-6,
+   0x1.047f75e0aff14p-6},
+  {0x1.554d1149ap-2, -0x1.c71c676fcb5bp-4, 0x1.f9ab127dc57ebp-5,
+   -0x1.50ea8fd1d4c15p-5, 0x1.e9d68f28ced43p-6, -0x1.60e0e1e661311p-6,
+   0x1.716eca1d6e3bcp-7},
+  {0x1.546377d45p-2, -0x1.c71bc1c6d49d2p-4, 0x1.f9924cc0ed24dp-5,
+   -0x1.4fea3beb53b3bp-5, 0x1.de028a9a07b1bp-6, -0x1.3b090d2233524p-6,
+   0x1.0ae

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (PR #151846)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libc

Author: Muhammad Bassiouni (bassiounix)


Changes

Part of #147386

in preparation for: 
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
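
As a rough illustration of why a header-only `constexpr` implementation matters for the RFC linked above, here is a minimal sketch that is not taken from the PR. `sketch::cbrt_newton` is a hypothetical stand-in: it uses plain Newton iteration rather than the table-and-polynomial method in `cbrtf.h`, and it is only intended for inputs of moderate magnitude. Because the function is `constexpr` and fully defined in a header, a caller can evaluate it in a `static_assert`.

```cpp
// Hypothetical example; names are illustrative and not part of LLVM libc.
namespace sketch {

// Cube root by Newton iteration on f(y) = y^3 - a.
// Assumption: |x| is of moderate size, so y * y does not overflow.
constexpr double cbrt_newton(double x) {
  if (x == 0.0)
    return x; // preserves the zero input as-is
  const double a = x < 0.0 ? -x : x;
  // Start at or above the root so the iteration decreases monotonically.
  double y = a > 1.0 ? a : 1.0;
  for (int i = 0; i < 200; ++i)
    y = (2.0 * y + a / (y * y)) / 3.0;
  return x < 0.0 ? -y : y;
}

} // namespace sketch

// Compile-time evaluation, which a definition hidden in a .cpp file
// would not allow.
static_assert(sketch::cbrt_newton(27.0) > 2.999999 &&
                  sketch::cbrt_newton(27.0) < 3.000001,
              "cube root of 27 should be close to 3");
```

The same property is what lets a `using math::cbrtf;` declaration in a shared header expose the function to other projects without linking against a compiled library.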

---
Full diff: https://github.com/llvm/llvm-project/pull/151846.diff


9 Files Affected:

- (modified) libc/shared/math.h (+1) 
- (added) libc/shared/math/cbrtf.h (+23) 
- (modified) libc/src/__support/math/CMakeLists.txt (+11) 
- (added) libc/src/__support/math/cbrtf.h (+161) 
- (modified) libc/src/math/generic/CMakeLists.txt (+1-5) 
- (modified) libc/src/math/generic/cbrtf.cpp (+2-145) 
- (modified) libc/test/shared/CMakeLists.txt (+1) 
- (modified) libc/test/shared/shared_math_test.cpp (+1) 
- (modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+13-1) 


``diff
diff --git a/libc/shared/math.h b/libc/shared/math.h
index 3714f380a27dc..ea645f0afedbc 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -31,6 +31,7 @@
 #include "math/atanhf.h"
 #include "math/atanhf16.h"
 #include "math/cbrt.h"
+#include "math/cbrtf.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cbrtf.h b/libc/shared/math/cbrtf.h
new file mode 100644
index 0..09b86bed3fb7e
--- /dev/null
+++ b/libc/shared/math/cbrtf.h
@@ -0,0 +1,23 @@
+//===-- Shared cbrtf function ---*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SHARED_MATH_CBRTF_H
+#define LIBC_SHARED_MATH_CBRTF_H
+
+#include "shared/libc_common.h"
+#include "src/__support/math/cbrtf.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cbrtf;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_SHARED_MATH_CBRTF_H
diff --git a/libc/src/__support/math/CMakeLists.txt 
b/libc/src/__support/math/CMakeLists.txt
index e1076edf1e61c..fe928a8fadd5e 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -346,6 +346,17 @@ add_header_library(
 libc.src.__support.integer_literals
 )
 
+add_header_library(
+  cbrtf
+  HDRS
+cbrtf.h
+  DEPENDS
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.macros.optimization
+)
+
 add_header_library(
   erff
   HDRS
diff --git a/libc/src/__support/math/cbrtf.h b/libc/src/__support/math/cbrtf.h
new file mode 100644
index 0..f82892bbbe61b
--- /dev/null
+++ b/libc/src/__support/math/cbrtf.h
@@ -0,0 +1,161 @@
+//===-- Implementation header for cbrtf -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SRC___SUPPORT_MATH_CBRTF_H
+#define LIBC_SRC___SUPPORT_MATH_CBRTF_H
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {
+
+LIBC_INLINE static constexpr float cbrtf(float x) {
+  // Look up table for 2^(i/3) for i = 0, 1, 2.
+  constexpr double CBRT2[3] = {1.0, 0x1.428a2f98d728bp0, 0x1.965fea53d6e3dp0};
+
+  // Degree-7 polynomials approximation of ((1 + x)^(1/3) - 1)/x for 0 <= x <= 
1
+  // generated by Sollya with:
+  // > for i from 0 to 15 do {
+  // P = fpminimax(((1 + x)^(1/3) - 1)/x, 6, [|D...|], [i/16, (i + 1)/16]);
+  // print("{", coeff(P, 0), ",", coeff(P, 1), ",", coeff(P, 2), ",",
+  //   coeff(P, 3), ",", coeff(P, 4), ",", coeff(P, 5), ",",
+  //   coeff(P, 6), "},");
+  // };
+  // Then (1 + x)^(1/3) ~ 1 + x * P(x).
+  constexpr double COEFFS[16][7] = {
+  {0x1.554ebp-2, -0x1.c71c71c678c0cp-4, 0x1.f9add2776de81p-5,
+   -0x1.511e10aa964a7p-5, 0x1.ee44165937fa2p-6, -0x1.7c5c9e059345dp-6,
+   0x1.047f75e0aff14p-6},
+  {0x1.554d1149ap-2, -0x1.c71c676fcb5bp-4, 0x1.f9ab127dc57ebp-5,
+   -0x1.50ea8fd1d4c15p-5, 0x1.e9d68f28ced43p-6, -0x1.60e0e1e661311p-6,
+   0x1.716eca1d6e3bcp-7},
+  {0x1.546377d45p-2, -0x1.c71bc1c6d49d2p-4, 0x1.f9924cc0ed24dp-5,
+   -0x1.4fea3beb53b3bp-5, 0x1.de028a9a07b1bp-6, -0x1.3b090d2233524p-6,
+   0x1.0aeca34893785p-7},
+  {0x1.4dce9f649p-2, -0x1.c7188b34b98f8p-4, 0x1.f93e1af34af49p-5,
+   -0x1.4d9a06be75c63p-5, 0x1.cb943f4f68992p-6, -0x1.139a

[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151811

From f5894074b3be2d7b02950d62cb2f82ebfcc63174 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:39:28 -0700
Subject: [PATCH] [AMDGPU] Add missing v_permlane_up_b32 test. NFC.

---
 llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt 
b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
index 5fdc973ef216c..4b44c27570af5 100644
--- a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
+++ b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt
@@ -869,6 +869,9 @@
 0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01
 # GFX1250: v_permlane_down_b32 v5, v1, vcc_lo, m0  ; encoding: 
[0x05,0x00,0x72,0xd6,0x01,0xd5,0xf4,0x01]
 
+0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01
+# GFX1250: v_permlane_up_b32 v5, v1, exec_hi, vcc_lo ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xff,0xa8,0x01]
+
 0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03
 # GFX1250: v_permlane_up_b32 v5, v1, exec_lo, src_scc ; encoding: 
[0x05,0x00,0x71,0xd6,0x01,0xfd,0xf4,0x03]
 

___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrt implementation to header-only in src/__support/math folder. (PR #151837)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix created 
https://github.com/llvm/llvm-project/pull/151837

None

>From 16142801a79a1bbde7208deb738420829f0a1282 Mon Sep 17 00:00:00 2001
From: bassiounix 
Date: Sat, 2 Aug 2025 22:51:43 +0300
Subject: [PATCH] [libc][math] Refactor cbrt implementation to header-only in
 src/__support/math folder.

---
 libc/shared/math.h|   1 +
 libc/shared/math/cbrt.h   |  23 ++
 libc/src/__support/math/CMakeLists.txt|  15 +
 libc/src/__support/math/cbrt.h| 349 ++
 libc/src/math/generic/CMakeLists.txt  |  10 +-
 libc/src/math/generic/cbrt.cpp| 328 +---
 libc/test/shared/CMakeLists.txt   |   1 +
 libc/test/shared/shared_math_test.cpp |   1 +
 .../llvm-project-overlay/libc/BUILD.bazel |  16 +-
 9 files changed, 406 insertions(+), 338 deletions(-)
 create mode 100644 libc/shared/math/cbrt.h
 create mode 100644 libc/src/__support/math/cbrt.h

diff --git a/libc/shared/math.h b/libc/shared/math.h
index 7fb736b78efa5..3714f380a27dc 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -30,6 +30,7 @@
 #include "math/atanf16.h"
 #include "math/atanhf.h"
 #include "math/atanhf16.h"
+#include "math/cbrt.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cbrt.h b/libc/shared/math/cbrt.h
new file mode 100644
index 0..2f49dbd364328
--- /dev/null
+++ b/libc/shared/math/cbrt.h
@@ -0,0 +1,23 @@
+//===-- Shared cbrt function *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_CBRT_H
+#define LLVM_LIBC_SHARED_MATH_CBRT_H
+
+#include "shared/libc_common.h"
+#include "src/__support/math/cbrt.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cbrt;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SHARED_MATH_CBRT_H
\ No newline at end of file
diff --git a/libc/src/__support/math/CMakeLists.txt 
b/libc/src/__support/math/CMakeLists.txt
index 9631ab5be7d3b..e1076edf1e61c 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -331,6 +331,21 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cbrt
+  HDRS
+cbrt.h
+  DEPENDS
+libc.src.__support.FPUtil.double_double
+libc.src.__support.FPUtil.dyadic_float
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.macros.optimization
+libc.src.__support.integer_literals
+)
+
 add_header_library(
   erff
   HDRS
diff --git a/libc/src/__support/math/cbrt.h b/libc/src/__support/math/cbrt.h
new file mode 100644
index 0..2b9a73c823b14
--- /dev/null
+++ b/libc/src/__support/math/cbrt.h
@@ -0,0 +1,349 @@
+//===-- Implementation header for cbrt --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SRC___SUPPORT_MATH_CBRT_H
+#define LIBC_SRC___SUPPORT_MATH_CBRT_H
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/integer_literals.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {
+
+#if ((LIBC_MATH & LIBC_MATH_SKIP_ACCURATE_PASS) != 0)
+#define LIBC_MATH_CBRT_SKIP_ACCURATE_PASS
+#endif
+
+namespace cbrt_internal {
+using namespace fputil;
+
+// Initial approximation of x^(-2/3) for 1 <= x < 2.
+// Polynomial generated by Sollya with:
+// > P = fpminimax(x^(-2/3), 7, [|D...|], [1, 2]);
+// > dirtyinfnorm(P/x^(-2/3) - 1, [1, 2]);
+// 0x1.28...p-21
+LIBC_INLINE static double intial_approximation(double x) {
+  constexpr double COEFFS[8] = {
+  0x1.bc52aedead5c6p1,  -0x1.b52bfebf110b3p2,  0x1.1d8d71d53d126p3,
+  -0x1.de2db9e81cf87p2, 0x1.0154ca06153bdp2,   -0x1.5973c66ee6da7p0,
+  0x1.07bf6ac832552p-2, -0x1.5e53d9ce41cb8p-6,
+  };
+
+  double x_sq = x * x;
+
+  double c0 = fputil::multiply_add(x, COEFFS[1], COEFFS[0]);
+  double c1 = fputil::multiply_add(x, COEFFS[3], COEFFS[2]);
+  d

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrt implementation to header-only in src/__support/math folder. (PR #151837)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:

@llvm/pr-subscribers-libc

Author: Muhammad Bassiouni (bassiounix)

Changes

---

Patch is 30.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151837.diff

9 Files Affected:

- (modified) libc/shared/math.h (+1) 
- (added) libc/shared/math/cbrt.h (+23) 
- (modified) libc/src/__support/math/CMakeLists.txt (+15) 
- (added) libc/src/__support/math/cbrt.h (+349) 
- (modified) libc/src/math/generic/CMakeLists.txt (+1-9) 
- (modified) libc/src/math/generic/cbrt.cpp (+2-326) 
- (modified) libc/test/shared/CMakeLists.txt (+1) 
- (modified) libc/test/shared/shared_math_test.cpp (+1) 
- (modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+13-3) 


``diff
diff --git a/libc/shared/math.h b/libc/shared/math.h
index 7fb736b78efa5..3714f380a27dc 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -30,6 +30,7 @@
 #include "math/atanf16.h"
 #include "math/atanhf.h"
 #include "math/atanhf16.h"
+#include "math/cbrt.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cbrt.h b/libc/shared/math/cbrt.h
new file mode 100644
index 0..2f49dbd364328
--- /dev/null
+++ b/libc/shared/math/cbrt.h
@@ -0,0 +1,23 @@
+//===-- Shared cbrt function *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_CBRT_H
+#define LLVM_LIBC_SHARED_MATH_CBRT_H
+
+#include "shared/libc_common.h"
+#include "src/__support/math/cbrt.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cbrt;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SHARED_MATH_CBRT_H
\ No newline at end of file
diff --git a/libc/src/__support/math/CMakeLists.txt 
b/libc/src/__support/math/CMakeLists.txt
index 9631ab5be7d3b..e1076edf1e61c 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -331,6 +331,21 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cbrt
+  HDRS
+cbrt.h
+  DEPENDS
+libc.src.__support.FPUtil.double_double
+libc.src.__support.FPUtil.dyadic_float
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.macros.optimization
+libc.src.__support.integer_literals
+)
+
 add_header_library(
   erff
   HDRS
diff --git a/libc/src/__support/math/cbrt.h b/libc/src/__support/math/cbrt.h
new file mode 100644
index 0..2b9a73c823b14
--- /dev/null
+++ b/libc/src/__support/math/cbrt.h
@@ -0,0 +1,349 @@
+//===-- Implementation header for cbrt --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LIBC_SRC___SUPPORT_MATH_CBRT_H
+#define LIBC_SRC___SUPPORT_MATH_CBRT_H
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/integer_literals.h"
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {
+
+#if ((LIBC_MATH & LIBC_MATH_SKIP_ACCURATE_PASS) != 0)
+#define LIBC_MATH_CBRT_SKIP_ACCURATE_PASS
+#endif
+
+namespace cbrt_internal {
+using namespace fputil;
+
+// Initial approximation of x^(-2/3) for 1 <= x < 2.
+// Polynomial generated by Sollya with:
+// > P = fpminimax(x^(-2/3), 7, [|D...|], [1, 2]);
+// > dirtyinfnorm(P/x^(-2/3) - 1, [1, 2]);
+// 0x1.28...p-21
+LIBC_INLINE static double intial_approximation(double x) {
+  constexpr double COEFFS[8] = {
+  0x1.bc52aedead5c6p1,  -0x1.b52bfebf110b3p2,  0x1.1d8d71d53d126p3,
+  -0x1.de2db9e81cf87p2, 0x1.0154ca06153bdp2,   -0x1.5973c66ee6da7p0,
+  0x1.07bf6ac832552p-2, -0x1.5e53d9ce41cb8p-6,
+  };
+
+  double x_sq = x * x;
+
+  double c0 = fputil::multiply_add(x, COEFFS[1], COEFFS[0]);
+  double c1 = fputil::multiply_add(x, COEFFS[3], COEFFS[2]);
+  double c2 = fputil::multiply_add(x, COEFFS[5], COEFFS[4]);
+  double c3 = fputil::multiply_add(x, COEFFS[7], COEFFS[6]);
+
+  double x_4 = x_sq * x_sq;
+  double d0 = fputil::multiply_add(x_sq, c1, c0);
+  double d1 = fputil::multiply_add(x_sq, c3, c2);
+
+  return fputil:
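
The truncated hunk above evaluates the degree-7 polynomial by pairing coefficients (`c0`..`c3`), then combining the pairs with `x_sq` and `x_4`. This pattern is commonly known as Estrin's scheme. The following is a minimal standalone sketch of that evaluation order, not the patch's code: it uses `std::fma` in place of `fputil::multiply_add`, the names `estrin_deg7` and `horner_deg7` are hypothetical, and the final combining step is an assumption because the patch text is cut off before its `return` statement. The coefficients are copied from the hunk.

```cpp
#include <cmath>

// Coefficients copied from the patch, lowest degree first.
// According to the patch comment they approximate x^(-2/3) on [1, 2].
constexpr double COEFFS[8] = {
    0x1.bc52aedead5c6p1,  -0x1.b52bfebf110b3p2, 0x1.1d8d71d53d126p3,
    -0x1.de2db9e81cf87p2, 0x1.0154ca06153bdp2,  -0x1.5973c66ee6da7p0,
    0x1.07bf6ac832552p-2, -0x1.5e53d9ce41cb8p-6,
};

// Estrin-style evaluation: pair neighbouring coefficients, then merge the
// pairs with x^2 and x^4. The four first-level FMAs are independent of each
// other, so a CPU can overlap them.
inline double estrin_deg7(double x, const double (&c)[8]) {
  double x_sq = x * x;
  double x_4 = x_sq * x_sq;

  double c0 = std::fma(x, c[1], c[0]); // c[0] + c[1]*x
  double c1 = std::fma(x, c[3], c[2]); // c[2] + c[3]*x
  double c2 = std::fma(x, c[5], c[4]); // c[4] + c[5]*x
  double c3 = std::fma(x, c[7], c[6]); // c[6] + c[7]*x

  double d0 = std::fma(x_sq, c1, c0); // degrees 0..3
  double d1 = std::fma(x_sq, c3, c2); // degrees 4..7, still to be scaled by x^4

  // Assumed final step (the patch text is truncated before this point).
  return std::fma(x_4, d1, d0);
}

// Reference: Horner's rule, a fully sequential chain of seven FMAs.
inline double horner_deg7(double x, const double (&c)[8]) {
  double acc = c[7];
  for (int i = 6; i >= 0; --i)
    acc = std::fma(acc, x, c[i]);
  return acc;
}
```

Both functions compute the same polynomial up to rounding. The Estrin form trades a couple of extra multiplications for a shorter dependency chain, which is presumably why it suits a latency-sensitive initial approximation.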

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrt implementation to header-only in src/__support/math folder. (PR #151837)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix ready_for_review 
https://github.com/llvm/llvm-project/pull/151837


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrt implementation to header-only in src/__support/math folder. (PR #151837)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

bassiounix wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151837?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#151837** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/151837?utm_source=stack-comment-view-in-graphite)
* **#151779** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151779?utm_source=stack-comment-icon)
* **#151399** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151399?utm_source=stack-comment-icon)
* **#151012** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151012?utm_source=stack-comment-icon)
* **#150993** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150993?utm_source=stack-comment-icon)
* **#150968** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150968?utm_source=stack-comment-icon)
* **#150868** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150868?utm_source=stack-comment-icon)
* **#150854** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150854?utm_source=stack-comment-icon)
* **#150852** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150852?utm_source=stack-comment-icon)
* **#150849** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150849?utm_source=stack-comment-icon)
* **#150843** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150843?utm_source=stack-comment-icon)
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/151837


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cbrt implementation to header-only in src/__support/math folder. (PR #151837)

2025-08-02 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix edited 
https://github.com/llvm/llvm-project/pull/151837


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/151807


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/151810


[llvm-branch-commits] [llvm] [AMDGPU] Add missing v_permlane_up_b32 test. NFC. (PR #151811)

2025-08-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/151811


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151807

>From 792b72892f4c359330472eca4926ec1f7fe6aec2 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 01:48:31 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  36 +++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.pk16.gfx1250.ll  | 303 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 480 insertions(+)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk16.gfx1250.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 3773031e187c1..9125315310306 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -731,6 +731,12 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_bf8_f32, "V2UiV8ff", "nc", "gfx
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f32, "UiV8ff", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f16, "UiV8hf", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_bf16, "UiV8yf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f32, "V3UiV16ff", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f32, "V3UiV16ff", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f16, "V3UiV16hf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f16, "V3UiV16hf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_bf16, "V3UiV16yf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_bf16, "V3UiV16yf", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_bf16, "V2UiV8yUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_bf16, "V2UiV8yUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_f16, "V2UiV8hUif", "nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index c25aaf11bb0e1..e50ab77f48c79 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -787,6 +787,36 @@ void test_cvt_scale_pk(global half8 *outh8, global bfloat8 
*outy8, uint2 src2,
 // CHECK-NEXT:[[TMP34:%.*]] = call i32 
@llvm.amdgcn.cvt.scalef32.pk8.fp4.bf16(<8 x bfloat> [[TMP32]], float [[TMP33]])
 // CHECK-NEXT:[[TMP35:%.*]] = load ptr addrspace(1), ptr 
[[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT:store i32 [[TMP34]], ptr addrspace(1) [[TMP35]], align 4
+// CHECK-NEXT:[[TMP36:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP37:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP38:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.bf6.bf16(<16 x bfloat> [[TMP36]], float 
[[TMP37]])
+// CHECK-NEXT:[[TMP39:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP38]], ptr addrspace(1) [[TMP39]], align 
16
+// CHECK-NEXT:[[TMP40:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP41:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP42:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.bf6.f16(<16 x half> [[TMP40]], float [[TMP41]])
+// CHECK-NEXT:[[TMP43:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP42]], ptr addrspace(1) [[TMP43]], align 
16
+// CHECK-NEXT:[[TMP44:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP45:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP46:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.fp6.bf16(<16 x bfloat> [[TMP44]], float 
[[TMP45]])
+// CHECK-NEXT:[[TMP47:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP46]], ptr addrspace(1) [[TMP47]], align 
16
+// CHECK-NEXT:[[TMP48:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP49:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP50:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.pk16.fp6.f16(<16 x half> [[TMP48]], float [[TMP49]])
+// CHECK-NEXT:[[TMP51:%.*]] = load ptr ad
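
The `TARGET_BUILTIN` entries in this patch describe each builtin's prototype with a compact type string such as `"V3UiV16ff"`: return type first, then the parameters. The sketch below decodes the small subset of that encoding that appears in these hunks (`V<n>` for a vector of `n` elements, `U` for unsigned, `I` for a required integer constant expression, and single-letter base types). It is an illustrative helper, not part of Clang; the function names `decode_type` and `decode_signature` and the output spelling are made up here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Decode one type starting at sig[pos] and advance pos past it.
// Only the letters used in the hunks above (plus a few close relatives)
// are handled; anything else is reported as "?<letter>".
inline std::string decode_type(const std::string &sig, std::size_t &pos) {
  if (pos >= sig.size())
    return "?";
  char c = sig[pos++];
  switch (c) {
  case 'V': { // vector: element count, then the element type
    std::size_t lanes = 0;
    while (pos < sig.size() &&
           std::isdigit(static_cast<unsigned char>(sig[pos])))
      lanes = lanes * 10 + static_cast<std::size_t>(sig[pos++] - '0');
    return "<" + std::to_string(lanes) + " x " + decode_type(sig, pos) + ">";
  }
  case 'U': return "unsigned " + decode_type(sig, pos);
  case 'I': return "constant " + decode_type(sig, pos);
  case 'v': return "void";
  case 'b': return "bool";
  case 'c': return "char";
  case 's': return "short";
  case 'i': return "int";
  case 'h': return "half";
  case 'y': return "__bf16";
  case 'f': return "float";
  case 'd': return "double";
  default:  return std::string("?") + c;
  }
}

// Split a whole prototype string into its types: element 0 is the return
// type, the remaining elements are the parameters in order.
inline std::vector<std::string> decode_signature(const std::string &sig) {
  std::vector<std::string> out;
  std::size_t pos = 0;
  while (pos < sig.size())
    out.push_back(decode_type(sig, pos));
  return out;
}
```

For example, `decode_signature("V3UiV16ff")` yields `<3 x unsigned int>`, `<16 x float>`, `float`, which lines up with the `<3 x i32> @llvm.amdgcn.cvt.scalef32.pk16.*(<16 x ...>, float)` calls in the CHECK lines above.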

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151810

>From dad0929b323eb7dde3211a43a4b14170fee5d56c Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:11:34 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  42 
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.sr.pk16.ll   | 232 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 415 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.sr.pk16.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 9125315310306..ced758c814105 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -746,6 +746,12 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_f32, "V2UiV8fUif", "nc",
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f32, "UiV8fUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f16, "UiV8hUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_bf16, "UiV8yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_bf16, "V3UiV16yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f16, "V3UiV16hUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f32, "V3UiV16fUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_bf16, "V3UiV16yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f16, "V3UiV16hUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f32, "V3UiV16fUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32_e5m3, "iffiIb", "nc", "fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32_e5m3, "ifiiIi", "nc", "fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sat_pk4_i4_i8, "UsUi", "nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index e50ab77f48c79..4ff0571239e71 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -929,6 +929,42 @@ void test_cvt_scalef32_pk(global uint2 *out2, bfloat8 
srcbf8, half8 srch8, float
 // CHECK-NEXT:[[TMP43:%.*]] = call i32 
@llvm.amdgcn.cvt.scalef32.sr.pk8.fp4.bf16(<8 x bfloat> [[TMP40]], i32 
[[TMP41]], float [[TMP42]])
 // CHECK-NEXT:[[TMP44:%.*]] = load ptr addrspace(1), ptr 
[[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT:store i32 [[TMP43]], ptr addrspace(1) [[TMP44]], align 4
+// CHECK-NEXT:[[TMP45:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP46:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP47:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP48:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.bf16(<16 x bfloat> [[TMP45]], i32 
[[TMP46]], float [[TMP47]])
+// CHECK-NEXT:[[TMP49:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP48]], ptr addrspace(1) [[TMP49]], align 
16
+// CHECK-NEXT:[[TMP50:%.*]] = load <16 x half>, ptr 
[[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP51:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP52:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP53:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.f16(<16 x half> [[TMP50]], i32 [[TMP51]], 
float [[TMP52]])
+// CHECK-NEXT:[[TMP54:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP53]], ptr addrspace(1) [[TMP54]], align 
16
+// CHECK-NEXT:[[TMP55:%.*]] = load <16 x bfloat>, ptr 
[[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT:[[TMP56:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT:[[TMP57:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 
4
+// CHECK-NEXT:[[TMP58:%.*]] = call <3 x i32> 
@llvm.amdgcn.cvt.scalef32.sr.pk16.fp6.bf16(<16 x bfloat> [[TMP55]], i32 
[[TMP56]], float [[TMP57]])
+// CHECK-NEXT:[[TMP59:%.*]] = load ptr addrspace(1), ptr 
[[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT:store <3 x i32> [[TMP58]], ptr addrspace(1) [[TMP59]], align 
16
+// CHECK-NEXT:[[TMP60:%.*]] = load <16 x h

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/151810

None

>From dfe439a1ed94031e238a3acd558cb0035b74e97b Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:11:34 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  42 
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.sr.pk16.ll   | 232 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 415 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.sr.pk16.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 9125315310306..ced758c814105 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -746,6 +746,12 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_f32, "V2UiV8fUif", "nc",
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f32, "UiV8fUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_f16, "UiV8hUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp4_bf16, "UiV8yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_bf16, "V3UiV16yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f16, "V3UiV16hUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_bf6_f32, "V3UiV16fUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_bf16, "V3UiV16yUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f16, "V3UiV16hUif", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk16_fp6_f32, "V3UiV16fUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_pk_fp8_f32_e5m3, "iffiIb", "nc", "fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_sr_fp8_f32_e5m3, "ifiiIi", "nc", "fp8e5m3-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sat_pk4_i4_i8, "UsUi", "nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index e50ab77f48c79..4ff0571239e71 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -929,6 +929,42 @@ void test_cvt_scalef32_pk(global uint2 *out2, bfloat8 srcbf8, half8 srch8, float
 // CHECK-NEXT: [[TMP43:%.*]] = call i32 @llvm.amdgcn.cvt.scalef32.sr.pk8.fp4.bf16(<8 x bfloat> [[TMP40]], i32 [[TMP41]], float [[TMP42]])
 // CHECK-NEXT: [[TMP44:%.*]] = load ptr addrspace(1), ptr [[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT: store i32 [[TMP43]], ptr addrspace(1) [[TMP44]], align 4
+// CHECK-NEXT: [[TMP45:%.*]] = load <16 x bfloat>, ptr [[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP46:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP47:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP48:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.bf16(<16 x bfloat> [[TMP45]], i32 [[TMP46]], float [[TMP47]])
+// CHECK-NEXT: [[TMP49:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP48]], ptr addrspace(1) [[TMP49]], align 16
+// CHECK-NEXT: [[TMP50:%.*]] = load <16 x half>, ptr [[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP51:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP52:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP53:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.f16(<16 x half> [[TMP50]], i32 [[TMP51]], float [[TMP52]])
+// CHECK-NEXT: [[TMP54:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP53]], ptr addrspace(1) [[TMP54]], align 16
+// CHECK-NEXT: [[TMP55:%.*]] = load <16 x bfloat>, ptr [[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP56:%.*]] = load i32, ptr [[SR_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP57:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP58:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.sr.pk16.fp6.bf16(<16 x bfloat> [[TMP55]], i32 [[TMP56]], float [[TMP57]])
+// CHECK-NEXT: [[TMP59:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP58]], ptr addrspace(1) [[TMP59]], align 16
+// CHECK-NEXT: [[TMP60:%.*]] = load <

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite.

* **#151810** 👈 (View in Graphite)
* **#151807**
* **#151804**
* `main`

This stack of pull requests is managed by Graphite.


https://github.com/llvm/llvm-project/pull/151810


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Stanislav Mekhanoshin (rampitec)



[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)



[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Stanislav Mekhanoshin (rampitec)



[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/151810

>From dad0929b323eb7dde3211a43a4b14170fee5d56c Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Sat, 2 Aug 2025 02:11:34 -0700
Subject: [PATCH] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   6 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx1250.cl  |  42 
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   6 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   3 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  12 +
 .../llvm.amdgcn.cvt.scalef32.sr.pk16.ll   | 232 ++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s |  36 +++
 llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s|  36 +++
 .../Disassembler/AMDGPU/gfx1250_dasm_vop3.txt |  36 +++
 10 files changed, 415 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.sr.pk16.ll


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (PR #151810)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/151810
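
For readers unfamiliar with the prototype strings in the `TARGET_BUILTIN` entries of PR #151810 (for example `"V3UiV16yUif"`), the following is a minimal sketch of how such a string can be read. It is not part of the patch. It handles only the letters that appear in these entries, following the encoding described in Clang's builtin definition files: `V<N>` for a vector of N elements, `U` for unsigned, `i` for int, `f` for float, `h` for half, and `y` for `__bf16`. The helper names `decodeOne` and `decodeSignature` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a single type starting at `pos` and advance `pos` past it.
// Only the subset V<N>, U, i, f, h, y is supported (illustrative helper).
std::string decodeOne(const std::string &s, std::size_t &pos) {
  if (pos >= s.size())
    throw std::invalid_argument("unexpected end of prototype string");
  if (s[pos] == 'V') {
    ++pos;
    std::size_t count = 0;
    bool sawDigit = false;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
      count = count * 10 + static_cast<std::size_t>(s[pos] - '0');
      ++pos;
      sawDigit = true;
    }
    if (!sawDigit)
      throw std::invalid_argument("vector prefix without element count");
    return "vector of " + std::to_string(count) + " x " + decodeOne(s, pos);
  }
  bool isUnsigned = false;
  if (s[pos] == 'U') {
    isUnsigned = true;
    ++pos;
  }
  if (pos >= s.size())
    throw std::invalid_argument("missing base type");
  std::string base;
  switch (s[pos++]) {
  case 'i': base = "int"; break;
  case 'f': base = "float"; break;
  case 'h': base = "half"; break;
  case 'y': base = "__bf16"; break;
  default:
    throw std::invalid_argument("letter outside the supported subset");
  }
  return isUnsigned ? "unsigned " + base : base;
}

// The first decoded entry is the return type; the rest are parameters.
std::vector<std::string> decodeSignature(const std::string &s) {
  std::vector<std::string> types;
  std::size_t pos = 0;
  while (pos < s.size())
    types.push_back(decodeOne(s, pos));
  return types;
}
```

Under these assumptions, `decodeSignature("V3UiV16yUif")` yields a return type of three unsigned ints followed by three parameters: a vector of 16 `__bf16`, an unsigned int, and a float. This lines up with the `call <3 x i32> @llvm.amdgcn.cvt.scalef32.sr.pk16.bf6.bf16(<16 x bfloat> ..., i32 ..., float ...)` line in the test checks.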


[llvm-branch-commits] [clang] release/21.x: [clang] Avoid inheriting [[noreturn]] in explicit function template specializations (#150003) (PR #151752)

2025-08-02 Thread Corentin Jabot via llvm-branch-commits

https://github.com/cor3ntin approved this pull request.

LGTM 
but main could be improved with an nfc commit 
https://github.com/llvm/llvm-project/pull/150003#pullrequestreview-3080960841

https://github.com/llvm/llvm-project/pull/151752


[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes



---

Patch is 37.07 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/151807.diff


10 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+6) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl (+36) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+6) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+3) 
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+12) 
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk16.gfx1250.ll (+303) 
- (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3-fake16.s (+36) 
- (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3.s (+36) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3.txt (+36) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 3773031e187c1..9125315310306 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -731,6 +731,12 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_bf8_f32, "V2UiV8ff", "nc", "gfx
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f32, "UiV8ff", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_f16, "UiV8hf", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk8_fp4_bf16, "UiV8yf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f32, "V3UiV16ff", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f32, "V3UiV16ff", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_f16, "V3UiV16hf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_f16, "V3UiV16hf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_fp6_bf16, "V3UiV16yf", "nc", "gfx1250-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk16_bf6_bf16, "V3UiV16yf", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_bf16, "V2UiV8yUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_bf8_bf16, "V2UiV8yUif", "nc", "gfx1250-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk8_fp8_f16, "V2UiV8hUif", "nc", "gfx1250-insts")
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
index c25aaf11bb0e1..e50ab77f48c79 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
@@ -787,6 +787,36 @@ void test_cvt_scale_pk(global half8 *outh8, global bfloat8 *outy8, uint2 src2,
 // CHECK-NEXT: [[TMP34:%.*]] = call i32 @llvm.amdgcn.cvt.scalef32.pk8.fp4.bf16(<8 x bfloat> [[TMP32]], float [[TMP33]])
 // CHECK-NEXT: [[TMP35:%.*]] = load ptr addrspace(1), ptr [[OUT1_ADDR_ASCAST]], align 8
 // CHECK-NEXT: store i32 [[TMP34]], ptr addrspace(1) [[TMP35]], align 4
+// CHECK-NEXT: [[TMP36:%.*]] = load <16 x bfloat>, ptr [[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP37:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP38:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.pk16.bf6.bf16(<16 x bfloat> [[TMP36]], float [[TMP37]])
+// CHECK-NEXT: [[TMP39:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP38]], ptr addrspace(1) [[TMP39]], align 16
+// CHECK-NEXT: [[TMP40:%.*]] = load <16 x half>, ptr [[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP41:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP42:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.pk16.bf6.f16(<16 x half> [[TMP40]], float [[TMP41]])
+// CHECK-NEXT: [[TMP43:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP42]], ptr addrspace(1) [[TMP43]], align 16
+// CHECK-NEXT: [[TMP44:%.*]] = load <16 x bfloat>, ptr [[SRCBF16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP45:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP46:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.pk16.fp6.bf16(<16 x bfloat> [[TMP44]], float [[TMP45]])
+// CHECK-NEXT: [[TMP47:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store <3 x i32> [[TMP46]], ptr addrspace(1) [[TMP47]], align 16
+// CHECK-NEXT: [[TMP48:%.*]] = load <16 x half>, ptr [[SRCH16_ADDR_ASCAST]], align 32
+// CHECK-NEXT: [[TMP49:%.*]] = load float, ptr [[SCALE_ADDR_ASCAST]], align 4
+// CHECK-NEXT: [[TMP50:%.*]] = call <3 x i32> @llvm.amdgcn.cvt.scalef32.pk16.fp6.f16(<16 x half> [[TMP48]], float [[TMP49]])
+// CHECK-NEXT: [[TMP51:%.*]] = load ptr addrspace(1), ptr [[OUT3_ADD

[llvm-branch-commits] [clang] [llvm] [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (PR #151807)

2025-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/151807
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] release/21.x: [libc++] Implement comparison operators for `tuple` added in C++23 (#148799) (PR #151808)

2025-08-02 Thread A. Jiang via llvm-branch-commits

https://github.com/frederick-vs-ja created 
https://github.com/llvm/llvm-project/pull/151808

And constrain the new `operator==` since C++26.

This patch implements parts of P2165R4, P2944R3, and a possibly improved 
resolution of LWG3882. Currently, libstdc++ and MSVC STL constrain the new 
overloads in the same way.

Also set the feature-test macro `__cpp_lib_constrained_equality` and add the 
related release note, as P2944R3 will be completed with this patch.

Manually backport 4a509f853fa4821ecdb0f6bc3b90ddd48794cc8c.
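
As a rough illustration of what the description above covers, the sketch below checks the feature-test macro and exercises a few heterogeneous `tuple` comparisons. The helper names `has_constrained_equality` and `tuples_compare_as_expected` are hypothetical and not part of libc++; the value `202411L` is the one this patch adds to the feature-test macro table. It does not exercise the new tuple-like overloads from P2165R4.

```cpp
#include <cassert>
#include <tuple>  // also defines __cpp_lib_constrained_equality when available

// Hypothetical helper (not part of libc++): reports whether the standard
// library in use advertises the constrained equality operators.
constexpr bool has_constrained_equality() {
#if defined(__cpp_lib_constrained_equality) && \
    __cpp_lib_constrained_equality >= 202411L
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: compares tuples whose element types differ but are
// comparable element by element.
inline bool tuples_compare_as_expected() {
  std::tuple<int, long> a{1, 2L};
  std::tuple<long, int> b{1L, 2};
  std::tuple<int, long> c{1, 3L};
  // Equality and ordering are evaluated lexicographically per element.
  return a == b && a != c && a < c;
}
```

Code that depends on the constrained `operator==` (for example, in a `requires` check) would presumably guard on `has_constrained_equality()` or on the macro directly.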

>From fe866577c7977eaf5bbfe87c4a8336b288018274 Mon Sep 17 00:00:00 2001
From: "A. Jiang" 
Date: Fri, 1 Aug 2025 23:53:33 +0800
Subject: [PATCH] [libc++] Implement comparison operators for `tuple` added in
 C++23 (#148799)

And constrain the new `operator==` since C++26.

This patch implements parts of P2165R4, P2944R3, and a possibly improved
resolution of LWG3882. Currently, libstdc++ and MSVC STL constrain the
new overloads in the same way.

Also set the feature-test macro `__cpp_lib_constrained_equality` and add
the related release note, as P2944R3 will be completed with this patch.

Fixes #136765
Fixes #136770
Fixes #105424
---
 libcxx/docs/FeatureTestMacroTable.rst         |   2 +-
 libcxx/docs/ReleaseNotes/21.rst               |   2 +
 libcxx/docs/Status/Cxx23Papers.csv            |   2 +-
 libcxx/docs/Status/Cxx2cIssues.csv            |   1 +
 libcxx/docs/Status/Cxx2cPapers.csv            |   2 +-
 libcxx/include/tuple                          | 133 +++--
 libcxx/include/version                        |   2 +-
 .../expected.version.compile.pass.cpp         |  16 +-
 .../optional.version.compile.pass.cpp         |  16 +-
 .../tuple.version.compile.pass.cpp            |  16 +-
 .../utility.version.compile.pass.cpp          |  16 +-
 .../variant.version.compile.pass.cpp          |  16 +-
 .../version.version.compile.pass.cpp          |  16 +-
 .../tuple/tuple.tuple/tuple.rel/eq.pass.cpp   | 380 +-
 .../tuple/tuple.tuple/tuple.rel/lt.pass.cpp   | 469 +++---
 ...ze_incompatible_three_way.compile.pass.cpp |  19 +-
 .../tuple.tuple/tuple.rel/three_way.pass.cpp  | 167 ++-
 .../generate_feature_test_macro_components.py |   2 -
 18 files changed, 852 insertions(+), 425 deletions(-)

diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index 61805726a4ff0..a36848ebd24b4 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -432,7 +432,7 @@ Status
 -- 
-
 ``__cpp_lib_constexpr_queue``  ``202502L``
 -- 
-
-``__cpp_lib_constrained_equality`` *unimplemented*
+``__cpp_lib_constrained_equality`` ``202411L``
 -- 
-
 ``__cpp_lib_copyable_function``*unimplemented*
 -- 
-
diff --git a/libcxx/docs/ReleaseNotes/21.rst b/libcxx/docs/ReleaseNotes/21.rst
index 74bfa97fd21c2..91123ffa3e34b 100644
--- a/libcxx/docs/ReleaseNotes/21.rst
+++ b/libcxx/docs/ReleaseNotes/21.rst
@@ -53,6 +53,8 @@ Implemented Papers
 - P2711R1: Making multi-param constructors of ``views`` ``explicit`` (`Github 
`__)
 - P2770R0: Stashing stashing ``iterators`` for proper flattening (`Github 
`__)
 - P2655R3: ``common_reference_t`` of ``reference_wrapper`` Should Be a 
Reference Type (`Github `__)
+- P2944R3: Comparisons for ``reference_wrapper`` (`Github 
`__)
+- P3379R0: Constrain ``std::expected equality`` operators (`Github 
`__)
 
 Improvements and New Features
 -
diff --git a/libcxx/docs/Status/Cxx23Papers.csv b/libcxx/docs/Status/Cxx23Papers.csv
index e4fa07d82289d..f1d8e9a2bd09c 100644
--- a/libcxx/docs/Status/Cxx23Papers.csv
+++ b/libcxx/docs/Status/Cxx23Papers.csv
@@ -60,7 +60,7 @@
 "`P1642R11 `__","Freestanding ``[utilities]``, 
``[ranges]``, and ``[iterators]``","2022-07 (Virtual)","","",""
 "`P1899R3 `__","``stride_view``","2022-07 
(Virtual)","","",""
 "`P2093R14 `__","Formatted output","2022-07 
(Virtual)","|Complete|","18",""
-"`P2165R4 `__","Compatibility between ``tuple``, 
``pair`` and ``tuple-like`` objects","2022-07 (Virtual)","|Partial|","","Only 
the part for ``zip_view`` is implemented."
+"`P2165R4 `__","Compatibility between ``tuple``, 
``pair`` and ``tuple-like`` objects","2022-07 
(Virt

[llvm-branch-commits] [libcxx] release/21.x: [libc++] Implement comparison operators for `tuple` added in C++23 (#148799) (PR #151808)

2025-08-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: A. Jiang (frederick-vs-ja)


Changes

And constrain the new `operator==` since C++26.

This patch implements parts of P2165R4, P2944R3, and a possibly improved 
resolution of LWG3882. Currently, libstdc++ and MSVC STL constrain the new 
overloads in the same way.

Also set the feature-test macro `__cpp_lib_constrained_equality` and add the 
related release note, as P2944R3 will be completed with this patch.

Manually backport 4a509f853fa4821ecdb0f6bc3b90ddd48794cc8c.

---

Patch is 55.32 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/151808.diff


18 Files Affected:

- (modified) libcxx/docs/FeatureTestMacroTable.rst (+1-1) 
- (modified) libcxx/docs/ReleaseNotes/21.rst (+2) 
- (modified) libcxx/docs/Status/Cxx23Papers.csv (+1-1) 
- (modified) libcxx/docs/Status/Cxx2cIssues.csv (+1) 
- (modified) libcxx/docs/Status/Cxx2cPapers.csv (+1-1) 
- (modified) libcxx/include/tuple (+105-28) 
- (modified) libcxx/include/version (+1-1) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/expected.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/optional.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/tuple.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/utility.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/variant.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp (+5-11) 
- (modified) libcxx/test/std/utilities/tuple/tuple.tuple/tuple.rel/eq.pass.cpp (+249-131) 
- (modified) libcxx/test/std/utilities/tuple/tuple.tuple/tuple.rel/lt.pass.cpp (+294-175) 
- (modified) libcxx/test/std/utilities/tuple/tuple.tuple/tuple.rel/size_incompatible_three_way.compile.pass.cpp (+17-2) 
- (modified) libcxx/test/std/utilities/tuple/tuple.tuple/tuple.rel/three_way.pass.cpp (+150-17) 
- (modified) libcxx/utils/generate_feature_test_macro_components.py (-2) 



[llvm-branch-commits] [clang] release/21.x: [X86][AVX10.2] Fix VNNIINT16 maskz intrinsics arguments order (#151077) (PR #151092)

2025-08-02 Thread Phoebe Wang via llvm-branch-commits

phoebewang wrote:

> > LGTM - but definitely needs details adding to release notes
> 
> @phoebewang can you handle a release note as well?

Sure! I plan to add it after it is merged. Does that sound OK to you?

https://github.com/llvm/llvm-project/pull/151092


[llvm-branch-commits] [libcxx] release/21.x: [libc++] Implement comparison operators for `tuple` added in C++23 (#148799) (PR #151808)

2025-08-02 Thread Nikolas Klauser via llvm-branch-commits

https://github.com/philnik777 requested changes to this pull request.

Why do we want to back-port this? This looks to me very much like feature work.

https://github.com/llvm/llvm-project/pull/151808


[llvm-branch-commits] [clang] release/21.x: [clang-format] Google Style: disable DerivePointerAlignment. (#149602) (PR #151797)

2025-08-02 Thread Björn Schäpers via llvm-branch-commits

https://github.com/HazardyKnusperkeks approved this pull request.


https://github.com/llvm/llvm-project/pull/151797


[llvm-branch-commits] [llvm] release/21.x: [DAG] visitFREEZE - limit freezing of multiple operands (PR #150425)

2025-08-02 Thread Simon Pilgrim via llvm-branch-commits

RKSimon wrote:

@tru This should be OK to merge now - thanks.

https://github.com/llvm/llvm-project/pull/150425