================
@@ -0,0 +1,74 @@
+// RUN: %clang_cc1 -fsycl-is-device -O0 -triple spirv64-unknown-unknown \
+// RUN: -emit-llvm  %s -o - | FileCheck %s --check-prefix=DEVICE
+
+// RUN: %clang_cc1 -fsycl-is-host -O0 -triple spirv64-unknown-unknown \
+// RUN: -emit-llvm  %s -o - | FileCheck %s --check-prefix=HOST
+
+// RUN: %clang_cc1 -fsycl-is-device -emit-llvm \
+// RUN: -aux-triple x86_64-pc-windows-msvc -triple spir-unknown--unknown \
+// RUN: %s -o - | FileCheck %s --check-prefix=MSVC
+
+namespace QL {
+  auto dg1 = [] { return 1; };
+  inline auto dg_inline1 = [] { return 1; };
+}
+
+namespace QL {
+  auto dg2 = [] { return 2; };
+  template<int N>
+  auto dg_template = [] { return N; };
+}
+
+using namespace QL;
+template<typename T>
+[[clang::sycl_kernel_entry_point(T)]] void f(T t) {
+  t();
+}
+
+void g() {
+  f(dg1);
+  f(dg2);
+  f(dg_inline1);
+  f(dg_template<3>);
+}
+
+// HOST: @_ZN2QL3dg1E = internal global %class.anon undef, align 1
+// HOST: @_ZN2QL3dg2E = internal global %class.anon.0 undef, align 1
+// HOST: @_ZN2QL10dg_inline1E = linkonce_odr global %class.anon.2 undef, 
comdat, align 1
+// HOST: @_ZN2QL11dg_templateILi3EEE = linkonce_odr global %class.anon.4 
undef, comdat, align 1
+
+// DEVICE: define spir_kernel void @_ZTSN2QL3dg1MUlvE_E
+// DEVICE: call spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
+// DEVICE: define internal spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
+// DEVICE: define spir_kernel void @_ZTSN2QL3dg2MUlvE_E
+// DEVICE: call spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
+// DEVICE: define internal spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
+// DEVICE: define spir_kernel void @_ZTSN2QL10dg_inline1MUlvE_E
+// DEVICE: call spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
+// DEVICE: define linkonce_odr spir_func noundef i32 
@_ZNK2QL10dg_inline1MUlvE_clEv
+// DEVICE: define spir_kernel void @_ZTSN2QL11dg_templateILi3EEMUlvE_E
+// DEVICE: call spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
+// DEVICE: define linkonce_odr spir_func noundef i32 
@_ZNK2QL11dg_templateILi3EEMUlvE_clEv
+
+// HOST: define spir_func void @_Z1gv
+// HOST: call spir_func void @_Z1fIN2QL3dg1MUlvE_EEvT_
+// HOST: call spir_func void @_Z1fIN2QL3dg2MUlvE_EEvT_
+// HOST: call spir_func void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
+// HOST: call spir_func void @_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_
+// HOST: define internal spir_func void @_Z1fIN2QL3dg1MUlvE_EEvT
+// HOST: define internal spir_func void @_Z1fIN2QL3dg2MUlvE_EEvT_
+// HOST: define linkonce_odr spir_func void @_Z1fIN2QL10dg_inline1MUlvE_EEvT_
+// HOST: define linkonce_odr spir_func void 
@_Z1fIN2QL11dg_templateILi3EEMUlvE_EEvT_
+
+// MSVC: define dso_local spir_kernel void @_ZTSN2QL3dg1MUlvE_E
+// MSVC: call spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
+// MSVC: define internal spir_func noundef i32 @_ZNK2QL3dg1MUlvE_clEv
+// MSVC: define dso_local spir_kernel void @_ZTSN2QL3dg2MUlvE_E
+// MSVC: call spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
+// MSVC: define internal spir_func noundef i32 @_ZNK2QL3dg2MUlvE_clEv
+// MSVC: define dso_local spir_kernel void @_ZTSN2QL10dg_inline1MUlvE_E
+// MSVC: call spir_func noundef i32 @_ZNK2QL10dg_inline1MUlvE_clEv
+// MSVC: define linkonce_odr spir_func noundef i32 
@_ZNK2QL10dg_inline1MUlvE_clEv
+// MSVC: define dso_local spir_kernel void @_ZTSN2QL11dg_templateILi3EEMUlvE_E
+// MSVC: call spir_func noundef i32 @_ZNK2QL11dg_templateILi3EEMUlvE_clEv
+// MSVC: define linkonce_odr spir_func noundef i32 
@_ZNK2QL11dg_templateILi3EEMUlvE_clEv
----------------
tahonermann wrote:

Thanks @zygoloid!

> That lambda should have internal linkage (the relevant part of the ODR [does 
> not apply to lambdas in default template 
> arguments](https://eel.is/c++draft/basic.def.odr#18)), so that seems fine. 
> Except that Clang fails to actually give the lambda internal linkage, which 
> looks like a bug to me.

I see, ok, we can follow up on that.

> That's correct. Lambdas in default template arguments are not "owned" by the 
> template and so shouldn't allocate mangling numbers within it. Note that 
> their existence can differ across redeclarations in different TUs, so 
> numbering them would mean the same entity could get different manglings in 
> different TUs.

Thank you! This is a very helpful insight that I was missing and failed to pick 
up from the standard wording.

Assuming I'm reading things correctly, 
[\[basic.def.odr\]p(16.11)](https://eel.is/c++draft/basic.def.odr#16.11) also 
makes it clear that uses of such default template arguments result in distinct 
closure types in the contexts where those uses occur.

The example in [\[basic.def.odr\]p19](https://eel.is/c++draft/basic.def.odr#19) 
makes it clear that correspondence across TUs is required for default arguments 
of function parameters in member functions. The preceding text appears to treat 
default template arguments and function default arguments the same, so I 
presume the example is applicable to default template arguments as well. I 
think the example is more subtle than the description suggests though. Where it 
states, "The definition of `X` can appear in multiple translation units of a 
valid program; the 
[lambda-expression](https://eel.is/c++draft/expr.prim.lambda.general#nt:lambda-expression)s
 defined within the default argument of `X​::​h` within the definition of `X` 
denote the same closure type in each translation unit.", I think this is 
referring only to the use of that lambda within the body of `X::h` as opposed 
to other hypothetical uses elsewhere outside the class definition. Am I reading 
that correctly?

Back to the ABI, my take away from this is that discriminators allocated from 
namespace and class contexts need not correspond across TUs; when 
correspondence is required, discriminators are allocated from more local 
declaration contexts (except in cases like those reported in 
https://github.com/itanium-cxx-abi/cxx-abi/issues/165). If so, then it seems I 
need not worry (too much) about discriminators being allocated and unused or 
about the order in which they are allocated in those contexts.

Another example to check what I'm learning; https://godbolt.org/z/M4fY4e3Px:
```
extern int x;

struct S {
  template<auto F = [] { return x; }>
  static inline auto vt = F;

  // GCC:   _ZNK1SUlvE0_clEv
  // Clang: _ZNK1SUlvE_clEv
  // SYCL:  _ZNK1SUlvE0_clEv
  static inline auto sdm = vt<>;
};

// GCC:   _ZZ6usevt0vENKUlvE1_clEv
// Clang: _ZNK1SUlvE0_clEv
// SYCL:  _ZNK1SUlvE1_clEv
inline int usevt0() { return S::vt<>(); }

// GCC:   _ZZ6usevt1vENKUlvE2_clEv
// Clang: _ZNK1SUlvE1_clEv
// SYCL:  _ZNK1SUlvE2_clEv
inline int usevt1() { return S::vt<>(); }

int f() {
  return usevt0() + usevt1() + S::sdm();
}
```

Per your reference to 
[\[basic.def.odr\]p18](https://eel.is/c++draft/basic.def.odr#18), if `usevt0()` 
and/or `usevt1()` are defined in multiple TUs, the program is IFNDR because the 
uses of the lambda in the default template argument do not correspond. However, 
the same use in the initializer of `S::sdm` does correspond because `S::sdm` is 
defined within the definition of `S`. From an ABI perspective, the `sdm` case 
is the only one that matters, but a stable name isn't produced by Clang or GCC.

GCC includes a `<closure-prefix>` corresponding to the point of use of the 
lambda for `usevt0()` and `usevt1()`, which is good. However, I would have 
expected `_ZZ6usevt0vENKUlvE_clEv` and `_ZZ6usevt1vENKUlvE_clEv` for those 
cases; it looks like discriminators might be being allocated from the class 
context, but the function context is used for naming. I think the use in the 
initializer of `S::sdm` should result in a name with a `<closure-prefix>` that 
incorporates `sdm` too.

For Clang, the situation looks much like the last case. Hmm, I think I'm 
rediscovering #143218.

> We'd previously agreed (on a different github issue, I could probably dig it 
> up if needed) that we want to stop using the `$n` mangling for lambdas in 
> Clang entirely, and always produce manglings that actually match the ABI, can 
> be demangled, etc. We should do that -- which hopefully would mean that 
> there's no divergence between the SYCL and non-SYCL manglings after this PR, 
> at least for ABI-mandated manglings.

That would be great.

> But separately from that, in the` v2` case, the lambda should be mangled with 
> a `<closure-prefix>` naming the inline variable `v2`. Looks like GCC gets 
> this right, and Clang gets it wrong in both modes.

Ok, we can follow up on that.

> What new requirements does SYCL have compared to C++ here? It looks like at a 
> minimum we effectively want to treat all entities as visible-across-TUs for 
> lambda mangling purposes, but as I noted above, we want to do that anyway. 
> And those manglings should presumably exactly match the Itanium ones.

That is correct. The extra requirements for CUDA/HIP/SYCL stem from the need 
for types used in kernel invocations to correspond across the host target TU 
and each of the device target TUs, even for types that don't have external 
linkage. [SYCL issue 454](https://github.com/KhronosGroup/SYCL-Docs/issues/454) 
seeks to clarify what kinds of heroics are required; at least in the face of 
preprocessor shenanigans.  For SYCL, the correspondence concerns are limited to 
the type used to name the kernel, the lambda/class type used to define the 
kernel, and the types of its captures/data members. For CUDA/HIP, I think the 
correspondence concerns are limited to the types used as kernel arguments. I 
can provide examples if it would be helpful.

https://github.com/llvm/llvm-project/pull/159115
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to