yaxunl added a comment. In D69826#1734324 <https://reviews.llvm.org/D69826#1734324>, @hliao wrote:
> In D69826#1734296 <https://reviews.llvm.org/D69826#1734296>, @yaxunl wrote: > > > I am a little bit concerned that user may have such code: > > > > struct A { int *p; } > > __global__ kernel(A a) { > > int x; > > a.p = &x; > > f(a); > > } > > > > > > @arsenm what happens if a private pointer is mis-used as a global pointer? > > > > I am wondering if we should coerce byval struct kernel arg to global only > > if they are const, e.g. > > > > __global__ kernel(const A a); > > > > > > I understand this may lose performance. Or should we introduce an option to > > let user disable coerce of non-const struct kernel arg to global. > > > This should not be a concern. The coercing is only applied to the parameter > itself. Within the function body, we still use the original `struct A`. The > preparation in function prolog will copy that coerced argument into the > original one (alloca-ed.) The modification of that parameter later will be > applied to the original one due to the by-val nature. > > A modified version of your code is compiled into the following code at O0: > > define protected amdgpu_kernel void @_Z3foo1A(%struct.A.coerce %a.coerce) > #0 { > entry: > %a = alloca %struct.A, align 8, addrspace(5) > %a1 = addrspacecast %struct.A addrspace(5)* %a to %struct.A* > %x = alloca i32, align 4, addrspace(5) > %x.ascast = addrspacecast i32 addrspace(5)* %x to i32* > %agg.tmp = alloca %struct.A, align 8, addrspace(5) > %agg.tmp.ascast = addrspacecast %struct.A addrspace(5)* %agg.tmp to > %struct.A* > %0 = bitcast %struct.A* %a1 to %struct.A.coerce* > %1 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 > 0, i32 0 > %2 = extractvalue %struct.A.coerce %a.coerce, 0 > store i32 addrspace(1)* %2, i32 addrspace(1)** %1, align 8 > %3 = getelementptr inbounds %struct.A.coerce, %struct.A.coerce* %0, i32 > 0, i32 1 > %4 = extractvalue %struct.A.coerce %a.coerce, 1 > store i32 addrspace(1)* %4, i32 addrspace(1)** %3, align 8 > %p = getelementptr inbounds %struct.A, %struct.A* %a1, i32 0, i32 0 > store i32* %x.ascast, i32** %p, align 8 > %5 = bitcast %struct.A* %agg.tmp.ascast to i8* > %6 = bitcast %struct.A* %a1 to i8* > call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %5, i8* align 8 %6, i64 > 16, i1 false) > %7 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, > i32 0 > %8 = load i32*, i32** %7, align 8 > %9 = getelementptr inbounds %struct.A, %struct.A* %agg.tmp.ascast, i32 0, > i32 1 > %10 = load i32*, i32** %9, align 8 > call void @_Z1f1A(i32* %8, i32* %10) #3 > ret void > } > > > The modification of parameter `a` is applied the alloca-ed one. OK. Thanks for clarification. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69826/new/ https://reviews.llvm.org/D69826 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits