yaxunl added inline comments.

================
Comment at: lib/CodeGen/CGCall.cpp:3861
            < Align.getQuantity()) ||
            (ArgInfo.getIndirectByVal() && (RVAddrSpace != ArgAddrSpace))) {
          // Create an aligned temporary, and copy to it.
----------------
rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > This should be comparing AST address spaces.
> > The AST address space of RV cannot be obtained through `CGFunctionInfo::const_arg_iterator it` and `it->type` since `it->type` takes type of
> >
> >
> > ```
> > ImplicitCastExpr 0x60a9ff0 <col:5> 'struct S':'struct S' <LValueToRValue>
> > `-DeclRefExpr 0x60a9f28 <col:5> '__global struct S':'__global struct S' lvalue Var 0x607efb0
> > ```
> >
> > and the original addr space is lost due to LValueToRValue cast.
> >
> > To get the AST addr space of RV, it seems I need to save the argument Expr in CallArgList and get it from Expr.
> >
> I think your last two comments are related. I'm not sure why we haven't copied into a temporary here, and if we had, the assumption of LangAS::Default would be fine. Would you mind doing the investigation there?

It seems the backend inserts a temp copy for byval arguments, so normally the caller does not need to create a temp copy for a byval argument in LLVM IR. An explicit temp copy is only needed in special cases, e.g. when the argument's alignment does not match what the ABI requires.

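To make the rule concrete, here is a minimal, standalone C++ sketch of the decision as I understand it from the condition quoted above. It is not the actual Clang code: `ByValArgFacts` and `needsCallerTemp` are hypothetical names, the address spaces are plain integers chosen for illustration, and the part of the real condition that is cut off in the quoted context is reduced to a simple alignment comparison.

```cpp
#include <cstdint>

// Hypothetical summary of what the caller knows about one indirect argument.
// None of these names come from Clang; they only mirror the quoted condition.
struct ByValArgFacts {
  std::uint64_t actualAlign;    // alignment of the source object, in bytes
  std::uint64_t requiredAlign;  // alignment the ABI requires for the argument
  bool isByVal;                 // argument is passed indirectly with byval
  unsigned sourceAddrSpace;     // address space of the source object (illustrative)
  unsigned argAddrSpace;        // address space expected for the argument (illustrative)
};

// True when the caller has to create an aligned temporary and copy into it:
// either the source is under-aligned for the ABI, or it is a byval argument
// whose source lives in a different address space than the argument expects.
constexpr bool needsCallerTemp(const ByValArgFacts &a) {
  return a.actualAlign < a.requiredAlign ||
         (a.isByVal && a.sourceAddrSpace != a.argAddrSpace);
}

// Source aligned to 8, ABI requires 8: the source can be passed directly.
static_assert(!needsCallerTemp({8, 8, true, 0, 0}), "no temp expected");
// Source aligned to 4, ABI requires 8: an aligned temp is needed.
static_assert(needsCallerTemp({4, 8, true, 0, 0}), "temp expected");
// Alignment matches but address spaces differ (values 1 and 0 are made up).
static_assert(needsCallerTemp({8, 8, true, 1, 0}), "temp expected");
```

The first two `static_assert` lines correspond to the aligned and under-aligned situations discussed in this comment; the third is the address-space mismatch that the second half of the condition is meant to catch.
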
For example, the following C program,

```
struct S {
  long x[100];
};

struct S g_s;

void f(struct S s);

void g() {
  f(g_s);
}
```

will generate the following IR on x86_64:

```
target triple = "x86_64-unknown-linux-gnu"

%struct.S = type { [100 x i64] }

@g_s = common global %struct.S zeroinitializer, align 8

; Function Attrs: noinline nounwind optnone
define void @g() #0 {
entry:
  call void @f(%struct.S* byval align 8 @g_s)
  ret void
}

declare void @f(%struct.S* byval align 8) #1
```

However, the following C program

```
struct S {
  int x[100];
};

struct S g_s;

void f(struct S s);

void g() {
  f(g_s);
}
```

will generate the following IR

```
target triple = "x86_64-unknown-linux-gnu"

%struct.S = type { [100 x i32] }

@g_s = common global %struct.S zeroinitializer, align 4

; Function Attrs: noinline nounwind optnone
define void @g() #0 {
entry:
  %byval-temp = alloca %struct.S, align 8
  %0 = bitcast %struct.S* %byval-temp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.S* @g_s to i8*), i64 400, i32 4, i1 false)
  call void @f(%struct.S* byval align 8 %byval-temp)
  ret void
}

declare void @f(%struct.S* byval align 8) #1
```

The temp var is generated by line 3863. Control flow reaches line 3863 because the alignment of the argument is 4 but the ABI requires it to be 8, so a temp is created to meet the ABI alignment requirement.

That means that, in the OpenCL example, it is normal that a temp var is not generated at line 3848. The temp is supposed to be generated at line 3863 too, as in the C example.


https://reviews.llvm.org/D34367



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits