[PATCH] D50596: [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different shared libraries
yaxunl created this revision.
yaxunl added reviewers: rjmccall, tra.

Different shared libraries contain different fat binaries, each with its handle stored in a global variable `__hip_gpubin_handle`. Since different compilation units share the same fat binary, this variable has linkonce linkage. However, it should not be merged across different shared libraries.

This patch sets the visibility of the global variable to hidden, which makes it invisible outside the shared library and therefore prevents it from being merged.

https://reviews.llvm.org/D50596

Files:
  lib/CodeGen/CGCUDANV.cpp
  test/CodeGenCUDA/device-stub.cu

Index: test/CodeGenCUDA/device-stub.cu
===
--- test/CodeGenCUDA/device-stub.cu
+++ test/CodeGenCUDA/device-stub.cu
@@ -80,7 +80,7 @@
 // HIP-SAME: section ".hipFatBinSegment"
 // * variable to save GPU binary handle after initialization
 // CUDANORDC: @__[[PREFIX]]_gpubin_handle = internal global i8** null
-// HIP: @__[[PREFIX]]_gpubin_handle = linkonce global i8** null
+// HIP: @__[[PREFIX]]_gpubin_handle = linkonce hidden global i8** null
 // * constant unnamed string with NVModuleID
 // RDC: [[MODULE_ID_GLOBAL:@.*]] = private constant
 // CUDARDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32
Index: lib/CodeGen/CGCUDANV.cpp
===
--- lib/CodeGen/CGCUDANV.cpp
+++ lib/CodeGen/CGCUDANV.cpp
@@ -459,6 +459,8 @@
         /*Initializer=*/llvm::ConstantPointerNull::get(VoidPtrPtrTy),
         "__hip_gpubin_handle");
     GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getQuantity());
+    // Prevent the weak symbol in different shared libraries being merged.
+    GpuBinaryHandle->setVisibility(llvm::GlobalValue::HiddenVisibility);
     Address GpuBinaryAddr(
         GpuBinaryHandle,
         CharUnits::fromQuantity(GpuBinaryHandle->getAlignment()));

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
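The effect described above can be approximated at the source level. The following is only a sketch, not what Clang emits: it assumes a GCC-compatible compiler and uses the `weak` and `visibility("hidden")` attributes to mimic a `linkonce hidden` global. The names `example_gpubin_handle` and `exampleRegisterOnce` are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for __hip_gpubin_handle (illustrative name only).
// "weak" allows every translation unit to define the symbol and have the
// copies merged when one shared library is linked. "hidden" keeps the symbol
// out of the library's dynamic symbol table, so copies in different shared
// libraries stay separate.
__attribute__((weak, visibility("hidden")))
void **example_gpubin_handle = nullptr;

// Hypothetical helper: store the handle the first time it is requested and
// return the stored handle on every later call.
inline void **exampleRegisterOnce(void **FatBinary) {
  if (!example_gpubin_handle)
    example_gpubin_handle = FatBinary;
  return example_gpubin_handle;
}
```

With default visibility, two shared libraries that both define the weak symbol would resolve to a single copy at load time, so the second library would see the first library's handle. Hidden visibility is what keeps one handle per library.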
[PATCH] D50596: [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different shared libraries
This revision was automatically updated to reflect the committed changes.
Closed by commit rC340056: [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different… (authored by yaxunl, committed by ).

Repository:
  rC Clang

https://reviews.llvm.org/D50596

Files:
  lib/CodeGen/CGCUDANV.cpp
  test/CodeGenCUDA/device-stub.cu

Index: lib/CodeGen/CGCUDANV.cpp
===
--- lib/CodeGen/CGCUDANV.cpp
+++ lib/CodeGen/CGCUDANV.cpp
@@ -459,6 +459,8 @@
         /*Initializer=*/llvm::ConstantPointerNull::get(VoidPtrPtrTy),
         "__hip_gpubin_handle");
     GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getQuantity());
+    // Prevent the weak symbol in different shared libraries being merged.
+    GpuBinaryHandle->setVisibility(llvm::GlobalValue::HiddenVisibility);
     Address GpuBinaryAddr(
         GpuBinaryHandle,
         CharUnits::fromQuantity(GpuBinaryHandle->getAlignment()));
Index: test/CodeGenCUDA/device-stub.cu
===
--- test/CodeGenCUDA/device-stub.cu
+++ test/CodeGenCUDA/device-stub.cu
@@ -80,7 +80,7 @@
 // HIP-SAME: section ".hipFatBinSegment"
 // * variable to save GPU binary handle after initialization
 // CUDANORDC: @__[[PREFIX]]_gpubin_handle = internal global i8** null
-// HIP: @__[[PREFIX]]_gpubin_handle = linkonce global i8** null
+// HIP: @__[[PREFIX]]_gpubin_handle = linkonce hidden global i8** null
 // * constant unnamed string with NVModuleID
 // RDC: [[MODULE_ID_GLOBAL:@.*]] = private constant
 // CUDARDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32
[PATCH] D50259: [OpenCL] Disallow negative attribute arguments
yaxunl accepted this revision.
yaxunl added a comment.

LGTM. Thanks.

Repository:
  rC Clang

https://reviews.llvm.org/D50259
[PATCH] D46475: [HIP] Set proper triple and offload kind for the toolchain
yaxunl updated this revision to Diff 145699. yaxunl marked an inline comment as done. yaxunl added a comment. Revised by John's comments. https://reviews.llvm.org/D46475 Files: include/clang/Basic/DiagnosticDriverKinds.td include/clang/Driver/Options.td include/clang/Driver/Types.h lib/Driver/Driver.cpp lib/Driver/Types.cpp test/Driver/Inputs/hip_multiple_inputs/a.cu test/Driver/Inputs/hip_multiple_inputs/b.hip test/Driver/hip-inputs.hip Index: test/Driver/hip-inputs.hip === --- /dev/null +++ test/Driver/hip-inputs.hip @@ -0,0 +1,23 @@ +// REQUIRES: clang-driver +// REQUIRES: x86-registered-target +// REQUIRES: amdgpu-registered-target + +// RUN: %clang -ccc-print-phases -target x86_64-linux-gnu \ +// RUN: -x hip --cuda-gpu-arch=gfx803 -c \ +// RUN: %S/Inputs/hip_multiple_inputs/a.cu \ +// RUN: %S/Inputs/hip_multiple_inputs/b.hip 2>&1 \ +// RUN: | FileCheck %s + +// RUN: not %clang -ccc-print-phases -target x86_64-linux-gnu \ +// RUN: --cuda-gpu-arch=gfx803 -c \ +// RUN: %S/Inputs/hip_multiple_inputs/a.cu \ +// RUN: %S/Inputs/hip_multiple_inputs/b.hip 2>&1 \ +// RUN: | FileCheck -check-prefix=MIX %s + +// RUN: not %clang -ccc-print-phases -target x86_64-linux-gnu \ +// RUN: --cuda-gpu-arch=gfx803 -c \ +// RUN: --hip-link %S/Inputs/hip_multiple_inputs/a.cu 2>&1 \ +// RUN: | FileCheck -check-prefix=MIX %s + +// CHECK-NOT: error: Mixed Cuda and HIP compilation is not supported. +// MIX: error: Mixed Cuda and HIP compilation is not supported. 
Index: lib/Driver/Types.cpp === --- lib/Driver/Types.cpp +++ lib/Driver/Types.cpp @@ -172,6 +172,15 @@ case TY_CUDA: case TY_PP_CUDA: case TY_CUDA_DEVICE: +return true; + } +} + +bool types::isHIP(ID Id) { + switch (Id) { + default: +return false; + case TY_HIP: case TY_PP_HIP: case TY_HIP_DEVICE: @@ -230,6 +239,7 @@ .Case("fpp", TY_Fortran) .Case("FPP", TY_Fortran) .Case("gch", TY_PCH) + .Case("hip", TY_HIP) .Case("hpp", TY_CXXHeader) .Case("iim", TY_PP_CXXModule) .Case("lib", TY_Object) Index: lib/Driver/Driver.cpp === --- lib/Driver/Driver.cpp +++ lib/Driver/Driver.cpp @@ -538,24 +538,46 @@ InputList &Inputs) { // - // CUDA + // CUDA/HIP // - // We need to generate a CUDA toolchain if any of the inputs has a CUDA type. - if (llvm::any_of(Inputs, [](std::pair &I) { + // We need to generate a CUDA toolchain if any of the inputs has a CUDA + // or HIP type. However, mixed CUDA/HIP compilation is not supported. + bool IsCuda = + llvm::any_of(Inputs, [](std::pair &I) { return types::isCuda(I.first); - })) { + }); + bool IsHIP = + llvm::any_of(Inputs, + [](std::pair &I) { + return types::isHIP(I.first); + }) || + C.getInputArgs().hasArg(options::OPT_hip_link); + if (IsCuda && IsHIP) { +Diag(clang::diag::err_drv_mix_cuda_hip); +return; + } + if (IsCuda || IsHIP) { const ToolChain *HostTC = C.getSingleOffloadToolChain(); const llvm::Triple &HostTriple = HostTC->getTriple(); -llvm::Triple CudaTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda" - : "nvptx-nvidia-cuda"); -// Use the CUDA and host triples as the key into the ToolChains map, because -// the device toolchain we create depends on both. +StringRef DeviceTripleStr; +auto OFK = IsHIP ? Action::OFK_HIP : Action::OFK_Cuda; +if (IsHIP) { + // HIP is only supported on amdgcn. + DeviceTripleStr = "amdgcn-amd-amdhsa"; +} else { + // CUDA is only supported on nvptx. + DeviceTripleStr = HostTriple.isArch64Bit() ? 
"nvptx64-nvidia-cuda" + : "nvptx-nvidia-cuda"; +} +llvm::Triple CudaTriple(DeviceTripleStr); +// Use the CUDA/HIP and host triples as the key into the ToolChains map, +// because the device toolchain we create depends on both. auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()]; if (!CudaTC) { CudaTC = llvm::make_unique( - *this, CudaTriple, *HostTC, C.getInputArgs(), Action::OFK_Cuda); + *this, CudaTriple, *HostTC, C.getInputArgs(), OFK); } -C.addOffloadDeviceToolChain(CudaTC.get(), Action::OFK_Cuda); +C.addOffloadDeviceToolChain(CudaTC.get(), OFK); } // Index: include/clang/Driver/Types.h === --- include/clang/Driver/Types.h +++ include/clang/Driver/Types.h @@ -77,6 +77,9 @@ /// isCuda - Is this a CUDA input. bool isCuda(ID Id); + /// isHIP - Is this a HIP input. + bool isHIP(ID Id); + /// isObjC - Is
[PATCH] D46471: [HIP] Add hip offload kind
yaxunl marked an inline comment as done.
yaxunl added inline comments.

Comment at: lib/Driver/ToolChains/Clang.cpp:133-135
     Work(*C.getSingleOffloadToolChain());

+  if (JA.isHostOffloading(Action::OFK_HIP))

tra wrote:
> CUDA and HIP are mutually exclusive, so this should probably be `else if`

Will do when committing.

https://reviews.llvm.org/D46471
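The reviewer's point about mutual exclusivity can be illustrated with a small stand-alone sketch. It does not use the real driver classes; `ExampleOffloadKind`, `ExampleJobAction` and `exampleVisitOffloadToolChains` are hypothetical simplifications, and the returned labels merely name the branch that was taken.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the driver's offload kinds and job action.
// These are illustrative types, not the clang::driver API.
enum class ExampleOffloadKind { None, Cuda, HIP };

struct ExampleJobAction {
  ExampleOffloadKind Kind;
  bool IsDevice; // true for the device side of the compilation
};

// A job has exactly one offload kind, so the four cases form a single
// else-if chain: at most one branch runs and the rest are skipped.
inline std::vector<std::string>
exampleVisitOffloadToolChains(const ExampleJobAction &JA) {
  std::vector<std::string> Visited;
  if (JA.Kind == ExampleOffloadKind::Cuda && !JA.IsDevice)
    Visited.push_back("cuda-host-offloading");
  else if (JA.Kind == ExampleOffloadKind::Cuda && JA.IsDevice)
    Visited.push_back("cuda-device-offloading");
  else if (JA.Kind == ExampleOffloadKind::HIP && !JA.IsDevice)
    Visited.push_back("hip-host-offloading");
  else if (JA.Kind == ExampleOffloadKind::HIP && JA.IsDevice)
    Visited.push_back("hip-device-offloading");
  return Visited;
}
```

Chaining the HIP cases onto the CUDA ones documents the exclusivity in the control flow itself and avoids evaluating the HIP conditions for CUDA jobs.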
[PATCH] D46601: [OpenCL] Fix typos in emitted enqueue kernel function names
yaxunl created this revision. yaxunl added reviewers: Anastasia, b-sumner. Two typos: vaarg => vararg get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple https://reviews.llvm.org/D46601 Files: lib/CodeGen/CGBuiltin.cpp test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -88,7 +88,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 256, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs( + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs( // COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK1:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG1]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -109,7 +109,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs( + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs( // COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK2:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG2]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -133,7 +133,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 256, 
i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_events_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}} [[WAIT_EVNT]], %opencl.clk_event_t{{.*}} [[EVNT]], // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK3:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG3]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -157,7 +157,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_events_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK4:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG4]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -179,7 +179,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK5:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* 
bitcast ({ i32, i32 } addrspace(1)* [[BLG5]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -208,7 +208,7 @@ // B64: store i64 2, i64* %[[TMP2]], align 8 // B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2 // B64: store i64 4, i64* %[[TMP3]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK6:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG6]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 3, @@ -229,7 +229,7 @@
[PATCH] D46471: [HIP] Add hip offload kind
This revision was automatically updated to reflect the committed changes. yaxunl marked an inline comment as done. Closed by commit rL331811: [HIP] Add hip offload kind (authored by yaxunl, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D46471?vs=145472&id=145780#toc Repository: rL LLVM https://reviews.llvm.org/D46471 Files: cfe/trunk/include/clang/Driver/Action.h cfe/trunk/lib/Driver/Action.cpp cfe/trunk/lib/Driver/Compilation.cpp cfe/trunk/lib/Driver/ToolChains/Clang.cpp Index: cfe/trunk/lib/Driver/ToolChains/Clang.cpp === --- cfe/trunk/lib/Driver/ToolChains/Clang.cpp +++ cfe/trunk/lib/Driver/ToolChains/Clang.cpp @@ -131,6 +131,10 @@ Work(*C.getSingleOffloadToolChain()); else if (JA.isDeviceOffloading(Action::OFK_Cuda)) Work(*C.getSingleOffloadToolChain()); + else if (JA.isHostOffloading(Action::OFK_HIP)) +Work(*C.getSingleOffloadToolChain()); + else if (JA.isDeviceOffloading(Action::OFK_HIP)) +Work(*C.getSingleOffloadToolChain()); if (JA.isHostOffloading(Action::OFK_OpenMP)) { auto TCs = C.getOffloadToolChains(); @@ -3105,13 +3109,14 @@ // Check number of inputs for sanity. We need at least one input. assert(Inputs.size() >= 1 && "Must have at least one input."); const InputInfo &Input = Inputs[0]; - // CUDA compilation may have multiple inputs (source file + results of + // CUDA/HIP compilation may have multiple inputs (source file + results of // device-side compilations). OpenMP device jobs also take the host IR as a // second input. All other jobs are expected to have exactly one // input. 
bool IsCuda = JA.isOffloading(Action::OFK_Cuda); + bool IsHIP = JA.isOffloading(Action::OFK_HIP); bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP); - assert((IsCuda || (IsOpenMPDevice && Inputs.size() == 2) || + assert((IsCuda || IsHIP || (IsOpenMPDevice && Inputs.size() == 2) || Inputs.size() == 1) && "Unable to handle multiple inputs."); @@ -3123,10 +3128,10 @@ bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment(); bool IsIAMCU = RawTriple.isOSIAMCU(); - // Adjust IsWindowsXYZ for CUDA compilations. Even when compiling in device - // mode (i.e., getToolchain().getTriple() is NVPTX, not Windows), we need to - // pass Windows-specific flags to cc1. - if (IsCuda) { + // Adjust IsWindowsXYZ for CUDA/HIP compilations. Even when compiling in + // device mode (i.e., getToolchain().getTriple() is NVPTX/AMDGCN, not + // Windows), we need to pass Windows-specific flags to cc1. + if (IsCuda || IsHIP) { IsWindowsMSVC |= AuxTriple && AuxTriple->isWindowsMSVCEnvironment(); IsWindowsGNU |= AuxTriple && AuxTriple->isWindowsGNUEnvironment(); IsWindowsCygnus |= AuxTriple && AuxTriple->isWindowsCygwinEnvironment(); @@ -3150,18 +3155,21 @@ Args.ClaimAllArgs(options::OPT_MJ); } - if (IsCuda) { -// We have to pass the triple of the host if compiling for a CUDA device and -// vice-versa. + if (IsCuda || IsHIP) { +// We have to pass the triple of the host if compiling for a CUDA/HIP device +// and vice-versa. std::string NormalizedTriple; -if (JA.isDeviceOffloading(Action::OFK_Cuda)) +if (JA.isDeviceOffloading(Action::OFK_Cuda) || +JA.isDeviceOffloading(Action::OFK_HIP)) NormalizedTriple = C.getSingleOffloadToolChain() ->getTriple() .normalize(); else - NormalizedTriple = C.getSingleOffloadToolChain() - ->getTriple() - .normalize(); + NormalizedTriple = + (IsCuda ? 
C.getSingleOffloadToolChain() + : C.getSingleOffloadToolChain()) + ->getTriple() + .normalize(); CmdArgs.push_back("-aux-triple"); CmdArgs.push_back(Args.MakeArgString(NormalizedTriple)); Index: cfe/trunk/lib/Driver/Compilation.cpp === --- cfe/trunk/lib/Driver/Compilation.cpp +++ cfe/trunk/lib/Driver/Compilation.cpp @@ -196,10 +196,10 @@ if (FailingCommands.empty()) return false; - // CUDA can have the same input source code compiled multiple times so do not - // compiled again if there are already failures. It is OK to abort the CUDA - // pipeline on errors. - if (A->isOffloading(Action::OFK_Cuda)) + // CUDA/HIP can have the same input source code compiled multiple times so do + // not compiled again if there are already failures. It is OK to abort the + // CUDA pipeline on errors. + if (A->isOffloading(Action::OFK_Cuda) || A->isOffloading(Action::OFK_HIP)) return true; for (const auto &CI : FailingCommands) Index: cfe/trunk/lib/Driver/Action.cpp === --- cfe/trunk/lib/Driver/Action.cpp +++ cfe/trunk/l
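The `-aux-triple` handling in the Clang.cpp hunk above follows one rule: a device-side job is told the host triple, and a host-side job is told the device triple. The sketch below restates that rule outside the driver. `ExampleOffloadJob` and `exampleAuxTripleArgs` are hypothetical names, and the triples are passed in by the caller rather than looked up from toolchains.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one CUDA/HIP job (not a clang::driver type).
struct ExampleOffloadJob {
  bool IsDevice;            // true when compiling the device side
  std::string HostTriple;   // normalized host triple
  std::string DeviceTriple; // normalized device triple
};

// Build the two cc1 arguments: the flag followed by the triple of the
// *other* side of the compilation.
inline std::vector<std::string>
exampleAuxTripleArgs(const ExampleOffloadJob &Job) {
  std::vector<std::string> Args;
  Args.push_back("-aux-triple");
  Args.push_back(Job.IsDevice ? Job.HostTriple : Job.DeviceTriple);
  return Args;
}
```

Because the rule depends only on which side is being compiled, extending it from CUDA to HIP needs no new logic beyond choosing the matching device toolchain.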
[PATCH] D46643: CodeGen: Emit string literal in constant address space
yaxunl created this revision.
yaxunl added a reviewer: rjmccall.

Some targets have a constant address space (e.g. amdgcn). For these targets, string literals should be emitted in the constant address space and then cast to the default address space.

https://reviews.llvm.org/D46643

Files:
  lib/CodeGen/CGDecl.cpp
  lib/CodeGen/CodeGenModule.cpp
  test/CodeGenCXX/amdgcn-string-literal.cpp

Index: test/CodeGenCXX/amdgcn-string-literal.cpp
===
--- /dev/null
+++ test/CodeGenCXX/amdgcn-string-literal.cpp
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+// CHECK: @.str = private unnamed_addr addrspace(4) constant [6 x i8] c"g_str\00", align 1
+// CHECK: @g_str = addrspace(1) global i8* addrspacecast (i8 addrspace(4)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(4)* @.str, i32 0, i32 0) to i8*), align 8
+// CHECK: @g_array = addrspace(1) global [8 x i8] c"g_array\00", align 1
+// CHECK: @.str.1 = private unnamed_addr addrspace(4) constant [6 x i8] c"l_str\00", align 1
+// CHECK: @_ZZ1fvE7l_array = private unnamed_addr addrspace(4) constant [8 x i8] c"l_array\00", align 1
+
+const char* g_str = "g_str";
+char g_array[] = "g_array";
+
+void g(const char* p);
+
+// CHECK-LABEL: define void @_Z1fv()
+void f() {
+  const char* l_str = "l_str";
+
+  // CHECK: call void @llvm.memcpy.p5i8.p4i8.i64
+  char l_array[] = "l_array";
+
+  g(g_str);
+  g(g_array);
+  g(l_str);
+  g(l_array);
+
+  const char* p = g_str;
+  g(p);
+}
\ No newline at end of file
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -4032,6 +4032,9 @@
   unsigned AddrSpace = 0;
   if (CGM.getLangOpts().OpenCL)
     AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
+  else if (auto AS = CGM.getTarget().getConstantAddressSpace()) {
+    AddrSpace = CGM.getContext().getTargetAddressSpace(AS.getValue());
+  }

   llvm::Module &M = CGM.getModule();
   // Create a global variable for this string
@@ -4093,7 +4096,19 @@
   SanitizerMD->reportGlobalToASan(GV, S->getStrTokenLoc(0), "", QualType());
-  return ConstantAddress(GV, Alignment);
+
+  llvm::Constant *Cast = GV;
+  if (!getLangOpts().OpenCL) {
+    if (auto AS = getTarget().getConstantAddressSpace()) {
+      if (AS != LangAS::Default)
+        Cast = getTargetCodeGenInfo().performAddrSpaceCast(
+            *this, GV, AS.getValue(), LangAS::Default,
+            GV->getValueType()->getPointerTo(
+                getContext().getTargetAddressSpace(LangAS::Default)));
+    }
+  }
+
+  return ConstantAddress(Cast, Alignment);
 }

 /// GetAddrOfConstantStringFromObjCEncode - Return a pointer to a constant
@@ -4137,7 +4152,17 @@
       GlobalName, Alignment);
   if (Entry)
     *Entry = GV;
-  return ConstantAddress(GV, Alignment);
+  llvm::Constant *Cast = GV;
+  if (!getLangOpts().OpenCL) {
+    if (auto AS = getTarget().getConstantAddressSpace()) {
+      if (AS != LangAS::Default)
+        Cast = getTargetCodeGenInfo().performAddrSpaceCast(
+            *this, GV, AS.getValue(), LangAS::Default,
+            GV->getValueType()->getPointerTo(
+                getContext().getTargetAddressSpace(LangAS::Default)));
+    }
+  }
+  return ConstantAddress(Cast, Alignment);
 }

 ConstantAddress CodeGenModule::GetAddrOfGlobalTemporary(
Index: lib/CodeGen/CGDecl.cpp
===
--- lib/CodeGen/CGDecl.cpp
+++ lib/CodeGen/CGDecl.cpp
@@ -1371,7 +1371,8 @@

   llvm::Type *BP = AllocaInt8PtrTy;
   if (Loc.getType() != BP)
-    Loc = Builder.CreateBitCast(Loc, BP);
+    Loc = Address(EmitCastToVoidPtrInAllocaAddrSpace(Loc.getPointer()),
+                  Loc.getAlignment());

   // If the initializer is all or mostly zeros, codegen with memset then do
   // a few stores afterward.
@@ -1394,7 +1395,11 @@
     if (getLangOpts().OpenCL) {
       AS = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
       BP = llvm::PointerType::getInt8PtrTy(getLLVMContext(), AS);
+    } else if (auto OptionalAS = CGM.getTarget().getConstantAddressSpace()) {
+      AS = CGM.getContext().getTargetAddressSpace(OptionalAS.getValue());
+      BP = llvm::PointerType::getInt8PtrTy(getLLVMContext(), AS);
     }
+
     llvm::GlobalVariable *GV =
         new llvm::GlobalVariable(CGM.getModule(), constant->getType(), true,
                                  llvm::GlobalValue::PrivateLinkage,
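The decision logic in this patch can be summarized in a few lines. The sketch below is a simplification under stated assumptions: it works on plain target address-space numbers instead of `LangAS` values, and `ExampleTarget`, `exampleStringLiteralAddrSpace` and `exampleNeedsCastToDefault` are hypothetical names that do not exist in Clang.

```cpp
#include <cassert>

// Hypothetical, flattened view of what the patch queries from the language
// options and the target. The numbers are target address spaces.
struct ExampleTarget {
  bool IsOpenCL;             // compiling OpenCL source
  unsigned OpenCLConstantAS; // target number for opencl_constant
  bool HasConstantAS;        // target declares a constant address space
  unsigned ConstantAS;       // its number, meaningful if HasConstantAS
  unsigned DefaultAS;        // number of the default address space
};

// Address space in which the string literal's global is created.
inline unsigned exampleStringLiteralAddrSpace(const ExampleTarget &T) {
  if (T.IsOpenCL)
    return T.OpenCLConstantAS;
  if (T.HasConstantAS)
    return T.ConstantAS;
  return 0;
}

// Whether the global must be cast to the default address space before it is
// handed back to the rest of code generation.
inline bool exampleNeedsCastToDefault(const ExampleTarget &T) {
  return !T.IsOpenCL && T.HasConstantAS && T.ConstantAS != T.DefaultAS;
}
```

For the amdgcn test above, the literal lands in `addrspace(4)` and is then cast to a default-address-space `i8*`, which matches the `addrspacecast` in the `@g_str` CHECK line. OpenCL is left alone because its string literals already have a constant-address-space type in the language.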
[PATCH] D45900: CodeGen: Fix invalid bitcast for lifetime.start/end
yaxunl added a comment.

In https://reviews.llvm.org/D45900#1083377, @rjmccall wrote:

> Oh, I see, it's not that the lifetime intrinsics don't handle pointers in the
> alloca address space, it's that we might have already promoted them into
> `DefaultAS`.
>
> Do the LLVM uses of lifetime intrinsics actually look through these address
> space casts? I'm wondering if we might need to change how we emit the
> intrinsics so that they're emitted directly on (bitcasts of) the underlying
> allocas.

Some passes do not look through address space casts. Although the InferAddressSpaces pass can eliminate redundant address space casts, it is still desirable not to emit redundant address space casts in Clang.

To avoid increasing the complexity of the alloca-emitting API, I think we need a way to track both the original alloca and the alloca cast to the default address space. I can think of two ways:

1. Add an OriginalPointer member to Address, holding the originally emitted LLVM value for the variable. Whenever we pass the address of a variable, we also pass the original LLVM value.

2. Add a map to CodeGenFunction that maps the casted alloca to the real alloca.

Any suggestions? Thanks.

https://reviews.llvm.org/D45900
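The second option can be sketched without any LLVM types. This is only an illustration of the bookkeeping, not a proposal for the actual CodeGenFunction interface: `ExampleValue` stands in for `llvm::Value`, and `ExampleAllocaTracker` with its two methods is hypothetical.

```cpp
#include <cassert>
#include <unordered_map>

// Opaque stand-in for llvm::Value (illustrative only).
struct ExampleValue {
  int Id;
};

// Side table from "alloca cast to the default address space" back to the
// alloca that was originally emitted.
class ExampleAllocaTracker {
public:
  // Remember that Casted was produced by casting Original.
  void recordCast(const ExampleValue *Casted, const ExampleValue *Original) {
    CastToAlloca[Casted] = Original;
  }

  // Return the original alloca when Ptr is a recorded cast; otherwise return
  // Ptr unchanged, so callers can use this on any pointer.
  const ExampleValue *lookThroughCast(const ExampleValue *Ptr) const {
    auto It = CastToAlloca.find(Ptr);
    return It == CastToAlloca.end() ? Ptr : It->second;
  }

private:
  std::unordered_map<const ExampleValue *, const ExampleValue *> CastToAlloca;
};
```

A lifetime-marker emitter could call `lookThroughCast` just before creating the intrinsic call. Compared with the first option, this keeps `Address` unchanged but adds a lookup and requires the map to be kept in sync whenever a cast is created.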
[PATCH] D46601: [OpenCL] Fix typos in emitted enqueue kernel function names
This revision was automatically updated to reflect the committed changes. Closed by commit rL331895: [OpenCL] Fix typos in emitted enqueue kernel function names (authored by yaxunl, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D46601?vs=145773&id=145948#toc Repository: rL LLVM https://reviews.llvm.org/D46601 Files: cfe/trunk/lib/CodeGen/CGBuiltin.cpp cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -88,7 +88,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 256, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs( + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs( // COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK1:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG1]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -109,7 +109,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs( + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs( // COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK2:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG2]] to i8 
addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -133,7 +133,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 256, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_events_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}} [[WAIT_EVNT]], %opencl.clk_event_t{{.*}} [[EVNT]], // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK3:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG3]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -157,7 +157,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_events_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK4:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG4]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -179,7 +179,7 @@ // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* 
[[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK5:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*), // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32 } addrspace(1)* [[BLG5]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, @@ -208,7 +208,7 @@ // B64: store i64 2, i64* %[[TMP2]], align 8 // B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2 // B64: store i64 4, i64* %[[TMP3]], align 8 - // COMMON-LABEL: call i32 @__enqueue_kernel_vaargs + // COMMON-LABEL: call i32 @__enqueue_kernel_varargs // COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, // COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK6:[^ ]+_kernel]] to i8*) to i
[PATCH] D46643: CodeGen: Emit string literal in constant address space
yaxunl marked an inline comment as done.
yaxunl added inline comments.

Comment at: lib/CodeGen/CGDecl.cpp:1375
+    Loc = Address(EmitCastToVoidPtrInAllocaAddrSpace(Loc.getPointer()),
+                  Loc.getAlignment());

rjmccall wrote:
> I don't understand why a patch about string literals is changing auto
> variable emission.

It is a bug in alloca handling that is revealed by this line of the lit test:

```
char l_array[] = "l_array";
```

Loc contains the alloca cast to the default address space, so it needs to be cast back to the alloca address space here; otherwise CreateBitCast produces an invalid bitcast.

Unlike lifetime.start, memcpy does not require the alloca address space, so an alternative fix is to let BP take the address space of Loc.

https://reviews.llvm.org/D46643
[PATCH] D46473: [HIP] Let clang-offload-bundler support HIP
This revision was automatically updated to reflect the committed changes.
Closed by commit rC332121: [HIP] Let clang-offload-bundler support HIP (authored by yaxunl, committed by ).

Repository:
  rC Clang

https://reviews.llvm.org/D46473

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  tools/clang-offload-bundler/ClangOffloadBundler.cpp

Index: tools/clang-offload-bundler/ClangOffloadBundler.cpp
===
--- tools/clang-offload-bundler/ClangOffloadBundler.cpp
+++ tools/clang-offload-bundler/ClangOffloadBundler.cpp
@@ -969,11 +969,11 @@
     getOffloadKindAndTriple(Target, Kind, Triple);

     bool KindIsValid = !Kind.empty();
-    KindIsValid = KindIsValid &&
-                  StringSwitch(Kind)
-                      .Case("host", true)
-                      .Case("openmp", true)
-                      .Default(false);
+    KindIsValid = KindIsValid && StringSwitch(Kind)
+                                     .Case("host", true)
+                                     .Case("openmp", true)
+                                     .Case("hip", true)
+                                     .Default(false);

     bool TripleIsValid = !Triple.empty();
     llvm::Triple T(Triple);
Index: lib/Driver/ToolChains/Clang.cpp
===
--- lib/Driver/ToolChains/Clang.cpp
+++ lib/Driver/ToolChains/Clang.cpp
@@ -5542,6 +5542,10 @@
     Triples += Action::GetOffloadKindName(CurKind);
     Triples += '-';
     Triples += CurTC->getTriple().normalize();
+    if (CurKind == Action::OFK_HIP && CurDep->getOffloadingArch()) {
+      Triples += '-';
+      Triples += CurDep->getOffloadingArch();
+    }
   }
   CmdArgs.push_back(TCArgs.MakeArgString(Triples));

@@ -5611,6 +5615,11 @@
     Triples += Action::GetOffloadKindName(Dep.DependentOffloadKind);
     Triples += '-';
     Triples += Dep.DependentToolChain->getTriple().normalize();
+    if (Dep.DependentOffloadKind == Action::OFK_HIP &&
+        !Dep.DependentBoundArch.empty()) {
+      Triples += '-';
+      Triples += Dep.DependentBoundArch;
+    }
   }

   CmdArgs.push_back(TCArgs.MakeArgString(Triples));
Index: lib/Driver/Driver.cpp
===
--- lib/Driver/Driver.cpp
+++ lib/Driver/Driver.cpp
@@ -3736,9 +3736,12 @@
               UI.DependentToolChain->getTriple().normalize(),
               /*CreatePrefixForHost=*/true);
           auto CurI = InputInfo(
-              UA, GetNamedOutputPath(C, *UA, BaseInput, UI.DependentBoundArch,
-                                     /*AtTopLevel=*/false, MultipleArchs,
-                                     OffloadingPrefix),
+              UA,
+              GetNamedOutputPath(C, *UA, BaseInput, UI.DependentBoundArch,
+                                 /*AtTopLevel=*/false,
+                                 MultipleArchs ||
+                                     UI.DependentOffloadKind == Action::OFK_HIP,
+                                 OffloadingPrefix),
               BaseInput);
           // Save the unbundling result.
           UnbundlingResults.push_back(CurI);
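The Clang.cpp hunks above append the GPU arch to the bundler target string only for HIP, and the bundler hunk adds "hip" to the accepted offload kinds. A minimal standalone sketch of that string composition follows. It uses std::string rather than the driver's own types, and the function names makeBundleTarget and isValidOffloadKind are hypothetical, introduced only for illustration; the example triple and arch are taken from test inputs quoted elsewhere in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the driver logic in the patch: join the offload
// kind and the normalized triple with '-', and append the GPU arch only for
// the "hip" kind when an arch is present.
std::string makeBundleTarget(const std::string &kind,
                             const std::string &triple,
                             const std::string &arch = "") {
  assert(!kind.empty() && !triple.empty());
  std::string target = kind + "-" + triple;
  if (kind == "hip" && !arch.empty())
    target += "-" + arch;
  return target;
}

// Mirrors the StringSwitch in the bundler hunk: the kinds accepted after the
// patch are "host", "openmp" and "hip"; everything else is rejected.
bool isValidOffloadKind(const std::string &kind) {
  return kind == "host" || kind == "openmp" || kind == "hip";
}
```

Under these assumptions, makeBundleTarget("hip", "amdgcn-amd-amdhsa", "gfx803") yields a target with the arch suffix, while a non-HIP kind keeps the plain kind-triple form.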
[PATCH] D46487: [HIP] Diagnose unsupported host triple
This revision was automatically updated to reflect the committed changes.
Closed by commit rL332122: [HIP] Diagnose unsupported host triple (authored by yaxunl, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D46487?vs=145346&id=146379#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D46487

Files:
  cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
  cfe/trunk/lib/Driver/Driver.cpp
  cfe/trunk/test/Driver/cuda-bad-arch.cu

Index: cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
===
--- cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
+++ cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
@@ -40,7 +40,7 @@
   "but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
   "install, pass a different GPU arch with --cuda-gpu-arch, or pass "
   "--no-cuda-version-check.">;
-def err_drv_cuda_nvptx_host : Error<"unsupported use of NVPTX for host compilation.">;
+def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
 def err_drv_invalid_thread_model_for_target : Error<
   "invalid thread model '%0' in '%1' for this target">;
 def err_drv_invalid_linker_name : Error<
Index: cfe/trunk/test/Driver/cuda-bad-arch.cu
===
--- cfe/trunk/test/Driver/cuda-bad-arch.cu
+++ cfe/trunk/test/Driver/cuda-bad-arch.cu
@@ -2,6 +2,7 @@
 // REQUIRES: clang-driver
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
+// REQUIRES: amdgpu-registered-target

 // RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=compute_20 -c %s 2>&1 \
 // RUN: | FileCheck -check-prefix BAD %s
@@ -25,9 +26,12 @@
 // RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
 // RUN: | FileCheck -check-prefix OK %s

-// We don't allow using NVPTX for host compilation.
+// We don't allow using NVPTX/AMDGCN for host compilation.
 // RUN: %clang -### --cuda-host-only -target nvptx-nvidia-cuda -c %s 2>&1 \
 // RUN: | FileCheck -check-prefix HOST_NVPTX %s
+// RUN: %clang -### --cuda-host-only -target amdgcn-amd-amdhsa -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix HOST_AMDGCN %s

 // OK-NOT: error: Unsupported CUDA gpu architecture
-// HOST_NVPTX: error: unsupported use of NVPTX for host compilation.
+// HOST_NVPTX: error: unsupported architecture 'nvptx' for host compilation.
+// HOST_AMDGCN: error: unsupported architecture 'amdgcn' for host compilation.
Index: cfe/trunk/lib/Driver/Driver.cpp
===
--- cfe/trunk/lib/Driver/Driver.cpp
+++ cfe/trunk/lib/Driver/Driver.cpp
@@ -2338,11 +2338,13 @@
     const ToolChain *HostTC = C.getSingleOffloadToolChain();
     assert(HostTC && "No toolchain for host compilation.");
-    if (HostTC->getTriple().isNVPTX()) {
-      // We do not support targeting NVPTX for host compilation. Throw
+    if (HostTC->getTriple().isNVPTX() ||
+        HostTC->getTriple().getArch() == llvm::Triple::amdgcn) {
+      // We do not support targeting NVPTX/AMDGCN for host compilation. Throw
       // an error and abort pipeline construction early so we don't trip
       // asserts that assume device-side compilation.
-      C.getDriver().Diag(diag::err_drv_cuda_nvptx_host);
+      C.getDriver().Diag(diag::err_drv_cuda_host_arch)
+          << HostTC->getTriple().getArchName();
       return true;
     }

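The Driver.cpp hunk above rejects device-only architectures as host targets and reports the offending arch name through the new '%0' placeholder. A rough standalone sketch of that check is below, assuming the arch is available as a plain string; hostArchDiagnostic is a hypothetical helper, not a driver API, and an empty return value stands for "no error".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the diagnostic text for an arch that may not
// be used for host compilation, or an empty string when the arch is fine.
// "nvptx" and "nvptx64" model Triple::isNVPTX(); "amdgcn" models the added
// llvm::Triple::amdgcn comparison.
std::string hostArchDiagnostic(const std::string &arch) {
  assert(!arch.empty());
  const bool deviceOnly =
      arch == "nvptx" || arch == "nvptx64" || arch == "amdgcn";
  if (!deviceOnly)
    return "";
  return "unsupported architecture '" + arch + "' for host compilation.";
}
```

The message text matches the HOST_NVPTX and HOST_AMDGCN check lines in the test above; everything else in the sketch is an illustrative simplification.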
[PATCH] D46475: [HIP] Set proper triple and offload kind for the toolchain
This revision was automatically updated to reflect the committed changes.
Closed by commit rL332123: [HIP] Set proper triple and offload kind for the toolchain (authored by yaxunl, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D46475?vs=145699&id=146381#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D46475

Files:
  cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
  cfe/trunk/include/clang/Driver/Options.td
  cfe/trunk/include/clang/Driver/Types.h
  cfe/trunk/lib/Driver/Driver.cpp
  cfe/trunk/lib/Driver/Types.cpp
  cfe/trunk/test/Driver/Inputs/hip_multiple_inputs/a.cu
  cfe/trunk/test/Driver/Inputs/hip_multiple_inputs/b.hip
  cfe/trunk/test/Driver/hip-inputs.hip

Index: cfe/trunk/test/Driver/hip-inputs.hip
===
--- cfe/trunk/test/Driver/hip-inputs.hip
+++ cfe/trunk/test/Driver/hip-inputs.hip
@@ -0,0 +1,23 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -ccc-print-phases -target x86_64-linux-gnu \
+// RUN:   -x hip --cuda-gpu-arch=gfx803 -c \
+// RUN:   %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip 2>&1 \
+// RUN: | FileCheck %s
+
+// RUN: not %clang -ccc-print-phases -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 -c \
+// RUN:   %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip 2>&1 \
+// RUN: | FileCheck -check-prefix=MIX %s
+
+// RUN: not %clang -ccc-print-phases -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 -c \
+// RUN:   --hip-link %S/Inputs/hip_multiple_inputs/a.cu 2>&1 \
+// RUN: | FileCheck -check-prefix=MIX %s
+
+// CHECK-NOT: error: Mixed Cuda and HIP compilation is not supported.
+// MIX: error: Mixed Cuda and HIP compilation is not supported.
Index: cfe/trunk/lib/Driver/Types.cpp
===
--- cfe/trunk/lib/Driver/Types.cpp
+++ cfe/trunk/lib/Driver/Types.cpp
@@ -172,6 +172,15 @@
   case TY_CUDA:
   case TY_PP_CUDA:
   case TY_CUDA_DEVICE:
+    return true;
+  }
+}
+
+bool types::isHIP(ID Id) {
+  switch (Id) {
+  default:
+    return false;
+
   case TY_HIP:
   case TY_PP_HIP:
   case TY_HIP_DEVICE:
@@ -230,6 +239,7 @@
            .Case("fpp", TY_Fortran)
            .Case("FPP", TY_Fortran)
            .Case("gch", TY_PCH)
+           .Case("hip", TY_HIP)
            .Case("hpp", TY_CXXHeader)
            .Case("iim", TY_PP_CXXModule)
            .Case("lib", TY_Object)
Index: cfe/trunk/lib/Driver/Driver.cpp
===
--- cfe/trunk/lib/Driver/Driver.cpp
+++ cfe/trunk/lib/Driver/Driver.cpp
@@ -538,24 +538,46 @@
                                               InputList &Inputs) {

   //
-  // CUDA
+  // CUDA/HIP
   //
-  // We need to generate a CUDA toolchain if any of the inputs has a CUDA type.
-  if (llvm::any_of(Inputs, [](std::pair &I) {
+  // We need to generate a CUDA toolchain if any of the inputs has a CUDA
+  // or HIP type. However, mixed CUDA/HIP compilation is not supported.
+  bool IsCuda =
+      llvm::any_of(Inputs, [](std::pair &I) {
         return types::isCuda(I.first);
-      })) {
+      });
+  bool IsHIP =
+      llvm::any_of(Inputs,
+                   [](std::pair &I) {
+                     return types::isHIP(I.first);
+                   }) ||
+      C.getInputArgs().hasArg(options::OPT_hip_link);
+  if (IsCuda && IsHIP) {
+    Diag(clang::diag::err_drv_mix_cuda_hip);
+    return;
+  }
+  if (IsCuda || IsHIP) {
     const ToolChain *HostTC = C.getSingleOffloadToolChain();
     const llvm::Triple &HostTriple = HostTC->getTriple();
-    llvm::Triple CudaTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
-                                                     : "nvptx-nvidia-cuda");
-    // Use the CUDA and host triples as the key into the ToolChains map, because
-    // the device toolchain we create depends on both.
+    StringRef DeviceTripleStr;
+    auto OFK = IsHIP ? Action::OFK_HIP : Action::OFK_Cuda;
+    if (IsHIP) {
+      // HIP is only supported on amdgcn.
+      DeviceTripleStr = "amdgcn-amd-amdhsa";
+    } else {
+      // CUDA is only supported on nvptx.
+      DeviceTripleStr = HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
+                                                 : "nvptx-nvidia-cuda";
+    }
+    llvm::Triple CudaTriple(DeviceTripleStr);
+    // Use the CUDA/HIP and host triples as the key into the ToolChains map,
+    // because the device toolchain we create depends on both.
     auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];
     if (!CudaTC) {
       CudaTC = llvm::make_unique(
-          *this, CudaTriple, *HostTC, C.getInputArgs(), Action::OFK_Cuda);
+          *this, CudaTriple, *HostTC, C.getInputArgs(), OFK);
     }
-    C.addOffloadDeviceToolChain(CudaTC.get(), Action::OFK_Cuda);
+
[PATCH] D46643: CodeGen: Emit string literal in constant address space
yaxunl updated this revision to Diff 146468.
yaxunl marked an inline comment as done.
yaxunl added a comment.

Revised by John's comments. Also refactored to extract common code.

https://reviews.llvm.org/D46643

Files:
  lib/CodeGen/CGDecl.cpp
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/CodeGenModule.h
  test/CodeGenCXX/amdgcn-string-literal.cpp

Index: test/CodeGenCXX/amdgcn-string-literal.cpp
===
--- /dev/null
+++ test/CodeGenCXX/amdgcn-string-literal.cpp
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+// CHECK: @.str = private unnamed_addr addrspace(4) constant [6 x i8] c"g_str\00", align 1
+// CHECK: @g_str = addrspace(1) global i8* addrspacecast (i8 addrspace(4)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(4)* @.str, i32 0, i32 0) to i8*), align 8
+// CHECK: @g_array = addrspace(1) global [8 x i8] c"g_array\00", align 1
+// CHECK: @.str.1 = private unnamed_addr addrspace(4) constant [6 x i8] c"l_str\00", align 1
+// CHECK: @_ZZ1fvE7l_array = private unnamed_addr addrspace(4) constant [8 x i8] c"l_array\00", align 1
+
+const char* g_str = "g_str";
+char g_array[] = "g_array";
+
+void g(const char* p);
+
+// CHECK-LABEL: define void @_Z1fv()
+void f() {
+  const char* l_str = "l_str";
+
+  // CHECK: call void @llvm.memcpy.p0i8.p4i8.i64
+  char l_array[] = "l_array";
+
+  g(g_str);
+  g(g_array);
+  g(l_str);
+  g(l_array);
+
+  const char* p = g_str;
+  g(p);
+}
Index: lib/CodeGen/CodeGenModule.h
===
--- lib/CodeGen/CodeGenModule.h
+++ lib/CodeGen/CodeGenModule.h
@@ -785,6 +785,18 @@
                                      ForDefinition_t IsForDefinition = NotForDefinition);

+  /// Return the AST address space of string literal, which is used to emit
+  /// the string literal as global variable in LLVM IR.
+  /// Note: This is not necessarily the address space of the string literal
+  /// in AST. For address space agnostic language, e.g. C++, string literal
+  /// in AST is always in default address space.
+  LangAS getStringLiteralAddressSpace() const;
+
+  /// Cast the string literal global variable to default address space when
+  /// necessary.
+  llvm::Constant *
+  castStringLiteralToDefaultAddressSpace(llvm::GlobalVariable *GV);
+
   /// Return the address of the given function. If Ty is non-null, then this
   /// function will use the specified type if it has to create it.
   llvm::Constant *GetAddrOfFunction(GlobalDecl GD, llvm::Type *Ty = nullptr,
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -3044,6 +3044,38 @@
   return getTargetCodeGenInfo().getGlobalVarAddressSpace(*this, D);
 }

+LangAS CodeGenModule::getStringLiteralAddressSpace() const {
+  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
+  if (LangOpts.OpenCL)
+    return LangAS::opencl_constant;
+  if (auto AS = getTarget().getConstantAddressSpace())
+    return AS.getValue();
+  return LangAS::Default;
+}
+
+// In address space agnostic languages, string literals are in default address
+// space in AST. However, certain targets (e.g. amdgcn) request them to be
+// emitted in constant address space in LLVM IR. To be consistent with other
+// parts of AST, string literal global variables in constant address space
+// need to be casted to default address space before being put into address
+// map and referenced by other part of CodeGen.
+// In OpenCL, string literals are in constant address space in AST, therefore
+// they should not be casted to default address space.
+llvm::Constant *CodeGenModule::castStringLiteralToDefaultAddressSpace(
+    llvm::GlobalVariable *GV) {
+  llvm::Constant *Cast = GV;
+  if (!LangOpts.OpenCL) {
+    if (auto AS = getTarget().getConstantAddressSpace()) {
+      if (AS != LangAS::Default)
+        Cast = getTargetCodeGenInfo().performAddrSpaceCast(
+            *this, GV, AS.getValue(), LangAS::Default,
+            GV->getValueType()->getPointerTo(
+                getContext().getTargetAddressSpace(LangAS::Default)));
+    }
+  }
+  return Cast;
+}
+
 template
 void CodeGenModule::MaybeHandleStaticInExternC(const SomeDecl *D,
                                                llvm::GlobalValue *GV) {
@@ -4039,10 +4071,8 @@
 GenerateStringLiteral(llvm::Constant *C, llvm::GlobalValue::LinkageTypes LT,
                       CodeGenModule &CGM, StringRef GlobalName,
                       CharUnits Alignment) {
-  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
-  unsigned AddrSpace = 0;
-  if (CGM.getLangOpts().OpenCL)
-    AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
+  unsigned AddrSpace = CGM.getContext().getTargetAddressSpace(
+      CGM.getStringLiteralAddressS
[PATCH] D43281: [AMDGPU] fixes for lds f32 builtins
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!

Repository:
  rC Clang

https://reviews.llvm.org/D43281
[PATCH] D46643: CodeGen: Emit string literal in constant address space
yaxunl updated this revision to Diff 146647.
yaxunl marked 2 inline comments as done.
yaxunl added a comment.

Revised by John's comments.

https://reviews.llvm.org/D46643

Files:
  lib/CodeGen/CGDecl.cpp
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/CodeGenModule.h
  test/CodeGenCXX/amdgcn-string-literal.cpp

Index: test/CodeGenCXX/amdgcn-string-literal.cpp
===
--- /dev/null
+++ test/CodeGenCXX/amdgcn-string-literal.cpp
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s
+
+// CHECK: @.str = private unnamed_addr addrspace(4) constant [6 x i8] c"g_str\00", align 1
+// CHECK: @g_str = addrspace(1) global i8* addrspacecast (i8 addrspace(4)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(4)* @.str, i32 0, i32 0) to i8*), align 8
+// CHECK: @g_array = addrspace(1) global [8 x i8] c"g_array\00", align 1
+// CHECK: @.str.1 = private unnamed_addr addrspace(4) constant [6 x i8] c"l_str\00", align 1
+// CHECK: @_ZZ1fvE7l_array = private unnamed_addr addrspace(4) constant [8 x i8] c"l_array\00", align 1
+
+const char* g_str = "g_str";
+char g_array[] = "g_array";
+
+void g(const char* p);
+
+// CHECK-LABEL: define void @_Z1fv()
+void f() {
+  const char* l_str = "l_str";
+
+  // CHECK: call void @llvm.memcpy.p0i8.p4i8.i64
+  char l_array[] = "l_array";
+
+  g(g_str);
+  g(g_array);
+  g(l_str);
+  g(l_array);
+
+  const char* p = g_str;
+  g(p);
+}
Index: lib/CodeGen/CodeGenModule.h
===
--- lib/CodeGen/CodeGenModule.h
+++ lib/CodeGen/CodeGenModule.h
@@ -785,6 +785,13 @@
                                      ForDefinition_t IsForDefinition = NotForDefinition);

+  /// Return the AST address space of string literal, which is used to emit
+  /// the string literal as global variable in LLVM IR.
+  /// Note: This is not necessarily the address space of the string literal
+  /// in AST. For address space agnostic language, e.g. C++, string literal
+  /// in AST is always in default address space.
+  LangAS getStringLiteralAddressSpace() const;
+
   /// Return the address of the given function. If Ty is non-null, then this
   /// function will use the specified type if it has to create it.
   llvm::Constant *GetAddrOfFunction(GlobalDecl GD, llvm::Type *Ty = nullptr,
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -3044,6 +3044,39 @@
   return getTargetCodeGenInfo().getGlobalVarAddressSpace(*this, D);
 }

+LangAS CodeGenModule::getStringLiteralAddressSpace() const {
+  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
+  if (LangOpts.OpenCL)
+    return LangAS::opencl_constant;
+  if (auto AS = getTarget().getConstantAddressSpace())
+    return AS.getValue();
+  return LangAS::Default;
+}
+
+// In address space agnostic languages, string literals are in default address
+// space in AST. However, certain targets (e.g. amdgcn) request them to be
+// emitted in constant address space in LLVM IR. To be consistent with other
+// parts of AST, string literal global variables in constant address space
+// need to be casted to default address space before being put into address
+// map and referenced by other part of CodeGen.
+// In OpenCL, string literals are in constant address space in AST, therefore
+// they should not be casted to default address space.
+static llvm::Constant *
+castStringLiteralToDefaultAddressSpace(CodeGenModule &CGM,
+                                       llvm::GlobalVariable *GV) {
+  llvm::Constant *Cast = GV;
+  if (!CGM.getLangOpts().OpenCL) {
+    if (auto AS = CGM.getTarget().getConstantAddressSpace()) {
+      if (AS != LangAS::Default)
+        Cast = CGM.getTargetCodeGenInfo().performAddrSpaceCast(
+            CGM, GV, AS.getValue(), LangAS::Default,
+            GV->getValueType()->getPointerTo(
+                CGM.getContext().getTargetAddressSpace(LangAS::Default)));
+    }
+  }
+  return Cast;
+}
+
 template
 void CodeGenModule::MaybeHandleStaticInExternC(const SomeDecl *D,
                                                llvm::GlobalValue *GV) {
@@ -4039,10 +4072,8 @@
 GenerateStringLiteral(llvm::Constant *C, llvm::GlobalValue::LinkageTypes LT,
                       CodeGenModule &CGM, StringRef GlobalName,
                       CharUnits Alignment) {
-  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
-  unsigned AddrSpace = 0;
-  if (CGM.getLangOpts().OpenCL)
-    AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
+  unsigned AddrSpace = CGM.getContext().getTargetAddressSpace(
+      CGM.getStringLiteralAddressSpace());

   llvm::Module &M = CGM.getModule();
   // Create a global variable for this string
@@ -4104,7 +4135,9 @@

   SanitizerMD->reportGlobalToASan(GV,
[PATCH] D46643: CodeGen: Emit string literal in constant address space
This revision was automatically updated to reflect the committed changes.
Closed by commit rC332279: CodeGen: Emit string literal in constant address space (authored by yaxunl, committed by ).

Repository:
  rC Clang

https://reviews.llvm.org/D46643

Files:
  lib/CodeGen/CGDecl.cpp
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/CodeGenModule.h
  test/CodeGenCXX/amdgcn-string-literal.cpp

Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -3044,6 +3044,39 @@
   return getTargetCodeGenInfo().getGlobalVarAddressSpace(*this, D);
 }

+LangAS CodeGenModule::getStringLiteralAddressSpace() const {
+  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
+  if (LangOpts.OpenCL)
+    return LangAS::opencl_constant;
+  if (auto AS = getTarget().getConstantAddressSpace())
+    return AS.getValue();
+  return LangAS::Default;
+}
+
+// In address space agnostic languages, string literals are in default address
+// space in AST. However, certain targets (e.g. amdgcn) request them to be
+// emitted in constant address space in LLVM IR. To be consistent with other
+// parts of AST, string literal global variables in constant address space
+// need to be casted to default address space before being put into address
+// map and referenced by other part of CodeGen.
+// In OpenCL, string literals are in constant address space in AST, therefore
+// they should not be casted to default address space.
+static llvm::Constant *
+castStringLiteralToDefaultAddressSpace(CodeGenModule &CGM,
+                                       llvm::GlobalVariable *GV) {
+  llvm::Constant *Cast = GV;
+  if (!CGM.getLangOpts().OpenCL) {
+    if (auto AS = CGM.getTarget().getConstantAddressSpace()) {
+      if (AS != LangAS::Default)
+        Cast = CGM.getTargetCodeGenInfo().performAddrSpaceCast(
+            CGM, GV, AS.getValue(), LangAS::Default,
+            GV->getValueType()->getPointerTo(
+                CGM.getContext().getTargetAddressSpace(LangAS::Default)));
+    }
+  }
+  return Cast;
+}
+
 template
 void CodeGenModule::MaybeHandleStaticInExternC(const SomeDecl *D,
                                                llvm::GlobalValue *GV) {
@@ -4039,10 +4072,8 @@
 GenerateStringLiteral(llvm::Constant *C, llvm::GlobalValue::LinkageTypes LT,
                       CodeGenModule &CGM, StringRef GlobalName,
                       CharUnits Alignment) {
-  // OpenCL v1.2 s6.5.3: a string literal is in the constant address space.
-  unsigned AddrSpace = 0;
-  if (CGM.getLangOpts().OpenCL)
-    AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
+  unsigned AddrSpace = CGM.getContext().getTargetAddressSpace(
+      CGM.getStringLiteralAddressSpace());

   llvm::Module &M = CGM.getModule();
   // Create a global variable for this string
@@ -4104,7 +4135,9 @@

   SanitizerMD->reportGlobalToASan(GV, S->getStrTokenLoc(0), "",
                                   QualType());
-  return ConstantAddress(GV, Alignment);
+
+  return ConstantAddress(castStringLiteralToDefaultAddressSpace(*this, GV),
+                         Alignment);
 }

 /// GetAddrOfConstantStringFromObjCEncode - Return a pointer to a constant
@@ -4148,7 +4181,9 @@
                                   GlobalName, Alignment);
   if (Entry)
     *Entry = GV;
-  return ConstantAddress(GV, Alignment);
+
+  return ConstantAddress(castStringLiteralToDefaultAddressSpace(*this, GV),
+                         Alignment);
 }

 ConstantAddress CodeGenModule::GetAddrOfGlobalTemporary(
Index: lib/CodeGen/CGDecl.cpp
===
--- lib/CodeGen/CGDecl.cpp
+++ lib/CodeGen/CGDecl.cpp
@@ -1374,7 +1374,7 @@
       llvm::ConstantInt::get(IntPtrTy,
                              getContext().getTypeSizeInChars(type).getQuantity());

-  llvm::Type *BP = AllocaInt8PtrTy;
+  llvm::Type *BP = CGM.Int8Ty->getPointerTo(Loc.getAddressSpace());
   if (Loc.getType() != BP)
     Loc = Builder.CreateBitCast(Loc, BP);

@@ -1395,11 +1395,10 @@
     // Otherwise, create a temporary global with the initializer then
     // memcpy from the global to the alloca.
     std::string Name = getStaticDeclName(CGM, D);
-    unsigned AS = 0;
-    if (getLangOpts().OpenCL) {
-      AS = CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant);
-      BP = llvm::PointerType::getInt8PtrTy(getLLVMContext(), AS);
-    }
+    unsigned AS = CGM.getContext().getTargetAddressSpace(
+        CGM.getStringLiteralAddressSpace());
+    BP = llvm::PointerType::getInt8PtrTy(getLLVMContext(), AS);
+
     llvm::GlobalVariable *GV =
       new llvm::GlobalVariable(CGM.getModule(), constant->getType(), true,
                                llvm::GlobalValue::PrivateLinkage,
Index: lib/CodeGen/CodeGenModule.h
===
--- lib/CodeGen/CodeGenModule.h
+++ lib/CodeGen/C
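The decision logic in getStringLiteralAddressSpace and castStringLiteralToDefaultAddressSpace above can be summarized without any LLVM types. The sketch below is only a model: the enum LiteralAS and both function names are hypothetical, and the target's optional constant address space is reduced to a single boolean (assumed to mean "the target reports a constant address space different from the default").

```cpp
#include <cassert>

// Hypothetical three-way model of where a string literal is emitted.
enum class LiteralAS { Default, OpenCLConstant, TargetConstant };

// Models getStringLiteralAddressSpace: OpenCL always uses its constant
// address space; otherwise a target-provided constant address space wins;
// otherwise the default address space is used.
LiteralAS stringLiteralAddressSpace(bool isOpenCL, bool targetHasConstantAS) {
  if (isOpenCL)
    return LiteralAS::OpenCLConstant;
  if (targetHasConstantAS)
    return LiteralAS::TargetConstant;
  return LiteralAS::Default;
}

// Models castStringLiteralToDefaultAddressSpace: a cast back to the default
// address space is needed only for address space agnostic languages whose
// target placed the literal in a non-default constant address space.
bool needsCastToDefault(bool isOpenCL, bool targetHasConstantAS) {
  const LiteralAS as = stringLiteralAddressSpace(isOpenCL, targetHasConstantAS);
  assert(!isOpenCL || as == LiteralAS::OpenCLConstant);
  return !isOpenCL && as != LiteralAS::Default;
}
```

In this model, C++ on amdgcn corresponds to (isOpenCL = false, targetHasConstantAS = true), which is the case exercised by the amdgcn-string-literal.cpp test: the literal lands in addrspace(4) and is then cast for use by the rest of CodeGen.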
[PATCH] D45900: CodeGen: Fix invalid bitcast for lifetime.start/end
yaxunl added a comment.

In https://reviews.llvm.org/D45900#1093160, @rjmccall wrote:

> In https://reviews.llvm.org/D45900#1093154, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D45900#1083377, @rjmccall wrote:
> >
> > > Oh, I see, it's not that the lifetime intrinsics don't handle pointers in
> > > the alloca address space, it's that we might have already promoted them
> > > into `DefaultAS`.
> > >
> > > Do the LLVM uses of lifetime intrinsics actually look through these
> > > address space casts? I'm wondering if we might need to change how we
> > > emit the intrinsics so that they're emitted directly on (bitcasts of) the
> > > underlying allocas.
> >
> >
> > Some passes do not look through address space casts. Although there is
> > InferAddressSpace pass which can eliminate the redundant address space
> > casts, still it is desirable not to emit redundant address space in Clang.
> >
> > To avoid increasing complexity of alloca emitting API, I think we need a
> > way to track the original alloca and the alloca casted to default address
> > space. I can think of two ways:
> >
> > 1. add OriginalPointer member to Address, which is the originally emitted
> >    LLVM value for the variable. Whenever we pass the address of a variable we
> >    also pass the original LLVM value.
> > 2. add a map to CodeGenFunction to map the casted alloca to the real alloca.
> >
> > Any suggestion? Thanks.
>
>
> Can we just call CreateLifetimeStart (and push the cleanup to call
> CreateLifetimeEnd) immediately after creating the alloca instead of waiting
> until later like we do now?
>
> Modifying Address is not appropriate, and adding a map to CGF would be big
> waste.

CreateTempAlloca returns the casted alloca by default, whereas CreateLifetimeStart expects the original alloca. If we want to call CreateLifetimeStart with the original alloca, we have to do it in CreateTempAlloca. This incurs two issues:

1. We need an enum parameter to control how lifetime.start/end is emitted. There are cases where lifetime.start is not emitted, and there are different ways to push cleanups for lifetime.end, e.g. in CodeGenFunction::EmitMaterializeTemporaryExpr

    switch (M->getStorageDuration()) {
    case SD_Automatic:
    case SD_FullExpression:
      if (auto *Size = EmitLifetimeStart(
              CGM.getDataLayout().getTypeAllocSize(Object.getElementType()),
              Object.getPointer())) {
        if (M->getStorageDuration() == SD_Automatic)
          pushCleanupAfterFullExpr(NormalEHLifetimeMarker, Object, Size);
        else
          pushFullExprCleanup(NormalEHLifetimeMarker, Object, Size);
      }
      break;

2. There are situations where the pushed cleanup for lifetime.end is deactivated and emitted early, e.g. in AggExprEmitter::withReturnValueSlot

    // If there's no dtor to run, the copy was the last use of our temporary.
    // Since we're not guaranteed to be in an ExprWithCleanups, clean up
    // eagerly.
    CGF.DeactivateCleanupBlock(LifetimeEndBlock, LifetimeStartInst);
    CGF.EmitLifetimeEnd(LifetimeSizePtr, RetAddr.getPointer());

In this case there is no good way to get the LifetimeStartInst and the original alloca inst.

Basically, the emission of lifetime.start/end is not uniform enough to be incorporated as part of CreateTempAlloca.

How about letting CreateTempAlloca have an optional pointer argument for returning the original alloca?

https://reviews.llvm.org/D45900
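The proposal that closes this message, an optional pointer argument through which CreateTempAlloca hands back the original alloca, can be illustrated with a toy model. Nothing below is Clang code: Address here is a hypothetical two-field struct, createTempAlloca is a stand-in with an invented signature, and address space 5 for the raw alloca is taken from the amdgcn test output quoted in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of an address: a value name plus its address space.
struct Address {
  std::string name;
  int addrSpace;
};

// Toy stand-in for the proposed API shape. The raw alloca lives in address
// space 5; by default the returned address is the copy cast to the default
// address space 0. When `originalAlloca` is non-null, the uncast alloca is
// also reported through it, so a caller can emit lifetime markers on it.
Address createTempAlloca(const std::string &name,
                         Address *originalAlloca = nullptr,
                         bool castToDefaultAddrSpace = true) {
  assert(!name.empty());
  Address raw{name, 5};
  if (originalAlloca)
    *originalAlloca = raw;
  if (!castToDefaultAddrSpace)
    return raw;
  return Address{name + ".ascast", 0};
}
```

The out-parameter keeps existing call sites unchanged, since callers that do not care about the raw alloca simply omit the argument, which appears to be the motivation for preferring it over changing Address or adding a map.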
[PATCH] D46489: [HIP] Let assembler output bitcode for amdgcn
yaxunl added a comment.

In https://reviews.llvm.org/D46489#1088940, @rjmccall wrote:

> I think the right solution here is to make a CompileJobAction with type
> TY_LLVM_BC in the first place. You should get the advice of a driver expert,
> though.

There is already JobAction for TY_LLVM_BC. I just want to skip the backend and assemble phase when offloading HIP. I will try achieving that through HIP action builder.

https://reviews.llvm.org/D46489
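The intent stated above, skipping the backend and assemble phases when offloading HIP, can be pictured as a filter over the per-device phase list. The sketch below is a toy model under that reading only: devicePhases is a hypothetical function, the phase names follow what -ccc-print-phases prints for CUDA device compilation, and whatever the HIP action builder does after the compile phase is deliberately left out because the message does not specify it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the device-side phases up to object emission.
// For "cuda" the list runs through backend and assembler; for "hip" those
// two phases are skipped, so the device output stays at the compiler stage.
std::vector<std::string> devicePhases(const std::string &offloadKind) {
  assert(offloadKind == "cuda" || offloadKind == "hip");
  std::vector<std::string> phases = {"preprocessor", "compiler"};
  if (offloadKind == "cuda") {
    phases.push_back("backend");
    phases.push_back("assembler");
  }
  return phases;
}
```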
[PATCH] D45900: CodeGen: Fix invalid bitcast for lifetime.start/end
yaxunl updated this revision to Diff 146987.
yaxunl edited the summary of this revision.
yaxunl added a comment.

Add optional argument to CreateMemTemp and CreateTempAlloca to get the original alloca and use it for lifetime intrinsic.

https://reviews.llvm.org/D45900

Files:
  lib/CodeGen/CGCall.cpp
  lib/CodeGen/CGDecl.cpp
  lib/CodeGen/CGExpr.cpp
  lib/CodeGen/CGExprAgg.cpp
  lib/CodeGen/CodeGenFunction.h
  test/CodeGenCXX/amdgcn_declspec_get.cpp

Index: test/CodeGenCXX/amdgcn_declspec_get.cpp
===
--- /dev/null
+++ test/CodeGenCXX/amdgcn_declspec_get.cpp
@@ -0,0 +1,27 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -O3 -fdeclspec \
+// RUN: -disable-llvm-passes -o - %s | FileCheck %s
+
+int get_x();
+
+struct A {
+  __declspec(property(get = _get_x)) int x;
+  static int _get_x(void) {
+    return get_x();
+  };
+};
+
+extern const A a;
+
+// CHECK-LABEL: define void @_Z4testv()
+// CHECK: %i = alloca i32, align 4, addrspace(5)
+// CHECK: %[[ii:.*]] = addrspacecast i32 addrspace(5)* %i to i32*
+// CHECK: %[[cast:.*]] = bitcast i32 addrspace(5)* %i to i8 addrspace(5)*
+// CHECK: call void @llvm.lifetime.start.p5i8(i64 4, i8 addrspace(5)* %[[cast]])
+// CHECK: %call = call i32 @_ZN1A6_get_xEv()
+// CHECK: store i32 %call, i32* %[[ii]]
+// CHECK: %[[cast2:.*]] = bitcast i32 addrspace(5)* %i to i8 addrspace(5)*
+// CHECK: call void @llvm.lifetime.end.p5i8(i64 4, i8 addrspace(5)* %[[cast2]])
+void test()
+{
+  int i = a.x;
+}
Index: lib/CodeGen/CodeGenFunction.h
===
--- lib/CodeGen/CodeGenFunction.h
+++ lib/CodeGen/CodeGenFunction.h
@@ -2023,11 +2023,14 @@
   /// various ways, this function will perform the cast by default. The cast
   /// may be avoided by passing false as \p CastToDefaultAddrSpace; this is
   /// more efficient if the caller knows that the address will not be exposed.
+  /// The original alloca instruction is returned through \p Alloca if it is
+  /// not nullptr.
   llvm::AllocaInst *CreateTempAlloca(llvm::Type *Ty, const Twine &Name = "tmp",
                                      llvm::Value *ArraySize = nullptr);
   Address CreateTempAlloca(llvm::Type *Ty, CharUnits align,
                            const Twine &Name = "tmp",
                            llvm::Value *ArraySize = nullptr,
+                           Address *Alloca = nullptr,
                            bool CastToDefaultAddrSpace = true);

   /// CreateDefaultAlignedTempAlloca - This creates an alloca with the
@@ -2064,10 +2067,13 @@

   /// CreateMemTemp - Create a temporary memory object of the given type, with
   /// appropriate alignment. Cast it to the default address space if
-  /// \p CastToDefaultAddrSpace is true.
+  /// \p CastToDefaultAddrSpace is true. Returns the original alloca
+  /// instruction by \p Alloca if it is not nullptr.
   Address CreateMemTemp(QualType T, const Twine &Name = "tmp",
+                        Address *Alloca = nullptr,
                         bool CastToDefaultAddrSpace = true);
   Address CreateMemTemp(QualType T, CharUnits Align, const Twine &Name = "tmp",
+                        Address *Alloca = nullptr,
                         bool CastToDefaultAddrSpace = true);

   /// CreateAggTemp - Create a temporary memory object for the given
@@ -2515,7 +2521,9 @@

     const VarDecl *Variable;

-    /// The address of the alloca. Invalid if the variable was emitted
+    /// The address of the alloca for languages with explicit address space
+    /// (e.g. OpenCL) or alloca casted to generic pointer for address space
+    /// agnostic languages (e.g. C++). Invalid if the variable was emitted
     /// as a global constant.
     Address Addr;

@@ -2531,13 +2539,19 @@
     /// Non-null if we should use lifetime annotations.
     llvm::Value *SizeForLifetimeMarkers;

+    /// Address with original alloca instruction. Invalid if the variable was
+    /// emitted as a global constant.
+    Address AllocaAddr;
+
     struct Invalid {};
-    AutoVarEmission(Invalid) : Variable(nullptr), Addr(Address::invalid()) {}
+    AutoVarEmission(Invalid)
+        : Variable(nullptr), Addr(Address::invalid()),
+          AllocaAddr(Address::invalid()) {}

     AutoVarEmission(const VarDecl &variable)
-      : Variable(&variable), Addr(Address::invalid()), NRVOFlag(nullptr),
-        IsByRef(false), IsConstantAggregate(false),
-        SizeForLifetimeMarkers(nullptr) {}
+        : Variable(&variable), Addr(Address::invalid()), NRVOFlag(nullptr),
+          IsByRef(false), IsConstantAggregate(false),
+          SizeForLifetimeMarkers(nullptr), AllocaAddr(Address::invalid()) {}

     bool wasEmittedAsGlobal() const { return !Addr.isValid(); }

@@ -2553,11 +2567,15 @@
     }

     /// Returns the raw, allocated address, which is not necessarily
-    /// the address of the object itself.
+    /// the address of the object itself. It is caste
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl updated this revision to Diff 147156. yaxunl added a comment. Skip backend and assembler phases for amdgcn since it does not support linking of object files. https://reviews.llvm.org/D46476 Files: lib/Driver/Driver.cpp test/Driver/cuda-phases.cu Index: test/Driver/cuda-phases.cu === --- test/Driver/cuda-phases.cu +++ test/Driver/cuda-phases.cu @@ -7,195 +7,242 @@ // REQUIRES: clang-driver // REQUIRES: powerpc-registered-target // REQUIRES: nvptx-registered-target - +// REQUIRES: amdgpu-registered-target // // Test single gpu architecture with complete compilation. // // RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \ -// RUN: | FileCheck -check-prefix=BIN %s -// BIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda) -// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda) -// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda) -// BIN-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30) -// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, cuda-cpp-output, (device-cuda, sm_30) -// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-cuda, sm_30) -// BIN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-cuda, sm_30) -// BIN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-cuda, sm_30) -// BIN-DAG: [[P8:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P7]]}, object -// BIN-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P6]]}, assembler -// BIN-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-cuda) -// BIN-DAG: [[P11:[0-9]+]]: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-cuda (nvptx64-nvidia-cuda)" {[[P10]]}, ir -// BIN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-cuda) -// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-cuda) -// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-cuda) +// RUN: | FileCheck 
-check-prefixes=BIN,BIN_NV %s +// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s 2>&1 \ +// RUN: | FileCheck -check-prefixes=BIN,BIN_AMD %s +// BIN_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]]) +// BIN_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]]) +// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]]) +// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]]) +// BIN_NV-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:sm_30]]) +// BIN_AMD-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:gfx803]]) +// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]]) +// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P8:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda]]:[[ARCH]])" {[[P7]]}, object +// BIN_NV-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH]])" {[[P6]]}, assembler +// BIN_NV-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-[[T]]) +// BIN_NV-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-[[T]] ([[TRIPLE]])" {[[P10]]}, ir +// BIN_NV-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]]) +// BIN_AMD-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]]) +// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]]) +// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]]) +// BIN_AMD-DAG: [[P15:[0-9]+]]: linker, {[[P5]]}, image, (device-[[T]], [[ARCH]]) +// BIN_AMD-DAG: [[P16:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P14]]}, +// BIN_AMD-DAG-SAME: "device-[[T]] 
([[TRIPLE:amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P15]]}, object // // Test single gpu architecture up to the assemble phase. // // RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \ -// RUN: | FileCheck -check-prefix=ASM %s -// ASM-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30) -// ASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30) -// ASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30) -// ASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30) -// ASM-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler -// ASM-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda) -// ASM-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, cuda-cpp-output, (host-cuda) -// ASM-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (host-cuda) -// A
[PATCH] D46489: [HIP] Let assembler output bitcode for amdgcn
yaxunl abandoned this revision. yaxunl added a comment. I have updated https://reviews.llvm.org/D46476 to skip the backend and assembler phases for amdgcn, so this patch is no longer needed. https://reviews.llvm.org/D46489
[PATCH] D45900: CodeGen: Fix invalid bitcast for lifetime.start/end
This revision was automatically updated to reflect the committed changes. Closed by commit rL332593: CodeGen: Fix invalid bitcast for lifetime.start/end (authored by yaxunl, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D45900?vs=146987&id=147289#toc Repository: rL LLVM https://reviews.llvm.org/D45900 Files: cfe/trunk/lib/CodeGen/CGCall.cpp cfe/trunk/lib/CodeGen/CGDecl.cpp cfe/trunk/lib/CodeGen/CGExpr.cpp cfe/trunk/lib/CodeGen/CGExprAgg.cpp cfe/trunk/lib/CodeGen/CodeGenFunction.h cfe/trunk/test/CodeGenCXX/amdgcn_declspec_get.cpp Index: cfe/trunk/test/CodeGenCXX/amdgcn_declspec_get.cpp === --- cfe/trunk/test/CodeGenCXX/amdgcn_declspec_get.cpp +++ cfe/trunk/test/CodeGenCXX/amdgcn_declspec_get.cpp @@ -0,0 +1,27 @@ +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -O3 -fdeclspec \ +// RUN: -disable-llvm-passes -o - %s | FileCheck %s + +int get_x(); + +struct A { + __declspec(property(get = _get_x)) int x; + static int _get_x(void) { + return get_x(); + }; +}; + +extern const A a; + +// CHECK-LABEL: define void @_Z4testv() +// CHECK: %i = alloca i32, align 4, addrspace(5) +// CHECK: %[[ii:.*]] = addrspacecast i32 addrspace(5)* %i to i32* +// CHECK: %[[cast:.*]] = bitcast i32 addrspace(5)* %i to i8 addrspace(5)* +// CHECK: call void @llvm.lifetime.start.p5i8(i64 4, i8 addrspace(5)* %[[cast]]) +// CHECK: %call = call i32 @_ZN1A6_get_xEv() +// CHECK: store i32 %call, i32* %[[ii]] +// CHECK: %[[cast2:.*]] = bitcast i32 addrspace(5)* %i to i8 addrspace(5)* +// CHECK: call void @llvm.lifetime.end.p5i8(i64 4, i8 addrspace(5)* %[[cast2]]) +void test() +{ + int i = a.x; +} Index: cfe/trunk/lib/CodeGen/CGDecl.cpp === --- cfe/trunk/lib/CodeGen/CGDecl.cpp +++ cfe/trunk/lib/CodeGen/CGDecl.cpp @@ -965,6 +965,9 @@ if (!ShouldEmitLifetimeMarkers) return nullptr; + assert(Addr->getType()->getPointerAddressSpace() == + CGM.getDataLayout().getAllocaAddrSpace() && + "Pointer should be in alloca address space"); llvm::Value 
*SizeV = llvm::ConstantInt::get(Int64Ty, Size); Addr = Builder.CreateBitCast(Addr, AllocaInt8PtrTy); llvm::CallInst *C = @@ -974,6 +977,9 @@ } void CodeGenFunction::EmitLifetimeEnd(llvm::Value *Size, llvm::Value *Addr) { + assert(Addr->getType()->getPointerAddressSpace() == + CGM.getDataLayout().getAllocaAddrSpace() && + "Pointer should be in alloca address space"); Addr = Builder.CreateBitCast(Addr, AllocaInt8PtrTy); llvm::CallInst *C = Builder.CreateCall(CGM.getLLVMLifetimeEndFn(), {Size, Addr}); @@ -1058,6 +1064,7 @@ codegenoptions::LimitedDebugInfo; Address address = Address::invalid(); + Address AllocaAddr = Address::invalid(); if (Ty->isConstantSizeType()) { bool NRVO = getLangOpts().ElideConstructors && D.isNRVOVariable(); @@ -1148,7 +1155,8 @@ // Create the alloca. Note that we set the name separately from // building the instruction so that it's there even in no-asserts // builds. - address = CreateTempAlloca(allocaTy, allocaAlignment, D.getName()); + address = CreateTempAlloca(allocaTy, allocaAlignment, D.getName(), + /*ArraySize=*/nullptr, &AllocaAddr); // Don't emit lifetime markers for MSVC catch parameters. The lifetime of // the catch parameter starts in the catchpad instruction, and we can't @@ -1176,7 +1184,7 @@ !(!getLangOpts().CPlusPlus && hasLabelBeenSeenInCurrentScope())) { uint64_t size = CGM.getDataLayout().getTypeAllocSize(allocaTy); emission.SizeForLifetimeMarkers = - EmitLifetimeStart(size, address.getPointer()); + EmitLifetimeStart(size, AllocaAddr.getPointer()); } } else { assert(!emission.useLifetimeMarkers()); @@ -1205,7 +1213,8 @@ llvm::Type *llvmTy = ConvertTypeForMem(VlaSize.Type); // Allocate memory for the array. 
-address = CreateTempAlloca(llvmTy, alignment, "vla", VlaSize.NumElts); +address = CreateTempAlloca(llvmTy, alignment, "vla", VlaSize.NumElts, + &AllocaAddr); // If we have debug info enabled, properly describe the VLA dimensions for // this type by registering the vla size expression for each of the @@ -1215,6 +1224,7 @@ setAddrOfLocalVar(&D, address); emission.Addr = address; + emission.AllocaAddr = AllocaAddr; // Emit debug info for local var declaration. if (EmitDebugInfo && HaveInsertPoint()) { @@ -1228,7 +1238,7 @@ // Make sure we call @llvm.lifetime.end. if (emission.useLifetimeMarkers()) EHStack.pushCleanup(NormalEHLifetimeMarker, - emission.getAllocatedAddress(), +
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl added a comment. ping https://reviews.llvm.org/D46476
[PATCH] D46472: [HIP] Support offloading by linker script
yaxunl added a comment. ping https://reviews.llvm.org/D46472
[PATCH] D45212: [HIP] Let CUDA toolchain support HIP language mode and amdgpu
yaxunl added a comment. ping https://reviews.llvm.org/D45212
[PATCH] D46472: [HIP] Support offloading by linker script
yaxunl added a comment. In https://reviews.llvm.org/D46472#1103577, @t-tye wrote:
> LGTM except for minor suggestions.

Thanks. Will make changes when committing. https://reviews.llvm.org/D46472
[PATCH] D45212: [HIP] Let CUDA toolchain support HIP language mode and amdgpu
yaxunl added a comment. Hi Artem, I've addressed your comments. Are any further changes needed? Thanks. https://reviews.llvm.org/D45212
[PATCH] D46472: [HIP] Support offloading by linker script
This revision was automatically updated to reflect the committed changes. Closed by commit rL332724: [HIP] Support offloading by linker script (authored by yaxunl, committed by ). Herald added a subscriber: llvm-commits. Changed prior to commit: https://reviews.llvm.org/D46472?vs=145480&id=147511#toc Repository: rL LLVM https://reviews.llvm.org/D46472 Files: cfe/trunk/include/clang/Driver/Options.td cfe/trunk/lib/CodeGen/CGCUDANV.cpp cfe/trunk/lib/Driver/ToolChains/CommonArgs.cpp cfe/trunk/lib/Driver/ToolChains/CommonArgs.h cfe/trunk/lib/Driver/ToolChains/Gnu.cpp cfe/trunk/test/CodeGenCUDA/device-stub.cu Index: cfe/trunk/include/clang/Driver/Options.td === --- cfe/trunk/include/clang/Driver/Options.td +++ cfe/trunk/include/clang/Driver/Options.td @@ -586,6 +586,8 @@ def fcuda_short_ptr : Flag<["-"], "fcuda-short-ptr">, Flags<[CC1Option]>, HelpText<"Use 32-bit pointers for accessing const/local/shared address spaces.">; def fno_cuda_short_ptr : Flag<["-"], "fno-cuda-short-ptr">; +def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">, + Group, Flags<[NoArgumentUnused, HelpHidden]>; def dA : Flag<["-"], "dA">, Group; def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>, HelpText<"Print macro definitions in -E mode in addition to normal output">; Index: cfe/trunk/test/CodeGenCUDA/device-stub.cu === --- cfe/trunk/test/CodeGenCUDA/device-stub.cu +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu @@ -1,25 +1,25 @@ // RUN: echo "GPU binary would be here" > %t // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ // RUN: -fcuda-include-gpubinary %t -o - \ -// RUN: | FileCheck %s --check-prefixes=ALL,NORDC,CUDA +// RUN: | FileCheck %s --check-prefixes=ALL,NORDC,CUDA,CUDANORDC // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ // RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS \ -// RUN: | FileCheck %s -check-prefix=NOGLOBALS +// RUN: | FileCheck %s -check-prefixes=NOGLOBALS,CUDANOGLOBALS // RUN: %clang_cc1 -triple x86_64-linux-gnu 
-emit-llvm %s \ // RUN: -fcuda-rdc -fcuda-include-gpubinary %t -o - \ -// RUN: | FileCheck %s --check-prefixes=ALL,RDC,CUDA +// RUN: | FileCheck %s --check-prefixes=ALL,RDC,CUDA,CUDARDC // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - \ // RUN: | FileCheck %s -check-prefix=NOGPUBIN // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ // RUN: -fcuda-include-gpubinary %t -o - -x hip\ // RUN: | FileCheck %s --check-prefixes=ALL,NORDC,HIP // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ // RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS -x hip \ -// RUN: | FileCheck %s -check-prefix=NOGLOBALS +// RUN: | FileCheck %s -check-prefixes=NOGLOBALS,HIPNOGLOBALS // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \ // RUN: -fcuda-rdc -fcuda-include-gpubinary %t -o - -x hip \ -// RUN: | FileCheck %s --check-prefixes=ALL,RDC,HIP +// RUN: | FileCheck %s --check-prefixes=ALL,RDC,HIP,HIPRDC // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - -x hip\ // RUN: | FileCheck %s -check-prefix=NOGPUBIN @@ -64,21 +64,26 @@ // * constant unnamed string with the kernel name // ALL: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00" // * constant unnamed string with GPU binary -// ALL: private unnamed_addr constant{{.*GPU binary would be here.*}}\00" -// NORDC-SAME: section ".nv_fatbin", align 8 -// RDC-SAME: section "__nv_relfatbin", align 8 +// HIP: @[[FATBIN:__hip_fatbin]] = external constant i8, section ".hip_fatbin" +// CUDA: @[[FATBIN:.*]] = private unnamed_addr constant{{.*GPU binary would be here.*}}\00", +// CUDANORDC-SAME: section ".nv_fatbin", align 8 +// CUDARDC-SAME: section "__nv_relfatbin", align 8 // * constant struct that wraps GPU binary -// CUDA: @__[[PREFIX:cuda]]_fatbin_wrapper = internal constant -// CUDA-SAME: { i32, i32, i8*, i8* } -// HIP: @__[[PREFIX:hip]]_fatbin_wrapper = internal constant -// HIP-SAME: { i32, i32, i8*, i8* } -// ALL-SAME: { i32 1180844977, i32 1, {{.*}}, i8* null } -// ALL-SAME: section 
".nvFatBinSegment" +// ALL: @__[[PREFIX:cuda|hip]]_fatbin_wrapper = internal constant +// ALL-SAME: { i32, i32, i8*, i8* } +// CUDA-SAME: { i32 1180844977, i32 1, +// HIP-SAME: { i32 1212764230, i32 1, +// CUDA-SAME: i8* getelementptr inbounds ({{.*}}@[[FATBIN]], i64 0, i64 0), +// HIP-SAME: i8* @[[FATBIN]], +// ALL-SAME: i8* null } +// CUDA-SAME: section ".nvFatBinSegment" +// HIP-SAME: section ".hipFatBinSegment" // * variable to save GPU binary handle after initialization // NORDC: @__[[PREFIX]]_gpubin_handle = internal global i8** null // * constant unnamed string with NVModuleID // RDC: [[MODULE_ID_GLOBAL:@.*]] = private unnamed_addr constant -// RDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32 +// CUDARDC-SAME: c"[[MODULE_ID:.+
[PATCH] D47099: Disable casting of alloca for ActiveFlag
yaxunl created this revision. yaxunl added a reviewer: rjmccall.

ActiveFlag is a temporary variable emitted for cleanup. It is declared as `AllocaInst *`, and SetActiveFlag casts its argument to AllocaInst. An alloca that has been cast to a generic pointer therefore triggers an assertion in SetActiveFlag. Since ActiveFlag is only loaded from and stored to, it is safe to use the original alloca, so this patch disables the cast.

https://reviews.llvm.org/D47099

Files:
  lib/CodeGen/CGCleanup.cpp
  test/CodeGenCXX/conditional-temporaries.cpp

Index: test/CodeGenCXX/conditional-temporaries.cpp
===
--- test/CodeGenCXX/conditional-temporaries.cpp
+++ test/CodeGenCXX/conditional-temporaries.cpp
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-apple-darwin9 -O3 | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=amdgcn-amd-amdhsa -O3 | FileCheck %s
 
 namespace {
 
Index: lib/CodeGen/CGCleanup.cpp
===
--- lib/CodeGen/CGCleanup.cpp
+++ lib/CodeGen/CGCleanup.cpp
@@ -284,7 +284,8 @@
 void CodeGenFunction::initFullExprCleanup() {
   // Create a variable to decide whether the cleanup needs to be run.
   Address active = CreateTempAlloca(Builder.getInt1Ty(), CharUnits::One(),
-                                    "cleanup.cond");
+                                    "cleanup.cond", /*ArraySize=*/nullptr,
+                                    /*Alloca=*/nullptr, /*Cast=*/false);
 
   // Initialize it to false at a site that's guaranteed to be run
   // before each evaluation.
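The failure mode described in this patch can be illustrated with a small toy model. This is only a sketch: `Value`, `AllocaInst`, `AddrSpaceCastInst` and `asAlloca` below are simplified stand-ins invented for illustration, not the real `llvm::` classes or Clang helpers, and `dynamic_cast` is used in place of LLVM's own casting machinery. It shows why a checked downcast to an alloca stops working once the alloca has been wrapped in an address space cast, and why keeping the original alloca avoids the problem.

```cpp
// Toy stand-ins for IR values. These are NOT the real llvm:: classes.
struct Value {
  virtual ~Value() = default;
};

// Models an alloca that lives in a target-specific address space
// (the amdgcn test in this thread shows allocas in addrspace(5)).
struct AllocaInst : Value {
  unsigned AddrSpace;
  explicit AllocaInst(unsigned AS) : AddrSpace(AS) {}
};

// Models an addrspacecast instruction wrapping another value.
struct AddrSpaceCastInst : Value {
  const Value *Operand;
  unsigned DestAddrSpace;
  AddrSpaceCastInst(const Value *Op, unsigned AS)
      : Operand(Op), DestAddrSpace(AS) {}
};

// Hypothetical helper: returns the alloca if V really is one, else nullptr.
// A checked cast in an assertions build would abort where this returns nullptr.
inline const AllocaInst *asAlloca(const Value *V) {
  return dynamic_cast<const AllocaInst *>(V);
}
```

In this model, `asAlloca` succeeds on the alloca itself and fails on the cast that wraps it, which mirrors the assertion described above.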
[PATCH] D47099: Disable casting of alloca for ActiveFlag
yaxunl added a comment. In https://reviews.llvm.org/D47099#1105493, @rjmccall wrote:
> Maybe there should just be a method that makes a primitive alloca without the casting, and then you can call that in CreateTempAlloca.

In many cases we still need to call CreateTempAlloca with the cast enabled, since we cannot be certain that the address is only loaded from and stored to. Any time the address is stored to another memory location or passed to another function (e.g. a constructor or destructor), it needs to be a pointer to the language's default address space, since that is how the language sees it.

https://reviews.llvm.org/D47099
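The rule stated in this reply can be written down as a tiny predicate. This is a sketch for illustration only: `AddrUse` and `canUseRawAlloca` are hypothetical names, and Clang does not classify uses this way at run time; the choice is made by whoever writes each call site.

```cpp
#include <algorithm>
#include <vector>

// How the address of a temporary is used. Toy classification, not Clang's.
enum class AddrUse { Load, Store, StoredToMemory, PassedToCall };

// Returns true when every use is a direct load or store, i.e. the address
// never escapes, so the uncast alloca could be used. Any escaping use means
// the address must be a pointer in the language's default address space.
inline bool canUseRawAlloca(const std::vector<AddrUse> &Uses) {
  return std::all_of(Uses.begin(), Uses.end(), [](AddrUse U) {
    return U == AddrUse::Load || U == AddrUse::Store;
  });
}
```

A flag that is only loaded and stored satisfies the predicate; a temporary handed to a constructor or destructor does not.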
[PATCH] D46472: [HIP] Support offloading by linker script
yaxunl added inline comments.

Comment at: cfe/trunk/lib/Driver/ToolChains/CommonArgs.cpp:1371-1388
+ // machines.
+ LksStream << "/*\n";
+ LksStream << " HIP Offload Linker Script\n";
+ LksStream << " *** Automatically generated by Clang ***\n";
+ LksStream << "*/\n";
+ LksStream << "TARGET(binary)\n";
+ LksStream << "INPUT(" << BundleFileName << ")\n";

pcc wrote:
> tra wrote:
> > Using this linker script may present a problem.
> >
> > INSERT BEFORE is not going to work with ld.gold.
> > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> >
> > LLD also does not handle it particularly well -- INSERT BEFORE can only be used to override explicitly specified external linker script and virtually nobody uses linker scripts with LLD.
> > See tests in https://reviews.llvm.org/D44380
> >
> If you're just trying to embed a file it may be better to use MC to write an ELF with something like:
> ```
> .section .hip_fatbin
> .align 16
> .globl __hip_fatbin
> __hip_fatbin:
> .incbin "BundleFileName"
> ```
> and add that to the link.

In https://reviews.llvm.org/D45212 we specified that lld is used for linking. Since this is an explicit linker script, it works. We have tested this approach with many HIP applications and it works well.

A similar approach has also been used by OpenMP for embedding the device binary.

Comment at: cfe/trunk/lib/Driver/ToolChains/CommonArgs.cpp:1371-1388
+ // machines.
+ LksStream << "/*\n";
+ LksStream << " HIP Offload Linker Script\n";
+ LksStream << " *** Automatically generated by Clang ***\n";
+ LksStream << "*/\n";
+ LksStream << "TARGET(binary)\n";
+ LksStream << "INPUT(" << BundleFileName << ")\n";

yaxunl wrote:
> pcc wrote:
> > tra wrote:
> > > Using this linker script may present a problem.
> > >
> > > INSERT BEFORE is not going to work with ld.gold.
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> > >
> > > LLD also does not handle it particularly well -- INSERT BEFORE can only be used to override explicitly specified external linker script and virtually nobody uses linker scripts with LLD.
> > > See tests in https://reviews.llvm.org/D44380
> > >
> > If you're just trying to embed a file it may be better to use MC to write an ELF with something like:
> > ```
> > .section .hip_fatbin
> > .align 16
> > .globl __hip_fatbin
> > __hip_fatbin:
> > .incbin "BundleFileName"
> > ```
> > and add that to the link.
> In https://reviews.llvm.org/D45212 we specified that lld is used for linking. Since this is an explicit linker script, it works. We have tested this approach with many HIP applications and it works well.
>
> A similar approach has also been used by OpenMP for embedding the device binary.

Thanks for your suggestion. We will consider using MC if we see issues arising from the linker script.

Repository: rL LLVM
https://reviews.llvm.org/D46472
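For comparison with the `LksStream` code quoted above, the assembly-based alternative suggested by pcc could be produced in much the same way. The following is a minimal sketch under stated assumptions: `makeFatbinEmbedAsm` is a hypothetical helper that is not part of Clang, it only builds the text of the assembly file, and it assumes the bundle file name contains no characters that need escaping inside a quoted assembler string.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of Clang: returns assembly text that embeds
// the file named BundleFileName in section .hip_fatbin and labels its start
// with the global symbol __hip_fatbin, following the snippet in the review.
// Assumption: BundleFileName needs no escaping inside a quoted string.
std::string makeFatbinEmbedAsm(const std::string &BundleFileName) {
  std::ostringstream OS;
  OS << "  .section .hip_fatbin\n"
     << "  .align 16\n"
     << "  .globl __hip_fatbin\n"
     << "__hip_fatbin:\n"
     << "  .incbin \"" << BundleFileName << "\"\n";
  return OS.str();
}
```

The resulting text would still have to be assembled and added to the link, which is the part the review discussion leaves open.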
[PATCH] D47099: Disable casting of alloca for ActiveFlag
yaxunl updated this revision to Diff 147860. yaxunl added a comment. Add CreateMemTempWithoutCast and CreateTempAllocaWithoutCast by John's comments. https://reviews.llvm.org/D47099 Files: lib/CodeGen/CGCall.cpp lib/CodeGen/CGCleanup.cpp lib/CodeGen/CGExpr.cpp lib/CodeGen/CodeGenFunction.h test/CodeGenCXX/conditional-temporaries.cpp Index: test/CodeGenCXX/conditional-temporaries.cpp === --- test/CodeGenCXX/conditional-temporaries.cpp +++ test/CodeGenCXX/conditional-temporaries.cpp @@ -1,4 +1,5 @@ // RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-apple-darwin9 -O3 | FileCheck %s +// RUN: %clang_cc1 -emit-llvm %s -o - -triple=amdgcn-amd-amdhsa -O3 | FileCheck %s namespace { Index: lib/CodeGen/CodeGenFunction.h === --- lib/CodeGen/CodeGenFunction.h +++ lib/CodeGen/CodeGenFunction.h @@ -2020,18 +2020,20 @@ /// to the stack. /// /// Because the address of a temporary is often exposed to the program in - /// various ways, this function will perform the cast by default. The cast - /// may be avoided by passing false as \p CastToDefaultAddrSpace; this is + /// various ways, this function will perform the cast. The original alloca + /// instruction is returned through \p Alloca if it is not nullptr. + /// + /// The cast is not performaed in CreateTempAllocaWithoutCast. This is /// more efficient if the caller knows that the address will not be exposed. - /// The original alloca instruction is returned through \p Alloca if it is - /// not nullptr. 
llvm::AllocaInst *CreateTempAlloca(llvm::Type *Ty, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr); Address CreateTempAlloca(llvm::Type *Ty, CharUnits align, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr, - Address *Alloca = nullptr, - bool CastToDefaultAddrSpace = true); + Address *Alloca = nullptr); + Address CreateTempAllocaWithoutCast(llvm::Type *Ty, CharUnits align, + const Twine &Name = "tmp", + llvm::Value *ArraySize = nullptr); /// CreateDefaultAlignedTempAlloca - This creates an alloca with the /// default ABI alignment of the given LLVM type. @@ -2066,15 +2068,18 @@ Address CreateIRTemp(QualType T, const Twine &Name = "tmp"); /// CreateMemTemp - Create a temporary memory object of the given type, with - /// appropriate alignment. Cast it to the default address space if - /// \p CastToDefaultAddrSpace is true. Returns the original alloca - /// instruction by \p Alloca if it is not nullptr. + /// appropriate alignmen and cast it to the default address space. Returns + /// the original alloca instruction by \p Alloca if it is not nullptr. Address CreateMemTemp(QualType T, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); Address CreateMemTemp(QualType T, CharUnits Align, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); + + /// CreateMemTemp - Create a temporary memory object of the given type, with + /// appropriate alignmen without casting it to the default address space. + Address CreateMemTempWithoutCast(QualType T, const Twine &Name = "tmp"); + Address CreateMemTempWithoutCast(QualType T, CharUnits Align, + const Twine &Name = "tmp"); /// CreateAggTemp - Create a temporary memory object for the given /// aggregate type. 
Index: lib/CodeGen/CGExpr.cpp === --- lib/CodeGen/CGExpr.cpp +++ lib/CodeGen/CGExpr.cpp @@ -61,11 +61,21 @@ /// CreateTempAlloca - This creates a alloca and inserts it into the entry /// block. +Address CodeGenFunction::CreateTempAllocaWithoutCast(llvm::Type *Ty, + CharUnits Align, + const Twine &Name, + llvm::Value *ArraySize) { + auto Alloca = CreateTempAlloca(Ty, Name, ArraySize); + Alloca->setAlignment(Align.getQuantity()); + return Address(Alloca, Align); +} + +/// CreateTempAlloca - This creates a alloca and inserts it into the entry +/// block. The alloca is casted to default address space if necessary. Address CodeGenFunction::CreateTempAlloca(llvm::Type *Ty, CharUnits Align, const Twine &Name, llvm::Value *ArraySize, -
[PATCH] D47099: Disable casting of alloca for ActiveFlag
yaxunl updated this revision to Diff 147914. yaxunl edited the summary of this revision. yaxunl added a comment. Revised by John's comments. https://reviews.llvm.org/D47099 Files: lib/CodeGen/CGCall.cpp lib/CodeGen/CGCleanup.cpp lib/CodeGen/CGExpr.cpp lib/CodeGen/CodeGenFunction.h test/CodeGenCXX/conditional-temporaries.cpp Index: test/CodeGenCXX/conditional-temporaries.cpp === --- test/CodeGenCXX/conditional-temporaries.cpp +++ test/CodeGenCXX/conditional-temporaries.cpp @@ -1,4 +1,5 @@ // RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-apple-darwin9 -O3 | FileCheck %s +// RUN: %clang_cc1 -emit-llvm %s -o - -triple=amdgcn-amd-amdhsa -O3 | FileCheck %s namespace { Index: lib/CodeGen/CodeGenFunction.h === --- lib/CodeGen/CodeGenFunction.h +++ lib/CodeGen/CodeGenFunction.h @@ -2020,18 +2020,20 @@ /// to the stack. /// /// Because the address of a temporary is often exposed to the program in - /// various ways, this function will perform the cast by default. The cast - /// may be avoided by passing false as \p CastToDefaultAddrSpace; this is + /// various ways, this function will perform the cast. The original alloca + /// instruction is returned through \p Alloca if it is not nullptr. + /// + /// The cast is not performaed in CreateTempAllocaWithoutCast. This is /// more efficient if the caller knows that the address will not be exposed. - /// The original alloca instruction is returned through \p Alloca if it is - /// not nullptr. 
llvm::AllocaInst *CreateTempAlloca(llvm::Type *Ty, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr); Address CreateTempAlloca(llvm::Type *Ty, CharUnits align, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr, - Address *Alloca = nullptr, - bool CastToDefaultAddrSpace = true); + Address *Alloca = nullptr); + Address CreateTempAllocaWithoutCast(llvm::Type *Ty, CharUnits align, + const Twine &Name = "tmp", + llvm::Value *ArraySize = nullptr); /// CreateDefaultAlignedTempAlloca - This creates an alloca with the /// default ABI alignment of the given LLVM type. @@ -2066,15 +2068,18 @@ Address CreateIRTemp(QualType T, const Twine &Name = "tmp"); /// CreateMemTemp - Create a temporary memory object of the given type, with - /// appropriate alignment. Cast it to the default address space if - /// \p CastToDefaultAddrSpace is true. Returns the original alloca - /// instruction by \p Alloca if it is not nullptr. + /// appropriate alignmen and cast it to the default address space. Returns + /// the original alloca instruction by \p Alloca if it is not nullptr. Address CreateMemTemp(QualType T, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); Address CreateMemTemp(QualType T, CharUnits Align, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); + + /// CreateMemTemp - Create a temporary memory object of the given type, with + /// appropriate alignmen without casting it to the default address space. + Address CreateMemTempWithoutCast(QualType T, const Twine &Name = "tmp"); + Address CreateMemTempWithoutCast(QualType T, CharUnits Align, + const Twine &Name = "tmp"); /// CreateAggTemp - Create a temporary memory object for the given /// aggregate type. 
Index: lib/CodeGen/CGExpr.cpp === --- lib/CodeGen/CGExpr.cpp +++ lib/CodeGen/CGExpr.cpp @@ -61,21 +61,30 @@ /// CreateTempAlloca - This creates a alloca and inserts it into the entry /// block. +Address CodeGenFunction::CreateTempAllocaWithoutCast(llvm::Type *Ty, + CharUnits Align, + const Twine &Name, + llvm::Value *ArraySize) { + auto Alloca = CreateTempAlloca(Ty, Name, ArraySize); + Alloca->setAlignment(Align.getQuantity()); + return Address(Alloca, Align); +} + +/// CreateTempAlloca - This creates a alloca and inserts it into the entry +/// block. The alloca is casted to default address space if necessary. Address CodeGenFunction::CreateTempAlloca(llvm::Type *Ty, CharUnits Align, const Twine &Name, llvm::Value *ArraySize, -
[PATCH] D47099: Call CreateTempAllocaWithoutCast for ActiveFlag
This revision was automatically updated to reflect the committed changes. Closed by commit rC332982: Call CreateTempMemWithoutCast for ActiveFlag (authored by yaxunl, committed by ). Repository: rC Clang https://reviews.llvm.org/D47099 Files: lib/CodeGen/CGCall.cpp lib/CodeGen/CGCleanup.cpp lib/CodeGen/CGExpr.cpp lib/CodeGen/CodeGenFunction.h test/CodeGenCXX/conditional-temporaries.cpp Index: lib/CodeGen/CGCall.cpp === --- lib/CodeGen/CGCall.cpp +++ lib/CodeGen/CGCall.cpp @@ -3888,9 +3888,8 @@ assert(NumIRArgs == 1); if (!I->isAggregate()) { // Make a temporary alloca to pass the argument. -Address Addr = CreateMemTemp(I->Ty, ArgInfo.getIndirectAlign(), - "indirect-arg-temp", /*Alloca=*/nullptr, - /*Cast=*/false); +Address Addr = CreateMemTempWithoutCast( +I->Ty, ArgInfo.getIndirectAlign(), "indirect-arg-temp"); IRCallArgs[FirstIRArg] = Addr.getPointer(); I->copyInto(*this, Addr); @@ -3935,9 +3934,8 @@ } if (NeedCopy) { // Create an aligned temporary, and copy to it. - Address AI = CreateMemTemp(I->Ty, ArgInfo.getIndirectAlign(), - "byval-temp", /*Alloca=*/nullptr, - /*Cast=*/false); + Address AI = CreateMemTempWithoutCast( + I->Ty, ArgInfo.getIndirectAlign(), "byval-temp"); IRCallArgs[FirstIRArg] = AI.getPointer(); I->copyInto(*this, AI); } else { Index: lib/CodeGen/CodeGenFunction.h === --- lib/CodeGen/CodeGenFunction.h +++ lib/CodeGen/CodeGenFunction.h @@ -2020,18 +2020,20 @@ /// to the stack. /// /// Because the address of a temporary is often exposed to the program in - /// various ways, this function will perform the cast by default. The cast - /// may be avoided by passing false as \p CastToDefaultAddrSpace; this is + /// various ways, this function will perform the cast. The original alloca + /// instruction is returned through \p Alloca if it is not nullptr. + /// + /// The cast is not performaed in CreateTempAllocaWithoutCast. This is /// more efficient if the caller knows that the address will not be exposed. 
- /// The original alloca instruction is returned through \p Alloca if it is - /// not nullptr. llvm::AllocaInst *CreateTempAlloca(llvm::Type *Ty, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr); Address CreateTempAlloca(llvm::Type *Ty, CharUnits align, const Twine &Name = "tmp", llvm::Value *ArraySize = nullptr, - Address *Alloca = nullptr, - bool CastToDefaultAddrSpace = true); + Address *Alloca = nullptr); + Address CreateTempAllocaWithoutCast(llvm::Type *Ty, CharUnits align, + const Twine &Name = "tmp", + llvm::Value *ArraySize = nullptr); /// CreateDefaultAlignedTempAlloca - This creates an alloca with the /// default ABI alignment of the given LLVM type. @@ -2066,15 +2068,18 @@ Address CreateIRTemp(QualType T, const Twine &Name = "tmp"); /// CreateMemTemp - Create a temporary memory object of the given type, with - /// appropriate alignment. Cast it to the default address space if - /// \p CastToDefaultAddrSpace is true. Returns the original alloca - /// instruction by \p Alloca if it is not nullptr. + /// appropriate alignmen and cast it to the default address space. Returns + /// the original alloca instruction by \p Alloca if it is not nullptr. Address CreateMemTemp(QualType T, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); Address CreateMemTemp(QualType T, CharUnits Align, const Twine &Name = "tmp", -Address *Alloca = nullptr, -bool CastToDefaultAddrSpace = true); +Address *Alloca = nullptr); + + /// CreateMemTemp - Create a temporary memory object of the given type, with + /// appropriate alignmen without casting it to the default address space. + Address CreateMemTempWithoutCast(QualType T, const Twine &Name = "tmp"); + Address CreateMemTempWithoutCast(QualType T, CharUnits Align, + const Twine &Name = "tmp"); /// CreateAggTemp - Create a temporary memory object for the given /// aggregate type. 
Index: lib/CodeGen/CGCleanup.cpp === --- lib/CodeGen/CGCleanup.cpp +++ lib/CodeGen/CGCleanup.cpp @@ -283,8 +283,8
[PATCH] D47099: Call CreateTempAllocaWithoutCast for ActiveFlag
yaxunl added a comment.

I reverted it since it caused regressions on ARM and some other architectures.

Script:
--
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/bin/clang -cc1 -internal-isystem /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/lib/clang/7.0.0/include -nostdsysteminc -emit-llvm /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp -o - -triple=x86_64-apple-darwin9 -O3 | /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/bin/FileCheck /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/bin/clang -cc1 -internal-isystem /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/lib/clang/7.0.0/include -nostdsysteminc -emit-llvm /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp -o - -triple=amdgcn-amd-amdhsa -O3 | /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/stage1/bin/FileCheck /home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp
--
Exit Code: 1

Command Output (stderr):
--
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp:42:12: error: expected string not found in input
  // CHECK: ret i32 5
           ^
<stdin>:11:33: note: scanning from here
define i32 @_Z12getCtorCallsv() local_unnamed_addr #0 {
                                ^
<stdin>:14:2: note: possible intended match here
  ret i32 %0
 ^
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp:48:12: error: expected string not found in input
  // CHECK: ret i32 5
           ^
<stdin>:18:33: note: scanning from here
define i32 @_Z12getDtorCallsv() local_unnamed_addr #0 {
                                ^
<stdin>:21:2: note: possible intended match here
  ret i32 %0
 ^
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/tools/clang/test/CodeGenCXX/conditional-temporaries.cpp:54:12: error: expected string not found in input
  // CHECK: ret i1 true
           ^
<stdin>:25:34: note: scanning from here
define zeroext i1 @_Z7successv() local_unnamed_addr #0 {
                                 ^
<stdin>:30:2: note: possible intended match here
  ret i1 %cmp
 ^
--

The strange thing is that this only happens on some architectures. It passes on my x86_64/Ubuntu build with clang.

Repository: rC Clang

https://reviews.llvm.org/D47099

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl marked 4 inline comments as done.
yaxunl added inline comments.

Comment at: lib/Driver/Driver.cpp:2221
+        CudaDeviceActions.clear();
+        for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
+          CudaDeviceActions.push_back(UA);

tra wrote:
> `for(auto Arch: GpuArchList)`

will do

Comment at: lib/Driver/Driver.cpp:2265-2272
+      assert(AssociatedOffloadKind == Action::OFK_Cuda || AssociatedOffloadKind == Action::OFK_HIP);
+
       // We don't need to support CUDA.
-      if (!C.hasOffloadToolChain())
+      if (AssociatedOffloadKind == Action::OFK_Cuda && !C.hasOffloadToolChain())
+        return false;
+
+      // We don't need to support HIP.

tra wrote:
> Please reformat.

will do

Comment at: lib/Driver/Driver.cpp:2330-2332
+      for (CudaArch Arch : GpuArchs) {
         GpuArchList.push_back(Arch);
+      }

tra wrote:
> Single-statement for does not need braces.

will do

Comment at: lib/Driver/Driver.cpp:2485-2493
+      // The host only depends on device action in the linking phase, when all
+      // the device images have to be embedded in the host image.
+      if (CurPhase == phases::Link) {
+        DeviceLinkerInputs.resize(CudaDeviceActions.size());
+        auto LI = DeviceLinkerInputs.begin();
+        for (auto *A : CudaDeviceActions) {
+          LI->push_back(A);

tra wrote:
> I'm not sure I understand what happens here and the comment does not help.
> We appear to add each element of CudaDeviceActions to the action list of each
> linker input.
>
> Does the comment mean that *only in linking mode* do we need to add
> dependency on device actions?
>

Modified the comment to make it clearer. We only add a dependency on device actions at the linking phase. HIP embeds the device image in the host image in the host linking phase. Since we need to link all device actions, we cannot create the link action here because we have not gone through all device actions yet. We just save the device actions to DeviceLinkerInputs and create the device link action later in appendLinkDependences, by which point all device actions have been visited.

https://reviews.llvm.org/D46476
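
The deferral described in the last reply can be illustrated with a small standalone sketch. This is not the Driver.cpp code: `Action`, `DeferredDeviceLinker`, `recordDeviceActions` and `buildLinkJobs` are hypothetical names invented for illustration, and the "link job" is only a string. It assumes one device action per GPU architecture per input file, which is what the quoted hunk suggests.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver action; NOT clang::driver::Action.
struct Action {
  std::string Name;
};
using ActionList = std::vector<const Action *>;

// Hypothetical helper modelling the "save now, link later" idea.
class DeferredDeviceLinker {
  // One list of pending link inputs per GPU architecture.
  std::vector<ActionList> DeviceLinkerInputs;

public:
  // Called once per input file at the link phase. DeviceActions holds one
  // action per GPU architecture, in a fixed architecture order.
  void recordDeviceActions(const ActionList &DeviceActions) {
    if (DeviceLinkerInputs.size() < DeviceActions.size())
      DeviceLinkerInputs.resize(DeviceActions.size());
    for (std::size_t I = 0; I < DeviceActions.size(); ++I)
      DeviceLinkerInputs[I].push_back(DeviceActions[I]);
  }

  // Called after every input has been visited: only now is the full set of
  // inputs for each architecture known, so the link jobs can be formed.
  std::vector<std::string> buildLinkJobs() const {
    std::vector<std::string> Jobs;
    for (const ActionList &Inputs : DeviceLinkerInputs) {
      std::string Job = "link(";
      for (std::size_t I = 0; I < Inputs.size(); ++I) {
        if (I != 0)
          Job += ", ";
        Job += Inputs[I]->Name;
      }
      Job += ")";
      Jobs.push_back(Job);
    }
    return Jobs;
  }
};
```

Recording the actions for two inputs and two architectures and then calling `buildLinkJobs()` yields one job per architecture, each listing both inputs. Creating the job inside `recordDeviceActions` would see only the inputs visited so far.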
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl updated this revision to Diff 148051. yaxunl marked 4 inline comments as done. yaxunl added a comment. Revised by Artem's comments. https://reviews.llvm.org/D46476 Files: lib/Driver/Driver.cpp test/Driver/cuda-phases.cu Index: test/Driver/cuda-phases.cu === --- test/Driver/cuda-phases.cu +++ test/Driver/cuda-phases.cu @@ -7,195 +7,242 @@ // REQUIRES: clang-driver // REQUIRES: powerpc-registered-target // REQUIRES: nvptx-registered-target - +// REQUIRES: amdgpu-registered-target // // Test single gpu architecture with complete compilation. // // RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \ -// RUN: | FileCheck -check-prefix=BIN %s -// BIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda) -// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda) -// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda) -// BIN-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30) -// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, cuda-cpp-output, (device-cuda, sm_30) -// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-cuda, sm_30) -// BIN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-cuda, sm_30) -// BIN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-cuda, sm_30) -// BIN-DAG: [[P8:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P7]]}, object -// BIN-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P6]]}, assembler -// BIN-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-cuda) -// BIN-DAG: [[P11:[0-9]+]]: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-cuda (nvptx64-nvidia-cuda)" {[[P10]]}, ir -// BIN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-cuda) -// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-cuda) -// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-cuda) +// RUN: | FileCheck -check-prefixes=BIN,BIN_NV %s 
+// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s 2>&1 \ +// RUN: | FileCheck -check-prefixes=BIN,BIN_AMD %s +// BIN_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]]) +// BIN_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]]) +// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]]) +// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]]) +// BIN_NV-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:sm_30]]) +// BIN_AMD-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:gfx803]]) +// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]]) +// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]]) +// BIN_NV-DAG: [[P8:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda]]:[[ARCH]])" {[[P7]]}, object +// BIN_NV-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH]])" {[[P6]]}, assembler +// BIN_NV-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-[[T]]) +// BIN_NV-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-[[T]] ([[TRIPLE]])" {[[P10]]}, ir +// BIN_NV-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]]) +// BIN_AMD-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]]) +// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]]) +// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]]) +// BIN_AMD-DAG: [[P15:[0-9]+]]: linker, {[[P5]]}, image, (device-[[T]], [[ARCH]]) +// BIN_AMD-DAG: [[P16:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P14]]}, +// BIN_AMD-DAG-SAME: "device-[[T]] 
([[TRIPLE:amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P15]]}, object // // Test single gpu architecture up to the assemble phase. // // RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \ -// RUN: | FileCheck -check-prefix=ASM %s -// ASM-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30) -// ASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30) -// ASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30) -// ASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30) -// ASM-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler -// ASM-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda) -// ASM-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, cuda-cpp-output, (host-cuda) -// ASM-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (host-cuda) -// ASM-DAG: [[P8:[0-9]+]]: backe
[PATCH] D45212: [HIP] Let CUDA toolchain support HIP language mode and amdgpu
yaxunl marked 19 inline comments as done. yaxunl added a comment. In https://reviews.llvm.org/D45212#1105177, @tra wrote: > Hi, > > Sorry about the long silence. I'm back to continue the reviews. I'll handle > what I can today and will continue with the rest on Tuesday. > > It looks like patch description needs to be updated: > > > Use clang-offload-bindler to create binary for device ISA. > > I don't see anything related to offload-bundler in this patch any more. You are right. Using clang-offload-bundler to create binary for device ISA has been moved to another patch. Will update the description of this patch. In https://reviews.llvm.org/D45212#1105282, @tra wrote: > One more thing -- it would be really good to add some tests to make sure your > commands are constructed the way you want. will do Comment at: include/clang/Driver/Options.td:582 def fno_cuda_rdc : Flag<["-"], "fno-cuda-rdc">; +def hip_device_lib_path_EQ : Joined<["--"], "hip-device-lib-path=">, Group, + HelpText<"HIP device library path">; tra wrote: > I'm not sure about `i_Group`? This will cause this option to be passed to all > preprocessor jobs. It will also be passed to host and device side > compilations, while you probably only want/need it on device side only. will change to Link_Group Comment at: lib/Driver/ToolChains/Cuda.cpp:323 +C.getDriver().Diag(diag::err_drv_no_such_file) << BCName; + CmdArgs.push_back(Args.MakeArgString(FullName)); + return FoundLibDevice; tra wrote: > FullName is already result of Args.MakeArgString. You only need to do it once. will fix Comment at: lib/Driver/ToolChains/Cuda.cpp:329 +// object file. It calls llvm-link, opt, llc, then lld steps. +void AMDGCN::Linker::ConstructJob(Compilation &C, const JobAction &JA, + const InputInfo &Output, tra wrote: > This function is too large to easily see that we're actually constructing > sequence of commands. > I'd probably split construction of individual tool's command line into its > own function. 
will do Comment at: lib/Driver/ToolChains/Cuda.cpp:336 + assert(StringRef(JA.getOffloadingArch()).startswith("gfx") && + " unless gfx processor, backend should be clang"); + const auto &TC = tra wrote: > No need for the leading space in the message. will fix. Comment at: lib/Driver/ToolChains/Cuda.cpp:344-345 + // Add the input bc's created by compile step. + for (InputInfoList::const_iterator it = Inputs.begin(), ie = Inputs.end(); + it != ie; ++it) { +const InputInfo &II = *it; tra wrote: > `for (const InputInfo &it : Inputs)` ? will fix Comment at: lib/Driver/ToolChains/Cuda.cpp:350 + + std::string GFXNAME = JA.getOffloadingArch(); + tra wrote: > All-caps name looks like a macro. Rename to `GfxName` ? will fix Comment at: lib/Driver/ToolChains/Cuda.cpp:354-359 + // Find in --hip-device-lib-path and HIP_LIBRARY_PATH. + for (auto Arg : Args) { +if (Arg->getSpelling() == "--hip-device-lib-path=") { + LibraryPaths.push_back(Args.MakeArgString(Arg->getValue())); +} + } tra wrote: > ``` > for (path : Args.getAllArgValues(...)) { >LibraryPaths.push_back(Args.MakeArgString(path)); > } > > ``` will fix Comment at: lib/Driver/ToolChains/Cuda.cpp:375-378 + addBCLib(C, Args, CmdArgs, LibraryPaths, + (Twine("oclc_isa_version_") + StringRef(GFXNAME).drop_front(3) + +".amdgcn.bc") + .str()); tra wrote: > This is somewhat unreadable. Perhaps you could construct the name in a temp > variable. will do Comment at: lib/Driver/ToolChains/Cuda.cpp:384 + const char *ResultingBitcodeF = + C.addTempFile(C.getArgs().MakeArgString(TmpName.c_str())); + CmdArgs.push_back(ResultingBitcodeF); tra wrote: > You don't need to use c_str() for MakeArgString. It will happily accept > std::string. will fix Comment at: lib/Driver/ToolChains/Cuda.cpp:394 + // The input to opt is the output from llvm-link. + OptArgs.push_back(ResultingBitcodeF); + // Pass optimization arg to opt. tra wrote: > `BitcodeOutputFile`? 
will change Comment at: lib/Driver/ToolChains/Cuda.cpp:417 + const char *mcpustr = Args.MakeArgString("-mcpu=" + GFXNAME); + OptArgs.push_back(mcpustr); + OptArgs.push_back("-o"); tra wrote: > I think you can get rid of the temp var here without hurting readability. will do Comment at: lib/Driver/ToolChains/Cuda.cpp:420 + std::string OptOutputFileName = + C.getDriver().GetTemporaryPath("OPT_OUTPUT", "bc"); + const char *OptOutputFile = tra wrote: > I wonder if we could derive temp file name from the input's name. Th
[PATCH] D45212: Add HIP toolchain
yaxunl updated this revision to Diff 148216. yaxunl marked 19 inline comments as done. yaxunl retitled this revision from "[HIP] Let CUDA toolchain support HIP language mode and amdgpu" to "Add HIP toolchain". yaxunl edited the summary of this revision. yaxunl added a comment. Herald added a subscriber: mgorny. Revised by Artem's comments. https://reviews.llvm.org/D45212 Files: include/clang/Driver/Options.td lib/Driver/CMakeLists.txt lib/Driver/Driver.cpp lib/Driver/ToolChains/HIP.cpp lib/Driver/ToolChains/HIP.h test/Driver/Inputs/hip_multiple_inputs/lib1/lib1.bc test/Driver/Inputs/hip_multiple_inputs/lib2/lib2.bc test/Driver/hip-toolchain.hip Index: test/Driver/hip-toolchain.hip === --- /dev/null +++ test/Driver/hip-toolchain.hip @@ -0,0 +1,84 @@ +// REQUIRES: clang-driver +// REQUIRES: x86-registered-target +// REQUIRES: amdgpu-registered-target + +// RUN: %clang -### -target x86_64-linux-gnu \ +// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \ +// RUN: --hip-device-lib=lib1.bc --hip-device-lib=lib2.bc \ +// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \ +// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib2 \ +// RUN: -fuse-ld=lld \ +// RUN: %S/Inputs/hip_multiple_inputs/a.cu \ +// RUN: %S/Inputs/hip_multiple_inputs/b.hip \ +// RUN: 2>&1 | FileCheck %s + +// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[A_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC:".*a.cu"]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[B_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC:".*b.hip"]] + +// CHECK: [[LLVM_LINK:"*.llvm-link"]] [[A_BC]] [[B_BC]] +// 
CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc" +// CHECK-SAME: "-o" [[LINKED_BC_DEV1:".*-gfx803-linked-.*bc"]] + +// CHECK: [[OPT:".*opt"]] [[LINKED_BC_DEV1]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-mcpu=gfx803" +// CHECK-SAME: "-o" [[OPT_BC_DEV1:".*-gfx803-optimized.*bc"]] + +// CHECK: [[LLC: ".*llc"]] [[OPT_BC_DEV1]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-filetype=obj" "-mcpu=gfx803" "-o" [[OBJ_DEV1:".*-gfx803-.*o"]] + +// CHECK: [[LLD: ".*lld"]] "-flavor" "gnu" "--no-undefined" "-shared" +// CHECK-SAME: "-o" "[[IMG_DEV1:.*out]]" [[OBJ_DEV1]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[A_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[B_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC]] + +// CHECK: [[LLVM_LINK]] [[A_BC]] [[B_BC]] +// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc" +// CHECK-SAME: "-o" [[LINKED_BC_DEV2:".*-gfx900-linked-.*bc"]] + +// CHECK: [[OPT]] [[LINKED_BC_DEV2]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-mcpu=gfx900" +// CHECK-SAME: "-o" [[OPT_BC_DEV2:".*-gfx900-optimized.*bc"]] + +// CHECK: [[LLC]] [[OPT_BC_DEV2]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-filetype=obj" "-mcpu=gfx900" "-o" [[OBJ_DEV2:".*-gfx900-.*o"]] + +// CHECK: [[LLD]] "-flavor" "gnu" "--no-undefined" "-shared" +// CHECK-SAME: "-o" "[[IMG_DEV2:.*out]]" [[OBJ_DEV2]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64--linux-gnu" +// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa" "-emit-obj" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" +// CHECK-SAME: {{.*}} "-o" [[A_OBJ_HOST:".*o"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC]] + +// 
CHECK: [[CLANG]] "-cc1" "-triple" "x86_64--linux-gnu" +// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa" "-emit-obj" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" +// CHECK-SAME: {{.*}} "-o" [[B_OBJ_HOST:".*o"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC]] + +// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o" +// CHECK-SAME: "-targets={{.*}},hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900" +// CHECK-SAME: "-inputs={{.*}},[[IMG_DEV1]],[[IMG_DEV2]]" "-outputs=[[BUNDLE:.*o]]" + +// CHECK: [[LD:".*ld.lld"]] {{.*}} [[A_OBJ_HOST]] [[B_OBJ_HOST]] +// CHECK-SAME: {{.*}} "-T" "{{.*}}.lk" Index: lib/Driver/ToolChains/HIP.h === --- /dev/null +++ lib/Driver/ToolChains/HIP.h @@ -0,0 +1,123 @@ +//===--- HIP.h - HIP ToolChain Implementations --*- C++ -*-===// +// +// The LLVM Compiler Infrastructur
[PATCH] D45212: Add HIP toolchain
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.

Comment at: lib/Driver/ToolChains/HIP.cpp:29-47
+static bool addBCLib(Compilation &C, const ArgList &Args,
+                     ArgStringList &CmdArgs, ArgStringList LibraryPaths,
+                     StringRef BCName) {
+  StringRef FullName;
+  bool FoundLibDevice = false;
+  for (std::string LibraryPath : LibraryPaths) {
+    SmallString<128> Path(LibraryPath);

tra wrote:
> FullName may remain uninitialized if LibraryPaths are empty which will
> probably crash compiler when you attempt to pass it to MakeArgString.
> If empty LibraryPaths is not expected there should be an assert.
>
> If the library is not found, we issue an error, but we still proceed to
> append the FullName to the CmdArgs. I don't think we should do that. FullName
> will be either NULL or pointing to the last directory in the LibraryPaths.
>
> You seem to be relying on diagnostics to deal with errors and are not using
> return value of the function. You may as well make it void.
>
> I'd move `CmdArgs.push_back(...)` under `if(::exists(FullName))` and change
> `break` to `return`;
> Then you can get rid of FoundLibDevice and just issue the error if we ever
> reach the end of the function.
>

Will move CmdArgs.push_back(...) under if(::exists(FullName)), change break to return, and change the return type to void.

Comment at: lib/Driver/ToolChains/HIP.cpp:79-81
+    std::string ISAVerBC = "oclc_isa_version_";
+    ISAVerBC = ISAVerBC + SubArchName.drop_front(3).str();
+    ISAVerBC = ISAVerBC + ".amdgcn.bc";

tra wrote:
> No need for intermediate values here -- just '+' all parts together.
>

will do

Comment at: lib/Driver/ToolChains/HIP.cpp:133
+    }
+    OptArgs.push_back(Args.MakeArgString(llvm::Twine("-O") + OOpt));
+  }

tra wrote:
> Nit: I think explicit llvm::Twine is unnecessary here.

will remove

Comment at: lib/Driver/ToolChains/HIP.cpp:155-160
+  ArgStringList LlcArgs;
+  LlcArgs.push_back(InputFileName);
+  LlcArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
+  LlcArgs.push_back("-filetype=obj");
+  LlcArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));
+  LlcArgs.push_back("-o");

tra wrote:
> Nit: This could be collapsed into `ArgStringList LlcArgs({...});`

will do

Comment at: lib/Driver/ToolChains/HIP.cpp:179-181
+  ArgStringList LldArgs;
+  // The output from ld.lld is an HSA code object file.
+  LldArgs.append({"-flavor", "gnu", "--no-undefined", "-shared", "-o"});

tra wrote:
> Same here: `ArgStringList LldArgs({"-flavor", "gnu", "--no-undefined",
> "-shared", "-o"});`

will do

Comment at: lib/Driver/ToolChains/HIP.cpp:212-215
+  TempFile =
+      constructOptCommand(C, JA, Inputs, Args, SubArchName, Prefix, TempFile);
+  TempFile =
+      constructLlcCommand(C, JA, Inputs, Args, SubArchName, Prefix, TempFile);

tra wrote:
> Right now the code is structured as if you're appending to the same TempFile
> string which is not the case here. I'd give intermediate variables their own
> names -- `OptCommand`,`LlcCommand`.
> This would make it easier to see that you are **chaining** separate commands,
> each producing its own temp output file.

will do

https://reviews.llvm.org/D45212
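
For readers following the addBCLib discussion, here is a minimal standalone sketch of the shape tra suggests: return from inside the loop as soon as the library is found, and treat reaching the end of the function as "not found". It is not the HIP.cpp code. It uses the C++17 standard library instead of LLVM's `SmallString`/`llvm::sys::fs`, and the names `findBitcodeLib` and `isaVersionLibName` are hypothetical. In the real patch the found path is appended to `CmdArgs` and a driver diagnostic is emitted on failure; here the caller gets an `std::optional` instead.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: builds the ISA-version library name in a single
// expression, e.g. "gfx803" -> "oclc_isa_version_803.amdgcn.bc".
// Precondition (assumed, as in the patch's assert): the name starts with "gfx".
std::string isaVersionLibName(const std::string &SubArchName) {
  return "oclc_isa_version_" + SubArchName.substr(3) + ".amdgcn.bc";
}

// Hypothetical helper: returns the first existing LibraryPath/BCName, or
// std::nullopt if no search directory contains the file. Because the function
// returns from inside the loop, no "found" flag is needed, and an empty
// LibraryPaths list simply falls through to the not-found result.
std::optional<std::string>
findBitcodeLib(const std::vector<std::string> &LibraryPaths,
               const std::string &BCName) {
  for (const std::string &LibraryPath : LibraryPaths) {
    std::filesystem::path Candidate =
        std::filesystem::path(LibraryPath) / BCName;
    std::error_code EC; // Non-throwing overload; errors count as "not found".
    if (std::filesystem::exists(Candidate, EC))
      return Candidate.string();
  }
  return std::nullopt;
}
```

A caller would report an error only when `findBitcodeLib` returns `std::nullopt`, so nothing is ever appended to the command line for a library that was not found.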
[PATCH] D45212: Add HIP toolchain
yaxunl updated this revision to Diff 148277. yaxunl marked 6 inline comments as done. yaxunl added a comment. Revised by Artem's comments. https://reviews.llvm.org/D45212 Files: include/clang/Driver/Options.td lib/Driver/CMakeLists.txt lib/Driver/Driver.cpp lib/Driver/ToolChains/HIP.cpp lib/Driver/ToolChains/HIP.h test/Driver/Inputs/hip_multiple_inputs/lib1/lib1.bc test/Driver/Inputs/hip_multiple_inputs/lib2/lib2.bc test/Driver/hip-toolchain.hip Index: test/Driver/hip-toolchain.hip === --- /dev/null +++ test/Driver/hip-toolchain.hip @@ -0,0 +1,84 @@ +// REQUIRES: clang-driver +// REQUIRES: x86-registered-target +// REQUIRES: amdgpu-registered-target + +// RUN: %clang -### -target x86_64-linux-gnu \ +// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \ +// RUN: --hip-device-lib=lib1.bc --hip-device-lib=lib2.bc \ +// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \ +// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib2 \ +// RUN: -fuse-ld=lld \ +// RUN: %S/Inputs/hip_multiple_inputs/a.cu \ +// RUN: %S/Inputs/hip_multiple_inputs/b.hip \ +// RUN: 2>&1 | FileCheck %s + +// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[A_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC:".*a.cu"]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[B_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC:".*b.hip"]] + +// CHECK: [[LLVM_LINK:"*.llvm-link"]] [[A_BC]] [[B_BC]] +// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc" +// CHECK-SAME: "-o" [[LINKED_BC_DEV1:".*-gfx803-linked-.*bc"]] + +// CHECK: [[OPT:".*opt"]] [[LINKED_BC_DEV1]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: 
"-mcpu=gfx803" +// CHECK-SAME: "-o" [[OPT_BC_DEV1:".*-gfx803-optimized.*bc"]] + +// CHECK: [[LLC: ".*llc"]] [[OPT_BC_DEV1]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-filetype=obj" "-mcpu=gfx803" "-o" [[OBJ_DEV1:".*-gfx803-.*o"]] + +// CHECK: [[LLD: ".*lld"]] "-flavor" "gnu" "--no-undefined" "-shared" +// CHECK-SAME: "-o" "[[IMG_DEV1:.*out]]" [[OBJ_DEV1]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[A_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa" +// CHECK-SAME: "-aux-triple" "x86_64--linux-gnu" "-emit-llvm-bc" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" {{.*}} "-fcuda-is-device" +// CHECK-SAME: {{.*}} "-o" [[B_BC:".*bc"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC]] + +// CHECK: [[LLVM_LINK]] [[A_BC]] [[B_BC]] +// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc" +// CHECK-SAME: "-o" [[LINKED_BC_DEV2:".*-gfx900-linked-.*bc"]] + +// CHECK: [[OPT]] [[LINKED_BC_DEV2]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-mcpu=gfx900" +// CHECK-SAME: "-o" [[OPT_BC_DEV2:".*-gfx900-optimized.*bc"]] + +// CHECK: [[LLC]] [[OPT_BC_DEV2]] "-mtriple=amdgcn-amd-amdhsa" +// CHECK-SAME: "-filetype=obj" "-mcpu=gfx900" "-o" [[OBJ_DEV2:".*-gfx900-.*o"]] + +// CHECK: [[LLD]] "-flavor" "gnu" "--no-undefined" "-shared" +// CHECK-SAME: "-o" "[[IMG_DEV2:.*out]]" [[OBJ_DEV2]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64--linux-gnu" +// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa" "-emit-obj" +// CHECK-SAME: {{.*}} "-main-file-name" "a.cu" +// CHECK-SAME: {{.*}} "-o" [[A_OBJ_HOST:".*o"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[A_SRC]] + +// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64--linux-gnu" +// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa" "-emit-obj" +// CHECK-SAME: {{.*}} "-main-file-name" "b.hip" +// CHECK-SAME: {{.*}} "-o" 
[[B_OBJ_HOST:".*o"]] "-x" "hip" +// CHECK-SAME: {{.*}} [[B_SRC]] + +// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o" +// CHECK-SAME: "-targets={{.*}},hip-amdgcn-amd-amdhsa-gfx803,hip-amdgcn-amd-amdhsa-gfx900" +// CHECK-SAME: "-inputs={{.*}},[[IMG_DEV1]],[[IMG_DEV2]]" "-outputs=[[BUNDLE:.*o]]" + +// CHECK: [[LD:".*ld.lld"]] {{.*}} [[A_OBJ_HOST]] [[B_OBJ_HOST]] +// CHECK-SAME: {{.*}} "-T" "{{.*}}.lk" Index: lib/Driver/ToolChains/HIP.h === --- /dev/null +++ lib/Driver/ToolChains/HIP.h @@ -0,0 +1,123 @@ +//===--- HIP.h - HIP ToolChain Implementations --*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===--==
[PATCH] D45212: Add HIP toolchain
yaxunl marked an inline comment as done.
yaxunl added inline comments.

Comment at: lib/Driver/ToolChains/HIP.cpp:44
+  }
+  if (!FoundLibDevice)
+    C.getDriver().Diag(diag::err_drv_no_such_file) << BCName;

tra wrote:
> You don't need FoundLibDevice any more as you will always return from inside
> the loop if it is ever true.

Will remove when committing. Thanks!

https://reviews.llvm.org/D45212
[PATCH] D47376: [CUDA][HIP] Do not offload for -M
yaxunl created this revision.
yaxunl added a reviewer: tra.

The CUDA and HIP action builder currently tries to do offloading for -M, which causes the dependency file not to be generated.

This patch changes the action builder so that only host compilation is performed to generate the dependency file. This assumes that the header files do not depend on whether it is device compilation or host compilation. This is not ideal, but it at least lets projects using -M compile.

Ideally, we should create an offloading action for the host dependency file and the device dependency file and merge them into one dependency file, which will be done in a separate patch.

https://reviews.llvm.org/D47376

Files:
  lib/Driver/Driver.cpp
  test/Driver/cuda-phases.cu

Index: test/Driver/cuda-phases.cu
===
--- test/Driver/cuda-phases.cu
+++ test/Driver/cuda-phases.cu
@@ -246,3 +246,14 @@
 // DASM2-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (device-[[T]], [[ARCH2]])
 // DASM2_NV-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (device-[[T]], [[ARCH2]])
 // DASM2_NV-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P8]]}, assembler
+
+//
+// Test -M does not cause device input.
+//
+// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -M 2>&1 \
+// RUN: | FileCheck -check-prefixes=DEP %s
+// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s -M 2>&1 \
+// RUN: | FileCheck -check-prefixes=DEP %s
+// DEP-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:.*]], (host-[[T]])
+// DEP-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, dependencies, (host-[[T]])
+
Index: lib/Driver/Driver.cpp
===
--- lib/Driver/Driver.cpp
+++ lib/Driver/Driver.cpp
@@ -2210,7 +2210,10 @@
       // Set the flag to true, so that the builder acts on the current input.
       IsActive = true;
 
-      if (CompileHostOnly)
+      // ToDo: Handle situations where device compilation and host
+      // compilation have different dependencies. Currently we assume they
+      // are the same therefore device compilation is not performed for -M.
+      if (CompileHostOnly || Args.getLastArg(options::OPT_M))
         return ABRT_Success;
 
       // Replicate inputs for each GPU architecture.
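
The effect of the changed condition can be modelled in a few lines. The sketch below is only an illustration under simplifying assumptions: `OffloadRequest` and `deviceInputsFor` are hypothetical names, and the real builder returns `ABRT_Success` before replicating inputs rather than returning a list.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of the state the action builder consults.
struct OffloadRequest {
  bool CompileHostOnly;                 // host-only compilation was requested
  bool DependenciesOnly;                // -M was passed
  std::vector<std::string> GpuArchList; // e.g. {"gfx803", "gfx900"}
};

// Returns the GPU architectures for which a device input would be replicated.
// With -M (or host-only compilation) no device inputs are created, so only
// the host preprocessor step produces the dependency output.
std::vector<std::string> deviceInputsFor(const OffloadRequest &R) {
  if (R.CompileHostOnly || R.DependenciesOnly)
    return {};
  return R.GpuArchList;
}
```

This mirrors what the new DEP checks in cuda-phases.cu verify: with -M, only the host input and the host preprocessor phase appear in the phase list.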
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl added a comment.
Ping. Are any further changes needed? Thanks.
https://reviews.llvm.org/D46476
[PATCH] D46476: [HIP] Add action builder for HIP
yaxunl marked an inline comment as done. yaxunl added inline comments.
Comment at: test/Driver/cuda-phases.cu:16
+// RUN: | FileCheck -check-prefixes=BIN,BIN_NV %s
+// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=BIN,BIN_AMD %s
tra wrote:
> Please wrap long RUN lines in all tests.
Will do when committing. Thanks!
https://reviews.llvm.org/D46476
[PATCH] D46476: [HIP] Add action builder for HIP
This revision was automatically updated to reflect the committed changes. yaxunl marked an inline comment as done. Closed by commit rC333483: Add action builder for HIP (authored by yaxunl, committed by ). Changed prior to commit: https://reviews.llvm.org/D46476?vs=148051&id=149012#toc Repository: rC Clang https://reviews.llvm.org/D46476 Files: lib/Driver/Driver.cpp test/Driver/cuda-phases.cu Index: lib/Driver/Driver.cpp === --- lib/Driver/Driver.cpp +++ lib/Driver/Driver.cpp @@ -2151,9 +2151,10 @@ } }; - /// CUDA action builder. It injects device code in the host backend - /// action. - class CudaActionBuilder final : public DeviceActionBuilder { + /// Base class for CUDA/HIP action builder. It injects device code in + /// the host backend action. + class CudaActionBuilderBase : public DeviceActionBuilder { + protected: /// Flags to signal if the user requested host-only or device-only /// compilation. bool CompileHostOnly = false; @@ -2170,115 +2171,11 @@ /// Flag that is set to true if this builder acted on the current input. bool IsActive = false; - public: -CudaActionBuilder(Compilation &C, DerivedArgList &Args, - const Driver::InputList &Inputs) -: DeviceActionBuilder(C, Args, Inputs, Action::OFK_Cuda) {} - -ActionBuilderReturnCode -getDeviceDependences(OffloadAction::DeviceDependences &DA, - phases::ID CurPhase, phases::ID FinalPhase, - PhasesTy &Phases) override { - if (!IsActive) -return ABRT_Inactive; - - // If we don't have more CUDA actions, we don't have any dependences to - // create for the host. - if (CudaDeviceActions.empty()) -return ABRT_Success; - - assert(CudaDeviceActions.size() == GpuArchList.size() && - "Expecting one action per GPU architecture."); - assert(!CompileHostOnly && - "Not expecting CUDA actions in host-only compilation."); - - // If we are generating code for the device or we are in a backend phase, - // we attempt to generate the fat binary. 
We compile each arch to ptx and - // assemble to cubin, then feed the cubin *and* the ptx into a device - // "link" action, which uses fatbinary to combine these cubins into one - // fatbin. The fatbin is then an input to the host action if not in - // device-only mode. - if (CompileDeviceOnly || CurPhase == phases::Backend) { -ActionList DeviceActions; -for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) { - // Produce the device action from the current phase up to the assemble - // phase. - for (auto Ph : Phases) { -// Skip the phases that were already dealt with. -if (Ph < CurPhase) - continue; -// We have to be consistent with the host final phase. -if (Ph > FinalPhase) - break; - -CudaDeviceActions[I] = C.getDriver().ConstructPhaseAction( -C, Args, Ph, CudaDeviceActions[I], Action::OFK_Cuda); - -if (Ph == phases::Assemble) - break; - } - - // If we didn't reach the assemble phase, we can't generate the fat - // binary. We don't need to generate the fat binary if we are not in - // device-only mode. - if (!isa<AssembleJobAction>(CudaDeviceActions[I]) || - CompileDeviceOnly) -continue; - - Action *AssembleAction = CudaDeviceActions[I]; - assert(AssembleAction->getType() == types::TY_Object); - assert(AssembleAction->getInputs().size() == 1); - - Action *BackendAction = AssembleAction->getInputs()[0]; - assert(BackendAction->getType() == types::TY_PP_Asm); - - for (auto &A : {AssembleAction, BackendAction}) { -OffloadAction::DeviceDependences DDep; -DDep.add(*A, *ToolChains.front(), CudaArchToString(GpuArchList[I]), - Action::OFK_Cuda); -DeviceActions.push_back( -C.MakeAction<OffloadAction>(DDep, A->getType())); - } -} - -// We generate the fat binary if we have device input actions. -if (!DeviceActions.empty()) { - CudaFatBinary = - C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN); - - if (!CompileDeviceOnly) { -DA.add(*CudaFatBinary, *ToolChains.front(), /*BoundArch=*/nullptr, - Action::OFK_Cuda); -// Clear the fat binary, it is already a dependence to an host -// action.
-CudaFatBinary = nullptr; - } - - // Remove the CUDA actions as they are already connected to an host - // action or fat binary. - CudaDeviceActions.clear(); -} - -// We avoid creating host action in device-only mode. -return
[PATCH] D45212: Add HIP toolchain
This revision was automatically updated to reflect the committed changes. yaxunl marked an inline comment as done. Closed by commit rC333484: Add HIP toolchain (authored by yaxunl, committed by ). Changed prior to commit: https://reviews.llvm.org/D45212?vs=148277&id=149013#toc Repository: rC Clang https://reviews.llvm.org/D45212 Files: include/clang/Driver/Options.td lib/Driver/CMakeLists.txt lib/Driver/Driver.cpp lib/Driver/ToolChains/HIP.cpp lib/Driver/ToolChains/HIP.h test/Driver/Inputs/hip_multiple_inputs/lib1/lib1.bc test/Driver/Inputs/hip_multiple_inputs/lib2/lib2.bc test/Driver/hip-toolchain.hip Index: lib/Driver/CMakeLists.txt === --- lib/Driver/CMakeLists.txt +++ lib/Driver/CMakeLists.txt @@ -45,6 +45,7 @@ ToolChains/Fuchsia.cpp ToolChains/Gnu.cpp ToolChains/Haiku.cpp + ToolChains/HIP.cpp ToolChains/Hexagon.cpp ToolChains/Linux.cpp ToolChains/MipsLinux.cpp Index: lib/Driver/Driver.cpp === --- lib/Driver/Driver.cpp +++ lib/Driver/Driver.cpp @@ -12,6 +12,7 @@ #include "ToolChains/AMDGPU.h" #include "ToolChains/AVR.h" #include "ToolChains/Ananas.h" +#include "ToolChains/BareMetal.h" #include "ToolChains/Clang.h" #include "ToolChains/CloudABI.h" #include "ToolChains/Contiki.h" @@ -22,15 +23,15 @@ #include "ToolChains/FreeBSD.h" #include "ToolChains/Fuchsia.h" #include "ToolChains/Gnu.h" -#include "ToolChains/BareMetal.h" +#include "ToolChains/HIP.h" #include "ToolChains/Haiku.h" #include "ToolChains/Hexagon.h" #include "ToolChains/Lanai.h" #include "ToolChains/Linux.h" +#include "ToolChains/MSVC.h" #include "ToolChains/MinGW.h" #include "ToolChains/Minix.h" #include "ToolChains/MipsLinux.h" -#include "ToolChains/MSVC.h" #include "ToolChains/Myriad.h" #include "ToolChains/NaCl.h" #include "ToolChains/NetBSD.h" @@ -70,9 +71,9 @@ #include "llvm/Support/PrettyStackTrace.h" #include "llvm/Support/Process.h" #include "llvm/Support/Program.h" +#include "llvm/Support/StringSaver.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/raw_ostream.h" 
-#include "llvm/Support/StringSaver.h" #include #include #include @@ -540,7 +541,7 @@ // // CUDA/HIP // - // We need to generate a CUDA toolchain if any of the inputs has a CUDA + // We need to generate a CUDA/HIP toolchain if any of the inputs has a CUDA // or HIP type. However, mixed CUDA/HIP compilation is not supported. bool IsCuda = llvm::any_of(Inputs, [](std::pair &I) { @@ -556,28 +557,37 @@ Diag(clang::diag::err_drv_mix_cuda_hip); return; } - if (IsCuda || IsHIP) { + if (IsCuda) { const ToolChain *HostTC = C.getSingleOffloadToolChain(); const llvm::Triple &HostTriple = HostTC->getTriple(); StringRef DeviceTripleStr; -auto OFK = IsHIP ? Action::OFK_HIP : Action::OFK_Cuda; -if (IsHIP) { - // HIP is only supported on amdgcn. - DeviceTripleStr = "amdgcn-amd-amdhsa"; -} else { - // CUDA is only supported on nvptx. - DeviceTripleStr = HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda" - : "nvptx-nvidia-cuda"; -} +auto OFK = Action::OFK_Cuda; +DeviceTripleStr = +HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda" : "nvptx-nvidia-cuda"; llvm::Triple CudaTriple(DeviceTripleStr); -// Use the CUDA/HIP and host triples as the key into the ToolChains map, +// Use the CUDA and host triples as the key into the ToolChains map, // because the device toolchain we create depends on both. auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()]; if (!CudaTC) { CudaTC = llvm::make_unique( *this, CudaTriple, *HostTC, C.getInputArgs(), OFK); } C.addOffloadDeviceToolChain(CudaTC.get(), OFK); + } else if (IsHIP) { +const ToolChain *HostTC = C.getSingleOffloadToolChain(); +const llvm::Triple &HostTriple = HostTC->getTriple(); +StringRef DeviceTripleStr; +auto OFK = Action::OFK_HIP; +DeviceTripleStr = "amdgcn-amd-amdhsa"; +llvm::Triple HIPTriple(DeviceTripleStr); +// Use the HIP and host triples as the key into the ToolChains map, +// because the device toolchain we create depends on both. 
+auto &HIPTC = ToolChains[HIPTriple.str() + "/" + HostTriple.str()]; +if (!HIPTC) { + HIPTC = llvm::make_unique( + *this, HIPTriple, *HostTC, C.getInputArgs()); +} +C.addOffloadDeviceToolChain(HIPTC.get(), OFK); } // Index: lib/Driver/ToolChains/HIP.cpp === --- lib/Driver/ToolChains/HIP.cpp +++ lib/Driver/ToolChains/HIP.cpp @@ -0,0 +1,343 @@ +//===--- HIP.cpp - HIP Tool and ToolChain Implementations ---*- C++ -*-===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illin
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl updated this revision to Diff 116704. yaxunl marked 5 inline comments as done. yaxunl added a comment. Rebase to ToT and clean up logic. https://reviews.llvm.org/D35082 Files: include/clang/AST/ASTContext.h include/clang/AST/Type.h include/clang/Basic/AddressSpaces.h lib/AST/ASTContext.cpp lib/AST/Expr.cpp lib/AST/ItaniumMangle.cpp lib/AST/TypePrinter.cpp lib/Basic/Targets/AMDGPU.cpp lib/Basic/Targets/NVPTX.h lib/Basic/Targets/SPIR.h lib/Basic/Targets/TCE.h lib/CodeGen/CGDecl.cpp lib/Sema/SemaChecking.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaType.cpp test/CodeGen/blocks-opencl.cl test/CodeGenOpenCL/address-spaces-mangling.cl test/CodeGenOpenCL/address-spaces.cl test/SemaOpenCL/address-spaces-conversions-cl2.0.cl test/SemaOpenCL/address-spaces.cl test/SemaOpenCL/atomic-ops.cl test/SemaOpenCL/cl20-device-side-enqueue.cl test/SemaOpenCL/invalid-block.cl test/SemaOpenCL/invalid-pipes-cl2.0.cl test/SemaOpenCL/null_literal.cl Index: test/SemaOpenCL/null_literal.cl === --- test/SemaOpenCL/null_literal.cl +++ test/SemaOpenCL/null_literal.cl @@ -1,29 +1,68 @@ // RUN: %clang_cc1 -verify %s -// RUN: %clang_cc1 -cl-std=CL2.0 -DCL20 -verify %s +// RUN: %clang_cc1 -cl-std=CL2.0 -verify %s #define NULL ((void*)0) void foo(){ + global int *g1 = NULL; + global int *g2 = (global void *)0; + global int *g3 = (constant void *)0; // expected-error{{initializing '__global int *' with an expression of type '__constant void *' changes address space of pointer}} + global int *g4 = (local void *)0; // expected-error{{initializing '__global int *' with an expression of type '__local void *' changes address space of pointer}} + global int *g5 = (private void *)0; // expected-error{{initializing '__global int *' with an expression of type '__private void *' changes address space of pointer}} -global int* ptr1 = NULL; + constant int *c1 = NULL; + constant int *c2 = (global void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__global void *' changes 
address space of pointer}} + constant int *c3 = (constant void *)0; + constant int *c4 = (local void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__local void *' changes address space of pointer}} + constant int *c5 = (private void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__private void *' changes address space of pointer}} -global int* ptr2 = (global void*)0; + local int *l1 = NULL; + local int *l2 = (global void *)0; // expected-error{{initializing '__local int *' with an expression of type '__global void *' changes address space of pointer}} + local int *l3 = (constant void *)0; // expected-error{{initializing '__local int *' with an expression of type '__constant void *' changes address space of pointer}} + local int *l4 = (local void *)0; + local int *l5 = (private void *)0; // expected-error{{initializing '__local int *' with an expression of type '__private void *' changes address space of pointer}} -constant int* ptr3 = NULL; + private int *p1 = NULL; + private int *p2 = (global void *)0; // expected-error{{initializing '__private int *' with an expression of type '__global void *' changes address space of pointer}} + private int *p3 = (constant void *)0; // expected-error{{initializing '__private int *' with an expression of type '__constant void *' changes address space of pointer}} + private int *p4 = (local void *)0; // expected-error{{initializing '__private int *' with an expression of type '__local void *' changes address space of pointer}} + private int *p5 = (private void *)0; -constant int* ptr4 = (global void*)0; // expected-error{{initializing '__constant int *' with an expression of type '__global void *' changes address space of pointer}} +#if __OPENCL_C_VERSION__ >= 200 + // Assigning a pointer to a pointer to narrower address space causes an error unless there is an valid explicit cast. 
+ global int *g6 = (generic void *)0; // expected-error{{initializing '__global int *' with an expression of type '__generic void *' changes address space of pointer}} + constant int *c6 = (generic void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__generic void *' changes address space of pointer}} + local int *l6 = (generic void *)0; // expected-error{{initializing '__local int *' with an expression of type '__generic void *' changes address space of pointer}} + private int *p6 = (generic void *)0; // expected-error{{initializing '__private int *' with an expression of type '__generic void *' changes address space of pointer}} -#ifdef CL20 -// Accept explicitly pointer to generic address space in OpenCL v2.0. -global int* ptr5 = (generic void*)0; -#endif - -global int* ptr6 = (local void*)0; // expected-error{{initializing '__global int *' with an expression of type
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl marked an inline comment as done. yaxunl added inline comments.
Comment at: lib/Sema/SemaType.cpp:6974
+ if (state.getSema().getLangOpts().OpenCL && !hasOpenCLAddressSpace &&
+ type.getAddressSpace() == LangAS::Default &&
Anastasia wrote:
> I am not very convinced we need to do all this check really... I think since
> we have to live with this code further and maintain it let's try to simplify
> it a bit. This can also help us avoid redundant checks. Perhaps we could even
> move it out in a separate function?
>
> How about some sort of hierarchical check structure like:
>
> CL version?
>   <2.0: `private`
>   >=2.0: Type?
>     pointer: `generic`
>     non-pointer: Scope?
>       program: `global`
>       function: Qualifiers?
>         none: `private`
>         static/extern: `global`
Fixed.
https://reviews.llvm.org/D35082
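The hierarchy Anastasia proposes can be read as a small decision function. The following is a minimal sketch of that reading, assuming simplified stand-in types; `AddrSpace`, `Scope`, `DeclInfo`, and `implicitAddrSpace` are hypothetical names for illustration and are not the code that was committed to SemaType.cpp.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Clang's types.
enum class AddrSpace { Private, Global, Generic };
enum class Scope { Program, Function };

struct DeclInfo {
  int CLVersion;         // encoded OpenCL C version, e.g. 120 or 200
  bool IsPointer;        // the "pointer" branch of the quoted hierarchy
  Scope DeclScope;       // program scope or function scope
  bool IsStaticOrExtern; // declared with static or extern
};

// Walks the quoted hierarchy top to bottom:
// CL version, then type, then scope, then qualifiers.
AddrSpace implicitAddrSpace(const DeclInfo &D) {
  assert(D.CLVersion >= 100 && "expected an encoded OpenCL C version");
  if (D.CLVersion < 200)
    return AddrSpace::Private;
  if (D.IsPointer)
    return AddrSpace::Generic;
  if (D.DeclScope == Scope::Program)
    return AddrSpace::Global;
  return D.IsStaticOrExtern ? AddrSpace::Global : AddrSpace::Private;
}
```

Each early return corresponds to one leaf of the hierarchy, so no branch re-tests a condition that an outer level has already settled, which is the redundancy the review comment asks to avoid.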
[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct
yaxunl marked 4 inline comments as done. yaxunl added inline comments.
Comment at: lib/CodeGen/CGBlocks.cpp:311
+// The header is basically 'struct { int; int; generic void *;
+// custom_fields; }'. Assert that that struct is packed.
+auto GenPtrAlign = CharUnits::fromQuantity(
Anastasia wrote:
> remove one "that".
Will do.
Comment at: lib/CodeGen/CGBlocks.cpp:312
+// custom_fields; }'. Assert that that struct is packed.
+auto GenPtrAlign = CharUnits::fromQuantity(
+CGM.getTarget().getPointerAlign(LangAS::opencl_generic) / 8);
Anastasia wrote:
> I think the alignment might not be computed correctly now if there will be
> custom fields that might have a bigger size than a pointer? Also what happens
> if we have captures as well?
Will fix. The captures will be accounted for by computeBlockInfo, and BlockSize and BlockAlign will be updated.
Comment at: lib/CodeGen/CGBlocks.cpp:850
+ CGM.getDataLayout().getTypeAllocSize(I->getType())),
+ "block.custom");
+ }
Anastasia wrote:
> do we need to add numeration to each item name?
Yes, will add it.
Comment at: lib/CodeGen/CGBlocks.cpp:1250
 // Function
 fields.add(blockFn);
Anastasia wrote:
> If we reorder fields and put this on top we can merge the if statements above
> and below this point.
By convention, the size of the whole struct is the first field, so that the library function can read the first integer and know how many bytes to copy.
https://reviews.llvm.org/D37822
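The size-first convention in the last reply can be sketched on the host side as follows. This is an illustration under stated assumptions: `BlockHeader` and `copyBlock` are hypothetical names, the second `int` is assumed here to hold the alignment, and a plain `void *` stands in for the generic-address-space pointer in the quoted header comment.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical model of the header 'struct { int; int; generic void *; ... }'.
struct BlockHeader {
  int Size;     // total size of the block literal in bytes, always first
  int Align;    // assumption: the second int holds the alignment
  void *Invoke; // stands in for the generic-address-space pointer
};

// Copies a block literal while knowing only that it begins with its own size,
// which is what a runtime library routine could rely on.
std::vector<unsigned char> copyBlock(const void *Block) {
  int Size = 0;
  std::memcpy(&Size, Block, sizeof(Size)); // read the leading size field
  assert(Size >= static_cast<int>(sizeof(BlockHeader)) &&
         "block must be at least as large as its header");
  std::vector<unsigned char> Copy(static_cast<std::size_t>(Size));
  std::memcpy(Copy.data(), Block, Copy.size());
  return Copy;
}
```

Because the copy routine never inspects anything past the first integer, custom fields and captures can follow the header in any layout, which is why moving the function pointer ahead of the size would break the convention.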
[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct
yaxunl updated this revision to Diff 116877. yaxunl marked 4 inline comments as done. yaxunl added a comment. Rebased to ToT and revised by Anastasia's comments. https://reviews.llvm.org/D37822 Files: lib/CodeGen/CGBlocks.cpp lib/CodeGen/CGOpenCLRuntime.cpp lib/CodeGen/CGOpenCLRuntime.h lib/CodeGen/TargetInfo.h test/CodeGen/blocks-opencl.cl test/CodeGenOpenCL/blocks.cl test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -7,7 +7,7 @@ typedef struct {int a;} ndrange_t; // N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) const bl_t block_G = (bl_t) ^ (local void *a) {}; kernel void device_side_enqueue(global int *a, global int *b, int i) { @@ -27,9 +27,10 @@ // COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4 // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags - // COMMON: [[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor addrspace(2)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block to void ()* + // B32: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32, i32 addrspace(1)* }>* %block 
to void ()* + // B64: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block to void ()* // COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)* - // COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* [[BL_I8]]) + // COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* [[BL_I8]]) enqueue_kernel(default_queue, flags, ndrange, ^(void) { a[i] = b[i]; @@ -39,7 +40,7 @@ // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags // COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %event_wait_list to %opencl.clk_event_t{{.*}}* addrspace(4)* // COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)* - // COMMON: [[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor addrspace(2)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()* + // COMMON: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()* // COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)* // COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]]) enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event, @@ -52,11 +53,11 @@ // B32: %[[TMP:.*]] = alloca [1 x i32] // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0 // B32: store i32 256, i32* %[[TMP1]], align 4 - // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* 
[[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]]) + // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]]) // B64: %[[TMP:.*]] = alloca [1 x i64] // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0 // B64: store i64 256, i64* %[[TMP1]], align 8 - // B64: call i32 @__enqueue_kernel_vaargs(%opencl
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl marked 5 inline comments as done. yaxunl added inline comments.
Comment at: lib/Sema/SemaType.cpp:6810
+ QualType &T, TypeAttrLocation TAL) {
+ if (!State.getSema().getLangOpts().OpenCL ||
+ T.getAddressSpace() != LangAS::Default)
Anastasia wrote:
> I think this could be checked before calling the function.
Will do.
Comment at: lib/Sema/SemaType.cpp:6863
+ unsigned ImpAddr;
+ bool IsStatic = D.getDeclSpec().getStorageClassSpec() == DeclSpec::SCS_static;
+ // Put OpenCL automatic variable in private address space.
Anastasia wrote:
> Do we cover `extern` too?
Will add.
Comment at: lib/Sema/SemaType.cpp:6872
+ ImpAddr = LangAS::opencl_private;
+else if (IsStatic)
+ ImpAddr = LangAS::opencl_global;
Anastasia wrote:
> I think we can't have this case for CL1.2 see s6.8. But I think it could
> happen for `extern` though.
Right. I will stop setting an implicit address space for static variables in CL1.2. For extern variables in CL2.0, I will set the implicit address space to global. In CL1.2, however, only the constant address space is supported for file-scope variables, so the only implicit address space I could give an extern variable is constant. I feel that would cause more confusion than convenience, so I will not set an implicit address space for extern variables in CL1.2. If users do not explicitly use the constant address space with an extern variable, they will see a diagnostic saying that the extern variable must reside in the constant address space. This matches the behavior before my change.
https://reviews.llvm.org/D35082
[PATCH] D37568: [AMDGPU] Allow flexible register names in inline asm constraints
yaxunl marked an inline comment as done. yaxunl added a comment.
Ping. Brian, Stas, can you review this since Matt is on vacation? Thanks.
https://reviews.llvm.org/D37568
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl updated this revision to Diff 117020. yaxunl marked 3 inline comments as done. yaxunl added a comment. Revised by Anastasia's comments. https://reviews.llvm.org/D35082 Files: include/clang/AST/ASTContext.h include/clang/AST/Type.h include/clang/Basic/AddressSpaces.h lib/AST/ASTContext.cpp lib/AST/Expr.cpp lib/AST/ItaniumMangle.cpp lib/AST/TypePrinter.cpp lib/Basic/Targets/AMDGPU.cpp lib/Basic/Targets/NVPTX.h lib/Basic/Targets/SPIR.h lib/Basic/Targets/TCE.h lib/CodeGen/CGDecl.cpp lib/Sema/SemaChecking.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaType.cpp test/CodeGen/blocks-opencl.cl test/CodeGenOpenCL/address-spaces-mangling.cl test/CodeGenOpenCL/address-spaces.cl test/SemaOpenCL/address-spaces-conversions-cl2.0.cl test/SemaOpenCL/address-spaces.cl test/SemaOpenCL/atomic-ops.cl test/SemaOpenCL/cl20-device-side-enqueue.cl test/SemaOpenCL/extern.cl test/SemaOpenCL/invalid-block.cl test/SemaOpenCL/invalid-pipes-cl2.0.cl test/SemaOpenCL/null_literal.cl test/SemaOpenCL/storageclass-cl20.cl test/SemaOpenCL/storageclass.cl Index: test/SemaOpenCL/storageclass.cl === --- test/SemaOpenCL/storageclass.cl +++ test/SemaOpenCL/storageclass.cl @@ -5,6 +5,20 @@ int G3 = 0;// expected-error{{program scope variable must reside in constant address space}} global int G4 = 0; // expected-error{{program scope variable must reside in constant address space}} +static float g_implicit_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static constant float g_constant_static_var = 0; +static global float g_global_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static local float g_local_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static private float g_private_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static generic float g_generic_static_var = 0; // expected-error{{OpenCL 
version 1.2 does not support the 'generic' type qualifier}} // expected-error {{program scope variable must reside in constant address space}} + +extern float g_implicit_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern constant float g_constant_extern_var; +extern global float g_global_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern local float g_local_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern private float g_private_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern generic float g_generic_extern_var; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{extern variable must reside in constant address space}} + void kernel foo(int x) { // static is not allowed at local scope before CL2.0 static int S1 = 5; // expected-error{{variables in function scope cannot be declared static}} @@ -45,10 +59,17 @@ __attribute__((address_space(100))) int L4; // expected-error{{automatic variable qualified with an invalid address space}} } + static float l_implicit_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static constant float l_constant_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static global float l_global_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static local float l_local_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static private float l_private_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static generic float l_generic_static_var = 0; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{variables in function scope cannot be declared 
static}} - extern constant float L5; - extern local float L6; // expected-error{{extern variable must reside in constant address space}} - - static int L7 = 0; // expected-error{{variables in function scope cannot be declared static}} - static int L8; // expected-error{{variables in function scope cannot be declared static}} + extern float l_implicit_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern constant float l_constant_extern_var; + extern global float l_global_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern local float l_local_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern private float l_private_extern_var; //
[PATCH] D37568: [AMDGPU] Allow flexible register names in inline asm constraints
This revision was automatically updated to reflect the committed changes. Closed by commit rL314452: [AMDGPU] Allow flexible register names in inline asm constraints (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D37568?vs=116383&id=117037#toc Repository: rL LLVM https://reviews.llvm.org/D37568 Files: cfe/trunk/lib/Basic/Targets/AMDGPU.h cfe/trunk/test/Sema/inline-asm-validate-amdgpu.cl Index: cfe/trunk/lib/Basic/Targets/AMDGPU.h === --- cfe/trunk/lib/Basic/Targets/AMDGPU.h +++ cfe/trunk/lib/Basic/Targets/AMDGPU.h @@ -17,6 +17,7 @@ #include "clang/AST/Type.h" #include "clang/Basic/TargetInfo.h" #include "clang/Basic/TargetOptions.h" +#include "llvm/ADT/StringSet.h" #include "llvm/ADT/Triple.h" #include "llvm/Support/Compiler.h" @@ -115,17 +116,83 @@ return None; } + /// Accepted register names: (n, m is unsigned integer, n < m) + /// v + /// s + /// {vn}, {v[n]} + /// {sn}, {s[n]} + /// {S} , where S is a special register name + {v[n:m]} + /// {s[n:m]} bool validateAsmConstraint(const char *&Name, TargetInfo::ConstraintInfo &Info) const override { -switch (*Name) { -default: - break; -case 'v': // vgpr -case 's': // sgpr +static const ::llvm::StringSet<> SpecialRegs({ +"exec", "vcc", "flat_scratch", "m0", "scc", "tba", "tma", +"flat_scratch_lo", "flat_scratch_hi", "vcc_lo", "vcc_hi", "exec_lo", +"exec_hi", "tma_lo", "tma_hi", "tba_lo", "tba_hi", +}); + +StringRef S(Name); +bool HasLeftParen = false; +if (S.front() == '{') { + HasLeftParen = true; + S = S.drop_front(); +} +if (S.empty()) + return false; +if (S.front() != 'v' && S.front() != 's') { + if (!HasLeftParen) +return false; + auto E = S.find('}'); + if (!SpecialRegs.count(S.substr(0, E))) +return false; + S = S.drop_front(E + 1); + if (!S.empty()) +return false; + // Found {S} where S is a special register. + Info.setAllowsRegister(); + Name = S.data() - 1; + return true; +} +S = S.drop_front(); +if (!HasLeftParen) { + if (!S.empty()) +return false; + // Found s or v. 
Info.setAllowsRegister(); + Name = S.data() - 1; return true; } -return false; +bool HasLeftBracket = false; +if (!S.empty() && S.front() == '[') { + HasLeftBracket = true; + S = S.drop_front(); +} +unsigned long long N; +if (S.empty() || consumeUnsignedInteger(S, 10, N)) + return false; +if (!S.empty() && S.front() == ':') { + if (!HasLeftBracket) +return false; + S = S.drop_front(); + unsigned long long M; + if (consumeUnsignedInteger(S, 10, M) || N >= M) +return false; +} +if (HasLeftBracket) { + if (S.empty() || S.front() != ']') +return false; + S = S.drop_front(); +} +if (S.empty() || S.front() != '}') + return false; +S = S.drop_front(); +if (!S.empty()) + return false; +// Found {vn}, {sn}, {v[n]}, {s[n]}, {v[n:m]}, or {s[n:m]}. +Info.setAllowsRegister(); +Name = S.data() - 1; +return true; } bool Index: cfe/trunk/test/Sema/inline-asm-validate-amdgpu.cl === --- cfe/trunk/test/Sema/inline-asm-validate-amdgpu.cl +++ cfe/trunk/test/Sema/inline-asm-validate-amdgpu.cl @@ -1,14 +1,76 @@ // REQUIRES: amdgpu-registered-target -// RUN: %clang_cc1 -x cl -triple amdgcn -fsyntax-only %s -// expected-no-diagnostics +// RUN: %clang_cc1 -triple amdgcn -fsyntax-only -verify %s + +#pragma OPENCL EXTENSION cl_khr_fp64 : enable kernel void test () { int sgpr = 0, vgpr = 0, imm = 0; // sgpr constraints __asm__ ("s_mov_b32 %0, %1" : "=s" (sgpr) : "s" (imm) : ); + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}" (imm) : ); + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exe" (imm) : ); // expected-error {{invalid input constraint '{exe' in asm}} + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec" (imm) : ); // expected-error {{invalid input constraint '{exec' in asm}} + __asm__ ("s_mov_b32 %0, %1" : "={s1}" (sgpr) : "{exec}a" (imm) : ); // expected-error {{invalid input constraint '{exec}a' in asm}} + // vgpr constraints __asm__ ("v_mov_b32 %0, %1" : "=v" (vgpr) : "v" (imm) : ); } + +__kernel void +test_float(const __global float *a, const __global float *b, 
__global float *c, unsigned i) +{ +float ai = a[i]; +float bi = b[i]; +float ci; + +__asm("v_add_f32_e32 v1, v2, v3" : "={v1}"(ci) : "{v2}"(ai), "{v3}"(bi) : ); +__asm("v_add_f32_e32 v1, v2, v3" : ""(ci) : "{v2}"(ai), "{v3}"(bi) : ); // expected-error {{invalid output constraint '' in asm}} +__asm("v_add_f32_e32 v1, v2, v3" : "="(ci) : "{v2}"(ai), "{v3}"(bi) : ); // expected-error {{inval
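The constraint grammar listed in the D37568 comment (v, s, {vn}, {v[n]}, {v[n:m]}, {S}) can be exercised outside clang with a small standalone sketch. This is not the committed code: it substitutes std::string_view for llvm::StringRef, and the names isValidAmdgpuRegConstraint and consumeUnsigned are hypothetical. It only mirrors the order of checks visible in the diff, with an explicit guard for a missing closing brace.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string_view>

// Consume a leading run of decimal digits from S into Out.
// Returns false (and leaves S untouched) if S does not start with a digit.
static bool consumeUnsigned(std::string_view &S, unsigned long long &Out) {
  if (S.empty() || !std::isdigit(static_cast<unsigned char>(S.front())))
    return false;
  Out = 0;
  while (!S.empty() && std::isdigit(static_cast<unsigned char>(S.front()))) {
    Out = Out * 10 + static_cast<unsigned long long>(S.front() - '0');
    S.remove_prefix(1);
  }
  return true;
}

// Hypothetical standalone validator for the forms named in the patch comment.
// As in the diff, a name starting with 'v' or 's' is tried as a vgpr/sgpr
// form first, so this sketch rejects {vcc} and {scc}.
bool isValidAmdgpuRegConstraint(std::string_view S) {
  static const std::set<std::string_view> SpecialRegs = {
      "exec",    "vcc",    "flat_scratch", "m0",     "scc",    "tba",
      "tma",     "flat_scratch_lo", "flat_scratch_hi", "vcc_lo", "vcc_hi",
      "exec_lo", "exec_hi", "tma_lo",      "tma_hi", "tba_lo", "tba_hi"};

  bool HasBrace = false;
  if (!S.empty() && S.front() == '{') {
    HasBrace = true;
    S.remove_prefix(1);
  }
  if (S.empty())
    return false;

  if (S.front() != 'v' && S.front() != 's') {
    // Only {S} with a special register name is allowed here.
    if (!HasBrace)
      return false;
    const auto End = S.find('}');
    if (End == std::string_view::npos)
      return false;
    return SpecialRegs.count(S.substr(0, End)) != 0 && End + 1 == S.size();
  }

  S.remove_prefix(1); // drop 'v' or 's'
  if (!HasBrace)
    return S.empty(); // plain "v" or "s"

  bool HasBracket = false;
  if (!S.empty() && S.front() == '[') {
    HasBracket = true;
    S.remove_prefix(1);
  }
  unsigned long long N = 0;
  if (!consumeUnsigned(S, N))
    return false;
  if (!S.empty() && S.front() == ':') {
    // A range n:m is only valid inside brackets and needs n < m.
    if (!HasBracket)
      return false;
    S.remove_prefix(1);
    unsigned long long M = 0;
    if (!consumeUnsigned(S, M) || N >= M)
      return false;
  }
  if (HasBracket) {
    if (S.empty() || S.front() != ']')
      return false;
    S.remove_prefix(1);
  }
  return S == "}";
}
```

The rejected inputs in the test file ("{exe", "{exec", "{exec}a") all fail in this sketch for the same reason they fail in the patch: either the name is not in the special-register set or the string does not end exactly at the closing brace.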
[PATCH] D38463: [OpenCL] Fix checking of vector type casting
yaxunl created this revision.

Currently clang allows the following code:

  int a;
  int b = (const int) a;

However, it does not allow the following code:

  int4 a;
  int4 b = (const int4) a;

This is because Clang compares the qualified types instead of the unqualified types for vector type casting, which causes the inconsistency. This patch fixes that by comparing the unqualified types.

https://reviews.llvm.org/D38463

Files: lib/Sema/SemaExpr.cpp test/SemaOpenCL/vector_conv_invalid.cl

Index: test/SemaOpenCL/vector_conv_invalid.cl === --- test/SemaOpenCL/vector_conv_invalid.cl +++ test/SemaOpenCL/vector_conv_invalid.cl @@ -5,10 +5,18 @@ typedef int int3 __attribute((ext_vector_type(3))); typedef unsigned uint3 __attribute((ext_vector_type(3))); -void vector_conv_invalid() { +void vector_conv_invalid(const global int4 *const_global_ptr) { uint4 u = (uint4)(1); int4 i = u; // expected-error{{initializing 'int4' (vector of 4 'int' values) with an expression of incompatible type 'uint4' (vector of 4 'unsigned int' values)}} int4 e = (int4)u; // expected-error{{invalid conversion between ext-vector type 'int4' (vector of 4 'int' values) and 'uint4' (vector of 4 'unsigned int' values)}} uint3 u4 = (uint3)u; // expected-error{{invalid conversion between ext-vector type 'uint3' (vector of 3 'unsigned int' values) and 'uint4' (vector of 4 'unsigned int' values)}} + + e = (const int4)i; + e = (constant int4)i; + e = (private int4)i; + + private int4 *private_ptr = (const private int4 *)const_global_ptr; // expected-error{{casting 'const __global int4 *' to type 'const int4 *' changes address space of pointer}} + global int4 *global_ptr = const_global_ptr; // expected-warning {{initializing '__global int4 *' with an expression of type 'const __global int4 *' discards qualifiers}} + global_ptr = (global int4 *)const_global_ptr; } Index: lib/Sema/SemaExpr.cpp === --- lib/Sema/SemaExpr.cpp +++ lib/Sema/SemaExpr.cpp @@ -6033,9 +6033,9 @@ // In OpenCL, casts between vectors of different types are not allowed. // (See OpenCL 6.2). if (SrcTy->isVectorType()) { -if (!areLaxCompatibleVectorTypes(SrcTy, DestTy) -|| (getLangOpts().OpenCL && -(DestTy.getCanonicalType() != SrcTy.getCanonicalType()))) { +if (!areLaxCompatibleVectorTypes(SrcTy, DestTy) || +(getLangOpts().OpenCL && + !Context.hasSameUnqualifiedType(DestTy, SrcTy))) { Diag(R.getBegin(),diag::err_invalid_conversion_between_ext_vectors) << DestTy << SrcTy << R; return ExprError();
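The difference between comparing qualified and unqualified types can be shown with standard C++ type traits. This is only an analogy, not clang's implementation: Int4 and UInt4 are hypothetical stand-ins for the OpenCL vector types, and the two helper templates are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the OpenCL int4 and uint4 vector types.
struct Int4 { int x, y, z, w; };
struct UInt4 { unsigned x, y, z, w; };

// Exact comparison: top-level const/volatile qualifiers take part.
template <typename Dest, typename Src>
constexpr bool sameQualifiedType() {
  return std::is_same_v<Dest, Src>;
}

// Comparison after stripping top-level cv-qualifiers from both sides.
template <typename Dest, typename Src>
constexpr bool sameUnqualifiedType() {
  return std::is_same_v<std::remove_cv_t<Dest>, std::remove_cv_t<Src>>;
}

// The old check behaved like the qualified comparison and rejected this cast.
static_assert(!sameQualifiedType<const Int4, Int4>());
// The unqualified comparison accepts it...
static_assert(sameUnqualifiedType<const Int4, Int4>());
// ...while still rejecting casts between genuinely different vector types.
static_assert(!sameUnqualifiedType<Int4, UInt4>());
```

The last assertion corresponds to the existing `(int4)u` test line, which must keep producing an error after the patch.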
[PATCH] D38463: [OpenCL] Fix checking of vector type casting
This revision was automatically updated to reflect the committed changes. Closed by commit rL314802: [OpenCL] Fix checking of vector type casting (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D38463?vs=117357&id=117524#toc Repository: rL LLVM https://reviews.llvm.org/D38463 Files: cfe/trunk/lib/Sema/SemaExpr.cpp cfe/trunk/test/SemaOpenCL/vector_conv_invalid.cl Index: cfe/trunk/test/SemaOpenCL/vector_conv_invalid.cl === --- cfe/trunk/test/SemaOpenCL/vector_conv_invalid.cl +++ cfe/trunk/test/SemaOpenCL/vector_conv_invalid.cl @@ -5,10 +5,18 @@ typedef int int3 __attribute((ext_vector_type(3))); typedef unsigned uint3 __attribute((ext_vector_type(3))); -void vector_conv_invalid() { +void vector_conv_invalid(const global int4 *const_global_ptr) { uint4 u = (uint4)(1); int4 i = u; // expected-error{{initializing 'int4' (vector of 4 'int' values) with an expression of incompatible type 'uint4' (vector of 4 'unsigned int' values)}} int4 e = (int4)u; // expected-error{{invalid conversion between ext-vector type 'int4' (vector of 4 'int' values) and 'uint4' (vector of 4 'unsigned int' values)}} uint3 u4 = (uint3)u; // expected-error{{invalid conversion between ext-vector type 'uint3' (vector of 3 'unsigned int' values) and 'uint4' (vector of 4 'unsigned int' values)}} + + e = (const int4)i; + e = (constant int4)i; + e = (private int4)i; + + private int4 *private_ptr = (const private int4 *)const_global_ptr; // expected-error{{casting 'const __global int4 *' to type 'const int4 *' changes address space of pointer}} + global int4 *global_ptr = const_global_ptr; // expected-warning {{initializing '__global int4 *' with an expression of type 'const __global int4 *' discards qualifiers}} + global_ptr = (global int4 *)const_global_ptr; } Index: cfe/trunk/lib/Sema/SemaExpr.cpp === --- cfe/trunk/lib/Sema/SemaExpr.cpp +++ cfe/trunk/lib/Sema/SemaExpr.cpp @@ -6033,9 +6033,9 @@ // In OpenCL, casts between vectors of different types are not allowed. 
// (See OpenCL 6.2). if (SrcTy->isVectorType()) { -if (!areLaxCompatibleVectorTypes(SrcTy, DestTy) -|| (getLangOpts().OpenCL && -(DestTy.getCanonicalType() != SrcTy.getCanonicalType()))) { +if (!areLaxCompatibleVectorTypes(SrcTy, DestTy) || +(getLangOpts().OpenCL && + !Context.hasSameUnqualifiedType(DestTy, SrcTy))) { Diag(R.getBegin(),diag::err_invalid_conversion_between_ext_vectors) << DestTy << SrcTy << R; return ExprError();
[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct
yaxunl marked an inline comment as done.
yaxunl added inline comments.

Comment at: test/CodeGenOpenCL/blocks.cl:30
+ // COMMON: %[[block_captured:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }>* %[[block]], i32 0, i32 3
+ // COMMON: %[[r0:.*]] = load i32, i32* %i
+ // COMMON: store i32 %[[r0]], i32* %[[block_captured]],

Anastasia wrote:
> It might be better to give those r0-r7 some names for readability if possible!

Will fix it when committing.

https://reviews.llvm.org/D37822
[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct
This revision was automatically updated to reflect the committed changes. yaxunl marked an inline comment as done. Closed by commit rL314932: [OpenCL] Clean up and add missing fields for block struct (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D37822?vs=116877&id=117732#toc Repository: rL LLVM https://reviews.llvm.org/D37822 Files: cfe/trunk/lib/CodeGen/CGBlocks.cpp cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h cfe/trunk/lib/CodeGen/TargetInfo.h cfe/trunk/test/CodeGen/blocks-opencl.cl cfe/trunk/test/CodeGenOpenCL/blocks.cl cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -7,7 +7,7 @@ typedef struct {int a;} ndrange_t; // N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) const bl_t block_G = (bl_t) ^ (local void *a) {}; kernel void device_side_enqueue(global int *a, global int *b, int i) { @@ -27,9 +27,10 @@ // COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4 // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags - // COMMON: 
[[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor addrspace(2)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block to void ()* + // B32: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32, i32 addrspace(1)* }>* %block to void ()* + // B64: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block to void ()* // COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)* - // COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* [[BL_I8]]) + // COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* [[BL_I8]]) enqueue_kernel(default_queue, flags, ndrange, ^(void) { a[i] = b[i]; @@ -39,7 +40,7 @@ // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags // COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %event_wait_list to %opencl.clk_event_t{{.*}}* addrspace(4)* // COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)* - // COMMON: [[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor addrspace(2)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()* + // COMMON: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()* // COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)* // COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]]) enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event, @@ -52,11 +53,11 @@ // B32: %[[TMP:.*]] = alloca [1 
x i32] // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0 // B32: store i32 256, i32* %[[TMP1]], align 4 - // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]]) + // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 ad
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl marked 10 inline comments as done.
yaxunl added a comment.

In https://reviews.llvm.org/D38134#880133, @Anastasia wrote:

> I think we should add a test case when the same block is both called and enqueued.

Will do.

Comment at: test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl:3
+
+// CHECK: %[[S1:struct.__amdgpu_block_arg_t.*]] = type { [3 x i64], [1 x i8] }
+// CHECK: %[[S2:struct.__amdgpu_block_arg_t.*]] = type { [5 x i64], [1 x i8] }

Anastasia wrote:
> yaxunl wrote:
> > Anastasia wrote:
> > > This struct is not identical to block literal struct?
> > The LLVM type of the first argument of block invoke function is created directly with sorting and rearrangement. There is no AST type corresponding to it. However, the function codegen requires AST type of this argument. I feel it is unnecessary to create the corresponding AST type. For simplicity, just create an AST type with the same size and alignment as the LLVM type. In the function code, it will be bitcasted to the correct LLVM struct type and get the captured variables.
> So `void ptr` won't be possible here? Since it is cast to a right struct inside the block anyway. Once again a block is a special type object with known semantic to compiler and runtime in contrast to kernels that can be written with any arbitrary type of arguments.
>
> I just don't like the idea of duplicating the block invoke function in case it's being both called and enqueued. Also the logic in blocks code generation becomes difficult to understand. So I am wondering if we could perhaps create a separate kernel function (as a wrapper) for enqueue_kernel which would call a block instead. What do you think about it? I think the kernel prototype would be fairly generic as it would just have a block call inside and pass the arguments into it... We won't need to modify block generation then at all.

Will emit a wrapper kernel that calls the block invoke function, and keep the block invoke function unchanged.

https://reviews.llvm.org/D38134
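The wrapper approach agreed on here can be pictured with a plain C++ model. This is a sketch under stated assumptions, not the code clang emits: BlockLiteral, CapturingBlock, blockInvoke, blockInvokeKernel and makeBlock are all hypothetical names, and the three-field header merely imitates the { i32, i32, i8* } layout that appears in the test checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the generic block literal header:
// size, alignment, and a pointer to the entry function.
struct BlockLiteral {
  std::int32_t Size;
  std::int32_t Align;
  void (*Invoke)(BlockLiteral *);
};

// A block with captures: the header followed by the captured values.
struct CapturingBlock {
  BlockLiteral Header;
  int *Out;
  int Value;
};

// The ordinary block invoke function. It stays unchanged and is what a
// direct call of the block uses.
void blockInvoke(BlockLiteral *Self) {
  // Header is the first member of a standard-layout struct, so this cast
  // recovers the enclosing block object.
  auto *Block = reinterpret_cast<CapturingBlock *>(Self);
  *Block->Out = Block->Value;
}

// The wrapper "kernel": a separate entry point used only for enqueueing.
// It does nothing except forward to the invoke function.
void blockInvokeKernel(BlockLiteral *Self) { blockInvoke(Self); }

// Build a block literal whose header points at the chosen entry function.
CapturingBlock makeBlock(int *Out, int Value, void (*Entry)(BlockLiteral *)) {
  return CapturingBlock{
      {static_cast<std::int32_t>(sizeof(CapturingBlock)),
       static_cast<std::int32_t>(alignof(CapturingBlock)), Entry},
      Out,
      Value};
}
```

A block that is both called and enqueued then needs only one body (blockInvoke); the enqueued literal differs solely in which entry point its header stores.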
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl updated this revision to Diff 117739. yaxunl marked an inline comment as done. yaxunl edited the summary of this revision. yaxunl added a comment. Emit the enqueued block as a wrapper kernel which calls the block invoke function. Added a test for calling and enqueuing the same block. https://reviews.llvm.org/D38134 Files: lib/CodeGen/CGBuiltin.cpp lib/CodeGen/CGOpenCLRuntime.cpp lib/CodeGen/CGOpenCLRuntime.h lib/CodeGen/CodeGenTypes.h lib/CodeGen/TargetInfo.cpp lib/CodeGen/TargetInfo.h test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -6,10 +6,33 @@ typedef void (^bl_t)(local void *); typedef struct {int a;} ndrange_t; -// N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* } + +// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself.
+// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) + +// For anonymous blocks without captures, emit block literals as global variable. +// COMMON: [[BLG1:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG1:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG2:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG2:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG3:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG3:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG4:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG4:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG5:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, 
i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG5:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG6:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*)* [[INVG6:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG7:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG7:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVG8:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG9:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG9:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8K:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) c
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl updated this revision to Diff 117770. yaxunl marked 9 inline comments as done. yaxunl added a comment. Revised by John's comments. https://reviews.llvm.org/D35082 Files: include/clang/AST/ASTContext.h include/clang/AST/Type.h include/clang/Basic/AddressSpaces.h lib/AST/ASTContext.cpp lib/AST/Expr.cpp lib/AST/ItaniumMangle.cpp lib/AST/TypePrinter.cpp lib/Basic/Targets/AMDGPU.cpp lib/Basic/Targets/NVPTX.h lib/Basic/Targets/SPIR.h lib/Basic/Targets/TCE.h lib/CodeGen/CGDecl.cpp lib/Sema/SemaChecking.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaType.cpp test/CodeGen/blocks-opencl.cl test/CodeGenOpenCL/address-spaces-mangling.cl test/CodeGenOpenCL/address-spaces.cl test/SemaOpenCL/address-spaces-conversions-cl2.0.cl test/SemaOpenCL/address-spaces.cl test/SemaOpenCL/atomic-ops.cl test/SemaOpenCL/cl20-device-side-enqueue.cl test/SemaOpenCL/extern.cl test/SemaOpenCL/invalid-block.cl test/SemaOpenCL/invalid-pipes-cl2.0.cl test/SemaOpenCL/null_literal.cl test/SemaOpenCL/storageclass-cl20.cl test/SemaOpenCL/storageclass.cl Index: test/SemaOpenCL/storageclass.cl === --- test/SemaOpenCL/storageclass.cl +++ test/SemaOpenCL/storageclass.cl @@ -5,6 +5,20 @@ int G3 = 0;// expected-error{{program scope variable must reside in constant address space}} global int G4 = 0; // expected-error{{program scope variable must reside in constant address space}} +static float g_implicit_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static constant float g_constant_static_var = 0; +static global float g_global_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static local float g_local_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static private float g_private_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static generic float g_generic_static_var = 0; // expected-error{{OpenCL version 
1.2 does not support the 'generic' type qualifier}} // expected-error {{program scope variable must reside in constant address space}} + +extern float g_implicit_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern constant float g_constant_extern_var; +extern global float g_global_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern local float g_local_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern private float g_private_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern generic float g_generic_extern_var; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{extern variable must reside in constant address space}} + void kernel foo(int x) { // static is not allowed at local scope before CL2.0 static int S1 = 5; // expected-error{{variables in function scope cannot be declared static}} @@ -45,10 +59,17 @@ __attribute__((address_space(100))) int L4; // expected-error{{automatic variable qualified with an invalid address space}} } + static float l_implicit_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static constant float l_constant_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static global float l_global_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static local float l_local_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static private float l_private_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static generic float l_generic_static_var = 0; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{variables in function scope cannot be declared 
static}} - extern constant float L5; - extern local float L6; // expected-error{{extern variable must reside in constant address space}} - - static int L7 = 0; // expected-error{{variables in function scope cannot be declared static}} - static int L8; // expected-error{{variables in function scope cannot be declared static}} + extern float l_implicit_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern constant float l_constant_extern_var; + extern global float l_global_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern local float l_local_extern_var; // expected-error {{extern variable must reside in constant address space}} + extern private float l_private_extern_var; // expec
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl added a comment.

In https://reviews.llvm.org/D35082#887855, @rjmccall wrote:

> Why is most of this patch necessary under the design of adding a non-canonical __private address space?

There are two reasons that we need a flag to indicate that an address space is implicit:

1. We need a consistent way to print the address space qualifier depending on whether it is implicit or not. We only print the address space qualifier when it is explicit. This is not just for the private address space. It is for all address spaces.

2. In some rare situations we need to know whether an address space is implicit when doing the semantic checking.

Since the implicit property is per address space qualifier, we need this flag to be on the qualifier.

Comment at: include/clang/AST/Type.h:336
+ /// space makes difference.
+ bool getImplicitAddressSpaceFlag() const { return Mask & IMask; }
+ void setImplicitAddressSpaceFlag(bool Value) {

rjmccall wrote:
> isAddressSpaceImplicit()

Will change.

Comment at: include/clang/AST/Type.h:337
+ bool getImplicitAddressSpaceFlag() const { return Mask & IMask; }
+ void setImplicitAddressSpaceFlag(bool Value) {
+Mask = (Mask & ~IMask) | (((uint32_t)Value) << IShift);

rjmccall wrote:
> setAddressSpaceImplicit

Will change.

Comment at: lib/AST/ItaniumMangle.cpp:2232
 case LangAS::opencl_constant: ASString = "CLconstant"; break;
+ case LangAS::opencl_private: ASString = "CLprivate"; break;
 case LangAS::opencl_generic: ASString = "CLgeneric"; break;

rjmccall wrote:
> In what situation is this mangled? I thought we agree this was non-canonical.

OpenCL has overloaded builtin functions, e.g. `__attribute__((overloadable)) void f(private int*)` and `__attribute__((overloadable)) void f(global int*)`. These functions need to be mangled so that the mangled names are different.

Comment at: test/SemaOpenCL/storageclass.cl:63
+ static float l_implicit_static_var = 0; // expected-error {{variables in function scope cannot be declared static}}
+ static constant float l_constant_static_var = 0; // expected-error {{variables in function scope cannot be declared static}}
+ static global float l_global_static_var = 0; // expected-error {{variables in function scope cannot be declared static}}

Anastasia wrote:
> Does it make sense to put different address spaces here since this code is rejected earlier anyway?

In Sema I saw code handling different address spaces in various places. I want to make sure that all address spaces are handled correctly.

https://reviews.llvm.org/D35082
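The flag accessors discussed in the Type.h comments follow the usual clear-then-set pattern on a packed mask. A minimal standalone sketch follows, using the accessor names rjmccall suggested; the QualifierBits class, the raw() helper and the bit position chosen for IShift are assumptions made for illustration, since the thread does not show the real layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed qualifier word with one bit reserved for the
// "address space is implicit" flag.
class QualifierBits {
  // Assumed bit position; the actual value in clang is not shown in the thread.
  static constexpr std::uint32_t IShift = 8;
  static constexpr std::uint32_t IMask = 1u << IShift;

  std::uint32_t Mask;

public:
  explicit QualifierBits(std::uint32_t Initial = 0) : Mask(Initial) {}

  bool isAddressSpaceImplicit() const { return (Mask & IMask) != 0; }

  void setAddressSpaceImplicit(bool Value) {
    // Clear the flag bit, then OR in the new value at the same position,
    // leaving every other qualifier bit untouched.
    Mask = (Mask & ~IMask) | (static_cast<std::uint32_t>(Value) << IShift);
  }

  // Expose the raw word so callers can confirm other bits are preserved.
  std::uint32_t raw() const { return Mask; }
};
```

Because the setter masks out only IMask, toggling the flag never disturbs the cv-qualifier or address-space bits stored in the same word.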
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl added a comment.

In https://reviews.llvm.org/D35082#889053, @rjmccall wrote:

> Are you sure it's a good idea to not print the address space when it's implicit? Won't that often lead to really confusing diagnostics?
>
> Also, we do already have a way of expressing that an extended qualifier was explicit: AttributedType. We have very similar sorts of superficial well-formedness checks to what I think you're trying to do in ObjC ARC.

Based on my observation, in most cases it is OK not to print the implicit address space, and printing the implicit address space could cause quite annoying clutter. In some cases printing the implicit address space may be desired. I can improve the printing in a future patch, e.g. hide the implicit address space only in situations where it causes clutter without providing much useful information.

I think AttributedType sounds like an interesting idea and is worth exploring. I just feel this review has dragged on too long (~3 months) already. We have an important backend change depending on this feature. Since the current solution already achieves its goals, can we leave re-implementing it with AttributedType for the future? Thanks.

https://reviews.llvm.org/D35082
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl added a comment.

In https://reviews.llvm.org/D35082#890143, @rjmccall wrote:

> You have an important backend change relying on being able to preserve type sugar better in diagnostics? The only apparent semantic change in this patch is that you're changing the mangling, which frankly seems incorrect.

Can you elaborate on why the mangling is incorrect?

https://reviews.llvm.org/D35082
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl added a comment.

In https://reviews.llvm.org/D35082#890150, @rjmccall wrote:

> Non-canonical information is not supposed to be mangled.
>
> It's not clear to me what the language rule you're really trying to implement
> is. Maybe you really do need a canonical __private address space, in which
> case you are going to have to change a lot of code in Sema to add the address
> space qualifier to every local and temporary allocation. But that's not
> obvious to me yet, because you haven't really explained why it needs to be
> explicit in the representation.

In OpenCL, all memory resides in some address space, so every l-value and every pointee of a pointer should have an address space. The private address space has the same status as the global or local address space. The language has rules about which address spaces a pointer may point to when it is assigned to another pointer, so every address space needs to be canonical. This patch already puts every local variable and function parameter in the private address space, as is done in deduceOpenCLImplicitAddrSpace().

We need the private address space to be explicit for two reasons: an explicit private address space and an implicit one must be displayed differently, and certain semantic checks need to know whether a private address space is explicit.

> If you just want pointers to __private to be mangled with an address-space
> qualifier (meaning, I guess, that the mangling e.g. Pi will be completely
> unused), that should be easy enough to handle in the mangler. But if you
> need to distinguish between __private-qualified types and unqualified types,
> and that distinction isn't purely to implement some syntactic restriction
> about not writing e.g. __private __global, then that's not good enough and
> you do need a canonical __private.

The mangler does not mangle an empty type qualifier, so if the private address space is represented as the default address space (i.e., no address space), it is not mangled at all. This is also tied to substitution and cannot easily be changed without breaking a lot of the mangler.

> Telling me that you're in a hurry isn't helpful; preserving a reasonable
> representation and not allowing corner cases to become maintenance problems
> is far more important to the project than landing patches in trunk on some
> particular schedule.

The explicit private address space and the implicit address space flag in the AST are introduced precisely to handle those corner cases in the OpenCL language rules, so that they do not become maintenance problems.

https://reviews.llvm.org/D35082
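To make the point about pointer assignment rules concrete, here is a minimal sketch of the implicit conversion rule as I understand it from OpenCL C 2.0: identical address spaces are compatible, and any named address space other than constant may convert to generic. The `AddrSpace` enum and `canImplicitlyConvert` are hypothetical names for illustration only; they are not Clang's `LangAS` or any Sema API, and the rule is simplified.

```cpp
// Hypothetical model of the OpenCL C 2.0 address spaces (not clang::LangAS).
enum class AddrSpace { Private, Global, Local, Constant, Generic };

// Simplified implicit pointer conversion rule (assumption, see lead-in):
// - the same address space on both sides is always fine;
// - private, global and local may convert to generic;
// - constant never converts to generic, and no other pair converts implicitly.
constexpr bool canImplicitlyConvert(AddrSpace from, AddrSpace to) {
  return from == to ||
         (to == AddrSpace::Generic && from != AddrSpace::Constant);
}
```

A check like this only works if private is an enumerator in its own right; if private were folded into "no address space", the rule could not tell a private pointee from an unqualified r-value type.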
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.

Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:144
+ if (auto *I = dyn_cast(V)) {
+// If the block literal is emitted as an instruction, it is an alloca
+// and the block invoke function is stored to GEP of this alloca.

Anastasia wrote:
> Why do we need to replace original block calls with the kernels? I think in
> case of calling a block we could use the original block function and only for
> enqueue use the kernel that would call the block function inside. The pointer
> to the kernel wrapper could be passed as an additional parameter to
> `enqueue_kernel` calls. We won't need to iterate through all IR then.

`CGF.EmitScalarExpr(Block)` returns the block literal structure, which contains the size, alignment, invoke function, and captures. The block invoke function is stored into the struct by a `StoreInst`. To create the wrapper kernel we need the block invoke function, so we have to iterate through the IR.

Since we need to find the store instruction anyway, it is simpler to replace the stored function with the kernel and pass the block literal struct, instead of passing the kernel separately.

Comment at: lib/CodeGen/TargetInfo.cpp:8927
+llvm::Function *
+TargetCodeGenInfo::createEnqueuedBlockKernel(CodeGenFunction &CGF,
+ llvm::Function *Invoke,

Anastasia wrote:
> Could you add some comments please?

Will do.

Comment at: lib/CodeGen/TargetInfo.cpp:8949
+ Builder.restoreIP(IP);
+ return F;
+}

Anastasia wrote:
> Wondering if we should add the kernel metadata (w/o args) since it was used
> for long time to indicate the kernel.

Even before this change, Clang does not generate kernel metadata when there is no vec_type_hint, work_group_size_hint, or reqd_work_group_size attribute. Recall that we previously changed Clang to represent these attributes as function metadata. Whether a function is a kernel can be determined from its calling convention.

Comment at: lib/CodeGen/TargetInfo.h:35
 class Decl;
+class ASTContext;

Anastasia wrote:
> Do we need this?

Will remove it.

Comment at: test/CodeGenOpenCL/cl20-device-side-enqueue.cl:9
-// N.B. The check here only exists to set BL_GLOBAL
-// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*)
+// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* }
+

Anastasia wrote:
> Can we check generated kernel function too?

Will do.

https://reviews.llvm.org/D38134
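For readers following the wrapper-kernel discussion, the relationship between the block literal and the wrapper can be sketched in plain host C++. This is only an analogy, not what Clang emits: `BlockLiteral`, `exampleInvoke` and `wrapperKernel` are hypothetical names, the struct mirrors the `{ i32, i32, i8 addrspace(4)* }` layout from the test with address spaces dropped, and captures are omitted.

```cpp
#include <cstdint>

// Hypothetical host-side analogue of the capture-free block literal checked
// in the test: size, alignment, and a pointer to the block invoke function.
struct BlockLiteral {
  std::int32_t size;
  std::int32_t align;
  void (*invoke)(void *literal, void *localArg);
};

// Stand-in for a block body: writes a marker through its "local" argument.
inline void exampleInvoke(void * /*literal*/, void *localArg) {
  *static_cast<int *>(localArg) = 42;
}

// Stand-in for the wrapper kernel: it receives the block literal and simply
// forwards to the invoke function stored inside it.
inline void wrapperKernel(BlockLiteral *literal, void *localArg) {
  literal->invoke(literal, localArg);
}
```

The sketch shows why the invoke function has to be known when the wrapper is created: the wrapper's only job is to call it with the literal as the first argument.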
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl updated this revision to Diff 118064. yaxunl marked 5 inline comments as done. yaxunl added a comment. Revise by Anastasia's comments. https://reviews.llvm.org/D38134 Files: lib/CodeGen/CGBuiltin.cpp lib/CodeGen/CGOpenCLRuntime.cpp lib/CodeGen/CGOpenCLRuntime.h lib/CodeGen/CodeGenTypes.h lib/CodeGen/TargetInfo.cpp lib/CodeGen/TargetInfo.h test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -6,10 +6,33 @@ typedef void (^bl_t)(local void *); typedef struct {int a;} ndrange_t; -// N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* } + +// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself. +// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) + +// For anonymous blocks without captures, emit block literals as global variable. 
+// COMMON: [[BLG1:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG1:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG2:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG2:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG3:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG3:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG4:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG4:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG5:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG5:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG6:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*)* [[INVG6:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG7:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 
addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG7:@[^ ]+_kernel]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVG8:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG9:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG9:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8K:@__opencl_enqueued_block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)*
[PATCH] D33681: [OpenCL] Allow function declaration with empty argument list.
yaxunl accepted this revision. yaxunl added a comment. LGTM. Thanks. https://reviews.llvm.org/D33681
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl marked 4 inline comments as done.
yaxunl added inline comments.

Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:144
+ if (auto *I = dyn_cast(V)) {
+// If the block literal is emitted as an instruction, it is an alloca
+// and the block invoke function is stored to GEP of this alloca.

Anastasia wrote:
> yaxunl wrote:
> > Anastasia wrote:
> > > Why do we need to replace original block calls with the kernels? I think
> > > in case of calling a block we could use the original block function and
> > > only for enqueue use the kernel that would call the block function
> > > inside. The pointer to the kernel wrapper could be passed as an
> > > additional parameter to `enqueue_kernel` calls. We won't need to iterate
> > > through all IR then.
> > `CGF.EmitScalarExpr(Block)` returns the block literal structure which
> > contains the size/align/invoke_function/captures. The block invoke function
> > is stored to the struct by a `StoreInst`. To create the wrapper kernel, we
> > need to get the block invoke function, therefore we have to iterate through
> > IR.
> >
> > Since we need to find the store instruction any way, it is simpler to just
> > replace the stored function with the kernel and pass the block literal
> > struct, instead of passing the kernel separately.
> So we can't get the invoke function from the block literal structure passed
> into the kernel wrapper directly knowing its offset? Iterating through IR
> adds extra time and also I am not sure how reliable this is wrt different
> corner cases of IR.

Unfortunately, the invoke function is not returned directly; it is buried in an LLVM value, and extracting it means wading through a number of LLVM IR instructions.

There is a way to get the invoke function directly instead of going through the IR, but it requires a small change to the functions that generate code for blocks so that they return the block invoke function.

https://reviews.llvm.org/D38134
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl updated this revision to Diff 118677. yaxunl marked 2 inline comments as done. yaxunl added a comment. Revised per Anastasia's comments. Get the block invoke function through an API instead of iterating through the IR. Pass the block kernel directly to the `__enqueue_kernel` functions. https://reviews.llvm.org/D38134 Files: lib/CodeGen/CGBlocks.cpp lib/CodeGen/CGBuiltin.cpp lib/CodeGen/CGOpenCLRuntime.cpp lib/CodeGen/CGOpenCLRuntime.h lib/CodeGen/CodeGenFunction.h lib/CodeGen/CodeGenTypes.h lib/CodeGen/TargetInfo.cpp lib/CodeGen/TargetInfo.h test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -6,10 +6,30 @@ typedef void (^bl_t)(local void *); typedef struct {int a;} ndrange_t; -// N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* } + +// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself. 
+// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) + +// For anonymous blocks without captures, emit block literals as global variable. +// COMMON: [[BLG1:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG2:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG3:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG4:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG5:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 
addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG6:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG7:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVG8:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG9:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG9:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG10:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* {{@[^ ]+}} to i8*) to i8 add
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl added a comment. If there are no other issues, may I commit this patch now? Thanks. https://reviews.llvm.org/D35082
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl marked 2 inline comments as done.
yaxunl added a comment.

In https://reviews.llvm.org/D35082#895230, @rjmccall wrote:

> It sounds like there's agreement about the basic technical direction of
> introducing LangAS::opencl_private. Please introduce
> isAddressSpaceImplicit() in a different patch and make this patch just about
> the introduction of LangAS::opencl_private. You can have the pretty-printer
> just ignore __private for now, which should avoid gratuitous diagnostic
> changes.
>
> I would like you to investigate using AttributedType for the pretty-printing
> and address-space semantic checks before adding isAddressSpaceImplicit().

Thanks. I will move the implicit address space flag into a separate patch.

Comment at: include/clang/AST/Type.h:562
+ static const uint32_t IMask = 0x200;
+ static const uint32_t IShift = 9;
 static const uint32_t AddressSpaceMask =

rjmccall wrote:
> "I" is not an appropriate abbreviation for "AddressSpaceImplicit".

Will change it to ImplicitAddrSpace.

Comment at: include/clang/Basic/AddressSpaces.h:34
 // OpenCL specific address spaces.
 opencl_global,

rjmccall wrote:
> I think you need a real comment about the design of OpenCL address spaces
> here. Specifically, it is important to note that OpenCL no longer uses
> LangAS::Default for anything except r-values.

Will do.

https://reviews.llvm.org/D35082
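As a rough illustration of what the `IMask = 0x200` / `IShift = 9` flag costs, here is a small sketch of packing an "implicit" bit just below the address-space field of a 32-bit qualifier mask. The layout is an assumption on my part (bits 0-8 for the other qualifiers, bit 9 for the flag, address space from bit 10 up), and all names are illustrative rather than the actual members of `clang::Qualifiers`.

```cpp
#include <cstdint>

// Assumed layout (see lead-in): bit 9 is the implicit-address-space flag,
// matching IMask = 0x200 and IShift = 9 in the quoted diff; the address
// space occupies everything above it.
constexpr std::uint32_t ImplicitMask = 0x200;
constexpr std::uint32_t ImplicitShift = 9;
constexpr std::uint32_t AddrSpaceShift = ImplicitShift + 1;
constexpr std::uint32_t AddrSpaceMask = ~((1u << AddrSpaceShift) - 1);

// Replace the address space and the implicit flag, keeping the low bits.
constexpr std::uint32_t setAddrSpace(std::uint32_t quals, std::uint32_t as,
                                     bool implicit) {
  return (quals & ~(AddrSpaceMask | ImplicitMask)) | (as << AddrSpaceShift) |
         (implicit ? ImplicitMask : 0u);
}

constexpr std::uint32_t getAddrSpace(std::uint32_t quals) {
  return quals >> AddrSpaceShift;
}

constexpr bool isAddrSpaceImplicit(std::uint32_t quals) {
  return (quals & ImplicitMask) != 0;
}

// Largest address-space number representable in `bits` bits.
constexpr std::uint32_t maxAddrSpace(unsigned bits) {
  return (1u << bits) - 1;
}
```

Taking one bit for the flag shrinks the address-space field from 23 to 22 bits under this layout. That appears consistent with the test updates elsewhere in this thread, where the reported maximum goes from 8388598 to 4194294: each figure equals the raw maximum minus 9, which I read as the number of language-level address spaces, though that interpretation is an inference.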
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl updated this revision to Diff 118795. yaxunl marked 5 inline comments as done. yaxunl added a comment. Revised by Anastasia's comments. https://reviews.llvm.org/D38134 Files: lib/CodeGen/CGBlocks.cpp lib/CodeGen/CGBuiltin.cpp lib/CodeGen/CGOpenCLRuntime.cpp lib/CodeGen/CGOpenCLRuntime.h lib/CodeGen/CodeGenFunction.h lib/CodeGen/CodeGenTypes.h lib/CodeGen/TargetInfo.cpp lib/CodeGen/TargetInfo.h test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl test/CodeGenOpenCL/blocks.cl test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -6,10 +6,30 @@ typedef void (^bl_t)(local void *); typedef struct {int a;} ndrange_t; -// N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* } + +// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself. 
+// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) + +// For anonymous blocks without captures, emit block literals as global variable. +// COMMON: [[BLG1:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG2:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG3:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG4:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG5:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 
addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG6:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG7:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG8:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVG8:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG9:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG9:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG10:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG11:@__block_literal_global[^ ]*]] = internal addrspace(1) constant {
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
yaxunl added a comment.

In https://reviews.llvm.org/D38134#895848, @Anastasia wrote:

> I think it would be good to add a block test to CodeGenOpenCL where we would
> just call the block without any enqueue and check that the invoke function is
> generated but the kernel wrapper isn't.

We have test/CodeGenOpenCL/blocks.cl, which only calls a block. I can add a check to make sure no kernels are generated.

Comment at: lib/CodeGen/CGBuiltin.cpp:2846
+ PtrToSizeArray};
+ std::vector ArgTys = {QueueTy,
+ IntTy,

Anastasia wrote:
> Formatting seems inconsistent from above.

Will fix.

Comment at: lib/CodeGen/CodeGenFunction.h:2921
 private:
- /// Helpers for blocks
- llvm::Value *EmitBlockLiteral(const CGBlockInfo &Info);
+ /// Helpers for blocks. Returns invoke function by \p InvokeF if it is not
+ /// nullptr.

Anastasia wrote:
> It will be nullptr in case block is not enqueued? May be it's worth
> explaining it in the comment.

Will do.

Comment at: lib/CodeGen/TargetInfo.h:290
 }
+ /// Create an OpenCL kernel for an enqueued block.
+ virtual llvm::Function *

Anastasia wrote:
> Can we also explain the wrapper kernel here?

Will do.

https://reviews.llvm.org/D38134
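The "returns invoke function by `InvokeF` if it is not nullptr" convention discussed above is an optional out-parameter. A minimal sketch of that pattern follows; `Function`, `Value` and `BlockEmitter` are hypothetical stand-ins so the example is self-contained, not the LLVM or Clang classes, and the real `EmitBlockLiteral` signature may differ.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Function and llvm::Value.
struct Function { std::string name; };
struct Value { std::string repr; };

class BlockEmitter {
  std::vector<std::unique_ptr<Function>> functions;

public:
  // Emits a block literal. Callers that only call the block pass nothing;
  // the enqueue path passes a non-null invokeOut to learn which invoke
  // function it should wrap in a kernel.
  Value emitBlockLiteral(const std::string &blockName,
                         Function **invokeOut = nullptr) {
    functions.push_back(
        std::make_unique<Function>(Function{blockName + "_invoke"}));
    if (invokeOut)
      *invokeOut = functions.back().get();
    return Value{"literal(" + blockName + ")"};
  }
};
```

With this shape, a plain block call never asks for the invoke function, so nothing in that path needs to create a kernel wrapper, which is the behaviour the proposed blocks.cl check would verify.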
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
yaxunl updated this revision to Diff 118813. yaxunl marked 7 inline comments as done. yaxunl edited the summary of this revision. yaxunl added a comment. Separate implicit addr space flag to another patch as John suggested. This patch only introduces the private addr space but does not print it. https://reviews.llvm.org/D35082 Files: include/clang/Basic/AddressSpaces.h lib/AST/ASTContext.cpp lib/AST/Expr.cpp lib/AST/ItaniumMangle.cpp lib/AST/TypePrinter.cpp lib/Basic/Targets/AMDGPU.cpp lib/Basic/Targets/NVPTX.h lib/Basic/Targets/SPIR.h lib/Basic/Targets/TCE.h lib/CodeGen/CGDecl.cpp lib/Sema/SemaChecking.cpp lib/Sema/SemaDecl.cpp lib/Sema/SemaType.cpp test/CodeGenOpenCL/address-spaces-mangling.cl test/CodeGenOpenCL/address-spaces.cl test/SemaOpenCL/address-spaces.cl test/SemaOpenCL/cl20-device-side-enqueue.cl test/SemaOpenCL/extern.cl test/SemaOpenCL/storageclass-cl20.cl test/SemaOpenCL/storageclass.cl test/SemaTemplate/address_space-dependent.cpp Index: test/SemaTemplate/address_space-dependent.cpp === --- test/SemaTemplate/address_space-dependent.cpp +++ test/SemaTemplate/address_space-dependent.cpp @@ -43,7 +43,7 @@ template void tooBig() { - __attribute__((address_space(I))) int *bounds; // expected-error {{address space is larger than the maximum supported (8388599)}} + __attribute__((address_space(I))) int *bounds; // expected-error {{address space is larger than the maximum supported (8388598)}} } template @@ -101,7 +101,7 @@ car<1, 2, 3>(); // expected-note {{in instantiation of function template specialization 'car<1, 2, 3>' requested here}} HasASTemplateFields<1> HASTF; neg<-1>(); // expected-note {{in instantiation of function template specialization 'neg<-1>' requested here}} - correct<0x77>(); + correct<0x76>(); tooBig<8388650>(); // expected-note {{in instantiation of function template specialization 'tooBig<8388650>' requested here}} __attribute__((address_space(1))) char *x; Index: test/SemaOpenCL/storageclass.cl === --- 
test/SemaOpenCL/storageclass.cl +++ test/SemaOpenCL/storageclass.cl @@ -5,6 +5,20 @@ int G3 = 0;// expected-error{{program scope variable must reside in constant address space}} global int G4 = 0; // expected-error{{program scope variable must reside in constant address space}} +static float g_implicit_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static constant float g_constant_static_var = 0; +static global float g_global_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static local float g_local_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static private float g_private_static_var = 0; // expected-error {{program scope variable must reside in constant address space}} +static generic float g_generic_static_var = 0; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{program scope variable must reside in constant address space}} + +extern float g_implicit_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern constant float g_constant_extern_var; +extern global float g_global_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern local float g_local_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern private float g_private_extern_var; // expected-error {{extern variable must reside in constant address space}} +extern generic float g_generic_extern_var; // expected-error{{OpenCL version 1.2 does not support the 'generic' type qualifier}} // expected-error {{extern variable must reside in constant address space}} + void kernel foo(int x) { // static is not allowed at local scope before CL2.0 static int S1 = 5; // expected-error{{variables in function scope cannot be declared static}} @@ -45,10 +59,17 @@ 
__attribute__((address_space(100))) int L4; // expected-error{{automatic variable qualified with an invalid address space}} } + static float l_implicit_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static constant float l_constant_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static global float l_global_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static local float l_local_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static private float l_private_static_var = 0; // expected-error {{variables in function scope cannot be declared static}} + static generic float l_
[PATCH] D38857: [OpenCL] Improve printing and semantic check related to implicit addr space
yaxunl created this revision.

There are two issues:
1. Only (void*)0 should be treated as nullptr.
2. Only an explicit address space should be printed.

This patch introduces a flag in Qualifiers indicating that a non-default address space qualifier was deduced from context. Only non-implicit address space qualifiers are printed when printing the AST. The flag is also used to identify nullptr.

However, this review does not rule out alternative approaches, e.g. using AttributedType. We will explore those as well.

https://reviews.llvm.org/D38857 Files: include/clang/AST/ASTContext.h include/clang/AST/Type.h lib/AST/ASTContext.cpp lib/AST/Expr.cpp lib/AST/TypePrinter.cpp lib/Sema/SemaType.cpp test/SemaOpenCL/address-spaces-conversions-cl2.0.cl test/SemaOpenCL/address-spaces.cl test/SemaOpenCL/atomic-ops.cl test/SemaOpenCL/invalid-block.cl test/SemaOpenCL/invalid-pipes-cl2.0.cl test/SemaOpenCL/null_literal.cl test/SemaOpenCL/vector_conv_invalid.cl test/SemaTemplate/address_space-dependent.cpp Index: test/SemaTemplate/address_space-dependent.cpp === --- test/SemaTemplate/address_space-dependent.cpp +++ test/SemaTemplate/address_space-dependent.cpp @@ -43,7 +43,7 @@ template void tooBig() { - __attribute__((address_space(I))) int *bounds; // expected-error {{address space is larger than the maximum supported (8388598)}} + __attribute__((address_space(I))) int *bounds; // expected-error {{address space is larger than the maximum supported (4194294)}} } template @@ -101,7 +101,7 @@ car<1, 2, 3>(); // expected-note {{in instantiation of function template specialization 'car<1, 2, 3>' requested here}} HasASTemplateFields<1> HASTF; neg<-1>(); // expected-note {{in instantiation of function template specialization 'neg<-1>' requested here}} - correct<0x76>(); + correct<0x36>(); tooBig<8388650>(); // expected-note {{in instantiation of function template specialization 'tooBig<8388650>' requested here}} __attribute__((address_space(1))) char *x; Index: test/SemaOpenCL/vector_conv_invalid.cl 
=== --- test/SemaOpenCL/vector_conv_invalid.cl +++ test/SemaOpenCL/vector_conv_invalid.cl @@ -16,7 +16,7 @@ e = (constant int4)i; e = (private int4)i; - private int4 *private_ptr = (const private int4 *)const_global_ptr; // expected-error{{casting 'const __global int4 *' to type 'const int4 *' changes address space of pointer}} + private int4 *private_ptr = (const private int4 *)const_global_ptr; // expected-error{{casting 'const __global int4 *' to type 'const __private int4 *' changes address space of pointer}} global int4 *global_ptr = const_global_ptr; // expected-warning {{initializing '__global int4 *' with an expression of type 'const __global int4 *' discards qualifiers}} global_ptr = (global int4 *)const_global_ptr; } Index: test/SemaOpenCL/null_literal.cl === --- test/SemaOpenCL/null_literal.cl +++ test/SemaOpenCL/null_literal.cl @@ -1,29 +1,68 @@ // RUN: %clang_cc1 -verify %s -// RUN: %clang_cc1 -cl-std=CL2.0 -DCL20 -verify %s +// RUN: %clang_cc1 -cl-std=CL2.0 -verify %s #define NULL ((void*)0) void foo(){ + global int *g1 = NULL; + global int *g2 = (global void *)0; + global int *g3 = (constant void *)0; // expected-error{{initializing '__global int *' with an expression of type '__constant void *' changes address space of pointer}} + global int *g4 = (local void *)0; // expected-error{{initializing '__global int *' with an expression of type '__local void *' changes address space of pointer}} + global int *g5 = (private void *)0; // expected-error{{initializing '__global int *' with an expression of type '__private void *' changes address space of pointer}} -global int* ptr1 = NULL; + constant int *c1 = NULL; + constant int *c2 = (global void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__global void *' changes address space of pointer}} + constant int *c3 = (constant void *)0; + constant int *c4 = (local void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__local void *' 
changes address space of pointer}} + constant int *c5 = (private void *)0; // expected-error{{initializing '__constant int *' with an expression of type '__private void *' changes address space of pointer}} -global int* ptr2 = (global void*)0; + local int *l1 = NULL; + local int *l2 = (global void *)0; // expected-error{{initializing '__local int *' with an expression of type '__global void *' changes address space of pointer}} + local int *l3 = (constant void *)0; // expected-error{{initializing '__local int *' with an expression of type '__constant void *' changes address space of pointer}} + local int *l4 = (local void *)0; + local int *l5 = (private void *)0; //
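
The expected limit in address_space-dependent.cpp drops from 8388598 to 4194294, which is consistent with the address space field giving up one bit to the new flag. The following is only a sketch of that arithmetic. It assumes (the patch text does not say so) that the field shrinks from 23 to 22 bits and that 10 values are reserved for language address spaces; `maxTargetAddressSpace` is a hypothetical helper, not a clang function.

```cpp
#include <cstdint>

// Hypothetical model: the largest value accepted by
// __attribute__((address_space(N))) when the qualifier field has `Bits` bits
// and `Reserved` values are set aside for language address spaces.
constexpr std::uint32_t maxTargetAddressSpace(unsigned Bits,
                                              std::uint32_t Reserved) {
  return (std::uint32_t{1} << Bits) - Reserved;
}

// Both limits quoted in the diagnostics fit this model with Reserved = 10.
static_assert(maxTargetAddressSpace(23, 10) == 8388598,
              "limit before the patch");
static_assert(maxTargetAddressSpace(22, 10) == 4194294,
              "limit after the patch");
```

Under the same assumption, tooBig<8388650> in the test exceeds both limits, so it keeps triggering the diagnostic after the change.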
[PATCH] D38816: Convert clang::LangAS to a strongly typed enum
yaxunl added inline comments.

Comment at: include/clang/Basic/AddressSpaces.h:66
+inline LangAS LangASFromTargetAS(unsigned TargetAS) {
+  return static_cast((TargetAS) +

How about `getLangASFromTargetAS`? Names starting with a lowercase letter are preferred.

Comment at: tools/libclang/CXType.cpp:402
+  ASTContext &Ctx = cxtu::getASTUnit(GetTU(CT))->getASTContext();
+  return Ctx.getTargetAddressSpace(T);
 }

Is this function supposed to return the AST address space or the target address space?

Some targets, e.g. x86, map all AST address spaces to 0. Returning the target address space will leave the client unable to differentiate the address spaces of the source language.

https://reviews.llvm.org/D38816

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
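
To make the concern about CXType.cpp concrete, here is a minimal sketch of how distinct language address spaces can collapse to one target address space. Everything in it is a simplified, hypothetical stand-in (the enum members, `LangASMap`, `FlatTargetMap`, `toTargetAS`), not the clang declarations; only the name `getLangASFromTargetAS` comes from the review comment.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified stand-in for a strongly typed LangAS.
enum class LangAS : unsigned {
  Default = 0,
  opencl_global,
  opencl_local,
  opencl_constant,
  FirstTargetAddressSpace
};

constexpr std::size_t NumLangAS =
    static_cast<std::size_t>(LangAS::FirstTargetAddressSpace);
using LangASMap = std::array<unsigned, NumLangAS>;

// A target that places every language address space in target address space 0.
constexpr LangASMap FlatTargetMap = {0, 0, 0, 0};

// Language address spaces go through the target's map; values at or above
// FirstTargetAddressSpace encode a target address space directly.
constexpr unsigned toTargetAS(LangAS AS, const LangASMap &Map) {
  return AS < LangAS::FirstTargetAddressSpace
             ? Map[static_cast<std::size_t>(AS)]
             : static_cast<unsigned>(AS) -
                   static_cast<unsigned>(LangAS::FirstTargetAddressSpace);
}

// Inverse for explicit target address spaces, using the lowercase-first
// name suggested in the review.
constexpr LangAS getLangASFromTargetAS(unsigned TargetAS) {
  return static_cast<LangAS>(
      TargetAS + static_cast<unsigned>(LangAS::FirstTargetAddressSpace));
}
```

With the flat map, `toTargetAS(LangAS::opencl_global, FlatTargetMap)` and `toTargetAS(LangAS::opencl_local, FlatTargetMap)` are both 0, so a client that only sees the target value cannot tell the two source-level qualifiers apart.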
[PATCH] D38816: Convert clang::LangAS to a strongly typed enum
yaxunl added inline comments.

Comment at: tools/libclang/CXType.cpp:402
+  ASTContext &Ctx = cxtu::getASTUnit(GetTU(CT))->getASTContext();
+  return Ctx.getTargetAddressSpace(T);
 }

arichardson wrote:
> yaxunl wrote:
> > Is this function supposed to return the AST address space or the target address space?
> >
> > Some targets, e.g. x86, map all AST address spaces to 0. Returning the target address space will leave the client unable to differentiate the address spaces of the source language.
> I am not entirely sure what the correct return value is here because the current implementation returns either the LanguageAS or `LangAS - LangAS::FirstTargetAddressSpace`, which can also overlap. So possibly it should just always return the AST address space?
>
> I think for now I will just keep the current behaviour with a FIXME and create a followup patch.

That's fine. Thanks.

https://reviews.llvm.org/D38816
[PATCH] D35082: [OpenCL] Add LangAS::opencl_private to represent private address space in AST
This revision was automatically updated to reflect the committed changes. Closed by commit rL315668: [OpenCL] Add LangAS::opencl_private to represent private address space in AST (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D35082?vs=118813&id=118882#toc Repository: rL LLVM https://reviews.llvm.org/D35082 Files: cfe/trunk/include/clang/Basic/AddressSpaces.h cfe/trunk/lib/AST/ASTContext.cpp cfe/trunk/lib/AST/Expr.cpp cfe/trunk/lib/AST/ItaniumMangle.cpp cfe/trunk/lib/AST/TypePrinter.cpp cfe/trunk/lib/Basic/Targets/AMDGPU.cpp cfe/trunk/lib/Basic/Targets/NVPTX.h cfe/trunk/lib/Basic/Targets/SPIR.h cfe/trunk/lib/Basic/Targets/TCE.h cfe/trunk/lib/CodeGen/CGDecl.cpp cfe/trunk/lib/Sema/SemaChecking.cpp cfe/trunk/lib/Sema/SemaDecl.cpp cfe/trunk/lib/Sema/SemaType.cpp cfe/trunk/test/CodeGenOpenCL/address-spaces-mangling.cl cfe/trunk/test/CodeGenOpenCL/address-spaces.cl cfe/trunk/test/SemaOpenCL/address-spaces.cl cfe/trunk/test/SemaOpenCL/cl20-device-side-enqueue.cl cfe/trunk/test/SemaOpenCL/extern.cl cfe/trunk/test/SemaOpenCL/storageclass-cl20.cl cfe/trunk/test/SemaOpenCL/storageclass.cl cfe/trunk/test/SemaTemplate/address_space-dependent.cpp Index: cfe/trunk/include/clang/Basic/AddressSpaces.h === --- cfe/trunk/include/clang/Basic/AddressSpaces.h +++ cfe/trunk/include/clang/Basic/AddressSpaces.h @@ -25,16 +25,17 @@ /// enum ID { // The default value 0 is the value used in QualType for the the situation - // where there is no address space qualifier. For most languages, this also - // corresponds to the situation where there is no address space qualifier in - // the source code, except for OpenCL, where the address space value 0 in - // QualType represents private address space in OpenCL source code. + // where there is no address space qualifier. Default = 0, // OpenCL specific address spaces. + // In OpenCL each l-value must have certain non-default address space, each + // r-value must have no address space (i.e. the default address space). 
The + // pointee of a pointer must have non-default address space. opencl_global, opencl_local, opencl_constant, + opencl_private, opencl_generic, // CUDA specific address spaces. Index: cfe/trunk/test/CodeGenOpenCL/address-spaces.cl === --- cfe/trunk/test/CodeGenOpenCL/address-spaces.cl +++ cfe/trunk/test/CodeGenOpenCL/address-spaces.cl @@ -7,6 +7,24 @@ // RUN: %clang_cc1 %s -O0 -triple amdgcn-mesa-mesa3d -emit-llvm -o - | FileCheck --check-prefixes=CHECK,SPIR %s // RUN: %clang_cc1 %s -O0 -triple r600-- -emit-llvm -o - | FileCheck --check-prefixes=CHECK,SPIR %s +// SPIR: %struct.S = type { i32, i32, i32* } +// CL20SPIR: %struct.S = type { i32, i32, i32 addrspace(4)* } +struct S { + int x; + int y; + int *z; +}; + +// CL20-DAG: @g_extern_var = external addrspace(1) global float +// CL20-DAG: @l_extern_var = external addrspace(1) global float +// CL20-DAG: @test_static.l_static_var = internal addrspace(1) global float 0.00e+00 +// CL20-DAG: @g_static_var = internal addrspace(1) global float 0.00e+00 + +#ifdef CL20 +// CL20-DAG: @g_s = common addrspace(1) global %struct.S zeroinitializer +struct S g_s; +#endif + // SPIR: i32* %arg // GIZ: i32 addrspace(5)* %arg void f__p(__private int *arg) {} @@ -58,3 +76,52 @@ // CL20-DAG: @f.ii = internal addrspace(1) global i32 0 #endif } + +typedef int int_td; +typedef int *intp_td; +// SPIR: define void @test_typedef(i32 addrspace(1)* %x, i32 addrspace(2)* %y, i32* %z) +void test_typedef(global int_td *x, constant int_td *y, intp_td z) { + *x = *y; + *z = 0; +} + +// SPIR: define void @test_struct() +void test_struct() { + // SPIR: %ps = alloca %struct.S* + // CL20SPIR: %ps = alloca %struct.S addrspace(4)* + struct S *ps; + // SPIR: store i32 0, i32* %x + // CL20SPIR: store i32 0, i32 addrspace(4)* %x + ps->x = 0; +#ifdef CL20 + // CL20SPIR: store i32 0, i32 addrspace(1)* getelementptr inbounds (%struct.S, %struct.S addrspace(1)* @g_s, i32 0, i32 0) + g_s.x = 0; +#endif +} + +// SPIR-LABEL: define void @test_void_par() +void 
test_void_par(void) {} + +// SPIR-LABEL: define i32 @test_func_return_type() +int test_func_return_type(void) { + return 0; +} + +#ifdef CL20 +extern float g_extern_var; + +// CL20-LABEL: define {{.*}}void @test_extern( +kernel void test_extern(global float *buf) { + extern float l_extern_var; + buf[0] += g_extern_var + l_extern_var; +} + +static float g_static_var; + +// CL20-LABEL: define {{.*}}void @test_static( +kernel void test_static(global float *buf) { + static float l_static_var; + buf[0] += g_static_var + l_static_var; +} + +#endif Index: cfe/trunk/test/CodeGenOpenCL/address-spaces-mangling.cl === --- cfe/trunk/test/CodeGenOpenCL/address-spaces-mangling.cl +++ cfe/trunk/t
[PATCH] D38816: Convert clang::LangAS to a strongly typed enum
yaxunl added inline comments.

Comment at: lib/AST/TypePrinter.cpp:1323
     OS << "address_space(";
-    OS << T->getEquivalentType().getAddressSpace();
+    OS << T->getEquivalentType()
+              .getQualifiers()

arichardson wrote:
> arichardson wrote:
> > Anastasia wrote:
> > > arichardson wrote:
> > > > Anastasia wrote:
> > > > > arichardson wrote:
> > > > > > Anastasia wrote:
> > > > > > > Why do we need this change?
> > > > > > `__attribute__((address_space(n)))` is a target address space and not a language address space like `LangAS::opencl_generic`. Isn't `Qualifiers::getAddressSpaceAttributePrintValue()` meant exactly for this use case?
> > > > > Yes, I think there are some adjustments we do in this method to get the original source value printed correctly. Does this mean we have no tests that caught this issue?
> > > > Seems like it, all tests pass both with and without this patch.
> > > Strange considering that we have this attribute printed in some error messages of some Sema tests. If I compile this code without your patch:
> > >
> > > ```
> > > typedef int __attribute__((address_space(1))) int_1;
> > > typedef int __attribute__((address_space(2))) int_2;
> > >
> > > void f0(int_1 &);
> > > void f0(const int_1 &);
> > >
> > > void test_f0() {
> > >   int i;
> > >   static int_2 i2;
> > >   f0(i);
> > >   f0(i2);
> > > }
> > > ```
> > >
> > > I get the address spaces printed correctly inside the type:
> > > note: candidate function not viable: 1st argument ('int_2' (aka '__attribute__((address_space(2))) int')) is in address space 2, but parameter must be in address space 1
> > >
> > > Perhaps @yaxunl could comment further on whether this change is needed.
> > My guess is that it doesn't go through that switch statement but rather through `Qualifiers::print()`. I'll try adding a llvm_unreachable() to see if there are any tests that go down this path.
> I just ran the clang tests with an llvm_unreachable() here and none of them failed. So it seems like we don't have anything testing this code path.

Sorry for the delay. This part of the code prints the address space of an AttributedType. Since it does not seem to be used by any language yet, there is no test for it. A non-target-specific address space could be printed here if a language chooses to use AttributedType to represent address spaces. Therefore a proper fix would be to isolate the code for printing address spaces from Qualifiers::print and reuse it here, so that the address space is printed consistently whether it is represented as a qualifier or as an AttributedType.

https://reviews.llvm.org/D38816
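
The proposed fix (one shared routine for spelling an address space, used by both printing paths) could look roughly like the sketch below. This is a toy model under stated assumptions, not clang code: the enum, `printAddressSpace`, `printAsQualifier` and `printAsAttributedType` are all hypothetical names introduced for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for clang's LangAS.
enum class LangAS : unsigned {
  Default = 0,
  opencl_global,
  opencl_local,
  FirstTargetAddressSpace
};

// The single place that knows how to spell an address space.
inline void printAddressSpace(std::ostream &OS, LangAS AS) {
  switch (AS) {
  case LangAS::Default:
    return; // nothing to print
  case LangAS::opencl_global:
    OS << "__global";
    return;
  case LangAS::opencl_local:
    OS << "__local";
    return;
  default:
    // Target address spaces print the number written in the source.
    OS << "__attribute__((address_space("
       << static_cast<unsigned>(AS) -
              static_cast<unsigned>(LangAS::FirstTargetAddressSpace)
       << ")))";
  }
}

// Both printing paths delegate to the helper, so their output cannot diverge.
inline std::string printAsQualifier(LangAS AS) {
  std::ostringstream OS;
  printAddressSpace(OS, AS);
  return OS.str();
}

inline std::string printAsAttributedType(LangAS AS) {
  std::ostringstream OS;
  printAddressSpace(OS, AS);
  return OS.str();
}
```

The point of the design is that a language address space reaching the AttributedType path would be spelled the same way as on the qualifier path, instead of being printed as a raw number.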
[PATCH] D38816: Convert clang::LangAS to a strongly typed enum
yaxunl added inline comments.

Comment at: lib/AST/TypePrinter.cpp:1323
     OS << "address_space(";
-    OS << T->getEquivalentType().getAddressSpace();
+    OS << T->getEquivalentType()
+              .getQualifiers()

arichardson wrote:
> yaxunl wrote:
> > arichardson wrote:
> > > arichardson wrote:
> > > > Anastasia wrote:
> > > > > arichardson wrote:
> > > > > > Anastasia wrote:
> > > > > > > arichardson wrote:
> > > > > > > > Anastasia wrote:
> > > > > > > > > Why do we need this change?
> > > > > > > > `__attribute__((address_space(n)))` is a target address space and not a language address space like `LangAS::opencl_generic`. Isn't `Qualifiers::getAddressSpaceAttributePrintValue()` meant exactly for this use case?
> > > > > > > Yes, I think there are some adjustments we do in this method to get the original source value printed correctly. Does this mean we have no tests that caught this issue?
> > > > > > Seems like it, all tests pass both with and without this patch.
> > > > > Strange considering that we have this attribute printed in some error messages of some Sema tests. If I compile this code without your patch:
> > > > >
> > > > > ```
> > > > > typedef int __attribute__((address_space(1))) int_1;
> > > > > typedef int __attribute__((address_space(2))) int_2;
> > > > >
> > > > > void f0(int_1 &);
> > > > > void f0(const int_1 &);
> > > > >
> > > > > void test_f0() {
> > > > >   int i;
> > > > >   static int_2 i2;
> > > > >   f0(i);
> > > > >   f0(i2);
> > > > > }
> > > > > ```
> > > > >
> > > > > I get the address spaces printed correctly inside the type:
> > > > > note: candidate function not viable: 1st argument ('int_2' (aka '__attribute__((address_space(2))) int')) is in address space 2, but parameter must be in address space 1
> > > > >
> > > > > Perhaps @yaxunl could comment further on whether this change is needed.
> > > > My guess is that it doesn't go through that switch statement but rather through `Qualifiers::print()`. I'll try adding a llvm_unreachable() to see if there are any tests that go down this path.
> > > I just ran the clang tests with an llvm_unreachable() here and none of them failed. So it seems like we don't have anything testing this code path.
> > Sorry for the delay. This part of the code prints the address space of an AttributedType. Since it does not seem to be used by any language yet, there is no test for it. A non-target-specific address space could be printed here if a language chooses to use AttributedType to represent address spaces. Therefore a proper fix would be to isolate the code for printing address spaces from Qualifiers::print and reuse it here, so that the address space is printed consistently whether it is represented as a qualifier or as an AttributedType.
> Thanks, that makes sense. To avoid breaking anything here I think it should be part of a separate patch though.

Sure. In this one, it is probably best to keep the original behavior.

https://reviews.llvm.org/D38816
[PATCH] D38816: Convert clang::LangAS to a strongly typed enum
yaxunl accepted this revision.
yaxunl added a comment.

LGTM. Thanks! Great work!

https://reviews.llvm.org/D38816
[PATCH] D38134: [OpenCL] Emit enqueued block as kernel
This revision was automatically updated to reflect the committed changes. Closed by commit rL315804: [OpenCL] Emit enqueued block as kernel (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D38134?vs=118795&id=119017#toc Repository: rL LLVM https://reviews.llvm.org/D38134 Files: cfe/trunk/lib/CodeGen/CGBlocks.cpp cfe/trunk/lib/CodeGen/CGBuiltin.cpp cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h cfe/trunk/lib/CodeGen/CodeGenFunction.h cfe/trunk/lib/CodeGen/CodeGenTypes.h cfe/trunk/lib/CodeGen/TargetInfo.cpp cfe/trunk/lib/CodeGen/TargetInfo.h cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl cfe/trunk/test/CodeGenOpenCL/blocks.cl cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl Index: cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl === --- cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl +++ cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl @@ -0,0 +1,36 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -emit-llvm -o - -triple amdgcn | FileCheck %s --check-prefix=CHECK + +typedef struct {int a;} ndrange_t; + +// CHECK-LABEL: define amdgpu_kernel void @test +kernel void test(global char *a, char b, global long *c, long d) { + queue_t default_queue; + unsigned flags = 0; + ndrange_t ndrange; + + enqueue_kernel(default_queue, flags, ndrange, + ^(void) { + a[0] = b; + }); + + enqueue_kernel(default_queue, flags, ndrange, + ^(void) { + a[0] = b; + c[0] = d; + }); +} + +// CHECK-LABEL: define internal amdgpu_kernel void @__test_block_invoke_kernel(<{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>) +// CHECK-SAME: #[[ATTR:[0-9]+]] !kernel_arg_addr_space !{{.*}} !kernel_arg_access_qual !{{.*}} !kernel_arg_type !{{.*}} !kernel_arg_base_type !{{.*}} !kernel_arg_type_qual !{{.*}} +// CHECK: entry: +// CHECK: %1 = alloca <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>, align 8 +// CHECK: store <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }> %0, <{ i32, i32, i8 addrspace(4)*, i8 
addrspace(1)*, i8 }>* %1, align 8 +// CHECK: %2 = addrspacecast <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>* %1 to i8 addrspace(4)* +// CHECK: call void @__test_block_invoke(i8 addrspace(4)* %2) +// CHECK: ret void +// CHECK:} + +// CHECK-LABEL: define internal amdgpu_kernel void @__test_block_invoke_2_kernel(<{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i64 addrspace(1)*, i64, i8 }>) +// CHECK-SAME: #[[ATTR]] !kernel_arg_addr_space !{{.*}} !kernel_arg_access_qual !{{.*}} !kernel_arg_type !{{.*}} !kernel_arg_base_type !{{.*}} !kernel_arg_type_qual !{{.*}} + +// CHECK: attributes #[[ATTR]] = { nounwind "enqueued-block" } Index: cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl === --- cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl +++ cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -6,10 +6,30 @@ typedef void (^bl_t)(local void *); typedef struct {int a;} ndrange_t; -// N.B. The check here only exists to set BL_GLOBAL -// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) +// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* } + +// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself. 
+// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) } +// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*) + +// For anonymous blocks without captures, emit block literals as global variable. +// COMMON: [[BLG1:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) } +// COMMON: [[BLG2:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspa
[PATCH] D38857: [OpenCL] Improve printing and semantic check related to implicit addr space
yaxunl marked an inline comment as done.
yaxunl added a comment.

In https://reviews.llvm.org/D38857#896994, @Anastasia wrote:

> LGTM! Thanks!
>
> Can we close https://bugs.llvm.org/show_bug.cgi?id=33418 after this commit?

Will do.

Comment at: test/SemaOpenCL/null_literal.cl:38
-#ifdef CL20
-// Accept explicitly pointer to generic address space in OpenCL v2.0.
-global int* ptr5 = (generic void*)0;
-#endif
-
-global int* ptr6 = (local void*)0; // expected-error{{initializing '__global int *' with an expression of type '__local void *' changes address space of pointer}}
+  global int *g7 = (global void*)(generic void *)0;
+  constant int *c7 = (constant void*)(generic void *)0; //expected-error{{casting '__generic void *' to type '__constant void *' changes address space of pointer}}

Anastasia wrote:
> Does this extra cast test anything we are not already testing?

Yes. It tests that a generic pointer with value zero can be explicitly cast to a global pointer. This should hold for any generic pointer; however, since zero-valued pointers get special handling, we want to make sure it still works.

https://reviews.llvm.org/D38857
[PATCH] D38966: CodeGen: Fix invalid bitcasts for atomic builtins
yaxunl created this revision. Currently clang assumes the temporary variables emitted during codegen of atomic builtins have address space 0, which is not true for target triple amdgcn---amdgiz and causes invalid bitcasts. This patch fixes that. https://reviews.llvm.org/D38966 Files: lib/CodeGen/CGAtomic.cpp test/CodeGenOpenCL/atomic-ops.cl Index: test/CodeGenOpenCL/atomic-ops.cl === --- test/CodeGenOpenCL/atomic-ops.cl +++ test/CodeGenOpenCL/atomic-ops.cl @@ -1,8 +1,8 @@ -// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -O0 -o - -triple=amdgcn-amd-amdhsa-opencl | opt -instnamer -S | FileCheck %s +// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -O0 -o - -triple=amdgcn-amd-amdhsa-amdgizcl | opt -instnamer -S | FileCheck %s // Also test serialization of atomic operations here, to avoid duplicating the test. -// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-pch -O0 -o %t -triple=amdgcn-amd-amdhsa-opencl -// RUN: %clang_cc1 %s -cl-std=CL2.0 -include-pch %t -O0 -triple=amdgcn-amd-amdhsa-opencl -emit-llvm -o - | opt -instnamer -S | FileCheck %s +// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-pch -O0 -o %t -triple=amdgcn-amd-amdhsa-amdgizcl +// RUN: %clang_cc1 %s -cl-std=CL2.0 -include-pch %t -O0 -triple=amdgcn-amd-amdhsa-amdgizcl -emit-llvm -o - | opt -instnamer -S | FileCheck %s #ifndef ALREADY_INCLUDED #define ALREADY_INCLUDED @@ -32,58 +32,58 @@ void fi1(atomic_int *i) { // CHECK-LABEL: @fi1 - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst int x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_work_group); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("agent") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("agent") seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_device); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} seq_cst + // CHECK: load 
atomic i32, i32* %{{[.0-9A-Z_a-z]+}} seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_all_svm_devices); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("subgroup") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("subgroup") seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_sub_group); } void fi2(atomic_int *i) { // CHECK-LABEL: @fi2 - // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(i, 1, memory_order_seq_cst, memory_scope_work_group); } void test_addr(global atomic_int *ig, private atomic_int *ip, local atomic_int *il) { // CHECK-LABEL: @test_addr // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(1)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(ig, 1, memory_order_seq_cst, memory_scope_work_group); - // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(5)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(ip, 1, memory_order_seq_cst, memory_scope_work_group); // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(3)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(il, 1, memory_order_seq_cst, memory_scope_work_group); } void fi3(atomic_int *i, atomic_uint *ui) { // CHECK-LABEL: @fi3 - // CHECK: atomicrmw and i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: atomicrmw and i32* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst int x = __opencl_atomic_fetch_and(i, 1, memory_order_seq_cst, memory_scope_work_group); - // CHECK: atomicrmw min i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}}, i32 
%{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: atomicrmw min i32* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst x = __opencl_atomic_fetch_min(i, 1, memory_order_seq_cst, memory_scope_work_group); - // CHECK: atomicrmw max i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: atomicrmw max i32* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst x = __opencl_atomic_fetch_max(i, 1, memory_order_seq_cst, memory_scope_work_group); - // CHECK: atomicrmw umin i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: atomicrmw umin i32* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst x = __opencl_atomic_fetch_min(ui, 1, memory_o
[PATCH] D38966: CodeGen: Fix invalid bitcasts for atomic builtins
This revision was automatically updated to reflect the committed changes. Closed by commit rL316000: CodeGen: Fix invalid bitcasts for atomic builtins (authored by yaxunl). Changed prior to commit: https://reviews.llvm.org/D38966?vs=119182&id=119319#toc Repository: rL LLVM https://reviews.llvm.org/D38966 Files: cfe/trunk/lib/CodeGen/CGAtomic.cpp cfe/trunk/test/CodeGenOpenCL/atomic-ops.cl Index: cfe/trunk/lib/CodeGen/CGAtomic.cpp === --- cfe/trunk/lib/CodeGen/CGAtomic.cpp +++ cfe/trunk/lib/CodeGen/CGAtomic.cpp @@ -1226,7 +1226,8 @@ return RValue::get(nullptr); return convertTempToRValue( -Builder.CreateBitCast(Dest, ConvertTypeForMem(RValTy)->getPointerTo()), +Builder.CreateBitCast(Dest, ConvertTypeForMem(RValTy)->getPointerTo( +Dest.getAddressSpace())), RValTy, E->getExprLoc()); } @@ -1298,7 +1299,8 @@ assert(Atomics.getValueSizeInBits() <= Atomics.getAtomicSizeInBits()); return convertTempToRValue( - Builder.CreateBitCast(Dest, ConvertTypeForMem(RValTy)->getPointerTo()), + Builder.CreateBitCast(Dest, ConvertTypeForMem(RValTy)->getPointerTo( + Dest.getAddressSpace())), RValTy, E->getExprLoc()); } Index: cfe/trunk/test/CodeGenOpenCL/atomic-ops.cl === --- cfe/trunk/test/CodeGenOpenCL/atomic-ops.cl +++ cfe/trunk/test/CodeGenOpenCL/atomic-ops.cl @@ -1,8 +1,8 @@ -// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -O0 -o - -triple=amdgcn-amd-amdhsa-opencl | opt -instnamer -S | FileCheck %s +// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -O0 -o - -triple=amdgcn-amd-amdhsa-amdgizcl | opt -instnamer -S | FileCheck %s // Also test serialization of atomic operations here, to avoid duplicating the test. 
-// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-pch -O0 -o %t -triple=amdgcn-amd-amdhsa-opencl -// RUN: %clang_cc1 %s -cl-std=CL2.0 -include-pch %t -O0 -triple=amdgcn-amd-amdhsa-opencl -emit-llvm -o - | opt -instnamer -S | FileCheck %s +// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-pch -O0 -o %t -triple=amdgcn-amd-amdhsa-amdgizcl +// RUN: %clang_cc1 %s -cl-std=CL2.0 -include-pch %t -O0 -triple=amdgcn-amd-amdhsa-amdgizcl -emit-llvm -o - | opt -instnamer -S | FileCheck %s #ifndef ALREADY_INCLUDED #define ALREADY_INCLUDED @@ -32,58 +32,58 @@ void fi1(atomic_int *i) { // CHECK-LABEL: @fi1 - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst int x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_work_group); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("agent") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("agent") seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_device); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_all_svm_devices); - // CHECK: load atomic i32, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("subgroup") seq_cst + // CHECK: load atomic i32, i32* %{{[.0-9A-Z_a-z]+}} syncscope("subgroup") seq_cst x = __opencl_atomic_load(i, memory_order_seq_cst, memory_scope_sub_group); } void fi2(atomic_int *i) { // CHECK-LABEL: @fi2 - // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(i, 1, memory_order_seq_cst, memory_scope_work_group); } void test_addr(global atomic_int *ig, private atomic_int *ip, local atomic_int 
*il) { // CHECK-LABEL: @test_addr // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(1)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(ig, 1, memory_order_seq_cst, memory_scope_work_group); - // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(5)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(ip, 1, memory_order_seq_cst, memory_scope_work_group); // CHECK: store atomic i32 %{{[.0-9A-Z_a-z]+}}, i32 addrspace(3)* %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst __opencl_atomic_store(il, 1, memory_order_seq_cst, memory_scope_work_group); } void fi3(atomic_int *i, atomic_uint *ui) { // CHECK-LABEL: @fi3 - // CHECK: atomicrmw and i32 addrspace(4)* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup") seq_cst + // CHECK: atomicrmw and i32* %{{[.0-9A-Z_a-z]+}}, i32 %{{[.0-9A-Z_a-z]+}} syncscope("workgroup"
[PATCH] D39069: CodeGen: Fix missing debug loc due to alloca
yaxunl created this revision.

The builder saves and restores the insertion point when emitting the address space cast for an alloca, but it does not save and restore the debug location, which causes a verifier failure for certain call instructions. This patch fixes that.

https://reviews.llvm.org/D39069

Files:
  lib/CodeGen/CGExpr.cpp
  test/CodeGenOpenCL/func-call-dbg-loc.cl

Index: test/CodeGenOpenCL/func-call-dbg-loc.cl
===================================================================
--- /dev/null
+++ test/CodeGenOpenCL/func-call-dbg-loc.cl
@@ -0,0 +1,34 @@
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -debug-info-kind=limited -dwarf-version=2 -debugger-tuning=gdb -O0 -emit-llvm -o - %s | FileCheck %s
+// Checks the file compiles without verifier error: inlinable function call in a function with debug info must have a !dbg location.
+
+typedef struct
+{
+  float m_max;
+} Struct;
+
+typedef struct
+{
+  Struct m_volume;
+} Node;
+
+
+Struct buzz(Node node)
+{
+  return node.m_volume;
+}
+
+__attribute__((always_inline))
+float bar(Struct aabb)
+{
+  return 0.0f;
+}
+
+__attribute__((used))
+void foo()
+{
+  Node node;
+  // CHECK: store float 0.00e+00, float addrspace(5)* %f, align 4, !dbg !{{[0-9]+}}
+  float f = bar(buzz(node));
+}
+
+
Index: lib/CodeGen/CGExpr.cpp
===================================================================
--- lib/CodeGen/CGExpr.cpp
+++ lib/CodeGen/CGExpr.cpp
@@ -75,11 +75,13 @@
   if (CastToDefaultAddrSpace && getASTAllocaAddressSpace() != LangAS::Default) {
     auto DestAddrSpace = getContext().getTargetAddressSpace(LangAS::Default);
     auto CurIP = Builder.saveIP();
+    auto DbgLoc = Builder.getCurrentDebugLocation();
     Builder.SetInsertPoint(AllocaInsertPt);
     V = getTargetHooks().performAddrSpaceCast(
         *this, V, getASTAllocaAddressSpace(), LangAS::Default,
         Ty->getPointerTo(DestAddrSpace), /*non-null*/ true);
     Builder.restoreIP(CurIP);
+    Builder.SetCurrentDebugLocation(DbgLoc);
   }
 
   return Address(V, Align);
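
The pattern in the CGExpr.cpp hunk (save both pieces of builder state, emit elsewhere, restore both) can be sketched with a toy builder. This is not llvm::IRBuilder: `ToyBuilder`, `InsertPointAndDebugLocGuard` and `emitCastAtAllocaPoint` are hypothetical names, and the sketch assumes, as the patch description implies, that moving the insertion point also changes the current debug location.

```cpp
#include <string>

// Hypothetical miniature builder; it only models the two pieces of state
// discussed in the patch.
struct ToyBuilder {
  int InsertPoint = 0;
  std::string DebugLoc;

  // Assumption: moving the insertion point adopts the debug location found
  // there, which is what clobbers the current one.
  void setInsertPoint(int Point, const std::string &LocAtPoint) {
    InsertPoint = Point;
    DebugLoc = LocAtPoint;
  }
};

// RAII guard that restores both the insertion point and the debug location.
class InsertPointAndDebugLocGuard {
  ToyBuilder &B;
  int SavedPoint;
  std::string SavedLoc;

public:
  explicit InsertPointAndDebugLocGuard(ToyBuilder &Builder)
      : B(Builder), SavedPoint(Builder.InsertPoint),
        SavedLoc(Builder.DebugLoc) {}
  InsertPointAndDebugLocGuard(const InsertPointAndDebugLocGuard &) = delete;
  InsertPointAndDebugLocGuard &
  operator=(const InsertPointAndDebugLocGuard &) = delete;
  ~InsertPointAndDebugLocGuard() {
    B.InsertPoint = SavedPoint;
    B.DebugLoc = SavedLoc;
  }
};

// Emits at the alloca insertion point (position 0, no debug location here)
// and leaves the builder state as it found it.
inline void emitCastAtAllocaPoint(ToyBuilder &B) {
  InsertPointAndDebugLocGuard Guard(B);
  B.setInsertPoint(0, "");
  // ... the address space cast would be created here ...
}
```

Restoring only `InsertPoint` would leave `DebugLoc` empty for whatever is emitted next, which mirrors the missing `!dbg` on the call that the new test guards against.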
[PATCH] D39184: CodeGen: Fix invalid bitcast in partial initialization of automatic arrary variable
yaxunl created this revision.

https://reviews.llvm.org/D39184

Files:
  lib/CodeGen/CGDecl.cpp
  test/CodeGenOpenCL/amdgcn-automatic-variable.cl

Index: test/CodeGenOpenCL/amdgcn-automatic-variable.cl
===================================================================
--- test/CodeGenOpenCL/amdgcn-automatic-variable.cl
+++ test/CodeGenOpenCL/amdgcn-automatic-variable.cl
@@ -58,3 +58,11 @@
   const int lvc = 4;
   lv1 = lvc;
 }
+
+// CHECK-LABEL: define void @func3()
+// CHECK: %a = alloca [16 x [1 x float]], align 4, addrspace(5)
+// CHECK: %[[CAST:.+]] = bitcast [16 x [1 x float]] addrspace(5)* %a to i8 addrspace(5)*
+// CHECK: call void @llvm.memset.p5i8.i64(i8 addrspace(5)* %[[CAST]], i8 0, i64 64, i32 4, i1 false)
+void func3(void) {
+  float a[16][1] = {{0.}};
+}
Index: lib/CodeGen/CGDecl.cpp
===================================================================
--- lib/CodeGen/CGDecl.cpp
+++ lib/CodeGen/CGDecl.cpp
@@ -1266,7 +1266,7 @@
       llvm::ConstantInt::get(IntPtrTy,
                              getContext().getTypeSizeInChars(type).getQuantity());

-    llvm::Type *BP = Int8PtrTy;
+    llvm::Type *BP = AllocaInt8PtrTy;
     if (Loc.getType() != BP)
       Loc = Builder.CreateBitCast(Loc, BP);

[PATCH] D39184: CodeGen: Fix invalid bitcast in partial initialization of automatic arrary variable
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316353: CodeGen: Fix invalid bitcast in partial initialization of automatic arrary… (authored by yaxunl).

Changed prior to commit:
  https://reviews.llvm.org/D39184?vs=119861&id=119900#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D39184

Files:
  cfe/trunk/lib/CodeGen/CGDecl.cpp
  cfe/trunk/test/CodeGenOpenCL/amdgcn-automatic-variable.cl

Index: cfe/trunk/lib/CodeGen/CGDecl.cpp
===================================================================
--- cfe/trunk/lib/CodeGen/CGDecl.cpp
+++ cfe/trunk/lib/CodeGen/CGDecl.cpp
@@ -1266,7 +1266,7 @@
       llvm::ConstantInt::get(IntPtrTy,
                              getContext().getTypeSizeInChars(type).getQuantity());

-    llvm::Type *BP = Int8PtrTy;
+    llvm::Type *BP = AllocaInt8PtrTy;
     if (Loc.getType() != BP)
       Loc = Builder.CreateBitCast(Loc, BP);

Index: cfe/trunk/test/CodeGenOpenCL/amdgcn-automatic-variable.cl
===================================================================
--- cfe/trunk/test/CodeGenOpenCL/amdgcn-automatic-variable.cl
+++ cfe/trunk/test/CodeGenOpenCL/amdgcn-automatic-variable.cl
@@ -58,3 +58,11 @@
   const int lvc = 4;
   lv1 = lvc;
 }
+
+// CHECK-LABEL: define void @func3()
+// CHECK: %a = alloca [16 x [1 x float]], align 4, addrspace(5)
+// CHECK: %[[CAST:.+]] = bitcast [16 x [1 x float]] addrspace(5)* %a to i8 addrspace(5)*
+// CHECK: call void @llvm.memset.p5i8.i64(i8 addrspace(5)* %[[CAST]], i8 0, i64 64, i32 4, i1 false)
+void func3(void) {
+  float a[16][1] = {{0.}};
+}
[PATCH] D39069: CodeGen: Fix missing debug loc due to alloca
yaxunl added a comment.

In https://reviews.llvm.org/D39069#903344, @rjmccall wrote:

> If this is something we generally need to be doing in all the places we
> temporarily save and restore the insertion point, we should fix the basic
> behavior of saveIP instead of adding explicit code to a bunch of separate
> places. Can we just override saveIP() on CGBuilder to return a struct that
> also includes the current debug location?

IRBuilderBase::InsertPointGuard does that. Will use it instead.

https://reviews.llvm.org/D39069
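The difference between a manual save/restore of the insertion point and a guard that also captures the debug location can be sketched without LLVM. The following is only an illustration: `MockBuilder`, `MockInsertPointGuard`, `emitCastAtAllocaPoint` and `debugLocAfterCast` are hypothetical names made up for this sketch, and the guard is written in the spirit of `llvm::IRBuilderBase::InsertPointGuard` rather than copied from it.

```cpp
#include <string>

// Hypothetical stand-in for an IR builder: it tracks where the next
// instruction would go and which debug location would be attached to it.
struct MockBuilder {
  int insertPoint = 0;
  std::string debugLoc;
};

// RAII guard (assumed shape, not the LLVM class itself): captures both
// pieces of builder state on construction and restores both on scope exit.
class MockInsertPointGuard {
  MockBuilder &builder;
  int savedInsertPoint;
  std::string savedDebugLoc;

public:
  explicit MockInsertPointGuard(MockBuilder &b)
      : builder(b), savedInsertPoint(b.insertPoint),
        savedDebugLoc(b.debugLoc) {}
  MockInsertPointGuard(const MockInsertPointGuard &) = delete;
  MockInsertPointGuard &operator=(const MockInsertPointGuard &) = delete;
  ~MockInsertPointGuard() {
    builder.insertPoint = savedInsertPoint;
    builder.debugLoc = savedDebugLoc;
  }
};

// Models emitting a cast next to the allocas: the builder is moved
// elsewhere and its debug location is clobbered while the guard is alive.
void emitCastAtAllocaPoint(MockBuilder &b) {
  MockInsertPointGuard guard(b);
  b.insertPoint = 0;  // jump to the (mock) alloca insertion point
  b.debugLoc.clear(); // the debug location is lost at that position
  // ... the address space cast would be emitted here ...
}

// Returns the debug location the caller sees after the helper has run.
std::string debugLocAfterCast() {
  MockBuilder b;
  b.insertPoint = 42;
  b.debugLoc = "foo.cl:30";
  emitCastAtAllocaPoint(b);
  return b.debugLoc;
}
```

With the guard in place the caller's debug location survives the helper; restoring only the insertion point would leave it empty, which mirrors the missing `!dbg` described in this review.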
[PATCH] D39069: CodeGen: Fix missing debug loc due to alloca
yaxunl updated this revision to Diff 119918.
yaxunl added a comment.

Use InsertPointGuard and simplify test.

https://reviews.llvm.org/D39069

Files:
  lib/CodeGen/CGExpr.cpp
  test/CodeGenOpenCL/func-call-dbg-loc.cl

Index: test/CodeGenOpenCL/func-call-dbg-loc.cl
===================================================================
--- /dev/null
+++ test/CodeGenOpenCL/func-call-dbg-loc.cl
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -debug-info-kind=limited -O0 -emit-llvm -o - %s | FileCheck %s
+
+typedef struct
+{
+  int a;
+} Struct;
+
+Struct func1();
+
+void func2(Struct S);
+
+void func3()
+{
+  // CHECK: call i32 @func1() #{{[0-9]+}}, !dbg !{{[0-9]+}}
+  // CHECK: call void @func2(i32 %{{[0-9]+}}) #{{[0-9]+}}, !dbg !{{[0-9]+}}
+  func2(func1());
+}
+
Index: lib/CodeGen/CGExpr.cpp
===================================================================
--- lib/CodeGen/CGExpr.cpp
+++ lib/CodeGen/CGExpr.cpp
@@ -74,12 +74,11 @@
   // cast alloca to the default address space when necessary.
   if (CastToDefaultAddrSpace && getASTAllocaAddressSpace() != LangAS::Default) {
     auto DestAddrSpace = getContext().getTargetAddressSpace(LangAS::Default);
-    auto CurIP = Builder.saveIP();
+    llvm::IRBuilderBase::InsertPointGuard IPG(Builder);
     Builder.SetInsertPoint(AllocaInsertPt);
     V = getTargetHooks().performAddrSpaceCast(
         *this, V, getASTAllocaAddressSpace(), LangAS::Default,
         Ty->getPointerTo(DestAddrSpace), /*non-null*/ true);
-    Builder.restoreIP(CurIP);
   }

   return Address(V, Align);
[PATCH] D39069: CodeGen: Fix missing debug loc due to alloca
yaxunl updated this revision to Diff 120076.
yaxunl added a comment.

Revised the test per Paul's comments.

https://reviews.llvm.org/D39069

Files:
  lib/CodeGen/CGExpr.cpp
  test/CodeGenOpenCL/func-call-dbg-loc.cl

Index: test/CodeGenOpenCL/func-call-dbg-loc.cl
===================================================================
--- /dev/null
+++ test/CodeGenOpenCL/func-call-dbg-loc.cl
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -debug-info-kind=limited -O0 -emit-llvm -o - %s | FileCheck %s
+
+typedef struct
+{
+  int a;
+} Struct;
+
+Struct func1();
+
+void func2(Struct S);
+
+void func3()
+{
+  // CHECK: call i32 @func1() #{{[0-9]+}}, !dbg ![[LOC:[0-9]+]]
+  // CHECK: call void @func2(i32 %{{[0-9]+}}) #{{[0-9]+}}, !dbg ![[LOC]]
+  func2(func1());
+}
+
Index: lib/CodeGen/CGExpr.cpp
===================================================================
--- lib/CodeGen/CGExpr.cpp
+++ lib/CodeGen/CGExpr.cpp
@@ -74,12 +74,11 @@
   // cast alloca to the default address space when necessary.
   if (CastToDefaultAddrSpace && getASTAllocaAddressSpace() != LangAS::Default) {
     auto DestAddrSpace = getContext().getTargetAddressSpace(LangAS::Default);
-    auto CurIP = Builder.saveIP();
+    llvm::IRBuilderBase::InsertPointGuard IPG(Builder);
     Builder.SetInsertPoint(AllocaInsertPt);
     V = getTargetHooks().performAddrSpaceCast(
         *this, V, getASTAllocaAddressSpace(), LangAS::Default,
         Ty->getPointerTo(DestAddrSpace), /*non-null*/ true);
-    Builder.restoreIP(CurIP);
   }

   return Address(V, Align);
[PATCH] D39069: CodeGen: Fix missing debug loc due to alloca
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316484: CodeGen: Fix missing debug loc due to alloca (authored by yaxunl).

Changed prior to commit:
  https://reviews.llvm.org/D39069?vs=120076&id=120115#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D39069

Files:
  cfe/trunk/lib/CodeGen/CGExpr.cpp
  cfe/trunk/test/CodeGenOpenCL/func-call-dbg-loc.cl

Index: cfe/trunk/lib/CodeGen/CGExpr.cpp
===================================================================
--- cfe/trunk/lib/CodeGen/CGExpr.cpp
+++ cfe/trunk/lib/CodeGen/CGExpr.cpp
@@ -74,12 +74,11 @@
   // cast alloca to the default address space when necessary.
   if (CastToDefaultAddrSpace && getASTAllocaAddressSpace() != LangAS::Default) {
     auto DestAddrSpace = getContext().getTargetAddressSpace(LangAS::Default);
-    auto CurIP = Builder.saveIP();
+    llvm::IRBuilderBase::InsertPointGuard IPG(Builder);
     Builder.SetInsertPoint(AllocaInsertPt);
     V = getTargetHooks().performAddrSpaceCast(
         *this, V, getASTAllocaAddressSpace(), LangAS::Default,
         Ty->getPointerTo(DestAddrSpace), /*non-null*/ true);
-    Builder.restoreIP(CurIP);
   }

   return Address(V, Align);
Index: cfe/trunk/test/CodeGenOpenCL/func-call-dbg-loc.cl
===================================================================
--- cfe/trunk/test/CodeGenOpenCL/func-call-dbg-loc.cl
+++ cfe/trunk/test/CodeGenOpenCL/func-call-dbg-loc.cl
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -debug-info-kind=limited -O0 -emit-llvm -o - %s | FileCheck %s
+
+typedef struct
+{
+  int a;
+} Struct;
+
+Struct func1();
+
+void func2(Struct S);
+
+void func3()
+{
+  // CHECK: call i32 @func1() #{{[0-9]+}}, !dbg ![[LOC:[0-9]+]]
+  // CHECK: call void @func2(i32 %{{[0-9]+}}) #{{[0-9]+}}, !dbg ![[LOC]]
+  func2(func1());
+}
+
[PATCH] D51212: [OpenCL][Docs] Release notes for OpenCL in Clang
yaxunl accepted this revision.
yaxunl added a comment.

LGTM. Thanks.

https://reviews.llvm.org/D51212
[PATCH] D43783: [OpenCL] Remove block invoke function from emitted block literal struct
yaxunl added a comment.
Herald added a subscriber: jvesely.

In https://reviews.llvm.org/D43783#1204353, @svenvh wrote:

> Sorry for digging up an old commit...
>
> Apparently this broke block arguments, e.g. the following test case:
>
>   int foo(int (^ bl)(void)) {
>     return bl();
>   }
>
>   int get21() {
>     return foo(^{return 21;});
>   }
>
>   int get42() {
>     return foo(^{return 42;});
>   }
>
>
> In particular, the VarDecl that `getBlockExpr()` sees doesn't have an
> initializer when the called block comes from an argument (causing clang to
> crash).

Sorry for the delay.

I think blocks should not be allowed as function arguments, since this generally leads to indirect function calls and therefore requires support for function pointers. It would rely on optimizations to get rid of the indirect function calls.

Repository:
  rC Clang

https://reviews.llvm.org/D43783
[PATCH] D51336: [HIP] Fix output file extension
yaxunl created this revision.
yaxunl added reviewers: tra, rjmccall.

The OffloadBundlingJobAction constructor accepts a list of JobActions as inputs, and the host JobAction is the last one. The file type of OffloadBundlingJobAction should be determined by the host JobAction (the last one) instead of the first one.

Since HIP emits LLVM bitcode for device compilation, the device JobAction has a different file type from the host JobAction. This bug causes an incorrect output file extension for HIP.

This patch fixes it by using the last input JobAction (the host JobAction) to determine the file type of OffloadBundlingJobAction.

https://reviews.llvm.org/D51336

Files:
  lib/Driver/Action.cpp
  test/Driver/hip-output-file-name.hip

Index: test/Driver/hip-output-file-name.hip
===================================================================
--- /dev/null
+++ test/Driver/hip-output-file-name.hip
@@ -0,0 +1,9 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -c -target x86_64-linux-gnu \
+// RUN:   -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck %s
+
+// CHECK: {{.*}}clang-offload-bundler{{.*}}"-outputs=hip-output-file-name.o"
Index: lib/Driver/Action.cpp
===================================================================
--- lib/Driver/Action.cpp
+++ lib/Driver/Action.cpp
@@ -382,7 +382,7 @@
 void OffloadBundlingJobAction::anchor() {}

 OffloadBundlingJobAction::OffloadBundlingJobAction(ActionList &Inputs)
-    : JobAction(OffloadBundlingJobClass, Inputs, Inputs.front()->getType()) {}
+    : JobAction(OffloadBundlingJobClass, Inputs, Inputs.back()->getType()) {}

 void OffloadUnbundlingJobAction::anchor() {}

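The effect of picking the first versus the last input can be shown with a small stand-alone sketch. It does not use the Clang driver classes: `FileType`, `MockAction`, `bundleTypeFromFirst`, `bundleTypeFromHost` and `exampleHipInputs` are hypothetical names invented for illustration, and the input order (device actions first, host action last) is taken from the patch description above.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the driver's file types and actions.
enum class FileType { LLVMBitcode, Object };

struct MockAction {
  std::string name;
  FileType type;
};

// Pre-patch behaviour: the bundle takes its type from the first input,
// which for HIP is a device action producing LLVM bitcode.
// Precondition: inputs is non-empty.
FileType bundleTypeFromFirst(const std::vector<MockAction> &inputs) {
  return inputs.front().type;
}

// Post-patch behaviour: the bundle takes its type from the last input,
// which the patch description identifies as the host action.
// Precondition: inputs is non-empty.
FileType bundleTypeFromHost(const std::vector<MockAction> &inputs) {
  return inputs.back().type;
}

// Two device actions followed by the host action, mirroring the two
// --cuda-gpu-arch values used in the test's RUN line.
std::vector<MockAction> exampleHipInputs() {
  return {{"device-gfx803", FileType::LLVMBitcode},
          {"device-gfx900", FileType::LLVMBitcode},
          {"host-x86_64", FileType::Object}};
}
```

In this sketch only the host-based choice yields an object file type, which corresponds to the `.o` extension the new driver test checks for.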
[PATCH] D51336: [HIP] Fix output file extension
This revision was automatically updated to reflect the committed changes.
Closed by commit rC340873: [HIP] Fix output file extension (authored by yaxunl, committed by ).

Repository:
  rC Clang

https://reviews.llvm.org/D51336

Files:
  lib/Driver/Action.cpp
  test/Driver/hip-output-file-name.hip

Index: test/Driver/hip-output-file-name.hip
===================================================================
--- test/Driver/hip-output-file-name.hip
+++ test/Driver/hip-output-file-name.hip
@@ -0,0 +1,9 @@
+// REQUIRES: clang-driver
+// REQUIRES: x86-registered-target
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang -### -c -target x86_64-linux-gnu \
+// RUN:   -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck %s
+
+// CHECK: {{.*}}clang-offload-bundler{{.*}}"-outputs=hip-output-file-name.o"
Index: lib/Driver/Action.cpp
===================================================================
--- lib/Driver/Action.cpp
+++ lib/Driver/Action.cpp
@@ -382,7 +382,7 @@
 void OffloadBundlingJobAction::anchor() {}

 OffloadBundlingJobAction::OffloadBundlingJobAction(ActionList &Inputs)
-    : JobAction(OffloadBundlingJobClass, Inputs, Inputs.front()->getType()) {}
+    : JobAction(OffloadBundlingJobClass, Inputs, Inputs.back()->getType()) {}

 void OffloadUnbundlingJobAction::anchor() {}
