Anastasia created this revision. Anastasia added reviewers: bader, yaxunl. Anastasia added subscribers: pekka.jaaskelainen, pxli168, cfe-commits.
This is an implementation of device-side enqueue (DSE): enqueue_kernel and the related built-in functions (BIFs) from OpenCL v2.0 s6.13.17.

This change includes:

1. Adding enqueue_kernel, get_kernel_work_group_size and get_kernel_preferred_work_group_size_multiple as Clang builtins with a custom check.

Example:

  enqueue_kernel(.../*omitted params*/, block, /*optional sizes of passed block args if any*/)

This allows diagnosing the parameters of the passed block variable (the spec mandates them to be of type 'local void*'), and we can check the different overloads too (Table 6.31).

2. IR generation with an internal library call for each new builtin used in the CL code, reusing ObjC block generation.

For the following example of CL code:

  kernel void device_side_enqueue(…) {
    … /*declare default_queue, flags, ndrange, a, b here*/
    enqueue_kernel(default_queue, flags, ndrange, ^(void) { a + b; });
  }

The generated IR could be:

  ; from ObjC block CodeGen (the second field contains the size of the block literal record)
  @__block_descriptor_tmp = internal constant { i64, i64, i8*, i8* } { i64 0, i64 52, i8* getelementptr inbounds ([35 x i8]* @.str, i32 0, i32 0), i8* null }
  ...
  define void @device_side_enqueue() {
    ...
    ; from ObjC block CodeGen (block literal record with a capture)
    %block = alloca <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, i32, i32}>
    ; from ObjC block CodeGen - store block descriptor and block captures below
    ...
    ; from ObjC block CodeGen (set pointer to block definition code)
    %block.invoke = getelementptr inbounds <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, i32, i32}>* %block, i64 0, i32 3
    store i8* bitcast (void (i8*)* @__device_side_enqueue_block_invoke to i8*), i8** %block.invoke
    ; potential impl of OpenCL CodeGen (cast from block literal record ptr to void ptr)
    %1 = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, i32, i32}>* %block to i8*
    ; potential impl of OpenCL CodeGen (this function will have additional integer params at the end if the block has any parameters to be passed to)
    ...
    call i32 @__enqueue_kernel_impl(..., i8* %1)
    ...
  }

  define internal void @__device_side_enqueue_block_invoke(i8* nocapture readonly %.block_descriptor) {
    ; from ObjC block CodeGen (this can have more params of local void* type)
    ; from ObjC block CodeGen - load captures below
    …
    ; from ObjC block CodeGen - original body of block
    …
  }

Note that there are different versions of __enqueue_kernel_impl, each with a unique name. These functions will have to be implemented as part of an OpenCL runtime library. Each receives a block literal data structure (allocated locally, as in this example, if a capture is present, or as a global variable otherwise), the size of each block parameter (one per entry in the 'local void*' list), and the other arguments omitted at the beginning, which are mainly opaque objects. It then performs the steps necessary to enqueue the work specified by the block.

The block literal record itself contains all the important bits needed for a basic implementation of DSE: a pointer to the block function definition, the captured fields, and the size of the block literal record. We can also discuss and implement some optimisations later on or as a part of this work.
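To make the record layout above more concrete, here is a minimal host-side C++ sketch of how a runtime might recover the invoke pointer and record size from the opaque i8* it receives. It is only an illustration under assumptions: the struct and function names (BlockDescriptor, BlockHeader, ExampleBlock, inspect_block, example_invoke, make_example_block) are hypothetical, the field names follow the Clang blocks ABI as I understand it, and the sizes are whatever the host compiler produces rather than the values in the IR example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the descriptor { i64, i64, i8*, i8* } from the IR.
// Field names are assumptions; the second field is the record size.
struct BlockDescriptor {
  std::uint64_t reserved;
  std::uint64_t size;      // size of the whole block literal record, in bytes
  const char *signature;
  const void *layout;
};

// Hypothetical mirror of the fixed part of the block literal record:
// <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, ...captures }>.
struct BlockHeader {
  void *isa;
  std::int32_t flags;
  std::int32_t reserved;
  void (*invoke)(void *);  // pointer to the block function definition
  const BlockDescriptor *descriptor;
};

// Example record with two captured ints, like the `a + b` block.
struct ExampleBlock {
  BlockHeader header;
  std::int32_t a;
  std::int32_t b;
};

// What a runtime can learn from the opaque pointer alone.
struct BlockInfo {
  void (*invoke)(void *);
  std::uint64_t record_size;
  std::uint64_t capture_bytes;
};

static BlockInfo inspect_block(const void *block) {
  BlockHeader header;
  std::memcpy(&header, block, sizeof(header));  // read the fixed header only
  assert(header.descriptor != nullptr);
  const std::uint64_t size = header.descriptor->size;
  return BlockInfo{header.invoke, size, size - sizeof(BlockHeader)};
}

// Stand-in for @__device_side_enqueue_block_invoke; it stores a + b in a
// global so that the effect is observable on the host.
static std::int32_t g_last_sum = 0;

static void example_invoke(void *block) {
  const ExampleBlock *self = static_cast<const ExampleBlock *>(block);
  g_last_sum = self->a + self->b;
}

static ExampleBlock make_example_block(std::int32_t a, std::int32_t b) {
  // The empty signature string is a placeholder, not a real type encoding.
  static const BlockDescriptor descriptor{0, sizeof(ExampleBlock), "", nullptr};
  return ExampleBlock{{nullptr, 0, 0, &example_invoke, &descriptor}, a, b};
}
```

Calling inspect_block on the address of a record returned by make_example_block yields the invoke pointer and the number of capture bytes without knowing the concrete record type, which is the property the runtime functions would rely on.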
The implementation of __enqueue_kernel_impl will have to take care of:

(1) initiating execution of the block invoke code pointed to by the block literal record (%block.invoke in the example above),
(2) copying captured variables to an accessible memory location,
(3) performing some sort of memory management to allocate space for the 'local void*' parameters passed to the block, if any.

Additional changes not included in this change:

1. Modifications of ObjC blocks IR generation. A block literal record currently contains a number of fields that are not needed for OpenCL, i.e. isa, flags, and the copy and dispose helpers. They can be removed when compiling in OpenCL mode. We might potentially add extra fields to enable more efficient support of DSE or to facilitate compiler optimisations. Ideas are welcome! I expect some places might require taking care of address spaces too.

2. Changes to existing OpenCL types may be needed. At the very least, it seems we might need to handle the ndrange_t type differently than we do currently. It is an opaque type now, but it needs to be allocated on the stack because a local variable of that type can be declared in CL code.

http://reviews.llvm.org/D20249

Files:
  include/clang/Basic/Builtins.def
  include/clang/Basic/Builtins.h
  include/clang/Basic/DiagnosticSemaKinds.td
  lib/Basic/Builtins.cpp
  lib/CodeGen/CGBuiltin.cpp
  lib/Sema/SemaChecking.cpp
  test/CodeGenOpenCL/cl20-device-side-enqueue.cl
  test/SemaOpenCL/cl20-device-side-enqueue.cl
  test/SemaOpenCL/clang-builtin-version.cl
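As a rough illustration of the three responsibilities of __enqueue_kernel_impl listed in the summary, the following host-side C++ sketch simulates them in order. Everything in it is an assumption made for illustration: MiniBlock is a hypothetical trimmed record (only an invoke pointer and a size, in the spirit of dropping the fields OpenCL does not need), the 'local void*' arguments are passed as an array of pointers rather than as separate parameters, and enqueue_kernel_sim runs the block immediately instead of scheduling it on a device queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical trimmed block record: invoke pointer plus total record size.
struct MiniBlock {
  void (*invoke)(void *block, void **local_args);
  std::size_t record_size;  // header and captures together, in bytes
};

// Example block capturing two ints and a pointer for the result.
struct SumBlock {
  MiniBlock header;
  std::int32_t a;
  std::int32_t b;
  std::int32_t *out;
};

// Block body: stages a + b in the first "local" buffer, then publishes it.
static void sum_invoke(void *block, void **local_args) {
  SumBlock self;
  std::memcpy(&self, block, sizeof(self));  // load captures
  const std::int32_t sum = self.a + self.b;
  std::memcpy(local_args[0], &sum, sizeof(sum));
  std::memcpy(self.out, local_args[0], sizeof(sum));
}

// Host-side simulation of a runtime entry point. Returns 0 on success and
// -1 if the record looks malformed.
static int enqueue_kernel_sim(const void *block,
                              const std::vector<std::uint32_t> &local_sizes) {
  MiniBlock header;
  std::memcpy(&header, block, sizeof(header));
  if (header.invoke == nullptr || header.record_size < sizeof(MiniBlock))
    return -1;

  // (2) Copy the record, captures included, to memory the child can access.
  std::vector<unsigned char> record(header.record_size);
  std::memcpy(record.data(), block, header.record_size);

  // (3) Allocate one buffer per 'local void*' parameter of the block.
  std::vector<std::vector<unsigned char>> locals;
  locals.reserve(local_sizes.size());
  for (std::uint32_t size : local_sizes)
    locals.emplace_back(size);
  std::vector<void *> local_args;
  for (std::vector<unsigned char> &buffer : locals)
    local_args.push_back(buffer.data());

  // (1) Start execution of the block invoke code.
  header.invoke(record.data(), local_args.data());
  return 0;
}
```

The block runs on the copied record rather than on the caller's stack object, which is the reason step (2) exists: the enqueuing work-item's private memory may no longer be valid when the child kernel actually executes.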
Index: test/SemaOpenCL/clang-builtin-version.cl =================================================================== --- /dev/null +++ test/SemaOpenCL/clang-builtin-version.cl @@ -0,0 +1,44 @@ +// RUN: %clang_cc1 %s -fblocks -verify -pedantic -fsyntax-only -ferror-limit 100 + +// Confirm CL2.0 Clang builtins are not available in earlier versions + +kernel void dse_builtins() { + int tmp; + enqueue_kernel(tmp, tmp, tmp, ^(void) { + return; + }); // expected-warning{{implicit declaration of function 'enqueue_kernel' is invalid in C99}} + unsigned size = get_kernel_work_group_size(^(void) { + return; + }); // expected-warning{{implicit declaration of function 'get_kernel_work_group_size' is invalid in C99}} + size = get_kernel_preferred_work_group_size_multiple(^(void) { + return; + }); // expected-warning{{implicit declaration of function 'get_kernel_preferred_work_group_size_multiple' is invalid in C99}} +} + +void pipe_builtins() { + int tmp; + + read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'read_pipe' is invalid in C99}} + write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'write_pipe' is invalid in C99}} + + reserve_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'reserve_read_pipe' is invalid in C99}} + reserve_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'reserve_write_pipe' is invalid in C99}} + + work_group_reserve_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'work_group_reserve_read_pipe' is invalid in C99}} + work_group_reserve_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'work_group_reserve_write_pipe' is invalid in C99}} + + sub_group_reserve_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'sub_group_reserve_write_pipe' is invalid in C99}} + sub_group_reserve_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 
'sub_group_reserve_read_pipe' is invalid in C99}} + + commit_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'commit_read_pipe' is invalid in C99}} + commit_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'commit_write_pipe' is invalid in C99}} + + work_group_commit_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'work_group_commit_read_pipe' is invalid in C99}} + work_group_commit_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'work_group_commit_write_pipe' is invalid in C99}} + + sub_group_commit_write_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'sub_group_commit_write_pipe' is invalid in C99}} + sub_group_commit_read_pipe(tmp, tmp); // expected-warning{{implicit declaration of function 'sub_group_commit_read_pipe' is invalid in C99}} + + get_pipe_num_packets(tmp); // expected-warning{{implicit declaration of function 'get_pipe_num_packets' is invalid in C99}} + get_pipe_max_packets(tmp); // expected-warning{{implicit declaration of function 'get_pipe_max_packets' is invalid in C99}} +} Index: test/SemaOpenCL/cl20-device-side-enqueue.cl =================================================================== --- /dev/null +++ test/SemaOpenCL/cl20-device-side-enqueue.cl @@ -0,0 +1,142 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -fblocks -verify -pedantic -fsyntax-only + +// Diagnostic tests for different overloads of enqueue_kernel from Table 6.13.17.1 of OpenCL 2.0 Spec. 
+kernel void enqueue_kernel_tests() { + queue_t default_queue; + unsigned flags = 0; + ndrange_t ndrange; + clk_event_t event_wait_list; + clk_event_t evt; + void *vptr; + + // Testing the first overload type + enqueue_kernel(default_queue, flags, ndrange, ^(void) { + return 0; + }); + + enqueue_kernel(vptr, flags, ndrange, ^(void) { + return 0; + }); // expected-error{{illegal call to enqueue_kernel, expected 'queue_t' argument type}} + + enqueue_kernel(default_queue, vptr, ndrange, ^(void) { + return 0; + }); // expected-error{{illegal call to enqueue_kernel, expected 'kernel_enqueue_flags_t' (i.e. uint) argument type}} + + enqueue_kernel(default_queue, flags, vptr, ^(void) { + return 0; + }); // expected-error{{illegal call to enqueue_kernel, expected 'ndrange_t' argument type}} + + enqueue_kernel(default_queue, flags, ndrange, vptr); // expected-error{{illegal call to enqueue_kernel, expected block argument}} + + // Testing the second overload type + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, &evt, ^(void) { + return 0; + }); + + enqueue_kernel(default_queue, flags, ndrange, 1, vptr, &evt, ^(void) // expected-error{{illegal call to enqueue_kernel, expected 'clk_event_t *' argument type}} + { + return 0; + }); + + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, vptr, ^(void) // expected-error{{illegal call to enqueue_kernel, expected 'clk_event_t *' argument type}} + { + return 0; + }); + + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, &evt, vptr); // expected-error{{illegal call to enqueue_kernel, expected block argument}} + + // Testing the third overload type + enqueue_kernel(default_queue, flags, ndrange, + ^(local void *a, local void *b) { + return 0; + }, + 1024, 1024); + + enqueue_kernel(default_queue, flags, ndrange, + ^(local void *a, local int *b) { + return 0; + }, // expected-error{{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + 1024, + 
1024); + + enqueue_kernel(default_queue, flags, ndrange, // expected-error{{mismatch in number of block parameters and local size arguments passed}} + ^(local void *a, local void *b) { + return 0; + }, + 1024); + + float illegal_mem_size = (float)0.5f; + enqueue_kernel(default_queue, flags, ndrange, + ^(local void *a, local void *b) { + return 0; + }, + illegal_mem_size, illegal_mem_size); // expected-error{{local memory sizes need to be specified as uint}} + + // Testing the fourth overload type + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, &evt, + ^(local void *a, local void *b) { + return 0; + }, + 1024, 1024); + + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, &evt, // expected-error{{mismatch in number of block parameters and local size arguments passed}} + ^(local void *a, local void *b) { + return 0; + }, + 1024, 1024, 1024); + + // More random misc cases that can't be deduced + enqueue_kernel(default_queue, flags, ndrange, 1, &event_wait_list, &evt); // expected-error{{illegal call to enqueue_kernel, incorrect argument types}} + + enqueue_kernel(default_queue, flags, ndrange, 1, 1); // expected-error{{illegal call to enqueue_kernel, incorrect argument types}} +} + +// Diagnostic tests for get_kernel_work_group_size and allowed block parameter types in dynamic parallelism. 
+kernel void work_group_size_tests() { + void (^const blockA)(void) = ^{ + return; + }; + void (^const blockB)(int) = ^(int a) { + return; + }; + void (^const blockC)(local void *) = ^(local void *a) { + return; + }; + void (^const blockD)(local int *) = ^(local int *a) { + return; + }; + + unsigned size = get_kernel_work_group_size(blockA); + size = get_kernel_work_group_size(blockC); + size = get_kernel_work_group_size(^(local void *a) { + return; + }); + size = get_kernel_work_group_size(^(local int *a) { + return; + }); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_work_group_size(blockB); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_work_group_size(blockD); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_work_group_size(^(int a) { + return; + }); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_work_group_size(); // expected-error {{too few arguments to function call, expected 1, have 0}} + size = get_kernel_work_group_size(1); // expected-error{{expected block argument}} + size = get_kernel_work_group_size(blockA, 1); // expected-error{{too many arguments to function call, expected 1, have 2}} + + size = get_kernel_preferred_work_group_size_multiple(blockA); + size = get_kernel_preferred_work_group_size_multiple(blockC); + size = get_kernel_preferred_work_group_size_multiple(^(local void *a) { + return; + }); + size = get_kernel_preferred_work_group_size_multiple(^(local int *a) { + return; + }); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_preferred_work_group_size_multiple(^(int a) { + return; + }); // expected-error {{blocks used in device 
side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_preferred_work_group_size_multiple(blockB); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_preferred_work_group_size_multiple(blockD); // expected-error {{blocks used in device side enqueue are expected to have parameters of type 'local void*'}} + size = get_kernel_preferred_work_group_size_multiple(); // expected-error {{too few arguments to function call, expected 1, have 0}} + size = get_kernel_preferred_work_group_size_multiple(1); // expected-error{{expected block argument}} + size = get_kernel_preferred_work_group_size_multiple(blockA, 1); // expected-error{{too many arguments to function call, expected 1, have 2}} +} Index: test/CodeGenOpenCL/cl20-device-side-enqueue.cl =================================================================== --- /dev/null +++ test/CodeGenOpenCL/cl20-device-side-enqueue.cl @@ -0,0 +1,77 @@ +// RUN: %clang_cc1 %s -cl-std=CL2.0 -fblocks -ffake-address-space-map -O0 -emit-llvm -o - | FileCheck %s + +kernel void device_side_enqueue(global int *a, global int *b, int i) { + // CHECK: %default_queue = alloca %opencl.queue_t* + queue_t default_queue; + // CHECK: %flags = alloca i32 + unsigned flags = 0; + // CHECK: %ndrange = alloca %opencl.ndrange_t* + ndrange_t ndrange; + // CHECK: %event_wait_list = alloca %opencl.clk_event_t* + clk_event_t event_wait_list; + // CHECK: %clk_event = alloca %opencl.clk_event_t* + clk_event_t clk_event; + + // TODO: we shouldn't cast queue to i8* + // CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue + // CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags + // CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t*, %opencl.ndrange_t** %ndrange + // CHECK: [[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block to void ()* + // CHECK: 
[[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8* + // CHECK: call i32 @__enqueue_kernel_basic(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i8* [[BL_I8]]) + enqueue_kernel(default_queue, flags, ndrange, ^(void) { + a[i] = b[i]; + }); + + // CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue + // CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags + // CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t*, %opencl.ndrange_t** %ndrange + // CHECK: [[BL:%[0-9]+]] = bitcast <{ i8*, i32, i32, i8*, %struct.__block_descriptor*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block3 to void ()* + // CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8* + // CHECK: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i32 2, %opencl.clk_event_t** %event_wait_list, %opencl.clk_event_t** %clk_event, i8* [[BL_I8]]) + enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event, ^(void) { + a[i] = b[i]; + }); + + // CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue + // CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags + // CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t*, %opencl.ndrange_t** %ndrange + // CHECK: call i32 (%opencl.queue_t*, i32, %opencl.ndrange_t*, i8*, i32, ...) 
@__enqueue_kernel_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i8* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256) + enqueue_kernel(default_queue, flags, ndrange, ^(local void *p) { + return; + }, + 256); + + // CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue + // CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags + // CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t*, %opencl.ndrange_t** %ndrange + // CHECK: call i32 (%opencl.queue_t*, i32, %opencl.ndrange_t*, i32, %opencl.clk_event_t**, %opencl.clk_event_t**, i8*, i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i32 2, %opencl.clk_event_t** %event_wait_list, %opencl.clk_event_t** %clk_event, i8* bitcast ({ i8**, i32, i32, i8*, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256) + enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event, ^(local void *p) { + return; + }, + 256); + + void (^const blockA)(void) = ^{ + return; + }; + void (^const blockB)(local void *) = ^(local void *a) { + return; + }; + + // CHECK: [[BL:%[0-9]+]] = load void ()*, void ()** %blockA + // CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8* + // CHECK: call i32 @__get_kernel_work_group_size_impl(i8* [[BL_I8]]) + unsigned size = get_kernel_work_group_size(blockA); + // CHECK: [[BL:%[0-9]+]] = load void (i8 addrspace(2)*)*, void (i8 addrspace(2)*)** %blockB + // CHECK: [[BL_I8:%[0-9]+]] = bitcast void (i8 addrspace(2)*)* [[BL]] to i8* + // CHECK: call i32 @__get_kernel_work_group_size_impl(i8* [[BL_I8]]) + size = get_kernel_work_group_size(blockB); + // CHECK: [[BL:%[0-9]+]] = load void ()*, void ()** %blockA + // CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8* + // CHECK: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8* [[BL_I8]]) + size = 
get_kernel_preferred_work_group_size_multiple(blockA); + // CHECK: [[BL:%[0-9]+]] = load void (i8 addrspace(2)*)*, void (i8 addrspace(2)*)** %blockB + // CHECK: [[BL_I8:%[0-9]+]] = bitcast void (i8 addrspace(2)*)* [[BL]] to i8* + // CHECK: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8* [[BL_I8]]) + size = get_kernel_preferred_work_group_size_multiple(blockB); +} Index: lib/Sema/SemaChecking.cpp =================================================================== --- lib/Sema/SemaChecking.cpp +++ lib/Sema/SemaChecking.cpp @@ -71,6 +71,234 @@ << call->getArg(1)->getSourceRange(); } +static inline bool isBlockPointer(Expr *Arg) { + return Arg->getType().getTypePtr()->isBlockPointerType(); +} + +/// OpenCL C v2.0, s6.13.17.2 - Checks that the block parameters are all local +/// void*, +/// which is a requirement of device side enqueue. +static bool checkBlockArgs(Sema &S, Expr *BlockArg) { + const BlockPointerType *BPT = cast<BlockPointerType>(BlockArg->getType()); + ArrayRef<QualType> Params = + BPT->getPointeeType()->getAs<FunctionProtoType>()->getParamTypes(); + bool IllegalParams = false; + unsigned ArgCounter = 0; + + // Iterate through the block parameters until either one is found that is not + // a local void*, or the block is valid. + for (ArrayRef<QualType>::iterator I = Params.begin(), E = Params.end(); + I != E; ++I, ++ArgCounter) { + const Type *ParamType = I->getTypePtr(); + if (!ParamType->isPointerType()) { + IllegalParams = true; + break; + } + QualType PointeeType = cast<PointerType>(ParamType)->getPointeeType(); + if (!PointeeType.getTypePtr()->isVoidType()) { + IllegalParams = true; + break; + } + Qualifiers ArgQual = PointeeType.getQualifiers(); + if (ArgQual.getAddressSpace() != LangAS::opencl_local) { + IllegalParams = true; + break; + } + } + if (IllegalParams) { + // Get the location of the error. 
If a block literal has been passed + // (BlockExpr) then we can point straight to the offending argument, + // else we just point to the variable reference. + SourceLocation ErrorLoc; + if (isa<BlockExpr>(BlockArg)) { + BlockDecl *BD = cast<BlockExpr>(BlockArg)->getBlockDecl(); + ErrorLoc = BD->getParamDecl(ArgCounter)->getLocStart(); + } else if (isa<DeclRefExpr>(BlockArg)) { + ErrorLoc = cast<DeclRefExpr>(BlockArg)->getLocStart(); + } + S.Diag(ErrorLoc, diag::err_opencl_dse_blocks_non_local_void_args); + } + + return IllegalParams; +} + +/// OpenCL C v2.0, s6.13.17.6 - Check the argument to the +/// get_kernel_work_group_size +/// and get_kernel_preferred_work_group_size_multiple builtin functions. +static bool SemaOpenCLBuiltinKernelWorkGroupSize(Sema &S, CallExpr *TheCall) { + if (checkArgCount(S, TheCall, 1)) + return true; + + Expr *BlockArg = TheCall->getArg(0); + if (!isBlockPointer(BlockArg)) { + S.Diag(BlockArg->getLocStart(), diag::err_opencl_expected_type) << "block"; + return true; + } else + return checkBlockArgs(S, BlockArg); +} + +static bool checkForLocalSizeArgs(Sema &S, CallExpr *TheCall, unsigned Start, + unsigned End) { + for (unsigned i = Start; i < End; ++i) + if (!TheCall->getArg(i)->getType().getTypePtr()->isIntegerType()) + return false; + return true; +} + +/// OpenCL v2.0, s6.13.17.1 - Check that sizes are provided for all 'local +/// void*' +/// parameter of passed block. +static bool checkEnqueueVariadicArgs(Sema &S, CallExpr *TheCall, Expr *BlockArg, + unsigned NumNonVarArgs) { + const BlockPointerType *BPT = cast<BlockPointerType>(BlockArg->getType()); + unsigned NumBlockParams = + BPT->getPointeeType()->getAs<FunctionProtoType>()->getNumParams(); + unsigned TotalNumArgs = TheCall->getNumArgs(); + + // For each argument passed to the block, a corresponding uint needs to + // be passed to describe the size of the local memory. 
+ if (TotalNumArgs != NumBlockParams + NumNonVarArgs) { + S.Diag(TheCall->getLocStart(), + diag::err_opencl_enqueue_kernel_local_size_args); + return true; + } + + // Check that the sizes of the local memory are specified by integers. + if (!checkForLocalSizeArgs(S, TheCall, NumNonVarArgs, TotalNumArgs - 1)) { + S.Diag(TheCall->getArg(NumNonVarArgs)->getLocStart(), + diag::err_opencl_enqueue_invalid_local_size_type); + return true; + } + return false; +} + +/// OpenCL C v2.0, s6.13.17 - Enqueue kernel function contains four different +/// overload formats specified in Table 6.13.17.1. +/// int enqueue_kernel(queue_t queue, +/// kernel_enqueue_flags_t flags, +/// const ndrange_t ndrange, +/// void (^block)(void)) +/// int enqueue_kernel(queue_t queue, +/// kernel_enqueue_flags_t flags, +/// const ndrange_t ndrange, +/// uint num_events_in_wait_list, +/// clk_event_t *event_wait_list, +/// clk_event_t *event_ret, +/// void (^block)(void)) +/// int enqueue_kernel(queue_t queue, +/// kernel_enqueue_flags_t flags, +/// const ndrange_t ndrange, +/// void (^block)(local void*, ...), +/// uint size0, ...) +/// int enqueue_kernel(queue_t queue, +/// kernel_enqueue_flags_t flags, +/// const ndrange_t ndrange, +/// uint num_events_in_wait_list, +/// clk_event_t *event_wait_list, +/// clk_event_t *event_ret, +/// void (^block)(void*, ...), +/// uint size0, ...) +static bool SemaBuiltinOpenCLEnqueueKernel(Sema &S, CallExpr *TheCall) { + unsigned NumArgs = TheCall->getNumArgs(); + + if (NumArgs < 4) { + S.Diag(TheCall->getLocStart(), diag::err_typecheck_call_too_few_args); + return true; + } + + Expr *Arg0 = TheCall->getArg(0); + Expr *Arg1 = TheCall->getArg(1); + Expr *Arg2 = TheCall->getArg(2); + Expr *Arg3 = TheCall->getArg(3); + + // First argument always needs to be a queue_t type. 
+ if (!Arg0->getType()->isQueueT()) { + S.Diag(TheCall->getArg(0)->getLocStart(), diag::err_opencl_expected_type) + << S.Context.OCLQueueTy; + return true; + } + + // Second argument always needs to be a kernel_enqueue_flags_t enum value. + if (!Arg1->getType().getTypePtr()->isIntegerType()) { + S.Diag(TheCall->getArg(1)->getLocStart(), diag::err_opencl_expected_type) + << "'kernel_enqueue_flags_t' (i.e. uint)"; + return true; + } + + // Third argument is always an ndrange_t type. + if (!Arg2->getType()->isNDRangeT()) { + S.Diag(TheCall->getArg(2)->getLocStart(), diag::err_opencl_expected_type) + << S.Context.OCLNDRangeTy; + return true; + } + + // With four arguments, there is only one form that the function could be + // called in: no events and no variable arguments. + if (NumArgs == 4) { + // check that the last argument is the right block type. + if (!isBlockPointer(Arg3)) { + S.Diag(Arg3->getLocStart(), diag::err_opencl_expected_type) << "block"; + return true; + } else + return checkBlockArgs(S, Arg3); + } else if (NumArgs >= 5) { + // we can have block + varargs. + if (isBlockPointer(Arg3)) + return (checkBlockArgs(S, Arg3) || + checkEnqueueVariadicArgs(S, TheCall, Arg3, 4)); + // last two cases with either exactly 7 args or 7 args and varargs. + if (NumArgs >= 7) { + // check common block argument. + Expr *Arg6 = TheCall->getArg(6); + if (!isBlockPointer(Arg6)) { + S.Diag(Arg6->getLocStart(), diag::err_opencl_expected_type) << "block"; + return true; + } + if (checkBlockArgs(S, Arg6)) + return true; + + // Fourth argument has to be of an integer type. + if (!Arg3->getType().getTypePtr()->isIntegerType()) { + S.Diag(TheCall->getArg(3)->getLocStart(), + diag::err_opencl_expected_type) + << "integer"; + return true; + } + // check remaining common arguments. + Expr *Arg4 = TheCall->getArg(4); + Expr *Arg5 = TheCall->getArg(5); + + // Fifth argument is always passed as a pointer to clk_event_t. 
+ if (!(Arg4->getType().getTypePtr()->isPointerType() && + Arg4->getType().getTypePtr()->getPointeeType()->isClkEventT())) { + S.Diag(TheCall->getArg(4)->getLocStart(), + diag::err_opencl_expected_type) + << S.Context.getPointerType(S.Context.OCLClkEventTy); + return true; + } + + // Sixth argument is always passed as a pointer to clk_event_t. + if (!(Arg5->getType().getTypePtr()->isPointerType() && + Arg5->getType().getTypePtr()->getPointeeType()->isClkEventT())) { + S.Diag(TheCall->getArg(5)->getLocStart(), + diag::err_opencl_expected_type) + << S.Context.getPointerType(S.Context.OCLClkEventTy); + return true; + } + + if (NumArgs == 7) + return false; + else if (NumArgs >= 7) // check if varargs are correct types. + return checkEnqueueVariadicArgs(S, TheCall, Arg6, 7); + } + } + + // None of the specific cases has been detected; give a generic error. + S.Diag(TheCall->getLocStart(), + diag::err_opencl_enqueue_kernel_incorrect_args); + return true; +} + /// Check that the first argument to __builtin_annotation is an integer /// and the second argument is a non-wide string literal. static bool SemaBuiltinAnnotation(Sema &S, CallExpr *TheCall) { @@ -755,6 +983,7 @@ TheCall->setType(Context.VoidPtrTy); break; + // OpenCL v2.0 s6.13.16 - Pipe functions case Builtin::BIread_pipe: case Builtin::BIwrite_pipe: // Since those two functions are declared with var args, we need a semantic @@ -789,6 +1018,15 @@ if (SemaBuiltinPipePackets(*this, TheCall)) return ExprError(); break; + // OpenCL v2.0, s6.13.17 - Enqueue kernel functions. 
+ case Builtin::BIenqueue_kernel: + if (SemaBuiltinOpenCLEnqueueKernel(*this, TheCall)) + return ExprError(); + break; + case Builtin::BIget_kernel_work_group_size: + case Builtin::BIget_kernel_preferred_work_group_size_multiple: + if (SemaOpenCLBuiltinKernelWorkGroupSize(*this, TheCall)) + return ExprError(); } // Since the target specific builtins for each arch overlap, only check those Index: lib/CodeGen/CGBuiltin.cpp =================================================================== --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -2081,7 +2081,7 @@ return RValue::get( Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), {Arg0, Arg1})); } - // OpenCL v2.0 s6.13.16 ,s9.17.3.5 - Built-in pipe commit read and write + // OpenCL v2.0 s6.13.16, s9.17.3.5 - Built-in pipe commit read and write // functions case Builtin::BIcommit_read_pipe: case Builtin::BIcommit_write_pipe: @@ -2134,6 +2134,134 @@ Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), {Arg0})); } + // OpenCL v2.0, s6.13.17 - Enqueue kernel function. + // It contains four different overload formats specified in Table 6.13.17.1. 
+ case Builtin::BIenqueue_kernel: { + StringRef Name; // Generated function call name + unsigned NumArgs = E->getNumArgs(); + + llvm::Type *QueueTy = ConvertType(getContext().OCLQueueTy); + llvm::Type *RangeTy = ConvertType(getContext().OCLNDRangeTy); + + llvm::Value *Queue = EmitScalarExpr(E->getArg(0)); + llvm::Value *Flags = EmitScalarExpr(E->getArg(1)); + llvm::Value *Range = EmitScalarExpr(E->getArg(2)); + + if (NumArgs == 4) { + // The most basic form of the call with parameters: + // queue_t, kernel_enqueue_flags_t, ndrange_t, block(void) + Name = "__enqueue_kernel_basic"; + llvm::Type *ArgTys[] = {QueueTy, Int32Ty, RangeTy, Int8PtrTy}; + llvm::FunctionType *FTy = llvm::FunctionType::get( + Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys, 4), false); + + llvm::Value *Block = + Builder.CreateBitCast(EmitScalarExpr(E->getArg(3)), Int8PtrTy); + + return RValue::get(Builder.CreateCall( + CGM.CreateRuntimeFunction(FTy, Name), {Queue, Flags, Range, Block})); + } + // Could have events and/or vaargs. + if (NumArgs >= 5) { + if (E->getArg(3)->getType().getTypePtr()->isBlockPointerType()) { + // No events passed, but has variadic arguments. + Name = "__enqueue_kernel_vaargs"; + llvm::Value *Block = + Builder.CreateBitCast(EmitScalarExpr(E->getArg(3)), Int8PtrTy); + unsigned NumVaargs = NumArgs - 4; + // Create a vector of the arguments, as well as a constant value to + // express to the runtime the number of variadic arguments. + std::vector<llvm::Value *> Args{Queue, Flags, Range, Block, + ConstantInt::get(IntTy, NumVaargs)}; + std::vector<llvm::Type *> ArgTys = {QueueTy, IntTy, RangeTy, Int8PtrTy, + IntTy}; + + // Add the variadics. 
+ for (unsigned i = 4; i < NumArgs; ++i) { + Args.push_back(EmitScalarExpr(E->getArg(i))); + } + + llvm::FunctionType *FTy = llvm::FunctionType::get( + Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), true); + return RValue::get( + Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), + llvm::ArrayRef<llvm::Value *>(Args))); + } + // Any calls now have event arguments passed. + if (NumArgs >= 7) { + llvm::Type *EventTy = ConvertType(getContext().OCLClkEventTy); + unsigned AS4 = + E->getArg(4)->getType()->getPointeeType().getAddressSpace(); + llvm::Type *EventPtrAS4Ty = + EventTy->getPointerTo(CGM.getContext().getTargetAddressSpace(AS4)); + unsigned AS5 = + E->getArg(5)->getType()->getPointeeType().getAddressSpace(); + llvm::Type *EventPtrAS5Ty = + EventTy->getPointerTo(CGM.getContext().getTargetAddressSpace(AS5)); + + llvm::Value *NumEvents = EmitScalarExpr(E->getArg(3)); + llvm::Value *EventList = EmitScalarExpr(E->getArg(4)); + llvm::Value *ClkEvent = EmitScalarExpr(E->getArg(5)); + llvm::Value *Block = + Builder.CreateBitCast(EmitScalarExpr(E->getArg(6)), Int8PtrTy); + + std::vector<llvm::Type *> ArgTys = { + QueueTy, Int32Ty, RangeTy, Int32Ty, + EventPtrAS4Ty, EventPtrAS5Ty, Int8PtrTy}; + std::vector<llvm::Value *> Args{Queue, Flags, Range, NumEvents, + EventList, ClkEvent, Block}; + + if (NumArgs == 7) { + // Has events but no variadics. + Name = "__enqueue_kernel_basic_events"; + llvm::FunctionType *FTy = llvm::FunctionType::get( + Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false); + return RValue::get( + Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), + llvm::ArrayRef<llvm::Value *>(Args))); + } else { + // Has event info and variadics + unsigned NumVaargs = NumArgs - 7; + + // Pass the number of variadics to the runtime function too. + Args.push_back(ConstantInt::get(Int32Ty, NumVaargs)); + ArgTys.push_back(Int32Ty); + Name = "__enqueue_kernel_events_vaargs"; + + // Add the variadics. 
+          for (unsigned i = 7; i < NumArgs; ++i) {
+            Args.push_back(EmitScalarExpr(E->getArg(i)));
+          }
+          llvm::FunctionType *FTy = llvm::FunctionType::get(
+              Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), true);
+          return RValue::get(
+              Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
+                                 llvm::ArrayRef<llvm::Value *>(Args)));
+        }
+      }
+    }
+    llvm_unreachable("Unhandled enqueue_kernel signature");
+  }
+  // OpenCL v2.0 s6.13.17.6 - Kernel query functions need bitcast of block
+  // parameter.
+  case Builtin::BIget_kernel_work_group_size: {
+    Value *Arg = EmitScalarExpr(E->getArg(0));
+    Arg = Builder.CreateBitCast(Arg, Int8PtrTy);
+    return RValue::get(
+        Builder.CreateCall(CGM.CreateRuntimeFunction(
+                               llvm::FunctionType::get(IntTy, Int8PtrTy, false),
+                               "__get_kernel_work_group_size_impl"),
+                           Arg));
+  }
+  case Builtin::BIget_kernel_preferred_work_group_size_multiple: {
+    Value *Arg = EmitScalarExpr(E->getArg(0));
+    Arg = Builder.CreateBitCast(Arg, Int8PtrTy);
+    return RValue::get(Builder.CreateCall(
+        CGM.CreateRuntimeFunction(
+            llvm::FunctionType::get(IntTy, Int8PtrTy, false),
+            "__get_kernel_preferred_work_group_multiple_impl"),
+        Arg));
+  }
   case Builtin::BIprintf:
     if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice)
       return EmitCUDADevicePrintfCallExpr(E, ReturnValue);
Index: lib/Basic/Builtins.cpp
===================================================================
--- lib/Basic/Builtins.cpp
+++ lib/Basic/Builtins.cpp
@@ -69,7 +69,8 @@
   bool MSModeUnsupported =
       !LangOpts.MicrosoftExt && (BuiltinInfo.Langs & MS_LANG);
   bool ObjCUnsupported = !LangOpts.ObjC1 && BuiltinInfo.Langs == OBJC_LANG;
-  bool OclCUnsupported = !LangOpts.OpenCL && BuiltinInfo.Langs == OCLC_LANG;
+  bool OclCUnsupported =
+      LangOpts.OpenCLVersion != 200 && BuiltinInfo.Langs == OCLC20_LANG;
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported &&
          !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
Index: include/clang/Basic/DiagnosticSemaKinds.td
===================================================================
--- include/clang/Basic/DiagnosticSemaKinds.td
+++ include/clang/Basic/DiagnosticSemaKinds.td
@@ -7815,6 +7815,19 @@
 def err_opencl_extern_block_declaration : Error<
   "invalid block variable declaration - using 'extern' storage class is disallowed">;
 
+// OpenCL v2.0 s6.13.17 Enqueue kernel restrictions.
+def err_opencl_function_not_supported : Error<
+  "this function is not supported in this version of CL">;
+def err_opencl_enqueue_kernel_incorrect_args : Error<
+  "illegal call to enqueue_kernel, incorrect argument types">;
+def err_opencl_expected_type : Error<
+  "illegal call to enqueue_kernel, expected %0 argument type">;
+def err_opencl_enqueue_kernel_local_size_args : Error<
+  "mismatch in number of block parameters and local size arguments passed">;
+def err_opencl_enqueue_invalid_local_size_type : Error<
+  "local memory sizes need to be specified as uint">;
+def err_opencl_dse_blocks_non_local_void_args : Error<
+  "blocks used in device side enqueue are expected to have parameters of type 'local void*'">;
 } // end of sema category
 
 let CategoryName = "OpenMP Issue" in {
Index: include/clang/Basic/Builtins.h
===================================================================
--- include/clang/Basic/Builtins.h
+++ include/clang/Basic/Builtins.h
@@ -36,7 +36,7 @@
   CXX_LANG = 0x4,     // builtin for cplusplus only.
   OBJC_LANG = 0x8,    // builtin for objective-c and objective-c++
   MS_LANG = 0x10,     // builtin requires MS mode.
-  OCLC_LANG = 0x20,   // builtin for OpenCL C only.
+  OCLC20_LANG = 0x20, // builtin for OpenCL C 2.0 only.
   ALL_LANGUAGES = C_LANG | CXX_LANG | OBJC_LANG, // builtin for all languages.
   ALL_GNU_LANGUAGES = ALL_LANGUAGES | GNU_LANG,  // builtin requires GNU mode.
   ALL_MS_LANGUAGES = ALL_LANGUAGES | MS_LANG     // builtin requires MS mode.
Index: include/clang/Basic/Builtins.def
===================================================================
--- include/clang/Basic/Builtins.def
+++ include/clang/Basic/Builtins.def
@@ -1264,29 +1264,35 @@
 
 // OpenCL v2.0 s6.13.16, s9.17.3.5 - Pipe functions.
 // We need the generic prototype, since the packet type could be anything.
-LANGBUILTIN(read_pipe, "i.", "tn", OCLC_LANG)
-LANGBUILTIN(write_pipe, "i.", "tn", OCLC_LANG)
+LANGBUILTIN(read_pipe, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(write_pipe, "i.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(reserve_read_pipe, "i.", "tn", OCLC_LANG)
-LANGBUILTIN(reserve_write_pipe, "i.", "tn", OCLC_LANG)
+LANGBUILTIN(reserve_read_pipe, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(reserve_write_pipe, "i.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(commit_write_pipe, "v.", "tn", OCLC_LANG)
-LANGBUILTIN(commit_read_pipe, "v.", "tn", OCLC_LANG)
+LANGBUILTIN(commit_write_pipe, "v.", "tn", OCLC20_LANG)
+LANGBUILTIN(commit_read_pipe, "v.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(sub_group_reserve_read_pipe, "i.", "tn", OCLC_LANG)
-LANGBUILTIN(sub_group_reserve_write_pipe, "i.", "tn", OCLC_LANG)
+LANGBUILTIN(sub_group_reserve_read_pipe, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(sub_group_reserve_write_pipe, "i.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(sub_group_commit_read_pipe, "v.", "tn", OCLC_LANG)
-LANGBUILTIN(sub_group_commit_write_pipe, "v.", "tn", OCLC_LANG)
+LANGBUILTIN(sub_group_commit_read_pipe, "v.", "tn", OCLC20_LANG)
+LANGBUILTIN(sub_group_commit_write_pipe, "v.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(work_group_reserve_read_pipe, "i.", "tn", OCLC_LANG)
-LANGBUILTIN(work_group_reserve_write_pipe, "i.", "tn", OCLC_LANG)
+LANGBUILTIN(work_group_reserve_read_pipe, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(work_group_reserve_write_pipe, "i.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(work_group_commit_read_pipe, "v.", "tn", OCLC_LANG)
-LANGBUILTIN(work_group_commit_write_pipe, "v.", "tn", OCLC_LANG)
+LANGBUILTIN(work_group_commit_read_pipe, "v.", "tn", OCLC20_LANG)
+LANGBUILTIN(work_group_commit_write_pipe, "v.", "tn", OCLC20_LANG)
 
-LANGBUILTIN(get_pipe_num_packets, "Ui.", "tn", OCLC_LANG)
-LANGBUILTIN(get_pipe_max_packets, "Ui.", "tn", OCLC_LANG)
+LANGBUILTIN(get_pipe_num_packets, "Ui.", "tn", OCLC20_LANG)
+LANGBUILTIN(get_pipe_max_packets, "Ui.", "tn", OCLC20_LANG)
+
+// OpenCL v2.0 s6.13.17 - Enqueue kernel functions.
+// A custom builtin check allows special checks of the passed block arguments.
+LANGBUILTIN(enqueue_kernel, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(get_kernel_work_group_size, "i.", "tn", OCLC20_LANG)
+LANGBUILTIN(get_kernel_preferred_work_group_size_multiple, "i.", "tn", OCLC20_LANG)
 
 #undef BUILTIN
 #undef LIBBUILTIN
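For readers following the BIenqueue_kernel case in the CodeGen hunk, the choice of runtime function depends only on the argument count and on whether argument 3 is a block pointer. The stand-alone sketch below models that decision outside of Clang. The names EnqueueLowering and selectEnqueueLowering are hypothetical and not part of the patch; only the four runtime function names and the NumArgs arithmetic are taken from the diff. It is an illustration of the control flow as I read it, not the implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical model (not in the patch) of how the BIenqueue_kernel case
// picks a runtime library function.
struct EnqueueLowering {
  std::string RuntimeFn; // empty when the patch would hit llvm_unreachable
  unsigned NumVaargs;    // number of trailing local-size arguments
  bool IsVariadic;       // whether the emitted FunctionType is variadic
};

// NumArgs:     number of arguments in the enqueue_kernel call.
// Arg3IsBlock: true when argument index 3 has block pointer type.
EnqueueLowering selectEnqueueLowering(unsigned NumArgs, bool Arg3IsBlock) {
  // queue, flags, ndrange, block(void)
  if (NumArgs == 4)
    return EnqueueLowering{"__enqueue_kernel_basic", 0, false};
  if (NumArgs >= 5) {
    // No events: everything after the block is a local size.
    if (Arg3IsBlock)
      return EnqueueLowering{"__enqueue_kernel_vaargs", NumArgs - 4, true};
    // Events present: num_events, event list, return event, then the block.
    if (NumArgs == 7)
      return EnqueueLowering{"__enqueue_kernel_basic_events", 0, false};
    if (NumArgs > 7)
      return EnqueueLowering{"__enqueue_kernel_events_vaargs", NumArgs - 7,
                             true};
  }
  // Fewer than 4 arguments, or 5-6 arguments without a block at index 3.
  return EnqueueLowering{"", 0, false};
}

// A few spot checks of the mapping described above.
void checkDispatch() {
  assert(selectEnqueueLowering(4, true).RuntimeFn == "__enqueue_kernel_basic");
  assert(selectEnqueueLowering(6, true).NumVaargs == 2);
  assert(selectEnqueueLowering(7, false).RuntimeFn ==
         "__enqueue_kernel_basic_events");
  assert(selectEnqueueLowering(5, false).RuntimeFn.empty());
}
```

The last case in the sketch corresponds to calls that CodeGen does not handle, so the Sema check is presumably expected to reject them before they reach this point.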