mshockwave removed a reviewer: bader.
mshockwave updated this revision to Diff 73254.
mshockwave added a comment.
@Anastasia
Sorry for late responding, I'd just attach a new version of patch that fixes
block function use cases as normal lambda functions. But testing code is not
included in this revision, we're still working on it.
About the questions you asked in the previous comment, I'm going to explain
them from another aspect: How will one implement __enqueue_kernel_XXX? It might
be classified into two categories:
1. Library implementation like libclc/libclcxx written in OpenCL-C
2. Implement builtins directly in compiler.
If we choose the first one, which most of people would do regarding its
simplicity and flexibility, and we want to fetch captured variables inside the
implementation of __enqueue_kernel_XXX, the possible approach would be:
void* block_as_voidptr = (void*)arg_child_kernel;
block_literal_ty *block = (block_literal_ty*)block_as_voidptr;
block->capA;
block->capB;
This seems promise, but what exactly `block_literal_ty` looks like? We all know
that `block_literal_ty` would look similar to:
typedef struct {
/*
* Fields of block header.
* e.g. isa, block_descriptor...
*/
int capA;
int capB;
...
} block_literal_ty;
But since we're discussing a static type language, the definition of this
struct must be known. However, the EXACT appearence of `block_literal_ty` would
vary among programs, or even functions. That's the thing cap_copy_helper want
to aid.
Of course there is another library approach: Keep the child kernel's
invoke_function prototype untouched, pass block_literal variable(in void
pointer type) as its first function argument. Since instructions for extracting
captured variables had been generated during the codegen of invoke_function
body. Also, we don't need to tackle any captured variables inside
__enqueue_kernel_XXX.
However, the OpenCL spec says that global address space is the only address
space shared between parent and child kernel; and the block_literal variable
itself, is allocated as private(stack) variable in parent kernel. So we need to
copy the block_literal variable(not its pointer) into some global space.
Nevertheless, OpenCL doesn't allow dynamic-sized memory in global space, so we
need to define a block of static size memory, perhaps array, in our library
implementation. Here is the place might require global memory management since
static size implies potential risk of running out pre-allocated space.
Regarding the improvement proposed by us which "flatten" captured variables
into invoke_function argument list and block_literal pointer wouldn't be passed
as first argument(to invoke_function) anymore. The reason why it doesn't
require global memory management is that we can retrieve captured variables
with cap_num field and cap_copy_helper routine INSIDE __enqueue_kernel_XXX and
passed those captures as arguments to child kernel, rather than saving
block_literal variable globally and postpone the retrieving actions until
invoke_function, the child kernel body.
https://reviews.llvm.org/D24715
Files:
lib/CodeGen/CGBlocks.cpp
lib/CodeGen/CGBlocks.h
lib/CodeGen/CGExpr.cpp
lib/CodeGen/CodeGenFunction.cpp
lib/CodeGen/CodeGenFunction.h
lib/CodeGen/CodeGenModule.h
Index: lib/CodeGen/CodeGenModule.h
===
--- lib/CodeGen/CodeGenModule.h
+++ lib/CodeGen/CodeGenModule.h
@@ -464,6 +464,7 @@
llvm::Type *BlockDescriptorType = nullptr;
llvm::Type *GenericBlockLiteralType = nullptr;
+ llvm::Type *GenericOCLBlockLiteralType = nullptr;
struct {
int GlobalUniqueCount;
@@ -768,6 +769,8 @@
/// The type of a generic block literal.
llvm::Type *getGenericBlockLiteralType();
+ llvm::Type *getGenericOCLBlockLiteralType();
+
/// Gets the address of a block which requires no captures.
llvm::Constant *GetAddrOfGlobalBlock(const BlockExpr *BE, const char *);
Index: lib/CodeGen/CodeGenFunction.h
===
--- lib/CodeGen/CodeGenFunction.h
+++ lib/CodeGen/CodeGenFunction.h
@@ -288,6 +288,7 @@
const CodeGen::CGBlockInfo *BlockInfo;
llvm::Value *BlockPointer;
+ bool IsOCLChildKernelInvoke;
llvm::DenseMap LambdaCaptureFields;
FieldDecl *LambdaThisCaptureField;
@@ -1341,8 +1342,12 @@
const DeclMapTy &ldm,
bool IsLambdaConversionToBlock);
+ llvm::Constant* GenerateBlockFunctionWrapper(const CGBlockInfo &blockInfo,
+ const llvm::Function *invokeFunc);
+
llvm::Constant *GenerateCopyHelperFunction(const CGBlockInfo &blockInfo);
llvm::Constant *GenerateDestroyHelperFunction(const CGBlockInfo &blockInfo);
+ llvm::Constant *GenerateOCLCapturesCopyFunction(const CGBlockInfo &blockInfo);
llvm::Constant