asavonic created this revision. asavonic added reviewers: Anastasia, yaxunl, bader. Herald added subscribers: cfe-commits, jfb, mgorny.
TL;DR ----- This patch splits huge opencl-c.h header into multiple headers to support efficient use of Precompiled Headers (or Modules): - opencl-c-defs.h contains all preprocessor macros. Macros are not saved in PCHs. - opencl-c-common.h contains builtins which do not depend on any preprocessor macro (extensions, OpenCL C version). - opencl-c-fp16.h and opencl-c-fp64.h contain builtins which require either cl_khr_fp16 or cl_khr_fp64 macros, but they do not depend on other extensions and OpenCL C version. - opencl-c-platform.h (looking for a better name! 'target' maybe?) contains all other builtins which have more complicated requirements. Umbrella header opencl-c.h includes all headers above, so this change is backward compatible with the original design. Details ------- Typical OpenCL compiler implicitly includes opencl-c.h before compiling user code: #include "opencl-c.h" __kernel void k() { printf("hello world\n"); } With this approach, even for tiny programs compiler spends most of its time on parsing of opencl-c.h header (which has more than 16000 LOC), so it takes ~1s to compile our hello world example into LLVM IR. Obvious solution for this problem is to compile opencl-c.h into an AST only once, and then use this AST to compile user code. This feature can be implemented using Precompiled Headers or Clang Modules, but it has one major drawback: AST must be built with the same set of preprocessor macros and the same target triple as the the user code. Consider the following example: opencl-c.h: #if __OPENCL_C_VERSION__ >= CL_VERSION_2_0 float fract(float x, float *iptr); #else float fract(float x, __global float *iptr); #endif If we compile this opencl-c.h into an AST, only one of these two functions will be present. We cannot use this AST to compile for both OpenCL C 1.2 and 2.0. Another example: if we compile opencl-c.h into an AST with spir-unknown-unknown (32bit) target triple, AST will have 'int' instead of 'size_t': opencl-c.h: typedef __SIZE_TYPE__ size_t; // __SIZE_TYPE__ is defined to int by clang size_t get_global_size(uint dimindx); This makes the AST non-portable, and it cannot be used to compile for spir64 triple (where size_t is 64bit integer). If we want compiler to support CL1.2/CL2.0 for spir/spir64 triple, we'll need to use 4 different PCHs compiled with different flags: -cl-std=CL1.2 + -triple spir -cl-std=CL1.2 + -triple spir64 -cl-std=CL2.0 + -triple spir -cl-std=CL2.0 + -triple spir64 Things are getting worse if we want to support multiple devices, which have different sets of OpenCL extensions. For example, if we want to support both CPU (which supports cl_khr_fp64) and GPU (which supports cl_khr_fp16), we must compile opencl-c.h using different options, because it has the following code all over the place: #ifdef cl_khr_fp64 uchar __ovld __cnfn convert_uchar(double); #endif #ifdef cl_khr_fp16 uchar __ovld __cnfn convert_uchar(half); #endif So if we want to add these 2 devices, we now need 4*2 different PCHs, since every combination of -cl-std and -triple must be compiled twice with different OpenCL extensions defines. Size of each PCH is 2.5M, so we need ~20M of memory to store our PCHs. If we want to add more devices or support another OpenCL C version, the size will double. For a C/C++ compiler this is not a problem: clang maintains a cache so that only combinations which are actually used are saved to a disk. OpenCL compilers are different, because they are often distributed as a shared library. Being a library, it is not expected to write something on a disk, because it rises a number of questions, such as: "where do we have a writable temporary directory?" and "how much space we can consume for PCHs?". For some targets (say, embedded devices) it is better to avoid this approach. To solve this problem, the patch splits opencl-c.h header into multiple headers. First, we split out 'common' functions, which should be present regardless of OpenCL C version or extensions. All builtins which have 'core' types, such as convert, math and simple image builtins are moved into opencl-c-common.h (8K LOC). Second, split out all functions which are 'common' between OpenCL version, but require either cl_khr_fp16 or cl_khr_fp64 (not both at the same time!). Two headers (opencl-c-fp16.h and opencl-c-fp64.h) have 2K LOC each. Everything else goes into opencl-c-platform.h (5K LOC). All macros go in opencl-c-defs.h, because macros cannot be pre-compiled into AST. With this setup, we can compile opencl-c-common.h, opencl-c-fp16.h and opencl-c-fp64.h into PCHs with one set of extensions/OpenCL version, and use them for any other set of extensions/OpenCL version. Clang will detect this and throw out an error, which can be safely disabled by -fno-validate-pch option. opencl-c-platform.h (5K LOC) must still be pre-compiled for each supported combination, but since it is a lot smaller (~0.5M vs original 2.5M), it is not that bad. Or we can sacrifice some performance and leave this header without pre-compilation: large portion of opencl-c-platform.h contains vendor extensions, so it will be removed by preprocessor anyway. Repository: rC Clang https://reviews.llvm.org/D51544 Files: lib/Headers/CMakeLists.txt lib/Headers/opencl-c-common.h lib/Headers/opencl-c-defs.h lib/Headers/opencl-c-fp16.h lib/Headers/opencl-c-fp64.h lib/Headers/opencl-c-platform.h lib/Headers/opencl-c.h test/Headers/opencl-c-header-split.cl _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits