================
@@ -1,119 +1,542 @@
-=============================================
-SYCL Compiler and Runtime architecture design
-=============================================
+============
+SYCL Support
+============
.. contents::
:local:
Introduction
============
+The `SYCL 2020 specification <SYCL-2020_>`_ defines a single-source programming
+model and C++ run-time library interface for writing portable programs that
+support heterogeneous devices including GPUs, CPUs, and accelerators.
+The specification is intended to allow for a wide range of implementation
+possibilities, examples of which include:
+
+- A SYCL run-time library written in standard C++ that executes kernels on a
+ homogeneous set of host and device processors, each of which can execute
+ common compiled code from shared memory.
+- A SYCL run-time library that executes kernels on a heterogeneous set of
+ device processors for which each kernel is pre-compiled for each supported
+ device processor (Ahead-Of-Time (AOT) compilation) or for a family of device
+ processors (Just-In-Time (JIT) compilation).
+
+Since Clang is a conforming implementation of the C++ standard, no additional
+features are required for support of the first implementation strategy.
+This document details the core language features Clang provides for use by
+SYCL run-time libraries that use the second implementation strategy.
+
+.. _SYCL-2020:
+ https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
+
+
+Example Usage
+=============
+SYCL is designed as an extension of C++ rather than as a distinct programming
+language.
+SYCL support is enabled with the `-fsycl <opt-fsycl_>`_ option.
+
+.. code-block:: sh
+
+ clang++ -c -fsycl source-file.cpp
+
+The choice of which target devices will be supported is made at compile time.
+By default, SYCL source files will be compiled with support for a host target
+dependent set of target devices.
+For example, when compiling for a ``x86_64-unknown-linux-gnu`` host target,
+target support will be enabled for ``spirv64-unknown-unknown`` devices.
+The set of supported target devices can be specified via a comma separated list
+of target triples with the `--offload-targets= <opt-offload-targets_>`_ option.
+The following Clang invocation enables support for AMD, NVIDIA, and Intel GPU
+targets.
+
+.. code-block:: sh
+
+ clang++ -c -fsycl \
+
--offload-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,spirv64-unknown-unknown
\
+ source-file.cpp
+
+Object files built with the `-fsycl <opt-fsycl_>`_ option contain device
+images that require additional processing at link time.
+Programs linked with such object files must also be linked using the
+``clang++`` driver and the `-fsycl <opt-fsycl_>`_ option.
+
+.. code-block:: sh
+
+ clang++ -fsycl example.o source-file.o -o example
+
+.. _opt-fsycl:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fsycl
+.. _opt-offload-targets:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-targets
+
+
+Compilation Model
+=================
+`SYCL 2020 section 5.1, "Offline compilation of SYCL source files"
<SYCL-2020-5.1_>`_
+acknowledges two compilation models.
+
+- Single-source Multiple Compiler Pass (`SMCP`_) describes a compilation model
+ in which source code is separately parsed and analyzed for the host target
+ and each device target.
+
+- Single-source Single Compiler Pass (`SSCP`_) describes a compilation model
+ in which source code is parsed and analyzed once with code generation
+ performed separately for the host target and each device target.
+
+Clang only supports the `SMCP`_ compilation model currently, but the SYCL
+language support features have been designed to allow for support of the
+`SSCP`_ compilation model to be added in the future.
+
+By default, SYCL source files are compiled for the host target and for each
+device target.
+In some cases, it is useful to restrict compilation to just the host target or
+just the device targets; the `-fsycl-host-only <opt-fsycl-host-only_>`_ and
+`-fsycl-device-only <opt-fsycl-device-only_>`_ options are available for these
+purposes.
+
+.. _SMCP:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:smcp
+.. _SSCP:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:sscp
+.. _SYCL-2020-5.1:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_offline_compilation_of_sycl_source_files
+.. _opt-fsycl-host-only:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-host-only
+.. _opt-fsycl-device-only:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-device-only
+
+
+Supported Targets
+=================
+Support for SYCL is still in the implementation phase, but all targets
+supported by the `--offload-targets= <opt-offload-targets_>`_ option
+are intended to eventually be supported.
-This document describes the architecture of the SYCL compiler and runtime
-library. More details are provided in
-`external document
<https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CompilerAndRuntimeDesign.md>`_\
,
-which are going to be added to clang documentation in the future.
-
-Address space handling
-======================
-
-The SYCL specification represents pointers to disjoint memory regions using C++
-wrapper classes on an accelerator to enable compilation with a standard C++
-toolchain and a SYCL compiler toolchain. Section 3.8.2 of SYCL 2020
-specification defines
-`memory model
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_sycl_device_memory_model>`_\
,
-section 4.7.7 - `address space classes
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_classes>`_
-and section 5.9 covers `address space deduction
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_deduction>`_.
-The SYCL specification allows two modes of address space deduction: "generic as
-default address space" (see section 5.9.3) and "inferred address space" (see
-section 5.9.4). Current implementation supports only "generic as default
address
-space" mode.
-
-SYCL borrows its memory model from OpenCL however SYCL doesn't perform
-the address space qualifier inference as detailed in
-`OpenCL C v3.0 6.7.8
<https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-inference>`_.
-
-The default address space is "generic-memory", which is a virtual address space
-that overlaps the global, local, and private address spaces. SYCL mode enables
-following conversions:
-
-- explicit conversions to/from the default address space from/to the address
- space-attributed type
-- implicit conversions from the address space-attributed type to the default
- address space
-- explicit conversions to/from the global address space from/to the
- ``__attribute__((opencl_global_device))`` or
- ``__attribute__((opencl_global_host))`` address space-attributed type
-- implicit conversions from the ``__attribute__((opencl_global_device))`` or
- ``__attribute__((opencl_global_host))`` address space-attributed type to the
- global address space
-
-All named address spaces are disjoint and sub-sets of default address space.
-
-The SPIR target allocates SYCL namespace scope variables in the global address
-space.
-
-Pointers to default address space should get lowered into a pointer to a
generic
-address space (or flat to reuse more general terminology). But depending on the
-allocation context, the default address space of a non-pointer type is assigned
-to a specific address space. This is described in
-`common address space deduction rules
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:commonAddressSpace>`_
-section.
-
-This is also in line with the behaviour of CUDA (`small example
-<https://godbolt.org/z/veqTfo9PK>`_).
-
-``multi_ptr`` class implementation example:
+
+Predefined Macros
+=================
+`SYCL 2020 section 5.6, "Preprocessor directives and macros" <SYCL-2020-5.6_>`_
+specifies macros that a SYCL implementation is required to provide.
+Most such macros are defined by the SYCL run-time library and require inclusion
+of the ``<sycl/sycl.hpp>`` header file.
+The following macros are conditionally predefined by the compiler.
+
+.. list-table::
+ :header-rows: 1
+
+ * - Macro
+ - Description
+ * - ``__SYCL_DEVICE_ONLY__``
+ - Predefined by a `SMCP`_ implementation during device compilation (but
not
+ during host compilation).
+ * - ``__SYCL_SINGLE_SOURCE__``
+ - Predefined by a `SSCP`_ implementation during (host and device)
+ compilation.
+
+Since Clang only supports the `SMCP`_ compilation model currently, the
+``__SYCL_SINGLE_SOURCE__`` macro is never predefined.
+
+.. _SYCL-2020-5.6:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_preprocessor_directives_and_macros
+
+
+Language Support
+================
+`SYCL 2020 section 3.12.3, "Library-only implementation" <SYCL-2020-3.12.13_>`_
+notes the intent that the SYCL specification be implementable as a C++ library
+with no requirements beyond a compiler that conforms to the C++17 standard.
+The SYCL specification therefore does not specify extensions to the C++ core
+language and a library-only implementation will work with Clang without any
+core language extensions.
+Clang provides the features described in this section to facilitate
capabilities
+that are not possible with a library-only SYCL implementation.
+
+.. _SYCL-2020-3.12.13:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_library_only_implementation
+
+
+.. _sect-sycl_kernel_entry_point:
+
+The ``[[clang::sycl_kernel_entry_point]]`` Attribute
+----------------------------------------------------
+This attribute is intended for use in the implementation of SYCL run-time
+libraries and should not be used directly by programmers.
+
+The `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute
+coordinates interaction between Clang and the SYCL run-time library to
+facilitate code generation and the execution of a SYCL kernel on a device
+that potentially uses an instruction set architecture different from the host.
+Consider the following call to the ``sycl::handler::single_task()`` SYCL
+kernel invocation function.
.. code-block:: C++
- // check that SYCL mode is ON and we can use non-standard decorations
- #if defined(__SYCL_DEVICE_ONLY__)
- // GPU/accelerator implementation
- template <typename T, address_space AS> class multi_ptr {
- // DecoratedType applies corresponding address space attribute to the
type T
- // DecoratedType<T, global_space>::type ==
"__attribute__((opencl_global)) T"
- // See sycl/include/CL/sycl/access/access.hpp for more details
- using pointer_t = typename DecoratedType<T, AS>::type *;
+ struct KN;
+ void f(sycl::handler &h, sycl::stream &sout, int i) {
+ h.single_task<KN>([=] {
+ sout << "The value of i is " << i << "\n";
+ });
+ }
+
+The SYCL kernel is defined by the lambda expression passed to the
+``single_task()`` function and is identified by the ``KN`` type passed as the
+first template argument.
+See
+`SYCL 2020 section 4.9.4.2, "SYCL functions for invoking kernels"
<SYCL-2020-4.9.4.2_>`_
+and
+`SYCL 2020 section 5.2, "Naming of kernels" <SYCL-2020-5.2_>`_
+for more details.
+
+The `SMCP`_ and `SSCP`_ compilation models require that code generation be
+performed for each SYCL kernel for each target device.
+In order for Clang to perform that code generation, it needs to be informed
+that a SYCL kernel invocation is present.
+The `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute
+provides the means for the SYCL run-time library, which provides the
+definition of all SYCL kernel invocation functions, to inform Clang of a SYCL
+kernel invocation.
+This is accomplished by, in the definition of a SYCL kernel invocation
function,
+including a call to a function declared with the attribute.
+For example:
+
+.. code-block:: C++
- pointer_t m_Pointer;
+ namespace sycl {
+ class handler {
+ template <typename KernelName, typename KernelType>
+ [[clang::sycl_kernel_entry_point(KernelName)]]
+ void kernel_entry_point(KernelType kernelFunc) {
+ kernelFunc();
+ }
public:
- pointer_t get() { return m_Pointer; }
- T& operator* () { return *reinterpret_cast<T*>(m_Pointer); }
+ template <typename KernelName, typename KernelType>
+ void single_task(const KernelType &kernelFunc) {
+ kernel_entry_point<KernelName>(kernelFunc);
+ }
+ };
}
- #else
- // CPU/host implementation
- template <typename T, address_space AS> class multi_ptr {
- T *m_Pointer; // regular undecorated pointer
- public:
- T *get() { return m_Pointer; }
- T& operator* () { return *m_Pointer; }
+
+The arguments of the call to ``kernel_entry_point()`` constitute the parameters
+of a SYCL kernel.
+The body of the ``kernel_entry_point()`` function contains the statements
+required to execute the SYCL kernel (e.g., an invocation of the call operator
of
+the SYCL kernel object as in this example).
+The call to ``kernel_entry_point()`` in ``single_task()`` establishes a common
+point of SYCL kernel invocation for Clang and the SYCL run-time library.
+With that point established, the tasks required to actually execute a SYCL
+kernel are delegated according to the following division of responsibilities.
+
+Clang is responsible for:
+
+- Validating that all kernel argument types (e.g., the deduced parameter types
+ of the ``kernel_entry_point()`` function above) satisfy the requirements
+ specified in
+ `SYCL 2020 section 4.12.4, "Rules for parameter passing to kernels"
<SYCL-2020-4.12.4_>`_.
+- Informing the SYCL run-time library of the presence of subobjects of SYCL
+ types that require special handling within kernel arguments.
+- Generating an offload kernel entry point function for each SYCL kernel for
+ each target device, generating a name for it derived from the SYCL kernel
+ name, and informing the SYCL run-time library of the generated name.
+
+The SYCL run-time library is responsible for:
+
+- Selecting a device on which to execute the kernel.
+- Copying the SYCL kernel object and any other kernel arguments to the device.
+- Informing Clang of additional parameters required for the offload kernel
+ entry point based on the presence of subobjects of SYCL types that require
+ special handling within kernel arguments.
+- Scheduling execution of the offload kernel entry point function on the
+ selected device.
+
+The SYCL run-time library tasks are expected to be performed in conjunction
+with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero;
+their details are out of scope for this document.
+
+The above division of responsibilities requires coordination.
+The call to a function declared with the
+`sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute causes
+two primary side effects:
+
+- The generation of an offload kernel entry point function.
+- An implicit call to a SYCL run-time library provided template named
+ ``sycl_kernel_launch`` (which may be a function template or a variable
+ template of a type with a member call operator).
+
+The offload kernel entry point function is generated with a target dependent
+calling convention for each device target.
+The function parameters and function body are initially copied from the
function
+declared with the `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_
+attribute, but may be augmented by information provided by the SYCL run-time
+library as described below.
+The function name is an implementation detail subject to change, but
+incorporates the SYCL kernel name in order to ensure that a unique name is
+deterministically generated for each SYCL kernel.
+
+The call to the ``sycl_kernel_launch`` template effectively replaces the call
+to the `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attributed
+function.
+This implicit call serves several purposes:
+
+- It informs the SYCL run-time library of the name of the offload kernel entry
+ point function to be used to execute the kernel on the selected device.
+- It informs the SYCL run-time library of the presence of subobjects of the
+ kernel arguments that require special handling.
+
+See the
+:ref:`sycl_special_kernel_parameter <sect-sycl_special_kernel_parameter>`
+section regarding SYCL types that require special handling in kernel arguments.
+
+The call to the ``sycl_kernel_launch`` template passes the name of the
generated
+offload kernel entry point function, the kernel arguments, and, for each
+subobject of a kernel argument that requires special handling, a reference to
+that subobject.
+For reasons explained further below, the ``sycl_kernel_launch`` template needs
+to know which of its arguments correspond to direct kernel arguments and which
+correspond to references to special subobjects within the direct kernel
+arguments.
+Because there may be multiple kernel arguments with multiple subobjects that
+require special handling, and because C++17 does not support function templates
+with multiple function parameter packs, an idiom is used to pass the kernel
+arguments and special subobjects separately.
+This idiom is best explained by way of an example.
+
+Consider the earlier example of a call to ``single_task()`` that passes a
+lambda that captures variables of type ``int`` and ``std::stream``.
+``std::stream`` is an example of a SYCL type that requires special handling in
+kernel arguments.
+The call to ``kernel_entry_point<KernelName>(kernelFunc)`` in the
implementation
+of ``single_task()`` results in an implicit call to ``sycl_kernel_launch`` that
+looks similar to the following (the access to the captured copy of ``sout`` via
+``kernelFunc.sout`` is not valid C++ syntax, but the compiler can generate such
+accesses).
+
+.. code-block:: C++
+
+ sycl_kernel_launch<KernelName>("kernel-entry-point",
kernelFunc)(kernelFunc.sout)
+
+The SYCL kernel name type, ``KernelName``, is passed as an explicit template
----------------
aelovikov-intel wrote:
I wonder if we might need to pass any code location information as well.
https://github.com/llvm/llvm-project/pull/170602
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits