Anastasia added inline comments.

================
Comment at: clang/docs/SYCLSupport.md:830
+Similar to other single-source C++-based GPU programming modes like
+OpenMP/CUDA/HIP, SYCL uses clang's "default" address space for types with no
+address space attributes. This design has two important features: keeps the 
type system consistent with C++ on one hand and enable tools for emitting 
device code aligned with SPIR memory model (and other GPU targets).
----------------
bader wrote:
> Anastasia wrote:
> > Is this explained somewhere would you be able to add any reference?
> I wasn't able to find documentation for this implementation detail, but we 
> should be able to confirm that by printing AST for example.
> 
> Here is the documentation I found for CUDA in llvm project:
>  - https://llvm.org/docs/CompileCudaWithLLVM.html
>  - https://llvm.org/docs/NVPTXUsage.html - defines LLVM IR representation for 
> NVPTX.
> 
> NVIDIA documentation - 
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#variable-memory-space-specifiers.
> It says that memory space is assigned using "variable specifiers" rather than 
> type qualifiers.
Btw I don't know if it is outdated - do you plan to use any of the conversion 
intrinsics https://llvm.org/docs/NVPTXUsage.html#address-space-conversion or do 
you plan to use `addrspacecast` instruction like OpenCL and some other 
languages?

> NVIDIA documentation - 
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#variable-memory-space-specifiers.
> It says that memory space is assigned using "variable specifiers" rather than 
> type qualifiers.

Yeah it seems like it is some sort of embedded C flavor but not quite the same 
concept though.

> An automatic variable declared in device code without any of the __device__, 
> __shared__ and __constant__ memory space specifiers described in this section 
> generally resides in a register. However in some cases the compiler might 
> choose to place it in local memory, which can have adverse performance 
> consequences as detailed in Device Memory Accesses. 

It might have been better to introduce a separate attribute instead of using 
embedded C's address spaces for this or introduce a different address space 
entry. In OpenCL we do so because generic also has a different flavor from emb 
C. I guess the original design didn't assume wide usage so it has not flagged 
up. It was clearly something custom for a particular vendor with some 
separation of concerns i.e. you wouln't compile CUDA to an arbitrary CPU that 
also needs to compile C or C++. So towards popularizing this approach it is not 
unreasonable that its better understanding and some adjustments might be 
required.

If you plan to use the CUDA approach in a straight-forward way it is reasonable 
to leave this undocumented for now and see if we can provide some details about 
the exact semantics while refining the implementation. 

Since in this model the automatic objects are not allocated in the 
"default"/"generic" address space in contrast to embedded C and C++ just like 
in OpenCL btw, we should at least highlight that fact for now. It is probably 
good to refer to SYCL spec s3.8.2.1 here where object allocation is explained.




================
Comment at: clang/docs/SYCLSupport.md:851
+
+Changing variable type has massive and destructive effect in C++. For instance
+this does not compile in C++ for OpenCL mode:
----------------
bader wrote:
> Anastasia wrote:
> > aaron.ballman wrote:
> > > 
> > > This example demonstrates the problem with compiling C++ code when 
> > > address space type qualifiers are inferred.
> > > 
> > >     The example compiles in accordance with OpenCL language semantic...
> > > 
> > > https://godbolt.org/z/9jzxK5xc4 - ToT clang doesn't compile this example.
> > 
> > I am still not clear what message you are trying to convey here? In OpenCL 
> > kernel languages any object is always in some address space so if you write 
> > the following `decltype(p)`, it will always have address space attribute in 
> > a type. OpenCL spec is very explicit about this:
> > 
> > https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-inference
> > 
> > So if you compare a type not attributed by an address space with an 
> > attributed one they will never compare as equal because according to C++ 
> > rules if the qualifiers differ the types will differ. You need to use a 
> > special type trait to remove an address space if you need to compare types 
> > not qualified by an address space. What is important to highlight however 
> > is that address space inference is where OpenCL differs to C or C++. But of 
> > course, neither C nor C++ have address spaces so it is hard to compare.
> > 
> > In relation to your documentation, it is not clear what you are trying to 
> > achieve with this paragraph?
> >  
> > In relation to your documentation, it is not clear what you are trying to 
> > achieve with this paragraph?
> 
> This paragraph provides clarification to the question why we can't apply 
> OpenCL address space inference rules for SYCL mode.
> I think it might be unnecessary because the SYCL specification defines 
> address space deduction rules now.
> Do you suggest removing this paragraph?
>     In relation to your documentation, it is not clear what you are trying to 
> achieve with this paragraph?
> 
> This paragraph provides clarification to the question why we can't apply 
> OpenCL address space inference rules for SYCL mode.

I can't quite get what exactly it is trying to say. Perhaps you need to provide 
more context or full example, nit just a small fragement.



> I think it might be unnecessary because the SYCL specification defines 
> address space deduction rules now.
> Do you suggest removing this paragraph?

Yeah, it does not seem very relevant here since we are not comparing the 
semantics from different languages. We just want to describe the expected 
behavior.




================
Comment at: clang/docs/SYCLSupport.md:909
+| `__attribute__((opencl_local))` | local_space |
+| `__attribute__((opencl_private))` | private_space |
+
----------------
Naghasan wrote:
> bader wrote:
> > Anastasia wrote:
> > > Since SYCL spec has constant AS you should explain whether it is going to 
> > > be supported or not and if so then how.
> > The first raw of this table covers mapping between SYCL constant_space and 
> > address space attribute.
> > Could you clarify what else do we need?
> To be more specific here, the OpenCL constant address space no longer have an 
> equivalent in the SYCL core spec memory model 
> https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_sycl_device_memory_model.
> 
> The `multi_ptr` for that address space is now deprecated and implementors can 
> map it the global address space (as described here).
Ok, if that's permitted by the spec, the current table is good enough.


================
Comment at: clang/docs/SYCLSupport.md:914-919
+Default address space represents "Generic-memory", which is a virtual address
+space which overlaps the global, local and private address spaces. SYCL mode
+enables conversion to/from default address space from/to address space
+attributed type.
+
+SPIR target allocates SYCL namespace scope variables in global address space.
----------------
Naghasan wrote:
> bader wrote:
> > bader wrote:
> > > bader wrote:
> > > > Anastasia wrote:
> > > > > Naghasan wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > 
> > > > > > I think this section should be extended.
> > > > > > 
> > > > > > Pointers to `Default` address space should get lowered into a 
> > > > > > pointer to a generic address space (or flat to reuse more general 
> > > > > > terminology).
> > > > > > But depending on the allocation context, the `default` address 
> > > > > > space of a non-pointer type is assigned to a specific address 
> > > > > > space. This is described in 
> > > > > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:commonAddressSpace.
> > > > > > 
> > > > > > This is also in line with the behaviour of CUDA (small example 
> > > > > > https://godbolt.org/z/veqTfo9PK).
> > > > > Ok, if the implementation plans to follow the spec precisely then 
> > > > > just adding a reference should be sufficient. 
> > > > > 
> > > > > Do I understand it correctly that your implementation will use the 
> > > > > first approach from the two described in:
> > > > > > If the target of the SYCL backend can represent the generic address 
> > > > > > space, then the "common address space deduction rules" in Section 
> > > > > > 5.9.2 and the "generic as default address space rules" in Section 
> > > > > > 5.9.3 apply. If the target of the SYCL backend cannot represent the 
> > > > > > generic address space, then the "common address space deduction 
> > > > > > rules" in Section 5.9.2 and the "inferred address space rules" in 
> > > > > > Section 5.9.4 apply.
> > > > > 
> > > > > This should be added to the documentation btw.
> > > > > 
> > > > > 
> > > > > Btw does this statement in any way relate to the following statement:
> > > > > 
> > > > > > Within kernels, the underlying C++ pointer types can be obtained 
> > > > > > from an accessor. The pointer types will contain a compile-time 
> > > > > > deduced address space. So, for example, if a C++ pointer is 
> > > > > > obtained from an accessor to global memory, the C++ pointer type 
> > > > > > will have a global address space attribute attached to it. The 
> > > > > > address space attribute will be compile-time propagated to other 
> > > > > > pointer values when one pointer is initialized to another pointer 
> > > > > > value using a defined algorithm.
> > > > > 
> > > > > from 
> > > > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_access_to_memory
> > > > > 
> > > > > Or if not where can I find the algorithm it refers to?
> > > > > I think this section should be extended.
> > > > > 
> > > > > Pointers to `Default` address space should get lowered into a pointer 
> > > > > to a generic address space (or flat to reuse more general 
> > > > > terminology).
> > > > > But depending on the allocation context, the `default` address space 
> > > > > of a non-pointer type is assigned to a specific address space. This 
> > > > > is described in 
> > > > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:commonAddressSpace.
> > > > > 
> > > > > This is also in line with the behaviour of CUDA (small example 
> > > > > https://godbolt.org/z/veqTfo9PK).
> > > > 
> > > > I've added this text to the document.
> > > > 
> > > > > Ok, if the implementation plans to follow the spec precisely then 
> > > > > just adding a reference should be sufficient. 
> > > > > 
> > > > > Do I understand it correctly that your implementation will use the 
> > > > > first approach from the two described in:
> > > > > > If the target of the SYCL backend can represent the generic address 
> > > > > > space, then the "common address space deduction rules" in Section 
> > > > > > 5.9.2 and the "generic as default address space rules" in Section 
> > > > > > 5.9.3 apply. If the target of the SYCL backend cannot represent the 
> > > > > > generic address space, then the "common address space deduction 
> > > > > > rules" in Section 5.9.2 and the "inferred address space rules" in 
> > > > > > Section 5.9.4 apply.
> > > > > 
> > > > > This should be added to the documentation btw.
> > > > 
> > > > The implementation residing in https://github.com/intel/llvm targets 
> > > > devices supporting generic address space. If I understand it correctly, 
> > > > another approach is supported by ComputeCPP. @Naghasan, are you aware 
> > > > of any plans to upstream the second approach? If no, I can clarify that 
> > > > it's not supported.
> > > > 
> > > > > Btw does this statement in any way relate to the following statement:
> > > > > 
> > > > > > Within kernels, the underlying C++ pointer types can be obtained 
> > > > > > from an accessor. The pointer types will contain a compile-time 
> > > > > > deduced address space. So, for example, if a C++ pointer is 
> > > > > > obtained from an accessor to global memory, the C++ pointer type 
> > > > > > will have a global address space attribute attached to it. The 
> > > > > > address space attribute will be compile-time propagated to other 
> > > > > > pointer values when one pointer is initialized to another pointer 
> > > > > > value using a defined algorithm.
> > > > > 
> > > > > from 
> > > > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_access_to_memory
> > > > > 
> > > > > Or if not where can I find the algorithm it refers to?
> > > > 
> > > > AFAIK, the pointer types mentioned in this section is not implemented 
> > > > yet, but their implementation can be done in the library using the 
> > > > attributes described in this section and C++ class templates. 
> > > > 
> > > > Just to demonstrate the idea, let's use the example implementation for 
> > > > `multi_ptr` template class provided above:
> > > > ```
> > > > `multi_ptr` class implementation example:
> > > > 
> > > > ``` C++
> > > > template <typename T, address_space AS> class multi_ptr {
> > > >   // DecoratedType applies corresponding address space attribute to the 
> > > > type T
> > > >   // DecoratedType<T, global_space>::type == 
> > > > "__attribute__((opencl_global)) T"
> > > >   // See sycl/include/CL/sycl/access/access.hpp for more details
> > > >   using pointer_t = typename DecoratedType<T, AS>::type *;
> > > > 
> > > >   pointer_t m_Pointer;
> > > >   public:
> > > >   pointer_t get() { return m_Pointer; }
> > > >   T& operator* () { return *reinterpret_cast<T*>(m_Pointer); }
> > > > }
> > > > ```
> > > > "decorated" pointers will return `pointer_t`, where as "raw" pointers 
> > > > will return the type casted to "generic" address space.
> > > > Ok, if the implementation plans to follow the spec precisely then just 
> > > > adding a reference should be sufficient. 
> > > > 
> > > > Do I understand it correctly that your implementation will use the 
> > > > first approach from the two described in:
> > > > > If the target of the SYCL backend can represent the generic address 
> > > > > space, then the "common address space deduction rules" in Section 
> > > > > 5.9.2 and the "generic as default address space rules" in Section 
> > > > > 5.9.3 apply. If the target of the SYCL backend cannot represent the 
> > > > > generic address space, then the "common address space deduction 
> > > > > rules" in Section 5.9.2 and the "inferred address space rules" in 
> > > > > Section 5.9.4 apply.
> > > > 
> > > > This should be added to the documentation btw.
> > > > 
> > > > 
> > > > Btw does this statement in any way relate to the following statement:
> > > > 
> > > > > Within kernels, the underlying C++ pointer types can be obtained from 
> > > > > an accessor. The pointer types will contain a compile-time deduced 
> > > > > address space. So, for example, if a C++ pointer is obtained from an 
> > > > > accessor to global memory, the C++ pointer type will have a global 
> > > > > address space attribute attached to it. The address space attribute 
> > > > > will be compile-time propagated to other pointer values when one 
> > > > > pointer is initialized to another pointer value using a defined 
> > > > > algorithm.
> > > > 
> > > > from 
> > > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_access_to_memory
> > > > 
> > > > Or if not where can I find the algorithm it refers to?
> > > 
> > > 
> > > I think this section should be extended.
> > > 
> > > Pointers to `Default` address space should get lowered into a pointer to 
> > > a generic address space (or flat to reuse more general terminology).
> > > But depending on the allocation context, the `default` address space of a 
> > > non-pointer type is assigned to a specific address space. This is 
> > > described in 
> > > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:commonAddressSpace.
> > > 
> > > This is also in line with the behaviour of CUDA (small example 
> > > https://godbolt.org/z/veqTfo9PK).
> > 
> > 
> > If I understand it correctly, another approach is supported by ComputeCPP. 
> > @Naghasan, are you aware of any plans to upstream the second approach?
> 
> We do support that other approach, but we do not have plan to upstream this. 
> However an implementation doesn't have to support both, so you should just 
> mention the implementation is solely based on the usage of generic.
> 
> > from 
> > https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_access_to_memory
> 
> There is a spec bug here, this only applies to the deduced address space mode.
> 
> > Or if not where can I find the algorithm it refers to?
> 
> Section 5.9.4, however you will see a note stating rework is due. But that 
> doesn't impact this design here.
> 

> 
> ``` C++
> template <typename T, address_space AS> class multi_ptr {
>   // DecoratedType applies corresponding address space attribute to the type T
>   // DecoratedType<T, global_space>::type == "__attribute__((opencl_global)) 
> T"
>   // See sycl/include/CL/sycl/access/access.hpp for more details
>   using pointer_t = typename DecoratedType<T, AS>::type *;
> 
>   pointer_t m_Pointer;
>   public:
>   pointer_t get() { return m_Pointer; }
>   T& operator* () { return *reinterpret_cast<T*>(m_Pointer); }
> }
> 
> "decorated" pointers will return pointer_t, where as "raw" pointers will 
> return the type casted to "generic" address space.

FYI depending on your conversion semantics you might need to use 
`addrspace_cast` instead of `reinterpret_cast` 
https://www.khronos.org/opencl/assets/CXX_for_OpenCL.html#_casts


> 
>     from 
> https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_access_to_memory
> 
> There is a spec bug here, this only applies to the deduced address space mode.



Ok, this makes sense. So for the current implementation in clang would it be 
something like:

`The pointer types will contain a compile-time deduced address space.` -> `The 
pointer types will point to a generic/default address space object.`?

Or is it something else that will be different... If it can't be easily fixed 
in the spec, let's at least document it here.

> 
>     Or if not where can I find the algorithm it refers to?
> 
> Section 5.9.4, however you will see a note stating rework is due. But that 
> doesn't impact this design here.

Thanks, it seems it is not relevant for this implementation at the moment then.




================
Comment at: clang/docs/SYCLSupport.md:915
+Default address space represents "Generic-memory", which is a virtual address
+space which overlaps the global, local and private address spaces. SYCL mode
+enables conversion to/from default address space from/to address space
----------------
bader wrote:
> Anastasia wrote:
> > aaron.ballman wrote:
> > > 
> > You should also explain what address spaces are super/sub-sets because this 
> > impacts implicit and explicit conversion behavior in an embedded C-like 
> > models. In relation to that, you should highlight that the private, local 
> > and global ASes are disjoint.
> It's already covered in the SYCL device memory model section of the 
> specification: 
> https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_sycl_device_memory_model.
> What additional clarification do we need in this document?
Sorry I can not see.

There are two approaches here:
- Embedded C s5.1.3    "Address space nesting and rules for pointers" requires:
`An implementation must define the relationship between all pairs of address 
spaces.  (The complete set of address spaces includes the generic address space 
plus any address spaces that may be defined within a translation unit, if the 
implementation supports such definitions within a program.)  There is no 
requirement that named address spaces (intrinsic or otherwise) be subsets of 
the generic address space`
- In OpenCL C we also use C terminology i.e. implicit/explicit conversions in 
addition to that:
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-conversions

Either way, it has to be clear what types of conversions are allowed or 
disallowed. 


================
Comment at: clang/docs/SYCLSupport.rst:243
+Similar to other single-source C++-based GPU programming modes like
+OpenMP/CUDA/HIP, SYCL uses clang's "default" address space for types with no
+explicit address space attribute. This design has two important features: it
----------------
This is ambiguous now because every language will use `clang's "default" 
address space` because at least one address space is always needed by every 
language but it has different semantics in languages. We should either attempt 
to describe it somehow or perhaps just point out that it is inherited from CUDA 
and currently undocumented.


================
Comment at: clang/docs/SYCLSupport.rst:341
+that overlaps the global, local, and private address spaces. SYCL mode enables
+conversion to/from the default address space from/to the address
+space-attributed type.
----------------
Do you mean both implicit and explicit conversions? Does it mean that in your 
AS model named ASes are subset of generic AS and generic AS is a subset of 
named ASes so they are equivalent sets? It is probably good to mention here 
that all named address spaces are disjoint.


================
Comment at: clang/docs/SYCLSupport.rst:344
+
+The SPIR target allocates SYCL namespace scope variables in the global address
+space.
----------------
Interesting, will this deduction always be target specific or can it be 
generalized since it is governed by the language semantic already?


================
Comment at: clang/docs/SYCLSupport.rst:347
+
+Pointers to Default address space should get lowered into a pointer to a 
generic
+address space (or flat to reuse more general terminology). But depending on the
----------------
I think it is also relevant to highlight that you don't perform inference of 
the address space qualifiers and the memory segment binding is performed as a 
final phase of parsing. This is quite relevant since embedded C or C++ have no 
address space inference at all and OpenCL explicitly requires inference in the 
type qualifiers.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99488/new/

https://reviews.llvm.org/D99488

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to