arsenm created this revision.
arsenm added reviewers: rjmccall, jdoerfert, efriedma, t-tye, yaxunl, scott.linder, rnk, spatel, lebedev.ri, nlopes, fhahn, hfinkel, Anastasia.
Herald added subscribers: tpr, wdng.
Herald added a project: LLVM.

This allows tracking the in-memory type of a pointer argument to a function for ABI purposes. This is essentially a stripped-down version of byval that removes some of the stack-copy implications in its definition.

My original attempt at solving some of these problems was to repurpose byval with a different address space from the stack. However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality. There is also talk of removing and replacing the byval attribute, so a new attribute would need to take its place anyway.

I went with the name inmem to mirror inreg. Other name ideas I had are "indirect", "abitype", or "pointee".

One question I have is whether this attribute is necessary, or if the definition of preallocated can be refined to fit this case. The current description is heavy on what it means for the call site, but I didn't understand the implications for the callee. For the amdgpu_kernel use case, calls are illegal, so all of the details about call setup are irrelevant.

This is intended to avoid some optimization issues with the current handling of aggregate arguments, as well as to fix inflexibility in how frontends can specify the kernel ABI. The most honest representation of the amdgpu_kernel convention is to expose all kernel arguments as loads from constant memory. Today, these are raw SSA Argument values, and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments for the amdgpu_kernel calling convention are passed. In reality, arguments are passed in a single, flat, constant memory buffer implicitly passed to the function. It is also illegal to call this function in the IR; it is only ever invoked by a driver of some kind. It does not make sense to have a stack-passed parameter in this context, as is implied by byval.
It is never valid to write to the kernel arguments, as this would corrupt the inputs seen by other dispatches of the kernel. These arguments are also not in the same address space as the stack, so a copy to an alloca is needed. From a source C-like language, the kernel parameters are invisible. Semantically, a copy is always required from the constant argument memory to a mutable variable.

The current clang calling convention lowering emits raw values, including aggregates, into the function argument list, since using byval would not make sense. This has some unfortunate consequences for the optimizer. In the aggregate case, we end up with an aggregate store to an alloca, which both SROA and instcombine turn into a store of each aggregate field. The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and arbitrarily picks the LLVM IR ABI type alignment. By allowing an explicit alignment, frontends can make better decisions. For example, there's really no advantage to an alignment higher than 4, so a frontend could choose to compact the argument layout. Similarly, there is a high penalty to using an alignment lower than 4, so a frontend could opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the fact that these arguments are all really passed in adjacent memory. Currently we have a late IR optimization pass in codegen to rewrite the kernel argument values into explicit loads to enable vectorization. In most programs, unrelated argument loads can be merged together. However, exposing this property directly from the frontend has some disadvantages. We still need a way to track the original argument sizes and alignments to report to the driver. I find using some side-channel metadata mechanism to track this unappealing.
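
As a rough illustration of the alignment trade-off described above, the following Python sketch computes argument offsets in a flat kernel argument buffer under three alignment policies. The argument list, the `layout` helper, and the policies themselves are hypothetical and only model the arithmetic; they are not part of this patch or of any LLVM API.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(args, choose_align):
    """Assign each (name, size, abi_align) argument an offset in a flat buffer.

    choose_align maps the ABI type alignment to the alignment the
    (hypothetical) frontend decides to request for that argument.
    Returns (offsets by name, end offset of the last argument).
    """
    offsets = {}
    offset = 0
    for name, size, abi_align in args:
        offset = align_to(offset, choose_align(abi_align))
        offsets[name] = offset
        offset += size
    return offsets, offset


# Hypothetical kernel signature: (i8 a, i8 b, double c),
# given as (name, size in bytes, ABI type alignment).
ARGS = [("a", 1, 1), ("b", 1, 1), ("c", 8, 8)]

# Policy 1: use the ABI type alignment, as the backend does today.
abi_offsets, abi_size = layout(ARGS, lambda a: a)

# Policy 2: cap alignment at 4 to compact the layout.
compact_offsets, compact_size = layout(ARGS, lambda a: min(a, 4))

# Policy 3: raise alignment to at least 4 to pad small arguments.
padded_offsets, padded_size = layout(ARGS, lambda a: max(a, 4))

print(abi_offsets, abi_size)
print(compact_offsets, compact_size)
print(padded_offsets, padded_size)
```

For this particular input, capping the alignment at 4 shrinks the buffer from 16 to 12 bytes, while raising the minimum alignment to 4 moves `b` into its own dword without growing the buffer.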
If the kernel arguments were exposed as a single buffer to begin with, alias analysis would be unaware that the padding bits between arguments are meaningless. Another family of problems is that there are still some gaps in replacing all of the available parameter attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all aggregate arguments for kernels. Long term, it makes sense to migrate all kernel arguments, including scalars, to be passed indirectly in the same manner.

Additional context is in D79744 <https://reviews.llvm.org/D79744>.

https://reviews.llvm.org/D81311

Files:
  llvm/docs/LangRef.rst
  llvm/docs/ReleaseNotes.rst

Index: llvm/docs/ReleaseNotes.rst
===================================================================
--- llvm/docs/ReleaseNotes.rst
+++ llvm/docs/ReleaseNotes.rst
@@ -74,6 +74,9 @@
   information. This information is used to represent Fortran modules debug
   info at IR level.
 
+* Added the ``inmem`` attribute to better represent argument passing
+  for the `amdgpu_kernel` calling convention.
+
 Changes to building LLVM
 ------------------------
 
@@ -134,6 +137,9 @@
   retain the old behavior should explicitly request f32 denormal
   flushing.
 
+* The new ``inmem`` attribute is now the preferred method for
+  representing aggregate kernel arguments.
+
 Changes to the AVR Target
 -----------------------------
 
Index: llvm/docs/LangRef.rst
===================================================================
--- llvm/docs/LangRef.rst
+++ llvm/docs/LangRef.rst
@@ -1066,6 +1066,28 @@
     site. If the alignment is not specified, then the code generator
     makes a target-specific assumption.
 
+.. _attr_inmem:
+
+``inmem(<ty>)``
+
+    The ``inmem`` argument attribute allows specifying the pointee
+    memory type of an argument for ABI purposes. This is similar to
+    ``byval``, but does not imply a copy is made anywhere, or that the
+    argument is passed on the stack. This implies the pointer is
+    dereferenceable up to the storage size of the type.
+
+    It is not generally permissible to introduce a write to an
+    ``inmem`` pointer, unless it is known this will not produce an
+    observable change in the caller. The pointer may have any address
+    space and may be read only.
+
+    This is not a valid attribute for return values.
+
+    The alignment for an ``inmem`` parameter can be explicitly
+    specified by combining it with the ``align`` attribute, similar to
+    ``byval``. If the alignment is not specified, then the code generator
+    makes a target-specific assumption.
+
 .. _attr_preallocated:
 
 ``preallocated(<ty>)``
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits