https://gcc.gnu.org/g:b8617e0a241c7021f539aeca09a7c2bec02e9b39
commit r16-1578-gb8617e0a241c7021f539aeca09a7c2bec02e9b39
Author: Tobias Burnus <tbur...@baylibre.com>
Date:   Thu Jun 19 21:06:11 2025 +0200

    libgomp.texi: Document omp(x)::allocator::*, restructure memory allocator doc

    libgomp/ChangeLog:

            * libgomp.texi (omp_init_allocator): Refer to 'Memory allocation'
            for available memory spaces.
            (OMP_ALLOCATOR): Move list of traits and predefined memspaces
            and allocators to ...
            (Memory allocation): ... here. Document omp(x)::allocator::*;
            minor wording tweaks, be more explicit about memkind, pinned and
            pool_size.

    Co-authored-by: waffl3x <waff...@baylibre.com>

Diff:
---
 libgomp/libgomp.texi | 181 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 113 insertions(+), 68 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 8374595bc823..9f53f167e064 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3453,7 +3453,7 @@ traits; if an allocator that fulfills the requirements cannot be created,
 @code{omp_null_allocator} is returned.
 
 The predefined memory spaces and available traits can be found at
-@ref{OMP_ALLOCATOR}, where the trait names have to be prefixed by
+@ref{Memory allocation}, where the trait names have to be prefixed by
 @code{omp_atk_} (e.g. @code{omp_atk_pinned}) and the named trait values by
 @code{omp_atv_} (e.g. @code{omp_atv_true}); additionally, @code{omp_atv_default}
 may be used as trait value to specify that the default value should be used.
@@ -3476,7 +3476,7 @@ may be used as trait value to specify that the default value should be used.
 @end multitable
 
 @item @emph{See also}:
-@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_destroy_allocator}
+@ref{Memory allocation}, @ref{OMP_ALLOCATOR}, @ref{omp_destroy_allocator}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.2
@@ -4057,63 +4057,15 @@ The value can either be a predefined allocator or a predefined memory space
 or a predefined memory space followed by a colon and a comma-separated list
 of memory trait and value pairs, separated by @code{=}.
 
+See @ref{Memory allocation} for a list of supported prefedined allocators,
+memory spaces, and traits.
+
 Note: The corresponding device environment variables are currently not
 supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
 initialized to @code{omp_default_mem_alloc}. However, on all devices,
 the @code{omp_set_default_allocator} API routine can be used to change
 value.
 
-@multitable @columnfractions .45 .45
-@headitem Predefined allocators @tab Associated predefined memory spaces
-@item omp_default_mem_alloc @tab omp_default_mem_space
-@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
-@item omp_const_mem_alloc @tab omp_const_mem_space
-@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
-@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
-@item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
-@end multitable
-
-The predefined allocators use the default values for the traits,
-as listed below. Except that the last three allocators have the
-@code{access} trait set to @code{cgroup}, @code{pteam}, and
-@code{thread}, respectively.
-
-@multitable @columnfractions .25 .40 .25
-@headitem Trait @tab Allowed values @tab Default value
-@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
-                       @code{serialized}, @code{private}
-   @tab @code{contended}
-@item @code{alignment} @tab Positive integer being a power of two
-   @tab 1 byte
-@item @code{access} @tab @code{all}, @code{cgroup},
-                    @code{pteam}, @code{thread}
-   @tab @code{all}
-@item @code{pool_size} @tab Positive integer
-   @tab See @ref{Memory allocation}
-@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
-                      @code{abort_fb}, @code{allocator_fb}
-   @tab See below
-@item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
-   @tab (none)
-@item @code{pinned} @tab @code{true}, @code{false}
-   @tab See below
-@item @code{partition} @tab @code{environment}, @code{nearest},
-                       @code{blocked}, @code{interleaved}
-   @tab @code{environment}
-@end multitable
-
-For the @code{fallback} trait, the default value is @code{null_fb} for the
-@code{omp_default_mem_alloc} allocator and any allocator that is associated
-with device memory; for all other allocators, it is @code{default_mem_fb}
-by default.
-
-For the @code{pinned} trait, the default value is @code{true} for
-predefined allocator @code{ompx_gnu_pinned_mem_alloc} (a GNU extension), and
-@code{false} for all others.
-
 Examples:
 @smallexample
 OMP_ALLOCATOR=omp_high_bw_mem_alloc
@@ -5972,7 +5924,7 @@ This function copies device memory from one memory location to another
 on the current device. It copies @var{bytes} bytes of data from the device
 address, specified by @var{data_dev_src}, to the device address
 @var{data_dev_dest}. The @code{_async} version performs the transfer
-asnychronously using the queue associated with @var{async_arg}.
+asynchronously using the queue associated with @var{async_arg}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -6883,6 +6835,7 @@ on more architectures, GCC currently does not match any @code{arch} or
       @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 
+
 @node Memory allocation
 @section Memory allocation
 
@@ -6917,11 +6870,94 @@ The description below applies to:
       @code{_Alignof} and C++'s @code{alignof}.
 @end itemize
 
-For the available predefined allocators and, as applicable, their associated
-predefined memory spaces and for the available traits and their default values,
-see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
-space use the @code{omp_default_mem_space} memory space. See additionally
-@ref{Offload-Target Specifics}.
+GCC supports the following predefined allocators and predefined memory spaces:
+
+@multitable @columnfractions .45 .45
+@headitem Predefined allocators @tab Associated predefined memory spaces
+@item omp_default_mem_alloc @tab omp_default_mem_space
+@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
+@item omp_const_mem_alloc @tab omp_const_mem_space
+@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
+@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
+@item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
+@end multitable
+
+Each predefined allocator, including @code{omp_null_allocator}, has a corresponding
+allocator class template that meet the C++ allocator completeness requirements.
+These are located in the @code{omp::allocator} namespace, and the
+@code{ompx::allocator} namespace for gnu extensions. This allows the
+allocator-aware C++ standard library containers to use OpenMP allocation routines;
+for instance:
+
+@smallexample
+std::vector<int, omp::allocator::cgroup_mem<int>> vec;
+@end smallexample
+
+The following allocator templates are supported:
+
+@multitable @columnfractions .45 .45
+@headitem Predefined allocators @tab Associated allocator template
+@item omp_null_allocator @tab omp::allocator::null_allocator
+@item omp_default_mem_alloc @tab omp::allocator::default_mem
+@item omp_large_cap_mem_alloc @tab omp::allocator::large_cap_mem
+@item omp_const_mem_alloc @tab omp::allocator::const_mem
+@item omp_high_bw_mem_alloc @tab omp::allocator::high_bw_mem
+@item omp_low_lat_mem_alloc @tab omp::allocator::low_lat_mem
+@item omp_cgroup_mem_alloc @tab omp::allocator::cgroup_mem
+@item omp_pteam_mem_alloc @tab omp::allocator::pteam_mem
+@item omp_thread_mem_alloc @tab omp::allocator::thread_mem
+@item ompx_gnu_pinned_mem_alloc @tab ompx::allocator::gnu_pinned_mem
+@end multitable
+
+The following traits are available when constructing a new allocator;
+if a trait is not specified or with the value @code{default}, the
+specified default value is used for that trait. The predefined
+allocators use the default values of each trait, except that the
+@code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
+@code{omp_thread_mem_alloc} allocators have the @code{access} trait
+set to @code{cgroup}, @code{pteam}, and @code{thread}, respectively.
+For each trait, a named constant prefixed by @code{omp_atk_} exists;
+for each non-numeric value, a named constant prefixed by @code{omp_atv_}
+exists.
+
+@multitable @columnfractions .25 .40 .25
+@headitem Trait @tab Allowed values @tab Default value
+@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
+                       @code{serialized}, @code{private}
+   @tab @code{contended}
+@item @code{alignment} @tab Positive integer being a power of two
+   @tab 1 byte
+@item @code{access} @tab @code{all}, @code{cgroup},
+                    @code{pteam}, @code{thread}
+   @tab @code{all}
+@item @code{pool_size} @tab Positive integer (bytes)
+   @tab See below.
+@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
+                      @code{abort_fb}, @code{allocator_fb}
+   @tab See below
+@item @code{fb_data} @tab @emph{allocator handle}
+   @tab (none)
+@item @code{pinned} @tab @code{true}, @code{false}
+   @tab See below
+@item @code{partition} @tab @code{environment}, @code{nearest},
+                       @code{blocked}, @code{interleaved}
+   @tab @code{environment}
+@end multitable
+
+For the @code{fallback} trait, the default value is @code{null_fb} for the
+@code{omp_default_mem_alloc} allocator and any allocator that is associated
+with device memory; for all other allocators, it is @code{default_mem_fb}
+by default.
+
+For the @code{pinned} trait, the default value is @code{true} for
+predefined allocator @code{ompx_gnu_pinned_mem_alloc} (a GNU extension), and
+@code{false} for all others.
+
+The following description applies to the initial device (the host) and largely
+also to non-host devices; for the latter, also see @ref{Offload-Target Specifics}.
 
 For the memory spaces, the following applies:
 @itemize
@@ -6936,14 +6972,16 @@ For the memory spaces, the following applies:
 @end itemize
 
 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
-library} (@code{libmemkind.so.0}) is available at runtime, it is used when
-creating memory allocators requesting
+library} (@code{libmemkind.so.0}) is available at runtime and the respective
+memkind kind is supported, it is used when creating memory allocators requesting
 
 @itemize
-@item the memory space @code{omp_high_bw_mem_space}
-@item the memory space @code{omp_large_cap_mem_space}
-@item the @code{partition} trait @code{interleaved}; note that for
-      @code{omp_large_cap_mem_space} the allocation will not be interleaved
+@item the @code{partition} trait @code{interleaved} except when the memory space
+      is @code{omp_large_cap_mem_space} (uses @code{MEMKIND_HBW_INTERLEAVE})
+@item the memory space is @code{omp_high_bw_mem_space} (uses
+      @code{MEMKIND_HBW_PREFERRED})
+@item the memory space is @code{omp_large_cap_mem_space} (uses
+      @code{MEMKIND_DAX_KMEM_ALL} or, if not available, @code{MEMKIND_DAX_KMEM})
 @end itemize
 
 On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
@@ -6969,10 +7007,15 @@ a @code{nearest} allocation.
 Additional notes regarding the traits:
 @itemize
 @item The @code{pinned} trait is supported on Linux hosts, but is subject to
-      the OS @code{ulimit}/@code{rlimit} locked memory settings.
+      the OS @code{ulimit}/@code{rlimit} locked memory settings. It currently
+      uses @code{mmap} and is therefore optimized for few allocations, including
+      large data. If the conditions for numa or memkind allocations are
+      fulfilled, those allocators are used instead.
 @item The default for the @code{pool_size} trait is no pool and for every
       (re)allocation the associated library routine is called, which might
-      internally use a memory pool.
+      internally use a memory pool. Currently, the same applies when a
+      @code{pool_size} has been specified, except that once allocations exceed
+      the the pool size, the action of the @code{fallback} trait applies.
 @item For the @code{partition} trait, the partition part size will be the same
       as the requested size (i.e. @code{interleaved} or @code{blocked} has no
       effect), except for @code{interleaved} when the memkind library is
@@ -6981,13 +7024,15 @@ Additional notes regarding the traits:
       that allocated the memory; on Linux, this is in particular the case when
       the memory placement policy is set to preferred.
 @item The @code{access} trait has no effect such that memory is always
-      accessible by all threads.
+      accessible by all threads. (Except on supported no-host devices.)
 @item The @code{sync_hint} trait has no effect.
 @end itemize
 
 See also:
 @ref{Offload-Target Specifics}
 
+
+
 @c ---------------------------------------------------------------------
 @c Offload-Target Specifics
 @c ---------------------------------------------------------------------