The attached patch cleans up the memory-allocation documentation. I started
on it mainly because I was unsure about some details myself, especially
the pool_size feature.
It also includes Alex's documentation of omp::allocator::*.
And, as I proposed back then (see below), it moves the list of supported
traits, predefined memory spaces, and allocators to the memory-allocation
section; previously it was under OMP_ALLOCATOR, because that section did not
exist when I completed the list of ICV environment variables.
Any comments before I commit the attached patch?
Tobias
PS: The pages currently look as follows:
https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html and
https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html
Tobias Burnus wrote:
Alex wrote:
Here is a follow up patch for documentation of the omp.h allocators,
[…]
I want the table in there somewhere but I'm not confident that where I
put it was the right place.
I think having the C++ template classes listed under the OMP_ALLOCATOR
environment variable feels odd.
I think it is best to move the two tables, the existing one under
"OMP_ALLOCATOR – Set the default allocator",
https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html
and the one you added to "11.3 Memory allocation",
https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html
libgomp.texi: Document omp(x)::allocator::*, restructure memory allocator doc
libgomp/ChangeLog:
* libgomp.texi (omp_init_allocator): Refer to 'Memory allocation'
for available memory spaces.
(OMP_ALLOCATOR): Move list of traits and predefined memspaces
and allocators to ...
(Memory allocation): ... here. Document omp(x)::allocator::*;
minor wording tweaks, be more explicit about memkind, pinned and
pool_size.
Co-authored-by: waffl3x <waff...@baylibre.com>
libgomp/libgomp.texi | 179 ++++++++++++++++++++++++++++++++-------------------
1 file changed, 112 insertions(+), 67 deletions(-)
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 8374595bc82..06bd5419acc 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3453,7 +3453,7 @@ traits; if an allocator that fulfills the requirements cannot be created,
@code{omp_null_allocator} is returned.
The predefined memory spaces and available traits can be found at
-@ref{OMP_ALLOCATOR}, where the trait names have to be prefixed by
+@ref{Memory allocation}, where the trait names have to be prefixed by
@code{omp_atk_} (e.g. @code{omp_atk_pinned}) and the named trait values by
@code{omp_atv_} (e.g. @code{omp_atv_true}); additionally, @code{omp_atv_default}
may be used as trait value to specify that the default value should be used.
@@ -3476,7 +3476,7 @@ may be used as trait value to specify that the default value should be used.
@end multitable
@item @emph{See also}:
-@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_destroy_allocator}
+@ref{Memory allocation}, @ref{OMP_ALLOCATOR}, @ref{omp_destroy_allocator}
@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.2
@@ -4057,63 +4057,15 @@ The value can either be a predefined allocator or a predefined memory space
or a predefined memory space followed by a colon and a comma-separated list
of memory trait and value pairs, separated by @code{=}.
+See @ref{Memory allocation} for a list of supported predefined allocators,
+memory spaces, and traits.
+
Note: The corresponding device environment variables are currently not
supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
initialized to @code{omp_default_mem_alloc}. However, on all devices,
the @code{omp_set_default_allocator} API routine can be used to change
value.
-@multitable @columnfractions .45 .45
-@headitem Predefined allocators @tab Associated predefined memory spaces
-@item omp_default_mem_alloc @tab omp_default_mem_space
-@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
-@item omp_const_mem_alloc @tab omp_const_mem_space
-@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
-@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
-@item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
-@item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
-@end multitable
-
-The predefined allocators use the default values for the traits,
-as listed below. Except that the last three allocators have the
-@code{access} trait set to @code{cgroup}, @code{pteam}, and
-@code{thread}, respectively.
-
-@multitable @columnfractions .25 .40 .25
-@headitem Trait @tab Allowed values @tab Default value
-@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
- @code{serialized}, @code{private}
- @tab @code{contended}
-@item @code{alignment} @tab Positive integer being a power of two
- @tab 1 byte
-@item @code{access} @tab @code{all}, @code{cgroup},
- @code{pteam}, @code{thread}
- @tab @code{all}
-@item @code{pool_size} @tab Positive integer
- @tab See @ref{Memory allocation}
-@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
- @code{abort_fb}, @code{allocator_fb}
- @tab See below
-@item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
- @tab (none)
-@item @code{pinned} @tab @code{true}, @code{false}
- @tab See below
-@item @code{partition} @tab @code{environment}, @code{nearest},
- @code{blocked}, @code{interleaved}
- @tab @code{environment}
-@end multitable
-
-For the @code{fallback} trait, the default value is @code{null_fb} for the
-@code{omp_default_mem_alloc} allocator and any allocator that is associated
-with device memory; for all other allocators, it is @code{default_mem_fb}
-by default.
-
-For the @code{pinned} trait, the default value is @code{true} for
-predefined allocator @code{ompx_gnu_pinned_mem_alloc} (a GNU extension), and
-@code{false} for all others.
-
Examples:
@smallexample
OMP_ALLOCATOR=omp_high_bw_mem_alloc
@@ -6883,6 +6835,7 @@ on more architectures, GCC currently does not match any @code{arch} or
@tab See @code{-march=} in ``Nvidia PTX Options''
@end multitable
+
@node Memory allocation
@section Memory allocation
@@ -6917,11 +6870,94 @@ The description below applies to:
@code{_Alignof} and C++'s @code{alignof}.
@end itemize
-For the available predefined allocators and, as applicable, their associated
-predefined memory spaces and for the available traits and their default values,
-see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
-space use the @code{omp_default_mem_space} memory space. See additionally
-@ref{Offload-Target Specifics}.
+GCC supports the following predefined allocators and predefined memory spaces:
+
+@multitable @columnfractions .45 .45
+@headitem Predefined allocators @tab Associated predefined memory spaces
+@item omp_default_mem_alloc @tab omp_default_mem_space
+@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
+@item omp_const_mem_alloc @tab omp_const_mem_space
+@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
+@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
+@item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
+@item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
+@end multitable
+
+Each predefined allocator, including @code{omp_null_allocator}, has a corresponding
+allocator class template that meets the C++ allocator completeness requirements.
+These are located in the @code{omp::allocator} namespace, and the
+@code{ompx::allocator} namespace for GNU extensions. This allows the
+allocator-aware C++ standard library containers to use OpenMP allocation routines;
+for instance:
+
+@smallexample
+std::vector<int, omp::allocator::cgroup_mem<int>> vec;
+@end smallexample
+
+The following allocator templates are supported:
+
+@multitable @columnfractions .45 .45
+@headitem Predefined allocators @tab Associated allocator template
+@item omp_null_allocator @tab omp::allocator::null_allocator
+@item omp_default_mem_alloc @tab omp::allocator::default_mem
+@item omp_large_cap_mem_alloc @tab omp::allocator::large_cap_mem
+@item omp_const_mem_alloc @tab omp::allocator::const_mem
+@item omp_high_bw_mem_alloc @tab omp::allocator::high_bw_mem
+@item omp_low_lat_mem_alloc @tab omp::allocator::low_lat_mem
+@item omp_cgroup_mem_alloc @tab omp::allocator::cgroup_mem
+@item omp_pteam_mem_alloc @tab omp::allocator::pteam_mem
+@item omp_thread_mem_alloc @tab omp::allocator::thread_mem
+@item ompx_gnu_pinned_mem_alloc @tab ompx::allocator::gnu_pinned_mem
+@end multitable
+
+The following traits are available when constructing a new allocator;
+if a trait is not specified or is given the value @code{default}, the
+default value listed below is used for that trait. The predefined
+allocators use the default values of each trait, except that the
+@code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
+@code{omp_thread_mem_alloc} allocators have the @code{access} trait
+set to @code{cgroup}, @code{pteam}, and @code{thread}, respectively.
+For each trait, a named constant prefixed by @code{omp_atk_} exists;
+for each non-numeric value, a named constant prefixed by @code{omp_atv_}
+exists.
+
+@multitable @columnfractions .25 .40 .25
+@headitem Trait @tab Allowed values @tab Default value
+@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
+ @code{serialized}, @code{private}
+ @tab @code{contended}
+@item @code{alignment} @tab Positive integer being a power of two
+ @tab 1 byte
+@item @code{access} @tab @code{all}, @code{cgroup},
+ @code{pteam}, @code{thread}
+ @tab @code{all}
+@item @code{pool_size} @tab Positive integer (bytes)
+ @tab See below
+@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
+ @code{abort_fb}, @code{allocator_fb}
+ @tab See below
+@item @code{fb_data} @tab @emph{allocator handle}
+ @tab (none)
+@item @code{pinned} @tab @code{true}, @code{false}
+ @tab See below
+@item @code{partition} @tab @code{environment}, @code{nearest},
+ @code{blocked}, @code{interleaved}
+ @tab @code{environment}
+@end multitable
+
+For the @code{fallback} trait, the default value is @code{null_fb} for the
+@code{omp_default_mem_alloc} allocator and any allocator that is associated
+with device memory; for all other allocators, it is @code{default_mem_fb}
+by default.
+
+For the @code{pinned} trait, the default value is @code{true} for
+predefined allocator @code{ompx_gnu_pinned_mem_alloc} (a GNU extension), and
+@code{false} for all others.
+
+The following description applies to the initial device (the host) and largely
+also to non-host devices; for the latter, also see @ref{Offload-Target Specifics}.
For the memory spaces, the following applies:
@itemize
@@ -6936,14 +6972,16 @@ For the memory spaces, the following applies:
@end itemize
On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
-library} (@code{libmemkind.so.0}) is available at runtime, it is used when
-creating memory allocators requesting
+library} (@code{libmemkind.so.0}) is available at runtime and the respective
+memkind kind is supported, it is used when creating memory allocators requesting
@itemize
-@item the memory space @code{omp_high_bw_mem_space}
-@item the memory space @code{omp_large_cap_mem_space}
-@item the @code{partition} trait @code{interleaved}; note that for
- @code{omp_large_cap_mem_space} the allocation will not be interleaved
+@item the @code{partition} trait @code{interleaved}, except when the memory space
+ is @code{omp_large_cap_mem_space} (uses @code{MEMKIND_HBW_INTERLEAVE})
+@item the memory space @code{omp_high_bw_mem_space} (uses
+ @code{MEMKIND_HBW_PREFERRED})
+@item the memory space @code{omp_large_cap_mem_space} (uses
+ @code{MEMKIND_DAX_KMEM_ALL} or, if not available, @code{MEMKIND_DAX_KMEM})
@end itemize
On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
@@ -6969,10 +7007,15 @@ a @code{nearest} allocation.
Additional notes regarding the traits:
@itemize
@item The @code{pinned} trait is supported on Linux hosts, but is subject to
- the OS @code{ulimit}/@code{rlimit} locked memory settings.
+ the OS @code{ulimit}/@code{rlimit} locked memory settings. It currently
+ uses @code{mmap} and is therefore optimized for few allocations, including
+ large data. If the conditions for numa or memkind allocations are
+ fulfilled, those allocators are used instead.
@item The default for the @code{pool_size} trait is no pool and for every
(re)allocation the associated library routine is called, which might
- internally use a memory pool.
+ internally use a memory pool. Currently, the same applies when a
+ @code{pool_size} has been specified, except that once allocations exceed
+ the pool size, the action of the @code{fallback} trait applies.
@item For the @code{partition} trait, the partition part size will be the same
as the requested size (i.e. @code{interleaved} or @code{blocked} has no
effect), except for @code{interleaved} when the memkind library is
@@ -6981,13 +7024,15 @@ Additional notes regarding the traits:
that allocated the memory; on Linux, this is in particular the case when
the memory placement policy is set to preferred.
@item The @code{access} trait has no effect such that memory is always
- accessible by all threads.
+ accessible by all threads. (Except on supported non-host devices.)
@item The @code{sync_hint} trait has no effect.
@end itemize
See also:
@ref{Offload-Target Specifics}
+
+
@c ---------------------------------------------------------------------
@c Offload-Target Specifics
@c ---------------------------------------------------------------------