I noticed that all memory spaces are supported, some by falling back to the default ("malloc"), except for omp_high_bw_mem_space, which is only supported when the memkind lib is available.
I think it makes more sense to fall back to 'malloc' for omp_high_bw_mem_space as well. Additionally, I updated the documentation to state more explicitly what the current implementation does.

Thoughts? Wording improvement suggestions?

Tobias

PS: I wonder whether it makes sense to use libnuma besides libmemkind (which depends on libnuma); however, the question is when. libnuma provides numa_alloc_interleaved(_subset), numa_alloc_local and numa_alloc_onnode.

In any case, something is odd here. I have two nodes, 0 and 1 (see 'lscpu'), and 'numactl --show' shows "preferred node: current". I allocate memory and then use the following to find the node:

  get_mempolicy (&node, NULL, 0, ptr, MPOL_F_ADDR|MPOL_F_NODE)

Result: With malloc'ed data, it shows the same node as the node running the code (i.e. the same as 'getcpu (NULL, &node1);' == 'numa_node_of_cpu (sched_getcpu());'). But I get a constant result of 1 for numa_alloc_local and numa_alloc_onnode, independent of the passed node number (0 or 1) and of the CPU the thread runs on.
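For reference, the node query described in the PS can be sketched in a few lines of C. This is only a minimal sketch under stated assumptions: it calls get_mempolicy through syscall(2) so that it builds without the libnuma headers, the helper names (node_of_address, node_of_touched_malloc, node_of_touched_mmap) are hypothetical, and it writes to the memory before querying. The last point is a guess at something worth ruling out, not a confirmed explanation: per the get_mempolicy(2) man page, querying an address that has no page yet makes the kernel allocate one as if by a read access, so the reported node may differ from where a later write would place the page.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>

/* Flag values as in <linux/mempolicy.h>; defined here so that the
   sketch does not depend on the libnuma development headers.  */
#ifndef MPOL_F_NODE
# define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
# define MPOL_F_ADDR (1 << 1)
#endif

/* Return the NUMA node backing the page at PTR, or -1 if the query
   fails (for instance without NUMA support or with a filtered syscall).  */
static int
node_of_address (void *ptr)
{
  int node = -1;
  if (syscall (SYS_get_mempolicy, &node, NULL, 0UL, ptr,
               (unsigned long) (MPOL_F_NODE | MPOL_F_ADDR)) != 0)
    return -1;
  return node;
}

/* Hypothetical helper: malloc SIZE bytes, write to them so the pages
   really exist, and report the node of the first page.  */
static int
node_of_touched_malloc (size_t size)
{
  char *p = malloc (size);
  if (p == NULL)
    return -1;
  memset (p, 1, size);
  int node = node_of_address (p);
  free (p);
  return node;
}

/* Hypothetical helper: same check for a fresh anonymous mapping, which
   starts out without any physical pages behind it.  */
static int
node_of_touched_mmap (size_t size)
{
  char *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  memset (p, 1, size);
  int node = node_of_address (p);
  munmap (p, size);
  return node;
}
```

Calling both helpers with, say, one page (4096 bytes) and comparing the results with the node of the running CPU would show whether touching the memory first changes the outcome. Both return -1 when the query itself is not possible.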
libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

libgomp/ChangeLog:

	* allocator.c (omp_init_allocator): Use malloc for
	omp_high_bw_mem_space when the memkind lib is unavailable instead
	of returning omp_null_allocator.
	* libgomp.texi (Memory allocation with libmemkind): Document
	implementation in more details.

 libgomp/allocator.c  |  2 +-
 libgomp/libgomp.texi | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index c49931cbad4..25c0f150302 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -301,7 +301,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
           break;
         }
 #endif
-      return omp_null_allocator;
+      break;
     case omp_large_cap_mem_space:
 #ifdef LIBGOMP_USE_MEMKIND
       memkind_data = gomp_get_memkind ();
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 7d27cc50df5..b1f58e74903 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4634,6 +4634,17 @@ smaller number.  On non-host devices, the value of the
 @node Memory allocation with libmemkind
 @section Memory allocation with libmemkind
 
+For the memory spaces, the following applies:
+@itemize
+@item @code{omp_default_mem_space} is supported
+@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
+      unless the memkind library is available
+@end itemize
+
 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
 creating memory allocators requesting
@@ -4641,9 +4652,22 @@ creating memory allocators requesting
 @itemize
 @item the memory space @code{omp_high_bw_mem_space}
 @item the memory space @code{omp_large_cap_mem_space}
-@item the partition trait @code{omp_atv_interleaved}
+@item the partition trait @code{omp_atv_interleaved}; note that for
+      @code{omp_large_cap_mem_space} the allocation will not be interleaved
 @end itemize
 
+Additional notes:
+@itemize
+@item The @code{pinned} trait is unsupported.
+@item For the @code{partition} trait, the partition part size will be the same
+      as the requested size (i.e. @code{interleaved} or @code{blocked} has no
+      effect), except for @code{interleaved} when the memkind library is
+      available.  Furthermore, @code{nearest} might not always return memory
+      on the node of the CPU that triggered an allocation.
+@item The @code{access} trait has no effect such that memory is always
+      accessible by all threads.
+@item The @code{sync_hint} trait has no effect.
+@end itemize
 
 @c ---------------------------------------------------------------------
 @c Offload-Target Specifics