Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Andrew Stubbs
On 02/06/2025 15:40, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: The hsa_memory_copy API is known to be slow, so for smaller data sizes it's probably better to have one hsa_memory_copy replace the whole memset than use three API calls, even with setting up some host-side memory to co

Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Andrew Stubbs
On 30/05/2025 23:36, Tobias Burnus wrote: Attached patch adds omp_target_memset and omp_target_memset_async permitting to set (potentially large) data on the device to a certain value - in particular to '\0'. It uses 'memset' on the host (and for shared memory, e.g. via requires unified_shared_m

Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-05-30 Sandra Loosemore
On 5/30/25 16:36, Tobias Burnus wrote: Attached patch adds omp_target_memset and omp_target_memset_async permitting to set (potentially large) data on the device to a certain value - in particular to '\0'. It uses 'memset' on the host (and for shared memory, e.g. via requires unified_shared_memo

[Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-05-30 Tobias Burnus
Attached patch adds omp_target_memset and omp_target_memset_async permitting to set (potentially large) data on the device to a certain value - in particular to '\0'. It uses 'memset' on the host (and for shared memory, e.g. via requires unified_shared_memory/self_maps). For nvptx, cuMemsetD8 is