Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

Andrew Stubbs Mon, 02 Jun 2025 08:02:37 -0700

On 02/06/2025 15:40, Tobias Burnus wrote:

Hi Andrew,
Andrew Stubbs wrote:
The hsa_memory_copy API is known to be slow, so for smaller data sizes it's probably better to have one hsa_memory_copy replace the whole memset than use three API calls, even with setting up some host-side memory to copy from. This is probably pretty easy to measure anyway.
I have now done some benchmarking - see attached testcase + test result.
I updated the code to switch over from copy to memset only for larger values or when alignment+counts permit a single fill call.
I bet there are some nits; otherwise, I intend to commit the patch soon.

Thanks for the comments, Andrew & Sandra!

Tobias

+  /* A memset feature is only provided via hsa_amd_memory_fill; while it
+     is fast, it is an HSA extension and it two requirements: The memory
+     must be aligned to multiples of 4 bytes - and, by construction, only
+     multiples of 4 bytes can be filled (uint32_t value argument).


"it *has* two requirements"

+     This means: Either not using that function or up to three function calls:
+     functions:


"function calls: functions:" looks like an editing issue.

+     - copy remaining 1 to 3 bytes (hsa_memory_copy), if after alignment
+       count it not a multiple of 4 bytes.


s/it/is/

+     Having more than one function call is only profitible if there is
+     enough data to process; see below for the used heuristic values.  */


profitable
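
For readers without the full patch at hand, the "up to three function calls" split described in the comment can be sketched as follows. This is only an illustration, not the patch's code: struct memset_plan and plan_memset are hypothetical names, the leading unaligned part is inferred from the alignment requirement (that bullet is not quoted above), and the heuristic thresholds are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not taken from the patch: byte counts for
   the up-to-three pieces of a device memset.  */
struct memset_plan
{
  size_t head;  /* 0 to 3 bytes before the first 4-byte boundary.  */
  size_t body;  /* Multiple of 4 bytes; candidate for a 32-bit fill.  */
  size_t tail;  /* Remaining 0 to 3 bytes after the body.  */
};

struct memset_plan
plan_memset (uintptr_t addr, size_t count)
{
  struct memset_plan p = { 0, 0, 0 };
  size_t misalign = addr & 3;

  /* Bytes needed to reach 4-byte alignment, capped by the request.  */
  p.head = misalign ? 4 - misalign : 0;
  if (p.head > count)
    p.head = count;

  /* After alignment, anything that is not a multiple of 4 is the tail.  */
  size_t rest = count - p.head;
  p.tail = rest & 3;
  p.body = rest - p.tail;
  return p;
}
```

With an address ending in 1 and a count of 10, this gives head 3, body 4, tail 3, i.e. three calls for ten bytes, which is the kind of case where a single copy is likely the better choice.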

+  uint8_t v8 = (uint8_t) val;


Is that safe for negative values? Probably, but I always worry it isn't.
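
On the negative-value question: in C, converting an int to an unsigned type is defined as reduction modulo 2^N, so the cast itself is well defined, and the standard memset likewise converts its int argument to unsigned char. A minimal sketch, assuming val is an int as in the quoted line; fill_byte and fill_word are hypothetical helpers, not names from the patch:

```c
#include <stdint.h>

/* Hypothetical helper: narrow the fill value the same way the quoted
   patch line does.  Conversion to uint8_t is reduction modulo 256, so
   negative inputs are well defined (-1 becomes 0xFF).  */
uint8_t
fill_byte (int val)
{
  return (uint8_t) val;
}

/* Hypothetical helper: replicate the byte into all four bytes of a
   32-bit word, as a uint32_t fill value would need.  */
uint32_t
fill_word (int val)
{
  return (uint32_t) fill_byte (val) * 0x01010101u;
}
```

So fill_byte (-1) is 0xFF and fill_word (-1) is 0xFFFFFFFF, matching what a host memset with -1 would store.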

+  /* Heuristik  */


Heuristic.

Otherwise the GCN parts LGTM.

Andrew
