[Bug target/101544] [OpenMP][AMDGCN][nvptx] C++ offloading: unresolved _Znwm = "operator new(unsigned long)"

schulz.benjamin at googlemail dot com via Gcc-bugs Thu, 16 Jan 2025 10:41:54 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101544


--- Comment #16 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
"BTW, if you're calling 'new' in the offload kernel then you're probably 'doing
it wrong'."

I do not think so. For more complex mathematical algorithms, there are many
situations where we need temporary buffers to store intermediate data.

Let's look, for example, at the QR decomposition described here:

https://arxiv.org/pdf/1812.02056

You need a copy of the initial data, and you need to allocate space for a
temporary matrix C on p. 5...

The host can ask the target to allocate this with omp_target_alloc, with
#pragma omp target enter data map(alloc:), or with a map(alloc:) clause on
#pragma omp target.
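
A minimal sketch of the host-directed variant, using a map(alloc:) clause on a
target construct. The function name and the arithmetic are made up for
illustration; only the pragma and its clauses are standard OpenMP. The scratch
buffer tmp is supplied by the caller, so the host decides that it exists on the
device:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: scales `in` by `factor`, adds 1, and writes to `out`.
// `tmp` is scratch space whose device copy is requested by the host.
void scale_with_host_directed_scratch(const double *in, double *out,
                                      double *tmp, std::size_t n,
                                      double factor)
{
    assert(in != nullptr && out != nullptr && tmp != nullptr);

    // map(alloc:) creates uninitialised device storage for tmp;
    // no data is copied to or from the host for it.
    #pragma omp target map(to: in[0:n]) map(alloc: tmp[0:n]) map(from: out[0:n])
    {
        for (std::size_t i = 0; i < n; ++i)
            tmp[i] = in[i] * factor;   // fill the scratch buffer

        for (std::size_t i = 0; i < n; ++i)
            out[i] = tmp[i] + 1.0;     // consume the scratch buffer
    }
}
```

Without an offload device (or without -fopenmp) the region simply runs on the
host and uses the host copy of tmp.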

However, when you are inside a

#pragma omp declare target

region, there is not much reason why a function should not be able to create
the temporary buffer it needs at its beginning with a
double *x = new double[mysize] call on the target...


When the buffer is created on the target, this has the benefit that the caller
on the host can just wrap the function in a #pragma omp target region, map its
arguments to the device, and call it; the function then allocates every
temporary buffer it needs by itself at the beginning.

I think this is better code style than having the host instruct the target to
allocate a temporary buffer.

That way, the same function can also be run on the host, and runs on the target
only if wrapped in a #pragma omp target map(tofrom: data){myfunction(data);}
region.
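
A rough sketch of that style, with made-up function names and a trivial
computation standing in for a real algorithm. The routine allocates its own
temporary with new[] and can be called either directly on the host or from
inside a target region. On toolchains affected by this bug the device link step
fails on the unresolved _Znwm symbol, so this shows the intended usage, not
something that is known to offload everywhere:

```cpp
#include <cassert>
#include <cstddef>

#pragma omp declare target
// Hypothetical routine: replaces data[0..n) by its prefix sums, using a
// temporary buffer that the routine allocates and frees itself.
void prefix_sum_in_place(double *data, std::size_t n)
{
    double *tmp = new double[n];   // the allocation this bug report is about

    double running = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        running += data[i];
        tmp[i] = running;
    }
    for (std::size_t i = 0; i < n; ++i)
        data[i] = tmp[i];

    delete[] tmp;
}
#pragma omp end declare target

// Host-side wrapper: only the arguments are mapped; the temporary is never
// mentioned by the caller.
void prefix_sum_offloaded(double *data, std::size_t n)
{
    assert(data != nullptr);
    #pragma omp target map(tofrom: data[0:n])
    {
        prefix_sum_in_place(data, n);
    }
}
```

Calling prefix_sum_in_place(data, n) directly gives the host version of the
same code, with no changes to the function itself.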


One can, however, note that omp_alloc and omp_target_alloc are better suited to
this, since the OpenMP memory allocators allow one to specify whether the data
should go into fast memory or into memory for large data (provided the hardware
makes such a distinction). This allows one, e.g., to put the strides of a matrix
into fast memory and the matrix data into memory for large data. The
#pragma omp target map clauses and the new[] operator of C++ do not have this
flexibility.
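
A small sketch of the strides/data split with omp_alloc and the predefined
allocators omp_high_bw_mem_alloc and omp_large_cap_mem_alloc. The Matrix struct
and the helper names are hypothetical. Whether the two allocators really map to
different memory depends on the runtime and the hardware, so the code falls back
to the default allocator if a request fails, and to malloc when built without
OpenMP:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: allocate from "fast" or "large capacity" memory.
void *alloc_in(std::size_t bytes, bool fast)
{
#ifdef _OPENMP
    void *p = omp_alloc(bytes, fast ? omp_high_bw_mem_alloc
                                    : omp_large_cap_mem_alloc);
    if (p == nullptr)                       // request not satisfiable here
        p = omp_alloc(bytes, omp_default_mem_alloc);
    return p;
#else
    (void)fast;
    return std::malloc(bytes);              // assumption: no OpenMP runtime
#endif
}

void release(void *p)
{
#ifdef _OPENMP
    omp_free(p, omp_null_allocator);        // runtime finds the allocator
#else
    std::free(p);
#endif
}

// Hypothetical row-major matrix: small strides array, large data array.
struct Matrix {
    std::size_t *strides;
    double *data;
    std::size_t rows, cols;
};

Matrix make_matrix(std::size_t rows, std::size_t cols)
{
    Matrix m;
    m.rows = rows;
    m.cols = cols;
    m.strides = static_cast<std::size_t *>(
        alloc_in(2 * sizeof(std::size_t), true));
    m.data = static_cast<double *>(
        alloc_in(rows * cols * sizeof(double), false));
    assert(m.strides != nullptr && m.data != nullptr);
    m.strides[0] = cols;                    // step between rows
    m.strides[1] = 1;                       // step between columns
    return m;
}

double &at(Matrix &m, std::size_t i, std::size_t j)
{
    return m.data[i * m.strides[0] + j * m.strides[1]];
}

void destroy_matrix(Matrix &m)
{
    release(m.strides);
    release(m.data);
    m.strides = nullptr;
    m.data = nullptr;
}
```

Note that the allocator handle is an argument of omp_alloc; omp_target_alloc
takes a device number instead.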
