https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118518

--- Comment #16 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Hi @all

I just wanted to give you an update on how my library is progressing and what
additional bugs I've found in GCC so far.

Since some of you are familiar with my code, perhaps you can help to fix them,
as these are, I think, problems in GCC...


I ported my code to OpenMP and tried to make the GPU offloading more object
oriented. That is, instead of having global values for the offload, I put the
offloading code into the mdspan class, which can also do some checks, support
multiple offloading devices and so on, before it calls the globally available
mapping functions for the offloading references...

Apparently, GCC can then no longer see in the accelerated loops that the
data structures were really offloaded...

I filed a bug on this here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814

I also compared with clang.

It is embarrassing to load a computation into a computer and have exactly the
same code yield a totally different result with two compilers. Unfortunately,
the result from GCC is incorrect. It appears to compute on the host, for
whatever reason, instead of on the device...

Note that if I map the structs with the globally available helper functions,
instead of with object members, it works fine. But of course this is
inconvenient for multi-device support and so on...
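
To make the two styles concrete, here is a minimal sketch. It is not the
library code and not the reproducer attached to the bug: the `View` struct and
the names `map_view_to_device`, `unmap_view_from_device`, `offload`, `release`
and `scale` are all made up for illustration. It only shows a free helper that
maps a struct and its data, a member function that runs a check and then calls
that helper, and a loop that relies on the data already being mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the mdspan-like class.
struct View {
    double*     data;
    std::size_t n;

    bool offload();   // member-function style, defined below
    void release();
};

// "Global helper" style: map the struct first, then the array it points to,
// so that the device copy of v.data is attached to the device array.
void map_view_to_device(View& v) {
    assert(v.data != nullptr);
    #pragma omp target enter data map(to: v)
    #pragma omp target enter data map(to: v.data[0:v.n])
}

// Copy the array back to the host and drop the struct mapping.
void unmap_view_from_device(View& v) {
    #pragma omp target exit data map(from: v.data[0:v.n])
    #pragma omp target exit data map(release: v)
}

// "Object member" style: do some checks, then call the global helper.
bool View::offload() {
    if (data == nullptr || n == 0) return false;
    map_view_to_device(*this);
    return true;
}

void View::release() { unmap_view_from_device(*this); }

// Loop that expects v and v.data to be present on the device already.
void scale(View& v, double factor) {
    const std::size_t n = v.n;
    #pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < n; ++i)
        v.data[i] *= factor;
}
```

A caller would wrap a host array in a `View`, call `offload()`, run `scale()`,
and then call `release()` to get the results back. Built without OpenMP
support the pragmas are ignored and everything simply runs on the host, so the
sketch only documents the intended structure; it does not by itself show the
wrong result described above.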

One can of course replace the mapping macros with pointers. My attempts to do
that led to the runtime exiting with errors saying that the mapping was not
complete, or something like that. The macros do not cause these problems, but
when the loop construct does not recognize the variables as mapped (even
though the data are mapped), it is still a bit inconvenient...



Ah, and I also came across this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753

The offloading functions require that space is also allocated on the host,
which makes it problematic to use mappings for temporary data on the GPU.
Functions like device_alloc would help, but then one is left with a device
pointer.

If that device pointer is a member of a struct, the loop constructs do not
recognize it in is_device_ptr, as they cannot follow the dots. The mapping
commands can follow the dots, but they require host allocation.
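
As an illustration of the device-pointer case, here is a minimal sketch of one
possible workaround: copy the struct member into a local pointer variable and
name that variable in `is_device_ptr`. This assumes that "functions like
device_alloc" means the standard `omp_target_alloc` routine. The
`DeviceBuffer` struct and the function names are hypothetical and not taken
from the library. Without OpenMP support the sketch falls back to plain
`malloc` so that it still builds and runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical struct holding a device-only buffer (no host copy).
struct DeviceBuffer {
    double*     dev_ptr;
    std::size_t n;
};

DeviceBuffer device_buffer_create(std::size_t n) {
#ifdef _OPENMP
    void* p = omp_target_alloc(n * sizeof(double), omp_get_default_device());
#else
    void* p = std::malloc(n * sizeof(double));  // host-only fallback
#endif
    return DeviceBuffer{static_cast<double*>(p), n};
}

void device_buffer_destroy(DeviceBuffer& buf) {
#ifdef _OPENMP
    omp_target_free(buf.dev_ptr, omp_get_default_device());
#else
    std::free(buf.dev_ptr);
#endif
    buf.dev_ptr = nullptr;
    buf.n = 0;
}

// Fills the buffer with 2*i on the device and returns the sum.
double sum_of_doubled_indices(const DeviceBuffer& buf) {
    assert(buf.dev_ptr != nullptr);
    // is_device_ptr takes a plain variable, not buf.dev_ptr,
    // so the member is copied into a local pointer first.
    double* p = buf.dev_ptr;
    const std::size_t n = buf.n;

    #pragma omp target teams distribute parallel for is_device_ptr(p)
    for (std::size_t i = 0; i < n; ++i)
        p[i] = 2.0 * static_cast<double>(i);

    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        reduction(+ : sum) map(tofrom : sum) is_device_ptr(p)
    for (std::size_t i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}
```

The local copy keeps the temporary data device-only, but it has to be repeated
in every function that launches a loop, which is the inconvenience described
above.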