https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118518
--- Comment #16 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---

Hi @all,

I just wanted to give you an update on how my library is progressing and on the additional GCC bugs I have found so far. Since some of you are familiar with my code, perhaps you can help fix them, as I think these are problems in GCC.

I ported my code to OpenMP and tried to make the GPU offloading more object-oriented. Instead of using global variables for the offload, I moved the offloading code into the mdspan class. The class can then perform some checks, support multiple offloading devices and so on, before it calls the globally available mapping functions for the offloaded references.

Apparently, GCC can then no longer see inside the accelerated loops that the data structures were actually offloaded. I filed a bug on this here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814

I also compared with Clang. It is embarrassing to run a computation and get totally different results from exactly the same code on two compilers. Unfortunately, GCC's result is the incorrect one: for whatever reason, it appears to compute on the host instead of on the device.

Note that if I map the structs with the globally available helper functions instead of with object members, it works fine. But of course this is inconvenient for multi-device support and so on.

One can of course replace the mapping macros with pointers. My attempts to do that ran into runtime errors where the program exits saying that the mapping was not complete, or something similar. The macros do not cause these problems, but when the loop construct does not recognize the variables as mapped (even though the data are mapped), it is still inconvenient.

Ah, and I also came across this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753 The offloading functions require that the data also have storage allocated on the host, which makes it problematic to use mappings for temporary data that should live only on the GPU.
Functions like device_alloc would help, but then one is left with a device pointer. If that device pointer is a member of a struct, the loop constructs do not recognize it in is_device_ptr, because they cannot follow the member access (the dots). The mapping directives can follow the dots, but they require host allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753
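For device-only temporaries, a common workaround is to allocate with the standard OpenMP routine omp_target_alloc and, right before the loop, copy the struct member into a local pointer so that is_device_ptr names a plain variable rather than s.dev. A minimal sketch follows; the Scratch struct and the scratch_alloc, fill_and_sum and scratch_free helpers are hypothetical names for illustration, not part of the library. The #ifdef _OPENMP branches fall back to malloc/free so the sketch also builds and runs on the host without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical struct (illustration only): a device-only scratch buffer
// with no host copy.
struct Scratch {
    double* dev = nullptr;  // device address; never dereference on the host
    std::size_t n = 0;
    int device = 0;         // device number the buffer lives on
};

Scratch scratch_alloc(std::size_t n) {
    Scratch s;
    s.n = n;
#ifdef _OPENMP
    s.device = omp_get_default_device();
    s.dev = static_cast<double*>(omp_target_alloc(n * sizeof(double), s.device));
#else
    // Built without OpenMP: everything runs on the host anyway.
    s.dev = static_cast<double*>(std::malloc(n * sizeof(double)));
#endif
    assert(s.dev != nullptr);
    return s;
}

void scratch_free(Scratch& s) {
#ifdef _OPENMP
    omp_target_free(s.dev, s.device);
#else
    std::free(s.dev);
#endif
    s.dev = nullptr;
}

// Writes i*i into the scratch buffer on the device and returns their sum.
double fill_and_sum(const Scratch& s) {
    // is_device_ptr cannot name s.dev (see above), so copy the member into
    // a local pointer that the clause can refer to directly.
    double* d = s.dev;
    const std::size_t n = s.n;
    const int dev = s.device;
    double sum = 0.0;
    #pragma omp target teams distribute parallel for device(dev) \
        is_device_ptr(d) map(tofrom: sum) reduction(+: sum)
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = static_cast<double>(i) * static_cast<double>(i);
        sum += d[i];
    }
    return sum;
}
```

Only the sum travels back to the host; the buffer itself never gets a host copy, which is the point of using omp_target_alloc instead of a map clause here.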
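For the member-function mapping problem reported earlier in this comment (bug 120814), here is a minimal sketch of one way to keep the mapping inside a class while still giving every directive plain names. It assumes a hypothetical DeviceVector class (not the library's mdspan): each member function copies the member pointer and size into local variables before the map clauses and the loop. It has not been checked against the GCC behavior in 120814; without -fopenmp the pragmas are ignored and everything runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (illustration only): owns host storage and manages
// its own device mapping through member functions.
class DeviceVector {
public:
    explicit DeviceVector(std::size_t n, double init = 0.0) : host_(n, init) {}

    // Copy the host data to the default device and keep it mapped.
    void map_to_device() {
        double* p = host_.data();  // local copies: the clauses see plain names
        std::size_t n = host_.size();
        #pragma omp target enter data map(to: p[0:n])
    }

    // Copy the device data back to the host and drop the mapping.
    void unmap_from_device() {
        double* p = host_.data();
        std::size_t n = host_.size();
        #pragma omp target exit data map(from: p[0:n])
    }

    // Multiply every element by factor on the device; call map_to_device() first.
    void scale(double factor) {
        double* p = host_.data();
        const std::size_t n = host_.size();
        // The section is already present (same host address), so this map
        // clause only adjusts the reference count and transfers nothing.
        #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
        for (std::size_t i = 0; i < n; ++i) p[i] *= factor;
    }

    double operator[](std::size_t i) const { return host_[i]; }
    std::size_t size() const { return host_.size(); }

private:
    std::vector<double> host_;
};
```

Typical use: construct the object, call map_to_device(), run one or more scale() calls, then call unmap_from_device() to bring the results back to the host.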