Hi Jakub,

On 30.06.22 10:21, Jakub Jelinek wrote:
> So, what is the plan with reverse offload?

My idea was to just call omp_target_ext with 'device(omp_initial_device)'.
This then automatically works when called from a target region that runs
on omp_get_initial_device(). For the actual device part, this can be
implemented incrementally by supporting reverse_offload for one device
type at a time.

For getting it to work when the code enclosing the ancestor:1 target
region runs on an offloading device, my idea is the following. Comments
are welcome!

The plan is to do the same as is done for I/O (which is supported for
both nvptx and gcn).

For GCN: libgomp/plugin/plugin-gcn.c has:

  struct kernargs {
    /* A pointer to struct output, below, for console output data.  */
    int64_t out_ptr;

    /* A pointer to struct heap, below.  */
    int64_t heap_ptr;

    /* A pointer to an ephemeral memory arena.
       Only needed for OpenMP.  */
    int64_t arena_ptr;

    /* to be added: */
    /* A pointer to reverse-offload.  */
    int64_t rev_ptr;

    /* Now come the actual structs.  */

    /* Output data.  */
    struct output {
      int return_value;
      unsigned int next_output;
      struct printf_data { ... };

This gets initialized on the host, which then polls while waiting for
the kernel to finish:

  while (hsa_fns.hsa_signal_wait_acquire_fn (s, HSA_SIGNAL_CONDITION_LT, 1,
                                             1000 * 1000,
                                             HSA_WAIT_STATE_BLOCKED) != 0)
    console_output (kernel, shadow->kernarg_address, false);

with, inside console_output:

  unsigned int from = __atomic_load_n (&kernargs->output_data.consumed,
                                       __ATOMIC_ACQUIRE);

The I/O itself is implemented in newlib,
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/sys/amdgcn/write.c

  register void **kernargs asm("s8");
  struct output *data = (struct output *)kernargs[2];

and then the data is filled.
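
To make the polling scheme concrete, here is a rough, single-threaded
sketch of a queue that the device side fills and the host side drains.
It only illustrates the pattern: struct output_sketch, QUEUE_LEN,
device_write, host_drain and console_demo are hypothetical names, the
layout is not the one in plugin-gcn.c, and a full queue is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_LEN 8   /* hypothetical; not the size used by libgomp */

/* Hypothetical, simplified stand-in for the real 'struct output'.  */
struct output_sketch {
  unsigned int next_output;   /* next slot a writer will claim */
  unsigned int consumed;      /* first slot the host has not read yet */
  struct { int written; char msg[32]; } queue[QUEUE_LEN];
};

/* "Device" side: claim a slot, fill it, then publish it.  */
static void device_write (struct output_sketch *o, const char *text)
{
  unsigned int slot
    = __atomic_fetch_add (&o->next_output, 1, __ATOMIC_RELAXED) % QUEUE_LEN;
  strncpy (o->queue[slot].msg, text, sizeof o->queue[slot].msg - 1);
  o->queue[slot].msg[sizeof o->queue[slot].msg - 1] = '\0';
  /* The release store makes the message visible before the flag.  */
  __atomic_store_n (&o->queue[slot].written, 1, __ATOMIC_RELEASE);
}

/* Host side: append every published entry to BUF; return the count.  */
static unsigned int host_drain (struct output_sketch *o, char *buf,
                                size_t buflen)
{
  assert (buflen > 0);
  unsigned int from = __atomic_load_n (&o->consumed, __ATOMIC_ACQUIRE);
  unsigned int to = __atomic_load_n (&o->next_output, __ATOMIC_ACQUIRE);
  unsigned int n = 0;
  for (unsigned int i = from; i != to; i++)
    {
      unsigned int slot = i % QUEUE_LEN;
      if (!__atomic_load_n (&o->queue[slot].written, __ATOMIC_ACQUIRE))
        break;   /* slot claimed but not yet filled; retry on next poll */
      strncat (buf, o->queue[slot].msg, buflen - strlen (buf) - 1);
      __atomic_store_n (&o->queue[slot].written, 0, __ATOMIC_RELEASE);
      n++;
    }
  __atomic_store_n (&o->consumed, from + n, __ATOMIC_RELEASE);
  return n;
}

/* Two device writes followed by one host poll.  */
unsigned int console_demo (char *buf, size_t buflen)
{
  struct output_sketch out = { 0 };
  buf[0] = '\0';
  device_write (&out, "hello ");
  device_write (&out, "world\n");
  return host_drain (&out, buf, buflen);
}
```

In the real plugin the host would call the drain step from the
signal-wait loop; here console_demo simply runs both sides in sequence.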

For reverse offload, the idea is to fill it on the device side via
/libgomp/config/gcn/target.c's GOMP_target_ext for
device == GOMP_DEVICE_HOST_FALLBACK && fn != NULL as:

- Try to obtain a lock (busy wait)
- Put addr/kinds/sizes into the struct
- Put the device's fn pointer in the struct
- Busy wait for completion ('while (fn != NULL) { }')
- Unlock

And on the host side:

- If fn == NULL (= no request pending): return to the output/offload
  checking loop
- Otherwise: call a new function in target.c and pass the args to it.
  Once it has completed, set fn = NULL to indicate that the request has
  been processed.

And in target.c's new reverse-offload-handling function:

- Find the generated target function on the host, based on the device
  stub function's pointer address
- Handle the mapping
- Call the host function
- Handle the mapping
- Return

Additionally: If 'requires reverse_offload' is set, fill not only the
normal splay_tree for "host -> device" lookup but also another one for
the "device -> host" lookups.

Does this make sense?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955
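
A minimal sketch of the handshake described above, written as
single-threaded steps so it can be followed deterministically. All
names (struct rev_offload, struct rev_map_entry, device_post,
device_finish, host_poll, rev_offload_demo) and the address 0x1000 are
assumptions made for illustration; the mapping steps are left out, and
a linear table stands in for the "device -> host" splay tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request slot; not an existing libgomp struct.  */
struct rev_offload {
  int lock;               /* 0 = free, 1 = held by a device thread */
  uintptr_t fn;           /* device stub address; 0 = no request */
  size_t mapnum;
  void **addrs;
  size_t *sizes;
  unsigned short *kinds;
};

/* Hypothetical "device -> host" lookup entry.  */
struct rev_map_entry { uintptr_t dev_fn; void (*host_fn) (void **); };

/* Device: take the lock, fill the slot, publish fn last.
   Returns false if the lock is busy; real code would spin here.  */
static bool device_post (struct rev_offload *r, uintptr_t dev_fn,
                         size_t mapnum, void **addrs, size_t *sizes,
                         unsigned short *kinds)
{
  int expected = 0;
  if (!__atomic_compare_exchange_n (&r->lock, &expected, 1, false,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    return false;
  assert (__atomic_load_n (&r->fn, __ATOMIC_RELAXED) == 0);
  r->mapnum = mapnum;
  r->addrs = addrs;
  r->sizes = sizes;
  r->kinds = kinds;
  __atomic_store_n (&r->fn, dev_fn, __ATOMIC_RELEASE);
  return true;
}

/* Device: one iteration of 'while (fn != NULL) { }', then unlock.  */
static bool device_finish (struct rev_offload *r)
{
  if (__atomic_load_n (&r->fn, __ATOMIC_ACQUIRE) != 0)
    return false;   /* host has not processed the request yet */
  __atomic_store_n (&r->lock, 0, __ATOMIC_RELEASE);
  return true;
}

/* Host: one poll.  Returns false if no request is pending.  */
static bool host_poll (struct rev_offload *r,
                       const struct rev_map_entry *map, size_t n)
{
  uintptr_t fn = __atomic_load_n (&r->fn, __ATOMIC_ACQUIRE);
  if (fn == 0)
    return false;
  for (size_t i = 0; i < n; i++)
    if (map[i].dev_fn == fn)
      {
        map[i].host_fn (r->addrs);   /* mapping handling omitted */
        break;
      }
  __atomic_store_n (&r->fn, 0, __ATOMIC_RELEASE);   /* mark as processed */
  return true;
}

static void host_add_one (void **addrs) { *(int *) addrs[0] += 1; }

/* Walk through one request; returns the updated value or a negative
   step number if a step behaves unexpectedly.  */
int rev_offload_demo (void)
{
  struct rev_offload slot = { 0 };
  struct rev_map_entry map[] = { { 0x1000, host_add_one } };
  int x = 41;
  void *addrs[] = { &x };
  size_t sizes[] = { sizeof x };
  unsigned short kinds[] = { 0 };

  if (!device_post (&slot, 0x1000, 1, addrs, sizes, kinds)) return -1;
  if (device_finish (&slot)) return -2;     /* must still be pending */
  if (!host_poll (&slot, map, 1)) return -3;
  if (!device_finish (&slot)) return -4;
  return x;
}
```

On a real device, device_post and device_finish would each sit inside a
busy-wait loop, and host_poll would be called from the same loop that
already checks for console output.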