https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105873
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think it might be interesting to see which private values are used when:

#pragma omp declare target
int
foo (void)
{
  int result = 0;
  void **buf = __builtin_malloc (8192 * 2 * sizeof (void *));
  #pragma omp taskgroup task_reduction(+: result)
  {
    for(int i = 0; i < 8192; ++i) {
      #pragma omp task in_reduction(+: result)
      {
        result += 1;
        buf[2 * i] = &result;
        buf[2 * i + 1] = (void *) (__INTPTR_TYPE__) result;
      }
    }
  }
  for(int i = 0; i < 8192; ++i)
    __builtin_printf ("%d %p %d\n", i, buf[2 * i], (int) (__INTPTR_TYPE__) buf[2 * i + 1]);
  __builtin_free (buf);
  return result;
}
#pragma omp end declare target

int
main ()
{
  int r = 0;
  #pragma omp target parallel map(from:r) num_threads(8)
  #pragma omp single
  r = foo ();
  if (r != 8192)
    __builtin_abort ();
  return 0;
}

If it reproduces even with a value smaller than 8192, it would be nice to use
a smaller value that still reproduces.
If it hangs before printing that, it would be nice to find out where.

The code in foo calls GOMP_taskgroup_start and
GOMP_taskgroup_reduction_register (the latter should allocate the 8 private
copies of result), then in the task body it calls GOMP_task_reduction_remap,
and finally at the end of the taskgroup it calls GOMP_taskgroup_end, performs
(serially) the reduction from those 8 private copies, and then calls
GOMP_taskgroup_reduction_unregister to free them.

Adding some printfs somewhere on the library side might help, e.g. to find out
whether GOMP_taskgroup_end has been reached (it should be reached by a single
thread only) and whether it has finished.
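
The expected behaviour of that call sequence can be modeled in plain C without
libgomp. What follows is only a minimal sketch, not libgomp's implementation:
it assumes one private copy of result per thread, hands tasks to threads
round-robin (a scheduling policy made up for this illustration), and the names
model_task_reduction and MAX_THREADS are hypothetical. It only shows why the
serial combination at the end of the taskgroup should give the task count no
matter which private copy each task used.

```c
/* Hypothetical model, not libgomp code: upper bound on modeled threads.  */
#define MAX_THREADS 64

/* Models a taskgroup with task_reduction(+: result) containing NTASKS tasks
   that each do "result += 1".  Returns the reduced value, or -1 if NTHREADS
   is out of range.  */
int
model_task_reduction (int ntasks, int nthreads)
{
  int priv[MAX_THREADS];
  int result = 0;

  if (nthreads < 1 || nthreads > MAX_THREADS)
    return -1;

  /* "Register" step: one private copy per thread, set to the identity of +.  */
  for (int t = 0; t < nthreads; ++t)
    priv[t] = 0;

  /* "Remap" step: each task updates the private copy of the thread that
     runs it.  Round-robin assignment is an assumption of this sketch.  */
  for (int i = 0; i < ntasks; ++i)
    priv[i % nthreads] += 1;

  /* End of taskgroup: one thread combines the private copies serially.  */
  for (int t = 0; t < nthreads; ++t)
    result += priv[t];

  return result;
}
```

Under this model, model_task_reduction (8192, 8) returns 8192, the value main
checks for. In the real test case, the printed pointers would show which of
the private copies each task actually used.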