https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119325

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[15 Regression]             |[15 Regression]
                   |libgomp.c/simd-math-1.c     |libgomp.c/simd-math-1.c
                   |(gcn offloading): timeout   |(gcn offloading): timeout
                   |(for fmodf, remainderf)     |(for fmodf, remainderf)
                   |since                       |since
                   |r15-7284-g6b56e645a7b481    |r15-7257-g54bdeca3c62144

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Update:

* When newlib is included in the build (i.e. when bisecting properly), the
  fail-causing commit is r15-7257-g54bdeca3c62144:

    commit 54bdeca3c6214485d15454df30183a56ad3e473b
    Author: Richard Biener
    Date:   Tue Jan 28 16:20:30 2025 +0100

    middle-end/118684 - wrongly aligned stack local during expansion

* The testcase has an inconsistency: 'b' is missing from the map clauses.
  This does not seem to affect the fail, as 'b' then gets an implicit
  'map(tofrom: b)'. Still, it seems cleaner to add it explicitly
  (in the macro definition of TEST_FUN2):

-  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+  _Pragma ("omp target parallel for simd map(to:a,b) map(from:res)") \

* * *

Reduced example, still using offloading.
  I tried -O1, but with that the example no longer fails.
----------------------------

#include <math.h>
static volatile int idx = 0;

void test_fmodf (void) {
  float res[512], a[512], b[512];
  for (int i = 0; i < 512; i++)  {
       a[i] = -10.0 + ((10.0 - -10.0) / 512) * i;
       b[i] = 100.0 + ((-25.0 - 100.0) / 512) * i;
    }
  #pragma omp target parallel for simd map(to:a,b) map(from:res)
    for (int i = 0; i < 512; i++)
      res[i] = fmodf (a[i], b[i]);
  __builtin_printf ("%f\n", res[idx]);
}
int main (void) { test_fmodf (); }

* * *

If I compile the program, either the reduced or the full one, directly for
the offloading target (w/o specifying '-fopenmp'), it WORKS.

Namely, I tried (gcn compiler):

$build/gcc/xgcc -B $build/gcc -lm -L $build/amdgcn-amdhsa/gfx908/newlib/ \
  -march=gfx908 -I $inst/amdgcn-amdhsa/include/ -O2 -ftree-vectorize \
  -fno-math-errno -fopenmp-simd

LD_LIBRARY_PATH=/opt/rocm/lib .../accel/amdgcn-amdhsa/gcn-run ./a.out

* * *

For the OpenMP build, comparing a.xamdgcn-amdhsa.mkoffload.2.s shows no
differences, whereas gfx908/newlib/libm/machine/amdgcn/libm_a-v64sf_fmod.s
does differ.

The code uses v64sf_fmodf, v32sf_fmodf and fmodf.
