https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94848

            Bug ID: 94848
           Summary: [Offloading][LTO] partial var elimination errors /
                    -ftree-pre causes link errors |
                    libgomp.fortran/use_device_ptr-optional-3.f90 failures
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Compiling
  gfortran -fopenmp libgomp.fortran/use_device_ptr-optional-3.f90 \
      -O1 -foffload=-lgfortran -ftree-pre

with actual offloading (amdgcn, nvidia) fails with:
  /tmp/ccUEo7uX.o:(.gnu.offload_vars+0x10): undefined reference to `A.12.5'

It works with -fno-tree-pre (or when compiling without actual offloading).

The problematic optimization must happen on the host side, since passing
-foffload="-O0 -lgfortran" does not solve the issue.


In the Fortran code, this array (A.12) appears in a device function ("omp
declare target") as:
    if (any (c_z /= [1,2,3])) stop 37

The other array that shows up in the dumps below (A.9) comes from:
    if (any (x /= [3,4,6,2]))  stop 44

In the dump, the two arrays appear as:
  static integer(kind=4) A.12[3] = {1, 2, 3};
  static integer(kind=4) A.9[4] = {3, 4, 6, 2};
…
  _20 = A.9[S.10];
…
  _26 = A.12[S.13_67];


In the optimized dump (-fno-tree-pre):
  ivtmp.333_78 = (unsigned long) &A.9;
…
  ivtmp.325_89 = (unsigned long) &A.12;

With -ftree-pre, the last assignment (the one taking &A.12) is gone, but
  <bb 43> [local count: 428295]:
  _gfortran_stop_numeric (37, 0);
still exists. Here, the comparison against the array has been "unrolled", i.e.:

  if (_61 != 1)
    goto <bb 43>; [5.50%]

(Followed by the conditions for "2" and "3".)

That's perfectly fine and optimizes "A.12" away.
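
To illustrate why unrolling removes the last use of the array, here is a
minimal C sketch. It is neither the Fortran test case nor GCC's actual output:
A_12 stands in for the compiler-generated A.12, and the function names
mismatch_loop and mismatch_unrolled are made up for this example. Both
functions compute the same result, but only the first still references the
array:

```c
#include <stddef.h>

/* Stand-in for the compiler-generated constant A.12 = {1, 2, 3}. */
static const int A_12[3] = {1, 2, 3};

/* Loop form: reads A_12, so the array has to exist wherever this runs. */
int mismatch_loop(const int c_z[3])
{
    for (size_t i = 0; i < 3; i++)
        if (c_z[i] != A_12[i])
            return 1;           /* corresponds to "stop 37" */
    return 0;
}

/* Unrolled form: the constants are folded into the comparisons,
   so nothing in this function references A_12. */
int mismatch_unrolled(const int c_z[3])
{
    if (c_z[0] != 1) return 1;  /* "if (_61 != 1) goto <bb 43>" */
    if (c_z[1] != 2) return 1;
    if (c_z[2] != 3) return 1;
    return 0;
}
```

If only the unrolled form is left on the host, a static array like A_12 has no
remaining users there and can be dropped, while code that keeps the loop form
(as the device-side dump does) still needs it.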

 * * *

If I look at the dumps (-fdump-tree-all) on the device side, those (still)
contain:
  pretmp_157 = A.12[_15];
…
  if (_134 != pretmp_157)
     goto <bb 45>; [5.50%]

My impression is that the local static variable "A.12" is removed before
writing the LTO data, based on the -ftree-pre analysis, whereas the expression
that uses it is written out before that removal. At least that would explain
why it fails on the device side.
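
The suspected ordering problem can be modelled with a toy example in C. This is
purely a sketch of the hypothesis above: struct symbol, its emitted flag and
count_dangling are hypothetical names and do not correspond to GCC's LTO
streamer or its data structures. References are recorded first; if a symbol is
dropped afterwards, a reference to it is left without a definition:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not GCC code: a symbol and whether it is
   actually written to the output. */
struct symbol {
    const char *name;
    bool emitted;
};

/* Count the recorded references whose target symbol is not emitted. */
size_t count_dangling(const struct symbol *syms, size_t nsyms,
                      const char *const refs[], size_t nrefs)
{
    size_t dangling = 0;
    for (size_t r = 0; r < nrefs; r++) {
        bool found = false;
        for (size_t s = 0; s < nsyms; s++)
            if (syms[s].emitted && strcmp(syms[s].name, refs[r]) == 0)
                found = true;
        if (!found)
            dangling++;         /* would surface as "undefined reference" */
    }
    return dangling;
}
```

With both "A.9" and "A.12" emitted and referenced, the count is zero. Clearing
the emitted flag of "A.12" after its reference was recorded yields one dangling
reference, which is the shape of the link error reported above.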
