https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102330

--- Comment #11 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Jakub, thanks again for your comments!


I've written down the following while working through the issue.  Maybe you'd
like to verify, and maybe it'll useful for someone later (if only my future
self...).


(In reply to Thomas Schwinge from comment #8)
> (In reply to Richard Biener from comment #1)
> > Confirmed.  omp expansion seems to introudce this non-gimple code and 
> > likely also makes 'i' not a register (for OACC):
> > 
> >   .data_dep.5D.4044 = .UNIQUE (OACC_PRIVATE, .data_dep.5D.4044, -1, 
> > &iD.4045);
> 
> That's built in 'gcc/omp-low.cc:lower_oacc_private_marker'.  It's not marked
> 'TREE_ADDRESSABLE' (per debug log above), but the address of 'i' has been
> taken (here).

> So if in 'gcc/omp-expand.cc:expand_oacc_for' I force 'TREE_ADDRESSABLE',
> then the ICE goes away, and we get valid GIMPLE generated, per my
> understanding.
> 
> So, something like this maybe (untested)?  
> 
>     --- gcc/omp-low.cc
>     +++ gcc/omp-low.cc
>     @@ -11513,6 +11513,7 @@ lower_oacc_private_marker (omp_context *ctx)
>             }
>            gcc_checking_assert (decl);
>      
>     +      TREE_ADDRESSABLE (decl) = 1;
>            tree addr = build_fold_addr_expr (decl);
>            args.safe_push (addr);
>          }

That's bogus: per 'gcc/omp-low.cc:oacc_privatization_candidate_p', we should
only be getting here for 'TREE_ADDRESSABLE' DECLs.  Adding such an 'assert', it
triggers exactly for test cases like those we're discussing here.

However, before we look into the OpenACC privatization details, let's first
turn your test case into an OpenMP-only code, as follows:

(Based on Jakub Jelinek from comment #7)
>  program p
>    i = 0
>    !$omp task shared(i)
>    i = 1
>    !$omp end task
>    !$omp taskwait
> -  !$acc parallel loop
> +  !$omp parallel do
>    do i = 1, 8
> +     call f(i)
>    end do
> +  contains
> +
> +    subroutine f(v)
> +      integer :: v
> +      !integer, value :: v
> +    end
>  end

The 'call f(i)' serves as a convenient means to toggle whether or not 'i' in
the loop body needs to be addressable.

Using the 'integer :: v' variant, the 'call f(i)' is by-reference, and thus 'i'
is made "early 'TREE_ADDRESSABLE'".

Relevant part of '*.original' dump:

    [...]
    void p ()
    {
      [...]
      integer(kind=4)D.8 iD.4232;

      iD.4232 = 0;
      #pragma omp task shared(iD.4232)
        {
          {
            {
              iD.4232 = 1;
            }
          }
        }
      __builtin_GOMP_taskwaitD.494 ();
      #pragma omp parallel
        {
          {
            #pragma omp for nowait
            for (iD.4232 = 1; iD.4232 <= 8; iD.4232 = iD.4232 + 1)
              {
                {
                  fD.4229 (&iD.4232);
                }
                L.1D.4233:;
              }
          }
        }
    }
    [...]

Given this "early 'TREE_ADDRESSABLE'" (notice 'fD.4229 (&iD.4232);'), already
the gimplifier replaces the original loop variable 'iD.4232' with a
non-addressable 'i.1D.4239'.  Relevant part (edited) of '*.original' vs.
'*.gimple' dumps diff:

           #pragma omp parallel
             {
    +          integer(kind=4)D.8 i.1D.4239;
    +
    -        #pragma omp for nowait
    -        for (iD.4232 = 1; iD.4232 <= 8; iD.4232 = iD.4232 + 1)
    +          #pragma omp for nowait private(i.1D.4239) private(iD.4232)
    +          for (i.1D.4239 = 1; i.1D.4239 <= 8; i.1D.4239 = i.1D.4239 + 1)
                 {
    +              iD.4232 = i.1D.4239;
                   {
                     fD.4229 (&iD.4232);
                   }

In particular, notice that now a 'private(i.1D.4239)' appears in addition to
'private(iD.4232)'.

The OMP lowering then again replaces the loop variable 'i.1D.4239' with
'i.1D.4255' (I have not researched why) -- but it doesn't update the
'private(i.1D.4239)' clause accordingly (so that one persists, continues to
point to the original, now-unused DECL), and/or it doesn't introduce a new
'private(i.1D.4255)' replacement clause.  Similar for the 'iD.4232' ->
'iD.4256' transformation and unchanged 'private(iD.4232)' clause.  Relevant
part (edited) of '*.gimple' vs. '*.omplower' dumps diff:

           #pragma omp parallel [...]
           {
    -          integer(kind=4)D.8 i.1D.4239;
    +              integer(kind=4)D.8 iD.4256;
    +              integer(kind=4)D.8 i.1D.4255;

                   #pragma omp for nowait private(i.1D.4239) private(iD.4232)
    -          for (i.1D.4239 = 1; i.1D.4239 <= 8; i.1D.4239 = i.1D.4239 + 1)
    +              for (i.1D.4255 = 1; i.1D.4255 <= 8; i.1D.4255 = i.1D.4255 +
1)
                   {
    -              iD.4232 = i.1D.4239;
    +              iD.4256 = i.1D.4255;
                   {
    -                fD.4229 (&iD.4232);
    +                fD.4229 (&iD.4256); [static-chain: [...]]

I have confirmed that 'lookup_decl([i.1D.4239], ctx)' -> 'i.1D.4255' and
'lookup_decl([iD.4232], ctx)' -> 'iD.4256'.  The underlying assumption must be
that any such later use of the 'OMP_CLAUSE_DECL' of a 'OMP_CLAUSE_PRIVATE' is
filtered through 'lookup_decl'.  In other words, I assume it is intentional
that the 'OMP_CLAUSE_DECL' of a 'OMP_CLAUSE_PRIVATE' continues to point to the
original DECL, presumably because we still need to look up information from
that one?

Now, changing the scenario to the 'integer, value :: v' variant, the 'call
f(i)' is by-value, and 'i' is not made "early 'TREE_ADDRESSABLE'".

Relevant part (edited) of by-reference vs. by-value '*.original' dumps diff:

    -              fD.4229 (&iD.4232);
    +              fD.4229 (iD.4232);

Relevant part (edited) of by-reference vs. by-value '*.gimple' dumps diff:

    -          integer(kind=4)D.8 i.1D.4239;
    -
    -          #pragma omp for nowait private(i.1D.4239) private(iD.4232)
    -          for (i.1D.4239 = 1; i.1D.4239 <= 8; i.1D.4239 = i.1D.4239 + 1)
    +      #pragma omp for nowait private(iD.4232)
    +      for (iD.4232 = 1; iD.4232 <= 8; iD.4232 = iD.4232 + 1)
                 {
    -              iD.4232 = i.1D.4239;
                   {
    -                fD.4229 (&iD.4232);
    +            fD.4229 (iD.4232);

As 'iD.4232' itself is non-addressable at this time, the gimplifier doesn't
here introduce a non-addressable loop variable and corresponding 'private'
clause.

As you (Jakub) have explained above, it's now on the OMP lowering to make
'iD.4232' "late 'TREE_ADDRESSABLE'", but in the loop body, the loop variable
'iD.4254' can directly be used in the 'fD.4229 (iD.4254);' function call. 
Relevant part of '*.omplower' dump:

    [...]
        #pragma omp parallel shared(FRAME.1D.4239) [child fn:
MAIN__._omp_fn.1D.4249 (.omp_data_o.6D.4259)]
          {
            .omp_data_iD.4251 = (struct .omp_data_s.4D.4248 & restrict)
&.omp_data_o.6D.4259;
            {
              integer(kind=4)D.8 iD.4254;

              #pragma omp for nowait private(iD.4232)
              for (iD.4254 = 1; iD.4254 <= 8; iD.4254 = iD.4254 + 1)
              {
                D.4258 = .omp_data_iD.4251->FRAME.1D.4252;
                fD.4229 (iD.4254); [static-chain: D.4258]
    [...]

Again I have confirmed that 'lookup_decl([iD.4232], ctx)' -> 'iD.4254'.

And now, I understand the problem regarding the OpenACC privatization is that
it does the analysis ('gcc/omp-low.cc:oacc_privatization_scan_clause_chain'
etc.) on the original DECL, but does the transformation
('gcc/omp-low.cc:lower_oacc_private_marker' etc.) on the 'lookup_decl'ed DECL
-- and these need not correspond regarding 'TREE_ADDRESSABLE', as we've seen
above.

Now I'm looking into at which end to fix this: analysis vs. transformation,
and/or whether loop variables should altogether be excempt from this
transformation (but that may actually fall out of the former).

Reply via email to