skatrak wrote:

> wsloop expects its parent block to be a parallel block which all threads will 
> execute and all of those threads will share the work of the nested loop nest.

Yes, `omp.wsloop` binds to the current team (usually the innermost 
`omp.parallel` parent). It doesn't have to be the direct parent, though; there 
can be other constructs in between (see the sketch below).
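
A minimal sketch of what I mean, using `fir.if` purely for illustration 
(`%cond`, `%lb`, `%ub` and `%step` are hypothetical values, and it remains the 
user's responsibility that all threads of the team reach the worksharing loop):

```mlir
omp.parallel {
  fir.if %cond {
    // Binds to the team of the enclosing omp.parallel even though a
    // fir.if sits between them.
    omp.wsloop {
      omp.loop_nest (%i) : i32 = (%lb) to (%ub) inclusive step (%step) {
        omp.yield
      }
      omp.terminator
    }
  }
  omp.terminator
}
```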

> Whereas the workshare.loop_nest op is semantically executed by a single 
> thread (because the workshare directive acts like it preserves the semantics 
> of single-threaded Fortran execution).

My understanding is that `omp.workshare` would be the operation defining a 
region with sequential execution (as if it were a single thread of the 
enclosing parallel region), but there can be worksharing loops inside, where 
all threads split the loop iterations, which is why I proposed using 
`omp.wsloop`. Thinking about this, maybe this could be implemented based on 
existing OpenMP operations:

```f90
subroutine workshare(A, B, C)
  integer, parameter :: N = 10
  integer, intent(in) :: A(N), B(N)
  integer, intent(out) :: C(N)
  integer :: tmp

  !$omp parallel workshare
  C = A + B
  tmp = N
  C = C + A
  !$omp end parallel workshare
end subroutine workshare
```
```mlir
func.func @workshare(%A : ..., %B : ..., %C : ...) {
  %N = arith.constant 10 : i32
  %tmp = fir.alloca i32
  omp.parallel {
    omp.wsloop {
      omp.loop_nest (%i) : i32 = (%...) to (%...) inclusive step (%...) {
        // C(%i) = A(%i) + B(%i)
        omp.yield
      }
      omp.terminator
    }
    omp.single {
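      // tmp = N: scalar assignment, executed by a single thread of the team.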
      fir.store %N to %tmp : !fir.ref<i32>
      omp.terminator
    }
    omp.wsloop {
      omp.loop_nest (%i) : i32 = (%...) to (%...) inclusive step (%...) {
        // C(%i) = C(%i) + A(%i)
        omp.yield
      }
      omp.terminator
    }
    omp.terminator
  }
  func.return
}
```
Maybe support for this construct could be based purely on changes to how the 
MLIR representation is built in the first place; what do you think? Otherwise, 
something more similar to your proposal for `workdistribute` would be possible 
too: introduce only the `omp.workshare` operation, keep `fir.do_loop` inside, 
and have some sort of pass translate this to `omp.wsloop` and `omp.single` (or 
split the parent `omp.parallel`). I just think that seems a bit too complex; a 
rough sketch of that alternative is below.
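
For reference, this is roughly what that alternative pre-lowering form might 
look like. It is only a sketch: the exact shape of `omp.workshare` is not 
defined yet, and `%c1`/`%cN` stand in for hypothetical loop-bound constants.

```mlir
omp.parallel {
  // omp.workshare keeps the original sequential FIR; a later pass would
  // rewrite each fir.do_loop into an omp.wsloop and wrap the scalar
  // assignment in an omp.single.
  omp.workshare {
    fir.do_loop %i = %c1 to %cN step %c1 {
      // C(%i) = A(%i) + B(%i)
    }
    fir.store %N to %tmp : !fir.ref<i32>
    fir.do_loop %i = %c1 to %cN step %c1 {
      // C(%i) = C(%i) + A(%i)
    }
    omp.terminator
  }
  omp.terminator
}
```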


https://github.com/llvm/llvm-project/pull/101445