ivanradanov wrote:

I think there are two main approaches:

### Have alternative intrinsic implementations in their own runtime library

e.g. the workshare version of the assignment runtime function would look 
something like this:

```c
// Workshare version of the assignment runtime function. It expects to be
// called from inside an omp.parallel region, so the loop iterations are
// shared among the team's threads.
void workshare_Assign(double *a, const double *b, size_t size) {
  #pragma omp for
  for (size_t i = 0; i < size; ++i)
    a[i] = b[i];
}
```

Then, for the supported subset, the lowering of the array intrinsic becomes 
not

```
omp.parallel {
  omp.single {
    call Assign()
  }
}
```

but 

```
omp.parallel {
  call workshare_Assign()
}
```

(This could also be implemented as a flag passed to the existing runtime 
function telling it which version to use, rather than as a separate runtime 
function.)
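The flag variant could be sketched as follows. This is only an illustration of the dispatch idea; the function name, the `double` element type, and the `inside_parallel` parameter are all hypothetical and not the actual Flang runtime API:

```c
#include <stddef.h>

// Hypothetical single entry point that dispatches on a flag telling it
// whether it is being called from inside an omp.parallel region.
void Assign(double *a, const double *b, size_t size, int inside_parallel) {
  if (inside_parallel) {
    // Worksharing path: the enclosing parallel region's threads split
    // the iterations among themselves. Called outside a parallel region,
    // this orphaned "omp for" simply runs on the single calling thread.
    #pragma omp for
    for (size_t i = 0; i < size; ++i)
      a[i] = b[i];
  } else {
    // Serial path, matching the current runtime's behavior.
    for (size_t i = 0; i < size; ++i)
      a[i] = b[i];
  }
}
```

This keeps a single symbol in the runtime, at the cost of a branch on every call; separate functions instead move that decision to lowering time.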

Hopefully this approach would let us share some code between the current 
intrinsic runtime implementations (which do not assume an OpenMP context) 
and these new ones.

I think this approach may be more flexible and allow for better 
optimizations, such as offloading to BLAS libraries or having an optimized 
code path behind an aliasing check. (These are possible in the second 
approach too, but more cumbersome?)
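The "optimized path behind an aliasing check" idea could be sketched like this. Everything here is illustrative: the function name, the `double` element type, and the overlap test are assumptions, and a real runtime would handle the overlapping case via its general path (possibly with a temporary):

```c
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: take a fast path when source and destination
// arrays are known not to alias, otherwise fall back to the general path.
void workshare_Assign(double *a, const double *b, size_t size) {
  const double *a_end = a + size;
  const double *b_end = b + size;
  int overlap = (b < a_end) && (a < b_end);
  if (!overlap) {
    // Non-aliasing fast path; a real implementation might instead call
    // an optimized BLAS copy or split the copy across threads.
    memcpy(a, b, size * sizeof *a);
  } else {
    // Overlapping case: stand-in for the general runtime path, which
    // must preserve Fortran assignment semantics (e.g. via a temporary).
    memmove(a, b, size * sizeof *a);
  }
}
```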


### Code gen the intrinsics at the IR level

Which is what this PR does. My concern is that it may be quite a lot of 
effort, but it perhaps has benefits: we should get better support for 
intrinsics on GPU targets, where the current runtime does not compile 
as-is.

It might also be useful to look for any discussion from when the approach 
of using a runtime library for the intrinsic implementations was decided, 
if such a discussion exists.

https://github.com/llvm/llvm-project/pull/113082
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits