ivanradanov wrote: I think there are two main approaches:
### Have alternative intrinsic implementations in their own runtime library

e.g. the version of assignment for the workshare construct would look something like this:

```
workshare_Assign(a, b) {
  #pragma omp for
  for i from 0 to size
    a[i] = b[i]
}
```

Then, for the supported subset, the lowering of the array intrinsic becomes not

```
omp.parallel {
  omp.single {
    call Assign()
  }
}
```

but

```
omp.parallel {
  call workshare_Assign()
}
```

(It could also be implemented as a flag passed to the runtime function telling it which version to use, rather than a separate runtime function.)

Hopefully this approach would let us share some code between the current intrinsic runtime implementations (which do not assume an OpenMP context) and these new ones. I also think this approach may be more flexible and allow for better optimizations, such as offloading to BLAS libraries or taking an optimized code path behind an aliasing check. (These are possible in the second approach too, but more cumbersome?)

### Code gen the intrinsics at the IR level

Which is what this approach would be. My concern with this is that it may be quite a lot of effort, but perhaps it has some benefits in that we should get better support for intrinsics on GPU targets, where the current runtime does not compile as is.

It may be useful to look for the discussion (if it exists) from when the decision was made to implement the intrinsics as a runtime library, to see why that approach was chosen.

https://github.com/llvm/llvm-project/pull/113082