https://github.com/tblah commented:

I don't have a strong opinion about whether this should be done in flang 
codegen or in a runtime library. Some drive-by thoughts:

- I imagine that using a C `omp for` inside of a Fortran `omp parallel` should 
work **so long as both use the same OpenMP runtime library**. Of course, care 
will need to be taken about privatisation etc., but that can probably be 
handled in the interface of the runtime function.
- That requirement effectively prevents users from picking their own OpenMP 
library when compiling their application. I'm not sure how widely used that 
ability is.
- This is already hard to read and review, and SUM is a very simple case. I 
think there is a danger of re-inventing the wheel here. Something like a 
high-quality parallelised MATMUL that performs well on both CPU and GPU sounds 
hard. There is a lot of work in that direction in "upstream" MLIR dialects. I 
know flang (for historical reasons) doesn't use linalg, affine, etc., but we 
might have to eventually.
- If we decide not to re-use upstream MLIR work, I think more complex 
intrinsics could get quite hard to review in this style. A C++ runtime library 
implementation would be easier to read and maintain.
- Having it in a runtime library would make it easier for vendors to swap in 
their own implementations: e.g. maybe somebody already has a great MATMUL for 
AMD GPUs and wants flang to use it, but that implementation is useless on a CPU.

Overall I don't know what the right approach is. Maybe we should do both, as 
Ivan said. I think if we do decide to implement several intrinsics this way, it 
should go in its own pass, because the code could quickly become very long.

https://github.com/llvm/llvm-project/pull/113082
_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
