https://github.com/tblah commented:
I don't have a strong opinion about whether this should be done in flang codegen or in a runtime library. Some drive-by thoughts:

- I imagine that using a C `omp for` inside of a Fortran `omp parallel` should work **so long as they use the same OpenMP runtime library**. Of course, care will need to be taken about privatisation etc., but that can probably be handled in the interface of the runtime function.
- The above requirement basically breaks allowing users to pick their own OpenMP library when compiling their application. I'm not sure how widely used this is.
- This is already hard to read/review, and SUM is a very simple case. I think there is a danger of re-inventing the wheel here. Something like a high-quality parallelised MATMUL which is useful on both CPU and GPU sounds hard. There is a lot of work in that direction in "upstream" MLIR dialects. I know flang (for historical reasons) doesn't use linalg, affine, etc., but we might have to eventually.
- If we decide not to re-use upstream MLIR work, I think more complex intrinsics could get quite hard to review in this style. A C++ runtime library implementation would be easier to read and maintain.
- Having it in a runtime library would make it easier for vendors to swap in their own implementations, e.g. maybe somebody already has a great MATMUL for AMD GPUs and wants flang to use that, but the implementation is useless on a CPU.

Overall I don't know what the right approach is. Maybe we should do both, as Ivan said. I think if we do decide to implement several intrinsics this way, it should go in its own pass because the code could quickly become very long.

https://github.com/llvm/llvm-project/pull/113082

_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits