[flang] [clang] [flang][Driver] Support `-pthread` to the frontend. (PR #75739)

2023-12-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/75739

Adds `-pthread` option to flang. Since the GNU toolchain already adds the 
required linker flag, we only need to declare `FlangOption` as one of the 
supported options for `-pthread`.
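
To illustrate why adding `FlangOption` to the visibility list is enough for the driver to accept the flag, here is a minimal C++ sketch of visibility-based option filtering. It is not clang's actual option-table code: the `Visibility` enum, the `Option` struct, and the `isAccepted`, `optionsBefore`, and `optionsAfter` helpers are hypothetical, and only the names `ClangOption`, `CC1Option`, and `FlangOption` are taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical visibility bits; only the names mirror Options.td.
enum Visibility : unsigned {
  ClangOption = 1u << 0,
  CC1Option = 1u << 1,
  FlangOption = 1u << 2,
};

// Hypothetical, simplified stand-in for one option-table entry.
struct Option {
  std::string name;
  unsigned visibility; // OR-ed Visibility bits
};

// A driver accepts an option only if the option's visibility mask
// contains that driver's own bit. Unknown options are rejected.
bool isAccepted(const std::vector<Option> &table, const std::string &name,
                Visibility driver) {
  for (const Option &opt : table)
    if (opt.name == name)
      return (opt.visibility & driver) != 0;
  return false;
}

// Entry as it looked before the patch: -pthread is visible to clang only.
std::vector<Option> optionsBefore() {
  return {{"-pthread", ClangOption | CC1Option}};
}

// Entry after the patch: FlangOption is added to the same list.
std::vector<Option> optionsAfter() {
  return {{"-pthread", ClangOption | CC1Option | FlangOption}};
}
```

In this toy model, `isAccepted(optionsBefore(), "-pthread", FlangOption)` is false and the same query against `optionsAfter()` is true, while the clang result is unchanged. No linker logic changes, which matches the description above.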

>From 2880ebc7f3eec9f0c03747c7a2d92e608b111f3c Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sun, 17 Dec 2023 06:29:49 -0600
Subject: [PATCH] [flang][Driver] Support `-pthread` to the frontend.

Adds `-pthread` option to flang. Since the GNU toolchain already adds
the required linker flag, we only need to declare `FlangOption` as one
of the supported options for `-pthread`.
---
 clang/include/clang/Driver/Options.td | 2 +-
 flang/test/Driver/dynamic-linker.f90  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1b02087425b751..b8b8d476413982 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5297,7 +5297,7 @@ def pthreads : Flag<["-"], "pthreads">;
 defm pthread : BoolOption<"", "pthread",
   LangOpts<"POSIXThreads">, DefaultFalse,
   PosFlag,
-  NegFlag, BothFlags<[], [ClangOption, CC1Option]>>;
+  NegFlag, BothFlags<[], [ClangOption, CC1Option, FlangOption]>>;
 def pie : Flag<["-"], "pie">, Group;
 def static_pie : Flag<["-"], "static-pie">, Group;
 def read__only__relocs : Separate<["-"], "read_only_relocs">;
diff --git a/flang/test/Driver/dynamic-linker.f90 b/flang/test/Driver/dynamic-linker.f90
index df119c22a2ea51..57a2af01aadff7 100644
--- a/flang/test/Driver/dynamic-linker.f90
+++ b/flang/test/Driver/dynamic-linker.f90
@@ -1,7 +1,7 @@
 ! Verify that certain linker flags are known to the frontend and are passed on
 ! to the linker.
 
-! RUN: %flang -### --target=x86_64-linux-gnu -rpath /path/to/dir -shared \
+! RUN: %flang -### --target=x86_64-linux-gnu -rpath /path/to/dir -shared -pthread \
 ! RUN: -static %s 2>&1 | FileCheck \
 ! RUN: --check-prefixes=GNU-LINKER-OPTIONS %s
 ! RUN: %flang -### --target=x86_64-windows-msvc -rpath /path/to/dir -shared \
@@ -13,6 +13,7 @@
 ! GNU-LINKER-OPTIONS-SAME: "-shared"
 ! GNU-LINKER-OPTIONS-SAME: "-static"
 ! GNU-LINKER-OPTIONS-SAME: "-rpath" "/path/to/dir"
+! GNU-LINKER-OPTIONS-SAME: "-lpthread"
 
 ! For MSVC, adding -static does not add any additional linker options.
 ! MSVC-LINKER-OPTIONS: "{{.*}}link{{(.exe)?}}"

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[flang] [clang] [flang][Driver] Support `-pthread` to the frontend. (PR #75739)

2023-12-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/75739

>From 395b56fa481e0cd90adf98534f3b60a3cfc8f52b Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sun, 17 Dec 2023 06:29:49 -0600
Subject: [PATCH] [flang][Driver] Support `-pthread` to the frontend.

Adds `-pthread` option to flang. Since the GNU toolchain already adds
the required linker flag, we only need to declare `FlangOption` as one
of the supported options for `-pthread`.
---
 clang/include/clang/Driver/Options.td    | 2 +-
 flang/test/Driver/driver-help-hidden.f90 | 1 +
 flang/test/Driver/driver-help.f90        | 1 +
 flang/test/Driver/dynamic-linker.f90     | 3 ++-
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1b02087425b751..b8b8d476413982 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5297,7 +5297,7 @@ def pthreads : Flag<["-"], "pthreads">;
 defm pthread : BoolOption<"", "pthread",
   LangOpts<"POSIXThreads">, DefaultFalse,
   PosFlag,
-  NegFlag, BothFlags<[], [ClangOption, CC1Option]>>;
+  NegFlag, BothFlags<[], [ClangOption, CC1Option, FlangOption]>>;
 def pie : Flag<["-"], "pie">, Group;
 def static_pie : Flag<["-"], "static-pie">, Group;
 def read__only__relocs : Separate<["-"], "read_only_relocs">;
diff --git a/flang/test/Driver/driver-help-hidden.f90 b/flang/test/Driver/driver-help-hidden.f90
index 9a11a7a571ffcc..39c607b80ddb98 100644
--- a/flang/test/Driver/driver-help-hidden.f90
+++ b/flang/test/Driver/driver-help-hidden.f90
@@ -134,6 +134,7 @@
 ! CHECK-NEXT: -pedantic               Warn on language extensions
 ! CHECK-NEXT: -print-effective-triple Print the effective target triple
 ! CHECK-NEXT: -print-target-triple    Print the normalized target triple
+! CHECK-NEXT: -pthread                Support POSIX threads in generated code
 ! CHECK-NEXT: -P                      Disable linemarker output in -E mode
 ! CHECK-NEXT: -Rpass-analysis= Report transformation analysis from optimization passes whose name matches the given POSIX regular expression
 ! CHECK-NEXT: -Rpass-missed=   Report missed transformations by optimization passes whose name matches the given POSIX regular expression
diff --git a/flang/test/Driver/driver-help.f90 b/flang/test/Driver/driver-help.f90
index e0e74dc56f331e..51c59694f1570a 100644
--- a/flang/test/Driver/driver-help.f90
+++ b/flang/test/Driver/driver-help.f90
@@ -120,6 +120,7 @@
 ! HELP-NEXT: -pedantic               Warn on language extensions
 ! HELP-NEXT: -print-effective-triple Print the effective target triple
 ! HELP-NEXT: -print-target-triple    Print the normalized target triple
+! HELP-NEXT: -pthread                Support POSIX threads in generated code
 ! HELP-NEXT: -P                      Disable linemarker output in -E mode
 ! HELP-NEXT: -Rpass-analysis= Report transformation analysis from optimization passes whose name matches the given POSIX regular expression
 ! HELP-NEXT: -Rpass-missed=   Report missed transformations by optimization passes whose name matches the given POSIX regular expression
diff --git a/flang/test/Driver/dynamic-linker.f90 b/flang/test/Driver/dynamic-linker.f90
index df119c22a2ea51..57a2af01aadff7 100644
--- a/flang/test/Driver/dynamic-linker.f90
+++ b/flang/test/Driver/dynamic-linker.f90
@@ -1,7 +1,7 @@
 ! Verify that certain linker flags are known to the frontend and are passed on
 ! to the linker.
 
-! RUN: %flang -### --target=x86_64-linux-gnu -rpath /path/to/dir -shared \
+! RUN: %flang -### --target=x86_64-linux-gnu -rpath /path/to/dir -shared -pthread \
 ! RUN: -static %s 2>&1 | FileCheck \
 ! RUN: --check-prefixes=GNU-LINKER-OPTIONS %s
 ! RUN: %flang -### --target=x86_64-windows-msvc -rpath /path/to/dir -shared \
@@ -13,6 +13,7 @@
 ! GNU-LINKER-OPTIONS-SAME: "-shared"
 ! GNU-LINKER-OPTIONS-SAME: "-static"
 ! GNU-LINKER-OPTIONS-SAME: "-rpath" "/path/to/dir"
+! GNU-LINKER-OPTIONS-SAME: "-lpthread"
 
 ! For MSVC, adding -static does not add any additional linker options.
 ! MSVC-LINKER-OPTIONS: "{{.*}}link{{(.exe)?}}"



[clang] [flang] [flang][Driver] Support `-pthread` to the frontend. (PR #75739)

2023-12-21 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

> Hi @ergawy , thanks for this contribution! Could you add a test that would 
> demonstrate compilation failing without `-pthread`?

Thanks for the suggestion. Actually I failed to do that 😆. After looking into 
it, it seems that for the GNU toolchain, the `-pthread` flag would be redundant. 
The reason is that the pthread API is defined by `libc`, which is linked in by 
the driver by default.

I think I will abandon this PR then. Just need to double check once more. 

https://github.com/llvm/llvm-project/pull/75739


[flang] [clang] [flang][Driver] Support `-pthread` to the frontend. (PR #75739)

2023-12-22 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy closed https://github.com/llvm/llvm-project/pull/75739


[flang] [clang] [flang][Driver] Support `-pthread` to the frontend. (PR #75739)

2023-12-22 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Abandoning this PR since for the GNU toolchain there is no need to explicitly 
link with pthread to use the API.

https://github.com/llvm/llvm-project/pull/75739


[clang] [flang] [flang][Driver] Support -pthread in the frontend (PR #77360)

2024-01-09 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Thanks @tarunprabhu for opening this. I indeed closed my original PR but was 
about to reopen it after last week's discussion.

At least for the GNU toolchain, it won't be easy to come up with a simple test 
that fails without `-pthread`. The reason is that the pthread API is actually 
exported by `libc` and that `-lc` is added by the GNU toolchain in the driver 
in any case.

Just to give more context, I tested with the following program:
```
program main
  INTERFACE
    SUBROUTINE pthread_create() BIND(C)
      USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_INT
      IMPLICIT NONE
    END SUBROUTINE pthread_create
  END INTERFACE

  print*, "=== Calling pthread_create() ==="
  call pthread_create()
end program main
```
I expected `./bin/flang-new /tmp/test_pthread.f90 -o /tmp/test_pthread_2 -v` 
to complain that the `pthread_create()` symbol is undefined. However, it 
compiles and links fine.

And if you look at the linker command, you find it looks like this:
```
"/work/kaergawy/git/trunk18.0/build/llvm-project/bin/ld.lld" -z relro 
--hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker 
/lib64/ld-linux-x86-64.so.2 -o /tmp/test_pthread_2 /lib/x86_64-linux-gnu/crt1.o 
/lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/12/crtbegin.o 
-L/work/kaergawy/git/trunk18.0/build/llvm-project/lib/clang/18/lib/x86_64-unknown-linux-gnu
 -L/usr/lib/gcc/x86_64-linux-gnu/12 
-L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64 -L/lib/x86_64-linux-gnu 
-L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib 
-L/usr/lib /tmp/test_pthread_fortran.o 
-L/work/kaergawy/git/trunk18.0/build/llvm-project/lib --whole-archive 
-lFortran_main --no-whole-archive -lFortranRuntime -lFortranDecimal -lm -lgcc 
--as-needed -lgcc_s --no-as-needed >> -lc << -lgcc --as-needed -lgcc_s 
--no-as-needed /usr/lib/gcc/x86_64-linux-gnu/12/crtend.o 
/lib/x86_64-linux-gnu/crtn.o
```
Note the `-lc` flag that I highlighted with `>> ... <<` above. If you remove 
that flag from the linker command, you get:
```
ld.lld: error: undefined symbol: pthread_create
>>> referenced by FIRModule
>>>   /tmp/test_pthread_fortran.o:(_QQmain)

ld.lld: error: undefined symbol: __libc_start_main
>>> referenced by /lib/x86_64-linux-gnu/crt1.o:(_start)
```

And indeed, if you run `nm --defined-only /usr/lib/x86_64-linux-gnu/libc.a`, you 
find that `pthread_create` is actually defined by `libc`.
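
The manual inspection of the linker line can also be done mechanically. The following is a minimal C++ sketch, assuming the linker command is available as a single string (for example, copied from `-v` or `-###` output); `hasLinkerFlag` is a hypothetical helper, not part of any LLVM tool.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `flag` occurs as a whole whitespace-separated token in
// `commandLine`. A pair of double quotes surrounding a token is ignored,
// since `-###` prints every argument quoted.
bool hasLinkerFlag(const std::string &commandLine, const std::string &flag) {
  std::istringstream tokens(commandLine);
  std::string tok;
  while (tokens >> tok) {
    if (tok.size() >= 2 && tok.front() == '"' && tok.back() == '"')
      tok = tok.substr(1, tok.size() - 2);
    if (tok == flag)
      return true;
  }
  return false;
}
```

Matching whole tokens rather than substrings matters here: a substring search for `-lc` would also match an unrelated flag such as `-lcrypto`.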

---

That said, what @tarunprabhu mentioned about OpenMPI wrappers adding the flag 
is indeed correct, as mentioned by Brian Cornille in the last flang bi-weekly. 
So adding the flag would indeed make sense: even if it is redundant for the GNU 
toolchain, it would be consistent with `clang`. However, coming up with a 
failing test will prove more difficult than it initially seems.

I shared the PR with Brian since I cannot add him as a reviewer.

https://github.com/llvm/llvm-project/pull/77360


[clang] [flang] [flang][Driver] Support -pthread in the frontend (PR #77360)

2024-01-10 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Sorry for the late reply. This slipped my mind. Added myself as a reviewer to 
not forget.

> Could you take a look at #77135 and see whether `-gpulibc` could be helpful 
> for testing?

I don't think this will help since that's a different library, right?

For testing purposes, something like adding `-nolibc` 
([see](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Gnu.cpp#L619-L620))
 **_might_** help, I assume, but it is not supported for `flang`. For example, if 
you try the C equivalent of the Fortran code I attached above and 
compile it with `clang -nolibc`, you get: ``/usr/bin/ld: 
test_pthread.c:(.text+0xd6): undefined reference to `pthread_create'``. And I 
say "**_might help_**" because even if you try `clang -nolibc -pthread`, you 
would still get the linker error because the `pthread` library is simply empty 
(i.e. it does not define any symbols), at least on my Ubuntu system; I don't know 
how general this is for GNU toolchains.

https://github.com/llvm/llvm-project/pull/77360


[flang] [clang] [flang][Driver] Support -pthread in the frontend (PR #77360)

2024-01-12 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy approved this pull request.


https://github.com/llvm/llvm-project/pull/77360


[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-09-04 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 5dadb7acc6741f69c139855c7a7e1b9cc0c4290b Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def        | 12 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..fa8d852f59ce8d 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,17 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID,     "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s",  phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S",  phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i",  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+
+// Note: The `phases::Preprocess` phase is added to ".i" (i.e. Fortran
+// pre-processed) files. The reason is that the pre-processor "phase" has to be
+// re-run to make sure that e.g. the include flags (i.e. `-I `) are
+// preserved. That's because these include paths will contain module files and,
+// unlike C header files, these module files wouldn't be included in the
+// pre-processed file. In particular, we need to add the search paths for these
+// modules when flang needs to emits pre-processed files. Therefore, the
+// `PP_TYPE` is set to `PP_Fortran` so that the driver is fine with
+// "pre-processing a pre-processed file".
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i",  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr,  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program
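
The effect of changing the `PP_TYPE` column for `.i` files can be pictured with a small C++ model of the type table. This is a sketch under stated assumptions, not the clang driver code: `TypeID`, `TypeInfo`, `typesBefore`, `typesAfter`, and `mayForwardIncludeFlags` are hypothetical names, the toy `getPreprocessedType` only mimics the lookup that `types::getPreprocessedType` performs, and the rule that include flags are forwarded only when that lookup does not return the invalid type is inferred from the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical subset of driver type IDs.
enum TypeID { TY_INVALID, TY_Fortran, TY_PP_Fortran };

// Hypothetical, reduced row of Types.def.
struct TypeInfo {
  std::string name;        // NAME column, e.g. "f95"
  TypeID preprocessed;     // PP_TYPE column
  bool hasPreprocessPhase; // whether phases::Preprocess is listed
};

using TypeTable = std::map<TypeID, TypeInfo>;

// Rows before the patch: ".i" input has no preprocessed type.
TypeTable typesBefore() {
  return {{TY_Fortran, {"f95-cpp-input", TY_PP_Fortran, true}},
          {TY_PP_Fortran, {"f95", TY_INVALID, false}}};
}

// Rows after the patch: ".i" input maps to itself and gains the phase.
TypeTable typesAfter() {
  return {{TY_Fortran, {"f95-cpp-input", TY_PP_Fortran, true}},
          {TY_PP_Fortran, {"f95", TY_PP_Fortran, true}}};
}

// Toy lookup of the PP_TYPE column; unknown IDs yield TY_INVALID.
TypeID getPreprocessedType(const TypeTable &table, TypeID id) {
  auto it = table.find(id);
  return it == table.end() ? TY_INVALID : it->second.preprocessed;
}

// Assumed guard: pre-processing options such as -I are added to the
// compilation job only when the input has a valid preprocessed type.
bool mayForwardIncludeFlags(const TypeTable &table, TypeID input) {
  return getPreprocessedType(table, input) != TY_INVALID;
}
```

With `typesBefore()`, a `.i` input (`TY_PP_Fortran`) yields `TY_INVALID`, so the `-I` path to the module directory would be dropped; with `typesAfter()` the lookup succeeds and the flag can be forwarded.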



[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-09-04 Thread Kareem Ergawy via cfe-commits


@@ -79,7 +79,14 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID,     "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s",  phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S",  phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i",  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+
+// Note: The `phases::Preprocess` phase is added to ".i" (i.e.
+// Fortran pre-processed) files. The reason is that Fortran
+// pre-processed files need further pre-proecessing when they
+// include modules from non-standard paths. In particular, we
+// need to add the search paths for these modules when flang
+// needs to emits pre-processed files.

ergawy wrote:

Done.

https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-09-08 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 9e14adcabd84f1f746e60cb2cc4582f0c852a776 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def        | 12 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..af186c5df69201 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,17 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID,     "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s",  phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S",  phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i",  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+
+// Note: The `phases::Preprocess` phase is added to ".i" (i.e. Fortran
+// pre-processed) files. The reason is that the pre-processor "phase" has to be
+// re-run to make sure that e.g. the include flags (i.e. `-I `) are
+// preserved. That's because these include paths will contain module files and,
+// unlike C header files, these module files wouldn't be included in the
+// pre-processed file. In particular, we need to add the search paths for these
+// modules when Flang needs to emit pre-processed files. Therefore, the
+// `PP_TYPE` is set to `PP_Fortran` so that the driver is fine with
+// "pre-processing a pre-processed file".
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i",  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr,  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-09-08 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy closed 
https://github.com/llvm/llvm-project/pull/104664


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-11 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/93977


[clang] [flang] [llvm] [mlir] [Flang]Fix for changed code at the end of AllocaIP. (PR #92430)

2024-06-17 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

> @ergawy Could you take a look at this, given that you did something similar 
> [even if it was much smaller] recently?

Sorry, this totally slipped my mind. I will take a look today.

https://github.com/llvm/llvm-project/pull/92430


[clang] [flang] [llvm] [mlir] [Flang]Fix for changed code at the end of AllocaIP. (PR #92430)

2024-06-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/92430


[clang] [flang] [llvm] [OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (PR #93920)

2024-06-03 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/93920

>From 926cf8d19c625880c303aff0527e2e6e8a1629bd Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 30 May 2024 23:16:39 -0500
Subject: [PATCH 1/3] [OpenMP][][LLVM] Update alloca IP after `PrivCB` in
 `OMPIRBUIlder`

Fixes a crash uncovered by 
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90)
 in the test suite.

In particular, whenever `PrivCB` (the callback responsible for
generating privatization logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.
---
 ...rivatization-lower-allocatable-to-llvm.f90 | 23 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp     |  3 +++
 2 files changed, 26 insertions(+)
 create mode 100644 flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90

diff --git a/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90 b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
new file mode 100644
index 0..ac9a6d8746cf2
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
@@ -0,0 +1,23 @@
+! Tests the OMPIRBuilder can handle multiple privatization regions that contain
+! multiple BBs (for example, for allocatables).
+
+! RUN: %flang -S -emit-llvm -fopenmp -mmlir --openmp-enable-delayed-privatization \
+! RUN:   -o - %s 2>&1 | FileCheck %s
+
+subroutine foo(x)
+  integer, allocatable :: x, y
+!$omp parallel private(x, y)
+  x = y
+!$omp end parallel
+end
+
+! CHECK-LABEL: define void @foo_
+! CHECK: ret void
+! CHECK-NEXT:  }
+
+! CHECK-LABEL: define internal void @foo_..omp_par
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call void @free
+! CHECK-DAG: call void @free
+! CHECK:   }
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index cb4de9c8876dc..eab41eb8a35b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,6 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
+  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
+   InnerAllocaIP.getPoint()};
+
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");
   if (ReplacementValue == &V)

>From 659eec2a468902cf1654394f3eccdab16e92a027 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 09:05:35 -0500
Subject: [PATCH 2/3] update ip

---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index eab41eb8a35b2..2c4b45255d059 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,8 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
-  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
-   InnerAllocaIP.getPoint()};
+  InnerAllocaIP = {
+  InnerAllocaIP.getBlock(),
+  InnerAllocaIP.getBlock()->getTerminator()->getIterator()};
 
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");

>From d5403868ef43729aceaf763d8fa7e8e784938948 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 22:13:05 -0500
Subject: [PATCH 3/3] fix clang expectations

---
 clang/test/OpenMP/parallel_codegen.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/test/OpenMP/parallel_codegen.cpp b/clang/test/OpenMP/parallel_codegen.cpp
index d545b4a9d9fa8..9082f1c3232af 100644
--- a/clang/test/OpenMP/parallel_codegen.cpp
+++ b/clang/test/OpenMP/parallel_codegen.cpp
@@ -822,8 +822,8 @@ int main (int argc, char **argv) {
 // CHECK3-NEXT:[[TMP1:%.*]] = load i32, ptr [[TID_ADDR]], align 4
 // CHECK3-NEXT:store i32 [[TMP1]], ptr [[TID_ADDR_LOCAL]], align 4
 // CHECK3-NEXT:[[TID:%.*]] = load i32, ptr [[TID_ADDR_LOCAL]], align 4
-// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:[[VAR:%.*]] = alloca ptr, align 8
+// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:br label [[OMP_PAR_REGION:%.*]]
 // CHECK3:   omp.par.region:
 // CHECK3-NEXT:[[TMP3:%.
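
The insertion-point divergence described in the first commit message can be reproduced in a toy C++ model. This is only a sketch: `Inst`, `Block`, `InsertPoint`, `splitBlock`, `isConsistent`, `resync`, and `demoDivergenceAndResync` are hypothetical names and do not use the LLVM API. The model assumes that a multi-block privatization region splits the block the insertion point refers to, so the saved iterator ends up in a different block than the saved block pointer.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

struct Block;

// Toy instruction that knows its parent block.
struct Inst {
  std::string name;
  Block *parent;
};

// Toy basic block owning a list of instructions.
struct Block {
  std::string name;
  std::list<Inst> insts;
};

// Toy counterpart of an insert point: a block plus a position.
// `point` must refer to an instruction (not end()) for the helpers below.
struct InsertPoint {
  Block *block;
  std::list<Inst>::iterator point;
};

// Move [pos, end) of `from` into `to`, as a block split would do.
// std::list::splice keeps iterators valid; they now refer into `to`.
void splitBlock(Block &from, std::list<Inst>::iterator pos, Block &to) {
  to.insts.splice(to.insts.end(), from.insts, pos, from.insts.end());
  for (Inst &inst : to.insts)
    inst.parent = &to;
}

// The invariant the patch restores: the block component is the parent
// of the instruction the iterator component points at.
bool isConsistent(const InsertPoint &ip) {
  return ip.point->parent == ip.block;
}

// Re-derive the block from the iterator (the approach of PATCH 1/3).
InsertPoint resync(const InsertPoint &ip) {
  return {ip.point->parent, ip.point};
}

// Builds two blocks, splits at the insert point, and reports whether the
// insert point first diverged and was then repaired by resync().
bool demoDivergenceAndResync() {
  Block entry{"entry", {}};
  Block cont{"cont", {}};
  entry.insts.push_back({"alloca", &entry});
  entry.insts.push_back({"br", &entry});

  InsertPoint ip{&entry, std::prev(entry.insts.end())}; // at "br"
  splitBlock(entry, ip.point, cont);                    // "br" moves to cont

  bool diverged = !isConsistent(ip); // block is entry, iterator is in cont
  ip = resync(ip);
  return diverged && isConsistent(ip) && ip.block == &cont;
}
```

PATCH 2/3 restores the same invariant from the other side: it keeps the block component and moves the iterator to that block's terminator. The toy model omits terminators, so only the first variant is shown.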

[clang] [flang] [llvm] [OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (PR #93920)

2024-06-03 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/93920

>From 926cf8d19c625880c303aff0527e2e6e8a1629bd Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 30 May 2024 23:16:39 -0500
Subject: [PATCH 1/3] [OpenMP][][LLVM] Update alloca IP after `PrivCB` in
 `OMPIRBUIlder`

Fixes a crash uncovered by 
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90)
 in the test suite.

In particular, whenever `PrivCB` (the callback responsible for
generating privatizaiton logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.
---
 ...rivatization-lower-allocatable-to-llvm.f90 | 23 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +++
 2 files changed, 26 insertions(+)
 create mode 100644 
flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90

diff --git 
a/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90 
b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
new file mode 100644
index 0..ac9a6d8746cf2
--- /dev/null
+++ 
b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
@@ -0,0 +1,23 @@
+! Tests the OMPIRBuilder can handle multiple privatization regions that contain
+! multiple BBs (for example, for allocatables).
+
+! RUN: %flang -S -emit-llvm -fopenmp -mmlir 
--openmp-enable-delayed-privatization \
+! RUN:   -o - %s 2>&1 | FileCheck %s
+
+subroutine foo(x)
+  integer, allocatable :: x, y
+!$omp parallel private(x, y)
+  x = y
+!$omp end parallel
+end
+
+! CHECK-LABEL: define void @foo_
+! CHECK: ret void
+! CHECK-NEXT:  }
+
+! CHECK-LABEL: define internal void @foo_..omp_par
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call void @free
+! CHECK-DAG: call void @free
+! CHECK:   }
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index cb4de9c8876dc..eab41eb8a35b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,6 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, 
ReplacementValue));
+  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
+   InnerAllocaIP.getPoint()};
+
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");
   if (ReplacementValue == &V)

>From 659eec2a468902cf1654394f3eccdab16e92a027 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 09:05:35 -0500
Subject: [PATCH 2/3] update ip

---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index eab41eb8a35b2..2c4b45255d059 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,8 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
-  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
-   InnerAllocaIP.getPoint()};
+  InnerAllocaIP = {
+  InnerAllocaIP.getBlock(),
+  InnerAllocaIP.getBlock()->getTerminator()->getIterator()};
 
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");

>From d5403868ef43729aceaf763d8fa7e8e784938948 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 22:13:05 -0500
Subject: [PATCH 3/3] fix clang expectations

---
 clang/test/OpenMP/parallel_codegen.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/test/OpenMP/parallel_codegen.cpp b/clang/test/OpenMP/parallel_codegen.cpp
index d545b4a9d9fa8..9082f1c3232af 100644
--- a/clang/test/OpenMP/parallel_codegen.cpp
+++ b/clang/test/OpenMP/parallel_codegen.cpp
@@ -822,8 +822,8 @@ int main (int argc, char **argv) {
 // CHECK3-NEXT:[[TMP1:%.*]] = load i32, ptr [[TID_ADDR]], align 4
 // CHECK3-NEXT:store i32 [[TMP1]], ptr [[TID_ADDR_LOCAL]], align 4
 // CHECK3-NEXT:[[TID:%.*]] = load i32, ptr [[TID_ADDR_LOCAL]], align 4
-// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:[[VAR:%.*]] = alloca ptr, align 8
+// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:br label [[OMP_PAR_REGION:%.*]]
 // CHECK3:   omp.par.region:
 // CHECK3-NEXT:[[TMP3:%.

[clang] [flang] [llvm] [OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (PR #93920)

2024-06-04 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/93920

>From 926cf8d19c625880c303aff0527e2e6e8a1629bd Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 30 May 2024 23:16:39 -0500
Subject: [PATCH 1/3] [OpenMP][][LLVM] Update alloca IP after `PrivCB` in
 `OMPIRBUIlder`

Fixes a crash uncovered by
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90) in the test suite.

In particular, whenever `PrivCB` (the callback responsible for
generating privatization logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.
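
The divergence described above can be illustrated with a small toy model. This is only a sketch: `Block`, `InsertPoint`, `contains`, `splitAt` and `reanchorAtTerminator` are hypothetical stand-ins built on `std::list`, not LLVM types or APIs, and it assumes the divergence comes from a block being split while an older insertion point still refers to it.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy stand-ins for llvm::BasicBlock and IRBuilder<>::InsertPoint.
// These types are hypothetical and are NOT the LLVM API.
struct Block {
  std::string Name;
  std::list<std::string> Insts; // the last element acts as the terminator
};

struct InsertPoint {
  Block *BB;                           // block component
  std::list<std::string>::iterator Pt; // instruction-iterator component
};

// True if the instruction object Inst is stored inside block B.
bool contains(const Block &B, const std::string *Inst) {
  for (const std::string &I : B.Insts)
    if (&I == Inst)
      return true;
  return false;
}

// Splits Head at Pos: everything from Pos onward moves into Tail and Head
// gets a new branch as terminator. std::list::splice keeps iterators to the
// moved elements valid, so an old InsertPoint can now point into Tail.
void splitAt(Block &Head, std::list<std::string>::iterator Pos, Block &Tail) {
  Tail.Insts.splice(Tail.Insts.end(), Head.Insts, Pos, Head.Insts.end());
  Head.Insts.push_back("br " + Tail.Name);
}

// Re-anchors the iterator at the terminator of the recorded block, in the
// spirit of getBlock()->getTerminator()->getIterator() in the second patch.
InsertPoint reanchorAtTerminator(const InsertPoint &IP) {
  assert(!IP.BB->Insts.empty() && "block needs a terminator");
  return {IP.BB, std::prev(IP.BB->Insts.end())};
}
```

In this model, an insertion point taken at the terminator of `entry` ends up with its iterator inside the new tail block after `splitAt`, while its block component still names `entry`; `reanchorAtTerminator` brings the two components back into agreement.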
---
 ...rivatization-lower-allocatable-to-llvm.f90 | 23 +++
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +++
 2 files changed, 26 insertions(+)
 create mode 100644 flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90

diff --git a/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90 b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
new file mode 100644
index 0..ac9a6d8746cf2
--- /dev/null
+++ b/flang/test/Lower/OpenMP/delayed-privatization-lower-allocatable-to-llvm.f90
@@ -0,0 +1,23 @@
+! Tests the OMPIRBuilder can handle multiple privatization regions that contain
+! multiple BBs (for example, for allocatables).
+
+! RUN: %flang -S -emit-llvm -fopenmp -mmlir --openmp-enable-delayed-privatization \
+! RUN:   -o - %s 2>&1 | FileCheck %s
+
+subroutine foo(x)
+  integer, allocatable :: x, y
+!$omp parallel private(x, y)
+  x = y
+!$omp end parallel
+end
+
+! CHECK-LABEL: define void @foo_
+! CHECK: ret void
+! CHECK-NEXT:  }
+
+! CHECK-LABEL: define internal void @foo_..omp_par
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call ptr @malloc
+! CHECK-DAG: call void @free
+! CHECK-DAG: call void @free
+! CHECK:   }
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index cb4de9c8876dc..eab41eb8a35b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,6 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
+  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
+   InnerAllocaIP.getPoint()};
+
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");
   if (ReplacementValue == &V)

>From 659eec2a468902cf1654394f3eccdab16e92a027 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 09:05:35 -0500
Subject: [PATCH 2/3] update ip

---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index eab41eb8a35b2..2c4b45255d059 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1583,8 +1583,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
 } else {
   Builder.restoreIP(
   PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
-  InnerAllocaIP = {InnerAllocaIP.getPoint()->getParent(),
-   InnerAllocaIP.getPoint()};
+  InnerAllocaIP = {
+  InnerAllocaIP.getBlock(),
+  InnerAllocaIP.getBlock()->getTerminator()->getIterator()};
 
   assert(ReplacementValue &&
  "Expected copy/create callback to set replacement value!");

>From d5403868ef43729aceaf763d8fa7e8e784938948 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Mon, 3 Jun 2024 22:13:05 -0500
Subject: [PATCH 3/3] fix clang expectations

---
 clang/test/OpenMP/parallel_codegen.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/test/OpenMP/parallel_codegen.cpp b/clang/test/OpenMP/parallel_codegen.cpp
index d545b4a9d9fa8..9082f1c3232af 100644
--- a/clang/test/OpenMP/parallel_codegen.cpp
+++ b/clang/test/OpenMP/parallel_codegen.cpp
@@ -822,8 +822,8 @@ int main (int argc, char **argv) {
 // CHECK3-NEXT:[[TMP1:%.*]] = load i32, ptr [[TID_ADDR]], align 4
 // CHECK3-NEXT:store i32 [[TMP1]], ptr [[TID_ADDR_LOCAL]], align 4
 // CHECK3-NEXT:[[TID:%.*]] = load i32, ptr [[TID_ADDR_LOCAL]], align 4
-// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:[[VAR:%.*]] = alloca ptr, align 8
+// CHECK3-NEXT:[[TMP2:%.*]] = load i64, ptr [[LOADGEP__RELOADED]], align 8
 // CHECK3-NEXT:br label [[OMP_PAR_REGION:%.*]]
 // CHECK3:   omp.par.region:
 // CHECK3-NEXT:[[TMP3:%.

[clang] [flang] [llvm] [OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (PR #93920)

2024-06-04 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy closed https://github.com/llvm/llvm-project/pull/93920
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy commented:

Thanks @pranavb-ca! I did a first round and have a few comments.

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction(
   OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true,
   OutlinedFn, OutlinedFnID);
 }
+OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask(
+Function *OutlinedFn, Value *OutlinedFnID,
+EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
+Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP,
+SmallVector &Dependencies,
+bool HasNoWait) {
+
+  // When we arrive at this function, the target region itself has been
+  // outlined into the function OutlinedFn.
+  // So at ths point, for
+  // --
+  //   void user_code_that_offloads(...) {
+  // omp target depend(..) map(from:a) map(to:b, c)
+  //a = b + c
+  //   }
+  //
+  // --
+  //
+  // we have
+  //
+  // --
+  //
+  //   void user_code_that_offloads(...) {
+  // %.offload_baseptrs = alloca [3 x ptr], align 8
+  // %.offload_ptrs = alloca [3 x ptr], align 8
+  // %.offload_mappers = alloca [3 x ptr], align 8
+  // ;; target region has been outlined and now we need to
+  // ;; offload to it via a target task.
+  //   }
+  //   void outlined_device_function(ptr a, ptr b, ptr c) {
+  // *a = *b + *c
+  //   }
+  //
+  // We have to now do the following
+  // (i)   Make an offloading call to outlined_device_function using the OpenMP
+  //   RTL. See 'kernel_launch_function' in the pseudo code below. This is
+  //   emitted by emitKernelLaunch
+  // (ii)  Create a task entry point function that calls kernel_launch_function
+  //   and is the entry point for the target task. See
+  //   '@.omp_target_task_proxy_func in the pseudocode below.
+  // (iii) Create a task with the task entry point created in (ii)
+  //
+  // That is we create the following
+  //
+  //   void user_code_that_offloads(...) {
+  // %.offload_baseptrs = alloca [3 x ptr], align 8
+  // %.offload_ptrs = alloca [3 x ptr], align 8
+  // %.offload_mappers = alloca [3 x ptr], align 8
+  //
+  // %structArg = alloca { ptr, ptr, ptr }, align 8
+  // %strucArg[0] = %.offload_baseptrs
+  // %strucArg[1] = %.offload_ptrs
+  // %strucArg[2] = %.offload_mappers
+  // proxy_target_task = @__kmpc_omp_task_alloc(...,
+  //   @.omp_target_task_proxy_func)
+  // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg))
+  // dependencies_array = ...
+  // ;; if nowait not present
+  // call @__kmpc_omp_wait_deps(..., dependencies_array)
+  // call @__kmpc_omp_task_begin_if0(...)
+  // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr
+  // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...)
+  //   }
+  //
+  //   define internal void @.omp_target_task_proxy_func(i32 %thread.id,
+  // ptr %task) {
+  //   %structArg = alloca {ptr, ptr, ptr}
+  //   %shared_data = load (getelementptr %task, 0, 0)
+  //   mempcy(%structArg, %shared_data, sizeof(structArg))
+  //   kernel_launch_function(%thread.id, %structArg)
+  //   }
+  //
+  //   We need the proxy function because the signature of the task entry point
+  //   expected by kmpc_omp_task is always the same and will be different from
+  //   that of the kernel_launch function.
+  //
+  //   kernel_launch_function is generated by emitKernelLaunch and has the
+  //   always_inline attribute. void kernel_launch_function(thread_id,
+  //structArg)
+  //alwaysinline {
+  //   %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8
+  //   offload_baseptrs = load(getelementptr structArg, 0, 0)
+  //   offload_ptrs = load(getelementptr structArg, 0, 1)
+  //   offload_mappers = load(getelementptr structArg, 0, 2)
+  //   ; setup kernel_args using offload_baseptrs, offload_ptrs and
+  //   ; offload_mappers
+  //   call i32 @__tgt_target_kernel(...,
+  // outlined_device_function,
+  // ptr %kernel_args)
+  //   }
+  //   void outlined_device_function(ptr a, ptr b, ptr c) {
+  //  *a = *b + *c
+  //   }
+  //
+  BasicBlock *TargetTaskBodyBB =
+  splitBB(Builder, /*CreateBranch=*/true, "target.task.body");
+  BasicBlock *TargetTaskAllocaBB =
+  splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca");
+
+  InsertPointTy TargetTaskAllocaIP =
+  InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin());
+  InsertPointTy TargetTaskBodyIP =
+  InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin());
+
+  OutlineInfo OI;
+  OI.En

[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction(
   return Func;
 }
 
+// Create an entry point for a target task with the following.
+// It'll have the following signature
+// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
+// This function is called from emitTargetTask once the
+// code to launch the target kernel has been outlined already.
+static Function *emitProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,
+   IRBuilderBase &Builder,
+   CallInst *StaleCI) {
+  Module &M = OMPBuilder.M;
+  // CalledFunction is the target launch function, i.e.
+  // the function that sets up kernel arguments and calls
+  // __tgt_target_kernel to launch the kernel on the device.
+  Function *CalledFunction = StaleCI->getCalledFunction();

ergawy wrote:

nit:
```suggestion
  Function *KernelLaunchFunction = StaleCI->getCalledFunction();
```

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction(
   return Func;
 }
 
+// Create an entry point for a target task with the following.
+// It'll have the following signature
+// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
+// This function is called from emitTargetTask once the
+// code to launch the target kernel has been outlined already.
+static Function *emitProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,
+   IRBuilderBase &Builder,
+   CallInst *StaleCI) {
+  Module &M = OMPBuilder.M;
+  // CalledFunction is the target launch function, i.e.
+  // the function that sets up kernel arguments and calls
+  // __tgt_target_kernel to launch the kernel on the device.
+  Function *CalledFunction = StaleCI->getCalledFunction();
+  OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(),
+StaleCI->getIterator());
+  LLVMContext &Ctx = StaleCI->getParent()->getContext();
+  Type *ThreadIDTy = Type::getInt32Ty(Ctx);
+  Type *TaskPtrTy = OMPBuilder.TaskPtr;
+  Type *TaskTy = OMPBuilder.Task;
+  auto ProxyFnTy =
+  FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy},
+/* isVarArg */ false);
+  auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage,
+  ".omp_target_task_proxy_func",
+  Builder.GetInsertBlock()->getModule());
+
+  BasicBlock *EntryBB =
+  BasicBlock::Create(Builder.getContext(), "entry", ProxyFn);
+  Builder.SetInsertPoint(EntryBB);
+
+  bool HasShareds = StaleCI->arg_size() > 1;

ergawy wrote:

Can you document what `StaleCI` will look like in case there are shared values 
and in case there aren't?

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction(
   return Func;
 }
 
+// Create an entry point for a target task with the following.
+// It'll have the following signature
+// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
+// This function is called from emitTargetTask once the
+// code to launch the target kernel has been outlined already.
+static Function *emitProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,

ergawy wrote:

nit: Just to express more clearly the intent of the function.
```suggestion
static Function *emitTargetProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,
```

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction(
   return Func;
 }
 
+// Create an entry point for a target task with the following.
+// It'll have the following signature
+// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
+// This function is called from emitTargetTask once the
+// code to launch the target kernel has been outlined already.
+static Function *emitProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,
+   IRBuilderBase &Builder,
+   CallInst *StaleCI) {
+  Module &M = OMPBuilder.M;
+  // CalledFunction is the target launch function, i.e.
+  // the function that sets up kernel arguments and calls
+  // __tgt_target_kernel to launch the kernel on the device.
+  Function *CalledFunction = StaleCI->getCalledFunction();
+  OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(),
+StaleCI->getIterator());
+  LLVMContext &Ctx = StaleCI->getParent()->getContext();
+  Type *ThreadIDTy = Type::getInt32Ty(Ctx);
+  Type *TaskPtrTy = OMPBuilder.TaskPtr;
+  Type *TaskTy = OMPBuilder.Task;
+  auto ProxyFnTy =
+  FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy},
+/* isVarArg */ false);
+  auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage,
+  ".omp_target_task_proxy_func",
+  Builder.GetInsertBlock()->getModule());
+
+  BasicBlock *EntryBB =
+  BasicBlock::Create(Builder.getContext(), "entry", ProxyFn);
+  Builder.SetInsertPoint(EntryBB);
+
+  bool HasShareds = StaleCI->arg_size() > 1;
+  // TODO: This is a temporary assert to prove to ourselves that
+  // the outlined target launch function is always going to have
+  // atmost two arguments if there is any data shared between
+  // host and device.
+  assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
+ "StaleCI with shareds should have exactly two arguments.");
+  if (HasShareds) {
+AllocaInst *ArgStructAlloca =
+dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
+assert(ArgStructAlloca &&
+   "Unable to find the alloca instruction corresponding to arguments "
+   "for extracted function");
+StructType *ArgStructType =
+dyn_cast<StructType>(ArgStructAlloca->getAllocatedType());
+LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n");
+
+AllocaInst *NewArgStructAlloca =
+Builder.CreateAlloca(ArgStructType, nullptr, "structArg");
+Value *TaskT = ProxyFn->getArg(1);
+Value *ThreadId = ProxyFn->getArg(0);
+LLVM_DEBUG(dbgs() << "TaskT = " << *TaskT << "\n");
+Value *SharedsSize =
+Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));
+
+Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0);
+LoadInst *LoadShared =
+Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds);
+
+// TODO: Are these alignment values correct?
+Builder.CreateMemCpy(
+NewArgStructAlloca,
+NewArgStructAlloca->getPointerAlignment(M.getDataLayout()), LoadShared,
+LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize);
+
+Builder.CreateCall(CalledFunction, {ThreadId, NewArgStructAlloca});
+  }
+  ProxyFn->getArg(0)->setName("thread.id");
+  ProxyFn->getArg(1)->setName("task");

ergawy wrote:

nit: move these closer to where `ProxyFn` is created?

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription &Loc) {
   emitTaskyieldImpl(Loc);
 }
 
+// Processes the dependencies in Dependencies and does the following
+// - Allocates space on the stack of an array of DependInfo objects
+// - Populates each DependInfo object with relevant information of
+//   the corresponding dependence.
+// - All code is inserted in the entry block of the current function.
+static Value *
+emitDepArray(OpenMPIRBuilder &OMPBuilder,
+ SmallVector<OpenMPIRBuilder::DependData> &Dependencies) {
+  // Early return if we have no dependencies to process
+  if (!Dependencies.size())
+return nullptr;
+
+  IRBuilderBase &Builder = OMPBuilder.Builder;
+  Type *DependInfo = OMPBuilder.DependInfo;
+  Module &M = OMPBuilder.M;
+
+  Value *DepArray = nullptr;
+  if (Dependencies.size()) {

ergawy wrote:

You already checked that `Dependencies` is not empty above.
```suggestion

```

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription &Loc) {
   emitTaskyieldImpl(Loc);
 }
 
+// Processes the dependencies in Dependencies and does the following
+// - Allocates space on the stack of an array of DependInfo objects
+// - Populates each DependInfo object with relevant information of
+//   the corresponding dependence.
+// - All code is inserted in the entry block of the current function.
+static Value *
+emitDepArray(OpenMPIRBuilder &OMPBuilder,
+ SmallVector<OpenMPIRBuilder::DependData> &Dependencies) {
+  // Early return if we have no dependencies to process
+  if (!Dependencies.size())
+return nullptr;
+
+  IRBuilderBase &Builder = OMPBuilder.Builder;
+  Type *DependInfo = OMPBuilder.DependInfo;
+  Module &M = OMPBuilder.M;
+
+  Value *DepArray = nullptr;
+  if (Dependencies.size()) {
+OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP();
+Builder.SetInsertPoint(
+&OldIP.getBlock()->getParent()->getEntryBlock().back());
+
+Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size());
+DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr");
+
+unsigned P = 0;
+for (const OpenMPIRBuilder::DependData &Dep : Dependencies) {

ergawy wrote:

Using `llvm::enumerate` will save us having to declare and increment the idx 
var.
```suggestion
for (const auto&[DepIdx, Dep] : enumerate(Dependencies)) {
```
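
As a rough illustration of the index/value loop shape suggested above, here is a sketch that uses only the standard library. `enumerated` is a hypothetical, minimal stand-in for `llvm::enumerate` (it is not the LLVM implementation), and `DependData` and `labelDependencies` are made-up names for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for OpenMPIRBuilder::DependData.
struct DependData {
  std::string Name;
};

// Minimal std-only stand-in for llvm::enumerate: pairs each element with its
// index. Elements are referenced by pointer, so Items must outlive the result.
template <typename T>
std::vector<std::pair<std::size_t, const T *>>
enumerated(const std::vector<T> &Items) {
  std::vector<std::pair<std::size_t, const T *>> Result;
  Result.reserve(Items.size());
  for (std::size_t I = 0; I < Items.size(); ++I)
    Result.emplace_back(I, &Items[I]);
  return Result;
}

// Builds "idx:name" labels without declaring or incrementing a counter at
// the call site (requires C++17 structured bindings).
std::vector<std::string>
labelDependencies(const std::vector<DependData> &Deps) {
  std::vector<std::string> Labels;
  for (const auto &[DepIdx, Dep] : enumerated(Deps))
    Labels.push_back(std::to_string(DepIdx) + ":" + Dep->Name);
  return Labels;
}
```

The index now comes from the range itself, so the separate `unsigned P = 0;` declaration and its increment are no longer needed in the loop body.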

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction(
   OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true,
   OutlinedFn, OutlinedFnID);
 }
+OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask(
+Function *OutlinedFn, Value *OutlinedFnID,
+EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
+Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP,
+SmallVector &Dependencies,
+bool HasNoWait) {
+
+  // When we arrive at this function, the target region itself has been

ergawy wrote:

Really appreciate this block of comments. Paints the picture of what happens 
clearly. Thanks :)!
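
For readers without the full comment block at hand, the proxy idea it describes (a task entry point with a fixed signature that copies the shared data and forwards to a launch function with a different signature) can be sketched in plain C++. Everything here is a hypothetical toy: `KernelArgs`, `Task`, `TaskEntryFn`, `kernelLaunch`, `targetTaskProxy` and `runTask` are illustration-only names, not the OpenMP runtime's types or entry points.

```cpp
#include <cstring>

// Plays the role of %structArg: the values the launch function needs.
struct KernelArgs {
  int *A;
  const int *B;
  const int *C;
};

// Plays the role of the runtime's task descriptor; the first field points at
// the shared-data blob.
struct Task {
  void *Shareds;
};

// Fixed entry-point signature that the toy runtime expects for every task.
using TaskEntryFn = void (*)(int ThreadId, Task *T);

// Launch function with its own, different signature.
void kernelLaunch(int /*ThreadId*/, KernelArgs *Args) {
  *Args->A = *Args->B + *Args->C;
}

// Proxy: adapts the fixed task signature to the launch function by copying
// the shareds into a local argument struct first.
void targetTaskProxy(int ThreadId, Task *T) {
  KernelArgs Local;
  std::memcpy(&Local, T->Shareds, sizeof(Local));
  kernelLaunch(ThreadId, &Local);
}

// Toy "runtime": it only knows how to call a TaskEntryFn.
void runTask(TaskEntryFn Entry, int ThreadId, Task *T) { Entry(ThreadId, T); }
```

The toy runtime never sees `KernelArgs`; it only calls the fixed-signature proxy, which is the reason the comment block gives for generating `@.omp_target_task_proxy_func`.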

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription &Loc) {
   emitTaskyieldImpl(Loc);
 }
 
+// Processes the dependencies in Dependencies and does the following
+// - Allocates space on the stack of an array of DependInfo objects
+// - Populates each DependInfo object with relevant information of
+//   the corresponding dependence.
+// - All code is inserted in the entry block of the current function.
+static Value *
+emitDepArray(OpenMPIRBuilder &OMPBuilder,
+ SmallVector<OpenMPIRBuilder::DependData> &Dependencies) {
+  // Early return if we have no dependencies to process
+  if (!Dependencies.size())
+return nullptr;
+
+  IRBuilderBase &Builder = OMPBuilder.Builder;
+  Type *DependInfo = OMPBuilder.DependInfo;
+  Module &M = OMPBuilder.M;
+
+  Value *DepArray = nullptr;
+  if (Dependencies.size()) {
+OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP();
+Builder.SetInsertPoint(
+&OldIP.getBlock()->getParent()->getEntryBlock().back());
+
+Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size());
+DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr");
+
+unsigned P = 0;
+for (const OpenMPIRBuilder::DependData &Dep : Dependencies) {

ergawy wrote:

I see there is a similar loop in `OpenMPIRBuilder::createTask`. Can we outline 
this to a shared util used in both locations?

https://github.com/llvm/llvm-project/pull/93977


[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)

2024-06-05 Thread Kareem Ergawy via cfe-commits


@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction(
   return Func;
 }
 
+// Create an entry point for a target task with the following.
+// It'll have the following signature
+// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
+// This function is called from emitTargetTask once the
+// code to launch the target kernel has been outlined already.
+static Function *emitProxyTaskFunction(OpenMPIRBuilder &OMPBuilder,
+   IRBuilderBase &Builder,
+   CallInst *StaleCI) {
+  Module &M = OMPBuilder.M;
+  // CalledFunction is the target launch function, i.e.
+  // the function that sets up kernel arguments and calls
+  // __tgt_target_kernel to launch the kernel on the device.
+  Function *CalledFunction = StaleCI->getCalledFunction();
+  OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(),
+StaleCI->getIterator());
+  LLVMContext &Ctx = StaleCI->getParent()->getContext();
+  Type *ThreadIDTy = Type::getInt32Ty(Ctx);
+  Type *TaskPtrTy = OMPBuilder.TaskPtr;
+  Type *TaskTy = OMPBuilder.Task;
+  auto ProxyFnTy =
+  FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy},
+/* isVarArg */ false);
+  auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage,
+  ".omp_target_task_proxy_func",
+  Builder.GetInsertBlock()->getModule());
+
+  BasicBlock *EntryBB =
+  BasicBlock::Create(Builder.getContext(), "entry", ProxyFn);
+  Builder.SetInsertPoint(EntryBB);
+
+  bool HasShareds = StaleCI->arg_size() > 1;
+  // TODO: This is a temporary assert to prove to ourselves that
+  // the outlined target launch function is always going to have
+  // atmost two arguments if there is any data shared between
+  // host and device.
+  assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
+ "StaleCI with shareds should have exactly two arguments.");
+  if (HasShareds) {
+AllocaInst *ArgStructAlloca =
+dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
+assert(ArgStructAlloca &&
+   "Unable to find the alloca instruction corresponding to arguments "
+   "for extracted function");
+StructType *ArgStructType =
+dyn_cast<StructType>(ArgStructAlloca->getAllocatedType());
+LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n");

ergawy wrote:

nit: This will be printed out-of-context and might be confusing. I think this 
can be removed before merging.

https://github.com/llvm/llvm-project/pull/93977


[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-08-29 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-08-29 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 4ad28a2ab6566121994f14ea233f4fd27aca3285 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  9 ++-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..cdb8ca96225b8d 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,14 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+
+// Note: The `phases::Preprocess` phase is added to ".i" (i.e.
+// Fortran pre-processed) files. The reason is that Fortran
+// pre-processed files need further pre-proecessing when they
+// include modules from non-standard paths. In particular, we
+// need to add the search paths for these modules when flang
+// needs to emits pre-processed files.
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [flang] [flang][driver] Add pre-processing type for Fortran pre-processed files (PR #104664)

2024-08-29 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Thanks a lot for the reply, and apologies for being late; it is my turn to be 
OoO :).

Your reply definitely clarified a few things for me. I updated the PR title and 
added a note as requested. Let me know if further details need to be added.



https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-16 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/104664

This diff allows `.i` files emitted by flang-new to be treated as valid files 
in the pre-processing phase. This, in turn, allows flang-new to add 
pre-processing options (e.g. `-I`) when launching compilation jobs for these 
files.

This solves a bug when using `--save-temps` with source files that include 
modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and therefore 
the return value for `types::getPreprocessedType(InputType)` in 
`Flang::ConstructJob(...)` was `types::TY_INVALID`.

>From b193aec2b608e576fcfabbaa4ba5967a7f8bae9b Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 23 +
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..ad191ab631e9bb
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,23 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -c --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser.o
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User



[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-16 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 1f504bf784ee3b19ed29d2db1ba4ba26ac7d7d66 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 23 +
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..ad191ab631e9bb
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,23 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -c --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser.o
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User



[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 8b911e77c30edb19cc5dbb95423de0290ddf2c6b Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 23 +
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..ad191ab631e9bb
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,23 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -c --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser.o
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User



[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 88baab5e4f1f37e7238d11aa416b3bc57cf961fe Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..d77dd56a870911
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser.o
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 5b76a29aca1e74b740cbb9b9297673fd532734b7 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2eda35f231ca66
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [flang] [flang][driver] Add pre-processing type to `.i` files (PR #104664)

2024-08-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 56b6eaf8d060f870f64e8d63771a2241cd534320 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-18 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Thanks for taking a look! 🙏

> Is there any reference that would document .i files?

These are pre-processed Fortran source files. I do not know if these are 
standard or flang-new specific tbh. However, I updated the PR title and 
description with more info about these files and where they are emitted. Let me 
know if we can expand this further.

https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-21 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

> Please bear with me, it's been a while since I've touched this and also, I am 
> afk 😅

No worries. I am new to this part of the code and might be misinterpreting 
things myself.

> IIRC, the generated temp files (e.g. *.i) are used for the subsequent 
> compilation phases. So, after pre-processing file.f90 we'd get file.i with 
> all the modules included, no? So -I shouldn't be a concern, right?

We need the changes in this PR to generate the `*.i` files, not to use them. 
More specifically, the call to `Flang::ConstructJob(...)` where `-o .i>` 
is the target output of the job needs the preprocessing phase to process the 
included/used modules. If you want to debug it further, you can:
- Revert the changes I made in `Types.def`.
- Compile a small sample with `--save-temps`; you can use the added lit 
test for that.
- Debug what happens in [this if 
condition](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Flang.cpp#L723)
 in the call to `Flang::ConstructJob(...)` where `-o .i>` is part of 
the `ArgList &Args`.
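
To make the effect of the `Types.def` change concrete, here is a minimal, self-contained sketch of the lookup involved. It is a toy model, not the driver's real code: the `TypeID` enum, the `TypeInfo` struct, the two tables, and the `addsPreprocessingOptions` helper are all hypothetical names made up for illustration. The only parts taken from the PR are that the `f95` entry's preprocessed type changes from `INVALID` to `PP_Fortran`, and that the driver consults `types::getPreprocessedType(InputType)` before adding pre-processing options. The exact shape of the guard in `Flang::ConstructJob(...)` is an assumption.

```cpp
#include <cassert>

// Hypothetical, simplified model of a driver type table. These names are
// illustrative only and are not clang's real declarations.
enum TypeID { TY_INVALID = 0, TY_PP_Fortran, TY_Fortran, TY_LAST };

struct TypeInfo {
  const char *Name;        // e.g. "f95"
  TypeID PreprocessedType; // type produced by preprocessing, or TY_INVALID
  const char *TempSuffix;  // e.g. "i", or nullptr if there is none
};

// Rows are indexed by TypeID. Only the "f95" row differs between the tables,
// mirroring the one-line change to Types.def in this PR.
static const TypeInfo TableBefore[TY_LAST] = {
    {"invalid", TY_INVALID, nullptr},
    {"f95", TY_INVALID, "i"},
    {"f95-cpp-input", TY_PP_Fortran, nullptr},
};

static const TypeInfo TableAfter[TY_LAST] = {
    {"invalid", TY_INVALID, nullptr},
    {"f95", TY_PP_Fortran, "i"},
    {"f95-cpp-input", TY_PP_Fortran, nullptr},
};

// Toy counterpart of a preprocessed-type lookup: read one column of the row.
TypeID getPreprocessedType(const TypeInfo *Table, TypeID Id) {
  assert(Id > TY_INVALID && Id < TY_LAST && "unknown type id");
  return Table[Id].PreprocessedType;
}

// Assumed guard: options such as -I are only forwarded when the input type
// has a valid preprocessed type.
bool addsPreprocessingOptions(const TypeInfo *Table, TypeID InputType) {
  return getPreprocessedType(Table, InputType) != TY_INVALID;
}
```

Under this model, `addsPreprocessingOptions(TableBefore, TY_PP_Fortran)` is false while `addsPreprocessingOptions(TableAfter, TY_PP_Fortran)` is true, which is the behaviour difference described above for jobs whose input is a `.i` file.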

https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-21 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 205ed1497a145664471055b24ea0391f93b30711 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID, "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s", phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S", phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i", phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i", phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr, phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-26 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Ping! @banach-space did you manage to take another look? Please let me know if 
you disagree with my reply above or have further comments. 🙏 

https://github.com/llvm/llvm-project/pull/104664


[clang] [flang] [flang][driver] Add pre-processing type to `.i` (pre-processed) files (PR #104664)

2024-08-26 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/104664

>From 714a4308272134fc83f1640f9303fc535a42cfd3 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Sat, 17 Aug 2024 00:20:11 -0500
Subject: [PATCH] [flang][driver] Add pre-processing type to `.i` files

This diff allows `.i` files emitted by flang-new to be treated as valid
files in the pre-processing phase. This, in turn, allows flang-new to
add pre-processing options (e.g. `-I`) when launching compilation jobs
for these files.

This solves a bug when using `--save-temps` with source files that
include modules from non-standard directories, for example:
```
flang-new -c --save-temps -I/tmp/module_dir -fno-integrated-as \
  /tmp/ModuleUser.f90
```
The problem was that `.i` files were treated as "binary" files and
therefore the return value for `types::getPreprocessedType(InputType)`
in `Flang::ConstructJob(...)` was `types::TY_INVALID`.
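
The failure mode described above can be illustrated with a small, self-contained
C++ sketch. This is a toy model, not the driver's actual code: `TypeID`,
`TypeInfo`, `buildArgs`, and the two table functions are hypothetical names, and
the rule that `-I` is only forwarded when the preprocessed type is valid is an
assumption inferred from this description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for driver type IDs (hypothetical, not clang::driver::types).
enum class TypeID { Invalid, PP_Fortran, Fortran };

struct TypeInfo {
  TypeID id;
  TypeID preprocessed; // type produced by preprocessing, or Invalid
};

// Hypothetical lookup playing the role of types::getPreprocessedType.
TypeID getPreprocessedType(const std::vector<TypeInfo> &table, TypeID id) {
  for (const TypeInfo &info : table)
    if (info.id == id)
      return info.preprocessed;
  return TypeID::Invalid;
}

// Hypothetical job builder: preprocessing options such as -I are only
// forwarded when the input has a valid preprocessed type (assumption).
std::vector<std::string> buildArgs(const std::vector<TypeInfo> &table,
                                   TypeID input,
                                   const std::string &includeDir) {
  assert(!includeDir.empty() && "expected a module include directory");
  std::vector<std::string> args{"-fc1"};
  if (getPreprocessedType(table, input) != TypeID::Invalid)
    args.push_back("-I" + includeDir);
  return args;
}

// Before the change: `.i` (PP_Fortran) has no preprocessed type.
std::vector<TypeInfo> tableBeforeFix() {
  return {{TypeID::PP_Fortran, TypeID::Invalid},
          {TypeID::Fortran, TypeID::PP_Fortran}};
}

// After the change: `.i` (PP_Fortran) maps to itself.
std::vector<TypeInfo> tableAfterFix() {
  return {{TypeID::PP_Fortran, TypeID::PP_Fortran},
          {TypeID::Fortran, TypeID::PP_Fortran}};
}
```

In this model, compiling a `.i` input with the "before" table drops the `-I`
argument, so the module directory is never searched; with the "after" table the
argument is forwarded.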
---
 clang/include/clang/Driver/Types.def|  2 +-
 flang/test/Driver/save-temps-use-module.f90 | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 flang/test/Driver/save-temps-use-module.f90

diff --git a/clang/include/clang/Driver/Types.def b/clang/include/clang/Driver/Types.def
index 0e0cae5fb7068d..b4e9e1f9f3f8b6 100644
--- a/clang/include/clang/Driver/Types.def
+++ b/clang/include/clang/Driver/Types.def
@@ -79,7 +79,7 @@ TYPE("c++-module-cpp-output",PP_CXXModule, INVALID,     "iim",phases
 TYPE("ada",  Ada,  INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("assembler",PP_Asm,   INVALID, "s",  phases::Assemble, phases::Link)
 TYPE("assembler-with-cpp",   Asm,  PP_Asm,  "S",  phases::Preprocess, phases::Assemble, phases::Link)
-TYPE("f95",  PP_Fortran,   INVALID, "i",  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
+TYPE("f95",  PP_Fortran,   PP_Fortran,  "i",  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("f95-cpp-input",Fortran,  PP_Fortran,  nullptr,  phases::Preprocess, phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 TYPE("java", Java, INVALID, nullptr,  phases::Compile, phases::Backend, phases::Assemble, phases::Link)
 
diff --git a/flang/test/Driver/save-temps-use-module.f90 b/flang/test/Driver/save-temps-use-module.f90
new file mode 100644
index 00..2f184d15898571
--- /dev/null
+++ b/flang/test/Driver/save-temps-use-module.f90
@@ -0,0 +1,26 @@
+! Tests that `--save-temps` works properly when a module from a non standard dir
+! is included with `-I/...`.
+
+! RUN: rm -rf %t && split-file %s %t
+! RUN: mkdir %t/mod_inc_dir
+! RUN: mv %t/somemodule.mod %t/mod_inc_dir
+! RUN: %flang -S -emit-llvm --save-temps=obj -I%t/mod_inc_dir -fno-integrated-as \
+! RUN:   %t/ModuleUser.f90 -o %t/ModuleUser
+! RUN: ls %t | FileCheck %s
+
+! Verify that the temp file(s) were written to disk.
+! CHECK: ModuleUser.i
+
+!--- somemodule.mod
+!mod$ v1 sum:e9e8fd2bd49e8daa
+module SomeModule
+
+end module SomeModule
+!--- ModuleUser.f90
+
+module User
+  use SomeModule
+end module User
+
+program dummy
+end program



[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Looks like this broke some buildbots. Looking into the reported failures ... 👀 

https://github.com/llvm/llvm-project/pull/113305


[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy closed 
https://github.com/llvm/llvm-project/pull/113305


[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/113305

>From 83088c0b47f7582729f11f996d850e5757fcb872 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 22 Oct 2024 02:02:58 -0500
Subject: [PATCH] [flang][OpenMP] Support `target enter|update|exit .. nowait`

Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
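
The shape of this refactoring can be sketched in a few lines of self-contained
C++. This is only an illustration under stated assumptions: `Trace`,
`TaskBodyCallback`, and the three `emit...` functions below are hypothetical
stand-ins that record strings instead of building LLVM IR, and they do not
reproduce the `OpenMPIRBuilder` signatures.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Records the "instructions" a helper would emit (toy replacement for IR).
using Trace = std::vector<std::string>;

// Hypothetical analogue of a task-body callback: the caller decides what
// goes inside the task, the helper only provides the scaffolding.
using TaskBodyCallback = std::function<void(Trace &)>;

// Generic helper: emits the task wrapper and delegates the body to the
// callback, so it no longer needs to know about kernels or their arguments.
void emitTargetTask(Trace &out, const TaskBodyCallback &body, bool hasNoWait) {
  assert(body && "a task body callback is required");
  out.push_back(hasNoWait ? "task.begin(nowait)" : "task.begin");
  body(out);
  out.push_back("task.end");
}

// `target`: the body launches a kernel.
Trace emitTargetKernel(bool hasNoWait) {
  Trace out;
  emitTargetTask(out, [](Trace &t) { t.push_back("kernel.launch"); },
                 hasNoWait);
  return out;
}

// `target enter data`: the same helper, with a data-mapping body instead.
Trace emitTargetEnterData(bool hasNoWait) {
  Trace out;
  emitTargetTask(out, [](Trace &t) { t.push_back("data.begin"); }, hasNoWait);
  return out;
}
```

The point of the design is that the directive-specific part shrinks to the
callback, which is what lets other device directives reuse the task helper.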
---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |   4 +-
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  39 --
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 127 --
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  34 +++--
 .../omptarget-nowait-unsupported-llvm.mlir|  39 --
 .../LLVMIR/omptargetdata-nowait-llvm.mlir | 110 +++
 6 files changed, 243 insertions(+), 110 deletions(-)
 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-nowait-unsupported-llvm.mlir
 create mode 100644 mlir/test/Target/LLVMIR/omptargetdata-nowait-llvm.mlir

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 3747b00d4893ad..5e9f89b18918d2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -9672,8 +9672,8 @@ static void emitTargetCallKernelLaunch(
 DynCGGroupMem, HasNoWait);
 
 CGF.Builder.restoreIP(OMPRuntime->getOMPBuilder().emitKernelLaunch(
-CGF.Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, Args,
-DeviceID, RTLoc, AllocaIP));
+CGF.Builder, OutlinedFnID, EmitTargetCallFallbackCB, Args, DeviceID,
+RTLoc, AllocaIP));
   };
 
   if (RequiresOuterTask)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 8834c3b1f50115..d71712a677078c 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2264,6 +2264,9 @@ class OpenMPIRBuilder {
 
 bool EmitDebug = false;
 
+/// Whether the `target ... data` directive has a `nowait` clause.
+bool HasNoWait = false;
+
 explicit TargetDataInfo() {}
 explicit TargetDataInfo(bool RequiresDevicePointerInfo,
 bool SeparateBeginEndCalls)
@@ -2342,7 +2345,6 @@ class OpenMPIRBuilder {
   /// Generate a target region entry call and host fallback call.
   ///
   /// \param Loc The location at which the request originated and is fulfilled.
-  /// \param OutlinedFn The outlined kernel function.
   /// \param OutlinedFnID The ooulined function ID.
   /// \param EmitTargetCallFallbackCB Call back function to generate host
   ///fallback code.
@@ -2350,18 +2352,27 @@ class OpenMPIRBuilder {
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
-  InsertPointTy emitKernelLaunch(
-  const LocationDescription &Loc, Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP);
+  InsertPointTy
+  emitKernelLaunch(const LocationDescription &Loc, Value *OutlinedFnID,
+   EmitFallbackCallbackTy EmitTargetCallFallbackCB,
+   TargetKernelArgs &Args, Value *DeviceID, Value *RTLoc,
+   InsertPointTy AllocaIP);
+
+  /// Callback type for generating the bodies of device directives that require
+  /// outer tasks (e.g. in case of having `nowait` or `depend` clauses).
+  ///
+  /// \param DeviceID The ID of the device on which the target region will
+  ///execute.
+  /// \param RTLoc Source location identifier
+  /// \Param TargetTaskAllocaIP Insertion point for the alloca block of the
+  ///generated task.
+  using TaskBodyCallbackTy =
+  function_ref;
 
   /// Generate a target-task for the target construct
   ///
-  /// \param OutlinedFn The outlined device/target kernel function.
-  /// \param OutlinedFnID The ooulined function ID.
-  /// \param EmitTargetCallFallbackCB Call back function to generate host
-  ///fallback code.
-  /// \param Args Data structure holding information about the kernel arguments.
+  /// \param TaskBodyCB Callback to generate the actual body of the target task.
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
@@ -2370,10 +2381,10 @@ class OpenMPIRBuilder {
   /// \param HasNoWait True if the target construct had 'nowait' on it, false
   ///otherwise
   InsertPointTy emitTargetTask(
-  Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  V

[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/113305

>From 70a0c97fa86445d1f888cf3645c0b59df9e4a9d7 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 22 Oct 2024 02:02:58 -0500
Subject: [PATCH] [flang][OpenMP] Support `target enter|update|exit .. nowait`

Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |   4 +-
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  39 --
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 126 --
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  34 +++--
 .../omptarget-nowait-unsupported-llvm.mlir|  39 --
 .../LLVMIR/omptargetdata-nowait-llvm.mlir | 110 +++
 6 files changed, 242 insertions(+), 110 deletions(-)
 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-nowait-unsupported-llvm.mlir
 create mode 100644 mlir/test/Target/LLVMIR/omptargetdata-nowait-llvm.mlir

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 3747b00d4893ad..5e9f89b18918d2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -9672,8 +9672,8 @@ static void emitTargetCallKernelLaunch(
 DynCGGroupMem, HasNoWait);
 
 CGF.Builder.restoreIP(OMPRuntime->getOMPBuilder().emitKernelLaunch(
-CGF.Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, Args,
-DeviceID, RTLoc, AllocaIP));
+CGF.Builder, OutlinedFnID, EmitTargetCallFallbackCB, Args, DeviceID,
+RTLoc, AllocaIP));
   };
 
   if (RequiresOuterTask)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 8834c3b1f50115..d71712a677078c 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2264,6 +2264,9 @@ class OpenMPIRBuilder {
 
 bool EmitDebug = false;
 
+/// Whether the `target ... data` directive has a `nowait` clause.
+bool HasNoWait = false;
+
 explicit TargetDataInfo() {}
 explicit TargetDataInfo(bool RequiresDevicePointerInfo,
 bool SeparateBeginEndCalls)
@@ -2342,7 +2345,6 @@ class OpenMPIRBuilder {
   /// Generate a target region entry call and host fallback call.
   ///
   /// \param Loc The location at which the request originated and is fulfilled.
-  /// \param OutlinedFn The outlined kernel function.
   /// \param OutlinedFnID The ooulined function ID.
   /// \param EmitTargetCallFallbackCB Call back function to generate host
   ///fallback code.
@@ -2350,18 +2352,27 @@ class OpenMPIRBuilder {
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
-  InsertPointTy emitKernelLaunch(
-  const LocationDescription &Loc, Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP);
+  InsertPointTy
+  emitKernelLaunch(const LocationDescription &Loc, Value *OutlinedFnID,
+   EmitFallbackCallbackTy EmitTargetCallFallbackCB,
+   TargetKernelArgs &Args, Value *DeviceID, Value *RTLoc,
+   InsertPointTy AllocaIP);
+
+  /// Callback type for generating the bodies of device directives that require
+  /// outer tasks (e.g. in case of having `nowait` or `depend` clauses).
+  ///
+  /// \param DeviceID The ID of the device on which the target region will
+  ///execute.
+  /// \param RTLoc Source location identifier
+  /// \Param TargetTaskAllocaIP Insertion point for the alloca block of the
+  ///generated task.
+  using TaskBodyCallbackTy =
+  function_ref;
 
   /// Generate a target-task for the target construct
   ///
-  /// \param OutlinedFn The outlined device/target kernel function.
-  /// \param OutlinedFnID The ooulined function ID.
-  /// \param EmitTargetCallFallbackCB Call back function to generate host
-  ///fallback code.
-  /// \param Args Data structure holding information about the kernel arguments.
+  /// \param TaskBodyCB Callback to generate the actual body of the target task.
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
@@ -2370,10 +2381,10 @@ class OpenMPIRBuilder {
   /// \param HasNoWait True if the target construct had 'nowait' on it, false
   ///otherwise
   InsertPointTy emitTargetTask(
-  Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  V

[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits


@@ -6403,16 +6401,45 @@ OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createTargetData(
   SrcLocInfo = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
 }
 
-Value *OffloadingArgs[] = {SrcLocInfo,   DeviceID,
-   PointerNum,   RTArgs.BasePointersArray,
-   RTArgs.PointersArray, RTArgs.SizesArray,
-   RTArgs.MapTypesArray, RTArgs.MapNamesArray,
-   RTArgs.MappersArray};
+SmallVector OffloadingArgs = {
+SrcLocInfo,   DeviceID,
+PointerNum,   RTArgs.BasePointersArray,
+RTArgs.PointersArray, RTArgs.SizesArray,
+RTArgs.MapTypesArray, RTArgs.MapNamesArray,
+RTArgs.MappersArray};
 
 if (IsStandAlone) {
   assert(MapperFunc && "MapperFunc missing for standalone target data");
-  Builder.CreateCall(getOrCreateRuntimeFunctionPtr(*MapperFunc),
- OffloadingArgs);
+
+  auto TaskBodyCB = [&](Value *, Value *, IRBuilderBase::InsertPoint) {
+if (Info.HasNoWait) {
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(Int32));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(VoidPtr));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(Int32));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(VoidPtr));

ergawy wrote:

It does work. Thanks.

https://github.com/llvm/llvm-project/pull/113305


[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/113305

>From 52b59662de0693d3c9acb4e52d87e748cb9153cf Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 22 Oct 2024 02:02:58 -0500
Subject: [PATCH] [flang][OpenMP] Support `target enter|update|exit .. nowait`

Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |   4 +-
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  39 --
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 125 --
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  34 +++--
 .../omptarget-nowait-unsupported-llvm.mlir|  39 --
 .../LLVMIR/omptargetdata-nowait-llvm.mlir | 110 +++
 6 files changed, 241 insertions(+), 110 deletions(-)
 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-nowait-unsupported-llvm.mlir
 create mode 100644 mlir/test/Target/LLVMIR/omptargetdata-nowait-llvm.mlir

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 3747b00d4893ad..5e9f89b18918d2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -9672,8 +9672,8 @@ static void emitTargetCallKernelLaunch(
 DynCGGroupMem, HasNoWait);
 
 CGF.Builder.restoreIP(OMPRuntime->getOMPBuilder().emitKernelLaunch(
-CGF.Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, Args,
-DeviceID, RTLoc, AllocaIP));
+CGF.Builder, OutlinedFnID, EmitTargetCallFallbackCB, Args, DeviceID,
+RTLoc, AllocaIP));
   };
 
   if (RequiresOuterTask)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 8834c3b1f50115..c4735ec41e7134 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2264,6 +2264,9 @@ class OpenMPIRBuilder {
 
 bool EmitDebug = false;
 
+/// Whether the `target ... data` directive has a `nowait` clause.
+bool HasNoWait = false;
+
 explicit TargetDataInfo() {}
 explicit TargetDataInfo(bool RequiresDevicePointerInfo,
 bool SeparateBeginEndCalls)
@@ -2342,7 +2345,6 @@ class OpenMPIRBuilder {
   /// Generate a target region entry call and host fallback call.
   ///
   /// \param Loc The location at which the request originated and is fulfilled.
-  /// \param OutlinedFn The outlined kernel function.
   /// \param OutlinedFnID The ooulined function ID.
   /// \param EmitTargetCallFallbackCB Call back function to generate host
   ///fallback code.
@@ -2350,18 +2352,27 @@ class OpenMPIRBuilder {
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
-  InsertPointTy emitKernelLaunch(
-  const LocationDescription &Loc, Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP);
+  InsertPointTy
+  emitKernelLaunch(const LocationDescription &Loc, Value *OutlinedFnID,
+   EmitFallbackCallbackTy EmitTargetCallFallbackCB,
+   TargetKernelArgs &Args, Value *DeviceID, Value *RTLoc,
+   InsertPointTy AllocaIP);
+
+  /// Callback type for generating the bodies of device directives that require
+  /// outer target tasks (e.g. in case of having `nowait` or `depend` clauses).
+  ///
+  /// \param DeviceID The ID of the device on which the target region will
+  ///execute.
+  /// \param RTLoc Source location identifier
+  /// \Param TargetTaskAllocaIP Insertion point for the alloca block of the
+  ///generated task.
+  using TargetTaskBodyCallbackTy =
+  function_ref;
 
   /// Generate a target-task for the target construct
   ///
-  /// \param OutlinedFn The outlined device/target kernel function.
-  /// \param OutlinedFnID The ooulined function ID.
-  /// \param EmitTargetCallFallbackCB Call back function to generate host
-  ///fallback code.
-  /// \param Args Data structure holding information about the kernel arguments.
+  /// \param TaskBodyCB Callback to generate the actual body of the target task.
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
@@ -2370,10 +2381,10 @@ class OpenMPIRBuilder {
   /// \param HasNoWait True if the target construct had 'nowait' on it, false
   ///otherwise
   InsertPointTy emitTargetTask(
-  Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &A

[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-23 Thread Kareem Ergawy via cfe-commits


@@ -6403,16 +6401,45 @@ OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createTargetData(
   SrcLocInfo = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
 }
 
-Value *OffloadingArgs[] = {SrcLocInfo,   DeviceID,
-   PointerNum,   RTArgs.BasePointersArray,
-   RTArgs.PointersArray, RTArgs.SizesArray,
-   RTArgs.MapTypesArray, RTArgs.MapNamesArray,
-   RTArgs.MappersArray};
+SmallVector OffloadingArgs = {
+SrcLocInfo,   DeviceID,
+PointerNum,   RTArgs.BasePointersArray,
+RTArgs.PointersArray, RTArgs.SizesArray,
+RTArgs.MapTypesArray, RTArgs.MapNamesArray,
+RTArgs.MappersArray};
 
 if (IsStandAlone) {
   assert(MapperFunc && "MapperFunc missing for standalone target data");
-  Builder.CreateCall(getOrCreateRuntimeFunctionPtr(*MapperFunc),
- OffloadingArgs);
+
+  auto TaskBodyCB = [&](Value *, Value *, IRBuilderBase::InsertPoint) {
+if (Info.HasNoWait) {
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(Int32));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(VoidPtr));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(Int32));
+  OffloadingArgs.push_back(llvm::Constant::getNullValue(VoidPtr));
+}
+
+Builder.CreateCall(getOrCreateRuntimeFunctionPtr(*MapperFunc),
+   OffloadingArgs);
+
+if (Info.HasNoWait) {
+  BasicBlock *OffloadContBlock =
+  BasicBlock::Create(Builder.getContext(), "omp_offload.cont");
+  auto *CurFn = Builder.GetInsertBlock()->getParent();
+  emitBranch(OffloadContBlock);
+  emitBlock(OffloadContBlock, CurFn, /*IsFinished=*/true);

ergawy wrote:

Seems like `emitBlock` is enough indeed. Let's see if the CI objects.

https://github.com/llvm/llvm-project/pull/113305


[clang] [llvm] [mlir] [flang][OpenMP] Support `target enter|update|exit .. nowait` (PR #113305)

2024-10-22 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/113305

>From fddc36ea4086aaaf415f9c5b1f0150969eeacc6e Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 22 Oct 2024 02:02:58 -0500
Subject: [PATCH] [flang][OpenMP] Support `target enter|update|exit .. nowait`

Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |   4 +-
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  39 +++--
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 135 +++---
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  |  34 +++--
 .../omptarget-nowait-unsupported-llvm.mlir|  39 -
 .../LLVMIR/omptargetdata-nowait-llvm.mlir | 110 ++
 6 files changed, 246 insertions(+), 115 deletions(-)
 delete mode 100644 mlir/test/Target/LLVMIR/omptarget-nowait-unsupported-llvm.mlir
 create mode 100644 mlir/test/Target/LLVMIR/omptargetdata-nowait-llvm.mlir

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 3747b00d4893ad..5e9f89b18918d2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -9672,8 +9672,8 @@ static void emitTargetCallKernelLaunch(
 DynCGGroupMem, HasNoWait);
 
 CGF.Builder.restoreIP(OMPRuntime->getOMPBuilder().emitKernelLaunch(
-CGF.Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, Args,
-DeviceID, RTLoc, AllocaIP));
+CGF.Builder, OutlinedFnID, EmitTargetCallFallbackCB, Args, DeviceID,
+RTLoc, AllocaIP));
   };
 
   if (RequiresOuterTask)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 8834c3b1f50115..d71712a677078c 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2264,6 +2264,9 @@ class OpenMPIRBuilder {
 
 bool EmitDebug = false;
 
+/// Whether the `target ... data` directive has a `nowait` clause.
+bool HasNoWait = false;
+
 explicit TargetDataInfo() {}
 explicit TargetDataInfo(bool RequiresDevicePointerInfo,
 bool SeparateBeginEndCalls)
@@ -2342,7 +2345,6 @@ class OpenMPIRBuilder {
   /// Generate a target region entry call and host fallback call.
   ///
   /// \param Loc The location at which the request originated and is fulfilled.
-  /// \param OutlinedFn The outlined kernel function.
   /// \param OutlinedFnID The ooulined function ID.
   /// \param EmitTargetCallFallbackCB Call back function to generate host
   ///fallback code.
@@ -2350,18 +2352,27 @@ class OpenMPIRBuilder {
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
-  InsertPointTy emitKernelLaunch(
-  const LocationDescription &Loc, Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP);
+  InsertPointTy
+  emitKernelLaunch(const LocationDescription &Loc, Value *OutlinedFnID,
+   EmitFallbackCallbackTy EmitTargetCallFallbackCB,
+   TargetKernelArgs &Args, Value *DeviceID, Value *RTLoc,
+   InsertPointTy AllocaIP);
+
+  /// Callback type for generating the bodies of device directives that require
+  /// outer tasks (e.g. in case of having `nowait` or `depend` clauses).
+  ///
+  /// \param DeviceID The ID of the device on which the target region will
+  ///execute.
+  /// \param RTLoc Source location identifier
+  /// \Param TargetTaskAllocaIP Insertion point for the alloca block of the
+  ///generated task.
+  using TaskBodyCallbackTy =
+  function_ref;
 
   /// Generate a target-task for the target construct
   ///
-  /// \param OutlinedFn The outlined device/target kernel function.
-  /// \param OutlinedFnID The ooulined function ID.
-  /// \param EmitTargetCallFallbackCB Call back function to generate host
-  ///fallback code.
-  /// \param Args Data structure holding information about the kernel arguments.
+  /// \param TaskBodyCB Callback to generate the actual body of the target task.
   /// \param DeviceID Identifier for the device via the 'device' clause.
   /// \param RTLoc Source location identifier
   /// \param AllocaIP The insertion point to be used for alloca instructions.
@@ -2370,10 +2381,10 @@ class OpenMPIRBuilder {
   /// \param HasNoWait True if the target construct had 'nowait' on it, false
   ///otherwise
   InsertPointTy emitTargetTask(
-  Function *OutlinedFn, Value *OutlinedFnID,
-  EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs &Args,
-  Valu

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-06 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From f938ba2240e03756ac7597eedd0b5ac3ad1ece3e Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
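
As a rough illustration of the option this patch introduces
(`-fdo-concurrent-parallel=` with the values `none`, `host`, and `device`, per
the `Options.td` hunk in this patch), here is a minimal, self-contained C++
sketch of parsing such an argument. It is not flang's implementation:
`DoConcurrentMapping`, `parseDoConcurrentParallel`, and `toString` are
hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical enum mirroring the three accepted option values.
enum class DoConcurrentMapping { None, Host, Device };

// Parses a full argument such as "-fdo-concurrent-parallel=host".
// Returns std::nullopt for a different option or an unknown value.
std::optional<DoConcurrentMapping>
parseDoConcurrentParallel(std::string_view arg) {
  constexpr std::string_view prefix = "-fdo-concurrent-parallel=";
  if (arg.substr(0, prefix.size()) != prefix)
    return std::nullopt;
  const std::string_view value = arg.substr(prefix.size());
  if (value == "none")
    return DoConcurrentMapping::None;
  if (value == "host")
    return DoConcurrentMapping::Host;
  if (value == "device")
    return DoConcurrentMapping::Device;
  return std::nullopt;
}

// Spelling used when echoing the selected mode back to the user.
std::string_view toString(DoConcurrentMapping mapping) {
  switch (mapping) {
  case DoConcurrentMapping::None:
    return "none";
  case DoConcurrentMapping::Host:
    return "host";
  case DoConcurrentMapping::Device:
    return "device";
  }
  assert(false && "unhandled DoConcurrentMapping value");
  return {};
}
```

A caller would typically treat `std::nullopt` as a diagnostic case and fall back
to `none`; that fallback is an assumption of this sketch, not something the
patch states.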
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c0749c418b7bcec..7fc22bcdff17cd2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6901,6 +6901,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-parallel=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index e4019c434968744..cf08ec1b900ad90 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 000..d40383a06a47b6c
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-parallel`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `om

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-06 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 03d500e28d76ab356537f771dd75ecce4010bd48 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c0749c418b7bcec..7fc22bcdff17cd2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6901,6 +6901,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-parallel=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index e4019c434968744..cf08ec1b900ad90 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 000..6805061859556a1
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-parallel`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `om

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-06 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/126026

This PR starts the effort to upstream AMD's internal implementation of `do 
concurrent` to OpenMP mapping. This replaces #77285 since we extended this WIP 
quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current status 
downstream, the upstreaming status, and next steps to make this pass much more 
useful.

In addition to this document, this PR also contains the skeleton of the pass 
(no useful transformations are done yet) and some testing for the added command 
line options.

>From 93576eadc75bece4e49cff2b95519287cb98b8d7 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 17 files changed, 719 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index c0749c418b7bcec..7fc22bcdff17cd2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6901,6 +6901,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-parallel=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index e4019c434968744..cf08ec1b900ad90 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 000..d40383a06a47b6c
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current 

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-06 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/126026
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-06 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 04b5656b48bedf1250280c4145ee1c9b2a3f7cdf Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 77ca2d2aac31be1..992e4066153829c 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6910,6 +6910,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-parallel=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index e4019c434968744..cf08ec1b900ad90 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 000..6805061859556a1
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-parallel`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `om

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-10 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 4ea51578a841ae29e17a366d96a7c0f626806623 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1cf62ab46613456..b0e1ed7d26f1985 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bbb9..febe339ca9e0dc1 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 000..6807e402ce081c6
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-10 Thread Kareem Ergawy via cfe-commits


@@ -6910,6 +6910,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-parallel=">,

ergawy wrote:

Changed to `do-concurrent-to-openmp`. Let me know if the name can be improved.
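
For readers following the rename, here is a minimal sketch of how the three values accepted by the flag (`none`, `host`, `device`, per the `Values<...>` list in the quoted hunk) could be validated. This is not the code from this PR: the enum `DoConcurrentMapping` and the function `parseDoConcurrentMapping` are hypothetical names, and standard C++17 types stand in for LLVM's option-parsing utilities.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum for illustration only; the type actually added by the PR
// is not shown in this thread.
enum class DoConcurrentMapping { None, Host, Device };

// Maps the text after `-fdo-concurrent-to-openmp=` to an enum value.
// Returns std::nullopt for anything outside none|host|device.
constexpr std::optional<DoConcurrentMapping>
parseDoConcurrentMapping(std::string_view value) {
  if (value == "none")
    return DoConcurrentMapping::None;
  if (value == "host")
    return DoConcurrentMapping::Host;
  if (value == "device")
    return DoConcurrentMapping::Device;
  return std::nullopt;
}
```

In this sketch an unrecognized value yields `std::nullopt`, which a caller could turn into a diagnostic rather than silently falling back to `none`.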

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-10 Thread Kareem Ergawy via cfe-commits


@@ -142,6 +142,12 @@ static llvm::cl::opt
llvm::cl::desc("enable openmp device compilation"),
llvm::cl::init(false));
 
+static llvm::cl::opt enableDoConcurrentToOpenMPConversion(
+"fdo-concurrent-parallel",

ergawy wrote:

Done.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-10 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Ping! Please take a look when you have time. And let me know if you disagree 
with any decisions taken in the current approach or need any 
clarification/expansion in the next steps section of the status tracking 
document.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

@skatrak @kiranchandramohan I removed the "current status" part of the 
document. Left the other sections since they are not related to upcoming 
upstreaming PRs.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 207fc495f95a852f2689b0fb1d369ac1cc0dea17 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/7] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 618815db28434..57472abd66a7d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bb..febe339ca9e0d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp targ

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
+   This maps such loops to the equivalent of
+   `omp target teams distribute parallel do`.
+3. `none`: this disables `do concurrent` mapping altogether. In that case, such
+   loops are emitted as sequential loops.
+
+The above compiler switch is currently available only when OpenMP is also
+enabled. So you need to provide the following options to flang in order to
+enable it:
+```
+flang ... -fopenmp -fdo-concurrent-to-openmp=[host|device|none] ...
+```
+
+## Current status
+
+Under the hood, `do concurrent` mapping is implemented in the
+`DoConcurrentConversionPass`. This is still an experimental pass which means
+that:
+* It has been tested in a very limited way so far.
+* It has been tested mostly on simple synthetic inputs.
+
+To describe the current status in more detail, the following is a description
+of how the pass currently behaves for single-range loops and then for
+multi-range loops. The following sub-sections describe the status of the
+downstream implementation on AMD's ROCm fork[^1]. We are working on upstreaming
+the downstream implementation gradually, and this document will be updated to
+reflect that upstreaming process. Example LIT tests referenced below might also
+only be available in the ROCm fork and will be upstreamed with the relevant
+parts of the code.
+
+[^1]: https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+
+### Single-range loops
+
+Given the following loop:
+```fortran
+  do concurrent(i=1:n)
+    a(i) = i * i
+  end do
+```
+
+#### Mapping to `host`
+
+Mapping this loop to the `host` generates MLIR operations of the following
+structure:
+
+```
+%4 = fir.address_of(@_QFEa) ...
+%6:2 = hlfir.declare %4 ...
+
+omp.parallel {
+  // Allocate private copy for `i`.
+  // TODO Use delayed privatization.
+  %19 = fir.alloca i32 {bindc_name = "i"}
+  %20:2 = hlfir.declare %19 {uniq_name = "_QFEi"} ...
+
+  omp.wsloop {
+    omp.loop_nest (%arg0) : index = (%21) to (%22) inclusive step (%c1_2) {
+      %23 = fir.convert %arg0 : (index) -> i32
+      // Use the privatized version of `i`.
+      fir.store %23 to %20#1 : !fir.ref
+      ...
+
+      // Use "shared" SSA value of `a`.
+      %42 = hlfir.designate %6#0
+      hlfir.assign %35 to %42
+      ...
+      omp.yield
+    }
+    omp.terminator
+  }
+  omp.terminator
+}
+```
+
+#### Mapping to `device`
+
+Mapping the same loop to the `device` generates MLIR operations of the
+following structure:
+
+```
+// Map `a` to the `target` region. The pass automatically detects memory blocks
+// and maps them to device. Currently detection logic is still limited and a lot
+// of work is going into making it more capable.
+%29 = omp.map.info ... {name = "_QFEa"}
+omp.target ... map_entries(..., %29 -> %arg4 ...) {
+  ...
+  %51:2 = hlfir.declare %arg4
+  ...
+  omp.teams {
+    // Allocate private copy for `i`.
+    // TODO Use delayed privatization.
+    %52 = fir.alloca i32 {bindc_name = "i"}
+    %53:2 = hlfir.declare %52
+    ...
+
+    omp.parallel {
+      omp.distribute {
+        omp.wsloop {
+          omp.loop_nest (%arg5) : index = (%54) to (%55) inclusive step (%c1_9) {
+            // Use the privatized version of `i`.
+            %56 = fir.convert %arg5 : (index) -> i32
+            fir.store %56 to %53#1
+            ...
+            // Use the mapped version of `a`.
+            ... = hlfir.designate %51#0
+            ...
+          }
+          omp.terminator
+        }
+        omp.terminator
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  omp.terminator
+}
+```
+
+### Multi-range loops
+
+The pass currently supports multi-range loops as well. Given the following
+example:
+
+```fortran
+   do concurrent(i=1:n, j=1:m)
+   a(i,j) = i * j
+   end do
+```
+
+The generated `omp.loop_nest` operation looks like:
+
+```
+omp.loop_nest (%arg0, %arg1)
+: index = (%17, %19) to (%18, %20)
+inclusive step (%c1_2, %c1_4) {
+  fir.store %arg0 to %private_i#1 : !fir.ref
+  fir.store %arg1

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits


@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,

ergawy wrote:

Yes, I think this will be a best-effort transformation, hopefully with good
diagnostics on which loops were transformed and which were not.

https://github.com/llvm/llvm-project/pull/126026
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target
+   device. This maps such loops to the equivalent of
+   `omp target teams distribute parallel do`.
+3. `none`: this disables `do concurrent` mapping altogether. In that case, such
+   loops are emitted as sequential loops.
+
+The above compiler switch is currently available only when OpenMP is also
+enabled. So you need to provide the following options to flang in order to
+enable it:
+```
+flang ... -fopenmp -fdo-concurrent-to-openmp=[host|device|none] ...
+```
+
+## Current status
+
+Under the hood, `do concurrent` mapping is implemented in the
+`DoConcurrentConversionPass`. This is still an experimental pass which means
+that:
+* It has been tested in a very limited way so far.
+* It has been tested mostly on simple synthetic inputs.
+
+To describe current status in more detail, following is a description of how
+the pass currently behaves for single-range loops and then for multi-range
+loops. The following sub-sections describe the status of the downstream 
+implementation on AMD's ROCm fork[^1]. We are working on upstreaming the
+downstream implementation gradually and this document will be updated to reflect
+such upstreaming process. Example LIT tests referenced below might also only
+be available in the ROCm fork and will be upstreamed with the relevant parts of
+the code.
+
+[^1]: https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+
+### Single-range loops
+
+Given the following loop:
+```fortran
+  do concurrent(i=1:n)
+a(i) = i * i
+  end do
+```
+
+ Mapping to `host`
+
+Mapping this loop to the `host` generates MLIR operations of the following
+structure:
+
+```
+%4 = fir.address_of(@_QFEa) ...
+%6:2 = hlfir.declare %4 ...
+
+omp.parallel {
+  // Allocate private copy for `i`.
+  // TODO Use delayed privatization.
+  %19 = fir.alloca i32 {bindc_name = "i"}
+  %20:2 = hlfir.declare %19 {uniq_name = "_QFEi"} ...
+
+  omp.wsloop {
+omp.loop_nest (%arg0) : index = (%21) to (%22) inclusive step (%c1_2) {
+  %23 = fir.convert %arg0 : (index) -> i32
+  // Use the privatized version of `i`.
+  fir.store %23 to %20#1 : !fir.ref
+  ...
+
+  // Use "shared" SSA value of `a`.
+  %42 = hlfir.designate %6#0
+  hlfir.assign %35 to %42
+  ...
+  omp.yield
+}
+omp.terminator
+  }
+  omp.terminator
+}
+```
+
+ Mapping to `device`
+
+Mapping the same loop to the `device` generates MLIR operations of the
+following structure:
+
+```
+// Map `a` to the `target` region. The pass automatically detects memory blocks
+// and maps them to device. Currently, the detection logic is still limited and
+// a lot of work is going into making it more capable.
+%29 = omp.map.info ... {name = "_QFEa"}
+omp.target ... map_entries(..., %29 -> %arg4 ...) {
+  ...
+  %51:2 = hlfir.declare %arg4
+  ...
+  omp.teams {
+// Allocate private copy for `i`.
+// TODO Use delayed privatization.
+%52 = fir.alloca i32 {bindc_name = "i"}
+%53:2 = hlfir.declare %52
+...
+
+omp.parallel {
+  omp.distribute {
+omp.wsloop {
+  omp.loop_nest (%arg5) : index = (%54) to (%55) inclusive step (%c1_9) {
+// Use the privatized version of `i`.
+%56 = fir.convert %arg5 : (index) -> i32
+fir.store %56 to %53#1
+...
+// Use the mapped version of `a`.
+... = hlfir.designate %51#0
+...
+  }
+  omp.terminator
+}
+omp.terminator
+  }
+  omp.terminator
+}
+omp.terminator
+  }
+  omp.terminator
+}
+```
+
+### Multi-range loops
+
+The pass currently supports multi-range loops as well. Given the following
+example:
+
+```fortran
+   do concurrent(i=1:n, j=1:m)
+   a(i,j) = i * j
+   end do
+```
+
+The generated `omp.loop_nest` operation looks like:
+
+```
+omp.loop_nest (%arg0, %arg1)
+: index = (%17, %19) to (%18, %20)
+inclusive step (%c1_2, %c1_4) {
+  fir.store %arg0 to %private_i#1 : !fir.ref
+  fir.store %arg1

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/126026

>From 207fc495f95a852f2689b0fb1d369ac1cc0dea17 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/6] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 618815db28434..57472abd66a7d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bb..febe339ca9e0d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp targ

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-12 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target
+   device. This maps such loops to the equivalent of
+   `omp target teams distribute parallel do`.
+3. `none`: this disables `do concurrent` mapping altogether. In that case, such
+   loops are emitted as sequential loops.
+
+The above compiler switch is currently available only when OpenMP is also
+enabled. So you need to provide the following options to flang in order to
+enable it:
+```
+flang ... -fopenmp -fdo-concurrent-to-openmp=[host|device|none] ...

ergawy wrote:

Yes, I expanded this point to clarify this.
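
As a rough illustration of the requirement quoted above (the switch only takes effect together with `-fopenmp`), the gating logic could be sketched in plain C++ as below. The names `MappingKind`, `MappingDecision`, and `resolveMapping` are hypothetical and do not come from the patch. Falling back to sequential loops and flagging a warning is an assumption made for the sketch, not a statement of what flang does.

```cpp
#include <cassert>

// Hypothetical stand-in for the three values accepted by
// -fdo-concurrent-to-openmp=[none|host|device].
enum class MappingKind { None, Host, Device };

// Result of combining the requested mapping with the OpenMP setting.
struct MappingDecision {
  MappingKind kind; // mapping that would actually be applied
  bool warn;        // true when the request had to be dropped
};

// Assumption: without OpenMP the request is ignored, so loops stay
// sequential, and the caller is told to emit a diagnostic.
inline MappingDecision resolveMapping(bool openmpEnabled,
                                      MappingKind requested) {
  if (!openmpEnabled && requested != MappingKind::None)
    return {MappingKind::None, true};
  return {requested, false};
}
```

Keeping the decision in one small function makes the "OpenMP must also be enabled" rule testable independently of the driver and frontend plumbing.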

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-13 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/126026

>From 207fc495f95a852f2689b0fb1d369ac1cc0dea17 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/8] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 618815db28434..57472abd66a7d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bb..febe339ca9e0d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp targ

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-13 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,99 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_DOCONCURRENTCONVERSIONPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+#define DEBUG_TYPE "do-concurrent-conversion"
+#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE << "]: ")
+
+namespace {
+class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
+public:
+  using mlir::OpConversionPattern<fir::DoLoopOp>::OpConversionPattern;
+
+  DoConcurrentConversion(mlir::MLIRContext *context, bool mapToDevice)
+  : OpConversionPattern(context), mapToDevice(mapToDevice) {}
+
+  mlir::LogicalResult
+  matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor,
+  mlir::ConversionPatternRewriter &rewriter) const override {
+// TODO This will be filled in by the next PRs, which upstream the rest of
+// the ROCm implementation.
+return mlir::success();
+  }
+
+  bool mapToDevice;
+};
+
+class DoConcurrentConversionPass
+: public flangomp::impl::DoConcurrentConversionPassBase<
+  DoConcurrentConversionPass> {
+public:
+  DoConcurrentConversionPass() = default;
+
+  DoConcurrentConversionPass(
+  const flangomp::DoConcurrentConversionPassOptions &options)
+  : DoConcurrentConversionPassBase(options) {}
+
+  void runOnOperation() override {
+mlir::func::FuncOp func = getOperation();
+
+if (func.isDeclaration())
+  return;
+
+auto *context = &getContext();
+
+if (mapTo != flangomp::DoConcurrentMappingKind::DCMK_Host &&
+mapTo != flangomp::DoConcurrentMappingKind::DCMK_Device) {
+  mlir::emitWarning(mlir::UnknownLoc::get(context),
+"DoConcurrentConversionPass: invalid `map-to` value. "
+"Valid values are: `host` or `device`");
+  return;
+}
+
+mlir::RewritePatternSet patterns(context);
+patterns.insert<DoConcurrentConversion>(
+context, mapTo == flangomp::DoConcurrentMappingKind::DCMK_Device);
+mlir::ConversionTarget target(*context);
+target.addDynamicallyLegalOp<fir::DoLoopOp>(
+[&](fir::DoLoopOp op) { return !op.getUnordered(); });

ergawy wrote:

I am hoping this can be used for all constructs that target `do_loop ...
unordered`. Added a comment as you suggested.
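
To make the legality rule in the quoted hunk concrete, here is a minimal sketch in plain C++ with no MLIR dependency. `LoopOp`, `isLegal`, and `loopsToConvert` are hypothetical stand-ins introduced only for illustration. The one thing taken from the diff is the predicate itself: a loop counts as legal, and is therefore left alone by the conversion, exactly when it is not `unordered`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for `fir::DoLoopOp`; only the `unordered`
// attribute matters for this sketch.
struct LoopOp {
  bool unordered;
};

// Same shape as the predicate in the diff: a loop is legal (no rewrite
// needed) when it is NOT marked unordered.
inline bool isLegal(const LoopOp &op) { return !op.unordered; }

// Returns the indices of the loops that are still illegal, i.e. the ones a
// conversion pattern would be asked to rewrite.
inline std::vector<std::size_t>
loopsToConvert(const std::vector<LoopOp> &loops) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < loops.size(); ++i)
    if (!isLegal(loops[i]))
      result.push_back(i);
  return result;
}
```

Because the check looks only at the `unordered` attribute, any construct that lowers to an unordered loop would be picked up in this sketch, which is the reuse hoped for in the comment.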

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-13 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,99 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include 
+#include 

ergawy wrote:

Updated includes. I just copied some extra includes from the downstream 
implementation.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-13 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,53 @@
+! Mark as xfail for now until we upstream the relevant part. This is just for
+! demo purposes at this point. Upstreaming this is the next step.
+! XFAIL: *
+
+! Tests mapping of a basic `do concurrent` loop to `!$omp parallel do`.
+
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fdo-concurrent-to-openmp=host %s -o - \
+! RUN:   | FileCheck %s
+! RUN: bbc -emit-hlfir -fopenmp -fdo-concurrent-to-openmp=host %s -o - \
+! RUN:   | FileCheck %s

ergawy wrote:

I will add some MLIR-to-MLIR tests using `fir-opt` in later PRs.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits


@@ -292,7 +298,19 @@ createTargetMachine(llvm::StringRef targetTriple, std::string &error) {
 static llvm::LogicalResult runOpenMPPasses(mlir::ModuleOp mlirModule) {
   mlir::PassManager pm(mlirModule->getName(),
mlir::OpPassManager::Nesting::Implicit);
-  fir::createOpenMPFIRPassPipeline(pm, enableOpenMPDevice);
+  using DoConcurrentMappingKind =
+  Fortran::frontend::CodeGenOptions::DoConcurrentMappingKind;
+
+  fir::OpenMPFIRPassPipelineOpts opts;
+  opts.isTargetDevice = enableOpenMPDevice;
+  opts.doConcurrentMappingKind =
+  llvm::StringSwitch<DoConcurrentMappingKind>(
+  enableDoConcurrentToOpenMPConversion)
+  .Case("host", DoConcurrentMappingKind::DCMK_Host)
+  .Case("device", DoConcurrentMappingKind::DCMK_Device)
+  .Default(DoConcurrentMappingKind::DCMK_None);
+

ergawy wrote:

This is inside `runOpenMPPasses`, which is called only when OpenMP is enabled.
We can move this outside and trigger a warning when the pass is enabled without
OpenMP being enabled, but since `bbc` is a testing tool and not user-facing, I
don't know if this is needed.
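
For readers who do not have `llvm::StringSwitch` at hand, the string-to-kind mapping in the quoted hunk can be approximated in standard C++ as sketched below. The enum and the function name `parseMappingKind` are hypothetical. What the sketch keeps from the diff is the behavior: `host` and `device` are recognized, and every other value, including the empty string, falls back to "none".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the mapping-kind enum referenced in the diff.
enum class DoConcurrentMappingKind { None, Host, Device };

// Maps the option text to a mapping kind. Unrecognized values fall back to
// None, mirroring the `.Default(...)` arm of the StringSwitch in the hunk.
inline DoConcurrentMappingKind parseMappingKind(const std::string &value) {
  if (value == "host")
    return DoConcurrentMappingKind::Host;
  if (value == "device")
    return DoConcurrentMappingKind::Device;
  return DoConcurrentMappingKind::None;
}
```

A silent fallback is reasonable for a testing tool such as `bbc`. A user-facing driver would more likely want to reject unknown values, which is what the `Values<"none,host,device">` entry in `Options.td` is there to support.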

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/126026

>From d1ed094aec0713cf6ff75b4244bcd4d15265c6af Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/3] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1cf62ab466134..b0e1ed7d26f19 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bb..febe339ca9e0d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp targ

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,104 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_DOCONCURRENTCONVERSIONPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+#define DEBUG_TYPE "do-concurrent-conversion"
+#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE << "]: ")
+
+namespace {
+class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
+public:
+  using mlir::OpConversionPattern<fir::DoLoopOp>::OpConversionPattern;
+
+  DoConcurrentConversion(mlir::MLIRContext *context, bool mapToDevice,
+ llvm::DenseSet &concurrentLoopsToSkip)
+  : OpConversionPattern(context), mapToDevice(mapToDevice),
+concurrentLoopsToSkip(concurrentLoopsToSkip) {}
+
+  mlir::LogicalResult
+  matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor,
+  mlir::ConversionPatternRewriter &rewriter) const override {
+return mlir::success();

ergawy wrote:

Added a TODO. The very next PR will fill this in with the host mapping, so
triggering a compilation failure should hopefully not be needed.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/126026

>From d1ed094aec0713cf6ff75b4244bcd4d15265c6af Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/4] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 1cf62ab466134..b0e1ed7d26f19 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6919,6 +6919,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", 
"version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, 
Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 591003f56e8bb..febe339ca9e0d 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implmenentation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flags has 3 possible values:
+1. `host`: this maps `do concurent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp targ

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits


@@ -352,16 +352,37 @@ bool CodeGenAction::beginSourceFileAction() {
   // Add OpenMP-related passes
   // WARNING: These passes must be run immediately after the lowering to ensure
   // that the FIR is correct with respect to OpenMP operations/attributes.
-  if (ci.getInvocation().getFrontendOpts().features.IsEnabled(
-  Fortran::common::LanguageFeature::OpenMP)) {
-bool isDevice = false;
+  bool isOpenMPEnabled =
+  ci.getInvocation().getFrontendOpts().features.IsEnabled(
+  Fortran::common::LanguageFeature::OpenMP);
+
+  fir::OpenMPFIRPassPipelineOpts opts;
+
+  using DoConcurrentMappingKind =
+  Fortran::frontend::CodeGenOptions::DoConcurrentMappingKind;
+  opts.doConcurrentMappingKind =
+  ci.getInvocation().getCodeGenOpts().getDoConcurrentMapping();
+
+  if (opts.doConcurrentMappingKind != DoConcurrentMappingKind::DCMK_None &&
+  !isOpenMPEnabled) {
+unsigned diagID = ci.getDiagnostics().getCustomDiagID(
+clang::DiagnosticsEngine::Error,
+"lowering `do concurrent` loops to OpenMP is only supported if "
+"OpenMP is enabled. Enable OpenMP using `-fopenmp`.");
+ci.getDiagnostics().Report(diagID);
+return false;
+  }

ergawy wrote:

Makes sense. Done.
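
For readers skimming the thread, the check in the quoted hunk can be sketched 
as a standalone snippet. This is only an illustration, not the flang code: 
`MappingKind`, `parseMappingKind`, and `checkMapping` are hypothetical names 
standing in for `DoConcurrentMappingKind` and the diagnostics machinery; only 
the option values (`none`, `host`, `device`) and the error text come from the 
patch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for CodeGenOptions::DoConcurrentMappingKind.
enum class MappingKind { None, Host, Device };

// Parses the value given to -fdo-concurrent-to-openmp=<value>.
// Returns std::nullopt for anything other than none, host, or device.
std::optional<MappingKind> parseMappingKind(std::string_view value) {
  if (value == "none")
    return MappingKind::None;
  if (value == "host")
    return MappingKind::Host;
  if (value == "device")
    return MappingKind::Device;
  return std::nullopt;
}

// Mirrors the condition in the hunk: mapping is requested but OpenMP is off.
// Returns the error message, or an empty string when the combination is fine.
std::string checkMapping(MappingKind kind, bool isOpenMPEnabled) {
  if (kind != MappingKind::None && !isOpenMPEnabled)
    return "lowering `do concurrent` loops to OpenMP is only supported if "
           "OpenMP is enabled. Enable OpenMP using `-fopenmp`.";
  return {};
}
```

Returning a string instead of reporting through a diagnostics engine keeps the 
sketch free of any LLVM dependency; the real code reports an error and returns 
`false` from `beginSourceFileAction`.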

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-11 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-17 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Ping! Any objections to merging this PR? cc @skatrak @clementval @tarunprabhu 
(and other reviewers).

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-16 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 2a54270a2ad7f42ddf6787afd81a8b98641f8082 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 01/10] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 380 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 ++
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 ++
 .../flang/Optimizer/Passes/Pipelines.h|  11 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  30 ++
 flang/lib/Frontend/FrontendActions.cpp|  31 +-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp | 104 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |   9 +-
 .../Transforms/DoConcurrent/basic_host.f90|  53 +++
 .../DoConcurrent/command_line_options.f90 |  18 +
 flang/tools/bbc/bbc.cpp   |  20 +-
 18 files changed, 720 insertions(+), 10 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/command_line_options.f90

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 5ad187926e710..fedf2cdad3d49 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", 
"version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, 
Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..bf0bfacd03742 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_do_concurrent_parallel_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..6807e402ce081
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,380 @@
+
+
+# `DO CONCURENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implmenentation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flags has 3 possible values:
+1. `host`: this maps `do concurent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurent` loops to run in parallel on a device
+   (GPU). This maps such loops to the equivalent of `omp ta

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From f946ee6c8c34819d36818dbc3a5430c8b9e8a059 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", 
"version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, 
Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..43a8ff47161de
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target 
device.
+   This maps such loops to the equivalent 

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 477b5b8d22ddd7b0a873519c8cc16b0e4a3c81ca Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", 
"version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, 
Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target 
device.
+   This maps such loops to the equivalent 

[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127478)

2025-02-17 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/127478

Upstreams the next part of the `do concurrent` to OpenMP mapping pass (from
AMD's ROCm implementation). See 
https://github.com/llvm/llvm-project/pull/126026 for more context.

This PR adds loop-nest detection logic. This enables us to discover
multi-range `do concurrent` loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow-up to #126026; only the latest commit is relevant.
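
To make the idea of loop-nest detection concrete, here is a minimal sketch on 
a toy IR. It is not the pass's algorithm: `Node`, `makeLoop`, `makeOp`, and 
`collectPerfectNest` are hypothetical, and the sketch assumes a nest is 
"perfect" only when each loop body holds exactly one nested unordered loop, 
ignoring the bookkeeping ops that real FIR places around the inner loop.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified IR node: either a loop or some other op.
struct Node {
  bool isLoop = false;
  bool isUnordered = false; // stands in for `unordered` on fir.do_loop
  std::vector<Node> body;   // only meaningful when isLoop is true
};

// Builds an unordered loop with the given body.
Node makeLoop(std::vector<Node> body) {
  Node loop;
  loop.isLoop = true;
  loop.isUnordered = true;
  loop.body = std::move(body);
  return loop;
}

// Builds a non-loop op.
Node makeOp() { return Node{}; }

// Collects the loops of the perfect nest rooted at `outer`, outermost first.
// Descends while the current body is exactly one unordered loop.
std::vector<const Node *> collectPerfectNest(const Node &outer) {
  assert(outer.isLoop && "expected a loop at the root of the nest");
  std::vector<const Node *> nest{&outer};
  const Node *current = &outer;
  while (current->body.size() == 1 && current->body.front().isLoop &&
         current->body.front().isUnordered) {
    current = &current->body.front();
    nest.push_back(current);
  }
  return nest;
}
```

The size of the returned vector would play the role of the OpenMP `collapse` 
count; a loop whose body mixes other ops with a nested loop yields a nest of 
size one and would be mapped on its own.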

>From 477b5b8d22ddd7b0a873519c8cc16b0e4a3c81ca Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/2] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", 
"version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, 
Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp 
b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-17 Thread Kareem Ergawy via cfe-commits


@@ -0,0 +1,104 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_DOCONCURRENTCONVERSIONPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+#define DEBUG_TYPE "do-concurrent-conversion"
+#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE << "]: ")
+
+namespace {
+class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
+public:
+  using mlir::OpConversionPattern<fir::DoLoopOp>::OpConversionPattern;
+
+  DoConcurrentConversion(mlir::MLIRContext *context, bool mapToDevice,
+ llvm::DenseSet &concurrentLoopsToSkip)
+  : OpConversionPattern(context), mapToDevice(mapToDevice),
+concurrentLoopsToSkip(concurrentLoopsToSkip) {}
+
+  mlir::LogicalResult
+  matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor,
+  mlir::ConversionPatternRewriter &rewriter) const override {
+return mlir::success();

ergawy wrote:

Not in a hurry to merge it; we can wait until the pass has enough functionality 
to do some actual transformation. I opened the next PR for review: 
https://github.com/llvm/llvm-project/pull/127478.

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-21 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Thanks @skatrak and @bhandarkar-pranav for the approval.

@kiranchandramohan @clementval I think there is a pretty simple solution that 
enables us to mark multi-range loop nests. I think we can add an optional 
attribute to the `fir::DoLoopOp` to store the loop nest depth: `nest_depth(n)`. 
So for the following input:
```fortran
do concurrent (i=1:10, j=1:10)
end do
```
The MLIR would look like this:
```mlir
fir.do_loop %arg0 = %10 to %11 step %c1 unordered nest_depth(2) {
  %14 = fir.convert %arg0 : (index) -> i32
  fir.store %14 to %3#1 : !fir.ref<i32>
  fir.do_loop %arg1 = %12 to %13 step %c1_2 unordered {
    %15 = fir.convert %arg1 : (index) -> i32
    fir.store %15 to %1#1 : !fir.ref<i32>

  }
}
```
I think this is a non-disruptive change to the op that enables us to model 
loop nests more easily.
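
A rough sketch of how a consumer could use such an attribute, on a toy model 
rather than real FIR: `Loop`, its `nestDepth` field, and `loopsInNest` are 
hypothetical names, and the sketch assumes that a loop marked with depth `n` 
is guaranteed to contain exactly one nested loop at each of the next `n - 1` 
levels.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical mirror of a fir.do_loop carrying the proposed nest_depth(n).
struct Loop {
  std::optional<unsigned> nestDepth; // absent means "not the root of a nest"
  std::vector<Loop> nested;          // loops directly inside this loop's body
};

// Returns the loops that belong to the nest rooted at `outer`, outermost
// first. A loop without the attribute is treated as a nest of depth 1.
std::vector<const Loop *> loopsInNest(const Loop &outer) {
  const unsigned depth = outer.nestDepth.value_or(1);
  std::vector<const Loop *> nest{&outer};
  const Loop *current = &outer;
  for (unsigned level = 1; level < depth; ++level) {
    assert(current->nested.size() == 1 &&
           "nest_depth promises exactly one nested loop per level");
    current = &current->nested.front();
    nest.push_back(current);
  }
  return nest;
}
```

With the depth recorded on the outermost loop, the consumer only has to walk 
down a known number of levels instead of inferring the nest from the loop 
bodies.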

@clementval as for the locality specifiers, I am working on a PoC (based on 
which I will write an RFC) to have a shared "Data Environment" dialect that can 
be used across `do concurrent`, OpenMP, and OpenACC. I published what I have so 
far in https://github.com/llvm/llvm-project/pull/128148.

https://github.com/llvm/llvm-project/pull/127595


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-20 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

@clementval @jeanPerier can you please take a look at the PR and 
@kiranchandramohan's comment above? 🙏

https://github.com/llvm/llvm-project/pull/127595


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-24 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

@kiranchandramohan @clementval thanks for your comments (and sorry for the late 
response, I was off yesterday).

Sure, we can work on a multi-range loop op in FIR. Our team did not write the 
current loop op definition, so I was working with what I had.

Just to be on the same page, do you suggest having a separate op for `do 
concurrent` (separate from the current `fir.do_loop` op), or extending the 
current one to model:
- multi-range iteration
- and multi-block loop bodies?

I am leaning towards extending the current op to be more capable/flexible but 
if you have any reasons not to do so, please let me know.

In any case, there is no problem with blocking this PR until we can model 
multi-range loops (at least; maybe we can defer multi-block loops to a later 
point).

https://github.com/llvm/llvm-project/pull/127595


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-20 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/127595

>From 2e89efa197e7a5d3c27e33795781b1c25a123a8c Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/2] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  39 -
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 506 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
+   This maps such loops to the equiva

[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-25 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

> Extending the current fir.do_loop operation to model multi-block loop bodies 
> is not recommended, I think. It is there to model structured fortran loops. 
> If you want to handle multi-block loop bodies you will need a new operation.

If it is ok, let's postpone this issue until later and focus on the multi-range 
modelling issue first (new op vs. extending current op). Just to make the 
discussion easier to follow.

> -> Is this the do-concurrent to OpenMP conversion pass or an unordered 
> fir.do_loop conversion to OpenMP pass?

Now it is mainly the former. However, we can still do this accurately by 
attaching an attribute to the op to tell us where the op originated from: an 
actual `do concurrent` loop or a generated one. I think this way we don't lose 
accuracy and give ourselves the flexibility of thinking about converting 
code-gened loops later on a case-by-case basis. So we would have a "multi-range 
`fir.do_loop` that can be `unordered` and also carries the information of where 
it originated from".
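
To picture the three properties discussed here (multiple ranges, `unordered`, and origin), below is a minimal C++ sketch of the information such an op would carry. It is only an illustration under assumptions: `LoopRange`, `LoopOrigin`, `LoopDesc`, and `isOpenMPMappingCandidate` are hypothetical names made up for this sketch and are not FIR or MLIR APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model, not the FIR API: one (lower, upper, step) triple per
// iteration range of the loop.
struct LoopRange {
  std::int64_t lower;
  std::int64_t upper; // inclusive, as in Fortran
  std::int64_t step;
};

// Where the loop came from on the Fortran source level.
enum class LoopOrigin { UserDoConcurrent, CompilerGenerated };

// A single loop construct that carries all three pieces of information.
struct LoopDesc {
  std::vector<LoopRange> ranges; // more than one entry means multi-range
  bool unordered;
  LoopOrigin origin;
};

// A conversion pass could start conservatively and accept only loops that the
// user wrote as `do concurrent`; generated loops could be enabled later on a
// case-by-case basis.
inline bool isOpenMPMappingCandidate(const LoopDesc &loop) {
  assert(!loop.ranges.empty() && "a loop needs at least one range");
  return loop.unordered && loop.origin == LoopOrigin::UserDoConcurrent;
}
```

With this shape, an `unordered` loop that the compiler generated is still representable but is simply not picked up by the conversion until someone decides it is safe to do so.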

>  -> What are the do-concurrent features that you can convert accurately (and 
> what you cannot) with the current fir.do_loop, current fir.do_loop with a 
> minor extension, and with adding an hlfir.do-concurrent operation.

Since the pass is hidden behind a flag, I left the analysis part for future 
steps, as documented. I understood your point as: which loops are safe to 
transform/parallelize? I may have misunderstood what you mean, though; please 
let me know if that's the case.

I am not against a separate op, and others who worked on the FIR dialect(s) 
would have a better-informed opinion than me. But I think having a separate `do 
concurrent` op would unnecessarily complicate the dialect (since we would have 2 
separate constructs/ops to model loops, and both constructs/ops can represent 
`unordered` execution), especially since those 2 ops would share lots of 
similarities. On the other hand, I think one op flexible enough to give us a 
detailed understanding of multi-range iteration spaces and of where the MLIR 
originated from on the Fortran source level would be better.

Again, this is just my opinion, and others may well have better-informed 
opinions than me.

https://github.com/llvm/llvm-project/pull/127595


[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-18 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

@mjklemm suggested adding a warning that the pass is still experimental, which 
I think is a good idea. However, I am wondering what the best place for that 
warning would be. I prefer not to do that in 
`CodeGenAction::beginSourceFileAction()` (where we inspect the flag and 
actually use it) since that would emit one warning per source file and get 
annoying quickly. Any suggestions?

https://github.com/llvm/llvm-project/pull/126026


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-18 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

> It is slightly unfortunate to rediscover the loops so early in the flow when 
> we had it in source.

Totally agree, it should be more trivial than this. And it actually was slightly 
worse, see: https://github.com/llvm/llvm-project/pull/114020.

> Have you considered changing the representation of do_concurrent in the IR 
> for multi-range do concurrent loops?

I am all for it. I can add that to the future work part of the document and 
look into it once we have a more fleshed out pass. The nest detection algorithm 
is luckily not that complicated and helps us move forward with a working 
implementation faster. WDYT?


https://github.com/llvm/llvm-project/pull/127595


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/127595

Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See 
https://github.com/llvm/llvm-project/pull/126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.
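
As an illustration of what "collapsed" means here, the following C++ sketch treats a multi-range loop as one linear iteration space and recovers the per-range index values from a linear iteration number. It is a minimal sketch under assumptions, not the code of the pass: `Range`, `collapsedTripCount`, and `delinearize` are hypothetical names, steps are assumed positive, and letting the last range vary fastest is an arbitrary choice (the iterations of a `do concurrent` loop may execute in any order).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; not part of flang.
struct Range {
  std::int64_t lower;
  std::int64_t upper; // inclusive, as in Fortran
  std::int64_t step;  // assumed positive in this sketch

  std::int64_t tripCount() const {
    return upper < lower ? 0 : (upper - lower) / step + 1;
  }
};

// Total number of iterations of the collapsed nest: the product of the trip
// counts of all ranges.
inline std::int64_t collapsedTripCount(const std::vector<Range> &nest) {
  std::int64_t total = 1;
  for (const Range &r : nest)
    total *= r.tripCount();
  return total;
}

// Recover the index value of every range from one linear iteration number.
// The last range varies fastest.
inline std::vector<std::int64_t> delinearize(const std::vector<Range> &nest,
                                             std::int64_t linear) {
  assert(linear >= 0 && linear < collapsedTripCount(nest));
  std::vector<std::int64_t> indices(nest.size());
  for (std::size_t i = nest.size(); i-- > 0;) {
    const std::int64_t count = nest[i].tripCount();
    indices[i] = nest[i].lower + (linear % count) * nest[i].step;
    linear /= count;
  }
  return indices;
}
```

For example, a loop over `i = 1:3` and `j = 10:14:2` has 3 x 3 = 9 collapsed iterations, and linear iteration 5 corresponds to `i = 2`, `j = 14`. A worksharing construct can then distribute the 9 linear iterations across threads instead of only the 3 iterations of the outer range.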

This is a follow-up to https://github.com/llvm/llvm-project/pull/126026; only 
the latest commit is relevant.

This is a replacement for https://github.com/llvm/llvm-project/pull/127478 
using a `/user//` branch.

>From 477b5b8d22ddd7b0a873519c8cc16b0e4a3c81ca Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/2] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current impleme

[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127478)

2025-02-18 Thread Kareem Ergawy via cfe-commits

ergawy wrote:

Abandoned in favor of https://github.com/llvm/llvm-project/pull/127595.

https://github.com/llvm/llvm-project/pull/127478


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127478)

2025-02-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy closed 
https://github.com/llvm/llvm-project/pull/127478


[clang] [flang] [flang][OpenMP] Upstream `do concurrent` loop-nest detection. (PR #127595)

2025-02-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/127595

>From 0b8b320c30d53eedbe057a8ad10b74e09b2e15f9 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH 1/2] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
+   This maps such loops to the equival

[clang] [flang] [flang][OpenMP] Upstream first part of `do concurrent` mapping (PR #126026)

2025-02-18 Thread Kareem Ergawy via cfe-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/126026

>From 0b8b320c30d53eedbe057a8ad10b74e09b2e15f9 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Wed, 5 Feb 2025 23:31:15 -0600
Subject: [PATCH] [flang][OpenMP] Upstream first part of `do concurrent`
 mapping

This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
---
 clang/include/clang/Driver/Options.td |   4 +
 clang/lib/Driver/ToolChains/Flang.cpp |   3 +-
 flang/docs/DoConcurrentConversionToOpenMP.md  | 155 ++
 flang/docs/index.md   |   1 +
 .../include/flang/Frontend/CodeGenOptions.def |   2 +
 flang/include/flang/Frontend/CodeGenOptions.h |   5 +
 flang/include/flang/Optimizer/OpenMP/Passes.h |   2 +
 .../include/flang/Optimizer/OpenMP/Passes.td  |  30 
 flang/include/flang/Optimizer/OpenMP/Utils.h  |  26 +++
 .../flang/Optimizer/Passes/Pipelines.h|  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |  28 
 flang/lib/Frontend/FrontendActions.cpp|  32 +++-
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 .../OpenMP/DoConcurrentConversion.cpp |  99 +++
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../test/Driver/do_concurrent_to_omp_cli.f90  |  20 +++
 .../Transforms/DoConcurrent/basic_host.f90|  53 ++
 flang/tools/bbc/bbc.cpp   |  20 ++-
 18 files changed, 499 insertions(+), 12 deletions(-)
 create mode 100644 flang/docs/DoConcurrentConversionToOpenMP.md
 create mode 100644 flang/include/flang/Optimizer/OpenMP/Utils.h
 create mode 100644 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 create mode 100644 flang/test/Driver/do_concurrent_to_omp_cli.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.f90

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ad187926e710..0cd3dfd3fb29d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -6927,6 +6927,10 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group,
   HelpText<"Emit hermetic module files (no nested USE association)">;
+
+def fdo_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+  HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
+  Values<"none, host, device">;
 } // let Visibility = [FC1Option, FlangOption]
 
 def J : JoinedOrSeparate<["-"], "J">,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 9ad795edd724d..cb0b00a2fd699 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -153,7 +153,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
 CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-  {options::OPT_flang_experimental_hlfir,
+  {options::OPT_fdo_concurrent_to_openmp_EQ,
+   options::OPT_flang_experimental_hlfir,
options::OPT_flang_deprecated_no_hlfir,
options::OPT_fno_ppc_native_vec_elem_order,
options::OPT_fppc_native_vec_elem_order,
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
new file mode 100644
index 0..62bc3172f8e3b
--- /dev/null
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -0,0 +1,155 @@
+
+
+# `DO CONCURRENT` mapping to OpenMP
+
+```{contents}
+---
+local:
+---
+```
+
+This document seeks to describe the effort to parallelize `do concurrent` loops
+by mapping them to OpenMP worksharing constructs. The goals of this document
+are:
+* Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
+  constructs.
+* Tracking the current status of such mapping.
+* Describing the limitations of the current implementation.
+* Describing next steps.
+* Tracking the current upstreaming status (from the AMD ROCm fork).
+
+## Usage
+
+In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
+1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
+   This maps such loops to the equivalent of `omp parallel do`.
+2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
+   This maps such loops to the equivalent 
