from:"Pierre van Houtryve via cfe\-commits"

[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)

2023-12-06 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/74587

None

>From f4f909df09dda7e2d21389f7b44f67e89997c44b Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 6 Dec 2023 12:47:56 +0100
Subject: [PATCH] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs

---
 clang/include/clang/Basic/AttrDocs.td | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/clang/include/clang/Basic/AttrDocs.td 
b/clang/include/clang/Basic/AttrDocs.td
index bbe4de94cbabe..88f7c65e6e847 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -2659,8 +2659,9 @@ An error will be given if:
   - Specified values violate subtarget specifications;
   - Specified values are not compatible with values provided through other
 attributes;
-  - The AMDGPU target backend is unable to create machine code that can meet 
the
-request.
+
+The AMDGPU target backend will emit a warning whenever it is unable to
+create machine code that meets the request.
   }];
 }
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)

2023-12-06 Thread Pierre van Houtryve via cfe-commits



@@ -2659,8 +2659,9 @@ An error will be given if:
   - Specified values violate subtarget specifications;
   - Specified values are not compatible with values provided through other
 attributes;
-  - The AMDGPU target backend is unable to create machine code that can meet 
the
-request.
+
+The AMDGPU target backend will emit a warning whenever it is unable to

Pierre-vh wrote:

Do you want me to format it like this instead to mimic the previous formatting?
```
A warning will be given if:
  - ...
```

https://github.com/llvm/llvm-project/pull/74587
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)

2023-12-07 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/74587
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Improve selection of ballot.i64 intrinsic in wave32 mode. (PR #71556)

2023-11-29 Thread Pierre van Houtryve via cfe-commits



@@ -961,6 +961,18 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, 
IntrinsicInst &II) const {
 return IC.replaceInstUsesWith(II, 
Constant::getNullValue(II.getType()));
   }
 }
+if (ST->isWave32() && II.getType()->getIntegerBitWidth() == 64) {
+  // %b64 = call i64 ballot.i64(...)
+  // =>
+  // %b32 = call i32 ballot.i32(...)
+  // %b64 = zext i32 %b32 to i64
+  Function *NewF = Intrinsic::getDeclaration(
+  II.getModule(), Intrinsic::amdgcn_ballot, {IC.Builder.getInt32Ty()});
+  CallInst *NewCall = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)});
+  Value *CastedCall = IC.Builder.CreateZExtOrBitCast(NewCall, 
II.getType());
+  CastedCall->takeName(&II);
+  return IC.replaceInstUsesWith(II, CastedCall);

Pierre-vh wrote:

Nit: Can just reuse the same value, e.g.
```
  Value *Call = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)});
  Call = IC.Builder.CreateZExtOrBitCast(NewCall, II.getType());
  Call->takeName(&II);
  return IC.replaceInstUsesWith(II, Call);
```

I also think you should be able to use something like 
`IC.Builder.CreateIntrinsic` ? There should be a function to create an 
intrinsic call directly.

https://github.com/llvm/llvm-project/pull/71556
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [clang] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-04 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/76954

Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 
except a new "generic version" flag can be present in EFLAGS. This is related 
to new generic targets that'll be added in a follow-up patch. It's also likely 
V6 will have new changes (possibly new metadata entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).

>From dc666323870118020c0fd386d19d8306d4c853e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  22 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |  12 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   5 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   6 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  27 +++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  13 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   5 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   2 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 222 --
 49 files changed, 448 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..0bfe0e7739960e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f71dbf1729a1d6..be86731ed912ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeG

[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-04 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

Note: testing is a bit light for now, I'd like to add more tests, but I'm not 
sure what kind of tests are worth adding.
I could just add a generic target run line wherever gfx9/10/11 run lines are 
present, but that seems a bit overkill? I'd need to change half the tests we 
have or more.

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [clang] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-04 Thread Pierre van Houtryve via cfe-commits



@@ -2585,7 +2585,7 @@ getAMDGPUCodeObjectArgument(const Driver &D, const 
llvm::opt::ArgList &Args) {
 void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
  const llvm::opt::ArgList &Args) {
   const unsigned MinCodeObjVer = 4;
-  const unsigned MaxCodeObjVer = 5;
+  const unsigned MaxCodeObjVer = 6;

Pierre-vh wrote:

I'm wondering if we should print a warning when V6 is enabled (either here or 
in the backend) to note that it's in development and not ready yet? Something 
like "code object v6 is still experimental and not ready for production use"


https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [lld] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-07 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From dc666323870118020c0fd386d19d8306d4c853e1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH 1/2] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  22 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |  12 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   5 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   6 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  27 +++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  13 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   5 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   2 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 222 --
 49 files changed, 448 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..0bfe0e7739960e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f71dbf1729a1d6..be86731ed912ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17481,9 +17481,9 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 // \p Index is 0, 1, and 2 for x, y, and z dimension, respectively.
 /// Emit code based on Code Object ABI version.
 /// COV_4: Emit code to use dispatch ptr
-/// COV_5: Emit code to use implicitarg ptr
+/// COV_5+   : Emit code to use implicitarg ptr
 /// COV_NONE : Emit code to load a global variable "__oclc_ABI_versi

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-09 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh commented:

Missing components:

- Need a way for external tools to inquire about the specifics of generic 
targets  (without depending on llvm)
  - map a specific gfx target to its generic family
  - given a specific gfx version, what's the minimum generic version it needs 
(?)
  - tools need to tell generic MACHs from specific ones (currently they can do 
that by just doing EFLAGS >> 24 and checking if there is any value other than 0 
there, but it needs to be documented)

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-09 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [llvm] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-09 Thread Pierre van Houtryve via cfe-commits



@@ -792,6 +793,17 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600,
   EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1201,
 
+  // Generic AMDGCN processors
+  // clang-format off
+  EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC  = 0x0c0,

Pierre-vh wrote:

TODO: put it in the list above instead of its own namespace

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [clang] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-09 Thread Pierre van Houtryve via cfe-commits



@@ -839,6 +851,15 @@ enum : unsigned {
   EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800,
   // SRAMECC is on.
   EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00,
+
+  // Generic target versioning. This is contained in the list byte of EFLAGS.
+  EF_AMDGPU_GENERIC_VERSION = 0xff00,
+  EF_AMDGPU_GENERIC_VERSION_OFFSET = 24,

Pierre-vh wrote:

This will be made generic instead: teach elfdumper to shift EFLAGS to extract 
any integer

TODO: Review how the generic versioning system works in practice

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [flang] [llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-10 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [flang] [llvm] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-10 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From 6368d8210e211948b5a03ab326b996695b8d Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   5 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  27 +++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  13 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   5 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   2 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 491 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index bffdddc28aac60..f9381d0706f447 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4761,9 +4761,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f71dbf1729a1d6..be86731ed912ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-10 Thread Pierre van Houtryve via cfe-commits



@@ -1260,13 +1261,9 @@ def FeatureISAVersion9_0_8 : FeatureSet<
  FeatureImageGather4D16Bug])>;
 
 def FeatureISAVersion9_0_9 : FeatureSet<
-  !listconcat(FeatureISAVersion9_0_Common.Features,
-[FeatureGDS,
- FeatureMadMixInsts,
- FeatureDsSrc2Insts,
- FeatureExtendedImageInsts,
- FeatureImageInsts,
- FeatureImageGather4D16Bug])>;
+  !listconcat(FeatureISAVersion9_0_Consumer_Common.Features,
+[FeatureMadMixInsts,
+ FeatureImageInsts])>;

Pierre-vh wrote:

todo: remove ImageInsts, it's already included

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [flang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-10 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-10 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [flang] [lld] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-10 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH 1/3] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a..799634ccec7ba 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12..3b10444ef71d3 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s

[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH 1/4] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a..799634ccec7ba 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12..3b10444ef71d3 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

For the MD changes, it's just to describe the version increment, nothing else. 
I think describing is important as the V6 diff already updated the 
amdhsa.version.
If amdhsa.version didn't need to change then i need to fix that first, and then 
we can remove the V6 MD section

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH 1/5] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a..799634ccec7ba 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12..3b10444ef71d3 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Pierre van Houtryve via cfe-commits



@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors allow execution of a single code objects on any of the 
processors that
+it supports. Such code objects may not perform as well as those for the 
non-generic processors.
+
+Generic processors are only available on code object V6 and above (see 
:ref:`amdgpu-elf-code-object`).
+
+Generic processor code objects are versioned (see 
:ref:`amdgpu-elf-header-e_flags-table-v6-onwards`).
+The version number is used by runtimes to determine if a code object can be 
run on a specific agent.

Pierre-vh wrote:

I rephrased it a bit (e.g. member -> supported processor) but I mostly followed 
your suggestion

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH 1/6] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a1..799634ccec7ba5 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12d..3b10444ef71d36 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-07 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

@t-tye Can you please approve then? Otherwise the diff still shows a red 
"Changes requested" warning :) Thanks
@arsenm Please also approve if there are no more comments

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh commented:

My comments are mostly about style, I haven't done a deep dive into the logic 
of the pass yet

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {

Pierre-vh wrote:

`Interface` is a very generic name, can you make it a bit more specific and add 
docs?

https://github.com/llvm/llvm-project/pull/81058
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,701 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class ValistCc { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // Lots of targets use a void* pointed at a buffer for va_list.
+  // Some use more complicated iterator constructs.
+  // This interface seeks to express both.
+  // Ideally it would be a compile time error for a derived class
+  // to override only one of valistOnStack, initializeVAList.
+
+  // How the vaListType is passed
+  virtual ValistCc valistCc() { return ValistCc::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-08 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,698 @@
+//===-- ExpandVariadicsPass.cpp *- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This is an optimisation pass for variadic functions. If called from codegen,
+// it can serve as the implementation of variadic functions for a given target.
+//
+// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new
+// target means adding a case to VariadicABIInfo::create() along with tests.
+//
+// The module pass using that information is class ExpandVariadics.
+//
+// The strategy is:
+// 1. Test whether a variadic function is sufficiently simple
+// 2. If it was, calls to it can be replaced with calls to a different function
+// 3. If it wasn't, try to split it into a simple function and a remainder
+// 4. Optionally rewrite the varadic function calling convention as well
+//
+// This pass considers "sufficiently simple" to mean a variadic function that
+// calls into a different function taking a va_list to do the real work. For
+// example, libc might implement fprintf as a single basic block calling into
+// vfprintf. This pass can then rewrite call to the variadic into some code
+// to construct a target-specific value to use for the va_list and a call
+// into the non-variadic implementation function. There's a test for that.
+//
+// Most other variadic functions whose definition is known can be converted 
into
+// that form. Create a new internal function taking a va_list where the 
original
+// took a ... parameter. Move the blocks across. Create a new block containing 
a
+// va_start that calls into the new function. This is nearly target 
independent.
+//
+// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or
+// where the ABI can be chosen to align with this transform, the function
+// interface can be rewritten along with calls to unknown variadic functions.
+//
+// The aggregate effect is to unblock other transforms, most critically the
+// general purpose inliner. Known calls to variadic functions become zero cost.
+//
+// This pass does define some target specific information which is partially
+// redundant with other parts of the compiler. In particular, the call frame
+// it builds must be the exact complement of the va_arg lowering performed
+// by clang. The va_list construction is similar to work done by the backend
+// for targets that lower variadics there, though distinct in that this pass
+// constructs the pieces using alloca instead of relative to stack pointers.
+//
+// Consistency with clang is primarily tested by emitting va_arg using clang
+// then expanding the variadic functions using this pass, followed by trying
+// to constant fold the functions to no-ops.
+//
+// Target specific behaviour is tested in IR - mainly checking that values are
+// put into positions in call frames that make sense for that particular 
target.
+//
+//===--===//
+
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/TargetParser/Triple.h"
+
+#define DEBUG_TYPE "expand-variadics"
+
+using namespace llvm;
+
+namespace {
+namespace VariadicABIInfo {
+
+// calling convention for passing as valist object, same as it would be in C
+// aarch64 uses byval
+enum class valistCC { value, pointer, /*byval*/ };
+
+struct Interface {
+protected:
+  Interface(uint32_t MinAlign, uint32_t MaxAlign)
+  : MinAlign(MinAlign), MaxAlign(MaxAlign) {}
+
+public:
+  virtual ~Interface() {}
+  const uint32_t MinAlign;
+  const uint32_t MaxAlign;
+
+  // Most ABIs use a void* or char* for va_list, others can specialise
+  virtual Type *vaListType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  // How the vaListType is passed
+  virtual valistCC vaListCC() { return valistCC::value; }
+
+  // The valist might need to be stack allocated.
+  virtual bool valistOnStack() { return false; }
+
+  virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder,
+AllocaInst * /*va_list*/, Value * /*buffer*/) {
+// Function needs to be implemented if valist is on the stack
+assert(!valistOnStack());
+__builtin_unreachable();
+  }
+
+  // All targets currently implemented use a ptr for the valist parameter
+  Type *vaListParameterType(LLVMContext &Ctx) {
+return PointerType::getUnqual(Ctx);
+  }
+
+  bool VAEndIsNop() { return

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-09 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

> mad_mix

I added run lines to `mad-mix.ll` and it behaves as expected: no fma/mad_mix 
emitted

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-12 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-26 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh approved this pull request.

LGTM, but wait for @t-tye or @jayfoad to approve as well

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-01 Thread Pierre van Houtryve via cfe-commits



@@ -2594,12 +2594,10 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const 
SIMemOpInfo &MOI,
 MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent ||
 MOI.getFailureOrdering() == AtomicOrdering::Acquire ||
 MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) {
-  Changed |= CC->insertWait(MI, MOI.getScope(),

Pierre-vh wrote:

extra formatting change

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-01 Thread Pierre van Houtryve via cfe-commits



@@ -2326,6 +2326,20 @@ bool 
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
 }
 #endif
 
+if (ST->isPreciseMemoryEnabled()) {
+  AMDGPU::Waitcnt Wait;
+  if (WCG == &WCGPreGFX12)

Pierre-vh wrote:

Use `ST->hasExtendedWaitCounts()` instead of checking the pointer?

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-03-01 Thread Pierre van Houtryve via cfe-commits



@@ -2326,6 +2326,20 @@ bool 
SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
 }
 #endif
 
+if (ST->isPreciseMemoryEnabled()) {
+  AMDGPU::Waitcnt Wait;
+  if (WCG == &WCGPreGFX12)
+Wait = AMDGPU::Waitcnt(0, 0, 0, 0);

Pierre-vh wrote:

I was looking at https://github.com/ROCm/ROCm-CompilerSupport/issues/66 and it 
made me wonder, why do we have to emit all zeroes instead of just emitting 
what's in `ScoreBrackets`? Is there an advantage?

I'm wondering if this should just emit `ScoreBrackets`, then `+precise-memory` 
+ `-amdgpu-waitcnt-forcezero` need to be used together achieve the behavior we 
have here?


https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-01 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh created 
https://github.com/llvm/llvm-project/pull/83558

Those macros are unreliable because our features are mostly uninitialized at 
that stage, so any macro we define is unreliable.

Fixes SWDEV-447308

>From 3730631ac58425f559f4bc3cfe3da89e6367c1c5 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 1 Mar 2024 12:43:55 +0100
Subject: [PATCH] [clang][AMDGPU] Don't define feature macros on host code

Those macros are unreliable because our features are mostly uninitialized at 
that stage, so any macro we define is unreliable.

Fixes SWDEV-447308
---
 clang/lib/Basic/Targets/AMDGPU.cpp   | 8 +++-
 clang/test/Preprocessor/predefined-arch-macros.c | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 5742885df0461b..df9a5855068ed3 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -292,8 +292,14 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   }
 
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
+
+  // Don't emit feature macros in host code because in such cases the
+  // feature list is not accurate.
+  if (IsHIPHost)
+return;
+
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
-  if (isAMDGCN(getTriple()) && !IsHIPHost) {
+  if (isAMDGCN(getTriple())) {
 assert(StringRef(CanonName).starts_with("gfx") &&
"Invalid amdgcn canonical name");
 StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c 
b/clang/test/Preprocessor/predefined-arch-macros.c
index ca51f2fc22c517..8904bcea1a1f68 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4340,7 +4340,7 @@
 // RUN: %clang -x hip -E -dM %s -o - 2>&1 --offload-host-only -nogpulib \
 // RUN: -nogpuinc --offload-arch=gfx803 -target x86_64-unknown-linux \
 // RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_HIP_HOST
-// CHECK_HIP_HOST: #define __AMDGCN_WAVEFRONT_SIZE__ 64
+// CHECK_HIP_HOST-NOT: #define __AMDGCN_WAVEFRONT_SIZE__ 64
 // CHECK_HIP_HOST: #define __AMDGPU__ 1
 // CHECK_HIP_HOST: #define __AMD__ 1
 

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-01 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/83558

>From 3730631ac58425f559f4bc3cfe3da89e6367c1c5 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 1 Mar 2024 12:43:55 +0100
Subject: [PATCH 1/2] [clang][AMDGPU] Don't define feature macros on host code

Those macros are unreliable because our features are mostly uninitialized at 
that stage, so any macro we define is unreliable.

Fixes SWDEV-447308
---
 clang/lib/Basic/Targets/AMDGPU.cpp   | 8 +++-
 clang/test/Preprocessor/predefined-arch-macros.c | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 5742885df0461b..df9a5855068ed3 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -292,8 +292,14 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   }
 
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
+
+  // Don't emit feature macros in host code because in such cases the
+  // feature list is not accurate.
+  if (IsHIPHost)
+return;
+
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
-  if (isAMDGCN(getTriple()) && !IsHIPHost) {
+  if (isAMDGCN(getTriple())) {
 assert(StringRef(CanonName).starts_with("gfx") &&
"Invalid amdgcn canonical name");
 StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c 
b/clang/test/Preprocessor/predefined-arch-macros.c
index ca51f2fc22c517..8904bcea1a1f68 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4340,7 +4340,7 @@
 // RUN: %clang -x hip -E -dM %s -o - 2>&1 --offload-host-only -nogpulib \
 // RUN: -nogpuinc --offload-arch=gfx803 -target x86_64-unknown-linux \
 // RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_HIP_HOST
-// CHECK_HIP_HOST: #define __AMDGCN_WAVEFRONT_SIZE__ 64
+// CHECK_HIP_HOST-NOT: #define __AMDGCN_WAVEFRONT_SIZE__ 64
 // CHECK_HIP_HOST: #define __AMDGPU__ 1
 // CHECK_HIP_HOST: #define __AMD__ 1
 

>From a60d9fa16876b90a69b60de429261a3d10404f7a Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 1 Mar 2024 13:56:46 +0100
Subject: [PATCH 2/2] use CudaIssDevice

---
 clang/lib/Basic/Targets/AMDGPU.cpp|   2 +-
 .../CodeGenOpenCL/builtins-amdgcn-wave32.cl   |   2 +-
 clang/test/Driver/amdgpu-macros.cl| 212 +-
 clang/test/Driver/target-id-macros.cl |  10 +-
 .../Preprocessor/predefined-arch-macros.c |   6 +-
 5 files changed, 116 insertions(+), 116 deletions(-)

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index df9a5855068ed3..0c5f6bb13ec2eb 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -295,7 +295,7 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
 
   // Don't emit feature macros in host code because in such cases the
   // feature list is not accurate.
-  if (IsHIPHost)
+  if (!Opts.CUDAIsDevice)
 return;
 
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
index da1ae244431556..de3020fdb6f98f 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
@@ -42,6 +42,6 @@ void test_read_exec_hi(global uint* out) {
   *out = __builtin_amdgcn_read_exec_hi();
 }
 
-#if __AMDGCN_WAVEFRONT_SIZE != 32
+#if defined(__AMDGCN_WAVEFRONT_SIZE) && __AMDGCN_WAVEFRONT_SIZE != 32
 #error Wrong wavesize detected
 #endif
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 004619321b271f..1f03ccc6ab9223 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -6,32 +6,32 @@
 // R600-based processors.
 //
 
-// RUN: %clang -E -dM -target r600 -mcpu=r600 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,R600 %s -DCPU=r600
-// RUN: %clang -E -dM -target r600 -mcpu=rv630 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,R600 %s -DCPU=r600
-// RUN: %clang -E -dM -target r600 -mcpu=rv635 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,R600 %s -DCPU=r600
-// RUN: %clang -E -dM -target r600 -mcpu=r630 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,R630 %s -DCPU=r630
-// RUN: %clang -E -dM -target r600 -mcpu=rs780 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880
-// RUN: %clang -E -dM -target r600 -mcpu=rs880 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880
-// RUN: %clang -E -dM -target r600 -mcpu=rv610 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880
-// RUN: %clang -E -dM -target r600 -mcpu=rv620 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880
-// RUN: %clang -E -dM -target r600 -

[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-01 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

> This was the original behavior of my patch, but I reverted it because it 
> broke all the HIP headers that were unintentionally relying on this. Has that 
> been resolved?

Was an issue opened for that? How many headers are affected?

https://github.com/llvm/llvm-project/pull/83558
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-03 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/83558
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [flang] [llvm] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-30 Thread Pierre van Houtryve via cfe-commits



@@ -44,8 +44,15 @@ constexpr uint32_t VersionMajorV5 = 1;
 /// HSA metadata minor version for code object V5.
 constexpr uint32_t VersionMinorV5 = 2;
 
+/// HSA metadata major version for code object V6.
+constexpr uint32_t VersionMajorV6 = 1;
+/// HSA metadata minor version for code object V6.
+constexpr uint32_t VersionMinorV6 = 3;

Pierre-vh wrote:

Do we just increment this number when there's a breaking metadata change? How 
does it work?

https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-02-01 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From 7ad88453f5e89fd4643afa486e52a123138433f4 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH 1/2] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   7 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   6 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   2 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   3 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 483 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index a4b35e370999e..d7152cc355ce7 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4797,9 +4797,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f3ab5ad7b08ec..a55be6880c5ef 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/C

[clang] [flang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-01 Thread Pierre van Houtryve via cfe-commits



@@ -139,10 +139,10 @@ bool 
AMDGPURemoveIncompatibleFunctions::checkFunction(Function &F) {
   const GCNSubtarget *ST =
   static_cast(TM->getSubtargetImpl(F));
 
-  // Check the GPU isn't generic. Generic is used for testing only
-  // and we don't want this pass to interfere with it.
+  // Check the GPU isn't generic or generic-hsa. Generic is used for testing
+  // only and we don't want this pass to interfere with it.
   StringRef GPUName = ST->getCPU();
-  if (GPUName.empty() || GPUName.contains("generic"))
+  if (GPUName.empty() || GPUName.starts_with("generic"))

Pierre-vh wrote:

No we have some tests using a generic target, and I want the pass to work on 
-generic targets as well (e.g. gfx9-generic), so I'm moving the check to allow 
stuff like gfx9-generic but not "generic" alone

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [flang] [clang] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-02-01 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH 1/3] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   7 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   6 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   2 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   3 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 483 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 73071a6648541..fb5f50ef452c2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 196be813a4896..f17e4a83305bf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/C

[llvm] [clang] [lld] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-02-02 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH 1/4] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   7 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   6 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   2 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   3 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 483 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 73071a6648541..fb5f50ef452c2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 196be813a4896..f17e4a83305bf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/C

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-02-02 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH 1/5] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   7 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   6 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   2 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   3 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 483 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 73071a6648541..fb5f50ef452c2 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 196be813a4896..f17e4a83305bf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/C

[flang] [clang] [llvm] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-02-04 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-04 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a..799634ccec7ba 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12..3b10444ef71d3 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWA

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh edited 
https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh requested changes to this pull request.

When you made changes, you can click the "Re-request review" icon next to 
reviewers to put it back in the review queues :)

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits



@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {

Pierre-vh wrote:

Why does it need to be a separate class hierarchy?
It could just be part of CacheControl, and the functions can be named 
`handlePreciseMemoryAtomic/NonAtomic` ?
That would avoid a lot of boilerplate.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,362 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op < %s 
| FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op < %s 
| FileCheck %s -check-prefixes=GFX90A
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+amdgpu-precise-memory-op < %s 
| FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 
-mattr=-flat-for-global,+enable-flat-scratch,+amdgpu-precise-memory-op 
-amdgpu-use-divergent-register-indexing < %s | FileCheck 
--check-prefixes=GFX9-FLATSCR %s

Pierre-vh wrote:

gfx11 and 12 tests

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits



@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
   "Enable CU wavefront execution mode"
 >;
 
+def FeaturePreciseMemory

Pierre-vh wrote:

Understood :) 
Can you remove the `amdgpu` prefix from the option? All target features are 
already specific to AMDGPU, e.g. xnack, sramecc, etc. so you can just use 
something like `precise-memory`

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits



@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {
+protected:
+  const GCNSubtarget &ST;
+  const SIInstrInfo *TII = nullptr;
+
+  IsaVersion IV;
+
+  SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) {
+TII = ST.getInstrInfo();
+IV = getIsaVersion(ST.getCPU());
+  }
+
+public:
+  static std::unique_ptr create(const GCNSubtarget 
&ST);
+
+  virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0;
+  /// Handles atomic instruction \p MI with \p ret indicating whether \p MI
+  /// returns a result.
+  virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0;
+};
+
+class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx9PreciseMemorySupport(const GCNSubtarget &ST)
+  : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST)
+  : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+std::unique_ptr
+SIPreciseMemorySupport::create(const GCNSubtarget &ST) {
+  GCNSubtarget::Generation Generation = ST.getGeneration();
+  if (Generation < AMDGPUSubtarget::GFX10)

Pierre-vh wrote:

GFX12 is missing

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-05 Thread Pierre van Houtryve via cfe-commits



@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   bool IsNonTemporal) const override;
 };
 
+class SIPreciseMemorySupport {
+protected:
+  const GCNSubtarget &ST;
+  const SIInstrInfo *TII = nullptr;
+
+  IsaVersion IV;
+
+  SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) {
+TII = ST.getInstrInfo();
+IV = getIsaVersion(ST.getCPU());
+  }
+
+public:
+  static std::unique_ptr create(const GCNSubtarget 
&ST);
+
+  virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0;
+  /// Handles atomic instruction \p MI with \p ret indicating whether \p MI
+  /// returns a result.
+  virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0;
+};
+
+class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx9PreciseMemorySupport(const GCNSubtarget &ST)
+  : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport {
+public:
+  SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST)
+  : SIPreciseMemorySupport(ST) {}
+  bool handleNonAtomic(MachineBasicBlock::iterator &MI) override;
+  bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override;
+};
+
+std::unique_ptr
+SIPreciseMemorySupport::create(const GCNSubtarget &ST) {
+  GCNSubtarget::Generation Generation = ST.getGeneration();
+  if (Generation < AMDGPUSubtarget::GFX10)
+return std::make_unique(ST);
+  return std::make_unique(ST);
+}
+
+bool SIGfx9PreciseMemorySupport ::handleNonAtomic(
+MachineBasicBlock::iterator &MI) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr &Inst = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0;   // LgkmCnt
+  } else {// vector
+if (Inst.mayLoad()) { // vector load
+  if (TII->isVMEM(Inst)) {// VMEM load
+Wait.LoadCnt = 0; // VmCnt
+  } else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0; // VmCnt
+Wait.DsCnt = 0;   // LgkmCnt
+  } else {// LDS load
+Wait.DsCnt = 0;   // LgkmCnt
+  }
+} else {  // vector store
+  if (TII->isVMEM(Inst)) {// VMEM store
+Wait.LoadCnt = 0; // VmCnt
+  } else if (TII->isFLAT(Inst)) { // Flat store
+Wait.LoadCnt = 0; // VmCnt
+Wait.DsCnt = 0;   // LgkmCnt
+  } else {
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+  }
+}
+  }
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock &MBB = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx9PreciseMemorySupport ::handleAtomic(MachineBasicBlock::iterator &MI,
+   bool ret) {
+  assert(MI->mayLoadOrStore());
+
+  AMDGPU::Waitcnt Wait;
+
+  Wait.LoadCnt = 0; // VmCnt
+  Wait.DsCnt = 0;   // LgkmCnt
+
+  unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait);
+  MachineBasicBlock &MBB = *MI->getParent();
+  BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc);
+  --MI;
+  return true;
+}
+
+bool SIGfx10And11PreciseMemorySupport ::handleNonAtomic(
+MachineBasicBlock::iterator &MI) {
+  assert(MI->mayLoadOrStore());
+
+  MachineInstr &Inst = *MI;
+  AMDGPU::Waitcnt Wait;
+
+  bool BuildWaitCnt = true;
+  bool BuildVsCnt = false;
+
+  if (TII->isSMRD(Inst)) { // scalar
+if (Inst.mayStore())
+  return false;
+Wait.DsCnt = 0;   // LgkmCnt
+  } else {// vector
+if (Inst.mayLoad()) { // vector load
+  if (TII->isVMEM(Inst)) {// VMEM load
+Wait.LoadCnt = 0; // VmCnt
+  } else if (TII->isFLAT(Inst)) { // Flat load
+Wait.LoadCnt = 0; // VmCnt
+Wait.DsCnt = 0;   // LgkmCnt
+  } else {// LDS load
+Wait.DsCnt = 0;   // LgkmCnt
+  }
+}
+
+// For some instructions, mayLoad() and mayStore() can be both true.
+if (Inst.mayStore()) { // vector store; an instruction can be both
+   // load/store
+  if (TII->isVMEM(Inst)) { // VMEM store
+if (!Inst.mayLoad())
+  BuildWaitCnt = false;
+BuildVsCnt = true;
+  } else if (TII->isFLAT(Inst)) { // Flat store
+Wait.DsCnt = 0;   // LgkmCnt
+BuildVsCnt = true;
+  } else {
+Wait.DsCnt = 0; // LDS store; LgkmCnt
+  }
+}
+  }
+
+  MachineBasicBlock &MBB = *MI->getParent();
+  if (BuildWaitCnt) {
+unsign

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

@arsenm do you have any concerns with this change?
@t-tye is the documentation good?

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76955

>From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:48:05 +0100
Subject: [PATCH 1/2] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets

These generic targets include multiple GPUs and will, in the future, provide a 
way to build once and run on multiple GPU, at the cost of less optimization 
opportunities.
Note that this is just doing the compiler side of things, device libs an 
runtimes/loader/etc. don't know about these targets yet, so none of them 
actually work in practice right now. This is just the initial commit to make 
LLVM aware of them.

No docs in this patch either as I plan to do it all in a follow-up patch.
---
 clang/lib/Basic/Targets/AMDGPU.cpp|  20 +-
 clang/test/Driver/amdgpu-macros.cl|   5 +
 clang/test/Driver/amdgpu-mcpu.cl  |  10 +
 llvm/docs/AMDGPUUsage.rst | 325 +-
 llvm/include/llvm/BinaryFormat/ELF.h  |   6 +-
 llvm/include/llvm/TargetParser/TargetParser.h |  10 +
 llvm/lib/Object/ELFObjectFile.cpp |  10 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   4 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  87 +++--
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   6 +
 .../AMDGPURemoveIncompatibleFunctions.cpp |   6 +-
 llvm/lib/Target/AMDGPU/GCNProcessors.td   |  22 ++
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   4 +
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |  11 +
 llvm/lib/TargetParser/TargetParser.cpp|  46 +++
 .../GlobalISel/llvm.amdgcn.workitem.id.ll |   1 +
 .../CodeGen/AMDGPU/directive-amdgcn-target.ll |  14 +
 .../CodeGen/AMDGPU/elf-header-flags-mach.ll   |  10 +
 llvm/test/CodeGen/AMDGPU/gds-allocation.ll|   1 +
 llvm/test/CodeGen/AMDGPU/gds-atomic.ll|   1 +
 .../AMDGPU/generic-targets-require-v6.ll  |  18 +
 .../AMDGPU/hsa-generic-target-features.ll |  31 ++
 .../llvm.amdgcn.image.gather4.d16.dim.ll  |   3 +
 .../AMDGPU/llvm.amdgcn.image.sample.dim.ll|   3 +
 .../AMDGPU/unsupported-image-sample.ll|  12 +-
 .../Object/AMDGPU/elf-header-flags-mach.yaml  |  29 ++
 .../llvm-objdump/ELF/AMDGPU/subtarget.ll  |  20 ++
 .../llvm-readobj/ELF/AMDGPU/elf-headers.test  |  12 +
 llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++
 30 files changed, 689 insertions(+), 192 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll

diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp 
b/clang/lib/Basic/Targets/AMDGPU.cpp
index 141501e8a4d9a1..799634ccec7ba5 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions 
&Opts,
   if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost)
 return;
 
-  StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
-  : getArchNameR600(GPUKind);
+  std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind)
+ : getArchNameR600(GPUKind))
+  .str();
+
+  // Sanitize the name of generic targets.
+  // e.g. gfx10.1-generic -> gfx10_1_generic
+  if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST &&
+  GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) {
+std::replace(CanonName.begin(), CanonName.end(), '.', '_');
+std::replace(CanonName.begin(), CanonName.end(), '-', '_');
+  }
+
   Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
   // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
   if (isAMDGCN(getTriple()) && !IsHIPHost) {
-assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+assert(StringRef(CanonName).starts_with("gfx") &&
+   "Invalid amdgcn canonical name");
+StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind);
+Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) +
 Twine("__"));
 Builder.defineMacro("__amdgcn_processor__",
 Twine("\"") + Twine(CanonName) + Twine("\""));
diff --git a/clang/test/Driver/amdgpu-macros.cl 
b/clang/test/Driver/amdgpu-macros.cl
index 81c22af460d12d..3b10444ef71d36 100644
--- a/clang/test/Driver/amdgpu-macros.cl
+++ b/clang/test/Driver/amdgpu-macros.cl
@@ -131,6 +131,11 @@
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 
-DFAMILY=GFX12
 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck 
--check-prefixes=ARCH-GCN,FAST_FMAF

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode",
   "Enable CU wavefront execution mode"
 >;
 
+def FeaturePreciseMemory

Pierre-vh wrote:

I'm not a fan of using a feature for this, I think we should have a backend CL 
option instead. After all this isn't an architecture feature but just a 
behavior change for SIMemoryLegalizer.


https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass {
   bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
 MachineBasicBlock::iterator &MI);
 
+  bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF);

Pierre-vh wrote:

Agreed, this should definitely be a virtual function such as 
`insertWaitcntForPreciseMem` and let the CacheControl implementation do what is 
needed. This is just emulating what `CacheControl` already does

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,199 @@
+; Testing the -amdgpu-precise-memory-op option
+; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op 
-verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op 
-verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX90A
+; COM: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+amdgpu-precise-memory-op 
-verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX10
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 
-mattr=-flat-for-global,+enable-flat-scratch,+amdgpu-precise-memory-op 
-amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck 
--check-prefixes=GFX9-FLATSCR %s
+
+; from atomicrmw-expand.ll
+; covers flat_load, flat_atomic
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) {
+; GFX90A-LABEL: syncscope_workgroup_nortn:
+; GFX90A:  ; %bb.0:
+; GFX90A: flat_load_dword v5, v[0:1]
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A:  .LBB0_1: ; %atomicrmw.start
+; GFX90A: flat_atomic_cmpswap v3, v[0:1], v[4:5] glc
+; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+  %res = atomicrmw fadd ptr %addr, float %val syncscope("workgroup") seq_cst
+  ret void
+}
+
+; from atomicrmw-nand.ll
+; covers global_atomic, global_load
+define i32 @atomic_nand_i32_global(ptr addrspace(1) %ptr) nounwind {
+; GFX9-LABEL: atomic_nand_i32_global:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:global_load_dword v2, v[0:1], off
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_mov_b64 s[4:5], 0
+; GFX9-NEXT:  .LBB1_1: ; %atomicrmw.start
+; GFX9-NEXT:; =>This Inner Loop Header: Depth=1
+; GFX9-NOT: s_waitcnt vmcnt(0)
+; GFX9-NEXT:v_mov_b32_e32 v3, v2
+; GFX9-NEXT:v_not_b32_e32 v2, v3
+; GFX9-NEXT:v_or_b32_e32 v2, -5, v2
+; GFX9-NEXT:global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:buffer_wbinvl1_vol
+; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, v2, v3
+; GFX9-NEXT:s_or_b64 s[4:5], vcc, s[4:5]
+; GFX9-NEXT:s_andn2_b64 exec, exec, s[4:5]
+; GFX9-NEXT:s_cbranch_execnz .LBB1_1
+; GFX9-NEXT:  ; %bb.2: ; %atomicrmw.end
+; GFX9-NEXT:s_or_b64 exec, exec, s[4:5]
+; GFX9-NEXT:v_mov_b32_e32 v0, v2
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+  %result = atomicrmw nand ptr addrspace(1) %ptr, i32 4 seq_cst
+  ret i32 %result
+}
+
+; from bf16.ll
+; covers buffer_load, buffer_store, flat_load, flat_store, global_load, 
global_store
+define void @test_load_store(ptr addrspace(1) %in, ptr addrspace(1) %out) {
+;
+; GFX9-LABEL: test_load_store:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:global_load_ushort v0, v[0:1], off
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:global_store_short v[2:3], v0, off
+; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: test_load_store:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:global_load_ushort v0, v[0:1], off
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:global_store_short v[2:3], v0, off
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:s_waitcnt_vscnt null, 0x0
+; GFX10-NEXT:s_setpc_b64 s[30:31]
+  %val = load bfloat, ptr addrspace(1) %in
+  store bfloat %val, ptr addrspace(1) %out
+  ret void
+}
+
+; from scratch-simple.ll
+; covers scratch_load, scratch_store
+;
+; GFX9-FLATSCR-LABEL: {{^}}vs_main:
+; GFX9-FLATSCR:scratch_store_dwordx4 off, v[{{[0-9:]+}}],
+; GFX9-FLATSCR-NEXT:   s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX9-FLATSCR:scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
+; GFX9-FLATSCR-NEXT:   s_waitcnt vmcnt(0) lgkmcnt(0)
+define amdgpu_vs float @vs_main(i32 %idx) {
+  %v1 = extractelement <81 x float> , i32 %idx

Pierre-vh wrote:

what is that test for/why does it need such a big vector?
if it's to force a spill to stack I think you can do that by playing with the 
number of available v/sgprs, IIRC there is an attribute or CL opt for that

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const 
SIMemOpInfo &MOI,
   return Changed;
 }
 
+bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) {
+  const GCNSubtarget &ST = MF.getSubtarget();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  IsaVersion IV = getIsaVersion(ST.getCPU());
+
+  bool Changed = false;
+
+  for (auto &MBB : MF) {
+for (auto MI = MBB.begin(); MI != MBB.end();) {
+  MachineInstr &Inst = *MI;
+  ++MI;
+  if (Inst.mayLoadOrStore() == false)
+continue;
+
+  // Todo: if next insn is an s_waitcnt
+  AMDGPU::Waitcnt Wait;
+
+  if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) {
+if (TII->isSMRD(Inst)) {  // scalar

Pierre-vh wrote:

Can we have a shared helper, e.g. in `SIInstrInfo`  for both? It's a lot of 
logic to duplicate

> The counter values in SIInsertWaitcnt are precise, while in this features the 
> counters are simply set to 0.
That could just be a boolean switch in a shared helper

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -0,0 +1,199 @@
+; Testing the -amdgpu-precise-memory-op option

Pierre-vh wrote:

Please generate the test using `update_llc_test_checks`, much easier to update 
if/when things change.

Also I think you don't need `-verify-machineinstrs`. It's expensive and runs 
anyway when EXPENSIVE_CHECKS is on.

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -17,13 +17,16 @@
 #include "AMDGPUMachineModuleInfo.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
+#include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/ADT/BitmaskEnum.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/TargetParser/TargetParser.h"
 
+#include 

Pierre-vh wrote:

I suspect this is a debug leftover but: always use `dbgs` to print, it's easier 
and faster to build than `iostream` :)

https://github.com/llvm/llvm-project/pull/79236
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [flang] [clang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -44,8 +44,15 @@ constexpr uint32_t VersionMajorV5 = 1;
 /// HSA metadata minor version for code object V5.
 constexpr uint32_t VersionMinorV5 = 2;
 
+/// HSA metadata major version for code object V6.
+constexpr uint32_t VersionMajorV6 = 1;
+/// HSA metadata minor version for code object V6.
+constexpr uint32_t VersionMinorV6 = 3;

Pierre-vh wrote:

Not yet, but I assume we'll want to bundle some changes to the MD with V6 so 
it's better to update the version now, no?

https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [flang] [clang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-28 Thread Pierre van Houtryve via cfe-commits



@@ -620,6 +620,15 @@ void ScalarBitSetTraits::bitset(IO &IO,
   BCase(EF_AMDGPU_FEATURE_XNACK_V3);
   BCase(EF_AMDGPU_FEATURE_SRAMECC_V3);
   break;
+case ELF::ELFABIVERSION_AMDGPU_HSA_V6:

Pierre-vh wrote:

elf-headers.test already covers it

https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-12 Thread Pierre van Houtryve via cfe-commits



@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+
+
+
+
+
+
+
+
+  == = 
=
+ ``gfx9-generic`` ``amdgcn`` - ``gfx900``  - ``v_mad_mix`` 
instructions
+ - ``gfx902``are not available 
on
+ - ``gfx904````gfx900``, 
``gfx902``,
+ - ``gfx906````gfx909``, 
``gfx90c``
+ - ``gfx909``  - ``v_fma_mix`` 
instructions
+ - ``gfx90c``are not available 
on ``gfx904``
+   - sramecc is not 
available on
+ ``gfx906``
+   - The following 
instructions
+ are not available 
on ``gfx906``:
+
+ - ``v_fmac_f32``
+ - ``v_xnor_b32``
+ - 
``v_dot4_i32_i8``
+ - 
``v_dot8_i32_i4``
+ - 
``v_dot2_i32_i16``
+ - 
``v_dot2_u32_u16``
+ - 
``v_dot4_u32_u8``
+ - 
``v_dot8_u32_u4``
+ - 
``v_dot2_f32_f16``
+
+
+ ``gfx10.1-generic``  ``amdgcn`` - ``gfx1010`` - The following 
instructions are
+ - ``gfx1011``   not available on 
``gfx1011``
+ - ``gfx1012``   and ``gfx1012``
+ - ``gfx1013``
+ - 
``v_dot4_i32_i8``
+ - 
``v_dot8_i32_i4``
+ - 
``v_dot2_i32_i16``
+ - 
``v_dot2_u32_u16``
+ - 
``v_dot2c_f32_f16``
+ - 
``v_dot4c_i32_i8``
+ - 
``v_dot4_u32_u8``
+ - 
``v_dot8_u32_u4``
+ - 
``v_dot2_f32_f16``
+
+   - BVH Ray Tracing 
instructions
+ are not available 
on
+ ``gfx1013``
+
+
+ ``gfx10.3-generic``  ``amdgcn`` - ``gfx1030`` No restrictions.
+ - ``gfx1031``
+ - ``gfx1032``
+ - ``gfx1033``
+ - ``gfx1034``
+ - ``gfx1035``
+ - ``gfx1036``

Pierre-vh wrote:

It's not a target in LLVM so no

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-12 Thread Pierre van Houtryve via cfe-commits



@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+
+
+
+
+
+
+
+
+  == = 
=
+ ``gfx9-generic`` ``amdgcn`` - ``gfx900``  - ``v_mad_mix`` 
instructions
+ - ``gfx902``are not available 
on
+ - ``gfx904````gfx900``, 
``gfx902``,
+ - ``gfx906````gfx909``, 
``gfx90c``
+ - ``gfx909``  - ``v_fma_mix`` 
instructions
+ - ``gfx90c``are not available 
on ``gfx904``
+   - sramecc is not 
available on
+ ``gfx906``
+   - The following 
instructions
+ are not available 
on ``gfx906``:
+
+ - ``v_fmac_f32``
+ - ``v_xnor_b32``
+ - 
``v_dot4_i32_i8``
+ - 
``v_dot8_i32_i4``
+ - 
``v_dot2_i32_i16``
+ - 
``v_dot2_u32_u16``
+ - 
``v_dot4_u32_u8``
+ - 
``v_dot8_u32_u4``
+ - 
``v_dot2_f32_f16``
+
+
+ ``gfx10.1-generic``  ``amdgcn`` - ``gfx1010`` - The following 
instructions are
+ - ``gfx1011``   not available on 
``gfx1011``
+ - ``gfx1012``   and ``gfx1012``
+ - ``gfx1013``
+ - 
``v_dot4_i32_i8``

Pierre-vh wrote:

gfx1010 and gfx1012 are indeed not identical, but gfx1011 and gfx1012 are:
```
def FeatureISAVersion10_1_0 : FeatureSet<
  !listconcat(FeatureISAVersion10_1_Common.Features,
[])>;

def FeatureISAVersion10_1_2 : FeatureSet<
  !listconcat(FeatureISAVersion10_1_Common.Features,
[FeatureDot1Insts,
 FeatureDot2Insts,
 FeatureDot5Insts,
 FeatureDot6Insts,
 FeatureDot7Insts,
 FeatureDot10Insts])>;
```



https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [llvm] [clang] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-16 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

ping

https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-17 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

@arsenm Hi, can you take a look - especially on the testing? I don't know if 
this is tested well enough

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [clang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -840,6 +845,12 @@ enum : unsigned {
   EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800,
   // SRAMECC is on.
   EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00,
+
+  // Generic target versioning. This is contained in the list byte of EFLAGS.

Pierre-vh wrote:

It's already part of #76954, I just haven't figured out how to stack PR yet so 
all changes of #76954 are here too :/

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [clang] [llvm] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-18 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From d56e752e3eed0fd75a7ff98638ec71635019fdb1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   5 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  27 +++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  13 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   5 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   2 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 491 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index e4fdad8265c8637..a6b96ea027056e3 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4763,9 +4763,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f4246c5e8f68e8b..16dbb4bd835df53 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/Code

[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -787,11 +788,15 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c,
   EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d,
   EF_AMDGPU_MACH_AMDGCN_GFX1201   = 0x04e,
+  EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC  = 0x04f,
+  EF_AMDGPU_MACH_AMDGCN_GFX10_1_GENERIC   = 0x050,

Pierre-vh wrote:

172dbdf9312a15b449954e43623afc28240f50dd

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[llvm] [lld] [clang] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -787,11 +788,15 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c,
   EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d,
   EF_AMDGPU_MACH_AMDGCN_GFX1201   = 0x04e,
+  EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC  = 0x04f,

Pierre-vh wrote:

172dbdf9312a15b449954e43623afc28240f50dd

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [lld] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-18 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From 47d4f3ed4e27f2ce2b3b33c9b0ca4838b3011f22 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   5 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  27 +++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|  13 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   5 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   2 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 491 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index e4fdad8265c863..a6b96ea027056e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4763,9 +4763,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f4246c5e8f68e8..16dbb4bd835df5 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/

[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -787,11 +788,15 @@ enum : unsigned {
   EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c,
   EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d,
   EF_AMDGPU_MACH_AMDGCN_GFX1201   = 0x04e,
+  EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC  = 0x04f,
+  EF_AMDGPU_MACH_AMDGCN_GFX10_1_GENERIC   = 0x050,

Pierre-vh wrote:

Just noticed I forgot to update the AMDGPUUsage + the 
EF_AMDGPU_MACH_AMDGCN_LAST enum when adding the reserved entries. I'll do that 
here.

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -253,6 +274,12 @@ AMDGPU::IsaVersion AMDGPU::getIsaVersion(StringRef GPU) {
   case GK_GFX1151: return {11, 5, 1};
   case GK_GFX1200: return {12, 0, 0};
   case GK_GFX1201: return {12, 0, 1};
+
+  // Generic targets use the earliest ISA version in their group.

Pierre-vh wrote:

I think it's alright as is, but this API is bad and should probably be 
refactored IMO. Most users of the API are just interested in checking the 
version major, sometimes minor (10.1 vs 10.3).

In theory, this API should _never_ be used to check for presence of a feature, 
that's always done through the feature list check, so it shouldn't really be 
abusable. I added a comment though to revisit this and make the intent clearer.

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see 
:ref:`amdgpu-os`) with the following
 
  === ===  = = 
=== === ==
 
+Generic processors also exist. They group multiple processors into one,
+allowing to build code once and run it on multiple targets at the cost
+of less features being available.
+
+Generic processors are only available on Code Object V6 and up.
+
+  .. table:: AMDGPU Generic Processors
+ :name: amdgpu-generic-processor-table
+
+  == = 
=
+ Processor TargetSupported Target
+   TripleProcessorsFeatures
+   ArchitectureRestrictions
+
+
+
+
+
+
+
+
+  == = 
=
+ ``gfx9-generic`` ``amdgcn`` - ``gfx900``  - ``v_mad_mix`` 
instructions
+ - ``gfx902``are not available 
on
+ - ``gfx904````gfx900``, 
``gfx902``,
+ - ``gfx906````gfx909``, 
``gfx90c``
+ - ``gfx909``  - ``v_fma_mix`` 
instructions
+ - ``gfx90c``are not available 
on ``gfx904``
+   - sramecc is not 
available on

Pierre-vh wrote:

No, for unsupported: `EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [flang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits



@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as
 
  == == = 

 
+.. _amdgpu-amdhsa-code-object-metadata-v6:
+
+Code Object V6 Metadata

+
+.. warning::
+  Code object V6 is not the default code object version emitted by this version
+  of LLVM.
+
+
+Code object V6 metadata is the same as
+:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`.
+
+  .. table:: AMDHSA Code Object V6 Metadata Map Changes
+ :name: amdgpu-amdhsa-code-object-metadata-map-table-v6
+
+ = == = 
===
+ String KeyValue Type Required? Description
+ = == = 
===
+ "amdhsa.version"  sequence ofRequired  - The first integer is the 
major

Pierre-vh wrote:

I anticipate that we'll want to add some more V6-only metadata at some point, 
that's why I just started a new  table so it's easier to follow up. I don't 
mind merging it with the V5 table if you really prefer

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [lld] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-01-18 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

I added a few more tests, I just didn't find how to test the flat-scratch stuff 
properly.
Also, gfx904 is documented as not having absolute flat scratch, yet I don't see 
anything about that in the code (no related feature). I put gfx9-generic with 
flat scratch but I don't know if that's correct at all, and how to test it?

https://github.com/llvm/llvm-project/pull/76955
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[lld] [llvm] [clang] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-22 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

ping

https://github.com/llvm/llvm-project/pull/76954
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [clang] [lld] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)

2024-01-23 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/76954

>From b5a034bd71d6925ac287a9bf4c0f86f9e70bb9d1 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Thu, 4 Jan 2024 14:12:00 +0100
Subject: [PATCH] [AMDGPU] Introduce Code Object V6

Introduce Code Object V6 in Clang, LLD, Flang and LLVM.
This is the same as V5 except a new "generic version" flag can be present in 
EFLAGS. This is related to new generic targets that'll be added in a follow-up 
patch. It's also likely V6 will have new changes (possibly new metadata 
entries) added later.

Docs change are not included, I'm planning to do them in a follow-up patch all 
at once (when generic targets land too).
---
 clang/include/clang/Driver/Options.td |   4 +-
 clang/lib/CodeGen/CGBuiltin.cpp   |   6 +-
 clang/lib/Driver/ToolChains/CommonArgs.cpp|   2 +-
 .../amdgpu-code-object-version-linking.cu |  37 +++
 .../CodeGenCUDA/amdgpu-code-object-version.cu |   4 +
 .../test/CodeGenCUDA/amdgpu-workgroup-size.cu |   4 +
 .../amdgcn/bitcode/oclc_abi_version_600.bc|   0
 clang/test/Driver/hip-code-object-version.hip |  12 +
 clang/test/Driver/hip-device-libs.hip |  18 +-
 flang/lib/Frontend/CompilerInvocation.cpp |   2 +
 flang/test/Lower/AMD/code-object-version.f90  |   3 +-
 lld/ELF/Arch/AMDGPU.cpp   |  21 ++
 lld/test/ELF/amdgpu-tid.s |  16 ++
 llvm/include/llvm/BinaryFormat/ELF.h  |   9 +-
 llvm/include/llvm/Support/AMDGPUMetadata.h|   7 +
 llvm/include/llvm/Support/ScopedPrinter.h |   4 +-
 llvm/include/llvm/Target/TargetOptions.h  |   1 +
 llvm/lib/ObjectYAML/ELFYAML.cpp   |   9 +
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |   3 +
 .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp  |  10 +
 .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h |  11 +-
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp |  26 ++
 .../MCTargetDesc/AMDGPUTargetStreamer.h   |   1 +
 .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp|   6 +
 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h |   2 +-
 ...licit-kernarg-backend-usage-global-isel.ll |   2 +
 .../AMDGPU/call-graph-register-usage.ll   |   1 +
 .../AMDGPU/codegen-internal-only-func.ll  |   3 +
 llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll  |   4 +
 .../enable-scratch-only-dynamic-stack.ll  |   1 +
 .../AMDGPU/implicit-kernarg-backend-usage.ll  |   2 +
 .../AMDGPU/implicitarg-offset-attributes.ll   |  46 
 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll |   1 +
 llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll  |   1 +
 llvm/test/CodeGen/AMDGPU/recursion.ll |   1 +
 .../AMDGPU/resource-usage-dead-function.ll|   1 +
 .../AMDGPU/tid-mul-func-xnack-all-any.ll  |   6 +
 .../tid-mul-func-xnack-all-not-supported.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-off.ll  |   6 +
 .../AMDGPU/tid-mul-func-xnack-all-on.ll   |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-1.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-off-2.ll|   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-1.ll |   6 +
 .../AMDGPU/tid-mul-func-xnack-any-on-2.ll |   6 +
 .../tid-one-func-xnack-not-supported.ll   |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll  |   6 +
 .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll   |   6 +
 .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s |   5 +
 .../elf-headers.test} |   0
 .../ELF/AMDGPU/generic_versions.s |  16 ++
 .../ELF/AMDGPU/generic_versions.test  |  26 ++
 llvm/tools/llvm-readobj/ELFDumper.cpp | 224 --
 52 files changed, 483 insertions(+), 135 deletions(-)
 create mode 100644 
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc
 rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => 
AMDGPU/elf-headers.test} (100%)
 create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s
 create mode 100644 
llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 819f6f1a15c3f3..0d66b68e5d8c47 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4783,9 +4783,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, 
Group,
   HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
   Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>,
-  Values<"none,4,5">,
+  Values<"none,4,5,6">,
   NormalizedValuesScope<"llvm::CodeObjectVersionKind">,
-  NormalizedValues<["COV_None", "COV_4", "COV_5"]>,
+  NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>,
   MarshallingInfoEnum, "COV_4">;
 
 defm cumode : SimpleMFlag<"cumode",
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 7ef764b8e1ac80..fdc7025e50fed6 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/C

[clang] [AMDGPU] Remove Code Object V3 (PR #67118)

2023-10-18 Thread Pierre van Houtryve via cfe-commits


Pierre-vh wrote:

I don't mind reverting, but do you have a timeline for removal of that device? 
v3 has been deprecated for a while, AFAIK.

https://github.com/llvm/llvm-project/pull/67118
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] d6acd01 - [Sema] Fix crash when evaluating nested call with value-dependent arg

2023-01-05 Thread Pierre van Houtryve via cfe-commits


Author: Pierre van Houtryve
Date: 2023-01-06T02:57:50-05:00
New Revision: d6acd0196b3378bdeb5193053e290d7194c4f72d

URL: 
https://github.com/llvm/llvm-project/commit/d6acd0196b3378bdeb5193053e290d7194c4f72d
DIFF: 
https://github.com/llvm/llvm-project/commit/d6acd0196b3378bdeb5193053e290d7194c4f72d.diff

LOG: [Sema] Fix crash when evaluating nested call with value-dependent arg

Fix an edge case `ExprConstant.cpp`'s `EvaluateWithSubstitution` when called by 
`CheckEnableIf`

The assertion in `CallStackFrame::getTemporary`
could fail during evaluation of nested calls to a function
using `enable_if` when the second argument was a
value-dependent expression.

This caused a temporary to be created for the second
argument with a given version during the
evaluation of the inner call, but we bailed out
when evaluating the second argument of the
outer call due to the expression being value-dependent.
After bailing out, we tried to clean up the argument's value slot but it
caused an assertion to trigger in `getTemporary` as
a temporary for the second argument existed, but only for the inner call and 
not the outer call.

See the test case for a more complete description of the issue.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D139713

Added: 
clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp

Modified: 
clang/lib/AST/ExprConstant.cpp

Removed: 




diff  --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 78cfecbec9fd3..a43845e53c5d0 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -594,11 +594,6 @@ namespace {
   auto LB = Temporaries.lower_bound(KV);
   if (LB != Temporaries.end() && LB->first == KV)
 return &LB->second;
-  // Pair (Key,Version) wasn't found in the map. Check that no elements
-  // in the map have 'Key' as their key.
-  assert((LB == Temporaries.end() || LB->first.first != Key) &&
- (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) 
&&
- "Element with key 'Key' found in map");
   return nullptr;
 }
 

diff  --git 
a/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp 
b/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
new file mode 100644
index 0..998f2ccf92534
--- /dev/null
+++ b/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -fsyntax-only %s -std=c++14
+
+// Checks that Clang doesn't crash/assert on the nested call to "kaboom"
+// in "bar()".
+//
+// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame`
+// because it triggers the following chain of events:
+// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`.
+//  1. The outer call to "kaboom" gets evaluated.
+//   2. The expr for "a" gets evaluated, it has a version X;
+//  a temporary with the key (a, X) is created.
+// 3. The inner call to "kaboom" gets evaluated.
+//   4. The expr for "a" gets evaluated, it has a version Y;
+//  a temporary with the key (a, Y) is created.
+//   5. The expr for "b" gets evaluated, it has a version Y;
+//  a temporary with the key (b, Y) is created.
+//   6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it
+//  because it's value-dependent (due to the call to "f.foo()").
+//
+// When `EvaluateWithSubstitution` bails out while evaluating the outer
+// call, it attempts to fetch "b"'s param slot to clean it up.
+//
+// This used to cause an assertion failure in `getTemporary` because
+// a temporary with the key "(b, Y)" (created at step 4) existed but
+// not one for "(b, X)", which is what it was trying to fetch.
+
+template
+__attribute__((enable_if(true, "")))
+T kaboom(T a, T b) {
+  return b;
+}
+
+struct A {
+  double foo();
+};
+
+template 
+struct B {
+  A &f;
+
+  void bar() {
+kaboom(kaboom(0.0, 1.0), f.foo());
+  }
+};



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] c05f163 - [clang][cuda/hip] Allow `noinline` lambdas

2022-11-04 Thread Pierre van Houtryve via cfe-commits


Author: Pierre van Houtryve
Date: 2022-11-04T07:33:31Z
New Revision: c05f1639f7f4a8e81ad83bba99bae95553c6064e

URL: 
https://github.com/llvm/llvm-project/commit/c05f1639f7f4a8e81ad83bba99bae95553c6064e
DIFF: 
https://github.com/llvm/llvm-project/commit/c05f1639f7f4a8e81ad83bba99bae95553c6064e.diff

LOG: [clang][cuda/hip] Allow `__noinline__`  lambdas

D124866 seem to have had an unintended side effect: __noinline__ on lambdas was 
no longer accepted.

This fixes the regression and adds a test case for it.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D137251

Added: 
clang/test/CodeGenCUDA/lambda-noinline.cu

Modified: 
clang/docs/ReleaseNotes.rst
clang/lib/Parse/ParseExprCXX.cpp
clang/test/Parser/lambda-attr.cu

Removed: 




diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ad1a00b4bbcc..7bb1405c131a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -638,6 +638,9 @@ C++2b Feature Support
 CUDA/HIP Language Changes in Clang
 --
 
+ - Allow the use of ``__noinline__`` as a keyword (instead of 
``__attribute__((noinline))``)
+   in lambda declarations.
+
 Objective-C Language Changes in Clang
 -
 

diff  --git a/clang/lib/Parse/ParseExprCXX.cpp 
b/clang/lib/Parse/ParseExprCXX.cpp
index e34bd8d7bca4..a768c4da504a 100644
--- a/clang/lib/Parse/ParseExprCXX.cpp
+++ b/clang/lib/Parse/ParseExprCXX.cpp
@@ -1291,7 +1291,22 @@ ExprResult Parser::ParseLambdaExpressionAfterIntroducer(
   if (getLangOpts().CUDA) {
 // In CUDA code, GNU attributes are allowed to appear immediately after the
 // "[...]", even if there is no "(...)" before the lambda body.
-MaybeParseGNUAttributes(D);
+//
+// Note that we support __noinline__ as a keyword in this mode and thus
+// it has to be separately handled.
+while (true) {
+  if (Tok.is(tok::kw___noinline__)) {
+IdentifierInfo *AttrName = Tok.getIdentifierInfo();
+SourceLocation AttrNameLoc = ConsumeToken();
+Attr.addNew(AttrName, AttrNameLoc, nullptr, AttrNameLoc, nullptr, 0,
+ParsedAttr::AS_Keyword);
+  } else if (Tok.is(tok::kw___attribute))
+ParseGNUAttributes(Attr, nullptr, &D);
+  else
+break;
+}
+
+D.takeAttributes(Attr);
   }
 
   // Helper to emit a warning if we see a CUDA host/device/global attribute

diff  --git a/clang/test/CodeGenCUDA/lambda-noinline.cu 
b/clang/test/CodeGenCUDA/lambda-noinline.cu
new file mode 100644
index ..de2196e63f07
--- /dev/null
+++ b/clang/test/CodeGenCUDA/lambda-noinline.cu
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple x86_64-linux-gnu \
+// RUN:   | FileCheck -check-prefix=HOST %s
+// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \
+// RUN:   -triple amdgcn-amd-amdhsa -fcuda-is-device \
+// RUN:   | FileCheck -check-prefix=DEV %s
+
+#include "Inputs/cuda.h"
+
+// Checks noinline is correctly added to the lambda function.
+
+// HOST: define{{.*}}@_ZZ4HostvENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// HOST: attributes #[[ATTR]]{{.*}}noinline
+
+// DEV: define{{.*}}@_ZZ6DevicevENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]]
+// DEV: attributes #[[ATTR]]{{.*}}noinline
+
+__device__ int a;
+int b;
+
+__device__ int Device() { return ([&] __device__ __noinline__ (){ return a; 
})(); }
+
+__host__ int Host() { return ([&] __host__ __noinline__ (){ return b; })(); }

diff  --git a/clang/test/Parser/lambda-attr.cu 
b/clang/test/Parser/lambda-attr.cu
index 886212b97f50..7fa128effd51 100644
--- a/clang/test/Parser/lambda-attr.cu
+++ b/clang/test/Parser/lambda-attr.cu
@@ -18,6 +18,10 @@ __attribute__((device)) void device_attr() {
   ([&](int) __attribute__((device)){ device_fn(); })(0);
   // expected-warning@-1 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((device)) (int) { device_fn(); })(0);
+
+  // test that noinline can appear anywhere.
+  ([&] __attribute__((device)) __noinline__ () { device_fn(); })();
+  ([&] __noinline__ __attribute__((device)) () { device_fn(); })();
 }
 
 __attribute__((host)) __attribute__((device)) void host_device_attrs() {
@@ -37,6 +41,11 @@ __attribute__((host)) __attribute__((device)) void 
host_device_attrs() {
   // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the 
parameter list in lambdas}}
   // expected-warning@-2 {{nvcc does not allow '__device__' to appear after 
the parameter list in lambdas}}
   ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0);
+
+  // test that noinline can also appear anywhere.
+  ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })();
+  ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); 
})();

[clang] [AMDGPU] Remove Code Object V3 (PR #67118)

2023-10-15 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh closed 
https://github.com/llvm/llvm-project/pull/67118
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Remove Code Object V2 (PR #65715)

2023-09-07 Thread Pierre van Houtryve via cfe-commits


https://github.com/Pierre-vh review_requested 
https://github.com/llvm/llvm-project/pull/65715
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

1 2 >

1 - 100 of 195 matches

Mail list logo