[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/74587 None >From f4f909df09dda7e2d21389f7b44f67e89997c44b Mon Sep 17 00:00:00 2001 From: pvanhout Date: Wed, 6 Dec 2023 12:47:56 +0100 Subject: [PATCH] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs --- clang/include/clang/Basic/AttrDocs.td | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td index bbe4de94cbabe..88f7c65e6e847 100644 --- a/clang/include/clang/Basic/AttrDocs.td +++ b/clang/include/clang/Basic/AttrDocs.td @@ -2659,8 +2659,9 @@ An error will be given if: - Specified values violate subtarget specifications; - Specified values are not compatible with values provided through other attributes; - - The AMDGPU target backend is unable to create machine code that can meet the -request. + +The AMDGPU target backend will emit a warning whenever it is unable to +create machine code that meets the request. }]; } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)
@@ -2659,8 +2659,9 @@ An error will be given if: - Specified values violate subtarget specifications; - Specified values are not compatible with values provided through other attributes; - - The AMDGPU target backend is unable to create machine code that can meet the -request. + +The AMDGPU target backend will emit a warning whenever it is unable to Pierre-vh wrote: Do you want me to format it like this instead to mimic the previous formatting? ``` A warning will be given if: - ... ``` https://github.com/llvm/llvm-project/pull/74587 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][AMDGPU] Update amdgpu_waves_per_eu attr docs (PR #74587)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/74587 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Improve selection of ballot.i64 intrinsic in wave32 mode. (PR #71556)
@@ -961,6 +961,18 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const { return IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType())); } } +if (ST->isWave32() && II.getType()->getIntegerBitWidth() == 64) { + // %b64 = call i64 ballot.i64(...) + // => + // %b32 = call i32 ballot.i32(...) + // %b64 = zext i32 %b32 to i64 + Function *NewF = Intrinsic::getDeclaration( + II.getModule(), Intrinsic::amdgcn_ballot, {IC.Builder.getInt32Ty()}); + CallInst *NewCall = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)}); + Value *CastedCall = IC.Builder.CreateZExtOrBitCast(NewCall, II.getType()); + CastedCall->takeName(&II); + return IC.replaceInstUsesWith(II, CastedCall); Pierre-vh wrote: Nit: Can just reuse the same value, e.g. ``` Value *Call = IC.Builder.CreateCall(NewF, {II.getArgOperand(0)}); Call = IC.Builder.CreateZExtOrBitCast(NewCall, II.getType()); Call->takeName(&II); return IC.replaceInstUsesWith(II, Call); ``` I also think you should be able to use something like `IC.Builder.CreateIntrinsic` ? There should be a function to create an intrinsic call directly. https://github.com/llvm/llvm-project/pull/71556 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/76954 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). >From dc666323870118020c0fd386d19d8306d4c853e1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 22 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 12 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 5 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 6 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 27 +++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 13 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 5 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 2 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + llvm/tools/llvm-readobj/ELFDumper.cpp | 222 -- 49 files changed, 448 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..0bfe0e7739960e 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f71dbf1729a1d6..be86731ed912ea 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeG
[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: Note: testing is a bit light for now, I'd like to add more tests, but I'm not sure what kind of tests are worth adding. I could just add a generic target run line wherever gfx9/10/11 run lines are present, but that seems a bit overkill? I'd need to change half the tests we have or more. https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
@@ -2585,7 +2585,7 @@ getAMDGPUCodeObjectArgument(const Driver &D, const llvm::opt::ArgList &Args) { void tools::checkAMDGPUCodeObjectVersion(const Driver &D, const llvm::opt::ArgList &Args) { const unsigned MinCodeObjVer = 4; - const unsigned MaxCodeObjVer = 5; + const unsigned MaxCodeObjVer = 6; Pierre-vh wrote: I'm wondering if we should print a warning when V6 is enabled (either here or in the backend) to note that it's in development and not ready yet? Something like "code object v6 is still experimental and not ready for production use" https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [flang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From dc666323870118020c0fd386d19d8306d4c853e1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH 1/2] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 22 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 12 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 5 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 6 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 27 +++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 13 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 5 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 2 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + llvm/tools/llvm-readobj/ELFDumper.cpp | 222 -- 49 files changed, 448 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 2b93ddf033499c..0bfe0e7739960e 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4753,9 +4753,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f71dbf1729a1d6..be86731ed912ea 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -17481,9 +17481,9 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) { // \p Index is 0, 1, and 2 for x, y, and z dimension, respectively. /// Emit code based on Code Object ABI version. /// COV_4: Emit code to use dispatch ptr -/// COV_5: Emit code to use implicitarg ptr +/// COV_5+ : Emit code to use implicitarg ptr /// COV_NONE : Emit code to load a global variable "__oclc_ABI_versi
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh commented: Missing components: - Need a way for external tools to inquire about the specifics of generic targets (without depending on llvm) - map a specific gfx target to its generic family - given a specific gfx version, what's the minimum generic version it needs (?) - tools need to tell generic MACHs from specific ones (currently they can do that by just doing EFLAGS >> 24 and checking if there is any value other than 0 there, but it needs to be documented) https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [llvm] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -792,6 +793,17 @@ enum : unsigned { EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600, EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1201, + // Generic AMDGCN processors + // clang-format off + EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC = 0x0c0, Pierre-vh wrote: TODO: put it in the list above instead of its own namespace https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -839,6 +851,15 @@ enum : unsigned { EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800, // SRAMECC is on. EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00, + + // Generic target versioning. This is contained in the list byte of EFLAGS. + EF_AMDGPU_GENERIC_VERSION = 0xff00, + EF_AMDGPU_GENERIC_VERSION_OFFSET = 24, Pierre-vh wrote: This will be made generic instead: teach elfdumper to shift EFLAGS to extract any integer TODO: Review how the generic versioning system works in practice https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [flang] [llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [flang] [llvm] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From 6368d8210e211948b5a03ab326b996695b8d Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 5 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 27 +++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 13 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 5 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 2 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 491 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index bffdddc28aac60..f9381d0706f447 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4761,9 +4761,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f71dbf1729a1d6..be86731ed912ea 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -1260,13 +1261,9 @@ def FeatureISAVersion9_0_8 : FeatureSet< FeatureImageGather4D16Bug])>; def FeatureISAVersion9_0_9 : FeatureSet< - !listconcat(FeatureISAVersion9_0_Common.Features, -[FeatureGDS, - FeatureMadMixInsts, - FeatureDsSrc2Insts, - FeatureExtendedImageInsts, - FeatureImageInsts, - FeatureImageGather4D16Bug])>; + !listconcat(FeatureISAVersion9_0_Consumer_Common.Features, +[FeatureMadMixInsts, + FeatureImageInsts])>; Pierre-vh wrote: todo: remove ImageInsts, it's already included https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [flang] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [flang] [lld] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [flang] [lld] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH 1/3] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a..799634ccec7ba 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12..3b10444ef71d3 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s
[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH 1/4] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a..799634ccec7ba 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12..3b10444ef71d3 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: For the MD changes, it's just to describe the version increment, nothing else. I think describing is important as the V6 diff already updated the amdhsa.version. If amdhsa.version didn't need to change then i need to fix that first, and then we can remove the V6 MD section https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH 1/5] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a..799634ccec7ba 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12..3b10444ef71d3 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,102 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors allow execution of a single code objects on any of the processors that +it supports. Such code objects may not perform as well as those for the non-generic processors. + +Generic processors are only available on code object V6 and above (see :ref:`amdgpu-elf-code-object`). + +Generic processor code objects are versioned (see :ref:`amdgpu-elf-header-e_flags-table-v6-onwards`). +The version number is used by runtimes to determine if a code object can be run on a specific agent. Pierre-vh wrote: I rephrased it a bit (e.g. member -> supported processor) but I mostly followed your suggestion https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH 1/6] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a1..799634ccec7ba5 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12d..3b10444ef71d36 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: @t-tye Can you please approve then? Otherwise the diff still shows a red "Changes requested" warning :) Thanks @arsenm Please also approve if there are no more comments https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/81058 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
https://github.com/Pierre-vh commented: My comments are mostly about style, I haven't done a deep dive into the logic of the pass yet https://github.com/llvm/llvm-project/pull/81058 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { Pierre-vh wrote: `Interface` is a very generic name, can you make it a bit more specific and add docs? https://github.com/llvm/llvm-project/pull/81058 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class valistCC { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // How the vaListType is passed + virtual valistCC vaListCC() { return valistCC::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented if valist is on the stack +assert(!valistOnStack()); +__builtin_unreachable(); + } + + // All targets currently implemented use a ptr for the valist parameter + Type *vaListParameterType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + bool VAEndIsNop() { return
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class ValistCc { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // Lots of targets use a void* pointed at a buffer for va_list. + // Some use more complicated iterator constructs. + // This interface seeks to express both. + // Ideally it would be a compile time error for a derived class + // to override only one of valistOnStack, initializeVAList. + + // How the vaListType is passed + virtual ValistCc valistCc() { return ValistCc::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented
[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This is an optimisation pass for variadic functions. If called from codegen, +// it can serve as the implementation of variadic functions for a given target. +// +// The target-dependent parts are in namespace VariadicABIInfo. Enabling a new +// target means adding a case to VariadicABIInfo::create() along with tests. +// +// The module pass using that information is class ExpandVariadics. +// +// The strategy is: +// 1. Test whether a variadic function is sufficiently simple +// 2. If it was, calls to it can be replaced with calls to a different function +// 3. If it wasn't, try to split it into a simple function and a remainder +// 4. Optionally rewrite the varadic function calling convention as well +// +// This pass considers "sufficiently simple" to mean a variadic function that +// calls into a different function taking a va_list to do the real work. For +// example, libc might implement fprintf as a single basic block calling into +// vfprintf. This pass can then rewrite call to the variadic into some code +// to construct a target-specific value to use for the va_list and a call +// into the non-variadic implementation function. There's a test for that. +// +// Most other variadic functions whose definition is known can be converted into +// that form. Create a new internal function taking a va_list where the original +// took a ... parameter. Move the blocks across. Create a new block containing a +// va_start that calls into the new function. This is nearly target independent. +// +// Where this transform is consistent with the ABI, e.g. AMDGPU or NVPTX, or +// where the ABI can be chosen to align with this transform, the function +// interface can be rewritten along with calls to unknown variadic functions. +// +// The aggregate effect is to unblock other transforms, most critically the +// general purpose inliner. Known calls to variadic functions become zero cost. +// +// This pass does define some target specific information which is partially +// redundant with other parts of the compiler. In particular, the call frame +// it builds must be the exact complement of the va_arg lowering performed +// by clang. The va_list construction is similar to work done by the backend +// for targets that lower variadics there, though distinct in that this pass +// constructs the pieces using alloca instead of relative to stack pointers. +// +// Consistency with clang is primarily tested by emitting va_arg using clang +// then expanding the variadic functions using this pass, followed by trying +// to constant fold the functions to no-ops. +// +// Target specific behaviour is tested in IR - mainly checking that values are +// put into positions in call frames that make sense for that particular target. +// +//===--===// + +#include "llvm/Transforms/IPO/ExpandVariadics.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/Constants.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" +#include "llvm/Pass.h" +#include "llvm/TargetParser/Triple.h" + +#define DEBUG_TYPE "expand-variadics" + +using namespace llvm; + +namespace { +namespace VariadicABIInfo { + +// calling convention for passing as valist object, same as it would be in C +// aarch64 uses byval +enum class valistCC { value, pointer, /*byval*/ }; + +struct Interface { +protected: + Interface(uint32_t MinAlign, uint32_t MaxAlign) + : MinAlign(MinAlign), MaxAlign(MaxAlign) {} + +public: + virtual ~Interface() {} + const uint32_t MinAlign; + const uint32_t MaxAlign; + + // Most ABIs use a void* or char* for va_list, others can specialise + virtual Type *vaListType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + // How the vaListType is passed + virtual valistCC vaListCC() { return valistCC::value; } + + // The valist might need to be stack allocated. + virtual bool valistOnStack() { return false; } + + virtual void initializeVAList(LLVMContext &Ctx, IRBuilder<> &Builder, +AllocaInst * /*va_list*/, Value * /*buffer*/) { +// Function needs to be implemented if valist is on the stack +assert(!valistOnStack()); +__builtin_unreachable(); + } + + // All targets currently implemented use a ptr for the valist parameter + Type *vaListParameterType(LLVMContext &Ctx) { +return PointerType::getUnqual(Ctx); + } + + bool VAEndIsNop() { return
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: > mad_mix I added run lines to `mad-mix.ll` and it behaves as expected: no fma/mad_mix emitted https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/Pierre-vh approved this pull request. LGTM, but wait for @t-tye or @jayfoad to approve as well https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2594,12 +2594,10 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent || MOI.getFailureOrdering() == AtomicOrdering::Acquire || MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) { - Changed |= CC->insertWait(MI, MOI.getScope(), Pierre-vh wrote: extra formatting change https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled()) { + AMDGPU::Waitcnt Wait; + if (WCG == &WCGPreGFX12) Pierre-vh wrote: Use `ST->hasExtendedWaitCounts()` instead of checking the pointer? https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF, } #endif +if (ST->isPreciseMemoryEnabled()) { + AMDGPU::Waitcnt Wait; + if (WCG == &WCGPreGFX12) +Wait = AMDGPU::Waitcnt(0, 0, 0, 0); Pierre-vh wrote: I was looking at https://github.com/ROCm/ROCm-CompilerSupport/issues/66 and it made me wonder, why do we have to emit all zeroes instead of just emitting what's in `ScoreBrackets`? Is there an advantage? I'm wondering if this should just emit `ScoreBrackets`, then `+precise-memory` + `-amdgpu-waitcnt-forcezero` need to be used together achieve the behavior we have here? https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)
https://github.com/Pierre-vh created https://github.com/llvm/llvm-project/pull/83558 Those macros are unreliable because our features are mostly uninitialized at that stage, so any macro we define is unreliable. Fixes SWDEV-447308 >From 3730631ac58425f559f4bc3cfe3da89e6367c1c5 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 1 Mar 2024 12:43:55 +0100 Subject: [PATCH] [clang][AMDGPU] Don't define feature macros on host code Those macros are unreliable because our features are mostly uninitialized at that stage, so any macro we define is unreliable. Fixes SWDEV-447308 --- clang/lib/Basic/Targets/AMDGPU.cpp | 8 +++- clang/test/Preprocessor/predefined-arch-macros.c | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 5742885df0461b..df9a5855068ed3 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -292,8 +292,14 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, } Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); + + // Don't emit feature macros in host code because in such cases the + // feature list is not accurate. + if (IsHIPHost) +return; + // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ - if (isAMDGCN(getTriple()) && !IsHIPHost) { + if (isAMDGCN(getTriple())) { assert(StringRef(CanonName).starts_with("gfx") && "Invalid amdgcn canonical name"); StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c index ca51f2fc22c517..8904bcea1a1f68 100644 --- a/clang/test/Preprocessor/predefined-arch-macros.c +++ b/clang/test/Preprocessor/predefined-arch-macros.c @@ -4340,7 +4340,7 @@ // RUN: %clang -x hip -E -dM %s -o - 2>&1 --offload-host-only -nogpulib \ // RUN: -nogpuinc --offload-arch=gfx803 -target x86_64-unknown-linux \ // RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_HIP_HOST -// CHECK_HIP_HOST: #define __AMDGCN_WAVEFRONT_SIZE__ 64 +// CHECK_HIP_HOST-NOT: #define __AMDGCN_WAVEFRONT_SIZE__ 64 // CHECK_HIP_HOST: #define __AMDGPU__ 1 // CHECK_HIP_HOST: #define __AMD__ 1 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/83558 >From 3730631ac58425f559f4bc3cfe3da89e6367c1c5 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 1 Mar 2024 12:43:55 +0100 Subject: [PATCH 1/2] [clang][AMDGPU] Don't define feature macros on host code Those macros are unreliable because our features are mostly uninitialized at that stage, so any macro we define is unreliable. Fixes SWDEV-447308 --- clang/lib/Basic/Targets/AMDGPU.cpp | 8 +++- clang/test/Preprocessor/predefined-arch-macros.c | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 5742885df0461b..df9a5855068ed3 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -292,8 +292,14 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, } Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); + + // Don't emit feature macros in host code because in such cases the + // feature list is not accurate. + if (IsHIPHost) +return; + // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ - if (isAMDGCN(getTriple()) && !IsHIPHost) { + if (isAMDGCN(getTriple())) { assert(StringRef(CanonName).starts_with("gfx") && "Invalid amdgcn canonical name"); StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c index ca51f2fc22c517..8904bcea1a1f68 100644 --- a/clang/test/Preprocessor/predefined-arch-macros.c +++ b/clang/test/Preprocessor/predefined-arch-macros.c @@ -4340,7 +4340,7 @@ // RUN: %clang -x hip -E -dM %s -o - 2>&1 --offload-host-only -nogpulib \ // RUN: -nogpuinc --offload-arch=gfx803 -target x86_64-unknown-linux \ // RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_HIP_HOST -// CHECK_HIP_HOST: #define __AMDGCN_WAVEFRONT_SIZE__ 64 +// CHECK_HIP_HOST-NOT: #define __AMDGCN_WAVEFRONT_SIZE__ 64 // CHECK_HIP_HOST: #define __AMDGPU__ 1 // CHECK_HIP_HOST: #define __AMD__ 1 >From a60d9fa16876b90a69b60de429261a3d10404f7a Mon Sep 17 00:00:00 2001 From: pvanhout Date: Fri, 1 Mar 2024 13:56:46 +0100 Subject: [PATCH 2/2] use CudaIssDevice --- clang/lib/Basic/Targets/AMDGPU.cpp| 2 +- .../CodeGenOpenCL/builtins-amdgcn-wave32.cl | 2 +- clang/test/Driver/amdgpu-macros.cl| 212 +- clang/test/Driver/target-id-macros.cl | 10 +- .../Preprocessor/predefined-arch-macros.c | 6 +- 5 files changed, 116 insertions(+), 116 deletions(-) diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index df9a5855068ed3..0c5f6bb13ec2eb 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -295,7 +295,7 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, // Don't emit feature macros in host code because in such cases the // feature list is not accurate. - if (IsHIPHost) + if (!Opts.CUDAIsDevice) return; // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl index da1ae244431556..de3020fdb6f98f 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl @@ -42,6 +42,6 @@ void test_read_exec_hi(global uint* out) { *out = __builtin_amdgcn_read_exec_hi(); } -#if __AMDGCN_WAVEFRONT_SIZE != 32 +#if defined(__AMDGCN_WAVEFRONT_SIZE) && __AMDGCN_WAVEFRONT_SIZE != 32 #error Wrong wavesize detected #endif diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 004619321b271f..1f03ccc6ab9223 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -6,32 +6,32 @@ // R600-based processors. // -// RUN: %clang -E -dM -target r600 -mcpu=r600 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600 -// RUN: %clang -E -dM -target r600 -mcpu=rv630 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600 -// RUN: %clang -E -dM -target r600 -mcpu=rv635 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600 -// RUN: %clang -E -dM -target r600 -mcpu=r630 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R630 %s -DCPU=r630 -// RUN: %clang -E -dM -target r600 -mcpu=rs780 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880 -// RUN: %clang -E -dM -target r600 -mcpu=rs880 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880 -// RUN: %clang -E -dM -target r600 -mcpu=rv610 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880 -// RUN: %clang -E -dM -target r600 -mcpu=rv620 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s -DCPU=rs880 -// RUN: %clang -E -dM -target r600 -
[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)
Pierre-vh wrote: > This was the original behavior of my patch, but I reverted it because it > broke all the HIP headers that were unintentionally relying on this. Has that > been resolved? Was an issue opened for that? How many headers are affected? https://github.com/llvm/llvm-project/pull/83558 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/83558 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [flang] [llvm] [clang] [AMDGPU] Introduce Code Object V6 (PR #76954)
@@ -44,8 +44,15 @@ constexpr uint32_t VersionMajorV5 = 1; /// HSA metadata minor version for code object V5. constexpr uint32_t VersionMinorV5 = 2; +/// HSA metadata major version for code object V6. +constexpr uint32_t VersionMajorV6 = 1; +/// HSA metadata minor version for code object V6. +constexpr uint32_t VersionMinorV6 = 3; Pierre-vh wrote: Do we just increment this number when there's a breaking metadata change? How does it work? https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From 7ad88453f5e89fd4643afa486e52a123138433f4 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH 1/2] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 7 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 6 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 2 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 3 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 483 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index a4b35e370999e..d7152cc355ce7 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4797,9 +4797,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_5">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f3ab5ad7b08ec..a55be6880c5ef 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/C
[clang] [flang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -139,10 +139,10 @@ bool AMDGPURemoveIncompatibleFunctions::checkFunction(Function &F) { const GCNSubtarget *ST = static_cast(TM->getSubtargetImpl(F)); - // Check the GPU isn't generic. Generic is used for testing only - // and we don't want this pass to interfere with it. + // Check the GPU isn't generic or generic-hsa. Generic is used for testing + // only and we don't want this pass to interfere with it. StringRef GPUName = ST->getCPU(); - if (GPUName.empty() || GPUName.contains("generic")) + if (GPUName.empty() || GPUName.starts_with("generic")) Pierre-vh wrote: No we have some tests using a generic target, and I want the pass to work on -generic targets as well (e.g. gfx9-generic), so I'm moving the check to allow stuff like gfx9-generic but not "generic" alone https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [flang] [clang] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH 1/3] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 7 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 6 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 2 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 3 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 483 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 73071a6648541..fb5f50ef452c2 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_5">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 196be813a4896..f17e4a83305bf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/C
[llvm] [clang] [lld] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH 1/4] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 7 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 6 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 2 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 3 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 483 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 73071a6648541..fb5f50ef452c2 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_5">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 196be813a4896..f17e4a83305bf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/C
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From a967fdae9a8557331d2a228f391f39f9e27e8943 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH 1/5] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 7 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 6 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 2 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 3 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 483 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 73071a6648541..fb5f50ef452c2 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4801,9 +4801,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_5">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 196be813a4896..f17e4a83305bf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/C
[flang] [clang] [llvm] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a..799634ccec7ba 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12..3b10444ef71d3 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWA
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/Pierre-vh edited https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
https://github.com/Pierre-vh requested changes to this pull request. When you made changes, you can click the "Re-request review" icon next to reviewers to put it back in the review queues :) https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { Pierre-vh wrote: Why does it need to be a separate class hierarchy? It could just be part of CacheControl, and the functions can be named `handlePreciseMemoryAtomic/NonAtomic` ? That would avoid a lot of boilerplate. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -0,0 +1,362 @@ +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op < %s | FileCheck %s -check-prefixes=GFX90A +; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+amdgpu-precise-memory-op < %s | FileCheck %s -check-prefixes=GFX10 +; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global,+enable-flat-scratch,+amdgpu-precise-memory-op -amdgpu-use-divergent-register-indexing < %s | FileCheck --check-prefixes=GFX9-FLATSCR %s Pierre-vh wrote: gfx11 and 12 tests https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory Pierre-vh wrote: Understood :) Can you remove the `amdgpu` prefix from the option? All target features are already specific to AMDGPU, e.g. xnack, sramecc, etc. so you can just use something like `precise-memory` https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { +protected: + const GCNSubtarget &ST; + const SIInstrInfo *TII = nullptr; + + IsaVersion IV; + + SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) { +TII = ST.getInstrInfo(); +IV = getIsaVersion(ST.getCPU()); + } + +public: + static std::unique_ptr create(const GCNSubtarget &ST); + + virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0; + /// Handles atomic instruction \p MI with \p ret indicating whether \p MI + /// returns a result. + virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0; +}; + +class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport { +public: + SIGfx9PreciseMemorySupport(const GCNSubtarget &ST) + : SIPreciseMemorySupport(ST) {} + bool handleNonAtomic(MachineBasicBlock::iterator &MI) override; + bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override; +}; + +class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport { +public: + SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST) + : SIPreciseMemorySupport(ST) {} + bool handleNonAtomic(MachineBasicBlock::iterator &MI) override; + bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override; +}; + +std::unique_ptr +SIPreciseMemorySupport::create(const GCNSubtarget &ST) { + GCNSubtarget::Generation Generation = ST.getGeneration(); + if (Generation < AMDGPUSubtarget::GFX10) Pierre-vh wrote: GFX12 is missing https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -605,12 +606,197 @@ class SIGfx12CacheControl : public SIGfx11CacheControl { bool IsNonTemporal) const override; }; +class SIPreciseMemorySupport { +protected: + const GCNSubtarget &ST; + const SIInstrInfo *TII = nullptr; + + IsaVersion IV; + + SIPreciseMemorySupport(const GCNSubtarget &ST) : ST(ST) { +TII = ST.getInstrInfo(); +IV = getIsaVersion(ST.getCPU()); + } + +public: + static std::unique_ptr create(const GCNSubtarget &ST); + + virtual bool handleNonAtomic(MachineBasicBlock::iterator &MI) = 0; + /// Handles atomic instruction \p MI with \p ret indicating whether \p MI + /// returns a result. + virtual bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) = 0; +}; + +class SIGfx9PreciseMemorySupport : public SIPreciseMemorySupport { +public: + SIGfx9PreciseMemorySupport(const GCNSubtarget &ST) + : SIPreciseMemorySupport(ST) {} + bool handleNonAtomic(MachineBasicBlock::iterator &MI) override; + bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override; +}; + +class SIGfx10And11PreciseMemorySupport : public SIPreciseMemorySupport { +public: + SIGfx10And11PreciseMemorySupport(const GCNSubtarget &ST) + : SIPreciseMemorySupport(ST) {} + bool handleNonAtomic(MachineBasicBlock::iterator &MI) override; + bool handleAtomic(MachineBasicBlock::iterator &MI, bool ret) override; +}; + +std::unique_ptr +SIPreciseMemorySupport::create(const GCNSubtarget &ST) { + GCNSubtarget::Generation Generation = ST.getGeneration(); + if (Generation < AMDGPUSubtarget::GFX10) +return std::make_unique(ST); + return std::make_unique(ST); +} + +bool SIGfx9PreciseMemorySupport ::handleNonAtomic( +MachineBasicBlock::iterator &MI) { + assert(MI->mayLoadOrStore()); + + MachineInstr &Inst = *MI; + AMDGPU::Waitcnt Wait; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else {// vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst)) {// VMEM load +Wait.LoadCnt = 0; // VmCnt + } else if (TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else {// LDS load +Wait.DsCnt = 0; // LgkmCnt + } +} else { // vector store + if (TII->isVMEM(Inst)) {// VMEM store +Wait.LoadCnt = 0; // VmCnt + } else if (TII->isFLAT(Inst)) { // Flat store +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else { +Wait.DsCnt = 0; // LDS store; LgkmCnt + } +} + } + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock &MBB = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx9PreciseMemorySupport ::handleAtomic(MachineBasicBlock::iterator &MI, + bool ret) { + assert(MI->mayLoadOrStore()); + + AMDGPU::Waitcnt Wait; + + Wait.LoadCnt = 0; // VmCnt + Wait.DsCnt = 0; // LgkmCnt + + unsigned Enc = AMDGPU::encodeWaitcnt(IV, Wait); + MachineBasicBlock &MBB = *MI->getParent(); + BuildMI(MBB, ++MI, DebugLoc(), TII->get(AMDGPU::S_WAITCNT)).addImm(Enc); + --MI; + return true; +} + +bool SIGfx10And11PreciseMemorySupport ::handleNonAtomic( +MachineBasicBlock::iterator &MI) { + assert(MI->mayLoadOrStore()); + + MachineInstr &Inst = *MI; + AMDGPU::Waitcnt Wait; + + bool BuildWaitCnt = true; + bool BuildVsCnt = false; + + if (TII->isSMRD(Inst)) { // scalar +if (Inst.mayStore()) + return false; +Wait.DsCnt = 0; // LgkmCnt + } else {// vector +if (Inst.mayLoad()) { // vector load + if (TII->isVMEM(Inst)) {// VMEM load +Wait.LoadCnt = 0; // VmCnt + } else if (TII->isFLAT(Inst)) { // Flat load +Wait.LoadCnt = 0; // VmCnt +Wait.DsCnt = 0; // LgkmCnt + } else {// LDS load +Wait.DsCnt = 0; // LgkmCnt + } +} + +// For some instructions, mayLoad() and mayStore() can be both true. +if (Inst.mayStore()) { // vector store; an instruction can be both + // load/store + if (TII->isVMEM(Inst)) { // VMEM store +if (!Inst.mayLoad()) + BuildWaitCnt = false; +BuildVsCnt = true; + } else if (TII->isFLAT(Inst)) { // Flat store +Wait.DsCnt = 0; // LgkmCnt +BuildVsCnt = true; + } else { +Wait.DsCnt = 0; // LDS store; LgkmCnt + } +} + } + + MachineBasicBlock &MBB = *MI->getParent(); + if (BuildWaitCnt) { +unsign
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: @arsenm do you have any concerns with this change? @t-tye is the documentation good? https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76955 >From 616dda8bc9e000e4243ddb8f6b7f4b04f956a620 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:48:05 +0100 Subject: [PATCH 1/2] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. No docs in this patch either as I plan to do it all in a follow-up patch. --- clang/lib/Basic/Targets/AMDGPU.cpp| 20 +- clang/test/Driver/amdgpu-macros.cl| 5 + clang/test/Driver/amdgpu-mcpu.cl | 10 + llvm/docs/AMDGPUUsage.rst | 325 +- llvm/include/llvm/BinaryFormat/ELF.h | 6 +- llvm/include/llvm/TargetParser/TargetParser.h | 10 + llvm/lib/Object/ELFObjectFile.cpp | 10 + llvm/lib/ObjectYAML/ELFYAML.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPU.td | 87 +++-- llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 6 + .../AMDGPURemoveIncompatibleFunctions.cpp | 6 +- llvm/lib/Target/AMDGPU/GCNProcessors.td | 22 ++ llvm/lib/Target/AMDGPU/GCNSubtarget.h | 4 + .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 11 + llvm/lib/TargetParser/TargetParser.cpp| 46 +++ .../GlobalISel/llvm.amdgcn.workitem.id.ll | 1 + .../CodeGen/AMDGPU/directive-amdgcn-target.ll | 14 + .../CodeGen/AMDGPU/elf-header-flags-mach.ll | 10 + llvm/test/CodeGen/AMDGPU/gds-allocation.ll| 1 + llvm/test/CodeGen/AMDGPU/gds-atomic.ll| 1 + .../AMDGPU/generic-targets-require-v6.ll | 18 + .../AMDGPU/hsa-generic-target-features.ll | 31 ++ .../llvm.amdgcn.image.gather4.d16.dim.ll | 3 + .../AMDGPU/llvm.amdgcn.image.sample.dim.ll| 3 + .../AMDGPU/unsupported-image-sample.ll| 12 +- .../Object/AMDGPU/elf-header-flags-mach.yaml | 29 ++ .../llvm-objdump/ELF/AMDGPU/subtarget.ll | 20 ++ .../llvm-readobj/ELF/AMDGPU/elf-headers.test | 12 + llvm/tools/llvm-readobj/ELFDumper.cpp | 128 +++ 30 files changed, 689 insertions(+), 192 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/generic-targets-require-v6.ll create mode 100644 llvm/test/CodeGen/AMDGPU/hsa-generic-target-features.ll diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp index 141501e8a4d9a1..799634ccec7ba5 100644 --- a/clang/lib/Basic/Targets/AMDGPU.cpp +++ b/clang/lib/Basic/Targets/AMDGPU.cpp @@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) - : getArchNameR600(GPUKind); + std::string CanonName = (isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) + : getArchNameR600(GPUKind)) + .str(); + + // Sanitize the name of generic targets. + // e.g. gfx10.1-generic -> gfx10_1_generic + if (GPUKind >= llvm::AMDGPU::GK_AMDGCN_GENERIC_FIRST && + GPUKind <= llvm::AMDGPU::GK_AMDGCN_GENERIC_LAST) { +std::replace(CanonName.begin(), CanonName.end(), '.', '_'); +std::replace(CanonName.begin(), CanonName.end(), '-', '_'); + } + Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__")); // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___ if (isAMDGCN(getTriple()) && !IsHIPHost) { -assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name"); -Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) + +assert(StringRef(CanonName).starts_with("gfx") && + "Invalid amdgcn canonical name"); +StringRef CanonFamilyName = getArchFamilyNameAMDGCN(GPUKind); +Builder.defineMacro(Twine("__") + Twine(CanonFamilyName.upper()) + Twine("__")); Builder.defineMacro("__amdgcn_processor__", Twine("\"") + Twine(CanonName) + Twine("\"")); diff --git a/clang/test/Driver/amdgpu-macros.cl b/clang/test/Driver/amdgpu-macros.cl index 81c22af460d12d..3b10444ef71d36 100644 --- a/clang/test/Driver/amdgpu-macros.cl +++ b/clang/test/Driver/amdgpu-macros.cl @@ -131,6 +131,11 @@ // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1200 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF %s -DWAVEFRONT_SIZE=32 -DCPU=gfx1200 -DFAMILY=GFX12 // RUN: %clang -E -dM -target amdgcn -mcpu=gfx1201 %s 2>&1 | FileCheck --check-prefixes=ARCH-GCN,FAST_FMAF
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory Pierre-vh wrote: I'm not a fan of using a feature for this, I think we should have a backend CL option instead. After all this isn't an architecture feature but just a behavior change for SIMemoryLegalizer. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -641,6 +644,9 @@ class SIMemoryLegalizer final : public MachineFunctionPass { bool expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, MachineBasicBlock::iterator &MI); + bool GFX9InsertWaitcntForPreciseMem(MachineFunction &MF); Pierre-vh wrote: Agreed, this should definitely be a virtual function such as `insertWaitcntForPreciseMem` and let the CacheControl implementation do what is needed. This is just emulating what `CacheControl` already does https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -0,0 +1,199 @@ +; Testing the -amdgpu-precise-memory-op option +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX90A +; COM: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX10 +; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global,+enable-flat-scratch,+amdgpu-precise-memory-op -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GFX9-FLATSCR %s + +; from atomicrmw-expand.ll +; covers flat_load, flat_atomic +define void @syncscope_workgroup_nortn(ptr %addr, float %val) { +; GFX90A-LABEL: syncscope_workgroup_nortn: +; GFX90A: ; %bb.0: +; GFX90A: flat_load_dword v5, v[0:1] +; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX90A: .LBB0_1: ; %atomicrmw.start +; GFX90A: flat_atomic_cmpswap v3, v[0:1], v[4:5] glc +; GFX90A-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) + %res = atomicrmw fadd ptr %addr, float %val syncscope("workgroup") seq_cst + ret void +} + +; from atomicrmw-nand.ll +; covers global_atomic, global_load +define i32 @atomic_nand_i32_global(ptr addrspace(1) %ptr) nounwind { +; GFX9-LABEL: atomic_nand_i32_global: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:global_load_dword v2, v[0:1], off +; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_mov_b64 s[4:5], 0 +; GFX9-NEXT: .LBB1_1: ; %atomicrmw.start +; GFX9-NEXT:; =>This Inner Loop Header: Depth=1 +; GFX9-NOT: s_waitcnt vmcnt(0) +; GFX9-NEXT:v_mov_b32_e32 v3, v2 +; GFX9-NEXT:v_not_b32_e32 v2, v3 +; GFX9-NEXT:v_or_b32_e32 v2, -5, v2 +; GFX9-NEXT:global_atomic_cmpswap v2, v[0:1], v[2:3], off glc +; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX9-NEXT:buffer_wbinvl1_vol +; GFX9-NEXT:v_cmp_eq_u32_e32 vcc, v2, v3 +; GFX9-NEXT:s_or_b64 s[4:5], vcc, s[4:5] +; GFX9-NEXT:s_andn2_b64 exec, exec, s[4:5] +; GFX9-NEXT:s_cbranch_execnz .LBB1_1 +; GFX9-NEXT: ; %bb.2: ; %atomicrmw.end +; GFX9-NEXT:s_or_b64 exec, exec, s[4:5] +; GFX9-NEXT:v_mov_b32_e32 v0, v2 +; GFX9-NEXT:s_setpc_b64 s[30:31] + %result = atomicrmw nand ptr addrspace(1) %ptr, i32 4 seq_cst + ret i32 %result +} + +; from bf16.ll +; covers buffer_load, buffer_store, flat_load, flat_store, global_load, global_store +define void @test_load_store(ptr addrspace(1) %in, ptr addrspace(1) %out) { +; +; GFX9-LABEL: test_load_store: +; GFX9: ; %bb.0: +; GFX9-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT:global_load_ushort v0, v[0:1], off +; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX9-NEXT:global_store_short v[2:3], v0, off +; GFX9-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX9-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: test_load_store: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:global_load_ushort v0, v[0:1], off +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:s_waitcnt_vscnt null, 0x0 +; GFX10-NEXT:global_store_short v[2:3], v0, off +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:s_waitcnt_vscnt null, 0x0 +; GFX10-NEXT:s_setpc_b64 s[30:31] + %val = load bfloat, ptr addrspace(1) %in + store bfloat %val, ptr addrspace(1) %out + ret void +} + +; from scratch-simple.ll +; covers scratch_load, scratch_store +; +; GFX9-FLATSCR-LABEL: {{^}}vs_main: +; GFX9-FLATSCR:scratch_store_dwordx4 off, v[{{[0-9:]+}}], +; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) +; GFX9-FLATSCR:scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off +; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) +define amdgpu_vs float @vs_main(i32 %idx) { + %v1 = extractelement <81 x float> , i32 %idx Pierre-vh wrote: what is that test for/why does it need such a big vector? if it's to force a spill to stack I think you can do that by playing with the number of available v/sgprs, IIRC there is an attribute or CL opt for that https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.getInstrInfo(); + IsaVersion IV = getIsaVersion(ST.getCPU()); + + bool Changed = false; + + for (auto &MBB : MF) { +for (auto MI = MBB.begin(); MI != MBB.end();) { + MachineInstr &Inst = *MI; + ++MI; + if (Inst.mayLoadOrStore() == false) +continue; + + // Todo: if next insn is an s_waitcnt + AMDGPU::Waitcnt Wait; + + if (!(Inst.getDesc().TSFlags & SIInstrFlags::maybeAtomic)) { +if (TII->isSMRD(Inst)) { // scalar Pierre-vh wrote: Can we have a shared helper, e.g. in `SIInstrInfo` for both? It's a lot of logic to duplicate > The counter values in SIInsertWaitcnt are precise, while in this features the > counters are simply set to 0. That could just be a boolean switch in a shared helper https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -0,0 +1,199 @@ +; Testing the -amdgpu-precise-memory-op option Pierre-vh wrote: Please generate the test using `update_llc_test_checks`, much easier to update if/when things change. Also I think you don't need `-verify-machineinstrs`. It's expensive and runs anyway when EXPENSIVE_CHECKS is on. https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
@@ -17,13 +17,16 @@ #include "AMDGPUMachineModuleInfo.h" #include "GCNSubtarget.h" #include "MCTargetDesc/AMDGPUMCTargetDesc.h" +#include "Utils/AMDGPUBaseInfo.h" #include "llvm/ADT/BitmaskEnum.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/IR/DiagnosticInfo.h" #include "llvm/Support/AtomicOrdering.h" #include "llvm/TargetParser/TargetParser.h" +#include Pierre-vh wrote: I suspect this is a debug leftover but: always use `dbgs` to print, it's easier and faster to build than `iostream` :) https://github.com/llvm/llvm-project/pull/79236 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [flang] [clang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
@@ -44,8 +44,15 @@ constexpr uint32_t VersionMajorV5 = 1; /// HSA metadata minor version for code object V5. constexpr uint32_t VersionMinorV5 = 2; +/// HSA metadata major version for code object V6. +constexpr uint32_t VersionMajorV6 = 1; +/// HSA metadata minor version for code object V6. +constexpr uint32_t VersionMinorV6 = 3; Pierre-vh wrote: Not yet, but I assume we'll want to bundle some changes to the MD with V6 so it's better to update the version now, no? https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [flang] [clang] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
@@ -620,6 +620,15 @@ void ScalarBitSetTraits::bitset(IO &IO, BCase(EF_AMDGPU_FEATURE_XNACK_V3); BCase(EF_AMDGPU_FEATURE_SRAMECC_V3); break; +case ELF::ELFABIVERSION_AMDGPU_HSA_V6: Pierre-vh wrote: elf-headers.test already covers it https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors also exist. They group multiple processors into one, +allowing to build code once and run it on multiple targets at the cost +of less features being available. + +Generic processors are only available on Code Object V6 and up. + + .. table:: AMDGPU Generic Processors + :name: amdgpu-generic-processor-table + + == = = + Processor TargetSupported Target + TripleProcessorsFeatures + ArchitectureRestrictions + + + + + + + + + == = = + ``gfx9-generic`` ``amdgcn`` - ``gfx900`` - ``v_mad_mix`` instructions + - ``gfx902``are not available on + - ``gfx904````gfx900``, ``gfx902``, + - ``gfx906````gfx909``, ``gfx90c`` + - ``gfx909`` - ``v_fma_mix`` instructions + - ``gfx90c``are not available on ``gfx904`` + - sramecc is not available on + ``gfx906`` + - The following instructions + are not available on ``gfx906``: + + - ``v_fmac_f32`` + - ``v_xnor_b32`` + - ``v_dot4_i32_i8`` + - ``v_dot8_i32_i4`` + - ``v_dot2_i32_i16`` + - ``v_dot2_u32_u16`` + - ``v_dot4_u32_u8`` + - ``v_dot8_u32_u4`` + - ``v_dot2_f32_f16`` + + + ``gfx10.1-generic`` ``amdgcn`` - ``gfx1010`` - The following instructions are + - ``gfx1011`` not available on ``gfx1011`` + - ``gfx1012`` and ``gfx1012`` + - ``gfx1013`` + - ``v_dot4_i32_i8`` + - ``v_dot8_i32_i4`` + - ``v_dot2_i32_i16`` + - ``v_dot2_u32_u16`` + - ``v_dot2c_f32_f16`` + - ``v_dot4c_i32_i8`` + - ``v_dot4_u32_u8`` + - ``v_dot8_u32_u4`` + - ``v_dot2_f32_f16`` + + - BVH Ray Tracing instructions + are not available on + ``gfx1013`` + + + ``gfx10.3-generic`` ``amdgcn`` - ``gfx1030`` No restrictions. + - ``gfx1031`` + - ``gfx1032`` + - ``gfx1033`` + - ``gfx1034`` + - ``gfx1035`` + - ``gfx1036`` Pierre-vh wrote: It's not a target in LLVM so no https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors also exist. They group multiple processors into one, +allowing to build code once and run it on multiple targets at the cost +of less features being available. + +Generic processors are only available on Code Object V6 and up. + + .. table:: AMDGPU Generic Processors + :name: amdgpu-generic-processor-table + + == = = + Processor TargetSupported Target + TripleProcessorsFeatures + ArchitectureRestrictions + + + + + + + + + == = = + ``gfx9-generic`` ``amdgcn`` - ``gfx900`` - ``v_mad_mix`` instructions + - ``gfx902``are not available on + - ``gfx904````gfx900``, ``gfx902``, + - ``gfx906````gfx909``, ``gfx90c`` + - ``gfx909`` - ``v_fma_mix`` instructions + - ``gfx90c``are not available on ``gfx904`` + - sramecc is not available on + ``gfx906`` + - The following instructions + are not available on ``gfx906``: + + - ``v_fmac_f32`` + - ``v_xnor_b32`` + - ``v_dot4_i32_i8`` + - ``v_dot8_i32_i4`` + - ``v_dot2_i32_i16`` + - ``v_dot2_u32_u16`` + - ``v_dot4_u32_u8`` + - ``v_dot8_u32_u4`` + - ``v_dot2_f32_f16`` + + + ``gfx10.1-generic`` ``amdgcn`` - ``gfx1010`` - The following instructions are + - ``gfx1011`` not available on ``gfx1011`` + - ``gfx1012`` and ``gfx1012`` + - ``gfx1013`` + - ``v_dot4_i32_i8`` Pierre-vh wrote: gfx1010 and gfx1012 are indeed not identical, but gfx1011 and gfx1012 are: ``` def FeatureISAVersion10_1_0 : FeatureSet< !listconcat(FeatureISAVersion10_1_Common.Features, [])>; def FeatureISAVersion10_1_2 : FeatureSet< !listconcat(FeatureISAVersion10_1_Common.Features, [FeatureDot1Insts, FeatureDot2Insts, FeatureDot5Insts, FeatureDot6Insts, FeatureDot7Insts, FeatureDot10Insts])>; ``` https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [llvm] [clang] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)
Pierre-vh wrote: ping https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [clang] [flang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: @arsenm Hi, can you take a look - especially on the testing? I don't know if this is tested well enough https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -840,6 +845,12 @@ enum : unsigned { EF_AMDGPU_FEATURE_SRAMECC_OFF_V4 = 0x800, // SRAMECC is on. EF_AMDGPU_FEATURE_SRAMECC_ON_V4 = 0xc00, + + // Generic target versioning. This is contained in the list byte of EFLAGS. Pierre-vh wrote: It's already part of #76954, I just haven't figured out how to stack PR yet so all changes of #76954 are here too :/ https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [llvm] [lld] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From d56e752e3eed0fd75a7ff98638ec71635019fdb1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 5 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 27 +++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 13 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 5 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 2 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 491 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index e4fdad8265c8637..a6b96ea027056e3 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4763,9 +4763,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f4246c5e8f68e8b..16dbb4bd835df53 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/Code
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -787,11 +788,15 @@ enum : unsigned { EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c, EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d, EF_AMDGPU_MACH_AMDGCN_GFX1201 = 0x04e, + EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC = 0x04f, + EF_AMDGPU_MACH_AMDGCN_GFX10_1_GENERIC = 0x050, Pierre-vh wrote: 172dbdf9312a15b449954e43623afc28240f50dd https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [lld] [clang] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -787,11 +788,15 @@ enum : unsigned { EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c, EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d, EF_AMDGPU_MACH_AMDGCN_GFX1201 = 0x04e, + EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC = 0x04f, Pierre-vh wrote: 172dbdf9312a15b449954e43623afc28240f50dd https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From 47d4f3ed4e27f2ce2b3b33c9b0ca4838b3011f22 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 5 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 27 +++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 13 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 5 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 2 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 491 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index e4fdad8265c863..a6b96ea027056e 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4763,9 +4763,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index f4246c5e8f68e8..16dbb4bd835df5 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/
[clang] [flang] [lld] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -787,11 +788,15 @@ enum : unsigned { EF_AMDGPU_MACH_AMDGCN_GFX942= 0x04c, EF_AMDGPU_MACH_AMDGCN_RESERVED_0X4D = 0x04d, EF_AMDGPU_MACH_AMDGCN_GFX1201 = 0x04e, + EF_AMDGPU_MACH_AMDGCN_GFX9_GENERIC = 0x04f, + EF_AMDGPU_MACH_AMDGCN_GFX10_1_GENERIC = 0x050, Pierre-vh wrote: Just noticed I forgot to update the AMDGPUUsage + the EF_AMDGPU_MACH_AMDGCN_LAST enum when adding the reserved entries. I'll do that here. https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -253,6 +274,12 @@ AMDGPU::IsaVersion AMDGPU::getIsaVersion(StringRef GPU) { case GK_GFX1151: return {11, 5, 1}; case GK_GFX1200: return {12, 0, 0}; case GK_GFX1201: return {12, 0, 1}; + + // Generic targets use the earliest ISA version in their group. Pierre-vh wrote: I think it's alright as is, but this API is bad and should probably be refactored IMO. Most users of the API are just interested in checking the version major, sometimes minor (10.1 vs 10.3). In theory, this API should _never_ be used to check for presence of a feature, that's always done through the feature list check, so it shouldn't really be abusable. I added a comment though to revisit this and make the intent clearer. https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [lld] [llvm] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -520,6 +520,106 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors also exist. They group multiple processors into one, +allowing to build code once and run it on multiple targets at the cost +of less features being available. + +Generic processors are only available on Code Object V6 and up. + + .. table:: AMDGPU Generic Processors + :name: amdgpu-generic-processor-table + + == = = + Processor TargetSupported Target + TripleProcessorsFeatures + ArchitectureRestrictions + + + + + + + + + == = = + ``gfx9-generic`` ``amdgcn`` - ``gfx900`` - ``v_mad_mix`` instructions + - ``gfx902``are not available on + - ``gfx904````gfx900``, ``gfx902``, + - ``gfx906````gfx909``, ``gfx90c`` + - ``gfx909`` - ``v_fma_mix`` instructions + - ``gfx90c``are not available on ``gfx904`` + - sramecc is not available on Pierre-vh wrote: No, for unsupported: `EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4` https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [llvm] [lld] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
@@ -4135,6 +4283,33 @@ Code object V5 metadata is the same as == == = +.. _amdgpu-amdhsa-code-object-metadata-v6: + +Code Object V6 Metadata + +.. warning:: + Code object V6 is not the default code object version emitted by this version + of LLVM. + + +Code object V6 metadata is the same as +:ref:`amdgpu-amdhsa-code-object-metadata-v5` with the changes defined in table +:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v6`. + + .. table:: AMDHSA Code Object V6 Metadata Map Changes + :name: amdgpu-amdhsa-code-object-metadata-map-table-v6 + + = == = === + String KeyValue Type Required? Description + = == = === + "amdhsa.version" sequence ofRequired - The first integer is the major Pierre-vh wrote: I anticipate that we'll want to add some more V6-only metadata at some point, that's why I just started a new table so it's easier to follow up. I don't mind merging it with the V5 table if you really prefer https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [lld] [flang] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)
Pierre-vh wrote: I added a few more tests, I just didn't find how to test the flat-scratch stuff properly. Also, gfx904 is documented as not having absolute flat scratch, yet I don't see anything about that in the code (no related feature). I put gfx9-generic with flat scratch but I don't know if that's correct at all, and how to test it? https://github.com/llvm/llvm-project/pull/76955 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[lld] [llvm] [clang] [flang] [AMDGPU] Introduce Code Object V6 (PR #76954)
Pierre-vh wrote: ping https://github.com/llvm/llvm-project/pull/76954 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [lld] [llvm] [AMDGPU] Introduce Code Object V6 (PR #76954)
https://github.com/Pierre-vh updated https://github.com/llvm/llvm-project/pull/76954 >From b5a034bd71d6925ac287a9bf4c0f86f9e70bb9d1 Mon Sep 17 00:00:00 2001 From: pvanhout Date: Thu, 4 Jan 2024 14:12:00 +0100 Subject: [PATCH] [AMDGPU] Introduce Code Object V6 Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs change are not included, I'm planning to do them in a follow-up patch all at once (when generic targets land too). --- clang/include/clang/Driver/Options.td | 4 +- clang/lib/CodeGen/CGBuiltin.cpp | 6 +- clang/lib/Driver/ToolChains/CommonArgs.cpp| 2 +- .../amdgpu-code-object-version-linking.cu | 37 +++ .../CodeGenCUDA/amdgpu-code-object-version.cu | 4 + .../test/CodeGenCUDA/amdgpu-workgroup-size.cu | 4 + .../amdgcn/bitcode/oclc_abi_version_600.bc| 0 clang/test/Driver/hip-code-object-version.hip | 12 + clang/test/Driver/hip-device-libs.hip | 18 +- flang/lib/Frontend/CompilerInvocation.cpp | 2 + flang/test/Lower/AMD/code-object-version.f90 | 3 +- lld/ELF/Arch/AMDGPU.cpp | 21 ++ lld/test/ELF/amdgpu-tid.s | 16 ++ llvm/include/llvm/BinaryFormat/ELF.h | 9 +- llvm/include/llvm/Support/AMDGPUMetadata.h| 7 + llvm/include/llvm/Support/ScopedPrinter.h | 4 +- llvm/include/llvm/Target/TargetOptions.h | 1 + llvm/lib/ObjectYAML/ELFYAML.cpp | 9 + llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 3 + .../AMDGPU/AMDGPUHSAMetadataStreamer.cpp | 10 + .../Target/AMDGPU/AMDGPUHSAMetadataStreamer.h | 11 +- .../MCTargetDesc/AMDGPUTargetStreamer.cpp | 26 ++ .../MCTargetDesc/AMDGPUTargetStreamer.h | 1 + .../Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp| 6 + llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h | 2 +- ...licit-kernarg-backend-usage-global-isel.ll | 2 + .../AMDGPU/call-graph-register-usage.ll | 1 + .../AMDGPU/codegen-internal-only-func.ll | 3 + llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll | 4 + .../enable-scratch-only-dynamic-stack.ll | 1 + .../AMDGPU/implicit-kernarg-backend-usage.ll | 2 + .../AMDGPU/implicitarg-offset-attributes.ll | 46 .../AMDGPU/llvm.amdgcn.implicitarg.ptr.ll | 1 + llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll | 1 + llvm/test/CodeGen/AMDGPU/recursion.ll | 1 + .../AMDGPU/resource-usage-dead-function.ll| 1 + .../AMDGPU/tid-mul-func-xnack-all-any.ll | 6 + .../tid-mul-func-xnack-all-not-supported.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-off.ll | 6 + .../AMDGPU/tid-mul-func-xnack-all-on.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-off-1.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-off-2.ll| 6 + .../AMDGPU/tid-mul-func-xnack-any-on-1.ll | 6 + .../AMDGPU/tid-mul-func-xnack-any-on-2.ll | 6 + .../tid-one-func-xnack-not-supported.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-off.ll | 6 + .../CodeGen/AMDGPU/tid-one-func-xnack-on.ll | 6 + .../MC/AMDGPU/hsa-v5-uses-dynamic-stack.s | 5 + .../elf-headers.test} | 0 .../ELF/AMDGPU/generic_versions.s | 16 ++ .../ELF/AMDGPU/generic_versions.test | 26 ++ llvm/tools/llvm-readobj/ELFDumper.cpp | 224 -- 52 files changed, 483 insertions(+), 135 deletions(-) create mode 100644 clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_abi_version_600.bc rename llvm/test/tools/llvm-readobj/ELF/{amdgpu-elf-headers.test => AMDGPU/elf-headers.test} (100%) create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.s create mode 100644 llvm/test/tools/llvm-readobj/ELF/AMDGPU/generic_versions.test diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 819f6f1a15c3f3..0d66b68e5d8c47 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -4783,9 +4783,9 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee", def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, Visibility<[ClangOption, FlangOption, CC1Option, FC1Option]>, - Values<"none,4,5">, + Values<"none,4,5,6">, NormalizedValuesScope<"llvm::CodeObjectVersionKind">, - NormalizedValues<["COV_None", "COV_4", "COV_5"]>, + NormalizedValues<["COV_None", "COV_4", "COV_5", "COV_6"]>, MarshallingInfoEnum, "COV_4">; defm cumode : SimpleMFlag<"cumode", diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 7ef764b8e1ac80..fdc7025e50fed6 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/C
[clang] [AMDGPU] Remove Code Object V3 (PR #67118)
Pierre-vh wrote: I don't mind reverting, but do you have a timeline for removal of that device? v3 has been deprecated for a while, AFAIK. https://github.com/llvm/llvm-project/pull/67118 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] d6acd01 - [Sema] Fix crash when evaluating nested call with value-dependent arg
Author: Pierre van Houtryve Date: 2023-01-06T02:57:50-05:00 New Revision: d6acd0196b3378bdeb5193053e290d7194c4f72d URL: https://github.com/llvm/llvm-project/commit/d6acd0196b3378bdeb5193053e290d7194c4f72d DIFF: https://github.com/llvm/llvm-project/commit/d6acd0196b3378bdeb5193053e290d7194c4f72d.diff LOG: [Sema] Fix crash when evaluating nested call with value-dependent arg Fix an edge case `ExprConstant.cpp`'s `EvaluateWithSubstitution` when called by `CheckEnableIf` The assertion in `CallStackFrame::getTemporary` could fail during evaluation of nested calls to a function using `enable_if` when the second argument was a value-dependent expression. This caused a temporary to be created for the second argument with a given version during the evaluation of the inner call, but we bailed out when evaluating the second argument of the outer call due to the expression being value-dependent. After bailing out, we tried to clean up the argument's value slot but it caused an assertion to trigger in `getTemporary` as a temporary for the second argument existed, but only for the inner call and not the outer call. See the test case for a more complete description of the issue. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D139713 Added: clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp Modified: clang/lib/AST/ExprConstant.cpp Removed: diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp index 78cfecbec9fd3..a43845e53c5d0 100644 --- a/clang/lib/AST/ExprConstant.cpp +++ b/clang/lib/AST/ExprConstant.cpp @@ -594,11 +594,6 @@ namespace { auto LB = Temporaries.lower_bound(KV); if (LB != Temporaries.end() && LB->first == KV) return &LB->second; - // Pair (Key,Version) wasn't found in the map. Check that no elements - // in the map have 'Key' as their key. - assert((LB == Temporaries.end() || LB->first.first != Key) && - (LB == Temporaries.begin() || std::prev(LB)->first.first != Key) && - "Element with key 'Key' found in map"); return nullptr; } diff --git a/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp b/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp new file mode 100644 index 0..998f2ccf92534 --- /dev/null +++ b/clang/test/SemaCXX/enable_if-nested-call-with-valuedependent-param.cpp @@ -0,0 +1,44 @@ +// RUN: %clang_cc1 -fsyntax-only %s -std=c++14 + +// Checks that Clang doesn't crash/assert on the nested call to "kaboom" +// in "bar()". +// +// This is an interesting test case for `ExprConstant.cpp`'s `CallStackFrame` +// because it triggers the following chain of events: +// 0. `CheckEnableIf` calls `EvaluateWithSubstitution`. +// 1. The outer call to "kaboom" gets evaluated. +// 2. The expr for "a" gets evaluated, it has a version X; +// a temporary with the key (a, X) is created. +// 3. The inner call to "kaboom" gets evaluated. +// 4. The expr for "a" gets evaluated, it has a version Y; +// a temporary with the key (a, Y) is created. +// 5. The expr for "b" gets evaluated, it has a version Y; +// a temporary with the key (b, Y) is created. +// 6. `EvaluateWithSubstitution` looks at "b" but cannot evaluate it +// because it's value-dependent (due to the call to "f.foo()"). +// +// When `EvaluateWithSubstitution` bails out while evaluating the outer +// call, it attempts to fetch "b"'s param slot to clean it up. +// +// This used to cause an assertion failure in `getTemporary` because +// a temporary with the key "(b, Y)" (created at step 4) existed but +// not one for "(b, X)", which is what it was trying to fetch. + +template +__attribute__((enable_if(true, ""))) +T kaboom(T a, T b) { + return b; +} + +struct A { + double foo(); +}; + +template +struct B { + A &f; + + void bar() { +kaboom(kaboom(0.0, 1.0), f.foo()); + } +}; ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] c05f163 - [clang][cuda/hip] Allow `__noinline__` lambdas
Author: Pierre van Houtryve Date: 2022-11-04T07:33:31Z New Revision: c05f1639f7f4a8e81ad83bba99bae95553c6064e URL: https://github.com/llvm/llvm-project/commit/c05f1639f7f4a8e81ad83bba99bae95553c6064e DIFF: https://github.com/llvm/llvm-project/commit/c05f1639f7f4a8e81ad83bba99bae95553c6064e.diff LOG: [clang][cuda/hip] Allow `__noinline__` lambdas D124866 seem to have had an unintended side effect: __noinline__ on lambdas was no longer accepted. This fixes the regression and adds a test case for it. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D137251 Added: clang/test/CodeGenCUDA/lambda-noinline.cu Modified: clang/docs/ReleaseNotes.rst clang/lib/Parse/ParseExprCXX.cpp clang/test/Parser/lambda-attr.cu Removed: diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index ad1a00b4bbcc..7bb1405c131a 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -638,6 +638,9 @@ C++2b Feature Support CUDA/HIP Language Changes in Clang -- + - Allow the use of ``__noinline__`` as a keyword (instead of ``__attribute__((noinline))``) + in lambda declarations. + Objective-C Language Changes in Clang - diff --git a/clang/lib/Parse/ParseExprCXX.cpp b/clang/lib/Parse/ParseExprCXX.cpp index e34bd8d7bca4..a768c4da504a 100644 --- a/clang/lib/Parse/ParseExprCXX.cpp +++ b/clang/lib/Parse/ParseExprCXX.cpp @@ -1291,7 +1291,22 @@ ExprResult Parser::ParseLambdaExpressionAfterIntroducer( if (getLangOpts().CUDA) { // In CUDA code, GNU attributes are allowed to appear immediately after the // "[...]", even if there is no "(...)" before the lambda body. -MaybeParseGNUAttributes(D); +// +// Note that we support __noinline__ as a keyword in this mode and thus +// it has to be separately handled. +while (true) { + if (Tok.is(tok::kw___noinline__)) { +IdentifierInfo *AttrName = Tok.getIdentifierInfo(); +SourceLocation AttrNameLoc = ConsumeToken(); +Attr.addNew(AttrName, AttrNameLoc, nullptr, AttrNameLoc, nullptr, 0, +ParsedAttr::AS_Keyword); + } else if (Tok.is(tok::kw___attribute)) +ParseGNUAttributes(Attr, nullptr, &D); + else +break; +} + +D.takeAttributes(Attr); } // Helper to emit a warning if we see a CUDA host/device/global attribute diff --git a/clang/test/CodeGenCUDA/lambda-noinline.cu b/clang/test/CodeGenCUDA/lambda-noinline.cu new file mode 100644 index ..de2196e63f07 --- /dev/null +++ b/clang/test/CodeGenCUDA/lambda-noinline.cu @@ -0,0 +1,23 @@ +// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \ +// RUN: -triple x86_64-linux-gnu \ +// RUN: | FileCheck -check-prefix=HOST %s +// RUN: %clang_cc1 -no-opaque-pointers -x hip -emit-llvm -std=c++11 %s -o - \ +// RUN: -triple amdgcn-amd-amdhsa -fcuda-is-device \ +// RUN: | FileCheck -check-prefix=DEV %s + +#include "Inputs/cuda.h" + +// Checks noinline is correctly added to the lambda function. + +// HOST: define{{.*}}@_ZZ4HostvENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]] +// HOST: attributes #[[ATTR]]{{.*}}noinline + +// DEV: define{{.*}}@_ZZ6DevicevENKUlvE_clEv({{.*}}) #[[ATTR:[0-9]+]] +// DEV: attributes #[[ATTR]]{{.*}}noinline + +__device__ int a; +int b; + +__device__ int Device() { return ([&] __device__ __noinline__ (){ return a; })(); } + +__host__ int Host() { return ([&] __host__ __noinline__ (){ return b; })(); } diff --git a/clang/test/Parser/lambda-attr.cu b/clang/test/Parser/lambda-attr.cu index 886212b97f50..7fa128effd51 100644 --- a/clang/test/Parser/lambda-attr.cu +++ b/clang/test/Parser/lambda-attr.cu @@ -18,6 +18,10 @@ __attribute__((device)) void device_attr() { ([&](int) __attribute__((device)){ device_fn(); })(0); // expected-warning@-1 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}} ([&] __attribute__((device)) (int) { device_fn(); })(0); + + // test that noinline can appear anywhere. + ([&] __attribute__((device)) __noinline__ () { device_fn(); })(); + ([&] __noinline__ __attribute__((device)) () { device_fn(); })(); } __attribute__((host)) __attribute__((device)) void host_device_attrs() { @@ -37,6 +41,11 @@ __attribute__((host)) __attribute__((device)) void host_device_attrs() { // expected-warning@-1 {{nvcc does not allow '__host__' to appear after the parameter list in lambdas}} // expected-warning@-2 {{nvcc does not allow '__device__' to appear after the parameter list in lambdas}} ([&] __attribute__((host)) __attribute__((device)) (int) { hd_fn(); })(0); + + // test that noinline can also appear anywhere. + ([] __attribute__((host)) __attribute__((device)) () { hd_fn(); })(); + ([] __attribute__((host)) __noinline__ __attribute__((device)) () { hd_fn(); })();
[clang] [AMDGPU] Remove Code Object V3 (PR #67118)
https://github.com/Pierre-vh closed https://github.com/llvm/llvm-project/pull/67118 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] Remove Code Object V2 (PR #65715)
https://github.com/Pierre-vh review_requested https://github.com/llvm/llvm-project/pull/65715 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits