[PATCH] D50984: AMDGPU: Move target code into TargetParser

2018-08-21 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D50984



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57349: AMDGPU: Add ds append/consume builtins

2019-01-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57349/new/

https://reviews.llvm.org/D57349





[PATCH] D58847: AMDGPU: Fix the mapping of sub group sync scope

2019-03-01 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58847/new/

https://reviews.llvm.org/D58847





[PATCH] D59494: AMDGPU: Add support for cross address space synchronization scopes

2019-03-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7973
+  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
+    if (Scope != SyncScope::OpenCLAllSVMDevices)
+      Name = Twine(Twine(Name) + Twine("-")).str();

if (!Name.empty())



Comment at: lib/CodeGen/TargetInfo.cpp:7976
+
+    Name = Twine(Twine(Name) + Twine("one-as")).str();
+  }

I think subgroup is in the single address space even if sequentially consistent.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59494/new/

https://reviews.llvm.org/D59494





[PATCH] D59494: AMDGPU: Add support for cross address space synchronization scopes (clang)

2019-03-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7976
+
+    Name = Twine(Twine(Name) + Twine("one-as")).str();
+  }

b-sumner wrote:
> kzhuravl wrote:
> > rampitec wrote:
> > > I think subgroup is in the single address space even if sequentially 
> > > consistent.
> > I have synced with @t-tye, and it seems like it might not be. @b-sumner, do 
> > you know what opencl spec states? Thanks.
> As I understand the spec, memory order seq_cst must be consistent with both 
> local- and global-happens-before, so I would say even subgroup is not in the 
> single address space for OpenCL seq_cst.
OK. Is it always one-as if not sequentially consistent? I thought we were about 
to change the sequentially consistent case, not everything else.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59494/new/

https://reviews.llvm.org/D59494





[PATCH] D59494: AMDGPU: Add support for cross address space synchronization scopes (clang)

2019-03-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59494/new/

https://reviews.llvm.org/D59494





[PATCH] D61112: AMDGPU: Enable _Float16

2019-04-25 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61112/new/

https://reviews.llvm.org/D61112





[PATCH] D37386: [AMDGPU] Implement infrastructure to set options in AMDGPUToolChain

2017-09-01 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: lib/Driver/ToolChains/AMDGPU.h:44
+private:
+  const std::map OptionsDefault = {
+  {options::OPT_O, "3"},

Is it really necessary to create the map in the header?


https://reviews.llvm.org/D37386





[PATCH] D37386: [AMDGPU] Implement infrastructure to set options in AMDGPUToolChain

2017-09-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D37386





[PATCH] D63366: AMDGPU: Add GWS instruction builtins

2019-06-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63366/new/

https://reviews.llvm.org/D63366





[PATCH] D63578: AMDGPU: Add DS GWS sema builtins

2019-06-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63578/new/

https://reviews.llvm.org/D63578





[PATCH] D64828: AMDGPU: Add some missing builtins

2019-07-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64828/new/

https://reviews.llvm.org/D64828





[PATCH] D65454: AMDGPU: Add missing builtin declarations

2019-07-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65454/new/

https://reviews.llvm.org/D65454





[PATCH] D66198: AMDGPU: Add builtins for is_local/is_private

2019-08-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66198/new/

https://reviews.llvm.org/D66198





[PATCH] D66198: AMDGPU: Add builtins for is_local/is_private

2019-08-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Did you forget to update test/CodeGenOpenCL/amdgpu-features.cl?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66198/new/

https://reviews.llvm.org/D66198





[PATCH] D63649: AMDGPU: Fix target builtins for gfx10

2019-06-21 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63649/new/

https://reviews.llvm.org/D63649





[PATCH] D61875: [AMDGPU] gfx1010 clang target

2019-05-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC360634: [AMDGPU] gfx1010 clang target (authored by rampitec, 
committed by ).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61875/new/

https://reviews.llvm.org/D61875

Files:
  docs/ClangCommandLineReference.rst
  include/clang/Driver/Options.td
  lib/Basic/Targets/AMDGPU.cpp
  lib/Basic/Targets/AMDGPU.h
  lib/Driver/ToolChains/HIP.cpp
  test/CodeGenOpenCL/amdgpu-features.cl
  test/Driver/amdgpu-features.c
  test/Driver/amdgpu-macros.cl
  test/Driver/amdgpu-mcpu.cl

Index: lib/Driver/ToolChains/HIP.cpp
===
--- lib/Driver/ToolChains/HIP.cpp
+++ lib/Driver/ToolChains/HIP.cpp
@@ -307,8 +307,8 @@
   if (BCLibs.empty()) {
     // Get the bc lib file name for ISA version. For example,
     // gfx803 => oclc_isa_version_803.amdgcn.bc.
-    std::string ISAVerBC =
-        "oclc_isa_version_" + GpuArch.drop_front(3).str() + ".amdgcn.bc";
+    std::string GFXVersion = GpuArch.drop_front(3).str();
+    std::string ISAVerBC = "oclc_isa_version_" + GFXVersion + ".amdgcn.bc";
 
     llvm::StringRef FlushDenormalControlBC;
     if (DriverArgs.hasArg(options::OPT_fcuda_flush_denormals_to_zero))
Index: lib/Basic/Targets/AMDGPU.h
===
--- lib/Basic/Targets/AMDGPU.h
+++ lib/Basic/Targets/AMDGPU.h
@@ -41,7 +41,6 @@
   llvm::AMDGPU::GPUKind GPUKind;
   unsigned GPUFeatures;
 
-
   bool hasFP64() const {
 return getTriple().getArch() == llvm::Triple::amdgcn ||
!!(GPUFeatures & llvm::AMDGPU::FEATURE_FP64);
Index: lib/Basic/Targets/AMDGPU.cpp
===
--- lib/Basic/Targets/AMDGPU.cpp
+++ lib/Basic/Targets/AMDGPU.cpp
@@ -135,6 +135,14 @@
       CPU = "gfx600";
 
     switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) {
+    case GK_GFX1010:
+      Features["dl-insts"] = true;
+      Features["16-bit-insts"] = true;
+      Features["dpp"] = true;
+      Features["gfx9-insts"] = true;
+      Features["gfx10-insts"] = true;
+      Features["s-memrealtime"] = true;
+      break;
     case GK_GFX906:
       Features["dl-insts"] = true;
       Features["dot1-insts"] = true;
Index: docs/ClangCommandLineReference.rst
===
--- docs/ClangCommandLineReference.rst
+++ docs/ClangCommandLineReference.rst
@@ -2396,6 +2396,11 @@
 
 AMDGPU
 --
+.. option:: -mcumode, -mno-cumode
+
+CU wavefront execution mode is used if enabled and WGP wavefront execution mode
+is used if disabled (AMDGPU only)
+
 .. option:: -mxnack, -mno-xnack
 
 Enable XNACK (AMDGPU only)
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -2202,6 +2202,11 @@
 def mno_sram_ecc : Flag<["-"], "mno-sram-ecc">, Group,
   HelpText<"Disable SRAM ECC (AMDGPU only)">;
 
+def mcumode : Flag<["-"], "mcumode">, Group,
+  HelpText<"CU wavefront execution mode is used (AMDGPU only)">;
+def mno_cumode : Flag<["-"], "mno-cumode">, Group,
+  HelpText<"WGP wavefront execution mode is used (AMDGPU only)">;
+
 def faltivec : Flag<["-"], "faltivec">, Group, Flags<[DriverOption]>;
 def fno_altivec : Flag<["-"], "fno-altivec">, Group, Flags<[DriverOption]>;
 def maltivec : Flag<["-"], "maltivec">, Group;
Index: test/CodeGenOpenCL/amdgpu-features.cl
===
--- test/CodeGenOpenCL/amdgpu-features.cl
+++ test/CodeGenOpenCL/amdgpu-features.cl
@@ -5,6 +5,7 @@
 
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s
+// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1010 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s
@@ -12,6 +13,7 @@
 
 // GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime"
 // GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime"
+// GFX1010: "target-features"="+16-bit-insts,+dl-insts,+dpp,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx9-insts,+s-memrealtime"
 // GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+fp32-denormals,+fp64-f

[PATCH] D56525: [AMDGPU] Separate feature dot-insts

2019-01-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec created this revision.
rampitec added reviewers: b-sumner, kzhuravl, msearles.
Herald added subscribers: cfe-commits, t-tye, tpr, dstuttard, yaxunl, nhaehnle, 
wdng, jvesely.

clang part


Repository:
  rC Clang

https://reviews.llvm.org/D56525

Files:
  include/clang/Basic/BuiltinsAMDGPU.def
  lib/Basic/Targets/AMDGPU.cpp
  test/CodeGenOpenCL/amdgpu-features.cl
  test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl


Index: test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
===
--- test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -12,24 +12,24 @@
 half2 v2hA, half2 v2hB, float fC,
 short2 v2ssA, short2 v2ssB, int siA, int siB, int siC,
 ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) {
-  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dl-insts}}
-  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);  // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dl-insts}}
+  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot-insts}}
+  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);  // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot-insts}}
 
-  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dl-insts}}
-  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dl-insts}}
+  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot-insts}}
+  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot-insts}}
 
-  uiOut[0] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, false); // expected-error {{'__builtin_amdgcn_udot2' needs target feature dl-insts}}
-  uiOut[1] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot2' needs target feature dl-insts}}
+  uiOut[0] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, false); // expected-error {{'__builtin_amdgcn_udot2' needs target feature dot-insts}}
+  uiOut[1] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot2' needs target feature dot-insts}}
 
-  siOut[2] = __builtin_amdgcn_sdot4(siA, siB, siC, false); // expected-error {{'__builtin_amdgcn_sdot4' needs target feature dl-insts}}
-  siOut[3] = __builtin_amdgcn_sdot4(siA, siB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot4' needs target feature dl-insts}}
+  siOut[2] = __builtin_amdgcn_sdot4(siA, siB, siC, false); // expected-error {{'__builtin_amdgcn_sdot4' needs target feature dot-insts}}
+  siOut[3] = __builtin_amdgcn_sdot4(siA, siB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot4' needs target feature dot-insts}}
 
-  uiOut[2] = __builtin_amdgcn_udot4(uiA, uiB, uiC, false); // expected-error {{'__builtin_amdgcn_udot4' needs target feature dl-insts}}
-  uiOut[3] = __builtin_amdgcn_udot4(uiA, uiB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot4' needs target feature dl-insts}}
+  uiOut[2] = __builtin_amdgcn_udot4(uiA, uiB, uiC, false); // expected-error {{'__builtin_amdgcn_udot4' needs target feature dot-insts}}
+  uiOut[3] = __builtin_amdgcn_udot4(uiA, uiB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot4' needs target feature dot-insts}}
 
-  siOut[4] = __builtin_amdgcn_sdot8(siA, siB, siC, false); // expected-error {{'__builtin_amdgcn_sdot8' needs target feature dl-insts}}
-  siOut[5] = __builtin_amdgcn_sdot8(siA, siB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot8' needs target feature dl-insts}}
+  siOut[4] = __builtin_amdgcn_sdot8(siA, siB, siC, false); // expected-error {{'__builtin_amdgcn_sdot8' needs target feature dot-insts}}
+  siOut[5] = __builtin_amdgcn_sdot8(siA, siB, siC, true);  // expected-error {{'__builtin_amdgcn_sdot8' needs target feature dot-insts}}
 
-  uiOut[4] = __builtin_amdgcn_udot8(uiA, uiB, uiC, false); // expected-error {{'__builtin_amdgcn_udot8' needs target feature dl-insts}}
-  uiOut[5] = __builtin_amdgcn_udot8(uiA, uiB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot8' needs target feature dl-insts}}
+  uiOut[4] = __builtin_amdgcn_udot8(uiA, uiB, uiC, false); // expected-error {{'__builtin_amdgcn_udot8' needs target feature dot-insts}}
+  uiOut[5] = __builtin_amdgcn_udot8(uiA, uiB, uiC, true);  // expected-error {{'__builtin_amdgcn_udot8' needs target feature dot-insts}}
 }
Index: test/CodeGenOpenCL/amdgpu-features.cl
===
--- test/CodeGenOpenCL/amdgpu-

[PATCH] D56525: [AMDGPU] Separate feature dot-insts

2019-01-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL350794: [AMDGPU] Separate feature dot-insts (authored by 
rampitec, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D56525?vs=180969&id=180991#toc

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56525/new/

https://reviews.llvm.org/D56525

Files:
  cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def
  cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
  cfe/trunk/test/CodeGenOpenCL/amdgpu-features.cl
  cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl


Index: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
===
--- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
+++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
@@ -137,6 +137,7 @@
     switch (llvm::AMDGPU::parseArchAMDGCN(CPU)) {
     case GK_GFX906:
       Features["dl-insts"] = true;
+      Features["dot-insts"] = true;
       LLVM_FALLTHROUGH;
     case GK_GFX909:
     case GK_GFX904:
Index: cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def
===
--- cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def
+++ cfe/trunk/include/clang/Basic/BuiltinsAMDGPU.def
@@ -135,13 +135,13 @@
 // Deep learning builtins.
 
//===--===//
 
-TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dl-insts")
-TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dl-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_udot4, "UiUiUiUiIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_sdot8, "SiSiSiSiIb", "nc", "dot-insts")
+TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot-insts")
 
 
//===--===//
 // Special builtins.
Index: cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
===
--- cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ cfe/trunk/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -12,24 +12,24 @@
 half2 v2hA, half2 v2hB, float fC,
 short2 v2ssA, short2 v2ssB, int siA, int siB, int siC,
 ushort2 v2usA, ushort2 v2usB, uint uiA, uint uiB, uint uiC) {
-  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // 
expected-error {{'__builtin_amdgcn_fdot2' needs target feature dl-insts}}
-  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);  // 
expected-error {{'__builtin_amdgcn_fdot2' needs target feature dl-insts}}
+  fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false); // 
expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot-insts}}
+  fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);  // 
expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot-insts}}
 
-  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // 
expected-error {{'__builtin_amdgcn_sdot2' needs target feature dl-insts}}
-  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, true);  // 
expected-error {{'__builtin_amdgcn_sdot2' needs target feature dl-insts}}
+  siOut[0] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, false); // 
expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot-insts}}
+  siOut[1] = __builtin_amdgcn_sdot2(v2ssA, v2ssB, siC, true);  // 
expected-error {{'__builtin_amdgcn_sdot2' needs target feature dot-insts}}
 
-  uiOut[0] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, false); // 
expected-error {{'__builtin_amdgcn_udot2' needs target feature dl-insts}}
-  uiOut[1] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, true);  // 
expected-error {{'__builtin_amdgcn_udot2' needs target feature dl-insts}}
+  uiOut[0] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, false); // 
expected-error {{'__builtin_amdgcn_udot2' needs target feature dot-insts}}
+  uiOut[1] = __builtin_amdgcn_udot2(v2usA, v2usB, uiC, true);  // 
expected-error {{'__builtin_amdgcn_udot2' needs target feature dot-insts}}
 
-  siOut[2] = __builtin_amdgcn_sdot4(siA, siB, siC, false); // 
expected-error {{'__builtin_amdgcn_sdot4' needs target feature dl-insts}}
-  siOut[3] = __builtin_amdgcn_sdot4(siA, siB, siC, true);  // 
expected-error {{'__builtin_amdgcn_sdot4' needs target feature dl-insts}}
+  siOut[2] = 

[PATCH] D81959: [HIP] Enable -amdgpu-internalize-symbols

2020-06-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81959/new/

https://reviews.llvm.org/D81959





[PATCH] D81886: [AMDGPU] Add gfx1030 target

2020-06-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec marked an inline comment as done.
rampitec added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:266-267

   names.
+ ``gfx1030``  ``amdgcn``  dGPU  - xnack [off]          *TBA*
+                                - wavefrontsize64

foad wrote:
> Seems odd to list xnack as a "supported fetaure" when the hardware doesn't 
> support it.
Yeah, it is off, but we may want to review the whole table.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81886/new/

https://reviews.llvm.org/D81886





[PATCH] D85337: [AMDGPU] gfx1031 target

2020-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGea7d0e2996ec: [AMDGPU] gfx1031 target (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85337/new/

https://reviews.llvm.org/D85337

Files:
  clang/include/clang/Basic/Cuda.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/NVPTX.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/include/llvm/BinaryFormat/ELF.h
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/lib/Support/TargetParser.cpp
  llvm/lib/Target/AMDGPU/GCNProcessors.td
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmad.s32.mir
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.csub.ll
  llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll
  llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll
  llvm/test/CodeGen/AMDGPU/idot8s.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.csub.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot4.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot8.ll
  llvm/test/MC/AMDGPU/gfx1030_err.s
  llvm/test/MC/AMDGPU/gfx1030_new.s
  llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
  llvm/tools/llvm-readobj/ELFDumper.cpp

Index: llvm/tools/llvm-readobj/ELFDumper.cpp
===
--- llvm/tools/llvm-readobj/ELFDumper.cpp
+++ llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1841,6 +1841,7 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1011),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1012),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1030),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1031),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_XNACK),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_SRAM_ECC)
 };
Index: llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
+++ llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
@@ -1,4 +1,5 @@
 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1030 -disassemble -show-encoding < %s | FileCheck -check-prefix=GFX10 %s
+# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1031 -disassemble -show-encoding < %s | FileCheck -check-prefix=GFX10 %s
 
 # GFX10: global_load_dword_addtid v1, s[2:3] offset:16
 0x10,0x80,0x58,0xdc,0x00,0x00,0x02,0x01
Index: llvm/test/MC/AMDGPU/gfx1030_new.s
===
--- llvm/test/MC/AMDGPU/gfx1030_new.s
+++ llvm/test/MC/AMDGPU/gfx1030_new.s
@@ -1,4 +1,5 @@
 // RUN: llvm-mc -arch=amdgcn -mcpu=gfx1030 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1031 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
 
 global_load_dword_addtid v1, s[2:3] offset:16
 // GFX10: encoding: [0x10,0x80,0x58,0xdc,0x00,0x00,0x02,0x01]
Index: llvm/test/MC/AMDGPU/gfx1030_err.s
===
--- llvm/test/MC/AMDGPU/gfx1030_err.s
+++ llvm/test/MC/AMDGPU/gfx1030_err.s
@@ -1,4 +1,5 @@
 // RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1030 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX10 %s
+// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1031 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX10 %s
 
 v_dot8c_i32_i4 v5, v1, v2
 // GFX10: error:
Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot8.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot8.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot8.ll
@@ -3,6 +3,7 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
+; RUN: llc -march=amdgcn -mcpu=gfx1031 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 
 declare i32 @llvm.amdgcn.sdot8(i32 %a, i32 %b, i32 %c, i1 %clamp)
 
Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot4.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot4.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sdot4.ll
@@ -2,6 +2,7 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GCN,GFX10
+; RUN: llc -march=amdgcn -mcpu=gfx1031 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=

[PATCH] D85337: [AMDGPU] gfx1031 target

2020-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:1844
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1030),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1031),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_XNACK),

jhenderson wrote:
> Where's the dedicated llvm-readobj test for this? Please add one. If there 
> exists one for the other AMDGPU flags, extend that one, otherwise, it might 
> be best to write one for the whole set at once.
It is in llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll above, along with 
all other targets.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85337/new/

https://reviews.llvm.org/D85337



[PATCH] D85337: [AMDGPU] gfx1031 target

2020-08-10 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec marked 2 inline comments as done.
rampitec added inline comments.



Comment at: llvm/tools/llvm-readobj/ELFDumper.cpp:1844
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1030),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1031),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_XNACK),

jhenderson wrote:
> rampitec wrote:
> > jhenderson wrote:
> > > Where's the dedicated llvm-readobj test for this? Please add one. If 
> > > there exists one for the other AMDGPU flags, extend that one, otherwise, 
> > > it might be best to write one for the whole set at once.
> > It is in the lvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll above along 
> > with all other targets.
> I emphasise "dedicated". llvm/test/CodeGen... is not the place for tests for 
> llvm-readobj code. Such tests belong in llvm/test/tools/llvm-readobj.
> 
> To me, the CodeGen test looks like a test of the llc behaviour of generating 
> the right value. It uses llvm-readobj, and therefore only incidentally tests 
> the dumping behaviour. Imagine a situation where we decided to add support 
> for this to llvm-objdump, and then somebody decides to switch the test over 
> to using llvm-objdump to match (maybe it "looks better" that way). That would 
> result in this code in llvm-readobj being untested.
> 
> My opinion, and one I think is shared with others who maintain llvm-readobj 
> (amongst other things) is that you need direct testing for llvm-readobj 
> behaviour within llvm-readobj tests to avoid such situations. This would 
> therefore mean that this change needs two tests:
> 
> 1) A test in llvm/test/tools/llvm-readobj/ELF which uses a process (e.g. 
> yaml2obj) to create the ELF header with the required flag, to ensure the 
> dumping behaviour is correct.
> 2) A test in llvm/test/CodeGen which tests that llc produces the right output 
> value in the ELF header flags field (which of course would use llvm-readobj 
> currently, but could use anything to verify).
Added in D85683.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85337/new/

https://reviews.llvm.org/D85337



[PATCH] D79744: clang: Add address space to indirect abi info and use it for kernels

2020-05-11 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Typo in commit message: "Previously, indirect arguments assumed assumed".


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79744/new/

https://reviews.llvm.org/D79744





[PATCH] D128952: [AMDGPU] Add WMMA clang builtins

2022-06-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128952/new/

https://reviews.llvm.org/D128952



[PATCH] D122044: [AMDGPU] New gfx940 mfma instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG27439a764230: [AMDGPU] New gfx940 mfma instructions 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122044/new/

https://reviews.llvm.org/D122044

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/SISchedule.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -3,6 +3,24 @@
 # GFX940: v_accvgpr_write_b32 a10, s20 ; encoding: [0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18]
 0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18
 
+# GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4
+
 # GFX940: v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[2:33] ; encoding: [0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04
 
@@ -32,3 +50,15 @@
 
 # GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[2:17] ; encoding: [0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -262,6 +262,54 @@
 v_mfma_f32_32x32x1f32 v[0:31], v0, v1, v[34:65] blgp:7
 // GFX940: v_mfma_f32_32x32x1_2b_f32 v[0:31], v0, v1, v[34:65] blgp:7 ; encoding: [0x00,0x00,0xc0,0xd3,0x00,0x03,0x8a,0xe4]
 
+v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] b

[PATCH] D122191: [AMDGPU] Support gfx940 smfmac instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG6e3e14f600af: [AMDGPU] Support gfx940 smfmac instructions 
(authored by rampitec).
Herald added subscribers: cfe-commits, hsmhsm.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122191/new/

https://reviews.llvm.org/D122191

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
  llvm/lib/Target/AMDGPU/SISchedule.td
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/lib/Target/AMDGPU/VOPInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -62,3 +62,45 @@
 
 # GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_smfmac_f32_16x16x32_f16 v[10:13], a[2:3], v[4:7], v0 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe2,0xd3,0x02,0x09,0x02,0x0c]
+0x0a,0x0b,0xe2,0xd3,0x02,0x09,0x02,0x0c
+
+# GFX940: v_smfmac_f32_16x16x32_f16 a[10:13], v[2:3], a[4:7], v1 ; encoding: [0x0a,0x80,0xe2,0xd3,0x02,0x09,0x06,0x14]
+0x0a,0x80,0xe2,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_f16 v[10:25], a[2:3], v[4:7], v2 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe4,0xd3,0x02,0x09,0x0a,0x0c]
+0x0a,0x0b,0xe4,0xd3,0x02,0x09,0x0a,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_f16 a[10:25], v[2:3], a[4:7], v3 ; encoding: [0x0a,0x80,0xe4,0xd3,0x02,0x09,0x0e,0x14]
+0x0a,0x80,0xe4,0xd3,0x02,0x09,0x0e,0x14
+
+# GFX940: v_smfmac_f32_16x16x32_bf16 v[10:13], a[2:3], v[4:7], v4 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe6,0xd3,0x02,0x09,0x12,0x0c]
+0x0a,0x0b,0xe6,0xd3,0x02,0x09,0x12,0x0c
+
+# GFX940: v_smfmac_f32_16x16x32_bf16 a[10:13], v[2:3], a[4:7], v5 ; encoding: [0x0a,0x80,0xe6,0xd3,0x02,0x09,0x16,0x14]
+0x0a,0x80,0xe6,0xd3,0x02,0x09,0x16,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 v[10:25], a[2:3], v[4:7], v6 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x1a,0x0c]
+0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x1a,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 a[10:25], v[2:3], a[4:7], v7 ; encoding: [0x0a,0x80,0xe8,0xd3,0x02,0x09,0x1e,0x14]
+0x0a,0x80,0xe8,0xd3,0x02,0x09,0x1e,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 v[10:25], a[2:3], v[4:7], v8 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x22,0x0c]
+0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x22,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 a[10:25], v[2:3], a[4:7], v9 ; encoding: [0x0a,0x80,0xe8,0xd3,0x02,0x09,0x26,0x14]
+0x0a,0x80,0xe8,0xd3,0x02,0x09,0x26,0x14
+
+# GFX940: v_smfmac_i32_16x16x64_i8 v[10:13], a[2:3], v[4:7], v10 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xea,0xd3,0x02,0x09,0x2a,0x0c]
+0x0a,0x0b,0xea,0xd3,0x02,0x09,0x2a,0x0c
+
+# GFX940: v_smfmac_i32_16x16x64_i8 a[10:13], v[2:3], a[4:7], v11 ; encoding: [0x0a,0x80,0xea,0xd3,0x02,0x09,0x2e,0x14]
+0x0a,0x80,0xea,0xd3,0x02,0x09,0x2e,0x14
+
+# GFX940: v_smfmac_i32_32x32x32_i8 v[10:25], a[2:3], v[4:7], v12 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xec,0xd3,0x02,0x09,0x32,0x0c]
+0x0a,0x0b,0xec,0xd3,0x02,0x09,0x32,0x0c
+
+# GFX940: v_smfmac_i32_32x32x32_i8 a[10:25], v[2:3], a[4:7], v13 ; encoding: [0x0a,0x80,0xec,0xd3,0x02,0x09,0x36,0x14]
+0x0a,0x80,0xec,0xd3,0x02,0x09,0x36,0x14
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -459,3 +459,51 @@
 v_mfma_f32_32x32x4xf32 a[0:15], v[2:3], v[4:5], a[18:33]
 // GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[18:33] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x4a,0x04]
 // GFX90A: error: instruction not supported on this GPU
+
+v_smfmac_f32_16x16x32_f16 v[10:13], a[2:3], v[4:7], v0 cbsz:3 abid:1
+// GFX940: v_smfmac_f32

[PATCH] D123825: clang/AMDGPU: Define macro for -munsafe-fp-atomics

2022-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123825/new/

https://reviews.llvm.org/D123825



[PATCH] D119886: [AMDGPU] Promote recursive loads from kernel argument to constant

2022-02-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGb0aa1946dfe1: [AMDGPU] Promote recursive loads from kernel 
argument to constant (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119886/new/

https://reviews.llvm.org/D119886

Files:
  clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
  llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp
  llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll

Index: llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
===
--- llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
+++ llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
@@ -11,11 +11,15 @@
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float**, float** addrspace(1)* [[ARG:%.*]], i32 [[I]]
-; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(1)* [[P1]], align 8
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
-; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(1)* [[P2_GLOBAL]], align 8
-; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
-; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[P3_GLOBAL]], align 4
+; CHECK-NEXT:[[P1_CONST:%.*]] = addrspacecast float** addrspace(1)* [[P1]] to float** addrspace(4)*
+; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(4)* [[P1_CONST]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
+; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
+; CHECK-NEXT:[[P2_FLAT:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
+; CHECK-NEXT:[[P2_CONST:%.*]] = addrspacecast float** [[TMP1]] to float* addrspace(4)*
+; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(4)* [[P2_CONST]], align 8
+; CHECK-NEXT:[[TMP2:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
+; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[TMP2]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -37,9 +41,11 @@
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float*, float* addrspace(1)* [[ARG_GLOBAL]], i32 [[I]]
 ; CHECK-NEXT:[[P1_CAST:%.*]] = bitcast float* addrspace(1)* [[P1]] to i32* addrspace(1)*
-; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(1)* [[P1_CAST]], align 8
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
-; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[P2_GLOBAL]], align 4
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast i32* addrspace(1)* [[P1_CAST]] to i32**
+; CHECK-NEXT:[[P1_CAST_CONST:%.*]] = addrspacecast i32** [[TMP0]] to i32* addrspace(4)*
+; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(4)* [[P1_CAST_CONST]], align 8
+; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
+; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[TMP1]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -60,10 +66,11 @@
 ; CHECK-LABEL: @ptr_in_struct(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[P:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], [[STRUCT_S]] addrspace(1)* [[ARG:%.*]], i64 0, i32 0
-; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(1)* [[P]], align 8
-; CHECK-NEXT:[[P1_GLOBAL:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
+; CHECK-NEXT:[[P_CONST:%.*]] = addrspacecast float* addrspace(1)* [[P]] to float* addrspace(4)*
+; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(4)* [[P_CONST]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
 ; CHECK-NEXT:[[ID:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
-; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[P1_GLOBAL]], i32 [[ID]]
+; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[TMP0]], i32 [[ID]]
 ; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[ARRAYIDX]], align 4
 ; CHECK-NEXT:ret void
 ;
@@ -80,7 +87,14 @@
 
 ; GCN-LABEL: flat_ptr_arg:
 ; GCN-COUNT-2: global_load_dwordx2
-; GCN: global_load_dwordx4
+
+; FIXME: First load is in the constant address space and second is in global
+;because it is clobbered by store. GPU load store vectorizer cannot
+;combine them. Note, this does not happen with -O3 because loads are
+;vectorized in pairs earlier and stay in the global address space.
+
+; GCN: global_load_dword v{{[0-9]+}}, [[PTR:v\[[0-9:]+\]]], off{{$}}
+; GCN: global_load_dwordx3 v[{{[0-9:]+}}], [[PTR]

[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You do not handle masks other than 0 yet?




Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:219
+// MASK = 0: No instructions may be scheduled across SCHED_BARRIER.
+// MASK = 1: Non-memory, non-side-effect producing instructions may be
+//   scheduled across SCHED_BARRIER, i.e. allow ALU instructions to pass.

Since you are going to extend it I'd suggest this is -1. Then you will start 
carving bits out of it. That way if someone starts to use it, it will still 
work after the update.



Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:222
+def int_amdgcn_sched_barrier : GCCBuiltin<"__builtin_amdgcn_sched_barrier">,
+  Intrinsic<[], [llvm_i16_ty], [ImmArg<ArgIndex<0>>, IntrNoMem,
+IntrHasSideEffects, IntrConvergent, IntrWillReturn]>;

Why not full i32? This is immediate anyway but you will have more bits for the 
future.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:213
+OutStreamer->emitRawComment(" sched_barrier mask(" +
+Twine(MI->getOperand(0).getImm()) + ")");
+  }

Use hex?
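For illustration, "use hex" here just means formatting the mask immediate in base 16 so individual mask bits are easy to read in the emitted comment. A minimal sketch of the resulting comment string, modelled in Python (the real code would format on the C++ side, e.g. with llvm::utohexstr; the exact call is up to the patch author):

```python
# Sketch of the assembler comment emitted above, with the mask printed in
# hex as suggested in the review. The mask value 0x5 is an arbitrary example.
def sched_barrier_comment(mask: int) -> str:
    return f" sched_barrier mask({mask:#x})"

print(sched_barrier_comment(0x5))  # -> " sched_barrier mask(0x5)"
```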


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D124700#3483609, @kerbowa wrote:

> In D124700#3483556, @rampitec wrote:
>
>> You do not handle masks other than 0 yet?
>
> We handle 0 and 1 only.

Do you mean 1 is supported simply because it has side effects? If I understand 
it right, you will need to remove this to support more flexible masks, right?




Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:219
+// MASK = 0: No instructions may be scheduled across SCHED_BARRIER.
+// MASK = 1: Non-memory, non-side-effect producing instructions may be
+//   scheduled across SCHED_BARRIER, i.e. allow ALU instructions to pass.

kerbowa wrote:
> rampitec wrote:
> > Since you are going to extend it I'd suggest this is -1. Then you will 
> > start carving bits out of it. That way if someone starts to use it, it 
> > will still work after the update.
> Since the most common use case will be to block all instruction types I 
> thought having that be MASK = 0 made the most sense. After that, we carve out 
> bits for types of instructions that should be scheduled across it.
> 
> There may be modes where we restrict certain types of memops, so we cannot 
> have MASK = 1 above changed to -1. Since this (MASK = 1) is allowing all ALU 
> across we could define which bits mean VALU/SALU/MFMA etc and use that mask 
> if you think it's better. I'm worried we won't be able to anticipate all the 
> types that we could want to be maskable. It might be better to just have a 
> single bit that can mean all ALU, or all MemOps, and so on to avoid this 
> problem.
Ok. Let it be 1.
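The behaviour agreed on here (MASK = 0 blocks everything; MASK = 1 lets non-memory, non-side-effecting ALU instructions pass) can be sketched as a small predicate. This is an illustrative model only, not the in-tree implementation, and any structure beyond the two mask values quoted above is hypothetical:

```python
# Illustrative model of the sched_barrier mask semantics from the review.
# Only MASK = 0 and MASK = 1 come from the discussion above; everything
# else here is a hypothetical sketch, not LLVM's implementation.
ALLOW_ALU = 1 << 0  # MASK = 1: pure ALU may be scheduled across


def may_cross_barrier(mask: int, is_mem: bool, has_side_effects: bool) -> bool:
    """Return True if an instruction may move across the sched_barrier."""
    if mask == 0:
        return False  # MASK = 0: no instruction may cross
    if is_mem or has_side_effects:
        return False  # memory / side-effecting ops stay put in this sketch
    return bool(mask & ALLOW_ALU)


# MASK = 0 blocks even plain ALU instructions.
assert not may_cross_barrier(0, is_mem=False, has_side_effects=False)
# MASK = 1 lets pure ALU pass, but not memory operations.
assert may_cross_barrier(1, is_mem=False, has_side_effects=False)
assert not may_cross_barrier(1, is_mem=True, has_side_effects=False)
```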


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

In D124700#3483715, @kerbowa wrote:

> In D124700#3483633, @rampitec wrote:
>
>> In D124700#3483609, @kerbowa wrote:
>>
>>> In D124700#3483556, @rampitec wrote:
>>>
>>>> You do not handle masks other than 0 yet?
>>>
>>> We handle 0 and 1 only.
>>
>> Do you mean 1 is supported simply because it has side effects? If I 
>> understand it right, you will need to remove this to support more flexible 
>> masks, right?
>
> Yes.

LGTM given that. But change imm to i32 before committing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-05-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D129902: [AMDGPU] Support for gfx940 fp8 conversions

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9fa5a6b7e8a2: [AMDGPU] Support for gfx940 fp8 conversions 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129902/new/

https://reviews.llvm.org/D129902

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP1Instructions.td
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
  llvm/test/MC/AMDGPU/gfx940_asm_features.s
  llvm/test/MC/AMDGPU/gfx940_err.s
  llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt

Index: llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
+++ llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
@@ -263,3 +263,159 @@
 
 # GFX940: buffer_atomic_min_f64 v[4:5], off, s[8:11], s3 sc1 ; encoding: [0x00,0x80,0x40,0xe1,0x00,0x04,0x02,0x03]
 0x00,0x80,0x40,0xe1,0x00,0x04,0x02,0x03
+
+# GFX940: v_cvt_f32_bf8_e32 v1, s3; encoding: [0x03,0xaa,0x02,0x7e]
+0x03,0xaa,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_e32 v1, 3 ; encoding: [0x83,0xaa,0x02,0x7e]
+0x83,0xaa,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_e32 v1, v3; encoding: [0x03,0xab,0x02,0x7e]
+0x03,0xab,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_sdwa v1, s3 src0_sel:BYTE_1 ; encoding: [0xf9,0xaa,0x02,0x7e,0x03,0x06,0x81,0x00]
+0xf9,0xaa,0x02,0x7e,0x03,0x06,0x81,0x00
+
+# GFX940: v_cvt_f32_bf8_dpp v1, v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xaa,0x02,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xaa,0x02,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_f32_bf8_e64 v1, s3 mul:2  ; encoding: [0x01,0x00,0x95,0xd1,0x03,0x00,0x00,0x08]
+0x01,0x00,0x95,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_f32_bf8_sdwa v1, s3 clamp mul:2 src0_sel:BYTE_1 ; encoding: [0xf9,0xaa,0x02,0x7e,0x03,0x66,0x81,0x00]
+0xf9,0xaa,0x02,0x7e,0x03,0x66,0x81,0x00
+
+# GFX940: v_cvt_f32_bf8_e64 v1, s3 clamp  ; encoding: [0x01,0x80,0x95,0xd1,0x03,0x00,0x00,0x00]
+0x01,0x80,0x95,0xd1,0x03,0x00,0x00,0x00
+
+# GFX940: v_cvt_f32_fp8_e32 v1, s3; encoding: [0x03,0xa8,0x02,0x7e]
+0x03,0xa8,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_e32 v1, 3 ; encoding: [0x83,0xa8,0x02,0x7e]
+0x83,0xa8,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_e32 v1, v3; encoding: [0x03,0xa9,0x02,0x7e]
+0x03,0xa9,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, s3 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x03,0x06,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x03,0x06,0x81,0x00
+
+# GFX940: v_cvt_f32_fp8_dpp v1, v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xa8,0x02,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xa8,0x02,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_f32_fp8_e64 v1, s3 mul:2  ; encoding: [0x01,0x00,0x94,0xd1,0x03,0x00,0x00,0x08]
+0x01,0x00,0x94,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, s3 clamp mul:2 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x03,0x66,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x03,0x66,0x81,0x00
+
+# GFX940: v_cvt_f32_fp8_e64 v1, s3 clamp  ; encoding: [0x01,0x80,0x94,0xd1,0x03,0x00,0x00,0x00]
+0x01,0x80,0x94,0xd1,0x03,0x00,0x00,0x00
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, 3 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x83,0x06,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x83,0x06,0x81,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], s3 ; encoding: [0x03,0xae,0x04,0x7e]
+0x03,0xae,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], 3  ; encoding: [0x83,0xae,0x04,0x7e]
+0x83,0xae,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], v3 ; encoding: [0x03,0xaf,0x04,0x7e]
+0x03,0xaf,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_sdwa v[2:3], s3 src0_sel:WORD_1 ; encoding: [0xf9,0xae,0x04,0x7e,0x03,0x06,0x85,0x00]
+0xf9,0xae,0x04,0x7e,0x03,0x06,0x85,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_dpp v[0:1], v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xae,0x00,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xae,0x00,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_pk_f32_bf8_e64 v[2:3], s3 mul:2   ; encoding: [0x02,0x00,0x97,0xd1,0x03,0x00,0x00,0x08]
+0x02,0x00,0x97,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_pk_f32_bf8_sdwa v[2:3], s3 clamp mul:2 src0_sel:WORD_1 ; encoding: [0xf9,0xae,0x04,0x7e,0x03,0x66,0x85,0x00]
+0xf9,0xae,0x04,0x7e,0x03,0x66,0x85,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_e64 v[2:3], s3 clamp   ; encoding: [0x02,0x80,0x9

[PATCH] D129906: [AMDGPU] Support for gfx940 fp8 mfma

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2695f0a688e9: [AMDGPU] Support for gfx940 fp8 mfma (authored 
by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129906/new/

https://reviews.llvm.org/D129906

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -63,6 +63,78 @@
 # GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
 
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf0,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf0,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf1,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf1,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf2,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf2,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf3,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf3,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf4,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf4,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf5,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf5,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf6,0xd3,0x0

[PATCH] D129908: [AMDGPU] Support for gfx940 fp8 smfmac

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG523a99c0eb03: [AMDGPU] Support for gfx940 fp8 smfmac 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129908/new/

https://reviews.llvm.org/D129908

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -392,3 +392,51 @@
 
 # GFX940: v_smfmac_i32_32x32x32_i8 v[10:25], v[2:3], v[4:7], v3 abid:15 ; encoding: [0x0a,0x78,0xec,0xd3,0x02,0x09,0x0e,0x04]
 0x0a,0x78,0xec,0xd3,0x02,0x09,0x0e,0x04
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_bf8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xf8,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xf8,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_fp8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf9,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xf9,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_fp8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xf9,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xf9,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfa,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfa,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_bf8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfa,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfa,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_fp8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfb,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfb,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_fp8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfb,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfb,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_bf8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfc,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfc,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_bf8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfc,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfc,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_fp8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfd,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfd,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_fp8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfd,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfd,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_bf8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfe,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfe,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_bf8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfe,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfe,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_fp8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xff,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xff,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_fp8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xff,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xff,0xd3,0x02,0x09,0x06,0x14
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -616,6 +616,70 @@
 // GFX940: v_smfmac_i32_32x32x32_i8 a[10:25], v[2:3], a[4:7], v11 ; encoding: [0x0a,0x80,0xec,0xd3,0x02,0x09,0x2e,0x14]
 // GFX90A: error: instruction not supported on this GPU
 
+v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1
+// GFX940: v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c]
+// GFX90A: error: instruction not supported on this GPU
+
+v_smfmac_f32_16x16x64_bf8_bf8 a[0:3], v[2:3], a[4:7], v1
+// GFX940: v_smfmac_f

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-08-22 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: lib/CodeGen/TargetInfo.cpp:7885
+// By default, restrict the maximum size to 256.
+F->addFnAttr("amdgpu-flat-work-group-size", "128,256");
   }

arsenm wrote:
> yaxunl wrote:
> > arsenm wrote:
> > > b-sumner wrote:
> > > > Theoretically, shouldn't the minimum be 1?
> > > That's what I thought, but the backend is defaulting to 2 * wave size now
> > I don't get it. This attribute indicates the possible workgroup size range 
> > this kernel may be run with, right? It only depends on how the user executes the
> > kernel. How is it related to backend defaults?
> The backend currently assumes 128,256 by default as the bounds. I want to 
> make this a frontend decision, and make the backend assumption the most 
> conservative default
I would agree that the minimum should be 1. The backend's choice is also unclear,
but emitting a false attribute is a separate issue.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62739/new/

https://reviews.llvm.org/D62739



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-08-27 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62739/new/

https://reviews.llvm.org/D62739





[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-09-21 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

This is obviously LGTM from the AMDGPU backend's point of view; we did the same thing ourselves.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972



[PATCH] D87947: [AMDGPU] Make ds fp atomics overloadable

2020-09-23 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG59691dc8740c: [AMDGPU] Make ds fp atomics overloadable 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87947/new/

https://reviews.llvm.org/D87947

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenCUDA/builtins-amdgcn.cu
  clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/test/CodeGen/AMDGPU/lds_atomic_f32.ll

Index: llvm/test/CodeGen/AMDGPU/lds_atomic_f32.ll
===
--- llvm/test/CodeGen/AMDGPU/lds_atomic_f32.ll
+++ llvm/test/CodeGen/AMDGPU/lds_atomic_f32.ll
@@ -1,9 +1,9 @@
 ; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI %s
 ; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s
 
-declare float @llvm.amdgcn.ds.fadd(float addrspace(3)* nocapture, float, i32, i32, i1)
-declare float @llvm.amdgcn.ds.fmin(float addrspace(3)* nocapture, float, i32, i32, i1)
-declare float @llvm.amdgcn.ds.fmax(float addrspace(3)* nocapture, float, i32, i32, i1)
+declare float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* nocapture, float, i32, i32, i1)
+declare float @llvm.amdgcn.ds.fmin.f32(float addrspace(3)* nocapture, float, i32, i32, i1)
+declare float @llvm.amdgcn.ds.fmax.f32(float addrspace(3)* nocapture, float, i32, i32, i1)
 
 ; GCN-LABEL: {{^}}lds_ds_fadd:
 ; VI-DAG: s_mov_b32 m0
@@ -19,9 +19,9 @@
   %shl1 = shl i32 %idx.add, 4
   %ptr0 = inttoptr i32 %shl0 to float addrspace(3)*
   %ptr1 = inttoptr i32 %shl1 to float addrspace(3)*
-  %a1 = call float @llvm.amdgcn.ds.fadd(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a2 = call float @llvm.amdgcn.ds.fadd(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a3 = call float @llvm.amdgcn.ds.fadd(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
+  %a1 = call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a2 = call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a3 = call float @llvm.amdgcn.ds.fadd.f32(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
   store float %a3, float addrspace(1)* %out
   ret void
 }
@@ -40,9 +40,9 @@
   %shl1 = shl i32 %idx.add, 4
   %ptr0 = inttoptr i32 %shl0 to float addrspace(3)*
   %ptr1 = inttoptr i32 %shl1 to float addrspace(3)*
-  %a1 = call float @llvm.amdgcn.ds.fmin(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a2 = call float @llvm.amdgcn.ds.fmin(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a3 = call float @llvm.amdgcn.ds.fmin(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
+  %a1 = call float @llvm.amdgcn.ds.fmin.f32(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a2 = call float @llvm.amdgcn.ds.fmin.f32(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a3 = call float @llvm.amdgcn.ds.fmin.f32(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
   store float %a3, float addrspace(1)* %out
   ret void
 }
@@ -61,9 +61,9 @@
   %shl1 = shl i32 %idx.add, 4
   %ptr0 = inttoptr i32 %shl0 to float addrspace(3)*
   %ptr1 = inttoptr i32 %shl1 to float addrspace(3)*
-  %a1 = call float @llvm.amdgcn.ds.fmax(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a2 = call float @llvm.amdgcn.ds.fmax(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
-  %a3 = call float @llvm.amdgcn.ds.fmax(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
+  %a1 = call float @llvm.amdgcn.ds.fmax.f32(float addrspace(3)* %ptr0, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a2 = call float @llvm.amdgcn.ds.fmax.f32(float addrspace(3)* %ptr1, float 4.2e+1, i32 0, i32 0, i1 false)
+  %a3 = call float @llvm.amdgcn.ds.fmax.f32(float addrspace(3)* %ptrf, float %a1, i32 0, i32 0, i1 false)
   store float %a3, float addrspace(1)* %out
   ret void
 }
Index: llvm/include/llvm/IR/IntrinsicsAMDGPU.td
===
--- llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -397,11 +397,10 @@
 def int_amdgcn_atomic_inc : AMDGPUAtomicIncIntrin;
 def int_amdgcn_atomic_dec : AMDGPUAtomicIncIntrin;
 
-class AMDGPULDSF32Intrin<string clang_builtin> :
-  GCCBuiltin<clang_builtin>,
-  Intrinsic<[llvm_float_ty],
-[LLVMQualPointerType<llvm_float_ty, 3>,
-llvm_float_ty,
+class AMDGPULDSIntrin :
+  Intrinsic<[llvm_any_ty],
+[LLVMQualPointerType<LLVMMatchType<0>, 3>,
+LLVMMatchType<0>,
 llvm_i32_ty, // ordering
 llvm_i32_ty, // scope
 llvm_i1_ty], // isVolatile
@@ -446,9 +445,9 @@
 def int_amdgcn_ds_append : AMDGPUDSAppendConsumedIntrinsic;
 def int_amdgcn_ds

[PATCH] D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support.

2020-11-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1581
+def int_amdgcn_endpgm : GCCBuiltin<"__builtin_amdgcn_endpgm">,
+  Intrinsic<[], [], [IntrNoReturn, IntrNoMem, IntrHasSideEffects]
+>;

Maybe also IntrCold?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90809/new/

https://reviews.llvm.org/D90809



[PATCH] D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support.

2020-11-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D90809#2376994, @b-sumner wrote:

> Should this also be IntrConvergent?

Probably yes... This is control flow after all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90809/new/

https://reviews.llvm.org/D90809



[PATCH] D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support.

2020-11-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

In D90809#2377221, @hliao wrote:

> In D90809#2377083, @rampitec wrote:
>
>> In D90809#2376994, @b-sumner wrote:
>>
>>> Should this also be IntrConvergent?
>>
>> Probably yes... This is control flow after all.
>
> The real effect of this intrinsic does not seem to change if it's moved somewhere.
> For example,
>
> convert this
>
>   if (a || b)
> endpgm();
>
> to
>
>   if (a)
> endpgm();
>   else if (b)
> endpgm();
>
> no matter whether a or b are divergent, it works the same as the original one,
> since `s_endpgm` is a scalar instruction. Of course, if we really doubt bad things may
> happen, I could add that for safety, as we will definitely revise this later.

It is probably OK to clone it; we shall insert skips around it anyway. LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90809/new/

https://reviews.llvm.org/D90809



[PATCH] D90886: [AMDGPU] Simplify amdgpu-macros.cl test. NFC.

2020-11-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4fcdfc4398bd: [AMDGPU] Simplify amdgpu-macros.cl test. NFC. 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90886/new/

https://reviews.llvm.org/D90886

Files:
  clang/test/Driver/amdgpu-macros.cl

Index: clang/test/Driver/amdgpu-macros.cl
===
--- clang/test/Driver/amdgpu-macros.cl
+++ clang/test/Driver/amdgpu-macros.cl
@@ -5,66 +5,35 @@
 // R600-based processors.
 //
 
-// RUN: %clang -E -dM -target r600 -mcpu=r600 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv630 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv635 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s
-// RUN: %clang -E -dM -target r600 -mcpu=r630 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R630 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rs780 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rs880 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv610 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv620 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RS880 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv670 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RV670 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv710 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RV710 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv730 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RV730 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv740 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RV770 %s
-// RUN: %clang -E -dM -target r600 -mcpu=rv770 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,RV770 %s
-// RUN: %clang -E -dM -target r600 -mcpu=cedar %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CEDAR %s
-// RUN: %clang -E -dM -target r600 -mcpu=palm %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CEDAR %s
-// RUN: %clang -E -dM -target r600 -mcpu=cypress %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CYPRESS %s
-// RUN: %clang -E -dM -target r600 -mcpu=hemlock %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CYPRESS %s
-// RUN: %clang -E -dM -target r600 -mcpu=juniper %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,JUNIPER %s
-// RUN: %clang -E -dM -target r600 -mcpu=redwood %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,REDWOOD %s
-// RUN: %clang -E -dM -target r600 -mcpu=sumo %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,SUMO %s
-// RUN: %clang -E -dM -target r600 -mcpu=sumo2 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,SUMO %s
-// RUN: %clang -E -dM -target r600 -mcpu=barts %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,BARTS %s
-// RUN: %clang -E -dM -target r600 -mcpu=caicos %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CAICOS %s
-// RUN: %clang -E -dM -target r600 -mcpu=aruba %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CAYMAN %s
-// RUN: %clang -E -dM -target r600 -mcpu=cayman %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,CAYMAN %s
-// RUN: %clang -E -dM -target r600 -mcpu=turks %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,TURKS %s
-
-// R600-NOT:#define FP_FAST_FMA 1
-// R630-NOT:#define FP_FAST_FMA 1
-// RS880-NOT:   #define FP_FAST_FMA 1
-// RV670-NOT:   #define FP_FAST_FMA 1
-// RV710-NOT:   #define FP_FAST_FMA 1
-// RV730-NOT:   #define FP_FAST_FMA 1
-// RV770-NOT:   #define FP_FAST_FMA 1
-// CEDAR-NOT:   #define FP_FAST_FMA 1
-// CYPRESS-NOT: #define FP_FAST_FMA 1
-// JUNIPER-NOT: #define FP_FAST_FMA 1
-// REDWOOD-NOT: #define FP_FAST_FMA 1
-// SUMO-NOT:#define FP_FAST_FMA 1
-// BARTS-NOT:   #define FP_FAST_FMA 1
-// CAICOS-NOT:  #define FP_FAST_FMA 1
-// CAYMAN-NOT:  #define FP_FAST_FMA 1
-// TURKS-NOT:   #define FP_FAST_FMA 1
-
-// R600-NOT:#define FP_FAST_FMAF 1
-// R630-NOT:#define FP_FAST_FMAF 1
-// RS880-NOT:   #define FP_FAST_FMAF 1
-// RV670-NOT:   #define FP_FAST_FMAF 1
-// RV710-NOT:   #define FP_FAST_FMAF 1
-// RV730-NOT:   #define FP_FAST_FMAF 1
-// RV770-NOT:   #define FP_FAST_FMAF 1
-// CEDAR-NOT:   #define FP_FAST_FMAF 1
-// CYPRESS-NOT: #define FP_FAST_FMAF 1
-// JUNIPER-NOT: #define FP_FAST_FMAF 1
-// REDWOOD-NOT: #define FP_FAST_FMAF 1
-// SUMO-NOT:#define FP_FAST_FMAF 1
-// BARTS-NOT:   #define FP_FAST_FMAF 1
-// CAICOS-NOT:  #define FP_FAST_FMAF 1
-// CAYMAN-NOT:  #define FP_FAST_FMAF 1
-// TURKS-NOT:   #define FP_FAST_FMAF 1
+// RUN: %clang -E -dM -target r600 -mcpu=r600 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600
+// RUN: %clang -E -dM -target r600 -mcpu=rv630 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600
+// RUN: %clang -E -dM -target r600 -mcpu=rv635 %s 2>&1 | FileCheck --check-prefixes=ARCH-R600,R600 %s -DCPU=r600
+//

[PATCH] D88916: [AMDGPU] Add gfx602, gfx705, gfx805 targets

2020-10-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88916/new/

https://reviews.llvm.org/D88916



[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd1beb95d1241: [AMDGPU] gfx1032 target (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487

Files:
  clang/include/clang/Basic/Cuda.h
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/NVPTX.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/include/llvm/BinaryFormat/ELF.h
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/lib/Support/TargetParser.cpp
  llvm/lib/Target/AMDGPU/GCNProcessors.td
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
  llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll
  llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll
  llvm/test/MC/AMDGPU/gfx1030_err.s
  llvm/test/MC/AMDGPU/gfx1030_new.s
  llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
  llvm/tools/llvm-readobj/ELFDumper.cpp

Index: llvm/tools/llvm-readobj/ELFDumper.cpp
===
--- llvm/tools/llvm-readobj/ELFDumper.cpp
+++ llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1781,6 +1781,7 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1012),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1030),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1031),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1032),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_XNACK),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_SRAM_ECC)
 };
Index: llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
+++ llvm/test/MC/Disassembler/AMDGPU/gfx1030_dasm_new.txt
@@ -1,5 +1,6 @@
 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1030 -disassemble -show-encoding < %s | FileCheck -check-prefix=GFX10 %s
 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1031 -disassemble -show-encoding < %s | FileCheck -check-prefix=GFX10 %s
+# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1032 -disassemble -show-encoding < %s | FileCheck -check-prefix=GFX10 %s
 
 # GFX10: global_load_dword_addtid v1, s[2:3] offset:16
 0x10,0x80,0x58,0xdc,0x00,0x00,0x02,0x01
Index: llvm/test/MC/AMDGPU/gfx1030_new.s
===
--- llvm/test/MC/AMDGPU/gfx1030_new.s
+++ llvm/test/MC/AMDGPU/gfx1030_new.s
@@ -1,5 +1,6 @@
 // RUN: llvm-mc -arch=amdgcn -mcpu=gfx1030 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
 // RUN: llvm-mc -arch=amdgcn -mcpu=gfx1031 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
+// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1032 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
 
 global_load_dword_addtid v1, s[2:3] offset:16
 // GFX10: encoding: [0x10,0x80,0x58,0xdc,0x00,0x00,0x02,0x01]
Index: llvm/test/MC/AMDGPU/gfx1030_err.s
===
--- llvm/test/MC/AMDGPU/gfx1030_err.s
+++ llvm/test/MC/AMDGPU/gfx1030_err.s
@@ -1,5 +1,6 @@
 // RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1030 %s 2>&1 | FileCheck --check-prefix=GFX10 --implicit-check-not=error: %s
 // RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1031 %s 2>&1 | FileCheck --check-prefix=GFX10 --implicit-check-not=error: %s
+// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1032 %s 2>&1 | FileCheck --check-prefix=GFX10 --implicit-check-not=error: %s
 
 v_dot8c_i32_i4 v5, v1, v2
 // GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
Index: llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll
===
--- llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll
+++ llvm/test/CodeGen/AMDGPU/hsa-note-no-func.ll
@@ -30,6 +30,7 @@
 ; RUN: llc < %s -mtriple=amdgcn--amdhsa -mcpu=gfx1012 --amdhsa-code-object-version=2 | FileCheck --check-prefix=HSA --check-prefix=HSA-GFX1012 %s
 ; RUN: llc < %s -mtriple=amdgcn--amdhsa -mcpu=gfx1030 --amdhsa-code-object-version=2 | FileCheck --check-prefix=HSA --check-prefix=HSA-GFX1030 %s
 ; RUN: llc < %s -mtriple=amdgcn--amdhsa -mcpu=gfx1031 --amdhsa-code-object-version=2 | FileCheck --check-prefix=HSA --check-prefix=HSA-GFX1031 %s
+; RUN: llc < %s -mtriple=amdgcn--amdhsa -mcpu=gfx1032 --amdhsa-code-object-version=2 | FileCheck --check-prefix=HSA --check-prefix=HSA-GFX1032 %s
 
 ; HSA: .hsa_code_object_version 2,1
 ; HSA-SI600: .hsa_code_object_isa 6,0,0,"AMD","AMDGPU"
@@ -54,3 +55,4 @@
 ; HSA-GFX1012: .hsa_code_object_isa 10,1,2,"AMD","AMDGPU"
 ; HSA-GFX1030: .hsa_code_object_isa 10,3,0,"AMD","AMDGPU"
 ; HSA-GFX1031: .hsa_code_object_isa 10,3,1,"AMD","AMDGPU"
+; HSA-GFX1032: .hsa_code_object_isa 10,3,2,"AMD","AMDGPU"
Index: llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll
===

[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/Driver/amdgpu-macros.cl:216
 // GFX1031-DAG: #define FP_FAST_FMA 1
+// GFX1032-DAG: #define FP_FAST_FMA 1
 

tra wrote:
> This test could use some refactoring.
> Individual macro checks could be collapsed to
> 
> ```
> // RUN: ... --check-prefixes=...,FMA
> //  FMA-DAG: #define FP_FAST_FMA 1
> ```
> 
Thanks! That makes sense.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487



[PATCH] D92115: AMDGPU - Add diagnostic for compiling modules with AMD HSA OS type and GFX 6 arch

2020-11-25 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You need to add a new test for this new error.




Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:134
+  if (isAmdHsaOS() && getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS) {
+report_fatal_error("GFX6 (SI) ASICs does not support AMD HSA OS type \n",
+   false);

"do not support". I would also drop "(SI)" from the message. Maybe even better 
just "GFX6 does not support AMD HSA".



Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:201
+  TargetTriple(TT), Gen(initializeGen(TT, GPU)),
+  InstrItins(getInstrItineraryForCPU(GPU)), LDSBankCount(0),
+  MaxPrivateElementSize(0),

Please keep original formatting.



Comment at: llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll:1
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 < %s | FileCheck --check-prefixes=GFX600 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti < %s | FileCheck --check-prefixes=GFX600 %s

You probably just need to change the triple for these targets, not drop them
from the test.



Comment at: llvm/test/CodeGen/AMDGPU/lower-kernargs-si-mesa.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; FIXME: Manually added checks for metadata nodes at bottom
+; RUN: opt -mtriple=amdgcn-- -S -o - -amdgpu-lower-kernel-arguments %s | FileCheck -check-prefix=MESA %s

There are no such checks?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92115/new/

https://reviews.llvm.org/D92115



[PATCH] D92115: AMDGPU - Add diagnostic for compiling modules with AMD HSA OS type and GFX 6 arch

2020-12-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:62-72
+static AMDGPUSubtarget::Generation initializeGen(const Triple &TT,
+ StringRef GPU) {
+  if (GPU.contains("generic")) {
+return TT.getOS() == Triple::AMDHSA
+   ? AMDGPUSubtarget::Generation::SEA_ISLANDS
+   : AMDGPUSubtarget::Generation::SOUTHERN_ISLANDS;
+  } else {

t-tye wrote:
> I am not clear what this function is doing. It seems to be returning a 
> generation unrelated to the actual target generation to satisfy the one 
> place it is called. If the target is not SEA_ISLANDS it seems incorrect to be 
> returning SEA_ISLANDS. If this function is doing something special for the 
> one place it is called perhaps it should be expanded there?
It is probably better to just print an error if amdhsa is requested but no
compatible -mcpu is specified (e.g. when we have generic/tahiti, etc.).



Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:134
+  if (isAmdHsaOS() && getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS) {
+report_fatal_error("GFX6 (SI) ASICs does not support AMD HSA OS type \n",
+   false);

pvellien wrote:
> t-tye wrote:
> > pvellien wrote:
> > > t-tye wrote:
> > > > rampitec wrote:
> > > > > "do not support". I would also drop "(SI)" from the message. Maybe 
> > > > > even better just "GFX6 does not support AMD HSA".
> > > > Make the message include the full target triple text so the user 
> > > > understands how to resolve the issue. For example:
> > > > 
> > > >   The target triple %s is not supported: the processor %s does not 
> > > > support the amdhsa OS
> > > > 
> > > > Do the r600 targets also produce a similar error message?
> > > > 
> > > > Is this really the right test? My understanding is that the issue is 
> > > > not that gfx60x does not support the amdhsa OS, but that it does not 
> > > > use the FLAT address space.
> > > > 
> > > > My understanding is that the current problem is that FLAT instructions 
> > > > are being used for the GLOBAL address space accesses. The use of FLAT 
> > > > instructions for the global address space was introduced after gfx60x 
> > > > was initially being supported on amdhsa. Originally BUFFER instructions 
> > > > that use an SRD that has a 0 base and are marked as addr64 were used 
> > > > for GLOBAL address space accesses. This was changed to use FLAT 
> > > > instructions due to later targets dropping the SRD addr64 support. I 
> > > > suspect it is that change that broke gfx60x as there were no tests to 
> > > > catch it.
> > > > 
> > > > So the real fix seems to find that change and make the code still use 
> > > > use BUFFER instructions for gfxx60x and FLAT instructions for gfx70x+. 
> > > > The tests can then be updated to test gfx60x for amdhsa but to omit the 
> > > > FLAT address space tests. The error would then indicate that the gfx60x 
> > > > does not support the FLAT address space (and that is not conditional on 
> > > > the OS). The documentation in AMDGPUUsage can state that gfx60x does 
> > > > not support the FLAT address space in the Address Space section. The 
> > > > Processor table can add a column for processor characteristics and 
> > > > mention that the gfx60x targets do not support the FLAT address space.
> > > Previously in the internal review process it mentioned that gfx60x does 
> > > not support HSA and agreed to add a diagnostic to report that GFX6 does not
> > > support the HSA OS type. @rampitec mentioned that SI ASICs cannot support HSA
> > > because we aren't able to map memory on SI as HSA requires, so the user
> > > will just have weird runtime failures. But based on your comment it seems 
> > > like we have to use MUBUF instructions for -mtriple=amdgcn-amd-amdhsa 
> > > -mcpu=gfx60x combination and use FLAT instructions for 
> > > -mtriple=amdgcn-amd-amdhsa -mcpu=gfx70x+. Is my understanding correct? If 
> > > the compiler emits the MUBUF instructions for global address space 
> > > accesses, it is still required to produce the error msg? 
> > In the early days of implementing HSA I believe we were bringing up on 
> > gfx6. It could not support all HSA features, but it did function with the 
> > parts it could support. So I was suggesting we restore the code to support 
> > what it did originally. That would mean using MUBUF for the GLOBAL address 
> > space like it used to do (is that code still present?).
> > 
> > The compiler can then report errors for the features it cannot support, 
> > which in this case is it cannot support instruction selection of the 
> > GENERIC address space on gfx6.
> > 
> > If you could find the commit that switched to using FLAT instructions to 
> > access the GLOBAL address space that will likely provide the necessary 
> > information to decide the best thing to do for this issue.
> I code to select MUBUF instructio

[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec marked 3 inline comments as done.
rampitec added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:280

  names.
+ ``gfx1032`` ``amdgcn``   dGPU  - xnack   
*TBA*
+  [off]

t-tye wrote:
> foad wrote:
> > xnack looks like a mistake here?
> gfx1032 does not support XNACK. It should have the same target features as 
> gfx1030.
D89565



Comment at: llvm/docs/AMDGPUUsage.rst:835
+ ``EF_AMDGPU_MACH_AMDGCN_GFX1032`` 0x038  ``gfx1032``
  *reserved*0x038  Reserved.
  *reserved*0x039  Reserved.

t-tye wrote:
> Delete as the line above is using this reserved number.
D89565
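The nit above — a newly assigned `EF_AMDGPU_MACH` code reusing a value still listed as reserved — is the kind of mistake a uniqueness check over the table catches. A hedged sketch (the 0x038 value comes from the table above; the checking code itself is invented for illustration):

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative only: register each e_flags mach code once; a second
// registration of the same value (e.g. a stale *reserved* row) is a bug.
bool registerMachCode(std::map<unsigned, std::string> &table,
                      unsigned code, const std::string &name) {
  return table.emplace(code, name).second; // false if the code is taken
}
```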


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec marked 3 inline comments as done.
rampitec added inline comments.



Comment at: llvm/lib/Support/TargetParser.cpp:66
 // Don't bother listing the implicitly true features
-constexpr GPUInfo AMDGCNGPUs[43] = {
+constexpr GPUInfo AMDGCNGPUs[44] = {
   // Name    Canonical    Kind    Features

foad wrote:
> Use `[]` so we don't have to keep updating the number?
D89568


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487



[PATCH] D89582: clang/AMDGPU: Apply workgroup related attributes to all functions

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D89582#2335619 , @arsenm wrote:

> In D89582#2335574 , @yaxunl wrote:
>
>> What if a device function is called by kernels with different work group 
>> sizes, will caller's work group size override callee's work group size?
>
> It's user error to call a function with a larger range than the caller

The problem is that the user can override the default on a kernel with the 
attribute, but cannot do so on a function. So a module can be compiled with a 
default smaller than what is requested on one of the kernels.

Then if the default were the maximum of 1024 and could only be overridden with 
the --gpu-max-threads-per-block option it would not be a problem, if not for the 
description of the option:

  LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for 
kernel launch bounds for HIP")

I.e. it speaks about the "default", so it should be perfectly legal to set a 
higher limit on a specific kernel. Should the option say it restricts the 
maximum, it would be legal to apply it to functions as well.
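The distinction can be made concrete with a small sketch. This is purely illustrative and not clang's implementation: if the option sets a default, a per-kernel launch-bounds attribute may exceed it; if it sets a maximum, the attribute would have to be clamped.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical illustration of the two readings of --gpu-max-threads-per-block.
unsigned effectiveBound(unsigned optionValue,
                        std::optional<unsigned> kernelAttr,
                        bool optionIsDefault) {
  if (!kernelAttr)
    return optionValue;                       // no attribute: use the option
  if (optionIsDefault)
    return *kernelAttr;                       // "default": attribute wins
  return std::min(*kernelAttr, optionValue);  // "maximum": attribute clamped
}
```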


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89582/new/

https://reviews.llvm.org/D89582



[PATCH] D89582: clang/AMDGPU: Apply workgroup related attributes to all functions

2020-10-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D89582#2335704 , @arsenm wrote:

> In D89582#2335671 , @rampitec wrote:
>
>> In D89582#2335619 , @arsenm wrote:
>>
>>> In D89582#2335574 , @yaxunl wrote:
>>>
 What if a device function is called by kernels with different work group 
 sizes, will caller's work group size override callee's work group size?
>>>
>>> It's user error to call a function with a larger range than the caller
>>
>> The problem is that the user can override the default on a kernel with the 
>> attribute, but cannot do so on a function. So a module can be compiled with a 
>> default smaller than what is requested on one of the kernels.
>
>
>
>> Then if the default were the maximum of 1024 and could only be overridden with 
>> the --gpu-max-threads-per-block option it would not be a problem, if not for 
>> the description of the option:
>>
>>   LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for 
>> kernel launch bounds for HIP")
>>
>> I.e. it speaks about the "default", so it should be perfectly legal to set a 
>> higher limit on a specific kernel. Should the option say it restricts the 
>> maximum, it would be legal to apply it to functions as well.
>
> The current backend default ends up greatly restricting the registers used in 
> the functions, and increasing the spilling.

I know the problem, but it would be better to use AMDGPUPropagateAttributes 
for this. It will clone functions if needed.
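The clone-if-needed behavior mentioned here can be sketched roughly as follows. This is only an analogy — AMDGPUPropagateAttributes is the real pass, and its actual implementation differs; all names below are invented:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative model: push a caller's feature string into a callee; if the
// callee was already specialized with a different string, clone it instead
// of overwriting, and return the name of the clone to call.
std::string propagateFeatures(std::map<std::string, std::string> &funcFeatures,
                              const std::string &callee,
                              const std::string &features) {
  auto it = funcFeatures.find(callee);
  if (it == funcFeatures.end() || it->second == features) {
    funcFeatures[callee] = features;
    return callee;
  }
  std::string clone = callee + "." + features; // hypothetical clone name
  funcFeatures[clone] = features;
  return clone;
}
```

With this shape, two kernels with conflicting feature sets end up calling two distinct specializations rather than sharing one function with the wrong attributes.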


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89582/new/

https://reviews.llvm.org/D89582



[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Missing changes to these files:

clang/include/clang/Basic/Cuda.h
clang/lib/Basic/Cuda.cpp
clang/lib/Basic/Targets/NVPTX.cpp
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
llvm/lib/Object/ELFObjectFile.cpp
llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447



[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447



[PATCH] D102306: Add gfx1034

2021-05-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102306/new/

https://reviews.llvm.org/D102306



[PATCH] D95733: [AMDGPU] Set s-memtime-inst feature from clang

2021-02-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG8e661d3d9c52: [AMDGPU] Set s-memtime-inst feature from clang 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95733/new/

https://reviews.llvm.org/D95733

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl


Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -32,30 +32,30 @@
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1032 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX1032 %s
 // RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1033 -S -emit-llvm -o - %s | 
FileCheck --check-prefix=GFX1033 %s
 
-// GFX600-NOT: "target-features"
-// GFX601-NOT: "target-features"
-// GFX602-NOT: "target-features"
-// GFX700: "target-features"="+ci-insts,+flat-address-space"
-// GFX701: "target-features"="+ci-insts,+flat-address-space"
-// GFX702: "target-features"="+ci-insts,+flat-address-space"
-// GFX703: "target-features"="+ci-insts,+flat-address-space"
-// GFX704: "target-features"="+ci-insts,+flat-address-space"
-// GFX705: "target-features"="+ci-insts,+flat-address-space"
-// GFX801: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime"
-// GFX802: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime"
-// GFX803: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime"
-// GFX805: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime"
-// GFX810: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime"
-// GFX900: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX902: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX904: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX906: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX908: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+mai-insts,+s-memrealtime"
-// GFX909: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX90C: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX1010: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX1011: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"
-// GFX1012: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"
+// GFX600: "target-features"="+s-memtime-inst"
+// GFX601: "target-features"="+s-memtime-inst"
+// GFX602: "target-features"="+s-memtime-inst"
+// GFX700: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX701: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX702: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX703: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX704: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX705: "target-features"="+ci-insts,+flat-address-space,+s-memtime-inst"
+// GFX801: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
+// GFX802: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
+// GFX803: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
+// GFX805: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
+// GFX810: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+s-memrealtime,+s-memtime-inst"
+// GFX900: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
+// GFX902: 
"target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst"
+// GFX904: 
"

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGa8d9d50762c4: [AMDGPU] gfx90a support (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D96906?vs=324434&id=324456#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

Files:
  clang/docs/ClangCommandLineReference.rst
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/include/clang/Basic/Cuda.h
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/Cuda.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/NVPTX.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/Driver/amdgpu-features.c
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/cuda-bad-arch.cu
  clang/test/Driver/hip-toolchain-features.hip
  clang/test/Misc/target-invalid-cpu-note.c
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx90a-param.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/include/llvm/BinaryFormat/ELF.h
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Object/ELFObjectFile.cpp
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/lib/Support/TargetParser.cpp
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
  llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
  llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
  llvm/lib/Target/AMDGPU/AMDGPUGISel.td
  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
  llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
  llvm/lib/Target/AMDGPU/BUFInstructions.td
  llvm/lib/Target/AMDGPU/DSInstructions.td
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
  llvm/lib/Target/AMDGPU/FLATInstructions.td
  llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp
  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
  llvm/lib/Target/AMDGPU/GCNProcessors.td
  llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
  llvm/lib/Target/AMDGPU/GCNRegPressure.h
  llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp
  llvm/lib/Target/AMDGPU/MIMGInstructions.td
  llvm/lib/Target/AMDGPU/SIAddIMGInit.cpp
  llvm/lib/Target/AMDGPU/SIDefines.h
  llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
  llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
  llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  llvm/lib/Target/AMDGPU/SIInstrFormats.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.h
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/SIInstructions.td
  llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
  llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
  llvm/lib/Target/AMDGPU/SIProgramInfo.h
  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
  llvm/lib/Target/AMDGPU/SIRegisterInfo.h
  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
  llvm/lib/Target/AMDGPU/SISchedule.td
  llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
  llvm/lib/Target/AMDGPU/VOP1Instructions.td
  llvm/lib/Target/AMDGPU/VOP2Instructions.td
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/lib/Target/AMDGPU/VOPInstructions.td
  llvm/test/Analysis/CostModel/AMDGPU/fadd.ll
  llvm/test/Analysis/CostModel/AMDGPU/fma.ll
  llvm/test/Analysis/CostModel/AMDGPU/fmul.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgpu-atomic-cmpxchg-flat.mir
  
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgpu-atomic-cmpxchg-global.mir
  llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-atomicrmw-add-flat.mir
  llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-atomicrmw-add-global.mir
  llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy.mir
 

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D96906#2570086 , @tra wrote:

> This is a pretty huge patch, with no details in the commit log.
>
> One hour between sending the patch out and landing it is not sufficient for 
> anyone to meaningfully 
> review the patch and there are no mentions of the review done anywhere else.
>
> While the code only changes AMDGPU back-end, it does not mean that the patch 
> should be just rubber-stamped.

It's a year of work that was necessarily done downstream. Every line there was 
reviewed and tested downstream. I understand no one can reasonably review 
something that big, but I cannot break it into small patches after a year of 
changes and fixes. Not that I have much choice.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

arsenm wrote:
> This is a problem because I've removed forAllLanes.
> 
> This is a hack, we should be using a different register class for cases that 
> don't support a given subregister index, not scanning for an example 
> non-reserved register
This would be massive duplication of all instructions with such operands, 
wouldn't it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/include/clang/Driver/Options.td:3097-3101
+def mtgsplit : Flag<["-"], "mtgsplit">, Group,
+  HelpText<"Enable threadgroup split execution mode (AMDGPU only)">;
+def mno_tgsplit : Flag<["-"], "mno-tgsplit">, Group,
+  HelpText<"Disable threadgroup split execution mode (AMDGPU only)">;
+

kzhuravl wrote:
> tra wrote:
> > We have `BoolFOption` to generate `-fsomething` and `-fno-something`
> The reason we use m_amdgpu_Features_Group is that it gets transformed to 
> target features (-mtgsplit to +tgsplit, mno-tgsplit to -tgsplit). For example, 
> the tgsplit target feature in the AMDGPU backend:
> 
> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td#L157
> 
> Does BoolFOption get translated to target features as well?
We could probably create a similar BoolMOption. This is not only about tgsplit; 
there are plenty of such binary options around.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/include/clang/Driver/Options.td:3097-3101
+def mtgsplit : Flag<["-"], "mtgsplit">, Group,
+  HelpText<"Enable threadgroup split execution mode (AMDGPU only)">;
+def mno_tgsplit : Flag<["-"], "mno-tgsplit">, Group,
+  HelpText<"Disable threadgroup split execution mode (AMDGPU only)">;
+

kzhuravl wrote:
> kzhuravl wrote:
> > kzhuravl wrote:
> > > rampitec wrote:
> > > > kzhuravl wrote:
> > > > > kzhuravl wrote:
> > > > > > tra wrote:
> > > > > > > We have `BoolFOption` to generate `-fsomething` and 
> > > > > > > `-fno-something`
> > > > > > The reason we use m_amdgpu_Features_Group is it is getting 
> > > > > > transformed to target features (-mtgsplit to +tgsplit, mno-tgsplit 
> > > > > > to -tgsplit. For example, tgsplit target feature in AMDGPU backend:
> > > > > > 
> > > > > > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td#L157
> > > > > > 
> > > > > > Does BoolFOption get translated to target features as well?
> > > > > Quickly glancing over Options.td, BoolFOption is in f_group, and does 
> > > > > not get automatically converted to target-features
> > > > We could probably create a similar BoolMOption. This is not only about 
> > > > tgsplit; there are plenty of such binary options around.
> > > agreed, seems like a good choice given there is BoolFOption, BoolGOption
> > but it will still need to be in m_amdgpu_Features_Group because 
> > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/AMDGPU.cpp#L403
> >  unless we want to switch away from that. So maybe make the group an 
> > optional, last template parameter to BoolMOption?
> Other targets also have their own corresponding groups: 
> https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Driver/Options.td#L151
Looks like it. Still, it is just not the same thing as BoolFOption or BoolGOption.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/include/clang/Driver/Options.td:3097-3101
+def mtgsplit : Flag<["-"], "mtgsplit">, Group,
+  HelpText<"Enable threadgroup split execution mode (AMDGPU only)">;
+def mno_tgsplit : Flag<["-"], "mno-tgsplit">, Group,
+  HelpText<"Disable threadgroup split execution mode (AMDGPU only)">;
+

kzhuravl wrote:
> rampitec wrote:
> > kzhuravl wrote:
> > > kzhuravl wrote:
> > > > kzhuravl wrote:
> > > > > rampitec wrote:
> > > > > > kzhuravl wrote:
> > > > > > > kzhuravl wrote:
> > > > > > > > tra wrote:
> > > > > > > > > We have `BoolFOption` to generate `-fsomething` and 
> > > > > > > > > `-fno-something`
> > > > > > > > The reason we use m_amdgpu_Features_Group is it is getting 
> > > > > > > > transformed to target features (-mtgsplit to +tgsplit, 
> > > > > > > > mno-tgsplit to -tgsplit. For example, tgsplit target feature in 
> > > > > > > > AMDGPU backend:
> > > > > > > > 
> > > > > > > > https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td#L157
> > > > > > > > 
> > > > > > > > Does BoolFOption get translated to target features as well?
> > > > > > > Quickly glancing over Options.td, BoolFOption is in f_group, and 
> > > > > > > does not get automatically converted to target-features
> > > > > > We could probably create a similar BoolMOption. This is not only 
> > > > > > about tgsplit; there are plenty of such binary options around.
> > > > > agreed, seems like a good choice given there is BoolFOption, 
> > > > > BoolGOption
> > > > but it will still need to be in m_amdgpu_Features_Group because 
> > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/AMDGPU.cpp#L403
> > > >  unless we want to switch away from that. so maybe make a group an 
> > > > optional, last template parameter to BoolMOption?
> > > Other targets also have their own corresponding groups: 
> > > https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Driver/Options.td#L151
> > Looks like it. Still, it is just not the same thing as BoolFOption or BoolGOption.
> @tra , do you have any suggestion based on the comments above? thanks.
I have created a new BoolMOption here: D97069. Not sure if it saves much code, 
but let's see if we like it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > This is a problem because I've removed forAllLanes.
> > > 
> > > This is a hack, we should be using a different register class for cases 
> > > that don't support a given subregister index not scanning for an example 
> > > non-reserved register
> > This would be massive duplication of all instructions with such operands, 
> > wouldn't it?
> Ideally yes. We can still use register classes for this, with special care to 
> make sure we never end up with the unaligned virtual registers in the wrong 
> contexts.
> 
>  The less that's tracked by the instruction definitions, the more special 
> case code we have to write. I've been thinking of swapping out the entire 
> MCInstrDesc table per-subtarget to make this easier, although that may be a 
> painful change.
I do not see how it can be less code. You will need to duplicate all VALU 
pseudos, not just real instructions. Which means every time you write in the 
source something like AMDGPU::FLAT_LOAD_DWORDX2 you would have to write an if. 
For every VALU instruction.
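The two alternatives being debated can be sketched in miniature. All names here are invented for illustration — this is not the real AMDGPU backend API: (a) duplicating pseudos per subtarget forces an `if` at every use site, while (b) swapping out the per-subtarget descriptor table keeps one enum value and resolves the physical definition through the table.

```cpp
#include <cassert>
#include <string>

// (a) One pseudo per subtarget: every use site must pick the right opcode.
enum Opcode { FLAT_LOAD_DWORDX2, FLAT_LOAD_DWORDX2_gfx90a };
Opcode pickLoad(bool isGfx90a) {
  return isGfx90a ? FLAT_LOAD_DWORDX2_gfx90a : FLAT_LOAD_DWORDX2;
}

// (b) Swapping the descriptor table: the same enum value indexes a
// subtarget-specific table, so use sites stay unchanged.
struct InstrDesc { const char *regClass; };
const InstrDesc *descTable(bool isGfx90a) {
  static const InstrDesc generic[] = {{"VReg_64"}};
  static const InstrDesc gfx90a[]  = {{"VReg_64_Align2"}};
  return isGfx90a ? gfx90a : generic;
}
```

The register class names are merely suggestive of the aligned-tuple constraint discussed in this thread.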


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > This is a problem because I've removed forAllLanes.
> > > > > 
> > > > > This is a hack, we should be using a different register class for 
> > > > > cases that don't support a given subregister index not scanning for 
> > > > > an example non-reserved register
> > > > This would be massive duplication of all instructions with such 
> > > > operands, wouldn't it?
> > > Ideally yes. We can still use register classes for this, with special 
> > > care to make sure we never end up with the unaligned virtual registers in 
> > > the wrong contexts.
> > > 
> > >  The less that's tracked by the instruction definitions, the more special 
> > > case code we have to write. I've been thinking of swapping out the entire 
> > > MCInstrDesc table per-subtarget to make this easier, although that may be 
> > > a painful change.
> > I do not see how it can be less code. You will need to duplicate all VALU 
> > pseudos, not just real instructions. Which means every time you write in 
> > the source something like AMDGPU::FLAT_LOAD_DWORDX2 you would have to write 
> > an if. For every VALU instruction.
> It's less code because the code that's already there is supposed to rely on 
> the static operand definitions. Every time we want to deviate from those, we 
> end up writing manual code in the verifier and fix up things here and there 
> that differ.
> 
> The point of swapping out the table would be to eliminate all the VALU 
> pseudos. We would just have the same enum values referring to different 
> physical instruction definitions
This makes sense, although as you said it is also quite painful, and to me it 
also sounds like a hack. There is still a lot of legalization needed even with 
this approach. Every time you hit an instruction not supported by a target you 
will need to do something about it, in the worst case by expanding it. That 
sounds like another year of work, especially when you look at highly specialized 
ASICs which can do this but cannot do that, and you have a lot of them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906



[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

rampitec wrote:
> arsenm wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > rampitec wrote:
> > > > > arsenm wrote:
> > > > > > This is a problem because I've removed forAllLanes.
> > > > > > 
> > > > > > This is a hack, we should be using a different register class for 
> > > > > > cases that don't support a given subregister index not scanning for 
> > > > > > an example non-reserved register
> > > > > This would be massive duplication of all instructions with such 
> > > > > operands, wouldn't it?
> > > > Ideally yes. We can still use register classes for this, with special 
> > > > care to make sure we never end up with the unaligned virtual registers 
> > > > in the wrong contexts.
> > > > 
> > > >  The less that's tracked by the instruction definitions, the more 
> > > > special case code we have to write. I've been thinking of swapping out 
> > > > the entire MCInstrDesc table per-subtarget to make this easier, 
> > > > although that may be a painful change.
> > > I do not see how it can be less code. You will need to duplicate all VALU 
> > > pseudos, not just real instructions. Which means every time you write in 
> > > the source something like AMDGPU::FLAT_LOAD_DWORDX2 you would have to 
> > > write an if. For every VALU instruction.
> > It's less code because the code that's already there is supposed to rely on 
> > the static operand definitions. Every time we want to deviate from those, 
> > we end up writing manual code in the verifier and fix up things here and 
> > there that differ.
> > 
> > The point of swapping out the table would be to eliminate all the VALU 
> > pseudos. We would just have the same enum values referring to different 
> > physical instruction definitions
> This makes sense, although as you said also quite painful and to me also 
> sounds like a hack. There is still a lot of legalization needed even with 
> this approach. Every time you hit an instruction not supported by a target 
> you will need to do something about it. In a worst case expanding. Sounds 
> like another year of work. Especially when you look at highly specialized 
> ASICs which can do this but cannot do that, and you have a lot of them.
JBTW it will not help anyway, not for this problem. You may create an operand 
of a different RC or you may just reserve every other register like I did; the 
net result will be the same: you will end up using a prohibited register. Imagine 
you are using an RC where only even tuples are added, and then you use the 
sub1_sub2 subreg of it. RA will happily allocate a forbidden register just like 
it does now. To me it is an RA bug in the first place to allocate a reserved 
register.

The only thing which could help is another register info without odd wide 
subregs, but you cannot achieve that just by duplicating the instruction 
definitions; for that you would need to duplicate the register info as well. 
This is almost a new BE.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-23 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

arsenm wrote:
> rampitec wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > rampitec wrote:
> > > > > arsenm wrote:
> > > > > > rampitec wrote:
> > > > > > > arsenm wrote:
> > > > > > > > This is a problem because I've removed forAllLanes.
> > > > > > > > 
> > > > > > > > This is a hack, we should be using a different register class 
> > > > > > > > for cases that don't support a given subregister index not 
> > > > > > > > scanning for an example non-reserved register
> > > > > > > This would be massive duplication of all instructions with such 
> > > > > > > operands, isn't it?
> > > > > > Ideally yes. We can still use register classes for this, with 
> > > > > > special care to make sure we never end up with the unaligned 
> > > > > > virtual registers in the wrong contexts.
> > > > > > 
> > > > > >  The less that's tracked by the instruction definitions, the more 
> > > > > > special case code we have to right. I've been thinking of swapping 
> > > > > > out the entire MCInstrDesc table per-subtarget to make this easier, 
> > > > > > although that may be a painful change.
> > > > > I do not see how it can be less code. You will need to duplicate all 
> > > > > VALU pseudos, not just real instructions. Which means every time you 
> > > > > write in the source something like AMDGPU::FLAT_LOAD_DWORDX2 you 
> > > > > would have to write an if. For every VALU instruction.
> > > > It's less code because the code that's already there is supposed to 
> > > > rely on the static operand definitions. Every time we want to deviate 
> > > > from those, we end up writing manual code in the verifier and fixup 
> > > > things here and there that differ.
> > > > 
> > > > The point of swapping out the table would be to eliminate all the VALU 
> > > > pseudos. We would just have the same enum values referring to different 
> > > > physical instruction definitions
> > > This makes sense, although as you said it is also quite painful, and to 
> > > me it also sounds like a hack. There is still a lot of legalization 
> > > needed even with this approach. Every time you hit an instruction not 
> > > supported by a target you will need to do something about it, in the 
> > > worst case expanding it. That sounds like another year of work, 
> > > especially when you look at highly specialized ASICs which can do this 
> > > but cannot do that, and you have a lot of them.
> > JBTW it will not help anyway. Not for this problem. You may create an 
> > operand of a different RC or you may just reserve every other register like 
> > I did; the net result will be the same: you will end up using a prohibited 
> > register. Imagine you are using an RC where only even tuples are added, and 
> > then you are using a sub1_sub2 subreg of it. RA will happily allocate a 
> > forbidden register just like it does now. To me this is an RA bug in the 
> > first place, to allocate a reserved register.
> > 
> > The only thing which could help is another register info without odd 
> > wide subregs, but that you cannot achieve just by duplication of 
> > instruction definitions, for that you would need to duplicate register info 
> > as well. This is almost a new BE.
> It's not a hack, this is how operand classes are intended to work. You 
> wouldn't be producing these instructions on targets that don't support them 
> (ideally we would also have a verifier for this, which is another area where 
> subtarget handling is weak).
> 
> The point is to not reserve them. References to unaligned registers can 
> exist, they just can't be used in the context of a real machine operand. 
> D97316 switches to using dedicated classes for alignment (the further cleanup 
> would be to have this come directly from the instruction definitions instead 
> of fixing them up after isel).

I have replied in D97316, but I do not believe it will help as is. It will run 
into the exact same issue as with the reserved registers approach and their 
subregs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906


[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-23 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+if (!MRI->isReserved(R)) {
+  RepReg = R;
+  break;
+}
+  }

rampitec wrote:
> arsenm wrote:
> > rampitec wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > rampitec wrote:
> > > > > > arsenm wrote:
> > > > > > > rampitec wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > This is a problem because I've removed forAllLanes.
> > > > > > > > > 
> > > > > > > > > This is a hack, we should be using a different register class 
> > > > > > > > > for cases that don't support a given subregister index not 
> > > > > > > > > scanning for an example non-reserved register
> > > > > > > > This would be massive duplication of all instructions with such 
> > > > > > > > operands, isn't it?
> > > > > > > Ideally yes. We can still use register classes for this, with 
> > > > > > > special care to make sure we never end up with the unaligned 
> > > > > > > virtual registers in the wrong contexts.
> > > > > > > 
> > > > > > >  The less that's tracked by the instruction definitions, the more 
> > > > > > > special case code we have to write. I've been thinking of 
> > > > > > > swapping out the entire MCInstrDesc table per-subtarget to make 
> > > > > > > this easier, although that may be a painful change.
> > > > > > I do not see how it can be less code. You will need to duplicate 
> > > > > > all VALU pseudos, not just real instructions. Which means every 
> > > > > > time you write in the source something like 
> > > > > > AMDGPU::FLAT_LOAD_DWORDX2 you would have to write an if. For every 
> > > > > > VALU instruction.
> > > > > It's less code because the code that's already there is supposed to 
> > > > > rely on the static operand definitions. Every time we want to deviate 
> > > > > from those, we end up writing manual code in the verifier and fixup 
> > > > > things here and there that differ.
> > > > > 
> > > > > The point of swapping out the table would be to eliminate all the 
> > > > > VALU pseudos. We would just have the same enum values referring to 
> > > > > different physical instruction definitions
> > > > This makes sense, although as you said it is also quite painful, and 
> > > > to me it also sounds like a hack. There is still a lot of legalization 
> > > > needed even with this approach. Every time you hit an instruction not 
> > > > supported by a target you will need to do something about it, in the 
> > > > worst case expanding it. That sounds like another year of work, 
> > > > especially when you look at highly specialized ASICs which can do this 
> > > > but cannot do that, and you have a lot of them.
> > > JBTW it will not help anyway. Not for this problem. You may create an 
> > > operand of a different RC or you may just reserve every other register 
> > > like I did; the net result will be the same: you will end up using a 
> > > prohibited register. Imagine you are using an RC where only even tuples 
> > > are added, and then you are using a sub1_sub2 subreg of it. RA will 
> > > happily allocate a forbidden register just like it does now. To me this 
> > > is an RA bug in the first place, to allocate a reserved register.
> > > 
> > > The only thing which could help is another register info without odd 
> > > wide subregs, but that you cannot achieve just by duplication of 
> > > instruction definitions, for that you would need to duplicate register 
> > > info as well. This is almost a new BE.
> > It's not a hack, this is how operand classes are intended to work. You 
> > wouldn't be producing these instructions on targets that don't support them 
> > (ideally we would also have a verifier for this, which is another area 
> > where subtarget handling is weak).
> > 
> > The point is to not reserve them. References to unaligned registers can 
> > exist, they just can't be used in the context of a real machine operand. 
> > D97316 switches to using dedicated classes for alignment (the further 
> > cleanup would be to have this come directly from the instruction 
> > definitions instead of fixing them up after isel).
> 
> I have replied in D97316, but I do not believe it will help as is. It will 
> run into the exact same issue as with the reserved registers approach and 
> their subregs.

[PATCH] D97420: [AMDGPU] require s-memtime-inst for __builtin_amdgcn_s_memtime

2021-02-25 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG502b3bfc6a71: [AMDGPU] require s-memtime-inst for 
__builtin_amdgcn_s_memtime (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97420/new/

https://reviews.llvm.org/D97420

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-ci.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx9.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1030.cl

Index: clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1030.cl
===
--- /dev/null
+++ clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1030.cl
@@ -0,0 +1,7 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1030 -verify -S -o - %s
+
+void test_gfx1030_s_memtime()
+{
+  __builtin_amdgcn_s_memtime(); // expected-error {{'__builtin_amdgcn_s_memtime' needs target feature s-memtime-inst}}
+}
Index: clang/test/CodeGenOpenCL/builtins-amdgcn.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -396,13 +396,6 @@
   __builtin_amdgcn_wave_barrier();
 }
 
-// CHECK-LABEL: @test_s_memtime
-// CHECK: call i64 @llvm.amdgcn.s.memtime()
-void test_s_memtime(global ulong* out)
-{
-  *out = __builtin_amdgcn_s_memtime();
-}
-
 // CHECK-LABEL: @test_s_sleep
 // CHECK: call void @llvm.amdgcn.s.sleep(i32 1)
 // CHECK: call void @llvm.amdgcn.s.sleep(i32 15)
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
@@ -130,3 +130,10 @@
 void test_ds_fmaxf(local float *out, float src) {
   *out = __builtin_amdgcn_ds_fmaxf(out, src, 0, 0, false);
 }
+
+// CHECK-LABEL: @test_s_memtime
+// CHECK: call i64 @llvm.amdgcn.s.memtime()
+void test_s_memtime(global ulong* out)
+{
+  *out = __builtin_amdgcn_s_memtime();
+}
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx9.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-gfx9.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-gfx9.cl
@@ -3,6 +3,7 @@
 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck %s
 
 #pragma OPENCL EXTENSION cl_khr_fp16 : enable
+typedef unsigned long ulong;
 
 // CHECK-LABEL: @test_fmed3_f16
 // CHECK: call half @llvm.amdgcn.fmed3.f16(half %a, half %b, half %c)
@@ -10,3 +11,10 @@
 {
   *out = __builtin_amdgcn_fmed3h(a, b, c);
 }
+
+// CHECK-LABEL: @test_s_memtime
+// CHECK: call i64 @llvm.amdgcn.s.memtime()
+void test_s_memtime(global ulong* out)
+{
+  *out = __builtin_amdgcn_s_memtime();
+}
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
@@ -4,6 +4,7 @@
 // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck %s
 
 typedef unsigned int uint;
+typedef unsigned long ulong;
 
 // CHECK-LABEL: @test_permlane16(
 // CHECK: call i32 @llvm.amdgcn.permlane16(i32 %a, i32 %b, i32 %c, i32 %d, i1 false, i1 false)
@@ -22,3 +23,10 @@
 void test_mov_dpp8(global uint* out, uint a) {
   *out = __builtin_amdgcn_mov_dpp8(a, 1);
 }
+
+// CHECK-LABEL: @test_s_memtime
+// CHECK: call i64 @llvm.amdgcn.s.memtime()
+void test_s_memtime(global ulong* out)
+{
+  *out = __builtin_amdgcn_s_memtime();
+}
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-ci.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-ci.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-ci.cl
@@ -5,6 +5,7 @@
 // RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck %s
 
 typedef unsigned int uint;
+typedef unsigned long ulong;
 
 // CHECK-LABEL: @test_s_dcache_inv_vol
 // CHECK: call void @llvm.amdgcn.s.dcache.inv.vol(
@@ -27,6 +28,13 @@
   __builtin_amdgcn_ds_gws_sema_release_all(id);
 }
 
+// CHECK-LABEL: @test_s_memtime
+// CHECK: call i64 @llvm.amdgcn.s.memtime()
+void test_s_memtime(global ulong* out)
+{
+  *out = __builtin_amdgcn_s_memtime();
+}
+
 // CHECK-LABEL: @test_is_shared(
 // CHECK: [[CAST:%[0-9]+]] = bitcast i32* %{{[0-9]+}} to i8*
 // CHECK: call i1 @llvm.amdgcn.is.shared(i8* [[CAST]]
Index: clang/include/clang/Basic/BuiltinsAMDGPU.def
===
--- clang/i

[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You need to replace HasGFX10_BEncoding with HasGFX10_AEncoding in the BVH and 
IMAGE_MSAA_LOAD_X. You also need to update llvm.amdgcn.image.msaa.load.x.ll 
test to include gfx1013.




Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:1106
   [FeatureGFX10,
FeatureGFX10_BEncoding,
FeatureGFX10_3Insts,

gfx1030 should now include FeatureGFX10_AEncoding as well. 10_B is an extension 
above 10_A.



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
 

foad wrote:
> This test surely should not pass for gfx1012, since it does not have these 
> instructions. And with your patch as written it should fail for gfx1013 too, 
> since they are predicated on HasGFX10_BEncoding.
> 
> @rampitec any idea what is wrong here? Apparently the backend will happily 
> generate image_bvh_intersect_ray instructions even for gfx900!
Indeed. MIMG_IntersectRay has this:

```
  let SubtargetPredicate = HasGFX10_BEncoding,
  AssemblerPredicate = HasGFX10_BEncoding,
```
but apparently SubtargetPredicate did not work. It needs to be fixed.

gfx1012 does not have it, gfx1013 does though. That is what the GFX10_A 
encoding is about; 10_B has to be replaced with 10_A in BVH and MSAA load.
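The corresponding fix could be sketched as below (hedged; the real 
MIMG_IntersectRay definition carries more fields than shown here):

```
// Hypothetical sketch inside the MIMG_IntersectRay definition: gate the BVH
// intersect-ray instructions on the 10_A encoding, which gfx1013 has,
// instead of 10_B, which it does not.
  let SubtargetPredicate = HasGFX10_AEncoding,
      AssemblerPredicate = HasGFX10_AEncoding,
```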


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
 

rampitec wrote:
> foad wrote:
> > This test surely should not pass for gfx1012, since it does not have these 
> > instructions. And with your patch as written it should fail for gfx1013 
> > too, since they are predicated on HasGFX10_BEncoding.
> > 
> > @rampitec any idea what is wrong here? Apparently the backend will happily 
> > generate image_bvh_intersect_ray instructions even for gfx900!
> Indeed. MIMG_IntersectRay has this:
> 
> ```
>   let SubtargetPredicate = HasGFX10_BEncoding,
>   AssemblerPredicate = HasGFX10_BEncoding,
> ```
> but apparently SubtargetPredicate did not work. It needs to be fixed.
> 
> gfx1012 does not have it, gfx1013 does though. That is what the GFX10_A 
> encoding is about; 10_B has to be replaced with 10_A in BVH and MSAA load.
Image lowering and selection are not really done like everything else. For BVH 
it just lowers the intrinsic to an opcode. I think the easiest fix is to add, 
in SIISelLowering.cpp where we lower Intrinsic::amdgcn_image_bvh_intersect_ray, 
something like this:


```
  if (!Subtarget->hasGFX10_AEncoding())
report_fatal_error(
"requested image instruction is not supported on this GPU");
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4697
+  if (!ST.hasGFX10_AEncoding()) {
+DiagnosticInfoUnsupported BadIntrin(B.getMF().getFunction(), "intrinsic 
not supported on subtarget",
+MI.getDebugLoc());

80 chars per line.



Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4700
+B.getMF().getFunction().getContext().diagnose(BadIntrin);
+B.buildUndef(MI.getOperand(0));
+MI.eraseFromParent();

Just return false like in other places.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7341
+if (!Subtarget->hasGFX10_AEncoding())
+  emitRemovedIntrinsicError(DAG, DL, Op.getValueType());
+

return emitRemovedIntrinsicError();



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: not llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs 
< %s -o /dev/null 2>&1 | FileCheck -check-prefix=ERR %s
 

not --crash llc ...



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
 

bcahoon wrote:
> rampitec wrote:
> > rampitec wrote:
> > > foad wrote:
> > > > This test surely should not pass for gfx1012, since it does not have 
> > > > these instructions. And with your patch as written it should fail for 
> > > > gfx1013 too, since they are predicated on HasGFX10_BEncoding.
> > > > 
> > > > @rampitec any idea what is wrong here? Apparently the backend will 
> > > > happily generate image_bvh_intersect_ray instructions even for gfx900!
> > > Indeed. MIMG_IntersectRay has this:
> > > 
> > > ```
> > >   let SubtargetPredicate = HasGFX10_BEncoding,
> > >   AssemblerPredicate = HasGFX10_BEncoding,
> > > ```
> > > but apparently SubtargetPredicate did not work. It needs to be fixed.
> > > 
> > > gfx1012 does not have it, gfx1013 does though. That is what the GFX10_A 
> > > encoding is about; 10_B has to be replaced with 10_A in BVH and MSAA 
> > > load.
> > Image lowering and selection is not really done like everything else. For 
> > BVH it just lowers intrinsic to opcode. I think the easiest fix is to add 
> > to SIISelLowering.cpp where we lower 
> > Intrinsic::amdgcn_image_bvh_intersect_ray something like this:
> > 
> > 
> > ```
> >   if (!Subtarget->hasGFX10_AEncoding())
> > report_fatal_error(
> > "requested image instruction is not supported on this GPU");
> > ```
> I ended up using emitRemovedIntrinsicError, which uses 
> DiagnosticInfoUnsupported. This way the failure isn't a crash dump.

Diagnostics is a good thing, but we still have to fail the compilation.



Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll:3
+; RUN: llc -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s | FileCheck 
-check-prefix=GCN %s
+; RUN: not llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 2>&1 | 
FileCheck -check-prefix=ERR %s
 

not --crash llc


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 
| FileCheck -check-prefix=GCN %s
 

bcahoon wrote:
> rampitec wrote:
> > bcahoon wrote:
> > > rampitec wrote:
> > > > rampitec wrote:
> > > > > foad wrote:
> > > > > > This test surely should not pass for gfx1012, since it does not 
> > > > > > have these instructions. And with your patch as written it should 
> > > > > > fail for gfx1013 too, since they are predicated on 
> > > > > > HasGFX10_BEncoding.
> > > > > > 
> > > > > > @rampitec any idea what is wrong here? Apparently the backend will 
> > > > > > happily generate image_bvh_intersect_ray instructions even for 
> > > > > > gfx900!
> > > > > Indeed. MIMG_IntersectRay has this:
> > > > > 
> > > > > ```
> > > > >   let SubtargetPredicate = HasGFX10_BEncoding,
> > > > >   AssemblerPredicate = HasGFX10_BEncoding,
> > > > > ```
> > > > > but apparently SubtargetPredicate did not work. It needs to be fixed.
> > > > > 
> > > > > gfx1012 does not have it, gfx1013 does though. That is what the 
> > > > > GFX10_A encoding is about; 10_B has to be replaced with 10_A in BVH 
> > > > > and MSAA load.
> > > > Image lowering and selection is not really done like everything else. 
> > > > For BVH it just lowers intrinsic to opcode. I think the easiest fix is 
> > > > to add to SIISelLowering.cpp where we lower 
> > > > Intrinsic::amdgcn_image_bvh_intersect_ray something like this:
> > > > 
> > > > 
> > > > ```
> > > >   if (!Subtarget->hasGFX10_AEncoding())
> > > > report_fatal_error(
> > > > "requested image instruction is not supported on this GPU");
> > > > ```
> > > I ended up using emitRemovedIntrinsicError, which uses 
> > > DiagnosticInfoUnsupported. This way the failure isn't a crash dump.
> > 
> > Diagnostics is a good thing, but we still have to fail the compilation.
> The diagnostic is marked as an error, so the compilation fails in that llc 
> returns a non-zero return code. This mechanism is used in other places in the 
> back-end to report similar types of errors. The alternative, if I understand 
> correctly, is that a crash occurs with an error message that indicates that 
> the bug is in LLVM (rather than the input source file).
We do not seem to be consistent here and return either undef or SDValue(), but 
as far as I can see we never continue selecting code, as here in 
SIISelLowering, and we always return false from the AMDGPUInstructionSelector.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-08 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4700
+B.getMF().getFunction().getContext().diagnose(BadIntrin);
+B.buildUndef(MI.getOperand(0));
+MI.eraseFromParent();

rampitec wrote:
> Just return false like in other places.
Just return false. I see that it is like this in the whole file.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-08 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4701
+B.getMF().getFunction().getContext().diagnose(BadIntrin);
+B.buildUndef(MI.getOperand(0));
+MI.eraseFromParent();

You can just omit undef and erase.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663



[PATCH] D104804: [AMDGPU] Add gfx1035 target

2021-06-23 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104804/new/

https://reviews.llvm.org/D104804



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

Needs an IR test, a test for different supported targets, and a negative test 
for unsupported features.




Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:199
 
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1fi", "t", 
"gfx90a-insts")

The correct attribute for this one is atomic-fadd-insts. In particular, it was 
first added in gfx908 and you would need to test that too.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:205
+
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_f64, "dd*1di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmin_f64, "dd*1di", "t", 
"gfx90a-insts")

Flat address space is 0.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:210
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3fi", "t", 
"gfx90a-insts")
+

This is available since gfx8. Attribute gfx8-insts.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

You do not need any of that code. You can directly map a builtin to an 
intrinsic in IntrinsicsAMDGPU.td.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics.cl:112
+kernel void test_flat_global_max(__global double *addr, double x){
+  __builtin_amdgcn_flat_atomic_fmax_f64(addr, x, memory_order_relaxed);
+}

arsenm wrote:
> gandhi21299 wrote:
> > arsenm wrote:
> > > If you're going to bother testing the ISA, is it worth testing rtn and no 
> > > rtn versions?
> > Sorry, what do you mean by rtn version?
> Most atomics can be optimized if they don't return the in-memory value, i.e. 
> when the value is unused
Certainly yes, because global_atomic_add_f32 did not have a return version on 
gfx908.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

You cannot do it in generic LLVM code; it simply has no knowledge of the 
reason for the BE's choice.




Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:598
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "An unsafe hardware instruction was generated.";
+  return Remark;

arsenm wrote:
> Unsafe is misleading, plus this is being too specific to AMDGPU
Having UnsafeFPAtomicFlag does not automatically mean the HW instruction 
produced is unsafe. Moreover, you simply cannot know here why this or that 
decision was made by a target method.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921048 , @gandhi21299 
wrote:

> @rampitec should the unsafe check go in some pass later in the pipeline then?

No. The only place which has all the knowledge is 
`SITargetLowering::shouldExpandAtomicRMWInIR()`. That is where diagnostics 
shall be emitted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

gandhi21299 wrote:
> rampitec wrote:
> > You do not need any of that code. You can directly map a builtin to 
> > intrinsic in the IntrinsicsAMDGPU.td.
> Sorry, I looked around for several days but I could not figure this out. Is 
> there a concrete example?
Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

arsenm wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > You do not need any of that code. You can directly map a builtin to 
> > > > intrinsic in the IntrinsicsAMDGPU.td.
> > > Sorry, I looked around for several days but I could not figure this out. 
> > > Is there a concrete example?
> > Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.
> This is not true if the intrinsic requires type mangling. GCCBuiltin is too 
> simple to handle it
Yes, but these do not need it. All of these builtins are specific.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > You do not need any of that code. You can directly map a builtin to 
> > > > > > intrinsic in the IntrinsicsAMDGPU.td.
> > > > > Sorry, I looked around for several days but I could not figure this 
> > > > > out. Is there a concrete example?
> > > > Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.
> > > This is not true if the intrinsic requires type mangling. GCCBuiltin is 
> > > too simple to handle it
> > Yes, but these do not need it. All of these builtins are specific.
> These intrinsics are all mangled based on the FP type
Ah, right. Intrinsics are mangled, builtins not. True. OK, this shall be code 
then.
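
Since overloaded intrinsics carry a type suffix while builtin names are fixed,
each specific builtin has to be mapped to its mangled intrinsic in C++ rather
than via `GCCBuiltin`. A toy standalone sketch of the naming scheme (this
helper is illustrative only, not the real LLVM mangler):

```cpp
#include <string>

// Toy model of overloaded-intrinsic naming: one declaration in
// IntrinsicsAMDGPU.td yields llvm.amdgcn.flat.atomic.fmin.f32,
// llvm.amdgcn.flat.atomic.fmin.f64, and so on, while a builtin such as
// __builtin_amdgcn_flat_atomic_fmin_f64 stays a single fixed symbol.
std::string mangledIntrinsicName(const std::string &Base,
                                 const std::string &TypeSuffix) {
  return Base + "." + TypeSuffix;
}
```

This is why the CGBuiltin.cpp snippet above selects the argument type first and
then asks for the intrinsic specialized on it via `CGM.getIntrinsic(IID, {ArgTy})`.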


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

Should we map flags since we already have them?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921096 , @gandhi21299 
wrote:

> @rampitec Since remarks cannot be emitted in SIISelLowering because it isn't 
> a pass, in what form can I emit the diagnostics in SIISelLowering?

You could pass ORE to the TLI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921108 , @gandhi21299 
wrote:

> How can I construct an ORE to start off with? I don't think it's appropriate
> to construct it in `shouldExpandAtomicRMWInsts(RMW)`

You have already constructed it. You can just pass it to 
`shouldExpandAtomicRMWInsts`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106909#2922567 , @gandhi21299 
wrote:

> @rampitec how do I handle the following?
>
>   builtins-fp-atomics.cl:38:10: error: 
> '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature 
> atomic-fadd-insts
> *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, 
> memory_order_relaxed);
>^

It is f64, it needs gfx90a-insts. atomic-fadd-insts is for global f32.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

gandhi21299 wrote:
> rampitec wrote:
> > Should we map flags since we already have them?
> Do you mean the memory order flag?
All 3: ordering, scope and volatile.
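
For reference, the ordering flag the builtin call site carries corresponds to
LLVM's atomic orderings roughly as sketched below — a standalone illustration
using the C11/C++11 `memory_order` numeric values, not the actual Clang
CodeGen mapping code:

```cpp
#include <string>

// Standalone sketch: C++11 memory_order enumerators (relaxed=0, consume=1,
// acquire=2, release=3, acq_rel=4, seq_cst=5) and the LLVM IR ordering each
// one lowers to. The scope and volatile flags would be threaded through
// similarly if the builtins kept their immediate arguments.
std::string llvmOrderingFor(int MemoryOrder) {
  switch (MemoryOrder) {
  case 0: return "monotonic";   // memory_order_relaxed
  case 1: return "acquire";     // memory_order_consume (lowered as acquire)
  case 2: return "acquire";     // memory_order_acquire
  case 3: return "release";     // memory_order_release
  case 4: return "acq_rel";     // memory_order_acq_rel
  case 5: return "seq_cst";     // memory_order_seq_cst
  default: return "invalid";
  }
}
```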


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106909#2923724 , @gandhi21299 
wrote:

> @rampitec what should I be testing exactly in the IR test?

The produced call to the intrinsic. All of the tests there are doing that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > Should we map flags since we already have them?
> > > Do you mean the memory order flag?
> > All 3: ordering, scope and volatile.
> Following the discussion, what change is required here?
Keep the zeroes, drop the immediate argument of the builtins.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

It still does not do anything useful and still produces useless, wrong and 
misleading remarks for all targets.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:595
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "A hardware instruction was generated";
+  return Remark;

Nothing has been generated just yet; the pass just left the IR instruction
untouched. In the common case we cannot say what an abstract BE will do with
it later.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12139
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A hardware instruction was generated";
+return Remark;

It was not generated. We have multiple returns below this point. Some of them 
return None and some CmpXChg for various reasons. The request was to report 
when we produce the instruction *if* it is unsafe, not just that we are about 
to produce an instruction.

Then, to make it useful, a remark should state the reason for either producing
an instruction or expanding it. Looking at a stream of remarks in a big
program, one would also want to understand what exactly was expanded and what
was left as is. A stream of messages saying "A hardware instruction was
generated" is unlikely to help anyone understand what was done.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2925692 , @gandhi21299 
wrote:

> - eliminated the scope argument as per discussion
> - added more tests

You have updated the wrong patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

JBTW, the patch title is way too long.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:201
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmin_f64, "dd*1d", "t", 
"gfx90a-insts")

gandhi21299 wrote:
> I tried to add the target feature gfx908-insts for this builtin but the
> frontend complains that it should have the target feature gfx90a-insts.
That was for global_atomic_fadd_f32, but as per discussion we are going to use
the builtin only starting from gfx90a because of the noret problem. The
comments in the review are off their positions after multiple patch updates.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:210
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3d", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3f", "t", "gfx8-insts")
+

This needs tests with a gfx8 target and a negative test with gfx7.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:13
+// GFX90A:  global_atomic_add_f64
+void test_global_add(__global double *addr, double x) {
+  double *rtn;

Use _f64 or _double in the test name.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:31
+// GFX90A:  global_atomic_min_f64
+void test_global_global_min(__global double *addr, double x){
+  double *rtn;

Same here and in other double tests, use a suffix f64 or double.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:67
+// GFX90A:  flat_atomic_min_f64
+void test_flat_min_constant(__generic double *addr, double x){
+  double *rtn;

'constant' is wrong. It is flat. Here and everywhere.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx908.cl:7
+  double *rtn;
+  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x); // 
expected-error{{'__builtin_amdgcn_global_atomic_fadd_f64' needs target feature 
gfx90a-insts}}
+}

Need to check all the other builtins too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12139
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A hardware instruction was generated";
+return Remark;

gandhi21299 wrote:
> rampitec wrote:
> > It was not generated. We have multiple returns below this point. Some of 
> > them return None and some CmpXChg for various reasons. The request was to 
> > report when we produce the instruction *if* it is unsafe, not just that we 
> > are about to produce an instruction.
> > 
> > Then to make it useful a remark should tell what was the reason to either 
> > produce an instruction or expand it. Looking at a stream of remarks in a 
> > big program one would also want to understand what exactly was expanded and 
> > what was left as is. A stream of messages "A hardware instruction was 
> > generated" unlikely will help to understand what was done.
> Will the hardware instruction be generated in the end of this function then?
It will not be generated here to begin with. If the function returns None, the
atomicrmw will just be left as is and later selected into the instruction. But
if you read the function, it has many returns for many different reasons, and
that is exactly what a useful remark shall report.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

The title should not mention gfx90a, as that is not accurate.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl:8
+}
\ No newline at end of file


Add new line.



Comment at: clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl:1
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu 
gfx908 \

Combine all of these gfx908 error tests into a single file, for example like
builtins-amdgcn-dl-insts-err.cl. It is also better to rename these test
filenames to follow the existing pattern: builtins-amdgcn-*-err.cl


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12146
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A floating-point atomic instruction will generate an unsafe"
+  " hardware instruction";

It would not necessarily generate a HW instruction. There are still cases where 
we return CmpXChg.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "A floating-point atomic instruction with no following use"
+" will generate an unsafe hardware instruction";

I do not understand this message about the use. We are checking the use below 
simply because there was no return version of global_atomic_add_f32 on gfx908, 
so we are forced to expand it.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12165
 
-  return RMW->use_empty() ? AtomicExpansionKind::None
-  : AtomicExpansionKind::CmpXChg;
+  if (RMW->use_empty()) {
+if (RMW->getFunction()

That's a lot of churn. Please create a function returning AtomicExpansionKind,
pass what you are going to return into that function, return that argument
from the function, and also pass a string for diagnostics to emit from there.
Replace the returns here with calls to it. Like:

`return reportAtomicExpand(AtomicExpansionKind::None, ORE, "Produced HW atomic 
is unsafe and might not update memory");`
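
A minimal standalone sketch of that helper pattern (the `reportAtomicExpand`
name follows the suggestion above, and the plain-struct remark emitter is a
stand-in for illustration, not the actual LLVM class):

```cpp
#include <string>
#include <vector>

// Stand-ins for the LLVM types involved, for illustration only.
enum class AtomicExpansionKind { None, CmpXChg };

struct RemarkEmitter {
  std::vector<std::string> Messages;
  void emit(const std::string &Msg) { Messages.push_back(Msg); }
};

// Single choke point: record the reason for the decision and forward the
// chosen expansion kind unchanged, so every return site in
// shouldExpandAtomicRMWInIR() becomes a one-line call.
AtomicExpansionKind reportAtomicExpand(AtomicExpansionKind Kind,
                                       RemarkEmitter &ORE,
                                       const std::string &Reason) {
  ORE.emit(Reason);
  return Kind;
}
```

Each return site then carries its own explanation, e.g.
`reportAtomicExpand(AtomicExpansionKind::CmpXChg, ORE, "expanded to a CAS
loop")`, which keeps the decision and the emitted remark in one place.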



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12198
+  Remark
+  << "A floating-point atomic instruction will generate an unsafe"
+ " hardware instruction";

This one might be unsafe not because of the cache it works on, but because it
might not follow the denorm mode.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891


