from:"Stanislav Mekhanoshin via Phabricator via cfe-commits"

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "A floating-point atomic instruction with no following use" +" will

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12118 -TargetLowering::AtomicExpansionKind -SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const { +TargetLowering::AtomicExpansionKind SITargetLowering::reportAtomicExpand( +

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/fp-atomics-optremarks-gfx90a.cl:23 + +// GFX90A-HW: A floating-point atomic instruction will generate an unsafe hardware instruction which may fail to update memory [-Rpass=si-lower] +// GFX90A-HW-LABEL: test_a

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec requested changes to this revision. rampitec added inline comments. This revision now requires changes to proceed. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:588 + Remark + << "A hardware CAS loop generated: if the memory is " + "known to

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120 + OptimizationRemarkEmitter *ORE, + OptimizationRemark OptRemark) { + ORE->emit([&]() { return OptRemark; }); gandhi21299 wrote: > ram

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120 + OptimizationRemarkEmitter *ORE, + OptimizationRemark OptRemark) { + ORE->emit([&]() { return OptRemark; }); gandhi21299 wrote: > gan

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-10 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:628 + AI, Kind, + Remark << "A hardware CAS loop generated: if the memory is " +"known to be coarse-grain allocated then a hardware " Still the

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-11 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D106891#2938128 , @gandhi21299 wrote: > @rampitec besides the remarks, am I missing anything else in the patch? You should not use AMD specific code in the common code. Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D106891#2940411 , @gandhi21299 wrote: > - eliminated unsafe hardware remarks in SIISelLowering.cpp Most of this patch is not needed now. You do not need to pass ORE to targets, it is a part of the next patch. ===

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. You also need to retitle it now, it is not about AMDGPU and not about FP. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___ cfe-comm

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:629 + Remark << "A compare and swap loop was generated for an atomic " +"operation " +"at " Need to name the operation. Repository

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631 +"at " + << (AI->getSyncScopeID() ? "system" : "single thread") + << " memory scope"); gandhi21299 wrote: > rampitec wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618 expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun); + Ctx.getSyncScopeNames(SSNs); + auto MemScope = SSNs[AI->getSyncScopeID()].empty() Only if SSNs.empty().

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10 + +// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope +// GFX90A-CAS-LABEL: _Z14atomic_add_casPf gandhi21299 wrote: >

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Please retitle it without AMDGPU and remove the changes to pass ORE to targets. It is not a part of this change, it is a part of the follow-up target-specific change. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Please restore opencl test. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:622 +return OptimizationRemark(DEBUG_TYPE, "Passed", AI->getFunction()) + << "A compare and swap loop was generated for an " + << AI->getO

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:32 +// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic +float atomic_cas_system(__global atomic_float *d, float a) { + return __opencl_at

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:33 +float atomic_cas(__global atomic_float *d, float a) { + return __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group); +} Just combine all

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175 + ORE = std::make_unique<OptimizationRemarkEmitter>(&F); auto &TM = TPC->getTM(); Is there a reason to construct it upfront and not just use a local variable only when needed? Like in StackProtecto

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175 + ORE = std::make_unique<OptimizationRemarkEmitter>(&F); auto &TM = TPC->getTM(); gandhi21299 wrote: > rampitec wrote: > > Is there a reason to construct it upfront and not just use a local variable

[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. LGTM, but please wait for others too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___ cfe-commits

[PATCH] D108150: [Remarks] Emit optimization remarks for atomics generating hardware instructions

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. - Add [AMDGPU] to the title. - Rebase on top of D106891 . - Add tests to atomics-remarks-gfx90a.ll as well, including LDS with matching and non-matching rounding mode. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9 +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \ You are c

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9 +// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \ +// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \ gandhi212

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12195 + if (!fpModeMatchesGlobalFPAtomicMode(RMW)) +return reportUnsafeHWInst(RMW, AtomicExpansionKind::None); gandhi21299 wrote: > rampitec wrote: > > rampitec w

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. Logic is still wrong. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108150/new/ https://reviews.llvm.org/D108150

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D108150#2950458 , @gandhi21299 wrote: > @rampitec Which part of the logic is wrong? Still the same around LDS. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108150/new/ https

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D108150#2950479 , @gandhi21299 wrote: > My understanding is that since we are reporting unsafe expansion into hw > instructions, `fpModeMatchesGlobalFPAtomicMode(RMW)` must be false to match > the logic. Please run check-l

[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12194 - return (fpModeMatchesGlobalFPAtomicMode(RMW) || - RMW->getFunction() - ->getFnAttribute("amdgpu-unsafe-fp-atomics") - .getVa

[PATCH] D81886: [AMDGPU] Add gfx1030 target

2021-06-25 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:1245 + +def HasDsSrc2Insts : Predicate<"!Subtarget->hasDsSrc2Insts()">, + AssemblerPredicate<(all_of FeatureDsSrc2Insts)>; foad wrote: > The `!` is obviously wrong in this definition, b

[PATCH] D97069: [clang] SimpleMFlag helper in Options.td

2021-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG21280d35d652: [clang] SimpleMFlag helper in Options.td (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits.

[PATCH] D97928: [AMDGPU] Restore the s_memtime instruction in gfx1030

2021-03-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM, thanks! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D97928/new/ https://reviews.llvm.org/D97928

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D115032/new/ https://reviews.llvm.org/D115032 ___

[PATCH] D133966: [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn

2022-09-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe540965915a4: [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-

[PATCH] D142493: [AMDGPU] Remove dot1 and dot6 features from clang for gfx11

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG4ab2246d486b: [AMDGPU] Remove dot1 and dot6 features from clang for gfx11 (authored by rampitec). Herald added a project: clang. Herald added a subsc

[PATCH] D142407: [AMDGPU] Split dot8 feature

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG870b92977e89: [AMDGPU] Split dot8 feature (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits. Repository:

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-01-26 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. rampitec marked an inline comment as done. Closed by commit rGdf0488369d32: [AMDGPU] Split dot7 feature (authored by rampitec). Herald added a project: clang. Herald ad

[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl:1 // REQUIRES: amdgpu-registered-target Also need positive tests like in builtins-amdgcn-dl-insts.cl. Comment at: llvm/include/llvm/IR/Intri

[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D127904/new/ https://reviews.llvm.org/D127904 ___

[PATCH] D76472: AMDGPU: Emit llvm.fshr for __builtin_amdgcn_alignbit

2020-03-23 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76472/new/ https://reviews.llvm.org/D76472 ___ cfe-commits mailing list cfe-commi

[PATCH] D77329: [AMDGPU] Allow AGPR in inline asm

2020-04-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. Thanks. Could you also update AMDGPUTargetInfo::GCCRegNames[] (in a separate change)? It is used in clobber constraints. BTW, it does not support register tuples even for V and S now. C

[PATCH] D77329: [AMDGPU] Allow AGPR in inline asm

2020-04-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added inline comments. Comment at: clang/test/CodeGenOpenCL/inline-asm-amdgcn.cl:16 + float reg_b; + float reg_c; + // CHECK: call <32 x float> asm "v_mfma_f32_32x32x1f32 $0, $1, $2, $3", "=a,v,v,a,~{a0},~{a1},~{a2},~{a3},~{a4},~{a5},~{a6},~{a7},~{a8},~{a9},~{a10},~

[PATCH] D76076: [HIP] Mark kernels with uniform-work-group-size=true

2020-03-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D76076/new/ https://reviews.llvm.org/D76076 ___ cfe-commits mailing list cfe-commi

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-27 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. You cannot just enable it on gfx908 which does not have return version of it. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D146840/

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Can you please also add gfx90a and gfx940 tests? Otherwise LGTM *if* @b-sumner has no objections. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D146840/new/ https://reviews.llvm.org/D146840 ___

[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM. Please wait for @b-sumner. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D146840/new/ https://reviews.llvm.org/D146840 ___

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Isn't it simpler to lower it to an existing int intrinsic and casts in clang? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D147732/new/ https://reviews.llvm.org/D147732 ___ cfe-

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D147732#4249584 , @jrbyrnes wrote: > In D147732#4249567 , @rampitec > wrote: > >> Isn't it simpler to lower it to an existing int intrinsic and casts in clang? > > Thanks for your com

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D147732#4267553 , @foad wrote: > Changing the existing intrinsics to use type mangling could break clients > like LLPC and Mesa. I've put up a patch for LLPC to protect it against this > change: https://github.com/GPUOpen-Dr

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4125940 , @aaronmondal wrote: > Would it be possible to backport this to Clang 16? > > If > https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be > makes it into ROCm

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4126864, @aaronmondal wrote: >> It shall be complemented by the device-lib change in the corresponding >> release, so it is not that simple. > > @rampitec I'm not sure I understand. Does this mean that this is break

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4127167 , @aaronmondal wrote: > Well, I can already feel the pain that distro maintainers having to build the > next ROCm releases 😅 > > I wonder what the better course of action is here: > > 1. Port this patch to Cl

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4127275 , @aaronmondal wrote: >> I cannot say there was much choice. The only real choice was to postpone the >> split and magnify the problem in the future. As for the ifdefs, this might >> be possible in the devic

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4127374 , @aaronmondal wrote: > I think unless conflicts arise creating an issue similar to this > https://github.com/llvm/llvm-project/issues/60600 with the `cherry-pick` line > set to this commit should be enough.

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4127421 , @b-sumner wrote: > I have no objection to backporting this, but it may need to be accompanied > with a device-libs patch, and I don't know where that patch would be checked > in. The ROCm-Device-Libs in gi

[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D142507#4127505 , @b-sumner wrote: >> My current understanding is the c-p will go into already forked clang-16, >> but not to rocm 5.4. So rocm device-libs will be accompanied by the older >> clang-16 w/o this and stay compa

[PATCH] D148796: [AMDGPU][GFX908] Add builtin support for global add atomic f16/f32

2023-04-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec requested changes to this revision. rampitec added a comment. This revision now requires changes to proceed. We used to support it that way and decided just not doing it. It is very hard to explain why a supported atomic results in error. Someone who really needs it can use intrinsic.

[PATCH] D31210: [AMDGPU] Add new address space mapping

2017-03-21 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. I'm concerned about the default address space to be 64 bit. It would move alloca into generic address space effectively making private address to be 64 bit. This may have very undesirable performance implications, like address arithmetic can become expensive 64 bit and

[PATCH] D31210: [AMDGPU] Add new address space mapping

2017-03-22 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. I also do not exactly like names "old" and "new". This implies we are going to switch to "new" permanently and doing transition. That is not clear yet, however. https://reviews.llvm.org/D31210 ___ cfe-commits mailing list

[PATCH] D31210: [AMDGPU] Add new address space mapping

2017-03-22 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In https://reviews.llvm.org/D31210#707842, @yaxunl wrote: > In https://reviews.llvm.org/D31210#707832, @rampitec wrote: > > > I also do not exactly like names "old" and "new". This implies we are going > > to switch to "new" permanently and doing transition. That is not

[PATCH] D120688: [AMDGPU] Add gfx940 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG2e2e64df4a4f: [AMDGPU] Add gfx940 target (authored by rampitec). Herald added projects: clang, OpenMP. Herald added subscribers: openmp-commits, cfe-

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Please also update these 2 files: clang/test/Driver/cuda-bad-arch.cu openmp/libomptarget/DeviceRTL/CMakeLists.txt In fact the last one was not updated before too, so the last target gfx1031 there. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION htt

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. You also need to rebase it, I have just landed gfx940 target. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846 ___ cfe-commits mailing

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Looks like cuda-bad-arch.cu does not have any gfx10. Let's fix this in a separate followup patch. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846 ___

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120846/new/ https://reviews.llvm.org/D120846 ___

[PATCH] D121028: [AMDGPU] new gfx940 fp atomics

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG932f628121d8: [AMDGPU] new gfx940 fp atomics (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits. Reposito

[PATCH] D121172: [AMDGPU] Set noclobber metadata on loads instead of cast to constant

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG9eabea396814: [AMDGPU] Set noclobber metadata on loads instead of cast to constant (authored by rampitec). Herald added a project: clang. Herald adde

[PATCH] D102022: [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32

2021-05-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGc714d037857f: [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32 (authored by rampitec). Herald added a project: clang. Herald added a subscriber:

[PATCH] D98717: [AMDGPU] Split dot2-insts feature

2021-03-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec accepted this revision. rampitec added a comment. This revision is now accepted and ready to land. LGTM. Thanks Jay! Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:511 +def FeatureDot7Insts : SubtargetFeature<"dot7-insts", + "HasDot7Insts", + "true", a

[PATCH] D96906: [AMDGPU] gfx90a support

2021-03-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec marked an inline comment as done. rampitec added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:100 bool tryFoldOMod(MachineInstr &MI); + bool tryFoldRegSeqence(MachineInstr &MI); + bool tryFoldLCSSAPhi(MachineInstr &MI); foa

[PATCH] D100072: [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode

2021-04-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes. Closed by commit rG189310a140fa: [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode (authored by rampitec). Herald added a project: clang. Herald added a subscriber: cfe-commits. Repository: rG LLVM Github Mon

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. Is there anything to remove assume() call after address space is inferred? We do not need it anymore. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D112041/new/ https://reviews.llvm.org/D112041 ___

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D112041#3073637 , @hliao wrote: > In D112041#3073560 , @rampitec > wrote: > >> Is there anything to remove assume() call after address space is inferred? >> We do not need it anymore

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits

rampitec added a comment. In D112041#3074418 , @hliao wrote: > In D112041#3073676 , @rampitec > wrote: > >> In D112041#3073637 , @hliao wrote: >> >>> In D112041#3073560 <
