[PATCH] D30551: [AMDGPU] Add builtin functions readlane ds_permute mov_dpp

2017-03-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D30551 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-c

[PATCH] D30316: AMDGPU: Make 0 the private nullptr value

2017-03-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. r297658, r297659 https://reviews.llvm.org/D30316 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32084: AMDGPU/GFX9: Set +fast-fmaf for >=gfx900 unless -cl-denorms-are-zero is set

2017-04-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: lib/Basic/Targets.cpp:2208-2210 + TargetOpts.Features.push_back( + (Twine(hasFullSpeedFMAF32(TargetOpts.CPU) && + !CGOpts.FlushDenorm ? '+' : '-') + Twine("fast-fmaf")).str()); We glue fast-fmaf to

[PATCH] D110869: [X86] Implement -fzero-call-used-regs option

2022-02-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/CodeGen/PrologEpilogInserter.cpp:1259 + +for (const MCPhysReg &SReg : + TRI.sub_and_superregs_inclusive(MO.getReg())) No reference Comment at: llvm/lib/CodeGen/PrologEpilogI

[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

2022-02-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D120566#3345604 , @yaxunl wrote: > One of my concerns is that all kernels are duplicated which may cause code > object size doubled. Not really, the kernel should just be a stub that calls the real implementation function. In

[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

2022-02-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:9238 +static llvm::Function *getKernelClone(llvm::Function &F) { + llvm::Module *M = F.getParent(); I don't think we can really start with the function IR. The TargetABIInfo could be d

[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

2022-02-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D120566#3346506 , @rjmccall wrote: > Is there something which stops you from taking the address of a kernel and > then calling it? If not, are there actually any uses of kernels in the > module that shouldn't be rewritten as

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D69498#2707100 , @sameerds wrote: >>> 1. The meaning of the `convergent` attribute has always been >>> target-dependent. >>> 2. Dependence on TTI is not a real cost at all. We may eventually update >>> every use of isConvergent

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D69498#2709218 , @sameerds wrote: > 1. It is actually okay add control dependences to a convergent function as > long as the new branches are uniform. > 2. Some times we hack a transform to avoid doing things that cannot be > d

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-29 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:131-133 + "invalid argument '-mno-amdgpu-ieee' only allowed with floating point options " + "which do not honor NaNs, e.g. '-fno-honor-nans', '-ffast-math', " + "'-ffinite-math-only',

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added inline comments. This revision is now accepted and ready to land. Comment at: clang/include/clang/Driver/Options.td:3186 + "gathering quiet and propagate signaling NaN inputs per IEEE 754-2008 " + "(AMDGPU only)">, + NegFlag>, Group;

[PATCH] D102134: [docs]Updated the AMD GPU Attributes documentation

2021-05-11 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/include/clang/Basic/AttrDocs.td:2244-2247 +and is an optimization hint. It is mandatory to use this attribute in some +situations. Because when the attribute is absent, the compiler assumes the +default maximum workgroup size of 2

[PATCH] D115153: clang/AMDGPU: Don't set implicit arg attribute to default size

2022-01-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. 33315ef3216be6edcfb4a6577150682b80a18766 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D115153/new/ https://reviews.llvm.org/D115153 __

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. There's no real advantage to using addrspace(4) at all Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D115661/new/ https://reviews.llvm.org/D115661 ___ cfe-commits mailing list cfe-

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D115661#3190041 , @arsenm wrote: > There's no real advantage to using addrspace(4) at all There are a few places where it's used as an optimization hint / as a crutch where we don't have a proper analysis. Fundamentally I woul

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D115661#3190324 , @JonChesterfield wrote: > Patch looks ok to me. This will fix the miscompile (we end up with a store to > addrspace(4) at present) without upsetting whatever hacks rely on > addrspace(4). @arsenm reasonable

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D115661#3193152 , @yaxunl wrote: > In D115661#3192983 , @estewart08 > wrote: > >> In D115661#3190477 , @yaxunl wrote: >> >>> This may cause per

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D115661#3193431 , @yaxunl wrote: > What about situations of a derived pointer to the global variable? For example > > const int a[100] ; > > foo(&a[50]); > > If we put a in addr space 4, it is easy to deduce &a[50] is con

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision. arsenm added a comment. Dropping the default install location from the default search hint is entirely unreasonable Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109885/new/ https://reviews.llvm.org/D1098

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D109885#3198290 , @JonChesterfield wrote: > In D109885#3198232 , @arsenm wrote: > >> Dropping the default install location from the default search hint is >> entirely unreasonable > >

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:13 + +find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATH ${ROCM_PATH}) if (NOT ${hsa-runtime64_FOUND}) I also think CMAKE_INSTALL_PREFIX in the search hints

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D109885#3198340 , @JonChesterfield wrote: > In D109885#3198296 , @arsenm wrote: > >> This isn't a feature, it's the introduction of a bug. > > It would regress people depending on the i

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D109885#3202213 , @JonChesterfield wrote: > In D109885#3198375 , @arsenm wrote: > >> In D109885#3198340 , >> @JonChesterfield wrote: >> >>> I'

[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D109885#3203444 , @JonChesterfield wrote: > In D109885#3203412 , @arsenm wrote: > >> We should have versioned libraries and well defined ABI breaks. The build >> should know what runti

[PATCH] D98717: [AMDGPU] Split dot2-insts feature

2021-03-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:511 +def FeatureDot7Insts : SubtargetFeature<"dot7-insts", + "HasDot7Insts", + "true", I'm not sure where the "7" is coming from Repository: rG LLVM Github Monorepo CHANGES SINCE LA

[PATCH] D98146: OpaquePtr: Turn inalloca into a type attribute

2021-03-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/IR/Attributes.cpp:490 if (Type *Ty = getValueAsType()) { - raw_string_ostream OS(Result); + // FIXME: This should never be null Result += '('; dblaikie wrote: > arsenm wrote: > > dblaikie wr

[PATCH] D98146: OpaquePtr: Turn inalloca into a type attribute

2021-03-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. 4fefed65637ec46c8c2edad6b07b5569ac61e9e5 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D98146/new/ https://reviews.llvm.org/D98146 ___ cfe-commits mailing list cfe-commits@lists.llvm.org htt

[PATCH] D88978: [WIP] Attach debug intrinsics to allocas, and use correct address space

2021-03-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Is this still needed? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D88978/new/ https://reviews.llvm.org/D88978 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://list

[PATCH] D99812: [PowerPC] [GlobalISel] Implementation of formal arguments lowering in the IRTranslator for the PPC backend

2021-04-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Attached wrong diff? There are a lot of unrelated files Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D99812/new/ https://reviews.llvm.org/D99812 ___ cfe-commits mailing list cfe-c

[PATCH] D98783: [AMDGPU] Add GlobalDCE before internalization pass

2021-04-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:584 + PM.addPass(GlobalDCEPass()); PM.addPass(InternalizePass(mustPreserveGV)); } Should we move where the internalize pass is added instead? C

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision. arsenm added inline comments. This revision now requires changes to proceed. Comment at: clang/lib/Frontend/CompilerInvocation.cpp:1951 + Args.hasFlag(options::OPT_mamdgpu_ieee, options::OPT_mno_amdgpu_ieee, + !Lan

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D77013#2702129 , @yaxunl wrote: > The recent change https://reviews.llvm.org/D96280 caused some difficulty for > this patch. I would like to have some suggestions. > > Basically current FE requires any codegen or target option s

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/Frontend/CompilerInvocation.cpp:1951 + Args.hasFlag(options::OPT_mamdgpu_ieee, options::OPT_mno_amdgpu_ieee, + !LangOptsRef.NoHonorNaNs); + yaxunl wrote: > arsenm wrote: > > This should no

[PATCH] D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee

2021-04-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:130-131 "invalid argument '%0' only allowed with '%1'">; +def err_drv_argument_only_allowed_with_or_equivalent : Error< + "invalid argument '%0' only allowed with '%1' or equivalent">

[PATCH] D99812: [PowerPC] [GlobalISel] Implementation of formal arguments lowering in the IRTranslator for the PPC backend

2021-04-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/PowerPC/GISel/PPCCallLowering.cpp:46-58 +/** + * @brief Lower incoming arguments into generic MIR, this method is responsible + * for splitting aggregate arguments into multiple single value types as well + * as setting

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D69498#2705441 , @sameerds wrote: > I realize now that what @foad says above puts the idea in a clearer context. > Essentially, any check for isConvergent() isn't happening in a vacuum. It is > relevant only in the presence of

[PATCH] D95063: AMDGPU: Use optimization remarks for register usage

2021-01-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision. arsenm added reviewers: yaxunl, rampitec, scott.linder, t-tye, b-sumner. Herald added subscribers: kerbowa, hiraditya, tpr, dstuttard, nhaehnle, jvesely, kzhuravl. arsenm requested review of this revision. Herald added a subscriber: wdng. Herald added a project: LLVM.

[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:800-801 + // (!is_share(p) && !is_private(p)). + // FIXME: Even though there is no distinguish between CONSTANT and GLOBAL in + // the backend, it may still benefit by telling them from ea

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-10-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:67 + if (auto *GA = dyn_cast(Op.getGlobal())) +return cast(GA->getOperand(0)); return cast(Op.getGlobal()); I thought aliases could include embedded bitcasts

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-10-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:67 + if (auto *GA = dyn_cast(Op.getGlobal())) +return cast(GA->getOperand(0)); return cast(Op.getGlobal()); gandhi21299 wrote: > arsenm wrote: > > I thought a

[PATCH] D108661: The maximal representable alignment in LLVM IR is 1GiB, not 512MiB

2021-08-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Could use some MIR tests to make sure that parser doesn't explode Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D108661/new/ https://reviews.llvm.org/D108661 ___ cfe-commits mailin

[PATCH] D108661: The maximal representable alignment in LLVM IR is 1GiB, not 512MiB

2021-08-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/test/CodeGen/X86/load-with-1gb-alignment.mir:1 +# RUN: llc -run-pass machine-cp -o - %s | FileCheck %s + Should use -run-pass=none, and go in test/CodeGen/MIR/X86 Comment at: llvm/test/CodeGen/X86/

[PATCH] D108661: The maximal representable alignment in LLVM IR is 1GiB, not 512MiB

2021-08-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/test/CodeGen/MIR/X86/load-with-1gb-alignment.mir:1 +# RUN: llc -run-pass=none -o - %s | FileCheck %s + Needs to add -march/-mtriple Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-09-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. While I think the early inliner is largely obsolete, it should still handle aliases correctly Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109707/new/ https://reviews.llvm.org/D109707 _

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-09-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5069 // where aliases aren't supported. - if (!RawTriple.isOSDarwin() && !RawTriple.isNVPTX() && !RawTriple.isAMDGPU()) CmdArgs.push_back("-mconstructor-aliases"); This looks

<    8   9   10   11   12   13