[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

Alex Voicu via cfe-commits Mon, 09 Sep 2024 06:04:27 -0700

AlexVlx wrote:

> Thank you for the PR! I'd like to better understand motivation and 
> justification of SPIR-V BE-related changes though. The goal would be to 
> understand whether AllSvmDevices is indeed a better choice (for whom?) than 
> Device as a default mem scope value in SPIR-V BE.
> 
> 1. Questions to the description of the PR.
> 
> > "These were previously unconditionally lowered to Device scope, which is 
> > can be too conservative and possibly incorrect."
> 
> The claim is not justified by any docs/specs. Why Device scope is incorrect 
> as a default? In my opinion, it's AllSvmDevices that looks like a 
> conservative choice that may lead to performance degradation in general case 
> when we change the default without notifying customers. Or, we may say that 
> potential performance changes may depend on a vendor-specific behavior in 
> this case.
> 
> > "Furthermore, the default / implicit scope is changed from Device (an 
> > OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is 
> > not OpenCL specific / can ingest IR coming from other language front-ends."
> 
> What I know without additional references to other docs/specs is that Device 
> is default by OpenCL spec 
> (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions).
>  It would help if you can provide references where AllSvmDevices is a 
> preferable choice, so that we are able to compare and figure out the best 
> default for the Computational flavor of SPIR-V. For sure, SPIR-V BE is not 
> OpenCL (=Device) specific, and it's also not specific to any particular 
> vendor or computational framework. I've seen usages of AllSvmDevices as 
> default in the code base (for example, in
> 
> https://github.com/llvm/llvm-project/blob/319e8cd201e6744199da377fba237dd276063e49/clang/lib/CodeGen/Targets/AMDGPU.cpp#L537
> 
> ), but it seems not enough to flip the default over.
> > "OpenCL defaulting to Device scope is now reflected in the front-end 
> > handling of atomic ops, which seems preferable."
> 
> Changes in clang part looks really good to me. However, when we add to it 
> changes in SPIR-V part of the code base, things look less optimistic, because 
> what this PR means by "the front-end handling of atomic ops" is the upstream 
> clang only, whereas actual choices of a front-end are more versatile, and 
> users coming to SPIR-V by other paths would get a sudden change of behavior 
> in the worst case (e.g., MLIR input for the GenAI domain).
> 
> ===
> 
> 2. If it's acceptable to split this PR into two separate PR's (clang and 
> SPIR-V), I'd gladly support changes in clang part, it makes sense for me. At 
> the moment, however, I have objections against SPIR-V Backend changes as they 
> are represented in the PR:
> 
> * This PR looks like a breaking change that would flip over the default value 
> of mem scope for all environments except for OpenCL and may have a 
> potentially negative impact on an unknown number of projects/customers. I'd 
> guess that OpenCL would not notice the difference, because path that goes via 
> upstream clang front-end redefines default mem scope as Device. All other 
> toolchains just get a breaking change in the form of the AllSvmDevices 
> default. clang-related changes do not help to smooth this, because SPIRV BE 
> should remain agnostic towards front-ends, frameworks, etc.
> * A technical comment is that the proposed implementation in SPIR-V part is 
> less efficient that existing. It compares strings rather than integers and 
> fills in scope names on each call to the getMemScope() function, whereas 
> existing implementation does it just once per a machine function.
> * A terminology (the choice of syncscope names) is debatable. The closest 
> thing in specs that I see is 
> https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_scope_id. I 
> don't see any references to "singlethread" in the specs. Name "workitem" 
> (spelling precisely as "work-item") is used at least in the official Khronos 
> documents (see for example 
> https://registry.khronos.org/SPIR-V/specs/1.0/SPIR-V-execution-and-memory-model.pdf).
>  "all_svm_devices" is not mentioned in the specs at all (there is only the 
> "CrossDevice" term).
> 
> ===
> 
> For now, I'd rather see an eventual solution in the form of further 
> classification of the computational flavor of SPIR-V (not just Compute vs. 
> Vulkan but breaking Compute part further where this is required) -- comparing 
> to this sudden change of the default in favor of any incarnation of Compute 
> targets. As the first approach, all SPIR-V-related changes may require just a 
> short snippet of the kind "if TheTriple is XXX-specific then use CrossDevice 
> instead of Device" and minor rename of syncscope names ("subgroup", for 
> example, indeed makes more sense than "sub_group"). This would probably 
> require a description in the SPIRVUsage doc as well to avoid confusion among 
> customers. Anyway, I'd be glad to talk out a reasonable way forward to get a 
> better solution than we have now, if needed.

Thank you for the thorough response, it's highly appreciated. Let me try to
address some of the points being made:

1. `The claim is not justified by any docs/specs. Why Device scope is incorrect
as a default? In my opinion, it's AllSvmDevices that looks like a conservative
choice that may lead to performance degradation in general case when we change
the default without notifying customers. Or, we may say that potential
performance changes may depend on a vendor-specific behavior in this case` -
poor/confusing choice of words on my part, apologies. The idea, which might be
more a matter of philosophy, is the following: BEs should be correct by
default, and forfeiting general correctness for performance should be opt-in.
In the specific case of scopes, as far as the BE is concerned, if an explicit
scope is not provided the "safest" scope (i.e. the one that subsumes /
incorporates all others) should be chosen, to guarantee that the code just
works. IMHO, whatever choice OpenCL (or any other language / higher-level
source such as MLIR) makes regarding its defaults, it should be handled in the
FE with all other linguistic concerns; it is also desirable to properly scope
ops rather than rely on the default.
2. `What I know without additional references to other docs/specs is that
Device is default by OpenCL spec
(https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions).
It would help if you can provide references where AllSvmDevices is a
preferable choice, so that we are able to compare and figure out the best
default for the Computational flavor of SPIR-V.` - consider all of the *CCL
(NCCL, RCCL, CCCL) libs that do in-kernel cross device communication; at least
one implementation uses unscoped Clang builtins, and this breaks down (in an
opaque / subtle fashion) with the current default. Please note that [we do have
wording in
LangRef](https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints)
about expected behaviour which I would interpret as matching what is bieng
proposed, albeit in a slightly roundabout way, which might require tweaking:
`Otherwise, an atomic operation that is not marked syncscope("singlethread") or
syncscope("<target-scope>") synchronizes with and participates in the seq_cst
total orderings of other operations that are not marked
syncscope("singlethread") or syncscope("<target-scope>")`.

I am sympathetic to the concerns about this possibly triggering heartburn by
way of performance degradation, but I'd suggest that we don't want this to be
target / vendor specific, as these are target orthogonal. Perhaps a better
solution is to add an option that controls this behaviour / the default, that
the BE can consume and act accordingly? Thus, there can be a smooth, opt-in
transition from the current solution and nobody has to take any pain. Overall,
I do think that it would (have) be(en) beneficial to have SPIR-V Compute
somewhat less OpenCL specific, as there are far more potential generators
(including HLLs such as C/C++ which assume a flat memory model, for example),
but that train might have left the platform; had that been the case, we'd
probably not be having this issue.

In what regards the performance concerns around doing a string compare, those
are acknowledged. I believe the current string based design was put in place to
give full freedom to targets, so relying strictly on integer values for scopes
is legacy/less preferred. We are quite a few years removed from the design
though, and it was before my time, so I might be misinterpreting - perhaps
@arsenm & @yxsamliu can provide more comment here since they were involved at
the time.

`singlethread` is one of the [two LLVM defined syncscopes, which is always
available / all targets must
expose](https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints);
Clang matches this, you always get `SyncScpe::SingleThread` and
`SyncScope::System`. Since there can be multiple parent languages /
higher-level generators that want to feed IR into the SPIR-V BE, it seemed
preferable to match LLVM convention here, rather than OpenCL convention, since
it's broader. On the same topic, `subgroup` vs `sub_group` is merely about
symmetry with `workgroup` / matching OCL/SPIR-V nomenclature as far as I could
observe it.

Overall, hopefully we can discuss this more and try to arrive at a robust
solution for everyone. Please let me know if you'd prefer we take this to the
SPIR-V BE discussion group / someplace else, or if you'd like to keep this
conversation going.

https://github.com/llvm/llvm-project/pull/106429
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [SPIRV][RFC] Rework / extend support for memory scopes (PR #106429)

Reply via email to