AlexVlx wrote: > Thank you for the PR! I'd like to better understand motivation and > justification of SPIR-V BE-related changes though. The goal would be to > understand whether AllSvmDevices is indeed a better choice (for whom?) than > Device as a default mem scope value in SPIR-V BE. > > 1. Questions to the description of the PR. > > > "These were previously unconditionally lowered to Device scope, which is > > can be too conservative and possibly incorrect." > > The claim is not justified by any docs/specs. Why Device scope is incorrect > as a default? In my opinion, it's AllSvmDevices that looks like a > conservative choice that may lead to performance degradation in general case > when we change the default without notifying customers. Or, we may say that > potential performance changes may depend on a vendor-specific behavior in > this case. > > > "Furthermore, the default / implicit scope is changed from Device (an > > OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is > > not OpenCL specific / can ingest IR coming from other language front-ends." > > What I know without additional references to other docs/specs is that Device > is default by OpenCL spec > (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions). > It would help if you can provide references where AllSvmDevices is a > preferable choice, so that we are able to compare and figure out the best > default for the Computational flavor of SPIR-V. For sure, SPIR-V BE is not > OpenCL (=Device) specific, and it's also not specific to any particular > vendor or computational framework. I've seen usages of AllSvmDevices as > default in the code base (for example, in > > https://github.com/llvm/llvm-project/blob/319e8cd201e6744199da377fba237dd276063e49/clang/lib/CodeGen/Targets/AMDGPU.cpp#L537 > > ), but it seems not enough to flip the default over. > > "OpenCL defaulting to Device scope is now reflected in the front-end > > handling of atomic ops, which seems preferable." > > Changes in clang part looks really good to me. However, when we add to it > changes in SPIR-V part of the code base, things look less optimistic, because > what this PR means by "the front-end handling of atomic ops" is the upstream > clang only, whereas actual choices of a front-end are more versatile, and > users coming to SPIR-V by other paths would get a sudden change of behavior > in the worst case (e.g., MLIR input for the GenAI domain). > > === > > 2. If it's acceptable to split this PR into two separate PR's (clang and > SPIR-V), I'd gladly support changes in clang part, it makes sense for me. At > the moment, however, I have objections against SPIR-V Backend changes as they > are represented in the PR: > > * This PR looks like a breaking change that would flip over the default value > of mem scope for all environments except for OpenCL and may have a > potentially negative impact on an unknown number of projects/customers. I'd > guess that OpenCL would not notice the difference, because path that goes via > upstream clang front-end redefines default mem scope as Device. All other > toolchains just get a breaking change in the form of the AllSvmDevices > default. clang-related changes do not help to smooth this, because SPIRV BE > should remain agnostic towards front-ends, frameworks, etc. > * A technical comment is that the proposed implementation in SPIR-V part is > less efficient that existing. It compares strings rather than integers and > fills in scope names on each call to the getMemScope() function, whereas > existing implementation does it just once per a machine function. > * A terminology (the choice of syncscope names) is debatable. The closest > thing in specs that I see is > https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_scope_id. I > don't see any references to "singlethread" in the specs. Name "workitem" > (spelling precisely as "work-item") is used at least in the official Khronos > documents (see for example > https://registry.khronos.org/SPIR-V/specs/1.0/SPIR-V-execution-and-memory-model.pdf). > "all_svm_devices" is not mentioned in the specs at all (there is only the > "CrossDevice" term). > > === > > For now, I'd rather see an eventual solution in the form of further > classification of the computational flavor of SPIR-V (not just Compute vs. > Vulkan but breaking Compute part further where this is required) -- comparing > to this sudden change of the default in favor of any incarnation of Compute > targets. As the first approach, all SPIR-V-related changes may require just a > short snippet of the kind "if TheTriple is XXX-specific then use CrossDevice > instead of Device" and minor rename of syncscope names ("subgroup", for > example, indeed makes more sense than "sub_group"). This would probably > require a description in the SPIRVUsage doc as well to avoid confusion among > customers. Anyway, I'd be glad to talk out a reasonable way forward to get a > better solution than we have now, if needed.
Thank you for the thorough response, it's highly appreciated. Let me try to address some of the points being made: 1. `The claim is not justified by any docs/specs. Why Device scope is incorrect as a default? In my opinion, it's AllSvmDevices that looks like a conservative choice that may lead to performance degradation in general case when we change the default without notifying customers. Or, we may say that potential performance changes may depend on a vendor-specific behavior in this case` - poor/confusing choice of words on my part, apologies. The idea, which might be more a matter of philosophy, is the following: BEs should be correct by default, and forfeiting general correctness for performance should be opt-in. In the specific case of scopes, as far as the BE is concerned, if an explicit scope is not provided the "safest" scope (i.e. the one that subsumes / incorporates all others) should be chosen, to guarantee that the code just works. IMHO, whatever choice OpenCL (or any other language / higher-level source such as MLIR) makes regarding its defaults, it should be handled in the FE with all other linguistic concerns; it is also desirable to properly scope ops rather than rely on the default. 2. `What I know without additional references to other docs/specs is that Device is default by OpenCL spec (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions). It would help if you can provide references where AllSvmDevices is a preferable choice, so that we are able to compare and figure out the best default for the Computational flavor of SPIR-V.` - consider all of the *CCL (NCCL, RCCL, CCCL) libs that do in-kernel cross device communication; at least one implementation uses unscoped Clang builtins, and this breaks down (in an opaque / subtle fashion) with the current default. Please note that [we do have wording in LangRef](https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints) about expected behaviour which I would interpret as matching what is bieng proposed, albeit in a slightly roundabout way, which might require tweaking: `Otherwise, an atomic operation that is not marked syncscope("singlethread") or syncscope("<target-scope>") synchronizes with and participates in the seq_cst total orderings of other operations that are not marked syncscope("singlethread") or syncscope("<target-scope>")`. I am sympathetic to the concerns about this possibly triggering heartburn by way of performance degradation, but I'd suggest that we don't want this to be target / vendor specific, as these are target orthogonal. Perhaps a better solution is to add an option that controls this behaviour / the default, that the BE can consume and act accordingly? Thus, there can be a smooth, opt-in transition from the current solution and nobody has to take any pain. Overall, I do think that it would (have) be(en) beneficial to have SPIR-V Compute somewhat less OpenCL specific, as there are far more potential generators (including HLLs such as C/C++ which assume a flat memory model, for example), but that train might have left the platform; had that been the case, we'd probably not be having this issue. In what regards the performance concerns around doing a string compare, those are acknowledged. I believe the current string based design was put in place to give full freedom to targets, so relying strictly on integer values for scopes is legacy/less preferred. We are quite a few years removed from the design though, and it was before my time, so I might be misinterpreting - perhaps @arsenm & @yxsamliu can provide more comment here since they were involved at the time. `singlethread` is one of the [two LLVM defined syncscopes, which is always available / all targets must expose](https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints); Clang matches this, you always get `SyncScpe::SingleThread` and `SyncScope::System`. Since there can be multiple parent languages / higher-level generators that want to feed IR into the SPIR-V BE, it seemed preferable to match LLVM convention here, rather than OpenCL convention, since it's broader. On the same topic, `subgroup` vs `sub_group` is merely about symmetry with `workgroup` / matching OCL/SPIR-V nomenclature as far as I could observe it. Overall, hopefully we can discuss this more and try to arrive at a robust solution for everyone. Please let me know if you'd prefer we take this to the SPIR-V BE discussion group / someplace else, or if you'd like to keep this conversation going. https://github.com/llvm/llvm-project/pull/106429 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits