AlexVlx added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542
+  if (!getLangOpts().HIPStdPar)
+    ErrorUnsupported(E, "builtin function");
 
----------------
efriedma wrote:
> AlexVlx wrote:
> > efriedma wrote:
> > > This doesn't make sense; we can't just ignore bits of the source code.  I 
> > > guess this is related to "the decision on their validity is deferred", 
> > > but I don't see how you expect this to work.
> > This is one of the weirder parts, so let's consider the following example:
> > 
> > ```cpp
> > void foo() { __builtin_ia32_pause(); }
> > void bar() { __builtin_trap(); }
> > 
> > void baz(const vector<int>& v) {
> >     return for_each(par_unseq, cbegin(v), cend(v),
> >                     [](auto&& x) { if (x == 42) bar(); });
> > }
> > ```
> > 
> > In the case above, what we'd offload to the accelerator, and ask the target 
> > BE to lower, is the implementation of `for_each`, and `bar`, because it is 
> > reachable from the latter. `foo` is not reachable by any execution path on 
> > the accelerator side, however it includes a builtin that is unsupported by 
> > the accelerator (unless said accelerator is x86, which is not impossible, 
> > but not something we're dealing with at the moment). If we were to actually 
> > error out early, in the FE, in these cases, there's almost no appeal to 
> > what is being proposed, because standard headers, as well as other 
> > libraries, are littered with various target specific builtins that are not 
> > going to be supported. This all builds on the core invariant of this model 
> > / extension / thingamabob, which is that the algorithms, and only the 
> > algorithms, are targets for offload. It thus follows that as long as code 
> > that is reachable from an algorithm's implementation is safe, all is fine, 
> > but we cannot know this in the FE / on an AST level, because we need the 
> > actual CFG. This part is handled in LLVM in the `SelectAcceleratorCodePass` 
> > that's in the last patch in this series.
> > 
> > Now, you will quite correctly observe that there's nothing preventing a 
> > user from calling `foo` in the callable they pass to an algorithm; they 
> > might read the docs and appreciate that this won't work, but even then they 
> > are not safe, because via some opaque call chain they might end up 
> > touching some unsupported builtin. My intuition here, which is reflected 
> > above in letting builtins just flow through, is that such cases are better 
> > served by a compile-time error, which is what they will get once the target 
> > BE chokes trying to lower an unsupported builtin. It's not going to be a 
> > beautiful error, and we could probably prettify it somewhat if we were to 
> > check after we've done the accelerator code selection pass, but it will 
> > happen at compile time. Another solution would be to emit these as traps 
> > (poison + trap for value-returning ones), but I am concerned that it would 
> > lead to really fascinating debugging journeys.
> > 
> > Having said this, if there's a better way to deal with these scenarios, it 
> > would be rather nice. Similarly, if the above doesn't make sense, please 
> > let me know.
> > 
> Oh, I see; you "optimistically" compile everything assuming it might run on 
> the accelerator, then run LLVM IR optimizations, then determine late which 
> bits of code will actually run on the accelerator, which then prunes the code 
> which shouldn't run.
> 
> I'm not sure I really like this... would it be possible to infer which 
> functions need to be run on the accelerator based on the AST?  I mean, if 
> your API takes a lambda expression that runs on the accelerator, you can mark 
> the lambda's body as "must be emitted for GPU", then recursively mark all the 
> functions referred to by the lambda.
> 
> Emitting errors lazily from the backend means you get different diagnostics 
> depending on the optimization level.
> 
> If you do go with this codegen-based approach, it's not clear to me how you 
> detect that a forbidden builtin was called; if you skip the error handling, 
> you just get a literal "undef".
`I'm not sure I really like this...` - actually, I am not a big fan either, 
but I think it's about the best one can do given the constraints (consume 
standard C++, no annotations on the user side, etc.). Having tried a few times 
in the past (and at least once in a different compiler), I don't think this 
can be done at the AST level. It would require some fairly awkward checking 
during template instantiation (there is no way to know earlier that a 
`CallableFoo` was passed to an offloadable algorithm), and it is unwieldy to 
essentially compute the CFG on the AST and mark reachable callees at that 
point. Even setting those aside, the main reason we cannot do this is that the 
interface is not constrained to take only lambdas, but callables in general, 
which includes function pointers. We don't handle those today, but plan to, 
and there's a natural solution when operating on IR, assuming closed / 
internalised Modules (which is the case at least for AMDGPU at the moment). 
The final challenge is that the AST is per-TU, with no cross-TU visibility, 
whereas with IR you can pre-link the bitcode (implicitly or via LTO) and then 
operate on the entire compilation. This matters for cases where `foo`, defined 
in TU0, is reachable from `algorithm_bar_offloaded_impl` in TU1. So, TL;DR: I 
think doing this on the AST would be more complex and would end up more 
brittle / less future-proof.

Regarding how to do deferred diagnostics, I think it can be done like this (I 
crossed streams in my prior reply when discussing this part, so that bit was 
actually nonsense): instead of emitting undef here, we can emit a builtin with 
the same signature, but with the name suffixed with e.g. 
`__stdpar_unsupported` or something similar. Then, when doing the 
reachability computation later, if we stumble upon a node in the CFG that 
contains a builtin suffixed with `__stdpar_unsupported`, we error out, and can 
provide nice diagnostics since we'd have the call chain handy. Thoughts?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155850/new/

https://reviews.llvm.org/D155850

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
