[PATCH] D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.

Artem Belevich via Phabricator via cfe-commits Thu, 06 Aug 2020 10:54:12 -0700

tra added inline comments.


================
Comment at: clang/lib/CodeGen/CodeGenPGO.cpp:839-840
 
+  // Skip host-only functions in the CUDA device compilation and device-only
+  // functions in the host compilation.
+  if (CGM.getLangOpts().CUDA &&
----------------
hliao wrote:
> tra wrote:
> > hliao wrote:
> > > tra wrote:
> > > > We will still have around some functions that may never be used on the 
> > > > host side (HD functions referenced from device code only).  I'm not 
> > > > sure if that's a problem for profiling, though. I wonder if we can 
> > > > somehow tie `skipRegionMappingForDecl` to whether we've actually 
> > > > codegen'ed the function. 
> > > Skipping wrong-side functions here just makes the report not confusing as 
> > > these functions are not emitted at all and are supposed never running on 
> > > the host/device side. If we still create the mapping for them, e.g., we 
> > > may report they have 0 runs instead of reporting nothing (just like 
> > > comments between function.) That looks a little bit confusing.
> > > It seems the current PGO adds everything for coverage mapping and late 
> > > prune them based on checks here. Just try to follow that logic to skip 
> > > wrong-side functions. If we need to revise the original logic and 
> > > generate coverage mapping for emitted functions only, the change here is 
> > > unnecessary.
> > I'd add a comment here that this 'filter' is just a rough best-effort 
> > approximation that still allows some effectively device-only Decls through.
> > The output should still be correct, even though the functions will never be 
> > used. Maybe add a TODO to deal with it if/when we know if the Decl was 
> > codegen'ed.
> > 
> Add that comment. But, I tend to not deal that "effectively" 
> host-only/device-only ones as that should be developers' responsibility to 
> handle them. The additional zero coverage mapping may be useful as well. If a 
> function is really device-only but is attributed with HD, the 0 coverage may 
> help developers correcting them.
It will be rather noisy in practice. A lot of code has either has been written 
for NVCC or has to compile with it. NVCC does not have target overloads, so 
sticking HD everywhere  is pretty much the only practical way to do it in 
complicated enough C++ code.  Anything that uses Eigen or Thrust will have tons 
of HD functions that are actually used only on one side. 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85276/new/

https://reviews.llvm.org/D85276

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.

Reply via email to