[PATCH] D133539: [OpenMP] Replace OpenMP register requires constructor with a global array

2022-11-28 Thread Jeff Sandoval via Phabricator via cfe-commits
sandoval added a comment.

Overall this approach looks good to me, though I'd suggest some (hopefully 
minor) changes that would help avoid breaking the ABI.  It looks like 
`__tgt_bin_desc` is only used by `__tgt_[un]register_lib` -- rather than 
changing those entry points, can you please add new ones (e.g., 
`__tgt_[un]register_lib_v2` or `__tgt[un]register_lib_with_requires`)?  You can 
still remove the old entry points from `libomptarget`.  This should avoid any 
ABI issues with the old entry points, since calls to the old entry points will 
expect the old struct and calls to the new entry points will expect the new 
struct.  This will also make it possible for vendors to continue supporting the 
old entry points, if desired.

Also, would you mind renaming the `__tgt_bin_desc` struct to 
`__tgt_bin_desc_v2` as well, to indicate it has changed?  This isn't part of 
the ABI, so certainly not required, but it might help to convey that the struct 
has changed from older versions of the source.

Also, it looks like this patch only contains the compiler and test changes, 
correct?  Is there another patch that contains the corresponding `libomptarget` 
changes?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133539/new/

https://reviews.llvm.org/D133539

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2022-01-21 Thread Jeff Sandoval via Phabricator via cfe-commits
sandoval added a comment.

I don't see a clear explain the motivation for this change - can you confirm my 
understanding or provide clarification?  It looks like the issue is that 
D110337  caused a regression for cases when 
user code directly calls a device library function that requires hostcall 
services, right?  If so, I think this issue highlights a weakness in the module 
flag approach implemented in D110337  - i.e., 
now the compiler needs to know every library function that may require hostcall 
services.

We have this same issue with our proprietary compiler, where we have our own 
device runtime library that makes use of printf.  The prior approach of 
detecting the `ockl_hostcall_internal` function definition handles this case 
just fine (with the caveat of the potential LTO/inlining issues mentioned in 
D110337 ).  But with the new approach to use 
the `amdgpu_hostcall` module flag, we need to modify our compiler to emit that 
flag for all of our own library calls, too.

Another concern with using a module flag is that is isn't as easily eliminated 
once it has been inserted, even if the call that triggered insertion is 
ultimately eliminated through optimization.  E.g., a printf call might be 
eliminated if it is under a condition that can be statically evaluated to 
false...but, the `amdgpu_hostcall` module flag may already have been inserted.

Is there an approach that can avoid the LTO/inlining issues, like implementing 
the hostcall implementation with an intrinsic to access the hostcall buffer 
pointer? - then the AMDGPU backend could easily detect use of that intrinsic to 
trigger setup of the implicit kernel arg, and inlining could not eliminate that 
intrinsic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115283/new/

https://reviews.llvm.org/D115283

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits