yaxunl added inline comments.
================
Comment at: clang/lib/Sema/Sema.cpp:1514
+ void visitUsedDecl(SourceLocation Loc, Decl *D) {
+ if (auto *TD = dyn_cast<TranslationUnitDecl>(D)) {
+ for (auto *DD : TD->decls()) {
----------------
rjmccall wrote:
> bader wrote:
> > yaxunl wrote:
> > > rjmccall wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > erichkeane wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > erichkeane wrote:
> > > > > > > > > Note that when recommitting this (if you choose to), this
> > > > > > > > > needs to also handle NamespaceDecl. We're a downstream and
> > > > > > > > > discovered that this doesn't properly handle functions or
> > > > > > > > > records handled in a namespace.
> > > > > > > > >
> > > > > > > > > It can be implemented identically to TranslationUnitDecl.
> > > > > > > > Wait, what? We shouldn't be doing this for TranslationUnitDecl
> > > > > > > > either. I don't even know how we're "using" a
> > > > > > > > TranslationUnitDecl, but neither this case not the case for
> > > > > > > > `NamespaceDecl` should be recursively using every declaration
> > > > > > > > declared inside it. If there's a declaration in a namespace
> > > > > > > > that's being used, it should be getting visited as part of the
> > > > > > > > actual use of it.
> > > > > > > >
> > > > > > > > The logic for `RecordDecl` has the same problem.
> > > > > > > Despite the name, this seems to be more of a home-written ast
> > > > > > > walking class. The entry point is the 'translation unit' which
> > > > > > > seems to walk through everything in an attempt to find all the
> > > > > > > functions (including those that are 'marked' as used by an
> > > > > > > attribute).
> > > > > > >
> > > > > > > You'll see the FunctionDecl section makes this assumption as well
> > > > > > > (not necessarily that we got to a function via a call). IMO, this
> > > > > > > approach is strange, and we should register entry points in some
> > > > > > > manner (functions marked as emitted to the device in some
> > > > > > > fashion), then just follow its call-graph (via the
> > > > > > > clang::CallGraph?) to emit all of these functions.
> > > > > > >
> > > > > > > It seemed really odd to see this approach here, but it seemed
> > > > > > > well reviewed by the time I noticed it (via a downstream bug) so
> > > > > > > I figured I'd lost my chance to disagree with the approach.
> > > > > > >
> > > > > > >
> > > > > > Sure, but `visitUsedDecl` isn't the right place to be entering the
> > > > > > walk. `visitUsedDecl` is supposed to be the *callback* from the
> > > > > > walk. If they need to walk all the global declarations to find
> > > > > > kernels instead of tracking the kernels as they're encountered
> > > > > > (which would be a *much* better approach), it should be done as a
> > > > > > separate function.
> > > > > >
> > > > > > I just missed this in the review.
> > > > > The deferred diagnostics could be initiated by non-kernel functions
> > > > > or even host functions.
> > > > >
> > > > > Let's consider a device code library where no kernels are defined. A
> > > > > device function is emitted, which calls a host device function which
> > > > > has a deferred diagnostic. All device functions that are emitted need
> > > > > to be checked.
> > > > >
> > > > > Same with host functions that are emitted, which may call a host
> > > > > device function which has deferred diagnostic.
> > > > >
> > > > > Also not just function calls need to be checked. A function address
> > > > > may be taken then called through function pointer. Therefore any
> > > > > reference to a function needs to be followed.
> > > > >
> > > > > In the case of OpenMP, the initialization of a global function
> > > > > pointer which refers a function may trigger a deferred diangostic.
> > > > > There are tests for that.
> > > > Right, I get that emitting deferred diagnostics for a declaration D
> > > > needs to trigger any deferred diagnostics in declarations used by D,
> > > > recursively. You essentially have a graph of lazily-emitted
> > > > declarations (which may or may not have deferred diagnostics) and a
> > > > number of eagerly-emitted "root" declarations with use-edges leading
> > > > into that graph. Any declaration that's reachable from a root will
> > > > need to be emitted and so needs to have any deferred diagnostics
> > > > emitted as well. My question is why you're finding these roots with a
> > > > retroactive walk of the entire translation unit instead of either
> > > > building a list of roots as you go or (better yet) building a list of
> > > > lazily-emitted declarations that are used by those roots. You can
> > > > unambiguously identify at the point of declaration whether an entity
> > > > will be eagerly or lazily emitted, right? If you just store those
> > > > initial edges into the lazily-emitted declarations graph and then
> > > > initiate the recursive walk from them at the end of the translation
> > > > unit, you'll only end up walking declarations that are actually
> > > > relevant to your compilation, so you'll have much better locality and
> > > > (if this matters to you) you'll naturally work a lot better with PCH
> > > > and modules.
> > > I will try the approach you suggested. Basically I will record the
> > > emitted functions and variables during parsing and use them as starting
> > > point for the final traversal.
> > >
> > > This should work for CUDA/HIP. However it may be tricky for OpenMP since
> > > the emission of some entities depending on pragmas. Still it may be
> > > doable. If I encounter difficulty I will come back for discussion.
> > >
> > > I will post the change for review.
> > >
> > > Thanks.
> > FYI: SYCL is also using deferred diagnostics engine to emit device side
> > diagnostics, although this part hasn't been up-streamed yet, but we are
> > tracking changes in this area.
> > SYCL support implementation should be quite similar to CUDA/HIP.
> Okay, thank you. Do you still need all the cases in here for records,
> templates, and so on? It looks to me like you should always end up here with
> exactly the variables and functions that are being used, and you should never
> need to make special efforts to e.g. visit all the specializations of a
> template or visit all the methods of a class.
I can remove handling of templates and records. However I have to keep the
handling of CapturedDecl. It is generated from code like
```
void t1(int r) {}
int main() {
#pragma omp target
{
t1(0);
}
return 0;
}
```
And it is like a function decl embeded in function main, e.g.
```
-FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()'
`-CompoundStmt 0x873c3f8 <col:12, line:15:1>
|-OMPTargetDirective 0x873c3a0 <line:9:1, col:19>
| `-CapturedStmt 0x873c378 <line:10:3, line:13:3>
| `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow
| |-CapturedStmt 0x873c350 <line:10:3, line:13:3>
| | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
| | |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
| | | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
| | | |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)'
<FunctionToPointerDecay>
| | | | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function
0x86f7b18 't1' 'void (int)'
| | | `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
| | `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context
'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
| |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit __forceinline
| |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid.
'const int'
| |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. 'const
int *const restrict'
| |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. 'void
*const restrict'
| |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. 'void
(*const restrict)(void *const restrict, ...)'
| |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 'void
*const'
| |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context
'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
| |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition
| | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit
| `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
| |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
| | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
| | |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)'
<FunctionToPointerDecay>
| | | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function
0x86f7b18 't1' 'void (int)'
| | `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
| `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context
'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
`-ReturnStmt 0x873c3e8 <line:14:3, col:10>
`-IntegerLiteral 0x873c3c8 <col:10> 'int' 0
```
If I do not handle it, I will not be able to reach the call of t1().
================
Comment at: clang/lib/Sema/Sema.cpp:1540
+ } else if (auto *VD = dyn_cast<VarDecl>(D)) {
+ if (auto *Init = VD->getInit()) {
+ auto DevTy = OMPDeclareTargetDeclAttr::getDeviceType(VD);
----------------
rjmccall wrote:
> Can there also be deferred diagnostics associated with this initializer?
Yes. A global variable may be marked by omp declare target directive to be
emitted on device. If the global var is initialized with the address of a
function, the function will be emitted on device. If the device function calls
a host device function which contains a deferred diag, that diag will be
emitted. This can only be known after everything is parsed.
================
Comment at: clang/lib/Sema/SemaDecl.cpp:12229
+ if (LangOpts.OpenMP)
+ DeclsToCheckForDeferredDiags.push_back(VDecl);
CheckCompleteVariableDeclaration(VDecl);
----------------
rjmccall wrote:
> `DeclsToCheckForDeferredDiags` is basically a set of declarations that you
> know to have to emit, right? It doesn't seem right to be adding every
> variable with an initializer to that set — especially because I'm pretty sure
> this function gets called for literally every variable with an initializer,
> including local variables. Presumably you only need to do this for global
> variables that you're definitely going to emit in the current mode.
Yes we only need to check global variables. Fixed.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70172/new/
https://reviews.llvm.org/D70172
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits