sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:99
 /// An abstract node for C++ statements, e.g. 'while', 'if', etc.
 class Statement : public Tree {
 public:
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > Do you want to expose the statement-ending-semicolon here?
> > 
> > (Not all statements have it, but common enough you may want it in the base 
> > class instead of all children)
> Yes, only "leaf" (i.e. the ones not having any statement children) have it.
> I was thinking about:
>   - having a separate class for non-composite statements and providing an 
> accessor there,
>   - providing an accessor in each of the leaf statements (would mean some 
> duplication, but, arguably, better discoverability).
> 
> But, from an offline conversation, we seem to disagree that inheritance is a 
> proper way to model this.
> Would it be ok to do this in a follow-up? I'll add a FIXME for now.
First: yes, let's not do this now.

Second: I'm wary of using standard is-a inheritance to model more than 
alternation in the grammar. That is, ForStatement is-a Statement is fine, 
ForStatement is-a LoopyStatement is suspect. This is to do with the fact that 
LoopyStatement is-a Statement seems obvious, and we may end up with 
diamond-shaped inheritance and some conceptual confusion.

This goes for all traits that aren't natural tree-shaped inheritance: 
HasTrailingSemicolon, LoopyStatement, ...

I think there are two concerns here:
 - we want to be able to get the trailing-semicolon if it exists
 - we want to be able to check if the trailing-semicolon is *expected* 
including via its static type

One way to do this (not the only one...):

```
// generic helper, or callers could even write this directly
Optional<Leaf> trailingSemi(Tree *Node) {
  return firstElement(Node->Children<Leaf>(NodeRole::TrailingSemi));
}

// mixin for trailing semi support. Note: doesn't inherit Statement!
// maybe need/want some CRTP magic
class TrailingSemicolon {
  Optional<Leaf> trailingSemi() const { return trailingSemi((const Node*)this; }
}

// Gets the trailingSemi() accessor.
ExprStmt : public Statement, TrailingSemicolon { ... }

```


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:193
+  syntax::Statement *thenStatement();
+  syntax::Leaf *elseKeyword();
+  syntax::Statement *elseStatement();
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > I think throughout it's important to mark which of these are:
> >  - nullable in correct code
> >  - nullable in code generated by recovery
> I would suggest to only mark the nodes that are nullable in the correct code. 
> For recovery, I would assume the following rule (please tell me if I'm wrong):
> 
> On a construct whose parsing involved recovery:
> - if the node has an introducing token (`if`, `try`, etc.), the corresponding 
> child cannot be null.
> - any other child can be null.
Agree with this strategy, and the fact that it doesn't need to be documented on 
every node/occurrence.

But it should definitely be documented somewhere at a high level! (With clang 
AST, this sort of thing feels like tribal knowledge)


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:24
 
 /// A kind of a syntax node, used for implementing casts.
 enum class NodeKind : uint16_t {
----------------
Can you add a comment here saying the ordering/blocks must correspond to the 
Node inheritance hierarchy? This is *kind of* common knowledge, but I think 
this is normally handled by tablegen.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:79
+  ExpressionStatement_expression,
+  CompoundStatement_statement
 };
----------------
As discussed offline, there's some options about how abstract/concrete these 
roles should be.

e.g. for a list of function args, this could be 
FunctionOpenParen/FunctionArgExpr/FunctionArgComma/FunctionCloseParam 
(specific) <-> OpenParen/Arg/Comma/CloseParen <-> Open/Item/Separator/Close.

The more specific ones are somewhat redundant with the parent/child type (but 
easy to assign systematically), and the more generic ones are more orthogonal 
(but require more design and may by hard to always make consistent).

The concrete advantage of the generic roles is being able to write code like 
`getTrailingSemicolon(Tree*)` or `findLoopBreak(Stmt*)` or 
`removeListItem(Tree*, int)` in a fairly generic way, without resorting to 
adding a `Loop` base class or handling each case with separate code.

This is up to you, though.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:265
+  syntax::Leaf *returnKeyword();
+  syntax::Expression *value();
+};
----------------
nullable, marked somehow

Optional<Expression*> is tempting as a systematic and hard-to-ignore way of 
documenting that.
And it reflects the fact that there are three conceptual states for children: 
present, legally missing, brokenly missing.

At the same time, I'm not sure how to feel about the fact that in practice this 
can't be present but null, and the fact that *other* non-optional pointers can 
be null.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:99
+  /// semicolon when needed.
   llvm::ArrayRef<syntax::Token> getRange(const Stmt *S) const {
+    auto Tokens = getRange(S->getBeginLoc(), S->getEndLoc());
----------------
since Expr : Stmt, we need to be a bit wary of overloading based on static type.

It's tempting to say it's correct here: if we statically know E is an Expr, 
then maybe it's never correct to consume the semicolon. But is the converse 
true? e.g. if we're traversing using RAV and call getRange() in visitstmt...

(The alternatives seem to be a) removing the expr version of the function, and 
having the stmt version take a `bool ConsumeSemi` or b) change the stmt version 
to have (dynamic) expr behave like the expr overload, and handle it specially 
when forming exprstmt. More verbose, genuinely conflicted here)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:270
+
+  bool TraverseCXXForRangeStmt(CXXForRangeStmt *S) {
+    // We override to traverse range initializer as VarDecl.
----------------
maybe group with corresponding `WalkUpFromCXXForRangeStmt`?
(Could also group all `Traverse*` together if you prefer. Current ordering 
seems a little random)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:272
+    // We override to traverse range initializer as VarDecl.
+    // RAT traverses it as a statement, we produce invalid node kinds in that
+    // case.
----------------
nit: RAV?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63835/new/

https://reviews.llvm.org/D63835



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to