ilya-biryukov marked 13 inline comments as done.
ilya-biryukov added a comment.
This is ready for another round now.
================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:37
+ /// and \p Last are added as children of the new node.
+ void learnNode(SourceLocation Fist, SourceLocation Last,
+ syntax::TreeNode *New);
----------------
sammccall wrote:
> sammccall wrote:
> > maybe formNode, formToken, formRoot()?
> if syntax nodes strictly nest and we form left-to-right and bottom up, then
> why are there ever pending nodes that aren't in the range? Is it because we
> don't aggregate them as early as possible?
>
> (edit: after offline discussion, there are precise invariants here that could
> be documented and asserted)
Sorry, missed this comment and went with `expectToken()` and `expectNode()`,
root is now built on `consume()`. Can change to `form*`, not a big deal.
I still need to write good docs about the invariants here, so leaving this open.
================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:48
+ /// Consume the root node.
+ syntax::TranslationUnit *root() && {
+ assert(Root);
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > any particular reason learnRoot() and root() are different functions?
> >
> > If learnRoot() returned TranslationUnit*, then we avoid the need for the
> > caller to know about the dependency, it would track the state itself.
> `learnRoot` is called inside ast visitor when processing
> `TranslationUnitDecl` and `root()` is used to consume the result.
>
> I guess we could just delay `learnRoot` until `consume()` is called,
> shouldn't be a big deal. I'll do this in the next iteration.
`consume()` now builds the root node.
================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:92
+ bool TraverseCompoundStmt(CompoundStmt *S) {
+ Builder.learnTokenNode(S->getLBracLoc(), tok::l_brace);
+ Builder.learnTokenNode(S->getRBracLoc(), tok::r_brace);
----------------
sammccall wrote:
> So explicitly claiming the primitive tokens, but implicitly claiming the
> subtrees, seems like a weird mix.
> Having both explicit might be nicer:
> - It seems somewhat likely we want to record/tag their semantics (maybe in
> the child itself, or even the low bits of its pointer?), rather than having
> accessors scan around looking for something likely.
> - currently when expected subtrees fail to parse, their tokens get
> (implicitly) wrapped up in Recovery nodes. They're good targets for heuristic
> parsing, but this probably means we should record what an unexplained range
> of tokens is supposed to be for.
>
> Thinking of something like:
> ```
> builder.expectToken(l_brace, S->getLBracLoc());
> builder.expectTree(Statement, S->getBody());
> builder.expectToken(r_brace, S->getRBracLoc());
> ```
> where the builder would react to non-null AST nodes by mapping the associated
> syntax node, and null AST nodes by trying to heuristically parse the tokens
> in between LBracLoc and RBracLoc.
>
> But lots of unanswered questions here: body is a list of statements, how does
> that work? What if LBracLoc or RBracLoc is missing? etc.
That's exactly the design we have one, with a limitation that `expect*` have to
be called in left-to-right and bottom-up manner. Also, you can only build a
tree that ends at the tokens that were consumed.
This is actually a reasonable interface to build a tree from an actual parser,
but might feel weird for a ast-to-syntax transformation. I need to figure out a
way to write good docs about it, but there's a separate comment for that.
Marking this as done (although the questions you mentioned at the end are still
there)
================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:135
+
+void syntax::TreeBuilder::learnNodeImpl(const syntax::Token *Begin,
+ const syntax::Token *End,
----------------
sammccall wrote:
> this function needs some high-level implementation comments
Done. Also renamed it to `consumeNode`.
The docs are very short, though, might need a revamp for clarity.
================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:164
+ };
+ for (auto It = FirstChild; It != NodesInProgress.end(); ++It) {
+ // Add non-coverred ranges as recovery nodes.
----------------
sammccall wrote:
> why can It not point to a node that spans/is past End?
>
> (edit after offline discussion: it's an important invariant that we're always
> consuming a suffix of the pending nodes)
This is now spelled out in the documentation for `foldChildren`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61637/new/
https://reviews.llvm.org/D61637
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits