sammccall added inline comments.

================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:157-184
+  /// Iterator over children (common base for const/non-const).
+  /// Not invalidated by tree mutations (holds a stable node pointer).
+  template <typename DerivedT, typename NodeT>
+  class child_iterator_base
+      : public llvm::iterator_facade_base<DerivedT, std::forward_iterator_tag,
+                                          NodeT> {
+  protected:
----------------
eduucaldas wrote:
> I think we should only have an iterator through `Node`s, not a general one 
> like this.
> 
> In general `Tree` has *heterogeneous* children:
> For instance, `1+2` yields the syntax tree:
> ```
> BinaryOperatorExpression
> |-  IntegerLiteralExpression
> |-  Leaf
> `- IntegerLiteralExpression
> ```
> 
> Very rarely will syntax trees have all their children of the same kind, so I 
> don't think it makes sense to try an provide an iterator template. That makes 
> sense for lists , because generally they *have* the same kind. But in that 
> case we provide special iterators for lists only.
> 
This does only iterate over nodes, note this class is private.
This is a common pattern when implementing iterators: you need `iterator` and 
`const_iterator`, they have almost identical implementations but operate on 
different types (`const T` vs `T`). Thus a private template.

Sometimes it's used directly (`using iterator = iterator_base<Node>`) but that 
means the template needs to be visible and also leads to worse compiler 
diagnostics when the canonical type is printed. Inheritance lets us prevent 
direct use of the base class and give the iterator classes distinct names.

---

That said, in future, I think we should (later) have a facility to iterate over 
children of a certain type. This would explicitly be a filtering operation, 
ignoring nodes not of that type.
Providing these typed operations for concrete subclasses of List makes sense, 
but in clangd outside of specific refactorings we've found generic facilities 
to often be more useful. When we designed syntax trees, providing a simple and 
generic data structure that could be used directly was an explicit goal.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:202-219
+  /// child_iterator is not invalidated by mutations.
+  struct child_iterator : child_iterator_base<child_iterator, Node> {
+    using Base::Base;
+  };
+  struct const_child_iterator
+      : child_iterator_base<const_child_iterator, const Node> {
+    using Base::Base;
----------------
eduucaldas wrote:
> TL;DR:
> `child_iterator` -> `ChildIterator`
> `const_child_iterator` -> `ConstChildIterator`
> `children` -> `getChildren`
> 
> I see you followed the naming convention of the ClangAST, which makes loads 
> of sense. 
> 
> In syntax trees we follow the naming conventions of the [conding 
> standard](https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly)
>  -- just because we want consistency and we had to pick a guideline to follow 
> -- according to that, functions should be verb like and names should be camel 
> case.
> 
> If like us you chose `const_child_iterator` and `children` kind of "just 
> because", could you change it to follow `ConstChildIterator` and 
> `getChildren` respectively. 
> In syntax trees we follow the naming conventions of the conding standard

I see that's been used in some of the newer classes, and that you changed this 
file to follow that style in 4c14ee61b73746b314d83e7c52e03d6527b78105. However 
it's not the original style we used here and elsewhere in libSyntax.
It's a bit frustrating to hear the argument about consistency, because we were 
quite careful and deliberate about this style, which was IMO fairly consistent 
prior to that change.

I'm willing to change these to match the local style in this file. However 
`*_iterator`, `*_begin/end/range()` is more common than FooIterator etc in LLVM 
overall (I think for consistency with `iterator` etc, which is endorsed by the 
style guide). So I don't think this change is particularly an improvement, even 
in terms of consistency.


================
Comment at: clang/lib/Tooling/Syntax/Tree.cpp:22-23
   if (auto *T = dyn_cast<syntax::Tree>(N)) {
-    for (const auto *C = T->getFirstChild(); C; C = C->getNextSibling())
-      traverse(C, Visit);
+    for (const syntax::Node &C : T->children())
+      traverse(&C, Visit);
   }
----------------
eduucaldas wrote:
> Hmm... 
> That looks funny, if the user uses a range-based for loop and forgets the 
> `&`, then we would be copying the Node???
> 
> Also in many places we had to get the address to the node. That is not really 
> consistent with how the syntax trees API's were designed, they generally take 
> pointers instead of references. (that said we could obviously reconsider 
> those design choices)
> That looks funny, if the user uses a range-based for loop and forgets the &, 
> then we would be copying the Node???

It doesn't look funny to me - foreach usually iterates over values rather than 
pointers. This raises a good point - it looks like Node is copyable and 
shouldn't be. (The default copy operation violates the tree invariants, e.g. 
the copied node will not be a child of its parent). I'll send a patch...

(That said this is always the case with copyable objects in range-based for 
loops, and indeed using references - that doesn't usually mean we avoid using 
them)

> Also in many places we had to get the address to the node. That is not really 
> consistent with how the syntax trees API's were designed, they generally take 
> pointers instead of references

I'm not convinced that taking the address is itself a burden, any more than 
using `->` instead of `.` is a burden.
Sometimes the use of pointer vs reference intends to convey something about 
address stability, but here this goes with the type as all nodes are allocated 
on the arena. (IMO this is a part of the AST design that works well).
The other thing it conveys is nullability, and this seems valuable enough that 
IMO APIs that take *non-nullable* nodes by pointer should be reconsidered.

For the iterators specifically, the main alternative here I guess is having 
Tree::children() act as a container of *pointers to* children, instead of the 
children themselves. This *is* highly unconventional and surprising. Consider 
the ubiquitous `for (const auto&N : T->children())`... now the author/reader 
expects N to be a const reference to a child, instead it's a const reference to 
a **non-const** pointer to a child....


================
Comment at: clang/unittests/Tooling/Syntax/TreeTest.cpp:129
 
+TEST_F(TreeTest, Iterators) {
+  buildTree("", allTestClangConfigs().front());
----------------
eduucaldas wrote:
> Suggestion:
> I would split this into `Iterators` and `ConstIterators`.
Hmm, I'm not a big fan of duplicating all the setup bits. And where would we 
test that iterators can be converted to an equivalent const_iterator? (This 
doubles as our implicit test of the multi-pass guarantee, too...)


================
Comment at: clang/unittests/Tooling/Syntax/TreeTest.cpp:159-169
+  for (unsigned I = 0; I < 3; ++I) {
+    EXPECT_EQ(It, CIt);
+    EXPECT_TRUE(It);
+    EXPECT_TRUE(CIt);
+    EXPECT_EQ(It.asPointer(), Children[I]);
+    EXPECT_EQ(CIt.asPointer(), Children[I]);
+    EXPECT_EQ(&*It, Children[I]);
----------------
eduucaldas wrote:
> Is there a reason why you didn't use a range-based for loop over Children?
I want to be really explicit about the expectation that this yields three 
elements (as opposed to say zero).

The ElementsAre tests above should disallow that going wrong, but calling the 
methods once directly with the dumbest possible code seems worthwhile to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90023/new/

https://reviews.llvm.org/D90023

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to