[PATCH] D105904: [clangd] Support `#pragma mark` in the outline

Kadir Cetinkaya via Phabricator via cfe-commits Wed, 04 Aug 2021 03:57:08 -0700

kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:664
+      }
+      // NextPragma is after us but before the next symbol. Our reign is over.
+      break;
----------------
I suppose this is not correct in cases like.

```
#pragma mark - Outer
foo {
  #pragma mark - Foo
}
bar {
  # pragma mark - Bar
}
baz {
  # pragma mark - Baz
}
```

We'll first move `foo` under `Outer`, then start processing `bar` we notice 
that It comes after pragma `Foo`, and then also notice that pragma `Foo` is not 
contained in `bar` hence break. but all of `foo,bar,baz` should've been moved 
to be under `Outer` instead.


================
Comment at: clang-tools-extra/clangd/FindSymbols.cpp:535
+/// by range.
+std::vector<DocumentSymbol> mergePragmas(std::vector<DocumentSymbol> &Syms,
+                                         std::vector<PragmaMarkSymbol> 
&Pragmas,
----------------
dgoldman wrote:
> kadircet wrote:
> > dgoldman wrote:
> > > kadircet wrote:
> > > > dgoldman wrote:
> > > > > kadircet wrote:
> > > > > > dgoldman wrote:
> > > > > > > sammccall wrote:
> > > > > > > > FWIW the flow control/how we make progress seem hard to follow 
> > > > > > > > here to me.
> > > > > > > > 
> > > > > > > > In particular I think I'm struggling with the statefulness of 
> > > > > > > > "is there an open mark group".
> > > > > > > > 
> > > > > > > > Possible simplifications:
> > > > > > > >  - define a dummy root symbol, which seems clearer than the 
> > > > > > > > vector<symbols> + range
> > > > > > > >  - avoid reverse-sorting the list of pragma symbols, and just 
> > > > > > > > consume from the front of an ArrayRef instead
> > > > > > > >  - make the outer loop over pragmas, rather than symbols. It 
> > > > > > > > would first check if the pragma belongs directly here or not, 
> > > > > > > > and if so, loop over symbols to work out which should become 
> > > > > > > > children. This seems very likely to be efficient enough in 
> > > > > > > > practice (few pragmas, or most children are grouped into 
> > > > > > > > pragmas)
> > > > > > > > define a dummy root symbol, which seems clearer than the 
> > > > > > > > vector<symbols> + range
> > > > > > > 
> > > > > > > I guess? Then we'd take in a `DocumentSymbol & and a 
> > > > > > > ArrayRef<PragmaMarkSymbol> & (or just by value and then return it 
> > > > > > > as well). The rest would be the same though
> > > > > > > 
> > > > > > > > In particular I think I'm struggling with the statefulness of 
> > > > > > > > "is there an open mark group".
> > > > > > > 
> > > > > > > We need to track the current open group if there is one in order 
> > > > > > > to move children to it.
> > > > > > > 
> > > > > > > > make the outer loop over pragmas, rather than symbols. It would 
> > > > > > > > first check if the pragma belongs directly here or not, and if 
> > > > > > > > so, loop over symbols to work out which should become children. 
> > > > > > > > This seems very likely to be efficient enough in practice (few 
> > > > > > > > pragmas, or most children are grouped into pragmas)
> > > > > > > 
> > > > > > > The important thing here is knowing where the pragma mark ends - 
> > > > > > > if it doesn't, it actually gets all of the children. So we'd have 
> > > > > > > to peak at the next pragma mark, add all symbols before it to us 
> > > > > > > as children, and then potentially recurse to nest it inside of a 
> > > > > > > symbol. I'll try it out and see if it's simpler.
> > > > > > > 
> > > > > > > 
> > > > > > ```
> > > > > > while(Pragmas) {
> > > > > > // We'll figure out where the Pragmas.front() should go.
> > > > > > Pragma P = Pragmas.front();
> > > > > > DocumentSymbol *Cur = Root;
> > > > > > while(Cur->contains(P)) {
> > > > > >   auto *OldCur = Cur;
> > > > > >   for(auto *C : Cur->children) {
> > > > > >      // We assume at most 1 child can contain the pragma (as 
> > > > > > pragmas are on a single line, and children have disjoint ranges)
> > > > > >      if (C->contains(P)) {
> > > > > >          Cur = C;
> > > > > >          break;
> > > > > >      }
> > > > > >   }
> > > > > >   // Cur is immediate parent of P
> > > > > >   if (OldCur == Cur) {
> > > > > >     // Just insert P into children if it is not a group and we are 
> > > > > > done.
> > > > > >     // Otherwise we need to figure out when current pragma is 
> > > > > > terminated:
> > > > > > // if next pragma is not contained in Cur, or is contained in one 
> > > > > > of the children, It is at the end of Cur, nest all the children 
> > > > > > that appear after P under the symbol node for P.
> > > > > > // Otherwise nest all the children that appear after P but before 
> > > > > > next pragma under the symbol node for P.
> > > > > > // Pop Pragmas and break
> > > > > >   }
> > > > > > }
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > Does that make sense, i hope i am not missing something obvious? 
> > > > > > Complexity-wise in the worst case we'll go all the way down to a 
> > > > > > leaf once per pragma, since there will only be a handful of pragmas 
> > > > > > most of the time it shouldn't be too bad.
> > > > > I've implemented your suggestion. I don't think it's simpler, but 
> > > > > LMK, maybe it can be improved.
> > > > oops, i was looking into an older revision and missed mergepragmas2, i 
> > > > think it looks quite similar to this one but we can probably get rid of 
> > > > the recursion as well and simplify a couple more cases
> > > This makes sense,  I think that works for the most part besides dropping 
> > > the recursion, specifically for
> > > 
> > > ```
> > >       // Next pragma is contained in the Sym, it belongs there and doesn't
> > >       // affect us at all.
> > >       if (Sym.range.contains(NextPragma.DocSym.range)) {
> > >         Sym.children = mergePragmas2(Sym.children, Pragmas, Sym.range);
> > >         continue;
> > >       }
> > > ```
> > > 
> > > I guess we could explicitly forbid 3+ layers of nesting and handle it 
> > > inline there? But I'm not sure it's worth the effort to rewrite all of 
> > > this - the recursion shouldn't be deep and we avoid needing to shift 
> > > vector elements over by recreating a new one.
> > Sorry I don't follow why we can't get rid of the recursion in this case.
> > 
> > Two loop solution I described above literally tries to find the document 
> > symbol node, such that the current pragma is contained in that node && 
> > current pragma isn't contained in any of that node's children. Afterwards 
> > it inserts the pragma into that node and starts traversing the tree from 
> > root again for the next pragma.
> > 
> > Again I don't follow where the `3+ layers of nesting` constraint came from. 
> > But I do feel like the iterative version is somewhat easier to reason about 
> > (especially keeping track of what's happening with `pragmas.front()` and 
> > the way it bails out via `parentrange` check). Shifting of the vector is 
> > definitely unfortunate but I think it shouldn't imply big performance hits 
> > in practice as we are only shifting the children of a single node.
> Yeah, I understand the first part,  I think specifically handling the group 
> case after you discover where it needs to be inserted is a bit more 
> complicated, something like the following:
> 
> 
> ```
>         // Pragma is a group, so we need to figure out where it terminates:
>         // - If the next Pragma is not contained in Cur, P owns all of its
>         //   parent's children which occur after P.
>         // - If the next pragma is contained in Cur but actually belongs to 
> one
>         //   of the parent's children, we temporarily skip over it and look at
>         //   the next pragma to decide where we end.
>         // - Otherwise nest all of its parent's children which occur after P 
> but
>         //   before the next pragma.
> ```
> 
> And yeah, shifting in the worst case is definitely worst (due to repeat 
> shifting) although it shouldn't be too common in practice (things like a 
> large @implementation block would probably have the most children).
> we temporarily skip over it and look at the next pragma to decide where we 
> end.

Ah you are right I was missing this part. As you suggested we'll need to loop 
over all the remaining pragmas, which will make the complexity `((quadratic in 
terms of # of pragmas) * (depth of document symbol tree))` I think it is still 
acceptable in practice. I don't think there'll be more than a dozen pragmas in 
a file. (moreover i think this problem exists with the current implementation 
you've provided, see my comment below)

Are you fine with reshaping the implementation in that way then? Because I 
still feel like the code will end up looking a lot more straightforward, even 
if not the most optimal and we can try to optimize the rest if it shows up in 
the latency graphs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105904/new/

https://reviews.llvm.org/D105904

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D105904: [clangd] Support `#pragma mark` in the outline

Reply via email to