hokein added a comment.
I think the current version should be good to review, some bits need to take
care of:
- left-recursive vs right-recursive of declaration-seq rule, see my other
comments about it. Currently I keep it as-is to avoid the performance
regression on large files (I still don't see your point how right-recursive can
simplify the error recovery, it seems to me either can work at least for this
case)
- recoveryNextDeclaration will consume at least 1 token (the return token index
> Cursor) to avoid the risk of running in an infinite loop
- I keep the original implementation as-is (except fixing some bugs), but IMO
`T->Indent == OrigIndent && LikelyStartsDeclaration(T->Kind)` is too strict
(see my FIXME testcases), and we should probably include the IDENTIFIER (I
believe it is a common case), `LikelyStartsDeclaration(T->Kind) || (T->Kind =
IDENTIFIER && T->Indent == OrigIndent)` might be better.
================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:291
+ // they appear at the beginning of a line.
+ auto LikelyStartsDeclaration = [&](tok::TokenKind K) -> bool {
+ switch (K) {
----------------
hokein wrote:
> the list looks reasonable -- I think we can lookup the `Follow(declaration)`
> set to find more.
added more: kw_friend, kw_inline, kw_explicit, kw_virtual.
================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:313
-declaration-seq := declaration
-declaration-seq := declaration-seq declaration
declaration := block-declaration
----------------
hokein wrote:
> Just want to spell out, this causes a performance hit :(
> (a pure change of this has a ~6% slowdown, 7.17 MB/s -> 6.7MB/s on
> SemaCodeComplete.cpp).
>
> left-cursive grammar is more friendly to LR-like parsers, because the parser
> can do the reduction earlier (after a parsed a `declaration`, we can perform
> a `declaration-seq` reduce); while right-cursive grammar, the reduction
> starts from the last declaration (this means our GSS is more deeper).
>
> > change decl-sequence to right-recursive to simplify this
> I remembered we discussed this bit before, but I don't really remember the
> details now :(
>
Got more data, it does hurt the performance on large files:
SemaExpr.cpp: 9.54MB/s -> 7.98MB/s
SemaDeclCXX.cpp: 10.16MB/s -> 9.25MB/s
small/medium files doesn't seem to be affected (hypothesis: these files has
less declarations than large files, thus being less affected by the
right-recursive declaration-seq grammar rule)
Hover.cpp: 8MB/s -> 7.92MB/s
ASTSignals.cpp: 12.34M/s -> 12.29MB/s
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130460/new/
https://reviews.llvm.org/D130460
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits