hokein added a comment. I think the current version should be good to review, some bits need to take care of:
- left-recursive vs right-recursive of declaration-seq rule, see my other comments about it. Currently I keep it as-is to avoid the performance regression on large files (I still don't see your point how right-recursive can simplify the error recovery, it seems to me either can work at least for this case) - recoveryNextDeclaration will consume at least 1 token (the return token index > Cursor) to avoid the risk of running in an infinite loop - I keep the original implementation as-is (except fixing some bugs), but IMO `T->Indent == OrigIndent && LikelyStartsDeclaration(T->Kind)` is too strict (see my FIXME testcases), and we should probably include the IDENTIFIER (I believe it is a common case), `LikelyStartsDeclaration(T->Kind) || (T->Kind = IDENTIFIER && T->Indent == OrigIndent)` might be better. ================ Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:291 + // they appear at the beginning of a line. + auto LikelyStartsDeclaration = [&](tok::TokenKind K) -> bool { + switch (K) { ---------------- hokein wrote: > the list looks reasonable -- I think we can lookup the `Follow(declaration)` > set to find more. added more: kw_friend, kw_inline, kw_explicit, kw_virtual. ================ Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:313 -declaration-seq := declaration -declaration-seq := declaration-seq declaration declaration := block-declaration ---------------- hokein wrote: > Just want to spell out, this causes a performance hit :( > (a pure change of this has a ~6% slowdown, 7.17 MB/s -> 6.7MB/s on > SemaCodeComplete.cpp). > > left-cursive grammar is more friendly to LR-like parsers, because the parser > can do the reduction earlier (after a parsed a `declaration`, we can perform > a `declaration-seq` reduce); while right-cursive grammar, the reduction > starts from the last declaration (this means our GSS is more deeper). > > > change decl-sequence to right-recursive to simplify this > I remembered we discussed this bit before, but I don't really remember the > details now :( > Got more data, it does hurt the performance on large files: SemaExpr.cpp: 9.54MB/s -> 7.98MB/s SemaDeclCXX.cpp: 10.16MB/s -> 9.25MB/s small/medium files doesn't seem to be affected (hypothesis: these files has less declarations than large files, thus being less affected by the right-recursive declaration-seq grammar rule) Hover.cpp: 8MB/s -> 7.92MB/s ASTSignals.cpp: 12.34M/s -> 12.29MB/s Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D130460/new/ https://reviews.llvm.org/D130460 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits