sammccall marked 6 inline comments as done. sammccall added inline comments.
================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54 +namespace rules { +namespace dummy { +enum Rule { ---------------- hokein wrote: > sammccall wrote: > > usaxena95 wrote: > > > sammccall wrote: > > > > hokein wrote: > > > > > why there is a dummy namespace here? > > > > for each NT, we close the previous ns+enum and open new ones. > > > > For this to work for the first NT, we have to open an ns+enum. > > > > > > > > I can add a comment, but would prefer to do this some other way? > > > I would include this block in the clang-format off block to show these > > > are for the generated code. > > > ``` > > > //clang-format off > > > namespace dummy { enum Rule { > > > ... > > > };} > > > //clang-format on > > > ``` > > Oops, they're not for the generated code, just for the macro definition > > (clang-format gets very confused) > > > > Restricted the scope of the formatting-disabled block and tried to improve > > the hand-formatting (though I don't think any formatting of this particular > > preprocessor trick is very readable) > ah, I get it (until I read the preprocessed CXX.h code) -- I found it really > hard to understand it by literally reading the code here. > > To make it clear, I think we can probably add two additional RULE_BEGIN, > RULE_END macros? > > in `CXXSymbols.inc`, we emit something like > > ``` > RULE_BEGIN(_) > RULE(_, translation_unit, 135) > RULE(_, statement_seq, 136) > RULE(_, declaration_seq, 137)) > RULE_END(_) > ``` > > In CXX.h, we write > > ``` > #define RULE_BEGIN(LHS) namespace LHS { enum Rule : RULE ID { > #define RULE_END() }; } > ``` > That would make the callsite more clear, but at the cost of adding ad-hoc bits to the CXXSymbols.inc. I'd prefer to keep it generic if we can (otherwise we might as well just have Gen generate the enum definitions directly, right?) Added a comment to explain the dummy. ================ Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54-62 +namespace dummy { +enum Rule { +//clang-format off +#define NONTERMINAL(NAME, ID) };} namespace NAME { enum Rule : RuleID { +#define RULE(LHS, RHS, ID) RHS = ID, #include "CXXSymbols.inc" + //clang-format on ---------------- usaxena95 wrote: > sammccall wrote: > > hokein wrote: > > > why there is a dummy namespace here? > > for each NT, we close the previous ns+enum and open new ones. > > For this to work for the first NT, we have to open an ns+enum. > > > > I can add a comment, but would prefer to do this some other way? > I would include this block in the clang-format off block to show these are > for the generated code. > ``` > //clang-format off > namespace dummy { enum Rule { > ... > };} > //clang-format on > ``` Oops, they're not for the generated code, just for the macro definition (clang-format gets very confused) Restricted the scope of the formatting-disabled block and tried to improve the hand-formatting (though I don't think any formatting of this particular preprocessor trick is very readable) ================ Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:119 - switch ((cxx::Rule)Declarator->rule()) { - case Rule::noptr_declarator_0declarator_id: // reached the bottom + switch (Declarator->rule()) { + case rule::noptr_declarator::declarator_id: // reached the bottom ---------------- hokein wrote: > sammccall wrote: > > hokein wrote: > > > The code of applying the pattern doesn't look much worse to me and it is > > > easier to verify the exhaustiveness by human as well. We might loose some > > > performance (I hope not a lot), but I think it is a good tradeoff (not > > > saying we need do it in this patch). > > > > > > ``` > > > switch (LHSNonterminal(Declarator->rule(), *G)) { > > > case cxx::Symbol::noptr_declarator: { > > > switch ((rule::noptr_declarator)Declarator->rule()) { > > > case rule::noptr_declarator::declarator_id: > > > .... > > > case > > > rule::noptr_declarator::noptr_declarator__parameters_and_qualifiers: > > > .... > > > } > > > ... > > > } > > > } > > > ``` > > I guess this is a question of taste, but I find that style very hard to > > read. > > > > (Note that it's incorrectly indented, and the indentation rules around > > switch/case are one of the main reasons I find nested switches confusing) > > (Note that it's incorrectly indented, and the indentation rules around > > switch/case are one of the main reasons I find nested switches confusing) > > Ah the indentation is weird (it is not what I originally written..). There > are ways to make the indentation correct (turn off the clang-format etc). > > If nested switch is hard to read, maybe we can wrap one with a macro to > improve the code readability (happy to explore ideas, I think there is value > on this pattern). Maybe - I think nonstandard formatting and macros cost a lot of readability in their own right. I think we should probably discuss this separately - I think this change is worth having even if we don't change control flow to exploit in in all cases, WDYT? ================ Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:51 #define TOK(X) #X, -#define KEYWORD(X, Y) #X, +#define KEYWORD(Keyword, Condition) llvm::StringRef(#Keyword).upper(), #include "clang/Basic/TokenKinds.def" ---------------- hokein wrote: > sammccall wrote: > > hokein wrote: > > > IMO, this improves the readability, and also aligns with the text in the > > > cxx.grammar. > > Thanks. I like this change too. We still have `semi` vs `;` (should we use > > `SEMI`?) but it feels clearer > yeah, that looks good (that means lowercase is always referring to > nonterminals) I go back and forth: I think the use-mention distinction is important to readability too: - `initializer_list` means an initializer list goes here - `INT` does *not* mean an integer goes here, but rather the *word* "int" - by this test, `comma` should be lowercase, because it's a comma, not the word "comma" that goes here (Now I'm actually wondering if we should go back and lowercase `IDENTIFIER`, `NUMERIC_CONSTANT` etc in the bnf file...) In any case, let's uppercase `SEMI` for now because anything else leaves ugly inconsistency somewhere. ================ Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:65 + std::string MangleName; + for (size_t I = 0; I < R.seq().size(); ++I) { + MangleName += mangleSymbol(R.seq()[I]); ---------------- hokein wrote: > sammccall wrote: > > hokein wrote: > > > ok, now we're dropping the index for all RHS symbols. Just want to know > > > your thought about this. Personally, I'd like to keep this information > > > (`noptr_declarator__l_square__constant_expression__r_square` vs > > > `noptr_declarator0_l_square1_constant_expression2_r_square3`) though it > > > makes the name a bit uglier. > > > > > > > Change mangling of keywords to ALL_CAPS (needed to turn keywords that > > > > appear alone on RHS into valid identifiers) > > > > > > if we add index number to the name, then this change is not required. > > Short version: I can deal with the numbers at the front (there's a *little* > > bit of value), but at the back of the symbol name there's no value, just > > noise. I find the double-underscore version much more readable (than either > > variant with numbers). > > > > --- > > > > I always found the indexes aesthetically ugly but OK when you could read > > them as > > ``` > > lhs _ 0 rhsa_ 1 rhsb > > lhs := [0]rhsa [1]rhsb > > ``` > > > > But an identifier can't start with a number > > (`rule::empty_statement::0semi`) this isn't possible. I think you suggested > > shifting the numbers to the end, but I can't find a way to read that, it > > adds too much visual noise. (e.g. because the number "decorations" don't > > line up in a column in code completion or a switch statement). > > > > I also think the double-underscore version is significantly less cryptic > > for new readers. > > I find the double-underscore version much more readable (than either > > variant with numbers) > > I agree that the double-underscore is more readable, but I don't have a > much-better feeling. I'm ok with the double-underscore one if you feel strong > about it. > > The reason why I value the index is that it is easier to spot the index and > its corresponding RHS element, and it could potentially avoid writing bugs > (specially in the guard function implementation, > thinking about accessing the RHS element in the looong rule > `l_paren__r_paren__decl_specifier_seq__noexcept_specifier__trailing_return_type__requires_clause`). > > > The reason why I value the index is that it is easier to spot the index and > its corresponding RHS element Right, I agree, but at least for me this benefit vanishes entirely if the index has to come at the end of the name instead of the start. I don't feel strongly about double-underscore over `0foo_1bar` (which isn't a valid identifier), but I do feel strongly about it over `foo0_bar1`. Losing the indices is a tradeoff, but I don't feel like it's terrible (it's bad in the example you gave, but it's not the common case: median RHS tokens is 1, the average is 2, and 85% have <=3) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D130414/new/ https://reviews.llvm.org/D130414 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits