[PATCH] D130414: [pseudo] Reorganize CXX.h enums

Sam McCall via Phabricator via cfe-commits Tue, 26 Jul 2022 07:04:47 -0700

sammccall marked 6 inline comments as done.
sammccall added inline comments.



================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54
+namespace rules {
+namespace dummy {
+enum Rule {
----------------
hokein wrote:
> sammccall wrote:
> > usaxena95 wrote:
> > > sammccall wrote:
> > > > hokein wrote:
> > > > > why there is a dummy namespace here?
> > > > for each NT, we close the previous ns+enum and open new ones.
> > > > For this to work for the first NT, we have to open an ns+enum.
> > > > 
> > > > I can add a comment, but would prefer to do this some other way?
> > > I would include this block in the clang-format off block to show these 
> > > are for the generated code.
> > > ```
> > > //clang-format off
> > > namespace dummy { enum Rule {
> > > ...
> > > };}
> > > //clang-format on
> > > ```
> > Oops, they're not for the generated code, just for the macro definition 
> > (clang-format gets very confused)
> > 
> > Restricted the scope of the formatting-disabled block and tried to improve 
> > the hand-formatting (though I don't think any formatting of this particular 
> > preprocessor trick is very readable)
> ah, I get it (until I read the preprocessed CXX.h code) -- I found it really 
> hard to understand it by literally reading the code here. 
> 
> To make it clear,  I think we can probably add two additional RULE_BEGIN, 
> RULE_END macros? 
> 
>  in `CXXSymbols.inc`, we emit something like
> 
> ```
> RULE_BEGIN(_)
> RULE(_, translation_unit, 135)
> RULE(_, statement_seq, 136)
> RULE(_, declaration_seq, 137))
> RULE_END(_)
> ```
> 
> In CXX.h, we write
> 
> ```
> #define RULE_BEGIN(LHS) namespace LHS { enum Rule : RULE ID {
> #define RULE_END()  }; }
> ```
> 
That would make the callsite more clear, but at the cost of adding ad-hoc bits 
to the CXXSymbols.inc.
I'd prefer to keep it generic if we can (otherwise we might as well just have 
Gen generate the enum definitions directly, right?)

Added a comment to explain the dummy.


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54-62
+namespace dummy {
+enum Rule {
+//clang-format off
+#define NONTERMINAL(NAME, ID) };} namespace NAME { enum Rule : RuleID {
+#define RULE(LHS, RHS, ID) RHS = ID,
 #include "CXXSymbols.inc"
+  //clang-format on
----------------
usaxena95 wrote:
> sammccall wrote:
> > hokein wrote:
> > > why there is a dummy namespace here?
> > for each NT, we close the previous ns+enum and open new ones.
> > For this to work for the first NT, we have to open an ns+enum.
> > 
> > I can add a comment, but would prefer to do this some other way?
> I would include this block in the clang-format off block to show these are 
> for the generated code.
> ```
> //clang-format off
> namespace dummy { enum Rule {
> ...
> };}
> //clang-format on
> ```
Oops, they're not for the generated code, just for the macro definition 
(clang-format gets very confused)

Restricted the scope of the formatting-disabled block and tried to improve the 
hand-formatting (though I don't think any formatting of this particular 
preprocessor trick is very readable)


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:119
 
-    switch ((cxx::Rule)Declarator->rule()) {
-    case Rule::noptr_declarator_0declarator_id: // reached the bottom
+    switch (Declarator->rule()) {
+    case rule::noptr_declarator::declarator_id: // reached the bottom
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > The code of applying the pattern doesn't look much worse to me and it is 
> > > easier to verify the exhaustiveness by human as well. We might loose some 
> > > performance (I hope not a lot), but I think it is a good tradeoff (not 
> > > saying we need do it in this patch).
> > > 
> > > ```
> > > switch (LHSNonterminal(Declarator->rule(), *G)) {
> > >    case cxx::Symbol::noptr_declarator: {
> > >       switch ((rule::noptr_declarator)Declarator->rule())  {
> > >          case rule::noptr_declarator::declarator_id:
> > >              ....
> > >          case 
> > > rule::noptr_declarator::noptr_declarator__parameters_and_qualifiers:
> > >              ....
> > >       }
> > >    ...
> > >    }
> > > }
> > > ```
> > I guess this is a question of taste, but I find that style very hard to 
> > read.
> > 
> > (Note that it's incorrectly indented, and the indentation rules around 
> > switch/case are one of the main reasons I find nested switches confusing)
> > (Note that it's incorrectly indented, and the indentation rules around 
> > switch/case are one of the main reasons I find nested switches confusing)
> 
> Ah the indentation is weird (it is not what I originally written..). There 
> are ways to make the indentation correct (turn off the clang-format etc).
> 
> If nested switch is hard to read, maybe we can wrap one with a macro to 
> improve the code readability (happy to explore ideas, I think there is value 
> on this pattern).
Maybe - I think nonstandard formatting and macros cost a lot of readability in 
their own right.

I think we should probably discuss this separately - I think this change is 
worth having even if we don't change control flow to exploit in in all cases, 
WDYT?


================
Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:51
 #define TOK(X) #X,
-#define KEYWORD(X, Y) #X,
+#define KEYWORD(Keyword, Condition) llvm::StringRef(#Keyword).upper(),
 #include "clang/Basic/TokenKinds.def"
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > IMO, this improves the readability, and also aligns with the text in the 
> > > cxx.grammar.
> > Thanks. I like this change too. We still have `semi` vs `;` (should we use 
> > `SEMI`?) but it feels clearer
> yeah, that looks good (that means lowercase is always referring to 
> nonterminals)
I go back and forth: I think the use-mention distinction is important to 
readability too:
 - `initializer_list` means an initializer list goes here
 - `INT` does *not* mean an integer goes here, but rather the *word* "int"
 - by this test, `comma` should be lowercase, because it's a comma, not the 
word "comma" that goes here

(Now I'm actually wondering if we should go back and lowercase `IDENTIFIER`, 
`NUMERIC_CONSTANT` etc in the bnf file...)

In any case, let's uppercase `SEMI` for now because anything else leaves ugly 
inconsistency somewhere.


================
Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:65
+  std::string MangleName;
+  for (size_t I = 0; I < R.seq().size(); ++I) {
+    MangleName += mangleSymbol(R.seq()[I]);
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > ok, now we're dropping the index for all RHS symbols. Just want to know 
> > > your thought about this. Personally, I'd like to keep this information 
> > > (`noptr_declarator__l_square__constant_expression__r_square` vs 
> > > `noptr_declarator0_l_square1_constant_expression2_r_square3`) though it 
> > > makes the name a bit uglier.
> > > 
> > > > Change mangling of keywords to ALL_CAPS (needed to turn keywords that 
> > > > appear alone on RHS into valid identifiers)
> > > 
> > > if we add index number to the name, then this change is not required.
> > Short version: I can deal with the numbers at the front (there's a *little* 
> > bit of value), but at the back of the symbol name there's no value, just 
> > noise. I find the double-underscore version much more readable (than either 
> > variant with numbers).
> > 
> > ---
> > 
> > I always found the indexes aesthetically ugly but OK when you could read 
> > them as
> > ```
> > lhs _   0 rhsa_ 1 rhsb
> > lhs := [0]rhsa [1]rhsb
> > ```
> > 
> > But an identifier can't start with a number 
> > (`rule::empty_statement::0semi`) this isn't possible. I think you suggested 
> > shifting the numbers to the end, but I can't find a way to read that, it 
> > adds too much visual noise. (e.g. because the number "decorations" don't 
> > line up in a column in code completion or a switch statement).
> > 
> > I also think the double-underscore version is significantly less cryptic 
> > for new readers.
> >  I find the double-underscore version much more readable (than either 
> > variant with numbers)
> 
> I agree that the double-underscore is more readable, but I don't have a 
> much-better feeling. I'm ok with the double-underscore one if you feel strong 
> about it.
> 
> The reason why I value the index is that it is easier to spot the index and 
> its corresponding RHS element, and it could potentially avoid writing bugs 
> (specially in the guard function implementation,
> thinking about accessing the RHS element in the looong rule 
> `l_paren__r_paren__decl_specifier_seq__noexcept_specifier__trailing_return_type__requires_clause`).
> 
> 
> The reason why I value the index is that it is easier to spot the index and 
> its corresponding RHS element

Right, I agree, but at least for me this benefit vanishes entirely if the index 
has to come at the end of the name instead of the start.

I don't feel strongly about double-underscore over `0foo_1bar` (which isn't a 
valid identifier), but I do feel strongly about it over `foo0_bar1`.

Losing the indices is a tradeoff, but I don't feel like it's terrible (it's bad 
in the example you gave, but it's not the common case: median RHS tokens is 1, 
the average is 2, and 85% have <=3)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130414/new/

https://reviews.llvm.org/D130414

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D130414: [pseudo] Reorganize CXX.h enums

Reply via email to