sammccall marked 6 inline comments as done.
sammccall added inline comments.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54
+namespace rules {
+namespace dummy {
+enum Rule {
----------------
hokein wrote:
> sammccall wrote:
> > usaxena95 wrote:
> > > sammccall wrote:
> > > > hokein wrote:
> > > > > why there is a dummy namespace here?
> > > > for each NT, we close the previous ns+enum and open new ones.
> > > > For this to work for the first NT, we have to open an ns+enum.
> > > >
> > > > I can add a comment, but would prefer to do this some other way?
> > > I would include this block in the clang-format off block to show these
> > > are for the generated code.
> > > ```
> > > //clang-format off
> > > namespace dummy { enum Rule {
> > > ...
> > > };}
> > > //clang-format on
> > > ```
> > Oops, they're not for the generated code, just for the macro definition
> > (clang-format gets very confused)
> >
> > Restricted the scope of the formatting-disabled block and tried to improve
> > the hand-formatting (though I don't think any formatting of this particular
> > preprocessor trick is very readable)
> ah, I get it (until I read the preprocessed CXX.h code) -- I found it really
> hard to understand it by literally reading the code here.
>
> To make it clear, I think we can probably add two additional RULE_BEGIN,
> RULE_END macros?
>
> in `CXXSymbols.inc`, we emit something like
>
> ```
> RULE_BEGIN(_)
> RULE(_, translation_unit, 135)
> RULE(_, statement_seq, 136)
> RULE(_, declaration_seq, 137))
> RULE_END(_)
> ```
>
> In CXX.h, we write
>
> ```
> #define RULE_BEGIN(LHS) namespace LHS { enum Rule : RULE ID {
> #define RULE_END() }; }
> ```
>
That would make the callsite more clear, but at the cost of adding ad-hoc bits
to the CXXSymbols.inc.
I'd prefer to keep it generic if we can (otherwise we might as well just have
Gen generate the enum definitions directly, right?)
Added a comment to explain the dummy.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h:54-62
+namespace dummy {
+enum Rule {
+//clang-format off
+#define NONTERMINAL(NAME, ID) };} namespace NAME { enum Rule : RuleID {
+#define RULE(LHS, RHS, ID) RHS = ID,
#include "CXXSymbols.inc"
+ //clang-format on
----------------
usaxena95 wrote:
> sammccall wrote:
> > hokein wrote:
> > > why there is a dummy namespace here?
> > for each NT, we close the previous ns+enum and open new ones.
> > For this to work for the first NT, we have to open an ns+enum.
> >
> > I can add a comment, but would prefer to do this some other way?
> I would include this block in the clang-format off block to show these are
> for the generated code.
> ```
> //clang-format off
> namespace dummy { enum Rule {
> ...
> };}
> //clang-format on
> ```
Oops, they're not for the generated code, just for the macro definition
(clang-format gets very confused)
Restricted the scope of the formatting-disabled block and tried to improve the
hand-formatting (though I don't think any formatting of this particular
preprocessor trick is very readable)
================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:119
- switch ((cxx::Rule)Declarator->rule()) {
- case Rule::noptr_declarator_0declarator_id: // reached the bottom
+ switch (Declarator->rule()) {
+ case rule::noptr_declarator::declarator_id: // reached the bottom
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > The code of applying the pattern doesn't look much worse to me and it is
> > > easier to verify the exhaustiveness by human as well. We might loose some
> > > performance (I hope not a lot), but I think it is a good tradeoff (not
> > > saying we need do it in this patch).
> > >
> > > ```
> > > switch (LHSNonterminal(Declarator->rule(), *G)) {
> > > case cxx::Symbol::noptr_declarator: {
> > > switch ((rule::noptr_declarator)Declarator->rule()) {
> > > case rule::noptr_declarator::declarator_id:
> > > ....
> > > case
> > > rule::noptr_declarator::noptr_declarator__parameters_and_qualifiers:
> > > ....
> > > }
> > > ...
> > > }
> > > }
> > > ```
> > I guess this is a question of taste, but I find that style very hard to
> > read.
> >
> > (Note that it's incorrectly indented, and the indentation rules around
> > switch/case are one of the main reasons I find nested switches confusing)
> > (Note that it's incorrectly indented, and the indentation rules around
> > switch/case are one of the main reasons I find nested switches confusing)
>
> Ah the indentation is weird (it is not what I originally written..). There
> are ways to make the indentation correct (turn off the clang-format etc).
>
> If nested switch is hard to read, maybe we can wrap one with a macro to
> improve the code readability (happy to explore ideas, I think there is value
> on this pattern).
Maybe - I think nonstandard formatting and macros cost a lot of readability in
their own right.
I think we should probably discuss this separately - I think this change is
worth having even if we don't change control flow to exploit in in all cases,
WDYT?
================
Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:51
#define TOK(X) #X,
-#define KEYWORD(X, Y) #X,
+#define KEYWORD(Keyword, Condition) llvm::StringRef(#Keyword).upper(),
#include "clang/Basic/TokenKinds.def"
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > IMO, this improves the readability, and also aligns with the text in the
> > > cxx.grammar.
> > Thanks. I like this change too. We still have `semi` vs `;` (should we use
> > `SEMI`?) but it feels clearer
> yeah, that looks good (that means lowercase is always referring to
> nonterminals)
I go back and forth: I think the use-mention distinction is important to
readability too:
- `initializer_list` means an initializer list goes here
- `INT` does *not* mean an integer goes here, but rather the *word* "int"
- by this test, `comma` should be lowercase, because it's a comma, not the
word "comma" that goes here
(Now I'm actually wondering if we should go back and lowercase `IDENTIFIER`,
`NUMERIC_CONSTANT` etc in the bnf file...)
In any case, let's uppercase `SEMI` for now because anything else leaves ugly
inconsistency somewhere.
================
Comment at: clang-tools-extra/pseudo/lib/grammar/Grammar.cpp:65
+ std::string MangleName;
+ for (size_t I = 0; I < R.seq().size(); ++I) {
+ MangleName += mangleSymbol(R.seq()[I]);
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > ok, now we're dropping the index for all RHS symbols. Just want to know
> > > your thought about this. Personally, I'd like to keep this information
> > > (`noptr_declarator__l_square__constant_expression__r_square` vs
> > > `noptr_declarator0_l_square1_constant_expression2_r_square3`) though it
> > > makes the name a bit uglier.
> > >
> > > > Change mangling of keywords to ALL_CAPS (needed to turn keywords that
> > > > appear alone on RHS into valid identifiers)
> > >
> > > if we add index number to the name, then this change is not required.
> > Short version: I can deal with the numbers at the front (there's a *little*
> > bit of value), but at the back of the symbol name there's no value, just
> > noise. I find the double-underscore version much more readable (than either
> > variant with numbers).
> >
> > ---
> >
> > I always found the indexes aesthetically ugly but OK when you could read
> > them as
> > ```
> > lhs _ 0 rhsa_ 1 rhsb
> > lhs := [0]rhsa [1]rhsb
> > ```
> >
> > But an identifier can't start with a number
> > (`rule::empty_statement::0semi`) this isn't possible. I think you suggested
> > shifting the numbers to the end, but I can't find a way to read that, it
> > adds too much visual noise. (e.g. because the number "decorations" don't
> > line up in a column in code completion or a switch statement).
> >
> > I also think the double-underscore version is significantly less cryptic
> > for new readers.
> > I find the double-underscore version much more readable (than either
> > variant with numbers)
>
> I agree that the double-underscore is more readable, but I don't have a
> much-better feeling. I'm ok with the double-underscore one if you feel strong
> about it.
>
> The reason why I value the index is that it is easier to spot the index and
> its corresponding RHS element, and it could potentially avoid writing bugs
> (specially in the guard function implementation,
> thinking about accessing the RHS element in the looong rule
> `l_paren__r_paren__decl_specifier_seq__noexcept_specifier__trailing_return_type__requires_clause`).
>
>
> The reason why I value the index is that it is easier to spot the index and
> its corresponding RHS element
Right, I agree, but at least for me this benefit vanishes entirely if the index
has to come at the end of the name instead of the start.
I don't feel strongly about double-underscore over `0foo_1bar` (which isn't a
valid identifier), but I do feel strongly about it over `foo0_bar1`.
Losing the indices is a tradeoff, but I don't feel like it's terrible (it's bad
in the example you gave, but it's not the common case: median RHS tokens is 1,
the average is 2, and 85% have <=3)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130414/new/
https://reviews.llvm.org/D130414
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits