sammccall added a comment.

Sorry for the long delay, I've made up for it with extra comments :-\

This looks really well-thought-out and I'm rationalizing my pickiness as:

- this is conceptually complicated
- I expect this code to live a long time and be read and "modified around" by 
lots of people

Some of the comments/requests for doc might strictly be more in scope for in 
later patches (documenting functionality that doesn't exist yet). Those docs 
would help *me* now but happy if you'd rather briefly explain and add them 
later.



================
Comment at: clang/lib/Format/FormatToken.h:166
+  ///               \- P0  \- P1
+  /// ExpandedFrom stacks for each generated token will be:
+  /// ( -> P0
----------------
this is a great example, it might be a little more clear with more distinct 
chars and some vertical alignment:
```
Given X(A)=[A], Y(A)=<A>,
                  X({ Y(0)  } ) expands as
                  [ { < 0 > } ]
StartOfExpansion  1   1
ExpandedFrom[0]   X X X X X X X
ExpandedFrom[1]       Y Y Y
```

You could extend this to cover all the fields and hoist it to be a comment on 
MacroContext if you like, I think the concreteness helps.


================
Comment at: clang/lib/Format/FormatToken.h:176
+
+  /// Whether this token is the first token in a macro expansion.
+  bool StartOfExpansion = false;
----------------
why the asymmetry between start/end?

given `ID(x)=X`, `ID(ID(0))` yields `0` which starts and ends two expansions, 
right?
Consider making them both integer, even if you don't need it at this point.

(also 64 bits, really?)


================
Comment at: clang/lib/Format/FormatToken.h:183
+
+  /// When macro expansion introduces parents, those are marked as
+  /// \c MacroParent, so formatting knows their children need to be formatted.
----------------
this isn't used in this patch - can we leave it out until used?


================
Comment at: clang/lib/Format/FormatToken.h:389
+  // in a configured macro expansion.
+  MacroContext MacroCtx;
+
----------------
if you're not extremely concerned about memory layout, I'd consider making this 
an `Optional<MacroContext>` with nullopt replacing the current MR_None. 

This reduces the number of implicit invariants (AIUI MR_None can't be combined 
with any other fields being set) and means the name MacroContext more closely 
fits the thing it's modeling.


================
Comment at: clang/lib/Format/FormatToken.h:632
 
+  void copyInto(FormatToken &Tok) { Tok = *this; }
+
----------------
const.

I guess it doesn't matter, but copyFrom would seem a little less weird to me in 
an OOP/encapsulation sense.
I do like this explicit form rather than clone() + move constructor though, as 
pointer identity is pretty important for tokens.


================
Comment at: clang/lib/Format/MacroExpander.cpp:35
+  SmallVector<FormatToken *, 8> Params;
+  SmallVector<FormatToken *, 8> Tokens;
+};
----------------
Tokens -> Expansion? (semantics rather than type)


================
Comment at: clang/lib/Format/MacroExpander.cpp:38
+
+// A simple macro parser.
+class MacroExpander::DefinitionParser {
----------------
Dmitri gave a tech talk on dropping comments like these :-)


================
Comment at: clang/lib/Format/MacroExpander.cpp:42
+  DefinitionParser(ArrayRef<FormatToken *> Tokens) : Tokens(Tokens) {
+    assert(!Tokens.empty());
+    Current = Tokens[0];
----------------
who's responsible for establishing this?

AIUI this will fail if e.g. `Macros` contains a string that contains only 
whitespace, which is a slightly weird precondition.


================
Comment at: clang/lib/Format/MacroExpander.cpp:63
+  bool parseParams() {
+    if (Current->isNot(tok::l_paren))
+      return false;
----------------
assert instead? Caller checks this


================
Comment at: clang/lib/Format/MacroExpander.cpp:81
+    do {
+      Def.Tokens.push_back(Current);
+      nextToken();
----------------
this assumes the expansion is nonempty, which the grammar doesn't. while{} 
instead?


================
Comment at: clang/lib/Format/MacroExpander.cpp:113
+void MacroExpander::parseDefinitions(
+    const std::vector<std::string> &MacroExpander) {
+  for (const std::string &Macro : MacroExpander) {
----------------
weird param name!


================
Comment at: clang/lib/Format/MacroExpander.cpp:116
+    Buffers.push_back(
+        llvm::MemoryBuffer::getMemBufferCopy(Macro, "<scratch space>"));
+    clang::FileID FID =
----------------
This is a slightly spooky buffer name - it's the magic name the PP uses for 
pasted tokens.
A closer fit for config is maybe "<command line>" (like macro definitions 
passed with `-D`).
Is it necessary to use one of clang's magic buffer names at all? If so, 
comment! Else maybe "<clang-format style>" or something?


================
Comment at: clang/lib/Format/MacroExpander.cpp:133
+                                                          ArgsList Args) {
+  assert(defined(ID->TokenText));
+  SmallVector<FormatToken *, 8> Result;
----------------
is the caller responsible for checking the #args matches #params?
If so, document and assert here?

Looking at the implementation, it seems you don't expand if there are too few 
args, and expand if there are too many args (ignoring the last ones). Maybe it 
doesn't matter, but it'd be nice to be more consistent here.

(Probably worth calling out somewhere explicitly that variadic macros are not 
supported)


================
Comment at: clang/lib/Format/MacroExpander.cpp:141
+  //   y -> 1
+  llvm::StringMap<size_t> ArgMap;
+  for (size_t I = 0, E = Def.Params.size(); I != E; ++I) {
----------------
This doesn't depend on args, so we could compute this mapping when the 
Definition is constructed and encapsulate it there.

(Maybe performance doesn't matter, I'd also find this a little clearer. But if 
the allocation doesn't matter, we shouldn't be using SmallVector...)


================
Comment at: clang/lib/Format/MacroExpander.cpp:167
+      return false;
+    // If there are fewer arguments than referenced parameters, skip the
+    // parameter.
----------------
skip the parameter -> treat the parameter as empty?
(My first guess was this meant given `ID(X)=X`, `ID()` would expand to `X`.)


================
Comment at: clang/lib/Format/MacroExpander.cpp:185
+
+  // Expand the definition into Restlt.
+  for (FormatToken *Tok : Definitions[ID->TokenText].Tokens) {
----------------
nit: Result


================
Comment at: clang/lib/Format/MacroExpander.cpp:189
+      continue;
+    // Create a copy of the tokens that were not part of the macro argument,
+    // i.e. were not provided by user code.
----------------
"tokens that were not part of the macro argument" --> "tokens from the macro 
body"?


================
Comment at: clang/lib/Format/MacroExpander.cpp:194
+    assert(New->MacroCtx.Role == MR_None);
+    // Tokens that are not part of the user code do not need to be formatted.
+    New->MacroCtx.Role = MR_Hidden;
----------------
(I don't know exactly how this is used, but consider whether you mean "do not 
need to", "should not" or "cannot" here)


================
Comment at: clang/lib/Format/MacroExpander.cpp:198
+  }
+  assert(Result.size() >= 1);
+  if (Result.size() > 1)
----------------
this threw me for a loop... it's EOF right?

It's not explicitly mentioned, so maybe either add a comment or `&& 
Result.back()->is(tok::eof)`.

This makes the `size-2` less cryptic too.


================
Comment at: clang/lib/Format/MacroExpander.cpp:200
+  if (Result.size() > 1)
+    ++Result[Result.size() - 2]->MacroCtx.EndOfExpansion;
+  return Result;
----------------
Why not set StartOfExpansion in the same way, to avoid tracking the `First` 
state?


================
Comment at: clang/lib/Format/MacroExpander.h:10
+///
+/// \file
+/// This file contains the declaration of MacroExpander, which handles macro
----------------
I think this comment is too short (and doesn't really say much that you can't 
get from the filename). IME many people's mental model of macros is based on 
how they're used rather than how they formally work, so I think it's worth 
spelling out even the obvious implementation choices.

I'd like to see:
 - rough description of the scope/goal of the feature (clang-format doesn't see 
macro definitions, so macro uses tend to be pseudo-parsed as function calls. 
When this isn't appropriate [example], a macro definition can be provided as 
part of the style. When such a macro is used in the code, clang-format will 
expand it as the preprocessor would before determining the role of tokens. 
[example])
 - explicitly call out here that only a single level of expansion is supported, 
which is a divergence from the real preprocessor. (This influences both the 
implementation and the mental model)
 - what the MacroExpander does and how it relates to MacroContext. I think this 
should give the input and output token streams names, as it's often important 
to clearly distinguish the two. (TokenBuffer uses "Spelled tokens" and 
"Expanded tokens" for this, which is not the same as how these terms are used 
in SourceManager, but related and hasn't been confusing in practice).
 - a rough sketch of how the mismatch between what was annotated and what must 
be formatted is resolved e.g. -- just guessing here -- the spelled stream is 
formatted but for macro args, the annotations from the expanded stream are used.

(I'm assuming this is the best file to talk about the implementation of this 
feature in general - i'm really hoping that there is such a file. If there are 
a bunch of related utilities you might even consider renaming the header as an 
umbrella to make them easier to document. This is a question of taste...)


================
Comment at: clang/lib/Format/MacroExpander.h:40
+
+/// Takes a set of simple macro definitions as strings and allows expanding
+/// calls to those macros.
----------------
You define "simple" in the patch description as non-recursive - can you just 
say "non-recursive" here? Or better spell out what that means (no macro can 
refer to another macro)


================
Comment at: clang/lib/Format/MacroExpander.h:44
+public:
+  typedef llvm::ArrayRef<llvm::SmallVector<FormatToken *, 8>> ArgsList;
+
----------------
nit: I think `using` is usually considered more readable


================
Comment at: clang/lib/Format/MacroExpander.h:49
+  /// Each entry in \p Macros must conform to the following simple
+  /// macro-definition language:
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
----------------
This seems to precisely define the grammar but leave me guessing as to the 
semantics.
I'd at least suggest 'exp' -> 'expansion'.

Personally I'm partial to examples instead :-)


================
Comment at: clang/lib/Format/MacroExpander.h:50
+  /// macro-definition language:
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
+  /// <params> ::= <id> | <id> "," <params>
----------------
"PI 3.14" or "NOT(X) !X" seems much less familiar than "PI=3.14" or "NOT(X)=!X" 
as accepted by `-D`.

It also resolves an ambiguity: is "DISCARD_ERROR ( void ) ( err )" an object or 
function macro?

The extra redundancy in the grammar should make it easier to detect errors too.



================
Comment at: clang/lib/Format/MacroExpander.h:51
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
+  /// <params> ::= <id> | <id> "," <params>
+  /// <exp>    ::= <eof> | <tok> <exp>
----------------
this grammar allows object macros, but disallows function macros with no args. 
intended?

(FWIW this grammar also allows "ID(X X": an object macro "ID" which expands to 
"(X X". But the implementation, probably correctly, doesn't)


================
Comment at: clang/lib/Format/MacroExpander.h:54
+  ///
+  MacroExpander(const std::vector<std::string> &Macros,
+                clang::SourceManager &SourceMgr, const FormatStyle &Style,
----------------
The signature doesn't allow errors to be reported, which is a little 
unfortunate but seems hard to fix properly (so that errors are reported when 
the config is parsed) - the "style is a simple struct" is hard to reconcile 
with this data structure.

Silent discard on error should probably be mentioned in the constructor.


================
Comment at: clang/lib/Format/MacroExpander.h:56
+                clang::SourceManager &SourceMgr, const FormatStyle &Style,
+                encoding::Encoding encoding,
+                llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator,
----------------
Why are the macro definitions in an arbitrary specified encoding? I'd hope by 
the time we've parsed the config, our strings are UTF-8. (On disk, YAML can be 
UTF-16 per spec, but...)


================
Comment at: clang/lib/Format/MacroExpander.h:62
+  /// Returns whether a macro \p Name is defined.
+  bool defined(llvm::StringRef Name);
+
----------------
const (and an expand)


================
Comment at: clang/lib/Format/MacroExpander.h:66
+  /// each element in \p Args is a positional argument to the macro call.
+  llvm::SmallVector<FormatToken *, 8> expand(FormatToken *ID, ArgsList Args);
+
----------------
(I can't see the real callsite but...)
If we care about performance here, is 8 a little small? should we have a 
`vector &Out` instead?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83296/new/

https://reviews.llvm.org/D83296



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to