[PATCH] D140584: [Clang] Refactor "Designators" into a unified implementation [NFC]

Richard Smith - zygoloid via Phabricator via cfe-commits Fri, 03 Feb 2023 12:51:28 -0800

rsmith accepted this revision.
rsmith added a comment.

I think this is fine as a short-term stepping stone to a different 
representation.




================
Comment at: clang/include/clang/AST/Designator.h:88
+  /// An array designator, e.g., "[42] = 0" and "[42 ... 50] = 1".
+  template <typename Ty> struct ArrayDesignatorInfo {
+    /// Location of the first and last index expression within the designated
----------------
void wrote:
> rsmith wrote:
> > void wrote:
> > > rsmith wrote:
> > > > void wrote:
> > > > > void wrote:
> > > > > > rsmith wrote:
> > > > > > > Can we move the templating out from here to the whole 
> > > > > > > `Designator` and `Designation` classes? It shouldn't be possible 
> > > > > > > to mix the two kinds in the same `Designation`.
> > > > > > Grr...My previous comment was eaten.
> > > > > > 
> > > > > > I'll give it a shot.
> > > > > > 
> > > > > > However, I'm a bit surprised at how designators are handled by 
> > > > > > Clang. I expected that a `Designation` would be an `Expr` with the 
> > > > > > `Designator`s being L-values (e.g. `MemberExpr`s / 
> > > > > > `ArraySubscriptExpr`s), but instead the `Designation` exists just 
> > > > > > long enough to be turned into an explicit initialization list. Is 
> > > > > > there a reason to do it that way instead of using expressions?
> > > > > So it looks like moving the template outside of the class won't work. 
> > > > > The ability to switch between `Expr` and `unsigned` while retaining 
> > > > > the same overall type is hardwired into things like the `ASTImporter`.
> > > > > 
> > > > > This is kind of a massive mess. Maybe we shouldn't even allow them to 
> > > > > use both `Expr` and `unsigned` but instead require them to use one or 
> > > > > the other? Maybe we could require `unsigned` with the understanding 
> > > > > that the `Expr` can be converted into a constant?
> > > > I'm not understanding something. Currently the `ASTImporter` only deals 
> > > > with `DesignatedInitExpr::Designator`s , which only ever store integer 
> > > > indexes.
> > > > 
> > > > Basically, today, we have two different classes:
> > > > - A class that's specific to `DesignatedInitExpr`, and tracks array 
> > > > index expressions by storing the index of the expression within the 
> > > > `DesignatedInitExpr`'s list of children; this is also what 
> > > > `ASTImporter` can import, because it's the one that's used in the AST's 
> > > > representation.
> > > > - A class that's specific to `Sema`'s processing that tracks array 
> > > > index expressions as `Expr*` instead.
> > > > 
> > > > You want to refactor them to share code, which makes sense, because 
> > > > they are basically the same other than how they refer to expressions. 
> > > > (Not quite: `DesignatedInitExpr` can apparently refer to a field either 
> > > > as an `IdentifierInfo*` or as a `FieldDecl*`, whereas the `Sema` 
> > > > version always uses the `IdentifierInfo*` representation.)
> > > > 
> > > > Each current user of one of these two classes uses only one of the two, 
> > > > which means they're either exclusively using integers to refer to 
> > > > expressions or exclusively using `Expr*`. So it seems to me that you 
> > > > should be able to update each user to use either `Designator<unsigned>` 
> > > > or `Designator<Expr*>`, depending on which class they used before.
> > > > 
> > > > What am I missing?
> > > I'm still allowing them to use a `Designator<unsigned>` / 
> > > `Designator<Expr*>` as they see fit, only it's hidden from them via the 
> > > `Create` methods. I personally find the use of two different versions 
> > > (one using `unsigned` and one using `Expr*`) completely baffling. Why 
> > > can't they all use `Expr*`? Also the `ASTImporter` only outputs the start 
> > > of an array init range, which is at the very least counter-intuitive. 
> > > That's one of the issues I'd like to tackle with follow-up patches, 
> > > hopefully getting rid of the need for this template all together. This 
> > > does mean that in the interim a non-array range designator will have 
> > > extra `End` & `EllisisLoc` fields that aren't used, but that shouldn't be 
> > > too horrible, given that they'd be there anyway because of the union.
> > The reason that's jumping out at me for having separate integer / `Expr*` 
> > implementations here is space-efficiency -- we get to make array range 
> > designators (and hence designators overall) be only 16 bytes rather than 
> > the 32 bytes they occupy in this patch (assuming 64-bit pointers) by 
> > storing indexes instead of pointers.
> > 
> > If your eventual plan is to remove the children list from 
> > `DesignatedInitExpr`, and store the pointers only in the designators, that 
> > seems to cost 8 bytes per designator in the two common cases:
> > 
> > - For a field designator: 32 bytes (with 16 bytes of padding) versus 16 
> > bytes + 8 bytes for the child pointer today
> > - For an array designator: 32 bytes (with 16 bytes of padding) versus 16 
> > bytes + 8 bytes for the child pointer today
> > - For an array range designator: 32 bytes (4 bytes of padding) versus 16 
> > bytes + 16 bytes for two child pointers today
> > 
> > ... plus it'll presumably be painful to make the `Stmt` child iterator be 
> > able to handle this.
> > 
> > If you don't remove the separate children list from `DesignatedInitExpr`, 
> > then it seems like this approach will cost 16 bytes per designator in all 
> > cases, and we'll need to be careful in AST serialization / deserialization 
> > that we don't accidentally duplicate the `Expr`s that now have two pointers 
> > pointing to them instead of one, and likewise anywhere else that assumes 
> > each `Expr` is only reachable by one path through the AST (eg, 
> > `TreeTransform`, the recursive AST visitor).
> > 
> > I think some more visibility into the eventual plan would help.
> The plan isn't detailed, but I basically want to address several of the 
> points you mentioned here. In particular, I think the structure of 
> `DesignatedInitExpr` is backwards from how every other `Expr` is handled in 
> Clang. For instance, the `Expr` for something like `s.t.u` is a `MemberExpr` 
> with a `MemberExpr` as its sub-expression and so on. `DesignatedInitExpr` on 
> the other hand basically has a list of maybe expressions, maybe integers that 
> refer to parts of the structure / array. It seems cleaner to me to use the 
> `MemberExpr` / `ArraySubscriptExpr` way of referring to the member being 
> initialized rather than using a specialized list that has to be handled 
> differently from other `Expr`'s.
> 
> The first step in my evil plot is to do this simple refactoring, so that 
> there's no initial functionality change, before I do the more invasive 
> changes that may break things.
> 
> I'm doing this because I'm working on a feature that uses the `DIE` syntax, 
> and it would be much simpler to have it be a `MemberExpr`.
> 
> Am I completely off base here?
Thanks, that's really helpful.

OK, so the end design would be something like (just building out some details 
here so I can think about this better; I'm not expecting you would necessarily 
do exactly this!):
- `DesignatedInitExpr` becomes a base class with derived classes for member 
designators, array designators, and array range designators (And maybe also 
there's a different representation for unresolved versus resolved member 
designators.)
- These work much like `MemberExpr` / `ArraySubscriptExpr`, except that they 
don't have a "base" object (and instead store an initializer for the nominated 
field / array element(s)), and are "inside-out": for `.t.u = x` we want the 
top-level expression to be a `DesignatedInitMemberExpr` that names `t` and 
whose initializer is a `DesignatedInitMemberExpr` that names `u` and whose 
initializer is `x`, whereas for `s.t.u` the top-level expression is a 
`MemberExpr` that names `u` and its base subexpression is a `MemberExpr` that 
names `t`. That is, while we model `s.t.u = x` as `((s.t).u) = x`, we model 
`.t.u = x` more like `.t = (.u = x)`
- We use the same representation both in `DesignatedInitExpr` and in `Sema` 
(and everywhere else that looks at the syntactic form of an initializer list).
- `Designation` and `Designator` are removed entirely.

If something like that is the plan, then yes, I think that's completely 
reasonable, and seems like a nice improvement -- and I think it's fine if the 
intermediate state increases the AST size for designated initializers, as this 
patch does, especially if you're aiming for this to be completed within a 
single release cycle.

(Just quickly checking the size of the representation: a single-field 
`DesignatedInitExpr` is currently 40 bytes + 16 bytes for the `Designator` 
array, and I think could easily be 40 bytes *total* with the new 
representation; with two fields, it's currently 40 + 32 and could be 40 + 40 
with the new representation. I think that's going to be a win essentially 
always.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140584/new/

https://reviews.llvm.org/D140584

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D140584: [Clang] Refactor "Designators" into a unified implementation [NFC]

Reply via email to