https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119792
--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 15 Apr 2025, uecker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119792
>
> --- Comment #11 from uecker at gcc dot gnu.org ---
>
> (In reply to rguent...@suse.de from comment #10)
> > On Tue, 15 Apr 2025, uecker at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119792
> > >
> > > --- Comment #9 from uecker at gcc dot gnu.org ---
> > >
> > > If the problem is that useless_type_conversion_p should not return
> > > true for certain types with the same TYPE_CANONICAL, shouldn't we
> > > simply add back the test for the size of the FAM here to
> > > useless_type_conversion_p?
> >
> > That would be a quite expensive thing to do.
>
> In what sense? Do you mean the cost of doing the test?

Yes.

> > TYPE_CANONICAL was supposed to be at most "structurally equivalent",
> > in particular the sizes of types with the same TYPE_CANONICAL should
> > either be the same or they should be incomplete.
> >
> > Basically TYPE_CANONICAL should form a type system in the middle-end
> > we can map all frontend type systems to. There was the neat goal
> > that it should be possible to map TBAA to the very same, but examples
> > like C FAM show this might not be achievable?
>
> I would not be so pessimistic. The trouble occurs for some annoying
> corner cases related to GNU extensions (old-style zero-sized
> FAMs, VLAIS). Where we still get it wrong though in terms of
> ISO C semantics are pointers to function types I believe, but
> this seems less problematic at the moment.
>
> >
> > Note the issue we run into with C vs. Ada is that LTO attempts
> > to re-compute TYPE_CANONICAL by making it "most conservative"
> > via applying that "structurally equivalent" logic - very much what
> > for frontends is to set TYPE_STRUCTURAL_EQUALITY_P. But that
> > does not seem to mix with C FAM which are _not_ structurally
> > equivalent(?).
>
> The C semantics (including GNU extensions) should fit perfectly
> well to structural equivalency. The issue here is that the backend
> has semantics that are more strict than this, i.e.
>
> struct foo { int n; char buf[]; };
> struct foo { int n; char buf[0]; };
> struct foo { int n; char buf[n]; };
>
> are all structurally equivalent (and we would decide this way when
> only looking at the array type). This implies that they all need
> to get the same TYPE_CANONICAL, but they got different ones.

The code might have been inconsistent with respect to struct vs.
stand-alone array. I think we are mixing "structurally equivalent"
and "TBAA compatible". When introducing TYPE_CANONICAL to the
middle-end via useless_type_conversion_p and the type checker
on the IL I mostly "ignored" the VLA cases (present a lot in Ada)
and massaged the equivalences to make the IL conforming. That's
also how the array code evolved (I think only Ada eventually
exposes assignments of arrays not wrapped in aggregates).

> (it becomes more serious if you consider variable arrays in
> the middle of the struct, but considering that these are also
> not ISO C, not supported by other compilers, and seem rarely
> used, I think we could just perhaps deprecate those or add
> a big warning about aliasing)

We could resort to forcing those to have alias-set zero.

For the middle-end it's also about what is valid in assignments.
We pick the size to memcpy for an aggregate assignment from
the RHS (or was it the LHS?); for VLAs the RHS is then wrapped
in a WITH_SIZE_EXPR, making that size explicit. So the
last 'foo' above you could assign by X = WITH_SIZE_EXPR <Y, Y.n>;
the first 'foo' is incomplete, so it cannot appear in
assignments; you'd have to have a complete type there. The
middle one with [0] is complete, it does have a size, you can
assign it and have declarations with this type.
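As a rough illustration of the layout and assignment points above (a sketch only, not taken from the GCC sources): the two variants that also build outside GCC get distinct, hypothetical tags `foo_fam` and `foo_zero`, and the helpers `layouts_match`, `foo_new` and `foo_copy` are made up for this example. `foo_copy` spells out the payload size by hand, which is only loosely analogous to what WITH_SIZE_EXPR does for the VLA case. That both variants share one layout is an assumption about common ABIs, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ISO C flexible array member: buf contributes no size of its own.  */
struct foo_fam { int n; char buf[]; };

/* GNU extension: old-style zero-sized trailing array.  */
struct foo_zero { int n; char buf[0]; };

/* The third variant from the discussion, char buf[n] with a runtime n
   (VLAIS), is a GCC-only extension and is left out here so that the
   sketch also builds with other compilers.  */

/* Return 1 if both variants place buf at the same offset and report
   the same sizeof, 0 otherwise.  */
int layouts_match(void)
{
  return offsetof(struct foo_fam, buf) == offsetof(struct foo_zero, buf)
         && sizeof(struct foo_fam) == sizeof(struct foo_zero);
}

/* Hypothetical helper: allocate a foo_fam with room for n payload
   bytes, zero-initialised.  Returns NULL on failure or negative n.  */
struct foo_fam *foo_new(int n)
{
  if (n < 0)
    return NULL;
  struct foo_fam *p = malloc(sizeof *p + (size_t) n);
  if (p != NULL)
    {
      p->n = n;
      memset(p->buf, 0, (size_t) n);
    }
  return p;
}

/* Copy header and payload.  A plain `*dst = *src` would copy only the
   fixed part of the struct, so the payload size is made explicit.
   dst must have been allocated with room for at least src->n bytes.  */
void foo_copy(struct foo_fam *dst, const struct foo_fam *src)
{
  assert(dst != NULL && src != NULL);
  memcpy(dst, src, sizeof *src + (size_t) src->n);
}
```

With two objects created by `foo_new(3)`, `foo_copy(b, a)` transfers both `n` and the three payload bytes, whereas the size the language itself attaches to the type covers only the `int n` header.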
> > So the problematic change was to this logic which
> > made TBAA more conservative but broke the type system in that
> > information is lost that was present in conversions that are now
> > considered useless (carrying no middle-end relevant information).
>
> Ok. It seems clear that certain conversions among the types above
> are not useless. But if this is encoded in useless_type_conversion_p
> then we could strengthen the semantics there beyond what
> TYPE_CANONICAL says. Looking how this works for other types,
> this seems generally ok to have stricter semantics there than
> for TYPE_CANONICAL (although because the direction of the conversion
> is relevant). But your comment above implies this may be too
> expensive...

The IL was designed so you can substitute a type with its
TYPE_CANONICAL and it would still be valid. Putting checks
into useless_type_conversion_p that further constrain what
is useless doesn't work - currently (I hope..) the function
only allows _extra_ conversions besides the TYPE_CANONICAL
equality to be useless. But for aggregates it's already
at its most relaxed - TYPE_CANONICAL equivalence.

> >
> > I lack a total complete picture to see a possible solution here.
> > One might be to keep TYPE_CANONICAL different but make
> > get_alias_set somehow compute the same alias-sets (possibly in
> > a way that might also fix that drat common initial sequence
> > rule) via record_component_aliases. No idea how though.
> > Adding yet another field - TYPE_CANONICAL_FOR_TBAA (yuk) - would
> > be the most stupid solution, we'd have to somehow stream that
> > and unify the sets when they get related by TYPE_CANONICAL
> > merging. It would be a half-attempt at dissecting the type
> > system from TBAA.
>
> Do you want to have a phone or zoom discussion? I am happy to
> help with any effort, but I need to get a better understanding
> of how this works in the middle end.

I think we might want to have a "living document" (in the wiki perhaps?)
documenting the C23 requirements and how that is realized in the
middle-end. And the Ada requirements. I know we've tried to
hash this out in this and previous mail discussions but it's hard
to come back and wade through a discussion trying to extract the
big picture. That isn't easier (IMO it's worse) in a phone or
zoom discussion. When we have such a complete document _then_
it possibly makes sense to clarify remaining things.

So I suggest we start a wiki page on TYPE_CANONICAL, the middle-end
"type system", TBAA and specialities of language frontends.

I started https://gcc.gnu.org/wiki/document-middle-end-type-system