kbobyrev added a comment.

In D120306#3337374 <https://reviews.llvm.org/D120306#3337374>, @sammccall wrote:

> In D120306#3337288 <https://reviews.llvm.org/D120306#3337288>, @kbobyrev 
> wrote:
>
>> In D120306#3337212 <https://reviews.llvm.org/D120306#3337212>, @sammccall 
>> wrote:
>>
>>> What's the policy this patch intends to implement?
>>>
>>> I'm a little concerned by building up maps of filenames with segment limits 
>>> - it looks like some kind of heuristic/partial matching.
>>
>> Basically, handling the mappings that. IWYU pragmas provide.
>
> Yes, I'm asking you to describe that handling in a couple of sentences so I 
> can understand what you intend it to be.
>
> e.g. for includecleaner in general, the approach is: walk the AST, map each 
> node to decls it uses, map decls to locations, map locations to files, then 
> check which includes point to files that were not identified.

I'm trying to reuse the information about `IWYU pragma private` that is 
collected in `Canoncalncludes` to find the header responsible for including the 
private ones. Canonicalncludes stores information about the headers with `IWYU 
pragma: prviate, include <XYZ>`. When we're figuring out the responsible header 
in IncludeCleaner, I'm checking if the header has this pragma and then I'm 
trying to find the `<XYZ>` header that is mentioned in the pragma comment. For 
that I'm using a heuristic that the include stack of the main file should have 
a header whose suffix is <XYZ>. This because we never really have the full 
header name in the pragma comment and it might not be visible from the private 
header (and hence can not be resolved from there). I'm not sure if this 
heuristic will work well in practice, but I couldn't figure out a better one.

An alternative would probably be:

- Walk #include directives and try to find the "umbrella headers"
- For each include directive, query the included file and figure out if it has 
"IWYU pragma: private, include <XYZ>" through `CanonicalIncludes`
- If it does, figure out if the supposed umbrella header name has the `<XYZ>` 
suffix, in that case attribute the private header to this public one

This all would need to be done after `Canonicalncludes` are complete, meaning 
this would probably belong to `IncludeStructure` instead. Maybe this is more 
precise, WDYT?

>> Are you concerned about the limit itself or the way I'm trying to find these 
>> headers in general? I was afraid of consuming too much memory, hence the 
>> limit; it's not crucial to the implementation, though, I can remove it if 
>> your concern is the limit rather than the approach.
>
> Mostly I'm unclear on:
>
> - whether we're comparing filenames among a small (e.g. the include stack - 
> heuristics OK) or a large set (need to be very precise)

Yes, I'm using just the include stack of the main file and the preamble. When 
you say "large set", do you mean the whole set of project headers?

The problem here is that for the include stack we would miss the right 
diagnostic if the public header responsible for the private one is not 
accessible. Maybe we should throw an "unused" warning either way here, I don't 
know what would be right.

> - whether the partial matching is semantically important or a performance 
> optimitzation

The matching itself is an optimisation, right: what I'm dealing with is that 
`gtest.h` header is called 
`/home/user/llvm-project/llvm/util/googletest/.../gtest/gtest.h` but the 
private headers will only say `include "gtest/gtest.h"`. What I'm trying to do 
is find the header whose real path name ends with `gtest/gtest.h`.

> - whether the concept of "real name" is significant or likely to cause 
> problems

Hm, I don't know; what kind of problems do you think might appear?

> - whether storing another copy of the names of all the transitive headers is 
> actually necessary for what you're trying to do

The problem here is that I'll need `FileID` in `headerResponsbile` but I can't 
reuse `FileID`s between preamble and main file, right? Logically, I'm afraid I 
don't have an option to store anything other than the name (or an index to it); 
I could uplift this logic to the `IncludeStructure` and loop over all visible 
headers in there, I wouldn't need another copy since I'm already storing it 
there. But that might be quite slow.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120306/new/

https://reviews.llvm.org/D120306

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to