Currently all that we need from the C family of frontends is the
cpp_reader and the string concatenation records. I think we can
reconstruct the cpp_reader if we have the options, though presumably
that's per TU, so to support all this we'd need to capture e.g. the
per-TU encoding information in the LTO records, for the case where one
TU is UTF-8 encoded source to UTF-8 execution, and another TU is
EBCDIC-encoded source to UCS-4 execution (or whatever). And there's an issue
if different TUs compiled the same header with different encoding
options.

Or... we could not bother. This is a Quality of Implementation thing,
for improving diagnostics, and in each case, the diagnostic is required
to cope with substring location information not being available (and
the code I posted in patch 2 of the kit makes it trivial to handle that
case from a diagnostic). So we could simply have LTO use the
fallback mode.
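
To make the fallback concrete, here is a minimal sketch of a diagnostic
choosing between a substring range and the whole-string range. All of
the names (struct range, get_substring_range, range_for_diagnostic) are
hypothetical and are not the API from the patch kit; the lookup also
assumes a plain literal with no escapes or concatenation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a source range (columns on one line);
   this is not GCC's location_t.  */
struct range
{
  int start_col;
  int end_col;
};

/* Hypothetical lookup.  Fails when no substring information is
   available (the LTO case discussed above).  Assumes a plain literal
   with no escapes or concatenation, so character IDX sits at the
   column of the opening quote plus 1 plus IDX.  */
static bool
get_substring_range (struct range string_range, bool have_info,
                     int start_idx, int end_idx, struct range *out)
{
  assert (start_idx <= end_idx);
  if (!have_info)
    return false;
  out->start_col = string_range.start_col + 1 + start_idx;
  out->end_col = string_range.start_col + 1 + end_idx;
  return true;
}

/* Pick the range a diagnostic should underline: the substring if it
   can be obtained, otherwise the whole string literal.  */
static struct range
range_for_diagnostic (struct range string_range, bool have_info,
                      int start_idx, int end_idx)
{
  struct range sub;
  if (get_substring_range (string_range, have_info,
                           start_idx, end_idx, &sub))
    return sub;
  return string_range;
}
```

With this shape, having LTO always pass "no information" costs only
precision in the underlined range, not the diagnostic itself.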
There are two high-level approaches I've tried:
(a) capture the substring location information in the lexer/parser in
the frontend as it runs, and store it somehow.
(b) regenerate it "on-demand" when a diagnostic needs it.
Approach (b) is inherently going to be prone to the LTO issues you
describe, but it avoids adding to the CPU cycles/memory consumption for
the common case of not needing the information. [1]
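
As a rough illustration of why (b) is cheap in the common case, the
sketch below re-scans a string token only when asked. It is a toy, not
the real implementation: source_offset_on_demand and relex_count are
hypothetical names, and it only understands two-character escapes such
as \n, with no encoding conversion or concatenation.

```c
#include <stddef.h>

/* Hypothetical counter: how many times the re-scan has run.  It stays
   at zero for every string that never triggers a diagnostic.  */
static int relex_count;

/* Hypothetical on-demand lookup: given the source text of a string
   token (including its quotes) and the index of a character in the
   executed string, return that character's byte offset within the
   token, or -1 if it cannot be found.  Only simple two-character
   escapes are handled.  */
static int
source_offset_on_demand (const char *token_text, size_t exec_idx)
{
  size_t src = 1;  /* Skip the opening quote.  */

  relex_count++;
  if (token_text[0] != '"')
    return -1;
  for (size_t i = 0;
       token_text[src] != '\0' && token_text[src] != '"'; i++)
    {
      if (i == exec_idx)
        return (int) src;
      /* An escape occupies two source bytes but one executed char.  */
      if (token_text[src] == '\\' && token_text[src + 1] != '\0')
        src += 2;
      else
        src += 1;
    }
  return -1;
}
```

Approach (a) would instead do the equivalent work for every string
literal at parse time and store the result.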
Is approach (b) acceptable?

If (b) means potentially reduced quality of the location ranges
in the -Wformat-length pass (e.g., with funky C++ format strings)
then I don't think that's enough of a problem to worry about, at
least not for this warning.

If it means not being able to use the solution you're working
on in the middle end at all (unless I misunderstood, that doesn't
seem to be what you're implying, but I want to be sure) then that
would seem like a serious shortcoming. I would continue to use
the code I copied from c-format.c (assuming that will still work),
but as more warnings are implemented in later passes it would
lead to duplicating code or reinventing the wheel just to get
around the limitation (or simply worse quality diagnostics).

Martin

Thanks
Dave

[1] with the exception of the string concatenation records, but I
believe those are tiny