On Fri, 2016-07-29 at 10:46 -0400, David Malcolm wrote:
> On Fri, 2016-07-29 at 08:22 -0600, Martin Sebor wrote:
> > > Currently all that we need from the C family of frontends is the
> > > cpp_reader and the string concatenation records.  I think we can
> > > reconstruct the cpp_reader if we have the options, though
> > > presumably that's per TU, so to support all this we'd need to
> > > capture e.g. the per-TU encoding information in the LTO records,
> > > for the case where one TU is UTF-8 encoded source to UTF-8
> > > execution, and another TU is EBCDIC-encoded source to UCS-4
> > > execution (or whatever).  And there's an issue if different TUs
> > > compiled the same header with different encoding options.
> > >
> > > Or... we could not bother.  This is a Quality of Implementation
> > > thing, for improving diagnostics, and in each case, the diagnostic
> > > is required to cope with substring location information not being
> > > available (and the code I posted in patch 2 of the kit makes it
> > > trivial to handle that case from a diagnostic).  So we could
> > > simply have LTO use the fallback mode.
> > >
> > > There are two high-level approaches I've tried:
> > >
> > > (a) capture the substring location information in the
> > > lexer/parser in the frontend as it runs, and store it somehow.
> > >
> > > (b) regenerate it "on-demand" when a diagnostic needs it.
> > >
> > > Approach (b) is inherently going to be prone to the LTO issues you
> > > describe, but it avoids adding to the CPU cycles/memory
> > > consumption for the common case of not needing the
> > > information. [1]
> > >
> > > Is approach (b) acceptable?
> >
> > If (b) means potentially reduced quality of the location ranges
> > in the -Wformat-length pass (e.g., with funky C++ format strings)
> > then I don't think that's enough of a problem to worry about, at
> > least not for this warning.
> >
> > If it means not being able to use the solution you're working
> > on in the middle end at all (unless I misunderstood that doesn't
> > seem to be what you're implying, but just to be sure) then that
> > would seem like a serious shortcoming.  I would continue to use
> > the code I copied from c-format.c (assuming that will still work),
> > but as more warnings are implemented in later passes it would
> > lead to duplicating code or reinventing the wheel just to get
> > around the limitation (or simply worse quality diagnostics).
>
> It'll work fine for the middle-end within cc1 and cc1plus.
>
> I'm specifically referring to LTO here, and it would be fixable from
> LTO if we can encode information about the TU encoding options into
> the LTO data stream, and capture the string concatenation records
> there too (but that would be followup work).
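
To make approach (b) a bit more concrete, here is a toy sketch of the
"regenerate on demand, else fall back" idea.  It is not code from the
patch kit or from clang: source_offset_of_byte is a hypothetical name,
and the sketch assumes a single plain "..." literal with no
concatenation, a one-byte-per-character source and execution encoding,
and only the simple one-character escapes.  Anything else returns "no
location", which is where a diagnostic would use the location of the
whole string instead.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not a GCC or clang API): map a byte index in the
// interpreted string back to an offset in the literal's source spelling
// by re-scanning the spelling when a diagnostic asks for it.
// Returns std::nullopt when the location cannot be recovered, so the
// caller can fall back to the location of the whole string.
std::optional<std::size_t>
source_offset_of_byte (std::string_view spelling, std::size_t byte_idx)
{
  // Only a plain "..." literal is handled; anything else falls back.
  if (spelling.size () < 2
      || spelling.front () != '"' || spelling.back () != '"')
    return std::nullopt;

  std::size_t pos = 1;                           // skip the opening quote
  const std::size_t end = spelling.size () - 1;  // index of closing quote
  std::size_t cur_byte = 0;                      // bytes produced so far

  while (pos < end)
    {
      if (cur_byte == byte_idx)
        return pos;

      if (spelling[pos] == '\\')
        {
          if (pos + 1 >= end)
            return std::nullopt;                 // malformed escape
          switch (spelling[pos + 1])
            {
            case 'n': case 't': case 'r': case 'a': case 'b':
            case 'f': case 'v': case '\\': case '"': case '\'': case '?':
              pos += 2;       // two source characters yield one byte
              break;
            default:
              // Octal, hex and UCN escapes are deliberately unhandled
              // in this sketch: use the fallback.
              return std::nullopt;
            }
        }
      else
        pos += 1;             // ordinary character: one byte

      ++cur_byte;
    }

  return std::nullopt;        // byte_idx is past the end of the string
}
```

For the spelling "a\n%s" (quotes included), byte 2 of the interpreted
string is the '%', and the sketch maps it back to offset 4 in the
spelling; a request past the end, or one that has to cross a hex
escape, yields no location.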

FWIW, it appears that clang uses the on-demand approach; the relevant
code appears to be StringLiteral::getLocationOfByte:
  http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008

>
> > Martin
> >
> > >
> > > Thanks
> > > Dave
> > >
> > > [1] with the exception of the string concatenation records, but I
> > > believe those are tiny