Re: [PATCH 1/3] (v2) On-demand locations within string-literals

David Malcolm Fri, 29 Jul 2016 08:26:22 -0700

On Fri, 2016-07-29 at 10:46 -0400, David Malcolm wrote:
> On Fri, 2016-07-29 at 08:22 -0600, Martin Sebor wrote:
> > > Currently all that we need from the C family of frontends is the
> > > cpp_reader and the string concatenation records.  I think we can
> > > reconstruct the cpp_reader if we have the options, though presumably
> > > that's per TU, so to support all this we'd need to capture e.g. the
> > > per-TU encoding information in the LTO records, for the case where
> > > one TU is UTF-8 encoded source to UTF-8 execution, and another TU is
> > > EBCDIC-encoded source to UCS-4 execution (or whatever).  And there's
> > > an issue if different TUs compiled the same header with different
> > > encoding options.
> > > 
> > > Or... we could not bother.  This is a Quality of Implementation
> > > thing, for improving diagnostics, and in each case, the diagnostic
> > > is required to cope with substring location information not being
> > > available (and the code I posted in patch 2 of the kit makes it
> > > trivial to handle that case from a diagnostic).  So we could simply
> > > have LTO use the fallback mode.
> > > 
> > > There are two high-level approaches I've tried:
> > > 
> > > (a) capture the substring location information in the lexer/parser
> > > in the frontend as it runs, and store it somehow.
> > > 
> > > (b) regenerate it "on-demand" when a diagnostic needs it.
> > > 
> > > Approach (b) is inherently going to be prone to the LTO issues you
> > > describe, but it avoids adding to the CPU cycles/memory consumption
> > > for the common case of not needing the information. [1]
> > > 
> > > Is approach (b) acceptable?
> > 
> > If (b) means potentially reduced quality of the location ranges
> > in the -Wformat-length pass (e.g., with funky C++ format strings)
> > then I don't think that's enough of a problem to worry about, at
> > least not for this warning.
> > 
> > If it means not being able to use the solution you're working
> > on in the middle end at all (unless I misunderstood, that doesn't
> > seem to be what you're implying, but just to be sure) then that
> > would seem like a serious shortcoming.  I would continue to use
> > the code I copied from c-format.c (assuming that will still work),
> > but as more warnings are implemented in later passes it would
> > lead to duplicating code or reinventing the wheel just to get
> > around the limitation (or simply worse quality diagnostics).
> 
> It'll work fine for the middle-end within cc1 and cc1plus.
> 
> I'm specifically referring to LTO here, and it would be fixable from
> LTO if we can encode information about the TU encoding options into
> the LTO data stream, and capture the string concatenation records
> there too (but that would be followup work).


FWIW, it appears that clang uses the on-demand approach; the relevant
code appears to be StringLiteral::getLocationOfByte:
http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008
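
To make the on-demand idea concrete, here is a toy sketch in C.  It is
not the code from the patch kit, and it is not what clang does either:
the function name and its interface are made up for illustration.  It
assumes a plain narrow literal, identical single-byte source and
execution encodings, and only the simple one-character escapes.  For
anything else it reports failure, so that the caller can fall back to
using the location of the whole string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only, not a GCC or clang API).

   SPELLING is the source text of a string literal, including its
   quotes.  Re-scan it to find the offset within SPELLING at which
   byte BYTE_IDX of the execution string begins, and store that
   offset in *OUT_OFFSET.

   Return false if the literal is anything this toy does not
   understand (encoding prefix, raw string, octal/hex/UCN escape)
   or if BYTE_IDX is out of range; the caller should then fall
   back to the location of the whole literal.  */

bool
substring_offset_on_demand (const char *spelling, size_t byte_idx,
                            size_t *out_offset)
{
  assert (spelling != NULL && out_offset != NULL);

  /* Only plain "..." literals; L"...", u8"...", R"(...)" etc. bail.  */
  if (spelling[0] != '"')
    return false;

  size_t src = 1;   /* Offset into the spelling, past the open quote.  */
  size_t exec = 0;  /* Index of the execution byte about to be produced.  */

  while (spelling[src] != '\0' && spelling[src] != '"')
    {
      size_t start = src;

      if (spelling[src] == '\\')
        {
          char c = spelling[src + 1];
          /* Accept only simple escapes, each yielding one byte.  */
          if (c == '\0' || strchr ("ntrabfv\\'\"?", c) == NULL)
            return false;
          src += 2;
        }
      else
        src += 1;

      if (exec == byte_idx)
        {
          *out_offset = start;
          return true;
        }
      exec++;
    }

  /* Ran off the end of the literal: no such byte.  */
  return false;
}
```

For the spelling "a\tb%d" (quotes included), byte 3 of the execution
string is the '%', which starts at offset 5 of the spelling because the
two-character escape \t produced only one byte.  Nothing is stored
up front; the cost is paid only when a diagnostic asks.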


> 
> > Martin
> > 
> > > 
> > > Thanks
> > > Dave
> > > 
> > > [1] with the exception of the string concatenation records, but I
> > > believe those are tiny
> > > 
> >
