On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote: > On 07/23/2018 07:20 PM, David Malcolm wrote: > > On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote: > > > (David, I'm hoping your your help here. Please see the end.) > > > > > > While looking into a recent -Warray-bounds instance in Glibc > > > involving inlining of large functions it became apparent that > > > GCC could do a better job of pinpointing the source of > > > the problem. > > > > > > The attached patch makes a few adjustments to both > > > the pretty printer infrastructure and to VRP to make this > > > possible. The diagnostic pretty printer already has directives > > > to print the inlining context for both tree and gcall* arguments, > > > so most of the changes just adjust things to be able to pass in > > > gimple* argument instead. > > > > > > The only slightly interesting change is to print the declaration > > > to which the out-of-bounds array refers if one is known. > > > > > > Tested on x86_64-linux with one regression. > > > > > > The regression is in the gcc.dg/Warray-bounds.c test: the column > > > numbers of the warnings are off. Adding the %G specifier to > > > the array bounds warnings in VRP has the unexpected effect of > > > expanding the extent of the underling. For instance, for a test > > > case like this: > > > > > > int a[10]; > > > > > > void f (void) > > > { > > > a[-1] = 0; > > > } > > > > > > from the expected: > > > > > > a[-1] = 0; > > > ~^~~~ > > > > > > to this: > > > > > > a[-1] = 0; > > > ~~~~~~^~~ > > > > > > David, do you have any idea how to avoid this? > > > > Are you referring to the the various places in your patch (in e.g. > > vrp_prop::check_array_ref > > vrp_prop::check_mem_ref > > vrp_prop::search_for_addr_array > > ) where the patch changed things from this form: > > > > warning_at (location, OPT_Warray_bounds, > > "[...format string...]", ARGS...); > > > > to this form: > > > > warning_at (location, OPT_Warray_bounds, > > "%G[...format string...]", stmt, ARGS...); > > Yes. > > > > > If so, there are two location_t values of interest here: > > (a) the "location" value, and > > (b) gimple_location (stmt) > > > > My recollection is that %G and %K override the "location" value > > passed > > in as the first param to the diagnostic call, overwriting it within > > the > > diagnostic_info's text_info with the location value from the %K/%G > > (which also set up the pp_ti_abstract_origin of the text_info from > > the > > block information stashed in the ad-hoc data part of the location, > > so > > that the pretty-printer prints the inlining chain). > > Would having the pretty printer restore the location and > the block after it's done printing the context and before > processing the rest of the format string fix it? (I have > only a vague idea how this all works so I'm not sure if > this even makes sense.)
Structurally, it looks like this: Temporaries during the emission of | Long-lived stuff: the diagnostic: | | +---------+ +----------------+ | |global_dc| |diagnostic_info | | +---------+ |+------------+ | | ||text_info: | | | || m_richloc-+--+---> rich_location | || x_data----+--+-------------------+--> block (via pp_ti_abstract_origin) |+------------+ | | +----------------+ | | The location_t of the diagnostic is stored in the rich_location. Calling: warning_at (location) creates a rich_location wrapping "location" and uses it as above. During formatting, the %K/%G codes set text_info.x_data via pp_ti_abstract_origin and overwrite the location_t in the rich_location. So in theory we could have a format code that sets the block and doesn't touch the rich_location. But that seems like overkill to me. > > > > [aside, why don't we always just print the inlining chain? IIRC, > > %K > > and %G feel too much like having to jump through hoops to me, given > > that gimple_block is looking at gimple_location anyway, why not > > just > > use the location in the location_t's ad-hoc data; I have a feeling > > there's a PR open about this, but I don't have it to hand right > > now]. > > That would make sense to me. I think that's also what we > agreed would be the way forward the last time we discussed > this. (nods) > > > > It looks like the old location was (a), and now you're seeing (b), > > since (b) is the location of the statement. > > Yes, I think that's it. > > > > > Whether or not this is a problem is difficult to tell; what does > > the > > full diagnostic look like? (you only quoted the > > diagnostic_show_locus > > part of it). > > It looks like this: > > e.c: In function ‘f’: > e.c:5:9: warning: array subscript -1 is below array bounds of > ‘int[10]’ > [-Warray-bounds] > a[-1] = 0; > ~~~~~~^~~ > e.c:1:5: note: while referencing ‘a’ > int a[10]; > ^ FWIW, either of the locations/underlining above seem fine to me, but I feel the wording of the diagnostic could be improved. When I first read: array subscript -1 is below array bounds of ‘int[10]’ I found myself pausing to think about the message; the "10" in the message confused me for a moment, and then I realized that it's referring to the *lower* bound of the array. I don't know how "standardese" refers to this, but I think about it as the "start" of the array. So maybe the messages could read something like: out of bounds access to array ‘int[10]’: subscript -1 is before the start of the array or something. In a similar vein, is it easy to distinguish reads from writes, which might give: out of bounds write to array ‘int[10]’: subscript -1 writes to before the start of the array vs e.g.: out of bounds read from array 'int[10]': subscript 10 reads from after 'a[9]', the final element of the array and similar (I'm brainstorming here). Dave