On Tue, 2024-07-02 at 16:17 +0000, Qing Zhao wrote: > due to code duplication from jump threading [PR109071] > Control this with a new option -fdiagnostic-try-to-explain-harder.
The name -fdiagnostic-try-to-explain-harder seems a little too "cute" to me, but I can't think of a better name. Various comments inline below... I'm sorry I didn't take a close look at the copy history implementation; I'm hoping Richi will dig into that part of the patch. > > This patch has been tested with -fdiagnostic-try-to-expain-harder on > by > default to bootstrap gcc and regression testing on both x86 and > aarch64, > resolved all bootstrap issues and regression testing issues. > > I need some help in the following two items: > 1. suggestions on better documentation wordings for the new option. > 2. checking on the new data structures copy_history and the > memory management for it. > In the beginning, I tried to use the GGC for it. > but have met quite some issues (possibly because the gimple and > the containing > basic block elimination), then I gave up the GGC scheme and > instead > used the obstack and manually clean and deleted the obstack in the > end of > the compilation. > > The following is more details on the patchs > > Thanks a lot! > > Qing > > $ cat t.c > extern void warn(void); > static inline void assign(int val, int *regs, int *index) > { > if (*index >= 4) > warn(); > *regs = val; > } > struct nums {int vals[4];}; > > void sparx5_set (int *ptr, struct nums *sg, int index) > { > int *val = &sg->vals[index]; > > assign(0, ptr, &index); > assign(*val, ptr, &index); > } > > $ gcc -Wall -O2 -c -o t.o t.c > t.c: In function ‘sparx5_set’: > t.c:12:23: warning: array subscript 4 is above array bounds of > ‘int[4]’ [-Warray-bounds=] > 12 | int *val = &sg->vals[index]; > | ~~~~~~~~^~~~~~~ > t.c:8:18: note: while referencing ‘vals’ > 8 | struct nums {int vals[4];}; > | ^~~~ > > In the above, Although the warning is correct in theory, the warning > message > itself is confusing to the end-user since there is information that > cannot > be connected to the source code directly. > > It will be a nice improvement to add more information in the warning > message > to report where such index value come from. > > In order to achieve this, we add a new data structure copy_history to > record > the condition and the transformation that triggered the code > duplication. > Whenever there is a code duplication due to some specific > transformations, > such as jump threading, loop switching, etc, a copy_history structure > is > created and attached to the duplicated gimple statement. > > During array out-of-bound checking or other warning checking, the > copy_history > that was attached to the gimple statement is used to form a sequence > of > diagnostic events that are added to the corresponding rich location > to be used > to report the warning message. > > This behavior is controled by the new option -fdiagnostic-try-to- > explain-harder > which is off by default. > > With this change, by adding -fdiagnostic-try-to-explain-harder, > the warning message for the above testing case is now: > > t.c: In function ‘sparx5_set’: > t.c:12:23: warning: array subscript 4 is above array bounds of > ‘int[4]’ [-Warray-bounds=] > 12 | int *val = &sg->vals[index]; > | ~~~~~~~~^~~~~~~ > event 1 > | > | 4 | if (*index >= 4) > | | ^ > | | | > | | (1) when the condition is evaluated to true > | > t.c:8:18: note: while referencing ‘vals’ > 8 | struct nums {int vals[4];}; > | ^~~~ BTW I notice in the above example you have the extra vertical line below "event 1" here: event 1 | | 4 | if (*index >= 4) | | ^ | | | | | (1) when the condition is evaluated to true | whereas in the testcase it looks like you're expecting the simpler: event 1 4 | if (*index >= 4) | ^ | | | (1) when the condition is evaluated to true which is due to r15-533-g3cd267446755ab, so I think the example text is slightly outdated. I wonder if the wording of the event could be improved to better explain to the user what the optimizer is "thinking". Perhaps it could be two events: (1) when specializing the code for both branches... (2) ...and considering the 'true' branch... or something like that? (I'm not sure) Is there a more complicated testcase with multiple events? Various other points about the path: When the analyzer builds paths for its diagnostics, it adds a final event that repeats the problem. This is somewhat tautological, but might make the path more readable - I'm not sure. Consider: t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ events 1-2 4 | if (*index >= 4) | ^ | | | (1) when the condition is evaluated to true ... 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ | | | (2) ⚠️ out of array bounds here t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ FWIW there's a way to get get a ⚠️ emoji on that final event, by having its meaning have verb == diagnostic_event::VERB_danger. GCC 15 also supports visualizing control flow between events, by having diagnostic_event::connect_to_next_event_p () return true; see r15-636- g770657d02c986c, so that might be something to consider adding. [...snip...] > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 7d3ea27a6ab..36a5b5fa305 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1436,6 +1436,7 @@ OBJS = \ > df-problems.o \ > df-scan.o \ > dfp.o \ > + diagnostic-copy-history.o \ > digraph.o \ > dojump.o \ > dominance.o \ > @@ -1821,7 +1822,8 @@ OBJS = \ > > # Objects in libcommon.a, potentially used by all host binaries and > with > # no target dependencies. > -OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \ > +OBJS-libcommon = diagnostic-spec.o \ > + diagnostic.o diagnostic-color.o \ Is this change to OBJS-libcommon a stray edit? I don't think it changes the meaning. > diagnostic-format-json.o \ > diagnostic-format-sarif.o \ > diagnostic-global-context.o \ > diff --git a/gcc/common.opt b/gcc/common.opt > index 327230967ea..833cc2f4da4 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1558,6 +1558,10 @@ fdiagnostics-minimum-margin-width= > Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6) > Set minimum width of left margin of source code when showing source. > > +fdiagnostics-try-to-explain-harder > +Common Var(flag_diagnostics_try_to_explain_harder) > +Collect and print more context information for diagnostics. > + Eventually you'll want to run "make regenerate-opt-urls". [...snip...] > diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc > index 1637a2fc4f4..9050b06bf1f 100644 > --- a/gcc/gimple-array-bounds.cc > +++ b/gcc/gimple-array-bounds.cc [...snip...] > @@ -801,6 +802,21 @@ array_bounds_checker::check_array_bounds (tree > *tp, int *walk_subtree, > else > location = gimple_location (wi->stmt); > > + /* Generate a rich location for this location. */ > + gcc_rich_location richloc (location); > + simple_diagnostic_path path (global_dc->printer); > + > + /* Add events to this diagnostic_path per the copy_history > attached to > + the corresponding STMT. */ > + copy_history_t cp_history = wi->stmt ? get_copy_history (wi->stmt) > : NULL; > + for (copy_history_t cur_ch = cp_history; cur_ch; > + cur_ch = cur_ch->prev_copy) > + path.add_event (cur_ch->condition, NULL_TREE, 0, > + "when the condition is evaluated to %s", > + cur_ch->is_true_path ? "true" : "false"); add_event's 2nd param is the function in which the event occurs, which you may want to set to cfun, and set the stack depth to 1, rather than 0, suggesting a purely intraprocedural control flow. FWIW the analyzer has some nasty logic to try to reconstruct functions and stack frames in the light of inlining, which might or might not make sense for you to make use of (I'm not sure). > + > + richloc.set_path (&path); Creating and populating "path" here is doing non-trivial work per call to check_array_bounds, and if I'm reading things right that's being done for every statement. So maybe we want something like: lazy_diaqnostic_path path (callback, data); that is backed by a diagnostic_path *, which gets populated on demand if any vfuncs on the lazy_diagnostic_path are actually called. Or we could abandon simple_diagnostic_path here and have a: copy_history_diagnostic_path path (wi->stmt); subclass of diagnostic_path that implements its vfuncs directly in terms of a copy_history_t; the ctor simply sets up the vtable ptr and initializes a gimple *. Or, all those if (for_array_bound) warned = warning_at (richloc, OPT_Warray_bounds_, /* etc */); could become: if (for_array_bound) { copy_history_diagnostic_path path (location, stmt); warned = warning_at (richloc, OPT_Warray_bounds_, /* etc */); } or somesuch, explicitly deferring creation of the path until we're actually about to emit a warning, and putting the logic in copy_history_diagnostic_path's ctor. > + > *walk_subtree = true; > > bool warned = false; > @@ -808,14 +824,14 @@ array_bounds_checker::check_array_bounds (tree > *tp, int *walk_subtree, > gcc_assert (checker->m_stmt == wi->stmt); > > if (TREE_CODE (t) == ARRAY_REF) > - warned = checker->check_array_ref (location, t, wi->stmt, > + warned = checker->check_array_ref (&richloc, t, wi->stmt, > false/*ignore_off_by_one*/); > else if (TREE_CODE (t) == MEM_REF) > - warned = checker->check_mem_ref (location, t, > + warned = checker->check_mem_ref (&richloc, t, > false /*ignore_off_by_one*/); > else if (TREE_CODE (t) == ADDR_EXPR) > { > - checker->check_addr_expr (location, t, wi->stmt); > + checker->check_addr_expr (&richloc, t, wi->stmt); > *walk_subtree = false; > } > else if (inbounds_memaccess_p (t, wi->stmt)) [...snip...] > diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h > index aa7ca8e9730..c2f7a017fc2 100644 > --- a/gcc/gimple-array-bounds.h > +++ b/gcc/gimple-array-bounds.h > @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see > #define GCC_GIMPLE_ARRAY_BOUNDS_H > > #include "pointer-query.h" > +#include "gcc-rich-location.h" > > class array_bounds_checker > { > @@ -32,9 +33,10 @@ public: > > private: > static tree check_array_bounds (tree *tp, int *walk_subtree, void > *data); > - bool check_array_ref (location_t, tree, gimple *, bool > ignore_off_by_one); > - bool check_mem_ref (location_t, tree, bool ignore_off_by_one); > - void check_addr_expr (location_t, tree, gimple *); > + bool check_array_ref (gcc_rich_location *, tree, gimple *, > + bool ignore_off_by_one); Note that these could just take a rich_location *, I think, and drop the #include above - I don't think there's anything you're using in gcc_rich_location that isn't in the rich_location base class. > + bool check_mem_ref (gcc_rich_location *, tree, bool > ignore_off_by_one); > + void check_addr_expr (gcc_rich_location *, tree, gimple *); > void get_value_range (irange &r, const_tree op, gimple *); > > /* Current function. */ [...snip...] Hope this is constructive Dave