On Tue, Jun 10, 2025 at 5:28 PM Qing Zhao <qing.z...@oracle.com> wrote: > > > > > On Jun 10, 2025, at 09:37, Richard Biener <richard.guent...@gmail.com> > > wrote: > > > > On Mon, Jun 9, 2025 at 8:06 PM Qing Zhao <qing.z...@oracle.com> wrote: > >> > >> > >> > >>> On Jun 6, 2025, at 03:31, Richard Biener <richard.guent...@gmail.com> > >>> wrote: > >>> > >>> On Fri, May 30, 2025 at 5:13 PM Qing Zhao <qing.z...@oracle.com> wrote: > >>>> > >>>> Hi, Richard, > >>>> > >>>> Really appreciate for your suggestions. > >>>> > >>>>> On May 30, 2025, at 05:22, Richard Biener <richard.guent...@gmail.com> > >>>>> wrote: > >>>>> > >>>>> On Fri, May 23, 2025 at 10:49 PM Qing Zhao <qing.z...@oracle.com> wrote: > >>>>>> > >>>>>> Hi, Richard, > >>>>>> > >>>>>> Thanks a lot for your comments and questions. > >>>>>> Please see my answers embedded below: > >>>>>> > >>>>>>> On May 19, 2025, at 06:44, Richard Biener > >>>>>>> <richard.guent...@gmail.com> wrote: > >>>>>>> > >>>>>>> On Fri, May 16, 2025 at 3:34 PM Qing Zhao <qing.z...@oracle.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> Control this with a new option -fdiagnostics-details. > >>>>>>>> > >>>>>>>> $ cat t.c > >>>>>>>> extern void warn(void); > >>>>>>>> static inline void assign(int val, int *regs, int *index) > >>>>>>>> { > >>>>>>>> if (*index >= 4) > >>>>>>>> warn(); > >>>>>>>> *regs = val; > >>>>>>>> } > >>>>>>>> struct nums {int vals[4];}; > >>>>>>>> > >>>>>>>> void sparx5_set (int *ptr, struct nums *sg, int index) > >>>>>>>> { > >>>>>>>> int *val = &sg->vals[index]; > >>>>>>>> > >>>>>>>> assign(0, ptr, &index); > >>>>>>>> assign(*val, ptr, &index); > >>>>>>>> } > >>>>>>>> > >>>>>>>> $ gcc -Wall -O2 -c -o t.o t.c > >>>>>>>> t.c: In function ‘sparx5_set’: > >>>>>>>> t.c:12:23: warning: array subscript 4 is above array bounds of > >>>>>>>> ‘int[4]’ [-Warray-bounds=] > >>>>>>>> 12 | int *val = &sg->vals[index]; > >>>>>>>> | ~~~~~~~~^~~~~~~ > >>>>>>>> t.c:8:18: note: while referencing ‘vals’ > >>>>>>>> 8 | struct nums {int vals[4];}; > >>>>>>>> | ^~~~ > >>>>>>>> > >>>>>>>> In the above, Although the warning is correct in theory, the warning > >>>>>>>> message > >>>>>>>> itself is confusing to the end-user since there is information that > >>>>>>>> cannot > >>>>>>>> be connected to the source code directly. > >>>>>>>> > >>>>>>>> It will be a nice improvement to add more information in the warning > >>>>>>>> message > >>>>>>>> to report where such index value come from. > >>>>>>>> > >>>>>>>> In order to achieve this, we add a new data structure "move_history" > >>>>>>>> to record > >>>>>>>> 1. the "condition" that triggers the code movement; > >>>>>>>> 2. whether the code movement is on the true path of the "condition"; > >>>>>>>> 3. the "compiler transformation" that triggers the code movement. > >>>>>>>> > >>>>>>>> Whenever there is a code movement along control flow graph due to > >>>>>>>> some > >>>>>>>> specific transformations, such as jump threading, path isolation, > >>>>>>>> tree > >>>>>>>> sinking, etc., a move_history structure is created and attached to > >>>>>>>> the > >>>>>>>> moved gimple statement. > >>>>>>>> > >>>>>>>> During array out-of-bound checking or -Wstringop-* warning checking, > >>>>>>>> the > >>>>>>>> "move_history" that was attached to the gimple statement is used to > >>>>>>>> form > >>>>>>>> a sequence of diagnostic events that are added to the corresponding > >>>>>>>> rich > >>>>>>>> location to be used to report the warning message. > >>>>>>>> > >>>>>>>> This behavior is controled by the new option -fdiagnostics-details > >>>>>>>> which is off by default. > >>>>>>>> > >>>>>>>> With this change, by adding -fdiagnostics-details, > >>>>>>>> the warning message for the above testing case is now: > >>>>>>>> > >>>>>>>> $ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c > >>>>>>>> t.c: In function ‘sparx5_set’: > >>>>>>>> t.c:12:23: warning: array subscript 4 is above array bounds of > >>>>>>>> ‘int[4]’ [-Warray-bounds=] > >>>>>>>> 12 | int *val = &sg->vals[index]; > >>>>>>>> | ~~~~~~~~^~~~~~~ > >>>>>>>> ‘sparx5_set’: events 1-2 > >>>>>>>> 4 | if (*index >= 4) > >>>>>>>> | ^ > >>>>>>>> | | > >>>>>>>> | (1) when the condition is evaluated to true > >>>>>>>> ...... > >>>>>>>> 12 | int *val = &sg->vals[index]; > >>>>>>>> | ~~~~~~~~~~~~~~~ > >>>>>>>> | | > >>>>>>>> | (2) out of array bounds here > >>>>>>>> t.c:8:18: note: while referencing ‘vals’ > >>>>>>>> 8 | struct nums {int vals[4];}; > >>>>>>>> | ^~~~ > >>>>>>>> > >>>>>>>> The change was divided into 3 parts: > >>>>>>>> > >>>>>>>> Part 1: Add new data structure move_history, record move_history > >>>>>>>> during > >>>>>>>> transformation; > >>>>>>>> Part 2: In warning analysis, Use the new move_history to form a rich > >>>>>>>> location with a sequence of events, to report more context info > >>>>>>>> of the warnings. > >>>>>>>> Part 3: Add debugging mechanism for move_history. > >>>>>>> > >>>>>>> Thanks for working on this. I'm pasting my review notes on [1/3] > >>>>>>> below. > >>>>>>> > >>>>>>> > >>>>>>> ove histories are allocated from a global obstack - it seems they are > >>>>>>> never released, just the association between stmt and history is > >>>>>>> eventually broken by remove_move_history? > >>>>>> > >>>>>> They are supposed to be released by the following routine: > >>>>>> > >>>>>> +void > >>>>>> +move_history_finalize (void) > >>>>>> +{ > >>>>>> + if (move_history_map) > >>>>>> + { > >>>>>> + delete move_history_map; > >>>>>> + move_history_map = NULL; > >>>>>> + } > >>>>>> + obstack_free (&move_history_obstack, NULL); > >>>>>> + return; > >>>>>> +} > >>>>>> > >>>>>> Is the above “obstack_free” the proper utility routine that are used > >>>>>> to free the whole > >>>>>> memory allocated for the“move_history_obstack”? > >>>>>> > >>>>>> If not, please advise on how to free the obstacle variables. > >>>>> > >>>>> Yes, the above releases them all - I was pointing out that when > >>>>> an entry becomes unreachable via remove_move_history the > >>>>> entry doesn't get re-used but instead is only released at the end > >>>>> of compilation. > >>>> > >>>> Okay, I see. > >>>> So in the following code: > >>>> > >>>> @@ -581,6 +582,8 @@ gsi_remove (gimple_stmt_iterator *i, bool > >>>> remove_permanently) > >>>> cfun->debug_marker_count--; > >>>> require_eh_edge_purge = remove_stmt_from_eh_lp (stmt); > >>>> gimple_remove_stmt_histograms (cfun, stmt); > >>>> + if (get_move_history (stmt) != NULL) > >>>> + remove_move_history (stmt); > >>>> > >>>> +/* Remove the move history for STMT. */ > >>>> + > >>>> +void > >>>> +remove_move_history (gimple *stmt) > >>>> +{ > >>>> + if (!move_history_map) > >>>> + return; > >>>> + move_history_map->remove (stmt); > >>>> + return; > >>>> +} > >>>> > >>>> What’s in my original thought is, whenever one gimple stmt is deleted, > >>>> We just break the > >>>> association between It and its move_history, the real move_history data > >>>> structure will > >>>> be released by obstack_free anyway later. > >>>> > >>>> Yes, you are right, if there are many gimple stmts with move_histories > >>>> are deleted during > >>>> the optimization, it’s better to delete the associated move_history data > >>>> structure too in > >>>> “remove_move_history”. > >>>> Will update it. > >>>> > >>>>> > >>>>>>> Do you have any statistics > >>>>>>> on the memory usage? > >>>>>> No. > >>>>>>> Might be an interesting bit to dump with > >>>>>>> -fmem-report (the size of the obstack). Likewise the statistics > >>>>>>> on the hash-map are interesting. > >>>>>> > >>>>>> Are you suggesting to add the memory usage statistics for both > >>>>>> “move_history_obstack” and > >>>>>> “move_history_map” to -fmem_report? > >>>>>> If so, I will study a little bit on how to do this. And then add this > >>>>>> part. > >>>>> > >>>>> Yes. Basically to get an idea how bad the lack of re-use is. I could > >>>>> imagine > >>>>> the obstack could be per function which might improve peak memory use, > >>>>> or adding a ref counter and keeping a freelist. Without data it's hard > >>>>> to tell > >>>>> whether this is worth the trouble. > >>>> > >>>> Sure, I will collect these statistics with GCC build as you suggested. > >>>> (But maybe I don’t need to > >>>> collect them anymore due to the new approach you suggested-:). > >>>>> > >>>>>> > >>>>>>> static bool > >>>>>>> +is_move_history_existed (location_t cond_location, bool is_true_path, > >>>>>>> + enum move_reason > >>>>>>> > >>>>>>> that's a bit of an odd name - 'move_history_exists_p'? > >>>>>> > >>>>>> Okay, will update the name as you suggested. > >>>>>> > >>>>>>> What are > >>>>>>> we supposed to do on a duplicate? This is a linear search - how > >>>>>>> many do we accumulate in practice? Does it make sense to put a > >>>>>>> limit on the number of transforms we record? > >>>>>> > >>>>>> Yeah, all are good questions. > >>>>>> I cannot imagine a concrete situation in which the duplication might > >>>>>> happen, > >>>>>> But I assume that it (the duplication) might happen (might due to > >>>>>> interaction among > >>>>>> different transformations within loops?). > >>>>>> Shall we collect some statistics on the potential duplications? And > >>>>>> also the length of the > >>>>>> move_history chain? And then decide the solution to the above > >>>>>> questions? > >>>>> > >>>>> Yes. > >>>>> > >>>>>> If so, do you have a suggestion on what benchmark I should use to > >>>>>> collect such > >>>>>> Data? Is SPECCPU good enough for such purpose? > >>>>> > >>>>> I think even data from a GCC build is useful enough. > >>>> Okay, will collect all the above data with GCC build if needed. (But > >>>> looks like I don’t need > >>>> To do this if the new approach works well) > >>>>> > >>>>>>> > >>>>>>> static gimple * > >>>>>>> +get_cond_stmt (edge entry, bool is_destination, bool *is_true_path) > >>>>>>> +{ > >>>>>>> > >>>>>>> I don't quite understand the is_destination parameter, likewise > >>>>>>> for the two APIs using this function? > >>>>>> > >>>>>> The major purpose of “is_destination” is: > >>>>>> the value of “is_destination” is used to compute the value of > >>>>>> “is_true_path” as: > >>>>>> > >>>>>> + *is_true_path = !(is_branch_taken ^ is_destination); > >>>>>> > >>>>>> As now, only GIMPLE_COND is considered, GIMPLE_SWITCH is not handled. > >>>>>> We can extend this work to include GIMPLE_SWITCH if needed. What’s your > >>>>>> Suggestions on this? > >>>>>> > >>>>>> Since only GIMPLE_COND is handled now, there will be two places > >>>>>> where the > >>>>>> code might move to, one is the “destination” of the current edge, one > >>>>>> is not the > >>>>>> “destination” of the current edge, i.e, its on the destination of the > >>>>>> new edge. > >>>>>> > >>>>>> Then based on whether the current edge is a taken edge and whether the > >>>>>> code is > >>>>>> moved to the destination of this current edge, we can compute whether > >>>>>> the code > >>>>>> moved to the destination of this current edge is on the TRUE path of > >>>>>> the COND > >>>>>> Stmt. and report such information to the end users. (Please see my > >>>>>> explanation > >>>>>> on the isolate path transformation below, there is a graph to explain > >>>>>> this -:). > >>>>>> > >>>>>> Hope this is clear, if it is still not clear, let me know, I will try > >>>>>> to explain more. > >>>>> > >>>>> Hmm, I'm not sure yet ;) > >>>>> > >>>>>> > >>>>>>> Usually the control edge > >>>>>> > >>>>>> What do you mean by the control edge? > >>>>> > >>>>> The control edge is the edge out of the control stmt that decides > >>>>> whether the > >>>>> outgoing edge (which IIRC is passed here) is executed. Consider > >>>>> > >>>>> if (...) > >>>>> \ e' > >>>>> \ > >>>>> if (..) > >>>>> / \ > >>>>> A B > >>>>> \ / > >>>>> C > >>>>> | e > >>>>> > >>>>> here 'e' would be the outgoing edge, but rather than the if() before > >>>>> the diamond, the if() controlling e' is the control edge. > >>>> > >>>> Yes, that’s what in my mind too. > >>>>> > >>>>>>> and the BB the stmt is in can be different from edge->dest, and > >>>>>>> I'd expect the callers to know, so I wonder why we would want to > >>>>>>> search here? > >>>>>> > >>>>>> Not quite understand the question. Is my graph below for the isolate > >>>>>> path > >>>>>> transformation answer this question? > >>>>> > >>>>> I think how it actually works depends on the code transform, but see > >>>>> one of > >>>>> my last comments where the abstraction presented is maybe not best. > >>>>> > >>>>>>> In particular the use in path isolation for PHI > >>>>>>> arguments passes in the edge of the problematic PHI arg but > >>>>>>> the first reached gcond * might not be the full controlling > >>>>>>> condition when there is more than two PHI arguments. > >>>>>> > >>>>>> For path isolation, I have the following patch: > >>>>>> > >>>>>> @@ -170,6 +171,16 @@ isolate_path (basic_block bb, basic_block > >>>>>> duplicate, edge e. > >>>>>> } > >>>>>> bb->count -= count; > >>>>>> > >>>>>> + /* Set the move history for all the stmts in both original and > >>>>>> copied > >>>>>> + basic blocks. The duplicated block will be the destination of > >>>>>> the > >>>>>> + incoming edge. */ > >>>>>> + if (flag_diagnostics_details) > >>>>>> + { > >>>>>> + set_move_history_to_stmts_in_bb (bb, e, false, > >>>>>> COPY_BY_ISOLATE_PATH); > >>>>>> + set_move_history_to_stmts_in_bb (duplicate, e, > >>>>>> + true, COPY_BY_ISOLATE_PATH); > >>>>>> + } > >>>>>> + > >>>>>> > >>>>>> From my understanding: > >>>>>> the arguments “bb”, “duplicate” and “e” of the routine “isolate_path” > >>>>>> can be illustrated as following: > >>>>>> > >>>>>> OLD: > >>>>>> cond_bb > >>>>>> \ | / > >>>>>> e1 \ | e / e2 > >>>>>> V > >>>>>> bb > >>>>>> > >>>>>> NEW: > >>>>>> cond_bb > >>>>>> \ / | > >>>>>> e1\ / e2 | e > >>>>>> V V > >>>>>> bb duplicate > >>>>>> > >>>>>> Is the above transformation correct? > >>>>> > >>>>> more-or-less, yes. path isolation only handles the case where there's > >>>>> a gimple-cond on the incoming > >>>>> edge to BB that will cause UB in BB. So we want to remove that edge > >>>>> from BB and instead have a > >>>>> copy which we can prune at/after the UB stmt. > >>>> > >>>> Yes. That’s right. > >>>>> > >>>>>> If yes, then the “duplicate” is the destination of the edge “e”, but > >>>>>> the “bb” is not the destination of the edge “e”. > >>>>>> Therefore: > >>>>>> > >>>>>> + set_move_history_to_stmts_in_bb (bb, e, false, > >>>>>> COPY_BY_ISOLATE_PATH); > >>>>>> + set_move_history_to_stmts_in_bb (duplicate, e, > >>>>>> + true, COPY_BY_ISOLATE_PATH); > >>>>>> > >>>>>> Then, the above two calls to the routine > >>>>>> “set_move_history_to_stmts_in_bb” will get the cond stmt “cond” in the > >>>>>> source basic block “cond_bb” of the edge “e” if the condition stmt can > >>>>>> be found. And then set the move_history > >>>>>> for each stmt in the block “bb” or “duplicate” properly. > >>>>>> (If the condition stmt cannot be found, we give up). > >>>>>> > >>>>>> Is the above correct? > >>>>> > >>>>> So what I wondered is why we record the stmts in 'bb' to be "moved"? > >>>>> Or why we > >>>>> record the stmts in 'duplicate' to be "moved"? That is, what's the > >>>>> condition that's > >>>>> no longer visible in the IL that we want to preserve this way? > >>>>> > >>>>> I think both that in 'bb' 'cond' is false and that in 'duplicate' > >>>>> 'cond' is true is still > >>>>> obviously present, so it's redundant to note this? So what we get from > >>>>> marking > >>>>> the stmts is just that "stmt somehow participates in an optimization > >>>>> where 'cond' > >>>>> was involved"? > >>>>> > >>>>> That is, I wonder whether if -fdiagnostic-show-context would simply > >>>>> display > >>>>> the (first) control stmt (that is still in the IL!) of its execution > >>>>> context, whether > >>>>> that would fix all the testcases you include? > >>>> Oh, yeah, all the above sound very reasonable and natural to me -:) > >>>> I guess that I was misleaded by the compiler transformations…. > >>>> > >>>> Maybe Not only the “first” control stmt, tracking back along the CFG to > >>>> the first joint place, i.e: > >>>> > >>>> + B2 > >>>> + / \ > >>>> + V \ > >>>> + B3 \ > >>>> + / \ \ > >>>> + V \---->V > >>>> + B4 B7 > >>>> + > >>>> > >>>> If the diagnostic happens in B4, and there is only one incoming edge to > >>>> B4, will track back to B3 and then back to B2. > >>>> Report the condition on B2, B3, and whether B3 and B4 is on the > >>>> TRUE/FALSE patch of the corresponding edges. > >>>> > >>>> On the other hand, if the diagnostic happens in B7, since there are > >>>> multiple incoming edges to B7, -fdiagnostic-show-context > >>>> will give up on such case. > >>> > >>> In principle it should be possible to collect a complex condition like > >>> A && B || C from the CFG. Of course the question is how much is > >>> interesting or even relevant - it's going to be hard to guess. > >> Yes. That’s right. > >> It might not be helpful to report the complete condition chain for the > >> specific diagnostic. > >> So the major question is: > >> what kind of algorithm or heuristic we should use to pick up the specific > >> portion of the condition > >> chain for the diagnostic to help the end-user to understand the message > >> better? > >> > >> What kind of factors might impact the heuristic: (the following are just > >> my guess) > >> > >> 1. The algorithm or heuristic for the value ranger analysis? > >> 2. Different warnings might have different heuristic? > >> 3. Only compiler optimization altered control edges? > >> > >> What else? > >> > >> Any suggestion on this? > > > > It's difficult to do any meaningful pruning I think. Consider > > > > if (i == -1) > > tem = a[i]; > > > > when we transform this to > > > > if (i == -1) > > tem = a[-1]; > > > > and report the out-of-bounds access then we've lost the fact > > that -1 was originally 'i' and there's an interesting condition on that > > variable. So unless we preserve some kind of optimization history > > on a stmt like the above we are lost. The same is true for > > "altered" control edges where the most interesting are again those > > that are elided by optimization (the implied true/false conditions). > > On the other hand, -fdiagnostic-show-context=1 should provide helpful > information > to the user when -Warray-bounds reports out-of-bounds warning for: > > if (i == -1) > tem = a[i]; // out-of-bound warning here > > I.e., the condition that guards this statement, “if (I == -1)” will be > reported by -fdiagnostic-show-context=1, > Then the user will figure out the problem in the source code. -:)
Yes, I think it's going to be OK in practice. Just not perfect ;) > > > That said, I would expect the user to not use -fdiagnostics-show-context by > > default but instead when being puzzled about a diagnostic, turn it on > > on-demand. So it's probably useful to report all of the chain or maybe > > have -fdiagnostic-show-context=N with N limiting the depth (or a separate > > knob for this). > > Yes, -fdiagnostic-show-context=N is good. N is the depth of the control flow > chain > that guards the basic block of the statement that has the warning, by > default, N is1. > Is this a reasonable user documentation? Yes, I think so. Maybe "interesting control flow chain"? Leaves us to eventually prune parts of the chain and more specifically we can waive not printing control flow instructions we've optimized away. > I think that this option is only helpful for the warnings that depends on > value range information. > Is this right? I don't think so. Consider uninit diagnostics for example, printing the condition chains where we failed to prove initializedness might be helpful. > > > > When there's still 'i' in a[i] but we figured out a range then we can see > > to do 1. to gather conditions that we consider. The backwards threader > > has "interestingness" heuristics that could eventually be repurposed. > > The "good" thing is that once we decide to emit a diagnostic it's fair > > to spend quite some cycles on improving it. > > > For the following arbitrary CFG: > > +B6 B2 > + | / \ > + V V \ > +B5 B3 \ > + \ / \ \ > + V \---->V > + B4 B7 > + > > If the interesting warning (for example -Warray-bounds) that depends on value > range information is emitted at B4, > which implies that the value range information for the index variable of the > array reference is known at B4, usually > such value range information is collected at the condition statements that > precedes B4 and propagated through CFG > (Is this correct?), therefore, backtracing the CFG edges starting from B4 to > its preceding blocks, such as B3, B5, then > back to B2, B6 should be reasonable? Yes, though without having a common base (the common immediate dominator of B5 and B3 here) it's difficult to interpret. Generally when you have B2 / \ B3 B4 \ / \ B5 B6 and the array ref is in B5 the condition that is useful to print with just this part of the CFG is ! B4->B6, the inverted condition of the edge B4->B6. Plus (and) the condition that B2 is entered. So you'd somehow say note: assuming line-at-end-of-B2 is reached note: and ! B4->B6 warning: array index out of bounds or how the path diagnostic materializes. That is, the top of the CFG considered should be a dominator of the block we diagnose and we should indicate we assume we've reached there. > > However, should we give up on the cases when there are multiple immediate > preceding blocks of B4, only report context > When there is only one immediate preceding block? (This might be a good > starting point?) Simplifying the initial supported CFG shapes is of course fine. Richard. > >> > >>> > >>>> > >>>> Now, when we actually elide a control stmt, that's the point where we > >>>> lose > >>>> information, but the transforms altered, jump threading, path > >>>> isolation and sinking, > >>>> don't actually do this? > >>> Yes. > >>>> > >>>> And if we record the stmt "move" for both edges don't we end up creating > >>>> too much noise that will be perceived as fals positive / misleading when > >>>> we extend the use of this? > >>>> > >>>> So like, when path isolation isolates a ptr == 0 we should end up with IL > >>>> like > >>>> > >>>> if (ptr != 0) > >>>> | \ > >>>> | good code > >>>> | > >>>> *0 = x; > >>>> > >>>> where the last stmt is eventually diagnosed. But the test on ptr > >>>> will have survived, so if we'd diagnose the path to it that would > >>>> help already, no need for any magic stmt duplication/move marking? > >>> > >>> Right, that should be enough. > >>> > >>>> > >>>> I'm not saying that for some cases we will need that, but the testcaes > >>>> seem to be too simple to prove this? > >>>> > >>>> That said, we might be able to do with the 2nd patch improving > >>>> the actual diagnostic but simply by making it show a path > >>>> with -fdiagnostic-show-context? > >>> > >>> I will try this approach to see whether it can catch all the cases. > >>> And whether there is any issue we miss at this moment. > >> > >> Thanks. I guess if we're collecting a complex condition like > >> suggested above we could also try hard to prove it false > >> so we can prune diagnostics on unreachable code that we just > >> didn't optimize. > > > > Has such analysis (prove one path false) been done by the compiler already? > > > Yes, for example jump-threading can do this. But the transforms are > > limited by > > costing, for example on code duplication. So there's the chance that extra > > static analysis can prove unreachability (and we always have the cases with > > bad luck of pass ordering or too late removal of dead code). > > > Okay, I see. > > >> Such analysis also relates to value range analysis and propagation? (Since > >> evaluating a condition > >> to TRUE/FALSE need to know the value ranges of the variables in the > >> condition) > > > > value ranges can help, yes. > >>> There is (a quite imperfect) code for analyzing > >>> and simplifying predicates in gimple-predicate-analysis.{cc,h}. > >>> The predicate build from a CFG isn't fully exposed in the API > >>> though. > >> > >> I just briefly read these files and see that it is currently used by > >> uninitialized analysis. > >> I haven’t gone into details of how this help uninitialized analysis, but I > >> guess it’s used > >> to evaluate the path in order to rule out false-positive warnings? > > > > It's used the other way around - it tries to prove that on all paths > > to a use a variable > > is initialized. If we fail to prove that we diagnose. > Okay. > > >> Might need to understand how uninitialized analysis use the predicate > >> analysis first to see how to use it > >> for my purpose. > > > > What you'd do is try to prove the condition guarding the stmt we are > > diagnosing > > is false. > > I think that -fdiagnostic-show-context=1 is trying to show to the user: > > when a warning being emitted at one statement is not obvious at that > statement itself > (due to compiler transformations improving the value range analysis), tracing > back in the > source code to see which guarding condition in the source code might be > interesting to > check for the potential user error. > > > I'm not sure if it helps in practice, of course. It's > > really poor mans > > static analysis and easily confused by late transforms. > > This option is trying to help in this area, and from Kees’s and Sam’s > experiments in Linux and other applications, > It did help already. -:) > > > > Richard. > > > > thanks. > > > > Qing > >> > >>>> > >>>>>> Not to > >>>>>> mention a switch statement might also be the control stmt. > >>>>>> While this can be extended in future I'd like the caller > >>>>>> to compute the control dependence that's relevant - using > >>>>>> an edge (or in future a vector of edges) is fine, it should > >>>>>> be the edge outgoing from the control stmt. > >>>>> > >>>>> Yes, that’s the edge outgoing from the cond stmt. > >>>>>> > >>>>>> @opindex fdiagnostics-details > >>>>>> +@item -fdiagnostics-details > >>>>>> > >>>>>> Not entirely happy with the name, we have -fdiagnostic-show-* > >>>>>> options related to events, so maybe -fdiagnostic-show-details > >>>>>> or -fdiagnostic-show-context? > >>>>> > >>>>> -fdiagnostic-show-context is better? > >>>> > >>>> I'd think so. > >>>> > >>>>>> > >>>>>> @@ -581,6 +582,8 @@ gsi_remove (gimple_stmt_iterator *i, bool > >>>>>> remove_permanently) > >>>>>> cfun->debug_marker_count--; > >>>>>> require_eh_edge_purge = remove_stmt_from_eh_lp (stmt); > >>>>>> gimple_remove_stmt_histograms (cfun, stmt); > >>>>>> + if (get_move_history (stmt) != NULL) > >>>>>> + remove_move_history (stmt); > >>>>>> > >>>>>> I'd say make remove_move_history cope with the stmt not being in the > >>>>>> hash is better, so just call that function here, without get_*() > >>>>> > >>>>> Okay. > >>>>>> > >>>>>> @ -712,6 +756,19 @@ sink_code_in_bb (basic_block bb, > >>>>>> virtual_operand_live &vop_live) > >>>>>> bb->index, (gsi_bb (togsi))->index); > >>>>>> } > >>>>>> > >>>>>> + /* Set the move history for the stmt that is sinked from BB to > >>>>>> + gsi_bb (togsi). This stmt is on the path from BB to > >>>>>> + gsi_bb (togsi). */ > >>>>>> + if (flag_diagnostics_details) > >>>>>> + { > >>>>>> + /* BB might not be the immediate predecessor of gsi_bb > >>>>>> (togsi), > >>>>>> + get the edge chain from BB to gsi_bb (togsi). */ > >>>>>> + auto_vec<edge> edge_chain = get_edge_chain (bb, gsi_bb > >>>>>> (togsi)); > >>>>>> + > >>>>>> + for (edge entry : edge_chain) > >>>>>> + set_move_history_to_stmt (stmt, entry, true, MOVE_BY_SINK); > >>>>>> + } > >>>>>> > >>>>>> so you represent a more complex control condition by a series of > >>>>>> sinkings? > >>>>> > >>>>> If the sinking happens along a path that contain multiple edges such as > >>>>> the following > >>>>> B2 to B4: > >>>>> > >>>>> +/* Get the edge chain from FROM_BB to TO_BB, FROM_BB dominates TO_BB. > >>>>> + B2 > >>>>> + / \ > >>>>> + V \ > >>>>> + B3 \ > >>>>> + / \ \ > >>>>> + V \---->V > >>>>> + B4 B7 > >>>>> + > >>>>> + In the above CFG, FROM_BB is B2, TO_BB is B4. This routine > >>>>> + will locate two edges, B2->B3, and B3->B4 and return a vector > >>>>> + with these two edges. > >>>>> + This routine will return an empty vector if the control flow > >>>>> + edge chain from FROM_BB to TO_BB is too complicate. */ > >>>>> + > >>>>> > >>>>> Then both the edges B2->B3 and B3->B4 will be recorded. And the sinking > >>>>> from B2->B3 > >>>>> is on the destination of this edge, and then sinking from B3->B4 also > >>>>> is on the destination > >>>>> of this edge too. > >>>>> Therefore, we have: > >>>>> > >>>>> + for (edge entry : edge_chain) > >>>>> + set_move_history_to_stmt (stmt, entry, true, MOVE_BY_SINK); > >>>>> + } > >>>>> > >>>>> > >>>>>> I guess that's possible in this case at least. I'll note the > >>>>>> get_edge_chain function seems to be quite simplistic and likely > >>>>>> prone to record only a part of the sinkings. > >>>>> > >>>>> Could you please explain a bit more here? Do you mean that the current > >>>>> “get_edge_chain” might miss some more complicate cases? If so, could you > >>>>> please provide more details on what’s the more complicate cases look > >>>>> like? > >>>>> Then I will enhance this routine to handle them. > >>>>>> > >>>>>> + /* Set the move history for all the stmts in both original > >>>>>> and copied > >>>>>> + basic blocks. The duplicated block will be the > >>>>>> destination of the > >>>>>> + incoming edge. */ > >>>>>> + if (flag_diagnostics_details) > >>>>>> + { > >>>>>> + set_move_history_to_stmts_in_bb (e->dest, e, false, > >>>>>> + COPY_BY_THREAD_JUMP); > >>>>>> + set_move_history_to_stmts_in_bb (rd->dup_blocks[0], e, > >>>>>> + true, > >>>>>> COPY_BY_THREAD_JUMP); > >>>>>> + } > >>>>>> > >>>>>> I'm taking this one as example - why are we setting a move history > >>>>>> on the not copied part? Shouldn't that be still within the original > >>>>>> constraints / location? Otherwise why would(nt) we show the location > >>>>>> and direction of all control stmts of a stmts we emit a diagnostic on? > >>>>> > >>>>> I think that the following should answer the above questions: > >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666870.html > >>>>> > >>>>> " > >>>>> The key feature of the compiler transformation that might provide more > >>>>> accurate information to the value-range analysis is: moving a statement > >>>>> from the joint point of one specific condition to this condition's > >>>>> TRUE or FALSE path. > >>>>> > >>>>> For example, threadjump and isolate-path transformation make a > >>>>> duplicated > >>>>> basic block “Ba’ " of the original basic block "Ba" that on the joint > >>>>> point of a condition "cond", and then move the two basic block “Ba' " > >>>>> and "Ba" to the TRUE path and FALSE path of the condition "cond"; on the > >>>>> otherhand, tree sink transformation just move some of the statements > >>>>> from > >>>>> the joint point of a condition "cond" to one specific path of this > >>>>> condition. > >>>>> > >>>>> So, the new data structure "move_history" will include the following > >>>>> information: > >>>>> A. the "condition" that triggers the code movement; > >>>>> B. whether the code movement is on the true path of the "condition"; > >>>>> C. the "compiler transformation" that triggers the code movement. > >>>>> " > >>>>> > >>>>> Does the above answer your question? > >>>> > >>>> Yes, I understand what we record. But I still wonder if we need this. > >>>> What we do not record reliably is when if (A) eventually is optimized > >>>> to if (1) or if (0) that the stmts on the path that remains are subject > >>>> to A == 1 (or 0). > >>> > >>> Yes, I understand what you suggested and I like this idea. That’s much > >>> simple > >>> and natural to this problem. > >>> I will try to implement this approach in the next version to see whether > >>> there is any > >>> potential issue with it. > >>> > >>> Really appreciate for your suggestions and help. > >>> > >>> FYI, I will be on vacation next week, so will try the new approach when I > >>> am back > >>> on the week of 6/9. > >> > >> No problem. > >> > >> Richard. > >> > >>> Thanks again. > >>> > >>> Qing > >>> > >>> > >>>> Richard. > >>>> > >>>>>> > >>>>>> > >>>>>> I think it makes sense to squash [3/3] into this patch. > >>>>> > >>>>> Okay, will do that, > >>>>>> > >>>>>> There is documentation about diagnostics in doc/ux.texi, it would be > >>>>>> nice to mention move-history there. > >>>>> > >>>>> Okay, will study a little bit and add it. > >>>>> > >>>>> Thanks again for your time and comments. > >>>>> > >>>>> Qing > >>>>>> > >>>>>> > >>>>>> Thanks, > >>>>>> Richard. > >>>>>> > >>>>>>> PR tree-optimization/109071 > >>>>>>> PR tree-optimization/85788 > >>>>>>> PR tree-optimization/88771 > >>>>>>> PR tree-optimization/106762 > >>>>>>> PR tree-optimization/108770 > >>>>>>> PR tree-optimization/115274 > >>>>>>> PR tree-optimization/117179 > >>>>>>> > >>>>>>> gcc/ChangeLog: > >>>>>>> > >>>>>>> * Makefile.in (OBJS): Add diagnostic-move-history.o. > >>>>>>> * gcc/common.opt (fdiagnostics-details): New option. > >>>>>>> * gcc/doc/invoke.texi (fdiagnostics-details): Add > >>>>>>> documentation for the new option. > >>>>>>> * gimple-iterator.cc (gsi_remove): (gsi_remove): Remove the move > >>>>>>> history when removing the gimple. > >>>>>>> * gimple-pretty-print.cc (pp_gimple_stmt_1): Emit MV_H marking > >>>>>>> if the gimple has a move_history. > >>>>>>> * gimple-ssa-isolate-paths.cc (isolate_path): Set move history > >>>>>>> for the gimples of the duplicated blocks. > >>>>>>> * tree-ssa-sink.cc (sink_code_in_bb): Create move_history for > >>>>>>> stmt when it is sinked. > >>>>>>> * toplev.cc (toplev::finalize): Call move_history_finalize. > >>>>>>> * tree-ssa-threadupdate.cc (ssa_redirect_edges): Create > >>>>>>> move_history > >>>>>>> for stmts when they are duplicated. > >>>>>>> (back_jt_path_registry::duplicate_thread_path): Likewise. > >>>>>>> * diagnostic-move-history.cc: New file. > >>>>>>> * diagnostic-move-history.h: New file. > >>>>>>> > >>>>>>> gcc/testsuite/ChangeLog: > >>>>>>> > >>>>>>> * gcc.dg/pr117375.c: New test. > >>>>>>> --- > >>>>>>> gcc/Makefile.in | 1 + > >>>>>>> gcc/common.opt | 4 + > >>>>>>> gcc/diagnostic-move-history.cc | 265 ++++++++++++++++++++++++++++++++ > >>>>>>> gcc/diagnostic-move-history.h | 92 +++++++++++ > >>>>>>> gcc/doc/invoke.texi | 11 ++ > >>>>>>> gcc/gimple-iterator.cc | 3 + > >>>>>>> gcc/gimple-pretty-print.cc | 4 + > >>>>>>> gcc/gimple-ssa-isolate-paths.cc | 11 ++ > >>>>>>> gcc/testsuite/gcc.dg/pr117375.c | 13 ++ > >>>>>>> gcc/toplev.cc | 3 + > >>>>>>> gcc/tree-ssa-sink.cc | 57 +++++++ > >>>>>>> gcc/tree-ssa-threadupdate.cc | 25 +++ > >>>>>>> 12 files changed, 489 insertions(+) > >>>>>>> create mode 100644 gcc/diagnostic-move-history.cc > >>>>>>> create mode 100644 gcc/diagnostic-move-history.h > >>>>>>> create mode 100644 gcc/testsuite/gcc.dg/pr117375.c > >>>>>>> > >>>>>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in > >>>>>>> index 72d132207c0..38dfb688e60 100644 > >>>>>>> --- a/gcc/Makefile.in > >>>>>>> +++ b/gcc/Makefile.in > >>>>>>> @@ -1451,6 +1451,7 @@ OBJS = \ > >>>>>>> df-problems.o \ > >>>>>>> df-scan.o \ > >>>>>>> dfp.o \ > >>>>>>> + diagnostic-move-history.o \ > >>>>>>> digraph.o \ > >>>>>>> dojump.o \ > >>>>>>> dominance.o \ > >>>>>>> diff --git a/gcc/common.opt b/gcc/common.opt > >>>>>>> index 0e50305dde8..081537c349a 100644 > >>>>>>> --- a/gcc/common.opt > >>>>>>> +++ b/gcc/common.opt > >>>>>>> @@ -1613,6 +1613,10 @@ fdiagnostics-minimum-margin-width= > >>>>>>> Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6) > >>>>>>> Set minimum width of left margin of source code when showing source. > >>>>>>> > >>>>>>> +fdiagnostics-details > >>>>>>> +Common Var(flag_diagnostics_details) > >>>>>>> +Collect and print more context information for diagnostics. > >>>>>>> + > >>>>>>> fdisable- > >>>>>>> Common Joined RejectNegative Var(common_deferred_options) Defer > >>>>>>> -fdisable-[tree|rtl|ipa]-<pass>=range1+range2 Disable an > >>>>>>> optimization pass. > >>>>>>> diff --git a/gcc/diagnostic-move-history.cc > >>>>>>> b/gcc/diagnostic-move-history.cc > >>>>>>> new file mode 100644 > >>>>>>> index 00000000000..83d8a42b577 > >>>>>>> --- /dev/null > >>>>>>> +++ b/gcc/diagnostic-move-history.cc > >>>>>>> @@ -0,0 +1,265 @@ > >>>>>>> +/* Functions to handle move history. > >>>>>>> + Copyright (C) 2024 Free Software Foundation, Inc. > >>>>>>> + Contributed by Qing Zhao <qing.z...@oracle.com> > >>>>>>> + > >>>>>>> + This file is part of GCC. > >>>>>>> + > >>>>>>> + GCC is free software; you can redistribute it and/or modify it > >>>>>>> under > >>>>>>> + the terms of the GNU General Public License as published by the > >>>>>>> Free > >>>>>>> + Software Foundation; either version 3, or (at your option) any > >>>>>>> later > >>>>>>> + version. > >>>>>>> + > >>>>>>> + GCC is distributed in the hope that it will be useful, but > >>>>>>> WITHOUT ANY > >>>>>>> + WARRANTY; without even the implied warranty of MERCHANTABILITY or > >>>>>>> + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public > >>>>>>> License > >>>>>>> + for more details. > >>>>>>> + > >>>>>>> + You should have received a copy of the GNU General Public License > >>>>>>> + along with GCC; see the file COPYING3. If not see > >>>>>>> + <http://www.gnu.org/licenses/>. */ > >>>>>>> + > >>>>>>> +#define INCLUDE_MEMORY > >>>>>>> +#include "config.h" > >>>>>>> +#include "system.h" > >>>>>>> +#include "coretypes.h" > >>>>>>> +#include "backend.h" > >>>>>>> +#include "tree.h" > >>>>>>> +#include "gimple.h" > >>>>>>> +#include "gimple-iterator.h" > >>>>>>> +#include "cfganal.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> + > >>>>>>> +/* A mapping from a gimple to a pointer to the move history of it. > >>>>>>> */ > >>>>>>> +static move_history_map_t *move_history_map; > >>>>>>> + > >>>>>>> +/* Obstack for move history. */ > >>>>>>> +static struct obstack move_history_obstack; > >>>>>>> + > >>>>>>> +/* Create a new move history. */ > >>>>>>> + > >>>>>>> +move_history_t > >>>>>>> +create_move_history (location_t condition, > >>>>>>> + bool is_true_path, > >>>>>>> + enum move_reason reason, > >>>>>>> + move_history_t prev_move) > >>>>>>> +{ > >>>>>>> + static bool initialized = false; > >>>>>>> + > >>>>>>> + if (!initialized) > >>>>>>> + { > >>>>>>> + gcc_obstack_init (&move_history_obstack); > >>>>>>> + initialized = true; > >>>>>>> + } > >>>>>>> + > >>>>>>> + move_history_t move_history > >>>>>>> + = (move_history_t) obstack_alloc (&move_history_obstack, > >>>>>>> + sizeof (struct move_history)); > >>>>>>> + move_history->condition = condition; > >>>>>>> + move_history->is_true_path = is_true_path; > >>>>>>> + move_history->reason = reason; > >>>>>>> + move_history->prev_move = prev_move; > >>>>>>> + return move_history; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Insert the move history for the gimple STMT assuming the linked > >>>>>>> list > >>>>>>> + of MV_HISTORY does not have duplications. It's the caller's > >>>>>>> + responsibility to make sure that the linked list of MV_HISTORY > >>>>>>> does > >>>>>>> + not have duplications. */ > >>>>>>> + > >>>>>>> +void > >>>>>>> +insert_move_history (gimple *stmt, move_history_t mv_history) > >>>>>>> +{ > >>>>>>> + if (!move_history_map) > >>>>>>> + move_history_map = new move_history_map_t; > >>>>>>> + > >>>>>>> + move_history_map->put (stmt, mv_history); > >>>>>>> + return; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Get the move history for the gimple STMT, return NULL when there > >>>>>>> is > >>>>>>> + no associated move history. */ > >>>>>>> + > >>>>>>> +move_history_t > >>>>>>> +get_move_history (const gimple *stmt) > >>>>>>> +{ > >>>>>>> + if (!move_history_map) > >>>>>>> + return NULL; > >>>>>>> + > >>>>>>> + if (const move_history_t *mv_history_p = move_history_map->get > >>>>>>> (stmt)) > >>>>>>> + return *mv_history_p; > >>>>>>> + > >>>>>>> + return NULL; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Remove the move history for STMT. */ > >>>>>>> + > >>>>>>> +void > >>>>>>> +remove_move_history (gimple *stmt) > >>>>>>> +{ > >>>>>>> + if (!move_history_map) > >>>>>>> + return; > >>>>>>> + move_history_map->remove (stmt); > >>>>>>> + return; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Check whether the cond_location, is_true_path and reason existed > >>>>>>> + * in the OLD_MOVE_HISTORY. */ > >>>>>>> + > >>>>>>> +static bool > >>>>>>> +is_move_history_existed (location_t cond_location, bool is_true_path, > >>>>>>> + enum move_reason reason, > >>>>>>> + move_history_t old_move_history) > >>>>>>> +{ > >>>>>>> + for (move_history_t cur_ch = old_move_history; cur_ch; > >>>>>>> + cur_ch = cur_ch->prev_move) > >>>>>>> + if ((cur_ch->condition == cond_location) > >>>>>>> + && (cur_ch->is_true_path == is_true_path) > >>>>>>> + && (cur_ch->reason == reason)) > >>>>>>> + return true; > >>>>>>> + > >>>>>>> + return false; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Set move history for the gimple STMT. Return TRUE when a new move > >>>>>>> + * history is created and inserted. Otherwise return FALSE. */ > >>>>>>> + > >>>>>>> +bool > >>>>>>> +set_move_history (gimple *stmt, location_t cond_location, > >>>>>>> + bool is_true_path, enum move_reason reason) > >>>>>>> +{ > >>>>>>> + > >>>>>>> + /* First, get the old move history associated with this STMT. */ > >>>>>>> + move_history_t old_mv_history = get_move_history (stmt); > >>>>>>> + > >>>>>>> + /* If the current move history is not in the STMT's move history > >>>>>>> linked > >>>>>>> + list yet, create the new move history, put the old_move_history > >>>>>>> as the > >>>>>>> + prev_move of it. */ > >>>>>>> + move_history_t new_mv_history = NULL; > >>>>>>> + if (!is_move_history_existed (cond_location, is_true_path, > >>>>>>> + reason, old_mv_history)) > >>>>>>> + new_mv_history > >>>>>>> + = create_move_history (cond_location, is_true_path, > >>>>>>> + reason, old_mv_history); > >>>>>>> + > >>>>>>> + /* Insert the move history into the hash map. */ > >>>>>>> + if (new_mv_history) > >>>>>>> + { > >>>>>>> + insert_move_history (stmt, new_mv_history); > >>>>>>> + return true; > >>>>>>> + } > >>>>>>> + > >>>>>>> + return false; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Reset all state for diagnostic-move-history.cc so that we can > >>>>>>> rerun the > >>>>>>> + compiler within the same process. For use by toplev::finalize. > >>>>>>> */ > >>>>>>> + > >>>>>>> +void > >>>>>>> +move_history_finalize (void) > >>>>>>> +{ > >>>>>>> + if (move_history_map) > >>>>>>> + { > >>>>>>> + delete move_history_map; > >>>>>>> + move_history_map = NULL; > >>>>>>> + } > >>>>>>> + obstack_free (&move_history_obstack, NULL); > >>>>>>> + return; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Given an edge ENTRY and whether the new code will be moved to the > >>>>>>> + destination of the edge, IS_DISTINATION, return the condition > >>>>>>> + statement in the source of the ENTRY if found. Return NULL > >>>>>>> otherwise. > >>>>>>> + > >>>>>>> + When the condition statement is found, setting IS_TRUE_PATH to > >>>>>>> true > >>>>>>> + if the destination of the edge is on the true path of the > >>>>>>> condition. > >>>>>>> + > >>>>>>> + IS_TRUE_PATH is only valid when the condition statement is found. > >>>>>>> + > >>>>>>> + source > >>>>>>> + | ENTRY > >>>>>>> + V > >>>>>>> + destination > >>>>>>> + > >>>>>>> +*/ > >>>>>>> + > >>>>>>> +static gimple * > >>>>>>> +get_cond_stmt (edge entry, bool is_destination, bool *is_true_path) > >>>>>>> +{ > >>>>>>> + /* First, get the condition statement in the source of the > >>>>>>> + edge ENTRY. */ > >>>>>>> + basic_block cond_block = entry->src; > >>>>>>> + gimple *cond_stmt = NULL; > >>>>>>> + gimple_stmt_iterator gsi; > >>>>>>> + *is_true_path = false; > >>>>>>> + > >>>>>>> + /* if the cond_block ends with a conditional statement, get it. */ > >>>>>>> + while (!cond_stmt && cond_block) > >>>>>>> + { > >>>>>>> + gsi = gsi_last_bb (cond_block); > >>>>>>> + if (!gsi_end_p (gsi) > >>>>>>> + && gsi_stmt (gsi) > >>>>>>> + && (gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)) > >>>>>>> + cond_stmt = gsi_stmt (gsi); > >>>>>>> + /* If there is no cond_stmt in the cond_block, search the > >>>>>>> single_pred > >>>>>>> + of it. */ > >>>>>>> + if (!cond_stmt && single_pred_p (cond_block)) > >>>>>>> + { > >>>>>>> + basic_block prev_cond_block = cond_block; > >>>>>>> + cond_block = single_pred (cond_block); > >>>>>>> + entry = find_edge (cond_block, prev_cond_block); > >>>>>>> + } > >>>>>>> + else > >>>>>>> + break; > >>>>>>> + } > >>>>>>> + > >>>>>>> + bool is_branch_taken = (cond_stmt && (BRANCH_EDGE (cond_block) == > >>>>>>> entry)); > >>>>>>> + *is_true_path = !(is_branch_taken ^ is_destination); > >>>>>>> + > >>>>>>> + return cond_stmt; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Set move history to the stmt based on the edge ENTRY and whether > >>>>>>> this stmt > >>>>>>> + will be in the destination of the ENTRY. > >>>>>>> + The REASON indicates what kind of transformation contributing to > >>>>>>> the > >>>>>>> + statment movement. Return TRUE when the move history has been set > >>>>>>> + successfully. */ > >>>>>>> + > >>>>>>> +bool > >>>>>>> +set_move_history_to_stmt (gimple *stmt, edge entry, > >>>>>>> + bool is_destination, enum move_reason > >>>>>>> reason) > >>>>>>> +{ > >>>>>>> + bool is_true_path = false; > >>>>>>> + gimple *cond_stmt = get_cond_stmt (entry, is_destination, > >>>>>>> &is_true_path); > >>>>>>> + > >>>>>>> + if (!cond_stmt) > >>>>>>> + return false; > >>>>>>> + > >>>>>>> + set_move_history (stmt, gimple_location (cond_stmt), > >>>>>>> + is_true_path, reason); > >>>>>>> + return true; > >>>>>>> +} > >>>>>>> + > >>>>>>> +/* Set move history to all the stmts in the basic block BB based on > >>>>>>> + the edge ENTRY and whether this basic block will be the > >>>>>>> destination > >>>>>>> + of the ENTRY. > >>>>>>> + The REASON indicates what kind of transformation contributing to > >>>>>>> the > >>>>>>> + statement move. Return TRUE when the move history has been set > >>>>>>> + successfully. */ > >>>>>>> + > >>>>>>> +bool > >>>>>>> +set_move_history_to_stmts_in_bb (basic_block bb, edge entry, > >>>>>>> + bool is_destination, > >>>>>>> + enum move_reason reason) > >>>>>>> +{ > >>>>>>> + bool is_true_path = false; > >>>>>>> + gimple_stmt_iterator gsi; > >>>>>>> + gimple *cond_stmt = get_cond_stmt (entry, is_destination, > >>>>>>> &is_true_path); > >>>>>>> + > >>>>>>> + if (!cond_stmt) > >>>>>>> + return false; > >>>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > >>>>>>> + set_move_history (gsi_stmt (gsi), gimple_location (cond_stmt), > >>>>>>> + is_true_path, reason); > >>>>>>> + > >>>>>>> + return true; > >>>>>>> +} > >>>>>>> diff --git a/gcc/diagnostic-move-history.h > >>>>>>> b/gcc/diagnostic-move-history.h > >>>>>>> new file mode 100644 > >>>>>>> index 00000000000..9a58766d544 > >>>>>>> --- /dev/null > >>>>>>> +++ b/gcc/diagnostic-move-history.h > >>>>>>> @@ -0,0 +1,92 @@ > >>>>>>> +/* Move history associated with a gimple statement to record its > >>>>>>> history > >>>>>>> + of movement due to different transformations. > >>>>>>> + The move history will be used to construct events for later > >>>>>>> diagnostic. > >>>>>>> + > >>>>>>> + Copyright (C) 2024 Free Software Foundation, Inc. > >>>>>>> + Contributed by Qing Zhao <qing.z...@oracle.com> > >>>>>>> + > >>>>>>> + This file is part of GCC. > >>>>>>> + > >>>>>>> + GCC is free software; you can redistribute it and/or modify it > >>>>>>> under > >>>>>>> + the terms of the GNU General Public License as published by the > >>>>>>> Free > >>>>>>> + Software Foundation; either version 3, or (at your option) any > >>>>>>> later > >>>>>>> + version. > >>>>>>> + > >>>>>>> + GCC is distributed in the hope that it will be useful, but > >>>>>>> WITHOUT ANY > >>>>>>> + WARRANTY; without even the implied warranty of MERCHANTABILITY or > >>>>>>> + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public > >>>>>>> License > >>>>>>> + for more details. > >>>>>>> + > >>>>>>> + You should have received a copy of the GNU General Public License > >>>>>>> + along with GCC; see the file COPYING3. If not see > >>>>>>> + <http://www.gnu.org/licenses/>. */ > >>>>>>> + > >>>>>>> +#ifndef DIAGNOSTIC_MOVE_HISTORY_H > >>>>>>> +#define DIAGNOSTIC_MOVE_HISTORY_H > >>>>>>> + > >>>>>>> +#include "hash-map.h" > >>>>>>> +#include "line-map.h" > >>>>>>> + > >>>>>>> +/* An enum for the reason why this move is made. Right now, there > >>>>>>> are > >>>>>>> + three reasons, we can add more if needed. */ > >>>>>>> +enum move_reason { > >>>>>>> + COPY_BY_THREAD_JUMP, > >>>>>>> + COPY_BY_ISOLATE_PATH, > >>>>>>> + MOVE_BY_SINK, > >>>>>>> + COPY_BY_MAX > >>>>>>> +}; > >>>>>>> + > >>>>>>> +/* This data structure records the information when a statement is > >>>>>>> + moved along control flow graph during different transformations. > >>>>>>> + Such information will be used by the later diagnostic messages > >>>>>>> + to report more contexts of the warnings or errors. */ > >>>>>>> +struct move_history { > >>>>>>> + /* The location of the condition statement that triggered the code > >>>>>>> + movement. */ > >>>>>>> + location_t condition; > >>>>>>> + > >>>>>>> + /* Whether this move is on the TRUE path of the condition. */ > >>>>>>> + bool is_true_path; > >>>>>>> + > >>>>>>> + /* The reason for the code movement. */ > >>>>>>> + enum move_reason reason; > >>>>>>> + > >>>>>>> + /* This statement itself might be a previous code movement. */ > >>>>>>> + struct move_history *prev_move; > >>>>>>> +}; > >>>>>>> + > >>>>>>> +typedef struct move_history *move_history_t; > >>>>>>> + > >>>>>>> +/* Create a new move history. */ > >>>>>>> +extern move_history_t create_move_history (location_t, bool, > >>>>>>> + enum move_reason, > >>>>>>> move_history_t); > >>>>>>> + > >>>>>>> +typedef hash_map<const gimple *, move_history_t> move_history_map_t; > >>>>>>> + > >>>>>>> +/* Get the move history for the gimple STMT, return NULL when there > >>>>>>> is > >>>>>>> + no associated move history. */ > >>>>>>> +extern move_history_t get_move_history (const gimple *); > >>>>>>> + > >>>>>>> +/* Remove the move history for STMT from the move_history_map. */ > >>>>>>> +extern void remove_move_history (gimple *); > >>>>>>> + > >>>>>>> +/* Set move history for the gimple STMT. */ > >>>>>>> +extern bool set_move_history (gimple *, location_t, > >>>>>>> + bool, enum move_reason); > >>>>>>> + > >>>>>>> +/* Reset all state for diagnostic-move-history.cc so that we can > >>>>>>> rerun the > >>>>>>> + compiler within the same process. For use by toplev::finalize. > >>>>>>> */ > >>>>>>> +extern void move_history_finalize (void); > >>>>>>> + > >>>>>>> +/* Set move history to the stmt based on the edge ENTRY and whether > >>>>>>> this stmt > >>>>>>> + will be in the destination of the ENTRY. */ > >>>>>>> +extern bool set_move_history_to_stmt (gimple *, edge, > >>>>>>> + bool, enum move_reason); > >>>>>>> + > >>>>>>> +/* Set move history to all the stmts in the basic block based on > >>>>>>> + the entry edge and whether this basic block will be the > >>>>>>> destination > >>>>>>> + of the entry edge. */ > >>>>>>> +extern bool set_move_history_to_stmts_in_bb (basic_block, edge, > >>>>>>> + bool, enum move_reason); > >>>>>>> + > >>>>>>> +#endif // DIAGNOSTIC_MOVE_HISTORY_H > >>>>>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > >>>>>>> index ee7180110e1..ca04f35d6a1 100644 > >>>>>>> --- a/gcc/doc/invoke.texi > >>>>>>> +++ b/gcc/doc/invoke.texi > >>>>>>> @@ -332,6 +332,7 @@ Objective-C and Objective-C++ Dialects}. > >>>>>>> -fdiagnostics-column-origin=@var{origin} > >>>>>>> -fdiagnostics-escape-format=@r{[}unicode@r{|}bytes@r{]} > >>>>>>> -fdiagnostics-text-art-charset=@r{[}none@r{|}ascii@r{|}unicode@r{|}emoji@r{]}} > >>>>>>> +-fdiagnostics-details > >>>>>>> > >>>>>>> @item Warning Options > >>>>>>> @xref{Warning Options,,Options to Request or Suppress Warnings}. > >>>>>>> @@ -5687,6 +5688,16 @@ left margin. > >>>>>>> This option controls the minimum width of the left margin printed by > >>>>>>> @option{-fdiagnostics-show-line-numbers}. It defaults to 6. > >>>>>>> > >>>>>>> +@opindex fdiagnostics-details > >>>>>>> +@item -fdiagnostics-details > >>>>>>> +With this option, the compiler collects more context information for > >>>>>>> +diagnostics and emits them to the users to provide more hints on how > >>>>>>> +the diagnostics come from, at the cost of a slightly slower > >>>>>>> compilation. > >>>>>>> +Currently, The list of the impacted warning options includes: > >>>>>>> +@option{-Warray-bounds}, @option{-Wstringop-overflow}, > >>>>>>> +@option{-Wstringop-overread}, and @option{-Wstringop-truncation}. > >>>>>>> +More warning options might be added to this list in future releases. > >>>>>>> + > >>>>>>> @opindex fdiagnostics-parseable-fixits > >>>>>>> @item -fdiagnostics-parseable-fixits > >>>>>>> Emit fix-it hints in a machine-parseable format, suitable for > >>>>>>> consumption > >>>>>>> diff --git a/gcc/gimple-iterator.cc b/gcc/gimple-iterator.cc > >>>>>>> index 3af672bf0b9..f1d7ca9c78e 100644 > >>>>>>> --- a/gcc/gimple-iterator.cc > >>>>>>> +++ b/gcc/gimple-iterator.cc > >>>>>>> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "tree-ssa.h" > >>>>>>> #include "value-prof.h" > >>>>>>> #include "gimplify.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> > >>>>>>> > >>>>>>> /* Mark the statement STMT as modified, and update it. */ > >>>>>>> @@ -581,6 +582,8 @@ gsi_remove (gimple_stmt_iterator *i, bool > >>>>>>> remove_permanently) > >>>>>>> cfun->debug_marker_count--; > >>>>>>> require_eh_edge_purge = remove_stmt_from_eh_lp (stmt); > >>>>>>> gimple_remove_stmt_histograms (cfun, stmt); > >>>>>>> + if (get_move_history (stmt) != NULL) > >>>>>>> + remove_move_history (stmt); > >>>>>>> } > >>>>>>> > >>>>>>> /* Update the iterator and re-wire the links in I->SEQ. */ > >>>>>>> diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc > >>>>>>> index 4e20b4cc371..811164565bb 100644 > >>>>>>> --- a/gcc/gimple-pretty-print.cc > >>>>>>> +++ b/gcc/gimple-pretty-print.cc > >>>>>>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "asan.h" > >>>>>>> #include "cfgloop.h" > >>>>>>> #include "gimple-range.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> > >>>>>>> /* Disable warnings about quoting issues in the pp_xxx calls below > >>>>>>> that (intentionally) don't follow GCC diagnostic conventions. */ > >>>>>>> @@ -2734,6 +2735,9 @@ pp_gimple_stmt_1 (pretty_printer *pp, const > >>>>>>> gimple *gs, int spc, > >>>>>>> && (flags & TDF_ALIAS)) > >>>>>>> dump_ssaname_info (pp, gimple_get_lhs (gs), spc); > >>>>>>> > >>>>>>> + if (get_move_history (gs)) > >>>>>>> + pp_printf (pp, "[MV_H] "); > >>>>>>> + > >>>>>>> switch (gimple_code (gs)) > >>>>>>> { > >>>>>>> case GIMPLE_ASM: > >>>>>>> diff --git a/gcc/gimple-ssa-isolate-paths.cc > >>>>>>> b/gcc/gimple-ssa-isolate-paths.cc > >>>>>>> index ca1daf1d168..14c86590b17 100644 > >>>>>>> --- a/gcc/gimple-ssa-isolate-paths.cc > >>>>>>> +++ b/gcc/gimple-ssa-isolate-paths.cc > >>>>>>> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "tree-cfg.h" > >>>>>>> #include "cfganal.h" > >>>>>>> #include "intl.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> > >>>>>>> > >>>>>>> static bool cfg_altered; > >>>>>>> @@ -170,6 +171,16 @@ isolate_path (basic_block bb, basic_block > >>>>>>> duplicate, > >>>>>>> } > >>>>>>> bb->count -= count; > >>>>>>> > >>>>>>> + /* Set the move history for all the stmts in both original and > >>>>>>> copied > >>>>>>> + basic blocks. The duplicated block will be the destination of > >>>>>>> the > >>>>>>> + incoming edge. */ > >>>>>>> + if (flag_diagnostics_details) > >>>>>>> + { > >>>>>>> + set_move_history_to_stmts_in_bb (bb, e, false, > >>>>>>> COPY_BY_ISOLATE_PATH); > >>>>>>> + set_move_history_to_stmts_in_bb (duplicate, e, > >>>>>>> + true, COPY_BY_ISOLATE_PATH); > >>>>>>> + } > >>>>>>> + > >>>>>>> /* Complete the isolation step by redirecting E to reach DUPLICATE. > >>>>>>> */ > >>>>>>> e2 = redirect_edge_and_branch (e, duplicate); > >>>>>>> if (e2) > >>>>>>> diff --git a/gcc/testsuite/gcc.dg/pr117375.c > >>>>>>> b/gcc/testsuite/gcc.dg/pr117375.c > >>>>>>> new file mode 100644 > >>>>>>> index 00000000000..beb685afc3b > >>>>>>> --- /dev/null > >>>>>>> +++ b/gcc/testsuite/gcc.dg/pr117375.c > >>>>>>> @@ -0,0 +1,13 @@ > >>>>>>> +/* PR middle-end/117375 ICE with -fdiagnostics-details patch in sink > >>>>>>> pass. */ > >>>>>>> +/* { dg-do compile } > >>>>>>> + { dg-options "-O2 -Wall -fdiagnostics-details" } */ > >>>>>>> + > >>>>>>> +int st, st_0; > >>>>>>> +int nbFilledBytes, max; > >>>>>>> +void ec_enc_shrink(); > >>>>>>> +void max_allowed() { > >>>>>>> + int nbAvailableBytes = nbFilledBytes; > >>>>>>> + if (st && st_0) > >>>>>>> + if (max < nbAvailableBytes) > >>>>>>> + ec_enc_shrink(); > >>>>>>> +} > >>>>>>> diff --git a/gcc/toplev.cc b/gcc/toplev.cc > >>>>>>> index 7e457b5168b..7652c222676 100644 > >>>>>>> --- a/gcc/toplev.cc > >>>>>>> +++ b/gcc/toplev.cc > >>>>>>> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "coverage.h" > >>>>>>> #include "diagnostic.h" > >>>>>>> #include "pretty-print-urlifier.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> #include "varasm.h" > >>>>>>> #include "tree-inline.h" > >>>>>>> #include "realmpfr.h" /* For GMP/MPFR/MPC versions, in > >>>>>>> print_version. */ > >>>>>>> @@ -2436,6 +2437,8 @@ toplev::finalize (void) > >>>>>>> reginfo_cc_finalize (); > >>>>>>> varasm_cc_finalize (); > >>>>>>> > >>>>>>> + move_history_finalize (); > >>>>>>> + > >>>>>>> /* save_decoded_options uses opts_obstack, so these must > >>>>>>> be cleaned up together. */ > >>>>>>> obstack_free (&opts_obstack, NULL); > >>>>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc > >>>>>>> index 959e0d5c6be..0b3441e894c 100644 > >>>>>>> --- a/gcc/tree-ssa-sink.cc > >>>>>>> +++ b/gcc/tree-ssa-sink.cc > >>>>>>> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "tree-eh.h" > >>>>>>> #include "tree-ssa-live.h" > >>>>>>> #include "tree-dfa.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> > >>>>>>> /* TODO: > >>>>>>> 1. Sinking store only using scalar promotion (IE without moving the > >>>>>>> RHS): > >>>>>>> @@ -655,6 +656,49 @@ sink_common_stores_to_bb (basic_block bb) > >>>>>>> return todo; > >>>>>>> } > >>>>>>> > >>>>>>> +/* Get the edge chain from FROM_BB to TO_BB, FROM_BB dominates TO_BB. > >>>>>>> + B2 > >>>>>>> + / \ > >>>>>>> + V \ > >>>>>>> + B3 \ > >>>>>>> + / \ \ > >>>>>>> + V \---->V > >>>>>>> + B4 B7 > >>>>>>> + > >>>>>>> + In the above CFG, FROM_BB is B2, TO_BB is B4. This routine > >>>>>>> + will locate two edges, B2->B3, and B3->B4 and return a vector > >>>>>>> + with these two edges. > >>>>>>> + This routine will return an empty vector if the control flow > >>>>>>> + edge chain from FROM_BB to TO_BB is too complicate. */ > >>>>>>> + > >>>>>>> +static auto_vec<edge> > >>>>>>> +get_edge_chain (basic_block from_bb, basic_block to_bb) > >>>>>>> +{ > >>>>>>> + auto_vec<edge> rev_edge_chain, edge_chain; > >>>>>>> + basic_block imm_dom_bb; > >>>>>>> + edge e; > >>>>>>> + unsigned int i; > >>>>>>> + /* Backtracing from TO_BB to FROM_BB through the dominator tree. > >>>>>>> */ > >>>>>>> + do > >>>>>>> + { > >>>>>>> + imm_dom_bb = get_immediate_dominator (CDI_DOMINATORS, to_bb); > >>>>>>> + if (!imm_dom_bb) > >>>>>>> + return edge_chain; > >>>>>>> + e = find_edge (imm_dom_bb, to_bb); > >>>>>>> + if (!e) > >>>>>>> + return edge_chain; > >>>>>>> + gcc_assert (e); > >>>>>>> + rev_edge_chain.safe_push (e); > >>>>>>> + to_bb = imm_dom_bb; > >>>>>>> + } > >>>>>>> + while (imm_dom_bb != from_bb); > >>>>>>> + > >>>>>>> + /* The order of the edges need to be reverted. */ > >>>>>>> + FOR_EACH_VEC_ELT_REVERSE (rev_edge_chain, i, e) > >>>>>>> + edge_chain.safe_push (e); > >>>>>>> + return edge_chain; > >>>>>>> +} > >>>>>>> + > >>>>>>> /* Perform code sinking on BB */ > >>>>>>> > >>>>>>> static unsigned > >>>>>>> @@ -712,6 +756,19 @@ sink_code_in_bb (basic_block bb, > >>>>>>> virtual_operand_live &vop_live) > >>>>>>> bb->index, (gsi_bb (togsi))->index); > >>>>>>> } > >>>>>>> > >>>>>>> + /* Set the move history for the stmt that is sinked from BB to > >>>>>>> + gsi_bb (togsi). This stmt is on the path from BB to > >>>>>>> + gsi_bb (togsi). */ > >>>>>>> + if (flag_diagnostics_details) > >>>>>>> + { > >>>>>>> + /* BB might not be the immediate predecessor of gsi_bb > >>>>>>> (togsi), > >>>>>>> + get the edge chain from BB to gsi_bb (togsi). */ > >>>>>>> + auto_vec<edge> edge_chain = get_edge_chain (bb, gsi_bb > >>>>>>> (togsi)); > >>>>>>> + > >>>>>>> + for (edge entry : edge_chain) > >>>>>>> + set_move_history_to_stmt (stmt, entry, true, > >>>>>>> MOVE_BY_SINK); > >>>>>>> + } > >>>>>>> + > >>>>>>> /* Update virtual operands of statements in the path we > >>>>>>> do not sink to. */ > >>>>>>> if (gimple_vdef (stmt)) > >>>>>>> diff --git a/gcc/tree-ssa-threadupdate.cc > >>>>>>> b/gcc/tree-ssa-threadupdate.cc > >>>>>>> index 4e5c7566857..789ca1d9422 100644 > >>>>>>> --- a/gcc/tree-ssa-threadupdate.cc > >>>>>>> +++ b/gcc/tree-ssa-threadupdate.cc > >>>>>>> @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see > >>>>>>> #include "tree-cfg.h" > >>>>>>> #include "tree-vectorizer.h" > >>>>>>> #include "tree-pass.h" > >>>>>>> +#include "diagnostic-move-history.h" > >>>>>>> > >>>>>>> /* Given a block B, update the CFG and SSA graph to reflect > >>>>>>> redirecting > >>>>>>> one or more in-edges to B to instead reach the destination of an > >>>>>>> @@ -1341,6 +1342,17 @@ ssa_redirect_edges (struct redirection_data > >>>>>>> **slot, > >>>>>>> { > >>>>>>> edge e2; > >>>>>>> > >>>>>>> + /* Set the move history for all the stmts in both original > >>>>>>> and copied > >>>>>>> + basic blocks. The duplicated block will be the > >>>>>>> destination of the > >>>>>>> + incoming edge. */ > >>>>>>> + if (flag_diagnostics_details) > >>>>>>> + { > >>>>>>> + set_move_history_to_stmts_in_bb (e->dest, e, false, > >>>>>>> + COPY_BY_THREAD_JUMP); > >>>>>>> + set_move_history_to_stmts_in_bb (rd->dup_blocks[0], e, > >>>>>>> + true, > >>>>>>> COPY_BY_THREAD_JUMP); > >>>>>>> + } > >>>>>>> + > >>>>>>> if (dump_file && (dump_flags & TDF_DETAILS)) > >>>>>>> fprintf (dump_file, " Threaded jump %d --> %d to %d\n", > >>>>>>> e->src->index, e->dest->index, > >>>>>>> rd->dup_blocks[0]->index); > >>>>>>> @@ -2419,6 +2431,19 @@ back_jt_path_registry::duplicate_thread_path > >>>>>>> (edge entry, > >>>>>>> copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop, > >>>>>>> split_edge_bb_loc (entry), false); > >>>>>>> > >>>>>>> + /* Set the move history for all the stmts in both original and > >>>>>>> copied > >>>>>>> + basic blocks. The copied regions will be the destination of the > >>>>>>> + entry edge. */ > >>>>>>> + for (i = 0; i < n_region; i++) > >>>>>>> + if (flag_diagnostics_details) > >>>>>>> + { > >>>>>>> + set_move_history_to_stmts_in_bb (region[i], entry, false, > >>>>>>> + COPY_BY_THREAD_JUMP); > >>>>>>> + set_move_history_to_stmts_in_bb (region_copy[i], entry, > >>>>>>> + true, COPY_BY_THREAD_JUMP); > >>>>>>> + } > >>>>>>> + > >>>>>>> + > >>>>>>> /* Fix up: copy_bbs redirects all edges pointing to copied blocks. > >>>>>>> The > >>>>>>> following code ensures that all the edges exiting the jump-thread > >>>>>>> path are > >>>>>>> redirected back to the original code: these edges are exceptions > >>>>>>> -- > >>>>>>> 2.31.1 > >