basic_block flags, BB_VISITED
Hi! After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your information), I've been observing all kinds of OpenACC offloading failures. I now figured out what's going on. The "evrp" pass uses basic_block's BB_VISITED flag. It first clears these all, gcc/tree-vrp.c:execute_early_vrp: FOR_EACH_BB_FN (bb, cfun) { bb->flags &= ~BB_VISITED; ..., then does its processing, and at the end, clears these again: FOR_EACH_BB_FN (bb, cfun) bb->flags &= ~BB_VISITED; I note that this looks slightly different from what gcc/cfg.c:clear_bb_flags whould be doing, which works from ENTRY_BLOCK_PTR_FOR_FN onwards: /* Clear all basic block flags that do not have to be preserved. */ void clear_bb_flags (void) { basic_block bb; FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) bb->flags &= BB_FLAGS_TO_PRESERVE; } In the specific case that I've been looking at, "evrp" processing doesn't change the code other than for shifting some IDs because of adding a (dummy one-argument) PHI node. And the problem now exactly is that the ENTRY_BLOCK_PTR_FOR_FN's BB_VISITED flag is still set, and so gcc/omp-low.c:oacc_loop_discover_walk will bail out without doing any processing: /* DFS walk of basic blocks BB onwards, creating OpenACC loop structures as we go. By construction these loops are properly nested. */ static void oacc_loop_discover_walk (oacc_loop *loop, basic_block bb) { [...] if (bb->flags & BB_VISITED) return; follow: bb->flags |= BB_VISITED; [...] /* Discover the OpenACC loops marked up by HEAD and TAIL markers for the current function. */ static oacc_loop * oacc_loop_discovery () { basic_block bb; oacc_loop *top = new_oacc_loop_outer (current_function_decl); oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); [...] Now, gcc/cfg-flags.def says: [Most of the basic block] flags may be cleared by clear_bb_flags(). It is generally a bad idea to rely on any flags being up-to-date. */ [...] /* A general visited flag for passes to use. */ DEF_BASIC_BLOCK_FLAG(VISITED, 13) The BB_VISITED flag to me seems to always be of very local use only. Should we be more strict and strengthen the above "may be"/"bad idea" wording? Does the "evrp" pass need to get changes applied, to properly clear BB_VISITED (ENTRY_BLOCK_PTR_FOR_FN, in particular)? Or, in contrast, as we're not to "rely on any flags being up-to-date", should the BB_VISITED clearing at the end of gcc/tree-vrp.c:execute_early_vrp be removed in fact? >From (really just) a quick look, I can't tell the exact policy that other GCC passes are applying/using regarding the basic_block flags... The following patch does cure the testsuite regressions: --- gcc/omp-low.c +++ gcc/omp-low.c @@ -19342,6 +19342,7 @@ oacc_loop_discovery () basic_block bb; oacc_loop *top = new_oacc_loop_outer (current_function_decl); + clear_bb_flags (); oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); /* The siblings were constructed in reverse order, reverse them so Is something like that what we should do? Should clear_bb_flags be called here, or early in gcc/omp-low.c:execute_oacc_device_lower (like some other GCC passes seem to be doing)? Grüße Thomas
Re: basic_block flags, BB_VISITED
On 10/14/2016 10:01 AM, Thomas Schwinge wrote: After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your information), I've been observing all kinds of OpenACC offloading failures. I now figured out what's going on. The "evrp" pass uses basic_block's BB_VISITED flag. It first clears these all, gcc/tree-vrp.c:execute_early_vrp: FOR_EACH_BB_FN (bb, cfun) { bb->flags &= ~BB_VISITED; ..., then does its processing, and at the end, clears these again: FOR_EACH_BB_FN (bb, cfun) bb->flags &= ~BB_VISITED; I note that this looks slightly different from what gcc/cfg.c:clear_bb_flags whould be doing, which works from ENTRY_BLOCK_PTR_FOR_FN onwards: So maybe it should just call clear_bb_flags instead of doing the loop itself? Ok if that works. Bernd
Re: basic_block flags, BB_VISITED
On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt wrote: > On 10/14/2016 10:01 AM, Thomas Schwinge wrote: >> >> After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your >> information), I've been observing all kinds of OpenACC offloading >> failures. I now figured out what's going on. >> >> The "evrp" pass uses basic_block's BB_VISITED flag. It first clears >> these all, gcc/tree-vrp.c:execute_early_vrp: >> >> FOR_EACH_BB_FN (bb, cfun) >> { >> bb->flags &= ~BB_VISITED; >> >> ..., then does its processing, and at the end, clears these again: >> >> FOR_EACH_BB_FN (bb, cfun) >> bb->flags &= ~BB_VISITED; >> >> I note that this looks slightly different from what >> gcc/cfg.c:clear_bb_flags whould be doing, which works from >> ENTRY_BLOCK_PTR_FOR_FN onwards: > > > So maybe it should just call clear_bb_flags instead of doing the loop > itself? Ok if that works. That doesn't generally work, it clears too many flags (it was appearantly designed for RTL). Richard. > > Bernd >
Re: basic_block flags, BB_VISITED
On Fri, Oct 14, 2016 at 10:01 AM, Thomas Schwinge wrote: > Hi! > > After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your > information), I've been observing all kinds of OpenACC offloading > failures. I now figured out what's going on. > > The "evrp" pass uses basic_block's BB_VISITED flag. It first clears > these all, gcc/tree-vrp.c:execute_early_vrp: > > FOR_EACH_BB_FN (bb, cfun) > { > bb->flags &= ~BB_VISITED; > > ..., then does its processing, and at the end, clears these again: > > FOR_EACH_BB_FN (bb, cfun) > bb->flags &= ~BB_VISITED; > > I note that this looks slightly different from what > gcc/cfg.c:clear_bb_flags whould be doing, which works from > ENTRY_BLOCK_PTR_FOR_FN onwards: > > /* Clear all basic block flags that do not have to be preserved. */ > void > clear_bb_flags (void) > { > basic_block bb; > > FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) > bb->flags &= BB_FLAGS_TO_PRESERVE; > } > > In the specific case that I've been looking at, "evrp" processing doesn't > change the code other than for shifting some IDs because of adding a > (dummy one-argument) PHI node. And the problem now exactly is that the > ENTRY_BLOCK_PTR_FOR_FN's BB_VISITED flag is still set, and so > gcc/omp-low.c:oacc_loop_discover_walk will bail out without doing any > processing: The BB_VISITED flag has indetermined state at the beginning of a pass. You have to ensure it is cleared yourself. EVRP clearing it (similar to tree-ssa-propagate.c) is because IRA has the same issue and crashes on "stale" BB_VISTED flags. Richard. > /* DFS walk of basic blocks BB onwards, creating OpenACC loop >structures as we go. By construction these loops are properly >nested. */ > > static void > oacc_loop_discover_walk (oacc_loop *loop, basic_block bb) > { > [...] > if (bb->flags & BB_VISITED) > return; > > follow: > bb->flags |= BB_VISITED; > [...] > > /* Discover the OpenACC loops marked up by HEAD and TAIL markers for >the current function. */ > > static oacc_loop * > oacc_loop_discovery () > { > basic_block bb; > > oacc_loop *top = new_oacc_loop_outer (current_function_decl); > oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); > [...] > > Now, gcc/cfg-flags.def says: > >[Most of the basic block] flags may be cleared by clear_bb_flags(). > It is generally >a bad idea to rely on any flags being up-to-date. */ > [...] > /* A general visited flag for passes to use. */ > DEF_BASIC_BLOCK_FLAG(VISITED, 13) > > The BB_VISITED flag to me seems to always be of very local use only. > Should we be more strict and strengthen the above "may be"/"bad idea" > wording? > > Does the "evrp" pass need to get changes applied, to properly clear > BB_VISITED (ENTRY_BLOCK_PTR_FOR_FN, in particular)? Or, in contrast, as > we're not to "rely on any flags being up-to-date", should the BB_VISITED > clearing at the end of gcc/tree-vrp.c:execute_early_vrp be removed in > fact? > > From (really just) a quick look, I can't tell the exact policy that other > GCC passes are applying/using regarding the basic_block flags... > > The following patch does cure the testsuite regressions: > > --- gcc/omp-low.c > +++ gcc/omp-low.c > @@ -19342,6 +19342,7 @@ oacc_loop_discovery () >basic_block bb; > >oacc_loop *top = new_oacc_loop_outer (current_function_decl); > + clear_bb_flags (); >oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); > >/* The siblings were constructed in reverse order, reverse them so > > Is something like that what we should do? Should clear_bb_flags be > called here, or early in gcc/omp-low.c:execute_oacc_device_lower (like > some other GCC passes seem to be doing)? > > > Grüße > Thomas
Re: basic_block flags, BB_VISITED
On 10/14/2016 11:26 AM, Richard Biener wrote: On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt wrote: So maybe it should just call clear_bb_flags instead of doing the loop itself? Ok if that works. That doesn't generally work, it clears too many flags (it was appearantly designed for RTL). Or maybe it could be argued that more tree-specific flags ought to be in BB_FLAGS_TO_PRESERVE, or that it would make more sense to define a more specific BB_FLAGS_TO_KEEP. So maybe just change the loop at the end of evrp, it clearly wants to clean up after itself so just make it do it reliably. Bernd
Re: basic_block flags, BB_VISITED
On 10/14/16 05:28, Richard Biener wrote: The BB_VISITED flag has indetermined state at the beginning of a pass. You have to ensure it is cleared yourself. In that case the openacc (&nvptx?) passes should be modified to clear the flags at their start, rather than at their end. nathan
Re: basic_block flags, BB_VISITED
On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell wrote: > On 10/14/16 05:28, Richard Biener wrote: > >> The BB_VISITED flag has indetermined state at the beginning of a pass. >> You have to ensure it is cleared yourself. > > > In that case the openacc (&nvptx?) passes should be modified to clear the > flags at their start, rather than at their end. Yes. But as I said, I ran into IRA ICEs (somewhere in the testsuite) when not cleaning up after tree-ssa-propagate.c. So somebody has to fix IRA first. Richard. > nathan
Re: basic_block flags, BB_VISITED
On Fri, Oct 14, 2016 at 12:57 PM, Bernd Schmidt wrote: > On 10/14/2016 11:26 AM, Richard Biener wrote: >> >> On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt >> wrote: >>> >>> So maybe it should just call clear_bb_flags instead of doing the loop >>> itself? Ok if that works. >> >> >> That doesn't generally work, it clears too many flags (it was appearantly >> designed for RTL). > > > Or maybe it could be argued that more tree-specific flags ought to be in > BB_FLAGS_TO_PRESERVE, or that it would make more sense to define a more > specific BB_FLAGS_TO_KEEP. > > So maybe just change the loop at the end of evrp, it clearly wants to clean > up after itself so just make it do it reliably. I thought about adding an argument to clear_bb_flags telling it the flags to clear. Richard. > > Bernd >
DWARF Version 5 Public Review Draft Released
DWARF Version 5 Public Review Draft Released October 14, 2016 The DWARF Debugging Information Format Committee has released the public review draft of Version 5 of the DWARF Debugging Information Format standard. The DWARF debugging format is used to communicate debugging information between a compiler and debugger to make it easier for programmers to develop, test, and debug programs. DWARF is used by a wide range of compilers and debuggers, both proprietary and open source, to support debugging of Ada, C, C++, Cobol, FORTRAN, Java, and other programming languages. DWARF V5 adds support for new languages like Rust, Swift, Ocaml, Go, and Haskell, as well as support for new features in the older languages. DWARF can be used with many processor architectures, from 8-bit to 64-bit. DWARF is the standard debugging format for Linux and several versions of Unix and is widely used with embedded processors. DWARF is designed to be extended easily to support new languages and new architectures. The DWARF Version 5 Standard has been in development for six years. DWARF Committee members include representatives from over a dozen major companies with extensive experience in compiler and debugger development. Version 5 incorporates improvements in many areas: better data compression, separation of debugging data from executable files, improved description of macros and source files, faster searching for symbols, improved debugging optimized code, as well as numerous improvements in functionality and performance. The public review draft of DWARF Version 5 standard can be downloaded without charge from the DWARF website (http://dwarfstd.org). The DWARF Committee will accept public comments on DWARF Version 5 until November 30, after which a finalized version will be published. Additional information about DWARF, including how to subscribe to the DWARF mailing list, can also be found on the website. Questions about the DWARF Debugging Information Format or the DWARF Committee can be directed to the DWARF Committee Chair, Michael Eager at i...@dwarfstd.org. -- Michael Eager, Chair, DWARF Debugging Format Standards Committee i...@dwarfstd.org 650-325-8077
Question about sibling call epilogues & registers
I'm working on my -foutline-masbi-xlouges optimization (targeting 64 bit Wine) and I've run into a snag with sibling calls. When -foutline-masbi-xlouges is disabled, the sibling call generates just fine. But when enabled the call to df_analyze() in the peephole2 pass deletes the insn that initializes the register that the sibling call uses. I'm hoping that somebody might be able to look at my epilogue code and identify what may cause this. My optimization is generating insns 146, 147, 148 and 149. (Actually, 146 and 149 can be merged, I just haven't gotten to it yet). The insn that's getting deleted is 75, where RCX is set. I'm starting to think that maybe df_analyze() presumes that my call (to the stub) is invalidating RCX, although it does not. Below is the RTL dump from the compgotos pass. Also, I'm not sure why all of my REG_CFA_RESTORE notes aren't showing up (xmm10-15 are missing). (code_label 91 70 73 11 903 "" [6 uses]) (note 73 91 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK) (debug_insn 74 73 75 11 (var_location:DI data (debug_expr:DI D#64)) -1 (nil)) (insn 75 74 150 11 (set (reg:DI 2 cx) (symbol_ref:DI ("win_data_section") [flags 0x2] 0x7f33746fce10 win_data_section>)) /home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1826 89 {*movdi_internal} (expr_list:REG_UNUSED (reg:DI 2 cx) (nil))) (note 150 75 146 11 NOTE_INSN_EPILOGUE_BEG) (insn/f 146 150 147 11 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 48 [0x30]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) /home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1 (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 48 [0x30]))) (nil (insn 147 146 148 11 (set (reg:DI 4 si) (plus:DI (reg/f:DI 7 sp) (const_int 96 [0x60]))) /home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1 (nil)) (call_insn/f 148 147 149 11 (parallel [ (call (mem:QI (symbol_ref:DI ("__msabi_restore_15") [flags 0x1]) [0 S1 A8]) (const_int 0 [0])) (clobber (reg:CC 17 flags)) (set (reg:V4SF 52 xmm15) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -88 [0xffa8])) [0 S16 A8])) (set (reg:V4SF 51 xmm14) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -72 [0xffb8])) [0 S16 A8])) (set (reg:V4SF 50 xmm13) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -56 [0xffc8])) [0 S16 A8])) (set (reg:V4SF 49 xmm12) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -40 [0xffd8])) [0 S16 A8])) (set (reg:V4SF 48 xmm11) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -24 [0xffe8])) [0 S16 A8])) (set (reg:V4SF 47 xmm10) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int -8 [0xfff8])) [0 S16 A8])) (set (reg:V4SF 46 xmm9) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int 8 [0x8])) [0 S16 A8])) (set (reg:V4SF 45 xmm8) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int 24 [0x18])) [0 S16 A8])) (set (reg:V4SF 28 xmm7) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int 40 [0x28])) [0 S16 A8])) (set (reg:V4SF 27 xmm6) (mem/c:V4SF (plus:DI (reg:DI 4 si) (const_int 56 [0x38])) [0 S16 A8])) (set (reg:DI 5 di) (mem/c:DI (plus:DI (reg:DI 4 si) (const_int 72 [0x48])) [0 S8 A8])) (set (reg:DI 3 bx) (mem/c:DI (plus:DI (reg:DI 4 si) (const_int 80 [0x50])) [0 S8 A8])) (set (reg:DI 6 bp) (mem/c:DI (plus:DI (reg:DI 4 si) (const_int 88 [0x58])) [0 S8 A8])) (set (reg:DI 41 r12) (mem/c:DI (plus:DI (reg:DI 4 si) (const_int 96 [0x60])) [0 S8 A8])) (set (reg:DI 4 si) (mem/c:DI (plus:DI (reg:DI 4 si) (const_int 64 [0x40])) [0 S8 A8])) ]) /home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1 (expr_list:REG_CFA_RESTORE (reg:DI 4 si) (expr_list:REG_CFA_RESTORE (reg:DI 41 r12) (expr_list:REG_CFA_RESTORE (reg:DI 6 bp) (expr_list:REG_CFA_RESTORE (reg:DI 3 bx) (expr_list:REG_CFA_RESTORE (reg:DI 5 di) (expr_list