basic_block flags, BB_VISITED

2016-10-14 Thread Thomas Schwinge
Hi!

After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your
information), I've been observing all kinds of OpenACC offloading
failures.  I now figured out what's going on.

The "evrp" pass uses basic_block's BB_VISITED flag.  It first clears
these all, gcc/tree-vrp.c:execute_early_vrp:

  FOR_EACH_BB_FN (bb, cfun)
{
  bb->flags &= ~BB_VISITED;

..., then does its processing, and at the end, clears these again:

  FOR_EACH_BB_FN (bb, cfun)
bb->flags &= ~BB_VISITED;

I note that this looks slightly different from what
gcc/cfg.c:clear_bb_flags whould be doing, which works from
ENTRY_BLOCK_PTR_FOR_FN onwards:

/* Clear all basic block flags that do not have to be preserved.  */
void
clear_bb_flags (void)
{
  basic_block bb;

  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
bb->flags &= BB_FLAGS_TO_PRESERVE;
}

In the specific case that I've been looking at, "evrp" processing doesn't
change the code other than for shifting some IDs because of adding a
(dummy one-argument) PHI node.  And the problem now exactly is that the
ENTRY_BLOCK_PTR_FOR_FN's BB_VISITED flag is still set, and so
gcc/omp-low.c:oacc_loop_discover_walk will bail out without doing any
processing:

/* DFS walk of basic blocks BB onwards, creating OpenACC loop
   structures as we go.  By construction these loops are properly
   nested.  */

static void
oacc_loop_discover_walk (oacc_loop *loop, basic_block bb)
{
[...]
  if (bb->flags & BB_VISITED)
return;

 follow:
  bb->flags |= BB_VISITED;
[...]

/* Discover the OpenACC loops marked up by HEAD and TAIL markers for
   the current function.  */

static oacc_loop *
oacc_loop_discovery ()
{
  basic_block bb;
  
  oacc_loop *top = new_oacc_loop_outer (current_function_decl);
  oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
[...]

Now, gcc/cfg-flags.def says:

   [Most of the basic block] flags may be cleared by clear_bb_flags().  It 
is generally
   a bad idea to rely on any flags being up-to-date.  */
[...]
/* A general visited flag for passes to use.  */
DEF_BASIC_BLOCK_FLAG(VISITED, 13)

The BB_VISITED flag to me seems to always be of very local use only.
Should we be more strict and strengthen the above "may be"/"bad idea"
wording?

Does the "evrp" pass need to get changes applied, to properly clear
BB_VISITED (ENTRY_BLOCK_PTR_FOR_FN, in particular)?  Or, in contrast, as
we're not to "rely on any flags being up-to-date", should the BB_VISITED
clearing at the end of gcc/tree-vrp.c:execute_early_vrp be removed in
fact?

>From (really just) a quick look, I can't tell the exact policy that other
GCC passes are applying/using regarding the basic_block flags...

The following patch does cure the testsuite regressions:

--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19342,6 +19342,7 @@ oacc_loop_discovery ()
   basic_block bb;
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
+  clear_bb_flags ();
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
   /* The siblings were constructed in reverse order, reverse them so

Is something like that what we should do?  Should clear_bb_flags be
called here, or early in gcc/omp-low.c:execute_oacc_device_lower (like
some other GCC passes seem to be doing)?


Grüße
 Thomas


Re: basic_block flags, BB_VISITED

2016-10-14 Thread Bernd Schmidt

On 10/14/2016 10:01 AM, Thomas Schwinge wrote:

After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your
information), I've been observing all kinds of OpenACC offloading
failures.  I now figured out what's going on.

The "evrp" pass uses basic_block's BB_VISITED flag.  It first clears
these all, gcc/tree-vrp.c:execute_early_vrp:

  FOR_EACH_BB_FN (bb, cfun)
{
  bb->flags &= ~BB_VISITED;

..., then does its processing, and at the end, clears these again:

  FOR_EACH_BB_FN (bb, cfun)
bb->flags &= ~BB_VISITED;

I note that this looks slightly different from what
gcc/cfg.c:clear_bb_flags whould be doing, which works from
ENTRY_BLOCK_PTR_FOR_FN onwards:


So maybe it should just call clear_bb_flags instead of doing the loop 
itself? Ok if that works.



Bernd



Re: basic_block flags, BB_VISITED

2016-10-14 Thread Richard Biener
On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt  wrote:
> On 10/14/2016 10:01 AM, Thomas Schwinge wrote:
>>
>> After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your
>> information), I've been observing all kinds of OpenACC offloading
>> failures.  I now figured out what's going on.
>>
>> The "evrp" pass uses basic_block's BB_VISITED flag.  It first clears
>> these all, gcc/tree-vrp.c:execute_early_vrp:
>>
>>   FOR_EACH_BB_FN (bb, cfun)
>> {
>>   bb->flags &= ~BB_VISITED;
>>
>> ..., then does its processing, and at the end, clears these again:
>>
>>   FOR_EACH_BB_FN (bb, cfun)
>> bb->flags &= ~BB_VISITED;
>>
>> I note that this looks slightly different from what
>> gcc/cfg.c:clear_bb_flags whould be doing, which works from
>> ENTRY_BLOCK_PTR_FOR_FN onwards:
>
>
> So maybe it should just call clear_bb_flags instead of doing the loop
> itself? Ok if that works.

That doesn't generally work, it clears too many flags (it was appearantly
designed for RTL).

Richard.

>
> Bernd
>


Re: basic_block flags, BB_VISITED

2016-10-14 Thread Richard Biener
On Fri, Oct 14, 2016 at 10:01 AM, Thomas Schwinge
 wrote:
> Hi!
>
> After the "Add Early VRP" GCC trunk commit r240291 (Kugan CC for your
> information), I've been observing all kinds of OpenACC offloading
> failures.  I now figured out what's going on.
>
> The "evrp" pass uses basic_block's BB_VISITED flag.  It first clears
> these all, gcc/tree-vrp.c:execute_early_vrp:
>
>   FOR_EACH_BB_FN (bb, cfun)
> {
>   bb->flags &= ~BB_VISITED;
>
> ..., then does its processing, and at the end, clears these again:
>
>   FOR_EACH_BB_FN (bb, cfun)
> bb->flags &= ~BB_VISITED;
>
> I note that this looks slightly different from what
> gcc/cfg.c:clear_bb_flags whould be doing, which works from
> ENTRY_BLOCK_PTR_FOR_FN onwards:
>
> /* Clear all basic block flags that do not have to be preserved.  */
> void
> clear_bb_flags (void)
> {
>   basic_block bb;
>
>   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
> bb->flags &= BB_FLAGS_TO_PRESERVE;
> }
>
> In the specific case that I've been looking at, "evrp" processing doesn't
> change the code other than for shifting some IDs because of adding a
> (dummy one-argument) PHI node.  And the problem now exactly is that the
> ENTRY_BLOCK_PTR_FOR_FN's BB_VISITED flag is still set, and so
> gcc/omp-low.c:oacc_loop_discover_walk will bail out without doing any
> processing:

The BB_VISITED flag has indetermined state at the beginning of a pass.
You have to ensure it is cleared yourself.

EVRP clearing it (similar to tree-ssa-propagate.c) is because IRA has
the same issue and crashes on "stale" BB_VISTED flags.

Richard.

> /* DFS walk of basic blocks BB onwards, creating OpenACC loop
>structures as we go.  By construction these loops are properly
>nested.  */
>
> static void
> oacc_loop_discover_walk (oacc_loop *loop, basic_block bb)
> {
> [...]
>   if (bb->flags & BB_VISITED)
> return;
>
>  follow:
>   bb->flags |= BB_VISITED;
> [...]
>
> /* Discover the OpenACC loops marked up by HEAD and TAIL markers for
>the current function.  */
>
> static oacc_loop *
> oacc_loop_discovery ()
> {
>   basic_block bb;
>
>   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
>   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> [...]
>
> Now, gcc/cfg-flags.def says:
>
>[Most of the basic block] flags may be cleared by clear_bb_flags().  
> It is generally
>a bad idea to rely on any flags being up-to-date.  */
> [...]
> /* A general visited flag for passes to use.  */
> DEF_BASIC_BLOCK_FLAG(VISITED, 13)
>
> The BB_VISITED flag to me seems to always be of very local use only.
> Should we be more strict and strengthen the above "may be"/"bad idea"
> wording?
>
> Does the "evrp" pass need to get changes applied, to properly clear
> BB_VISITED (ENTRY_BLOCK_PTR_FOR_FN, in particular)?  Or, in contrast, as
> we're not to "rely on any flags being up-to-date", should the BB_VISITED
> clearing at the end of gcc/tree-vrp.c:execute_early_vrp be removed in
> fact?
>
> From (really just) a quick look, I can't tell the exact policy that other
> GCC passes are applying/using regarding the basic_block flags...
>
> The following patch does cure the testsuite regressions:
>
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -19342,6 +19342,7 @@ oacc_loop_discovery ()
>basic_block bb;
>
>oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> +  clear_bb_flags ();
>oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
>
>/* The siblings were constructed in reverse order, reverse them so
>
> Is something like that what we should do?  Should clear_bb_flags be
> called here, or early in gcc/omp-low.c:execute_oacc_device_lower (like
> some other GCC passes seem to be doing)?
>
>
> Grüße
>  Thomas


Re: basic_block flags, BB_VISITED

2016-10-14 Thread Bernd Schmidt

On 10/14/2016 11:26 AM, Richard Biener wrote:

On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt  wrote:

So maybe it should just call clear_bb_flags instead of doing the loop
itself? Ok if that works.


That doesn't generally work, it clears too many flags (it was appearantly
designed for RTL).


Or maybe it could be argued that more tree-specific flags ought to be in 
BB_FLAGS_TO_PRESERVE, or that it would make more sense to define a more 
specific BB_FLAGS_TO_KEEP.


So maybe just change the loop at the end of evrp, it clearly wants to 
clean up after itself so just make it do it reliably.



Bernd



Re: basic_block flags, BB_VISITED

2016-10-14 Thread Nathan Sidwell

On 10/14/16 05:28, Richard Biener wrote:


The BB_VISITED flag has indetermined state at the beginning of a pass.
You have to ensure it is cleared yourself.


In that case the openacc (&nvptx?) passes should be modified to clear the flags 
at their start, rather than at their end.


nathan


Re: basic_block flags, BB_VISITED

2016-10-14 Thread Richard Biener
On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> On 10/14/16 05:28, Richard Biener wrote:
>
>> The BB_VISITED flag has indetermined state at the beginning of a pass.
>> You have to ensure it is cleared yourself.
>
>
> In that case the openacc (&nvptx?) passes should be modified to clear the
> flags at their start, rather than at their end.

Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when not
cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.

Richard.

> nathan


Re: basic_block flags, BB_VISITED

2016-10-14 Thread Richard Biener
On Fri, Oct 14, 2016 at 12:57 PM, Bernd Schmidt  wrote:
> On 10/14/2016 11:26 AM, Richard Biener wrote:
>>
>> On Fri, Oct 14, 2016 at 11:15 AM, Bernd Schmidt 
>> wrote:
>>>
>>> So maybe it should just call clear_bb_flags instead of doing the loop
>>> itself? Ok if that works.
>>
>>
>> That doesn't generally work, it clears too many flags (it was appearantly
>> designed for RTL).
>
>
> Or maybe it could be argued that more tree-specific flags ought to be in
> BB_FLAGS_TO_PRESERVE, or that it would make more sense to define a more
> specific BB_FLAGS_TO_KEEP.
>
> So maybe just change the loop at the end of evrp, it clearly wants to clean
> up after itself so just make it do it reliably.

I thought about adding an argument to clear_bb_flags telling it the
flags to clear.

Richard.

>
> Bernd
>


DWARF Version 5 Public Review Draft Released

2016-10-14 Thread Michael Eager

DWARF Version 5 Public Review Draft Released
October 14, 2016

The DWARF Debugging Information Format Committee has released the public review draft of Version 5 
of the DWARF Debugging Information Format standard. The DWARF debugging format is used to 
communicate debugging information between a compiler and debugger to make it easier for programmers 
to develop, test, and debug programs.


DWARF is used by a wide range of compilers and debuggers, both proprietary and open source, to 
support debugging of Ada, C, C++, Cobol, FORTRAN, Java, and other programming languages. DWARF
V5 adds support for new languages like Rust, Swift, Ocaml, Go, and Haskell, as well as support for 
new features in the older languages.  DWARF can be used with many processor architectures, from 
8-bit to 64-bit.


DWARF is the standard debugging format for Linux and several versions of Unix and is widely used 
with embedded processors. DWARF is designed to be extended easily to support new languages and new 
architectures.


The DWARF Version 5 Standard has been in development for six years.  DWARF Committee members include 
representatives from over a dozen major companies with extensive experience in compiler and debugger 
development.  Version 5 incorporates improvements in many areas:  better data compression, 
separation of debugging data from executable files, improved description of macros and source files, 
faster searching for symbols, improved debugging optimized code, as well as numerous improvements in 
functionality and performance.


The public review draft of DWARF Version 5 standard can be downloaded without charge from the DWARF 
website (http://dwarfstd.org). The DWARF Committee will accept public comments on DWARF Version 5 
until November 30, after which a finalized version will be published.  Additional information about 
DWARF, including how to subscribe to the DWARF mailing list, can also be found on the website. 
Questions about the DWARF Debugging Information Format or the DWARF Committee can be directed to the 
DWARF Committee Chair, Michael Eager at i...@dwarfstd.org.


--
Michael Eager, Chair, DWARF Debugging Format Standards Committee
i...@dwarfstd.org  650-325-8077



Question about sibling call epilogues & registers

2016-10-14 Thread Daniel Santos
I'm working on my -foutline-masbi-xlouges optimization (targeting 64 bit 
Wine) and I've run into a snag with sibling calls. When 
-foutline-masbi-xlouges is disabled, the sibling call generates just 
fine. But when enabled the call to df_analyze() in the peephole2 pass 
deletes the insn that initializes the register that the sibling call 
uses. I'm hoping that somebody might be able to look at my epilogue code 
and identify what may cause this. My optimization is generating insns 
146, 147, 148 and 149. (Actually, 146 and 149 can be merged, I just 
haven't gotten to it yet).


The insn that's getting deleted is 75, where RCX is set.  I'm starting 
to think that maybe df_analyze() presumes that my call (to the stub) is 
invalidating RCX, although it does not. Below is the RTL dump from the 
compgotos pass. Also, I'm not sure why all of my REG_CFA_RESTORE notes 
aren't showing up (xmm10-15 are missing).


(code_label 91 70 73 11 903 "" [6 uses])
(note 73 91 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
(debug_insn 74 73 75 11 (var_location:DI data (debug_expr:DI D#64)) -1
 (nil))
(insn 75 74 150 11 (set (reg:DI 2 cx)
(symbol_ref:DI ("win_data_section") [flags 0x2] 0x7f33746fce10 win_data_section>)) 
/home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1826 89 
{*movdi_internal}

 (expr_list:REG_UNUSED (reg:DI 2 cx)
(nil)))
(note 150 75 146 11 NOTE_INSN_EPILOGUE_BEG)
(insn/f 146 150 147 11 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 48 [0x30])))
(clobber (reg:CC 17 flags))
(clobber (mem:BLK (scratch) [0  A8]))
]) 
/home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1

 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 48 [0x30])))
(nil
(insn 147 146 148 11 (set (reg:DI 4 si)
(plus:DI (reg/f:DI 7 sp)
(const_int 96 [0x60]))) 
/home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1

 (nil))
(call_insn/f 148 147 149 11 (parallel [
(call (mem:QI (symbol_ref:DI ("__msabi_restore_15") [flags 
0x1]) [0  S1 A8])

(const_int 0 [0]))
(clobber (reg:CC 17 flags))
(set (reg:V4SF 52 xmm15)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -88 [0xffa8])) [0  S16 A8]))
(set (reg:V4SF 51 xmm14)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -72 [0xffb8])) [0  S16 A8]))
(set (reg:V4SF 50 xmm13)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -56 [0xffc8])) [0  S16 A8]))
(set (reg:V4SF 49 xmm12)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -40 [0xffd8])) [0  S16 A8]))
(set (reg:V4SF 48 xmm11)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -24 [0xffe8])) [0  S16 A8]))
(set (reg:V4SF 47 xmm10)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -8 [0xfff8])) [0  S16 A8]))
(set (reg:V4SF 46 xmm9)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 8 [0x8])) [0  S16 A8]))
(set (reg:V4SF 45 xmm8)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 24 [0x18])) [0  S16 A8]))
(set (reg:V4SF 28 xmm7)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 40 [0x28])) [0  S16 A8]))
(set (reg:V4SF 27 xmm6)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 56 [0x38])) [0  S16 A8]))
(set (reg:DI 5 di)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 72 [0x48])) [0  S8 A8]))
(set (reg:DI 3 bx)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 80 [0x50])) [0  S8 A8]))
(set (reg:DI 6 bp)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 88 [0x58])) [0  S8 A8]))
(set (reg:DI 41 r12)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 96 [0x60])) [0  S8 A8]))
(set (reg:DI 4 si)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 64 [0x40])) [0  S8 A8]))
]) 
/home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1

 (expr_list:REG_CFA_RESTORE (reg:DI 4 si)
(expr_list:REG_CFA_RESTORE (reg:DI 41 r12)
(expr_list:REG_CFA_RESTORE (reg:DI 6 bp)
(expr_list:REG_CFA_RESTORE (reg:DI 3 bx)
(expr_list:REG_CFA_RESTORE (reg:DI 5 di)
(expr_list