Re: Dead code elimination PROBLEM

2014-02-13 Thread Richard Biener
On February 13, 2014 8:07:16 AM GMT+01:00, chronicle wrote:
>Hi all, I developed a plugin that produces the following GIMPLE:
>
>test ()
>{
>   int selected_fnc_var_.3;
>   int random_Var.2;
>   int D.2363;
>   int _1;
>
>   :
>   random_Var.2_2 = rand ();
>   selected_fnc_var_.3_3 = random_Var.2_2 %[fl] 5;
>   if (selected_fnc_var_.3_3 == 4) goto ;
>   if (selected_fnc_var_.3_3 == 3) goto ;
>   if (selected_fnc_var_.3_3 == 2) goto ;
>   if (selected_fnc_var_.3_3 == 1) goto ;
>   if (selected_fnc_var_.3_3 == 0) goto ;
>:
>   _1 = f.clone.4 ("t", "t");
>   goto ;
>:
>   _1 = f.clone.3 ("t", "t");
>   goto ;
>:
>   _1 = f.clone.2 ("t", "t");
>   goto ;
>:
>   _1 =f.clone.1 ("t", "t");
>   goto ;
>
>:

You are missing a PHI node merging the different _1 values.  Also, you cannot
assign to _1 multiple times; you have to use a new SSA name for each assignment.

Richard.
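
To illustrate the point about SSA names and the PHI node, here is a minimal source-level sketch in C. It is not GIMPLE and not plugin code: the f_clone_N functions are hypothetical stand-ins for f.clone.0 .. f.clone.4, and the names r0..r4 and result are made up for the illustration. Each branch defines its own value, and the single read after the switch corresponds to what a PHI node would merge.

```c
/* Hypothetical stand-ins for f.clone.0 .. f.clone.4 from the dump. */
static int f_clone_0(void) { return 10; }
static int f_clone_1(void) { return 11; }
static int f_clone_2(void) { return 12; }
static int f_clone_3(void) { return 13; }
static int f_clone_4(void) { return 14; }

/* Source-level view of the dispatch.  Each case defines its own value
   (r0..r4), the way each call needs its own SSA name in GIMPLE.  The
   read of `result` after the switch plays the role of the PHI node,
   roughly:  _1 = PHI <r0, r1, r2, r3, r4>.  */
int dispatch(int selected)
{
    int result = -1;            /* value for selectors outside 0..4 */

    switch (selected) {
    case 0: { int r0 = f_clone_0(); result = r0; break; }
    case 1: { int r1 = f_clone_1(); result = r1; break; }
    case 2: { int r2 = f_clone_2(); result = r2; break; }
    case 3: { int r3 = f_clone_3(); result = r3; break; }
    case 4: { int r4 = f_clone_4(); result = r4; break; }
    default: break;
    }
    return result;
}
```

Compiling such a function and looking at its SSA dump (for example with -fdump-tree-ssa) is one way to see how GCC itself names the per-branch values and where it places the PHI.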

>   if (_1 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   __builtin_puts (&" f success "[0]);
>   goto ;
>
>   :
>   __builtin_puts (&" f failed "[0]);
>
>   :
>   return;
>
>}
>
>with this final code
>
>004005c6 :
>   4005c6:  55              push   %rbp
>   4005c7:  48 89 e5        mov    %rsp,%rbp
>   4005ca:  53              push   %rbx
>   4005cb:  48 83 ec 08     sub    $0x8,%rsp
>   4005cf:  e8 6c fe ff ff  callq  400440 
>   4005d4:  89 d9           mov    %ebx,%ecx
>   4005d6:  c1 f9 1f        sar    $0x1f,%ecx
>   4005d9:  89 d8           mov    %ebx,%eax
>   4005db:  31 c8           xor    %ecx,%eax
>   4005dd:  ba 67 66 66 66  mov    $0x6667,%edx
>   4005e2:  f7 e2           mul    %edx
>   4005e4:  89 d0           mov    %edx,%eax
>   4005e6:  d1 e8           shr    %eax
>   4005e8:  31 c8           xor    %ecx,%eax
>   4005ea:  89 c2           mov    %eax,%edx
>   4005ec:  c1 e2 02        shl    $0x2,%edx
>   4005ef:  01 c2           add    %eax,%edx
>   4005f1:  89 d8           mov    %ebx,%eax
>   4005f3:  29 d0           sub    %edx,%eax
>   4005f5:  83 f8 04        cmp    $0x4,%eax
>   4005f8:  75 32           jne    40062c 
>   4005fa:  83 f8 03        cmp    $0x3,%eax
>   4005fd:  74 2d           je     40062c 
>   4005ff:  83 f8 02        cmp    $0x2,%eax
>   400602:  74 28           je     40062c 
>   400604:  83 f8 01        cmp    $0x1,%eax
>   400607:  74 23           je     40062c 
>   400609:  85 c0           test   %eax,%eax
>   40060b:  74 1f           je     40062c 
>   40060d:  be bc 09 40 00  mov    $0x4009bc,%esi
>   400612:  bf c6 09 40 00  mov    $0x4009c6,%edi
>   400617:  e8 7d 02 00 00  callq  400899 
>   40061c:  85 c0           test   %eax,%eax
>   40061e:  75 0c           jne    40062c 
>   400620:  bf d0 09 40 00  mov    $0x4009d0,%edi
>   400625:  e8 e6 fd ff ff  callq  400410 
>   40062a:  eb 0a           jmp    400636 
>   40062c:  bf e8 09 40 00  mov    $0x4009e8,%edi
>   400631:  e8 da fd ff ff  callq  400410 
>   400636:  48 83 c4 08     add    $0x8,%rsp
>   40063a:  5b              pop    %rbx
>   40063b:  5d              pop    %rbp
>   40063c:  c3              retq
>
>
>from this gimple
>
>test(){
>
>int D.2363;
>   int _1;
>
>   :
>   _1 = f("t", "t");
>   if (_1 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   __builtin_puts (&" f "[0]);
>   goto ;
>
>   :
>   __builtin_puts (&" f "[0]);
>
>   :
>   return;
>}
>
>As you can see in the disassembly, it only makes a call to f.clone.4
>(callq 400899).  I suppose the dead code elimination pass is
>responsible for this.  I tried to disable it with the -O0 compilation
>option, but without success.  My question is: how can I make the
>compiler produce the final code without deleting those dead code
>portions?  (Do I need to create some kind of PHI nodes at the labels to
>achieve that, and if so, how could I do that?)
>
>thanks in advance




Re: Fwd: LLVM collaboration?

2014-02-13 Thread Richard Biener
On Wed, Feb 12, 2014 at 5:22 PM, Jan Hubicka  wrote:
>> On Wed, 12 Feb 2014, Richard Biener wrote:
>>
>> > What about instead of our current odd way of identifying LTO objects
>> > simply add a special ELF note telling the linker the plugin to use?
>> >
>> > .note._linker_plugin '/./libltoplugin.so'
>> >
>> > that way the linker should try 1) loading that plugin, 2) register the
>> > specific object with that plugin.
>>
>> Unless this is only allowed for a whitelist of known-good plugins in
>> known-good directories, it's a clear security hole for the linker to
>> execute code in arbitrary files named by linker input.  The linker should
>> be safe to run on untrusted input files.
>
> Also I believe the files should be independent of the particular setup (that
> is, not contain a path) and probably of the host OS (that is, not have a .so
> extension) at least.
> We need some versioning scheme for different versions of compilers.
> Finally, we need a solution for non-ELF LTO objects (like LLVM).
>
> But yes, a compiler-independent way of declaring that a plugin is needed
> and which plugin should be used seems possible.

Yeah, naming the plugin (and searching for it only in an ld-specific, trusted,
configurable path) would work as well, of course.

That also means that we should try to make the GCC side lto-plugin work for
older GCC versions as well (we pick the lto-wrapper to call from the environment
which would have to change if we'd try to support using multiple GCC versions
at the same time).

Richard.

> Honza
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
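
The "name the plugin, search only a trusted path" idea discussed above could look roughly like the following C sketch. This is not ld or lto-plugin code: the function name resolve_plugin, the trusted_dir parameter and the example directory are assumptions made for illustration. The point is only that a name taken from an untrusted object file is rejected if it carries any path component, and is otherwise resolved strictly inside a directory chosen by the linker's configuration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Resolve a plugin NAME taken from an (untrusted) input file against
   TRUSTED_DIR, a directory configured on the linker side.  Returns 0
   and writes the path to OUT on success, -1 if NAME is rejected or
   the result does not fit.  Hypothetical helper, not a real ld API. */
int resolve_plugin(const char *trusted_dir, const char *name,
                   char *out, size_t out_size)
{
    int n;

    /* Reject empty names and anything that looks like a path. */
    if (name[0] == '\0'
        || strchr(name, '/') != NULL
        || strchr(name, '\\') != NULL
        || strcmp(name, ".") == 0
        || strcmp(name, "..") == 0)
        return -1;

    n = snprintf(out, out_size, "%s/%s", trusted_dir, name);
    if (n < 0 || (size_t) n >= out_size)
        return -1;              /* formatting error or truncation */
    return 0;
}
```

A host-specific suffix such as .so would then be appended by the linker rather than stored in the object file, which keeps the note independent of the host OS as suggested above.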


Re: Aarch64 implementation for dwarf exception handling

2014-02-13 Thread Yufeng Zhang 张玉峰
Hi Shiva,

I wonder if you have any test case to demonstrate the potential
code-gen issue you are concerned with.

Thanks,
Yufeng

On Thu, Feb 13, 2014 at 2:14 AM, Shiva Chen  wrote:
> Hi,
>
> I have a question about the implementation of
>
> aarch64_final_eh_return_addr
>
> which is used to point out the return address of the frame
>
> According the source code
>
> If FP is not needed
>
>   return gen_frame_mem (DImode,
> plus_constant (Pmode,
>stack_pointer_rtx,
>fp_offset
>+ cfun->machine->frame.saved_regs_size
>- 2 * UNITS_PER_WORD));
>
>
> According the frame layout
>
> +---+ <-- arg_pointer_rtx
> |
> |  callee-allocated save area
> |  for register varargs
> |
> +---+
> |
> |  local variables
> |
> +---+ <-- frame_pointer_rtx
> |
> |  callee-saved registers
> |
> +---+
> |  LR'
> +---+
> |  FP'
>P+---+ <-- hard_frame_pointer_rtx
> |  dynamic allocation
> +---+
> |
> |  outgoing stack arguments
> |
> +---+ <-- stack_pointer_rtx
>
> Shouldn't the return value be
>
>   return gen_frame_mem (DImode,
> plus_constant (Pmode,
>stack_pointer_rtx,
>fp_offset
>+  2* UNITS_PER_WORD));
>
> Or I just mis-understanding something ?
>
>
> Hope someone could give me a tip.
>
> It would be very helpful.
>
> Thanks
>
> Shiva Chen


[GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?

2014-02-13 Thread Ramana
Hi,

For C++ applications, on PPC, gcc v4.8.1 is generating the call frame
information in the .eh_frame section by default.

Could you please tell me why .eh_frame is being generated instead of
.debug_frame?
Also, the dwarf4 standard does not describe .eh_frame section. I
understand that by default gcc v4.8.1 emits dwarf4 debug information.

Thanks,
Venkata Ramanaiah N


Re: [GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?

2014-02-13 Thread Jakub Jelinek
On Thu, Feb 13, 2014 at 05:24:53PM +0530, Ramana wrote:
> For C++ applications, on PPC, gcc v4.8.1 is generating the call frame
> information in the .eh_frame section by default.
> 
> Could you please tell me why .eh_frame is being generated instead of
> .debug_frame?

Because .eh_frame is the same data .debug_frame contains, just more compact
and usable also for unwinding and backtrace purposes, not just debugging.
It doesn't make sense to emit both.

Jakub
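
As a small illustration of the "unwinding and backtrace purposes" mentioned above, the following C sketch walks the current call stack with glibc's backtrace(). It assumes a glibc-based system; on common targets such as x86_64 this call is typically serviced by the unwinder reading call frame information at run time, which is the kind of use .debug_frame alone does not cover because it is not loaded into the process. The function name count_frames is made up for the example.

```c
#include <execinfo.h>

/* Walk the current call stack with glibc's backtrace() and return
   the number of return addresses that were collected. */
int count_frames(void)
{
    void *frames[32];

    return backtrace(frames, 32);
}
```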


Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?

2014-02-13 Thread Richard Biener
On Mon, Jan 27, 2014 at 8:40 PM, Tobias Grosser  wrote:
> On 01/27/2014 08:29 PM, Tobias Burnus wrote:
>>
>> Hello,
>>
>> motivated by the recent MPC 1.0.2 announcement, I looked at
>> ./contrib/download_prerequisites and also at
>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ to see which versions are
>> offered there.
>>
>> Question: Would it make sense to place newer versions into
>> infrastructure and update ./contrib/download_prerequisites for those? I
>> believe most distros use newer versions nowadays and as some bugs have
>> been fixed in newer versions...
>>
>> * GMP: infrastructure 4.3.2 (2010-01-08), current: 5.1.3 (2013-09-30)
>> * mpfr: infrastructure 2.4.2 (2009-11-30), current: 3.1.2 (2013-03-13)
>> * mpc: infrastructure 0.8.1 (2009-12-08), current: 1.0.2 (2014-01-15)
>> * ISL: infrastructure 0.11.1 (2012-12-12), current: 0.12.2 (2014-01-12)
>> * CLooG: infrastructure 0.18.0 (2012-12-20), current: 0.18.1
>> (2013-10-11) [Or 0.18.2 (2013-12-20) according to the GIT tag, but that
>> release only added a howto_cloog_release.txt file ...]
>
>
> Hi Tobias,
>
> that sounds like a great idea. We are internally currently working on
> preparing graphite for the isl 0.13.0 release, which is a large improvement
> and e.g. provides a computeout facility that allows us to stop dependence
> analysis in case the dependence problem is too complex to solve. This would
> address some of the open graphite bugs. Even before this is ready, upgrading
> to CLooG 0.18.1 and isl-0.12.1 would probably be a good thing to do.

I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
that seems to work.  Updating the infrastructure dir sounds good to me,
I'll do it.

Richard.

> Cheers,
> Tobias
>


Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?

2014-02-13 Thread Tobias Grosser

On 02/13/2014 08:19 AM, Richard Biener wrote:
> On Mon, Jan 27, 2014 at 8:40 PM, Tobias Grosser  wrote:
>> On 01/27/2014 08:29 PM, Tobias Burnus wrote:
>>>
>>> Hello,
>>>
>>> motivated by the recent MPC 1.0.2 announcement, I looked at
>>> ./contrib/download_prerequisites and also at
>>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ to see which versions are
>>> offered there.
>>>
>>> Question: Would it make sense to place newer versions into
>>> infrastructure and update ./contrib/download_prerequisites for those? I
>>> believe most distros use newer versions nowadays and as some bugs have
>>> been fixed in newer versions...
>>>
>>> * GMP: infrastructure 4.3.2 (2010-01-08), current: 5.1.3 (2013-09-30)
>>> * mpfr: infrastructure 2.4.2 (2009-11-30), current: 3.1.2 (2013-03-13)
>>> * mpc: infrastructure 0.8.1 (2009-12-08), current: 1.0.2 (2014-01-15)
>>> * ISL: infrastructure 0.11.1 (2012-12-12), current: 0.12.2 (2014-01-12)
>>> * CLooG: infrastructure 0.18.0 (2012-12-20), current: 0.18.1
>>> (2013-10-11) [Or 0.18.2 (2013-12-20) according to the GIT tag, but that
>>> release only added a howto_cloog_release.txt file ...]
>>
>> Hi Tobias,
>>
>> that sounds like a great idea. We are internally currently working on
>> preparing graphite for the isl 0.13.0 release, which is a large improvement
>> and e.g. provides a computeout facility that allows us to stop dependence
>> analysis in case the dependence problem is too complex to solve. This would
>> address some of the open graphite bugs. Even before this is ready, upgrading
>> to CLooG 0.18.1 and isl-0.12.1 would probably be a good thing to do.
>
> I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
> that seems to work.  Updating the infrastructure dir sounds good to me,
> I'll do it.

Thanks Richi!

Could we also make the minimal library requirement for gcc the following 
two?


Cheers,
Tobias



Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?

2014-02-13 Thread Richard Biener
On Thu, Feb 13, 2014 at 2:20 PM, Tobias Grosser  wrote:
> On 02/13/2014 08:19 AM, Richard Biener wrote:
>>
>> On Mon, Jan 27, 2014 at 8:40 PM, Tobias Grosser  wrote:
>>>
>>> On 01/27/2014 08:29 PM, Tobias Burnus wrote:


>>>> Hello,
>>>>
>>>> motivated by the recent MPC 1.0.2 announcement, I looked at
>>>> ./contrib/download_prerequisites and also at
>>>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ to see which versions are
>>>> offered there.
>>>>
>>>> Question: Would it make sense to place newer versions into
>>>> infrastructure and update ./contrib/download_prerequisites for those? I
>>>> believe most distros use newer versions nowadays and as some bugs have
>>>> been fixed in newer versions...
>>>>
>>>> * GMP: infrastructure 4.3.2 (2010-01-08), current: 5.1.3 (2013-09-30)
>>>> * mpfr: infrastructure 2.4.2 (2009-11-30), current: 3.1.2 (2013-03-13)
>>>> * mpc: infrastructure 0.8.1 (2009-12-08), current: 1.0.2 (2014-01-15)
>>>> * ISL: infrastructure 0.11.1 (2012-12-12), current: 0.12.2 (2014-01-12)
>>>> * CLooG: infrastructure 0.18.0 (2012-12-20), current: 0.18.1
>>>> (2013-10-11) [Or 0.18.2 (2013-12-20) according to the GIT tag, but that
>>>> release only added a howto_cloog_release.txt file ...]
>>>
>>>
>>>
>>> Hi Tobias,
>>>
>>> that sounds like a great idea. We are internally currently working on
>>> preparing graphite for the isl 0.13.0 release, which is a large
>>> improvement
>>> and e.g. provides a computeout facility that allows us to stop dependence
>>> analysis in case the dependence problem is too complex to solve. This
>>> would
>>> address some of the open graphite bugs. Even before this is ready,
>>> upgrading
>>> to CLooG 0.18.1 and isl-0.12.1 would probably be a good thing to do.
>>
>>
>> I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
>> that seems to work.  Updating the infrastructure dir sounds good to me,
>> I'll do it.
>
>
> Thanks Richi!
>
> Could we also make the minimal library requirement for gcc the following
> two?

I see no reason to do that at this point (I suppose only dropping support
for ISL < 0.12.2 would help you?), but I have updated the recommended
versions as documented in install.texi to those two.

When trunk re-opens for stage1 we can drop support for older
versions once we make code changes that make those versions no
longer work.

Pretty high on my wishlist, and a good motivation, would be to get
rid of the cloog dependency by using the ISL code generator ;)
I'll help as well as I can with all problems that arise in GCC
specific areas such as GIMPLE and SSA.

Thanks,
Richard.

> Cheers,
> Tobias
>
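
The "drop support for ISL < 0.12.2" question above comes down to comparing dotted version numbers. A minimal C sketch of such a comparison follows. It is not GCC's configure logic, which does this check differently at configure time; the function name version_compare and the three-component format are assumptions for illustration.

```c
#include <stdio.h>

/* Compare two dotted versions of the form "major.minor.patch".
   Missing components count as 0.  Returns -1, 0 or 1 when A is
   older than, equal to, or newer than B. */
int version_compare(const char *a, const char *b)
{
    int av[3] = { 0, 0, 0 };
    int bv[3] = { 0, 0, 0 };
    int i;

    sscanf(a, "%d.%d.%d", &av[0], &av[1], &av[2]);
    sscanf(b, "%d.%d.%d", &bv[0], &bv[1], &bv[2]);

    for (i = 0; i < 3; i++)
        if (av[i] != bv[i])
            return av[i] < bv[i] ? -1 : 1;
    return 0;
}
```

With this helper, the infrastructure version of ISL listed in the thread (0.11.1) compares as older than 0.12.2, while CLooG 0.18.1 compares as newer than 0.18.0.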


Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?

2014-02-13 Thread Tobias Grosser

On 02/13/2014 08:37 AM, Richard Biener wrote:
> On Thu, Feb 13, 2014 at 2:20 PM, Tobias Grosser  wrote:
>> On 02/13/2014 08:19 AM, Richard Biener wrote:
>>> On Mon, Jan 27, 2014 at 8:40 PM, Tobias Grosser  wrote:
>>>> On 01/27/2014 08:29 PM, Tobias Burnus wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> motivated by the recent MPC 1.0.2 announcement, I looked at
>>>>> ./contrib/download_prerequisites and also at
>>>>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ to see which versions are
>>>>> offered there.
>>>>>
>>>>> Question: Would it make sense to place newer versions into
>>>>> infrastructure and update ./contrib/download_prerequisites for those? I
>>>>> believe most distros use newer versions nowadays and as some bugs have
>>>>> been fixed in newer versions...
>>>>>
>>>>> * GMP: infrastructure 4.3.2 (2010-01-08), current: 5.1.3 (2013-09-30)
>>>>> * mpfr: infrastructure 2.4.2 (2009-11-30), current: 3.1.2 (2013-03-13)
>>>>> * mpc: infrastructure 0.8.1 (2009-12-08), current: 1.0.2 (2014-01-15)
>>>>> * ISL: infrastructure 0.11.1 (2012-12-12), current: 0.12.2 (2014-01-12)
>>>>> * CLooG: infrastructure 0.18.0 (2012-12-20), current: 0.18.1
>>>>> (2013-10-11) [Or 0.18.2 (2013-12-20) according to the GIT tag, but that
>>>>> release only added a howto_cloog_release.txt file ...]
>>>>
>>>> Hi Tobias,
>>>>
>>>> that sounds like a great idea. We are internally currently working on
>>>> preparing graphite for the isl 0.13.0 release, which is a large
>>>> improvement and e.g. provides a computeout facility that allows us to
>>>> stop dependence analysis in case the dependence problem is too complex
>>>> to solve. This would address some of the open graphite bugs. Even before
>>>> this is ready, upgrading to CLooG 0.18.1 and isl-0.12.1 would probably
>>>> be a good thing to do.
>>>
>>> I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
>>> that seems to work.  Updating the infrastructure dir sounds good to me,
>>> I'll do it.
>>
>> Thanks Richi!
>>
>> Could we also make the minimal library requirement for gcc the following
>> two?
>
> I see no reason to do that at this point (I suppose only dropping support
> for ISL < 0.12.2 would help you?), but I have updated the recommended
> versions as documented in install.texi to those two.
>
> When trunk re-opens for stage1 we can drop support for older
> versions once we make code changes that make those versions no
> longer work.

Perfect. That works for us.

> Pretty high on my wishlist and a good motivation would be to get
> rid of the cloog dependency by using the ISL code generator ;)
> I'll help as good as I can with all problems that arise in GCC
> specific areas such as GIMPLE and SSA.


We seem to agree on the next steps. Using the isl code generator is 
pretty high on the wish list, only fixing the last P1 bugs is higher.


Cheers,
Tobias



GNU Tools Cauldron 2014 - Venue and Hotel information

2014-02-13 Thread Diego Novillo
==

 GNU Tools Cauldron 2014
  http://gcc.gnu.org/wiki/cauldron2014


  Call for Abstracts and Participation

 18-20 July 2014
   Cambridge, England

==

An update on this year's Cauldron: the workshop will be held at the
University of Cambridge Computer Laboratory in the William Gates
Building. Details at http://www.cl.cam.ac.uk/local/wgb/

We have negotiated a promotional rate at St. Catharine's College.
For those interested, please reserve through:

  http://www.caths.cam.ac.uk/home/?m=page&id=73

Note: you will need to enter GNUTOOLSCAULDRON into the
promotional code box for it to bring up availability.

If you are interested in attending, please remember to register
soon. We have limited room for attendance and we have quite a few
registrations already.

If you have a topic that you would like to present, please submit
an abstract describing what you plan to present.  We are
accepting three types of submissions:

- Prepared presentations: demos, project reports, etc.
- BoFs: coordination meetings with other developers.
- Tutorials for developers.  No user tutorials, please.

Note that we will not be doing in-depth reviews of the
presentations.  Mainly we are looking for applicability and to
decide scheduling.  There will be time at the conference to add
other topics of discussion, similarly to what we did at the
previous meetings.

To register your abstract, send e-mail to
tools-cauldron-ad...@googlegroups.com.

Your submission should contain the following information:

Title:
Authors:
Abstract:

If you intend to participate, but not necessarily present, please
let us know as well.  Send a message to
tools-cauldron-ad...@googlegroups.com stating your intent to
participate.


Re: [GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?

2014-02-13 Thread Ramana
On Thu, Feb 13, 2014 at 5:29 PM, Jakub Jelinek  wrote:
> On Thu, Feb 13, 2014 at 05:24:53PM +0530, Ramana wrote:
>> For C++ applications, on PPC, gcc v4.8.1 is generating the call frame
>> information in the .eh_frame section by default.
>>
>> Could you please tell me why .eh_frame is being generated instead of
>> .debug_frame?
>
> Because .eh_frame is the same data .debug_frame contains, just more compact
> and usable also for unwinding and backtrace purposes, not just debugging.
> It doesn't make sense to emit both.
>
> Jakub
Ok.

But there is no mention of .eh_frame in the DWARF 4 standard. Shouldn't
a DWARF 4 compliant debugger look only for .debug_frame?

Regards,
Venkata Ramanaiah N


Re: Aarch64 implementation for dwarf exception handling

2014-02-13 Thread Renlin Li

On 13/02/14 02:14, Shiva Chen wrote:

> Hi,
>
> I have a question about the implementation of
>
> aarch64_final_eh_return_addr
>
> which is used to point out the return address of the frame
>
> According the source code
>
> If FP is not needed
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        + cfun->machine->frame.saved_regs_size
>                                        - 2 * UNITS_PER_WORD));
>
>
> According the frame layout
>
> +---+ <-- arg_pointer_rtx
> |
> |  callee-allocated save area
> |  for register varargs
> |
> +---+
> |
> |  local variables
> |
> +---+ <-- frame_pointer_rtx
> |
> |  callee-saved registers
> |
> +---+
> |  LR'
> +---+
> |  FP'
>P+---+ <-- hard_frame_pointer_rtx
> |  dynamic allocation
> +---+
> |
> |  outgoing stack arguments
> |
> +---+ <-- stack_pointer_rtx
>
> Shouldn't the return value be
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        +  2* UNITS_PER_WORD));
>
> Or I just mis-understanding something ?
>
>
> Hope someone could give me a tip.
>
> It would be very helpful.
>
> Thanks
>
> Shiva Chen


Hi,

If the frame pointer is not needed, the prologue will store the
callee-saved registers to the stack in ascending order, which means X0
will be saved first if needed, and X30 (LR) will be the last if it is
pushed onto the stack.


Please check the source code of aarch64_layout_frame().

As the comment above the code also indicates, LR would be at the top of
the saved registers block.


By the way, one additional stack slot might be needed to keep the
stack pointer 16-byte aligned, so - 2 * UNITS_PER_WORD is needed to
adjust the load address.


+---+ <-- arg_pointer_rtx
|
+---+ <-- frame_pointer_rtx
|  dummy
|  LR
|  bla...bla...
|  x3
|  x2
|  x1
|  x0
  P +---+ <-- hard_frame_pointer_rtx
|
+---+ <-- stack_pointer_rtx


Kind regards,
Renlin
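
The arithmetic described above can be sketched in plain C. This is not the GCC implementation: the helper names saved_regs_size and lr_offset_from_sp are made up, and the example deliberately covers only the case mentioned in the reply, where an odd number of 8-byte slots is rounded up to 16 bytes and a dummy slot ends up above LR.

```c
#define UNITS_PER_WORD 8        /* bytes per 64-bit register on AArch64 */
#define STACK_ALIGN    16       /* stack pointer alignment in bytes */

/* Size of the callee-saved block for N_REGS registers, rounded up
   so that the stack pointer stays 16-byte aligned. */
long saved_regs_size(int n_regs)
{
    long raw = (long) n_regs * UNITS_PER_WORD;

    return (raw + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}

/* Offset of the LR slot from the stack pointer, following the
   expression quoted in this thread for the FP-not-needed case:
   fp_offset + saved_regs_size - 2 * UNITS_PER_WORD. */
long lr_offset_from_sp(long fp_offset, long saved_size)
{
    return fp_offset + saved_size - 2 * UNITS_PER_WORD;
}
```

For three saved registers with LR stored last, the block is 24 bytes rounded up to 32, and the expression gives offset 16 from the bottom of the block, which is the third slot, directly below the dummy slot.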



Re: Aarch64 implementation for dwarf exception handling

2014-02-13 Thread Shiva Chen
Hi, Yufeng

Sorry, I don't have a test case.
I simply misunderstood the implementation.


Hi, Renlin

Thanks for pointing out my misunderstanding.
I wasn't aware that LR would be in a different position depending on
whether FP is needed (bottom of the callee-saved area) or not (top of
the callee-saved area).
I have checked aarch64_layout_frame() and found that FP/LR will be
pushed as the last registers by
aarch64_save_or_restore_callee_save_registers () if FP is not needed.

Thanks for your kind help,
I really appreciate it.

Shiva


2014-02-13 22:32 GMT+08:00 Renlin Li :
> On 13/02/14 02:14, Shiva Chen wrote:
>>
>> Hi,
>>
>> I have a question about the implementation of
>>
>> aarch64_final_eh_return_addr
>>
>> which is used to point out the return address of the frame
>>
>> According the source code
>>
>> If FP is not needed
>>
>>return gen_frame_mem (DImode,
>>  plus_constant (Pmode,
>> stack_pointer_rtx,
>> fp_offset
>> +
>> cfun->machine->frame.saved_regs_size
>> - 2 * UNITS_PER_WORD));
>>
>>
>> According the frame layout
>>
>>  +---+ <-- arg_pointer_rtx
>>  |
>>  |  callee-allocated save area
>>  |  for register varargs
>>  |
>>  +---+
>>  |
>>  |  local variables
>>  |
>>  +---+ <-- frame_pointer_rtx
>>  |
>>  |  callee-saved registers
>>  |
>>  +---+
>>  |  LR'
>>  +---+
>>  |  FP'
>> P+---+ <-- hard_frame_pointer_rtx
>>  |  dynamic allocation
>>  +---+
>>  |
>>  |  outgoing stack arguments
>>  |
>>  +---+ <-- stack_pointer_rtx
>>
>> Shouldn't the return value be
>>
>>return gen_frame_mem (DImode,
>>  plus_constant (Pmode,
>> stack_pointer_rtx,
>> fp_offset
>> +  2* UNITS_PER_WORD));
>>
>> Or I just mis-understanding something ?
>>
>>
>> Hope someone could give me a tip.
>>
>> It would be very helpful.
>>
>> Thanks
>>
>> Shiva Chen
>>
> Hi,
>
> If frame pointer is not needed. The prologue routine will store the callee
> saved registers to stack according to ascending order, which means X0 will
> be saved first if needed, and X30(LR) will be the last if it's pushed into
> stack.
>
> Please check the source code, aarch64_layout_frame().
>
> As the comment above the code also indicates, LR would be at the top of the
> saved registers block().
>
> By the way, there is one additional stack slot might be needed to keep stack
> pointer 16-byte aligned,  so - 2 * UNITS_PER_WORD is needed to adjust the
> load  address.
>
> +---+ <-- arg_pointer_rtx
> |
> +---+ <-- frame_pointer_rtx
> |  dummy
> |  LR
> |  bla...bla...
> |  x3
> |  x2
> |  x1
> |  x0
>   P +---+ <-- hard_frame_pointer_rtx
> |
> +---+ <-- stack_pointer_rtx
>
>
> Kind regards,
> Renlin
>


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-13 Thread Torvald Riegel
On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> >  wrote:
> > >
> > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > combination with C11 atomics in most cases.  The C11 atomics do cover
> > > some of the reasons we currently use ACCESS_ONCE(), but not all of them --
> > > in particular, it allows load/store merging.
> > 
> > I really disagree with the "will need to use volatile".
> > 
> > We should never need to use volatile (outside of whatever MMIO we do
> > using C) if C11 defines atomics correctly.
> > 
> > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > it's called a cache - and there's no actual reason to think that
> > "ACCESS_ONCE()" has to mean our current "volatile".
> > 
> > Now, it's possible that the C standards simply get atomics _wrong_, so
> > that they create visible semantics that are different from what a CPU
> > cache already does, but that's a plain bug in the standard if so.
> > 
> > But merging loads and stores is fine. And I *guarantee* it is fine,
> > exactly because CPU's already do it, so claiming that the compiler
> > couldn't do it is just insanity.
> 
> Agreed, both CPUs and compilers can merge loads and stores.  But CPUs
> normally get their stores pushed through the store buffer in reasonable
> time, and CPUs also use things like invalidations to ensure that a
> store is seen in reasonable time by readers.  Compilers don't always
> have these two properties, so we do need to be more careful of load
> and store merging by compilers.

The standard's _wording_ is a little vague about forward-progress
guarantees, but I believe the vast majority of the people involved do
want compilers to not prevent forward progress.  There is of course a
difference whether a compiler establishes _eventual_ forward progress in
the sense of after 10 years or forward progress in a small bounded
interval of time, but this is a QoI issue, and good compilers won't want
to introduce unnecessary latencies.  I believe that it is fine if the
standard merely talks about eventual forward progress.

> > Now, there are things that are *not* fine, like speculative stores
> > that could be visible to other threads. Those are *bugs* (either in
> > the compiler or in the standard), and anybody who claims otherwise is
> > not worth discussing with.
> 
> And as near as I can tell, volatile semantics are required in C11 to
> avoid speculative stores.  I might be wrong about this, and hope that
> I am wrong.  But I am currently not seeing it in the current standard.
> (Though I expect that most compilers would avoid speculating stores,
> especially in the near term.

This really depends on how we define speculative stores.  The memory
model is absolutely clear that programs have to behave as if executed by
the virtual machine, and that rules out speculative stores to volatiles
and other locations.  Under certain circumstances, there will be
"speculative" stores in the sense that they will happen at different
times than they would under a trivial implementation of the abstract machine.
But to be allowed to do that, the compiler has to prove that such a
transformation still fulfills the as-if rule.

IOW, the abstract machine is what currently defines disallowed
speculative stores.  If you want to put *further* constraints on what
implementations are allowed to do, I suppose it is best to talk about
those and see how we can add rules that allow programmers to express
those constraints.  For example, control dependencies might be such a
case.  I don't have a specific suggestion -- maybe the control
dependencies are best tackled similar to consume dependencies (even
though we don't have a good solution for those yet).  But using
volatile accesses for that seems to be a big hammer, or even the wrong
one.
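
To make the load-merging part of this discussion concrete, here is a small C11 sketch contrasting a volatile access in the style of ACCESS_ONCE() with a relaxed atomic load. The macro below is written for this example in the style of the kernel idiom (it relies on the GNU __typeof__ extension), and the variable and function names are made up. The observable result is the same in both functions; the difference is only in what the compiler is allowed to do with the two loads.

```c
#include <stdatomic.h>

/* Volatile-cast idiom in the style of the kernel's ACCESS_ONCE();
   __typeof__ is a GNU extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *) &(x))

static int plain_counter;
static atomic_int shared_counter;

/* Two volatile accesses: the compiler has to emit two loads. */
int read_twice_volatile(void)
{
    int a = ACCESS_ONCE(plain_counter);
    int b = ACCESS_ONCE(plain_counter);

    return a + b;
}

/* Two relaxed atomic loads: each load is atomic, but a compiler may
   in principle merge adjacent loads like these into one, which is
   the load merging debated in this thread. */
int read_twice_relaxed(void)
{
    int a = atomic_load_explicit(&shared_counter, memory_order_relaxed);
    int b = atomic_load_explicit(&shared_counter, memory_order_relaxed);

    return a + b;
}
```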



Building GCC with -Wmissing-declarations and addressing its warnings

2014-02-13 Thread Patrick Palka
Hi everyone,

I noticed that the GCC build process currently only uses the
-Wmissing-prototypes flag, and not the -Wmissing-declarations flag.
It seems that the former flag only works on C source files, which
means that GCC's source files no longer benefit from this flag as they
are now  C++ files.  The right flag to use, in this case, is
-Wmissing-declarations, which works on both C and C++ source files.  I
decided to build GCC with this flag to see what kinds of warnings
popped up, and to use these warnings to clean up the GCC source.

I sifted through all the new warnings generated by
-Wmissing-declarations during the build process and fixed the ones
whose fixes were obvious.  Most of my fixes are on global (non-debug)
functions that are only referenced in the compilation unit in which
they are defined.  To fix these functions and to silence their
warnings I simply gave them static linkage.  The rest of the
fixes are on global function definitions whose declaration exists in a
header file that was not included by the source file.  To fix up these
functions I simply included the relevant header file.
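
The two kinds of fixes described above can be sketched on a toy C file. None of these names come from the GCC tree: clamp_to_byte and scale_to_byte are hypothetical, and the prototype is written inline where a real source file would include its header instead.

```c
/* In a real tree this prototype would live in a header that the
   source file includes; having it in scope is what silences
   -Wmissing-declarations for the definition below. */
int scale_to_byte(int value);

/* Referenced only in this file, so static linkage is enough and the
   warning does not apply to it. */
static int clamp_to_byte(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}

/* Global function with a previous declaration: no warning. */
int scale_to_byte(int value)
{
    return clamp_to_byte(value * 2);
}
```

Removing the prototype, or dropping the static keyword from the helper, and compiling with -Wmissing-declarations should produce the "no previous declaration" warning for the affected function.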

The -Wmissing-declarations warnings that I did _not_ address are those
emitted from the autogenerated gengtype header files, because their
fixes are not trivial to me.  They look like:

In file included from ../../gcc/gcc/c/c-parser.c:14162:0:
./gt-c-c-parser.h: In function 'void gt_ggc_mx(c_token&)':
./gt-c-c-parser.h:50:1: warning: no previous declaration for 'void
gt_ggc_mx(c_token&)' [-Wmissing-declarations]
 gt_ggc_mx (struct c_token& x_r ATTRIBUTE_UNUSED)

I have also not addressed some of such warnings in predict.c and
config/i386/i386.c because their fixes are not trivial to me either.
Furthermore, I was not able to mark "static" any function that was
used as a template argument to hash_table::traverse()
because it seems that C++98 requires template argument pointers to
have external linkage.  (The C++11 standard relaxes this restriction,
it seems.)  The file var-tracking.c has many such functions.

Since I do not yet have a copyright assignment filed for GCC, I have
omitted an actual code patch and instead provide you with a changelog
that could be used to reconstruct the patch 100% if anyone is so
inclined.  Once my copyright assignment is filed, I will properly
submit this patch if somebody else has not already done so.

On a related note, would a patch to officially enable
-Wmissing-declarations in the build process be well regarded?  Since
-Wmissing-prototypes is currently enabled, I assume it is the
intention of the GCC devs to address these warnings, and that during
the transition from a C to C++ bootstrap compiler a small oversight
was made (that -Wmissing-prototypes is a no-op against C++ source
files).  If the answer to the previous question is "yes" then how
would one go about addressing the above gengtype-related warnings, if
at all?

Thanks for your time,
Patrick

* asan.c (asan_mem_ref_get_end): Make static.
* calls.c: Include calls.h.
* cfgexpand.c: Include cfgexpand.h.
* cfgloop.c: Include tree-ssa-loop-niter.h.
* cgraphunit.c (decide_is_symbol_needed): Make static.
* config/i386/i386.c (make_pass_insert_vzeroupper): Likewise.
(ix86_avx_emit_vzeroupper): Likewise.
* dwarf2out.c (init_addr_table_entry): Likewise.
* gimple-builder.c: Include gimple-builder.h.
* gimple-ssa-isolate-paths.c (isolate_path): Make static.
* graphite.c (graphite_transform_loops): Likewise.
* internal-fn.c (ubsan_expand_si_overflow_addsub_check): Make
static.
(ubsan_expand_si_overflow_neg_check): Likewise.
(ubsan_expand_si_overflow_mul_check): Likewise.
* ipa-devirt.c (hash_type_name): Likewise.
(likely_target_p): Likewise.
* ipa-inline-analysis.c (simple_edge_hints): Likewise.
* ipa-profile.c (cmp_counts): Likewise.
(contains_hot_call_p): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise.
(write_agg_replacement_chain): Likewise.
* ipa.c (can_replace_by_local_alias): Likewise.
* lto-streamer-out.c (output_symbol_p): Likewise.
* omp-low.c (simd_clone_vector_of_formal_parm_types): Likewise.
* print-tree.c: Include print-tree.h.
* stmt.c: Include stmt.h.
* stringpool.c: Include stringpool.h.
* tree-cfg-cleanup.c: Include tree-cfg-cleanup.h.
* tree-inline.c (redirect_all_calls): Make static.
(freqs_to_count): Likewise.
* tree-nested.c: Include tree-nested.h.
* tree-predcom.c (tree_predictive_commoning): Make static.
* tree-sra.c (ipa_sra_modify_function_body): Likewise.
* tree-ssa-loop-im.c (movement_possibility): Likewise.
(tree_ssa_lim): Likewise.
* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables):
Likewise.
(tree_unroll_loops_completely): Likewise.
* tree-ssa-loop-prefet

gcc-4.8-20140213 is now available

2014-02-13 Thread gccadmin
Snapshot gcc-4.8-20140213 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140213/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 207768

You'll find:

 gcc-4.8-20140213.tar.bz2 Complete GCC

  MD5=a1f395cafd66e403ab2d03c88a51125e
  SHA1=71d27c2e5536fd12d738d17a1acd002834e1b6b8

Diffs from 4.8-20140206 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


TYPE_BINFO and canonical types at LTO

2014-02-13 Thread Jan Hubicka
Hi,
I have noticed that record_component_aliases is called during LTO time and it
examines contents of BINFO:
0x5cd7a5 record_component_aliases(tree_node*)
../../gcc/alias.c:1005
0x5cd4a9 get_alias_set(tree_node*)
../../gcc/alias.c:895
0x5cc67a component_uses_parent_alias_set_from(tree_node const*)
../../gcc/alias.c:548
0x5ccc42 reference_alias_ptr_type_1
../../gcc/alias.c:660
0x5ccf93 get_alias_set(tree_node*)
../../gcc/alias.c:740
0xb823d8 indirect_refs_may_alias_p
../../gcc/tree-ssa-alias.c:1125
0xb82d8d refs_may_alias_p_1(ao_ref*, ao_ref*, bool)
../../gcc/tree-ssa-alias.c:1279
0xb848df stmt_may_clobber_ref_p_1(gimple_statement_base*, ao_ref*)
../../gcc/tree-ssa-alias.c:2013
0xb85d27 walk_non_aliased_vuses(ao_ref*, tree_node*, void* (*)(ao_ref*, 
tree_node*, unsigned int, void*), void* (*)(ao_ref*, tree_node*, void*), void*)
../../gcc/tree-ssa-alias.c:2411
0xc509f3 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind, 
vn_reference_s**)
../../gcc/tree-ssa-sccvn.c:2063
0xc52ea4 visit_reference_op_store
../../gcc/tree-ssa-sccvn.c:2970
0xc55404 extract_and_process_scc_for_name
../../gcc/tree-ssa-sccvn.c:3825

This smells bad, since it is given a canonical type produced by the
structural equivalence merging, which ignores BINFOs, so it may be a completely
different class with completely different bases than the original.  Bases are
structurally merged, too, and may be exchanged for normal fields because
DECL_ARTIFICIAL (which separates bases from fields) does not seem to be part of
the canonical type definition in LTO.

I wonder if that code is needed after all:
    case QUAL_UNION_TYPE:
      /* Recursively record aliases for the base classes, if there are any.  */
      if (TYPE_BINFO (type))
        {
          int i;
          tree binfo, base_binfo;

          for (binfo = TYPE_BINFO (type), i = 0;
               BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
            record_alias_subset (superset,
                                 get_alias_set (BINFO_TYPE (base_binfo)));
        }
      for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
        if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
          record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
      break;

All bases are also fields within the type, so the second loop should notice
all the types seen by the first loop, if I am correct.
So perhaps the first loop can be dropped altogether.
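
One way to test that assumption before dropping the loop would be a
check that walks both lists.  The following is only a toy model: the
struct and function names are hypothetical stand-ins, not GCC's tree
API, and it ignores the DECL_NONADDRESSABLE_P filter that the real field
loop applies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a record type.  All names here are hypothetical;
   they only mimic the shape of the BINFO bases and of TYPE_FIELDS.  */
struct toy_type
{
  const struct toy_type *const *bases;        /* like the BINFO bases */
  size_t n_bases;
  const struct toy_type *const *field_types;  /* like TREE_TYPE of each field */
  size_t n_fields;
};

/* Return true if every base of TYPE also appears as the type of one of
   its fields, i.e. the field loop alone would visit every type that
   the base loop visits.  */
bool
toy_bases_covered_by_fields (const struct toy_type *type)
{
  for (size_t i = 0; i < type->n_bases; i++)
    {
      bool found = false;

      for (size_t j = 0; j < type->n_fields; j++)
        if (type->field_types[j] == type->bases[i])
          found = true;
      if (!found)
        return false;
    }
  return true;
}
```

If such a check ever failed on a real type, the first loop would be
recording a subset that the second loop misses, and could not simply be
removed.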

Honza


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-13 Thread Paul E. McKenney
On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> > >  wrote:
> > > >
> > > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > > combination with C11 atomics in most cases.  The C11 atomics do cover
> > > > some of the reasons we currently use ACCESS_ONCE(), but not all of them 
> > > > --
> > > > in particular, it allows load/store merging.
> > > 
> > > I really disagree with the "will need to use volatile".
> > > 
> > > We should never need to use volatile (outside of whatever MMIO we do
> > > using C) if C11 defines atomics correctly.
> > > 
> > > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > > it's called a cache - and there's no actual reason to think that
> > > "ACCESS_ONCE()" has to mean our current "volatile".
> > > 
> > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > that they create visible semantics that are different from what a CPU
> > > cache already does, but that's a plain bug in the standard if so.
> > > 
> > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > exactly because CPU's already do it, so claiming that the compiler
> > > couldn't do it is just insanity.
> > 
> > Agreed, both CPUs and compilers can merge loads and stores.  But CPUs
> > normally get their stores pushed through the store buffer in reasonable
> > time, and CPUs also use things like invalidations to ensure that a
> > store is seen in reasonable time by readers.  Compilers don't always
> > have these two properties, so we do need to be more careful of load
> > and store merging by compilers.
> 
> The standard's _wording_ is a little vague about forward-progress
> guarantees, but I believe the vast majority of the people involved do
> want compilers to not prevent forward progress.  There is of course a
> difference whether a compiler establishes _eventual_ forward progress in
> the sense of after 10 years or forward progress in a small bounded
> interval of time, but this is a QoI issue, and good compilers won't want
> to introduce unnecessary latencies.  I believe that it is fine if the
> standard merely talks about eventual forward progress.

The compiler will need to earn my trust on this one.  ;-)
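
To make the concern concrete, here is a minimal sketch of a polling loop
written both ways.  The function names are made up for illustration; the
volatile cast mirrors what ACCESS_ONCE() expands to in the kernel, and
the atomic variant uses a relaxed C11 load.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Poll *FLAG at most MAX_SPINS times using relaxed C11 atomic loads.
   Each iteration is a separate load in the abstract machine; how far a
   compiler may merge them is the question under discussion.  */
bool
poll_flag_atomic (atomic_int *flag, unsigned int max_spins)
{
  for (unsigned int i = 0; i < max_spins; i++)
    if (atomic_load_explicit (flag, memory_order_relaxed) != 0)
      return true;
  return false;
}

/* Same loop with an ACCESS_ONCE()-style volatile access to a plain
   int, which is how the kernel currently asks for one load per
   iteration.  */
bool
poll_flag_volatile (int *flag, unsigned int max_spins)
{
  for (unsigned int i = 0; i < max_spins; i++)
    if (*(volatile int *) flag != 0)
      return true;
  return false;
}
```

Both loops behave the same when run single-threaded; the difference
being argued about is only in what the compiler is permitted to do with
the repeated loads when another thread sets the flag.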

> > > Now, there are things that are *not* fine, like speculative stores
> > > that could be visible to other threads. Those are *bugs* (either in
> > > the compiler or in the standard), and anybody who claims otherwise is
> > > not worth discussing with.
> > 
> > And as near as I can tell, volatile semantics are required in C11 to
> > avoid speculative stores.  I might be wrong about this, and hope that
> > I am wrong.  But I am currently not seeing it in the current standard.
> > (Though I expect that most compilers would avoid speculating stores,
> > especially in the near term.)
> 
> This really depends on how we define speculative stores.  The memory
> model is absolutely clear that programs have to behave as if executed by
> the virtual machine, and that rules out speculative stores to volatiles
> and other locations.  Under certain circumstances, there will be
> "speculative" stores in the sense that they will happen at different
> times than if you had a trivial implementation of the abstract machine.
> But to be allowed to do that, the compiler has to prove that such a
> transformation still fulfills the as-if rule.

Agreed, although the as-if rule would ignore control dependencies, since
these are not yet part of the standard (as you in fact note below).
I nevertheless consider myself at least somewhat reassured that current
C11 won't speculate stores.  My remaining concerns involve the compiler
proving to itself that a given branch is always taken, thus motivating
it to optimize the branch away -- though this is more properly a
control-dependency concern.

> IOW, the abstract machine is what currently defines disallowed
> speculative stores.  If you want to put *further* constraints on what
> implementations are allowed to do, I suppose it is best to talk about
> those and see how we can add rules that allow programmers to express
> those constraints.  For example, control dependencies might be such a
> case.  I don't have a specific suggestion -- maybe the control
> dependencies are best tackled similar to consume dependencies (even
> though we don't have a good solution for those yet).  But using
> volatile accesses for that seems to be a big hammer, or even the wrong
> one.

In current compilers, the two hammers we have are volatile and barrier().
But yes, it would be good to have something more focused.  One option
would be to propose memory_order_control loads to see how loudly the
committee screams.  One use case might be as follows:

if (atomic_load(x, memory_order_control))
   

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-13 Thread Torvald Riegel
On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> > On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> > > >  wrote:
> > > > >
> > > > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > > > combination with C11 atomics in most cases.  The C11 atomics do cover
> > > > > some of the reasons we currently use ACCESS_ONCE(), but not all of 
> > > > > them --
> > > > > in particular, it allows load/store merging.
> > > > 
> > > > I really disagree with the "will need to use volatile".
> > > > 
> > > > We should never need to use volatile (outside of whatever MMIO we do
> > > > using C) if C11 defines atomics correctly.
> > > > 
> > > > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > > > it's called a cache - and there's no actual reason to think that
> > > > "ACCESS_ONCE()" has to mean our current "volatile".
> > > > 
> > > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > > that they create visible semantics that are different from what a CPU
> > > > cache already does, but that's a plain bug in the standard if so.
> > > > 
> > > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > > exactly because CPU's already do it, so claiming that the compiler
> > > > couldn't do it is just insanity.
> > > 
> > > Agreed, both CPUs and compilers can merge loads and stores.  But CPUs
> > > normally get their stores pushed through the store buffer in reasonable
> > > time, and CPUs also use things like invalidations to ensure that a
> > > store is seen in reasonable time by readers.  Compilers don't always
> > > have these two properties, so we do need to be more careful of load
> > > and store merging by compilers.
> > 
> > The standard's _wording_ is a little vague about forward-progress
> > guarantees, but I believe the vast majority of the people involved do
> > want compilers to not prevent forward progress.  There is of course a
> > difference whether a compiler establishes _eventual_ forward progress in
> > the sense of after 10 years or forward progress in a small bounded
> > interval of time, but this is a QoI issue, and good compilers won't want
> > to introduce unnecessary latencies.  I believe that it is fine if the
> > standard merely talks about eventual forward progress.
> 
> The compiler will need to earn my trust on this one.  ;-)
> 
> > > > Now, there are things that are *not* fine, like speculative stores
> > > > that could be visible to other threads. Those are *bugs* (either in
> > > > the compiler or in the standard), and anybody who claims otherwise is
> > > > not worth discussing with.
> > > 
> > > And as near as I can tell, volatile semantics are required in C11 to
> > > avoid speculative stores.  I might be wrong about this, and hope that
> > > I am wrong.  But I am currently not seeing it in the current standard.
> > > (Though I expect that most compilers would avoid speculating stores,
> > > especially in the near term.)
> > 
> > This really depends on how we define speculative stores.  The memory
> > model is absolutely clear that programs have to behave as if executed by
> > the virtual machine, and that rules out speculative stores to volatiles
> > and other locations.  Under certain circumstances, there will be
> > "speculative" stores in the sense that they will happen at different
> > times than if you had a trivial implementation of the abstract machine.
> > But to be allowed to do that, the compiler has to prove that such a
> > transformation still fulfills the as-if rule.
> 
> Agreed, although the as-if rule would ignore control dependencies, since
> these are not yet part of the standard (as you in fact note below).
> I nevertheless consider myself at least somewhat reassured that current
> C11 won't speculate stores.  My remaining concerns involve the compiler
> proving to itself that a given branch is always taken, thus motivating
> it to optimize the branch away -- though this is more properly a
> control-dependency concern.
> 
> > IOW, the abstract machine is what currently defines disallowed
> > speculative stores.  If you want to put *further* constraints on what
> > implementations are allowed to do, I suppose it is best to talk about
> > those and see how we can add rules that allow programmers to express
> > those constraints.  For example, control dependencies might be such a
> > case.  I don't have a specific suggestion -- maybe the control
> > dependencies are best tackled similar to consume dependencies (even
> > though we don't have a good solution for those yet).  But using
> > volatile accesses for that seems to be a big hammer, or even the wrong
> > one.
> 
> In current compilers, the two hammers we have are volatile and barrier().
> But yes, it would be goo

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-13 Thread Torvald Riegel
On Wed, 2014-02-12 at 10:19 +0100, Peter Zijlstra wrote:
> > I don't know the specifics of your example, but from how I understand
> > it, I don't see a problem if the compiler can prove that the store will
> > always happen.
> > 
> > To be more specific, if the compiler can prove that the store will
> > happen anyway, and the region of code can be assumed to always run
> > atomically (e.g., there's no loop or such in there), then it is known
> > that we have one atomic region of code that will always perform the
> > store, so we might as well do the stuff in the region in some order.
> > 
> > Now, if any of the memory accesses are atomic, then the whole region of
> > code containing those accesses is often not atomic because other threads
> > might observe intermediate results in a data-race-free way.
> > 
> > (I know that this isn't a very precise formulation, but I hope it brings
> > my line of reasoning across.)
> 
> So given something like:
> 
>   if (x)
>       y = 3;
> 
> assuming both x and y are atomic (so don't gimme crap for not knowing
> the C11 atomic incantations); and you can prove x is always true; you
> don't see a problem with not emitting the conditional?

That depends on what your goal is.  It would be correct as far as the
standard is concerned; this makes sense if all you want is indeed a
program that does what the abstract machine might do, and produces the
same output / side effects.

If you're trying to preserve the branch in the code emitted / executed
by the implementation, then it would not be correct.  But those branches
aren't specified as being part of the observable side effects.  In the
common case, this makes sense because it enables optimizations that are
useful; this line of reasoning also allows the compiler to merge some
atomic accesses in the way that Linus would like to see it.
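
Spelled out with C11 atomics, the example might look like the sketch
below.  The function names are made up for illustration, relaxed
ordering is an assumption (the original example did not specify any),
and the second function only shows one shape the emitted code could
take if the implementation had proved that x is always nonzero.

```c
#include <stdatomic.h>

/* As written in the source: the store to *Y is conditional on the
   value loaded from *X.  */
void
store_if_set (atomic_int *x, atomic_int *y)
{
  if (atomic_load_explicit (x, memory_order_relaxed))
    atomic_store_explicit (y, 3, memory_order_relaxed);
}

/* A possible result after the branch has been optimized away under
   the assumption that *X is always nonzero.  The load is kept here;
   the point is only that the branch, and with it the control
   dependency between load and store, is gone.  */
void
store_if_set_branch_removed (atomic_int *x, atomic_int *y)
{
  (void) atomic_load_explicit (x, memory_order_relaxed);
  atomic_store_explicit (y, 3, memory_order_relaxed);
}
```

When x really is always nonzero, both functions leave the same value in
y, which is why the abstract machine does not distinguish them; the
hardware-level ordering between the load and the store is what differs.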

> Avoiding the conditional changes the result; see that control dependency
> email from earlier.

It does not, given how the standard defines "result".

> In the above example the load of X and the store to
> Y are strictly ordered, due to control dependencies. Not emitting the
> condition and maybe not even emitting the load completely wrecks this.

I think you're trying to solve this backwards.  You are looking at this
with an implicit wishlist of what the compiler should do (or how you
want to use the hardware), but this is not a viable specification that
one can write a compiler against.

We do need clear rules for what the compiler is allowed to do or not
(e.g., a memory model that models multi-threaded executions).  Otherwise
it's all hand-waving, and we're getting nowhere.  Thus, the way to
approach this is to propose a feature or change to the standard, make
sure that this is consistent and has no unintended side effects for
other aspects of compilation or other code, and then ask the compiler to
implement it.  IOW, we need a patch for where this all starts: in the
rules and requirements for compilation.

Paul and I are at the C++ meeting currently, and we had sessions in
which the concurrency study group talked about memory model issues like
dependency tracking and memory_order_consume.  Paul shared uses of
atomics (or likewise) in the kernel, and we discussed how the memory
model currently handles various cases and why, how one could express
other requirements consistently, and what is actually implementable in
practice.  I can't speak for Paul, but I thought those discussions were
productive.

> It's therefore an invalid optimization to take out the conditional or
> speculate the store, since it takes out the dependency.