Re: Dead code elimination PROBLEM
On February 13, 2014 8:07:16 AM GMT+01:00, chronicle wrote:
>Hi PPL I developed a plugin that produces the following gimple
>
>test ()
>{
>  int selected_fnc_var_.3;
>  int random_Var.2;
>  int D.2363;
>  int _1;
>
>  :
>  random_Var.2_2 = rand ();
>  selected_fnc_var_.3_3 = random_Var.2_2 %[fl] 5;
>  if (selected_fnc_var_.3_3 == 4) goto ;
>  if (selected_fnc_var_.3_3 == 3) goto ;
>  if (selected_fnc_var_.3_3 == 2) goto ;
>  if (selected_fnc_var_.3_3 == 1) goto ;
>  if (selected_fnc_var_.3_3 == 0) goto ;
>:
>  _1 = f.clone.4 ("t", "t");
>  goto ;
>:
>  _1 = f.clone.3 ("t", "t");
>  goto ;
>:
>  _1 = f.clone.2 ("t", "t");
>  goto ;
>:
>  _1 =f.clone.1 ("t", "t");
>  goto ;
>
>:

You miss a phi node merging the different _1. Also you cannot assign to _1
multiple times but have to use a new ssa name for each.

Richard.

>  if (_1 != 0)
>    goto ;
>  else
>    goto ;
>
>  :
>  __builtin_puts (&" f success "[0]);
>  goto ;
>
>  :
>  __builtin_puts (&" f failed "[0]);
>
>  :
>  return;
>
>}
>
>with this final code
>
>004005c6 :
>  4005c6:  55                 push   %rbp
>  4005c7:  48 89 e5           mov    %rsp,%rbp
>  4005ca:  53                 push   %rbx
>  4005cb:  48 83 ec 08        sub    $0x8,%rsp
>  4005cf:  e8 6c fe ff ff     callq  400440
>  4005d4:  89 d9              mov    %ebx,%ecx
>  4005d6:  c1 f9 1f           sar    $0x1f,%ecx
>  4005d9:  89 d8              mov    %ebx,%eax
>  4005db:  31 c8              xor    %ecx,%eax
>  4005dd:  ba 67 66 66 66     mov    $0x6667,%edx
>  4005e2:  f7 e2              mul    %edx
>  4005e4:  89 d0              mov    %edx,%eax
>  4005e6:  d1 e8              shr    %eax
>  4005e8:  31 c8              xor    %ecx,%eax
>  4005ea:  89 c2              mov    %eax,%edx
>  4005ec:  c1 e2 02           shl    $0x2,%edx
>  4005ef:  01 c2              add    %eax,%edx
>  4005f1:  89 d8              mov    %ebx,%eax
>  4005f3:  29 d0              sub    %edx,%eax
>  4005f5:  83 f8 04           cmp    $0x4,%eax
>  4005f8:  75 32              jne    40062c
>  4005fa:  83 f8 03           cmp    $0x3,%eax
>  4005fd:  74 2d              je     40062c
>  4005ff:  83 f8 02           cmp    $0x2,%eax
>  400602:  74 28              je     40062c
>  400604:  83 f8 01           cmp    $0x1,%eax
>  400607:  74 23              je     40062c
>  400609:  85 c0              test   %eax,%eax
>  40060b:  74 1f              je     40062c
>  40060d:  be bc 09 40 00     mov    $0x4009bc,%esi
>  400612:  bf c6 09 40 00     mov    $0x4009c6,%edi
>  400617:  e8 7d 02 00 00     callq  400899
>  40061c:  85 c0              test   %eax,%eax
>  40061e:  75 0c              jne    40062c
>  400620:  bf d0 09 40 00     mov    $0x4009d0,%edi
>  400625:  e8 e6 fd ff ff     callq  400410
>  40062a:  eb 0a              jmp    400636
>  40062c:  bf e8 09 40 00     mov    $0x4009e8,%edi
>  400631:  e8 da fd ff ff     callq  400410
>  400636:  48 83 c4 08        add    $0x8,%rsp
>  40063a:  5b                 pop    %rbx
>  40063b:  5d                 pop    %rbp
>  40063c:  c3                 retq
>
>
>from this gimple
>
>test(){
>
>  int D.2363;
>  int _1;
>
>  :
>  _1 = f("t", "t");
>  if (_1 != 0)
>    goto ;
>  else
>    goto ;
>
>  :
>  __builtin_puts (&" f "[0]);
>  goto ;
>
>  :
>  __builtin_puts (&" f "[0]);
>
>  :
>  return;
>}
>
>As you can see in the disassembly output, it only makes a call to f.clone.4
>( callq 400899 ). I suppose the dead code elimination pass is responsible
>for this; I tried to disable it using the -O0 compilation option but
>without success. My question is how can I make the compiler produce the
>final code without deleting those dead code portions ( do I need to make
>any kind of PHI nodes in the labels to achieve that, if so how could I do
>that ? )
>
>thanks in advance
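
Richard's two points (one fresh SSA name per assignment, and a PHI node at the merge point) can be pictured outside GCC. The following is only a minimal sketch in plain C, not plugin code and not GIMPLE: `clone_1` to `clone_4`, `dispatch`, and the `edge` variable are hypothetical names made up for the illustration, and the selector is assumed to lie in 1..4 because the posted GIMPLE shows only four clone calls.

```c
#include <assert.h>

/* Hypothetical stand-ins for f.clone.1 .. f.clone.4.  Each returns a
   distinct value so a caller can tell which one ran.  */
static int clone_1(void) { return 1; }
static int clone_2(void) { return 2; }
static int clone_3(void) { return 3; }
static int clone_4(void) { return 4; }

/* Models the shape Richard describes: every branch defines its own
   result name, and a single merge point selects among them.  */
int dispatch(int selector)
{
    assert(selector >= 1 && selector <= 4);

    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;  /* one "SSA name" per branch */
    int edge;                            /* which predecessor was taken */

    if (selector == 4)      { r4 = clone_4(); edge = 4; }
    else if (selector == 3) { r3 = clone_3(); edge = 3; }
    else if (selector == 2) { r2 = clone_2(); edge = 2; }
    else                    { r1 = clone_1(); edge = 1; }

    /* Plays the role of: merged = PHI <r1(e1), r2(e2), r3(e3), r4(e4)>.  */
    int merged = (edge == 4) ? r4
               : (edge == 3) ? r3
               : (edge == 2) ? r2
               : r1;
    return merged;
}
```

Calling `dispatch(3)` runs only `clone_3` and returns its value through `merged`. In a real plugin the merge would be an actual PHI node in the join block rather than a conditional expression.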
Re: Fwd: LLVM collaboration?
On Wed, Feb 12, 2014 at 5:22 PM, Jan Hubicka wrote:
>> On Wed, 12 Feb 2014, Richard Biener wrote:
>>
>> > What about instead of our current odd way of identifying LTO objects
>> > simply add a special ELF note telling the linker the plugin to use?
>> >
>> > .note._linker_plugin '/./libltoplugin.so'
>> >
>> > that way the linker should try 1) loading that plugin, 2) register the
>> > specific object with that plugin.
>>
>> Unless this is only allowed for a whitelist of known-good plugins in
>> known-good directories, it's a clear security hole for the linker to
>> execute code in arbitrary files named by linker input. The linker should
>> be safe to run on untrusted input files.
>
> Also I believe the files should be independent of particular setup (that is
> not contain a path) and probably host OS (that is not having .so extension)
> at least.
> We need some versioning scheme for different versions of compilers.
> Finally we need a solution for non-ELF LTO objects (like LLVM)
>
> But yes, having a compiler independent way of declaring that plugin is needed
> and what plugin should be used seems possible.

Yeah, naming the plugin (and searching it in a ld specific trusted
configurable path only) would work as well, of course.

That also means that we should try to make the GCC side lto-plugin
work for older GCC versions as well (we pick the lto-wrapper to call
from the environment which would have to change if we'd try to support
using multiple GCC versions at the same time).

Richard.

> Honza
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
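
The idea of naming the plugin and searching for it only in a trusted, linker-configured directory could look roughly like the sketch below. It is an illustration under assumptions, not how ld or the LTO plugin interface actually works: `plugin_name_is_bare`, `resolve_plugin`, and the directory used in the usage note are hypothetical. The point is only that a name taken from an input object is rejected if it carries any path component, and is otherwise resolved against a directory the linker itself controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if NAME is a bare file name: non-empty, containing no '/'
   and not equal to "." or "..".  */
bool plugin_name_is_bare(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;
    if (strchr(name, '/') != NULL)
        return false;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;
    return true;
}

/* Writes TRUSTED_DIR "/" NAME into OUT.  Returns false if NAME is not a
   bare name or if the result does not fit in OUT_SIZE bytes.  */
bool resolve_plugin(const char *trusted_dir, const char *name,
                    char *out, size_t out_size)
{
    assert(trusted_dir != NULL && out != NULL && out_size > 0);

    if (!plugin_name_is_bare(name))
        return false;

    int n = snprintf(out, out_size, "%s/%s", trusted_dir, name);
    return n >= 0 && (size_t)n < out_size;
}
```

With a made-up trusted directory such as `/usr/lib/ld-plugins`, the name `liblto_plugin` resolves to a path inside that directory, while `../evil.so` and `/tmp/evil.so` are refused. Versioning and non-ELF objects, both raised in the thread, are not addressed by this sketch.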
Re: Aarch64 implementation for dwarf exception handling
Hi Shiva,

I wonder if you have any test case to demonstrate the potential code-gen
issue you are concerned with.

Thanks,
Yufeng

On Thu, Feb 13, 2014 at 2:14 AM, Shiva Chen wrote:
> Hi,
>
> I have a question about the implementation of
>
> aarch64_final_eh_return_addr
>
> which is used to point out the return address of the frame
>
> According the source code
>
> If FP is not needed
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        + cfun->machine->frame.saved_regs_size
>                                        - 2 * UNITS_PER_WORD));
>
>
> According the frame layout
>
>     +---+ <-- arg_pointer_rtx
>     |
>     | callee-allocated save area
>     | for register varargs
>     |
>     +---+
>     |
>     | local variables
>     |
>     +---+ <-- frame_pointer_rtx
>     |
>     | callee-saved registers
>     |
>     +---+
>     | LR'
>     +---+
>     | FP'
>   P +---+ <-- hard_frame_pointer_rtx
>     | dynamic allocation
>     +---+
>     |
>     | outgoing stack arguments
>     |
>     +---+ <-- stack_pointer_rtx
>
> Shouldn't the return value be
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        + 2* UNITS_PER_WORD));
>
> Or I just mis-understanding something ?
>
>
> Hope someone could give me a tip.
>
> It would be very helpful.
>
> Thanks
>
> Shiva Chen
[GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?
Hi, For C++ applications, on PPC, gcc v4.8.1 is generating the call frame information in the .eh_frame section by default. Could you please tell me why .eh_frame is being generated instead of .debug_frame? Also, the dwarf4 standard does not describe .eh_frame section. I understand that by default gcc v4.8.1 emits dwarf4 debug information. Thanks, Venkata Ramanaiah N
Re: [GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?
On Thu, Feb 13, 2014 at 05:24:53PM +0530, Ramana wrote:
> For C++ applications, on PPC, gcc v4.8.1 is generating the call frame
> information in the .eh_frame section by default.
>
> Could you please tell me why .eh_frame is being generated instead of
> .debug_frame?

Because .eh_frame is the same data .debug_frame contains, just more compact
and usable also for unwinding and backtrace purposes, not just debugging.
It doesn't make sense to emit both.

Jakub
Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?
On Mon, Jan 27, 2014 at 8:40 PM, Tobias Grosser wrote:
> On 01/27/2014 08:29 PM, Tobias Burnus wrote:
>>
>> Hello,
>>
>> motivated by the recent MPC 1.0.2 announcement, I looked at
>> ./contrib/download_prerequisites and also at
>> ftp://gcc.gnu.org/pub/gcc/infrastructure/ to see which versions are
>> offered there.
>>
>> Question: Would it make sense to place newer versions into
>> infrastructure and update ./contrib/download_prerequisites for those? I
>> believe most distros use newer versions nowadays and as some bugs have
>> been fixed in newer versions...
>>
>> * GMP: infrastructure 4.3.2 (2010-01-08), current: 5.1.3 (2013-09-30)
>> * mpfr: infrastructure 2.4.2 (2009-11-30), current: 3.1.2 (2013-03-13)
>> * mpc: infrastructure 0.8.1 (2009-12-08), current: 1.0.2 (2014-01-15)
>> * ISL: infrastructure 0.11.1 (2012-12-12), current: 0.12.2 (2014-01-12)
>> * CLooG: infrastructure 0.18.0 (2012-12-20), current: 0.18.1
>> (2013-10-11) [Or 0.18.2 (2013-12-20) according to the GIT tag, but that
>> release only added a howto_cloog_release.txt file ...]
>
>
> Hi Tobias,
>
> that sounds like a great idea. We are internally currently working on
> preparing graphite for the isl 0.13.0 release, which is a large improvement
> and e.g. provides a computeout facility that allows us to stop dependence
> analysis in case the dependence problem is too complex to solve. This would
> address some of the open graphite bugs. Even before this is ready, upgrading
> to CLooG 0.18.1 and isl-0.12.1 would probably be a good thing to do.

I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
that seems to work. Updating the infrastructure dir sounds good to me,
I'll do it.

Richard.

> Cheers,
> Tobias
>
Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?
On 02/13/2014 08:19 AM, Richard Biener wrote:
> I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
> that seems to work. Updating the infrastructure dir sounds good to me,
> I'll do it.

Thanks Richi!

Could we also make the minimal library requirement for gcc the following
two?

Cheers,
Tobias
Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?
On Thu, Feb 13, 2014 at 2:20 PM, Tobias Grosser wrote:
> On 02/13/2014 08:19 AM, Richard Biener wrote:
>> I've tested building the 4.8 branch with cloog 0.18.1 and isl 0.12.2 and
>> that seems to work. Updating the infrastructure dir sounds good to me,
>> I'll do it.
>
> Thanks Richi!
>
> Could we also make the minimal library requirement for gcc the following
> two?

I see no reason to do that at this point (I suppose only dropping support
for ISL < 0.12.2 would help you?), but I have updated the recommended
versions as documented in install.texi to those two. When trunk re-opens
for stage1 we can drop support for older versions once we make code
changes that make those versions no longer work.

Pretty high on my wishlist and a good motivation would be to get rid of
the cloog dependency by using the ISL code generator ;) I'll help as
good as I can with all problems that arise in GCC specific areas such as
GIMPLE and SSA.

Thanks,
Richard.

> Cheers,
> Tobias
>
Re: gcc.gnu.org/infrastructure - newer versions of GMP/mpfr/mpc/isl/cloog?
On 02/13/2014 08:37 AM, Richard Biener wrote:
> On Thu, Feb 13, 2014 at 2:20 PM, Tobias Grosser wrote:
>> Could we also make the minimal library requirement for gcc the following
>> two?
>
> I see no reason to do that at this point (I suppose only dropping support
> for ISL < 0.12.2 would help you?), but I have updated the recommended
> versions as documented in install.texi to those two. When trunk re-opens
> for stage1 we can drop support for older versions once we make code
> changes that make those versions no longer work.

Perfect. That works for us.

> Pretty high on my wishlist and a good motivation would be to get rid of
> the cloog dependency by using the ISL code generator ;) I'll help as
> good as I can with all problems that arise in GCC specific areas such as
> GIMPLE and SSA.

We seem to agree on the next steps. Using the isl code generator is pretty
high on the wish list, only fixing the last P1 bugs is higher.

Cheers,
Tobias
GNU Tools Cauldron 2014 - Venue and Hotel information
==
GNU Tools Cauldron 2014
http://gcc.gnu.org/wiki/cauldron2014

Call for Abstracts and Participation

18-20 July 2014
Cambridge, England
==

An update to this year's Cauldron.

The workshop will be held at University of Cambridge's Computer Laboratory
in the William Gates Building. Details at http://www.cl.cam.ac.uk/local/wgb/

We have negotiated a promotional rate at St. Catharine's College. For those
interested, please reserve through:

http://www.caths.cam.ac.uk/home/?m=page&id=73

Note: you will need to enter GNUTOOLSCAULDRON into the promotional code box
for it to bring up availability.

If you are interested in attending, please remember to register soon. We
have limited room for attendance and we have quite a few registrations
already.

If you have a topic that you would like to present, please submit an
abstract describing what you plan to present. We are accepting three types
of submissions:

- Prepared presentations: demos, project reports, etc.
- BoFs: coordination meetings with other developers.
- Tutorials for developers. No user tutorials, please.

Note that we will not be doing in-depth reviews of the presentations.
Mainly we are looking for applicability and to decide scheduling. There
will be time at the conference to add other topics of discussion, similarly
to what we did at the previous meetings.

To register your abstract, send e-mail to
tools-cauldron-ad...@googlegroups.com.

Your submission should contain the following information:

Title:
Authors:
Abstract:

If you intend to participate, but not necessarily present, please let us
know as well. Send a message to tools-cauldron-ad...@googlegroups.com
stating your intent to participate.
Re: [GCC 4.8.1] Which section to emit, .eh_frame or .debug_section?
On Thu, Feb 13, 2014 at 5:29 PM, Jakub Jelinek wrote:
> On Thu, Feb 13, 2014 at 05:24:53PM +0530, Ramana wrote:
>> For C++ applications, on PPC, gcc v4.8.1 is generating the call frame
>> information in the .eh_frame section by default.
>>
>> Could you please tell me why .eh_frame is being generated instead of
>> .debug_frame?
>
> Because .eh_frame is the same data .debug_frame contains, just more compact
> and usable also for unwinding and backtrace purposes, not just debugging.
> It doesn't make sense to emit both.
>
> Jakub

Ok. But there is no mention of .eh_frame in the dwarf4 standard. Any dwarf4
compliant debugger would look only for .debug_frame, wouldn't it?

Regards,
Venkata Ramanaiah N
Re: Aarch64 implementation for dwarf exception handling
On 13/02/14 02:14, Shiva Chen wrote:
> Hi,
>
> I have a question about the implementation of
>
> aarch64_final_eh_return_addr
>
> which is used to point out the return address of the frame
>
> According the source code
>
> If FP is not needed
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        + cfun->machine->frame.saved_regs_size
>                                        - 2 * UNITS_PER_WORD));
>
>
> According the frame layout
>
>     +---+ <-- arg_pointer_rtx
>     |
>     | callee-allocated save area
>     | for register varargs
>     |
>     +---+
>     |
>     | local variables
>     |
>     +---+ <-- frame_pointer_rtx
>     |
>     | callee-saved registers
>     |
>     +---+
>     | LR'
>     +---+
>     | FP'
>   P +---+ <-- hard_frame_pointer_rtx
>     | dynamic allocation
>     +---+
>     |
>     | outgoing stack arguments
>     |
>     +---+ <-- stack_pointer_rtx
>
> Shouldn't the return value be
>
>   return gen_frame_mem (DImode,
>                         plus_constant (Pmode,
>                                        stack_pointer_rtx,
>                                        fp_offset
>                                        + 2* UNITS_PER_WORD));
>
> Or I just mis-understanding something ?
>
>
> Hope someone could give me a tip.
>
> It would be very helpful.
>
> Thanks
>
> Shiva Chen

Hi,

If the frame pointer is not needed, the prologue routine will store the
callee-saved registers to the stack in ascending order, which means X0 will
be saved first if needed, and X30 (LR) will be the last if it's pushed onto
the stack.

Please check the source code, aarch64_layout_frame().

As the comment above the code also indicates, LR would be at the top of the
saved registers block.

By the way, one additional stack slot might be needed to keep the stack
pointer 16-byte aligned, so - 2 * UNITS_PER_WORD is needed to adjust the
load address.

    +---+ <-- arg_pointer_rtx
    |
    +---+ <-- frame_pointer_rtx
    | dummy
    | LR
    | bla...bla...
    | x3
    | x2
    | x1
    | x0
  P +---+ <-- hard_frame_pointer_rtx
    |
    +---+ <-- stack_pointer_rtx

Kind regards,
Renlin
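
The arithmetic behind Renlin's diagram can be checked with a few lines of standalone C. This is a sketch under stated assumptions, not GCC code: the constants and the three function names are made up for the illustration, registers are assumed to be stored in ascending order with LR last, and only the case from the diagram is modelled, where an odd number of saved registers leaves one "dummy" padding slot above LR. Nothing is claimed here about other layouts.

```c
#include <assert.h>

/* Local stand-ins for the illustration; not the GCC macros.  */
enum { WORD_BYTES = 8, STACK_ALIGN_BYTES = 16 };

/* Size of the register save block, rounded up to the stack alignment.  */
int saved_regs_size(int nregs)
{
    int raw = nregs * WORD_BYTES;
    return (raw + STACK_ALIGN_BYTES - 1) / STACK_ALIGN_BYTES
           * STACK_ALIGN_BYTES;
}

/* LR located by counting slots: it is the last of NREGS registers
   stored at ascending offsets from FP_OFFSET.  */
int lr_offset_by_counting(int fp_offset, int nregs)
{
    return fp_offset + (nregs - 1) * WORD_BYTES;
}

/* LR located with the expression quoted from
   aarch64_final_eh_return_addr.  Only the padded (odd) case shown in
   the diagram is modelled.  */
int lr_offset_by_formula(int fp_offset, int nregs)
{
    assert(nregs % 2 == 1);
    return fp_offset + saved_regs_size(nregs) - 2 * WORD_BYTES;
}
```

For the five slots drawn in the diagram (x0, x1, x2, x3, LR) and an `fp_offset` of 0, the save block is 48 bytes including the dummy slot, and both functions place LR at offset 32.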
Re: Aarch64 implementation for dwarf exception handling
Hi, Yufeng

Sorry, I don't have any testcase; I just misunderstood the implementation.

Hi, Renlin

Thanks for pointing out my misunderstanding.

I wasn't aware that LR would be in a different position depending on
whether FP is needed (bottom of the callee-saved area) or not needed (top
of the callee-saved area).

I have checked aarch64_layout_frame() and found that FP/LR will be pushed
as the last registers by aarch64_save_or_restore_callee_save_registers ()
if FP is not needed.

Thanks for your kind help, I really appreciate it.

Shiva

2014-02-13 22:32 GMT+08:00 Renlin Li :
> Hi,
>
> If the frame pointer is not needed, the prologue routine will store the
> callee-saved registers to the stack in ascending order, which means X0 will
> be saved first if needed, and X30 (LR) will be the last if it's pushed onto
> the stack.
>
> Please check the source code, aarch64_layout_frame().
>
> As the comment above the code also indicates, LR would be at the top of the
> saved registers block.
>
> By the way, one additional stack slot might be needed to keep the stack
> pointer 16-byte aligned, so - 2 * UNITS_PER_WORD is needed to adjust the
> load address.
>
>     +---+ <-- arg_pointer_rtx
>     |
>     +---+ <-- frame_pointer_rtx
>     | dummy
>     | LR
>     | bla...bla...
>     | x3
>     | x2
>     | x1
>     | x0
>   P +---+ <-- hard_frame_pointer_rtx
>     |
>     +---+ <-- stack_pointer_rtx
>
>
> Kind regards,
> Renlin
>
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> > wrote:
> > >
> > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > combination with C11 atomics in most cases. The C11 atomics do cover
> > > some of the reasons we currently use ACCESS_ONCE(), but not all of them --
> > > in particular, it allows load/store merging.
> >
> > I really disagree with the "will need to use volatile".
> >
> > We should never need to use volatile (outside of whatever MMIO we do
> > using C) if C11 defines atomics correctly.
> >
> > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > it's called a cache - and there's no actual reason to think that
> > "ACCESS_ONCE()" has to mean our current "volatile".
> >
> > Now, it's possible that the C standards simply get atomics _wrong_, so
> > that they create visible semantics that are different from what a CPU
> > cache already does, but that's a plain bug in the standard if so.
> >
> > But merging loads and stores is fine. And I *guarantee* it is fine,
> > exactly because CPU's already do it, so claiming that the compiler
> > couldn't do it is just insanity.
>
> Agreed, both CPUs and compilers can merge loads and stores. But CPUs
> normally get their stores pushed through the store buffer in reasonable
> time, and CPUs also use things like invalidations to ensure that a
> store is seen in reasonable time by readers. Compilers don't always
> have these two properties, so we do need to be more careful of load
> and store merging by compilers.

The standard's _wording_ is a little vague about forward-progress
guarantees, but I believe the vast majority of the people involved do
want compilers to not prevent forward progress. There is of course a
difference whether a compiler establishes _eventual_ forward progress in
the sense of after 10 years or forward progress in a small bounded
interval of time, but this is a QoI issue, and good compilers won't want
to introduce unnecessary latencies. I believe that it is fine if the
standard merely talks about eventual forward progress.

> > Now, there are things that are *not* fine, like speculative stores
> > that could be visible to other threads. Those are *bugs* (either in
> > the compiler or in the standard), and anybody who claims otherwise is
> > not worth discussing with.
>
> And as near as I can tell, volatile semantics are required in C11 to
> avoid speculative stores. I might be wrong about this, and hope that
> I am wrong. But I am currently not seeing it in the current standard.
> (Though I expect that most compilers would avoid speculating stores,
> especially in the near term.

This really depends on how we define speculative stores. The memory
model is absolutely clear that programs have to behave as if executed by
the virtual machine, and that rules out speculative stores to volatiles
and other locations. Under certain circumstances, there will be
"speculative" stores in the sense that they will happen at different
times as if you had a trivial implementation of the abstract machine.
But to be allowed to do that, the compiler has to prove that such a
transformation still fulfills the as-if rule.

IOW, the abstract machine is what currently defines disallowed
speculative stores. If you want to put *further* constraints on what
implementations are allowed to do, I suppose it is best to talk about
those and see how we can add rules that allow programmers to express
those constraints. For example, control dependencies might be such a
case. I don't have a specific suggestion -- maybe the control
dependencies are best tackled similar to consume dependencies (even
though we don't have a good solution for those yet). But using
volatile accesses for that seems to be a big hammer, or even the wrong
one.
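
The kind of speculative store discussed above can be made concrete with a small C11 fragment. This is a minimal sketch for illustration only; `shared_flag`, `set_flag_if`, and `read_flag` are hypothetical names, and the comment describes the rewrite that the thread treats as disallowed rather than anything a particular compiler is known to do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared location; static storage is zero-initialised.  */
static atomic_int shared_flag;

/* Stores 1 only on the path where COND holds.  */
void set_flag_if(bool cond)
{
    if (cond)
        atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);

    /* A rewrite along the lines of
           old = shared_flag; shared_flag = 1;
           if (!cond) shared_flag = old;
       would let other threads observe a store on a path where the
       abstract machine performs none.  That is the sort of visible
       speculative store the messages above describe as not allowed
       under the as-if rule.  */
}

/* Relaxed load of the shared location.  */
int read_flag(void)
{
    return atomic_load_explicit(&shared_flag, memory_order_relaxed);
}
```

Single-threaded, `set_flag_if(false)` leaves `read_flag()` at 0 and `set_flag_if(true)` makes it 1. The interesting property, that no other thread ever sees a transient 1 in the `false` case, is a guarantee of the memory model and cannot be shown by a single-threaded run.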
Building GCC with -Wmissing-declarations and addressing its warnings
Hi everyone,

I noticed that the GCC build process currently only uses the
-Wmissing-prototypes flag, and not the -Wmissing-declarations flag. It
seems that the former flag only works on C source files, which means that
GCC's source files no longer benefit from this flag as they are now C++
files. The right flag to use, in this case, is -Wmissing-declarations,
which works on both C and C++ source files.

I decided to build GCC with this flag to see what kinds of warnings popped
up, and to use these warnings to clean up the GCC source. I sifted through
all the new warnings generated by -Wmissing-declarations during the build
process and fixed the ones whose fixes were obvious.

Most of my fixes are on global (non-debug) functions that are only
referenced in the compilation unit in which they are defined. To fix these
functions and to silence their warnings I have simply given them static
linkage.

The rest of the fixes are on global function definitions whose declaration
exists in a header file that was not included by the source file. To fix
up these functions I simply included the relevant header file.

The -Wmissing-declarations warnings that I did _not_ address are those
emitted from the autogenerated gengtype header files, because their fixes
are not trivial to me. They look like:

In file included from ../../gcc/gcc/c/c-parser.c:14162:0:
./gt-c-c-parser.h: In function 'void gt_ggc_mx(c_token&)':
./gt-c-c-parser.h:50:1: warning: no previous declaration for 'void gt_ggc_mx(c_token&)' [-Wmissing-declarations]
 gt_ggc_mx (struct c_token& x_r ATTRIBUTE_UNUSED)

I have also not addressed some such warnings in predict.c and
config/i386/i386.c because their fixes are not trivial to me either.

Furthermore, I was not able to mark "static" any function that was used as
a template argument to hash_table::traverse() because it seems that C++98
requires template argument pointers to have external linkage. (The C++11
standard relaxes this restriction, it seems.) The file var-tracking.c has
many such functions.

Since I do not yet have a copyright assignment filed for GCC, I have
omitted an actual code patch and instead provide you with a changelog that
could be used to reconstruct the patch 100% if anyone is so inclined. Once
my copyright assignment is filed, I will properly submit this patch if
somebody else has not done so by then.

On a related note, would a patch to officially enable
-Wmissing-declarations in the build process be well regarded? Since
-Wmissing-prototypes is currently enabled, I assume it is the intention of
the GCC devs to address these warnings, and that during the transition from
a C to C++ bootstrap compiler a small oversight was made (that
-Wmissing-prototypes is a no-op against C++ source files).

If the answer to the previous question is "yes" then how would one go about
addressing the above gengtype-related warnings, if at all?

Thanks for your time,
Patrick

	* asan.c (asan_mem_ref_get_end): Make static.
	* calls.c: Include calls.h.
	* cfgexpand.c: Include cfgexpand.h.
	* cfgloop.c: Include tree-ssa-loop-niter.h.
	* cgraphunit.c (decide_is_symbol_needed): Make static.
	* config/i386/i386.c (make_pass_insert_vzeroupper): Likewise.
	(ix86_avx_emit_vzeroupper): Likewise.
	* dwarf2out.c (init_addr_table_entry): Likewise.
	* gimple-builder.c: Include gimple-builder.h.
	* gimple-ssa-isolate-paths.c (isolate_path): Make static.
	* graphite.c (graphite_transform_loops): Likewise.
	* internal-fn.c (ubsan_expand_si_overflow_addsub_check): Make static.
	(ubsan_expand_si_overflow_neg_check): Likewise.
	(ubsan_expand_si_overflow_mul_check): Likewise.
	* ipa-devirt.c (hash_type_name): Likewise.
	(likely_target_p): Likewise.
	* ipa-inline-analysis.c (simple_edge_hints): Likewise.
	* ipa-profile.c (cmp_counts): Likewise.
	(contains_hot_call_p): Likewise.
	* ipa-prop.c (ipa_alloc_node_params): Likewise.
	(write_agg_replacement_chain): Likewise.
	* ipa.c (can_replace_by_local_alias): Likewise.
	* lto-streamer-out.c (output_symbol_p): Likewise.
	* omp-low.c (simd_clone_vector_of_formal_parm_types): Likewise.
	* print-tree.c: Include print-tree.h.
	* stmt.c: Include stmt.h.
	* stringpool.c: Include stringpool.h.
	* tree-cfg-cleanup.c: Include tree-cfg-cleanup.h.
	* tree-inline.c (redirect_all_calls): Make static.
	(freqs_to_count): Likewise.
	* tree-nested.c: Include tree-nested.h.
	* tree-predcom.c (tree_predictive_commoning): Make static.
	* tree-sra.c (ipa_sra_modify_function_body): Likewise.
	* tree-ssa-loop-im.c (movement_possibility): Likewise.
	(tree_ssa_lim): Likewise.
	* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): Likewise.
	(tree_unroll_loops_completely): Likewise.
	* tree-ssa-loop-prefet
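
The two kinds of fix described in the message (giving a file-local function static linkage, and making a prior declaration visible) can be shown in a small C file. This is only an illustrative sketch: `twice`, `scale`, and the header `util.h` mentioned in the comment are hypothetical and are not taken from the GCC sources. It is written so that compiling it with `gcc -Wmissing-declarations -c` should not produce that warning.

```c
/* Stands in for `#include "util.h"`, a hypothetical header that would
   declare the public function.  With this declaration visible, the
   definition of scale() below has a previous declaration.  */
int scale(int x);

/* Helper referenced only in this file.  Without `static` it would be a
   global function with no previous declaration, which is what
   -Wmissing-declarations reports; internal linkage avoids that.  */
static int twice(int x)
{
    return 2 * x;
}

/* Public function: declared above (normally via the header).  */
int scale(int x)
{
    return twice(x) + 1;
}
```

Here `scale(3)` evaluates to 7. Removing `static` from `twice`, or removing the declaration of `scale`, is the situation the changelog entries above are meant to clean up.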
gcc-4.8-20140213 is now available
Snapshot gcc-4.8-20140213 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140213/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 207768 You'll find: gcc-4.8-20140213.tar.bz2 Complete GCC MD5=a1f395cafd66e403ab2d03c88a51125e SHA1=71d27c2e5536fd12d738d17a1acd002834e1b6b8 Diffs from 4.8-20140206 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
TYPE_BINFO and canonical types at LTO
Hi,
I have noticed that record_component_aliases is called during LTO time and it examines the contents of BINFO:

0x5cd7a5 record_component_aliases(tree_node*)
        ../../gcc/alias.c:1005
0x5cd4a9 get_alias_set(tree_node*)
        ../../gcc/alias.c:895
0x5cc67a component_uses_parent_alias_set_from(tree_node const*)
        ../../gcc/alias.c:548
0x5ccc42 reference_alias_ptr_type_1
        ../../gcc/alias.c:660
0x5ccf93 get_alias_set(tree_node*)
        ../../gcc/alias.c:740
0xb823d8 indirect_refs_may_alias_p
        ../../gcc/tree-ssa-alias.c:1125
0xb82d8d refs_may_alias_p_1(ao_ref*, ao_ref*, bool)
        ../../gcc/tree-ssa-alias.c:1279
0xb848df stmt_may_clobber_ref_p_1(gimple_statement_base*, ao_ref*)
        ../../gcc/tree-ssa-alias.c:2013
0xb85d27 walk_non_aliased_vuses(ao_ref*, tree_node*, void* (*)(ao_ref*, tree_node*, unsigned int, void*), void* (*)(ao_ref*, tree_node*, void*), void*)
        ../../gcc/tree-ssa-alias.c:2411
0xc509f3 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind, vn_reference_s**)
        ../../gcc/tree-ssa-sccvn.c:2063
0xc52ea4 visit_reference_op_store
        ../../gcc/tree-ssa-sccvn.c:2970
0xc55404 extract_and_process_scc_for_name
        ../../gcc/tree-ssa-sccvn.c:3825

This smells bad, since it is given a canonical type that is the result of the structural equivalency merging, which ignores BINFOs, so it may be a completely different class with completely different bases than the original. Bases are structurally merged, too, and may be exchanged for normal fields, because DECL_ARTIFICIAL (which separates bases from fields) does not seem to be part of the canonical type definition in LTO.

I wonder if that code is needed after all:

    case QUAL_UNION_TYPE:
      /* Recursively record aliases for the base classes, if there are any.  */
      if (TYPE_BINFO (type))
        {
          int i;
          tree binfo, base_binfo;

          for (binfo = TYPE_BINFO (type), i = 0;
               BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
            record_alias_subset (superset,
                                 get_alias_set (BINFO_TYPE (base_binfo)));
        }
      for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
        if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
          record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
      break;

All bases are also fields within the type, so the second loop should notice all the types seen by the first loop, if I am correct? So perhaps the first loop can be dropped.

Honza
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney wrote:
> > > >
> > > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > > combination with C11 atomics in most cases. The C11 atomics do cover
> > > > some of the reasons we currently use ACCESS_ONCE(), but not all of them --
> > > > in particular, it allows load/store merging.
> > >
> > > I really disagree with the "will need to use volatile".
> > >
> > > We should never need to use volatile (outside of whatever MMIO we do
> > > using C) if C11 defines atomics correctly.
> > >
> > > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > > it's called a cache - and there's no actual reason to think that
> > > "ACCESS_ONCE()" has to mean our current "volatile".
> > >
> > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > that they create visible semantics that are different from what a CPU
> > > cache already does, but that's a plain bug in the standard if so.
> > >
> > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > exactly because CPU's already do it, so claiming that the compiler
> > > couldn't do it is just insanity.
> >
> > Agreed, both CPUs and compilers can merge loads and stores. But CPUs
> > normally get their stores pushed through the store buffer in reasonable
> > time, and CPUs also use things like invalidations to ensure that a
> > store is seen in reasonable time by readers. Compilers don't always
> > have these two properties, so we do need to be more careful of load
> > and store merging by compilers.
>
> The standard's _wording_ is a little vague about forward-progress
> guarantees, but I believe the vast majority of the people involved do
> want compilers to not prevent forward progress. There is of course a
> difference whether a compiler establishes _eventual_ forward progress in
> the sense of after 10 years or forward progress in a small bounded
> interval of time, but this is a QoI issue, and good compilers won't want
> to introduce unnecessary latencies. I believe that it is fine if the
> standard merely talks about eventual forward progress.

The compiler will need to earn my trust on this one. ;-)

> > > Now, there are things that are *not* fine, like speculative stores
> > > that could be visible to other threads. Those are *bugs* (either in
> > > the compiler or in the standard), and anybody who claims otherwise is
> > > not worth discussing with.
> >
> > And as near as I can tell, volatile semantics are required in C11 to
> > avoid speculative stores. I might be wrong about this, and hope that
> > I am wrong. But I am currently not seeing it in the current standard.
> > (Though I expect that most compilers would avoid speculating stores,
> > especially in the near term.
>
> This really depends on how we define speculative stores. The memory
> model is absolutely clear that programs have to behave as if executed by
> the virtual machine, and that rules out speculative stores to volatiles
> and other locations. Under certain circumstances, there will be
> "speculative" stores in the sense that they will happen at different
> times as if you had a trivial implementation of the abstract machine.
> But to be allowed to do that, the compiler has to prove that such a
> transformation still fulfills the as-if rule.

Agreed, although the as-if rule would ignore control dependencies, since
these are not yet part of the standard (as you in fact note below).
I nevertheless consider myself at least somewhat reassured that current
C11 won't speculate stores. My remaining concerns involve the compiler
proving to itself that a given branch is always taken, thus motivating
it to optimize the branch away -- though this is more properly a
control-dependency concern.

> IOW, the abstract machine is what currently defines disallowed
> speculative stores. If you want to put *further* constraints on what
> implementations are allowed to do, I suppose it is best to talk about
> those and see how we can add rules that allow programmers to express
> those constraints. For example, control dependencies might be such a
> case. I don't have a specific suggestion -- maybe the control
> dependencies are best tackled similar to consume dependencies (even
> though we don't have a good solution for those yets). But using
> volatile accesses for that seems to be a big hammer, or even the wrong
> one.

In current compilers, the two hammers we have are volatile and barrier().
But yes, it would be good to have something more focused. One option
would be to propose memory_order_control loads to see how loudly the
committee screams. One use case might be as follows:

	if (atomic_load(x, memory_order_control))
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> > On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney wrote:
> > > > >
> > > > > Us Linux-kernel hackers will often need to use volatile semantics in
> > > > > combination with C11 atomics in most cases. The C11 atomics do cover
> > > > > some of the reasons we currently use ACCESS_ONCE(), but not all of
> > > > > them --
> > > > > in particular, it allows load/store merging.
> > > >
> > > > I really disagree with the "will need to use volatile".
> > > >
> > > > We should never need to use volatile (outside of whatever MMIO we do
> > > > using C) if C11 defines atomics correctly.
> > > >
> > > > Allowing load/store merging is *fine*. All sane CPU's do that anyway -
> > > > it's called a cache - and there's no actual reason to think that
> > > > "ACCESS_ONCE()" has to mean our current "volatile".
> > > >
> > > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > > that they create visible semantics that are different from what a CPU
> > > > cache already does, but that's a plain bug in the standard if so.
> > > >
> > > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > > exactly because CPU's already do it, so claiming that the compiler
> > > > couldn't do it is just insanity.
> > >
> > > Agreed, both CPUs and compilers can merge loads and stores. But CPUs
> > > normally get their stores pushed through the store buffer in reasonable
> > > time, and CPUs also use things like invalidations to ensure that a
> > > store is seen in reasonable time by readers. Compilers don't always
> > > have these two properties, so we do need to be more careful of load
> > > and store merging by compilers.
> >
> > The standard's _wording_ is a little vague about forward-progress
> > guarantees, but I believe the vast majority of the people involved do
> > want compilers to not prevent forward progress. There is of course a
> > difference whether a compiler establishes _eventual_ forward progress in
> > the sense of after 10 years or forward progress in a small bounded
> > interval of time, but this is a QoI issue, and good compilers won't want
> > to introduce unnecessary latencies. I believe that it is fine if the
> > standard merely talks about eventual forward progress.
>
> The compiler will need to earn my trust on this one. ;-)
>
> > > > Now, there are things that are *not* fine, like speculative stores
> > > > that could be visible to other threads. Those are *bugs* (either in
> > > > the compiler or in the standard), and anybody who claims otherwise is
> > > > not worth discussing with.
> > >
> > > And as near as I can tell, volatile semantics are required in C11 to
> > > avoid speculative stores. I might be wrong about this, and hope that
> > > I am wrong. But I am currently not seeing it in the current standard.
> > > (Though I expect that most compilers would avoid speculating stores,
> > > especially in the near term.
> >
> > This really depends on how we define speculative stores. The memory
> > model is absolutely clear that programs have to behave as if executed by
> > the virtual machine, and that rules out speculative stores to volatiles
> > and other locations. Under certain circumstances, there will be
> > "speculative" stores in the sense that they will happen at different
> > times as if you had a trivial implementation of the abstract machine.
> > But to be allowed to do that, the compiler has to prove that such a
> > transformation still fulfills the as-if rule.
>
> Agreed, although the as-if rule would ignore control dependencies, since
> these are not yet part of the standard (as you in fact note below).
> I nevertheless consider myself at least somewhat reassured that current
> C11 won't speculate stores. My remaining concerns involve the compiler
> proving to itself that a given branch is always taken, thus motivating
> it to optimize the branch away -- though this is more properly a
> control-dependency concern.
>
> > IOW, the abstract machine is what currently defines disallowed
> > speculative stores. If you want to put *further* constraints on what
> > implementations are allowed to do, I suppose it is best to talk about
> > those and see how we can add rules that allow programmers to express
> > those constraints. For example, control dependencies might be such a
> > case. I don't have a specific suggestion -- maybe the control
> > dependencies are best tackled similar to consume dependencies (even
> > though we don't have a good solution for those yets). But using
> > volatile accesses for that seems to be a big hammer, or even the wrong
> > one.
>
> In current compilers, the two hammers we have are volatile and barrier().
> But yes, it would be goo
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, 2014-02-12 at 10:19 +0100, Peter Zijlstra wrote:
> > I don't know the specifics of your example, but from how I understand
> > it, I don't see a problem if the compiler can prove that the store will
> > always happen.
> >
> > To be more specific, if the compiler can prove that the store will
> > happen anyway, and the region of code can be assumed to always run
> > atomically (e.g., there's no loop or such in there), then it is known
> > that we have one atomic region of code that will always perform the
> > store, so we might as well do the stuff in the region in some order.
> >
> > Now, if any of the memory accesses are atomic, then the whole region of
> > code containing those accesses is often not atomic because other threads
> > might observe intermediate results in a data-race-free way.
> >
> > (I know that this isn't a very precise formulation, but I hope it brings
> > my line of reasoning across.)
>
> So given something like:
>
>   if (x)
>     y = 3;
>
> assuming both x and y are atomic (so don't gimme crap for now knowing
> the C11 atomic incantations); and you can prove x is always true; you
> don't see a problem with not emitting the conditional?

That depends on what your goal is. It would be correct as far as the
standard is concerned; this makes sense if all you want is indeed a
program that does what the abstract machine might do, and produces the
same output / side effects.

If you're trying to preserve the branch in the code emitted / executed
by the implementation, then it would not be correct. But those branches
aren't specified as being part of the observable side effects. In the
common case, this makes sense because it enables optimizations that are
useful; this line of reasoning also allows the compiler to merge some
atomic accesses in the way that Linus would like to see it.

> Avoiding the conditional changes the result; see that control dependency
> email from earlier.

It does not, going by how the standard defines "result".
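
For concreteness, the "if (x) y = 3;" fragment quoted above might be spelled with C11 atomics roughly as follows. This is only a sketch under assumptions: the function name store_if_set is made up, x and y are placeholder globals, and the choice of memory_order_relaxed is an assumption, since the thread does not say which ordering was meant.

```c
#include <stdatomic.h>

/* Hypothetical shared variables standing in for x and y in the example. */
static atomic_int x;
static atomic_int y;

/* Load x and, if it is nonzero, store 3 to y. Both accesses are relaxed,
 * so the only thing ordering the store after the load is the branch.
 * Returns the value loaded from x so that a caller can inspect it. */
static int store_if_set(void)
{
    int seen = atomic_load_explicit(&x, memory_order_relaxed);

    if (seen)
        atomic_store_explicit(&y, 3, memory_order_relaxed);
    return seen;
}
```

The disagreement in this exchange is about whether the branch in such a function must survive into the generated code when the compiler can prove that x is always nonzero; the sketch only pins down the source-level form being argued about.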
> In the above example the load of X and the store to
> Y are strictly ordered, due to control dependencies. Not emitting the
> condition and maybe not even emitting the load completely wrecks this.

I think you're trying to solve this backwards. You are looking at this
with an implicit wishlist of what the compiler should do (or how you
want to use the hardware), but this is not a viable specification that
one can write a compiler against.

We do need clear rules for what the compiler is and is not allowed to
do (e.g., a memory model that models multi-threaded executions).
Otherwise it's all hand-waving, and we're getting nowhere. Thus, the way
to approach this is to propose a feature or change to the standard, make
sure that this is consistent and has no unintended side effects for
other aspects of compilation or other code, and then ask the compiler to
implement it. IOW, we need a patch for where this all starts: in the
rules and requirements for compilation.

Paul and I are at the C++ meeting currently, and we had sessions in
which the concurrency study group talked about memory model issues like
dependency tracking and memory_order_consume. Paul shared uses of
atomics (or the like) in the kernel, and we discussed how the memory
model currently handles various cases and why, how one could express
other requirements consistently, and what is actually implementable in
practice. I can't speak for Paul, but I thought those discussions were
productive.

> Its therefore an invalid optimization to take out the conditional or
> speculate the store, since it takes out the dependency.