Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
2015-03-24 17:06 GMT+03:00 Jakub Jelinek :
> On Tue, Mar 24, 2015 at 12:22:27PM +0300, Ilya Enkovich wrote:
>> 2015-03-24 11:33 GMT+03:00 Jakub Jelinek :
>> > On Thu, Mar 19, 2015 at 11:29:44AM +0300, Ilya Enkovich wrote:
>> >> +      /* We might propagate instrumented function pointer into
>> >> +         not instrumented function and vice versa.  In such a
>> >> +         case we need to either fix function declaration or
>> >> +         remove bounds from call statement.  */
>> >> +      if (callee)
>> >> +        skip_bounds = chkp_redirect_edge (e);
>> >
>> > I just want to say that I'm not really excited about all this compile time
>> > cost that is added everywhere unconditionally for chkp.
>> > I think much better would be to guard most of it with proper option check
>> > first and only do the more expensive part if the option has been used.
>>
>> Agreed, overhead for non-instrumented code should be minimized.
>> Unfortunately there is no option check I can use to guard chkp code
>> due to LTO. Currently it is allowed to pass -fcheck-pointer-bounds for
>> IL generation and not pass it for final code generation. I suppose I
>> may set this (or some new) flag if I see an instrumented node when
>> reading the cgraph and then use it to guard chkp-related code. Would
>> that be acceptable?
>
> The question is what you want to do in the LTO case for the different cases,
> in particular a TU compiled with -fcheck-pointer-bounds and LTO link without
> that, or TU compiled without -fcheck-pointer-bounds and LTO link with it.
> It could be handled as LTO incompatible option, where lto1 would error out
> if you try to mix -fcheck-pointer-bounds with -fno-check-pointer-bounds
> code, or e.g. similar to var-tracking, you could consider adjusting the IL
> upon LTO reading if some TU has been built with -fcheck-pointer-bounds
> and the LTO link is -fno-check-pointer-bounds. Dunno what will happen
> with -fno-check-pointer-bounds TUs LTO linked with -fcheck-pointer-bounds.
> Or another possibility is to or in -fcheck-pointer-bounds from all TUs.

Mixing instrumented and non-instrumented TUs is allowed. All
instrumentation passes happen before the LTO link. The only code
generation problem when instrumented code is linked without
-fcheck-pointer-bounds is the disabled chkp_finish_file call, which
generates static constructors. I think I should just set
flag_check_pointer_bounds if I see any instrumented symbol on LTO read.
That would trigger the chkp_finish_file call when required and would be
available as a guard for chkp-related code.

>
>> Maybe replace attribute usage with a new flag in tree_decl_with_vis
>> structure?
>
> Depends, might be better to stick it into cgraph_node instead, depends on
> whether you are querying it already early in the FEs or just during GIMPLE
> when the cgraph node should be created too.

A flag in cgraph_node should work. I'll have a look.

Thanks,
Ilya

>
> Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 11:05:17AM +0300, Ilya Enkovich wrote:
> > The question is what you want to do in the LTO case for the different cases,
> > in particular a TU compiled with -fcheck-pointer-bounds and LTO link without
> > that, or TU compiled without -fcheck-pointer-bounds and LTO link with it.
> > It could be handled as LTO incompatible option, where lto1 would error out
> > if you try to mix -fcheck-pointer-bounds with -fno-check-pointer-bounds
> > code, or e.g. similar to var-tracking, you could consider adjusting the IL
> > upon LTO reading if some TU has been built with -fcheck-pointer-bounds
> > and the LTO link is -fno-check-pointer-bounds. Dunno what will happen
> > with -fno-check-pointer-bounds TUs LTO linked with -fcheck-pointer-bounds.
> > Or another possibility is to or in -fcheck-pointer-bounds from all TUs.
>
> Mixing instrumented and non-instrumented TUs is allowed. All
> instrumentation passes happen before the LTO link. The only code
> generation problem when instrumented code is linked without
> -fcheck-pointer-bounds is the disabled chkp_finish_file call, which
> generates static constructors. I think I should just set
> flag_check_pointer_bounds if I see any instrumented symbol on LTO read.
> That would trigger the chkp_finish_file call when required and would be
> available as a guard for chkp-related code.

Thus perhaps ORing the flag_check_pointer_bounds option from all the TUs is
the desirable behavior for LTO?
I think Richard or Honza would know where the best spot to do that would be.

Jakub
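For readers unfamiliar with the idea of "ORing" an option across translation units, here is a minimal sketch. It is not GCC code: `TuOptions` and `merged_check_pointer_bounds` are hypothetical names invented for illustration, and GCC's real LTO option merging works on streamed option records rather than a simple vector. The point is only that the link-time flag becomes true as soon as any single TU was built with instrumentation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-TU option record (an assumption for illustration;
// GCC's real LTO option streaming looks different).
struct TuOptions {
  bool check_pointer_bounds;  // stands in for -fcheck-pointer-bounds
};

// OR the flag across all translation units: the merged flag is set as
// soon as at least one TU was compiled with instrumentation enabled.
bool merged_check_pointer_bounds(const std::vector<TuOptions> &tus) {
  bool merged = false;
  for (const TuOptions &tu : tus)
    merged = merged || tu.check_pointer_bounds;
  return merged;
}
```

With this policy a link of instrumented and non-instrumented TUs behaves as if -fcheck-pointer-bounds had been given at link time, while a link of only non-instrumented TUs keeps the flag off and can skip the chkp-related work.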
Re: [PATCH][3/3][PR65460] Mark offloaded functions as parallelized
Hi Tom! On Sat, 21 Mar 2015 23:30:51 +0100, Tom de Vries wrote: > On 20-03-15 12:38, Tom de Vries wrote: > > On 19-03-15 12:05, Tom de Vries wrote: > >> On 18-03-15 18:22, Tom de Vries wrote: > >>> this patch fixes PR65460. > >>> > >>> The patch marks offloaded functions as parallelized, which means the > >>> parloops > >>> pass no longer attempts to modify that function. > >> > >> Updated patch to postpone mark_parallelized_function until the > >> corresponding > >> cgraph_node is available, to ensure it works with the updated > >> mark_parallelized_function from patch 2/3. > > > > Updated to eliminate mark_parallelized_function. > > > > Bootstrapped and reg-tested on x86_64. > > > > OK for stage4? > > as requested, applied to gomp-4_0-branch. Thanks! Committed to gomp-4_0-branch in r221652: commit 68c0851cb7ce420d5d938d7f0d9247adf79190a5 Author: tschwinge Date: Wed Mar 25 08:28:09 2015 + Use ChangeLog.gomp on gomp-4_0-branch. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@221652 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 34 -- gcc/ChangeLog.gomp | 34 ++ 2 files changed, 34 insertions(+), 34 deletions(-) diff --git gcc/ChangeLog gcc/ChangeLog index 48dca87..e474fc8 100644 --- gcc/ChangeLog +++ gcc/ChangeLog @@ -1,37 +1,3 @@ -2015-03-21 Tom de Vries - - PR tree-optimization/65460 - * omp-low.c (expand_omp_target): Set parallelized_function on - cgraph_node for child_fn. - -2015-03-21 Tom de Vries - - backport from trunk: - 2015-03-21 Tom de Vries - - PR tree-optimization/65458 - * cgraph.c (cgraph_node::dump): Handle parallelized_function field. - * cgraph.h (cgraph_node): Add parallelized_function field. - * lto-cgraph.c (lto_output_node): Write parallelized_function field. - (input_overwrite_node): Read parallelized_function field. - * omp-low.c (expand_omp_taskreg, finalize_task_copyfn): Set - parallelized_function on cgraph_node for child_fn. - * tree-parloops.c: Add include of plugin-api.h, ipa-ref.h and cgraph.h. 
- Remove include of gt-tree-parloops.h. - (parallelized_functions): Remove static variable. - (parallelized_function_p): Rewrite using parallelized_function field of - cgraph_node. - (create_loop_fn): Remove adding to parallelized_functions. - * Makefile.in (GTFILES): Remove tree-parloops.c - -2015-03-21 Tom de Vries - - backport from trunk: - 2015-03-18 Tom de Vries - - * tree-parloops.c (parallelize_loops): Make static. - * tree-parloops.h (parallelize_loops): Remove extern declaration. - 2015-03-11 Thomas Schwinge * config/nvptx/nvptx.h (LIBSTDCXX): Define to "gcc". diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 6ed6962..b499d04 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,37 @@ +2015-03-21 Tom de Vries + + PR tree-optimization/65460 + * omp-low.c (expand_omp_target): Set parallelized_function on + cgraph_node for child_fn. + +2015-03-21 Tom de Vries + + backport from trunk: + 2015-03-21 Tom de Vries + + PR tree-optimization/65458 + * cgraph.c (cgraph_node::dump): Handle parallelized_function field. + * cgraph.h (cgraph_node): Add parallelized_function field. + * lto-cgraph.c (lto_output_node): Write parallelized_function field. + (input_overwrite_node): Read parallelized_function field. + * omp-low.c (expand_omp_taskreg, finalize_task_copyfn): Set + parallelized_function on cgraph_node for child_fn. + * tree-parloops.c: Add include of plugin-api.h, ipa-ref.h and cgraph.h. + Remove include of gt-tree-parloops.h. + (parallelized_functions): Remove static variable. + (parallelized_function_p): Rewrite using parallelized_function field of + cgraph_node. + (create_loop_fn): Remove adding to parallelized_functions. + * Makefile.in (GTFILES): Remove tree-parloops.c + +2015-03-21 Tom de Vries + + backport from trunk: + 2015-03-18 Tom de Vries + + * tree-parloops.c (parallelize_loops): Make static. + * tree-parloops.h (parallelize_loops): Remove extern declaration. + 2015-01-13 Thomas Schwinge * tree-core.h: Don't include "gomp-constants.h". 
Grüße,
Thomas
[patch, nios2, committed] Fix nios2-linux crti/crtn settings
We appear to have erroneously set 'extra_parts' in nios2-linux libgcc to
include the crti.o/crtn.o files intended for nios2 EABI. This still largely
worked, which is why we haven't noticed it until now, except that some
features like gprof profiling weren't properly set up.

This patch removes the extra_parts setting for nios2-linux libgcc; now
crti.o/crtn.o resolve to the correct ones provided by glibc.

Chung-Lin

2015-03-25  Chung-Lin Tang

        libgcc/
        * config.host (nios2-*-linux*): Remove 'extra_parts' setting.

Index: config.host
===
--- config.host (revision 221651)
+++ config.host (working copy)
@@ -943,7 +943,6 @@ nds32*-elf*)
        ;;
 nios2-*-linux*)
        tmake_file="$tmake_file nios2/t-nios2 nios2/t-linux t-libgcc-pic t-slibgcc-libgcc"
-       extra_parts="$extra_parts crti.o crtn.o"
        md_unwind_header=nios2/linux-unwind.h
        ;;
 nios2-*-*)
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
2015-03-24 17:40 GMT+03:00 Richard Biener : > On Tue, Mar 24, 2015 at 3:06 PM, Jakub Jelinek wrote: >> On Tue, Mar 24, 2015 at 12:22:27PM +0300, Ilya Enkovich wrote: >> >> The question is what you want to do in the LTO case for the different cases, >> in particular a TU compiled with -fcheck-pointer-bounds and LTO link without >> that, or TU compiled without -fcheck-pointer-bounds and LTO link with it. >> It could be handled as LTO incompatible option, where lto1 would error out >> if you try to mix -fcheck-pointer-bounds with -fno-check-pointer-bounds >> code, or e.g. similar to var-tracking, you could consider adjusting the IL >> upon LTO reading if if some TU has been built with -fcheck-pointer-bounds >> and the LTO link is -fno-check-pointer-bounds. Dunno what will happen >> with -fno-check-pointer-bounds TUs LTO linked with -fcheck-pointer-bounds. >> Or another possibility is to or in -fcheck-pointer-bounds from all TUs. >> >>> Maybe replace attribute usage with a new flag in tree_decl_with_vis >>> structure? >> >> Depends, might be better to stick it into cgraph_node instead, depends on >> whether you are querying it already early in the FEs or just during GIMPLE >> when the cgraph node should be created too. > > I also wonder why it is necessary to execute pass_chkp_instrumentation_passes > when mpx is not active. > > That is, can we guard that properly in > > void > pass_manager::execute_early_local_passes () > { > execute_pass_list (cfun, pass_build_ssa_passes_1->sub); > execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); > execute_pass_list (cfun, pass_local_optimization_passes_1->sub); > } I'm worried about new functions generated in LTO. But with re-created flag_check_pointer_bounds it should be safe to guard it. > > (why's that so oddly wrapped?) > > class pass_chkp_instrumentation_passes > > also has no gate that guards with flag_mpx or so. > > That would save a IL walk over all functions (fixup_cfg) and a cgraph > edge rebuild. Right. 
Will fix it. Thanks, Ilya > > Richard. > >> Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
2015-03-25 11:16 GMT+03:00 Jakub Jelinek :
> On Wed, Mar 25, 2015 at 11:05:17AM +0300, Ilya Enkovich wrote:
>> > The question is what you want to do in the LTO case for the different
>> > cases, in particular a TU compiled with -fcheck-pointer-bounds and LTO
>> > link without that, or TU compiled without -fcheck-pointer-bounds and
>> > LTO link with it.
>> > It could be handled as LTO incompatible option, where lto1 would error out
>> > if you try to mix -fcheck-pointer-bounds with -fno-check-pointer-bounds
>> > code, or e.g. similar to var-tracking, you could consider adjusting the IL
>> > upon LTO reading if some TU has been built with -fcheck-pointer-bounds
>> > and the LTO link is -fno-check-pointer-bounds. Dunno what will happen
>> > with -fno-check-pointer-bounds TUs LTO linked with -fcheck-pointer-bounds.
>> > Or another possibility is to or in -fcheck-pointer-bounds from all TUs.
>>
>> Mixing instrumented and non-instrumented TUs is allowed. All
>> instrumentation passes happen before the LTO link. The only code
>> generation problem when instrumented code is linked without
>> -fcheck-pointer-bounds is the disabled chkp_finish_file call, which
>> generates static constructors. I think I should just set
>> flag_check_pointer_bounds if I see any instrumented symbol on LTO read.
>> That would trigger the chkp_finish_file call when required and would be
>> available as a guard for chkp-related code.
>
> Thus perhaps ORing the flag_check_pointer_bounds option from all the TUs is
> the desirable behavior for LTO?
> I think Richard or Honza would know where the best spot to do that would be.
>
> Jakub

Is such ORing already used for some other flag, so that I could use it as
an example?

Thanks,
Ilya
Re: [PATCH] Rewrite lto streamer DFS from recursion to worklist (PR lto/65515)
On Tue, 24 Mar 2015, Jakub Jelinek wrote: > On Tue, Mar 24, 2015 at 04:19:46PM +0100, Jakub Jelinek wrote: > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Also tested with > ../configure --with-build-config=bootstrap-lto > --enable-languages=c,c++,fortran,objc,obj-c++,go > make -j16; make -j16 -k check > on x86_64-linux, no regressions. Ok. Thanks, Richard.
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 9:50 AM, Ilya Enkovich wrote: > 2015-03-24 17:40 GMT+03:00 Richard Biener : >> On Tue, Mar 24, 2015 at 3:06 PM, Jakub Jelinek wrote: >>> On Tue, Mar 24, 2015 at 12:22:27PM +0300, Ilya Enkovich wrote: >>> >>> The question is what you want to do in the LTO case for the different cases, >>> in particular a TU compiled with -fcheck-pointer-bounds and LTO link without >>> that, or TU compiled without -fcheck-pointer-bounds and LTO link with it. >>> It could be handled as LTO incompatible option, where lto1 would error out >>> if you try to mix -fcheck-pointer-bounds with -fno-check-pointer-bounds >>> code, or e.g. similar to var-tracking, you could consider adjusting the IL >>> upon LTO reading if if some TU has been built with -fcheck-pointer-bounds >>> and the LTO link is -fno-check-pointer-bounds. Dunno what will happen >>> with -fno-check-pointer-bounds TUs LTO linked with -fcheck-pointer-bounds. >>> Or another possibility is to or in -fcheck-pointer-bounds from all TUs. >>> Maybe replace attribute usage with a new flag in tree_decl_with_vis structure? >>> >>> Depends, might be better to stick it into cgraph_node instead, depends on >>> whether you are querying it already early in the FEs or just during GIMPLE >>> when the cgraph node should be created too. >> >> I also wonder why it is necessary to execute pass_chkp_instrumentation_passes >> when mpx is not active. >> >> That is, can we guard that properly in >> >> void >> pass_manager::execute_early_local_passes () >> { >> execute_pass_list (cfun, pass_build_ssa_passes_1->sub); >> execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >> execute_pass_list (cfun, pass_local_optimization_passes_1->sub); >> } > > I'm worried about new functions generated in LTO. But with re-created > flag_check_pointer_bounds it should be safe to guard it. > >> >> (why's that so oddly wrapped?) >> >> class pass_chkp_instrumentation_passes >> >> also has no gate that guards with flag_mpx or so. 
>> >> That would save a IL walk over all functions (fixup_cfg) and a cgraph >> edge rebuild. > > Right. Will fix it. I am already testing Index: gcc/passes.c === --- gcc/passes.c(revision 221633) +++ gcc/passes.c(working copy) @@ -156,7 +156,8 @@ void pass_manager::execute_early_local_passes () { execute_pass_list (cfun, pass_build_ssa_passes_1->sub); - execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); + if (flag_check_pointer_bounds) +execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); execute_pass_list (cfun, pass_local_optimization_passes_1->sub); } @@ -424,7 +425,8 @@ public: virtual bool gate (function *) { /* Don't bother doing anything if the program has errors. */ - return (!seen_error () && !in_lto_p); + return (flag_check_pointer_bounds + && !seen_error () && !in_lto_p); } }; // class pass_chkp_instrumentation_passes Richard. > Thanks, > Ilya > >> >> Richard. >> >>> Jakub
Re: [Patch, Fortran, pr60322] was: [Patch 1/2, Fortran, pr60322] [OOP] Incorrect bounds on polymorphic dummy array
Hi Andre,

> On 24 March 2015 at 18:06, Andre Vehreschild wrote:
>
> Hi all,
>
> I have worked on the comments Mikael gave me. I am now checking for
> class_pointer in the way he pointed out.
>
> Furthermore, I *joined the two parts* of the patch into this one, because
> keeping both in sync was no benefit but only tedious and did not prove to be
> reviewed faster.

Are you sure that you attached the right patch? It does not apply on a clean
tree unless I first apply the patch at
https://gcc.gnu.org/ml/fortran/2015-02/msg00105.html
with minor surgery for gcc/fortran/expr.c.

> Paul, Dominique: I have addressed the LOC issue that came up lately. Or rather
> the patch addressed it already. I feel like this is not tested very well, not
> the loc() call nor the sizeof() call as given in the 57305 second's download.

The ICE is fixed and the LOC issue seems fixed.

> Unfortunately, that download is not runnable. I would love to see a test
> similar to that download, but couldn't come up with one that satisfied me.
> Given that the patch's review will last some days, I still have enough time
> to come up with something beautiful which I will add then.

I have changed the test to

  use iso_c_binding
  implicit none
  real, target :: e
  class(*), allocatable, target :: a(:)
  e = 1.0
  call add_element_poly(a,e)
  print *, size(a)
  call add_element_poly(a,e)
  print *, size(a)
  select type (a)
    type is (real)
      print *, a
  end select
contains
  subroutine add_element_poly(a,e)
    use iso_c_binding
    class(*),allocatable,intent(inout),target :: a(:)
    class(*),intent(in),target :: e
    class(*),allocatable,target :: tmp(:)
    type(c_ptr) :: dummy
    interface
      function memcpy(dest,src,n) bind(C,name="memcpy") result(res)
        import
        type(c_ptr) :: res
        integer(c_intptr_t),value :: dest
        integer(c_intptr_t),value :: src
        integer(c_size_t),value :: n
      end function
    end interface
    if (.not.allocated(a)) then
      allocate(a(1), source=e)
    else
      print *, size(a)
      allocate(tmp(size(a)),source=a)
      print *, size(a), size(tmp) + 1
      print *, loc(a(1)),loc(tmp),sizeof(tmp)
      deallocate(a)
      !allocate(a(size(tmp)+1),mold=e)
      allocate(a(size(tmp)+1),source=e)
      print *, size(a), size(tmp)
      dummy = memcpy(loc(a(1)),loc(tmp),sizeof(tmp))
      dummy = memcpy(loc(a(size(tmp)+1)),loc(e),sizeof(e))
    end if
  end subroutine
end

As pointed out by Paul, I get a segfault at run time if I use the commented
line, i.e. 'mold' instead of 'source'.

> Bootstraps and regtests ok on x86_64-linux-gnu/F20.
>
> Regards,
> Andre

Thanks for your work.

Dominique
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 10:38:56AM +0100, Richard Biener wrote: > --- gcc/passes.c(revision 221633) > +++ gcc/passes.c(working copy) > @@ -156,7 +156,8 @@ void > pass_manager::execute_early_local_passes () > { >execute_pass_list (cfun, pass_build_ssa_passes_1->sub); > - execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); > + if (flag_check_pointer_bounds) > +execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >execute_pass_list (cfun, pass_local_optimization_passes_1->sub); > } > > @@ -424,7 +425,8 @@ public: >virtual bool gate (function *) > { >/* Don't bother doing anything if the program has errors. */ > - return (!seen_error () && !in_lto_p); > + return (flag_check_pointer_bounds > + && !seen_error () && !in_lto_p); > } > > }; // class pass_chkp_instrumentation_passes There is still the wasteful pass_fixup_cfg at the start of: PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes) NEXT_PASS (pass_fixup_cfg); which wasn't there before chkp. Perhaps this should be a different pass with the same execute method, but gate containing flag_check_pointer_bounds? Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
2015-03-25 12:50 GMT+03:00 Jakub Jelinek : > On Wed, Mar 25, 2015 at 10:38:56AM +0100, Richard Biener wrote: >> --- gcc/passes.c(revision 221633) >> +++ gcc/passes.c(working copy) >> @@ -156,7 +156,8 @@ void >> pass_manager::execute_early_local_passes () >> { >>execute_pass_list (cfun, pass_build_ssa_passes_1->sub); >> - execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >> + if (flag_check_pointer_bounds) >> +execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >>execute_pass_list (cfun, pass_local_optimization_passes_1->sub); >> } >> >> @@ -424,7 +425,8 @@ public: >>virtual bool gate (function *) >> { >>/* Don't bother doing anything if the program has errors. */ >> - return (!seen_error () && !in_lto_p); >> + return (flag_check_pointer_bounds >> + && !seen_error () && !in_lto_p); >> } >> >> }; // class pass_chkp_instrumentation_passes > > There is still the wasteful pass_fixup_cfg at the start of: > PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes) > NEXT_PASS (pass_fixup_cfg); > which wasn't there before chkp. Perhaps this should be a different > pass with the same execute method, but gate containing > flag_check_pointer_bounds? IIRC the reason for this pass was a different passes split, not instrumentation itself. Previously function processing always started with pass_fixup_cfg. Splitting processing into three stages we got three pass_fixup_cfg passes. Ilya > > Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 01:06:46PM +0300, Ilya Enkovich wrote: > > There is still the wasteful pass_fixup_cfg at the start of: > > PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes) > > NEXT_PASS (pass_fixup_cfg); > > which wasn't there before chkp. Perhaps this should be a different > > pass with the same execute method, but gate containing > > flag_check_pointer_bounds? > > IIRC the reason for this pass was a different passes split, not > instrumentation itself. Previously function processing always started > with pass_fixup_cfg. Splitting processing into three stages we got > three pass_fixup_cfg passes. Sure, but it would be really nice if for !flag_check_pointer_bounds we really could have just one stage again, rather than 3. When it is a global option, and for LTO ideally ored in from all the TUs, that shouldn't be that hard... Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 10:50 AM, Jakub Jelinek wrote: > On Wed, Mar 25, 2015 at 10:38:56AM +0100, Richard Biener wrote: >> --- gcc/passes.c(revision 221633) >> +++ gcc/passes.c(working copy) >> @@ -156,7 +156,8 @@ void >> pass_manager::execute_early_local_passes () >> { >>execute_pass_list (cfun, pass_build_ssa_passes_1->sub); >> - execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >> + if (flag_check_pointer_bounds) >> +execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub); >>execute_pass_list (cfun, pass_local_optimization_passes_1->sub); >> } >> >> @@ -424,7 +425,8 @@ public: >>virtual bool gate (function *) >> { >>/* Don't bother doing anything if the program has errors. */ >> - return (!seen_error () && !in_lto_p); >> + return (flag_check_pointer_bounds >> + && !seen_error () && !in_lto_p); >> } >> >> }; // class pass_chkp_instrumentation_passes > > There is still the wasteful pass_fixup_cfg at the start of: > PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes) > NEXT_PASS (pass_fixup_cfg); > which wasn't there before chkp. Perhaps this should be a different > pass with the same execute method, but gate containing > flag_check_pointer_bounds? That's not wasteful but required due to local_pure_const. The remaining wasteful fixup_cfg is the one in pass_build_ssa_passes. ISTR that pass_ipa_chkp_versioning/early_produce_thunks makes that one required? Or EH / CFG cleanup stuff makes it necessary to not fail IL checking done by into-SSA. Richard. > Jakub
Re: [Patch, fortran] PR65532 shape mismatch error with data partial initialization
On 24/03/2015 at 23:39, Mikael Morin wrote:
> The patch I propose here adds a flag to remember the function has been
> called, and skip it the second time.
> I considered reusing the existing 'resolved' field, but I had to
> slightly change its semantics to prevent regressing somewhere, and I was
> not completely sure how safe that change was.
> I have finally preferred this safer patch keeping the existing field
> completely untouched.
>
> Regression tested on x86_64-unknown-linux-gnu. OK for trunk?
>
I have committed the patch as obvious as revision 221657.
If anyone wants to debate it, the discussion remains open.

Mikael
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
On Wed, Mar 25, 2015 at 11:11 AM, Jakub Jelinek wrote:
> On Wed, Mar 25, 2015 at 01:06:46PM +0300, Ilya Enkovich wrote:
>> > There is still the wasteful pass_fixup_cfg at the start of:
>> >   PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes)
>> >       NEXT_PASS (pass_fixup_cfg);
>> > which wasn't there before chkp. Perhaps this should be a different
>> > pass with the same execute method, but gate containing
>> > flag_check_pointer_bounds?
>>
>> IIRC the reason for this pass was a different passes split, not
>> instrumentation itself. Previously function processing always started
>> with pass_fixup_cfg. Splitting processing into three stages we got
>> three pass_fixup_cfg passes.
>
> Sure, but it would be really nice if for !flag_check_pointer_bounds
> we really could have just one stage again, rather than 3.
> When it is a global option, and for LTO ideally ored in from all the TUs,
> that shouldn't be that hard...

LTO doesn't even run all this stuff, as it only runs before LTO streaming.

I don't think we want to go back to not going into SSA for all functions
before early-opts (esp. early inlining). Which unfortunately won't get
the EH cleanup related benefits.

Btw, execute_fixup_cfg can be optimized as well - edge purging only needs
to be done for the last stmt of a BB.

Richard.

> Jakub
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
2015-03-25 13:15 GMT+03:00 Richard Biener :
> On Wed, Mar 25, 2015 at 10:50 AM, Jakub Jelinek wrote:
>> On Wed, Mar 25, 2015 at 10:38:56AM +0100, Richard Biener wrote:
>>> --- gcc/passes.c(revision 221633)
>>> +++ gcc/passes.c(working copy)
>>> @@ -156,7 +156,8 @@ void
>>>  pass_manager::execute_early_local_passes ()
>>>  {
>>>    execute_pass_list (cfun, pass_build_ssa_passes_1->sub);
>>> -  execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub);
>>> +  if (flag_check_pointer_bounds)
>>> +    execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub);
>>>    execute_pass_list (cfun, pass_local_optimization_passes_1->sub);
>>>  }
>>>
>>> @@ -424,7 +425,8 @@ public:
>>>    virtual bool gate (function *)
>>>      {
>>>        /* Don't bother doing anything if the program has errors.  */
>>> -      return (!seen_error () && !in_lto_p);
>>> +      return (flag_check_pointer_bounds
>>> +              && !seen_error () && !in_lto_p);
>>>      }
>>>
>>>  }; // class pass_chkp_instrumentation_passes
>>
>> There is still the wasteful pass_fixup_cfg at the start of:
>>   PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes)
>>       NEXT_PASS (pass_fixup_cfg);
>> which wasn't there before chkp. Perhaps this should be a different
>> pass with the same execute method, but gate containing
>> flag_check_pointer_bounds?
>
> That's not wasteful but required due to local_pure_const. The remaining
> wasteful fixup_cfg is the one in pass_build_ssa_passes. ISTR
> that pass_ipa_chkp_versioning/early_produce_thunks makes that one
> required? Or EH / CFG cleanup stuff makes it necessary to not
> fail IL checking done by into-SSA.

These two chkp passes don't modify function bodies (they may remove them,
though). I don't expect them to require a following fixup_cfg.

Ilya

>
> Richard.
>
>> Jakub
Re: [PATCH] Fix PR65538
On 03/25/2015 12:37 AM, Jan Hubicka wrote:
>> On Tue, Mar 24, 2015 at 10:54:25PM +0100, Martin Liška wrote:
>>> --- a/gcc/symbol-summary.h
>>> +++ b/gcc/symbol-summary.h
>>> @@ -81,6 +81,12 @@ public:
>>>      m_symtab_insertion_hook = NULL;
>>>      m_symtab_removal_hook = NULL;
>>>      m_symtab_duplication_hook = NULL;
>>> +
>>> +    /* Release all summaries in case we use non-GGC memory.  */
>>> +    typedef typename hash_map ::iterator map_iterator;
>>> +    if (!m_ggc)
>>> +      for (map_iterator it = m_map.begin (); it != m_map.end (); ++it)
>>> +        release ((*it).second);
>>
>> You haven't removed the now unnecessary if (!m_ggc) guard.
>>
>>> @@ -106,6 +112,15 @@ public:
>>>      return m_ggc ? new (ggc_alloc ()) T() : new T () ;
>>>    }
>>>
>>> +  /* Release an item that is stored within map.  */
>>> +  void release (T *item)
>>> +  {
>>> +    if (m_ggc)
>>> +      ggc_free (item);
>>
>> Perhaps run also the item's destructor first? I know that inline_summary
>> doesn't have a user destructor, so it will expand to nothing, so it would
>> be just for completeness.
>
> Yep, calling destructors is a good idea. OK with that change and fix Jakub
> pointed out.
>
> Honza
>
>>> +    else
>>> +      delete item;
>>> +  }
>>> +
>>
>> Jakub

Ok, the changes are applied in the final patch, which I'm going to install.

Thanks,
Martin

From 6eae938e34e36c461ebec1570ff0f3d2f5e1b8cf Mon Sep 17 00:00:00 2001
From: mliska
Date: Tue, 24 Mar 2015 13:58:50 +0100
Subject: [PATCH] Fix PR65538.

gcc/ChangeLog:

2015-03-24  Martin Liska

        PR tree-optimization/65538
        * symbol-summary.h (function_summary::~function_summary):
        Release memory for allocated summaries.
        (function_summary::release): New function.
---
 gcc/symbol-summary.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index 8d7e42c..0448310 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -81,6 +81,11 @@ public:
     m_symtab_insertion_hook = NULL;
     m_symtab_removal_hook = NULL;
     m_symtab_duplication_hook = NULL;
+
+    /* Release all summaries.  */
+    typedef typename hash_map ::iterator map_iterator;
+    for (map_iterator it = m_map.begin (); it != m_map.end (); ++it)
+      release ((*it).second);
   }

   /* Traverses all summarys with a function F called with
@@ -106,6 +111,18 @@ public:
     return m_ggc ? new (ggc_alloc ()) T() : new T () ;
   }

+  /* Release an item that is stored within map.  */
+  void release (T *item)
+  {
+    if (m_ggc)
+      {
+        item->~T ();
+        ggc_free (item);
+      }
+    else
+      delete item;
+  }
+
   /* Getter for summary callgraph node pointer.  */
   T* get (cgraph_node *node)
   {
--
2.1.4
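The release logic in the patch above pairs each allocation path with its matching teardown. The following standalone sketch illustrates that pairing under stated assumptions: `arena_alloc`/`arena_free` are hypothetical stand-ins for a custom allocator such as ggc_alloc/ggc_free (here simply wrapping malloc/free), and `Summary` is an invented type that counts live objects. An object created with placement new needs an explicit destructor call before its raw storage is returned, whereas an object created with plain new is torn down with delete.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for a custom allocator (assumption: they only
// hand out and take back raw storage, like ggc_alloc/ggc_free would).
inline void *arena_alloc(std::size_t size) {
  void *p = std::malloc(size);
  assert(p != nullptr);
  return p;
}
inline void arena_free(void *p) { std::free(p); }

// Invented example type that tracks how many objects are alive.
struct Summary {
  static int live;
  Summary() { ++live; }
  ~Summary() { --live; }
};
int Summary::live = 0;

// Mirror of the two allocation paths: placement new into arena storage,
// or an ordinary heap allocation.
template <typename T>
T *allocate(bool use_arena) {
  return use_arena ? new (arena_alloc(sizeof(T))) T() : new T();
}

// Matching release: run the destructor by hand before freeing arena
// storage; let delete do both steps for ordinary heap objects.
template <typename T>
void release(T *item, bool use_arena) {
  if (use_arena) {
    item->~T();
    arena_free(item);
  } else {
    delete item;
  }
}

// Allocate one Summary, release it again, and report the live count.
int live_after_roundtrip(bool use_arena) {
  Summary *s = allocate<Summary>(use_arena);
  release(s, use_arena);
  return Summary::live;
}
```

Skipping the explicit destructor call in the arena path would still free the storage, but any cleanup done by the destructor would be lost, which is the completeness concern raised in the review.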
[PATCH] Vimrc config: fix symlink creation
Hello.

The following patch fixes creation of the vimrc symlink, which currently
points to a wrong location.

Ready for trunk?
Thanks,
Martin

From 5681b55f531f579ba75aad21f5628f86fba4bc8a Mon Sep 17 00:00:00 2001
From: mliska
Date: Wed, 25 Mar 2015 10:09:21 +0100
Subject: [PATCH] Fix vimrc file link creation.

ChangeLog:

2015-03-25  Martin Liska
            Yury Gribov

        * Makefile.in: Fix ln source location for vimrc file.
        * Makefile.tpl: Likewise.
---
 Makefile.in  | 4 ++--
 Makefile.tpl | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 6f9dfd4..36b4008 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2442,10 +2442,10 @@ mail-report-with-warnings.log: warning.log
 # Local Vim config

 $(srcdir)/.local.vimrc:
-	$(LN_S) $(srcdir)/contrib/vimrc $@
+	$(LN_S) contrib/vimrc $@

 $(srcdir)/.lvimrc:
-	$(LN_S) $(srcdir)/contrib/vimrc $@
+	$(LN_S) contrib/vimrc $@

 vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc

diff --git a/Makefile.tpl b/Makefile.tpl
index f737cfc..1ea1954 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -872,10 +872,10 @@ mail-report-with-warnings.log: warning.log
 # Local Vim config

 $(srcdir)/.local.vimrc:
-	$(LN_S) $(srcdir)/contrib/vimrc $@
+	$(LN_S) contrib/vimrc $@

 $(srcdir)/.lvimrc:
-	$(LN_S) $(srcdir)/contrib/vimrc $@
+	$(LN_S) contrib/vimrc $@

 vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc

--
2.1.4
Re: [PATCH] Vimrc config: fix symlink creation
On Wed, Mar 25, 2015 at 11:57:08AM +0100, Martin Liška wrote:
> Following patch correctly creates symlink that now points to a wrong location.

Only if $(srcdir) is a relative path I'd say.

> Ready for trunk?

In any case, LGTM.

> 2015-03-25  Martin Liska
> 	    Yury Gribov
>
> 	* Makefile.in: Fix ln source location for vimrc file.
> 	* Makefile.tpl: Likewise.

	Jakub
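
Jakub's remark about relative paths can be checked with a small standalone sketch (POSIX calls only, not part of the patch). A symlink's target is resolved relative to the directory that contains the link, not relative to the directory where ln was run. The helper link_resolves and the layout under /tmp are assumptions made up for this illustration; "src" plays the role of a relative $(srcdir).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Builds <tmp>/src/contrib/vimrc, creates the symlink <tmp>/src/.local.vimrc
// with the given target text, and reports whether the link resolves to an
// existing file.  Setup failures are also reported as false.
bool link_resolves (const std::string &target)
{
  assert (!target.empty ());

  char tmpl[] = "/tmp/vimrc-link-XXXXXX";
  if (!mkdtemp (tmpl))
    return false;

  const std::string root = tmpl;
  const std::string srcdir = root + "/src";
  const std::string contrib = srcdir + "/contrib";
  const std::string file = contrib + "/vimrc";
  const std::string link = srcdir + "/.local.vimrc";

  bool ok = mkdir (srcdir.c_str (), 0700) == 0
            && mkdir (contrib.c_str (), 0700) == 0;
  if (ok)
    {
      std::FILE *f = std::fopen (file.c_str (), "w");
      ok = f != nullptr;
      if (f)
        std::fclose (f);
    }
  ok = ok && symlink (target.c_str (), link.c_str ()) == 0;

  // stat () follows the link, so it fails for a dangling symlink.
  struct stat st;
  const bool resolves = ok && stat (link.c_str (), &st) == 0;

  // Remove everything created above, ignoring errors.
  unlink (link.c_str ());
  unlink (file.c_str ());
  rmdir (contrib.c_str ());
  rmdir (srcdir.c_str ());
  rmdir (root.c_str ());
  return resolves;
}
```

With the target "src/contrib/vimrc" (the old rule with a relative srcdir of "src"), the link inside src/ would be looked up as src/src/contrib/vimrc and dangle; with "contrib/vimrc" (the new rule) it resolves. With an absolute $(srcdir) the old target was absolute and worked, which is the case Jakub points out.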
[PATCH] Guard pass_chkp_instrumentation_passes with flag_check_pointer_bounds
Avoids a fixup-cfg and cgraph edge rebuild.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-03-25  Richard Biener

	* passes.c (pass_manager::execute_early_local_passes): Guard execution
	of pass_chkp_instrumentation_passes with flag_check_pointer_bounds.
	(pass_chkp_instrumentation_passes::gate): Likewise.

Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 221633)
+++ gcc/passes.c	(working copy)
@@ -156,7 +156,8 @@ void
 pass_manager::execute_early_local_passes ()
 {
   execute_pass_list (cfun, pass_build_ssa_passes_1->sub);
-  execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub);
+  if (flag_check_pointer_bounds)
+    execute_pass_list (cfun, pass_chkp_instrumentation_passes_1->sub);
   execute_pass_list (cfun, pass_local_optimization_passes_1->sub);
 }
 
@@ -424,7 +425,8 @@ public:
   virtual bool gate (function *)
     {
       /* Don't bother doing anything if the program has errors. */
-      return (!seen_error () && !in_lto_p);
+      return (flag_check_pointer_bounds
+              && !seen_error () && !in_lto_p);
     }
 
 }; // class pass_chkp_instrumentation_passes
[patch libgomp]: Fix PR 64972
Hi,

ChangeLog

2015-03-25  Kai Tietz

	PR libgomp/64972
	* oacc-parallel.c (GOACC_parallel): Use PRIu64 if available.
	(GOACC_data_start): Likewise.
	* target.c (gomp_map_vars): Likewise.

Tested for i686-w64-mingw32. The fix was preapproved by Jakub, so I will
commit this soon if there are no objections.

Regards,
Kai

Index: oacc-parallel.c
===================================================================
--- oacc-parallel.c	(revision 221640)
+++ oacc-parallel.c	(working copy)
@@ -31,6 +31,9 @@
 #include "libgomp_g.h"
 #include "gomp-constants.h"
 #include "oacc-int.h"
+#ifdef HAVE_INTTYPES_H
+# include /* For PRIu64. */
+#endif
 #include
 #include
 #include
@@ -99,9 +102,15 @@ GOACC_parallel (int device, void (*fn) (void *),
     gomp_fatal ("num_workers (%d) different from one is not yet supported",
                 num_workers);
 
-  gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
-              __FUNCTION__, mapnum, hostaddrs, sizes, kinds, async);
-
+#ifdef HAVE_INTTYPES_H
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p, "
+                 "async = %d\n",
+              __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
+#else
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+              __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
+              async);
+#endif
   select_acc_device (device);
 
   thr = goacc_thread ();
@@ -178,8 +187,13 @@ GOACC_data_start (int device, size_t mapnum,
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   struct target_mem_desc *tgt;
 
-  gomp_debug (0, "%s: mapnum=%zd, hostaddrs=%p, sizes=%p, kinds=%p\n",
-              __FUNCTION__, mapnum, hostaddrs, sizes, kinds);
+#ifdef HAVE_INTTYPES_H
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
+              __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
+#else
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+              __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
+#endif
 
   select_acc_device (device);
 
Index: target.c
===================================================================
--- target.c	(revision 221640)
+++ target.c	(working copy)
@@ -33,6 +33,9 @@
 #include
 #include
 #include
+#ifdef HAVE_INTTYPES_H
+# include /* For PRIu64. */
+#endif
 #include
 #include
 
@@ -438,9 +441,16 @@ gomp_map_vars (struct gomp_device_descr *devicep,
                     /* We already looked up the memory region above and it
                        was missing. */
                     size_t size = k->host_end - k->host_start;
+#ifdef HAVE_INTTYPES_H
                     gomp_fatal ("present clause: !acc_is_present (%p, "
-                                "%zd (0x%zx))", (void *) k->host_start,
-                                size, size);
+                                "%"PRIu64" (0x%"PRIx64"))",
+                                (void *) k->host_start,
+                                (uint64_t) size, (uint64_t) size);
+#else
+                    gomp_fatal ("present clause: !acc_is_present (%p, "
+                                "%lu (0x%lx))", (void *) k->host_start,
+                                (unsigned long) size, (unsigned long) size);
+#endif
                   }
                   break;
                 case GOMP_MAP_FORCE_DEVICEPTR:
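
As a standalone illustration of the technique in the patch above (not libgomp code), the sketch below prints a size_t by casting it to uint64_t and using the PRIu64/PRIx64 macros, so the format string does not rely on the C runtime understanding the `z` length modifier. That this is the motivation is an assumption based on the patch being tested on i686-w64-mingw32. The helper name describe_size is invented for the example.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats SIZE the way the patched gomp_fatal call does: decimal, then hex
// in parentheses.  The value is widened to uint64_t so that the fixed-width
// format macros always match the argument type.
std::string describe_size (std::size_t size)
{
  char buf[64];  // 20 decimal digits + 16 hex digits + punctuation fits
  const int n = std::snprintf (buf, sizeof buf,
                               "%" PRIu64 " (0x%" PRIx64 ")",
                               static_cast<std::uint64_t> (size),
                               static_cast<std::uint64_t> (size));
  assert (n > 0 && static_cast<std::size_t> (n) < sizeof buf);
  return buf;
}
```

One difference from the C patch: in C++11 and later the spaces around PRIu64 are required, because `"%"PRIu64` would be parsed as a string literal with a user-defined suffix. In C the patch's unspaced form is fine.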
[PATCH, CHKP, PR target/65508] Set static chain for instrumented calls
Hi,

This patch fixes PR target/65508 by properly copying the static chain for
instrumented calls. Bootstrapped and tested on x86_64-unknown-linux-gnu.
OK for trunk or wait for stage 1?

Thanks,
Ilya
--
gcc/

2015-03-25  Ilya Enkovich

	PR target/65508
	* tree-chkp.c (chkp_add_bounds_to_call_stmt): Set static
	chain for generated call.

gcc/testsuite/

2015-03-25  Ilya Enkovich

	PR target/65508
	* gcc.target/i386/mpx/pr65508.c: New.


diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr65508.c b/gcc/testsuite/gcc.target/i386/mpx/pr65508.c
new file mode 100644
index 000..9060287
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr65508.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcheck-pointer-bounds -mmpx" } */
+
+void
+bar (int N)
+{
+  int a[N];
+  void foo (int a[N])
+  {
+  }
+  foo (a);
+}
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index d2df4ba..de127ae 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -1838,6 +1838,7 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator *gsi)
       new_call = gimple_build_call_vec (gimple_op (call, 1), new_args);
       gimple_call_set_lhs (new_call, gimple_call_lhs (call));
       gimple_call_copy_flags (new_call, call);
+      gimple_call_set_chain (new_call, gimple_call_chain (call));
     }
   new_args.release ();
 
Re: [PATCH, CHKP, PR target/65508] Set static chain for instrumented calls
On Wed, Mar 25, 2015 at 1:35 PM, Ilya Enkovich wrote: > Hi, > > This patch fixes PR target/65508 by proper copy of static chain for > instrumented calls. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK > for trunk or wait for stage 1? Ok for trunk. Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-03-25 Ilya Enkovich > > PR target/65508 > * tree-chkp.c (chkp_add_bounds_to_call_stmt): Set static > chain for generated call. > > gcc/testsuite/ > > 2015-03-25 Ilya Enkovich > > PR target/65508 > * gcc.target/i386/mpx/pr65508.c: New. > > > diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr65508.c > b/gcc/testsuite/gcc.target/i386/mpx/pr65508.c > new file mode 100644 > index 000..9060287 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/mpx/pr65508.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fcheck-pointer-bounds -mmpx" } */ > + > +void > +bar (int N) > +{ > + int a[N]; > + void foo (int a[N]) > + { > + } > + foo (a); > +} > diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c > index d2df4ba..de127ae 100644 > --- a/gcc/tree-chkp.c > +++ b/gcc/tree-chkp.c > @@ -1838,6 +1838,7 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator *gsi) >new_call = gimple_build_call_vec (gimple_op (call, 1), new_args); >gimple_call_set_lhs (new_call, gimple_call_lhs (call)); >gimple_call_copy_flags (new_call, call); > + gimple_call_set_chain (new_call, gimple_call_chain (call)); > } >new_args.release (); >
[PATCH] XFAIL gcc.dg/graphite/vect-pr43423.c
Committed.

Richard.

2015-03-25  Richard Biener

	PR tree-optimization/62630
	* gcc.dg/graphite/vect-pr43423.c: XFAIL.

Index: gcc/testsuite/gcc.dg/graphite/vect-pr43423.c
===================================================================
--- gcc/testsuite/gcc.dg/graphite/vect-pr43423.c	(revision 221633)
+++ gcc/testsuite/gcc.dg/graphite/vect-pr43423.c	(working copy)
@@ -15,5 +15,5 @@ void foo(int n, int mid)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail *-*-* } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
[Patch, fortran, pr65548, v1] [5 Regression] gfc_conv_procedure_call
Hi all, please find attached a fix for the recently introduced regression when allocating arrays with an intrinsic function for source=. The patch addresses this issue by using gfc_conv_expr_descriptor () for intrinsic functions. Bootstraps and regtests ok on x86_64-linux-gnu/F20. Ok for trunk? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de pr65548_1.clog Description: Binary data diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c index 6ffae6e79e..68b343b 100644 --- a/gcc/fortran/trans-stmt.c +++ b/gcc/fortran/trans-stmt.c @@ -5075,12 +5075,17 @@ gfc_trans_allocate (gfc_code * code) /* In all other cases evaluate the expr3 and create a temporary. */ gfc_init_se (&se, NULL); - gfc_conv_expr_reference (&se, code->expr3); + if (code->expr3->rank != 0 + && code->expr3->expr_type == EXPR_FUNCTION + && code->expr3->value.function.isym) + gfc_conv_expr_descriptor (&se, code->expr3); + else + gfc_conv_expr_reference (&se, code->expr3); if (code->expr3->ts.type == BT_CLASS) gfc_conv_class_to_class (&se, code->expr3, code->expr3->ts, false, true, - false,false); + false, false); gfc_add_block_to_block (&block, &se.pre); gfc_add_block_to_block (&post, &se.post); /* Prevent aliasing, i.e., se.expr may be already a diff --git a/gcc/testsuite/gfortran.dg/allocate_with_source_5.f90 b/gcc/testsuite/gfortran.dg/allocate_with_source_5.f90 new file mode 100644 index 000..e934e08 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/allocate_with_source_5.f90 @@ -0,0 +1,52 @@ +! { dg-do run } +! +! Check that pr65548 is fixed. +! 
Contributed by Juergen Reuter + +module allocate_with_source_5_module + + type :: selector_t +integer, dimension(:), allocatable :: map +real, dimension(:), allocatable :: weight + contains +procedure :: init => selector_init + end type selector_t + +contains + + subroutine selector_init (selector, weight) +class(selector_t), intent(out) :: selector +real, dimension(:), intent(in) :: weight +real :: s +integer :: n, i +logical, dimension(:), allocatable :: mask +s = sum (weight) +allocate (mask (size (weight)), source = weight /= 0) +n = count (mask) +if (n > 0) then + allocate (selector%map (n), & +source = pack ([(i, i = 1, size (weight))], mask)) + allocate (selector%weight (n), & +source = pack (weight / s, mask)) +else + allocate (selector%map (1), source = 1) + allocate (selector%weight (1), source = 0.) +end if + end subroutine selector_init + +end module allocate_with_source_5_module + +program allocate_with_source_5 + use allocate_with_source_5_module + + class(selector_t), allocatable :: sel; + real, dimension(5) :: w = [ 1, 0, 2, 0, 3]; + + allocate (sel) + call sel%init(w) + + if (any(sel%map /= [ 1, 3, 5])) call abort() + if (any(abs(sel%weight - [1, 2, 3] / 6) < 1E-6)) call abort() +end program allocate_with_source_5 +! { dg-final { cleanup-modules "allocate_with_source_5_module" } } +
C++ PATCH for c++/61670 (ice-after-error with null DECL_SIZE)
The following fixes an ICE on invalid code by checking that DECL_SIZE is
not null before feeding it to integer_zerop.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-03-25  Marek Polacek

	PR c++/61670
	* class.c (remove_zero_width_bit_fields): Check for null DECL_SIZE.

	* g++.dg/template/pr61670.C: New test.

diff --git gcc/cp/class.c gcc/cp/class.c
index 0518320..c2d4201 100644
--- gcc/cp/class.c
+++ gcc/cp/class.c
@@ -5434,7 +5434,8 @@ remove_zero_width_bit_fields (tree t)
              DECL_INITIAL (*fieldsp).
              check_bitfield_decl eventually sets DECL_SIZE (*fieldsp)
              to that width. */
-          && integer_zerop (DECL_SIZE (*fieldsp)))
+          && (DECL_SIZE (*fieldsp) == NULL_TREE
+              || integer_zerop (DECL_SIZE (*fieldsp))))
         *fieldsp = DECL_CHAIN (*fieldsp);
       else
         fieldsp = &DECL_CHAIN (*fieldsp);
diff --git gcc/testsuite/g++.dg/template/pr61670.C gcc/testsuite/g++.dg/template/pr61670.C
index e69de29..d244efa 100644
--- gcc/testsuite/g++.dg/template/pr61670.C
+++ gcc/testsuite/g++.dg/template/pr61670.C
@@ -0,0 +1,9 @@
+// PR c++/61670
+// { dg-do compile }
+
+template
+class A {
+  A: 0 // { dg-error "" }
+};
+
+A a;

	Marek
Re: Fix PR 65177: diamonds are not valid execution threads for jump threading
On 03/19/15 13:54, Sebastian Pop wrote: Richard Biener wrote: >please instead fixup after copy_bbs in duplicate_seme_region. > Thanks for the review. Attached patch that does not modify copy_bbs. Fixes make check in hmmer and make check RUNTESTFLAGS=tree-ssa.exp Full bootstrap and regtest in progress on x86_64-linux. Ok for trunk? 0001-diamonds-are-not-valid-execution-threads-for-jump-th.patch From 8f1516235bce3e1c4f359149dcc546d813ed7817 Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Tue, 17 Mar 2015 20:28:19 +0100 Subject: [PATCH] diamonds are not valid execution threads for jump threading PR tree-optimization/65177 * tree-ssa-threadupdate.c (verify_seme): Renamed verify_jump_thread. (bb_in_bbs): New. (duplicate_seme_region): Renamed duplicate_thread_path. Redirect all edges not adjacent on the path to the original code. OK for the trunk. Though I think there's some stage1 refactoring that we're going to want to do. Specifically, it seems to me that copy_bbs should be refactored into copy_bbs and copy_bbs_for_threading or somesuch. Where those routines call into refactored common subroutines, but obviously handle wiring up the outgoing edges from the copied blocks differently. The goal would be to eliminate the overly complex block copy/CFG update scheme in tree-ssa-threadupdate.c as part of a larger project to convert to a backward threader that can run independently of DOM. Jeff
Re: [PATCH v2] New testcase to check parameter passing bug
On 03/18/15 19:40, Honggyu Kim wrote:
Hi,

I have modified the test-case to check parameter passing bug based on
the comments from Kyrill Tkachov, Christophe Lyon, and Segher
Boessenkool as follows:
1. move from "gcc.target/arm" to "gcc.dg"
2. change "dg-do compile" to "dg-do run"

Please let me know if there's still something to fix more.
Thanks for your comment.

Honggyu
---
 gcc/testsuite/ChangeLog        |  4
 gcc/testsuite/gcc.dg/pr65358.c | 33 +
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr65358.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 77d24a1..218f908 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-03-19  Honggyu Kim
+
+	* gcc.dg/pr65358.c: New test.

This should be included as part of Kyrill's patch. If the test goes in
without Kyrill's fix, then it'll just create testsuite noise.

Jeff
RE: [PATCH v2] New testcase to check parameter passing bug
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: 25 March 2015 12:27
> To: Honggyu Kim; gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; seg...@kernel.crashing.org; christophe.l...@st.com
> Subject: Re: [PATCH v2] New testcase to check parameter passing bug
>
> On 03/18/15 19:40, Honggyu Kim wrote:
> > Hi,
> >
> > I have modified the test-case to check parameter passing bug based on
> > the comments from Kyrill Tkachov, Christophe Lyon, and Segher
> > Boessenkool as follows:
> > 1. move from "gcc.target/arm" to "gcc.dg"
> > 2. change "dg-do compile" to "dg-do run"
> >
> > Please let me know if there's still something to fix more.
> > Thanks for your comment.
> >
> > Honggyu
> > ---
> >  gcc/testsuite/ChangeLog        |  4
> >  gcc/testsuite/gcc.dg/pr65358.c | 33 +
> >  2 files changed, 37 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr65358.c
> >
> > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> > index 77d24a1..218f908 100644
> > --- a/gcc/testsuite/ChangeLog
> > +++ b/gcc/testsuite/ChangeLog
> > @@ -1,3 +1,7 @@
> > +2015-03-19  Honggyu Kim
> > +
> > +	* gcc.dg/pr65358.c: New test.
> This should be included as part of Kyrill's patch. If the test goes in
> without Kyrill's fix, then it'll just create testsuite noise.

I'll make sure to commit this together with my fix (at
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html) if it gets
approved. I agree that there's no point taking the test in by itself.

Thanks,
Kyrill

>
> Jeff
>
Re: [patch libgomp]: Fix PR 64972
On Wed, Mar 25, 2015 at 01:35:02PM +0100, Kai Tietz wrote: > ChangeLog > > 2015-03-25 Kai Tietz > > PR libgomp/64972 > * oacc-parallel.c (GOACC_parallel): Use PRIu64 if available. > (GOACC_data_start): Likewise. > * target.c (gomp_map_vars): Likewise. > > Tested for i686-w64-mingw32. Fix got preapproved by Jakub, so I will > commit this soon, if there are no objections. The patch is ok to commit immediately, no need to wait. Jakub
Re: [PATCH, bootstrap]: Add bootstrap-lto-noplugin build configuration (PR65537)
On Tue, Mar 24, 2015 at 05:43:09PM +0100, Uros Bizjak wrote: > Attached patch introduces bootstrap-lto-noplugin bootstrap > configuration for hosts that do not support linker plugin (e.g. CentOS > 5.11 with binutils 2.17). Also, the patch adds some additional > documentation to bootstrap-lto option. > > config/ChangeLog: > > 2015-03-24 Uros Bizjak > > PR bootstrap/65537 > * bootstrap-lto-noplugin.mk: New build configuration. > > gcc/ChangeLog: > > 2015-03-24 Uros Bizjak > > PR bootstrap/65537 > * doc/install.texi (Building a native compiler): Document new > bootstrap-lto-noplugin configuration. Mention that bootstrap-lto > configuration assumes that the host supports the linker plugin. > > Patch was bootstrapped and tested on x86_64-linux-gnu (CentOS 5.11) > host, configured with --with-build-config=bootstrap-lto build > configuration. and not --with-build-config=bootstrap-lto-noplugin ? > OK for mainline? Ok, thanks. Jakub
[Obvious] Fix libstdc++/33394 testcase when cross-testing linux
When cross-testing, the -DITERATIONS=1000 flag replaced the -pthread
required for linux targets, so the test failed to build.

I've pushed the following test fix as r221666:

Index: libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc
===================================================================
--- libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc	(revision 221665)
+++ libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc	(working copy)
@@ -18,7 +18,7 @@
 
 // { dg-do run { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* *-*-cygwin *-*-darwin* } }
 // { dg-options "-pthread" { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* } }
-// { dg-options "-DITERATIONS=1000" { target simulator } }
+// { dg-additional-options "-DITERATIONS=1000" { target simulator } }
 #ifndef ITERATIONS
 #define ITERATIONS 5
 #endif

Jonathan Wakely wrote:
Adding a testcase so the bug can be closed. I believe the segfault was
fixed for 3.4.0 by https://gcc.gnu.org/r67912

Tested x86_64-linux, committed to trunk.
Re: [Obvious] Fix libstdc++/33394 testcase when cross-testing linux
On 25/03/15 15:49 +, Alan Lawrence wrote: When cross-testing, the -DITERATIONS=1000 flag replaced the -pthread required for linux targets, so the test failed to build. I've pushed the following test fix as r221666: Ah yes, of course it does! Thanks for the fix. Index: libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc === --- libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc (revision 221665) +++ libstdc++-v3/testsuite/21_strings/basic_string/pthread33394.cc (working copy) @@ -18,7 +18,7 @@ // { dg-do run { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* *-*-cygwin *-*-darwin* } } // { dg-options "-pthread" { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* } } -// { dg-options "-DITERATIONS=1000" { target simulator } } +// { dg-additional-options "-DITERATIONS=1000" { target simulator } } #ifndef ITERATIONS #define ITERATIONS 5 #endif Jonathan Wakely wrote: Adding a testcase so the bug can be closed. I believe the segfault was fixed for 3.4.0 by https://gcc.gnu.org/r67912 Tested x86_64-linux, committed to trunk.
Re: [libstdc++/65033] Give alignment info to libatomic
On 18/02/15 12:15 +, Jonathan Wakely wrote:
On 12/02/15 13:23 -0800, Richard Henderson wrote:
When we fixed PR54005, making sure that atomic_is_lock_free returns the
same value for all objects of a given type, we probably should have
changed the interface so that we would pass size and alignment rather
than size and object pointer.

Instead, we decided that passing null for the object pointer would be
sufficient. But as this PR shows, we really do need to take alignment
into account.

The following patch constructs a fake object pointer that is maximally
misaligned. This allows the interface to both the builtin and to
libatomic to remain unchanged. Which probably makes this back-portable
to maintenance releases as well.

Am I right in thinking that another option would be to ensure that
std::atomic<> objects are always suitably aligned? Would that make
std::atomic<> slightly more compatible with a C11 atomic_int, where the
_Atomic qualifier affects alignment?

https://gcc.gnu.org/PR62259 suggests we might need to enforce alignment
on std::atomic anyway, or am I barking up the wrong tree?

I've convinced myself that Richard's patch is correct in all cases, but
I think we also want this patch, to fix PR62259 and PR65147.

For the generic std::atomic (i.e. not the integral or pointer
specializations) we should increase the alignment of atomic types that
have the same size as one of the standard integral types. This should
be consistent with what the C front end does for _Atomic, based on what
Joseph told me on IRC:

 jwakely: _Atomic aligns 1/2/4/8/16-byte types the same as integer
 types of that size. (Which may not be alignment = size, depending on
 the architecture.)

Ideally we'd use an attribute like Andrew describes at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62259#c4 but that's not
going to happen for GCC 5, so this just looks for an integral type of
the same size and uses its alignment.

Tested x86_64-linux, powerpc64le-linux.

I'll wait for RM approval for this and Richard's patch (which is OK
from a libstdc++ perspective).

commit bdcba837b42bbef3143ea59a0194bd3ef435dfcb
Author: Jonathan Wakely
Date:   Wed Sep 3 15:39:53 2014 +0100

	PR libstdc++/62259
	PR libstdc++/65147
	* include/std/atomic (atomic): Increase alignment for types with
	the same size as one of the integral types.
	* testsuite/29_atomics/atomic/60695.cc: Adjust dg-error line number.
	* testsuite/29_atomics/atomic/62259.cc: New.

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index cc4b5f1..5f02fe8 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -165,7 +165,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     struct atomic
     {
     private:
-      _Tp _M_i;
+      // Align 1/2/4/8/16-byte types the same as integer types of that size.
+      // This matches the alignment effects of the C11 _Atomic qualifier.
+      static constexpr int _S_alignment
+        = sizeof(_Tp) == sizeof(char) ? alignof(char)
+        : sizeof(_Tp) == sizeof(short) ? alignof(short)
+        : sizeof(_Tp) == sizeof(int) ? alignof(int)
+        : sizeof(_Tp) == sizeof(long) ? alignof(long)
+        : sizeof(_Tp) == sizeof(long long) ? alignof(long long)
+#ifdef _GLIBCXX_USE_INT128
+        : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128)
+#endif
+        : alignof(_Tp);
+
+      alignas(_S_alignment) _Tp _M_i;
 
       static_assert(__is_trivially_copyable(_Tp),
                     "std::atomic requires a trivially copyable type");
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc b/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
index b59c6ba..806ccb1 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/60695.cc
@@ -27,4 +27,4 @@ struct X {
   char stuff[0]; // GNU extension, type has zero size
 };
 
-std::atomic a; // { dg-error "not supported" "" { target *-*-* } 173 }
+std::atomic a; // { dg-error "not supported" "" { target *-*-* } 186 }
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/62259.cc b/libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
new file mode 100644
index 000..cfe70b1
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3. If not see
+//
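
The size-based alignment selection in the std::atomic patch above can be tried outside libstdc++ with a small sketch. Everything here is illustrative and assumed, not library code: aligned_like_integer, int_sized_bytes and wrapper_is_int_aligned are invented names, the __int128 branch is left out, and the extra "never go below alignof (T)" step is an addition of this sketch that the patch as posted does not contain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Wraps a T and aligns it like the standard integer type of the same size,
// in the spirit of the _S_alignment computation in the patch.
template <typename T>
struct aligned_like_integer
{
  // Alignment of the same-sized standard integer type, if there is one.
  static constexpr std::size_t integer_alignment
    = sizeof (T) == sizeof (char) ? alignof (char)
    : sizeof (T) == sizeof (short) ? alignof (short)
    : sizeof (T) == sizeof (int) ? alignof (int)
    : sizeof (T) == sizeof (long) ? alignof (long)
    : sizeof (T) == sizeof (long long) ? alignof (long long)
    : alignof (T);

  // Assumption of this sketch: never request less than T's own alignment,
  // since alignas may not weaken the natural alignment of a type.
  static constexpr std::size_t alignment
    = integer_alignment > alignof (T) ? integer_alignment : alignof (T);

  alignas (alignment) T value;
};

// A byte array as large as an int; on its own it only needs alignment 1.
struct int_sized_bytes
{
  unsigned char bytes[sizeof (int)];
};

// Checks both the static alignment of the wrapper and the address of an
// actual object of that type.
bool wrapper_is_int_aligned ()
{
  aligned_like_integer<int_sized_bytes> w{};
  assert (reinterpret_cast<std::uintptr_t> (&w) % alignof (int) == 0);
  return alignof (aligned_like_integer<int_sized_bytes>) == alignof (int);
}
```

The byte-array case is the one the thread cares about: a struct of plain bytes has alignment 1, so without the bump an object the size of an int could straddle a boundary that an int-sized atomic operation does not tolerate.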
[PATCH, rs6000, 4.9] Backport little endian swap optimization to 4.9
Hi, The POWER-specific little-endian swap optimization pass has been burning in on mainline since last August. Since then there have been a few improvements and bug fixes, but the code is very stable. I've had some recent requests to get this code backported to 4.9, as it provides important performance benefits for vector computation. Most of the work is target-specific, but there are some target-independent changes to convert web.c to use a class structure so that this pass can inherit it. This has caused no problems and was not controversial when added to trunk. The rest of the patch is straightforward backporting of the target-specific pieces and test cases. There have been a few infrastructural changes to adjust to, but nothing major. After this goes in, I'll work on taking it back to 4.8 as well. OK for 4.9? Thanks, Bill gcc: 2015-03-25 Bill Schmidt Backport of r214242, r214254, and bug fix patches from mainline * config/rs6000/rs6000.c (context.h): New #include. (tree-pass.h): Likewise. (make_pass_analyze_swaps): New declaration. (rs6000_option_override): Register swap-optimization pass. (swap_web_entry): New class. (special_handling_values): New enum. (union_defs): New function. (union_uses): Likewise. (insn_is_load_p): Likewise. (insn_is_store_p): Likewise. (insn_is_swap_p): Likewise. (rtx_is_swappable_p): Likewise. (insn_is_swappable_p): Likewise. (chain_purpose): New enum. (chain_contains_only_swaps): New function. (mark_swaps_for_removal): Likewise. (swap_const_vector_halves): Likewise. (adjust_subreg_index): Likewise. (permute_load): Likewise. (permute_store): Likewise. (adjust_extract): Likewise. (adjust_splat): Likewise. (handle_special_swappables): Likewise. (replace_swap_with_copy): Likewise. (dump_swap_insn_table): Likewise. (rs6000_analyze_swaps): Likewise. (pass_data_analyze_swaps): New pass_data. (pass_analyze_swaps): New class. (pass_analyze_swaps::gate): New method. (pass_analyze_swaps::execute): New method. 
(make_pass_analyze_swaps): New function. * config/rs6000/rs6000.opt (moptimize-swaps): New option. * df.h (web_entry_base): New class, replacing struct web_entry. (web_entry_base::pred): New method. (web_entry_base::set_pred): Likewise. (web_entry_base::unionfind_root): Likewise. (web_entry_base::unionfind_union): Likewise. (unionfind_root): Delete external reference. (unionfind_union): Likewise. (union_defs): Likewise. * web.c (web_entry_base::unionfind_root): Convert to method. (web_entry_base::unionfind_union): Likewise. (web_entry): New class. (union_match_dups): Convert to use class structure. (union_defs): Likewise. (entry_register): Likewise. (web_main): Likewise. [testsuite] 2015-03-25 Bill Schmidt Backport r214254 and related tests from mainline * gcc.target/powerpc/swaps-p8-1.c: New test. * gcc.target/powerpc/swaps-p8-2.c: New test. * gcc.target/powerpc/swaps-p8-3.c: New test. * gcc.target/powerpc/swaps-p8-4.c: New test. * gcc.target/powerpc/swaps-p8-5.c: New test. * gcc.target/powerpc/swaps-p8-6.c: New test. * gcc.target/powerpc/swaps-p8-7.c: New test. * gcc.target/powerpc/swaps-p8-8.c: New test. * gcc.target/powerpc/swaps-p8-9.c: New test. * gcc.target/powerpc/swaps-p8-10.c: New test. * gcc.target/powerpc/swaps-p8-11.c: New test. * gcc.target/powerpc/swaps-p8-12.c: New test. * gcc.target/powerpc/swaps-p8-13.c: New test. * gcc.target/powerpc/swaps-p8-14.c: New test. * gcc.target/powerpc/swaps-p8-15.c: New test. * gcc.target/powerpc/swaps-p8-16.c: New test. * gcc.target/powerpc/swaps-p8-17.c: New test. 
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 221633) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -80,6 +80,8 @@ #include "cgraph.h" #include "target-globals.h" #include "real.h" +#include "context.h" +#include "tree-pass.h" #if TARGET_XCOFF #include "xcoffout.h" /* get declarations of xcoff_*_section_name */ #endif @@ -1172,6 +1174,7 @@ static bool rs6000_secondary_reload_move (enum rs6 enum machine_mode, secondary_reload_info *, bool); +rtl_opt_pass *make_pass_analyze_swaps (gcc::context*); /* Hash table stuff for keeping track of TOC entries. */ @@ -4094,6 +4097,15 @@ static void rs6000_option_override (void) { (void) rs6000_option_override_internal (true); + + /* Register machine-sp
Re: [Patch, Fortran, pr60322] was: [Patch 1/2, Fortran, pr60322] [OOP] Incorrect bounds on polymorphic dummy array
Hi Dominique, hi all, you are absolutely right, Dominique: I missed the part of pr60322_base_*. But this time it is there and furthermore does solve the allocate( mold=e) and the loc(e) issue. Paul: I have simplified your patch by only checking whether the arg_expr.ts.type == BT_CLASS. All tests showed, that this enough to produce the correct code. Bootstraps and regtests ok on x86_64-linux-gnu/F20. Comments, please! Regards, Andre On Wed, 25 Mar 2015 10:43:34 +0100 Dominique d'Humières wrote: > Hi Andre, > > > Le 24 mars 2015 à 18:06, Andre Vehreschild a écrit : > > > > Hi all, > > > > I have worked on the comments Mikael gave me. I am now checking for > > class_pointer in the way he pointed out. > > > > Furthermore did I *join the two parts* of the patch into this one, because > > keeping both in sync was no benefit but only tedious and did not prove to be > > reviewed faster. > > Are you sure that you attached the right patch? It does not apply on a clean > tree unless I apply the patch at > > https://gcc.gnu.org/ml/fortran/2015-02/msg00105.html > > with minor surgery for gcc/fortran/expr.c. > > > Paul, Dominique: I have addressed the LOC issue that came up lately. Or > > rather the patch addressed it already. I feel like this is not tested very > > well, not the loc() call nor the sizeof() call as given in the 57305 > > second's download. > > The ICE is fixed and the LOC issue seems fixed. > > > Unfortunately, is that download not runable. I would love to see a test > > similar to that download, but couldn't come up with one, that satisfied me. > > Given that the patch's review will last some days, I still have enough time > > to come up with something beautiful which I will add then. 
> > I have changed the test to > > use iso_c_binding > implicit none > real, target :: e > class(*), allocatable, target :: a(:) > e = 1.0 > call add_element_poly(a,e) > print *, size(a) > call add_element_poly(a,e) > print *, size(a) > select type (a) > type is (real) > print *, a > end select > contains > subroutine add_element_poly(a,e) > use iso_c_binding > class(*),allocatable,intent(inout),target :: a(:) > class(*),intent(in),target :: e > class(*),allocatable,target :: tmp(:) > type(c_ptr) :: dummy > > interface > function memcpy(dest,src,n) bind(C,name="memcpy") result(res) > import > type(c_ptr) :: res > integer(c_intptr_t),value :: dest > integer(c_intptr_t),value :: src > integer(c_size_t),value :: n > end function > end interface > > if (.not.allocated(a)) then > allocate(a(1), source=e) > else > print *, size(a) > allocate(tmp(size(a)),source=a) > print *, size(a), size(tmp) + 1 > print *, loc(a(1)),loc(tmp),sizeof(tmp) > deallocate(a) > !allocate(a(size(tmp)+1),mold=e) > allocate(a(size(tmp)+1),source=e) > print *, size(a), size(tmp) > dummy = memcpy(loc(a(1)),loc(tmp),sizeof(tmp)) > dummy = memcpy(loc(a(size(tmp)+1)),loc(e),sizeof(e)) > end if > end subroutine > end > > As pointed by Paul, I get a segfault at run time if I use the commented line, > i.e. ‘mold’ instead of ‘source’. > > > Bootstraps and regtests ok on x86_64-linux-gnu/F20. > > > > Regards, > > Andre > > Thanks for your work. 
> > Dominique > -- Andre Vehreschild * Email: vehre ad gmx dot de pr60322_full_5.clog Description: Binary data diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c index ab6f7a5..7f3a59d 100644 --- a/gcc/fortran/expr.c +++ b/gcc/fortran/expr.c @@ -4052,6 +4052,7 @@ gfc_expr * gfc_lval_expr_from_sym (gfc_symbol *sym) { gfc_expr *lval; + gfc_array_spec *as; lval = gfc_get_expr (); lval->expr_type = EXPR_VARIABLE; lval->where = sym->declared_at; @@ -4059,10 +4060,10 @@ gfc_lval_expr_from_sym (gfc_symbol *sym) lval->symtree = gfc_find_symtree (sym->ns->sym_root, sym->name); /* It will always be a full array. */ - lval->rank = sym->as ? sym->as->rank : 0; + as = IS_CLASS_ARRAY (sym) ? CLASS_DATA (sym)->as : sym->as; + lval->rank = as ? as->rank : 0; if (lval->rank) -gfc_add_full_array_ref (lval, sym->ts.type == BT_CLASS ? - CLASS_DATA (sym)->as : sym->as); +gfc_add_full_array_ref (lval, as); return lval; } diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 8e6595f..901a1c0 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3206,6 +3206,11 @@ bool gfc_is_finalizable (gfc_symbol *, gfc_expr **); && CLASS_DATA (sym) \ && CLASS_DATA (sym)->ts.u.derived \ && CLASS_DATA (sym)->ts.u.derived->attr.unlimited_polymorphic) +#define IS_CLASS_ARRAY(sym) \ + (sym->ts.type == BT_CLASS \ + && CLASS_DATA (sym) \ + && CLASS_DATA (sym)->attr.dimension \ + && !CLASS_DATA (sym)->attr.class_pointer) /* frontend-passes.c */ diff --git a/gcc/fortran/trans-array.c b/gcc/f
Re: [PATCH][AArch64][Testsuite] Fix gcc.target/aarch64/c-output-template-3.c
On Tue, Mar 24, 2015 at 05:46:57PM +, Alan Lawrence wrote:
> Hmmm. This is not the right fix: the tests Richard fixed, were failing because
> of lack of constant propagation and DCE at compile-time, which then didn't
> eliminate the call to link_error. The AArch64 test is failing because this
> from aarch64/constraints.md:
>
> (define_constraint "S"
>   "A constraint that matches an absolute symbolic address."
>   (and (match_code "const,symbol_ref,label_ref")
>        (match_test "aarch64_symbolic_address_p (op)")))
>
> previously was seeing (and being satisfied by):
>
> (const:DI (plus:DI (symbol_ref:DI ("test") [flags 0x3] 0x7fb7c60300 test>)
>         (const_int 4 [0x4])))
>
> but following Richard's patch the constraint is evaluated only on:
>
> (reg/f:DI 73 [ D.2670 ])

I don't think we should get too concerned by this. There are a number of other constraints which we define which we can only satisfy given a level of optimisation. Take the I (immediate acceptable for an ADD instruction) constraint, which will fail for:

  int
  foo (int x)
  {
    int z = 5;
    __asm__ ("xxx %0 %1":"=r"(x) : "I"(z));
    return x;
  }

at O0 and happily produce:

  xxx x0 5

with optimisations.

I think your original patch to add -O is just fine, but Marcus or Richard will need to approve it.

Cheers,
James
Re: [libstdc++/65033] Give alignment info to libatomic
On 03/25/2015 09:22 AM, Jonathan Wakely wrote:
>      private:
> -      _Tp _M_i;
> +      // Align 1/2/4/8/16-byte types the same as integer types of that size.
> +      // This matches the alignment effects of the C11 _Atomic qualifier.
> +      static constexpr int _S_alignment
> +        = sizeof(_Tp) == sizeof(char) ? alignof(char)
> +        : sizeof(_Tp) == sizeof(short) ? alignof(short)
> +        : sizeof(_Tp) == sizeof(int) ? alignof(int)
> +        : sizeof(_Tp) == sizeof(long) ? alignof(long)
> +        : sizeof(_Tp) == sizeof(long long) ? alignof(long long)
> +#ifdef _GLIBCXX_USE_INT128
> +        : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128)
> +#endif
> +        : alignof(_Tp);
> +
> +      alignas(_S_alignment) _Tp _M_i;

Surely not by reducing a larger alignment applied to _Tp. I.e.

  static constexpr int _S_min_alignment
    = sizeof(_Tp) == sizeof(char) ? alignof(char)
    : sizeof(_Tp) == sizeof(short) ? alignof(short)
    : sizeof(_Tp) == sizeof(int) ? alignof(int)
    : sizeof(_Tp) == sizeof(long) ? alignof(long)
    : sizeof(_Tp) == sizeof(long long) ? alignof(long long)
#ifdef _GLIBCXX_USE_INT128
    : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128)
#endif
    : 0;

  static constexpr int _S_alignment
    = _S_min_alignment > alignof(_Tp) ? _S_min_alignment : alignof(_Tp);

r~
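To make the suggestion above concrete, here is a minimal standalone sketch of the same computation outside of std::atomic. It is not the libstdc++ code: the names atomic_alignment, min_alignment, IntSized and Overaligned are made up for illustration, and the __int128 branch is left out to keep the sketch portable.

```cpp
#include <cstddef>

// Hypothetical trait mirroring the suggested logic: start from the alignment
// of the integer type with the same size (0 if there is none), then never go
// below the alignment the type already has.
template <typename T>
struct atomic_alignment
{
  static constexpr std::size_t min_alignment
    = sizeof (T) == sizeof (char) ? alignof (char)
    : sizeof (T) == sizeof (short) ? alignof (short)
    : sizeof (T) == sizeof (int) ? alignof (int)
    : sizeof (T) == sizeof (long) ? alignof (long)
    : sizeof (T) == sizeof (long long) ? alignof (long long)
    : 0;

  static constexpr std::size_t value
    = min_alignment > alignof (T) ? min_alignment : alignof (T);
};

// Same size as int, but only byte-aligned on its own.
struct IntSized { char c[sizeof (int)]; };

// Larger alignment requested by the user; it must not be reduced.
struct alignas (32) Overaligned { char c; };

// Holds on every target: the result is never smaller than alignof (T).
static_assert (atomic_alignment<IntSized>::value >= alignof (IntSized),
               "alignment must not shrink");
static_assert (atomic_alignment<Overaligned>::value >= alignof (Overaligned),
               "alignment must not shrink");
```

On a typical LP64 target one would expect atomic_alignment<IntSized>::value to equal alignof (int), while Overaligned keeps its 32-byte alignment because no integer type matches its size and the fallback of 0 loses against alignof (T).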
[PATCH] Add workaround for PR64715
Hi! As discussed in the PR, fixing this issue for real (make sure we at least until the objsz pass don't lose information on which field's address if any has been taken) is probably too dangerous at this point, so this patch just adds a simple workaround: another objsz pass instance run early before first ccp pass, in which we only process __bos (x, 1) and __bos (x, 3), and rather than folding them right away we instead just replace say _1 = __builtin_object_size (ptr_2, 1); with _7 = __builtin_object_size (ptr_2, 1); _1 = MIN <_7, 17>; if 17 is what the __builtin_object_size folds to. The reason for the MIN or MAX is that later DCE etc. could still make the value smaller later on (as shown in the third snippet of __builtin_object_size). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? For GCC 6 will need to write some real fix and revert this (except for the testcases). 2015-03-25 Jakub Jelinek PR tree-optimization/64715 * passes.def: Add another instance of pass_object_sizes before ccp1. * tree-object-size.c (pass_object_sizes::execute): In first_pass_instance, only handle __bos (, 1) and __bos (, 3) calls, and keep the call in the IL, as {MIN,MAX}_EXPR of the __bos result and the computed constant. Remove redundant checks, obsoleted by gimple_call_builtin_p test. When propagating folded __bos into uses, if the use is {MIN,MAX}_EXPR we can fold into constant, propagate even that constant into their uses. * gcc.dg/builtin-object-size-15.c: New test. * gcc.dg/pr64715-1.c: New test. * gcc.dg/pr64715-2.c: New test. --- gcc/passes.def.jj 2015-01-19 14:40:46.0 +0100 +++ gcc/passes.def 2015-03-25 12:18:21.079207954 +0100 @@ -77,6 +77,7 @@ along with GCC; see the file COPYING3. PUSH_INSERT_PASSES_WITHIN (pass_all_early_optimizations) NEXT_PASS (pass_remove_cgraph_callee_edges); NEXT_PASS (pass_rename_ssa_copies); + NEXT_PASS (pass_object_sizes); NEXT_PASS (pass_ccp); /* After CCP we rewrite no longer addressed locals into SSA form if possible. 
*/ --- gcc/tree-object-size.c.jj 2015-03-20 17:58:31.0 +0100 +++ gcc/tree-object-size.c 2015-03-25 14:40:03.664185560 +0100 @@ -1268,25 +1268,60 @@ pass_object_sizes::execute (function *fu continue; init_object_sizes (); + + /* In the first pass instance, only attempt to fold +__builtin_object_size (x, 1) and __builtin_object_size (x, 3), +and rather than folding the builtin to the constant if any, +create a MIN_EXPR or MAX_EXPR of the __builtin_object_size +call result and the computed constant. */ + if (first_pass_instance) + { + tree ost = gimple_call_arg (call, 1); + if (tree_fits_uhwi_p (ost)) + { + unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost); + tree ptr = gimple_call_arg (call, 0); + tree lhs = gimple_call_lhs (call); + if ((object_size_type == 1 || object_size_type == 3) + && (TREE_CODE (ptr) == ADDR_EXPR + || TREE_CODE (ptr) == SSA_NAME) + && lhs) + { + tree type = TREE_TYPE (lhs); + unsigned HOST_WIDE_INT bytes + = compute_builtin_object_size (ptr, object_size_type); + if (bytes != (unsigned HOST_WIDE_INT) (object_size_type == 1 +? -1 : 0) + && wi::fits_to_tree_p (bytes, type)) + { + tree tem = make_ssa_name (type); + gimple_call_set_lhs (call, tem); + enum tree_code code + = object_size_type == 1 ? MIN_EXPR : MAX_EXPR; + tree cst = build_int_cstu (type, bytes); + gimple g = gimple_build_assign (lhs, code, tem, cst); + gsi_insert_after (&i, g, GSI_NEW_STMT); + update_stmt (call); + } + } + } + continue; + } + result = fold_call_stmt (as_a (call), false); if (!result) { - if (gimple_call_num_args (call) == 2 - && POINTER_TYPE_P (TREE_TYPE (gimple_call_arg (call, 0 - { - tree ost = gimple_call_arg (call, 1); + tree ost = gimple_call_arg (call, 1); - if (tree_fits_uhwi_p (ost)) - { - unsigned HOST_WIDE_INT object_size_type - = tree_to_uhwi (ost); + if (tree_fits_uhwi_p (ost)) +
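As a rough illustration of the rewrite described in this message, the sketch below models only the arithmetic of the inserted MIN_EXPR/MAX_EXPR; it is not the pass itself. The function name clamp_object_size and its parameters are hypothetical: `later` stands for whatever the retained __builtin_object_size call eventually folds to, and `early` for the constant computed by the early pass instance.

```cpp
#include <cstddef>

// Hypothetical model of the statement "_1 = MIN <_7, 17>" (or MAX for type 3).
// Type 1 is a maximum estimate, so the early constant is an upper bound and
// is combined with MIN; type 3 is a minimum estimate and is combined with MAX.
constexpr std::size_t
clamp_object_size (std::size_t later, std::size_t early, int object_size_type)
{
  return object_size_type == 1
	 ? (later < early ? later : early)
	 : (later > early ? later : early);
}

// If the later pass can no longer compute anything, it yields the "unknown"
// value: (size_t) -1 for type 1 and 0 for type 3.  The early constant then
// still wins.
static_assert (clamp_object_size (static_cast<std::size_t> (-1), 17, 1) == 17,
	       "unknown maximum falls back to the early constant");
static_assert (clamp_object_size (0, 17, 3) == 17,
	       "unknown minimum falls back to the early constant");
```

If the later pass does find a tighter value, for example 10 for a type 1 query, the MIN keeps that smaller value, which is the situation the message mentions when it says later DCE could still make the value smaller.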
Re: [libstdc++/65033] Give alignment info to libatomic
On 03/25/2015 09:22 AM, Jonathan Wakely wrote:
> +static_assert( alignof(std::atomic) > alignof(int),
> +               "std::atomic not suitably aligned" );

This is only true if int64_t has alignment larger than int32_t, which is unfortunately not always the case.

r~
Re: [libstdc++/65033] Give alignment info to libatomic
On 25/03/15 11:36 -0700, Richard Henderson wrote: On 03/25/2015 09:22 AM, Jonathan Wakely wrote: private: - _Tp _M_i; + // Align 1/2/4/8/16-byte types the same as integer types of that size. + // This matches the alignment effects of the C11 _Atomic qualifier. + static constexpr int _S_alignment + = sizeof(_Tp) == sizeof(char) ? alignof(char) + : sizeof(_Tp) == sizeof(short) ? alignof(short) + : sizeof(_Tp) == sizeof(int) ? alignof(int) + : sizeof(_Tp) == sizeof(long) ? alignof(long) + : sizeof(_Tp) == sizeof(long long) ? alignof(long long) +#ifdef _GLIBCXX_USE_INT128 + : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128) +#endif + : alignof(_Tp); + + alignas(_S_alignment) _Tp _M_i; Surely not by reducing a larger alignment applied to _Tp. I.e. static constexpr int _S_min_alignment = sizeof(_Tp) == sizeof(char) ? alignof(char) : sizeof(_Tp) == sizeof(short) ? alignof(short) : sizeof(_Tp) == sizeof(int) ? alignof(int) : sizeof(_Tp) == sizeof(long) ? alignof(long) : sizeof(_Tp) == sizeof(long long) ? alignof(long long) #ifdef _GLIBCXX_USE_INT128 : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128) #endif : 0; static constexpr int _S_alignment = _S_min_alignment > alignof(_Tp) ? _S_min_alignment : alignof(_Tp); Doh, good catch. I'll make that change and add a test with a type that has alignof(X) > sizeof(X). On 25/03/15 11:39 -0700, Richard Henderson wrote: On 03/25/2015 09:22 AM, Jonathan Wakely wrote: +static_assert( alignof(std::atomic) > alignof(int), + "std::atomic not suitably aligned" ); This is only true if int64_t has alignment larger than int32_t, which is unfortunately not always the case. Huh, didn't realise that. I could change the tests to check it's alignof(std::int64_t) as the next assertion does, but is it safe to assume that struct twoints { int a; int b; } is exactly 64 bits everywhere? I'd prefer not to have the test say "if sizeof(twoints) == sizeof(long), test this, otherwise if sizeof(twoints) == ..."
C++ PATCH for c++/65558 (ICE with abi_tag on anon namespace)
As discussed in the PR, the abi_tag on an anonymous namespace is useless, but we shouldn't ICE if the user attempts to do that.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-03-25  Marek Polacek

	PR c++/65558
	* name-lookup.c (handle_namespace_attrs): Ignore abi_tag attribute
	on an anonymous namespace.

	* g++.dg/cpp0x/pr65558.C: New test.

diff --git gcc/cp/name-lookup.c gcc/cp/name-lookup.c
index b85fbc9..4303ed5 100644
--- gcc/cp/name-lookup.c
+++ gcc/cp/name-lookup.c
@@ -3663,6 +3663,12 @@ handle_namespace_attrs (tree ns, tree attributes)
                      "namespace", name);
              continue;
            }
+         if (!DECL_NAME (ns))
+           {
+             warning (OPT_Wattributes, "ignoring %qD attribute on anonymous "
+                      "namespace", name);
+             continue;
+           }
          if (!args)
            {
              tree dn = DECL_NAME (ns);
diff --git gcc/testsuite/g++.dg/cpp0x/pr65558.C gcc/testsuite/g++.dg/cpp0x/pr65558.C
index e69de29..5437e50 100644
--- gcc/testsuite/g++.dg/cpp0x/pr65558.C
+++ gcc/testsuite/g++.dg/cpp0x/pr65558.C
@@ -0,0 +1,6 @@
+// PR c++/65558
+// { dg-do compile { target c++11 } }
+
+inline namespace __attribute__((__abi_tag__))
+{ // { dg-warning "ignoring .__abi_tag__. attribute on anonymous namespace" }
+}

	Marek
Re: [libstdc++/65033] Give alignment info to libatomic
On 03/25/2015 11:49 AM, Jonathan Wakely wrote:
> On 25/03/15 11:36 -0700, Richard Henderson wrote:
>> On 03/25/2015 09:22 AM, Jonathan Wakely wrote:
> On 25/03/15 11:39 -0700, Richard Henderson wrote:
>> On 03/25/2015 09:22 AM, Jonathan Wakely wrote:
>>> +static_assert( alignof(std::atomic) > alignof(int),
>>> +               "std::atomic not suitably aligned" );
>>
>> This is only true if int64_t has alignment larger than int32_t,
>> which is unfortunately not always the case.
>
> Huh, didn't realise that. I could change the tests to check it's
> alignof(std::int64_t) as the next assertion does, but is it safe to
> assume that struct twoints { int a; int b; } is exactly 64 bits
> everywhere?

Certainly not. But if you're going to explicitly use int64_t elsewhere, you might as well explicitly use int32_t as well. Then I believe you can reasonably assert

  alignof(twoint32) == alignof(int64_t)

r~
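A possible shape for such a test is sketched below. This is only an illustration of the idea, not the test that went into the testsuite: the struct twoint32 follows the name used in the message, while the helper atomic_twoint32_matches_int64 is made up here. The equality with alignof(int64_t) is exposed as a query rather than asserted, since whether it holds depends on the library version in use.

```cpp
#include <atomic>
#include <cstdint>

// Test type built from fixed-width members, so its size does not depend on
// how wide plain `int` happens to be on the target.
struct twoint32
{
  std::int32_t a;
  std::int32_t b;
};

static_assert (sizeof (twoint32) == sizeof (std::int64_t),
	       "twoint32 is expected to be exactly 64 bits");

// Always expected to hold: wrapping a type in std::atomic must not make it
// less aligned than the type itself.
static_assert (alignof (std::atomic<twoint32>) >= alignof (twoint32),
	       "std::atomic<twoint32> not suitably aligned");

// Hypothetical helper: reports whether the atomic wrapper has the same
// alignment as the same-sized integer type, which is what the proposed
// assertion would check.
constexpr bool
atomic_twoint32_matches_int64 ()
{
  return alignof (std::atomic<twoint32>) == alignof (std::int64_t);
}
```

Using fixed-width members sidesteps the question of whether struct twoints { int a; int b; } is 64 bits everywhere, which is the point made in the reply above.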
Re: [PATCH, bootstrap]: Add bootstrap-lto-noplugin build configuration (PR65537)
On Wed, Mar 25, 2015 at 3:23 PM, Jakub Jelinek wrote:

>> Attached patch introduces bootstrap-lto-noplugin bootstrap
>> configuration for hosts that do not support linker plugin (e.g. CentOS
>> 5.11 with binutils 2.17). Also, the patch adds some additional
>> documentation to bootstrap-lto option.
>>
>> config/ChangeLog:
>>
>> 2015-03-24  Uros Bizjak
>>
>> 	PR bootstrap/65537
>> 	* bootstrap-lto-noplugin.mk: New build configuration.
>>
>> gcc/ChangeLog:
>>
>> 2015-03-24  Uros Bizjak
>>
>> 	PR bootstrap/65537
>> 	* doc/install.texi (Building a native compiler): Document new
>> 	bootstrap-lto-noplugin configuration. Mention that bootstrap-lto
>> 	configuration assumes that the host supports the linker plugin.
>>
>> Patch was bootstrapped and tested on x86_64-linux-gnu (CentOS 5.11)
>> host, configured with --with-build-config=bootstrap-lto build
>> configuration.
>
> and not --with-build-config=bootstrap-lto-noplugin ?

Oh ... with bootstrap-lto-noplugin option. The bootstrap with linker plugin does not work at all on CentOS 5.11.

Uros.
Re: C++ PATCH for c++/65558 (ICE with abi_tag on anon namespace)
OK. Jason
Re: C++ PATCH for c++/61670 (ice-after-error with null DECL_SIZE)
OK. Jason
Re: [debug-early] emit early dwarf for locally scoped functions
On 03/24/2015 02:00 PM, Aldy Hernandez wrote:
> I found that for locally scoped functions we were not emitting early
> dwarf.

Why weren't they being emitted as part of their enclosing function? They should be.

Jason
Re: [debug-early] emit early dwarf for locally scoped functions
On 03/25/2015 12:37 PM, Jason Merrill wrote:
> On 03/24/2015 02:00 PM, Aldy Hernandez wrote:
>> I found that for locally scoped functions we were not emitting early
>> dwarf.
>
> Why weren't they being emitted as part of their enclosing function? They
> should be.
>
> Jason

Hmm, you're right. Sorry for being so sloppy.

What is actually happening is that when the declaration is seen, nameless DIEs for the types are generated, which are then used when the cached subprogram DIE is seen the second time. The nameless DIEs end up looking like this because we don't have the "this" name:

  char Object_method(Object * const);

whereas the function type should be:

  char Object_method(void);

I now understand what this was doing in mainline:

  /* Clear out the declaration attribute and the formal parameters.
     Do not remove all children, because it is possible that this
     declaration die was forced using force_decl_die(). In such
     cases die that forced declaration die (e.g. TAG_imported_module)
     is one of the children that we do not want to remove. */
  remove_AT (subr_die, DW_AT_declaration);
  remove_AT (subr_die, DW_AT_object_pointer);
  remove_child_TAG (subr_die, DW_TAG_formal_parameter);

I suppose we could re-use the DW_AT_object_pointer and DW_TAG_formal_parameter, and tack on the DW_AT_name now that we know it? Or we could cheat and just remove them as mainline does, but only when reusing a declaration (as in the attached patch).

What do you think?
Aldy diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 48e2eed..4bc945f 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3113,7 +3113,7 @@ static inline dw_die_ref get_AT_ref (dw_die_ref, enum dwarf_attribute); static bool is_cxx (void); static bool is_fortran (void); static bool is_ada (void); -static void remove_AT (dw_die_ref, enum dwarf_attribute); +static bool remove_AT (dw_die_ref, enum dwarf_attribute); static void remove_child_TAG (dw_die_ref, enum dwarf_tag); static void add_child_die (dw_die_ref, dw_die_ref); static dw_die_ref new_die (enum dwarf_tag, dw_die_ref, tree); @@ -4752,16 +4752,17 @@ is_ada (void) return lang == DW_LANG_Ada95 || lang == DW_LANG_Ada83; } -/* Remove the specified attribute if present. */ +/* Remove the specified attribute if present. Return TRUE if removal + was successful. */ -static void +static bool remove_AT (dw_die_ref die, enum dwarf_attribute attr_kind) { dw_attr_ref a; unsigned ix; if (! die) -return; +return false; FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a) if (a->dw_attr == attr_kind) @@ -4773,8 +4774,9 @@ remove_AT (dw_die_ref die, enum dwarf_attribute attr_kind) /* vec::ordered_remove should help reduce the number of abbrevs that are needed. */ die->die_attr->ordered_remove (ix); - return; + return true; } + return false; } /* Remove CHILD from its parent. PREV must have the property that @@ -18790,8 +18792,15 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) /* Clear out the declaration attribute, but leave the parameters so they can be augmented with location -information later. */ - remove_AT (subr_die, DW_AT_declaration); +information later. Unless this was a declaration, in +which case, wipe out the nameless parameters and recreate +them further down. */ + if (remove_AT (subr_die, DW_AT_declaration)) + { + + remove_AT (subr_die, DW_AT_object_pointer); + remove_child_TAG (subr_die, DW_TAG_formal_parameter); + } } /* Make a specification pointing to the previously built declaration. */
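For readers who do not have dwarf2out.c at hand, the control flow of the patch above can be sketched with plain standard containers. Everything here is a hypothetical stand-in: attr_kind, die, remove_attr and reuse_declaration are made-up names that only mimic the roles of the DWARF attribute enum, the DIE structure, remove_AT and the changed block in gen_subprogram_die.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GCC's attribute kinds and DIE structure.
enum attr_kind { AT_declaration, AT_object_pointer, AT_name };

struct die
{
  std::vector<attr_kind> attrs;
  std::vector<int> formal_params;  // stands in for formal parameter children
};

// Remove the first attribute of kind KIND, keeping the order of the others.
// Returns true only if an attribute was actually removed.
bool
remove_attr (die *d, attr_kind kind)
{
  if (!d)
    return false;

  for (std::size_t i = 0; i < d->attrs.size (); ++i)
    if (d->attrs[i] == kind)
      {
	d->attrs.erase (d->attrs.begin () + static_cast<std::ptrdiff_t> (i));
	return true;
      }
  return false;
}

// Only when the DIE really was a declaration are the nameless parameters
// wiped so that they can be recreated later; otherwise they are left alone.
void
reuse_declaration (die *d)
{
  if (remove_attr (d, AT_declaration))
    {
      remove_attr (d, AT_object_pointer);
      d->formal_params.clear ();
    }
}
```

The boolean return value is what lets the caller tell "this was a declaration, rebuild its parameters" apart from "this DIE already has usable parameters", which is the distinction the patch introduces.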
libgo patch committed: Add runtime/cgo to list of standard packages
PR 65570 points out that the recent patch to the go tool breaks the use of cgo (and obviously also points out that we need better testing for go and cgo). The problem is that the go tool treats the runtime/cgo package specially. Although gccgo doesn't use that package, the go tool needs to know that it has no source code. This patch fixes it. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r 2048f406394a libgo/Makefile.am --- a/libgo/Makefile.am Tue Mar 24 13:54:30 2015 -0700 +++ b/libgo/Makefile.am Wed Mar 25 14:14:41 2015 -0700 @@ -987,7 +987,7 @@ echo 'package main' > zstdpkglist.go.tmp echo "" >> zstdpkglist.go.tmp echo 'var stdpkg = map[string]bool{' >> zstdpkglist.go.tmp - echo $(libgo_go_objs) 'unsafe.lo' | sed 's/\.lo /\": true,\n/g' | sed 's/\.lo/\": true,/' | sed 's/-go//' | grep -v _c | sed 's/^/\t\"/' | sort | uniq >> zstdpkglist.go.tmp + echo $(libgo_go_objs) 'unsafe.lo' 'runtime/cgo.lo' | sed 's/\.lo /\": true,\n/g' | sed 's/\.lo/\": true,/' | sed 's/-go//' | grep -v _c | sed 's/^/\t\"/' | sort | uniq >> zstdpkglist.go.tmp echo '}' >> zstdpkglist.go.tmp $(SHELL) $(srcdir)/mvifdiff.sh zstdpkglist.go.tmp zstdpkglist.go $(STAMP) $@
Re: [PATCH, rs6000, 4.9] Backport little endian swap optimization to 4.9
On Wed, Mar 25, 2015 at 12:42 PM, Bill Schmidt wrote: > Hi, > > The POWER-specific little-endian swap optimization pass has been burning > in on mainline since last August. Since then there have been a few > improvements and bug fixes, but the code is very stable. I've had some > recent requests to get this code backported to 4.9, as it provides > important performance benefits for vector computation. > > Most of the work is target-specific, but there are some > target-independent changes to convert web.c to use a class structure so > that this pass can inherit it. This has caused no problems and was not > controversial when added to trunk. > > The rest of the patch is straightforward backporting of the > target-specific pieces and test cases. There have been a few > infrastructural changes to adjust to, but nothing major. > > After this goes in, I'll work on taking it back to 4.8 as well. OK for > 4.9? > > Thanks, > Bill > > > gcc: > > 2015-03-25 Bill Schmidt > > Backport of r214242, r214254, and bug fix patches from mainline > * config/rs6000/rs6000.c (context.h): New #include. > (tree-pass.h): Likewise. > (make_pass_analyze_swaps): New declaration. > (rs6000_option_override): Register swap-optimization pass. > (swap_web_entry): New class. > (special_handling_values): New enum. > (union_defs): New function. > (union_uses): Likewise. > (insn_is_load_p): Likewise. > (insn_is_store_p): Likewise. > (insn_is_swap_p): Likewise. > (rtx_is_swappable_p): Likewise. > (insn_is_swappable_p): Likewise. > (chain_purpose): New enum. > (chain_contains_only_swaps): New function. > (mark_swaps_for_removal): Likewise. > (swap_const_vector_halves): Likewise. > (adjust_subreg_index): Likewise. > (permute_load): Likewise. > (permute_store): Likewise. > (adjust_extract): Likewise. > (adjust_splat): Likewise. > (handle_special_swappables): Likewise. > (replace_swap_with_copy): Likewise. > (dump_swap_insn_table): Likewise. > (rs6000_analyze_swaps): Likewise. 
> (pass_data_analyze_swaps): New pass_data. > (pass_analyze_swaps): New class. > (pass_analyze_swaps::gate): New method. > (pass_analyze_swaps::execute): New method. > (make_pass_analyze_swaps): New function. > * config/rs6000/rs6000.opt (moptimize-swaps): New option. > * df.h (web_entry_base): New class, replacing struct web_entry. > (web_entry_base::pred): New method. > (web_entry_base::set_pred): Likewise. > (web_entry_base::unionfind_root): Likewise. > (web_entry_base::unionfind_union): Likewise. > (unionfind_root): Delete external reference. > (unionfind_union): Likewise. > (union_defs): Likewise. > * web.c (web_entry_base::unionfind_root): Convert to method. > (web_entry_base::unionfind_union): Likewise. > (web_entry): New class. > (union_match_dups): Convert to use class structure. > (union_defs): Likewise. > (entry_register): Likewise. > (web_main): Likewise. > > [testsuite] > > 2015-03-25 Bill Schmidt > > Backport r214254 and related tests from mainline > * gcc.target/powerpc/swaps-p8-1.c: New test. > * gcc.target/powerpc/swaps-p8-2.c: New test. > * gcc.target/powerpc/swaps-p8-3.c: New test. > * gcc.target/powerpc/swaps-p8-4.c: New test. > * gcc.target/powerpc/swaps-p8-5.c: New test. > * gcc.target/powerpc/swaps-p8-6.c: New test. > * gcc.target/powerpc/swaps-p8-7.c: New test. > * gcc.target/powerpc/swaps-p8-8.c: New test. > * gcc.target/powerpc/swaps-p8-9.c: New test. > * gcc.target/powerpc/swaps-p8-10.c: New test. > * gcc.target/powerpc/swaps-p8-11.c: New test. > * gcc.target/powerpc/swaps-p8-12.c: New test. > * gcc.target/powerpc/swaps-p8-13.c: New test. > * gcc.target/powerpc/swaps-p8-14.c: New test. > * gcc.target/powerpc/swaps-p8-15.c: New test. > * gcc.target/powerpc/swaps-p8-16.c: New test. > * gcc.target/powerpc/swaps-p8-17.c: New test. Okay. However, I was hoping to perform the backport of both this piece and your newer swapping patches together, but those patches cannot go in until GCC 5 is released and trunk is re-opened for non-bug fix patches. 
Once some of the optimizations are applied, users complain about other extraneous swaps addressed by your next set of patches. Thanks, David
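The target-independent part of the quoted ChangeLog turns the union-find helpers of web.c into methods of a base class that a pass-specific entry class can inherit from. The sketch below shows that general shape only; it is not the GCC source. The names web_node, root, unite and tagged_node are hypothetical, and the extra field on the derived class is purely illustrative.

```cpp
#include <cstddef>

// Hypothetical miniature of a union-find node with a predecessor link.
class web_node
{
public:
  web_node *pred () const { return pred_; }
  void set_pred (web_node *p) { pred_ = p; }

  // Find the representative of this node's set and compress the path.
  web_node *root ();

private:
  web_node *pred_ = nullptr;
};

web_node *
web_node::root ()
{
  // First walk up to the representative.
  web_node *r = this;
  while (r->pred ())
    r = r->pred ();

  // Then point every node on the path directly at it.
  web_node *n = this;
  while (n->pred ())
    {
      web_node *next = n->pred ();
      n->set_pred (r);
      n = next;
    }
  return r;
}

// Merge the sets containing A and B.  Returns true if they were already in
// the same set, false if a merge actually happened.
bool
unite (web_node *a, web_node *b)
{
  a = a->root ();
  b = b->root ();
  if (a == b)
    return true;
  b->set_pred (a);
  return false;
}

// A pass can attach its own per-entry data by deriving from the base class.
struct tagged_node : web_node
{
  bool interesting = false;  // illustrative payload only
};
```

Because the set operations live in the base class, a derived entry type gets them for free and only adds the data its own analysis needs.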
[PATCH, testsuite]: Fix gcc.target/i386/sse-{13,23}.c
Hello! For some reason gcc.target/i386/sse-13.c lost its #include . Attached patch fixes this issue and adjusts corresponding #defines. The patch also removes extra #includes from sse-23.c. 2015-03-25 Uros Bizjak * gcc.target/i386/sse-13.c: Include x86intrin.h and adjust #defines. * gcc.target/i386/sse-23.c: Do not explicitly include wmmintrin.h, smmintrin.h and mm3dnow.h. Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: gcc.target/i386/sse-13.c === --- gcc.target/i386/sse-13.c(revision 221669) +++ gcc.target/i386/sse-13.c(working copy) @@ -55,20 +55,6 @@ #define __builtin_ia32_vcvtps2ph(A, I) __builtin_ia32_vcvtps2ph(A, 1) #define __builtin_ia32_vcvtps2ph256(A, I) __builtin_ia32_vcvtps2ph256(A, 1) -/* avx512pfintrin.h */ -#define __builtin_ia32_gatherpfdps(A, B, C, D, E) __builtin_ia32_gatherpfdps (A, B, C, 1, 1) -#define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps (A, B, C, 1, 1) -#define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps (A, B, C, 1, 1) -#define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps (A, B, C, 1, 1) - -/* avx512erintrin.h */ -#define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 1) -#define __builtin_ia32_exp2ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask (A, B, C, 1) -#define __builtin_ia32_rcp28pd_mask(A, B, C, D) __builtin_ia32_rcp28pd_mask (A, B, C, 1) -#define __builtin_ia32_rcp28ps_mask(A, B, C, D) __builtin_ia32_rcp28ps_mask (A, B, C, 1) -#define __builtin_ia32_rsqrt28pd_mask(A, B, C, D) __builtin_ia32_rsqrt28pd_mask (A, B, C, 1) -#define __builtin_ia32_rsqrt28ps_mask(A, B, C, D) __builtin_ia32_rsqrt28ps_mask (A, B, C, 1) - /* wmmintrin.h */ #define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1) #define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1) @@ -195,13 +181,13 @@ #define __builtin_ia32_gatherdiv4si256(X, Y, Z, K, M) 
__builtin_ia32_gatherdiv4si256(X, Y, Z, K, 1) /* rtmintrin.h */ -#define __builtin_ia32_xabort (N) __builtin_ia32_xabort (1) +#define __builtin_ia32_xabort(N) __builtin_ia32_xabort(1) /* avx512fintrin.h */ #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8) #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8) -#define __builtin_ia32_addsd_mask(A, B, C, D, E) __builtin_ia32_addsd_mask(A, B, C, D, 8) -#define __builtin_ia32_addss_mask(A, B, C, D, E) __builtin_ia32_addss_mask(A, B, C, D, 8) +#define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8) +#define __builtin_ia32_addss_round(A, B, C) __builtin_ia32_addss_round(A, B, 8) #define __builtin_ia32_alignd512_mask(A, B, F, D, E) __builtin_ia32_alignd512_mask(A, B, 1, D, E) #define __builtin_ia32_alignq512_mask(A, B, F, D, E) __builtin_ia32_alignq512_mask(A, B, 1, D, E) #define __builtin_ia32_cmpd512_mask(A, B, E, D) __builtin_ia32_cmpd512_mask(A, B, 1, D) @@ -217,11 +203,11 @@ #define __builtin_ia32_cvtps2dq512_mask(A, B, C, D) __builtin_ia32_cvtps2dq512_mask(A, B, C, 8) #define __builtin_ia32_cvtps2pd512_mask(A, B, C, D) __builtin_ia32_cvtps2pd512_mask(A, B, C, 8) #define __builtin_ia32_cvtps2udq512_mask(A, B, C, D) __builtin_ia32_cvtps2udq512_mask(A, B, C, 8) -#define __builtin_ia32_cvtsd2ss_mask(A, B, C, D, E) __builtin_ia32_cvtsd2ss_mask(A, B, C, D, 8) +#define __builtin_ia32_cvtsd2ss_round(A, B, C) __builtin_ia32_cvtsd2ss_round(A, B, 8) +#define __builtin_ia32_cvtss2sd_round(A, B, C) __builtin_ia32_cvtss2sd_round(A, B, 4) #define __builtin_ia32_cvtsi2sd64(A, B, C) __builtin_ia32_cvtsi2sd64(A, B, 8) #define __builtin_ia32_cvtsi2ss32(A, B, C) __builtin_ia32_cvtsi2ss32(A, B, 8) #define __builtin_ia32_cvtsi2ss64(A, B, C) __builtin_ia32_cvtsi2ss64(A, B, 8) -#define __builtin_ia32_cvtss2sd_mask(A, B, C, D, E) __builtin_ia32_cvtss2sd_mask(A, B, C, D, 8) #define __builtin_ia32_cvttpd2dq512_mask(A, B, C, D) 
__builtin_ia32_cvttpd2dq512_mask(A, B, C, 8) #define __builtin_ia32_cvttpd2udq512_mask(A, B, C, D) __builtin_ia32_cvttpd2udq512_mask(A, B, C, 8) #define __builtin_ia32_cvttps2dq512_mask(A, B, C, D) __builtin_ia32_cvttps2dq512_mask(A, B, C, 8) @@ -232,8 +218,8 @@ #define __builtin_ia32_cvtusi2ss64(A, B, C) __builtin_ia32_cvtusi2ss64(A, B, 8) #define __builtin_ia32_divpd512_mask(A, B, C, D, E) __builtin_ia32_divpd512_mask(A, B, C, D, 8) #define __builtin_ia32_divps512_mask(A, B, C, D, E) __builtin_ia32_divps512_mask(A, B, C, D, 8) -#define __builtin_ia32_divsd_mask(A, B, C, D, E) __builtin_ia32_divsd_mask(A, B, C, D, 8) -#define __builtin_ia32_divss_mask(A, B, C, D, E) __builtin_ia32_divss_mask(A, B, C, D, 8) +#define __builtin_ia32_divsd_round(A, B, C) __builtin_ia32_divsd_round(A, B, 8) +#define __builtin_ia32_divss_round(A, B, C) __builtin_ia32_divss_roun
Re: [PATCH, rs6000, 4.9] Backport little endian swap optimization to 4.9
On Wed, 2015-03-25 at 17:56 -0400, David Edelsohn wrote: > On Wed, Mar 25, 2015 at 12:42 PM, Bill Schmidt > wrote: > > Hi, > > > > The POWER-specific little-endian swap optimization pass has been burning > > in on mainline since last August. Since then there have been a few > > improvements and bug fixes, but the code is very stable. I've had some > > recent requests to get this code backported to 4.9, as it provides > > important performance benefits for vector computation. > > > > Most of the work is target-specific, but there are some > > target-independent changes to convert web.c to use a class structure so > > that this pass can inherit it. This has caused no problems and was not > > controversial when added to trunk. > > > > The rest of the patch is straightforward backporting of the > > target-specific pieces and test cases. There have been a few > > infrastructural changes to adjust to, but nothing major. > > > > After this goes in, I'll work on taking it back to 4.8 as well. OK for > > 4.9? > > > > Thanks, > > Bill > > > > > > gcc: > > > > 2015-03-25 Bill Schmidt > > > > Backport of r214242, r214254, and bug fix patches from mainline > > * config/rs6000/rs6000.c (context.h): New #include. > > (tree-pass.h): Likewise. > > (make_pass_analyze_swaps): New declaration. > > (rs6000_option_override): Register swap-optimization pass. > > (swap_web_entry): New class. > > (special_handling_values): New enum. > > (union_defs): New function. > > (union_uses): Likewise. > > (insn_is_load_p): Likewise. > > (insn_is_store_p): Likewise. > > (insn_is_swap_p): Likewise. > > (rtx_is_swappable_p): Likewise. > > (insn_is_swappable_p): Likewise. > > (chain_purpose): New enum. > > (chain_contains_only_swaps): New function. > > (mark_swaps_for_removal): Likewise. > > (swap_const_vector_halves): Likewise. > > (adjust_subreg_index): Likewise. > > (permute_load): Likewise. > > (permute_store): Likewise. > > (adjust_extract): Likewise. > > (adjust_splat): Likewise. 
> > (handle_special_swappables): Likewise. > > (replace_swap_with_copy): Likewise. > > (dump_swap_insn_table): Likewise. > > (rs6000_analyze_swaps): Likewise. > > (pass_data_analyze_swaps): New pass_data. > > (pass_analyze_swaps): New class. > > (pass_analyze_swaps::gate): New method. > > (pass_analyze_swaps::execute): New method. > > (make_pass_analyze_swaps): New function. > > * config/rs6000/rs6000.opt (moptimize-swaps): New option. > > * df.h (web_entry_base): New class, replacing struct web_entry. > > (web_entry_base::pred): New method. > > (web_entry_base::set_pred): Likewise. > > (web_entry_base::unionfind_root): Likewise. > > (web_entry_base::unionfind_union): Likewise. > > (unionfind_root): Delete external reference. > > (unionfind_union): Likewise. > > (union_defs): Likewise. > > * web.c (web_entry_base::unionfind_root): Convert to method. > > (web_entry_base::unionfind_union): Likewise. > > (web_entry): New class. > > (union_match_dups): Convert to use class structure. > > (union_defs): Likewise. > > (entry_register): Likewise. > > (web_main): Likewise. > > > > [testsuite] > > > > 2015-03-25 Bill Schmidt > > > > Backport r214254 and related tests from mainline > > * gcc.target/powerpc/swaps-p8-1.c: New test. > > * gcc.target/powerpc/swaps-p8-2.c: New test. > > * gcc.target/powerpc/swaps-p8-3.c: New test. > > * gcc.target/powerpc/swaps-p8-4.c: New test. > > * gcc.target/powerpc/swaps-p8-5.c: New test. > > * gcc.target/powerpc/swaps-p8-6.c: New test. > > * gcc.target/powerpc/swaps-p8-7.c: New test. > > * gcc.target/powerpc/swaps-p8-8.c: New test. > > * gcc.target/powerpc/swaps-p8-9.c: New test. > > * gcc.target/powerpc/swaps-p8-10.c: New test. > > * gcc.target/powerpc/swaps-p8-11.c: New test. > > * gcc.target/powerpc/swaps-p8-12.c: New test. > > * gcc.target/powerpc/swaps-p8-13.c: New test. > > * gcc.target/powerpc/swaps-p8-14.c: New test. > > * gcc.target/powerpc/swaps-p8-15.c: New test. > > * gcc.target/powerpc/swaps-p8-16.c: New test. 
> > * gcc.target/powerpc/swaps-p8-17.c: New test. > > Okay. > > However, I was hoping to perform the backport of both this piece and > your newer swapping patches together, but those patches cannot go in > until GCC 5 is released and trunk is re-opened for non-bug fix > patches. Once some of the optimizations are applied, users complain > about other extraneous swaps addressed by your next set of patches. Thanks, David. I agree, and that was my original plan until we had some customer r
Optimize lto location streaming
Hi,
linemap is optimized for the situation where the parser enters positions into it in source order. LTO does not work this way: it attaches locations to trees and reads them more or less randomly. This results in large memory use of linemaps, slow lookups (that are critical for WPA streaming) and, as I noticed recently, also wrong line and column info.

This patch changes that by streaming the locations into a cache that is sorted and applied in source order. The cache also knows how to cheaply discard elements for linemaps of trees that were removed by tree merging.

One catch is that the linemaps are not present in trees and thus cannot be expanded, copied or relocated before calling lto_apply_location_cache. I hope I caught the cases where this can happen. These include:
1) calling debug hooks during ltrans from lto_read_decls
2) producing odr violation warnings from ipa-devirt
3) modifying locations to record blocks (unpack_ts_block_value_fields)
4) for safety I skipped the trick for gimple streaming for now because at least PHI args can probably be relocated.

Bootstrapped/regtested x86_64-linux, the patch saves about 1GB of locators for chromium and 400MB for firefox LTO. OK?

Honza

PR lto/65536 * streamer-hooks.h (struct streamer_hooks): Make input_location to take pointer to location. (stream_input_location): Update. (lto_apply_location_cache, lto_revert_location_cache, lto_accept_location_cache): Declare. (stream_input_location_now): New inline function. * ipa-devirt.c: Include streamer-hooks.h. (warn_odr): Apply location cache before warning. (lto_input_location): Update prototype. * gimple-streamer-in.c (input_phi, input_gimple_stmt): Use stream_input_location_now. * lto/lto.c (unify_scc): Revert location cache when unification succeeded. (lto_read_decls): Accept location cache after success; apply location cache before calling debug hooks. * lto-streamer-in.c (struct cached_location): New.
(loc_cache, accepted_length, current_file, current_line, current_col, current_loc): New static vars. (cmp_loc): New function. (lto_apply_location_cache): New function. (lto_accept_location_cache): New function. (lto_revert_location_cache): New function. (lto_input_location): Do location caching. (input_eh_region, input_struct_function_base): Use stream_input_location_now. * tree-streamer-in.c (unpack_ts_block_value_fields, unpack_ts_omp_clause_value_fields, streamer_read_tree_bitfields, lto_input_ts_exp_tree_pointers): Update for cached location api. Index: streamer-hooks.h === --- streamer-hooks.h(revision 221582) +++ streamer-hooks.h(working copy) @@ -52,7 +52,7 @@ struct streamer_hooks { tree (*read_tree) (struct lto_input_block *, struct data_in *); /* [REQ] Called by every streaming routine that needs to read a location. */ - location_t (*input_location) (struct bitpack_d *, struct data_in *); + void (*input_location) (location_t *, struct bitpack_d *, struct data_in *); /* [REQ] Called by every streaming routine that needs to write a location. */ void (*output_location) (struct output_block *, struct bitpack_d *, location_t); @@ -67,8 +67,8 @@ struct streamer_hooks { #define stream_read_tree(IB, DATA_IN) \ streamer_hooks.read_tree (IB, DATA_IN) -#define stream_input_location(BP, DATA_IN) \ -streamer_hooks.input_location (BP, DATA_IN) +#define stream_input_location(LOCPTR, BP, DATA_IN) \ +streamer_hooks.input_location (LOCPTR, BP, DATA_IN) #define stream_output_location(OB, BP, LOC) \ streamer_hooks.output_location (OB, BP, LOC) @@ -78,5 +78,21 @@ extern struct streamer_hooks streamer_ho /* In streamer-hooks.c. */ void streamer_hooks_init (void); +bool lto_apply_location_cache (); +void lto_revert_location_cache (); +void lto_accept_location_cache (); + +/* Read location and return it instead of going through location caching. + This should be used only when the resulting location is not going to be + discarded. 
*/ + +inline location_t +stream_input_location_now (struct bitpack_d *bp, struct data_in *data) +{ + location_t loc; + streamer_hooks.input_location (&loc, bp, data); + lto_apply_location_cache (); + return loc; +} #endif /* GCC_STREAMER_HOOKS_H */ Index: ipa-devirt.c === --- ipa-devirt.c(revision 221582) +++ ipa-devirt.c(working copy) @@ -166,7 +166,7 @@ along with GCC; see the file COPYING3. #include "gimple-pretty-print.h" #include "stor-layout.h" #include "intl.h" -#include "demangle.h" +#include "streamer-hooks.h" /* Hash based set of pairs of types. */ typedef struct @@ -936,6 +936,8 @@ warn_odr (tree t1, tree t2, tree st1, tr if (!wa
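The caching idea described in this message (collect location requests while reading, then sort them and hand out locations in source order) can be sketched in isolation. This is only a rough illustration under stated assumptions, not the code from the patch: `toy_location`, `struct cached_loc`, `cache_location`, `apply_cache` and `revert_cache` are hypothetical names, and a simple counter stands in for the real line map.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for location_t: just a counter value.  */
typedef unsigned int toy_location;

/* One pending request: where to store the location, and its position.  */
struct cached_loc
{
  toy_location *dest;	/* Slot in the (toy) tree to fill in later.  */
  const char *file;
  int line;
  int col;
};

#define CACHE_MAX 64
static struct cached_loc cache[CACHE_MAX];
static size_t cache_len;
static toy_location next_location = 1;

/* Record a request instead of creating the location immediately.  */
static void
cache_location (toy_location *dest, const char *file, int line, int col)
{
  assert (cache_len < CACHE_MAX);
  cache[cache_len++] = (struct cached_loc) { dest, file, line, col };
}

/* Order by file, then line, then column.  */
static int
cmp_cached_loc (const void *pa, const void *pb)
{
  const struct cached_loc *a = pa, *b = pb;
  int c = strcmp (a->file, b->file);
  if (c != 0)
    return c;
  if (a->line != b->line)
    return a->line < b->line ? -1 : 1;
  if (a->col != b->col)
    return a->col < b->col ? -1 : 1;
  return 0;
}

/* Sort the pending requests and hand out locations in source order,
   sharing one location between identical positions.  */
static void
apply_cache (void)
{
  qsort (cache, cache_len, sizeof cache[0], cmp_cached_loc);
  for (size_t i = 0; i < cache_len; i++)
    {
      if (i > 0 && cmp_cached_loc (&cache[i - 1], &cache[i]) == 0)
	*cache[i].dest = *cache[i - 1].dest;
      else
	*cache[i].dest = next_location++;
    }
  cache_len = 0;
}

/* Drop every request recorded after ACCEPTED_LEN, e.g. for trees that
   were thrown away again; nothing has been allocated for them yet.  */
static void
revert_cache (size_t accepted_len)
{
  assert (accepted_len <= cache_len);
  cache_len = accepted_len;
}
```

In this toy version a reverted request costs nothing, because no location value is consumed until `apply_cache` runs; the destination slots are also not valid until then, which mirrors the catch mentioned in the message.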
[PATCH] testsuite checks for arm vectorization support on non-arm targets
The attached patch adds tests to lib/target-supports.exp to avoid unnecessarily invoking the compiler on non-ARM targets to check for the support for a number of ARM vectorization features. Okay to commit to trunk? Martin 2015-03-23 Martin Sebor * lib/target-supports.exp (check_effective_target_arm32): Fail early when target isn't arm*-*-*-*. (check_effective_target_arm_nothumb): Likewise. (check_effective_target_arm_little_endian): Likewise. (check_effective_target_arm_vect_no_misalign): Likewise. (check_effective_target_aarch64_little_endian): Fail early if target isn't aarch64*-*-*. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 6b957de..25786df 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2373,6 +2373,10 @@ proc check_effective_target_aarch64_big_endian { } { # Return 1 if this is a AArch64 target supporting little endian proc check_effective_target_aarch64_little_endian { } { +if { ![istarget aarch64*-*-*] } { + return 0 +} + return [check_no_compiler_messages aarch64_little_endian assembly { #if !defined(__aarch64__) || defined(__AARCH64EB__) #error FOO @@ -2382,6 +2386,10 @@ proc check_effective_target_aarch64_little_endian { } { # Return 1 if this is an arm target using 32-bit instructions proc check_effective_target_arm32 { } { +if { ![istarget arm*-*-*] } { + return 0 +} + return [check_no_compiler_messages arm32 assembly { #if !defined(__arm__) || (defined(__thumb__) && !defined(__thumb2__)) #error !__arm || __thumb__ && !__thumb2__ @@ -2391,6 +2399,10 @@ proc check_effective_target_arm32 { } { # Return 1 if this is an arm target not using Thumb proc check_effective_target_arm_nothumb { } { +if { ![istarget arm*-*-*] } { + return 0 +} + return [check_no_compiler_messages arm_nothumb assembly { #if !defined(__arm__) || (defined(__thumb__) || defined(__thumb2__)) #error !__arm__ || __thumb || __thumb2__ @@ -2400,6 +2412,10 @@ proc 
check_effective_target_arm_nothumb { } { # Return 1 if this is a little-endian ARM target proc check_effective_target_arm_little_endian { } { +if { ![istarget arm*-*-*] } { + return 0 +} + return [check_no_compiler_messages arm_little_endian assembly { #if !defined(__arm__) || !defined(__ARMEL__) #error !__arm__ || !__ARMEL__ @@ -2409,6 +2425,10 @@ proc check_effective_target_arm_little_endian { } { # Return 1 if this is an ARM target that only supports aligned vector accesses proc check_effective_target_arm_vect_no_misalign { } { +if { ![istarget arm*-*-*] } { + return 0 +} + return [check_no_compiler_messages arm_vect_no_misalign assembly { #if !defined(__arm__) \ || (defined(__ARM_FEATURE_UNALIGNED) \
Re: [PATCH] testsuite checks for arm vectorization support on non-arm targets
On Wed, Mar 25, 2015 at 05:04:32PM -0600, Martin Sebor wrote: > The attached patch adds tests to lib/target-supports.exp > to avoid unnecessarily invoking the compiler on non-ARM > targets to check for the support for a number of ARM > vectorization features. > > Okay to commit to trunk? > > Martin > 2015-03-23 Martin Sebor Use current date ;) > * lib/target-supports.exp (check_effective_target_arm32): Fail early > when target isn't arm*-*-*-*. > (check_effective_target_arm_nothumb): Likewise. > (check_effective_target_arm_little_endian): Likewise. > (check_effective_target_arm_vect_no_misalign): Likewise. > (check_effective_target_aarch64_little_endian): Fail early if target > isn't aarch64*-*-*. Ok, thanks. Jakub
Re: Fix PR 65177: diamonds are not valid execution threads for jump threading
Jeff Law wrote: > > PR tree-optimization/65177 > > * tree-ssa-threadupdate.c (verify_seme): Renamed verify_jump_thread. > > (bb_in_bbs): New. > > (duplicate_seme_region): Renamed duplicate_thread_path. Redirect all > > edges not adjacent on the path to the original code. > OK for the trunk. Committed r221675. > Though I think there's some stage1 refactoring that we're going to want to do. Agreed. > Specifically, it seems to me that copy_bbs should be refactored into > copy_bbs and copy_bbs_for_threading or somesuch. Where those > routines call into refactored common subroutines, but obviously > handle wiring up the outgoing edges from the copied blocks > differently. > That would be a good cleanup: I don't like to arbitrarily redirect edges in copy_bbs just to redirect them back to their initial place in the caller. > The goal would be to eliminate the overly complex block copy/CFG > update scheme in tree-ssa-threadupdate.c as part of a larger project > to convert to a backward threader that can run independently of DOM. I have a start of a patch for that cleanup, it currently runs wild as it would replace the existing threadupdate code generator with a call to the new duplicate_thread_path. I think we should take smaller more manageable steps to ease the review and to not destabilize the jump-threader. In particular I think we should have both code generators for a while and turn one on/off with an option. Sebastian
[PATCH], PR 65569, Fix powerpc long double regression PR 65240 caused
Pat Haugen runs a spec regression tester on various PowerPC boxes, and he noticed that my fix for PR 65240 (the bug involving floating point constants and -ffast-math under VSX) caused a regression in building the dealII benchmark on power6x. I looked into it, and discovered I had missed extenddftf2_fprs relying on (const_double 0.0) being used in RTL code. This works on VSX systems, where you can use the XXLXOR instruction, but it does not work on previous systems. This patch fixes the problem. I have bootstrapped and ran make check on a power7 big endian system and a power8 little endian system. On power7, the following test had been failing, and is now fixed (it doesn't fail on power8): g++.dg/torture/pr58369.C I have also built the power8-vsx, power7-vsx, power6x-altivec suite with no failures. I'm building power6x-scalar, and power5-scalar shortly. Assuming that the last two spec runs build without errors, can I apply the patch? 2015-03-25 Michael Meissner PR target/65569 * config/rs6000/rs6000.md (extenddftf2_fprs): On VSX systems use XXLXOR to create 0.0. On pre-VSX systems make sure the constant 0.0 is correctly setup. (extenddftf2_internal): Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 221668) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8357,16 +8357,21 @@ (define_expand "extenddftf2_fprs" && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LONG_DOUBLE_128" { - operands[2] = CONST0_RTX (DFmode); - /* Generate GOT reference early for SVR4 PIC. */ - if (DEFAULT_ABI == ABI_V4 && flag_pic) -operands[2] = validize_mem (force_const_mem (DFmode, operands[2])); + /* VSX can create 0.0 directly, otherwise let rs6000_emit_move create + the proper constant. 
*/ + if (TARGET_VSX) +operands[2] = CONST0_RTX (DFmode); + else +{ + operands[2] = gen_reg_rtx (DFmode); + rs6000_emit_move (operands[2], CONST0_RTX (DFmode), DFmode); +} }) (define_insn_and_split "*extenddftf2_internal" - [(set (match_operand:TF 0 "nonimmediate_operand" "=m,Y,d,&d,r") - (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,md,md,rm"))) - (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,m,d,n"))] + [(set (match_operand:TF 0 "nonimmediate_operand" "=m,Y,ws,d,&d,r") + (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,md,md,md,rm"))) + (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,j,m,d,n"))] "!TARGET_IEEEQUAD && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LONG_DOUBLE_128"
libgo patch committed: Avoid some s390 failures
This patch from Dominik Vogt fixes some s390 failures in libgo. Ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r bdce421e579e libgo/go/runtime/chan_test.go --- a/libgo/go/runtime/chan_test.go Wed Mar 25 14:16:52 2015 -0700 +++ b/libgo/go/runtime/chan_test.go Wed Mar 25 17:39:06 2015 -0700 @@ -202,6 +202,11 @@ n := 1 if testing.Short() { n = 100 + } else { + if runtime.GOARCH == "s390" { + // Test uses too much address space on 31-bit S390. + t.Skip("skipping long test on s390") + } } for i := 0; i < n; i++ { c := make(chan int, 1) diff -r bdce421e579e libgo/go/runtime/map_test.go --- a/libgo/go/runtime/map_test.go Wed Mar 25 14:16:52 2015 -0700 +++ b/libgo/go/runtime/map_test.go Wed Mar 25 17:39:06 2015 -0700 @@ -243,7 +243,12 @@ func testConcurrentReadsAfterGrowth(t *testing.T, useReflect bool) { if runtime.GOMAXPROCS(-1) == 1 { - defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(16)) + if runtime.GOARCH == "s390" { + // Test uses too much address space on 31-bit S390. + defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8)) + } else { + defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(16)) + } } numLoop := 10 numGrowStep := 250
Fix line-maps wrt LTO
Hi,
I read linemap_line_start and I think I noticed a few issues with respect to overflows and lines being added randomly.

1) line_delta is computed as to_line - SOURCE_LINE (map, set->highest_line). I think the last inserted line is not very relevant. What we care about is the base of the last location and, to keep things dense, how much we are stretching the value range from the highest inserted element (inserting into the middle is cheap). For this reason I added base_line_delta and changed line_delta to be to_line - SOURCE_LINE (map, set->highest_location). Because things go in randomly, highest_line, which really is the last inserted line, may be somewhere in between.

2) (line_delta > 10 && line_delta * ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) > 1000): ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) is in range 7... 15, so it never gets high enough to make this conditional trigger. I changed it to: || line_delta > 1000 || (line_delta << ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map)) > 1000 I.e. we do not want to skip more than 1000 unused entries since the highest inserted location.

3) (max_column_hint <= 80 && ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) >= 10) seems to intend to reduce the column range when it is no longer needed. Again, this is not really a good idea when the line inserted is not the last.

4) the code deciding whether to reuse the map seems wrong: if (line_delta < 0 || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map) || SOURCE_COLUMN (map, highest) >= (1U << column_bits)). line_delta really should be base_line_delta; we do not need to give up when map's line is 1, SOURCE_LINE (map, set->highest_line) is 5 and we are requested to switch to line 3. Second, last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map) tests whether the location has only one line, which does not work (at least with my changes) because we may switch to the next line and back. This conditional also seems to be completely missing handling of overflows.
The following patch makes all line info and all but one caret correct in chromium warnings. Bootstrapped/regtested x86_64-linux, OK? * line-map.c (linemap_line_start): Correct overflow tests. Index: line-map.c === --- line-map.c (revision 221568) +++ line-map.c (working copy) @@ -519,25 +519,38 @@ linemap_line_start (struct line_maps *se struct line_map *map = LINEMAPS_LAST_ORDINARY_MAP (set); source_location highest = set->highest_location; source_location r; - linenum_type last_line = -SOURCE_LINE (map, set->highest_line); - int line_delta = to_line - last_line; + int base_line_delta = to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map); + int line_delta = to_line - SOURCE_LINE (map, set->highest_location); bool add_map = false; - if (line_delta < 0 - || (line_delta > 10 - && line_delta * ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) > 1000) - || (max_column_hint >= (1U << ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map))) + /* Single MAP entry can be used to encode multiple source lines. + Look for situations when this is impossible or undesriable. */ + if (base_line_delta < 0 + /* We want to keep maps resonably dense, so do not increase the range +of this linemap entry by more than 1000. */ + || line_delta > 1000 + || (line_delta << ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map)) > 1000 + /* If the max column is out of range and we are still not dropping line +info. */ + || (max_column_hint >= (1U << ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map)) + && highest < 0x6000) + /* If the prevoius line was long. Ignore this problem is we already +re-used the map for lines with greater indexes. */ || (max_column_hint <= 80 - && ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) >= 10) + && ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) >= 10 && line_delta > 0) + /* If we are just started running out of locations (which makes us to drop +column info), but current line map still has column info, create fresh +one. 
*/ || (highest > 0x6000 - && (set->max_column_hint || highest > 0x7000))) + && (ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (map) + || highest > 0x7000))) add_map = true; else max_column_hint = set->max_column_hint; if (add_map) { int column_bits; + bool reuse_map = true; if (max_column_hint > 10 || highest > 0x6000) { /* If the column number is ridiculous or we've allocated a huge @@ -554,11 +567,38 @@ linemap_line_start (struct line_maps *se column_bits++; max_column_hint = 1U << column_bits; } + /* Allocate the new line_map. However, if the current map only has a single line we can sometimes just increase its column_bits instead. */ - if (line_delta < 0 -
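The difference between the old product-based density test and the shift-based one from point 2) of this message is easiest to see with concrete numbers. The following is a minimal sketch, assuming non-negative line deltas; the function names `old_gap_too_large` and `new_gap_too_large` are hypothetical, and only the two expressions themselves are taken from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test: multiplies the line delta by the *count* of column bits
   (a value in the range 7..15), not by the number of location values
   that each line occupies.  */
static bool
old_gap_too_large (int line_delta, unsigned int column_bits)
{
  return line_delta > 10 && line_delta * (int) column_bits > 1000;
}

/* Patched test: each line occupies 1 << column_bits location values,
   so the gap measured in location values is line_delta << column_bits.
   The widening to long long is an addition of this sketch so that the
   shift itself cannot overflow.  */
static bool
new_gap_too_large (int line_delta, unsigned int column_bits)
{
  assert (line_delta >= 0 && column_bits < 32);
  return line_delta > 1000
	 || ((long long) line_delta << column_bits) > 1000;
}
```

For a jump of 50 lines with 7 column bits, the old test computes 50 * 7 = 350 and stays quiet, while the shift-based test sees 50 << 7 = 6400 skipped location values and asks for a new map.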
Re: [PATCH, bootstrap]: Add bootstrap-lto-noplugin build configuration (PR65537)
> Hello! > > Attached patch introduces bootstrap-lto-noplugin bootstrap > configuration for hosts that do not support linker plugin (e.g. CentOS > 5.11 with binutils 2.17). Also, the patch adds some additional > documentation to bootstrap-lto option. > > config/ChangeLog: > > 2015-03-24 Uros Bizjak > > PR bootstrap/65537 > * bootstrap-lto-noplugin.mk: New build configuration. > > gcc/ChangeLog: > > 2015-03-24 Uros Bizjak > > PR bootstrap/65537 > * doc/install.texi (Building a native compiler): Document new > bootstrap-lto-noplugin configuration. Mention that bootstrap-lto > configuration assumes that the host supports the linker plugin. > > Patch was bootstrapped and tested on x86_64-linux-gnu (CentOS 5.11) > host, configured with --with-build-config=bootstrap-lto build > configuration. > > OK for mainline? > Index: gcc/doc/install.texi > === > --- gcc/doc/install.texi (revision 221636) > +++ gcc/doc/install.texi (working copy) > @@ -2519,8 +2519,14 @@ > @item @samp{bootstrap-lto} > Enables Link-Time Optimization for host tools during bootstrapping. > @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding > -@option{-flto} to @samp{BOOT_CFLAGS}. > +@option{-flto} to @samp{BOOT_CFLAGS}. This option assumes that the host > +supports the linker plugin (e.g. GNU ld version 2.21 or later or GNU gold > +version 2.21 or later). > > +@item @samp{bootstrap-lto-noplugin} > +This option is similar to @code{bootstrap-lto}, but is intended for > +hosts that do not support the linker plugin. Can you, please, add a note that without linker plugin the static libraries are not compiled with linktime optimization. Because GCC middle-end and backend is in libbackend.a it means that only (part of) the frontend is actually LTO optimized? Currently it seems bit too welcoming to skip the linker update. Honza > + > @item @samp{bootstrap-debug} > Verifies that the compiler generates the same executable code, whether > or not it is asked to emit debug information. To this end, this
Re: [debug-early] emit early dwarf for locally scoped functions
On 03/25/2015 05:05 PM, Aldy Hernandez wrote: Or we could cheat and just remove them as mainline does, but only when reusing a declaration (as in the attached patch). This seems right to me. Jason
[patch, libgfortran] Bug 65541 - [5 Regression] namelist regression
Committed as obvious and simple. revision 221682. Regards, Jerry 2015-03-25 Jerry DeLisle PR libgfortran/65541 * io/write.c (nml_write_obj): Convert '+' to '%' before emitting object names in namelists. Index: io/write.c === --- io/write.c (revision 221681) +++ io/write.c (working copy) @@ -1704,10 +1704,11 @@ size_t clen; index_type elem_ctr; size_t obj_name_len; - void * p ; + void * p; char cup; char * obj_name; char * ext_name; + char * q; size_t ext_name_len; char rep_buff[NML_DIGITS]; namelist_info * cmp; @@ -1745,6 +1746,8 @@ for (dim_i = len; dim_i < clen; dim_i++) { cup = toupper ((int) obj->var_name[dim_i]); + if (cup == '+') + cup = '%'; write_character (dtp, &cup, 1, 1, NODELIM); } write_character (dtp, "=", 1, 1, NODELIM); @@ -1894,6 +1897,9 @@ } ext_name[tot_len] = '\0'; + for (q = ext_name; *q; q++) + if (*q == '+') + *q = '%'; /* Now obj_name. */
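The character conversion named in the ChangeLog entry can be illustrated outside libgfortran. This is a hedged sketch only: `emit_nml_name` is a hypothetical helper that folds the upper-casing and the '+' to '%' replacement into one loop over a plain buffer, whereas the patch itself applies them inside nml_write_obj via write_character and a separate loop over ext_name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy NAME into OUT (capacity OUT_SIZE),
   upper-casing each character and replacing '+' by '%', as the
   object-name output in the patch does.  The result is always
   NUL-terminated and truncated if OUT is too small.  */
static void
emit_nml_name (const char *name, char *out, size_t out_size)
{
  size_t i;

  assert (out_size > 0);
  for (i = 0; name[i] != '\0' && i + 1 < out_size; i++)
    {
      char cup = (char) toupper ((unsigned char) name[i]);
      if (cup == '+')
	cup = '%';
      out[i] = cup;
    }
  out[i] = '\0';
}
```

With this helper, an internal name such as "obj+comp" would be emitted as "OBJ%COMP".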
Re: [PATCH], PR 65569, Fix powerpc long double regression PR 65240 caused
On Wed, Mar 25, 2015 at 8:09 PM, Michael Meissner wrote: > Pat Haugen runs a spec regression tester on various PowerPC boxes, and he > noticed that my fix for PR 65240 (the bug involving floating point constants > and -ffast-math under VSX) caused a regression in building the dealII > benchmark > on power6x. I looked into it, and discovered I had missed extenddftf2_fprs > relying on (const_double 0.0) being used in RTL code. This works on VSX > systems, where you can use the XXLXOR instruction, but it does not work on > previous systems. > > This patch fixes the problem. I have bootstrapped and ran make check on a > power7 big endian system and a power8 little endian system. On power7, the > following test had been failing, and is now fixed (it doesn't fail on power8): > > g++.dg/torture/pr58369.C > > I have also built the power8-vsx, power7-vsx, power6x-altivec suite with no > failures. I'm building power6x-scalar, and power5-scalar shortly. Assuming > that the last two spec runs build without errors, can I apply the patch? > > 2015-03-25 Michael Meissner > > PR target/65569 > * config/rs6000/rs6000.md (extenddftf2_fprs): On VSX systems use > XXLXOR to create 0.0. On pre-VSX systems make sure the constant > 0.0 is correctly setup. > (extenddftf2_internal): Likewise. Okay. Thanks, David
Re: Optimize lto location streaming
Jan Hubicka writes: > > Bootstrapped/regtested x86_64-linux, the patch saves about 1GB of locators > for chromium > and 400MB for firefox LTO. Great. On my LTO builds linemap was always high up in the profiles too. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: Optimize lto location streaming
> Jan Hubicka writes: > > > > Bootstrapped/regtested x86_64-linux, the patch saves about 1GB of locators > > for chromium > > and 400MB for firefox LTO. > > Great. On my LTO builds linemap was always high up in the profiles too. Yep, these were always high. I am re-running some profiles now. I feel somewhat stupid that I did not get this idea a bit earlier - it seems to work well in practice. In GCC 6 we will hopefully look into preserving more of the linemap info (inline stacks & macro expansion) and inventing a less stupid way to pickle the locators, but I think this patch solves the majority of memory use/lookup time issues. I think this was the last major offender for Chromium/LibreOffice and Firefox. (Modulo the fact that chromium needs 9GB for WPA. There does not seem to be much low-hanging fruit - chromium needs a lot of trees to be streamed in that will hopefully be tracked by early debug soon.) What is the status of GCC 5 for kernel compilation? Are the compile times/memory uses reasonable now? Honza
Discover nothrow functions before into_ssa
Hi,
this patch (as suggested by Richard) adds very simple discovery of DECL_NOTHROW to the build_ssa passes. The reason is that in 4.9 we did build_ssa in parallel with early optimization, which does nothrow discovery as part of local pure const. The bounds checking patches broke the pass queue into multiple passes, and we produce a lot more statements when nothrow is not identified early.

I went with a specialized pass because I do not want to pay the cost of local pure const building the loop structure to prove that a function is pure/const. We really care about this only later.

I also tested a variant making this pass part of the early lowering passes. This does not work that well, because these are not run in topological order and the C++ FE already does its own nothrow discovery, so it handled only about 900 calls on tramp3d. This pass handles about 3500 calls, and an additional 3000 calls are handled by the followup ipa-pure-const pass (probably because of extra code removal).

Adding the pass causes the cgraph verifier to fail. The reason is that the fixup_cfg pass at the beginning of the ssa passes now actually does some dead code removal. This makes cgraph edges out of date, and they are not rebuilt at the end of the passes. Instead of triggering yet another rebuild, which would be somewhat redundant given that the early passes rebuild the edges again, I just changed the cgraph verifier to not compare callers' frequencies, only callees'. This way we reduce some work, too. Doing this I removed one very old FIXME about verification that pointed out a latent bug in set_edge_predicate. Fixed thus.

Bootstrapped/regtested x86_64-linux, OK?

* cgraph.c (cgraph_edge::verify_count_and_frequency): Remove testing of frequency and bb match. (cgraph_node::verify_node): Do it here on callees only. * passes.def: Add pass_nothrow. * ipa-pure-const.c (pass_data_nothrow): New. (pass_nothrow): New. (pass_nothrow::execute): New. (make_pass_nothrow): New. * tree-pass.h (make_pass_nothrow): Declare.
* ipa-inline.c (set_edge_predicate): Also redirect indirect edges. Index: cgraph.c === --- cgraph.c(revision 221682) +++ cgraph.c(working copy) @@ -2661,25 +2661,6 @@ cgraph_edge::verify_count_and_frequency error ("caller edge frequency is too large"); error_found = true; } - if (gimple_has_body_p (caller->decl) - && !caller->global.inlined_to - && !speculative - /* FIXME: Inline-analysis sets frequency to 0 when edge is optimized out. -Remove this once edges are actually removed from the function at that time. */ - && (frequency - || (inline_edge_summary_vec.exists () - && ((inline_edge_summary_vec.length () <= (unsigned) uid) - || !inline_edge_summary (this)->predicate))) - && (frequency - != compute_call_stmt_bb_frequency (caller->decl, -gimple_bb (call_stmt -{ - error ("caller edge frequency %i does not match BB frequency %i", -frequency, -compute_call_stmt_bb_frequency (caller->decl, -gimple_bb (call_stmt))); - error_found = true; -} return error_found; } @@ -2848,9 +2829,46 @@ cgraph_node::verify_node (void) error_found = true; } } + for (e = callees; e; e = e->next_callee) +{ + if (e->verify_count_and_frequency ()) + error_found = true; + if (gimple_has_body_p (e->caller->decl) + && !e->caller->global.inlined_to + && !e->speculative + /* Optimized out calls are redirected to __builtin_unreachable. 
*/ + && (e->frequency + || e->callee->decl +!= builtin_decl_implicit (BUILT_IN_UNREACHABLE)) + && (e->frequency + != compute_call_stmt_bb_frequency (e->caller->decl, +gimple_bb (e->call_stmt + { + error ("caller edge frequency %i does not match BB frequency %i", +e->frequency, +compute_call_stmt_bb_frequency (e->caller->decl, +gimple_bb (e->call_stmt))); + error_found = true; + } +} for (e = indirect_calls; e; e = e->next_callee) -if (e->verify_count_and_frequency ()) - error_found = true; +{ + if (e->verify_count_and_frequency ()) + error_found = true; + if (gimple_has_body_p (e->caller->decl) + && !e->caller->global.inlined_to + && !e->speculative + && (e->frequency + != compute_call_stmt_bb_frequency (e->caller->decl, +gimple_bb (e->call_stmt + { + error ("caller edge frequency %i does not match BB frequency %i", +e->frequency, +c
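The remark about topological order in this message can be made concrete with a toy model. The sketch below is not the pass from the patch: `struct toy_function` and `discover_nothrow` are hypothetical, and a function is simply treated as nothrow when its body has no throwing statement and every callee has already been proven nothrow at the time it is visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical, much simplified view of a function body.  */
struct toy_function
{
  bool has_throwing_stmt;	/* Body contains something that may throw.  */
  int callees[MAX_CALLEES];	/* Indices of the functions it calls.  */
  size_t n_callees;
  bool nothrow;			/* Result of the discovery.  */
};

/* Visit the N functions in the sequence given by ORDER.  A call counts
   as non-throwing only if the callee was already proven nothrow, so the
   visiting order decides how much is discovered.  Returns the number of
   functions marked nothrow.  */
static size_t
discover_nothrow (struct toy_function *fns, const int *order, size_t n)
{
  size_t found = 0;

  for (size_t i = 0; i < n; i++)
    {
      struct toy_function *f = &fns[order[i]];
      bool can_throw = f->has_throwing_stmt;

      for (size_t c = 0; c < f->n_callees && !can_throw; c++)
	{
	  assert (f->callees[c] >= 0 && (size_t) f->callees[c] < n);
	  if (!fns[f->callees[c]].nothrow)
	    can_throw = true;
	}
      f->nothrow = !can_throw;
      found += f->nothrow;
    }
  return found;
}
```

For a chain where f0 calls f1 and f1 calls f2, visiting callees first marks all three functions, while visiting callers first marks only the leaf. That is the effect the message attributes to running outside topological order.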
[PINGv3][PATCH] ASan on unaligned accesses
On 03/19/2015 09:01 AM, Marat Zakirov wrote:

On 03/04/2015 11:07 AM, Andrew Pinski wrote:

On Wed, Mar 4, 2015 at 12:00 AM, Marat Zakirov wrote:

Hi all! Here is the patch which forces ASan to work on memory accesses without proper alignment. It's useful because some programs, like the Linux kernel, often cheat with alignment, which may cause false negatives. This patch needs additional support for proper work on unaligned accesses in global data and heap. It will be implemented in libsanitizer by a separate patch. --Marat

gcc/ChangeLog: 2015-02-25 Marat Zakirov * asan.c (asan_emit_stack_protection): Support for misaligned accesses. (asan_expand_check_ifn): Likewise. * params.def: New option asan-catch-misaligned. * params.h: New param ASAN_CATCH_MISALIGNED.

Since this parameter can only be true or false, I think it should be a normal option. Also you did not add documentation of the param. Thanks, Andrew

Fixed.

gcc/ChangeLog: 2015-03-12 Marat Zakirov * asan.c (asan_emit_stack_protection): Support for misaligned accesses. (asan_expand_check_ifn): Likewise. * common.opt: New flag -fasan-catch-misaligned. * doc/invoke.texi: New flag description. * opts.c (finish_options): Add check for new flag. (common_handle_option): Switch on flag if SANITIZE_KERNEL_ADDRESS.

gcc/testsuite/ChangeLog: 2015-03-12 Marat Zakirov * c-c++-common/asan/misalign-catch.c: New test.
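The false negatives mentioned in this message come from how a shadow-memory check behaves when an access straddles two 8-byte granules. The sketch below is a generic illustration of ASan's usual shadow encoding, not code from this patch: the function names, the toy shadow array without an offset, and the sample poison value are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SHIFT 3			/* One shadow byte per 8 bytes.  */
#define GRANULE (1u << SHADOW_SHIFT)

/* Toy shadow encoding, following ASan's usual convention: 0 means all
   8 bytes of the granule are addressable, k in 1..7 means only the
   first k bytes are, anything else means the granule is poisoned.  */
static bool
byte_addressable (const uint8_t *shadow, size_t addr)
{
  uint8_t s = shadow[addr >> SHADOW_SHIFT];

  if (s == 0)
    return true;
  return s < GRANULE && (addr & (GRANULE - 1)) < s;
}

/* Check that consults only the granule holding the first byte.  This is
   enough when the access is known not to cross a granule boundary.  */
static bool
access_ok_assuming_aligned (const uint8_t *shadow, size_t addr, size_t size)
{
  uint8_t s = shadow[addr >> SHADOW_SHIFT];

  assert (size >= 1 && size <= GRANULE);
  if (s == 0)
    return true;
  return s < GRANULE && (addr & (GRANULE - 1)) + size - 1 < s;
}

/* Check that makes no alignment assumption: every byte touched by the
   access is validated against its own granule.  */
static bool
access_ok_any_alignment (const uint8_t *shadow, size_t addr, size_t size)
{
  assert (size >= 1);
  for (size_t i = 0; i < size; i++)
    if (!byte_addressable (shadow, addr + i))
      return false;
  return true;
}
```

With a shadow of { 0, 0xF3 }, where the second value is an arbitrary poison marker, a 4-byte access at address 6 passes the first check because only granule 0 is consulted, but fails the second because bytes 8 and 9 fall into the poisoned granule.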
diff --git a/gcc/asan.c b/gcc/asan.c index 9e4a629..80bf2e8 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1050,7 +1050,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, rtx_code_label *lab; rtx_insn *insns; char buf[30]; - unsigned char shadow_bytes[4]; HOST_WIDE_INT base_offset = offsets[length - 1]; HOST_WIDE_INT base_align_bias = 0, offset, prev_offset; HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset; @@ -1193,11 +1192,37 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (STRICT_ALIGNMENT) set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode))); prev_offset = base_offset; + + vec shadow_mems; + vec shadow_bytes; + + shadow_mems.create(0); + shadow_bytes.create(0); + for (l = length; l; l -= 2) { if (l == 2) cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT; offset = offsets[l - 1]; + if (l != length && flag_asan_catch_misaligned) + { + HOST_WIDE_INT aoff + = base_offset + ((offset - base_offset) + & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1)) + - ASAN_RED_ZONE_SIZE; + if (aoff > prev_offset) + { + shadow_mem = adjust_address (shadow_mem, VOIDmode, + (aoff - prev_offset) + >> ASAN_SHADOW_SHIFT); + prev_offset = aoff; + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_mems.safe_push (shadow_mem); + } + } if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1)) { int i; @@ -1212,13 +1237,13 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (aoff < offset) { if (aoff < offset - (1 << ASAN_SHADOW_SHIFT) + 1) - shadow_bytes[i] = 0; + shadow_bytes.safe_push (0); else - shadow_bytes[i] = offset - aoff; + shadow_bytes.safe_push (offset - aoff); } else - shadow_bytes[i] = ASAN_STACK_MAGIC_PARTIAL; - emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes)); + shadow_bytes.safe_push (ASAN_STACK_MAGIC_PARTIAL); + shadow_mems.safe_push(shadow_mem); offset = aoff; } while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE) @@ 
-1227,12 +1252,21 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, (offset - prev_offset) >> ASAN_SHADOW_SHIFT); prev_offset = offset; - memset (shadow_bytes, cur_shadow_byte, 4); - emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes)); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_mems.safe_push(shadow_mem); offset += ASAN_RED_ZONE_SIZE; } cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE; } + for (unsigned i = 0; flag_asan_catch_misaligned && i < shadow_bytes.length () - 1; i++) +if (shadow_bytes[i] == 0 && shadow_bytes[i + 1] > 0) + shadow_bytes[i] = 8 + (shadow_bytes[i + 1] > 7 ? 0 : shadow_bytes[i + 1]); + for (unsigned i = 0; i < shadow_mems.length (); i++) +emit_move_insn (shadow_mems[i], asan_shadow_cst (&shadow_bytes[i * 4])); + do_pending_stack_adjust (); /* Construct epilogue sequence. */ @@ -1285,34 +1319,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (STRICT_ALIGNMENT) set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode))); - prev_offset = base_offset; - last_offset = base_offset; - last_size = 0; -