[PATCH, PR68809] Fix same_close_phi_node
Hi, this patch fixes graphite PR68809, a 6 regression. The problem we run into is that we return true in same_close_phi_node for two single-argument phis, one with an int 0 argument, and one with a char 0 argument: ... _1 = PHI <(int)0> _2 = PHI <(char)0> ... The result types of the two phis are not the same though. So when we replace uses of _1 by uses of _2 in remove_duplicate_close_phi, we introduce type errors, which causes an ICE during gimple verification. The patch fixes this by testing for equality of the result type of the phis in same_close_phi_node. Bootstrapped and reg-tested on x86_64. OK for stage4 trunk? Thanks, - Tom Fix same_close_phi_node 2016-03-15 Tom de Vries PR tree-optimization/68809 * graphite-scop-detection.c (same_close_phi_node): Test if result types are the same. * gcc.dg/graphite/pr68809-2.c: New test. * gcc.dg/graphite/pr68809.c: New test. --- gcc/graphite-scop-detection.c | 6 -- gcc/testsuite/gcc.dg/graphite/pr68809-2.c | 27 +++ gcc/testsuite/gcc.dg/graphite/pr68809.c | 28 3 files changed, 59 insertions(+), 2 deletions(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 03b1c49..9161cb7 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -273,8 +273,10 @@ trivially_empty_bb_p (basic_block bb) static inline bool same_close_phi_node (gphi *p1, gphi *p2) { - return operand_equal_p (gimple_phi_arg_def (p1, 0), - gimple_phi_arg_def (p2, 0), 0); + return ((TREE_TYPE (gimple_phi_result (p1)) + == TREE_TYPE (gimple_phi_result (p2))) + && operand_equal_p (gimple_phi_arg_def (p1, 0), + gimple_phi_arg_def (p2, 0), 0)); } static void make_close_phi_nodes_unique (basic_block bb); diff --git a/gcc/testsuite/gcc.dg/graphite/pr68809-2.c b/gcc/testsuite/gcc.dg/graphite/pr68809-2.c new file mode 100644 index 000..e6639b8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr68809-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -floop-nest-optimize" } */ + +int ae, vs, gf; +char ue; + +void +kc (char); + +void +pm (void) +{ + unsigned int v9; + int td = (gf != 0); + while (vs) +{ + kc (ue); + for (ae = 0; ae < 70; ++ae) + { + } + ae &= 4; + ae ^ td && ((ue = 0) != 0); + ++vs; +} + v9 = ue + 1; + ue - v9 && ((ue = 0) != 0); +} diff --git a/gcc/testsuite/gcc.dg/graphite/pr68809.c b/gcc/testsuite/gcc.dg/graphite/pr68809.c new file mode 100644 index 000..1d75841 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr68809.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -floop-nest-optimize" } */ + +int ae, vs; +char ue; + +void +kc (char); + +void +pm (void) +{ + unsigned int v9; + int gf = 0; + vs = 1; + while (vs) +{ + gf -= ue; + kc (ue); + for (ae = 0; ae < 70; ++ae) + { + } + ae &= 4; + ae ^ (gf != 0) && ((ue = 0) != 0); +} + v9 = ue + 1; + ue - v9 && ((ue = 0) != 0); +}
[PATCH, PR68715] Add missing single_pred_p test in scop_detection::merge_sese
Hi, this patch fixes graphite PR68715, a 6 regression. In scop_detection::merge_sese, we check if the exit bb of the merged sese region is dominated by the entry bb: ... if (... || !dominated_by_p (CDI_DOMINATORS, get_exit_bb (combined), get_entry_bb (combined))) { ... return invalid_sese; } ... Subsequently, we check for an empty exit bb, and if that one's not empty, we try to merge an empty successor block into the sese: ... /* FIXME: We should remove this piece of code once canonicalize_loop_closed_ssa has been removed, because that function adds a BB with single exit. */ if (!trivially_empty_bb_p (get_exit_bb (combined))) { /* Find the first empty succ (with single exit) of combined.exit. */ basic_block imm_succ = combined.exit->dest; if (single_succ_p (imm_succ) && trivially_empty_bb_p (imm_succ)) combined.exit = single_succ_edge (imm_succ); else { ... return invalid_sese; } } ... However, when the imm_succ block has more than one predecessor, merging the block into the sese region breaks the property that the exit bb is dominated by the entry bb. We then run into an ICE in harmful_loop_in_region a bit later, when re-checking that property. The patch fixes this by adding a test for 'single_pred_p (imm_succ)' in the empty-block-merge condition. Bootstrapped and reg-tested on x86_64. OK for stage4 trunk? Thanks, - Tom Add missing single_pred_p test in scop_detection::merge_sese 2016-03-15 Tom de Vries PR tree-optimization/68715 * graphite-scop-detection.c (scop_detection::merge_sese): Add missing single_pred_p test. * gcc.dg/graphite/pr68715-2.c: New test. * gcc.dg/graphite/pr68715.c: New test. * gfortran.dg/graphite/pr68715.f90: New test. --- gcc/graphite-scop-detection.c | 4 ++- gcc/testsuite/gcc.dg/graphite/pr68715-2.c | 35 + gcc/testsuite/gcc.dg/graphite/pr68715.c| 36 ++ gcc/testsuite/gfortran.dg/graphite/pr68715.f90 | 31 ++ 4 files changed, 105 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 9161cb7..f0c13ee 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -836,7 +836,9 @@ scop_detection::merge_sese (sese_l first, sese_l second) const { /* Find the first empty succ (with single exit) of combined.exit. */ basic_block imm_succ = combined.exit->dest; - if (single_succ_p (imm_succ) && trivially_empty_bb_p (imm_succ)) + if (single_succ_p (imm_succ) + && single_pred_p (imm_succ) + && trivially_empty_bb_p (imm_succ)) combined.exit = single_succ_edge (imm_succ); else { diff --git a/gcc/testsuite/gcc.dg/graphite/pr68715-2.c b/gcc/testsuite/gcc.dg/graphite/pr68715-2.c new file mode 100644 index 000..270d948 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr68715-2.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -floop-interchange" } */ + +int a, b, c, d, f, g; +int e[1], h[1]; +void fn2 (); +void fn3 (); +void +fn1 () +{ + fn2 (); + b = 0; + for (; b < 10; b++) +; +} + +void +fn2 () +{ + if (a) +{ + fn3 (); + c = d; +} +} + +void +fn3 () +{ + for (; g; g++) +e[g] = 2; + if (f) +for (; g; g++) + h[g] = 5; +} diff --git a/gcc/testsuite/gcc.dg/graphite/pr68715.c b/gcc/testsuite/gcc.dg/graphite/pr68715.c new file mode 100644 index 000..14da2fb --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr68715.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -floop-interchange" } */ + +int a[1], c[1]; +int b, d, e; + +void +fn1 (int p1) +{ + for (;;) +; +} + +int +fn3 () +{ + for (; e; e++) +c[e] = 2; + for (; d; d--) +a[d] = 8; + return 0; +} + +int fn5 (int); + +int +fn2 () +{ + fn3 (); +} + +void +fn4 () +{ + fn1 (b || fn5 (fn2 ())); +} diff --git a/gcc/testsuite/gfortran.dg/graphite/pr68715.f90 b/gcc/testsuite/gfortran.dg/graphite/pr68715.f90 new file mode 100644 index 000..c011756 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/graphite/pr68715.f90 @@ -0,0 +1,31 @@ +! { dg-do compile } +! { dg-options "-floop-nest-optimize -O1" } + +SUBROUTINE se_core_core_interaction(calculate_forces) + INTEGER, PARAMETER :: dp=8 + LOGICAL, INTENT(in) :: calculate_forces + REAL(KIND=dp), DIMENSION(3) :: force_ab, rij + LOGICAL :: lfoo,kfoo,mfoo,nfoo,ffoo + INTEGER, PARAMETER :: mi2=42 + CALL dummy(lfoo,kfoo,mfoo,nfoo,method_id,core_core) + IF (lfoo) THEN + DO WHILE (ffoo()) + IF (lfoo) CYCLE + IF (kfoo) CYCLE + dr1 = DOT_PRODUCT(rij,rij) + IF (dr1 > rij_threshold) THEN + SELECT CASE (method_id) + CASE (mi2) + IF (calculate_forces) THEN + CALL dummy2(force_ab) + IF (nfoo) THEN + force_ab = force_ab + core_core*dr3inv + END IF + END IF + END SELECT
[PATCH, PR70187] Safely use nodes[0] in possible_polymorphic_call_targets
Hi, this patch fixes lto PR70187, a 6 regression. We run into an ICE in in possible_polymorphic_call_targets when accessing nodes[0]->decl because nodes == vNULL: ... if (!outer_type->all_derivations_known) { if (!speculative && final_warning_records && TREE_CODE (TREE_TYPE (nodes[0]->decl)) == METHOD_TYPE) { if (complete && nodes.length () == 1 && warn_suggest_final_types && !outer_type->derived_types.length ()) { ... The patch fixes this by moving the 'nodes.length () == 1' test to before the use of nodes[0]. Bootstrapped and reg-tested on x86_64. OK for stage4 trunk? Thanks, - Tom Safely use nodes[0] in possible_polymorphic_call_targets 2016-03-16 Tom de Vries PR lto/70187 * ipa-devirt.c (possible_polymorphic_call_targets): Move nodes.length () == 1 test to before first nodes[0] access. --- gcc/ipa-devirt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index e4fb562..4df171b 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -3178,10 +3178,10 @@ possible_polymorphic_call_targets (tree otr_type, if (!outer_type->all_derivations_known) { if (!speculative && final_warning_records + && nodes.length () == 1 && TREE_CODE (TREE_TYPE (nodes[0]->decl)) == METHOD_TYPE) { if (complete - && nodes.length () == 1 && warn_suggest_final_types && !outer_type->derived_types.length ()) { @@ -3197,7 +3197,6 @@ possible_polymorphic_call_targets (tree otr_type, } if (complete && warn_suggest_final_methods - && nodes.length () == 1 && types_same_for_odr (DECL_CONTEXT (nodes[0]->decl), outer_type->type)) {
[PING] [PATCH, libstdc++] Add missing free-standing headers to install rule
Ping... Hi, when the free-standing libstdc++-headers are installed, the C++ header file does not always compile, because it includes and this includes under certain conditions (__cplusplus >= 201103L && ATOMIC_INT_LOCK_FREE > 1) the header file but that fails to compile because it needs which is not installed. This condition depends on the target, and for instance an arm-eabi eCos compiler fails to compile with -mcpu=cortex-a9 and the default C++ standard option, while it is OK with ARMv4 CPUs. Therefore this patch adds move.h and concept_check.h to the installed headers, unconditionally. I've verified that the header compiles on an eCos cross compiler. Boot-strapped and regression-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd.
Re: [PATCH, PR68809] Fix same_close_phi_node
On Wed, 16 Mar 2016, Tom de Vries wrote: > Hi, > > this patch fixes graphite PR68809, a 6 regression. > > The problem we run into is that we return true in same_close_phi_node for two > single-argument phis, one with an int 0 argument, and one with a char 0 > argument: > ... > _1 = PHI <(int)0> > _2 = PHI <(char)0> > ... > The result types of the two phis are not the same though. > > So when we replace uses of _1 by uses of _2 in remove_duplicate_close_phi, we > introduce type errors, which causes an ICE during gimple verification. > > The patch fixes this by testing for equality of the result type of the phis in > same_close_phi_node. > > Bootstrapped and reg-tested on x86_64. > > OK for stage4 trunk? Please use types_compatible_p () rather than strict equality. Ok with that change. Richard.
Re: [PATCH, PR68715] Add missing single_pred_p test in scop_detection::merge_sese
On Wed, 16 Mar 2016, Tom de Vries wrote: > Hi, > > this patch fixes graphite PR68715, a 6 regression. > > In scop_detection::merge_sese, we check if the exit bb of the merged sese > region is dominated by the entry bb: > ... > if (... > || !dominated_by_p (CDI_DOMINATORS, get_exit_bb (combined), > get_entry_bb (combined))) > { > ... > return invalid_sese; > } > ... > > Subsequently, we check for an empty exit bb, and if that one's not empty, we > try to merge an empty successor block into the sese: > ... > /* FIXME: We should remove this piece of code once > canonicalize_loop_closed_ssa has been removed, because that > function adds a BB with single exit. */ > if (!trivially_empty_bb_p (get_exit_bb (combined))) > { > /* Find the first empty succ (with single exit) of > combined.exit. */ > basic_block imm_succ = combined.exit->dest; > if (single_succ_p (imm_succ) > && trivially_empty_bb_p (imm_succ)) > combined.exit = single_succ_edge (imm_succ); > else > { > ... > return invalid_sese; > } > } > ... > > However, when the imm_succ block has more than one predecessor, merging the > block into the sese region breaks the property that the exit bb is dominated > by the entry bb. We then run into an ICE in harmful_loop_in_region a bit > later, when re-checking that property. > > The patch fixes this by adding a test for 'single_pred_p (imm_succ)' in the > empty-block-merge condition. > > Bootstrapped and reg-tested on x86_64. > > OK for stage4 trunk? Hmm, it looks like for all what this function does this effectively pessimizes scop merging and it would be easier to split 'exit' in case its destination is unsuitable (not trivially empty). The /* For now we just want to bail out when exit does not post-dominate entry. TODO: We might just add a basic_block at the exit to make exit post-dominate entry (the entire region). */ if (!dominated_by_p (CDI_POST_DOMINATORS, get_entry_bb (combined), get_exit_bb (combined)) || !dominated_by_p (CDI_DOMINATORS, get_exit_bb (combined), comment also suggests that splitting the get_nearest_pdom_with_single_exit edge and including the new BB in the combined region would also always fix the dominance relation (though I don't see how it could do that and the comment looks wrong and by construction the check should never trigger). Otherwise the patch is certainly fine of course. Thanks, Richard.
[PATCH GCC][OBVIOUS]Fix dump info by reporting malformed loop nest.
Hi, This is an obvious change to dump message in vect_analyze_loop_2. Apparently, the wrong message is copies/pasted from another place, the code has nothing to do with function calls or data references. We should report that loop cannot be vectorized because of malformed loop nest. Build successfully. Is it OK? Thanks, bin 2016-03-16 Bin Cheng * tree-vect-loop.c (vect_analyze_loop_2): Fix wrong dump info by reporting malformed loop nest. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 234057) +++ gcc/tree-vect-loop.c(working copy) @@ -1772,8 +1772,9 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, boo { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, -"not vectorized: loop contains function calls" -" or data references that cannot be analyzed\n"); +"not vectorized: loop nest containing two " +"or more consecutive inner loops cannot be " +"vectorized\n"); return false; }
Re: [PATCH] Fix 70199
On Tue, Mar 15, 2016 at 8:42 PM, Richard Henderson wrote: > On 03/15/2016 07:13 AM, Richard Biener wrote: >> >> On Tue, Mar 15, 2016 at 4:44 AM, Richard Henderson wrote: >>> >>> The problem here is that >>> >>>void* labels[] = { >>> &&l0, &&l1, &&l2 >>>}; >>> >>> gets gimplified to >>> >>>labels = *.LC0; >>> >>> but .LC0 is not in the set of local decls, so that when copy_forbidden is >>> called during sra versioning we fail to forbid the copy. We could set a >>> different flag, but I think it's easiest to just add the artificial decl >>> to >>> where it can be seen. >>> >>> Ok? >> >> >> Hmm. tree_output_constant_def uses the global constant pool (and not >> function-scope statics). So while for the above case with local labels >> there can be no sharing and thus the decl is really "local" with non-local >> labels or with other random initializers you'd have the ctor decl in >> multiple local decl vectors. Not sure if that's a problem, but at least >> if you'd have >> >>void* labels[] = { >> &&l0, &&l1, &&l2 >>}; >>void* labels2[] = { >> &&l0, &&l1, &&l2 >>}; >> >> you'll end up with the same constant pool decl in local-decls twice. > > > Yeah, but since the decl is TREE_STATIC, we'll ignore it for almost > everything. About the only thing I can figure that might go wrong is unused > variable removal, where we'd remove the first copy but not look for > duplicates, and so the variable stays in use when it isn't. I don't *think* > that can cause further problems. It's not like we ever clear FORCED_LABEL > even if the data referencing it goes away. > >> It's also >> a bit pre-mature in the gimplifier as we only add to local-decls during >> BIND expr lowering. > > > Yeah, I suppose. Though for a TREE_STATIC decl it doesn't make a difference > that we didn't put it into any BIND_EXPR. > >> I also wonder if outputting the constant pool decl far away from the >> labels >> might end up with invalid asm for some targets. > > > No. The pointers involved here are full address space, not reduced > displacement pc-relative. > >> Well, I don't see any convenient way of fixing things here either but >> maybe >> we can do >> >>if (walk_tree_without_duplicataes (&DECL_INITIAL (ctor), >> has_label_address_in_static_1, cfun->decl)) >> add_local_decl (cfun, ctor); >> >> to avoid adding the decl when it is not necessary. > > > Sure. Patch 1 below. > >> Having another struct function flag would be possible as well, or re-use >> has_nonlocal_label as clearly a global static is now refering to a local >> label (you'd lose optimization when 'labels' becomes unused of course). > > > On the other hand, the likelyhood of these labels (or the data referencing > the labels) going away is slim. Except for artificial test cases, the user > is going to have taken these addresses and put them in an array for a > reason. The likelyhood of some stored FORCED_LABEL becoming non-forced is > virtually nil. > > Patch 2 below. This second patch does have lower complexity, and doesn't > have the duplicated entry issue you point out. I like patch 2 more - btw, you need to add has_forced_label_in_static streaming to lto-streamer-{in,out}.c, just look for has_nonlocal_label streaming. Also has_label_address_in_static_1 is now unused and should be removed. Thanks, Richard. > Thoughts? > > > r~
Re: [PATCH, PR70187] Safely use nodes[0] in possible_polymorphic_call_targets
On Wed, Mar 16, 2016 at 9:13 AM, Tom de Vries wrote: > Hi, > > this patch fixes lto PR70187, a 6 regression. > > We run into an ICE in in possible_polymorphic_call_targets when accessing > nodes[0]->decl because nodes == vNULL: > ... > if (!outer_type->all_derivations_known) > { > if (!speculative && final_warning_records > && TREE_CODE (TREE_TYPE (nodes[0]->decl)) == METHOD_TYPE) > { > if (complete > && nodes.length () == 1 > && warn_suggest_final_types > && !outer_type->derived_types.length ()) > { > ... > > The patch fixes this by moving the 'nodes.length () == 1' test to before the > use of nodes[0]. > > Bootstrapped and reg-tested on x86_64. > > OK for stage4 trunk? Ok (even obvious I think). Thanks, Richard. > Thanks, > - Tom
Re: [PATCH GCC][OBVIOUS]Fix dump info by reporting malformed loop nest.
On Wed, Mar 16, 2016 at 10:28 AM, Bin Cheng wrote: > Hi, > This is an obvious change to dump message in vect_analyze_loop_2. > Apparently, the wrong message is copies/pasted from another place, the code > has nothing to do with function calls or data references. We should report > that loop cannot be vectorized because of malformed loop nest. > Build successfully. Is it OK? Ok. Richard. > Thanks, > bin > > 2016-03-16 Bin Cheng > > * tree-vect-loop.c (vect_analyze_loop_2): Fix wrong dump info by > reporting malformed loop nest.
Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class
On March 16, 2016 3:17:20 AM GMT+01:00, "H.J. Lu" wrote: >> Where is the current definition of empty types you're proposing for >use in >> GCC? Is the behavior of this case clear from that definition? > >https://gcc.gnu.org/ml/gcc/2016-03/msg00071.html > >Jason's patch follows it. Here is a test for struct with zero-size >array of empty type, which is treated as empty type. index 000..489eb3a --- /dev/null +++ b/gcc/testsuite/g++.dg/abi/empty19.C @@ -0,0 +1,17 @@ +// PR c++/60336 +// { dg-do run } +// { dg-options "-Wabi=9 -x c" } +// { dg-additional-sources "empty14a.c" } 14a ? Not 19a ? Thanks
[PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.
Hi, When I tried to decrease # of IV candidates, I removed code that adds IV candidates for use with constant offset stripped in use->base. This is kind of too aggressive and triggers PR69042. So here is a patch adding back the missing candidates. Honestly, this patch doesn't truly fix the issue, it just brings back the original behavior in IVOPT part (Which is still a right thing to do I think). The issue still depends on PIC_OFFSET register used on x86 target. As discussed in https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html. Furthermore, the real problem could be in register pressure modeling about PIC_OFFSET symbol in IVOPT. On AArch64, overall spec2k number isn't changed, though 173.applu is regressed by ~4% because couple of loops' # of candidates now hits "--param iv-consider-all-candidates-bound=30". For spec2k6 data on AArch64, INT is not affected; FP overall is not changed, as for specific case: 459.GemsFDTD is regressed by 2%, 433.milc is improved by 2%. To address the regression, I will send another patch increasing the parameter bound. Bootstrap&test on x86_64 and AArch64, is it OK? In the meantime, I will collect spec2k6 data on x86_64. Thanks, bin 2016-03-10 Bin Cheng PR tree-optimization/69042 * tree-ssa-loop-ivopts.c (add_iv_candidate_for_use): Add IV cand for use with constant offset stripped in base. diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 4026d28..32594df 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -3232,11 +3232,13 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use) basetype = sizetype; record_common_cand (data, build_int_cst (basetype, 0), iv->step, use); - /* Record common candidate with constant offset stripped in base. */ + /* Record common candidate with constant offset stripped in base. + Like the use itself, we also add candidate directly for it. */ + base = strip_offset (iv->base, &offset); + if (offset || base != iv->base) { - base = strip_offset (iv->base, &offset); - if (offset || base != iv->base) - record_common_cand (data, base, iv->step, use); + record_common_cand (data, base, iv->step, use); + add_candidate (data, base, iv->step, false, use); } /* Record common candidate with base_object removed in base. */
[PATCH PR69489/01]Improve tree ifcvt by storing/tracking DR against its innermost loop bahavior if possible
Hi, One issue revealed in tree ifcvt is the pass stores/tracks DRs against its memory references in IR. This causes difficulty in identifying same memory references appearing in different forms. Given below example: void foo (int a[], int b[]) { int i; for (i = 0; i < 100; i++) { if (a[i] ==0) a[i] = b[i]*4; else a[i] = b[i]*3; } } The gimple dump before tree ifcvt is as: : : # i_24 = PHI # ivtmp_28 = PHI _5 = (long unsigned int) i_24; _6 = _5 * 4; _8 = a_7(D) + _6; _9 = *_8; if (_9 == 0) goto ; else goto ; : _11 = b_10(D) + _6; _12 = *_11; _13 = _12 * 4; goto ; : _15 = b_10(D) + _6; _16 = *_15; _17 = _16 * 3; : # cstore_1 = PHI <_13(4), _17(5)> *_8 = cstore_1; i_19 = i_24 + 1; ivtmp_23 = ivtmp_28 - 1; if (ivtmp_23 != 0) goto ; else goto ; : goto ; : return; The two memory references "*_11" and "*_15" are actually the same thing, but ifcvt failed to discover this because they are recorded in different forms. This patch fixes the issue by recording/tracking memory reference against its innermost_loop_behavior if: the memory reference has innermost_loop_behavior and it is not a compound reference. BTW, PR56625 reported that this case couldn't be vectorized even tree if-conv can handle it. Interesting thing is at current time, it's tree if-conv that could not handle the case. Once it's if-converted with this patch, vectorizer are able to handle it too. Bootstrap and test on x86_64 and AArch64. Is it OK, not sure if it's GCC 7? Thanks, bin 2016-03-11 Bin Cheng PR tree-optimization/56625 PR tree-optimization/69489 * tree-data-ref.h (innermost_loop_behavior_hash): New class for hashing struct innermost_loop_behavior. (DR_INNERMOST): New macro. * tree-if-conv.c (innermost_DR_map): New map. (ref_DR_map, baseref_DR_map): Revise the comment. (hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR to innermost_DR_map if it has innermost loop behavior and is not a compound reference. (ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map if it has innermost loop behavior and is not a compound reference. (if_convertible_loop_p_1): Initialize innermost_DR_map. (if_convertible_loop_p): Release innermost_DR_map. gcc/testsuite/ChangeLog 2016-03-11 Bin Cheng PR tree-optimization/56625 PR tree-optimization/69489 * gcc.dg/tree-ssa/ifc-pr69489-1.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c new file mode 100644 index 000..05db62b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-S -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */ + +void foo (int a[], int b[]) +{ + int i; + for (i = 0; i < 100; i++) +{ + if (a[i] ==0) + a[i] = b[i]*4; + else + a[i] = b[i]*3; +} +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index eb24d16..310514a 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -57,6 +57,61 @@ struct innermost_loop_behavior tree aligned_to; }; +/* Hash for struct innermost_loop_behavior. It depends on the user to + free the memory. */ + +struct innermost_loop_behavior_hash : nofree_ptr_hash +{ + static inline hashval_t hash (const value_type &); + static inline bool equal (const value_type &, + const compare_type &); +}; + +inline hashval_t +innermost_loop_behavior_hash::hash (const value_type &e) +{ + hashval_t hash; + + hash = iterative_hash_expr (e->base_address, 0); + hash = iterative_hash_expr (e->offset, hash); + hash = iterative_hash_expr (e->init, hash); + hash = iterative_hash_expr (e->step, hash); + return iterative_hash_expr (e->aligned_to, hash); +} + +inline bool +innermost_loop_behavior_hash::equal (const value_type &e1, +const compare_type &e2) +{ + bool equal_p; + + if ((e1->base_address && !e2->base_address) + || (!e1->base_address && e2->base_address) + || (!e1->offset && e2->offset) + || (e1->offset && !e2->offset) + || (!e1->init && e2->init) + || (e1->init && !e2->init) + || (!e1->step && e2->step) + || (e1->step && !e2->step) + || (!e1->aligned_to && e2->aligned_to) + || (e1->aligned_to && !e2->aligned_to)) +return false; + + equal_p = true; + if (e1->base_address && e2->base_address) +equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0); + if (e1->offset && e2->offset) +equal_p &= operand_equal_p (e1->offset, e2->offset, 0); + if (e1->init && e2->init) +equal_p &= operand_equal_p (e1->init, e2->init, 0); + if (e1->step && e2->step) +eq
Re: [PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.
On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng wrote: > Hi, > When I tried to decrease # of IV candidates, I removed code that adds IV > candidates for use with constant offset stripped in use->base. This is kind > of too aggressive and triggers PR69042. So here is a patch adding back the > missing candidates. Honestly, this patch doesn't truly fix the issue, it > just brings back the original behavior in IVOPT part (Which is still a right > thing to do I think). The issue still depends on PIC_OFFSET register used on > x86 target. As discussed in > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html. Furthermore, the real > problem could be in register pressure modeling about PIC_OFFSET symbol in > IVOPT. > > On AArch64, overall spec2k number isn't changed, though 173.applu is > regressed by ~4% because couple of loops' # of candidates now hits "--param > iv-consider-all-candidates-bound=30". For spec2k6 data on AArch64, INT is > not affected; FP overall is not changed, as for specific case: 459.GemsFDTD > is regressed by 2%, 433.milc is improved by 2%. To address the regression, I > will send another patch increasing the parameter bound. > > Bootstrap&test on x86_64 and AArch64, is it OK? In the meantime, I will > collect spec2k6 data on x86_64. Ok. Richard. > Thanks, > bin > > 2016-03-10 Bin Cheng > > PR tree-optimization/69042 > * tree-ssa-loop-ivopts.c (add_iv_candidate_for_use): Add IV cand > for use with constant offset stripped in base.
Re: Please include ada-hurd.diff upstream (try2)
> Maybe the time has come now for a proper inclusion of Ada/Gnat support in > gcc-* for GNU/Hurd. (Yes, I commit to maintain Ada in gcc for GNU/Hurd in > the foreseeable future, according to my abilities). > > Attached is an updated patch, now living e.g. in GNU/Debian versions of > gcc-5 and gcc-6. > > Please tell me if you need a changelog entry too. And yes, I have already > signed all copyright papers for FSF wrt gcc (and other packages). Yes, patches need ChangeLog. This one is small though so I have applied it: 2016-03-16 Svante Signell * gcc-interface/Makefile.in: Add support for x86 GNU/Hurd. * s-osinte-gnu.ads: New file. -- Eric Botcazou
Re: [PATCH, PR68715] Add missing single_pred_p test in scop_detection::merge_sese
On 16/03/16 09:53, Richard Biener wrote: Hmm, it looks like for all what this function does this effectively pessimizes scop merging and it would be easier to split 'exit' in case its destination is unsuitable (not trivially empty). Agreed. The /* For now we just want to bail out when exit does not post-dominate entry. TODO: We might just add a basic_block at the exit to make exit post-dominate entry (the entire region). */ if (!dominated_by_p (CDI_POST_DOMINATORS, get_entry_bb (combined), get_exit_bb (combined)) || !dominated_by_p (CDI_DOMINATORS, get_exit_bb (combined), comment also suggests that splitting the get_nearest_pdom_with_single_exit edge and including the new BB in the combined region would also always fix the dominance relation (though I don't see how it could do that and the comment looks wrong and by construction the check should never trigger). I've replaced the entire condition with two asserts: ... diff --git {a,b}/gcc/graphite-scop-detection.c index 7615842..762a248 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -820,14 +820,10 @@ scop_detection::merge_sese /* For now we just want to bail out when exit does not post-dominate entry. TODO: We might just add a basic_block at the exit to make exit post-dominate entry (the entire region). */ - if (!dominated_by_p (CDI_POST_DOMINATORS, get_entry_bb (combined), - get_exit_bb (combined)) - || !dominated_by_p (CDI_DOMINATORS, get_exit_bb (combined), - get_entry_bb (combined))) -{ - DEBUG_PRINT (dp << "[scop-detection-fail] cannot merge " - "seses.\n"); - return invalid_sese; -} + gcc_assert (dominated_by_p (CDI_POST_DOMINATORS, + get_entry_bb (combined), + get_exit_bb (combined))); + gcc_assert (dominated_by_p (CDI_DOMINATORS, + get_exit_bb (combined), + get_entry_bb (combined))); /* FIXME: We should remove this piece of code once canonicalize_loop_closed_ssa has been removed, because that function ... and ran graphite.exp. I ran into an ICE for pr68693.f90, for the second assert. The exit bb has two incoming edges from within the sese, and one from outside. Given that the exit bb is empty, we could split the exit edge and redirect the edge from outside the sese to the new bb, which would fix the sese. Perhaps that is what is meant in the comment, I'm not sure. Thanks, - Tom
Re: Please include ada-hurd.diff upstream (try2)
On Wed, 2016-03-16 at 11:08 +0100, Eric Botcazou wrote: > > > > Maybe the time has come now for a proper inclusion of Ada/Gnat support in > > gcc-* for GNU/Hurd. (Yes, I commit to maintain Ada in gcc for GNU/Hurd in > > the foreseeable future, according to my abilities). > > > > Attached is an updated patch, now living e.g. in GNU/Debian versions of > > gcc-5 and gcc-6. > > > > Please tell me if you need a changelog entry too. And yes, I have already > > signed all copyright papers for FSF wrt gcc (and other packages). > Yes, patches need ChangeLog. This one is small though so I have applied it: > > 2016-03-16 Svante Signell > > * gcc-interface/Makefile.in: Add support for x86 GNU/Hurd. > * s-osinte-gnu.ads: New file. Hi Eric, Sorry for not supplying a ChangeLog yet, I've been busy lately with other things. Since there is another new file s-osinte-gnu.adb in the latest patch, that might be added too. 2016-03-16 Svante Signell * gcc-interface/Makefile.in: Add support for x86 GNU/Hurd. * s-osinte-gnu.ads: New specification file. * s-osinte-gnu.adb: New body file, basically s-osinte-posix.adb adding dummy implementation of functions not yet implemented. Thanks!
Re: [PATCH] libffi: define FFI_SIZEOF_JAVA_RAW for aarch64 ILP32
On Tue, Mar 15, 2016 at 03:18:43PM +0100, Andreas Schwab wrote: > Like x32, aarch64 ILP32 needs to define FFI_SIZEOF_JAVA_RAW. This fixes > the java interpreter. Should this go through upstream libffi first? I can't remember the right order for these. Anyway, this is OK in whatever order is needed. Thanks, James > > Andreas. > > * src/aarch64/ffitarget.h (FFI_SIZEOF_JAVA_RAW) [__ILP32__]: > Define. > --- > libffi/src/aarch64/ffitarget.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libffi/src/aarch64/ffitarget.h b/libffi/src/aarch64/ffitarget.h > index 2862ec7..34200ad 100644 > --- a/libffi/src/aarch64/ffitarget.h > +++ b/libffi/src/aarch64/ffitarget.h > @@ -29,6 +29,7 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ > #ifndef LIBFFI_ASM > #ifdef __ILP32__ > #define FFI_SIZEOF_ARG 8 > +#define FFI_SIZEOF_JAVA_RAW 4 > typedef unsigned long long ffi_arg; > typedef signed long long ffi_sarg; > #else > -- > 2.7.3 > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different." >
Re: [PATCH] boehm-gc: add supprt for aarch64 ILP32
On Tue, Mar 15, 2016 at 03:46:00PM +0100, Andreas Schwab wrote: > * include/private/gcconfig.h [AARCH64] (ALIGNMENT, CPP_WORDSZ): > Define for __ILP32__. OK. Thanks, James > --- > boehm-gc/include/private/gcconfig.h | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/boehm-gc/include/private/gcconfig.h > b/boehm-gc/include/private/gcconfig.h > index 7e081d9..aa81f15 100644 > --- a/boehm-gc/include/private/gcconfig.h > +++ b/boehm-gc/include/private/gcconfig.h > @@ -1854,9 +1854,14 @@ > # endif > > # ifdef AARCH64 > -# define CPP_WORDSZ 64 > +# ifdef __ILP32__ > +# define ALIGNMENT 4 > +# define CPP_WORDSZ 32 > +# else > +# define ALIGNMENT 8 > +# define CPP_WORDSZ 64 > +# endif > # define MACH_TYPE "AARCH64" > -# define ALIGNMENT 8 > # ifndef HBLKSIZE > # define HBLKSIZE 4096 > # endif > -- > 2.7.3 > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different." >
Re: Please include ada-hurd.diff upstream (try2)
> Sorry for not supplying a ChangeLog yet, I've been busy lately with other > things. Since there is another new file s-osinte-gnu.adb in the latest > patch, that might be added too. AFAICS there is only s-osinte-gnu.ads in the posted patch. -- Eric Botcazou
[PATCH, PR70185] Only finalize dot files that have been initialized
Hi, Atm, using fdump-tree-all-graph produces invalid dot files: ... $ rm *.c.* ; gcc test.c -O2 -S -fdump-tree-all-graph $ for f in *.dot; do dot -Tpdf $f -o dot.pdf; done Warning: test.c.006t.omplower.dot: syntax error in line 1 near '}' Warning: test.c.007t.lower.dot: syntax error in line 1 near '}' Warning: test.c.010t.eh.dot: syntax error in line 1 near '}' Warning: test.c.292t.statistics.dot: syntax error in line 1 near '}' $ cat test.c.006t.omplower.dot } $ ... These dot files are finalized, but never initialized or used. The 006/007/010 files are not used because '(fn->curr_properties & PROP_cfg) == 0' at the corresponding passes. And the file test.c.292t.statistics.dot is not used, because it doesn't belong to a single pass. The current finalization code doesn't handle these cases: ... /* Do whatever is necessary to finish printing the graphs. */ for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i) if (dumps->dump_initialized_p (i) && (dfi->pflags & TDF_GRAPH) != 0 && (name = dumps->get_dump_file_name (i)) != NULL) { finish_graph_dump_file (name); free (name); } ... The patch fixes this by simply testing for pass->graph_dump_initialized instead. [ That fix exposes the lack of initialization of graph_dump_initialized. It seems to be initialized for static passes, but for dynamically added passes, such as f.i. vzeroupper the value is uninitialized. The patch also fixes this. ] Bootstrapped and reg-tested on x86_64. OK for stage1? Thanks, - Tom Only finalize dot files that have been initialized 2016-03-16 Tom de Vries PR other/70185 * passes.c (opt_pass::opt_pass): Add missing initialization of graph_dump_initialized. (finish_optimization_passes): Only call finish_graph_dump_file if pass->graph_dump_initialized. --- gcc/passes.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/gcc/passes.c b/gcc/passes.c index 9d90251..5aa2b32 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -105,6 +105,7 @@ opt_pass::opt_pass (const pass_data &data, context *ctxt) sub (NULL), next (NULL), static_pass_number (0), +graph_dump_initialized (false), m_ctxt (ctxt) { } @@ -363,14 +364,18 @@ finish_optimization_passes (void) } /* Do whatever is necessary to finish printing the graphs. */ + gcc::pass_manager *passes = g->get_passes (); for (i = TDI_end; (dfi = dumps->get_dump_file_info (i)) != NULL; ++i) -if (dumps->dump_initialized_p (i) - && (dfi->pflags & TDF_GRAPH) != 0 - && (name = dumps->get_dump_file_name (i)) != NULL) - { - finish_graph_dump_file (name); - free (name); - } +{ + opt_pass *pass = passes->get_pass_for_id (i); + if (pass == NULL + || !pass->graph_dump_initialized) + continue; + + name = dumps->get_dump_file_name (i); + finish_graph_dump_file (name); + free (name); +} timevar_pop (TV_DUMP); }
Re: Please include ada-hurd.diff upstream (try2)
On Wed, 2016-03-16 at 11:53 +0100, Eric Botcazou wrote: > > > > Sorry for not supplying a ChangeLog yet, I've been busy lately with other > > things. Since there is another new file s-osinte-gnu.adb in the latest > > patch, that might be added too. > AFAICS there is only s-osinte-gnu.ads in the posted patch. Yes, you are right, somehow I submitted the old patch. Attached is the updated one, with a proposed ChangeLog entry included. Thanks!2016-03-16 Svante Signell * gcc-interface/Makefile.in: Add support for x86 GNU/Hurd. * s-osinte-gnu.ads: New specification file. * s-osinte-gnu.adb: New body file, basically s-osinte-posix.adb adding dummy implementation of functions not yet implemented. --- Index: gcc-5-5.3.1/src/gcc/ada/s-osinte-gnu.adb === --- /dev/null +++ gcc-5-5.3.1/src/gcc/ada/s-osinte-gnu.adb @@ -0,0 +1,147 @@ +-- +-- -- +-- GNAT RUN-TIME LIBRARY (GNARL) COMPONENTS -- +-- -- +-- S Y S T E M . O S _ I N T E R F A C E -- +-- -- +-- B o d y-- +-- -- +-- Copyright (C) 1991-1994, Florida State University-- +-- Copyright (C) 1995-2006, AdaCore -- +-- -- +-- GNARL is free software; you can redistribute it and/or modify it under -- +-- terms of the GNU General Public License as published by the Free Soft- -- +-- ware Foundation; either version 2, or (at your option) any later ver- -- +-- sion. GNARL is distributed in the hope that it will be useful, but WITH- -- +-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY -- +-- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -- +-- for more details. You should have received a copy of the GNU General -- +-- Public License distributed with GNARL; see file COPYING. If not, write -- +-- to the Free Software Foundation, 51 Franklin Street, Fifth Floor, -- +-- Boston, MA 02110-1301, USA. -- +-- -- +-- As a special exception, if other files instantiate generics from this -- +-- unit, or you link this unit with other files to produce an executable, -- +-- this unit does not by itself cause the resulting executable to be -- +-- covered by the GNU General Public License. This exception does not -- +-- however invalidate any other reasons why the executable file might be -- +-- covered by the GNU Public License. -- +-- -- +-- GNARL was developed by the GNARL team at Florida State University. -- +-- Extensive contributions were provided by Ada Core Technologies, Inc. -- +-- -- +-- + +-- This is the GNU/Hurd version of this package. + +pragma Polling (Off); +-- Turn off polling, we do not want ATC polling to take place during +-- tasking operations. It causes infinite loops and other problems. + +-- This package encapsulates all direct interfaces to OS services +-- that are needed by children of System. + +package body System.OS_Interface is + + + -- Get_Stack_Base -- + + + function Get_Stack_Base (thread : pthread_t) return Address is + pragma Warnings (Off, thread); + + begin + return Null_Address; + end Get_Stack_Base; + + -- + -- pthread_init -- + -- + + procedure pthread_init is + begin + null; + end pthread_init; + + -- + -- pthread_mutexattr_setprioceiling -- + -- + + function pthread_mutexattr_setprioceiling + (attr : access pthread_mutexattr_t; + prioceiling : int) return int is + pragma Unreferenced (attr, prioceiling); + begin + return 0; + end pthread_mutexattr_setprioceiling; + + -- + -- pthread_mutexattr_getprioceiling -- + -- + + function pthread_mutexattr_getprioceiling + (attr : access pthread_mutexattr_t; + prioceiling : access int) return int is +
Re: [PATCH] libffi: define FFI_SIZEOF_JAVA_RAW for aarch64 ILP32
James Greenhalgh writes: > On Tue, Mar 15, 2016 at 03:18:43PM +0100, Andreas Schwab wrote: >> Like x32, aarch64 ILP32 needs to define FFI_SIZEOF_JAVA_RAW. This fixes >> the java interpreter. > > Should this go through upstream libffi first? I can't remember the right > order for these. Dunno, things have gone either way. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."