Re: [PATCH] match.pd: optimize unsigned mul overflow check
On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B' overflows (or 'B && A > -1 / B' if B may be zero). Let's optimize it
> > to an invocation of __builtin_mul_overflow to avoid the divide operation.
>
> I forgot to ask earlier: what does this give for modes / platforms where
> umulv4 does not have a specific implementation? Is the generic
> implementation worse than A>-1/B, in which case we may want to check
> optab_handler before doing the transformation? Or is it always at least
> as good?

If umulv4 is unavailable (which today is everywhere except x86), gcc falls
back as follows. First, it tries to see whether doing the multiplication in a
2x wider type is possible (which it usually is, as gcc supports __int128_t on
64-bit platforms and 64-bit long long on 32-bit platforms), and then looks at
the high bits of the 2x wide product. This should boil down to a 'high
multiply' instruction if the original operands' type matches the register
size, and to a normal multiply plus masking of the high bits if the type is
smaller than a register.

Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
gcc emits a sequence that performs the multiplication by parts in a 2x
narrower type.

I think the first, more commonly taken, fallback path always results in good
code. In the second case, the eliminated 64-bit divide is unlikely to have
direct hw support; e.g., on i386 it's a library call to __udivdi3. This makes
the transformation a likely loss for code size, but a likely win for
performance. It could be better if GCC could CSE REALPART (IFN_MUL_OVERFLOW)
with A*B on gimple.

Thanks.
Alexander
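
To make the transformation concrete, here is the idiom and its target form
written out in plain C (an illustrative sketch, not code taken from the
patch):

#include <stdbool.h>

/* The idiom the pattern matches: for unsigned A, B (with a B != 0 guard
   when B may be zero), A * B overflows iff A > UINT_MAX / B.  */
bool
mul_overflows_div (unsigned a, unsigned b)
{
  return b && a > -1U / b;
}

/* One possible rewritten form, avoiding the division.  Note the B != 0
   guard becomes redundant: the builtin yields false for B == 0, since
   A * 0 never overflows.  */
bool
mul_overflows_ifn (unsigned a, unsigned b)
{
  unsigned prod;
  return __builtin_mul_overflow (a, b, &prod);
}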
Re: [PATCH, PR69067] Remove assert in get_def_bb_for_const
On Sun, 29 May 2016, Tom de Vries wrote: > Hi, > > this patch fixes graphite PR69067, a 6/7 regression. > > > I. > > Consider the following test-case, compiled with -O1 -floop-nest-optimize > -flto: > ... > int a1, c1, cr, kt; > int aa[2]; > > int > ce (void) > { > while (a1 < 1) { > int g8; > for (g8 = 0; g8 < 3; ++g8) > if (c1 != 0) > cr = aa[a1 * 2] = kt; > for (c1 = 0; c1 < 2; ++c1) > aa[c1] = cr; > ++a1; > } > return 0; > } > > int > main (void) > { > return ce (aa); > } > ... > > At graphite0, there's a loop with header bb4, which conditionally executes bb > 5: > ... > : > > : > # g8_39 = PHI <0(13), g8_19(3)> > # cr_lsm.1_4 = PHI > # cr_lsm.2_22 = PHI > if (c1_lsm.3_35 != 0) > goto ; > else > goto ; > > : > aa[_3] = 0; > > : > # cr_lsm.1_33 = PHI > # cr_lsm.2_11 = PHI > g8_19 = g8_39 + 1; > if (g8_19 <= 2) > goto ; > else > goto ; > ... > > > II. > > The graphite transformation moves the condition '(P_35 <= -1 || P_35 >= 1)' > out of the loop: > ... > [scheduler] original ast: > { > for (int c0 = 0; c0 <= 2; c0 += 1) { > S_4(c0); > if (P_35 <= -1 || P_35 >= 1) > S_5(c0); > S_6(c0); > } > S_7(); > for (int c0 = 0; c0 <= 1; c0 += 1) > S_8(c0); > } > > [scheduler] AST generated by isl: > { > if (P_35 <= -1) { > for (int c0 = 0; c0 <= 2; c0 += 1) > S_5(c0); > } else if (P_35 >= 1) > for (int c0 = 0; c0 <= 2; c0 += 1) > S_5(c0); > for (int c0 = 0; c0 <= 2; c0 += 1) { > S_4(c0); > S_6(c0); > } > S_7(); > for (int c0 = 0; c0 <= 1; c0 += 1) > S_8(c0); > } > ... > > When instantiating the ast back to gimple, we run into an assert: > ... > pr-graphite-4.c: In function ‘ce’: > pr-graphite-4.c:5:1: internal compiler error: in get_def_bb_for_const, at > graphite-isl-ast-to-gimple.c:1795 > ce() > ^ > ... > > > III. > > What happens is the following: in copy_cond_phi_args we try to copy the > arguments of phi in bb6 to the arguments of new_phi in bb 46 > ... > (gdb) call debug_gimple_stmt (phi) > cr_lsm.1_33 = PHI > (gdb) call debug_gimple_stmt (new_phi) > cr_lsm.1_62 = PHI <(28), (47)> > ... > > While handling the '0' phi argument in add_phi_arg_for_new_expr we trigger > this bit of code and call get_def_bb_for_const with bb.index == 46 and > old_bb.index == 5: > ... > /* If the corresponding def_bb could not be found the entry will > be NULL. */ > if (TREE_CODE (old_phi_args[i]) == INTEGER_CST) > def_pred[i] > = get_def_bb_for_const (new_bb, > gimple_phi_arg_edge (phi, i)->src); > ... > > Neither of the two copies of bb 5 dominates bb 46, so we run into the assert > at the end: > ... > /* Returns a basic block that could correspond to where a constant was > defined in the original code. In the original code OLD_BB had the > definition, we need to find which basic block out of the copies of > old_bb, in the new region, should a definition correspond to if it > has to reach BB. */ > > basic_block translate_isl_ast_to_gimple:: > get_def_bb_for_const (basic_block bb, basic_block old_bb) const > { > vec *bbs = region->copied_bb_map->get (old_bb); > > if (!bbs || bbs->is_empty ()) > return NULL; > > if (1 == bbs->length ()) > return (*bbs)[0]; > > int i; > basic_block b1 = NULL, b2; > FOR_EACH_VEC_ELT (*bbs, i, b2) > { > if (b2 == bb) > return bb; > > /* BB and B2 are in two unrelated if-clauses. */ > if (!dominated_by_p (CDI_DOMINATORS, bb, b2)) > continue; > > /* Compute the nearest dominator. */ > if (!b1 || dominated_by_p (CDI_DOMINATORS, b2, b1)) > b1 = b2; > } > > gcc_assert (b1); > return b1; > } > ... > > > IV. > > Attached patch fixes this by removing the assert. 
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk, 6-branch?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: RFC [1/2] divmod transform
On Mon, 30 May 2016, Prathamesh Kulkarni wrote:
> The attached patch ICEs during bootstrap for x86_64, and is reproducible
> with the following case at -m32 -O2:
>
> typedef long long type;
>
> type f(type x, type y)
> {
>   type q = x / y;
>   type r = x % y;
>   return q + r;
> }
>
> The ICE happens because the test-case hits
>   gcc_assert (unsignedp);
> in default_expand_divmod_libfunc ().

That's of course your function (and ICE).

> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns a libfunc
> with the name "__divmoddi4", although __divmoddi4() is nowhere defined in
> libgcc for x86.
> (I verified that by forcing the patch to generate a call to __divmoddi4,
> which results in an undefined reference to __divmoddi4.)
>
> This happens because in optabs.def we have:
>   OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
>
> and gen_int_libfunc generates "__divmoddi4" on the first call to
> optab_libfunc and sets optab_libfunc (sdivmod_optab, DImode) to
> "__divmoddi4".
> I wonder if we should remove the gen_int_libfunc entry in optabs.def for
> sdivmod_optab?

Hum, not sure - you might want to look at expand_divmod (though that always
just computes one part of the result in the end).

Joseph - do you know something about why there's not a full set of divmod
libfuncs in libgcc?

Thanks,
Richard.
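
For reference, libgcc does provide the unsigned DImode helper __udivmoddi4
(in libgcc2.c); a signed divmod libfunc, if one were added, could be a thin
wrapper around it. The following is only an illustrative sketch under that
assumption - as noted above, no such __divmoddi4 currently exists in libgcc
for x86:

typedef long long DItype;
typedef unsigned long long UDItype;

/* Provided by libgcc2.c.  */
extern UDItype __udivmoddi4 (UDItype n, UDItype d, UDItype *rp);

/* Hypothetical signed counterpart: the quotient truncates toward zero
   and the remainder takes the sign of the dividend, matching C
   semantics.  */
DItype
__divmoddi4 (DItype n, DItype d, DItype *rp)
{
  UDItype un = n < 0 ? -(UDItype) n : (UDItype) n;
  UDItype ud = d < 0 ? -(UDItype) d : (UDItype) d;
  UDItype ur;
  UDItype uq = __udivmoddi4 (un, ud, &ur);

  *rp = n < 0 ? -(DItype) ur : (DItype) ur;
  return (n < 0) != (d < 0) ? -(DItype) uq : (DItype) uq;
}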
Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
On 23 May 2016 at 14:59, Prathamesh Kulkarni wrote:
> On 5 February 2016 at 18:40, Prathamesh Kulkarni wrote:
>> On 4 February 2016 at 16:31, Ramana Radhakrishnan wrote:
>>> On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni wrote:
>>>> On 31 July 2015 at 15:04, Ramana Radhakrishnan wrote:
>>>>> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
>>>>>> Hi,
>>>>>> This patch tries to implement division with multiplication by
>>>>>> reciprocal using vrecpe/vrecps with -funsafe-math-optimizations
>>>>>> and -freciprocal-math enabled.
>>>>>> Tested on arm-none-linux-gnueabihf using qemu.
>>>>>> OK for trunk ?
>>>>>>
>>>>>> Thank you,
>>>>>> Prathamesh
>>>>>
>>>>> I've tried this in the past and never been convinced that 2 iterations
>>>>> are enough to get to stability with this, given that the results are
>>>>> only precise for 8 bits / iteration. Thus I've always believed you need
>>>>> 3 iterations rather than 2, at which point I've never been sure that
>>>>> it's worth it. So the testing that you've done with this currently is
>>>>> not enough for this to go into the tree.
>>>>>
>>>>> I'd like this to be tested on a couple of different AArch32
>>>>> implementations with a wider range of inputs to verify that the results
>>>>> are acceptable, as well as running something like SPEC2k(6) with at
>>>>> least one iteration to ensure correctness.
>>>>
>>>> Hi,
>>>> I got results of SPEC2k6 fp benchmarks:
>>>> a15: +0.64% overall, 481.wrf: +6.46%
>>>> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
>>>> a57: +0.35% overall, 481.wrf: +3.84%
>>>> The other benchmarks had (almost) identical results.
>>>
>>> Thanks for the benchmarking results - Please repost the patch with
>>> the changes that I had requested in my previous review - given it is
>>> now stage4, I would rather queue changes like this for stage1 now.
>>
>> Hi,
>> Please find the updated patch attached.
>> It passes the testsuite for arm-none-linux-gnueabi,
>> arm-none-linux-gnueabihf and arm-none-eabi.
>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>> get vectorized at -O2 for armeb-none-linux-gnueabihf.
>> Charles suggested I try with -O3, which worked.
>> It appears the test-case fails to get vectorized with
>> -fvect-cost-model=cheap (which is enabled by default at -O2)
>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic.
>>
>> I can't figure out why it fails with -fvect-cost-model=cheap.
>> From the vect dump (attached):
>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9
>
> Hi,
> I think I have some idea why the test-cases attached with the patch
> fail to get vectorized on armeb with -O2.
>
> Issue with the big endian vectorizer:
> The patch does not cause regressions on the big endian vectorizer but
> fails to vectorize the test-cases attached with the patch, while they
> get vectorized on little-endian.
> Fails with armeb with the following message in the dump:
> note: not vectorized: unsupported unaligned load.*_9
>
> The behavior of the big and little endian vectorizers seems to differ
> in arm_builtin_support_vector_misalignment(), which overrides the hook
> targetm.vectorize.support_vector_misalignment().
>
> targetm.vectorize.support_vector_misalignment is called by
> vect_supportable_dr_alignment (), which in turn is called
> by verify_data_refs_alignment ().
>
> Execution up to the following condition is common between arm and armeb
> in vect_supportable_dr_alignment():
>
>   if ((TYPE_USER_ALIGN (type) && !is_packed)
>       || targetm.vectorize.support_vector_misalignment (mode, type,
>                                           DR_MISALIGNMENT (dr), is_packed))
>     /* Can't software pipeline the loads, but can at least do them.  */
>     return dr_unaligned_supported;
>
> For the little endian case:
> arm_builtin_support_vector_misalignment() is called with
> V2SF mode and misalignment == -1, and the following condition
> becomes true:
>
>   /* If the misalignment is unknown, we should be able to handle the access
>      so long as it is not to a member of a packed data structure.  */
>   if (misalignment == -1)
>     return true;
>
> Since the hook returned true, we enter the condition above in
> vect_supportable_dr_alignment() and return dr_unaligned_supported.
>
> For big-endian:
> arm_builtin_support_vector_misalignment() is called with V2SF mode.
> The following condition that gates the entire function body fails:
>
>   if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
>
> and the default hook gets called with V2SF mode and returns false because
> movmisalign_optab does not exist for V2SF mode.
>
> So the condition above in vect_supportable_dr_alignment() fails
> and we come here:
>
>   /* Unsupported.  */
>   return dr_unaligned_unsupported;
>
> And hence we get the unaligned load not supported message in the dump
> for armeb in verify_data_ref_alignment ().
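
For context on the transform itself: vrecpe gives an initial reciprocal
estimate precise to about 8 bits, and each vrecps application is one
Newton-Raphson refinement step, which is what the 2-vs-3 iteration question
above is about. In NEON intrinsics, the two-iteration scheme corresponds to
something like the following sketch (illustrative only, not code from the
patch):

#include <arm_neon.h>

/* Approximate a / b via reciprocal estimate plus two Newton-Raphson
   steps: r' = r * (2 - b * r), where vrecps_f32 computes the
   (2 - b * r) factor.  Each step roughly doubles the number of
   correct bits: ~8 -> ~16 -> ~24.  */
float32x2_t
div_via_reciprocal (float32x2_t a, float32x2_t b)
{
  float32x2_t r = vrecpe_f32 (b);       /* ~8-bit estimate */
  r = vmul_f32 (vrecps_f32 (b, r), r);  /* first refinement */
  r = vmul_f32 (vrecps_f32 (b, r), r);  /* second refinement */
  return vmul_f32 (a, r);
}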
[gotools, libcc1] Update copyright dates
I noticed that libcc1 copyright dates hadn't been updated since it had initially been added. Looking around, the same applies to gotools. This patch updates update-copyright.py to take care of that. At the same time, I checked that script, added appropriate entries for the other missing directories and removed libmudflap support. I'm also including the results of a update-copyright.py run on gotools and libcc1, but am uncertain what to include in default_dirs. Bootstrapped without regressions on i386-pc-solaris2.12, ok for mainline? Rainer 2016-05-25 Rainer Orth libcc1: Update copyrights. gotools: Update copyrights. contrib: * update-copyright.py (LibMudflapFilter): Remove. (GCCCmdLine.__init__): Add gotools, libcc1. Remove libmudflap. List unhandled intl, libcilkrts, libgo, liboffloadmic, maintainer-scripts. # HG changeset patch # Parent 2977caa0bc2d9892e182af40fcd11c51d9ec921b Update copyright dates diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py --- a/contrib/update-copyright.py +++ b/contrib/update-copyright.py @@ -631,15 +631,6 @@ class LibJavaFilter (GenericFilter): return re.compile ('.*icSigCopyrightTag') return GenericFilter.get_line_filter (self, dir, filename) -class LibMudflapFilter (GenericFilter): -def __init__ (self): -GenericFilter.__init__ (self) - -self.skip_dirs |= set ([ -# Handled separately. -'testsuite', -]) - class LibStdCxxFilter (GenericFilter): def __init__ (self): GenericFilter.__init__ (self) @@ -724,30 +715,34 @@ class GCCCmdLine (CmdLine): self.add_dir ('gcc', GCCFilter()) self.add_dir (os.path.join ('gcc', 'testsuite'), TestsuiteFilter()) self.add_dir ('gnattools') +self.add_dir ('gotools') self.add_dir ('include') +# intl is imported from upstream. self.add_dir ('libada') self.add_dir ('libatomic') self.add_dir ('libbacktrace') +self.add_dir ('libcc1') +# libcilkrts is imported from upstream. self.add_dir ('libcpp', LibCppFilter()) self.add_dir ('libdecnumber') # libffi is imported from upstream. self.add_dir ('libgcc', LibGCCFilter()) self.add_dir ('libgfortran') +# libgo is imported from upstream. self.add_dir ('libgomp') self.add_dir ('libiberty') self.add_dir ('libitm') self.add_dir ('libjava', LibJavaFilter()) self.add_dir (os.path.join ('libjava', 'testsuite'), TestsuiteFilter()) -self.add_dir ('libmudflap', LibMudflapFilter()) -self.add_dir (os.path.join ('libmudflap', 'testsuite'), - TestsuiteFilter()) self.add_dir ('libobjc') +# liboffloadmic is imported from upstream. self.add_dir ('libquadmath') -# libsanitiser is imported from upstream. +# libsanitizer is imported from upstream. self.add_dir ('libssp') self.add_dir ('libstdc++-v3', LibStdCxxFilter()) self.add_dir ('libvtv') self.add_dir ('lto-plugin') +# maintainer-scripts maintainer-scripts # zlib is imported from upstream. self.default_dirs = [ @@ -761,7 +756,6 @@ class GCCCmdLine (CmdLine): 'libgfortran', 'libgomp', 'libitm', -'libmudflap', 'libobjc', 'libstdc++-v3', ] diff --git a/gotools/Makefile.am b/gotools/Makefile.am --- a/gotools/Makefile.am +++ b/gotools/Makefile.am @@ -1,5 +1,5 @@ # Makefile for gotools -# Copyright 2015 Free Software Foundation, Inc. +# Copyright (C) 2015-2016 Free Software Foundation, Inc. # # This file is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/gotools/configure.ac b/gotools/configure.ac --- a/gotools/configure.ac +++ b/gotools/configure.ac @@ -1,5 +1,5 @@ # Configure script for gotools. -# Copyright 2015 Free Software Foundation, Inc. 
+# Copyright (C) 2015-2016 Free Software Foundation, Inc. # # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by diff --git a/libcc1/Makefile.am b/libcc1/Makefile.am --- a/libcc1/Makefile.am +++ b/libcc1/Makefile.am @@ -1,4 +1,4 @@ -## Copyright (C) 2014 Free Software Foundation, Inc. +## Copyright (C) 2014-2016 Free Software Foundation, Inc. ## This file is part of GCC. diff --git a/libcc1/callbacks.cc b/libcc1/callbacks.cc --- a/libcc1/callbacks.cc +++ b/libcc1/callbacks.cc @@ -1,5 +1,5 @@ /* Callback management. - Copyright (C) 2014 Free Software Foundation, Inc. + Copyright (C) 2014-2016 Free Software Foundation, Inc. This file is part of GCC. diff --git a/libcc1/callbacks.hh b/libcc1/cal
Re: [visium] Split DImode arithmetical operations
> This makes it so that DImode arithmetical operations are split into a pair > of SImode operations in order to enable better scheduling. > > Tested on visium-elf, applied on the mainline and 6 branch. A minor follow-up, applied on the mainline and 6 branch. 2016-05-30 Eric Botcazou * config/visium/visium.c (visium_split_double_add): Minor tweaks. (visium_expand_copysign): Use gen_int_mode directly. (visium_compute_frame_size): Minor tweaks. -- Eric BotcazouIndex: config/visium/visium.c === --- config/visium/visium.c (revision 236827) +++ config/visium/visium.c (working copy) @@ -2087,6 +2087,7 @@ visium_split_double_add (enum rtx_code c rtx op6 = gen_highpart (SImode, op0); rtx op7 = (op1 == const0_rtx ? op1 : gen_highpart (SImode, op1)); rtx op8; + rtx x, pat, flags; /* If operand #2 is a small constant, then its high part is null. */ if (CONST_INT_P (op2)) @@ -2109,14 +2110,13 @@ visium_split_double_add (enum rtx_code c } /* This is the {add,sub,neg}si3_insn_set_flags pattern. */ - rtx x; if (op4 == const0_rtx) x = gen_rtx_NEG (SImode, op5); else x = gen_rtx_fmt_ee (code, SImode, op4, op5); - rtx pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2)); + pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2)); XVECEXP (pat, 0, 0) = gen_rtx_SET (op3, x); - rtx flags = gen_rtx_REG (CC_NOOVmode, FLAGS_REGNUM); + flags = gen_rtx_REG (CC_NOOVmode, FLAGS_REGNUM); x = gen_rtx_COMPARE (CC_NOOVmode, shallow_copy_rtx (x), const0_rtx); XVECEXP (pat, 0, 1) = gen_rtx_SET (flags, x); emit_insn (pat); @@ -2160,7 +2160,7 @@ visium_expand_copysign (rtx *operands, e { long l; REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (op1), l); - op1 = force_reg (SImode, GEN_INT (trunc_int_for_mode (l, SImode))); + op1 = force_reg (SImode, gen_int_mode (l, SImode)); } } else @@ -3597,18 +3597,15 @@ visium_compute_frame_size (int size) int visium_initial_elimination_offset (int from, int to ATTRIBUTE_UNUSED) { - const int frame_size = visium_compute_frame_size (get_frame_size ()); const int save_fp = current_frame_info.save_fp; const int save_lr = current_frame_info.save_lr; const int lr_slot = current_frame_info.lr_slot; - const int local_frame_offset -= (save_fp + save_lr + lr_slot) * UNITS_PER_WORD; int offset; if (from == FRAME_POINTER_REGNUM) -offset = local_frame_offset; +offset = (save_fp + save_lr + lr_slot) * UNITS_PER_WORD; else if (from == ARG_POINTER_REGNUM) -offset = frame_size; +offset = visium_compute_frame_size (get_frame_size ()); else gcc_unreachable ();
Re: Moving backwards/FSM threader into its own pass
On Fri, May 27, 2016 at 6:30 PM, Jeff Law wrote:
>
> It's been my plan since finally wrapping my head around Bodik's thesis to
> revamp how we handle jump threading to use some of the principles from his
> thesis. In particular, the back substitution and simplification model feels
> like the right long term direction.
>
> Sebastian's FSM threader was the first step on that path (gcc-5).
> Exploiting that threader for more than just FSM loops was the next big
> step (gcc-6).
>
> This patch takes the next step -- dis-entangling that new jump threading
> code from the old threading code and VRP/DOM.
>
> The key thing to realize here is that the backwards (FSM) jump threader
> does not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to
> do its job. ie, it can and should run completely independently of DOM/VRP
> (though one day it may exploit range information that a prior VRP pass has
> computed).
>
> By moving the backwards threader into its own pass, we can run it prior to
> DOM/VRP, which allows DOM/VRP to work on a simpler CFG with larger extended
> basic blocks.

Ok, but the placement (and number) of threading passes then no longer depends
on the DOM/VRP passes - and as you placed the threading passes _before_ those
passes, the threading itself does not benefit from DOM/VRP but only from
previous optimization passes. I see this as an opportunity to remove some of
them ;)

I now see in the main optimization pipeline

      NEXT_PASS (pass_fre);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);

This position makes sense - FRE removed redundancies and fully copy/constant
propagated the IL.

      NEXT_PASS (pass_sra);
      /* The dom pass will also resolve all __builtin_constant_p calls
         that are still there to 0.  This has to be done after some
         propagations have already run, but before some more dead code
         is removed, and this place fits nicely.  Remember this when
         trying to move or duplicate pass_dominator somewhere earlier.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);

This position OTOH doesn't make much sense, as IL cleanup is missing after
SRA and previous opts. After loop we now have

      NEXT_PASS (pass_tracer);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);

I don't think we want two threadings so close together. It makes some sense
to have a threading _after_ DOM but before VRP (DOM cleaned up the IL). So
that would leave two of your four passes and expose the opportunity to re-add
one during early-opts, also after FRE. That one should be throttled down to
operate in "-Os" mode though.

So, can you see what removing the two threading passes that don't make sense
to me does to your statistics? And check whether a -Os-like threading can be
done early?

Thanks,
Richard.

> The removal of unexecutable paths before VRP also has the nice effect of
> also eliminating false positive warnings for some work Aldy is doing around
> out-of-bound array index warnings.
>
> We can remove all the calls to the backwards threader from the old style
> threader. The way the FSM bits wired into the old threader caused redundant
> path evaluations. That can be easily fixed with the FSM bits in their own
> pass. The net is a 25% reduction in paths examined by the FSM threader.
>
> Finally, we ultimately end up threading more jumps.
I don't have the #s > handy anymore, but when I ran this through my tester there was a clear > decrease in the number of runtime jumps. > > So what are the downsides. > > With the threader in its own pass, we end up getting more calls into the CFG > & SSA verification routines in a checking-enabled compiler. So the > compile-time improvement is lost for a checking-enabled compiler. > > The backward threader does not combine identical jump threading paths with > different starting edges into a single jump threading path with multiple > entry points. This is primarily a codesize issue, but can have a secondary > effect on performance. I know how to fix this and it's on the list for > gcc-7 along with further cleanups. > > > Bootstrapped and regression tested on x86_64 linux. Installing on the trunk > momentarily. > > Jeff > > commit 35bd646a4834a68a49af9ccb5873362a0fc742ae > Author: Jeff Law > Date: Fri May 27 10:23:54 2016 -0600 > > * tree-ssa-threadedge.c: Remove include of > tree-ssa-threadbackward.h. > (thread_across_edge): Remove calls to find_jump_threads_backwards. > * passes.def: Add jump threading passes before DOM/VRP. > * tree-ssa-threadbackward.c (find_jump_threads_backwards): Change > argument to a basic block from an edge. Remove tests which are > handled elsewhere. > (pass_d
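
As background, the kind of code the backwards/FSM threader targets looks like
the following hypothetical example: the test on 'state' inside the loop is
fully determined by the path taken through the previous iteration, so the
threader can duplicate blocks to bypass the test entirely.

/* Illustrative example only.  On each back-edge into the loop header
   the value of 'state' is already known, so the conditional can be
   threaded instead of re-tested.  */
int
fsm_example (const int *a, int n)
{
  int state = 0, sum = 0;
  for (int i = 0; i < n; i++)
    {
      if (state)
        sum += a[i];
      state = a[i] > 0;
    }
  return sum;
}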
Re: Enable loop peeling at -O3
On Sat, 28 May 2016, Jan Hubicka wrote: > Hello, > thanks for feedback. I updated the patch and also noticed that > -fpeel-all-loops gives up when > upper bound is known but it is large and when the max-peel-insns is too small > to permit > peeling max-peel-times. This patch also updates pr61743-2.c which are now > peeled before > we manage to propagate the proper loop bound. > > Bootstrapped/regtested x86_64-linux. OK? Humm, so why add -fpeel-all-loops? I don't think -funroll-all-loops is useful. Did you check code-size/compile-time/performance effects of enabling -fpeel-loops at -O3 for, say, SPEC CPU 2006? Thanks, Richard. > Honza > > * common.opt (flag_peel_all_loops): New option. > * doc/invoke.texi: (-fpeel-loops): Update documentation. > (-fpeel-all-loops): Document. > * opts.c (default_options): Add OPT_fpeel_loops to -O3+. > * toplev.c (process_options): flag_peel_all_loops implies > flag_peel_loops. > * tree-ssa-lop-ivcanon.c (try_peel_loop): Update comment; handle > -fpeel-all-loops, use likely estimates. > > * gcc.dg/tree-ssa/peel1.c: New testcase. > * gcc.dg/tree-ssa/peel2.c: New testcase. > * gcc.dg/tree-ssa/pr61743-1.c: Pass -fno-peel-loops. > * gcc.dg/tree-ssa/pr61743-2.c: Pass -fno-peel-loops. > Index: common.opt > === > --- common.opt(revision 236815) > +++ common.opt(working copy) > @@ -1840,6 +1840,10 @@ fpeel-loops > Common Report Var(flag_peel_loops) Optimization > Perform loop peeling. > > +fpeel-all-loops > +Common Report Var(flag_peel_all_loops) Optimization > +Perform loop peeling of all loops. > + > fpeephole > Common Report Var(flag_no_peephole,0) Optimization > Enable machine specific peephole optimizations. > Index: doc/invoke.texi > === > --- doc/invoke.texi (revision 236815) > +++ doc/invoke.texi (working copy) > @@ -375,7 +375,7 @@ Objective-C and Objective-C++ Dialects}. > -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol > -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol > -fomit-frame-pointer -foptimize-sibling-calls @gol > --fpartial-inlining -fpeel-loops -fpredictive-commoning @gol > +-fpartial-inlining -fpeel-loops -fpeel-all-loops -fpredictive-commoning @gol > -fprefetch-loop-arrays @gol > -fprofile-correction @gol > -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol > @@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o > @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize}, > @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths} > @option{-ftree-slp-vectorize}, @option{-fvect-cost-model}, > -@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options. > +@option{-ftree-partial-pre}, @option{-fpeel-loops} > +and @option{-fipa-cp-clone} options. > > @item -O0 > @opindex O0 > @@ -8593,7 +8594,7 @@ data about values of expressions in the > With @option{-fbranch-probabilities}, it reads back the data gathered > from profiling values of expressions for usage in optimizations. > > -Enabled with @option{-fprofile-generate} and @option{-fprofile-use}. > +Enabled with @option{-fprofile-generate} and/or @option{-fprofile-use}. > > @item -fprofile-reorder-functions > @opindex fprofile-reorder-functions > @@ -8661,10 +8662,17 @@ the loop is entered. This usually makes > @item -fpeel-loops > @opindex fpeel-loops > Peels loops for which there is enough information that they do not > -roll much (from profile feedback). It also turns on complete loop peeling > -(i.e.@: complete removal of loops with small constant number of iterations). > - > -Enabled with @option{-fprofile-use}. 
> +roll much (from profile feedback or static analysis). It also turns on > +complete loop peeling (i.e.@: complete removal of loops with small constant > +number of iterations). > + > +Enabled with @option{-O3} and @option{-fprofile-use}. > + > +@item -fpeel-all-loops > +@opindex fpeel-all-loops > +Peel all loops, even if their number of iterations is uncertain when > +the loop is entered. For loops with large number of iterations this leads > +to wasted code size. > > @item -fmove-loop-invariants > @opindex fmove-loop-invariants > Index: opts.c > === > --- opts.c(revision 236815) > +++ opts.c(working copy) > @@ -535,6 +535,7 @@ static const struct default_options defa > { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, > VECT_COST_MODEL_DYNAMIC }, > { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 }, > +{ OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 }, > > /* -Ofast adds optimizations to -O3. */ > { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, > Index: testsuite/gcc.dg/tree-ssa/peel1.c > ==
Re: Thoughts on memcmp expansion (PR43052)
On 01/15/2016 05:58 PM, Bernd Schmidt wrote:
> One question Richard posed in the comments: why aren't we optimizing small
> constant size memcmps other than size 1 to *s == *q? The reason is the
> return value of memcmp, which implies a byte-sized operation (incidentally,
> the use of SImode in the cmpmem/cmpstr patterns is really odd). It's
> possible to work around this, but expansion becomes a little more tricky
> (subtract after bswap, maybe).

When I did this (big-endian conversion, wide subtract, sign) to the tail
difference check in glibc's x86_64 memcmp, it was actually a bit faster than
isolating the differing byte and returning its difference, even for
non-random data such as encountered during qsort.

> * Expand __memcmp_eq for small constant sizes with loads and comparison,
>   fall back to a memcmp call.

Should we export such a function from glibc? I expect it's fairly common.
Computing the tail difference costs a few cycles.

It may also make sense to call a streamlined implementation if you have
interesting alignment information (for x86_64, that would be at least 16 on
one or both inputs, so it's perhaps not easy to come by).

Florian
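
A sketch of the bswap-based tail comparison described above, for one 8-byte
chunk on a little-endian target (illustrative only, not the actual glibc
code):

#include <stdint.h>
#include <string.h>

/* memcmp-style result for exactly 8 bytes: byte-swap both words to
   big-endian order so the integer comparison agrees with the
   lexicographic byte order, then reduce to {-1, 0, 1} without
   isolating the differing byte.  */
int
cmp8 (const void *p, const void *q)
{
  uint64_t a, b;
  memcpy (&a, p, 8);
  memcpy (&b, q, 8);
  a = __builtin_bswap64 (a);
  b = __builtin_bswap64 (b);
  return (a > b) - (a < b);
}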
[PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
Hi, this patch fixes graphite PR69068, a 6/7 regression. I. Consider this test-case: ... int qo; int zh[2]; void td (void) { int ly, en; for (ly = 0; ly < 2; ++ly) for (en = 0; en < 2; ++en) zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1; } ... When compiling with -O1 -fgraphite-identity, we run into an assert in bb_contains_loop_phi_nodes: ... pr-graphite.c: In function ‘td’: pr-graphite.c:5:1: internal compiler error: in bb_contains_loop_phi_nodes, at graphite-isl-ast-to-gimple.c:1078 td(void) ^~ ... II. At graphite0, we have a 3 argument phi in bb 7: ... td () { int en; int ly; int qo.1_1; int _3; int iftmp.0_6; : qo.1_1 = qo; _3 = qo.1_1 * 2; goto ; : : # en_19 = PHI <0(10), en_12(3)> if (qo.1_1 == 0) goto ; else goto ; : if (_3 != 0) goto ; else goto ; : : # iftmp.0_6 = PHI <1(6), -1(5), 1(4)> zh[en_19] = iftmp.0_6; en_12 = en_19 + 1; if (en_12 <= 1) goto ; else goto ; : ly_13 = ly_18 + 1; if (ly_13 <= 1) goto ; else goto ; : : # ly_18 = PHI goto ; : return; } ... When instantiating the gimple from the graphite ast, we arrive at copy_bb_and_scalar_dependences with bb.index == 7, where we execute: ... if (num_phis > 0 && bb_contains_loop_phi_nodes (bb)) { ... Subsequently we run into this assert, because EDGE_COUNT (bb->preds) == 3: ... /* Return true when BB contains loop phi nodes. A loop phi node is the loop header containing phi nodes which has one init-edge and one back-edge. */ static bool bb_contains_loop_phi_nodes (basic_block bb) { gcc_assert (EDGE_COUNT (bb->preds) <= 2); ... III. This patch fixes the assert conservatively by aborting graphite code generation when encountering a phi with more than two arguments in copy_bb_and_scalar_dependences. Bootstrapped and reg-tested on x86_64. OK for trunk, 6 branch? Thanks, - Tom Handle 3-arg phi in copy_bb_and_scalar_dependences 2016-05-30 Tom de Vries PR tree-optimization/69068 * graphite-isl-ast-to-gimple.c (copy_bb_and_scalar_dependences): Handle phis with more than two args. * gcc.dg/graphite/pr69068.c: New test. --- gcc/graphite-isl-ast-to-gimple.c| 7 +++ gcc/testsuite/gcc.dg/graphite/pr69068.c | 14 ++ 2 files changed, 21 insertions(+) diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index ff1d91f..c176db0 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -2725,6 +2725,13 @@ copy_bb_and_scalar_dependences (basic_block bb, edge next_e, vec iv_map) } else { + if (num_phis > 0 + && EDGE_COUNT (bb->preds) > 2) + { + codegen_error = true; + return NULL; + } + new_bb = split_edge (next_e); if (num_phis > 0 && bb_contains_loop_phi_nodes (bb)) { diff --git a/gcc/testsuite/gcc.dg/graphite/pr69068.c b/gcc/testsuite/gcc.dg/graphite/pr69068.c new file mode 100644 index 000..0abea06 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr69068.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fgraphite-identity" } */ + +int qo; +int zh[2]; + +void +td (void) +{ + int ly, en; + for (ly = 0; ly < 2; ++ly) +for (en = 0; en < 2; ++en) + zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1; +}
Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
On Mon, 30 May 2016, Tom de Vries wrote: > Hi, > > this patch fixes graphite PR69068, a 6/7 regression. > > > I. > > Consider this test-case: > ... > int qo; > int zh[2]; > > void > td (void) > { > int ly, en; > for (ly = 0; ly < 2; ++ly) > for (en = 0; en < 2; ++en) > zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1; > } > ... > > When compiling with -O1 -fgraphite-identity, we run into an assert in > bb_contains_loop_phi_nodes: > ... > pr-graphite.c: In function ‘td’: > pr-graphite.c:5:1: internal compiler error: in bb_contains_loop_phi_nodes, at > graphite-isl-ast-to-gimple.c:1078 > td(void) > ^~ > ... > > > II. > > At graphite0, we have a 3 argument phi in bb 7: > ... > td () > { > int en; > int ly; > int qo.1_1; > int _3; > int iftmp.0_6; > > : > qo.1_1 = qo; > _3 = qo.1_1 * 2; > goto ; > > : > > : > # en_19 = PHI <0(10), en_12(3)> > if (qo.1_1 == 0) > goto ; > else > goto ; > > : > if (_3 != 0) > goto ; > else > goto ; > > : > > : > # iftmp.0_6 = PHI <1(6), -1(5), 1(4)> > zh[en_19] = iftmp.0_6; > en_12 = en_19 + 1; > if (en_12 <= 1) > goto ; > else > goto ; > > : > ly_13 = ly_18 + 1; > if (ly_13 <= 1) > goto ; > else > goto ; > > : > > : > # ly_18 = PHI > goto ; > > : > return; > > } > ... > > When instantiating the gimple from the graphite ast, we arrive at > copy_bb_and_scalar_dependences with bb.index == 7, where we execute: > ... > if (num_phis > 0 && bb_contains_loop_phi_nodes (bb)) > { > ... > > Subsequently we run into this assert, because > EDGE_COUNT (bb->preds) == 3: > ... > /* Return true when BB contains loop phi nodes. A loop phi node is the >loop header containing phi nodes which has one init-edge and one >back-edge. */ > > static bool > bb_contains_loop_phi_nodes (basic_block bb) > { > gcc_assert (EDGE_COUNT (bb->preds) <= 2); > ... > > > III. > > This patch fixes the assert conservatively by aborting graphite code > generation when encountering a phi with more than two arguments in > copy_bb_and_scalar_dependences. > > Bootstrapped and reg-tested on x86_64. > > OK for trunk, 6 branch? Did you check if simply returning false from bb_contains_loop_phi_nodes instead of asserting works? The caller has a 'else' that is supposed to handle condition PHIs. After all it already handles one predecessor specially ... Thus if (EDGE_COUNT (bb->preds) != 2) return false; should work here. Or replace this function with bb_loop_header_p (bb) (works w/o loop info, the function seems to depend on loop info and thus simply checking bb->loop_father->header == bb should work as well). Thanks, Richard.
Re: [SPARC] Support for --with-{cpu,tune}-{32,64} in sparc*-* targets
> > Tested on sparc64-linux-gnu, sparcv9-linux-gnu and sparc-sun-solaris2.11.
> >
> > 2016-05-25  Jose E. Marchesi
> >
> >     * config.gcc (sparc*-*-*): Support cpu_32, cpu_64, tune_32 and
> >     tune_64.
> >     * doc/install.texi (--with-cpu-32, --with-cpu-64): Document
> >     support on SPARC.
> >     * config/sparc/linux64.h (OPTION_DEFAULT_SPECS): Add entries for
> >     cpu_32, cpu_64, tune_32 and tune_64.
> >     * config/sparc/sol2.h (OPTION_DEFAULT_SPECS): Likewise.
>
> OK for mainline, thanks.

Thanks. I would need someone to install it, as I don't have commit access to
the svn.
Re: C PATCH to add -Wswitch-unreachable (PR c/49859)
On 05/23/2016 12:16 PM, Marek Polacek wrote:
>> Can we document what a programmer needs to do to suppress the warning?
>> My worry is that the warning will be too unpredictable to be useful.
>
> Why would you suppress the warning? Just delete the code that cannot be
> executed anyway. And what is unpredictable? You don't want the compiler
> to warn on
>
>   switch (x)
>     {
>       found = 1;
>     case 1:
>       ...
>
> ?

Oh, I misread what the patch was about. It looks sufficiently predictable to
do this at the GIMPLE level (but it's odd if that's indeed the
architecturally correct choice).

Thanks,
Florian
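
For readers following along, a complete example of the construct being
diagnosed (illustrative); suppressing the warning amounts to deleting the
dead statement:

int
classify (int x)
{
  int found = 0;
  switch (x)
    {
      found = 1;   /* -Wswitch-unreachable: statement will never be executed */
    case 1:
      return found + 1;
    default:
      return found;
    }
}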
Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
On 30/05/16 11:46, Richard Biener wrote:
>> This patch fixes the assert conservatively by aborting graphite code
>> generation when encountering a phi with more than two arguments in
>> copy_bb_and_scalar_dependences.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk, 6 branch?
>
> Did you check if simply returning false from bb_contains_loop_phi_nodes
> instead of asserting works? The caller has an 'else' that is supposed
> to handle condition PHIs. After all it already handles one predecessor
> specially ... Thus
>
>   if (EDGE_COUNT (bb->preds) != 2)
>     return false;
>
> should work here.

Unfortunately, that doesn't work. We run into another assert in
copy_cond_phi_nodes:
...
  /* Cond phi nodes should have exactly two arguments.  */
  gcc_assert (2 == EDGE_COUNT (bb->preds));
...

> Or replace this function with bb_loop_header_p (bb)
> (works w/o loop info, the function seems to depend on loop info and
> thus simply checking bb->loop_father->header == bb should work as well).

I think that will run into the same assert.

Thanks,
- Tom
Re: [PATCH2][PR71252] Fix insertion point of stmt_to_insert
On Mon, May 30, 2016 at 2:35 AM, kugan wrote:
>
> On 28/05/16 01:28, Kugan Vivekanandarajah wrote:
>>
>> Hi Richard,
>>
>> This fixes the insertion point of stmt_to_insert based on your comments.
>> In insert_stmt_before_use, I now use find_insert_point such that we
>> insert the stmt_to_insert after its operands are defined. This means
>> that we now have to insert before and insert after in other cases.
>>
>> I also factored out uses of insert_stmt_before_use.
>>
>> I tested this with:
>> ./build/gcc/f951 cp2k_single_file.f90 -O3 -ffast-math -march=westmere
>>
>> I am running bootstrap and regression testing on x86-64-linux-gnu. Is
>> this OK for trunk if the testing is fine? I will also test with other
>> test cases from relevant PRs.
>
> Hi,
>
> Here is the tested patch. I made a slight change to reflect the pattern
> used everywhere else in reassoc.
>
> i.e., in insert_stmt_before_use, after finding the insertion point I now do:
>
> + if (stmt == insert_point)
> +   gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
> + else
> +   insert_stmt_after (stmt_to_insert, insert_point);
>
> This is consistent with what is done in other places.
>
> I tested the patch with CPU2006 and bootstrapped and regression tested for
> x86-64-none-linux-gnu with no new regressions.
>
> Is this OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-28  Kugan Vivekanandarajah
>
>     PR middle-end/71269
>     PR middle-end/71292
>     * gcc.dg/tree-ssa/pr71269.c: New test.
>     * gcc.dg/tree-ssa/pr71292.c: New test.
>
> gcc/ChangeLog:
>
> 2016-05-28  Kugan Vivekanandarajah
>
>     PR middle-end/71269
>     PR middle-end/71252
>
>     * tree-ssa-reassoc.c (insert_stmt_before_use): Use find_insert_point
>     so that the inserted stmt will not dominate stmts that define its
>     operands.
>     (rewrite_expr_tree): Add stmt_to_insert before adding the use stmt.
>     (rewrite_expr_tree_parallel): Likewise.
Re: [PATCH] match.pd: optimize unsigned mul overflow check
On Mon, May 30, 2016 at 9:14 AM, Alexander Monakov wrote: > On Sun, 29 May 2016, Marc Glisse wrote: >> On Sat, 28 May 2016, Alexander Monakov wrote: >> >> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether >> > 'A*B' >> > overflows (or 'B && A > -1 / B' if B may be zero). Let's optimize it to an >> > invocation of __builtin_mul_overflow to avoid the divide operation. >> >> I forgot to ask earlier: what does this give for modes / platforms where >> umulv4 does not have a specific implementation? Is the generic implementation >> worse than A>-1/B, in which case we may want to check optab_handler before >> doing the transformation? Or is it always at least as good? > > If umulv4 is unavailable (which today is everywhere except x86), gcc > falls back as follows. First, it tries to see if doing a multiplication in a > 2x wider type is possible (which it usually is, as gcc supports __int128_t on > 64-bit platforms and 64-bit long long on 32-bit platforms), then it looks at > high bits of the 2x wide product. This should boil down to doing a 'high > multiply' instruction if original operands' type matches register size, and a > normal multiply + masking high bits if the type is smaller than register. > > Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform), > then gcc emits a sequence that performs the multiplication by parts in a 2x > narrower type. > > I think the first, more commonly taken, fallback path results in an > always-good code. In the second case, the eliminated 64-bit divide is unlikely > to have a direct hw support; e.g., on i386 it's a library call to __udivdi3. > This makes the transformation a likely loss for code size, a likely win for > performance. It could be better if GCC could CSE REALPART (IFN_MUL_OVERFLOW) > with A*B on gimple. CCing Jakub who wrote the tree-ssa-math-opts.c code last year. I remember we discussed using match.pd but ended up with not doing it there but I don't remember the exact reason. It would be nice to have these in one place rather than split. As of umulv4 handling - users can write overflow checks using the builtins so we better expand them to the optimal form for each target, thus canonicalizing them to the IFN looks reasonable to me. The plus/minus case in tree-ssa-math-opts.c _does_ disable itself if no uaddv is available though. As for the division by zero thing - division by zero invokes undefined behavior. Yes, with -fnon-call-exceptions you could catch this as exception, but you shouldn't be able to w/o a -fdivision-by-zero. And yes, we're quite inconsistent in folding of, say, 0/0 - but that is due to warning code (historically). I'd rather ignore that bit in folding and make us consistent here (hopefully w/o regressing in diagnostics). Richard. > Thanks. > Alexander
Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
On Mon, 30 May 2016, Tom de Vries wrote: > On 30/05/16 11:46, Richard Biener wrote: > > > This patch fixes the assert conservatively by aborting graphite code > > > >generation when encountering a phi with more than two arguments in > > > >copy_bb_and_scalar_dependences. > > > > > > > >Bootstrapped and reg-tested on x86_64. > > > > > > > >OK for trunk, 6 branch? > > > Did you check if simply returning false from bb_contains_loop_phi_nodes > > instead of asserting works? The caller has a 'else' that is supposed > > to handle condition PHIs. After all it already handles one predecessor > > specially ... Thus > > > >if (EDGE_COUNT (bb->preds) != 2) > > return false; > > > > should work here. > > Unfortunately, that doesn't work. We run into another assert in > copy_cond_phi_nodes: > ... > /* Cond phi nodes should have exactly two arguments. */ > gcc_assert (2 == EDGE_COUNT (bb->preds)); > ... Hah. So can we still do my suggested change and bail out conservatively in Cond PHI node handling instead? Because the PHI node is clearly _not_ a loop header PHI and the cond phi assert is also a latent bug. > > Or replace this function with bb_loop_header_p (bb) > > (works w/o loop info, the function seems to depend on loop info and > > thus simply checking bb->loop_father->header == bb should work as well). > > I think that will run into the same assert. Yes. Richard. > Thanks, > - Tom > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
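
A sketch of that conservative bail-out in copy_cond_phi_nodes, reusing the
codegen_error mechanism from the proposed patch (untested and illustrative
only; it assumes a false return from this function is propagated as a
codegen failure like the other bail-outs in this file):

  /* Cond phi nodes should have exactly two arguments; instead of
     asserting, give up on code generation for this region.
     (Sketch: codegen_error handling as in the PR69068 patch above.)  */
  if (EDGE_COUNT (bb->preds) != 2)
    {
      codegen_error = true;
      return false;
    }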
[PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns
The following patch removes the restriction on seeing a tree of stmts in
vectorizer bool pattern detection (aka single-use). With this it is no
longer necessary to unshare DEFs in ifcvt_repair_bool_pattern and that
compile-time hog can go (it's now enabled unconditionally for GCC 7).
Instead the pattern detection code will now "unshare" the condition tree for
each bool pattern root by means of adding all pattern stmts of the condition
tree to its pattern def sequence (so we still get some unnecessary copying,
worst-case quadratic rather than exponential).

Ilja - I had to disable the

  tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
  if (mask_type
      && expand_vec_cmp_expr_p (comp_vectype, mask_type))
    return false;

check you added to check_bool_pattern to get any coverage for bool patterns
on x86_64. Doing that regresses

FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1 loops" 10
FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized 1 loops" 10
FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1 loops" 2

so somehow bool patterns mess up things here (I didn't investigate). The
final patch will enable the above path again, avoiding the regression.

Yuri - I suppose you have a larger set of testcases using OMP simd or other
forced vectorization you added ifcvt_repair_bool_pattern for. I'd appreciate
testing (and testcases if anything fails unexpectedly). Testing on other
targets is of course appreciated as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
Comments?

I agree with Ilya elsewhere to remove bool patterns completely (as a first
step making them apply to individual stmts). We currently do not support
mixed cond/non-cond uses anyway, like

  _Bool a[64];
  unsigned char b[64];
  void foo (void)
  {
    for (int i = 0; i < 64; ++i)
      {
        _Bool x = a[i] && b[i] < 10;
        a[i] = x;
      }
  }

and stmt-local "patterns" can be added when vectorizing the stmt (like in
the above case a tem = a[i] != 0 ? -1 : 0). Doing bool "promotion" optimally
requires a better vectorizer IL (similar to placing of shuffles).

Thanks,
Richard.

2016-05-30  Richard Biener

    PR tree-optimization/71261
    * tree-vect-patterns.c (check_bool_pattern): Gather a hash-set
    of stmts successfully put in the bool pattern. Remove single-use
    restriction.
    (adjust_bool_pattern_cast): Add cast at the use site via the
    pattern def sequence.
    (adjust_bool_pattern): Remove recursion, maintain a hash-map of
    patterned defs. Use the pattern def sequence instead of multiple
    independent patterns.
    (sort_after_uid): New qsort compare function.
    (adjust_bool_stmts): New function to process stmts in the bool
    pattern in IL order.
    (vect_recog_bool_pattern): Adjust.
    * tree-if-conv.c (ifcvt_split_def_stmt): Remove.
    (ifcvt_walk_pattern_tree): Likewise.
    (stmt_is_root_of_bool_pattern): Likewise.
    (ifcvt_repair_bool_pattern): Likewise.
    (tree_if_conversion): Do not call ifcvt_repair_bool_pattern.

    * gcc.dg/torture/vect-bool-1.c: New testcase.

Index: gcc/tree-vect-patterns.c === *** gcc/tree-vect-patterns.c.orig 2016-05-27 14:32:49.778686018 +0200 --- gcc/tree-vect-patterns.c2016-05-30 11:06:23.563532173 +0200 *** vect_recog_mixed_size_cond_pattern (vec< *** 2888,2897 /* Helper function of vect_recog_bool_pattern. Called recursively, return true if bool VAR can and should be optimized that way. Assume it shouldn't in case it's a result of a comparison which can be directly vectorized into !a vector comparison. */ static bool !
check_bool_pattern (tree var, vec_info *vinfo) { gimple *def_stmt; enum vect_def_type dt; --- 2888,2898 /* Helper function of vect_recog_bool_pattern. Called recursively, return true if bool VAR can and should be optimized that way. Assume it shouldn't in case it's a result of a comparison which can be directly vectorized into !a vector comparison. Fills in STMTS with all stmts visited during the !walk. */ static bool ! check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts) { gimple *def_stmt; enum vect_def_type dt; *** check_bool_pattern (tree var, vec_info * *** 2907,2943 if (!is_gimple_assign (def_stmt)) return false; ! if (!has_single_use (var)) ! return false; rhs1 = gimple_assign_rhs1 (def_stmt); rhs_code = gimple_assign_rhs_code (def_stmt); switch (rhs_code) { case SSA_NAME: ! return check_bool_pattern (rhs1, vinfo); CASE_CONVERT: if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1 || !TYPE_UNSIGNED (TREE_TYPE (rhs1))) && TREE_CODE (TREE_TYPE (rhs1)) != B
Re: Enable loop peeling at -O3
> On Sat, 28 May 2016, Jan Hubicka wrote:
> > Hello,
> > thanks for feedback. I updated the patch and also noticed that
> > -fpeel-all-loops gives up when the upper bound is known but it is large
> > and when max-peel-insns is too small to permit peeling max-peel-times.
> > This patch also updates pr61743-2.c, which is now peeled before
> > we manage to propagate the proper loop bound.
> >
> > Bootstrapped/regtested x86_64-linux. OK?
>
> Humm, so why add -fpeel-all-loops? I don't think -funroll-all-loops
> is useful.

It is mostly there to trigger the transform to see if it is useful. Not
something you want to enable by default.

-fpeel-all-loops helps when you know your code has internal loops that
iterate a few times. I.e. one can get a good speedup for the sudoku solver
benchmark because it has loops that iterate either once or 10 times.

http://www.ucw.cz/~hubicka/papers/amd64/node4.html also claims that
-funroll-all-loops improves specint by 2.5%, while -funroll-loops by 2.23%,
so it seemed somewhat useful back then.

> Did you check code-size/compile-time/performance effects of enabling
> -fpeel-loops at -O3 for, say, SPEC CPU 2006?

Martin Liska ran it on SPEC2006 and v6 (not with the latest fixes to the
heuristics). Without FDO the loop peeling triggers only for loops where we
have likely_upper_bound != upper_bound. We do not predict that too often (we
may in the future, as there is room for improvement in niter). The code size
effect was +0.9% for SPECint and +2.2% on SPECfp. The off-noise improvements
were vrp 94.5->89.7 and John the ripper 106.9->100 (less than 0.1% in
geomavg). I have cut down the code size effects since then (there was a bug
that made us peel without control when maxiter overflowed), but I did not
re-run full SPEC since then.

My motivation was mainly to reduce the number of optimizations that are not
good enough to be enabled by default, and also the observation that it helps
some benchmarks.

We can re-run benchmarks with the current patch after fixing the profile
update issues I will send shortly. No fine-tuning of the parameters was
done and I guess they are still set the way they were set for RTL peeling
in 2003.

Honza
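
To illustrate the kind of loop this is aimed at (a hypothetical example, not
taken from the patch): here static analysis derives a small upper bound on
the trip count from the array size, so peeling can turn the likely
executions into straight-line code even though n itself is unknown.

int a[3];

void
clear_prefix (int n)
{
  /* Accessing a[i] bounds i to at most 2, so niter analysis knows the
     loop rolls at most 3 times; -fpeel-loops can then peel it.  */
  for (int i = 0; i < n; i++)
    a[i] = 0;
}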
Re: [PATCH] match.pd: optimize unsigned mul overflow check
On Mon, May 30, 2016 at 12:55:22PM +0200, Richard Biener wrote:
> CCing Jakub who wrote the tree-ssa-math-opts.c code last year. I remember
> we discussed using match.pd but ended up with not doing it there, but I
> don't remember the exact reason.

richi: does match.pd support turning non-call code into internal call?
jakub: yes
jakub: (simplify (plus @1 @2) (fn @1 @2))
richi: where fn is IFN_... or something else?
jakub: IFN_ should work, yes
jakub: CFN_ as well as far as I understand the code richard added
but then I don't control the type of the lhs, right?
want to transform
  r_4 = x_2(D) - y_3(D);
  if (x_2(D) < r_4)
into:
well, the lhs is always the original type of the expression (obviously)
  _4 = SUB_OVERFLOW (x_2(D), y_3(D));
  r.0_5 = REALPART_EXPR <_4>;
  _7 = IMAGPART_EXPR <_4>;
  if (_7 != 0)
unsigned only
jakub: hmm, you have two results, the compare and the subtract, that's not possible to handle right now (unless r_4 is unused otherwise)
jakub: you can write a matcher and do the transform in a pass somewhere
richi: ah, ok; any suggestion which pass is best for this? forwprop, something else?
jakub: I'd say forwprop is good enough, yes
richi: FYI, this is for PR67089, rth is working on the optab side for that
hmm, the (match ...) code doesn't deal well with values not having an SSA name (the compare in this case)
"deal well" as in, not implemented
it would require to output some auxiliary entry points
richi: if it is done in forwprop or some other pass, it can do everything in there by hand, doesn't need match.pd help I guess
jakub: yes

> It would be nice to have these in one place rather than split.
>
> As of umulv4 handling - users can write overflow checks using the
> builtins so we better expand them to the optimal form for each target,
> thus canonicalizing them to the IFN looks reasonable to me.
>
> The plus/minus case in tree-ssa-math-opts.c
> _does_ disable itself if no uaddv is available though.

That is because the uaddv/usubv case is so simple without the HW support
that it is unlikely to be beneficial then. That is not the case for
multiplication.

Jakub
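
The transform being discussed in the log, written out as source (an
illustrative example): the comparison tests whether the unsigned subtraction
wrapped, which is exactly what the flag part of SUB_OVERFLOW computes.

#include <stdbool.h>

/* For unsigned x, y: x - y wraps iff x < y, i.e. iff the result ends
   up greater than x.  The matcher described above rewrites the compare
   into the IMAGPART_EXPR of a SUB_OVERFLOW internal call.  */
bool
sub_wraps (unsigned x, unsigned y, unsigned *r)
{
  *r = x - y;
  return x < *r;
}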
Re: Enable loop peeling at -O3
On Mon, 30 May 2016, Jan Hubicka wrote: > > On Sat, 28 May 2016, Jan Hubicka wrote: > > > > > Hello, > > > thanks for feedback. I updated the patch and also noticed that > > > -fpeel-all-loops gives up when > > > upper bound is known but it is large and when the max-peel-insns is too > > > small to permit > > > peeling max-peel-times. This patch also updates pr61743-2.c which are > > > now peeled before > > > we manage to propagate the proper loop bound. > > > > > > Bootstrapped/regtested x86_64-linux. OK? > > > > Humm, so why add -fpeel-all-loops? I don't think -funroll-all-loops > > is useful. > It is mostly there to trigger the transform to see if it is useful. Not > something > you want to enable by default. > > -fpeel-all-loops helps when you know your code have internal loops that > iterate few times. I.e. one can get good speedup for the sudoku solver > benchmark > because it has loops that iterate either once or 10 times. > > http://www.ucw.cz/~hubicka/papers/amd64/node4.html also claims that > -funroll-all-loops > improves specint by 2.5%, while -funroll-loops by 2.23%, so it seemed > somewhat useful > back then. Still I'm hesitant to introduce new user-visible options. > > Did you check code-size/compile-time/performance effects of enabling > > -fpeel-loops at -O3 for, say, SPEC CPU 2006? > > Martin Liska run it on the SPEC2006 and v6 (not with latest fixes to > heuristics). Without FDO the loop peeling triggers only for loops where we > have likely_upper_bound != upper_bound. We do not predict that too often (we > may in future as there is room for improvement in niter). The code size > effect > was +0.9% for SPECint and +2.2% on SPECfp. The off-noise improvements were > vrp > 94.5->89.7 and John the ripper 106.9->100 (less than 0.1% in geomavg). I have > cut down the code size effects since that (there was a bug that made us to > peel > without control when maxiter overflowed), but I did not re-run full specs > since > then. > > My motivation was mainly to reduce number of optimizations that are not good > enough to be enabled by default and also observation that it helps to some > benchmarks. > > We can re-run benchmarks with current patch after fixing the profile update > issues I will send shortly. No fine-tuning of the parameters was done and I > guess they are still set the way they was set for RTL peeling in 2003. Sounds good. The patch is ok if you omit the new flag for now. Thanks, Richard.
Re: [PATCH] match.pd: optimize unsigned mul overflow check
On Mon, May 30, 2016 at 1:15 PM, Jakub Jelinek wrote: > On Mon, May 30, 2016 at 12:55:22PM +0200, Richard Biener wrote: >> CCing Jakub who wrote the tree-ssa-math-opts.c code last year. I remember we >> discussed using match.pd but ended up with not doing it there but I >> don't remember >> the exact reason. > > richi: does match.pd support turning non-call code into internal call? > jakub: yes > jakub: (simplify (plus @1 @2) (fn @1 @2)) > richi: where fn is IFN_... or something else? > jakub: IFN_ should work, yes > jakub: CFN_ as well as far as I understand the code richard added > but then I don't control the type of the lhs, right? > want to transform >r_4 = x_2(D) - y_3(D); >if (x_2(D) < r_4) > into: > well, the lhs is always the original type of the expression > (obviously) >_4 = SUB_OVERFLOW (x_2(D), y_3(D)); >r.0_5 = REALPART_EXPR <_4>; >_7 = IMAGPART_EXPR <_4>; >if (_7 != 0) > unsigned only > jakub: hmm, you have two results, the compare and the subtract, > that's not possible to handle right now (unless r_4 is unused otherwise) > jakub: you can write a matcher and do the transform in a pass > somewhere > richi: ah, ok; any suggestion which pass is best for this? forwprop, > something else? > jakub: I'd say forwprop is good enough, yes > richi: FYI, this is for PR67089, rth is working on the optab side for > that > hmm, the (match ...) code doesn't deal well with values not having an > SSA name (the compare in this case) > "deal well" as in, not implemented > it would require to output some auxiliary entry points > richi: if it is done in forwprop or some other pass, it can do > everything in there by hand, doesn't need match.pd help I guess > jakub: yes Ok, so Alexander - you transform the condition only as the multiplication value is not available anyway with your formulation of the overflow check. That's then different to Jakubs case, so match.pd is ok (that is, the patch is ok for trunk). >> It would be nice to have these in one place rather >> than split. >> >> As of umulv4 handling - users can write overflow checks using the >> builtins >> so we better expand them to the optimal form for each target, thus >> canonicalizing >> them to the IFN looks reasonable to me. > >> The plus/minus case in tree-ssa-math-opts.c >> _does_ disable itself if no uaddv is available though. > > That is because the uaddv/usubv case is so simple without the HW support that > it is unlikely to be beneficial then. That is not the case for > multiplication. Richard. > Jakub
[build] Handle gas/gld --compress-debug-sections=type
I noticed that, while gcc.c is already prepared for it, gcc/configure
doesn't yet detect the full ELF gABI compression support introduced in GNU
binutils 2.26.  The following patch adds the missing pieces.

While mostly straightforward, there are two noteworthy issues:

* I'm removing the `extra switches' arg to gcc_GAS_CHECK_FEATURES.  This
  would cause HAVE_AS_COMPRESS_DEBUG to be set to "no" instead of one of
  the numeric values when testing /bin/as on Solaris.  In the future, this
  might cause us to misdetect assembler compression support requiring a
  different option.

* When I removed the default in the gcc_cv_ld_compress test, the outcome
  was always 0, irrespective of the linker tested.  Before, it would almost
  always have been 1 if testing GNU ld.  It turns out that in this (and
  numerous other) cases the nesting of tests using ld_date was wrong.  I
  believe most if not all of those ld_date tests can go, being only
  relevant for truly prehistoric versions of GNU ld not in use anymore.
  I'll probably submit a followup to remove them, simplifying several ld
  tests.

Bootstrapped without regressions on i386-pc-solaris2.1[02] with various
as/ld combinations, checking that the tests yielded the expected results
(gcc_cv_as_compress_debug/gcc_cv_ld_compress_debug):

	as/ld 2.10		0/0
	as/ld 2.12		0/3
	gas 2.26/ld 2.10	2/0
	gas 2.26/ld 2.12	2/3
	gas 2.26/gld 2.26	2/3
	gas 2.25/gld 2.25	1/1

I've also compiled and linked a test program and checked that assembler and
linker were invoked as expected for -gz/-gz=none/-gz=zlib/-gz=zlib-gnu and
yielded either errors or output with compressed debug sections as should be.

Ok for mainline?

	Rainer

2016-05-26  Rainer Orth

	* configure.ac (gcc_cv_as_compress_debug): Remove
	--compress-debug-sections as extra as switch.
	Handle gas --compress-debug-sections=type.
	(gcc_cv_ld_compress_debug): Remove bogus ld_date check.
	Handle gld --compress-debug-sections=type.
	* configure: Regenerate.

# HG changeset patch
# Parent 42e566f9c2ba3a71305b2db7875ea2e5fe2df2c2
Handle gas/gld --compress-debug-sections=type

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4731,12 +4731,21 @@ if test x"$insn" != x; then
 fi
 
 gcc_GAS_CHECK_FEATURE([compressed debug sections],
-  gcc_cv_as_compress_debug,,[--compress-debug-sections],,
+  gcc_cv_as_compress_debug,,,
 [# gas compiled without zlib cannot compress debug sections and warns
  # about it, but still exits successfully.  So check for this, too.
  if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 | grep -i warning > /dev/null
  then
    gcc_cv_as_compress_debug=0
+ # Since binutils 2.26, gas supports --compress-debug-sections=type,
+ # defaulting to the ELF gABI format.
+ elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o conftest.s > /dev/null 2>&1
+ then
+   gcc_cv_as_compress_debug=2
+   gcc_cv_as_compress_debug_option="--compress-debug-sections"
+   gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
+ # Before binutils 2.26, gas only supported --compress-debug-sections and
+ # emitted the traditional GNU format.
 elif $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s > /dev/null 2>&1
 then
   gcc_cv_as_compress_debug=1
@@ -4744,8 +4753,6 @@ gcc_GAS_CHECK_FEATURE([compressed debug
   gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
 else
   gcc_cv_as_compress_debug=0
-  # FIXME: Future gas versions will support ELF gABI style via
-  # --compress-debug-sections[=type].
 fi])
 AC_DEFINE_UNQUOTED(HAVE_AS_COMPRESS_DEBUG, $gcc_cv_as_compress_debug,
[Define to the level of your assembler's compressed debug section support.])
@@ -5120,27 +5127,30 @@ AC_MSG_RESULT($gcc_cv_ld_eh_gc_sections_
 AC_MSG_CHECKING(linker for compressed debug sections)
 # gold/gld support compressed debug sections since binutils 2.19/2.21
+# In binutils 2.26, gld gained support for the ELF gABI format.
 if test $in_tree_ld = yes ; then
   gcc_cv_ld_compress_debug=0
   if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 19 -o "$gcc_cv_gld_major_version" -gt 2 \
      && test $in_tree_ld_is_elf = yes && test $ld_is_gold = yes; then
     gcc_cv_ld_compress_debug=2
     gcc_cv_ld_compress_debug_option="--compress-debug-sections"
+  elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 26 -o "$gcc_cv_gld_major_version" -gt 2 \
+     && test $in_tree_ld_is_elf = yes && test $ld_is_gold = no; then
+    gcc_cv_ld_compress_debug=3
+    gcc_cv_ld_compress_debug_option="--compress-debug-sections"
   elif test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 21 -o "$gcc_cv_gld_major_version" -gt 2 \
      && test $in_tree_ld_is_elf = yes; then
     gcc_cv_ld_compress_debug=1
   fi
 elif echo "$
Re: Enable loop peeling at -O3
> > Sounds good.
> >
> > The patch is ok if you omit the new flag for now.

Ok, I will omit that flag for now.  Thanks!

Honza

> Thanks,
> Richard.
[PATCH 0/2] [ARC] Refurbish backend options
This series of patches redefines the way how we handle the options within the ARC backend. Firstly, a number of options got deprecated as they were directed to control the assembler behavior, assembler which got overhaul and ignores the options in question. Secondly, we remove the capitalized cpu names accepted by mcpu option. Finally, we introduced new cpu options, based on the Synopsys predefined CPU templates to make easier the translation from hardware CPU template to gcc options. Those new cpu options are handled by a number of scripts which seamlessly generates gcc options, multilib variants, and checking the allowed hardware extensions against gcc command line options. Please find two patches, one which is refurbishing the ARC options, and the second one which updates the ARC specific tests. Claudiu Zissulescu (2): New option handling, refurbish multilib support. Update target specific tests. gcc/common/config/arc/arc-common.c | 162 -- gcc/config.gcc | 47 +++--- gcc/config/arc/arc-arch.h| 120 + gcc/config/arc/arc-arches.def| 35 gcc/config/arc/arc-c.def | 4 + gcc/config/arc/arc-cpus.def | 47 ++ gcc/config/arc/arc-options.def | 69 gcc/config/arc/arc-opts.h| 47 +- gcc/config/arc/arc-protos.h | 1 - gcc/config/arc/arc-tables.opt| 90 ++ gcc/config/arc/arc.c | 186 +++-- gcc/config/arc/arc.h | 91 +- gcc/config/arc/arc.md| 5 - gcc/config/arc/arc.opt | 109 gcc/config/arc/driver-arc.c | 80 + gcc/config/arc/genmultilib.awk | 204 +++ gcc/config/arc/genoptions.awk| 86 ++ gcc/config/arc/t-arc | 19 +++ gcc/config/arc/t-arc-newlib | 46 - gcc/config/arc/t-arc-uClibc | 20 --- gcc/config/arc/t-multilib| 51 ++ gcc/config/arc/t-uClibc | 20 +++ gcc/doc/invoke.texi | 86 -- gcc/testsuite/gcc.target/arc/abitest.S | 31 gcc/testsuite/gcc.target/arc/arc.exp | 66 +++- gcc/testsuite/gcc.target/arc/barrel-shifter-1.c | 2 +- gcc/testsuite/gcc.target/arc/builtin_simd.c | 1 + gcc/testsuite/gcc.target/arc/builtin_simdarc.c | 1 + gcc/testsuite/gcc.target/arc/cmem-1.c| 1 + gcc/testsuite/gcc.target/arc/cmem-2.c| 1 + gcc/testsuite/gcc.target/arc/cmem-3.c| 1 + gcc/testsuite/gcc.target/arc/cmem-4.c| 1 + gcc/testsuite/gcc.target/arc/cmem-5.c| 1 + gcc/testsuite/gcc.target/arc/cmem-6.c| 1 + gcc/testsuite/gcc.target/arc/cmem-7.c| 1 + gcc/testsuite/gcc.target/arc/extzv-1.c | 1 + gcc/testsuite/gcc.target/arc/insv-1.c| 1 + gcc/testsuite/gcc.target/arc/insv-2.c| 1 + gcc/testsuite/gcc.target/arc/interrupt-1.c | 7 +- gcc/testsuite/gcc.target/arc/interrupt-2.c | 1 + gcc/testsuite/gcc.target/arc/interrupt-3.c | 2 +- gcc/testsuite/gcc.target/arc/jump-around-jump.c | 2 +- gcc/testsuite/gcc.target/arc/mA6.c | 1 + gcc/testsuite/gcc.target/arc/mA7.c | 1 + gcc/testsuite/gcc.target/arc/mARC600.c | 1 + gcc/testsuite/gcc.target/arc/mARC601.c | 3 +- gcc/testsuite/gcc.target/arc/mARC700.c | 1 + gcc/testsuite/gcc.target/arc/mcpu-arc600.c | 3 +- gcc/testsuite/gcc.target/arc/mcpu-arc601.c | 5 +- gcc/testsuite/gcc.target/arc/mcpu-arc700.c | 3 +- gcc/testsuite/gcc.target/arc/mcrc.c | 8 - gcc/testsuite/gcc.target/arc/mdpfp.c | 1 + gcc/testsuite/gcc.target/arc/mdsp-packa.c| 9 - gcc/testsuite/gcc.target/arc/mdvbf.c | 9 - gcc/testsuite/gcc.target/arc/mmac-24.c | 8 - gcc/testsuite/gcc.target/arc/mmac-d16.c | 9 - gcc/testsuite/gcc.target/arc/mno-crc.c | 11 -- gcc/testsuite/gcc.target/arc/mno-dsp-packa.c | 11 -- gcc/testsuite/gcc.target/arc/mno-dvbf.c | 11 -- gcc/testsuite/gcc.target/arc/mno-mac-24.c| 11 -- gcc/testsuite/gcc.target/arc/mno-mac-d16.c | 11 -- gcc/testsuite/gcc.target/arc/mno-rtsc.c | 11 -- gcc/testsuite/gcc.target/arc/mno-xy.c| 10 -- 
gcc/testsuite/gcc.target/arc/movb-1.c| 1 + gcc/testsuite/gcc.target/arc/movb-2.c| 1 + gcc/testsuite/gcc.target/arc/movb-3.c| 1 + gcc/testsuite/gcc.target/arc/movb-4.c| 1 + g
[PATCH 2/2] [ARC] Update target specific tests.
Update the ARC specific tests. OK to apply? Claudiu gcc/ 2016-05-26 Claudiu Zissulescu * testsuite/gcc.target/arc/abitest.S: New file. * testsuite/gcc.target/arc/va_args-1.c: Likewise. * testsuite/gcc.target/arc/va_args-2.c: Likewise. * testsuite/gcc.target/arc/va_args-3.c: Likewise. * testsuite/gcc.target/arc/mcrc.c: Deleted. * testsuite/gcc.target/arc/mdsp-packa.c: Likewise. * testsuite/gcc.target/arc/mdvbf.c: Likewise. * testsuite/gcc.target/arc/mmac-24.c: Likewise. * testsuite/gcc.target/arc/mmac-d16.c: Likewise. * testsuite/gcc.target/arc/mno-crc.c: Likewise. * testsuite/gcc.target/arc/mno-dsp-packa.c: Likewise. * testsuite/gcc.target/arc/mno-dvbf.c: Likewise. * testsuite/gcc.target/arc/mno-mac-24.c: Likewise. * testsuite/gcc.target/arc/mno-mac-d16.c: Likewise. * testsuite/gcc.target/arc/mno-rtsc.c: Likewise. * testsuite/gcc.target/arc/mno-xy.c: Likewise. * testsuite/gcc.target/arc/mrtsc.c: Likewise. * testsuite/gcc.target/arc/arc.exp (check_effective_target_arcem): New function. (check_effective_target_arc700): Likewise. (check_effective_target_arc6xx): Likewise. (check_effective_target_arcmpy): Likewise. (check_effective_target_archs): Likewise. (check_effective_target_clmcpu): Likewise. * testsuite/gcc.target/arc/barrel-shifter-1.c: Changed. * testsuite/gcc.target/arc/builtin_simd.c: Test only for ARC700 cpus. * testsuite/gcc.target/arc/cmem-1.c: Changed. * testsuite/gcc.target/arc/cmem-2.c: Likewise. * testsuite/gcc.target/arc/cmem-3.c: Likewise. * testsuite/gcc.target/arc/cmem-4.c: Likewise. * testsuite/gcc.target/arc/cmem-5.c: Likewise. * testsuite/gcc.target/arc/cmem-6.c: Likewise. * testsuite/gcc.target/arc/cmem-7.c: Likewise. * testsuite/gcc.target/arc/interrupt-1.c: Test for RTIE as well. * testsuite/gcc.target/arc/interrupt-2.c: Skip it for ARCv2 cores. * testsuite/gcc.target/arc/interrupt-3.c: Match also ARCv2 warnings. * testsuite/gcc.target/arc/jump-around-jump.c: Update options. * testsuite/gcc.target/arc/mARC601.c: Changed. * testsuite/gcc.target/arc/mcpu-arc600.c: Changed. * testsuite/gcc.target/arc/mcpu-arc601.c: Changed. * testsuite/gcc.target/arc/mcpu-arc700.c: Changed. * testsuite/gcc.target/arc/mdpfp.c: Skip for ARCv2 cores. * testsuite/gcc.target/arc/movb-1.c: Changed. * testsuite/gcc.target/arc/movb-2.c: Likewise. * testsuite/gcc.target/arc/movb-3.c: Likewise. * testsuite/gcc.target/arc/movb-4.c: Likewise. * testsuite/gcc.target/arc/movb-5.c: Likewise. * testsuite/gcc.target/arc/movb_cl-1.c: Likewise. * testsuite/gcc.target/arc/movb_cl-2.c: Likewise. * testsuite/gcc.target/arc/movbi_cl-1.c: Likewise. * testsuite/gcc.target/arc/movh_cl-1.c: Likewise. * testsuite/gcc.target/arc/mspfp.c: Skip for ARC HS cores. * testsuite/gcc.target/arc/mul64.c: Enable it only for ARC600. * testsuite/gcc.target/arc/mulsi3_highpart-1.c: Scan for ARCv2 instructions. * testsuite/gcc.target/arc/mulsi3_highpart-2.c: Skip it for ARCv1 cores. * testsuite/gcc.target/arc/no-dpfp-lrsr.c: Skip it for ARC HS. * testsuite/gcc.target/arc/trsub.c: Only for ARC EM cores. * testsuite/gcc.target/arc/builtin_simdarc.c: Changed. * testsuite/gcc.target/arc/extzv-1.c: Likewise. * testsuite/gcc.target/arc/insv-1.c: Likewise. * testsuite/gcc.target/arc/insv-2.c: Likewise. * testsuite/gcc.target/arc/mA6.c: Likewise. * testsuite/gcc.target/arc/mA7.c: Likewise. * testsuite/gcc.target/arc/mARC600.c: Likewise. * testsuite/gcc.target/arc/mARC700.c: Likewise. * testsuite/gcc.target/arc/mcpu-arc600.c: Likewise. * testsuite/gcc.target/arc/mcpu-arc700.c: Likewise. * testsuite/gcc.target/arc/movl-1.c: Likewise. 
* testsuite/gcc.target/arc/nps400-1.c: Likewise. * testsuite/gcc.target/arc/trsub.c: Likewise. --- gcc/testsuite/gcc.target/arc/abitest.S | 31 +++ gcc/testsuite/gcc.target/arc/arc.exp | 66 +++- gcc/testsuite/gcc.target/arc/barrel-shifter-1.c | 2 +- gcc/testsuite/gcc.target/arc/builtin_simd.c | 1 + gcc/testsuite/gcc.target/arc/builtin_simdarc.c | 1 + gcc/testsuite/gcc.target/arc/cmem-1.c| 1 + gcc/testsuite/gcc.target/arc/cmem-2.c| 1 + gcc/testsuite/gcc.target/arc/cmem-3.c| 1 + gcc/testsuite/gcc.target/arc/cmem-4.c| 1 + gcc/testsuite/gcc.target/arc/cmem-5.c| 1 + gcc/testsuite/gcc.target/arc/cmem-6.c| 1 + gcc/testsuite/gcc.target/arc/cmem-7.c| 1 + gcc/testsuite/gcc.target/arc/ext
[PATCH 1/2] [ARC] New option handling, refurbish multilib support.
Overhaul of ARC options, and multilib support for various cpus. OK to apply? Claudiu gcc/ 2016-05-09 Claudiu Zissulescu * config/arc/arc-arch.h: New file. * config/arc/arc-arches.def: Likewise. * config/arc/arc-cpus.def: Likewise. * config/arc/arc-options.def: Likewise. * config/arc/t-multilib: Likewise. * config/arc/genmultilib.awk: Likewise. * config/arc/genoptions.awk: Likewise. * config/arc/arc-tables.opt: Likewise. * config/arc/driver-arc.c: Likewise. * common/config/arc/arc-common.c (arc_handle_option): Trace toggled options. * config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra options; check for supported cpu against arc-cpus.def file. (arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define TARGET_CPU_BUILD macro; add driver-arc.o as an extra object. * config/arc/arc-c.def: Add emacs local variables. * config/arc/arc-opts.h (processor_type): Use arc-cpus.def file. (FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA) (FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL) (FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines. (DEFAULT_arc_fpu_build): Define. (DEFAULT_arc_mpy_option): Define. * config/arc/arc-protos.h (arc_init): Delete. * config/arc/arc.c (arc_cpu_name): New variable. (arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs) (arc_arc700, arc_arc600, arc_arc601): New variable. (arc_init): Add static; remove selection of default tune value, cleanup obsolete error messages. (arc_override_options): Make use of .def files for selecting the right cpu and option configurations. * config/arc/arc.h (stdbool.h): Include. (TARGET_CPU_DEFAULT): Define. (CPP_SPEC): Remove mcpu=NPS400 handling. (arc_cpu_to_as): Declare. (EXTRA_SPEC_FUNCTIONS): Define. (OPTION_DEFAULT_SPECS): Likewise. (ASM_DEFAULT): Remove. (ASM_SPEC): Use arc_cpu_to_as. (DRIVER_SELF_SPECS): Remove deprecated options. (arc_arc700, arc_arc600, arc_arc601, arc_arcem, arc_archs): Declare. (TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM) (TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_arc* variables. (MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT. * config/arc/arc.md (attr_cpu): Remove. * config/arc/arc.opt (arc_mpy_option): Make it target variable. (mno-mpy): Deprecate. (mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM) (mcpu=ARCHS): Remove. (mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc): Deprecate. (mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove. (arc_fpu): Use new defines. (arc_seen_options): New target variable. * config/arc/t-arc (driver-arc.o): New target. (arc-cpus, t-multilib, arc-tables.opt): Likewise. * config/arc/t-arc-newlib: Delete. * config/arc/t-arc-uClibc: Renamed to t-uClibc. * doc/invoke.texi (ARC): Update arc options. 
--- gcc/common/config/arc/arc-common.c | 162 - gcc/config.gcc | 47 + gcc/config/arc/arc-arch.h | 120 ++ gcc/config/arc/arc-arches.def | 35 +++ gcc/config/arc/arc-c.def | 4 + gcc/config/arc/arc-cpus.def| 47 + gcc/config/arc/arc-options.def | 69 + gcc/config/arc/arc-opts.h | 47 +++-- gcc/config/arc/arc-protos.h| 1 - gcc/config/arc/arc-tables.opt | 90 gcc/config/arc/arc.c | 186 ++--- gcc/config/arc/arc.h | 91 - gcc/config/arc/arc.md | 5 - gcc/config/arc/arc.opt | 109 ++-- gcc/config/arc/driver-arc.c| 80 +++ gcc/config/arc/genmultilib.awk | 204 + gcc/config/arc/genoptions.awk | 86 gcc/config/arc/t-arc | 19 gcc/config/arc/t-arc-newlib| 46 - gcc/config/arc/t-arc-uClibc| 20 gcc/config/arc/t-multilib | 51 ++ gcc/config/arc/t-uClibc| 20 gcc/doc/invoke.texi| 86 +--- 23 files changed, 1249 insertions(+), 376 deletions(-) create mode 100644 gcc/config/arc/arc-arch.h create mode 100644 gcc/config/arc/arc-arches.def create mode 100644 gcc/config/arc/arc-cpus.def create mode 100644 gcc/config/arc/arc-options.def create mode 100644 gcc/config/arc/arc-tables.opt create mode 100644 gcc/config/arc/driver-arc.c create mode 100644 gcc/config/arc/genmultilib.awk create mode 100644 gcc/config/arc/genoptions.awk delete mode 10064
Fix profile updating in loop peeling
Hi,
this patch fixes the profile updates in the loop peeling pass. First, it
correctly sets wont_exit, which can only be set when we know the number of
iterations tested by EXIT and this number is higher than the maximal number
of iterations (an unlikely case, which is usually removed by the VRP pass or
earlier by cunroll).

The second problem is that we determine the number of peelings as the number
of estimated iterations + 1. After peeling, we currently underflow when
updating the estimates, which causes them to be recomputed later to bogus
values.

The last change is the way we determine the profile for the new loop header.
We used to drop the loop frequency to 1/1000. This patch instead makes use of
the information on the remaining entry edges to the loop.

While working on this, I noticed that try_peel_loop has a tendency to iterate
and peel one loop many times. I will prepare a followup for that. Testcases
will also come once I commit the change enabling loop peeling at -O3 by using
likely estimates.

Bootstrapped/regtested x86_64-linux.

	* tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit
	for peeled copies; avoid underflow when updating estimates;
	correctly scale loop profile.

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 236874)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -970,7 +970,9 @@ try_peel_loop (struct loop *loop,
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
     return false;
 
-  /* Peel only innermost loops.  */
+  /* Peel only innermost loops.
+     While the code is perfectly capable of peeling non-innermost loops,
+     the heuristics would probably need some improvements.  */
   if (loop->inner)
     {
       if (dump_file)
@@ -1029,13 +1031,23 @@ try_peel_loop (struct loop *loop,
   /* Duplicate possibly eliminating the exits.  */
   initialize_original_copy_tables ();
   wont_exit = sbitmap_alloc (npeel + 1);
-  bitmap_ones (wont_exit);
-  bitmap_clear_bit (wont_exit, 0);
+  if (exit && niter
+      && TREE_CODE (niter) == INTEGER_CST
+      && wi::leu_p (npeel, wi::to_widest (niter)))
+    {
+      bitmap_ones (wont_exit);
+      if (wi::eq_p (wi::to_widest (niter), npeel))
+	bitmap_clear_bit (wont_exit, 0);
+    }
+  else
+    {
+      exit = NULL;
+      bitmap_clear (wont_exit);
+    }
   if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
					     npeel, wont_exit,
					     exit, &to_remove,
-					     DLTHE_FLAG_UPDATE_FREQ
-					     | DLTHE_FLAG_COMPLETTE_PEEL))
+					     DLTHE_FLAG_UPDATE_FREQ))
     {
       free_original_copy_tables ();
       free (wont_exit);
@@ -1053,14 +1065,48 @@ try_peel_loop (struct loop *loop,
       fprintf (dump_file, "Peeled loop %d, %i times.\n",
	       loop->num, (int) npeel);
     }
+  if (loop->any_estimate)
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+	loop->nb_iterations_estimate -= npeel;
+      else
+	loop->nb_iterations_estimate = 0;
+    }
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound -= npeel;
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+	loop->nb_iterations_upper_bound -= npeel;
+      else
+	loop->nb_iterations_upper_bound = 0;
+    }
   if (loop->any_likely_upper_bound)
-    loop->nb_iterations_likely_upper_bound -= npeel;
-  loop->nb_iterations_estimate = 0;
-  /* Make sure to mark loop cold so we do not try to peel it more.  */
-  scale_loop_profile (loop, 1, 0);
-  loop->header->count = 0;
+    {
+      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+	loop->nb_iterations_likely_upper_bound -= npeel;
+      else
+	{
+	  loop->any_estimate = true;
+	  loop->nb_iterations_estimate = 0;
+	  loop->nb_iterations_likely_upper_bound = 0;
+	}
+    }
+  gcov_type entry_count = 0;
+  int entry_freq = 0;
+
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    if (e->src != loop->latch)
+      {
+	entry_count += e->src->count;
+	entry_freq += e->src->frequency;
+	gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
+      }
+  int scale = 1;
+  if (loop->header->count)
+    scale = RDIV (entry_count * REG_BR_PROB_BASE, loop->header->count);
+  else if (loop->header->frequency)
+    scale = RDIV (entry_freq * REG_BR_PROB_BASE, loop->header->frequency);
+  scale_loop_profile (loop, scale, 0);
   return true;
 }
 
 /* Adds a canonical induction variable to LOOP if suitable.
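[To see what the new scaling computes, here is the update logic from the
patch restated as freestanding C. REG_BR_PROB_BASE and RDIV mirror GCC's
definitions (10000 and rounded division); the function itself is a sketch
for exposition, not GCC code.

#define REG_BR_PROB_BASE 10000
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

static long
peeled_header_scale (long entry_count, long header_count)
{
  /* After peeling, only the iterations surviving the peeled copies still
     reach the loop header, so the loop profile is scaled by entry/header
     instead of being forced to (almost) zero as before.  */
  if (header_count == 0)
    return REG_BR_PROB_BASE;	/* nothing to scale against */
  return RDIV (entry_count * REG_BR_PROB_BASE, header_count);
}
]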
Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns
2016-05-30 14:04 GMT+03:00 Richard Biener :
>
> The following patch removes the restriction on seeing a tree of stmts
> in vectorizer bool pattern detection (aka single-use). With this
> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
> and that compile-time hog can go (it's now enabled unconditionally for
> GCC 7).
>
> Instead the pattern detection code will now "unshare" the condition tree
> for each bool pattern root by means of adding all pattern stmts of the
> condition tree to its pattern def sequence (so we still get some
> unnecessary copying, worst-case quadratic rather than exponential).
>
> Ilya - I had to disable the
>
>       tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
>       if (mask_type
>           && expand_vec_cmp_expr_p (comp_vectype, mask_type))
>         return false;
>
> check you added to check_bool_pattern to get any coverage for bool
> patterns on x86_64. Doing that regresses
>
> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1 loops" 10
> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized 1 loops" 10
> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1 loops" 2
>
> so somehow bool patterns mess up things here (I didn't investigate).
> The final patch will enable the above path again, avoiding the regression.

Mask conversion patterns handle some cases bool patterns don't. So it's
expected we can't vectorize some loops if bool patterns are enforced.

Thanks,
Ilya

> Yuri - I suppose you have a larger set of testcases using OMP simd
> or other forced vectorization you added ifcvt_repair_bool_pattern for.
> I'd appreciate testing (and testcases if anything fails unexpectedly).
>
> Testing on other targets is of course appreciated as well.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
>
> Comments?
>
> I agree with Ilya elsewhere to remove bool patterns completely
> (as a first step making them apply to individual stmts). We do
> currently not support mixed cond/non-cond uses anyway, like
>
> _Bool a[64];
> unsigned char b[64];
>
> void foo (void)
> {
>   for (int i = 0; i < 64; ++i)
>     {
>       _Bool x = a[i] && b[i] < 10;
>       a[i] = x;
>     }
> }
>
> and stmt-local "patterns" can be added when vectorizing the stmt
> (like in the above case a tem = a[i] != 0 ? -1 : 0). Doing
> bool "promotion" optimally requires a better vectorizer IL
> (similar to placing of shuffles).
>
> Thanks,
> Richard.
Re: [SPARC] Support for --with-{cpu,tune}-{32,64} in sparc*-* targets
Hi Jose,

On 30/05/2016 12:03, jose.march...@oracle.com wrote:
> Tested in sparc64-linux-gnu, sparcv9-linux-gnu and sparc-sun-solaris2.11.
>
> 2016-05-25  Jose E. Marchesi
>
> 	* config.gcc (sparc*-*-*): Support cpu_32, cpu_64, tune_32 and
> 	tune_64.
> 	* doc/install.texi (--with-cpu-32, --with-cpu-64): Document
> 	support on SPARC.
> 	* config/sparc/linux64.h (OPTION_DEFAULT_SPECS): Add entries for
> 	cpu_32, cpu_64, tune_32 and tune_64.
> 	* config/sparc/sol2.h (OPTION_DEFAULT_SPECS): Likewise.

OK for mainline, thanks.

> Thanks.  I would need someone to install it, as I don't have commit
> access to the svn.

If nobody beats me to it, I can take care of that later today.

Cheers,
Paolo.
Re: [PATCH][2/3] Vectorize inductions that are live after the loop
On Fri, May 27, 2016 at 5:12 PM, Alan Hayward wrote: > > On 27/05/2016 12:41, "Richard Biener" wrote: > >>On Fri, May 27, 2016 at 11:09 AM, Alan Hayward >>wrote: >>> This patch is a reworking of the previous vectorize inductions that are >>> live >>> after the loop patch. >>> It now supports SLP and an optimisation has been moved to patch [3/3]. >>> >>> >>> Stmts which are live (ie: defined inside a loop and then used after the >>> loop) >>> are not currently supported by the vectorizer. In many cases >>> vectorization can >>> still occur because the SCEV cprop pass will hoist the definition of the >>> stmt >>> outside of the loop before the vectorizor pass. However, there are >>>various >>> cases SCEV cprop cannot hoist, for example: >>> for (i = 0; i < n; ++i) >>> { >>> ret = x[i]; >>> x[i] = i; >>> } >>>return i; >>> >>> Currently stmts are marked live using a bool, and the relevant state >>>using >>> an >>> enum. Both these states are propagated to the definition of all uses of >>>the >>> stmt. Also, a stmt can be live but not relevant. >>> >>> This patch vectorizes a live stmt definition normally within the loop >>>and >>> then >>> after the loop uses BIT_FIELD_REF to extract the final scalar value from >>> the >>> vector. >>> >>> This patch adds a new relevant state (vect_used_only_live) for when a >>>stmt >>> is >>> used only outside the loop. The relevant state is still propagated to >>>all >>> it's >>> uses, but the live bool is not (this ensures that >>> vectorizable_live_operation >>> is only called with stmts that really are live). >>> >>> Tested on x86 and aarch64. >> >>+ /* If STMT is a simple assignment and its inputs are invariant, then >>it can >>+ remain in place, unvectorized. The original last scalar value that >>it >>+ computes will be used. */ >>+ if (is_simple_and_all_uses_invariant (stmt, loop_vinfo)) >> { >> >>so we can't use STMT_VINFO_RELEVANT or so? I thought we somehow >>mark stmts we don't vectorize in the end. > > It’s probably worth making clear that this check exists in the current > GCC head - today vectorize_live_operation only supports the > simple+invariant > case and the SSE2 case. > My patch simply moved the simple+invariant code into the new function > is_simple_and_all_uses_invariant. Ah, didn't notice this. > Looking at this again, I think what we really care about is…. > *If the stmt is live but not relevant, we need to mark it to ensure it is > vectorised. > *If a stmt is simple+invariant then a live usage of it can either use the > scalar or vectorized version. > > So for a live stmt: > *If it is simple+invariant and not relevant, then it is more optimal to > use the > scalar version. > *If it is simple+invariant and relevant then it is more optimal to use the > vectorized version. > *If it is not simple+invariant then we must always use the vectorized > version. So, if ! relevant (aka not vectorized) use the scalar version. Otherwise use the vectorized version. The case we fail to handle is a live but not relevant computation derived from a vectorized stmt (the !invariant and !relevant case). This case should be handled by sinking the computation up to the vectorized stmt to the exit and handling the "last" live operation by using the vectorized def. In fact stmt sinking should have performed this sinking already (so a check and not vectorizing this case just in case that didn't happen is fine I think) > Therefore, the patch as it stands is correct but not optimal. 
In patch 3/3 > for > the code above (vectorize_live_operation) I can change the check to if not > relevant then assert that it is not simple+invariant and return true. Sounds good. > > >> >>+ lhs = (is_a (stmt)) ? gimple_phi_result (stmt) >>+ : gimple_get_lhs (stmt); >>+ lhs_type = TREE_TYPE (lhs); >>+ >>+ /* Find all uses of STMT outside the loop. */ >>+ auto_vec worklist; >>+ FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs) >>+{ >>+ basic_block bb = gimple_bb (use_stmt); >>+ >>+ if (!flow_bb_inside_loop_p (loop, bb)) >>+ worklist.safe_push (use_stmt); >> } >>+ gcc_assert (!worklist.is_empty ()); >> >>as we are in loop-closed SSA there should be exactly one such use...? > > Yes, I should change this to assert that worklist is of length 1. > >> >>+ /* Get the correct slp vectorized stmt. */ >>+ vec_oprnds.create (num_vec); >>+ vect_get_slp_vect_defs (slp_node, &vec_oprnds); >> >>As you look at the vectorized stmt you can directly use the >>SLP_TREE_VEC_STMTS >>array (the stmts lhs, of course), no need to export this function. > > Ok, I can change this. > >> >>The rest of the changes look ok to me. > > Does that include PATCH 1/3 ? I don't like how 1/3 ends up looking :/ So what was the alternative again? I looked into 1/3 and what it takes to remove the 'stmt' argument and instead pass in a vect_def_type. It's a bit twisted and just adding another argument (the loop_vinfo) doesn't help th
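[Stepping back from the review mechanics, the transform being enabled is
easy to picture in plain C. Below is a hand-vectorized rendering of the
kind of loop quoted at the top of the thread, written with GNU vector
extensions; the lane extract after the loop is what the vectorizer's
BIT_FIELD_REF performs in gimple. A sketch under stated assumptions (n a
multiple of 4, x suitably aligned), not the vectorizer's actual output.

typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar original: ret = x[i] is live after the loop.  */
int foo_scalar (int *x, int n)
{
  int ret = 0;
  for (int i = 0; i < n; ++i)
    {
      ret = x[i];
      x[i] = i;
    }
  return ret;
}

/* Hand-vectorized equivalent: the live scalar is recovered from the
   last vector value with a lane extract.  */
int foo_vector (int *x, int n)
{
  v4si last = { 0, 0, 0, 0 };
  v4si iv = { 0, 1, 2, 3 };
  for (int i = 0; i < n; i += 4)
    {
      last = *(v4si *) (x + i);	/* vectorized load of x[i..i+3] */
      *(v4si *) (x + i) = iv;	/* vectorized store of i..i+3 */
      iv += 4;
    }
  return n ? last[3] : 0;	/* the BIT_FIELD_REF extraction */
}
]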
Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount
On 5/30/16 03:18, Mike Stump wrote:
> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:
>>
>> r10 may also be used as a parameter register for nested functions, so we
>> need to save it before calling mcount.
>
> mcount can have a special abi where it preserves more registers than one
> would otherwise expect.  I'm wondering if you know what registers it saves
> or doesn't touch?  Does this fix any bug found by running tests, or just
> by inspection?

It is about Bug 71331, which I reported yesterday.

For a nested function, r10 is treated as the stack pointer for parameters.
mcount really does save r10, but tilegx also uses r10 to save lr, which
causes this issue ("move r10, lr; jal __mcount").

What I have done is just what gcc on x86 does ("push r10; callq mcount").

Thanks.
--
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.
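[The failure mode is easiest to see with a concrete nested-function test.
Below is a distilled GNU C example modeled loosely on the testcase cited
later in the thread (gcc.dg/nested-func-4.c); the function names are made
up. Compiled with -pg, the prologue call to __mcount must not clobber the
register carrying the nested call's frame/argument pointer, which per the
report is r10 on tilegx.

int foo (int n)			/* call with n > 0 */
{
  int a[n];			/* VLA forces a nested-function frame */

  int bar (int i)		/* nested function using foo's frame */
  {
    return a[i] + n;
  }

  for (int i = 0; i < n; i++)
    a[i] = i;
  return bar (n - 1);
}
]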
RE: [PATCH] x86 interrupt attribute patch [2/2]
Hi, Here is the fixed version of the patch. Ok for trunk? -Original Message- From: Sandra Loosemore [mailto:san...@codesourcery.com] Sent: Tuesday, May 10, 2016 11:02 PM To: Koval, Julia ; gcc-patches@gcc.gnu.org Cc: Lu, Hongjiu ; vaalfr...@gmail.com; ubiz...@gmail.com; l...@redhat.com; Zamyatin, Igor Subject: Re: [PATCH] x86 interrupt attribute patch [2/2] On 04/20/2016 07:42 AM, Koval, Julia wrote: > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index > a5a8b23..82de5bf 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -5263,6 +5263,83 @@ On x86-32 targets, the @code{stdcall} attribute > causes the compiler to assume that the called function pops off the > stack space used to pass arguments, unless it takes a variable number of > arguments. > > +@item no_caller_saved_registers > +@cindex @code{no_caller_saved_registers} function attribute, x86 Use > +this attribute to indicate that the specified function has no > +caller-saved registers. That is, all registers are callee-saved. > +The compiler generates proper function entry and exit sequences to > +save and restore any modified registers, except for the EFLAGS > +register. If the compiler generates MPX, SSE, MMX or x87 > +instructions in a function with @code{no_caller_saved_registers} > +attribute or functions called from a function with > +@code{no_caller_saved_registers} attribute may contain MPX, SSE, MMX > +or x87 instructions, the compiler must save and restore the corresponding > state. I cannot parse the last sentence in this paragraph. How can the compiler know whether called functions may contain those instructions? Plus, talking about what the compiler must do seems too implementor-speaky for user documentation. Maybe you mean something like "The compiler also saves and restores state associated with MPX, SSE, MMX, and x87 instructions." ? I also think the documentation needs to give some hint about why a user would want to put this attribute on a function. > + > +@item interrupt > +@cindex @code{interrupt} function attribute, x86 Use this attribute > +to indicate that the specified function is an interrupt handler. The > +compiler generates function entry and exit sequences suitable for use > +in an interrupt handler when this attribute is present. The > +@code{IRET} instruction, instead of the @code{RET} instruction, is > +used to return from interrupt handlers. All registers, except for > +the EFLAGS register which is restored by the @code{IRET} instruction, > +are preserved by the compiler. If the compiler generates MPX, SSE, > +MMX or x87 instructions in an interrupt handler, or functions called > +from an interrupt handler may contain MPX, SSE, MMX or x87 > +instructions, the compiler must save and restore the corresponding > +state. Similar problems here. From the further discussion that follows, it appears that you can use the "interrupt" attribute on exception handlers as well, but the paragraph above only mentions interrupt handlers. > + > +Since the direction flag in the FLAGS register in interrupt handlers > +is undetermined, cld instruction must be emitted in function prologue > +if rep string instructions are used in interrupt handler or interrupt > +handler isn't a leaf function. Again, this sounds like implementor-speak, and there are grammatical errors (noun/verb disagreement, missing articles). Do users of this attribute need to know what instructions the compiler is emitting? We already say above that it causes GCC to generate suitable entry and exit sequences. 
> + > +Any interruptible-without-stack-switch code must be compiled with > +@option{-mno-red-zone} since interrupt handlers can and will, because > +of the hardware design, touch the red zone. > + > +An interrupt handler must be declared with a mandatory pointer > +argument: > + > +@smallexample > +struct interrupt_frame; > + > +__attribute__ ((interrupt)) > +void > +f (struct interrupt_frame *frame) > +@{ > +@} > +@end smallexample > + > +and user must properly define the structure the pointer pointing to. "user" == "you" in the GCC user manual. How do you "properly define" this structure, or is that a stupid question? (I know very little about x86, maybe this is obvious to experts.) > + > +The exception handler is very similar to the interrupt handler with > +a different mandatory function signature: > + > +@smallexample > +#ifdef __x86_64__ > +typedef unsigned long long int uword_t; > +#else > +typedef unsigned int uword_t; > +#endif > + > +struct interrupt_frame; > + > +__attribute__ ((interrupt)) > +void > +f (struct interrupt_frame *frame, uword_t error_code) > +@{ > + ... > +@} > +@end smallexample > + > +and compiler pops the error code off the stack before the @code{IRET} > +instruction. > + > +The exception handler should only be used for exceptions which push an > +error code and all other exceptions must use the interrupt handler. > +The system will cr
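[For orientation, here is a minimal usage sketch of the two attributes
whose documentation is being reviewed. The handler and helper names are
invented, and per the quoted docs such code should be built with
-mno-red-zone; this is an illustration, not part of the patch.

struct interrupt_frame;

__attribute__ ((no_caller_saved_registers))
void log_event (void)
{
  /* May be called from the handler without forcing the caller to spill
     all call-clobbered registers at the call site.  */
}

__attribute__ ((interrupt))
void my_handler (struct interrupt_frame *frame)
{
  (void) frame;
  log_event ();
  /* The compiler returns with IRET instead of RET here.  */
}
]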
[PATCH, i386, AVX-512] Fix PR target/71346.
Hello, Patch limits constraint for scalar operand in split to AVX-512VL. Boostrap/regtest in progress for x86_64/ia32. I'll check it in if pass. PR target/71346. gcc/ * config/i386/sse.md (define_insn_and_split "*vec_extractv4sf_0"): Use `Yv' for scalar operand. testsuite/ * gcc.target/i386/prr71346.c: New test. -- Thanks, K commit 6c0021bea7a5be8d9a10ef3f2fb30c1228f53d48 Author: Kirill Yukhin Date: Mon May 30 16:31:28 2016 +0300 AVX-512. Fix PR target/71346. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b348f2d..1267897 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -6837,7 +6837,7 @@ "operands[1] = gen_lowpart (SFmode, operands[1]);") (define_insn_and_split "*sse4_1_extractps" - [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v") + [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv") (vec_select:SF (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v") (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n,n")])))] diff --git a/gcc/testsuite/gcc.target/i386/pr71346.c b/gcc/testsuite/gcc.target/i386/pr71346.c new file mode 100644 index 000..0a15869 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr71346.c @@ -0,0 +1,25 @@ +/* PR target/71346 */ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=knl" } */ + +typedef int rvec[3]; +int a; +float b, c, d, g; +rvec *e, *f; +void fn2(float h, float g); + +void +fn1() +{ + float h; + for (; a; a++) { +h += e[a][0] * f[a][0]; +b += e[a][0] * f[a][1]; +c += e[a][2] * f[a][0]; +d += e[a][2] * f[a][1]; +g += e[a][2] * f[a][2]; + } + fn2(h, g); +} + +/* { dg-final { scan-assembler-not "vshufps\[ \\t\]+\[^\n\]*%\xmm(?:1\[6-9\]|\[2-3\]\[0-9\])" } } */
Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
On 30/05/16 12:56, Richard Biener wrote: On Mon, 30 May 2016, Tom de Vries wrote: >On 30/05/16 11:46, Richard Biener wrote: > > >This patch fixes the assert conservatively by aborting graphite code > > > >generation when encountering a phi with more than two arguments in > > > >copy_bb_and_scalar_dependences. > > > > > > > >Bootstrapped and reg-tested on x86_64. > > > > > > > >OK for trunk, 6 branch? > > >Did you check if simply returning false from bb_contains_loop_phi_nodes > >instead of asserting works? The caller has a 'else' that is supposed > >to handle condition PHIs. After all it already handles one predecessor > >specially ... Thus > > > >if (EDGE_COUNT (bb->preds) != 2) > > return false; > > > >should work here. > >Unfortunately, that doesn't work. We run into another assert in >copy_cond_phi_nodes: >... > /* Cond phi nodes should have exactly two arguments. */ > gcc_assert (2 == EDGE_COUNT (bb->preds)); >... Hah. So can we still do my suggested change and bail out conservatively in Cond PHI node handling instead? Because the PHI node is clearly _not_ a loop header PHI and the cond phi assert is also a latent bug. Agreed. Updated as attached. OK for trunk, 6 branch? Thanks, - Tom Handle 3-arg phi in copy_bb_and_scalar_dependences 2016-05-30 Tom de Vries PR tree-optimization/69068 * graphite-isl-ast-to-gimple.c (copy_bb_and_scalar_dependences): Handle phis with more than two args. * gcc.dg/graphite/pr69068.c: New test. --- gcc/graphite-isl-ast-to-gimple.c| 10 ++ gcc/testsuite/gcc.dg/graphite/pr69068.c | 14 ++ 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index ff1d91f..b4ee9a9 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1075,7 +1075,8 @@ bb_contains_loop_close_phi_nodes (basic_block bb) static bool bb_contains_loop_phi_nodes (basic_block bb) { - gcc_assert (EDGE_COUNT (bb->preds) <= 2); + if (EDGE_COUNT (bb->preds) != 2) +return false; if (bb->preds->length () == 1) return false; @@ -2480,13 +2481,14 @@ copy_cond_phi_nodes (basic_block bb, basic_block new_bb, vec iv_map) gcc_assert (!bb_contains_loop_close_phi_nodes (bb)); + /* TODO: Handle cond phi nodes with more than 2 predecessors. */ + if (EDGE_COUNT (bb->preds) != 2) +return false; + if (dump_file) fprintf (dump_file, "[codegen] copying cond phi nodes in bb_%d.\n", new_bb->index); - /* Cond phi nodes should have exactly two arguments. */ - gcc_assert (2 == EDGE_COUNT (bb->preds)); - for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi)) { diff --git a/gcc/testsuite/gcc.dg/graphite/pr69068.c b/gcc/testsuite/gcc.dg/graphite/pr69068.c new file mode 100644 index 000..0abea06 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr69068.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fgraphite-identity" } */ + +int qo; +int zh[2]; + +void +td (void) +{ + int ly, en; + for (ly = 0; ly < 2; ++ly) +for (en = 0; en < 2; ++en) + zh[en] = ((qo == 0) || (((qo * 2) != 0))) ? 1 : -1; +}
Re: [PATCH, PR69068] Handle 3-arg phi in copy_bb_and_scalar_dependences
On Mon, 30 May 2016, Tom de Vries wrote: > On 30/05/16 12:56, Richard Biener wrote: > > On Mon, 30 May 2016, Tom de Vries wrote: > > > > > >On 30/05/16 11:46, Richard Biener wrote: > > > > > > > >This patch fixes the assert conservatively by aborting graphite > > > > > code > > > > > > > > > >generation when encountering a phi with more than two > > > > > > arguments in > > > > > > > > > >copy_bb_and_scalar_dependences. > > > > > > > > > > > > > > > > > > > >Bootstrapped and reg-tested on x86_64. > > > > > > > > > > > > > > > > > > > >OK for trunk, 6 branch? > > > > > > > > > >Did you check if simply returning false from > > > > bb_contains_loop_phi_nodes > > > > > >instead of asserting works? The caller has a 'else' that is supposed > > > > > >to handle condition PHIs. After all it already handles one > > > > predecessor > > > > > >specially ... Thus > > > > > > > > > > > >if (EDGE_COUNT (bb->preds) != 2) > > > > > > return false; > > > > > > > > > > > >should work here. > > > > > > > >Unfortunately, that doesn't work. We run into another assert in > > > >copy_cond_phi_nodes: > > > >... > > > > /* Cond phi nodes should have exactly two arguments. */ > > > > gcc_assert (2 == EDGE_COUNT (bb->preds)); > > > >... > > Hah. So can we still do my suggested change and bail out conservatively > > in Cond PHI node handling instead? Because the PHI node is clearly > > _not_ a loop header PHI and the cond phi assert is also a latent bug. > > > > Agreed. Updated as attached. > > OK for trunk, 6 branch? Ok with the now redundant 2nd check removed @@ -1075,7 +1075,8 @@ bb_contains_loop_close_phi_nodes (basic_block bb) static bool bb_contains_loop_phi_nodes (basic_block bb) { - gcc_assert (EDGE_COUNT (bb->preds) <= 2); + if (EDGE_COUNT (bb->preds) != 2) +return false; if (bb->preds->length () == 1) return false; ^^ Thanks, Richard. > Thanks, > - Tom > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, i386, AVX-512] Fix PR target/71346.
On Mon, May 30, 2016 at 04:46:50PM +0300, Kirill Yukhin wrote: > Hello, > Patch limits constraint for scalar operand in split to AVX-512VL. > > Boostrap/regtest in progress for x86_64/ia32. > > I'll check it in if pass. > > PR target/71346. > gcc/ > * config/i386/sse.md (define_insn_and_split "*vec_extractv4sf_0"): Use > `Yv' for scalar operand. > testsuite/ > * gcc.target/i386/prr71346.c: New test. Too many `r' chars above. LGTM otherwise and sorry for introducing this in r235970. > commit 6c0021bea7a5be8d9a10ef3f2fb30c1228f53d48 > Author: Kirill Yukhin > Date: Mon May 30 16:31:28 2016 +0300 > > AVX-512. Fix PR target/71346. > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index b348f2d..1267897 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -6837,7 +6837,7 @@ >"operands[1] = gen_lowpart (SFmode, operands[1]);") > > (define_insn_and_split "*sse4_1_extractps" > - [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v") > + [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv") > (vec_select:SF > (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v") > (parallel [(match_operand:SI 2 "const_0_to_3_operand" > "n,n,n,n,n")])))] > diff --git a/gcc/testsuite/gcc.target/i386/pr71346.c > b/gcc/testsuite/gcc.target/i386/pr71346.c > new file mode 100644 > index 000..0a15869 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr71346.c > @@ -0,0 +1,25 @@ > +/* PR target/71346 */ > +/* { dg-do compile { target { ! ia32 } } } */ > +/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=knl" } */ > + > +typedef int rvec[3]; > +int a; > +float b, c, d, g; > +rvec *e, *f; > +void fn2(float h, float g); > + > +void > +fn1() > +{ > + float h; > + for (; a; a++) { > +h += e[a][0] * f[a][0]; > +b += e[a][0] * f[a][1]; > +c += e[a][2] * f[a][0]; > +d += e[a][2] * f[a][1]; > +g += e[a][2] * f[a][2]; > + } > + fn2(h, g); > +} > + > +/* { dg-final { scan-assembler-not "vshufps\[ > \\t\]+\[^\n\]*%\xmm(?:1\[6-9\]|\[2-3\]\[0-9\])" } } */ Jakub
Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount
On 5/30/16 21:32, Chen Gang wrote:
> On 5/30/16 03:18, Mike Stump wrote:
>> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:
>>>
>>> r10 may also be used as a parameter register for nested functions, so we
>>> need to save it before calling mcount.
>>
>> mcount can have a special abi where it preserves more registers than one
>> would otherwise expect.  I'm wondering if you know what registers it saves
>> or doesn't touch?  Does this fix any bug found by running tests, or just
>> by inspection?
>
> It is about Bug 71331, which I reported yesterday.
>
> For a nested function, r10 is treated as the stack pointer for parameters.
> mcount really does save r10, but tilegx also uses r10 to save lr, which
> causes this issue ("move r10, lr; jal __mcount").
>
> What I have done is just what gcc on x86 does ("push r10; callq mcount").

After this patch, nested-func-4.c passes compilation and runs correctly
under the tilegx qemu. :-)

Thanks.
--
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.
Re: [C++ Patch] PR 71109 ("Misleading diagnostic message with 'virtual' used in out-of-line definitions of class template member functions")
OK. On Sun, May 29, 2016 at 6:49 PM, Paolo Carlini wrote: > Hi, > > submitter noticed that for wrong uses of 'virtual' outside of template > classes (B in the testcase) vs plain classes (A) we wrongly emit the > "templates may not be %" error message. Simply checking > current_class_type seems enough to solve the problem. Case C in the extended > testcase double checks that we still give the "templates may not be > %" error message when appropriate. Tested x86_64-linux. > > Thanks, > Paolo. > > //
[PATCH, OpenACC] Make reduction arguments addressable
Hi,
a previous patch of Cesar's made the middle-end OMP lowering automatically
create and insert a tofrom (i.e. present_or_copy) map for parallel
reductions. This allowed the user to omit explicit clauses for copying out
the reduction result, but because reduction arguments are not marked
addressable, async does not work as expected, i.e. the asynchronous
copy-out results are not used in the compiler-generated code.

This patch fixes this in the front ends. I've tested the patch without new
regressions; it also fixes some C++ OpenACC tests that regressed after my
last OpenACC async patch.

Is this okay for trunk?

Thanks,
Chung-Lin

2016-05-30  Chung-Lin Tang

	c/
	* c-typeck.c (c_finish_omp_clauses): Mark OpenACC reduction
	arguments as addressable.

	cp/
	* semantics.c (finish_omp_clauses): Mark OpenACC reduction
	arguments as addressable.

	fortran/
	* trans-openmp.c (gfc_trans_oacc_construct): Mark OpenACC reduction
	arguments as addressable.
	(gfc_trans_oacc_combined_directive): Likewise.

Index: c/c-typeck.c
===================================================================
--- c/c-typeck.c	(revision 236845)
+++ c/c-typeck.c	(working copy)
@@ -12575,6 +12575,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_reg
	      remove = true;
	      break;
	    }
+	  if (ort & C_ORT_ACC)
+	    c_mark_addressable (t);
	  type = TREE_TYPE (t);
	  if (TREE_CODE (t) == MEM_REF)
	    type = TREE_TYPE (type);
Index: cp/semantics.c
===================================================================
--- cp/semantics.c	(revision 236845)
+++ cp/semantics.c	(working copy)
@@ -5827,6 +5827,8 @@ finish_omp_clauses (tree clauses, enum c_omp_regio
	      t = n;
	      goto check_dup_generic_t;
	    }
+	  if (ort & C_ORT_ACC)
+	    cxx_mark_addressable (t);
	  goto check_dup_generic;
	case OMP_CLAUSE_COPYPRIVATE:
	  copyprivate_seen = true;
Index: fortran/trans-openmp.c
===================================================================
--- fortran/trans-openmp.c	(revision 236845)
+++ fortran/trans-openmp.c	(working copy)
@@ -2704,6 +2704,10 @@ gfc_trans_oacc_construct (gfc_code *code)
   gfc_start_block (&block);
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
					code->loc);
+  for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	&& DECL_P (OMP_CLAUSE_DECL (c)))
+      TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
   stmt = gfc_trans_omp_code (code->block->next, true);
   stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
		     oacc_clauses);
@@ -3501,6 +3505,10 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
       construct_clauses.lists[OMP_LIST_REDUCTION] = NULL;
       oacc_clauses = gfc_trans_omp_clauses (&block, &construct_clauses,
					    code->loc);
+      for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	    && DECL_P (OMP_CLAUSE_DECL (c)))
+	  TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
     }
   if (!loop_clauses.seq)
     pblock = &block;
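[A small OpenACC example of the situation being fixed, constructed for
illustration rather than taken from the patch: with async, the reduction
copy-out completes only at the wait, so the reduction variable has to be
addressable for the copied-back value to be the one the continuation code
reads.

float dot (const float *a, const float *b, int n)
{
  float sum = 0.0f;

  /* The implicit tofrom map of 'sum' is copied back asynchronously.  */
#pragma acc parallel loop reduction(+:sum) async(1)
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];

#pragma acc wait(1)
  return sum;		/* must observe the asynchronous copy-out */
}
]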
[PATCH, testsuite]: Fix clear_non_sret_int_hardware_registers define in gcc.target/i386/iamcu/args.h
Hello! 2016-05-30 Uros Bizjak * gcc.target/i386/iamcu/args.h (clear_non_sret_int_hardware_registers): Use correct register when clearing %edx. Regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline and gcc-6 branch. Uros. Index: gcc.target/i386/iamcu/args.h === --- gcc.target/i386/iamcu/args.h(revision 236882) +++ gcc.target/i386/iamcu/args.h(working copy) @@ -30,7 +30,7 @@ /* Clear all scratch integer registers, excluding the one used to return aggregate. */ #define clear_non_sret_int_hardware_registers \ - asm __volatile__ ("xor %%edx, %%ebx\n\t" \ + asm __volatile__ ("xor %%edx, %%edx\n\t" \ "xor %%ecx, %%ecx\n\t" \ ::: "edx", "ecx");
Re: [PATCH] match.pd: optimize unsigned mul overflow check
On Sun, 29 May 2016, Marc Glisse wrote: > On Sat, 28 May 2016, Alexander Monakov wrote: > > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether > > 'A*B' > > overflows (or 'B && A > -1 / B' if B may be zero). Let's optimize it to an > > invocation of __builtin_mul_overflow to avoid the divide operation. > [snip] > Any plan on optimizing the 'B && ovf' form? Not in the immediate future; I don't have a good approach in mind to do that. (if I get back to this topic I'd try to CSE the product with the IFN result) Thanks for the discussion and help -- I've applied the patch. Alexander
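[As a concrete illustration of the committed transform (not part of the
patch itself): the two predicates below agree for all unsigned a and b,
and the second form avoids the division entirely.

/* Division-based overflow check, the form the match.pd rule recognizes.  */
int overflows_div (unsigned a, unsigned b)
{
  return b && a > -1U / b;
}

/* What it is rewritten to, expressed via the builtin that maps onto
   IFN_MUL_OVERFLOW.  */
int overflows_mul (unsigned a, unsigned b)
{
  unsigned prod;
  return __builtin_mul_overflow (a, b, &prod);
}
]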
Re: Enable loop peeling at -O3
Hi,
this is the version of the patch I intend to commit after re-testing on
x86_64-linux with loop peeling enabled at -O3. It drops -fpeel-all-loops,
adds logic to not peel loops multiple times, and fixes profile updating.

Bootstrapped/regtested x86_64-linux

Honza

	* doc/invoke.texi (-fpeel-loops,-O3): Update documentation.
	* opts.c (default_options): Enable peel loops at -O3.
	* tree-ssa-loop-ivcanon.c (peeled_loops): New static var.
	(try_peel_loop): Do not re-peel already peeled loops;
	use likely upper bounds; fix profile updating.
	(pass_complete_unroll::execute): Initialize peeled_loops.

	* gcc.dg/tree-ssa/peel1.c: New testcase.
	* gcc.dg/tree-ssa/peel2.c: New testcase.
	* gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling.
	* gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 236873)
+++ doc/invoke.texi	(working copy)
@@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o
 @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
 @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
 @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-ftree-partial-pre}, @option{-fpeel-loops}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -8661,10 +8662,11 @@ the loop is entered.  This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback).  It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis).  It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and/or @option{-fprofile-use}.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===================================================================
--- opts.c	(revision 236873)
+++ opts.c	(working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 3 times." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpeel-all-loops -fdump-tree-cunroll-details --param max-peel-times=16 --param max-peeled-insns=100" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a[i]++;
+}
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 236873)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ ... @@
 static vec<loop_p> loops_to_unloop;
 static vec<int> loops_to_unloop_nunroll;
+/* Stores loops that has been peeled.  */
+static bitmap peeled_loops;
 
 /* Cancel all fully unrolled loops by putting __builtin_unreachable
    on the latch edge.
@@ -962,14 +964,16 @@ try_peel_loop (struct loop *loop,
   vec<edge> to_remove = vNULL;
   edge e;
 
-  /* If the iteration bound is known and large, then we can safely eliminate
-     the check in peeled copies.  */
-  if (TREE_CODE (niter) != INTEGER_CST)
-    exit = NULL;
-
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
     return false;
 
+  if (bitmap_bit_p (peeled_loops, loop->num))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: loop is already peeled\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements.  */
@@ -990,6 +994,8 @@ try_peel_loop (struct loop *loop,
   /* Check if there is an estimate on the number of iterations.  */
   npeel = estimated_loop_iterations_int (loop);
   if (npeel < 0)
+    npeel = likely_max_loop_iterations_int (loop);
+  if (npeel < 0)
     {
       if (dump_file)
	fprintf (dump_file, "Not peeling: number of iteration
[C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")
Hi, this diagnostic bug report is about an incorrect column (normally the final closed parenthesis) for error messages emitted by unqualified_fn_lookup_error. The substance of the fix is obvious - pass an appropriate column information to the function - and it handles correctly all the instances I could find in the C++ testsuite, but there are a few implementation details that certainly could be different. Note that I also tried to be quite conservative, thus actively avoid UNKNOWN_LOCATION to ever get through. Eventually I'm proposing to change unqualified_fn_lookup_error to take a cp_expr instead of my initial idea of adding a location_t parameter: this way the call from cp_parser_postfix_expression - which accounts for about a third of the locations fixed in the testsuite - is automatically handled and only the call from perform_koenig_lookup needs a little adjustment, to wrap the location in a cp_expr together with the identifier. But admittedly I didn't follow in detail the discussion which led to the introduction of cp_expr, I'm not sure this kind of wrapping counts as an "appropriate" use. Tested x86_64-linux. Thanks, Paolo. // /cp 2016-05-30 Paolo Carlini PR c++/71238 * lex.c (unqualified_name_lookup_error): Take a location too. (unqualified_fn_lookup_error): Take a cp_expr. * cp-tree.h (unqualified_name_lookup_error, unqualified_fn_lookup_error): Adjust declarations. * semantics.c (perform_koenig_lookup): Adjust unqualified_fn_lookup_error call, pass the location of the identifier too as part of a cp_expr. /testsuite 2016-05-30 Paolo Carlini PR c++/71238 * g++.dg/parse/pr71238.C: New. * g++.dg/concepts/friend1.C: Test column numbers too. * g++.dg/cpp0x/initlist31.C: Likewise. * g++.dg/cpp0x/pr51420.C: Likewise. * g++.dg/cpp0x/udlit-declare-neg.C: Likewise. * g++.dg/cpp0x/udlit-member-neg.C: Likewise. * g++.dg/ext/builtin3.C: Likewise. * g++.dg/lookup/friend12.C: Likewise. * g++.dg/lookup/friend7.C: Likewise. * g++.dg/lookup/koenig1.C: Likewise. * g++.dg/lookup/koenig5.C: Likewise. * g++.dg/lookup/used-before-declaration.C: Likewise. * g++.dg/overload/koenig1.C: Likewise. * g++.dg/template/crash65.C: Likewise. * g++.dg/template/friend57.C: Likewise. * g++.dg/warn/Wshadow-5.C: Likewise. * g++.dg/warn/Wunused-8.C: Likewise. * g++.old-deja/g++.bugs/900211_01.C: Likewise. * g++.old-deja/g++.jason/lineno5.C: Likewise. * g++.old-deja/g++.jason/member.C: Likewise. * g++.old-deja/g++.jason/report.C: Likewise. * g++.old-deja/g++.jason/scoping12.C: Likewise. * g++.old-deja/g++.law/visibility20.C: Likewise. * g++.old-deja/g++.ns/koenig5.C: Likewise. * g++.old-deja/g++.other/static5.C: Likewise. * g++.old-deja/g++.pt/overload2.C: Likewise. Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 236880) +++ cp/cp-tree.h(working copy) @@ -5974,8 +5974,9 @@ extern tree build_vtbl_address (t extern void cxx_dup_lang_specific_decl (tree); extern void yyungetc (int, int); -extern tree unqualified_name_lookup_error (tree); -extern tree unqualified_fn_lookup_error(tree); +extern tree unqualified_name_lookup_error (tree, +location_t = UNKNOWN_LOCATION); +extern tree unqualified_fn_lookup_error(cp_expr); extern tree build_lang_decl(enum tree_code, tree, tree); extern tree build_lang_decl_loc(location_t, enum tree_code, tree, tree); extern void retrofit_lang_decl (tree); Index: cp/lex.c === --- cp/lex.c(revision 236880) +++ cp/lex.c(working copy) @@ -443,19 +443,22 @@ handle_pragma_java_exceptions (cpp_reader* /*dfile IDENTIFIER_NODE) failed. Returns the ERROR_MARK_NODE. 
*/ tree -unqualified_name_lookup_error (tree name) +unqualified_name_lookup_error (tree name, location_t loc) { + if (loc == UNKNOWN_LOCATION) +loc = location_of (name); + if (IDENTIFIER_OPNAME_P (name)) { if (name != ansi_opname (ERROR_MARK)) - error ("%qD not defined", name); + error_at (loc, "%qD not defined", name); } else { if (!objc_diagnose_private_ivar (name)) { - error ("%qD was not declared in this scope", name); - suggest_alternatives_for (location_of (name), name); + error_at (loc, "%qD was not declared in this scope", name); + suggest_alternatives_for (loc, name); } /* Prevent repeated error messages by creating a VAR_DECL with this NAME in t
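For readers following along: the testsuite can check the reported column by prefixing the dg-error pattern with "column:". A minimal sketch of the kind of test this fix enables (the actual contents of g++.dg/parse/pr71238.C may differ; the function name and message here are illustrative):

// { dg-do compile }
void f ()
{
  g (42);  // { dg-error "3:'g' was not declared in this scope" }
}

Before the fix the caret sat on the closing parenthesis; with the identifier's location passed down to unqualified_fn_lookup_error, the column matches the name being looked up.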
Re: [PATCH 2/3] Add profiling support for IVOPTS
On 05/24/2016 12:11 PM, Bin.Cheng wrote: > Hi, > Could you please factor out this as a function and remove the goto > statements? Okay with this change if no fallout in benchmarks you > run. > > Thanks, > bin Hi. Thanks for the review, I've just verified that it does not introduce any regression on SPECv6 and it improves couple of SPEC2006 benchmarks w/ PGO. I'm going to install the patch and make a control run of benchmarks. Thanks Martin >From 2991622862dd934e464f542e9e58270bf0088544 Mon Sep 17 00:00:00 2001 From: marxin Date: Tue, 17 May 2016 15:22:43 +0200 Subject: [PATCH 1/5] Add profiling support for IVOPTS gcc/ChangeLog: 2016-05-17 Martin Liska * tree-ssa-loop-ivopts.c (get_computation_cost_at): Scale computed costs by frequency of BB they belong to. (get_scaled_computation_cost_at): New function. --- gcc/tree-ssa-loop-ivopts.c | 62 ++ 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index d770ec9..a541ef8 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -4794,7 +4794,33 @@ get_loop_invariant_expr (struct ivopts_data *data, tree ubase, return record_inv_expr (data, expr); } +/* Scale (multiply) the computed COST (except scratch part that should be + hoisted out a loop) by header->frequency / AT->frequency, + which makes expected cost more accurate. */ +static comp_cost +get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand, +comp_cost cost) +{ + int loop_freq = data->current_loop->header->frequency; + int bb_freq = at->bb->frequency; + if (loop_freq != 0) + { + gcc_assert (cost.scratch <= cost.cost); + int scaled_cost + = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Scaling iv_use based on cand %d " + "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n", + cand->id, 1.0f * bb_freq / loop_freq, cost.cost, + cost.scratch, scaled_cost, bb_freq, loop_freq); + + cost.cost = scaled_cost; + } + + return cost; +} /* Determines the cost of the computation by that USE is expressed from induction variable CAND. If ADDRESS_P is true, we just need @@ -4982,18 +5008,21 @@ get_computation_cost_at (struct ivopts_data *data, (symbol/var1/const parts may be omitted). If we are looking for an address, find the cost of addressing this. */ if (address_p) -return cost + get_address_cost (symbol_present, var_present, -offset, ratio, cstepi, -mem_mode, -TYPE_ADDR_SPACE (TREE_TYPE (utype)), -speed, stmt_is_after_inc, can_autoinc); +{ + cost += get_address_cost (symbol_present, var_present, +offset, ratio, cstepi, +mem_mode, +TYPE_ADDR_SPACE (TREE_TYPE (utype)), +speed, stmt_is_after_inc, can_autoinc); + return get_scaled_computation_cost_at (data, at, cand, cost); +} /* Otherwise estimate the costs for computing the expression. */ if (!symbol_present && !var_present && !offset) { if (ratio != 1) cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed); - return cost; + return get_scaled_computation_cost_at (data, at, cand, cost); } /* Symbol + offset should be compile-time computable so consider that they @@ -5012,24 +5041,25 @@ get_computation_cost_at (struct ivopts_data *data, aratio = ratio > 0 ? ratio : -ratio; if (aratio != 1) cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed); - return cost; + + return get_scaled_computation_cost_at (data, at, cand, cost); fallback: if (can_autoinc) *can_autoinc = false; - { -/* Just get the expression, expand it and measure the cost. 
*/ -tree comp = get_computation_at (data->current_loop, use, cand, at); + /* Just get the expression, expand it and measure the cost. */ + tree comp = get_computation_at (data->current_loop, use, cand, at); -if (!comp) - return infinite_cost; + if (!comp) +return infinite_cost; + + if (address_p) +comp = build_simple_mem_ref (comp); -if (address_p) - comp = build_simple_mem_ref (comp); + cost = comp_cost (computation_cost (comp, speed), 0); -return comp_cost (computation_cost (comp, speed), 0); - } + return get_scaled_computation_cost_at (data, at, cand, cost); } /* Determines the cost of the computation by that USE is expressed -- 2.8.3
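To make the new scaling concrete in isolation, here is a minimal standalone sketch with plain ints standing in for GCC's comp_cost (illustrative only, not the committed code):

static int
scale_cost (int cost, int scratch, int bb_freq, int loop_freq)
{
  /* The scratch part is expected to be hoisted out of the loop, so
     only the remainder is scaled by the BB-to-header frequency ratio;
     without usable frequencies (loop_freq == 0) the cost is left alone.  */
  if (loop_freq == 0)
    return cost;
  return scratch + (cost - scratch) * bb_freq / loop_freq;
}

For example, a cost of 10 with scratch 4 in a block half as frequent as the loop header scales to 4 + 6 / 2 = 7.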
Re: [PATCH, rs6000] Add builtin-support for new Power9 vslv and vsrv (vector shift left and right variable) instructions
On Fri, May 27, 2016 at 02:01:33PM -0600, Kelvin Nilsen wrote: > > This patch adds built-in function support for the Power9 vslv and vsrv > instructions. > +Tne @code{vec_slv} and @code{vec_srv} functions operate in parallel on Typo ("Tne"). > +all of the bytes of their @code{src} and @code{shift_distance} > +arguments in parallel. The behavior of the @code{vec_slv} is as if You also say "in parallel" twice here. > --- gcc/testsuite/gcc.target/powerpc/vslv-0.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vslv-0.c (working copy) > @@ -0,0 +1,14 @@ > +/* { dg-do compile { target { powerpc*-*-* } } } */ > +/* { dg-options "-mcpu=power9" } */ I think you need to prevent the user from overriding -mcpu= for these tests? There are many examples of this in gcc.target/powerpc/ . Okay for trunk with those things taken care of; okay for 6 later, too. Thanks, Segher
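For reference, the usual idiom in gcc.target/powerpc/ looks roughly like this (the dg-skip-if arguments below are a sketch; the existing tests vary slightly):

/* { dg-do compile { target { powerpc*-*-* } } } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
/* { dg-options "-mcpu=power9" } */

so the test is skipped when the testsuite run passes a conflicting -mcpu= of its own.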
Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")
On 05/30/2016 11:42 AM, Paolo Carlini wrote: +unqualified_name_lookup_error (tree name, location_t loc) { + if (loc == UNKNOWN_LOCATION) +loc = location_of (name); When does this do anything useful? If name is a DECL, this will give the wrong answer. If name is an IDENTIFIER_NODE, this will give UNKNOWN_LOCATION. Jason
Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")
Hi, On 30/05/2016 18:07, Jason Merrill wrote: On 05/30/2016 11:42 AM, Paolo Carlini wrote: +unqualified_name_lookup_error (tree name, location_t loc) { + if (loc == UNKNOWN_LOCATION) +loc = location_of (name); When does this do anything useful? If name is a DECL, this will give the wrong answer. If name is an IDENTIFIER_NODE, this will give UNKNOWN_LOCATION. Indeed, I wasn't sure; as I said, I tried to be conservative, because we used to call location_of in that function. I'm going to regression test again with input_location instead. Does the patch otherwise look sane? Thanks, Paolo.
[gomp4.5] Doacross fixes for Fortran non-simple loops
Hi! This fixes the handling of doacross on non-simple Fortran loops (ones where step isn't compile time known 1 or -1), except that lastprivate is still broken if the collapsed loop iterates non-zero times, but one of the ordered loops inside of it has zero iterations. Will need to figure out something for that case. Tested on x86_64-linux and i686-linux, committed to gomp-4_5-branch. 2016-05-30 Jakub Jelinek * omp-low.c (expand_omp_ordered_sink): Handle TREE_PURPOSE of deps being TRUNC_DIV_EXPR. * gimplify.c (gimplify_scan_omp_clauses): Likewise. gcc/fortran/ * trans-openmp.c (doacross_steps): New variable. (gfc_trans_omp_clauses): Wrap depend sink addend into TRUNC_DIV_EXPR with second operand the non-simple step. (gfc_trans_omp_do): Set up doacross_steps. libgomp/ * testsuite/libgomp.c/doacross-1.c (main): Add missing #pragma omp atomic read. * testsuite/libgomp.c/doacross-2.c (main): Likewise. * testsuite/libgomp.c/doacross-3.c (main): Likewise. * testsuite/libgomp.fortran/doacross1.f90: New test. * testsuite/libgomp.fortran/doacross2.f90: New test. --- gcc/omp-low.c.jj2016-05-20 14:52:59.0 +0200 +++ gcc/omp-low.c 2016-05-30 17:46:40.239706984 +0200 @@ -7996,12 +7996,27 @@ expand_omp_ordered_sink (gimple_stmt_ite for (i = 0; i < fd->ordered; i++) { + tree step = NULL_TREE; off = TREE_PURPOSE (deps); + if (TREE_CODE (off) == TRUNC_DIV_EXPR) + { + step = TREE_OPERAND (off, 1); + off = TREE_OPERAND (off, 0); + } if (!integer_zerop (off)) { gcc_assert (fd->loops[i].cond_code == LT_EXPR || fd->loops[i].cond_code == GT_EXPR); bool forward = fd->loops[i].cond_code == LT_EXPR; + if (step) + { + /* Non-simple Fortran DO loops. If step is variable, +we don't know at compile even the direction, so can't +warn. */ + if (TREE_CODE (step) != INTEGER_CST) + break; + forward = tree_int_cst_sgn (step) != -1; + } if (forward ^ OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps)) warning_at (loc, 0, "% clause waiting for " "lexically later iteration"); @@ -8022,16 +8037,33 @@ expand_omp_ordered_sink (gimple_stmt_ite edge e1 = split_block (gsi_bb (gsi2), gsi_stmt (gsi2)); edge e2 = split_block_after_labels (e1->dest); - *gsi = gsi_after_labels (e1->dest); + gsi2 = gsi_after_labels (e1->dest); + *gsi = gsi_last_bb (e1->src); for (i = 0; i < fd->ordered; i++) { tree itype = TREE_TYPE (fd->loops[i].v); + tree step = NULL_TREE; + tree orig_off = NULL_TREE; if (POINTER_TYPE_P (itype)) itype = sizetype; if (i) deps = TREE_CHAIN (deps); off = TREE_PURPOSE (deps); - tree s = fold_convert_loc (loc, itype, fd->loops[i].step); + if (TREE_CODE (off) == TRUNC_DIV_EXPR) + { + step = TREE_OPERAND (off, 1); + off = TREE_OPERAND (off, 0); + gcc_assert (fd->loops[i].cond_code == LT_EXPR + && integer_onep (fd->loops[i].step) + && !POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v))); + } + tree s = fold_convert_loc (loc, itype, step ? 
step : fd->loops[i].step); + if (step) + { + off = fold_convert_loc (loc, itype, off); + orig_off = off; + off = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype, off, s); + } if (integer_zerop (off)) t = boolean_true_node; @@ -8053,7 +8085,36 @@ expand_omp_ordered_sink (gimple_stmt_ite else a = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (fd->loops[i].v), fd->loops[i].v, co); - if (fd->loops[i].cond_code == LT_EXPR) + if (step) + { + tree t1, t2; + if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps)) + t1 = fold_build2_loc (loc, GE_EXPR, boolean_type_node, a, + fd->loops[i].n1); + else + t1 = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a, + fd->loops[i].n2); + if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (deps)) + t2 = fold_build2_loc (loc, LT_EXPR, boolean_type_node, a, + fd->loops[i].n2); + else + t2 = fold_build2_loc (loc, GE_EXPR, boolean_type_node, a, + fd->loops[i].n1); + t = fold_build2_loc (loc, LT_EXPR, boolean_type_node, + step, build_int_cst (TREE_TYPE (step), 0)); + if (TREE_CODE (step) != INTEGER_CST) + { + t1 = unshare_expr (t1); + t1 = force_gimple_operand_gsi (gsi, t1, true, NU
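For context, the construct being fixed is doacross loops whose step is not +/-1. A C sketch of the shape (the patch itself targets the Fortran DO equivalent, and the new TRUNC_DIV_EXPR wrapping is what folds the sink offset together with the non-unit step):

void
f (int *a, int n)
{
  int i;
#pragma omp parallel for ordered(1)
  for (i = 3; i < n; i += 3)
    {
#pragma omp ordered depend(sink: i - 3)
      a[i] += a[i - 3];   /* cross-iteration dependence, distance one */
#pragma omp ordered depend(source)
    }
}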
Re: [PATCH, OpenACC] Make reduction arguments addressable
On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote: > Hi, a previous patch of Cesar's has made the middle-end omp-lowering > automatically create and insert a tofrom (i.e. present_or_copy) map for > parallel reductions. This allowed the user to not need explicit > clauses to copy out the reduction result, but because reduction arguments > are not marked addressable, async does not work as expected, > i.e. the asynchronous copy-out results are not used in the compiler generated > code. If you need it only for async parallel/kernels? regions, can't you do that only for those and not for others? > This patch fixes this in the front-ends, I've tested this patch without > new regressions, and fixes some C++ OpenACC tests that regressed after > my last OpenACC async patch. Is this okay for trunk? Testcases in the testsuite or others? If the latter, we should add them. Jakub
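For the archives, the problematic pattern is roughly the following (a sketch; names are illustrative): the implicit present_or_copy of the reduction result is queued asynchronously, so unless the variable is addressable the generated code can read a stale value after the wait:

int
f (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum) async(1)
  for (int i = 0; i < n; i++)
    sum += a[i];
#pragma acc wait(1)
  return sum;  /* must observe the asynchronously copied-out result */
}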
Re: [C++ Patch] PR 71238 ("Undeclared function message imprecisely points to error column")
On 05/30/2016 12:45 PM, Paolo Carlini wrote: Hi, On 30/05/2016 18:07, Jason Merrill wrote: On 05/30/2016 11:42 AM, Paolo Carlini wrote: +unqualified_name_lookup_error (tree name, location_t loc) { + if (loc == UNKNOWN_LOCATION) +loc = location_of (name); When does this do anything useful? If name is a DECL, this will give the wrong answer. If name is an IDENTIFIER_NODE, this will give UNKNOWN_LOCATION. Indeed, I wasn't sure, as I said, I tried to be conservative, because we used to call in that function. I'm going to regression test again with input_location instead. Does the patch otherwise looks sane? The safest choice would be EXPR_LOCATION_OR_LOC (name, input_location) OK with that change. Jason
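For reference, EXPR_LOCATION_OR_LOC amounts to the following (paraphrased from tree.h): take the node's own location when it has one, else the supplied fallback:

#define EXPR_LOCATION_OR_LOC(NODE, LOCUS) \
  (EXPR_HAS_LOCATION (NODE) ? EXPR_LOCATION (NODE) : (LOCUS))

so with input_location as the fallback it degrades gracefully for IDENTIFIER_NODEs, which carry no location of their own.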
Re: [PATCH] c++/71306 - bogus -Wplacement-new with an array element
OK. Jason
[PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse
On Sun, May 29, 2016 at 11:10 PM, Uros Bizjak wrote: > Attached RFC patch implements this proposal. > > 2016-05-29 Uros Bizjak > > * config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)". > > Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > Any other opinion on this issue? The linux kernel also implements > memory fence like the above proposal. I have committed the patch to mainline SVN. Uros.
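For readers wondering what the sequence buys: a LOCK-prefixed read-modify-write is a full memory barrier on x86, so it can stand in for MFENCE on SSE-less (32-bit) targets, and targeting the dead word just below the stack pointer avoids touching live data. As a standalone sketch in inline asm (the patch itself changes the mfence_nosse insn pattern in sync.md, not C source):

static inline void
fence_nosse (void)
{
  /* LOCKed RMW of a dead stack slot: serializes memory accesses.  */
  __asm__ __volatile__ ("lock; orl $0, -4(%%esp)" ::: "memory", "cc");
}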
Re: [PATCH] x86 interrupt attribute patch [2/2]
On 05/30/2016 07:31 AM, Koval, Julia wrote: Hi, Here is the fixed version of the patch. Ok for trunk? diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 2d4f028..f4bd7dd 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -5266,6 +5266,96 @@ On x86-32 targets, the @code{stdcall} attribute causes th e compiler to assume that the called function pops off the stack space used to pass arguments, unless it takes a variable number of arguments. +@item no_caller_saved_registers +@cindex @code{no_caller_saved_registers} function attribute, x86 +Use this attribute to indicate that the specified function has no +caller-saved registers, for example for a function, called from an +interrupt handler. That is, all registers are callee-saved. I think the "for example" information should come after the second sentence to improve the flow here. I'm also confused; I can see how an interrupt handler might have different register usage conventions, but do regular functions called from inside an interrupt handler function really use a non-standard call/return sequence? +The compiler generates proper function entry and exit sequences to +save and restore any modified registers, except for the EFLAGS +register. Since GCC doesn't preserve MPX, SSE, MMX nor x87 states, +the GCC option, @option{-mgeneral-regs-only}, should be used to Please delete both commas in the line above. +compile functions with @code{no_caller_saved_registers} attribute. + +@emph{Note for compiler implementers:} If the compiler generates MPX, +SSE, MMX or x87 instructions in a function with +@code{no_caller_saved_registers} attribute or functions called from +a function with @code{no_caller_saved_registers} attribute may contain +MPX, SSE, MMX or x87 instructions, the compiler must save and restore +the corresponding state. A "Note for compiler implementers" has no place in user documentation. You should just document what GCC does, if it is relevant to how a user would use this feature. It also seems like the admonition in this note that the compiler must save/restore the state contradicts the previous paragraph, where you say GCC doesn't preserve the state. + +@item interrupt +@cindex @code{interrupt} function attribute, x86 +Use this attribute to indicate that the specified function is an +interrupt handler or an exception handler (depending on parameters, passed Delete the comma. +to the function, explained further). The compiler generates function +entry and exit sequences suitable for use in an interrupt handler when +this attribute is present. The @code{IRET} instruction, instead of the +@code{RET} instruction, is used to return from interrupt handlers. All +registers, except for the EFLAGS register which is restored by the +@code{IRET} instruction, are preserved by the compiler. Since GCC +doesn't preserve MPX, SSE, MMX nor x87 states, the GCC option, +@option{-mgeneral-regs-only}, should be used to compile interrupt and Delete the two previous commas. +exception handlers. + +@emph{Note for compiler implementers:} If the compiler generates MPX, +SSE, MMX or x87 instructions in an interrupt or exception handler, or +functions called from an interrupt or exception handler may contain +MPX, SSE, MMX or x87 instructions, the compiler must save and restore +the corresponding state. Again, this is user documentation. Just explain what GCC does if it is relevant to how a user would use the feature you are documenting. 
+ +Since the direction flag in the FLAGS register in interrupt handlers +is undetermined, cld instruction must be emitted in function prologue +if rep string instructions are used in interrupt handler or interrupt +handler isn't a leaf function. This paragraph seems totally implementor-speaky and irrelevant to how a user would use the feature. + +Any interruptible-without-stack-switch code must be compiled with +@option{-mno-red-zone} since interrupt handlers can and will, because +of the hardware design, touch the red zone. + +An interrupt handler must be declared with a mandatory pointer +argument: + +@smallexample +struct interrupt_frame; + +__attribute__ ((interrupt)) +void +f (struct interrupt_frame *frame) +@{ +@} +@end smallexample + +and you must define the structure the pointer pointing to, depending +on the proper x86 interrupt frame, described in the processor's manual. How about @noindent and you must define @code{struct interrupt_frame} as described in the processor's manual. + +The exception handler is very similar to the interrupt handler with +a different mandatory function signature: + +@smallexample +#ifdef __x86_64__ +typedef unsigned long long int uword_t; +#else +typedef unsigned int uword_t; +#endif + +struct interrupt_frame; + +__attribute__ ((interrupt)) +void +f (struct interrupt_frame *frame, uword_t error_code) +@{ + ... +@} +@end smallexample + +and compiler pops the error code off the stack before the @code{IRET} +instruction. +
Re: Enable loop peeling at -O3
On 05/30/2016 09:26 AM, Jan Hubicka wrote: Hi, this is version of patch I intend to commit after re-testing at x86_64-linux with loop peeling enabled at -O3. It drops -fpeel-all-loops, add logic to not peel loops multiple times and fix profile updating. Bootstrapped/regtested x86_64-linux Honza * doc/invoke.texi (-fpeel-loops,-O3): Update documentation. * opts.c (default_options): Enable peel loops at -O3. * tree-ssa-loop-ivcanon.c (peeled_loops): New static var. (try_peel_loop): Do not re-peel already peeled loops; use likely upper bounds; fix profile updating. (pass_complete_unroll::execute): Initialize peeled_loops. * gcc.dg/tree-ssa/peel1.c: New testcase. * gcc.dg/tree-ssa/peel2.c: New testcase. * gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling. * gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling. The documentation parts look OK (and I'm glad we don't need to add yet another new option). -Sandra
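A sketch of the kind of loop -fpeel-loops now targets at -O3 (the new testcases may differ): when profile feedback or a likely upper bound says the trip count is almost always tiny, the first few iterations are replicated ahead of the loop and the back edge is rarely taken:

void
f (int *a, int n)
{
  /* Suppose profiling shows n <= 2 on nearly every entry.  */
  for (int i = 0; i < n; i++)
    a[i] = 0;
}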
[PATCH] Fix PR tree-optimization/71077
In this PR the function simplify_control_stmt_condition_1(), which is responsible for recursing into the operands of a GIMPLE_COND during jump threading to check for dominating ASSERT_EXPRs, was erroneously returning a VECTOR_CST when it should instead be returning an INTEGER_CST. The problem is that the zero/one constants that this function uses share the type of the GIMPLE_COND's operands, operands which may be vectors. This makes no sense and there is no reason to build and use such constants instead of using boolean_[true|false]_node since all that matters is whether a constant returned by simplify_control_stmt_condition() correctly satisfies integer_[zero|one]p. So this patch makes the combining part of simplify_control_stmt_condition_1() just use boolean_false_node or boolean_true_node as the designated false/true values. Does this look OK to commit after bootstrap+regtest on x86_64-pc-linux-gnu? (I'm not sure if I put the test in the proper directory since no other test in tree-ssa uses -flto. Also not sure if an i686 target accepts the -march=core-avx2 flag.) gcc/ChangeLog: PR tree-optimization/71077 * tree-ssa-threadedge.c (simplify_control_stmt_condition_1): In the combining step, use boolean_false_node and boolean_true_node as the designated false/true return values. gcc/testsuite/ChangeLog: PR tree-optimization/71077 * gcc.dg/tree-ssa/pr71077.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr71077.c | 18 ++ gcc/tree-ssa-threadedge.c | 26 -- 2 files changed, 30 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71077.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c new file mode 100644 index 000..4753740 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c @@ -0,0 +1,18 @@ +/* PR c++/71077 */ +/* { dg-do link { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options "-O3 -flto -march=core-avx2" } */ + +int *a; +int b, c, d, e; +int sched_analyze(void) { + for (; b; b++) { + c = 0; + for (; c < 32; c++) + if (b & 1 << c) + a[b + c] = d; + } + return 0; +} + +void schedule_insns(void) { e = sched_analyze(); } +int main(void) { schedule_insns(); } diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index eca3812..826d620 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -573,8 +573,6 @@ simplify_control_stmt_condition_1 (edge e, enum tree_code rhs_code = gimple_assign_rhs_code (def_stmt); const tree rhs1 = gimple_assign_rhs1 (def_stmt); const tree rhs2 = gimple_assign_rhs2 (def_stmt); - const tree zero_cst = build_zero_cst (TREE_TYPE (op0)); - const tree one_cst = build_one_cst (TREE_TYPE (op0)); /* Is A != 0 ? */ const tree res1 @@ -589,19 +587,19 @@ simplify_control_stmt_condition_1 (edge e, { /* If A == 0 then (A & B) != 0 is always false. */ if (cond_code == NE_EXPR) - return zero_cst; + return boolean_false_node; /* If A == 0 then (A & B) == 0 is always true. */ if (cond_code == EQ_EXPR) - return one_cst; + return boolean_true_node; } else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res1)) { /* If A != 0 then (A | B) != 0 is always true. */ if (cond_code == NE_EXPR) - return one_cst; + return boolean_true_node; /* If A != 0 then (A | B) == 0 is always false. */ if (cond_code == EQ_EXPR) - return zero_cst; + return boolean_false_node; } /* Is B != 0 ? */ @@ -617,19 +615,19 @@ simplify_control_stmt_condition_1 (edge e, { /* If B == 0 then (A & B) != 0 is always false. 
*/ if (cond_code == NE_EXPR) - return zero_cst; + return boolean_false_node; /* If B == 0 then (A & B) == 0 is always true. */ if (cond_code == EQ_EXPR) - return one_cst; + return boolean_true_node; } else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res2)) { /* If B != 0 then (A | B) != 0 is always true. */ if (cond_code == NE_EXPR) - return one_cst; + return boolean_true_node; /* If B != 0 then (A | B) == 0 is always false. */ if (cond_code == EQ_EXPR) - return zero_cst; + return boolean_false_node; } if (res1 != NULL_TREE && res2 != NULL_TREE) @@ -641,10 +639,10 @@ simplify_control_stmt_condition_1 (edge e, { /* If A != 0 and B != 0 then (bool)(A & B) != 0 is true. */
[PATCH] Fix PR tree-optimization/71314
The test case ssa-thread-14.c expects that non-short-circuit logical ops are used, so it fails on targets that don't use such ops by default. To fix, I copied the dg directives from tree-ssa/reassoc-35.c, which look reasonable. Does this look OK to commit? gcc/testsuite/ChangeLog: PR tree-optimization/71314 * gcc.dg/tree-ssa/ssa-thread-14.c: Adjust target selector. Pass -mbranch-cost=2. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c index e2ac2f7..c754b5b 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c @@ -1,5 +1,6 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { ! { m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-* nios2*-*-* } } } } */ /* { dg-additional-options "-O2 -fdump-tree-vrp-details" } */ +/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* s390*-*-* i?86-*-* x86_64-*-* } } */ /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } } */ void foo (void); -- 2.9.0.rc0.29.gabd6606
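To see why the branch cost matters for this dump: with -mbranch-cost=2 or higher the middle end is willing to replace the short-circuit form with straight-line logical ops, which is what the threaded-jump counts rely on. Roughly (a sketch):

int
f (int a, int b)
{
  if (a != 0 && b != 0)  /* may become: if ((a != 0) & (b != 0)) */
    return 1;
  return 0;
}

Targets whose default branch cost is lower keep the two conditional branches, and the eight jump threads the dump scan expects never materialize.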
Re: [PATCH] Update documentation for ARM architecture
On 05/28/2016 07:52 AM, stefan.bru...@rwth-aachen.de wrote: From: Stefan Brüns * use lexicographical ordering, as "gcc -march=foo" does * add armv6k, armv6z, arm6zk * remove ep9312, it is only valid for -mcpu * add armv6s-m and document it, as it is no official ARM name. Support for the OS extension/SVC is mandatory, non-supporting implementations are deprecated (ARMv6-M Architecture Reference Manual, B.2) --- gcc/doc/invoke.texi | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9e92133..6942b83 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14041,15 +14041,22 @@ name to determine what kind of instructions it can emit when generating assembly code. This option can be used in conjunction with or instead of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, @samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t}, -@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te}, -@samp{armv6}, @samp{armv6j}, -@samp{armv6t2}, @samp{armv6z}, @samp{armv6kz}, @samp{armv6-m}, -@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7e-m}, +@samp{armv5}, @samp{armv5e}, @samp{armv5t}, @samp{armv5te}, +@samp{armv6}, @samp{armv6-m}, @samp{armv6j}, @samp{armv6k}, +@samp{armv6kz}, @samp{armv6s-m}, +@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk}, +@samp{armv7}, @samp{armv7-a}, @samp{armv7-m}, @samp{armv7-r}, @samp{armv7e-m}, @samp{armv7ve}, @samp{armv8-a}, @samp{armv8-a+crc}, @samp{armv8.1-a}, -@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}. +@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}. Architecture revisions older than @option{armv4t} are deprecated. Can you please s/@option{armv4t}/@samp{armv4t}/ +@option{-march=armv6s-m} is the armv6-m architecture with support for +the (now mandatory) SVC instruction. + +@option{-march=armv6zk} is an alias for armv6kz, existing for backwards +compatibility. + @option{-march=armv7ve} is the armv7-a architecture with virtualization extensions. And in the above 3 paragraphs, please add @samp{} markup to armv6-m, armv6kz, and armv7-a, respectively. It's OK with those changes unless the arch maintainers have some technical complaint about the information. -Sandra
[PATCH] Fix PR c++/27100
Here's a more straightforward version of the patch that abuses the fact that the fields DECL_PENDING_INLINE_INFO and DECL_SAVED_FUNCTION_DATA are a union. Bootstrapped + regtested on x86_64-pc-linux-gnu. Does this look OK to commit? gcc/cp/ChangeLog: PR c++/27100 * decl.c (duplicate_decls): Copy the DECL_PENDING_INLINE_P flag. gcc/testsuite/ChangeLog: PR c++/27100 * g++.dg/other/friend6.C: New test. --- gcc/cp/decl.c| 11 +-- gcc/testsuite/g++.dg/other/friend6.C | 15 +++ 2 files changed, 24 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/other/friend6.C diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 90e147c..b26e8ae 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -2351,8 +2351,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend) } else { - if (DECL_PENDING_INLINE_INFO (newdecl) == 0) - DECL_PENDING_INLINE_INFO (newdecl) = DECL_PENDING_INLINE_INFO (olddecl); + if (DECL_PENDING_INLINE_INFO (newdecl) == NULL) + { + /* This also copies DECL_SAVED_FUNCTION_DATA since the two fields +are a union (tagged by DECL_PENDING_INLINE_P). */ + DECL_PENDING_INLINE_INFO (newdecl) + = DECL_PENDING_INLINE_INFO (olddecl); + DECL_PENDING_INLINE_P (newdecl) + = DECL_PENDING_INLINE_P (olddecl); + } DECL_DECLARED_INLINE_P (newdecl) |= DECL_DECLARED_INLINE_P (olddecl); diff --git a/gcc/testsuite/g++.dg/other/friend6.C b/gcc/testsuite/g++.dg/other/friend6.C new file mode 100644 index 000..5f593a1 --- /dev/null +++ b/gcc/testsuite/g++.dg/other/friend6.C @@ -0,0 +1,15 @@ +// PR c++/27100 +// This used to fail at link time with an "undefined reference to 'foo'" error. +// { dg-do link } + +struct A +{ + friend void foo (const A&) { } + friend void foo (const A&); +}; + +int +main () +{ + foo (A ()); +} -- 2.9.0.rc0.29.gabd6606
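The union being relied on, in miniature (a sketch of the idea, not the actual cp-tree.h layout): DECL_PENDING_INLINE_INFO and DECL_SAVED_FUNCTION_DATA name the same storage, discriminated by DECL_PENDING_INLINE_P, so copying the pointer is only meaningful if the tag travels with it:

struct cp_token_cache;
struct language_function;

struct fn_lang_data {
  bool pending_inline_p;            /* discriminates the union below */
  union {
    struct cp_token_cache *pending_inline_info;
    struct language_function *saved_function_data;
  } u;
};

Copying u alone, as duplicate_decls used to do, silently reinterprets saved function data as pending-inline tokens (or vice versa); copying pending_inline_p alongside is what the patch adds.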
Re: [patch] config.gcc FreeBSD ARM
On 27.05.16 22:34, Andreas Tobler wrote: Hi all, The FreeBSD ARM people eliminated the extra armv6hf target and moved the hardfloat 'functionality' into the armv6-*-freebsd11+ target. This applies / will apply to FreeBSD11 (FreeBSD11 is not released yet; the planned date is in September 2016). On FreeBSD10 armv6 still has only soft float. The armv6hf target is not live on FreeBSD10. This simplifies life a bit. I'll commit the attached patch to all the active branches. Regarding the gcc-5 branch, do I have permission to apply? Committed to trunk and gcc-6, waiting till gcc-5 is open again. Andreas 2016-05-27 Andreas Tobler * config.gcc: Move hard float support for arm*hf*-*-freebsd* into armv6*-*-freebsd* for FreeBSD11*. Eliminate the arm*hf*-*-freebsd* target. Index: config.gcc === --- config.gcc (revision 236835) +++ config.gcc (working copy) @@ -1058,13 +1058,11 @@ case $target in armv6*-*-freebsd*) tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1" +if test $fbsd_major -ge 11; then + tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1" +fi ;; esac - case $target in - arm*hf-*-freebsd*) - tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1" - ;; - esac with_tls=${with_tls:-gnu} ;; arm*-*-netbsdelf*)
Re: [v3 PATCH] Protect allocator-overloads of tuple-from-tuple constructors from cases that would create dangling references.
Hi Ville, On 29/05/2016 16:42, Ville Voutilainen wrote: On 28 May 2016 at 21:25, Ville Voutilainen wrote: The fix to avoid binding dangling references to temporaries for tuple's constructors that take tuples of a different type didn't include the fix for allocator overloads. That was just lazy, and I should feel ashamed. This patch fixes it, and takes us one step further toward passing libc++'s testsuite for tuple. The added _NonNestedTuple checks could actually be folded into the recently-added _TMCT alias, but I'll do that as a separate cleanup patch. For now, this should do as an easy and straightforward fix. Tested on Linux-x64. 2016-05-28 Ville Voutilainen Protect allocator-overloads of tuple-from-tuple constructors from cases that would create dangling references. * include/std/tuple (tuple(allocator_arg_t, const _Alloc&, const tuple<_UElements...>&), tuple(allocator_arg_t, const _Alloc&, tuple<_UElements...>&&)): Add a check for _NonNestedTuple. * testsuite/20_util/tuple/cons/nested_tuple_construct.cc: Adjust. Since Jonathan is going to be out-of-reach for next week due to a well-deserved holiday, would it be ok if Paolo approves such patches? Admittedly, I didn't follow in detail all your latest contributions, but sure, if this work goes only in trunk, please go ahead at your ease. We can certainly revisit details when Jon comes back, but trunk is currently in Stage 1, open for any kind of development work... Thanks, Paolo.
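For the archives, the shape being guarded (a sketch, not the testsuite's exact case): with a nested tuple element, the allocator-extended converting constructor must be constrained the same way as the plain one, so overload resolution prefers constructing the element from the whole source tuple rather than an element-wise conversion that can bind a reference to a temporary:

#include <memory>
#include <tuple>

std::tuple<int> inner(1);
// The allocator form now applies the same _NonNestedTuple check as the
// plain converting constructors:
std::tuple<std::tuple<int>> nested(std::allocator_arg,
                                   std::allocator<int>(), inner);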
[committed] Fix some OpenMP 4.5 issues (PR c++/71349)
Hi! As the changes to the testcase show, there were 3 issues: 1) nowait clause has not been allowed on target {teams distribute ,}parallel for{, simd} - while parallel for{, simd} disallows nowait, it is allowed on target, so on these constructs nowait should go to target, and not to the worksharing construct 2) depend clause has been allowed on combined constructs including target, but resulted in ICE, only specifying it on non-combined target worked 3) for C++ only, after fixing 2), target parallel depend(inout: p[0]) resulted in weirdo error, because we were finalizing clauses twice in that case Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 6.2. 2016-05-30 Jakub Jelinek PR c++/71349 * c-parser.c (c_parser_omp_for): Don't disallow nowait clause when combined with target construct. * parser.c (cp_parser_omp_for): Don't disallow nowait clause when combined with target construct. (cp_parser_omp_parallel): Pass cclauses == NULL as last argument to cp_parser_omp_all_clauses. * c-omp.c (c_omp_split_clauses): Put OMP_CLAUSE_DEPEND to C_OMP_CLAUSE_SPLIT_TARGET. Put OMP_CLAUSE_NOWAIT to C_OMP_CLAUSE_SPLIT_TARGET if combined with target construct, instead of C_OMP_CLAUSE_SPLIT_FOR. * c-c++-common/gomp/clauses-1.c (bar): Add dd argument. Add nowait depend(inout: dd[0]) clauses where permitted. --- gcc/c/c-parser.c.jj 2016-05-26 21:10:52.0 +0200 +++ gcc/c/c-parser.c2016-05-30 20:11:36.522350438 +0200 @@ -15113,7 +15113,9 @@ c_parser_omp_for (location_t loc, c_pars strcat (p_name, " for"); mask |= OMP_FOR_CLAUSE_MASK; - if (cclauses) + /* parallel for{, simd} disallows nowait clause, but for + target {teams distribute ,}parallel for{, simd} it should be accepted. */ + if (cclauses && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP)) == 0) mask &= ~(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NOWAIT); /* Composite distribute parallel for{, simd} disallows ordered clause. */ if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DIST_SCHEDULE)) != 0) --- gcc/cp/parser.c.jj 2016-05-26 10:38:01.0 +0200 +++ gcc/cp/parser.c 2016-05-30 20:51:40.418637030 +0200 @@ -33917,7 +33917,9 @@ cp_parser_omp_for (cp_parser *parser, cp strcat (p_name, " for"); mask |= OMP_FOR_CLAUSE_MASK; - if (cclauses) + /* parallel for{, simd} disallows nowait clause, but for + target {teams distribute ,}parallel for{, simd} it should be accepted. */ + if (cclauses && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP)) == 0) mask &= ~(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NOWAIT); /* Composite distribute parallel for{, simd} disallows ordered clause. 
*/ if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DIST_SCHEDULE)) != 0) @@ -34256,7 +34258,8 @@ cp_parser_omp_parallel (cp_parser *parse } } - clauses = cp_parser_omp_all_clauses (parser, mask, p_name, pragma_tok); + clauses = cp_parser_omp_all_clauses (parser, mask, p_name, pragma_tok, + cclauses == NULL); if (cclauses) { cp_omp_split_clauses (loc, OMP_PARALLEL, mask, clauses, cclauses); --- gcc/c-family/c-omp.c.jj 2016-05-01 12:21:55.0 +0200 +++ gcc/c-family/c-omp.c2016-05-30 20:21:59.107412705 +0200 @@ -983,6 +983,7 @@ c_omp_split_clauses (location_t loc, enu case OMP_CLAUSE_MAP: case OMP_CLAUSE_IS_DEVICE_PTR: case OMP_CLAUSE_DEFAULTMAP: + case OMP_CLAUSE_DEPEND: s = C_OMP_CLAUSE_SPLIT_TARGET; break; case OMP_CLAUSE_NUM_TEAMS: @@ -998,7 +999,6 @@ c_omp_split_clauses (location_t loc, enu s = C_OMP_CLAUSE_SPLIT_PARALLEL; break; case OMP_CLAUSE_ORDERED: - case OMP_CLAUSE_NOWAIT: s = C_OMP_CLAUSE_SPLIT_FOR; break; case OMP_CLAUSE_SCHEDULE: @@ -1333,6 +1333,18 @@ c_omp_split_clauses (location_t loc, enu else s = C_OMP_CLAUSE_SPLIT_FOR; break; + case OMP_CLAUSE_NOWAIT: + /* Nowait clause is allowed on target, for and sections, but +is not allowed on parallel for or parallel sections. Therefore, +put it on target construct if present, because that can only +be combined with parallel for{, simd} and not with for{, simd}, +otherwise to the worksharing construct. */ + if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP)) + != 0) + s = C_OMP_CLAUSE_SPLIT_TARGET; + else + s = C_OMP_CLAUSE_SPLIT_FOR; + break; default: gcc_unreachable (); } --- gcc/testsuite/c-c++-common/gomp/clauses-1.c.jj 2016-03-08 09:01:48.0 +0100 +++ gcc/testsuite/c-c++-common/gomp/clauses-1.c 2016-05-30 20:18:59.0 +0200 @@ -34,7 +34,7 @@ foo (int d, int m, int i1, int i2, int p void bar
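A minimal sketch of the newly accepted shape, mirroring the clauses-1.c additions (dd here is just an array parameter): both clauses are now split onto the target part of the combined construct instead of being rejected:

void
baz (int *dd)
{
#pragma omp target parallel for nowait depend(inout: dd[0])
  for (int i = 0; i < 64; i++)
    dd[i]++;
}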
Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount
On 5/30/16 22:25, Chen Gang wrote: > On 5/30/16 21:32, Chen Gang wrote: >> On 5/30/16 03:18, Mike Stump wrote: >>> On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote: r10 may also be used as a parameter for nested functions, so we need to save it before calling mcount. >>> >>> mcount can have a special abi where it preserves more registers than one would otherwise expect. I'm wondering if you know what registers it saves or doesn't touch? Does this fix any bug found by running tests, or just by inspection? >>> Oh, I forgot bundle optimization. We can use {st sp, 10; addi sp, sp, -8}. Also, I shall add PR target/71331 to the comments. If there is no additional reply within 3 days, I shall send patch v2 within this week. Thanks. >> >> It is about Bug 71331, which I reported yesterday. >> >> For nested functions, r10 is treated as the stack pointer for parameters. mcount does save r10, but tilegx also uses r10 to save lr, which causes this issue ("move r10, lr; jal __mcount"). >> >> What I have done is just like what gcc does on x86 ("push r10; callq mcount"). >> > > After this patch, nested-func-4.c passes compiling and running under tilegx qemu. :-) > > Thanks. -- Chen Gang (陈刚) Managing Natural Environments is the Duty of Human Beings.
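The general shape that exposes the problem, for reference (nested-func-4.c itself is larger; this is only a sketch): with -pg, the __mcount stub runs from the prologue while r10 is still live, so clobbering it without saving it first breaks code that, like GNU C nested functions, depends on that register:

int
outer (int n)
{
  int inner (int x) { return x + n; }  /* GNU C nested function */
  return inner (1);
}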
Re: [PATCH] c++/71306 - bogus -Wplacement-new with an array element
On 05/30/2016 12:42 PM, Jason Merrill wrote: OK. Sorry, I should have asked to begin with: Is it okay to backport this fix to the 6.x branch too? Martin
[PATCH] Remove old diagnostic macros.
Hi, this is my first GCC patch, so please bear with me if something is wrong with it in an obvious way. I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is? Bootstrapped on x86_64-pc-linux-gnu. 2016-05-31 Marcin Baczyński * gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer): delete. --- gcc/diagnostic.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h index 48ae50d..58d77df 100644 --- a/gcc/diagnostic.h +++ b/gcc/diagnostic.h @@ -231,12 +231,6 @@ diagnostic_inhibit_notes (diagnostic_context * context) /* Same as output_prefixing_rule. Works on 'diagnostic_context *'. */ #define diagnostic_prefixing_rule(DC) ((DC)->printer->wrapping.rule) -/* Maximum characters per line in automatic line wrapping mode. - Zero means don't wrap lines. */ -#define diagnostic_line_cutoff(DC) ((DC)->printer->wrapping.line_cutoff) - -#define diagnostic_flush_buffer(DC) pp_flush ((DC)->printer) - /* True if the last module or file in which a diagnostic was reported is different from the current one. */ #define diagnostic_last_module_changed(DC, MAP)\ -- 2.8.3