Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
> On 02/28/2014 03:56 PM, Jan Hubicka wrote: > >I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL || > >DECL_COMDAT). If the dtor is going to be output anyway, we are safe to use > >it. > > We already skipped DECL_EXTERNAL decls, and artificial members are > always DECL_COMDAT, but I'll add the COMDAT check. > > >Are those programs valid by C++ standard? (I believe it is not valid to > >include > >stuff whose implementation you do not link with.). > > Symbol visibility is outside the scope of the standard. > > >If we just want to avoid > >breaking python and libreoffice (I fixed libreoffice part however), we may > >just > >go with the ipa-devirt change as you propose (with external&comdat check). > > > >If this is an correctness issue, I think we want to be safe that other > >optimizations > >won't do the same. In that case your check seems misplaced. > > > >If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into > >function_attribute_inlinable_p. If the dtor is not safe to refer, then I > >would > >add it into can_refer_decl_in_current_unit_p > > >Both such changes would however inhibit quite some optimization, since > >artificial destructors are quite common case, right? Or is there some reason > >why > >only speculative devirtualization count possibly work out reference to these? > > Normally, it's fine to inline destructors, and refer to them. The > problem comes when we turn what had been a virtual call (which goes > through the vtable that is hidden in the DSO) into a direct call to > a hidden function. We don't do that for user-defined virtual > functions because the user controls whether or not they are defined > in the header, and we don't devirtualize if no definition is > available, but implicitly-declared functions are different because > the user has no way to prevent the definition from being available. > > This also isn't a problem for cprop devirtualization, because in > that situation we must have already referred to the vtable. Is this always the case? For normal virtual method you can have something like: foo(class bla a) { a->bar(); } Here we will devirutalize bar into bla's method without ever seeing the vtable. I am not sure if I can construct a testcase for dtor w/o referencing the vtable, since generally those are cases where dtor is generated autmatically. But it is a bit fragile assumption to be made. Jan > > Jason > > commit 2a05a09c268ce3abb373aa86cf731d20aac8dd7a > Author: Jason Merrill > Date: Fri Feb 28 14:03:19 2014 -0500 > > PR c++/58678 > * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared > function. > > diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c > index 21649cb..2f84f17 100644 > --- a/gcc/ipa-devirt.c > +++ b/gcc/ipa-devirt.c > @@ -1710,7 +1710,7 @@ ipa_devirt (void) > >int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0; >int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0; > - int nwrong = 0, nok = 0, nexternal = 0;; > + int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0; > >FOR_EACH_DEFINED_FUNCTION (n) > { > @@ -1820,6 +1820,17 @@ ipa_devirt (void) > nexternal++; > continue; > } > + /* Don't use an implicitly-declared destructor (c++/58678). */ > + struct cgraph_node *non_thunk_target > + = cgraph_function_node (likely_target); > + if (DECL_ARTIFICIAL (non_thunk_target->decl) > + && DECL_COMDAT (non_thunk_target->decl)) > + { > + if (dump_file) > + fprintf (dump_file, "Target is artificial\n\n"); > + nartificial++; > + continue; > + } > if (cgraph_function_body_availability (likely_target) > <= AVAIL_OVERWRITABLE > && symtab_can_be_discarded (likely_target)) > @@ -1862,10 +1873,10 @@ ipa_devirt (void) >" %i speculatively devirtualized, %i cold\n" >"%i have multiple targets, %i overwritable," >" %i already speculated (%i agree, %i disagree)," > - " %i external, %i not defined\n", > + " %i external, %i not defined, %i artificial\n", >npolymorphic, ndevirtualized, nconverted, ncold, >nmultiple, noverwritable, nspeculated, nok, nwrong, > - nexternal, nnotdefined); > + nexternal, nnotdefined, nartificial); >return ndevirtualized ? TODO_remove_functions : 0; > } > > diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C > b/gcc/testsuite/g++.dg/ipa/devirt-28.C > new file mode 100644 > index 000..e18b818 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C > @@ -0,0 +1,17 @@ > +// PR c++/58678 > +// { dg-options "-O3 -fdump-ipa-devirt" } > + > +struct A { > + virtual ~A(); > +}; > +struct B : A { > + virtual int m_fn1(); > +}; > +void fn1(B* b) { > + delete b; > +} > + > +// { dg-final { scan-assembl
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Fri, Feb 28, 2014 at 7:23 PM, H.J. Lu wrote: > On Fri, Feb 28, 2014 at 9:42 AM, H.J. Lu wrote: >> On Fri, Feb 28, 2014 at 9:25 AM, H.J. Lu wrote: >>> On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu wrote: On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener wrote: > On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener > wrote: >> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu wrote: >>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c". When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. >>> >>> This change: >>> >>> if (dest->loop_father->header == dest) >>> - return false; >>> + { >>> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >>> +&& bb->loop_father->header != dest) >>> + return false; >>> + >>> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >>> +&& bb->loop_father->header == dest) >>> + return false; >>> + } >>> } >>> >>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with >>> >>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver >>> -fuse-linker-plugin >>> >>> This patch changes loops without LOOPS_HAVE_PREHEADERS >>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning >>> true. I don't have a small testcase. But this patch: >>> >>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c >>> index b5c384b..2ba673c 100644 >>> --- a/gcc/tree-cfgcleanup.c >>> +++ b/gcc/tree-cfgcleanup.c >>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool >>> phi_wanted) >>> if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >>> && bb->loop_father->header == dest) >>>return false; >>> + >>> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >>> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) >>> + return false; >>>} >>> } >>> >>> fixes the regression. Does it make any senses? >> >> I think the preheader test isn't fully correct (bb may be in an inner >> loop >> for example). So a more conservative variant would be >> >> Index: gcc/tree-cfgcleanup.c >> === >> --- gcc/tree-cfgcleanup.c (revision 208169) >> +++ gcc/tree-cfgcleanup.c (working copy) >> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb, >>/* Protect loop preheaders and latches if requested. */ >>if (dest->loop_father->header == dest) >> { >> - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >> - && bb->loop_father->header != dest) >> - return false; >> - >> - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >> - && bb->loop_father->header == dest) >> - return false; >> + if (bb->loop_father == dest->loop_father) >> + return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES); >> + else if (bb->loop_father == loop_outer (dest->loop_father)) >> + return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS); >> + /* Always preserve other edges into loop headers that are >> +not simple latches or preheaders. */ >> + return false; >> } >> } >> >> that makes sure we can properly update loop information. It's also >> a more conservative change at this point which s
Re: calloc = malloc + memset
Hi > On 28/feb/2014, at 23:48, Marc Glisse wrote: > > Hello, > > this is a stage 1 patch, and I'll ping it then, but if you have comments > now... > > Passes bootstrap+testsuite on x86_64-linux-gnu. > > 2014-02-28 Marc Glisse > >PR tree-optimization/57742 > gcc/ >* tree-ssa-forwprop.c (simplify_malloc_memset): New function. >(simplify_builtin_call): Call it. > gcc/testsuite/ >* g++.dg/tree-ssa/calloc.C: New testcase. >* gcc.dg/tree-ssa/calloc.c: Likewise. > > -- > Marc Glisse > Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C > === > --- gcc/testsuite/g++.dg/tree-ssa/calloc.C(revision 0) > +++ gcc/testsuite/g++.dg/tree-ssa/calloc.C(working copy) > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-options "-std=gnu++11 -O3 -fdump-tree-optimized" } */ > + > +#include > +#include > +#include > + > +void g(void*); > +inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) Unless *really* necessary I would recommend not including the large (that also couples quite seriously the front-end testsuite to the library testsuite, we already discussed those topics in the past). Using the internal macros seems also unnecessary. Paolo
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
Hi H.J, Sorry that I will be out of office next week, and don't have chance to reproduce it until back. BTW, does x32 refer to x86 32 bit? Thanks, bin On Sat, Mar 1, 2014 at 2:23 AM, H.J. Lu wrote: > On Fri, Feb 28, 2014 at 9:42 AM, H.J. Lu wrote: >> On Fri, Feb 28, 2014 at 9:25 AM, H.J. Lu wrote: >>> On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu wrote: On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener wrote: > On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener > wrote: >> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu wrote: >>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c". When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. >>> >>> This change: >>> >>> if (dest->loop_father->header == dest) >>> - return false; >>> + { >>> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >>> +&& bb->loop_father->header != dest) >>> + return false; >>> + >>> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >>> +&& bb->loop_father->header == dest) >>> + return false; >>> + } >>> } >>> >>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with >>> >>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver >>> -fuse-linker-plugin >>> >>> This patch changes loops without LOOPS_HAVE_PREHEADERS >>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning >>> true. I don't have a small testcase. But this patch: >>> >>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c >>> index b5c384b..2ba673c 100644 >>> --- a/gcc/tree-cfgcleanup.c >>> +++ b/gcc/tree-cfgcleanup.c >>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool >>> phi_wanted) >>> if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >>> && bb->loop_father->header == dest) >>>return false; >>> + >>> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >>> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) >>> + return false; >>>} >>> } >>> >>> fixes the regression. Does it make any senses? >> >> I think the preheader test isn't fully correct (bb may be in an inner >> loop >> for example). So a more conservative variant would be >> >> Index: gcc/tree-cfgcleanup.c >> === >> --- gcc/tree-cfgcleanup.c (revision 208169) >> +++ gcc/tree-cfgcleanup.c (working copy) >> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb, >>/* Protect loop preheaders and latches if requested. */ >>if (dest->loop_father->header == dest) >> { >> - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) >> - && bb->loop_father->header != dest) >> - return false; >> - >> - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) >> - && bb->loop_father->header == dest) >> - return false; >> + if (bb->loop_father == dest->loop_father) >> + return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES); >> + else if (bb->loop_father == loop_outer (dest->loop_father)) >> + return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS); >> + /* Always preserve other edges into loop headers that are >> +not simple latches or preheaders. */ >> + return false; >>
[SH, committed] Fix PR 60071
Hi, I've just committed the attached patch that fixes PR 60071 as rev 208242. Originally tested by Kaz: http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01866.html Tested the new test case with: make check-gcc RUNTESTFLAGS="compile.exp=pr60071.c --target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}" Cheers, Oleg gcc/ChangeLog: PR target/60071 * config/sh/sh.md (*mov_t_msb_neg): Split into ... (*mov_t_msb_neg_negc): ... this new insn. testsuite/ChangeLog: PR target/60071 * gcc.c-torture/compile/pr60071.c: New. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 208241) +++ gcc/config/sh/sh.md (working copy) @@ -11434,6 +11434,10 @@ ;; T = 1: 0x8000 -> reg ;; T = 0: 0x7FFF -> reg ;; This works because 0 - 0x8000 = 0x8000. +;; +;; This insn must not match again after it has been split into the constant +;; load and negc. This is accomplished by the special negc insn that +;; has a use on the operand. (define_insn_and_split "*mov_t_msb_neg" [(set (match_operand:SI 0 "arith_reg_dest") (minus:SI (const_int -2147483648) ;; 0x8000 @@ -11444,12 +11448,23 @@ "&& can_create_pseudo_p ()" [(set (match_dup 2) (const_int -2147483648)) (parallel [(set (match_dup 0) (minus:SI (neg:SI (match_dup 2)) - (reg:SI T_REG))) - (clobber (reg:SI T_REG))])] + (reg:SI T_REG))) + (clobber (reg:SI T_REG)) + (use (match_dup 2))])] { operands[2] = gen_reg_rtx (SImode); }) +(define_insn "*mov_t_msb_neg_negc" + [(set (match_operand:SI 0 "arith_reg_dest" "=r") + (minus:SI (neg:SI (match_operand:SI 1 "arith_reg_operand" "r")) + (match_operand:SI 2 "t_reg_operand"))) + (clobber (reg:SI T_REG)) + (use (match_dup 1))] + "TARGET_SH1" + "negc %1,%0" + [(set_attr "type" "arith")]) + ;; These are essentially the same as above, but with the inverted T bit. ;; Combine recognizes the split patterns, but does not take them sometimes ;; if the T_REG clobber is specified. Instead it tries to split out the Index: gcc/testsuite/gcc.c-torture/compile/pr60071.c === --- gcc/testsuite/gcc.c-torture/compile/pr60071.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/pr60071.c (revision 0) @@ -0,0 +1,8 @@ +int +foo (int cls, int sign) +{ + if (__builtin_expect (cls == 4, 0)) +return (sign +? (-((int) ((~(unsigned)0) >> 1)))-1 +: ((int) ((~(unsigned)0) >> 1))); +}
gcc-patches@gcc.gnu.org
There is some uncertainty in the PR discussion about what semantics vector && should have, so this patch just avoids the ICE for now. Tested x86_64-pc-linux-gnu, applying to trunk and 4.8. commit cbf13b08495010d24a784170f76684e5109bd92b Author: Jason Merrill Date: Fri Feb 28 23:56:22 2014 -0500 PR c++/58845 * typeck.c (cp_build_binary_op): Sorry on vector&&vector. diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index 1e14b63..29f9e9d 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -4177,6 +4177,11 @@ cp_build_binary_op (location_t location, case TRUTH_ORIF_EXPR: case TRUTH_AND_EXPR: case TRUTH_OR_EXPR: + if (VECTOR_TYPE_P (type0) || VECTOR_TYPE_P (type1)) + { + sorry ("logical operation on vector type"); + return error_mark_node; + } result_type = boolean_type_node; break; diff --git a/gcc/testsuite/g++.dg/ext/vector27.C b/gcc/testsuite/g++.dg/ext/vector27.C new file mode 100644 index 000..288e13c --- /dev/null +++ b/gcc/testsuite/g++.dg/ext/vector27.C @@ -0,0 +1,7 @@ +// PR c++/58845 + +void foo() +{ + int v __attribute__((vector_size(8))); + v = v || v; // { dg-bogus "" "" { xfail *-*-* } } +}
Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
On 03/01/2014 03:52 AM, Jan Hubicka wrote: a hidden function. We don't do that for user-defined virtual functions because the user controls whether or not they are defined in the header, and we don't devirtualize if no definition is available, but implicitly-declared functions are different because the user has no way to prevent the definition from being available. This also isn't a problem for cprop devirtualization, because in that situation we must have already referred to the vtable. Is this always the case? For normal virtual method you can have something like: foo(class bla a) { a->bar(); } Here we will devirutalize bar into bla's method without ever seeing the vtable. I am not sure if I can construct a testcase for dtor w/o referencing the vtable, since generally those are cases where dtor is generated autmatically. But it is a bit fragile assumption to be made. For pass-by-value we need to be able to call a constructor, which needs to be able to reference the vtable, so I think it should be OK typically. Sure, people can do all kinds of weird and broken things with visibility; I'm just trying to make what seems to be a common and understandable pattern work. Jason
Optimize n?rotate(x,n):x
Hello, again, a stage 1 patch that I will ping then, but early comments are welcome. PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because it can be hard to write a strictly valid rotate in plain C). The operation is really: (x != neutral) ? x op y : y where neutral is such that (neutral op y) is always y, so that's what I implemented (and absorbing elements while I was at it). For some operations on some platforms, the transformation may not be such a good idea, in particular if division is very slow and b is 1 most of the time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest might be to comment out those operations in the switch for now. I think divisions are the only ones slow enough to deserve this, though there certainly are CPUs where multiplication is not so fast, and even for rotate it may not always be a win if the processor doesn't have a rotate instruction and the shift amount is almost always 0. Passes bootstrap+testsuite on x86_64-linux-gnu. 2014-03-01 Marc Glisse PR tree-optimization/59100 gcc/ * tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p): New functions. (value_replacement): Handle conditional binary operations with a neutral or absorbing element. gcc/testsuite/ * gcc.dg/tree-ssa/phi-opt-12.c: New file. -- Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c === --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c (working copy) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-phiopt1" } */ + +int f(int a, int b, int c) { + if (c > 5) return c; + if (a == 0) return b; + return a + b; +} + +unsigned rot(unsigned x, int n) { + const int bits = __CHAR_BIT__ * __SIZEOF_INT__; + return (n == 0) ? x : ((x << n) | (x >> (bits - n))); +} + +unsigned m(unsigned a, unsigned b) { + if (a == 0) +return 0; + else +return a & b; +} + +/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */ +/* { dg-final { cleanup-tree-dump "phiopt1" } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c ___ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Index: gcc/tree-ssa-phiopt.c === --- gcc/tree-ssa-phiopt.c (revision 208241) +++ gcc/tree-ssa-phiopt.c (working copy) @@ -140,20 +140,37 @@ static bool gate_hoist_loads (void); x = PHI (CONST, a) Gets replaced with: bb0: bb2: t1 = a == CONST; t2 = b > c; t3 = t1 & t2; x = a; + + It also replaces + + bb0: + if (a != 0) goto bb1; else goto bb2; + bb1: + c = a + b; + bb2: + x = PHI ; + + with + + bb0: + c = a + b; + bb2: + x = PHI ; + ABS Replacement --- This transformation, implemented in abs_replacement, replaces bb0: if (a >= 0) goto bb2; else goto bb1; bb1: x = -a; bb2: @@ -809,20 +826,79 @@ operand_equal_for_value_replacement (con if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp)) return true; tmp = gimple_assign_rhs2 (def); if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp)) return true; return false; } +/* Returns true if ARG is a neutral element for operation CODE + on the RIGHT side. */ + +static bool +neutral_element_p (tree_code code, tree arg, bool right) +{ + switch (code) +{ +case PLUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: + return integer_zerop (arg); + +case LROTATE_EXPR: +case RROTATE_EXPR: +case LSHIFT_EXPR: +case RSHIFT_EXPR: +case MINUS_EXPR: +case POINTER_PLUS_EXPR: + return right && integer_zerop (arg); + +case MULT_EXPR: + return integer_onep (arg); + +// Are those too expensive? Should we check edge probabilities? +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case EXACT_DIV_EXPR: + return right && integer_onep (arg); + +case BIT_AND_EXPR: + return integer_all_onesp (arg); + +default: + return false; +} +} + +/* Returns true if ARG is an absorbing element for operation CODE. */ + +static bool +absorbing_element_p (tree_code code, tree arg) +{ + switch (code) +{ +case BIT_IOR_EXPR: + return integer_all_onesp (arg); + +case MULT_EXPR: +case BIT_AND_EXPR: + return integer_zerop (arg); + +default: + return false; +} +} + /* The function value_replacement does the main work of doing the value replacement. Return non-zero if the replacement is done. Otherwis
[RFC] Do not consider volatile asms as optimization barriers #1
There seems to be a sufficiently large consensus that volatile asms should not be treated as full optimization barriers by the compiler. This patch is a first step towards implementing this conclusion and aims only at addressing the code quality regressions introduced by http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193802 on the 4.8 branch and mainline for volatile asms. It introduces a new, temporary predicate really_volatile_insn_p and invokes it from the 3 places in cse.c, cselib.c and dse.c which were touched above. But this comes with a side effect: the "blockage" standard pattern needs to be adjusted to always use an UNSPEC_VOLATILE. That's already the case for all the architectures that define it, but 21 architectures (aarch64 arc avr bfin cr16 cris epiphany h8300 m32c mcore mmix mn10300 moxie msp430 nds32 picochip rl78 rx score v850 xtensa) don't define it and therefore need to be adjusted. Tested on x86_64-suse-linux and by building cc1 and compiling a relevant testcase for the 21 aforementioned architectures. 2014-03-01 Eric Botcazou * doc/md.texi (blockage): Do not allow volatile asms. * rtl.def (UNSPEC_VOLATILE): Adjust description. * rtl.h (really_volatile_insn_p): Declare. * rtlanal.c (volatile_insn_1): New predicate copied from... (volatile_insn_p): ...this. Call it. (really_volatile_insn_p): New predicate. * cse.c (cse_insn): Call really_volatile_insn_p. * cselib.c (cselib_process_insn): Likewise. * dse.c (scan_insn): Likewise. * emit-rtl.c (gen_blockage): Delete. * emit-rtl.h (gen_blockage): Likewise. * config/aarch64/aarch64.md (UNSPECV_BLOCKAGE): New enum value. (blockage): New insn. * config/arc/arc.md (VUNSPEC_BLOCKAGE): New constant. (blockage): New insn. * config/avr/avr.md (UNSPECV_BLOCKAGE): New enum value. (blockage): New insn. * config/bfin/bfin.md (UNSPEC_VOLATILE_BLOCKAGE): New constant. (blockage): New insn. * config/cr16/cr16.md (UNSPECV_BLOCKAGE): New constant. (blockage): New insn. * config/cris/cris.md (CRIS_UNSPECV_BLOCKAGE): New enum value. (blockage): New insn. * config/epiphany/epiphany.md (UNSPECV_BLOCKAGE): New constant. (blockage): New insn. * config/h8300/h8300.md (UNSPEC_BLOCKAGE): New constant. (blockage): New insn. * config/m32c/m32c.md (UNS_BLOCKAGE): New constant. (blockage): New insn. * config/mcore/mcore.md (UNSPECV_BLOCKAGE, UNSPECV_CONSTANT, UNSPECV_ALIGN, UNSPECV_TABLE): New enum values. (blockage): New insn. (consttable_4): Use UNSPECV_CONSTANT. (align_4): Use UNSPECV_ALIGN. (consttable_end): Use UNSPECV_TABLE. * config/mmix/mmix.md (UNSPECV_BLOCKAGE, UNSPECV_SYNC, UNSPECV_NONLOCAL): New enum values. (blockage): New insn. (nonlocal_goto_receiver): Use UNSPECV_NONLOCAL. (nonlocal_goto_receiver_expanded): Likewise. (sync_icache): Use UNSPECV_SYNC. * config/mn10300/mn10300.md (UNSPECV_BLOCKAGE): New constant. (blockage): New insn. * config/moxie/moxie.md (UNSPECV_BLOCKAGE): New enum value. (blockage): New insn. * config/msp430/msp430.md (UNS_BLOCKAGE): New enum value. (blockage): New insn. * config/nds32/constants.md (UNSPEC_VOLATILE_BLOCKAGE): New enum value. * config/nds32/nds32.md (blockage): New insn. * config/nds32/nds32.c (nds32_valid_stack_push_pop): Return false if the insn is of the wrong kind. * config/picochip/picochip.md (UNSPEC_BLOCKAGE): New constant. (blockage): New insn. * config/rl78/rl78.md (UNS_BLOCKAGE): New constant. (blockage): New insn. * config/rx/rx.md (UNSPEC_BLOCKAGE): New constant. (blockage): New insn. * config/score/score.md (BLOCKAGE): New constant. (blockage): New insn. * config/v850/v850.md (UNSPECV_BLOCKAGE): New constant. (blockage): New insn. * config/xtensa/xtensa.md (UNSPECV_BLOCKAGE): New constant. (blockage): New insn. -- Eric BotcazouIndex: doc/md.texi === --- doc/md.texi (revision 208241) +++ doc/md.texi (working copy) @@ -6228,7 +6228,7 @@ the values of operands 1 and 2. This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. -This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. +This needs to be an UNSPEC_VOLATILE pattern. @cindex @code{memory_barrier} instruction pattern @item @samp{memory_barrier} Index: rtlanal.c === --- rtlanal.c (revision 208241) +++ rtlanal.c (working copy) @@ -21
Re: calloc = malloc + memset
On Sat, 1 Mar 2014, Paolo Carlini wrote: Hi On 28/feb/2014, at 23:48, Marc Glisse wrote: Hello, this is a stage 1 patch, and I'll ping it then, but if you have comments now... Passes bootstrap+testsuite on x86_64-linux-gnu. 2014-02-28 Marc Glisse PR tree-optimization/57742 gcc/ * tree-ssa-forwprop.c (simplify_malloc_memset): New function. (simplify_builtin_call): Call it. gcc/testsuite/ * g++.dg/tree-ssa/calloc.C: New testcase. * gcc.dg/tree-ssa/calloc.c: Likewise. -- Marc Glisse Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C === --- gcc/testsuite/g++.dg/tree-ssa/calloc.C(revision 0) +++ gcc/testsuite/g++.dg/tree-ssa/calloc.C(working copy) @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu++11 -O3 -fdump-tree-optimized" } */ + +#include +#include +#include + +void g(void*); +inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) Unless *really* necessary I would recommend not including the large (that also couples quite seriously the front-end testsuite to the library testsuite, we already discussed those topics in the past). Using the internal macros seems also unnecessary. I think it might be the first time I include large headers in a compiler testcase (note that there are already 16 other testcases including in g++.dg). In this case, it seems to be what I want to test though. I already have some elementary tests in gcc.dg. This testcase is the original motivation for working on this. It requires a combination of quite a few optimizations (inlining, recognizing that a loop is a memset, aliasing, this optimization (the complicated version with a PHI node)), and I want to test that we won't for instance shuffle the passes in a way that breaks it. Also, if the library changes vector enough that this doesn't optimize anymore, I want to know about it, either the library change was wrong or the middle-end needs to improve some optimization before the next release. I wanted to keep the implementation of new as close to the one in libsupc++ as possible (mimic LTO), so I copy-pasted (and slightly edited, I may propose a patch to libsupc++ later). I agree that I should remove the exception specification (since I am compiling in C++11 to have access to get_new_handler) and replace _GLIBCXX_THROW_OR_ABORT with just throw, and I just did it locally, thanks. -- Marc Glisse
Re: [Patch, fortran] PR51976 - [F2003] Support deferred-length character components of derived types (allocatable string length)
Hi Mikael, hi all, 2014-02-22 16:38 GMT+01:00 Mikael Morin : > Le 19/02/2014 16:51, Janus Weil a écrit : >> The patch was not applying cleanly any more, so here is a re-diffed >> version for current trunk. It works nicely on the included test case >> as well as the one provided by Walter Spector in comment 12 of the PR. >> >> Since, also in the current state, "character(:)" works only in a >> subset of all cases, I think it cannot hurt to add more cases that >> work for 4.9 (even if still not all possible cases work). >> >> Please let me know what you think ... > > Review: > >> PR fortran/51976 >> * gfortran.h : Add deferred_parameter attribute. > Add the name of the struct before ':' > like "(struct symbol_attribute)" or maybe just "(symbol_attribute)" > >> * trans.c (gfc_deferred_strlen): New function. >> * trans.h : Prototype for the new function. > This is really nitpicking but "(gfc_deferred_strlen)" should be in front > of trans.h as well. Done. > Now regarding the patch itself, I don't know character handling very > well, but it seems to me that the patch makes the (wrong) assumption > that characters are of default kind, so that string length is the same > as memory size. > Namely: > >> Index: gcc/fortran/trans-array.c >> === >> --- gcc/fortran/trans-array.c (revision 207896) >> +++ gcc/fortran/trans-array.c (working copy) >> @@ -7365,7 +7365,7 @@ get_full_array_size (stmtblock_t *block, tree decl >> >> static tree >> duplicate_allocatable (tree dest, tree src, tree type, int rank, >> -bool no_malloc) >> +bool no_malloc, tree strlen) >> { >>tree tmp; >>tree size; >> @@ -7386,7 +7386,11 @@ duplicate_allocatable (tree dest, tree src, tree t >>null_data = gfc_finish_block (&block); >> >>gfc_init_block (&block); >> - size = TYPE_SIZE_UNIT (TREE_TYPE (type)); >> + if (strlen != NULL_TREE) >> + size = strlen; >> + else >> + size = TYPE_SIZE_UNIT (TREE_TYPE (type)); >> + > here... Fixed (but actually in the code which calls duplicate_allocatable). >>size = fold_build2_loc (input_location, MULT_EXPR, >> gfc_array_index_type, >> nelems, tmp); >>if (!no_malloc) >> Index: gcc/fortran/trans-expr.c >> === >> --- gcc/fortran/trans-expr.c (revision 207896) >> +++ gcc/fortran/trans-expr.c (working copy) >> @@ -6043,9 +6051,40 @@ gfc_trans_subcomponent_assign (tree dest, gfc_comp >> gfc_add_expr_to_block (&block, tmp); >> } >> } >> - else >> + else if (gfc_deferred_strlen (cm, &tmp)) >> { >> - /* Scalar component. */ >> + tree strlen; >> + strlen = tmp; >> + gcc_assert (strlen); >> + strlen = fold_build3_loc (input_location, COMPONENT_REF, >> + TREE_TYPE (strlen), >> + TREE_OPERAND (dest, 0), >> + strlen, NULL_TREE); >> + >> + if (expr->expr_type == EXPR_NULL) >> + { >> + tmp = build_int_cst (TREE_TYPE (cm->backend_decl), 0); >> + gfc_add_modify (&block, dest, tmp); >> + tmp = build_int_cst (TREE_TYPE (strlen), 0); >> + gfc_add_modify (&block, strlen, tmp); >> + } >> + else >> + { >> + gfc_init_se (&se, NULL); >> + gfc_conv_expr (&se, expr); >> + tmp = build_call_expr_loc (input_location, >> + builtin_decl_explicit (BUILT_IN_MALLOC), >> + 1, se.string_length); > here, Also fixed (again by using size_of_string_in_bytes). >> + gfc_add_modify (&block, dest, >> + fold_convert (TREE_TYPE (dest), tmp)); >> + gfc_add_modify (&block, strlen, se.string_length); >> + tmp = gfc_build_memcpy_call (dest, se.expr, se.string_length); >> + gfc_add_expr_to_block (&block, tmp); >> + } >> +} >> + else if (!cm->attr.deferred_parameter) >> +{ >> + /* Scalar component (excluding deferred parameters). */ >>gfc_init_se (&se, NULL); >>gfc_init_se (&lse, NULL); >> >> Index: gcc/fortran/trans-stmt.c >> === >> --- gcc/fortran/trans-stmt.c (revision 207896) >> +++ gcc/fortran/trans-stmt.c (working copy) >> @@ -5028,6 +5028,11 @@ gfc_trans_allocate (gfc_code * code) >> if (tmp && TREE_CODE (tmp) == VAR_DECL) >> gfc_add_modify (&se.pre, tmp, fold_convert (TREE_TYPE (tmp), >> memsz)); >> + else if (al->expr->ts.type == BT_CHARACTER >> +&& al->expr->ts.deferred && se.string_length) >> + gfc_add_modify (&se.pre, se.string_length, >> + fold_convert (TREE_TYPE (se.string_length), >> + memsz)); >> > and here.
Re: [RFC] Do not consider volatile asms as optimization barriers #1
Eric Botcazou writes: > There seems to be a sufficiently large consensus that volatile asms should > not > be treated as full optimization barriers by the compiler. This patch is a > first step towards implementing this conclusion and aims only at addressing > the code quality regressions introduced by > http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193802 > on the 4.8 branch and mainline for volatile asms. > > It introduces a new, temporary predicate really_volatile_insn_p and invokes > it > from the 3 places in cse.c, cselib.c and dse.c which were touched above. But > this comes with a side effect: the "blockage" standard pattern needs to be > adjusted to always use an UNSPEC_VOLATILE. That's already the case for all > the architectures that define it, but 21 architectures (aarch64 arc avr bfin > cr16 cris epiphany h8300 m32c mcore mmix mn10300 moxie msp430 nds32 picochip > rl78 rx score v850 xtensa) don't define it and therefore need to be adjusted. > > Tested on x86_64-suse-linux and by building cc1 and compiling a relevant > testcase for the 21 aforementioned architectures. Thanks for doing this. FWIW I agree it's probably the best stop-gap fix. But the implication seems to be that unspec_volatile and volatile asms are volatile in different ways. IMO they're volatile in the same way and the problems for volatile asms apply to unspec_volatile too. E.g. although cse.c will flush the table for unspec_volatile, it isn't the case that unspec_volatile forces a containing function to save all call-saved registers. That would be excessive for a plain blockage instruction. So again we seem to be assuming one thing in places like cse.c and another in the register allocator. Code that uses the DF framework will also assume that registers are not implicitly clobbered by an unspec_volatile: case ASM_OPERANDS: case UNSPEC_VOLATILE: case TRAP_IF: case ASM_INPUT: { /* Traditional and volatile asm instructions must be considered to use and clobber all hard registers, all pseudo-registers and all of memory. So must TRAP_IF and UNSPEC_VOLATILE operations. Consider for instance a volatile asm that changes the fpu rounding mode. An insn should not be moved across this even if it only uses pseudo-regs because it might give an incorrectly rounded result. However, flow.c's liveness computation did *not* do this, giving the reasoning as " ?!? Unfortunately, marking all hard registers as live causes massive problems for the register allocator and marking all pseudos as live creates mountains of uninitialized variable warnings." In order to maintain the status quo with regard to liveness and uses, we do what flow.c did and just mark any regs we can find in ASM_OPERANDS as used. In global asm insns are scanned and regs_asm_clobbered is filled out. For all ASM_OPERANDS, we must traverse the vector of input operands. We can not just fall through here since then we would be confused by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate traditional asms unlike their normal usage. */ if (code == ASM_OPERANDS) { int j; for (j = 0; j < ASM_OPERANDS_INPUT_LENGTH (x); j++) df_uses_record (collection_rec, &ASM_OPERANDS_INPUT (x, j), DF_REF_REG_USE, bb, insn_info, flags); return; } break; } Also, ira-lives.c (which tracks the liveness of both pseudo and hard registers) doesn't mention "volatile" at all. So most passes assume that no pseudos or hard registers will be implicitly clobbered by unspec_volatile, just like for a volatile asm. And IMO that's right. I think the rule should be the same for volatile asms and unspec_volatiles, and the same for registers as it already is for memory: if the instruction clobbers something, it should say so explicitly. Volatile itself should: (a) prevent deletion or duplication of the operation (b) prevent reordering wrt other volatiles (c) prevent the operation from being considered equivalent to any other operation (even if it's structurally identical and has the same inputs) but nothing beyond that. So in terms of where we go from here, I'd hope we'd add something like fake hard registers to track any processor mode of interest, such as FP rounding mode once that is modelled properly. Then anything that relies on a specific mode would use that register and anything that changes the mode would set the register, meaning we have a proper dependency chain. I think we should do the same for whatever unspec_volatiles caused the RTL CSE & co. to be so conservative. Trying to leave it implicit seems too error-prone, because then you have to make sure that every rtl pass
[Bug target/60369] [PATCH] [TIC6X] new compiler intrinsics
Hi all, this is to notify about the patch I've submitted: http://gcc.gnu.org/ml/gcc-bugs/2014-02/msg02863.html Cheers, Wojtek
Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
> On 03/01/2014 03:52 AM, Jan Hubicka wrote: > >>a hidden function. We don't do that for user-defined virtual > >>functions because the user controls whether or not they are defined > >>in the header, and we don't devirtualize if no definition is > >>available, but implicitly-declared functions are different because > >>the user has no way to prevent the definition from being available. > >> > >>This also isn't a problem for cprop devirtualization, because in > >>that situation we must have already referred to the vtable. > > > >Is this always the case? For normal virtual method you can have something > >like: > > > >foo(class bla a) > >{ > > a->bar(); > >} > > > >Here we will devirutalize bar into bla's method without ever seeing the > >vtable. > >I am not sure if I can construct a testcase for dtor w/o referencing the > >vtable, since generally those are cases where dtor is generated autmatically. > >But it is a bit fragile assumption to be made. > > For pass-by-value we need to be able to call a constructor, which The call may live in other DSO, but I agree it is weird. > needs to be able to reference the vtable, so I think it should be OK > typically. Sure, people can do all kinds of weird and broken things > with visibility; I'm just trying to make what seems to be a common > and understandable pattern work. Yes, this seems resonable. I guess I will need to find some central place where to document those little implementation decisions (there are some in can_refer_in_current_unit_p, some in visibility logic and some in the new devirt logic) Honza > > Jason
Re: [C++ Patch] PR 58610
... in fact, we can also imagine the below clean-up, for another DECL_DELETED_FN use and likewise for DECL_DECLARED_CONSTEXPR_P (which immediately applies STRIP_TEMPLATE to its argument) uses. Or for Stage 1, maybe? Thanks, Paolo. // 2014-03-01 Paolo Carlini * method.c (implicitly_declare_fn): Remove redundant DECL_TEMPLATE_RESULT and STRIP_TEMPLATE uses. * semantics.c (is_instantiation_of_constexpr): Likewise. * error.c (dump_function_decl): Likewise. Index: error.c === --- error.c (revision 208243) +++ error.c (working copy) @@ -1465,7 +1465,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t else if (DECL_VIRTUAL_P (t)) pp_cxx_ws_string (pp, "virtual"); - if (DECL_DECLARED_CONSTEXPR_P (STRIP_TEMPLATE (t))) + if (DECL_DECLARED_CONSTEXPR_P (t)) pp_cxx_ws_string (pp, "constexpr"); } Index: method.c === --- method.c(revision 208243) +++ method.c(working copy) @@ -1645,9 +1645,8 @@ implicitly_declare_fn (special_function_kind kind, /* For an inheriting constructor template, just copy these flags from the inherited constructor template for now. */ raises = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (inherited_ctor)); - deleted_p = DECL_DELETED_FN (DECL_TEMPLATE_RESULT (inherited_ctor)); - constexpr_p - = DECL_DECLARED_CONSTEXPR_P (DECL_TEMPLATE_RESULT (inherited_ctor)); + deleted_p = DECL_DELETED_FN (inherited_ctor); + constexpr_p = DECL_DECLARED_CONSTEXPR_P (inherited_ctor); } else synthesized_method_walk (type, kind, const_p, &raises, &trivial_p, @@ -1726,8 +1725,7 @@ implicitly_declare_fn (special_function_kind kind, TREE_PROTECTED (fn) = TREE_PROTECTED (inherited_ctor); /* Copy constexpr from the inherited constructor even if the inheriting constructor doesn't satisfy the requirements. */ - constexpr_p - = DECL_DECLARED_CONSTEXPR_P (STRIP_TEMPLATE (inherited_ctor)); + constexpr_p = DECL_DECLARED_CONSTEXPR_P (inherited_ctor); } /* Add the "this" parameter. */ this_parm = build_this_parm (fn_type, TYPE_UNQUALIFIED); Index: semantics.c === --- semantics.c (revision 208243) +++ semantics.c (working copy) @@ -3941,8 +3941,7 @@ static inline bool is_instantiation_of_constexpr (tree fun) { return (DECL_TEMPLOID_INSTANTIATION (fun) - && DECL_DECLARED_CONSTEXPR_P (DECL_TEMPLATE_RESULT - (DECL_TI_TEMPLATE (fun; + && DECL_DECLARED_CONSTEXPR_P (DECL_TI_TEMPLATE (fun))); } /* Generate RTL for FN. */
Re: [C++ Patch] PR 60314 (ICE with decltype(auto))
Hi, On 02/28/2014 04:50 PM, Jason Merrill wrote: OK, thanks. Applied. I have just noticed (sorry) that get_AT_ref (thus get_AT) isn't trivial at all, thus I propose to apply the below. Is it Ok with you? Thanks, Paolo. // 2014-03-01 Paolo Carlini * dwarf2out.c (gen_subprogram_die): Tidy. Index: dwarf2out.c === --- dwarf2out.c (revision 208243) +++ dwarf2out.c (working copy) @@ -18028,11 +18028,13 @@ gen_subprogram_die (tree decl, dw_die_ref context_ /* If the prototype had an 'auto' or 'decltype(auto)' return type, emit the real type on the definition die. */ - if (is_cxx() && debug_info_level > DINFO_LEVEL_TERSE - && (get_AT_ref (old_die, DW_AT_type) == auto_die - || get_AT_ref (old_die, DW_AT_type) == decltype_auto_die)) - add_type_attribute (subr_die, TREE_TYPE (TREE_TYPE (decl)), - 0, 0, context_die); + if (is_cxx() && debug_info_level > DINFO_LEVEL_TERSE) + { + dw_die_ref die = get_AT_ref (old_die, DW_AT_type); + if (die == auto_die || die == decltype_auto_die) + add_type_attribute (subr_die, TREE_TYPE (TREE_TYPE (decl)), + 0, 0, context_die); + } } } else
Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.
On Sat, Mar 1, 2014 at 7:13 AM, Kirill Yukhin wrote: > Hello Uroš, > On 28 Feb 13:55, Uros Bizjak wrote: >> On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin >> wrote: >> > Hello, >> > This is relatively obvious patch which eliminates comparision >> > of inifinities for exp2 AVX-512 test and properly comparing floats >> > for avx512f-sqrtps-2.c. >> > >> > Tests pass. >> > >> > Is it ok for trunk? >> > >> > gcc/testsuite/ >> > * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent >> > argument to avoid inf values. >> > * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with >> > UNION_FP_CHECK machinery. >> >> You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to >> avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c. > Sorry for mess. > Broken was avx512f-vdivps-2.c. > > Updated testsuite/CHangelog: > * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent > argument to avoid inf values. > * gcc.target/i386/avx512f-vdivps-2.c: Compare results with > UNION_FP_CHECK machinery. OK. Thanks, Uros.
RE: [PATCH v4] PR middle-end/60281
Hi Lin, On Fri, 28 Feb 2014 19:14:11, lin zuojian wrote: > > 于 2014年02月28日 15:58, lin zuojian 写道: >> Hi Bernd, >> I agree you with the mode problem. >> >> And I have not change the stack alignment.What I change is the virtual >> register base's alignment. I tried your patch on this test case: gcc -O2 -fsanitize=address -S gcc/testsuite/c-c++-common/asan/pr59063-1.c and compared the stack usage with and without patch: without patch: @ Function supports interworking. @ args = 0, pretend = 0, frame = 96 @ frame_needed = 0, uses_anonymous_args = 0 stmfd sp!, {r4, r5, r6, r7, r8, lr} ldr r3, .L12 ldr r3, [r3] sub sp, sp, #96 cmp r3, #0 mov r5, sp mov r8, sp and with patch: @ Function supports interworking. @ args = 0, pretend = 0, frame = 128 @ frame_needed = 0, uses_anonymous_args = 0 stmfd sp!, {r4, r5, r6, r7, r8, lr} ldr r3, .L12 ldr r3, [r3] sub sp, sp, #128 cmp r3, #0 add r3, sp, #128 bic r4, r3, #31 sub r4, r4, #96 mov r8, r4 So, that is what I mean: this patch makes the stack grow by 32 bytes, just because the emit_stack_protection uses SImode, with unaligned addresses which is not possible for ARM, and not optimal for X86_64. Why not use the true alignment value in set_mem_align? And simply let get_best_mode() choose the right mode to use for that alignment. IMHO it should not be necessary to use STRICT_ALIGNMENT everywhere, because get_best_mode should know what mode is appropriate for which alignment value. Regards Bernd. >> Realignment must be make in !STRICT_ALIGNMENT machine,or emitting the >> efficient code is impossible. > Sorry, it should be "Realignment must be make in STRICT_ALIGNMENT machine". >> For example 4 set mem:QI X,REG:QI Y will not combine into one set mem:SI >> X1,REG:SI Y1,if X is not mentioned as SI mode aligned. >> To make sure X is SI mode algined,virtual register base must be realigned. >> >> For this patch,I only intent to make it right.Making it best is next task. >> -- >> Regards >> lin zuojian. >> >> 于 2014年02月28日 15:47, Bernd Edlinger 写道: >>> Hi, >>> >>> I see the problem too. >>> >>> But I think it is not necessary to change the stack alignment >>> to solve the problem. >>> >>> It appears to me that the code in asan_emit_stack_protection >>> is just wrong. It uses SImode when the memory is not aligned >>> enough for that mode. This would not happen if that code >>> is rewritten to use get_best_mode, and by the way, even on >>> x86_64 the emitted code is not optimal, because that target >>> could work with DImode more efficiently. >>> >>> So, to fix that, it would be better to concentrate on that function, >>> and use word_mode instead of SImode, and let get_best_mode >>> choose the required mode. >>> >>> >>> Regards >>> Bernd Edlinger. >
[PATCH] Fix PR c++/60377.
PR c++/60377 * parser.c (cp_parser_parameter_declaration_clause): Unwind generic function scope on parse error in function parameter list. PR c++/60377 * g++.dg/cpp1y/pr60377.C: New testcase. --- gcc/cp/parser.c | 7 ++- gcc/testsuite/g++.dg/cpp1y/pr60377.C | 9 + 2 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1y/pr60377.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index ef36327..8c78262 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -18208,7 +18208,12 @@ cp_parser_parameter_declaration_clause (cp_parser* parser) parameter-declaration-list, then the entire parameter-declaration-clause is erroneous. */ if (is_error) -return NULL; +{ + /* Unwind generic function template scope if necessary. */ + if (parser->fully_implicit_function_template_p) + finish_fully_implicit_template (parser, /*member_decl_opt=*/0); + return NULL; +} /* Peek at the next token. */ token = cp_lexer_peek_token (parser->lexer); diff --git a/gcc/testsuite/g++.dg/cpp1y/pr60377.C b/gcc/testsuite/g++.dg/cpp1y/pr60377.C new file mode 100644 index 000..4f6497c --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/pr60377.C @@ -0,0 +1,9 @@ +// PR c++/60377 +// { dg-options -std=c++1y } + +void foo(auto, void (f*)()); // { dg-error "expected" } + +struct A +{ + int i; +}; -- 1.9.0
C++ PATCH for c++/60379 (bogus error with loop in template)
We shouldn't complain if a loop condition isn't a valid constant expression; it isn't supposed to be. Tested x86_64-pc-linux-gnu, applying to trunk. commit 5266ff07b6077df5d7bf6d5748856a43e48ad6ee Author: Jason Merrill Date: Sat Mar 1 14:22:02 2014 -0500 PR c++/60379 * semantics.c (begin_maybe_infinite_loop): Use fold_non_dependent_expr_sfinae. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 9c1c29d..eaeeb24 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -503,7 +503,7 @@ begin_maybe_infinite_loop (tree cond) bool maybe_infinite = true; if (cond) { - cond = fold_non_dependent_expr (cond); + cond = fold_non_dependent_expr_sfinae (cond, tf_none); cond = maybe_constant_value (cond); maybe_infinite = integer_nonzerop (cond); } diff --git a/gcc/testsuite/g++.dg/template/loop1.C b/gcc/testsuite/g++.dg/template/loop1.C new file mode 100644 index 000..aa6d177 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/loop1.C @@ -0,0 +1,9 @@ +// PR c++/60379 + +template struct A { + void m_fn1(int p1) { +int *a; +while (p1 && *static_cast(static_cast(a))) + ; + } +};
Re: [C++ Patch] PR 58610
OK. Jason
Re: [PATCH] Fix PR c++/60377.
OK. Jason
Re: [C++ Patch] PR 60314 (ICE with decltype(auto))
OK. Jason
[patch, libgfortran] PR60148 Strings in NAMELIST do not honor DELIM= in open statement
Hi all, The attached patch fixes this by actually implementing it. I cleaned up some of the code by getting rid of the tmp_delim variables and adding a "mode" to write_character which is used to ignore delimiters when writing out variable names and other namelist parts. I will prepare a test case. Regression tested on x86_64. OK for trunk or hold for next stage? Regards, Jerry 2014-03-01 Jerry DeLisle PR libfortran/60148 * io/inquire.c (inquire_via_unit): In the case of DELIM_UNSPECIFIED set inquire return string to "NONE". * io/list_read.c (read_character): In the case of DELIM_NONE and namelists, complete the character read using the namelist variable length. * io/open.c (new_unit): Don't set delim status to none if not specified so that DELIM_UNSPECIFIED can be used later. * io/transfer.c (data_transfer_init): For namelist I/O, if the unit delim status is unspecified set the current status to quote. Otherwise, set current status to the unit status. * io/unit.c (get_internel_unit, init_unit): Remember to set flags_delim initially to DELIM_UNSPECIFIED so defaults come out correctly. * io/write.c (write_character): Add a new function argument "mode" to signify that raw output is to be used vs output with delimiters. If the mode is set to DELIM (1) proceed with delimiters. (list_formatted_write_scalar): Write the separator only if a delimiter was previously specified. Update the call to write_character with the mode argument given. (namelist_write_newline): Use the mode argument. (nml_write_obj): Use the mode argument. Remove use of tmp_delim. Write the semi-colon or comma correctly only when needed with using delimiters. Cleanup whitespace. (namelist_write): If delim is not specified in namelist I/O, default to using quotes. Get rid of the tmp_delim variable and use the new mode argument if write_character. Index: inquire.c === --- inquire.c (revision 208246) +++ inquire.c (working copy) @@ -523,6 +523,7 @@ inquire_via_unit (st_parameter_inquire *iqp, gfc_u switch (u->flags.delim) { case DELIM_NONE: + case DELIM_UNSPECIFIED: p = "NONE"; break; case DELIM_QUOTE: Index: list_read.c === --- list_read.c (revision 208246) +++ list_read.c (working copy) @@ -971,10 +971,24 @@ read_character (st_parameter_dt *dtp, int length _ default: if (dtp->u.p.namelist_mode) { + if (dtp->u.p.current_unit->delim_status == DELIM_NONE) + { + /* No delimiters so finish reading the string now. */ + int i; + push_char (dtp, c); + for (i = dtp->u.p.ionml->string_length; i > 1; i--) + { + if ((c = next_char (dtp)) == EOF) + goto done_eof; + push_char (dtp, c); + } + dtp->u.p.saved_type = BT_CHARACTER; + free_line (dtp); + return; + } unget_char (dtp, c); return; } - push_char (dtp, c); goto get_string; } Index: open.c === --- open.c (revision 208246) +++ open.c (working copy) @@ -332,17 +332,13 @@ new_unit (st_parameter_open *opp, gfc_unit *u, uni /* Checks. */ - if (flags->delim == DELIM_UNSPECIFIED) -flags->delim = DELIM_NONE; - else + if (flags->delim != DELIM_UNSPECIFIED + && flags->form == FORM_UNFORMATTED) { - if (flags->form == FORM_UNFORMATTED) - { - generate_error (&opp->common, LIBERROR_OPTION_CONFLICT, - "DELIM parameter conflicts with UNFORMATTED form in " - "OPEN statement"); - goto fail; - } + generate_error (&opp->common, LIBERROR_OPTION_CONFLICT, + "DELIM parameter conflicts with UNFORMATTED form in " + "OPEN statement"); + goto fail; } if (flags->blank == BLANK_UNSPECIFIED) Index: transfer.c === --- transfer.c (revision 208246) +++ transfer.c (working copy) @@ -2670,16 +2670,21 @@ data_transfer_init (st_parameter_dt *dtp, int read = !(cf & IOPARM_DT_HAS_DELIM) ? DELIM_UNSPECIFIED : find_option (&dtp->common, dtp->delim, dtp->delim_len, delim_opt, "Bad DELIM parameter in data transfer statement"); - + if (dtp->u.p.current_unit->delim_status == DELIM_UNSPECIFIED) -dtp->u.p.current_unit->delim_status = dtp->u.p.current_unit->flags.delim; +{ + if (ionml && dtp->u.p.current_unit->flags.delim == DELIM_UNSPECIFIED) + dtp->u.p.current_unit->delim_status = DELIM_QUOTE; + else + dtp->u.p.current_unit->delim_status = dtp->u.p.current_unit->flags.delim; +} /* Check the pad mode. */ dtp->u.p.current_unit->pad_status = !(cf & IOPARM_DT_HAS_PAD) ? PAD_UNSPECIFIED : find_option (&dtp->common, dtp->
[Committed, fortran] PR 60341 invalid union access on string comparison optimization
Hello, I have just regression tested and committed a patch fixing PR 60341 by adding two expression type checks before union accesses (it's the same as the one of the PR). I plan to backport tomorrow (4.8 and 4.7). Mikael Index: gcc/testsuite/gfortran.dg/str_comp_optimize_1.f90 === --- gcc/testsuite/gfortran.dg/str_comp_optimize_1.f90 (révision 0) +++ gcc/testsuite/gfortran.dg/str_comp_optimize_1.f90 (révision 208249) @@ -0,0 +1,22 @@ +! { dg-do compile } +! { dg-options "-ffrontend-optimize" } +! +! PR fortran/60341 +! An unguarded union access was wrongly enabling a frontend optimization on a +! string comparison, leading to an ICE. +! +! Original testcase from Steve Chapel . +! Reduced by Steven G. Kargl . +! + + subroutine modelg(ncm) + implicit none + integer, parameter :: pc = 30, pm = pc - 1 + integer i + character*4 catt(pm,2) + integer ncm,iatt(pm,pc) + do i=1,ncm + if (catt(i,1)//catt(i,2).eq.'central') exit + end do + iatt(i,4)=1 + end Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (révision 208248) +++ gcc/testsuite/ChangeLog (révision 208249) @@ -1,3 +1,8 @@ +2014-03-01 Mikael Morin + + PR fortran/60341 + * gfortran.dg/str_comp_optimize_1.f90: New test. + 2014-03-01 Oleg Endo PR target/60071 Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (révision 208248) +++ gcc/fortran/ChangeLog (révision 208249) @@ -1,3 +1,9 @@ +2014-03-01 Mikael Morin + + PR fortran/60341 + * frontend-passes.c (optimize_comparison): Guard two union accesses + with the corresponding tag checks. + 2014-02-28 Janus Weil PR fortran/60359 Index: gcc/fortran/frontend-passes.c === --- gcc/fortran/frontend-passes.c (révision 208248) +++ gcc/fortran/frontend-passes.c (révision 208249) @@ -1391,7 +1391,9 @@ optimize_comparison (gfc_expr *e, gfc_intrinsic_op /* Replace A // B < A // C with B < C, and A // B < C // B with A < C. */ if (op1->ts.type == BT_CHARACTER && op2->ts.type == BT_CHARACTER + && op1->expr_type == EXPR_OP && op1->value.op.op == INTRINSIC_CONCAT + && op2->expr_type == EXPR_OP && op2->value.op.op == INTRINSIC_CONCAT) { gfc_expr *op1_left = op1->value.op.op1;
[PATCH] [lto/55113] Fix use of -fshort-double with -flto for powerpc
This patch fixes lto/55113 for powerpc. Combining -fshort-double with -flto is now working fine. I attach patch and testcase (unsure if testcase is in the right place). Tested with target powerpc-abispe. 2014-03-01 Paulo Matos * c-family/c.opt: Add LTO FE support for fshort-double option. * tree-streamer.c (record_common_node): Assert we don't record nodes with type double. (preload_common_node): Skip type double, complex double and double pointer since it is now frontend dependent due to fshort-double option. 2014-03-01 Paulo Matos * gcc.target/powerpc/pr55113.c: New testcase. OK to commit? -- PMatos Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 208249) +++ gcc/c-family/c.opt (working copy) @@ -1141,7 +1141,7 @@ C++ ObjC++ Optimization Var(flag_rtti) I Generate run time type descriptor information fshort-double -C ObjC C++ ObjC++ Optimization Var(flag_short_double) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_double) Use the same size for double as for float fshort-enums Index: gcc/tree-streamer.c === --- gcc/tree-streamer.c (revision 208249) +++ gcc/tree-streamer.c (working copy) @@ -264,7 +264,8 @@ record_common_node (struct streamer_tree gcc_checking_assert (node != boolean_type_node && node != boolean_true_node - && node != boolean_false_node); + && node != boolean_false_node + && node != double_type_node); /* We have to make sure to fill exactly the same number of elements for all frontends. That can include NULL trees. @@ -315,10 +316,14 @@ preload_common_nodes (struct streamer_tr record_common_node (cache, sizetype_tab[i]); for (i = 0; i < TI_MAX; i++) -/* Skip boolean type and constants, they are frontend dependent. */ +/* Skip boolean type and constants. They are frontend dependent. + Skip double type, frontend dependent due to -fshort-double. */ if (i != TI_BOOLEAN_TYPE && i != TI_BOOLEAN_FALSE - && i != TI_BOOLEAN_TRUE) + && i != TI_BOOLEAN_TRUE + && i != TI_DOUBLE_TYPE + && i != TI_COMPLEX_DOUBLE_TYPE + && i != TI_DOUBLE_PTR_TYPE) record_common_node (cache, global_trees[i]); } Index: gcc/testsuite/gcc.target/powerpc/pr55113.c === --- gcc/testsuite/gcc.target/powerpc/pr55113.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr55113.c (working copy) @@ -0,0 +1,11 @@ +#include + +int main() +{ + static float f; + float a = 1.0; + float b = 2.0; + f = a + b * 1e-12; + printf("%f\n", f); + return 0; +}
[C++ Patch] PR 50025 - [DR 1288] C++0x initialization syntax doesn't work for class members of reference type
Built and tested on x86-64-linux. This is just a test case. 2014-03-01 Edward Smith-Rowland <3dw...@verizon.net> PR c++/50025 * g++.dg/cpp0x/pr50025.C: New. Index: g++.dg/cpp0x/pr50025.C === --- g++.dg/cpp0x/pr50025.C (revision 0) +++ g++.dg/cpp0x/pr50025.C (working copy) @@ -0,0 +1,40 @@ +// { dg-options "-std=gnu++11" } + +#include + +class A +{ +public: + + A(int a, int& b, int&& c) + : m_a{a}, +m_b{b}, +m_c{std::move(c)} + {} + +private: + + int m_a; + int& m_b; + int&& m_c; +}; + + +struct X {}; + +class B +{ +public: + + B(X& q, X&& r, const X& s) + : m_q{q}, +m_r{std::move(r)}, +m_s{s} + {} + +private: + + X& m_q; + X&& m_r; + const X& m_s; +};
[PATCH v2] Fix PR c++/25940
Hi, The following patch fixes two issues: the first issue is PR c++/25940 and the second is related to PR c++/13699. The first issue is that the C++ frontend fails to reject duplicate definitions of functions declared to have C language linkage. This results in the compiler emitting ASM that defines the same symbol more than once. The second issue is that upon encountering a redeclaration of an extern "C" function with a different exception specification, the C++ frontend fails to add the new declaration to the list of things declared in the current namespace (because we exit early from pushdecl_maybe_friend_1). This behavior makes the new declaration appear to be not in scope. I bootstrapped and regtested this patch on x86_64-unknown-linux-gnu. I had a few guality and cilk FAILs though I don't think they are related to this patch. I have not yet finalized my copyright assignment but I would appreciate any comments on the patch (this is my first substantive patch to GCC). Would this patch be OK for trunk once my copyright assignment is in place? Changes in v2: No longer return error_mark_node after emitting a redefinition error. Simplified the new testcase "extern-c-redecl6.C". CC'd the C++ maintainer. Regards, Patrick PR c++/25940 * name-lookup.c (pushdecl_maybe_friend_1): Emit an error when a function with C language linkage is redefined. Don't return error_mark_node after a function with C language linkage is redeclared with a different exception specification. (lookup_extern_c_fun_in_all_ns): Prefer returning an existing function definition instead of a declaration. PR c++/25940 * g++.dg/lookup/extern-c-redecl6.C: New test. * g++.dg/lookup/extern-c-redecl7.C: New test. --- gcc/cp/name-lookup.c | 23 ++- gcc/testsuite/g++.dg/lookup/extern-c-redecl6.C | 25 + gcc/testsuite/g++.dg/lookup/extern-c-redecl7.C | 12 3 files changed, 55 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/g++.dg/lookup/extern-c-redecl6.C create mode 100644 gcc/testsuite/g++.dg/lookup/extern-c-redecl7.C diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index ea16061..c797193 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -861,6 +861,14 @@ pushdecl_maybe_friend_1 (tree x, bool is_friend) tree x_exception_spec = NULL_TREE; tree previous_exception_spec = NULL_TREE; + if (DECL_INITIAL (x) && DECL_INITIAL (previous)) + { + error_at (input_location, + "redefinition of %q+#D with C language linkage", + x); + inform (input_location, + "%q+#D previously defined here", previous); + } x_exception_spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (x)); previous_exception_spec = @@ -877,7 +885,6 @@ pushdecl_maybe_friend_1 (tree x, bool is_friend) previous); pedwarn (input_location, 0, "due to different exception specifications"); - return error_mark_node; } if (DECL_ASSEMBLER_NAME_SET_P (previous)) SET_DECL_ASSEMBLER_NAME (x, @@ -2142,8 +2149,10 @@ binding_for_name (cp_binding_level *scope, tree name) } /* Walk through the bindings associated to the name of FUNCTION, - and return the first declaration of a function with a - "C" linkage specification, a.k.a 'extern "C"'. + and return the first definition of a function with a + "C" linkage specification, a.k.a 'extern "C"'. If no previous + definition exists, return the first declaration of the function. + This function looks for the binding, regardless of which scope it has been defined in. It basically looks in all the known scopes. Note that this function does not lookup for bindings of builtin functions @@ -2152,6 +2161,7 @@ static tree lookup_extern_c_fun_in_all_ns (tree function) { tree name; + tree ret = NULL_TREE; cxx_binding *iter; gcc_assert (function && TREE_CODE (function) == FUNCTION_DECL); @@ -2172,11 +2182,14 @@ lookup_extern_c_fun_in_all_ns (tree function) && DECL_EXTERN_C_P (decl) && !DECL_ARTIFICIAL (decl)) { - return decl; + if (DECL_INITIAL (decl)) + return decl; + if (!ret) + ret = decl; } } } - return NULL; + return ret; } /* Returns a list of C-linkage decls with the name NAME. */ diff --git a/gcc/testsuite/g++.dg/lookup/extern-c-redecl6.C b/gcc/testsuite/g++.dg/lookup/extern-c-redecl6.C new file mode 100644 index 000..7577d3c --- /dev/null +
Re: [PATCH v2] Fix PR c++/25940
On Sat, 1 Mar 2014, Patrick Palka wrote: + error_at (input_location, + "redefinition of %q+#D with C language linkage", + x); + inform (input_location, + "%q+#D previously defined here", previous); It seems strange that both the new and the old declarations are at the same location... -- Marc Glisse
Re: [PATCH v2] Fix PR c++/25940
On Sat, Mar 1, 2014 at 7:53 PM, Marc Glisse wrote: > On Sat, 1 Mar 2014, Patrick Palka wrote: > >> + error_at (input_location, >> + "redefinition of %q+#D with C language >> linkage", >> + x); >> + inform (input_location, >> + "%q+#D previously defined here", previous); > > > It seems strange that both the new and the old declarations are at the same > location... Isn't input_location is a symbolic value standing for the location of the decl argument that's being printed?
Re: [PATCH v4] PR middle-end/60281
Hi Bernd, set_mem_align is not working like magic. set_mem_align just set the alignment of a memory rtx.And You must aware that you do so because you are sure this rtx IS aligned like this. For arm machines, the base the virtual registers are aligned to 8 bytes.You can't just set_mem_align to get_best_mode(),because this is not the fact.If you do so,the unalign accessing is still there.My realignment code make sure the base of virtual registers is aligned to SImode,and so I can use that fact to set_mem_align. In a word,set_mem_align does not generate realignment code for STRICT_ALIGNMENT machines. As you say that's not the best mode on all machines.But let's fix one bug at a time.You are focusing too much in that mode. > So, that is what I mean: this patch makes the stack grow by > 32 bytes, just because the emit_stack_protection uses SImode, > with unaligned addresses which is not possible for ARM, and > not optimal for X86_64. It's because of the realignment,as my comment said.I have to make room for virtual registers's realignment,or they(or the spilled registers) will get damaged by the other subroutine. -- Regards lin zuojian
Re: [PATCH v4] PR middle-end/60281
Hi Bernd, You may send a patch too.Your idea will be more clear. -- Regards lin zuojian On Sun, Mar 02, 2014 at 10:24:52AM +0800, lin zuojian wrote: > Hi Bernd, > > set_mem_align is not working like magic. > > set_mem_align just set the alignment of a memory rtx.And You must aware > that you do so because you are sure this rtx IS aligned like this. > For arm machines, the base the virtual registers are aligned to 8 > bytes.You can't just set_mem_align to get_best_mode(),because this is > not the fact.If you do so,the unalign accessing is still there.My > realignment code make sure the base of virtual registers is aligned to > SImode,and so I can use that fact to set_mem_align. > > In a word,set_mem_align does not generate realignment code for > STRICT_ALIGNMENT machines. > > As you say that's not the best mode on all machines.But let's fix one > bug at a time.You are focusing too much in that mode. > > > So, that is what I mean: this patch makes the stack grow by > > 32 bytes, just because the emit_stack_protection uses SImode, > > with unaligned addresses which is not possible for ARM, and > > not optimal for X86_64. > It's because of the realignment,as my comment said.I have to make room > for virtual registers's realignment,or they(or the spilled registers) will > get damaged by the other subroutine. > > > -- > Regards > lin zuojian
Why my mail is not achived
Hi, my mail is not achived by http://gcc.gnu.org/ml/gcc-patches/2014-02/. What's happening? -- Regards lin zuojian
Re: Why my mail is not achived
Hi, I forgot now is Mar.I thought it's Feb.Sorry. -- Regards lin zuojian On Sun, Mar 02, 2014 at 10:57:15AM +0800, lin zuojian wrote: > Hi, > my mail is not achived by > http://gcc.gnu.org/ml/gcc-patches/2014-02/. What's happening? > > -- > Regards > lin zuojian
Re: Why my mail is not achived
On Sat, Mar 1, 2014 at 9:57 PM, lin zuojian wrote: > Hi, > my mail is not achived by > http://gcc.gnu.org/ml/gcc-patches/2014-02/. What's happening? That's last month's archive.
Re: Why my mail is not achived
Yeah, I have realized that. On Sat, Mar 01, 2014 at 10:01:41PM -0500, Patrick Palka wrote: > On Sat, Mar 1, 2014 at 9:57 PM, lin zuojian wrote: > > Hi, > > my mail is not achived by > > http://gcc.gnu.org/ml/gcc-patches/2014-02/. What's happening? > > That's last month's archive.
Re: [Fortran] RFC / RFA patch for using build_predict_expr instead of builtin_expect / PR 58721
Pre-remark: I currently get Stage 2/Stage 3 miscompares. As this is unlikely to be caused by my patch, it properly means that Honza's patch bitrotted even though it still applies. Dear all, attached is an updated patch set for this PR. Background: gfortran uses internally __builtin_expect. However, it recently changed from very likely (99%, if I recall correctly) to 90% (adaptable via a "--param"). As side-effect, some conditions for error paths in gfortran became more likely - even if they lead to no-return code as PRED_NORETURN is overridden by an explicit __builtin_expect (alias PRED_BUILTIN_EXPECT). I have attached two patches: a) Honza's patch, which he attached to the PR. (without changelog, rediffed - which essentially means that I have taken out the failing part, which seems to be already applied.) b) An updated patch of mine. Honza's patch permits to add, optionally, a PRED_* value to builtin_predict. My patch adds a bunch of PRED_* for Fortran, which should cover all existing use of builtin_expect; I have added a PRED_ for every different use I found. The PRED_* numbers I use are hopefully good guesses but I am open for suggestions for their numbers. Additional, one can argue whether the PRED_* names are well chosen or whether there are too many PRED_. On the Fortran side, I have replaced the builtin_expect by either build_predict_expr (which is added to the (basic) block which is (un)likely) and by the three-argument version of builtin_expect as we often use it like "(cont1 || builtin_expect(...))" which isn't representable with build_predict_expr. (However, I might have used build_predict_expr in some additional cases.) The patch was attempted to be bootstrapped on x86-64-gnu-linux. Tobias Am 15.12.2013 23:15, schrieb Jan Hubicka: Jan Hubicka wrote: Yep, they should not roduce incorrect code. Isn't the problem that you expect the whole expression to have value and predit_expr "reutrns" nothing? Can you check what lands into gimple? That could well be the case – I replace "0" by "{ built_predict; 0 }" and I wouldn't be surprised if the built_predict causes problem Yep, though this is also the case whrere you really want to predict a value rather than code path, so the extended bultin_expect is probably only resonable approach here. I am not really generic person, but perhaps it is a difference in between { built_predict; 0 } that is stmt with no value and built_predict,0 that is an expression with value? I will look into the builtin_expect extension. Then we can probably keem gfc_likely/unlikely with extra argument specifying the predictor for cases like this and use predict_expr in cases where you really produce runtime conditional like if (...) { bulitin_predict; abort ();} Honza as it returns 'nothing'. At least the basic block belonging to "else" () is empty: Original dump (4.8 and 4.9 with predict patch): sub1 (&v[S.0 + -1], (logical(kind=4)) __builtin_expect ((integer(kind=8)) (D.1897 == 0B), 0) ? 0B : &(*D.1897)[(S.0 + D.1901) * D.1903 + D.1898]); and sub1 (&v[S.0 + -1], D.1917 != 0B ? &(*D.1917)[(S.0 + D.1921) * D.1923 + D.1918] : (void) 0); Gimple of 4.8: if (D.1916 != 0) goto ; else goto ; : iftmp.2 = 0B; goto ; : ... iftmp.2 = &*D.1897[D.1922]; : .. sub1 (D.1924, iftmp.2); gimple of 4.9 with patch: if (D.1917 != 0B) goto ; else goto ; : D.1935 = S.0 + D.1921; D.1936 = D.1935 * D.1923; D.1937 = D.1936 + D.1918; iftmp.2 = &*D.1917[D.1937]; goto ; : : D.1939 = S.0 + -1; D.1940 = &v[D.1939]; sub1 (D.1940, iftmp.2); Tobias PS: That's for the code: implicit none type t2 integer, allocatable :: a integer, pointer :: p2(:) => null() end type t2 type(t2), save :: x integer, save :: s, v(2) call sub1 (v, x%p2) contains elemental subroutine sub1 (x, y) integer, intent(inout) :: x integer, intent(in), optional :: y end subroutine sub1 end gcc/predict.c | 79 +++-- gcc/predict.def | 5 2 files changed, 65 insertions(+), 19 deletions(-) diff --git a/gcc/predict.c b/gcc/predict.c index db5eed9..94f4df9 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -956,7 +956,8 @@ combine_predictions_for_bb (basic_block bb) struct edge_prediction *pred2; int prob = probability; - for (pred2 = (struct edge_prediction *) *preds; pred2; pred2 = pred2->ep_next) + for (pred2 = (struct edge_prediction *) *preds; + pred2; pred2 = pred2->ep_next) if (pred2 != pred && pred2->ep_predictor == pred->ep_predictor) { int probability2 = pred->ep_probability; @@ -1788,16 +1789,19 @@ guess_outgoing_edge_probabilities (basic_block bb) combine_predictions_for_insn (BB_END (bb), bb); } -static tree expr_expected_value (t