Re: [PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-18 Thread Christian Bruel

On 10/18/2013 12:53 AM, Oleg Endo wrote:
> Hi,
>
> On Thu, 2013-10-17 at 16:13 +0200, Christian Bruel wrote:
>> Hello,
>>
>> This patch just reorganizes the SH code used for memory builtins into
>> its own file, in preparation of the RTL strcmp hoisting in the next part.
>>
> Since GCC is now being compiled as C++, it's probably better to name
> newly added source files .cc instead of .c.  Could you please rename the
> new file to sh-mem.cc?
>
> Thanks,
> Oleg
Hello Oleg,

I have no objection to renaming a pure C file to a C++-suffixed file.
I'll conform to whatever the general guideline for pure C code is.

For now, it doesn't seem to be the tendency:

grep -i "ew File" ChangeLog | grep .c:
* gimple-builder.c: New File.
* config/winnt-c.c: New file
* ipa-profile.c: New file.
* ubsan.c: New file.
* ipa-devirt.c: New file.
* vtable-verify.c: New file.
* config/arm/aarch-common.c: ... here.  New file.
* diagnostic-color.c: New file.
* config/linux-android.c: New file.

I haven't seen any reference to this in the GCC coding guidelines;
should we prefer .cc, .cxx, .C, .cpp, or .c++?

Also I'm wondering if there is any plan to rename all files in the tree
so we have a consistent source tree.

Do we have a general recommendation from the general maintainers?

Many thanks

Christian





Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-18 Thread Christian Bruel
On 10/18/2013 01:05 AM, Oleg Endo wrote:
> I was wondering, in file sh-mem.c, the new function
> 'sh4_expand_cmpstr' ... why is it SH4-something?  It's a bit confusing,
> since cmp/str has been around since ever (i.e. since SH1). Maybe just
> rename it to 'sh_expand_cmpstr' instead?

Just historical (SH4* are our primary SH platforms). The code is
enabled/tested for all SH1 targets, of course. I will rename. Thanks.

>  Maybe just
> rename it to 'sh_expand_cmpstr' instead?  The function always returns
> 'true', so maybe just make it return 'void'?

Yes, it's for genericity, as I plan to reuse/specialize the code based on
the count parameter for strncmp to be contributed next.
>
> Also, in the expander ...
>
> +  [(set (match_operand:SI 0 "register_operand" "")
> + (compare:SI (match_operand:BLK 1 "memory_operand" "")
>
> ... no need to use empty "" constraints

OK, thanks

Christian

> Cheers,
> Oleg
>



Re: patch to canonize unsigned tree-csts

2013-10-18 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 16 Oct 2013, Kenneth Zadeck wrote:
>
>> On 10/15/2013 02:30 PM, Richard Sandiford wrote:
>> > Richard Sandiford  writes:
>> > >if (small_prec)
>> > >  ;
>> > >else if (precision == xprecision)
>> > >  while (len >= 0 && val[len - 1] == -1)
>> > >len--;
>> > Err, len > 0 obviously.
>> you were only close.  Patch tested on ppc and committed as revision 203739.
>> 
>> Index: gcc/tree.c
>> ===
>> --- gcc/tree.c  (revision 203701)
>> +++ gcc/tree.c  (working copy)
>> @@ -1204,7 +1204,7 @@ wide_int_to_tree (tree type, const wide_
>>  }
>> 
>>wide_int cst = wide_int::from (pcst, prec, sgn);
>> -  unsigned int len = int (cst.get_len ());
>> +  unsigned int len = cst.get_len ();
>>unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
>>bool recanonize = sgn == UNSIGNED
>>  && small_prec
>> Index: gcc/tree.h
>> ===
>> --- gcc/tree.h  (revision 203701)
>> +++ gcc/tree.h  (working copy)
>> @@ -5204,13 +5204,13 @@ wi::int_traits ::decompose (
>>   scratch[len - 1] = sext_hwi (val[len - 1], precision);
>>   return wi::storage_ref (scratch, len, precision);
>> }
>> -   }
>> -
>> -  if (precision < xprecision + HOST_BITS_PER_WIDE_INT)
>> -   {
>> - len = wi::force_to_size (scratch, val, len, xprecision, precision,
>> UNSIGNED);
>> - return wi::storage_ref (scratch, len, precision);
>> -   }
>> +   }
>> +  /* We have to futz here because a large unsigned int with
>> +precision 128 may look (0x0 0x 0xF...) as a
>> +tree-cst and as (0xF...) as a wide-int.  */
>> +  else if (precision == xprecision && len == max_len)
>> +while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
>> +  len--;
>>  }
>
> Err, that now undoes the extra zero word thing?  Or was I confused
> about the previous code and this "append extra zero word for
> MSB set unsigned constants"?

When the precision == xprecision yes.  Please see the discussion from
yesterday about this:

> >> The fundamental problem here is that we're trying to support two cases:
> >> 
> >> (a) doing N-bit arithemtic in cases where the inputs have N bits
> >> (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
> >> and are extended according to TYPE_SIGN.
> >> 
> >> Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
> >> 0x8000 regardless of whether the type is signed or unsigned.  But if it's
> >> extended to 32-bits you get two different numbers 0x00008000 and
> >> 0xffff8000, depending on the sign.
> [...]
> But extending the precision can change the right value of "len".
> Take the same example with 16-bit HWIs.  In wide_int terms, and with
> the original tree representation, the constant is a single HWI:
>
> 0x8000
>
> with len 1.  And in case (a) -- where we're asking for a 16-bit wide_int --
> this single HWI is all we want.  The signed and unsigned constants give
> the same wide_int.
>
> But the same constant extended to 32 bits and left "uncompressed" would be
> two HWIs:
>
> 0x0000 0x8000 for unsigned constants
> 0xffff 0x8000 for signed constants
>
> Compressed according to the sign scheme they are:
>
> 0x0000 0x8000 (len == 2) for unsigned constants
>        0x8000 (len == 1) for signed constants
>
> which is also the new tree representation.
>
> So the unsigned case is different for (a) and (b).

Thanks,
Richard
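
[Editorial note: the sign-compression rule in Richard's example can be sketched in a few lines.  This is an illustrative reduction using 16-bit "HWIs" as in the example above, not the actual wide-int code: an upper block is redundant exactly when it equals the sign-extension of the block below it.]

```cpp
#include <cassert>
#include <cstdint>

/* Sketch of the sign-compression rule with 16-bit "HWIs" (the real code
   uses HOST_WIDE_INT): drop each top block that equals the sign
   extension of the block below it.  */
static unsigned
compressed_len (const int16_t *val, unsigned len)
{
  while (len > 1)
    {
      int16_t ext = (val[len - 2] < 0) ? -1 : 0;  /* sign of next block */
      if (val[len - 1] != ext)
        break;
      len--;
    }
  return len;
}
```

So {0x8000, 0x0000} (the unsigned 0x00008000) keeps len == 2, while {0x8000, 0xffff} (the signed 0xffff8000) compresses to len == 1, matching the two cases above.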


Re: [PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-18 Thread Kaz Kojima
Christian Bruel  wrote:
> This patch just reorganizes the SH code used for memory builtins into
> its own file, in preparation of the RTL strcmp hoisting in the next part.
> 
> OK for trunk ?

A minor nit:

>   * gcc/config/sh/sh-mem (expand_block_move): Moved here.

lost .c?

Ok with that change.
I'd like to wait for comments from the experts about .c vs. .cc
argument, though I also have no objections to renaming.

Regards,
kaz


[PATCH][buildrobot] frv: Fix get_tree_code_name() conversion fallout

2013-10-18 Thread Jan-Benedict Glaw
Hi!

When building for frv-linux, I see some fallout (cf.
http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=view&id=91345):

g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I/home/vaxbuild/repos/gcc/gcc -I/home/vaxbuild/repos/gcc/gcc/. 
-I/home/vaxbuild/repos/gcc/gcc/../include 
-I/home/vaxbuild/repos/gcc/gcc/../libcpp/include  
-I/home/vaxbuild/repos/gcc/gcc/../libdecnumber 
-I/home/vaxbuild/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/vaxbuild/repos/gcc/gcc/../libbacktrace -o frv.o -MT frv.o -MMD -MP 
-MF ./.deps/frv.TPo /home/vaxbuild/repos/gcc/gcc/config/frv/frv.c
/home/vaxbuild/repos/gcc/gcc/config/frv/frv.c: In function ‘void 
frv_init_cumulative_args(int*, tree, rtx, tree, int)’:
/home/vaxbuild/repos/gcc/gcc/config/frv/frv.c:3097:51: error: invalid 
conversion from ‘int’ to ‘tree_code’ [-fpermissive]
  get_tree_code_name ((int)TREE_CODE (ret_type)));
   ^
make[1]: *** [frv.o] Error 1

Fixed like this:

2013-10-18  Jan-Benedict Glaw  

gcc/
* config/frv/frv.c (frv_init_cumulative_args): Fix wrong cast.

diff --git a/gcc/config/frv/frv.c b/gcc/config/frv/frv.c
index 41ae2bb..bcd5511 100644
--- a/gcc/config/frv/frv.c
+++ b/gcc/config/frv/frv.c
@@ -3094,7 +3094,7 @@ frv_init_cumulative_args (CUMULATIVE_ARGS *cum,
{
  tree ret_type = TREE_TYPE (fntype);
  fprintf (stderr, " return=%s,",
-  get_tree_code_name ((int)TREE_CODE (ret_type)));
+  get_tree_code_name (TREE_CODE (ret_type)));
}
 
   if (libname && GET_CODE (libname) == SYMBOL_REF)


Ok?
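
[Editorial note: the underlying C++ rule, in a reduced sketch with illustrative names (not GCC's real tree codes): C++ has no implicit int -> enum conversion, so the (int) cast re-creates the very error it was meant to silence, and passing the enum directly is the fix.]

```cpp
#include <cstring>

// Illustrative stand-ins for GCC's tree_code enum and get_tree_code_name.
enum tree_code_sketch { PLUS_EXPR_SKETCH, MINUS_EXPR_SKETCH };

static const char *
name_of (tree_code_sketch code)
{
  return code == PLUS_EXPR_SKETCH ? "plus_expr" : "minus_expr";
}

// name_of ((int) PLUS_EXPR_SKETCH);  // error in C++: no int -> enum conversion
// name_of (PLUS_EXPR_SKETCH);        // OK: pass the enum value directly
```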

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:   The real problem with C++ for kernel modules is:
the second  : the language just sucks.
   -- Linus Torvalds


signature.asc
Description: Digital signature


Re: [wide-int] int_traits

2013-10-18 Thread Richard Biener
On Thu, 17 Oct 2013, Mike Stump wrote:

> On Oct 17, 2013, at 1:46 AM, Richard Biener  wrote:
> > [This function shows another optimization issue:
> > 
> >case BOOLEAN_TYPE:
> >  /* Cache false or true.  */
> >  limit = 2;
> >  if (wi::leu_p (cst, 1))
> >ix = cst.to_uhwi ();
> > 
> > I would have expected cst <= 1 be optimized to cst.len == 1 &&
> > cst.val[0] <= 1.
> 
> So lts_p has the same problem, and is used just below.  If we do:
> 
> @@ -1461,12 +1470,22 @@ wi::ne_p (const T1 &x, const T2 &y)
>  inline bool
>  wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
>  {
> -  if (x.precision <= HOST_BITS_PER_WIDE_INT
> -  && y.precision <= HOST_BITS_PER_WIDE_INT)
> -return x.slow () < y.slow ();
> -  else
> -return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
> -   y.precision);
> +  // We optimize w < N, where N is 64 or fewer bits.
> +  if (y.precision <= HOST_BITS_PER_WIDE_INT)
> +{
> +  // If it fits directly into a shwi, we can compare directly.
> +  if (wi::fits_shwi_p (x))
> +   return x.slow () < y.slow ();
> +  // If it doesn't fit, then it must be more negative than any
> +  // value in y, and hence smaller.
> +  if (neg_p (x, SIGNED))
> +   return true;
> +  // If it is positive, then it must be larger than any value in y,
> +  // and hence greater than.
> +  return false;
> +}
> +  return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
> + y.precision);
>  }
> 
> then for:
> 
> bool MGEN(wide_int cst) {
>   return wi::lts_p (cst, 100);
> }
> 
> we generate:
> 
> movl520(%rsp), %eax
> cmpl$1, %eax
> je  .L5283
> subl$1, %eax
> movq8(%rsp,%rax,8), %rax
> shrq$63, %rax
> ret
> .p2align 4,,10
> .p2align 3
> .L5283:
> cmpq$99, 8(%rsp)
> setle   %al
> ret
> 
> which as you can see, never calls lts_p_large, and hence, if that was the 
> only reason the storage escapes, then it should be able to optimize better.  
> In the above code, minimal, no function calls, and the 100 is propagated all 
> the way down into the cmpq.  Now, the reason why we should do it this way
> is that, most of the time, most types are 64 bits or less.  When that is
> true, it will never call into lts_p_large.
> 
> If the above 100 is changed to l, from a parameter long l, then the code is 
> the same except the last part is:
> 
> cmpq8(%rsp), %rdi
> setg%al
> ret
> 
> If we have two wide_int's, then the code does this:
> 
> .cfi_startproc
> subq$8, %rsp
>   .cfi_def_cfa_offset 16
> movl1052(%rsp), %r9d
> movl1048(%rsp), %r8d
> movl532(%rsp), %edx
> movl528(%rsp), %esi
> cmpl$64, %r9d
> ja  .L5281
> cmpl$1, %esi
>   je  .L5284
> subl$1, %esi
> movq16(%rsp,%rsi,8), %rax
> addq$8, %rsp
> .cfi_remember_state
> .cfi_def_cfa_offset 8
> shrq$63, %rax
> ret
> .p2align 4,,10
> .p2align 3
> .L5281:
> .cfi_restore_state
> leaq536(%rsp), %rcx
> leaq16(%rsp), %rdi
> call_ZN2wi11lts_p_largeEPKljjS1_jj
>   addq$8, %rsp
> .cfi_remember_state
> .cfi_def_cfa_offset 8
> ret
> .p2align 4,,10
> .p2align 3
> .L5284:
> .cfi_restore_state
> movq536(%rsp), %rax
> cmpq%rax, 16(%rsp)
> setl%al
> addq$8, %rsp
> .cfi_def_cfa_offset 8
> ret
> .cfi_endproc
> 
> which is still pretty nice.
> 
> Does this look ok?  Kenny, can you double check the cases, think I have them 
> right, but?  a double check would be good.

That works for me.

> > the representation should guarantee the
> > compare with a low precision (32 bit) constant is evaluatable
> > at compile-time if len of the larger value is > 1, no?
> 
> I agree, though, I didn't check the symmetric case, as constants and 
> smaller values can be put on the right by convention.
>
> > The question is whether we should try to optimize wide-int for
> > such cases or simply not use wi:leu_p (cst, 1) but rather
> > 
> > if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)
> > 
> > ?
> 
> Since we can do an excellent job with the simple interface, I'd argue 
> for the simple interface and the simple change to do the conditional 
> better.  I'd rather up the ante on meta programming to get any 
> performance we want, retaining the simple interfaces.

Agreed.  Note that I'd make heavy use of __builtin_constant_p
in the inline implementation: if we cannot optimize the choice
at compile time, we'll both slow things down and bloat the code
to the extent that the inliner will no longer consider
it for inlining.  That is,

> +  // We optimize w
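
[Editorial note: Richard's __builtin_constant_p point can be illustrated with a reduced sketch (hypothetical names, not the actual wide-int code; __builtin_constant_p is a GCC/Clang extension).  When the right-hand operand's precision is a compile-time constant that fits a HWI, the inline fast path folds and the out-of-line helper is never emitted.]

```cpp
#include <cstdint>

// Stand-in for the out-of-line slow path (wi::lts_p_large in the real code).
static bool
lts_p_large_sketch (int64_t x, int64_t y)
{
  return x < y;
}

// Inline fast path: with a compile-time-constant small precision the
// condition folds, so only the direct compare survives after inlining.
static inline bool
lts_p_sketch (int64_t x, int64_t y, unsigned y_precision)
{
  if (__builtin_constant_p (y_precision) && y_precision <= 64)
    return x < y;
  return lts_p_large_sketch (x, y);
}
```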

Re: [wide-int] int_traits

2013-10-18 Thread Richard Biener
On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

> On 10/17/2013 10:05 AM, Richard Sandiford wrote:
> > Richard Biener  writes:
> > > > > What's the reason again to not use my original proposed encoding
> > > > > of the MSB being the sign bit?  RTL constants simply are all signed
> > > > > then.  Just you have to also sign-extend in functions like lts_p
> > > > > as not all constants are sign-extended.  But we can use both tree
> > > > > (with the now appended zero) and RTL constants representation
> > > > > unchanged.
> > > > The answer's the same as always.  RTL constants don't have a sign.
> > > > Any time we extend an RTL constant, we need to say whether the extension
> > > > should be sign extension or zero extension.  So the natural model for
> > > > RTL is that an SImode addition is done to SImode width, not some
> > > > arbitrary wider width.
> > > RTL constants are sign-extended (whether you call them then "signed"
> > > is up to you).  They have a precision.  This is how they are
> > > valid reps for wide-ints, and that doesn't change.
> > > 
> > > I was saying that if we make not _all_ wide-ints sign-extended
> > > then we can use the tree rep as-is.  We'd then have the wide-int
> > > rep being either zero or sign extended but not arbitrary random
> > > bits outside of the precision (same as the tree rep).
> > OK, but does that have any practical value over leaving them arbitrary,
> > as in Kenny's original implementation?  Saying that upper bits can be
> > either signs or zeros means that readers wouldn't be able to rely on
> > one particular extension (so would have to do the work that they did
> > in Kenny's implementation).  At the same time it means that writers
> > can't leave the upper bits in an arbitrary state, so would have to
> > make sure that they are signs or zeros (presumably having a free
> > choice of which).
> > 
> > Thanks,
> > Richard
> > 
> not so easy here.  We would still need to clean things up when we convert
> back to tree-cst or rtl, at least for the short precisions.  Otherwise, the

Sure.  But wide-int to tree / RTL is already expensive as we're going to
allocate memory and copy the rep anyway (and modify-on-copy makes the
modify part almost free I'd say).

> parts of the compiler that look inside of trees, in particular the hash
> functions, have to clean the values themselves.
> 
> As much as my arm still hurts from richi forcing me to change this, it is true
> that it seems to have been the right thing to do.
> 
> the change for the wider unsigned trees perhaps should be backed out.

Yeah, I suppose we can avoid the 2nd copy for the fixed_wide_int
interface in other ways.

Richard.


Re: [wide-int] int_traits

2013-10-18 Thread Richard Biener
On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

> On 10/17/2013 09:48 AM, Richard Biener wrote:
> > On Thu, 17 Oct 2013, Richard Sandiford wrote:
> > 
> > > Richard Biener  writes:
> > > > On Thu, 17 Oct 2013, Richard Sandiford wrote:
> > > > 
> > > > > Richard Biener  writes:
> > > > > > > The new tree representation can have a length greater than max_len
> > > > > > > for an unsigned tree constant that occupies a whole number of
> > > > > > > HWIs.
> > > > > > > The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
> > > > > > > When extended to max_wide_int the representation is the same.
> > > > > > > But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading
> > > > > > > zero.
> > > > > > > The MIN trims the length from 3 to 2 in the last case.
> > > > > > Oh, so it was the tree rep that changed?  _Why_ was it changed?
> > > > > > We still cannot use it directly from wide-int and the extra
> > > > > > word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE
> > > > > > ()).
> > > > > It means that we can now use the tree's HWI array directly, without
> > > > > any
> > > > > copying, for addr_wide_int and max_wide_int.  The only part of
> > > > > decompose ()
> > > > > that does a copy is the small_prec case, which is trivially compiled
> > > > > out
> > > > > for addr_wide_int and max_wide_int.
> > > > " 2) addr_wide_int.  This is a fixed size representation that is
> > > >   guaranteed to be large enough to compute any bit or byte sized
> > > >   address calculation on the target.  Currently the value is 64 + 4
> > > >   bits rounded up to the next number even multiple of
> > > >   HOST_BITS_PER_WIDE_INT (but this can be changed when the first
> > > >   port needs more than 64 bits for the size of a pointer).
> > > > 
> > > >   This flavor can be used for all address math on the target.  In
> > > >   this representation, the values are sign or zero extended based
> > > >   on their input types to the internal precision.  All math is done
> > > >   in this precision and then the values are truncated to fit in the
> > > >   result type.  Unlike most gimple or rtl intermediate code, it is
> > > >   not useful to perform the address arithmetic at the same
> > > >   precision in which the operands are represented because there has
> > > >   been no effort by the front ends to convert most addressing
> > > >   arithmetic to canonical types.
> > > > 
> > > >   In the addr_wide_int, all numbers are represented as signed
> > > >   numbers.  There are enough bits in the internal representation so
> > > >   that no information is lost by representing them this way."
> > > > 
> > > > so I guess from that that addr_wide_int.get_precision is always
> > > > that "64 + 4 rounded up".  Thus decompose gets that constant precision
> > > > input and the extra zeros make the necessary extension always a no-op.
> > > > Aha.
> > > > 
> > > > For max_wide_int the same rules apply, just its size is larger.
> > > > 
> > > > Ok.  So the reps are only canonical wide-int because we only
> > > > ever use them with precision > xprecision (maybe we should assert
> > > > that).
> > > No, we allow precision == xprecision for addr_wide_int and max_wide_int
> > > too.
> > > But all we do in that case is trim the length.
> > > 
> > > Requiring precision > xprecision was option (5) from my message.
> > > 
> > > > Btw, we are not using them directly, but every time we actually
> > > > build a addr_wide_int / max_wide_int we copy them anyway:
> > > > 
> > > > /* Initialize the storage from integer X, in precision N.  */
> > > > template 
> > > > template 
> > > > inline fixed_wide_int_storage ::fixed_wide_int_storage (const T &x)
> > > > {
> > > >/* Check for type compatibility.  We don't want to initialize a
> > > >   fixed-width integer from something like a wide_int.  */
> > > >WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
> > > >wide_int_ref xi (x, N);
> > > >len = xi.len;
> > > >for (unsigned int i = 0; i < len; ++i)
> > > >  val[i] = xi.val[i];
> > > > }
> > > Are you saying that:
> > > 
> > >   max_wide_int x = (tree) y;
> > > 
> > > should just copy a pointer?  Then what about:
> > No, it should do a copy.  But only one - which it does now with
> > the append-extra-zeros-in-tree-rep, I was just thinking how to
> > avoid it when not doing that which means adjusting the rep in
> > the copy we need to do anyway.  If fixed_wide_int_storage is
> > really the only reason to enlarge the tree rep.
> >   
> > >   max_wide_int z = x;
> > > 
> > > Should that just copy a pointer too, or is it OK for the second
> > > constructor
> > > to do a copy?
> > > 
> > > In most cases we only use the fixed_wide_int to store the result of the
> > > computation.  We don't use the above constructor for things like:
> > > 
> > >   max_wide_int x = wi::add ((tree) y, (int) z);
> > > 
> > > wi::add u

Re: [patch] Rename tree-flow.h to tree-cfg.h.

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 2:48 AM, Andrew MacLeod  wrote:
> This patch finally renames tree-flow.h to tree-cfg.h.  It now contains just
> the prototypes for tree-cfg.c.

Yay!

> I've also removed all the #include's from tree-cfg.h, and relocated them
> (temporarily) to tree-ssa.h which is acting as the tree-ssa module header.
>
> tree-flow.h is removed from all includes, and a few places were tweaked to
> include just the bit(s) they may have needed from the old tree-flow.h list.
> (for instance, gimple.c only needed bitmap.h, nothing else :-P)
>
> virtual_operand_p is removed from tree-ssa-operands.h and relocated to
> gimple.c.  It actually has nothing to do with operand processing, merely
> queries whether a given VAR_DECL is the global variable for VOPs.
> (VAR_DECL_CHECK (NODE)->base.u.bits.saturating_flag). So I don't think it
> even counts as a border routine for gimple-ssa.h. Moving this prevented
> tree-ssa-operands.h from being required in a couple of files that shouldn't
> need it.

You could argue it belongs to tree-ssa-alias.h as it is a GIMPLE alias
machinery specific var ...

> A set of 8 more patches will follow this (very simple ones :-) which reduce
> the include list out of tree-ssa.h to just files which are commonly used.
>
> rs6000 and alpha were including tree-flow.h in their config/target.c file. I
> built stage 1 for each target as a cross compiler to confirm builds don't
> break due to something unresolved.  rs6000.c no longer needed it, alpha
> needs num_ssa_names, so it requires gimple-ssa.h (instead of tree-flow.h).
> There is a compilation error in alpha (like on many other targets) for
> target_flags_explicit when it fails, that is the only unresolved
> external left the situation which existed before this patch.
>
> gimple-ssa.h cannot be compiled without tree-ssa-operands.h (it requires
> some structs and fields), so I #include tree-ssa-operands.h *back* into
> gimple-ssa.h where it was a week ago... not sure why I took it out.  oops.
>
> Bootstraps on x86_64-unknown-linux-gnu with no new regressions.
>
> OK?

Ok.

Thanks,
Richard.

> Andrew
>


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Zhenqiang Chen
On 18 October 2013 00:58, Jeff Law  wrote:
> On 10/17/13 05:03, Richard Biener wrote:

 Is it OK for trunk?
>>>
>>>
>>> I had a much simpler change which did basically the same from 4.7 (I
>>> can update it if people think this is a better approach).
>>
>>
>> I like that more (note you can now use is_gimple_condexpr as predicate
>> for force_gimple_operand).
>
> The obvious question is whether or not Andrew's simpler change picks up as
> many transformations as Zhenqiang's change.  If not are the things missed
> important.
>
> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?

Here is a rough comparison:

1) Andrew P.'s change cannot handle ssa-ifcombine-ccmp-3.c (included
in my patch). The root cause is that it does not skip "LABEL". The guard
to do this opt should be the same as the bb_has_overhead_p in my patch.

2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate

  _3 = a_2(D) > 0;
  _5 = b_4(D) > 0;
  _6 = _3 | _5;
  _9 = c_7(D) <= 0;
  _10 = ~_6;
  _11 = _9 & _10;
  if (_11 == 0)

With my patch, it will generate

  _3 = a_2(D) > 0;
  _5 = b_4(D) > 0;
  _6 = _3 | _5;
  _9 = c_7(D) > 0;
  _10 = _6 | _9;
  if (_10 != 0)

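[Editorial note: for reference, a function of the shape the ssa-ifcombine-ccmp-6.c test presumably exercises (the exact test body is an assumption): three ||-joined compares that ifcombine should collapse into the single chain of |-ed results shown in the gimple dumps above.]

```cpp
/* Assumed shape of the ssa-ifcombine-ccmp-6.c style input: the three
   compares should be combined into bitwise | operations feeding one
   conditional branch.  */
int
any_positive (int a, int b, int c)
{
  if (a > 0 || b > 0 || c > 0)
    return 1;
  return 0;
}
```
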
3) The good thing of Andrew P.'s change is that "Inverse the order of
the basic block walk" so it can do combine recursively.

But I think we need some heuristic to control the number of ifs. Moving
too many compares from the inner_bb to the outer_bb is not good.

4) Another good thing of Andrew P.'s change is that it reuses some
existing functions. So it looks much simpler.

>>
>> With that we should be able to kill the fold-const.c transform?
>
> That would certainly be nice and an excellent follow-up for Zhenqiang.

That's my final goal to "kill the fold-const.c transform". I think we
may combine the two changes to make a "simple" and "good" patch.

Thanks!
-Zhenqiang


Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-18 Thread Paolo Carlini


Hi,

>Oops, that's only the perfect Chinese version of "hello, world" under the
>UTF-8 charset. Now it's in hex format.
>
>I also made the char handling simpler.

Great, thanks!

Paolo



Re: [PATCH][buildrobot] frv: Fix get_tree_code_name() conversion fallout

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 10:23 AM, Jan-Benedict Glaw  wrote:
> Hi!
>
> When building for frv-linux, I see some fallout (cf.
> http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=view&id=91345):
>
> g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -I. -I. -I/home/vaxbuild/repos/gcc/gcc 
> -I/home/vaxbuild/repos/gcc/gcc/. -I/home/vaxbuild/repos/gcc/gcc/../include 
> -I/home/vaxbuild/repos/gcc/gcc/../libcpp/include  
> -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber 
> -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
> -I/home/vaxbuild/repos/gcc/gcc/../libbacktrace -o frv.o -MT frv.o -MMD -MP 
> -MF ./.deps/frv.TPo /home/vaxbuild/repos/gcc/gcc/config/frv/frv.c
> /home/vaxbuild/repos/gcc/gcc/config/frv/frv.c: In function ‘void 
> frv_init_cumulative_args(int*, tree, rtx, tree, int)’:
> /home/vaxbuild/repos/gcc/gcc/config/frv/frv.c:3097:51: error: invalid 
> conversion from ‘int’ to ‘tree_code’ [-fpermissive]
>   get_tree_code_name ((int)TREE_CODE (ret_type)));
>^
> make[1]: *** [frv.o] Error 1
>
> Fixed like this:
>
> 2013-10-18  Jan-Benedict Glaw  
>
> gcc/
> * config/frv/frv.c (frv_init_cumulative_args): Fix wrong cast.
>
> diff --git a/gcc/config/frv/frv.c b/gcc/config/frv/frv.c
> index 41ae2bb..bcd5511 100644
> --- a/gcc/config/frv/frv.c
> +++ b/gcc/config/frv/frv.c
> @@ -3094,7 +3094,7 @@ frv_init_cumulative_args (CUMULATIVE_ARGS *cum,
> {
>   tree ret_type = TREE_TYPE (fntype);
>   fprintf (stderr, " return=%s,",
> -  get_tree_code_name ((int)TREE_CODE (ret_type)));
> +  get_tree_code_name (TREE_CODE (ret_type)));
> }
>
>if (libname && GET_CODE (libname) == SYMBOL_REF)
>
>
> Ok?

Ok.

Thanks,
Richard.

> MfG, JBG
>
> --
>   Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
> Signature of:   The real problem with C++ for kernel modules is:
> the second  : the language just sucks.
>-- Linus Torvalds


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen
 wrote:
> On 18 October 2013 00:58, Jeff Law  wrote:
>> On 10/17/13 05:03, Richard Biener wrote:
>
> Is it OK for trunk?


 I had a much simpler change which did basically the same from 4.7 (I
 can update it if people think this is a better approach).
>>>
>>>
>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>> for force_gimple_operand).
>>
>> The obvious question is whether or not Andrew's simpler change picks up as
>> many transformations as Zhenqiang's change.  If not are the things missed
>> important.
>>
>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>
> Here is a rough comparison:
>
> 1) Andrew P.'s change cannot handle ssa-ifcombine-ccmp-3.c (included
> in my patch). The root cause is that it does not skip "LABEL". The guard
> to do this opt should be the same as the bb_has_overhead_p in my patch.

I think we want a "proper" predicate in tree-cfg.c for this, like maybe
a subset of tree_forwarder_block_p or whatever it will end up looking
like (we need "effectively empty BB" elsewhere, for example in vectorization,
add a flag to allow a condition ending the BB and the predicate is done).

> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>
>   _3 = a_2(D) > 0;
>   _5 = b_4(D) > 0;
>   _6 = _3 | _5;
>   _9 = c_7(D) <= 0;
>   _10 = ~_6;
>   _11 = _9 & _10;
>   if (_11 == 0)
>
> With my patch, it will generate
>
>   _3 = a_2(D) > 0;
>   _5 = b_4(D) > 0;
>   _6 = _3 | _5;
>   _9 = c_7(D) > 0;
>   _10 = _6 | _9;
>   if (_10 != 0)

But that seems like a missed simplification in predicate combining
which should be fixed more generally.

> 3) The good thing of Andrew P.'s change is that "Inverse the order of
> the basic block walk" so it can do combine recursively.
>
> But I think we need some heuristic to control the number of ifs. Moving
> too many compares from the inner_bb to the outer_bb is not good.

True, but that's what fold-const.c does, no?

> 4) Another good thing of Andrew P.'s change is that it reuses some
> existing functions. So it looks much simpler.

Indeed - that's what I like about it.

>>>
>>> With that we should be able to kill the fold-const.c transform?
>>
>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>
> That's my final goal to "kill the fold-const.c transform". I think we
> may combine the two changes to make a "simple" and "good" patch.

Thanks,
Richard.

> Thanks!
> -Zhenqiang


Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.

2013-10-18 Thread Richard Biener
On Thu, 17 Oct 2013, Cong Hou wrote:

> I tested this case with -fno-tree-loop-im and -fno-tree-pre, and it
> seems that GCC could hoist j+1 outside of the i loop:
> 
> t3.c:5:5: note: hoisting out of the vectorized loop: _10 = (sizetype) j_25;
> t3.c:5:5: note: hoisting out of the vectorized loop: _11 = _10 + 1;
> t3.c:5:5: note: hoisting out of the vectorized loop: _12 = _11 * 4;
> t3.c:5:5: note: hoisting out of the vectorized loop: _14 = b_13(D) + _12;
> t3.c:5:5: note: hoisting out of the vectorized loop: _15 = *_14;
> t3.c:5:5: note: hoisting out of the vectorized loop: _16 = _15 + 1;
> 
> 
> But your suggestion is still nice as it can remove a branch and make
> the code more brief. I have updated the patch and also included the
> nested loop example into the test case.

Ok if it passes bootstrap & regtest.

What is now missing is applying this kind of transform in the
case the invariant ref is found to not alias and thus we don't
require versioning for alias.
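
[Editorial note: the combined transform can be pictured at the source level as follows.  This is a hand-written sketch of loop versioning plus the new hoisting for test1-style code; the function name and shape are illustrative, not the vectorizer's actual output.]

```cpp
#include <cstdint>

/* Sketch: when the runtime alias check proves *b is not written by the
   loop, the load of *b is hoisted and the loop body is vectorizable;
   otherwise the original scalar loop runs.  */
void
test1_versioned (int *a, int *b)
{
  uintptr_t lo = (uintptr_t) a, hi = (uintptr_t) (a + 10);
  if ((uintptr_t) b < lo || (uintptr_t) b >= hi)
    {
      int t = *b + 1;                /* hoisted invariant load */
      for (int i = 0; i < 10; ++i)
        a[i] = t;
    }
  else
    for (int i = 0; i < 10; ++i)     /* fallback when b may alias a */
      a[i] = *b + 1;
}
```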

Thanks,
Richard.

> Thank you!
> 
> 
> Cong
> 
> 
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 8a38316..2637309 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2013-10-15  Cong Hou  
> +
> + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant
> + statement that contains data refs with zero-step.
> +
>  2013-10-14  David Malcolm  
> 
>   * dumpfile.h (gcc::dump_manager): New class, to hold state
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 075d071..9d0f4a5 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-10-15  Cong Hou  
> +
> + * gcc.dg/vect/pr58508.c: New test.
> +
>  2013-10-14  Tobias Burnus  
> 
>   PR fortran/58658
> diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c
> b/gcc/testsuite/gcc.dg/vect/pr58508.c
> new file mode 100644
> index 000..6484a65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c
> @@ -0,0 +1,70 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +
> +/* The GCC vectorizer generates loop versioning for the following loop
> +   since there may exist aliasing between A and B.  The predicate checks
> +   if A may alias with B across all iterations.  Then for the loop in
> +   the true body, we can assert that *B is a loop invariant so that
> +   we can hoist the load of *B before the loop body.  */
> +
> +void test1 (int* a, int* b)
> +{
> +  int i;
> +  for (i = 0; i < 10; ++i)
> +a[i] = *b + 1;
> +}
> +
> +/* A test case with nested loops.  The load of b[j+1] in the inner
> +   loop should be hoisted.  */
> +
> +void test2 (int* a, int* b)
> +{
> +  int i, j;
> +  for (j = 0; j < 10; ++j)
> +for (i = 0; i < 10; ++i)
> +  a[i] = b[j+1] + 1;
> +}
> +
> +/* A test case with ifcvt transformation.  */
> +
> +void test3 (int* a, int* b)
> +{
> +  int i, t;
> +  for (i = 0; i < 1; ++i)
> +{
> +  if (*b > 0)
> + t = *b * 2;
> +  else
> + t = *b / 2;
> +  a[i] = t;
> +}
> +}
> +
> +/* A test case in which the store in the loop can be moved outside
> +   in the versioned loop with alias checks.  Note this loop won't
> +   be vectorized.  */
> +
> +void test4 (int* a, int* b)
> +{
> +  int i;
> +  for (i = 0; i < 10; ++i)
> +*a += b[i];
> +}
> +
> +/* A test case in which the load and store in the loop to b
> +   can be moved outside in the versioned loop with alias checks.
> +   Note this loop won't be vectorized.  */
> +
> +void test5 (int* a, int* b)
> +{
> +  int i;
> +  for (i = 0; i < 10; ++i)
> +{
> +  *b += a[i];
> +  a[i] = *b;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "hoist" 8 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 574446a..1cc563c 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -2477,6 +2477,73 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
>  }
> 
> +
> +  /* Extract load statements on memrefs with zero-stride accesses.  */
> +
> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> +{
> +  /* In the loop body, we iterate each statement to check if it is a load.
> + Then we check the DR_STEP of the data reference.  If DR_STEP is zero,
> + then we will hoist the load statement to the loop preheader.  */
> +
> +  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> +  int nbbs = loop->num_nodes;
> +
> +  for (int i = 0; i < nbbs; ++i)
> + {
> +  for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]);
> +   !gsi_end_p (si);)
> +{
> +  gimple stmt = gsi_stmt (si);
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> +
> +  if (is_gimple_assign (stmt)
> +  && (!dr
> +  || (DR_IS_READ (dr) && integer_zerop (DR_S
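For reference, here is a standalone C sketch of what the versioning-plus-hoist
transform discussed in this thread conceptually produces for test1 above. This
is illustrative only: GCC performs the transform on GIMPLE, the function name
"test1_versioned" is made up, and the explicit pointer comparison stands in for
the alias predicate the vectorizer generates.

```c
#include <stdint.h>

/* Illustrative C for the transform above: loop versioning emits a
   runtime alias check; in the no-alias version the zero-step load of
   *b is hoisted out of the loop.  */
void test1_versioned (int *a, int *b)
{
  /* Runtime alias check inserted by versioning-for-alias (sketched
     with uintptr_t comparisons for illustration).  */
  if ((uintptr_t) (b + 1) <= (uintptr_t) a
      || (uintptr_t) (a + 10) <= (uintptr_t) b)
    {
      int t = *b + 1;                 /* hoisted: *b is loop-invariant */
      for (int i = 0; i < 10; ++i)
        a[i] = t;                     /* now trivially vectorizable */
    }
  else
    {
      /* Fallback version: a and b may overlap, keep the load inside.  */
      for (int i = 0; i < 10; ++i)
        a[i] = *b + 1;
    }
}
```

Both versions compute the same results; the point is only that the hoisted
variant has a branch-free, invariant-free loop body.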

Re: PR C++/58708 - string literal operator templates broken

2013-10-18 Thread Paolo Carlini

On 10/18/2013 04:42 AM, Ed Smith-Rowland wrote:
Here is a patch to correct the char type and the type and number of the
non-type parameter pack for user-defined string literal operator templates.

Let's make sure Jason is in CC.

Thanks!
Paolo.


Re: [PATCH][i386]Fix PR 57756

2013-10-18 Thread Dominique Dhumieres
Sriraman,

The tests gcc.target/i386/funcspec-5.c and gcc.target/i386/pr57756.c fail 
on targets for which -msse is the default (see
http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01365.html or
http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01345.html ).

This is fixed with the following patch:

diff -up ../_clean/gcc/testsuite/gcc.target/i386/funcspec-5.c 
gcc/testsuite/gcc.target/i386/funcspec-5.c
--- ../_clean/gcc/testsuite/gcc.target/i386/funcspec-5.c2011-08-23 
21:54:27.0 +0200
+++ gcc/testsuite/gcc.target/i386/funcspec-5.c  2013-10-17 09:45:20.0 
+0200
@@ -2,6 +2,7 @@
without error.  */
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
+/* { dg-options "-mno-sse" } */
 
 extern void test_abm (void)
__attribute__((__target__("abm")));
 extern void test_aes (void)
__attribute__((__target__("aes")));
diff -up ../_clean/gcc/testsuite/gcc.target/i386/pr57756.c 
gcc/testsuite/gcc.target/i386/pr57756.c
--- ../_clean/gcc/testsuite/gcc.target/i386/pr57756.c   2013-10-15 
23:53:31.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr57756.c 2013-10-17 09:46:53.0 
+0200
@@ -1,6 +1,7 @@
 /* callee cannot be inlined into caller because it has a higher
target ISA.  */
 /* { dg-do compile } */
+/* { dg-options "-mno-sse" } */
 
 __attribute__((always_inline,target("sse4.2")))
 __inline int callee () /* { dg-error "inlining failed in call to 
always_inline" }  */

My apologies if this has already been reported in the thread (I may have missed
some posts).

Dominique


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Zhenqiang Chen
On 18 October 2013 17:53, Richard Biener  wrote:
> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen
>  wrote:
>> On 18 October 2013 00:58, Jeff Law  wrote:
>>> On 10/17/13 05:03, Richard Biener wrote:
>>
>> Is it OK for trunk?
>
>
> I had a much simpler change which did basically the same from 4.7 (I
> can update it if people think this is a better approach).


 I like that more (note you can now use is_gimple_condexpr as predicate
 for force_gimple_operand).
>>>
>>> The obvious question is whether or not Andrew's simpler change picks up as
>>> many transformations as Zhenqiang's change.  If not are the things missed
>>> important.
>>>
>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>
>> Here is a rough compare:
>>
>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>> in my patch). Root cause is that it does not skip "LABEL". The guard
>> to do this opt should be the same as the bb_has_overhead_p in my patch.
>
> I think we want a "proper" predicate in tree-cfg.c for this, like maybe
> a subset of tree_forwarder_block_p or whatever it will end up looking
> like (we need "effectively empty BB" elsewhere, for example in vectorization,
> add a flag to allow a condition ending the BB and the predicate is done).
>
>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>
>>   _3 = a_2(D) > 0;
>>   _5 = b_4(D) > 0;
>>   _6 = _3 | _5;
>>   _9 = c_7(D) <= 0;
>>   _10 = ~_6;
>>   _11 = _9 & _10;
>>   if (_11 == 0)
>>
>> With my patch, it will generate
>>
>>   _3 = a_2(D) > 0;
>>   _5 = b_4(D) > 0;
>>   _6 = _3 | _5;
>>   _9 = c_7(D) > 0;
>>   _10 = _6 | _9;
>>   if (_10 != 0)
>
> But that seems like a missed simplification in predicate combining
> which should be fixed more generally.
>
>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>> the basic block walk" so it can do combine recursively.
>>
>> But I think we need some heuristic to control the number of ifs. Moving
>> too many compares from
>> the inner_bb to the outer_bb is not good.
>
> True, but that's what fold-const.c does, no?

Based on current fold-const, we cannot generate more than "two"
compares in a basic block.

But if ifs can be combined recursively in ifcombine, we might generate
many compares in a basic block.

>> 4) Another good thing of Andrew P.'s change is that it reuses some
>> existing functions. So it looks much simple.
>
> Indeed - that's what I like about it.
>

 With that we should be able to kill the fold-const.c transform?
>>>
>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>
>> That's my final goal to "kill the fold-const.c transform". I think we
>> may combine the two changes to make a "simple" and "good" patch.
>
> Thanks,
> Richard.
>
>> Thanks!
>> -Zhenqiang
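A minimal C source pattern of the kind ifcombine targets (a hypothetical
example, not taken from the patch): several conditional branches whose
conditions can be merged into one bitwise-combined test, as in the GIMPLE
dumps quoted above.

```c
/* Before ifcombine: three conditional branches.  After the transform
   the conditions are evaluated unconditionally and combined with
   bitwise-or, leaving a single branch, roughly:
     _t = (a > 0) | (b > 0) | (c > 0);  if (_t != 0) ...  */
int any_positive (int a, int b, int c)
{
  if (a > 0)
    return 1;
  if (b > 0)
    return 1;
  if (c > 0)
    return 1;
  return 0;
}
```

The semantics are unchanged either way; the discussion above is about whether
it is profitable to trade branches for speculatively evaluated compares, and
how many compares per block a cost model should allow.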


Re: RFC: Add of type-demotion pass

2013-10-18 Thread Richard Biener
On Thu, Oct 17, 2013 at 9:32 PM, Jeff Law  wrote:
> On 10/17/13 04:41, Richard Biener wrote:
>>>
>>> I don't see this as the major benefit of type demotion.  Yes, there is
>>> some
>>> value in shrinking constants and the like, but in my experience the
>>> benefits
>>> are relatively small and often get lost in things like partial register
>>> stalls on x86, the PA and probably others (yes, the PA has partial
>>> register
>>> stalls, it's just that nobody used that term).
>>>
>>> What I really want to get at here is avoiding having a large number of
>>> optimizers looking back through the use-def chains and attempting to
>>> elide
>>> typecasts in the middle of a chain of statements of interest.
>>
>>
>> Hmm, off the top of my head only forwprop and VRP look back through
>> use-def chains to elide typecasts.  And they do that to optimize those
>> casts, thus it is their job ...?  Other cases are around, but those
>> are of the sorts of "is op1 available in type X and/or can I safely cast
>> it to type X?" that code isn't going to be simplified by generic
>> promotion / demotion because that code isn't going to know what
>> type pass Y in the end wants.
>
> I strongly suspect if we were to look hard at why various optimizations
> weren't being applied in cases where intuitively we think they should, we'd
> find that type conversions are often the culprit.
>
> And so we'd go off fixing the vectorizer, DOM, and god knows what else to
> start looking through the type conversions.  I want to stop this before it
> starts.
>
> I'm *certain* that to do this well, we're going to need a mess of additional
> cases in tree-ssa-forwprop.c based on my prior investigations.  A large part
> of the reason I stopped with that work was I could already see the code was
> ultimately going to be an utter mess.
>
>
>
>> Abstracting functions that can answer those questions instead of
>> repeating N variants of it would of course be nice.
>
> Or we can move the type conversions out of the way so they don't impact our
> optimizers.

You can't move type conversions "out of the way" in most cases, as
GIMPLE is strongly typed and data sources and sinks can obviously not
be "promoted" (nor can function arguments).  So you'll very likely not
be able to remove the code from the optimizers; it will only maybe
trigger less often.

>> Likewise reducing the number of places we perform promotion / demotion
>> (remove it from frontend code and fold, add it in the GIMPLE combiner).
>>
>> Also making the GIMPLE combiner available as an utility to apply
>> to a single statement (see my very original GIMPLE-fold proposal)
>> would be very useful.
>
> I strongly believe the gimple combiner is not the place to handle
> promotion/demotion based on already working through some of these issues
> privately.  It was that investigative work which led me to look more closely
> at what Kai was doing with the promotion/demotion work.
>
>
>
>>
>> As for promotion / demotion (if you are not talking about applying
>> PROMOTE_MODE which rather forces promotion of variables and
>> requires inserting compensation code), you want to optimize
>>
>>   op1 = (T) op1';
>>   op2 = (T) op2';
>>   x = op1 OP op2; (*)
>>   y = (T2) x;
>>
>> to either carry out OP in type T2 or in a type derived from the types
>> of op1' and op2'.
>
> That's part of the benefit, but you also want to be able to  look at where
> op1' and op2' came from and possibly do something even more significant than
> just changing the type of OP.  Getting the casts out of the way makes that a
> lot easier.   And that's one of the reasons why you want both promotion and
> demotion, both expose those kind of opportunities.
>
>
>>
>> For the simple case combine-like pattern matching is ok.  It gets
>> more complicated if there are a series of statements here (*), but
>> even that case is handled by iteratively applying the combiner
>> patterns (which forwprop does).
>
> Right, but you're still missing the point that every time a type conversion
> appears in a stream of interesting statements you have to special-case
> the optimization to deal with the type conversions.
>
> With Kai's work that special casing goes away and thus our existing
> reassociation & forwprop passes do a better job without needing a ton of
> special cases.

See above - you can't remove the special casing.

>> If you split out promotion / demotion into a separate pass then
>> you introduce pass ordering issues as combining may introduce
>> promotion / demotion opportunities and the other way around.
>
> Right, which is why you promote, optimize, demote, optimize.  Both promotion
> and demotion have the potential to expose optimizable sequences.
>
> It's not perfect, but it's a hell of a lot better than what we do now.

I'm not sure ;)  Keep an eye on compile-time.

>> If we remove the ad-hoc frontend code and strip down fold then an
>> early combine phase (before CSE wrecks single-use cases) will
>> more reliably handle what fronte
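A concrete instance of the op1/op2 pattern quoted earlier in this thread
(hypothetical helper, for illustration only): narrow operands promoted to int,
combined, and truncated back, which demotion would carry out directly in the
narrow type.

```c
/* The pattern  op1 = (T) op1'; op2 = (T) op2'; x = op1 OP op2; y = (T2) x;
   with T = int, T2 = unsigned char, OP = +.  Because the result is
   truncated back to unsigned char, the addition may legally be demoted
   to an 8-bit operation (arithmetic modulo 256).  */
unsigned char add_wrap (unsigned char a, unsigned char b)
{
  int x = (int) a + (int) b;            /* promoted form, as written in C */
  unsigned char y = (unsigned char) x;  /* result demoted back to char    */
  return y;
}
```

This is the easy single-statement case; the debate above is about chains of
such statements and about whether a dedicated promotion/demotion pass or the
GIMPLE combiner should perform the rewrite.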

[Ping] Re: [C++ Patch] Tidy a bit cp_parser_lookup_name

2013-10-18 Thread Paolo Carlini

Hi

On 10/13/2013 08:26 PM, Paolo Carlini wrote:

Hi,

today I noticed that in cp_parser_lookup_name the code in the
object_type != NULL_TREE else branch can be tidied a bit and a
lookup_name_real call often avoided.  Tested x86_64-linux.
Can I apply this clean-up?  It seems pretty straightforward to me,
essentially a matter of logic, and saves a redundant name lookup.  I'm also
attaching a variant which avoids early initializing the decl local and
shaves about 120 bytes out of the release cc1plus compared to the first
version.


Thanks!
Paolo.

//
Index: parser.c
===
--- parser.c(revision 203517)
+++ parser.c(working copy)
@@ -21875,7 +21875,6 @@ cp_parser_lookup_name (cp_parser *parser, tree nam
 }
   else if (object_type)
 {
-  tree object_decl = NULL_TREE;
   /* Look up the name in the scope of the OBJECT_TYPE, unless the
 OBJECT_TYPE is not a class.  */
   if (CLASS_TYPE_P (object_type))
@@ -21883,19 +21882,21 @@ cp_parser_lookup_name (cp_parser *parser, tree nam
   be instantiated during name lookup.  In that case, errors
   may be issued.  Even if we rollback the current tentative
   parse, those errors are valid.  */
-   object_decl = lookup_member (object_type,
-name,
-/*protect=*/0,
-tag_type != none_type,
-tf_warning_or_error);
-  /* Look it up in the enclosing context, too.  */
-  decl = lookup_name_real (name, tag_type != none_type,
-  /*nonclass=*/0,
-  /*block_p=*/true, is_namespace, 0);
+   decl = lookup_member (object_type,
+ name,
+ /*protect=*/0,
+ tag_type != none_type,
+ tf_warning_or_error);
+  else
+   decl = NULL_TREE;
+
+  if (!decl)
+   /* Look it up in the enclosing context.  */
+   decl = lookup_name_real (name, tag_type != none_type,
+/*nonclass=*/0,
+/*block_p=*/true, is_namespace, 0);
   parser->object_scope = object_type;
   parser->qualifying_scope = NULL_TREE;
-  if (object_decl)
-   decl = object_decl;
 }
   else
 {


Re: [PATCH] Fix part of PR58712

2013-10-18 Thread Markus Trippelsdorf
On 2013.10.15 at 12:49 +0200, Richard Biener wrote:
> On Tue, Oct 15, 2013 at 12:31 PM, Markus Trippelsdorf
>  wrote:
> > Valgrind complains:
> > ==27870== Conditional jump or move depends on uninitialised value(s)
> > ==27870==at 0x557CDC: cgraph_create_edge_1(cgraph_node*, cgraph_node*, 
> > gimple_statement_d*, long, int) (cgraph.c:695)
> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
> > gimple_statement_d*, long, int) (cgraph.c:890)
> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
> > ==27870==by 0x7F1F14: copy_body(copy_body_data*, long, int, 
> > basic_block_def*, basic_block_def*, basic_block_def*) (tree-inline.c:1741)
> > ...
> > ==27870==  Uninitialised value was created by a client request
> > ==27870==at 0x50BBEE: ggc_internal_alloc_stat(unsigned long) 
> > (ggc-page.c:1339)
> > ==27870==by 0x557D92: cgraph_create_edge_1(cgraph_node*, cgraph_node*, 
> > gimple_statement_d*, long, int) (cgraph.c:842)
> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
> > gimple_statement_d*, long, int) (cgraph.c:890)
> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
> > ,,,
> >
> > This happens because e->indirect_unknown_callee may be uninitialized in
> > cgraph_add_edge_to_call_site_hash. Fixed by initializing it earlier.
> >
> > LTO bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Please apply, if this looks reasonable.
> 
> As we also have
> 
> struct cgraph_edge *
> cgraph_create_indirect_edge (struct cgraph_node *caller, gimple call_stmt,
>  int ecf_flags,
>  gcov_type count, int freq)
> {
>   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
>count, freq);
>   tree target;
> 
>   edge->indirect_unknown_callee = 1;
> 
> I'd rather change the cgraph_create_edge_1 interface to get an additional
> argument (the value to use for indirect_unknown_callee).  Or maybe
> we can statically compute it from the current arguments already?

IOW something like this:

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 124ee0adf855..ccd73fa21b80 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -814,7 +814,8 @@ cgraph_set_call_stmt (struct cgraph_edge *e, gimple 
new_stmt,
 
 static struct cgraph_edge *
 cgraph_create_edge_1 (struct cgraph_node *caller, struct cgraph_node *callee,
-  gimple call_stmt, gcov_type count, int freq)
+  gimple call_stmt, gcov_type count, int freq,
+  bool indir_unknown_callee)
 {
   struct cgraph_edge *edge;
 
@@ -874,6 +875,7 @@ cgraph_create_edge_1 (struct cgraph_node *caller, struct 
cgraph_node *callee,
   edge->indirect_info = NULL;
   edge->indirect_inlining_edge = 0;
   edge->speculative = false;
+  edge->indirect_unknown_callee = indir_unknown_callee;
   if (call_stmt && caller->call_site_hash)
 cgraph_add_edge_to_call_site_hash (edge);
 
@@ -887,9 +889,8 @@ cgraph_create_edge (struct cgraph_node *caller, struct 
cgraph_node *callee,
gimple call_stmt, gcov_type count, int freq)
 {
   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, callee, call_stmt,
-  count, freq);
+  count, freq, false);
 
-  edge->indirect_unknown_callee = 0;
   initialize_inline_failed (edge);
 
   edge->next_caller = callee->callers;
@@ -926,10 +927,9 @@ cgraph_create_indirect_edge (struct cgraph_node *caller, 
gimple call_stmt,
 gcov_type count, int freq)
 {
   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
-  count, freq);
+  count, freq, true);
   tree target;
 
-  edge->indirect_unknown_callee = 1;
   initialize_inline_failed (edge);
 
   edge->indirect_info = cgraph_allocate_init_indirect_info ();

-- 
Markus


Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-18 Thread Richard Biener
On Wed, Oct 9, 2013 at 3:09 AM, Bernd Edlinger
 wrote:
> Hi,
>
> On Mon, 30 Sep 2013 16:18:30, DJ Delorie wrote:
>>
>> As per my previous comments on this patch, I will not approve the
>> changes to the m32c backend, as they will cause real bugs in real
>> hardware, and violate the hardware's ABI. The user may use
>> -fno-strict-volatile-bitfields if they do not desire this behavior and
>> understand the consequences.
>>
>> I am not a maintainer for the rx and h8300 ports, but they are in the
>> same situation.
>>
>> To reiterate my core position: if the user defines a proper "volatile
>> int" bitfield, and the compiler does anything other than an int-sized
>> access, the compiler is WRONG. Any optimization that changes volatile
>> accesses to something other than what the user specified is a bug that
>> needs to be fixed before this option can be non-default.
>
> hmm, I just tried to use the latest 4.9 trunk to compile the example from
> the AAPCS document:
>
> struct s
> {
>   volatile int a:8;
>   volatile char b:2;
> };
>
> struct s ss;
>
> int
> main ()
> {
>   ss.a=1;
>   ss.b=1;
>   return 0;
> }
>
> and the resulting code is completely against the written AAPCS specification:
>
> main:
> @ Function supports interworking.
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> ldr r3, .L2
> ldrh r2, [r3]
> bic r2, r2, #254
> orr r2, r2, #1
> strh r2, [r3]   @ movhi
> ldrh r2, [r3]
> bic r2, r2, #512
> orr r2, r2, #256
> strh r2, [r3]   @ movhi
> mov r0, #0
> bx  lr
>
> two half-word accesses, to my total surprise!
>
> As it looks, -fstrict-volatile-bitfields is already totally broken,
> apparently in favour of the C++ memory model, at least on the write side.

Note that the C++ memory model restricts the _maximum_ memory region
we may touch - it does not constrain the minimum as AAPCS does.

What I would suggest is to have a -fgnu-strict-volatile-bit-fields
(or two levels of it) enabled by default on AAPCS targets which will
follow the AAPCS if it doesn't violate the maximum memory region
constraints of the C++ memory model.

I never claimed that the C++ memory model is good enough for AAPCS
but there was consensus that the default on AAPCS should not violate
the C++ memory model by default.

-fgnu-strict-volatile-bit-fields should be fully implementable in
get_best_mode if you pass down the desired AAPCS mode.

Richard.

> These are aligned accesses, not the packed structures, that was the
> only case where it used to work once.
>
> This must be fixed. I do not understand why we cannot agree, that
> at least the bug-fix part of Sandra's patch needs to be applied.
>
> Regards
> Bernd.


RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-18 Thread Bernd Edlinger
Hi,

On Thu, 17 Oct 2013 16:41:13, DJ Delorie wrote:
> I'm starting from an MCU that doesn't work right if GCC doesn't do
> what the user tells GCC to do. I added -fstrict-volatile-bitfields to
> tell gcc that it needs to be more strict than the standard allows for
> bitfield access, because without that flag, there's no way to force
> gcc to use a specific access width on bitfields. When I added that
> flag, some ARM folks chose to enable it in their target, because they
> felt they needed it. If different ARM folks feel otherwise, that's a
> target problem.
>
> If the user tells gcc that a particular 32-bit memory location should
> be treated as both a char and a long, then gcc has been given
> inconsistent information. The spec says it can do what it wants to
> access that memory, and it does.
>
> If the user tells gcc that a particular 16-bit memory location
> consists of 16-bits worth of "unsigned short", and the user has told
> gcc that it needs to be strict about accessing that field in the type
> specified, and gcc uses a 32-bit access anyway, gcc is wrong.
>
> I will agree with you 100% that gcc can do whatever the spec allows if
> the user does NOT specify -fstrict-volatile-bitfields, but the flag is
> there to tell gcc that the user needs stricter control than the spec
> demands. If the user uses that flag, *and* gives gcc information that
> is inconsistent with the use of that flag, then the user is wrong.
>
> I note that you assume GNU/Linux is involved, perhaps that's part of
> the problem. Maybe the Linux kernel needs gcc to ignore its bitfield
> types, but other ARM firmware may have other requirements. If you and
> the other ARM maintainers want to argue over whether
> -fstrict-volatile-bitfields is enabled by default for the ARM target,
> go for it. Just leave my targets alone.
>
>> where those semantics contradict the semantics of ISO C.
>
> Where in the spec does it say that a compiler MUST access an "unsigned
> short" bitfield with a 32-bit access? I've seen places where it says
> the compiler MAY or MIGHT do it, but not MUST.

Well, I'm starting to think that we can never follow the ARM AAPCS to the
letter.

But maybe we could implement -fstrict-volatile-bitfields in this way:

We use the container mode where possible.
It is always possible for well-formed bit-fields.

If necessary the user has to add anonymous bit field members, or
convert normal members to bit-fields.

But when the bit-field conflicts with either the hardware's alignment
requirements or with the C++11 memory model, we follow the latter.

This means that, even for volatile data:
1. we never write anything outside the BIT_FIELD_REPRESENTATIVE.
2. we allow unaligned packed data, but we may use multiple accesses for a
    single read/write op. Also in this case: no data store races outside the 
member.

Example:

struct {
  volatile  int a : 24;
  volatile char b;
};

This is not well-formed, and -fstrict-volatile-bitfields will have no effect on 
it.

The user has to re-write this structure to:

struct {
  volatile int a : 24;
  volatile char b : 8;
};

Maybe a warning for "not well-formed" bit-fields could be helpful...

What do you think?

Bernd.
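Bernd's two structs can be compared concretely. Under common ABIs the
rewritten version packs both members into a single int container, so one
container-wide access can implement a read or write of either member; the
layout is ABI-dependent, so the sizeof check below is an assumption that
holds on typical targets with 32-bit int.

```c
/* Not well-formed in Bernd's sense: a 24-bit int bit-field followed by
   a plain char member.  The char is not a bit-field, so container-mode
   access for 'a' cannot be guaranteed without also touching 'b'.  */
struct not_well_formed {
  volatile int  a : 24;
  volatile char b;
};

/* Well-formed rewrite: both members are bit-fields sharing the same
   int container, which the compiler can read/write with one int-sized
   access.  */
struct well_formed {
  volatile int  a : 24;
  volatile char b : 8;
};
```

A warning for the first form, as suggested above, would flag exactly the
layouts where -fstrict-volatile-bitfields cannot honor the container mode.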

Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen
 wrote:
> On 18 October 2013 17:53, Richard Biener  wrote:
>> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen
>>  wrote:
>>> On 18 October 2013 00:58, Jeff Law  wrote:
 On 10/17/13 05:03, Richard Biener wrote:
>>>
>>> Is it OK for trunk?
>>
>>
>> I had a much simpler change which did basically the same from 4.7 (I
>> can update it if people think this is a better approach).
>
>
> I like that more (note you can now use is_gimple_condexpr as predicate
> for force_gimple_operand).

 The obvious question is whether or not Andrew's simpler change picks up as
 many transformations as Zhenqiang's change.  If not are the things missed
 important.

 Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>
>>> Here is a rough compare:
>>>
>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>> in my patch). Root cause is that it does not skip "LABEL". The guard
 to do this opt should be the same as the bb_has_overhead_p in my patch.
>>
>> I think we want a "proper" predicate in tree-cfg.c for this, like maybe
>> a subset of tree_forwarder_block_p or whatever it will end up looking
>> like (we need "effectively empty BB" elsewhere, for example in vectorization,
>> add a flag to allow a condition ending the BB and the predicate is done).
>>
>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) <= 0;
>>>   _10 = ~_6;
>>>   _11 = _9 & _10;
>>>   if (_11 == 0)
>>>
>>> With my patch, it will generate
>>>
>>>   _3 = a_2(D) > 0;
>>>   _5 = b_4(D) > 0;
>>>   _6 = _3 | _5;
>>>   _9 = c_7(D) > 0;
>>>   _10 = _6 | _9;
>>>   if (_10 != 0)
>>
>> But that seems like a missed simplification in predicate combining
>> which should be fixed more generally.
>>
>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>> the basic block walk" so it can do combine recursively.
>>>
>>> But I think we need some heuristic to control the number of ifs. Moving
>>> too many compares from
>>> the inner_bb to the outer_bb is not good.
>>
>> True, but that's what fold-const.c does, no?
>
Based on current fold-const, we cannot generate more than "two"
compares in a basic block.

I think you are wrong as fold is invoked recursively on a && b && c && d.

> But if ifs can be combined recursively in ifcombine, we might generate
> many compares in a basic block.

Yes, we can end up with that for the other simplifications done in
ifcombine as well.  Eventually there needs to be a cost model for this
(use edge probabilities for example, so that if you have profile data
you never combine cold blocks).

Any takers?

Richard.

>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>> existing functions. So it looks much simple.
>>
>> Indeed - that's what I like about it.
>>
>
> With that we should be able to kill the fold-const.c transform?

 That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>
>>> That's my final goal to "kill the fold-const.c transform". I think we
>>> may combine the two changes to make a "simple" and "good" patch.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks!
>>> -Zhenqiang


Re: [PATCH] Fix part of PR58712

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 12:10 PM, Markus Trippelsdorf
 wrote:
> On 2013.10.15 at 12:49 +0200, Richard Biener wrote:
>> On Tue, Oct 15, 2013 at 12:31 PM, Markus Trippelsdorf
>>  wrote:
>> > Valgrind complains:
>> > ==27870== Conditional jump or move depends on uninitialised value(s)
>> > ==27870==at 0x557CDC: cgraph_create_edge_1(cgraph_node*, cgraph_node*, 
>> > gimple_statement_d*, long, int) (cgraph.c:695)
>> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
>> > gimple_statement_d*, long, int) (cgraph.c:890)
>> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
>> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
>> > ==27870==by 0x7F1F14: copy_body(copy_body_data*, long, int, 
>> > basic_block_def*, basic_block_def*, basic_block_def*) (tree-inline.c:1741)
>> > ...
>> > ==27870==  Uninitialised value was created by a client request
>> > ==27870==at 0x50BBEE: ggc_internal_alloc_stat(unsigned long) 
>> > (ggc-page.c:1339)
>> > ==27870==by 0x557D92: cgraph_create_edge_1(cgraph_node*, cgraph_node*, 
>> > gimple_statement_d*, long, int) (cgraph.c:842)
>> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
>> > gimple_statement_d*, long, int) (cgraph.c:890)
>> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
>> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
>> > ...
>> >
>> > This happens because e->indirect_unknown_callee may be uninitialized in
>> > cgraph_add_edge_to_call_site_hash. Fixed by initializing it earlier.
>> >
>> > LTO bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > Please apply, if this looks reasonable.
>>
>> As we also have
>>
>> struct cgraph_edge *
>> cgraph_create_indirect_edge (struct cgraph_node *caller, gimple call_stmt,
>>  int ecf_flags,
>>  gcov_type count, int freq)
>> {
>>   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
>>count, freq);
>>   tree target;
>>
>>   edge->indirect_unknown_callee = 1;
>>
>> I'd rather change the cgraph_create_edge_1 interface to get an additional
>> argument (the value to use for indirect_unknown_callee).  Or maybe
>> we can statically compute it from the current arguments already?
>
> IOW something like this:

Looks good to me.

Honza?

Thanks,
Richard.

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 124ee0adf855..ccd73fa21b80 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -814,7 +814,8 @@ cgraph_set_call_stmt (struct cgraph_edge *e, gimple 
> new_stmt,
>
>  static struct cgraph_edge *
>  cgraph_create_edge_1 (struct cgraph_node *caller, struct cgraph_node *callee,
> -  gimple call_stmt, gcov_type count, int freq)
> +  gimple call_stmt, gcov_type count, int freq,
> +  bool indir_unknown_callee)
>  {
>struct cgraph_edge *edge;
>
> @@ -874,6 +875,7 @@ cgraph_create_edge_1 (struct cgraph_node *caller, struct 
> cgraph_node *callee,
>edge->indirect_info = NULL;
>edge->indirect_inlining_edge = 0;
>edge->speculative = false;
> +  edge->indirect_unknown_callee = indir_unknown_callee;
>if (call_stmt && caller->call_site_hash)
>  cgraph_add_edge_to_call_site_hash (edge);
>
> @@ -887,9 +889,8 @@ cgraph_create_edge (struct cgraph_node *caller, struct 
> cgraph_node *callee,
> gimple call_stmt, gcov_type count, int freq)
>  {
>struct cgraph_edge *edge = cgraph_create_edge_1 (caller, callee, call_stmt,
> -  count, freq);
> +  count, freq, false);
>
> -  edge->indirect_unknown_callee = 0;
>initialize_inline_failed (edge);
>
>edge->next_caller = callee->callers;
> @@ -926,10 +927,9 @@ cgraph_create_indirect_edge (struct cgraph_node *caller, 
> gimple call_stmt,
>  gcov_type count, int freq)
>  {
>struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
> -  count, freq);
> +  count, freq, true);
>tree target;
>
> -  edge->indirect_unknown_callee = 1;
>initialize_inline_failed (edge);
>
>edge->indirect_info = cgraph_allocate_init_indirect_info ();
>
> --
> Markus


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Zhenqiang Chen
On 18 October 2013 18:15, Richard Biener  wrote:
> On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen
>  wrote:
>> On 18 October 2013 17:53, Richard Biener  wrote:
>>> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen
>>>  wrote:
 On 18 October 2013 00:58, Jeff Law  wrote:
> On 10/17/13 05:03, Richard Biener wrote:

 Is it OK for trunk?
>>>
>>>
>>> I had a much simpler change which did basically the same from 4.7 (I
>>> can update it if people think this is a better approach).
>>
>>
>> I like that more (note you can now use is_gimple_condexpr as predicate
>> for force_gimple_operand).
>
> The obvious question is whether or not Andrew's simpler change picks up as
> many transformations as Zhenqiang's change.  If not are the things missed
> important.
>
> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?

 Here is a rough compare:

 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
 in my patch). Root cause is that it does not skip "LABEL". The guard
 to do this opt should be the same the bb_has_overhead_p in my patch.
>>>
>>> I think we want a "proper" predicate in tree-cfg.c for this, like maybe
>>> a subset of tree_forwarder_block_p or whatever it will end up looking
>>> like (we need "effectively empty BB" elsewhere, for example in 
>>> vectorization,
>>> add a flag to allow a condition ending the BB and the predicate is done).
>>>
 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
 efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate

   _3 = a_2(D) > 0;
   _5 = b_4(D) > 0;
   _6 = _3 | _5;
   _9 = c_7(D) <= 0;
   _10 = ~_6;
   _11 = _9 & _10;
   if (_11 == 0)

 With my patch, it will generate

   _3 = a_2(D) > 0;
   _5 = b_4(D) > 0;
   _6 = _3 | _5;
   _9 = c_7(D) > 0;
   _10 = _6 | _9;
   if (_10 != 0)
>>>
>>> But that seems like a missed simplification in predicate combining
>>> which should be fixed more generally.
>>>
 3) The good thing of Andrew P.'s change is that "Inverse the order of
 the basic block walk" so it can do combine recursively.

 But I think we need some heuristic to control the number of ifs. Move
 too much compares from
 the inner_bb to outer_bb is not good.
>>>
>>> True, but that's what fold-const.c does, no?
>>
>> Based on current fold-const, we can not generate more than "two"
>> compares in a basic block.
>
> I think you are wrong as fold is invoked recursively on a && b && c && d.

Please try to compile it with -fdump-tree-gimple. It will be broken into

 if (a && b)
   if (c && d)

>> But if ifs can be combined recursively in ifcombine, we might generate
>> many compares in a basic block.
>
> Yes, we can end up with that for the other simplifications done in
> ifcombine as well.  Eventually there needs to be a cost model for this
> (use edge probabilities for example, so that if you have profile data
> you never combine cold blocks).
>
> Any takers?

I will take it.

Thanks!
-Zhenqiang

> Richard.
>
 4) Another good thing of Andrew P.'s change is that it reuses some
 existing functions. So it looks much simple.
>>>
>>> Indeed - that's what I like about it.
>>>
>>
>> With that we should be able to kill the fold-const.c transform?
>
> That would certainly be nice and an excellent follow-up for Zhenqiang.

 That's my final goal to "kill the fold-const.c transform". I think we
 may combine the two changes to make a "simple" and "good" patch.
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks!
 -Zhenqiang


[PATCH] Fix PR58775

2013-10-18 Thread Zhenqiang Chen
Hi,

The patch fixes PR58775.

Bootstrap and no make check regression on X86-64?

Is it OK?

Thanks!
-Zhenqiang

ChangeLog:
2013-10-18  Zhenqiang Chen  

PR tree-optimization/58775
* tree-ssa-reassoc.c (next_stmt_of): New function.
(not_dominated_by): Check next_stmt_of for new statement.
(rewrite_expr_tree): Call ensure_ops_are_available.

testsuite/ChangeLog:
2013-10-18  Zhenqiang Chen  

* gcc.dg/tree-ssa/pr58775.c: New test case.


pr58775.patch
Description: Binary data


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Zhenqiang Chen
On 18 October 2013 10:13, Andrew Pinski  wrote:
> On Thu, Oct 17, 2013 at 6:39 PM, Andrew Pinski  wrote:
>> On Thu, Oct 17, 2013 at 4:03 AM, Richard Biener
>>  wrote:
>>> On Thu, Oct 17, 2013 at 4:14 AM, Andrew Pinski  wrote:
 On Wed, Oct 16, 2013 at 2:12 AM, Zhenqiang Chen
  wrote:
> Hi,
>
> The patch enhances ifcombine pass to recover some non short circuit
> branches. Basically, it will do the following transformation:
>
> Case 1:
>   if (cond1)
> if (cond2)
> ==>
>   if (cond1 && cond2)
>
> Case 2:
>   if (cond1)
> goto L1
>   if (cond2)
> goto L1
> ==>
>   if (cond1 || cond2)
> goto L1
>
> Case 3:
>   if (cond1)
> goto L1
>   else
> goto L2
> L1:
>   if (cond2)
> goto L2
> ==>
>   if (invert (cond1) || cond2)
> goto L2
>
> Case 4:
>   if (cond1)
> goto L1
>   if (cond2)
> goto L2
> L1:
> ==>
>   if (invert (cond1) && cond2)
> goto L2
>
> Bootstrap on X86-64 and ARM.
>
> Two new FAILs in regression tests:
> gcc.dg/uninit-pred-8_b.c
> gcc.dg/uninit-pred-9_b.c
> uninit pass should be enhanced to handle more complex conditions. Will
> submit a bug to track it and fix it later.
>
> Is it OK for trunk?

 I had a much simpler change which did basically the same from 4.7 (I
 can update it if people think this is a better approach).
>>>
>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>> for force_gimple_operand).
>>
>>
>> Ok, with both this email and Jakub's email, I decided to port the
>> patch to the trunk but I ran into a bug in reassoc which I submitted
>> as PR 58775 (with a testcase which shows the issue without this
>> patch).
>
> I forgot to say that the tree-ssa-ifcombine.c hunk of the 4.7 patch
> applies correctly.  Once PR 58775 is fixed, I will be testing and
> submitting the full patch including with the testcases from Zhenqiang.

I just post a patch to fix PR 58775. With the patch, your changes can
bootstrap on X86-64.

Thanks!
-Zhenqiang

> Thanks,
> Andrew
>
>>
>>
>>>
>>> With that we should be able to kill the fold-const.c transform?
>>
>>
>> I think so but I have never tested removing it.
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks,
 Andrew Pinski


 2012-09-29  Andrew Pinski  

 * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
 (ifcombine_ifandif): Handle cases where
 maybe_fold_and_comparisons fails, combining the branches
 anyways.
 (tree_ssa_ifcombine): Inverse the order of
 the basic block walk, increases the number of combinings.
 * Makefile.in (tree-ssa-ifcombine.o): Update dependencies.

 * testsuite/gcc.dg/tree-ssa/phi-opt-2.c: Expect zero ifs as the 
 compiler
 produces a & b now.
 * testsuite/gcc.dg/tree-ssa/phi-opt-9.c: Use a function call
 to prevent conditional move to be used.
 * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for
 "one or more intermediate".


>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2013-10-16  Zhenqiang Chen  
>
> * fold-const.c (simple_operand_p_2): Make it global.
> * tree.h (simple_operand_p_2): Declare it.
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (bb_has_overhead_p, generate_condition_node,
> ifcombine_ccmp): New functions.
> (ifcombine_fold_ifandif): New function, extracted from
> ifcombine_ifandif.
> (ifcombine_ifandif): Call ifcombine_ccmp.
> (tree_ssa_ifcombine_bb): Skip optimized bb.
>
> testsuite/ChangeLog
> 2013-10-16  Zhenqiang Chen  
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated.


Re: [PATCH] Portable Volatility Warning

2013-10-18 Thread Richard Biener
On Wed, Oct 16, 2013 at 2:48 PM, Bernd Edlinger
 wrote:
> Hi Richard,
>
> well I think I have now a solution for both of your comments on the
> initial version of the portable volatility warning patch.
> Furthermore I have integrated Sandra's comments.
>
> Therefore I think it might be worth another try, if you don't mind.
>
> Technically this patch is not dependent on Sandra's patch any more.
> However it may be useful to use -Wportable-volatility together with
> -fno-strict-volatile-bitmaps, to get correct code, together with a warning
> where the device registers are accessed in possible wrong way.
> As we know the -fstrict-volatile-bitmaps generates wrong
> code most of the time, but the warning does not cover all cases
> where -fstrict-volatile-bitmaps generates wrong code
> without Sandra's patch.
>
> On Tue, 3 Sep 2013 10:53:10, Richard Biener wrote:
>> Just a quick first note:
>>
>> @@ -3470,6 +3470,13 @@ optimize_bit_field_compare (location_t l
>> || offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR)
>> return 0;
>>
>> + /* Do not try to optimize on volatiles, as it would defeat the
>> + -Wportable-volatility warning later:
>> + If the COMPONENT_REF is replaced by a BIT_FIELD_REF we lose
>> + information and thus the warning cannot be generated correctly. */
>> + if (warn_portable_volatility && lvolatilep)
>> + return 0;
>>
>> changing code-generation with a warning option looks wrong to me. Note
>> that I completely removed this code once (it also generates strange
>> BIT_FIELD_REFs) but re-instantiated it on request because of lost
>> optimizations.
>>
>
> I just emit the warning, when the COMPONENT_REF is about to be replaced,
> but leave the code generation flow intact.
>
>> Similar for
>>
>> @@ -609,11 +609,14 @@ layout_decl (tree decl, unsigned int kno
>> occupying a complete byte or bytes on proper boundary,
>> and not -fstrict-volatile-bitfields. If the latter is set,
>> we unfortunately can't check TREE_THIS_VOLATILE, as a cast
>> - may make a volatile object later. */
>> + may make a volatile object later.
>> + Handle -Wportable-volatility analogously, as we would lose
>> + information and defeat the warning later.
>> if (TYPE_SIZE (type) != 0
>> && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
>> && GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT
>> - && flag_strict_volatile_bitfields <= 0)
>> + && flag_strict_volatile_bitfields <= 0
>> + && warn_portable_volatility == 0)
>> {
>> enum machine_mode xmode
>> = mode_for_size_tree (DECL_SIZE (decl), MODE_INT, 1);
>>
>> I also wonder about "we unfortunately can't check TREE_THIS_VOLATILE, as a 
>> cast
>> may make a volatile object later." - does that mean that AAPCS requires
>>
>> struct X { int x : 8; char c; };
>>
>> void foo (struct X *p)
>> {
>> ((volatile struct X *)p)->x = 1;
>> }
>>
>> to be treated as volatile bitfield access? Similar, is
>>
>> volatile struct X x;
>> x.x = 1;
>>
>> a volatile bitifled access?
>>
>> I think the warning can be completely implemented inside struct-layout.c
>> for example in finish_bitfield_representative (if you pass that the first 
>> field
>> in the group, too). Of course that is at the expense of warning for
>> struct declarations rather than actual code differences (the struct may
>> not be used)
>>
>> Richard.
>
>
> Bootstrapped and regression-tested on x86_64-linux-gnu.
>
> OK for trunk?

Hmm, I see the issue with optimize_bit_field_compare (though I
see a -fstrict-volatile-bitfields early-out in that function).  The folding
itself is weird enough (and also possibly violates the C++ memory
model eventually) that I'd consider instead disabling it unconditionally
if lvolatile || rvolatile (and remove all -fstrict-volatile-bitfield handling
from it).  Can you prepare a split out patch to do that?  I promise I'll
quickly approve it ;)

For the rest I think that this warning should warn independent of
optimization levels or actual bitfield member references, thus,
be implemented entirely at the time we lay out the structure.
That may not be able to take into account alignment issues, but your
patch doesn't either AFAIK.

Thanks and sorry for taking so long to review this again,
Richard.

> Thanks
> Bernd.


Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-10-18 Thread Richard Biener
On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
 wrote:
> This patch fixes various -fstrict-volatile-bitfields wrong-code bugs,
> including instances where bitfield values were being quietly truncated as
> well as bugs relating to using the wrong width.  The code changes are
> identical to those in the previous version of this patch series (originally
> posted in June and re-pinged many times since then), but for this version I
> have cleaned up the test cases to remove dependencies on header files, as
> Bernd requested.

Ok.

Thanks,
Richard.

> -Sandra
>


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Richard Biener
On Fri, Jun 21, 2013 at 11:21 PM, Steve Ellcey  wrote:
> On Fri, 2013-06-21 at 17:43 +0100, James Greenhalgh wrote:
>
>> So to my mind, this is all far too tied up in pass ordering details to
>> resolve. Given that all the threading opportunities for my patch are found
>> in dom1 and how fragile the positioning of dom1 is, there is not a great
>> deal I can do to modify the ordering.
>>
>> The biggest improvement I could find comes from rerunning pass_ch
>> immediately after dom1, though I'm not sure what the cost of that
>> would be.
>>
>> I wonder if you or others have any thoughts on what the right thing to
>> do would be?
>>
>> > I am not sure if path threading in general is turned off for -Os but it
>> > probably should be.
>>
>> I agree, jump threading is on at -Os, path threading should not be.
>>
>> Thanks,
>> James
>
> I think that since it seems to be more a space issue then a time issue,
> the way you currently have it is reasonable.  As long as the
> optimization does not happen at -Os then I think we can probably live
> with the space increase.

I suppose with Jeff's recent work on jump-threading through paths
this case is handled and the patch in this thread is obsolete or can
be reworked?

Thanks,
Richard.

> Steve Ellcey
> sell...@mips.com
>
>


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread James Greenhalgh
On Fri, Oct 18, 2013 at 11:55:08AM +0100, Richard Biener wrote:
> I suppose with Jeffs recent work on jump-threading through paths
> this case in handled and the patch in this thread is obsolete or can
> be reworked?

Yes, this patch is now obsolete, Jeff's solution is much more
elegant :-)

Thanks,
James



Re: [PATCH] Fix PR58775

2013-10-18 Thread Jakub Jelinek
On Fri, Oct 18, 2013 at 06:36:44PM +0800, Zhenqiang Chen wrote:
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2861,6 +2861,19 @@ swap_ops_for_binary_stmt (vec ops,
 }
 }
 
+/* Determine if stmt A appears in the statement list at or after stmt B.  */
+static inline bool
+next_stmt_of (gimple a, gimple b)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_for_stmt (b); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  if (gsi_stmt (gsi) == a)
+   return true;
+}
+  return false;
+}

How is that different from appears_later_in_bb?  More importantly,
not_dominated_by shouldn't be called with the same uid and basic block,
unless the stmts are the same, except for the is_gimple_debug case
(which looks like a bug).  And more importantly, if a stmt has zero uid,
we'd better set it from the previous or next non-zero uid stmt, so that
we don't linearly traverse the whole bb many times.

That said, I've lost track of the bugfixes for this, don't know if there is
any patch still pending or not.

Jakub


Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Richard Biener
On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng  wrote:
> Hi,
> As noted in previous messages, GCC forces offset to unsigned in middle end.
> It also gets offset value and stores it in HOST_WIDE_INT then uses it in
> various computation.  In this scenario, It should use int_cst_value to do
> additional sign extension against the type of tree node, otherwise we might
> lose sign information.  Take function fold_plusminus_mult_expr as an
> example, the sign information of arg01/arg11 might be lost because it uses
> TREE_INT_CST_LOW directly.  I think this is the offender of the problem in
> this thread.  I think we should fix the offender, rather the hacking
> strip_offset as in the original patch, so I split original patch into two as
> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the
> other fixing the mentioned sign extension problem.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c (revision 203267)
+++ gcc/fold-const.c (working copy)
@@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
   HOST_WIDE_INT int01, int11, tmp;
   bool swap = false;
   tree maybe_same;
-  int01 = TREE_INT_CST_LOW (arg01);
-  int11 = TREE_INT_CST_LOW (arg11);
+  int01 = int_cst_value (arg01);
+  int11 = int_cst_value (arg11);

this is not correct - it will mishandle all large unsigned numbers.

The real issue is that we rely on twos-complement arithmetic to work
when operating on pointer offsets (because we convert them all to
unsigned sizetype).  That makes interpreting the offsets or expressions
that compute offsets hard (or impossible in some cases), as you can
see from the issues in IVOPTs.

The above means that strip_offset_1 cannot reliably look through
MULT_EXPRs as that can make an offset X * CST that is
"effectively" signed "surely" unsigned in the process.  Think of
this looking into X * CST as performing a division - whose result
is dependent on the sign of X which we lost with our unsigned
casting.  Now, removing the MULT_EXPR look-through might
be too pessimizing ... but it may be worth trying to see if it fixes
your issue?  In this context I also remember a new bug filed
recently of us not folding x /[ex] 4 * 4 to x.

In the past I was working multiple times on stopping doing that
(make all offsets unsigned), but I never managed to finish ...

Richard.

> Bootstrap and test on x86/x86_64/arm.  Is it OK?
>
> Thanks.
> bin
>
> Patch a:
> 2013-10-17  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type.
> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
>
> Patch b:
> 2013-10-17  Bin Cheng  
>
> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead
> of TREE_INT_CST_LOW.
>
> gcc/testsuite/ChangeLog
> 2013-10-17  Bin Cheng  
>
> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test.
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Wednesday, October 02, 2013 4:32 PM
>> To: Bin.Cheng
>> Cc: Bin Cheng; GCC Patches
>> Subject: Re: [PATCH]Fix computation of offset in ivopt
>>
>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng  wrote:
>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener
>> >  wrote:
>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng 
>> wrote:
>> >>>
>> >>>
>> >>
>> >> I don't think you need
>> >>
>> >> +  /* Sign extend off if expr is in type which has lower precision
>> >> + than HOST_WIDE_INT.  */
>> >> +  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
>> >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
>> >>
>> >> at least it would be suspicious if you did ...
>> > There is a problem, for example, in the first message.  The iv base is like:
>> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4.  I am not sure
>> > why, but it seems (-4/0xFFFFFFFC) is represented by (1073741823*4).
>> > For each operand strip_offset_1 returns exactly the positive number
>> > and result of multiplication never get its chance of sign extend.
>> > That's why the sign extend is necessary to fix the problem. Or it
>> > should be fixed elsewhere by representing iv base with:
>> > "pretmp_184 + ((sizetype) KeyIndex_180 + 4294967295) * 4" in the first
>> place.
>>
>> Yeah, that's why I said the whole issue with forcing all offsets to be unsigned
>> is a mess ...
>>
>> There is really no good answer besides not doing that I fear.
>>
>> Yes, in the above case we could fold the whole thing differently (interpret
>> the offset of a POINTER_PLUS_EXPR as signed).  You can try tracking down
>> the offender, but it'll get non-trivial easily as you have to consider the fact
>> that GCC will treat signed operations as having undefined behavior on
>> overflow.
>>
>> So I see why you want to do the extension above (re-interpret the result), I
>> suppose we can live with it but please make sure to add a big fat ?

Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-10-18 Thread Richard Biener
On Tue, Sep 17, 2013 at 7:37 PM, Richard Sandiford
 wrote:
> Mike Stump  writes:
>> +/* Partial integer modes are specified by relation to a full integer
>> +   mode.  */
>> +#define PARTIAL_INT_MODE(M,PREC) \
>> +  make_partial_integer_mode (#M, "P" #PREC #M, PREC, __FILE__, __LINE__)
>> +/* Partial integer modes are specified by relation to a full integer
>> +   mode.  */
>> +#define PARTIAL_INT_MODE_NAME(M,PREC,NAME)   \
>> +  make_partial_integer_mode (#M, #NAME, PREC, __FILE__, __LINE__)
>
> Sorry for the bikeshedding, but I think it'd better to have a single macro:
>
> #define PARTIAL_INT_MODE(M, PREC, NAME)
>
> You can easily add an explicit "P" if the port happens to want
> that name.

I agree.  Btw, the "implementation defined" precision of PDI on SH-5 definitely
looks interesting, but I wonder how you can perform "implementation defined"
arithmetic on that PDI mode then ;)  I suppose using the maximum precision
of 64 bits is good enough, assuming the rest of the bits get truncated in
"implementation defined" manners.

What else blocks this patch?

Thanks,
Richard.

> Thanks,
> Richard


Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-10-18 Thread Richard Biener
On Thu, Oct 3, 2013 at 6:05 PM, Ilya Tocar  wrote:
> On 26 Sep 21:21, Ilya Tocar wrote:
>> On 25 Sep 15:48, Richard Biener wrote:
>> > On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar  
>> > wrote:
>> > > On 24 Sep 11:02, Richard Biener wrote:
>> > >> On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar  
>> > >> wrote:
>> > >>  thus consider assigning the section
>> > >> name in a different place.
>> > >>
>> > >> Richard.
>> > >
>> > > What do you mean  by different place?
>> > > I can add global dumping_omp_target variable to choose correct name,
>> > > depending on it's value (patch below). Is it better?
>> >
>> > More like passing down a different abstraction, like for
>> >
>> > > @@ -907,9 +907,15 @@ output_symtab (void)
>> > >  {
>> > >symtab_node node = lto_symtab_encoder_deref (encoder, i);
>> > >if (cgraph_node *cnode = dyn_cast  (node))
>> > > -lto_output_node (ob, cnode, encoder);
>> > > +   {
>> > > + if (!dumping_omp_target || lookup_attribute ("omp declare 
>> > > target",
>> > > + DECL_ATTRIBUTES 
>> > > (node->symbol.decl)))
>> > > +   lto_output_node (ob, cnode, encoder);
>> > > +   }
>> > >else
>> > > -lto_output_varpool_node (ob, varpool (node), encoder);
>> > > + if (!dumping_omp_target || lookup_attribute ("omp declare 
>> > > target",
>> > > + DECL_ATTRIBUTES 
>> > > (node->symbol.decl)))
>> > > +   lto_output_varpool_node (ob, varpool (node), encoder);
>> > >
>> > >  }
>> >
>> > have the symtab encoder already not contain the varpool nodes you
>> > don't need.
>> >
>> > And instead of looking up attributes, mark the symtab node with a flag.
>>
>> Good idea!
>> I've tried creating 2 encoders, and adding only nodes with
>> "omp declare target" attribute in omp case. There is still some is_omp
>> passing to control  lto_set_symtab_encoder_in_partition behaivor,
>> because i think it's better than global var.
>> What do you think?
>>
> Updated version of the patch. I've checked that it doesn't break lto on
> SPEC 2006. Streaming for omp is enabled by -fopnemp flag. Works with and
> without enabled lto. Ok for gomp4 branch?

Certainly better than the first version.  Jakub should decide for the branch
and eventually Honza for the merge to trunk.  It still looks somewhat hackish,
but I suppose that's because we don't have a LTO-state object where we
can encapsulate all this.

Also I still don't like the attribute lookup

> +  /* Ignore non omp target nodes for omp case.  */
> +  if (is_omp && !lookup_attribute ("omp declare target",
> +  DECL_ATTRIBUTES (node->symbol.decl)))
> +return;

can we instead please add a flag in cgraph_node?

Thanks,
Richard.

>
> ---
>  gcc/cgraphunit.c  | 15 +--
>  gcc/ipa-inline-analysis.c |  2 +-
>  gcc/lto-cgraph.c  | 15 ++-
>  gcc/lto-streamer.c|  5 +++--
>  gcc/lto-streamer.h| 10 --
>  gcc/lto/lto-partition.c   |  4 ++--
>  gcc/passes.c  | 12 ++--
>  gcc/tree-pass.h   |  2 +-
>  8 files changed, 44 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 1644ca9..d595475 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2016,7 +2016,18 @@ ipa_passes (void)
>   passes->all_lto_gen_passes);
>
>if (!in_lto_p)
> -ipa_write_summaries ();
> +{
> +  if (flag_openmp)
> +   {
> + section_name_prefix = OMP_SECTION_NAME_PREFIX;
> + ipa_write_summaries (true);
> +   }
> +  if (flag_lto)
> +   {
> + section_name_prefix = LTO_SECTION_NAME_PREFIX;
> + ipa_write_summaries (false);
> +   }
> +}
>
>if (flag_generate_lto)
>  targetm.asm_out.lto_end ();
> @@ -2107,7 +2118,7 @@ compile (void)
>cgraph_state = CGRAPH_STATE_IPA;
>
>/* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
> -  if (flag_lto)
> +  if (flag_lto || flag_openmp)
>  lto_streamer_hooks_init ();
>
>/* Don't run the IPA passes if there was any error or sorry messages.  */
> diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
> index ba6221e..4420213 100644
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -3721,7 +3721,7 @@ inline_generate_summary (void)
>
>/* When not optimizing, do not bother to analyze.  Inlining is still done
>   because edge redirection needs to happen there.  */
> -  if (!optimize && !flag_lto && !flag_wpa)
> +  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
>  return;
>
>function_insertion_hook_holder =
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 952588d..4a7d179 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t 
> encoder,
>
>  void
>  lto_set_symtab_encoder_

RE: [PATCH] Fix PR58682

2013-10-18 Thread Paulo Matos


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Paulo Matos
> Sent: 16 October 2013 09:42
> To: Mike Stump; Kyrill Tkachov
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Fix PR58682
> 
> > -Original Message-
> > From: Mike Stump [mailto:mikest...@comcast.net]
> > Sent: 14 October 2013 20:46
> > To: Kyrill Tkachov
> > Cc: Paulo Matos; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Fix PR58682
> >
> > In this case, I've reviewed the license I suspect you have, and I suspect it
> does
> > not have a grant of enough rights to allow it to be contributed to gcc.
> >
> 
> Assuming we cannot get the testcase into GCC, can we get the patch into trunk?
> If you try it locally you can definitely see the ICE and that the patch fixes 
> it.
> 

ping.

Paulo Matos



Fix asymmetric volatile handling in optimize_bit_field_compare

2013-10-18 Thread Bernd Edlinger
Hello Richard,

I see it the same way.

The existing code in optimize_bit_field_compare looks wrong. It is very 
asymmetric,
and handles lvolatilep and rvolatilep differently. And the original handling of
flag_strict_volatile_bitfields also was asymmetric.

However this optimize-bit-field-compare check at the beginning of that function 
was not
introduced because of volatile issues I think:

There was a discussion in April 2012 on this thread: "Continue 
strict-volatile-bitfields fixes"
 
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01094.html
 
The result was that this optimization seems to break other possible 
optimizations later on,
when -fstrict-volatile-bitfields was enabled on the SH target. Even when the 
bit fields are NOT volatile.
(Of course you should not touch volatile bit fields at all)
 
And this was added to optimize_bit_field_compare as a result:
 
  /* In the strict volatile bitfields case, doing code changes here may prevent
 other optimizations, in particular in a SLOW_BYTE_ACCESS setting.  */
  if (flag_strict_volatile_bitfields > 0)
    return 0;
 
Well, I don't know if that still applies to the trunk, and probably this was the
completely wrong way to fix this issue here.

Maybe that had to do with the other place in stor-layout.c,
which also could be solved differently.

I think it could be possible to remove the flag_strict_volatile_bitfields 
special case
also there, and instead use the DECL_BIT_FIELD_TYPE instead of DECL_BIT_FIELD in
get_inner_reference IF we have flag_strict_volatile_bitfields and that field
is really volatile. At least this helped in the portable volatility patch.
What do you think?

Anyway, here is my patch for review.

OK for trunk after regression testing?

Thanks
Bernd.

2013-10-18  Bernd Edlinger  

Fix volatile issues in optimize_bit_field_compare.
* fold-const.c (optimize_bit_field_compare): Bail out if
lvolatilep or rvolatilep.



patch-bitfield-compare.diff
Description: Binary data


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 12:34 PM, Zhenqiang Chen
 wrote:
> On 18 October 2013 18:15, Richard Biener  wrote:
>> On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen
>>  wrote:
>>> On 18 October 2013 17:53, Richard Biener  wrote:
 On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen
  wrote:
> On 18 October 2013 00:58, Jeff Law  wrote:
>> On 10/17/13 05:03, Richard Biener wrote:
>
> Is it OK for trunk?


 I had a much simpler change which did basically the same from 4.7 (I
 can update it if people think this is a better approach).
>>>
>>>
>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>> for force_gimple_operand).
>>
>> The obvious question is whether or not Andrew's simpler change picks up 
>> as
>> many transformations as Zhenqiang's change.  If not are the things missed
>> important.
>>
>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>
> Here is a rough compare:
>
> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
> in my patch). Root cause is that it does not skip "LABEL". The guard
> to do this opt should be the same the bb_has_overhead_p in my patch.

 I think we want a "proper" predicate in tree-cfg.c for this, like maybe
 a subset of tree_forwarder_block_p or whatever it will end up looking
 like (we need "effectively empty BB" elsewhere, for example in 
 vectorization,
 add a flag to allow a condition ending the BB and the predicate is done).

> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>
>   _3 = a_2(D) > 0;
>   _5 = b_4(D) > 0;
>   _6 = _3 | _5;
>   _9 = c_7(D) <= 0;
>   _10 = ~_6;
>   _11 = _9 & _10;
>   if (_11 == 0)
>
> With my patch, it will generate
>
>   _3 = a_2(D) > 0;
>   _5 = b_4(D) > 0;
>   _6 = _3 | _5;
>   _9 = c_7(D) > 0;
>   _10 = _6 | _9;
>   if (_10 != 0)

 But that seems like a missed simplification in predicate combining
 which should be fixed more generally.

> 3) The good thing of Andrew P.'s change is that "Inverse the order of
> the basic block walk" so it can do combine recursively.
>
> But I think we need some heuristic to control the number of ifs. Move
> too much compares from
> the inner_bb to outer_bb is not good.

 True, but that's what fold-const.c does, no?
>>>
>>> Based on current fold-const, we can not generate more than "two"
>>> compares in a basic block.
>>
>> I think you are wrong as fold is invoked recursively on a && b && c && d.
>
> Please try to compile it with -fdump-tree-gimple. It will be broken into
>
>  if (a && b)
>if (c && d)

Indeed - it seems to handle all kinds of series but not the incremental
one that appears after the first folding - (a truth-and b) truth-andif c.

>>> But if ifs can be combined recursively in ifcombine, we might generate
>>> many compares in a basic block.
>>
>> Yes, we can end up with that for the other simplifications done in
>> ifcombine as well.  Eventually there needs to be a cost model for this
>> (use edge probabilities for example, so that if you have profile data
>> you never combine cold blocks).
>>
>> Any takers?
>
> I will take it.
>
> Thanks!
> -Zhenqiang
>
>> Richard.
>>
> 4) Another good thing of Andrew P.'s change is that it reuses some
> existing functions. So it looks much simple.

 Indeed - that's what I like about it.

>>>
>>> With that we should be able to kill the fold-const.c transform?
>>
>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>
> That's my final goal to "kill the fold-const.c transform". I think we
> may combine the two changes to make a "simple" and "good" patch.

 Thanks,
 Richard.

> Thanks!
> -Zhenqiang


Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Bernd Schmidt
On 10/18/2013 01:18 PM, Richard Biener wrote:

> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c (revision 203267)
> +++ gcc/fold-const.c (working copy)
> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
>HOST_WIDE_INT int01, int11, tmp;
>bool swap = false;
>tree maybe_same;
> -  int01 = TREE_INT_CST_LOW (arg01);
> -  int11 = TREE_INT_CST_LOW (arg11);
> +  int01 = int_cst_value (arg01);
> +  int11 = int_cst_value (arg11);
> 
> this is not correct - it will mishandle all large unsigned numbers.
> 
> The real issue is that we rely on twos-complement arithmetic to work
> when operating on pointer offsets (because we convert them all to
> unsigned sizetype).  That makes interpreting the offsets or expressions
> that compute offsets hard (or impossible in some cases), as you can
> see from the issues in IVOPTs.

I still have patches to keep pointer types in ivopts (using a new
POINTER_PLUSV_EXPR). Would that help in this case? Last time I posted
them they met an unenthusiastic reception so I've never bothered to
repost them.


Bernd



Re: Fix asymmetric volatile handling in optimize_bit_field_compare

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 1:56 PM, Bernd Edlinger
 wrote:
> Hello Richard,
>
> I see it the same way.
>
> The existing code in optimize_bit_field_compare looks wrong. It is very
> asymmetric, and handles lvolatilep and rvolatilep differently. And the
> original handling of flag_strict_volatile_bitfields also was asymmetric.
>
> However this optimize-bit-field-compare check at the beginning of that
> function was not introduced because of volatile issues I think:
>
> There was a discussion in April 2012 on this thread: "Continue
> strict-volatile-bitfields fixes"
>
> http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01094.html
>
> The result was that this optimization seems to break other possible
> optimizations later on, when -fstrict-volatile-bitfields was enabled on
> the SH target. Even when the bit fields are NOT volatile.
> (Of course you should not touch volatile bit fields at all)
>
> And this was added to optimize_bit_field_compare as a result:
>
>   /* In the strict volatile bitfields case, doing code changes here may
>      prevent other optimizations, in particular in a SLOW_BYTE_ACCESS
>      setting.  */
>   if (flag_strict_volatile_bitfields > 0)
>     return 0;

I see it as this check having broken the testcases, which then required
the testcase fixes.  So yes, if that referenced patch got in then it can
likely be reverted.

> Well, I don't know if that still applies to the trunk, and probably this
> was the completely wrong way to fix this issue here.
>
> Maybe that had to do with the other place in stor-layout.c,
> which also could be solved differently.
>
> I think it could be possible to remove the flag_strict_volatile_bitfields
> special case there too, and instead use DECL_BIT_FIELD_TYPE instead of
> DECL_BIT_FIELD in get_inner_reference IF we have
> flag_strict_volatile_bitfields and that field is really volatile. At
> least this helped in the portable volatility patch.
> What do you think?

Well, sure, using DECL_BIT_FIELD is wrong if the AAPCS specifies that the
handling is conditional on being declared as a bitfield.  Otherwise

struct {
  int c : 8;
};

will be not handled correctly (DECL_BIT_FIELD is not set, the field
occupies a QImode).

DECL_BIT_FIELD_TYPE is the proper way to check whether a field was _declared_
as bitfield.

So yes, removing this special-case in stor-layout.c seems possible - but
beware of possible ABI issues here; for example, passing an argument of
the above type by value might result in passing it in a register or on
the stack depending on whether the struct has QImode or BLKmode (I'd say
that's a target bug then, as the target should look at the type, not at
its mode... but reality is that I can fully believe such a bug exists -
and whoops, you have a call ABI that depends on
-fstrict-volatile-bitfields ;))

> Anyway, here is my patch for review.
>
> OK for trunk after regression testing?

Ok.

Thanks,
Richard.

> Thanks
> Bernd.


Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 2:02 PM, Bernd Schmidt  wrote:
> On 10/18/2013 01:18 PM, Richard Biener wrote:
>
>> Index: gcc/fold-const.c
>> ===
>> --- gcc/fold-const.c (revision 203267)
>> +++ gcc/fold-const.c (working copy)
>> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
>>HOST_WIDE_INT int01, int11, tmp;
>>bool swap = false;
>>tree maybe_same;
>> -  int01 = TREE_INT_CST_LOW (arg01);
>> -  int11 = TREE_INT_CST_LOW (arg11);
>> +  int01 = int_cst_value (arg01);
>> +  int11 = int_cst_value (arg11);
>>
>> this is not correct - it will mishandle all large unsigned numbers.
>>
>> The real issue is that we rely on twos-complement arithmetic to work
>> when operating on pointer offsets (because we convert them all to
>> unsigned sizetype).  That makes interpreting the offsets or expressions
>> that compute offsets hard (or impossible in some cases), as you can
>> see from the issues in IVOPTs.
>
> I still have patches to keep pointer types in ivopts (using a new
> POINTER_PLUSV_EXPR). Would that help in this case? Last time I posted
> them they met an unenthusiastic reception so I've never bothered to
> repost them.

Can you point me to that patch?  Or shortly elaborate on "keep pointer types
in ivopts"?  I think this issue is about decomposing offset computations into
a constant and a variable part, which after foldings messed up the unsigned
computation can result in "odd" constant offset parts.  So it's rather
because the offset operand of POINTER_PLUS_EXPR is fixed as
sizetype.

Richard.

>
> Bernd
>


Re: [PATCH] Fix PR58682

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 1:40 PM, Paulo Matos  wrote:
>
>
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
>> Behalf Of Paulo Matos
>> Sent: 16 October 2013 09:42
>> To: Mike Stump; Kyrill Tkachov
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: RE: [PATCH] Fix PR58682
>>
>> > -Original Message-
>> > From: Mike Stump [mailto:mikest...@comcast.net]
>> > Sent: 14 October 2013 20:46
>> > To: Kyrill Tkachov
>> > Cc: Paulo Matos; gcc-patches@gcc.gnu.org
>> > Subject: Re: [PATCH] Fix PR58682
>> >
>> > In this case, I've reviewed the license I suspect you have, and I suspect
>> > it does not have a grant of enough rights to allow it to be contributed
>> > to gcc.
>> >
>>
>> Assuming we cannot get the testcase into GCC, can we get the patch into
>> trunk?  If you try it locally you can definitely see the ICE and that the
>> patch fixes it.
>>
>
> ping.

Please re-post the latest version of the patch and CC Honza.

Richard.

> Paulo Matos
>


Re: [PING^2][PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-18 Thread Richard Biener
On Wed, 16 Oct 2013, Kugan wrote:

> Thanks Richard for the review.
> 
> On 15/10/13 23:55, Richard Biener wrote:
> > On Tue, 15 Oct 2013, Kugan wrote:
> > 
> >> Hi Eric,
> >>
> >> Can you please help to review this patch?
> >> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html
> > 
> > I think that gimple_assign_is_zero_sign_ext_redundant and its
> > description is somewhat confused.  You seem to have two cases
> > here, one being NOP_EXPR or CONVERT_EXPR that are in the IL
> > because they are required for type correctness.  So,
> > 
> 
> I have changed the name and the comments to make it more clear.
> 
> >long = (long) int
> > 
> > which usually expands to
> > 
> >(set:DI (sext:DI reg:SI))
> > 
> > you want to expand as
> > 
> >(set:DI (subreg:DI reg:SI))
> > 
> > where you have to be careful for modes smaller than word_mode.
> > You don't seem to implement this optimization though (but
> > the gimple_assign_is_zero_sign_ext_redundant talks about it).
> > 
> 
> 
> I am actually handling only the cases smaller than word_mode. If there
> is any sign change, I don't do any changes. In the place where RTL
> expansion is done, I have added these conditions as is done for
> convert_move and others.
> 
> For example, when an expression is evaluated and its value is assigned
> to a variable of type short, the generated RTL would look something like
> the following.
> 
> (set (reg:SI 110)
> (zero_extend:SI (subreg:HI (reg:SI 117) 0)))
> 
> However, if during value range propagation we can say for certain
> that the value of the expression which is present in register 117 is
> within the limits of short and there is no sign conversion, we do not
> need to perform the subreg and zero_extend; instead we can generate the
> following RTL.
> 
> (set (reg:SI 110)
> (reg:SI 117)))
> 
> 
> > Second are promotions required by the target (PROMOTE_MODE)
> > that do arithmetic on wider registers like for
> > 
> >   char = char + char
> > 
> > where they cannot do
> > 
> >   (set:QI (plus:QI reg:QI reg:QI))
> > 
> > but instead have, for example reg:SI only and you get
> > 
> >   (set:SI (plus:SI reg:SI reg:SI))
> >   (set:SI ([sz]ext:SI (subreg:QI reg:SI)))
> > 
> > that you try to address with the cfgexpand hunk but I believe
> > it doesn't work the way you do it.  That is because on GIMPLE
> > you do not see SImode temporaries and thus no SImode value-ranges.
> > Consider
> > 
> >   tem = (char) 255 + (char) 1;
> > 
> > which has a value-range of [0,0] but clearly when computed in
> > SImode the value-range is [256, 256].  That is, VRP computes
> > value-ranges in the expression type, not in some arbitrary
> > larger type.
> > 
> > So what you'd have to do is take the value-ranges of the
> > two operands of the plus and see whether the plus can overflow
> > QImode when computed in SImode (for the example).
> >
> 
> Yes. Instead of calculating the value ranges of the two operands in
> SImode, what I am doing in this case is to look at the value range of
> tem and check whether it is within [CHAR_MIN + 1, CHAR_MAX - 1]. As you
> have explained earlier, we can't rely on being within [CHAR_MIN,
> CHAR_MAX] as the range could have been modified to fit the LHS type.
> This of course will miss some of the cases where we can remove
> extensions, but simplifies the logic.

Not sure if I understand what you are saying here.  As for the above
case

> >   tem = (char) 255 + (char) 1;

tem is always of type 'char' in GIMPLE (even if later promoted
via PROMOTE_MODE) the value-range is a 'char' value-range and thus
never will exceed [CHAR_MIN, CHAR_MAX].  The only way you can
use that directly is if you can rely on undefined behavior
happening for signed overflow - but if you argue that way you
can simply _always_ drop the (sext:SI (subreg:QI part and you
do not need value ranges for this.  For unsigned operations
for example [250, 254] + [8, 10] will simply wrap to [2, 8]
(taking each sum modulo 256), which is inside your [CHAR_MIN + 1,
CHAR_MAX - 1] but if performed in SImode you can get 259 and
thus clearly you cannot drop the (zext:SI (subreg:QI parts.
The same applies to signed types if you do not want to rely
on signed overflow being undefined of course.

Richard.


> > [exposing the effect of PROMOTE_MODE earlier than at RTL expansion
> > time may make this less awkward]
> 
> Please find the modified patch attached.
> 
> 
> +2013-10-16  Kugan Vivekanandarajah  
> +
> + * dojump.c (do_compare_and_jump): Generate rtl without
> + zero/sign extension if redundant.
> + * cfgexpand.c (expand_gimple_stmt_1): Likewise.
> + * gimple.c (gimple_is_rhs_value_fits_in_assign): New
> + function.
> + * gimple.h (gimple_is_rhs_value_fits_in_assign): Declare.
> +
> 
> Thanks,
> Kugan
> > 
> > Thanks,
> > Richard.
> > 
> >> Thanks,
> >> Kugan
> >>
> >>> +2013-09-25  Kugan Vivekanandarajah  
> >>> +
> >>> + * dojump.c (do_compare_and_jump): Generate rtl without
> >>> + zero/sign extension if redundant.
> >>> + * cf

Re: [c++-concepts] Shorthand notation

2013-10-18 Thread Andrew Sutton
A small follow up change. This removes the "sorry" from concept name
resolution. Committed in r203815.

2013-10-16  Andrew Sutton  
* gcc/cp/constraint.cc (finish_concept_name): Allow functions with
the same name as concepts to resolve as call expressions in the
usual way.

Andrew Sutton


On Wed, Oct 16, 2013 at 9:59 AM, Andrew Sutton
 wrote:
> I've committed initial support for shorthand constraints. This patch
> adds support for parsing concept names as non-class names. When
> introducing a template parameter, the concept name is transformed into
> a constraint on the template parameter. Constrained parameters can
> introduce type, non-type and template template parameters.
>
> This has initial support for variadic constraints, but it's not well tested.
>
> This patch does not yet support default arguments for constrained
> template parameters, nor does it support the use of concept ids of
> this form:
>
>   template F>
> void f();
>
> There are a couple of interesting things in the patch. I'm using a
> PLACEHOLDER_EXPR as a template argument in order to resolve constraint
> names. Effectively, I deduce concepts by creating an expression like:
>
>   Equality_comparable<?>()
>
> where ? is a placeholder, and after coerce_template_arguments
> completes, I can extract the matched parameter from the placeholder.
> This works nicely when concepts are overloaded or have default
> arguments (although I'm not sure I'm testing that very thoroughly
> right now).
>
> With variadic constraints, I've had to add functionality to expand a
> pack as a conjunction of requirements. For example, if you declare:
>
>   template<Class... Ts> // Class() must be true for each T in Ts
> void f();
>
> The transformation is:
>
>   template<typename... Ts>
> requires Class<Ts>()...
>   void f();
>
> Where Class<Ts>()... expands to Class<T1>() && Class<T2>() && ... etc.
> I feel like the current implementation is a bit hacky, and I'm
> wondering if I should introduce a new node for a pack conjunction.
>
> Change log follows.
>
> 2013-10-16  Andrew Sutton  
> * gcc/cp/constraint.cc (conjoin_requirements): New.
> (resolve_constraint_check): Filter non-concept candidates before
> coercing arguments. Perform deduction in a template-decl processing
> context to prevent errors during diagnosis.
> (finish_concept_name), (finish_shorthand_requirement),
> (get_shorthand_requirements): New.
> * gcc/cp/pt.c (template_parm_to_arg): Make non-static.
> (process_template_parm): Build shorthand requirements from the
> parameter description.
> (end_template_parm_list): New.
> (convert_placeholder_argument): New.
> (convert_template_argument): Match placeholder arguments against
> any template parameter.
> (tsubst_pack_conjunction): New.
> (tsubst_expr): Expand a pack as a conjunction.
> (type_dependent_expression_p): Placeholders are always type
> dependent.
> * gcc/cp/parser.c (cp_is_constrained_parameter),
> (cp_finish_template_type_parm), (cp_finish_template_template_parm)
> (cp_finish_non_type_template_parm), (cp_finish_constrained_parameter):
> New.
> (cp_parser_template_parameter): Handle constrained parameters.
> (cp_parser_nonclass_name): An identifier naming an overload set
> may declare a constrained parameter.
> (cp_parser_type_parameter), 
> (cp_parser_template_declaration_after_exp):
> Get shorthand requirements from the template parameter list.
> * gcc/cp/cp-tree.h (TEMPLATE_PARM_CONSTRAINTS): New.
>
> Committed in 203704.
>
> Andrew Sutton


short-2.patch
Description: Binary data


Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Bernd Schmidt
On 10/18/2013 02:10 PM, Richard Biener wrote:
> On Fri, Oct 18, 2013 at 2:02 PM, Bernd Schmidt  
> wrote:
>> On 10/18/2013 01:18 PM, Richard Biener wrote:
>>
>>> Index: gcc/fold-const.c
>>> ===
>>> --- gcc/fold-const.c (revision 203267)
>>> +++ gcc/fold-const.c (working copy)
>>> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
>>>HOST_WIDE_INT int01, int11, tmp;
>>>bool swap = false;
>>>tree maybe_same;
>>> -  int01 = TREE_INT_CST_LOW (arg01);
>>> -  int11 = TREE_INT_CST_LOW (arg11);
>>> +  int01 = int_cst_value (arg01);
>>> +  int11 = int_cst_value (arg11);
>>>
>>> this is not correct - it will mishandle all large unsigned numbers.
>>>
>>> The real issue is that we rely on twos-complement arithmetic to work
>>> when operating on pointer offsets (because we convert them all to
>>> unsigned sizetype).  That makes interpreting the offsets or expressions
>>> that compute offsets hard (or impossible in some cases), as you can
>>> see from the issues in IVOPTs.
>>
>> I still have patches to keep pointer types in ivopts (using a new
>> POINTER_PLUSV_EXPR). Would that help in this case? Last time I posted
>> them they met an unenthusiastic reception so I've never bothered to
>> repost them.
> 
> Can you point me to that patch?  Or shortly elaborate on "keep pointer types
> in ivopts"?  I think this issue is about decomposing offset computations into
> a constant and a variable part, which after foldings messed up the unsigned
> computation can result in "odd" constant offset parts.  So it's rather
> because the offset operand of POINTER_PLUS_EXPR is fixed as
> sizetype.

Okay, my patch doesn't address that part, it only ensures the pointer
base values are kept and arithmetic on them is done using POINTER_PLUS.

The original patch was here.
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01104.html


Bernd



Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 2:26 PM, Bernd Schmidt  wrote:
> On 10/18/2013 02:10 PM, Richard Biener wrote:
>> On Fri, Oct 18, 2013 at 2:02 PM, Bernd Schmidt  
>> wrote:
>>> On 10/18/2013 01:18 PM, Richard Biener wrote:
>>>
 Index: gcc/fold-const.c
 ===
 --- gcc/fold-const.c (revision 203267)
 +++ gcc/fold-const.c (working copy)
 @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
HOST_WIDE_INT int01, int11, tmp;
bool swap = false;
tree maybe_same;
 -  int01 = TREE_INT_CST_LOW (arg01);
 -  int11 = TREE_INT_CST_LOW (arg11);
 +  int01 = int_cst_value (arg01);
 +  int11 = int_cst_value (arg11);

 this is not correct - it will mishandle all large unsigned numbers.

 The real issue is that we rely on twos-complement arithmetic to work
 when operating on pointer offsets (because we convert them all to
 unsigned sizetype).  That makes interpreting the offsets or expressions
 that compute offsets hard (or impossible in some cases), as you can
 see from the issues in IVOPTs.
>>>
>>> I still have patches to keep pointer types in ivopts (using a new
>>> POINTER_PLUSV_EXPR). Would that help in this case? Last time I posted
>>> them they met an unenthusiastic reception so I've never bothered to
>>> repost them.
>>
>> Can you point me to that patch?  Or shortly elaborate on "keep pointer types
>> in ivopts"?  I think this issue is about decomposing offset computations into
>> a constant and a variable part, which after foldings messed up the unsigned
>> computation can result in "odd" constant offset parts.  So it's rather
>> because the offset operand of POINTER_PLUS_EXPR is fixed as
>> sizetype.
>
> Okay, my patch doesn't address that part, it only ensures the pointer
> base values are kept and arithmetic on them is done using POINTER_PLUS.

Ah, I recently fixed parts of that I think ...

2013-09-02  Richard Biener  

* tree-affine.c (add_elt_to_tree): Avoid converting all pointer
arithmetic to sizetype.

maybe you can double-check if the result is what you desired ;)

Richard.

> The original patch was here.
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01104.html
>
>
> Bernd
>


Re: libgo patch committed: Add system calls

2013-10-18 Thread Uros Bizjak
Hello!

> This patch to libgo adds some system calls to the syscall package.
> Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
> Committed to mainline and 4.8 branch, since these calls were in Go
> 1.1.2.

Uros.

Index: runtime/go-nosys.c
===
--- runtime/go-nosys.c  (revision 203810)
+++ runtime/go-nosys.c  (working copy)
@@ -47,7 +47,7 @@ accept4 (int sockfd __attribute__ ((unused)),
 int
 dup3 (int oldfd __attribute__ ((unused)),
   int newfd __attribute__ ((unused)),
-  int flags __attribtue__ ((unused)))
+  int flags __attribute__ ((unused)))
 {
   errno = ENOSYS;
   return -1;


Re: [PATCH] Reducing number of alias checks in vectorization.

2013-10-18 Thread Richard Biener
On Wed, Oct 2, 2013 at 8:26 PM, Jakub Jelinek  wrote:
> On Wed, Oct 02, 2013 at 10:50:21AM -0700, Cong Hou wrote:
>> >> +  if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
>> >> +return int_cst_value (p11.offset) < int_cst_value (p21.offset);
>> >
>> > This is going to ICE whenever the offsets wouldn't fit into a
>> > HOST_WIDE_INT.
>> >
>> > I'd say you just shouldn't put into the vector entries where offset isn't
>> > host_integerp, those would never be merged with other checks, or something
>> > similar.
>>
>> Do you mean I should use widest_int_cst_value()? Then I will replace
>> all int_cst_value() here with it. I also changed the type of "diff"
>> variable into HOST_WIDEST_INT.
>
> Actually, best would be just to use
> tree_int_cst_compare (p11.offset, p21.offset)
> that will handle any INTEGER_CSTs, not just those that fit into HWI.

Note that this is not equivalent because int_cst_value () sign-extends
even unsigned values (yes, a very bad name for such a function).  But
I believe that in this case - comparing pointer offsets(?) - int_cst_value
is the correct function to use.

Richard.

> Jakub


Re: [PATCH] [ARM] Fix PR57909 : ICE with internal memcpy and -mno-unaligned-access

2013-10-18 Thread Yvan Roux
Ping^2

I forgot this one was still pending.

On 13 August 2013 14:21, Yvan Roux  wrote:
> Ping.
>
> On 23 July 2013 16:18, Yvan Roux  wrote:
>> Hi,
>>
>> I forgot to add the test case with the PR fix, the attached patch add it.
>>
>> Thanks,
>> Yvan
>>
>> ChangeLog
>>
>> gcc/testsuite
>>
>> 2013-07-23  Yvan Roux  
>>
>> PR target/57909
>> * gcc.target/arm/pr57909.c: New test.
>>
>>
>> On 17 July 2013 10:58, Ramana Radhakrishnan  wrote:
>>> On 07/17/13 09:53, Yvan Roux wrote:

 Hi,

 this patch fixes the issue described in PR57909, where we have an ICE
 during the internal memcpy, as some UNSPEC_UNALIGNED insns are emitted
 even if the -mno-unaligned-access flag is passed. See the link below for
 a more detailed description:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57909

 Tested without any regression.

 Thanks,
 Yvan

 ChangeLog

 gcc/

 2013-07-17  Yvan Roux  

  PR target/57909
  * config/arm/arm.c (gen_movmem_ldrd_strd): Fix unaligned
 load/store
  usage in HI mode.

>>>
>>> Ok.
>>>
>>> regards
>>> Ramana
>>>
>>>


pr57909-test.diff
Description: Binary data


Re: [c++-concepts] Shorthand notation

2013-10-18 Thread Paolo Carlini

Hi,

On 10/18/2013 02:23 PM, Andrew Sutton wrote:

A small follow up change. This removes the "sorry" from concept name
resolution. Committed in r203815.

2013-10-16  Andrew Sutton  
 * gcc/cp/constraint.cc (finish_concept_name): Allow functions with
 the same name as concepts to resolve as call expressions in the
 usual way.
Nit: normally this would be just * constraint.cc, because the paths are 
relative to the location of the corresponding ChangeLog file (which is 
under gcc/cp)


Thanks!
Paolo.


Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion

2013-10-18 Thread Joshua J Cogliati
On 10/14/2013 05:34 PM, Joseph S. Myers wrote:
> On Mon, 14 Oct 2013, Dodji Seketeli wrote:
> 
>>> This patch has passes the existing -Wconversion testcases.  It
>>> modifies Wconversion-real.c, Wconversion-real-integer.c and
>>> pr35635.c to be more specific
>> 
>> If the patch passes existing tests, I'd be inclined to leave
>> them tests alone and add new ones that are specific to what this
>> patch
> 
> Indeed, it's best not to change what an existing test tests.

This patch does not change any of the non-commented c and c++ code.
It changes the dg comments.
Example:
-  fsi (3.1f); /* { dg-warning "conversion" } */
+  fsi (3.1f); /* { dg-warning "float-conversion" } */

If you want I can change it to (in separate files if desired):

  fsi (3.1f); /* { dg-warning "conversion" } */
  fsi (3.1f); /* { dg-warning "float-conversion" } */

so that now the tests are run both ways, but it would test the exact
same code path.

>>> * gcc/c-family/c-common.c Switching unsafe_conversion_p to 
>>> return an enumeration with more detail, and conversion_warning 
>>> to use this information.
>> 
>> The correct format for this ChangeLog entry would be:
>> 
>> * gcc/c-family/c-common.c (unsafe_conversion_p):
>> .
> 
> And ChangeLog entries give the path to the file relative to the
> relevant ChangeLog file, which is that in the closest containing
> directory to the file being modified.  Since there's one in
> gcc/c-family, that means the path in this case is just c-common.c.
> 

Thanks for pointing out that detail about the changelog, I will fix
that next time I send the patch.


Re: [c++-concepts] Shorthand notation

2013-10-18 Thread Andrew Sutton
I know. When I started working on the project, Gaby asked that I keep
all concept-related changes in the top-level Changelog.concepts with
full paths for the file names.

Andrew Sutton


On Fri, Oct 18, 2013 at 8:53 AM, Paolo Carlini  wrote:
> Hi,
>
>
> On 10/18/2013 02:23 PM, Andrew Sutton wrote:
>>
>> A small follow up change. This removes the "sorry" from concept name
>> resolution. Committed in r203815.
>>
>> 2013-10-16  Andrew Sutton  
>>  * gcc/cp/constraint.cc (finish_concept_name): Allow functions
>> with
>>  the same name as concepts to resolve as call expressions in the
>>  usual way.
>
> Nit: normally this would be just * constraint.cc, because the paths are
> relative to the location of the corresponding ChangeLog file (which is under
> gcc/cp)
>
> Thanks!
> Paolo.


Re: [PATCH] Reducing number of alias checks in vectorization.

2013-10-18 Thread Richard Biener
On Tue, Oct 15, 2013 at 12:43 AM, Cong Hou  wrote:
> Sorry for forgetting using plain-text mode. Resend it.
>
>
> -- Forwarded message --
> From: Cong Hou 
> Date: Mon, Oct 14, 2013 at 3:29 PM
> Subject: Re: [PATCH] Reducing number of alias checks in vectorization.
> To: Richard Biener , GCC Patches 
> Cc: Jakub Jelinek 
>
>
> I have made a new patch for this issue according to your comments.
>
> There are several modifications to my previous patch:
>
>
> 1. Remove the use of STL features such as vector and sort. Use GCC's
> vec and qsort instead.

I think that using <utility> and thus std::pair and std::swap is ok.  Including
of <utility> should be done via system.h though.

> 2. Comparisons between tree nodes are not based on their addresses any
> more. Use compare_tree() function instead.

Ok.

> 3. The function vect_create_cond_for_alias_checks() now returns the
> number of alias checks. If its second parameter cond_expr is NULL,
> then this function only calculate the number of alias checks after the
> merging and won't generate comparison expressions.

We now perform the merging twice - that's easily avoided by recording
the merge result in loop_vinfo alongside may_alias_ddrs (which you
should be able to drop as they are already contained in the data
structures you build).

Which means you can split the function and move the merging
part to vect_prune_runtime_alias_test_list.

> 4. The function vect_prune_runtime_alias_test_list() now uses
> vect_create_cond_for_alias_checks() to get the number of alias checks.
>
>
> The patch is attached as a text file.
>
> Please give me your comment on this patch. Thank you!

+  if (!operand_equal_p (dr_a1->basic_addr, dr_a2->basic_addr, 0)
+  || TREE_CODE (dr_a1->offset) != INTEGER_CST
+  || TREE_CODE (dr_a2->offset) != INTEGER_CST)
+continue;
+
+  HOST_WIDEST_INT diff = widest_int_cst_value (dr_a2->offset) -
+ widest_int_cst_value (dr_a1->offset);

use !host_integerp (dr_a1->offset) as check and then int_cst_value.

+  if (diff <= vect_factor
+  || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+  && diff - widest_int_cst_value (dr_a1->seg_len) < vect_factor)
+  || (TREE_CODE (dr_a1->seg_len) == INTEGER_CST
+  && TREE_CODE (dr_b1->seg_len) == INTEGER_CST
+  && diff - widest_int_cst_value (dr_a1->seg_len) <
+ widest_int_cst_value (dr_b1->seg_len)))

can you add a comment above this why it is safe to combine under
this condition?  You seem to combine offsets but not data-ref steps
for example.  The segment length is computed quite optimistically
and you don't seem to adjust that.  That is, generally the alias
check, if implemented as simple overlap, would need to intersect
[init, init + niter * step]  intervals, losing some non-aliasing cases.
The current segment length is optimistically computed and thus
can recover some more cases, but you have to be careful with
merging to not break its implicit assumptions.

Thanks,
Richard.

>
> Cong
>
>
> On Thu, Oct 3, 2013 at 2:35 PM, Cong Hou  wrote:
>>
>> Forget about this "aux" idea as the segment length for one data ref
>> can be different in different dr pairs.
>>
>> In my patch I created a struct as shown below:
>>
>> struct dr_addr_with_seg_len
>> {
>>   data_reference *dr;
>>   tree basic_addr;
>>   tree offset;
>>   tree seg_len;
>> };
>>
>>
>> Note that basic_addr and offset can always be obtained from dr, but we
>> need to store two segment lengths for each dr pair. It is improper to
>> add a field to data_dependence_relation as it is defined outside of
>> vectorizer. We can change the type (a new one combining
>> data_dependence_relation and segment length) of may_alias_ddrs in
>> loop_vec_info to include such information, but we have to add a new
>> type to tree-vectorizer.h which is only used in two places - still too
>> much.
>>
>> One possible solution is that we create a local struct as shown above
>> and a new function which returns the merged alias check information.
>> This function will be called twice: once during analysis phase and
>> once in transformation phase. Then we don't have to store the merged
>> alias check information during those two phases. The additional time
>> cost is minimal as there will not be too many data dependent dr pairs
>> in a loop.
>>
>> Any comment?
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Thu, Oct 3, 2013 at 10:57 AM, Cong Hou  wrote:
>> > I noticed that there is a "struct dataref_aux" defined in
>> > tree-vectorizer.h which is specific to the vectorizer pass and is
>> > stored in (void*)aux in "struct data_reference". Can we add one more
>> > field "segment_length" to dataref_aux so that we can pass this
>> > information for merging alias checks? Then we can avoid modifying or
>> > creating other structures.
>> >
>> >
>> > thanks,
>> > Cong
>> >
>> >
>> > On Wed, Oct 2, 2013 at 2:34 PM, Cong Hou  wrote:
>> >> On Wed, Oct 2, 2013 at 4:24 AM, Richard Biener  wrote:
>> >>> On Tue, 1 Oct 2013, Cong Hou wrote:
>> >>>
>>  When alias exists between data refs in a l

Re: [wide-int] int_traits

2013-10-18 Thread Kenneth Zadeck

On 10/18/2013 04:45 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Mike Stump wrote:


On Oct 17, 2013, at 1:46 AM, Richard Biener  wrote:

[This function shows another optimization issue:

case BOOLEAN_TYPE:
  /* Cache false or true.  */
  limit = 2;
  if (wi::leu_p (cst, 1))
ix = cst.to_uhwi ();

I would have expected cst <= 1 be optimized to cst.len == 1 &&
cst.val[0] <= 1.

So lts_p has the same problem, and is used just below.  If we do:

@@ -1461,12 +1470,22 @@ wi::ne_p (const T1 &x, const T2 &y)
  inline bool
  wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
  {
-  if (x.precision <= HOST_BITS_PER_WIDE_INT
-  && y.precision <= HOST_BITS_PER_WIDE_INT)
-return x.slow () < y.slow ();
-  else
-return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
-   y.precision);
+  // We optimize w < N, where N is 64 or fewer bits.
+  if (y.precision <= HOST_BITS_PER_WIDE_INT)
+{
+  // If it fits directly into a shwi, we can compare directly.
+  if (wi::fits_shwi_p (x))
+   return x.slow () < y.slow ();
+  // If it doesn't fit, then it must be more negative than any
+  // value in y, and hence smaller.
+  if (neg_p (x, SIGNED))
+   return true;
+  // If it is positive, then it must be larger than any value in y,
+  // and hence greater than.
+  return false;
+}
+  return lts_p_large (x.val, x.len, x.precision, y.val, y.len,
+ y.precision);
  }

then for:

bool MGEN(wide_int cst) {
   return wi::lts_p (cst, 100);
}

we generate:

	movl	520(%rsp), %eax
	cmpl	$1, %eax
	je	.L5283
	subl	$1, %eax
	movq	8(%rsp,%rax,8), %rax
	shrq	$63, %rax
	ret
	.p2align 4,,10
	.p2align 3
.L5283:
	cmpq	$99, 8(%rsp)
	setle	%al
	ret

which, as you can see, never calls lts_p_large; hence, if that was the only 
reason the storage escapes, it should be able to optimize better.  The code 
above is minimal: no function calls, and the 100 is propagated all the way 
down into the cmpq.  Now, the reason we should do it this way is that, most 
of the time, most types are 64 bits or less.  When that is true, it will 
never call into lts_p_large.

If the above 100 is changed to l, from a parameter long l, then the code is the 
same except the last part is:

	cmpq	8(%rsp), %rdi
	setg	%al
	ret

If we have two wide_int's, then the code does this:

	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	1052(%rsp), %r9d
	movl	1048(%rsp), %r8d
	movl	532(%rsp), %edx
	movl	528(%rsp), %esi
	cmpl	$64, %r9d
	ja	.L5281
	cmpl	$1, %esi
	je	.L5284
	subl	$1, %esi
	movq	16(%rsp,%rsi,8), %rax
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	shrq	$63, %rax
	ret
	.p2align 4,,10
	.p2align 3
.L5281:
	.cfi_restore_state
	leaq	536(%rsp), %rcx
	leaq	16(%rsp), %rdi
	call	_ZN2wi11lts_p_largeEPKljjS1_jj
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
	.p2align 4,,10
	.p2align 3
.L5284:
	.cfi_restore_state
	movq	536(%rsp), %rax
	cmpq	%rax, 16(%rsp)
	setl	%al
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc

which is still pretty nice.

Does this look OK?  Kenny, can you double-check the cases?  I think I have 
them right, but a second check would be good.

That works for me.

I talked to Mike about this last night, but did not follow up with an 
email until now.  The problem is that this code is wrong!  He is 
working to fix that, so I would expect something from him later 
(tomorrow for those in Europe).  The problem arises if the variable that 
comes in is an unsigned HWI that has the top bit set.  In that case, 
the wide-int canonicalized version would have two HWIs, the top one 
being a 0 block, and so you cannot take the fast path coming in on 
that even though its precision fits.
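That canonicalization detail can be illustrated with a tiny miniature (hypothetical, not the real wide-int class): in a sign-extended block representation, an unsigned 64-bit value with the top bit set needs a second, zero block, so len == 1 cannot be assumed just because the precision fits one HWI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of wide-int's canonical form: sign-extended
   64-bit blocks.  An unsigned value with the top bit set would read as
   negative in a single block, so a zero block is appended (len == 2).  */
struct mini_wide { int64_t val[2]; unsigned len; };

static struct mini_wide from_uhwi (uint64_t x)
{
  struct mini_wide w = { { (int64_t) x, 0 }, 1 };
  if ((int64_t) x < 0)   /* top bit set: the zero block is required */
    w.len = 2;
  return w;
}
```

This is the case the proposed fast path in lts_p must guard against, which is why it tests whether the value fits a signed HWI rather than just comparing precisions.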

the representation should guarantee the
compare with a low precision (32 bit) constant is evaluatable
at compile-time if len of the larger value is > 1, no?

I agree, though, I didn't check the symmetric case, as constants and
smaller values can be put on the right by convention.


The question is whether we should try to optimize wide-int for
such cases or simply not use wi::leu_p (cst, 1) but rather

if (cst.fits_uhwi_p () && cst.to_uhwi () <= 1)

?

Since we can do an excellent job with the simple interface, I'd argue
for the simple interface and the simple change to do the conditional
better.  I'd rather up the ante on meta programming to get any
performance we want, retaining the s

Re: [wide-int] int_traits

2013-10-18 Thread Kenneth Zadeck

Richi,

Do you want me to back out the patch that changes the rep for unsigned 
tree-csts?


kenny


Re: [PATCH] Fix part of PR58712

2013-10-18 Thread Jan Hubicka
> On Fri, Oct 18, 2013 at 12:10 PM, Markus Trippelsdorf
>  wrote:
> > On 2013.10.15 at 12:49 +0200, Richard Biener wrote:
> >> On Tue, Oct 15, 2013 at 12:31 PM, Markus Trippelsdorf
> >>  wrote:
> >> > Valgrind complains:
> >> > ==27870== Conditional jump or move depends on uninitialised value(s)
> >> > ==27870==at 0x557CDC: cgraph_create_edge_1(cgraph_node*, 
> >> > cgraph_node*, gimple_statement_d*, long, int) (cgraph.c:695)
> >> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
> >> > gimple_statement_d*, long, int) (cgraph.c:890)
> >> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
> >> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
> >> > ==27870==by 0x7F1F14: copy_body(copy_body_data*, long, int, 
> >> > basic_block_def*, basic_block_def*, basic_block_def*) 
> >> > (tree-inline.c:1741)
> >> > ...
> >> > ==27870==  Uninitialised value was created by a client request
> >> > ==27870==at 0x50BBEE: ggc_internal_alloc_stat(unsigned long) 
> >> > (ggc-page.c:1339)
> >> > ==27870==by 0x557D92: cgraph_create_edge_1(cgraph_node*, 
> >> > cgraph_node*, gimple_statement_d*, long, int) (cgraph.c:842)
> >> > ==27870==by 0x55882E: cgraph_create_edge(cgraph_node*, cgraph_node*, 
> >> > gimple_statement_d*, long, int) (cgraph.c:890)
> >> > ==27870==by 0x560891: cgraph_clone_edge(cgraph_edge*, cgraph_node*, 
> >> > gimple_statement_d*, unsigned int, long, int, bool) (cgraphclones.c:135)
> >> > ,,,
> >> >
> >> > This happens because e->indirect_unknown_callee may be uninitialized in
> >> > cgraph_add_edge_to_call_site_hash. Fixed by initializing it earlier.
> >> >
> >> > LTO bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >
> >> > Please apply, if this looks reasonable.
> >>
> >> As we also have
> >>
> >> struct cgraph_edge *
> >> cgraph_create_indirect_edge (struct cgraph_node *caller, gimple call_stmt,
> >>  int ecf_flags,
> >>  gcov_type count, int freq)
> >> {
> >>   struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt,
> >>count, freq);
> >>   tree target;
> >>
> >>   edge->indirect_unknown_callee = 1;
> >>
> >> I'd rather change the cgraph_create_edge_1 interface to get an additional
> >> argument (the value to use for indirect_unknown_callee).  Or maybe
> >> we can statically compute it from the current arguments already?
> >
> > IOW something like this:
> 
> Looks good to me.
> 
> Honza?

Yes, this is OK.
Thanks!
Honza
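The uninitialized-field hazard behind this fix can be sketched as follows (a hypothetical simplification; GCC's GGC allocator and cgraph_edge are more involved):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: an allocator that does not zero memory leaves
   flag fields indeterminate.  Setting indirect_unknown_callee inside
   the creation function itself closes the window in which a helper
   (such as the call-site hash insertion) could read it while still
   uninitialized.  */
struct edge { unsigned indirect_unknown_callee : 1; };

static struct edge *create_edge (int indirect)
{
  struct edge *e = malloc (sizeof *e);
  /* Initialize immediately instead of relying on every caller.  */
  e->indirect_unknown_callee = indirect != 0;
  return e;
}
```

Passing the value in (or computing it from the existing arguments), as Richard suggests, achieves the same effect: the field is never observable in an indeterminate state.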


Re: [wide-int] int_traits

2013-10-18 Thread Richard Biener
On Fri, Oct 18, 2013 at 3:13 PM, Kenneth Zadeck
 wrote:
> Richi,
>
> Do you want me to back out the patch that changes the rep for unsigned
> tree-csts?

Not yet please.

Thanks,
Richard.

> kenny


Re: libgo patch committed: Add system calls

2013-10-18 Thread Ian Lance Taylor
On Fri, Oct 18, 2013 at 5:40 AM, Uros Bizjak  wrote:
> Hello!
>
>> This patch to libgo adds some system calls to the syscall package.
>> Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
>> Committed to mainline and 4.8 branch, since these calls were in Go
>> 1.1.2.
>
> Uros.
>
> Index: runtime/go-nosys.c
> ===
> --- runtime/go-nosys.c  (revision 203810)
> +++ runtime/go-nosys.c  (working copy)
> @@ -47,7 +47,7 @@ accept4 (int sockfd __attribute__ ((unused)),
>  int
>  dup3 (int oldfd __attribute__ ((unused)),
>int newfd __attribute__ ((unused)),
> -  int flags __attribtue__ ((unused)))
> +  int flags __attribute__ ((unused)))
>  {
>errno = ENOSYS;
>return -1;



Wow, who writes this stuff?

Committed to mainline and 4.8 branch.

Thanks.

Ian


Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)

2013-10-18 Thread David Malcolm
On Wed, 2013-10-16 at 11:32 -0700, Mike Stump wrote:
> On Oct 16, 2013, at 9:26 AM, Tom Tromey  wrote:
> > Andreas> You could put it in .dir-locals.el, or put a hook on
> > Andreas> find-file-hook that looks at buffer-file-name.
> 
> > We checked in a .dir-locals.el for gdb.  I recommend it for GCC as well.
> 
> I've copied the gdb one and updated it for our tree:
[...]
> Committed revision 203715.

Thanks everyone; I'm finding this very helpful.

Dave



[patch 1/8] Remove gimple-low.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod


gimple_check_call_matching_types() was being called from 3 or 4 
different files, and seemed more appropriate as a cgraph routine (which 
called it 3 times), so I moved it and its helper to cgraph.c.


After that, I only needed to update 4 .c files to directly include 
gimple-low.h.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


  * tree-ssa.h: Remove gimple-low.h from include list.
  * gimple-low.c (gimple_check_call_arg, gimple_check_call_matching_types):
  Move to cgraph.c.
  * gimple-low.h: Remove prototype.
  * cgraph.c (gimple_check_call_arg, gimple_check_call_matching_types):
  Relocate from gimple-low.c.
  * cgraph.h: Add prototype.
  * gimplify.c: Add gimple-low.h to include list.
  * omp-low.c: Add gimple-low.h to include list.
  * tree-eh.c: Add gimple-low.h to include list.
  * tree-nested.c: Add gimple-low.h to include list.

*** T2/tree-ssa.h	2013-10-17 09:36:38.577417384 -0400
--- tree-ssa.h	2013-10-17 09:37:07.962843834 -0400
*** along with GCC; see the file COPYING3.
*** 36,42 
  #include "tree-ssa-address.h"
  #include "tree-ssa-loop.h"
  #include "tree-into-ssa.h"
- #include "gimple-low.h"
  #include "tree-dfa.h"
  
  /* Mapping for redirected edges.  */
--- 36,41 
*** T2/gimple-low.c	2013-10-17 09:36:38.544418194 -0400
--- gimple-low.c	2013-10-17 10:43:00.860672918 -0400
*** along with GCC; see the file COPYING3.
*** 32,37 
--- 32,38 
  #include "diagnostic-core.h"
  #include "tree-pass.h"
  #include "langhooks.h"
+ #include "gimple-low.h"
  
  /* The differences between High GIMPLE and Low GIMPLE are the
 following:
*** make_pass_lower_cf (gcc::context *ctxt)
*** 215,317 
return new pass_lower_cf (ctxt);
  }
  
- 
- 
- /* Verify if the type of the argument matches that of the function
-declaration.  If we cannot verify this or there is a mismatch,
-return false.  */
- 
- static bool
- gimple_check_call_args (gimple stmt, tree fndecl, bool args_count_match)
- {
-   tree parms, p;
-   unsigned int i, nargs;
- 
-   /* Calls to internal functions always match their signature.  */
-   if (gimple_call_internal_p (stmt))
- return true;
- 
-   nargs = gimple_call_num_args (stmt);
- 
-   /* Get argument types for verification.  */
-   if (fndecl)
- parms = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
-   else
- parms = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
- 
-   /* Verify if the type of the argument matches that of the function
-  declaration.  If we cannot verify this or there is a mismatch,
-  return false.  */
-   if (fndecl && DECL_ARGUMENTS (fndecl))
- {
-   for (i = 0, p = DECL_ARGUMENTS (fndecl);
- 	   i < nargs;
- 	   i++, p = DECL_CHAIN (p))
- 	{
- 	  tree arg;
- 	  /* We cannot distinguish a varargs function from the case
- 	 of excess parameters, still deferring the inlining decision
- 	 to the callee is possible.  */
- 	  if (!p)
- 	break;
- 	  arg = gimple_call_arg (stmt, i);
- 	  if (p == error_mark_node
- 	  || arg == error_mark_node
- 	  || (!types_compatible_p (DECL_ARG_TYPE (p), TREE_TYPE (arg))
- 		  && !fold_convertible_p (DECL_ARG_TYPE (p), arg)))
- return false;
- 	}
-   if (args_count_match && p)
- 	return false;
- }
-   else if (parms)
- {
-   for (i = 0, p = parms; i < nargs; i++, p = TREE_CHAIN (p))
- 	{
- 	  tree arg;
- 	  /* If this is a varargs function defer inlining decision
- 	 to callee.  */
- 	  if (!p)
- 	break;
- 	  arg = gimple_call_arg (stmt, i);
- 	  if (TREE_VALUE (p) == error_mark_node
- 	  || arg == error_mark_node
- 	  || TREE_CODE (TREE_VALUE (p)) == VOID_TYPE
- 	  || (!types_compatible_p (TREE_VALUE (p), TREE_TYPE (arg))
- 		  && !fold_convertible_p (TREE_VALUE (p), arg)))
- return false;
- 	}
- }
-   else
- {
-   if (nargs != 0)
- return false;
- }
-   return true;
- }
- 
- /* Verify if the type of the argument and lhs of CALL_STMT matches
-that of the function declaration CALLEE. If ARGS_COUNT_MATCH is
-true, the arg count needs to be the same.
-If we cannot verify this or there is a mismatch, return false.  */
- 
- bool
- gimple_check_call_matching_types (gimple call_stmt, tree callee,
-   bool args_count_match)
- {
-   tree lhs;
- 
-   if ((DECL_RESULT (callee)
-&& !DECL_BY_REFERENCE (DECL_RESULT (callee))
-&& (lhs = gimple_call_lhs (call_stmt)) != NULL_TREE
-&& !useless_type_conversion_p (TREE_TYPE (DECL_RESULT (callee)),
-   TREE_TYPE (lhs))
-&& !fold_convertible_p (TREE_TYPE (DECL_RESULT (callee)), lhs))
-   || !gimple_check_call_args (call_stmt, callee, args_count_match))
- return false;
-   return true;
- }
- 
  /* Lower sequence SEQ.  Unlike gimplification the statements are not relowered
 when they are changed -- if this has to be done, the lowering routine must
 do it ex

[patch 2/8] Remove tree-ssa-address.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod
This patch just does the direct thing: included the file where it was 
needed (7 .c files).  I was tempted to move copy_ref_info() to a 
different file... it's actually the only routine in this file which is 
SSA based... and rename the file tree-address.[ch].  Not sure where I'd 
move it to, maybe tree-ssa.c... Almost every routine called is in 
tree-ssanames.c, but that doesn't seem quite appropriate.  Maybe it's OK 
as it is.


In any case, if you want something further done, I can follow up with 
another patch.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


	* tree-ssa.h: Don't include tree-ssa-address.h.
	* cfgexpand.c: Add tree-ssa-address.h to include list.
	* expr.c: Likewise.
	* gimple-fold.c: Likewise.
	* gimple-ssa-strength-reduction.c: Likewise.
	* trans-mem.c: Likewise.
	* tree-mudflap.c: Likewise.
	* tree-ssa-loop-ivopts.c: Likewise.

*** T3/tree-ssa.h	2013-10-17 10:52:02.858047334 -0400
--- tree-ssa.h	2013-10-17 12:10:07.460476961 -0400
*** along with GCC; see the file COPYING3.
*** 33,39 
  #include "tree-ssanames.h"
  #include "tree-ssa-dom.h"
  #include "tree-ssa-threadedge.h"
- #include "tree-ssa-address.h"
  #include "tree-ssa-loop.h"
  #include "tree-into-ssa.h"
  #include "tree-dfa.h"
--- 33,38 
*** T3/cfgexpand.c	2013-10-17 10:52:02.825047320 -0400
--- cfgexpand.c	2013-10-17 12:21:13.884452160 -0400
*** along with GCC; see the file COPYING3.
*** 47,52 
--- 47,53 
  #include "regs.h" /* For reg_renumber.  */
  #include "insn-attr.h" /* For INSN_SCHEDULING.  */
  #include "asan.h"
+ #include "tree-ssa-address.h"
  
  /* This variable holds information helping the rewriting of SSA trees
 into RTL.  */
*** T3/expr.c	2013-10-17 10:52:02.831047323 -0400
--- expr.c	2013-10-17 12:22:05.594452615 -0400
*** along with GCC; see the file COPYING3.
*** 52,57 
--- 52,58 
  #include "tree-outof-ssa.h"
  #include "target-globals.h"
  #include "params.h"
+ #include "tree-ssa-address.h"
  
  /* Decide whether a function's arguments should be processed
 from first to last or from last to first.
*** T3/gimple-fold.c	2013-10-17 10:52:02.836047325 -0400
--- gimple-fold.c	2013-10-17 12:21:57.753452521 -0400
*** along with GCC; see the file COPYING3.
*** 31,36 
--- 31,37 
  #include "target.h"
  #include "ipa-utils.h"
  #include "gimple-pretty-print.h"
+ #include "tree-ssa-address.h"
  
  /* Return true when DECL can be referenced from current unit.
 FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
*** T3/gimple-ssa-strength-reduction.c	2013-10-17 10:52:02.836047325 -0400
--- gimple-ssa-strength-reduction.c	2013-10-17 12:21:05.563452124 -0400
*** along with GCC; see the file COPYING3.
*** 48,53 
--- 48,54 
  #include "expmed.h"
  #include "params.h"
  #include "hash-table.h"
+ #include "tree-ssa-address.h"
  
  /* Information about a strength reduction candidate.  Each statement
 in the candidate table represents an expression of one of the
*** T3/trans-mem.c	2013-10-17 10:52:02.853047332 -0400
--- trans-mem.c	2013-10-17 12:21:09.283452139 -0400
***
*** 35,40 
--- 35,41 
  #include "langhooks.h"
  #include "gimple-pretty-print.h"
  #include "cfgloop.h"
+ #include "tree-ssa-address.h"
  
  
  #define PROB_VERY_UNLIKELY	(REG_BR_PROB_BASE / 2000 - 1)
*** T3/tree-mudflap.c	2013-10-17 10:52:02.856047334 -0400
--- tree-mudflap.c	2013-10-17 12:20:54.929452093 -0400
*** along with GCC; see the file COPYING3.
*** 42,47 
--- 42,48 
  #include "ggc.h"
  #include "cgraph.h"
  #include "gimple.h"
+ #include "tree-ssa-address.h"
  
  extern void add_bb_to_loop (basic_block, struct loop *);
  
*** T3/tree-ssa-loop-ivopts.c	2013-10-17 10:52:02.859047335 -0400
--- tree-ssa-loop-ivopts.c	2013-10-17 12:20:59.933452106 -0400
*** along with GCC; see the file COPYING3.
*** 86,91 
--- 86,92 
  #include "tree-inline.h"
  #include "tree-ssa-propagate.h"
  #include "expmed.h"
+ #include "tree-ssa-address.h"
  
  /* FIXME: Expressions are expanded to RTL in this pass to determine the
 cost of different addressing modes.  This should be moved to a TBD


[patch 3/8] Remove tree-ssa-threadedge.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod
Straightforward.  Just include tree-ssa-threadedge.h in the 4 .c files 
that need it.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew



	* tree-ssa.h: Don't include tree-ssa-threadedge.h.
	* tree-ssa-dom.c: Include tree-ssa-threadedge.h.
	* tree-ssa-loop-ch.c: Likewise.
	* tree-ssa-threadedge.c: Likewise.
	* tree-vrp.c: Likewise.

*** T4/tree-ssa.h	2013-10-17 12:27:21.260463076 -0400
--- tree-ssa.h	2013-10-17 12:27:36.545463878 -0400
*** along with GCC; see the file COPYING3.
*** 32,38 
  #include "ssa-iterators.h"
  #include "tree-ssanames.h"
  #include "tree-ssa-dom.h"
- #include "tree-ssa-threadedge.h"
  #include "tree-ssa-loop.h"
  #include "tree-into-ssa.h"
  #include "tree-dfa.h"
--- 32,37 
*** T4/tree-ssa-dom.c	2013-10-17 12:27:21.260463076 -0400
--- tree-ssa-dom.c	2013-10-17 13:00:46.759975424 -0400
*** along with GCC; see the file COPYING3.
*** 37,42 
--- 37,43 
  #include "tree-ssa-threadupdate.h"
  #include "langhooks.h"
  #include "params.h"
+ #include "tree-ssa-threadedge.h"
  
  /* This file implements optimizations on the dominator tree.  */
  
*** T4/tree-ssa-loop-ch.c	2013-10-17 12:27:21.260463076 -0400
--- tree-ssa-loop-ch.c	2013-10-17 13:00:33.272976093 -0400
*** along with GCC; see the file COPYING3.
*** 29,34 
--- 29,35 
  #include "cfgloop.h"
  #include "tree-inline.h"
  #include "flags.h"
+ #include "tree-ssa-threadedge.h"
  
  /* Duplicates headers of loops if they are small enough, so that the statements
 in the loop body are always executed when the loop is entered.  This
*** T4/tree-ssa-threadedge.c	2013-10-17 12:27:21.264463076 -0400
--- tree-ssa-threadedge.c	2013-10-17 13:00:41.902975659 -0400
*** along with GCC; see the file COPYING3.
*** 35,40 
--- 35,41 
  #include "tree-ssa-threadupdate.h"
  #include "langhooks.h"
  #include "params.h"
+ #include "tree-ssa-threadedge.h"
  
  /* To avoid code explosion due to jump threading, we limit the
 number of statements we are going to copy.  This variable
*** T4/tree-vrp.c	2013-10-17 12:27:21.268463076 -0400
--- tree-vrp.c	2013-10-17 13:00:51.003975224 -0400
*** along with GCC; see the file COPYING3.
*** 39,44 
--- 39,45 
  #include "tree-ssa-threadupdate.h"
  #include "expr.h"
  #include "optabs.h"
+ #include "tree-ssa-threadedge.h"
  
  
  


Re: libbacktrace patch committed (Was: Re: [jit] Update TODO.rst)

2013-10-18 Thread David Malcolm
On Thu, 2013-10-17 at 21:28 -0700, Ian Lance Taylor wrote:
> On Thu, Oct 17, 2013 at 8:54 PM, David Malcolm  wrote:
> >
> > +* segfault seen in libbacktrace, when an ICE occurs
> 
> That reminded me to commit this libbacktrace patch I worked up a
> couple of weeks ago.  Previously if some debug section was missing,
> the code could compute the wrong min_offset.  The missing section
> would have a zero offset, so min_offset would be set to zero, and
> would then be set to the offset of the next section, even though that
> one might not be the minimum.  That could lead to a segfault in some
> cases, though I don't know if that is the issue that David is seeing.
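The min_offset bug Ian describes can be sketched as follows (a hypothetical simplification, not the actual libbacktrace code): a missing section reports offset 0, and a naive running-minimum update can end up at a non-minimal offset; skipping absent sections computes the true minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the bug: a missing debug section reports
   offset 0.  Taking that 0 as the running minimum and then "fixing" it
   with whichever section comes next can yield a non-minimal offset.
   Skipping absent (zero-offset) sections computes the true minimum.  */
static size_t min_present_offset (const size_t *offsets, size_t n)
{
  size_t min = 0;
  for (size_t i = 0; i < n; i++)
    if (offsets[i] != 0 && (min == 0 || offsets[i] < min))
      min = offsets[i];
  return min;  /* 0 means no section was present */
}
```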

Thanks - your patch has fixed the issue I was seeing, and I now reliably
get backtraces when an ICE happens within libgccjit.so.

Now to try to fix things so that ICEs can't happen...

Dave



[patch 4/8] Remove tree-ssa-dom.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod
degenerate_phi_result was defined in tree-ssa-dom.c; I moved it to 
tree-phinodes.c, since all it does is determine whether the arguments of 
the phi which are not the same as the result are all the same.  This 
reduced by half the number of files which required tree-ssa-dom.h.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


	* tree-ssa.h: Don't include tree-ssa-dom.h.
	* tree-ssa-dom.c: Include tree-ssa-dom.h.
	(degenerate_phi_result): Move to tree-phinodes.c.
	* tree-phinodes.c (degenerate_phi_result): Relocate here.
	* tree-ssa-dom.h: Remove prototype.
	* tree-phinodes.h: Add prototype.
	* tree-ssa-copy.c: Include tree-ssa-dom.h.
	* tree-ssa-forwprop.c: Likewise.

*** T5/tree-ssa.h	2013-10-17 13:05:11.372947947 -0400
--- tree-ssa.h	2013-10-17 13:05:32.445945346 -0400
*** along with GCC; see the file COPYING3.
*** 31,37 
  #include "tree-phinodes.h"
  #include "ssa-iterators.h"
  #include "tree-ssanames.h"
- #include "tree-ssa-dom.h"
  #include "tree-ssa-loop.h"
  #include "tree-into-ssa.h"
  #include "tree-dfa.h"
--- 31,36 
*** T5/tree-ssa-dom.c	2013-10-17 13:05:11.368947948 -0400
--- tree-ssa-dom.c	2013-10-17 13:14:26.712902297 -0400
*** along with GCC; see the file COPYING3.
*** 38,43 
--- 38,44 
  #include "langhooks.h"
  #include "params.h"
  #include "tree-ssa-threadedge.h"
+ #include "tree-ssa-dom.h"
  
  /* This file implements optimizations on the dominator tree.  */
  
*** avail_expr_hash (const void *p)
*** 2589,2630 
  /* PHI-ONLY copy and constant propagation.  This pass is meant to clean
 up degenerate PHIs created by or exposed by jump threading.  */
  
- /* Given PHI, return its RHS if the PHI is a degenerate, otherwise return
-NULL.  */
- 
- tree
- degenerate_phi_result (gimple phi)
- {
-   tree lhs = gimple_phi_result (phi);
-   tree val = NULL;
-   size_t i;
- 
-   /* Ignoring arguments which are the same as LHS, if all the remaining
-  arguments are the same, then the PHI is a degenerate and has the
-  value of that common argument.  */
-   for (i = 0; i < gimple_phi_num_args (phi); i++)
- {
-   tree arg = gimple_phi_arg_def (phi, i);
- 
-   if (arg == lhs)
- 	continue;
-   else if (!arg)
- 	break;
-   else if (!val)
- 	val = arg;
-   else if (arg == val)
- 	continue;
-   /* We bring in some of operand_equal_p not only to speed things
- 	 up, but also to avoid crashing when dereferencing the type of
- 	 a released SSA name.  */
-   else if (TREE_CODE (val) != TREE_CODE (arg)
- 	   || TREE_CODE (val) == SSA_NAME
- 	   || !operand_equal_p (arg, val, 0))
- 	break;
- }
-   return (i == gimple_phi_num_args (phi) ? val : NULL);
- }
- 
  /* Given a statement STMT, which is either a PHI node or an assignment,
 remove it from the IL.  */
  
--- 2590,2595 
*** T5/tree-phinodes.c	2013-10-17 13:05:11.366947948 -0400
--- tree-phinodes.c	2013-10-17 13:10:14.281917669 -0400
*** remove_phi_nodes (basic_block bb)
*** 464,467 
--- 464,504 
set_phi_nodes (bb, NULL);
  }
  
+ /* Given PHI, return its RHS if the PHI is a degenerate, otherwise return
+NULL.  */
+ 
+ tree
+ degenerate_phi_result (gimple phi)
+ {
+   tree lhs = gimple_phi_result (phi);
+   tree val = NULL;
+   size_t i;
+ 
+   /* Ignoring arguments which are the same as LHS, if all the remaining
+  arguments are the same, then the PHI is a degenerate and has the
+  value of that common argument.  */
+   for (i = 0; i < gimple_phi_num_args (phi); i++)
+ {
+   tree arg = gimple_phi_arg_def (phi, i);
+ 
+   if (arg == lhs)
+ 	continue;
+   else if (!arg)
+ 	break;
+   else if (!val)
+ 	val = arg;
+   else if (arg == val)
+ 	continue;
+   /* We bring in some of operand_equal_p not only to speed things
+ 	 up, but also to avoid crashing when dereferencing the type of
+ 	 a released SSA name.  */
+   else if (TREE_CODE (val) != TREE_CODE (arg)
+ 	   || TREE_CODE (val) == SSA_NAME
+ 	   || !operand_equal_p (arg, val, 0))
+ 	break;
+ }
+   return (i == gimple_phi_num_args (phi) ? val : NULL);
+ }
+ 
+ 
  #include "gt-tree-phinodes.h"
*** T5/tree-ssa-dom.h	2013-10-17 13:05:11.368947948 -0400
--- tree-ssa-dom.h	2013-10-17 13:11:27.636912389 -0400
*** extern void dump_dominator_optimization_
*** 24,29 
  extern void debug_dominator_optimization_stats (void);
  extern int loop_depth_of_name (tree);
  extern bool simple_iv_increment_p (gimple);
- extern tree degenerate_phi_result (gimple);
  
  #endif /* GCC_TREE_SSA_DOM_H */
--- 24,28 
*** T5/tree-phinodes.h	2013-10-17 13:05:11.366947948 -0400
--- tree-phinodes.h	2013-10-17 13:11:25.245912550 -0400
*** extern void add_phi_arg (gimple, tree, e
*** 29,35 
  extern void remove_phi_args (edge);
  extern void remove_phi_node (gimple_stmt_iterator *, bool);
  extern void remove_phi_nodes (basic_block);
! /* Return a use_operand_p pointer for arg

[patch 5/8] Remove tree-cfgcleanup.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod
This was slightly more interesting.  I moved the 
cleanup_cfg_post_optimizing pass from tree-optimize.c to 
tree-cfgcleanup.c, and that left tree-optimize.c empty... so I deleted 
that file as well.  Other than that, I just included tree-cfgcleanup.h 
in the other 7 files that require it.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


	* tree-ssa.h: Don't include tree-cfgcleanup.h.
	* tree-cfgcleanup.c: Include tree-cfgcleanup.h.
	(execute_cleanup_cfg_post_optimizing,
	pass_data_cleanup_cfg_post_optimizing,
	make_pass_cleanup_cfg_post_optimizing): Relocate from tree-optimize.c.
	* tree-optimize.c: Delete file.
	* graphite.c: Include tree-cfgcleanup.h.
	* omp-low.c: Likewise.
	* passes.c: Likewise.
	* tree-cfg.c: Likewise.
	* tree-profile.c: Likewise.
	* tree-ssa-dse.c: Likewise.
	* tree-ssa-loop-ivcanon.c: Likewise.
	* Makefile.in (OBJS): Delete tree-optimize.o.

*** T6/tree-ssa.h	2013-10-17 14:35:07.043401391 -0400
--- tree-ssa.h	2013-10-17 14:35:17.332401630 -0400
*** along with GCC; see the file COPYING3.
*** 26,32 
  #include "tree-ssa-operands.h"
  #include "gimple-ssa.h"
  #include "cgraph.h"
- #include "tree-cfgcleanup.h"
  #include "tree-cfg.h"
  #include "tree-phinodes.h"
  #include "ssa-iterators.h"
--- 26,31 
*** T6/tree-cfgcleanup.c	2013-10-17 14:35:07.021401391 -0400
--- tree-cfgcleanup.c	2013-10-17 14:35:17.333401630 -0400
*** make_pass_merge_phi (gcc::context *ctxt)
*** 1027,1029 
--- 1027,1116 
  {
return new pass_merge_phi (ctxt);
  }
+ 
+ /* Pass: cleanup the CFG just before expanding trees to RTL.
+This is just a round of label cleanups and case node grouping
+because after the tree optimizers have run such cleanups may
+be necessary.  */
+ 
+ static unsigned int
+ execute_cleanup_cfg_post_optimizing (void)
+ {
+   unsigned int todo = 0;
+   if (cleanup_tree_cfg ())
+ todo |= TODO_update_ssa;
+   maybe_remove_unreachable_handlers ();
+   cleanup_dead_labels ();
+   group_case_labels ();
+   if ((flag_compare_debug_opt || flag_compare_debug)
+   && flag_dump_final_insns)
+ {
+   FILE *final_output = fopen (flag_dump_final_insns, "a");
+ 
+   if (!final_output)
+ 	{
+ 	  error ("could not open final insn dump file %qs: %m",
+ 		 flag_dump_final_insns);
+ 	  flag_dump_final_insns = NULL;
+ 	}
+   else
+ 	{
+ 	  int save_unnumbered = flag_dump_unnumbered;
+ 	  int save_noaddr = flag_dump_noaddr;
+ 
+ 	  flag_dump_noaddr = flag_dump_unnumbered = 1;
+ 	  fprintf (final_output, "\n");
+ 	  dump_enumerated_decls (final_output, dump_flags | TDF_NOUID);
+ 	  flag_dump_noaddr = save_noaddr;
+ 	  flag_dump_unnumbered = save_unnumbered;
+ 	  if (fclose (final_output))
+ 	{
+ 	  error ("could not close final insn dump file %qs: %m",
+ 		 flag_dump_final_insns);
+ 	  flag_dump_final_insns = NULL;
+ 	}
+ 	}
+ }
+   return todo;
+ }
+ 
+ namespace {
+ 
+ const pass_data pass_data_cleanup_cfg_post_optimizing =
+ {
+   GIMPLE_PASS, /* type */
+   "optimized", /* name */
+   OPTGROUP_NONE, /* optinfo_flags */
+   false, /* has_gate */
+   true, /* has_execute */
+   TV_TREE_CLEANUP_CFG, /* tv_id */
+   PROP_cfg, /* properties_required */
+   0, /* properties_provided */
+   0, /* properties_destroyed */
+   0, /* todo_flags_start */
+   TODO_remove_unused_locals, /* todo_flags_finish */
+ };
+ 
+ class pass_cleanup_cfg_post_optimizing : public gimple_opt_pass
+ {
+ public:
+   pass_cleanup_cfg_post_optimizing (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_cleanup_cfg_post_optimizing, ctxt)
+   {}
+ 
+   /* opt_pass methods: */
+   unsigned int execute () {
+ return execute_cleanup_cfg_post_optimizing ();
+   }
+ 
+ }; // class pass_cleanup_cfg_post_optimizing
+ 
+ } // anon namespace
+ 
+ gimple_opt_pass *
+ make_pass_cleanup_cfg_post_optimizing (gcc::context *ctxt)
+ {
+   return new pass_cleanup_cfg_post_optimizing (ctxt);
+ }
+ 
+ 
*** T6/tree-optimize.c	2013-10-17 14:35:07.033401391 -0400
--- tree-optimize.c	1969-12-31 19:00:00.0 -0500
***
*** 1,130 
- /* Top-level control of tree optimizations.
-Copyright (C) 2001-2013 Free Software Foundation, Inc.
-Contributed by Diego Novillo 
- 
- This file is part of GCC.
- 
- GCC is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 3, or (at your option)
- any later version.
- 
- GCC is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- GNU General Public License for more details.
- 
- You should have received a copy of the GNU General Public License
- along with GCC; see the file COPYING3.  If not see
- .  */
- 
- #include "config.h"
- #include "system.h"
- #include "coretypes.h"
- #include "tm.h"
- #include "tree.h"
- #include 

[patch 6/8] Remove sbitmap.h from the tree-ssa.h include list.

2013-10-18 Thread Andrew MacLeod

Also straightforward: includes sbitmap.h in the 5 files that need it.
I also happened to notice that tree-switch-conversion.c was including 
tree-ssa-operands.h directly and doesn't need it, so I removed it too.


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


	* tree-switch-conversion.c: Don't include tree-ssa-operands.h.
	* tree-ssa.h: Don't include sbitmap.h.
	* tree-outof-ssa.c: Include sbitmap.h.
	* tree-ssa-live.c: Likewise.
	* tree-ssa-propagate.c: Likewise.
	* tree-ssa-structalias.c: Likewise.
	* tree-stdarg.c: Likewise.

*** A7/tree-switch-conversion.c	2013-10-17 18:59:40.114873462 -0400
--- tree-switch-conversion.c	2013-10-17 18:59:45.511872296 -0400
*** Software Foundation, 51 Franklin Street,
*** 32,38 
  #include "tree.h"
  #include "basic-block.h"
  #include "tree-ssa.h"
- #include "tree-ssa-operands.h"
  #include "tree-pass.h"
  #include "gimple-pretty-print.h"
  #include "cfgloop.h"
--- 32,37 
*** A7/tree-ssa.h	2013-10-17 18:59:40.102873464 -0400
--- tree-ssa.h	2013-10-17 19:00:34.201861784 -0400
*** along with GCC; see the file COPYING3.
*** 21,27 
  #define GCC_TREE_SSA_H
  
  #include "bitmap.h"
- #include "sbitmap.h"
  #include "gimple.h"
  #include "gimple-ssa.h"
  #include "cgraph.h"
--- 21,26 
*** A7/tree-outof-ssa.c	2013-10-17 18:59:40.099873465 -0400
--- tree-outof-ssa.c	2013-10-17 18:59:45.512872296 -0400
*** along with GCC; see the file COPYING3.
*** 27,32 
--- 27,33 
  #include "basic-block.h"
  #include "gimple-pretty-print.h"
  #include "bitmap.h"
+ #include "sbitmap.h"
  #include "tree-ssa.h"
  #include "dumpfile.h"
  #include "diagnostic-core.h"
*** A7/tree-ssa-live.c	2013-10-17 18:59:40.102873464 -0400
--- tree-ssa-live.c	2013-10-17 18:59:45.513872296 -0400
*** along with GCC; see the file COPYING3.
*** 26,31 
--- 26,32 
  #include "tree.h"
  #include "gimple-pretty-print.h"
  #include "bitmap.h"
+ #include "sbitmap.h"
  #include "tree-ssa.h"
  #include "timevar.h"
  #include "dumpfile.h"
*** A7/tree-ssa-propagate.c	2013-10-17 18:59:40.111873462 -0400
--- tree-ssa-propagate.c	2013-10-17 18:59:45.514872295 -0400
***
*** 29,34 
--- 29,35 
  #include "function.h"
  #include "gimple-pretty-print.h"
  #include "dumpfile.h"
+ #include "sbitmap.h"
  #include "tree-ssa.h"
  #include "tree-ssa-propagate.h"
  #include "langhooks.h"
*** A7/tree-ssa-structalias.c	2013-10-17 18:59:40.113873462 -0400
--- tree-ssa-structalias.c	2013-10-17 18:59:45.515872295 -0400
***
*** 25,30 
--- 25,31 
  #include "ggc.h"
  #include "obstack.h"
  #include "bitmap.h"
+ #include "sbitmap.h"
  #include "flags.h"
  #include "basic-block.h"
  #include "tree.h"
*** A7/tree-stdarg.c	2013-10-17 18:59:40.114873462 -0400
--- tree-stdarg.c	2013-10-17 18:59:45.516872295 -0400
*** along with GCC; see the file COPYING3.
*** 28,33 
--- 28,34 
  #include "gimple-pretty-print.h"
  #include "target.h"
  #include "tree-ssa.h"
+ #include "sbitmap.h"
  #include "tree-pass.h"
  #include "tree-stdarg.h"
  


[patch 7/8] Remove basic-block.h from cgraph.h

2013-10-18 Thread Andrew MacLeod
I also happened to notice that basic-block.h was being included directly 
by cgraph.h.  The only things cgraph.h and most of its users need are 
the typedefs for gcov_type:


typedef HOST_WIDEST_INT gcov_type;
typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;

This patch moves gcov_type and gcov_type_unsigned to coretypes.h.

I will note this is the first time HWINT is introduced to coretypes, but 
I believe system.h is supposed to always be included first, and that 
includes HWINT.  We could also include HWINT from coretypes... but I'm 
not sure what the standard practice for what goes into coretypes 
actually is.


In any case, by removing the basic-block.h include from cgraph.h, 
varasm.c was the only .c file that required a new include.


Does this seem reasonable?

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew



	* basic-block.h (gcov_type, gcov_type_unsigned): Move to coretypes.h.
	* coretypes.h (gcov_type, gcov_type_unsigned): Relocate here.
	* cgraph.h: Don't include basic-block.h.
	* varasm.c: Include basic-block.h.

*** T8/basic-block.h	2013-10-17 14:05:41.283303134 -0400
--- basic-block.h	2013-10-17 14:28:51.313383495 -0400
*** along with GCC; see the file COPYING3.
*** 24,36 
  #include "vec.h"
  #include "function.h"
  
! /* Type we use to hold basic block counters.  Should be at least
 64bit.  Although a counter cannot be negative, we use a signed
 type, because erroneous negative counts can be generated when the
 flow graph is manipulated by various optimizations.  A signed type
 makes those easy to detect.  */
- typedef HOST_WIDEST_INT gcov_type;
- typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
  
  /* Control flow edge information.  */
  struct GTY((user)) edge_def {
--- 24,34 
  #include "vec.h"
  #include "function.h"
  
! /* Use gcov_type to hold basic block counters.  Should be at least
 64bit.  Although a counter cannot be negative, we use a signed
 type, because erroneous negative counts can be generated when the
 flow graph is manipulated by various optimizations.  A signed type
 makes those easy to detect.  */
  
  /* Control flow edge information.  */
  struct GTY((user)) edge_def {
*** T8/coretypes.h	2013-10-17 14:05:41.288303134 -0400
--- coretypes.h	2013-10-17 14:13:33.007307287 -0400
*** see the files COPYING3 and COPYING.RUNTI
*** 43,48 
--- 43,51 
  
  #ifndef USED_FOR_TARGET
  
+ typedef HOST_WIDEST_INT gcov_type;
+ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
+ 
  struct bitmap_head_def;
  typedef struct bitmap_head_def *bitmap;
  typedef const struct bitmap_head_def *const_bitmap;
*** T8/cgraph.h	2013-10-17 14:05:41.286303134 -0400
--- cgraph.h	2013-10-17 14:25:16.424376641 -0400
*** along with GCC; see the file COPYING3.
*** 25,31 
  #include "plugin-api.h"
  #include "vec.h"
  #include "tree.h"
- #include "basic-block.h"
  #include "function.h"
  #include "ipa-ref.h"
  
--- 25,30 
*** T8/varasm.c	2013-10-17 14:05:41.335303131 -0400
--- varasm.c	2013-10-17 14:17:08.632339088 -0400
*** along with GCC; see the file COPYING3.
*** 50,55 
--- 50,56 
  #include "cgraph.h"
  #include "pointer-set.h"
  #include "asan.h"
+ #include "basic-block.h"
  
  #ifdef XCOFF_DEBUGGING_INFO
  #include "xcoffout.h"		/* Needed for external data


Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 2)

2013-10-18 Thread Sergey Ostanevich
failed on 403 of '06:
regclass.c: In function 'init_reg_sets':
regclass.c:277:1: internal compiler error: tree check: expected
ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393
 init_reg_sets ()
 ^
0xb74124 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
../../gcc/tree.c:9301
0xb0ade2 tree_check
../../gcc/tree.h:2674
0xb0ade2 copy_ssa_name_fn(function*, tree_node*, gimple_statement_d*)
../../gcc/tree-ssanames.c:393
0xb0ae6c duplicate_ssa_name_fn(function*, tree_node*, gimple_statement_d*)
../../gcc/tree-ssanames.c:452
0x9ce82d duplicate_ssa_name
../../gcc/tree-ssanames.h:121
0x9ce82d create_new_def_for(tree_node*, gimple_statement_d*, tree_node**)
../../gcc/tree-into-ssa.c:2836
0x986a4e gimple_duplicate_bb
../../gcc/tree-cfg.c:5496
0x6425c9 duplicate_block(basic_block_def*, edge_def*, basic_block_def*)
../../gcc/cfghooks.c:1040
0x642bfe copy_bbs(basic_block_def**, unsigned int, basic_block_def**,
edge_def**, unsigned int, edge_def**, loop*, basic_block_def*, bool)
../../gcc/cfghooks.c:1320
0x649d69 duplicate_loop_to_header_edge(loop*, edge_def*, unsigned int,
simple_bitmap_def*, edge_def*, vec*, int)
../../gcc/cfgloopmanip.c:1314
0xa83d09 gimple_duplicate_loop_to_header_edge(loop*, edge_def*,
unsigned int, simple_bitmap_def*, edge_def*, vec*, int)
../../gcc/tree-ssa-loop-manip.c:761
0x64b760 loop_version(loop*, void*, basic_block_def**, unsigned int,
unsigned int, unsigned int, bool)
../../gcc/cfgloopmanip.c:1706
0x9b71ae version_loop_for_if_conversion
../../gcc/tree-if-conv.c:1943
0x9baa3e tree_if_conversion
../../gcc/tree-if-conv.c:2069
0x9baa3e main_tree_if_conversion
../../gcc/tree-if-conv.c:2089
0x9baa3e execute
../../gcc/tree-if-conv.c:2139
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
specmake: *** [regclass.o] Error 1

On Thu, Oct 17, 2013 at 8:55 PM, Jakub Jelinek  wrote:
> Hi!
>
> On Tue, Oct 15, 2013 at 02:32:25PM +0200, Jakub Jelinek wrote:
>> Especially on i?86/x86_64 if-conversion pass seems to be often
>> a pessimization, but the vectorization relies on it and without it we can't
>> vectorize a lot of the loops.
>
> Here is an updated patchset:
> - first patch is an updated version of the
>   http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01058.html
>   patch
> - second patch is one variant (easier to write and perhaps maintain,
>   but perhaps with higher compile time and memory requirements) of the
>   follow up to support if-converted inner loop in outer loop vectorization;
>   I have started also working on variant where just the vectorizer
>   groks outer loop with LOOP_VECTORIZED call and two inner loops,
>   but am only about half way through the needed changes and would
>   prefer to wait how these patches actually work out together
> - third patch is an updated version of the
>   http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html patch, this
>   time without if-unconversion pass, but instead performed only in the
>   versioned loops for vectorization purposes
>
> All 3 patches bootstrapped/regtested together on x86_64-linux and
> i686-linux, the first patch and first+second patch additionally tested
> alone with
> make -C gcc -j4 -k check RUNTESTFLAGS=vect.exp
> make -C gcc check-gfortran RUNTESTFLAGS="dg.exp='assumed_rank* pr32533*' 
> execute.exp='forall* intrinsic_mmloc* pr54767.f90'"
> make -C x86*/libgomp check RUNTESTFLAGS=fortran.exp=omp_parse*
> (the latter two are set of tests that failed at some point during
> the development of the patchset).
>
> If anyone has spare cycles to e.g. SPEC2k{,6} test it on his favourite
> platforms (both for resulting code performance and for compile time
> and/or compile memory usage), it would be greatly appreciated.
>
> The only targets with masked load/store support are right now i?86/x86_64
> with -mavx and higher (and for masked gather load support only -mavx2
> and higher), so the last patch will probably be only beneficial to those for
> the time being.
>
> Jakub


[patch 8/8] cfgloop.h includes basic-block.h

2013-10-18 Thread Andrew MacLeod
Another basic-block.h inclusion.  cfgloop.h was including basic-block.h, 
which meant lots of other things were getting it from there too.


The only routine in cfgloop.h which uses anything from basic-block.h was 
bb_loop_depth().  By moving that to cfgloop.c, it's no longer required by 
the .h file.  I did have to include function.h for a few routines.  I 
cannot imagine this being hot enough to show up on any performance 
measure...


bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

Andrew


	* cfgloop.h: Include function.h instead of basic-block.h
	(bb_loop_depth): Move to cfgloop.c.
	* cfgloop.c (bb_loop_depth): Relocate from cfgloop.h.

*** T9/cfgloop.h	2013-10-17 14:45:29.067398302 -0400
--- cfgloop.h	2013-10-17 14:55:46.321376841 -0400
*** along with GCC; see the file COPYING3.
*** 20,30 
  #ifndef GCC_CFGLOOP_H
  #define GCC_CFGLOOP_H
  
- #include "basic-block.h"
  #include "double-int.h"
- 
  #include "bitmap.h"
  #include "sbitmap.h"
  
  /* Structure to hold decision about unrolling/peeling.  */
  enum lpt_dec
--- 20,29 
  #ifndef GCC_CFGLOOP_H
  #define GCC_CFGLOOP_H
  
  #include "double-int.h"
  #include "bitmap.h"
  #include "sbitmap.h"
+ #include "function.h"
  
  /* Structure to hold decision about unrolling/peeling.  */
  enum lpt_dec
*** loop_depth (const struct loop *loop)
*** 455,468 
return vec_safe_length (loop->superloops);
  }
  
- /* Returns the loop depth of the loop BB belongs to.  */
- 
- static inline int
- bb_loop_depth (const_basic_block bb)
- {
-   return bb->loop_father ? loop_depth (bb->loop_father) : 0;
- }
- 
  /* Returns the immediate superloop of LOOP, or NULL if LOOP is the outermost
 loop.  */
  
--- 454,459 
*** extern HOST_WIDE_INT get_estimated_loop_
*** 744,749 
--- 735,741 
  extern HOST_WIDE_INT get_max_loop_iterations_int (struct loop *);
  extern bool get_estimated_loop_iterations (struct loop *loop, double_int *nit);
  extern bool get_max_loop_iterations (struct loop *loop, double_int *nit);
+ extern int bb_loop_depth (const_basic_block);
  
  /* Converts VAL to double_int.  */
  
*** T9/cfgloop.c	2013-10-17 14:45:29.067398302 -0400
--- cfgloop.c	2013-10-17 14:52:57.178383635 -0400
*** get_max_loop_iterations_int (struct loop
*** 1912,1915 
--- 1912,1921 
return hwi_nit < 0 ? -1 : hwi_nit;
  }
  
+ /* Returns the loop depth of the loop BB belongs to.  */
  
+ int
+ bb_loop_depth (const_basic_block bb)
+ {
+   return bb->loop_father ? loop_depth (bb->loop_father) : 0;
+ }


Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 2)

2013-10-18 Thread Jakub Jelinek
On Fri, Oct 18, 2013 at 05:45:15PM +0400, Sergey Ostanevich wrote:
> failed on 403 of '06:
> regclass.c: In function 'init_reg_sets':
> regclass.c:277:1: internal compiler error: tree check: expected
> ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393
>  init_reg_sets ()
>  ^

Yeah, Kyrill reported the same issue to me privately earlier today,
and I've managed to reproduce, will look at it on Monday.

Jakub


Re: [PATCH, rs6000] Don't convert a vector constant load into a splat illegally

2013-10-18 Thread Bill Schmidt
Just a quick note that overnight testing of the posted patch was clean.

Recap: There are three options on the table:
 - the posted patch
 - that patch with the (1 - start) change
 - replace nunits - 1 with nunits as the loop upper bound

I'm happy to implement any of them, as you prefer.  I lean towards the
last as the most easily understood.

Thanks,
Bill

On Fri, 2013-10-18 at 00:09 -0500, Bill Schmidt wrote:
> 
> On Fri, 2013-10-18 at 00:34 -0400, David Edelsohn wrote:
> > On Thu, Oct 17, 2013 at 10:43 PM, Bill Schmidt
> >  wrote:
> > > Hi,
> > >
> > > In little endian mode, we managed to convert a load of the V4SI vector
> > > {3, 3, 3, 7} into a vspltisw of 3, apparently taking offense at the
> > > number 7.  It turns out we only looked at the first N-1 elements of an
> > > N-element vector in little endian mode, and verified the zeroth element
> > > twice.  Adjusting the loop boundaries fixes the problem.
> > >
> > > Currently bootstrapping for powerpc64{,le}-unknown-linux-gnu.  Ok to
> > > commit to trunk if no regressions?
> > 
> > This patch does not make sense. It biases the loop bounds based on
> > BYTES_BIG_ENDIAN, but the body of the loop biases the index variable
> > based on BYTES_BIG_ENDIAN in one of the two uses.  The changes seem to
> > compute the same value for the first use of index "i" for both
> > BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN in a convoluted way.  It looks
> > like something should be able to be simplified.
> > 
> > Thanks, David
> > 
> 
> It seems like it, but I haven't been able to figure out anything nicer.
> The thing that makes this weird is that "i" is referring to different
> elements of the array depending on Big or Little Endian numbering.
> 
> Suppose you have a V16QI: {0 15 0 15 0 15 0 15 0 15 0 15 0 15 0 15} with
> step = 2.  Then for BE, the 15's are in odd-numbered elements (starting
> with zero from the left), but for LE they are in even-numbered elements
> (starting with zero from the right).  For both cases, we found the
> candidate value in the rightmost element (nunits - 1 for BE, 0 for LE).
> 
> The first two iterations for BE process the two leftmost elements.  For
> i = 0, we find (i+1)&1 to be nonzero, so the desired value is msb_val =
> 0.  For i = 1, we find (i+1)&1 to be zero, so the desired value is val =
> 15, and so on.
> 
> The first two iterations for LE process the second and third elements
> from the right.  For i = 1, we find i&1 to be nonzero, so the desired
> value is 0.  For i = 2, we find i&2 to be zero, so the desired value is
> 15.
> 
> So even though the first iteration calculates the same value for both
> endian modes, "i" means something different in each case.
> 
> I guess we could make a "simplification" as follows:
> 
>   /* Check if VAL is present in every STEP-th element, and the
>   
>  other elements are filled with its most significant bit.  */
>   start = BYTES_BIG_ENDIAN ? 0 : 1;
>   end = BYTES_BIG_ENDIAN ? nunits - 1 : nunits;
> 
>   for (i = start; i < end; ++i)
> {
>   HOST_WIDE_INT desired_val;
>   if (((i + (1 - start)) & (step - 1)) == 0)
> desired_val = val;
>   else
> desired_val = msb_val;
> 
> but that may be as hard to understand as what's there now...
> 
> An alternative is to just change the loop bounds to
> 
>   for (i = 0; i < nunits; ++i)
> 
> which will reprocess the candidate element for both endian modes, and
> leave everything else alone.  That would probably actually be faster
> than all the convoluted nonsense to avoid the reprocessing.  Want me to
> go in that direction instead?
> 
> Thanks,
> Bill
> 



Re: [Ping] Re: [C++ Patch] Tidy a bit cp_parser_lookup_name

2013-10-18 Thread Jason Merrill

OK.  I thought I had already approved this, but apparently not.

Jason


Re: [PATCH] Fix profile count updates during tail merging

2013-10-18 Thread Teresa Johnson
On Tue, Oct 15, 2013 at 2:05 PM, Jan Hubicka  wrote:
>> This patch fixes a profile count insanity introduced by ssa tail
>> merging. When replacing bb1 with bb2, which has the same successors,
>> the bb counts were being merged, but the successor edge weights
>> were not.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-10-15  Teresa Johnson  
>>
>> * tree-ssa-tail-merge.c (replace_block_by): Update edge
>> weights during merging.
>>
>> Index: tree-ssa-tail-merge.c
>> ===
>> --- tree-ssa-tail-merge.c   (revision 203389)
>> +++ tree-ssa-tail-merge.c   (working copy)
>> @@ -1462,6 +1462,8 @@ static void
>>  replace_block_by (basic_block bb1, basic_block bb2)
>>  {
>>edge pred_edge;
>> +  edge e1;
>> +  edge_iterator ei;
>>unsigned int i;
>>gimple bb2_phi;
>>
>> @@ -1488,6 +1490,15 @@ replace_block_by (basic_block bb1, basic_block bb2
>>pred_edge, UNKNOWN_LOCATION);
>>  }
>>
>> +  /* Merge the outgoing edge counts from bb1 onto bb2.  */
>> +  FOR_EACH_EDGE (e1, ei, bb1->succs)
>> +{
>> +  edge e2;
>> +  e2 = find_edge (bb2, e1->dest);
>> +  gcc_assert (e2);
>> +  e2->count += e1->count;
>
> Don't you need to redistribute the counts via edge probabilities?

Good point on the probabilities. However, we want to redistribute the
counts as-is and instead update the probabilities to reflect the new
counts. Otherwise, successor blocks would need to have their counts
updated.

Here is the new patch, which was bootstrapped and tested on
x86_64-unknown-linux-gnu. Ok for trunk?

2013-10-18  Teresa Johnson  

* tree-ssa-tail-merge.c (replace_block_by): Update edge
weights during merging.

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c   (revision 203389)
+++ tree-ssa-tail-merge.c   (working copy)
@@ -1462,6 +1462,8 @@ static void
 replace_block_by (basic_block bb1, basic_block bb2)
 {
   edge pred_edge;
+  edge e1;
+  edge_iterator ei;
   unsigned int i;
   gimple bb2_phi;

@@ -1494,6 +1496,18 @@ replace_block_by (basic_block bb1, basic_block bb2

   bb2->count += bb1->count;

+  /* Merge the outgoing edge counts from bb1 onto bb2.  */
+  FOR_EACH_EDGE (e1, ei, bb1->succs)
+{
+  edge e2;
+  e2 = find_edge (bb2, e1->dest);
+  gcc_assert (e2);
+  e2->count += e1->count;
+  /* Recompute the probability from the new merged edge count (bb2->count
+ was updated above).  */
+  e2->probability = GCOV_COMPUTE_SCALE (e2->count, bb2->count);
+}
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);



>
> Honza
>> +}
>> +
>>bb2->frequency += bb1->frequency;
>>if (bb2->frequency > BB_FREQ_MAX)
>>  bb2->frequency = BB_FREQ_MAX;
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix profile count updates during tail merging

2013-10-18 Thread Jan Hubicka
> On Tue, Oct 15, 2013 at 2:05 PM, Jan Hubicka  wrote:
> >> This patch fixes a profile count insanity introduced by ssa tail
> >> merging. When replacing bb1 with bb2, which has the same successors,
> >> the bb counts were being merged, but the successor edge weights
> >> were not.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
> >>
> >> Thanks,
> >> Teresa
> >>
> >> 2013-10-15  Teresa Johnson  
> >>
> >> * tree-ssa-tail-merge.c (replace_block_by): Update edge
> >> weights during merging.
> >>
> >> Index: tree-ssa-tail-merge.c
> >> ===
> >> --- tree-ssa-tail-merge.c   (revision 203389)
> >> +++ tree-ssa-tail-merge.c   (working copy)
> >> @@ -1462,6 +1462,8 @@ static void
> >>  replace_block_by (basic_block bb1, basic_block bb2)
> >>  {
> >>edge pred_edge;
> >> +  edge e1;
> >> +  edge_iterator ei;
> >>unsigned int i;
> >>gimple bb2_phi;
> >>
> >> @@ -1488,6 +1490,15 @@ replace_block_by (basic_block bb1, basic_block bb2
> >>pred_edge, UNKNOWN_LOCATION);
> >>  }
> >>
> >> +  /* Merge the outgoing edge counts from bb1 onto bb2.  */
> >> +  FOR_EACH_EDGE (e1, ei, bb1->succs)
> >> +{
> >> +  edge e2;
> >> +  e2 = find_edge (bb2, e1->dest);
> >> +  gcc_assert (e2);
> >> +  e2->count += e1->count;
> >
> > Don't you need to redistribute the counts via edge probabilities?
> 
> Good point on the probabilities. However, we want to redistribute the
> counts as-is and instead update the probabilities to reflect the new
> counts. Otherwise, successor blocks would need to have their counts
> updated.
> 
> Here is the new patch, which was bootstrapped and tested on
> x86_64-unknown-linux-gnu. Ok for trunk?
> 
> 2013-10-18  Teresa Johnson  
> 
> * tree-ssa-tail-merge.c (replace_block_by): Update edge
> weights during merging.

This is OK, thanks!
Honza


Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-18 Thread Jan Hubicka
> 
> As noted above, the change to avoid the RDIV by runs decreases the
> effect of the min_run_ratio (now unlikely_count_fraction) in half. So
> we really need to increase this parameter to 32 to compare to what the
> old version of the code was doing.
> 
> Indeed, using 32 does fix the same set of issues. However, when
> setting the parameter to the old default of 4, there are 5 core dumps
> using the patch to make the unlikely code unexecutable with a 100
> training runs (see below for some examples why). This goes down to 2
> failures if I set the parameter to 10, and 0 if I set it to 20. Using
> 20 essentially means that the code has to be executed 5% of the runs,
> which seems reasonable for something like splitting. What do you
> think?

Yes, I think I am fine with pretty much any constant that is not insanely large.
20 or 32 seems fine.
> 
> 1) No context sensitivity for routine that is inlined:
> 
> In 
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr33870.c,
> we call merge_pagelist 3 times from sort_pagelist. In the profile-use
> compile merge_pagelist is inlined (I believe in all 3 locations). One
> of the conditional blocks (line 23) has a profile count that is less
> than runs/4. It turns out that from the first merge_pagelist callsite
> we always execute this block, but the other two are completely cold.
> But the profile info from all three calls is of course merged, and it
> looks like it is executed infrequently overall. So the block is split
> but then we execute it from the inline instance corresponding to the
> first call.
> 
> I'm not sure what we can do in general here. But it is another data
> point suggesting to me that the blocks should be very cold to be
> split.

Yep, these issues from code duplicating optimizations are probably going to be
unavoidable.
> 
>  2) RTL expansion:
> 
> Another was in the following code from
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr43784.c:
> 
> static struct s __attribute__((noinline)) rp(void)
> {
>   return *p;
> }
> 
> where p is a global that is assigned as:
> 
> static union u v;
> static struct s *p = &v.d.b;
> 
> RTL expansion synthesizes a bunch of tests and branches (alignment
> handling?), and guesses at the probabilities of the resulting

Yep, this should get better with my memcpy patch.  I will try to update it and
get it to mainline druing the weekend.

> branches. Unfortunately, this is less than accurate and we end up
> executing a block that is given a small likelihood, making it below
> the threshold with the default unlikely fraction of 4.

> 
> Here is the patch with the current threshold kept as 4. Let me know if
> you think we can kick this up to 20, or if we should just increase the
> threshold for splitting.
> 
> Passes regression tests, ok for trunk?
> 
> (BTW, getting close. I have a bunch of fixes to the loop unroller that
> need some cleanup, and a small fix to ssa tail merging to update the
> profiles, but I think that does it for the tests I was looking at.
> Will send those soon for review.)
> 
> Thanks,
> Teresa
> 
> 2013-10-10  Teresa Johnson  
> 
> * predict.c (probably_never_executed): Compare frequency-based
> count to number of training runs.
> * params.def (UNLIKELY_BB_COUNT_FRACTION): New parameter.

OK, thanks for working on this!
> 
> Index: predict.c
> ===
> --- predict.c   (revision 203390)
> +++ predict.c   (working copy)
> @@ -237,17 +237,33 @@ probably_never_executed (struct function *fun,
>gcc_checking_assert (fun);
>if (profile_status_for_function (fun) == PROFILE_READ)
>  {
> -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  if (count * unlikely_count_fraction >= profile_info->runs)
> return false;
>if (!frequency)
> return true;
>if (!ENTRY_BLOCK_PTR->frequency)
> return false;
> -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
> REG_BR_PROB_BASE)
> +  if (ENTRY_BLOCK_PTR->count)
> {
> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -   ENTRY_BLOCK_PTR->frequency)
> - < REG_BR_PROB_BASE / 4);
> +  gcov_type computed_count;
> +  /* Check for possibility of overflow, in which case entry bb count
> + is large enough to do the division first without losing much
> + precision.  */
> +  if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> +{
> +  gcov_type scaled_count
> +  = frequency * ENTRY_BLOCK_PTR->count *
> unlikely_count_fraction;
> +  computed_count = RDIV (scaled_count, 
> ENTRY_BLOCK_PTR->frequency);
> +}
> +  else
> +{
> +  computed_cou

Re: [Ping] Re: [C++ Patch] Tidy a bit cp_parser_lookup_name

2013-10-18 Thread Paolo Carlini

On 10/18/2013 03:57 PM, Jason Merrill wrote:

OK. I thought I had already approved this, but apparently not.

Thanks, applied.

Can I also ask you to follow up to this exchange which remained hanging?

http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00827.html

Thanks in advance!
Paolo.


Re: [patch] Rename tree-flow.h to tree-cfg.h.

2013-10-18 Thread Andrew MacLeod

On 10/17/2013 08:48 PM, Andrew MacLeod wrote:


rs6000 and alpha were including tree-flow.h in their config/target.c 
file.  I built stage 1 for each target as a cross compiler to confirm 
builds don't break due to something unresolved.  rs6000.c no longer 
needed it; alpha needs num_ssa_names, so it requires gimple-ssa.h 
(instead of tree-flow.h).  There is a compilation error in alpha 
(like on many other targets) for target_flags_explicit; when it 
fails, that is the only unresolved external left, the same situation 
that existed before this patch.


Bootstraps on x86_64-unknown-linux-gnu with no new regressions.

OK?

Andrew

Apparently rs6000.c requires cgraph.h when HAVE_AS_GNU_ATTRIBUTE is 
defined, which it isn't for the cross-compile I tried using target 
rs6000-ibm-aix4.3.  I also tried powerpc64-unknown-linux-gnu, but the 
build still doesn't fail there.  I hacked up the file to define the 
macro and verified that simply including cgraph.h is sufficient to 
compile the file.


Thanks to jbglaw and his build robot for snagging this so quickly for me.

Checked in as revision 203824.

Andrew





2013-10-18  Andrew MacLeod  

	* config/rs6000/rs6000.c: Include cgraph.h.

Index: config/rs6000/rs6000.c
===
*** config/rs6000/rs6000.c	(revision 203817)
--- config/rs6000/rs6000.c	(working copy)
***
*** 58,63 
--- 58,64 
  #include "opts.h"
  #include "tree-vectorizer.h"
  #include "dumpfile.h"
+ #include "cgraph.h"
  #if TARGET_XCOFF
  #include "xcoffout.h"  /* get declarations of xcoff_*_section_name */
  #endif


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Jeff Law

On 10/18/13 04:55, Richard Biener wrote:

On Fri, Jun 21, 2013 at 11:21 PM, Steve Ellcey  wrote:

On Fri, 2013-06-21 at 17:43 +0100, James Greenhalgh wrote:


So to my mind, this is all far too tied up in pass ordering details to
resolve. Given that all the threading opportunities for my patch are found
in dom1 and how fragile the positioning of dom1 is, there is not a great
deal I can do to modify the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that
would be.

I wonder if you or others have any thoughts on what the right thing to
do would be?


I am not sure if path threading in general is turned off for -Os but it
probably should be.


I agree, jump threading is on at -Os, path threading should not be.

Thanks,
James


I think that since it seems to be more a space issue than a time issue,
the way you currently have it is reasonable.  As long as the
optimization does not happen at -Os then I think we can probably live
with the space increase.


I suppose with Jeffs recent work on jump-threading through paths
this case in handled and the patch in this thread is obsolete or can
be reworked?
It trivially falls out of the work I'm doing.  The biggest question will 
be what to do about the loop structures.  For the FSA case from coremark 
the loop structures get mucked up way beyond repair.


My thoughts were to generally continue to not allow major damage to the 
loop structures when threading jumps --  with the exception being if we 
can eliminate an indirect or tablejump at the top of a loop.  Possibly 
further restricting it to the jump threading done after our loop 
optimizers have completed.


I haven't looked at that heuristic yet as I've primarily focused on 
having the right infrastructure to correctly update the SSA and CFG in 
the general case.




jeff


Re: patch to enable LRA for ppc

2013-10-18 Thread David Edelsohn
On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov  wrote:
> The following patch permits today trunk to use LRA for ppc by default.
> To switch it off -mno-lra can be used.
>
> The patch was bootstrapped on ppc64.  GCC testsuite does not have
> regressions too (in comparison with reload).  The change in rs6000.md is
> for fix LRA failure on a recently added ppc test.

Vlad,

I have not forgotten this patch. We are trying to figure out the right
timeframe to make this change. The patch does affect performance --
both positively and negatively; most are in the noise but not all. And
there still are some SPEC benchmarks that fail to build with the
patch, at least in Mike's tests. And Mike is implementing some patches
to utilize reload to improve use of VSX registers, which would need to
be mirrored in LRA for the equivalent functionality.

Thanks, David


Re: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).

2013-10-18 Thread Jeff Law

On 10/18/13 05:05, James Greenhalgh wrote:

On Fri, Oct 18, 2013 at 11:55:08AM +0100, Richard Biener wrote:

I suppose with Jeffs recent work on jump-threading through paths
this case in handled and the patch in this thread is obsolete or can
be reworked?


Yes, this patch is now obsolete, Jeff's solution is much more
elegant :-)
Not sure if I'd use the term elegant :-)  My solution is certainly more 
general though.  There's still a couple rounds of staging bits as I 
clean up the rough edges.


The need to store a full thread path to get the SSA graph update correct 
was unexpected; it arises in cases where we have a joiner block, followed 
by multiple threadable blocks without side effects, followed by a 
threadable block with side effects.


Jeff


Re: [PATCH, PR 53001] Re: Patch to split out new warning flag for floating point conversion

2013-10-18 Thread Joseph S. Myers
On Fri, 18 Oct 2013, Joshua J Cogliati wrote:

> This patch does not change any of the non-commented c and c++ code.
> It changes the dg comments.
> Example:
> -  fsi (3.1f); /* { dg-warning "conversion" } */
> +  fsi (3.1f); /* { dg-warning "float-conversion" } */
> 
> If you want I can change it to (in separate files if desired):
> 
>   fsi (3.1f); /* { dg-warning "conversion" } */
>   fsi (3.1f); /* { dg-warning "float-conversion" } */
> 
> so that now the tests are run both ways, but it would test the exact
> same code path.

Really I think it's better for the dg-warning text to test the actual 
warning text rather than the name of the option that's also reported as 
part of the compiler output ("conversion" matches both, of course).

The problem isn't so much the change to dg-warning, though, as the change 
to dg-options.  Previously the test asserted that certain things warn with 
-Wconversion; by changing it you lose the assertion that -Wconversion 
enables those warnings.  So I think the test should remain as-is, 
verifying that -Wconversion causes certain warnings, and then be 
copied in a form using -Wfloat-conversion to verify that 
-Wfloat-conversion also causes the same warnings.
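Concretely, the split being suggested could look like the sketch below: a sibling copy of the test whose only change is the dg-options line, while the original file keeps -Wconversion. This is a hypothetical test file, not the actual patch; `fsi` stands in for the testcase's helper, and the dg directives only take effect under the DejaGnu harness.

```c
/* { dg-do compile } */
/* { dg-options "-Wfloat-conversion" } */
/* A copy of the original test with only dg-options changed; the
   original file keeps -Wconversion, so the old assertion that
   -Wconversion enables these warnings is not lost.  The dg-warning
   pattern matches the warning text, not the option name.  */

#include <assert.h>

static int si;

static int
fsi (int i)   /* hypothetical stand-in for the testcase's helper */
{
  return si = i;
}

static int
test (void)
{
  /* The float argument is implicitly truncated to int here, which is
     what the warning (and the dg-warning pattern) is about.  */
  return fsi (3.1f); /* { dg-warning "conversion" } */
}
```

Outside the harness the file is still plain, valid C: `fsi (3.1f)` quietly converts 3.1f to 3.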

Looking at , 
there are also some formatting problems, "if(" which should have a space 
before the "(".

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-18 Thread Jeff Law

On 10/18/13 03:53, Richard Biener wrote:


I think we want a "proper" predicate in tree-cfg.c for this, like maybe
a subset of tree_forwarder_block_p or whatever it will end up looking
like (we need "effectively empty BB" elsewhere, for example in vectorization,
add a flag to allow a condition ending the BB and the predicate is done).
Yes.  We definitely have a need for this.  For threading, "effectively 
empty" comes up often.  In that context, we're ignoring the control 
statement at the end (because it's going away ;-)


Jeff


Re: [GOOGLE] Add flag to enalbe AutoFDO accurate mode

2013-10-18 Thread Xinliang David Li
ok.

David

On Tue, Oct 15, 2013 at 10:58 AM, Dehao Chen  wrote:
> This patch adds a new flag that lets the user tell the compiler the
> AutoFDO profile is accurate.  The compiler will then assume a function
> without any sample is UNLIKELY_EXECUTED.  This can save 10%~20% of
> text section size.
>
> Bootstrapped and passed regression test.
>
> OK for google-4_8 branch?
>
> Thanks,
> Dehao
>
> Index: gcc/common.opt
> ===
> --- gcc/common.opt (revision 203546)
> +++ gcc/common.opt (working copy)
> @@ -942,6 +942,10 @@ Common Joined RejectNegative Var(auto_profile_file
>  Use sample profile information for call graph node weights. The profile
>  file is specified in the argument.
>
> +fauto-profile-accurate
> +Common Report Var(flag_auto_profile_accurate) Optimization
> +Whether to assume the sample profile is accurate.
> +
>  ; -fcheck-bounds causes gcc to generate array bounds checks.
>  ; For C, C++ and ObjC: defaults off.
>  ; For Java: defaults to on.
> Index: gcc/predict.c
> ===
> --- gcc/predict.c (revision 203546)
> +++ gcc/predict.c (working copy)
> @@ -2866,7 +2866,9 @@ compute_function_frequency (void)
>|| (flag_auto_profile && profile_status == PROFILE_GUESSED))
>  {
>int flags = flags_from_decl_or_type (current_function_decl);
> -  if (lookup_attribute ("cold", DECL_ATTRIBUTES (current_function_decl))
> +  if (profile_info && flag_auto_profile_accurate)
> + node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
> +  else if (lookup_attribute ("cold", DECL_ATTRIBUTES
> (current_function_decl))
>!= NULL)
>  node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
>else if (lookup_attribute ("hot", DECL_ATTRIBUTES
> (current_function_decl))


Re: [PATCH] PR58669: does not detect all cpu cores/threads

2013-10-18 Thread Andii
On 18 October 2013 03:27, Tom Tromey  wrote:
>> "Andrew" == Andrew   writes:
>
> Andrew> +#ifdef HAVE_UNISTD_H
> Andrew> +  procs = sysconf(_SC_NPROCESSORS_ONLN);
> Andrew> +#endif
>
> Space before the "(".
>
> Technically you should probably check for sysconf in configure.ac.
> I'm not sure whether it matters any more.
>
> I think _SC_NPROCESSORS_ONLN is not portable though.
>
> Tom

I can add an #ifdef _SC_NPROCESSORS_ONLN around it too.
Is that enough to imply we have sysconf too?
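The doubly-guarded version under discussion might look like the following sketch (HAVE_UNISTD_H is assumed to come from configure.ac; this is an illustration, not the actual patch). Note that the _SC_NPROCESSORS_ONLN guard only proves the name is defined, not that sysconf itself is available, so the configure-time check Tom mentions is still worth having.

```c
#include <assert.h>

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

/* Return the number of online processors, or 1 if it cannot be
   determined.  Both guards are checked: HAVE_UNISTD_H (from configure)
   and _SC_NPROCESSORS_ONLN (the extra #ifdef proposed above).  */
static long
count_online_cpus (void)
{
  long procs = 1;  /* conservative fallback */
#if defined (HAVE_UNISTD_H) && defined (_SC_NPROCESSORS_ONLN)
  procs = sysconf (_SC_NPROCESSORS_ONLN);
  if (procs < 1)
    procs = 1;   /* sysconf can return -1 on error */
#endif
  return procs;
}
```

On systems without the macro (or without unistd.h) this degrades to the old single-core assumption instead of failing to build.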
-- 
Andii :-)


Re: Add predefined macros for library use in defining __STDC_IEC_559*

2013-10-18 Thread Joseph S. Myers
Note that the associated glibc patch 
 is now 
approved, pending the GCC one.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH]Fix computation of offset in ivopt

2013-10-18 Thread Bin.Cheng
Hi Richard,
Is the first patch OK?  Since there is a patch depending on it.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01931.html

Thanks.
bin

On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener
 wrote:
> On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng  wrote:
>> Hi,
>> As noted in previous messages, GCC forces offset to unsigned in middle end.
>> It also gets offset value and stores it in HOST_WIDE_INT then uses it in
>> various computations.  In this scenario, it should use int_cst_value to do
>> additional sign extension against the type of tree node, otherwise we might
>> lose sign information.  Take function fold_plusminus_mult_expr as an
>> example, the sign information of arg01/arg11 might be lost because it uses
>> TREE_INT_CST_LOW directly.  I think this is the offender behind the problem
>> in this thread.  I think we should fix the offender, rather than hacking
>> strip_offset as in the original patch, so I split the original patch in two as
>> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the
>> other fixing the mentioned sign extension problem.
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c (revision 203267)
> +++ gcc/fold-const.c (working copy)
> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
>HOST_WIDE_INT int01, int11, tmp;
>bool swap = false;
>tree maybe_same;
> -  int01 = TREE_INT_CST_LOW (arg01);
> -  int11 = TREE_INT_CST_LOW (arg11);
> +  int01 = int_cst_value (arg01);
> +  int11 = int_cst_value (arg11);
>
> this is not correct - it will mishandle all large unsigned numbers.
>
> The real issue is that we rely on twos-complement arithmetic to work
> when operating on pointer offsets (because we convert them all to
> unsigned sizetype).  That makes interpreting the offsets or expressions
> that compute offsets hard (or impossible in some cases), as you can
> see from the issues in IVOPTs.
>
> The above means that strip_offset_1 cannot reliably look through
> MULT_EXPRs as that can make an offset X * CST that is
> "effectively" signed "surely" unsigned in the process.  Think of
> this looking into X * CST as performing a division - whose result
> is dependent on the sign of X which we lost with our unsigned
> casting.  Now, removing the MULT_EXPR look-through might
> be too pessimizing ... but it may be worth trying to see if it fixes
> your issue?  In this context I also remember a new bug filed
> recently of us not folding x /[ex] 4 * 4 to x.
>
> In the past I was working multiple times on stopping doing that
> (make all offsets unsigned), but I never managed to finish ...
>
> Richard.
>
>> Bootstrap and test on x86/x86_64/arm.  Is it OK?
>>
>> Thanks.
>> bin
>>
>> Patch a:
>> 2013-10-17  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type.
>> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
>>
>> Patch b:
>> 2013-10-17  Bin Cheng  
>>
>> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead
>> of TREE_INT_CST_LOW.
>>
>> gcc/testsuite/ChangeLog
>> 2013-10-17  Bin Cheng  
>>
>> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test.
>>
>>> -Original Message-
>>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>>> Sent: Wednesday, October 02, 2013 4:32 PM
>>> To: Bin.Cheng
>>> Cc: Bin Cheng; GCC Patches
>>> Subject: Re: [PATCH]Fix computation of offset in ivopt
>>>
>>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng  wrote:
>>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener
>>> >  wrote:
>>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng 
>>> wrote:
>>> >>>
>>> >>>
>>> >>
>>> >> I don't think you need
>>> >>
>>> >> +  /* Sign extend off if expr is in type which has lower precision
>>> >> + than HOST_WIDE_INT.  */
>>> >> +  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
>>> >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
>>> >>
>>> >> at least it would be suspicious if you did ...
>>> > There is a problem, for example in the first message.  The iv base is
>>> > like:
>>> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4.  I am not sure
>>> > why but it seems (-4/0xFFFFFFFC) is represented by (1073741823*4).
>>> > For each operand strip_offset_1 returns exactly the positive number
>>> > and result of multiplication never get its chance of sign extend.
>>> > That's why the sign extend is necessary to fix the problem. Or it
>>> > should be fixed elsewhere by representing iv base with:
>>> > "pretmp_184 + ((sizetype) KeyIndex_180 + 4294967295) * 4" in the first
>>> place.
>>>
>>> Yeah, that's why I said the whole issue with forcing all offsets to be
>> unsigned
>>> is a mess ...
>>>
>>> There is really no good answer besides not doing that I fear.
>>>
>>> Yes, in the above case we could fold the whole thing differently
>> (interpret
>>> the offset of a POINTER_PLUS_EXPR as signed).  You 
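The (1073741823 * 4) puzzle in the quoted exchange is 32-bit wraparound: as an unsigned 32-bit value the product is 0xFFFFFFFC, which reads as -4 once sign-extended. A tiny stand-alone model of the sign extension that int_cst_value performs for narrow types (a sketch, not GCC's actual hwint.h code) makes the effect visible:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of X, the way int_cst_value must when
   a constant's type is narrower than HOST_WIDE_INT.  */
static int64_t
sext (uint64_t x, unsigned prec)
{
  uint64_t sign_bit = (uint64_t) 1 << (prec - 1);
  uint64_t mask = (sign_bit << 1) - 1;

  x &= mask;                                     /* keep only the low PREC bits */
  return (int64_t) ((x ^ sign_bit) - sign_bit);  /* classic sign-extend trick */
}
```

With prec == 32, sext (1073741823 * 4) yields -4, while TREE_INT_CST_LOW alone would hand back the large positive low word and lose the sign.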

[PATCH] Add debug counter to jump threading code

2013-10-18 Thread Jeff Law


I was debugging a failure yesterday in the jump threading code that was 
best narrowed down with a debug counter.  No sense in keeping that in my 
local tree.


While adding the new #include, I saw a couple #includes that clearly 
didn't belong, so I zapped them.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b427946..b1028bf 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-09  Zhenqiang Chen  
+
+   * tree-ssa-phiopt.c (rhs_is_fed_for_value_replacement): New function.
+   (operand_equal_for_value_replacement): New function, extracted from
+   value_replacement and enhanced to catch more cases.
+   (value_replacement): Use operand_equal_for_value_replacement.
+
 2013-10-09  Andrew MacLeod  
 
* loop-doloop.c (doloop_modify, doloop_optimize): Use 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 66b3c38..76cce59 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-10-09  Zhenqiang Chen  
+
+   * gcc.dg/tree-ssa/phi-opt-11.c: New test.
+
 2013-10-09  Marek Polacek  
 
PR c++/58635
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
new file mode 100644
index 000..7c83007
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+int f(int a, int b, int c)
+{
+  if (a == 0 && b > c)
+   return 0;
+ return a;
+}
+
+int g(int a, int b, int c)
+{
+  if (a == 42 && b > c)
+   return 42;
+ return a;
+}
+
+int h(int a, int b, int c, int d)
+{
+  if (a == d && b > c)
+   return d;
+ return a;
+}
+/* { dg-final { scan-tree-dump-times "if" 0 "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 8e1ddab..adf8a28 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -110,6 +110,26 @@ static bool gate_hoist_loads (void);
This opportunity can sometimes occur as a result of other
optimizations.
 
+
+   Another case caught by value replacement looks like this:
+
+ bb0:
+   t1 = a == CONST;
+   t2 = b > c;
+   t3 = t1 & t2;
+   if (t3 != 0) goto bb1; else goto bb2;
+ bb1:
+ bb2:
+   x = PHI (CONST, a)
+
+   Gets replaced with:
+ bb0:
+ bb2:
+   t1 = a == CONST;
+   t2 = b > c;
+   t3 = t1 & t2;
+   x = a;
+
ABS Replacement
---
 
@@ -155,7 +175,7 @@ static bool gate_hoist_loads (void);
 
Adjacent Load Hoisting
--
-   
+
This transformation replaces
 
  bb0:
@@ -286,7 +306,7 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge 
e0, edge e1)
phi optimizations.  Both share much of the infrastructure in how
to match applicable basic block patterns.  DO_STORE_ELIM is true
when we want to do conditional store replacement, false otherwise.
-   DO_HOIST_LOADS is true when we want to hoist adjacent loads out 
+   DO_HOIST_LOADS is true when we want to hoist adjacent loads out
of diamond control flow patterns, false otherwise.  */
 static unsigned int
 tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads)
@@ -389,7 +409,7 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads)
  continue;
}
   else
-   continue;  
+   continue;
 
   e1 = EDGE_SUCC (bb1, 0);
 
@@ -437,7 +457,7 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads)
 
  if (!candorest)
continue;
- 
+
  phi = single_non_singleton_phi_for_edges (phis, e1, e2);
  if (!phi)
continue;
@@ -672,6 +692,93 @@ jump_function_from_stmt (tree *arg, gimple stmt)
   return false;
 }
 
+/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
+   of the form SSA_NAME NE 0.
+
+   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
+   the two input values of the EQ_EXPR match arg0 and arg1.
+
+   If so update *code and return TRUE.  Otherwise return FALSE.  */
+
+static bool
+rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
+ enum tree_code *code, const_tree rhs)
+{
+  /* Obviously if RHS is not an SSA_NAME, we can't look at the defining
+ statement.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+{
+  gimple def1 = SSA_NAME_DEF_STMT (rhs);
+
+  /* Verify the defining statement has an EQ_EXPR on the RHS.  */
+  if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == EQ_EXPR)
+   {
+ /* Finally verify the source operands of the EQ_EXPR are equal
+to arg0 and arg1.  */
+ tree op0 = gimple_assign_rhs1 (def1);
+ tree op1 = gimple_assign_rhs2 (def1);
+ if ((operand_equal_for_phi_arg_p (arg0, op0)
+  && operand_equal_for_phi_arg_p (arg1, op

Re: [patch 3/8] Remove tree-ssa-threadedge.h from the tree-ssa.h include list.

2013-10-18 Thread Jeff Law

On 10/18/13 07:38, Andrew MacLeod wrote:

Straightforward.  Just include tree-ssa-threadedge.h in the 4 .c files
that need it.

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.
Jeff



Re: [patch 4/8] Remove tree-ssa-dom.h from the tree-ssa.h include list.

2013-10-18 Thread Jeff Law

On 10/18/13 07:39, Andrew MacLeod wrote:

degenerate_phi_result was defined in tree-ssa-dom.c; I moved it to
tree-phinodes since all it does is determine whether the arguments of
the phi which are not the same as the result are all the same.  This
reduced by half the number of files which required tree-ssa-dom.h.

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.

FWIW, the original purpose of tree-phinodes.[ch] was just management of 
a list of free PHI nodes for recycling.  If it's a good place to put 
various support routines of this nature, that's fine by me.


jeff


Re: [PATCH] Add debug counter to jump threading code

2013-10-18 Thread Jeff Law

On 10/18/13 09:50, Jeff Law wrote:


I was debugging a failure yesterday in the jump threading code that was
best narrowed down with a debug counter.  No sense in keeping that in my
local tree.

While adding the new #include, I saw a couple #includes that clearly
didn't belong, so I zapped them.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.

Oops, I attached the wrong patch.  The correct version is attached this time.

jeff






commit 9339daeaa66b8a3d958fc32eed66dbdca7bbf31f
Author: law 
Date:   Fri Oct 18 15:50:04 2013 +

   * tree-ssa-threadupdate.c: Do not include "tm.h" or "tm_p.h".

* tree-ssa-threadupdate.c: Include "dbgcnt.h".
(register_jump_thread): Add "registered_jump_thread" debug counter 
support.
* dbgcnt.def (registered_jump_thread): New debug counter.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@203825 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 0553518..e18ad6f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2013-10-18  Jeff Law  
+
+   * tree-ssa-threadupdate.c: Do not include "tm.h" or "tm_p.h".
+
+   * tree-ssa-threadupdate.c: Include "dbgcnt.h".
+   (register_jump_thread): Add "registered_jump_thread" debug counter 
support.
+   * dbgcnt.def (registered_jump_thread): New debug counter.
+
 2013-10-18  Andrew MacLeod  
 
* config/rs6000/rs6000.c: Include cgraph.h.
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 45b8eed..6f86253 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -188,3 +188,4 @@ DEBUG_COUNTER (store_motion)
 DEBUG_COUNTER (split_for_sched2)
 DEBUG_COUNTER (tail_call)
 DEBUG_COUNTER (ira_move)
+DEBUG_COUNTER (registered_jump_thread)
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 3e34567..e791269 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -20,10 +20,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "tm.h"
 #include "tree.h"
 #include "flags.h"
-#include "tm_p.h"
 #include "basic-block.h"
 #include "function.h"
 #include "tree-ssa.h"
@@ -31,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "cfgloop.h"
 #include "hash-table.h"
+#include "dbgcnt.h"
 
 /* Given a block B, update the CFG and SSA graph to reflect redirecting
one or more in-edges to B to instead reach the destination of an
@@ -1534,6 +1533,14 @@ dump_jump_thread_path (FILE *dump_file, 
vec<jump_thread_edge *> path)
 void
 register_jump_thread (vec<jump_thread_edge *> *path)
 {
+  if (!dbg_cnt (registered_jump_thread))
+{
+  for (unsigned int i = 0; i < path->length (); i++)
+   delete (*path)[i];
+  path->release ();
+  return;
+}
+
   /* First make sure there are no NULL outgoing edges on the jump threading
  path.  That can happen for jumping to a constant address.  */
   for (unsigned int i = 0; i < path->length (); i++)
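For readers who haven't used them: a debug counter gates a transformation on a running count, so -fdbg-cnt=registered_jump_thread:N lets you bisect exactly which registered jump thread introduces a miscompile. The mechanism boils down to something like this toy model (an illustration, not GCC's dbgcnt.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy debug counter: every query bumps the count; once the count
   exceeds the limit, the guarded transformation is suppressed.  */
struct dbg_counter
{
  unsigned count;   /* how many times we have been queried */
  unsigned limit;   /* e.g. from -fdbg-cnt=name:limit */
};

static bool
dbg_cnt (struct dbg_counter *c)
{
  return ++c->count <= c->limit;
}
```

Bisection then amounts to halving the limit until only the single failing transformation is allowed through.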


Re: [patch 2/8] Remove tree-ssa-address.h from the tree-ssa.h include list.

2013-10-18 Thread Jeff Law

On 10/18/13 07:38, Andrew MacLeod wrote:

This patch just does the direct thing: included the file where it was
needed (7 .c files).  I was tempted to move copy_ref_info() to a
different file... it's actually the only routine in this file which is
SSA based... and rename the file tree-address.[ch].  Not sure where I'd
copy it to, maybe tree-ssa.c... Almost every routine called is in
tree-ssanames.c, but that doesn't seem quite appropriate.  Maybe it's OK
like it is.

In any case, if you want something further done, I can follow up with
another patch.

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.

Your call on moving copy_ref_info now or at a later point, similarly on 
renaming the file afterwards.


jeff


Re: [patch 5/8] Remove tree-cfgcleanup.h from the tree-ssa.h include list.

2013-10-18 Thread Jeff Law

On 10/18/13 07:40, Andrew MacLeod wrote:

This was slightly more interesting. I moved the
cleanup_cfg_post_optimizing pass from tree-optimize.c to
tree-cfgcleanup.c.  And that left tree-optimize.c empty... so I deleted
that file as well.  Other than that, just include it in the other 7
files that require it.

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.
jeff


[jit] Improvements to error-handling

2013-10-18 Thread David Malcolm
Committed to dmalcolm/jit:

Add defensive checks at the API boundary to gracefully reject
various kinds of error.

gcc/jit/

* internal-api.c (gcc::jit::context::get_type): Improve error
message, and report the bogus value.
(gcc::jit::context::new_binary_op): Likewise.
(gcc::jit::context::new_comparison): Likewise.
(gcc::jit::context::set_str_option): Likewise.
(gcc::jit::context::set_int_option): Likewise.
(gcc::jit::context::set_bool_option): Likewise.
(gcc::jit::context::compile): Likewise, and make the errors
block the creation of result, rather than just the return
value of the client callback.
(gcc::jit::context::add_error): Add varargs and provide
implementation, calling into...
(gcc::jit::context::add_error_va): New.
* internal-api.h (GNU_PRINTF): New.
(gcc::jit::context::add_error): Add varargs and GNU_PRINTF
attribute macro.
(gcc::jit::context::add_error_va): New.
(gcc::jit::context::errors_occurred): New.
(gcc::jit::context::m_error_count): New.
(gcc::jit::function::get_kind): New.
* libgccjit.c (JIT_BEGIN_STMT): New.
(JIT_END_STMT): New.
(RETURN_VAL_IF_FAIL): New.
(RETURN_NULL_IF_FAIL): New.
(RETURN_IF_FAIL): New.
(RETURN_IF_NOT_INITIAL_CTXT): New.
(RETURN_NULL_IF_NOT_INITIAL_CTXT): New.
(RETURN_NULL_IF_NOT_CALLBACK_CTXT): New.
(RETURN_IF_NOT_FUNC_DEFINITION): New.
(RETURN_NULL_IF_NOT_FUNC_DEFINITION): New.
(jit_error): New.
(gcc_jit_context_set_code_factory): Use new error-checking
macros.
(ASSERT_WITHIN_CALLBACK): Remove.
(ASSERT_NOT_WITHIN_CALLBACK): Remove.
(gcc_jit_context_new_location): Use new error-checking macros.
(gcc_jit_context_get_type): Likewise.
(gcc_jit_type_get_pointer): Likewise.
(gcc_jit_type_get_const): Likewise.
(gcc_jit_context_new_field): Likewise.
(gcc_jit_context_new_struct_type): Likewise.
(gcc_jit_context_new_param): Likewise.
(gcc_jit_param_as_lvalue): Likewise.
(gcc_jit_param_as_rvalue): Likewise.
(gcc_jit_context_new_function): Likewise.
(gcc_jit_function_new_forward_label): Likewise.
(gcc_jit_context_new_global): Likewise.
(gcc_jit_lvalue_as_rvalue): Likewise.
(gcc_jit_context_new_rvalue_from_int): Likewise.
(gcc_jit_context_zero): Likewise.
(gcc_jit_context_one): Likewise.
(gcc_jit_context_new_rvalue_from_double): Likewise.
(gcc_jit_context_new_rvalue_from_ptr): Likewise.
(gcc_jit_context_new_string_literal): Likewise.
(gcc_jit_context_new_binary_op): Likewise.
(gcc_jit_context_new_comparison): Likewise.
(gcc_jit_context_new_call): Likewise.
(gcc_jit_context_new_array_lookup): Likewise.
(gcc_jit_context_new_field_access): Likewise.
(gcc_jit_function_new_local): Likewise.
(gcc_jit_function_add_label): Likewise.
(gcc_jit_function_place_forward_label): Likewise.
(gcc_jit_function_add_eval): Likewise.
(gcc_jit_function_add_assignment): Likewise.
(gcc_jit_function_add_assignment_op): Likewise.
(gcc_jit_function_add_conditional): Likewise.
(gcc_jit_function_add_jump): Likewise.
(gcc_jit_function_add_return): Likewise.
(gcc_jit_function_new_loop): Likewise.
(gcc_jit_loop_end): Likewise.
(gcc_jit_context_set_str_option): Likewise.
(gcc_jit_context_set_int_option): Likewise.
(gcc_jit_context_set_bool_option): Likewise.
(gcc_jit_context_compile): Likewise.
(gcc_jit_result_get_code): Likewise.
(gcc_jit_result_release): Likewise.
* libgccjit.h (gcc_jit_function_new_forward_label): Clarify
behavior.
(gcc_jit_function_add_label): Likewise.

gcc/testsuite/
* jit.dg/test-null-passed-to-api.c: New.
---
 gcc/jit/ChangeLog.jit  |  82 +++
 gcc/jit/internal-api.c |  44 +++-
 gcc/jit/internal-api.h |  28 ++-
 gcc/jit/libgccjit.c| 299 +
 gcc/jit/libgccjit.h|   8 +-
 gcc/testsuite/ChangeLog.jit|   4 +
 gcc/testsuite/jit.dg/test-null-passed-to-api.c |  30 +++
 7 files changed, 444 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-null-passed-to-api.c

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 78c1b65..bcbb93b 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,85 @@
+2013-10-18  David Malcolm  
+
+   * internal-api.c (gcc::jit::context::get_type): Improve error
+   message, and report the bogus value.
+   (gcc::jit::context::new_binary_op): Likewise.
+   (gcc::jit::context::new_comparison): Likewise.

Re: [patch 7/8] Remove basic-block.h from cgraph.h

2013-10-18 Thread Jeff Law

On 10/18/13 07:43, Andrew MacLeod wrote:

I also happened to notice that basic-block.h was being included directly
by cgraph.h.  The only thing cgraph.h and most of what uses it need are
the typedefs for gcov_type:

typedef HOST_WIDEST_INT gcov_type;
typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;

This patch moves gcov_type and gcov_type_unsigned to coretypes.h.

I will note this is the first time HWINT is introduced to coretypes, but
I believe system.h is supposed to always be included first, and that
includes HWINT.  We could also include HWINT from coretypes... but I'm
not sure what the standard practice for what goes into coretypes
actually is.

In any case, by removing the basic-block.h include from cgraph.h,
varasm.c was the only .c file that required a new include.

Does this seem reasonable?

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.

I'm not sure if we have a hard and fast rule, but I'm certainly OK with 
making one that we have config.h, system.h, coretypes.h as the defined 
order.


jeff


Re: [PATCH][ARM] New rtx costs table for Cortex A9

2013-10-18 Thread Richard Earnshaw
On 17/10/13 18:00, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch adds the rtx costs table for the Cortex-A9 core.
> An arm-none-eabi regression run tuned for A9 succeeded.
> This costs table showed a slight improvement on some popular benchmarks
> and no performance regressions on others against the old way of doing
> rtx costs.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> [gcc/]
> 2013-10-17  Kyrylo Tkachov  
> 
>  * config/arm/arm.c (cortexa9_extra_costs): New table.
>  (arm_cortex_a9_tune): Use cortexa9_extra_costs.
> 

OK.

R.




Re: [patch 8/8] cfgloop.h includes basic-block.h

2013-10-18 Thread Jeff Law

On 10/18/13 07:45, Andrew MacLeod wrote:

Another basic-block.h inclusion.  cfgloop.h was including basic-block.h,
which meant lots of other things were getting it from here too.

The only routine in cfgloop.h which uses anything from basic-block.h was
bb_loop_depth().  By moving that to cfgloop.c, it's no longer required by
the .h file.  I did have to include function.h for a few routines.  I
cannot imagine this being hot enough to show up on any performance
measure...

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?

OK.
jeff


Re: [patch 1/8] Remove gimple-low.h from the tree-ssa.h include list.

2013-10-18 Thread Jeff Law

On 10/18/13 07:37, Andrew MacLeod wrote:


gimple_check_call_matching_types() was being called from 3 or 4
different files, and seemed more appropriate as a cgraph routine (which
called it 3 times). So I moved that and its helper to cgraph.c.

After that, I only needed to update 4 .c files to directly include
gimple-low.h

bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK?
I'm less sure about this one.  I don't see that it clearly belongs in 
either location.  I could easily see it moving out of cgraph.c at some 
point.


If it furthers your cleanup efforts at this time, that's fine.  Just be 
aware that, at least IMHO, this routine doesn't fit clearly into either 
location and I wouldn't be surprised if we have to come back to it at 
some point.


jeff


[c++-concepts] Bugfix

2013-10-18 Thread Andrew Sutton
Fixing 2 issues. A test for __is_convertible_to was missing in
diagnose_trait, and *somehow* the logic for determining constraint
ordering w.r.t. requires expressions was missing.

Committed in 203826.

2013-10-16  Andrew Sutton  
* gcc/cp/logic.cc (left_requires), (decompose_left): Add
decomposition rules for requires expressions.
(subsumes_requires), (subsumes_prop): Add subsumption rules for
requires expressions.
* gcc/cp/constraint.cc (diagnose_trait): Diagnose failed conversion
requirements.

Andrew


Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-18 Thread Tim Shen
On Fri, Oct 18, 2013 at 5:39 AM, Paolo Carlini  wrote:
>>Oops that's only perfect Chinese version of "hello, world" under UTF-8
>>charset. Now it's hex format.
>>
>>I also make the char handling simpler.
>
> Great, thanks!

-m32 and -m64 tested and committed.

By the way, *encoding* is a more accurate word for UTF-8 than
`charset` (the character set in question is Unicode).
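To make the distinction concrete: Unicode assigns code points, and UTF-8 is one byte-level encoding of them. A hand-rolled three-byte encoder (for illustration only; real code would use the C library or ICU) shows U+4F60, the first character of the Chinese greeting in the testcase, becoming the bytes E4 BD A0:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a code point in [U+0800, U+FFFF] as three UTF-8 bytes;
   this three-byte case covers the CJK range discussed above.  */
static void
utf8_encode3 (uint32_t cp, unsigned char out[3])
{
  out[0] = 0xE0 | (cp >> 12);          /* 1110xxxx */
  out[1] = 0x80 | ((cp >> 6) & 0x3F);  /* 10xxxxxx */
  out[2] = 0x80 | (cp & 0x3F);         /* 10xxxxxx */
}
```

The same code point would be two bytes in UTF-16 and four in UTF-32, which is exactly why "charset" alone under-specifies what the regex engine sees.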


-- 
Tim Shen


Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-10-18 Thread Sandra Loosemore

On 10/18/2013 04:50 AM, Richard Biener wrote:

On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
 wrote:

This patch fixes various -fstrict-volatile-bitfields wrong-code bugs,
including instances where bitfield values were being quietly truncated as
well as bugs relating to using the wrong width.  The code changes are
identical to those in the previous version of this patch series (originally
posted in June and re-pinged many times since then), but for this version I
have cleaned up the test cases to remove dependencies on header files, as
Bernd requested.


Ok.


Just to clarify, is this approval unconditional, independent of part 2 
and other patches or changes still under active discussion?


-Sandra

