[GCC RFC] A new and simple pass merging paired load store instructions

2014-05-15 Thread bin.cheng
Hi,
Targets like ARM and AARCH64 support double-word load store instructions,
and these instructions are generally faster than the corresponding two
load/stores.  GCC currently uses peephole2 to merge paired load/stores into
one single instruction, which has a disadvantage: it can only handle simple
cases in which the two instructions actually appear sequentially in the
instruction stream, and it is too weak to handle cases in which the two
load/stores are separated by other irrelevant instructions.

This patch introduces a new GCC pass that looks through each basic block and
merges paired load/stores even if they are not adjacent to each other.  The
algorithm is pretty simple:
1) In an initialization pass iterating over the instruction stream, it
collects relevant memory access information for each instruction.
2) It iterates over each basic block and tries to find a possible paired
instruction for each memory access instruction.  During this work, it checks
dependencies between the two candidate instructions and also records the
information indicating how to pair them.  To avoid quadratic behavior of the
algorithm, it introduces a new parameter,
max-merge-paired-loadstore-distance, and sets the default value to 4, which
is large enough to catch the major part of the opportunities on
ARM/cortex-a15.
3) For each candidate pair, it calls a back-end hook to do target-dependent
checks and merge the two instructions if possible.
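
As a small illustration of the kind of opportunity this pass targets (a
hypothetical example, not taken from the patch), consider:

  /* Hypothetical C example.  If the two loads from p[0] and p[1] end up
     separated by the unrelated add at the RTL level, peephole2 cannot pair
     them because it only matches adjacent instructions, while this pass can
     still merge them into a single double-word load (e.g. ldrd on ARM) when
     the back-end hook accepts the pair.  */
  int
  sum_pair (int *p, int k)
  {
    int a = p[0];
    int t = k + 5;   /* unrelated instruction between the two loads */
    int b = p[1];
    return a + b + t;
  }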

Though the parameter is set to 4, for miscellaneous benchmarks this pass
can merge numerous opportunities beyond the ones already merged by peephole2
(about the same number of opportunities as the peepholed ones).  A GCC
bootstrap also confirms this finding.

Yet there is an open issue about when we should run this new pass.  Though
register renaming is disabled by default now, I put this pass after it,
because renaming can resolve some false dependencies and thus benefit this
pass.  Another finding is that it can capture a lot more opportunities if it
runs after sched2, but I am not sure whether it will mess up the scheduling
results in that way.

So, any comments about this?

Thanks,
bin


2014-05-15  Bin Cheng  
* common.opt (flag_merge_paired_loadstore): New option.
* merge-paired-loadstore.c: New file.
* Makefile.in: Support new file.
* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
* doc/invoke.texi (-fmerge-paired-loadstore): New.
(max-merge-paired-loadstore-distance): New.
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
* doc/tm.texi: Regenerated.
* target.def (merge_paired_loadstore): New.
* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
* passes.def (pass_merge_paired_loadstore): New pass.
* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

gcc/testsuite/ChangeLog
2014-05-15  Bin Cheng  

* gcc.target/arm/merge-paired-loadstore.c: New test.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 209009)
+++ gcc/common.opt  (working copy)
@@ -1745,6 +1745,10 @@ flive-range-shrinkage
 Common Report Var(flag_live_range_shrinkage) Init(0) Optimization
 Relief of register pressure through live range shrinkage
 
+fmerge-paired-loadstore
+Common Report Var(flag_merge_paired_loadstore) Init(2) Optimization
+Perform a target dependent pass merging paired loadstore
+
 frename-registers
 Common Report Var(flag_rename_registers) Init(2) Optimization
 Perform a register renaming optimization pass
Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 209009)
+++ gcc/Makefile.in (working copy)
@@ -1289,6 +1289,7 @@ OBJS = \
langhooks.o \
lcm.o \
lists.o \
+   merge-paired-loadstore.o \
loop-doloop.o \
loop-init.o \
loop-invariant.o \
Index: gcc/params.def
===
--- gcc/params.def  (revision 209009)
+++ gcc/params.def  (working copy)
@@ -262,6 +262,14 @@ DEFPARAM(PARAM_MAX_HOIST_DEPTH,
 "Maximum depth of search in the dominator tree for expressions to 
hoist",
 30, 0, 0)
 
+/* Paired memory access instructions can't be merged if they are too far
+   from each other in instruction flow.  This is used to avoid quadratic
+   behavior in merging paired load store algorithm.  */
+DEFPARAM(PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE,
+"max-merge-paired-loadstore-distance",
+"Only paired instructions within this distance can be merged",
+4, 0, 0)
+
 /* This parameter limits the number of insns in a loop that will be unrolled,
and by how much the loop is unrolled.
 
Index: gcc/target.def
=

Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-15 Thread Eric Botcazou
> Ah - you are worried about the case where STACK_SAVEAREA_MODE() is
> smaller than Pmode, yes ?

No, simply that the modified formula is plain wrong.  The code does:

  tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);
  tmp = build_index_type (tmp);
  tmp = build_array_type (ptr_type_node, tmp);

so, in the end, the size of the buffer is:

  [(5 * BITS_PER_WORD / POINTER_SIZE - 1) + 1] * POINTER_SIZE

which boils down to:

  5 * BITS_PER_WORD

provided that POINTER_SIZE <= BITS_PER_WORD.


So we have a problem if POINTER_SIZE > BITS_PER_WORD, in which case it's 
sufficient to use 5 * POINTER_SIZE instead.
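
For a concrete illustration with assumed values, take POINTER_SIZE = 64 and
BITS_PER_WORD = 32: the current formula yields an array of
5 * 32 / 64 = 2 pointers, i.e. 2 * 64 = 128 bits, whereas the buffer must
hold 5 pointers, i.e. 5 * POINTER_SIZE = 320 bits.  When
POINTER_SIZE <= BITS_PER_WORD, the existing 5 * BITS_PER_WORD result is
already large enough.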

> OK then, how about this revised version of the patch where the size
> computation is now:
> 
>   tmp = size_int (MAX (GET_MODE_SIZE (STACK_SAVEAREA_MODE
> (SAVE_NONLOCAL))
>  / GET_MODE_SIZE (Pmode), 1)
> + 2 /* Stack pointer and frame pointer.  */
> + 1 /* Slop for mips.  */
> - 1);
> 
> OK to apply ?

No, that's too complicated and risky, just do the following:

  /* builtin_setjmp takes a pointer to 5 words or pointers.  */
  if (POINTER_SIZE > BITS_PER_WORD)
tmp = size_int (4);
  else
tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);

which is simple and safe.

-- 
Eric Botcazou


Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-05-15 Thread Senthil Kumar Selvaraj
On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:
> Georg-Johann Lay  writes:
> 
> > Or what about simply that, which works for me:
> >
> >
> > Index: config/avr/avr.h
> > ===
> > --- config/avr/avr.h(revision 210276)
> > +++ config/avr/avr.h(working copy)
> > @@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
> >  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
> >  %{!fexceptions:-fno-exceptions}"
> >
> > +#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
> > +#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax} "
> > +#else
> >  #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
> > +#endif
> >
> >  #define LINK_SPEC "\
> >  %{mrelax:--relax\
> 
> Better yet something like
> 
> #ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
> #define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
> #else
> #define LINK_RELAX_SPEC ""
> #endif
> 
> #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC
> 

Does this look ok? I don't have commit access, so could someone commit
this please?

Regards
Senthil

2014-05-15  Senthil Kumar Selvaraj  

* config/avr/avr.h: Pass on mlink-relax to assembler.
* configure.ac: Test for mlink-relax assembler support.
* config.in: Regenerate.
* configure: Likewise.

diff --git gcc/config.in gcc/config.in
index c0ba36e..1738301 100644
--- gcc/config.in
+++ gcc/config.in
@@ -575,6 +575,12 @@
 #endif
 
 
+/* Define if your assembler supports -mlink-relax option. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AVR_AS_LINK_RELAX_OPTION
+#endif
+
+
 /* Define to 1 if you have the `clearerr_unlocked' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_CLEARERR_UNLOCKED
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 9d34983..c59c54d 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -512,8 +512,14 @@ extern const char *avr_device_to_sp8 (int argc, const char 
**argv);
 %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
 %{!fexceptions:-fno-exceptions}"
 
-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
-  
+#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
+#else
+#define ASM_RELAX_SPEC ""
+#endif
+
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC 
+
 #define LINK_SPEC "\
 %{mrelax:--relax\
  %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
diff --git gcc/configure gcc/configure
index f4db0a0..2812cdb 100755
--- gcc/configure
+++ gcc/configure
@@ -24014,6 +24014,39 @@ $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" 
>>confdefs.h
 fi
 ;;
 
+  avr-*-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
-mlink-relax option" >&5
+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
+if test "${gcc_cv_as_avr_relax+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_avr_relax=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.text' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o 
conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_avr_relax=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
+$as_echo "$gcc_cv_as_avr_relax" >&6; }
+if test $gcc_cv_as_avr_relax = yes; then
+
+$as_echo "#define HAVE_AVR_AS_LINK_RELAX_OPTION 1" >>confdefs.h
+
+fi
+  ;;
+
   cris-*-*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
-no-mul-bug-abort option" >&5
 $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
diff --git gcc/configure.ac gcc/configure.ac
index 8f17dfb..49a1f3d 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -3510,6 +3510,13 @@ case "$target" in
   [Define if your assembler supports the lituse_jsrdirect relocation.])])
 ;;
 
+  avr-*-*)
+gcc_GAS_CHECK_FEATURE([-mlink-relax option], gcc_cv_as_avr_relax,,
+  [-mlink-relax], [.text],,
+  [AC_DEFINE(HAVE_AVR_AS_LINK_RELAX_OPTION, 1,
+   [Define if your assembler supports -mlink-relax option.])])
+  ;;
+
   cris-*-*)
 gcc_GAS_CHECK_FEATURE([-no-mul-bug-abort option],
   gcc_cv_as_cris_no_mul_bug,[2,15,91],


Re: [PATCH] Fix (X >> C1) & C2 folding (PR tree-optimization/61158)

2014-05-15 Thread Richard Biener
On Thu, 15 May 2014, Jakub Jelinek wrote:

> Hi!
> 
> fold_binary_loc for (X >> C1) & C2, if X is zero extended narrower
> value, decreases prec, but if the shift count is bigger
> than the narrower prec, we then attempt to zerobits <<= negative_value.
> 
> In that case the result is necessarily zero though, all possibly non-zero
> bits are shifted away, so this patch fixes this case by making sure zerobits
> is all ones in that case, which results in (X, 0) folding a few lines below.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9/4.8?

Ok.

Thanks,
Richard.

> 2014-05-15  Jakub Jelinek  
> 
>   PR tree-optimization/61158
>   * fold-const.c (fold_binary_loc): If X is zero-extended and
>   shiftc >= prec, make sure zerobits is all ones instead of
>   invoking undefined behavior.
> 
>   * gcc.dg/pr61158.c: New test.
> 
> --- gcc/fold-const.c.jj   2014-05-14 09:46:07.0 +0200
> +++ gcc/fold-const.c  2014-05-14 11:35:19.593730226 +0200
> @@ -11972,11 +11972,17 @@ fold_binary_loc (location_t loc,
> /* See if we can shorten the right shift.  */
> if (shiftc < prec)
>   shift_type = inner_type;
> +   /* Otherwise X >> C1 is all zeros, so we'll optimize
> +  it into (X, 0) later on by making sure zerobits
> +  is all ones.  */
>   }
>   }
> zerobits = ~(unsigned HOST_WIDE_INT) 0;
> -   zerobits >>= HOST_BITS_PER_WIDE_INT - shiftc;
> -   zerobits <<= prec - shiftc;
> +   if (shiftc < prec)
> + {
> +   zerobits >>= HOST_BITS_PER_WIDE_INT - shiftc;
> +   zerobits <<= prec - shiftc;
> + }
> /* For arithmetic shift if sign bit could be set, zerobits
>can contain actually sign bits, so no transformation is
>possible, unless MASK masks them all away.  In that
> @@ -11994,7 +12000,7 @@ fold_binary_loc (location_t loc,
> /* ((X << 16) & 0xff00) is (X, 0).  */
> if ((mask & zerobits) == mask)
>   return omit_one_operand_loc (loc, type,
> -  build_int_cst (type, 0), arg0);
> +  build_int_cst (type, 0), arg0);
>  
> newmask = mask | zerobits;
> if (newmask != mask && (newmask & (newmask + 1)) == 0)
> --- gcc/testsuite/gcc.dg/pr61158.c.jj 2014-05-14 12:18:06.066817887 +0200
> +++ gcc/testsuite/gcc.dg/pr61158.c2014-05-14 12:18:48.046598622 +0200
> @@ -0,0 +1,12 @@
> +/* PR tree-optimization/61158 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-original" } */
> +
> +unsigned long long
> +foo (unsigned int x)
> +{
> +  return ((unsigned long long) x & 0x00ffULL) >> 40;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 0;" "original" { target { ilp32 || 
> lp64 } } } } */
> +/* { dg-final { cleanup-tree-dump "original" } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Konstantin Serebryany
On Thu, May 15, 2014 at 9:28 AM, Yury Gribov  wrote:
> On 05/14/2014 08:51 AM, Konstantin Serebryany wrote:
>>>
>>> One theme I have been noticing in the libsanitizer code is that it has
>>> all of the knowledge of glibc when it comes to syscalls but makes some
>>> bad assumptions dealing with some structures.  For an example on MIPS
>>> n64: stat and stat64 have different layouts between user and kernel.
>>> Don't ask, it is a historical mistake.
>>>
>>> Also it really forces target maintainers to maintain both glibc and
>>> libsanitizer code.  ILP32 on aarch64 has some similar issues as x32
>>> and MIPS n32.
>>> Even Linus is pushing new 32bit targets to go the route of 64bit time
>>> which means libsanitizer code becomes even more complex.
>>
>>
>> This is indeed a PITA.
>> It is caused by our need to avoid even greater PITA of including
>> system headers into the sanitizer files
>> (we allow system #includes only in a handful of files).
>
>
> Have you considered adding a configure-like step to your build process to
> determine necessary sizes and then provide them to sanitizer files via some
> config.h?

No. We have to support too many build systems and hence do not want
any configure step.
All configuration has to be done in the sources.


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Konstantin Serebryany
H.J.,
Thanks for the patches. Please (re)send them to llvm-commits,
otherwise I can not accept them.

--kcc

On Wed, May 14, 2014 at 2:31 AM, H.J. Lu  wrote:
> On Tue, May 13, 2014 at 2:02 AM, Konstantin Serebryany
>  wrote:
>> New patch attached.
>> It is based on r208674 which enables LeakSanitizer
>> (https://code.google.com/p/address-sanitizer/wiki/LeakSanitizer) by
>> default in asan mode.
>
> There are a couple issues for x32:
>
> 1.  This change
>
> @@ -56,13 +55,6 @@ typedef signed   long long sptr;  // NOLINT
>  typedef unsigned long uptr;  // NOLINT
>  typedef signed   long sptr;  // NOLINT
>  #endif  // defined(_WIN64)
> -#if defined(__x86_64__)
> -// Since x32 uses ILP32 data model in 64-bit hardware mode,  we must use
> -// 64-bit pointer to unwind stack frame.
> -typedef unsigned long long uhwptr;  // NOLINT
> -#else
> -typedef uptr uhwptr;   // NOLINT
> -#endif
>  typedef unsigned char u8;
>  typedef unsigned short u16;  // NOLINT
>  typedef unsigned int u32;
>
> @@ -120,46 +34,43 @@ uptr StackTrace::GetCurrentPc() {
>  void StackTrace::FastUnwindStack(uptr pc, uptr bp,
>   uptr stack_top, uptr stack_bottom,
>   uptr max_depth) {
> -  if (max_depth == 0) {
> -size = 0;
> -return;
> -  }
> +  CHECK_GE(max_depth, 2);
>trace[0] = pc;
>size = 1;
> -  uhwptr *frame = (uhwptr *)bp;
> -  uhwptr *prev_frame = frame - 1;
> +  uptr *frame = (uptr *)bp;
> +  uptr *prev_frame = frame - 1;
>if (stack_top < 4096) return;  // Sanity check for stack top.
>// Avoid infinite loop when frame == frame[0] by using frame > prev_frame.
>while (frame > prev_frame &&
> - frame < (uhwptr *)stack_top - 2 &&
> - frame > (uhwptr *)stack_bottom &&
> + frame < (uptr *)stack_top - 2 &&
> + frame > (uptr *)stack_bottom &&
>   IsAligned((uptr)frame, sizeof(*frame)) &&
>   size < max_depth) {
> -uhwptr pc1 = frame[1];
> +uptr pc1 = frame[1];
>  if (pc1 != pc) {
> -  trace[size++] = (uptr) pc1;
> +  trace[size++] = pc1;
>  }
>  prev_frame = frame;
> -frame = (uhwptr *)frame[0];
> +frame = (uptr*)frame[0];
>}
>  }
>
> reverted:
>
> 2012-11-16  H.J. Lu  
>
> PR other/55333
> * include/sanitizer/common_interface_defs.h (uhwptr): New type
> for hardware pointer.
> * sanitizer_common/sanitizer_stacktrace.cc
> (StackTrace::FastUnwindStack):
> Replace uptr with uhwptr for stack unwind.
>
> 2. struct linux_dirent is incorrect for x32.  We need something like
>
> struct linux_dirent {
> #if defined(__x86_64__) && !defined(_LP64)
>   unsigned long long d_ino;
>   unsigned long long d_off;
> #else
>   unsigned long  d_ino;
>   unsigned long  d_off;
> #endif
>   unsigned short d_reclen;
>   char   d_name[256];
> };
>
> 3. Pointers passed to internal_syscall should be cast to (uptr).
> Otherwise, they won't be properly extended to 64-bit.  We need
> to use (uptr) in internal_sigprocmask, like
>
> uptr internal_sigprocmask(int how, __sanitizer_sigset_t *set,
> __sanitizer_sigset_t *oldset) {
> #if SANITIZER_FREEBSD
>   return internal_syscall(SYSCALL(sigprocmask), how, set, oldset);
> #else
>   __sanitizer_kernel_sigset_t *k_set = (__sanitizer_kernel_sigset_t *)set;
>   __sanitizer_kernel_sigset_t *k_oldset = (__sanitizer_kernel_sigset_t 
> *)oldset;
>   return internal_syscall(SYSCALL(rt_sigprocmask), (uptr)how,
> (uptr)&k_set->sig[0],
>   (uptr)&k_oldset->sig[0], sizeof(__sanitizer_kernel_sigset_t));
> #endif
> }
>
> 4. ThreadDescriptorSize returns wrong value for x32. Size of struct
> pthread should be 1728.
>
> I am enclosing patches for those issues.
>
>
> --
> H.J.


Re: [PATCH aarch64] aarch64-linux: output .note.GNU-stack

2014-05-15 Thread Marcus Shawcroft
On 14 May 2014 22:30, Kyle McMartin  wrote:
> The toolchain would like PT_GNU_STACK in our objects for all
> architectures to make it explicit whether we are requesting an
> executable stack or not.
>
> 2014-05-14  Kyle McMartin  
>
> * config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): define
> to file_end_indicate_exec_stack for .note.GNU-stack emission.
>
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -44,4 +44,6 @@
>  }  \
>while (0)
>
> +#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
> +
>  #endif  /* GCC_AARCH64_LINUX_H */

OK, thanks.

Can you backport this to 4.9 and 4.8?

/Marcus


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Andrew Pinski
On Thu, May 15, 2014 at 1:05 AM, Konstantin Serebryany
 wrote:
> On Thu, May 15, 2014 at 9:28 AM, Yury Gribov  wrote:
>> On 05/14/2014 08:51 AM, Konstantin Serebryany wrote:

 One theme I have been noticing in the libsanitizer code is that it has
 all of the knowledge of glibc when it comes to syscalls but makes some
 bad assumptions dealing with some structures.  For an example on MIPS
 n64: stat and stat64 have different layouts between user and kernel.
 Don't ask, it is a historical mistake.

 Also it really forces target maintainers to maintain both glibc and
 libsanitizer code.  ILP32 on aarch64 has some similar issues as x32
 and MIPS n32.
 Even Linus is pushing new 32bit targets to go the route of 64bit time
 which means libsanitizer code becomes even more complex.
>>>
>>>
>>> This is indeed a PITA.
>>> It is caused by our need to avoid even greater PITA of including
>>> system headers into the sanitizer files
>>> (we allow system #includes only in a handful of files).
>>
>>
>> Have you considered adding a configure-like step to your build process to
>> determine necessary sizes and then provide them to sanitizer files via some
>> config.h?
>
> No. We have to support too many build systems and hence do not want
> any configure step.
> All configuration has to be done in the sources.


I think this argument is bogus.  Please do one build system and
support it.  A simple makefile and some scripts seem simple to support,
so you don't need anything extra.

Thanks,
Andrew Pinski


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Andrew Pinski
On Thu, May 15, 2014 at 1:05 AM, Konstantin Serebryany
 wrote:
> H.J.,
> Thanks for the patches. Please (re)send them to llvm-commits,
> otherwise I can not accept them.


I think this is bogus reasoning.  You should be able to take those
patches and post them yourself.

Thanks,
Andrew Pinski

>
> --kcc
>
> On Wed, May 14, 2014 at 2:31 AM, H.J. Lu  wrote:
>> On Tue, May 13, 2014 at 2:02 AM, Konstantin Serebryany
>>  wrote:
>>> New patch attached.
>>> It is based on r208674 which enables LeakSanitizer
>>> (https://code.google.com/p/address-sanitizer/wiki/LeakSanitizer) by
>>> default in asan mode.
>>
>> There are a couple issues for x32:
>>
>> 1.  This change
>>
>> @@ -56,13 +55,6 @@ typedef signed   long long sptr;  // NOLINT
>>  typedef unsigned long uptr;  // NOLINT
>>  typedef signed   long sptr;  // NOLINT
>>  #endif  // defined(_WIN64)
>> -#if defined(__x86_64__)
>> -// Since x32 uses ILP32 data model in 64-bit hardware mode,  we must use
>> -// 64-bit pointer to unwind stack frame.
>> -typedef unsigned long long uhwptr;  // NOLINT
>> -#else
>> -typedef uptr uhwptr;   // NOLINT
>> -#endif
>>  typedef unsigned char u8;
>>  typedef unsigned short u16;  // NOLINT
>>  typedef unsigned int u32;
>>
>> @@ -120,46 +34,43 @@ uptr StackTrace::GetCurrentPc() {
>>  void StackTrace::FastUnwindStack(uptr pc, uptr bp,
>>   uptr stack_top, uptr stack_bottom,
>>   uptr max_depth) {
>> -  if (max_depth == 0) {
>> -size = 0;
>> -return;
>> -  }
>> +  CHECK_GE(max_depth, 2);
>>trace[0] = pc;
>>size = 1;
>> -  uhwptr *frame = (uhwptr *)bp;
>> -  uhwptr *prev_frame = frame - 1;
>> +  uptr *frame = (uptr *)bp;
>> +  uptr *prev_frame = frame - 1;
>>if (stack_top < 4096) return;  // Sanity check for stack top.
>>// Avoid infinite loop when frame == frame[0] by using frame > prev_frame.
>>while (frame > prev_frame &&
>> - frame < (uhwptr *)stack_top - 2 &&
>> - frame > (uhwptr *)stack_bottom &&
>> + frame < (uptr *)stack_top - 2 &&
>> + frame > (uptr *)stack_bottom &&
>>   IsAligned((uptr)frame, sizeof(*frame)) &&
>>   size < max_depth) {
>> -uhwptr pc1 = frame[1];
>> +uptr pc1 = frame[1];
>>  if (pc1 != pc) {
>> -  trace[size++] = (uptr) pc1;
>> +  trace[size++] = pc1;
>>  }
>>  prev_frame = frame;
>> -frame = (uhwptr *)frame[0];
>> +frame = (uptr*)frame[0];
>>}
>>  }
>>
>> reverted:
>>
>> 2012-11-16  H.J. Lu  
>>
>> PR other/55333
>> * include/sanitizer/common_interface_defs.h (uhwptr): New type
>> for hardware pointer.
>> * sanitizer_common/sanitizer_stacktrace.cc
>> (StackTrace::FastUnwindStack):
>> Replace uptr with uhwptr for stack unwind.
>>
>> 2. struct linux_dirent is incorrect for x32.  We need something like
>>
>> struct linux_dirent {
>> #if defined(__x86_64__) && !defined(_LP64)
>>   unsigned long long d_ino;
>>   unsigned long long d_off;
>> #else
>>   unsigned long  d_ino;
>>   unsigned long  d_off;
>> #endif
>>   unsigned short d_reclen;
>>   char   d_name[256];
>> };
>>
>> 3. Pointers passed to internal_syscall should be cast to (uptr).
>> Otherwise, they won't be properly extended to 64-bit.  We need
>> to use (uptr) in internal_sigprocmask, like
>>
>> uptr internal_sigprocmask(int how, __sanitizer_sigset_t *set,
>> __sanitizer_sigset_t *oldset) {
>> #if SANITIZER_FREEBSD
>>   return internal_syscall(SYSCALL(sigprocmask), how, set, oldset);
>> #else
>>   __sanitizer_kernel_sigset_t *k_set = (__sanitizer_kernel_sigset_t *)set;
>>   __sanitizer_kernel_sigset_t *k_oldset = (__sanitizer_kernel_sigset_t 
>> *)oldset;
>>   return internal_syscall(SYSCALL(rt_sigprocmask), (uptr)how,
>> (uptr)&k_set->sig[0],
>>   (uptr)&k_oldset->sig[0], sizeof(__sanitizer_kernel_sigset_t));
>> #endif
>> }
>>
>> 4. ThreadDescriptorSize returns wrong value for x32. Size of struct
>> pthread should be 1728.
>>
>> I am enclosing patches for those issues.
>>
>>
>> --
>> H.J.


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Yury Gribov

On 05/15/2014 12:05 PM, Konstantin Serebryany wrote:

No. We have to support too many build systems and hence do not want
any configure step.
All configuration has to be done in the sources.


Yeah, I see your point. But filling code with magic constants isn't very 
nice either.


We could make a separate CMake check like GetTargetStructSize or 
something (somewhere in projects/compiler-rt/make/platform/*.mk).


-Y



Re: [PATCH, ARM] Enable shrink-wrap for apcs frame

2014-05-15 Thread Zhenqiang Chen
On 13 May 2014 20:56, Richard Earnshaw  wrote:
> On 25/03/14 08:13, Zhenqiang Chen wrote:
>> Hi
>>
>> The patch enables shrink-wrap for apcs frame.
>>
>> Bootstrap and no make check regression in ARM, THUMB1 and THUMB2 modes.
>> No make check regression with "-g/-mapcs/-marm".
>> Build linux-3.14-rc7 without error.
>>
>> Is it OK for next stage1?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-03-25  Zhenqiang Chen  
>>
>> * config/arm/arm.c (arm_option_override): Enable shrink-wrap for
>> TARGET_APCS_FRAME.
>> (arm_emit_multi_reg_pop): Set correct dwarf info.
>> (arm_expand_epilogue_apcs_frame): Add more dwarf info.
>>
>> testsuite/ChangeLog:
>> 2014-03-25  Zhenqiang Chen  
>>
>> * gcc.target/arm/shrink-wrap-alloca.c: New test case.
>> * gcc.target/arm/shrink-wrap-sibcall.c: New test case.
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 0240cc7..fa86942 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -2811,9 +2811,6 @@ arm_option_override (void)
>>   generate additional returns.  */
>>if (optimize_function_for_size_p (cfun) && TARGET_THUMB2)
>>  flag_shrink_wrap = false;
>> -  /* TBD: Dwarf info for apcs frame is not handled yet.  */
>> -  if (TARGET_APCS_FRAME)
>> -flag_shrink_wrap = false;
>>
>>/* We only support -mslow-flash-data on armv7-m targets.  */
>>if (target_slow_flash_data
>> @@ -19840,7 +19837,14 @@ arm_emit_multi_reg_pop (unsigned long 
>> saved_regs_mask)
>>  par = emit_insn (par);
>>
>>REG_NOTES (par) = dwarf;
>> -  if (!return_in_pc)
>> +
>> +  if (!emit_update)
>> +{
>> +  /* SP is restored from stack.  So reset the frame info.  */
>> +  RTX_FRAME_RELATED_P (par) = 1;
>> +  add_reg_note (par, REG_CFA_DEF_CFA, stack_pointer_rtx);
>> +}
>> +  else if (!return_in_pc)
>>  arm_add_cfa_adjust_cfa_note (par, UNITS_PER_WORD * num_regs,
>>   stack_pointer_rtx, stack_pointer_rtx);
>>  }
>> @@ -27226,6 +27230,9 @@ arm_expand_epilogue_apcs_frame (bool really_return)
>>REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
>>   gen_rtx_REG (SImode, IP_REGNUM),
>>   NULL_RTX);
>> +  arm_add_cfa_adjust_cfa_note (insn, UNITS_PER_WORD,
>> +   stack_pointer_rtx,
>> +   stack_pointer_rtx);
>
> This can't be related to $SUBJECT, surely?  Shrink-wrapping an interrupt
> routine?
>
> If this is as I think, please resubmit that part as a separate patch.

Yip. GCC can shrink-wrap an interrupt routine (ARM mode only). I
can write a test case to produce it.

I will investigate the interrupt-related code and resubmit a separate patch for it.

> The other changes look ok.

I will retest it. If there is no regression, I will commit it.

Thanks!
-Zhenqiang

>>  }
>>
>>if (!really_return || (saved_regs_mask & (1 << PC_REGNUM)))
>> diff --git a/gcc/testsuite/gcc.target/arm/shrink-wrap-alloca.c
>> b/gcc/testsuite/gcc.target/arm/shrink-wrap-alloca.c
>> new file mode 100644
>> index 000..318240b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/shrink-wrap-alloca.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -g -mapcs " } */
>> +
>> +int *p;
>> +
>> +void
>> +test (int a)
>> +{
>> +  if (a > 0)
>> +p = __builtin_alloca (4);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/shrink-wrap-sibcall.c
>> b/gcc/testsuite/gcc.target/arm/shrink-wrap-sibcall.c
>> new file mode 100644
>> index 000..2efe5d0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/shrink-wrap-sibcall.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -g -mapcs " } */
>> +
>> +unsigned char a, b, d, f, g;
>> +
>> +int test (void);
>> +
>> +int
>> +baz (int c)
>> +{
>> +  if (c == 0) return test ();
>> +  if (b & 1)
>> +{
>> +  g = 0;
>> +  int e = (a & 0x0f) - (g & 0x0f);
>> +
>> +  if (!a)  b |= 0x80;
>> +  a = e + test ();
>> + f = g/5 + a*3879 + b *2985;
>> +}
>> +   else
>> +   {
>> + f = g + a*39879 + b *25;
>> +   }
>> +  return test ();
>> +}
>>
>
>


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-15 Thread Rainer Orth
Wei Mi  writes:

> Can I checkin this testcase fix?

I think this is for Uros to approve.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Konstantin Serebryany
On Thu, May 15, 2014 at 12:07 PM, Andrew Pinski  wrote:
> On Thu, May 15, 2014 at 1:05 AM, Konstantin Serebryany
>  wrote:
>> H.J.,
>> Thanks for the patches. Please (re)send them to llvm-commits,
>> otherwise I can not accept them.
>
>
> I think this is bogus reasoning.  You should be able to take and post
> them yourself.  Those patches.

I can't, sorry.


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Konstantin Serebryany
>
> I think  this argument is bogus.  Please do one build system and
> support it.  Simple makefile and some scripts seems simple to support
> so you don't need anything extra.

Would be glad to. One (but not the only) prerequisite for that is that
GCC exports libsanitizer as an "svn external" and uses our cmake build
for libsanitizer.


Re: libsanitizer merge from upstream r208536

2014-05-15 Thread Jakub Jelinek
On Thu, May 15, 2014 at 12:20:06PM +0400, Yury Gribov wrote:
> On 05/15/2014 12:05 PM, Konstantin Serebryany wrote:
> >No. We have to support too many build systems and hence do not want
> >any configure step.
> >All configuration has to be done in the sources.
> 
> Yeah, I see your point. But filling code with magic constants isn't
> very nice either.
> 
> We could make a separate CMake check like GetTargetStructSize or
> something (somewhere in projects/compiler-rt/make/platform/*.mk).

Also, I'd suggest having a look at glibc or libgomp to see how a directory
system with easily overridable target-specific headers is way better than a
maze of #ifdefs all around the source.
I mean the glibc sysdeps/ stuff or the libgomp config/ stuff (the latter is
far more limited of course).
So, individual structures/constants/defines (or related groups thereof)
could be split into separate headers, with a target-specific set of
-I arguments added by the Makefile/CMake, so that you can have some
defaults, some per-CPU defaults, some OS defaults and some per-CPU OS
defaults.  Say, you could have a header file with the layout of the kernel
stat structure, one for Linux, another one for Darwin, another one for
Windows, and then if say on ppc64-linux it is different, it could supply its
own file that would override the generic Linux one.
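
For instance (purely illustrative file names and struct layouts, not an
actual libsanitizer proposal), the kernel stat layout could live in per-OS
and per-CPU headers, with the most specific -I directory listed first so it
wins:

  // sysdeps/linux/sanitizer_kernel_stat.h -- hypothetical generic Linux layout
  struct __sanitizer_kernel_stat {
    unsigned long st_dev;
    unsigned long st_ino;
    // ...
  };

  // sysdeps/linux/powerpc64/sanitizer_kernel_stat.h -- hypothetical ppc64
  // override; picked up first because -Isysdeps/linux/powerpc64 is passed
  // before -Isysdeps/linux by the target makefile
  struct __sanitizer_kernel_stat {
    unsigned long st_dev;
    unsigned long st_nlink;   // field order differs from the generic layout
    unsigned long st_ino;
    // ...
  };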

Jakub


[PATCH][ARM] Adjust arith_shiftsi for ARMv8-style

2014-05-15 Thread Kyrill Tkachov

Hi all,

Shifted arithmetic operations can never be encoded in 16 bits and therefore
cannot appear in Thumb2 IT blocks under the ARMv8-A rules (and the -mrestrict-it
rules). This patch adjusts the relevant pattern accordingly.


Tested and bootstrapped on arm-none-linux-gnueabihf and made sure there are no
performance regressions on a number of benchmarks.


This is a bug (not wrong-code though) in -mrestrict-it that affects 4.9 as well 
as trunk, so is it ok to backport it there?



Thanks,
Kyrill

2014-05-15  Kyrylo Tkachov  

* config/arm/arm.md (arith_shiftsi): Do not predicate for
arm_restrict_it.

commit 8d54d43e84925ee5c53e70ecc7036bc5a7e867ba
Author: Kyrylo Tkachov 
Date:   Fri Apr 4 16:10:39 2014 +0100

[ARM] Adjust arith_shiftsi for arm_restrict_it

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 38ca058..9468c70 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9860,6 +9860,7 @@
   "TARGET_32BIT"
   "%i1%?\\t%0, %2, %4%S3"
   [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
(set_attr "shift" "4")
(set_attr "arch" "a,t2,t2,a")
;; Thumb2 doesn't allow the stack pointer to be used for 

Re: [RFC] Using function clones for Pointer Bounds Checker

2014-05-15 Thread Ilya Enkovich
2014-05-14 19:09 GMT+04:00 H.J. Lu :
> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich  wrote:
>> 2014-05-13 23:21 GMT+04:00 Jeff Law :
>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>
>> propagate constant bounds value and remove checks in called function).
>
>
> So from a linking standpoint, presumably you have to mangle the
> instrumented
> caller/callee in some manner.  Right?  Or are you dynamically dispatching
> somehow?


 Originally the idea was to have the instrumented clone use the same
 assembler name as the original function. Since instrumented code is
 fully compatible with non-instrumented code, we always emit only one
 version. Usage of the same assembler name allows instrumented and
 non-instrumented calls to look similar in assembler. It worked fine until
 I tried it with LTO, where the assembler name is used as a unique
 identifier. With linker resolution files it became even harder
 to use such an approach. To resolve these issues I started to use a new
 assembler name with a postfix, but linked it with the original name using
 IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
 clones and originals during compilation, but both the clone and the original
 function have a similar name in the output assembler.
>>>
>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>> information does not change the signature of the callee.   This is obviously
>>> important for C++.
>>>
>>> Sounds like I need to sit down with the branch and see how this works in the
>>> new scheme.
>>
>> Both mpx branch and Wiki
>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>> page are up-to-date now and may be tried out either in NOP mode or
>> with simulator. Let me know if you have any troubles with using it.
>>
>
> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
> an MPX-enabled executable which runs on both MPX-enabled and
> non-MPX-enabled hardware. I didn't see any MPX run-time library.

MPX run-time library is not a part of GCC now. It goes with Intel SDE.
See 
https://software.intel.com/en-us/articles/using-intel-mpx-with-the-intel-software-development-emulator
for usage instructions.

Ilya
>
> --
> H.J.


GCC 4.8.3 Status Report, branch frozen for release (2014-05-15)

2014-05-15 Thread Richard Biener

Status
==

The 4.8 branch is now frozen as I am preparing a first release
candidate for 4.8.3.  All patches to the branch now require
explicit approval from release managers.


Previous Report
===

https://gcc.gnu.org/ml/gcc/2014-05/msg00026.html


Re: C++ PATCHes to improve overload resolution diagnostics

2014-05-15 Thread Andreas Schwab
Jason Merrill  writes:

>   (print_z_candidate): Say "candidate:" before each candidate.

* obj-c++.dg/exceptions-3.mm: Remove check for message no longer
emitted.
* obj-c++.dg/exceptions-5.mm: Likewise.

diff --git a/gcc/testsuite/obj-c++.dg/exceptions-3.mm 
b/gcc/testsuite/obj-c++.dg/exceptions-3.mm
index bf0be01..3e6a775 100644
--- a/gcc/testsuite/obj-c++.dg/exceptions-3.mm
+++ b/gcc/testsuite/obj-c++.dg/exceptions-3.mm
@@ -73,7 +73,6 @@ int test (id object)
 { /* { dg-error "no matching function" "" { target 
*-*-* } 72 } */
   dummy++;/* { dg-message "MyObject" "" { target *-*-* } 13 } 
*/
 } /* { dg-message "candidate" "" { target *-*-* } 13 } 
*/
-  /* { dg-message "candidates" "" { target *-*-* } 72 
} */
   @try { @throw object; }
   @catch (static MyObject *x) /* { dg-error "storage class" } */
 {
diff --git a/gcc/testsuite/obj-c++.dg/exceptions-5.mm 
b/gcc/testsuite/obj-c++.dg/exceptions-5.mm
index 1ea9a6f..087533b 100644
--- a/gcc/testsuite/obj-c++.dg/exceptions-5.mm
+++ b/gcc/testsuite/obj-c++.dg/exceptions-5.mm
@@ -73,7 +73,6 @@ int test (id object)
 { /* { dg-error "no matching function" "" { target 
*-*-* } 72 } */
   dummy++;/* { dg-message "MyObject" "" { target *-*-* } 13 } 
*/
 } /* { dg-message "candidate" "" { target *-*-* } 13 } 
*/
-  /* { dg-message "candidate" "" { target *-*-* } 72 } 
*/
 
   @try { @throw object; }
   @catch (static MyObject *) /* { dg-error "storage class" } */
-- 
1.9.3


Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Richard Sandiford
Jakub Jelinek  writes:
> This patch adds two new options (compatible with clang) which allow
> users to choose the behavior of undefined behavior sanitization.
>
> By default as before, all undefined behaviors (except for
> __builtin_unreachable and missing return in C++) continue after reporting
> which means that you can get lots of runtime errors from a single program
> run and the exit code will not reflect the failure in that case.
>
> With this patch, one can use -fsanitize=undefined -fno-sanitize-recover,
> which will report just the first undefined behavior and then exit with
> non-zero code.

Would it make sense for this to be the default for bootstrap-ubsan,
so that the bootstrap fails on undefined behaviour?

Thanks,
Richard


Re: [GCC RFC] A new and simple pass merging paired load store instructions

2014-05-15 Thread Oleg Endo
Hi,

On 15 May 2014, at 09:26, "bin.cheng"  wrote:

> Hi,
> Targets like ARM and AARCH64 support double-word load store instructions,
> and these instructions are generally faster than the corresponding two
> load/stores.  GCC currently uses peephole2 to merge paired load/stores into
> one single instruction, which has a disadvantage: it can only handle simple
> cases in which the two instructions actually appear sequentially in the
> instruction stream, and it is too weak to handle cases in which the two
> load/stores are separated by other irrelevant instructions.
> 
> This patch introduces a new GCC pass that looks through each basic block and
> merges paired load/stores even if they are not adjacent to each other.  The
> algorithm is pretty simple:
> 1) In an initialization pass iterating over the instruction stream, it
> collects relevant memory access information for each instruction.
> 2) It iterates over each basic block and tries to find a possible paired
> instruction for each memory access instruction.  During this work, it checks
> dependencies between the two candidate instructions and also records the
> information indicating how to pair them.  To avoid quadratic behavior of the
> algorithm, it introduces a new parameter,
> max-merge-paired-loadstore-distance, and sets the default value to 4, which
> is large enough to catch the major part of the opportunities on
> ARM/cortex-a15.
> 3) For each candidate pair, it calls a back-end hook to do target-dependent
> checks and merge the two instructions if possible.
> 
> Though the parameter is set to 4, for miscellaneous benchmarks this pass
> can merge numerous opportunities beyond the ones already merged by peephole2
> (about the same number of opportunities as the peepholed ones).  A GCC
> bootstrap also confirms this finding.

This is interesting.  E.g. on SH there are insns to load/store SFmode pairs.  
However, these insns require a mode switch and have some constraints on 
register usage.  So in the SH case the load/store pairing would need to be done 
before reg alloc and before mode switching.

> 
> Yet there is an open issue about when we should run this new pass.  Though
> register renaming is disabled by default now, I put this pass after it,
> because renaming can resolve some false dependencies and thus benefit this
> pass.  Another finding is that it can capture a lot more opportunities if it
> runs after sched2, but I am not sure whether it will mess up the scheduling
> results in that way.

How about the following.
Instead of adding new hooks and inserting the pass into the general pass list,
make the new pass class take the necessary callback functions directly.  Then
targets can just instantiate the pass, passing their implementation of the
callbacks, and insert the pass object into the pass list at a place that fits
best for the target.
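
Something along these lines, perhaps (a rough sketch with made-up names, not
the actual GCC pass machinery):

  /* Hypothetical sketch: the target supplies its checks as callbacks when it
     instantiates the pass, instead of GCC defining a new target hook.  */
  typedef struct rtx_def *rtx;   /* as in GCC's rtl.h */

  struct merge_pairs_callbacks
  {
    /* Return true if the two memory-access insns may be paired on this
       target.  */
    bool (*valid_pair_p) (rtx first, rtx second);
    /* Emit the merged double-word access in place of the pair.  */
    rtx (*gen_merged_access) (rtx first, rtx second);
  };

  class pass_merge_paired_loadstore_sketch
  {
  public:
    explicit pass_merge_paired_loadstore_sketch (merge_pairs_callbacks cb)
      : m_callbacks (cb) {}
    /* execute would walk each basic block, find candidate pairs within the
       distance limit and use m_callbacks to validate and emit merges.  */
  private:
    merge_pairs_callbacks m_callbacks;
  };

The target would then construct this object and register it wherever it fits
best in its own pass list.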


> 
> So, any comments about this?
> 
> Thanks,
> bin
> 
> 
> 2014-05-15  Bin Cheng  
>* common.opt (flag_merge_paired_loadstore): New option.
>* merge-paired-loadstore.c: New file.
>* Makefile.in: Support new file.
>* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
>(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
>* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
>* doc/invoke.texi (-fmerge-paired-loadstore): New.
>(max-merge-paired-loadstore-distance): New.
>* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
>* doc/tm.texi: Regenerated.
>* target.def (merge_paired_loadstore): New.
>* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
>* passes.def (pass_merge_paired_loadstore): New pass.
>* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
> 
> gcc/testsuite/ChangeLog
> 2014-05-15  Bin Cheng  
> 
>* gcc.target/arm/merge-paired-loadstore.c: New test.
> 
> 


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Jakub Jelinek
On Thu, May 15, 2014 at 11:30:40AM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > This patch adds two new options (compatible with clang) which allow
> > users to choose the behavior of undefined behavior sanitization.
> >
> > By default as before, all undefined behaviors (except for
> > __builtin_unreachable and missing return in C++) continue after reporting
> > which means that you can get lots of runtime errors from a single program
> > run and the exit code will not reflect the failure in that case.
> >
> > With this patch, one can use -fsanitize=undefined -fno-sanitize-recover,
> > which will report just the first undefined behavior and then exit with
> > non-zero code.
> 
> Would it make sense for this to be the default for bootstrap-ubsan,
> so that the bootstrap fails on undefined behaviour?

Perhaps eventually, but is current bootstrap-ubsan really ubsan error free
on at least the major targets?  I've made some efforts towards that goal on
x86_64-linux, but haven't tried bootstrap-ubsan recently.

Jakub


Re: C++ PATCHes to improve overload resolution diagnostics

2014-05-15 Thread Manuel López-Ibáñez
On 14 May 2014 18:47, Jason Merrill  wrote:
> When I was working on DR 1571, I noticed that our diagnostics weren't very
> helpful for reference bindings (BZ 20332/21631) so I set out to treat
> lvalue/rvalue mismatches and cv-qual loss as a bad conversion rather than no
> conversion.  As I worked on that I kept noticing other things that I could
> tweak to improve diagnostics:
>
> We now explain what's wrong with near matches in overload resolution. As a
> result, I've changed to saying "candidate:" before every candidate to help
> users recognize when we're printing a new candidate rather than explanation
> of what was wrong with the previous one.
>
> We now get information about near-matches even with -pedantic.
>
> If we see ambiguous valid candidates, the diagnostic won't mention
> near-matches.  If we see a template candidate and ambiguous near-valid
> candidates, we will print all the candidates.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

This looks great. One minor nit: In this hunk, what is input_location
pointing at and why is that better than loc?

@@ -3195,18 +3217,37 @@ equal_functions (tree fn1, tree fn2)
 static void
 print_conversion_rejection (location_t loc, struct conversion_info *info)
 {
+  tree from = info->from;
+  if (!TYPE_P (from))
+from = lvalue_type (from);
   if (info->n_arg == -1)
-/* Conversion of implicit `this' argument failed.  */
-inform (loc, "  no known conversion for implicit "
-"% parameter from %qT to %qT",
-info->from_type, info->to_type);
+{
+  /* Conversion of implicit `this' argument failed.  */
+  if (!TYPE_P (info->from))
+/* A bad conversion for 'this' must be discarding cv-quals.  */
+inform (input_location, "  passing %qT as % "
+"argument discards qualifiers",
+from);
+  else
+inform (loc, "  no known conversion for implicit "
+"% parameter from %qT to %qT",
+from, info->to_type);
+}
+  else if (!TYPE_P (info->from))
+{
+  if (info->n_arg >= 0)
+inform (loc, "  conversion of argument %d would be ill-formed:",
+info->n_arg+1);
+  perform_implicit_conversion (info->to_type, info->from,
+   tf_warning_or_error);
+}
   else if (info->n_arg == -2)
 /* Conversion of conversion function return value failed.  */
 inform (loc, "  no known conversion from %qT to %qT",
-info->from_type, info->to_type);
+from, info->to_type);
   else
 inform (loc, "  no known conversion for argument %d from %qT to %qT",
-info->n_arg+1, info->from_type, info->to_type);
+info->n_arg+1, from, info->to_type);


And too few spaces around '+'.

Also, are there other qualifiers of 'this' besides 'const'?

Cheers,

Manuel.


Re: [patch] Simplify std::tuple helpers and fix C++14 bug.

2014-05-15 Thread Jonathan Wakely

On 15/05/14 07:36 +0200, Daniel Krügler wrote:

Looking at the definition of the new alias

__cv_tuple_size

you might want to apply LWG 2313

http://cplusplus.github.io/LWG/lwg-defects.html#2313

and simplify its definition even further.


I forgot about that. With that resolution the __cv_tuple_size alias
doesn't seem worth using; it's simple enough anyway. Thanks!

Tested x86_64-linux, committed to trunk.
commit 420a40952d148f64e73517c75060e6ff17e7c93b
Author: Jonathan Wakely 
Date:   Thu May 15 11:02:30 2014 +0100

	* include/std/tuple (tuple_size): Implement LWG 2313.
	* include/std/array (tuple_size, tuple_element): Add Doxygen comments.
	* include/std/utility (tuple_size, tuple_element): Likewise.
	* testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc:
	Adjust dg-error line number.

diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index 22947ce..cc2c864 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -306,6 +306,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template 
 class tuple_size;
 
+  /// Partial specialization for std::array
   template
 struct tuple_size<_GLIBCXX_STD_C::array<_Tp, _Nm>>
 : public integral_constant { };
@@ -314,6 +315,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class tuple_element;
 
+  /// Partial specialization for std::array
   template
 struct tuple_element<_Int, _GLIBCXX_STD_C::array<_Tp, _Nm>>
 {
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 95c197d..ef8aa5a 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -693,18 +693,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct tuple_size;
 
-  template>
-using __cv_tuple_size = integral_constant<
- typename remove_cv::type, _Ts::value>;
-
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2313. tuple_size should always derive from integral_constant
   template
-struct tuple_size : __cv_tuple_size<_Tp> { };
+struct tuple_size
+: integral_constant::value> { };
 
   template
-struct tuple_size : __cv_tuple_size<_Tp> { };
+struct tuple_size
+: integral_constant::value> { };
 
   template
-struct tuple_size : __cv_tuple_size<_Tp> { };
+struct tuple_size
+: integral_constant::value> { };
 
   /// class tuple_size
   template
diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 4da9209..6d12839e 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -84,14 +84,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class tuple_element;
 
// Various functions which give std::pair a tuple-like interface.
+
+  /// Partial specialization for std::pair
   template
 struct tuple_size>
 : public integral_constant { };
 
+  /// Partial specialization for std::pair
   template
 struct tuple_element<0, std::pair<_Tp1, _Tp2>>
 { typedef _Tp1 type; };
  
+  /// Partial specialization for std::pair
   template
 struct tuple_element<1, std::pair<_Tp1, _Tp2>>
 { typedef _Tp2 type; };
diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc
index f80798c..4b1e5ae 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc
@@ -23,4 +23,4 @@
 
 typedef std::tuple_element<1, std::array>::type type;
 
-// { dg-error "static assertion failed" "" { target *-*-* } 320 }
+// { dg-error "static assertion failed" "" { target *-*-* } 322 }
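
As a small illustration of the LWG 2313 behavior (not part of the patch),
cv-qualified specializations keep deriving from integral_constant:

  #include <cstddef>
  #include <tuple>
  #include <type_traits>

  // Illustrative only: tuple_size<cv T> derives from
  // integral_constant<std::size_t, tuple_size<T>::value>.
  static_assert(std::tuple_size<const std::tuple<int, long>>::value == 2, "");
  static_assert(std::is_base_of<std::integral_constant<std::size_t, 2>,
                                std::tuple_size<const std::tuple<int, long>>
                               >::value, "");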


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 12:33:57PM +0200, Jakub Jelinek wrote:
> On Thu, May 15, 2014 at 11:30:40AM +0100, Richard Sandiford wrote:
> > Jakub Jelinek  writes:
> > > This patch adds two new options (compatible with clang) which allow
> > > users to choose the behavior of undefined behavior sanitization.
> > >
> > > By default as before, all undefined behaviors (except for
> > > __builtin_unreachable and missing return in C++) continue after reporting
> > > which means that you can get lots of runtime errors from a single program
> > > run and the exit code will not reflect the failure in that case.
> > >
> > > With this patch, one can use -fsanitize=undefined -fno-sanitize-recover,
> > > which will report just the first undefined behavior and then exit with
> > > non-zero code.
> > 
> > Would it make sense for this to be the default for bootstrap-ubsan,
> > so that the bootstrap fails on undefined behaviour?
> 
> Perhaps eventually, but is current bootstrap-ubsan really ubsan error free
> on at least the major targets?  I've made some efforts towards that goal on
> x86_64-linux, but haven't tried bootstrap-ubsan recently.

It's not; I'm seeing many
/home/marek/src/gcc/gcc/wide-int.h:1734:7: runtime error: shift
exponent 64 is too large for 64-bit type 'long unsigned int'
errors, plus I think I remember some other failures.
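
For reference, a minimal user-level reproducer for that class of report
(illustrative only, not the wide-int.h code itself; assumes a 64-bit
unsigned long) would be something like:

  /* Compile with -fsanitize=undefined (or -fsanitize=shift); calling
     f (1UL, 64) then reports "shift exponent 64 is too large for 64-bit
     type 'long unsigned int'" at run time.  */
  unsigned long
  f (unsigned long x, int s)
  {
    return x << s;   /* undefined when s >= the width of unsigned long */
  }

  int
  main (void)
  {
    return (int) f (1UL, 64);
  }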

Marek


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Ramana Radhakrishnan
On Thu, May 15, 2014 at 11:33 AM, Jakub Jelinek  wrote:
> On Thu, May 15, 2014 at 11:30:40AM +0100, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > This patch adds two new options (compatible with clang) which allow
>> > users to choose the behavior of undefined behavior sanitization.
>> >
>> > By default as before, all undefined behaviors (except for
>> > __builtin_unreachable and missing return in C++) continue after reporting
>> > which means that you can get lots of runtime errors from a single program
>> > run and the exit code will not reflect the failure in that case.
>> >
>> > With this patch, one can use -fsanitize=undefined -fno-sanitize-recover,
>> > which will report just the first undefined behavior and then exit with
>> > non-zero code.
>>
>> Would it make sense for this to be the default for bootstrap-ubsan,
>> so that the bootstrap fails on undefined behaviour?
>
> Perhaps eventually, but is current bootstrap-ubsan really ubsan error free
> on at least the major targets?  I've made some efforts towards that goal on
> x86_64-linux, but haven't tried bootstrap-ubsan recently.

What's the overhead with bootstrap-ubsan?

Ramana

>
> Jakub


Re: [RFC] Using function clones for Pointer Bounds Checker

2014-05-15 Thread Ilya Enkovich
2014-05-14 19:09 GMT+04:00 H.J. Lu :
> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich  wrote:
>> 2014-05-13 23:21 GMT+04:00 Jeff Law :
>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>
>> propagate constant bounds value and remove checks in called function).
>
>
> So from a linking standpoint, presumably you have to mangle the
> instrumented
> caller/callee in some manner.  Right?  Or are you dynamically dispatching
> somehow?


 Originally the idea was to have the instrumented clone use the same
 assembler name as the original function. Since instrumented code is
 fully compatible with non-instrumented code, we always emit only one
 version. Usage of the same assembler name allows instrumented and
 non-instrumented calls to look similar in assembler. It worked fine until
 I tried it with LTO, where the assembler name is used as a unique
 identifier. With linker resolution files it became even harder
 to use such an approach. To resolve these issues I started to use a new
 assembler name with a postfix, but linked it with the original name using
 IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
 clones and originals during compilation, but both the clone and the original
 function have a similar name in the output assembler.
>>>
>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>> information does not change the signature of the callee.   This is obviously
>>> important for C++.
>>>
>>> Sounds like I need to sit down with the branch and see how this works in the
>>> new scheme.
>>
>> Both mpx branch and Wiki
>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>> page are up-to-date now and may be tried out either in NOP mode or
>> with simulator. Let me know if you have any troubles with using it.
>>
>
> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
>> an MPX-enabled executable which runs on both MPX-enabled and
>> non-MPX-enabled hardware. I didn't see any MPX run-time library.

Just checked out the branch and checked generated code.

#cat test.c
int
test (int *p, int i)
{
  return p[i];
}
#gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
#cat test.s
.file   "test.c"
.section.text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl  test
.type   test, @function
test:
.LFB1:
.cfi_startproc
movslq  %esi, %rsi
leaq(%rdi,%rsi,4), %rax
bndcl   (%rax), %bnd0
bndcu   3(%rax), %bnd0
movl(%rax), %eax
bnd ret
.cfi_endproc
...

Checks are here. What do you see in your test?

Ilya

>
> --
> H.J.


[PATCH][ARM][committed] Use enum name for PARAM_SCHED_PRESSURE_ALGORITHM

2014-05-15 Thread Kyrill Tkachov

Hi all,

The scheduler algorithms are defined in sched-int.h as:
enum sched_pressure_algorithm
{
  SCHED_PRESSURE_NONE,
  SCHED_PRESSURE_WEIGHTED,
  SCHED_PRESSURE_MODEL
};
We should use the enum name SCHED_PRESSURE_MODEL rather than the direct integer 
value (2).


This doesn't make any functional difference, just makes it a bit more readable 
(and avoids any unexpected effects in the future if enum 
sched_pressure_algorithm is extended).


Committed as obvious with r210471.

Cheers,
Kyrill

2014-05-15  Kyrylo Tkachov  

* config/arm/arm.c (arm_option_override): Use the SCHED_PRESSURE_MODEL
enum name for PARAM_SCHED_PRESSURE_ALGORITHM.



Re: [PATCH][ARM][committed] Use enum name for PARAM_SCHED_PRESSURE_ALGORITHM

2014-05-15 Thread Kyrill Tkachov

Now with patch attached.

Also, I forgot to mention: tested arm-none-eabi.

Kyrill

On 15/05/14 12:25, Kyrill Tkachov wrote:

Hi all,

The scheduler algorithms are defined in sched-int.h as:
enum sched_pressure_algorithm
{
SCHED_PRESSURE_NONE,
SCHED_PRESSURE_WEIGHTED,
SCHED_PRESSURE_MODEL
};
We should use the enum name SCHED_PRESSURE_MODEL rather than the direct integer
value (2).

This doesn't make any functional difference, just makes it a bit more readable
(and avoids any unexpected effects in the future if enum
sched_pressure_algorithm is extended).

Committed as obvious with r210471.

Cheers,
Kyrill

2014-05-15  Kyrylo Tkachov  

  * config/arm/arm.c (arm_option_override): Use the SCHED_PRESSURE_MODEL
  enum name for PARAM_SCHED_PRESSURE_ALGORITHM.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 28b6b29..14e86c3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -50,6 +50,7 @@
 #include "except.h"
 #include "tm_p.h"
 #include "target.h"
+#include "sched-int.h"
 #include "target-def.h"
 #include "debug.h"
 #include "langhooks.h"
@@ -2944,7 +2945,7 @@ arm_option_override (void)
  prefer_neon_for_64bits = true;
 
   /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
+  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
  global_options.x_param_values,
  global_options_set.x_param_values);
 

Re: [RFC] Using function clones for Pointer Bounds Checker

2014-05-15 Thread Richard Biener
On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich  wrote:
> 2014-05-14 19:09 GMT+04:00 H.J. Lu :
>> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich  
>> wrote:
>>> 2014-05-13 23:21 GMT+04:00 Jeff Law :
 On 05/13/14 02:38, Ilya Enkovich wrote:
>>>
>>> propagate constant bounds value and remove checks in called function).
>>
>>
>> So from a linking standpoint, presumably you have to mangle the
>> instrumented
>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>> somehow?
>
>
> Originally the idea was to have the instrumented clone use the same
> assembler name as the original function. Since instrumented code is
> fully compatible with non-instrumented code, we always emit only one
> version. Using the same assembler name lets instrumented and
> non-instrumented calls look the same in the assembler output. It worked
> fine until I tried it with LTO, where the assembler name is used as a
> unique identifier. With linker resolution files it became even harder
> to use such an approach. To resolve these issues I started to use a new
> assembler name with a postfix, but linked to the original name using
> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
> clones and originals during compilation, but both the clone and the
> original function have the same name in the output assembler.

 OK.  So if I read that correctly, it implies that the existence of bounds
 information does not change the signature of the callee.   This is 
 obviously
 important for C++.

 Sounds like I need to sit down with the branch and see how this works in 
 the
 new scheme.
>>>
>>> Both mpx branch and Wiki
>>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>>> page are up-to-date now and may be tried out either in NOP mode or
>>> with simulator. Let me know if you have any troubles with using it.
>>>
>>
>> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
>> MPX enabled executable which runs on both MPX-enabled and
>> non MPX-enabled hardwares. I didn't see any MPX run-time library.
>
> Just checked out the branch and checked generated code.
>
> #cat test.c
> int
> test (int *p, int i)
> {
>   return p[i];
> }
> #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
> #cat test.s
> .file   "test.c"
> .section    .text.unlikely,"ax",@progbits
> .LCOLDB0:
> .text
> .LHOTB0:
> .p2align 4,,15
> .globl  test
> .type   test, @function
> test:
> .LFB1:
> .cfi_startproc
> movslq  %esi, %rsi
> leaq    (%rdi,%rsi,4), %rax
> bndcl   (%rax), %bnd0
> bndcu   3(%rax), %bnd0
> movl    (%rax), %eax
> bnd ret
> .cfi_endproc
> ...
>
> Checks are here. What do you see in your test?

Wow, that's quite an overhead compared to the non-instrumented variant

movslq  %esi, %rsi
movl(%rdi,%rsi,4), %eax
ret

I thought bounds-checking was done with some clever prefixes such
that

movslq  %esi, %rsi
bndmovl(%rdi,%rsi,4), %eax, %bnd0
bnd ret

would be possible (well, replace with valid ISA).

Richard.

> Ilya
>
>>
>> --
>> H.J.


Re: GCC 4.8.3 Status Report, branch frozen for release (2014-05-15)

2014-05-15 Thread David Edelsohn
On Thu, May 15, 2014 at 5:27 AM, Richard Biener  wrote:
>
> Status
> ==
>
> The 4.8 branch is now frozen as I am preparing a first release
> candidate for 4.8.3.  All patches to the branch now require
> explicit approval from release managers.

Please hold off on GCC 4.8.3.  powerpc-linux has a potentially serious
ABI incompatibility issue related to the HTM support backported to the
4.8 branch.

Thanks, David


Re: [RFC] Using function clones for Pointer Bounds Checker

2014-05-15 Thread Ilya Enkovich
2014-05-15 15:27 GMT+04:00 Richard Biener :
> On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich  wrote:
>> 2014-05-14 19:09 GMT+04:00 H.J. Lu :
>>> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich  
>>> wrote:
 2014-05-13 23:21 GMT+04:00 Jeff Law :
> On 05/13/14 02:38, Ilya Enkovich wrote:

 propagate constant bounds value and remove checks in called function).
>>>
>>>
>>> So from a linking standpoint, presumably you have to mangle the
>>> instrumented
>>> caller/callee in some manner.  Right?  Or are you dynamically 
>>> dispatching
>>> somehow?
>>
>>
>> Originally the idea was to have the instrumented clone use the same
>> assembler name as the original function. Since instrumented code is
>> fully compatible with non-instrumented code, we always emit only one
>> version. Using the same assembler name lets instrumented and
>> non-instrumented calls look the same in the assembler output. It worked
>> fine until I tried it with LTO, where the assembler name is used as a
>> unique identifier. With linker resolution files it became even harder
>> to use such an approach. To resolve these issues I started to use a new
>> assembler name with a postfix, but linked to the original name using
>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>> clones and originals during compilation, but both the clone and the
>> original function have the same name in the output assembler.
>
> OK.  So if I read that correctly, it implies that the existence of bounds
> information does not change the signature of the callee.   This is 
> obviously
> important for C++.
>
> Sounds like I need to sit down with the branch and see how this works in 
> the
> new scheme.

 Both mpx branch and Wiki
 (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
 page are up-to-date now and may be tried out either in NOP mode or
 with simulator. Let me know if you have any troubles with using it.

>>>
>>> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
>>> MPX enabled executable which runs on both MPX-enabled and
>>> non MPX-enabled hardwares. I didn't see any MPX run-time library.
>>
>> Just checked out the branch and checked generated code.
>>
>> #cat test.c
>> int
>> test (int *p, int i)
>> {
>>   return p[i];
>> }
>> #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
>> #cat test.s
>> .file   "test.c"
>> .section    .text.unlikely,"ax",@progbits
>> .LCOLDB0:
>> .text
>> .LHOTB0:
>> .p2align 4,,15
>> .globl  test
>> .type   test, @function
>> test:
>> .LFB1:
>> .cfi_startproc
>> movslq  %esi, %rsi
>> leaq    (%rdi,%rsi,4), %rax
>> bndcl   (%rax), %bnd0
>> bndcu   3(%rax), %bnd0
>> movl    (%rax), %eax
>> bnd ret
>> .cfi_endproc
>> ...
>>
>> Checks are here. What do you see in your test?
>
> Wow, that's quite an overhead compared to the non-instrumented variant
>
> movslq  %esi, %rsi
> movl(%rdi,%rsi,4), %eax
> ret
>

The overhead is actually two instructions - the checks for the lower and
upper bounds. The lea instruction is probably a missed optimization. The
checks are cheap instructions and do not introduce new dependencies for
the load, which is the heaviest part here. BTW, the checks are not the
main source of overhead in instrumented code; bounds table management
(storing/loading bounds for stored/loaded pointers) is. Anyway, it is too
early to speak about overhead until we have hardware to measure it on.
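To make the bounds-table point concrete, here is a minimal sketch (not code
from the branch; the exact instrumentation it emits may differ) of the kind
of source where that management cost shows up:

/* Sketch only: every pointer stored to memory also needs its bounds stored
   in the bounds tables, and every pointer loaded back needs a bounds load,
   on top of the cheap bndcl/bndcu checks shown in the dump above.  */
struct node { struct node *next; int val; };

void
push (struct node **head, struct node *n)
{
  n->next = *head;   /* pointer load: also a bounds-table load (bndldx)  */
  *head = n;         /* pointer store: also a bounds-table store (bndstx)  */
}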

> I thought bounds-checking was done with some clever prefixes such
> that
>
> movslq  %esi, %rsi
> bndmovl(%rdi,%rsi,4), %eax, %bnd0
> bnd ret
>
> would be possible (well, replace with valid ISA).

I doubt it would be possible to encode that while staying backward
compatible with existing hardware. Also, putting all the logic into one
instruction does not mean it is executed faster than a sequence of
instructions, especially on out-of-order CPUs.

Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> --
>>> H.J.


[patch] libstdc++/60326 make_signed/make_unsigned for wide char types

2014-05-15 Thread Jonathan Wakely

We are missing specializations for wchar_t, char16_t and char32_t.

The preprocessor condition I've added at the top of <type_traits> is
to avoid needing to #include <cstdint>, which drops names into the
global namespace. GCC defines __UINT_LEAST16_TYPE__ and
__UINT_LEAST32_TYPE__ so we can use them, but other compilers may not
(Clang doesn't), so in that case just #include <cstdint> and accept that
<cstdint> might put its contents in the global namespace.

Tested x86_64-linux, committed to trunk. I'd like to put this on the
branch for 4.9.1 too, but will wait a while.
commit e1804a737ee8f12f2e1e4fb93693eb96bff6e8af
Author: Jonathan Wakely 
Date:   Tue May 13 16:04:37 2014 +0100

	PR libstdc++/60326
	* include/std/type_traits (__make_unsigned, __make_signed): Define
	specializations for wchar_t, char16_t and char32_t.
	* testsuite/20_util/make_signed/requirements/typedefs-4.cc: New.
	* testsuite/20_util/make_unsigned/requirements/typedefs-1.cc: Correct
	test for make_unsigned.
	* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc:
	Likewise.
	* testsuite/20_util/declval/requirements/1_neg.cc: Adjust dg-error
	line number.
	* testsuite/20_util/make_signed/requirements/typedefs_neg.cc:
	Likewise.
	* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
	Likewise.

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 4b434a6..0eacd36 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -41,6 +41,15 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+# if defined (__UINT_LEAST16_TYPE__) && defined(__UINT_LEAST32_TYPE__)
+  typedef __UINT_LEAST16_TYPE__ uint_least16_t;
+  typedef __UINT_LEAST32_TYPE__ uint_least32_t;
+# else
+#  include <cstdint>
+# endif
+#endif
+
   /**
* @defgroup metaprogramming Metaprogramming
* @ingroup utilities
@@ -1583,6 +1592,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __make_unsigned
 { typedef unsigned long long __type; };
 
+#if defined(_GLIBCXX_USE_WCHAR_T) && !defined(__WCHAR_UNSIGNED__)
+  template<>
+    struct __make_unsigned<wchar_t> : __make_unsigned<__WCHAR_TYPE__>
+{ };
+#endif
+
 #if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_INT128)
   template<>
 struct __make_unsigned<__int128>
@@ -1665,6 +1680,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __make_signed
 { typedef signed long long __type; };
 
+#if defined(_GLIBCXX_USE_WCHAR_T) && defined(__WCHAR_UNSIGNED__)
+  template<>
+    struct __make_signed<wchar_t> : __make_signed<__WCHAR_TYPE__>
+{ };
+#endif
+
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+  template<>
+    struct __make_signed<char16_t> : __make_signed<uint_least16_t>
+{ };
+  template<>
+    struct __make_signed<char32_t> : __make_signed<uint_least32_t>
+{ };
+#endif
+
 #if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_INT128)
   template<>
 struct __make_signed
diff --git a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
index a744d83..6be3af0 100644
--- a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
@@ -19,7 +19,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "static assertion failed" "" { target *-*-* } 2003 }
+// { dg-error "static assertion failed" "" { target *-*-* } 2033 }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs-4.cc b/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs-4.cc
new file mode 100644
index 000..4950e54
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/make_signed/requirements/typedefs-4.cc
@@ -0,0 +1,33 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+// { dg-require-cstdint "" }
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include <type_traits>
+
+// libstdc++/60326
+
+using namespace std;
+#ifdef _GLIBCXX_USE_WCHAR_T
+using wchar_signed = make_signed<wchar_t>::type;
+using wchar_unsigned = make_unsigned<wchar_t>::type;
+static_assert( !is_same<wchar_signed, wchar_unsigned>::value, "wchar_t" );
+#endif
+static_assert( is_signed<make_signed<char16_t>::type>::value, "char16_t");
+static_assert( is_signed<make_signed<char32_t>::type>::value, "char32_t");
diff --git a/libst

[PATCH][match-and-simplify] match-and-simplify from fold_stmt

2014-05-15 Thread Richard Biener

This adds a dispatch from fold_stmt (!inplace for now) to
match-and-simplify.  Mainly to get more testing coverage.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-05-15  Richard Biener  

* gimple-fold.c (fold_stmt_1): Dispatch to
gimple_match_and_simplify.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 210414)
+++ gcc/gimple-fold.c   (working copy)
@@ -1393,6 +1393,13 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
}
 }
 
+  /* Dispatch to pattern-based folding.
+ ???  Do this after the previous stuff as fold_stmt is used to make
+ stmts valid gimple again via maybe_fold_reference of ops.  */
+  if (!inplace
+  && gimple_match_and_simplify (gsi, NULL))
+changed = true;
+
   return changed;
 }
 


Re: [PATCH] Fix omp declare simd cloning (PR tree-optimization/60823)

2014-05-15 Thread Rainer Orth
Jakub Jelinek  writes:

> On Tue, May 13, 2014 at 07:38:52PM +0200, Rainer Orth wrote:
>> Rainer Orth  writes:
>> 
>> > Jakub Jelinek  writes:
>> >
>> >> 2014-04-18  Jakub Jelinek  
>> >>
>> >>   PR tree-optimization/60823
>> >>   * omp-low.c (ipa_simd_modify_function_body): Go through
>> >>   all SSA_NAMEs and for those refering to vector arguments
>> >>   which are going to be replaced adjust SSA_NAME_VAR and,
>> >>   if it is a default definition, change it into a non-default
>> >>   definition assigned at the beginning of function from new_decl.
>> >>   (ipa_simd_modify_stmt_ops): Rewritten.
>> >>   * tree-dfa.c (set_ssa_default_def): When removing default def,
>> >>   check for NULL loc instead of NULL *loc.
>> >>
>> >>   * c-c++-common/gomp/pr60823-1.c: New test.
>> >>   * c-c++-common/gomp/pr60823-2.c: New test.
>> >>   * c-c++-common/gomp/pr60823-3.c: New test.
>> >
>> > The second test FAILs on Solaris/x86 with Sun as:
>> >
>> > ld.so.1: pr60823-2.exe: fatal: pr60823-2.exe: hardware capability
>> > (CA_SUNW_HW_1) unsupported: 0x2000 [ AVX ]
>> > FAIL: c-c++-common/gomp/pr60823-2.c execution test
>> >
>> > If this is expected, I can extend the code in gcc.target/i386/i386.exp
>> > (or rather move it to a new lib/clearcap.exp) to handle that via linker
>> > maps.
>> 
>> Something else seems to be amiss here: the new
>> libgomp.fortran/declare-simd-[12].f90 tests fail in just the same way,
>> although avx_runtime is false and they are only compiled with -msse2.
>
> If OpenMP declare simd doesn't work on Solaris/x86 (due to the bogus hw cap
> stuff), then supposedly vect_simd_clones effective target should fail there.

I don't think it's bogus: it guards against a similar kind of problems
as symbol versioning.  There, programs that depend on a missing
interface don't start to run instead of crashing in the middle of
execution when a function is missing.  With hwcap, programs depending on
insns not supported by the host don't even start running instead of
crashing later on.

Only if you start playing games with runtime selection of code based on
the host you're running on does this get you into trouble.  If this is done
explicitly and you know what you are doing, you can still assemble with
-nH to suppress hwcaps, but if it happens implicitly, this is not an
option, at least not a user-friendly one.

> OpenMP declare simd results in cloning of the functions for SSE2, AVX and
> AVX2.

Ok, that's what I was missing.  ISTM that on Solaris with as/ld,
-fopenmp should trigger linking with a mapfile like the one currently
used in some places in the testsuite.  I envision turning this into a
general facility (-mclear-hwcap; -m is for target-specific options,
right?) which is automatically turned on by -fopenmp on Solaris.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: C++ PATCHes to improve overload resolution diagnostics

2014-05-15 Thread Jason Merrill

On 05/15/2014 06:34 AM, Manuel López-Ibáñez wrote:

This looks great. One minor nit: In this hunk, what is input_location
pointing at and why is that better than loc?


Oops, cut/paste error, thanks.


Also, are there other qualifiers of 'this' besides 'const'?


An object can also be 'volatile'.
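For illustration, a tiny example (not from the patch) with both qualifiers on
the object; both calls should be rejected, and the new notes should explain
why the candidate is not viable:

struct S
{
  void f ();   /* implicit 'this' is plain S*  */
};

void
use (const S &cs, volatile S &vs)
{
  cs.f ();     /* not viable: would discard 'const' from the object  */
  vs.f ();     /* not viable: would discard 'volatile' from the object  */
}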

Jason



Re: [PATCH] Fix omp declare simd cloning (PR tree-optimization/60823)

2014-05-15 Thread Jakub Jelinek
On Thu, May 15, 2014 at 02:37:33PM +0200, Rainer Orth wrote:
> > If OpenMP declare simd doesn't work on Solaris/x86 (due to the bogus hw cap
> > stuff), then supposedly vect_simd_clones effective target should fail there.
> 
> I don't think it's bogus: it guards against a similar kind of problems
> as symbol versioning.  There, programs that depend on a missing
> interface don't start to run instead of crashing in the middle of
> execution when a function is missing.  With hwcap, programs depending on
> insns not supported by the host don't even start running instead of
> crashing later on.

Runtime selection of code is very common though, which is why I think that
whole idea of hw cap flags checking is bogus.

> > OpenMP declare simd results in cloning of the functions for SSE2, AVX and
> > AVX2.
> 
> Ok, that's what I was missing.  ISTM that on Solaris with as/ld,
> -fopenmp should trigger linking with a mapfile like the one currently
> used in some places in the testsuite.  I envision turning this into a
> general facility (-mclear-hwcap; -m is for target-specific options,
> right?) which is automatically turned on by -fopenmp on Solaris.

Perhaps.

Jakub


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Richard Sandiford
Marek Polacek  writes:
> On Thu, May 15, 2014 at 12:33:57PM +0200, Jakub Jelinek wrote:
>> On Thu, May 15, 2014 at 11:30:40AM +0100, Richard Sandiford wrote:
>> > Jakub Jelinek  writes:
>> > > This patch adds two new options (compatible with clang) which allow
>> > > users to choose the behavior of undefined behavior sanitization.
>> > >
>> > > By default as before, all undefined behaviors (except for
>> > > __builtin_unreachable and missing return in C++) continue after reporting
>> > > which means that you can get lots of runtime errors from a single program
>> > > run and the exit code will not reflect the failure in that case.
>> > >
>> > > With this patch, one can use -fsanitize=undefined -fno-sanitize-recover,
>> > > which will report just the first undefined behavior and then exit with
>> > > non-zero code.
>> > 
>> > Would it make sense for this to be the default for bootstrap-ubsan,
>> > so that the bootstrap fails on undefined behaviour?
>> 
>> Perhaps eventually, but is current bootstrap-ubsan really ubsan error free
>> on at least the major targets?  I've made some efforts towards that goal on
>> x86_64-linux, but haven't tried bootstrap-ubsan recently.
>
> It's not, I'm seeing many 
> /home/marek/src/gcc/gcc/wide-int.h:1734:7: runtime error: shift
> exponent 64 is too large for 64-bit type 'long unsigned int'
> plus I think I remember some other fails.

Yeah, like Richard said on IRC a few days ago, this is partly due to the
zero-precision stuff.  We need to wean ubsan off void_zero_node and then
see where things stand.

Thanks,
Richard



Re: [PATCH] Fix omp declare simd cloning (PR tree-optimization/60823)

2014-05-15 Thread Rainer Orth
Jakub Jelinek  writes:

> On Thu, May 15, 2014 at 02:37:33PM +0200, Rainer Orth wrote:
>> > If OpenMP declare simd doesn't work on Solaris/x86 (due to the bogus hw cap
>> > stuff), then supposedly vect_simd_clones effective target should fail 
>> > there.
>> 
>> I don't think it's bogus: it guards against a similar kind of problems
>> as symbol versioning.  There, programs that depend on a missing
>> interface don't start to run instead of crashing in the middle of
>> execution when a function is missing.  With hwcap, programs depending on
>> insns not supported by the host don't even start running instead of
>> crashing later on.
>
> Runtime selection of code is very common though, which is why I think that
> whole idea of hw cap flags checking is bogus.

That may be changing, but what I observe far more often is code compiled
on one system with -mcpu/-march=native; later you try to execute it on a
different system and it crashes, giving no real indication why.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 01:42:20PM +0100, Richard Sandiford wrote:
> > It's not, I'm seeing many 
> > /home/marek/src/gcc/gcc/wide-int.h:1734:7: runtime error: shift
> > exponent 64 is too large for 64-bit type 'long unsigned int'
> > plus I think I remember some other fails.
> 
> Yeah, like Richard said on IRC a few days ago, this is partly due to the
> > zero-precision stuff.  We need to wean ubsan off void_zero_node and then
> see where things stand.

Yeah, I don't like void_zero_node that much; I'll see if I can stamp it
out.  But note that I see many uses of void_zero_node in the C++ FE.
(ubsan uses void_zero_node only in the c-family/ subdirectory.)

Marek


[PATCH] Make SCCVN constant-fold calls

2014-05-15 Thread Richard Biener

For some odd reason I didn't implement this earlier.  This is one
major source of 2nd-stage opportunities that come up when running
two adjacent FRE passes.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2014-05-15  Richard Biener  

* tree-ssa-sccvn.c (visit_use): Also constant-fold calls.

* gcc.dg/tree-ssa/ssa-fre-41.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 210418)
--- gcc/tree-ssa-sccvn.c(working copy)
*** visit_use (tree use)
*** 3566,3593 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
- 
- /* ???  We could try to simplify calls.  */
- 
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! if (stmt_has_constants (stmt))
!   VN_INFO (lhs)->has_constants = true;
! else
{
! /* We reset expr and constantness here because we may
!have been value numbering optimistically, and
!iterating.  They may become non-constant in this case,
!even if they were optimistically constant.  */
! VN_INFO (lhs)->has_constants = false;
! VN_INFO (lhs)->expr = NULL_TREE;
}
! 
! if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
! changed = defs_to_varying (stmt);
  goto done;
}
}
  
  if (!gimple_call_internal_p (stmt)
--- 3566,3629 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! /* Try constant folding based on our current lattice.  */
! tree simplified = gimple_fold_stmt_to_constant_1 (stmt,
!   vn_valueize);
! if (simplified)
{
! if (dump_file && (dump_flags & TDF_DETAILS))
!   {
! fprintf (dump_file, "call ");
! print_gimple_expr (dump_file, stmt, 0, 0);
! fprintf (dump_file, " simplified to ");
! print_generic_expr (dump_file, simplified, 0);
! if (TREE_CODE (lhs) == SSA_NAME)
!   fprintf (dump_file, " has constants %d\n",
!expr_has_constants (simplified));
! else
!   fprintf (dump_file, "\n");
!   }
}
! /* Setting value numbers to constants will occasionally
!screw up phi congruence because constants are not
!uniquely associated with a single ssa name that can be
!looked up.  */
! if (simplified
! && is_gimple_min_invariant (simplified))
{
! VN_INFO (lhs)->expr = simplified;
! VN_INFO (lhs)->has_constants = true;
! changed = set_ssa_val_to (lhs, simplified);
  goto done;
}
+ else if (simplified
+  && TREE_CODE (simplified) == SSA_NAME)
+   {
+ changed = visit_copy (lhs, simplified);
+ goto done;
+   }
+ else
+   {
+ if (stmt_has_constants (stmt))
+   VN_INFO (lhs)->has_constants = true;
+ else
+   {
+ /* We reset expr and constantness here because we may
+have been value numbering optimistically, and
+iterating.  They may become non-constant in this case,
+even if they were optimistically constant.  */
+ VN_INFO (lhs)->has_constants = false;
+ VN_INFO (lhs)->expr = NULL_TREE;
+   }
+ 
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+   {
+ changed = defs_to_varying (stmt);
+ goto done;
+   }
+   }
}
  
  if (!gimple_call_internal_p (stmt)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-41.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-41.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-41.c  (working copy)
***
*** 0 
--- 1,12 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-fre1" } */
+ 
+ int x;
+ int foo (void)
+ {
+   x = 1;
+   return __builtin_ffs (x);
+ }
+ 
+ /* { dg-final { scan-tree-dump-not "ffs" "fre1" } } */
+ /* { dg-final { cleanup-tree-dump "fre1" } } */


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Richard Biener
On Thu, 15 May 2014, Marek Polacek wrote:

> On Thu, May 15, 2014 at 01:42:20PM +0100, Richard Sandiford wrote:
> > > It's not, I'm seeing many 
> > > /home/marek/src/gcc/gcc/wide-int.h:1734:7: runtime error: shift
> > > exponent 64 is too large for 64-bit type 'long unsigned int'
> > > plus I think I remember some other fails.
> > 
> > Yeah, like Richard said on IRC a few days ago, this is partly due to the
> > zero-precision stuff.  We need to wean ubsan off void_zero_node and then
> > see where things stand.
> 
> Yeah, I don't like void_zero_node that much; I'll see if I can stamp it
> out.  But note that I see many uses of void_zero_node in the C++ FE.
> (ubsan uses void_zero_node only in the c-family/ subdirectory.)

They shouldn't survive gimplification though.  I suggest adding
a check to verify_expr to catch them and ICE if they appear in
the IL.

Richard.


Re: [PATCH, libgfortran] PR 61187 Handle closed std{in,out,err}

2014-05-15 Thread Tobias Burnus
Janne Blomqvist wrote:
> libgfortran was happily assuming that STD{IN,OUT,ERR}_FILENO were open
> and no error checking was performed on the fstat() call

> Tested on x86_64-unknown-linux-gnu, Ok for trunk/4.9/4.8/4.7?

Looks good to me. Thanks!

Note that 4.8 has been frozen, cf.
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01168.html

But it might be held off:
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01181.html

Thus, please check with the release managers/Richard whether you
may commit the patch, e.g. by pinging Richard on the #gcc IRC channel.

Tobias


> 2014-05-15  Janne Blomqvist  
>
> PR libfortran/61187
> * io/unix.c (raw_close): Check if s->fd is -1.
> (fd_to_stream): Check return value of fstat(), handle error.
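As a small stand-alone illustration of the assumption Janne describes above
(this is not the libgfortran patch itself, just a POSIX-only reproducer of
the situation it guards against):

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int
main (void)
{
  struct stat st;
  close (STDOUT_FILENO);                 /* simulate a closed stdout */
  if (fstat (STDOUT_FILENO, &st) == -1)  /* must not be assumed to succeed */
    perror ("fstat");                    /* stderr is still open here */
  return 0;
}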



Re: SYMBOL_REF_FLAGS

2014-05-15 Thread David Edelsohn
On Thu, May 15, 2014 at 2:37 AM, Richard Sandiford
 wrote:

> get_pool_constant can return general constants, not just SYMBOL_REFs,
> so the code does look wrong.  In the particular case of CONST flagged up
> in the message, SYMBOL_REF_TLS_MODEL would read beyond the end of the rtx.
> I think you would have seen the same failure before the patch with
> --enable-checking=yes,rtl.
>
> The reason my patch broke the bootstrap is that SYMBOL_REF_FLAGS now
> checks based on ENABLE_RTL_FLAG_CHECKING rather than ENABLE_RTL_CHECKING.
> I'd done that because u2 is part of the header and so checking it felt
> more like checking a flag than the format.  Perhaps that was a bad
> idea though, since it will slow down --enable-checking=yes (but not
> --enable-checking=release) compilers.  OTOH it will catch things like
> this.  I can switch it back if that seems better.
>
> Does the patch below fix things?
>
> Thanks,
> Richard
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_real_tls_symbol_ref_p): New function.
> (rs6000_delegitimize_address): Use it.

The patch allows me to get past the failure.  I wasn't certain if the
check was too strict or not. I'm bootstrapping and running the
testsuite now.

Thanks, David


[C++ Patch/RFC] PR 58753 & 58930

2014-05-15 Thread Paolo Carlini

Hi,

I'm trying to make progress on these two issues, which seem closely
related to me. Both only happen for NSDMIs in template classes, i.e. when
cp_parser_late_parse_one_default_arg sees processing_template_decl != 0,
doesn't call digest_init_flags directly, and thus produces a CONSTRUCTOR
(vs, e.g., a TARGET_EXPR). Then I think the problem - thus the incorrect
diagnostic and the rejection - must be in perform_member_init, which
doesn't handle such a CONSTRUCTOR with DIRECT_LIST_INIT_P true early on,
and thus doesn't call digest_init. The patch below passes testing and
fixes all the snippets we have got for those two PRs.


Thanks!
Paolo.


Index: cp/init.c
===
--- cp/init.c   (revision 210459)
+++ cp/init.c   (working copy)
@@ -644,7 +644,9 @@ perform_member_init (tree member, tree init)
|| (TREE_CODE (init) == TREE_LIST
&& DIRECT_LIST_INIT_P (TREE_VALUE (init
   && (CP_AGGREGATE_TYPE_P (type)
-  || is_std_init_list (type)
+  || is_std_init_list (type)))
+  /* This can happen for NSDMI in template class (c++/58930).  */ 
+  || DIRECT_LIST_INIT_P (init)))
 {
   /* With references and list-initialization, we need to deal with
 extending temporary lifetimes.  12.2p5: "A temporary bound to a
Index: testsuite/g++.dg/cpp0x/nsdmi-template11.C
===
--- testsuite/g++.dg/cpp0x/nsdmi-template11.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/nsdmi-template11.C   (working copy)
@@ -0,0 +1,15 @@
+// PR c++/58930
+// { dg-do compile { target c++11 } }
+
+struct SampleModule
+{
+  explicit SampleModule (int);
+};
+
+template < typename >
+struct BaseHandler
+{
+  SampleModule module_ { 0 };
+};
+
+BaseHandler a;
Index: testsuite/g++.dg/cpp0x/nsdmi-template12.C
===
--- testsuite/g++.dg/cpp0x/nsdmi-template12.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/nsdmi-template12.C   (working copy)
@@ -0,0 +1,17 @@
+// PR c++/58753
+// { dg-do compile { target c++11 } }
+
+#include <initializer_list>
+
+template 
+struct X {X(std::initializer_list) {}};
+
+template  
+class T {
+  X x{1}; 
+}; 
+
+int main()
+{
+  T t;
+}


Re: [patch] Minor improvement to fold_unary_loc

2014-05-15 Thread Eric Botcazou
> Seems reasonable.  Do you happen to have a testcase where you can see
> the effects in one of the dumps?

That's not easy because NON_LVALUE_EXPRs are present only in .original and, at 
least in Ada, essentially only in size expressions which are not visible in 
the dump.

-- 
Eric Botcazou


[PATCH, PR 61090] Pass gsi to build_ref_for_model in sra_modify_expr

2014-05-15 Thread Martin Jambor
Hi,

my patch switching to more correct alias introduced an error into the
call of build_ref_for_model also in sra_modify_expr (when avoiding
VIEW_CONVERT_EXPR).  We are no longer using access->base as the base
for the memory expression being built, but rather the original
expression from the source.  This means we may encounter an ARRAY_REF
and need an iterator for producing potential extra statements.

Fixed simply by the following patch which provides the iterator we
have at hand.  Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin


2014-05-14  Martin Jambor  

PR tree-optimization/61090
* tree-sra.c (sra_modify_expr): Pass the current gsi to
build_ref_for_model.

testsuite/
* gcc.dg/tree-ssa/pr61090.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c
new file mode 100644
index 000..fff2895
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+struct i {
+   int c;
+};
+
+static int
+p(struct i a)
+{
+  return 0;
+}
+
+void
+h(void)
+{
+  struct i z[] = {{ 0 }};
+  int e[] = {};
+  int x;
+  e[0] = p(z[x]) + z[x].c;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 72c485b..ef6c966 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2812,7 +2812,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
bool write)
{
  tree ref;
 
- ref = build_ref_for_model (loc, orig_expr, 0, access, NULL, false);
+ ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
 
  if (write)
{


Re: [PATCH, PR 61090] Pass gsi to build_ref_for_model in sra_modify_expr

2014-05-15 Thread Richard Biener
On Thu, 15 May 2014, Martin Jambor wrote:

> Hi,
> 
> my patch switching to more correct alias introduced an error into the
> call of build_ref_for_model also in sra_modify_expr (when avoiding
> VIEW_CONVERT_EXPR).  We are no longer using access->base as the base
> for the memory expression being built, but rather the original
> expression from the source.  This means we may encounter an ARRAY_REF
> and need an iterator for producing potential extra statements.
> 
> Fixed simply by the following patch which provides the iterator we
> have at hand.  Bootstrapped and tested on x86_64-linux.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2014-05-14  Martin Jambor  
> 
>   PR tree-optimization/61090
>   * tree-sra.c (sra_modify_expr): Pass the current gsi to
>   build_ref_for_model.
> 
>   testsuite/
>   * gcc.dg/tree-ssa/pr61090.c: New test.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c
> new file mode 100644
> index 000..fff2895
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61090.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O" } */
> +
> +struct i {
> +   int c;
> +};
> +
> +static int
> +p(struct i a)
> +{
> +  return 0;
> +}
> +
> +void
> +h(void)
> +{
> +  struct i z[] = {{ 0 }};
> +  int e[] = {};
> +  int x;
> +  e[0] = p(z[x]) + z[x].c;
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 72c485b..ef6c966 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -2812,7 +2812,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>   {
> tree ref;
>  
> -   ref = build_ref_for_model (loc, orig_expr, 0, access, NULL, false);
> +   ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
>  
> if (write)
>   {
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer


[PATCH, PR 61085] Add missing type_preserved check

2014-05-15 Thread Martin Jambor
Hi,

PR 61085 revealed that I forgot to add a type_preserved check in an
important spot, namely update_indirect_edges_after_inlining, which
leads to wrong devirtualization because the function does not ignore
jump functions it should ignore.

Fixed thusly, bootstrapped and tested on x86_64-linux on both trunk
and the 4.9 branch.  OK for both?

Thanks,

Martin


2014-05-15  Martin Jambor  

PR ipa/61085
* ipa-prop.c (update_indirect_edges_after_inlining): Check
type_preserved flag when the indirect edge is polymorphic.

testsuite/
* g++.dg/ipa/pr61085.C: New test.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index da6ffe8..4f983a6 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -2877,16 +2877,20 @@ update_indirect_edges_after_inlining (struct 
cgraph_edge *cs,
   else if (jfunc->type == IPA_JF_PASS_THROUGH
   && ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
{
- if (ici->agg_contents
- && !ipa_get_jf_pass_through_agg_preserved (jfunc))
+ if ((ici->agg_contents
+  && !ipa_get_jf_pass_through_agg_preserved (jfunc))
+ || (ici->polymorphic
+ && !ipa_get_jf_pass_through_type_preserved (jfunc)))
ici->param_index = -1;
  else
ici->param_index = ipa_get_jf_pass_through_formal_id (jfunc);
}
   else if (jfunc->type == IPA_JF_ANCESTOR)
{
- if (ici->agg_contents
- && !ipa_get_jf_ancestor_agg_preserved (jfunc))
+ if ((ici->agg_contents
+  && !ipa_get_jf_ancestor_agg_preserved (jfunc))
+ || (ici->polymorphic
+ && !ipa_get_jf_ancestor_type_preserved (jfunc)))
ici->param_index = -1;
  else
{
diff --git a/gcc/testsuite/g++.dg/ipa/pr61085.C 
b/gcc/testsuite/g++.dg/ipa/pr61085.C
new file mode 100644
index 000..531f59d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr61085.C
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-early-inlining" } */
+
+struct A {};
+struct B : virtual A {
+  unsigned m_i;
+  B() : m_i () {}
+  virtual A *m_virt ()
+  {
+return 0;
+  }
+  ~B ()
+  {
+m_foo ();
+while (m_i)
+  ;
+  }
+  void m_foo ()
+  {
+m_virt ();
+  }
+};
+
+class C : B {
+  A *m_virt () {
+__builtin_abort ();
+  }
+};
+
+int main ()
+{
+  C c;
+}


Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-15 Thread Mike Stump
On May 15, 2014, at 12:55 AM, Eric Botcazou  wrote:
> No, that's too complicated and risky, just do the following:
> 
>  /* builtin_setjmp takes a pointer to 5 words or pointers.  */
>  if (POINTER_SIZE > BITS_PER_WORD)
>   tmp = size_int (4);
>  else
>   tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);
> 
> which is simple and safe.

But that fails whenever the size of the mode of the save area is bigger than a
certain amount…  On my port, the size taken up by the save area is large
enough to cause this to fail.  :-(  The correct bug fix would have my port not
fail.
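For reference, reading tmp as the last index of the buffer (which is what the
"5 words or pointers" comment suggests), Eric's expression gives:

  POINTER_SIZE == BITS_PER_WORD:             tmp = 5*W/W - 1 = 4, i.e. 5 pointer-sized slots = 5 words;
  POINTER_SIZE == 32, BITS_PER_WORD == 64:   tmp = 5*64/32 - 1 = 9, i.e. 10 slots = 5 words;
  POINTER_SIZE == 64 > BITS_PER_WORD == 32:  tmp = 4, i.e. 5 pointer-sized slots.

So it always reserves 5 words or 5 pointers, whichever is larger, and a port
whose save area needs more than that, as described above, still loses.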

Re: [PATCH, PR 61085] Add missing type_preserved check

2014-05-15 Thread Richard Biener
On Thu, May 15, 2014 at 4:45 PM, Martin Jambor  wrote:
> Hi,
>
> PR 61085 revealed that I forgot to put a type_preserved check to an
> important spot, namely to update_indirect_edges_after_inlining, which
> leads to wrong devirtualization because the function does not ignore
> jump functions it should.
>
> Fixed thusly, bootstrapped and tested on x86_64-linux on both trunk
> and the 4.9 branch.  OK for both?

Ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> 2014-05-15  Martin Jambor  
>
> PR ipa/61085
> * ipa-prop.c (update_indirect_edges_after_inlining): Check
> type_preserved flag when the indirect edge is polymorphic.
>
> testsuite/
> * g++.dg/ipa/pr61085.C: New test.
>
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index da6ffe8..4f983a6 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -2877,16 +2877,20 @@ update_indirect_edges_after_inlining (struct 
> cgraph_edge *cs,
>else if (jfunc->type == IPA_JF_PASS_THROUGH
>&& ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
> {
> - if (ici->agg_contents
> - && !ipa_get_jf_pass_through_agg_preserved (jfunc))
> + if ((ici->agg_contents
> +  && !ipa_get_jf_pass_through_agg_preserved (jfunc))
> + || (ici->polymorphic
> + && !ipa_get_jf_pass_through_type_preserved (jfunc)))
> ici->param_index = -1;
>   else
> ici->param_index = ipa_get_jf_pass_through_formal_id (jfunc);
> }
>else if (jfunc->type == IPA_JF_ANCESTOR)
> {
> - if (ici->agg_contents
> - && !ipa_get_jf_ancestor_agg_preserved (jfunc))
> + if ((ici->agg_contents
> +  && !ipa_get_jf_ancestor_agg_preserved (jfunc))
> + || (ici->polymorphic
> + && !ipa_get_jf_ancestor_type_preserved (jfunc)))
> ici->param_index = -1;
>   else
> {
> diff --git a/gcc/testsuite/g++.dg/ipa/pr61085.C 
> b/gcc/testsuite/g++.dg/ipa/pr61085.C
> new file mode 100644
> index 000..531f59d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr61085.C
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-early-inlining" } */
> +
> +struct A {};
> +struct B : virtual A {
> +  unsigned m_i;
> +  B() : m_i () {}
> +  virtual A *m_virt ()
> +  {
> +return 0;
> +  }
> +  ~B ()
> +  {
> +m_foo ();
> +while (m_i)
> +  ;
> +  }
> +  void m_foo ()
> +  {
> +m_virt ();
> +  }
> +};
> +
> +class C : B {
> +  A *m_virt () {
> +__builtin_abort ();
> +  }
> +};
> +
> +int main ()
> +{
> +  C c;
> +}


Re: [patch] Minor improvement to fold_unary_loc

2014-05-15 Thread Jeff Law

On 05/15/14 08:15, Eric Botcazou wrote:

Seems reasonable.  Do you happen to have a testcase where you can see
the effects in one of the dumps?


That's not easy because NON_LVALUE_EXPRs are present only in .original and, at
least in Ada, essentially only in size expressions which are not visible in
the dump.

OK.  Just thought I'd verify in case it'd been forgotten.

jeff


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 11:39:26AM +0100, Ramana Radhakrishnan wrote:
> What's the overhead with bootstrap-ubsan ?

I timed normal bootstrap and bootstrap-ubsan on x86_64, 24 cores,
Intel(R) Xeon(R) CPU X5670  @ 2.93GHz (aka cfarm 20) and the results:

--enable-languages=all

real35m10.419s
user204m5.613s
sys 6m15.615s

--enable-languages=all --with-build-config=bootstrap-ubsan

real71m39.338s
user347m53.409s
sys 7m44.281s

Marek


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Jakub Jelinek
On Thu, May 15, 2014 at 05:08:10PM +0200, Marek Polacek wrote:
> On Thu, May 15, 2014 at 11:39:26AM +0100, Ramana Radhakrishnan wrote:
> > What's the overhead with bootstrap-ubsan ?
> 
> I timed normal bootstrap and bootstrap-ubsan on x86_64, 24 cores,
> Intel(R) Xeon(R) CPU X5670  @ 2.93GHz (aka cfarm 20) and the results:
> 
> --enable-languages=all
> 
> real  35m10.419s
> user  204m5.613s
> sys   6m15.615s
> 
> --enable-languages=all --with-build-config=bootstrap-ubsan
> 
> real  71m39.338s
> user  347m53.409s
> sys   7m44.281s

Note that this doesn't mean -fsanitize=undefined generated code is twice as
slow as code compiled without it; I guess most of the extra overhead is
actually compile time (mostly a larger cfg due to all the extra checks).

Jakub


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 05:16:00PM +0200, Jakub Jelinek wrote:
> On Thu, May 15, 2014 at 05:08:10PM +0200, Marek Polacek wrote:
> > On Thu, May 15, 2014 at 11:39:26AM +0100, Ramana Radhakrishnan wrote:
> > > What's the overhead with bootstrap-ubsan ?
> > 
> > I timed normal bootstrap and bootstrap-ubsan on x86_64, 24 cores,
> > Intel(R) Xeon(R) CPU X5670  @ 2.93GHz (aka cfarm 20) and the results:
> > 
> > --enable-languages=all
> > 
> > real35m10.419s
> > user204m5.613s
> > sys 6m15.615s
> > 
> > --enable-languages=all --with-build-config=bootstrap-ubsan
> > 
> > real71m39.338s
> > user347m53.409s
> > sys 7m44.281s
> 
> Note that this doesn't mean -fsanitize=undefined generated code is twice as
> slow as code compiled without it; I guess most of the extra overhead is
> actually compile time (mostly a larger cfg due to all the extra checks).

And I think the main offender will be the NULL checking with its extra bbs.
So to alleviate the cfg size one can use -fsanitize=undefined
-fno-sanitize=null - I'd expect this to make a significant difference.
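As a rough picture of where those extra bbs come from, this is roughly the
hand-written equivalent of an instrumented dereference (a sketch, not GCC's
actual expansion):

int
deref (int *p)
{
  if (p == 0)            /* the extra compare-and-branch, one per checked access */
    __builtin_trap ();   /* with -fsanitize-undefined-trap-on-error; otherwise a libubsan report */
  return *p;
}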

Marek


Re: [AArch64/ARM 2/3] Recognize shuffle patterns for REV instructions on AArch64, rewrite intrinsics.

2014-05-15 Thread Alan Lawrence
Sure, here is a revised patch (replacing a with __a). I've retested (the various 
TBL dependencies have all been committed), no regressions on aarch64-none-elf or 
aarch64_be-none-elf. May I propose gcc/ChangeLog:


2014-05-15  Alan Lawrence  

* config/aarch64/aarch64-simd.md (aarch64_rev):
New pattern.
* config/aarch64/aarch64.c (aarch64_evpc_rev): New function.
(aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_rev.
* config/aarch64/iterators.md (REVERSE): New iterator.
(UNSPEC_REV64, UNSPEC_REV32, UNSPEC_REV16): New enum elements.
(rev_op): New int_attribute.
* config/aarch64/arm_neon.h (vrev16_p8, vrev16_s8, vrev16_u8,
vrev16q_p8, vrev16q_s8, vrev16q_u8, vrev32_p8, vrev32_p16, vrev32_s8,
vrev32_s16, vrev32_u8, vrev32_u16, vrev32q_p8, vrev32q_p16, vrev32q_s8,
vrev32q_s16, vrev32q_u8, vrev32q_u16, vrev64_f32, vrev64_p8,
vrev64_p16, vrev64_s8, vrev64_s16, vrev64_s32, vrev64_u8, vrev64_u16,
vrev64_u32, vrev64q_f32, vrev64q_p8, vrev64q_p16, vrev64q_s8,
vrev64q_s16, vrev64q_s32, vrev64q_u8, vrev64q_u16, vrev64q_u32):
Replace temporary __asm__ with __builtin_shuffle.

Cheers, Alan

Marcus Shawcroft wrote:


On 23 April 2014 20:44, Alan Lawrence  wrote:

This patch (borrowing heavily from the ARM backend) makes
aarch64_expand_vec_perm_const output REV instructions when appropriate,
and then implements the vrev_XXX intrinsics in terms of __builtin_shuffle
(which
now produces the same assembly instructions).

No regressions (and tests in previous patch
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01468.html still passing) on
aarch64-none-elf; also on aarch64_be-none-elf, where there are
no regressions following testsuite config changes in
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00579.html, but some "noise"
(due
to unexpected success in vectorization) without that patch.

gcc/ChangeLog:
2014-04-23  Alan Lawrence  


Two spaces after the date.


* config/aarch64/iterators.md: add a REVERSE iterator and rev_op
attribute for REV64/32/16 insns.


ChangeLog entries are sentences, therefore they start with a capital letter.

Identify the new definitions in parentheses.  The ChangeLog entry just
states what changed, not why, any explanation required should be in
code comments or in the submission email, therefore:

 * config/aarch64/iterators.md (REVERSE, rev_op): Define.



* config/aarch64/aarch64-simd.md: add corresponding define_insn
parameterized by REVERSE iterator.


... and this one should read something like:

 * config/aarch64/aarch64-simd.md
   (aarch64_rev): Define.

The remaining entries should all be updated in a similar fashion...



+
+/* vrev  */
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vrev16_p8 (poly8x8_t a)
+{
+  return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+}


This has the effect of reserving the symbol 'a', use __a instead.  We
have other breakage like this in arm_neon.h which needs fixing but
that aside we should not be compounding the problem.

Cheers
/Marcus

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 108bc8d..52fffd9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4167,6 +4167,15 @@
   [(set_attr "type" "neon_permute")]
 )
 
+(define_insn "aarch64_rev"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(unspec:VALL [(match_operand:VALL 1 "register_operand" "w")]
+REVERSE))]
+  "TARGET_SIMD"
+  "rev\\t%0., %1."
+  [(set_attr "type" "neon_rev")]
+)
+
 (define_insn "aarch64_st2_dreg"
   [(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
 	(unspec:TI [(match_operand:OI 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6a6fb03..6701c8c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8087,6 +8087,80 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize patterns for the REV insns.  */
+
+static bool
+aarch64_evpc_rev (struct expand_vec_perm_d *d)
+{
+  unsigned int i, j, diff, nelt = d->nelt;
+  rtx (*gen) (rtx, rtx);
+
+  if (!d->one_vector_p)
+return false;
+
+  diff = d->perm[0];
+  switch (diff)
+{
+case 7:
+  switch (d->vmode)
+	{
+	case V16QImode: gen = gen_aarch64_rev64v16qi; break;
+	case V8QImode: gen = gen_aarch64_rev64v8qi;  break;
+	default:
+	  return false;
+	}
+  break;
+case 3:
+  switch (d->vmode)
+	{
+	case V16QImode: gen = gen_aarch64_rev32v16qi; break;
+	case V8QImode: gen = gen_aarch64_rev32v8qi;  break;
+	case V8HImode: gen = gen_aarch64_rev64v8hi;  break;
+	case V4HImode: gen = gen_aarch64_rev64v4hi;  break;
+	default:
+	  return false;
+	}
+  break;
+case 1:
+  switch (d->vmode)
+	{
+	case V16QImode: gen = gen_aarch64_rev16v16qi; break;
+	c

[PATCH AArch64 / testsuite] Add V1DFmode, fixes PR/59843

2014-05-15 Thread Alan Lawrence
This fixes an ICE on AArch64 when compiling code with a vector of exactly one 
double, and seems the most specific/accurate way of fixing that specific case.


I've included a test case of a range of other singleton vector types too 
(compiles on aarch64-none-elf, x86_64, arm-none-eabi).


No regressions on aarch64-none-elf.

Cheers, Alandiff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 1d2cc767946623fc557e2f6518827e40c4df9b73..f9c436948a6f5177761ee1d288809f05a7e841c1 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -31,6 +31,7 @@ VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI.  */
 VECTOR_MODES (FLOAT, 8);  /* V2SF.  */
 VECTOR_MODES (FLOAT, 16); /*V4SF V2DF.  */
+VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 
 /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
 INT_MODE (OI, 32);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dacd7eebcf6ad000c43fbb86f74c343573b30615..601f54e63171c57dfa9fb316530baea0891f3247 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6335,7 +6335,8 @@ aarch64_vector_mode_supported_p (enum machine_mode mode)
 	  || mode == V16QImode || mode == V2DImode
 	  || mode == V2SImode  || mode == V4HImode
 	  || mode == V8QImode || mode == V2SFmode
-	  || mode == V4SFmode || mode == V2DFmode))
+	  || mode == V4SFmode || mode == V2DFmode
+	  || mode == V1DFmode))
 return true;
 
   return false;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-singleton_1.c b/gcc/testsuite/gcc.dg/vect/vect-singleton_1.c
new file mode 100644
index ..6c2ff49cdab358245bfc3f5994724bb878c68d61
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-singleton_1.c
@@ -0,0 +1,38 @@
+/* PR target/59843 ICE on function taking/returning vector of one float64_t.  */
+
+/* { dg-do compile } */
+/* { dg-options "-Warray-bounds -O2 -fno-inline -std=c99" } */
+
+#define TEST(BASETYPE, VECTYPE, SUFFIX)	 \
+  typedef BASETYPE VECTYPE		 \
+  __attribute__ ((__vector_size__ (sizeof (BASETYPE))));	 \
+  VECTYPE \
+  test_vadd_##SUFFIX (VECTYPE a, VECTYPE b) \
+  {	 \
+return a + b;			 \
+  }	 \
+	 \
+  void	 \
+  test_##SUFFIX (BASETYPE val)		 \
+  {	 \
+VECTYPE var = { val };		 \
+BASETYPE v0 = var[0];		 \
+BASETYPE v1 = var[1]; /* { dg-warning "index value is out of bound" } */ \
+  }
+
+TEST (double, float64x1_t, f64)
+
+/* Original bug was for above type;
+   in a nod to completeness, test other types too.  */
+
+TEST (long long, int64x1_t, s64)
+
+TEST (float, float32x1_t, f32)
+
+TEST (long, longx1_t, l)
+
+TEST (int, intx1_t, i)
+
+TEST (short, int16x1_t, s16)
+
+TEST (char, int8x1_t, s8)

Re: [PATCH AArch64 / testsuite] Add V1DFmode, fixes PR/59843

2014-05-15 Thread Alan Lawrence

Oops, I missed:

gcc/ChangeLog:
2014-05-15  Alan Lawrence  

* config/aarch64/aarch64-modes.def: Add V1DFmode.
* config/aarch64/aarch64.c (aarch64_vector_mode_supported_p):
Support V1DFmode.

gcc/testsuite/ChangeLog:
2014-05-15  Alan Lawrence  

* gcc.dg/vect/vect-singleton_1.c: New file.

Alan Lawrence wrote:
This fixes an ICE on AArch64 when compiling code with a vector of exactly one 
double, and seems the most specific/accurate way of fixing that specific case.


I've included a test case of a range of other singleton vector types too 
(compiles on aarch64-none-elf, x86_64, arm-none-eabi).


No regressions on aarch64-none-elf.

Cheers, Alan






Re: [Committed] [PATCH,*/2] shrink wrap a function with a single loop: split live_edge

2014-05-15 Thread Dominique Dhumieres
One of the two commits breaks several Fortran tests in 32-bit mode,
see https://gcc.gnu.org/ml/gcc-regression/2014-05/msg00155.html

The ICE is

[Book15] f90/bug% gfc 
/opt/gcc/work/gcc/testsuite/gfortran.dg/assumed_charlen_needed_1.f90 -m32 -O
/opt/gcc/work/gcc/testsuite/gfortran.dg/assumed_charlen_needed_1.f90: In 
function 'MAIN__':
/opt/gcc/work/gcc/testsuite/gfortran.dg/assumed_charlen_needed_1.f90:8:0: 
internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2239
   print *, fun (a)
 ^

/opt/gcc/work/gcc/testsuite/gfortran.dg/assumed_charlen_needed_1.f90:8:0: 
internal compiler error: Abort trap: 6

TIA 

Dominique


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 12:26 AM, bin.cheng  wrote:
> Here comes up with a new GCC pass looking through each basic block and
> merging paired load store even they are not adjacent to each other.

So I have a target with load and store multiple support for a large
number of registers (2-n registers), and I added a sched0 pass that is a
light copy of the regular scheduling pass, using a different cost function
which arranges all loads first, then all stores, then everything else.  Within a
group of loads or stores the secondary key is the base register, and the next
key is the offset.  The net result: all loads off the same register are sorted
in increasing offset order.  We can then use some define_insns and some
peephole2 patterns to take the seemingly unrelated instructions, which are now
adjacent, and knock them down into single instructions, instead of the mass of
instructions they were before.  And then a peephole pass runs early to
allow the patterns to do the heavy lifting.  This scheme can handle unlimited
complexity in the addressing forms just by ensuring the cost function for the
new scheduling pass looks at all relevant bits (target dependent, if the simple
machine independent form reg + n is not enough).  The sched0 and the peephole
pass run early:

+  NEXT_PASS (pass_sched0);
+  NEXT_PASS (pass_peephole1);
   NEXT_PASS (pass_web);
   NEXT_PASS (pass_rtl_cprop);
   NEXT_PASS (pass_cse2);

(before register allocation) so, it can arrange to have things in adjacent 
registers for the load and store multiple instructions.  The register allocator 
can then arrange all the code to use those registers directly.
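To make the ordering concrete, here is a self-contained sketch (not the patch
below, just the comparison it describes) of the sched0 priority: loads first,
then stores, then everything else, keyed by base register and then offset:

#include <algorithm>
#include <vector>

/* Sketch of the described ordering only, not the actual pass.  */
struct mem_insn
{
  int kind;      /* 0 = load, 1 = store, 2 = everything else */
  int base_reg;  /* base register number (only meaningful for kind 0/1) */
  int offset;    /* constant offset (only meaningful for kind 0/1) */
};

static bool
sched0_order (const mem_insn &a, const mem_insn &b)
{
  if (a.kind != b.kind)
    return a.kind < b.kind;          /* loads < stores < other */
  if (a.kind == 2)
    return false;                    /* leave "other" in its original order */
  if (a.base_reg != b.base_reg)
    return a.base_reg < b.base_reg;  /* group accesses off the same base */
  return a.offset < b.offset;        /* increasing offsets end up adjacent */
}

int
main ()
{
  std::vector<mem_insn> insns
    = { { 1, 3, 8 }, { 0, 3, 4 }, { 2, 0, 0 }, { 0, 3, 0 }, { 1, 3, 12 } };
  std::stable_sort (insns.begin (), insns.end (), sched0_order);
  /* Result: loads [r3+0], [r3+4]; stores [r3+8], [r3+12]; then the rest -
     exactly the adjacency the peephole patterns need.  */
  return 0;
}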

So, having done all this work, I think it would be nicer if there were a pass
that managed it, so that one doesn’t have to write any of the peepholes or the
define_insns (you need something like 3*n patterns, and the patterns are O(n)
in size, so you need around n*4/2 lines of code, which is annoying for large
n).  A pass could use the existing load/store multiple patterns directly, so no
additional port work.  In my work, since I extend lifetimes of values in
registers pretty much without limit, this could be a bad thing.  The code is
naturally limited to only extending the lives of things where load and store
multiple are used, as if they aren’t used, the regular scheduling pass would
undo all the sched0 motion.  I chose a light copy of sched, as I don’t care
about compile time, and I wanted a very small patch that was easy to maintain.
If this pass went into trunk, we’d run the new passes _only_ if a port asked
for them.  99% of the ports likely don’t want either, though peephole before
register allocation might be interesting for others to solve other issues.

I wanted this to run before register allocation as my load and store multiple
instructions only take consecutive register ranges (n-m), and I need the
register allocator to make that come true.  I do reg-to-reg motion to move
things around as needed, but I expect the allocator to get rid of almost all
of it.  Very complex cases might wind up with a few extra moves, but I have
nice bubbles that I can fit those extra moves into.

In my scheme, no new options, no documentation for new options, no new param 
options, no silly constants, no hard to write/maintain pass, no new weird 
targets interface, no limit on just pairs, works on stores as well, runs 
earlier, 430 lines instead of 1086 lines, conceptually much simpler, added 
benefit of peephole before register allocation that can be used for other 
things by the port…

So, my question is, does my scheme on your target find more or fewer things?  
Would your scheme pair pairs (so that 4 registers would go into 1 instruction)?

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 4317c9f..9702ebe 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -1361,7 +1361,10 @@ insn_cost (rtx insn)
   if (recog_memoized (insn) < 0)
return 0;
 
-  cost = insn_default_latency (insn);
+  if (sched_load)
+   cost = 0;
+  else
+   cost = insn_default_latency (insn);
   if (cost < 0)
cost = 0;
 
@@ -1383,7 +1386,10 @@ insn_cost (rtx insn)
}
   else
{
- cost = insn_default_latency (insn);
+ if (sched_load)
+   cost = 0;
+ else
+   cost = insn_default_latency (insn);
  if (cost < 0)
cost = 0;
 
@@ -1568,6 +1574,27 @@ dep_list_size (rtx insn, sd_list_types_def list)
   return nodbgcount;
 }
 
+static int saved_priority;
+bool sched_load;
+
+
+static int
+pri (bool wasload, int regno, HOST_WIDE_INT offp) {
+  int p;
+  int off = (int)offp;
+  int m = /* ((1<14)-1) */ 0x3fff;
+  off = off + (1<<13);
+  off = off & m;
+  /* loads go first, then stores */
+  p = wasload*(INT_MAX/4+1) + (INT_MAX/4+1);
+  /* then we sort upon the base register, increasing regnos.  */
+  p -= (regno & 0xfff) <<14;
+  /* then we sort on the offset, increas
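
For readers following along, here is a self-contained sketch of the kind of
sort key the truncated pri() above computes: loads before stores, then by
base register, then by increasing offset.  The tail of the function and the
exact constants are guesses from the visible fragment, not Mike's actual
code.

#include <climits>

/* Illustrative reconstruction only.  A larger key means "schedule earlier"
   in this scheme.  */
int
pri_sketch (bool wasload, int regno, long long offp)
{
  const int m = (1 << 14) - 1;             /* 0x3fff */
  int off = ((int) offp + (1 << 13)) & m;  /* bias so negative offsets sort too */
  /* Loads go first, then stores.  */
  int p = wasload * (INT_MAX / 4 + 1) + (INT_MAX / 4 + 1);
  /* Then sort on the base register, increasing regnos.  */
  p -= (regno & 0xfff) << 14;
  /* Then sort on the offset, increasing.  */
  p -= off;
  return p;
}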

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Steven Bosscher
On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote:
> Hi,
> Targets like ARM and AARCH64 support double-word load store instructions,
> and these instructions are generally faster than the corresponding two
> load/stores.  GCC currently uses peephole2 to merge paired load/store into
> one single instruction which has a disadvantage.  It can only handle simple
> cases like the two instructions actually appear sequentially in instruction
> stream, and is too weak to handle cases in which the two load/store are
> intervened by other irrelevant instructions.
>
> Here comes up with a new GCC pass looking through each basic block and
> merging paired load store even they are not adjacent to each other.  The
> algorithm is pretty simple:
> 1) In initialization pass iterating over instruction stream it collects
> relevant memory access information for each instruction.
> 2) It iterates over each basic block, tries to find possible paired
> instruction for each memory access instruction.  During this work, it checks
> dependencies between the two possible instructions and also records the
> information indicating how to pair the two instructions.  To avoid quadratic
> behavior of the algorithm, It introduces new parameter
> max-merge-paired-loadstore-distance and set the default value to 4, which is
> large enough to catch major part of opportunities on ARM/cortex-a15.
> 3) For each candidate pair, it calls back-end's hook to do target dependent
> check and merge the two instructions if possible.
>
> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
> can merge numerous opportunities except ones already merged by peephole2
> (same level numbers of opportunities comparing to peepholed ones).  GCC
> bootstrap can also confirm this finding.
>
> Yet there is an open issue about when we should run this new pass.  Though
> register renaming is disabled by default now, I put this pass after it,
> because renaming can resolve some false dependencies thus benefit this pass.
> Another finding is, it can capture a lot more opportunities if it's after
> sched2, but I am not sure whether it will mess up with scheduling results in
> this way.
>
> So, any comments about this?

First off: Why does this need a target hook? We're getting more and
more of them -- too many IMHO. There should be really good reasons for
adding even more new ones.

Does this pass have to run after scheduling? The way you describe it,
this sounds like a scheduling problem, where you don't need regrename
to resolve false dependencies. Your sched2 pass should be able to
prioritize mergeable loads/stores to schedule them adjacent. Or if you
must do this before scheduling, then at least do it before sched2. Now
you're really ruining the order of the scheduled instructions, which
seems bad.

I don't see how regrename will help resolve [base+offset] false
dependencies. Can you explain? I'd expect effects from
hardreg-copyprop "commoning" a base register.

ChangeLog is missing the entry for arm.c.

Your pass should make those peephole2's redundant, so you should
remove the relevant define_peephole2's.


>  +   generated by spilling during reigster allocation.  To catch all

s/reigster/register/


> +   whose address expression is in the form of "[base_offset]".  It

s/[base_offset]/[base+offset]/


> +   only guarantee that the two consecutive memory access instructions

s/guarantee/guarantees/


> +   has no data dependency issues.  After a pair is found it calls to

s/has/have/


> +/* Maximal number of memory references we support in single isntruction.  */

s/Maximal/Maximum/
s/isntruction/instruction/


> +/* Check instruction recorded in PINSN_INFO to see if it is the
> +   first instruction of load store pair.  If yes, record the
> +   information in PPAIR_INFO and return TRUE.  */
> +
> +static bool
> +find_pair_x_insn (struct pair_info_t *ppair_info,

The function description doesn't seem to match the function
implementation. There is nothing in find_pair_x_insn that looks for a
load/store pair.


> +  /* Don't bother if base is a register and we are modifying it.  */
> +  if (REG_P (base) && reg_modified_p (base, insn))
> +return false;

You can shortcut this if you're looking at a load.


> +  for (; *ref_rec; ref_rec++)

Nit: "while (*ref_rec)"


> +  /* Can't be paired if memory refs have different mode.  */
> +  if (GET_MODE (mem) != mode)
> +return false;

Not even if same GET_MODE_SIZE?
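
(If that relaxation is wanted, a sketch of the size-based check being hinted
at, illustrative only and not part of the posted patch:)

  /* Allow pairing when the two accesses have the same size even if their
     modes differ.  */
  if (GET_MODE_SIZE (GET_MODE (mem)) != GET_MODE_SIZE (mode))
    return false;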


> +  /* Clear PAIR_SINK if the memory unit of the first instruction
> + is clobbered between.  */
> +  if ((pinfo->pair_kind & PAIR_SINK) != 0
> +  && MEM_CLOBBERED_P (pinfo->mem_clob, MEM_CLOB_SELF))
> +pinfo->pair_kind &= (~PAIR_SINK);

This has the smell of architecture-specific behavior. Maybe you can't
even pair instructions if there is a MEM_CLOB_SELF involved. Likewise
for some of the checks in valid_insn_pair_p following the one quoted
above.


> +  HOST_WIDE_INT offset_t;

We generally reserve "_t" for types

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Jeff Law

On 05/15/14 10:51, Mike Stump wrote:

On May 15, 2014, at 12:26 AM, bin.cheng  wrote:

Here comes up with a new GCC pass looking through each basic block
and merging paired load store even they are not adjacent to each
other.


So I have a target that has load and store multiple support that
supports a large number of registers (2-n registers), and I added a
sched0 pass that is a light copy of the regular scheduling pass that
uses a different cost function which arranges all loads first, then
all stores then everything else.  Within a group of loads or stores
the secondary key is the base register, the next key is the offset.
The net result, all loads off the same register are sorted in
increasing order.

Glad to see someone else stumble on (ab)using the scheduler to do this.

I've poked at the scheduler several times to do similar stuff, but was 
never really satisfied with the results and never tried to polish those 
prototypes into something worth submitting.


One example I've poked at was discovery of stores which then feed into a 
load from the same location, which obviously we'd prefer to turn into a 
store + copy (subject to a mess of constraints).  There are a handful of 
these kinds of transformations that seem to naturally drop out of this 
kind of work.
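
A source-level picture of that rewrite (illustrative only; the real
transformation happens on RTL):

/* Before: the second statement rereads the location just written.
   After:  the load is replaced by a copy of the stored value.  */
void before_xform (int *p, int *out, int x) { *p = x; *out = *p; }
void after_xform  (int *p, int *out, int x) { *p = x; *out = x;  }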


Similarly a post-reload pass could be used to promote single word 
loads/stores to double-word operations.


If anyone cared about PA 1.1 code generation, it'd be a much cleaner way 
to support the non-fused fmpyadd fmpysub insns.


Anyway, if you want to move forward with the idea, I'd certainly support 
doing so.


jeff


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread H.J. Lu
On Thu, May 15, 2014 at 12:26 AM, bin.cheng  wrote:
> Hi,
> Targets like ARM and AARCH64 support double-word load store instructions,
> and these instructions are generally faster than the corresponding two
> load/stores.  GCC currently uses peephole2 to merge paired load/store into
> one single instruction which has a disadvantage.  It can only handle simple
> cases like the two instructions actually appear sequentially in instruction
> stream, and is too weak to handle cases in which the two load/store are
> intervened by other irrelevant instructions.
>
> Here comes up with a new GCC pass looking through each basic block and
> merging paired load store even they are not adjacent to each other.  The
> algorithm is pretty simple:
> 1) In initialization pass iterating over instruction stream it collects
> relevant memory access information for each instruction.
> 2) It iterates over each basic block, tries to find possible paired
> instruction for each memory access instruction.  During this work, it checks
> dependencies between the two possible instructions and also records the
> information indicating how to pair the two instructions.  To avoid quadratic
> behavior of the algorithm, It introduces new parameter
> max-merge-paired-loadstore-distance and set the default value to 4, which is
> large enough to catch major part of opportunities on ARM/cortex-a15.
> 3) For each candidate pair, it calls back-end's hook to do target dependent
> check and merge the two instructions if possible.
>
> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
> can merge numerous opportunities except ones already merged by peephole2
> (same level numbers of opportunities comparing to peepholed ones).  GCC
> bootstrap can also confirm this finding.
>
> Yet there is an open issue about when we should run this new pass.  Though
> register renaming is disabled by default now, I put this pass after it,
> because renaming can resolve some false dependencies thus benefit this pass.
> Another finding is, it can capture a lot more opportunities if it's after
> sched2, but I am not sure whether it will mess up with scheduling results in
> this way.
>
> So, any comments about this?
>
> Thanks,
> bin
>
>
> 2014-05-15  Bin Cheng  
> * common.opt (flag_merge_paired_loadstore): New option.
> * merge-paired-loadstore.c: New file.
> * Makefile.in: Support new file.
> * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
> (load_latency_expanded_p, arm_merge_paired_loadstore): New function.
> * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
> * doc/invoke.texi (-fmerge-paired-loadstore): New.
> (max-merge-paired-loadstore-distance): New.
> * doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
> * doc/tm.texi: Regenerated.
> * target.def (merge_paired_loadstore): New.
> * tree-pass.h (make_pass_merge_paired_loadstore): New decl.
> * passes.def (pass_merge_paired_loadstore): New pass.
> * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
>
> gcc/testsuite/ChangeLog
> 2014-05-15  Bin Cheng  
>
> * gcc.target/arm/merge-paired-loadstore.c: New test.
>

Here is a testcase on x86-64:

---
struct Foo
{
  Foo (double x0, double x1, double x2)
{
  data[0] = x0;
  data[1] = x1;
  data[2] = x2;
}
  double data[3];
};

const Foo f1 (0.0, 0.0, 1.0);
const Foo f2 (1.0, 0.0, 0.0);

struct Bar
{
  Bar (float x0, float x1, float x2, float x3, float x4)
{
  data[0] = x0;
  data[1] = x1;
  data[2] = x2;
  data[3] = x3;
  data[4] = x4;
}
  float data[5];
};

const Bar b1 (0.0, 0.0, 0.0, 0.0, 1.0);
const Bar b2 (1.0, 0.0, 0.0, 0.0, 0.0);
---

We generate

xorpd %xmm0, %xmm0
movsd .LC1(%rip), %xmm1
movsd %xmm0, _ZL2f1(%rip)
movsd %xmm0, _ZL2f1+8(%rip)
movsd %xmm0, _ZL2f2+8(%rip)
movsd %xmm0, _ZL2f2+16(%rip)
xorps %xmm0, %xmm0
movsd %xmm1, _ZL2f1+16(%rip)
movsd %xmm1, _ZL2f2(%rip)
movss .LC3(%rip), %xmm1
movss %xmm0, _ZL2b1(%rip)
movss %xmm0, _ZL2b1+4(%rip)
movss %xmm0, _ZL2b1+8(%rip)
movss %xmm0, _ZL2b1+12(%rip)
movss %xmm1, _ZL2b1+16(%rip)
movss %xmm1, _ZL2b2(%rip)
movss %xmm0, _ZL2b2+4(%rip)
movss %xmm0, _ZL2b2+8(%rip)
movss %xmm0, _ZL2b2+12(%rip)
movss %xmm0, _ZL2b2+16(%rip)

There are pairs of movsd and sets of 4 movss.  We should
be able to handle more than 2 load/store insns.

-- 
H.J.


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Toon Moene

On 05/15/2014 05:08 PM, Marek Polacek wrote:


On Thu, May 15, 2014 at 11:39:26AM +0100, Ramana Radhakrishnan wrote:

What's the overhead with bootstrap-ubsan ?


I timed normal bootstrap and bootstrap-ubsan on x86_64, 24 cores,
Intel(R) Xeon(R) CPU X5670  @ 2.93GHz (aka cfarm 20) and the results:

--enable-languages=all

real    35m10.419s
user    204m5.613s
sys     6m15.615s

--enable-languages=all --with-build-config=bootstrap-ubsan

real    71m39.338s
user    347m53.409s
sys     7m44.281s


And don't underestimate the *usefulness* of this - if you don't have the 
resources to do a ubsan bootstrap, download mine from last night 
(x86_64-linux-gnu): http://moene.org/~toon/gcc-tests.log.gz


[ I hope there is a way to discard color codings when writing error 
messages to a file, ugh ]


Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


[PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations

2014-05-15 Thread Sriraman Tallam
Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
using the GOT.  This is two instructions, one to get the address of the global
from the GOT and the other to get the value.  If it turns out that the global
gets defined in the executable at link-time, it still needs to go through the
GOT as it is too late then to generate a direct access.

Examples:

foo.cc
--
int a_glob;
int main () {
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code directly accesses the global via
PC-relative insn:

5e0 <main>:
   mov    0x165a(%rip),%eax        # 1c40 <a_glob>

foo.cc
--

extern int a_glob;
int main () {
  return a_glob; // not defined in this file
}

With -O2 -fpie -pie, the generated code accesses global via GOT using two
memory loads:

6f0 <main>:
   mov    0x1609(%rip),%rax       # 1d00 <_DYNAMIC+0x230>
   mov    (%rax),%eax

This is true even if, in the latter case, the global ends up defined in the
executable by a different file.

Some experiments on google benchmarks show that the extra memory loads affect
performance by 1% to 5%.


Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that the
global will be defined in the executable.  For globals that are truly extern
(come from shared objects), the linker will create copy relocations and have
them defined in the executable.  The result is that no global access needs to
go through the GOT, which improves performance.

This patch to the gold linker, submitted recently:
https://sourceware.org/ml/binutils/2014-05/msg00092.html
allows gold to generate copy relocations for -pie mode when necessary.

I have added the option -mld-pie-copyrelocs which, when combined with -fpie,
enables this.  Note that the BFD linker does not support PIE copy relocations
yet, so this option cannot be used there.

Please review.


ChangeLog:

* config/i386/i386.opt (mld-pie-copyrelocs): New option.
* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
 address is still legitimate in the presence of copy relocations
 and -fpie.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.



Patch attached.
Thanks
Sri


Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 210437)
+++ config/i386/i386.opt(working copy)
@@ -108,6 +108,10 @@ int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer
 
+;; -mld-pie-copyrelocs
+TargetSave
+int x_ix86_ld_pie_copyrelocs
+
 ;; -mforce-drap= 
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,

Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 08:32:36PM +0200, Toon Moene wrote:
> On 05/15/2014 05:08 PM, Marek Polacek wrote:
> 
> >On Thu, May 15, 2014 at 11:39:26AM +0100, Ramana Radhakrishnan wrote:
> >>What's the overhead with bootstrap-ubsan ?
> >
> >I timed normal bootstrap and bootstrap-ubsan on x86_64, 24 cores,
> >Intel(R) Xeon(R) CPU X5670  @ 2.93GHz (aka cfarm 20) and the results:
> >
> >--enable-languages=all
> >
> >real 35m10.419s
> >user 204m5.613s
> >sys  6m15.615s
> >
> >--enable-languages=all --with-build-config=bootstrap-ubsan
> >
> >real 71m39.338s
> >user 347m53.409s
> >sys  7m44.281s
> 
> And don't underestimate the *usefulness* of this - if you don't have the
> resources to do a ubsan bootstrap, download mine from last night
> (x86_64-linux-gnu): http://moene.org/~toon/gcc-tests.log.gz
> 
> [ I hope there is a way to discard color codings when writing error messages
> to a file, ugh ]

Sure, -fdiagnostics-color=auto "means to use color only when the
standard error is a terminal", or -fdiagnostics-color=never to turn it
off completely (testsuite uses the latter).

Marek


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 10:13 AM, Jeff Law  wrote:
> I've poked at the scheduler several times to do similar stuff, but was never 
> really satisfied with the results and never tried to polish those prototypes 
> into something worth submitting.

What was lacking?  The cleanliness of the patch, or the “it didn’t win me more 
than 0.01% code improvement so I never submitted it” part?

> One example I've poked at was discovery of stores which then feed into a load 
> from the same location.

Trivial to do with my patch: just change the sort key to arrange for that to 
happen.  However, if you want multiple support and this, then you need to 
recognize a store multiple / load from a part of it, which would be kinda 
annoying.

> Anyway, if you want to move forward with the idea, I'd certainly support 
> doing so.

I’d be happy to clean up my patch as necessary and contribute it; however, 
people would have to tell me what they would like to see done to it.  In my 
mind, the arm, more, nds32, rs6000 and s390 ports would be the biggest in-tree 
winners.

To be the most useful, my patch would benefit from an optimization person 
writing MI code to coalesce adjacent memory operations into either larger-mode 
loads and stores or into gen_load_multiple / gen_store_multiple; then all the 
port work (mine is around 16*n*n lines) would be zero.

Re: [PATCH] Make ipa-prop analyze BBs in DOM order

2014-05-15 Thread Jan Hubicka
> Hi,
> 
> the following patch deals with requested propagation in PR 53787 in
> the real benchmark (as opposed to the original simple testcase) by
> analyzing individual BBs in ipa-prop.c in dominator order.
> 
> Currently we do the analysis in two loops, in the first the order is
> given by FOR_EACH_BB_FN and in the second the order is the order of
> call graph edges (which in practice is quite bad because we tend to
> start with BBs towards the end).  When analysis determines that a
> non-SSA parameter or that data pointed to by a parameter are modified,
> this is marked into a function wide flag (describing the parameter)
> and we always consider the data clobbered from that moment on in order
> to save AA walking.
> 
> This patch changes the analysis into dominator order and data is
> considered clobbered only in dominator children of a block where it
> was determined to be possibly clobbered, not in the whole function.
> This was confirmed to help in the aforementioned PR.
> 
> The patch also, enforces a cap on the number of statements walked
> (approx. 4 times the highest I have ever seen).  On the other hand it
> gives up trying to cache the bitmap of visited memory-SSAs.  The
> per-BB information had to be put somewhere and I took this opportunity
> to cram a lot of commonly used per-function data into the same
> structure, which is then passed among various functions.  The new
> structure replaces the struct param_analysis_info.
> 
> Bootstrapped and tested on x86_64-linux, I have also LTO-built Firefox
> at -O3 with it.  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> 2014-02-12  Martin Jambor  
> 
>   PR tree-optimization/53787
>   * params.def (PARAM_IPA_CP_LOOP_HINT_BONUS): New param.
>   * ipa-prop.h (ipa_node_params): Rename uses_analysis_done to
>   analysis_done, update all uses.
>   * ipa-prop.c: Include domwalk.h
>   (param_analysis_info): Removed.
>   (param_aa_status): New type.
>   (ipa_bb_info): Likewise.
>   (func_body_info): Likewise.
>   (ipa_get_bb_info): New function.
>   (aa_overwalked): Likewise.
>   (find_dominating_aa_status): Likewise.
>   (parm_bb_aa_status_for_bb): Likewise.
>   (parm_preserved_before_stmt_p): Changed to use new param AA info.
>   (load_from_unmodified_param): Accept func_body_info as a parameter
>   instead of parms_ainfo.
>   (parm_ref_data_preserved_p): Changed to use new param AA info.
>   (parm_ref_data_pass_through_p): Likewise.
>   (ipa_load_from_parm_agg_1): Likewise.  Update callers.
>   (compute_complex_assign_jump_func): Changed to use new param AA info.
>   (compute_complex_ancestor_jump_func): Likewise.
>   (ipa_compute_jump_functions_for_edge): Likewise.
>   (ipa_compute_jump_functions): Removed.
>   (ipa_compute_jump_functions_for_bb): New function.
>   (ipa_analyze_indirect_call_uses): Likewise, moved variable
>   declarations down.
>   (ipa_analyze_virtual_call_uses): Accept func_body_info instead of node
>   and info, moved variable declarations down.
>   (ipa_analyze_call_uses): Accept and pass on func_body_info instead of
>   node and info.
>   (ipa_analyze_stmt_uses): Likewise.
>   (ipa_analyze_params_uses): Removed.
>   (ipa_analyze_params_uses_in_bb): New function.
>   (ipa_analyze_controlled_uses): Likewise.
>   (free_ipa_bb_info): Likewise.
>   (analysis_dom_walker): New class.
>   (ipa_analyze_node): Handle node-specific forbidden analysis,
>   initialize and free func_body_info, use dominator walker.
>   (ipcp_modif_dom_walker): New class.
>   (ipcp_transform_function): Create and free func_body_info, use
>   ipcp_modif_dom_walker, moved a lot of functionality there.

> 
> --- 59,111 
>   #include "ipa-utils.h"
>   #include "stringpool.h"
>   #include "tree-ssanames.h"
> + #include "domwalk.h"
>   
> ! /* Intermediate information that we get from AA analysis about a parameter. 
>  */

AA is alias analysis, so I guess either "AA" (removing analysis) or "alias 
analysis" :)

>   
> ! struct param_aa_status
>   {
> +   /* If not true, look at the dominator parent instead.  */
> +   bool valid;

In isolation, this comment is hardly understandable.
Perhaps you can mention that this info is used for each parameter during the
dominator order propagation.


> + 
> +   /* Whether we have seen something which might have modified the data in
> +  question.  Parm is for the parameter itself, ref is for data it points 
> to
> +  but using the alias type of individual accesses and pt is the same 
> thing
> +  but for computing aggregate pass-through functions using a very 
> inclusive
> +  ao_ref.  */

I guess PARM/REF/PR per coding style ;))

> ! 
> ! /* Structure with global information that is only used when looking at 
> function
> !body. */
> ! 
> ! struct func_body_info
> ! {
> !   /* The node that is being analyzed.  */
> !   cgraph_node *node;
> ! 
>

Re: [PATCH, libstdc++/61166] overflow when parse number in std::duration operator""

2014-05-15 Thread Jonathan Wakely

Here's a finished patch to simplify 

Tested x86_64-linux. Ed, any objection to this version?

commit 87d26af2fc07f0c45a0a6676161ae1db1d7541b7
Author: Jonathan Wakely 
Date:   Wed May 14 16:35:20 2014 +0100

2014-05-15  Ed Smith-Rowland  <3dw...@verizon.net>
	Jonathan Wakely  

	PR libstdc++/61166
	* include/bits/parse_numbers.h: Use integral_constant to remove
	duplication and simplify.
	* testsuite/20_util/duration/literals/61166.cc: New.
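
To make the shape of the refactoring easier to see outside the header, here
is a tiny standalone analogue (illustrative only, not part of the patch; the
names are made up):

#include <type_traits>

// Each digit becomes an integral_constant specialization; out-of-range
// digits are rejected by a static_assert in a shared implementation base.
template<unsigned Base, unsigned Val>
struct digit_impl : std::integral_constant<unsigned, Val>
{
  static_assert(Base > Val, "invalid digit");
};

template<unsigned Base, char Dig>
struct digit;

template<unsigned Base>
struct digit<Base, '7'> : digit_impl<Base, 7> { };

static_assert(digit<10, '7'>::value == 7, "value comes from the base class");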

diff --git a/libstdc++-v3/include/bits/parse_numbers.h b/libstdc++-v3/include/bits/parse_numbers.h
index 91a127c..0a42381a 100644
--- a/libstdc++-v3/include/bits/parse_numbers.h
+++ b/libstdc++-v3/include/bits/parse_numbers.h
@@ -27,8 +27,8 @@
  *  Do not attempt to use it directly. @headername{chrono}
  */
 
-#ifndef _PARSE_NUMBERS_H
-#define _PARSE_NUMBERS_H 1
+#ifndef _GLIBCXX_PARSE_NUMBERS_H
+#define _GLIBCXX_PARSE_NUMBERS_H 1
 
 #pragma GCC system_header
 
@@ -36,289 +36,181 @@
 
 #if __cplusplus > 201103L
 
+#include <type_traits>
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-namespace __parse_int {
-
+namespace __parse_int
+{
   template
 struct _Digit;
 
   template
-struct _Digit<_Base, '0'>
+struct _Digit<_Base, '0'> : integral_constant
 {
-  static constexpr bool valid{true};
-  static constexpr unsigned value{0};
+  using __valid = true_type;
 };
 
   template
-struct _Digit<_Base, '1'>
+struct _Digit<_Base, '1'> : integral_constant
 {
-  static constexpr bool valid{true};
-  static constexpr unsigned value{1};
+  using __valid = true_type;
 };
 
-  template
-struct _Digit<_Base, '2'>
+  template
+struct _Digit_impl : integral_constant
 {
-  static_assert(_Base > 2, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{2};
+  static_assert(_Base > _Val, "invalid digit");
+  using __valid = true_type;
 };
 
   template
-struct _Digit<_Base, '3'>
-{
-  static_assert(_Base > 3, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{3};
-};
+struct _Digit<_Base, '2'> : _Digit_impl<_Base, 2>
+{ };
 
   template
-struct _Digit<_Base, '4'>
-{
-  static_assert(_Base > 4, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{4};
-};
+struct _Digit<_Base, '3'> : _Digit_impl<_Base, 3>
+{ };
 
   template
-struct _Digit<_Base, '5'>
-{
-  static_assert(_Base > 5, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{5};
-};
+struct _Digit<_Base, '4'> : _Digit_impl<_Base, 4>
+{ };
 
   template
-struct _Digit<_Base, '6'>
-{
-  static_assert(_Base > 6, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{6};
-};
+struct _Digit<_Base, '5'> : _Digit_impl<_Base, 5>
+{ };
 
   template
-struct _Digit<_Base, '7'>
-{
-  static_assert(_Base > 7, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{7};
-};
+struct _Digit<_Base, '6'> : _Digit_impl<_Base, 6>
+{ };
 
   template
-struct _Digit<_Base, '8'>
-{
-  static_assert(_Base > 8, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{8};
-};
+struct _Digit<_Base, '7'> : _Digit_impl<_Base, 7>
+{ };
 
   template
-struct _Digit<_Base, '9'>
-{
-  static_assert(_Base > 9, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{9};
-};
+struct _Digit<_Base, '8'> : _Digit_impl<_Base, 8>
+{ };
 
   template
-struct _Digit<_Base, 'a'>
-{
-  static_assert(_Base > 0xa, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{0xa};
-};
+struct _Digit<_Base, '9'> : _Digit_impl<_Base, 9>
+{ };
 
   template
-struct _Digit<_Base, 'A'>
-{
-  static_assert(_Base > 0xa, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{0xa};
-};
+struct _Digit<_Base, 'a'> : _Digit_impl<_Base, 0xa>
+{ };
 
   template
-struct _Digit<_Base, 'b'>
-{
-  static_assert(_Base > 0xb, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{0xb};
-};
+struct _Digit<_Base, 'A'> : _Digit_impl<_Base, 0xa>
+{ };
 
   template
-struct _Digit<_Base, 'B'>
-{
-  static_assert(_Base > 0xb, "invalid digit");
-  static constexpr bool valid{true};
-  static constexpr unsigned value{0xb};
-};
+struct _Digit<_Base, 'b'> : _Digit_impl<_Base, 0xb>
+{ };
 
   template
-struct _Digit<_Base, 'c'>
-{
-  static_assert(_Base > 0xc, "invalid digit");
-  static constexpr bool valid{t

Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-15 Thread Jakub Jelinek
On Wed, May 14, 2014 at 05:37:25PM +, Joseph S. Myers wrote:
> > with s/max/min/;s/dconst1/dconstm1/; and, after the real_convert
> > do inexact = real_arithmetic (&newminval, MINUS_EXPR, &minval, &dconst1);
> > if !inexact just min = build_real (expr_type, newminval); and be done with
> > it (the question is if for IBM double double this will DTRT for
> > LONG_LONG_MIN, which I think should be that the high double will contain
> > (double) LONG_LONG_MIN and the low double -1.0).  For inexact
> > (which should be the same thing as if result of real_arithmetic + 
> > real_convert
> > is the same as original minval) we need to subtract more than one, dunno if
> > we should just compute it from the REAL_EXP and precision, or just keep
> > subtracing powers of two until after real_convert it is no longer bitwise
> > identical to original minval.  We don't have anything close to
> > real_nextafter nor real_convert variant that can round for arbitrary
> > rounding modes.
> > Any preferences how to implement this?
> 
> In the inexact case but where the power of 2 is representable, you could 
> always handle it as < min rather than <= min-1 - although computing the 
> actual nextafter value based on the precision of the floating-point type 
> shouldn't be hard and would allow <= to be used everywhere.

Here is an incremental implementation for the binary floating point formats
(and, it seems, the C/C++ bitfields are already handled properly by
Marek's patch).
I've tried
struct S { int i : 17; } s;

void
f1 (float x)
{
  s.i = x;
}

long long
f2 (float x)
{
  return x;
}

__int128_t
f3 (long double x)
{
  return x;
}

__int128_t
f4 (float x)
{
  return x;
}

__uint128_t
f5 (float x)
{
  return x;
}

long long
f6 (long double x)
{
  return x;
}

on both x86_64 and ppc64 -mlong-double-128 and manually inspected the
numbers, and they looked OK.  For ppc64, f6 was using:
u<= { 9223372036854775808.0 + 0.0 }
u>= {-9223372036854775808.0 + -1.0 }
and f3
u<= { 170141183460469231731687303715884105728.0 + 0.0 }
u>= { -170141183460469231731687303715884105728.0 + -4194304.0 }
which I believe is correct.  Of course the above needs to be transformed
into two real testcases that will actually test that valid in-range values
don't complain and out of range do, will leave that to Marek.
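
Spelling out the arithmetic these constants encode (a summary of the
reasoning in this thread, assuming binary floating-point formats):
converting a value x to an N-bit signed integer is in range iff

  \[ -2^{N-1} - 1 < x < 2^{N-1} \]

The upper bound 2^{N-1} is always exactly representable in a binary format.
The lower bound -2^{N-1} - 1 needs an N-bit significand; when the format is
narrower, that bound itself is not representable, but the spacing of
representable values at that magnitude then exceeds 1, so comparing against
the nearest representable value below -2^{N-1} (here -2^{127} - 2^{22}, i.e.
the -4194304.0 adjustment above, for the 106-bit IBM double-double format)
classifies every representable input exactly as the ideal bound would.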

> > For _Decimal*, no idea unfortunately, perhaps for the first iteration
> > ubsan should ignore decimal to int conversions.
> 
> Yes, that seems reasonable.  (Computing the exact max+1 or min-1 as an 
> MPFR value and then using mpfr_snprintf (then decimal_real_from_string) 
> would be one way of converting to decimal with a controlled rounding 
> direction.)

If prec < HOST_BITS_PER_WIDE_INT, then we can just snprintf
normally, otherwise yes, we could e.g. use
char *buf = XALLOCAVEC (char, prec / 2);
mpfr_t m;
mpfr_init2 (m, prec + 2);
mpfr_set_ui_2exp (m, 1, prec - !uns_p);
mpfr_snprintf (buf, prec / 2, "%Rf", m);
// buf should be TYPE_MAX_VALUE + 1.0 here?
if (!uns_p)
{
mpfr_set_si_2exp (m, -1, prec - 1);
mpfr_sub_ui (m, m, 1, GMP_RNDN);
mpfr_snprintf (buf, prec / 2, "%Rf", m);
// buf should be TYPE_MIN_VALUE - 1.0 here?
}
But I think we can't use decimal_real_from_string, we'd need a variant
of that function that would allow specification of the rounding mode
and which of the 3 _DecimalN types it is, and supposedly
  decNumber dn;
  decContext set;
  decContextDefault (&set, DEC_INIT_DECIMAL{32,64,128});
  set.traps = 0;
  set.round = {DEC_ROUND_CEILING,DEC_ROUND_FLOOR};
  decNumberFromString (&dn, s, &set);
then if not DEC_INIT_DECIMAL128 supposedly convert there and back
to DEC_INIT_DECIMAL128 and then convert to REAL_FORMAT.
But my _Decimal knowledge is very limited, so I'll leave that to
the DFP folks (unless Marek is interested).

--- gcc/convert.c
+++ gcc/convert.c
@@ -851,6 +851,8 @@
  expr = save_expr (expr);
  tree check = ubsan_instrument_float_cast (loc, type, expr);
  expr = build1 (FIX_TRUNC_EXPR, type, expr);
+ if (check == NULL)
+   return expr;
  return fold_build2 (COMPOUND_EXPR, TREE_TYPE (expr), check, expr);
}
   else
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -903,17 +903,62 @@
 ubsan_instrument_float_cast (location_t loc, tree type, tree expr)
 {
   tree expr_type = TREE_TYPE (expr);
-  tree t, tt, fn;
+  tree t, tt, fn, min, max;
+  enum machine_mode mode = TYPE_MODE (expr_type);
+  int prec = TYPE_PRECISION (type);
+  bool uns_p = TYPE_UNSIGNED (type);
 
-  tree min = TYPE_MIN_VALUE (type);
-  tree max = TYPE_MAX_VALUE (type);
-  /* Add/subtract 1.0 so we can avoid truncating the value of EXPR.  */
-  min = fold_build2 (MINUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, min),
-build_one_cst (expr_type));
-  max = fold_build2 (PLUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, max),
-build_one_cst (expr_type));
+  /* Float to integer conversion first truncates toward zero, s

Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Jakub Jelinek
On Thu, May 15, 2014 at 08:37:07PM +0200, Marek Polacek wrote:
> > And don't underestimate the *usefulness* of this - if you don't have the
> > resources to do a ubsan bootstrap, download mine from last night
> > (x86_64-linux-gnu): http://moene.org/~toon/gcc-tests.log.gz
> > 
> > [ I hope there is a way to discard color codings when writing error messages
> > to a file, ugh ]
> 
> Sure, -fdiagnostics-color=auto "means to use color only when the
> standard error is a terminal", or -fdiagnostics-color=never to turn it
> off completely (testsuite uses the latter).

I guess Toon meant that there is no easy way? to get rid of the color
stuff in libsanitizer messages.

Jakub


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Toon Moene

On 05/15/2014 09:10 PM, Jakub Jelinek wrote:


On Thu, May 15, 2014 at 08:37:07PM +0200, Marek Polacek wrote:



Sure, -fdiagnostics-color=auto "means to use color only when the
standard error is a terminal", or -fdiagnostics-color=never to turn it
off completely (testsuite uses the latter).


I guess Toon meant that there is no easy way? to get rid of the color
stuff in libsanitizer messages.


Sure. The point is that the testsuite runs only write to a *file*, so 
why should I get color-coded error messages like this:


ESC[1m/home/toon/compilers/trunk/gcc/config/i386/i386.c:6577:60:ESC[1mESC[31m 
runtime error: ESC[1mESC[0mESC[1mload of value 32763, which is not a 
valid value for type 'x86_64_reg_class'ESC[1mESC[0m


?

It makes precise grepping needlessly hard ...

Otherwise, thanks very much for this work - definitely appreciated !

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH] Add support for -fno-sanitize-recover and -fsanitize-undefined-trap-on-error (PR sanitizer/60275)

2014-05-15 Thread Marek Polacek
On Thu, May 15, 2014 at 09:10:55PM +0200, Jakub Jelinek wrote:
> I guess Toon meant that there is no easy way? to get rid of the color
> stuff in libsanitizer messages.

Yikes.  I think there's no way yet; UBSAN_OPTIONS envvar, similar to
e.g. ASAN_OPTIONS, isn't supported yet it seems.
Marek


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Jeff Law

On 05/15/14 12:41, Mike Stump wrote:

On May 15, 2014, at 10:13 AM, Jeff Law  wrote:

I've poked at the scheduler several times to do similar stuff, but
was never really satisfied with the results and never tried to
polish those prototypes into something worth submitting.


What was lacking?  The cleanliness of the patch or the, it didn’t win
me more than 0.01% code improvement so I never submitted it?

Cleanliness and applicability of my implementation.

For fmpyadd/fmpysub on the PA, my recollection was that the right way to 
go was to detect when both were in the ready queue at the same time, 
then resort the queue to have them issue consecutively.  We didn't have 
the necessary hooks to have target dependent code examine and rearrange 
the queues back when I looked at this.  By the time we had those 
capabilities, the PA8000 class processors had been released and those 
instructions were considered bad so I never came back to this.


For the memory optimizations, IIRC, the dependencies keep them from 
getting into the ready queue at the same time.  Thus it's significantly 
harder to get them to issue consecutively when you've got an issue rate > 1.





Trivial to do with my patch, just change the sort key to arrange that
to happen, however, if you want multiple support and this, then, you
need to recognize store_m / load from a part of that, which would be
kinda annoying.
But if you've got an issue rate > 1, then it's a lot less likely you'll 
get the store/load pairing up the way you want.


Arguably one could throttle down the issue rate for these scheduler-like 
passes which makes that problem go away.


Jeff


[patch] libstdc++/61143 make unordered containers usable after move

2014-05-15 Thread François Dumont

Hi

Here is a proposal to fix PR 61143.  As explained in the PR, I 
finally prefer to leave the container in a valid state after a move, that is 
to say with a non-zero number of buckets, namely 1.  This bucket is stored 
directly in the container rather than allocated, so the move operations stay 
noexcept qualified.  With this change we could even make the default 
constructor noexcept, but I don't think that has much interest.
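
(For illustration, roughly the scenario the new 61143.cc test presumably
exercises; this is my sketch, not the actual testsuite file:)

#include <unordered_set>
#include <cassert>
#include <utility>

int main()
{
  std::unordered_set<int> a{1, 2, 3};
  std::unordered_set<int> b = std::move(a);  // noexcept, no allocation

  // The moved-from container must remain usable: with the single
  // non-allocated bucket there is no zero-bucket modulus to trip over.
  a.insert(4);
  assert(a.count(4) == 1);
  assert(b.size() == 3);
}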


2014-05-15  François Dumont  

PR libstdc++/61143
* include/bits/hashtable.h: Fix move semantic to leave hashtable in a
usable state.
* testsuite/23_containers/unordered_set/61143.cc: New.

    I ran the unordered containers tests for the moment with no error.  I 
will run the others before committing.


François
Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 210479)
+++ include/bits/hashtable.h	(working copy)
@@ -316,14 +316,37 @@
   size_type			_M_element_count;
   _RehashPolicy		_M_rehash_policy;
 
+  // A single bucket used when only need 1 bucket. After move the hashtable
+  // is left with only 1 bucket which is not allocated so that we can have a
+  // noexcept move constructor.
+  // Note that we can't leave hashtable with 0 bucket without adding
+  // numerous checks in the code to avoid 0 modulus.
+  __bucket_type		_M_single_bucket;
+
   __hashtable_alloc&
   _M_base_alloc() { return *this; }
 
-  using __hashtable_alloc::_M_deallocate_buckets;
+  __bucket_type*
+  _M_allocate_buckets(size_type __n)
+  {
+	if (__builtin_expect(__n == 1, false))
+	  return &_M_single_bucket;
 
+	return __hashtable_alloc::_M_allocate_buckets(__n);
+  }
+
   void
+  _M_deallocate_buckets(__bucket_type* __bkts, size_type __n)
+  {
+	if (__builtin_expect(__bkts == &_M_single_bucket, false))
+	  return;
+
+	__hashtable_alloc::_M_deallocate_buckets(__bkts, __n);
+  }
+
+  void
   _M_deallocate_buckets()
-  { this->_M_deallocate_buckets(_M_buckets, _M_bucket_count); }
+  { _M_deallocate_buckets(_M_buckets, _M_bucket_count); }
 
   // Gets bucket begin, deals with the fact that non-empty buckets contain
   // their before begin node.
@@ -703,11 +726,7 @@
 
   size_type
   erase(const key_type& __k)
-  {
-	if (__builtin_expect(_M_bucket_count == 0, false))
-	  return 0;
-	return _M_erase(__unique_keys(), __k);
-  }
+  { return _M_erase(__unique_keys(), __k); }
 
   iterator
   erase(const_iterator, const_iterator);
@@ -768,7 +787,7 @@
   _M_rehash_policy()
 {
   _M_bucket_count = _M_rehash_policy._M_next_bkt(__bucket_hint);
-  _M_buckets = this->_M_allocate_buckets(_M_bucket_count);
+  _M_buckets = _M_allocate_buckets(_M_bucket_count);
 }
 
   template_M_allocate_buckets(_M_bucket_count);
+	_M_buckets = _M_allocate_buckets(_M_bucket_count);
 	__try
 	  {
 	for (; __f != __l; ++__f)
@@ -833,9 +852,9 @@
 	  {
 		// Replacement allocator cannot free existing storage.
 		this->_M_deallocate_nodes(_M_begin());
-		if (__builtin_expect(_M_bucket_count != 0, true))
-		  _M_deallocate_buckets();
-		_M_reset();
+		_M_before_begin._M_nxt = nullptr;
+		_M_deallocate_buckets();
+		_M_buckets = nullptr;
 		std::__alloc_on_copy(__this_alloc, __that_alloc);
 		__hashtable_base::operator=(__ht);
 		_M_bucket_count = __ht._M_bucket_count;
@@ -867,7 +886,7 @@
 	if (_M_bucket_count != __ht._M_bucket_count)
 	  {
 	__former_buckets = _M_buckets;
-	_M_buckets = this->_M_allocate_buckets(__ht._M_bucket_count);
+	_M_buckets = _M_allocate_buckets(__ht._M_bucket_count);
 	_M_bucket_count = __ht._M_bucket_count;
 	  }
 	else
@@ -885,8 +904,7 @@
 		  [&__roan](const __node_type* __n)
 		  { return __roan(__n->_M_v()); });
 	if (__former_buckets)
-	  this->_M_deallocate_buckets(__former_buckets,
-	  __former_bucket_count);
+	  _M_deallocate_buckets(__former_buckets, __former_bucket_count);
 	  }
 	__catch(...)
 	  {
@@ -917,7 +935,7 @@
   {
 	__bucket_type* __buckets = nullptr;
 	if (!_M_buckets)
-	  _M_buckets = __buckets = this->_M_allocate_buckets(_M_bucket_count);
+	  _M_buckets = __buckets = _M_allocate_buckets(_M_bucket_count);
 
 	__try
 	  {
@@ -964,8 +982,9 @@
 _M_reset() noexcept
 {
   _M_rehash_policy._M_reset();
-  _M_bucket_count = 0;
-  _M_buckets = nullptr;
+  _M_bucket_count = 1;
+  _M_buckets = &_M_single_bucket;
+  _M_single_bucket = nullptr;
   _M_before_begin._M_nxt = nullptr;
   _M_element_count = 0;
 }
@@ -980,12 +999,16 @@
 _M_move_assign(_Hashtable&& __ht, std::true_type)
 {
   this->_M_deallocate_nodes(_M_begin());
-  if (__builtin_expect(_M_bucket_count != 0, true))
-	_M_deallocate_buckets();
-
+  _M_deallocate_buckets();
   __hashtable_base::operator=(std::move(__ht));
   _M_rehash_policy = __ht._M_rehash_policy;
-  _M_bu

Re: [C++ Patch/RFC] PR 58753 & 58930

2014-05-15 Thread Paolo Carlini
.. in case it would be useful to better analyze the issue and my 
tentative fix, I ran the C++ and C++ runtime testsuite with an 
appropriate gcc_assert, and the added DIRECT_LIST_INIT_P (init) case 
matters only for the existing cpp1y/pr58708.C (besides the new 
testcases, of course).


Also, in case it wasn't obvious in the other message, comparing 
c++/58930 as-is to the non-template version, it's clear that with the 
patchlet applied, both lead to similar TARGET_EXPRs from the initial 
CONSTRUCTOR produced by cp_parser_initializer.


Paolo.


Re: [patch] libstdc++/61143 make unordered containers usable after move

2014-05-15 Thread Paolo Carlini

Hi,

On 05/15/2014 10:20 PM, François Dumont wrote:

Hi

Here is a proposal to fix PR 61143. As explained in the PR I 
finally prefer to leave the container in a valid state that is to say 
with a non zero number of bucket, that is to say 1, after a move. This 
bucket is stored directly in the container so not allocated to leave 
the move operations noexcept qualified. With this evolution we could 
even make the default constructor noexcept but I don't think it has 
any interest.
Well, FWIW, in my opinion (I didn't follow the issue in detail) it 
would be a very nice improvement, not only because of the noexcept, but 
because of the avoided dynamic memory allocation, and thus performance.  I'm 
sure many other implementors agree ;)


Paolo.


Re: [PATCH] offline gcda profile processing tool

2014-05-15 Thread Jan Hubicka
> Hi, Honza,
> 
> Please find the new patch in the attachment. This patch integrated
> your earlier suggestions. The noticeable changes are:
> (1) build specialized object for libgcov-driver.c and libgcov-merge.c
> and link into gcov-tool, rather including the source file.
> (2) split some gcov counter definition code to gcov-counter.def to
> avoid code duplication.
> (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
> code in libgcov-merge.c with minimal change.
> 
> This patch does not address the suggestion of creating a new directory
> for libgcov. I agree with you that's a much better
> and cleaner structure we should go for. We can do that in follow-up patches.

Yep, let's do this incrementally.  Thanks!
> 
> I tested this patch with boostrap and profiledbootstrap. Other tests
> are on-going.
> 
> Thanks,
> 
> -Rong
> 2014-05-01  Rong Xu  
> 
>   * gcc/gcov-io.c (gcov_position): Make available to gcov-tool.
>   (gcov_is_error): Ditto.
>   (gcov_read_string): Ditto.
>   (gcov_read_sync): Ditto.
>   * gcc/gcov-io.h: Move counter defines to gcov-counter.def.
>   * gcc/gcov-dump.c (tag_counters): Use gcov-counter.def.
>   * gcc/coverage.c: Ditto.
>   * gcc/gcov-tool.c: Offline gcda profile processing tool.
> (unlink_gcda_file): Remove one gcda file.
>   (unlink_profile_dir): Remove gcda files from the profile path.
>   (profile_merge): Merge two profiles in directory.
>   (print_merge_usage_message): Print merge usage.
>   (merge_usage): Print merge usage and exit.
>   (do_merge): Driver for profile merge sub-command.
>   (profile_rewrite): Rewrite profile.
>   (print_rewrite_usage_message): Print rewrite usage.
>   (rewrite_usage): Print rewrite usage and exit.
>   (do_rewrite): Driver for profile rewrite sub-command.
>   (print_usage): Print gcov-info usage and exit.
>   (print_version): Print gcov-info version.
>   (process_args): Process arguments.
>   (main): Main routine for gcov-tool.
>   * gcc/Makefile.in: Build and install gcov-tool.
>   * gcc/gcov-counter.def: New file split from gcov-io.h.
>   * libgcc/libgcov-driver.c (gcov_max_filename): Make available
> to gcov-tool.
>   * libgcc/libgcov-merge.c (__gcov_merge_add): Replace
> gcov_read_counter() with a Macro.
>   (__gcov_merge_ior): Ditto.
>   (__gcov_merge_time_profile): Ditto.
>   (__gcov_merge_single): Ditto.
>   (__gcov_merge_delta): Ditto.
>   * libgcc/libgcov-util.c (void gcov_set_verbose): Set the verbose flag
> in the utility functions.
>   (set_fn_ctrs): Utility function for reading gcda files to in-memory
> gcov_list object link lists.
>   (tag_function): Ditto.
>   (tag_blocks): Ditto.
>   (tag_arcs): Ditto.
>   (tag_lines): Ditto.
>   (tag_counters): Ditto.
>   (tag_summary): Ditto.
>   (read_gcda_finalize): Ditto.
>   (read_gcda_file): Ditto.
>   (ftw_read_file): Ditto.
>   (read_profile_dir_init): Ditto.
>   (gcov_read_profile_dir): Ditto.
>   (gcov_read_counter_mem): Ditto.
>   (gcov_get_merge_weight): Ditto.
>   (merge_wrapper): A wrapper function that calls merging handler.
>   (gcov_merge): Merge two gcov_info objects with weights.
>   (find_match_gcov_info): Find the matched gcov_info in the list.
>   (gcov_profile_merge): Merge two gcov_info object lists.
>   (__gcov_add_counter_op): Process edge profile counter values.
>   (__gcov_ior_counter_op): Process IOR profile counter values.
>   (__gcov_delta_counter_op): Process delta profile counter values.
>   (__gcov_single_counter_op): Process single  profile counter values.
>   (fp_scale): Callback function for float-point scaling.
>   (int_scale): Callback function for integer fraction scaling. 
>   (gcov_profile_scale): Scaling profile counters.
>   (gcov_profile_normalize): Normalize profile counters.
>   * libgcc/libgcov.h: Add headers and macros for gcov-tool use.
> (GCOV_GET_COUNTER): New.
> (GCOV_GET_COUNTER_TARGET): Ditto.
> (struct gcov_info): Make the functions field mutable in gcov-tool
> compilation.
> 
> Index: gcc/gcov-io.c
> ===
> --- gcc/gcov-io.c (revision 209981)
> +++ gcc/gcov-io.c (working copy)
> @@ -64,7 +64,11 @@ GCOV_LINKAGE struct gcov_var
>  } gcov_var;
>  
>  /* Save the current position in the gcov file.  */
> -static inline gcov_position_t
> +/* We need to expose this function when compiling for gcov-tool.  */
> +#ifndef IN_GCOV_TOOL
> +static inline
> +#endif
> +gcov_position_t
>  gcov_position (void)
>  {
>gcc_assert (gcov_var.mode > 0); 
> @@ -72,7 +76,11 @@ gcov_position (void)
>  }
>  
>  /* Return nonzero if the error flag is set.  */
> -static inline int 
> +/* We need to expose this function when compiling for gcov-tool.  */
> +#ifndef IN_GCOV_TOOL
>

Re: [patch] libstdc++/61143 make unordered containers usable after move

2014-05-15 Thread Jonathan Wakely

On 15/05/14 22:20 +0200, François Dumont wrote:

Hi

   Here is a proposal to fix PR 61143. As explained in the PR I 
finally prefer to leave the container in a valid state that is to say 
with a non zero number of bucket, that is to say 1, after a move. This 
bucket is stored directly in the container so not allocated to leave 
the move operations noexcept qualified.


Thanks for fixing this, I like the direction very much. I have a few
comments below ...

With this evolution we could 
even make the default constructor noexcept but I don't think it has 
any interest.


I tend to agree with Paolo that a noexcept default constructor might
be useful, but let's fix the bug first and consider that later.


Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h(revision 210479)
+++ include/bits/hashtable.h(working copy)
@@ -316,14 +316,37 @@
  size_type _M_element_count;
  _RehashPolicy _M_rehash_policy;

+  // A single bucket used when only need 1 bucket. After move the hashtable
+  // is left with only 1 bucket which is not allocated so that we can have 
a
+  // noexcept move constructor.
+  // Note that we can't leave hashtable with 0 bucket without adding
+  // numerous checks in the code to avoid 0 modulus.
+  __bucket_type_M_single_bucket;


Does this get initialized in the constructors?
Would it make sense to give it an initializer?

 __bucket_type  _M_single_bucket = nullptr;


@@ -980,12 +999,16 @@
_M_move_assign(_Hashtable&& __ht, std::true_type)
{
  this->_M_deallocate_nodes(_M_begin());
-  if (__builtin_expect(_M_bucket_count != 0, true))
-   _M_deallocate_buckets();
-
+  _M_deallocate_buckets();
  __hashtable_base::operator=(std::move(__ht));
  _M_rehash_policy = __ht._M_rehash_policy;
-  _M_buckets = __ht._M_buckets;
+  if (__builtin_expect(__ht._M_buckets != &__ht._M_single_bucket, true))
+   _M_buckets = __ht._M_buckets;


What is the value of this->_M_single_bucket now?

Should it be set to nullptr, if only to help debugging?


+  if (__builtin_expect(__ht._M_buckets == &__ht._M_single_bucket, false))


This check appears in a few places, I wonder if it is worth creating a
private member function to hide the details:

 bool _M_moved_from() const noexcept
 {
   return __builtin_expect(_M_buckets == &_M_single_bucket, false);
 }

Then just test if (__ht._M_moved_from())

Usually I would think the __builtin_expect should not be inside the
member function, so the caller decides what the expected result is,
but I think in all cases the result is expected to be false. That
matches how move semantics are designed: the object that gets moved
from is expected to be going out of scope, and so will be reused in a
minority of cases.


@@ -1139,7 +1170,14 @@
{
  if (__ht._M_node_allocator() == this->_M_node_allocator())
{
- _M_buckets = __ht._M_buckets;
+ if (__builtin_expect(__ht._M_buckets == &__ht._M_single_bucket, 
false))


This could be if (__ht._M_moved_from())


@@ -1193,11 +1231,21 @@
  std::swap(_M_bucket_count, __x._M_bucket_count);
  std::swap(_M_before_begin._M_nxt, __x._M_before_begin._M_nxt);
  std::swap(_M_element_count, __x._M_element_count);
+  std::swap(_M_single_bucket, __x._M_single_bucket);

+  // Fix buckets if any is pointing to its single bucket that can't be
+  // swapped.
+  if (_M_buckets == &__x._M_single_bucket)
+   _M_buckets = &_M_single_bucket;
+
+  if (__x._M_buckets == &_M_single_bucket)
+   __x._M_buckets = &__x._M_single_bucket;
+


Does this do more work than necessary, swapping the _M_buckets
members, then updating them again?

How about removing the std::swap(_M_buckets, __x._M_buckets) above and
doing (untested):

 if (this->_M_moved_from())
   {
 if (__x._M_moved_from())
   _M_buckets = &_M_single_bucket;
 else
   _M_buckets = __x._M_buckets;
 __x._M_buckets = &__x._M_single_bucket;
   }
 else if (__x._M_moved_from())
   {
 __x._M_buckets = _M_buckets;
 _M_buckets = &_M_single_bucket;
   }
 else
   std::swap(_M_buckets, __x._M_buckets);

Is that worth it?  I'm not sure.


Index: testsuite/23_containers/unordered_map/allocator/copy_assign.cc
===
--- testsuite/23_containers/unordered_map/allocator/copy_assign.cc  
(revision 210479)
+++ testsuite/23_containers/unordered_map/allocator/copy_assign.cc  
(working copy)


The changes to this file are not listed in the ChangeLog entry, and we
don't want writes to stdout in the testsuite.
Did you intend to include this in the patch?


Index: testsuite/Makefile.in
===
--- testsuite/Makefile.in   (revision 210479)
+++ testsuite/Makefile.in   (working copy)


Same question f

Re: [PATCH] AutoFDO patch for trunk

2014-05-15 Thread Jan Hubicka
> Hi,
> 
> I'm planning to port the AutoFDO patch upstream. Attached is the
> prepared patch. You can also find the patch in
> http://codereview.appspot.com/99010043
> 
> I've tested the patch with SPECCPU2006. For the CINT2006 benchmarks,
> the speedup comparison between O2, FDO and AutoFDO is as follows:
> 
> Reference: o2
> (1): auto_fdo
> (2): fdo
> 
>Benchmark Base:Reference(1)  (2)
> -
> spec/2006/int/C++/471.omnetpp 23.18   +3.11%   +5.09%
> spec/2006/int/C++/473.astar   21.15   +6.79%   +9.80%
> spec/2006/int/C++/483.xalancbmk   36.68  +11.56%  +14.47%
> spec/2006/int/C/400.perlbench 34.57   +6.59%  +18.56%
> spec/2006/int/C/401.bzip2 23.17   +0.95%   +2.49%
> spec/2006/int/C/403.gcc   32.33   +8.27%   +9.76%
> spec/2006/int/C/429.mcf   42.13   +4.72%   +5.23%
> spec/2006/int/C/445.gobmk 26.53   -1.39%   +0.05%
> spec/2006/int/C/456.hmmer 23.72   +7.12%   +7.87%
> spec/2006/int/C/458.sjeng 26.17   +4.65%   +6.04%
> spec/2006/int/C/462.libquantum57.23   +4.04%   +1.42%
> spec/2006/int/C/464.h264ref46.3   +1.07%   +8.97%
> 
> geometric mean+4.73%   +7.36%

The results are very nice indeed.  So basically it adds another 50% to
static estimation, which is not bad.

The patch is long and missing a ChangeLog.  I will add my comments, and
I think it would be great to break it up where possible - into some
infrastructure preparation patches and the actual reading infrastructure.

> Index: gcc/cfghooks.c
> ===
> --- gcc/cfghooks.c(revision 210180)
> +++ gcc/cfghooks.c(working copy)
> @@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir
>  
>fallthru = split_block_after_labels (bb);
>dummy = fallthru->src;
> +  dummy->count = 0;
> +  dummy->frequency = 0;
> +  fallthru->count = 0;
>bb = fallthru->dest;
>  
>/* Redirect back edges we want to keep.  */
> @@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir
>  
>if (redirect_edge_p (e))
>   {
> +   dummy->frequency += EDGE_FREQUENCY (e);
> +   dummy->count += e->count;
> +   fallthru->count += e->count;
> ei_next (&ei);
> continue;
>   }
>  
> -  dummy->frequency -= EDGE_FREQUENCY (e);
> -  dummy->count -= e->count;
> -  if (dummy->frequency < 0)
> - dummy->frequency = 0;
> -  if (dummy->count < 0)
> - dummy->count = 0;
> -  fallthru->count -= e->count;
> -  if (fallthru->count < 0)
> - fallthru->count = 0;
> -

Can you elaborate on why these changes are needed?  They look unrelated,
so is it a bug in the forwarder block profile updating?
I do not see why the source of the edge needs to have its profile cleaned.
> Index: gcc/loop-unroll.c
> ===
> --- gcc/loop-unroll.c (revision 210180)
> +++ gcc/loop-unroll.c (working copy)
> @@ -998,6 +998,13 @@ decide_unroll_runtime_iterations (struct loop *loo
>return;
>  }
>  
> +  /* In AutoFDO, the profile is not accurate. If the calculated trip count
> + is larger than the header count, then the profile is not accurate
> + enough to make correct unroll decisions. */
> +  if (flag_auto_profile
> +  && expected_loop_iterations (loop) > loop->header->count)
> +return;

This is another independent change.  It does make sense, but I would
prefer to have it in a separate function (expected_loop_iterations_reliable_p)
rather than inline in the code.

There are other loop optimizations that are driven by this, and they should
consistently use this predicate.
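
An untested sketch of what such a predicate might look like, simply wrapping
the check from the posted patch (the function name is only the suggestion
above; its exact placement and interface are assumptions, not existing code):

  /* Sketch only: return true if the profile-driven iteration estimate of
     LOOP looks trustworthy enough to drive unrolling decisions.  The
     auto-FDO condition mirrors the check from the posted patch.  */
  static bool
  expected_loop_iterations_reliable_p (struct loop *loop)
  {
    if (flag_auto_profile
        && expected_loop_iterations (loop) > loop->header->count)
      return false;

    return true;
  }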

In fact the current -fprofile-use enables the -funroll-loops flag, which enables
all unrolling independently of the reliability of the profile.  I suppose we need
to improve this scheme and have something like an unroll-loop-with-reliable-profile
path that would be the default for profile feedback optimization.

I also do not think your check makes a lot of sense.  You probably want to compare
the loop header count with the loop preheader count or so - or is that not reliable
enough for auto-FDO profiles?

In any case, let's handle this separately from the main AutoFDO patches (and not
forget about it).

> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c   (revision 210180)
> +++ gcc/dwarf2out.c   (working copy)
> @@ -2481,6 +2481,40 @@ const struct gcc_debug_hooks dwarf2_debug_hooks =
>1,/* start_end_main_source_file */
>TYPE_SYMTAB_IS_DIE/* tree_type_symtab_field */
>  };
> +
> +const struct gcc_debug_hooks auto_profile_debug_hooks =
> +{
> +  debug_nothing_charstar,
> +  debug_nothing_charstar,
> +  debug_nothing_void,
> +  debug_nothing_int

RE: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

2014-05-15 Thread Robert Suchanek
Ping.

From: Robert Suchanek
Sent: 14 May 2014 14:24
To: Richard Sandiford; Matthew Fortune
Cc: Vladimir Makarov; gcc-patches@gcc.gnu.org; Kyrill Tkachov
Subject: RE: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

Hi Richard,

Are you working on the solution to fix the breakage? I'm about
to look into this and wanted to find out how far we got with this.

Regards,
Robert

> -Original Message-
> From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
> Sent: 10 May 2014 19:44
> To: Matthew Fortune
> Cc: Vladimir Makarov; Robert Suchanek; gcc-patches@gcc.gnu.org; Kyrill
> Tkachov
> Subject: Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
>
> Thanks for looking at this.
>
> Matthew Fortune  writes:
> >> > Hi all,
> >> > This caused some testsuite failures on arm:
> >> > FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
> >> > FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
> >> > FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
> >> > FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
> >> >
> >> >  From the vfp-ldmdbd.c test this patch changed the codegen from:
> >> >  fldmdbdr5!, {d7}
> >> >
> >> > into
> >> >  subr5, r5, #8
> >> >  flddd7, [r5]
> >> >
> >> > Which broke the test.
> >>
> >> Sorry for the breakage.  I've reverted the patch for now and will send a
> >> fixed version when I have time.
> >
> > The problem appears to lie with the new satisfies_memory_constraint_p
> > which is passing just the address to valid_address_p but the constraint
> > is a memory constraint (which wants the mem to be retained).
>
> Yeah.
>
> > One option is to re-construct the MEM using the address_info provided
> > to valid_address_p. This has mode, address space and whether it was
> > actually a MEM to start with so there is enough information.
>
> We don't want to create garbage rtl though.  The substitution happens
> in-place, so the routine does temporarily turn the original MEM into the
> eliminated form.
>
> My first reaction after seeing the bug was to pass the operand in as
> another parameter to valid_address_p.  That made the interface a bit
> ridiculous though, so I didn't post it and reverted the original patch
> instead.
>
> I think a cleaner way of doing it would be to have helper functions
> that switch in and out of the eliminated form, storing the old form
> in fields of a new structure (either separate from address_info,
> or a local inheritance of it).  We probably also want to have arrays
> of address_infos, one for each operand, so that we don't analyse the
> same address too many times during the same insn.
>
> > Another issue noticed while looking at this:
> > process_address does not seem to make an attempt to check for
> > EXTRA_MEMORY_CONSTRAINT even though it does decompose memory addresses.
> > For the lea address case then the address is checked with valid_address_p.
> > This seems inconsistent, is it intentional?
>
> Yeah, I think so.  Memory constraints are different in two main ways:
>
> (a) it's obvious from the MEM what the mode and address space of the
> access actually are.  legitimate_address_p is all about the most
> general address, so in practice no memory constraint should rely
> on accepting more than legitimate_address_p does.
>
> (b) it's useful for one pattern to have several alternatives with
> different memory constraints (e.g. "ZS", "ZT" and "m" in the 32-bit
> MIPS move patterns).  There isn't really anything special about the
> first alternative.
>
> IMO it'd be good to get rid of this special handling for extra address
> constraints, but I've no idea how easy that would be.
>
> Thanks,
> Richard


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-15 Thread Mike Stump
On May 15, 2014, at 1:01 PM, Jeff Law  wrote:
> For the memory optimizations, IIRC, the dependencies keep them from getting 
> into the ready queue at the same time.  Thus it's significantly harder to get 
> them to issue consecutively when you've got an issue rate > 1.

> But if you've got an issue rate > 1, then it's a lot less likely you'll get 
> the store/load pairing up the way you want.

Ah, yeah, on second thought, that won’t work if they are not ready together…

I sort the ready queue:

first ready set:
load
store
everything else

next ready set:
load store
everything else

Now, it might work if there are no other instructions and no loads in the next 
set, and so on, but that would be an accident.  The utility is only in 
instructions that are ready together, though; tick boundaries and issue rates 
don’t matter, as I don’t care about ticks or issue rates.  The only 
consideration on order is my ordering operator, and once ordered, the target is 
free to combine as it sees fit.  So, for store/load optimization, my patch is 
rather useless.

Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-15 Thread Joseph S. Myers
On Thu, 15 May 2014, Jakub Jelinek wrote:

> But I think we can't use decimal_real_from_string, we'd need a variant
> of that function that would allow specification of the rounding mode

My point is that you can use "%.*RUe" or "%.*RDe" formats (for max and min 
respectively), with an appropriate precision, and let MPFR do the rounding 
to an appropriate number of decimal digits in the right direction (to 
produce a value that's exactly representable in the relevant DFP type, as 
long as it's in range).
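
For illustration, a minimal standalone sketch of that approach using MPFR's
formatted output (the helper name and the use of mpfr_snprintf are my own,
not taken from the patch):

  #include <mpfr.h>
  #include <stdio.h>

  /* Sketch only: print X rounded to DIGITS digits after the decimal point,
     upward (RU) for a maximum bound and downward (RD) for a minimum bound,
     letting MPFR produce the correctly rounded decimal string.  */
  static void
  print_bounds (mpfr_srcptr x, int digits)
  {
    char buf[128];

    mpfr_snprintf (buf, sizeof buf, "%.*RUe", digits, x);
    printf ("rounded up:   %s\n", buf);

    mpfr_snprintf (buf, sizeof buf, "%.*RDe", digits, x);
    printf ("rounded down: %s\n", buf);
  }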

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

2014-05-15 Thread Richard Sandiford
Robert Suchanek  writes:
> Are you working on the solution to fix the breakage? I'm about
> to look into this and wanted to find out how far we got with this.

You mean the "cleaner way" I suggested, or something else?
If you want to have a go then feel free.  Otherwise I'll try to get
to it over the weekend.

Richard


add dbgcnt and opt-info support for devirtualization

2014-05-15 Thread Xinliang David Li
Hi, debugging runtime bugs due to devirtualization can be hard for
very large C++ programs with complicated class hierarchies.  This patch
adds support for reporting this high-level transformation via
-fopt-info (instead of it being hidden inside the dump file) and the
ability to do a binary search with a cutoff.

Ok for trunk after build and test?

thanks,

David
Index: ChangeLog
===
--- ChangeLog   (revision 210479)
+++ ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2014-05-15  Xinliang David Li  
+
+   * cgraphunit.c (walk_polymorphic_call_targets): Add
+   dbgcnt and fopt-info support.
+   2014-05-15  Xinliang David Li  
+
+   * cgraphunit.c (walk_polymorphic_call_targets): Add
+   dbgcnt and fopt-info support.
+   * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
+   * ipa-devirt.c (ipa_devirt): Ditto.
+   * ipa.c (walk_polymorphic_call_targets): Ditto.
+   * gimple-fold.c (fold_gimple_assign): Ditto.
+   (gimple_fold_call): Ditto.
+   * dbgcnt.def: New counter.
+
 2014-05-15  Martin Jambor  
 
PR ipa/61085
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 210479)
+++ ipa-prop.c  (working copy)
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
 #include "ipa-utils.h"
 #include "stringpool.h"
 #include "tree-ssanames.h"
+#include "dbgcnt.h"
 
 /* Intermediate information about a parameter that is only useful during the
run of ipa_analyze_node and is not kept afterwards.  */
@@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
        fprintf (dump_file, "ipa-prop: Discovered direct call to non-function"
                 " in %s/%i, making it unreachable.\n",
                 ie->caller->name (), ie->caller->order);
+  else if (dump_enabled_p ())
+   {
+ location_t loc = gimple_location (ie->call_stmt);
+ dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+  "Discovered direct call to non-function in %s, "
+  "making it unreachable\n", ie->caller->name ());
+   }
  target = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
  callee = cgraph_get_create_node (target);
  unreachable = true;
@@ -2527,6 +2535,10 @@ ipa_make_edge_direct_to_target (struct c
}
   callee = cgraph_get_create_node (target);
 }
+
+  if (!dbg_cnt (devirt))
+return NULL;
+
   ipa_check_create_node_params ();
 
   /* We can not make edges to inline clones.  It is bug that someone removed
@@ -2547,6 +2559,13 @@ ipa_make_edge_direct_to_target (struct c
   else
fprintf (dump_file, "with uid %i\n", ie->lto_stmt_uid);
  }
+  if (dump_enabled_p ())
+{
+  location_t loc = gimple_location (ie->call_stmt);
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+  "Converting indirect call in %s to direct call to %s\n",
+  ie->caller->name (), callee->name ());
+}
   ie = cgraph_make_edge_direct (ie, callee);
   es = inline_edge_summary (ie);
   es->call_stmt_size -= (eni_size_weights.indirect_call_cost
Index: gimple-fold.c
===
--- gimple-fold.c   (revision 210479)
+++ gimple-fold.c   (working copy)
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa-address.h"
 #include "langhooks.h"
 #include "gimplify-me.h"
+#include "dbgcnt.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
@@ -389,10 +390,23 @@ fold_gimple_assign (gimple_stmt_iterator
if (final && targets.length () <= 1)
  {
tree fndecl;
+   if (!dbg_cnt (devirt))
+ return val;
+
if (targets.length () == 1)
  fndecl = targets[0]->decl;
else
  fndecl = builtin_decl_implicit (BUILT_IN_UNREACHABLE);
+   if (dump_enabled_p ())
+ {
+   location_t loc = gimple_location (stmt);
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+"Resolving virtual function address "
+"reference to function %s\n",
+targets.length () == 1
+? targets[0]->name ()
+: "unreachable function"); 
+ }
val = fold_convert (TREE_TYPE (val), fndecl);
STRIP_USELESS_TYPE_CONVERSION (val);
return val;
@@ -1124,9 +1138,18 @@ gimple_fold_call (gimple_stmt_iterator 

Re: [PATCH] AutoFDO patch for trunk

2014-05-15 Thread Jan Hubicka
> Index: gcc/auto-profile.c
> ===
> --- gcc/auto-profile.c(revision 0)
> +++ gcc/auto-profile.c(revision 0)
> @@ -0,0 +1,1584 @@
> +/* Calculate branch probabilities, and basic block execution counts.

Update the toplevel comment, too.

> +
> +/* The following routines implements AutoFDO optimization.
> +
> +   This optimization uses sampling profiles to annotate basic block counts
> +   and uses heuristics to estimate branch probabilities.
> +
> +   There are three phases in AutoFDO:
> +
> +   Phase 1: Read profile from the profile data file.
> + The following info is read from the profile datafile:
> + * string_table: a map between function name and its index.
> + * autofdo_source_profile: a map from function_instance name to
> +   function_instance. This is represented as a forest of
> +   function_instances.
> + * WorkingSet: a histogram of how many instructions are covered for a
> + given percentage of total cycles.
> +
> +   Phase 2: Early inline.
> + Early inline uses autofdo_source_profile to find if a callsite is:
> + * inlined in the profiled binary.
> + * callee body is hot in the profiling run.
> + If both condition satisfies, early inline will inline the callsite
> + regardless of the code growth.
> +
> +   Phase 3: Annotate control flow graph.
> + AutoFDO uses a separate pass to:
> + * Annotate basic block count
> + * Estimate branch probability
> +
> +   After the above 3 phases, all profile is readily annotated on the GCC IR.
> +   AutoFDO tries to reuse all FDO infrastructure as much as possible to make
> +   use of the profile. E.g. it uses existing mechanism to calculate the basic
> +   block/edge frequency, as well as the cgraph node/edge count.
> +*/

I suppose the phases can be made explicit to the PM, so early inliner can stay
a pass.
> +/* Store a string array, indexed by string position in the array.  */
> +class string_table {
> +public:
> +  static string_table *create ();
> +
> +  /* For a given string, returns its index.  */
> +  int get_index (const char *name) const;
> +
> +  /* For a given decl, returns the index of the decl name.  */
> +  int get_index_by_decl (tree decl) const;
> +
> +  /* For a given index, returns the string.  */
> +  const char *get_name (int index) const;

I wonder how these should work with LTO.  I suppose we can tell users not to use
LTO for the train run, but we should teach the table to ignore LTO-generated
suffixes as well as random seeds (like the coverage code already does).

What happens when two static functions have the same name?

> +/* Profile for all functions.  */
> +class autofdo_source_profile {
> +public:
> +  static autofdo_source_profile *create ()
> +{
> +  autofdo_source_profile *map = new autofdo_source_profile ();

I think we still want empty lines here.  Otherwise I trust you on following GCC 
C++ coding standards :)
> +  if (map->read ())
> + return map;
> +  delete map;
> +  return NULL;
> +}

> +/* Helper functions.  */
> +
> +/* Return the original name of NAME: strip the suffix that starts
> +   with '.'  */
> +
> +static const char *get_original_name (const char *name)
> +{
> +  char *ret = xstrdup (name);
> +  char *find = strchr (ret, '.');
> +  if (find != NULL)
> +*find = 0;
> +  return ret;
> +}

Stripping all suffixes will likely create conflicts for static functions, right?
An example like this will get you a conflict:
m ()
{
  void t (void)
  {
  }
  t ();
}
m2 ()
{
  void t2 (void)
  {
  }
  t2 ();
}

That is of course not that important given that nested functions are not a top
priority, but it would be nice to handle it gracefully.
I would definitely merge the logic here with the logic in gcov that already
knows how to skip random seeds and other cases that make file names diverge...
> +
> +/* Return the combined location, which is a 32bit integer in which
> +   higher 16 bits stores the line offset of LOC to the start lineno
> +   of DECL, The lower 16 bits stores the discrimnator.  */
> +
> +static unsigned
> +get_combined_location (location_t loc, tree decl)
> +{
> +  return ((LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl)) << 16);
> +}

What happens on overflows here? I guess they are possible - 65536 lines is not 
infinity
these days.
The discriminator is added later?
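
For what it is worth, a guarded variant could clamp the offset instead of
letting it spill into the discriminator bits; this is only a sketch of the
concern, not a proposed fix (the clamping policy and the DISCR parameter are
my assumptions):

  static unsigned
  get_combined_location_checked (location_t loc, tree decl, unsigned discr)
  {
    int offset = LOCATION_LINE (loc) - DECL_SOURCE_LINE (decl);

    /* The line-offset field is only 16 bits wide; clamp rather than
       letting an out-of-range offset wrap into the discriminator bits.  */
    if (offset < 0)
      offset = 0;
    else if (offset > 0xffff)
      offset = 0xffff;

    return ((unsigned) offset << 16) | (discr & 0xffff);
  }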
> +
> +int
> +string_table::get_index (const char *name) const
> +{
> +  if (name == NULL)
> +return -1;
> +  string_index_map::const_iterator iter = map_.find (name);
> +  if (iter == map_.end())
> +return -1;
> +  else
> +return iter->second;
> +}
> +
> +int
> +string_table::get_index_by_decl (tree decl) const

Comments missing here.
> +
> +/* Read the profile and create a function_instance with head count as
> +   HEAD_COUNT. Recursively read callsites to create nested function_instances
> +   too. STACK is used to track the recursive creation process.  */
> +
> +function_instance *
> +function_instance::read_func

avx runtime check

2014-05-15 Thread Mike Stump
This reorders the avx checks and gates on a target triplet check before 
compiling any code.

Ok?

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 40b5414..103a28a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
 # Return 1 if the target supports running AVX executables, 0 otherwise.
 
 proc check_effective_target_avx_runtime { } {
-if { [check_effective_target_avx]
-&& [check_avx_hw_available]
+if { [check_avx_hw_available]
+&& [check_effective_target_avx]
 && [check_avx_os_support_available] } {
return 1
 }



Re: [PATCH, libstdc++/61166] overflow when parse number in std::duration operator""

2014-05-15 Thread Ed Smith-Rowland

On 05/15/2014 03:03 PM, Jonathan Wakely wrote:

Here's a finished patch to simplify 

Tested x86_64-linux. Ed, any objection to this version?


This looks great, thanks!

Having done that, should we actually stop using it as suggested in the 
bug trail? ;-)




[PATCH, rs6000] Commited fix for PR target/61193, HTM intrinsic API incompatibility between Power and S390

2014-05-15 Thread Peter Bergner
The IBM XL team defined a set of HTM intrinsic functions that were supposed
to be API compatible across the XL and GCC compilers on both Power and S390.
PR61193 describes an issue where the functions that begin a transaction
are incompatible.  The Power intrinsics return non-zero on success, while
the S390 intrinsics return zero on success.

After discussing this with the XL compiler team, the S390 GCC team and
the Power GCC team, we have decided to leave the incompatibility between
Power and S390.  However, the XL and GCC compilers will be compatible
with each other when targeting the same processor target.  To mitigate
the incompatibility somewhat, we have decided to add a macro to the
powerpc*-linux's htmintrin.h file that defines what the "successful"
return status value of the __TM_simple_begin() and __TM_begin() intrinsic
function is.  This macro is already defined in the S390's htmintrin.h
header file and is used by the S390 to determine whether the transaction
was successfully started or not.  By adding the same macro on Power, we
allow users to write common code between Power and S390, even though our
successful return status values are different.

For example, the following code can be used on Power and S390, even
though the actual value returned by __TM_simple_begin() is different.

  if ((tx_state = __TM_simple_begin ()) == _HTM_TBEGIN_STARTED)
{
  /* Transaction State Initiated.  */
  ...
}
  else
{
  /* Transaction State Failed.  */
  ...
}

David approved this offline, so I'm committing this to mainline as revision
210486, as well as the 4.9 (210487) and 4.8 (210488) branches.

Peter

PR target/61193
* config/rs6000/htmxlintrin.h (_HTM_TBEGIN_STARTED): New define.
(__TM_simple_begin): Use it.
(__TM_begin): Likewise.

Index: gcc/config/rs6000/htmxlintrin.h
===
--- gcc/config/rs6000/htmxlintrin.h (revision 210485)
+++ gcc/config/rs6000/htmxlintrin.h (working copy)
@@ -46,12 +46,17 @@ extern "C" {

 typedef char TM_buff_type[16];

+/* Compatibility macro with s390.  This macro can be used to determine
+   whether a transaction was successfully started from the __TM_begin()
+   and __TM_simple_begin() intrinsic functions below.  */
+#define _HTM_TBEGIN_STARTED 1
+
 extern __inline long
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __TM_simple_begin (void)
 {
   if (__builtin_expect (__builtin_tbegin (0), 1))
-return 1;
+return _HTM_TBEGIN_STARTED;
   return 0;
 }

@@ -61,7 +66,7 @@ __TM_begin (void* const TM_buff)
 {
   *_TEXASRL_PTR (TM_buff) = 0;
   if (__builtin_expect (__builtin_tbegin (0), 1))
-return 1;
+return _HTM_TBEGIN_STARTED;
 #ifdef __powerpc64__
   *_TEXASR_PTR (TM_buff) = __builtin_get_texasr ();
 #else




[fortran-dev] Merge the trunk into the branch

2014-05-15 Thread Tobias Burnus
I have merged the trunk into the "fortran-dev" branch, which was quite 
painful as I got a lot of conflicts when merging libgo (!).  Committed as 
Rev. 210480 (up to Rev. 208756) and Rev. 210490 (up to current trunk).


Tobias


Re: avx runtime check

2014-05-15 Thread Richard Biener
On May 16, 2014 4:47:11 AM CEST, Mike Stump  wrote:
>This reorders the avx checks and gates on a target triplet check before
>compiling any code.

Can you explain why?

>Ok?
>
>diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
>index 40b5414..103a28a 100644
>--- a/gcc/testsuite/lib/target-supports.exp
>+++ b/gcc/testsuite/lib/target-supports.exp
>@@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
># Return 1 if the target supports running AVX executables, 0 otherwise.
> 
> proc check_effective_target_avx_runtime { } {
>-if { [check_effective_target_avx]
>-&& [check_avx_hw_available]
>+if { [check_avx_hw_available]
>+&& [check_effective_target_avx]
> && [check_avx_os_support_available] } {
>return 1
> }