date:20141209

Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2014-12-09 Thread Yangfei (Felix)

> On 28 November 2014 at 09:23, Yangfei (Felix)  wrote:
> > Hi,
> >   This patch converts vpmaxX & vpminX intrinsics to use builtin functions
> instead of the previous inline assembly syntax.
> >   Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
> testsuite of Christophe Lyon.
> >   OK for the trunk?
> 
> Hi Felix,   We know from experience that the advsimd intrinsics tend
> to be fragile for big endian and in general it is fairly easy to break the 
> big endian
> case.  For these advsimd improvements that you are working on (that we very
> much appreciate) it is important to run both little endian and big endian
> regressions.
> 
> Thanks
> /Marcus


Okay.  Any plan for the advsimd big-endian improvement? 
I rebased this patch over Alan Lawrance's patch: 
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html 
No regressions for aarch64_be-linux-gnu target too.  OK for the thunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2014-12-09  Felix Yang  
+
+   * config/aarch64/aarch64-simd.md (aarch64_p): New
+   pattern.
+   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+   uminp, smax_nanp, smin_nanp): New builtins.
+   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
+   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
+   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
+   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
+   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
+   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
+   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
+
 2014-12-07  Felix Yang  
Shanyao Chen  
 
Index: gcc/config/aarch64/arm_neon.h
===
--- gcc/config/aarch64/arm_neon.h   (revision 218464)
+++ gcc/config/aarch64/arm_neon.h   (working copy)
@@ -8843,491 +8843,7 @@ vpadds_f32 (float32x2_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmax_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fmaxp %0.2s, %1.2s, %2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vpmax_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ ("smaxp %0.8b, %1.8b, %2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vpmax_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ ("smaxp %0.4h, %1.4h, %2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vpmax_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ ("smaxp %0.2s, %1.2s, %2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vpmax_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ ("umaxp %0.8b, %1.8b, %2.8b"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vpmax_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ ("umaxp %0.4h, %1.4h, %2.4h"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vpmax_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ ("umaxp %0.2s, %1.2s, %2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpmaxnm_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fmaxnmp %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fmaxnmp %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attri

Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2014-12-09 Thread Yangfei (Felix)

> You'll need to rebase over Alan Lawrance's patch.
> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html

Yes, see my new patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00750.html 

> > +;; Pairwise Integer Max/Min operations.
> > +(define_insn "aarch64_p"
> > + [(set (match_operand:VQ_S 0 "register_operand" "=w")
> > +   (unspec:VQ_S [(match_operand:VQ_S 1 "register_operand" "w")
> > +(match_operand:VQ_S 2 "register_operand" "w")]
> > +   MAXMINV))]
> > + "TARGET_SIMD"
> > + "p\t%0., %1., %2."
> > +  [(set_attr "type" "neon_minmax")]
> > +)
> > +
> 
> Could you roll aarch64_reduc__internalv2si into this pattern?

Will come up with another patch to fix this.  Thanks for pointing this out.

> 
> Thanks,
> Tejas.

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Ilya Enkovich

On 24 Nov 17:02, Ilya Enkovich wrote:
> 
> Right.  This works for both top level and multilib checks because failing 
> test is used and CC is usually not set when it's called by the top level 
> configure.  If we configure with CC=... then it may go wrong.  I left only 
> target check in configure.tgt and inlined test for x32 into libmpx configure.
> 
> > 
> > -- 
> > Joseph S. Myers
> > jos...@codesourcery.com
> 
> Here is an updated version.
> 
> Thanks,
> Ilya
> --

Here is an updated version.  I moved linker specs to target.  Currently mpx 
libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think 
gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.

Thanks,
Ilya
--
2014-12-09  Ilya Enkovich  

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-12-09  Ilya Enkovich  

* config/i386/linux-common.h (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(CHKP_SPEC): New.
* gcc.c (CHKP_SPEC): New.
(LINK_COMMAND_SPEC): Add CHKP_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-12-09  Ilya Enkovich  

Initial commit.


diff --git a/Makefile.def b/Makefile.def
index 7c8761a..82432ef 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -129,6 +129,9 @@ target_modules = { module= libsanitizer;
   bootstrap=true;
   lib_path=.libs;
   raw_cxx=true; };
+target_modules = { module= libmpx;
+  bootstrap=true;
+  lib_path=.libs; };
 target_modules = { module= libvtv;
   bootstrap=true;
   lib_path=.libs;
diff --git a/configure.ac b/configure.ac
index fd1bdf0..1d36f6b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -162,6 +162,7 @@ target_libraries="target-libgcc \
target-libstdc++-v3 \
target-libsanitizer \
target-libvtv \
+   target-libmpx \
target-libssp \
target-libquadmath \
target-libgfortran \
@@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then
 fi
 fi
 
+
+# Disable libmpx on unsupported systems.
+if test -d ${srcdir}/libmpx; then
+if test x$enable_libmpx = x; then
+   AC_MSG_CHECKING([for libmpx support])
+   if (srcdir=${srcdir}/libmpx; \
+   . ${srcdir}/configure.tgt; \
+   test "$LIBMPX_SUPPORTED" != "yes")
+   then
+   AC_MSG_RESULT([no])
+   noconfigdirs="$noconfigdirs target-libmpx"
+   else
+   AC_MSG_RESULT([yes])
+   fi
+fi
+fi
+
+
+
 # Disable libquadmath for some systems.
 case "${target}" in
   avr-*-*)
@@ -2662,6 +2682,11 @@ if echo " ${target_configdirs} " | grep " libvtv " > 
/dev/null 2>&1 &&
   bootstrap_target_libs=${bootstrap_target_libs}target-libvtv,
 fi
 
+# If we are building libmpx, bootstrap it.
+if echo " ${target_configdirs} " | grep " libmpx " > /dev/null 2>&1; then
+  bootstrap_target_libs=${bootstrap_target_libs}target-libmpx,
+fi
+
 # Determine whether gdb needs tk/tcl or not.
 # Use 'maybe' since enable_gdbtk might be true even if tk isn't available
 # and in that case we want gdb to be built without tk.  Ugh!
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index b9f7c65..65731cc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
 C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0)
 Instrument only functions marked with bnd_instrument attribute.
 
+static-libmpx
+Driver
+
 fcilkplus
 C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
 Enable Cilk Plus
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index 1eaf024..07cd1a2 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -53,3 +53,27 @@ along with GCC; see the file COPYING3.  If not see
   GNU_USER_TARGET_ENDFILE_SPEC, \
   GNU_USER_TARGET_MATHFILE_SPEC " " \
   ANDROID_ENDFILE_SPEC)
+
+#ifndef LIBMPX_LIBS
+#define LIBMPX_LIBS "\
+ %:include(libmpx.spec)%(link_libmpx)"
+#endif
+
+#ifndef LIBMPX_SPEC
+#if defined(HAVE_LD_STATIC_DYNAMIC)
+#define LIBMPX_SPEC "\
+%{mmpx:%{fcheck-pointer-bounds:\
+%{static:--whole-archive -lmpx --no-whole-archive" LIBMPX_LIBS "}\
+%{!static:%{static-libmpx:" LD_STATIC_OPTION " --whole-archive}\
+-lmpx %{static-libmpx:--no-whole-archive " LD_DYNAMIC_OPTION \
+LIBMPX_LIBS ""
+#else
+#define LIBMPX_SPEC "\
+%{mmpx:%{fcheck-pointer-bounds:-lmpx" LIBMPX_LIBS "}}"
+#endif
+#endif
+
+#ifndef CHKP_SPEC
+#define CHKP_SPEC "\
+%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC "}}"
+#endif
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 4cb4ba3..636830e 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -809,6 +809,10 @@ proper position among the other output files.  */
 %{fvtable-verify=preinit: -lvtv -u_vtable_map_vars_s

Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library

2014-12-09 Thread Ilya Enkovich

On 05 Dec 15:52, Jeff Law wrote:
> On 12/03/14 07:28, Ilya Enkovich wrote:
> >>>   #ifndef MPX_SPEC
> >>>   #define MPX_SPEC "\
> >>>-%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC "}}"
> >>>+%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC LIBMPXWRAPPERS_SPEC "}}"
> >>>   #endif
> >>
> >>Ugh.  Somehow I missed that MPX_SPEC was in gcc.c along with the uses of
> >>LIBMPX_SPEC.  Aren't all these target specific and thus belong in the x86
> >>specific files?
> >
> >Is config/i386/linux-common.h is a proper place for these specs then?
> Depends on whether or not we expect MPX to show up on other systems
> such as *bsd, mingw, solaris, etc.
> 
> So I'd say linux-common.h is better than gcc.c, but perhaps not the
> best location.  Uros should chime in here.
> 
> >Right. Wrappers code doesn't use anything specific to MPX.  In case of
> >pure software solution we should be able to compile and use this
> >library without changes (except compilation flags).  But in case pure
> >software solution exists MPX option should still be available and we
> >should have two builds for this library.
> Ok.  Just wanted to be sure I understood how the pieces fit
> together.  I don't really expect a software implementation, but
> keeping it in mind helps us reasonably consider where certain things
> belong implementation wise.
> 
> jeff
> 

Here is a version with linker specs moved into linux-common.h.

Thanks,
Ilya
--
gcc/

2014-12-09  Ilya Enkovich  

* config/i386/linux-common.h (LIBMPX_WRAPPERSSPEC): New.
(CHKP_SPEC): Add wrappers library.
* c-family/c.opt (static-libmpxwrappers): New.

libmpx/

2014-12-09  Ilya Enkovich  

* Makefile.am (SUBDIRS): Add mpxwrap when used
AS supports MPX.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Check AS supports MPX.  Add
mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.
* mpxwrap/libmpxwrappers.map: New.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 65731cc..2632c3f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1023,6 +1023,9 @@ Instrument only functions marked with bnd_instrument 
attribute.
 static-libmpx
 Driver
 
+static-libmpxwrappers
+Driver
+
 fcilkplus
 C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
 Enable Cilk Plus
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index 07cd1a2..854af46 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -73,7 +73,21 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 #endif
 
+#ifndef LIBMPXWRAPPERS_SPEC
+#if defined(HAVE_LD_STATIC_DYNAMIC)
+#define LIBMPXWRAPPERS_SPEC "\
+%{mmpx:%{fcheck-pointer-bounds:%{!fno-chkp-use-wrappers:\
+%{static:-lmpxwrappers}\
+%{!static:%{static-libmpxwrappers:" LD_STATIC_OPTION " --whole-archive}\
+-lmpxwrappers %{static-libmpxwrappers:--no-whole-archive "\
+LD_DYNAMIC_OPTION "}"
+#else
+#define LIBMPXWRAPPERS_SPEC "\
+%{mmpx:%{fcheck-pointer-bounds:{!fno-chkp-use-wrappers:-lmpxwrappers}}}"
+#endif
+#endif
+
 #ifndef CHKP_SPEC
 #define CHKP_SPEC "\
-%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC "}}"
+%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC LIBMPXWRAPPERS_SPEC "}}"
 #endif
diff --git a/libmpx/Makefile.am b/libmpx/Makefile.am
index 6cee4ac..bd0a8b6 100644
--- a/libmpx/Makefile.am
+++ b/libmpx/Makefile.am
@@ -2,6 +2,9 @@ ACLOCAL_AMFLAGS = -I .. -I ../config
 
 if LIBMPX_SUPPORTED
 SUBDIRS = mpxrt
+if MPX_AS_SUPPORTED
+SUBDIRS += mpxwrap
+endif
 nodist_toolexeclib_HEADERS = libmpx.spec
 endif
 
@@ -45,3 +48,5 @@ AM_MAKEFLAGS = \
"PICFLAG=$(PICFLAG)" \
"RANLIB=$(RANLIB)" \
"DESTDIR=$(DESTDIR)"
+
+MAKEOVERRIDES =
diff --git a/libmpx/configure.ac b/libmpx/configure.ac
index dbfad02..4669525 100644
--- a/libmpx/configure.ac
+++ b/libmpx/configure.ac
@@ -100,6 +100,18 @@ AC_CHECK_TOOL(AS, as)
 AC_CHECK_TOOL(AR, ar)
 AC_CHECK_TOOL(RANLIB, ranlib, :)
 
+# Check we may build wrappers library
+echo "test:  bndmov %bnd0, %bnd1" > conftest.s
+if AC_TRY_COMMAND([$AS -o conftest.o conftest.s 1>&AS_MESSAGE_LOG_FD])
+then
+mpx_as=yes
+else
+mpx_as=no
+echo "configure: no MPX support fo as" >&AS_MESSAGE_LOG_FD
+fi
+rm -f conftest.o conftest.s
+AM_CONDITIONAL(MPX_AS_SUPPORTED, [test "x$mpx_as" = "xyes"])
+
 # Configure libtool
 AC_LIBTOOL_DLOPEN
 AM_PROG_LIBTOOL
@@ -117,7 +129,7 @@ fi
 
 AC_CONFIG_FILES([Makefile libmpx.spec])
 AC_CONFIG_HEADERS(config.h)
-AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt], [DIR/Makefile]),
+AC_CONFIG_FILES(AC_FOREACH([DIR], [mpxrt mpxwrap], [DIR/Makefile ]),
   [cat > vpsed$$ << \_EOF
 s!`test -f '$<' || echo '$(srcdir)/'`!!
 _EOF
diff --git a/libmpx/mpxwrap/Makefile.am b/libmpx/mpxwrap/Makefile.am
new file mode 100644
index 000..c254d9b
--- /dev/null
+++ b/libmpx/mpxwrap/Makefile.am
@@ -

[PING ^ 3] [RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-12-09 Thread Yangfei (Felix)

Hi, 
  This is a pin for: 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02258.html 

Thanks.

Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Uros Bizjak

On Mon, Dec 8, 2014 at 10:42 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch attempts to fix
> (set (reg:V*) (vec_duplicate:V* (reg/mem:)))
> patterns.  One issue is that there were separate patterns for
> broadcast from gpr and separate patterns for broadcast from memory
> (and vector reg), that isn't a good idea for reload, which can't then
> freely choose.  Another issue is that some pre-AVX2 broadcast patterns
> were present above the avx512vl broadcast patterns, so again, reload didn't
> have the possibility to use %xmm16-31/%ymm16-31 registers.  Also, the
> splitter written for AVX2 broadcasts from gpr went into the way of AVX512VL
> broadcasts.  And finally, the avx512*intrin.h headers were using
> #ifdef TARGET_64BIT, macro not used anywhere (probably meant to write
> __x86_64__ instead, but with the patch we actually just have one set of
> builtins.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-12-08  Jakub Jelinek  
>
> PR target/63594
> * config/i386/sse.md (vec_dupv4sf): Move after
> _vec_dup_gpr pattern.
> (*vec_dupv4si, *vec_dupv2di): Likewise.
> (_vec_dup_mem): Merge into ...
> (_vec_dup_gpr): ... this
> pattern.
> (*vec_dup AVX2_VEC_DUP_MODE splitter): Disable for
> TARGET_AVX512VL (for QI/HI scalar modes only if TARGET_AVX512BW
> is set too).
> * config/i386/i386.c (enum ix86_builtins): Remove
> IX86_BUILTIN_PBROADCASTQ256_MEM_MASK,
> IX86_BUILTIN_PBROADCASTQ128_MEM_MASK and
> IX86_BUILTIN_PBROADCASTQ512_MEM.
> (bdesc_args): Use __builtin_ia32_pbroadcastq512_gpr_mask,
> __builtin_ia32_pbroadcastq256_gpr_mask and
> __builtin_ia32_pbroadcastq128_gpr_mask instead of *_mem_mask
> regardless of OPTION_MASK_ISA_64BIT.
> * config/i386/avx512fintrin.h (_mm512_set1_epi64,
> _mm512_mask_set1_epi64, _mm512_maskz_set1_epi64): Use *_gpr_mask
> builtins regardless of whether TARGET_64BIT is defined or not.
> * config/i386/avx512vlintrin.h (_mm256_mask_set1_epi64,
> _mm256_maskz_set1_epi64, _mm_mask_set1_epi64, _mm_maskz_set1_epi64):
> Likewise.

LGTM, but please see inline comment below.

> --- gcc/config/i386/sse.md.jj   2014-12-03 11:52:41.0 +0100
> +++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
> @@ -6319,22 +6319,6 @@ (define_insn "avx512f_vec_dup_1"
>  (set_attr "prefix" "evex")
>  (set_attr "mode" "")])
>
> -(define_insn "vec_dupv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=x,x,x")
> -   (vec_duplicate:V4SF
> - (match_operand:SF 1 "nonimmediate_operand" "x,m,0")))]
> -  "TARGET_SSE"
> -  "@
> -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
> -   vbroadcastss\t{%1, %0|%0, %1}
> -   shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "avx,avx,noavx")
> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "vex,vex,orig")
> -   (set_attr "mode" "V4SF")])
> -
>  ;; Although insertps takes register source, we prefer
>  ;; unpcklps with register source since it is shorter.
>  (define_insn "*vec_concatv2sf_sse4_1"
> @@ -12821,37 +12805,6 @@ (define_split
>operands[1] = adjust_address (operands[1], mode, offs);
>  })
>
> -(define_insn "*vec_dupv4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
> -   (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" " x,m,0")))]
> -  "TARGET_SSE"
> -  "@
> -   %vpshufd\t{$0, %1, %0|%0, %1, 0}
> -   vbroadcastss\t{%1, %0|%0, %1}
> -   shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "sse2,avx,noavx")
> -   (set_attr "type" "sselog1,ssemov,sselog1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_vex,vex,orig")
> -   (set_attr "mode" "TI,V4SF,V4SF")])
> -
> -(define_insn "*vec_dupv2di"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,x")
> -   (vec_duplicate:V2DI
> - (match_operand:DI 1 "nonimmediate_operand" " 0,x,m,0")))]
> -  "TARGET_SSE"
> -  "@
> -   punpcklqdq\t%0, %0
> -   vpunpcklqdq\t{%d1, %0|%0, %d1}
> -   %vmovddup\t{%1, %0|%0, %1}
> -   movlhps\t%0, %0"
> -  [(set_attr "isa" "sse2_noavx,avx,sse3,noavx")
> -   (set_attr "type" "sselog1,sselog1,sselog1,ssemov")
> -   (set_attr "prefix" "orig,vex,maybe_vex,orig")
> -   (set_attr "mode" "TI,TI,DF,V4SF")])
> -
>  (define_insn "*vec_concatv2si_sse4_1"
>[(set (match_operand:V2SI 0 "register_operand" "=Yr,*x,x, Yr,*x,x, x, 
> *y,*y")
> (vec_concat:V2SI
> @@ -16665,46 +16618,78 @@ (define_insn "avx512f_broa
> (set_attr "mode" "")])
>
>  (define_insn "_vec_dup_gpr"
> -  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
> +  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
> (vec_duplicate:VI12_AVX512VL
> - (match_operand: 1 "

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Uros Bizjak

On Tue, Dec 9, 2014 at 12:33 AM, Evgeny Stupachenko  wrote:

> The patch fix pblendv expand.
> The bug was uncovered when permutation operands are constants.
> In this case we init target register for expand_vec_perm_1 with
> constant and then rewrite the target with constant for
> expand_vec_perm_pblend.
>
> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
> -flto -march=corei7.
>
> Bootstrap and make check passed.
>
> Is it ok?

Please add a testcase.

Uros.

>
> Evgeny
>
> 2014-12-09  Evgeny Stupachenko  
>
> gcc/
> * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
> expand_vec_perm_1 target.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index eafc15a..5a914ad 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>  dcopy.op0 = dcopy.op1 = d->op1;
>else
>  dcopy.op0 = dcopy.op1 = d->op0;
> +  dcopy.target = gen_reg_rtx (vmode);
>dcopy.one_operand_p = true;
>
>for (i = 0; i < nelt; ++i)

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-09 Thread Richard Biener

On Mon, 8 Dec 2014, Joseph Myers wrote:

> On Mon, 8 Dec 2014, Richard Biener wrote:
> 
> > The alternative is to decide "used" in the middle-end at one point,
> > for example at the end of all_lowering_passes where hopefully
> > we have constant folded and removed dead code enough.  We can also
> > compute an overall "uses libm" flag to fix the testcase I reported
> > (of course we'd like to re-compute that at LTO time).
> 
> If you do that, of course any optimizations before deciding "used" need to 
> be conservative - assume that any function except the one you're 
> optimizing a call of is not "used".

Yes.  I guess I'm willing to consider a new langhook that for C evaluates
C_DECL_USED.  We still have to find a suitable point to query it -
with the earliest point is after parsing finished (thus when we
finalize the TU).  At that point all constant expression folding
should have happened (up to the point required by the language) - but
I suppose C_DECL_USED is still set even if all calls were constant
folded away.

Note that computing a middle-end "C_DECL_USED" can be done from
gimplification (possibly after folding the calls there once).
Same for a global "libm function used" flag (where at some point
we might need a target hook to allow specifying whether a call
is from libm).

Richard.

[PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h

2014-12-09 Thread Yangfei (Felix)

The definition of function aarch64_function_profiler is removed since GCC-4.9. 
But the declaration is still there in aarch64-protos.h.  So remove it. 
OK for the trunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-12-09  Felix Yang  
+
+   * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove
+   declaration of removed function.
+
 2014-12-07  Felix Yang  
Shanyao Chen  
 
Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h (revision 218464)
+++ gcc/config/aarch64/aarch64-protos.h (working copy)
@@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
 void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
-void aarch64_function_profiler (FILE *, int);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
   const_tree, unsigned);
 void aarch64_init_expanders (void);

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Uros Bizjak

On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak  wrote:

>> The patch fix pblendv expand.
>> The bug was uncovered when permutation operands are constants.
>> In this case we init target register for expand_vec_perm_1 with
>> constant and then rewrite the target with constant for
>> expand_vec_perm_pblend.
>>
>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>> -flto -march=corei7.
>>
>> Bootstrap and make check passed.
>>
>> Is it ok?
>
> Please add a testcase.

Also, it surprises me that we enter expand_vec_perm_pblendv with
uninitialized (?) target. Does your patch only papers over a real
problem up the call chain (hard to say without a testcase)?

Uros.

>
>>
>> Evgeny
>>
>> 2014-12-09  Evgeny Stupachenko  
>>
>> gcc/
>> * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>> expand_vec_perm_1 target.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index eafc15a..5a914ad 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>  dcopy.op0 = dcopy.op1 = d->op1;
>>else
>>  dcopy.op0 = dcopy.op1 = d->op0;
>> +  dcopy.target = gen_reg_rtx (vmode);
>>dcopy.one_operand_p = true;
>>
>>for (i = 0; i < nelt; ++i)

[PATCH] Fix half of PR64191

2014-12-09 Thread Richard Biener


This avoids vectorizing empty loops that only contain an indirect
clobber.  The real bug of course is that we don't remove the loop
in the first place (but we might not able to prove it terminates).
Still vectorizing it adds additional prologue/epilogue loops that
only convolutes the final code we generate.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
and branch.

Richard.

2014-12-09  Richard Biener  

PR tree-optimization/64191
* tree-vect-stmts.c (vect_stmt_relevant_p): Clobbers are
not relevant (nor are their uses).

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 218479)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -340,7 +340,8 @@ vect_stmt_relevant_p (gimple stmt, loop_
 
   /* changing memory.  */
   if (gimple_code (stmt) != GIMPLE_PHI)
-if (gimple_vdef (stmt))
+if (gimple_vdef (stmt)
+   && !gimple_clobber_p (stmt))
   {
if (dump_enabled_p ())
  dump_printf_loc (MSG_NOTE, vect_location,

Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Jakub Jelinek

On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote:
> >  (define_insn "_vec_dup_gpr"
> > -  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> > -   (vec_duplicate:VI48_AVX512VL
> > - (match_operand: 1 "register_operand" "r")))]
> > -  "TARGET_AVX512F && (mode != DImode || TARGET_64BIT)"
> > -{
> > -  return "vpbroadcast\t{%1, 
> > %0|%0, %1}";
> > -}
> > -  [(set_attr "type" "ssemov")
> > -   (set_attr "prefix" "evex")
> > -   (set_attr "mode" "")])
> > -
> > -(define_insn "_vec_dup_mem"
> > -  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
> > -   (vec_duplicate:V48_AVX512VL
> > - (match_operand: 1 "nonimmediate_operand" "vm")))]
> > +  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
> > +   (vec_duplicate:V48_AVX512VL
> > + (match_operand: 1 "nonimmediate_operand" "vm,r")))]
> >"TARGET_AVX512F"
> >"vbroadcast\t{%1, 
> > %0|%0, %1}"
> >[(set_attr "type" "ssemov")
> > (set_attr "prefix" "evex")
> > -   (set_attr "mode" "")])
> > +   (set_attr "mode" "")
> > +   (set (attr "enabled")
> > + (if_then_else (eq_attr "alternative" "1")
> > +   (symbol_ref "GET_MODE_CLASS (mode) == MODE_INT
> > +&& (mode != DImode || TARGET_64BIT)")
> > +   (const_int 1)))])

I have no idea how to get rid of this enabled attribute, as it disables just
one alternative.  Creating a special isa attribute for it looks less
readable and much more expensive.  This enabled attribute is IMHO of the
good kind, GET_MODE_CLASS (mode) == MODE_INT and
mode != DImode are constants for the particular chosen mode,
TARGET_64BIT doesn't change during compilation of a TU, and so the only
variable is that it disables just one of the alternatives.

> > @@ -16759,7 +16744,10 @@ (define_split
> >[(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
> > (vec_duplicate:AVX2_VEC_DUP_MODE
> >   (match_operand: 1 "register_operand")))]
> > -  "TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1])"
> > +  "TARGET_AVX2
> > +   && (!TARGET_AVX512VL
> > +   || (!TARGET_AVX512BW && GET_MODE_SIZE (mode) > 2))
> > +   && reload_completed && GENERAL_REG_P (operands[1])"
> >[(const_int 0)]
> 
> We would like to avoid convoluted insn enable condition by moving the
> target delated complexity to the mode iterator, even if it requires
> additional single-use mode iterator. In the ideal case, the remaining
> target-dependant condition would represent the baseline target for an
> insn and all other target-related conditions would be inside mode
> iterator.

This splitter condition can be replaced with mode iterator, but it results
in big repetitions there (incremental diff below).  Or do you have something
else in mind?

--- gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
+++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
@@ -16740,14 +16740,21 @@
(set_attr "isa" "avx2,noavx2,avx2,noavx2")
(set_attr "mode" ",V8SF,,V8SF")])
 
+;; Modes handled by AVX2 vec_dup splitter.  Assumes TARGET_AVX2,
+;; if both -mavx512vl and -mavx512bw are used, disables all modes,
+;; if just -mavx512vl, disables just V*SI.
+(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT
+  [(V32QI "!TARGET_AVX512VL || !TARGET_AVX512BW")
+   (V16QI "!TARGET_AVX512VL || !TARGET_AVX512BW")
+   (V16HI "!TARGET_AVX512VL || !TARGET_AVX512BW")
+   (V8HI "!TARGET_AVX512VL || !TARGET_AVX512BW")
+   (V8SI "!TARGET_AVX512VL") (V4SI "!TARGET_AVX512VL")])
+
 (define_split
-  [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
-   (vec_duplicate:AVX2_VEC_DUP_MODE
+  [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 "register_operand")
+   (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT
  (match_operand: 1 "register_operand")))]
-  "TARGET_AVX2
-   && (!TARGET_AVX512VL
-   || (!TARGET_AVX512BW && GET_MODE_SIZE (mode) > 2))
-   && reload_completed && GENERAL_REG_P (operands[1])"
+  "TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1])"
   [(const_int 0)]
 {
   emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),

Jakub

Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length

2014-12-09 Thread Andre Vehreschild

Hi Paul,

> s/furure/future/ :-)

Hups, fixed.

> Why are you using a double underscore in get__len_component?

Because the component is called _len. The routine should be called "get _len
component", but spaces aren't allowed :-) Anyways, does this violate some style
guide? Should I remove one of underscores?

> More seriously, I think that the len field should be added unconditionally
> to unlimited polymorphic variables. Otherwise, you might find unlimited
> polymorphic variables that are created in an already compiled
> module/subprogramme arriving without the requisite field.

I was thinking about that, too. For a start I just wanted to give an idea of
where this is going. When more gfortran gurus vote for the unconditional add to
u-poly variables, then I will change it. 

> Michael Metcalf has posted an example that makes use of unlimited
> polymorphism at ftp://ftp.numerical.rl.ac.uk/pub/MRandC/oo.f90 . gfortran
> does not work correctly with it at the moment because of the lack of a len
> field. Removing all the string input allows it to run correctly. I think
> that you should ensure that your patch fixes the problem.
> 
> A slight obstacle is that the substring at line 216 causes the emission of:
> type is (character(*))
>   1
> Error: Associate-name '__tmp_CHARACTER_0_1' at (1) is used as array
> 
> Just retaining print *, 'character = "', v, '"' allows the example to
> compile

Ok, I take a look at it. As I am paid to fix certain bugs that prevent
compiling another software, I will not prioritize working on 55901 as long as
it is not needed in the software I focus on. Sorry for not being more
enthusiastic, but there are more than 8 prs (and only one down yet) I have to
fix and time is limited.

What I did not mention in the previous mail is that I also plan to implement
this fixes in the fortran-dev branch with the new array descriptor. Given that
there is no other volunteer. :-)

Please continue commenting.

Regards,
Andre

> ifort compiles and runs it successfully and so I think that it would be
> nice if gfortran catches up on this one.
> 
> Parenthetically, I wonder if this is not the time to implement PR53971...
> including responding to Mikael's comment?
> 
> Anyway, this is a good start in the right direction. Please persist!
> 
> Thanks
> 
> Paul
> 
> 
> On 8 December 2014 at 18:38, Andre Vehreschild  wrote:
> 
> > Hi all,
> >
> > please find attached a more elaborate patch for pr60255. I totally agree
> > that
> > my first attempt was just scratching the surface of the work needed.
> >
> > This patch also is *not* complete, but because I am really new to gfortran
> > patching, I don't want to present a final patch only to learn then, that I
> > have
> > violated design rules, common practice or the like. Therefore please
> > comment
> > and direct me to any sources/ideas to improve the patch.
> >
> > Topic:
> > The pr 60255 is about assigning a char array to an unlimited polymorphic
> > entity. In the comments the concern about the lost length information is
> > raised. The patch adds a _len component to the unlimited polymorphic entity
> > (after _data and _vtab) and adds an assignment of the string length to _len
> > when a string is pointer assigned to the unlimited poly entity.
> > Furthermore is
> > the intrinsic len(unlimited poly pointing to a string) resolved to give the
> > _len component.
> >
> > Yet missing:
> > - assign _len component back to deferred char array length component
> > - transport length along chains of unlimited poly entities, i.e., a => b;
> > c =>
> >   a where all objects are unlimited poly and b is a string.
> > - allocate() in this context
> >
> > Patch dependencies:
> > none
> >
> > Comments, concerns, candy welcome!
> >
> > Regards,
> > Andre
> >
> > On Sun, 17 Aug 2014 14:32:21 +0200
> > domi...@lps.ens.fr (Dominique Dhumieres) wrote:
> >
> > > > the testcase should check that the code generated is actually working,
> > > > not just that the ICE disappeared.
> > >
> > > I agree. Note that there is a test in the comment 3 of PR60255 that
> > > can be used to check the run time behavior (and possibly check the
> > > vtab issue).
> > >
> > > Dominique
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
> >
> 
> 
> 


-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Tel.: +49 241 9291018 * Email: ve...@gmx.de

RE: [PATCH] Fix PR 61225

2014-12-09 Thread Zhenqiang Chen


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, December 09, 2014 5:29 AM
> To: Zhenqiang Chen
> Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 12/04/14 01:43, Zhenqiang Chen wrote:
> >>> > >
> >>> > >  Part of PR rtl-optimization/61225
> >>> > >  * combine.c (refer_same_reg_p): New function.
> >>> > >  (combine_instructions): Handle I1 -> I2 -> I3; I2 ->
insn.
> >>> > >  (try_combine): Add one more parameter TO_COMBINED_INSN,
> >>> > > which
> >> >is
> >>> > >  used to create a new insn parallel (TO_COMBINED_INSN,
I3).
> >>> > >
> >>> > >testsuite/ChangeLog:
> >>> > >2014-08-04  Zhenqiang Chen
> >>> > >
> >>> > >  * gcc.target/i386/pr61225.c: New test.
> THanks for the updates and clarifications.  Just a few minor things and
while
> it's a bit of a hack, I'll approve:
> 
> 
> 
> > +
> > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> > +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> > +   except A and B.  */
> > +
> > +static bool
> > +refer_same_reg_p (rtx_insn *a, rtx_insn *b)
> > +{
> > +  rtx seta = single_set (a);
> > +  rtx setb = single_set (b);
> > +
> > +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> > +  || !seta
> > +  || !setb)
> > +return false;
> > 
> > +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> > +  || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> > +  || !REG_P (XEXP (SET_SRC (seta), 0))
> > +  || XEXP (SET_SRC (seta), 1) != const0_rtx
> > +  || !REG_P (SET_SRC (setb))
> > +  || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> > +return false;
> Do you need to verify SETA and SETB satisfy single_set?  Or has that
> already been done elsewhere?

A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 -> I2 -> B; I2 -> A;
LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
found codes in function try_combine, which can make sure B (or I3) is
single_set. 

So I think the check can skip failed cases at early stage.
 
> The name refer_same_reg_p seems wrong -- your function is verifying the
> underlying RTL store as well as the existence of a a dependency between
> the insns.  Can you try to come up with a better name?

Change it to can_reuse_cc_set_p.

> Please use CONST0_RTX (mode)  IIRC that'll allow this to work regardless
> of the size of the modes relative to the host word size.
 
Updated. 
 
> > +
> > +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
> > +{
> > +  df_ref use;
> > +  rtx insn;
> > +  unsigned int i = REGNO (SET_SRC (setb));
> > +
> > +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
(use))
> > +{
> > + insn = DF_REF_INSN (use);
> > + if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P
> (insn)))
> > +   return false;
> > +   }
> > +}
> > +
> > +  return true;
> > +}
> Is this fragment really needed?  Does it ever trigger?  I'd think that
> for > 2 uses punting would be fine.  Do we really commonly have cases
> with > 2 uses, but where they're all in SETA and SETB?

The check is to make sure the correctness.  Here is a case,

int 
f1 (int *x)
{
  int t = --*x;
  if (!t)
foo (x);
  return t;
}

  _4 = *x_3(D);
  _5 = _4 + -1;
  *x_3(D) = _5;
  # DEBUG t => _5
  if (_5 == 0)
   ...
  :
  return _5;

"_5" is used in "return _5". So we can not remove "_5 = _4 + -1".
 
> >   }
> >   }
> >
> > + /* Try to combine a compare insn that sets CC
> > +with a preceding insn that can set CC, and maybe with its
> > +logical predecessor as well.
> > +We need this special code because data flow connections
> > +do not get entered in LOG_LINKS.  */
> > + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
> > + && refer_same_reg_p (insn, prev)
> > + && max_combine >= 4)
> > +   {
> > +   struct insn_link *next1;
> > +   FOR_EACH_LOG_LINK (next1, prev)
> > + {
> > +   rtx_insn *link1 = next1->insn;
> > +   if (NOTE_P (link1))
> > + continue;
> > +   /* I1 -> I2 -> I3; I2 -> insn;
> > +  output parallel (insn, I3).  */
> > +   FOR_EACH_LOG_LINK (nextlinks, link1)
> > + if ((next = try_combine (prev, link1,
> > +  nextlinks->insn, NULL,
> > +  &new_direct_jump_p,
> > +  last_combined_insn, insn)) !=
0)
> > +
> > +   {
> > + delete_insn (insn);
> > + insn = next;
> > + statistics_counter_event (cfun, "four-insn
combine",
> 1);
> > + goto retry;
> > +   }
> > +   /* I2 -> I3; I2 -> insn

Re: [testsuite patch] avoid test when compile options is conflict with default mthumb

2014-12-09 Thread Mike Stump

On Dec 8, 2014, at 7:06 PM, Wang Deqiang  wrote:
> This is a ping for
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01049.html

Seems reasonable enough.  I was hoping the arm folks would chime in, we should 
have enough of them to review.  Let’s given them two more days, and if no 
comments, Ok.  If they comment then seems reasonable to let them have the final 
say.

Re: locales fixes

2014-12-09 Thread Jonathan Wakely


On 08/12/14 23:53 +0100, François Dumont wrote:
   After having installed all necessary locales on my system I end up 
with 4 failures. Here is a patch to fix them all.


Did you discover why only you are seeing failures?

The whole testsuite passes for me, with glibc 2.18, including the
tests you are "fixing" (i.e. I don't see any need for a fix).

The changes to the glibc versions tests are definitely not correct.

Re: [PATCH] Fix broadcast from scalar patterns (PR target/63594)

2014-12-09 Thread Uros Bizjak

On Tue, Dec 9, 2014 at 10:17 AM, Jakub Jelinek  wrote:
> On Tue, Dec 09, 2014 at 09:49:17AM +0100, Uros Bizjak wrote:
>> >  (define_insn "_vec_dup_gpr"
>> > -  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>> > -   (vec_duplicate:VI48_AVX512VL
>> > - (match_operand: 1 "register_operand" "r")))]
>> > -  "TARGET_AVX512F && (mode != DImode || TARGET_64BIT)"
>> > -{
>> > -  return "vpbroadcast\t{%1, 
>> > %0|%0, %1}";
>> > -}
>> > -  [(set_attr "type" "ssemov")
>> > -   (set_attr "prefix" "evex")
>> > -   (set_attr "mode" "")])
>> > -
>> > -(define_insn "_vec_dup_mem"
>> > -  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
>> > -   (vec_duplicate:V48_AVX512VL
>> > - (match_operand: 1 "nonimmediate_operand" "vm")))]
>> > +  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
>> > +   (vec_duplicate:V48_AVX512VL
>> > + (match_operand: 1 "nonimmediate_operand" 
>> > "vm,r")))]
>> >"TARGET_AVX512F"
>> >"vbroadcast\t{%1, 
>> > %0|%0, %1}"
>> >[(set_attr "type" "ssemov")
>> > (set_attr "prefix" "evex")
>> > -   (set_attr "mode" "")])
>> > +   (set_attr "mode" "")
>> > +   (set (attr "enabled")
>> > + (if_then_else (eq_attr "alternative" "1")
>> > +   (symbol_ref "GET_MODE_CLASS (mode) == MODE_INT
>> > +&& (mode != DImode || TARGET_64BIT)")
>> > +   (const_int 1)))])
>
> I have no idea how to get rid of this enabled attribute, as it disables just
> one alternative.  Creating a special isa attribute for it looks less
> readable and much more expensive.  This enabled attribute is IMHO of the
> good kind, GET_MODE_CLASS (mode) == MODE_INT and
> mode != DImode are constants for the particular chosen mode,
> TARGET_64BIT doesn't change during compilation of a TU, and so the only
> variable is that it disables just one of the alternatives.

IMO, this one is not problematic.

>> > @@ -16759,7 +16744,10 @@ (define_split
>> >[(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
>> > (vec_duplicate:AVX2_VEC_DUP_MODE
>> >   (match_operand: 1 "register_operand")))]
>> > -  "TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1])"
>> > +  "TARGET_AVX2
>> > +   && (!TARGET_AVX512VL
>> > +   || (!TARGET_AVX512BW && GET_MODE_SIZE (mode) > 2))
>> > +   && reload_completed && GENERAL_REG_P (operands[1])"
>> >[(const_int 0)]
>>
>> We would like to avoid convoluted insn enable condition by moving the
>> target delated complexity to the mode iterator, even if it requires
>> additional single-use mode iterator. In the ideal case, the remaining
>> target-dependant condition would represent the baseline target for an
>> insn and all other target-related conditions would be inside mode
>> iterator.
>
> This splitter condition can be replaced with mode iterator, but it results
> in big repetitions there (incremental diff below).  Or do you have something
> else in mind?
>
> --- gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
> +++ gcc/config/i386/sse.md  2014-12-08 13:26:06.505543457 +0100
> @@ -16740,14 +16740,21 @@
> (set_attr "isa" "avx2,noavx2,avx2,noavx2")
> (set_attr "mode" ",V8SF,,V8SF")])
>
> +;; Modes handled by AVX2 vec_dup splitter.  Assumes TARGET_AVX2,
> +;; if both -mavx512vl and -mavx512bw are used, disables all modes,
> +;; if just -mavx512vl, disables just V*SI.
> +(define_mode_iterator AVX2_VEC_DUP_MODE_SPLIT
> +  [(V32QI "!TARGET_AVX512VL || !TARGET_AVX512BW")
> +   (V16QI "!TARGET_AVX512VL || !TARGET_AVX512BW")
> +   (V16HI "!TARGET_AVX512VL || !TARGET_AVX512BW")
> +   (V8HI "!TARGET_AVX512VL || !TARGET_AVX512BW")
> +   (V8SI "!TARGET_AVX512VL") (V4SI "!TARGET_AVX512VL")])
> +
>  (define_split
> -  [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
> -   (vec_duplicate:AVX2_VEC_DUP_MODE
> +  [(set (match_operand:AVX2_VEC_DUP_MODE_SPLIT 0 "register_operand")
> +   (vec_duplicate:AVX2_VEC_DUP_MODE_SPLIT
>   (match_operand: 1 "register_operand")))]
> -  "TARGET_AVX2
> -   && (!TARGET_AVX512VL
> -   || (!TARGET_AVX512BW && GET_MODE_SIZE (mode) > 2))
> -   && reload_completed && GENERAL_REG_P (operands[1])"
> +  "TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1])"
>[(const_int 0)]
>  {
>emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),

Ugh, tough choice.

Let's go with the original, but please add the comment, similar to the
one above. Maybe also use "mode == SImode" instead of
GET_MODE_SIZE, it better matches the comment.

Thanks,
Uros.

Re: [PATCH, trivial] [AArch64] Remove declaration of removed function from aarch64-protos.h

2014-12-09 Thread Richard Earnshaw

On 09/12/14 08:59, Yangfei (Felix) wrote:
> The definition of function aarch64_function_profiler is removed since 
> GCC-4.9. 
> But the declaration is still there in aarch64-protos.h.  So remove it. 
> OK for the trunk? 
> 
> 
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 218464)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2014-12-09  Felix Yang  
> +
> + * config/aarch64/aarch64-protos.h (aarch64_function_profiler): Remove
> + declaration of removed function.
> +
>  2014-12-07  Felix Yang  
>   Shanyao Chen  
>  
> Index: gcc/config/aarch64/aarch64-protos.h
> ===
> --- gcc/config/aarch64/aarch64-protos.h   (revision 218464)
> +++ gcc/config/aarch64/aarch64-protos.h   (working copy)
> @@ -247,7 +247,6 @@ void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
>  void aarch64_expand_prologue (void);
>  void aarch64_expand_vector_init (rtx, rtx);
> -void aarch64_function_profiler (FILE *, int);
>  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
>  const_tree, unsigned);
>  void aarch64_init_expanders (void);
> 

OK.

R.

[PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-09 Thread Kyrill Tkachov


Hi all,

There are a few libstdc++ that don't fit into memory when the 
-mcmodel=tiny option is used.
For example 25_algorithms/random_shuffle/1.cc on aarch64 is a 5.5M 
binary on aarch64 with -mcmodel=small.


In the gcc and g++ testsuite we already catch such cases and mark them 
as UNSUPPORTED. In libstdc++.exp there is no such functionality.
This patch adds that check. I've implemented the v3_check_unsupported_p 
predicate the same way as it's implemented in the gcc testsuite.
I'm not very happy that it had to be copied, but I couldn't find a way 
to include the gcc definition sanely.


With this patch 8 tests that were previously FAILing on 
aarch64-none-elf/-mcmodel=tiny due to relocation truncation errors are 
now marked as UNSUPPORTED.


A test run on x86 didn't show any bad behaviour appearing.

Is this a sane approach to what I'm trying to solve?

Thanks,
Kyrill

2014-12-03  Kyrylo Tkachov  kyrylo.tkac...@arm.com\

 * testsuite/lib/libstdc++.exp (${tool}_check_unsupported_p):
 New procedure.
 (v3_target_compile): Check if test is unsupported.
 (v3_target_compile_as_c): Likewise.diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 3d9913b..385ae07 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -343,6 +343,22 @@ proc libstdc++_exit { } {
 }
 }
 
+proc ${tool}_check_unsupported_p { output } {
+if [regexp "(^|\n)\[^\n\]*: region \[^\n\]* is full" $output] {
+	return "memory full"
+}
+if { [regexp "(^|\n)\[^\n\]*: relocation truncated to fit" $output]
+  && [check_effective_target_tiny] } {
+return "memory full"
+ }
+
+if { [istarget spu-*-*] && \
+	 [string match "*exceeds local store*" $output] } {
+	return "memory full"
+}
+return ""
+}
+
 # Callback from system dg-test.
 proc libstdc++-dg-test { prog do_what extra_tool_flags } {
 # Set up the compiler flags, based on what we're going to do.
@@ -455,6 +471,7 @@ proc v3_target_compile { source dest type options } {
 global cxxldflags
 global includes
 global STATIC_LIBCXXFLAGS
+global tool
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
 lappend options "libs=${gluefile}"
@@ -483,7 +500,14 @@ proc v3_target_compile { source dest type options } {
 lappend options "compiler=$cxx_final"
 lappend options "timeout=[timeout_value]"
 
-return [target_compile $source $dest $type $options]
+set comp_output [target_compile $source $dest $type $options]
+set unsupported_message [${tool}_check_unsupported_p $comp_output]
+
+if { $unsupported_message != "" } {
+  unsupported "$dest: $unsupported_message"
+  return ""
+}
+return $comp_output
 }
 
 
@@ -498,6 +522,7 @@ proc v3_target_compile_as_c { source dest type options } {
 global cc
 global cxxflags
 global STATIC_LIBCXXFLAGS
+global tool
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
 lappend options "libs=${gluefile}"
@@ -551,7 +576,14 @@ proc v3_target_compile_as_c { source dest type options } {
 lappend options "compiler=$cc_final"
 lappend options "timeout=[timeout_value]"
 
-return [target_compile $source $dest $type $options]
+set comp_output [target_compile $source $dest $type $options]
+set unsupported_message [${tool}_check_unsupported_p $comp_output]
+
+if { $unsupported_message != "" } {
+  unsupported "$dest: $unsupported_message"
+  return ""
+}
+return $comp_output
 }
 
 # Build the support objects linked in with the libstdc++ tests.  In

Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Varvara Rainchik

Can we instead of adding new macroses in config/tls.m4 use something
like that in libgomp:

#if defined (HAVE_TLS) && defined (USE_EMUTLS)

(with GCC_CHECK_EMUTLS in libgomp/configure.ac)?

2014-12-08 19:28 GMT+03:00 Jakub Jelinek :
> On Mon, Dec 08, 2014 at 07:01:09PM +0300, Varvara Rainchik wrote:
>> Is it ok to add GCC_CHECK_EMUTLS test in libgomp/configure.ac or
>> should I add in tls.m3 a similair test that would be used only in
>> libgomp?
>
> I think config/tls.m4 would be a better place, but doing it in some way
> where the users of config/tls.m4 could actually by using different macros
> from there choose what they want (either check solely for real TLS, or
> only for emutls, or for both).  And libgomp/configure.ac would then choose
> it is happy with both.
>
> Jakub

[patch] libstdc++/64203 fix for bare-metal targets

2014-12-09 Thread Jonathan Wakely


Tested x86_64-linux, committed to trunk and the 4.9 branch.
commit 6e201632e6b99a8381ec973ae77a9b4e1db255d7
Author: Jonathan Wakely 
Date:   Mon Dec 8 16:02:48 2014 +

	PR libstdc++/64203
	* include/std/shared_mutex: Fix preprocessor conditions.
	* testsuite/experimental/feat-cxx14.cc: Check conditions.

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index 6405f10..c193eb2 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -36,10 +36,8 @@
 #else
 
 #include 
-#if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
-# include 
-# include 
-#endif
+#include 
+#include 
 #include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -51,7 +49,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-#if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#ifdef _GLIBCXX_HAS_GTHREADS
 
 #define __cpp_lib_shared_timed_mutex 201402
 
@@ -254,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 }
   };
-#endif // _GLIBCXX_HAS_GTHREADS && _GLIBCXX_USE_C99_STDINT_TR1
+#endif // _GLIBCXX_HAS_GTHREADS
 
   /// shared_lock
   template
@@ -393,6 +392,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 swap(shared_lock<_Mutex>& __x, shared_lock<_Mutex>& __y) noexcept
 { __x.swap(__y); }
 
+#endif // _GLIBCXX_USE_C99_STDINT_TR1
+
   // @} group mutexes
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
diff --git a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
index 1c590dd..2cc31ca 100644
--- a/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
+++ b/libstdc++-v3/testsuite/experimental/feat-cxx14.cc
@@ -106,10 +106,12 @@
 #  error ""
 #endif
 
-#ifndef  __cpp_lib_shared_timed_mutex
-#  error "__cpp_lib_shared_timed_mutex"
-#elif  __cpp_lib_shared_timed_mutex != 201402
-#  error "__cpp_lib_shared_timed_mutex != 201402"
+#if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
+#  ifndef  __cpp_lib_shared_timed_mutex
+#error "__cpp_lib_shared_timed_mutex"
+#  elif  __cpp_lib_shared_timed_mutex != 201402
+#error "__cpp_lib_shared_timed_mutex != 201402"
+#  endif
 #endif
 
 #ifndef  __cpp_lib_is_final

[PATCH] Fix PR64199

2014-12-09 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-12-09  Richard Biener  

PR middle-end/64199
* fold-const.c (fold_binary_loc): Use TREE_OVERFLOW_P.

* gcc.dg/torture/pr64199.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 218479)
+++ gcc/fold-const.c(working copy)
@@ -10398,8 +10415,8 @@ fold_binary_loc (location_t loc,
 
  /* Don't introduce overflows through reassociation.  */
  if (!any_overflows
- && ((lit0 && TREE_OVERFLOW (lit0))
- || (minus_lit0 && TREE_OVERFLOW (minus_lit0
+ && ((lit0 && TREE_OVERFLOW_P (lit0))
+ || (minus_lit0 && TREE_OVERFLOW_P (minus_lit0
return NULL_TREE;
 
  if (minus_lit0)
Index: gcc/testsuite/gcc.dg/torture/pr64199.c
===
--- gcc/testsuite/gcc.dg/torture/pr64199.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr64199.c  (working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffast-math -frounding-math" } */
+
+float
+foo (void)
+{
+  return 1.1f + 2.2f + 2.2f;
+}

Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Jakub Jelinek

On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote:
> Can we instead of adding new macroses in config/tls.m4 use something
> like that in libgomp:
> 
> #if defined (HAVE_TLS) && defined (USE_EMUTLS)
> 
> (with GCC_CHECK_EMUTLS in libgomp/configure.ac)?

That would be fine too.

Jakub

Re: [PATCH 3/n] OpenMP 4.0 offloading infrastructure: offload tables

2014-12-09 Thread Ilya Verbin

On 04 Dec 20:52, Jakub Jelinek wrote:
> On Thu, Dec 04, 2014 at 10:35:19PM +0300, Ilya Verbin wrote:
> > This issue can be resolved by forcing output of such variables.
> > Is this fix ok?  Should I add a testcase?
> 
> Yes, with proper ChangeLog.  Yes.

Here is updated patch, ok to commit?

However, I don't see -flto option in the build log.  It seems that
check_effective_target_lto isn't working inside libgomp/ directory.
Maybe because ENABLE_LTO is defined only in gcc/configure.ac ?


gcc/
* varpool.c (varpool_node::get_create): Force output of vars with
"omp declare target" attribute.
libgomp/
* testsuite/libgomp.c/target-9.c: New test.


diff --git a/gcc/varpool.c b/gcc/varpool.c
index 0526b7f..db28c2a 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -175,6 +175,7 @@ varpool_node::get_create (tree decl)
   g->have_offload = true;
   if (!in_lto_p)
vec_safe_push (offload_vars, decl);
+  node->force_output = 1;
 #endif
 }
 
diff --git a/libgomp/testsuite/libgomp.c/target-9.c 
b/libgomp/testsuite/libgomp.c/target-9.c
new file mode 100644
index 000..00fe0cb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-9.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+/* { dg-additional-options "-flto" { target lto } } */
+
+#include 
+
+#define N 123456
+
+#pragma omp declare target
+int X, Y;
+#pragma omp end declare target
+
+void
+foo ()
+{
+  #pragma omp target map(alloc: X)
+X = N;
+}
+
+int
+main ()
+{
+  int res;
+
+  foo ();
+
+  #pragma omp target map(alloc: X, Y) map(from: res)
+{
+  Y = N;
+  res = X + Y;
+}
+
+  if (res != N + N)
+abort ();
+
+  return 0;
+}


  -- Ilya

[Android] Stop unconditional disabling of ifuncs for Bionic #2

2014-12-09 Thread Alexander Ivchenko

Hi,

Whether the *linux* target supports ifuncs or not is defined here:

linux_has_ifunc_p (void)
{
  return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
}

Bionic right now supports indirect functions, but there is no way to
notify the compiler about that (For Android OPTION_BIONIC is always on
and so configure time check with --enable-gnu-indirect-function does
not work)

The following patch makes the use of the default version of
TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android
target supports indirect functions or not on configure time.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f22bba8..d4d09d0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-08  Alexander Ivchenko  
+
+   * config/linux.c (linux_has_ifunc_p): Remove.
+   * config/linux.h (TARGET_HAS_IFUNC_P): Use default version.
+
 2014-12-08  Ilya Tocar  

* config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
diff --git a/gcc/config/linux.c b/gcc/config/linux.c
index 6242e11..15df213 100644
--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -23,14 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "linux-protos.h"

-/* Android does not support GNU indirect functions.  */
-
-bool
-linux_has_ifunc_p (void)
-{
-  return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
-}
-
 bool
 linux_libc_has_function (enum function_class fn_class)
 {
diff --git a/gcc/config/linux.h b/gcc/config/linux.h
index d38ef81..6ccacff 100644
--- a/gcc/config/linux.h
+++ b/gcc/config/linux.h
@@ -117,10 +117,6 @@ see the files COPYING3 and COPYING.RUNTIME
respectively.  If not, see

 #else /* !uClinux, i.e., normal Linux */

-/* IFUNCs are supported by glibc, but not by uClibc or Bionic.  */
-# undef TARGET_HAS_IFUNC_P
-# define TARGET_HAS_IFUNC_P linux_has_ifunc_p
-
 /* Determine what functions are present at the runtime;
this includes full c99 runtime and sincos.  */
 # undef TARGET_LIBC_HAS_FUNCTION

Is the patch ok? Bootstrapped/regtested on x86_64-unknown-linux-gnu +
checked that the behavior of i686-linux-android is correct.

Alexander

[AArch64, NEON] Improve vmulX intrinsics

2014-12-09 Thread Jiangjiji


Hi,
 This patch converts more intrinsics to use builtin functions instead of the
previous inline assembly syntax.
 Passed the glorious testsuite of Christophe Lyon.

 Three testcases are added for the testing of intriniscs which are not
covered by the testsuite:
 gcc.target/aarch64/vmull_high.c
 gcc.target/aarch64/vmull_high_lane.c
 gcc.target/aarch64/vmull_high_n.c

 Regtested with aarch64-linux-gnu on QEMU.
 This patch has no regressions for aarch64_be-linux-gnu big-endian target 
too.
 OK for the trunk?



Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,38 @@
+2014-12-09  Felix Yang  
+Jiji Jiang  
+
+   * config/aarch64/aarch64-simd.md (aarch64_mul_n,
+   aarch64_mull_n, aarch64_mull,
+   aarch64_simd_mull2_n, aarch64_mull2_n,
+   aarch64_mull_lane, aarch64_mull2_lane_internal,
+   aarch64_mull_laneq, aarch64_mull2_laneq_internal,
+   aarch64_smull2_lane, aarch64_umull2_lane,
+   aarch64_smull2_laneq, aarch64_umull2_laneq,
+   aarch64_fmulx, aarch64_fmulx, aarch64_fmulx_lane,
+   aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns.
+   * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_,
+   vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n, smull2_n,
+   umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull,
+   umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2,
+   smull2_lane): New builtins.
+   * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32,
+   vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16,
+   vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16,
+   vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32,
+   vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16,
+   vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32,
+   vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8,
+   vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16,
+   vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16,
+   vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16,
+   vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32,
+   vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16,
+   vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32,
+   vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite
+   using builtin functions.
+   * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE,
+   VDQF_Q): New unspec and int iterator.
+
 2014-12-07  Felix Yang  
Shanyao Chen  
 
Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c

===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
(revision 0)
@@ -0,0 +1,111 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad,
+0xfe24, 0xfe9b, 0xff12, 0xff89 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28,
+0xfab0, 0xfb38 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2,
+0xf83b };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae,
+ 0xa758, 0xa802, 0xa8ac, 0xa956 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7,
+ 0xbaf8b2, 0xbaf96d };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8,
+ 0xcbf5a4};
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce,
+ 0x6798, 0x6732, 0x66cc, 0x };
+
+#ifndef INSN_NAME
+#define INSN_NAME vmull_high
+#define TEST_MSG "VMUL_HIGH"
+#endif
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+#define DECL_VMUL(T, W, N)  \
+  DECL_VARIABLE(vector1, T, W, N);  \
+  DECL_VARIABLE(vector2, T, W, N); 
+
+  /* vector_res = OP(vector1, vector2), then store the result.  */
+#define TEST_VMULL_HIGH1(INSN, Q, T1, T2, W, N, W1, N1) \
+  VECT_VAR(vector_res, T1, W1, N1) =\
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),  \
+   VECT_VAR(vector2, T1, W, N));\
+  vst1q##_##T2##W1(VECT_VAR(result, T1, W1, N1),\
+

Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-12-09 Thread Richard Biener

On Mon, Dec 8, 2014 at 3:45 PM, Marc Glisse  wrote:
> On Tue, 18 Nov 2014, Richard Biener wrote:
>
>>> I'll try to replace some more TYPE_MODE during stage3...
>>
>>
>> Btw, a convenience would be to be able to write
>>
>> HONOR_NANS (type)
>>
>> thus effectively make HONOR_* inline functions with a machine_mode
>> and a type overload (and the type overload properly looking at
>> element types).
>
>
> Making those functions inline is not easy, because real.h and tree.h don't
> include each other. Here is a version with the functions not inline. I was
> tempted to also overload on gcond const* (for the cases that call
> gimple_cond_lhs) but the arguments were always gimple and not gcond*, so I
> didn't.
>
> Passes bootstrap+testsuite on x86_64-linux-gnu.

Ok.

Thanks,
Richard.

> 2014-12-08  Marc Glisse  
>
> * real.h (HONOR_NANS): Replace macro with 3 overloaded declarations.
> * real.c: Include rtl.h and options.h.
> (HONOR_NANS): Define three overloads.
> * builtins.c (fold_builtin_classify, fold_builtin_unordered_cmp):
> Simplify argument of HONOR_NANS.
> * fold-const.c (combine_comparisons, fold_truth_not_expr,
> fold_cond_expr_with_comparison, merge_truthop_with_opposite_arm,
> fold_comparison, fold_binary_loc): Likewise.
> * ifcvt.c (noce_try_move, noce_try_minmax): Likewise.
> * ipa-inline-analysis.c (add_clause,
> set_cond_stmt_execution_predicate): Likewise.
> * match.pd: Likewise.
> * rtlanal.c (may_trap_p_1): Likewise.
> * simplify-rtx.c (simplify_const_relational_operation): Likewise.
> * tree-if-conv.c (parse_predicate): Likewise.
> * tree-ssa-ccp.c (valid_lattice_transition): Likewise.
> * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
> * tree-ssa-phiopt.c (minmax_replacement, neg_replacement): Likewise.
> * tree-ssa-reassoc.c (eliminate_using_constants): Likewise.
> * tree-ssa-tail-merge.c (gimple_equal_p): Likewise.
>
> --
> Marc Glisse
> Index: builtins.c
> ===
> --- builtins.c  (revision 218467)
> +++ builtins.c  (working copy)
> @@ -9641,34 +9641,34 @@ fold_builtin_classify (location_t loc, t
>integer_minus_one_node, integer_one_node);
> tmp = fold_build3_loc (loc, COND_EXPR, integer_type_node,
>isinf_call, tmp,
>integer_zero_node);
>   }
>
> return tmp;
>}
>
>  case BUILT_IN_ISFINITE:
> -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg)))
> +  if (!HONOR_NANS (arg)
>   && !HONOR_INFINITIES (TYPE_MODE (TREE_TYPE (arg
> return omit_one_operand_loc (loc, type, integer_one_node, arg);
>
>if (TREE_CODE (arg) == REAL_CST)
> {
>   r = TREE_REAL_CST (arg);
>   return real_isfinite (&r) ? integer_one_node : integer_zero_node;
> }
>
>return NULL_TREE;
>
>  case BUILT_IN_ISNAN:
> -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg
> +  if (!HONOR_NANS (arg))
> return omit_one_operand_loc (loc, type, integer_zero_node, arg);
>
>if (TREE_CODE (arg) == REAL_CST)
> {
>   r = TREE_REAL_CST (arg);
>   return real_isnan (&r) ? integer_one_node : integer_zero_node;
> }
>
>arg = builtin_save_expr (arg);
>return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
> @@ -9782,27 +9782,26 @@ fold_builtin_unordered_cmp (location_t l
>else if (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
>  cmp_type = type0;
>else if (code0 == INTEGER_TYPE && code1 == REAL_TYPE)
>  cmp_type = type1;
>
>arg0 = fold_convert_loc (loc, cmp_type, arg0);
>arg1 = fold_convert_loc (loc, cmp_type, arg1);
>
>if (unordered_code == UNORDERED_EXPR)
>  {
> -  if (!HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0
> +  if (!HONOR_NANS (arg0))
> return omit_two_operands_loc (loc, type, integer_zero_node, arg0,
> arg1);
>return fold_build2_loc (loc, UNORDERED_EXPR, type, arg0, arg1);
>  }
>
> -  code = HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))) ? unordered_code
> -  : ordered_code;
> +  code = HONOR_NANS (arg0) ? unordered_code : ordered_code;
>return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
>   fold_build2_loc (loc, code, type, arg0, arg1));
>  }
>
>  /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal
> arithmetics if it can never overflow, or into internal functions that
> return both result of arithmetics and overflowed boolean flag in
> a complex integer result, or some other check for overflow.  */
>
>  static tree
> Index: fold-const.c
> ===
> --- fold-const.c(revision 218467)
> +++ fold-const.c(w

Re: [PATCH 2/3] Extended if-conversion

2014-12-09 Thread Yuri Rumyantsev

Richard,

Here is updated patch2 with the following changes:
1. Delete functions  phi_has_two_different_args and find_insertion_point.
2. Use only one function for extended predication -
predicate_extended_scalar_phi.
3. Save gsi before insertion of predicate computations for basic
blocks if it has 2 predecessors and
both incoming edges are critical or it gas more than 2 predecessors
and at least one incoming edge
is critical. This saved iterator can be used by extended phi predication.

Here is motivated test-case which explains this point.
Test-case is attached (t5.c) and it must be compiled with -O2
-ftree-loop-vectorize -fopenmp options.
The problem phi is in bb-7:

  bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
  {
:
xmax_edge_18 = xmax_edge_36 + 1;
if (xmax_17 == xmax_27)
  goto ;
else
  goto ;

  }
  bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
  {
:
if (xmax_17 == xmax_27)
  goto ;
else
  goto ;

  }
  bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
  {
:
# xmax_edge_30 = PHI 
xmax_edge_19 = xmax_edge_39 + 1;
goto ;

  }

Note that both incoming edges to bb_7 are critical. If we comment out
restoring gsi in predicate_all_scalar_phi:
#if 0
 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
 || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
   gsi = bb_insert_point (bb);
 else
#endif
   gsi = gsi_after_labels (bb);

we will get ICE:
t5.c: In function 'foo':
t5.c:9:6: error: definition in block 4 follows the use
 void foo (int n)
  ^
for SSA_NAME: _1 in statement:
_52 = _1 & _3;
t5.c:9:6: internal compiler error: verify_ssa failed

smce predicate computations were inserted in bb_7.

ChangeLog is

2014-12-09  Yuri Rumyantsev  

* tree-if-conv.c : Include hash-map.h.
(struct bb_predicate_s): Add new field to save copy of gimple
statement iterator.
(bb_insert_point): New function.
(set_bb_insert_point): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Allow bb has more than 2 predecessors if
AGGRESSIVE_IF_CONV is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchange PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(predicate_scalar_phi): Add new arguments for call of
is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
to true if BB containing phi has more than 2 predecessors or both
incoming edges are critical. Invoke find_phi_replacement_condition and
predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
has 2 predecessors and both incoming edges are critical or it has more
than 2 predecessors and atleast one incoming edge is critical.
Use standard gsi_after_labels otherwise.
Invoke predicate_extended_scalar_phi if EXTENDED is true.
(insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
to save gsi before insertion of predicate computations. SEt-up it to
true for BB with 2 predecessors and critical incoming edges either
number of predecessors is geater 2 and at least one incoming edge is
critical.
Add check that non-predicated block may have statements to insert.
Insert predicate computation of BB just after label if
EXTENDED_PREDICATION is true.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field.

2014-12-04 16:37 GMT+03:00 Richard Biener :
> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> I did simple change by saving gsi iterator for each bb that has
>> critical edges by adding additional field to bb_predicate_s:
>>
>> typedef struct bb_predicate_s {
>>
>>   /* The condition under which this basic block is executed.  */
>>   tree predicate;
>>
>>   /* PREDICATE is gimplified, and the sequence of statements is
>>  recorded here, in order to avoid the duplication of computations
>>  that occur in previous conditions.  See PR44483.  */
>>   gimple_seq predicate_gimplified_stmts;
>>
>>   /* Insertion point for blocks having incoming critical edges.  */
>>   gimple_stmt_iterator gsi;
>> } *bb_predicate_p;
>>
>> and this iterator is saved in  insert_gimplified_predicates before
>> insertion code for predicate computation. I checked that this fix
>> works.
>
> Huh?  I still wonder what the issue is with inserting everything
> after the PHI we predicate.
>
> Well, your updated patch will come with testcases for the testsuite
> that will hopefully fail if doing that.
>
> Richard.
>
>>
>> Now I am implementing me

Re: [COMMITTED] [PING] [PATCH] [AArch64, NEON] More NEON intrinsics improvement

2014-12-09 Thread Christophe Lyon

On 9 December 2014 at 03:26, Yangfei (Felix)  wrote:
>> On 5 December 2014 at 18:44, Tejas Belagod  wrote:
>> >
>> >>>
>> >>> +__extension__ static __inline float32x2_t __attribute__
>> >>> +((__always_inline__))
>> >>> +vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) {
>> >>> +  return __builtin_aarch64_fmav2sf (-__b, __c, __a); }
>> >>> +
>> >>> +__extension__ static __inline float32x4_t __attribute__
>> >>> +((__always_inline__))
>> >>> +vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) {
>> >>> +  return __builtin_aarch64_fmav4sf (-__b, __c, __a); }
>> >>> +
>> >>> +__extension__ static __inline float64x2_t __attribute__
>> >>> +((__always_inline__))
>> >>> +vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c) {
>> >>> +  return __builtin_aarch64_fmav2df (-__b, __c, __a); }
>> >>> +
>> >>> +
>> >
>> >
>> > Thanks, the patch looks good. Just one comment:
>> > You could also add
>> > float32x2_t vfms_n_f32(float32x2_t a, float32x2_t b, float32_t n) and
>> > its Q-variant.
>>
>> You can, if you wish,  deal with Tejas' comment with a follow on patch
>> rather than re-spinning this one.   Provided this patch has no
>> regressions on a big endian and a little endian test run then you can commit 
>> it.
>> Thanks
>> /Marcus
>
>
> No regressions for aarch64_be-linux-gnu target.  Committed as r218484.
> Will come up with a new patch to deal with Tejas' comment.  Thanks.

My validations of trunk show that your new tests are incorrect: none
of them compiles because the hfloat64_t type isn't defined.

Also, keep in mind that the tests in this directory are executed by
the aarch32 target too.

Christophe

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Richard Biener

On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey  wrote:
> On Sat, 2014-12-06 at 19:21 +, Sebastian Pop wrote:
>
>> > I think it does not make sense to duplicate paths at -Os: I disabled the 
>> > FSM
>> > jump-threading when optimizing for size like this.
>> >
>> > diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> > index 29b20c8..ce70311 100644
>> > --- a/gcc/tree-ssa-threadedge.c
>> > +++ b/gcc/tree-ssa-threadedge.c
>> > @@ -1335,8 +1335,9 @@ thread_through_normal_block (edge e,
>> >   return 1;
>> > }
>> >
>> >if (!flag_expensive_optimizations
>> > + || optimize_function_for_size_p (cfun)
>> >   || TREE_CODE (cond) != SSA_NAME
>> >   || e->dest->loop_father != e->src->loop_father
>> >   || loop_depth (e->dest->loop_father) == 0)
>> > return 0;
>> >
>> > I will regstrap and commit the attached patch.
>>
>> Bootstrapped and regression tested on x86_64-linux.
>> Committed r218451.
>>
>> Sebastian
>
> Thanks for getting all this checked in Sebastian, I have tested it on
> coremark and I am getting the speed up that I expect.  But I am a little
> confused about turning off jump threading.  I am getting the
> optimization on coremark with -O3 and that is great and if I use '-O3
> -fno-expensive-optimizations' then I don't see this part of the jump
> threading, also great.  But I was surprised that if I just used '-O3
> -fno-thread-jumps' then I still see this optimization.  Is that
> expected?  Should this test also check flag_thread_jumps?  Or should
> that be getting checked somewhere else?

-fthread-jumps is an RTL optimization flag and ignored on GIMPLE.

Richard.

> Steve Ellcey
>
>

Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length

2014-12-09 Thread Dominique d'Humières

Dear Andre,

The patch causes an ICE for the test gfortran.dg/unlimited_polymorphic_1.f03:

f951: internal compiler error: in gfc_add_component_ref, at fortran/class.c:236

f951: internal compiler error: Abort trap: 6
gfc: internal compiler error: Abort trap: 6 (program f951)
Abort

Reduced test for which the ICE is triggered by ‘len(w)'

MODULE m

contains
  subroutine bar (arg, res)
class(*) :: arg
character(100) :: res
select type (w => arg)
  type is (character(*))
write (res, '(I2)') len(w)
end select
  end subroutine

END MODULE

Note that with your patch at 
https://gcc.gnu.org/ml/fortran/2014-08/msg00022.html, I get the same ICE for 
the Mikael’s test at https://gcc.gnu.org/ml/fortran/2014-08/msg00055.html 
(before your patch for pr60255, it used to give a wrong length: 80 instead of 
20 AFAICR).

Note that the assert at fortran/class.c:236 is also triggered for pr61115.

Thanks for working on these issues,

Dominique

>> On 8 December 2014 at 18:38, Andre Vehreschild  wrote:
>> Hi all,
>> 
>> please find attached a more elaborate patch for pr60255. I totally agree that
>> my first attempt was just scratching the surface of the work needed.
>> 
>> This patch also is *not* complete, but because I am really new to gfortran
>> patching, I don't want to present a final patch only to learn then, that I 
>> have
>> violated design rules, common practice or the like. Therefore please comment
>> and direct me to any sources/ideas to improve the patch.
>> 
>> Topic:
>> The pr 60255 is about assigning a char array to an unlimited polymorphic
>> entity. In the comments the concern about the lost length information is
>> raised. The patch adds a _len component to the unlimited polymorphic entity
>> (after _data and _vtab) and adds an assignment of the string length to _len
>> when a string is pointer assigned to the unlimited poly entity. Furthermore 
>> is
>> the intrinsic len(unlimited poly pointing to a string) resolved to give the
>> _len component.
>> 
>> Yet missing:
>> - assign _len component back to deferred char array length component
>> - transport length along chains of unlimited poly entities, i.e., a => b; c 
>> =>
>>  a where all objects are unlimited poly and b is a string.
>> - allocate() in this context
>> 
>> Patch dependencies:
>> none
>> 
>> Comments, concerns, candy welcome!
>> 
>> Regards,
>>Andre

Re: [PATCH] Do not download packages for graphite loop optimizations by default when using ./contrib/download_prerequisites

2014-12-09 Thread Richard Biener

On Tue, Dec 9, 2014 at 6:36 AM, Chung-Ju Wu  wrote:
> Hi, all,
>
> In the discussion thread last year:
>   https://gcc.gnu.org/ml/gcc-patches/2013-05/msg01334.html
>
> I extended the script ./contrib/download_prerequisites so that it can
> download isl and cloog packages for graphite loop optimizations.
> The patch was proposed to use GRAPHITE_LOOP_OPT=no by default.
> However, the change I committed into trunk is setting GRAPHITE_LOOP_OPT=yes:
>   https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=199297
>
> I am sorry about my carelessness and I would like to propose a new patch
> to fix it.  The plaintext ChangeLog and patch are as follow:

I'd say keep it as =yes (now that we only need ISL) and adjust the comment
instead.

Richard.

>
> Index: contrib/ChangeLog
> ===
> --- contrib/ChangeLog   (revision 218505)
> +++ contrib/ChangeLog   (working copy)
> @@ -1,3 +1,7 @@
> +2014-12-09  Chung-Ju Wu  
> +
> +   * download_prerequisites: Set GRAPHITE_LOOP_OPT=no by default.
> +
>  2014-12-04  Thomas Preud'homme  
>
> * check_GNU_style.sh: Warn for incorrect number of spaces in function
>
>
> Index: contrib/download_prerequisites
> ===
> --- contrib/download_prerequisites  (revision 218505)
> +++ contrib/download_prerequisites  (working copy)
> @@ -22,7 +22,7 @@
>  # If you want to build GCC with the Graphite loop optimizations,
>  # set GRAPHITE_LOOP_OPT=yes to download optional prerequisties
>  # ISL Library and CLooG.
> -GRAPHITE_LOOP_OPT=yes
> +GRAPHITE_LOOP_OPT=no
>
>  # Necessary to build GCC.
>  MPFR=mpfr-2.4.2
>
>
> Is this patch OK for trunk?
>
> Or we can still keep it "yes" since it has been using GRAPHITE_LOOP_OPT=yes
> for a long time.  In that case, I will help to update the comment accordingly.
>
> Any comment? :)
>
>
> Best regards,
> jasonwucj

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko

The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
{
  vd->e[i].mode = VOIDmode;
  vd->e[i].oldest_regno = i;
  vd->e[i].next_regno = INVALID_REGNUM;
}

It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.

Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.

First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)

Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)

Below is rtl dump of buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR ;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
(const_vector:V4SI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])) ../regrename.c:1171 -1
 (nil))

(insn 206 205 208 (set (reg:V4SI 769)
(mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
A128])) ../regrename.c:1171 -1
 (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 32 [0x20])
(const_int 33 [0x21])
(const_int 34 [0x22])
(const_int 35 [0x23])
])
(nil)))

(insn 208 206 207 (set (reg:V4SI 770)
(vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
(reg:V4SI 769))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) ../regrename.c:1171 -1
 (nil))

(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 770)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 2 [0x2])
]))) ../regrename.c:1171 -1
 (nil))

(insn 209 207 210 (set (reg:V4SI 771)
(const_vector:V4SI [
(const_int -1 [0x])
(const_int -1 [0x])
(const_int -1 [0x])
(const_int -1 [0x])
])) ../regrename.c:1171 -1
 (nil))

(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 771)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 3 [0x3])
]))) ../regrename.c:1171 -1
 (nil))
(insn 211 210 212 (set (reg:V8HI 772)
(vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(const_int 48 [0x30]))) ../regrename.c:1171 -1
 (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak  wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak  wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09  Evgeny Stupachenko  
>>>
>>> gcc/
>>> * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>> expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d 
>>> *d)
>>>  dcopy.op0 = dcopy.op1 = d->op1;
>>>else
>>>  dcopy.op0 = dcopy.op1 = d->op0;
>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>dcopy.one_operand_p = true;
>>>
>>>for (i = 0; i < nelt; ++i)

Re: [AArch64, NEON] Improve vmulX intrinsics

2014-12-09 Thread Christophe Lyon

On 9 December 2014 at 13:52, Jiangjiji  wrote:
> Hi,
>  This patch converts more intrinsics to use builtin functions instead of
> the
> previous inline assembly syntax.
>  Passed the glorious testsuite of Christophe Lyon.
>
>  Three testcases are added for the testing of intriniscs which are not
> covered by the testsuite:
>  gcc.target/aarch64/vmull_high.c
>  gcc.target/aarch64/vmull_high_lane.c
>  gcc.target/aarch64/vmull_high_n.c
>

As I said here:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01934.html
I am in tre process of converting my existing testsuite to GCC/Dejagnu.
Please do not duplicate work.


>  Regtested with aarch64-linux-gnu on QEMU.
>  This patch has no regressions for aarch64_be-linux-gnu big-endian
> target too.
>  OK for the trunk?
>
>
>
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog   (revision 218464)
> +++ gcc/ChangeLog   (working copy)
> @@ -1,3 +1,38 @@
> +2014-12-09  Felix Yang  
> +Jiji Jiang  
> +
> +   * config/aarch64/aarch64-simd.md (aarch64_mul_n,
> +   aarch64_mull_n, aarch64_mull,
> +   aarch64_simd_mull2_n, aarch64_mull2_n,
> +   aarch64_mull_lane, aarch64_mull2_lane_internal,
> +   aarch64_mull_laneq,
> aarch64_mull2_laneq_internal,
> +   aarch64_smull2_lane, aarch64_umull2_lane,
> +   aarch64_smull2_laneq, aarch64_umull2_laneq,
> +   aarch64_fmulx, aarch64_fmulx, aarch64_fmulx_lane,
> +   aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns.
> +   * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_,
> +   vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n,
> smull2_n,
> +   umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull,
> +   umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2,
> +   smull2_lane): New builtins.
> +   * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32,
> +   vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16,
> +   vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16,
> +   vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32,
> +   vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16,
> +   vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32,
> +   vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8,
> +   vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16,
> +   vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16,
> +   vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16,
> +   vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32,
> +   vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16,
> +   vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32,
> +   vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite
> +   using builtin functions.
> +   * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE,
> +   VDQF_Q): New unspec and int iterator.
> +
>  2014-12-07  Felix Yang  
> Shanyao Chen  
>  Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
> (revision 0)
> +++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_high.c
> (revision 0)
> @@ -0,0 +1,111 @@
> +#include 
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +
> +/* Expected results.  */
> +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfc48, 0xfcbf, 0xfd36, 0xfdad,
> +0xfe24, 0xfe9b, 0xff12, 0xff89 };
> +VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa28,
> +0xfab0, 0xfb38 };
> +VECT_VAR_DECL(expected,int,64,2) [] = { 0xf7a2,
> +0xf83b };
> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xa4b0, 0xa55a, 0xa604, 0xa6ae,
> + 0xa758, 0xa802, 0xa8ac, 0xa956 };
> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xbaf73c, 0xbaf7f7,
> + 0xbaf8b2, 0xbaf96d };
> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xcbf4d8,
> + 0xcbf5a4};
> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x6530, 0x659a, 0x6464, 0x64ce,
> + 0x6798, 0x6732, 0x66cc, 0x };
> +
> +#ifndef INSN_NAME
> +#define INSN_NAME vmull_high
> +#define TEST_MSG "VMUL_HIGH"
> +#endif
> +
> +#define FNNAME1(NAME) exec_ ## NAME
> +#define FNNAME(NAME) FNNAME1(NAME)
> +
> +void FNNAME (INSN_NAME) (void)
> +{
> +#define DECL_VMUL(T, W, N)  \
> +  DECL_VARIABLE(vector1, T, W, N);  \
> +  DECL_VARIABLE(vector2, T, W, N);
> +
> +  /* vector_res = OP(vector1, vector2)

[PATCH] Fix PR64193

2014-12-09 Thread Richard Biener


The following patch restores proper CSE by FRE/PRE when doing
optimistic value-numbering of stores/loads.  In that case we
may end up value-numbering a virtual operand to the only executable
edge value of a PHI - but as the alias walker doesn't know this
it fails to handle the PHI which results in inconsistent virtual
operand being used for the entry in the hashtable.  Since
r173250 we refuse to value-number to sth non-VARYING in the 2nd
iteration if the 1st iteration computed VARYING (which is correct).

The fix is to tell the alias walker about this optimistic handling.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-12-09  Richard Biener  

PR tree-optimization/64193
* tree-ssa-alias.c (walk_non_aliased_vuses): Add valueize parameter
and valueize the VUSE before looking up the def stmt.
* tree-ssa-alias.h (walk_non_aliased_vuses): Adjust prototype.
* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Pass vn_valueize
to walk_non_aliased_vuses.
(vn_reference_lookup): Likewise.
* tree-ssa-dom.c (lookup_avail_expr): Pass NULL as valueize
callback to walk_non_aliased_vuses.

* gcc.dg/tree-ssa/ssa-fre-43.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 218479)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -2632,12 +2632,18 @@ get_continuation_for_phi (gimple phi, ao
If TRANSLATE returns NULL the walk continues and TRANSLATE is supposed
to adjust REF and *DATA to make that valid.
 
+   VALUEIZE if non-NULL is called with the next VUSE that is considered
+   and return value is substituted for that.  This can be used to
+   implement optimistic value-numbering for example.  Note that the
+   VUSE argument is assumed to be valueized already.
+
TODO: Cache the vector of equivalent vuses per ref, vuse pair.  */
 
 void *
 walk_non_aliased_vuses (ao_ref *ref, tree vuse,
void *(*walker)(ao_ref *, tree, unsigned int, void *),
void *(*translate)(ao_ref *, tree, void *, bool),
+   tree (*valueize)(tree),
void *data)
 {
   bitmap visited = NULL;
@@ -2663,6 +2669,8 @@ walk_non_aliased_vuses (ao_ref *ref, tre
   else if (res != NULL)
break;
 
+  if (valueize)
+   vuse = valueize (vuse);
   def_stmt = SSA_NAME_DEF_STMT (vuse);
   if (gimple_nop_p (def_stmt))
break;
Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h(revision 218479)
+++ gcc/tree-ssa-alias.h(working copy)
@@ -124,6 +124,7 @@ extern void *walk_non_aliased_vuses (ao_
 void *(*)(ao_ref *, tree,
   unsigned int, void *),
 void *(*)(ao_ref *, tree, void *, bool),
+tree (*)(tree),
 void *);
 extern unsigned int walk_aliased_vdefs (ao_ref *, tree,
bool (*)(ao_ref *, tree, void *),
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 218479)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2160,7 +2176,8 @@ vn_reference_lookup_pieces (tree vuse, a
*vnresult =
  (vn_reference_t)walk_non_aliased_vuses (&r, vr1.vuse,
  vn_reference_lookup_2,
- vn_reference_lookup_3, &vr1);
+ vn_reference_lookup_3,
+ vn_valueize, &vr1);
   gcc_checking_assert (vr1.operands == shared_lookup_references);
 }
 
@@ -2212,7 +2229,8 @@ vn_reference_lookup (tree op, tree vuse,
   wvnresult =
(vn_reference_t)walk_non_aliased_vuses (&r, vr1.vuse,
vn_reference_lookup_2,
-   vn_reference_lookup_3, &vr1);
+   vn_reference_lookup_3,
+   vn_valueize, &vr1);
   gcc_checking_assert (vr1.operands == shared_lookup_references);
   if (wvnresult)
{
Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 218479)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -2635,7 +2635,7 @@ lookup_avail_expr (gimple stmt, bool ins
&& TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
&& (ao_ref_init (&ref, gimple_assign_rhs1 (stmt)), true)
&& walk_non_aliased_vuses (&ref, vuse2,
-  vuse_eq, NULL, vuse1) != NULL))
+

[PATCH] Fix PR42108

2014-12-09 Thread Richard Biener


The following finally fixes PR42108 (well, hopefully...) by using
range-information on SSA names to allow the integer divisions introduced
by Fortran array lowering being hoisted out of loops, thus detecting
them as not trapping.

I chose to enhance tree_single_nonzero_warnv_p for this and adjusted
operation_could_trap_helper_p to use this helper.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-12-09  Richard Biener  

PR tree-optimization/42108
* fold-const.c (tree_single_nonzero_warnv_p): Use range
information associated with SSA names.
* tree-eh.c (operation_could_trap_helper_p): Use
tree_single_nonzero_warnv_p to check for trapping non-fp
operation.
* tree-vrp.c (remove_range_assertions): Remove bogus assert.

* gfortran.dg/pr42108.f90: Adjust testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 218479)
+++ gcc/fold-const.c(working copy)
@@ -83,6 +83,8 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "generic-match.h"
 #include "optabs.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
otherwise.  */
@@ -15362,6 +15381,26 @@ tree_single_nonzero_warnv_p (tree t, boo
}
   break;
 
+case SSA_NAME:
+  if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
+   {
+ wide_int minv, maxv;
+ enum value_range_type rtype = get_range_info (t, &minv, &maxv);
+ if (rtype == VR_RANGE)
+   {
+ if (wi::lt_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t)))
+ || wi::gt_p (minv, 0, TYPE_SIGN (TREE_TYPE (t
+   return true;
+   }
+ else if (rtype == VR_ANTI_RANGE)
+   {
+ if (wi::le_p (minv, 0, TYPE_SIGN (TREE_TYPE (t)))
+ && wi::ge_p (maxv, 0, TYPE_SIGN (TREE_TYPE (t
+   return true;
+   }
+   }
+  break;
+
 default:
   break;
 }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 218479)
+++ gcc/tree-eh.c   (working copy)
@@ -2440,13 +2440,16 @@ operation_could_trap_helper_p (enum tree
 case ROUND_MOD_EXPR:
 case TRUNC_MOD_EXPR:
 case RDIV_EXPR:
-  if (honor_snans || honor_trapv)
-   return true;
-  if (fp_operation)
-   return flag_trapping_math;
-  if (!TREE_CONSTANT (divisor) || integer_zerop (divisor))
-return true;
-  return false;
+  {
+   if (honor_snans || honor_trapv)
+ return true;
+   if (fp_operation)
+ return flag_trapping_math;
+   bool sop;
+   if (!tree_single_nonzero_warnv_p (divisor, &sop))
+ return true;
+   return false;
+  }
 
 case LT_EXPR:
 case LE_EXPR:
Index: gcc/testsuite/gfortran.dg/pr42108.f90
===
--- gcc/testsuite/gfortran.dg/pr42108.f90   (revision 218479)
+++ gcc/testsuite/gfortran.dg/pr42108.f90   (working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-fre1" }
+! { dg-options "-O2 -fdump-tree-fre1 -fdump-tree-lim1-details" }
 
 subroutine  eval(foo1,foo2,foo3,foo4,x,n,nnd)
   implicit real*8 (a-h,o-z)
@@ -21,7 +21,10 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n
   end do
 end subroutine eval
 
-! There should be only one load from n left
+! There should be only one load from n left and the division should
+! be hoisted out of the loop
 
 ! { dg-final { scan-tree-dump-times "\\*n_" 1 "fre1" } }
+! { dg-final { scan-tree-dump-times "Moving statement" 5 "lim1" } }
 ! { dg-final { cleanup-tree-dump "fre1" } }
+! { dg-final { cleanup-tree-dump "lim1" } }
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 218479)
+++ gcc/tree-vrp.c  (working copy)
@@ -6866,12 +6866,9 @@ remove_range_assertions (void)
tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
tree var;
-   tree cond = fold (ASSERT_EXPR_COND (rhs));
use_operand_p use_p;
imm_use_iterator iter;
 
-   gcc_assert (cond != boolean_false_node);
-
var = ASSERT_EXPR_VAR (rhs);
gcc_assert (TREE_CODE (var) == SSA_NAME);

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2014-12-09 Thread Richard Biener

On Mon, 8 Dec 2014, Ilya Verbin wrote:

> Hi,
> 
> On 28 Nov 09:36, Richard Biener wrote:
> > On Fri, 28 Nov 2014, Ilya Verbin wrote:
> > > I found a bug here, have_{lto,offload} must be set if at least one file 
> > > contains
> > > lto/offload sections, but currently they are overwritten by the last file.
> > > Fix is bootstrapped and regtested on x86_64-linux.  OK for trunk?
> > Ok.
> 
> Unfortunately, this fix was not general enough.
> There might be cases when mixed object files get into lto-wrapper, ie some of
> them contain only LTO sections, some contain only offload sections, and some
> contain both.  But when lto-wrapper will pass all these files to 
> recompilation,
> the compiler might crash (it depends on the order of input files), since in
> read_cgraph_and_symbols it expects that *all* input files contain IR section 
> of
> given type.
> This patch splits input objects from argv into lto_argv and offload_argv, so
> that all files in arrays contain corresponding IR.
> Similarly, in lto-plugin, it was bad idea to add objects, which contain 
> offload
> IR without LTO, to claimed_files, since this may corrupt a resolution file.
> 
> Tested on various combinations of files with/without -flto and with/without
> offload, using trunk ld and gold, also tested on ld without plugin support.
> Bootstrap and make check passed on x86_64-linux and i686-linux.  Ok for trunk?

Did you check that bootstrap-lto still works?  Ok if so.

Thanks,
Richard.

> But I am unable to run lto-bootstrap on trunk, it says:
> /x/build-x86_64-linux/./prev-gcc/gcc-ar -B/x/build-x86_64-linux/./prev-gcc/ 
> cru libdecnumber.a decNumber.o decContext.o decimal32.o decimal64.o 
> decimal128.o bid2dpd_dpd2bid.o host-ieee32.o host-ieee64.o host-ieee128.o
> /usr/bin/ar terminated with signal 11 [Segmentation fault], core dumped
> make[4]: *** [libdecnumber.a] Error 1
> 
> 
> gcc/
>   * lto-wrapper.c (compile_offload_image): Start processing in_argv
>   from 0 instead of 1.
>   (run_gcc): Put offload objects into offload_argv, put LTO objects and
>   possible preceding arguments into lto_argv.
>   Pass offload_argv to compile_images_for_offload_targets instead of argv.
>   Use lto_argv for LTO recompilation instead of argv.
> lto-plugin/
>   * lto-plugin.c (offload_files, num_offload_files): New static variables.
>   (free_1): Use arguments instead of global variables.
>   (free_2): Free offload_files.
>   (all_symbols_read_handler): Add names from offload_files to lto-wrapper
>   arguments.
>   (claim_file_handler): Do not add file to claimed_files if it contains
>   offload sections without LTO sections.  Add it to offload_files instead.
> 
> 
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 0f69457..f75c0dc 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -669,7 +669,7 @@ compile_offload_image (const char *target, const char 
> *compiler_path,
>obstack_ptr_grow (&argv_obstack, filename);
>  
>/* Append names of input object files.  */
> -  for (unsigned i = 1; i < in_argc; i++)
> +  for (unsigned i = 0; i < in_argc; i++)
>   obstack_ptr_grow (&argv_obstack, in_argv[i]);
>  
>/* Append options from offload_lto sections.  */
> @@ -883,6 +883,8 @@ run_gcc (unsigned argc, char *argv[])
>int new_head_argc;
>bool have_lto = false;
>bool have_offload = false;
> +  unsigned lto_argc = 0, offload_argc = 0;
> +  char **lto_argv, **offload_argv;
>  
>/* Get the driver and options.  */
>collect_gcc = getenv ("COLLECT_GCC");
> @@ -896,6 +898,11 @@ run_gcc (unsigned argc, char *argv[])
>   &decoded_options,
>   &decoded_options_count);
>  
> +  /* Allocate arrays for input object files with LTO or offload IL,
> + and for possible preceding arguments.  */
> +  lto_argv = XNEWVEC (char *, argc);
> +  offload_argv = XNEWVEC (char *, argc);
> +
>/* Look at saved options in the IL files.  */
>for (i = 1; i < argc; ++i)
>  {
> @@ -918,17 +925,27 @@ run_gcc (unsigned argc, char *argv[])
>   }
>fd = open (argv[i], O_RDONLY);
>if (fd == -1)
> - continue;
> + {
> +   lto_argv[lto_argc++] = argv[i];
> +   continue;
> + }
> +
> +  if (find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
> +   &fdecoded_options, &fdecoded_options_count,
> +   collect_gcc))
> + {
> +   have_lto = true;
> +   lto_argv[lto_argc++] = argv[i];
> + }
> +
> +  if (find_and_merge_options (fd, file_offset, 
> OFFLOAD_SECTION_NAME_PREFIX,
> +   &offload_fdecoded_options,
> +   &offload_fdecoded_options_count, collect_gcc))
> + {
> +   have_offload = true;
> +   offload_argv[offload_argc++] = argv[i];
> + }
>  
> -  have_lto
> - |= find_and_merge_options (f

Re: [PATCH 8/9] Negative numbers added for sreal class.

2014-12-09 Thread Richard Biener

On Mon, Dec 8, 2014 at 5:52 PM, Martin Liška  wrote:
> On 11/28/2014 10:32 AM, Richard Biener wrote:
>>
>> On Thu, Nov 27, 2014 at 6:08 PM, Martin Liška  wrote:
>>>
>>> On 11/21/2014 04:21 PM, Martin Liška wrote:


 On 11/21/2014 04:02 PM, Richard Biener wrote:
>
>
> On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška  wrote:
>
>> Hello.
>>
>> Ok, this is simplified, one can use sreal a = 12345 and it works ;)
>>
>>> that's a  new API, right?  There is no max () and I think that using
>>> LONG_MIN here is asking for trouble (host dependence).  The
>>> comment in the file says the max should be
>>> sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min
>>> sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)?
>>>
>>
>> Sure, sreal can store much bigger(smaller) numbers :)
>>
>>> Where do you need sreal::to_double?  The host shouldn't perform
>>> double calculations so it can be only for dumping?  In which case
>>> the user should have used sreal::dump (), maybe with extra
>>> arguments.
>>>
>>
>> That new function was request from Honza, only for debugging purpose.
>> I agree that dump should this kind of job.
>>
>> If no other problem, I will run tests once more and commit it.
>> Thanks,
>> Martin
>
>
>
> -#define SREAL_MAX_EXP (INT_MAX / 4)
> +#define SREAL_MAX_EXP (INT_MAX / 8)
>
> this change doesn't look necessary anymore?
>
> Btw, it's also odd that...
>
>#define SREAL_PART_BITS 32
> ...
>#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 1))
>#define SREAL_MAX_SIG (((uint64_t) 1 << SREAL_PART_BITS) - 1)
>
> thus all m_sig values fit in 32bits but we still use a uint64_t m_sig
> ...
> (the implementation uses 64bit for internal computations, but still
> the storage is wasteful?)
>
> Of course the way normalize() works requires that storage to be
> 64bits to store unnormalized values.
>
> I'd say ok with the SREAL_MAX_EXP change reverted.
>

 Hi.

 You are right, this change was done because I used one bit for
 m_negative
 (bitfield), not needed any more.

 Final version attached.

 Thank you,
 Martin

> Thanks,
> Richard.
>
>
>>
>>> Otherwise looks good to me and sorry for not noticing the above
>>> earlier.
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks,
 Martin


>>  };
>>
>>  extern void debug (sreal &ref);
>> @@ -76,12 +133,12 @@ inline sreal &operator+= (sreal &a, const
>> sreal
>> &b)
>>
>>  inline sreal &operator-= (sreal &a, const sreal &b)
>>  {
>> -return a = a - b;
>> +  return a = a - b;
>>  }
>>
>>  inline sreal &operator/= (sreal &a, const sreal &b)
>>  {
>> -return a = a / b;
>> +  return a = a / b;
>>  }
>>
>>  inline sreal &operator*= (sreal &a, const sreal &b)
>> --
>> 2.1.2
>>
>>

>>

>>>
>>> Hello.
>>>
>>> After IRC discussions, I decided to give sreal another refactoring where
>>> I
>>> use int64_t for m_sig.
>>>
>>> This approach looks much easier and straightforward. I would like to
>>> ask folk for comments?
>>
>>
>> I think you want to exclude LONG_MIN (how do you know that LONG_MIN
>> is min(int64_t)?) as a valid value for m_sig - after all SREAL_MIN_SIG
>> makes it symmetric.  Or simply use abs_hwi at
>
>
> Hello.
>
> I decided to use abs_hwi.

That will ICE if you do

  sreal x (- __LONG_MAX__ - 1);

maybe that's the only case though.

 sreal::normalize ()
 {
+  int64_t s = m_sig < 0 ? -1 : 1;
+  HOST_WIDE_INT sig = abs_hwi (m_sig);
+
   if (m_sig == 0)
...
 }
+
+  m_sig = s * sig;
 }

it's a bit awkward to strip the sign and then put it back on this way.  Also
now using a signed 'sig' where it was unsigned before.  And keeping
the first test using m_sig instead of sig.

I'd simply have used 'unsigned HOST_WIDE_INT sig = absu_hwi (m_sig);'
instead.

The rest of the patch is ok with the above change.

Thanks,
Richard.


>>
>> +  int64_t s = m_sig < 0 ? -1 : 1;
>> +  uint64_t sig = m_sig == LONG_MIN ? LONG_MAX : std::abs (m_sig);
>>
>> -#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 1))
>> -#define SREAL_MAX_SIG (((uint64_t) 1 << SREAL_PART_BITS) - 1)
>> +#define SREAL_MIN_SIG ((uint64_t) 1 << (SREAL_PART_BITS - 2))
>> +#define SREAL_MAX_SIG (((uint64_t) 1 << (SREAL_PART_BITS - 1)) - 1)
>>
>> shouldn't this also be -2 in the last line?
>
>
> It's correct, in the first line, I s/'SREAL_PART_BITS - 1'/'SREAL_PART_BITS
> - 2' and
> second one is also decremented: s/'SREAL_PART_BITS'/'REAL_PART_BITS - 1'.
>
>>
>> That is, you effectively use the MSB as a sign bit?
>
>
> Yes. It uses signed inte

[PATCH, rtl-optimization]: Fix PR64213 by reverting fix for PR64037

2014-12-09 Thread Uros Bizjak

Hello!

The fix for PR64037 [1] introduced severe problems, including
bootstrap failure on alphaev6-linux-gnu. Non-bootstrapped testsuite
results [2] can be compared with bootstrapped gcc with a revert [3].
As discussed in the PR, the problem was in a missed SI-DI zero_extend.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03405.html
[2] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01068.html
[3] https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01069.html

The patch fixes the regression by reverting the above patch.

2014-12-09  Uros Bizjak  

PR bootstrap/64213
Revert:
2014-11-28  H.J. Lu  

PR rtl-optimization/64037
* combine.c (setup_incoming_promotions): Pass the argument
before any promotions happen to promote_function_mode.

* configure.tgt (x86_64): Tune -m32 multilib to generic.

testsuite/ChangeLog:

2014-12-09  Uros Bizjak  

PR bootstrap/64213
Revert:
2014-11-28  H.J. Lu  

PR rtl-optimization/64037
* g++.dg/pr64037.C: New test.

Patch was bootstrapped and regression tested on alphaev68-linux-gnu.

Approved by Richi in the PR audit trail, will be committed to mainline
SVN, 4.9 and 4.8 branch.

Uros.
Index: combine.c
===
--- combine.c   (revision 218160)
+++ combine.c   (revision 218161)
@@ -1561,8 +1561,8 @@ setup_incoming_promotions (rtx_insn *first)
   uns3 = TYPE_UNSIGNED (DECL_ARG_TYPE (arg));
 
   /* The mode and signedness of the argument as it is actually passed,
- after any TARGET_PROMOTE_FUNCTION_ARGS-driven ABI promotions.  */
-  mode3 = promote_function_mode (DECL_ARG_TYPE (arg), mode2, &uns3,
+ see assign_parm_setup_reg in function.c.  */
+  mode3 = promote_function_mode (TREE_TYPE (arg), mode1, &uns1,
 TREE_TYPE (cfun->decl), 0);
 
   /* The mode of the register in which the argument is being passed.  */

Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-12-09 Thread Tejas Belagod


On 18/11/14 11:28, Yangfei (Felix) wrote:

On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:

+(define_expand "doloop_end"
+  [(use (match_operand 0 "" ""))  ; loop pseudo
+   (use (match_operand 1 "" ""))] ; label
+  ""
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+  if (optimize > 0 && flag_modulo_sched)


You'd be better off moving this condition into the expansion predicate (which is
currently "").

This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.


r~



Yeah, that's a good idea.  See my updated patch :-)


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  
+
+   * config/aarch64/aarch64.c (doloop_end): New pattern.
+   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
+
  2014-11-11  Andrew Pinski  

Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,43 @@
[(set_attr "type" "mrs")])


+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand "doloop_end"
+  [(use (match_operand 0 "" ""))  ; loop pseudo
+   (use (match_operand 1 "" ""))] ; label
+  "optimize > 0 && flag_modulo_sched"
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+
+  if (GET_MODE (operands[0]) != DImode)
+FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+})
+
  ;; AdvSIMD Stuff
  (include "aarch64-simd.md")

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
aarch64_use_by_pieces_infrastructure_p

+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
  struct gcc_target targetm = TARGET_INITIALIZER;

  #include "gt-aarch64.h"



Hi Felix,

This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 
when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out.


Please could you have a look?

Thanks,
Tejas.

Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Varvara Rainchik

Ok, the following patch works for Android:

2014-12-09  Varvara Rainchik  

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add GCC_CHECK_EMUTLS.
* libgomp.h: Add check for USE_EMUTLS: this case
is equal to HAVE_TLS.
* team.c: Likewise.

--

diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index cea6366..16ec158 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -245,6 +245,9 @@ fi
 # See if we support thread-local storage.
 GCC_CHECK_TLS

+# See if we have emulated thread-local storage.
+GCC_CHECK_EMUTLS
+
 # See what sort of export controls are available.
 LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY
 LIBGOMP_CHECK_ATTRIBUTE_DLLEXPORT
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1482cc..b694356 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -471,7 +471,7 @@ enum gomp_cancel_kind

 /* ... and here is that TLS data.  */

-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
 extern __thread struct gomp_thread gomp_tls_data;
 static inline struct gomp_thread *gomp_thread (void)
 {
diff --git a/libgomp/team.c b/libgomp/team.c
index e6a6d8f..594127c 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -37,7 +37,7 @@ pthread_key_t gomp_thread_destructor;


 /* This is the libgomp per-thread data structure.  */
-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
 __thread struct gomp_thread gomp_tls_data;
 #else
 pthread_key_t gomp_tls_key;
@@ -70,7 +70,7 @@ gomp_thread_start (void *xdata)
   void (*local_fn) (void *);
   void *local_data;

-#ifdef HAVE_TLS
+#if defined HAVE_TLS || defined USE_EMUTLS
   thr = &gomp_tls_data;
 #else
   struct gomp_thread local_thr;
@@ -916,7 +916,7 @@ gomp_team_end (void)
 static void __attribute__((constructor))
 initialize_team (void)
 {
-#ifndef HAVE_TLS
+#if !defined HAVE_TLS && !defined USE_EMUTLS
   static struct gomp_thread initial_thread_tls_data;

   pthread_key_create (&gomp_tls_key, NULL);

Changes are bootstrapped and regtested on linux, all make check tests
now also pass with --disable-tls.
Is it ok for upstream?

2014-12-09 15:12 GMT+03:00 Jakub Jelinek :
> On Tue, Dec 09, 2014 at 02:49:44PM +0300, Varvara Rainchik wrote:
>> Can we instead of adding new macroses in config/tls.m4 use something
>> like that in libgomp:
>>
>> #if defined (HAVE_TLS) && defined (USE_EMUTLS)
>>
>> (with GCC_CHECK_EMUTLS in libgomp/configure.ac)?
>
> That would be fine too.
>
> Jakub

Re: [PATCH] Update libgcc.texi to match implementation in libgcc/libgcc2.c

2014-12-09 Thread Ian Lance Taylor

On Mon, Oct 27, 2014 at 8:19 PM, Kito Cheng  wrote:
>
> This patch update `Bit operations` section in libgcc.text, most bit
> operation function is take an unsigned integer instead of signed
> integer in libgcc/libgcc2.c [1], and it seem more make sense :)
>
> ChangeLog
> 2014-10-28  Kito Cheng  
>
> * doc/libgcc.texi: Update text to match implementation in
> libgcc/libgcc2.c
>
> [1] 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=libgcc/libgcc2.c;h=46d6a2ef030ff98a944935f77529cee9c05fb326;hb=HEAD

This is OK.

Thanks.

Ian

Re: [PATCH 2/3] Extended if-conversion

2014-12-09 Thread Richard Biener

On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Here is updated patch2 with the following changes:
> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
> 2. Use only one function for extended predication -
> predicate_extended_scalar_phi.
> 3. Save gsi before insertion of predicate computations for basic
> blocks if it has 2 predecessors and
> both incoming edges are critical or it gas more than 2 predecessors
> and at least one incoming edge
> is critical. This saved iterator can be used by extended phi predication.
>
> Here is motivated test-case which explains this point.
> Test-case is attached (t5.c) and it must be compiled with -O2
> -ftree-loop-vectorize -fopenmp options.
> The problem phi is in bb-7:
>
>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>   {
> :
> xmax_edge_18 = xmax_edge_36 + 1;
> if (xmax_17 == xmax_27)
>   goto ;
> else
>   goto ;
>
>   }
>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>   {
> :
> if (xmax_17 == xmax_27)
>   goto ;
> else
>   goto ;
>
>   }
>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>   {
> :
> # xmax_edge_30 = PHI 
> xmax_edge_19 = xmax_edge_39 + 1;
> goto ;
>
>   }
>
> Note that both incoming edges to bb_7 are critical. If we comment out
> restoring gsi in predicate_all_scalar_phi:
> #if 0
>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>  || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>gsi = bb_insert_point (bb);
>  else
> #endif
>gsi = gsi_after_labels (bb);
>
> we will get ICE:
> t5.c: In function 'foo':
> t5.c:9:6: error: definition in block 4 follows the use
>  void foo (int n)
>   ^
> for SSA_NAME: _1 in statement:
> _52 = _1 & _3;
> t5.c:9:6: internal compiler error: verify_ssa failed
>
> smce predicate computations were inserted in bb_7.

The issue is obviously that the predicates have already been emitted
in the target BB - that's of course the wrong place.  This is done
by insert_gimplified_predicates.

This just shows how edge predicate handling is broken - we don't
seem to have a sequence of gimplified stmts for edge predicates
but push those to e->dest which makes this really messy.

Rather than having a separate phase where we insert all
gimplified bb predicates we should do that on-demand when
predicating a PHI.

Your patch writes to stderr - that's bad - use dump_file and guard
the printfs properly.

You also still have two functions for PHI predication.  And the
new extended variant doesn't commonize the 2-args and general
paths.

I'm not at all happy with this code.  It may be existing if-conv codes
fault but making it even worse is not an option.

Again - what's wrong with simply splitting critical edges if
aggressive_if_conv?  I think that would very much simplify
things here.  Or alternatively use gsi_insert_on_edge and
commit edge insertions before merging the blocks.

Thanks,
Richard.

> ChangeLog is
>
> 2014-12-09  Yuri Rumyantsev  
>
> * tree-if-conv.c : Include hash-map.h.
> (struct bb_predicate_s): Add new field to save copy of gimple
> statement iterator.
> (bb_insert_point): New function.
> (set_bb_insert_point): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> AGGRESSIVE_IF_CONV is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (predicate_scalar_phi): Add new arguments for call of
> is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
> to true if BB containing phi has more than 2 predecessors or both
> incoming edges are critical. Invoke find_phi_replacement_condition and
> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
> has 2 predecessors and both incoming edges are critical or it has more
> than 2 predecessors and atleast one incoming edge is critical.
> Use standard gsi_after_labels otherwise.
> Invoke predicate_extended_scalar_phi if EXTENDED is true.
> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
> to save gsi before insertion of predicate computations. SEt-up it to
> true for BB with 2 predecessors and critical incoming edges either
> number of predecessors is geater 2 and at least one incoming edge is
> critical.
> Add check that non-predicated block may have statements to insert.
> Insert predicate computation of BB

[PATCH 0/4] [AARCH64,SIMD] PR63870 Improve error messages for single lane load/store

2014-12-09 Thread charles . baylis

From: Charles Baylis 

This patch series moves the checking of lane indices for vld[234](q?)_lane and
vst[234](q?)_lane intrinsics so that it occurs during builtin expansion.

The q register variants are checked directly, but since the d register variants
use the same intrinsics, these are checked in arm_neon.h using
__builtin_aarch64_im_land_boundsi().

Tested with make check-gcc on aarch64-oe-linux, with no regressions.

Charles Baylis (4):
  vldN_lane error message enhancements (Q registers)
  vldN_lane error message enhancements (D registers)
  vstN_lane error message enhancements (Q register)
  vstN_lane error message enhancements (D registers)

 gcc/config/aarch64/aarch64-builtins.c | 32 +++-
 gcc/config/aarch64/aarch64-simd.md| 12 ++--
 gcc/config/aarch64/arm_neon.h | 12 
 3 files changed, 45 insertions(+), 11 deletions(-)

-- 
1.9.1

[PATCH 1/4] vldN_lane error message enhancements (Q registers)

2014-12-09 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog:

  Charles Baylis  

PR target/63870
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
Add qualifier_struct_load_store_lane_index.
(aarch64_types_loadstruct_lane_qualifiers): Use
qualifier_struct_load_store_lane_index for lane index argument for
last argument.
(builtin_simd_arg): Add SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_args): Add new argument describing mode of
builtin. Check lane bounds for arguments with
SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_builtin): Emit error for incorrect lane indices
if marked with SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(aarch64_simd_expand_builtin): Pass machine mode of builtin to
aarch64_simd_expand_args.
* config/aarch64/aarch64-simd.md: (aarch64_ld2_lane): Remove
lane bounds check. Adjust lane numbers for big-endian.
(aarch64_ld3_lane): Likewise.
(aarch64_ld4_lane): Likewise.

gcc/testsuite/ChangeLog:

  Charles Baylis  

* gcc.target/aarch64/simd/vld4q_lane.c: New test.

Change-Id: Ib17adaf64e631cf8d00a1a1a6c12409d2d7f4239
---
 gcc/config/aarch64/aarch64-builtins.c  | 30 +++---
 gcc/config/aarch64/aarch64-simd.md | 12 -
 gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c | 16 
 3 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4q_lane.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index aac7269..27046e2 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -116,7 +116,9 @@ enum aarch64_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be in range, and flipped for bigendian.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores */
+  qualifier_struct_load_store_lane_index = 0x400
 };
 
 typedef struct
@@ -224,7 +226,7 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum aarch64_type_qualifiers
 aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode,
-  qualifier_none, qualifier_none };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers)
 
 static enum aarch64_type_qualifiers
@@ -859,12 +861,14 @@ typedef enum
   SIMD_ARG_COPY_TO_REG,
   SIMD_ARG_CONSTANT,
   SIMD_ARG_LANE_INDEX,
+  SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   SIMD_ARG_STOP
 } builtin_simd_arg;
 
 static rtx
 aarch64_simd_expand_args (rtx target, int icode, int have_retval,
- tree exp, builtin_simd_arg *args)
+ tree exp, builtin_simd_arg *args,
+ enum machine_mode builtin_mode)
 {
   rtx pat;
   rtx op[SIMD_MAX_BUILTIN_ARGS + 1]; /* First element for result operand.  */
@@ -903,6 +907,21 @@ aarch64_simd_expand_args (rtx target, int icode, int 
have_retval,
op[opc] = copy_to_mode_reg (mode, op[opc]);
  break;
 
+   case SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+ /* we expect arguments in order (ptr, array_of_vector, lane), and
+we have to grub around in the ptr to find the lane size */
+ gcc_assert (opc > 1);
+ if (CONST_INT_P (op[opc]))
+   {
+ aarch64_simd_lane_bounds (op[opc], 0,
+   GET_MODE_NUNITS (builtin_mode),
+   exp);
+ /* Keep to GCC-vector-extension lane indices in the RTL.  */
+ op[opc] =
+   GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])));
+   }
+ goto constant_arg;
+
case SIMD_ARG_LANE_INDEX:
  /* Must be a previous operand into which this is an index.  */
  gcc_assert (opc > 0);
@@ -917,6 +936,7 @@ aarch64_simd_expand_args (rtx target, int icode, int 
have_retval,
  /* Fall through - if the lane index isn't a constant then
 the next case will error.  */
case SIMD_ARG_CONSTANT:
+constant_arg:
  if (!(*insn_data[icode].operand[opc].predicate)
  (op[opc], mode))
  {
@@ -1003,6 +1023,8 @@ aarch64_simd_expand_builtin (int fcode, tree exp, rtx 
target)
 
   if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
args[k] = SIMD_ARG_LANE_INDEX;
+  else if (d->qualifiers[qualifiers_k] & 
qualifier_struct_load_store_lane_index)
+   args[k] = SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX;
   else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
args[k] = SIMD_ARG_CONSTANT;

[PATCH 2/4] vldN_lane error message enhancements (D registers)

2014-12-09 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog

  Charles Baylis  

* config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Add explicit lane
bounds check.
(__LD3_LANE_FUNC): Likewise.
(__LD4_LANE_FUNC): Likewise

gcc/testsuite/ChangeLog:

  Charles Baylis  

* gcc.target/aarch64/simd/vld4_lane.c: New test.

Change-Id: Ia95fbed34b50cf710ea9032ff3428a5f1432e0aa
---
 gcc/config/aarch64/arm_neon.h |  6 ++
 gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c | 15 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8cff719..22df564 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17901,6 +17901,8 @@ vld2_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregoi##mode (__o,  \
   (signedtype) __temp.val[1], \
   1); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld2_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregoidi (__o, 0); \
@@ -17991,6 +17993,8 @@ vld3_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregci##mode (__o,  \
   (signedtype) __temp.val[2], \
   2); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld3_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregcidi (__o, 0); \
@@ -18089,6 +18093,8 @@ vld4_lane_##funcsuffix (const ptrtype * __ptr, intype 
__b, const int __c)  \
   __o = __builtin_aarch64_set_qregxi##mode (__o,  \
   (signedtype) __temp.val[3], \
   3); \
+  __builtin_aarch64_im_lane_boundsi (__c, \
+sizeof (vectype) / sizeof (*__ptr));  \
   __o =__builtin_aarch64_ld4_lane##mode (  
   \
  (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c);   \
   __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c
new file mode 100644
index 000..d14e6c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vld4_lane.c
@@ -0,0 +1,15 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include 
+
+int8x8x4_t
+f_vld4_lane (int8_t * p, int8x8x4_t v)
+{
+  int8x8x4_t res;
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
+  res = vld4_lane_s8 (p, v, 8);
+  return res;
+}
+
-- 
1.9.1

[PATCH 4/4] vstN_lane error message enhancements (D registers)

2014-12-09 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog:

  Charles Baylis  

* config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Add explicit lane bounds
check.
(__ST3_LANE_FUNC): Likewise.
(__ST4_LANE_FUNC): Likewise.

gcc/testsuite/ChangeLog:

  Charles Baylis  

* gcc.target/aarch64/simd/vst4_lane.c: New test.

Change-Id: I6bceaeb7773bf20860daca4013ea1c4d2c06afa6
---
 gcc/config/aarch64/arm_neon.h |  6 ++
 gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c | 14 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 22df564..22c6d06 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11181,6 +11181,8 @@ vst2_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
   __temp.val[1]
 \
 = vcombine_##funcsuffix (__b.val[1],\
 vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __o = __builtin_aarch64_set_qregoi##mode (__o,\
(signedtype) __temp.val[0], 0);  \
   __o = __builtin_aarch64_set_qregoi##mode (__o,\
@@ -11258,6 +11260,8 @@ vst3_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
(signedtype) __temp.val[1], 1);  \
   __o = __builtin_aarch64_set_qregci##mode (__o,\
(signedtype) __temp.val[2], 2);  \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  \
 __ptr, __o, __c);   \
 }
@@ -11336,6 +11340,8 @@ vst4_lane_ ## funcsuffix (ptrtype *__ptr,   
 \
(signedtype) __temp.val[2], 2);  \
   __o = __builtin_aarch64_set_qregxi##mode (__o,\
(signedtype) __temp.val[3], 3);  \
+  __builtin_aarch64_im_lane_boundsi (__c,   \
+sizeof (__b.val[0]) / sizeof (*__ptr)); \
   __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  \
 __ptr, __o, __c);   \
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c
new file mode 100644
index 000..6627ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4_lane.c
@@ -0,0 +1,14 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include 
+
+void
+f_vst4_lane (int8_t * p, int8x8x4_t v)
+{
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vst4_lane_s8 (p, v, 8);
+  return;
+}
+
-- 
1.9.1

[PATCH 3/4] vstN_lane error message enhancements (Q register)

2014-12-09 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog:

  Charles Baylis  

* config/aarch64/aarch64-builtins.c
(aarch64_types_storestruct_lane_qualifiers): Mark last argument with
qualifier_struct_load_store_lane_index.

gcc/testsuite/ChangeLog:

  Charles Baylis  

* gcc.target/aarch64/simd/vst4q_lane.c: New test.

Change-Id: If097c9d32eb6eb3d4c4e16db81f81e44a3154509
---
 gcc/config/aarch64/aarch64-builtins.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 27046e2..f2fb939 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -258,7 +258,7 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum aarch64_type_qualifiers
 aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode,
-  qualifier_none, qualifier_none };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define TYPES_STORESTRUCT_LANE (aarch64_types_storestruct_lane_qualifiers)
 
 #define CF0(N, X) CODE_FOR_aarch64_##N##X
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c
new file mode 100644
index 000..849f07a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vst4q_lane.c
@@ -0,0 +1,15 @@
+/* Test error message when passing an invalid value as a lane index.  */
+
+/* { dg-do compile } */
+
+#include 
+
+void
+f_vst4q_lane (int8_t * p, int8x16x4_t v)
+{
+  vst4q_lane_s8 (p, v, 16);
+/* { dg-error "lane 16 out of range 0 - 15" "" { target *-*-* } 0 } */
+  return;
+}
+
+
-- 
1.9.1

Re: [PATCH] PR ipa/63909 ICE: SIGSEGV in ipa_icf_gimple::func_checker::compare_bb()

2014-12-09 Thread Martin Liška


On 11/21/2014 01:23 PM, Richard Biener wrote:

On Fri, Nov 21, 2014 at 12:52 PM, Martin Liška  wrote:

On 11/20/2014 05:41 PM, Richard Biener wrote:


On Thu, Nov 20, 2014 at 5:30 PM, Martin Liška  wrote:


Hello.

Following patch fixes ICE in IPA ICF. Problem was that number of
non-debug
statements in a BB can
change (for instance by IPA split), so that the number is recomputed.



Huh, so can it get different for both candidates?  I think the stmt
compare
loop should be terminated on gsi_end_p of either iterator and return
false for any remaining non-debug-stmts on the other.

Thus, not walk all stmts twice here.



Hello.

Sorry for the previous patch, you are right it can be fixed in purer way.
Please take a look at attached patch.



As IPA split is run early I don't see how it should affect a real IPA
pass though?





Sorry for non precise information, the problematic BB is changed here:
#0  gsi_split_seq_before (i=0x7fffd550, pnew_seq=0x7fffd528) at
../../gcc/gimple-iterator.c:429
#1  0x00b95a2a in gimple_split_block (bb=0x76c41548, stmt=0x0)
at ../../gcc/tree-cfg.c:5707
#2  0x007563cf in split_block (bb=0x76c41548, i=i@entry=0x0) at
../../gcc/cfghooks.c:508
#3  0x00756b44 in split_block_after_labels (bb=) at
../../gcc/cfghooks.c:549
#4  make_forwarder_block (bb=,
redirect_edge_p=redirect_edge_p@entry=0x75d4e0 ,
new_bb_cbk=new_bb_cbk@entry=0x0) at ../../gcc/cfghooks.c:842
#5  0x0076085a in create_preheader (loop=0x76d56948,
flags=) at ../../gcc/cfgloopmanip.c:1563
#6  0x00760aea in create_preheaders (flags=1) at
../../gcc/cfgloopmanip.c:1613
#7  0x009bc6b0 in apply_loop_flags (flags=15) at
../../gcc/loop-init.c:75
#8  0x009bc7d3 in loop_optimizer_init (flags=15) at
../../gcc/loop-init.c:136
#9  0x00957914 in estimate_function_body_sizes (node=0x76c47620,
early=false) at ../../gcc/ipa-inline-analysis.c:2480
#10 0x0095948b in compute_inline_parameters (node=0x76c47620,
early=false) at ../../gcc/ipa-inline-analysis.c:2907
#11 0x0095bd88 in inline_analyze_function (node=0x76c47620) at
../../gcc/ipa-inline-analysis.c:3994
#12 0x0095bed3 in inline_generate_summary () at
../../gcc/ipa-inline-analysis.c:4045
#13 0x00a70b71 in execute_ipa_summary_passes (ipa_pass=0x1dcb9e0) at


So inline_summary is generated after IPA-ICF does its job?

But the bug is obviously that an IPA analysis phase does a code transform
(here initializes loops without AVOID_CFG_MANIPULATIONS).
Honza - if that is really needed then I think we should make sure
loops are initialized at the start of the IPA analysis phase, not randomly
inbetween.

Thanks,
Richard.


Hello.

Even thought the root of problem is hidden somewhere in loop creation, I would
like to apply the patch which makes iteration of non-debug gimple statement
more clearly?

What do you think Richard?

Thanks,
Martin




../../gcc/passes.c:2137
#14 0x00777a15 in ipa_passes () at ../../gcc/cgraphunit.c:2074
#15 symbol_table::compile (this=this@entry=0x76c3a000) at
../../gcc/cgraphunit.c:2187
#16 0x00778bcd in symbol_table::finalize_compilation_unit
(this=0x76c3a000) at ../../gcc/cgraphunit.c:2340
#17 0x006580ee in c_write_global_declarations () at
../../gcc/c/c-decl.c:10777
#18 0x00b5bb8b in compile_file () at ../../gcc/toplev.c:584
#19 0x00b5def1 in do_compile () at ../../gcc/toplev.c:2041
#20 0x00b5e0fa in toplev::main (this=0x7fffdc9f, argc=20,
argv=0x7fffdd98) at ../../gcc/toplev.c:2138
#21 0x0063f1d9 in main (argc=20, argv=0x7fffdd98) at
../../gcc/main.c:38


Patch can bootstrap on x86_64-linux-pc and no regression has been seen.
Ready for trunk?


Thanks,
Martin



Thanks,
Richard.


Patch can bootstrap on x86_64-linux-pc and no regression has been seen.
Ready for trunk?

Thanks,
Martin

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko

I've tried to get smaller reproducer, however currently it is
complicated as several functions in GCC.
However patch is fixing spec2006 benchmark. Shouldn't that be enough
for regression testing?

Thanks,
Evgeny

On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko  wrote:
> The case comes from spec2006 403.gcc (or old GCC itself).
>
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> {
>   vd->e[i].mode = VOIDmode;
>   vd->e[i].oldest_regno = i;
>   vd->e[i].next_regno = INVALID_REGNUM;
> }
>
> It is vectorized and only then completely peeled.
> Only after peeling all stored values become constant.
>
> Currently expand_vec_perm_pblendv works as following:
> Let d.target, d.op0, dop1 be permutation parameters.
>
> First we permute an operand (d.op1 or d.op0) and then blend it with
> other argument:
>
> d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, d.target)
> (if d.op0 equal to d.target this is buggy)
>
> Patch make it more accurate:
> tmp = gen_reg_rtx (vmode)
> tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, tmp)
> (Here d.op0 can be equal to d.target)
>
> Below is rtl dump of buggy case:
>
> (183t.optimized)
> ...
> vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
> 35 }, { 0, 4, 0, 1 }>;
> vect_shuffle3_high_469 = VEC_PERM_EXPR  4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
> ...
> (184r.expand)
> ...
> (insn 205 204 206 (set (reg:V4SI 768)
> (const_vector:V4SI [
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> ])) ../regrename.c:1171 -1
>  (nil))
>
> (insn 206 205 208 (set (reg:V4SI 769)
> (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
> A128])) ../regrename.c:1171 -1
>  (expr_list:REG_EQUAL (const_vector:V4SI [
> (const_int 32 [0x20])
> (const_int 33 [0x21])
> (const_int 34 [0x22])
> (const_int 35 [0x23])
> ])
> (nil)))
>
> (insn 208 206 207 (set (reg:V4SI 770)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
> (reg:V4SI 769))
> (parallel [
> (const_int 0 [0])
> (const_int 4 [0x4])
> (const_int 1 [0x1])
> (const_int 5 [0x5])
> ]))) ../regrename.c:1171 -1
>  (nil))
>
> (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
> (vec_select:V4SI (reg:V4SI 770)
> (parallel [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 0 [0])
> (const_int 2 [0x2])
> ]))) ../regrename.c:1171 -1
>  (nil))
>
> (insn 209 207 210 (set (reg:V4SI 771)
> (const_vector:V4SI [
> (const_int -1 [0x])
> (const_int -1 [0x])
> (const_int -1 [0x])
> (const_int -1 [0x])
> ])) ../regrename.c:1171 -1
>  (nil))
>
> (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
> (vec_select:V4SI (reg:V4SI 771)
> (parallel [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 0 [0])
> (const_int 3 [0x3])
> ]))) ../regrename.c:1171 -1
>  (nil))
> (insn 211 210 212 (set (reg:V8HI 772)
> (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
> (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
> (const_int 48 [0x30]))) ../regrename.c:1171 -1
>  (nil))
> ...
>
> On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak  wrote:
>> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak  wrote:
>>
 The patch fix pblendv expand.
 The bug was uncovered when permutation operands are constants.
 In this case we init target register for expand_vec_perm_1 with
 constant and then rewrite the target with constant for
 expand_vec_perm_pblend.

 The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
 -flto -march=corei7.

 Bootstrap and make check passed.

 Is it ok?
>>>
>>> Please add a testcase.
>>
>> Also, it surprises me that we enter expand_vec_perm_pblendv with
>> uninitialized (?) target. Does your patch only papers over a real
>> problem up the call chain (hard to say without a testcase)?
>>
>> Uros.
>>
>>>

 Evgeny

 2014-12-09  Evgeny Stupachenko  

 gcc/
 * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
 expand_vec_perm_1 target.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index eafc15a..5a914ad 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv

Re: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-12-09 Thread Marcus Shawcroft

On 24 November 2014 at 13:46, Wilco Dijkstra  wrote:
>> Richard Earnshaw wrote:
>> If all cores seem to benefit from FP reassociation set to 4, then it
>> seems odd that 4 is not also the default for generic.
>>
>> Andrew, you may need to pick a target-specific value for ThunderX; I
>> think Wilco has just picked something that seems plausible because he
>> needs to put a real value in there.
>>
>> What happens if the integer and vector numbers are bumped up?  I'd have
>> thought that integer numbers >1 would be appropriate on all dual-issue
>> or greater cores.
>
> I tried int and vector as well, and setting int to 2 did give an improvement,
> but vector had no effect, so I'll leave to 1 for now. The patch is the same
> as last time, it just sets integer to 2, and uses the same settings for all 
> CPUs.
>
> OK for commit?
>
> ChangeLog:
> 2014-11-24  Wilco Dijkstra  
>
> * gcc/config/aarch64/aarch64-protos.h (tune-params):
> Add reasociation tuning parameters.
> * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
> Define.  (aarch64_reassociation_width): New function.
> (generic_tunings) Add reassociation tuning parameters.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (thunderx_tunings): Likewise.

OK /Marcus

Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h

2014-12-09 Thread David Malcolm

On Mon, 2014-12-08 at 14:13 +0100, Rainer Orth wrote:
> Jeff Law  writes:
> 
> > On 12/04/14 15:42, Rainer Orth wrote:
> >> David Malcolm  writes:
> >>
> >>>  assumed -fgnu89-inline until a recent upstream fix;
> >>> see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html
> >>>
> >>> Remove the workaround from jit.exp that used -fgnu89-inline
> >>> in favor of a fixincludes to dejagnu.h that applies the upstream fix
> >>> to a local copy.
> >>>
> >>> This should make it easier to support C++ testcases from jit.exp.
> >>
> >> I wonder how this would work if dejagnu.h doesn't live in a system
> >> include dir (e.g. a self-compiled version)?  fixincludes won't touch
> >> those AFAIU.  The previous version with -fgnu89-inline would still work
> >> in that case provided dejagnu.h is found at all.
> > Presumably in that case the answer is upgrade dejagnu? :-)
> 
> I've two problems with this:
> 
> * There's not yet a DejaGnu release available with the fix and I've no
>   idea if there are any planned any time soon.  Not everyone is
>   comfortable with random git (or whatever) snapshots.

FWIW I've asked on the DejaGnu mailing list, and Ben Elliston said:
> Yes. I plan on releasing 1.6 over the holidays.
http://lists.gnu.org/archive/html/dejagnu/2014-12/msg1.html

> * I don't consider this a critical issue that cannot work without
>   current releases.  We're already working around several upstream
>   DejaGnu issues in our codebase, and I don't consider this particular
>   one important enough to require everyone to upgrade to a not-a-release
>   version.
> 
>   Rainer
>

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Richard Henderson

On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
> However patch is fixing spec2006 benchmark. Shouldn't that be enough
> for regression testing?
> 

No.  Spec is not free.


r~

Re: [PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX

2014-12-09 Thread Marcus Shawcroft

On 5 December 2014 at 14:36, Kyrill Tkachov  wrote:
> Hi all,
>
> Andrew posted this patch sometime ago (before stage1 closed) and I had
> rebased it on top of the
> other macro fusion patches in that series.
> This is a respin of that patch with the comment about not calling
> get_attr_type
> repeatedly resolved
> (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02251.html)
>
> Tested aarch64-none-elf.
>
> Is this ok to go in?
>
> Thanks,
> Kyrill
>
> 2014-12-01  Andrew Pinski  apin...@cavium.com
> Kyrylo Tkachov  kyrylo.tkac...@arm.com
>
>  * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
>  (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
>  (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.

OK /Marcus

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Jakub Jelinek

On Tue, Dec 09, 2014 at 06:59:27PM +0300, Evgeny Stupachenko wrote:
> I've tried to get smaller reproducer, however currently it is
> complicated as several functions in GCC.

Just add a gcc_assert where you are changing the code, testing for
what you want to avoid.
Then just delta reduce it or creduce it.

Jakub

Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Eric Botcazou

> It's unfortunate that GCC does not have separate patterns for a safe
> ADD used by reload.

It does, the addptrM3 pattern, but it works only with LRA at the moment.

-- 
Eric Botcazou

Re: Fix libgomp crash without TLS (PR42616)

2014-12-09 Thread Richard Henderson

On 12/09/2014 06:53 AM, Varvara Rainchik wrote:
> 2014-12-09  Varvara Rainchik  
> 
> * config.h.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Add GCC_CHECK_EMUTLS.
> * libgomp.h: Add check for USE_EMUTLS: this case
> is equal to HAVE_TLS.
> * team.c: Likewise.

Ok.


r~

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko

I mean that there are a lot of people tracking spec2006 stability and
therefore the issue should be on track in future.
And that I can create the test case, but it would be as big as several
GCC functions.
Will work on reducing the test case.

On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson  wrote:
> On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
>> However patch is fixing spec2006 benchmark. Shouldn't that be enough
>> for regression testing?
>>
>
> No.  Spec is not free.
>
>
> r~

Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread David Edelsohn

On Tue, Dec 9, 2014 at 11:24 AM, Eric Botcazou  wrote:
>> It's unfortunate that GCC does not have separate patterns for a safe
>> ADD used by reload.
>
> It does, the addptrM3 pattern, but it works only with LRA at the moment.

Hi, Eric

Segher and I previously discussed addptrM3, but I thought that it was
restricted in a different way.

We definitely will have to define addptrM3 and restore addM3 when
rs6000 switches to LRA.

Thanks, David

Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Segher Boessenkool

On Tue, Dec 09, 2014 at 11:48:44AM -0500, David Edelsohn wrote:
> We definitely will have to define addptrM3 and restore addM3 when
> rs6000 switches to LRA.

Can LRA deal with adding a clobber of a hard register?  How would
you express that in the patterns, anyway?


Segher

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-09 Thread Dominique d'Humières

This was for x86_64-apple-darwin14. The patch also works for 
x86_64-apple-darwin10.

Dominique

> Le 6 déc. 2014 à 01:49, Dominique d'Humières  a écrit :
> 
> Bootstrap just finished with the patch.
> 
> Thanks,
> 
> Dominique

RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-12-09 Thread Wilco Dijkstra

> Marcus Shawcroft wrote:
> > OK for commit?
> >
> > ChangeLog:
> > 2014-11-24  Wilco Dijkstra  
> >
> > * gcc/config/aarch64/aarch64-protos.h (tune-params):
> > Add reasociation tuning parameters.
> > * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
> > Define.  (aarch64_reassociation_width): New function.
> > (generic_tunings) Add reassociation tuning parameters.
> > (cortexa53_tunings): Likewise.
> > (cortexa57_tunings): Likewise.
> > (thunderx_tunings): Likewise.
> 
> OK /Marcus

Jiong, can you commit this please?

---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c| 36 
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a9985b5..ac3487b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -170,6 +170,9 @@ struct tune_params
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
   const int issue_rate;
+  const int int_reassoc_width;
+  const int fp_reassoc_width;
+  const int vec_reassoc_width;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3832123..e543161 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -314,7 +314,10 @@ static const struct tune_params generic_tunings =
   &generic_regmove_cost,
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -324,7 +327,10 @@ static const struct tune_params cortexa53_tunings =
   &cortexa53_regmove_cost,
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -334,7 +340,10 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 3)
+  NAMED_PARAM (issue_rate, 3),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -344,7 +353,10 @@ static const struct tune_params thunderx_tunings =
   &thunderx_regmove_cost,
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1/* vec_reassoc_width.  */
 };
 
 /* A processor implementing AArch64.  */
@@ -437,6 +449,19 @@ static const char * const aarch64_condition_codes[] =
   "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
 };
 
+static int
+aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
+enum machine_mode mode)
+{
+  if (VECTOR_MODE_P (mode))
+return aarch64_tune_params->vec_reassoc_width;
+  if (INTEGRAL_MODE_P (mode))
+return aarch64_tune_params->int_reassoc_width;
+  if (FLOAT_MODE_P (mode))
+return aarch64_tune_params->fp_reassoc_width;
+  return 1;
+}
+
 /* Provide a mapping from gcc register numbers to dwarf register numbers.  */
 unsigned
 aarch64_dbx_register_number (unsigned regno)
@@ -10499,6 +10524,9 @@ aarch64_gen_ccmp_next (rtx prev, int cmp_code, rtx op0, 
rtx op1, int bit_code)
 #undef TARGET_PREFERRED_RELOAD_CLASS
 #define TARGET_PREFERRED_RELOAD_CLASS aarch64_preferred_reload_class
 
+#undef TARGET_SCHED_REASSOCIATION_WIDTH
+#define TARGET_SCHED_REASSOCIATION_WIDTH aarch64_reassociation_width
+
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload
 
-- 
1.9.1

Re: [PATCH 04/10] rs6000: Remove addic from the normal add pattern

2014-12-09 Thread Segher Boessenkool

On Mon, Dec 08, 2014 at 09:01:39AM -0600, Segher Boessenkool wrote:
> > Why are you removing the alternative instead of clobbering XER[CA]?
> 
> I should have mentioned, sorry.
> 
> We don't want to clobber CA on *every* add, and reload can generate more
> adds out of thin air.  And CA is a fixed register.
> 
> There may be a better way to do this, but I don't know it.

Some numbers...

I checked how often the compiler used to allocate GPR0 here.  I looked
at cc1 of 4.7.2, 12MB text size.  It happened zero times (it already was
"?r" to discourage the RA / reload from using that alternative).

There are 143962 addi insns in there.

Segher

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Sebastian Pop

Richard Biener wrote:
> On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey  wrote:
> > expected?  Should this test also check flag_thread_jumps?  Or should
> > that be getting checked somewhere else?
> 
> -fthread-jumps is an RTL optimization flag and ignored on GIMPLE.

Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree
jump threading?  I could also add -f[no-]tree-fsm-thread-jumps.  Opinions?

On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch.
This patch fixes the problem:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 12f83ba..f8c736e 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
 {
   if (!loop->header
+|| !loop_latch_edge (loop)
   || !bitmap_bit_p (threaded_blocks, loop->header->index))
   continue;
 
  retval |= thread_through_loop_header (loop, may_peel_loop_headers);

Ok to commit after regstrap?

Thanks,
Sebastian

[PATCH, libcpp]: Check the result of vasprintf

2014-12-09 Thread Uros Bizjak

Hello!

The compilation with gentoo glibc 2.20 emits following warning:

../../../gcc-svn/trunk/libcpp/directives.c:2411:28: warning: ignoring
return value of ‘int vasprintf(char**, const char*, __gnuc_va_list)’,
declared with attribute warn_unused_result [-Wunused-result]

The manpage says:

If memory allocation wasn't possible, or some other error occurs,
these functions will return -1, and the contents of strp is undefined.

Attached patch checks the return value and sets ptr to NULL in this case.

2014-12-09  Uros Bizjak  

* directives.c (cpp_define_formatted): Check return value of
vasprintf and in case of error set ptr to NULL.

Bootstrapped on x86_64-linux-gnu.

OK for mainline?

Uros.

Index: directives.c
===
--- directives.c(revision 218525)
+++ directives.c(working copy)
@@ -2408,7 +2408,8 @@ cpp_define_formatted (cpp_reader *pfile, const cha

   va_list ap;
   va_start (ap, fmt);
-  vasprintf (&ptr, fmt, ap);
+  if (vasprintf (&ptr, fmt, ap) < 0)
+ptr = NULL;
   va_end (ap);

   cpp_define (pfile, ptr);

[PATCH] TYPE_OVERFLOW_* cleanup

2014-12-09 Thread Marek Polacek

The issue here is that TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED,
and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we
pass e.g. pointer_type/real_type to them.  This patch adds proper
checking for these macros and adds missing guards to various places.
This looks pretty straightforward, but I had to tweak a few places
wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) to not to
regress folding - and I'm afraid I missed places that aren't tested
in our testsuite :/.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-09  Marek Polacek  

* fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P.
(fold_negate_expr): Likewise.
(extract_muldiv_1): Likewise.
(maybe_canonicalize_comparison_1): Likewise.
(fold_comparison): Likewise.
(fold_binary_loc): Likewise.
(tree_binary_nonnegative_warnv_p): Likewise.
(tree_binary_nonzero_warnv_p): Likewise.
* gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
* tree-scalar-evolution.c (simple_iv): Likewise.
(scev_const_prop): Likewise.
* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
* match.pd (X % -C): Likewise.
(-A - 1 -> ~A): Likewise.
(~A + A -> -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P.
* tree-vect-generic.c (expand_vector_operation): Likewise.
* tree.h (INTEGRAL_TYPE_CHECK): Define.
(TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
Add INTEGRAL_TYPE_CHECK.

diff --git gcc/fold-const.c gcc/fold-const.c
index 0c4fe40..ff9d917 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -426,7 +426,8 @@ negate_expr_p (tree t)
 
 case VECTOR_CST:
   {
-   if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))
+   if (FLOAT_TYPE_P (TREE_TYPE (type))
+   || (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)))
  return true;
 
int count = TYPE_VECTOR_SUBPARTS (type), i;
@@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t)
 case INTEGER_CST:
   tem = fold_negate_const (t, type);
   if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
- || (!TYPE_OVERFLOW_TRAPS (type)
+ || (INTEGRAL_TYPE_P (type)
+ && !TYPE_OVERFLOW_TRAPS (type)
  && TYPE_OVERFLOW_WRAPS (type))
  || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
return tem;
@@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   || EXPRESSION_CLASS_P (op0))
  /* ... and has wrapping overflow, and its type is smaller
 than ctype, then we cannot pass through as widening.  */
- && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))
+ && (((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
   && (TYPE_PRECISION (ctype)
   > TYPE_PRECISION (TREE_TYPE (op0
  /* ... or this is a truncation (t is narrower than op0),
@@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  /* ... or has undefined overflow while the converted to
 type has not, we cannot do the operation in the inner type
 as that would introduce undefined overflow.  */
- || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+ || ((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)))
  && !TYPE_OVERFLOW_UNDEFINED (type
break;
 
@@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR
   || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR)
  /* If the multiplication can overflow we cannot optimize this.  */
+ && INTEGRAL_TYPE_P (TREE_TYPE (t))
  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t))
  && TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
  && wi::multiple_of_p (op1, c, TYPE_SIGN (type)))
@@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
 
 If we have an unsigned type, we cannot do this since it will change
 the result if the original computation overflowed.  */
-  if (TYPE_OVERFLOW_UNDEFINED (ctype)
+  if (INTEGRAL_TYPE_P (ctype)
+ && TYPE_OVERFLOW_UNDEFINED (ctype)
  && ((code == MULT_EXPR && tcode == EXACT_DIV_EXPR)
  || (tcode == MULT_EXPR
  && code != TRUNC_MOD_EXPR && code != CEIL_MOD_EXPR
@@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum 
tree_code code, tree type,
 
   /* Match A +- CST code arg1 and CST code arg1.  We can change the
  first form only if overflow is undefined.  */
-  if (!((TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
+  if (!(((INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+ && TYPE_OVERFLOW_U

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Jeff Law


On 12/09/14 10:38, Sebastian Pop wrote:

Richard Biener wrote:

On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey  wrote:

expected?  Should this test also check flag_thread_jumps?  Or should
that be getting checked somewhere else?


-fthread-jumps is an RTL optimization flag and ignored on GIMPLE.


Does it make sense to add a -f[no-]tree-thread-jumps to enable/disable the tree
jump threading?  I could also add -f[no-]tree-fsm-thread-jumps.  Opinions?
Our option handling is a bit of a mess if we look at it from the user 
standpoint.  Given that the vast majority of jump threading happens on 
gimple, ISTM that -f[no-]thread-jumps ought to be controlling the gimple 
implementation.  One could easily argue that the user doesn't really 
care about where in the pipeline the optimization is implemented.


My vote would be to just make -fthread-jumps control both RTL and gimple 
jump threading.





On the llvm test-suite, I have seen one ICE with my fsm jump-thread patch.
This patch fixes the problem:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 12f83ba..f8c736e 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
  {
if (!loop->header
+|| !loop_latch_edge (loop)
|| !bitmap_bit_p (threaded_blocks, loop->header->index))
continue;

   retval |= thread_through_loop_header (loop, may_peel_loop_headers);

Ok to commit after regstrap?
This seems to be indicating that we have with no edge from the latch 
block to the header block.  I'd like to know better how we got into that 
state.


Also, a test for the GCC testsuite would be good.  I have no idea what 
license covers the LLVM testsuite.  But given a good analysis of the 
problem we may be able to write a suitable test independent of the LLVM 
test.


jeff

Re: [Android] Stop unconditional disabling of ifuncs for Bionic #2

2014-12-09 Thread Jeff Law


On 12/09/14 05:39, Alexander Ivchenko wrote:

Hi,

Whether the *linux* target supports ifuncs or not is defined here:

linux_has_ifunc_p (void)
{
   return OPTION_BIONIC ? false : HAVE_GNU_INDIRECT_FUNCTION;
}

Bionic right now supports indirect functions, but there is no way to
notify the compiler about that (For Android OPTION_BIONIC is always on
and so configure time check with --enable-gnu-indirect-function does
not work)

The following patch makes the use of the default version of
TARGET_HAS_IFUNC_P for *linux*, so we can decide whether the Android
target supports indirect functions or not on configure time.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f22bba8..d4d09d0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-08  Alexander Ivchenko  
+
+   * config/linux.c (linux_has_ifunc_p): Remove.
+   * config/linux.h (TARGET_HAS_IFUNC_P): Use default version.

OK.
jeff

[PATCH, committed] jit-playback.c: Move dlopen code into a new function

2014-12-09 Thread David Malcolm

Another simplification of playback::context::compile.

Committed to trunk as r218527.

Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218526)
+++ gcc/jit/ChangeLog	(revision 218527)
@@ -1,3 +1,12 @@
+2014-12-09  David Malcolm  
+
+	* jit-playback.c (gcc::jit::playback::context::compile): Move the
+	dlopen code into...
+	(gcc::jit::playback::context::dlopen_built_dso): ...this new
+	function.
+	* jit-playback.h (gcc::jit::playback::context::dlopen_built_dso):
+	New function.
+
 2014-12-08  David Malcolm  
 
 	* libgccjit++.h: Indent the forward declarations of the classes to
Index: gcc/jit/jit-playback.c
===
--- gcc/jit/jit-playback.c	(revision 218526)
+++ gcc/jit/jit-playback.c	(revision 218527)
@@ -1586,7 +1586,6 @@
 playback::context::
 compile ()
 {
-  void *handle = NULL;
   const char *ctxt_progname;
   result *result_obj = NULL;
 
@@ -1648,25 +1647,8 @@
   if (errors_occurred ())
 return NULL;
 
-  /* dlopen the .so file. */
-  {
-auto_timevar load_timevar (TV_LOAD);
+  result_obj = dlopen_built_dso ();
 
-const char *error;
-
-/* Clear any existing error.  */
-dlerror ();
-
-handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL);
-if ((error = dlerror()) != NULL)  {
-  add_error (NULL, "%s", error);
-}
-if (handle)
-  result_obj = new result (handle);
-else
-  result_obj = NULL;
-  }
-
   return result_obj;
 }
 
@@ -1916,6 +1898,34 @@
 }
 }
 
+/* Dynamically-link the built DSO file into this process, using dlopen.
+   Wrap it up within a jit::result *, and return that.
+   Return NULL if any errors occur, reporting them on this context.  */
+
+result *
+playback::context::
+dlopen_built_dso ()
+{
+  auto_timevar load_timevar (TV_LOAD);
+  void *handle = NULL;
+  const char *error = NULL;
+  result *result_obj = NULL;
+
+  /* Clear any existing error.  */
+  dlerror ();
+
+  handle = dlopen (m_path_so_file, RTLD_NOW | RTLD_LOCAL);
+  if ((error = dlerror()) != NULL)  {
+add_error (NULL, "%s", error);
+  }
+  if (handle)
+result_obj = new result (handle);
+  else
+result_obj = NULL;
+
+  return result_obj;
+}
+
 /* Top-level hook for playing back a recording context.
 
This plays back m_recording_ctxt, and, if no errors
Index: gcc/jit/jit-playback.h
===
--- gcc/jit/jit-playback.h	(revision 218526)
+++ gcc/jit/jit-playback.h	(revision 218527)
@@ -250,6 +250,9 @@
   void
   convert_to_dso (const char *ctxt_progname);
 
+  result *
+  dlopen_built_dso ();
+
 private:
   ::gcc::jit::recording::context *m_recording_ctxt;

Re: [PATCH] Fix PR 61225

2014-12-09 Thread Jeff Law


On 12/09/14 02:49, Zhenqiang Chen wrote:

Do you need to verify SETA and SETB satisfy single_set?  Or has that
already been done elsewhere?


A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 -> I2 -> B; I2 -> A;
LOG_LINK can make sure I1 and I2 are single_set, but not A and B. And I did
found codes in function try_combine, which can make sure B (or I3) is
single_set.

So I think the check can skip failed cases at early stage.

Thanks for doing the research on this.



The check is to make sure the correctness.  Here is a case,

int
f1 (int *x)
{
   int t = --*x;
   if (!t)
 foo (x);
   return t;
}

   _4 = *x_3(D);
   _5 = _4 + -1;
   *x_3(D) = _5;
   # DEBUG t => _5
   if (_5 == 0)
...
   :
   return _5;

"_5" is used in "return _5". So we can not remove "_5 = _4 + -1".
Right, but ISTM that if the # uses > 2, then we could just return false 
rather than bothering to see if all the uses are consumed by A or B. 
It's not a big deal, I just have a hard time seeing that doing something 
more complex than "if (# uses > 2) return false;" makes sense.






So you've got two new combine cases here, but I think the testcase only
tests one of them.  Can you include a testcase for both of hte major
paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)


pr61225.c is the case to cover I1->I2->I3; I2->insn.

For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
handle them. So I removed the code from the patch.

Seems like the reasonable thing to do.



Here is the final patch.
Bootstrap and no make check regression on X86-64.

ChangeLog:
2014-11-09  Zhenqiang Chen  

Part of PR rtl-optimization/61225
* combine.c (can_reuse_cc_set_p): New function.
(combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
(try_combine): Add one more parameter TO_COMBINED_INSN, which
is used to create a new insn parallel (TO_COMBINED_INSN, I3).

testsuite/ChangeLog:
2014-11-09  Zhenqiang Chen

* gcc.target/i386/pr61225.c: New test.



OK for the trunk.

jeff

[PATCH, fixincludes]: Check return value of getcwd

2014-12-09 Thread Uros Bizjak

Hello!

The compilation with gentoo glibc 2.20 emits following warning:

../../../gcc-svn/trunk/fixincludes/server.c:195:10: warning: ignoring
return value of ‘getcwd’, declared with attribute warn_unused_result
[-Wunused-result]

The manpage says:

On failure, these functions return NULL, and errno is set to indicate
the error.  The contents of the array pointed to by buf are undefined
on error.

Attached patch checks the return value and sets buff[0] to 0 in this case.

2014-12-09  Uros Bizjak  

* server.c (server_setup): Check return value of
getcwd and in case of error set buff[0] to 0.

Bootstrapped on x86_64-linux-gnu.

OK for mainline?

Index: server.c
===
--- server.c(revision 218525)
+++ server.c(working copy)
@@ -192,7 +192,8 @@ server_setup (void)

   fputs ("trap : 1\n", server_pair.pf_write);
   fflush (server_pair.pf_write);
-  getcwd (buff, MAXPATHLEN + 1);
+  if (getcwd (buff, MAXPATHLEN + 1) == NULL)
+buff[0] = 0;
   p_cur_dir = xstrdup (buff);
 }

[PATCH, committed] Guard less code with the JIT mutex

2014-12-09 Thread David Malcolm

Move acquisition/release of the JIT mutex from
  jit-recording.c:gcc::jit::recording::context::compile
into
  jit-playback.c: gcc::jit::playback::context::compile
and reduce the amount of code guarded by the mutex somewhat.

For acquisition, we didn't need the mutex when building the tempdir and
building toplev's argv, so now acquire the mutex immediately before using
toplev.

For release, we didn't need the mutex when cleaning up the tempdir and
the other cleanup of the playback context, so release the mutex on each exit
path from playback::context::compile.

Ideally we'd release it immediately after toplev.finalize (), but there
are various places after that point that use timevars (which is global
state) so we have to hold the mutex until we're done with them.

Committed to trunk as r218528.

gcc/jit/ChangeLog:
* jit-playback.c (gcc::jit::playback::context::compile): Acquire the
mutex here, immediately before using toplev, and release it here, on
each exit path after acquisition.
(jit_mutex): Move this variable here, from jit-recording.c.
(gcc::jit::playback::context::acquire_mutex): New function, based on
code in jit-recording.c.
(gcc::jit::playback::context::release_mutex): Likewise.
* jit-playback.h (gcc::jit::playback::context::acquire_mutex): New
function.
(gcc::jit::playback::context::release_mutex): New function.
* jit-recording.c (jit_mutex): Move this variable to jit-playback.c.
(gcc::jit::recording::context::compile): Move mutex-handling from
here into jit-playback.c's gcc::jit::playback::context::compile.
* notes.txt: Update to show the new locations of ACQUIRE_MUTEX
and RELEASE_MUTEX.

Index: gcc/jit/jit-recording.c
===
--- gcc/jit/jit-recording.c	(revision 218527)
+++ gcc/jit/jit-recording.c	(revision 218528)
@@ -888,12 +888,6 @@
   m_requested_dumps.safe_push (d);
 }
 
-
-/* This mutex guards gcc::jit::recording::context::compile, so that only
-   one thread can be accessing the bulk of GCC's state at once.  */
-
-static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;
-
 /* Validate this context, and if it passes, compile it within a
mutex.
 
@@ -908,20 +902,12 @@
   if (errors_occurred ())
 return NULL;
 
-  /* Acquire the big GCC mutex. */
-  pthread_mutex_lock (&jit_mutex);
-  gcc_assert (NULL == ::gcc::jit::active_playback_ctxt);
-
   /* Set up a playback context.  */
   ::gcc::jit::playback::context replayer (this);
-  ::gcc::jit::active_playback_ctxt = &replayer;
 
+  /* Use it.  */
   result *result_obj = replayer.compile ();
 
-  /* Release the big GCC mutex. */
-  ::gcc::jit::active_playback_ctxt = NULL;
-  pthread_mutex_unlock (&jit_mutex);
-
   return result_obj;
 }
 
Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218527)
+++ gcc/jit/ChangeLog	(revision 218528)
@@ -1,5 +1,23 @@
 2014-12-09  David Malcolm  
 
+	* jit-playback.c (gcc::jit::playback::context::compile): Acquire the
+	mutex here, immediately before using toplev, and release it here, on
+	each exit path after acquisition.
+	(jit_mutex): Move this variable here, from jit-recording.c.
+	(gcc::jit::playback::context::acquire_mutex): New function, based on
+	code in jit-recording.c.
+	(gcc::jit::playback::context::release_mutex): Likewise.
+	* jit-playback.h (gcc::jit::playback::context::acquire_mutex): New
+	function.
+	(gcc::jit::playback::context::release_mutex): New function.
+	* jit-recording.c (jit_mutex): Move this variable to jit-playback.c.
+	(gcc::jit::recording::context::compile): Move mutex-handling from
+	here into jit-playback.c's gcc::jit::playback::context::compile.
+	* notes.txt: Update to show the new locations of ACQUIRE_MUTEX
+	and RELEASE_MUTEX.
+
+2014-12-09  David Malcolm  
+
 	* jit-playback.c (gcc::jit::playback::context::compile): Move the
 	dlopen code into...
 	(gcc::jit::playback::context::dlopen_built_dso): ...this new
Index: gcc/jit/jit-playback.c
===
--- gcc/jit/jit-playback.c	(revision 218527)
+++ gcc/jit/jit-playback.c	(revision 218528)
@@ -1622,6 +1622,9 @@
   if (errors_occurred ())
 return NULL;
 
+  /* Acquire the JIT mutex and set "this" as the active playback ctxt.  */
+  acquire_mutex ();
+
   /* This runs the compiler.  */
   toplev toplev (false);
   toplev.main (fake_args.length (),
@@ -1635,10 +1638,14 @@
   /* Clean up the compiler.  */
   toplev.finalize ();
 
-  active_playback_ctxt = NULL;
+  /* Ideally we would release the jit mutex here, but we can't yet since
+ followup activities use timevars, which are global state.  */
 
   if (errors_occurred ())
-return NULL;
+{
+  release_mutex ();
+  return NULL;
+}
 
   if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE))
 dump_generated_code ();
@@ -164

Re: [PATCH 07/10] rs6000: The big carry insns addition

2014-12-09 Thread David Edelsohn

On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
 wrote:
> Now that everything is in place for letting GCC use the carry-using and
> carry-producing machine insns as separate RTL insns, switch over all
> remaining patterns that clobber CA without telling the compiler.  This
> are the multiple-precision add/sub/neg patterns, and the various eq/ne/
> ltu/gtu/leu/geu patterns.
>
> The eq/ne patterns are special.  The optimal machine code for those isn't
> regular (like e.g. the ltu patterns are), and we want to implement a plain
> eq as "cntlz[wd];sr[wd]i"; but that means that if we split those patterns
> at expand time, combine will happily put them back together again.  So
> expand them as eq/ne, and split later.
>
>
> 2014-12-08  Segher Boessenkool  
>
> gcc/
> PR target/64180
> * config/rs6000/predicates.md (unsigned_comparison_operator): New.
> (signed_comparison_operator): New.
> * config/rs6000/rs6000-protos.h (rs6000_emit_eqne): Declare.
> * config/rs6000/rs6000.c (rs6000_emit_eqne): New function.
> (rs6000_emit_sCOND): Remove ISEL test (move it to the expander).
> * config/rs6000/rs6000.md (add3 for SDI): Expand DImode
> add to addc,adde directly, if !TARGET_POWERPC64.
> (sub3 for SDI): Expand DImode sub to subfc,subfe directly,
> if !TARGET_POWERPC64.
> (neg2): Delete expander.
> (*neg2): Rename to "neg2".
> (addti3, subti3): Delete.
> (addti3, subti3): New expanders.
> (*adddi3_noppc64, *subdi3_noppc64, *negdi2_noppc64): Delete.
> (cstore4_unsigned): New expander.
> (cstore4): Allow GPR as output (not just SI).  Rewrite.
> (cstore4 for FP): Remove superfluous quotes.
> (*eq, *eq_compare, *plus_eqsi and splitter,
> *compare_plus_eqsi and splitter, *plus_eqsi_compare and splitter,
> *neg_eq0, *neg_eq, *ne0_, plus_ne0_,
> compare_plus_ne0_ and splitter, *compare_plus_ne0__1 and
> splitter, *plus_ne0__compare and splitter, *leu,
> *leu_compare and splitter, *plus_leu, *neg_leu,
> *and_neg_leu, *ltu, *ltu_compare, *plus_ltu,
> *plus_ltu_1, *plus_ltucompare, *neg_ltu, *geu,
> *geu_compare and splitter, *plus_geu, *neg_geu,
> *and_neg_geu, *plus_gt0, *gtu, *gtu_compare,
> *plus_gtu, *plus_gtu_1, *plus_gtu_compare,
> *neg_gtu, 12 anonymous insns, and 12 anonymous splitters):
> Delete.
> (eq3, ne3): New.
> (*neg_eq_, *neg_ne_): New.
> (*plus_eq_, *plus_ne_): New.
> (*minus_eq_, *minus_ne_): New.

Okay.

Thanks, David

Re: [PATCH 08/10] rs6000: compare->two for rld;rld.

2014-12-09 Thread David Edelsohn

On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
 wrote:
> These now are the only remaining patterns that have type "compare".
> And they shouldn't: "two" is a better type for them.
>
>
> 2014-12-08  Segher Boessenkool  
>
> gcc/
> * config/rs6000/rs6000.md (*anddi3_2rld_dot, *anddi3_rld_dot2):
> Change type from "compare" to "two".

Okay.

Thanks, David

Re: [PATCH 09/10] rs6000: Remove the (now unused) insn type "compare"

2014-12-09 Thread David Edelsohn

On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
 wrote:
>
> 2014-12-08  Segher Boessenkool  
>
> gcc/
> * config/rs6000/40x.md (ppc403-compare): Remove "compare".
> config/rs6000/440.md (ppc440-compare): Remove "compare".
> config/rs6000/476.md (ppc476-compare): Remove "compare".
> config/rs6000/601.md (ppc601-compare): Remove "compare".
> config/rs6000/603.md (ppc603-compare): Remove "compare".
> config/rs6000/6xx.md (ppc604-compare): Remove "compare".
> config/rs6000/7450.md (ppc7450-compare): Remove "compare".
> config/rs6000/7xx.md (ppc750-compare): Remove "compare".
> config/rs6000/8540.md (ppc8540_su): Remove "compare".
> config/rs6000/cell.md (cell-fast-cmp, cell-cmp-microcoded): Remove
> "compare".
> config/rs6000/e300c2c3.md (ppce300c3_cmp): Remove "compare".
> config/rs6000/e500mc.md (e500mc_su): Remove "compare".
> config/rs6000/e500mc64.md (e500mc64_su2): Remove "compare".
> config/rs6000/e5500.md (e5500_sfx2): Remove "compare".
> config/rs6000/e6500.md (e6500_sfx2): Remove "compare".
> config/rs6000/mpc.md (mpccore-compare): Remove "compare".
> config/rs6000/power4.md (power4-compare): Remove "compare".
> config/rs6000/power5.md (power5-compare): Remove "compare".
> config/rs6000/power6.md (power6-compare): Remove "compare".
> config/rs6000/power7.md (power7-compare): Remove "compare".
> config/rs6000/power8.md (power8-compare): Remove "compare".  Update
> comment.
> config/rs6000/rs6000.c (rs6000_adjust_cost) : Remove
> (three times).
> (is_cracked_insn): Remove TYPE_COMPARE case.
> (insn_must_be_first_in_group) : Remove (twice).
> config/rs6000/rs6000.md (type): Remove "compare".
> (cell_micro): Remove "compare".
> config/rs6000/rs64.md (rs64a-compare): Remove "compare".

Okay.

Thanks, David

Re: [PATCH 10/10] rs6000: Get rid of the weird decimal thing in add

2014-12-09 Thread David Edelsohn

On Mon, Dec 8, 2014 at 9:18 AM, Segher Boessenkool
 wrote:
> Peter tells me it was an artifact of old versions of the DFP code.  The
> condition can never be false; delete it.
>
>
> 2014-12-08  Segher Boessenkool  
>
> gcc/
> * config/rs6000/rs6000.md (*add3): Remove condition.

Okay.

Thanks, David

Re: [PATCH] Fix PR 61225

2014-12-09 Thread Segher Boessenkool

On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
> > Do you need to verify SETA and SETB satisfy single_set?  Or has that
> > already been done elsewhere?
> 
> A is NEXT_INSN (insn)
> B is prev_nonnote_nondebug_insn (insn),
> 
> For I1 -> I2 -> B; I2 -> A;
> LOG_LINK can make sure I1 and I2 are single_set,

It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
SETs; multiple LOG_LINKs can point to such an insn.

The only thing a LOG_LINK from Y to X tells you is that Y is the earliest
insn after X that uses some register set by X (and it knows which register,
too, nowadays).

> > > +  if (DF_REG_USE_COUNT (REGNO (SET_SRC (setb))) > 2)
> > > +{
> > > +  df_ref use;
> > > +  rtx insn;
> > > +  unsigned int i = REGNO (SET_SRC (setb));
> > > +
> > > +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
> (use))
> > > +{
> > > +   insn = DF_REF_INSN (use);
> > > +   if (insn != a && insn != b && !(NOTE_P (insn) || DEBUG_INSN_P
> > (insn)))
> > > + return false;
> > > + }
> > > +}
> > > +
> > > +  return true;
> > > +}
> > Is this fragment really needed?  Does it ever trigger?  I'd think that
> > for > 2 uses punting would be fine.  Do we really commonly have cases
> > with > 2 uses, but where they're all in SETA and SETB?

Can't you just check for a death note on the second insn?  Together with
reg_used_between_p?

> > > +   /* Try to combine a compare insn that sets CC
> > > +  with a preceding insn that can set CC, and maybe with its
> > > +  logical predecessor as well.
> > > +  We need this special code because data flow connections
> > > +  do not get entered in LOG_LINKS.  */

I think you mean "not _all_ data flow connections"?

> > > +   if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
> > > +   && refer_same_reg_p (insn, prev)
> > > +   && max_combine >= 4)
> > > + {
> > > + struct insn_link *next1;
> > > + FOR_EACH_LOG_LINK (next1, prev)
> > > +   {
> > > + rtx_insn *link1 = next1->insn;
> > > + if (NOTE_P (link1))
> > > +   continue;
> > > + /* I1 -> I2 -> I3; I2 -> insn;
> > > +output parallel (insn, I3).  */
> > > + FOR_EACH_LOG_LINK (nextlinks, link1)
> > > +   if ((next = try_combine (prev, link1,
> > > +nextlinks->insn, NULL,
> > > +&new_direct_jump_p,
> > > +last_combined_insn, insn)) !=
> 0)
> > > +
> > > + {
> > > +   delete_insn (insn);
> > > +   insn = next;
> > > +   statistics_counter_event (cfun, "four-insn
> combine",
> > 1);
> > > +   goto retry;
> > > + }
> > > + /* I2 -> I3; I2 -> insn
> > > +output next = parallel (insn, I3).  */
> > > + if ((next = try_combine (prev, link1,
> > > +  NULL, NULL,
> > > +  &new_direct_jump_p,
> > > +  last_combined_insn, insn)) !=
> 0)
> > > +
> > > +   {
> > > + delete_insn (insn);
> > > + insn = next;
> > > + statistics_counter_event (cfun, "three-insn
> combine",
> > 1);
> > > + goto retry;
> > > +   }
> > > +   }
> > > + }
> > So you've got two new combine cases here, but I think the testcase only
> > tests one of them.  Can you include a testcase for both of hte major
> > paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
> 
> pr61225.c is the case to cover I1->I2->I3; I2->insn.
> 
> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
> handle them. So I removed the code from the patch.

Why?  The simpler case has much better chances of being used.

In fact, there are many more cases you could handle:

You handle

I1 -> I2 -> I3; I2 -> insn
  I2 -> I3; I2 -> insn

but there are also

   I1,I2 -> I3; I2 -> insn

and the many 4-insn combos, too.
But that's not all: instead of just dealing with I2->insn, you can just as
well have I1->insn or I0->insn, and if you could handle the SET not dying
in the resulting insn, I3->insn.

In fact, in that case you really only need to handle I3->insn (no other
instructions involved), as a simple 2-insn combination that combines into
the earlier insn instead of the later, to get the effect you want.

Just like your patch, that would pull "insn" earlier, but it would do it
much more explicitly.


Some comments on the patch...

> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> +   except A and B.  */

That sound like the only correct inputs are such a compare etc., but the
routine tests whether that is true.

Re: [PATCH] Fix PR 61225

2014-12-09 Thread Jeff Law


On 12/09/14 12:07, Segher Boessenkool wrote:

On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:

Do you need to verify SETA and SETB satisfy single_set?  Or has that
already been done elsewhere?


A is NEXT_INSN (insn)
B is prev_nonnote_nondebug_insn (insn),

For I1 -> I2 -> B; I2 -> A;
LOG_LINK can make sure I1 and I2 are single_set,


It cannot, not anymore anyway.  LOG_LINKs can point to an insn with multiple
SETs; multiple LOG_LINKs can point to such an insn.

So let's go ahead and put a single_set test in this function.


Is this fragment really needed?  Does it ever trigger?  I'd think that

for > 2 uses punting would be fine.  Do we really commonly have cases
with > 2 uses, but where they're all in SETA and SETB?


Can't you just check for a death note on the second insn?  Together with
reg_used_between_p?
Yea, that'd accomplish the same thing I think Zhenqiang is trying to 
catch and is simpler than walking the lists.





+ /* Try to combine a compare insn that sets CC
+with a preceding insn that can set CC, and maybe with its
+logical predecessor as well.
+We need this special code because data flow connections
+do not get entered in LOG_LINKS.  */


I think you mean "not _all_ data flow connections"?
I almost said something about this comment, but figured I was nitpicking 
too much :-)



So you've got two new combine cases here, but I think the testcase only
tests one of them.  Can you include a testcase for both of hte major
paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)


pr61225.c is the case to cover I1->I2->I3; I2->insn.

For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 can also
handle them. So I removed the code from the patch.


Why?  The simpler case has much better chances of being used.
The question does it actually catch anything not already handled?  I 
guess you could argue that doing it in combine is better than peep2 and 
I'd agree with that.




In fact, there are many more cases you could handle:

You handle

I1 -> I2 -> I3; I2 -> insn
   I2 -> I3; I2 -> insn

but there are also

I1,I2 -> I3; I2 -> insn

and the many 4-insn combos, too.
Yes, but I wonder how much of this is really necessary in practice.  We 
could do exhaustive testing here, but I suspect the payoff isn't all 
that great.  Thus I'm comfortable with faulting in the cases we actually 
find are useful in practice.





+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+   It returns TRUE, if reg1 == reg2, and no other refer of reg1
+   except A and B.  */


That sound like the only correct inputs are such a compare etc., but the
routine tests whether that is true.
Correct, the RTL has to have a specific form and that is tested for. 
Comment updates can't hurt.






+static bool
+can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)


Neither the comment nor the function name mention this.  This test is
better placed in the caller of this function, anyway.
Didn't consider it terribly important.  Moving it to the caller doesn't 
change anything significantly, though I would agree it's martinally cleaner.





@@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
  rtx old = newpat;
  total_sets = 1 + extra_sets;
  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
- XVECEXP (newpat, 0, 0) = old;
+
+ if (to_combined_insn)
+   XVECEXP (newpat, 0, --total_sets) = old;
+ else
+   XVECEXP (newpat, 0, 0) = old;
}


Is this correct?  If so, it needs a big fat comment, because it is
not exactly obvious :-)

Also, it doesn't handle at all the case where the new pattern already is
a PARALLEL; can that never happen?

I'd convinced myself it was.  But yes, a comment here would be good.

Presumably you're thinking about a PARALLEL that satisfies single_set_p?

jeff

Re: Running cc1 etc under valgrind (was Re: [PATCH 13/21] PR jit/63854: Add support for running "make check-jit" under valgrind)

2014-12-09 Thread David Malcolm

On Wed, 2014-11-19 at 13:38 -0500, David Malcolm wrote:
> On Wed, 2014-11-19 at 09:57 -0700, Jeff Law wrote:
> > On 11/19/14 03:46, David Malcolm wrote:
> > > This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present
> > > in the environment, all of the built client code using libgccjit.so is
> > > run under valgrind, with --leak-check=full.
> > >
> > > Hence:
> > >RUN_UNDER_VALGRIND= make check-jit
> > > will run all jit testcases under valgrind (taking 27 mins on my
> > > machine).
> > >
> > > Results are written to testsuite/jit/test-FOO.exe.valgrind.txt
> > >
> > > jit.exp automatically parses these result file, looking for lines of
> > > the form
> > >definitely lost: 11,316 bytes in 235 blocks
> > >indirectly lost: 352 bytes in 4 blocks
> > > in the valgrind log's summary footer, adding PASSes if they are zero
> > > bytes, and, for now generating XFAILs for non-zero bytes.
> > >
> > > Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's
> > > host_execute, but I don't see a clean way to fix that.
> > >
> > > This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving:
> > ># of expected passes   2481
> > ># of expected failures 49
> > >
> > > gcc/testsuite/ChangeLog:
> > >   PR jit/63854
> > >   * jit.dg/jit.exp (report_leak): New.
> > >   (parse_valgrind_logfile): New.
> > >   (fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present
> > >   in the environment, and if so, run the executable under
> > >   valgrind, capturing valgrind's output to a logfile.  Parse the
> > >   log file, generating PASSes and XFAILs for the summary of leaks.
> > OK for the trunk.  FWIW, I'd love to see a mode where we can easily do 
> > this for the other testsuites as well.
> 
> Many of the cleanups in these patches are called from toplev::finalize,
> or something called from there, so that they're called by libgccjit.so,
> but not called by cc1/cc1plus etc.
> 
> In general, cc1 etc don't need to bother free-ing everything, and can
> instead simply exit.
> 
> But if you're running under valgrind, you'd probably want them to call
> toplev::finalize before exiting, to make the valgrind log shorter.
> 
> So perhaps cc1 etc could detect if they're being run under valgrind, and
> call toplev::finalize in the appropriate place?
> 
> Or maybe this could be a command-line option?
> 
> [I think I prefer autodetection, fwiw]

It turns out that this isn't necessary (I think), since the pointers are
typically still reachable.

[PATCH] Fix IRA register preferencing

2014-12-09 Thread Wilco Dijkstra

This fixes a bug in register preferencing. When live range splitting creates a 
new register from
another, it copies most fields except for the register preferences. The 
preference GENERAL_REGS is
used as reg_pref[i].prefclass is initialized with GENERAL_REGS in 
allocate_reg_info () and
resize_reg_info ().

This initialization value is not innocuous like the comment suggests - if a new 
register has a
non-integer mode, it is forced to prefer GENERAL_REGS. This changes the 
register costs in pass 2 so
that they are incorrect. As a result the liverange is either spilled or 
allocated to an integer
register:

void g(double);
void f(double x) 
{ 
  if (x == 0) 
return; 
  g (x);
  g (x); 
}

f:
fcmpd0, #0.0
bne .L6
ret
.L6:
stp x19, x30, [sp, -16]!
fmovx19, d0
bl  g
fmovd0, x19
ldp x19, x30, [sp], 16
b   g

With the fix it uses a floating point register as expected. Given a similar 
issue in
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be 
better to change the
initialization values of reg_pref to illegal register classes so this kind of 
issue can be trivially
found with an assert? Also would it not be a good idea to have a single 
register copy function that
ensures all data is copied?


ChangeLog: 2014-12-09  Wilco Dijkstra  wdijk...@arm.com

* gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.

---
 gcc/ira-emit.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index d246b7f..d736836 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -348,6 +348,7 @@ rtx
 ira_create_new_reg (rtx original_reg)
 {
   rtx new_reg;
+  int original_regno = REGNO (original_reg);
 
   new_reg = gen_reg_rtx (GET_MODE (original_reg));
   ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg);
@@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg)
   REG_ATTRS (new_reg) = REG_ATTRS (original_reg);
   if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
 fprintf (ira_dump_file, "  Creating newreg=%i from oldreg=%i\n",
-REGNO (new_reg), REGNO (original_reg));
+REGNO (new_reg), original_regno);
   ira_expand_reg_equiv ();
+
+  /* Copy the preference classes to new_reg.  */
+  resize_reg_info ();
+  setup_reg_classes (REGNO (new_reg),
+reg_preferred_class (original_regno),
+reg_alternate_class (original_regno),
+reg_allocno_class (original_regno));
+
   return new_reg;
 }
 
-- 
1.9.1

Re: [PATCH] TYPE_OVERFLOW_* cleanup

2014-12-09 Thread Marc Glisse


On Tue, 9 Dec 2014, Marek Polacek wrote:


The issue here is that TYPE_OVERFLOW_TRAPS, TYPE_OVERFLOW_UNDEFINED,
and TYPE_OVERFLOW_WRAPS macros work on integral types only, yet we
pass e.g. pointer_type/real_type to them.  This patch adds proper
checking for these macros and adds missing guards to various places.
This looks pretty straightforward, but I had to tweak a few places
wrt vector_types (where I've used VECTOR_INTEGER_TYPE_P) to not to
regress folding - and I'm afraid I missed places that aren't tested
in our testsuite :/.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-09  Marek Polacek  

* fold-const.c (negate_expr_p): Check for INTEGRAL_TYPE_P.
(fold_negate_expr): Likewise.
(extract_muldiv_1): Likewise.
(maybe_canonicalize_comparison_1): Likewise.
(fold_comparison): Likewise.
(fold_binary_loc): Likewise.
(tree_binary_nonnegative_warnv_p): Likewise.
(tree_binary_nonzero_warnv_p): Likewise.
* gimple-ssa-strength-reduction.c (legal_cast_p_1): Likewise.
* tree-scalar-evolution.c (simple_iv): Likewise.
(scev_const_prop): Likewise.
* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
* match.pd (X % -C): Likewise.
(-A - 1 -> ~A): Likewise.
(~A + A -> -1): Check for INTEGRAL_TYPE_P or VECTOR_INTEGER_TYPE_P.
* tree-vect-generic.c (expand_vector_operation): Likewise.
* tree.h (INTEGRAL_TYPE_CHECK): Define.
(TYPE_OVERFLOW_WRAPS, TYPE_OVERFLOW_UNDEFINED, TYPE_OVERFLOW_TRAPS):
Add INTEGRAL_TYPE_CHECK.

diff --git gcc/fold-const.c gcc/fold-const.c
index 0c4fe40..ff9d917 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -426,7 +426,8 @@ negate_expr_p (tree t)

case VECTOR_CST:
  {
-   if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))
+   if (FLOAT_TYPE_P (TREE_TYPE (type))
+   || (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)))
  return true;


type is a vector type, so INTEGRAL_TYPE_P (type) is always false I think.



int count = TYPE_VECTOR_SUBPARTS (type), i;
@@ -558,7 +559,8 @@ fold_negate_expr (location_t loc, tree t)
case INTEGER_CST:
  tem = fold_negate_const (t, type);
  if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
- || (!TYPE_OVERFLOW_TRAPS (type)
+ || (INTEGRAL_TYPE_P (type)


Can that be false for an INTEGER_CST?


+ && !TYPE_OVERFLOW_TRAPS (type)
  && TYPE_OVERFLOW_WRAPS (type))
  || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
return tem;
@@ -5951,7 +5953,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
   || EXPRESSION_CLASS_P (op0))
  /* ... and has wrapping overflow, and its type is smaller
 than ctype, then we cannot pass through as widening.  */
- && ((TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0))
+ && (((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
   && (TYPE_PRECISION (ctype)
   > TYPE_PRECISION (TREE_TYPE (op0
  /* ... or this is a truncation (t is narrower than op0),
@@ -5966,7 +5969,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  /* ... or has undefined overflow while the converted to
 type has not, we cannot do the operation in the inner type
 as that would introduce undefined overflow.  */
- || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+ || ((INTEGRAL_TYPE_P (TREE_TYPE (op0))
+  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0)))
  && !TYPE_OVERFLOW_UNDEFINED (type
break;

@@ -6159,6 +6163,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  if ((code == TRUNC_MOD_EXPR || code == CEIL_MOD_EXPR
   || code == FLOOR_MOD_EXPR || code == ROUND_MOD_EXPR)
  /* If the multiplication can overflow we cannot optimize this.  */
+ && INTEGRAL_TYPE_P (TREE_TYPE (t))
  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t))
  && TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
  && wi::multiple_of_p (op1, c, TYPE_SIGN (type)))
@@ -6211,7 +6216,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,

 If we have an unsigned type, we cannot do this since it will change
 the result if the original computation overflowed.  */
-  if (TYPE_OVERFLOW_UNDEFINED (ctype)
+  if (INTEGRAL_TYPE_P (ctype)
+ && TYPE_OVERFLOW_UNDEFINED (ctype)
  && ((code == MULT_EXPR && tcode == EXACT_DIV_EXPR)
  || (tcode == MULT_EXPR
  && code != TRUNC_MOD_EXPR && code != CEIL_MOD_EXPR
@@ -8497,7 +8503,8 @@ maybe_canonicalize_comparison_1 (location_t loc, enum 
tree_code code, tree type,

  /* Match A +- CST code arg1 and CST code arg1.  We can change the
 first form only if ove

Re: locales fixes

2014-12-09 Thread Marc Glisse


On Tue, 9 Dec 2014, Jonathan Wakely wrote:


On 08/12/14 23:53 +0100, François Dumont wrote:
   After having installed all necessary locales on my system I end up with 
4 failures. Here is a patch to fix them all.


Did you discover why only you are seeing failures?


Not just him, anyone with a debian-based system (assuming we are talking 
about the same thing). I believe it was documented somewhere, Paolo should 
know.


--
Marc Glisse

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Richard Biener

On December 9, 2014 7:39:48 PM CET, Jeff Law  wrote:
>On 12/09/14 10:38, Sebastian Pop wrote:
>> Richard Biener wrote:
>>> On Mon, Dec 8, 2014 at 10:49 PM, Steve Ellcey 
>wrote:
 expected?  Should this test also check flag_thread_jumps?  Or
>should
 that be getting checked somewhere else?
>>>
>>> -fthread-jumps is an RTL optimization flag and ignored on GIMPLE.
>>
>> Does it make sense to add a -f[no-]tree-thread-jumps to
>enable/disable the tree
>> jump threading?  I could also add -f[no-]tree-fsm-thread-jumps. 
>Opinions?
>Our option handling is a bit of a mess if we look at it from the user 
>standpoint.  Given that the vast majority of jump threading happens on 
>gimple, ISTM that -f[no-]thread-jumps ought to be controlling the
>gimple 
>implementation.  One could easily argue that the user doesn't really 
>care about where in the pipeline the optimization is implemented.
>
>My vote would be to just make -fthread-jumps control both RTL and
>gimple 
>jump threading.

Works for me.

>
>>
>> On the llvm test-suite, I have seen one ICE with my fsm jump-thread
>patch.
>> This patch fixes the problem:
>>
>> diff --git a/gcc/tree-ssa-threadupdate.c
>b/gcc/tree-ssa-threadupdate.c
>> index 12f83ba..f8c736e 100644
>> --- a/gcc/tree-ssa-threadupdate.c
>> +++ b/gcc/tree-ssa-threadupdate.c
>> @@ -2564,6 +2564,7 @@ thread_through_all_blocks (bool
>may_peel_loop_headers)
>> FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>   {
>> if (!loop->header
>> +|| !loop_latch_edge (loop)
>> || !bitmap_bit_p (threaded_blocks, loop->header->index))
>> continue;
>>
>>retval |= thread_through_loop_header (loop,
>may_peel_loop_headers);
>>
>> Ok to commit after regstrap?
>This seems to be indicating that we have with no edge from the latch 
>block to the header block.  I'd like to know better how we got into
>that 
>state.

It Also returns null for loops with multiple latches. So the patch looks OK for 
me.

Thanks,
Richard.

>Also, a test for the GCC testsuite would be good.  I have no idea what 
>license covers the LLVM testsuite.  But given a good analysis of the 
>problem we may be able to write a suitable test independent of the LLVM
>
>test.
>
>jeff

Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library

2014-12-09 Thread Jeff Law


On 12/09/14 01:27, Ilya Enkovich wrote:


2014-12-09  Ilya Enkovich  

* config/i386/linux-common.h (LIBMPX_WRAPPERSSPEC): New.
(CHKP_SPEC): Add wrappers library.
* c-family/c.opt (static-libmpxwrappers): New.

libmpx/

2014-12-09  Ilya Enkovich  

* Makefile.am (SUBDIRS): Add mpxwrap when used
AS supports MPX.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Check AS supports MPX.  Add
mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.
* mpxwrap/libmpxwrappers.map: New.



OK once prerequisites are approved.

jeff

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Jeff Law


On 12/09/14 12:43, Richard Biener wrote:

This seems to be indicating that we have with no edge from the latch
block to the header block.  I'd like to know better how we got into
that
state.


It Also returns null for loops with multiple latches. So the patch looks OK for 
me.

Ah, OK.

Jeff

Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call

2014-12-09 Thread Jeff Law


On 12/08/14 14:50, John David Anglin wrote:

On 12/8/2014 3:49 PM, Jeff Law wrote:

I think the terminology and variable names certainly makes this
tougher to follow than it should.

I certainly agree. When I first looked at the code, I thought it
was completely backwards.

Me too.

Thoughts on using both patches to ensure the we don't have extraneous 
entries in the GEN set and that we're adding more stuff to the KILL set 
for sibcalls?


Jeff

[PATCH, committed] Add jit-tempdir.{c|h}

2014-12-09 Thread David Malcolm

Move tempdir handling out from jit-playback.c into new
jit-tempdir.{c|h} files.

This simplifies jit-playback.c, and may help fix PR jit/64206.

Committed to trunk as r218533.

gcc/jit/ChangeLog:
PR jit/64206
* Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o.
* jit-common.h (gcc::jit::tempdir): New forward decl.
* jit-playback.c: Include jit-tempdir.h.
(gcc::jit::playback::context::context): Initialize m_tempdir.
(gcc::jit::playback::context::~context): Move tempdir
cleanup to new file jit-tempdir.c
(make_tempdir_path_template): Move to new file jit-tempdir.c.
(gcc::jit::playback::context::compile): Move tempdir creation
to new tempdir object in new file jit-tempdir.c.
(gcc::jit::playback::context::make_fake_args): Get path from
tempdir object rather than from member data.
(gcc::jit::playback::context::convert_to_dso): Likewise.
(gcc::jit::playback::context::dlopen_built_dso): Likewise.
(gcc::jit::playback::context::dump_generated_code): Likewise.
(gcc::jit::playback::context::get_path_c_file): New function.
(gcc::jit::playback::context::get_path_s_file): New function.
(gcc::jit::playback::context::get_path_so_file): New function.
* jit-playback.h (gcc::jit::playback::context::get_path_c_file):
New function.
(gcc::jit::playback::context::get_path_s_file): New function.
(gcc::jit::playback::context::get_path_so_file): New function.
(gcc::jit::playback::context): Move fields "m_path_template",
"m_path_tempdir", "m_path_c_file", "m_path_s_file",
"m_path_so_file" to new jit::tempdir class; add field "m_tempdir".
* jit-tempdir.c: New file.
* jit-tempdir.h: New file.
---
 gcc/jit/Make-lang.in   |   1 +
 gcc/jit/jit-common.h   |   1 +
 gcc/jit/jit-playback.c | 118 +---
 gcc/jit/jit-playback.h |  15 ++
 gcc/jit/jit-tempdir.c  | 129 +
 gcc/jit/jit-tempdir.h  |  81 +++
 6 files changed, 262 insertions(+), 83 deletions(-)
 create mode 100644 gcc/jit/jit-tempdir.c
 create mode 100644 gcc/jit/jit-tempdir.h

diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
index e88fd00..818e14b 100644
--- a/gcc/jit/Make-lang.in
+++ b/gcc/jit/Make-lang.in
@@ -65,6 +65,7 @@ jit_OBJS = attribs.o \
jit/jit-recording.o \
jit/jit-playback.o \
jit/jit-result.o \
+   jit/jit-tempdir.o \
jit/jit-builtins.o
 
 # Use strict warnings for this front end.
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index c9dde3e..25c2c6f 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -98,6 +98,7 @@ namespace jit {
 class result;
 class dump;
 class builtins_manager; // declared within jit-builtins.h
+class tempdir;
 
 namespace recording {
 
diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 281ad85..8498900 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-playback.h"
 #include "jit-result.h"
 #include "jit-builtins.h"
+#include "jit-tempdir.h"
 
 
 /* gcc::jit::playback::context::build_cast uses the convert.h API,
@@ -86,6 +87,7 @@ namespace jit {
 
 playback::context::context (recording::context *ctxt)
   : m_recording_ctxt (ctxt),
+m_tempdir (NULL),
 m_char_array_type_node (NULL),
 m_const_char_ptr (NULL)
 {
@@ -98,25 +100,8 @@ playback::context::context (recording::context *ctxt)
 
 playback::context::~context ()
 {
-  if (get_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES))
-fprintf (stderr, "intermediate files written to %s\n", m_path_tempdir);
-  else
-{
-  /* Clean up .s/.so and tempdir. */
-  if (m_path_s_file)
-unlink (m_path_s_file);
-  if (m_path_so_file)
-unlink (m_path_so_file);
-  if (m_path_tempdir)
-rmdir (m_path_tempdir);
-}
-
-  free (m_path_template);
-  /* m_path_tempdir aliases m_path_template, or is NULL, so don't
- attempt to free it .  */
-  free (m_path_c_file);
-  free (m_path_s_file);
-  free (m_path_so_file);
+  if (m_tempdir)
+delete m_tempdir;
   m_functions.release ();
 }
 
@@ -1515,44 +1500,6 @@ block (function *func,
   m_label_expr = NULL;
 }
 
-/* Construct a tempdir path template suitable for use by mkdtemp
-   e.g. "/tmp/libgccjit-XX", but respecting the rules in
-   libiberty's choose_tempdir rather than hardcoding "/tmp/".
-
-   The memory is allocated using malloc and must be freed.
-   Aborts the process if allocation fails. */
-
-static char *
-make_tempdir_path_template ()
-{
-  const char *tmpdir_buf;
-  size_t tmpdir_len;
-  const char *file_template_buf;
-  size_t file_template_len;
-  char *result;
-
-  /* The result of choose_tmpdir is a cached buffer within libiberty, so
- we must *not* free it.  */
-  tmpdir_buf = cho

Re: [PATCH][libstdc++][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-09 Thread Mike Stump

On Dec 9, 2014, at 3:17 AM, Kyrill Tkachov  wrote:
> In the gcc and g++ testsuite we already catch such cases and mark them as 
> UNSUPPORTED. In libstdc++.exp there is no such functionality.

> I'm not very happy that it had to be copied, but I couldn't find a way to 
> include the gcc definition sanely.

> Is this a sane approach to what I'm trying to solve?

Ok.

If you would like, you can try and pull the common parts out into a new file, 
and include (load) that file from the two places that currently do that.  If 
they are exactly identical, should be trivial enough.  If not exactly the same, 
I’d do two patches, once to make them the same, then, the second one to split 
them out.

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-09 Thread Mike Stump

On Dec 9, 2014, at 10:39 AM, Jeff Law  wrote:
> Also, a test for the GCC testsuite would be good.  I have no idea what 
> license covers the LLVM testsuite.  But given a good analysis of the problem 
> we may be able to write a suitable test independent of the LLVM test.

So, the usual engineerings rules should work just fine on it.  delta reduce and 
submit.

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Jeff Law


On 12/09/14 01:24, Ilya Enkovich wrote:

On 24 Nov 17:02, Ilya Enkovich wrote:


Right.  This works for both top level and multilib checks because failing test 
is used and CC is usually not set when it's called by the top level configure.  
If we configure with CC=... then it may go wrong.  I left only target check in 
configure.tgt and inlined test for x32 into libmpx configure.



--
Joseph S. Myers
jos...@codesourcery.com


Here is an updated version.

Thanks,
Ilya
--


Here is an updated version.  I moved linker specs to target.  Currently mpx 
libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think 
gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.

Thanks,
Ilya
--
2014-12-09  Ilya Enkovich  

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-12-09  Ilya Enkovich  

* config/i386/linux-common.h (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(CHKP_SPEC): New.
* gcc.c (CHKP_SPEC): New.
(LINK_COMMAND_SPEC): Add CHKP_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/




diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index b9f7c65..65731cc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
  C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0)
  Instrument only functions marked with bnd_instrument attribute.

+static-libmpx
+Driver
+
  fcilkplus
  C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
  Enable Cilk Plus

Wasn't this duplicated in the mpx-wrapper patch?

I think this is OK on the technical side.  I need to do to some vote 
tallying on the steering committee side WRT licensing, ownership of the 
sources, canonical source location, etc, so please don't commit yet.


jeff

[PATCH, commmited] PR jit/63854: Document how to run the jit testsuite under valgrind

2014-12-09 Thread David Malcolm

Committed to trunk as r218538.
Index: gcc/jit/ChangeLog
===
--- gcc/jit/ChangeLog	(revision 218537)
+++ gcc/jit/ChangeLog	(revision 218538)
@@ -1,5 +1,12 @@
 2014-12-09  David Malcolm  
 
+	PR jit/63854
+	* docs/internals/index.rst (Running under valgrind): New
+	subsection.
+	(docs/_build/texinfo/libgccjit.texi): Regenerate.
+
+2014-12-09  David Malcolm  
+
 PR jit/64206
 	* Make-lang.in (jit_OBJS): Add jit/jit-tempdir.o.
 	* jit-common.h (gcc::jit::tempdir): New forward decl.
Index: gcc/jit/docs/internals/index.rst
===
--- gcc/jit/docs/internals/index.rst	(revision 218537)
+++ gcc/jit/docs/internals/index.rst	(revision 218538)
@@ -118,6 +118,51 @@
  gdb --args \
testsuite/jit/test-factorial.exe
 
+Running under valgrind
+**
+
+The jit testsuite detects if RUN_UNDER_VALGRIND is present in the
+environment (with any value).  If it is present, it runs the test client
+code under `valgrind `_,
+specifcally, the default
+`memcheck `_
+tool with
+`--leak-check=full
+`_.
+
+It automatically parses the output from valgrind, injecting XFAIL results if
+any issues are found, or PASS results if the output is clean.  The output
+is saved to ``TESTNAME.exe.valgrind.txt``.
+
+For example, the following invocation verbosely runs the testcase
+``test-sum-of-squares.c`` under valgrind, showing an issue:
+
+.. code-block:: console
+
+  $ RUN_UNDER_VALGRIND= \
+  make check-jit \
+RUNTESTFLAGS="-v -v -v jit.exp=test-sum-of-squares.c"
+
+  (...verbose log contains detailed valgrind errors, if any...)
+
+  === jit Summary ===
+
+  # of expected passes28
+  # of expected failures  2
+
+  $ less testsuite/jit/jit.sum
+  (...other results...)
+  XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks
+  XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1
+  (...other results...)
+
+  $ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt
+  (...shows full valgrind report for this test case...)
+
+When running under valgrind, it's best to have configured gcc with
+:option:`--enable-valgrind-annotations`, which automatically suppresses
+various known false positives.
+
 Environment variables
 -
 When running client code against a locally-built libgccjit, three
Index: gcc/jit/docs/_build/texinfo/libgccjit.texi
===
--- gcc/jit/docs/_build/texinfo/libgccjit.texi	(revision 218537)
+++ gcc/jit/docs/_build/texinfo/libgccjit.texi	(revision 218538)
@@ -197,6 +197,10 @@
 * Environment variables:: 
 * Overview of code structure:: 
 
+Running the test suite
+
+* Running under valgrind:: 
+
 @end detailmenu
 @end menu
 
@@ -6548,8 +6552,61 @@
 
 @noindent
 
+@menu
+* Running under valgrind:: 
+
+@end menu
+
+@node Running under valgrind,,,Running the test suite
+@anchor{internals/index running-under-valgrind}@anchor{c9}
+@subsection Running under valgrind
+
+
+The jit testsuite detects if RUN_UNDER_VALGRIND is present in the
+environment (with any value).  If it is present, it runs the test client
+code under valgrind@footnote{http://valgrind.org},
+specifcally, the default
+memcheck@footnote{http://valgrind.org/docs/manual/mc-manual.html}
+tool with
+--leak-check=full@footnote{http://valgrind.org/docs/manual/mc-manual.html#opt.leak-check}.
+
+It automatically parses the output from valgrind, injecting XFAIL results if
+any issues are found, or PASS results if the output is clean.  The output
+is saved to @code{TESTNAME.exe.valgrind.txt}.
+
+For example, the following invocation verbosely runs the testcase
+@code{test-sum-of-squares.c} under valgrind, showing an issue:
+
+@example
+$ RUN_UNDER_VALGRIND= \
+make check-jit \
+  RUNTESTFLAGS="-v -v -v jit.exp=test-sum-of-squares.c"
+
+(...verbose log contains detailed valgrind errors, if any...)
+
+=== jit Summary ===
+
+# of expected passes28
+# of expected failures  2
+
+$ less testsuite/jit/jit.sum
+(...other results...)
+XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: definitely lost: 8 bytes in 1 blocks
+XFAIL: jit.dg/test-sum-of-squares.c: test-sum-of-squares.exe.valgrind.txt: unsuppressed errors: 1
+(...other results...)
+
+$ less testsuite/jit/test-sum-of-squares.exe.valgrind.txt
+(...shows full valgrind report for this test case...)
+@end example
+
+@noindent
+
+When running under valgrind, it's best to have configured gcc with
+@code{--enable-valgrind-annotations}, which automatically suppresses
+various known false positives.
+
 @node Environment variables,Overview of cod

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-12-09 Thread Ilya Enkovich

2014-12-09 22:57 GMT+03:00 Jeff Law :
> On 12/09/14 01:24, Ilya Enkovich wrote:
>>
>> On 24 Nov 17:02, Ilya Enkovich wrote:
>>>
>>>
>>> Right.  This works for both top level and multilib checks because failing
>>> test is used and CC is usually not set when it's called by the top level
>>> configure.  If we configure with CC=... then it may go wrong.  I left only
>>> target check in configure.tgt and inlined test for x32 into libmpx
>>> configure.
>>>

 --
 Joseph S. Myers
 jos...@codesourcery.com
>>>
>>>
>>> Here is an updated version.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>
>>
>> Here is an updated version.  I moved linker specs to target.  Currently
>> mpx libraries are built for x86_64-*-linux* | i?86-*-linux*, so I think
>> gcc/config/i386/linux-common.h is a proper place for LIBMPX delcarations.
>>
>> Thanks,
>> Ilya
>> --
>> 2014-12-09  Ilya Enkovich  
>>
>> * Makefile.def: Add libmpx.
>> * configure.ac: Add libmpx.
>> * Makefile.in: Regenerate.
>> * configure: Regenerate.
>>
>> gcc/
>>
>> 2014-12-09  Ilya Enkovich  
>>
>> * config/i386/linux-common.h (LIBMPX_LIBS): New.
>> (LIBMPX_SPEC): New.
>> (CHKP_SPEC): New.
>> * gcc.c (CHKP_SPEC): New.
>> (LINK_COMMAND_SPEC): Add CHKP_SPEC.
>> * c-family/c.opt (static-libmpx): New.
>>
>> libmpx/
>>
>
>> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
>> index b9f7c65..65731cc 100644
>> --- a/gcc/c-family/c.opt
>> +++ b/gcc/c-family/c.opt
>> @@ -1020,6 +1020,9 @@ fchkp-instrument-marked-only
>>   C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only)
>> Init(0)
>>   Instrument only functions marked with bnd_instrument attribute.
>>
>> +static-libmpx
>> +Driver
>> +
>>   fcilkplus
>>   C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
>>   Enable Cilk Plus
>
> Wasn't this duplicated in the mpx-wrapper patch?

Wrappers patch introduces similar option static-libmpxwrappers.

>
> I think this is OK on the technical side.  I need to do to some vote
> tallying on the steering committee side WRT licensing, ownership of the
> sources, canonical source location, etc, so please don't commit yet.

Great! Let me know when it's time to commit.

Thanks,
Ilya

>
> jeff
>

Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call

2014-12-09 Thread John David Anglin


On 12/9/2014 3:00 PM, Jeff Law wrote:
Thoughts on using both patches to ensure the we don't have extraneous 
entries in the GEN set and that we're adding more stuff to the KILL 
set for sibcalls?

Maybe we should add all stores to the KILL set
for a sibling call and not worry about whether
they are frame related or not.

Dave

--
John David Anglindave.ang...@bell.net

[patch] gdb python pretty printer for DIEs

2014-12-09 Thread Aldy Hernandez

I am tired of dumping entire DIEs just to see what type they are.  With 
this patch, we get:


(gdb) print context_die
$5 = DW_TAG_compile_unit>>


I know it's past the end of stage1, but this debugging aid can help in 
fixing bugs in stage >= 3.


I am committing this to the [debug-early] branch, but I am hoping I can 
also commit it to mainline and avoid dragging it along.


OK for mainline?
commit e20f40f92eeba51f9a1ffb078e060366e7beb471
Author: Aldy Hernandez 
Date:   Tue Dec 9 13:04:46 2014 -0800

* gdbhooks.py (class DWDieRefPrinter): New class.
(build_pretty_printer): Register dw_die_ref's.

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index a74e712..26affd9 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -253,6 +253,26 @@ class CGraphNodePrinter:
 return result
 
 ##
+# Dwarf DIE pretty-printers
+##
+
+class DWDieRefPrinter:
+def __init__(self, gdbval):
+self.gdbval = gdbval
+
+def to_string (self):
+result = ''
+result += ' %s' % self.gdbval['die_tag']
+if long(self.gdbval['die_parent']) != 0:
+result += ' ' % (long(self.gdbval['die_parent']),
+ 
self.gdbval['die_parent']['die_tag'])
+ 
+result += '>'
+return result
+
+##
 
 class GimplePrinter:
 def __init__(self, gdbval):
@@ -455,6 +475,8 @@ def build_pretty_printer():
  'tree', TreePrinter)
 pp.add_printer_for_types(['cgraph_node *'],
  'cgraph_node', CGraphNodePrinter)
+pp.add_printer_for_types(['dw_die_ref'],
+ 'dw_die_ref', DWDieRefPrinter)
 pp.add_printer_for_types(['gimple', 'gimple_statement_base *',
 
   # Keep this in the same order as gimple.def:

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Evgeny Stupachenko

I've added the reproducer to the patch.

is it ok?

ChangeLog:

2014-12-10  Evgeny Stupachenko  

gcc/testsuite
* gcc.target/i386/blend.c: New.

gcc/
* config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
expand_vec_perm_1 target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
 dcopy.op0 = dcopy.op1 = d->op1;
   else
 dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)
diff --git a/gcc/testsuite/gcc.target/i386/blend.c
b/gcc/testsuite/gcc.target/i386/blend.c
new file mode 100644
index 000..d03bdbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blend.c
@@ -0,0 +1,61 @@
+/* Test correctness of size 3 store groups permutation.  */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+#define N 50
+
+enum num3
+{
+  a, b, c
+};
+
+struct flags
+{
+  enum num3 f;
+  unsigned int c;
+  unsigned int p;
+};
+
+struct flagsN
+{
+  struct flags a[N];
+};
+
+void
+bar (int n, struct flagsN *ff)
+{
+  struct flagsN *fc;
+  for (fc = ff + 1; fc < (ff + n); fc++)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+   {
+ ff->a[i].f = 0;
+ ff->a[i].c = i;
+ ff->a[i].p = -1;
+   }
+  for (i = 0; i < n; i++)
+   {
+ int j;
+ for (j = 0; j < N - n; ++j)
+   {
+ fc->a[i + j].f = 0;
+ fc->a[i + j].c = j + i;
+ fc->a[i + j].p = -1;
+   }
+   }
+}
+}
+
+struct flagsN q[2];
+
+int main()
+{
+  int i;
+  long long *rr = (long long *)q[0].a;
+  bar(2, q);
+  for (i = 0; i < N * 2; i += 2)
+if (rr[i] == -1 && rr[i + 1] == -1)
+  return 1;
+  return 0;
+}



On Tue, Dec 9, 2014 at 7:38 PM, Evgeny Stupachenko  wrote:
> I mean that there are a lot of people tracking spec2006 stability and
> therefore the issue should be on track in future.
> And that I can create the test case, but it would be as big as several
> GCC functions.
> Will work on reducing the test case.
>
>
> On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson  wrote:
>> On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
>>> However patch is fixing spec2006 benchmark. Shouldn't that be enough
>>> for regression testing?
>>>
>>
>> No.  Spec is not free.
>>
>>
>> r~

[PATCH, committed] jit: toyvm.c: use correct path in debuginfo

2014-12-09 Thread David Malcolm

I noticed while debugging another issue that at some point I'd broken
the source locations that toyvm.c sets up into the .toy source files,
by using the filename, rather than the filesystem path.

Also fix a segfault due to fclose (NULL) when the .toy file is not
found.

Committed to trunk as r218540.

gcc/jit/ChangeLog:
* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile): Move
logic for determine "funcname" to new function...
(get_function_name): ...here, adding logic to skip any leading
path from the filename.
(toyvm_function_parse): Use the filename for fn_filename, rather
than "name", so that the debugger can locate the source .toy
file.
(toyvm_function_parse): Don't fclose a NULL FILE *.
---
 gcc/jit/docs/examples/tut04-toyvm/toyvm.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c 
b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
index 07de507..0089ea7 100644
--- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
+++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
@@ -121,6 +121,25 @@ add_unary_op (toyvm_function *fn, enum opcode opcode,
   add_op (fn, opcode, operand, linenum);
 }
 
+static char *
+get_function_name (const char *filename)
+{
+  /* Skip any path separators.  */
+  const char *pathsep = strrchr (filename, '/');
+  if (pathsep)
+filename = pathsep + 1;
+
+  /* Copy filename to funcname.  */
+  char *funcname = (char *)malloc (strlen (filename) + 1);
+
+  strcpy (funcname, filename);
+
+  /* Convert "." to NIL terminator.  */
+  *(strchr (funcname, '.')) = '\0';
+
+  return funcname;
+}
+
 static toyvm_function *
 toyvm_function_parse (const char *filename, const char *name)
 {
@@ -149,7 +168,7 @@ toyvm_function_parse (const char *filename, const char 
*name)
   fprintf (stderr, "out of memory allocating toyvm_function\n");
   goto error;
 }
-  fn->fn_filename = name;
+  fn->fn_filename = filename;
 
   /* Read the lines of the file.  */
   while ((linelen = getline (&line, &bufsize, f)) != -1)
@@ -208,7 +227,8 @@ toyvm_function_parse (const char *filename, const char 
*name)
 
  error:
   free (line);
-  fclose (f);
+  if (f)
+fclose (f);
   free (fn);
   return NULL;
 }
@@ -460,12 +480,7 @@ toyvm_function_compile (toyvm_function *fn)
 
   memset (&state, 0, sizeof (state));
 
-  /* Copy filename to funcname.  */
-  funcname = (char *)malloc (strlen (fn->fn_filename) + 1);
-  strcpy (funcname, fn->fn_filename);
-
-  /* Convert "." to NIL terminator.  */
-  *(strchr (funcname, '.')) = '\0';
+  funcname = get_function_name (fn->fn_filename);
 
   state.ctxt = gcc_jit_context_acquire ();
 
-- 
1.8.5.3

Re: [PATCH 01/10] rs6000: Clobber XER[CA] in all user asm statements

2014-12-09 Thread Richard Henderson

On 12/08/2014 06:25 AM, David Edelsohn wrote:
> Okay.  I don't know of another option that preserves backward
> compatibility.  If RTH looks at this series of patches, he may have a
> better suggestion.

No, this looks like what I'd do.

Indeed, I added this hook back when the i386 port gained direct
support for the flags register, so it's exactly what I did.  ;-)

r~

Re: [PATCH, x86] Fix pblendv expand.

2014-12-09 Thread Jakub Jelinek

On Wed, Dec 10, 2014 at 12:33:52AM +0300, Evgeny Stupachenko wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blend.c
> @@ -0,0 +1,61 @@
> +/* Test correctness of size 3 store groups permutation.  */
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +
> +#define N 50
> +
> +enum num3
> +{
> +  a, b, c
> +};
> +
> +struct flags
> +{
> +  enum num3 f;

Does this really has to be an enum?  Doesn't unsigned int there work the
same?

> +int main()
> +{
> +  int i;
> +  long long *rr = (long long *)q[0].a;
> +  bar(2, q);
> +  for (i = 0; i < N * 2; i += 2)
> +if (rr[i] == -1 && rr[i + 1] == -1)

This is aliasing violation, can't you avoid that?  I mean, just
check the individual struct flags fields?  And with the aliasing violation
fixed, is there anything i?86 specific left in the testcase (i.e. shouldn't
it go either to gcc.dg/ or gcc.c-torture/execute/ instead?)?

Jakub

Re: [PATCH, libcpp]: Check the result of vasprintf

2014-12-09 Thread Joseph Myers

On Tue, 9 Dec 2014, Uros Bizjak wrote:

> Attached patch checks the return value and sets ptr to NULL in this case.
> 
> 2014-12-09  Uros Bizjak  
> 
> * directives.c (cpp_define_formatted): Check return value of
> vasprintf and in case of error set ptr to NULL.
> 
> Bootstrapped on x86_64-linux-gnu.
> 
> OK for mainline?

No, this will just continue to pass NULL into cpp_define, and so into 
strlen, where it isn't a valid argument.  You need to give an error 
message for allocation failure and exit, much like xmalloc does (or put 
xvasprintf in libiberty and use that here - see 
 and 
 - I don't know 
if that's the latest version).

-- 
Joseph S. Myers
jos...@codesourcery.com

RE: [PATCH][MIPS] Fix P5600 memory cost

2014-12-09 Thread Matthew Fortune

Hi Prachi,

I'm afraid you updated the wrong Changelog with this commit. GCC changes
are recorded in gcc/ChangeLog.

I'm not sure what the correct procedure is for fixing this.

Jeff: Should a mistake like this be fixed by removing the entry
from the top level and adding one at the appropriate location in the
gcc/ChangeLog one?

Thanks,
Matthew

> -Original Message-
> From: Prachi Godbole
> Sent: 26 November 2014 08:40
> To: Matthew Fortune; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH][MIPS] Fix P5600 memory cost
> 
> Committed with ChangeLog entry fixes.
> 
> Prachi
> 
> 
> -Original Message-
> From: Matthew Fortune
> Sent: Wednesday, November 5, 2014 4:07 PM
> To: Prachi Godbole; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH][MIPS] Fix P5600 memory cost
> 
> > The patch below fixes the memory cost for P5600.
> >
> > ChangeLog:
> > 2014-11-05 Prachi Godbole 
> >
> > * config/mips/mips.c (mips_rtx_cost_data): Fix memory_letency cost for
> > p5600.
> 
> Please follow these instructions to add yourself to MAINTAINERS in the
> write-after-approval section now that you have write access to GCC:
> 
> https://gcc.gnu.org/svnwrite.html#authenticated
> 
> OK with fixes to the changelog entry:
> 
> latency not latency. Remember to tab in the changelog entry and split
> the line as it will exceed 80 chars. Also two spaces between the
> date/name and name/email. E.g.
> 
> 2014-11-05  Prachi Godbole  
> 
>   * config/mips/mips.c (mips_rtx_cost_data): Fix memory_latency cost
> for
>   p5600.
> 
> Thanks,
> Matthew
> 
> > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
> > af6a913..558ba2f 100644
> > --- a/gcc/config/mips/mips.c
> > +++ b/gcc/config/mips/mips.c
> > @@ -1193,7 +1193,7 @@ static const struct mips_rtx_cost_data
> >  COSTS_N_INSNS (8),/* int_div_si */
> >  COSTS_N_INSNS (8),/* int_div_di */
> > 2,/* branch_cost */
> > -  10 /* memory_latency */
> > +   4 /* memory_latency */
> >}
> >  };
> >  ^L

1 2 >

1 - 100 of 126 matches

Mail list logo