RE: [PATCH][GCC][AArch64] Correct 3 way XOR instructions adding missing patterns.

2018-05-08 Thread Tamar Christina
Ping? Backport may not be appropriate but I'd still like it in trunk.

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Tamar Christina
> Sent: Monday, April 30, 2018 15:13
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Correct 3 way XOR instructions adding
> missing patterns.
> 
> Hi All,
> 
> This patch adds the missing NEON intrinsics for all 128-bit vector integer
> modes for the three-way XOR and negate and XOR instructions for Armv8.2-a
> to Armv8.4-a.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master? And for backport to the GCC-8 branch?
> 
> gcc/
> 2018-04-30  Tamar Christina  
> 
>   * config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to
>   eor3q4.
>   (aarch64_bcaxqv8hi): Change to bcaxq4.
>   * config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32,
>   veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64,
> vbcaxq_u8,
>   vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
>   vbcaxq_s64): New.
>   * config/aarch64/arm_neon.h: Likewise.
>   * config/aarch64/iterators.md (VQ_I): New.
> 
> gcc/testsuite/
> 2018-04-30  Tamar Christina  
> 
>   * gcc.target/gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32,
>   veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64,
> vbcaxq_u8,
>   vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
>   vbcaxq_s64): New.
>   * gcc.target/gcc.target/aarch64/sha3_1.c: Likewise.
>   * gcc.target/gcc.target/aarch64/sha3_1.c: Likewise.
>   * gcc.target/gcc.target/aarch64/sha3_1.c: Likewise.

Copy and paste wibble, will correct when committing.

> 
> Thanks,
> Tamar
> 
> --
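
For readers unfamiliar with the two instructions, here is a minimal scalar sketch in C of what a single 64-bit lane computes. It assumes the usual architectural definitions (EOR3 as a three-way exclusive OR, BCAX as "bit clear and XOR", i.e. a ^ (b & ~c)). The helper names eor3_lane and bcax_lane are made up for illustration; they are not part of arm_neon.h or of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane of EOR3:
   exclusive OR of all three inputs.  */
static uint64_t
eor3_lane (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ b ^ c;
}

/* Hypothetical scalar model of one 64-bit lane of BCAX:
   clear the bits of B that are set in C, then XOR with A.  */
static uint64_t
bcax_lane (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ (b & ~c);
}

/* Small self-check of the two models.  */
static void
check_lane_models (void)
{
  assert (eor3_lane (0xf0, 0x0f, 0xff) == 0x00);
  assert (bcax_lane (0x00, 0xff, 0x0f) == 0xf0);
}
```

Since both operations are purely bitwise, the element width of the vector mode does not change the result, which is presumably why one pattern can serve every 128-bit integer mode.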


Re: [PATCH] Add constant folding support for next{after,toward}{,f,l} (PR libstdc++/85466)

2018-05-08 Thread Tom de Vries

On 05/07/2018 03:41 PM, Christophe Lyon wrote:

On 7 May 2018 at 12:04, Tom de Vries  wrote:

On 04/21/2018 07:36 PM, Jakub Jelinek wrote:


 * gcc.dg/nextafter-2.c: New test.



Hi,

FTR, I ran into a link error "unresolved symbol nexttowardf" using the
standalone nvptx toolchain:
...
PASS: gcc.dg/nextafter-1.c (test for excess errors)
PASS: gcc.dg/nextafter-1.c execution test
PASS: gcc.dg/nextafter-1.c scan-tree-dump-not optimized "nextafter"
PASS: gcc.dg/nextafter-1.c scan-tree-dump-not optimized "nexttoward"
FAIL: gcc.dg/nextafter-2.c (test for excess errors)
UNRESOLVED: gcc.dg/nextafter-2.c compilation failed to produce executable
PASS: gcc.dg/nextafter-3.c (test for excess errors)
PASS: gcc.dg/nextafter-3.c execution test
PASS: gcc.dg/nextafter-3.c scan-tree-dump-not optimized "nextafter"
PASS: gcc.dg/nextafter-3.c scan-tree-dump-not optimized "nexttoward"
PASS: gcc.dg/nextafter-4.c (test for excess errors)
PASS: gcc.dg/nextafter-4.c execution test
PASS: gcc.dg/nextafter-4.c scan-tree-dump-not optimized "nextafter"
PASS: gcc.dg/nextafter-4.c scan-tree-dump-not optimized "nexttoward"
...

This failure exposes a newlib bug. I've submitted a patch here (
https://sourceware.org/ml/newlib/2018/msg00350.html ).



Hi

I noticed the same problem on arm and aarch64 bare-metal targets using newlib,
and I thought it was a matter of adding the c99_runtime effective target,
as done in the attached patch.

Even if newlib gets a fix for this, the effective target will still
claim that c99_runtime is not supported on such targets.



Hi Christophe,

It's true that newlib does not support c99 fully, but given that with 
the fix mentioned above (which was applied upstream) the test-case is 
linking, the c99 functions required by the test-case are at least present.


With the fix applied, the test-case now fails in execution for nvptx. I 
don't know yet whether that's a target issue or a newlib issue. Can you 
remove the c99 effective target test and run with updated newlib and see 
if it passes or fails in execution for arm/aarch64?


FWIW, this test also fails for me in execution on Ubuntu 16.04 with
glibc 2.23.


Thanks,
- Tom


Re: [Aarch64] PR target/83009: Relax strict address checking for store pair lanes

2018-05-08 Thread Andre Vieira (lists)
Hi Richard,
On 07/05/18 11:14, Richard Sandiford wrote:
> "Andre Vieira (lists)"  writes:
>> Hi,
>>
>> See below a patch to address PR 83009.
>>
>> Tested with aarch64-linux-gnu bootstrap and regtests for c, c++ and fortran.
>> Ran the adjusted testcase for -mabi=ilp32.
>>
>> Is this OK for gcc-9?
>>
>> Cheers,
>> Andre
>>
>> PR target/83009: Relax strict address checking for store pair lanes
>>
>> The operand constraint for the memory address of store/load pair lanes
>> was strict, so only hard registers were allowed as memory addresses.
>> We want to relax that so that these patterns can be used by combine.
>> During register allocation the register constraint will ensure that
>> the correct register is chosen.
> 
> Nice spot.
> 
>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>> index 5d41d4350402b2a9e5941f160c6ab6f933bfff90..f29bc8e74f0070589014ac87fd22a95723ba9be8 100644
>> --- a/gcc/config/aarch64/predicates.md
>> +++ b/gcc/config/aarch64/predicates.md
>> @@ -222,7 +222,7 @@
>>  ;; as a 128-bit vec_concat.
>>  (define_predicate "aarch64_mem_pair_lanes_operand"
>>(and (match_code "mem")
>> -   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1,
>> +   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 0,
>>ADDR_QUERY_LDP_STP)")))
>>
>>  (define_predicate "aarch64_prefetch_operand"
> 
> Very minor, but it'd be good to change it to a real bool parameter
> at the same time, for consistency with aarch64_mem_pair_operand.
> (Patch LGTM otherwise FWIW.)
> 
Good shout! Thank you. Attached new version.
> Richard
> 

Cheers,
Andre
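
As a rough illustration of the strict versus non-strict distinction discussed above, the toy C model below accepts a base register either when it is a hard register or when the check is non-strict. The numbering and the names (TOY_FIRST_PSEUDO, toy_base_reg_ok) are assumptions made for this sketch. It deliberately ignores register classes and everything else that aarch64_legitimate_address_p really checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering: registers below this value are hard
   registers, everything else is a pseudo register.  */
enum { TOY_FIRST_PSEUDO = 32 };

/* Toy model of a base-register check.  A strict check accepts only
   hard registers; a non-strict check also accepts pseudos, which is
   what a pass running before register allocation sees.  */
static bool
toy_base_reg_ok (unsigned regno, bool strict_p)
{
  bool is_hard = regno < TOY_FIRST_PSEUDO;
  return is_hard || !strict_p;
}

/* Small self-check of the model.  */
static void
check_toy_base_reg (void)
{
  assert (toy_base_reg_ok (3, true));     /* hard register, strict */
  assert (!toy_base_reg_ok (100, true));  /* pseudo rejected when strict */
  assert (toy_base_reg_ok (100, false));  /* pseudo accepted otherwise */
}
```

Under this model, a predicate that always passes strict_p as true can never match an address that still uses a pseudo, which fits the observation that combine could not use the pattern.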

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 5d41d4350402b2a9e5941f160c6ab6f933bfff90..8ce8cd0cad368dff009a15efe25f051764b8bc4d 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -222,7 +222,7 @@
 ;; as a 128-bit vec_concat.
 (define_predicate "aarch64_mem_pair_lanes_operand"
   (and (match_code "mem")
-   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1,
+   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), false,
  ADDR_QUERY_LDP_STP)")))
 
 (define_predicate "aarch64_prefetch_operand"
diff --git a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
index 990aea32de6f8239effa95a081950684c6e11386..3296d04da14149d26d19da785663b87bd5ad8994 100644
--- a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
+++ b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
@@ -22,10 +22,32 @@ construct_lane_2 (long long *y, v2di *z)
   z[2] = x;
 }
 
+void
+construct_lane_3 (double **py, v2df **pz)
+{
+  double *y = *py;
+  v2df *z = *pz;
+  double y0 = y[0] + 1;
+  double y1 = y[1] + 2;
+  v2df x = {y0, y1};
+  z[2] = x;
+}
+
+void
+construct_lane_4 (long long **py, v2di **pz)
+{
+  long long *y = *py;
+  v2di *z = *pz;
+  long long y0 = y[0] + 1;
+  long long y1 = y[1] + 2;
+  v2di x = {y0, y1};
+  z[2] = x;
+}
+
 /* We can use the load_pair_lanes pattern to vec_concat two DI/DF
values from consecutive memory into a 2-element vector by using
a Q-reg LDR.  */
 
-/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 1 { xfail ilp32 } } } */
-/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 1 { xfail ilp32 } } } */
-/* { dg-final { scan-assembler-not "ins\t" { xfail ilp32 } } } */
+/* { dg-final { scan-assembler-times "stp\td\[0-9\]+, d\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "ins\t" } } */


Re: [PATCH] Fix bootstrap miscompare with LTO bootstrap (PR85571)

2018-05-08 Thread Richard Biener
On Wed, 2 May 2018, Richard Biener wrote:

> 
> The following fixes the LTO part of the -f[no-]checking miscompare issue.
> I introduce a compare-lto script similar to compare-debug where I strip
> the LTO option section and re-compare.  I have no easy way to test
> the nonplugin path and at least for targets using simple-object to
> wrap all LTO sections into one native sections we can't strip the
> option section anyway so this just disables compare for those.
> 
> As a followup I'd like to add support to add extra files to compare
> and for LTO bootstrap we'd like to compare cc1plus$(exeext) and 
> lto1$(exeext) - maybe I can come up with a pattern that works everywhere
> but I guess this would be a start.  But then for this we need PR85574
> to be fixed first.
> 
> LTO bootstrapped on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

Ping.

Richard.

> Thanks,
> Richard.
> 
> 2018-05-02  Richard Biener  
> 
>   PR bootstrap/85571
>   config/
>   * bootstrap-lto-noplugin.mk: Disable compare.
>   * bootstrap-lto.mk: Supply contrib/compare-lto for do-compare.
> 
>   contrib/
>   * compare-lto: New script derived from compare-debug.
> 
> Index: config/bootstrap-lto-noplugin.mk
> ===
> --- config/bootstrap-lto-noplugin.mk  (revision 259829)
> +++ config/bootstrap-lto-noplugin.mk  (working copy)
> @@ -6,3 +6,4 @@ STAGE3_CFLAGS += -flto=jobserver -frando
>  STAGEprofile_CFLAGS += -flto=jobserver -frandom-seed=1
>  STAGEtrain_CFLAGS += -flto=jobserver -frandom-seed=1
>  STAGEfeedback_CFLAGS += -flto=jobserver -frandom-seed=1
> +do-compare = /bin/true
> Index: config/bootstrap-lto.mk
> ===
> --- config/bootstrap-lto.mk   (revision 259829)
> +++ config/bootstrap-lto.mk   (working copy)
> @@ -13,3 +13,5 @@ LTO_RANLIB = $$r/$(HOST_SUBDIR)/prev-gcc
>  LTO_EXPORTS = AR="$(LTO_AR)"; export AR; \
> RANLIB="$(LTO_RANLIB)"; export RANLIB;
>  LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)"
> +
> +do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
> Index: contrib/compare-lto
> ===
> --- contrib/compare-lto   (nonexistent)
> +++ contrib/compare-lto   (working copy)
> @@ -0,0 +1,111 @@
> +#! /bin/sh
> +
> +# Compare copies of two given object files.
> +
> +# Copyright (C) 2007, 2008, 2009, 2010, 2012 Free Software Foundation
> +# Originally by Alexandre Oliva 
> +# Modified for LTO bootstrap by Richard Biener 
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +# License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .
> +
> +rm='rm -f'
> +
> +case $1 in
> +-p | --preserve)
> +  rm='echo preserving'
> +  shift
> +  ;;
> +esac
> +
> +if test $# != 2; then
> +  echo 'usage: compare-lto file1.o file2.o' >&2
> +  exit 1
> +fi
> +
> +if test ! -f "$1"; then
> +  echo "$1" does not exist >&2
> +  exit 1
> +fi
> +
> +if test ! -f "$2"; then
> +  echo "$2" does not exist >&2
> +  exit 1
> +fi
> +
> +suf1=stripped
> +while test -f "$1.$suf1"; do
> +  suf1=$suf1.
> +done
> +
> +suf2=stripped
> +while test -f "$2.$suf2"; do
> +  suf2=$suf2.
> +done
> +
> +trap 'rm -f "$1.$suf1" "$2.$suf2"' 0 1 2 15
> +
> +if cmp "$1" "$2"; then
> +  status=0
> +else
> +  status=1
> +
> +  cmd=
> +  for t in objdump readelf eu-readelf; do
> +if ($t --help) 2>&1 | grep ' --\[*section-\]*headers' > /dev/null; then
> +  cmd=$t
> +  break
> +fi
> +  done
> +
> +  # If there are LTO option sections, try to strip them off.
> +  if test "x$cmd" = "x" ||
> + $cmd --section-headers "$1" | grep '.gnu.lto_.opts' > /dev/null ||
> + $cmd --section-headers "$2" | grep '.gnu.lto_.opts' > /dev/null ; then
> +
> +echo stripping off LTO option section, then retrying >&2
> +
> +seclist=".gnu.lto_.opts"
> +rsopts=`for sec in $seclist; do echo " --remove-section $sec"; done`
> +
> +if (objcopy -v) 2>&1 | grep ' --remove-section' > /dev/null; then
> +  objcopy $rsopts "$1" "$1.$suf1"
> +  objcopy $rsopts "$2" "$2.$suf2"
> +elif (strip --help) 2>&1 | grep ' --remove-section' > /dev/null; then
> +  cp "$1" "$1.$suf1"
> +  strip $rsopts "$1.$suf1"
> +
> +  cp "$2" "$2.$suf2"
> +  strip $rsopts "$2.$suf2"
> +else
> +  echo failed to strip off LTO option section >&2
> +fi
> +
> +  

Re: [PATCH] Fix bootstrap miscompare with LTO bootstrap (PR85571)

2018-05-08 Thread Jakub Jelinek
On Tue, May 08, 2018 at 10:37:04AM +0200, Richard Biener wrote:
> > OK for trunk?
> 
> Ping.
> 
> Richard.
> 
> > Thanks,
> > Richard.
> > 
> > 2018-05-02  Richard Biener  
> > 
> > PR bootstrap/85571
> > config/
> > * bootstrap-lto-noplugin.mk: Disable compare.
> > * bootstrap-lto.mk: Supply contrib/compare-lto for do-compare.
> > 
> > contrib/
> > * compare-lto: New script derived from compare-debug.

Ok.

Jakub


Tighten condition in vect/pr85586.c (PR 85654)

2018-05-08 Thread Richard Sandiford
Another gcc.dg/vect test, another chance to play whack-a-mole
with the target selectors.  In this case I think we want
{ ! vect_no_align }.  { { ! vect_no_align } || vect_hw_misalign }
might work too, but (a) there are other tests that use vect_no_align
on its own and (b) the point of the scan test was simply to sanity-
check that we didn't stop vectorising, rather than to test a new
vectorisation feature.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and armeb-none-elf.
OK for trunk and GCC 8?

Thanks,
Richard


2018-05-08  Richard Sandiford  

gcc/testsuite/
PR testsuite/85586
* gcc.dg/vect/pr85586.c: Restrict LOOP VECTORIZED test to
!vect_no_align.

Index: gcc/testsuite/gcc.dg/vect/pr85586.c
===
--- gcc/testsuite/gcc.dg/vect/pr85586.c 2018-05-02 08:39:59.942069849 +0100
+++ gcc/testsuite/gcc.dg/vect/pr85586.c 2018-05-08 09:47:33.207979464 +0100
@@ -40,4 +40,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { { ! vect_no_align } && vect_int } } } } */


Re: [PATCH] Implement absv2di2 and absv4di2 expanders for pre-avx512vl (PR target/85572)

2018-05-08 Thread Uros Bizjak
On Mon, Apr 30, 2018 at 9:19 PM, Jakub Jelinek  wrote:
> Hi!
>
> Before avx512vl we don't have a single instruction to do V2DImode and
> V4DImode abs, but that isn't much different from say V4SImode before SSE3
> where we also just emit a short sequence that is better than elementwise
> expansion.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2018-04-30  Jakub Jelinek  
>
> PR target/85572
> * config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
> E_V4DImode.
> * config/i386/sse.md (abs2): Use VI_AVX2 iterator instead of
> VI1248_AVX512VL_AVX512BW.  Handle V2DImode and V4DImode if not
> TARGET_AVX512VL using ix86_expand_sse2_abs.  Formatting fixes.
>
> * g++.dg/other/sse2-pr85572-1.C: New test.
> * g++.dg/other/sse2-pr85572-2.C: New test.
> * g++.dg/other/sse4-pr85572-1.C: New test.
> * g++.dg/other/avx2-pr85572-1.C: New test.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2018-04-25 15:09:29.895453703 +0200
> +++ gcc/config/i386/i386.c  2018-04-30 18:31:56.027101932 +0200
> @@ -49806,39 +49806,74 @@ ix86_expand_sse2_abs (rtx target, rtx in
>
>switch (mode)
>  {
> +case E_V2DImode:
> +case E_V4DImode:
> +  /* For 64-bit signed integer X, with SSE4.2 use
> +pxor t0, t0; pcmpgtq X, t0; pxor t0, X; psubq t0, X.
> +Otherwise handle it similarly to V4SImode, except use 64 as W instead of
> +32 and use logical instead of arithmetic right shift (which is
> +unimplemented) and subtract.  */
> +  if (TARGET_SSE4_2)
> +   {
> + tmp0 = gen_reg_rtx (mode);
> + tmp1 = gen_reg_rtx (mode);
> + emit_move_insn (tmp1, CONST0_RTX (mode));
> + if (mode == E_V2DImode)
> +   emit_insn (gen_sse4_2_gtv2di3 (tmp0, tmp1, input));
> + else
> +   emit_insn (gen_avx2_gtv4di3 (tmp0, tmp1, input));
> +
> + tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
> + NULL, 0, OPTAB_DIRECT);
> + x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
> +  target, 0, OPTAB_DIRECT);
> + break;
> +   }
> +
> +  tmp0 = expand_simple_binop (mode, LSHIFTRT, input,
> + GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1),
> + NULL, 0, OPTAB_DIRECT);
> +  tmp0 = expand_simple_unop (mode, NEG, tmp0, NULL, false);
> +
> +  tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
> + NULL, 0, OPTAB_DIRECT);
> +  x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
> +  target, 0, OPTAB_DIRECT);
> +  break;
> +
> +case E_V4SImode:
>/* For 32-bit signed integer X, the best way to calculate the absolute
>  value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)).  
> */
> -  case E_V4SImode:
> -   tmp0 = expand_simple_binop (mode, ASHIFTRT, input,
> -   GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 
> 1),
> -   NULL, 0, OPTAB_DIRECT);
> -   tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
> -   NULL, 0, OPTAB_DIRECT);
> -   x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
> -target, 0, OPTAB_DIRECT);
> -   break;
> +  tmp0 = expand_simple_binop (mode, ASHIFTRT, input,
> + GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1),
> + NULL, 0, OPTAB_DIRECT);
> +  tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
> + NULL, 0, OPTAB_DIRECT);
> +  x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
> +  target, 0, OPTAB_DIRECT);
> +  break;
>
> +case E_V8HImode:
>/* For 16-bit signed integer X, the best way to calculate the absolute
>  value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
> -  case E_V8HImode:
> -   tmp0 = expand_unop (mode, neg_optab, input, NULL_RTX, 0);
> +  tmp0 = expand_unop (mode, neg_optab, input, NULL_RTX, 0);
>
> -   x = expand_simple_binop (mode, SMAX, tmp0, input,
> -target, 0, OPTAB_DIRECT);
> -   break;
> +  x = expand_simple_binop (mode, SMAX, tmp0, input,
> +  target, 0, OPTAB_DIRECT);
> +  break;
>
> +case E_V16QImode:
>/* For 8-bit signed integer X, the best way to calculate the absolute
>  value of X is min ((unsigned char) X, (unsigned char) (-X)),
>  as SSE2 provides the PMINUB insn.  */
> -  case E_V16QImode:
> -   tmp0 = expand_unop (mode, neg_optab, input, NULL_RTX, 0);
> +  tmp0 = expand_unop (mode, neg_optab, input, NULL_RTX, 0);
>
> -   x = expand_simple_binop (V16QImode, UMIN, tmp0, input,
> - 

Re: [PATCH] Implement absv2di2 and absv4di2 expanders for pre-avx512vl (PR target/85572)

2018-05-08 Thread Uros Bizjak
On Tue, May 8, 2018 at 11:11 AM, Uros Bizjak  wrote:
> On Mon, Apr 30, 2018 at 9:19 PM, Jakub Jelinek  wrote:
>> Hi!
>>
>> Before avx512vl we don't have a single instruction to do V2DImode and
>> V4DImode abs, but that isn't much different from say V4SImode before SSE3
>> where we also just emit a short sequence that is better than elementwise
>> expansion.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>> trunk?
>>
>> 2018-04-30  Jakub Jelinek  
>>
>> PR target/85572
>> * config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
>> E_V4DImode.
>> * config/i386/sse.md (abs2): Use VI_AVX2 iterator instead of
>> VI1248_AVX512VL_AVX512BW.  Handle V2DImode and V4DImode if not
>> TARGET_AVX512VL using ix86_expand_sse2_abs.  Formatting fixes.
>>
>> * g++.dg/other/sse2-pr85572-1.C: New test.
>> * g++.dg/other/sse2-pr85572-2.C: New test.
>> * g++.dg/other/sse4-pr85572-1.C: New test.
>> * g++.dg/other/avx2-pr85572-1.C: New test.
>
> LGTM.
>
> Thanks,
> Uros.
>
>> --- gcc/config/i386/i386.c.jj   2018-04-25 15:09:29.895453703 +0200
>> +++ gcc/config/i386/i386.c  2018-04-30 18:31:56.027101932 +0200
>> @@ -49806,39 +49806,74 @@ ix86_expand_sse2_abs (rtx target, rtx in
>>
>>switch (mode)
>>  {
>> +case E_V2DImode:
>> +case E_V4DImode:
>> +  /* For 64-bit signed integer X, with SSE4.2 use
>> +pxor t0, t0; pcmpgtq X, t0; pxor t0, X; psubq t0, X.
>> +Otherwise handle it similarly to V4SImode, except use 64 as W instead of
>> +32 and use logical instead of arithmetic right shift (which is
>> +unimplemented) and subtract.  */
>> +  if (TARGET_SSE4_2)
>> +   {
>> + tmp0 = gen_reg_rtx (mode);
>> + tmp1 = gen_reg_rtx (mode);
>> + emit_move_insn (tmp1, CONST0_RTX (mode));
>> + if (mode == E_V2DImode)
>> +   emit_insn (gen_sse4_2_gtv2di3 (tmp0, tmp1, input));
>> + else
>> +   emit_insn (gen_avx2_gtv4di3 (tmp0, tmp1, input));

}
  else
{

>> +  tmp0 = expand_simple_binop (mode, LSHIFTRT, input,
>> + GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1),
>> + NULL, 0, OPTAB_DIRECT);
>> +  tmp0 = expand_simple_unop (mode, NEG, tmp0, NULL, false);

}

>> +  tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
>> + NULL, 0, OPTAB_DIRECT);
>> +  x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
>> +  target, 0, OPTAB_DIRECT);
>> +  break;

You could merge parts of the above code.

Uros.
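
To make the suggested merge concrete, here is a minimal scalar sketch in C of the two sequences for one 64-bit element. The two paths differ only in how the sign mask is produced; the XOR and subtract are shared. The function names are hypothetical, and the result is returned as an unsigned value so that the magnitude of INT64_MIN is representable. This models the arithmetic only, not the RTL expansion in ix86_expand_sse2_abs.

```c
#include <assert.h>
#include <stdint.h>

/* Sign mask as produced by a signed "0 > x" compare (the pcmpgtq
   path): all ones for negative x, zero otherwise.  */
static uint64_t
sign_mask_compare (int64_t x)
{
  return 0 > x ? UINT64_MAX : 0;
}

/* Sign mask as produced by a logical right shift by W-1 followed by
   a negate (the fallback path).  */
static uint64_t
sign_mask_shift (int64_t x)
{
  return (uint64_t) 0 - ((uint64_t) x >> 63);
}

/* Shared tail of both paths: (x ^ mask) - mask.  */
static uint64_t
abs_from_mask (int64_t x, uint64_t mask)
{
  return ((uint64_t) x ^ mask) - mask;
}

/* Check that both masks agree and that the tail yields |x|.  */
static void
check_abs_models (void)
{
  int64_t samples[] = { 0, 1, -1, 42, -42, INT64_MAX, INT64_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    {
      int64_t x = samples[i];
      uint64_t expect = x < 0 ? (uint64_t) 0 - (uint64_t) x : (uint64_t) x;
      assert (sign_mask_compare (x) == sign_mask_shift (x));
      assert (abs_from_mask (x, sign_mask_shift (x)) == expect);
    }
}
```

Because the two mask computations agree for every input, only the mask step needs to sit inside the TARGET_SSE4_2 conditional, which is the shape of the if/else outlined in the reply.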


[committed][AArch64] Tweak sve/vcond_6.c test

2018-05-08 Thread Richard Sandiford
sve/vcond_6.c was effectively testing a three-input logical operation,
since the result of BINOP needed to be ANDed with the loop predicate
before loading src[i].  This patch makes it really test a binary
operation instead.  A later patch will add (and optimise) the
three-operand case.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260028.

Richard


2018-05-08  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c (LOOP): Unconditionally
load from src[i].

Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c  2018-05-08 10:33:15.816153344 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c  2018-05-08 10:33:15.970147366 +0100
@@ -19,9 +19,12 @@ #define LOOP(TYPE, BINOP)                                    \
 TYPE fallback, int count)  \
   {\
 for (int i = 0; i < count; ++i)\
-  dest[i] = (BINOP (__builtin_isunordered (a[i], b[i]),\
-   __builtin_isunordered (c[i], d[i])) \
-? src[i] : fallback);  \
+  {                                                                    \
+   TYPE srcv = src[i]; \
+   dest[i] = (BINOP (__builtin_isunordered (a[i], b[i]),   \
+ __builtin_isunordered (c[i], d[i]))   \
+  ? srcv : fallback);  \
+  }                                                                    \
   }
 
 #define TEST_BINOP(T, BINOP) \
@@ -40,9 +43,7 @@ #define TEST_ALL(T) \
 
 TEST_ALL (LOOP)
 
-/* Currently we don't manage to remove ANDs from the other loops.  */
-/* { dg-final { scan-assembler-times {\tand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler {\tand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} } } */
+/* { dg-final { scan-assembler-times {\tand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\torr\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\teor\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\tnand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */


[committed][AArch64] Use UNSPEC_MERGE_PTRUE for comparisons

2018-05-08 Thread Richard Sandiford
This patch rewrites the SVE comparison handling so that it uses
UNSPEC_MERGE_PTRUE for comparisons that are known to be predicated
on a PTRUE, for consistency with other patterns.  Specific unspecs
are then only needed for truly predicated floating-point comparisons,
such as those used in the expansion of UNEQ for flag_trapping_math.

The patch also makes sure that the comparison expanders attach
a REG_EQUAL note to instructions that use UNSPEC_MERGE_PTRUE,
so passes can use that as an alternative to the unspec pattern.
(This happens automatically for optabs.  The problem was that
this code emits instruction patterns directly.)

No specific benefit on its own, but it lays the groundwork for
the next patch.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260029.

Richard


2018-05-08  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (UNSPEC_COND_LO, UNSPEC_COND_LS)
(UNSPEC_COND_HI, UNSPEC_COND_HS, UNSPEC_COND_UO): Delete.
(SVE_INT_CMP, SVE_FP_CMP): New code iterators.
(cmp_op, sve_imm_con): New code attributes.
(SVE_COND_INT_CMP, imm_con): Delete.
(cmp_op): Remove above unspecs from int attribute.
* config/aarch64/aarch64-sve.md (*vec_cmp_): Rename
to...
(*cmp): ...this.  Use UNSPEC_MERGE_PTRUE instead of
comparison-specific unspecs.
(*vec_cmp__ptest): Rename to...
(*cmp_ptest): ...this and adjust likewise.
(*vec_cmp__cc): Rename to...
(*cmp_cc): ...this and adjust likewise.
(*vec_fcm): Rename to...
(*fcm): ...this and adjust likewise.
(*vec_fcmuo): Rename to...
(*fcmuo): ...this and adjust likewise.
(*pred_fcm): New pattern.
* config/aarch64/aarch64.c (aarch64_emit_unop, aarch64_emit_binop)
(aarch64_emit_sve_ptrue_op, aarch64_emit_sve_ptrue_op_cc): New
functions.
(aarch64_unspec_cond_code): Remove handling of LTU, GTU, LEU, GEU
and UNORDERED.
(aarch64_gen_unspec_cond, aarch64_emit_unspec_cond): Delete.
(aarch64_emit_sve_predicated_cond): New function.
(aarch64_expand_sve_vec_cmp_int): Use aarch64_emit_sve_ptrue_op_cc.
(aarch64_emit_unspec_cond_or): Replace with...
(aarch64_emit_sve_or_conds): ...this new function.  Use
aarch64_emit_sve_ptrue_op for the individual comparisons and
aarch64_emit_binop to OR them together.
(aarch64_emit_inverted_unspec_cond): Replace with...
(aarch64_emit_sve_inverted_cond): ...this new function.  Use
aarch64_emit_sve_ptrue_op for the comparison and
aarch64_emit_unop to invert the result.
(aarch64_expand_sve_vec_cmp_float): Update after the above
changes.  Use aarch64_emit_sve_ptrue_op for native comparisons.
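
As a reading aid for the new cmp_op code attribute in this patch, the sketch below expresses the same mapping from RTL comparison codes to AArch64 condition names as a plain C lookup. The names toy_cmp_table and toy_cmp_op are hypothetical; only the table entries are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the mapping: RTL comparison code -> condition name.  */
struct toy_cmp_entry
{
  const char *rtl_code;
  const char *cond;
};

/* Entries copied from the cmp_op code attribute: signed codes keep
   their name, unsigned codes map to lo/ls/hs/hi.  */
static const struct toy_cmp_entry toy_cmp_table[] = {
  { "lt", "lt" }, { "le", "le" }, { "eq", "eq" }, { "ne", "ne" },
  { "ge", "ge" }, { "gt", "gt" },
  { "ltu", "lo" }, { "leu", "ls" }, { "geu", "hs" }, { "gtu", "hi" },
};

/* Return the condition name for RTL_CODE, or NULL if it is not one
   of the codes listed in the attribute.  */
static const char *
toy_cmp_op (const char *rtl_code)
{
  size_t n = sizeof toy_cmp_table / sizeof toy_cmp_table[0];
  for (size_t i = 0; i < n; ++i)
    if (strcmp (toy_cmp_table[i].rtl_code, rtl_code) == 0)
      return toy_cmp_table[i].cond;
  return NULL;
}

/* Small self-check of the lookup.  */
static void
check_toy_cmp_op (void)
{
  assert (strcmp (toy_cmp_op ("gtu"), "hi") == 0);
  assert (toy_cmp_op ("unordered") == NULL);
}
```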

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2018-05-01 19:31:04.341265575 +0100
+++ gcc/config/aarch64/iterators.md 2018-05-08 10:51:17.070995242 +0100
@@ -455,11 +455,6 @@ (define_c_enum "unspec"
 UNSPEC_COND_NE ; Used in aarch64-sve.md.
 UNSPEC_COND_GE ; Used in aarch64-sve.md.
 UNSPEC_COND_GT ; Used in aarch64-sve.md.
-UNSPEC_COND_LO ; Used in aarch64-sve.md.
-UNSPEC_COND_LS ; Used in aarch64-sve.md.
-UNSPEC_COND_HS ; Used in aarch64-sve.md.
-UNSPEC_COND_HI ; Used in aarch64-sve.md.
-UNSPEC_COND_UO ; Used in aarch64-sve.md.
 UNSPEC_LASTB   ; Used in aarch64-sve.md.
 ])
 
@@ -1189,6 +1184,12 @@ (define_code_iterator SVE_INT_UNARY [neg
 ;; SVE floating-point unary operations.
 (define_code_iterator SVE_FP_UNARY [neg abs sqrt])
 
+;; SVE integer comparisons.
+(define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
+
+;; SVE floating-point comparisons.
+(define_code_iterator SVE_FP_CMP [lt le eq ne ge gt])
+
 ;; ---
 ;; Code Attributes
 ;; ---
@@ -1252,6 +1253,18 @@ (define_code_attr CMP [(lt "LT") (le "LE
(ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU")
(gtu "GTU")])
 
+;; The AArch64 condition associated with an rtl comparison code.
+(define_code_attr cmp_op [(lt "lt")
+ (le "le")
+ (eq "eq")
+ (ne "ne")
+ (ge "ge")
+ (gt "gt")
+ (ltu "lo")
+ (leu "ls")
+ (geu "hs")
+ (gtu "hi")])
+
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
   (unsigned_fix "fixuns_trunc")])
 
@@ -1358,6 +1371,18 @@ (define_code_attr sve_fp_op [(plus "fadd
 (abs "fabs")
 (sqrt "fsqrt")])
 
+;; The SVE

Re: Tighten condition in vect/pr85586.c (PR 85654)

2018-05-08 Thread Richard Biener
On Tue, May 8, 2018 at 11:03 AM, Richard Sandiford
 wrote:
> Another gcc.dg/vect test, another chance to play whack-a-mole
> with the target selectors.  In this case I think we want
> { ! vect_no_align }.  { { ! vect_no_align } || vect_hw_misalign }
> might work too, but (a) there are other tests that use vect_no_align
> on its own and (b) the point of the scan test was simply to sanity-
> check that we didn't stop vectorising, rather than to test a new
> vectorisation feature.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and armeb-none-elf.
> OK for trunk and GCC 8?

OK.

> Thanks,
> Richard
>
>
> 2018-05-08  Richard Sandiford  
>
> gcc/testsuite/
> PR testsuite/85586
> * gcc.dg/vect/pr85586.c: Restrict LOOP VECTORIZED test to
> !vect_no_align.
>
> Index: gcc/testsuite/gcc.dg/vect/pr85586.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr85586.c 2018-05-02 08:39:59.942069849 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr85586.c 2018-05-08 09:47:33.207979464 +0100
> @@ -40,4 +40,4 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { { ! vect_no_align } && vect_int } } } } */


[committed][AArch64] Predicated SVE comparison folds

2018-05-08 Thread Richard Sandiford
This patch adds SVE patterns that combine a PTRUE-predicated
comparison with a separate AND.  The main benefit is for
optimising ANDs with the loop predicate, as in the testcase.
However, one of the potential drawbacks is that it triggers
even for cases in which two naturally-parallel comparisons
are ANDed together.  Whether that's a win or a loss will
depend on the schedule, but it has the potential to be a win
more often than a loss.

The combine patterns are undeniably ugly.  One way of getting
around them would be to allow 1->1 "splits" when combining
2 instructions, as well as 1->2 splits when combining more
than 2 instructions (although that wouldn't really be a split).
Another would be to have a way of defining target-specific
rtx simplifications.  branches/ARM/sve-branch has a prototype
implementation of that, but it would need some clean-up before being
ready to submit.  It would also be good to make it closer to the
match.pd style.

Until then, I think what the combine patterns are doing is the
"correct" implementation given the current infrastructure.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260031.

Richard


2018-05-08  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-sve.md (*pred_cmp_combine)
(*pred_cmp, *fcm_and_combine)
(*fcmuo_and_combine, *fcm_and)
(*fcmuo_and): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c: Do not expect any ANDs.
XFAIL the BIC test.
* gcc.target/aarch64/sve/vcond_7.c: New test.
* gcc.target/aarch64/sve/vcond_7_run.c: Likewise.
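
The fold described above can be pictured with a minimal per-lane model in C: comparing under an all-true predicate and then ANDing with P gives the same lanes as a comparison predicated directly on P, with inactive lanes zeroed. The lane count, the choice of a signed less-than and all names below are assumptions for this sketch. It does not model the trapping-behaviour caveat that applies to the floating-point patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_LANES = 4 };

/* PTRUE-predicated compare followed by a separate AND with P.  */
static void
cmp_then_and (const int32_t *a, const int32_t *b, const bool *p, bool *out)
{
  bool all[TOY_LANES];
  for (size_t i = 0; i < TOY_LANES; ++i)
    all[i] = a[i] < b[i];
  for (size_t i = 0; i < TOY_LANES; ++i)
    out[i] = all[i] && p[i];
}

/* Compare predicated directly on P; inactive lanes are zeroed.  */
static void
cmp_predicated (const int32_t *a, const int32_t *b, const bool *p, bool *out)
{
  for (size_t i = 0; i < TOY_LANES; ++i)
    out[i] = p[i] ? a[i] < b[i] : false;
}

/* Return true if both forms agree on a small sample.  */
static bool
toy_fold_agrees (void)
{
  const int32_t a[TOY_LANES] = { 1, 5, -3, 7 };
  const int32_t b[TOY_LANES] = { 2, 4, 0, 7 };
  const bool p[TOY_LANES] = { true, true, false, true };
  bool x[TOY_LANES], y[TOY_LANES];

  cmp_then_and (a, b, p, x);
  cmp_predicated (a, b, p, y);
  for (size_t i = 0; i < TOY_LANES; ++i)
    if (x[i] != y[i])
      return false;
  /* Lane 0 is active and 1 < 2; lane 2 is inactive although -3 < 0.  */
  return x[0] && !x[1] && !x[2] && !x[3];
}
```

In this model the second form needs one operation per lane instead of two, which corresponds to dropping the separate AND when the comparison can take the loop predicate directly.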

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-05-08 10:56:20.122789504 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2018-05-08 11:12:30.156289597 +0100
@@ -1358,6 +1358,49 @@ (define_insn "*cmp_cc"
cmp\t%0., %1/z, %2., %3."
 )
 
+;; Predicated integer comparisons, formed by combining a PTRUE-predicated
+;; comparison with an AND.  Split the instruction into its preferred form
+;; (below) at the earliest opportunity, in order to get rid of the
+;; redundant operand 1.
+(define_insn_and_split "*pred_cmp_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+   (and:
+ (unspec:
+   [(match_operand: 1)
+(SVE_INT_CMP:
+  (match_operand:SVE_I 2 "register_operand" "w, w")
+  (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" 
", w"))]
+   UNSPEC_MERGE_PTRUE)
+ (match_operand: 4 "register_operand" "Upl, Upl")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (and:
+(SVE_INT_CMP:
+  (match_dup 2)
+  (match_dup 3))
+(match_dup 4)))
+  (clobber (reg:CC CC_REGNUM))])]
+)
+
+;; Predicated integer comparisons.
+(define_insn "*pred_cmp"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+   (and:
+ (SVE_INT_CMP:
+   (match_operand:SVE_I 2 "register_operand" "w, w")
+   (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" 
", w"))
+ (match_operand: 1 "register_operand" "Upl, Upl")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "@
+   cmp\t%0., %1/z, %2., #%3
+   cmp\t%0., %1/z, %2., %3."
+)
+
 ;; Floating-point comparisons predicated with a PTRUE.
 (define_insn "*fcm"
   [(set (match_operand: 0 "register_operand" "=Upa, Upa")
@@ -1384,6 +1427,83 @@ (define_insn "*fcmuo"
   "TARGET_SVE"
   "fcmuo\t%0., %1/z, %2., %3."
 )
+
+;; Floating-point comparisons predicated on a PTRUE, with the results ANDed
+;; with another predicate P.  This does not have the same trapping behavior
+;; as predicating the comparison itself on P, but it's a legitimate fold,
+;; since we can drop any potentially-trapping operations whose results
+;; are not needed.
+;;
+;; Split the instruction into its preferred form (below) at the earliest
+;; opportunity, in order to get rid of the redundant operand 1.
+(define_insn_and_split "*fcm_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+   (and:
+ (unspec:
+   [(match_operand: 1)
+(SVE_FP_CMP
+  (match_operand:SVE_F 2 "register_operand" "w, w")
+  (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w"))]
+   UNSPEC_MERGE_PTRUE)
+ (match_operand: 4 "register_operand" "Upl, Upl")))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (and:
+ (SVE_FP_CMP:
+   (match_dup 2)
+   (match_dup 3))
+ (match_dup 4)))]
+)
+
+(define_insn_and_split "*fcmuo_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+   (and:
+ (unspec:
+   [(match_operand: 1)
+(unordered
+  (match_operand:SVE_F 2 "register_operand" "w")
+ 

[arm] PR target/85658 Fix operator precedence errors in parsecpu.awk

2018-05-08 Thread Richard Earnshaw (lists)
There are a number of places in parsecpu.awk where I've managed to get
the operator precedence between ! and 'in' incorrect (! binds more
tightly).  In most cases this just makes a consistency test ineffective,
but in a few cases it means we fail to correctly diagnose errors by the
user (for example, when passing an invalid cpu or architecture name to
configure).  This patch fixes all the cases I could find, based on
searching for all uses of the two operators in the same expression.  The
tweak to the API of check_fpu is to bring it into line with the other
check functions - it now returns the result rather than printing it
directly.  The caller now does the printing, in the same way that the
chkarch and chkcpu commands do.
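
To illustrate the precedence problem outside awk, here is a minimal C++ sketch
that models the two parses.  It assumes the usual awk rule that `! x in arr` is
parsed as `(! x) in arr`, so the negation produces 0 or 1 and that number is
what gets looked up.  The names AwkArray, awk_truthy, buggy_not_in and
fixed_not_in are hypothetical helpers for this sketch, not part of
parsecpu.awk, and numeric-looking strings are ignored for simplicity.

```cpp
#include <map>
#include <string>

// Hypothetical model of an awk associative array keyed by strings.
using AwkArray = std::map<std::string, std::string>;

// Simplified awk truthiness for a plain string: empty is false, anything
// else is true.  (Numeric-looking strings are ignored in this sketch.)
static bool awk_truthy(const std::string &s) { return !s.empty(); }

// Models `! name in arr`, which awk parses as `(! name) in arr`:
// the negation yields 0 or 1, and that value is then used as the key.
static bool buggy_not_in(const std::string &name, const AwkArray &arr) {
    const std::string key = awk_truthy(name) ? "0" : "1";
    return arr.count(key) != 0;
}

// Models the intended `! (name in arr)`.
static bool fixed_not_in(const std::string &name, const AwkArray &arr) {
    return arr.count(name) == 0;
}

// Models the reworked check_fpu: return the result and let the caller
// print it, in line with the other check functions.
static std::string check_fpu(const std::string &name,
                             const AwkArray &fpu_cnames) {
    if (fixed_not_in(name, fpu_cnames))
        return "error";
    return fpu_cnames.at(name);
}
```

With a table that has no "0" key, buggy_not_in reports "not missing" for any
non-empty name, which is why an invalid name could slip through undiagnosed.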

PR target/85658
* config/arm/parsecpu.awk (check_cpu): Fix operator precedence.
(check_arch): Likewise.
(check_fpu): Return the result rather than printing it.
(end arch): Fix operator precedence.
(end cpu): Likewise.
(END): Print the result from check_fpu.

Committed to trunk and gcc-8 branch.
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index 56c762b..4d234c3 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -463,7 +463,7 @@ function gen_opt () {
 function check_cpu (name) {
 exts = split (name, extensions, "+")
 
-if (! extensions[1] in cpu_cnames) {
+if (! (extensions[1] in cpu_cnames)) {
 	return "error"
 }
 
@@ -477,15 +477,16 @@ function check_cpu (name) {
 }
 
 function check_fpu (name) {
-if (name in fpu_cnames) {
-	print fpu_cnames[name]
-} else print "error"
+if (! (name in fpu_cnames)) {
+	return "error"
+}
+return fpu_cnames[name]
 }
 
 function check_arch (name) {
 exts = split (name, extensions, "+")
 
-if (! extensions[1] in arch_isa) {
+if (! (extensions[1] in arch_isa)) {
 	return "error"
 }
 
@@ -600,10 +601,10 @@ BEGIN {
 /^end arch / {
 if (NF != 3) fatal("syntax: end arch ")
 if (arch_name != $3) fatal("mimatched end arch")
-if (! arch_name in arch_tune_for) {
+if (! (arch_name in arch_tune_for)) {
 	fatal("arch definition lacks a \"tune for\" statement")
 }
-if (! arch_name in arch_isa) {
+if (! (arch_name in arch_isa)) {
 	fatal("arch definition lacks an \"isa\" statement")
 }
 arch_list = arch_list " " arch_name
@@ -742,7 +743,7 @@ BEGIN {
 	cpu_cnames[cpu_name] = cpu_name
 	gsub(/[-+.]/, "_", cpu_cnames[cpu_name])
 }
-if (! cpu_name in cpu_arch) fatal("cpu definition lacks an architecture")
+if (! (cpu_name in cpu_arch)) fatal("cpu definition lacks an architecture")
 cpu_list = cpu_list " " cpu_name
 cpu_name = ""
 parse_ok = 1
@@ -776,6 +777,6 @@ END {
 	print check_arch(target[2])
 } else if (cmd ~ /^chkfpu /) {
 	split (cmd, target)
-	check_fpu(target[2])
+	print check_fpu(target[2])
 } else fatal("unrecognized command: "cmd)
 }


[Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen
I have tried to fix PR85692 that I opened.

2018-05-08  Allan Sandfeld Jensen

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Detect
  two source permute operations as well.
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref))))
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref))))
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1))))
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   &tem, &conv_code)
 	  || conv_code == CALL_EXPR))
 return false;
@@ -2104,16 +2126,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (maybe_ident)
 {
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_from_tree (gsi, orig);
+	gimple_assign_set_rhs_from_tree (gsi, orig1);
   else
-	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig1,
 	NULL_TREE, NULL_TREE);
 }
   else
 {
   tree mask_type;
 
-  vec_perm_indices indices (sel, 1, nelts);
+  vec_perm_indices indices (sel, 2, nelts);
   if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
   mask_type
@@ -2125,15 +2147,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op2 = vec_perm_indices_to_tree (mask_type, indices);
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
+	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig1, orig2, op2);
   else
 	{
 	  gimple *perm
-	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig)),
-   VEC_PERM_EXPR, orig, orig, op2);
-	  orig = gimple_assign_lhs (perm);
+	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig1)),
+   VEC_PERM_EXPR, orig1, orig2, op2);
 	  gsi_insert_before (gsi, perm, GSI_SAME_STMT);
-	  gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	  gimple_assign_set_rhs_with_ops (gsi, conv_code, gimple_assign_lhs (perm),
 	  NULL_TREE, NULL_TREE);
 	}
 }


Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-05-08 Thread Sameera Deshpande
On 1 May 2018 at 05:05, Sameera Deshpande  wrote:
> On 13 April 2018 at 20:21, James Greenhalgh  wrote:
>> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh wrote:
>>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>>> > Hi,
>>> >
>>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande:
>>> > > Hi Christophe,
>>> > >
>>> > > Please find attached the updated patch with testcases.
>>> > >
>>> > > Ok for trunk?
>>> >
>>> > Thanks for the update.
>>> >
>>> > Since the new intrinsics are only available on aarch64, you want to
>>> > prevent the tests from running on arm.
>>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets.
>>> > There are several examples on how to do that in that directory.
>>> >
>>> > I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I think this is important to fix. We don't want the big-endian target to have
>>> failing implementations of the Neon intrinsics. What is the nature of the
>>> failure?
>>>
>>> From what I can see, nothing in the patch prevents using these intrinsics
>>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>>> code bug), or the testcase expected behaviour is wrong.
>>>
>>> I don't think disabling the test for big-endian is the right fix. We should
>>> either fix the intrinsics, or fix the testcase.
>>>
>>> Thanks,
>>> James
>>>
>>> Hi James,
>>>
>>> As the tests assume the little endian order of elements while checking the
>>> results, the tests are failing for big endian targets. So, the failures are
>>> not because of intrinsic implementations, but because of the testcase.
>>
>> The testcase is a little hard to follow through the macros, but why would
>> this be the case?
>>
>> ld1 is deterministic on big and little endian for which elements will be
>> loaded from memory, as is st1.
>>
>> My expectation would be that:
>>
>>   int __attribute__ ((noinline))
>>   test_vld_u16_x3 ()
>>   {
>> uint16_t data[3 * 3];
>> uint16_t temp[3 * 3];
>> uint16x4x3_t vectors;
>> int i,j;
>> for (i = 0; i < 3 * 3; i++)
>>   data [i] = (uint16_t) 3*i;
>> asm volatile ("" : : : "memory");
>> vectors = vld1_u16_x3 (data);
>> vst1_u16 (temp, vectors.val[0]);
>> vst1_u16 (&temp[3], vectors.val[1]);
>> vst1_u16 (&temp[3 * 2], vectors.val[2]);
>> asm volatile ("" : : : "memory");
>> for (j = 0; j < 3 * 3; j++)
>>   if (temp[j] != data[j])
>> return 1;
>> return 0;
>>   }
>>
>> would work equally well for big- or little-endian.
>>
>> I think this is more likely to be an intrinsics implementation bug.
>>
>> Thanks,
>> James
>>
>
> Hi James,
>
> Please find attached the updated patch, which now passes for little as
> well as big endian.
> Ok for trunk?
>
> --
> - Thanks and regards,
>   Sameera D.
>
> gcc/Changelog:
>
> 2018-05-01  Sameera Deshpande  
>
>
> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
> (st1x2): Likewise.
> (st1x3): Likewise.
> * config/aarch64/aarch64-simd.md
> (aarch64_ld1x3): New pattern.
> (aarch64_ld1_x3_): Likewise.
> (aarch64_st1x2): Likewise.
> (aarch64_st1_x2_): Likewise.
> (aarch64_st1x3): Likewise.
> (aarch64_st1_x3_): Likewise.
> * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
> (vld1_s8_x3): Likewise.
> (vld1_u16_x3): Likewise.
> (vld1_s16_x3): Likewise.
> (vld1_u32_x3): Likewise.
> (vld1_s32_x3): Likewise.
> (vld1_u64_x3): Likewise.
> (vld1_s64_x3): Likewise.
> (vld1_f16_x3): Likewise.
> (vld1_f32_x3): Likewise.
> (vld1_f64_x3): Likewise.
> (vld1_p8_x3): Likewise.
> (vld1_p16_x3): Likewise.
> (vld1_p64_x3): Likewise.
> (vld1q_u8_x3): Likewise.
> (vld1q_s8_x3): Likewise.
> (vld1q_u16_x3): Likewise.
> (vld1q_s16_x3): Likewise.
> (vld1q_u32_x3): Likewise.
> (vld1q_s32_x3): Likewise.
> (vld1q_u64_x3): Likewise.
> (vld1q_s64_x3): Likewise.
> (vld1q_f16_x3): Likewise.
> (vld1q_f32_x3): Likewise.
> (vld1q_f64_x3): Likewise.
> (vld1q_p8_x3): Likewise.
> (vld1q_p16_x3): Likewise.
> (vld1q_p64_x3): Likewise.
> (vst1_s64_x2): Likewise.
> (vst1_u64_x2): Likewise.
> (vst1_f64_x2): Likewise.
> (vst1_s8_x2): Likewise.
> (vst1_p8_x2): Likewise.
> (vst1_s16_x2): Likewise.
> (vst1_p16_x2): Likewise.
> (vst1_s32_x2): Likewise.
> (vst1_u8_x2): Likewise.
> (vst1_u16_x2): Likewise.
> (vst1_u32_x2): Likewise.
> (vst1_f16_x2): Likewise.
> (vst1_f32_x2): Likewise.
> (vst1_p64_x2): Likewise.
> (vst1q_s8_x2): Likewise.
> (vs

Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Richard Biener
On Tue, May 8, 2018 at 12:37 PM, Allan Sandfeld Jensen
 wrote:
> I have tried to fix PR85692 that I opened.

Please add a testcase as well.  It also helps if you briefly describe what
the patch does in your mail.

Thanks,
Richard.

> 2018-05-08  Allan Sandfeld Jensen
>
> PR tree-optimization/85692
> * tree-ssa-forwprop.c (simplify_vector_constructor): Detect
>   two source permute operations as well.


Re: [PATCH] Add constant folding for _mm*movemask* intrinsics (PR target/85317)

2018-05-08 Thread Kirill Yukhin
Hello Jakub!
On 07 May 10:20, Jakub Jelinek wrote:
> Hi!
> 
> The following patch handles constant folding of the builtins used in
> *movemask* intrinsics - they have a single operand and the only useful folding
> seems to be if the argument is VECTOR_CST, we can do what the instruction
> would do on that input and return the resulting INTEGER_CST.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Your patch is OK for trunk.
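
For readers following the thread, the folding described in the quoted mail can
be sketched as below.  This is only an illustration under stated assumptions,
not the code from the patch: it models a movmskps-style operation on a vector
of 32-bit IEEE floats, and fold_movemask_ps is a hypothetical name.  Given
constant lanes, it computes the integer the instruction would produce, where
bit i is the sign bit of lane i.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fold a movmskps-style operation on a constant vector of floats.
// Assumes float is a 32-bit IEEE type and that there are at most 32 lanes.
unsigned fold_movemask_ps(const std::vector<float> &lanes) {
    unsigned mask = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        std::uint32_t bits;
        // Inspect the raw encoding so that -0.0f counts as negative.
        std::memcpy(&bits, &lanes[i], sizeof bits);
        if (bits >> 31)
            mask |= 1u << i;
    }
    return mask;
}
```

For the constant lanes { -1.0f, 2.0f, -0.0f, 3.0f } the sign bits of lanes 0
and 2 are set, so the folded result is 5.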

--
Regards, Kirill Yukhin


Re: [C++ PATCH] Kill -fno-for-scope

2018-05-08 Thread Paolo Carlini

Hi,

On 08/05/2018 01:02, Nathan Sidwell wrote:
> As prophesied by gcc 8.1, I have nuked the ARM-era for-scope
> compatibility of -fno-for-scope.  It has been a c++98-only feature, and
> that's not the default anymore.  Time for this to go.

Nice. I'm sure that for a while we had a bug in Bugzilla due to some
contortions in the compatibility code, but now I can't immediately find
it... I'll try harder.


Paolo.


Re: [PATCH] Optimize 128-bit vector insertion into zero 512-bit vector (PR target/85480)

2018-05-08 Thread Kirill Yukhin
Hello Jakub!
On 23 Apr 20:31, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, vmov{aps,apd,dqa{,32,64}} 128-bit instructions
> zero the rest of 512-bit register, so we can optimize insertion into zero
> vectors using those instructions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1?
Your patch is ok for main trunk.

--
Thanks, K


Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen
On Tuesday, 8 May 2018 12:42:33 CEST Richard Biener wrote:
> On Tue, May 8, 2018 at 12:37 PM, Allan Sandfeld Jensen
> 
>  wrote:
> > I have tried to fix PR85692 that I opened.
> 
> Please add a testcase as well.  It also helps if you briefly describe what
> the patch does in your mail.
> 
Okay. I have updated the patch with a test-case based on my motivating
examples. The patch extends the detection of vector constructions that can be
implemented as a permute: in addition to a single-source permute instruction,
it now also recognizes a two-source permute instruction.
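
As a rough illustration of the selector this produces, here is a simplified
C++ sketch.  It is a model under stated assumptions, not the GCC code: Elem,
Permute and build_permute are hypothetical names, source vectors are
represented by integer ids instead of SSA names, and type checks and
conversions are left out.  Lanes taken from the second source are offset by
the number of elements, as if indexing into the concatenation of both sources.

```cpp
#include <vector>

// One constructor element: lane `lane` of the source vector with id `src`.
struct Elem { int src; unsigned lane; };

struct Permute {
    int src1;                   // first source id
    int src2;                   // second source id (== src1 if only one)
    std::vector<unsigned> sel;  // selector; lanes of src2 are offset by nelts
    bool identity;              // true if the constructor just copies src1
};

// Returns true and fills `out` if the constructor draws from at most two
// source vectors; returns false otherwise (for example on a third source).
bool build_permute(const std::vector<Elem> &elts, Permute &out) {
    const unsigned nelts = static_cast<unsigned>(elts.size());
    if (nelts == 0)
        return false;
    Permute p;
    p.src1 = p.src2 = elts[0].src;
    p.identity = true;
    bool have_src2 = false;
    for (unsigned i = 0; i < nelts; ++i) {
        if (elts[i].lane >= nelts)
            return false;
        unsigned elt = elts[i].lane;
        if (elts[i].src != p.src1) {
            if (!have_src2) {
                p.src2 = elts[i].src;  // first time a second source appears
                have_src2 = true;
            } else if (elts[i].src != p.src2) {
                return false;          // a third source: give up
            }
            elt += nelts;              // index into {src1, src2}
        }
        if (elt != i)
            p.identity = false;
        p.sel.push_back(elt);
    }
    out = p;
    return true;
}
```

In this model, the constructor {a[0], b[0], a[1], b[1]} yields the selector
{0, 4, 1, 5} and {a[0], b[1], a[2], b[3]} yields {0, 5, 2, 7}.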
commit 15c0f6a933d60b085416a59221851b604b955958
Author: Allan Sandfeld Jensen 
Date:   Tue May 8 13:16:18 2018 +0200

Try two source permute for vector construction

simplify_vector_constructor() was detecting when vector construction could
be implemented as a single source permute, but was not detecting when
it could be implemented as a double source permute. This patch adds the
second case.

2018-05-08 Allan Sandfeld Jensen 

gcc/

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Try two
source permute as well.

gcc/testsuite

* gcc.target/i386/pr85692.c: Test that two simple constructions are
detected as permute instructions.

diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c b/gcc/testsuite/gcc.target/i386/pr85692.c
new file mode 100644
index 000..322c1050161
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85692.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msse4.1" } */
+/* { dg-final { scan-assembler "unpcklps" } } */
+/* { dg-final { scan-assembler "blendps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+/* { dg-final { scan-assembler-not "unpckhps" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+v4sf unpcklps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[0],a[1],b[1]};
+}
+
+v4sf blendps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[1],a[2],b[3]};
+}
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref))))
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref))))
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1))))
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   &tem, &conv_code)
 	  || conv_code == CALL_EXPR))
 return false;
@@ -2104,16 +2126,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (maybe_ident)
 {
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_fro

[PATCH][i386] Adding WAITPKG instructions

2018-05-08 Thread Peryt, Sebastian
Hi,

This patch adds support for WAITPKG instructions.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_WAITPKG_SET,
OPTION_MASK_ISA_WAITPKG_UNSET): New defines.
(ix86_handle_option): Handle -mwaitpkg.
* config.gcc: New header.
* config/i386/cpuid.h (bit_WAITPKG): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect -mwaitpkg.
* config/i386/i386-builtin-types.def ((UINT8, UNSIGNED, UINT64)): New
function type.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_WAITPKG.
* config/i386/i386.c (ix86_target_string): Added -mwaitpkg.
(ix86_option_override_internal): Added PTA_WAITPKG.
(ix86_valid_target_attribute_inner_p): Added -mwaitpkg.
(enum ix86_builtins): Added IX86_BUILTIN_UMONITOR, IX86_BUILTIN_UMWAIT,
IX86_BUILTIN_TPAUSE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_umonitor,
__builtin_ia32_umwait and __builtin_ia32_tpause.
(ix86_expand_builtin): Expand IX86_BUILTIN_UMONITOR,
IX86_BUILTIN_UMWAIT, IX86_BUILTIN_TPAUSE.
* config/i386/i386.h (TARGET_WAITPKG, TARGET_WAITPKG_P): New.
* config/i386/i386.opt: Added -mwaitpkg.
* config/i386/sse.md (UNSPECV_UMWAIT, UNSPECV_UMONITOR,
UNSPECV_TPAUSE): New.
(umwait, umonitor_, tpause): New.
* config/i386/waitpkgintrin.h: New file.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mwaitpkg.

2018-05-08  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/tpause-1.c: New test.
* gcc.target/i386/umonitor-1.c: New test.

Thanks,
Sebastian




0001-WAITPKG.patch
Description: 0001-WAITPKG.patch


[committed] Move C++ SVE tests to g++.target/aarch64/sve

2018-05-08 Thread Richard Sandiford
Move the C++ tests that were originally in gcc.target/aarch64/sve
and later g++.dg/other to g++.target/aarch64/sve.  This means that
we don't need to override the -march flag when testing with something
that already supports SVE.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260038.

Richard


2018-05-08  Richard Sandiford  

gcc/testsuite/
* g++.dg/other/sve_const_pred_1.C: Rename to...
* g++.target/aarch64/sve/const_pred_1.C: ...this.  Remove aarch64
target selectors and explicit -march options.
* g++.dg/other/sve_const_pred_2.C: Rename to...
* g++.target/aarch64/sve/const_pred_2.C: ...this and adjust likewise.
* g++.dg/other/sve_const_pred_3.C: Rename to...
* g++.target/aarch64/sve/const_pred_3.C: ...this and adjust likewise.
* g++.dg/other/sve_const_pred_4.C: Rename to...
* g++.target/aarch64/sve/const_pred_4.C: ...this and adjust likewise.
* g++.dg/other/sve_tls_2.C: Rename to...
* g++.target/aarch64/sve/tls_2.C: ...this and adjust likewise.
* g++.dg/other/sve_vcond_1.C: Rename to...
* g++.target/aarch64/sve/vcond_1.C: ...this and adjust likewise.
* g++.dg/other/sve_vcond_1_run.C: Rename to...
* g++.target/aarch64/sve/vcond_1_run.C: ...this and adjust likewise.

Index: gcc/testsuite/g++.dg/other/sve_const_pred_1.C
===
--- gcc/testsuite/g++.dg/other/sve_const_pred_1.C   2018-05-01 19:30:28.063724095 +0100
+++ /dev/null   2018-04-20 16:19:46.369131350 +0100
@@ -1,18 +0,0 @@
-/* { dg-do compile { target aarch64*-*-* } } */
-/* { dg-options "-O2 -march=armv8.2-a+sve -msve-vector-bits=256" } */
-
-#include <stdint.h>
-
-typedef int8_t vnx16qi __attribute__((vector_size(32)));
-
-vnx16qi
-foo (vnx16qi x, vnx16qi y)
-{
-  return (vnx16qi) { -1, 0, 0, -1, -1, -1, 0, 0,
--1, -1, -1, -1, 0, 0, 0, 0,
--1, -1, -1, -1, -1, -1, -1, -1,
-0, 0, 0, 0, 0, 0, 0, 0 } ? x : y;
-}
-
-/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
-/* { dg-final { scan-assembler {\t\.byte\t57\n\t\.byte\t15\n\t\.byte\t(255|-1)\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.target/aarch64/sve/const_pred_1.C
===
--- /dev/null   2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/g++.target/aarch64/sve/const_pred_1.C 2018-05-08 12:39:23.327239904 +0100
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+typedef int8_t vnx16qi __attribute__((vector_size(32)));
+
+vnx16qi
+foo (vnx16qi x, vnx16qi y)
+{
+  return (vnx16qi) { -1, 0, 0, -1, -1, -1, 0, 0,
+-1, -1, -1, -1, 0, 0, 0, 0,
+-1, -1, -1, -1, -1, -1, -1, -1,
+0, 0, 0, 0, 0, 0, 0, 0 } ? x : y;
+}
+
+/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
+/* { dg-final { scan-assembler {\t\.byte\t57\n\t\.byte\t15\n\t\.byte\t(255|-1)\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.dg/other/sve_const_pred_2.C
===
--- gcc/testsuite/g++.dg/other/sve_const_pred_2.C   2018-05-01 19:30:28.063724095 +0100
+++ /dev/null   2018-04-20 16:19:46.369131350 +0100
@@ -1,16 +0,0 @@
-/* { dg-do compile { target aarch64*-*-* } } */
-/* { dg-options "-O2 -march=armv8.2-a+sve -msve-vector-bits=256" } */
-
-#include <stdint.h>
-
-typedef int16_t vnx8hi __attribute__((vector_size(32)));
-
-vnx8hi
-foo (vnx8hi x, vnx8hi y)
-{
-  return (vnx8hi) { -1, 0, 0, -1, -1, -1, 0, 0,
-   -1, -1, -1, -1, 0, 0, 0, 0 } ? x : y;
-}
-
-/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
-/* { dg-final { scan-assembler {\t\.byte\t65\n\t\.byte\t5\n\t\.byte\t85\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.target/aarch64/sve/const_pred_2.C
===
--- /dev/null   2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/g++.target/aarch64/sve/const_pred_2.C 2018-05-08 12:39:23.327239904 +0100
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+typedef int16_t vnx8hi __attribute__((vector_size(32)));
+
+vnx8hi
+foo (vnx8hi x, vnx8hi y)
+{
+  return (vnx8hi) { -1, 0, 0, -1, -1, -1, 0, 0,
+   -1, -1, -1, -1, 0, 0, 0, 0 } ? x : y;
+}
+
+/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
+/* { dg-final { scan-assembler {\t\.byte\t65\n\t\.byte\t5\n\t\.byte\t85\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.dg/other/sve_const_pred_3.C
===
--- gcc/testsuite/g++.dg/other/sve_const_pred_3.C   2018-05-01 19:30:28.063724095 +0100
+++ /dev/null   2018-04-20 16:19:46.369131350 +0100
@@ -1,15 +0,0 @@
-/* { dg-do compile { target aarch64*-*-* } } */
-/* { dg-options "-O2 -march=armv8.2-a+sve -msve-vector-

[PATCH][i386] Adding CLDEMOTE instruction

2018-05-08 Thread Peryt, Sebastian
Hi,

This patch adds support for CLDEMOTE instruction.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLDEMOTE_SET,
OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
(ix86_handle_option): Handle -mcldemote.
* config.gcc: New header.
* config/i386/cldemoteintrin.h: New file.
* config/i386/cpuid.h (bit_CLDEMOTE): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
-mcldemote.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLDEMOTE.
* config/i386/i386.c (ix86_target_string): Added -mcldemote.
(ix86_valid_target_attribute_inner_p): Ditto.
(enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
(ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
* config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
* config/i386/i386.md (UNSPECV_CLDEMOTE): New.
(cldemote): New.
* config/i386/i386.opt: Added -mcldemote.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mcldemote.

2018-05-08  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/cldemote-1.c: New test.

Thanks,
Sebastian


RE: [PATCH][i386] Adding CLDEMOTE instruction

2018-05-08 Thread Peryt, Sebastian
Sorry, forgot attachment.

Sebastian


-Original Message-
From: Peryt, Sebastian 
Sent: Tuesday, May 8, 2018 1:56 PM
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak ; Kirill Yukhin ; 
Peryt, Sebastian 
Subject: [PATCH][i386] Adding CLDEMOTE instruction

Hi,

This patch adds support for CLDEMOTE instruction.

Is it OK for trunk and, after a few days, for backport to GCC-8?

2018-05-08  Sebastian Peryt  

gcc/

* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLDEMOTE_SET,
OPTION_MASK_ISA_CLDEMOTE_UNSET): New defines.
(ix86_handle_option): Handle -mcldemote.
* config.gcc: New header.
* config/i386/cldemoteintrin.h: New file.
* config/i386/cpuid.h (bit_CLDEMOTE): New bit.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
-mcldemote.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_CLDEMOTE.
* config/i386/i386.c (ix86_target_string): Added -mcldemote.
(ix86_valid_target_attribute_inner_p): Ditto.
(enum ix86_builtins): Added IX86_BUILTIN_CLDEMOTE.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_cldemote.
(ix86_expand_builtin): Expand IX86_BUILTIN_CLDEMOTE.
* config/i386/i386.h (TARGET_CLDEMOTE, TARGET_CLDEMOTE_P): New.
* config/i386/i386.md (UNSPECV_CLDEMOTE): New.
(cldemote): New.
* config/i386/i386.opt: Added -mcldemote.
* config/i386/x86intrin.h: New header.
* doc/invoke.texi: Added -mcldemote.

2018-05-08  Sebastian Peryt  

gcc/testsuite/

* gcc.target/i386/cldemote-1.c: New test.

Thanks,
Sebastian


0002-CLDEMOTE.PATCH
Description: 0002-CLDEMOTE.PATCH


[PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)

2018-05-08 Thread Segher Boessenkool
Changing a SET that has a REG_CFA_REGISTER note is wrong if we are
changing the SET_DEST, or if the REG_CFA_REGISTER has nil as its
argument, and maybe some other cases.  It's never really useful to
propagate into such an instruction, so let's just bail whenever we
see such a note.

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  Is this
okay for trunk?


Segher


2018-05-08  Segher Boessenkool  

PR rtl-optimization/85645
* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
insn that has a REG_CFA_REGISTER note.

---
 gcc/regcprop.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index a664f76..1813242 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -848,6 +848,12 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
  && reg_overlap_mentioned_p (XEXP (link, 0), SET_SRC (set)))
set = NULL;
}
+
+ /* We need to keep CFI info correct, and the same on all paths,
+so we cannot normally replace the registers REG_CFA_REGISTER
+refers to.  Bail.  */
+ if (REG_NOTE_KIND (link) == REG_CFA_REGISTER)
+   goto did_replacement;
}
 
   /* Special-case plain move instructions, since we may well
-- 
1.8.3.1



[PATCH 0/4] PR85645

2018-05-08 Thread Segher Boessenkool
In this testcase shrink-wrap makes a not-so-great decision.  Both regcprop
and regrename cannot handle the resulting RTL correctly.  The first two
patches fix those passes.

The third patch makes separate shrink-wrapping do a better job: running
spread_components more than once should help only in unusual cases, but
it turns out those unusual cases happen a lot.

The last patch makes the rs6000 backend generate REG_CFA_REGISTER notes
always with an argument (instead of having it nil, which means use the
pattern of the instruction).  This makes life easier for anything that
needs to handle those notes, and nil should probably be disallowed.


Segher Boessenkool (4):
  regcprop: Avoid REG_CFA_REGISTER notes (PR85645)
  regrename: Don't rename the dest of a REG_CFA_REGISTER (PR85645)
  shrink-wrap: Improve spread_components (PR85645)
  rs6000: Give an argument to every REG_CFA_REGISTER (PR85645)

 gcc/config/rs6000/rs6000.c |  6 +++---
 gcc/regcprop.c |  6 ++
 gcc/regrename.c| 19 +++
 gcc/shrink-wrap.c  | 23 ---
 4 files changed, 44 insertions(+), 10 deletions(-)

-- 
1.8.3.1



[PATCH 2/4] regrename: Don't rename the dest of a REG_CFA_REGISTER (PR85645)

2018-05-08 Thread Segher Boessenkool
We should never change the destination of a REG_CFA_REGISTER, just
like for insns with a REG_CFA_RESTORE, because we need to have the
same control flow information on all branches that join.  It is very
doubtful that renaming the scratch registers used for prologue/epilogue
will help anything either.

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  Is this
okay for trunk?


Segher


2018-05-08  Segher Boessenkool  

PR rtl-optimization/85645
* regrename.c (build_def_use): Also kill the chains that include the
destination of a REG_CFA_REGISTER note.

---
 gcc/regrename.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 4575481..8424093 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1661,7 +1661,8 @@ build_def_use (basic_block bb)
 (6) For any non-earlyclobber write we find in an operand, make
 a new chain or mark the hard register as live.
 (7) For any REG_UNUSED, close any chains we just opened.
-(8) For any REG_CFA_RESTORE, kill any chain containing it.
+(8) For any REG_CFA_RESTORE or REG_CFA_REGISTER, kill any chain
+containing its dest.
 
 We cannot deal with situations where we track a reg in one mode
 and see a reference in another mode; these will cause the chain
@@ -1882,10 +1883,20 @@ build_def_use (basic_block bb)
  }
 
  /* Step 8: Kill the chains involving register restores.  Those
-should restore _that_ register.  */
+should restore _that_ register.  Similar for REG_CFA_REGISTER.  */
  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
-   if (REG_NOTE_KIND (note) == REG_CFA_RESTORE)
- scan_rtx (insn, &XEXP (note, 0), NO_REGS, mark_all_read, OP_IN);
+   if (REG_NOTE_KIND (note) == REG_CFA_RESTORE
+   || REG_NOTE_KIND (note) == REG_CFA_REGISTER)
+ {
+   rtx *x = &XEXP (note, 0);
+   if (!*x)
+ x = &PATTERN (insn);
+   if (GET_CODE (*x) == PARALLEL)
+ x = &XVECEXP (*x, 0, 0);
+   if (GET_CODE (*x) == SET)
+ x = &SET_DEST (*x);
+   scan_rtx (insn, x, NO_REGS, mark_all_read, OP_IN);
+ }
}
   else if (DEBUG_BIND_INSN_P (insn)
   && !VAR_LOC_UNKNOWN_P (INSN_VAR_LOCATION_LOC (insn)))
-- 
1.8.3.1



[PATCH 3/4] shrink-wrap: Improve spread_components (PR85645)

2018-05-08 Thread Segher Boessenkool
In the testcase for PR85645 we do a pretty dumb placement of the
prologue/epilogue for the LR component: we place an epilogue for LR
before a control flow split where one of the branches clobbers LR
eventually, and the other does not.  The branch that does clobber it
will need a prologue again some time later.  Because saving and
restoring LR is a two step process---it needs to be moved via a GPR---
the backend emits CFI directives so that we get correct unwind
information.  But neither regcprop nor regrename properly handles
such CFI directives, leading to ICEs.

Now, neither of the two branches needs to have LR restored at all,
because both of the branches end up in an infinite loop.

This patch makes spread_components return a boolean saying if anything
was changed, and if so, it is called again.  This obviously is finite
(there is a finite number of basic blocks, each with a finite number
of components, and spread_components can only assign more components
to a block, never less).  I also instrumented the code, and on a
bootstrap+regtest spread_components made changes a maximum of two
times.  Interestingly though it made changes on two iterations in
a third of the cases it did anything at all!
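
The termination argument above can be illustrated with a toy fixed-point loop. This is only a sketch: the propagation rule in spread_once is hypothetical and much simpler than the real spread_components, and all names here are invented. What it shares with the patch is the shape of the loop: a pass reports whether it changed anything, it can only add components, and the caller repeats it until nothing changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: each "block" holds a bitmask of components.  */
enum { N_BLOCKS = 4 };

/* Hypothetical propagation rule: block i inherits every component that
   block i - 1 already has.  Bits are only ever added.  Returns true if
   any block changed.  */
static bool
spread_once (unsigned has[N_BLOCKS])
{
  bool changed = false;
  for (int i = N_BLOCKS - 1; i > 0; i--)
    {
      unsigned merged = has[i] | has[i - 1];
      if (merged != has[i])
        {
          has[i] = merged;
          changed = true;
        }
    }
  return changed;
}

/* Iterate to a fixed point; return how many passes made changes.  */
static int
spread_until_stable (unsigned has[N_BLOCKS])
{
  int times = 0;
  while (spread_once (has))
    {
      times++;
      /* Every changing pass adds at least one bit, so with 32-bit masks
         (an assumption of this sketch) this bound cannot be exceeded.  */
      assert (times <= N_BLOCKS * 32);
    }
  return times;
}
```

Because of the deliberately backwards visiting order, a component set only in block 0 needs three changing passes to reach block 3 in this toy; the real pass needed at most two in the bootstrap described above.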

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  Is this
okay for trunk?


Segher


2018-05-08  Segher Boessenkool  

PR rtl-optimization/85645
* shrink-wrap.c (spread_components): Return a boolean saying if
anything was changed.
(try_shrink_wrapping_separate): Iterate spread_components until
nothing changes anymore.

---
 gcc/shrink-wrap.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 6b47d4e..40b26b8 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -1253,8 +1253,9 @@ place_prologue_for_one_component (unsigned int which, basic_block head)
 /* Set HAS_COMPONENTS in every block to the maximum it can be set to without
setting it on any path from entry to exit where it was not already set
somewhere (or, for blocks that have no path to the exit, consider only
-   paths from the entry to the block itself).  */
-static void
+   paths from the entry to the block itself).  Return whether any changes
+   were made to some HAS_COMPONENTS.  */
+static bool
 spread_components (sbitmap components)
 {
   basic_block entry_block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
@@ -1377,12 +1378,19 @@ spread_components (sbitmap components)
 
   /* Finally, mark everything not not needed both forwards and backwards.  */
 
+  bool did_changes = false;
+
   FOR_EACH_BB_FN (bb, cfun)
 {
+  bitmap_copy (old, SW (bb)->has_components);
+
   bitmap_and (SW (bb)->head_components, SW (bb)->head_components,
  SW (bb)->tail_components);
   bitmap_and_compl (SW (bb)->has_components, components,
SW (bb)->head_components);
+
+  if (!did_changes && !bitmap_equal_p (old, SW (bb)->has_components))
+   did_changes = true;
 }
 
   FOR_ALL_BB_FN (bb, cfun)
@@ -1394,6 +1402,8 @@ spread_components (sbitmap components)
  fprintf (dump_file, "\n");
}
 }
+
+  return did_changes;
 }
 
 /* If we cannot handle placing some component's prologues or epilogues where
@@ -1797,7 +1807,14 @@ try_shrink_wrapping_separate (basic_block first_bb)
   EXECUTE_IF_SET_IN_BITMAP (components, 0, j, sbi)
 place_prologue_for_one_component (j, first_bb);
 
-  spread_components (components);
+  int spread_times = 0;
+  while (spread_components (components))
+{
+  spread_times++;
+
+  if (dump_file)
+   fprintf (dump_file, "Now spread %d times.\n", spread_times);
+}
 
   disqualify_problematic_components (components);
 
-- 
1.8.3.1



[PATCH 4/4] rs6000: Give an argument to every REG_CFA_REGISTER (PR85645)

2018-05-08 Thread Segher Boessenkool
The one for the prologue mflr did not have any value set, which means
use the SET that is in the insn pattern.  This works fine, except when
some late pass decides to replace the SET_SRC -- this changes the
meaning of the REG_CFA_REGISTER!  Such passes should not do these
things, but let's be more explicit here, for extra robustness.  It
could be argued that this defaulting is a design misfeature (it does
not save much space either, etc.)
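
A small model of the hazard described above, with invented types and register numbers (this is not the rs6000 code): when the note has no argument its meaning is read from the insn pattern, so a pass that rewrites the pattern's source silently changes what the note says; an explicit copy in the note is unaffected.

```c
#include <assert.h>
#include <stddef.h>

/* Toy SET: "dest_regno := src_regno".  For a REG_CFA_REGISTER-style note
   the source is the register being saved, the destination where it now
   lives.  */
struct set { int dest_regno, src_regno; };

struct insn
{
  struct set pattern;
  const struct set *cfa_note;   /* NULL: fall back to the pattern.  */
};

/* Which register does the unwinder believe was saved by INSN?  */
static int
cfa_saved_regno (const struct insn *insn)
{
  assert (insn != NULL);
  const struct set *s = insn->cfa_note ? insn->cfa_note : &insn->pattern;
  return s->src_regno;
}
```

If some later pass replaces the pattern's source register, cfa_saved_regno changes for an insn with a NULL note but stays the same for one carrying its own SET.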

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  I'll commit
this together with the rest of the series.


Segher


2018-05-08  Segher Boessenkool  

PR rtl-optimization/85645
* config/rs6000/rs6000.c (rs6000_emit_prologue_components): Put a SET
in the REG_CFA_REGISTER note for LR, don't leave it empty.

---
 gcc/config/rs6000/rs6000.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1b2bb12..6118423 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26151,10 +26151,11 @@ rs6000_emit_prologue_components (sbitmap components)
   /* Prologue for LR.  */
   if (bitmap_bit_p (components, 0))
 {
+  rtx lr = gen_rtx_REG (reg_mode, LR_REGNO);
   rtx reg = gen_rtx_REG (reg_mode, 0);
-  rtx_insn *insn = emit_move_insn (reg, gen_rtx_REG (reg_mode, LR_REGNO));
+  rtx_insn *insn = emit_move_insn (reg, lr);
   RTX_FRAME_RELATED_P (insn) = 1;
-  add_reg_note (insn, REG_CFA_REGISTER, NULL);
+  add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (reg, lr));
 
   int offset = info->lr_save_offset;
   if (info->push_p)
@@ -26162,7 +26163,6 @@ rs6000_emit_prologue_components (sbitmap components)
 
   insn = emit_insn (gen_frame_store (reg, ptr_reg, offset));
   RTX_FRAME_RELATED_P (insn) = 1;
-  rtx lr = gen_rtx_REG (reg_mode, LR_REGNO);
   rtx mem = copy_rtx (SET_DEST (single_set (insn)));
   add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (mem, lr));
 }
-- 
1.8.3.1



Debug Mode ENH 4/4: Add special iterator support

2018-05-08 Thread François Dumont
Here is a patch to teach the _Parameter type about special iterator types so
that it improves the final output.


It also gets rid of the debug layer when possible so that failure output
is cleaner. Debug mode is already transparent to users, so there is no need
to show the Debug types in the output.


Here is the output for the newly added tests, for the move_iterator:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/vector:188:
In function:
    std::__debug::vector<_Tp, _Allocator>::vector(_InputIterator,
    _InputIterator, const _Allocator&) [with _InputIterator =
std::move_iterator<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator
    std::vector >, std::__debug::vector >; 
 = void; _Tp

    = int; _Allocator = std::allocator]

Backtrace:
    ./debug_neg.exe() [0x402956]
    ./debug_neg.exe() [0x402db5]
    ./debug_neg.exe() [0x4011b9]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f3962ef8830]

    ./debug_neg.exe() [0x401219]

Error: function requires a valid iterator range [__first, __last).

Objects involved in the operation:
    iterator "__first" @ 0x0x7ffc704c05f0 {
  type = std::move_iterator<__gnu_cxx::__normal_iteratorstd::vector > > > (mutable iterator);

  state = dereferenceable;
  references sequence with type 'std::__debug::vectorstd::allocator >' @ 0x0x7ffc704c07a0

    }
    iterator "__last" @ 0x0x7ffc704c05f0 {
  type = std::move_iterator<__gnu_cxx::__normal_iteratorstd::vector > > > (mutable iterator);

  state = dereferenceable (start-of-sequence);
  references sequence with type 'std::__debug::vectorstd::allocator >' @ 0x0x7ffc704c07a0

    }
XFAIL: 24_iterators/move_iterator/debug_neg.cc execution test

For the reverse_iterator:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/vector:188:
In function:
    std::__debug::vector<_Tp, _Allocator>::vector(_InputIterator,
    _InputIterator, const _Allocator&) [with _InputIterator =
std::reverse_iterator<__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator
    std::vector >, std::__debug::vector >; 
 = void; _Tp

    = int; _Allocator = std::allocator]

Backtrace:
    ./debug_neg.exe() [0x4020c1]
    ./debug_neg.exe() [0x400e59]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f13fc56e830]

    ./debug_neg.exe() [0x400eb9]

Error: function requires a valid iterator range [__first, __last).

Objects involved in the operation:
    iterator "__first" @ 0x0x7ffc4e1f77d0 {
  type = std::reverse_iterator<__gnu_cxx::__normal_iteratorstd::vector > > > (mutable iterator);

  state = past-the-reverse-end;
  references sequence with type 'std::__debug::vectorstd::allocator >' @ 0x0x7ffc4e1f7800

    }
    iterator "__last" @ 0x0x7ffc4e1f77d0 {
  type = std::reverse_iterator<__gnu_cxx::__normal_iteratorstd::vector > > > (mutable iterator);

  state = dereferenceable (start-of-reverse-sequence);
  references sequence with type 'std::__debug::vectorstd::allocator >' @ 0x0x7ffc4e1f7800

    }
XFAIL: 24_iterators/reverse_iterator/debug_neg.cc execution test

Tested under Linux x86_64.

I'll commit that tomorrow if not told otherwise.

    * include/debug/safe_iterator.h (_Safe_iterator<>::_M_constant()):
    Rename in...
    (_Safe_iterator<>::_S_constant()): ...that.
    * include/debug/safe_local_iterator.h
    (_Safe_local_iterator<>::_M_constant()): Rename in...
    (_Safe_local_iterator<>::_S_constant()): ...that.
    * include/debug/formatter.h: Remove bits/cpp_type_traits.h include.
    (_Iterator_state::__rbegin): New.
    (_Iterator_state::__rmiddle): New.
    (_Iterator_state::__rend): New.
    (_Parameter::_Parameter(const _Safe_iterator<>&, const char*,
    _Is_iterator)): Use _Safe_iterator<>::_S_constant. Grab normal
    underlying iterator type.
    (_Parameter::_Parameter(const _Safe_local_iterator<>&, const char*,
    _Is_iterator)): Likewise.
    (_Parameter::_S_reverse_state(_Iterator_state)): New.
    (_Parameter(__gnu_cxx::__normal_iterator<> const&, const char*,
    _Is_iterator)): New.
    (_Parameter(std::reverse_iterator<> const&, const char*,
    _Is_iterator)): New.
    (_Parameter(std::reverse_iterator<_Safe_iterator<>> const&,
    const char*, _Is_iterator)): New.
    (_Parameter(std::move_iterator<> const&, const char*, _Is_iterator)):
    New.
    (_Parameter(std::move_iterator<_Safe_iterator<>> const&, const char*,
    _Is_iterator)): New.
    * testsuite/24_iterators/move_iterator/debug_neg.cc: New.
    * testsuite/24_iterators/normal_iterator/debug_neg.cc: New.
    * testsuite/24_iterators/reverse_iterator/debug_neg.cc: New.

François

diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index 2939203..7b3c30b 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -30,7 +30,6 @@
 #define _GLIBCXX_DEBUG_FORMATTER_H 1
 
 #include 
-#include 
 
 #if __cpp_rtti
 # include 
@@ -43,6 +42,31 @@ namespace std
 # d

Add clobbers around IFN_LOAD/STORE_LANES

2018-05-08 Thread Richard Sandiford
We build up the input to IFN_STORE_LANES one vector at a time.
In RTL, each of these vector assignments becomes a write to
subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
eventual input to the store lanes instruction.  The problem is
that RTL isn't very good at tracking liveness when things are
initialised piecemeal by subregs, so R tends to end up being
live on all paths from the entry block to the store.  This in
turn leads to unnecessary spilling around calls, as well as to
excess register pressure in vector loops.

This patch adds gimple clobbers to indicate the liveness of the
IFN_STORE_LANES variable and makes sure that gimple clobbers are
expanded to rtl clobbers where useful.  For consistency it also
uses clobbers to mark the point at which an IFN_LOAD_LANES
variable is no longer needed.
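
The liveness effect described above can be sketched with a toy backward scan over a straight-line sequence for a single register R. This is an illustrative model with invented names, not GCC's dataflow code: it assumes a partial (subreg-style) write does not end R's live range, whereas a full definition or a clobber does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What one instruction does to the register R in this toy model.  */
enum op { OP_USE, OP_FULL_DEF, OP_PARTIAL_DEF, OP_CLOBBER };

/* Walk OPS backwards and report whether R is live before the first one.  */
static bool
live_at_entry (const enum op *ops, size_t n)
{
  assert (ops != NULL || n == 0);
  bool live = false;            /* Assume R is dead after the last op.  */
  for (size_t i = n; i-- > 0;)
    switch (ops[i])
      {
      case OP_USE:
        live = true;
        break;
      case OP_FULL_DEF:
      case OP_CLOBBER:
        live = false;           /* The old value is no longer needed.  */
        break;
      case OP_PARTIAL_DEF:
        break;                  /* The rest of R's old value survives.  */
      }
  return live;
}
```

With only piecemeal writes before the final use, R is reported live at entry; putting a clobber in front of the writes makes it dead there, which is the effect the patch is after.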

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


2018-05-08  Richard Sandiford  

gcc/
* cfgexpand.c (expand_clobber): New function.
(expand_gimple_stmt_1): Use it.
* tree-vect-stmts.c (vect_clobber_variable): New function,
split out from...
(vectorizable_simd_clone_call): ...here.
(vectorizable_store): Emit a clobber either side of an
IFN_STORE_LANES sequence.
(vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.

gcc/testsuite/
* gcc.target/aarch64/store_lane_spill_1.c: New test.
* gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2018-05-08 09:42:02.974668379 +0100
+++ gcc/cfgexpand.c 2018-05-08 14:23:25.039856499 +0100
@@ -3582,6 +3582,20 @@ expand_return (tree retval, tree bounds)
 }
 }
 
+/* Expand a clobber of LHS.  If LHS is stored in a register, tell
+   the rtl optimizers that its value is no longer needed.  */
+
+static void
+expand_clobber (tree lhs)
+{
+  if (DECL_P (lhs))
+{
+  rtx decl_rtl = DECL_RTL_IF_SET (lhs);
+  if (decl_rtl && REG_P (decl_rtl))
+   emit_clobber (decl_rtl);
+}
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3687,7 +3701,7 @@ expand_gimple_stmt_1 (gimple *stmt)
if (TREE_CLOBBER_P (rhs))
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
- ;
+ expand_clobber (lhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2018-05-08 09:42:03.335655127 +0100
+++ gcc/tree-vect-stmts.c   2018-05-08 14:23:25.040856464 +0100
@@ -182,6 +182,18 @@ create_array_ref (tree type, tree ptr, t
   return mem_ref;
 }
 
+/* Add a clobber of variable VAR to the vectorization of STMT.
+   Emit the clobber before *GSI.  */
+
+static void
+vect_clobber_variable (gimple *stmt, gimple_stmt_iterator *gsi, tree var)
+{
+  tree clobber = build_constructor (TREE_TYPE (var), NULL);
+  TREE_THIS_VOLATILE (clobber) = 1;
+  gimple *new_stmt = gimple_build_assign (var, clobber);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+}
+
 /* Utility functions used by vect_mark_stmts_to_be_vectorized.  */
 
 /* Function vect_mark_relevant.
@@ -4128,12 +4140,7 @@ vectorizable_simd_clone_call (gimple *st
}
 
  if (ratype)
-   {
- tree clobber = build_constructor (ratype, NULL);
- TREE_THIS_VOLATILE (clobber) = 1;
- new_stmt = gimple_build_assign (new_temp, clobber);
- vect_finish_stmt_generation (stmt, new_stmt, gsi);
-   }
+   vect_clobber_variable (stmt, gsi, new_temp);
  continue;
}
  else if (simd_clone_subparts (vectype) > nunits)
@@ -4156,10 +4163,7 @@ vectorizable_simd_clone_call (gimple *st
  CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE,
  gimple_assign_lhs (new_stmt));
}
- tree clobber = build_constructor (ratype, NULL);
- TREE_THIS_VOLATILE (clobber) = 1;
- new_stmt = gimple_build_assign (new_temp, clobber);
- vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ vect_clobber_variable (stmt, gsi, new_temp);
}
  else
CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp);
@@ -4186,11 +4190,7 @@ vectorizable_simd_clone_call (gimple *st
  new_stmt
= gimple_build_assign (make_ssa_name (vec_dest), t);
  vect_finish_stmt_generation (stmt, ne

Re: [committed] Move C++ SVE tests to g++.target/aarch64/sve

2018-05-08 Thread Kyrill Tkachov


On 08/05/18 12:43, Richard Sandiford wrote:

Move the C++ tests that were originally in gcc.target/aarch64/sve
and later g++.dg/other to g++.target/aarch64/sve.  This means that
we don't need to override the -march flag when testing with something
that already supports SVE.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260038.



Thanks for doing this Richard and sorry for missing the right location in the
first place.
I was meaning to move them there after you pointed it out, but it fell through
the cracks.

Kyrill


Richard


2018-05-08  Richard Sandiford 

gcc/testsuite/
* g++.dg/other/sve_const_pred_1.C: Rename to...
* g++.target/aarch64/sve/const_pred_1.C: ...this. Remove aarch64
target selectors and explicit -march options.
* g++.dg/other/sve_const_pred_2.C: Rename to...
* g++.target/aarch64/sve/const_pred_2.C: ...this and adjust likewise.
* g++.dg/other/sve_const_pred_3.C: Rename to...
* g++.target/aarch64/sve/const_pred_3.C: ...this and adjust likewise.
* g++.dg/other/sve_const_pred_4.C: Rename to...
* g++.target/aarch64/sve/const_pred_4.C: ...this and adjust likewise.
* g++.dg/other/sve_tls_2.C: Rename to...
* g++.target/aarch64/sve/tls_2.C: ...this and adjust likewise.
* g++.dg/other/sve_vcond_1.C: Rename to...
* g++.target/aarch64/sve/vcond_1.C: ...this and adjust likewise.
* g++.dg/other/sve_vcond_1_run.C: Rename to...
* g++.target/aarch64/sve/vcond_1_run.C: ...this and adjust likewise.

Index: gcc/testsuite/g++.dg/other/sve_const_pred_1.C
===
--- gcc/testsuite/g++.dg/other/sve_const_pred_1.C 2018-05-01 19:30:28.063724095 +0100
+++ /dev/null   2018-04-20 16:19:46.369131350 +0100
@@ -1,18 +0,0 @@
-/* { dg-do compile { target aarch64*-*-* } } */
-/* { dg-options "-O2 -march=armv8.2-a+sve -msve-vector-bits=256" } */
-
-#include 
-
-typedef int8_t vnx16qi __attribute__((vector_size(32)));
-
-vnx16qi
-foo (vnx16qi x, vnx16qi y)
-{
-  return (vnx16qi) { -1, 0, 0, -1, -1, -1, 0, 0,
--1, -1, -1, -1, 0, 0, 0, 0,
--1, -1, -1, -1, -1, -1, -1, -1,
-0, 0, 0, 0, 0, 0, 0, 0 } ? x : y;
-}
-
-/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
-/* { dg-final { scan-assembler {\t\.byte\t57\n\t\.byte\t15\n\t\.byte\t(255|-1)\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.target/aarch64/sve/const_pred_1.C
===
--- /dev/null   2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/g++.target/aarch64/sve/const_pred_1.C 2018-05-08 12:39:23.327239904 +0100
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=256" } */
+
+#include 
+
+typedef int8_t vnx16qi __attribute__((vector_size(32)));
+
+vnx16qi
+foo (vnx16qi x, vnx16qi y)
+{
+  return (vnx16qi) { -1, 0, 0, -1, -1, -1, 0, 0,
+-1, -1, -1, -1, 0, 0, 0, 0,
+-1, -1, -1, -1, -1, -1, -1, -1,
+0, 0, 0, 0, 0, 0, 0, 0 } ? x : y;
+}
+
+/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
+/* { dg-final { scan-assembler {\t\.byte\t57\n\t\.byte\t15\n\t\.byte\t(255|-1)\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.dg/other/sve_const_pred_2.C
===
--- gcc/testsuite/g++.dg/other/sve_const_pred_2.C 2018-05-01 19:30:28.063724095 +0100
+++ /dev/null   2018-04-20 16:19:46.369131350 +0100
@@ -1,16 +0,0 @@
-/* { dg-do compile { target aarch64*-*-* } } */
-/* { dg-options "-O2 -march=armv8.2-a+sve -msve-vector-bits=256" } */
-
-#include 
-
-typedef int16_t vnx8hi __attribute__((vector_size(32)));
-
-vnx8hi
-foo (vnx8hi x, vnx8hi y)
-{
-  return (vnx8hi) { -1, 0, 0, -1, -1, -1, 0, 0,
-   -1, -1, -1, -1, 0, 0, 0, 0 } ? x : y;
-}
-
-/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
-/* { dg-final { scan-assembler {\t\.byte\t65\n\t\.byte\t5\n\t\.byte\t85\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.target/aarch64/sve/const_pred_2.C
===
--- /dev/null   2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/g++.target/aarch64/sve/const_pred_2.C 2018-05-08 12:39:23.327239904 +0100
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=256" } */
+
+#include 
+
+typedef int16_t vnx8hi __attribute__((vector_size(32)));
+
+vnx8hi
+foo (vnx8hi x, vnx8hi y)
+{
+  return (vnx8hi) { -1, 0, 0, -1, -1, -1, 0, 0,
+   -1, -1, -1, -1, 0, 0, 0, 0 } ? x : y;
+}
+
+/* { dg-final { scan-assembler {\tldr\tp[0-9]+,} } } */
+/* { dg-final { scan-assembler {\t\.byte\t65\n\t\.byte\t5\n\t\.byte\t85\n\t\.byte\t0\n} } } */
Index: gcc/testsuite/g++.dg/other/sve_const_pred_3.C
===
--- gcc/testsuite/g++.dg/other/sve_cons

Re: Add clobbers around IFN_LOAD/STORE_LANES

2018-05-08 Thread Richard Biener
On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
 wrote:
> We build up the input to IFN_STORE_LANES one vector at a time.
> In RTL, each of these vector assignments becomes a write to
> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
> eventual input to the store lanes instruction.  The problem is
> that RTL isn't very good at tracking liveness when things are
> initialised piecemeal by subregs, so R tends to end up being
> live on all paths from the entry block to the store.  This in
> turn leads to unnecessary spilling around calls, as well as to
> excess register pressure in vector loops.
>
> This patch adds gimple clobbers to indicate the liveness of the
> IFN_STORE_LANES variable and makes sure that gimple clobbers are
> expanded to rtl clobbers where useful.  For consistency it also
> uses clobbers to mark the point at which an IFN_LOAD_LANES
> variable is no longer needed.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

Minor comment inline.

Also it looks like clobbers are at the moment all thrown away during
RTL expansion?  Do the clobbers we generate with this patch eventually
get collected somehow if they turn out to be no longer necessary?
How many of them do we generate?  I expect not many decls get
expanded to registers and if they are most of them are likely
not constructed piecemeal - thus, maybe we should restrict ourselves
to non-scalar typed lhs?  So, change it to

  if (DECL_P (lhs)
      && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about single-element aggregates?
          || VECTOR_TYPE_P (TREE_TYPE (lhs))
          || COMPLEX_TYPE_P (TREE_TYPE (lhs))))

?

The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
you want to split this patch while we discuss the RTL bits.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2018-05-08  Richard Sandiford  
>
> gcc/
> * cfgexpand.c (expand_clobber): New function.
> (expand_gimple_stmt_1): Use it.
> * tree-vect-stmts.c (vect_clobber_variable): New function,
> split out from...
> (vectorizable_simd_clone_call): ...here.
> (vectorizable_store): Emit a clobber either side of an
> IFN_STORE_LANES sequence.
> (vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.
>
> gcc/testsuite/
> * gcc.target/aarch64/store_lane_spill_1.c: New test.
> * gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.
>
> Index: gcc/cfgexpand.c
> ===
> --- gcc/cfgexpand.c 2018-05-08 09:42:02.974668379 +0100
> +++ gcc/cfgexpand.c 2018-05-08 14:23:25.039856499 +0100
> @@ -3582,6 +3582,20 @@ expand_return (tree retval, tree bounds)
>  }
>  }
>
> +/* Expand a clobber of LHS.  If LHS is stored in a register, tell
> +   the rtl optimizers that its value is no longer needed.  */
> +
> +static void
> +expand_clobber (tree lhs)
> +{
> +  if (DECL_P (lhs))
> +{
> +  rtx decl_rtl = DECL_RTL_IF_SET (lhs);
> +  if (decl_rtl && REG_P (decl_rtl))
> +   emit_clobber (decl_rtl);
> +}
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
> STMT that doesn't require special handling for outgoing edges.  That
> is no tailcalls and no GIMPLE_COND.  */
> @@ -3687,7 +3701,7 @@ expand_gimple_stmt_1 (gimple *stmt)
> if (TREE_CLOBBER_P (rhs))
>   /* This is a clobber to mark the going out of scope for
>  this LHS.  */
> - ;
> + expand_clobber (lhs);
> else
>   expand_assignment (lhs, rhs,
>  gimple_assign_nontemporal_move_p (
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-05-08 09:42:03.335655127 +0100
> +++ gcc/tree-vect-stmts.c   2018-05-08 14:23:25.040856464 +0100
> @@ -182,6 +182,18 @@ create_array_ref (tree type, tree ptr, t
>return mem_ref;
>  }
>
> +/* Add a clobber of variable VAR to the vectorization of STMT.
> +   Emit the clobber before *GSI.  */
> +
> +static void
> +vect_clobber_variable (gimple *stmt, gimple_stmt_iterator *gsi, tree var)
> +{
> +  tree clobber = build_constructor (TREE_TYPE (var), NULL);
> +  TREE_THIS_VOLATILE (clobber) = 1;

There's now a helper for this - build_clobber.

> +  gimple *new_stmt = gimple_build_assign (var, clobber);
> +  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +}
> +
>  /* Utility functions used by vect_mark_stmts_to_be_vectorized.  */
>
>  /* Function vect_mark_relevant.
> @@ -4128,12 +4140,7 @@ vectorizable_simd_clone_call (gimple *st
> }
>
>   if (ratype)
> -   {
> - tree clobber = build_constructor (ratype, NULL);
> - TREE_THIS_VOLATILE (clobber) = 1;
> - new_stmt = gimple_build_assign (new_temp

Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-08 Thread Kyrill Tkachov

Hi Luis,

On 07/05/18 15:28, Luis Machado wrote:

Hi,

On 02/08/2018 10:45 AM, Luis Machado wrote:

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:

Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.


Since this is ARM-specific and fairly narrow in scope, I wonder if it would be
reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the gcc-bugs
list) for examples of the ways that things can go wrong with any of the myriad
of GCC components and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a one-liner fix
for PR 84164 but it has expanded into a 3-patch series with a midend component
and target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, and can
sometimes take months of deep testing by fuzzing and massive codebase rebuilds
to expose, so the closer the commit is to a release the higher the risk is that
an obscure edge case will be unnoticed and unfixed in the release.

So the priority at this stage is to minimise the risk of destabilising the
codebase, as opposed to taking in new features and desirable performance
improvements (like your patch!)

That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it sounds very 
reasonable. I'll put the patch on hold until development is open again.

Regards,
Luis


With GCC 9 development open, I take it this patch is worth considering again?



Yes, I believe the latest version is at:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?

+(define_insn "*ashift_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+   (ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand 2 "aarch64_simd_shift_imm_offset_" "n")
+ (match_operand 3 "aarch64_simd_shift_imm_" "n"))
+(match_operand 4 "aarch64_simd_shift_imm_" "n")))]
+  ""
+  "sbfiz\\t%0, %1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+

Can you give a bit more information about what the values of operands 2, 3
and 4 are in your example testcases?
I'm trying to understand why the value of operand 3 (the bit position the
sign-extract starts from) doesn't get validated in any way and doesn't play
any role in the output...

Thanks,
Kyrill


[Ping] [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")

2018-05-08 Thread Paolo Carlini

Hi,

On 20/04/2018 19:46, Paolo Carlini wrote:

Hi,

in this error-recovery regression, after sensible diagnostic about 
"two or more data types in declaration..." we get confused, we issue a 
cryptic -  but useful hint to somebody working on the present bug ;) - 
"template definition of non-template" error and we finally crash. I 
think the issue here is that we want to use 
abort_fully_implicit_template as part of the error recovery done by 
cp_parser_parameter_declaration_list, when the loop is exited early 
after a cp_parser_parameter_declaration internally called 
synthesize_implicit_template_parm. Indeed, if we do that we get the 
same error recovery behavior we get for the same testcase modified to 
not use an auto parameter (likewise for related testcases):


struct a {
  void b() {}
   void c(auto = [] {
    if (a a(int int){})
  ;
  }) {}
};

Tested x86_64-linux.

The last pending patch of mine...

    https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01014.html

Thanks!
Paolo.


[RFA] Incremental LTO linking part 1: simple-object bits

2018-05-08 Thread Jan Hubicka
Hi,
for incremental linking of LTO objects we need to copy debug sections from
source object files into the destination without renaming them from .gnu.debuglto
into the standard debug sections (because they will again be LTO debug sections
in the resulting object file).
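
As a rough sketch of the two behaviours (renaming for a final link, keeping names for an incremental one), the following standalone function maps an input section name to its output name. It is not the libiberty code: the function names are invented, it covers only the prefixes visible in the patch below, and it uses plain malloc rather than libiberty's allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Return a malloc'ed output name for NAME if it is an LTO debug section
   (or its relocation section), else NULL.  NULL is also returned if
   allocation fails.  If RENAME is nonzero the LTO prefix is stripped.  */
static char *
lto_debug_section_name (const char *name, int rename)
{
  assert (name != NULL);

  /* Remember a leading relocation prefix so it can be kept.  */
  const char *reloc = "";
  if (has_prefix (name, ".rela"))
    reloc = ".rela";
  else if (has_prefix (name, ".rel"))
    reloc = ".rel";
  const char *rest = name + strlen (reloc);

  size_t strip;
  if (has_prefix (rest, ".gnu.debuglto_"))
    strip = strlen (".gnu.debuglto_");
  else if (has_prefix (rest, ".gnu.lto_.debug_"))
    strip = strlen (".gnu.lto_");
  else
    return NULL;

  /* The output is never longer than the input name.  */
  char *out = malloc (strlen (name) + 1);
  if (out == NULL)
    return NULL;
  if (rename)
    {
      strcpy (out, reloc);
      strcat (out, rest + strip);
    }
  else
    strcpy (out, name);
  return out;
}
```

With rename set, ".rela.gnu.debuglto_.debug_info" maps to ".rela.debug_info"; with rename clear, the name is copied unchanged. The caller owns the returned string and frees it.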

I have discussed this with Richard on IRC and I hope it is fine to change the
API here because lto-wrapper is the only user of this function.  I will send
lto-wrapper support in a separate patch.

I have lto-bootstrapped/regtested the whole incremental linking patchset on
x86-64-linux with libbackend being incrementally linked and also experimented
with extra testcases and tested that debugging works on the resulting cc1 binary.
OK?

Honza

* simple-object.h (simple_object_copy_lto_debug_sections): Add rename
parameter.
* simple-object.c (handle_lto_debug_sections): Add rename parameter.
(handle_lto_debug_sections_rename): New function.
(handle_lto_debug_sections_norename): New function.
(simple_object_copy_lto_debug_sections): Add rename parameter.
Index: include/simple-object.h
===
--- include/simple-object.h (revision 260042)
+++ include/simple-object.h (working copy)
@@ -198,12 +198,15 @@
 simple_object_release_write (simple_object_write *);
 
 /* Copy LTO debug sections from SRC_OBJECT to DEST.
+   If RENAME is true, rename LTO debug section into debug section (i.e.
+   when producing final binary) and if it is false, keep the sections with
+   original names (when incrementally linking).
If an error occurs, return the errno value in ERR and an error string.  */
 
 extern const char *
 simple_object_copy_lto_debug_sections (simple_object_read *src_object,
   const char *dest,
-  int *err);
+  int *err, int rename);
 
 #ifdef __cplusplus
 }
Index: libiberty/simple-object.c
===
--- libiberty/simple-object.c   (revision 260042)
+++ libiberty/simple-object.c   (working copy)
@@ -251,12 +251,15 @@
 }
 
 /* Callback to identify and rename LTO debug sections by name.
-   Returns 1 if NAME is a LTO debug section, 0 if not.  */
+   Returns non-NULL if NAME is a LTO debug section, NULL if not.
+   If RENAME is true it will rename LTO debug sections to non-LTO
+   ones.  */
 
 static char *
-handle_lto_debug_sections (const char *name)
+handle_lto_debug_sections (const char *name, int rename)
 {
-  char *newname = XCNEWVEC (char, strlen (name) + 1);
+  char *newname = rename ? XCNEWVEC (char, strlen (name) + 1)
+: xstrdup (name);
 
   /* ???  So we can't use .gnu.lto_ prefixed sections as the assembler
  complains about bogus section flags.  Which means we need to arrange
@@ -265,12 +268,14 @@
   /* Also include corresponding reloc sections.  */
   if (strncmp (name, ".rela", sizeof (".rela") - 1) == 0)
 {
-  strncpy (newname, name, sizeof (".rela") - 1);
+  if (rename)
+strncpy (newname, name, sizeof (".rela") - 1);
   name += sizeof (".rela") - 1;
 }
   else if (strncmp (name, ".rel", sizeof (".rel") - 1) == 0)
 {
-  strncpy (newname, name, sizeof (".rel") - 1);
+  if (rename)
+strncpy (newname, name, sizeof (".rel") - 1);
   name += sizeof (".rel") - 1;
 }
   /* ???  For now this handles both .gnu.lto_ and .gnu.debuglto_ prefixed
@@ -277,10 +282,10 @@
  sections.  */
   /* Copy LTO debug sections and rename them to their non-LTO name.  */
   if (strncmp (name, ".gnu.debuglto_", sizeof (".gnu.debuglto_") - 1) == 0)
-return strcat (newname, name + sizeof (".gnu.debuglto_") - 1);
+return rename ? strcat (newname, name + sizeof (".gnu.debuglto_") - 1) : newname;
   else if (strncmp (name, ".gnu.lto_.debug_",
sizeof (".gnu.lto_.debug_") -1) == 0)
-return strcat (newname, name + sizeof (".gnu.lto_") - 1);
+return rename ? strcat (newname, name + sizeof (".gnu.lto_") - 1) : newname;
   /* Copy over .note.GNU-stack section under the same name if present.  */
   else if (strcmp (name, ".note.GNU-stack") == 0)
 return strcpy (newname, name);
@@ -289,14 +294,31 @@
  COMDAT sections in objects produced by GCC.  */
   else if (strcmp (name, ".comment") == 0)
 return strcpy (newname, name);
+  free (newname);
   return NULL;
 }
 
+/* Wrapper for handle_lto_debug_sections.  */
+
+static char *
+handle_lto_debug_sections_rename (const char *name)
+{
+  return handle_lto_debug_sections (name, 1);
+}
+
+/* Wrapper for handle_lto_debug_sections.  */
+
+static char *
+handle_lto_debug_sections_norename (const char *name)
+{
+  return handle_lto_debug_sections (name, 0);
+}
+
 /* Copy LTO debug sections.  */
 
 const char *
 simple_object_copy_lto_debug_sections (simple_object_read *sobj,
-  const char *dest, int *

Incremental LTO linking part 2: lto-plugin support

2018-05-08 Thread Jan Hubicka
Hi,
with LTO, incremental linking can be meaningfully done in three ways:
 1) read LTO files and produce a non-LTO .o file;
    this is the current behaviour of gcc -r or ld -r with the plugin
 2) read LTO files and merge sections for later LTO;
    this is the current behaviour of ld -r w/o the plugin
 3) read LTO files into the compiler, link them and produce an
    incrementally linked LTO object.

3 makes the most sense and I am making it the new default for gcc -r. For testing
purposes, and perhaps in order to have a tool to turn an LTO object into a real
object, we want to have 1) available as well.  GCC currently has a
-flinker-output option that decides between modes; it is set by the linker
plugin and can be overridden by the user (I forgot to document this).

I am targeting for -flinker-output=rel to be incremental linking into LTO
and adding -flinker-output=nolto-rel for 1).

The main limitation of 2 and 3 is that you cannot link LTO and non-LTO
object files together.  For 2, HJ's binutils patchset has support and I think
it can be extended to handle 3 as well. But with default binutils we want
to warn users.  This patch implements the warning (and prevents the linker
plugin from adding redundant linker-output options).
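
The decision described above can be sketched like this. It is a simplified
illustration, not the plugin code: enum output_kind and pick_linker_output
are hypothetical names standing in for the plugin API's output types and the
handler logic, and the warning is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the linker plugin API's output kinds.  */
enum output_kind { OUT_REL, OUT_EXEC, OUT_DYN, OUT_PIE };

/* Pick the -flinker-output option to forward to lto-wrapper.  *WARN is
   set when LTO and non-LTO objects are mixed in a relocatable link.  */
static const char *
pick_linker_output (enum output_kind kind, unsigned non_claimed_files,
                    int *warn)
{
  assert (warn != NULL);
  *warn = 0;
  switch (kind)
    {
    case OUT_REL:
      if (non_claimed_files > 0)
        {
          /* Mixed input: fall back to producing a non-LTO object.  */
          *warn = 1;
          return "-flinker-output=nolto-rel";
        }
      return "-flinker-output=rel";
    case OUT_DYN:
      return "-flinker-output=dyn";
    case OUT_PIE:
      return "-flinker-output=pie";
    case OUT_EXEC:
      return "-flinker-output=exec";
    }
  return NULL;
}
```

Only the relocatable case depends on the count of non-claimed files; the
other output kinds map one to one.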

Bootstrapped/regtested x86_64-linux with rest of the inclink patchset. OK?

* lto-plugin.c (non_claimed_files): New static var.
(linker_output_known): New static var.
(all_symbols_read_handler): When user specifies linker output do not
imply it; output warning when nolto-rel mode is forced.
(claim_file_header): Record number of nonclaimed files.
(process_option): Remember if linker output is known.

Index: lto-plugin.c
===
--- lto-plugin.c(revision 260042)
+++ lto-plugin.c(working copy)
@@ -27,10 +27,13 @@
More information at http://gcc.gnu.org/wiki/whopr/driver.
 
This plugin should be passed the lto-wrapper options and will forward them.
-   It also has 2 options of its own:
+   It also has options at his own:
-debug: Print the command line used to run lto-wrapper.
-nop: Instead of running lto-wrapper, pass the original to the plugin. This
-   only works if the input files are hybrid.  */
+   only works if the input files are hybrid. 
+   -linker-output-known: Do not determine linker output
+   -sym-style={none,win32,underscore|uscore}
+   -pass-through  */
 
 #ifdef HAVE_CONFIG_H
 #include "config.h"
@@ -159,6 +162,7 @@
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
+static unsigned int non_claimed_files = 0;
 
 /* List of files with offloading.  */
 static struct plugin_offload_file *offload_files;
@@ -185,6 +189,7 @@
 static char *resolution_file = NULL;
 static enum ld_plugin_output_file_type linker_output;
 static int linker_output_set;
+static int linker_output_known;
 
 /* The version of gold being used, or -1 if not gold.  The number is
MAJOR * 100 + MINOR.  */
@@ -637,7 +642,8 @@
 all_symbols_read_handler (void)
 {
   unsigned i;
-  unsigned num_lto_args = num_claimed_files + lto_wrapper_num_args + 3;
+  unsigned num_lto_args = num_claimed_files + lto_wrapper_num_args + 2
+  + !linker_output_known;
   char **lto_argv;
   const char *linker_output_str = NULL;
   const char **lto_arg_ptr;
@@ -661,26 +667,37 @@
   for (i = 0; i < lto_wrapper_num_args; i++)
 *lto_arg_ptr++ = lto_wrapper_argv[i];
 
-  assert (linker_output_set);
-  switch (linker_output)
+  if (!linker_output_known)
 {
-case LDPO_REL:
-  linker_output_str = "-flinker-output=rel";
-  break;
-case LDPO_DYN:
-  linker_output_str = "-flinker-output=dyn";
-  break;
-case LDPO_PIE:
-  linker_output_str = "-flinker-output=pie";
-  break;
-case LDPO_EXEC:
-  linker_output_str = "-flinker-output=exec";
-  break;
-default:
-  message (LDPL_FATAL, "unsupported linker output %i", linker_output);
-  break;
+  assert (linker_output_set);
+  switch (linker_output)
+   {
+   case LDPO_REL:
+ if (non_claimed_files)
+   {
+ message (LDPL_WARNING, "incremental linking of LTO and non-LTO "
+  "objects; using -flinker-output=nolto-rel which will "
+  "bypass whole program optimization");
+ linker_output_str = "-flinker-output=nolto-rel";
+   }
+ else
+   linker_output_str = "-flinker-output=rel";
+ break;
+   case LDPO_DYN:
+ linker_output_str = "-flinker-output=dyn";
+ break;
+   case LDPO_PIE:
+ linker_output_str = "-flinker-output=pie";
+ break;
+   case LDPO_EXEC:
+ linker_output_str = "-flinker-output=exec";
+ break;
+   default:
+ message (LDPL_FATAL, "unsupported linker output %i", linker_output);
+ break;
+   }
+  *lto_arg_ptr++ = xstrdup (linker_output_str);
 }
-  *lto_arg_ptr++ = xs

[PATCH] Add missing _mm512_set{_epi16,_epi8,zero} intrinsics

2018-05-08 Thread Jakub Jelinek
Hi!

While working on PR85323 testsuite coverage, I've noticed we lack these
intrinsics.  ICC and since Mar 2017 also clang do have these.

The _mm512_setzero is just a misnamed alias to another intrinsic, but for
compatibility we likely want to have it too.

Surprisingly, the _mm512_setr_epi{8,16} intrinsics one would expect too
are missing in the ICC I have around.
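
For readers unfamiliar with the set/setr distinction: the set-style
intrinsics take their arguments from the highest lane down to the lowest,
while the setr-style ones take them in memory order. A portable four-lane
sketch of that convention, using made-up names (struct lanes4, set4, setr4)
rather than the real AVX-512 types, could look like:

```c
#include <assert.h>

/* Hypothetical 4-lane stand-in for a vector type; lane[0] is the lowest.  */
struct lanes4 { short lane[4]; };

/* "set" order: first argument lands in the highest lane.  */
static struct lanes4
set4 (short q3, short q2, short q1, short q0)
{
  struct lanes4 r = { { q0, q1, q2, q3 } };
  return r;
}

/* "setr" order: first argument lands in the lowest lane.  */
static struct lanes4
setr4 (short q0, short q1, short q2, short q3)
{
  return set4 (q3, q2, q1, q0);
}
```

This mirrors how the _mm512_set_epi16 body in the patch lists __q00 first in
the initializer even though __q31 is the first parameter.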

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-05-08  Jakub Jelinek  

* config/i386/avx512fintrin.h (_mm512_set_epi16, _mm512_set_epi8,
_mm512_setzero): New intrinsics.

* gcc.target/i386/avx512f-set-v32hi-1.c: New test.
* gcc.target/i386/avx512f-set-v32hi-2.c: New test.
* gcc.target/i386/avx512f-set-v32hi-3.c: New test.
* gcc.target/i386/avx512f-set-v32hi-4.c: New test.
* gcc.target/i386/avx512f-set-v32hi-5.c: New test.
* gcc.target/i386/avx512f-set-v64qi-1.c: New test.
* gcc.target/i386/avx512f-set-v64qi-2.c: New test.
* gcc.target/i386/avx512f-set-v64qi-3.c: New test.
* gcc.target/i386/avx512f-set-v64qi-4.c: New test.
* gcc.target/i386/avx512f-set-v64qi-5.c: New test.
* gcc.target/i386/avx512f-setzero-1.c: New test.

--- gcc/config/i386/avx512fintrin.h.jj  2018-05-03 20:58:56.210706689 +0200
+++ gcc/config/i386/avx512fintrin.h 2018-05-08 12:43:49.905259463 +0200
@@ -97,6 +97,56 @@ _mm512_set_epi32 (int __A, int __B, int
   __H, __G, __F, __E, __D, __C, __B, __A };
 }
 
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi16 (short __q31, short __q30, short __q29, short __q28,
+ short __q27, short __q26, short __q25, short __q24,
+ short __q23, short __q22, short __q21, short __q20,
+ short __q19, short __q18, short __q17, short __q16,
+ short __q15, short __q14, short __q13, short __q12,
+ short __q11, short __q10, short __q09, short __q08,
+ short __q07, short __q06, short __q05, short __q04,
+ short __q03, short __q02, short __q01, short __q00)
+{
+  return __extension__ (__m512i)(__v32hi){
+__q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+__q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+__q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+__q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31
+  };
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi8 (char __q63, char __q62, char __q61, char __q60,
+char __q59, char __q58, char __q57, char __q56,
+char __q55, char __q54, char __q53, char __q52,
+char __q51, char __q50, char __q49, char __q48,
+char __q47, char __q46, char __q45, char __q44,
+char __q43, char __q42, char __q41, char __q40,
+char __q39, char __q38, char __q37, char __q36,
+char __q35, char __q34, char __q33, char __q32,
+char __q31, char __q30, char __q29, char __q28,
+char __q27, char __q26, char __q25, char __q24,
+char __q23, char __q22, char __q21, char __q20,
+char __q19, char __q18, char __q17, char __q16,
+char __q15, char __q14, char __q13, char __q12,
+char __q11, char __q10, char __q09, char __q08,
+char __q07, char __q06, char __q05, char __q04,
+char __q03, char __q02, char __q01, char __q00)
+{
+  return __extension__ (__m512i)(__v64qi){
+__q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+__q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+__q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+__q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31,
+__q32, __q33, __q34, __q35, __q36, __q37, __q38, __q39,
+__q40, __q41, __q42, __q43, __q44, __q45, __q46, __q47,
+__q48, __q49, __q50, __q51, __q52, __q53, __q54, __q55,
+__q56, __q57, __q58, __q59, __q60, __q61, __q62, __q63
+  };
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set_pd (double __A, double __B, double __C, double __D,
@@ -263,6 +313,13 @@ _mm512_setzero_ps (void)
 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
 }
 
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero (void)
+{
+  return _mm512_setzero_ps ();
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_pd (void)
--- gcc/testsuite/gcc.target/i386/avx512f-set-v32hi-1.c.jj  2018-05-08 13:00:19.622831795 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-set-v32hi-1.c 2018-05-08 13:00:03.536822520 +0200
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -

Incremental LTO linking part 3: lto-wrapper support

2018-05-08 Thread Jan Hubicka
Hi,
this patch makes lto-wrapper look for -flinker-output=rel and in this
case configure lto1 in non-WHOPR mode and disable section renaming.

Bootstrapped/regtested x86_64-linux with rest of incremental link patchset.
OK?
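
A rough sketch of the behaviour described above, under the assumption that
the option arrives as a plain string in the argument vector. The names
struct link_config and scan_linker_output are hypothetical and do not appear
in lto-wrapper.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of the two settings the option influences.  */
struct link_config
{
  int no_partition;           /* run lto1 without WHOPR partitioning */
  int rename_debug_sections;  /* rename LTO debug sections on copy */
};

/* Scan ARGV for -flinker-output=rel and derive the configuration.  */
static struct link_config
scan_linker_output (int argc, const char *const *argv)
{
  struct link_config cfg = { 0, 1 };

  assert (argc == 0 || argv != NULL);
  for (int i = 0; i < argc; i++)
    if (strcmp (argv[i], "-flinker-output=rel") == 0)
      {
        /* Incremental LTO link: one partition, keep LTO section names.  */
        cfg.no_partition = 1;
        cfg.rename_debug_sections = 0;
      }
  return cfg;
}
```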

* lto-wrapper.c (debug_objcopy): Add rename parameter; pass
it down to simple_object_copy_lto_debug_sections.
(run_gcc): Determine incremental LTO link time and configure
lto1 into non-wpa mode, disable renaming of debug sections.

Index: lto-wrapper.c
===
--- lto-wrapper.c   (revision 260042)
+++ lto-wrapper.c   (working copy)
@@ -966,7 +966,7 @@
is returned.  Return NULL on error.  */
 
 const char *
-debug_objcopy (const char *infile)
+debug_objcopy (const char *infile, bool rename)
 {
   const char *outfile;
   const char *errmsg;
@@ -1008,7 +1008,7 @@
 }
 
   outfile = make_temp_file ("debugobjtem");
-  errmsg = simple_object_copy_lto_debug_sections (inobj, outfile, &err);
+  errmsg = simple_object_copy_lto_debug_sections (inobj, outfile, &err, rename);
   if (errmsg)
 {
   unlink_if_ordinary (outfile);
@@ -1056,6 +1056,7 @@
   bool have_offload = false;
   unsigned lto_argc = 0, ltoobj_argc = 0;
   char **lto_argv, **ltoobj_argv;
+  bool linker_output_rel = false;
   bool skip_debug = false;
   unsigned n_debugobj;
 
@@ -1108,9 +1109,12 @@
  file_offset = (off_t) loffset;
}
   fd = open (filename, O_RDONLY | O_BINARY);
+  /* Linker plugin passes -fresolution and -flinker-output options.  */
   if (fd == -1)
{
  lto_argv[lto_argc++] = argv[i];
+ if (strcmp (argv[i], "-flinker-output=rel") == 0)
+   linker_output_rel = true;
  continue;
}
 
@@ -1175,6 +1179,11 @@
  lto_mode = LTO_MODE_WHOPR;
  break;
 
+   case OPT_flinker_output_:
+ linker_output_rel = !strcmp (option->arg, "rel");
+ break;
+
+
default:
  break;
}
@@ -1191,6 +1200,9 @@
   fputc ('\n', stderr);
 }
 
+  if (linker_output_rel)
+no_partition = true;
+
   if (no_partition)
 {
   lto_mode = LTO_MODE_LTO;
@@ -1435,7 +1447,7 @@
 for (i = 0; i < ltoobj_argc; ++i)
   {
const char *tem;
-   if ((tem = debug_objcopy (ltoobj_argv[i])))
+   if ((tem = debug_objcopy (ltoobj_argv[i], !linker_output_rel)))
  {
obstack_ptr_grow (&argv_obstack, tem);
n_debugobj++;


Incremental LTO linking part 4: lto-opts support

2018-05-08 Thread Jan Hubicka
Hi,
this patch prevents lto-opts from storing some flags that do not make sense to
store, in particular dumpdir and -fresolution. These definitely should not be
preserved from compile time to link time, and in the case of incremental linking
they caused trouble with the wrong resolution file being used in some cases.

I guess this is just the tip of the iceberg - I think we should switch to
whitelisting options that need saving rather than saving everything with a few
exceptions. This is however a separate issue.
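
As an illustration of the current "save everything except a few" approach,
here is a string-based sketch. It is an assumption-laden simplification:
should_save_option is a hypothetical name, and the real code switches on
option codes rather than comparing strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical filter: return 0 for options that must not be carried
   from compile time to link time, 1 for everything else.  */
static int
should_save_option (const char *opt)
{
  static const char resolution[] = "-fresolution=";

  assert (opt != NULL);
  if (strcmp (opt, "-dumpdir") == 0)
    return 0;
  if (strncmp (opt, resolution, sizeof (resolution) - 1) == 0)
    return 0;
  return 1;
}
```

A whitelist would invert the default: return 0 unless the option is known to
be needed at link time.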

Bootstrapped/regtested x86_64-linux, OK?
* lto-opts.c (lto_write_options): Skip OPT_dumpdir, OPT_fresolution_.
Index: lto-opts.c
===
--- lto-opts.c  (revision 260042)
+++ lto-opts.c  (working copy)
@@ -109,6 +109,8 @@
case OPT_SPECIAL_ignore:
case OPT_SPECIAL_program_name:
case OPT_SPECIAL_input_file:
+   case OPT_dumpdir:
+   case OPT_fresolution_:
  continue;
 
default:


[PATCH] Add peephole2's for mem {+,-,&,|,^}= x; mem != 0 after cmpelim (PR target/85683)

2018-05-08 Thread Jakub Jelinek
Hi!

Since r247992 the cmpelim pass optimizes a few arithmetic operations followed by
comparisons, and some of the peephole2s we have to recognize RMW instructions
with comparisons don't trigger anymore.
In particular, on the pr49095.c testcase in GCC 7 only 8 functions used
load + comparison with arith + store ([fh]*xor, something to look at later),
while in GCC 8/9 21 further functions do that.  This patch restores it to
the GCC 7 counts.
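
For context, the source-level shape these peepholes target is a
read-modify-write of memory followed by a test of the result against zero.
The function below is a hypothetical example in that spirit, not a copy of
anything in pr49095.c; the hope is that the compiler can use a single
arithmetic instruction with a memory destination and branch on its flags
instead of a separate load, operation and store.

```c
#include <assert.h>

/* Counts how often the nonzero path was taken (illustration only).  */
static int nonzero_calls;

static void
note_nonzero (void)
{
  nonzero_calls++;
}

/* mem += x; then test mem != 0.  */
static void
add_and_test (int *p, int x)
{
  *p += x;
  if (*p != 0)
    note_nonzero ();
}
```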

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
What about GCC 8.2?

2018-05-08  Jakub Jelinek  

PR target/85683
* config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0
after cmpelim optimization.

* gcc.target/i386/pr49095.c: Add -masm=att to dg-options.  Add
scan-assembler-times checking that except for [fh]*xor other functions
don't use any load instructions.

--- gcc/config/i386/i386.md.jj  2018-05-02 23:55:44.0 +0200
+++ gcc/config/i386/i386.md 2018-05-08 10:39:52.691422990 +0200
@@ -19285,6 +19285,37 @@ (define_peephole2
   const0_rtx);
 })
 
+;; Likewise for cmpelim optimized pattern.
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand")
+   (match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (reg FLAGS_REG)
+  (compare (match_operator:SWI 3 "plusminuslogic_operator"
+ [(match_dup 0)
+  (match_operand:SWI 2 "")])
+   (const_int 0)))
+ (set (match_dup 0) (match_dup 3))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && ix86_match_ccmode (peep2_next_insn (1),
+(GET_CODE (operands[3]) == PLUS
+ || GET_CODE (operands[3]) == MINUS)
+? CCGOCmode : CCNOmode)"
+  [(parallel [(set (match_dup 4) (match_dup 6))
+ (set (match_dup 1) (match_dup 5))])]
+{
+  operands[4] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (1)), 0, 0));
+  operands[5]
+= gen_rtx_fmt_ee (GET_CODE (operands[3]), GET_MODE (operands[3]),
+ copy_rtx (operands[1]), operands[2]);
+  operands[6]
+= gen_rtx_COMPARE (GET_MODE (operands[4]), copy_rtx (operands[5]),
+  const0_rtx);
+})
+
 ;; Likewise for instances where we have a lea pattern.
 (define_peephole2
   [(set (match_operand:SWI 0 "register_operand")
@@ -19348,6 +19379,34 @@ (define_peephole2
   const0_rtx);
 })
 
+;; Likewise for cmpelim optimized pattern.
+(define_peephole2
+  [(parallel [(set (reg FLAGS_REG)
+  (compare (match_operator:SWI 2 "plusminuslogic_operator"
+ [(match_operand:SWI 0 "register_operand")
+  (match_operand:SWI 1 "memory_operand")])
+   (const_int 0)))
+ (set (match_dup 0) (match_dup 2))])
+   (set (match_dup 1) (match_dup 0))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (2, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && ix86_match_ccmode (peep2_next_insn (0),
+(GET_CODE (operands[2]) == PLUS
+ || GET_CODE (operands[2]) == MINUS)
+? CCGOCmode : CCNOmode)"
+  [(parallel [(set (match_dup 3) (match_dup 5))
+ (set (match_dup 1) (match_dup 4))])]
+{
+  operands[3] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (0)), 0, 0));
+  operands[4]
+= gen_rtx_fmt_ee (GET_CODE (operands[2]), GET_MODE (operands[2]),
+ copy_rtx (operands[1]), operands[0]);
+  operands[5]
+= gen_rtx_COMPARE (GET_MODE (operands[3]), copy_rtx (operands[4]),
+  const0_rtx);
+})
+
 (define_peephole2
   [(set (match_operand:SWI12 0 "register_operand")
(match_operand:SWI12 1 "memory_operand"))
--- gcc/testsuite/gcc.target/i386/pr49095.c.jj  2017-02-14 20:34:47.575579410 +0100
+++ gcc/testsuite/gcc.target/i386/pr49095.c 2018-05-08 10:52:03.781730062 +0200
@@ -1,6 +1,6 @@
 /* PR rtl-optimization/49095 */
 /* { dg-do compile } */
-/* { dg-options "-Os -fno-shrink-wrap" } */
+/* { dg-options "-Os -fno-shrink-wrap -masm=att" } */
 /* { dg-additional-options "-mregparm=2" { target ia32 } } */
 
 void foo (void *);
@@ -71,3 +71,6 @@ G (int)
 G (long)
 
 /* { dg-final { scan-assembler-not "test\[lq\]" } } */
+/* The {f,h}{char,short,int,long}xor functions aren't optimized into
+   a RMW instruction, so need load, modify and store.  FIXME eventually.  */
+/* { dg-final { scan-assembler-times "\\), %" 8 } } */

Jakub


Re: Debug Mode ENH 3/4: Add backtrace

2018-05-08 Thread Jonathan Wakely

On 07/05/18 22:20 +0200, François Dumont wrote:

Hi

    Here is the patch to add backtrace info to debug assertion failure output.


Example:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/vector:188:
In function:
    std::__debug::vector<_Tp, _Allocator>::vector(_InputIterator,
    _InputIterator, const _Allocator&) [with _InputIterator =
std::reverse_iterator<__gnu_debug::_Safe_tagged_iterator<__gnu_cxx::__normal_iterator >, std::__debug::vector,
    std::random_access_iterator_tag> >;  = 
void; _Tp

    = int; _Allocator = std::allocator]

Backtrace:
    ./debug_neg.exe() [0x4020c1]
    ./debug_neg.exe() [0x400e59]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f13fc56e830]
    ./debug_neg.exe() [0x400eb9]

I tried to use addr2line on the output address and it worked fine.

Tested under Linux x86_64.

I'll commit tomorrow if not told otherwise.

    * src/c++11/debug.cc [_GLIBCXX_HAVE_EXECINFO_H]: Include execinfo.h.
    [_GLIBCXX_HAVE_EXECINFO_H](_Error_formatter::_M_error): Render backtrace.


Did you look into using libbacktrace? That resolves the addresses and
prints nice symbols. See the output of AddressSanitizer for what it
looks like (I think that uses libbacktrace).



[PATCH] Constant folding of x86 vector shift by scalar builtins (PR target/85323)

2018-05-08 Thread Jakub Jelinek
Hi!

The following patch adds folding for vector shift by scalar builtins.
If they are masked, so far we optimize them only if the mask is all
ones.  ix86_fold_builtin handles the all constant argument cases, where the
effect of the instructions can be fully precomputed at compile time and can
be useful even in constant expressions etc.
The ix86_gimple_fold_builtin deals with the cases where the first argument
is an arbitrary vector, but we can still optimize the cases:
1) if the shift count is 0, just return the first argument directly
2) if the shift count is equal to or higher than the element precision and the
shift is not an arithmetic right shift, the result is 0
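
The two rules above, applied to a single 32-bit lane, can be sketched as
below. The names enum shift_kind and fold_lane_shift are hypothetical. The
handling of an oversized arithmetic right shift (the lane fills with copies
of the sign bit) is not stated in this mail; it is the documented behaviour
of the x86 psra instructions and is included only to make the sketch
complete.

```c
#include <assert.h>
#include <stdint.h>

enum shift_kind { SHIFT_LEFT, SHIFT_RIGHT_LOGICAL, SHIFT_RIGHT_ARITH };

/* Compute the result of shifting one 32-bit lane by COUNT.  */
static int32_t
fold_lane_shift (int32_t value, uint64_t count, enum shift_kind kind)
{
  const unsigned prec = 32;

  /* Rule 1: a zero count returns the input unchanged.  */
  if (count == 0)
    return value;

  if (count >= prec)
    {
      /* Rule 2: oversized left and logical right shifts give 0.  */
      if (kind != SHIFT_RIGHT_ARITH)
        return 0;
      /* Assumption from the ISA: arithmetic shifts saturate.  */
      count = prec - 1;
    }

  uint32_t u = (uint32_t) value;
  uint32_t r;
  switch (kind)
    {
    case SHIFT_LEFT:
      r = u << count;
      break;
    case SHIFT_RIGHT_LOGICAL:
      r = u >> count;
      break;
    default:
      /* Arithmetic right shift done on unsigned bits, then sign-filled.  */
      r = u >> count;
      if (value < 0)
        r |= ~(UINT32_MAX >> count);
      break;
    }
  return (int32_t) r;
}
```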

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-05-08  Jakub Jelinek  

PR target/85323
* config/i386/i386.c: Include tree-vector-builder.h.
(ix86_vector_shift_count): New function.
(ix86_fold_builtin): Fold shift builtins by scalar count.
(ix86_gimple_fold_builtin): Likewise.

* gcc.target/i386/pr85323-1.c: New test.
* gcc.target/i386/pr85323-2.c: New test.
* gcc.target/i386/pr85323-3.c: New test.

--- gcc/config/i386/i386.c.jj   2018-05-07 09:11:12.353189182 +0200
+++ gcc/config/i386/i386.c  2018-05-07 19:35:36.266868071 +0200
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
 #include "wide-int-bitmask.h"
+#include "tree-vector-builder.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -33320,6 +33321,28 @@ fold_builtin_cpu (tree fndecl, tree *arg
   gcc_unreachable ();
 }
 
+/* Return the shift count of a vector by scalar shift builtin second argument
+   ARG1.  */
+static tree
+ix86_vector_shift_count (tree arg1)
+{
+  if (tree_fits_uhwi_p (arg1))
+return arg1;
+  else if (TREE_CODE (arg1) == VECTOR_CST && CHAR_BIT == 8)
+{
+  /* The count argument is weird, passed in as various 128-bit
+(or 64-bit) vectors, the low 64 bits from it are the count.  */
+  unsigned char buf[16];
+  int len = native_encode_expr (arg1, buf, 16);
+  if (len == 0)
+   return NULL_TREE;
+  tree t = native_interpret_expr (uint64_type_node, buf, len);
+  if (t && tree_fits_uhwi_p (t))
+   return t;
+}
+  return NULL_TREE;
+}
+
 static tree
 ix86_fold_builtin (tree fndecl, int n_args,
   tree *args, bool ignore ATTRIBUTE_UNUSED)
@@ -33328,6 +33351,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
 {
   enum ix86_builtins fn_code = (enum ix86_builtins)
   DECL_FUNCTION_CODE (fndecl);
+  enum rtx_code rcode;
+
   switch (fn_code)
{
case IX86_BUILTIN_CPU_IS:
@@ -33508,6 +33533,168 @@ ix86_fold_builtin (tree fndecl, int n_ar
}
  break;
 
+   case IX86_BUILTIN_PSLLD:
+   case IX86_BUILTIN_PSLLD128:
+   case IX86_BUILTIN_PSLLD128_MASK:
+   case IX86_BUILTIN_PSLLD256:
+   case IX86_BUILTIN_PSLLD256_MASK:
+   case IX86_BUILTIN_PSLLD512:
+   case IX86_BUILTIN_PSLLDI:
+   case IX86_BUILTIN_PSLLDI128:
+   case IX86_BUILTIN_PSLLDI128_MASK:
+   case IX86_BUILTIN_PSLLDI256:
+   case IX86_BUILTIN_PSLLDI256_MASK:
+   case IX86_BUILTIN_PSLLDI512:
+   case IX86_BUILTIN_PSLLQ:
+   case IX86_BUILTIN_PSLLQ128:
+   case IX86_BUILTIN_PSLLQ128_MASK:
+   case IX86_BUILTIN_PSLLQ256:
+   case IX86_BUILTIN_PSLLQ256_MASK:
+   case IX86_BUILTIN_PSLLQ512:
+   case IX86_BUILTIN_PSLLQI:
+   case IX86_BUILTIN_PSLLQI128:
+   case IX86_BUILTIN_PSLLQI128_MASK:
+   case IX86_BUILTIN_PSLLQI256:
+   case IX86_BUILTIN_PSLLQI256_MASK:
+   case IX86_BUILTIN_PSLLQI512:
+   case IX86_BUILTIN_PSLLW:
+   case IX86_BUILTIN_PSLLW128:
+   case IX86_BUILTIN_PSLLW128_MASK:
+   case IX86_BUILTIN_PSLLW256:
+   case IX86_BUILTIN_PSLLW256_MASK:
+   case IX86_BUILTIN_PSLLW512_MASK:
+   case IX86_BUILTIN_PSLLWI:
+   case IX86_BUILTIN_PSLLWI128:
+   case IX86_BUILTIN_PSLLWI128_MASK:
+   case IX86_BUILTIN_PSLLWI256:
+   case IX86_BUILTIN_PSLLWI256_MASK:
+   case IX86_BUILTIN_PSLLWI512_MASK:
+ rcode = ASHIFT;
+ goto do_shift;
+   case IX86_BUILTIN_PSRAD:
+   case IX86_BUILTIN_PSRAD128:
+   case IX86_BUILTIN_PSRAD128_MASK:
+   case IX86_BUILTIN_PSRAD256:
+   case IX86_BUILTIN_PSRAD256_MASK:
+   case IX86_BUILTIN_PSRAD512:
+   case IX86_BUILTIN_PSRADI:
+   case IX86_BUILTIN_PSRADI128:
+   case IX86_BUILTIN_PSRADI128_MASK:
+   case IX86_BUILTIN_PSRADI256:
+   case IX86_BUILTIN_PSRADI256_MASK:
+   case IX86_BUILTIN_PSRADI512:
+   case IX86_BUILTIN_PSRAQ128_MASK:
+   case IX86_BUILTIN_PSRAQ256_MASK:
+   case IX86_BUILTIN_PSRAQ512:
+   case IX86_BUILTIN_PSRAQI128_MASK:
+   case IX86_BUILTIN_PSRAQI256_MASK:
+   case IX86_BUILTIN_PSRAQI512:
+   case IX86_BUILTIN_PSRAW:
+   case IX

Incremental LTO linking part 5: symtab and compilation driver support

2018-05-08 Thread Jan Hubicka
Hi,
this patch adds the symtab support for LTO incremental linking. Most of the
code path is the same for both modes of incremental link except that we want to
produce an LTO object file rather than compile down to assembly.

The only non-obvious changes are in ipa.c, where I hit a bug where we stream in
initializers that are going to be eliminated from the symbol table for no
good reason.

Bootstrapped/regtested x86_64-linux with rest of the incremental link patchset.

Honza

* passes.c (ipa_write_summaries): Only modify statements if body
is in memory.
* cgraphunit.c (ipa_passes): Also produce intermediate code when
incrementally linking.
(ipa_passes): Likewise.
* lto-cgraph.c (lto_output_node): When incrementally linking do not
pass down resolution info.
* common.opt (flag_incremental_link): Update info.
* gcc.c (plugin specs): Turn flinker-output=* to
-plugin-opt=-linker-output-known
* toplev.c (compile_file): Also cut compilation when doing incremental
link.
* flag-types.h (enum lto_linker_output): Add
LTO_LINKER_OUTPUT_NOLTOREL.
(invoke.texi): Add -flinker-output docs.
* ipa.c (symbol_table::remove_unreachable_nodes): Handle LTO incremental
link same way as WPA; do not stream in dead initializers.

* lang.opt (lto_linker_output): Add nolto-rel.
* lto-lang.c (lto_post_options): Handle LTO_LINKER_OUTPUT_REL
and LTO_LINKER_OUTPUT_NOLTOREL.
(lto_init): Generate lto when doing incremental link.
* lto.c (lto_precess_name): Add lto1-inclink.
Index: cgraphunit.c
===
--- cgraphunit.c(revision 260042)
+++ cgraphunit.c(working copy)
@@ -2452,8 +2452,10 @@
   if (flag_generate_lto || flag_generate_offload)
 targetm.asm_out.lto_start ();
 
-  if (!in_lto_p)
+  if (!in_lto_p || flag_incremental_link == 2)
 {
+  if (!quiet_flag)
+   fprintf (stderr, "Streaming LTO\n");
   if (g->have_offload)
{
  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
@@ -2472,7 +2474,9 @@
   if (flag_generate_lto || flag_generate_offload)
 targetm.asm_out.lto_end ();
 
-  if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
+  if (!flag_ltrans
+  && ((in_lto_p && flag_incremental_link != 2)
+ || !flag_lto || flag_fat_lto_objects))
 execute_ipa_pass_list (passes->all_regular_ipa_passes);
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
 
@@ -2559,7 +2563,8 @@
 
   /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
   if (seen_error ()
-  || (!in_lto_p && flag_lto && !flag_fat_lto_objects))
+  || ((!in_lto_p || flag_incremental_link == 2)
+ && flag_lto && !flag_fat_lto_objects))
 {
   timevar_pop (TV_CGRAPHOPT);
   return;
Index: common.opt
===
--- common.opt  (revision 260042)
+++ common.opt  (working copy)
@@ -48,7 +48,8 @@
 
 ; This variable is set to non-0 only by LTO front-end.  1 indicates that
 ; the output produced will be used for incrmeental linking (thus weak symbols
-; can still be bound).
+; can still be bound) and 2 indicates that the IL is going to be linked and
+; and output to LTO object file.
 Variable
 int flag_incremental_link = 0
 
Index: flag-types.h
===
--- flag-types.h(revision 260042)
+++ flag-types.h(working copy)
@@ -289,6 +289,7 @@
 enum lto_linker_output {
   LTO_LINKER_OUTPUT_UNKNOWN,
   LTO_LINKER_OUTPUT_REL,
+  LTO_LINKER_OUTPUT_NOLTOREL,
   LTO_LINKER_OUTPUT_DYN,
   LTO_LINKER_OUTPUT_PIE,
   LTO_LINKER_OUTPUT_EXEC
Index: gcc.c
===
--- gcc.c   (revision 260042)
+++ gcc.c   (working copy)
@@ -961,6 +961,7 @@
 -plugin %(linker_plugin_file) \
 -plugin-opt=%(lto_wrapper) \
 -plugin-opt=-fresolution=%u.res \
+%{flinker-output=*:-plugin-opt=-linker-output-known} \
 %{!nostdlib:%{!nodefaultlibs:%:pass-through-libs(%(link_gcc_c_sequence))}} \
 }" PLUGIN_COND_CLOSE
 #else
Index: ipa.c
===
--- ipa.c   (revision 260042)
+++ ipa.c   (working copy)
@@ -130,9 +130,9 @@
 constant folding.  Keep references alive so partitioning
 knows about potential references.  */
  || (VAR_P (node->decl)
- && flag_wpa
- && ctor_for_folding (node->decl)
-!= error_mark_node
+ && (flag_wpa || flag_incremental_link == 2)
+ && dyn_cast  (node)
+  ->ctor_useable_for_folding_p ()
{
  /* Be sure that we will not optimize out alias target

Incremental LTO linking part 6: dwarf2out support

2018-05-08 Thread Jan Hubicka
Hi,
this patch tells dwarf2out that it can have early debug not only in WPA mode
but also when incrementally linking. This prevents ICE on almost every testcase
compiled with -g.

Bootstrapped/regtested x86_64-linux with rest of incremental linking patchset.
Makes sense?

Honza

* dwarf2out.c (dwarf2out_die_ref_for_decl,
dwarf2out_register_external_decl): Support incremental link.
Index: dwarf2out.c
===
--- dwarf2out.c (revision 260042)
+++ dwarf2out.c (working copy)
@@ -5822,7 +5822,7 @@
 {
   dw_die_ref die;
 
-  if (flag_wpa && !decl_die_table)
+  if ((flag_wpa || flag_incremental_link == 2) && !decl_die_table)
 return false;
 
   if (TREE_CODE (decl) == BLOCK)
@@ -5832,10 +5832,10 @@
   if (!die)
 return false;
 
-  /* During WPA stage we currently use DIEs to store the
- decl <-> label + offset map.  That's quite inefficient but it
- works for now.  */
-  if (flag_wpa)
+  /* During WPA stage and incremental linking we currently use DIEs
+ to store the decl <-> label + offset map.  That's quite inefficient
+ but it works for now.  */
+  if (flag_wpa || flag_incremental_link == 2)
 {
   dw_die_ref ref = get_AT_ref (die, DW_AT_abstract_origin);
   if (!ref)
@@ -5886,7 +5886,7 @@
   if (debug_info_level == DINFO_LEVEL_NONE)
 return;
 
-  if (flag_wpa && !decl_die_table)
+  if ((flag_wpa || flag_incremental_link == 2) && !decl_die_table)
 decl_die_table = hash_table::create_ggc (1000);
 
   dw_die_ref die
@@ -5921,7 +5921,8 @@
parent = BLOCK_DIE (ctx);
   else if (TREE_CODE (ctx) == TRANSLATION_UNIT_DECL
   /* Keep the 1:1 association during WPA.  */
-  && !flag_wpa)
+  && !flag_wpa
+  && flag_incremental_link != 2)
/* Otherwise all late annotations go to the main CU which
   imports the original CUs.  */
parent = comp_unit_die ();
@@ -5942,7 +5943,7 @@
   switch (TREE_CODE (decl))
 {
 case TRANSLATION_UNIT_DECL:
-  if (! flag_wpa)
+  if (! flag_wpa && flag_incremental_link != 2)
{
  die = comp_unit_die ();
  dw_die_ref import = new_die (DW_TAG_imported_unit, die, NULL_TREE);


Incremental LTO linking part 7: documentation

2018-05-08 Thread Jan Hubicka
Hi,
this patch adds documentation of -flinker-output.

* doc/invoke.texi (-flinker-output): Document.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 260042)
+++ doc/invoke.texi (working copy)
@@ -12208,6 +12208,50 @@
 object file names should not be used as arguments.  @xref{Overall
 Options}.
 
+@item -flinker-output=@var{type}
+@opindex -flinker-output
+This option controls the code generation of the link time optimizer.  By
+default the linker output is determined by the linker plugin automatically. For
+debugging the compiler and in the case of incremental linking to non-lto object
+file is desired, it may be useful to control the type manually.
+
+If @var{type} is @samp{exec} the code generation is configured to produce static
+binary. In this case @option{-fpic} and @option{-fpie} are both disabled.
+
+If @var{type} is @samp{dyn} the code generation is configured to produce shared
+library. In this case @option{-fpic} or @option{-fPIC} is preserved, but not
+enabled automatically.  This makes it possible to build shared libraries without
+position independent code on architectures where this is possible, i.e. on x86.
+
+If @var{type} is @samp{pie} the code generation is configured to produce
+@option{-fpie} executable. This results in similar optimizations as @samp{exec}
+except that @option{-fpie} is not disabled if specified at compilation time.
+
+If @var{type} is @samp{rel} the compiler assumes that incremental linking is
+done.  The sections containing intermediate code for link-time optimization are
+merged, pre-optimized, and output to the resulting object file. In addition, if
+@option{-ffat-lto-objects} is specified the binary code is produced for future
+non-lto linking. The object file produced by incremental linking will be smaller
+than a static library produced from the same object files.  At link-time the
+result of incremental linking will also load faster to compiler than a static
+library assuming that majority of objects in the library are used.
+
+Finally @samp{nolto-rel} configures the compiler for incremental linking where
+code generation is forced, final binary is produced and the intermediate code
+for later link-time optimization is stripped. When multiple object files are
+linked together the resulting code will be optimized better than with link time
+optimizations disabled (for example, the cross-module inlining will happen),
+most of benefits of whole program optimizations are however lost. 
+
+During the incremental link (by @option{-r}) the linker plugin will default to
+@option{rel}. With current interfaces to GNU Binutils it is however not
+possible to link incrementally LTO objects and non-LTO objects into a single
+mixed object file.  In the case any of object files in incremental link can not
+be used for link-time optimization the linker plugin will output warning and
+use @samp{nolto-rel}. To maintain the whole program optimization it is
+recommended to link such objects into static library instead. Alternatively it
+is possible to use H.J. Lu's binutils with support for mixed objects.
+
 @item -fuse-ld=bfd
 @opindex fuse-ld=bfd
 Use the @command{bfd} linker instead of the default linker.


Incremental LTO linking part 8: testsuite compensation

2018-05-08 Thread Jan Hubicka

Hi,
most testcases are written with the assumption that -r will trigger code generation.
To make them still meaningful they need nolto-rel.  Bootstrapped/regtested x86_64-linux
with the rest of the incremental link changes.

Honza

2018-05-08  Jan Hubicka  

* testsuite/g++.dg/lto/20081109-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081118_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081119-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081120-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081120-2_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081123_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081204-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20081219_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20090302_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20090313_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20091002-2_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20091002-3_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20091026-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20100724-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20101010-4_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20101015-2_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/20110311-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr45621_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr48042_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr48354-1_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr54625-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr54625-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/lto/pr68811_0.C: Add -flinker-output=nolto-rel.
* testsuite/g++.dg/torture/pr43760.C: New test. Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081120-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081120-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081126_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081204-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081204-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081212-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20081224_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20090116_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20090126-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20090126-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20090206-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20090219_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091013-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091014-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091015-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091016-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091020-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091020-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20091027-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20100426_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20100430-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20100603-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20100603-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20100603-3_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/20111213-1_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr45736_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr52634_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr54702_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr59323-2_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr59323_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr60820_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr81406_0.c: Add -flinker-output=nolto-rel.
* testsuite/gcc.dg/lto/pr83388_0.c: Add -flinker-output=nolto-rel.
* testsuite/gfortran.dg/lto/20091016-1_0.f90: Add -flinker-output=nolto-rel.
* testsuite/gfortran.dg/lto/20091028-1_0.f90: Add -flinker-output=nolto-rel.
* testsuite/gfortran.dg/lto/20091028-2_0.f90: Add -flinker-output=nolto-rel.
* testsuite/gfortran.dg/lto/pr46911_0.f: Add -flinker-output=nolto-rel.
  

Re: Add clobbers around IFN_LOAD/STORE_LANES

2018-05-08 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
>  wrote:
>> We build up the input to IFN_STORE_LANES one vector at a time.
>> In RTL, each of these vector assignments becomes a write to
>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
>> eventual input to the store lanes instruction.  The problem is
>> that RTL isn't very good at tracking liveness when things are
>> initialised piecemeal by subregs, so R tends to end up being
>> live on all paths from the entry block to the store.  This in
>> turn leads to unnecessary spilling around calls, as well as to
>> excess register pressure in vector loops.
>>
>> This patch adds gimple clobbers to indicate the liveness of the
>> IFN_STORE_LANES variable and makes sure that gimple clobbers are
>> expanded to rtl clobbers where useful.  For consistency it also
>> uses clobbers to mark the point at which an IFN_LOAD_LANES
>> variable is no longer needed.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> and x86_64-linux-gnu.  OK to install?
>
> Minor comment inline.

Thanks, fixed.

> Also it looks like clobbers are at the moment all thrown away during
> RTL expansion?  Do the clobbers we generate with this patch eventually
> get collected somehow if they turn out to be no longer necessary?
> How many of them do we generate?  I expect not many decls get
> expanded to registers and if they are most of them are likely
> not constructed piecemeal - thus, maybe we should restrict ourselves
> to non-scalar typed lhs?  So, change it to
>
>   if (DECL_P (lhs)
>   && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
> single-element aggregates?
>  || VECTOR_TYPE_P (TREE_TYPE (lhs))
>  || COMPLEX_TYPE_P (TREE_TYPE (lhs

How about instead deciding based on whether the pseudo register spans a
single hard register or multiple hard registers, as per the patch below?
The clobber is only useful if the pseudo register can be partially
modified via subregs.

This could potentially also help with any large single-element
aggregates that get broken down into word-sized subreg ops.

> The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
> you want to split this patch while we discuss the RTL bits.

OK, thanks.  I'll keep it as one patch for applying purposes,
but snipped the approved bits below.

Richard


2018-05-08  Richard Sandiford  

gcc/
* cfgexpand.c (expand_clobber): New function.
(expand_gimple_stmt_1): Use it.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2018-05-08 16:50:31.815501502 +0100
+++ gcc/cfgexpand.c 2018-05-08 16:50:31.997495804 +0100
@@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds)
 }
 }
 
+/* Expand a clobber of LHS.  If LHS is stored in a multi-part
+   register, tell the rtl optimizers that its value is no longer
+   needed.  */
+
+static void
+expand_clobber (tree lhs)
+{
+  if (DECL_P (lhs))
+{
+  rtx decl_rtl = DECL_RTL_IF_SET (lhs);
+  if (decl_rtl && REG_P (decl_rtl))
+   {
+ machine_mode decl_mode = GET_MODE (decl_rtl);
+ if (maybe_gt (GET_MODE_SIZE (decl_mode),
+   REGMODE_NATURAL_SIZE (decl_mode)))
+   emit_clobber (decl_rtl);
+   }
+}
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt)
if (TREE_CLOBBER_P (rhs))
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
- ;
+ expand_clobber (lhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
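
For illustration only (this is not part of the patch): a loop of the
following shape writes interleaved pairs, which is the kind of access a
vectorizer may implement with a store-lanes operation on targets that
provide one, such as st2 on AArch64.  The function name and signature are
made up for this sketch, and whether IFN_STORE_LANES is actually chosen
depends on the target and the options in use.

```cpp
#include <cstddef>

// Hypothetical example: interleave two input arrays into one output
// array.  Each iteration stores the pair {a[i], b[i]}, so a vectorized
// version has to assemble an aggregate of two vectors before storing it.
void
interleave2 (int *__restrict out, const int *a, const int *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}
```

The aggregate assembled for the store is the value whose liveness the
clobbers discussed above are meant to delimit.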


Re: [PATCH] Add peephole2's for mem {+,-,&,|,^}= x; mem != 0 after cmpelim (PR target/85683)

2018-05-08 Thread Uros Bizjak
On Tue, May 8, 2018 at 5:21 PM, Jakub Jelinek  wrote:
> Hi!
>
> Since r247992 the cmpelim pass optimizes a few arithmetics with following
> comparisons and some of the peephole2s we have to recognize RMW instructions
> with comparisons don't trigger anymore.
> In particular, on the pr49095.c testcase in GCC 7 only 8 functions used
> load + comparison with arith + store ([fh]*xor, something to look at later),
> while in GCC 8/9 21 further functions do that.  This patch restores it to
> the GCC 7 counts.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> What about GCC 8.2?
>
> 2018-05-08  Jakub Jelinek  
>
> PR target/85683
> * config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0
> after cmpelim optimization.
>
> * gcc.target/i386/pr49095.c: Add -masm=att to dg-options.  Add
> scan-assembler-times checking that except for [fh]*xor other functions
> don't use any load instructions.

OK for mainline and backport.

Thanks,
Uros.
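
For readers without the testcase at hand, the source pattern under
discussion looks roughly like the sketch below: a read-modify-write of a
memory location followed by a test of the result against zero.  The
function names are hypothetical and are not taken from pr49095.c.

```cpp
// Hypothetical examples of "mem op= x; mem != 0".  Ideally the update
// and the comparison share one read-modify-write instruction whose
// flags feed the test, with no separate load of *p.
bool
add_and_test (int *p, int x)
{
  *p += x;
  return *p != 0;
}

bool
xor_and_test (int *p, int x)
{
  *p ^= x;
  return *p != 0;
}
```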


Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-08 Thread Segher Boessenkool
Hi Carl,

On Mon, May 07, 2018 at 01:34:55PM -0700, Carl Love wrote:
> This patch maps n2=0 to generate the dcbtstt mnemonic (dcbst for TH
> value of 0b1) for a write prefetch and dcbtst for n2 in range
> [1,3].  
> 
> The dcbtt mnemonic (dcbt for TH value of 0b1) is generated for a
> read prefetch when n2=0 and the dcbt instruction is generated for n2 in
> range [1,3].
> 
> The ISA states that the value TH = 0b1 is a hint that the processor
> will probably soon perform a load from the addressed block. 

(s/dcbst/dcbtst/).  Yup, sounds good.

> The regression testing of the patch was done on 
> 
>powerpc64le-unknown-linux-gnu (Power 8 LE)
> 
> with no regressions.  

What ISA version is required for the TH field to do anything?  Will
it work on older machines too (just ignored)?  What assembler version
is required?

> 2018-05-07  Carl Love  
> 
> * config/rs6000/rs6000.md: Add dcbtst, dcbtt instruction generation
>   to define_insn prefetch.

* config/rs6000/rs6000.md (prefetch): Generate dcbtt and dcbtstt instructions
if operands[2] is 0.

or similar.

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2b15cca..7429d33 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -13233,10 +13233,19 @@
>(match_operand:SI 2 "const_int_operand" "n"))]
>""
>  {
> -  if (GET_CODE (operands[0]) == REG)
> -return INTVAL (operands[1]) ? "dcbtst 0,%0" : "dcbt 0,%0";
> -  return INTVAL (operands[1]) ? "dcbtst %a0" : "dcbt %a0";
> -}
> +  if (GET_CODE (operands[0]) == REG) {

Use REG_P please.

The correct formatting is

  if (this)
    {
      something;
    }
  else
    {
      whatever;
    }

> +if (INTVAL (operands[1]) == 0)

You can also say

  if (operands[1] == const0_rtx)

if that is easier to read.

> +  }
> + }

If you do indenting right there never is a single space indent difference
(always is even).

It's a pity we need to decide between %a0 and not.  Hardly seems worth
making another output modifier for though.


Segher
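
As a reminder of what the operands mean at the source level, here is a
small sketch of __builtin_prefetch.  Its second argument selects read (0)
or write (1) and its third is the temporal locality hint from 0 to 3; the
third argument appears to be what the patch calls n2 (operands[2]).  The
function name and the prefetch distance of 16 elements are arbitrary
choices for the example.

```cpp
#include <cstddef>

// Hypothetical example: sum an array while hinting that data a few
// iterations ahead will be read (rw = 0) and need not stay in cache
// afterwards (locality = 0).
long
sum_with_prefetch (const long *data, std::size_t n)
{
  const std::size_t distance = 16;  // arbitrary prefetch distance
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      if (i + distance < n)
        __builtin_prefetch (&data[i + distance], 0, 0);
      sum += data[i];
    }
  return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it.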


Re: libstdc++: ODR violation when using std::regex with and without -D_GLIBCXX_DEBUG

2018-05-08 Thread Jonathan Wakely

On 08/05/18 16:17 +0100, Jonathan Wakely wrote:

On 8 May 2018 at 15:45, Marc Glisse  wrote:

On Tue, 8 May 2018, Jonathan Wakely wrote:


On 8 May 2018 at 14:00, Jonathan Wakely wrote:


On 8 May 2018 at 13:44, Stephan Bergmann wrote:


I was recently bitten by the following issue (Linux, libstdc++ 8.0.1): A
process loads two dynamic libraries A and B both using std::regex, and A is
compiled without -D_GLIBCXX_DEBUG while B is compiled with -D_GLIBCXX_DEBUG.



This is only supported in very restricted cases.


B creates an instance of std::regex, which internally creates a
std::shared_ptr>>,
where _NFA has various members of std::__debug::vector type (but which isn't
reflected in the mangled name of that _NFA instantiation itself).

Now, when that instance of std::regex is destroyed again in library B, the
std::shared_ptr>>::~shared_ptr
destructor (and functions it in turn calls) that happens to get picked is
the (inlined, and exported due to default visibility) instance from library
A.  And that assumes that that _NFA instantiation has members of non-debug
std::vector type, which causes a crash.

Should it be considered a bug that such mixture of debug and non-debug
std::regex usage causes ODR violations?



Yes, but my frank response is "don't do that".

The right fix here might be to ensure that _NFA always uses the
non-debug vector even in Debug Mode, but I'm fairly certain there are
other similar problems lurking.



N.B. I think this discussion belongs on the libstdc++ list.



Would it make sense to use the abi_tag attribute to help with that? (I
didn't really think about it, maybe it doesn't)


Yes, I think we could add it conditionally in debug mode, so that
types with members that are either std::xxx or __gnu_debug::xxx get a
different mangled name in debug mode.

For the regex _NFA type I don't think we want the debug mode checking,
because users can't access it directly so any errors are in the
libstdc++ implementation and we should have eliminated them ourselves,
not be pushing detection of those logic errors into users' programs.


I've committed this patch to do that.



For std::match_results (which derives from std::vector) it's possible
for users to use invalid iterators obtained from a match_results, so
Debug Mode can help. In that case we could decide whether to add the
abi_tag, or always derive from _GLIBCXX_STD_C::vector (i.e. the
non-debug mode one), or even provide an entire
__gnu_debug::match_results type.


"don't do that" remains the most sensible answer.


Yes, it's just asking for trouble.
commit 9e026542864d4ff5dd45ffdc43ec367e36aff8a6
Author: Jonathan Wakely 
Date:   Tue May 8 16:39:33 2018 +0100

Make std::regex automata use non-debug vector in Debug Mode

* include/bits/regex_automaton.h (_NFA_base::_M_paren_stack, _NFA):
Use normal std::vector even in Debug Mode.

diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index bf51df79097..ff87dcc245d 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -210,7 +210,7 @@ namespace __detail
 _M_sub_count() const
 { return _M_subexpr_count; }
 
-std::vector   _M_paren_stack;
+_GLIBCXX_STD_C::vector _M_paren_stack;
 _FlagT_M_flags;
 _StateIdT _M_start_state;
 _SizeT_M_subexpr_count;
@@ -219,7 +219,7 @@ namespace __detail
 
   template
 struct _NFA
-: _NFA_base, std::vector<_State>
+: _NFA_base, _GLIBCXX_STD_C::vector<_State>
 {
   typedef typename _TraitsT::char_type	_Char_type;
   typedef _State<_Char_type>		_StateT;
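
To make the failure mode in this thread concrete, here is a deliberately
simplified sketch.  It does not use libstdc++ internals: the namespaces and
types are hypothetical stand-ins for the normal and Debug Mode vector and
for a class such as _NFA whose name does not depend on which of the two it
contains.  In a real program both definitions would carry the same name in
different translation units, which is the ODR violation; here they live in
separate namespaces so that the example stays well-formed.

```cpp
#include <cstddef>

namespace normal_mode
{
  // Stand-in for the plain vector: just a begin/end pair.
  struct vec { int *begin; int *end; };
}

namespace debug_mode
{
  // Stand-in for the Debug Mode vector: extra bookkeeping member.
  struct vec { int *begin; int *end; void *iterators; };
}

// What library A and library B would each see as "the" automaton type.
namespace lib_a { struct nfa { normal_mode::vec states; }; }
namespace lib_b { struct nfa { debug_mode::vec states; }; }

// If both were named identically, code compiled against one layout
// could be run on an object that has the other layout.
constexpr bool
layouts_differ ()
{
  return sizeof (lib_a::nfa) != sizeof (lib_b::nfa);
}

static_assert (layouts_differ (), "the two layouts are incompatible");
```

Using the non-debug vector unconditionally, as the commit above does,
removes the layout difference for the automaton type.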


[PATCH, testsuite]: Add testcase to check for psadbw generation (PR 85693)

2018-05-08 Thread Uros Bizjak
Hello!

The testcase checks if the compiler is able to vectorize with psadbw
insn on x86 targets.

2018-05-08  Uros Bizjak  

PR target/85693
* gcc.target/i386/pr85693.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: gcc.target/i386/pr85693.c
===
--- gcc.target/i386/pr85693.c   (nonexistent)
+++ gcc.target/i386/pr85693.c   (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2 -ftree-vectorize" } */
+
+#define N 1024
+
+int abs (int);
+
+unsigned char pix1[N], pix2[N];
+
+int foo (void)
+{
+  int i_sum = 0;
+  int i;
+
+  for (i = 0; i < N; i++)
+i_sum += abs (pix1[i] - pix2[i]);
+
+  return i_sum;
+}
+
+/* { dg-final { scan-assembler "psadbw" } } */


Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-08 Thread Jason Merrill
On Sun, May 6, 2018 at 1:56 PM, Jakub Jelinek  wrote:
> --- gcc/c-family/c-common.c.jj  2018-03-27 21:58:55.598502113 +0200
> +++ gcc/c-family/c-common.c 2018-05-05 10:55:47.951600802 +0200
> @@ -6171,7 +6171,7 @@ c_common_to_target_charset (HOST_WIDE_IN
> traditional rendering of offsetof as a macro.  Return the folded result.  */
>
>  tree
> -fold_offsetof_1 (tree expr, enum tree_code ctx)
> +fold_offsetof_1 (tree expr, bool nonptr, enum tree_code ctx)

The comment needs to document the NONPTR parameter.

> @@ -6287,7 +6291,7 @@ fold_offsetof_1 (tree expr, enum tree_co
>  tree
>  fold_offsetof (tree expr)
>  {
> -  return convert (size_type_node, fold_offsetof_1 (expr));
> +  return convert (size_type_node, fold_offsetof_1 (expr, true));
>  }

Since all the uses of fold_offsetof_1 involve converting to a particular
type, I wonder about wrapping it so that the argument for nonptr is
determined from that type.

Jason


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")

2018-05-08 Thread Jason Merrill
On Fri, Apr 20, 2018 at 1:46 PM, Paolo Carlini  wrote:
> Hi,
>
> in this error-recovery regression, after sensible diagnostic about "two or
> more data types in declaration..." we get confused, we issue a cryptic -
> but useful hint to somebody working on the present bug ;) - "template
> definition of non-template" error and we finally crash. I think the issue
> here is that we want to use abort_fully_implicit_template as part of the
> error recovery done by cp_parser_parameter_declaration_list, when the loop
> is exited early after a cp_parser_parameter_declaration internally called
> synthesize_implicit_template_parm. Indeed, if we do that we get the same
> error recovery behavior we get for the same testcase modified to not use an
> auto parameter (likewise for related testcases):
>
> struct a {
>   void b() {}
>void c(auto = [] {
> if (a a(int int){})
>   ;
>   }) {}
> };

Hmm, the erroneous declaration is within the lambda body, so messing
with whether c is a template seems wrong.

Jason


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")

2018-05-08 Thread Paolo Carlini

Hi,

On 08/05/2018 19:15, Jason Merrill wrote:

> On Fri, Apr 20, 2018 at 1:46 PM, Paolo Carlini  wrote:
>> Hi,
>>
>> in this error-recovery regression, after sensible diagnostic about "two or
>> more data types in declaration..." we get confused, we issue a cryptic -
>> but useful hint to somebody working on the present bug ;) - "template
>> definition of non-template" error and we finally crash. I think the issue
>> here is that we want to use abort_fully_implicit_template as part of the
>> error recovery done by cp_parser_parameter_declaration_list, when the loop
>> is exited early after a cp_parser_parameter_declaration internally called
>> synthesize_implicit_template_parm. Indeed, if we do that we get the same
>> error recovery behavior we get for the same testcase modified to not use an
>> auto parameter (likewise for related testcases):
>>
>> struct a {
>>   void b() {}
>>   void c(auto = [] {
>>     if (a a(int int){})
>>       ;
>>   }) {}
>> };
>
> Hmm, the erroneous declaration is within the lambda body, so messing
> with whether c is a template seems wrong.

I'm sorry, I don't follow: why do you think the issue has to do with c? The
issue happens while we are parsing:


    a a(int auto)

in the original testcase, in particular the parameters. We set 
parser->fully_implicit_function_template_p in 
synthesize_implicit_template_parm, which in turn is called by 
cp_parser_simple_type_specifier when it sees the auto. As I said, we 
don't have the bug for the snippet you quote above, which is identical 
to that attached in the bug but for the auto in the declaration of a:


struct a {
  void b() {}
  void c(void (*) () = [] {
  if (a a(int auto){})
  ;
  }) {}
};

Paolo.


C++ PATCH for c++/85695, rejects-valid with constexpr if

2018-05-08 Thread Marek Polacek
Here we were confused by a typedef so the "== boolean_type_node" check didn't
work as intended.  We can use TYPE_MAIN_VARIANT to see the real type.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-05-08  Marek Polacek  

PR c++/85695
* semantics.c (finish_if_stmt_cond): See through typedefs.

* g++.dg/cpp1z/constexpr-if22.C: New test.

diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 2b2b51b2a7e..195286ca751 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -736,7 +736,7 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
   && !instantiation_dependent_expression_p (cond)
   /* Wait until instantiation time, since only then COND has been
 converted to bool.  */
-  && TREE_TYPE (cond) == boolean_type_node)
+  && TYPE_MAIN_VARIANT (TREE_TYPE (cond)) == boolean_type_node)
 {
   cond = instantiate_non_dependent_expr (cond);
   cond = cxx_constant_value (cond, NULL_TREE);
diff --git gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
index e69de29bb2d..76f0c73476b 100644
--- gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
@@ -0,0 +1,21 @@
+// PR c++/85695
+// { dg-options -std=c++17 }
+
+template 
+struct integral_constant {
+using value_type = T;
+static constexpr const value_type value = v;
+constexpr operator value_type (void) const { return value; }
+};
+template  struct is_trivial
+: public integral_constant {};
+
+template 
+T clone_object (const T& p)
+{
+if constexpr (is_trivial::value)
+return p;
+else
+return p.clone();
+}
+int main (void) { return clone_object(0); }
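
A self-contained sketch of the same idea, assuming C++17.  It is not a
copy of the new test: the names constant and is_trivial_like are
hypothetical.  The point is that the condition's declared type is the
typedef-name value_type rather than bool spelled directly, which is the
situation the TYPE_MAIN_VARIANT check is meant to see through.

```cpp
#include <type_traits>

template <typename T, T v>
struct constant
{
  using value_type = T;  // a typedef-name; for T = bool it names bool
  static constexpr value_type value = v;
};

template <typename T>
struct is_trivial_like : constant<bool, std::is_trivial<T>::value> {};

template <typename T>
T
clone_object (const T &p)
{
  // The discarded branch is not instantiated, so int needs no clone().
  if constexpr (is_trivial_like<T>::value)
    return p;
  else
    return p.clone ();
}

int
clone_zero ()
{
  return clone_object (0);
}
```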


Re: C++ PATCH for c++/85695, rejects-valid with constexpr if

2018-05-08 Thread Jason Merrill
OK for trunk and 8.

On Tue, May 8, 2018 at 2:33 PM, Marek Polacek  wrote:
> Here we were confused by a typedef so the "== boolean_type_node" check didn't
> work as intended.  We can use TYPE_MAIN_VARIANT to see the real type.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-05-08  Marek Polacek  
>
> PR c++/85695
> * semantics.c (finish_if_stmt_cond): See through typedefs.
>
> * g++.dg/cpp1z/constexpr-if22.C: New test.
>
> diff --git gcc/cp/semantics.c gcc/cp/semantics.c
> index 2b2b51b2a7e..195286ca751 100644
> --- gcc/cp/semantics.c
> +++ gcc/cp/semantics.c
> @@ -736,7 +736,7 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
>&& !instantiation_dependent_expression_p (cond)
>/* Wait until instantiation time, since only then COND has been
>  converted to bool.  */
> -  && TREE_TYPE (cond) == boolean_type_node)
> +  && TYPE_MAIN_VARIANT (TREE_TYPE (cond)) == boolean_type_node)
>  {
>cond = instantiate_non_dependent_expr (cond);
>cond = cxx_constant_value (cond, NULL_TREE);
> diff --git gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
> index e69de29bb2d..76f0c73476b 100644
> --- gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
> +++ gcc/testsuite/g++.dg/cpp1z/constexpr-if22.C
> @@ -0,0 +1,21 @@
> +// PR c++/85695
> +// { dg-options -std=c++17 }
> +
> +template 
> +struct integral_constant {
> +using value_type = T;
> +static constexpr const value_type value = v;
> +constexpr operator value_type (void) const { return value; }
> +};
> +template  struct is_trivial
> +: public integral_constant {};
> +
> +template 
> +T clone_object (const T& p)
> +{
> +if constexpr (is_trivial::value)
> +return p;
> +else
> +return p.clone();
> +}
> +int main (void) { return clone_object(0); }


Re: [C++ Patch] PR 84588 ("[8 Regression] internal compiler error: Segmentation fault (contains_struct_check())")

2018-05-08 Thread Jason Merrill
On Tue, May 8, 2018 at 1:46 PM, Paolo Carlini  wrote:
> Hi,
>
> On 08/05/2018 19:15, Jason Merrill wrote:
>>
>> On Fri, Apr 20, 2018 at 1:46 PM, Paolo Carlini 
>> wrote:
>>>
>>> Hi,
>>>
>>> in this error-recovery regression, after sensible diagnostic about "two
>>> or
>>> more data types in declaration..." we get confused, we issue a cryptic -
>>> but useful hint to somebody working on the present bug ;) - "template
>>> definition of non-template" error and we finally crash. I think the issue
>>> here is that we want to use abort_fully_implicit_template as part of the
>>> error recovery done by cp_parser_parameter_declaration_list, when the
>>> loop
>>> is exited early after a cp_parser_parameter_declaration internally called
>>> synthesize_implicit_template_parm. Indeed, if we do that we get the same
>>> error recovery behavior we get for the same testcase modified to not use
>>> an
>>> auto parameter (likewise for related testcases):
>>>
>>> struct a {
>>>void b() {}
>>> void c(auto = [] {
>>>  if (a a(int int){})
>>>;
>>>}) {}
>>> };
>>
>> Hmm, the erroneous declaration is within the lambda body, so messing
>> with whether c is a template seems wrong.
>
> I'm sorry, I don't follow: why do you think the issue has to do with c? The
> issue happens while we are parsing:
>
> a a(int auto)
>
> in the original testcase, in particular the parameters. We set
> parser->fully_implicit_function_template_p in
> synthesize_implicit_template_parm, which in turn is called by
> cp_parser_simple_type_specifier when it sees the auto. As I said, we don't
> have the bug for the snippet you quote above, which is identical to that
> attached in the bug but for the auto in the declaration of a:
>
> struct a {
>   void b() {}
>   void c(void (*) () = [] {
>   if (a a(int auto){})
>   ;
>   }) {}
> };

Ah, I was assuming the quoted testcase was the one in the PR.  The patch is OK.

Jason


Re: Debug Mode ENH 3/4: Add backtrace

2018-05-08 Thread François Dumont

On 08/05/2018 17:27, Jonathan Wakely wrote:

On 07/05/18 22:20 +0200, François Dumont wrote:

Hi

    Here is the patch to add backtrace info to debug assertion failure output.


Example:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/vector:188: 


In function:
    std::__debug::vector<_Tp, _Allocator>::vector(_InputIterator,
    _InputIterator, const _Allocator&) [with _InputIterator =
std::reverse_iterator<__gnu_debug::_Safe_tagged_iterator<__gnu_cxx::__normal_iterator


    std::vector >, std::__debug::vector,
    std::random_access_iterator_tag> >;  = 
void; _Tp

    = int; _Allocator = std::allocator]

Backtrace:
    ./debug_neg.exe() [0x4020c1]
    ./debug_neg.exe() [0x400e59]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) 
[0x7f13fc56e830]

    ./debug_neg.exe() [0x400eb9]

I tried to use addr2line on the output address and it worked fine.

Tested under Linux x86_64.

I'll commit tomorrow if not told otherwise.

    * src/c++11/debug.cc [_GLIBCXX_HAVE_EXECINFO_H]: Include execinfo.h.
    [_GLIBCXX_HAVE_EXECINFO_H](_Error_formatter::_M_error): Render backtrace.


Did you look into using libbacktrace? That resolves the addresses and
prints nice symbols. See the output of AddressSanitizer for what it
looks like (I think that uses libbacktrace).



I'll go with this version for now but I'll look into libbacktrace.

It will perhaps be an occasion to play with autoconf and related tools to
find out whether I can use libbacktrace.
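
For context, a minimal sketch of the glibc <execinfo.h> interface that
produces output like the backtrace shown earlier in this message.  It is
not the code from the patch; the function name and the buffer size of 32
frames are arbitrary, and it assumes a platform that provides execinfo.h.

```cpp
#include <execinfo.h>
#include <unistd.h>

// Hypothetical helper: write the current call stack to stderr and
// return the number of frames captured.
int
print_backtrace ()
{
  void *frames[32];
  int n = backtrace (frames, 32);
  // Writes one line per frame, ending in the bracketed return address,
  // directly to the file descriptor.
  backtrace_symbols_fd (frames, n, STDERR_FILENO);
  return n;
}
```

The bracketed addresses are what a tool such as addr2line takes as input.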




Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-08 Thread Jakub Jelinek
On Tue, May 08, 2018 at 01:03:00PM -0400, Jason Merrill wrote:
> On Sun, May 6, 2018 at 1:56 PM, Jakub Jelinek  wrote:
> > --- gcc/c-family/c-common.c.jj  2018-03-27 21:58:55.598502113 +0200
> > +++ gcc/c-family/c-common.c 2018-05-05 10:55:47.951600802 +0200
> > @@ -6171,7 +6171,7 @@ c_common_to_target_charset (HOST_WIDE_IN
> > traditional rendering of offsetof as a macro.  Return the folded result.  */
> >
> >  tree
> > -fold_offsetof_1 (tree expr, enum tree_code ctx)
> > +fold_offsetof_1 (tree expr, bool nonptr, enum tree_code ctx)
> 
> The comment needs to document the NONPTR parameter.

Ok.

> > @@ -6287,7 +6291,7 @@ fold_offsetof_1 (tree expr, enum tree_co
> >  tree
> >  fold_offsetof (tree expr)
> >  {
> > -  return convert (size_type_node, fold_offsetof_1 (expr));
> > +  return convert (size_type_node, fold_offsetof_1 (expr, true));
> >  }
> 
> Since all the uses of fold_offsetof_1 involve converting to a particular
> type, I wonder about wrapping it so that the argument for nonptr is
> determined from that type.

So like this?

2018-05-08  Jakub Jelinek  

PR c++/85662
* c-common.h (fold_offsetof_1): Add TYPE argument.
* c-common.c (fold_offsetof_1): Add TYPE argument, if it is not a
pointer type, convert the pointer constant to TYPE and use size_binop
with PLUS_EXPR instead of fold_build_pointer_plus.  Adjust recursive
calls.
(fold_offsetof): Pass size_type_node as TYPE to fold_offsetof_1.

* c-fold.c (c_fully_fold_internal): Pass TREE_TYPE (expr) as TYPE
to fold_offsetof_1.
* c-typeck.c (build_unary_op): Pass argtype as TYPE to fold_offsetof_1.

* cp-gimplify.c (cp_fold): Pass TREE_TYPE (x) as TYPE to
fold_offsetof_1.

* g++.dg/ext/offsetof2.C: New test.

--- gcc/c-family/c-common.h.jj  2018-05-06 23:12:49.185619717 +0200
+++ gcc/c-family/c-common.h 2018-05-08 21:47:40.976737821 +0200
@@ -1033,7 +1033,7 @@ extern bool c_dump_tree (void *, tree);
 
 extern void verify_sequence_points (tree);
 
-extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
+extern tree fold_offsetof_1 (tree, tree, tree_code ctx = ERROR_MARK);
 extern tree fold_offsetof (tree);
 
 extern int complete_array_type (tree *, tree, bool);
--- gcc/c-family/c-common.c.jj  2018-05-06 23:12:49.135619681 +0200
+++ gcc/c-family/c-common.c 2018-05-08 21:56:24.635088315 +0200
@@ -6168,10 +6168,12 @@ c_common_to_target_charset (HOST_WIDE_IN
 
 /* Fold an offsetof-like expression.  EXPR is a nested sequence of component
references with an INDIRECT_REF of a constant at the bottom; much like the
-   traditional rendering of offsetof as a macro.  Return the folded result.  */
+   traditional rendering of offsetof as a macro.  TYPE is the desired type of
+   the whole expression to which it will be converted afterwards.
+   Return the folded result.  */
 
 tree
-fold_offsetof_1 (tree expr, enum tree_code ctx)
+fold_offsetof_1 (tree type, tree expr, enum tree_code ctx)
 {
   tree base, off, t;
   tree_code code = TREE_CODE (expr);
@@ -6196,10 +6198,12 @@ fold_offsetof_1 (tree expr, enum tree_co
  error ("cannot apply %<offsetof%> to a non constant address");
  return error_mark_node;
}
+  if (!POINTER_TYPE_P (type))
+   return convert (type, TREE_OPERAND (expr, 0));
   return TREE_OPERAND (expr, 0);
 
 case COMPONENT_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof_1 (type, TREE_OPERAND (expr, 0), code);
   if (base == error_mark_node)
return base;
 
@@ -6216,7 +6220,7 @@ fold_offsetof_1 (tree expr, enum tree_co
   break;
 
 case ARRAY_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
+  base = fold_offsetof_1 (type, TREE_OPERAND (expr, 0), code);
   if (base == error_mark_node)
return base;
 
@@ -6273,12 +6277,14 @@ fold_offsetof_1 (tree expr, enum tree_co
   /* Handle static members of volatile structs.  */
   t = TREE_OPERAND (expr, 1);
   gcc_checking_assert (VAR_P (get_base_address (t)));
-  return fold_offsetof_1 (t);
+  return fold_offsetof_1 (type, t);
 
 default:
   gcc_unreachable ();
 }
 
+  if (!POINTER_TYPE_P (type))
+return size_binop (PLUS_EXPR, base, convert (type, off));
   return fold_build_pointer_plus (base, off);
 }
 
@@ -6287,7 +6293,7 @@ fold_offsetof_1 (tree expr, enum tree_co
 tree
 fold_offsetof (tree expr)
 {
-  return convert (size_type_node, fold_offsetof_1 (expr));
+  return convert (size_type_node, fold_offsetof_1 (size_type_node, expr));
 }
 
 
--- gcc/c/c-fold.c.jj   2018-01-17 22:00:12.310228253 +0100
+++ gcc/c/c-fold.c  2018-05-08 21:52:43.303940175 +0200
@@ -473,7 +473,8 @@ c_fully_fold_internal (tree expr, bool i
  && (op1 = get_base_address (op0)) != NULL_TREE
  && INDIRECT_REF_P (op1)
  && TREE_CONSTANT (TREE_OPERAND (op1, 0)))
-   ret = fold_convert_loc (loc, TREE_TYPE (expr), 

Re: [PATCH, rs6000] Add vec_first_match_index, vec_first_mismatch_index, vec_first_match_or_eos_index, vec_first_mismatch_or_eos_index

2018-05-08 Thread Segher Boessenkool
Hi Carl,

Just one tiny thing:

On Mon, Apr 30, 2018 at 09:05:23AM -0700, Carl Love wrote:
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-8-p9-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-8-p9-runnable.c
> new file mode 100644
> index 0000000..4379d41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-8-p9-runnable.c
> @@ -0,0 +1,1044 @@
> +/* { dg-do run { target { powerpc*-*-* &&  p9vector_hw } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power9" } } */
> +/* { dg-options "-mcpu=power9 -O2" } */
> +
> +#include 
> +#include 
> +#include 

You never need both  and , because the former
includes the latter.

Okay for trunk with or without that.  Thanks!


Segher


Re: Incremental LTO linking part 2: lto-plugin support

2018-05-08 Thread H.J. Lu
On Tue, May 8, 2018 at 8:14 AM, Jan Hubicka  wrote:
> Hi,
> with lto, incremental linking can be meaningfully done in three ways:
>  1) read LTO file and produce non-LTO .o file
> this is current behaviour of gcc -r or ld -r with plugin
>  2) read LTO files and merge section for later LTO
> this is current behaviour of ld -r w/o plugin
>  3) read LTO files into the compiler, link them and produce
> incrementally linked LTO object.
>
> 3 makes most sense and I am making it the new default for gcc -r. For testing
> purposes, and perhaps in order to have a tool to turn an LTO object into a
> real object, we want to have 1) available as well.  GCC currently has a
> -flinker-output option that decides between modes; it is set by the linker
> plugin and can be overridden by the user (I forgot to document this).
>
> I am targeting for -flinker-output=rel to be incremental linking into LTO
> and adding -flinker-output=nolto-rel for 1).
>
> The main limitation of 2 and 3 is that you can not link LTO and non-LTO
> object files together.  For 2 HJ's binutils patchset has support and I think
> it can be extended to handle 3 as well. But with default binutils we want
> to warn users.  This patch implements the warning (and prevents the linker
> plugin from adding redundant linker-output options).


My users/hjl/lto-mixed/master branch is quite flexible.  I can extend
it if needed.

-- 
H.J.


Re: Incremental LTO linking part 2: lto-plugin support

2018-05-08 Thread Jan Hubicka
> On Tue, May 8, 2018 at 8:14 AM, Jan Hubicka  wrote:
> > Hi,
> > with lto, incremental linking can be meaningfully done in three ways:
> >  1) read LTO file and produce non-LTO .o file
> > this is current behaviour of gcc -r or ld -r with plugin
> >  2) read LTO files and merge section for later LTO
> > this is current behaviour of ld -r w/o plugin
> >  3) read LTO files into the compiler, link them and produce
> > incrementally linked LTO object.
> >
> > 3 makes most sense and I am making it the new default for gcc -r. For testing
> > purposes, and perhaps in order to have a tool to turn an LTO object into a
> > real object, we want to have 1) available as well.  GCC currently has a
> > -flinker-output option that decides between modes; it is set by the linker
> > plugin and can be overridden by the user (I forgot to document this).
> >
> > I am targeting for -flinker-output=rel to be incremental linking into LTO
> > and adding -flinker-output=nolto-rel for 1).
> >
> > The main limitation of 2 and 3 is that you can not link LTO and non-LTO
> > object files together.  For 2 HJ's binutils patchset has support and I think
> > it can be extended to handle 3 as well. But with default binutils we want
> > to warn users.  This patch implements the warning (and prevents the linker
> > plugin from adding redundant linker-output options).
> 
> 
> My users/hjl/lto-mixed/master branch is quite flexible.  I can extend
> it if needed.

I think once the main patchset settles down we could add a way to tell
lto-plugin whether combined lto+non-lto .o files are supported by the linker,
and silence the warning.

Honza
> 
> -- 
> H.J.


Re: [PATCH] RISC-V: Use new linker emulations for glibc ABI.

2018-05-08 Thread Jim Wilson
On Fri, May 4, 2018 at 2:45 PM, Jim Wilson  wrote:
> I've submitted a binutils patch that adds some new linker emulations to fix
> a linker problem with library paths.  The rv64/lp64d linker looks in /lib64
> when glibc says it should look in /lib64/lp64d.  To make the binutils patch
> work, I had to add 4 new emulations because we have 6 ABIs.  This patch
> modifies the compiler to use the new linker emulations in the linux port.
> This was done in a backwards compatible way, so the linker still looks in the
> original dir after the ABI specific dir, and I didn't change the emulation
> names for the default lp64d and ilp32d ABIs.

Committed, with a corrected ChangeLog entry.

* config/riscv/linux.h (MUSL_ABI_SUFFIX): Delete unnecessary backslash.
(LD_EMUL_SUFFIX): New.
(LINK_SPEC): Use it.

Jim


Re: [RFC] Configure and testsuite updates for ARM FDPIC target

2018-05-08 Thread Joseph Myers
On Mon, 7 May 2018, Christophe Lyon wrote:

> Roughly speaking, it is a matter of extending cases where we try to match
> $target or $host against *-linux*, or $host_os against linux*. In all these
> cases I conservatively chose to add arm*-*-uclinuxfdpiceabi or
> uclinuxfdpiceabi to avoid side-effects on other uclinux targets.

A lot of cases look like they should apply to all uclinux targets.  I 
think you need to decide case by case whether something should be for 
*-*-uclinux*, or whether it's genuinely specific to e.g. ELF shared 
libraries (in which case this isn't the only such uclinux target either - 
some use FDPIC ELF, some use other formats - but the complete list is 
nonobvious).  I think the default should be to use *-*-uclinux* unless you 
have a concrete reason this would be inappropriate in a particular place.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add ax_pthread.m4 for use in binutils-gdb

2018-05-08 Thread Joshua Watt
On Wed, Apr 18, 2018, 05:20 Pedro Alves  wrote:

> On 04/17/2018 11:10 PM, Joshua Watt wrote:
> > On Tue, 2018-04-17 at 22:50 +0100, Pedro Alves wrote:
> >> On 04/17/2018 06:24 PM, Joshua Watt wrote:
> >>> Ping? I'd really like to get this in binutils, which apparently
> >>> requires getting it here first.
> >>
> >> I think it would help if you mentioned what this is and
> >> what is the intended use case.
> >
> > Ah, that would probably be helpful! Yes, this was discussed on the
> > binutils mailing list, see:
> > https://sourceware.org/ml/binutils/2018-02/msg00260.html
> >
> > In short summary: the gold linker doesn't currently build for mingw,
> > but only because it is attempting to link against libpthread
> > incorrectly on that platform. Instead of bringing in more specialized
> > logic to account for that, I opted to include the autotools
> > ax_pthread.m4 macro (this patch) that automatically handles discovering
> > pthreads on a wide variety of platforms and compilers, including mingw.
> >
> > binutils slaves its config/ directory to GCC, so the patch is required
> > to be committed here first, and then it will be ported over there.
>
> Thanks, that helps indeed.
>
> I agree that the ax_pthread.m4 approach makes sense.  Better to use
> a field-tested macro than reinvent the wheel.  We're using other
> files from the autoconf-archive archive already, for similar reasons
> (e.g., config/ax_check_define.m4, and gdb/ax_cxx_compile_stdcxx.m4).
>
> Since GCC won't be using it (yet at least, but it's conceivable it
> could make use of it in future), there should be no harm in
> installing it even if GCC is in stage 4, IMO.
>
> I don't have the authority to approve it, though.
>
> Thanks,
> Pedro Alves
>

Ping (again)

>


Re: Debug Mode ENH 3/4: Add backtrace

2018-05-08 Thread Ian Lance Taylor via gcc-patches
On Tue, May 8, 2018 at 12:54 PM, François Dumont  wrote:
>
> I'll go with this version for now but I'll look into libbacktrace.
>
> It will perhaps be an occasion to play with autoconf and related tools to find
> out if I can use libbacktrace.

In GCC libgo and libgfortran already use libbacktrace, so there are
good examples to copy.

Ian


Re: [PATCH, rs6000] Map dcbtst, dcbtt to n2=0 for __builtin_prefetch builtin.

2018-05-08 Thread Carl Love
Segher:

On Tue, 2018-05-08 at 11:24 -0500, Segher Boessenkool wrote:
> What ISA version is required for the TH field to do anything?  Will
> it work on older machines too (just ignored)?  What assembler version
> is required?

I went back and checked.  The mnemonics

  dcbtt RA,RB        dcbt for TH value of 0b10000
  dcbtstt RA,RB      dcbtst for TH value of 0b10000

were introduced in ISA 2.06.

There is another pair of mnemonics

  dcbtds RA,RB,TH    dcbt for TH values of 0b00000 or
                     0b01000 - 0b01111;
                     other TH values are invalid.

  dcbtstds RA,RB,TH  dcbtst for TH values of 0b00000
                     or 0b01000 - 0b01010;
                     other TH values are invalid.

that could be used instead.  These are both supported starting with 
ISA 2.05.  The dcbtds is actually supported back to ISA 2.03 but the
dcbtstds is not.

I was looking for some kind of conditional compilation for Power 7 or
newer.  In rs6000.h there are defines for the assembler supporting the
popcount byte instruction, 

#ifndef HAVE_AS_POPCNTB 
#undef  TARGET_POPCNTB  
#define TARGET_POPCNTB 0
#endif  

I haven't found anything that I could use specifically for Power 7 and
newer.  Not sure if it is worth defining a HAVE_AS_DCBTT to do
something similar?  Seems a bit overkill.  Thoughts on how to limit
the generation of dcbtt and dcbtstt to Power 7 or newer?

   Carl Love
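
For readers unfamiliar with the builtin under discussion, here is a minimal sketch of how user code passes the locality argument that this thread is about. `sum_streaming`, `fill_streaming`, and the prefetch distance of 16 are hypothetical choices for illustration. Which instruction each call becomes (dcbt, dcbtst, or the transient forms above) depends on the target, the -mcpu setting, and the compiler version, so no particular mapping is assumed here.

```cpp
#include <cstddef>

// Sum an array that is read once.  The second argument 0 means "prefetch
// for read"; the third argument 0 tells GCC the data has no temporal
// locality.  Both must be compile-time constants.
long sum_streaming(const int *data, std::size_t n) {
  const std::size_t ahead = 16;  // prefetch distance, an arbitrary choice
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + ahead < n)
      __builtin_prefetch(&data[i + ahead], 0, 0);
    total += data[i];
  }
  return total;
}

// Fill an array that is written once.  The second argument 1 means
// "prefetch for write"; locality is again 0.
void fill_streaming(int *data, std::size_t n, int value) {
  const std::size_t ahead = 16;  // same arbitrary distance as above
  for (std::size_t i = 0; i < n; ++i) {
    if (i + ahead < n)
      __builtin_prefetch(&data[i + ahead], 1, 0);
    data[i] = value;
  }
}
```

The prefetch is only a hint, so both functions compute the same result on targets where the builtin expands to nothing.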



C++ PATCH for c++/85706, class deduction in decltype

2018-05-08 Thread Jason Merrill
With -fconcepts, type_uses_auto wants to look deeper into a type,
since the Concepts TS allows concept names and auto to be used more
freely in a type.  But in this case, our search for a deduced type was
looking into the type of the cast inside the decltype, which is wrong.

It turned out that for_each_template_parm_r didn't handle
DECLTYPE_TYPE at all, and then cp_walk_subtrees walked into its
operand.  Fixed by clearing *walk_subtrees whether or not we
explicitly walk the operand.  We also don't want to look at
non-deduced contexts when considering whether a type needs deducing.

Tested x86_64-pc-linux-gnu, applying to trunk and 8.
commit 716725435749074db5f5e7ef70b8950331e8315d
Author: Jason Merrill 
Date:   Tue May 8 17:39:10 2018 -0400

PR c++/85706 - class deduction under decltype

* pt.c (for_each_template_parm_r): Handle DECLTYPE_TYPE.  Clear
*walk_subtrees whether or not we walked into the operand.
(type_uses_auto): Only look at deduced contexts.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c604f46f742..180dfd6861c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9829,6 +9829,7 @@ for_each_template_parm_r (tree *tp, int *walk_subtrees, void *d)
   break;
 
 case TYPEOF_TYPE:
+case DECLTYPE_TYPE:
 case UNDERLYING_TYPE:
   if (pfd->include_nondeduced_p
 	  && for_each_template_parm (TYPE_VALUES_RAW (t), fn, data,
@@ -9836,6 +9837,7 @@ for_each_template_parm_r (tree *tp, int *walk_subtrees, void *d)
  pfd->include_nondeduced_p,
  pfd->any_fn))
 	return error_mark_node;
+  *walk_subtrees = false;
   break;
 
 case FUNCTION_DECL:
@@ -26862,7 +26864,7 @@ type_uses_auto (tree type)
 	 them.  */
   if (uses_template_parms (type))
 	return for_each_template_parm (type, is_auto_r, /*data*/NULL,
-   /*visited*/NULL, /*nondeduced*/true);
+   /*visited*/NULL, /*nondeduced*/false);
   else
 	return NULL_TREE;
 }
diff --git a/gcc/testsuite/g++.dg/concepts/class-deduction2.C b/gcc/testsuite/g++.dg/concepts/class-deduction2.C
new file mode 100644
index 00000000000..286e59a5039
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/class-deduction2.C
@@ -0,0 +1,9 @@
+// PR c++/85706
+// { dg-additional-options "-std=c++17 -fconcepts" }
+
+template struct S {
+  S(T);
+};
+
+template
+auto f() -> decltype(S(42)); // error
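
As a standalone illustration of the language feature involved (not the testcase from the patch), the sketch below uses class template argument deduction inside decltype in a trailing return type. It assumes a C++17 compiler and does not use -fconcepts; `Box` and `make_box` are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical class template; its constructor lets the compiler deduce T.
template <class T>
struct Box {
  Box(T v) : value(v) {}
  T value;
};

// The trailing return type names the bare template Box.  Class template
// argument deduction inside decltype turns Box(42) into Box<int>.
template <class = void>
auto make_box() -> decltype(Box(42)) {
  return Box(42);
}

// Compile-time check that the deduced return type is Box<int>.
static_assert(std::is_same<decltype(make_box()), Box<int>>::value,
              "Box(42) deduces Box<int>");
```

The deduction has to be performed on the operand of decltype, not on the decltype type itself.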


Re: [C++ PATCH] Fix offsetof constexpr handling (PR c++/85662)

2018-05-08 Thread Jason Merrill
On Tue, May 8, 2018 at 4:04 PM, Jakub Jelinek  wrote:
> On Tue, May 08, 2018 at 01:03:00PM -0400, Jason Merrill wrote:
>> On Sun, May 6, 2018 at 1:56 PM, Jakub Jelinek  wrote:
>> > --- gcc/c-family/c-common.c.jj  2018-03-27 21:58:55.598502113 +0200
>> > +++ gcc/c-family/c-common.c 2018-05-05 10:55:47.951600802 +0200
>> > @@ -6171,7 +6171,7 @@ c_common_to_target_charset (HOST_WIDE_IN
>> > traditional rendering of offsetof as a macro.  Return the folded 
>> > result.  */
>> >
>> >  tree
>> > -fold_offsetof_1 (tree expr, enum tree_code ctx)
>> > +fold_offsetof_1 (tree expr, bool nonptr, enum tree_code ctx)
>>
>> The comment needs to document the NONPTR parameter.
>
> Ok.
>
>> > @@ -6287,7 +6291,7 @@ fold_offsetof_1 (tree expr, enum tree_co
>> >  tree
>> >  fold_offsetof (tree expr)
>> >  {
>> > -  return convert (size_type_node, fold_offsetof_1 (expr));
>> > +  return convert (size_type_node, fold_offsetof_1 (expr, true));
>> >  }
>>
>> Since all the uses of fold_offsetof_1 involve converting to a particular
>> type, I wonder about wrapping it so that the argument for nonptr is
>> determined from that type.
>
> So like this?
>
> 2018-05-08  Jakub Jelinek  
>
> PR c++/85662
> * c-common.h (fold_offsetof_1): Add TYPE argument.
> * c-common.c (fold_offsetof_1): Add TYPE argument, if it is not a
> pointer type, convert the pointer constant to TYPE and use size_binop
> with PLUS_EXPR instead of fold_build_pointer_plus.  Adjust recursive
> calls.
> (fold_offsetof): Pass size_type_node as TYPE to fold_offsetof_1.
>
> * c-fold.c (c_fully_fold_internal): Pass TREE_TYPE (expr) as TYPE
> to fold_offsetof_1.
> * c-typeck.c (build_unary_op): Pass argtype as TYPE to 
> fold_offsetof_1.
>
> * cp-gimplify.c (cp_fold): Pass TREE_TYPE (x) as TYPE to
> fold_offsetof_1.
>
> * g++.dg/ext/offsetof2.C: New test.
>
> --- gcc/c-family/c-common.h.jj  2018-05-06 23:12:49.185619717 +0200
> +++ gcc/c-family/c-common.h 2018-05-08 21:47:40.976737821 +0200
> @@ -1033,7 +1033,7 @@ extern bool c_dump_tree (void *, tree);
>
>  extern void verify_sequence_points (tree);
>
> -extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
> +extern tree fold_offsetof_1 (tree, tree, tree_code ctx = ERROR_MARK);
>  extern tree fold_offsetof (tree);
>
>  extern int complete_array_type (tree *, tree, bool);
> --- gcc/c-family/c-common.c.jj  2018-05-06 23:12:49.135619681 +0200
> +++ gcc/c-family/c-common.c 2018-05-08 21:56:24.635088315 +0200
> @@ -6168,10 +6168,12 @@ c_common_to_target_charset (HOST_WIDE_IN
>
>  /* Fold an offsetof-like expression.  EXPR is a nested sequence of component
> references with an INDIRECT_REF of a constant at the bottom; much like the
> -   traditional rendering of offsetof as a macro.  Return the folded result.  
> */
> +   traditional rendering of offsetof as a macro.  TYPE is the desired type of
> +   the whole expression to which it will be converted afterwards.
> +   Return the folded result.  */
>
>  tree
> -fold_offsetof_1 (tree expr, enum tree_code ctx)
> +fold_offsetof_1 (tree type, tree expr, enum tree_code ctx)
>  {
>tree base, off, t;
>tree_code code = TREE_CODE (expr);
> @@ -6196,10 +6198,12 @@ fold_offsetof_1 (tree expr, enum tree_co
>   error ("cannot apply %<offsetof%> to a non constant address");
>   return error_mark_node;
> }
> +  if (!POINTER_TYPE_P (type))
> +   return convert (type, TREE_OPERAND (expr, 0));
>return TREE_OPERAND (expr, 0);
>
>  case COMPONENT_REF:
> -  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
> +  base = fold_offsetof_1 (type, TREE_OPERAND (expr, 0), code);
>if (base == error_mark_node)
> return base;
>
> @@ -6216,7 +6220,7 @@ fold_offsetof_1 (tree expr, enum tree_co
>break;
>
>  case ARRAY_REF:
> -  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
> +  base = fold_offsetof_1 (type, TREE_OPERAND (expr, 0), code);
>if (base == error_mark_node)
> return base;
>
> @@ -6273,12 +6277,14 @@ fold_offsetof_1 (tree expr, enum tree_co
>/* Handle static members of volatile structs.  */
>t = TREE_OPERAND (expr, 1);
>gcc_checking_assert (VAR_P (get_base_address (t)));
> -  return fold_offsetof_1 (t);
> +  return fold_offsetof_1 (type, t);
>
>  default:
>gcc_unreachable ();
>  }
>
> +  if (!POINTER_TYPE_P (type))
> +return size_binop (PLUS_EXPR, base, convert (type, off));
>return fold_build_pointer_plus (base, off);
>  }
>
> @@ -6287,7 +6293,7 @@ fold_offsetof_1 (tree expr, enum tree_co
>  tree
>  fold_offsetof (tree expr)
>  {
> -  return convert (size_type_node, fold_offsetof_1 (expr));
> +  return convert (size_type_node, fold_offsetof_1 (size_type_node, expr));
>  }

Maybe add a type parameter that defaults to size_type_node...
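
As a rough illustration of the shape being suggested (not GCC code), the sketch below computes a member offset by plain integer arithmetic and gives the result type a default, so callers that want the usual size type need not spell it out. `Sample`, `element_offset`, and the `Result` template parameter are hypothetical names; the real function operates on GCC `tree` nodes, not on C++ types.

```cpp
#include <cstddef>

// Hypothetical aggregate used only for this illustration.
struct Sample {
  char tag;
  int values[4];
};

// Offset of values[index] within Sample, computed with integer arithmetic
// (base offset plus index times element size) rather than pointer
// arithmetic.  Result defaults to std::size_t, loosely mirroring a TYPE
// argument that defaults to size_type_node.
template <class Result = std::size_t>
constexpr Result element_offset(std::size_t index) {
  return static_cast<Result>(offsetof(Sample, values) + index * sizeof(int));
}

// offsetof yields a constant expression, so the result can be checked
// at compile time.
static_assert(element_offset(0) == offsetof(Sample, values),
              "index 0 is the start of the array");
```

A caller that needs a different integer type can write `element_offset<long>(1)`; everyone else gets `std::size_t` without an extra argument.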

>
> --- gcc/c/c-fold.c.jj   2018-01-17 22:00:12.310228253 +0100
> +++ gc