Re: libgo patch committed: Fix Makefile bug setting LD_LIBRARY_PATH

2017-11-22 Thread Bernhard Reutner-Fischer
On 21 November 2017 20:53:50 CET, Eric Botcazou  wrote:
>> This patch by Than McIntosh fixes a small bug in the libgo Makefile
>> recipe that constructs the directory from which to pick up
>> libgcc_s.so; the gccgo invocation with -print-libgcc-file-name was
>> missing the flags, which meant that for -m32 builds we'd see the
>> 64-bit libgcc dir.  Bootstrapped and ran Go testsuite on
>> x86_64-pc-linux-gnu.  Committed to mainline.
>
>Thanks, this helps on Solaris.  I have attached another fixlet: the -q
>option 
>of grep is rejected on Solaris.  Tested on Linux and Solaris.

How can that be?
grep -q was even required by SUSv2 from 1997 so Solaris should really support 
it.
What version if Solaris is that and what version of grep?

thanks, 


Re: libgo patch committed: Fix Makefile bug setting LD_LIBRARY_PATH

2017-11-22 Thread Eric Botcazou
> grep -q was even required by SUSv2 from 1997 so Solaris should really
> support it. What version if Solaris is that and what version of grep?

/usr/bin/grep on Solaris 10 (/usr/xpg4/bin/grep does support it).

-- 
Eric Botcazou


Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-22 Thread Christophe Lyon
Hi Marc,


On 20 November 2017 at 00:54, Marc Glisse  wrote:
> Hello,
>
> new version, regtested on powerpc64le-unknown-linux-gnu. The front-end parts
> are up for review.
>
> 2017-10-28  Marc Glisse  
>
> gcc/c/
> * c-fold.c (c_fully_fold_internal): Handle POINTER_DIFF_EXPR.
> * c-typeck.c (pointer_diff): Use POINTER_DIFF_EXPR.
>
> gcc/c-family/
> * c-pretty-print.c (pp_c_additive_expression,
> c_pretty_printer::expression): Handle POINTER_DIFF_EXPR.
>
> gcc/cp/
> * constexpr.c (cxx_eval_constant_expression,
> potential_constant_expression_1): Handle POINTER_DIFF_EXPR.
> * cp-gimplify.c (cp_fold): Likewise.
> * error.c (dump_expr): Likewise.
> * typeck.c (pointer_diff): Use POINTER_DIFF_EXPR.
>
> gcc/
> * doc/generic.texi: Document POINTER_DIFF_EXPR, update
> POINTER_PLUS_EXPR.
> * cfgexpand.c (expand_debug_expr): Handle POINTER_DIFF_EXPR.
> * expr.c (expand_expr_real_2): Likewise.
> * fold-const.c (const_binop, fold_addr_of_array_ref_difference,
> fold_binary_loc): Likewise.
> * match.pd (X-X, P+(Q-P), &D-P, (P+N)-P, P-(P+N), (P+M)-(P+N),
> P-Q==0, -(A-B), X-Z (A-B)+(C-A)): New transformations for POINTER_DIFF_EXPR, based on
> MINUS_EXPR transformations.
> * optabs-tree.c (optab_for_tree_code): Handle POINTER_DIFF_EXPR.
> * tree-cfg.c (verify_expr, verify_gimple_assign_binary): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * tree-pretty-print.c (dump_generic_node, op_code_prio,
> op_symbol_code): Likewise.
> * tree-vect-stmts.c (vectorizable_operation): Likewise.
> * vr-values.c (extract_range_from_binary_expr): Likewise.
> * varasm.c (initializer_constant_valid_p_1): Likewise.
>
> * tree.def: New tree code POINTER_DIFF_EXPR.
>

Since you committed this patch (r255021), my cross-builds of GCC for
aarch64-linux-gnu fail while building glibc:
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/bin/aarch64-none-linux-gnu-gcc
fetch-value.c -c -std=gnu99 -fgnu89-inline  -O2 -Wall -Winline -Wundef
-Wwrite-string
s -fmerge-all-constants -frounding-math -g -Wstrict-prototypes   -fPIC
   -I../include
-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gl
ibc-1/nptl_db  
-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1
 -I../sysdeps/unix/sysv/linux/aarch64/nptl  -I../sysdeps/unix/sysv/lin
ux/aarch64  -I../sysdeps/unix/sysv/linux/generic
-I../sysdeps/unix/sysv/linux/wordsize-64
-I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread
-I../sysdeps/pthread  -I
../sysdeps/unix/sysv/linux  -I../sysdeps/gnu  -I../sysdeps/unix/inet
-I../nptl/sysdeps/unix/sysv  -I../sysdeps/unix/sysv
-I../nptl/sysdeps/unix  -I../sysdeps/unix  -I../sysdeps
/posix  -I../sysdeps/aarch64/fpu  -I../sysdeps/aarch64/nptl
-I../sysdeps/aarch64  -I../sysdeps/wordsize-64
-I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64/wordsize-6
4  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32
-I../sysdeps/aarch64/soft-fp  -I../sysdeps/ieee754
-I../sysdeps/generic  -I../nptl  -I.. -I../libio -I. -nostdinc -i
system 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include
-isystem /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc
/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include-fixed -isystem
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include
 -D_LIBC_REE
NTRANT -include ../include/libc-symbols.h  -DPIC -DSHARED
-DNOT_IN_libc=1 -DIS_IN_libthread_db=1 -DIN_LIB=libthread_db-o
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccs
rc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os -MD -MP
-MF 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-
value.os.dt -MT
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os
during GIMPLE pass: vrp
fetch-value.c: In function '_td_locate_field':
fetch-value.c:44:1: internal compiler error: Segmentation fault
 _td_locate_field (td_thragent_t *ta,
 ^~~~
0xb87c75 crash_signal

/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:325
0x5b1934 contains_struct_check
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3459
0x5b1934 wi::to_wide(tree_node const*)
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:5247
0xf05557 vr_values::two_valued_val_range_p(tree_node*, tree_node**, tree_node**)

/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4051
0xf0ee48 vr_values::simplify_stmt_using_ranges(gimple_stmt_iterator*)

/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4115
0xd7068f substitute_and_fold_dom_wal

Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-22 Thread Marc Glisse

On Wed, 22 Nov 2017, Christophe Lyon wrote:


Since you committed this patch (r255021), my cross-builds of GCC for
aarch64-linux-gnu fail while building glibc:
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/bin/aarch64-none-linux-gnu-gcc
fetch-value.c -c -std=gnu99 -fgnu89-inline  -O2 -Wall -Winline -Wundef
-Wwrite-string
s -fmerge-all-constants -frounding-math -g -Wstrict-prototypes   -fPIC
  -I../include
-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gl
ibc-1/nptl_db  
-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1
-I../sysdeps/unix/sysv/linux/aarch64/nptl  -I../sysdeps/unix/sysv/lin
ux/aarch64  -I../sysdeps/unix/sysv/linux/generic
-I../sysdeps/unix/sysv/linux/wordsize-64
-I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread
-I../sysdeps/pthread  -I
../sysdeps/unix/sysv/linux  -I../sysdeps/gnu  -I../sysdeps/unix/inet
-I../nptl/sysdeps/unix/sysv  -I../sysdeps/unix/sysv
-I../nptl/sysdeps/unix  -I../sysdeps/unix  -I../sysdeps
/posix  -I../sysdeps/aarch64/fpu  -I../sysdeps/aarch64/nptl
-I../sysdeps/aarch64  -I../sysdeps/wordsize-64
-I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64/wordsize-6
4  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32
-I../sysdeps/aarch64/soft-fp  -I../sysdeps/ieee754
-I../sysdeps/generic  -I../nptl  -I.. -I../libio -I. -nostdinc -i
system 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include
-isystem /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc
/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include-fixed -isystem
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include
-D_LIBC_REE
NTRANT -include ../include/libc-symbols.h  -DPIC -DSHARED
-DNOT_IN_libc=1 -DIS_IN_libthread_db=1 -DIN_LIB=libthread_db-o
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccs
rc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os -MD -MP
-MF 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-
value.os.dt -MT
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os
during GIMPLE pass: vrp
fetch-value.c: In function '_td_locate_field':
fetch-value.c:44:1: internal compiler error: Segmentation fault
_td_locate_field (td_thragent_t *ta,
^~~~
0xb87c75 crash_signal
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:325
0x5b1934 contains_struct_check
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3459
0x5b1934 wi::to_wide(tree_node const*)
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:5247
0xf05557 vr_values::two_valued_val_range_p(tree_node*, tree_node**, tree_node**)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4051
0xf0ee48 vr_values::simplify_stmt_using_ranges(gimple_stmt_iterator*)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4115
0xd7068f substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1073
0x1204860 dom_walker::walk(basic_block_def*)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:308
0xd6e805 substitute_and_fold_engine::substitute_and_fold()
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1173
0xe71b0b vrp_prop::vrp_finalize(bool)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6788
0xe81667 execute_vrp
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6864
Please submit a full bug report,


Looks like PR 83104.

(I am a bit curious what the code looks like, since the testcase in the PR 
is rather on the undefined side of things, though ok for something 
embedded)


--
Marc Glisse


Re: libgo patch committed: Fix Makefile bug setting LD_LIBRARY_PATH

2017-11-22 Thread Bernhard Reutner-Fischer
On 22 November 2017 at 09:07, Eric Botcazou  wrote:
>> grep -q was even required by SUSv2 from 1997 so Solaris should really
>> support it. What version if Solaris is that and what version of grep?
>
> /usr/bin/grep on Solaris 10 (/usr/xpg4/bin/grep does support it).

Why would we want to use /usr/bin ?
According to 
https://docs.oracle.com/cd/E19253-01/816-5175/standards-5/index.html
---8<---
Utilities

If the behavior required by POSIX.2, POSIX.2a, XPG4, SUS, or SUSv2
conflicts with historical Solaris utility behavior, the original
Solaris version of the utility is unchanged; a new version that is
standard-conforming has been provided in /usr/xpg4/bin. If the
behavior required by POSIX.1–2001 or SUSv3 conflicts with historical
Solaris utility behavior, a new version that is standard-conforming
has been provided in /usr/xpg4/bin or in /usr/xpg6/bin. If the
behavior required by POSIX.1–2001 or SUSv3 conflicts with POSIX.2,
POSIX.2a, SUS, or SUSv2, a new version that is SUSv3
standard-conforming has been provided in /usr/xpg6/bin.
---8<---

So we should obviously make sure to prefer /usr/xpg6/bin over
/usr/xpg4/bin over /usr/bin.
If we don't then i would say that's the bug to fix, not warp back in
time 20 years. Why let solaris hold hostage everybody else?

thanks,


Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-22 Thread Marc Glisse

On Wed, 22 Nov 2017, Marc Glisse wrote:


On Wed, 22 Nov 2017, Christophe Lyon wrote:


Since you committed this patch (r255021), my cross-builds of GCC for
aarch64-linux-gnu fail while building glibc:
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/bin/aarch64-none-linux-gnu-gcc
fetch-value.c -c -std=gnu99 -fgnu89-inline  -O2 -Wall -Winline -Wundef
-Wwrite-string
s -fmerge-all-constants -frounding-math -g -Wstrict-prototypes   -fPIC
  -I../include

-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gl
ibc-1/nptl_db 
-I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1

-I../sysdeps/unix/sysv/linux/aarch64/nptl  -I../sysdeps/unix/sysv/lin
ux/aarch64  -I../sysdeps/unix/sysv/linux/generic
-I../sysdeps/unix/sysv/linux/wordsize-64
-I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread
-I../sysdeps/pthread  -I
../sysdeps/unix/sysv/linux  -I../sysdeps/gnu  -I../sysdeps/unix/inet
-I../nptl/sysdeps/unix/sysv  -I../sysdeps/unix/sysv
-I../nptl/sysdeps/unix  -I../sysdeps/unix  -I../sysdeps
/posix  -I../sysdeps/aarch64/fpu  -I../sysdeps/aarch64/nptl
-I../sysdeps/aarch64  -I../sysdeps/wordsize-64
-I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64/wordsize-6
4  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32
-I../sysdeps/aarch64/soft-fp  -I../sysdeps/ieee754
-I../sysdeps/generic  -I../nptl  -I.. -I../libio -I. -nostdinc -i
system 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include

-isystem /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc
/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include-fixed -isystem
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include
-D_LIBC_REE
NTRANT -include ../include/libc-symbols.h  -DPIC -DSHARED
-DNOT_IN_libc=1 -DIS_IN_libthread_db=1 -DIN_LIB=libthread_db-o
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccs
rc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os -MD -MP
-MF 
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-

value.os.dt -MT
/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os
during GIMPLE pass: vrp
fetch-value.c: In function '_td_locate_field':
fetch-value.c:44:1: internal compiler error: Segmentation fault
_td_locate_field (td_thragent_t *ta,
^~~~
0xb87c75 crash_signal
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:325
0x5b1934 contains_struct_check
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3459
0x5b1934 wi::to_wide(tree_node const*)
   /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:5247
0xf05557 vr_values::two_valued_val_range_p(tree_node*, tree_node**, 
tree_node**)

   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4051
0xf0ee48 vr_values::simplify_stmt_using_ranges(gimple_stmt_iterator*)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4115
0xd7068f 
substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)

   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1073
0x1204860 dom_walker::walk(basic_block_def*)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:308
0xd6e805 substitute_and_fold_engine::substitute_and_fold()
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1173
0xe71b0b vrp_prop::vrp_finalize(bool)
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6788
0xe81667 execute_vrp
   
/tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6864
Please submit a full bug report,


Looks like PR 83104.

(I am a bit curious what the code looks like, since the testcase in the PR is 
rather on the undefined side of things, though ok for something embedded)


  if (idx != 0 && DB_DESC_NELEM (desc) != 0
  && idx - (psaddr_t) 0 > DB_DESC_NELEM (desc))

"idx - (psaddr_t) 0": glibc should probably avoid doing pointer arithmetic 
with a null pointer, "(ptrdiff_t) idx" looks equivalent.


--
Marc Glisse


RE: [patch] remove cilk-plus

2017-11-22 Thread Koval, Julia
Hi,

> So it's not important, but the patch doesn't have the removal of the
> cilk+ testsuite or runtime.  BUt again, it's not a big deal, I can guess
> what that part of the patch looks like.

I used Jakub's suggestion in 
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01348.html and didn't add 
libcilkrts and cilk-plus directories to patch(without this patch doesn't fit in 
gcc-patches), only to change log. I will include them, when I'll commit the 
patch, but I guess there is nothing to review here:
rm -rf gcc/testsuite/c-c++-common/cilk-plus
rm -rf gcc/testsuite/g++.dg/cilk-plus
rm -rf gcc/testsuite/gcc.dg/cilk-plus
rm -rf libcilkrts
I can send it as an additional patch(or patches) if this is required.

Thanks for your comments, fixed them in attached patch.vm


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, November 21, 2017 8:41 AM
> To: Koval, Julia ; Jakub Jelinek 
> Cc: GCC Patches 
> Subject: Re: [patch] remove cilk-plus
> 
> On 11/16/2017 10:02 AM, Koval, Julia wrote:
> > Thanks for your comments, fixed it.
> >
> > 2017-11-16  Julia Koval  
> > Sebastian Peryt  
> >
> > * Makefile.def (target_modules): Remove libcilkrts.
> > * Makefile.in: Ditto.
> > * configure: Ditto.
> > * configure.ac: Ditto.
> >
> > contrib/
> > * contrib/gcc_update: Ditto.
> >
> > gcc/
> > * Makefile.in (cilkplus.def, cilk-builtins.def, c-family/cilk.o,
> > c-family/c-cilkplus.o, c-family/array-notation-common.o,
> > cilk-common.o, cilk.h, cilk-common.c): Remove.
> > * builtin-types.def
> > (BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR_BT_PTR):
> Remove.
> > * builtins.c (is_builtin_name): Remove cilkplus condition.
> > (BUILT_IN_CILK_DETACH, BUILT_IN_CILK_POP_FRAME): Remove.
> > * builtins.def (DEF_CILK_BUILTIN_STUB, DEF_CILKPLUS_BUILTIN,
> > cilk-builtins.def, cilkplus.def): Remove.
> > * cif-code.def (CILK_SPAWN): Remove.
> > * cilk-builtins.def: Delete.
> > * cilk-common.c: Ditto.
> > * cilk.h: Ditto.
> > * cilkplus.def: Ditto.
> > * config/darwin.h (fcilkplus): Delete.
> > * cppbuiltin.c: Ditto.
> > * doc/extend.texi: Remove cilkplus doc.
> > * doc/generic.texi: Ditto.
> > * doc/invoke.texi: Ditto.
> > * doc/passes.texi: Ditto.
> > * gcc.c (fcilkplus): Remove.
> > * gengtype.c (cilk.h): Remove.
> > * gimple-pretty-print.c (dump_gimple_omp_for): Remove cilkplus
> support.
> > * gimple.h (GF_OMP_FOR_KIND_CILKFOR,
> GF_OMP_FOR_KIND_CILKSIMD): Remove.
> > * gimplify.c (gimplify_return_expr, maybe_fold_stmt,
> gimplify_call_expr,
> > is_gimple_stmt, gimplify_modify_expr, gimplify_scan_omp_clauses,
> > gimplify_adjust_omp_clauses, gimplify_omp_for, gimplify_expr):
> Remove
> > cilkplus conditions.
> > * ipa-fnsummary.c (ipa_dump_fn_summary, compute_fn_summary,
> > inline_read_section): Ditto.
> > * ipa-inline-analysis.c (cilk.h): Remove.
> > * ira.c (ira_setup_eliminable_regset): Remove cilkplus support.
> > * lto-wrapper.c (merge_and_complain, append_compiler_options,
> > append_linker_options): Remove condition for fcilkplus.
> > * lto/lto-lang.c (cilk.h): Remove.
> > (lto_init): Remove condition for fcilkplus.
> > * omp-expand.c (expand_cilk_for_call): Delete.
> > (expand_omp_taskreg, expand_omp_for_static_chunk,
> > expand_omp_for): Remove cilkplus
> > conditions.
> > (expand_cilk_for): Delete.
> > * omp-general.c (omp_extract_for_data): Remove cilkplus support.
> > * omp-low.c (scan_sharing_clauses, create_omp_child_function,
> > execute_lower_omp, diagnose_sb_0): Ditto.
> > * omp-simd-clone.c (simd_clone_clauses_extract): Ditto.
> > * tree-core.h (OMP_CLAUSE__CILK_FOR_COUNT_): Delete.
> > * tree-nested.c: Ditto.
> > * tree-pretty-print.c (dump_omp_clause): Remove cilkplus support.
> > (dump_generic_node): Ditto.
> > * tree.c (OMP_CLAUSE__CILK_FOR_COUNT_): Delete.
> > * tree.def (cilk_simd, cilk_for, cilk_spawn_stmt, cilk_sync_stmt): 
> > Delete.
> > * tree.h (CILK_SPAWN_FN, EXPR_CILK_SPAWN): Delete.
> >
> > gcc/c-family/
> > * array-notation-common.c: Delete.
> > * c-cilkplus.c: Ditto.
> > * c-common.c (_Cilk_spawn, _Cilk_sync, _Cilk_for): Remove.
> > * c-common.def (ARRAY_NOTATION_REF): Remove.
> > * c-common.h (RID_CILK_SPAWN, build_array_notation_expr,
> > build_array_notation_ref, C_ORT_CILK, c_check_cilk_loop,
> > c_validate_cilk_plus_loop, cilkplus_an_parts,
> cilk_ignorable_spawn_rhs_op,
> > cilk_recognize_spawn): Remove.
> > * c-gimplify.c (CILK_SPAWN_STMT): Remove.
> > * c-omp.c: Remove CILK_SIMD check.
> > * c-pragma.c: Ditto.
> > * c-pragma.h: Remove CILK related pragmas.
> > * c-pretty-print.c (c_pretty_printer::postfix_expression): Remove
> > ARRAY_NOTATION_REF condition.
> > (c_pretty_printer::expression): Ditto.
> > * c.opt (fcilkplus): Remove.
> > * cilk.c:

[PATCH] Fix PR83089

2017-11-22 Thread Richard Biener

The following fixes if-conversion to free SCEV/niter estimates because
it DCEs stmts.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-11-22  Richard Biener  

PR tree-optimization/83089
* tree-if-conv.c (pass_if_conversion::execute): If anything
changed reset SCEV and free the number of iteration estimates.

* gcc.dg/pr83089.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 254990)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2959,6 +2959,12 @@ pass_if_conversion::execute (function *f
&& !loop->dont_vectorize))
   todo |= tree_if_conversion (loop);
 
+  if (todo)
+{
+  free_numbers_of_iterations_estimates (fun);
+  scev_reset ();
+}
+
   if (flag_checking)
 {
   basic_block bb;
Index: gcc/testsuite/gcc.dg/pr83089.c
===
--- gcc/testsuite/gcc.dg/pr83089.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr83089.c  (working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-if-convert -ftree-parallelize-loops=2" } */
+
+int rl, s8;
+
+void
+it (int zy, short int el)
+{
+  int hb;
+
+  while (el != 0)
+{
+  hb = el;
+  for (rl = 0; rl < 200; ++rl)
+   {
+ for (s8 = 0; s8 < 2; ++s8)
+   {
+   }
+ if (s8 < 3)
+   zy = hb;
+ if (hb == 0)
+   ++s8;
+ zy += (s8 != -1);
+   }
+  el = zy;
+}
+}


Re: [DWARF] mark partial fn versions and OMP frags as partial in dwarf2+ debug info

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 02:40:39AM -0200, Alexandre Oliva wrote:
> On Nov 21, 2017, Jakub Jelinek  wrote:
> 
> > On Wed, Nov 15, 2017 at 05:05:36AM -0200, Alexandre Oliva wrote:
> 
> >> This patch introduces a new DWARF attribute to indicate that a function
> >> is a partial copy of its abstract origin, specifically, that its entry
> >> point does not correspond to the entry point of the abstract origin.
> >> This attribute can then be used to mark the out-of-line portion of
> >> partial inlines, and OpenMP blocks split out into artificial functions.
> 
> > I'm not sure I like the attribute name too much, and more importantly,
> > as I said before, I think the attribute should not be a flag, but a number
> > which tells not just that it is an outlined portion of the original
> > subprogram, but also what kind of outlining it is and its purpose.
> 
> I suppose you don't like it because it means something other than what
> you suggested.  I'd taken note of your suggestion, and even asked for
> more details back then, but none of the debug info consumers seemed to
> have changed their mind to the initial assessment that a single boolean
> flag would suffice for the purposes of telling whether or not the entry
> point of a given (implementation) subprogram should be covered by a
> breakpoint at a given (source-level) subprogram.

Let's take them into a loop again.

> We could supply a lot more detail about how functions are split, and
> why, but if debug info consumers have no way to make use of the info, or
> have no interest in doing so, I don't see that makes sense to waste
> compiler resources preserving and generating such additional
> information.

The thing is, the different kinds of outlined regions work very differently.
Say you have:

int bar (int);
static inline int foo (int x, int y)
{
  int z = x + 9 * y;
  if (x < 20)
return y;
  return bar (bar (bar (bar (x + y * 7;
}
as an example of possibly partially inlined function and
void
baz (int x, int y)
{
  int z = x + 9 * y;
  #pragma omp parallel for
  for (int i = 0; i < 60; i++)
bar (i);
}
as an example of OpenMP parallel region.  Now, if inside of the outlined
regions (bar (bar (bar (bar (x + y * 7; in the first case and the
#pragma omp for in the second case) you want in the debugger to query
the value of z, which isn't used in those outlined regions and thus there
is likely no DW_TAG_variable for it in the outlined region, you need to do
something quite different.  In the first case just look up the caller,
and if the innermost inlined region has the same abstract origin as the
outlined region, query the z variable there.
In the OpenMP region, it isn't a parent that you need to look up though.
In the master thread, it is a grand+ parent, where you need to skip up all
the frames that belong to the OpenMP runtime library, in other threads you
need to talk to the runtime library to find what the parent thread is and
unwind there to some frame above the frames that belong to the OpenMP
runtime library.
OpenMP task region is again quite different from this, in that the parent
might still exist but might not, the task can be invoked after the function
spawning it returned if there is no thread synchronization before that (or
might at least leave some lexical scopes).
If we have an attribute like I'm proposing, you can easily test it even as a
boolean if there is some handling common to all the outlined region kinds
(just compare the value against 0). 

> This flag serves a simple purpose that doesn't require debug info
> consumers to understand the implications of such relationships as a
> function having multiple openmp blocks split out of it, before being
> further split into inline and out-of-line, and then have the out-of-line
> portion further versioned and then split into inline and out-of-line.
> A debug info consumer can deal with that regardless of our providing
> additional detail about the purpose of the splits elsewhere.
> 
> Now consider that we mark both inline and out-of-line functions with the
> same attribute and different nonzero values, and that we assign
> different values for the various kinds of split-out blocks, some of
> which a debug info consumer doesn't know about.  Should it then stop at
> the entry point of that fragment when a breakpoint is set on a function,
> or should it not?  (if there's more than one nonzero value for which the
> fragment encompasses the entry point, a conservative debug info consumer
> won't know what to do)

If we have a value for the partial inlining inlined outer part, then yes,
the debugger would need to check for two values (if we use 1 for the inlined
outer part, then <= 1 check), it is unlikely further outlined regions of
that kind would be needed.  If we don't mark the partial inlining inlined
outer part, then just != 0.  My point is when we are adding a new attribute,
we shouldn't look just for a single consumer purpose when we actually
already now know about multiple other

[PATCH, libgfortran, committed] PR 83070, fix -Wsign-compare warning

2017-11-22 Thread Janne Blomqvist
Hi,

committed the following patch as r255045, pre-approved in the PR:

2017-11-22  Janne Blomqvist  

PR libfortran/83070
* intrinsics/eoshift0.c (eoshift0): Fix -Wsign-compare warning by
making a_ex and r_ex index_type instead of size_t.

--- trunk/libgfortran/intrinsics/eoshift0.c 2017/11/22 08:47:47 255044
+++ trunk/libgfortran/intrinsics/eoshift0.c 2017/11/22 08:51:21 255045
@@ -107,7 +107,7 @@
   if (which > 0)
 {
   /* Test if both ret and array are contiguous.  */
-  size_t r_ex, a_ex;
+  index_type r_ex, a_ex;
   r_ex = 1;
   a_ex = 1;
   do_blocked = true;





-- 
Janne Blomqvist


Re: [PATCH] Fix i?86 bootstrap (PR rtl-optimization/82044)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> On Wed, Nov 22, 2017 at 12:27:20AM +0100, Jakub Jelinek wrote:
> > The following patch fixes those two issues and adds similar overflow
> > check to record_store too (in that spot width is always non-negative, so
> > we don't need a special width == -1 handling).
> > 
> > Bootstrapped successfully on i686-linux, ok for trunk if it passes regtest
> > there (and pending x86_64-linux bootstrap + regtest)?
> 
> Now successfully bootstrapped/regtested on both.

Ok.

Richard.


Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-22 Thread Christophe Lyon
On 22 November 2017 at 09:29, Marc Glisse  wrote:
> On Wed, 22 Nov 2017, Marc Glisse wrote:
>
>> On Wed, 22 Nov 2017, Christophe Lyon wrote:
>>
>>> Since you committed this patch (r255021), my cross-builds of GCC for
>>> aarch64-linux-gnu fail while building glibc:
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/bin/aarch64-none-linux-gnu-gcc
>>> fetch-value.c -c -std=gnu99 -fgnu89-inline  -O2 -Wall -Winline -Wundef
>>> -Wwrite-string
>>> s -fmerge-all-constants -frounding-math -g -Wstrict-prototypes   -fPIC
>>>   -I../include
>>>
>>>
>>> -I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gl
>>> ibc-1/nptl_db
>>> -I/tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1
>>> -I../sysdeps/unix/sysv/linux/aarch64/nptl  -I../sysdeps/unix/sysv/lin
>>> ux/aarch64  -I../sysdeps/unix/sysv/linux/generic
>>> -I../sysdeps/unix/sysv/linux/wordsize-64
>>> -I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread
>>> -I../sysdeps/pthread  -I
>>> ../sysdeps/unix/sysv/linux  -I../sysdeps/gnu  -I../sysdeps/unix/inet
>>> -I../nptl/sysdeps/unix/sysv  -I../sysdeps/unix/sysv
>>> -I../nptl/sysdeps/unix  -I../sysdeps/unix  -I../sysdeps
>>> /posix  -I../sysdeps/aarch64/fpu  -I../sysdeps/aarch64/nptl
>>> -I../sysdeps/aarch64  -I../sysdeps/wordsize-64
>>> -I../sysdeps/ieee754/ldbl-128  -I../sysdeps/ieee754/dbl-64/wordsize-6
>>> 4  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32
>>> -I../sysdeps/aarch64/soft-fp  -I../sysdeps/ieee754
>>> -I../sysdeps/generic  -I../nptl  -I.. -I../libio -I. -nostdinc -i
>>> system
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include
>>> -isystem /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc
>>> /tools/lib/gcc/aarch64-none-linux-gnu/8.0.0/include-fixed -isystem
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-aarch64-none-linux-gnu/usr/include
>>> -D_LIBC_REE
>>> NTRANT -include ../include/libc-symbols.h  -DPIC -DSHARED
>>> -DNOT_IN_libc=1 -DIS_IN_libthread_db=1 -DIN_LIB=libthread_db-o
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccs
>>> rc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os -MD -MP
>>> -MF
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-
>>> value.os.dt -MT
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/glibc-1/nptl_db/fetch-value.os
>>> during GIMPLE pass: vrp
>>> fetch-value.c: In function '_td_locate_field':
>>> fetch-value.c:44:1: internal compiler error: Segmentation fault
>>> _td_locate_field (td_thragent_t *ta,
>>> ^~~~
>>> 0xb87c75 crash_signal
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:325
>>> 0x5b1934 contains_struct_check
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3459
>>> 0x5b1934 wi::to_wide(tree_node const*)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:5247
>>> 0xf05557 vr_values::two_valued_val_range_p(tree_node*, tree_node**,
>>> tree_node**)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4051
>>> 0xf0ee48 vr_values::simplify_stmt_using_ranges(gimple_stmt_iterator*)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vr-values.c:4115
>>> 0xd7068f
>>> substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1073
>>> 0x1204860 dom_walker::walk(basic_block_def*)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:308
>>> 0xd6e805 substitute_and_fold_engine::substitute_and_fold()
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-propagate.c:1173
>>> 0xe71b0b vrp_prop::vrp_finalize(bool)
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6788
>>> 0xe81667 execute_vrp
>>>
>>> /tmp/6857183_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-vrp.c:6864
>>> Please submit a full bug report,
>>
>>
>> Looks like PR 83104.
>>
>> (I am a bit curious what the code looks like, since the testcase in the PR
>> is rather on the undefined side of things, though ok for something embedded)
>
>
>   if (idx != 0 && DB_DESC_NELEM (desc) != 0
>   && idx - (psaddr_t) 0 > DB_DESC_NELEM (desc))
>
> "idx - (psaddr_t) 0": glibc should probably avoid doing pointer arithmetic
> with a null pointer, "(ptrdiff_t) idx" looks equivalent.
>

The patch you attached in PR83104 works for me (at least the build
completes, I haven't run the tests)

Thanks,

Christophe

> --
> Marc Glisse


[PATCH][1/2] Alternate fix for PR81403

2017-11-22 Thread Richard Biener

I had the need to analyze the PR81403 failure again and figured the real
cause and prepared a patch for this (the original fix still keeps some
issues latent I think).

This is cleanups that make it easier to debug PRE issues.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-11-22  Richard Biener  

* gimple-iterator.c (gimple_find_edge_insert_loc): Ignore
fake edges to exit when looking for a place to insert.
* tree-ssa-pre.c (clear_expression_ids): Inline into callers
and remove.
(insert_into_preds_of_block): Commit edge insertion immediately,
assert that doesn't require new BBs.
(fini_pre): Release expressions.
(pass_pre::execute): Shuffle things around a bit, if the fn
is too large do not compute AVAIL either as this is really the
quadratic bit.

Index: gcc/gimple-iterator.c
===
--- gcc/gimple-iterator.c   (revision 254995)
+++ gcc/gimple-iterator.c   (working copy)
@@ -763,7 +763,12 @@ gimple_find_edge_insert_loc (edge e, gim
  Except for the entry block.  */
   src = e->src;
   if ((e->flags & EDGE_ABNORMAL) == 0
-  && single_succ_p (src)
+  && (single_succ_p (src)
+ /* Do not count a fake edge as successor as added to infinite
+loops by connect_infinite_loops_to_exit.  */
+ || (EDGE_COUNT (src->succs) == 2
+ && (EDGE_SUCC (src, 0)->flags & EDGE_FAKE
+ || EDGE_SUCC (src, 1)->flags & EDGE_FAKE)))
   && src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
 {
   *gsi = gsi_last_bb (src);
Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 254995)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -400,15 +400,6 @@ expression_for_id (unsigned int id)
   return expressions[id];
 }
 
-/* Free the expression id field in all of our expressions,
-   and then destroy the expressions array.  */
-
-static void
-clear_expression_ids (void)
-{
-  expressions.release ();
-}
-
 static object_allocator pre_expr_pool ("pre_expr nodes");
 
 /* Given an SSA_NAME NAME, get or create a pre_expr to represent it.  */
@@ -1331,7 +1322,6 @@ get_representative_for (const pre_expr e
 }
 
 
-
 static pre_expr
 phi_translate (pre_expr expr, bitmap_set_t set1, bitmap_set_t set2,
   basic_block pred, basic_block phiblock);
@@ -3004,7 +2994,8 @@ insert_into_preds_of_block (basic_block
   gcc_assert (!(pred->flags & EDGE_ABNORMAL));
   if (!gimple_seq_empty_p (stmts))
{
- gsi_insert_seq_on_edge (pred, stmts);
+ basic_block new_bb = gsi_insert_seq_on_edge_immediate (pred, stmts);
+ gcc_assert (! new_bb);
  insertions = true;
}
   if (!builtexpr)
@@ -4127,6 +4118,7 @@ static void
 fini_pre ()
 {
   value_expressions.release ();
+  expressions.release ();
   BITMAP_FREE (inserted_exprs);
   bitmap_obstack_release (&grand_bitmap_obstack);
   bitmap_set_pool.release ();
@@ -4181,22 +4173,21 @@ pass_pre::execute (function *fun)
  loop_optimizer_init may create new phis, etc.  */
   loop_optimizer_init (LOOPS_NORMAL);
   split_critical_edges ();
+  scev_initialize ();
 
   run_scc_vn (VN_WALK);
 
   init_pre ();
-  scev_initialize ();
-
-  /* Collect and value number expressions computed in each basic block.  */
-  compute_avail ();
 
   /* Insert can get quite slow on an incredibly large number of basic
  blocks due to some quadratic behavior.  Until this behavior is
  fixed, don't run it when he have an incredibly large number of
  bb's.  If we aren't going to run insert, there is no point in
- computing ANTIC, either, even though it's plenty fast.  */
+ computing ANTIC, either, even though it's plenty fast nor do
+ we require AVAIL.  */
   if (n_basic_blocks_for_fn (fun) < 4000)
 {
+  compute_avail ();
   compute_antic ();
   insert ();
 }
@@ -4211,19 +4202,19 @@ pass_pre::execute (function *fun)
  not keeping virtual operands up-to-date.  */
   gcc_assert (!need_ssa_update_p (fun));
 
-  /* Remove all the redundant expressions.  */
-  todo |= vn_eliminate (inserted_exprs);
-
   statistics_counter_event (fun, "Insertions", pre_stats.insertions);
   statistics_counter_event (fun, "PA inserted", pre_stats.pa_insert);
   statistics_counter_event (fun, "HOIST inserted", pre_stats.hoist_insert);
   statistics_counter_event (fun, "New PHIs", pre_stats.phis);
 
-  clear_expression_ids ();
+  /* Remove all the redundant expressions.  */
+  todo |= vn_eliminate (inserted_exprs);
 
-  scev_finalize ();
   remove_dead_inserted_code ();
+
   fini_pre ();
+
+  scev_finalize ();
   loop_optimizer_finalize ();
 
   /* Restore SSA info before tail-merging as that resets it as well.  */


Re: [PATCH] Fix i?86 bootstrap (PR rtl-optimization/82044)

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 9:54 AM, Richard Biener  wrote:
> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
>
>> On Wed, Nov 22, 2017 at 12:27:20AM +0100, Jakub Jelinek wrote:
>> > The following patch fixes those two issues and adds similar overflow
>> > check to record_store too (in that spot width is always non-negative, so
>> > we don't need a special width == -1 handling).
>> >
>> > Bootstrapped successfully on i686-linux, ok for trunk if it passes regtest
>> > there (and pending x86_64-linux bootstrap + regtest)?
>>
>> Now successfully bootstrapped/regtested on both.
>
> Ok.

I've reverted the patch on the branch.  This isn't stuff we should
backport IMHO.

Richard.

> Richard.


[PATCH] Fix dwarf2out ICE on VEC_SERIES rtl (PR debug/83034)

2017-11-22 Thread Jakub Jelinek
Hi!

On this testcase we get (vec_series (reg: ebp) (const_int 1))
in NOTE_INSN_CALL_ARG_LOCATION because the vector isn't live anywhere else.
Right now we do not have a good story for providing values of optimized
out vectors, so just punting on it like on other VEC_* codes is sufficient.

For the future, the question is how to represent vectors, they could be
represented e.g. using typed DWARF stack, but then the stack could have
really large values, or by representing them as arrays of elements using
DW_OP_piece, which probably current consumers would grok already now,
but the debug info for it could be larger (say:
typedef int V __attribute__((vector_size (32)));
V bar (V, V);

V
foo (V a, V b)
{
  V c = a + b;
  return bar (a, b) + a;
}
to represent c we either have typed stack ... DW_OP_plus DW_OP_stack_value,
or ( ... DW_OP_plus DW_OP_stack_value DW_OP_piece ... ) 8x

Anyway, for now I think this is enough.  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2017-11-22  Jakub Jelinek  

PR debug/83034
* dwarf2out.c (mem_loc_descriptor): Handle VEC_SERIES.

* gcc.dg/pr83034.c: New test.

--- gcc/dwarf2out.c.jj  2017-11-21 08:56:52.0 +0100
+++ gcc/dwarf2out.c 2017-11-21 16:08:34.397197696 +0100
@@ -15605,6 +15605,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
 case VEC_SELECT:
 case VEC_CONCAT:
 case VEC_DUPLICATE:
+case VEC_SERIES:
 case UNSPEC:
 case HIGH:
 case FMA:
--- gcc/testsuite/gcc.dg/pr83034.c.jj   2017-11-21 16:07:13.886253031 +0100
+++ gcc/testsuite/gcc.dg/pr83034.c  2017-11-21 16:03:23.0 +0100
@@ -0,0 +1,12 @@
+/* PR debug/83034 */
+/* { dg-do compile } */
+/* { dg-options "-funroll-loops -Ofast -g" } */
+
+__attribute__((__simd__)) float expf (float);
+
+void
+foo (float *a, int x)
+{
+  for (; x; x++)
+a[x] = expf (x);
+}

Jakub


Re: [patch] remove cilk-plus

2017-11-22 Thread Rainer Orth
Hi Julia,

>> So it's not important, but the patch doesn't have the removal of the
>> cilk+ testsuite or runtime.  BUt again, it's not a big deal, I can guess
>> what that part of the patch looks like.
>
> I used Jakub's suggestion in
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01348.html and didn't add
> libcilkrts and cilk-plus directories to patch(without this patch doesn't
> fit in gcc-patches), only to change log. I will include them, when I'll
> commit the patch, but I guess there is nothing to review here:
> rm -rf gcc/testsuite/c-c++-common/cilk-plus
> rm -rf gcc/testsuite/g++.dg/cilk-plus
> rm -rf gcc/testsuite/gcc.dg/cilk-plus
> rm -rf libcilkrts
> I can send it as an additional patch(or patches) if this is required.

there's more in the testsuite:

gcc/testsuite/lib/cilk-plus-dg.exp
gcc/testsuite/lib/target-supports.exp (check_effective_target_cilkplus,
check_effective_target_cilkplus_runtime)
gcc/doc/sourcebuild.texi (cilkplus_runtime effective-target keyword)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Richard Biener
On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
 wrote:
>
>> This is PR81179 I think, please mention that in the changelog.
>
> Correct, my bad for missing that.
>
>> This unconditionally pessimizes code even if there is no valid index
>> zero, right?
>
> Almost, since for a loop such as:
>
>   #define OFFSET 1
>   unsigned int find(const unsigned int *a, unsigned int v) {
> unsigned int result = 120;
> for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>   if (a[i-OFFSET] == v) result = i;
> }
> return result;
>   }
>
> the index i will match the contents of the index vector used here ---
> but this does indeed pessimize the code generated for, say, OFFSET
> = 2. It is probably more sensible to use the existing code path in those
> situations.
>
>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>
> Does that mean such overflows can already manifest themselves for
> regular COND_REDUCTION? I had assumed sufficient checks were already in
> place because of the presence of the is_nonwrapping_integer_induction
> test.

But only if we need the index vector?  The patch looked like you're changing
how other modes are handled (in my look I didn't make myself familiar with
the various modes again...).  Anyway, Alan hopefully remembers what he
coded so I'll defer to him here.

If Alan is happy with the patch consider it approved.

Thanks,
Richard.


RE: [patch] remove cilk-plus

2017-11-22 Thread Koval, Julia
Changes for these files(except sourcebuild one, will fix that) are included in 
patch I sent. I only removed from the patch deletion of the folders I mentioned.

Julia

> -Original Message-
> From: Rainer Orth [mailto:r...@cebitec.uni-bielefeld.de]
> Sent: Wednesday, November 22, 2017 10:11 AM
> To: Koval, Julia 
> Cc: Jeff Law ; Jakub Jelinek ; GCC
> Patches 
> Subject: Re: [patch] remove cilk-plus
> 
> Hi Julia,
> 
> >> So it's not important, but the patch doesn't have the removal of the
> >> cilk+ testsuite or runtime.  BUt again, it's not a big deal, I can guess
> >> what that part of the patch looks like.
> >
> > I used Jakub's suggestion in
> > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01348.html and didn't add
> > libcilkrts and cilk-plus directories to patch(without this patch doesn't
> > fit in gcc-patches), only to change log. I will include them, when I'll
> > commit the patch, but I guess there is nothing to review here:
> > rm -rf gcc/testsuite/c-c++-common/cilk-plus
> > rm -rf gcc/testsuite/g++.dg/cilk-plus
> > rm -rf gcc/testsuite/gcc.dg/cilk-plus
> > rm -rf libcilkrts
> > I can send it as an additional patch(or patches) if this is required.
> 
> there's more in the testsuite:
> 
> gcc/testsuite/lib/cilk-plus-dg.exp
> gcc/testsuite/lib/target-supports.exp (check_effective_target_cilkplus,
> check_effective_target_cilkplus_runtime)
> gcc/doc/sourcebuild.texi (cilkplus_runtime effective-target keyword)
> 
>   Rainer
> 
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Jakub Jelinek
Hi!

On these two testcases, we end up expanding MULT_EXPR where both arguments
end up being VOIDmode.  For smul_optab that isn't a problem, we have
the mode next to it, but in some cases we want to use {u,s}mul_widen_optab
which is a conversion optab which needs two modes expand_binop is called
just with one mode, the result mode, so the other mode is guessed from
the operands and if both are VOIDmode, then a fallback is chosen to use
return mode.  The new find_widening* changes ICE on that though, previously
we'd just do something.

In any case, I think we need to make sure this doesn't happen while we
still know both modes for the {u,s}mul_widen_optab.  Bootstrapped/regtested
on x86_64-linux and i686-linux, ok for trunk?

Perhaps additionally we could have somewhere a case which for both arguments
constant (unlikely case, as the gimple optimizers should usually optimize
that out) and selected optabs for which we know the corresponding RTL code
we could use simplify_const_binary_operation and see if it optimizes into a
constant and just return that.  Though, these functions are large and it
is still possible a constant could be uncovered later, so I think we want
this patch even if we do something like that.

2017-11-22  Jakub Jelinek  

PR middle-end/82875
* optabs.c (expand_doubleword_mult, expand_binop): Before calling
expand_binop with *mul_widen_optab, make sure at least one of the
operands doesn't have VOIDmode.

* gcc.dg/pr82875.c: New test.
* gcc.c-torture/compile/pr82875.c: New test.

--- gcc/optabs.c.jj 2017-11-09 20:23:51.0 +0100
+++ gcc/optabs.c2017-11-21 17:40:13.318673366 +0100
@@ -861,6 +861,11 @@ expand_doubleword_mult (machine_mode mod
   if (target && !REG_P (target))
 target = NULL_RTX;
 
+  /* *_widen_optab needs to determine operand mode, make sure at least
+ one operand has non-VOID mode.  */
+  if (GET_MODE (op0_low) == VOIDmode && GET_MODE (op1_low) == VOIDmode)
+op0_low = force_reg (word_mode, op0_low);
+
   if (umulp)
 product = expand_binop (mode, umul_widen_optab, op0_low, op1_low,
target, 1, OPTAB_DIRECT);
@@ -1199,6 +1204,10 @@ expand_binop (machine_mode mode, optab b
  : smul_widen_optab),
 wider_mode, mode) != CODE_FOR_nothing))
 {
+  /* *_widen_optab needs to determine operand mode, make sure at least
+one operand has non-VOID mode.  */
+  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
+   op0 = force_reg (mode, op0);
   temp = expand_binop (wider_mode,
   unsignedp ? umul_widen_optab : smul_widen_optab,
   op0, op1, NULL_RTX, unsignedp, OPTAB_DIRECT);
--- gcc/testsuite/gcc.dg/pr82875.c.jj   2017-11-21 17:50:16.022274628 +0100
+++ gcc/testsuite/gcc.dg/pr82875.c  2017-11-21 17:49:46.0 +0100
@@ -0,0 +1,11 @@
+/* PR middle-end/82875 */
+/* { dg-do compile } */
+/* { dg-options "-ftree-ter" } */
+
+const int a = 100;
+
+void
+foo (void)
+{
+  int c[a];
+}
--- gcc/testsuite/gcc.c-torture/compile/pr82875.c.jj2017-11-21 
17:48:46.409374708 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr82875.c   2017-11-21 
17:48:25.0 +0100
@@ -0,0 +1,24 @@
+/* PR middle-end/82875 */
+
+signed char a;
+unsigned b;
+long c, d;
+long long e;
+
+void
+foo (void)
+{
+  short f = a = 6;
+  while (0)
+while (a <= 7)
+  {
+   for (;;)
+ ;
+   lab:
+ while (c <= 73)
+   ;
+   e = 20;
+   d ? (a %= c) * (e *= a ? f : b) : 0;
+  }
+  goto lab;
+}

Jakub


[PATCH] Avoid UNSPEC_VOLATILE or volatile ASM_OPERANDS in debug insns (PR debug/83084)

2017-11-22 Thread Jakub Jelinek
Hi!

Seems at least the scheduler treats UNSPEC_VOLATILE and perhaps volatile asm
in DEBUG_INSNs as having side-effects and schedules differently depending on
that.  While we could tweak the scheduler not to do that, these will be
actually never useful in DEBUG_INSNs anyway, we don't know what it means so
we can't represent it in debug info.

So, the following patch just makes sure we don't add them into debug_insns,
instead reset those.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-22  Jakub Jelinek  

PR debug/83084
* valtrack.c (propagate_for_debug_subst, propagate_for_debug): Reset
debug insns if they would contain UNSPEC_VOLATILE or volatile asm.
(dead_debug_insert_temp): Likewise, but also ignore even non-volatile
asm.

* g++.dg/opt/pr83084.C: New test.

--- gcc/valtrack.c.jj   2017-09-12 21:58:01.0 +0200
+++ gcc/valtrack.c  2017-11-21 18:26:10.188009398 +0100
@@ -171,10 +171,13 @@ propagate_for_debug_subst (rtx from, con
if (REG_P (*iter) && ++cnt > 1)
  {
rtx dval = make_debug_expr_from_rtl (old_rtx);
+   rtx to = pair->to;
+   if (volatile_insn_p (to))
+ to = gen_rtx_UNKNOWN_VAR_LOC ();
/* Emit a debug bind insn.  */
rtx bind
  = gen_rtx_VAR_LOCATION (GET_MODE (old_rtx),
- DEBUG_EXPR_TREE_DECL (dval), pair->to,
+ DEBUG_EXPR_TREE_DECL (dval), to,
  VAR_INIT_STATUS_INITIALIZED);
rtx_insn *bind_insn = emit_debug_insn_before (bind, pair->insn);
df_insn_rescan (bind_insn);
@@ -217,6 +220,8 @@ propagate_for_debug (rtx_insn *insn, rtx
 dest, propagate_for_debug_subst, &p);
  if (loc == INSN_VAR_LOCATION_LOC (insn))
continue;
+ if (volatile_insn_p (loc))
+   loc = gen_rtx_UNKNOWN_VAR_LOC ();
  INSN_VAR_LOCATION_LOC (insn) = loc;
  df_insn_rescan (insn);
}
@@ -660,6 +665,12 @@ dead_debug_insert_temp (struct dead_debu
}
  return 0;
}
+ /* Asm in DEBUG_INSN is never useful, we can't emit debug info for
+that.  And for volatile_insn_p, it is actually harmful
+- DEBUG_INSNs shouldn't have any side-effects.  */
+ else if (GET_CODE (src) == ASM_OPERANDS
+  || volatile_insn_p (src))
+   set = NULL_RTX;
}
 
   /* ??? Should we try to extract it from a PARALLEL?  */
--- gcc/testsuite/g++.dg/opt/pr83084.C.jj   2017-11-21 18:30:22.252935128 
+0100
+++ gcc/testsuite/g++.dg/opt/pr83084.C  2017-11-21 18:29:18.0 +0100
@@ -0,0 +1,16 @@
+// PR debug/83084
+// { dg-do compile }
+// { dg-options "-O2 -fcompare-debug -Wno-return-type" }
+
+enum E { F };
+template  struct A {
+  bool foo ();
+  int b;
+};
+template <> bool A<>::foo () {
+  int a;
+  do
+if (a)
+  return false;
+  while (__atomic_compare_exchange_n (&b, &a, 0, 1, 4, 0));
+}

Jakub


Fix calculation of ptr_mode for MODE_PARTIAL_INT Pmode

2017-11-22 Thread Richard Sandiford
This patch fixes a regression caused by r251469, where I'd incorrectly
converted a call to mode_for_size that sometimes needs MODE_PARTIAL_INTs.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also spot-checked on msp430-elf.  OK to install?

Richard


2017-11-22  Richard Sandiford  

gcc/
* emit-rtl.c (init_derived_machine_modes): Make sure ptr_mode
has the same mode class as Pmode.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2017-11-22 09:21:12.560954668 +
+++ gcc/emit-rtl.c  2017-11-22 09:25:24.22574 +
@@ -6003,7 +6003,8 @@ init_derived_machine_modes (void)
 
   byte_mode = opt_byte_mode.require ();
   word_mode = opt_word_mode.require ();
-  ptr_mode = int_mode_for_size (POINTER_SIZE, 0).require ();
+  ptr_mode = as_a 
+(mode_for_size (POINTER_SIZE, GET_MODE_CLASS (Pmode), 0).require ());
 }
 
 /* Create some permanent unique rtl objects shared between all functions.  */


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Richard Biener
On Tue, Nov 21, 2017 at 12:14 PM, Dominik Inführ
 wrote:
> Hi,
>
> this patch tries to extend tree-ssa-dce.c to remove unnecessary 
> new/delete-pairs (it already does that for malloc/free). Clang does it too 
> and it seems to be allowed by 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
> bootstrapped/regtested on aarch64-linux and x86_64-linux.

Few comments.

+/* Return true when STMT is operator new call.  */
+
+bool
+gimple_call_operator_new_p (const gimple *stmt)
+{
+  tree fndecl;
+
+  if (is_gimple_call (stmt)
+  && (fndecl = gimple_call_fndecl (stmt)) != NULL_TREE)
+return DECL_IS_OPERATOR_NEW (fndecl);
+  return false;

make these take a gcall * please and drop the is_gimple_call check.

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index aa54221253c..9a406c5d86c 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1787,7 +1787,9 @@ struct GTY(()) tree_function_decl {
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
   unsigned versioned_function : 1;
-  /* No bits left.  */
+
+  unsigned operator_delete_flag : 1;
+  /* 31 bits left.  */

while it looks bad reality is that on 64bit pointer hosts we had 32 bits left.

Note that we could really pack things better and even make function_decl
smaller by using the tail-padding (for 64bit pointer hosts...) in
tree_decl_with_vis
which also has 32 + 14 bits left.

+ /* For operator delete [] we need to keep size operand
+alive. Otherwise this operand isn't available anymore
+when we finally decide that this stmt is necessary in
+   eliminate_unnecessary_stmts. If it should really be
+   unnecessary, a later pass can clean this up.  */
+ if (gimple_call_operator_delete_p (stmt))
+   {
+ for (unsigned i=1; i < gimple_call_num_args
(stmt); ++i)
+   {
+ tree arg = gimple_call_arg (stmt, i);
+ if (TREE_CODE (arg) == SSA_NAME)
+   mark_operand_necessary (arg);
+   }
+   }

bah ... so the "hack" for malloc/free pair removal doesn't really work very well
for new/delete.  What you do looks reasonable but in the end we might support
sth like "tentatively not necessary" and a late marking things necessary that
have been proven to be necessary anyway.

diff --git a/libstdc++-v3/testsuite/ext/bitmap_allocator/check_delete.cc
b/libstdc++-v3/testsuite/ext/bitmap_allocator/check_delete.cc
index bc36ed595e0..10e26615fb7 100644
--- a/libstdc++-v3/testsuite/ext/bitmap_allocator/check_delete.cc
+++ b/libstdc++-v3/testsuite/ext/bitmap_allocator/check_delete.cc
@@ -15,6 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .

+// { dg-options "-fno-tree-dce" }
+
 // 20.4.1.1 allocator members

hmm.  I think it would be better to add a new C++ FE option
-fno-allocation-dce that
simply doesn't set DECL_IS_OPERATOR_{NEW,DELETE} similar to how we have
-fno-lifetime-dse?  There might be projects that are not happy with this DCE and
disabling _all_ DCE just to keep existing behavior looks bad.

Otherwise the patch looks straight-forward in case C++ folks are happy with it.

Thanks for working on this!  I think that if you manage to convince C++ folks
and fix the -fno-allocation-dce missing then I'd like to have this in
early stage3.
We've been requesting this for a long time.

Richard.


> Best,
> Dominik
>


PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Richard Sandiford
r254589 was supposed to leave tests unchanged for the default setting
of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
Sorry for the breakage.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
OK to install?

Richard


2017-11-22  Richard Sandiford  

gcc/testsuite/
PR testsuite/83004
* gcc.dg/vect/pr81136.c: Restore previous alignment of 32
in the default case.

Index: gcc/testsuite/gcc.dg/vect/pr81136.c
===
--- gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:21:12.538474075 +
+++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:28:04.836628929 +
@@ -2,7 +2,13 @@
 
 #include "tree-vect.h"
 
-struct __attribute__((aligned (VECTOR_BITS / 8)))
+#if VECTOR_BITS > 256
+#define ALIGNMENT (VECTOR_BITS / 8)
+#else
+#define ALIGNMENT 32
+#endif
+
+struct __attribute__((aligned (ALIGNMENT)))
 {
   char misaligner;
   int foo[100];


PR82547: Undetected overflow for UNSIGNED wide_ints

2017-11-22 Thread Richard Sandiford
wi::add_large and wi::sub_large weren't setting the overflow bit
correctly for unsigned operations if the result needed fewer HWIs
than the precision.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-11-21  Richard Sandiford  

gcc/
PR middle-end/82547
* wide-int.cc (wi::add_large, wi::sub_large): Fix overflow detection
for unsigned values with fewer HWIs than the precision.
(test_overflow): New function.
(wide_int_cc_tests): Call it.

Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2017-11-22 09:21:12.516930172 +
+++ gcc/wide-int.cc 2017-11-22 09:33:00.427739376 +
@@ -1158,7 +1158,7 @@ wi::add_large (HOST_WIDE_INT *val, const
   val[len] = mask0 + mask1 + carry;
   len++;
   if (overflow)
-   *overflow = false;
+   *overflow = (sgn == UNSIGNED && carry);
 }
   else if (overflow)
 {
@@ -1552,7 +1552,7 @@ wi::sub_large (HOST_WIDE_INT *val, const
   val[len] = mask0 - mask1 - borrow;
   len++;
   if (overflow)
-   *overflow = false;
+   *overflow = (sgn == UNSIGNED && borrow);
 }
   else if (overflow)
 {
@@ -2345,14 +2345,54 @@ static void run_all_wide_int_tests ()
   test_comparisons  ();
 }
 
+/* Test overflow conditions.  */
+
+static void
+test_overflow ()
+{
+  static int precs[] = { 31, 32, 33, 63, 64, 65, 127, 128 };
+  static int offsets[] = { 16, 1, 0 };
+  for (unsigned int i = 0; i < ARRAY_SIZE (precs); ++i)
+for (unsigned int j = 0; j < ARRAY_SIZE (offsets); ++j)
+  {
+   int prec = precs[i];
+   int offset = offsets[j];
+   bool overflow;
+   wide_int sum, diff;
+
+   sum = wi::add (wi::max_value (prec, UNSIGNED) - offset, 1,
+  UNSIGNED, &overflow);
+   ASSERT_EQ (sum, -offset);
+   ASSERT_EQ (overflow, offset == 0);
+
+   sum = wi::add (1, wi::max_value (prec, UNSIGNED) - offset,
+  UNSIGNED, &overflow);
+   ASSERT_EQ (sum, -offset);
+   ASSERT_EQ (overflow, offset == 0);
+
+   diff = wi::sub (wi::max_value (prec, UNSIGNED) - offset,
+   wi::max_value (prec, UNSIGNED),
+   UNSIGNED, &overflow);
+   ASSERT_EQ (diff, -offset);
+   ASSERT_EQ (overflow, offset != 0);
+
+   diff = wi::sub (wi::max_value (prec, UNSIGNED) - offset,
+   wi::max_value (prec, UNSIGNED) - 1,
+   UNSIGNED, &overflow);
+   ASSERT_EQ (diff, 1 - offset);
+   ASSERT_EQ (overflow, offset > 1);
+}
+}
+
 /* Run all of the selftests within this file, for all value types.  */
 
 void
 wide_int_cc_tests ()
 {
- run_all_wide_int_tests  ();
- run_all_wide_int_tests  ();
- run_all_wide_int_tests  ();
+  run_all_wide_int_tests  ();
+  run_all_wide_int_tests  ();
+  run_all_wide_int_tests  ();
+  test_overflow ();
 }
 
 } // namespace selftest


Re: [PATCH] Fix dwarf2out ICE on VEC_SERIES rtl (PR debug/83034)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> On this testcase we get (vec_series (reg: ebp) (const_int 1))
> in NOTE_INSN_CALL_ARG_LOCATION because the vector isn't live anywhere else.
> Right now we do not have a good story for providing values of optimized
> out vectors, so just punting on it like on other VEC_* codes is sufficient.
> 
> For the future, the question is how to represent vectors, they could be
> represented e.g. using typed DWARF stack, but then the stack could have
> really large values, or by representing them as arrays of elements using
> DW_OP_piece, which probably current consumers would grok already now,
> but the debug info for it could be larger (say:
> typedef int V __attribute__((vector_size (32)));
> V bar (V, V);
> 
> V
> foo (V a, V b)
> {
>   V c = a + b;
>   return bar (a, b) + a;
> }
> to represent c we either have typed stack ... DW_OP_plus DW_OP_stack_value,
> or ( ... DW_OP_plus DW_OP_stack_value DW_OP_piece ... ) 8x
> 
> Anyway, for now I think this is enough.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-11-22  Jakub Jelinek  
> 
>   PR debug/83034
>   * dwarf2out.c (mem_loc_descriptor): Handle VEC_SERIES.
> 
>   * gcc.dg/pr83034.c: New test.
> 
> --- gcc/dwarf2out.c.jj2017-11-21 08:56:52.0 +0100
> +++ gcc/dwarf2out.c   2017-11-21 16:08:34.397197696 +0100
> @@ -15605,6 +15605,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
>  case VEC_SELECT:
>  case VEC_CONCAT:
>  case VEC_DUPLICATE:
> +case VEC_SERIES:
>  case UNSPEC:
>  case HIGH:
>  case FMA:
> --- gcc/testsuite/gcc.dg/pr83034.c.jj 2017-11-21 16:07:13.886253031 +0100
> +++ gcc/testsuite/gcc.dg/pr83034.c2017-11-21 16:03:23.0 +0100
> @@ -0,0 +1,12 @@
> +/* PR debug/83034 */
> +/* { dg-do compile } */
> +/* { dg-options "-funroll-loops -Ofast -g" } */
> +
> +__attribute__((__simd__)) float expf (float);
> +
> +void
> +foo (float *a, int x)
> +{
> +  for (; x; x++)
> +a[x] = expf (x);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> On these two testcases, we end up expanding MULT_EXPR where both arguments
> end up being VOIDmode.  For smul_optab that isn't a problem, we have
> the mode next to it, but in some cases we want to use {u,s}mul_widen_optab
> which is a conversion optab which needs two modes expand_binop is called
> just with one mode, the result mode, so the other mode is guessed from
> the operands and if both are VOIDmode, then a fallback is chosen to use
> return mode.  The new find_widening* changes ICE on that though, previously
> we'd just do something.
> 
> In any case, I think we need to make sure this doesn't happen while we
> still know both modes for the {u,s}mul_widen_optab.  Bootstrapped/regtested
> on x86_64-linux and i686-linux, ok for trunk?
> 
> Perhaps additionally we could have somewhere a case which for both arguments
> constant (unlikely case, as the gimple optimizers should usually optimize
> that out) and selected optabs for which we know the corresponding RTL code
> we could use simplify_const_binary_operation and see if it optimizes into a
> constant and just return that.  Though, these functions are large and it
> is still possible a constant could be uncovered later, so I think we want
> this patch even if we do something like that.

How much churn would it be to pass down a mode alongside the operands
in expand_binop?  Can't find it right now but didn't we introduce
some rtx_with_mode pair-like stuff somewhen?

Anyway, the patch looks ok to me but generally we of course want to
pass down modes from where we still know them.

Richard.

> 2017-11-22  Jakub Jelinek  
> 
>   PR middle-end/82875
>   * optabs.c (expand_doubleword_mult, expand_binop): Before calling
>   expand_binop with *mul_widen_optab, make sure at least one of the
>   operands doesn't have VOIDmode.
> 
>   * gcc.dg/pr82875.c: New test.
>   * gcc.c-torture/compile/pr82875.c: New test.
> 
> --- gcc/optabs.c.jj   2017-11-09 20:23:51.0 +0100
> +++ gcc/optabs.c  2017-11-21 17:40:13.318673366 +0100
> @@ -861,6 +861,11 @@ expand_doubleword_mult (machine_mode mod
>if (target && !REG_P (target))
>  target = NULL_RTX;
>  
> +  /* *_widen_optab needs to determine operand mode, make sure at least
> + one operand has non-VOID mode.  */
> +  if (GET_MODE (op0_low) == VOIDmode && GET_MODE (op1_low) == VOIDmode)
> +op0_low = force_reg (word_mode, op0_low);
> +
>if (umulp)
>  product = expand_binop (mode, umul_widen_optab, op0_low, op1_low,
>   target, 1, OPTAB_DIRECT);
> @@ -1199,6 +1204,10 @@ expand_binop (machine_mode mode, optab b
> : smul_widen_optab),
>wider_mode, mode) != CODE_FOR_nothing))
>  {
> +  /* *_widen_optab needs to determine operand mode, make sure at least
> +  one operand has non-VOID mode.  */
> +  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
> + op0 = force_reg (mode, op0);
>temp = expand_binop (wider_mode,
>  unsignedp ? umul_widen_optab : smul_widen_optab,
>  op0, op1, NULL_RTX, unsignedp, OPTAB_DIRECT);
> --- gcc/testsuite/gcc.dg/pr82875.c.jj 2017-11-21 17:50:16.022274628 +0100
> +++ gcc/testsuite/gcc.dg/pr82875.c2017-11-21 17:49:46.0 +0100
> @@ -0,0 +1,11 @@
> +/* PR middle-end/82875 */
> +/* { dg-do compile } */
> +/* { dg-options "-ftree-ter" } */
> +
> +const int a = 100;
> +
> +void
> +foo (void)
> +{
> +  int c[a];
> +}
> --- gcc/testsuite/gcc.c-torture/compile/pr82875.c.jj  2017-11-21 
> 17:48:46.409374708 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr82875.c 2017-11-21 
> 17:48:25.0 +0100
> @@ -0,0 +1,24 @@
> +/* PR middle-end/82875 */
> +
> +signed char a;
> +unsigned b;
> +long c, d;
> +long long e;
> +
> +void
> +foo (void)
> +{
> +  short f = a = 6;
> +  while (0)
> +while (a <= 7)
> +  {
> + for (;;)
> +   ;
> + lab:
> +   while (c <= 73)
> + ;
> + e = 20;
> + d ? (a %= c) * (e *= a ? f : b) : 0;
> +  }
> +  goto lab;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Avoid UNSPEC_VOLATILE or volatile ASM_OPERANDS in debug insns (PR debug/83084)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> Seems at least the scheduler treats UNSPEC_VOLATILE and perhaps volatile asm
> in DEBUG_INSNs as having side-effects and schedules differently depending on
> that.  While we could tweak the scheduler not to do that, these will be
> actually never useful in DEBUG_INSNs anyway, we don't know what it means so
> we can't represent it in debug info.
> 
> So, the following patch just makes sure we don't add them into debug_insns,
> instead reset those.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Do we have some RTL IL verifier where we could verify those don't appear?
Maybe ICE in dwarf2out.c?

Thanks,
Richard.

> 2017-11-22  Jakub Jelinek  
> 
>   PR debug/83084
>   * valtrack.c (propagate_for_debug_subst, propagate_for_debug): Reset
>   debug insns if they would contain UNSPEC_VOLATILE or volatile asm.
>   (dead_debug_insert_temp): Likewise, but also ignore even non-volatile
>   asm.
> 
>   * g++.dg/opt/pr83084.C: New test.
> 
> --- gcc/valtrack.c.jj 2017-09-12 21:58:01.0 +0200
> +++ gcc/valtrack.c2017-11-21 18:26:10.188009398 +0100
> @@ -171,10 +171,13 @@ propagate_for_debug_subst (rtx from, con
>   if (REG_P (*iter) && ++cnt > 1)
> {
>   rtx dval = make_debug_expr_from_rtl (old_rtx);
> + rtx to = pair->to;
> + if (volatile_insn_p (to))
> +   to = gen_rtx_UNKNOWN_VAR_LOC ();
>   /* Emit a debug bind insn.  */
>   rtx bind
> = gen_rtx_VAR_LOCATION (GET_MODE (old_rtx),
> -   DEBUG_EXPR_TREE_DECL (dval), pair->to,
> +   DEBUG_EXPR_TREE_DECL (dval), to,
> VAR_INIT_STATUS_INITIALIZED);
>   rtx_insn *bind_insn = emit_debug_insn_before (bind, pair->insn);
>   df_insn_rescan (bind_insn);
> @@ -217,6 +220,8 @@ propagate_for_debug (rtx_insn *insn, rtx
>dest, propagate_for_debug_subst, &p);
> if (loc == INSN_VAR_LOCATION_LOC (insn))
>   continue;
> +   if (volatile_insn_p (loc))
> + loc = gen_rtx_UNKNOWN_VAR_LOC ();
> INSN_VAR_LOCATION_LOC (insn) = loc;
> df_insn_rescan (insn);
>   }
> @@ -660,6 +665,12 @@ dead_debug_insert_temp (struct dead_debu
>   }
> return 0;
>   }
> +   /* Asm in DEBUG_INSN is never useful, we can't emit debug info for
> +  that.  And for volatile_insn_p, it is actually harmful
> +  - DEBUG_INSNs shouldn't have any side-effects.  */
> +   else if (GET_CODE (src) == ASM_OPERANDS
> +|| volatile_insn_p (src))
> + set = NULL_RTX;
>   }
>  
>/* ??? Should we try to extract it from a PARALLEL?  */
> --- gcc/testsuite/g++.dg/opt/pr83084.C.jj 2017-11-21 18:30:22.252935128 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr83084.C2017-11-21 18:29:18.0 
> +0100
> @@ -0,0 +1,16 @@
> +// PR debug/83084
> +// { dg-do compile }
> +// { dg-options "-O2 -fcompare-debug -Wno-return-type" }
> +
> +enum E { F };
> +template  struct A {
> +  bool foo ();
> +  int b;
> +};
> +template <> bool A<>::foo () {
> +  int a;
> +  do
> +if (a)
> +  return false;
> +  while (__atomic_compare_exchange_n (&b, &a, 0, 1, 4, 0));
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix i?86 bootstrap (PR rtl-optimization/82044)

2017-11-22 Thread Eric Botcazou
> Unfortunately this patch broke i686-linux bootstrap, during stage2
> libgcc configure fails due to numerous ICEs.

Note that the patch is on the 7 branch too.

-- 
Eric Botcazou


Re: [PATCH] Fix i?86 bootstrap (PR rtl-optimization/82044)

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 10:43:47AM +0100, Eric Botcazou wrote:
> > Unfortunately this patch broke i686-linux bootstrap, during stage2
> > libgcc configure fails due to numerous ICEs.
> 
> Note that the patch is on the 7 branch too.

Richard has reverted it there.

Jakub


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Richard Sandiford
Really sorry for missing this PR -- don't know that happened :-(

Jakub Jelinek  writes:
> On these two testcases, we end up expanding MULT_EXPR where both arguments
> end up being VOIDmode.  For smul_optab that isn't a problem, we have
> the mode next to it, but in some cases we want to use {u,s}mul_widen_optab
> which is a conversion optab which needs two modes expand_binop is called
> just with one mode, the result mode, so the other mode is guessed from
> the operands and if both are VOIDmode, then a fallback is chosen to use
> return mode.  The new find_widening* changes ICE on that though, previously
> we'd just do something.

What do you think about passing the modes of the operands down to
expand_binop too, a bit like simplify_unary_operation?  We could have
an overloaded wrapper with the current interface to avoid updating every
caller.  That at least would cut down on some of the guessing that
the function currently does.

I can have a go at that if it sounds OK, but the posted patch LGTM too
as an alternative.

> In any case, I think we need to make sure this doesn't happen while we
> still know both modes for the {u,s}mul_widen_optab.  Bootstrapped/regtested
> on x86_64-linux and i686-linux, ok for trunk?
>
> Perhaps additionally we could have somewhere a case which for both arguments
> constant (unlikely case, as the gimple optimizers should usually optimize
> that out) and selected optabs for which we know the corresponding RTL code
> we could use simplify_const_binary_operation and see if it optimizes into a
> constant and just return that.  Though, these functions are large and it
> is still possible a constant could be uncovered later, so I think we want
> this patch even if we do something like that.

(FWIW, I agree we need this either way, although folding sounds good too.)

Thanks,
Richard


RE: [patch] remove cilk-plus

2017-11-22 Thread Koval, Julia
Added fix for gcc/doc/sourcebuild.texi

> -Original Message-
> From: Koval, Julia
> Sent: Wednesday, November 22, 2017 10:15 AM
> To: Rainer Orth 
> Cc: Jeff Law ; Jakub Jelinek ; GCC
> Patches 
> Subject: RE: [patch] remove cilk-plus
> 
> Changes for these files(except sourcebuild one, will fix that) are included in
> patch I sent. I only removed from the patch deletion of the folders I 
> mentioned.
> 
> Julia
> 
> > -Original Message-
> > From: Rainer Orth [mailto:r...@cebitec.uni-bielefeld.de]
> > Sent: Wednesday, November 22, 2017 10:11 AM
> > To: Koval, Julia 
> > Cc: Jeff Law ; Jakub Jelinek ; GCC
> > Patches 
> > Subject: Re: [patch] remove cilk-plus
> >
> > Hi Julia,
> >
> > >> So it's not important, but the patch doesn't have the removal of the
> > >> cilk+ testsuite or runtime.  BUt again, it's not a big deal, I can guess
> > >> what that part of the patch looks like.
> > >
> > > I used Jakub's suggestion in
> > > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01348.html and didn't add
> > > libcilkrts and cilk-plus directories to patch(without this patch doesn't
> > > fit in gcc-patches), only to change log. I will include them, when I'll
> > > commit the patch, but I guess there is nothing to review here:
> > > rm -rf gcc/testsuite/c-c++-common/cilk-plus
> > > rm -rf gcc/testsuite/g++.dg/cilk-plus
> > > rm -rf gcc/testsuite/gcc.dg/cilk-plus
> > > rm -rf libcilkrts
> > > I can send it as an additional patch(or patches) if this is required.
> >
> > there's more in the testsuite:
> >
> > gcc/testsuite/lib/cilk-plus-dg.exp
> > gcc/testsuite/lib/target-supports.exp (check_effective_target_cilkplus,
> > check_effective_target_cilkplus_runtime)
> > gcc/doc/sourcebuild.texi (cilkplus_runtime effective-target keyword)
> >
> > Rainer
> >
> > --
> > -
> > Rainer Orth, Center for Biotechnology, Bielefeld University


cilk-plus.tar.xz
Description: cilk-plus.tar.xz


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 10:41:19AM +0100, Richard Biener wrote:
> How much churn would it be to pass down a mode alongside the operands
> in expand_binop?  Can't find it right now but didn't we introduce
> some rtx_with_mode pair-like stuff somewhen?

We have rtx_mode_t for that.  But there are 240+ calls to expand_binop,
and even if we add an overload that will transform it, unless we forcefully
inline it wouldn't that slow down all the spots a little bit?
The thing is, for the vast majority of binary ops we don't need the operand
modes, it is mainly comparisons, second arg of shifts/rotates and this
widening case.

Jakub


Re: [PATCH] Avoid UNSPEC_VOLATILE or volatile ASM_OPERANDS in debug insns (PR debug/83084)

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 10:42:13AM +0100, Richard Biener wrote:
> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Seems at least the scheduler treats UNSPEC_VOLATILE and perhaps volatile asm
> > in DEBUG_INSNs as having side-effects and schedules differently depending on
> > that.  While we could tweak the scheduler not to do that, these will be
> > actually never useful in DEBUG_INSNs anyway, we don't know what it means so
> > we can't represent it in debug info.
> > 
> > So, the following patch just makes sure we don't add them into debug_insns,
> > instead reset those.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Ok.
> 
> Do we have some RTL IL verifier where we could verify those don't appear?
> Maybe ICE in dwarf2out.c?

dwarf2out.c would ICE on UNSPEC_VOLATILE (but not ASM_OPERANDS), no idea why
it didn't make all the way up to there.
In any case, a more useful verifier would be something like we have for
GIMPLE verification, something that walks the whole IL after every pass and
ICEs if something fails verification.  We don't have that for RTL and I
think we should, it would e.g. catch the various cases where backends just
use invalid RTL in their patterns on which e.g. simplify-rtx.c could just
ICE if it made it there.

Jakub


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> On Wed, Nov 22, 2017 at 10:41:19AM +0100, Richard Biener wrote:
> > How much churn would it be to pass down a mode alongside the operands
> > in expand_binop?  Can't find it right now but didn't we introduce
> > some rtx_with_mode pair-like stuff somewhen?
> 
> We have rtx_mode_t for that.  But there are 240+ calls to expand_binop,
> and even if we add an overload that will transform it, unless we forcefully
> inline it wouldn't that slow down all the spots a little bit?
> The thing is, for the vast majority of binary ops we don't need the operand
> modes, it is mainly comparisons, second arg of shifts/rotates and this
> widening case.

Ok, so maybe split expand_binop then to the class of cases where we do
need the mode and a class where we don't then?

We don't have to use rtx_mode_t we can just pass two arguments.  Not
sure what is more convenient to use.

Anyway, this doesn't have to happen in stage3, just as a general
note on how I believe we changed things in other places.  Richard S.
may remember more here.

Richard.


Replace REDUC_*_EXPRs with internal functions.

2017-11-22 Thread Richard Sandiford
This patch replaces the REDUC_*_EXPR tree codes with internal functions.
This is needed so that the support for in-order reductions can also use
internal functions without too much complication.

This came out of the review for:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01516.html

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-11-22  Richard Sandiford  

gcc/
* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Delete.
* cfgexpand.c (expand_debug_expr): Remove handling for them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (const_unop): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_unary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
(op_symbol_code): Likewise.
* internal-fn.def (DEF_INTERNAL_SIGNED_OPTAB_FN): Define.
(IFN_REDUC_PLUS, IFN_REDUC_MAX, IFN_REDUC_MIN): New internal functions.
* internal-fn.c (direct_internal_fn_optab): New function.
(direct_internal_fn_array, direct_internal_fn_supported_p
(internal_fn_expanders): Handle DEF_INTERNAL_SIGNED_OPTAB_FN.
* fold-const-call.c (fold_const_reduction): New function.
(fold_const_call): Handle CFN_REDUC_PLUS, CFN_REDUC_MAX and
CFN_REDUC_MIN.
* tree-vect-loop.c: Include internal-fn.h.
(reduction_code_for_scalar_code): Rename to...
(reduction_fn_for_scalar_code): ...this and return an internal
function.
(vect_model_reduction_cost): Take an internal_fn rather than
a tree_code.
(vect_create_epilog_for_reduction): Likewise.  Build calls rather
than assignments.
(vectorizable_reduction): Use internal functions rather than tree
codes for the reduction operation.  Update calls to the functions
above.
* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
Use calls to internal functions rather than REDUC tree codes.
* config/aarch64/aarch64-simd.md: Update comment accordingly.

Index: gcc/tree.def
===
--- gcc/tree.def2017-11-22 10:05:56.012967507 +
+++ gcc/tree.def2017-11-22 10:06:16.885024316 +
@@ -1260,18 +1260,6 @@ DEFTREECODE (OMP_CLAUSE, "omp_clause", t
Operand 0: BODY: contains body of the transaction.  */
 DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
 
-/* Reduction operations.
-   Operations that take a vector of elements and "reduce" it to a scalar
-   result (e.g. summing the elements of the vector, finding the minimum over
-   the vector elements, etc).
-   Operand 0 is a vector.
-   The expression returns a scalar, with type the same as the elements of the
-   vector, holding the result of the reduction of all elements of the operand.
-   */
-DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
-
 /* Widening dot-product.
The first two arguments are of type t1.
The third argument and the result are of type t2, such that t2 is at least
Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2017-11-22 10:05:56.012967507 +
+++ gcc/cfgexpand.c 2017-11-22 10:06:16.878456941 +
@@ -5050,9 +5050,6 @@ expand_debug_expr (tree exp)
 
 /* Vector stuff.  For most of the codes we don't have rtl codes.  */
 case REALIGN_LOAD_EXPR:
-case REDUC_MAX_EXPR:
-case REDUC_MIN_EXPR:
-case REDUC_PLUS_EXPR:
 case VEC_COND_EXPR:
 case VEC_PACK_FIX_TRUNC_EXPR:
 case VEC_PACK_SAT_EXPR:
Index: gcc/expr.c
===
--- gcc/expr.c  2017-11-22 10:05:56.012967507 +
+++ gcc/expr.c  2017-11-22 10:06:16.88034 +
@@ -9367,26 +9367,6 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
 return target;
   }
 
-case REDUC_MAX_EXPR:
-case REDUC_MIN_EXPR:
-case REDUC_PLUS_EXPR:
-  {
-op0 = expand_normal (treeop0);
-this_optab = optab_for_tree_code (code, type, optab_default);
-machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
-
-   struct expand_operand ops[2];
-   enum insn_code icode = optab_handler (this_optab, vec_mode);
-
-   create_output_operand (&ops[0], target, mode);
-   create_input_operand (&ops[1], op0, vec_mode);
-   expand_insn (icode, 2, ops);
-   target = ops[0].value;
-   if (GET_MODE (target) != mode)
- return gen_lowpart (tmode, target);
-   return target;
-  }
-
 case VEC_UNPACK_HI_EXPR:
 case VEC_UNPACK_LO_EXPR:
   {
Index: gcc/fold-const.c
===

Re: [PATCH] Avoid UNSPEC_VOLATILE or volatile ASM_OPERANDS in debug insns (PR debug/83084)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> On Wed, Nov 22, 2017 at 10:42:13AM +0100, Richard Biener wrote:
> > On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Seems at least the scheduler treats UNSPEC_VOLATILE and perhaps volatile 
> > > asm
> > > in DEBUG_INSNs as having side-effects and schedules differently depending 
> > > on
> > > that.  While we could tweak the scheduler not to do that, these will be
> > > actually never useful in DEBUG_INSNs anyway, we don't know what it means 
> > > so
> > > we can't represent it in debug info.
> > > 
> > > So, the following patch just makes sure we don't add them into 
> > > debug_insns,
> > > instead reset those.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Ok.
> > 
> > Do we have some RTL IL verifier where we could verify those don't appear?
> > Maybe ICE in dwarf2out.c?
> 
> dwarf2out.c would ICE on UNSPEC_VOLATILE (but not ASM_OPERANDS), no idea why
> it didn't make all the way up to there.
> In any case, a more useful verifier would be something like we have for
> GIMPLE verification, something that walks the whole IL after every pass and
> ICEs if something fails verification.  We don't have that for RTL and I
> think we should, it would e.g. catch the various cases where backends just
> use invalid RTL in their patterns on which e.g. simplify-rtx.c could just
> ICE if it made it there.

Yes.  Too bad we don't have such thing.

Richard.


Re: libgo patch committed: Fix Makefile bug setting LD_LIBRARY_PATH

2017-11-22 Thread Eric Botcazou
> If we don't then i would say that's the bug to fix, not warp back in
> time 20 years. Why let solaris hold hostage everybody else?

Let's not start a flame war, please.  Almost all other uses of grep in the 
libgo directory have >/dev/null instead of -q and nobody chokes on them.

-- 
Eric Botcazou


Re: [PATCH] PR libstdc++/48101 improve errors for invalid container specializations

2017-11-22 Thread Rainer Orth
Hi Jonathan,

> This uses static_assert to improve the errors when attempting to
> instantiate invalid specializations of containers, e.g. set,
> or unordered_set, hash> (which mixes up the
> order of the hasher and equality predicate arguments).
>
> This means instead of more than 100 lines of confusing errors for
> https://wandbox.org/permlink/kL1YVNVOzrAsLPyS we get only this:
>
> In file included from /home/jwakely/gcc/8/include/c++/8.0.0/set:61:0,
> from s.cc:2:
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h: In instantiation of 
> ‘class std::set’:
> s.cc:8:18:   required from here
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h:108:7: error: static 
> assertion failed: std::set must have a non-const, non-volatile value_type
>   static_assert(is_same::type, _Key>::value,
>   ^
> In file included from 
> /home/jwakely/gcc/8/include/c++/8.0.0/unordered_set:46:0,
> from s.cc:1:
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h: In instantiation of 
> ‘class std::_Hashtable, 
> std::__detail::_Identity, std::hash, std::equal_to, 
> std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
> std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits true, false> >’:
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/unordered_set.h:898:18:   required 
> from ‘class std::unordered_multiset, std::hash >’
> s.cc:10:53:   required from here
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:195:7: error: static 
> assertion failed: hash function must be invocable with an argument of key type
>   static_assert(__is_invocable{},
>   ^
> /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:197:7: error: static 
> assertion failed: key equality predicate must be invocable with two arguments 
> of key type
>   static_assert(__is_invocable{},
>   ^
>
> Tested powerpc64le-linux, committed to trunk.

this broke Go bootstrap, it seems.  Seen on i386-pc-solaris2.11 as of
r255048:

In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
 from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
 from 
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/ast-dump.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
   static_assert(__is_invocable{},
   ^
In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
 from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
 from /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/escape.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
   static_assert(__is_invocable{},
   ^

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PING] Plugin support on Windows/MinGW

2017-11-22 Thread JonY
On 11/21/2017 07:03 AM, Boris Kolpackov wrote:
> Hi,
> 
> I would like to ping this patch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01040.html
> 
> The changes are fairly conservative: they do not touch much of the
> existing module implementation and plugin support on MinGW is disabled
> by default (there are also fixes for a couple of bugs as a bonus).
> I would really liked to have this patch considered for GCC 8, if
> possible.
> 
> Thanks,
> Boris
> 

Is there a problem with using .so for internal libraries instead of
"dll" if it simplifies the code?

I don't see any other problems with it.



signature.asc
Description: OpenPGP digital signature


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Martin Jambor
On Tue, Nov 21 2017, Jeff Law wrote:
> On 11/21/2017 04:14 AM, Dominik Inführ wrote:
>> Hi,
>> 
>> this patch tries to extend tree-ssa-dce.c to remove unnecessary 
>> new/delete-pairs (it already does that for malloc/free). Clang does it too 
>> and it seems to be allowed by 
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
>> bootstrapped/regtested on aarch64-linux and x86_64-linux.
> Just a note, we've transitioned into stage3 in preparation for the
> upcoming gcc-8 release in the spring.  During stage3 we're addressing
> bugfixes, not further enhancements (with the exception of enhancements
> that were posted prior to stage1 close).
>
> So it's unlikely anyone will dig into this right now, unless there's an
> existing bugzilla around this missed optimization.

Unless I am mistaken, I think it is PR 23383

Martin



Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 10:30:29AM +0100, Richard Biener wrote:
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1787,7 +1787,9 @@ struct GTY(()) tree_function_decl {
>unsigned has_debug_args_flag : 1;
>unsigned tm_clone_flag : 1;
>unsigned versioned_function : 1;
> -  /* No bits left.  */
> +
> +  unsigned operator_delete_flag : 1;
> +  /* 31 bits left.  */
> 
> while it looks bad reality is that on 64bit pointer hosts we had 32 bits left.

But we can just add it to say tree_decl_common which has 14 spare bits or
tree_decl_with_vis which has another 14 spare bits.  By just noting the flag
applies only to FUNCTION_DECLs and enforcing it in the tree.h macros,
it will be easy to reuse that bit for something different for trees other
than FUNCTION_DECL.

Jakub


[PATCH] Fix PR82991

2017-11-22 Thread Richard Biener

I am testing the following (old) patch to value number call lhs
according to the ERF_RETURNS_ARG annotation.  This allows for
more expression simplification and also changes downstream
users of the argument to use the return value which should help
register allocation and register lifetime in general.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-11-22  Richard Biener  

PR tree-optimization/82991
* tree-ssa-sccvn.c (visit_use): Value number lhs of call to
argument that we know is returned.
(eliminate_dom_walker::eliminate_avail): Also lookup available
leader for default defs.
(eliminate_dom_walker::before_dom_children): Watch out for
side-effects even if we know the value of the call result.

* gcc.dg/tree-ssa/ssa-fre-60.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 255051)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -4131,6 +4131,26 @@ visit_use (tree use)
  changed = defs_to_varying (call_stmt);
  goto done;
}
+
+ /* If the return value is a copy of an argument record that.  */
+ int rflags = gimple_call_return_flags (call_stmt);
+ if (rflags & ERF_RETURNS_ARG)
+   {
+ unsigned argnum = rflags & ERF_RETURN_ARG_MASK;
+ if (argnum < gimple_call_num_args (call_stmt))
+   {
+ tree arg = gimple_call_arg (call_stmt, argnum);
+ if (TREE_CODE (arg) == SSA_NAME
+ || is_gimple_min_invariant (arg))
+   {
+ changed = visit_copy (lhs, arg);
+ if (gimple_vdef (call_stmt))
+   changed |= set_ssa_val_to (gimple_vdef (call_stmt),
+  gimple_vdef (call_stmt));
+ goto done;
+   }
+   }
+   }
}
 
   /* Pick up flags from a devirtualization target.  */
@@ -5201,10 +5221,10 @@ eliminate_dom_walker::eliminate_avail (t
   tree valnum = VN_INFO (op)->valnum;
   if (TREE_CODE (valnum) == SSA_NAME)
 {
-  if (SSA_NAME_IS_DEFAULT_DEF (valnum))
-   return valnum;
   if (avail.length () > SSA_NAME_VERSION (valnum))
return avail[SSA_NAME_VERSION (valnum)];
+  if (SSA_NAME_IS_DEFAULT_DEF (valnum))
+   return valnum;
 }
   else if (is_gimple_min_invariant (valnum))
 return valnum;
@@ -5421,6 +5441,14 @@ eliminate_dom_walker::before_dom_childre
eliminate_push_avail (sprime);
}
 
+ /* While we might have value numbered a call return value to
+one of its arguments the call itself might not be solely
+represented by its return value.  Thus do not ignore
+side-effects indicated by a VARYING vdef.  */
+ if (gimple_vdef (stmt)
+ && VN_INFO (gimple_vdef (stmt))->valnum == gimple_vdef (stmt))
+   sprime = NULL_TREE;
+
  /* If this now constitutes a copy duplicate points-to
 and range info appropriately.  This is especially
 important for inserted code.  See tree-ssa-copy.c
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-60.c  (working copy)
@@ -0,0 +1,43 @@
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+void link_error (void);
+
+void f1 (char *p)
+{
+  char *q = __builtin_stpcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n != 3)
+link_error ();
+}
+
+char *f2 (char *p)
+{
+  char *q = __builtin_strcpy (p, "123");
+  unsigned n = q - p;
+
+  if (n)
+link_error ();
+
+  return p;
+}
+
+char *f3 (char *p, const char *s)
+{
+  char *q = __builtin_memcpy (p, s, 3);
+  unsigned n = q - p;
+
+  if (n)
+link_error ();
+
+  return q;
+}
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "link_error" 0 "fre1" } } */
+/* { dg-final { scan-tree-dump-times "return q" 2 "fre1" } } */


Re: Replace REDUC_*_EXPRs with internal functions.

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 10:09:08AM +, Richard Sandiford wrote:
> This patch replaces the REDUC_*_EXPR tree codes with internal functions.
> This is needed so that the support for in-order reductions can also use
> internal functions without too much complication.
> 
> This came out of the review for:
>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01516.html
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Wouldn't it be better to just have IFN_REDUC that takes as an additional
argument INTEGER_CST with tree_code of the operation (so REDUC_MAX_EXPR
would be transformed into REDUC (MAX_EXPR, ...) etc.)?
That way we wouldn't need to add further internal fns if we want say
multiplication reduction, or some other.

Jakub


[patch] Add support for #pragma GCC unroll v2

2017-11-22 Thread Eric Botcazou
Hi,

this is a revised version of:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html

with the following changes:
 1. integration of Bernhard's patch for the Fortran front-end,
 2. Sandra's fix for the documentation,
 3. minor tweaks to the C and C++ front-end,
 4. change at the GIMPLE level for the cunrolli pass,
 5. change at the RTL level with a fix for a thinko,
 6. More testcases for all languages.

This makes it so that the presence of a pragma GCC unroll doesn't have a 
global effect on the function: at the GIMPLE level, the cunrolli pass is no 
longer forced, so that the unrolling is done by the first activated pass 
(cunroll at -O1, cunrolli at -O2 and above); at the RTL level, this was 
already the case but the code no longer fiddles with the unrolling flag.

Tested on x86_64-suse-linux, OK for the mainline?


2017-11-22  Mike Stump  
Eric Botcazou  
Bernhard Reutner-Fischer  

ChangeLog/
* doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
* doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
* cfgloop.h (struct loop): Add unroll field.
* function.h (struct function): Add has_unroll bitfield.
* gimplify.c (gimple_boolify) : Deal with unroll kind.
(gimplify_expr) : Propagate 3rd operand.
* loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
(pass_rtl_unroll_loops::gate): Likewise.
* loop-unroll.c (decide_unrolling): Tweak note message.  Skip loops
for which loop->unroll==1.
(decide_unroll_constant_iterations): Use note for consistency and
take loop->unroll into account.  Return early if loop->unroll is set.
Fix thinko in existing test.
(decide_unroll_runtime_iterations): Use note for consistency and
take loop->unroll into account.
(decide_unroll_stupid): Likewise.
* lto-streamer-in.c (input_cfg): Read loop->unroll.
* lto-streamer-out.c (output_cfg): Write loop->unroll.
* tree-cfg.c (replace_loop_annotate_in_block) 
New.
(replace_loop_annotate) : Likewise.
(print_loop): Print loop->unroll if set.
* tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
* tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
* tree-pretty-print.c (dump_generic_node) :
New.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
loop->unroll is set and smaller than the trip count.  Otherwise bypass
entirely the heuristics if loop->unroll is set.  Remove dead note.
Fix off-by-one bug in other node.
(try_peel_loop): Bail out if loop->unroll is set.  Fix formatting.
(tree_unroll_loops_completely_1): Force unrolling if loop->unroll
is greater than 1.
(tree_unroll_loops_completely): Make static.
(pass_complete_unroll::execute): Use correct type for variable.
(pass_complete_unrolli::execute): Fix formatting.
* tree.def (ANNOTATE_EXPR): Add 3rd operand.

ada/ChangeLog:
* gcc-interface/trans.c (gnat_gimplify_stmt) : Add 3rd
operand to ANNOTATE_EXPR and pass unrolling hints.

c-family/ChangeLog:
* c-pragma.c (init_pragma): Register pragma GCC unroll.
* c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.

c/ChangeLog:
* c-parser.c (c_parser_while_statement): Add unroll parameter and
build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_statement_after_labels): Adjust calls to above.
(c_parse_pragma_ivdep): New static function.
(c_parser_pragma_unroll): Likewise.
(c_parser_pragma) : Add support for pragma Unroll.
: New case.

cp/ChangeLog:
* constexpr.c (cxx_eval_constant_expression) : Remove
assertion on 2nd operand.
(potential_constant_expression_1): Likewise.
* cp-array-notation.c (create_an_loop): Adjut call to finish_for_cond.
* cp-tree.h (cp_convert_range_for): Adjust prototype.
(finish_while_stmt_cond): Likewise.
(finish_do_stmt): Likewise.
(finish_for_cond): Likewise.
* init.c (build_vec_init): Adjut call to finish_for_cond.
* parser.c (cp_parser_statement): Adjust call to
cp_parser_iteration_statement.
(cp_parser_for): Add unroll parameter and pass it in calls to
cp_parser_range_for and cp_parser_c_for.
(cp_parser_c_for): Add unroll parameter and pass it in call to
finish_for_cond.
(cp_parser_range_for): Add unroll parameter and pass it in call to
cp_convert_range_for.
(cp_convert_range_for): Add unroll parameter and pass it in call to
finish_for_cond.
(cp_parser_iteration_statement): Add unroll parameter and pass it in
calls to finish_while_stmt_cond, finish_do_stmt and 

[PATCH] vrp_prop::check_array_ref fixes (PR tree-optimization/83044)

2017-11-22 Thread Jakub Jelinek
Hi!

On the following testcase we ICE, because eltsize is 0 and when we
int_const_binop (TRUNC_DIV_EXPR, maxbound, size_int (0)), that of course
returns NULL.
The upper bound for zero sized elements is just not meaningful, if we use
any index we still won't reach maximum object size that way.

While looking at that, I've discovered a couple of other issues though.

One is that the off subtraction seems misplaced.
get_addr_base_and_unit_offset computes an offset in bytes, so it corresponds
to the __PTRDIFF_MAX__ limit on object size (also in bytes), not to the
__PTRDIFF_MAX__ limit divided by element size.
So I believe we first want to subtract off (and only if it is positive, we
do not want to enlarge the limit) from __PTRDIFF_MAX__ and then divide
rather than what the code was doing before.

Another thing is that the code has been mixing up types randomly, something
being in ptrdiff_type_node, something in sizetype.

And yet another thing is that even if we can't compute any sensible upper
bound (e.g. the eltsize 0 case; I believe the VLA case just won't happen
here, because we replace the VLAs with dereferences of a pointer I believe
the ARRAY_REFs should have been transformed into POINTER_PLUS anyway), we
can still warn about indexes smaller than low bound (especially if the low
bound is not zero like in Fortran).

The Warray-bounds.c testcase needed adjustment, because it was written with
the same bad computation of the maximum - __PTRDIFF_MAX__ / sizeof (elt) -
offset rather than (__PTRDIFF_MAX__ - offset) / sizeof (elt).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-22  Jakub Jelinek  

PR tree-optimization/83044
* tree-vrp.c (vrp_prop::check_array_ref): If eltsize is not
INTEGER_CST or is 0, clear up_bound{,_p1} and later ignore tests
that need the upper bound.  Subtract offset from
get_addr_base_and_unit_offset only if positive and subtract it
before division by eltsize rather than after it.

* gcc.dg/pr83044.c: New test.
* c-c++-common/Warray-bounds.c (fb): Fix up MAX value.

--- gcc/tree-vrp.c.jj   2017-11-17 08:40:28.0 +0100
+++ gcc/tree-vrp.c  2017-11-21 14:13:47.184974061 +0100
@@ -4795,24 +4795,30 @@ vrp_prop::check_array_ref (location_t lo
 the size of the largest object is PTRDIFF_MAX.  */
   tree eltsize = array_ref_element_size (ref);
 
-  /* FIXME: Handle VLAs.  */
-  if (TREE_CODE (eltsize) != INTEGER_CST)
-   return;
-
-  tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
-
-  up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
-
-  tree arg = TREE_OPERAND (ref, 0);
-
-  HOST_WIDE_INT off;
-  if (get_addr_base_and_unit_offset (arg, &off))
-   up_bound_p1 = wide_int_to_tree (sizetype,
-   wi::sub (wi::to_wide (up_bound_p1),
-off));
-
-  up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
- build_int_cst (ptrdiff_type_node, 1));
+  if (TREE_CODE (eltsize) != INTEGER_CST
+ || integer_zerop (eltsize))
+   {
+ up_bound = NULL_TREE;
+ up_bound_p1 = NULL_TREE;
+   }
+  else
+   {
+ tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
+ tree arg = TREE_OPERAND (ref, 0);
+ HOST_WIDE_INT off;
+
+ if (get_addr_base_and_unit_offset (arg, &off) && off > 0)
+   maxbound = wide_int_to_tree (sizetype,
+wi::sub (wi::to_wide (maxbound),
+ off));
+ else
+   maxbound = fold_convert (sizetype, maxbound);
+
+ up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+ up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+ build_int_cst (ptrdiff_type_node, 1));
+   }
 }
   else
 up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
@@ -4823,7 +4829,7 @@ vrp_prop::check_array_ref (location_t lo
   tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
-  if (tree_int_cst_equal (low_bound, up_bound_p1))
+  if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
 {
   warning_at (location, OPT_Warray_bounds,
  "array subscript %E is above array bounds of %qT",
@@ -4843,7 +4849,8 @@ vrp_prop::check_array_ref (location_t lo
 
   if (vr && vr->type == VR_ANTI_RANGE)
 {
-  if (TREE_CODE (up_sub) == INTEGER_CST
+  if (up_bound
+ && TREE_CODE (up_sub) == INTEGER_CST
   && (ignore_off_by_one
  ? tree_int_cst_lt (up_bound, up_sub)
  : tree_int_cst_le (up_bound, up_sub))
@@ -4856,7 +4863,8 @@ vrp_prop::check_array_ref (location_t lo
   TREE_NO_WARNING (ref) = 1;
 }
 }
-  else if (TREE_CODE (up_sub) == INTEGER_CST
+  else if (up_bound
+  && TREE_CODE (

Re: [PATCH] PR libstdc++/48101 improve errors for invalid container specializations

2017-11-22 Thread Jonathan Wakely

On 22/11/17 11:23 +0100, Rainer Orth wrote:

Hi Jonathan,


This uses static_assert to improve the errors when attempting to
instantiate invalid specializations of containers, e.g. set,
or unordered_set, hash> (which mixes up the
order of the hasher and equality predicate arguments).

This means instead of more than 100 lines of confusing errors for
https://wandbox.org/permlink/kL1YVNVOzrAsLPyS we get only this:

In file included from /home/jwakely/gcc/8/include/c++/8.0.0/set:61:0,
from s.cc:2:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h: In instantiation of ‘class 
std::set’:
s.cc:8:18:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h:108:7: error: static 
assertion failed: std::set must have a non-const, non-volatile value_type
  static_assert(is_same::type, _Key>::value,
  ^
In file included from /home/jwakely/gcc/8/include/c++/8.0.0/unordered_set:46:0,
from s.cc:1:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h: In instantiation of ‘class std::_Hashtable, std::__detail::_Identity, std::hash, std::equal_to, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, 
std::__detail::_Hashtable_traits >’:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/unordered_set.h:898:18:   required from ‘class 
std::unordered_multiset, std::hash >’
s.cc:10:53:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:195:7: error: static 
assertion failed: hash function must be invocable with an argument of key type
  static_assert(__is_invocable{},
  ^
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:197:7: error: static 
assertion failed: key equality predicate must be invocable with two arguments 
of key type
  static_assert(__is_invocable{},
  ^

Tested powerpc64le-linux, committed to trunk.


this broke Go bootstrap, it seems.  Seen on i386-pc-solaris2.11 as of
r255048:

In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
from 
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/ast-dump.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
  static_assert(__is_invocable{},
  ^
In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
from /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/escape.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
  static_assert(__is_invocable{},
  ^


This is PR83102. I'm going to relax the checks, but the Go
frontend still needs to be fixed.




Re: [PATCH] Fix PR82991

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> 
> I am testing the following (old) patch to value number call lhs
> according to the ERF_RETURNS_ARG annotation.  This allows for
> more expression simplification and also changes downstream
> users of the argument to use the return value which should help
> register allocation and register lifetime in general.

If the argument is an SSA_NAME then that is generally a win, but won't
it also replace if the argument is say ADDR_EXPR of a VAR_DECL or similar?

If we only later (say some later forwprop) determine the argument is
gimple_min_invariant, shouldn't we have something that will effectively
undo this (i.e. replace the uses of the function result with whatever
we've simplified the argument to)?

Jakub


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Alan Hayward

> On 22 Nov 2017, at 09:14, Richard Biener  wrote:
> 
> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
>  wrote:
>> 
>>> This is PR81179 I think, please mention that in the changelog.
>> 
>> Correct, my bad for missing that.
>> 
>>> This unconditionally pessimizes code even if there is no valid index
>>> zero, right?
>> 
>> Almost, since for a loop such as:
>> 
>>  #define OFFSET 1
>>  unsigned int find(const unsigned int *a, unsigned int v) {
>>unsigned int result = 120;
>>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>>  if (a[i-OFFSET] == v) result = i;
>>}
>>return result;
>>  }
>> 
>> the index i will match the contents of the index vector used here ---
>> but this does indeed pessimize the code generated for, say, OFFSET
>> = 2. It is probably more sensible to use the existing code path in those
>> situations.
>> 
>>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>> 
>> Does that mean such overflows can already manifest themselves for
>> regular COND_REDUCTION? I had assumed sufficient checks were already in
>> place because of the presence of the is_nonwrapping_integer_induction
>> test.
> 
> But only if we need the index vector?  The patch looked like you're changing
> how other modes are handled (in my look I didn't make myself familiar with
> the various modes again...).  Anyway, Alan hopefully remembers what he
> coded so I'll defer to him here.
> 
> If Alan is happy with the patch consider it approved.
> 

Richard’s right with his question.

The optimisation needs to fail if the number of interactions of the loop + 1 
doesn’t
fit into the data type used for the result.

I took the test pr65947-14.c
First I set N to 0x-1. This compiled and vectorised. That’s ok.
Now if I set N to 0x it still vectorises, but this should fail.

Compare to pr65947-14.c where we set  last = a[I]; inside the if.
When set N to 0x-1, it compiled and vectorised. That’s ok.
When set N to 0x it fails to vectorise with the message
"loop size is greater than data size”.

Looks like you might just need to add the one check.

Also see pr65947-9.c which uses the slightly more useful char indexes.


Alan.





Re: [PATCH] PR libstdc++/48101 improve errors for invalid container specializations

2017-11-22 Thread Jonathan Wakely

On 22/11/17 10:56 +, Jonathan Wakely wrote:

On 22/11/17 11:23 +0100, Rainer Orth wrote:

Hi Jonathan,


This uses static_assert to improve the errors when attempting to
instantiate invalid specializations of containers, e.g. set,
or unordered_set, hash> (which mixes up the
order of the hasher and equality predicate arguments).

This means instead of more than 100 lines of confusing errors for
https://wandbox.org/permlink/kL1YVNVOzrAsLPyS we get only this:

In file included from /home/jwakely/gcc/8/include/c++/8.0.0/set:61:0,
   from s.cc:2:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h: In instantiation of ‘class 
std::set’:
s.cc:8:18:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h:108:7: error: static 
assertion failed: std::set must have a non-const, non-volatile value_type
 static_assert(is_same::type, _Key>::value,
 ^
In file included from /home/jwakely/gcc/8/include/c++/8.0.0/unordered_set:46:0,
   from s.cc:1:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h: In instantiation of ‘class std::_Hashtable, std::__detail::_Identity, std::hash, std::equal_to, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, 
std::__detail::_Hashtable_traits >’:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/unordered_set.h:898:18:   required from ‘class 
std::unordered_multiset, std::hash >’
s.cc:10:53:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:195:7: error: static 
assertion failed: hash function must be invocable with an argument of key type
 static_assert(__is_invocable{},
 ^
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:197:7: error: static 
assertion failed: key equality predicate must be invocable with two arguments 
of key type
 static_assert(__is_invocable{},
 ^

Tested powerpc64le-linux, committed to trunk.


this broke Go bootstrap, it seems.  Seen on i386-pc-solaris2.11 as of
r255048:

In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
   from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
   from /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/ast-dump.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
 static_assert(__is_invocable{},
 ^
In file included from 
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
   from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
   from /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/escape.cc:7:
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
 In instantiation of 'class std::_Rb_tree, Import_init_lt, std::allocator >':
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
   required from 'class std::set'
/vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required from 
here
/var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
 error: static assertion failed: comparison object must be invocable with two 
arguments of key type
 static_assert(__is_invocable{},
 ^


This is PR83102. I'm going to relax the checks, but the Go
frontend still needs to be fixed.




This should allow Go to build again (unless you bootstrap with
-std=gnu++17 in the build flags)


commit fbf3996895469f1bcd56c6ca3cfac190d956fff9
Author: Jonathan Wakely 
Date:   Wed Nov 22 10:52:56 2017 +

PR go/83102 relax std::set checks for invocable comparison object

PR go/83102
* include/bits/stl_tree.h (_Rb_tree): Relax invocable checks for
comparison object pre-C++17.

diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index ff36618bccf..df92a60d4dd 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -449,9 +449,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef __gnu_cxx::__alloc_traits<_Node_allocator> _Alloc_traits;
 
 #if __cplusplus >= 201103L
-  static_assert(__is_invocable{},
+  static_assert(__is_invocable<_Compare&, const _Key&, const _Key&>{},
 	  "comparison object must be invocable with two argum

Re: [PING] Plugin support on Windows/MinGW

2017-11-22 Thread Boris Kolpackov
JonY <10wa...@gmail.com> writes:

> Is there a problem with using .so for internal libraries instead of
> "dll"...

I think not but I haven't tested it. The problem with using .so instead
of .dll is that producing this non-standard extension may not be easy
or possible depending on the build system/tool (e.g., libtool). Also,
you never know how other pieces of the system (like antivirus) will
react to a file that looks like a DLL but is called something else.


> ... if it simplifies the code?

I don't think it simplifies that much and the potential (and unknown)
downside is significant.

Thanks for the review,
Boris


Re: Replace REDUC_*_EXPRs with internal functions.

2017-11-22 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Wed, Nov 22, 2017 at 10:09:08AM +, Richard Sandiford wrote:
>> This patch replaces the REDUC_*_EXPR tree codes with internal functions.
>> This is needed so that the support for in-order reductions can also use
>> internal functions without too much complication.
>> 
>> This came out of the review for:
>>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01516.html
>> 
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> OK to install?
>
> Wouldn't it be better to just have IFN_REDUC that takes as an additional
> argument INTEGER_CST with tree_code of the operation (so REDUC_MAX_EXPR
> would be transformed into REDUC (MAX_EXPR, ...) etc.)?
> That way we wouldn't need to add further internal fns if we want say
> multiplication reduction, or some other.

I think it depends how we use them.  The functions added here map
directly to optabs, so we'd only add a new one if we also added a
new optab.  If there's no optab, or if there is an optab but the
target doesn't support it, then we open-code the reduction during
vectorisation.  (That open-coding already happens for MULT, AND, IOR
and XOR, which have no optabs, although one of the SVE patches does
add optabs for the last three.)

I think having separate functions makes sense in that case, since it
makes the mapping to optabs easier, and makes it easier to probe
for target support.  Maybe an IFN_REDUC would be useful if we wanted
to defer the open-coding of other reductions past vectorisation,
but I'm not sure off-hand how useful that would be.  E.g. we'd still
need to try to cost the eventual expansion when deciding profitability.

Thanks,
Richard


Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-22 Thread Sudi Das

Hi Kyrill and Christophe


In case of soft fp testing, there are other assembly directives apart from the 
vmov one which are also failing. The directives probably make more sense in the 
hard fp context so instead of removing the vmov, I have added the 
-mfloat-abi=hard option.

Is this ok for trunk? If yes could someone post it on my behalf? 

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-22  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option.




From: Kyrill Tkachov 
Sent: Monday, November 20, 2017 2:20 PM
To: Christophe Lyon
Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard 
Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
  


On 20/11/17 14:14, Christophe Lyon wrote:
> Hi,
>
> On 17 November 2017 at 12:12, Kyrill  Tkachov
>  wrote:
>> On 17/11/17 10:45, Sudi Das wrote:
>>> Hi Kyrill
>>>
>>> Thanks I have made the change.
>>
>> Thanks Sudi, I've committed this on your behalf with r254863.
>>
>> Kyrill
>>
>>
>>> Sudi
>>>
>>>
>>>
>>> From: Kyrill Tkachov 
>>> Sent: Thursday, November 16, 2017 5:03 PM
>>> To: Sudi Das; gcc-patches@gcc.gnu.org
>>> Cc: nd; Ramana Radhakrishnan; Richard Earnshaw
>>> Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
>>>
>>> Hi Sudi,
>>>
>>> On 16/11/17 16:37, Sudi Das wrote:
 Hi

 This patch fixes the test case armv8_2-fp16-move-1.c for
 arm-none-linux-gnueabihf where 2 of the scan-assembler directives were
 failing. We now generate less vmov between core and VFP registers.
 Thus changing those directives to reflect that.

 Is this ok for trunk?
 If yes could someone commit it on my behalf?

 Sudi


 *** gcc/testsuite/ChangeLog ***

 2017-11-16  Sudakshina Das  

    * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov
 scan-assembler
    directives.

>>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
>>> b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
>>> index bb4e68f..0ed8560 100644
>>> --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
>>> +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
>>> @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
>>> /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+,
>>> s[0-9]+} 1 } }  */
>>> /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+,
>>> s[0-9]+} 1 } }  */
>>> -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 }
>>> }  */
>>> -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } }
>>> */
>>> +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } }
>>> */
>>> +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } }  */
>>> Some of the moves between core and fp registers were the result of
>>> inefficient codegen and in hindsight
>>> scanning for them was not very useful. Now that we emit only the required
>>> ones I think scanning for the plain
>>> vmovs between two S-registers doesn't test anything useful.
>>> So can you please just remove the second scan-assembler directive here?
>>>
> You are probably already aware of that: the tests fail on
> arm-none-linux-gnueabi/arm-none-eabi
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
> vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times)
>
> but this is not a regression, the previous version of the test had the
> same problem.

Grrr, that's because the softfp ABI necessitates moves between core and 
FP registers,
so scanning for a particular number of vmovs between them is just not 
gonna be stable across
soft-float ABIs. At this point I'd just remove the scan for vmovs 
completely as it doesn't
seem to check anything useful.

A patch to remove that scan for VMOV is pre-approved.

Thanks for reminding me of this Christophe, softfp tends to slip my mind :(
Kyrill

> Christophe
>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index 8c0a53c..167286d 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile }  */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
-/* { dg-options "-O2" }  */
+/* { dg-options "-O2 -mfloat-abi=hard" }  */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
 
 __fp16


Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-22 Thread Kyrill Tkachov

Hi Tamar,

On 06/11/17 16:53, Tamar Christina wrote:

Hi All,

This patch adds the NEON intrinsics for Dot product.

Dot product is available from ARMv8.2-a and onwards.

Regtested on arm-none-eabi, armeb-none-eabi,
aarch64-none-elf and aarch64_be-none-elf with no issues found.

Ok for trunk?

gcc/
2017-11-06  Tamar Christina  

* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)


This should be config/arm/arm_neon.h


(vdot_s32, vdotq_s32): New.
(vdot_lane_u32, vdotq_lane_u32): New.
(vdot_lane_s32, vdotq_lane_s32): New.


gcc/testsuite/
2017-11-06  Tamar Christina  

* gcc.target/arm/simd/vdot-compile.c: New.
* gcc.target/arm/simd/vect-dot-qi.h: New.
* gcc.target/arm/simd/vect-dot-s8.c: New.
* gcc.target/arm/simd/vect-dot-u8.c: New

--


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
 
 #endif
 
+/* Adv.SIMD Dot Product intrinsics.  */


Please no full stop: "AdvSIMD".

+
+#pragma GCC push_options
+#if __ARM_ARCH >= 8
+#pragma GCC target ("arch=armv8.2-a+dotprod")



diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h 
b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index 
..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i

Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-22 Thread Alan Hayward

> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>> 
>>> 
>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>> totally forgotten about it.  And in fact it seems to come pretty close
>>> to what you need…
>> 
>> Yes, some of the code is similar to the way
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>> CLOBBER expr code served as a starting point for writing the patch. The main 
>> difference
>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>> specific Instruction,
>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an 
>> expression (tls_desc).
>> It meant there wasn’t really any opportunity to resume any existing code.
> Understood.  Though your first patch mentions that you're trying to
> describe partial preservation "around TLS calls".  Presumably those are
> represented as normal insns, not call_insn.
> 
> That brings me back to Richi's idea of exposing a set of the low subreg
> to itself using whatever mode is wide enough to cover the neon part of
> the register.
> 
> That should tell the generic parts of the compiler that you're just
> clobbering the upper part and at least in theory you can implement in
> the aarch64 backend and the rest of the compiler should "just work"
> because that's the existing semantics of a subreg store.
> 
> The only worry would be if a pass tried to get overly smart and
> considered that kind of set a nop -- but I think I'd argue that's simply
> wrong given the semantics of a partial store.
> 

So, the instead of using clobber_high(reg X), to use set(reg X, reg X).
It’s something we considered, and then dismissed.

The problem then is you are now using SET semantics on those registers, and it
would make the register live around the function, which might not be the case.
Whereas clobber semantics will just make the register dead - which is exactly
what we want (but only conditionally).


Alan.

[PATCH] Make shift argument to eoshift0 be of type index_type

2017-11-22 Thread Janne Blomqvist
Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2017-11-22  Janne Blomqvist  

* intrinsics/eoshift0.c (eoshift0): Make shift an index_type.
---
 libgfortran/intrinsics/eoshift0.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libgfortran/intrinsics/eoshift0.c 
b/libgfortran/intrinsics/eoshift0.c
index 3dae88c..dd8c81d 100644
--- a/libgfortran/intrinsics/eoshift0.c
+++ b/libgfortran/intrinsics/eoshift0.c
@@ -26,12 +26,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #include "libgfortran.h"
 #include 
 
-/* TODO: make this work for large shifts when
-   sizeof(int) < sizeof (index_type).  */
 
 static void
 eoshift0 (gfc_array_char * ret, const gfc_array_char * array,
- int shift, const char * pbound, int which, index_type size,
+ index_type shift, const char * pbound, int which, index_type size,
  const char *filler, index_type filler_len)
 {
   /* r.* indicates the return array.  */
-- 
2.7.4



[PATCH] Handle VEC_SERIES with both constant args in simplify_binary_operation

2017-11-22 Thread Jakub Jelinek
Hi!

I've noticed that if we construct a VEC_SERIES and later propagate
constants into both arguments (could happen e.g. in DEBUG_INSNs, or
during combine etc.) we don't simplify that.

The following patch adds that, ok for trunk if it passes bootstrap/regtest?

Or should it go into simplify_constant_binary_operation instead?
In there it would better match the function name, on the other side
it would slow down the function for quite a marginal case, because it
doesn't have a big switch on code, but instead handles just a few cases
if constant operand kinds vs. the mode.

2017-11-22  Jakub Jelinek  

* simplify-rtx.c (simplify_binary_operation_1) :
Handle the case where both arguments are using gen_const_vec_series.

--- gcc/simplify-rtx.c.jj   2017-11-20 11:02:42.0 +0100
+++ gcc/simplify-rtx.c  2017-11-22 12:30:46.808056343 +0100
@@ -3566,6 +3566,8 @@ simplify_binary_operation_1 (enum rtx_co
 case VEC_SERIES:
   if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
return gen_vec_duplicate (mode, op0);
+  if (CONSTANT_P (op0) && CONSTANT_P (op1))
+   return gen_const_vec_series (mode, op0, op1);
   return 0;
 
 case VEC_SELECT:
@@ -6652,6 +6654,9 @@ test_vector_ops_series (machine_mode mod
   ASSERT_RTX_EQ (series_r_1,
 simplify_binary_operation (MINUS, mode, duplicate,
series_0_m1));
+  ASSERT_RTX_EQ (series_0_m1,
+simplify_binary_operation (VEC_SERIES, mode, const0_rtx,
+   constm1_rtx));
 }
 
 /* Verify some simplifications involving vectors.  */

Jakub


Re: [PATCH] Fix PR82991

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> On Wed, Nov 22, 2017 at 11:41:24AM +0100, Richard Biener wrote:
> > 
> > I am testing the following (old) patch to value number call lhs
> > according to the ERF_RETURNS_ARG annotation.  This allows for
> > more expression simplification and also changes downstream
> > users of the argument to use the return value which should help
> > register allocation and register lifetime in general.
> 
> If the argument is an SSA_NAME then that is generally a win, but won't
> it also replace if the argument is say ADDR_EXPR of a VAR_DECL or similar?

Looks like it also confuses passes, for gcc.dg/tree-ssa/pr62112-1.c

char*g;
void h(){
  char*p=__builtin_malloc(42);
  g=__builtin_memset(p,3,10);
  __builtin_free(p);
}

we no longer DSE the memset because we end up with

   :
  p_4 = __builtin_malloc (42);
  _1 = __builtin_memset (p_4, 3, 10);
  g = _1;
  __builtin_free (_1);

and we no longer see that the free call clobbers the memset destination.

So I guess I'll try to do the reverse (always replace with the argument).
That will of course make DCE remove the LHS of such calls.

In theory it's easy to recover somewhere before RTL expansion ...

> If we only later (say some later forwprop) determine the argument is
> gimple_min_invariant, shouldn't we have something that will effectively
> undo this (i.e. replace the uses of the function result with whatever
> we've simplified the argument to)?

... of course not (easily) if we substituted a constant.

The question is whether for example a pointer in TLS is cheaper
to remat from the return value or via the TLS reg like in

__thread int x[4];

int *foo ()
{
  int *q = __builtin_memset (&x, 3, 4*4);
  return q;
}

of course if the user has written in that way we currently do not
change it.  And we have no easy way (currently) to replace equal
constants by a register where that is available.  So

__thread int x[4];
  
int *foo ()
{
  int *q = __builtin_memset (&x, 3, 4*4);
  return &x;
} 

wouldn't be optimized to return q.  Anyway, it seems the register
allocator can do the trick at least for memset.

Richard.


Re: [PATCH, GCC/ARM] Use bitmap to control cmse_nonsecure_call register clearing

2017-11-22 Thread Kyrill Tkachov

Hi Thomas,

On 15/11/17 17:08, Thomas Preudhomme wrote:

Hi,

As part of r253256, cmse_nonsecure_entry_clear_before_return has been
rewritten to use auto_sbitmap instead of an integer bitfield to control
which register needs to be cleared. This commit continue this work in
cmse_nonsecure_call_clear_caller_saved.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-10-16  Thomas Preud'homme 

* config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Use
auto_sbitap instead of integer bitfield to control register 
needing

clearing.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the
testsuite.

Is this ok for trunk?



Ok for trunk.
Thanks for this conversion. It's much easier to understand the code
without having to think about the bitmasks and shifts.

Kyrill


Best regards,

Thomas




Re: [PATCH] vrp_prop::check_array_ref fixes (PR tree-optimization/83044)

2017-11-22 Thread Richard Biener
On Wed, 22 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase we ICE, because eltsize is 0 and when we
> int_const_binop (TRUNC_DIV_EXPR, maxbound, size_int (0)), that of course
> returns NULL.
> The upper bound for zero sized elements is just not meaningful, if we use
> any index we still won't reach maximum object size that way.
> 
> While looking at that, I've discovered a couple of other issues though.
> 
> One is that the off subtraction seems misplaced.
> get_addr_base_and_unit_offset computes an offset in bytes, so it corresponds
> to the __PTRDIFF_MAX__ limit on object size (also in bytes), not to the
> __PTRDIFF_MAX__ limit divided by element size.
> So I believe we first want to subtract off (and only if it is positive, we
> do not want to enlarge the limit) from __PTRDIFF_MAX__ and then divide
> rather than what the code was doing before.
> 
> Another thing is that the code has been mixing up types randomly, something
> being in ptrdiff_type_node, something in sizetype.
> 
> And yet another thing is that even if we can't compute any sensible upper
> bound (e.g. the eltsize 0 case; I believe the VLA case just won't happen
> here, because we replace the VLAs with dereferences of a pointer I believe
> the ARRAY_REFs should have been transformed into POINTER_PLUS anyway), we
> can still warn about indexes smaller than low bound (especially if the low
> bound is not zero like in Fortran).
> 
> The Warray-bounds.c testcase needed adjustment, because it was written with
> the same bad computation of the maximum - __PTRDIFF_MAX__ / sizeof (elt) -
> offset rather than (__PTRDIFF_MAX__ - offset) / sizeof (elt).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.  Thanks for cleaning this up - looks I got lazy during the lengthy
review process :/

Richard.

> 2017-11-22  Jakub Jelinek  
> 
>   PR tree-optimization/83044
>   * tree-vrp.c (vrp_prop::check_array_ref): If eltsize is not
>   INTEGER_CST or is 0, clear up_bound{,_p1} and later ignore tests
>   that need the upper bound.  Subtract offset from
>   get_addr_base_and_unit_offset only if positive and subtract it
>   before division by eltsize rather than after it.
> 
>   * gcc.dg/pr83044.c: New test.
>   * c-c++-common/Warray-bounds.c (fb): Fix up MAX value.
> 
> --- gcc/tree-vrp.c.jj 2017-11-17 08:40:28.0 +0100
> +++ gcc/tree-vrp.c2017-11-21 14:13:47.184974061 +0100
> @@ -4795,24 +4795,30 @@ vrp_prop::check_array_ref (location_t lo
>the size of the largest object is PTRDIFF_MAX.  */
>tree eltsize = array_ref_element_size (ref);
>  
> -  /* FIXME: Handle VLAs.  */
> -  if (TREE_CODE (eltsize) != INTEGER_CST)
> - return;
> -
> -  tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
> -
> -  up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
> -
> -  tree arg = TREE_OPERAND (ref, 0);
> -
> -  HOST_WIDE_INT off;
> -  if (get_addr_base_and_unit_offset (arg, &off))
> - up_bound_p1 = wide_int_to_tree (sizetype,
> - wi::sub (wi::to_wide (up_bound_p1),
> -  off));
> -
> -  up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
> -   build_int_cst (ptrdiff_type_node, 1));
> +  if (TREE_CODE (eltsize) != INTEGER_CST
> +   || integer_zerop (eltsize))
> + {
> +   up_bound = NULL_TREE;
> +   up_bound_p1 = NULL_TREE;
> + }
> +  else
> + {
> +   tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
> +   tree arg = TREE_OPERAND (ref, 0);
> +   HOST_WIDE_INT off;
> +
> +   if (get_addr_base_and_unit_offset (arg, &off) && off > 0)
> + maxbound = wide_int_to_tree (sizetype,
> +  wi::sub (wi::to_wide (maxbound),
> +   off));
> +   else
> + maxbound = fold_convert (sizetype, maxbound);
> +
> +   up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
> +
> +   up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
> +   build_int_cst (ptrdiff_type_node, 1));
> + }
>  }
>else
>  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
> @@ -4823,7 +4829,7 @@ vrp_prop::check_array_ref (location_t lo
>tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
>  
>/* Empty array.  */
> -  if (tree_int_cst_equal (low_bound, up_bound_p1))
> +  if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
>  {
>warning_at (location, OPT_Warray_bounds,
> "array subscript %E is above array bounds of %qT",
> @@ -4843,7 +4849,8 @@ vrp_prop::check_array_ref (location_t lo
>  
>if (vr && vr->type == VR_ANTI_RANGE)
>  {
> -  if (TREE_CODE (up_sub) == INTEGER_CST
> +  if (up_bound
> +   && TREE_CODE (up_sub) == INTEGER_CST
>&& (ignore_off_by_one
> ? tre

[PATCH] Use __BYTE_ORDER__ predefined macro instead of runtime check

2017-11-22 Thread Janne Blomqvist
By using the __BYTE_ORDER__ predefined macro we don't need the
determine_endianness function anymore.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2017-11-22  Janne Blomqvist  

* io/inquire.c (inquire_via_unit): Use __BYTE_ORDER__ predefined
macro.
* io/open.c (st_open): Likewise.
* io/transfer.c (data_transfer_init): Likewise.
* io/write.c (btoa_big): Likewise.
(otoa_big): Likewise.
(ztoa_big): Likewise.
* libgfortran.h (big_endian): Remove variable.
(GFOR_POINTER_TO_L1): Use __BYTE_ORDER__ macro.
* runtime/main.c (determine_endianness): Remove function.
(init): Remove call to determine_endianness.
* runtime/minimal.c: Remove setting big_endian variable.
---
 libgfortran/io/inquire.c  |  5 ++---
 libgfortran/io/open.c |  6 ++
 libgfortran/io/transfer.c |  6 ++
 libgfortran/io/write.c|  6 +++---
 libgfortran/libgfortran.h |  6 +-
 libgfortran/runtime/main.c| 28 
 libgfortran/runtime/minimal.c |  7 ---
 7 files changed, 10 insertions(+), 54 deletions(-)

diff --git a/libgfortran/io/inquire.c b/libgfortran/io/inquire.c
index 4cf87d3..fe353c5 100644
--- a/libgfortran/io/inquire.c
+++ b/libgfortran/io/inquire.c
@@ -612,13 +612,12 @@ inquire_via_unit (st_parameter_inquire *iqp, gfc_unit *u)
   else
switch (u->flags.convert)
  {
-   /*  big_endian is 0 for little-endian, 1 for big-endian.  */
  case GFC_CONVERT_NATIVE:
-   p = big_endian ? "BIG_ENDIAN" : "LITTLE_ENDIAN";
+   p = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? "BIG_ENDIAN" : 
"LITTLE_ENDIAN";
break;
 
  case GFC_CONVERT_SWAP:
-   p = big_endian ? "LITTLE_ENDIAN" : "BIG_ENDIAN";
+   p = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? "LITTLE_ENDIAN" : 
"BIG_ENDIAN";
break;
 
  default:
diff --git a/libgfortran/io/open.c b/libgfortran/io/open.c
index 9d3988a..fab2065 100644
--- a/libgfortran/io/open.c
+++ b/libgfortran/io/open.c
@@ -805,8 +805,6 @@ st_open (st_parameter_open *opp)
conv = compile_options.convert;
 }
   
-  /* We use big_endian, which is 0 on little-endian machines
- and 1 on big-endian machines.  */
   switch (conv)
 {
 case GFC_CONVERT_NATIVE:
@@ -814,11 +812,11 @@ st_open (st_parameter_open *opp)
   break;
   
 case GFC_CONVERT_BIG:
-  conv = big_endian ? GFC_CONVERT_NATIVE : GFC_CONVERT_SWAP;
+  conv = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? GFC_CONVERT_NATIVE : 
GFC_CONVERT_SWAP;
   break;
   
 case GFC_CONVERT_LITTLE:
-  conv = big_endian ? GFC_CONVERT_SWAP : GFC_CONVERT_NATIVE;
+  conv = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? GFC_CONVERT_SWAP : 
GFC_CONVERT_NATIVE;
   break;
   
 default:
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 1eb23fb..acaa88a 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -2723,8 +2723,6 @@ data_transfer_init (st_parameter_dt *dtp, int read_flag)
   if (conv == GFC_CONVERT_NONE)
conv = compile_options.convert;
 
-  /* We use big_endian, which is 0 on little-endian machines
-and 1 on big-endian machines.  */
   switch (conv)
{
case GFC_CONVERT_NATIVE:
@@ -2732,11 +2730,11 @@ data_transfer_init (st_parameter_dt *dtp, int read_flag)
  break;
 
case GFC_CONVERT_BIG:
- conv = big_endian ? GFC_CONVERT_NATIVE : GFC_CONVERT_SWAP;
+ conv = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? GFC_CONVERT_NATIVE : 
GFC_CONVERT_SWAP;
  break;
 
case GFC_CONVERT_LITTLE:
- conv = big_endian ? GFC_CONVERT_SWAP : GFC_CONVERT_NATIVE;
+ conv = __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? GFC_CONVERT_SWAP : 
GFC_CONVERT_NATIVE;
  break;
 
default:
diff --git a/libgfortran/io/write.c b/libgfortran/io/write.c
index c9aad15..f417202 100644
--- a/libgfortran/io/write.c
+++ b/libgfortran/io/write.c
@@ -986,7 +986,7 @@ btoa_big (const char *s, char *buffer, int len, 
GFC_UINTEGER_LARGEST *n)
   int i, j;
 
   q = buffer;
-  if (big_endian)
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
 {
   const char *p = s;
   for (i = 0; i < len; i++)
@@ -1051,7 +1051,7 @@ otoa_big (const char *s, char *buffer, int len, 
GFC_UINTEGER_LARGEST *n)
   *q = '\0';
   i = k = octet = 0;
 
-  if (big_endian)
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
 {
   const char *p = s + len - 1;
   char c = *p;
@@ -1126,7 +1126,7 @@ ztoa_big (const char *s, char *buffer, int len, 
GFC_UINTEGER_LARGEST *n)
 
   q = buffer;
 
-  if (big_endian)
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
 {
   const char *p = s;
   for (i = 0; i < len; i++)
diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index cdbdd951..10c7507 100644
--- a/libgfortran/libgfortran.h
+++ b/libgfortran/libg

Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Nathan Sidwell

On 11/21/2017 06:14 AM, Dominik Inführ wrote:

Hi,

this patch tries to extend tree-ssa-dce.c to remove unnecessary 
new/delete-pairs (it already does that for malloc/free). Clang does it too and 
it seems to be allowed by 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
bootstrapped/regtested on aarch64-linux and x86_64-linux.


nice.


--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1787,7 +1787,9 @@ struct GTY(()) tree_function_decl {
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
   unsigned versioned_function : 1;
-  /* No bits left.  */
+
+  unsigned operator_delete_flag : 1;
+  /* 31 bits left.  */
 };


that's unpleasant.  We have DECL_IS_{MALLOC,OPERATOR_{NEW,DELETE}} 
flags, which are all mutually exclusive.  If only there was a way to 
encode a 4-valued enumeration in fewer than 3 bits ... :)  (not sure why 
we don't have DECL_IS_FREE, and as we don't, why is 
DECL_IS_OPERATOR_DELETE needed?


(we also have DECL_IS_{CON,DE}STRUCTOR flags, which I think are also 
mutually exclusive with the above.  So that's 5 or (6 if we add 
DECL_IS_FREE), that could be encoded in 3 bits.


There may be even more mutually exclusive flags, 
DECL_STATIC_{CON,DE}STRUCTOR may be candiates?



nathan

--
Nathan Sidwell


Re: [PATCH] Handle VEC_SERIES with both constant args in simplify_binary_operation

2017-11-22 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> I've noticed that if we construct a VEC_SERIES and later propagate
> constants into both arguments (could happen e.g. in DEBUG_INSNs, or
> during combine etc.) we don't simplify that.
>
> The following patch adds that, ok for trunk if it passes bootstrap/regtest?
>
> Or should it go into simplify_constant_binary_operation instead?
> In there it would better match the function name, on the other side
> it would slow down the function for quite a marginal case, because it
> doesn't have a big switch on code, but instead handles just a few cases
> if constant operand kinds vs. the mode.
>
> 2017-11-22  Jakub Jelinek  
>
>   * simplify-rtx.c (simplify_binary_operation_1) :
>   Handle the case where both arguments are using gen_const_vec_series.

LGTM FWIW.  Thanks for doing this.

Richard


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 22 Nov 2017, Jakub Jelinek wrote:
>
>> On Wed, Nov 22, 2017 at 10:41:19AM +0100, Richard Biener wrote:
>> > How much churn would it be to pass down a mode alongside the operands
>> > in expand_binop?  Can't find it right now but didn't we introduce
>> > some rtx_with_mode pair-like stuff somewhen?
>> 
>> We have rtx_mode_t for that.  But there are 240+ calls to expand_binop,
>> and even if we add an overload that will transform it, unless we forcefully
>> inline it wouldn't that slow down all the spots a little bit?
>> The thing is, for the vast majority of binary ops we don't need the operand
>> modes, it is mainly comparisons, second arg of shifts/rotates and this
>> widening case.
>
> Ok, so maybe split expand_binop then to the class of cases where we do
> need the mode and a class where we don't then?
>
> We don't have to use rtx_mode_t we can just pass two arguments.  Not
> sure what is more convenient to use.

FWIW, rtx_mode_t was only really added so that we have a single blob
for wi:: calls (with the hope that it would eventually be replaced
with just the rtx once CONST_INTs have a mode).  I think it'd be
more consistent to use separate arguments for other cases.

Thanks,
Richard

> Anyway, this doesn't have to happen in stage3, just as a general
> note on how I believe we changed things in other places.  Richard S.
> may remember more here.
>
> Richard.


Re: [PATCH] Fix mult expansion ICE (PR middle-end/82875)

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 01:03:06PM +, Richard Sandiford wrote:
> Richard Biener  writes:
> > On Wed, 22 Nov 2017, Jakub Jelinek wrote:
> >
> >> On Wed, Nov 22, 2017 at 10:41:19AM +0100, Richard Biener wrote:
> >> > How much churn would it be to pass down a mode alongside the operands
> >> > in expand_binop?  Can't find it right now but didn't we introduce
> >> > some rtx_with_mode pair-like stuff somewhen?
> >> 
> >> We have rtx_mode_t for that.  But there are 240+ calls to expand_binop,
> >> and even if we add an overload that will transform it, unless we forcefully
> >> inline it wouldn't that slow down all the spots a little bit?
> >> The thing is, for the vast majority of binary ops we don't need the operand
> >> modes, it is mainly comparisons, second arg of shifts/rotates and this
> >> widening case.
> >
> > Ok, so maybe split expand_binop then to the class of cases where we do
> > need the mode and a class where we don't then?
> >
> > We don't have to use rtx_mode_t we can just pass two arguments.  Not
> > sure what is more convenient to use.
> 
> FWIW, rtx_mode_t was only really added so that we have a single blob
> for wi:: calls (with the hope that it would eventually be replaced
> with just the rtx once CONST_INTs have a mode).  I think it'd be
> more consistent to use separate arguments for other cases.

So perhaps it would be easiest to just add one defaulted to VOIDmode
argument to expand_binop, say aux_mode, which would stand for whatever
other second mode the optab needs (could be operand mode for comparisons
or the widening stuff, or op1 mode for shifts, etc.).  If it is VOIDmode,
it wouldn't be used.

Jakub


Re: [PATCH, GCC/ARM] Use bitmap to control cmse_nonsecure_call register clearing

2017-11-22 Thread Thomas Preudhomme

Thanks Kyrill.

Committed the attached rebased patch (same patch but without the last hunk 
because a better fix was done in an earlier commit).


Best regards,

Thomas

On 22/11/17 11:57, Kyrill Tkachov wrote:

Hi Thomas,

On 15/11/17 17:08, Thomas Preudhomme wrote:

Hi,

As part of r253256, cmse_nonsecure_entry_clear_before_return has been
rewritten to use auto_sbitmap instead of an integer bitfield to control
which register needs to be cleared. This commit continue this work in
cmse_nonsecure_call_clear_caller_saved.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-10-16  Thomas Preud'homme 

    * config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Use
    auto_sbitap instead of integer bitfield to control register needing
    clearing.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the
testsuite.

Is this ok for trunk?



Ok for trunk.
Thanks for this conversion. It's much easier to understand the code
without having to think about the bitmasks and shifts.

Kyrill


Best regards,

Thomas


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 106e3edce0d6f2518eb391c436c5213a78d1275b..092cd61d49382101bce9b8c5f04de31965dcdc77 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17007,10 +17007,11 @@ cmse_nonsecure_call_clear_caller_saved (void)
 
   FOR_BB_INSNS (bb, insn)
 	{
-	  uint64_t to_clear_mask, float_mask;
+	  unsigned address_regnum, regno, maxregno =
+	TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : NUM_ARG_REGS - 1;
+	  auto_sbitmap to_clear_bitmap (maxregno + 1);
 	  rtx_insn *seq;
 	  rtx pat, call, unspec, reg, cleared_reg, tmp;
-	  unsigned int regno, maxregno;
 	  rtx address;
 	  CUMULATIVE_ARGS args_so_far_v;
 	  cumulative_args_t args_so_far;
@@ -17041,18 +17042,21 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	continue;
 
 	  /* Determine the caller-saved registers we need to clear.  */
-	  to_clear_mask = (1LL << (NUM_ARG_REGS)) - 1;
-	  maxregno = NUM_ARG_REGS - 1;
+	  bitmap_clear (to_clear_bitmap);
+	  bitmap_set_range (to_clear_bitmap, R0_REGNUM, NUM_ARG_REGS);
+
 	  /* Only look at the caller-saved floating point registers in case of
 	 -mfloat-abi=hard.  For -mfloat-abi=softfp we will be using the
 	 lazy store and loads which clear both caller- and callee-saved
 	 registers.  */
 	  if (TARGET_HARD_FLOAT_ABI)
 	{
-	  float_mask = (1LL << (D7_VFP_REGNUM + 1)) - 1;
-	  float_mask &= ~((1LL << FIRST_VFP_REGNUM) - 1);
-	  to_clear_mask |= float_mask;
-	  maxregno = D7_VFP_REGNUM;
+	  auto_sbitmap float_bitmap (maxregno + 1);
+
+	  bitmap_clear (float_bitmap);
+	  bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM,
+D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1);
+	  bitmap_ior (to_clear_bitmap, to_clear_bitmap, float_bitmap);
 	}
 
 	  /* Make sure the register used to hold the function address is not
@@ -17060,7 +17064,9 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	  address = RTVEC_ELT (XVEC (unspec, 0), 0);
 	  gcc_assert (MEM_P (address));
 	  gcc_assert (REG_P (XEXP (address, 0)));
-	  to_clear_mask &= ~(1LL << REGNO (XEXP (address, 0)));
+	  address_regnum = REGNO (XEXP (address, 0));
+	  if (address_regnum < R0_REGNUM + NUM_ARG_REGS)
+	bitmap_clear_bit (to_clear_bitmap, address_regnum);
 
 	  /* Set basic block of call insn so that df rescan is performed on
 	 insns inserted here.  */
@@ -17081,6 +17087,7 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
 	{
 	  rtx arg_rtx;
+	  uint64_t to_clear_args_mask;
 	  machine_mode arg_mode = TYPE_MODE (arg_type);
 
 	  if (VOID_TYPE_P (arg_type))
@@ -17093,10 +17100,18 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	  arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type,
 	  true);
 	  gcc_assert (REG_P (arg_rtx));
-	  to_clear_mask
-		&= ~compute_not_to_clear_mask (arg_type, arg_rtx,
-	   REGNO (arg_rtx),
-	   padding_bits_to_clear_ptr);
+	  to_clear_args_mask
+		= compute_not_to_clear_mask (arg_type, arg_rtx,
+	 REGNO (arg_rtx),
+	 padding_bits_to_clear_ptr);
+	  if (to_clear_args_mask)
+		{
+		  for (regno = R0_REGNUM; regno <= maxregno; regno++)
+		{
+		  if (to_clear_args_mask & (1ULL << regno))
+			bitmap_clear_bit (to_clear_bitmap, regno);
+		}
+		}
 
 	  first_param = false;
 	}
@@ -17155,7 +17170,7 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	 call.  */
 	  for (regno = R0_REGNUM; regno <= maxregno; regno++)
 	{
-	  if (!(to_clear_mask & (1LL << regno)))
+	  if (!bitmap_bit_p (to_clear_bitmap, regno))
 		continue;
 
 	  /* If regno is an even vfp register and its successor is also to
@@ -17164,7 +17179,7 @@ cmse_nonsecure_call_clear_caller_saved (void)
 		{
 		  if (TARGET_VFP_DOUBLE
 		  && VFP_REGNO_OK_FOR_DOUBLE (regno)
-		  && to_clear_mask & (1LL << (regno + 1)))
+		  && bitmap_bit

Re: Fix calculation of ptr_mode for MODE_PARTIAL_INT Pmode

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 10:26 AM, Richard Sandiford
 wrote:
> This patch fixes a regression caused by r251469, where I'd incorrectly
> converted a call to mode_for_size that sometimes needs MODE_PARTIAL_INTs.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also spot-checked on msp430-elf.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-11-22  Richard Sandiford  
>
> gcc/
> * emit-rtl.c (init_derived_machine_modes): Make sure ptr_mode
> has the same mode class as Pmode.
>
> Index: gcc/emit-rtl.c
> ===
> --- gcc/emit-rtl.c  2017-11-22 09:21:12.560954668 +
> +++ gcc/emit-rtl.c  2017-11-22 09:25:24.22574 +
> @@ -6003,7 +6003,8 @@ init_derived_machine_modes (void)
>
>byte_mode = opt_byte_mode.require ();
>word_mode = opt_word_mode.require ();
> -  ptr_mode = int_mode_for_size (POINTER_SIZE, 0).require ();
> +  ptr_mode = as_a 
> +(mode_for_size (POINTER_SIZE, GET_MODE_CLASS (Pmode), 0).require ());
>  }
>
>  /* Create some permanent unique rtl objects shared between all functions.  */


Re: PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 10:30 AM, Richard Sandiford
 wrote:
> r254589 was supposed to leave tests unchanged for the default setting
> of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
> Sorry for the breakage.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-11-22  Richard Sandiford  
>
> gcc/testsuite/
> PR testsuite/83004
> * gcc.dg/vect/pr81136.c: Restore previous alignment of 32
> in the default case.
>
> Index: gcc/testsuite/gcc.dg/vect/pr81136.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:21:12.538474075 +
> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:28:04.836628929 +
> @@ -2,7 +2,13 @@
>
>  #include "tree-vect.h"
>
> -struct __attribute__((aligned (VECTOR_BITS / 8)))
> +#if VECTOR_BITS > 256
> +#define ALIGNMENT (VECTOR_BITS / 8)
> +#else
> +#define ALIGNMENT 32
> +#endif
> +
> +struct __attribute__((aligned (ALIGNMENT)))
>  {
>char misaligner;
>int foo[100];


Re: PR82547: Undetected overflow for UNSIGNED wide_ints

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 10:33 AM, Richard Sandiford
 wrote:
> wi::add_large and wi::sub_large weren't setting the overflow bit
> correctly for unsigned operations if the result needed fewer HWIs
> than the precision.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-11-21  Richard Sandiford  
>
> gcc/
> PR middle-end/82547
> * wide-int.cc (wi::add_large, wi::sub_large): Fix overflow detection
> for unsigned values with fewer HWIs than the precision.
> (test_overflow): New function.
> (wide_int_cc_tests): Call it.
>
> Index: gcc/wide-int.cc
> ===
> --- gcc/wide-int.cc 2017-11-22 09:21:12.516930172 +
> +++ gcc/wide-int.cc 2017-11-22 09:33:00.427739376 +
> @@ -1158,7 +1158,7 @@ wi::add_large (HOST_WIDE_INT *val, const
>val[len] = mask0 + mask1 + carry;
>len++;
>if (overflow)
> -   *overflow = false;
> +   *overflow = (sgn == UNSIGNED && carry);
>  }
>else if (overflow)
>  {
> @@ -1552,7 +1552,7 @@ wi::sub_large (HOST_WIDE_INT *val, const
>val[len] = mask0 - mask1 - borrow;
>len++;
>if (overflow)
> -   *overflow = false;
> +   *overflow = (sgn == UNSIGNED && borrow);
>  }
>else if (overflow)
>  {
> @@ -2345,14 +2345,54 @@ static void run_all_wide_int_tests ()
>test_comparisons  ();
>  }
>
> +/* Test overflow conditions.  */
> +
> +static void
> +test_overflow ()
> +{
> +  static int precs[] = { 31, 32, 33, 63, 64, 65, 127, 128 };
> +  static int offsets[] = { 16, 1, 0 };
> +  for (unsigned int i = 0; i < ARRAY_SIZE (precs); ++i)
> +for (unsigned int j = 0; j < ARRAY_SIZE (offsets); ++j)
> +  {
> +   int prec = precs[i];
> +   int offset = offsets[j];
> +   bool overflow;
> +   wide_int sum, diff;
> +
> +   sum = wi::add (wi::max_value (prec, UNSIGNED) - offset, 1,
> +  UNSIGNED, &overflow);
> +   ASSERT_EQ (sum, -offset);
> +   ASSERT_EQ (overflow, offset == 0);
> +
> +   sum = wi::add (1, wi::max_value (prec, UNSIGNED) - offset,
> +  UNSIGNED, &overflow);
> +   ASSERT_EQ (sum, -offset);
> +   ASSERT_EQ (overflow, offset == 0);
> +
> +   diff = wi::sub (wi::max_value (prec, UNSIGNED) - offset,
> +   wi::max_value (prec, UNSIGNED),
> +   UNSIGNED, &overflow);
> +   ASSERT_EQ (diff, -offset);
> +   ASSERT_EQ (overflow, offset != 0);
> +
> +   diff = wi::sub (wi::max_value (prec, UNSIGNED) - offset,
> +   wi::max_value (prec, UNSIGNED) - 1,
> +   UNSIGNED, &overflow);
> +   ASSERT_EQ (diff, 1 - offset);
> +   ASSERT_EQ (overflow, offset > 1);
> +}
> +}
> +
>  /* Run all of the selftests within this file, for all value types.  */
>
>  void
>  wide_int_cc_tests ()
>  {
> - run_all_wide_int_tests  ();
> - run_all_wide_int_tests  ();
> - run_all_wide_int_tests  ();
> +  run_all_wide_int_tests  ();
> +  run_all_wide_int_tests  ();
> +  run_all_wide_int_tests  ();
> +  test_overflow ();
>  }
>
>  } // namespace selftest


Re: Replace REDUC_*_EXPRs with internal functions.

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 12:15 PM, Richard Sandiford
 wrote:
> Jakub Jelinek  writes:
>> On Wed, Nov 22, 2017 at 10:09:08AM +, Richard Sandiford wrote:
>>> This patch replaces the REDUC_*_EXPR tree codes with internal functions.
>>> This is needed so that the support for in-order reductions can also use
>>> internal functions without too much complication.
>>>
>>> This came out of the review for:
>>>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01516.html
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>>> OK to install?
>>
>> Wouldn't it be better to just have IFN_REDUC that takes as an additional
>> argument INTEGER_CST with tree_code of the operation (so REDUC_MAX_EXPR
>> would be transformed into REDUC (MAX_EXPR, ...) etc.)?
>> That way we wouldn't need to add further internal fns if we want say
>> multiplication reduction, or some other.
>
> I think it depends how we use them.  The functions added here map
> directly to optabs, so we'd only add a new one if we also added a
> new optab.  If there's no optab, or if there is an optab but the
> target doesn't support it, then we open-code the reduction during
> vectorisation.  (That open-coding already happens for MULT, AND, IOR
> and XOR, which have no optabs, although one of the SVE patches does
> add optabs for the last three.)
>
> I think having separate functions makes sense in that case, since it
> makes the mapping to optabs easier, and makes it easier to probe
> for target support.  Maybe an IFN_REDUC would be useful if we wanted
> to defer the open-coding of other reductions past vectorisation,
> but I'm not sure off-hand how useful that would be.  E.g. we'd still
> need to try to cost the eventual expansion when deciding profitability.

I also agree that separate IFNs map better to what we do (encode a target
specific insn).

Thus the patch is ok.

Thanks,
Richard.

> Thanks,
> Richard


[PATCH, GCC/ARM] Remove useless variable in CMSE code

2017-11-22 Thread Thomas Preudhomme

Hi,

Functions cmse_nonsecure_call_clear_caller_saved () and
cmse_nonsecure_entry_clear_before_return () use a separate variable
holding a pointer to padding_bits_to_clear array's first entry which is
used when calling function compute_not_to_clear_mask ().  This does not
save space over using &padding_bits_to_clear[0] directly so this commit
gets rid of it.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-11-08  Thomas Preud'homme  

* config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Get rid of
padding_bits_to_clear_ptr.
(cmse_nonsecure_entry_clear_before_return): Likewise.

Testing: Bootstrapped an arm-none-linux-gnueabihf compiler and
regression test does not show any regression.

Committed as obvious.

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7384b96fea0179334a6010b099df68c8e2a0fc32..bcb708c1b316ea08969e118fb0949b941ff19c27 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17002,7 +17002,6 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	  bool using_r4, first_param = true;
 	  function_args_iterator args_iter;
 	  uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U};
-	  uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear[0];
 
 	  if (!NONDEBUG_INSN_P (insn))
 	continue;
@@ -17086,7 +17085,7 @@ cmse_nonsecure_call_clear_caller_saved (void)
 	  to_clear_args_mask
 		= compute_not_to_clear_mask (arg_type, arg_rtx,
 	 REGNO (arg_rtx),
-	 padding_bits_to_clear_ptr);
+	 &padding_bits_to_clear[0]);
 	  if (to_clear_args_mask)
 		{
 		  for (regno = R0_REGNUM; regno <= maxregno; regno++)
@@ -25134,7 +25133,6 @@ cmse_nonsecure_entry_clear_before_return (void)
 {
   int regno, maxregno = TARGET_HARD_FLOAT ? LAST_VFP_REGNUM : IP_REGNUM;
   uint32_t padding_bits_to_clear = 0;
-  uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
   auto_sbitmap to_clear_bitmap (maxregno + 1);
   tree result_type;
   rtx result_rtl;
@@ -25187,7 +25185,7 @@ cmse_nonsecure_entry_clear_before_return (void)
   gcc_assert (REG_P (result_rtl));
   to_clear_return_mask
 	= compute_not_to_clear_mask (result_type, result_rtl, 0,
- padding_bits_to_clear_ptr);
+ &padding_bits_to_clear);
   if (to_clear_return_mask)
 	{
 	  gcc_assert ((unsigned) maxregno < sizeof (long long) * __CHAR_BIT__);


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 1:47 PM, Nathan Sidwell  wrote:
> On 11/21/2017 06:14 AM, Dominik Inführ wrote:
>>
>> Hi,
>>
>> this patch tries to extend tree-ssa-dce.c to remove unnecessary
>> new/delete-pairs (it already does that for malloc/free). Clang does it too
>> and it seems to be allowed by
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve
>> bootstrapped/regtested on aarch64-linux and x86_64-linux.
>
>
> nice.
>
>> --- a/gcc/tree-core.h
>> +++ b/gcc/tree-core.h
>> @@ -1787,7 +1787,9 @@ struct GTY(()) tree_function_decl {
>>unsigned has_debug_args_flag : 1;
>>unsigned tm_clone_flag : 1;
>>unsigned versioned_function : 1;
>> -  /* No bits left.  */
>> +
>> +  unsigned operator_delete_flag : 1;
>> +  /* 31 bits left.  */
>>  };
>
>
> that's unpleasant.  We have DECL_IS_{MALLOC,OPERATOR_{NEW,DELETE}} flags,
> which are all mutually exclusive.  If only there was a way to encode a
> 4-valued enumeration in fewer than 3 bits ... :)  (not sure why we don't
> have DECL_IS_FREE, and as we don't, why is DECL_IS_OPERATOR_DELETE needed?
>
> (we also have DECL_IS_{CON,DE}STRUCTOR flags, which I think are also
> mutually exclusive with the above.  So that's 5 or (6 if we add
> DECL_IS_FREE), that could be encoded in 3 bits.
>
> There may be even more mutually exclusive flags,
> DECL_STATIC_{CON,DE}STRUCTOR may be candiates?

Yes, there's room for cleanup (as I noted).  And ok, if we don't want
to regress 32bit hosts we can put
a new flag in decl_common indeed.

So please do that.

Anything else can be done as followup and need not be done as part of
this patch.  An enum for
this would work I guess.

Richard.

>
> nathan
>
> --
> Nathan Sidwell


Re: PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 02:51:09PM +0100, Richard Biener wrote:
> On Wed, Nov 22, 2017 at 10:30 AM, Richard Sandiford
>  wrote:
> > r254589 was supposed to leave tests unchanged for the default setting
> > of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
> > Sorry for the breakage.
> >
> > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> > OK to install?
> 
> Ok.

That will still FAIL e.g. with -march=skylake-avx512 or -march=knl
(at least when not preferring 256 or 128 bit vectors), those would need
ALIGNMENT 64.

> > 2017-11-22  Richard Sandiford  
> >
> > gcc/testsuite/
> > PR testsuite/83004
> > * gcc.dg/vect/pr81136.c: Restore previous alignment of 32
> > in the default case.
> >
> > Index: gcc/testsuite/gcc.dg/vect/pr81136.c
> > ===
> > --- gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:21:12.538474075 +
> > +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-11-22 09:28:04.836628929 +
> > @@ -2,7 +2,13 @@
> >
> >  #include "tree-vect.h"
> >
> > -struct __attribute__((aligned (VECTOR_BITS / 8)))
> > +#if VECTOR_BITS > 256
> > +#define ALIGNMENT (VECTOR_BITS / 8)
> > +#else
> > +#define ALIGNMENT 32
> > +#endif
> > +
> > +struct __attribute__((aligned (ALIGNMENT)))
> >  {
> >char misaligner;
> >int foo[100];

Jakub


Re: PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Wed, Nov 22, 2017 at 02:51:09PM +0100, Richard Biener wrote:
>> On Wed, Nov 22, 2017 at 10:30 AM, Richard Sandiford
>>  wrote:
>> > r254589 was supposed to leave tests unchanged for the default setting
>> > of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
>> > Sorry for the breakage.
>> >
>> > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
>> > OK to install?
>> 
>> Ok.
>
> That will still FAIL e.g. with -march=skylake-avx512 or -march=knl
> (at least when not preferring 256 or 128 bit vectors), those would need
> ALIGNMENT 64.

Yeah, the real fix for AVX512 is to define VECTOR_BITS.  And I'd have
thought even AVX2 would need to define it to get clean results on other
tests.  But the patch that introduced VECTOR_BITS just wasn't supposed
to be changing the default behaviour in the way that I'd accidentally
done here.

Do you know what the vect.exp results are like for 256-bit and 512-bit
vectors on x86_64?  Like I said in the PR, I was surprised we were the
first to hit the need for a macro like VECTOR_BITS.  The results for
-msve-vector-bits=256 and -msve-vector-bits=512 were really poor without
it, since many things were hard-coded to values that made sense for
<= 128-bit (or sometimes <= 256-bit) vectors.

Thanks,
Richard


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-22 Thread Nathan Sidwell

On 11/22/2017 08:59 AM, Richard Biener wrote:


Anything else can be done as followup and need not be done as part of
this patch.  An enum for
this would work I guess.


I've added this to https://gcc.gnu.org/wiki/ImprovementProjects:

Compress DECL flags

tree-core defines a number of bit flags (DECL_IS_MALLOC, 
DECL_IS_OPERATOR_NEW, DECL_CONSTRUCTOR, DECL_STATIC_CONSTRUCTOR, etc) 
that are mutually exclusive. It would be better to use some kind of 
enumeration, rather than individual flags. (We've run out of bits). 
(Suggested by Richard Biener & Nathan Sidwell)


[The reason I put names on these things, is so there's someone to 
contact if the intent isn't clear]


nathan

--
Nathan Sidwell


Re: PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 02:17:59PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Wed, Nov 22, 2017 at 02:51:09PM +0100, Richard Biener wrote:
> >> On Wed, Nov 22, 2017 at 10:30 AM, Richard Sandiford
> >>  wrote:
> >> > r254589 was supposed to leave tests unchanged for the default setting
> >> > of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
> >> > Sorry for the breakage.
> >> >
> >> > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> >> > OK to install?
> >> 
> >> Ok.
> >
> > That will still FAIL e.g. with -march=skylake-avx512 or -march=knl
> > (at least when not preferring 256 or 128 bit vectors), those would need
> > ALIGNMENT 64.
> 
> Yeah, the real fix for AVX512 is to define VECTOR_BITS.  And I'd have
> thought even AVX2 would need to define it to get clean results on other
> tests.  But the patch that introduced VECTOR_BITS just wasn't supposed
> to be changing the default behaviour in the way that I'd accidentally
> done here.
> 
> Do you know what the vect.exp results are like for 256-bit and 512-bit
> vectors on x86_64?  Like I said in the PR, I was surprised we were the

Dunno, but can try it for -march=haswell easily (or we could look up
gcc-testresults).  I think some of the Intel folks are surely testing
regularly with --with-arch=haswell configured compiler, judging from many
bugreports about tests from the testsuite that only fail with -march=haswell.

Jakub


Re: [PATCH, GCC/ARM] Factor out CMSE register clearing code

2017-11-22 Thread Kyrill Tkachov

Hi Thomas,

On 15/11/17 17:12, Thomas Preudhomme wrote:

Hi,

Functions cmse_nonsecure_call_clear_caller_saved and
cmse_nonsecure_entry_clear_before_return both contain very similar code
to clear registers. What's worse, they differ slightly at times so if a
bug is found in one careful thoughts is needed to decide whether the
other function needs fixing too.

This commit addresses the situation by factoring the two pieces of code
into a new function. In doing so the code generated to clear VFP
registers in cmse_nonsecure_call now uses the same sequence as
cmse_nonsecure_entry functions. Tests expectation are thus updated
accordingly.

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

* config/arm/arm.c (cmse_clear_registers): New function.
(cmse_nonsecure_call_clear_caller_saved): Replace register 
clearing

code by call to cmse_clear_registers.
(cmse_nonsecure_entry_clear_before_return): Likewise.

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Adapt 
expectations

to vmov instructions now generated.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the
testsuite.

Is this ok for trunk?



This looks mostly ok, but I have a concern from reading the code that 
I'd like some help with...


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
9b494e9529a4470c18192a4561e03d2f80e90797..22c9add0722974902b2a89b2b0a75759ff8ba37c
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16991,6 +16991,128 @@ compute_not_to_clear_mask (tree arg_type, rtx 
arg_rtx, int regno,
   return not_to_clear_mask;
 }
 
+/* Clear registers secret before doing a cmse_nonsecure_call or returning from

+   a cmse_nonsecure_entry function.  TO_CLEAR_BITMAP indicates which registers
+   are to be fully cleared, using the value in register CLEARING_REG if more
+   efficient.  The PADDING_BITS_LEN entries array PADDING_BITS_TO_CLEAR gives
+   the bits that needs to be cleared in caller-saved core registers, with
+   SCRATCH_REG used as a scratch register for that clearing.
+
+   NOTE: one of three following assertions must hold:
+   - SCRATCH_REG is a low register
+   - CLEARING_REG is in the set of registers fully cleared (ie. its bit is set
+ in TO_CLEAR_BITMAP)
+   - CLEARING_REG is a low register.  */
+
+static void
+cmse_clear_registers (sbitmap to_clear_bitmap, uint32_t *padding_bits_to_clear,
+ int padding_bits_len, rtx scratch_reg, rtx clearing_reg)
+{
+  bool saved_clearing = false;
+  rtx saved_clearing_reg = NULL_RTX;
+  int i, regno, clearing_regno, minregno = R0_REGNUM, maxregno = minregno - 1;
+

Here minregno becomes 0 and maxregno becomes -1...

+  gcc_assert (arm_arch_cmse);
+
+  if (!bitmap_empty_p (to_clear_bitmap))
+{
+  minregno = bitmap_first_set_bit (to_clear_bitmap);
+  maxregno = bitmap_last_set_bit (to_clear_bitmap);
+}


...and here is a path on maxregno may not be set to a proper register 
number...



+
+  for (regno = minregno; regno <= maxregno; regno++)
+{
+  if (!bitmap_bit_p (to_clear_bitmap, regno))
+   continue;
+

...and here we iterate from minregno (potentially 0) to maxregno (potentially 
-1) which will lead to trouble.
Are there any guarantees that this case will not occur?

Thanks,
Kyrill



Re: PR83004: Accidental change to pr81136.c for VECTOR_BITS==128

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 3:17 PM, Richard Sandiford
 wrote:
> Jakub Jelinek  writes:
>> On Wed, Nov 22, 2017 at 02:51:09PM +0100, Richard Biener wrote:
>>> On Wed, Nov 22, 2017 at 10:30 AM, Richard Sandiford
>>>  wrote:
>>> > r254589 was supposed to leave tests unchanged for the default setting
>>> > of VECTOR_BITS, but I must have got my sums wrong on pr81136.c.
>>> > Sorry for the breakage.
>>> >
>>> > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
>>> > OK to install?
>>>
>>> Ok.
>>
>> That will still FAIL e.g. with -march=skylake-avx512 or -march=knl
>> (at least when not preferring 256 or 128 bit vectors), those would need
>> ALIGNMENT 64.
>
> Yeah, the real fix for AVX512 is to define VECTOR_BITS.  And I'd have
> thought even AVX2 would need to define it to get clean results on other
> tests.  But the patch that introduced VECTOR_BITS just wasn't supposed
> to be changing the default behaviour in the way that I'd accidentally
> done here.
>
> Do you know what the vect.exp results are like for 256-bit and 512-bit
> vectors on x86_64?  Like I said in the PR, I was surprised we were the
> first to hit the need for a macro like VECTOR_BITS.  The results for
> -msve-vector-bits=256 and -msve-vector-bits=512 were really poor without
> it, since many things were hard-coded to values that made sense for
> <= 128-bit (or sometimes <= 256-bit) vectors.

IIRC testresuits for -mavx2 were clean (at some point...).  Never checked
-mavx512[bf] as I don't have AVX512 HW conveniently available.

Richard.

> Thanks,
> Richard


[C++ Patch] PR 82293 ("[8 Regression] ICE in nonlambda_method_basetype at gcc/cp/lambda.c:886")

2017-11-22 Thread Paolo Carlini

Hi,

this ICE on valid is triggered by the existing 
cpp0x/lambda/lambda-ice20.C when -Wshadow is requested. Today I 
confirmed that it can be reproduced only after r251433, the "Reimplement 
handling of lambdas in templates." patch: then, as part of 
tsubst_lambda_expr, do_pushdecl calls check_local_shadow which in turn 
checks nonlambda_method_basetype and at that time current_class_type is 
null. I believe this is just something which we have to handle in the 
obvious way: indeed, most other uses of LAMBDA_TYPE_P are already 
checking first that the argument is non-null, see, for example, the 
related current_nonlambda_class_type. Tested x86_64-linux.


Thanks, Paolo.

///


/cp
2017-11-22  Paolo Carlini  

PR c++/82293
* lambda.c (nonlambda_method_basetype): Don't use LAMBDA_TYPE_P
on a null type.

/testsuite
2017-11-22  Paolo Carlini  

PR c++/82293
* g++.dg/cpp0x/lambda/lambda-ice24.C: New.
Index: cp/lambda.c
===
--- cp/lambda.c (revision 255053)
+++ cp/lambda.c (working copy)
@@ -921,7 +921,7 @@ nonlambda_method_basetype (void)
 return NULL_TREE;
 
   type = current_class_type;
-  if (!LAMBDA_TYPE_P (type))
+  if (!type || !LAMBDA_TYPE_P (type))
 return type;
 
   /* Find the nearest enclosing non-lambda function.  */
Index: testsuite/g++.dg/cpp0x/lambda/lambda-ice24.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-ice24.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-ice24.C(working copy)
@@ -0,0 +1,12 @@
+// PR c++/82293
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wshadow" }
+
+template 
+struct S {
+  int f{[this](){return 42;}()};
+};
+
+int main(){
+  return S{}.f;
+}


[Patch AArch64] Fixup floating point division with -march=armv8-a+nosimd

2017-11-22 Thread Ramana Radhakrishnan

Hi,

I received a private report from a customer that gcc was putting out 
calls to __divdf3 when compiling with +nosimd. When the reciprocal math 
support was added this was probably an oversight or a typo.


The canonical examples is :

double
foo (double x, double y)
  {
return x / y;
  }

with -march=armv8-a+nosimd

generates a function that calls __divdf3. Ofcourse on AArch64 we don't
have any software floating point and this causes issues.

There is also a problem in +nosimd that has existed since the dawn of 
time in the port with respect to long doubles (128 bit floating point), 
here the ABI and the compiler expect the presence of the SIMD unit as 
these parameters are passed in the vector registers. Thus while +nosimd 
tries to prevent the use of SIMD instructions in the compile we don't 
get this right as we end up using ldr qN / str qN instructions and even 
there I think things go wrong in a simple example that I tried.


Is that sufficient to consider marking +nosimd as deprecated in GCC-8 
and remove this in a future release ?


That is not a subject for this patch but something separate but I would 
like to put this into trunk and the release branches. Bootstrapped and 
regression tested on my aarch64 desktop.


Ok ?

regards
Ramana


* config/aarch64/aarch64.md (div3): Change check to TARGET_FLOAT.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Add early 
exit for vector mode and !TARGET_SIMD.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c67e2b5c67..811f19c3f11 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8442,6 +8442,9 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
   || !use_approx_division_p)
 return false;
 
+  if (!TARGET_SIMD && VECTOR_MODE_P (mode))
+return false;
+
   /* Estimate the approximate reciprocal.  */
   rtx xrcp = gen_reg_rtx (mode);
   emit_insn ((*get_recpe_type (mode)) (xrcp, den));
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 423a3352aab..83e49425090 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5093,7 +5093,7 @@
  [(set (match_operand:GPF_F16 0 "register_operand")
(div:GPF_F16 (match_operand:GPF_F16 1 "general_operand")
(match_operand:GPF_F16 2 "register_operand")))]
- "TARGET_SIMD"
+ "TARGET_FLOAT"
 {
   if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
 DONE;


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 12:01 PM, Alan Hayward  wrote:
>
>> On 22 Nov 2017, at 09:14, Richard Biener  wrote:
>>
>> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
>>  wrote:
>>>
 This is PR81179 I think, please mention that in the changelog.
>>>
>>> Correct, my bad for missing that.
>>>
 This unconditionally pessimizes code even if there is no valid index
 zero, right?
>>>
>>> Almost, since for a loop such as:
>>>
>>>  #define OFFSET 1
>>>  unsigned int find(const unsigned int *a, unsigned int v) {
>>>unsigned int result = 120;
>>>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>>>  if (a[i-OFFSET] == v) result = i;
>>>}
>>>return result;
>>>  }
>>>
>>> the index i will match the contents of the index vector used here ---
>>> but this does indeed pessimize the code generated for, say, OFFSET
>>> = 2. It is probably more sensible to use the existing code path in those
>>> situations.
>>>
 The issue with the COND_REDUCITION index vector is overflow IIRC.
>>>
>>> Does that mean such overflows can already manifest themselves for
>>> regular COND_REDUCTION? I had assumed sufficient checks were already in
>>> place because of the presence of the is_nonwrapping_integer_induction
>>> test.
>>
>> But only if we need the index vector?  The patch looked like you're changing
>> how other modes are handled (in my look I didn't make myself familiar with
>> the various modes again...).  Anyway, Alan hopefully remembers what he
>> coded so I'll defer to him here.
>>
>> If Alan is happy with the patch consider it approved.
>>
>
> Richard’s right with his question.
>
> The optimisation needs to fail if the number of interactions of the loop + 1 
> doesn’t
> fit into the data type used for the result.
>
> I took the test pr65947-14.c
> First I set N to 0x-1. This compiled and vectorised. That’s ok.
> Now if I set N to 0x it still vectorises, but this should fail.
>
> Compare to pr65947-14.c where we set  last = a[I]; inside the if.
> When set N to 0x-1, it compiled and vectorised. That’s ok.
> When set N to 0x it fails to vectorise with the message
> "loop size is greater than data size”.
>
> Looks like you might just need to add the one check.
>
> Also see pr65947-9.c which uses the slightly more useful char indexes.

Some extra test variants might be indeed useful to really cover all corner
cases.  The PR had some "trivial" patch for the issue by me that I considered
a too big hammer.

I wonder if other compilers pull another trick to deal with the situation.

Richard.

>
> Alan.
>
>
>


[PATCH, i386] Fix behavior for –mprefer-vector-width= option

2017-11-22 Thread Shalnov, Sergey
Hi,
This patch making –mprefer-vector-width= option inclusive. This means that 
if we use –mprefer-vector-width=128 it should switch TARGET_PREFER_AVX128=ON 
and TARGET_PREFER_AVX256=ON also.
It is minor change to generate “xmm” with –mprefer-vector-width=128 
on the platform with “zmm”.

Sergey

2017-11-22  Sergey Shalnov  
gcc/
* config/i386/i386.h (TARGET_PREFER_AVX256): Add inclusiveness of
the TARGET_PREFER_AVX256 for TARGET_PREFER_AVX128




0006-Fix-behavior-for-mprefer-vector-width-option.patch
Description: 0006-Fix-behavior-for-mprefer-vector-width-option.patch


Re: [PATCH, GCC/ARM] Factor out CMSE register clearing code

2017-11-22 Thread Thomas Preudhomme



On 22/11/17 14:45, Kyrill Tkachov wrote:

Hi Thomas,

On 15/11/17 17:12, Thomas Preudhomme wrote:

Hi,

Functions cmse_nonsecure_call_clear_caller_saved and
cmse_nonsecure_entry_clear_before_return both contain very similar code
to clear registers. What's worse, they differ slightly at times so if a
bug is found in one careful thoughts is needed to decide whether the
other function needs fixing too.

This commit addresses the situation by factoring the two pieces of code
into a new function. In doing so the code generated to clear VFP
registers in cmse_nonsecure_call now uses the same sequence as
cmse_nonsecure_entry functions. Tests expectation are thus updated
accordingly.

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

    * config/arm/arm.c (cmse_clear_registers): New function.
    (cmse_nonsecure_call_clear_caller_saved): Replace register clearing
    code by call to cmse_clear_registers.
    (cmse_nonsecure_entry_clear_before_return): Likewise.

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Adapt expectations
    to vmov instructions now generated.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the
testsuite.

Is this ok for trunk?



This looks mostly ok, but I have a concern from reading the code that I'd like 
some help with...


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
9b494e9529a4470c18192a4561e03d2f80e90797..22c9add0722974902b2a89b2b0a75759ff8ba37c 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16991,6 +16991,128 @@ compute_not_to_clear_mask (tree arg_type, rtx arg_rtx, 
int regno,

    return not_to_clear_mask;
  }

+/* Clear registers secret before doing a cmse_nonsecure_call or returning from
+   a cmse_nonsecure_entry function.  TO_CLEAR_BITMAP indicates which registers
+   are to be fully cleared, using the value in register CLEARING_REG if more
+   efficient.  The PADDING_BITS_LEN entries array PADDING_BITS_TO_CLEAR gives
+   the bits that needs to be cleared in caller-saved core registers, with
+   SCRATCH_REG used as a scratch register for that clearing.
+
+   NOTE: one of three following assertions must hold:
+   - SCRATCH_REG is a low register
+   - CLEARING_REG is in the set of registers fully cleared (ie. its bit is set
+ in TO_CLEAR_BITMAP)
+   - CLEARING_REG is a low register.  */
+
+static void
+cmse_clear_registers (sbitmap to_clear_bitmap, uint32_t *padding_bits_to_clear,
+  int padding_bits_len, rtx scratch_reg, rtx clearing_reg)
+{
+  bool saved_clearing = false;
+  rtx saved_clearing_reg = NULL_RTX;
+  int i, regno, clearing_regno, minregno = R0_REGNUM, maxregno = minregno - 1;
+

Here minregno becomes 0 and maxregno becomes -1...

+  gcc_assert (arm_arch_cmse);
+
+  if (!bitmap_empty_p (to_clear_bitmap))
+    {
+  minregno = bitmap_first_set_bit (to_clear_bitmap);
+  maxregno = bitmap_last_set_bit (to_clear_bitmap);
+    }


...and here is a path on maxregno may not be set to a proper register number...



If bitmap is empty yes, ie. if no bit is set and no register should be cleared.



+
+  for (regno = minregno; regno <= maxregno; regno++)
+    {
+  if (!bitmap_bit_p (to_clear_bitmap, regno))
+    continue;
+

...and here we iterate from minregno (potentially 0) to maxregno (potentially 
-1) which will lead to trouble.

Are there any guarantees that this case will not occur?


It absolutely does occur and that's on purpose. If maxregno is -1 it means there 
is no bit to clear and so it is fine to do nothing.


Best regards,

Thomas


Re: [PATCH, GCC/ARM] Factor out CMSE register clearing code

2017-11-22 Thread Kyrill Tkachov


On 22/11/17 15:07, Thomas Preudhomme wrote:



On 22/11/17 14:45, Kyrill Tkachov wrote:

Hi Thomas,

On 15/11/17 17:12, Thomas Preudhomme wrote:

Hi,

Functions cmse_nonsecure_call_clear_caller_saved and
cmse_nonsecure_entry_clear_before_return both contain very similar code
to clear registers. What's worse, they differ slightly at times so if a
bug is found in one careful thoughts is needed to decide whether the
other function needs fixing too.

This commit addresses the situation by factoring the two pieces of code
into a new function. In doing so the code generated to clear VFP
registers in cmse_nonsecure_call now uses the same sequence as
cmse_nonsecure_entry functions. Tests expectation are thus updated
accordingly.

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

* config/arm/arm.c (cmse_clear_registers): New function.
(cmse_nonsecure_call_clear_caller_saved): Replace register 
clearing

code by call to cmse_clear_registers.
(cmse_nonsecure_entry_clear_before_return): Likewise.

*** gcc/ChangeLog ***

2017-10-24  Thomas Preud'homme 

* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Adapt 
expectations

to vmov instructions now generated.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the
testsuite.

Is this ok for trunk?



This looks mostly ok, but I have a concern from reading the code that 
I'd like some help with...


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
9b494e9529a4470c18192a4561e03d2f80e90797..22c9add0722974902b2a89b2b0a75759ff8ba37c 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16991,6 +16991,128 @@ compute_not_to_clear_mask (tree arg_type, 
rtx arg_rtx, int regno,

return not_to_clear_mask;
  }

+/* Clear registers secret before doing a cmse_nonsecure_call or 
returning from
+   a cmse_nonsecure_entry function.  TO_CLEAR_BITMAP indicates which 
registers
+   are to be fully cleared, using the value in register CLEARING_REG 
if more
+   efficient.  The PADDING_BITS_LEN entries array 
PADDING_BITS_TO_CLEAR gives
+   the bits that needs to be cleared in caller-saved core registers, 
with

+   SCRATCH_REG used as a scratch register for that clearing.
+
+   NOTE: one of three following assertions must hold:
+   - SCRATCH_REG is a low register
+   - CLEARING_REG is in the set of registers fully cleared (ie. its 
bit is set

+ in TO_CLEAR_BITMAP)
+   - CLEARING_REG is a low register.  */
+
+static void
+cmse_clear_registers (sbitmap to_clear_bitmap, uint32_t 
*padding_bits_to_clear,

+  int padding_bits_len, rtx scratch_reg, rtx clearing_reg)
+{
+  bool saved_clearing = false;
+  rtx saved_clearing_reg = NULL_RTX;
+  int i, regno, clearing_regno, minregno = R0_REGNUM, maxregno = 
minregno - 1;

+

Here minregno becomes 0 and maxregno becomes -1...

+  gcc_assert (arm_arch_cmse);
+
+  if (!bitmap_empty_p (to_clear_bitmap))
+{
+  minregno = bitmap_first_set_bit (to_clear_bitmap);
+  maxregno = bitmap_last_set_bit (to_clear_bitmap);
+}


...and here is a path on maxregno may not be set to a proper register 
number...




If bitmap is empty yes, ie. if no bit is set and no register should be 
cleared.




+
+  for (regno = minregno; regno <= maxregno; regno++)
+{
+  if (!bitmap_bit_p (to_clear_bitmap, regno))
+continue;
+

...and here we iterate from minregno (potentially 0) to maxregno 
(potentially -1) which will lead to trouble.

Are there any guarantees that this case will not occur?


It absolutely does occur and that's on purpose. If maxregno is -1 it 
means there is no bit to clear and so it is fine to do nothing.




Err, of course. If maxregno is -1 then the for loop does nothing. My 
brain was reading this as "until regno <= maxregno" for some reason.


This is ok for trunk.
Sorry for the noise,
Kyrill


Best regards,

Thomas




Re: [PATCH] PR libstdc++/48101 improve errors for invalid container specializations

2017-11-22 Thread Ian Lance Taylor
On Wed, Nov 22, 2017 at 3:06 AM, Jonathan Wakely  wrote:
> On 22/11/17 10:56 +, Jonathan Wakely wrote:
>>
>> On 22/11/17 11:23 +0100, Rainer Orth wrote:
>>>
>>> Hi Jonathan,
>>>
 This uses static_assert to improve the errors when attempting to
 instantiate invalid specializations of containers, e.g. set,
 or unordered_set, hash> (which mixes up the
 order of the hasher and equality predicate arguments).

 This means instead of more than 100 lines of confusing errors for
 https://wandbox.org/permlink/kL1YVNVOzrAsLPyS we get only this:

 In file included from /home/jwakely/gcc/8/include/c++/8.0.0/set:61:0,
from s.cc:2:
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h: In instantiation
 of ‘class std::set’:
 s.cc:8:18:   required from here
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h:108:7: error:
 static assertion failed: std::set must have a non-const, non-volatile
 value_type
  static_assert(is_same::type, _Key>::value,
  ^
 In file included from
 /home/jwakely/gcc/8/include/c++/8.0.0/unordered_set:46:0,
from s.cc:1:
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h: In instantiation
 of ‘class std::_Hashtable,
 std::__detail::_Identity, std::hash, std::equal_to,
 std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash,
 std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits>>> true, false> >’:
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/unordered_set.h:898:18:
 required from ‘class std::unordered_multiset,
 std::hash >’
 s.cc:10:53:   required from here
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:195:7: error:
 static assertion failed: hash function must be invocable with an argument 
 of
 key type
  static_assert(__is_invocable{},
  ^
 /home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:197:7: error:
 static assertion failed: key equality predicate must be invocable with two
 arguments of key type
  static_assert(__is_invocable>>> _Key&>{},
  ^

 Tested powerpc64le-linux, committed to trunk.
>>>
>>>
>>> this broke Go bootstrap, it seems.  Seen on i386-pc-solaris2.11 as of
>>> r255048:
>>>
>>> In file included from
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
>>>from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
>>>from
>>> /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/ast-dump.cc:7:
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
>>> In instantiation of 'class std::_Rb_tree>> std::_Identity, Import_init_lt, std::allocator
>>> >':
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
>>> required from 'class std::set'
>>> /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required
>>> from here
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
>>> error: static assertion failed: comparison object must be invocable with two
>>> arguments of key type
>>>  static_assert(__is_invocable>> _Key&>{},
>>>  ^
>>> In file included from
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/map:60,
>>>from /vol/gcc/src/hg/trunk/local/gcc/go/go-system.h:36,
>>>from
>>> /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/escape.cc:7:
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:
>>> In instantiation of 'class std::_Rb_tree>> std::_Identity, Import_init_lt, std::allocator
>>> >':
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_set.h:133:17:
>>> required from 'class std::set'
>>> /vol/gcc/src/hg/trunk/local/gcc/go/gofrontend/gogo.h:127:37:   required
>>> from here
>>>
>>> /var/gcc/regression/trunk/11.4-gcc/build/prev-i386-pc-solaris2.11/libstdc++-v3/include/bits/stl_tree.h:452:7:
>>> error: static assertion failed: comparison object must be invocable with two
>>> arguments of key type
>>>  static_assert(__is_invocable>> _Key&>{},
>>>  ^
>>
>>
>> This is PR83102. I'm going to relax the checks, but the Go
>> frontend still needs to be fixed.
>>
>>
>
> This should allow Go to build again (unless you bootstrap with
> -std=gnu++17 in the build flags)

Thanks.  Committed.

Ian


Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-22 Thread Kyrill Tkachov


On 22/11/17 11:25, Sudi Das wrote:

Hi Kyrill and Christophe


In case of soft fp testing, there are other assembly directives apart from the 
vmov one which are also failing. The directives probably make more sense in the 
hard fp context so instead of removing the vmov, I have added the 
-mfloat-abi=hard option.

Is this ok for trunk? If yes could someone post it on my behalf?

Sudi


Thanks Sudi,

You're right, this whole test isn't written with softfp in mind. We 
might as well restrict it to -mfloat-abi=hard.

I've committed the patch with r255061.

Would you like to get commit access to the SVN repo?
If you complete the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi
using my email address as the approver we can get that sorted out :)

Kyrill



*** gcc/testsuite/ChangeLog ***

2017-11-22  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option.




From: Kyrill Tkachov 
Sent: Monday, November 20, 2017 2:20 PM
To: Christophe Lyon
Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard 
Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
   



On 20/11/17 14:14, Christophe Lyon wrote:

Hi,

On 17 November 2017 at 12:12, Kyrill  Tkachov
 wrote:

On 17/11/17 10:45, Sudi Das wrote:

Hi Kyrill

Thanks I have made the change.

Thanks Sudi, I've committed this on your behalf with r254863.

Kyrill



Sudi



From: Kyrill Tkachov 
Sent: Thursday, November 16, 2017 5:03 PM
To: Sudi Das; gcc-patches@gcc.gnu.org
Cc: nd; Ramana Radhakrishnan; Richard Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

Hi Sudi,

On 16/11/17 16:37, Sudi Das wrote:

Hi

This patch fixes the test case armv8_2-fp16-move-1.c for
arm-none-linux-gnueabihf where 2 of the scan-assembler directives were
failing. We now generate less vmov between core and VFP registers.
Thus changing those directives to reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das  

 * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov
scan-assembler
 directives.


diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
  /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+,
s[0-9]+} 1 } }  */
  /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+,
s[0-9]+} 1 } }  */
  -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 }
}  */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } }
*/
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } }
*/
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } }  */
  Some of the moves between core and fp registers were the result of
inefficient codegen and in hindsight
scanning for them was not very useful. Now that we emit only the required
ones I think scanning for the plain
vmovs between two S-registers doesn't test anything useful.
So can you please just remove the second scan-assembler directive here?


You are probably already aware of that: the tests fail on
arm-none-linux-gnueabi/arm-none-eabi
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times)

but this is not a regression, the previous version of the test had the
same problem.

Grrr, that's because the softfp ABI necessitates moves between core and
FP registers,
so scanning for a particular number of vmovs between them is just not
gonna be stable across
soft-float ABIs. At this point I'd just remove the scan for vmovs
completely as it doesn't
seem to check anything useful.

A patch to remove that scan for VMOV is pre-approved.

Thanks for reminding me of this Christophe, softfp tends to slip my mind :(
Kyrill


Christophe


Thanks,
Kyrill


 




Re: [PATCH] Add fields to struct gomp_thread for debugging purposes

2017-11-22 Thread Kevin Buettner
On Sat, 4 Nov 2017 21:39:14 -0700
Kevin Buettner  wrote:

> On Tue, 31 Oct 2017 08:03:22 +0100
> Jakub Jelinek  wrote:
> 
> > On Mon, Oct 30, 2017 at 04:06:15PM -0700, Kevin Buettner wrote:  
> > > This patch adds a new member named "pthread_id" to the gomp_thread
> > > struct.  It is initialized in team.c.
> > 
> > That part is reasonable, though it is unclear how the debugger will
> > query it (through OMPD, or through hardcoded name lookup of the struct and
> > field in libgomp's debug info, something else).  But the field certainly
> > has to be guarded by #ifdef LIBGOMP_USE_PTHREADS, otherwise it will break
> > NVPTX offloading or any other pthread-less libgomp ports.
> > Another question is exact placement of the field, struct gomp_thread
> > vs. struct gomp_team_state etc.  Maybe it is ok, as the pthread_id is
> > the same once the thread is created, doesn't change when we create more
> > levels.  
> 
> Assuming we can figure out how to work the rest of it out, I'll submit
> a new patch with the appropriate ifdef.

I've decided to try for a more incremental approach.  This patch,
below, retains the portion that looked reasonable to you.  I've
stripped out the portions for finding the thread parent and have
added the ifdef guards that you asked for.

Is this part okay?

___

Add field to struct gomp_thread for debugging purposes

This patch adds a new member named "pthread_id" to the gomp_thread
struct.  It is initialized in team.c.

This new field serves no purpose in a normally running OpenMP
program.  It is intended to be used by a debugger for identifying
threads.

I've done a "make bootstrap" and have regression tested these changes
with no regressions found.

libgomp/ChangeLog:

* libgomp.h (struct gomp_thread): Add new member "pthread_id".
* team.c (gomp_thread_start): Set pthread_id.
(gomp_team_start): Initialize master thread.
---
 libgomp/libgomp.h |  6 ++
 libgomp/team.c| 13 +
 2 files changed, 19 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 940b5b8..5dafb3c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -611,6 +611,12 @@ struct gomp_thread
  to any place.  */
   unsigned int place;
 
+#ifdef LIBGOMP_USE_PTHREADS
+  /* The pthread id associated with this thread.  This is required for
+ debugging.  */
+  pthread_t pthread_id;
+#endif
+
   /* User pthread thread pool */
   struct gomp_thread_pool *thread_pool;
 };
diff --git a/libgomp/team.c b/libgomp/team.c
index 676614a..6292b22 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -92,6 +92,10 @@ gomp_thread_start (void *xdata)
 
   thr->ts.team->ordered_release[thr->ts.team_id] = &thr->release;
 
+#ifdef LIBGOMP_USE_PTHREADS
+  thr->pthread_id = pthread_self ();
+#endif
+
   /* Make thread pool local. */
   pool = thr->thread_pool;
 
@@ -718,6 +722,15 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned 
nthreads,
   attr = &thread_attr;
 }
 
+  /* Add the master thread to threads[] and record its pthread id too.  */
+  if (pool->threads[0] == NULL)
+{
+  pool->threads[0] = thr;
+#ifdef LIBGOMP_USE_PTHREADS
+  thr->pthread_id = pthread_self ();
+#endif
+}
+
   start_data = gomp_alloca (sizeof (struct gomp_thread_start_data)
* (nthreads-i));
 




[PATCH] C++: avoid most reserved words as misspelling suggestions (PR c++/81610 and PR c++/80567)

2017-11-22 Thread David Malcolm
lookup_name_fuzzy can offer some reserved words as suggestions for
misspelled words, helping with "singed"/"signed" typos.

PR c++/81610 and PR c++/80567 report problems where the C++ frontend
suggested "if", "for" and "else" as corrections for misspelled variable
names.

The root cause is that in r247233
  ("Fix spelling suggestions for reserved words (PR c++/80177)")
I loosened the conditions on these reserved words, adding this condition:
   if (kind == FUZZY_LOOKUP_TYPENAME)
to the logic for rejecting words that don't start decl-specifiers, to
allow for "static_assert" to be offered.

This is too loose a condition: we don't want to suggest *any* reserved word
when we're in a context where we don't know we expect a typename.

For the kinds of error-recover situations where we're suggesting
spelling corrections we don't have much contextual information, so it
seems prudent to be stricter about which reserved words we offer
as spelling suggestions; I don't think it makes sense for us to
suggest e.g. "for".

This patch implements that by effectively reinstating the old logic,
but special-casing RID_STATIC_ASSERT, moving the logic to a new
subroutine (in case we want to allow for other special-cases).

I attempted to add suggestions for the various RID_*CAST, to cope
with e.g. "reinterptet_cast" (I can never type that correctly on the
first try), but the following '<' token confuses the error-recovery
enough that the suggestion code isn't triggered.

Hence this more minimal fix.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/81610
PR c++/80567
* name-lookup.c (suggest_rid_p): New function.
(lookup_name_fuzzy): Replace enum-rid-filtering logic with call to
suggest_rid_p.

gcc/testsuite/ChangeLog:
PR c++/81610
PR c++/80567
* g++.dg/spellcheck-reswords.C: New test case.
* g++.dg/spellcheck-stdlib.C: Remove xfail from dg-bogus
suggestion of "if".
---
 gcc/cp/name-lookup.c   | 31 +++---
 gcc/testsuite/g++.dg/spellcheck-reswords.C | 11 +++
 gcc/testsuite/g++.dg/spellcheck-stdlib.C   |  2 +-
 3 files changed, 40 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-reswords.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 7c363b0..a96be46 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -5671,6 +5671,32 @@ class macro_use_before_def : public deferred_diagnostic
   cpp_hashnode *m_macro;
 };
 
+/* Determine if it can ever make sense to offer RID as a suggestion for
+   a misspelling.
+
+   Subroutine of lookup_name_fuzzy.  */
+
+static bool
+suggest_rid_p  (enum rid rid)
+{
+  switch (rid)
+{
+/* Support suggesting function-like keywords.  */
+case RID_STATIC_ASSERT:
+  return true;
+
+default:
+  /* Support suggesting the various decl-specifier words, to handle
+e.g. "singed" vs "signed" typos.  */
+  if (cp_keyword_starts_decl_specifier_p (rid))
+   return true;
+
+  /* Otherwise, don't offer it.  This avoids suggesting e.g. "if"
+and "do" for short misspellings, which are likely to lead to
+nonsensical results.  */
+  return false;
+}
+}
 
 /* Search for near-matches for NAME within the current bindings, and within
macro names, returning the best match as a const char *, or NULL if
@@ -5735,9 +5761,8 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind 
kind, location_t loc)
 {
   const c_common_resword *resword = &c_common_reswords[i];
 
-  if (kind == FUZZY_LOOKUP_TYPENAME)
-   if (!cp_keyword_starts_decl_specifier_p (resword->rid))
- continue;
+  if (!suggest_rid_p (resword->rid))
+   continue;
 
   tree resword_identifier = ridpointers [resword->rid];
   if (!resword_identifier)
diff --git a/gcc/testsuite/g++.dg/spellcheck-reswords.C 
b/gcc/testsuite/g++.dg/spellcheck-reswords.C
new file mode 100644
index 000..db6104b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/spellcheck-reswords.C
@@ -0,0 +1,11 @@
+void pr81610 (void *p)
+{  
+  forget (p); // { dg-error "not declared" }
+  // { dg-bogus "'for'" "" { target *-*-*} .-1 }
+}
+
+void pr80567 (void *p)
+{
+  memset (p, 0, 4); // { dg-error "not declared" }
+  // { dg-bogus "'else'" "" { target *-*-*} .-1 }
+}
diff --git a/gcc/testsuite/g++.dg/spellcheck-stdlib.C 
b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
index 6e6ab1d..c7a6626 100644
--- a/gcc/testsuite/g++.dg/spellcheck-stdlib.C
+++ b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
@@ -16,7 +16,7 @@ void test_cstdio (void)
   FILE *f; // { dg-error "'FILE' was not declared in this scope" }
   // { dg-message "'FILE' is defined in header ''; did you forget to 
'#include '?" "" { target *-*-* } .-1 }
   // { dg-error "'f' was not declared in this scope" "" { target *-*-* } .-2 }
-  // { dg-bogus "suggested alternative: 'if'" "PR c++/80567" { xfail *-*-*

Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-22 Thread Sudakshina Das



On 22/11/17 15:21, Kyrill Tkachov wrote:


On 22/11/17 11:25, Sudi Das wrote:

Hi Kyrill and Christophe


In case of soft fp testing, there are other assembly directives apart 
from the vmov one which are also failing. The directives probably 
make more sense in the hard fp context so instead of removing the 
vmov, I have added the -mfloat-abi=hard option.


Is this ok for trunk? If yes could someone post it on my behalf?

Sudi


Thanks Sudi,

You're right, this whole test isn't written with softfp in mind. We 
might as well restrict it to -mfloat-abi=hard.

I've committed the patch with r255061.


Hi Kyriil

Thanks for the commit!


Would you like to get commit access to the SVN repo?
If you complete the form at 
https://sourceware.org/cgi-bin/pdw/ps_form.cgi

using my email address as the approver we can get that sorted out :)



Thanks again for the invite. I have filled out the form! :)

Sudi



Kyrill



*** gcc/testsuite/ChangeLog ***

2017-11-22  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option.




From: Kyrill Tkachov 
Sent: Monday, November 20, 2017 2:20 PM
To: Christophe Lyon
Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; 
Richard Earnshaw

Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c


On 20/11/17 14:14, Christophe Lyon wrote:

Hi,

On 17 November 2017 at 12:12, Kyrill  Tkachov
 wrote:

On 17/11/17 10:45, Sudi Das wrote:

Hi Kyrill

Thanks I have made the change.

Thanks Sudi, I've committed this on your behalf with r254863.

Kyrill



Sudi



From: Kyrill Tkachov 
Sent: Thursday, November 16, 2017 5:03 PM
To: Sudi Das; gcc-patches@gcc.gnu.org
Cc: nd; Ramana Radhakrishnan; Richard Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

Hi Sudi,

On 16/11/17 16:37, Sudi Das wrote:

Hi

This patch fixes the test case armv8_2-fp16-move-1.c for
arm-none-linux-gnueabihf where 2 of the scan-assembler directives 
were

failing. We now generate less vmov between core and VFP registers.
Thus changing those directives to reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das  

 * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov
scan-assembler
 directives.


diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
  /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, 
s[0-9]+,

s[0-9]+} 1 } }  */
  /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, 
s[0-9]+,

s[0-9]+} 1 } }  */
  -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, 
r[0-9]+} 4 }

}  */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, 
s[0-9]+} 4 } }

*/
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, 
r[0-9]+} 2 } }

*/
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } 
}  */
  Some of the moves between core and fp registers were the 
result of

inefficient codegen and in hindsight
scanning for them was not very useful. Now that we emit only the 
required

ones I think scanning for the plain
vmovs between two S-registers doesn't test anything useful.
So can you please just remove the second scan-assembler directive 
here?



You are probably already aware of that: the tests fail on
arm-none-linux-gnueabi/arm-none-eabi
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times)

but this is not a regression, the previous version of the test had the
same problem.

Grrr, that's because the softfp ABI necessitates moves between core and
FP registers,
so scanning for a particular number of vmovs between them is just not
gonna be stable across
soft-float ABIs. At this point I'd just remove the scan for vmovs
completely as it doesn't
seem to check anything useful.

A patch to remove that scan for VMOV is pre-approved.

Thanks for reminding me of this Christophe, softfp tends to slip my 
mind :(

Kyrill


Christophe


Thanks,
Kyrill








Re: [PATCH] Fix up -Wreturn-type (PR c++/83045)

2017-11-22 Thread Christophe Lyon
Hi,


On 21 November 2017 at 18:23, Jeff Law  wrote:
> On 11/21/2017 06:52 AM, Jakub Jelinek wrote:
>> Hi!
>>
>> The C++ FE now emits __builtin_unreachable () with BUILTINS_LOCATION
>> on spots that return from functions/methods returning non-void without
>> proper return.  This breaks the -Wreturn-type warning, because we then
>> don't see any return stmt without argument on the edges to exit, instead
>> we see those __builtin_unreachable () calls at the end of blocks without
>> successors.
>>
>> I wonder if the C++ FE addition of __builtin_unreachable () shouldn't be
>> done only if (optimize).
>>
>> Anyway, this patch tweaks tree-cfg.c so that it recognizes those
>> __builtin_unreachable () calls and reports the -Wreturn-type warning
>> in those cases too (warning in the FE would be too early, we need to
>> optimize away unreachable code).
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> The patch regresses g++.dg/gomp/declare-simd-1.C, but given that it revealed
>> a real bug, I'm not trying to work around it in the patch and will fix it up
>> incrementally instead.
>>
>> 2017-11-21  Jakub Jelinek  
>>
>>   PR c++/83045
>>   * tree-cfg.c (pass_warn_function_return::execute): Formatting fix.
>>   Also warn if seen __builtin_unreachable () call with BUILTINS_LOCATION.
>>   Use LOCATION_LOCUS when comparing against UNKNOWN_LOCATION.
>>
>>   * c-c++-common/pr61405.c (fn0, fn1): Add return stmts.
>>   * c-c++-common/Wlogical-op-2.c (fn): Likewise.
>>   * g++.dg/debug/pr53466.C: Add -Wno-return-type to dg-options.
>>   * g++.dg/opt/combine.C: Likewise.
>>   * g++.dg/ubsan/return-3.C: Likewise.
>>   * g++.dg/pr59445.C: Likewise.
>>   * g++.dg/pr49847.C: Likewise.
>>   * g++.dg/ipa/pr61800.C: Likewise.
>>   * g++.dg/ipa/pr63470.C: Likewise.
>>   * g++.dg/ipa/pr68672-1.C: Likewise.
>>   * g++.dg/pr58438.C: Likewise.
>>   * g++.dg/torture/pr59265.C: Likewise.
>>   * g++.dg/tree-ssa/ssa-dse-2.C: Likewise.
>>   * g++.old-deja/g++.eh/catch13.C: Likewise.
>>   * g++.old-deja/g++.eh/crash1.C: Likewise.
>>   * g++.dg/tm/pr60004.C: Expect -Wreturn-type warning.
>>   * g++.dg/torture/pr55740.C: Likewise.
>>   * g++.dg/torture/pr43257.C: Likewise.
>>   * g++.dg/torture/pr64280.C: Likewise.
>>   * g++.dg/torture/pr54684.C: Likewise.
>>   * g++.dg/torture/pr56694.C: Likewise.
>>   * g++.dg/torture/pr68470.C: Likewise.
>>   * g++.dg/torture/pr60648.C: Likewise.
>>   * g++.dg/torture/pr71281.C: Likewise.
>>   * g++.dg/torture/pr52772.C: Add -Wno-return-type dg-additional-options.
>>   * g++.dg/torture/pr64669.C: Likewise.
>>   * g++.dg/torture/pr58369.C: Likewise.
>>   * g++.dg/torture/pr33627.C: Likewise.
>>   * g++.dg/torture/predcom-1.C: Add
>>   #pragma GCC diagnostic ignored "-Wreturn-type".
>>   * g++.dg/lto/20090221_0.C: Likewise.
>>   * g++.dg/lto/20091026-1_1.C: Likewise.
>>   * g++.dg/lto/pr54625-1_1.C: Likewise.
>>   * g++.dg/warn/pr83045.C: New test.
> OK.
> jeff

As an obvious follow-up, I have committed the usual fix to
gcc.target/arm/pr56184.C (added -Wno-return-type).

Christophe
diff --git a/gcc/testsuite/gcc.target/arm/pr56184.C 
b/gcc/testsuite/gcc.target/arm/pr56184.C
index fd278d3..8244222 100644
--- a/gcc/testsuite/gcc.target/arm/pr56184.C
+++ b/gcc/testsuite/gcc.target/arm/pr56184.C
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || arm_thumb2_ok } 
} } */
-/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors" } */
+/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" } */
 
 typedef unsigned int size_t;
 __extension__ typedef int __intptr_t;
2017-11-22  Christophe Lyon  

* gcc.target/arm/pr56184.C: Add -Wno-return-type to dg-options.



Re: [patch] Add support for #pragma GCC unroll

2017-11-22 Thread Sandra Loosemore

On 11/21/2017 03:18 AM, Eric Botcazou wrote:

First of all, the structuring in this section is screwed up.  The
discussion and examples for the previous item (#pragma ivdep) should be
moved inside the @table so that you don't have to introduce another
@table here, just insert another entry into the existing one.


That's also the case for the entire subsection just above, namely "Function
Specific Option Pragmas".  I presume the tables must be merged there too?


That would be a good thing generally, but you don't have to fix that as 
part of this patch (since you're not touching that section otherwise).


-Sandra


Re: [PATCH] Use __BYTE_ORDER__ predefined macro instead of runtime check

2017-11-22 Thread Jerry DeLisle
On 11/22/2017 04:34 AM, Janne Blomqvist wrote:
> By using the __BYTE_ORDER__ predefined macro we don't need the
> determine_endianness function anymore.
> 
> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
> 

Looks good, thanks for patch!

Jerry


[committed] PR 83104: Avoid two_valued_val_range_p for pointers

2017-11-22 Thread Marc Glisse

Hello,

with POINTER_DIFF_EXPR, checking if the result is integral does not imply 
that the arguments are as well, so this check needed a tweak.


Bootstrap+regtest on gcc112, pre-approved by Richard.

2017-11-22  Marc Glisse  

PR tree-optimization/83104
gcc/
* vr-values.c (simplify_stmt_using_ranges): Check integral argument,
not result.

gcc/testsuite/
* gcc.c-torture/compile/pr83104.c: New file.


--
Marc GlisseIndex: gcc/testsuite/gcc.c-torture/compile/pr83104.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr83104.c	(nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/pr83104.c	(revision 255068)
@@ -0,0 +1,5 @@
+int *a;
+int foo() {
+  if (a && a - (int *)0 > 0)
+return 0;
+}
Index: gcc/vr-values.c
===
--- gcc/vr-values.c	(revision 255067)
+++ gcc/vr-values.c	(revision 255068)
@@ -4084,21 +4084,21 @@ vr_values::simplify_stmt_using_ranges (g
 	 To:
 	 LHS = VAR == VAL1 ? (CST BINOP VAL1) : (CST BINOP VAL2)
 
 	 Also handles:
 	 LHS = VAR BINOP CST
 	 Where VAR is two-valued and LHS is used in GIMPLE_COND only
 	 To:
 	 LHS = VAR == VAL1 ? (VAL1 BINOP CST) : (VAL2 BINOP CST) */
 
   if (TREE_CODE_CLASS (rhs_code) == tcc_binary
-	  && INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+	  && INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
 	  && ((TREE_CODE (rhs1) == INTEGER_CST
 	   && TREE_CODE (rhs2) == SSA_NAME)
 	  || (TREE_CODE (rhs2) == INTEGER_CST
 		  && TREE_CODE (rhs1) == SSA_NAME))
 	  && single_imm_use (lhs, &use_p, &use_stmt)
 	  && gimple_code (use_stmt) == GIMPLE_COND)
 
 	{
 	  tree new_rhs1 = NULL_TREE;
 	  tree new_rhs2 = NULL_TREE;


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Kilian Verhetsel

Thank you both for your comments.

I have added the check to ensure the index vector won't cause an
overflow. I also added tests to the testsuite in order to check that the
loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
iterations. I was not able to write code that triggers
INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
(changing the types of last, min_v and a to something else causes
COND_REDUCTION to be used instead), so these tests are only compiled and
not executed.

I also moved an instruction that generates a vector of zeroes (used for
COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
should remove the unused vector that Alan noticed.

2017-11-21  Kilian Verhetsel 

gcc/ChangeLog:
PR testsuite/81179
* tree-vect-loop.c
(vect_create_epilog_for_reduction): Fix the returned value for
INTEGER_INDUC_COND_REDUCTION whose last match occurred at
index 0.
(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
pass the PHI statement that sets the induction variable to the
code generating the epilogue and check that no overflow will
occur.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
overflows when compiling conditional reductions.
* gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 254996)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
-  gimple *reduc_def_stmt,
+  gimple *reduc_def_stmt, gimple *induct_stmt,
   int ncopies, enum tree_code reduc_code,
   vec reduction_phis,
   bool double_reduc, 
@@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
  zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+  == INTEGER_INDUC_COND_REDUCTION)
 {
   tree indx_before_incr, indx_after_incr;
   int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d
   else
 new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+   || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   == INTEGER_INDUC_COND_REDUCTION)
   && reduc_code != ERROR_MARK)

Re: [SH][committed] Cleanup div and mul patterns

2017-11-22 Thread Tom de Vries

On 05/03/2016 08:50 AM, Oleg Endo wrote:

Hi,

The attached patch cleans up the SH div and mul patterns a bit.  Almost
no changes in CSiBE, except for a few minor improvements.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235803.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md (udivsi3, divsi3, mulsi3): Simplify.
(mulhisi3, umulhisi3, (smulsi3_highpart, umulsi3_highpart): Convert to
define_insn_and_split.
(mulsi3_i): New define_insn_and_split.
(mulsi3_call): Convert to define_insn.
(mulsidi3, mulsidi3_compact, umulsidi3, umulsidi3_compact):
Remove constraints.


sh_cleanup_mul_div.patch


diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index da1dfe9..2d9502b 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -2244,16 +2244,9 @@
  
  
  (define_expand "udivsi3"

-  [(set (match_dup 3) (symbol_ref:SI "__udivsi3"))
-   (set (reg:SI R4_REG) (match_operand:SI 1 "general_operand" ""))
-   (set (reg:SI R5_REG) (match_operand:SI 2 "general_operand" ""))
-   (parallel [(set (match_operand:SI 0 "register_operand" "")
-  (udiv:SI (reg:SI R4_REG)
-   (reg:SI R5_REG)))
- (clobber (reg:SI T_REG))
- (clobber (reg:SI PR_REG))
- (clobber (reg:SI R4_REG))
- (use (match_dup 3))])]
+  [(set (match_operand:SI 0 "register_operand")
+   (udiv:SI (match_operand:SI 1 "general_operand")
+(match_operand:SI 2 "general_operand")))]
""
  {
rtx last;
@@ -2379,18 +2372,9 @@
 (set_attr "needs_delay_slot" "yes")])
  
  (define_expand "divsi3"

-  [(set (match_dup 3) (symbol_ref:SI "__sdivsi3"))
-   (set (reg:SI R4_REG) (match_operand:SI 1 "general_operand" ""))
-   (set (reg:SI R5_REG) (match_operand:SI 2 "general_operand" ""))
-   (parallel [(set (match_operand:SI 0 "register_operand" "")
-  (div:SI (reg:SI R4_REG)
-  (reg:SI R5_REG)))
- (clobber (reg:SI T_REG))
- (clobber (reg:SI PR_REG))
- (clobber (reg:SI R1_REG))
- (clobber (reg:SI R2_REG))
- (clobber (reg:SI R3_REG))
- (use (match_dup 3))])]
+  [(set (match_operand:SI 0 "register_operand")
+   (div:SI (match_operand:SI 1 "general_operand")
+   (match_operand:SI 2 "general_operand")))]
""
  {
rtx last;


Hi,

this caused PR83111 - "[sh] stack smashing detected in gen_udivsi3"  ( 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83111 ).


Thanks,
- Tom


RE: [patch] remove cilk-plus

2017-11-22 Thread Joseph Myers
This patch version does not appear to address my comment that you're 
leaving behind comments in c-parser.c relating to Cilk array notations 
while removing the subsequent code.  (Or in one case actually reindenting 
the comment that is no longer relevant, rather than removing it.)

-- 
Joseph S. Myers
jos...@codesourcery.com


refine default dwarf settings on VxWorks

2017-11-22 Thread Olivier Hainque
Hello,

The Vxworks ports have specific default settings controlling the
dwarf output to accomodate the possible use of old non-gdb debuggers
provided with the VxWorks environment.

Those settings are hardcoded today, even though different versions
of VxWorks come with different constraints.

This patch just introduces per-vxworks-version variations on the
defaults we enforce.

We have been using a very minor variant of this patch in house for a couple
of years now. I verified a powerpc-wrs-vxworks and a powerpc-wrs-vxworksae
build works on mainline with the change, and that it produces the expected
effect.

Committing to mainline.

2017-11-22  Olivier Hainque  

* vxworks.c (vxworks_override_options): Pick default
dwarf version from macro value, VXWORKS_DWARF_VERSION_DEFAULT.
* vxworks.h: Define VXWORKS_DWARF_VERSION_DEFAULT and
DWARF_GNAT_ENCODINGS_DEFAULT.
* vxworksae.h: Likewise.



vxworks-dwarf.diff
Description: Binary data


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-22 Thread Jeff Law
On 11/22/2017 04:31 AM, Alan Hayward wrote:
> 
>> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>>>

 You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
 totally forgotten about it.  And in fact it seems to come pretty close
 to what you need…
>>>
>>> Yes, some of the code is similar to the way
>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>> CLOBBER expr code served as a starting point for writing the patch. The 
>>> main difference
>>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>>> specific Instruction,
>>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to an 
>>> expression (tls_desc).
>>> It meant there wasn’t really any opportunity to resume any existing code.
>> Understood.  Though your first patch mentions that you're trying to
>> describe partial preservation "around TLS calls".  Presumably those are
>> represented as normal insns, not call_insn.
>>
>> That brings me back to Richi's idea of exposing a set of the low subreg
>> to itself using whatever mode is wide enough to cover the neon part of
>> the register.
>>
>> That should tell the generic parts of the compiler that you're just
>> clobbering the upper part and at least in theory you can implement in
>> the aarch64 backend and the rest of the compiler should "just work"
>> because that's the existing semantics of a subreg store.
>>
>> The only worry would be if a pass tried to get overly smart and
>> considered that kind of set a nop -- but I think I'd argue that's simply
>> wrong given the semantics of a partial store.
>>
> 
> So, the instead of using clobber_high(reg X), to use set(reg X, reg X).
> It’s something we considered, and then dismissed.
> 
> The problem then is you are now using SET semantics on those registers, and it
> would make the register live around the function, which might not be the case.
> Whereas clobber semantics will just make the register dead - which is exactly
> what we want (but only conditionally).
?!?  A set of the subreg is the *exact* semantics you want.  It says the
low part is preserved while the upper part is clobbered across the TLS
insns.

jeff


Simplify ptr - 0

2017-11-22 Thread Marc Glisse

Hello,

I hadn't implemented this simplification because I think it is invalid 
code, but apparently people do write it, so we might as well handle it 
sensibly. This also happens to work around PR 83104 (already fixed).


bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2017-11-22  Marc Glisse  

* match.pd (ptr-0): New transformation.

--
Marc GlisseIndex: gcc/match.pd
===
--- gcc/match.pd	(revision 255052)
+++ gcc/match.pd	(working copy)
@@ -95,20 +95,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for op (plus pointer_plus minus bit_ior bit_xor)
   (simplify
 (op @0 integer_zerop)
 (non_lvalue @0)))
 
 /* 0 +p index -> (type)index */
 (simplify
  (pointer_plus integer_zerop @1)
  (non_lvalue (convert @1)))
 
+/* ptr - 0 -> (type)ptr */
+(simplify
+ (pointer_diff @0 integer_zerop)
+ (convert @0))
+
 /* See if ARG1 is zero and X + ARG1 reduces to X.
Likewise if the operands are reversed.  */
 (simplify
  (plus:c @0 real_zerop@1)
  (if (fold_real_zero_addition_p (type, @1, 0))
   (non_lvalue @0)))
 
 /* See if ARG1 is zero and X - ARG1 reduces to X.  */
 (simplify
  (minus @0 real_zerop@1)


Re: Simplify ptr - 0

2017-11-22 Thread Jakub Jelinek
On Wed, Nov 22, 2017 at 06:34:08PM +0100, Marc Glisse wrote:
> Hello,
> 
> I hadn't implemented this simplification because I think it is invalid code,
> but apparently people do write it, so we might as well handle it sensibly.
> This also happens to work around PR 83104 (already fixed).
> 
> bootstrap+regtest on powerpc64le-unknown-linux-gnu.

I guess we'll need to disable it for -fsanitize=pointer-subtraction
once/if it makes it in, because it should be still diagnosed as invalid.

> 2017-11-22  Marc Glisse  
> 
>   * match.pd (ptr-0): New transformation.
> 
> -- 
> Marc Glisse

Jakub


RFC/A: Stop cselib.c recording sets of the frame pointer

2017-11-22 Thread Richard Sandiford
One of the effects of:

   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02066.html

is that we now emit:

   (set (hard-frame-pointer) (stack-pointer))

instead of the previous non-canonical:

   (set (hard-frame-pointer) (plus (stack-pointer) (const_int 0)))

However, recent changes to aarch64_expand_prologue mean that we
can emit a frame record even if !frame_pointer_needed.  In that case,
the frame pointer is initialised normally, but all references generated
by GCC continue to use the stack pointer.

The combination of these two things triggered a regression in
gcc.c-torture/execute/960419-2.c.  The problem is that alias.c
fundemantally requires GCC to be consistent about which register
it uses:

 2. stack_pointer_rtx, frame_pointer_rtx, hard_frame_pointer_rtx
(if distinct from frame_pointer_rtx) and arg_pointer_rtx.
Each of these rtxes has a separate ADDRESS associated with it,
each with a negative id.

GCC is (and is required to be) precise in which register it
chooses to access a particular region of stack.  We can therefore
assume that accesses based on one of these rtxes do not alias
accesses based on another of these rtxes.

But in the test case cselib saw the move above and recorded the two
registers as being equivalent.  We therefore replaced some (but not all)
references to the stack pointer with references to the frame pointer.
MEMs that were still based on the stack pointer would then not alias
MEMs that became based on the frame pointer.

cselib.c has code to try to prevent this:

  for (; p; p = p->next)
{
  /* Return these right away to avoid returning stack pointer based
 expressions for frame pointer and vice versa, which is something
 that would confuse DSE.  See the comment in cselib_expand_value_rtx_1
 for more details.  */
  if (REG_P (p->loc)
  && (REGNO (p->loc) == STACK_POINTER_REGNUM
  || REGNO (p->loc) == FRAME_POINTER_REGNUM
  || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
  || REGNO (p->loc) == cfa_base_preserved_regno))
return p->loc;

which refers to:

  /* The only thing that we are not willing to do (this
 is requirement of dse and if others potential uses
 need this function we should add a parm to control
 it) is that we will not substitute the
 STACK_POINTER_REGNUM, FRAME_POINTER or the
 HARD_FRAME_POINTER.

 These expansions confuses the code that notices that
 stores into the frame go dead at the end of the
 function and that the frame is not effected by calls
 to subroutines.  If you allow the
 STACK_POINTER_REGNUM substitution, then dse will
 think that parameter pushing also goes dead which is
 wrong.  If you allow the FRAME_POINTER or the
 HARD_FRAME_POINTER then you lose the opportunity to
 make the frame assumptions.  */

But that doesn't work if the two registers are equal, and thus
in the same value chain.  It becomes pot luck which one is chosen.

I think it'd be better not to enter an equivalence for the set of
the frame pointer at all.  That should deal with the same correctness
concern as the code above, but I think we still want the existing checks
for optimisation reasons.  It seems better to check for pointer equality
instead of register number equality though, since we don't want to change
the behaviour when the frame pointer register is not being used as a frame
pointer.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Does it look like the right approach?  OK to install if so?

Thanks,
Richard


2017-11-22  Richard Sandiford  

gcc/
* cselib.c (expand_loc, cselib_expand_value_rtx_1): Change
justification for checking for the stack and hard frame pointers.
Check them by pointer rather than register number.
(cselib_record_set): Do nothing for sets of the frame pointer.

Index: gcc/cselib.c
===
*** gcc/cselib.c2017-11-21 17:57:59.245162437 +
--- gcc/cselib.c2017-11-22 17:43:08.894805286 +
*** expand_loc (struct elt_loc_list *p, stru
*** 1420,1433 
  
for (; p; p = p->next)
  {
!   /* Return these right away to avoid returning stack pointer based
!expressions for frame pointer and vice versa, which is something
!that would confuse DSE.  See the comment in cselib_expand_value_rtx_1
!for more details.  */
if (REG_P (p->loc)
! && (REGNO (p->loc) == STACK_POINTER_REGNUM
! || REGNO (p->loc) == FRAME_POINTER_REGNUM
! || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
  || REGNO (p->loc) == cfa_base_preserved_regno))
return p->loc;
  

  1   2   >