Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-04-29 Thread Jasmin J.

Hi!

I would really like to see this in GCC, because this is the base for the
next patch "Add multilib support for bare-metal ARM architectures".

BR
   Jasmin

*

On 04/27/2016 04:17 PM, Thomas Preud'homme wrote:

Ping?

Best regards,

Thomas

On Thursday 17 December 2015 17:32:48 Thomas Preud'homme wrote:

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Wednesday, December 16, 2015 7:59 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
Kyrylo Tkachov
Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
multilibs

Currently in config.gcc, only the first multilib in a multilib list is
checked for validity and the following elements are ignored, because in
shell the break exits the enclosing for loop, not just the case statement.
A loop is also done over the multilib list elements despite no combination
being legal. This patch reworks the code to address both issues.

ChangeLog entry is as follows:


2015-11-24  Thomas Preud'homme  

 * config.gcc: Error out when conflicting multilib is detected.  Do not
 loop over multilibs since no combination is legal.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 59aee2c..be3c720 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3772,38 +3772,40 @@ case "${target}" in

    # Add extra multilibs
    if test "x$with_multilib_list" != x; then
    arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
-   for arm_multilib in ${arm_multilibs}; do
-   case ${arm_multilib} in
-   aprofile)
+   case ${arm_multilibs} in
+   aprofile)
    # Note that arm/t-aprofile is a
    # stand-alone make file fragment to be
    # used only with itself.  We do not
    # specifically use the
    # TM_MULTILIB_OPTION framework because
    # this shorthand is more
-   # pragmatic. Additionally it is only
-   # designed to work without any
-   # with-cpu, with-arch with-mode
+   # pragmatic.
+   tmake_profile_file="arm/t-aprofile"
+   ;;
+   default)
+   ;;
+   *)
+   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+   exit 1
+   ;;
+   esac
+
+   if test "x${tmake_profile_file}" != x ; then
+   # arm/t-aprofile is only designed to work
+   # without any with-cpu, with-arch, with-mode,
    # with-fpu or with-float options.
-   if test "x$with_arch" != x \
-   || test "x$with_cpu" != x \
-   || test "x$with_float" != x \
-   || test "x$with_fpu" != x \
-   || test "x$with_mode" != x ; then
-   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" 1>&2
-   exit 1
-   fi
-   tmake_file="${tmake_file} arm/t-aprofile"
-   break
-   ;;
-   default)
-   ;;
-   *)
-   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-   exit 1
-   ;;
-   esac
-   done
+   if test "x$with_arch" != x \
+   || test "x$with_cpu" != x \
+   || test "x$with_float" != x \
+   || test "x$with_fpu" != x \
+   || test "x$with_mode" != x ; then
+   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}" 1>&2
+   exit 1
+

Re: [ping][patch] update handling of 'acc parallel loop' reductions for PR70626

2016-04-29 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 03:49:13PM -0700, Cesar Philippidis wrote:
> 2016-04-15  Cesar Philippidis  
> 
>   gcc/c-family/
>   PR middle-end/70626
>   * c-common.h (c_oacc_split_loop_clauses): Add boolean argument.
>   * c-omp.c (c_oacc_split_loop_clauses): Use it to duplicate
>   reduction clauses in acc parallel loops.
> 
>   gcc/c/
>   PR middle-end/70626
>   * c-parser.c (c_parser_oacc_loop): Don't augment mask with
>   OACC_LOOP_CLAUSE_MASK.
>   (c_parser_oacc_kernels_parallel): Update call to
>   c_oacc_split_loop_clauses.
> 
>   gcc/cp/
>   PR middle-end/70626
>   * parser.c (cp_parser_oacc_loop): Don't augment mask with
>   OACC_LOOP_CLAUSE_MASK.
>   (cp_parser_oacc_kernels_parallel): Update call to
>   c_oacc_split_loop_clauses.
> 
>   gcc/fortran/
>   PR middle-end/70626
>   * trans-openmp.c (gfc_trans_oacc_combined_directive): Duplicate
>   the reduction clause in both parallel and loop directives.
> 
>   gcc/testsuite/
>   PR middle-end/70626
>   * c-c++-common/goacc/combined-reduction.c: New test.
>   * gfortran.dg/goacc/reduction-2.f95: Add check for kernels reductions.
> 
>   libgomp/
>   PR middle-end/70626
>   * testsuite/libgomp.oacc-c++/template-reduction.C: Adjust test.
>   * testsuite/libgomp.oacc-c-c++-common/combined-reduction.c: New test.
>   * testsuite/libgomp.oacc-fortran/combined-reduction.f90: New test.

LGTM.

Jakub


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-29 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 08:29:56PM +0200, Dhole wrote:
> There is the Wdate-time flag, that warns on using __DATE__, __TIME__ and
> __TIMESTAMP__.  Although that alone will not make the compilation fail
> unless it's used with Werror.
> 
> The reason behind using fatal_error (rather than a warning) when
> SOURCE_DATE_EPOCH contains an invalid value is due to the
> SOURCE_DATE_EPOCH specification [1]:
> 
>   SOURCE_DATE_EPOCH
>   (...)
>   If the value is malformed, the build process SHOULD exit with a non-zero 
> error code.

First of all, using error instead of fatal_error achieves just that too,
except that it allows also detecting other errors in the source.
fatal_error is meant for cases where there is no way to go forward with the
compilation, while here exists a perfectly reasonable way to go forward (assume
the env var is not set, use a hardcoded timestamp, ...).
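
To illustrate the distinction drawn above, here is a minimal sketch in plain C. It is not GCC's actual code: parse_epoch and epoch_or_fallback are hypothetical names, and the error counter merely stands in for what a recoverable diagnostic would do (record the failure so the exit status ends up non-zero, then carry on with a fallback value).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption, not part of GCC): parse a
   SOURCE_DATE_EPOCH-style decimal string.  Returns true and stores the
   value on success; returns false for empty, non-numeric, negative or
   out-of-range input.  */
static bool
parse_epoch (const char *s, long long *out)
{
  char *end;
  long long v;

  if (s == NULL || *s == '\0')
    return false;
  errno = 0;
  v = strtoll (s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0)
    return false;
  *out = v;
  return true;
}

/* Recoverable handling: an unset variable is not an error; a malformed
   one bumps the error count but still yields a usable fallback, so the
   rest of the input can be processed and further errors reported.  */
static long long
epoch_or_fallback (const char *s, long long fallback, int *error_count)
{
  long long v;

  if (s == NULL)
    return fallback;
  if (parse_epoch (s, &v))
    return v;
  ++*error_count;
  return fallback;
}
```

A caller would pass getenv ("SOURCE_DATE_EPOCH") as the first argument and exit with a non-zero status at the end if the error count is positive.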

> And the reason for reading and parsing the env var in gcc/ rather than
> when the macro is expanded for the first time (in libcpp/) is from a
> comment by Joseph Myers made the first time I submitted this patch [2].
> The cleanest way to read the env var from gcc/ I found was to do it
> during the initialization.  But if you think this should be done
> differently I'm open to changing the implementation.

Doing this on the gcc/ side is of course reasonable, but can be done through
callbacks, libcpp already has lots of other callbacks into the gcc/ code,
look for e.g. cpp_get_callbacks in gcc/c-family/* and in libcpp/ for
corresponding code.

> Bernd: I'll see if I can prepare a testcase; first I need to get
> familiar with the testing framework and learn how to set environment
> variables in tests.  Any tips on that will be really welcome!

grep for dg-set-target-env-var in various tests.

Jakub


Re: [PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures

2016-04-29 Thread Jasmin J.

Hi!

Ping!

Attached is a rebased version of my patch with a small change, which converts
some spaces to TABs.

> Please note, that the patch
>"[PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs"
>from 12/16/2015 12:58 PM
> needs to be applied before my new version of this patch.
This is still a requirement.

BR
   Jasmin

***

On 03/04/2016 01:19 AM, Jasmin J. wrote:

Hi all!


As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to let control to the user
as to what multilib should be built.

As Ramana asked in his answer to my first version of the patch: Why?
The GCC mechanism to forward this to the t-* makefile is "TM_MULTILIB_CONFIG"
(as far as I have understood it). It is not necessary to introduce a new
variable to configure and Makefile.

Ramana mentioned also:

... as well as comments up top to explain what multilibs are being
built .


Additionally the error message "You cannot use any of ..." didn't print the
right text in any case.

Attached is an improved version of this patch:
- it uses TM_MULTILIB_CONFIG
- fixed the error message "You cannot use any of ..."
- made the error message "Error:  not supported." more clear
- added a FSF copyright header to t-baremetal file and described what is
   built there
- commented out armv8-m.base and armv8-m.main, because this is currently not
   available in GCC mainline and gcc 5.3.0 release, but will be added soon
   (I guess)

Ramana mentioned in another message a test of the new options:
- I did test it with "test_arm_none_eabi.sh"; procedure taken from this
   message: https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00659.html
- The result is in "test_result.txt".
(both files attached also)

My copyright assignment number: 1059920

Please note, that the patch
   "[PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs"
   from 12/16/2015 12:58 PM
needs to be applied before my new version of this patch.

BR
Jasmin

**

On 12/16/2015 01:04 PM, Thomas Preud'homme wrote:

Hi Ramana,

As suggested in your initial answer to this thread, we updated the multilib
patch provided in ARM's embedded branch to be up-to-date with regards to
supported CPUs in GCC. As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to let control to the user
as to what multilib should be built. To this effect, it takes a list of
architectures at configure time and that list needs to be passed down to
the t-baremetal Makefile to set the multilib variables appropriately.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-12-15  Thomas Preud'homme  

 * Makefile.in (with_multilib_list): New variables substituted by
 configure.
 * config.gcc: Handle bare-metal multilibs in --with-multilib-list
 option.
 * config/arm/t-baremetal: New file.
 * configure.ac (with_multilib_list): New AC_SUBST.
 * configure: Regenerate.
 * doc/install.texi (--with-multilib-list): Update description for
 arm*-*-* targets to mention bare-metal multilibs.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be0662d3ac9ffbc94e919d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt
  lang_specs_files=@lang_specs_files@
  lang_tree_files=@lang_tree_files@
  target_cpu_default=@target_cpu_default@
+with_multilib_list=@with_multilib_list@
  OBJC_BOEHM_GC=@objc_boehm_gc@
  extra_modes_file=@extra_modes_file@
  extra_opt_files=@extra_opt_files@
diff --git a/gcc/config.gcc b/gcc/config.gcc
index af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb10590db1ec86d48e9fcdbc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3787,15 +3787,25 @@ case "${target}" in
    default)
    ;;
    *)
-   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-   exit 1
+   for arm_multilib in ${arm_multilibs}; do
+   case ${arm_multilib} in
+   armv6-m | armv7-m | armv7e-m | armv7-r | armv8-m.base | armv8-m.main)
+   tmake_profile_file="arm/t-baremetal"
+   ;;
+   *)
+   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+   exit 1
+   ;;
+   esac
+   done
 

Re: [patch] cleanup *finish_omp_clauses

2016-04-29 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 10:42:49AM -0700, Cesar Philippidis wrote:
> > That said, the above names are just weird, it is non-obvious
> > what they mean at all.  What is C_ORT_NONE for?  We surely don't
> > have any clauses that aren't OpenMP, nor Cilk+, nor OpenACC
> > (ok, maybe the simd attribute, but donno if it ever calls the
> > *finish_omp_clauses functions).
> 
> *parser_cilk_for was just passing is_omp/allow_fields = false.

Sure, because it is Cilk+, not OpenMP.

> 

> @@ -17597,7 +17597,7 @@ c_parser_cilk_for (c_parser *parser, tree grain, bool 
> *if_p)
>tree clauses = build_omp_clause (EXPR_LOCATION (grain), 
> OMP_CLAUSE_SCHEDULE);
>OMP_CLAUSE_SCHEDULE_KIND (clauses) = OMP_CLAUSE_SCHEDULE_CILKFOR;
>OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (clauses) = grain;
> -  clauses = c_finish_omp_clauses (clauses, false);
> +  clauses = c_finish_omp_clauses (clauses, 0);
>  
>tree block = c_begin_compound_stmt (true);
>tree sb = push_stmt_list ();

The above is wrong, it should have been C_ORT_CILK.  It will not change
anything on the behavior of c_finish_omp_clauses - _Cilk_for only has
OMP_CLAUSE_SCHEDULE, is_cilk is right now tested only on OMP_CLAUSE_LINEAR
- but it is desirable for consistency and clarity.

> @@ -17663,7 +17663,7 @@ c_parser_cilk_for (c_parser *parser, tree grain, bool 
> *if_p)
>OMP_CLAUSE_OPERAND (c, 0)
>   = cilk_for_number_of_iterations (omp_for);
>OMP_CLAUSE_CHAIN (c) = clauses;
> -  OMP_PARALLEL_CLAUSES (omp_par) = c_finish_omp_clauses (c, true);
> +  OMP_PARALLEL_CLAUSES (omp_par) = c_finish_omp_clauses (c, C_ORT_OMP);
>add_stmt (omp_par);

This is wrong too, it should be C_ORT_CILK.  Again, it shouldn't change
anything, the clauses in that case are OMP_CLAUSE_FIRSTPRIVATE,
OMP_CLAUSE_PRIVATE and OMP_CLAUSE__CILK_FOR_COUNT_, the latter unique to
_Cilk_for, the former not, but with simple decls in them and nothing should
depend on that for c_finish_omp_clauses.

> -extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false);
> +extern tree c_finish_omp_clauses (tree, unsigned int);

I think it would be better to assign an enum value also for the
C_ORT_OMP | C_ORT_DECLARE_SIMD (C_ORT_OMP_DECLARE_SIMD), and just
use the enum type instead of unsigned int as the type, both in the proto
and in c_finish_omp_clauses definition.
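
A rough sketch of what such an enum could look like, in plain C. The enumerator names C_ORT_OMP, C_ORT_CILK, C_ORT_DECLARE_SIMD and C_ORT_OMP_DECLARE_SIMD come from this thread; the enum tag, the bit values and the small predicate helpers are assumptions made only for illustration.

```c
/* Assumed tag name and bit assignments; only the enumerator names are
   taken from the discussion.  */
enum c_omp_region_type
{
  C_ORT_OMP = 1 << 0,
  C_ORT_CILK = 1 << 1,
  C_ORT_DECLARE_SIMD = 1 << 2,
  /* Named combination, so callers never need to pass a raw OR of flags.  */
  C_ORT_OMP_DECLARE_SIMD = C_ORT_OMP | C_ORT_DECLARE_SIMD
};

/* Hypothetical predicates showing the "ort & ..." tests that would
   replace the former bool parameters.  */
static int
ort_is_omp (enum c_omp_region_type ort)
{
  return (ort & C_ORT_OMP) != 0;
}

static int
ort_is_declare_simd (enum c_omp_region_type ort)
{
  return (ort & C_ORT_DECLARE_SIMD) != 0;
}

static int
ort_is_cilk (enum c_omp_region_type ort)
{
  return (ort & C_ORT_CILK) != 0;
}
```

With a named combined value, a call site can pass C_ORT_OMP_DECLARE_SIMD and remain of the enum type, which is the point of using the enum rather than unsigned int in the prototype.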

> @@ -12496,8 +12496,7 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
> Remove any elements from the list that are invalid.  */
>  
>  tree
> -c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd,
> -   bool is_cilk)
> +c_finish_omp_clauses (tree clauses, unsigned int ort)
>  {
>bitmap_head generic_head, firstprivate_head, lastprivate_head;
>bitmap_head aligned_head, map_head, map_field_head;
> @@ -12509,6 +12508,9 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool 
> declare_simd,
>tree *nowait_clause = NULL;
>bool ordered_seen = false;
>tree schedule_clause = NULL_TREE;
> +  bool is_omp = ort & C_ORT_OMP;
> +  bool declare_simd = ort & C_ORT_DECLARE_SIMD;
> +  bool is_cilk = ort & C_ORT_CILK;

I think I'd prefer replacing those flags with the ort & ... tests
in all places where they are used.

> -extern tree finish_omp_clauses   (tree, bool, bool = 
> false,
> -  bool = false);
> +extern tree finish_omp_clauses   (tree, unsigned int);

See above.

> @@ -32580,9 +32580,9 @@ cp_parser_omp_all_clauses (cp_parser *parser, 
> omp_clause_mask mask,
>if (finish_p)
>  {
>if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)) != 0)
> - return finish_omp_clauses (clauses, false, true);
> + return finish_omp_clauses (clauses, C_ORT_DECLARE_SIMD);

This should have been C_ORT_OMP | C_ORT_DECLARE_SIMD or better yet
C_ORT_OMP_DECLARE_SIMD, see above.

> @@ -37771,7 +37771,7 @@ cp_parser_cilk_for (cp_parser *parser, tree grain, 
> bool *if_p)
>tree clauses = build_omp_clause (EXPR_LOCATION (grain), 
> OMP_CLAUSE_SCHEDULE);
>OMP_CLAUSE_SCHEDULE_KIND (clauses) = OMP_CLAUSE_SCHEDULE_CILKFOR;
>OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (clauses) = grain;
> -  clauses = finish_omp_clauses (clauses, false);
> +  clauses = finish_omp_clauses (clauses, 0);

See above.

> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -9585,7 +9585,7 @@ tsubst_attribute (tree t, tree *decl_p, tree args,
>clauses = tsubst_omp_clauses (clauses, true, false, args,
>   complain, in_decl);
>c_omp_declare_simd_clauses_to_decls (*decl_p, clauses);
> -  clauses = finish_omp_clauses (clauses, false, true);
> +  clauses = finish_omp_clauses (clauses, C_ORT_DECLARE_SIMD);
>tree parms = DECL_ARGUMENTS (*decl_p);
>clauses
>   = c_omp_declare_simd_clauses_to_numbers (parms, clauses);
> @@ -14749,7 +14749,8 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, 
> bool allow_fields,
>new_clauses =

Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-29 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 01:45:36PM +0300, Ilya Enkovich wrote:
> > Where does the 2 come from?  Is it that the STV pass right now supports only
> > 2 * wordsize modes?  Also, I don't think we should treat equally constants
> > that fit into the 32-bit immediates and constants that don't, the latter,
> > when movabsq needs to be used, are more costly.
> 
> This variant is for DImode going to split into two SImode.  TImode chains
> have own cost model.

Ok.  Still, we should make sure the cost of movabsq is significantly higher
than cost of other immediates (not only because the other immediates can be
used directly in the instructions, while for movabsq you need to first use
that insn to initialize some reg and then use that reg in other insn, but
also because of the movabsq latency).
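
As a sketch of the distinction, in plain C rather than the i386 backend's own code: the helper names are hypothetical and the cost weights are placeholders, not values from any real cost model. The only firm part is the range test for a sign-extended 32-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V can be encoded as a sign-extended 32-bit immediate, i.e.
   it can be used directly as an operand without a separate movabsq.  */
static bool
fits_simm32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical cost function with placeholder weights: constants that
   need movabsq are charged noticeably more than plain immediates.  */
static int
immediate_cost (int64_t v)
{
  return fits_simm32 (v) ? 1 : 4;
}
```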

Jakub


Re: check-target-libgomp wall time, without vs. with offloading

2016-04-29 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 12:43:43PM +0200, Thomas Schwinge wrote:
> commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
> Author: Thomas Schwinge 
> Date:   Thu Apr 21 11:36:39 2016 +0200
> 
> libgomp: Unconfuse offload plugins vs. offload targets

I don't like this patch at all, rather than unconfusing stuff it
makes stuff confusing.  Plugins are just a way to support various
offloading targets.

Can you please post just a short patch without all those changes
that does what you want, rather than renaming everything at the same time?

Jakub


Re: [PATCH] Update gmp/mpfr/mpc in-tree versions

2016-04-29 Thread Richard Biener
On Thu, 28 Apr 2016, Bernd Edlinger wrote:

> On 28.04.2016 16:29, Richard Biener wrote:
> >
> > Another option would be to try if mini-gmp is enough for our
> > (in-tree) use and what the performance impact would be if we'd
> > use that (in-tree).
> >
> 
> Yes, we would certainly never need more than that subset.
> 
> But I don't see how mpfr can be built with mini-gmp.
> I tried to and failed early in mpfr/configure.
> Any ideas?

No idea - it of course breaks down if mpfr cannot work with mini-gmp.

Richard.


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Eric Botcazou
> While looking at the insn enable condition, I noticed that we don't
> use "probe_stack" pattern any more, as the stack check loop is now
> implemented in a different way.

Yes, we do, probe_stack is a standard pattern called by the middle-end.

> 2016-04-28  Uros Bizjak  
> 
> * config/i386/i386.md (peephole2s for operations with memory inputs):
> Use SWI mode iterator.
> (peephole2s for operations with memory outputs): Ditto.
> Do not check for stack checking probe.
> 
> (probe_stack): Remove expander.
> 
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

How did you test it exactly?

=== acats tests ===
FAIL:   c52103x
FAIL:   c52104x

-- 
Eric Botcazou


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 9:47 AM, Eric Botcazou  wrote:
>> While looking at the insn enable condition, I noticed that we don't
>> use "probe_stack" pattern any more, as the stack check loop is now
>> implemented in a different way.
>
> Yes, we do, probe_stack is a standard pattern called by the middle-end.
>
>> 2016-04-28  Uros Bizjak  
>>
>> * config/i386/i386.md (peephole2s for operations with memory inputs):
>> Use SWI mode iterator.
>> (peephole2s for operations with memory outputs): Ditto.
>> Do not check for stack checking probe.
>>
>> (probe_stack): Remove expander.
>>
>> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> How did you test it exactly?
>
> === acats tests ===
> FAIL:   c52103x
> FAIL:   c52104x

Apparently without ada...

We can put it back, but perhaps implemented as unspec, so it won't
interfere with peepholes?

Uros.


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Eric Botcazou
> We can put it back, but perhaps implemented as unspec, so it won't
> interfere with peepholes?

No strong opinion, as long as the final assembly is the same as before.

-- 
Eric Botcazou


Re: [RFA][PATCH] Adding missing calls to bitmap_clear

2016-04-29 Thread Richard Biener
On Thu, Apr 28, 2016 at 9:35 PM, Jeff Law  wrote:
> On 03/22/2016 03:37 AM, Richard Biener wrote:
>>
>> On Mon, Mar 21, 2016 at 9:32 PM, Jeff Law  wrote:
>>>
>>> On 03/21/2016 01:10 PM, Bernd Schmidt wrote:


>>>> On 03/21/2016 08:06 PM, Jeff Law wrote:
>>>>>
>>>>>
>>>>>
>>>>> As noted last week, find_removable_extensions initializes several
>>>>> bitmaps, but doesn't clear them.
>>>>>
>>>>> This is not strictly a leak as the GC system should find dead data, but
>>>>> it's better to go ahead and clear the bitmaps.  That releases the
>>>>> elements back to the cache and presumably makes things easier for the
>>>>> GC system as well.
>>>>>
>>>>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>>>>
>>>>> OK for the trunk?
>>>>
>>>>
>>>>
>>>> Looks like they don't leak anywhere, so ok. Probably ok even to install
>>>> it now but maybe stage1 would be better timing.
>>>
>>>
>>> I don't mind waiting for the next stage1, this is a pretty minor issue.
>>
>>
>> It's ok at this stage as it will also fix -fmem-report.  Please also move
>> the thing back to heap, see below.
>>
>> Btw we should disallow bitmap_initialize (&x, NULL) as it does not do
>> the same thing as BITMAP_ALLOC (NULL), it does the same thing
>> as BITMAP_ALLOC_GC ().  Thus I'd rather have a bitmap_initialize_gc (&x)
>> and a bitmap_initialize (&x, NULL) that ends up using the global
>> bitmap obstack.  No idea where REE came from history wise.
>>
>> A grep shows only
>>
>> ira.c:  bitmap_initialize (&seen_insns, NULL);
>> ree.c:  bitmap_initialize (&init, NULL);
>> ree.c:  bitmap_initialize (&kill, NULL);
>> ree.c:  bitmap_initialize (&gen, NULL);
>> ree.c:  bitmap_initialize (&tmp, NULL);
>
> It's more than that.  Sadly folks have passed in "0" instead of NULL in
> various places.
>
> ./haifa-sched.c:  bitmap_initialize (&processed, 0);
> ./haifa-sched.c:  bitmap_initialize (&processed, 0);
> ./haifa-sched.c:  bitmap_initialize (&in_ready, 0);
> ./sched-ebb.c:  bitmap_initialize (&dont_calc_deps, 0);
> ./sched-rgn.c:  bitmap_initialize (&not_in_df, 0);
> ./testsuite/gcc.dg/pr45352.c:  bitmap_initialize_stat (0);
> ./ira.c:  bitmap_initialize (&interesting, 0);
> ./ira.c:  bitmap_initialize (&live, 0);
> ./ira.c:  bitmap_initialize (&used, 0);
> ./ira.c:  bitmap_initialize (&set, 0);
> ./ira.c:  bitmap_initialize (&unusable_as_input, 0);
> ./ira.c:  bitmap_initialize (local, 0);
> ./ira.c:  bitmap_initialize (transp, 0);
> ./ira.c:  bitmap_initialize (moveable, 0);
> ./ira.c:  bitmap_initialize (&need_new, 0);
> ./ira.c:  bitmap_initialize (&reachable, 0);
> ./sel-sched.c:  bitmap_initialize (forced_ebb_heads, 0);
> ./sched-deps.c:   bitmap_initialize (&true_dependency_cache[i], 0);
> ./sched-deps.c:   bitmap_initialize (&output_dependency_cache[i], 0);
> ./sched-deps.c:   bitmap_initialize (&anti_dependency_cache[i], 0);
> ./sched-deps.c:   bitmap_initialize (&control_dependency_cache[i], 0);
> ./sched-deps.c:bitmap_initialize (&spec_dependency_cache[i], 0);
>
>>
>> btw, so please consider simply changing bitmap_initialize behavior.  The
>> IRA
>> use also should use the global bitmap obstack as users around that use
>> use BITMAP_ALLOC (NULL).  [use a default arg for 'obstack' if possible,
>> you have to verify it works with/without
>> --enable-gather-detailed-mem-stats]
>
> The problem is ensuring that allocating off the default bitmap obstack is
> appropriate for all those uses.

True, if the bitmap head lives in a GC structure then that's not safe.

> I'm tempted to change them all to NULL.  Then iterate one by one on to
> ensure we're routing to gc vs the default bitmap obstack as appropriate and
> that we're calling bitmap_clear as appropriate.
>
> Once we've fixed all of 'em, we simply assert that bitmap_initialize is
> never passed NULL and avoid getting in this situation again in the future.

First one sounds good.  I'd still add a bitmap_gc_initialize (&head) and change
bitmap_initialize (&head, NULL) behavior to match that of BITMAP_ALLOC (NULL).

Richard.

> Thoughts?
> jeff
>
>


fix libsanitizer build on ppc-linux

2016-04-29 Thread Olivier Hainque
Hello,

Attempts to bootstrap on our powerpc-linux hosts fail
on libsanitizer with symptoms like:


  In file included from ../../../../src/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:29:0:
  /usr/include/asm/posix_types.h:72:51: error: '__kernel_fd_set' has not been declared
  static __inline__ void __FD_SET(unsigned long fd, __kernel_fd_set *fdsetp)

  /usr/include/asm/posix_types.h: In function 'void __FD_SET(long unsigned int, int*)':
  /usr/include/asm/posix_types.h:74:28: error: '__NFDBITS' was not declared in this scope
  unsigned long _tmp = fd / __NFDBITS;
  ...

The attached patch fixes this, and bootstraps and regtests fine on x86_64-linux.

OK to commit ?

Thanks in advance,

With Kind Regards,

Olivier


2016-04-29  Olivier Hainque  

libsanitizer/
* sanitizer_common/sanitizer_platform_limits_linux.cc:
#include  instead of asm/posix_types.h.




libsan-linux.diff
Description: Binary data




Re: Move "X +- C1 CMP C2 to X CMP C2 -+ C1" to match.pd

2016-04-29 Thread Richard Biener
On Thu, Apr 28, 2016 at 10:15 PM, Marc Glisse  wrote:
> On Wed, 27 Apr 2016, Richard Biener wrote:
>
>>> --- trunk4/gcc/fold-const.h (revision 235452)
>>> +++ trunk4/gcc/fold-const.h (working copy)
>>> @@ -13,20 +13,22 @@ WARRANTY; without even the implied warra
>>>  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>  for more details.
>>>
>>>  You should have received a copy of the GNU General Public License
>>>  along with GCC; see the file COPYING3.  If not see
>>>  .  */
>>>
>>>  #ifndef GCC_FOLD_CONST_H
>>>  #define GCC_FOLD_CONST_H
>>>
>>> +#include 
>>> +
>>
>>
>> I think the canonical way is to include options.h where you include
>> fold-const.h ...
>> (ick)
>>
>> Doesn't the prototype serve as a forward declaration only and thus
>> including
>> options.h from gimple-match-head.c is enough?
>
>
> Doesn't look like it. If I remove this include, I get build failures for
> a large part of the C front-end (through c-family/c-common.h) and
> tree-ssa-scopedtables.c. Including options.h in those 2 files seems to
> work (I didn't check if all the files in config/ that include
> fold-const.h also indirectly include options.h). If you really think
> that's better, I'll do it...

Another option is to move the enum declaration from flag-types.h to
coretypes.h.  I think I like that best.

Richard.

> --
> Marc Glisse


Re: [PATCH] Fix type field walking in gimplifier unsharing

2016-04-29 Thread Eric Botcazou
> The following works (for the testcase):
> 
> Index: gcc/cp/decl.c
> ===
> --- gcc/cp/decl.c   (revision 235547)
> +++ gcc/cp/decl.c   (working copy)
> @@ -10393,8 +10393,11 @@ grokdeclarator (const cp_declarator *dec
>   && (decl_context == NORMAL || decl_context == FIELD)
>   && at_function_scope_p ()
>   && variably_modified_type_p (type, NULL_TREE))
> -   /* Force evaluation of the SAVE_EXPR.  */
> -   finish_expr_stmt (TYPE_SIZE (type));
> +   {
> + TYPE_NAME (type) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> +NULL_TREE, type);
> + add_decl_expr (TYPE_NAME (type));
> +   }
> 
>   if (declarator->kind == cdk_reference)
> {
> 
> and I have a similar fix for the Fortran FE for one testcase I
> reduced to
> 
>   character(10), dimension (2) :: implicit_result
>   character(10), dimension (2) :: source
>   implicit_result = reallocate_hnv (LEN (source))
> contains
>   FUNCTION reallocate_hnv(LEN)
> CHARACTER(LEN=LEN), DIMENSION(:), POINTER :: reallocate_hnv
>   END FUNCTION reallocate_hnv
> end
> 
> Index: fortran/trans-array.c
> ===
> --- fortran/trans-array.c   (revision 235547)
> +++ fortran/trans-array.c   (working copy)
> @@ -1094,6 +1094,16 @@ gfc_trans_create_temp_array (stmtblock_t
>info->descriptor = desc;
>size = gfc_index_one_node;
> 
> +  /* Emit a DECL_EXPR for the variable sized array type in
> + GFC_TYPE_ARRAY_DATAPTR_TYPE so the gimplification of its type
> + sizes works correctly.  */
> +  tree arraytype = TREE_TYPE (GFC_TYPE_ARRAY_DATAPTR_TYPE (type));
> +  if (! TYPE_NAME (arraytype))
> +TYPE_NAME (arraytype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> +   NULL_TREE, arraytype);
> +  gfc_add_expr_to_block (pre, build1 (DECL_EXPR,
> + arraytype, TYPE_NAME (arraytype)));
> +
>/* Fill in the array dtype.  */
>tmp = gfc_conv_descriptor_dtype (desc);
>gfc_add_modify (pre, tmp, gfc_get_dtype (TREE_TYPE (desc)));

Great.  We do exactly that in the Ada compiler (but of course the number of 
places where we need to do it is an order of magnitude larger).

> I wonder if we can avoid allocating the TYPE_DECL by simply also
> allowing TREE_TYPE as operand of a DECL_EXPR (to avoid adding
> a 'TYPE_EXPR').

I agree that DECL_EXPR + TYPE_DECL is a bit heavy, but I'm not sure that the 
benefit would be worth introducing the irregularity in the IL.

-- 
Eric Botcazou


Re: [PATCH] Fix type field walking in gimplifier unsharing

2016-04-29 Thread Richard Biener
On Fri, 29 Apr 2016, Eric Botcazou wrote:

> > The following works (for the testcase):
> > 
> > Index: gcc/cp/decl.c
> > ===
> > --- gcc/cp/decl.c   (revision 235547)
> > +++ gcc/cp/decl.c   (working copy)
> > @@ -10393,8 +10393,11 @@ grokdeclarator (const cp_declarator *dec
> >   && (decl_context == NORMAL || decl_context == FIELD)
> >   && at_function_scope_p ()
> >   && variably_modified_type_p (type, NULL_TREE))
> > -   /* Force evaluation of the SAVE_EXPR.  */
> > -   finish_expr_stmt (TYPE_SIZE (type));
> > +   {
> > + TYPE_NAME (type) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> > +NULL_TREE, type);
> > + add_decl_expr (TYPE_NAME (type));
> > +   }
> > 
> >   if (declarator->kind == cdk_reference)
> > {
> > 
> > and I have a similar fix for the Fortran FE for one testcase I
> > reduced to
> > 
> >   character(10), dimension (2) :: implicit_result
> >   character(10), dimension (2) :: source
> >   implicit_result = reallocate_hnv (LEN (source))
> > contains
> >   FUNCTION reallocate_hnv(LEN)
> > CHARACTER(LEN=LEN), DIMENSION(:), POINTER :: reallocate_hnv
> >   END FUNCTION reallocate_hnv
> > end
> > 
> > Index: fortran/trans-array.c
> > ===
> > --- fortran/trans-array.c   (revision 235547)
> > +++ fortran/trans-array.c   (working copy)
> > @@ -1094,6 +1094,16 @@ gfc_trans_create_temp_array (stmtblock_t
> >info->descriptor = desc;
> >size = gfc_index_one_node;
> > 
> > +  /* Emit a DECL_EXPR for the variable sized array type in
> > + GFC_TYPE_ARRAY_DATAPTR_TYPE so the gimplification of its type
> > + sizes works correctly.  */
> > +  tree arraytype = TREE_TYPE (GFC_TYPE_ARRAY_DATAPTR_TYPE (type));
> > +  if (! TYPE_NAME (arraytype))
> > +TYPE_NAME (arraytype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> > +   NULL_TREE, arraytype);
> > +  gfc_add_expr_to_block (pre, build1 (DECL_EXPR,
> > + arraytype, TYPE_NAME (arraytype)));
> > +
> >/* Fill in the array dtype.  */
> >tmp = gfc_conv_descriptor_dtype (desc);
> >gfc_add_modify (pre, tmp, gfc_get_dtype (TREE_TYPE (desc)));
> 
> Great.  We do exactly that in the Ada compiler (but of course the number of 
> places where we need to do it is an order of magnitude larger).
> 
> > I wonder if we can avoid allocating the TYPE_DECL by simply also
> > allowing TREE_TYPE as operand of a DECL_EXPR (to avoid adding
> > a 'TYPE_EXPR').
> 
> I agree that DECL_EXPR + TYPE_DECL is a bit heavy, but I'm not sure that the 
> benefit would be worth introducing the irregularity in the IL.

Not sure either.  I'll add a helper like build_decl_expr_for_type
that does the magic (if TYPE_NAME is NULL).  That at least reduces
the amount of code duplication.

Richard.


[PATCH] Fix PR13962 somewhat

2016-04-29 Thread Richard Biener

The PR asks that we optimize pointer comparisons using PTA information.
This patch implements the bits that are possible without adjusting
PTA to be more precise about things like points-to-null or 
points-to-string (or points-to-label/function).

This also fixes PR65686 where it avoids a bogus uninit warning
by simplifying the compare in

mytype f(struct S *e)
{
  mytype x;
  if(&x != e->pu)

where obviously the pointer e->pu in global memory cannot point to x.
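
For reference, a self-contained variant of that fragment could look like
the sketch below.  The typedef for mytype and the layout of struct S are
assumptions made for illustration; this is not the committed
gcc.dg/uninit-pr65686.c testcase.

```c
#include <assert.h>

typedef int mytype;            /* assumed; the real testcase may differ */

struct S
{
  mytype *pu;                  /* pointer that lives in caller-owned memory */
};

mytype
f (struct S *e)
{
  mytype x;

  assert (e);
  /* The address of x is only formed here and is never stored anywhere,
     so e->pu cannot point to x.  The compare is therefore always true
     and x is always initialized before the return.  */
  if (&x != e->pu)
    x = 1;
  return x;
}
```

Without the simplification the compiler has to assume the branch may be
skipped, which is where the bogus uninit warning comes from.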

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-04-29  Richard Biener  

PR tree-optimization/13962
PR tree-optimization/65686
* tree-ssa-alias.h (ptrs_compare_unequal): Declare.
* tree-ssa-alias.c (ptrs_compare_unequal): New function
using PTA to compare pointers.
* match.pd: Add pattern for pointer equality compare simplification
using ptrs_compare_unequal.

* gcc.dg/uninit-pr65686.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c.orig   2016-04-28 11:56:26.864468581 +0200
--- gcc/tree-ssa-alias.c2016-04-28 15:25:00.596095600 +0200
*** ptr_deref_may_alias_ref_p_1 (tree ptr, a
*** 321,326 
--- 321,386 
return true;
  }
  
+ /* Returns true if PTR1 and PTR2 compare unequal because of points-to.  */
+ 
+ bool
+ ptrs_compare_unequal (tree ptr1, tree ptr2)
+ {
+   /* First resolve the pointers down to a SSA name pointer base or
+  a VAR_DECL, PARM_DECL or RESULT_DECL.  This explicitely does
+  not yet try to handle LABEL_DECLs, FUNCTION_DECLs, CONST_DECLs
+  or STRING_CSTs which needs points-to adjustments to track them
+  in the points-to sets.  */
+   tree obj1 = NULL_TREE;
+   tree obj2 = NULL_TREE;
+   if (TREE_CODE (ptr1) == ADDR_EXPR)
+ {
+   tree tem = get_base_address (TREE_OPERAND (ptr1, 0));
+   if (! tem)
+   return false;
+   if (TREE_CODE (tem) == VAR_DECL
+ || TREE_CODE (tem) == PARM_DECL
+ || TREE_CODE (tem) == RESULT_DECL)
+   obj1 = tem;
+   else if (TREE_CODE (tem) == MEM_REF)
+   ptr1 = TREE_OPERAND (tem, 0);
+ }
+   if (TREE_CODE (ptr2) == ADDR_EXPR)
+ {
+   tree tem = get_base_address (TREE_OPERAND (ptr2, 0));
+   if (! tem)
+   return false;
+   if (TREE_CODE (tem) == VAR_DECL
+ || TREE_CODE (tem) == PARM_DECL
+ || TREE_CODE (tem) == RESULT_DECL)
+   obj2 = tem;
+   else if (TREE_CODE (tem) == MEM_REF)
+   ptr2 = TREE_OPERAND (tem, 0);
+ }
+ 
+   if (obj1 && obj2)
+ /* Other code handles this correctly, no need to duplicate it here.  */;
+   else if (obj1 && TREE_CODE (ptr2) == SSA_NAME)
+ {
+   struct ptr_info_def *pi = SSA_NAME_PTR_INFO (ptr2);
+   if (!pi)
+   return false;
+   return !pt_solution_includes (&pi->pt, obj1);
+ }
+   else if (TREE_CODE (ptr1) == SSA_NAME && obj2)
+ {
+   struct ptr_info_def *pi = SSA_NAME_PTR_INFO (ptr1);
+   if (!pi)
+   return false;
+   return !pt_solution_includes (&pi->pt, obj2);
+ }
+ 
+   /* ???  We'd like to handle ptr1 != NULL and ptr1 != ptr2
+  but those require pt.null to be conservatively correct.  */
+ 
+   return false;
+ }
+ 
  /* Returns whether reference REF to BASE may refer to global memory.  */
  
  static bool
Index: gcc/tree-ssa-alias.h
===
*** gcc/tree-ssa-alias.h.orig   2016-04-28 11:56:26.864468581 +0200
--- gcc/tree-ssa-alias.h2016-04-28 11:57:18.389057793 +0200
*** extern alias_set_type ao_ref_alias_set (
*** 101,106 
--- 101,107 
  extern alias_set_type ao_ref_base_alias_set (ao_ref *);
  extern bool ptr_deref_may_alias_global_p (tree);
  extern bool ptr_derefs_may_alias_p (tree, tree);
+ extern bool ptrs_compare_unequal (tree, tree);
  extern bool ref_may_alias_global_p (tree);
  extern bool ref_may_alias_global_p (ao_ref *);
  extern bool refs_may_alias_p (tree, tree);
Index: gcc/match.pd
===
*** gcc/match.pd.orig   2016-04-28 11:56:26.864468581 +0200
--- gcc/match.pd2016-04-28 11:59:36.070631926 +0200
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 2409,2414 
--- 2409,2422 
(if (cmp == NE_EXPR)
 { constant_boolean_node (true, type); })
  
+ /* Simplify pointer equality compares using PTA.  */
+ (for neeq (ne eq)
+  (simplify
+   (neeq @0 @1)
+   (if (POINTER_TYPE_P (TREE_TYPE (@0))
+&& ptrs_compare_unequal (@0, @1))
+{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; })))
+ 
  /* Non-equality compare simplifications from fold_binary  */
  (for cmp (lt gt le ge)
   /* Comparisons with the highest or lowest possible integer of
Index: gcc/testsuite/gcc.dg/uninit-pr65686.c
===
*** /dev/null

Re: fix libsanitizer build on ppc-linux

2016-04-29 Thread Jakub Jelinek
On Fri, Apr 29, 2016 at 10:08:07AM +0200, Olivier Hainque wrote:
> Hello,
> 
> Attempts to bootstrap on our powerpc-linux hosts fail
> on libsanitizer with symptoms like:
> 
> 
>   In file included from 
> ../../../../src/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:29:0:
>   /usr/include/asm/posix_types.h:72:51: error: '__kernel_fd_set' has not been 
> declared
>   static __inline__ void __FD_SET(unsigned long fd, __kernel_fd_set *fdsetp)
> 
>   /usr/include/asm/posix_types.h: In function 'void __FD_SET(long unsigned 
> int, int*)':
>   /usr/include/asm/posix_types.h:74:28: error: '__NFDBITS' was not declared 
> in this scope
>   unsigned long _tmp = fd / __NFDBITS;
>   ...
> 
> The attached patch fixes this, and bootstraps and regtests fine on x86_64-linux.
> 
> OK to commit ?

No, for these files we aren't upstream and just periodically merge stuff
from there.  So, you should try to discuss this in asan upstream and get
a fix committed there and we can then merge it and/or cherry-pick it.
Please see libsanitizer/README.gcc.

Also, it would be nice to understand what is different on your powerpc-linux
from everybody else's, where it works: are your kernel headers too old or
too new, or is something else different?

Jakub


RE: [PATCH] [ARC] Fix unwanted match for sign extend 16-bit constant.

2016-04-29 Thread Claudiu Zissulescu
Committed r235623.

Thanks,
Claudiu

> -Original Message-
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Thursday, April 28, 2016 10:57 PM
> To: Claudiu Zissulescu; Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: francois.bed...@synopsys.com; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH] [ARC] Fix unwanted match for sign extend 16-bit
> constant.
> 
> 
> 
> On 28/04/16 21:31, Claudiu Zissulescu wrote:
> >>
> >> Otherwise, I'd suggest using a traditional integer letter.  'J' is free.
> > Thanks for the suggestion, I will use 'J'.
> >
> >> Why do you remove half of the indentation?
> > Unwanted reformatting, sorry for this, I will revert it.
> >
> > I have the feeling you are happy with my new patch. Is there anything
> > to be added to it besides fixing the above issues?
> No, otherwise it looks OK.


[PATCH] Fix gcc.dg/tree-ssa/pr18589-10.c

2016-04-29 Thread Richard Biener

The following adjusts gcc.dg/tree-ssa/pr18589-10.c after my match.pd
patch to add

+ /* Simplify powi(x,y) * powi(z,y) -> powi(x*z,y). */
+ (simplify
+  (mult (POWI:s @0 @1) (POWI:s @2 @1))
+   (POWI (mult @0 @2) @1))

which now generates one multiplication less (assembly is the same).
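
As a plain C illustration of where the multiplication goes away (ipow is
a hypothetical stand-in for powi, written out so the multiplies are
visible; it is not the builtin itself):

```c
#include <assert.h>

/* Hypothetical stand-in for powi: X raised to a non-negative integer
   power by repeated multiplication.  */
static double
ipow (double x, int n)
{
  assert (n >= 0);
  double r = 1.0;
  while (n-- > 0)
    r *= x;
  return r;
}

/* Unsimplified form: two powi calls, then one multiply.  */
double
before (double x, double z, int y)
{
  return ipow (x, y) * ipow (z, y);
}

/* Simplified form: one multiply feeding a single powi call.  */
double
after (double x, double z, int y)
{
  return ipow (x * z, y);
}
```

For inputs that are exactly representable the two forms agree; in
general they may round differently.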

Committed to trunk.

Richard.

2016-04-29  Richard Biener  

* gcc.dg/tree-ssa/pr18589-10.c: Adjust.

Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c  (revision 235621)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c  (working copy)
@@ -7,4 +7,4 @@ double baz (double x, double y, double z
  * __builtin_pow (z, 4.0));
 }
 
-/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */


Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-29 Thread Rainer Orth
Hi Bernd,

>> would this version combo (gmp 6.0.0, mpfr 3.1.1, mpc 0.9) also work on
>> the active release branches (gcc-5 and gcc-6, gcc-4.9 is on it's way
>> out)?  Having to install two different sets of the libraries for trunk
>> and branch work would be extremely tedious.
>>
>>  Rainer
>>
>
> Yes, when they are pre-installed there should be no problem.
> Also newer versions than these seem to work.
>
> In-tree only the versions that download_prerequisite picks are
> tested and guaranteed to work.

fine with me: I never cared about (or even got the point of) in-tree
builds of this stuff.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Allow redefinition of libcilkrts debug macros

2016-04-29 Thread Rainer Orth
Hi Jeff,

> On 04/26/2016 08:04 AM, Rainer Orth wrote:
>> When working on a couple of Cilk Plus issues lately (PRs target/60290,
>> target/68945), I noticed that you have to modify the libcilkplus sources
>> to enable various debugging output.  This seems silly, and the following
>> patch allows defining them from the command line.
>>
>> Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12.
>>
>> Ok for mainline?
>>
>>  Rainer
>>
>>
>> 2016-04-07  Rainer Orth  
>>
>>  * runtime/except-gcc.cpp (DEBUG_EXCEPTIONS): Allow redefinition.
>>  * runtime/cilk_fiber.h (FIBER_DEBUG): Likewise.
>>  * runtime/scheduler.h (REDPAR_DEBUG): Likewise.
> Ilya will have to chime in here -- we're a downstream consumer of the Cilk+
> runtime.  So these patches need to go into Intel's tree first, then Ilya
> can bring them into the GCC tree.

I suspected that much.  It would be good to have a libcilkrts/README.gcc
describing the rules which changes can go into the gcc tree directly,
which need to go upstream first, and how.  libgo and libsanitizer already
have this.

Having a listed libcilkrts maintainer would probably help, too ;-)
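
For the record, the kind of change meant here is roughly the following.
This is only a sketch under the assumption that the patch wraps the
default definitions in #ifndef guards; the patch itself is not quoted in
this message, and trace_exception is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* The default only applies when the build did not define the macro, so
   -DDEBUG_EXCEPTIONS=1 on the command line takes precedence.  */
#ifndef DEBUG_EXCEPTIONS
#define DEBUG_EXCEPTIONS 0
#endif

/* Hypothetical helper: returns the number of messages printed.  */
int
trace_exception (const char *what)
{
  assert (what);
#if DEBUG_EXCEPTIONS
  fprintf (stderr, "exception: %s\n", what);
  return 1;
#else
  return 0;
#endif
}
```

With this shape no source edit is needed to turn the output on.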

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch] Fix PR 60040

2016-04-29 Thread Bernd Schmidt

On 04/28/2016 10:07 AM, Senthil Kumar Selvaraj wrote:


Here's the patch with the extra bits removed.


To get it some additional test coverage, I've tested it on 
gcc-4_7-branch (with another backport so that it applies) with an x86_64 
bootstrap and test. That worked, so I installed it on trunk.



Bernd


[PATCH][C++] Build DECL_EXPRs for anonymous VLAs

2016-04-29 Thread Richard Biener

The following makes sure that the gimplifier properly unshares the
type fields in anonymous VLA types by inserting a DECL_EXPR for
it instead of just forcing a TYPE_SIZE evaluation.

This avoids turning those fields into garbage during gimplification.
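
To show what an anonymous VLA type looks like in source, here is a small
sketch.  It is written as C for illustration and is not the
c-c++-common/ubsan/pr59667.c testcase; last_element is a made-up name.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills an n-by-n matrix and returns its last element.  The array type
   float[n][n] has no declaration of its own: it only appears inside a
   pointer declarator and a sizeof, so its size expressions belong to
   an unnamed type.  */
float
last_element (unsigned int n)
{
  assert (n > 0);
  float (*p)[n][n] = malloc (sizeof (float[n][n]));
  if (!p)
    return -1.0f;
  for (unsigned int i = 0; i < n; i++)
    for (unsigned int j = 0; j < n; j++)
      (*p)[i][j] = (float) (i * n + j);
  float r = (*p)[n - 1][n - 1];
  free (p);
  return r;
}
```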

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Ok for trunk?

This fixes an ICE with c-c++-common/ubsan/pr59667.c when the gimplifier
is allowed to introduce SSA names.

Richard.

2016-04-29  Richard Biener  

cp/
* decl.c (grokdeclarator): Properly insert a DECL_EXPR for
anonymous VLAs.

Index: gcc/cp/decl.c
===
*** gcc/cp/decl.c.orig  2016-04-28 14:11:00.044581227 +0200
--- gcc/cp/decl.c   2016-04-28 14:11:08.116671106 +0200
*** grokdeclarator (const cp_declarator *dec
*** 10393,10400 
  && (decl_context == NORMAL || decl_context == FIELD)
  && at_function_scope_p ()
  && variably_modified_type_p (type, NULL_TREE))
!   /* Force evaluation of the SAVE_EXPR.  */
!   finish_expr_stmt (TYPE_SIZE (type));
  
  if (declarator->kind == cdk_reference)
{
--- 10393,10403 
  && (decl_context == NORMAL || decl_context == FIELD)
  && at_function_scope_p ()
  && variably_modified_type_p (type, NULL_TREE))
!   {
! TYPE_NAME (type) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
!NULL_TREE, type);
! add_decl_expr (TYPE_NAME (type));
!   }
  
  if (declarator->kind == cdk_reference)
{



RE: [PATCHv2 0/7] ARC: Add support for nps400 variant

2016-04-29 Thread Claudiu Zissulescu
Hi Andrew,

I see the following tests failing:

FAIL: gcc.target/arc/movb-1.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
*r[0-5]+, *19, *21, *8
FAIL: gcc.target/arc/movb-2.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
*r[0-5]+, *23, *23, *9
FAIL: gcc.target/arc/movb-5.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
*r[0-5]+, *23, *(23|7), *9
FAIL: gcc.target/arc/movh_cl-1.c scan-assembler movh.cl r[0-9]+,0xc000>>16

Can you please confirm and, if that is the case, fix them?

Thanks,
Claudiu


[PATCH][Fortran] Properly generate DECL_EXPRs for temporary arrays

2016-04-29 Thread Richard Biener

The following makes sure to create DECL_EXPRs for VLA types built for
temporary arrays so that the gimplifier can unshare the expressions
in their type fields when required.

This avoids turning those fields into garbage.  With a patch to allow
the gimplifier to introduce SSA names it avoids ICEs for

gfortran.dg/auto_char_pointer_array_result_1.f90
gfortran.dg/interface_12.f90
gfortran.dg/result_in_spec_1.f90

and

libgomp.fortran/vla7.f90

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Ok for trunk?

I'm not sure the testsuite coverage caught all the cases where this was
missing, so other places might need similar handling after the SSA patch
goes in and this gets applied to the real world (TM).

Thanks,
Richard.

2016-04-29  Richard Biener  

fortran/
* trans-array.c (gfc_trans_create_temp_array): Properly
create a DECL_EXPR for the anonymous VLA array type.

Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c.orig  2016-04-28 14:11:00.064581449 +0200
--- gcc/fortran/trans-array.c   2016-04-28 14:11:08.120671151 +0200
*** gfc_trans_create_temp_array (stmtblock_t
*** 1094,1099 
--- 1094,1109 
info->descriptor = desc;
size = gfc_index_one_node;
  
+   /* Emit a DECL_EXPR for the variable sized array type in
+  GFC_TYPE_ARRAY_DATAPTR_TYPE so the gimplification of its type
+  sizes works correctly.  */
+   tree arraytype = TREE_TYPE (GFC_TYPE_ARRAY_DATAPTR_TYPE (type));
+   if (! TYPE_NAME (arraytype))
+ TYPE_NAME (arraytype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+   NULL_TREE, arraytype);
+   gfc_add_expr_to_block (pre, build1 (DECL_EXPR,
+ arraytype, TYPE_NAME (arraytype)));
+ 
/* Fill in the array dtype.  */
tmp = gfc_conv_descriptor_dtype (desc);
gfc_add_modify (pre, tmp, gfc_get_dtype (TREE_TYPE (desc)));


[Committed] S/390: Memory constraint cleanup

2016-04-29 Thread Andreas Krebbel
This fixes an issue with the long displacement memory address
constraints S and T.  These were defined to only accept long
displacement addresses.  This is wrong since a memory constraint must
not reject an address with a 0 displacement.  Reload relies on being
able to turn an invalid memory address into a valid one by reloading
the address into a base register.  The S and T constraints would
reject such an address.

This isn't really a problem for the backend, since the constraints were
used there with that knowledge, but it is a problem for people
writing inline assembly.
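
The difference can be modelled in a few lines of C.  This is a sketch
only: the function names are made up, it looks at the displacement
alone, and it assumes the usual 12-bit unsigned short displacement and
20-bit signed long displacement ranges.

```c
#include <assert.h>
#include <stdbool.h>

/* 12-bit unsigned displacement.  */
static bool
short_disp_p (long d)
{
  return d >= 0 && d < 4096;
}

/* 20-bit signed displacement.  */
static bool
long_disp_p (long d)
{
  return d >= -524288 && d < 524288;
}

/* Old behaviour of a long displacement constraint as described above:
   only displacements that actually need the long form are accepted.  */
bool
old_long_constraint (long d)
{
  return long_disp_p (d) && !short_disp_p (d);
}

/* New behaviour: short displacements, including 0, are accepted too.  */
bool
new_long_constraint (long d)
{
  /* Every short displacement also fits the long form.  */
  assert (!short_disp_p (d) || long_disp_p (d));
  return long_disp_p (d);
}
```

With the old definition an address that has just been reloaded into a
base register (displacement 0) is rejected, which is the case reload
depends on.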

gcc/ChangeLog:

2016-04-29  Ulrich Weigand  

* config/s390/constraints.md ("U", "W"): Invoke
s390_mem_constraint with "ZR" and "ZT".
* config/s390/s390.c (s390_check_qrst_address): Reject invalid
addresses when using LRA.  Accept also short displacements for S
and T constraints.  Do not check for long displacement target for
S and T constraints.
(s390_mem_constraint): Remove handling of U and W constraints.
* config/s390/s390.md (various patterns): Remove the short
displacement constraints (Q and R) if a long displacement
constraint is present.  Add longdisp as required CPU capability.
* config/s390/vector.md: Likewise.
* config/s390/vx-builtins.md: Likewise.
---
 gcc/ChangeLog  |  15 ++
 gcc/config/s390/constraints.md |  15 +-
 gcc/config/s390/s390.c |  31 ++--
 gcc/config/s390/s390.md| 401 ++---
 gcc/config/s390/vector.md  |  36 ++--
 gcc/config/s390/vx-builtins.md |   6 +-
 6 files changed, 274 insertions(+), 230 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 984a703..6e783c8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-04-29  Ulrich Weigand  
+
+   * config/s390/constraints.md ("U", "W"): Invoke
+   s390_mem_constraint with "ZR" and "ZT".
+   * config/s390/s390.c (s390_check_qrst_address): Reject invalid
+   addresses when using LRA.  Accept also short displacements for S
+   and T constraints.  Do not check for long displacement target for
+   S and T constraints.
+   (s390_mem_constraint): Remove handling of U and W constraints.
+   * config/s390/s390.md (various patterns): Remove the short
+   displacement constraints (Q and R) if a long displacement
+   constraint is present.  Add longdisp as required CPU capability.
+   * config/s390/vector.md: Likewise.
+   * config/s390/vx-builtins.md: Likewise.
+
 2016-04-29  Uros Bizjak  
 
* config/i386/i386.md (Load+RegOp to Mov+MemOp peephole2):
diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md
index 7857700..190cdc9 100644
--- a/gcc/config/s390/constraints.md
+++ b/gcc/config/s390/constraints.md
@@ -77,8 +77,8 @@
 ;;B -- Multiple letter constraint followed by Q, R, S, or T:
 ;; Memory reference of the type specified by second letter that
 ;; does *not* refer to a literal pool entry.
-;;U -- Pointer with short displacement. (deprecated - use ZQZR)
-;;W -- Pointer with long displacement. (deprecated - use ZSZT)
+;;U -- Pointer with short displacement. (deprecated - use ZR)
+;;W -- Pointer with long displacement. (deprecated - use ZT)
 ;;Y -- Address style operand without index.
 ;;ZQ -- Pointer without index register and with short displacement.
 ;;ZR -- Pointer with index register and short displacement.
@@ -455,8 +455,7 @@
 ; the TARGET_MEM_CONSTRAINT macro.
 (define_memory_constraint "m"
   "Matches the most general memory address for pre-z10 machines."
-  (match_test "s390_mem_constraint (\"R\", op)
-   || s390_mem_constraint (\"T\", op)"))
+  (match_test "s390_mem_constraint (\"T\", op)"))
 
 (define_memory_constraint "AQ"
   "@internal
@@ -512,12 +511,12 @@
 
 
 (define_address_constraint "U"
-  "Pointer with short displacement. (deprecated - use ZQZR)"
-  (match_test "s390_mem_constraint (\"U\", op)"))
+  "Pointer with short displacement. (deprecated - use ZR)"
+  (match_test "s390_mem_constraint (\"ZR\", op)"))
 
 (define_address_constraint "W"
-  "Pointer with long displacement. (deprecated - use ZSZT)"
-  (match_test "s390_mem_constraint (\"W\", op)"))
+  "Pointer with long displacement. (deprecated - use ZT)"
+  (match_test "s390_mem_constraint (\"ZT\", op)"))
 
 
 (define_address_constraint "ZQ"
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a1d0930..155be3c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3116,6 +3116,19 @@ s390_check_qrst_address (char c, rtx op, bool 
lit_pool_ok)
   decomposed = true;
 }
 
+  /* With reload, we sometimes get intermediate address forms that are
+ actually invalid as-is, but we need to accept them in the most
+ generic cases below ('R' or 'T'), since reload will in fact fix
+ them up.  LRA behaves differently here; we never see such form

[Committed] S/390: Replace LDER with LDR.

2016-04-29 Thread Andreas Krebbel
For performance reasons it is important to write the full 64 bits of
an FPR target reg even when dealing with 32 bit values.  So we chose
lder over ler for 32 bit float register moves.  lder zero-extends the
32 bit value from the source reg to 64 bit in the target.  However,
since it actually doesn't matter whether we write the upper 32 bits
with zeros or with any other garbage we can also use ldr instead.  It
is a bit shorter and is therefore better for I-cache usage.

gcc/ChangeLog:

2016-04-29  Andreas Krebbel  

* config/s390/2964.md ("z13_unit_fxu", "z13_0"): Remove lder.
* config/s390/s390.md ("movsi_larl", "*movsi_esa", "mov"):
Change lder to ldr.
* config/s390/vector.md ("mov"): Likewise.
---
 gcc/ChangeLog |  7 +++
 gcc/config/s390/2964.md   |  4 ++--
 gcc/config/s390/s390.md   | 12 ++--
 gcc/config/s390/vector.md |  4 ++--
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6e783c8..be81b84 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-04-29  Andreas Krebbel  
+
+   * config/s390/2964.md ("z13_unit_fxu", "z13_0"): Remove lder.
+   * config/s390/s390.md ("movsi_larl", "*movsi_esa", "mov"):
+   Change lder to ldr.
+   * config/s390/vector.md ("mov"): Likewise.
+
 2016-04-29  Ulrich Weigand  
 
* config/s390/constraints.md ("U", "W"): Invoke
diff --git a/gcc/config/s390/2964.md b/gcc/config/s390/2964.md
index d2211e1..e0e732b 100644
--- a/gcc/config/s390/2964.md
+++ b/gcc/config/s390/2964.md
@@ -57,7 +57,7 @@ 
vllezh,oc,xc,clc,lrl,ear,nc,lgrl,sfpc,llgf,llgfrl,llgh,llgt,lcbb,vll,sar") (cons
 (define_attr "z13_unit_fxu" ""
   (cond [(eq_attr "mnemonic" "s,lcgr,x,nop,oiy,ppa,ng,msy,sgrk,vstl,aghik,\
 msgf,ipm,mvi,stocg,rll,srlg,cghsi,clgit,srlk,alrk,sg,sh,sl,st,sy,vst,ark,\
-xgr,agsi,tm,nrk,shy,llhr,agf,alcr,slgfr,sr,clgrt,laa,lder,sgf,lan,llilf,\
+xgr,agsi,tm,nrk,shy,llhr,agf,alcr,slgfr,sr,clgrt,laa,sgf,lan,llilf,\
 llilh,ag,llill,lay,al,n,laxg,ar,ahi,sgr,ntstg,ay,stcy,nopr,mfy,ngrk,lbr,\
 br,dsgr,stdy,ork,ldgr,lcr,cg,ch,lgfrl,cl,stoc,cr,agfr,stgrl,cy,alfi,xg,\
 cgfi,xi,clfhsi,cgfr,xr,slb,mghi,clfi,slg,clhhsi,agfi,clfit,sly,mr,ldr,nihf,\
@@ -121,7 +121,7 @@ vchfs,madb,ddbr") (const_int 1)]
   (and (eq_attr "cpu" "z13")
(eq_attr "mnemonic" "s,lcgr,x,nop,oiy,vlbb,ppa,ng,sgrk,vstl,aghik,\
 mvc,ipm,llgc,mvi,stocg,rll,jg,srlg,cghsi,clgit,srlk,alrk,sg,sh,sl,st,sy,\
-vst,ark,xgr,agsi,tm,nrk,shy,llhr,agf,alcr,slgfr,sr,clgrt,llc,laa,lder,sgf,\
+vst,ark,xgr,agsi,tm,nrk,shy,llhr,agf,alcr,slgfr,sr,clgrt,llc,laa,sgf,\
 lan,llhrl,llilf,llilh,ag,llill,lay,al,n,laxg,ar,ahi,sgr,ntstg,ay,stcy,vl,\
 nopr,ngrk,lbr,br,stdy,ork,ldgr,lcr,cg,ch,llghrl,lgfrl,cl,stoc,cr,agfr,stgrl,\
 cy,alfi,xg,cgfi,xi,vlrepf,vlrepg,vlreph,clfhsi,cgfr,xr,slb,mghi,clfi,slg,\
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 8757470..12a7f2a 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1924,7 +1924,7 @@
ly\t%0,%1
st\t%1,%0
sty\t%1,%0
-   lder\t%0,%1
+   ldr\t%0,%1
ler\t%0,%1
lde\t%0,%1
le\t%0,%1
@@ -1944,7 +1944,7 @@
vlef\t%v0,%1,0
vstef\t%v1,%0,0"
   [(set_attr "op_type" "RI,RI,RI,RIL,RXY,RIL,RR,RX,RXY,RX,RXY,
-
RRE,RR,RXE,RX,RXY,RX,RXY,RRE,RRE,RS,RIL,SIL,RS,VRI,VRR,VRS,VRS,VRX,VRX")
+
RR,RR,RXE,RX,RXY,RX,RXY,RRE,RRE,RS,RIL,SIL,RS,VRI,VRR,VRS,VRS,VRX,VRX")
(set_attr "type" "*,
  *,
  *,
@@ -2005,7 +2005,7 @@
lr\t%0,%1
l\t%0,%1
st\t%1,%0
-   lder\t%0,%1
+   ldr\t%0,%1
ler\t%0,%1
lde\t%0,%1
le\t%0,%1
@@ -2014,7 +2014,7 @@
sar\t%0,%1
stam\t%1,%1,%S0
lam\t%0,%0,%S1"
-  [(set_attr "op_type" "RI,RR,RX,RX,RRE,RR,RXE,RX,RX,RRE,RRE,RS,RS")
+  [(set_attr "op_type" "RI,RR,RX,RX,RR,RR,RXE,RX,RX,RRE,RRE,RS,RS")
(set_attr "type" 
"*,lr,load,store,floadsf,floadsf,floadsf,floadsf,fstoresf,*,*,*,*")
(set_attr "z10prop" 
"z10_fwd_A1,z10_fr_E1,z10_fwd_A3,z10_rec,*,*,*,*,*,z10_super_E1,
 z10_super,*,*")
@@ -2550,7 +2550,7 @@
   ""
   "@
lzer\t%0
-   lder\t%0,%1
+   ldr\t%0,%1
ler\t%0,%1
lde\t%0,%1
le\t%0,%1
@@ -2571,7 +2571,7 @@
vlgvf\t%0,%v1,0
vleg\t%0,%1,0
vsteg\t%1,%0,0"
-  [(set_attr "op_type" 
"RRE,RRE,RR,RXE,RX,RXY,RX,RXY,RI,RR,RIL,RX,RXY,RIL,RX,RXY,VRR,VRI,VRS,VRS,VRX,VRX")
+  [(set_attr "op_type" 
"RRE,RR,RR,RXE,RX,RXY,RX,RXY,RI,RR,RIL,RX,RXY,RIL,RX,RXY,VRR,VRI,VRS,VRS,VRX,VRX")
(set_attr "type"
"fsimpsf,fsimpsf,fload,fload,fload,fload,
 
fstore,fstore,*,lr,load,load,load,store,store,store,*,*,*,*,load,store")
(set_attr "z10prop" 
"*,*,*,*,*,*,*,*,z10_fwd_A1,z10_fr_E1,z10_fr_E1,z10_fwd_A3,z10_fwd_A3,z10_rec,z10_rec,z10_rec,*,*,*,*,*,*")
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 979cb29..bc4f8da 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -23

Re: [PATCH] PR/68089: C++-11: Ignore "alignas(0)".

2016-04-29 Thread Andreas Krebbel
On 12/31/2015 12:50 PM, Dominik Vogt wrote:
> The attached patch fixes C++-11 handling of "alignas(0)" which
> should be ignored but currently generates an error message.  A
> test case is included; the patch has been tested on S390x.  Since
> it's a language issue it should be independent of the backend
> used.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69089

Applied. Thanks!

-Andreas-




Re: [PATCH] S/390: Improve documentation of s390_reload_costs.

2016-04-29 Thread Andreas Krebbel
On 04/27/2016 09:53 AM, Dominik Vogt wrote:
> The attached patch improves some S/390 function documentation.

Applied. Thanks!

-Andreas-




Re: [PATCH] Clean up tests where a later dg-do completely overrides another.

2016-04-29 Thread Andreas Krebbel
On 04/27/2016 09:50 AM, Dominik Vogt wrote:
> The attached patch cleans up some (mostly unnecessary) dg-do
> directives in the gcc.dg and gcc.target test cases.

Applied. Thanks!

-Andreas-




Re: [PATCH][SMS] SMS use loop induction variable analysis instead of depending on doloop optimization

2016-04-29 Thread Bernd Schmidt

On 04/28/2016 08:06 AM, Shiva Chen wrote:


Could anyone help me to review the patch?
Any suggestion would be very helpful.


You might want to split it up if there are several logically independent 
pieces. I can't quite make sense of it all, and I'm not too familiar 
with SMS anyway, so the following is not a complete review, just a 
selection of issues I observed.


There are a large number of formatting and style problems. I'll be 
pointing out some instances, but please read

   http://www.gnu.org/prep/standards/standards.html#Writing-C
and self-review your patch before resubmission.


+static bool mem_write_insn_p (rtx_insn *);


Generally best to order your code so that you don't need forward 
declarations.



-  /* We do not handle setting only part of the register.  */
-  if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
-return GRD_INVALID;
-


Why this change?


  }

+static rtx
+get_rhs (rtx_insn *insn, rtx reg)


get_rhs might not be the most meaningful function name. We require 
documentation before every function that says what it does, and what the 
arguments mean. Please examine the surrounding code for examples.



+
+  /* Find iv increment/decrement rhs in following pattern
+
+ (parallel [
+   (set (reg:CC_NOOV 100 cc)
+   (compare:CC_NOOV (plus:SI (reg:SI 147)
+ (const_int -1))
+(const_int 0 [0])))
+   (set (reg:SI 147)
+   (plus:SI (reg:SI 147)
+(const_int -1 [0x]))
+   */


Rather than quoting large RTL blocks, it would be better to explain what 
you're trying to do.



@@ -1048,6 +1141,47 @@ iv_analyze_expr (rtx_insn *insn, rtx rhs, machine_mode 
mode,
return iv->base != NULL_RTX;
  }

+/* Auxiliary variable for mem_read_insn_p/mem_write_insn_p.  */
+static bool mem_ref_p;


Auxiliary variable doing what? Rather than using a global, it might be 
better to use the data pointer passed through note_uses.
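
Something along these lines, as a standalone sketch.  walk_operands is a
hypothetical stand-in for note_uses, plain ints stand in for rtx
operands, and "negative" stands in for "contains a MEM":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an operand walker: calls FUN on every
   element and hands DATA through untouched.  */
static void
walk_operands (const int *ops, size_t n,
               void (*fun) (const int *op, void *data), void *data)
{
  for (size_t i = 0; i < n; i++)
    fun (&ops[i], data);
}

/* Callback: record in *DATA whether a negative operand was seen.  */
static void
mark_negative (const int *op, void *data)
{
  bool *found = (bool *) data;
  if (*op < 0)
    *found = true;
}

/* The result lives in a local, so no file-scope flag is needed.  */
bool
has_negative_operand (const int *ops, size_t n)
{
  assert (ops != NULL || n == 0);
  bool found = false;
  walk_operands (ops, n, mark_negative, &found);
  return found;
}
```

The state then has an obvious owner and lifetime, and the helper stays
reentrant.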



+   /* To check the case REG is read write register
+  in memory reference.  */
+  if (mem_write_insn_p (insn))
+body = SET_DEST (set);
+  else if (mem_read_insn_p (insn))
+body = SET_SRC (set);


This all looks a little odd; if you're looking for autoincs, why not 
just scan the entire INSN for a MEM, rather than classify it as 
mem_write or mem_read_insn?



+  if (GET_CODE (body) == ZERO_EXTEND ||
+ GET_CODE (body) == SIGN_EXTEND)


Split lines before operators.


+
+  /* To handle the condition as follow
+  (ne (plus:SI (reg:SI 163)
+  (const_int -1))
+ (const_int 0 [0]))
+
+ The pattern would generate
+ by doloop optimization.  */


This comment confuses me more than it helps me.


+
+  /* handle the condition:
+  (ne (plus:SI (reg:SI 163)
+   (const_int -1 [0x]))
+   (const_int 0 [0]))
+  */


Handle how? This has comment formatting problems, which would be easier 
to make go away if you didn't just quote RTL. (If you must quote RTL, 
best to remove irrelevant bits such as :SI and [0xf] and [0]), 
and replace register numbers like 163 with variables like A.



+
+  /* following code handle the condition:
+ (ne (reg:SI 163) (reg:SI 176))
+  */


Same problem.


+ if ((set = single_set (insn)))

> +  {
> +if (SET_DEST (set) == count_reg)
> +  continue;
> +else
> +  {
> +loop->count_ref_p = true;
> +return;
> +  }
> +  }

Either
  if ((set = single_set (insn)) != NULL_RTX)
or, better (also removing the outer declaration of set):

rtx set = single_set (insn);
if (set && SET_DEST (set) == count_reg)
  continue;
loop->count_ref_p = true;
return;


+
+ if (GET_CODE (PATTERN (insn)) == PARALLEL)
+   {


No unnecessary braces if an if contains only a single statement.


+  /* check count_reg reference in the loop
+ and set result to loop->count_ref_p.  */
+  check_count_reg_reference (loop, iv->base);


Comments should be full sentences, properly capitalized. But avoid 
comments that just describe what a called function is doing. This 
information should be part of that function's starting comment.


Skipping most of the rest of the SMS changes...


Bernd


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 9:58 AM, Eric Botcazou  wrote:
>> We can put it back, but perhaps implemented as unspec, so it won't
>> interfere with peepholes?
>
> No strong opinion, as long as the final assembly is the same as before.

I'm testing the attached patch. Does it fix your ada failures?

Uros.
Index: i386.md
===
--- i386.md (revision 235620)
+++ i386.md (working copy)
@@ -88,6 +88,7 @@
   UNSPEC_SET_GOT_OFFSET
   UNSPEC_MEMORY_BLOCKAGE
   UNSPEC_STACK_CHECK
+  UNSPEC_PROBE_STACK
 
   ;; TLS support
   UNSPEC_TP
@@ -17552,6 +17553,23 @@
   DONE;
 })
 
+(define_expand "probe_stack"
+  [(parallel
+ [(set (match_operand 0 "memory_operand")
+  (unspec [(const_int 0)] UNSPEC_PROBE_STACK))
+  (clobber (reg:CC FLAGS_REG))])])
+
+;; Use OR for stack probes, this is shorter.
+(define_insn "*probe_stack_"
+  [(set (match_operand:W 0 "memory_operand" "=m")
+   (unspec:W [(const_int 0)] UNSPEC_PROBE_STACK))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "or{}\t{$0, %0|%0, 0}"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "")
+   (set_attr "length_immediate" "1")])
+  
 (define_insn "adjust_stack_and_probe"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec_volatile:P [(match_operand:P 1 "register_operand" "0")]


Re: [PATCH] Take known zero bits into account when checking extraction.

2016-04-29 Thread Dominik Vogt
On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
> On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> > * combine.c (make_compound_operation): Take known zero bits into
> > account when checking for possible zero_extend.
> I'd strongly recommend writing some tests for this.  Extra credit if
> they can be run on an x86 target which gets more testing than s390.

I'll look into that later.

> If I go back to our original discussion, we have this going into combine:
> 
> (insn 6 3 7 2 (parallel [
> (set (reg:SI 64)
> (and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
> (const_int -65521 [0x000f])))
> (clobber (reg:CC 33 %cc))
> ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>  (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
> (expr_list:REG_UNUSED (reg:CC 33 %cc)
> (nil
> (insn 7 6 12 2 (set (reg:DI 65)
> (zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207
> {*zero_extendsidi2}
>  (expr_list:REG_DEAD (reg:SI 64)
> (nil)))
> (insn 12 7 13 2 (set (reg/i:DI 2 %r2)
> (reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
>  (expr_list:REG_DEAD (reg:DI 65)
> (nil)))
> 
> Which combine turns into:
> 
> (insn 6 3 7 2 (parallel [
> (set (reg:SI 64)
> (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
> (const_int -65521 [0x000f])))
> (clobber (reg:CC 33 %cc))
> ]) andc-immediate.c:21 1481 {*andsi3_zarch}
>  (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
> (expr_list:REG_UNUSED (reg:CC 33 %cc)
> (nil
> (insn 12 7 13 2 (parallel [
> (set (reg/i:DI 2 %r2)
> (and:DI (subreg:DI (reg:SI 64) 0)
>  ^^^
> (const_int 4294901775 [0x000f])))
>^^
> (clobber (reg:CC 33 %cc))
> ]) andc-immediate.c:22 1474 {*anddi3}
>  (expr_list:REG_UNUSED (reg:CC 33 %cc)
> (expr_list:REG_DEAD (reg:SI 64)
> (nil
> 
> 
> Instead you want insn 12 to use a zero-extend to extend (reg:SI 64)
> into (reg:DI 2)?

Yes, because we get the zero extend for free in this case (through
the constant in the AND or because the input value is a function
argument that is already zero extended).
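
To make the "for free" part concrete, here is a small C model of the two
forms.  It is an illustration only: the function names are made up, and
a 64-bit integer with arbitrary upper bits stands in for the paradoxical
subreg.

```c
#include <assert.h>
#include <stdint.h>

#define MASK32 UINT32_C (0xffff000f)   /* const_int -65521 in SImode */

/* AND in SImode, then an explicit zero_extend to DImode.  */
uint64_t
and_then_zero_extend (uint32_t a)
{
  uint32_t t = a & MASK32;
  return (uint64_t) t;
}

/* What combine produces: a DImode AND on a wide register whose low
   half holds the SImode value and whose high half is unspecified.  The
   DImode constant 4294901775 has a zero upper half, so the result is
   zero-extended no matter what the high half contained.  */
uint64_t
and_on_wide_register (uint64_t wide)
{
  assert ((UINT64_C (4294901775) >> 32) == 0);
  return wide & UINT64_C (4294901775);
}
```

Both functions return the same value for the same low 32 bits, so no
separate zero extend instruction is needed after the AND.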

> Can't you achieve this in this clause:
> 
>  /* If the constant is one less than a power of two, this might be
>  representable by an extraction even if no shift is present.
>  If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>  we are in a COMPARE.  */
> 
> You extract the constant via UINTVAL (XEXP (x, 1)), then munge it
> based on nonzero_bits and pass the result to exact_log2?

That's what we tried first, but it resulted in worse code in many
places (saved about 250 instructions in the SPEC2006 testsuite but
added about 42000 elsewhere).  It was so bad that I didn't even
bother to check what's going on.  Probably this was triggered all
over the place by small constants like 1, 3, 7 and the like where
s390 has no cheap way for zero extension.  So I limited this to
constants that are actually mode masks, implicitly assuming that
there are zero extend instructions only for known modes (which is
true for s390 but may not be for some other targets).  Being
conservative here shouldn't hurt; but I wonder whether there are
targets where this condition still allows too much.
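
To illustrate the difference between the two conditions, here is a rough sketch in plain C.  It is not the combine code; both helper names are hypothetical, and the set of mode widths (8, 16, 32 bits) is an assumption that fits s390 but not necessarily other targets:

```c
#include <stdbool.h>
#include <stdint.h>

/* Any constant of the form 2^n - 1: 1, 3, 7, ..., 0xff, 0xffff, ...
   This is what the exact_log2-based clause would accept.  */
static bool is_low_bit_mask (uint64_t c)
{
  return c != 0 && (c & (c + 1)) == 0;
}

/* Only masks that match an integer mode narrower than 64 bits.
   The widths listed here are assumed for illustration.  */
static bool is_mode_mask (uint64_t c)
{
  return c == 0xffu || c == 0xffffu || c == 0xffffffffu;
}
```

Small constants such as 1, 3 and 7 pass the first test but not the second, which is the case that produced the worse code.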

> Though I do like how you've conditionalized on the cost of the and
> vs the cost of the zero-extend.  So maybe your approach is
> ultimately better.

Actually we wanted to remove the call to rtx_cost() (because
usually combine just assumes that a zero extend is cheaper).  I've
probably forgotten to remove it before posting the patch.  For
s390 it's meaningless whether rtx_cost() is called or not because
at the moment it doesn't model the cost of zero extension (i.e.
the cost of either way is just one instruction, and without
context it's not possible to decide whether s390 needs a separate
instruction for the zero extend or whether it comes for free).

> Still curious your thoughts on doing it by just
> munging the constant you pass off to exact_log2 in that earlier
> clause.

> >+  /* If the one operand is a paradoxical subreg of a register or memory 
> >and
> >+ the constant (limited to the smaller mode) has only zero bits where
> >+ the sub expression has known zero bits, this can be expressed as
> >+ a zero_extend.  */
> >+  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
> >+{
> >+  rtx sub;
> >+
> >+  sub = XEXP (XEXP (x, 0), 0);
> >+  machine_mode sub_mode = GET_MODE (sub);
> >+  if ((REG_P (sub) || MEM_P (sub))
> >+  && GET_MODE_PRECISION (sub_mode) < mode_width
> >+  && (UINTVAL (XEXP (x, 1))
> >+  | (~nonzero_bits (sub, sub_mode) & GET_MODE_MASK (sub_mode))
> >+   

Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-29 Thread Uros Bizjak
On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich  wrote:

> That's what I have in my draft for DImode immediates:
>
> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates,
> unsigned insn_uid)
>BITMAP_FREE (queue);
>  }
>
> +/* Return a cost of building a vector constant
> +   instead of using a scalar one.  */
> +
> +int
> +scalar_chain::vector_const_cost (rtx exp)
> +{
> +  gcc_assert (CONST_INT_P (exp));
> +
> +  if (const0_operand (exp, GET_MODE (exp))
> +  || constm1_operand (exp, GET_MODE (exp)))

The above should just use

standard_sse_constant_p (exp, V2DImode).

Uros.


Re: [PATCH] Fix PR tree-optimization/51513

2016-04-29 Thread Richard Biener
On Fri, Apr 29, 2016 at 1:35 AM, Peter Bergner  wrote:
> This patch fixes PR tree-optimization/51513, namely the generation of
> wild branches due to switch case statements that only contain calls to
> __builtin_unreachable().  For example, compiling using -O2 -fjump-tables
> --param case-values-threshold=1 (to easily expose the bug), we see:
>
>   switch (which)
> {
>   case 0:
> return v0;
>   case 1:
> return v1;
>   case 2:
> return v2;
>   default:
> __builtin_unreachable( );
> }

Your testcase passes '2' where it passes just fine.  If I pass 3 as which
I indeed get an abort () but you can't reasonably expect it to return 13 then.

A __builtin_unreachable () marks the path leading to it as invoking undefined
behavior whenever you would enter it at runtime - this is exactly what happens,
you get a branch "somewhere".

So I fail to see the actual bug you are fixing and I wonder why you do stuff
at the GIMPLE level when we only remove the unreachable blocks at RTL
level CFG cleanup.  If anything, the "fix" should be there.

But as said, the behavior is expected - in fact the jump-table code should
be optimized for an unreachable default case to simply omit the range
check!  That would be a better fix (also avoiding the wild branch).
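
For reference, a reduced sketch of the kind of function under discussion (the name and signature are illustrative, not taken from the actual testcase).  The default case is a promise that WHICH is always 0, 1 or 2; calling it with any other value is undefined behavior, so only the three listed cases have a defined result:

```c
/* Hypothetical reduction; not the pr51513.c test itself.  */
long pick (long which, long v0, long v1, long v2)
{
  switch (which)
    {
    case 0:
      return v0;
    case 1:
      return v1;
    case 2:
      return v2;
    default:
      /* Promise to the compiler that this point is never reached.
         Entering it at run time is undefined behavior.  */
      __builtin_unreachable ();
    }
}
```

With that promise in place, a jump-table implementation needs no range check on WHICH at all.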

Richard.

> we currently generate for powerpc64le-linux:
>
> cmpldi 7,3,2
> bgt 7,.L2   <- Invalid branch
> ...
> .L3:
> mr 3,4
> blr
> .p2align 4,,15
> .L2:<- Invalid branch target
> .long 0
> .byte 0,0,0,0,0,0,0,0
> .size   bug,.-bug
>
> ...and for x86_64-linux:
>
> bug:
> .LFB0:
> .cfi_startproc
> cmpq$2, %rdi
> ja  .L2 <- Invalid branch
> jmp *.L4(,%rdi,8)
> ...
> .L3:
> movq%rsi, %rax
> ret
> .p2align 4,,10
> .p2align 3
> .L2:<- Invalid branch target
> .cfi_endproc
> .LFE0:
> .size   bug, .-bug
>
> The bug is that we end up deleting the unreachable block(s) from the CFG,
> but we never remove the label(s) for the block(s) in the switch jump table.
> We fix this by removing the case labels and their associated edges for
> unreachable blocks.  Normal CFG cleanup removes the unreachable blocks.
>
> This has passed bootstrap and regtesting on powerpc64le-linux and x86_64-linux
> with no regressions.  Ok for trunk?
>
> Peter
>
>
> gcc/
> PR tree-optimization/51513
> * tree-cfg.c (gimple_unreachable_bb_p): New function.
> (assert_unreachable_fallthru_edge_p): Use it.
> (compress_case_label_vector): New function.
> (group_case_labels_stmt): Use it.
> (cleanup_dead_labels): Call gimple_unreachable_bb_p() and
> compress_case_label_vector().  Remove labels and edges leading
> to unreachable blocks.
>
> gcc/testsuite/
> PR tree-optimization/51513
> * gcc.c-torture/execute/pr51513.c: New test.
>
>
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c  (revision 235531)
> +++ gcc/tree-cfg.c  (working copy)
> @@ -408,6 +408,33 @@ computed_goto_p (gimple *t)
>   && TREE_CODE (gimple_goto_dest (t)) != LABEL_DECL);
>  }
>
> +/* Returns true if the basic block BB has no successors and only contains
> +   a call to __builtin_unreachable ().  */
> +
> +static bool
> +gimple_unreachable_bb_p (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +  gimple *stmt;
> +
> +  if (EDGE_COUNT (bb->succs) != 0)
> +return false;
> +
> +  gsi = gsi_after_labels (bb);
> +  if (gsi_end_p (gsi))
> +return false;
> +
> +  stmt = gsi_stmt (gsi);
> +  while (is_gimple_debug (stmt) || gimple_clobber_p (stmt))
> +{
> +  gsi_next (&gsi);
> +  if (gsi_end_p (gsi))
> +   return false;
> +  stmt = gsi_stmt (gsi);
> +}
> +  return gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE);
> +}
> +
>  /* Returns true for edge E where e->src ends with a GIMPLE_COND and
> the other edge points to a bb with just __builtin_unreachable ().
> I.e. return true for C->M edge in:
> @@ -431,23 +458,7 @@ assert_unreachable_fallthru_edge_p (edge
>basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
>if (other_bb == e->dest)
> other_bb = EDGE_SUCC (pred_bb, 1)->dest;
> -  if (EDGE_COUNT (other_bb->succs) == 0)
> -   {
> - gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
> - gimple *stmt;
> -
> - if (gsi_end_p (gsi))
> -   return false;
> - stmt = gsi_stmt (gsi);
> - while (is_gimple_debug (stmt) || gimple_clobber_p (stmt))
> -   {
> - gsi_next (&gsi);
> - if (gsi_end_p (gsi))
> -   return false;
> - stmt = gsi_stmt (gsi);
> -   }
> - return gimple_call_builtin_p (stmt, BUILT_IN_UNREA

[PATCH][COMMITTED] [ARC] Fix obsolete constraint.

2016-04-29 Thread Claudiu Zissulescu
The defines in longlong.h were using the obsolete 'J' constraint. I've replaced
them with the 'Cal' constraint and pushed the patch as obvious.

Cheers,
Claudiu

include/
2016-04-29  Claudiu Zissulescu  

* longlong.h (add_ss): Replace obsolete 'J' constraint with
'Cal' constraint.
(sub_ddmmss): Likewise.
---
 include/ChangeLog  |  6 ++
 include/longlong.h | 12 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index d09d548..1efa034 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,9 @@
+2016-04-29  Claudiu Zissulescu  
+
+   * longlong.h (add_ss): Replace obsolete 'J' constraint with
+   'Cal' constraint.
+   (sub_ddmmss): Likewise.
+
 2016-03-17  Thomas Schwinge  
 
* gomp-constants.h (enum gomp_map_kind): Rename
diff --git a/include/longlong.h b/include/longlong.h
index 34ad9b4..03fd2a1 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -197,17 +197,17 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, 
UDItype);
   : "=r" ((USItype) (sh)), \
 "=&r" ((USItype) (sl)) \
   : "%r" ((USItype) (ah)), \
-"rIJ" ((USItype) (bh)),\
+"rICal" ((USItype) (bh)),  \
 "%r" ((USItype) (al)), \
-"rIJ" ((USItype) (bl)))
+"rICal" ((USItype) (bl)))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub.f  %1, %4, %5\n\tsbc   %0, %2, %3" \
   : "=r" ((USItype) (sh)), \
 "=&r" ((USItype) (sl)) \
   : "r" ((USItype) (ah)),  \
-"rIJ" ((USItype) (bh)),\
+"rICal" ((USItype) (bh)),  \
 "r" ((USItype) (al)),  \
-"rIJ" ((USItype) (bl)))
+"rICal" ((USItype) (bl)))
 
 #define __umulsidi3(u,v) ((UDItype)(USItype)u*(USItype)v)
 #ifdef __ARC_NORM__
@@ -221,8 +221,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, 
UDItype);
 }  \
   while (0)
 #define COUNT_LEADING_ZEROS_0 32
-#endif
-#endif
+#endif /* __ARC_NORM__ */
+#endif /* __arc__ */
 
 #if defined (__arm__) && (defined (__thumb2__) || !defined (__thumb__)) \
  && W_TYPE_SIZE == 32
-- 
1.9.1



Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Eric Botcazou
> I'm testing the attached patch. Does it fix your ada failures?

No, it totally breaks stack checking. :-(

=== acats tests ===
 FAIL:  c52103x
 FAIL:  c52104x
+FAIL:  c52104y
+FAIL:  cb1010a
+FAIL:  cb1010c
+FAIL:  cb1010d
 
=== acats Summary ===
-# of expected passes   2318
-# of unexpected failures   2
+# of expected passes   2314
+# of unexpected failures   6
 Native configuration is x86_64-suse-linux-gnu
 
=== gcc tests ===
@@ -133,11 +137,24 @@
 
 
 Running target unix
+FAIL: gnat.dg/opt49.adb 3 blank line(s) in output
+FAIL: gnat.dg/opt49.adb (test for excess errors)
+UNRESOLVED: gnat.dg/opt49.adb compilation failed to produce executable
+FAIL: gnat.dg/stack_check1.adb 3 blank line(s) in output
+FAIL: gnat.dg/stack_check1.adb (test for excess errors)
+UNRESOLVED: gnat.dg/stack_check1.adb compilation failed to produce executable
+FAIL: gnat.dg/stack_check2.adb 3 blank line(s) in output
+FAIL: gnat.dg/stack_check2.adb (test for excess errors)
+UNRESOLVED: gnat.dg/stack_check2.adb compilation failed to produce executable
+FAIL: gnat.dg/stack_check3.adb 3 blank line(s) in output
+FAIL: gnat.dg/stack_check3.adb (test for excess errors)


/home/eric/svn/gcc/gcc/testsuite/gnat.dg/opt49.adb:31:4: error: unrecognizable 
insn:
(insn 33 32 34 8 (parallel [
(set (mem/v:DI (reg/f:DI 7 sp) [0  S8 A8])
(unspec [
(const_int 0 [0])
] UNSPEC_PROBE_STACK))
(clobber (reg:CC 17 flags))
]) /home/eric/svn/gcc/gcc/testsuite/gnat.dg/opt49.adb:17 -1
 (nil))
+===GNAT BUG DETECTED==+
| 7.0.0 20160429 (experimental) [trunk revision 235619] (x86_64-suse-linux) 
GCC error:|
| in extract_insn, at recog.c:2287 |
| Error detected around 
/home/eric/svn/gcc/gcc/testsuite/gnat.dg/opt49.adb:31:4|

-- 
Eric Botcazou


[PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Claudiu Zissulescu
This is the updated patch on handling FPX NaNs.

Ok to apply?
Claudiu


gcc/
2016-04-18  Claudiu Zissulescu  

* testsuite/gcc.target/arc/ieee_eq.c: New test.

libgcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/ieee-754/eqdf2.S: Handle FPX NaN.
---
 gcc/testsuite/gcc.target/arc/ieee_eq.c | 47 ++
 libgcc/config/arc/ieee-754/eqdf2.S | 15 +++
 2 files changed, 57 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/ieee_eq.c

diff --git a/gcc/testsuite/gcc.target/arc/ieee_eq.c 
b/gcc/testsuite/gcc.target/arc/ieee_eq.c
new file mode 100644
index 000..70aebad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/ieee_eq.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+#define TEST_EQ(TYPE,X,Y,RES)  \
+  do { \
+volatile TYPE a, b;\
+a = (TYPE) X;  \
+b = (TYPE) Y;  \
+if ((a == b) != RES)   \
+  {\
+   printf ("Runtime computation error @%d. %g "\
+   "!= %g\n", __LINE__, a, b); \
+   error = 1;  \
+  }\
+  } while (0)
+
+#ifndef __HS__
+/* Special type of NaN found when using double FPX instructions.  */
+static const unsigned long long __nan = 0x7FF0000080000000ULL;
+# define W (*(double *) &__nan)
+#else
+# define W __builtin_nan ("")
+#endif
+
+#define Q __builtin_nan ("")
+#define H __builtin_inf ()
+
+int main (void)
+{
+  int error = 0;
+
+  TEST_EQ (double, 1, 1, 1);
+  TEST_EQ (double, 1, 2, 0);
+  TEST_EQ (double, W, W, 0);
+  TEST_EQ (double, Q, Q, 0);
+  TEST_EQ (double, __DBL_MAX__, __DBL_MAX__, 1);
+  TEST_EQ (double, __DBL_MIN__, __DBL_MIN__, 1);
+  TEST_EQ (double, H, H, 1);
+
+  if (error)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/libgcc/config/arc/ieee-754/eqdf2.S 
b/libgcc/config/arc/ieee-754/eqdf2.S
index bc7d88e..7e80ef5 100644
--- a/libgcc/config/arc/ieee-754/eqdf2.S
+++ b/libgcc/config/arc/ieee-754/eqdf2.S
@@ -58,11 +58,16 @@ __eqdf2:
   well predictable (as seen from the branch predictor).  */
 __eqdf2:
brne.d DBL0H,DBL1H,.Lhighdiff
-   bmsk r12,DBL0H,20
-#ifdef DPFP_COMPAT
-   or.f 0,DBL0L,DBL1L
-   bset.ne r12,r12,21
-#endif /* DPFP_COMPAT */
+#ifndef __HS__
+   /* The next two instructions are required to recognize the FPX
+   NaN, which has a pattern like this: 0x7ff0_0000_8000_0000, as
+   opposed to 0x7ff8_0000_0000_0000.  */
+   or.f0,DBL0L,DBL1L
+   mov_s   r12,0x0020
+   bset.ne r12,r12,0
+#else
+   bmskr12,DBL0H,20
+#endif /* __HS__ */
add1.f  r12,r12,DBL0H /* set c iff NaN; also, clear z if NaN.  */
j_s.d   [blink]
cmp.cc  DBL0L,DBL1L
-- 
1.9.1
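
A small sketch of why the low word has to be looked at, assuming the FPX NaN pattern is 0x7ff0_0000_8000_0000 (the zeros seem to have been lost in the archived text, so treat that value as an assumption).  The helper names are made up; this is plain C on the bit patterns, not the libgcc assembly:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if BITS encodes an IEEE 754 double NaN: exponent all ones and
   a nonzero 52-bit mantissa.  */
static bool is_nan_bits (uint64_t bits)
{
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & UINT64_C (0x000fffffffffffff);
  return exponent == 0x7ff && mantissa != 0;
}

/* True if the high 32-bit word alone already shows a NaN, i.e. the
   exponent is all ones and the upper 20 mantissa bits are not all zero.  */
static bool high_word_shows_nan (uint64_t bits)
{
  uint32_t high = (uint32_t) (bits >> 32);
  return (high & 0x7ff00000u) == 0x7ff00000u && (high & 0x000fffffu) != 0;
}
```

The usual quiet NaN 0x7ff8_0000_0000_0000 is visible from the high word, while the assumed FPX pattern has the same high word as infinity and only differs in the low word.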



Re: [PATCHv2 0/7] ARC: Add support for nps400 variant

2016-04-29 Thread Andrew Burgess
* Claudiu Zissulescu  [2016-04-29 09:03:53 
+]:

> I see the next tests failing:
> 
> FAIL: gcc.target/arc/movb-1.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
> *r[0-5]+, *19, *21, *8
> FAIL: gcc.target/arc/movb-2.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
> *r[0-5]+, *23, *23, *9
> FAIL: gcc.target/arc/movb-5.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+, 
> *r[0-5]+, *23, *(23|7), *9
> FAIL: gcc.target/arc/movh_cl-1.c scan-assembler movh.cl r[0-9]+,0xc000>>16
> 
> Please can you confirm, and if it is the case please fix them.

I will investigate these today.

Thanks,
Andrew



Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Joern Wolfgang Rennecke



On 29/04/16 11:16, Claudiu Zissulescu wrote:

This is the updated patch on handling FPX NaNs.

Ok to apply?
Claudiu



 OK.


Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Joern Wolfgang Rennecke

P.S.: the .d suffix on the branch was there just for scheduling purposes -
not sure if that actually helped any chip's pipeline, or if it was just
a bug in the documentation.


RE: [PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Claudiu Zissulescu
It should do the job, at least for EM, where the jump takes 2 cycles; by
using delay slots we can make all the cycles count. HS has a branch
prediction mechanism, hence filling the delay slot doesn't have as big
an impact as on EM or even earlier CPUs.

//Claudiu

> -Original Message-
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Friday, April 29, 2016 12:27 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: francois.bed...@synopsys.com; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point
> library.
> 
> P.S.: the .d suffix on the branch was there just for scheduling purposes -
> not sure if that actually helped any chip's pipeline, or if it was just
> a bug
> in the documentation.


Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-29 Thread Ilya Enkovich
On 28 Apr 15:59, Richard Biener wrote:
> On Thu, Apr 28, 2016 at 3:26 PM, Ilya Enkovich  wrote:
> > On 27 Apr 16:05, Richard Biener wrote:
> >> >>
> >> >> I'd like to see testcases covering the corner-cases - have them have
> >> >> upper bound estimates by adjusting known array sizes and also cover
> >> >> the case of peeling for gaps.
> >> >
> >> > OK, I'll make more tests.
> >> > Thanks,
> >> > Ilya
> >> >
> >> >>
> >> >> Richard.
> >> >>
> >
> > Could you please look at new tests?  I added one simple case with
> > known array size and similar tests with a peeling for gaps w/ and
> > w/o vector iteration peeled.
> >
> > Checked new tests with RUNTESTFLAGS="vect.exp=vect-nb-iter-ub-* 
> > --target_board=unix{-m32,}
> > on x86_64-pc-linux-gnu.  OK for trunk?
> 
> Can you make the new testcases runtime ones, thus check that the
> vectorized outcome
> is ok (so we don't forget any trailing iterations)?
> 
> Ok with that change.
> 
> Richard.
> 

Thanks for review!  Here is a version I'm going to apply.  I also changed tests
to be sse2 instead of avx512bw to make them more usable on current HW.

Thanks,
Ilya
--
gcc/

2016-04-29  Ilya Enkovich  

* tree-vect-loop.c (vect_transform_loop): Fix
nb_iterations_upper_bound computation for vectorized loop.

gcc/testsuite/

2016-04-29  Ilya Enkovich  

* gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
optimization of vector loop.
* gcc.target/i386/vect-unpack-3.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-1.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-2.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-3.c: New test.


diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
new file mode 100644
index 000..456866d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse2 -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[31];
+char cc[31] =
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
+20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 };
+
+void __attribute__((noinline,noclone))
+foo (int s)
+{
+  int i;
+  for (i = 0; i < s; i++)
+ii[i] = (int) cc[i];
+}
+
+int main (int argc, const char **argv)
+{
+  int i;
+  foo (31);
+  for (i = 0; i < 31; i++)
+if (ii[i] != i)
+  __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
"cunroll" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "loop with 2 iterations completely 
unrolled" "cunroll" { target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
new file mode 100644
index 000..cf1c1ef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse2 -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[32];
+char cc[66] =
+  { 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
+10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
+20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
+30, 0, 31, 0 };
+
+void __attribute__((noinline,noclone))
+foo (int s)
+{
+  int i;
+   for (i = 0; i < s; i++)
+ ii[i] = (int) cc[i*2];
+}
+
+int main (int argc, const char **argv)
+{
+  int i;
+  foo (32);
+  for (i = 0; i < 32; i++)
+if (ii[i] != i)
+  __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
"cunroll" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "loop with 2 iterations completely 
unrolled" "cunroll" { target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
new file mode 100644
index 000..d8fe307
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse2 -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[33];
+char cc[66] =
+  { 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
+10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
+20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
+30, 0, 31, 0, 32, 0 };
+
+void __attribute__((noinline,noclone))
+foo (int s)
+

Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Joern Wolfgang Rennecke



On 29/04/16 11:31, Claudiu Zissulescu wrote:

It should do the job, at least for EM where the jump takes 2 cycles, and by
means of using delay slots we can make all the cycles count. HS has a branch
prediction mechanism, hence, filling up the delay slot doesn't have such a big
impact like in EM or even earlier cpus.

No, the alternative is to hide the delay slot, so if the branch is
predicted properly, the case with different high words should be faster
without the .d suffix.

I.e., eagerly filling the delay slot like this has a bigger - negative -
impact on performance.


Re: fix libsanitizer build on ppc-linux

2016-04-29 Thread Olivier Hainque

> On Apr 29, 2016, at 10:34 , Jakub Jelinek  wrote:
> 
> 
> No, for these files we aren't upstream and just periodically merge stuff
> from there.  So, you should try to discuss this in asan upstream and get
> a fix committed there and we can then merge it and/or cherry-pick it.
> Please see libsanitizer/README.gcc.

Oh, I see. Thanks.

> Also, it would be nice to understand what is different on your powerpc-linux
> from everybody's else where it works, do you have too old or too new kernel
> headers, something different?

I have no precise idea. It's not a recent machine and the OS there is
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

I'll discuss with the compiler-rt folks.

Thanks for your feedback,

Olivier



Re: [SH][committed] Remove SH5 support in compiler

2016-04-29 Thread Oleg Endo
On Thu, 2016-04-28 at 10:27 +0900, Oleg Endo wrote:

> The removal of SH5 support from GCC has been announced here
> https://gcc.gnu.org/ml/gcc/2015-08/msg00101.html
> 
> The attached patch removes support for SH5 in the compiler back end. 
>  There are still some leftovers and new simplification opportunities.
>  These will be addressed in later follow up patches.
> 
> Tested on sh-elf with
> 
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
> -m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

The attached patch removes some leftovers and reinstates the divsf3
expander pattern which got accidentally deleted by the previous patch.

Tested as above, committed as r235632.

Cheers,
Oleg

include/ChangeLog:
* longlong.h (umul_ppmm): Remove SHMEDIA checks.
(__umulsidi3, count_leading_zeros): Remove SHMEDIA implementations.

gcc/ChangeLog:
* common/config/sh/sh-common.c (sh_option_optimization_table): Remove
remaining SH5 related settings.
* config/sh/sh-protos.h (shmedia_cleanup_truncate,
shmedia_prepare_call_address): Delete.
* config/sh/sh.c (sh_print_operand, output_stack_adjust,
DWARF_CIE_DATA_ALIGNMENT, LOCAL_ALIGNMENT): Update comments.
* config/sh/sh.h (SUBTARGET_ASM_RELAX_SPEC,
UNSUPPORTED_SH2A): Remove m5 checks.
(sh_divide_strategy_e): Remove SH5 division strategies.
(TARGET_PTRMEMFUNC_VBIT_LOCATION): Remove and use default.
	* config/sh/sh.md (divsf3): Reinstate define_expand pattern.

diff --git a/gcc/common/config/sh/sh-common.c b/gcc/common/config/sh/sh-common.c
index ee6e4c9..d7c91b8 100644
--- a/gcc/common/config/sh/sh-common.c
+++ b/gcc/common/config/sh/sh-common.c
@@ -31,15 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 static const struct default_options sh_option_optimization_table[] =
   {
 { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
-{ OPT_LEVELS_1_PLUS_SPEED_ONLY, OPT_mdiv_, "inv:minlat", 1 },
 { OPT_LEVELS_SIZE, OPT_mdiv_, SH_DIV_STR_FOR_SIZE, 1 },
 { OPT_LEVELS_0_ONLY, OPT_mdiv_, "", 1 },
-/* We can't meaningfully test TARGET_SHMEDIA here, because -m
-   options haven't been parsed yet, hence we'd read only the
-   default.  sh_target_reg_class will return NO_REGS if this is
-   not SHMEDIA, so it's OK to always set
-   flag_branch_target_load_optimize.  */
-{ OPT_LEVELS_2_PLUS, OPT_fbranch_target_load_optimize, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index 537ab39..ea7e847 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -392,11 +392,8 @@ extern void sh_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree,
  signed int, machine_mode);
 extern rtx sh_dwarf_register_span (rtx);
 
-extern int shmedia_cleanup_truncate (rtx);
-
 extern bool sh_contains_memref_p (rtx);
 extern bool sh_loads_bankedreg_p (rtx);
-extern rtx shmedia_prepare_call_address (rtx fnaddr, int is_sibcall);
 extern int sh2a_get_function_vector_number (rtx);
 extern bool sh2a_is_function_vector_call (rtx);
 extern void sh_fix_range (const char *);
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index e6680af..3d9ce9d 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -1177,9 +1177,6 @@ sh_print_operand (FILE *stream, rtx x, int code)
   output_addr_const (stream, x);
   break;
 /* N.B.: %R / %S / %T adjust memory addresses by four.
-   For SHMEDIA, that means they can be used to access the first and
-   second 32 bit part of a 64 bit (or larger) value that
-   might be held in floating point registers or memory.
While they can be used to access 64 bit parts of a larger value
held in general purpose registers, that won't work with memory -
neither for fp registers, since the frxx names are used.  */
@@ -6748,15 +6745,12 @@ output_stack_adjust (int size, rtx reg, int epilogue_p,
 	  rtx adj_reg, tmp_reg, mem;
 	  
 	  /* If we reached here, the most likely case is the (sibcall)
-		 epilogue for non SHmedia.  Put a special push/pop sequence
-		 for such case as the last resort.  This looks lengthy but
-		 would not be problem because it seems to be very
-		 rare.  */
-	  
+		 epilogue.  Put a special push/pop sequence for such case as
+		 the last resort.  This looks lengthy but would not be problem
+		 because it seems to be very rare.  */
 	  gcc_assert (epilogue_p);
-	  
 
-	   /* ??? There is still the slight possibility that r4 or
+	  /* ??? There is still the slight possibility that r4 or
 		  r5 have been reserved as fixed registers or assigned
 		  as global registers, and they change during an
 		  interrupt.  There are possible ways to handle this:
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 0303527..9f104f0 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -206,7 +206,7 @@ extern int code_for_indir

Re: [RFC][PR61839]Convert CST BINOP COND_EXPR to COND_EXPR ? (CST BINOP 1) : (CST BINOP 0)

2016-04-29 Thread Richard Biener
On Sun, Apr 17, 2016 at 1:14 AM, kugan
 wrote:
> As explained in PR61839,
>
> Following difference results in extra instructions:
> -  c = b != 0 ? 486097858 : 972195717;
> +  c = a + 972195718 >> (b != 0);
>
> As suggested in PR, attached patch converts CST BINOP COND_EXPR to COND_EXPR
> ? (CST BINOP 1) : (CST BINOP 0).
>
> Bootstrapped and regression tested for x86-64-linux-gnu with no new
> regression. Is this OK for stage-1?

You are missing a testcase.

I think the transform can be generalized to any two-value value-range by
instead of

  lhs = cond_res ? (cst binop 1) : (cst binop 0)

emitting

  lhs = tmp == val1 ? (cst binop val1) : (cst binop val2);

In the PR I asked the transform to be only carried out if cond_res and
tmp have a single use (and thus they'd eventually vanish).

I'm not sure if a general two-value "constant" propagation is profitable
which is why I was originally asking for the pattern to only apply
if the resulting value is used in a comparison which we could then
in turn simplify by substituting COND_RES (or ! COND_RES) for it.
For the general two-value case we'd substitute it with tmp [=!]= val[12]
dependent on which constant is cheaper to test for.

So I think this needs some exploring work on which way to go
and which transform is profitable in the end.  I think the general
two-value case feeding a condition will be always profitable.
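
As a rough illustration of the two forms (plain C, not the tree-vrp.c code), using the constants from the PR.  The third function sketches the generalized two-value case; its values 3 and 5 and the division are made up for the example:

```c
/* Before the transform: a constant shifted by a 0/1 condition.  */
static int shift_by_cond (int b)
{
  return 972195717 >> (b != 0);
}

/* After the transform: both outcomes are folded at compile time.  */
static int select_folded (int b)
{
  return b != 0 ? 486097858 : 972195717;
}

/* Generalization to a two-value range: TMP is assumed to be known to
   be either 3 or 5, so CST BINOP TMP has only two possible results.  */
static int select_two_value (int tmp)
{
  return tmp == 3 ? (1000 / 3) : (1000 / 5);
}
```

Whether the second form is actually cheaper depends on how the result is used, which is the open question above.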

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-04-17  Kugan Vivekanandarajah  
>

Add  PR tree-optimization/61839

> * tree-vrp.c (simplify_stmt_using_ranges): Convert CST BINOP
> COND_EXPR to
> COND_EXPR ? (CST BINOP 1) : (CST BINOP 0) when possible.


RE: [PATCH] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-29 Thread Claudiu Zissulescu
> > It should do the job, at least for EM where the jump takes 2 cycle, and by
> means of using delay slots we can make all the cycles count. HS has a branch
> prediction mechanism, hence, filling up the delay slot doesn't have such a big
> impact like in EM or even earlier cpus.
> No, the alternative is to hide the delay slot, so if the branch is
> predicted properly, the case with
> different high words should be faster without the .d suffix.
> 
> I.e. , eagerly filling the delay slot like this has a bigger - negative
> - impact on performance.


If we are talking about HS, then we can add another flag 'T', which should
instruct the branch prediction that we expect this branch to be taken.
However, I haven't seen any impact of this flag on the code, and the compiler
generates it. In general, the HS branch prediction has some particularities.
Although what you say makes perfect sense, I am almost sure it doesn't apply
in the case of HS because of the way it is implemented. But this is a good
point; I will try to keep it in mind and ask the hw guys what is best.

//Claudiu


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 12:17 PM, Eric Botcazou  wrote:
>> I'm testing the attached patch. Does it fix your ada failures?
>
> No, it totally breaks stack checking. :-(

Eh, I was trying to be too clever.

Attached patch was actually tested on a couple of cases. It generates
the same assembly as before.

Uros.
Index: i386.md
===
--- i386.md (revision 235620)
+++ i386.md (working copy)
@@ -88,6 +88,7 @@
   UNSPEC_SET_GOT_OFFSET
   UNSPEC_MEMORY_BLOCKAGE
   UNSPEC_STACK_CHECK
+  UNSPEC_PROBE_STACK
 
   ;; TLS support
   UNSPEC_TP
@@ -17552,6 +17553,29 @@
   DONE;
 })
 
+(define_expand "probe_stack"
+  [(match_operand 0 "memory_operand")]
+  ""
+{
+  rtx (*insn) (rtx)
+= (GET_MODE (operands[0]) == DImode
+   ? gen_probe_stack_di : gen_probe_stack_si);
+
+  emit_insn (insn (operands[0]));
+  DONE;
+})
+
+;; Use OR for stack probes, this is shorter.
+(define_insn "probe_stack_"
+  [(set (match_operand:W 0 "memory_operand" "=m")
+   (unspec:W [(const_int 0)] UNSPEC_PROBE_STACK))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "or{}\t{$0, %0|%0, 0}"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "")
+   (set_attr "length_immediate" "1")])
+  
 (define_insn "adjust_stack_and_probe"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec_volatile:P [(match_operand:P 1 "register_operand" "0")]


Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-04-29 Thread Richard Biener
On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
> Hi,
> Tree if-conversion sometimes cannot convert conditional array reference into 
> unconditional one.  Root cause is GCC conservatively assumes newly introduced 
> array reference could be out of array bound and thus trapping.  This patch 
> improves the situation by proving the converted unconditional array reference 
> is within array bound using loop niter information.  To be specific, it 
> checks every index of array reference to see if it's within bound in 
> ifcvt_memrefs_wont_trap.  This patch also factors out base_object_writable 
> checking if the base object is writable or not.
> Bootstrap and test on x86_64 and aarch64, is it OK?

I think you fail to handle optimally the case where the only
non-ARRAY_REF idx is the dereference of the
base pointer for, say, p->a[i].  In this case we can use
base_master_dr to see if p is unconditionally dereferenced
in the loop.  You also fail to handle the case where we have
MEM_REF[&x].a[i], that is, where you see a decl base.
I suppose for_each_index should be fixed for this particular case (to
return true), and likewise for TARGET_MEM_REF TMR_BASE.

+  /* The case of nonconstant bounds could be handled, but it would be
+ complicated.  */
+  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
+  || !high || TREE_CODE (high) != INTEGER_CST)
+return false;
+

Handling of a non-zero but constant low bound is important; otherwise
all this is a no-op for Fortran.  It
shouldn't be too difficult to handle after all.  In fact I think your
code already handles it correctly.

+  if (!init || TREE_CODE (init) != INTEGER_CST
+  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
+return false;

step == 0 should be easy to handle as well, no?  The index will simply
always be 'init' ...
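
A minimal sketch of the kind of bound check under discussion, assuming the
index has the form init + step * i for i in [0, niter].  The name
index_within_bounds and the use of plain 64-bit integers are illustrative
assumptions only; the patch itself works on trees.  It shows that a non-zero
constant low bound and step == 0 both fall out of the same comparison.

```cpp
#include <cstdint>

// Hypothetical helper, not part of GCC: returns true if every value of
// init + step * i for i in [0, niter] lies within [low, high].
// Assumes the inputs are small enough that init + step * niter does not
// overflow int64_t.
bool index_within_bounds(int64_t init, int64_t step, int64_t niter,
                         int64_t low, int64_t high)
{
  if (niter < 0 || low > high)
    return false;
  // With step == 0 this is simply 'init', so the index never moves.
  int64_t last = init + step * niter;
  int64_t smallest = init < last ? init : last;
  int64_t largest = init < last ? last : init;
  return low <= smallest && largest <= high;
}
```

Because the sequence is monotonic, checking the first and last values is
enough; a Fortran-style array with low bound 1 is covered by passing low = 1.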

+  /* In case the relevant bound of the array does not fit in type, or
+ it does, but bound + step (in type) still belongs into the range of the
+ array, the index may wrap and still stay within the range of the array
+ (consider e.g. if the array is indexed by the full range of
+ unsigned char).
+
+ To make things simpler, we require both bounds to fit into type, although
+ there are cases where this would not be strictly necessary.  */
+  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
+return false;
+
+  low = fold_convert (type, low);

please use wide_int for all of this.

I wonder if we can do something for wrapping IVs like

int a[2048];

for (int i = 0; i < 4096; ++i)
  ... a[(unsigned char)i];

as well.  For instance, if the IV type's max and min values are within the
array bounds, simply return true?
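
A sketch of that idea, under the assumption that the index is first converted
to a narrow integer type T.  The name type_range_within_bounds is
hypothetical and this is not GCC code; it only illustrates the comparison of
the type's value range against the array bounds.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical check, not GCC code: an index converted to the narrow
// type T can only take values in [min(T), max(T)].  If that whole range
// lies inside [low, high], the access cannot go out of bounds, however
// the wider induction variable wraps.
template <typename T>
constexpr bool type_range_within_bounds(int64_t low, int64_t high)
{
  return low <= static_cast<int64_t>(std::numeric_limits<T>::min())
         && static_cast<int64_t>(std::numeric_limits<T>::max()) <= high;
}
```

For the example above, unsigned char covers 0..255, which is inside the
0..2047 range of a[2048], so the check succeeds without looking at niter.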

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-04-28  Bin Cheng  
>
> * tree-if-conv.c (tree-ssa-loop.h): Include header file.
> (tree-ssa-loop-niter.h): Ditto.
> (idx_within_array_bound, ref_within_array_bound): New functions.
> (ifcvt_memrefs_wont_trap): Check if array ref is within bound.
> Factor out check on writable base object to ...
> (base_object_writable): ... here.


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Eric Botcazou
> Attached patch was actually tested on a couple of cases. It generates
> the same assembly as before.

Note that you could just remove the second ":W" in the define_insn pattern.

That's better, but not quite it because this segfaults at -O2:

#0  memory_operand (op=0xabababababababab, mode=mode@entry=VOIDmode)
at /home/eric/svn/gcc/gcc/recog.c:1360
#1  0x014388b1 in get_attr_memory (insn=insn@entry=0x7697b8c0)
at /home/eric/svn/gcc/gcc/config/i386/i386.md:2120
#2  0x01636fb8 in insn_default_latency_generic (insn=0x7697b8c0)
at /home/eric/svn/gcc/gcc/config/i386/i386.md:27394
#3  0x017f9695 in insn_cost (insn=0x7697b8c0)
at /home/eric/svn/gcc/gcc/haifa-sched.c:1415
#4  0x017feb75 in dep_cost_1 (link=link@entry=0x2e962e8, 
dw=dw@entry=0)
at /home/eric/svn/gcc/gcc/haifa-sched.c:1468
#5  0x01800d7a in dep_cost (link=0x2e962e8)
at /home/eric/svn/gcc/gcc/haifa-sched.c:1523
#6  priority (insn=0x7697b8c0) at /home/eric/svn/gcc/gcc/haifa-sched.c:1674
#7  0x01800e6f in set_priorities (head=, 
tail=) at /home/eric/svn/gcc/gcc/haifa-sched.c:7209
#8  0x00f689e3 in compute_priorities ()
at /home/eric/svn/gcc/gcc/sched-rgn.c:3022
#9  0x00f6bc46 in schedule_region (rgn=0)
at /home/eric/svn/gcc/gcc/sched-rgn.c:3115
#10 schedule_insns () at /home/eric/svn/gcc/gcc/sched-rgn.c:3513
#11 0x00f6c4de in schedule_insns ()

(gdb) frame 1
#1  0x014388b1 in get_attr_memory (insn=insn@entry=0x7697b8c0)
at /home/eric/svn/gcc/gcc/config/i386/i386.md:2120
2120   (match_test "TARGET_AVX")
(gdb) p debug_rtx(insn)
(insn 779 927 928 2 (parallel [
(set (mem/v:DI (reg/f:DI 7 sp) [0  S8 A8])
(unspec:DI [
(const_int 0 [0])
] UNSPEC_PROBE_STACK))
(clobber (reg:CC 17 flags))
]) c52104y.adb:57 1005 {probe_stack_di}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
$1 = void

-- 
Eric Botcazou


Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-29 Thread Ilya Enkovich
2016-04-29 12:48 GMT+03:00 Uros Bizjak :
> On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich  
> wrote:
>
>> That's what I have in my draft for DImode immediates:
>>
>> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates,
>> unsigned insn_uid)
>>BITMAP_FREE (queue);
>>  }
>>
>> +/* Return a cost of building a vector constant
>> +   instead of using a scalar one.  */
>> +
>> +int
>> +scalar_chain::vector_const_cost (rtx exp)
>> +{
>> +  gcc_assert (CONST_INT_P (exp));
>> +
>> +  if (const0_operand (exp, GET_MODE (exp))
>> +  || constm1_operand (exp, GET_MODE (exp)))
>
> The above should just use
>
> standard_sse_constant_p (exp, V2DImode).

Thanks for the tip!  Surprisingly this replacement caused a different
cost for non-standard constants.  Looking at it in GDB I found:

(gdb) p exp
$3 = (rtx) 0x77f0b560
(gdb) pr
warning: Expression is not an assignment (and might have no effect)
(const_int -1085102592571150096 [0xf0f0f0f0f0f0f0f0])
(gdb) p constm1_operand (exp,GET_MODE (exp))
$4 = 1

Do I misuse constm1_operand?

Thanks,
Ilya

>
> Uros.


Re: [PATCH] Re-use cc1-checksum.c for stage-final

2016-04-29 Thread Richard Biener
On Thu, 28 Apr 2016, Jeff Law wrote:

> On 04/28/2016 02:49 AM, Richard Biener wrote:
> > 
> > The following prototype patch re-uses cc1-checksum.c from the
> > previous stage when compiling stage-final.  This eventually
> > allows to compare cc1 from the last two stages to fix the
> > lack of a true comparison when doing LTO bootstrap (it
> > compiles LTO bytecode from the compile-stage there, not the
> > final optimization result).
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu.
> > 
> > When stripping gcc/cc1 and prev-gcc/cc1 after the bootstrap
> > they now compare identical (with LTO bootstrap it should
> > not require stripping as that doesn't do a bootstrap-debug AFAIK).
> > 
> > Is sth like this acceptable?  (consider it also done for cp/Make-lang.in)
> > 
> > In theory we can compare all stage1 languages but I guess comparing
> > the required ones for a LTO bootstrap, cc1, cc1plus and lto1 would
> > be sufficient (or even just comparing one binary in which case
> > comparing lto1 would not require any patches).
> > 
> > This also gets rid of the annoying warning that cc1-checksum.o
> > differs (obviously).
> > 
> > Thanks,
> > Richard.
> > 
> > 2016-04-28  Richard Biener  
> > 
> > c/
> > * Make-lang.in (cc1-checksum.c): For stage-final re-use
> > the checksum from the previous stage.
> I won't object if you add a comment into the fragment indicating why you're
> doing this.

So the following is a complete patch (not considering that people may
add objc or obj-c++ to the stage1 languages).  Built with --disable-bootstrap,
bootstrapped and profiledbootstrapped, verifying that it works as
intended (looks like we don't compare with profiledbootstrap - huh,
we're building stagefeedback only once).

Ok for trunk?

Step 2 will now be to figure out how to also compare cc1 (for example)
when using bootstrap-lto ... (we don't want to do this unconditionally
as it is a waste of time when the objects are not only LTO bytecode).

Thanks,
Richard.

2016-04-29  Richard Biener  

c/
* Make-lang.in (cc1-checksum.c): For stage-final re-use
the checksum from the previous stage.

cp/
* Make-lang.in (cc1plus-checksum.c): For stage-final re-use
the checksum from the previous stage.

Index: gcc/c/Make-lang.in
===
*** gcc/c/Make-lang.in  (revision 235623)
--- gcc/c/Make-lang.in  (working copy)
*** c_OBJS = $(C_OBJS) cc1-checksum.o c/gccs
*** 61,71 
  c-warn = $(STRICT_WARN)
  
  # compute checksum over all object files and the options
  cc1-checksum.c : build/genchecksum$(build_exeext) checksum-options \
$(C_OBJS) $(BACKEND) $(LIBDEPS) 
!   build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
   checksum-options > cc1-checksum.c.tmp && 
 \
!   $(srcdir)/../move-if-change cc1-checksum.c.tmp cc1-checksum.c
  
  cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $(C_OBJS) \
--- 61,78 
  c-warn = $(STRICT_WARN)
  
  # compute checksum over all object files and the options
+ # re-use the checksum from the prev-final stage so it passes
+ # the bootstrap comparison and allows comparing of the cc1 binary
  cc1-checksum.c : build/genchecksum$(build_exeext) checksum-options \
$(C_OBJS) $(BACKEND) $(LIBDEPS) 
!   if [ -f ../stage_final ] \
!  && cmp -s ../stage_current ../stage_final; then \
! cp ../prev-gcc/cc1-checksum.c cc1-checksum.c; \
!   else \
! build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
   checksum-options > cc1-checksum.c.tmp && 
 \
! $(srcdir)/../move-if-change cc1-checksum.c.tmp cc1-checksum.c; \
!   fi
  
  cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $(C_OBJS) \
Index: gcc/cp/Make-lang.in
===
*** gcc/cp/Make-lang.in (revision 235623)
--- gcc/cp/Make-lang.in (working copy)
*** c++_OBJS = $(CXX_OBJS) cc1plus-checksum.
*** 90,100 
  cp-warn = $(STRICT_WARN)
  
  # compute checksum over all object files and the options
  cc1plus-checksum.c : build/genchecksum$(build_exeext) checksum-options \
$(CXX_OBJS) $(BACKEND) $(LIBDEPS) 
!   build/genchecksum$(build_exeext) $(CXX_OBJS) $(BACKEND) $(LIBDEPS) \
   checksum-options > cc1plus-checksum.c.tmp &&\
!   $(srcdir)/../move-if-change cc1plus-checksum.c.tmp cc1plus-checksum.c
  
  cc1plus$(exeext): $(CXX_OBJS) cc1plus-checksum.o $(BACKEND) $(LIBDEPS)
+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
--- 90,107 
  cp-warn = $(STRICT_WARN)
  
  # compute checksum over all object files and the options
+ # re-use the checksum from the prev-final stage so it passes
+ # the bootstrap comparison and allows co

Re: Allow redefinition of libcilkrts debug macros

2016-04-29 Thread Ilya Verbin
Hi Rainer!

On Fri, Apr 29, 2016 at 10:58:25 +0200, Rainer Orth wrote:
> > On 04/26/2016 08:04 AM, Rainer Orth wrote:
> >> When working on a couple of Cilk Plus issues lately (PRs target/60290,
> >> target/68945), I noticed that you have to modify the libcilkplus sources
> >> to enable various debugging output.  This seems silly, and the following
> >> patch allows defining them from the command line.
> >>
> >> Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12.
> >>
> >> Ok for mainline?
> >>
> >>Rainer
> >>
> >>
> >> 2016-04-07  Rainer Orth  
> >>
> >>* runtime/except-gcc.cpp (DEBUG_EXCEPTIONS): Allow redefinition.
> >>* runtime/cilk_fiber.h (FIBER_DEBUG): Likewise.
> >>* runtime/scheduler.h (REDPAR_DEBUG): Likewise.
> > Ilya will have to chime in here -- we're a downstream consumer of the Cilk+
> > runtime.  So these patches need to go into Intel's tree first, then Ilya
> > can bring them into the GCC tree.
> 
> I suspected that much.  It would be good to have a libcilkrts/README.gcc
> describing the rules which changes can go into the gcc tree directly,
> which need to go upstream first, and how.  libgo and libsanitizer already
> have this.

Could you please submit your patch to 
?
All patches for libcilkrts/* should go there first in order to avoid possible
license issues, or possible losses during the merge.

Thanks,
  -- Ilya


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 1:23 PM, Eric Botcazou  wrote:
>> Attached patch was actually tested on a couple of cases. It generates
>> the same assembly as before.
>
> Note that you could just remove the second ":W" in the define_insn pattern.

Eh, then the build yawns about the missing mode of the input operand.

> That's better, but not quite it because this segfaults at -O2:

This is getting a bit frustrating, but attached patch should solve
this failure. Again lightly tested, regtest in progress.

Uros.
Index: i386.md
===
--- i386.md (revision 235620)
+++ i386.md (working copy)
@@ -88,6 +88,7 @@
   UNSPEC_SET_GOT_OFFSET
   UNSPEC_MEMORY_BLOCKAGE
   UNSPEC_STACK_CHECK
+  UNSPEC_PROBE_STACK
 
   ;; TLS support
   UNSPEC_TP
@@ -17552,6 +17553,30 @@
   DONE;
 })
 
+(define_expand "probe_stack"
+  [(match_operand 0 "memory_operand")]
+  ""
+{
+  rtx (*insn) (rtx, rtx)
+= (GET_MODE (operands[0]) == DImode
+   ? gen_probe_stack_di : gen_probe_stack_si);
+
+  emit_insn (insn (operands[0], const0_rtx));
+  DONE;
+})
+
+;; Use OR for stack probes, this is shorter.
+(define_insn "probe_stack_<mode>"
+  [(set (match_operand:W 0 "memory_operand" "=m")
+   (unspec:W [(match_operand:W 1 "const0_operand")]
+ UNSPEC_PROBE_STACK))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "or{<imodesuffix>}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "<MODE>")
+   (set_attr "length_immediate" "1")])
+  
 (define_insn "adjust_stack_and_probe"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec_volatile:P [(match_operand:P 1 "register_operand" "0")]


Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 1:26 PM, Ilya Enkovich  wrote:
> 2016-04-29 12:48 GMT+03:00 Uros Bizjak :
>> On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich  
>> wrote:
>>
>>> That's what I have in my draft for DImode immediates:
>>>
>>> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates,
>>> unsigned insn_uid)
>>>BITMAP_FREE (queue);
>>>  }
>>>
>>> +/* Return a cost of building a vector constant
>>> +   instead of using a scalar one.  */
>>> +
>>> +int
>>> +scalar_chain::vector_const_cost (rtx exp)
>>> +{
>>> +  gcc_assert (CONST_INT_P (exp));
>>> +
>>> +  if (const0_operand (exp, GET_MODE (exp))
>>> +  || constm1_operand (exp, GET_MODE (exp)))
>>
>> The above should just use
>>
>> standard_sse_constant_p (exp, V2DImode).
>
> Thanks for the tip!  Surprisingly this replacement caused a different
> cost for non-standard constants.  Looking at it in GDB I found:
>
> (gdb) p exp
> $3 = (rtx) 0x77f0b560
> (gdb) pr
> warning: Expression is not an assignment (and might have no effect)
> (const_int -1085102592571150096 [0xf0f0f0f0f0f0f0f0])
> (gdb) p constm1_operand (exp,GET_MODE (exp))
> $4 = 1
>
> Do I misuse constm1_operand?

No, it is just a typo that crept into constm1_operand:

;; Match exactly -1.
(define_predicate "constm1_operand"
  (and (match_code "const_int")
   (match_test "op = constm1_rtx")))

There should be a test, not an assignment.
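
To illustrate why the predicate accepted 0xf0f0f0f0f0f0f0f0, here is a small
stand-alone sketch.  The rtx_def struct and the constm1_rtx variable below
are simplified stand-ins invented for this example, not GCC's real
definitions; the sketch assumes only that the constant -1 is a single shared
object, so comparing pointers is the intended test.

```cpp
// Stand-ins for GCC's rtx type and constm1_rtx; purely illustrative.
struct rtx_def { long value; };
typedef rtx_def *rtx;

static rtx_def minus_one = { -1 };
static rtx constm1_rtx = &minus_one;

// Buggy form: '=' overwrites op with constm1_rtx, and the result is
// non-null, so the predicate accepts every operand.
bool constm1_buggy(rtx op) { return (op = constm1_rtx) != nullptr; }

// Intended form: '==' compares op against the shared -1 constant.
bool constm1_fixed(rtx op) { return op == constm1_rtx; }
```

With the assignment, any const_int passes; with the comparison, only the
shared -1 object does.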

Uros.


[PATCH 2/3] Add profiling support for IVOPTS

2016-04-29 Thread marxin
gcc/ChangeLog:

2016-04-25  Martin Liska  

* tree-ssa-loop-ivopts.c (struct comp_cost): Introduce
m_cost_scaled and m_frequency fields.
(comp_cost::operator=): Assign to m_cost_scaled.
(operator+): Likewise.
(comp_cost::operator+=): Likewise.
(comp_cost::operator-=): Likewise.
(comp_cost::operator/=): Likewise.
(comp_cost::operator*=): Likewise.
(operator-): Likewise.
(comp_cost::set_cost): Likewise.
(comp_cost::get_cost_scaled): New function.
(comp_cost::calculate_scaled_cost): Likewise.
(comp_cost::propagate_scaled_cost): Likewise.
(comp_cost::get_frequency): Likewise.
(comp_cost::scale_cost): Likewise.
(comp_cost::has_frequency): Likewise.
(get_computation_cost_at): Propagate ratio of frequencies
of loop header and another basic block.
(determine_group_iv_costs): Dump new fields.
---
 gcc/tree-ssa-loop-ivopts.c | 130 -
 1 file changed, 118 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1e68927..af00ff0 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -107,6 +107,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "builtins.h"
 #include "tree-vectorizer.h"
+#include "sreal.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
cost of different addressing modes.  This should be moved to a TBD
@@ -173,11 +174,13 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0),
+m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost (int cost, unsigned complexity)
-: m_cost (cost), m_complexity (complexity), m_scratch (0)
+: m_cost (cost), m_complexity (complexity), m_scratch (0),
+  m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost& operator= (const comp_cost& other);
@@ -236,6 +239,26 @@ struct comp_cost
   /* Set the scratch to S.  */
   void set_scratch (unsigned s);
 
+  /* Return scaled cost.  */
+  double get_cost_scaled ();
+
+  /* Calculate scaled cost based on frequency of a basic block with
+ frequency equal to NOMINATOR / DENOMINATOR.  */
+  void calculate_scaled_cost (int nominator, int denominator);
+
+  /* Propagate scaled cost which is based on frequency of basic block
+ the cost belongs to.  */
+  void propagate_scaled_cost ();
+
+  /* Return frequency of the cost.  */
+  double get_frequency ();
+
+  /* Scale COST by frequency of the cost.  */
+  const sreal scale_cost (int cost);
+
+  /* Return true if the frequency has a valid value.  */
+  bool has_frequency ();
+
   /* Return infinite comp_cost.  */
   static comp_cost get_infinite ();
 
@@ -249,6 +272,9 @@ private:
 complexity field should be larger for more
 complex expressions and addressing modes).  */
   int m_scratch; /* Scratch used during cost computation.  */
+  sreal m_frequency; /* Frequency of the basic block this comp_cost
+belongs to.  */
+  sreal m_cost_scaled;   /* Scaled runtime cost.  */
 };
 
 comp_cost&
@@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
   m_cost = other.m_cost;
   m_complexity = other.m_complexity;
   m_scratch = other.m_scratch;
+  m_frequency = other.m_frequency;
+  m_cost_scaled = other.m_cost_scaled;
 
   return *this;
 }
@@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
 
   cost1.m_cost += cost2.m_cost;
   cost1.m_complexity += cost2.m_complexity;
+  cost1.m_cost_scaled += cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -290,6 +319,8 @@ comp_cost
 comp_cost::operator+= (HOST_WIDE_INT c)
 {
   this->m_cost += c;
+  if (has_frequency ())
+this->m_cost_scaled += scale_cost (c);
 
   return *this;
 }
@@ -298,6 +329,8 @@ comp_cost
 comp_cost::operator-= (HOST_WIDE_INT c)
 {
   this->m_cost -= c;
+  if (has_frequency ())
+this->m_cost_scaled -= scale_cost (c);
 
   return *this;
 }
@@ -306,6 +339,8 @@ comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
   this->m_cost /= c;
+  if (has_frequency ())
+this->m_cost_scaled /= scale_cost (c);
 
   return *this;
 }
@@ -314,6 +349,8 @@ comp_cost
 comp_cost::operator*= (HOST_WIDE_INT c)
 {
   this->m_cost *= c;
+  if (has_frequency ())
+this->m_cost_scaled *= scale_cost (c);
 
   return *this;
 }
@@ -323,6 +360,7 @@ operator- (comp_cost cost1, comp_cost cost2)
 {
   cost1.m_cost -= cost2.m_cost;
   cost1.m_complexity -= cost2.m_complexity;
+  cost1.m_cost_scaled -= cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -366,6 +404,7 @@ void
 comp_cost::set_cost (int c)
 {
   m_cost = c;
+  m_cost_scaled = scale_cost (c);
 }
 
 unsigned
@@ -392,6 +431,48 @@ comp_cost::set_scratch (unsigned 

[PATCH 3/3] Enhance dumps of IVOPTS

2016-04-29 Thread marxin
gcc/ChangeLog:

2016-04-25  Martin Liska  

* tree-ssa-loop-ivopts.c (struct ivopts_data): Add inv_expr_map.
(tree_ssa_iv_optimize_init): Initialize it.
(get_expr_id): Assign expressions to the map.
(iv_ca_dump): Dump invariant expressions.
(create_new_ivs): Dump # of inv. expressions and loop niter.
(tree_ssa_iv_optimize_finalize): Release the newly added map.

gcc/testsuite/ChangeLog:

2016-04-29  Martin Liska  

* g++.dg/tree-ssa/ivopts-3.C: Change test-case to follow
the new format of dump output.
---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |  2 +-
 gcc/tree-ssa-loop-ivopts.c   | 28 
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C 
b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 6194e9d..eb72581 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* 
i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index af00ff0..52c8184 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -666,6 +666,9 @@ struct ivopts_data
   /* The maximum invariant expression id.  */
   int max_inv_expr_id;
 
+  /* Dictionary of inv_expr with id used as a key.  */
+  vec inv_expr_map;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -1186,6 +1189,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->important_candidates = BITMAP_ALLOC (NULL);
   data->max_inv_id = 0;
   data->niters = NULL;
+  data->inv_expr_map.create (20);
   data->vgroups.create (20);
   data->vcands.create (20);
   data->inv_expr_tab = new hash_table (10);
@@ -4812,6 +4816,12 @@ get_expr_id (struct ivopts_data *data, tree expr)
   (*slot)->expr = expr;
   (*slot)->hash = ent.hash;
   (*slot)->id = data->max_inv_expr_id++;
+
+  unsigned id = (*slot)->id;
+  if (id + 1 >= data->inv_expr_map.length ())
+data->inv_expr_map.safe_grow (id + 1);
+  data->inv_expr_map[id] = *slot;
+
   return (*slot)->id;
 }
 
@@ -6590,6 +6600,20 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct 
iv_ca *ivs)
fprintf (file, "%s%d", pref, i);
pref = ", ";
   }
+
+  if (ivs->num_used_inv_expr)
+{
+  fprintf (dump_file, "\n  used invariant expressions:\n");
+  for (int i = 0; i <= data->max_inv_expr_id; i++)
+   if (ivs->used_inv_expr[i])
+ {
+   fprintf (dump_file, "   inv_expr:%d: \t", i);
+   print_generic_expr (dump_file, data->inv_expr_map[i]->expr,
+   TDF_SLIM);
+   fprintf (dump_file, "\n");
+ }
+}
+
   fprintf (file, "\n\n");
 }
 
@@ -7251,6 +7275,9 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca 
*set)
   if (data->loop_loc != UNKNOWN_LOCATION)
fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 LOCATION_LINE (data->loop_loc));
+  fprintf (dump_file, ", %lu avg niters",
+  avg_loop_niter (data->current_loop));
+  fprintf (dump_file, ", %u expressions", set->num_used_inv_expr);
   fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
   EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
 {
@@ -7820,6 +7847,7 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   BITMAP_FREE (data->important_candidates);
 
   decl_rtl_to_reset.release ();
+  data->inv_expr_map.release ();
   data->vgroups.release ();
   data->vcands.release ();
   delete data->inv_expr_tab;
-- 
2.8.1



[PATCH 0/3] IVOPTS: support profiling

2016-04-29 Thread marxin
Hello.

As profile-guided optimization can provide very useful information
about basic block frequencies within a loop, the following patch set leverages
that information. It speeds up a single benchmark from the upcoming SPECv6
suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
also improve others (currently measuring numbers for PGO).

The idea is quite simple: each cost (belonging to a BB) is
multiplied by (bb_frequency / header_frequency), which suppresses IV uses
in basic blocks with a low frequency.
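
A minimal sketch of that scaling, assuming integer block frequencies.  The
function name scale_cost_by_frequency and the use of double are illustrative
assumptions; the patch itself keeps the scaled value in an sreal inside
comp_cost.

```cpp
// Hypothetical helper, not the patch's code: scale COST by the ratio of
// the basic block's frequency to the loop header's frequency.
double scale_cost_by_frequency(int cost, int bb_frequency,
                               int header_frequency)
{
  if (header_frequency <= 0)
    return cost;  // no usable profile: leave the cost unscaled
  return cost * (static_cast<double>(bb_frequency) / header_frequency);
}
```

A use with cost 10 in a block executed half as often as the header would
then count as 5, so candidates that are only expensive on cold paths are
penalized less.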

The patch set bootstraps on ppc64le-linux-gnu (and also
x86_64-linux-gnu) and introduces no new regressions.

Ready for trunk?
Thanks,
Martin

marxin (3):
  Encapsulate comp_cost within a class with methods.
  Add profiling support for IVOPTS
  Enhance dumps of IVOPTS

 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c   | 690 ++-
 2 files changed, 491 insertions(+), 201 deletions(-)

-- 
2.8.1



Re: [PATCHv2 2/7] gcc/arc: Replace rI constraint with r & Cm2 for ld and update insns

2016-04-29 Thread Andrew Burgess
* Joern Wolfgang Rennecke  [2016-04-28 18:06:42 +0100]:

> On 21/04/16 12:39, Andrew Burgess wrote:
> >
> > * config/arc/arc.md (*loadqi_update): Replace use of 'rI'
> > constraint with separate 'r' and 'Cm2' constraints.
> > 
> Why don't you use simply rCm2 ?

You are absolutely correct.  Thank you for pointing this out.

The much simpler version of this patch is below.

Thanks,
Andrew

---

gcc/arc: Replace rI constraint with rCm2 for ld and update insns

In the load*_update instructions the constraint 'rI' was being used,
which would accept either a register or a signed 12-bit constant.  The
problem is that the 32-bit form of ld with update only takes a signed
9-bit immediate.  As such, some ld instructions could be generated that
would be 64 bits long when assembled, while GCC believed them to be
32 bits long.  This error in the length would cause problems during
branch shortening.

The store*_update patterns have the same restriction on immediate size;
however, the patterns for these instructions already only accept 9-bit
immediates, and so should be safe.
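
As an illustration of the range mismatch, the following sketch checks
whether a value fits in a signed immediate of a given width.  The helper
fits_signed is hypothetical and is not how the ARC constraints are
implemented; it only shows which values a 12-bit constraint accepts that a
9-bit field cannot encode.

```cpp
#include <cstdint>

// Hypothetical helper: true if VALUE is representable as a signed
// two's-complement immediate of BITS bits (1 <= BITS <= 63).
constexpr bool fits_signed(int64_t value, unsigned bits)
{
  return value >= -(int64_t{1} << (bits - 1))
         && value <= (int64_t{1} << (bits - 1)) - 1;
}
```

For example, 256 fits in 12 bits but not in 9, so an offset of 256 would be
accepted by a signed 12-bit constraint yet cannot be encoded in the short
form of the instruction.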

gcc/ChangeLog:

* config/arc/arc.md (*loadqi_update): Replace use of 'rI'
constraint with 'rCm2' constraint to limit possible immediate
size.
(*load_zeroextendqisi_update): Likewise.
(*load_signextendqisi_update): Likewise.
(*loadhi_update): Likewise.
(*load_zeroextendhisi_update): Likewise.
(*load_signextendhisi_update): Likewise.
(*loadsi_update): Likewise.
(*loadsf_update): Likewise.
---
 gcc/ChangeLog.NPS400  | 12 
 gcc/config/arc/arc.md | 16 
 2 files changed, 20 insertions(+), 8 deletions(-)
 create mode 100644 gcc/ChangeLog.NPS400

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index d1a9159..c61107f 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1254,7 +1254,7 @@
   [(set (match_operand:QI 3 "dest_reg_operand" "=r,r")
 (match_operator:QI 4 "any_mem_operand"
  [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-   (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))]))
+   (match_operand:SI 2 "nonmemory_operand" "rCm2,Cal"))]))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1266,7 +1266,7 @@
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
(zero_extend:SI (match_operator:QI 4 "any_mem_operand"
 [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" 
"rI,Cal"))])))
+  (match_operand:SI 2 "nonmemory_operand" 
"rCm2,Cal"))])))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1278,7 +1278,7 @@
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
(sign_extend:SI (match_operator:QI 4 "any_mem_operand"
 [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" 
"rI,Cal"))])))
+  (match_operand:SI 2 "nonmemory_operand" 
"rCm2,Cal"))])))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1304,7 +1304,7 @@
   [(set (match_operand:HI 3 "dest_reg_operand" "=r,r")
(match_operator:HI 4 "any_mem_operand"
 [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))]))
+  (match_operand:SI 2 "nonmemory_operand" "rCm2,Cal"))]))
(set (match_operand:SI 0 "dest_reg_operand" "=w,w")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1316,7 +1316,7 @@
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
(zero_extend:SI (match_operator:HI 4 "any_mem_operand"
 [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" 
"rI,Cal"))])))
+  (match_operand:SI 2 "nonmemory_operand" 
"rCm2,Cal"))])))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1329,7 +1329,7 @@
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
(sign_extend:SI (match_operator:HI 4 "any_mem_operand"
 [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" 
"rI,Cal"))])))
+  (match_operand:SI 2 "nonmemory_operand" 
"rCm2,Cal"))])))
(set (match_operand:SI 0 "dest_reg_operand" "=w,w")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1354,7 +1354,7 @@
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
(match_operator:SI 4 "any_mem_operand"
 [(plus:SI (match_

[PATCH 1/3] Encapsulate comp_cost within a class with methods.

2016-04-29 Thread marxin
gcc/ChangeLog:

2016-04-25  Martin Liska  

* tree-ssa-loop-ivopts.c (comp_cost::operator=): New function.
(comp_cost::infinite_cost_p): Likewise.
(operator+): Likewise.
(comp_cost::operator+=): Likewise.
(comp_cost::operator-=): Likewise.
(comp_cost::operator/=): Likewise.
(comp_cost::operator*=): Likewise.
(operator-): Likewise.
(operator<): Likewise.
(operator==): Likewise.
(operator<=): Likewise.
(comp_cost::get_cost): Likewise.
(comp_cost::set_cost): Likewise.
(comp_cost::get_complexity): Likewise.
(comp_cost::set_complexity): Likewise.
(comp_cost::get_scratch): Likewise.
(comp_cost::set_scratch): Likewise.
(comp_cost::get_infinite): Likewise.
(comp_cost::get_no_cost): Likewise.
(struct ivopts_data): Rename inv_expr_id to max_inv_expr_id.
(tree_ssa_iv_optimize_init): Use the renamed property.
(new_cost): Remove.
(infinite_cost_p): Likewise.
(add_costs): Likewise.
(sub_costs): Likewise.
(compare_costs): Likewise.
(set_group_iv_cost): Use comp_cost::infinite_cost_p.
(get_address_cost): Use new comp_cost::comp_cost.
(get_shiftadd_cost): Likewise.
(force_expr_to_var_cost): Use new comp_cost::get_no_cost.
(split_address_cost): Likewise.
(ptr_difference_cost): Likewise.
(difference_cost): Likewise.
(get_expr_id): Use max_inv_expr_id.
(get_computation_cost_at): Use comp_cost::get_infinite.
(determine_group_iv_cost_generic): Use comp_cost::get_no_cost.
(determine_group_iv_cost_address): Likewise.
(determine_group_iv_cost_cond): Use comp_cost::infinite_cost_p.
(autoinc_possible_for_pair): Likewise.
(determine_group_iv_costs): Use new methods of comp_cost.
(determine_iv_cost): Likewise.
(cheaper_cost_pair): Use comp_cost operators.
(iv_ca_recount_cost): Likewise.
(iv_ca_set_no_cp): Likewise.
(iv_ca_set_cp): Likewise.
(iv_ca_cost): Use comp_cost::get_infinite.
(iv_ca_new): Use comp_cost::get_no_cost.
(iv_ca_dump): Use new methods of comp_cost.
(iv_ca_narrow): Use operators of comp_cost.
(iv_ca_prune): Likewise.
(iv_ca_replace): Likewise.
(try_add_cand_for): Likewise.
(try_improve_iv_set): Likewise.
(find_optimal_iv_set): Use new methods of comp_cost.
(free_loop_data): Use renamed max_inv_expr_id.
---
 gcc/tree-ssa-loop-ivopts.c | 548 +
 1 file changed, 352 insertions(+), 196 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9314363..1e68927 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -173,16 +173,236 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  int cost;/* The runtime cost.  */
-  unsigned complexity; /* The estimate of the complexity of the code for
-  the computation (in no concrete units --
-  complexity field should be larger for more
-  complex expressions and addressing modes).  */
-  int scratch; /* Scratch used during cost computation.  */
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  {}
+
+  comp_cost (int cost, unsigned complexity)
+: m_cost (cost), m_complexity (complexity), m_scratch (0)
+  {}
+
+  comp_cost& operator= (const comp_cost& other);
+
+  /* Returns true if COST is infinite.  */
+  bool infinite_cost_p ();
+
+  /* Adds costs COST1 and COST2.  */
+  friend comp_cost operator+ (comp_cost cost1, comp_cost cost2);
+
+  /* Adds COST to the comp_cost.  */
+  comp_cost operator+= (comp_cost cost);
+
+  /* Adds constant C to this comp_cost.  */
+  comp_cost operator+= (HOST_WIDE_INT c);
+
+  /* Subtracts constant C from this comp_cost.  */
+  comp_cost operator-= (HOST_WIDE_INT c);
+
+  /* Divide the comp_cost by constant C.  */
+  comp_cost operator/= (HOST_WIDE_INT c);
+
+  /* Multiply the comp_cost by constant C.  */
+  comp_cost operator*= (HOST_WIDE_INT c);
+
+  /* Subtracts costs COST1 and COST2.  */
+  friend comp_cost operator- (comp_cost cost1, comp_cost cost2);
+
+  /* Subtracts COST from this comp_cost.  */
+  comp_cost operator-= (comp_cost cost);
+
+  /* Returns true if COST1 is smaller than COST2.  */
+  friend bool operator< (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 and COST2 are equal.  */
+  friend bool operator== (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 is smaller than or equal to COST2.  */
+  friend bool operator<= (comp_cost cost1, comp_cost cost2);
+
+  /* Return the cost.  */
+  int get_cost ();
+
+  /* Set the cost to C.  */
+  void set_cost (int c);
+
+  /* Return the complexity.  */
+  unsigned get_complexity ();
+
+  /* Set the complexity to C. 

[PATCH] Fix warn_for_memset ICE (PR c/70852)

2016-04-29 Thread Marek Polacek
Here, we segv because we're checking TYPE_MAXVAL of domain that is actually
null for "extern int a[];".  The fix is trivial.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-29  Marek Polacek  

PR c/70852
* c-common.c (warn_for_memset): Check domain before accessing it.

* gcc.dg/pr70852.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index c086dee..88507a2 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -11796,6 +11796,7 @@ warn_for_memset (location_t loc, tree arg0, tree arg2,
  tree elt_type = TREE_TYPE (type);
  tree domain = TYPE_DOMAIN (type);
  if (!integer_onep (TYPE_SIZE_UNIT (elt_type))
+ && domain != NULL_TREE
  && TYPE_MAXVAL (domain)
  && TYPE_MINVAL (domain)
  && integer_zerop (TYPE_MINVAL (domain))
diff --git gcc/testsuite/gcc.dg/pr70852.c gcc/testsuite/gcc.dg/pr70852.c
index e69de29..2dec082 100644
--- gcc/testsuite/gcc.dg/pr70852.c
+++ gcc/testsuite/gcc.dg/pr70852.c
@@ -0,0 +1,11 @@
+/* PR c/70852 */
+/* { dg-do compile } */
+/* { dg-options "-Wall" } */
+
+extern void *memset (void *, int, __SIZE_TYPE__);
+extern int A[];
+void
+fn1 (void)
+{
+  memset (A, 0, 1);
+}

Marek
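
A minimal sketch of the guard described above, for readers without the GCC sources at hand.  The `Domain` and `ArrayType` structs and `has_known_bounds` are hypothetical stand-ins, not GCC's tree API; the only point is that the domain pointer has to be tested before its bounds are read, since an incomplete array such as "extern int a[];" has no domain.

```cpp
#include <cstddef>

// Hypothetical stand-ins for GCC tree nodes; not the real API.
struct Domain { const long *min; const long *max; };
struct ArrayType { std::size_t elt_size; const Domain *domain; };

// True if the array has multi-byte elements and a known domain that
// starts at zero, i.e. the shape the warning code goes on to inspect.
bool has_known_bounds (const ArrayType &type)
{
  return type.elt_size != 1
         && type.domain != nullptr      // the added guard
         && type.domain->max != nullptr
         && type.domain->min != nullptr
         && *type.domain->min == 0;
}
```

With the guard in place, an incomplete array simply yields false instead of dereferencing a null domain.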


[PATCH, i386]: Fix constm1_operand predicate

2016-04-29 Thread Uros Bizjak
Brown-paper bag bug ...

>> Do I misuse constm1_operand?
>
> No, it is just a typo that crept in constm1_operand:
>
> ;; Match exactly -1.
> (define_predicate "constm1_operand"
>   (and (match_code "const_int")
>(match_test "op = constm1_rtx")))
>
> There should be a test, not an assignment.

Fixed by attached patch.

2016-04-29  Uros Bizjak  

* config/i386/predicates.md (constm1_operand): Fix comparison.

Bootstrap and regression test in progress, will commit to mainline ASAP.

Uros.

Index: i386/predicates.md
===
--- i386/predicates.md  (revision 235619)
+++ i386/predicates.md  (working copy)
@@ -678,7 +678,7 @@
 ;; Match exactly -1.
 (define_predicate "constm1_operand"
   (and (match_code "const_int")
-   (match_test "op = constm1_rtx")))
+   (match_test "op == constm1_rtx")))

 ;; Match exactly eight.
 (define_predicate "const8_operand"
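
To illustrate why the typo mattered, here is a small sketch in plain C++.  `rtx_like`, `minus_one`, `matches_buggy` and `matches_fixed` are made-up names for this example only and are not part of GCC; the assumption is that the predicate body ends up as an ordinary C++ expression, so "=" assigns and then tests the assigned pointer.

```cpp
// Hypothetical stand-in for an rtx pointer; not GCC's real type.
typedef const int *rtx_like;

static const int minus_one_storage = -1;
static const rtx_like minus_one = &minus_one_storage;

// Buggy form: assigns, then tests the assigned pointer for non-null,
// so it accepts every operand.
bool matches_buggy (rtx_like op)
{
  return (op = minus_one) != nullptr;
}

// Fixed form: a real comparison against the shared constant.
bool matches_fixed (rtx_like op)
{
  return op == minus_one;
}
```

The buggy form accepts any operand, while the fixed form accepts only the shared constant.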


Re: Allow redefinition of libcilkrts debug macros

2016-04-29 Thread Rainer Orth
Hi Ilya,

>> >> 2016-04-07  Rainer Orth  
>> >>
>> >>   * runtime/except-gcc.cpp (DEBUG_EXCEPTIONS): Allow redefinition.
>> >>   * runtime/cilk_fiber.h (FIBER_DEBUG): Likewise.
>> >>   * runtime/scheduler.h (REDPAR_DEBUG): Likewise.
>> > Ilya will have to chime in here -- we're a downstream consumer of the Cilk+
>> > runtime.  So these patches need to go into Intel's tree first, then Ilya
>> > can bring them into the GCC tree.
>> 
>> I suspected that much.  It would be good to have a libcilkrts/README.gcc
>> describing the rules which changes can go into the gcc tree directly,
>> which need to go upstream first, and how.  libgo and libsanitizer already
>> have this.
>
> Could you please submit your patch to
> ?
> All patches for libcilkrts/* should go there first in order to avoid possible
> license issues, or possible losses during the merge.

sure, will do.  So current upstream libcilkrts should easily
interoperate with current gcc trunk?  That way, I'll be able to complete
the SPARC port from PR target/68945 upstream and just wait for the next
merge to drag it in.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix warn_for_memset ICE (PR c/70852)

2016-04-29 Thread Bernd Schmidt

On 04/29/2016 02:01 PM, Marek Polacek wrote:

Here, we segv because we're checking TYPE_MAXVAL of domain that is actually
null for "extern int a[];".  The fix is trivial.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-29  Marek Polacek  

PR c/70852
* c-common.c (warn_for_memset): Check domain before accessing it.

* gcc.dg/pr70852.c: New test.


Ok, thanks.


Bernd



Re: [PATCHv2 2/7] gcc/arc: Replace rI constraint with r & Cm2 for ld and update insns

2016-04-29 Thread Joern Wolfgang Rennecke



On 29/04/16 12:58, Andrew Burgess wrote:


* config/arc/arc.md (*loadqi_update): Replace use of 'rI'
constraint with 'rCm2' constraints to limit possible immediate
size.
(*load_zeroextendqisi_update): Likewise.
(*load_signextendqisi_update): Likewise.
(*loadhi_update): Likewise.
(*load_zeroextendhisi_update): Likewise.
(*load_signextendhisi_update): Likewise.
(*loadsi_update): Likewise.
(*loadsf_update): Likewise.

Thanks.  I have checked this in.


Re: Cilk Plus testsuite needs massive cleanup (PR testsuite/70595)

2016-04-29 Thread Dominique d'Humières
Any reason why the dg-additional-options "*-lcilkrts" have been removed? AFAICT 
they are needed for darwin.

TIA

Dominique



Re: [PATCH] Fix PR tree-optimization/51513

2016-04-29 Thread Peter Bergner
On Fri, 2016-04-29 at 11:56 +0200, Richard Biener wrote:
> Your testcase passes '2' where it passes just fine.  If I pass 3 as which
> I indeed get an abort () but you can't reasonably expect it to return 13 then.

Bah, I added an extra case and didn't change the argument.  :-(
Let me fix that and then dig into the current behavior.



> So I fail to see the actual bug you are fixing and I wonder why you do stuff
> at the GIMPLE level when we only remove the unreachable blocks at RTL
> level CFG cleanup.  Iff then the "fix" should be there.

I actually started out trying to fix the problem in rtl first, but
ran into multiple problems, which at the time made it seem like
fixing this at the GIMPLE level was a better solution.



> But as said, the behavior is expected - in fact the jump-table code should
> be optimized for an unreachable default case to simply omit the range
> check!  That would be a better fix (also avoiding the wild branch).

I know I've seen the wild branch due to normal case statements having
__builtin_unreachable() too, so it's not just a default case problem.
That said, I'll have a look to see whether we can fix unreachable
normal case statements too.  Thanks.

Peter




[PATCH][genrecog] Fix warning about potentially uninitialised use of label

2016-04-29 Thread Kyrill Tkachov

Hi all,

I'm getting a warning when building genrecog that 'label' may be used 
uninitialised in:

  uint64_t label = 0;

  if (d->test.kind == rtx_test::CODE
  && d->if_statement_p (&label)
  && label == CONST_INT)

This is because if_statement_p looks like this:
 inline bool
 decision::if_statement_p (uint64_t *label) const
 {
   if (singleton () && first->labels.length () == 1)
 {
   if (label)
 *label = first->labels[0];
   return true;
 }
   return false;
 }

It's not guaranteed to write label.
This patch initialises label to 0 to fix the warning.
Is this the right thing to do?

Bootstrapped and tested on aarch64-none-linux-gnu and arm-none-linux-gnueabihf.

Thanks,
Kyrill

2016-04-29  Kyrylo Tkachov  

* genrecog.c (simplify_tests): Initialize label to 0.
diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 47e42660fcc854e5da3eba4bee2bb4b06a7352b1..0e62c61a8a756766c12138b51498229f442f44d0 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -1583,7 +1583,7 @@ simplify_tests (state *s)
 {
   for (decision *d = s->first; d; d = d->next)
 {
-  uint64_t label;
+  uint64_t label = 0;
   /* Convert checks for GET_CODE (x) == CONST_INT and XWINT (x, 0) == N
 	 into checks for const_int_rtx[N'], if N is suitably small.  */
   if (d->test.kind == rtx_test::CODE
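
The pattern behind the warning can be sketched in isolation.  `single_label` and `is_single_label` below are simplified, hypothetical versions of the code quoted in this message, not the genrecog sources.  In this sketch the && short-circuit means the label is only read after a successful call, so the initializer mainly gives the variable a defined value on every path and quiets the warning.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplification of decision::if_statement_p.
bool single_label (const std::vector<uint64_t> &labels, uint64_t *label)
{
  if (labels.size () == 1)
    {
      if (label)
        *label = labels[0];
      return true;
    }
  return false;   // *label is left untouched on this path
}

// Caller that mirrors the patched code: LABEL starts at 0 so it has a
// defined value on every path.
bool is_single_label (const std::vector<uint64_t> &labels, uint64_t wanted)
{
  uint64_t label = 0;
  return single_label (labels, &label) && label == wanted;
}
```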


Support <, <=, > and >= for offset_int and widest_int

2016-04-29 Thread Richard Sandiford
offset_int and widest_int are supposed to be at least one bit wider
than all the values they need to represent, with the extra bits
being signs.  Thus offset_int is effectively int128_t and widest_int
is effectively intNNN_t, for target-dependent NNN.

Because the types are signed, there's not really any need to specify
a sign for operations like comparison.  I think things would be clearer
if we supported <, <=, > and >= for them (but not for wide_int, which
doesn't have a sign).

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Thanks,
Richard
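
As a rough illustration of the point about signed comparisons, the sketch below uses int64_t as a stand-in for offset_int (an assumption made purely for the example; the function names are hypothetical too).  Plain relational operators give the expected signed ordering, while a single unsigned comparison can still serve as a compact range check when the upper bound is known to be non-negative.

```cpp
#include <cstdint>

// int64_t stands in for offset_int here; an assumption for the sketch.
bool in_range_signed (int64_t a, int64_t b)
{
  return a >= 0 && a <= b;
}

// Same test done as one unsigned comparison; valid only when b >= 0.
bool in_range_unsigned (int64_t a, int64_t b)
{
  return static_cast<uint64_t> (a) <= static_cast<uint64_t> (b);
}
```

A negative A converts to a very large unsigned value, which is why the single comparison rejects it.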


gcc/
* wide-int.h: Update offset_int and widest_int documentation.
(WI_SIGNED_BINARY_PREDICATE_RESULT): New macro.
(wi::binary_traits): Allow ordered comparisons between offset_int and
offset_int, between widest_int and widest_int, and between either
of these types and basic C types.
(operator <, <=, >, >=): Define for the same combinations.
* tree.h (tree_int_cst_lt): Use comparison operators instead
of wi:: comparisons.
(tree_int_cst_le): Likewise.
* gimple-fold.c (fold_array_ctor_reference): Likewise.
(fold_nonarray_ctor_reference): Likewise.
* gimple-ssa-strength-reduction.c (record_increment): Likewise.
* tree-affine.c (aff_comb_cannot_overlap_p): Likewise.
* tree-parloops.c (try_transform_to_exit_first_loop_alt): Likewise.
* tree-sra.c (completely_scalarize): Likewise.
* tree-ssa-alias.c (stmt_kills_ref_p): Likewise.
* tree-ssa-reassoc.c (extract_bit_test_mask): Likewise.
* tree-vrp.c (extract_range_from_binary_expr_1): Likewise.
(check_for_binary_op_overflow): Likewise.
(search_for_addr_array): Likewise.
* ubsan.c (ubsan_expand_objsize_ifn): Likewise.

Index: gcc/wide-int.h
===
--- gcc/wide-int.h
+++ gcc/wide-int.h
@@ -53,22 +53,26 @@ along with GCC; see the file COPYING3.  If not see
  multiply, division, shifts, comparisons, and operations that need
  overflow detected), the signedness must be specified separately.
 
- 2) offset_int.  This is a fixed size representation that is
- guaranteed to be large enough to compute any bit or byte sized
- address calculation on the target.  Currently the value is 64 + 4
- bits rounded up to the next number even multiple of
- HOST_BITS_PER_WIDE_INT (but this can be changed when the first
- port needs more than 64 bits for the size of a pointer).
-
- This flavor can be used for all address math on the target.  In
- this representation, the values are sign or zero extended based
- on their input types to the internal precision.  All math is done
- in this precision and then the values are truncated to fit in the
- result type.  Unlike most gimple or rtl intermediate code, it is
- not useful to perform the address arithmetic at the same
- precision in which the operands are represented because there has
- been no effort by the front ends to convert most addressing
- arithmetic to canonical types.
+ 2) offset_int.  This is a fixed-precision integer that can hold
+ any address offset, measured in either bits or bytes, with at
+ least one extra sign bit.  At the moment the maximum address
+ size GCC supports is 64 bits.  With 8-bit bytes and an extra
+ sign bit, offset_int therefore needs to have at least 68 bits
+ of precision.  We round this up to 128 bits for efficiency.
+ Values of type T are converted to this precision by sign- or
+ zero-extending them based on the signedness of T.
+
+ The extra sign bit means that offset_int is effectively a signed
+ 128-bit integer, i.e. it behaves like int128_t.
+
+ Since the values are logically signed, there is no need to
+ distinguish between signed and unsigned operations.  Sign-sensitive
+ comparison operators <, <=, > and >= are therefore supported.
+
+ [ Note that, even though offset_int is effectively int128_t,
+   it can still be useful to use unsigned comparisons like
+   wi::leu_p (a, b) as a more efficient short-hand for
+   "a >= 0 && a <= b". ]
 
  3) widest_int.  This representation is an approximation of
  infinite precision math.  However, it is not really infinite
@@ -76,9 +80,9 @@ along with GCC; see the file COPYING3.  If not see
  precision math where the precision is 4 times the size of the
  largest integer that the target port can represent.
 
- widest_int is supposed to be wider than any number that it needs to
- store, meaning that there is always at least one leading sign bit.
- All widest_int values are therefore signed.
+ Like offset_int, widest_int is wider than all the values that
+ it needs to represent, so the integers are logically signed.
+ Sign-sensitive comparison operators <, <=, > and >= are supported.
 
  There are sever

Support << and >> for offset_int and widest_int

2016-04-29 Thread Richard Sandiford
Following on from the comparison patch, I think it makes sense to
support << and >> for offset_int (int128_t) and widest_int (intNNN_t),
with >> being arithmetic shift.  It doesn't make sense to use
logical right shift on a potentially negative offset_int, since
the precision of 128 bits has no meaning on the target.

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Thanks,
Richard
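
The difference between the two right shifts can be seen on a 64-bit stand-in (int64_t is an assumption for this sketch; offset_int itself is wider, and `asr` and `lsr` are hypothetical helper names).  The arithmetic version is written so that it does not depend on how the host compiler treats >> on negative values.

```cpp
#include <cstdint>

// Arithmetic right shift written without relying on how >> treats
// negative operands: for x < 0, ~x is non-negative.  Assumes n < 64.
int64_t asr (int64_t x, unsigned n)
{
  return x < 0 ? ~(~x >> n) : x >> n;
}

// Logical right shift: zero-fills from the top of the 64-bit container,
// so the result depends on the container width.  Assumes n < 64.
uint64_t lsr (int64_t x, unsigned n)
{
  return static_cast<uint64_t> (x) >> n;
}
```

For a negative input the logical result changes with the container width, which is the sense in which it has no meaning on the target.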


gcc/
* wide-int.h: Update offset_int and widest_int documentation.
(WI_SIGNED_SHIFT_RESULT): New macro.
(wi::binary_shift): Define signed_shift_result_type for
shifts on offset_int- and widest_int-like types.
(generic_wide_int): Support <<= and >>= if << and >> are supported.
* tree.h (int_bit_position): Use shift operators instead of wi::
 shifts.
* alias.c (adjust_offset_for_component_ref): Likewise.
* expr.c (get_inner_reference): Likewise.
* fold-const.c (fold_comparison): Likewise.
* gimple-fold.c (fold_nonarray_ctor_reference): Likewise.
* gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
* tree-dfa.c (get_ref_base_and_extent): Likewise.
* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
(stmt_kills_ref_p): Likewise.
* tree-ssa-ccp.c (bit_value_binop_1): Likewise.
* tree-ssa-math-opts.c (find_bswap_or_nop_load): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
(ao_ref_init_from_vn_reference): Likewise.

gcc/cp/
* init.c (build_new_1): Use shift operators instead of wi:: shifts.

Index: gcc/wide-int.h
===
--- gcc/wide-int.h
+++ gcc/wide-int.h
@@ -68,6 +68,8 @@ along with GCC; see the file COPYING3.  If not see
  Since the values are logically signed, there is no need to
  distinguish between signed and unsigned operations.  Sign-sensitive
  comparison operators <, <=, > and >= are therefore supported.
+ Shift operators << and >> are also supported, with >> being
+ an _arithmetic_ right shift.
 
  [ Note that, even though offset_int is effectively int128_t,
it can still be useful to use unsigned comparisons like
@@ -82,7 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 
  Like offset_int, widest_int is wider than all the values that
  it needs to represent, so the integers are logically signed.
- Sign-sensitive comparison operators <, <=, > and >= are supported.
+ Sign-sensitive comparison operators <, <=, > and >= are supported,
+ as are << and >>.
 
  There are several places in the GCC where this should/must be used:
 
@@ -259,6 +262,11 @@ along with GCC; see the file COPYING3.  If not see
 #define WI_BINARY_RESULT(T1, T2) \
   typename wi::binary_traits <T1, T2>::result_type
 
+/* The type of result produced by T1 << T2.  Leads to substitution failure
+   if the operation isn't supported.  Defined purely for brevity.  */
+#define WI_SIGNED_SHIFT_RESULT(T1, T2) \
+  typename wi::binary_traits <T1, T2>::signed_shift_result_type
+
 /* The type of result produced by a signed binary predicate on types T1 and T2.
This is bool if signed comparisons make sense for T1 and T2 and leads to
substitution failure otherwise.  */
@@ -405,6 +413,7 @@ namespace wi
so as not to confuse gengtype.  */
 typedef generic_wide_int < fixed_wide_int_storage
   ::precision> > result_type;
+typedef result_type signed_shift_result_type;
 typedef bool signed_predicate_result;
   };
 
@@ -416,6 +425,7 @@ namespace wi
 STATIC_ASSERT (int_traits ::precision == int_traits ::precision);
 typedef generic_wide_int < fixed_wide_int_storage
   ::precision> > result_type;
+typedef result_type signed_shift_result_type;
 typedef bool signed_predicate_result;
   };
 
@@ -681,6 +691,11 @@ public:
   template <typename T> \
 generic_wide_int &OP (const T &c) { return (*this = wi::F (*this, c)); }
 
+/* Restrict these to cases where the shift operator is defined.  */
+#define SHIFT_ASSIGNMENT_OPERATOR(OP, OP2) \
+  template <typename T> \
+generic_wide_int &OP (const T &c) { return (*this = *this OP2 c); }
+
 #define INCDEC_OPERATOR(OP, DELTA) \
   generic_wide_int &OP () { *this += DELTA; return *this; }
 
@@ -702,12 +717,15 @@ public:
   ASSIGNMENT_OPERATOR (operator +=, add)
   ASSIGNMENT_OPERATOR (operator -=, sub)
   ASSIGNMENT_OPERATOR (operator *=, mul)
+  SHIFT_ASSIGNMENT_OPERATOR (operator <<=, <<)
+  SHIFT_ASSIGNMENT_OPERATOR (operator >>=, >>)
   INCDEC_OPERATOR (operator ++, 1)
   INCDEC_OPERATOR (operator --, -1)
 
 #undef BINARY_PREDICATE
 #undef UNARY_OPERATOR
 #undef BINARY_OPERATOR
+#undef SHIFT_ASSIGNMENT_OPERATOR
 #undef ASSIGNMENT_OPERATOR
 #undef INCDEC_OPERATOR
 
@@ -857,7 +875,7 @@ generic_wide_int ::elt (unsigned int i) const
 
 template 
 template 
-generic_wide_int  &
+inline generic_wide_int  &
 generic_wide_int ::opera

Add a wi::to_wide helper function

2016-04-29 Thread Richard Sandiford
As Richard says, we ought to have a convenient way of converting
an INTEGER_CST to a wide_int of a particular precision without
having to extract the sign of the INTEGER_CST's type each time.
This patch adds a wi::to_wide helper for that, alongside the
existing wi::to_offset and wi::to_widest.

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree.h (wi::to_wide): New function.
* expr.c (expand_expr_real_1): Use wi::to_wide.
* fold-const.c (int_const_binop_1): Likewise.
(extract_muldiv_1): Likewise.

gcc/c-family/
* c-common.c (shorten_compare): Use wi::to_wide.

Index: gcc/tree.h
===
--- gcc/tree.h
+++ gcc/tree.h
@@ -5211,6 +5211,8 @@ namespace wi
   to_widest (const_tree);
 
   generic_wide_int  > to_offset 
(const_tree);
+
+  wide_int to_wide (const_tree, unsigned int);
 }
 
 inline unsigned int
@@ -5240,6 +5242,16 @@ wi::to_offset (const_tree t)
   return t;
 }
 
+/* Convert INTEGER_CST T to a wide_int of precision PREC, extending or
+   truncating as necessary.  When extending, use sign extension if T's
+   type is signed and zero extension if T's type is unsigned.  */
+
+inline wide_int
+wi::to_wide (const_tree t, unsigned int prec)
+{
+  return wide_int::from (t, prec, TYPE_SIGN (TREE_TYPE (t)));
+}
+
 template 
 inline wi::extended_tree ::extended_tree (const_tree t)
   : m_t (t)
Index: gcc/expr.c
===
--- gcc/expr.c
+++ gcc/expr.c
@@ -9729,10 +9729,9 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
  GET_MODE_PRECISION (TYPE_MODE (type)), we need to extend from
  the former to the latter according to the signedness of the
  type. */
-  temp = immed_wide_int_const (wide_int::from
+  temp = immed_wide_int_const (wi::to_wide
   (exp,
-   GET_MODE_PRECISION (TYPE_MODE (type)),
-   TYPE_SIGN (type)),
+   GET_MODE_PRECISION (TYPE_MODE (type))),
   TYPE_MODE (type));
   return temp;
 
Index: gcc/fold-const.c
===
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -963,8 +963,7 @@ int_const_binop_1 (enum tree_code code, const_tree arg1, 
const_tree parg2,
   signop sign = TYPE_SIGN (type);
   bool overflow = false;
 
-  wide_int arg2 = wide_int::from (parg2, TYPE_PRECISION (type),
- TYPE_SIGN (TREE_TYPE (parg2)));
+  wide_int arg2 = wi::to_wide (parg2, TYPE_PRECISION (type));
 
   switch (code)
 {
@@ -6394,10 +6393,8 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  bool overflow_mul_p;
  signop sign = TYPE_SIGN (ctype);
  unsigned prec = TYPE_PRECISION (ctype);
- wide_int mul = wi::mul (wide_int::from (op1, prec,
- TYPE_SIGN (TREE_TYPE (op1))),
- wide_int::from (c, prec,
- TYPE_SIGN (TREE_TYPE (c))),
+ wide_int mul = wi::mul (wi::to_wide (op1, prec),
+ wi::to_wide (c, prec),
  sign, &overflow_mul_p);
  overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1);
  if (overflow_mul_p
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -4012,10 +4012,9 @@ shorten_compare (location_t loc, tree *op0_ptr, tree 
*op1_ptr,
  /* Convert primop1 to target type, but do not introduce
 additional overflow.  We know primop1 is an int_cst.  */
  primop1 = force_fit_type (*restype_ptr,
-   wide_int::from
- (primop1,
-  TYPE_PRECISION (*restype_ptr),
-  TYPE_SIGN (TREE_TYPE (primop1))),
+   wi::to_wide
+(primop1,
+ TYPE_PRECISION (*restype_ptr)),
0, TREE_OVERFLOW (primop1));
}
   if (type != *restype_ptr)
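
The conversion rule stated in the new comment (extend according to the signedness of the source type, truncate when narrowing) can be sketched on 64-bit values.  `Constant`, `low_bits` and `to_precision` are hypothetical names for this example and only model precisions from 1 to 64; they are not the wide_int implementation.

```cpp
#include <cstdint>

// Hypothetical model of an integer constant: raw bits plus the
// precision and signedness of its type.
struct Constant { uint64_t bits; unsigned prec; bool is_signed; };

// Keep the low PREC bits of V (PREC in 1..64).
static uint64_t low_bits (uint64_t v, unsigned prec)
{
  return prec >= 64 ? v : v & ((uint64_t (1) << prec) - 1);
}

// Convert C to PREC bits: sign-extend if C's type is signed, otherwise
// zero-extend; truncate when PREC is narrower.
uint64_t to_precision (const Constant &c, unsigned prec)
{
  uint64_t v = low_bits (c.bits, c.prec);
  bool negative = c.is_signed && ((v >> (c.prec - 1)) & 1);
  if (negative && c.prec < 64)
    v |= ~uint64_t (0) << c.prec;   // fill the upper bits with ones
  return low_bits (v, prec);
}
```

The caller passes only the target precision; the signedness comes from the constant itself, which is the convenience the helper is meant to provide.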


Simplify cst_and_fits_in_hwi

2016-04-29 Thread Richard Sandiford
While looking at the use of cst_and_fits_in_hwi in tree-ssa-loop-ivopts.c,
I had difficulty working out what the function actually tests.  The
final NUNITS check seems redundant, since it asks about the number of
HWIs in the _unextended_ constant.  We've already checked that the
unextended constant has no more than HOST_BITS_PER_WIDE_INT bits, so the
length must be 1.

I think this was my fault, sorry.

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree.c (cst_and_fits_in_hwi): Simplify.

Index: gcc/tree.c
===
--- gcc/tree.c
+++ gcc/tree.c
@@ -1675,13 +1675,8 @@ build_low_bits_mask (tree type, unsigned bits)
 bool
 cst_and_fits_in_hwi (const_tree x)
 {
-  if (TREE_CODE (x) != INTEGER_CST)
-return false;
-
-  if (TYPE_PRECISION (TREE_TYPE (x)) > HOST_BITS_PER_WIDE_INT)
-return false;
-
-  return TREE_INT_CST_NUNITS (x) == 1;
+  return (TREE_CODE (x) == INTEGER_CST
+ && TYPE_PRECISION (TREE_TYPE (x)) <= HOST_BITS_PER_WIDE_INT);
 }
 
 /* Build a newly constructed VECTOR_CST node of length LEN.  */
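
The redundancy argument can be checked with a little arithmetic.  The sketch below assumes a 64-bit host wide integer and uses hypothetical helper names; it models only the upper bound on the number of host words, not the actual INTEGER_CST encoding.

```cpp
// Assumed width of a host wide integer for this sketch.
const unsigned bits_per_hwi = 64;

// Upper bound on how many host words an unextended PREC-bit constant
// can occupy (PREC >= 1).
unsigned max_units (unsigned prec)
{
  return (prec + bits_per_hwi - 1) / bits_per_hwi;
}

// The simplified predicate: the precision check alone is enough,
// because any precision that passes it has max_units == 1.
bool fits_in_hwi (bool is_integer_cst, unsigned prec)
{
  return is_integer_cst && prec <= bits_per_hwi;
}
```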


Re: [ubsan PATCH] Fix compile-time hog with &TARGET_EXPRs (PR sanitizer/70342)

2016-04-29 Thread Marek Polacek
On Thu, Apr 28, 2016 at 04:15:41PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 28, 2016 at 04:10:01PM +0200, Marek Polacek wrote:
> > That works too, though it of course affects all users, not just ubsan.  
> > Here's
> 
> Of course, but I think that is a good thing ;)
> 
> > the patch with your suggested change.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2016-04-28  Marek Polacek  
> > Jakub Jelinek  
> > 
> > PR sanitizer/70342
> > * fold-const.c (tree_single_nonzero_warnv_p): For TARGET_EXPR, use
> > TARGET_EXPR_SLOT as a base.
> > 
> > * g++.dg/ubsan/null-7.C: New test.
> 
> Ok for trunk.

Thanks, committed.

> For 6.2 dunno, either the same patch after a while, or perhaps your original
> patch is safer (though, wonder if e.g. one can construct a testcase where it
> will use instrument &(TARGET_EXPR <...>.field) nested many times and still
> trigger the compile time hog with your patch).

Dunno either.  I think I'll backport the same patch after a week or so.

Marek


Re: Support << and >> for offset_int and widest_int

2016-04-29 Thread H.J. Lu
On Fri, Apr 29, 2016 at 5:30 AM, Richard Sandiford
 wrote:
> Following on from the comparison patch, I think it makes sense to
> support << and >> for offset_int (int128_t) and widest_int (intNNN_t),
> with >> being arithmetic shift.  It doesn't make sense to use
> logical right shift on a potentially negative offset_int, since
> the precision of 128 bits has no meaning on the target.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> * wide-int.h: Update offset_int and widest_int documentation.
> (WI_SIGNED_SHIFT_RESULT): New macro.
> (wi::binary_shift): Define signed_shift_result_type for
> shifts on offset_int- and widest_int-like types.
> (generic_wide_int): Support <<= and >>= if << and >> are supported.
> * tree.h (int_bit_position): Use shift operators instead of wi::
>  shifts.
> * alias.c (adjust_offset_for_component_ref): Likewise.
> * expr.c (get_inner_reference): Likewise.
> * fold-const.c (fold_comparison): Likewise.
> * gimple-fold.c (fold_nonarray_ctor_reference): Likewise.
> * gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
> * tree-dfa.c (get_ref_base_and_extent): Likewise.
> * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
> (stmt_kills_ref_p): Likewise.
> * tree-ssa-ccp.c (bit_value_binop_1): Likewise.
> * tree-ssa-math-opts.c (find_bswap_or_nop_load): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
>
> gcc/cp/
> * init.c (build_new_1): Use shift operators instead of wi:: shifts.

Can you also update change_zero_ext in combine.c:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70687

It should use wide_int << instead of HOST_WIDE_INT << to
support __int128.

-- 
H.J.


Re: Cilk Plus testsuite needs massive cleanup (PR testsuite/70595)

2016-04-29 Thread Rainer Orth
Hi Dominique,

> Any reason why the dg-additional-options "*-lcilkrts" have been removed?
> AFAICT they are needed for darwin.

-fcilkplus does (and should) include -lcilkrts when linking.  It
certainly does on Solaris and Linux.  Everything else is a usability
nightmare: you don't need to link Fortran programs with

$ gfortran -lgfortran -lquadmath

do you?

gcc.c (LINK_COMMAND_SPEC) has

%{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\

and the generated libcilkrts.spec on Darwin has

*link_cilkrts: -lcilkrts %{static: }

Unfortunately, the darwin.h almost-copy of LINK_COMMAND_SPEC lacks
this.  We should really find a way to better modularize this (and other)
specs to avoid this error-prone duplication.

The following patch (completely untested) adds the above line to its
darwin.h counterpart.  I'll give it a whirl myself in this weekend's
bootstraps.  Sorry about the breakage.

Rainer


2016-04-29  Rainer Orth  

* config/darwin.h (LINK_COMMAND_SPEC_A): Handle -fcilkplus.

# HG changeset patch
# Parent  297a270669c098610ed0f7333b9a11ab4d3ef2bd
Handle -fcilkplus in Mac OS X LINK_COMMAND_SPEC

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -179,6 +179,7 @@ extern GTY(()) int darwin_ms_struct;
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
+%{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
 %{!nostdlib:%{!nodefaultlibs:\

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn
How has this shown general benefit for all architectures to deserve
enabling it by default at -O2?

As an aside, this change seems to be the source of a new code
generation bug affecting the PPC kernel.

Thanks, David


Re: [PATCH] Update gmp/mpfr/mpc in-tree versions

2016-04-29 Thread Bernd Edlinger
On 29.04.2016 09:46, Richard Biener wrote:
> On Thu, 28 Apr 2016, Bernd Edlinger wrote:
>
>> On 28.04.2016 16:29, Richard Biener wrote:
>>>
>>> Another option would be to try if mini-gmp is enough for our
>>> (in-tree) use and what the performance impact would be if we'd
>>> use that (in-tree).
>>>
>>
>> Yes, we would certainly never need more than that subset.
>>
>> But I don't see how mpfr can be built with mini-gmp.
>> I tried to and failed early in mpfr/configure.
>> Any ideas?
>
> No idea - it of course breaks down if mpfr cannot work with mini-gmp.
>

Yes, that's what it looks like.

Frankly speaking, I think we should start testing with 6.1.0
(with -DNO_ASM of course, but that should be safe at least)
and if anything goes wrong, we can still try to get it fixed in 6.1.1,
as long as it is not yet released.


Bernd.


Re: Enabling -frename-registers?

2016-04-29 Thread Bernd Schmidt

On 04/29/2016 03:02 PM, David Edelsohn wrote:

How has this shown general benefit for all architectures to deserve
enabling it by default at -O2?


It should improve postreload scheduling in general, and it can also help 
clear up bad code generation left behind by register allocation.



As an aside, this change seems to be the source of a new code
generation bug affecting the PPC kernel.


Please file a PR.


Bernd



Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn
On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt  wrote:
> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>
>> How has this shown general benefit for all architectures to deserve
>> enabling it by default at -O2?
>
>
> It should improve postreload scheduling in general, and it can also help
> clear up bad code generation left behind by register allocation.

Did you test the actual performance benefit on any architectures,
especially architectures other than x86?

Thanks, David


Re: [PATCH, rs6000] Add support for vector element-reversal built-ins

2016-04-29 Thread Segher Boessenkool
Hi Bill,

On Mon, Apr 25, 2016 at 09:09:03AM -0500, Bill Schmidt wrote:
> Here's the fix for the obvious pasto separated out.  CCing Richi and
> Jakub as I'd appreciate release manager approval to include this in
> gcc-6-branch.  This fixes some cases where built-in functions are
> connected to the wrong expanders because of copy-paste issues.  These
> tend not to be used anyway because the vec_st interface is friendlier,
> but we should clean this up.  Is that ok?

This is fine for trunk, thanks!


Segher



Re: Enabling -frename-registers?

2016-04-29 Thread David Edelsohn
On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt  wrote:
>
>
> On 04/29/2016 03:42 PM, David Edelsohn wrote:
>>
>> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt 
>> wrote:
>>>
>>> On 04/29/2016 03:02 PM, David Edelsohn wrote:


 How has this shown general benefit for all architectures to deserve
 enabling it by default at -O2?
>>>
>>>
>>>
>>> It should improve postreload scheduling in general, and it can also help
>>> clear up bad code generation left behind by register allocation.
>>
>>
>> Did you test the actual performance benefit on any architectures,
>> especially architectures other than x86?
>
>
> No. If that's the standard, I'll back out the change.

It seems rather strange to enable an optimization by default across
all targets without even knowing the performance impact.

I'm eager to learn the opinion of others about this.

Thanks, David


[PATCH] Reduce compile time hog in symbol-summary.h

2016-04-29 Thread Martin Liška
Hello.

For every created function_summary, we validate (with flag_checking) that
all cgraph_nodes have summary_uid > 0. It produces a compile hog.
It's sufficient to validate that for nodes that really utilize a function 
summary.

Patch can bootstrap and regtest on ppc64le-linux-gnu.

Ready for trunk?
Martin
From e5a33bd95a102d16d38bbe2a35ecf0fd9a196ee3 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 29 Apr 2016 13:48:31 +0200
Subject: [PATCH] Symbol summary: refactor usage of gcc_checking_asserts

gcc/ChangeLog:

2016-04-29  Martin Liska  

	* symbol-summary.h (function_summary::function_summary):
	Remove checking assert for all cgraph nodes.
	(function_summary::get): Check summary_uid.
	(symtab_insertion): Check summary_uid.
---
 gcc/symbol-summary.h | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index c0dd5aa..a22352b 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -39,13 +39,6 @@ public:
   function_summary (symbol_table *symtab, bool ggc = false): m_ggc (ggc),
 m_map (13, ggc), m_insertion_enabled (true), m_symtab (symtab)
   {
-if (flag_checking)
-  {
-	cgraph_node *node;
-	FOR_EACH_FUNCTION (node)
-	  gcc_assert (node->summary_uid > 0);
-  }
-
 m_symtab_insertion_hook =
   symtab->add_cgraph_insertion_hook
   (function_summary::symtab_insertion, this);
@@ -124,6 +117,7 @@ public:
   /* Getter for summary callgraph node pointer.  */
   T* get (cgraph_node *node)
   {
+gcc_checking_assert (node->summary_uid);
 return get (node->summary_uid);
   }
 
@@ -148,6 +142,7 @@ public:
   /* Symbol insertion hook that is registered to symbol table.  */
   static void symtab_insertion (cgraph_node *node, void *data)
   {
+gcc_checking_assert (node->summary_uid);
 function_summary *summary = (function_summary <T *> *) (data);
 
 if (summary->m_insertion_enabled)
-- 
2.8.1



Re: Enabling -frename-registers?

2016-04-29 Thread Bernd Schmidt



On 04/29/2016 03:42 PM, David Edelsohn wrote:
> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt wrote:
>> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>> How has this shown general benefit for all architectures to deserve
>>> enabling it by default at -O2?
>>
>> It should improve postreload scheduling in general, and it can also help
>> clear up bad code generation left behind by register allocation.
>
> Did you test the actual performance benefit on any architectures,
> especially architectures other than x86?

No. If that's the standard, I'll back out the change.


Bernd


Re: [C++ Patch] PR 66644

2016-04-29 Thread Jason Merrill

On 04/28/2016 08:18 PM, Paolo Carlini wrote:

else if (ANON_AGGR_TYPE_P (type))
  {
-  tree fields;
-
-  for (fields = TYPE_FIELDS (type); fields; fields = DECL_CHAIN (fields))
+  for (tree fields = TYPE_FIELDS (type); fields;
+  fields = DECL_CHAIN (fields))
if (TREE_CODE (fields) == FIELD_DECL && !DECL_C_BIT_FIELD (field))
- check_field_decl (fields, t, cant_have_const_ctor,
-   no_const_asn_ref, any_default_members);
+ any_default_members_field |= check_field_decl (fields, t,
+cant_have_const_ctor,
+no_const_asn_ref,
+any_default_members);


The logic here seems convoluted.  I guess we don't need to handle 
anonymous structs and unions differently here because we'll call 
check_field_decls for the anonymous union itself, and complain then?  In 
that case, instead of passing down any_default_members at all, can we 
just pass it up and complain in check_field_decls?


Jason



[Openacc] Adjust automatic loop partitioning

2016-04-29 Thread Nathan Sidwell

Jakub,
currently automatic loop partitioning assigns from the innermost loop outwards 
-- that was the simplest thing to implement.  A better algorithm is to assign 
the outermost loop to the outermost available axis, and then assign from the 
innermost loop outwards.   That way we (generally) get gang partitioning on the 
outermost loop.  Just inside that we'll get non-partitioned loops if the nest is 
too deep, and the two innermost nested loops will get worker and vector 
partitioning.
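
To make the ordering concrete, here is a rough sketch of the assignment
policy described above.  It is not the omp-low.c code: it assumes a simple
linear nest in which every loop is 'auto' and all three axes are still
available, and the enum and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical axis encoding for this sketch (not GCC's internal masks).  */
enum axis { AXIS_NONE = 0, AXIS_GANG, AXIS_WORKER, AXIS_VECTOR };

/* Assign axes to a linear nest of DEPTH auto loops; out[0] is the
   outermost loop.  Assumes gang, worker and vector are all available.  */
static void
assign_auto_axes (size_t depth, enum axis *out)
{
  /* Axes left for the inner loops, listed innermost first.  */
  static const enum axis inner_first[] = { AXIS_VECTOR, AXIS_WORKER };
  size_t i, next = 0;

  assert (out != NULL || depth == 0);

  for (i = 0; i < depth; i++)
    out[i] = AXIS_NONE;
  if (depth == 0)
    return;

  /* Step 1: the outermost loop takes the outermost axis.  */
  out[0] = AXIS_GANG;

  /* Step 2: walk from the innermost loop outwards, handing out the
     remaining axes until they run out.  */
  for (i = depth - 1; i >= 1 && next < 2; i--)
    out[i] = inner_first[next++];
}
```

Under those assumptions a two-deep nest comes out as gang/vector and a
four-deep nest as gang/none/worker/vector.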


This patch has been on the gomp4 branch for a while.  ok for trunk?

nathan
2016-04-29  Nathan Sidwell  

	gcc/
	* omp-low.c (struct oacc_loop): Add 'inner' field.
	(new_oacc_loop_raw): Initialize it to zero.
	(oacc_loop_fixed_partitions): Initialize it.
	(oacc_loop_auto_partitions): Partition outermost loop to outermost
	available partitioning.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust expected warnings.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust
	expected partitioning.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 235511)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -103,9 +103,11 @@ int vector_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)
 #pragma acc loop auto
-for (int ix = 0; ix < size; ix++)
-  ary[ix] = place ();
+  for (int ix = 0; ix < size; ix++)
+	ary[ix] = place ();
   }
 
   return check (ary, size, 0, 0, 1);
@@ -118,7 +120,7 @@ int vector_2 (int *ary, int size)
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
 #pragma acc loop worker
-for (int jx = 0; jx <  size  / 64; jx++)
+for (int jx = 0; jx < size  / 64; jx++)
 #pragma acc loop auto
   for (int ix = 0; ix < 64; ix++)
 	ary[ix + jx * 64] = place ();
@@ -133,30 +135,16 @@ int worker_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+for (int kx = 0; kx < 1; kx++)
 #pragma acc loop auto
-for (int jx = 0; jx <  size  / 64; jx++)
+  for (int jx = 0; jx <  size  / 64; jx++)
 #pragma acc loop vector
-  for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
-  }
-
-  return check (ary, size, 0, 1, 1);
-}
-
-int worker_2 (int *ary, int size)
-{
-  clear (ary, size);
-  
-#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
-  {
-#pragma acc loop auto
-for (int jx = 0; jx <  size  / 64; jx++)
-#pragma acc loop auto
-  for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
+	for (int ix = 0; ix < 64; ix++)
+	  ary[ix + jx * 64] = place ();
   }
 
-  return check (ary, size, 0, 1, 1);
+  return check (ary, size, 0,  1, 1);
 }
 
 int gang_1 (int *ary, int size)
@@ -193,6 +181,22 @@ int gang_2 (int *ary, int size)
   return check (ary, size, 1, 1, 1);
 }
 
+int gang_3 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+for (int jx = 0; jx <  size  / 64; jx++)
+#pragma acc loop auto
+  for (int ix = 0; ix < 64; ix++)
+	ary[ix + jx * 64] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
 #define N (32*32*32)
 int main ()
 {
@@ -214,13 +218,13 @@ int main ()
 
   if (worker_1 (ary,  N))
 return 1;
-  if (worker_2 (ary,  N))
-return 1;
   
   if (gang_1 (ary,  N))
 return 1;
   if (gang_2 (ary,  N))
 return 1;
+  if (gang_3 (ary,  N))
+return 1;
 
   return 0;
 }
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 235511)
+++ gcc/omp-low.c	(working copy)
@@ -241,6 +241,7 @@ struct oacc_loop
   tree routine;  /* Pseudo-loop enclosing a routine.  */
 
   unsigned mask;   /* Partitioning mask.  */
+  unsigned inner;  /* Partitioning of inner loops.  */
   unsigned flags;  /* Partitioning flags.  */
   unsigned ifns;   /* Contained loop abstraction functions.  */
   tree chunk_size; /* Chunk size.  */
@@ -18921,7 +18922,7 @@ new_oacc_loop_raw (oacc_loop *parent, lo
   memset (loop->tails, 0, sizeof (loop->tails));
   loop->routine = NULL_TREE;
 
-  loop->mask = loop->flags = 0;
+  loop->mask = loop->flags = loop->inner = 0;
   loop->ifns = 0;
   loop->chunk_size = 0;
   loop->head_end = NULL;
@@ -19449,8 +19450,11 @@ oacc_loop_fixed_partitions (oacc_loop *l
   mask_all |= this_mask;
   
   if (loop->child)
-mask_all |= oacc_loop_fixed_partitions (loop->child,
-	outer_mask | this_mask);
+{
+  loop->inner = oacc_loop_fixed_partitions (loop->child,
+		outer_mask | this_mask); 
+  mask_all |= loo

Re: [SH][committed] Remove SH5 support in compiler

2016-04-29 Thread Oleg Endo
On Fri, 2016-04-29 at 19:45 +0900, Oleg Endo wrote:
> On Thu, 2016-04-28 at 10:27 +0900, Oleg Endo wrote:
> 
> > The removal of SH5 support from GCC has been announced here
> > https://gcc.gnu.org/ml/gcc/2015-08/msg00101.html
> > 
> > The attached patch removes support for SH5 in the compiler back end.
> > There are still some leftovers and new simplification opportunities.
> > These will be addressed in later follow up patches.
> > 
> > Tested on sh-elf with
> > 
> > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
> > -m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> The attached patch removes some leftovers and reinstates the divsf3
> expander pattern which got accidentally deleted by the previous
> patch.

The attached patch removes SH5 support from libgcc.
Tested as above.  Committed as r235640.

Cheers,
Oleg

libgcc/ChangeLog:
* config/sh/crt1.S: Remove SH5 support.
* config/sh/crti.S: Likewise.
* config/sh/crtn.S: Likewise.
* config/sh/lib1funcs-4-300.S: Likewise.
* config/sh/lib1funcs-Os-4-200.S: Likewise.
* config/sh/lib1funcs.S: Likewise.
* config/sh/linux-unwind.h: Likewise.
* config/sh/t-sh64: Delete.

diff --git a/libgcc/config/sh/crt1.S b/libgcc/config/sh/crt1.S
index 45e4aad..4e3c27d 100644
--- a/libgcc/config/sh/crt1.S
+++ b/libgcc/config/sh/crt1.S
@@ -38,648 +38,6 @@ __timer_stack:
 	/* ;
 	Normal newlib crt1.S */
 
-#ifdef __SH5__
-	.section .data,"aw"
-	.global ___data
-___data:
-
-	.section .rodata,"a"
-	.global ___rodata
-___rodata:
-
-#define ICCR_BASE  0x0160
-#define OCCR_BASE  0x01e0
-#define MMUIR_BASE 0x
-#define MMUDR_BASE 0x0080
-
-#define PTE_ENABLED 1
-#define PTE_DISABLED0
-
-#define PTE_SHARED (1 << 1)
-#define PTE_NOT_SHARED  0
-
-#define PTE_CB_UNCACHEABLE  0
-#define PTE_CB_DEVICE   1
-#define PTE_CB_CACHEABLE_WB 2
-#define PTE_CB_CACHEABLE_WT 3
-
-#define PTE_SZ_4KB   (0 << 3)
-#define PTE_SZ_64KB  (1 << 3)
-#define PTE_SZ_1MB   (2 << 3)
-#define PTE_SZ_512MB (3 << 3)
-
-#define PTE_PRR  (1 << 6)
-#define PTE_PRX  (1 << 7)
-#define PTE_PRW  (1 << 8)
-#define PTE_PRU  (1 << 9)
-
-#define SR_MMU_BIT  31
-#define SR_BL_BIT   28
-
-#define ALIGN_4KB  (0xfff)
-#define ALIGN_1MB  (0xf)
-#define ALIGN_512MB (0x1fff)
-
-#define DYNACON_BASE   0x0f00
-#define DM_CB_DLINK_BASE   0x0c00
-#define DM_DB_DLINK_BASE   0x0b00
-
-#define FEMI_AREA_00x
-#define FEMI_AREA_10x0400
-#define FEMI_AREA_20x0500
-#define FEMI_AREA_30x0600
-#define FEMI_AREA_40x0700
-#define FEMI_CB0x0800
-
-#define EMI_BASE   0X8000
-
-#define DMA_BASE   0X0e00
-
-#define CPU_BASE   0X0d00
-
-#define PERIPH_BASE0X0900
-#define DMAC_BASE  0x0e00
-#define INTC_BASE  0x0a00
-#define CPRC_BASE  0x0a01
-#define TMU_BASE   0x0a02
-#define SCIF_BASE  0x0a03
-#define RTC_BASE   0x0a04
-
-
-
-#define LOAD_CONST32(val, reg) \
-	movi	((val) >> 16) & 65535, reg; \
-	shori	(val) & 65535, reg
-
-#define LOAD_PTEH_VAL(sym, align, bits, scratch_reg, reg) \
-	LOAD_ADDR (sym, reg); \
-	LOAD_CONST32 ((align), scratch_reg); \
-	andc	reg, scratch_reg, reg; \
-	LOAD_CONST32 ((bits), scratch_reg); \
-	or	reg, scratch_reg, reg
-
-#define LOAD_PTEL_VAL(sym, align, bits, scratch_reg, reg) \
-	LOAD_ADDR (sym, reg); \
-	LOAD_CONST32 ((align), scratch_reg); \
-	andc	reg, scratch_reg, reg; \
-	LOAD_CONST32 ((bits), scratch_reg); \
-	or	reg, scratch_reg, reg
-
-#define SET_PTE(pte_addr_reg, pteh_val_reg, ptel_val_reg) \
-	putcfg  pte_addr_reg, 0, r63; \
-	putcfg  pte_addr_reg, 1, ptel_val_reg; \
-	putcfg  pte_addr_reg, 0, pteh_val_reg
-
-#if __SH5__ == 64
-	.section .text,"ax"
-#define LOAD_ADDR(sym, reg) \
-	movi	(sym >> 48) & 65535, reg; \
-	shori	(sym >> 32) & 65535, reg; \
-	shori	(sym >> 16) & 65535, reg; \
-	shori	sym & 65535, reg
-#else
-	.mode	SHmedia
-	.section .text..SHmedia32,"ax"
-#define LOAD_ADDR(sym, reg) \
-	movi	(sym >> 16) & 65535, reg; \
-	shori	sym & 65535, reg
-#endif
-	.global start
-start:
-	LOAD_ADDR (_stack, r15)
-
-#ifdef MMU_SUPPORT
-	! Set up the VM using the MMU and caches
-
-	! .vm_ep is first instruction to execute
-	! after VM initialization
-	pt/l	.vm_ep, tr1
-	
-	! Configure instruction cache (ICCR)
-	movi	3, r2
-	movi	0, r3
-	LOAD_ADDR (ICCR_BASE, r1)
-	putcfg	r1, 0, r2
-	putcfg	r1, 1, r3
-
-	! movi	7, r2 ! write through
-	! Configure operand cache (OCCR)
-	LOAD_ADDR (OCCR_BASE, r1)
-	putcfg	r1, 0, r2
-	putcfg	r1, 1, r3
-
-	! Disable all PTE translations
-	LOAD_ADDR (MMUIR_BASE, r1)
-	LOAD_ADDR (MMUDR_BASE, r2)
-	movi	64, r3
-	pt/l	.disable_pte

Re: Enabling -frename-registers?

2016-04-29 Thread Richard Biener
On April 29, 2016 3:48:37 PM GMT+02:00, David Edelsohn  
wrote:
>On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt 
>wrote:
>>
>>
>> On 04/29/2016 03:42 PM, David Edelsohn wrote:
>>>
>>> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt
>>> wrote:
>>>>
>>>> On 04/29/2016 03:02 PM, David Edelsohn wrote:
>>>>>
>>>>>
>>>>> How has this shown general benefit for all architectures to deserve
>>>>> enabling it by default at -O2?
>>>>
>>>>
>>>>
>>>> It should improve postreload scheduling in general, and it can also help
>>>> clear up bad code generation left behind by register allocation.
>>>
>>>
>>> Did you test the actual performance benefit on any architectures,
>>> especially architectures other than x86?
>>
>>
>> No. If that's the standard, I'll back out the change.
>
>It seems rather strange to enable an optimization by default across
>all targets without even knowing the performance impact.
>
>I'm eager to learn the opinion of others about this.

It shows overall benefit on Itanic and ups and downs on x86.

It's stage1, and the easiest way to get feedback for all archs is to enable
it by default.

Richard.

>Thanks, David




RE: [PATCH] [ARC] Add new ARCv2 instructions.

2016-04-29 Thread Claudiu Zissulescu
> > (arc_dwarf_register_span): Remove enum keyword.
> That bit should be separate.

About this small one, I will make a new patch and commit it as obvious.

//Claudiu


Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Eric Botcazou
> Eh, then the build yawns about the missing mode of the input operand.

Every good back-end has at least an example of this. ;-)

> This is getting a bit frustrating, but attached patch should solve
> this failure. Again lightly tested, regtest in progress.

Everything is back to normal with this one, thanks!

-- 
Eric Botcazou


[PATCH] [ARC] Add new ARCv2 instructions.

2016-04-29 Thread Claudiu Zissulescu
Please find the updated patch.

Ok to commit?
Claudiu

gcc/
2016-04-20  Claudiu Zissulescu  

* config/arc/arc-protos.h (compact_memory_operand_p): Declare.
* config/arc/arc.c (arc_output_commutative_cond_exec): Consider
bmaskn instruction.
(arc_dwarf_register_span): Remove enum keyword.
(compact_memory_operand_p): New function.
* config/arc/arc.h (reg_class): Add code density register classes.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
* config/arc/arc.md (*movqi_insn): Add code density instructions.
(*movhi_insn, *movsi_insn, *movsf_insn): Likewise.
(*extendhisi2_i, andsi3_i, cmpsi_cc_insn_mixed): Likewise.
(*cmpsi_cc_c_insn, *movsi_ne): Likewise.
* config/arc/constraints.md (C2p, Uts, Cm1, Cm3, Ucd): New
constraints.
(h, Rcd, Rsd, Rzd): New register constraints.
(T): Use compact_memory_operand_p function.
* config/arc/predicates.md (compact_load_memory_operand): Remove.
---
 gcc/config/arc/arc-protos.h   |   2 +-
 gcc/config/arc/arc.c  | 146 +++
 gcc/config/arc/arc.h  |   9 +++
 gcc/config/arc/arc.md | 154 --
 gcc/config/arc/constraints.md |  61 -
 gcc/config/arc/predicates.md  |  89 
 6 files changed, 304 insertions(+), 157 deletions(-)

diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 3bf28a0..8630e2d 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -44,7 +44,7 @@ extern void emit_shift (enum rtx_code, rtx, rtx, rtx);
 extern void arc_expand_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern void arc_split_compare_and_swap (rtx *);
 extern void arc_expand_compare_and_swap (rtx *);
-
+extern bool compact_memory_operand_p (rtx, machine_mode, bool, bool);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index dfaea7b..a54fddb 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7389,6 +7389,11 @@ arc_output_commutative_cond_exec (rtx *operands, bool output_p)
   case AND:
if (satisfies_constraint_C1p (operands[2]))
  pat = "bmsk%? %0,%1,%Z2";
+   else if (satisfies_constraint_C2p (operands[2]))
+ {
+   operands[2] = GEN_INT ((~INTVAL (operands[2])));
+   pat = "bmskn%? %0,%1,%Z2";
+ }
else if (satisfies_constraint_Ccp (operands[2]))
  pat = "bclr%? %0,%1,%M2";
else if (satisfies_constraint_CnL (operands[2]))
@@ -9859,12 +9864,153 @@ arc_dwarf_register_span (rtx rtl)
 
 /* We can't inline this in INSN_REFERENCES_ARE_DELAYED because
resource.h doesn't include the required header files.  */
+
 bool
 insn_is_tls_gd_dispatch (rtx_insn *insn)
 {
   return recog_memoized (insn) == CODE_FOR_tls_gd_dispatch;
 }
 
+/* Return true if OP is an acceptable memory operand for ARCompact
+   16-bit load instructions of MODE.
+
+   AV2SHORT: TRUE if address needs to fit into the new ARCv2 short
+   non scaled instructions.
+
+   SCALED: TRUE if address can be scaled.  */
+
+bool
+compact_memory_operand_p (rtx op, machine_mode mode,
+ bool av2short, bool scaled)
+{
+  rtx addr, plus0, plus1;
+  int size, off;
+
+  /* Eliminate non-memory operations.  */
+  if (GET_CODE (op) != MEM)
+return 0;
+
+  /* .di instructions have no 16-bit form.  */
+  if (MEM_VOLATILE_P (op) && !TARGET_VOLATILE_CACHE_SET)
+return false;
+
+  if (mode == VOIDmode)
+mode = GET_MODE (op);
+
+  size = GET_MODE_SIZE (mode);
+
+  /* dword operations really put out 2 instructions, so eliminate
+ them.  */
+  if (size > UNITS_PER_WORD)
+return false;
+
+  /* Decode the address now.  */
+  addr = XEXP (op, 0);
+  switch (GET_CODE (addr))
+{
+case REG:
+  return (REGNO (addr) >= FIRST_PSEUDO_REGISTER
+ || COMPACT_GP_REG_P (REGNO (addr))
+ || (SP_REG_P (REGNO (addr)) && (size != 2)));
+case PLUS:
+  plus0 = XEXP (addr, 0);
+  plus1 = XEXP (addr, 1);
+
+  if ((GET_CODE (plus0) == REG)
+ && ((REGNO (plus0) >= FIRST_PSEUDO_REGISTER)
+ || COMPACT_GP_REG_P (REGNO (plus0)))
+ && ((GET_CODE (plus1) == REG)
+ && ((REGNO (plus1) >= FIRST_PSEUDO_REGISTER)
+ || COMPACT_GP_REG_P (REGNO (plus1)))))
+   {
+ return !av2short;
+   }
+
+  if ((GET_CODE (plus0) == REG)
+ && ((REGNO (plus0) >= FIRST_PSEUDO_REGISTER)
+ || (COMPACT_GP_REG_P (REGNO (plus0)) && !av2short)
+ || (IN_RANGE (REGNO (plus0), 0, 31) && av2short))
+ && (GET_CODE (plus1) == CONST_INT))
+   {
+ bool valid = false;
+
+ off = INTVAL (plus1);
+
+ /* Negative offset is not supported in 16-bit load/store insns.  */
+ if (off < 0)
+   return 0;
+
+ /* Onl

[PATCH, i386]: Some more cleanup of peephole2 patterns

2016-04-29 Thread Uros Bizjak
No functional changes.

2016-04-29  Uros Bizjak  

* config/i386/i386.md
(operations with memory inputs setting flags peephole2):
Remove unneeded REG_P checks.  Clean up pattern generation.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 235640)
+++ i386.md (working copy)
@@ -18006,16 +18006,18 @@
 (GET_CODE (operands[3]) == PLUS
  || GET_CODE (operands[3]) == MINUS)
 ? CCGOCmode : CCNOmode)"
-  [(parallel [(set (match_dup 4) (match_dup 5))
- (set (match_dup 1) (match_op_dup 3 [(match_dup 1)
- (match_dup 2)]))])]
+  [(parallel [(set (match_dup 4) (match_dup 6))
+ (set (match_dup 1) (match_dup 5))])]
 {
   operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
-  operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[3]), <MODE>mode,
-   copy_rtx (operands[1]),
-   copy_rtx (operands[2]));
-  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
-operands[5], const0_rtx);
+  operands[5]
+= gen_rtx_fmt_ee (GET_CODE (operands[3]), GET_MODE (operands[3]),
+ copy_rtx (operands[1]),
+ operands[2]);
+  operands[6]
+= gen_rtx_COMPARE (GET_MODE (operands[4]),
+  copy_rtx (operands[5]),
+  const0_rtx);
 })
 
 ;; Likewise for instances where we have a lea pattern.
@@ -18038,16 +18040,18 @@
|| immediate_operand (operands[2], QImode)
|| any_QIreg_operand (operands[2], QImode))
&& ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)"
-  [(parallel [(set (match_dup 4) (match_dup 5))
- (set (match_dup 1) (plus:SWI (match_dup 1)
-  (match_dup 2)))])]
+  [(parallel [(set (match_dup 4) (match_dup 6))
+ (set (match_dup 1) (match_dup 5))])]
 {
   operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
-  operands[5] = gen_rtx_PLUS (<MODE>mode,
- copy_rtx (operands[1]),
- copy_rtx (operands[2]));
-  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
-operands[5], const0_rtx);
+  operands[5]
+= gen_rtx_PLUS (<MODE>mode,
+   copy_rtx (operands[1]),
+   operands[2]);
+  operands[6]
+= gen_rtx_COMPARE (GET_MODE (operands[4]),
+  copy_rtx (operands[5]),
+  const0_rtx);
 })
 
 (define_peephole2
@@ -18065,16 +18069,18 @@
&& ix86_match_ccmode (peep2_next_insn (2),
 GET_CODE (operands[2]) == PLUS
 ? CCGOCmode : CCNOmode)"
-  [(parallel [(set (match_dup 3) (match_dup 4))
- (set (match_dup 1) (match_op_dup 2 [(match_dup 1)
- (match_dup 0)]))])]
+  [(parallel [(set (match_dup 3) (match_dup 5))
+ (set (match_dup 1) (match_dup 4))])]
 {
   operands[3] = SET_DEST (PATTERN (peep2_next_insn (2)));
-  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[2]), <MODE>mode,
-   copy_rtx (operands[1]),
-   copy_rtx (operands[0]));
-  operands[4] = gen_rtx_COMPARE (GET_MODE (operands[3]),
-operands[4], const0_rtx);
+  operands[4]
+= gen_rtx_fmt_ee (GET_CODE (operands[2]), GET_MODE (operands[2]),
+ copy_rtx (operands[1]),
+ operands[0]);
+  operands[5]
+= gen_rtx_COMPARE (GET_MODE (operands[3]),
+  copy_rtx (operands[4]),
+  const0_rtx);
 })
 
 (define_peephole2
@@ -18088,7 +18094,6 @@
(set (match_dup 1) (match_dup 0))
(set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
-   && REG_P (operands[0]) && REG_P (operands[4])
&& REGNO (operands[0]) == REGNO (operands[4])
&& peep2_reg_dead_p (4, operands[0])
&& (mode != QImode
@@ -18100,18 +18105,18 @@
 (GET_CODE (operands[3]) == PLUS
  || GET_CODE (operands[3]) == MINUS)
 ? CCGOCmode : CCNOmode)"
-  [(parallel [(set (match_dup 4) (match_dup 5))
- (set (match_dup 1) (match_dup 6))])]
+  [(parallel [(set (match_dup 4) (match_dup 6))
+ (set (match_dup 1) (match_dup 5))])]
 {
-  operands[2] = gen_lowpart (<MODE>mode, operands[2]);
   operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
-  operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[3]), <MODE>mode,
-   copy_rtx (operands[1]), operands[2]);
-  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
-operands[5], const0_rtx);
-  o

[PATCH][ARM][gas] Fix warnings about uninitialised uses and unused const variables

2016-04-29 Thread Kyrill Tkachov

Hi all,

I recently upgraded my host compiler to GCC 6.1.0 and while trying to build a
cross toolchain for arm-none-eabi I've encountered some -Werror errors in
tc-arm.c.
This patch fixes them.

Some static const variables that are unused are marked with ATTRIBUTE_UNUSED.
In parse_neon_el_struct_list GCC complains that firsttype.index may be used
uninitialized in an inlined neon_alias_types_same call.
This patch initialises the fields of firsttype to prevent that.
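
As a rough illustration of the second fix: the sketch below uses a made-up
struct typed_alias and helper names, not the real gas definitions.  Every
field of the local gets a defined "invalid" value up front, so no path
through the comparison can read an indeterminate value, which is the
situation -Wmaybe-uninitialized warns about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gas's struct neon_typed_alias.  */
struct typed_alias
{
  int defined;   /* Nonzero once the entry carries real data.  */
  int size;      /* Element size, -1 when unknown.  */
  int index;     /* Lane index, -1 when unknown.  */
};

static int
aliases_same (const struct typed_alias *a, const struct typed_alias *b)
{
  return a->defined == b->defined && a->size == b->size
	 && a->index == b->index;
}

/* Return 1 if every entry in LIST matches the first one, 0 otherwise.  */
static int
list_is_uniform (const struct typed_alias *list, size_t n)
{
  struct typed_alias first;
  size_t i;

  assert (list != NULL || n == 0);

  /* Initialise all fields before any use, even though the loop below
     overwrites FIRST on its first iteration.  */
  first.defined = 0;
  first.size = -1;
  first.index = -1;

  for (i = 0; i < n; i++)
    {
      if (i == 0)
	first = list[i];
      else if (!aliases_same (&first, &list[i]))
	return 0;
    }
  return 1;
}
```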

With this patch the gas build succeeds for me.
Tested with make check-gas for arm-none-eabi.

Ok to commit?

Thanks,
Kyrill

2016-04-29  Kyrylo Tkachov  

* config/tc-arm.c (fpu_arch_vfp_v1): Mark with ATTRIBUTE_UNUSED.
(fpu_arch_vfp_v3): Likewise.
(fpu_arch_neon_v1): Likewise.
(arm_arch_full): Likewise.
(parse_neon_el_struct_list): Initialize fields of firsttype.
diff --git a/gas/config/tc-arm.c b/gas/config/tc-arm.c
index 958434b3563cb49c3fb3cd04ffefc396916f2a49..64d87a371058f0a5b1d173fa4cb43ba8f86dee0d 100644
--- a/gas/config/tc-arm.c
+++ b/gas/config/tc-arm.c
@@ -155,10 +155,10 @@ static const arm_feature_set *object_arch = NULL;
 
 /* Constants for known architecture features.  */
 static const arm_feature_set fpu_default = FPU_DEFAULT;
-static const arm_feature_set fpu_arch_vfp_v1 = FPU_ARCH_VFP_V1;
+static const arm_feature_set fpu_arch_vfp_v1 ATTRIBUTE_UNUSED = FPU_ARCH_VFP_V1;
 static const arm_feature_set fpu_arch_vfp_v2 = FPU_ARCH_VFP_V2;
-static const arm_feature_set fpu_arch_vfp_v3 = FPU_ARCH_VFP_V3;
-static const arm_feature_set fpu_arch_neon_v1 = FPU_ARCH_NEON_V1;
+static const arm_feature_set fpu_arch_vfp_v3 ATTRIBUTE_UNUSED = FPU_ARCH_VFP_V3;
+static const arm_feature_set fpu_arch_neon_v1 ATTRIBUTE_UNUSED = FPU_ARCH_NEON_V1;
 static const arm_feature_set fpu_arch_fpa = FPU_ARCH_FPA;
 static const arm_feature_set fpu_any_hard = FPU_ANY_HARD;
 static const arm_feature_set fpu_arch_maverick = FPU_ARCH_MAVERICK;
@@ -221,7 +221,7 @@ static const arm_feature_set arm_ext_fp16 =
   ARM_FEATURE_CORE_HIGH (ARM_EXT2_FP16_INST);
 
 static const arm_feature_set arm_arch_any = ARM_ANY;
-static const arm_feature_set arm_arch_full = ARM_FEATURE (-1, -1, -1);
+static const arm_feature_set arm_arch_full ATTRIBUTE_UNUSED = ARM_FEATURE (-1, -1, -1);
 static const arm_feature_set arm_arch_t2 = ARM_ARCH_THUMB2;
 static const arm_feature_set arm_arch_none = ARM_ARCH_NONE;
 static const arm_feature_set arm_arch_v6m_only = ARM_ARCH_V6M_ONLY;
@@ -1988,6 +1988,10 @@ parse_neon_el_struct_list (char **str, unsigned *pbase,
   const char *const incr_error = _("register stride must be 1 or 2");
   const char *const type_error = _("mismatched element/structure types in list");
   struct neon_typed_alias firsttype;
+  firsttype.defined = 0;
+  firsttype.eltype.type = NT_invtype;
+  firsttype.eltype.size = -1;
+  firsttype.index = -1;
 
   if (skip_past_char (&ptr, '{') == SUCCESS)
 leading_brace = 1;


Re: [PATCH][ARM][gas] Fix warnings about uninitialised uses and unused const variables

2016-04-29 Thread Kyrill Tkachov

This was supposed to go to the binutils list :(
Please ignore.

Kyrill

On 29/04/16 15:37, Kyrill Tkachov wrote:

Hi all,

I recently upgraded my host compiler to GCC 6.1.0 and while trying to build a
cross toolchain for arm-none-eabi I've encountered some -Werror errors in
tc-arm.c.
This patch fixes them.

Some static const variables that are unused are marked with ATTRIBUTE_UNUSED.
In parse_neon_el_struct_list GCC complains that firsttype.index may be used
uninitialized in an inlined neon_alias_types_same call.
This patch initialises the fields of firsttype to prevent that.

With this patch the gas build succeeds for me.
Tested with make check-gas for arm-none-eabi.

Ok to commit?

Thanks,
Kyrill

2016-04-29  Kyrylo Tkachov  

* config/tc-arm.c (fpu_arch_vfp_v1): Mark with ATTRIBUTE_UNUSED.
(fpu_arch_vfp_v3): Likewise.
(fpu_arch_neon_v1): Likewise.
(arm_arch_full): Likewise.
(parse_neon_el_struct_list): Initialize fields of firsttype.




Re: [PATCH, i386]: Extend TARGET_READ_MODIFY{,_WRITE} peepholes to all integer modes

2016-04-29 Thread Uros Bizjak
On Fri, Apr 29, 2016 at 4:30 PM, Eric Botcazou  wrote:
>> Eh, then the build yawns about the missing mode of the input operand.
>
> Every good back-end has at least an example of this. ;-)
>
>> This is getting a bit frustrating, but attached patch should solve
>> this failure. Again lightly tested, regtest in progress.
>
> Everything is back to normal with this one, thanks!

Thanks for testing, committed to mainline with the following ChangeLog:

2016-04-29  Uros Bizjak  

* config/i386/i386.md (unspec): Add UNSPEC_PROBE_STACK.
(probe_stack): New expander.
(probe_stack_<mode>): New insn pattern.

Uros.


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-29 Thread Jeff Law

On 04/28/2016 06:08 PM, Patrick Palka wrote:



The glitch in that plan is there is no easy linkage between the use of b_5
in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew and myself
are looking at independently for some of Aldy's work.


I see.. One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb3 could look like

:
b_15 = ASSERT_EXPR ;
b_16 = ASSERT_EXPR ;
foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only
look at the immediate uses of b_5.  This oversight makes us fail to
thread

void bar (void);
void baz (void);

void
foo (int a)
{
  if (a != 5 && a != 10)
bar ();
  if (a == 10)
baz ();
}
Can you file this as a BZ please and add me to the cc list?  I'll
probably add Andrew as well, since this is a great example of something
we'd like to catch with his work.   Thanks.
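
For anyone following along, the missed opportunity in the quoted foo can
be written out by hand, roughly as below.  This is only a source-level
sketch of what threading would achieve, not what the pass emits; the call
counters and the helper names same_calls and check_range are additions
for the example.

```c
#include <assert.h>

/* Call counters stand in for the real side effects of bar and baz.  */
static int bar_calls, baz_calls;

static void bar (void) { bar_calls++; }
static void baz (void) { baz_calls++; }

/* The function as quoted: the second test is redundant whenever the
   first condition was true.  */
static void
foo (int a)
{
  if (a != 5 && a != 10)
    bar ();
  if (a == 10)
    baz ();
}

/* Hand-threaded form: each path tests only what it still needs.  */
static void
foo_threaded (int a)
{
  if (a == 5)
    return;          /* Neither call; a == 10 is known false.  */
  if (a == 10)
    {
      baz ();
      return;
    }
  bar ();            /* a != 5 && a != 10, so the a == 10 test is skipped.  */
}

/* Return 1 if both versions make the same calls for A.  */
static int
same_calls (int a)
{
  int bar_ref, baz_ref;

  bar_calls = baz_calls = 0;
  foo (a);
  bar_ref = bar_calls;
  baz_ref = baz_calls;

  bar_calls = baz_calls = 0;
  foo_threaded (a);
  return bar_ref == bar_calls && baz_ref == baz_calls;
}

/* Assert equivalence over a small range of inputs.  */
static void
check_range (int lo, int hi)
{
  int a;

  for (a = lo; a <= hi; a++)
    assert (same_calls (a));
}
```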







In this specific instance, there's a good chance your analysis is catching
something earlier and allowing it to be better simplified.  But let's do the
analysis to make sure.


From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my
analysis though!

I'll walk through it.



By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements.  Also I would expect abort() to
have the noreturn attribute.
Those testcases are heavily reduced and ultimately are useful only to 
show cases where jump threading ought to happen -- there's all kinds of 
undefined behaviour in those tests.  Their only purpose is to set up a 
CFG and conditionals in which jump threading should be happening and 
wasn't at some point or another.




That makes sense!  I will play around with this technique.
Aside from the time, the biggest problem is ASLR and the ld.so hashing 
bits which cause slight variations from one run to the next.  Otherwise 
it is highly stable, results are independent of the runtime load on the 
machine and measure the primary effect I'm usually searching for.  All 
excellent properties :-)


For your patch the reduction in runtime branches is tiny, on the order 
of 0.01%, but clearly still visible and well outside the typical noise.


What is far more interesting is its overall effect on total instructions 
executed.  Typically for each runtime branch eliminated I see 2-4 total 
instruction fetches eliminated.  Which makes sense if you think about it 
-- the branch usually has a comparison and perhaps some setup code which 
becomes dead as a result of threading the jump.


Your patch eliminates 8.5 instruction fetches per branch eliminated, 
which is very good, in fact, it's by far the highest ratio I can recall 
ever seeing.  So essentially while it doesn't fire often, when it fires 
it's a much bigger win than most jump threading cases.


Jeff



Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-04-29 Thread Bin.Cheng
On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
 wrote:
> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
>> Hi,
>> Tree if-conversion sometimes cannot convert conditional array reference into 
>> unconditional one.  Root cause is GCC conservatively assumes newly 
>> introduced array reference could be out of array bound and thus trapping.  
>> This patch improves the situation by proving the converted unconditional 
>> array reference is within array bound using loop niter information.  To be 
>> specific, it checks every index of array reference to see if it's within 
>> bound in ifcvt_memrefs_wont_trap.  This patch also factors out 
>> base_object_writable checking if the base object is writable or not.
>> Bootstrap and test on x86_64 and aarch64, is it OK?
>
> I think you miss to handle the case optimally where the only
> non-ARRAY_REF idx is the dereference of the
> base-pointer for, say, p->a[i].  In this case we can use
> base_master_dr to see if p is unconditionally dereferenced
Yes, will pick up this case.

> in the loop.  You also fail to handle the case where we have
> MEM_REF[&x].a[i] that is, you see a decl base.
I am having difficulty creating this case for ifcvt, any advice?  Thanks.

> I suppose for_each_index should be fixed for this particular case (to
> return true), same for TARGET_MEM_REF TMR_BASE.
>
> +  /* The case of nonconstant bounds could be handled, but it would be
> + complicated.  */
> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
> +  || !high || TREE_CODE (high) != INTEGER_CST)
> +return false;
> +
>
> handling of a non-zero but constant low bound is important - otherwise
> all this is a no-op for Fortran.  It
> shouldn't be too difficult to handle after all.  In fact I think your
> code does handle it correctly already.
>
> +  if (!init || TREE_CODE (init) != INTEGER_CST
> +  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
> +return false;
>
> step == 0 should be easy to handle as well, no?  The index will simply
> always be 'init' ...
>
> +  /* In case the relevant bound of the array does not fit in type, or
> + it does, but bound + step (in type) still belongs into the range of the
> + array, the index may wrap and still stay within the range of the array
> + (consider e.g. if the array is indexed by the full range of
> + unsigned char).
> +
> + To make things simpler, we require both bounds to fit into type, although
> + there are cases where this would not be strictly necessary.  */
> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
> +return false;
> +
> +  low = fold_convert (type, low);
>
> please use wide_int for all of this.
Now I use wi::fits_to_tree_p instead of int_fits_type_p.  But I am not
sure what is meant by handling the "low = fold_convert (type, low);"
related code in wide_int.  Do you mean to use tree_int_cst_compare
instead of tree_int_cst_compare in the following code?

>
> I wonder if we can do something for wrapping IVs like
>
> int a[2048];
>
> for (int i = 0; i < 4096; ++i)
>   ... a[(unsigned char)i];
>
> as well.  For example, if the IV type's max and min values are within the
> array bounds, simply return true?
I think we can only do this for reads.  For writes this is not safe.
From the vectorizer's point of view, is this worth handling?  Could the
vectorizer handle a wrapping IV in a smaller range than the loop IV?
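
For reference, the suggestion quoted above can be sketched in plain C as
follows.  This is only an illustration under simplifying assumptions:
type_range_within_array() is a hypothetical helper working on long long
values, not GCC's tree or wide_int machinery, and sum_wrapping() is the
read-only form of the quoted loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when every value representable in an index
   type with range [type_min, type_max] is a valid subscript for an
   array whose valid subscripts are [low, high].  No reasoning about the
   loop's iteration count is needed in that case.  */
bool type_range_within_array(long long type_min, long long type_max,
                             long long low, long long high)
{
    assert(type_min <= type_max && low <= high);
    return low <= type_min && type_max <= high;
}

/* The loop from the quoted example, restricted to reads.  The subscript
   wraps, but it always lies in [0, UCHAR_MAX].  */
long long sum_wrapping(const int a[2048])
{
    long long s = 0;

    for (int i = 0; i < 4096; ++i)
        s += a[(unsigned char)i];
    return s;
}
```

With 8-bit chars, type_range_within_array (0, UCHAR_MAX, 0, 2047) holds,
whereas a signed char index would fail the check because of its negative
values.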

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-04-28  Bin Cheng  
>>
>> * tree-if-conv.c (tree-ssa-loop.h): Include header file.
>> (tree-ssa-loop-niter.h): Ditto.
>> (idx_within_array_bound, ref_within_array_bound): New functions.
>> (ifcvt_memrefs_wont_trap): Check if array ref is within bound.
>> Factor out check on writable base object to ...
>> (base_object_writable): ... here.
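
The bound check described in the patch summary can be sketched roughly
as below.  This is a simplified model in plain C, not the code from the
patch: idx_stays_in_bounds() and its parameters are hypothetical names,
the values are long long rather than trees, and niter is taken here to
mean the number of times the index is advanced.  It ignores the question
of whether the bounds fit the index type, which the quoted comment
handles separately.  In this model a non-zero constant low bound and a
zero step need no special treatment.

```c
#include <assert.h>
#include <stdbool.h>

/* The index takes the values init, init + step, ..., init + step * niter.
   That sequence is monotonic, so it stays inside [low, high] exactly
   when its first and last values do.  */
bool idx_stays_in_bounds(long long init, long long step, long long niter,
                         long long low, long long high)
{
    long long span, last;

    assert(low <= high);

    if (niter < 0)
        return false;           /* iteration count not known */

    /* GCC/Clang overflow builtins: give up if the last index value
       cannot be computed without overflow.  */
    if (__builtin_mul_overflow(step, niter, &span)
        || __builtin_add_overflow(init, span, &last))
        return false;

    long long min = init < last ? init : last;
    long long max = init < last ? last : init;

    return low <= min && max <= high;
}
```

For example, an index running 0..15 over a 16-element array passes, as
does a Fortran-style 1..10 index over bounds [1, 10], while an index
reaching 16 on the 16-element array is rejected.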


Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-04-29 Thread Kyrill Tkachov


On 27/04/16 15:17, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On Thursday 17 December 2015 17:32:48 Thomas Preud'homme wrote:

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas


-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Wednesday, December 16, 2015 7:59 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
Kyrylo Tkachov
Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
multilibs

Currently in config.gcc, only the first multilib in a multilib list is
checked for validity; the following elements are ignored because, in shell,
the break exits the enclosing loop rather than just the case statement.  The
code also loops over the multilib list elements even though no combination
of them is legal.  This patch reworks the code to address both issues.

ChangeLog entry is as follows:


2015-11-24  Thomas Preud'homme  

 * config.gcc: Error out when conflicting multilib is detected.  Do not
 loop over multilibs since no combination is legal.


Ok for trunk.
Thanks,
Kyrill


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 59aee2c..be3c720 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3772,38 +3772,40 @@ case "${target}" in

# Add extra multilibs
if test "x$with_multilib_list" != x; then

arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
-   for arm_multilib in ${arm_multilibs}; do
-   case ${arm_multilib} in
-   aprofile)
+   case ${arm_multilibs} in
+   aprofile)

# Note that arm/t-aprofile is a
# stand-alone make file fragment to be
# used only with itself.  We do not
# specifically use the
# TM_MULTILIB_OPTION framework because
# this shorthand is more

-   # pragmatic. Additionally it is only
-   # designed to work without any
-   # with-cpu, with-arch with-mode
+   # pragmatic.
+   tmake_profile_file="arm/t-aprofile"
+   ;;
+   default)
+   ;;
+   *)
+   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+   exit 1
+   ;;
+   esac
+
+   if test "x${tmake_profile_file}" != x ; then
+   # arm/t-aprofile is only designed to work
+   # without any with-cpu, with-arch, with-mode,
# with-fpu or with-float options.

-   if test "x$with_arch" != x \
-   || test "x$with_cpu" != x \
-   || test "x$with_float" != x \
-   || test "x$with_fpu" != x \
-   || test "x$with_mode" != x ; then
-   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" 1>&2
-   exit 1
-   fi
-   tmake_file="${tmake_file} arm/t-aprofile"
-   break
-   ;;
-   default)
-   ;;
-   *)
-   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-   exit 1
-   ;;
-   esac
-   done
+   if test "x$with_arch" != x \
+   || test "x$with_cpu" != x \
+   || test "x$with_float" != x \
+   || test "x$with_fpu" != x \
+   || test "x$with_mode" != x ; then
+   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}" 1>&2
+   exit 1
+   fi
+
+   tmake_file="${tmake_file} ${tmake_profile_file}"
+   fi

fi
;;

Tested with the following m

Re: [PATCH v2] gcov: Runtime configurable destination output

2016-04-29 Thread Aaron Conole
Nathan Sidwell  writes:

> On 04/27/16 16:59, Aaron Conole wrote:
>> Apologies for the top post. Pinging on this again. It still applies
>> cleanly, so no need to resubmit, I think. Is there anything else missing
>> or required before this can go in?
>
> I'm not convinced this is a desirable feature.  IIRC your rationale
> for it was that you're somehow building the target program with
> inconsistent coverage data, and the messages about that are
> interfering with your program's output.
>
> That's kind of the point of error messages -- to get in your face.

Perhaps I've poorly explained what I want. I want to be able to pipe
gcov error messages to a different file for post-processing / reporting
elsewhere. I don't want them mixed with the application's messages. Do
you think this kind of generic flexibility is not a good thing, when it
comes at such little cost?

The whole point of this is to provide a way to keep the error messages
around. After all, if I really didn't want to see them I could do at
least the following things (untested, just for example):
  1. `./myapp 2>/dev/null` (which I don't want to do)
  2. { ...; fclose(stderr); stderr = fopen("gcoverrfile", "w"); exit(0); }
  3. mkfifo something; ./myapp 2>something; sed -e s,gcov_error_msg,,g something

But, this appeared to me like a generic way of providing what I want
in a way that could apply to any other application, without relying on
workarounds. If that's not a convincing argument, then I guess NAK it
and I'll be done with it - apologies for the noise.
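
For completeness, workaround 2 above assigns to stderr directly, which
is not guaranteed to work because the C standard does not require stderr
to be a modifiable lvalue.  A more portable way to get the same effect
is freopen(), sketched below.  This is only an illustration of the
workaround, not of the proposed patch; redirect_stderr_to() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Re-associate stderr with the file at path so that anything written to
   stderr afterwards ends up in that file.  Returns 0 on success and -1
   on failure; note that on failure freopen() has already closed the
   original stream.  */
int redirect_stderr_to(const char *path)
{
    assert(path != NULL);
    return freopen(path, "w", stderr) != NULL ? 0 : -1;
}
```

An application would call redirect_stderr_to ("gcoverrfile") before it
exits, so that messages printed to stderr during exit processing go to
that file rather than to the terminal.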

Much thanks for your time,
-Aaron

> nathan

