Re: [Patch] Fix PR 60040

2016-04-28 Thread Joern Wolfgang Rennecke



On 28/04/16 07:57, Senthil Kumar Selvaraj wrote:

diff --git libcilkrts/ChangeLog libcilkrts/ChangeLog
index 8fada8a..ed26a3a 100644
--- libcilkrts/ChangeLog
+++ libcilkrts/ChangeLog
@@ -1,9 +1,3 @@
-2016-04-26  Rainer Orth  
-
-   PR target/60290
-   * Makefile.am (GENERAL_FLAGS): Add -funwind-tables.
-   * Makefile.in: Regenerate.
-
  2015-11-09  Igor Zamyatin  
  
  	PR target/66326
This does not appear related, and it would be the wrong way to back out
a patch in the FSF repo even if you wanted to, so I suppose this must be
unintentional.


Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-28 Thread Richard Biener
On Wed, 27 Apr 2016, Bernd Edlinger wrote:

> On 26.04.2016 22:14, Joseph Myers wrote:
> > On Tue, 26 Apr 2016, Bernd Edlinger wrote:
> >
> >> Hi,
> >>
> >> as we all know, it's high time now to adjust the minimum supported
> >> gmp/mpfr/mpc versions for gcc-7.
> >
> > I think updating the minimum versions (when using previously built
> > libraries, not in-tree) is only appropriate when it allows some cleanup in
> > GCC, such as removing conditionals on whether a more recently added
> > function is available, adding functionality that depends on a newer
> > interface, or using newer interfaces instead of older ones that are now
> > deprecated.
> >
> > For example, you could justify a move to requiring MPFR 3.0.0 or later
> > with cleanups to use MPFR_RND* instead of the older GMP_RND*, and
> > similarly mpfr_rnd_t instead of the older mp_rnd_t and likewise mpfr_exp_t
> > and mpfr_prec_t in fortran/.  You could justify a move to requiring MPC
> > 1.0.0 (or 1.0.2) by optimizing clog10 using mpc_log10.  I don't know what
> > if any newer GMP interfaces would be beneficial in GCC.  And as always in
> > such cases, it's a good idea to look at e.g. how widespread the newer
> > versions are in GNU/Linux distributions, which indicates how many people
> > might be affected by an increase in the version requirement.
> >
> 
> Yes I see.
> 
> I would justify it this way: gmp-6.0.0 is the first version that does
> not invoke undefined behavior in gmp.h, once we update to gmp-6.0.0
> we could emit at least a warning in cstddef for this invalid code.
> 
> Once we have gmp-6.0.0, the earliest mpfr version that compiles at all
> is mpfr-3.1.1 and the earliest mpc version that compiles at all is
> mpc-0.9.  This would be the supported installed versions.
> 
> In-tree gmp-6.0.0 does _not_ work for ARM.  But gmp-6.1.0 does (with a
> little quirk).  All supported mpfr and mpc versions are working in-tree
> too, even for the ARM target.
> 
> When we have at least mpfr-3.1.1, it is straight forward to remove the
> pre-3.1.0 compatibility code from gcc/fortran/simplify.c for instance.
> 
> So I would propose this updated patch for gcc-7.

As said elsewhere, the main reason for all of this is to make the
in-tree builds work better for newer archs that are not happy with
the versions provided by download_prerequisites.  This should come
with a documentation adjustment that the only tested in-tree
versions are those downloaded by download_prerequisites.

Please address updating the minimum supported _installed_ version
separately (in fact I do maintain a patch to disable stuff to be
able to go back to even older mpfr versions ... :/).

SLES 11 ships with mpfr 2.3.2, mpc 0.8 and gmp 4.2.3 while SLES 12
and openSUSE Leap have gmp 5.1.3, mpfr 3.1.2 and mpc 1.0.2.
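
To make the distribution data above easier to compare against the minimums floated in the quoted proposal (gmp 6.0.0, mpfr 3.1.1, mpc 0.9), here is a minimal sketch of a version check.  The type and function names are hypothetical and not part of GCC; the packing is similar in spirit to version macros such as MPFR_VERSION_NUM, and it assumes every version component is below 256.

```c
#include <stdbool.h>

/* Hypothetical helper type: a (major, minor, revision) version triple.  */
struct lib_version { int maj, min, rev; };

/* Pack a version into one integer so that plain integer comparison
   orders versions correctly.  Assumes every component is below 256.  */
unsigned int version_num (struct lib_version v)
{
  return ((unsigned int) v.maj << 16)
         | ((unsigned int) v.min << 8)
         | (unsigned int) v.rev;
}

/* Return true if INSTALLED is at least MINIMUM.  */
bool meets_minimum (struct lib_version installed, struct lib_version minimum)
{
  return version_num (installed) >= version_num (minimum);
}

/* Installed versions quoted in this thread.  */
const struct lib_version sles11_gmp  = { 4, 2, 3 };
const struct lib_version sles11_mpfr = { 2, 3, 2 };
const struct lib_version sles11_mpc  = { 0, 8, 0 };
const struct lib_version sles12_gmp  = { 5, 1, 3 };
const struct lib_version sles12_mpfr = { 3, 1, 2 };
const struct lib_version sles12_mpc  = { 1, 0, 2 };

/* Minimums from the quoted proposal (an assumption of this sketch,
   not a decision recorded in this thread).  */
const struct lib_version proposed_gmp  = { 6, 0, 0 };
const struct lib_version proposed_mpfr = { 3, 1, 1 };
const struct lib_version proposed_mpc  = { 0, 9, 0 };
```

With these figures, meets_minimum (sles12_gmp, proposed_gmp) is false, so neither SLES release would satisfy a gmp 6.0.0 requirement for installed libraries.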

Thanks,
Richard.


New Vietnamese PO file for 'gcc' (version 6.1.0)

2016-04-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/gcc/vi.po

(This file, 'gcc-6.1.0.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH] Fix PR70777

2016-04-28 Thread Richard Biener

The following removes a premature optimization/canonicalization from
fold-const.c which is now done by reassoc.  This avoids doing this
when sincos is not run (at -Og).  The reassoc pass now does this
transform (and in a more generic way by using powi).

I suspect there are a few missed simplifications regarding
mixing powi and pow, so for the branches, guarding the folding
with !optimize_debug is more appropriate.

Still on trunk we're now getting additional mult/add reassoc
features and should revisit pow[i] handling there.
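
For readers unfamiliar with the powi form mentioned above, here is a minimal sketch of how an integer power can be expanded into multiplications by repeated squaring.  The function name is hypothetical and this is not the code GCC uses; it only illustrates why x*x and an integer power of 2 describe the same computation.

```c
/* Hypothetical stand-in for an integer-power expansion: computes x**n
   using only multiplications (plus one division for negative n).  */
double powi_sketch (double x, int n)
{
  /* Work on the magnitude as unsigned so that INT_MIN is handled.  */
  unsigned int e = n < 0 ? 0u - (unsigned int) n : (unsigned int) n;
  double result = 1.0;

  while (e != 0)
    {
      if (e & 1u)
        result *= x;    /* Fold in the current power of x.  */
      e >>= 1;
      if (e != 0)
        x *= x;         /* Square for the next exponent bit.  */
    }
  return n < 0 ? 1.0 / result : result;
}
```

For n == 2 the loop performs one squaring and one multiplication by 1.0, which is the x*x that the removed code used to rewrite as pow(x,2.0).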

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.


2016-04-28  Richard Biener  

PR middle-end/70777
* fold-const.c (fold_binary_loc): Remove x*x to pow(x,2.0)
canonicalization.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 235510)
+++ gcc/fold-const.c(working copy)
@@ -10033,24 +10033,6 @@ fold_binary_loc (location_t loc,
  && TREE_CODE (arg1) == CONJ_EXPR
  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
return fold_mult_zconjz (loc, type, arg0);
-
- if (flag_unsafe_math_optimizations)
-   {
-
- /* Canonicalize x*x as pow(x,2.0), which is expanded as x*x.  */
- if (!in_gimple_form
- && optimize
- && operand_equal_p (arg0, arg1, 0))
-   {
- tree powfn = mathfn_built_in (type, BUILT_IN_POW);
-
- if (powfn)
-   {
- tree arg = build_real (type, dconst2);
- return build_call_expr_loc (loc, powfn, 2, arg0, arg);
-   }
-   }
-   }
}
   goto associate;
 


[patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Matthias Klose
Bumping the version from 6.0.0 to 6.1.0 broke gcj, because the minor
version is still encoded in the gcj abi.  This was not seen during development
of the 6 series, because the minor version was only bumped for the final release.


The gcc-5-branch needs a slightly different approach, because we froze the abi 
version only with the 5.3.0 release.


--- gcc/java/decl.c (revision 235458)
+++ gcc/java/decl.c (working copy)
@@ -561,9 +561,10 @@
   else /* C++ ABI */
 {
   /* Implicit in this computation is the idea that we won't break the
-old-style binary ABI in a sub-minor release (e.g., from 4.0.0 to
-4.0.1).  */
-  abi_version = 10 * major + 1000 * minor;
+old-style binary ABI in a sub-minor release (e.g., from 5.0 to
+5.1).  Freeze the ABI on the gcc-5-branch with the value of the
+GCC 5.3 release.*/
+  abi_version = 10 * major + 1000 * 3;
 }
   if (flag_bootstrap_classes)
 abi_version |= FLAG_BOOTSTRAP_LOADER;

Ok for the 6 branch and the trunk?

Matthias
2016-04-28  Matthias Klose  

	* decl.c (parse_version): Don't encode the minor version in the abi
	version.

 
--- gcc/java/decl.c
+++ gcc/java/decl.c
@@ -540,9 +540,9 @@
   else /* C++ ABI */
 {
   /* Implicit in this computation is the idea that we won't break the
-	 old-style binary ABI in a sub-minor release (e.g., from 4.0.0 to
-	 4.0.1).  */
-  abi_version = 10 * major + 1000 * minor;
+	 old-style binary ABI in a sub-minor release (e.g., from 6.0 to
+	 6.1).  */
+  abi_version = 10 * major;
 }
   if (flag_bootstrap_classes)
 abi_version |= FLAG_BOOTSTRAP_LOADER;
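
To spell out the arithmetic behind the two patches, here is a small sketch.  The function names are hypothetical; only the three expressions are taken from the diffs above, and the FLAG_BOOTSTRAP_LOADER bit is left out.

```c
/* Old computation: the minor version leaks into the ABI version.  */
int gcj_abi_old (int major, int minor)
{
  return 10 * major + 1000 * minor;
}

/* GCC 6 branch and trunk: only the major version is encoded.  */
int gcj_abi_major_only (int major)
{
  return 10 * major;
}

/* gcc-5-branch: frozen at the value the 5.3 release used.  */
int gcj_abi_gcc5_frozen (int major)
{
  return 10 * major + 1000 * 3;
}
```

Under the old formula 6.0 gives 60 and 6.1 gives 1060, which is the mismatch described at the top of this message.  The major-only formula gives 60 for both, and the frozen gcc-5-branch value is 3050, the same as the old formula gives for 5.3.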


Re: [patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Andrew Haley
On 28/04/16 08:55, Matthias Klose wrote:
> Ok for the 6 branch and the trunk?

OK,

Andrew.



Re: [Patch] Fix PR 60040

2016-04-28 Thread Senthil Kumar Selvaraj

Joern Wolfgang Rennecke writes:

> On 28/04/16 07:57, Senthil Kumar Selvaraj wrote:
>> diff --git libcilkrts/ChangeLog libcilkrts/ChangeLog
>> index 8fada8a..ed26a3a 100644
>> --- libcilkrts/ChangeLog
>> +++ libcilkrts/ChangeLog
>> @@ -1,9 +1,3 @@
>> -2016-04-26  Rainer Orth  
>> -
>> -PR target/60290
>> -* Makefile.am (GENERAL_FLAGS): Add -funwind-tables.
>> -* Makefile.in: Regenerate.
>> -
>>   2015-11-09  Igor Zamyatin  
>>   
>>  PR target/66326
> This does not appear related, and it would be the wrong way to back out 
> a patch in the FSF repo
> even if you wanted to, so I suppose this must be unintentional.

Yup, I should have removed those. Sorry about that.

Here's the patch with the extra bits removed.

gcc/ChangeLog

2016-04-28  Senthil Kumar Selvaraj  

PR target/60040
* reload1.c (reload): Call finish_spills before
restarting reload loop. Skip select_reload_regs
if update_eliminables_and_spill returns true.

gcc/testsuite/ChangeLog

2016-04-28  Sebastian Huber  
Matthijs Kooijman  
Senthil Kumar Selvaraj  

PR target/60040
* gcc.target/avr/pr60040-1.c: New.
* gcc.target/avr/pr60040-2.c: New.


diff --git gcc/reload1.c gcc/reload1.c
index c2800f8..d6fcece 100644
--- gcc/reload1.c
+++ gcc/reload1.c
@@ -981,7 +981,8 @@ reload (rtx_insn *first, int global)
   /* If we allocated another stack slot, redo elimination bookkeeping.  */
   if (something_was_spilled || starting_frame_size != get_frame_size ())
{
- update_eliminables_and_spill ();
+ if (update_eliminables_and_spill ())
+   finish_spills (global);
  continue;
}
 
@@ -1021,10 +1022,12 @@ reload (rtx_insn *first, int global)
  did_spill = 1;
  something_changed = 1;
}
-
-  select_reload_regs ();
-  if (failure)
-   goto failed;
+  else
+   {
+ select_reload_regs ();
+ if (failure)
+   goto failed;
+   }
 
   if (insns_need_reload != 0 || did_spill)
something_changed |= finish_spills (global);
diff --git gcc/testsuite/gcc.target/avr/pr60040-1.c 
gcc/testsuite/gcc.target/avr/pr60040-1.c
new file mode 100644
index 000..4fe296b
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr60040-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+float dhistory[10];
+float test;
+
+float getSlope(float history[]) {
+  float sumx = 0;
+  float sumy = 0;
+  float sumxy = 0;
+  float sumxsq = 0;
+  float rate = 0;
+  int n = 10;
+
+  int i;
+  for (i=1; i< 11; i++) {
+sumx = sumx + i;
+sumy = sumy + history[i-1];
+sumy = sumy + history[i-1];
+sumxsq = sumxsq + (i*i);
+  }
+
+  rate = sumy+sumx+sumxsq;
+  return rate;
+}
+
+void loop() {
+  test = getSlope(dhistory);
+}
diff --git gcc/testsuite/gcc.target/avr/pr60040-2.c 
gcc/testsuite/gcc.target/avr/pr60040-2.c
new file mode 100644
index 000..c40d49f
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr60040-2.c
@@ -0,0 +1,112 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned char __uint8_t;
+typedef short unsigned int __uint16_t;
+typedef long unsigned int __uint32_t;
+typedef __uint8_t uint8_t ;
+typedef __uint16_t uint16_t ;
+typedef __uint32_t uint32_t ;
+typedef __builtin_va_list __gnuc_va_list;
+typedef __gnuc_va_list va_list;
+typedef enum rtems_blkdev_request_op {
+  RTEMS_BLKDEV_REQ_READ,
+} rtems_fdisk_segment_desc;
+typedef struct rtems_fdisk_driver_handlers
+{
+  int (*blank) (const rtems_fdisk_segment_desc* sd,
+uint32_t device,
+uint32_t segment,
+uint32_t offset,
+uint32_t size);
+} rtems_fdisk_driver_handlers;
+typedef struct rtems_fdisk_page_desc
+{
+  uint16_t flags;
+  uint32_t block;
+} rtems_fdisk_page_desc;
+typedef struct rtems_fdisk_segment_ctl
+{
+  rtems_fdisk_page_desc* page_descriptors;
+  uint32_t pages_active;
+} rtems_fdisk_segment_ctl;
+typedef struct rtems_fdisk_segment_ctl_queue
+{
+} rtems_fdisk_segment_ctl_queue;
+typedef struct rtems_fdisk_device_ctl
+{
+  uint32_t flags;
+  uint8_t* copy_buffer;
+} rtems_flashdisk;
+
+extern void rtems_fdisk_error (const char *, ...);
+extern int rtems_fdisk_seg_write(const rtems_flashdisk*,
+ rtems_fdisk_segment_ctl*,
+ uint32_t,
+ const rtems_fdisk_page_desc* page_desc,
+uint32_t);
+
+void rtems_fdisk_printf (const rtems_flashdisk* fd, const char *format, ...)
+{
+  {
+va_list args;
+__builtin_va_start(args,format);
+  }
+}
+static int
+rtems_fdisk_seg_blank_check (const rtems_flashdisk* fd,
+ rtems_fdisk_segment_ctl* sc,
+ uint32_t offset,
+ uint32_t size)
+{
+  uint32_t device;
+  uint32_t segment;
+  const rtems_fdisk_segment_desc* sd;
+  const rtems_fdisk_driver_h

Re: [PATCH 1/2] [ARC/LIBGCC] Add TLS support.

2016-04-28 Thread Joern Wolfgang Rennecke



On 15/04/16 10:58, Claudiu Zissulescu wrote:

TLS mods for libgcc.

OK to apply?
Claudiu

libgcc/
2016-04-15  Claudiu Zissulescu  
Joern Rennecke  

* config/arc/crttls.S: New file.
* config/arc/t-arc: New rule.
* config.host (arc*-*-elf*, arc*-*-linux*): Add crttls.o.
-

 The libgcc part is OK.


Re: [patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Rainer Orth
Matthias Klose  writes:

> Bumping the version from 6.0.0 to 6.1.0 broke gcj, because the minor
> version is still encoded in the gcj abi, not seen during development of the
> 6 series until it was bumped for the final release.

This is PR java/70839.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Maxim Kuvyrkov
> On Apr 27, 2016, at 6:22 PM, Torsten Duwe  wrote:
> 
> Hi Maxim,
> 
> thanks for starting the work on this; I have added the missing
> command line option. It builds now and the resulting compiler generates
> a linux kernel with the desired properties, so work can continue there.

Thanks for working on this!

> 
>   Torsten
> 
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 9bc02fc..57265c5 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -393,6 +393,7 @@ static tree handle_designated_init_attribute (tree *, 
> tree, tree, int, bool *);
> static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool 
> *);
> static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
> static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
> +static tree handle_prolog_pad_attribute (tree *, tree, tree, int, bool *);
> 
> static void check_function_nonnull (tree, int, tree *);
> static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
> @@ -833,6 +834,8 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_bnd_legacy, false },
>   { "bnd_instrument", 0, 0, true, false, false,
> handle_bnd_instrument, false },
> +  { "prolog_pad",  1, 1, false, true, true,
> +   handle_prolog_pad_attribute, false },
>   { NULL, 0, 0, false, false, false, NULL, false }
> };
> 
> @@ -9663,6 +9666,16 @@ handle_designated_init_attribute (tree *node, tree 
> name, tree, int,
>   return NULL_TREE;
> }
> 
> +static tree
> +handle_prolog_pad_attribute (tree *, tree name, tree, int,
> +  bool *)
> +{
> +  warning (OPT_Wattributes,
> +"%qE attribute is used", name);
> +
> +  return NULL_TREE;
> +}
> +
> 
> /* Check for valid arguments being passed to a function with FNTYPE.
>There are NARGS arguments in the array ARGARRAY.  */
> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
> index 9ae181f..31a8026 100644
> --- a/gcc/c-family/c-opts.c
> +++ b/gcc/c-family/c-opts.c
> @@ -532,6 +532,10 @@ c_common_handle_option (size_t scode, const char *arg, 
> int value,
>   cpp_opts->ext_numeric_literals = value;
>   break;
> 
> +case OPT_fprolog_pad_:
> +  prolog_nop_pad_size = value;
> +  break;

As Szabolcs noted in this thread, we need to consider how -fprolog-pad= will 
play with IPA and LTO.  The decision to use __attribute__ to generate the prolog 
pad for a function is specifically to handle LTO builds.  The option 
-fprolog-pad=N should set __attribute__((prolog_pad(N))) on every function in 
the current translation unit, and the rest should be handled by the attribute 
logic.  This is not trivial to implement, and is what has stopped me from 
finishing the patch.

Your current patch is great for letting the kernel engineers check whether 
the suggested approaches to code patching will work.  Still, I prefer to 
implement an LTO-friendly way of handling -fprolog-pad=N via function attributes.
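
As a rough illustration of the per-function form, here is a sketch of how user code might carry the attribute.  The attribute name prolog_pad comes from the patch under discussion and is not available in any released compiler, so the PROLOG_PAD macro (a hypothetical name) falls back to nothing unless the compiler reports support through __has_attribute.

```c
/* Use the proposed attribute only where the compiler knows it.
   prolog_pad is the name used in the patch under review; the
   PROLOG_PAD wrapper is a hypothetical convenience macro.  */
#if defined (__has_attribute)
# if __has_attribute (prolog_pad)
#  define PROLOG_PAD(n) __attribute__ ((prolog_pad (n)))
# endif
#endif
#ifndef PROLOG_PAD
# define PROLOG_PAD(n) /* Attribute not supported: expands to nothing.  */
#endif

/* A function that asks for two NOPs of padding before its prolog.
   The request travels with the function, so it can survive LTO.  */
PROLOG_PAD (2) int traced_add (int a, int b)
{
  return a + b;
}
```

Under this scheme -fprolog-pad=N would behave as if every function in the translation unit had been written with PROLOG_PAD (N).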

--
Maxim Kuvyrkov
www.linaro.org




> +
> case OPT_idirafter:
>   add_path (xstrdup (arg), AFTER, 0, true);
>   break;
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index aafd802..929ebb6 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1407,6 +1407,10 @@ fpreprocessed
> C ObjC C++ ObjC++
> Treat the input file as already preprocessed.
> 
> +fprolog-pad=
> +C ObjC C++ ObjC++ RejectNegative Joined UInteger Var(prolog_nop_pad_size) 
> Init(0)
> +Pad NOPs before each function prolog
> +
> ftrack-macro-expansion
> C ObjC C++ ObjC++ JoinedOrMissing RejectNegative UInteger
> ; converted into ftrack-macro-expansion=
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 1ce7181..9d10b10 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4553,6 +4553,10 @@ will select the smallest suitable mode.
> This section describes the macros that output function entry
> (@dfn{prologue}) and exit (@dfn{epilogue}) code.
> 
> +@deftypefn {Target Hook} void TARGET_ASM_PRINT_PROLOG_PAD (FILE *@var{file}, 
> unsigned HOST_WIDE_INT @var{pad_size}, bool @var{record_p})
> +Generate prologue pad
> +@end deftypefn
> +
> @deftypefn {Target Hook} void TARGET_ASM_FUNCTION_PROLOGUE (FILE *@var{file}, 
> HOST_WIDE_INT @var{size})
> If defined, a function that outputs the assembler code for entry to a
> function.  The prologue is responsible for setting up the stack frame,
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index a0a0a81..bda6d5c 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3662,6 +3662,8 @@ will select the smallest suitable mode.
> This section describes the macros that output function entry
> (@dfn{prologue}) and exit (@dfn{epilogue}) code.
> 
> +@hook TARGET_ASM_PRINT_PROLOG_PAD
> +
> @hook TARGET_ASM_FUNCTION_PROLOGUE
> 
> @hook TARGET_ASM_FUNCTION_END_PROLOGUE
> diff --git a/gcc/final.c b/gcc/final.c
> index 1

Re: [PATCH 2/2] [ARC] Add TLS support.

2016-04-28 Thread Joern Wolfgang Rennecke



On 15/04/16 10:58, Claudiu Zissulescu wrote:

TLS mods for ARC backend.

OK to apply?
Claudiu


This is using an inefficient TLS global dynamic implementation that would not
be expected in a new and/or well-tuned port.
However, if you have to work with a legacy runtime, that can't be helped.


+(define_insn "tls_gd_load"

..

+   ; if the linker has to patch this into IE, we need a long insns


Typo: a long insn.

arc_emit_call_tls_get_addr is missing a start-of-function comment.

Otherwise this is OK.



Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Maxim Kuvyrkov
> On Apr 27, 2016, at 7:26 PM, Szabolcs Nagy  wrote:
> 
> On 27/04/16 16:22, Torsten Duwe wrote:
>> Hi Maxim,
>> 
>> thanks for starting the work on this; I have added the missing
>> command line option. It builds now and the resulting compiler generates
>> a linux kernel with the desired properties, so work can continue there.
>> 
>>  Torsten
> 
> i guess the flag should be documented in invoke.texi
> 
> it's not clear what N means in -fprolog-pad=N, how
> location recording is enabled and how it interacts
> with -fipa-ra. (-pg disables -fipa-ra, but -fprolog-pad
> works without -pg.)

I think it should be the responsibility of the user to disable -fipa-ra if the code 
intended to be patched in is incompatible with IPA-RA.  I agree, though, 
that the documentation of -fprolog-pad= should make a special note of this fact and 
recommend adding -fno-ipa-ra to the cflags whenever -fprolog-pad= is 
used.

> 
> with -mfentry, by default the user only has to
> implement the fentry call (linux wants nops there, but
> e.g. glibc could use -pg -mfentry for profiling on
> aarch64 and the target specific details are easier to
> document for an -m option than for something general).

I don't understand your point here, could you elaborate, please?

> 
> the nop-padding is more general, but the size and
> layout of nops and the call abi will be target
> specific and the user will most likely need to modify
> the binary (to get the right sequence) which needs
> additional tooling.  i don't know who might use it
> other than linux (which already has tools to deal with
> -mfentry).

Right, but this tooling will require minimal (if any) changes to be adapted to 
the nop-pad approach.  If I remember correctly, recent versions of GCC and the kernel 
for x86_64 generate NOPs, not the call sequence, in the prologs when -mfentry is 
used.

> 
> i'm not against nop-padding, but i think more evidence
> is needed that the generalization is a good idea and
> users can deal with the resulting issues.

--
Maxim Kuvyrkov
www.linaro.org







[PATCH] Re-use cc1-checksum.c for stage-final

2016-04-28 Thread Richard Biener

The following prototype patch re-uses cc1-checksum.c from the
previous stage when compiling stage-final.  This eventually
allows comparing cc1 from the last two stages to fix the
lack of a true comparison when doing LTO bootstrap (it
compiles LTO bytecode from the compile-stage there, not the
final optimization result).

Bootstrapped on x86_64-unknown-linux-gnu.

When stripping gcc/cc1 and prev-gcc/cc1 after the bootstrap
they now compare identical (with LTO bootstrap it should
not require stripping as that doesn't do a bootstrap-debug AFAIK).

Is something like this acceptable?  (consider it also done for cp/Make-lang.in)

In theory we can compare all stage1 languages but I guess comparing
the required ones for a LTO bootstrap, cc1, cc1plus and lto1 would
be sufficient (or even just comparing one binary in which case
comparing lto1 would not require any patches).

This also gets rid of the annoying warning that cc1-checksum.o
differs (obviously).

Thanks,
Richard.

2016-04-28  Richard Biener  

c/
* Make-lang.in (cc1-checksum.c): For stage-final re-use
the checksum from the previous stage.

Index: gcc/c/Make-lang.in
===
--- gcc/c/Make-lang.in  (revision 235499)
+++ gcc/c/Make-lang.in  (working copy)
@@ -63,9 +63,14 @@ c-warn = $(STRICT_WARN)
 # compute checksum over all object files and the options
 cc1-checksum.c : build/genchecksum$(build_exeext) checksum-options \
$(C_OBJS) $(BACKEND) $(LIBDEPS) 
-   build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
+   if [ -f ../stage_final ] \
+  && cmp -s ../stage_current ../stage_final; then \
+ cp ../prev-gcc/cc1-checksum.c cc1-checksum.c; \
+   else \
+ build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
  checksum-options > cc1-checksum.c.tmp &&   \
-   $(srcdir)/../move-if-change cc1-checksum.c.tmp cc1-checksum.c
+ $(srcdir)/../move-if-change cc1-checksum.c.tmp cc1-checksum.c; \
+   fi
 
 cc1$(exeext): $(C_OBJS) cc1-checksum.o $(BACKEND) $(LIBDEPS)
+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $(C_OBJS) \


Re: [patch] cleanup *finish_omp_clauses

2016-04-28 Thread Jakub Jelinek
On Wed, Apr 27, 2016 at 07:37:17PM -0700, Cesar Philippidis wrote:
> This patch replaces all of the bool argument to c_finish_omp_clauses and
> finish_omp_clauses in the c and c++ front ends, respectively. Right now
> there are three bool arguments, one for is_omp/allow_fields,
> declare_simd and is_cilk, the latter two have default values set.
> OpenACC will require some special handling in *finish_omp_clauses in the
> near future, too, so rather than add an is_oacc argument, I introduced
> an enum c_omp_region_type, similar to the one in gimplify.c.
> 
> Is this patch ok for trunk? I'll make use of C_ORT_ACC shortly in a
> follow up patch.

I've been long wanting to use just tree_code there, but as we don't have one
e.g. for DECLARE_SIMD, perhaps a separate enum is better.

> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1261,6 +1261,17 @@ enum c_omp_clause_split
>C_OMP_CLAUSE_SPLIT_TASKLOOP = C_OMP_CLAUSE_SPLIT_FOR
>  };
>  
> +enum c_omp_region_type
> +{
> +  C_ORT_NONE = 0,
> +  C_ORT_OMP  = 1 << 0,
> +  C_ORT_SIMD = 1 << 1,
> +  C_ORT_CILK = 1 << 2,
> +  C_ORT_ACC  = 1 << 3,
> +  C_ORT_OMP_SIMD = C_ORT_OMP | C_ORT_SIMD,
> +  C_ORT_OMP_CILK = C_ORT_OMP | C_ORT_CILK

That said, the above names are just weird, it is non-obvious
what they mean at all.  What is C_ORT_NONE for?  We surely don't
have any clauses that aren't OpenMP, nor Cilk+, nor OpenACC
(ok, maybe the simd attribute, but donno if it ever calls the
*finish_omp_clauses functions).
So, IMHO the originating specification should be one thing, so
C_ORT_OMP, C_ORT_CILK, C_ORT_ACC.
And, beyond that, the C FE cares about whether it is a clause
on #pragma omp declare simd or its Cilk+ counterpart (vector attribute),
so you want C_ORT_DECLARE_SIMD possibly ored with the language
(note, not C_ORT_SIMD, that is way too confusing - we have
#pragma omp simd (OpenMP), #pragma simd (Cilk+), and we certainly do not
want the declare simd behavior for those).
Perhaps #pragma omp declare target is another construct that is handled
differently and could be visible in the bitmask too.

The C++ finish_omp_clauses also cares about whether fields (meaning
non-static members) are allowed, i.e. whether
struct S { int p; void foo () {
...
#pragma omp ... private (p)
...
}};
should be allowed or not.  That can be derived from the language and
the other construct bits though, I believe right now only OpenMP constructs
should handle it, and declare simd should not, and similarly declare target
should not.
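
A sketch of the bitmask layout suggested above might look as follows.  This is only an illustration: C_ORT_DECLARE_TARGET, the combined names, the bit values and the ort_allows_fields helper are all hypothetical, and the patch under review may end up looking different.

```c
#include <stdbool.h>

/* One bit per originating specification, plus construct bits that
   can be ored in.  Bit values are arbitrary in this sketch.  */
enum c_omp_region_type
{
  C_ORT_OMP            = 1 << 0,
  C_ORT_CILK           = 1 << 1,
  C_ORT_ACC            = 1 << 2,
  C_ORT_DECLARE_SIMD   = 1 << 3,
  C_ORT_DECLARE_TARGET = 1 << 4,   /* Hypothetical extra construct bit.  */
  C_ORT_OMP_DECLARE_SIMD  = C_ORT_OMP | C_ORT_DECLARE_SIMD,
  C_ORT_CILK_DECLARE_SIMD = C_ORT_CILK | C_ORT_DECLARE_SIMD
};

/* Derive the C++ "fields allowed" property from the other bits:
   only OpenMP constructs, and neither declare simd nor declare
   target.  */
bool ort_allows_fields (unsigned int ort)
{
  return (ort & C_ORT_OMP) != 0
         && (ort & (C_ORT_DECLARE_SIMD | C_ORT_DECLARE_TARGET)) == 0;
}
```

With this layout no separate allow_fields argument is needed, since the answer follows from the region type alone.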

Jakub


Re: [PATCH][AArch64][wwwdocs] Summarise some more AArch64 changes for GCC6

2016-04-28 Thread Kyrill Tkachov


On 27/04/16 18:14, Jim Wilson wrote:

On Wed, Apr 27, 2016 at 3:33 AM, Kyrill Tkachov
 wrote:

Thanks, I've incorporated your and James' feedback.
Since James ok'd the content of the patch from an AArch64 perspective
I'll commit this later today if I receive no further feedback.

There is no paragraph for the Qualcomm qdf24xx.  Do you want me to
write that and submit it?  That could take a while as I will have to
discuss it with Qualcomm first.


Hi Jim,

I'll add one separately (and an entry in the ARM section too).
Sorry for the delay,

Kyrill


Jim




Re: [PATCH] operand_equal_p checking (PR sanitizer/70683)

2016-04-28 Thread Christophe Lyon
Hi,
This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70843

On 27 April 2016 at 14:41, Richard Biener  wrote:
> On Wed, 27 Apr 2016, Jakub Jelinek wrote:
>
>> On Tue, Apr 26, 2016 at 03:02:38PM +0200, Jakub Jelinek wrote:
>> > The debugging hack is too ugly and slows down the compiler (by artificially
>> > increasing number of collisions), so it is not appropriate, but perhaps we
>> > can add some internal only use OEP_* flag, pass it to the recursive calls
>> > of operand_equal_p and if not set and flag_checking, verify
>> > iterative_hash_expr equality in the outermost call).
>>
>> Here is the corresponding checking patch.  It uncovered two further issues
>> in the tree.[ch] patch which I'm going to post momentarily.
>> Both patches together bootstrapped/regtested on x86_64-linux and i686-linux,
>> ok for trunk?
>
> Ok.
>
> Thanks,
> Richard.
>
>> 2016-04-27  Jakub Jelinek  
>>
>>   PR sanitizer/70683
>>   * tree-core.h (enum operand_equal_flag): Add OEP_NO_HASH_CHECK.
>>   * fold-const.c (operand_equal_p): If flag_checking and
>>   OEP_NO_HASH_CHECK is not set in flag, recurse with OEP_NO_HASH_CHECK
>>   and if it returns non-zero, assert iterative_hash_expr on both
>>   args is the same.
>>
>> --- gcc/tree-core.h.jj2016-04-22 18:21:55.0 +0200
>> +++ gcc/tree-core.h   2016-04-26 17:47:19.875753297 +0200
>> @@ -765,7 +765,9 @@ enum operand_equal_flag {
>>OEP_ONLY_CONST = 1,
>>OEP_PURE_SAME = 2,
>>OEP_MATCH_SIDE_EFFECTS = 4,
>> -  OEP_ADDRESS_OF = 8
>> +  OEP_ADDRESS_OF = 8,
>> +  /* Internal within operand_equal_p:  */
>> +  OEP_NO_HASH_CHECK = 16
>>  };
>>
>>  /* Enum and arrays used for tree allocation stats.
>> --- gcc/fold-const.c.jj   2016-04-22 18:21:32.0 +0200
>> +++ gcc/fold-const.c  2016-04-26 18:30:40.919080701 +0200
>> @@ -2749,6 +2749,25 @@ combine_comparisons (location_t loc,
>>  int
>>  operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
>>  {
>> +  /* When checking, verify at the outermost operand_equal_p call that
>> + if operand_equal_p returns non-zero then ARG0 and ARG1 has the same
>> + hash value.  */
>> +  if (flag_checking && !(flags & OEP_NO_HASH_CHECK))
>> +{
>> +  if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
>> + {
>> +   inchash::hash hstate0 (0), hstate1 (0);
>> +   inchash::add_expr (arg0, hstate0, flags);
>> +   inchash::add_expr (arg1, hstate1, flags);
>> +   hashval_t h0 = hstate0.end ();
>> +   hashval_t h1 = hstate1.end ();
>> +   gcc_assert (h0 == h1);
>> +   return 1;
>> + }
>> +  else
>> + return 0;
>> +}
>> +
>>/* If either is ERROR_MARK, they aren't equal.  */
>>if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK
>>|| TREE_TYPE (arg0) == error_mark_node
>>
>>
>>   Jakub
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
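
For context, the invariant that the quoted patch asserts (if two operands compare equal, they must hash to the same value) can be sketched on a toy type.  Everything named toy_* below is hypothetical and unrelated to GCC's tree representation; the flag_checking guard from the real patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Internal flag, mirroring the role of OEP_NO_HASH_CHECK.  */
enum toy_flags { TOY_NO_HASH_CHECK = 16 };

/* Toy expression: the sum lhs + rhs, treated as commutative.  */
struct toy_sum { int lhs, rhs; };

/* Order-insensitive hash, so that a + b and b + a hash alike.  */
unsigned int toy_hash (const struct toy_sum *e)
{
  int lo = e->lhs < e->rhs ? e->lhs : e->rhs;
  int hi = e->lhs < e->rhs ? e->rhs : e->lhs;
  return (unsigned int) lo * 31u + (unsigned int) hi;
}

bool toy_equal_p (const struct toy_sum *a, const struct toy_sum *b,
                  unsigned int flags)
{
  /* Outermost call only: recurse with the check disabled, then
     verify that equality implies equal hash values.  */
  if (!(flags & TOY_NO_HASH_CHECK))
    {
      if (toy_equal_p (a, b, flags | TOY_NO_HASH_CHECK))
        {
          assert (toy_hash (a) == toy_hash (b));
          return true;
        }
      return false;
    }

  /* The actual comparison: same operands in either order.  */
  return (a->lhs == b->lhs && a->rhs == b->rhs)
         || (a->lhs == b->rhs && a->rhs == b->lhs);
}
```

If toy_hash were changed to depend on operand order, the assertion would fire for 1 + 2 versus 2 + 1, which is the kind of equality/hash mismatch the checking code is meant to expose.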


Re: [ubsan PATCH] Fix compile-time hog with &TARGET_EXPRs (PR sanitizer/70342)

2016-04-28 Thread Jakub Jelinek
On Wed, Apr 27, 2016 at 07:03:25PM +0200, Marek Polacek wrote:
> This test took forever to compile with -fsanitize=null, because the
> instrumentation was creating incredible amount of duplicated expressions, in a
> quadratic fashion.  I think the problem is that we instrument &TARGET_EXPR <>
> expressions, which doesn't seem to be needed -- we only need to instrument the
> initializers in TARGET_EXPRs.  With this patch, we avoid creating tons of 
> useless
> expressions and the compile time is reduced from ~ infinity to <1s.
> 
> Jakub, do you see any problem with this?
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-04-27  Marek Polacek  
> 
>   PR sanitizer/70342
>   * c-ubsan.c (ubsan_maybe_instrument_reference_or_call): Don't
>   null-instrument &TARGET_EXPR <...>.
> 
>   * g++.dg/ubsan/null-7.C: New test.

I wonder if this wouldn't be better handled in tree_single_nonzero_warnv_p,
perhaps like:

 case ADDR_EXPR:
   {
tree base = TREE_OPERAND (t, 0);
 
if (!DECL_P (base))
  base = get_base_address (base);
+
+   if (base && TREE_CODE (base) == TARGET_EXPR)
+ base = TARGET_EXPR_SLOT (base);

if (!base)
  return false;

(untested)?

Jakub


Re: [PATCH GCC]Do more tree if-conversions by handlding PHIs with more than two arguments.

2016-04-28 Thread Richard Biener
On Wed, Apr 27, 2016 at 5:49 PM, Bin Cheng  wrote:
> Hi,
> Currently tree if-conversion only supports PHIs with no more than two 
> arguments unless the loop is marked with "simd pragma".  This patch makes 
> such PHIs supported unconditionally if they have no more than MAX_PHI_ARG_NUM 
> arguments, thus cases like PR56541 can be fixed.  Note that because a chain of 
> "?:" operators is needed to compute a multi-arg PHI, this patch records the 
> case and versions loop so that vectorizer can fall back to the original loop 
> if if-conversion+vectorization isn't beneficial.  Ideally, cost computation 
> in vectorizer should be improved to measure benefit against the original 
> loop, rather than if-converted loop.  So far MAX_PHI_ARG_NUM is set to (4) 
> because cases with more arguments are rare and not likely beneficial.
>
> Apart from above change, the patch also makes changes like: only split 
> critical edge when we have to; cleanups code logic in if_convertible_loop_p 
> about aggressive_if_conv.
>
> Bootstrap and test on x86_64 and AArch64, is it OK?

Can you make this magic number a --param please?  Otherwise ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-04-26  Bin Cheng  
>
> PR tree-optimization/56541
> * tree-if-conv.c (MAX_PHI_ARG_NUM): New macro.
> (any_complicated_phi): New static variable.
> (aggressive_if_conv): Delete.
> (if_convertible_phi_p): Support PHIs with more than two arguments.
> (if_convertible_bb_p): Remove check on aggressive_if_conv and
> critical pred edges.
> (ifcvt_split_critical_edges): Support PHIs with more than two
> arguments by checking new parameter.  Only split critical edges
> if needed.
> (tree_if_conversion): Handle simd pragma marked loop using new
> local variable aggressive_if_conv.  Check any_complicated_phi.
>
> gcc/testsuite/ChangeLog
> 2016-04-26  Bin Cheng  
>
> PR tree-optimization/56541
> * gcc.dg/tree-ssa/ifc-pr56541.c: New test.


[ARM] Enable __fp16 as a function parameter and return type.

2016-04-28 Thread Matthew Wahab

Hello,

The ARM target supports the half-precision floating point type __fp16
but does not allow its use as a function return or parameter type. This
patch removes that restriction and defines the ACLE macro
__ARM_FP16_ARGS to indicate this. The code generated for passing __fp16
values into and out of functions depends on the level of hardware
support but conforms to the AAPCS (see
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).

This patch enables data movement for HF-mode values using VFP registers,
when they are available, to support passing arguments and return values
through the registers.

This patch also fixes the definition of TARGET_NEON_FP16 which used to
require both neon and fp16 features to be enabled. This was
inadvertently weakened, when the macro definition was changed to use
ARM_FPU_FSET_HAS, to only require one of neon or fp16 to be
enabled. This patch returns to the original
requirements. TARGET_NEON_FP16 is only used in instruction selection for
HF-mode data moves.

Tested for arm-none-eabi with cross-compiled check-gcc and for
arm-none-linux-gnueabihf with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2016-04-27  Matthew Wahab  
Ramana Radhakrishnan  
Jiong Wang  

* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
for __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE.
Define __ARM_FP16_ARGS when appropriate.
* config/arm/arm.c (arm_invalid_parameter_type): Remove
declaration.
(arm_invalid_return_type): Likewise.
(TARGET_INVALID_PARAMETER_TYPE): Remove.
(TARGET_INVALID_RETURN_TYPE): Remove.
(aapcs_vfp_sub_candidate): Allow HFmode.
(aapcs_vfp_allocate): Add comment.  Support HFmode.
(aapcs_vfp_allocate_return_reg): Likewise.
(struct aapcs_cp_arg_layout): Slightly reword comments for
is_return_candidate and allocate_return_reg.
(output_mov_vfp): Update assert.
(arm_hard_regno_mode_ok): Remove comment, update HF-mode
condition.
(arm_invalid_parameter_type): Remove.
(arm_invalid_return_type): Remove.
* config/arm/arm.h (TARGET_NEON_FP16): Fix definition.
* config/arm/arm.md (*arm32_movhf): Disable for TARGET_VFP.
* config/arm/vfp.md (*movhf_vfp): Enable for TARGET_VFP.

gcc/testsuite/
2016-04-27  Matthew Wahab  

* g++.dg/ext/arm-fp16/fp16-param-1.c: Update expected output.  Add
test for __ARM_FP16_ARGS.
* g++.dg/ext/arm-fp16/fp16-return-1.c: Update expected output.
* gcc.target/arm/aapcs/neon-vect10.c: New.
* gcc.target/arm/aapcs/neon-vect9.c: New.
* gcc.target/arm/aapcs/vfp18.c: New.
* gcc.target/arm/aapcs/vfp19.c: New.
* gcc.target/arm/aapcs/vfp20.c: New.
* gcc.target/arm/aapcs/vfp21.c: New.
* gcc.target/arm/fp16-aapcs-1.c: New.
* g++.target/arm/fp16-param-1.c: Update expected output.  Add
test for __ARM_FP16_ARGS.
* g++.target/arm/fp16-return-1.c: Update expected output.

Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Matthias Klose

On 27.04.2016 17:56, Dhole wrote:

Thanks again for the review Bernd,

On 16-04-27 01:33:47, Bernd Schmidt wrote:

+  epoch = strtoll (source_date_epoch, &endptr, 10);
+  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
+  || (errno != 0 && epoch == 0))
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"strtoll: %s\n", xstrerror(errno));
+  if (endptr == source_date_epoch)
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"No digits were found: %s\n", endptr);
+  if (*endptr != '\0')
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"Trailing garbage: %s\n", endptr);
+  if (epoch < 0)
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"Value must be nonnegative: %lld \n", epoch);


These are somewhat unusual for error messages, but I think the general
principle of no capitalization probably applies, so "No", "Trailing", and
"Value" should be lowercase.


Done.


+  time_t source_date_epoch = (time_t) -1;
+
+  source_date_epoch = get_source_date_epoch ();


First initialization seems unnecessary. Might want to merge the declaration
with the initialization.


And done.

I'm attaching the updated patch with the two minor issues fixed.


committed.
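The validation order in the quoted hunk (range error, no digits, trailing garbage, negative value) can be sketched in isolation as follows. This is only an illustration: `EpochParse` and `parse_epoch` are hypothetical names, not part of the patch, and the sketch returns a status instead of calling GCC's diagnostic routines. It also clears `errno` before the call, which is an assumption about the surrounding code that the quoted lines do not show.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch; not part of GCC.
struct EpochParse
{
  bool ok;            // true if the whole string is a valid nonnegative integer
  long long value;    // parsed value, meaningful only when ok is true
  std::string error;  // lowercase diagnostic text when ok is false
};

// Parse a SOURCE_DATE_EPOCH-style string, applying the same four checks
// as the quoted patch, in the same order.
EpochParse
parse_epoch (const char *text)
{
  char *endptr = nullptr;
  errno = 0;  // strtoll only sets errno on failure, so clear it first
  long long epoch = std::strtoll (text, &endptr, 10);

  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
      || (errno != 0 && epoch == 0))
    return { false, 0, "strtoll: value out of range or conversion error" };
  if (endptr == text)
    return { false, 0, "no digits were found" };
  if (*endptr != '\0')
    return { false, 0, "trailing garbage" };
  if (epoch < 0)
    return { false, 0, "value must be nonnegative" };
  return { true, epoch, "" };
}
```

A caller would pass the result of `getenv ("SOURCE_DATE_EPOCH")` after checking it for NULL, and decide separately whether a failed parse is a fatal error, a normal error or a warning.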



Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Bernd Schmidt

On 04/28/2016 11:20 AM, Matthias Klose wrote:

On 27.04.2016 17:56, Dhole wrote:


I'm attaching the updated patch with the two minor issues fixed.


committed.


Something else that occurred to me - could you please also work on a 
testcase?



Bernd



Re: [PATCH][ARM] PR driver/70132: Avoid double fclose in driver-arm.c

2016-04-28 Thread Kyrill Tkachov


On 21/03/16 10:41, Ramana Radhakrishnan wrote:

On Fri, Mar 11, 2016 at 3:32 PM, Kyrill Tkachov
 wrote:

Hi all,

As reported in the PR we can end up calling fclose twice on a file, causing
an error.
This patch fixes that by reorganising the logic a bit to ensure we return
after closing
the file the first time.

Bootstrapped and tested on arm-none-linux-gnueabihf

Ok for trunk?

Thanks,
Kyrill

2016-03-11  Kyrylo Tkachov  

 PR driver/70132
 * config/arm/driver-arm.c (host_detect_local_cpu): Set file pointer
 to NULL after closing file.


OK with a fixed changelog.


Ok to backport this to the GCC 5 and 4.9 branches?
The tests are fine there for this patch.

Thanks,
Kyrill


Ramana




New German PO file for 'cpplib' (version 6.1.0)

2016-04-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/de.po

(This file, 'cpplib-6.1.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-6.1.0.de.po'

2016-04-28 Thread Translation Project Robot


cpplib-6.1.0.de.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [PATCH][ARM] PR driver/70132: Avoid double fclose in driver-arm.c

2016-04-28 Thread Ramana Radhakrishnan
On Thu, Apr 28, 2016 at 10:24 AM, Kyrill Tkachov
 wrote:
>
> On 21/03/16 10:41, Ramana Radhakrishnan wrote:
>>
>> On Fri, Mar 11, 2016 at 3:32 PM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> As reported in the PR we can end up calling fclose twice on a file,
>>> causing
>>> an error.
>>> This patch fixes that by reorganising the logic a bit to ensure we return
>>> after closing
>>> the file the first time.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-03-11  Kyrylo Tkachov  
>>>
>>>  PR driver/70132
>>>  * config/arm/driver-arm.c (host_detect_local_cpu): Set file pointer
>>>  to NULL after closing file.
>>
>>
>> OK with a fixed changelog.
>
>
> Ok to backport this to the GCC 5 and 4.9 branches?
> The tests are fine there for this patch.

OK.

Ramana

>
> Thanks,
> Kyrill
>
>> Ramana
>
>


RE: [PATCH] [ARC] Add SIMD extensions for ARC HS

2016-04-28 Thread Claudiu Zissulescu
Committed  r235551.

Thanks,
Claudiu

> 
> On 08/04/16 09:30, Claudiu Zissulescu wrote:
> > This patch adds support for the new SIMD operations added to ARC HS
> > cpu class. The proposed patch doesn't chase for performance but offers
> > support for those newly added operations, and autovectorization.
> >
> > The patch is tested using dg.exp, compile.exp, and execute.exp for
> > both arc700 and archs with and without SIMD support enabled.
> >
> > OK to apply?
> > Claudiu
> 
> OK.


Re: [PATCH 1/6] [ARC] Don't use drsub* instructions when selecting fpuda.

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 15:33, Claudiu Zissulescu wrote:

The double precision floating point assist instructions do not
implement the reverse double subtract instruction (drsub) found in
the FPX extension, hence this patch.

OK to apply?
Claudiu

gcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/arc.md (cpu_facility): Add fpx variant.
(subdf3): Prohibit use reverse sub when assist operations option
is enabled.
* config/arc/fpx.md (subdf3_insn, *dsubh_peep2_insn): Allow drsub
instructions only when FPX is enabled.
 * testsuite/gcc.target/arc/trsub.c: New test.


 OK.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 11:20:28AM +0200, Matthias Klose wrote:
> On 27.04.2016 17:56, Dhole wrote:
> >Thanks again for the review Bernd,
> >
> >On 16-04-27 01:33:47, Bernd Schmidt wrote:
> >>>+  epoch = strtoll (source_date_epoch, &endptr, 10);
> >>>+  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
> >>>+  || (errno != 0 && epoch == 0))
> >>>+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >>>$SOURCE_DATE_EPOCH: "
> >>>+   "strtoll: %s\n", xstrerror(errno));
> >>>+  if (endptr == source_date_epoch)
> >>>+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >>>$SOURCE_DATE_EPOCH: "
> >>>+   "No digits were found: %s\n", endptr);
> >>>+  if (*endptr != '\0')
> >>>+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >>>$SOURCE_DATE_EPOCH: "
> >>>+   "Trailing garbage: %s\n", endptr);
> >>>+  if (epoch < 0)
> >>>+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >>>$SOURCE_DATE_EPOCH: "
> >>>+   "Value must be nonnegative: %lld \n", epoch);
> >>
> >>These are somewhat unusual for error messages, but I think the general
> >>principle of no capitalization probably applies, so "No", "Trailing", and
> >>"Value" should be lowercase.
> >
> >Done.
> >
> >>>+  time_t source_date_epoch = (time_t) -1;
> >>>+
> >>>+  source_date_epoch = get_source_date_epoch ();
> >>
> >>First initialization seems unnecessary. Might want to merge the declaration
> >>with the initialization.
> >
> >And done.
> >
> >I'm attaching the updated patch with the two minor issues fixed.
> 
> committed.

BTW, I think fatal_error doesn't make sense: this isn't something that is
unrecoverable, so a normal error or just a warning would IMHO be more than
sufficient.  The fallback would be just using the current time, i.e. ignoring
the env var.

Additionally, I think it is a very bad idea to slow down the initialization
for something so rarely used - instead of initializing this always, IMNSHO
it should be only initialized when the first __TIME__ macro is expanded,
similarly how it only calls time/localtime etc. when the macro is expanded
for the first time.

Also, as a follow-up, I guess the driver should set this
env var for the -fcompare-debug case if not already set, to something that
matches the current date, so that __TIME__ macros expand the same in
both compilations, even when the two don't compile in the same
second.

Jakub


Re: New hashtable power 2 rehash policy

2016-04-28 Thread Jonathan Wakely

On 23/04/16 10:27 +0200, François Dumont wrote:

Hi

   Here is a patch to introduce a new power of 2 based rehash policy. 
It enhances performance as it avoids modulo float operations. I have 
updated performance benches and here is the result:


54075.cc    tr1 benches     455r  446u    8s    0mem    0pf
54075.cc    std benches     466r  460u    6s    0mem    0pf
54075.cc    std2 benches    375r  369u    6s    0mem    0pf


std2 benches is the one using power of 2 bucket count.

   Note that I made use of __detected_or_t to avoid duplicating all 
the code of _Rehash_base<>.


   It brings a simplification of _Insert<>: it no longer takes a 
_Unique_keys template parameter. This allowed removing a 
specialization.


   It also improves behavior when we reach the maximum number of buckets: 
we won't keep trying to increase the number, as that is impossible.


   Lastly, it fixes a small problem in the 54075.cc bench. We were using 
__uset_traits rather than __umset_traits in the definition of __umset. 
Results were not the expected ones.


Thanks, now that we're back in stage 1 we can make this change.




2016-04-22  François Dumont 

   * include/bits/hashtable_policy.h
   (_Prime_rehash_policy::__has_load_factor): New. Mark rehash policy
   having load factor management.
   (_Mask_range_hashing): New.
   (_NextPower2): New.
   (_Power2_rehash_policy): New.
   (_Inserts<>): Remove last template parameter _Unique_keys. Use the same
   implementation when keys are unique no matter if iterators are constant
   or not.


I found this change description confusing, because it's really using
the same implementation whether keys are unique or not, but this says
"Use the same implementation whether iterators are constant or not".

Shouldn't it be "Use the same implementation when iterators are
constant, no matter if keys are unique or not" ?

Maybe this would be clearer:

   (_Inserts<>): Remove last template parameter, _Unique_keys, so
   that partial specializations only depend on whether iterators are
   constant or not.



   * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
   Consider when last prime number has been reach.


s/reach/reached/


   * testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc:
   New.
   * testsuite/performance/23_containers/insert/54075.cc: Add bench using


s/bench/benchmark/


   the new hash policy.
   * testsuite/performance/23_containers/insert_erase/41975.cc: Likewise.

Tested under Linux x86_64, OK to commit?

François





Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h (revision 235348)
+++ include/bits/hashtable_policy.h (working copy)
@@ -457,6 +457,8 @@
  /// smallest prime that keeps the load factor small enough.
  struct _Prime_rehash_policy
  {
+using __has_load_factor = std::true_type;

_Prime_rehash_policy(float __z = 1.0) noexcept
: _M_max_load_factor(__z), _M_next_resize(0) { }

@@ -501,6 +503,136 @@
mutable std::size_t _M_next_resize;
  };

+  /// Range hashing function assuming that second args is a power of 2.


s/args/arg/


+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+  second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+
+  /// Helper type to compute next power of 2.
+  template
+struct _NextPower2
+{
+  static _SizeT
+  _Get(_SizeT __n)
+  {
+   _SizeT __next = _NextPower2<_SizeT, (_N >> 1), false>::_Get(__n);
+   __next |= __next >> _N;
+   if (_Increment)
+ ++__next;
+
+   return __next;
+  }
+};
+
+  template
+struct _NextPower2<_SizeT, 1, false>
+{
+  static _SizeT
+  _Get(_SizeT __n)
+  {
+   --__n;
+   return __n |= __n >> 1;
+  }
+};


What's the reason to keep this recursive template instead of using a
simple function like the clp2() we discussed? A simple function (which
could be _GLIBCXX14_CONSTEXPR) compiles faster, and produces similar
object code for the default -std=gnu++14 mode. And it doesn't require
six calls to _NextPower2::_Get to calculate the result.

If you're worried about the final shift being unnecessary on 32-bit
you can use the preprocessor, something like:

 _GLIBCXX14_CONSTEXPR
 std::size_t
 __clp2(std::size_t n)
 {
#if __SIZEOF_SIZE_T__ >= 8
   std::uint_fast64_t x = n;
#else
   std::uint_fast32_t x = n;
#endif
   // Algorithm from Hacker's Delight, Figure 3-3.
   x = x - 1;
   x = x | (x >> 1);
   x = x | (x >> 2);
   x = x | (x >> 4);
   x = x | (x >> 8);
   x = x | (x >> 16);
#if __SIZEOF_SIZE_T__ >= 8
   x = x | (x >> 32);
#endif
   return x + 1;
 }
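To make the connection between the rounding function and the mask-based range hashing concrete, here is a small standalone sketch. The names `clp2_sketch` and `mask_index` are hypothetical and are not the identifiers used in libstdc++; the sketch uses a fixed 64-bit type and assumes n >= 1. It shows that once the bucket count is a power of two, `hash & (count - 1)` selects the same bucket as `hash % count`.

```cpp
#include <cstdint>

// Round n up to the next power of two (n itself if it already is one).
// Assumes n >= 1; same bit-smearing idea as the function quoted above.
inline std::uint64_t
clp2_sketch (std::uint64_t n)
{
  std::uint64_t x = n - 1;
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return x + 1;
}

// Map a hash code to a bucket index, assuming bucket_count is a power of two.
// The mask replaces the modulo operation used with prime bucket counts.
inline std::uint64_t
mask_index (std::uint64_t hash, std::uint64_t bucket_count)
{
  return hash & (bucket_count - 1);
}
```

For instance, a request for 9 buckets would be rounded up to 16, after which every lookup needs a single AND rather than a division.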

I don't think we need to worry about 128-bit integers, 

Re: [PATCH] Fix type field walking in gimplifier unsharing

2016-04-28 Thread Eric Botcazou
> Aww, I was hoping for sth that would not require me to fix all
> frontends ...

I don't really see how this can work without DECL_EXPR though.  You need to 
define when the variable-sized expressions are evaluated to lay out the type, 
otherwise it will be laid out on the first use, which may see a different 
value of the expressions than the definition point.  The only way to do that 
for a locally-defined type is to add a DECL_EXPR in GENERIC, so that the 
gimplifier evaluates the expressions at the right spot.

Of course in Ada we have the ACATS testsuite which tests for this kind of 
things, this explains why it already works.

> It seems the C frontend does it correctly already - I hit the
> ubsan issue for c-c++-common/ubsan/pr59667.c and only for the C++ FE
> for example.  Notice how only the pointed-to type is variable-size
> here.
> 
> C produces
> 
> {
>   unsigned int len = 1;
>   typedef float [0:(sizetype) ((long int) SAVE_EXPR  +
> -1)][0:(sizetype) ((long int) SAVE_EXPR  + -1)];
>   float[0:(sizetype) ((long int) SAVE_EXPR  + -1)][0:(sizetype)
> ((long int) SAVE_EXPR  + -1)] * P = 0B;
> 
> unsigned int len = 1;
> typedef float [0:(sizetype) ((long int) SAVE_EXPR  +
> -1)][0:(sizetype) ((long int) SAVE_EXPR  + -1)];
>   SAVE_EXPR ;, (void) SAVE_EXPR ;;
> float[0:(sizetype) ((long int) SAVE_EXPR  + -1)][0:(sizetype)
> ((long int) SAVE_EXPR  + -1)] * P = 0B;
>   (*P)[0][0] = 1.0e+0;
>   return 0;
> }
> 
> the decl-expr is the 'typedef' line.  The C++ FE produces
> 
> {
>   unsigned int len = 1;
>   float[0:(sizetype) (SAVE_EXPR <(ssizetype) len + -1>)][0:(sizetype)
> (SAVE_EXPR <(ssizetype) len + -1>)] * P = 0B;
> 
>   <>;
>   <   (void) (((bitsizetype) ((sizetype) (SAVE_EXPR <(ssizetype) len + -1>) +
> 1) * (bitsizetype) ((sizetype) (SAVE_EXPR <(ssizetype) len + -1>) + 1)) *
> 32) >;
>   < -1>)][0:(sizetype) (SAVE_EXPR <(ssizetype) len + -1>)] * P = 0B;>>;
>   <   (void) ((*P)[0][0] = 1.0e+0) >;
>   return  = 0;
> }
> 
> notice the lack of a decl-expr here.  It has some weird expr_stmt
> here covering the sizes though. Possibly because VLA arrays are a GNU
> extension.

Indeed.

> Didn't look into the fortran FE issue but I expect it's similar
> (it only occurs for pointers to VLAs as well).
> 
> I'll try to come up with patches.
> 
> Thanks for the hint,

You're welcome.

-- 
Eric Botcazou


Avoid NULL cfun ICE in gcc/config/nvptx/nvptx.c:nvptx_libcall_value (was: [PATCH] Fix PR70760)

2016-04-28 Thread Thomas Schwinge
Hi!

Richard's r235511 changes (quoted below) cause certain nvptx offloading
test cases to run into SIGSEGVs:

[...]
#4  0x00d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
#5  0x00d17a20 in nvptx_function_value (type=0x7fc1fa359690, 
func=0x0, outgoing=)
at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512
#6  0x006ba220 in hard_function_value 
(valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, 
fntype=fntype@entry=0x0, 
outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860
#7  0x0073b0fa in aggregate_value_p (exp=exp@entry=0x7fc1fa41a048, 
fntype=0x0)
at [...]/source-gcc/gcc/function.c:2086
#8  0x00bebc11 in find_func_aliases_for_call (t=0x1feac90, 
fn=0x7ffe448ca8a0)
at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644
#9  find_func_aliases (fn=fn@entry=0x7fc1fa43a540, 
origt=origt@entry=0x7fc1fa43a7e0)
at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737
#10 0x00bf04eb in ipa_pta_execute ()
at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787
#11 (anonymous namespace)::pass_ipa_pta::execute (this=)
at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035
#12 0x00940bed in execute_one_pass (pass=pass@entry=0x1f43770)
at [...]/source-gcc/gcc/passes.c:2348
#13 0x00941972 in execute_ipa_pass_list (pass=0x1f43770)
at [...]/source-gcc/gcc/passes.c:2778
#14 0x00607f1f in symbol_table::compile (this=0x7fc1fa359000)
at [...]/source-gcc/gcc/cgraphunit.c:2435
#15 0x0056ad48 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3328
#16 0x00a065df in compile_file () at 
[...]/source-gcc/gcc/toplev.c:474
#17 0x0053753a in do_compile () at 
[...]/source-gcc/gcc/toplev.c:1998
#18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, 
argv=0x1f1eec0, argv@entry=0x7ffe448caca8)
at [...]/source-gcc/gcc/toplev.c:2106
#19 0x005391d7 in main (argc=18, argv=0x7ffe448caca8)
at [...]/source-gcc/gcc/main.c:39

The immediate problem is that
gcc/config/nvptx/nvptx.c:nvptx_libcall_value is called in a context where
cfun is NULL, and it fails to handle that appropriately:

(gdb) frame 4
#4  0x00d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
489   if (!cfun->machine->doing_call)
(gdb) print cfun
$1 = (function *) 0x0

Looking at the backtrace, I see that in frame 7,
gcc/function.c:aggregate_value_p is called with a NULL fntype.  This
function is evidently prepared to handle that case, likewise for
gcc/explow.c:hard_function_value.  Does it thus follow that
gcc/config/nvptx/nvptx.c:nvptx_function_value and/or
gcc/config/nvptx/nvptx.c:nvptx_libcall_value need to be changed?  Is
something like the following sufficient (works in offloading testing, but
feels a bit like just "treating the symptoms"); for instance, should this
case rather be handled in gcc/config/nvptx/nvptx.c:nvptx_function_value
already?

--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -484,7 +484,7 @@ nvptx_strict_argument_naming (cumulative_args_t cum_v)
 static rtx
 nvptx_libcall_value (machine_mode mode, const_rtx)
 {
-  if (!cfun->machine->doing_call)
+  if (!cfun || !cfun->machine->doing_call)
 /* Pretend to return in a hard reg for early uses before pseudos can be
generated.  */
 return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
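The effect of the one-line change can be seen in a reduced model. The sketch below uses hypothetical stand-in types (`machine_state_sketch`, `function_sketch`) rather than GCC's real `struct function` and `machine_function`; it only illustrates that the short-circuit `||` avoids the dereference when no current function exists, and that a null function then takes the same "early use" path as a function that is not in the middle of a call.

```cpp
// Hypothetical stand-ins for GCC's per-function state; illustrative only.
struct machine_state_sketch
{
  bool doing_call;
};

struct function_sketch
{
  machine_state_sketch *machine;
};

// Returns true when the "pretend to return in a hard register" path
// would be taken.  Because || short-circuits, fn->machine is never
// touched when fn is null.
inline bool
use_early_return_reg (const function_sketch *fn)
{
  return !fn || !fn->machine->doing_call;
}
```

Whether treating a null current function like the early-use case is the right semantics for the nvptx back end is exactly the open question raised in the message; the sketch does not settle it.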


For reference:

On Wed, 27 Apr 2016 13:07:36 +0200 (CEST), Richard Biener  
wrote:
> The following patch fixes an issue in IPA PTA regarding the handling
> of DECL_BY_REFERENCE function results at the caller side.  The issue
> for the testcase in the PR is that we use the wrong function decl
> to look for DECL_RESULT for calls that are an alias (which get
> DECL_RESULT released).
> 
> But the issue is deeper in that the code also does not handle
> indirect calls correctly - to expose a testcase for this the
> patch also enables optimistic handling of functions escaping
> via their addresses, this is already handled fine after I added
> code to parse global initializers correctly.
> 
> LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA 
> enabled, inspected PTA result on the PRs testcase (I failed to create a 
> small reproducer).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> 
> This is the trunk version of the fix, for the branch where the
> issue was reported against I will refrain from handling address-taken
> functions differently.
> 
> I hope I deciphered enough of the calls handling to assess that
> aggregate_value_p always matches DECL_BY_REFERENCE on DECL_RESULT.
> IPA PTA needs to know the GIMPLE representation of the callees
> DECL_RESULT (whether it's a pointer - at the caller side we
> still see the non-reference LHS).  And that

Re: [PATCH 2/6] [ARC] Fix FPX/FPUDA code gen when compiling for big-endian.

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 15:33, Claudiu Zissulescu wrote:

OK to apply?
Claudiu

gcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/arc.c (arc_process_double_reg_moves): Fix for
big-endian compilation.
* config/arc/arc.md (addf3): Likewise.
(subdf3): Likewise.
(muldf3): Likewise.


 OK.

FWIW, there is also a FIXME for a little-endian-centric use of 
split_double in arc.c:arc_rtx_costs.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Bernd Schmidt

On 04/28/2016 12:08 PM, Jakub Jelinek wrote:

BTW, I think fatal_error doesn't make sense, it isn't something that is not
recoverable, normal error or just a warning would be IMHO more than
sufficient.  The fallback would be just using current time, i.e. ignoring
the env var.


I thought about this, but we also error out for invalid arguments to 
options, and IMO this case is analogous.



Additionally, I think it is a very bad idea to slow down the initialization
for something so rarely used - instead of initializing this always, IMNSHO
it should be only initialized when the first __TIME__ macro is expanded,
similarly how it only calls time/localtime etc. when the macro is expanded
for the first time.


I really don't see anything in that function that looks like a huge time 
sink, so I'm not that worried about it. I think it's likely to be buried 
way down in the noise.



Also, as a follow-up, guess the driver should set this
env var for the -fcompare-debug case if not already set, to something that
matches the current date, so that __TIME__ macros expands the same in
between both compilations, even when they don't compile both in the same
second.


This sounds like a good idea. Maybe we could even have the bootstrap 
include an instance of __TIME__, with the env var set, and use the stage 
comparison as a test for this feature.



Bernd


Re: [PATCH 3/6] [ARC] Pass mfpuda to assembler.

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 15:33, Claudiu Zissulescu wrote:

OK to apply?
Claudiu

gcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/arc.h (ASM_SPEC): Pass mfpuda to assembler.


 OK.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 12:31:40PM +0200, Bernd Schmidt wrote:
> On 04/28/2016 12:08 PM, Jakub Jelinek wrote:
> >BTW, I think fatal_error doesn't make sense, it isn't something that is not
> >recoverable, normal error or just a warning would be IMHO more than
> >sufficient.  The fallback would be just using current time, i.e. ignoring
> >the env var.
> 
> I thought about this, but we also error out for invalid arguments to
> options, and IMO this case is analogous.
> 
> >Additionally, I think it is a very bad idea to slow down the initialization
> >for something so rarely used - instead of initializing this always, IMNSHO
> >it should be only initialized when the first __TIME__ macro is expanded,
> >similarly how it only calls time/localtime etc. when the macro is expanded
> >for the first time.
> 
> I really don't see anything in that function that looks like a huge time
> sink, so I'm not that worried about it. I think it's likely to be buried way
> down in the noise.

True, but the noise sums up, and the result is terrible speed when compiling
empty source files, something that e.g. the Linux kernel or other packages
that have lots of small source files care about a lot.
If initializing it early would buy us anything on code clarity etc., it
could be justified, but IMHO it doesn't, the code in libcpp already has the
delayed initialization anyway.

Jakub


Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-28 Thread Ilya Enkovich
2016-04-27 22:58 GMT+03:00 Uros Bizjak :
> Hello!
>
> This RFC patch illustrates the idea of using STV pass to load/store
> any TImode constant using SSE insns. The testcase:
>
> --cut here--
> __int128 x;
>
> __int128 test_1 (void)
> {
>   x = (__int128) 0x00112233;
> }
>
> __int128 test_2 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64);
> }
>
> __int128 test_3 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
> }
> --cut here--
>
> currently compiles (-O2) on x86_64 to:
>
> test_1:
>         movq    $1122867, x(%rip)
>         movq    $0, x+8(%rip)
>         ret
>
> test_2:
>         xorl    %eax, %eax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> test_3:
>         movabsq $4822678189205111, %rax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> However, using the attached patch, we compile all tests to:
>
> test:
>         movdqa  .LC0(%rip), %xmm0
>         movaps  %xmm0, x(%rip)
>         ret
>
> Ilya, HJ - do you think new sequences are better, or - as suggested by
> Jakub - they are beneficial with STV pass, as we are now able to load
> any immediate value? A variant of this patch can also be used to load
> DImode values to 32bit STV pass.
>
> Uros.
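As a cross-check of the scalar sequences quoted above, the following sketch splits each 128-bit constant from the testcase into its low and high 64-bit halves, which are the values the two `movq` stores write to `x` and `x+8`. The helper names (`Halves`, `split_u128`, `test_2_value`, `test_3_value`) are hypothetical, and the code relies on the GCC `__int128` extension, as the testcase itself does.

```cpp
#include <cstdint>

// Low and high 64-bit halves of a 128-bit value (little-endian word order,
// matching the stores to x and x+8 on x86_64).
struct Halves
{
  std::uint64_t low;
  std::uint64_t high;
};

inline Halves
split_u128 (unsigned __int128 v)
{
  return { static_cast<std::uint64_t> (v),
           static_cast<std::uint64_t> (v >> 64) };
}

// The constant stored by test_2 in the quoted testcase.
inline unsigned __int128
test_2_value ()
{
  return (unsigned __int128) 0x0011223344556677ULL << 64;
}

// The constant stored by test_3 in the quoted testcase.
inline unsigned __int128
test_3_value ()
{
  return ((unsigned __int128) 0x0011223344556677ULL << 64)
         + (unsigned __int128) 0x0011223344556677ULL;
}
```

The decimal immediates in the assembly correspond to these halves: 0x00112233 is 1122867 and 0x0011223344556677 is 4822678189205111.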

Hi,

Why don't we have two movq instructions in all three cases now?  Is it
because of late split?

I wouldn't say SSE load+store is always better than two movq instructions.
But it obviously can enable bigger chains for STV which is good.  I think
you should adjust a cost model to handle immediates for proper decision.

That's what I have in my draft for DImode immediates:

@@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
   BITMAP_FREE (queue);
 }

+/* Return a cost of building a vector constant
+   instead of using a scalar one.  */
+
+int
+scalar_chain::vector_const_cost (rtx exp)
+{
+  gcc_assert (CONST_INT_P (exp));
+
+  if (const0_operand (exp, GET_MODE (exp))
+  || constm1_operand (exp, GET_MODE (exp)))
+return COSTS_N_INSNS (1);
+  return ix86_cost->sse_load[1];
+}
+
 /* Compute a gain for chain conversion.  */

 int
@@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
                || GET_CODE (src) == IOR
                || GET_CODE (src) == XOR
                || GET_CODE (src) == AND)
-        gain += ix86_cost->add;
+        {
+          gain += ix86_cost->add;
+          if (CONST_INT_P (XEXP (src, 0)))
+            gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
+          if (CONST_INT_P (XEXP (src, 1)))
+            gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
+        }
       else if (GET_CODE (src) == COMPARE)
         {
           /* Assume comparison cost is the same.  */
         }
+      else if (GET_CODE (src) == CONST_INT)
+        {
+          if (REG_P (dst))
+            gain += COSTS_N_INSNS (2);
+          else if (MEM_P (dst))
+            gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
+          gain -= scalar_chain::vector_const_cost (src);
+        }
       else
         gcc_unreachable ();


Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-28 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 01:36:30PM +0300, Ilya Enkovich wrote:
> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>|| GET_CODE (src) == IOR
>|| GET_CODE (src) == XOR
>|| GET_CODE (src) == AND)
> -   gain += ix86_cost->add;
> +   {
> + gain += ix86_cost->add;
> + if (CONST_INT_P (XEXP (src, 0)))
> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
> + if (CONST_INT_P (XEXP (src, 1)))
> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
> +   }
>else if (GET_CODE (src) == COMPARE)
> {
>   /* Assume comparison cost is the same.  */
> }
> +  else if (GET_CODE (src) == CONST_INT)
> +   {
> + if (REG_P (dst))
> +   gain += COSTS_N_INSNS (2);
> + else if (MEM_P (dst))
> +   gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
> + gain -= scalar_chain::vector_const_cost (src);
> +   }
>else
> gcc_unreachable ();

Where does the 2 come from?  Is it that the STV pass right now supports only
2 * wordsize modes?  Also, I don't think we should treat equally constants
that fit into the 32-bit immediates and constants that don't, the latter,
when movabsq needs to be used, are more costly.

Jakub
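
The distinction drawn above, between constants that fit a 32-bit immediate
and those that need movabsq, can be sketched as a small stand-alone check.
This is only an illustration: fits_simm32 is a hypothetical helper, not a
function from the patch or from GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: return true if V can be encoded as
   the sign-extended 32-bit immediate of a 64-bit movq.  Values outside
   this range would need a movabsq into a register first.  */
static bool
fits_simm32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}
```

With the testcase constants, 0x00112233 passes this check while
0x0011223344556677 does not, which is consistent with the movabsq in the
quoted test_2 and test_3 output.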


Re: Avoid NULL cfun ICE in gcc/config/nvptx/nvptx.c:nvptx_libcall_value (was: [PATCH] Fix PR70760)

2016-04-28 Thread Richard Biener
On Thu, 28 Apr 2016, Thomas Schwinge wrote:

> Hi!
> 
> Richard's r235511 changes (quoted below) cause certain nvptx offloading
> test cases to run into SIGSEGVs:
> 
> [...]
> #4  0x00d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
> at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
> #5  0x00d17a20 in nvptx_function_value (type=0x7fc1fa359690, 
> func=0x0, outgoing=)
> at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512
> #6  0x006ba220 in hard_function_value 
> (valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, 
> fntype=fntype@entry=0x0, 
> outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860
> #7  0x0073b0fa in aggregate_value_p 
> (exp=exp@entry=0x7fc1fa41a048, fntype=0x0)
> at [...]/source-gcc/gcc/function.c:2086
> #8  0x00bebc11 in find_func_aliases_for_call (t=0x1feac90, 
> fn=0x7ffe448ca8a0)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644
> #9  find_func_aliases (fn=fn@entry=0x7fc1fa43a540, 
> origt=origt@entry=0x7fc1fa43a7e0)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737
> #10 0x00bf04eb in ipa_pta_execute ()
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787
> #11 (anonymous namespace)::pass_ipa_pta::execute (this=)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035
> #12 0x00940bed in execute_one_pass (pass=pass@entry=0x1f43770)
> at [...]/source-gcc/gcc/passes.c:2348
> #13 0x00941972 in execute_ipa_pass_list (pass=0x1f43770)
> at [...]/source-gcc/gcc/passes.c:2778
> #14 0x00607f1f in symbol_table::compile (this=0x7fc1fa359000)
> at [...]/source-gcc/gcc/cgraphunit.c:2435
> #15 0x0056ad48 in lto_main () at 
> [...]/source-gcc/gcc/lto/lto.c:3328
> #16 0x00a065df in compile_file () at 
> [...]/source-gcc/gcc/toplev.c:474
> #17 0x0053753a in do_compile () at 
> [...]/source-gcc/gcc/toplev.c:1998
> #18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, 
> argv=0x1f1eec0, argv@entry=0x7ffe448caca8)
> at [...]/source-gcc/gcc/toplev.c:2106
> #19 0x005391d7 in main (argc=18, argv=0x7ffe448caca8)
> at [...]/source-gcc/gcc/main.c:39
> 
> The immediate problem is that
> gcc/config/nvptx/nvptx.c:nvptx_libcall_value is called in a context where
> cfun is NULL, and it fails to handle that appropriately:
> 
> (gdb) frame 4
> #4  0x00d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
> at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
> 489   if (!cfun->machine->doing_call)
> (gdb) print cfun
> $1 = (function *) 0x0
> 
> Looking at the backtrace, I see that in frame 7,
> gcc/function.c:aggregate_value_p is called with a NULL fntype.  This
> function is evidently prepared to handle that case, likewise for
> gcc/explow.c:hard_function_value.  Does it thus follow that
> gcc/config/nvptx/nvptx.c:nvptx_function_value and/or
> gcc/config/nvptx/nvptx.c:nvptx_libcall_value need to be changed?  Is
> something like the following sufficient (works in offloading testing, but
> feels a bit like just "treating the symptoms"); for instance, should this
> case rather be handled in gcc/config/nvptx/nvptx.c:nvptx_function_value
> already?
> 
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -484,7 +484,7 @@ nvptx_strict_argument_naming (cumulative_args_t cum_v)
>  static rtx
>  nvptx_libcall_value (machine_mode mode, const_rtx)
>  {
> -  if (!cfun->machine->doing_call)
> +  if (!cfun || !cfun->machine->doing_call)
>      /* Pretend to return in a hard reg for early uses before pseudos can be
>         generated.  */
>      return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);

Doing anything based on 'cfun' here is fishy at least for the
call context of aggregate_value_p as that is also used when
looking at the caller side of a call for example when expanding calls
where cfun is then the callers cfun and not the callees.

So I suggest to remove cfun->machine->doing_call and revisit the
reason why it was added for PTX.

Richard.

> 
> For reference:
> 
> On Wed, 27 Apr 2016 13:07:36 +0200 (CEST), Richard Biener  
> wrote:
> > The following patch fixes an issue in IPA PTA regarding the handling
> > of DECL_BY_REFERENCE function results at the caller side.  The issue
> > for the testcase in the PR is that we use the wrong function decl
> > to look for DECL_RESULT for calls that are an alias (which get
> > DECL_RESULT released).
> > 
> > But the issue is deeper in that the code also does not handle
> > indirect calls correctly - to expose a testcase for this the
> > patch also enables optimistic handling of functions escaping
> > via their addresses, this is already handled fine after I added
> > code to parse global initializers correctly.
> > 
> > LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA 
> > enabled, inspected PTA result 

Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-28 Thread Uros Bizjak
On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich  wrote:
> 2016-04-27 22:58 GMT+03:00 Uros Bizjak :
>> Hello!
>>
>> This RFC patch illustrates the idea of using STV pass to load/store
>> any TImode constant using SSE insns. The testcase:
>>
>> --cut here--
>> __int128 x;
>>
>> __int128 test_1 (void)
>> {
>>   x = (__int128) 0x00112233;
>> }
>>
>> __int128 test_2 (void)
>> {
>>   x = ((__int128) 0x0011223344556677 << 64);
>> }
>>
>> __int128 test_3 (void)
>> {
>>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
>> }
>> --cut here--
>>
>> currently compiles (-O2) on x86_64 to:
>>
>> test_1:
>> movq    $1122867, x(%rip)
>> movq    $0, x+8(%rip)
>> ret
>>
>> test_2:
>> xorl    %eax, %eax
>> movabsq $4822678189205111, %rdx
>> movq    %rax, x(%rip)
>> movq    %rdx, x+8(%rip)
>> ret
>>
>> test_3:
>> movabsq $4822678189205111, %rax
>> movabsq $4822678189205111, %rdx
>> movq    %rax, x(%rip)
>> movq    %rdx, x+8(%rip)
>> ret
>>
>> However, using the attached patch, we compile all tests to:
>>
>> test:
>> movdqa  .LC0(%rip), %xmm0
>> movaps  %xmm0, x(%rip)
>> ret
>>
>> Ilya, HJ - do you think new sequences are better, or - as suggested by
>> Jakub - they are beneficial with STV pass, as we are now able to load
>> any immediate value? A variant of this patch can also be used to load
>> DImode values to 32bit STV pass.
>>
>> Uros.
>
> Hi,
>
> Why don't we have two movq instructions in all three cases now?  Is it
> because of late split?

movq can handle only 32bit sign-extended immediates. There is actually
room for improvement in test_2, where we could directly store 0 to
x(%rip).

Uros.

> I wouldn't say SSE load+store is always better than two movq instructions.
> But it obviously can enable bigger chains for STV which is good.  I think
> you should adjust a cost model to handle immediates for proper decision.
>
> That's what I have in my draft for DImode immediates:
>
> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
>BITMAP_FREE (queue);
>  }
>
> +/* Return a cost of building a vector constant
> +   instead of using a scalar one.  */
> +
> +int
> +scalar_chain::vector_const_cost (rtx exp)
> +{
> +  gcc_assert (CONST_INT_P (exp));
> +
> +  if (const0_operand (exp, GET_MODE (exp))
> +  || constm1_operand (exp, GET_MODE (exp)))
> +return COSTS_N_INSNS (1);
> +  return ix86_cost->sse_load[1];
> +}
> +
>  /* Compute a gain for chain conversion.  */
>
>  int
> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>|| GET_CODE (src) == IOR
>|| GET_CODE (src) == XOR
>|| GET_CODE (src) == AND)
> -   gain += ix86_cost->add;
> +   {
> + gain += ix86_cost->add;
> + if (CONST_INT_P (XEXP (src, 0)))
> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
> + if (CONST_INT_P (XEXP (src, 1)))
> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
> +   }
>else if (GET_CODE (src) == COMPARE)
> {
>   /* Assume comparison cost is the same.  */
> }
> +  else if (GET_CODE (src) == CONST_INT)
> +   {
> + if (REG_P (dst))
> +   gain += COSTS_N_INSNS (2);
> + else if (MEM_P (dst))
> +   gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
> + gain -= scalar_chain::vector_const_cost (src);
> +   }
>else
> gcc_unreachable ();


Re: check-target-libgomp wall time, without vs. with offloading

2016-04-28 Thread Thomas Schwinge
Hi!

On Thu, 24 Mar 2016 22:42:10 +0100, I wrote:
> On Wed, 23 Mar 2016 20:02:01 +0100, Jakub Jelinek  wrote:
> > On Tue, Mar 22, 2016 at 11:23:43AM +0100, Thomas Schwinge wrote:
> > > As discussed in
> > > 
> > > (and similar to what we're already doing for Fortran, and similar to what
> > > recently got committed to libgomp/testsuite/libgomp.hsa.c/c.exp), it has
> > > been helpful to also run C, C++ offloading test cases with -O0 in
> > > addition to the -O2 default.  Making my earlier gomp-4_0-branch patch
> > > conceptually simpler, I came up with the following; OK for trunk?
> > 
> > How big difference in make check-target-libgomp time is that?
> > Without PTX offloading I bet zero, but with PTX offloading configured, is it
> > 10% or 50% slower?
> 
> 15 %.  The major part of the total time is still spent in Fortran
> testing...  ;-/
> 
> Offloading compilation is slow; I suppose because of having to invoke
> several tools (LTO streaming -> mkoffload -> offload compilers,
> assemblers, linkers -> combine the resulting images; but I have not done
> a detailed analysis on that).

Here are three patches to improve that.  OK for gcc-6-branch and trunk?
Before:

$ grep ^TIME < build-gcc/x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.log
TIME 1461826886 START [...]/libgomp.c/c.exp
TIME 1461827229 (343) END [...]/libgomp.c/c.exp
TIME 1461827230 START [...]/libgomp.c++/c++.exp
TIME 1461827522 (292) END [...]/libgomp.c++/c++.exp
TIME 1461827522 START [...]/libgomp.fortran/fortran.exp
TIME 1461828279 (757) END [...]/libgomp.fortran/fortran.exp
TIME 1461828280 START [...]/libgomp.graphite/graphite.exp
TIME 1461828284 (4) END [...]/libgomp.graphite/graphite.exp
TIME 1461828284 START [...]/libgomp.hsa.c/c.exp
TIME 1461828285 (1) END [...]/libgomp.hsa.c/c.exp
TIME 1461828285 START [...]/libgomp.oacc-c/c.exp
TIME 1461828866 (581) END [...]/libgomp.oacc-c/c.exp
TIME 1461828866 START [...]/libgomp.oacc-c++/c++.exp
TIME 1461829685 (819) END [...]/libgomp.oacc-c++/c++.exp
TIME 1461829685 START [...]/libgomp.oacc-fortran/fortran.exp
TIME 1461831119 (1434) END [...]/libgomp.oacc-fortran/fortran.exp

After:

TIME 1461832444 START [...]/libgomp.c/c.exp
TIME 1461832935 (491) END [...]/libgomp.c/c.exp
TIME 1461832935 START [...]/libgomp.c++/c++.exp
TIME 1461833275 (340) END [...]/libgomp.c++/c++.exp
TIME 1461833275 START [...]/libgomp.fortran/fortran.exp
TIME 1461833983 (708) END [...]/libgomp.fortran/fortran.exp
TIME 1461833983 START [...]/libgomp.graphite/graphite.exp
TIME 1461833986 (3) END [...]/libgomp.graphite/graphite.exp
TIME 1461833986 START [...]/libgomp.hsa.c/c.exp
TIME 1461833986 (0) END [...]/libgomp.hsa.c/c.exp
TIME 1461833986 START [...]/libgomp.oacc-c/c.exp
TIME 1461834423 (437) END [...]/libgomp.oacc-c/c.exp
TIME 1461834423 START [...]/libgomp.oacc-c++/c++.exp
TIME 1461834918 (495) END [...]/libgomp.oacc-c++/c++.exp
TIME 1461834918 START [...]/libgomp.oacc-fortran/fortran.exp
TIME 1461835533 (615) END [...]/libgomp.oacc-fortran/fortran.exp

This is on a rather busy system; my patch can't have any effect on
libgomp OpenMP offloading testing (which has taken longer in the second
run, due to higher system load); in light of this, the reduced duration
of OpenACC testing "shines" ;-) even better.
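
As a rough cross-check of the numbers above, the three OpenACC durations
can be summed for each run.  The snippet below is only a sketch; the array
and function names are made up for illustration, and the values are copied
from the two logs.

```c
#include <stddef.h>

/* Durations in seconds of libgomp.oacc-c, libgomp.oacc-c++ and
   libgomp.oacc-fortran, copied from the "Before" and "After" logs.  */
static const int oacc_before[] = { 581, 819, 1434 };
static const int oacc_after[] = { 437, 495, 615 };

/* Hypothetical helper: add up N durations.  */
static int
sum_seconds (const int *t, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += t[i];
  return total;
}
```

That gives 2834 seconds before and 1547 seconds after, a reduction of
roughly 45 % for the OpenACC part of the test suite on this (busy) system.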

commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
Author: Thomas Schwinge 
Date:   Thu Apr 21 11:36:39 2016 +0200

libgomp: Unconfuse offload plugins vs. offload targets
---
 libgomp/Makefile.in   |2 +-
 libgomp/config.h.in   |4 +--
 libgomp/configure |   34 +
 libgomp/plugin/configfrag.ac  |   34 +
 libgomp/target.c  |8 +++---
 libgomp/testsuite/Makefile.in |2 +-
 libgomp/testsuite/lib/libgomp.exp |   25 --
 libgomp/testsuite/libgomp-test-support.exp.in |2 +-
 8 files changed, 56 insertions(+), 55 deletions(-)

diff --git libgomp/Makefile.in libgomp/Makefile.in
[snipped]
diff --git libgomp/config.h.in libgomp/config.h.in
[snipped]
diff --git libgomp/configure libgomp/configure
[snipped]
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 88b4156..93d3a71 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -26,8 +26,6 @@
 # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 # .
 
-offload_targets=
-AC_SUBST(offload_targets)
 plugin_support=yes
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
@@ -142,7 +140,10 @@ AC_SUBST(PLUGIN_HSA_LIBS)
 
 
 
-# Get offload targets and path to install tree of offloading compiler.
+# Parse offload targets, and f

Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-28 Thread Ilya Enkovich
2016-04-28 13:41 GMT+03:00 Jakub Jelinek :
> On Thu, Apr 28, 2016 at 01:36:30PM +0300, Ilya Enkovich wrote:
>> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>>|| GET_CODE (src) == IOR
>>|| GET_CODE (src) == XOR
>>|| GET_CODE (src) == AND)
>> -   gain += ix86_cost->add;
>> +   {
>> + gain += ix86_cost->add;
>> + if (CONST_INT_P (XEXP (src, 0)))
>> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
>> + if (CONST_INT_P (XEXP (src, 1)))
>> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
>> +   }
>>else if (GET_CODE (src) == COMPARE)
>> {
>>   /* Assume comparison cost is the same.  */
>> }
>> +  else if (GET_CODE (src) == CONST_INT)
>> +   {
>> + if (REG_P (dst))
>> +   gain += COSTS_N_INSNS (2);
>> + else if (MEM_P (dst))
>> +   gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
>> + gain -= scalar_chain::vector_const_cost (src);
>> +   }
>>else
>> gcc_unreachable ();
>
> Where does the 2 come from?  Is it that the STV pass right now supports only
> 2 * wordsize modes?  Also, I don't think we should treat equally constants
> that fit into the 32-bit immediates and constants that don't, the latter,
> when movabsq needs to be used, are more costly.

This variant is for DImode chains, where each value is split into two
SImode halves.  TImode chains have their own cost model.

Thanks,
Ilya

>
> Jakub


Re: Fix PR ada/70759

2016-04-28 Thread Eric Botcazou
> As discussed on gcc@, this removes the internal_reference_types machinery,
> which is used only by the Ada compiler for a reason probably long obsolete.
> 
> Tested on x86_64-suse-linux, applied on the mainline.

Andreas S. reported the PR against 6.0 so I have backported the patch onto the 
6 branch after bootstrapping/regtesting it there.

-- 
Eric Botcazou


Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-28 Thread Ilya Enkovich
2016-04-28 13:43 GMT+03:00 Uros Bizjak :
> On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich  
> wrote:
>> 2016-04-27 22:58 GMT+03:00 Uros Bizjak :
>>> Hello!
>>>
>>> This RFC patch illustrates the idea of using STV pass to load/store
>>> any TImode constant using SSE insns. The testcase:
>>>
>>> --cut here--
>>> __int128 x;
>>>
>>> __int128 test_1 (void)
>>> {
>>>   x = (__int128) 0x00112233;
>>> }
>>>
>>> __int128 test_2 (void)
>>> {
>>>   x = ((__int128) 0x0011223344556677 << 64);
>>> }
>>>
>>> __int128 test_3 (void)
>>> {
>>>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
>>> }
>>> --cut here--
>>>
>>> currently compiles (-O2) on x86_64 to:
>>>
>>> test_1:
>>> movq    $1122867, x(%rip)
>>> movq    $0, x+8(%rip)
>>> ret
>>>
>>> test_2:
>>> xorl    %eax, %eax
>>> movabsq $4822678189205111, %rdx
>>> movq    %rax, x(%rip)
>>> movq    %rdx, x+8(%rip)
>>> ret
>>>
>>> test_3:
>>> movabsq $4822678189205111, %rax
>>> movabsq $4822678189205111, %rdx
>>> movq    %rax, x(%rip)
>>> movq    %rdx, x+8(%rip)
>>> ret
>>>
>>> However, using the attached patch, we compile all tests to:
>>>
>>> test:
>>> movdqa  .LC0(%rip), %xmm0
>>> movaps  %xmm0, x(%rip)
>>> ret
>>>
>>> Ilya, HJ - do you think new sequences are better, or - as suggested by
>>> Jakub - they are beneficial with STV pass, as we are now able to load
>>> any immediate value? A variant of this patch can also be used to load
>>> DImode values to 32bit STV pass.
>>>
>>> Uros.
>>
>> Hi,
>>
>> Why don't we have two movq instructions in all three cases now?  Is it
>> because of late split?
>
> movq can handle only 32bit sign-extended immediates. There is actually
> room for improvement in test_2, where we could directly store 0 to
> x(%rip).

Right.  In this case timode_scalar_chain::compute_convert_gain should
analyze immediate values used in a chain.

Thanks,
Ilya
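
One way to picture such an analysis of immediates is to look at the two
64-bit halves of a TImode constant separately.  The following is a
simplified sketch under that assumption, not the logic of
timode_scalar_chain::compute_convert_gain; classify_half and
scalar_store_insns are hypothetical names, and real costs would come from
the ix86_cost tables rather than an instruction count.

```c
#include <stdint.h>

/* Hypothetical classification of one 64-bit half of a TImode constant.  */
enum half_kind { HALF_SIMM32, HALF_MOVABS };

static enum half_kind
classify_half (uint64_t half)
{
  int64_t v = (int64_t) half;
  return (v >= INT32_MIN && v <= INT32_MAX) ? HALF_SIMM32 : HALF_MOVABS;
}

/* Count the scalar instructions needed to store the constant HI:LO to
   memory: one movq per half that fits a sign-extended 32-bit immediate,
   movabsq plus movq for a half that does not.  The final ret is not
   counted.  */
static int
scalar_store_insns (uint64_t lo, uint64_t hi)
{
  int n = 0;
  n += classify_half (lo) == HALF_SIMM32 ? 1 : 2;
  n += classify_half (hi) == HALF_SIMM32 ? 1 : 2;
  return n;
}
```

For the three testcases this counts 2, 3 and 4 scalar instructions; the 3
for test_2 assumes the zero half is stored directly, as suggested in the
quoted reply, instead of going through xorl.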

>
> Uros.
>
>> I wouldn't say SSE load+store is always better than two movq instructions.
>> But it obviously can enable bigger chains for STV which is good.  I think
>> you should adjust a cost model to handle immediates for proper decision.
>>
>> That's what I have in my draft for DImode immediates:
>>
>> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
>>BITMAP_FREE (queue);
>>  }
>>
>> +/* Return a cost of building a vector constant
>> +   instead of using a scalar one.  */
>> +
>> +int
>> +scalar_chain::vector_const_cost (rtx exp)
>> +{
>> +  gcc_assert (CONST_INT_P (exp));
>> +
>> +  if (const0_operand (exp, GET_MODE (exp))
>> +  || constm1_operand (exp, GET_MODE (exp)))
>> +return COSTS_N_INSNS (1);
>> +  return ix86_cost->sse_load[1];
>> +}
>> +
>>  /* Compute a gain for chain conversion.  */
>>
>>  int
>> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>>|| GET_CODE (src) == IOR
>>|| GET_CODE (src) == XOR
>>|| GET_CODE (src) == AND)
>> -   gain += ix86_cost->add;
>> +   {
>> + gain += ix86_cost->add;
>> + if (CONST_INT_P (XEXP (src, 0)))
>> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
>> + if (CONST_INT_P (XEXP (src, 1)))
>> +   gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
>> +   }
>>else if (GET_CODE (src) == COMPARE)
>> {
>>   /* Assume comparison cost is the same.  */
>> }
>> +  else if (GET_CODE (src) == CONST_INT)
>> +   {
>> + if (REG_P (dst))
>> +   gain += COSTS_N_INSNS (2);
>> + else if (MEM_P (dst))
>> +   gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
>> + gain -= scalar_chain::vector_const_cost (src);
>> +   }
>>else
>> gcc_unreachable ();


Re: [patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Rainer Orth
Rainer Orth  writes:

> Matthias Klose  writes:
>
>> Bumping the version from 6.0.0 to 6.1.0 broke gcj, because the minor
>> version is still encoded in the gcj abi, not seen during development of the
>> 6 series until it was bumped for the final release.
>
> This is PR java/70839.

I just noticed that your patch is incomplete: it leaves the now unused
minor around and incorrectly talks about sub-minor versions...

Here's what I had in the PR instead:

2016-04-28  Rainer Orth  

PR java/70839
* decl.c (parse_version): Remove minor handling.

# HG changeset patch
# Parent  acf979f160547bd8b9b207525f97c29f6c9a9a6e
Don't include minor version in GCJ ABI version

diff --git a/gcc/java/decl.c b/gcc/java/decl.c
--- a/gcc/java/decl.c
+++ b/gcc/java/decl.c
@@ -507,7 +507,7 @@ static void
 parse_version (void)
 {
   const char *p = version_string;
-  unsigned int major = 0, minor = 0;
+  unsigned int major = 0;
   unsigned int abi_version;
 
   /* Skip leading junk.  */
@@ -525,13 +525,6 @@ parse_version (void)
   gcc_assert (*p == '.' && ISDIGIT (p[1]));
   ++p;
 
-  /* Extract minor version.  */
-  while (ISDIGIT (*p))
-    {
-      minor = minor * 10 + *p - '0';
-      ++p;
-    }
-
   if (flag_indirect_dispatch)
     {
       abi_version = GCJ_CURRENT_BC_ABI_VERSION;
@@ -540,9 +533,9 @@ parse_version (void)
   else /* C++ ABI */
     {
       /* Implicit in this computation is the idea that we won't break the
-         old-style binary ABI in a sub-minor release (e.g., from 4.0.0 to
-         4.0.1).  */
-      abi_version = 10 * major + 1000 * minor;
+         old-style binary ABI in a minor release (e.g., from 6.1.0 to
+         6.2.0).  */
+      abi_version = 10 * major;
     }
   if (flag_bootstrap_classes)
     abi_version |= FLAG_BOOTSTRAP_LOADER;

Rainer
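
To illustrate the effect of the change above, here is a stand-alone sketch
of the major-only computation.  It is not the code from gcc/java/decl.c:
major_abi_version is a hypothetical name, the standard isdigit replaces
GCC's ISDIGIT, and the flag bits (FLAG_BOOTSTRAP_LOADER etc.) are left out.

```c
#include <ctype.h>

/* Hypothetical stand-alone variant of the C++ ABI branch of
   parse_version: derive the ABI version from the major number only.  */
static unsigned int
major_abi_version (const char *version)
{
  const char *p = version;
  unsigned int major = 0;

  /* Skip leading junk.  */
  while (*p && !isdigit ((unsigned char) *p))
    ++p;

  /* Extract major version.  */
  while (isdigit ((unsigned char) *p))
    {
      major = major * 10 + (unsigned int) (*p - '0');
      ++p;
    }

  /* The minor version no longer contributes.  */
  return 10 * major;
}
```

Under this scheme "6.0.0" and "6.1.0" both map to 60, so a minor version
bump within the 6 series no longer changes the value.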

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix type field walking in gimplifier unsharing

2016-04-28 Thread Richard Biener
On Thu, 28 Apr 2016, Eric Botcazou wrote:

> > Aww, I was hoping for sth that would not require me to fix all
> > frontends ...
> 
> I don't really see how this can work without DECL_EXPR though.  You need to 
> define when the variable-sized expressions are evaluated to lay out the type, 
> otherwise it will be laid out on the first use, which may see a different 
> value of the expressions than the definition point.  The only way to do that 
> for a locally-defined type is to add a DECL_EXPR in GENERIC, so that the 
> gimplifier evaluates the expressions at the right spot.

Ah, so the C++ FE does this correctly but in addition to that it has

  /* When the pointed-to type involves components of variable size,
     care must be taken to ensure that the size evaluation code is
     emitted early enough to dominate all the possible later uses
     and late enough for the variables on which it depends to have
     been assigned.

     This is expected to happen automatically when the pointed-to
     type has a name/declaration of its own, but special attention
     is required if the type is anonymous.
...
  if (!TYPE_NAME (type)
      && (decl_context == NORMAL || decl_context == FIELD)
      && at_function_scope_p ()
      && variably_modified_type_p (type, NULL_TREE))
    /* Force evaluation of the SAVE_EXPR.  */
    finish_expr_stmt (TYPE_SIZE (type));

so in this case the type doesn't have an associated TYPE_DECL and thus
we can't build a DECL_EXPR.  To me the correct fix is then to
always force a TYPE_DECL for variably-modified types.

Jason?

Now digging into the Fortran FE equivalent case...

Thanks,
Richard.

> Of course in Ada we have the ACATS testsuite which tests for this kind of 
> things, this explains why it already works.
> 
> > It seems the C frontend does it correctly already - I hit the
> > ubsan issue for c-c++-common/ubsan/pr59667.c and only for the C++ FE
> > for example.  Notice how only the pointed-to type is variable-size
> > here.
> > 
> > C produces
> > 
> > {
> >   unsigned int len = 1;
> >   typedef float [0:(sizetype) ((long int) SAVE_EXPR  +
> > -1)][0:(sizetype) ((long int) SAVE_EXPR  + -1)];
> >   float[0:(sizetype) ((long int) SAVE_EXPR  + -1)][0:(sizetype)
> > ((long int) SAVE_EXPR  + -1)] * P = 0B;
> > 
> > unsigned int len = 1;
> > typedef float [0:(sizetype) ((long int) SAVE_EXPR  +
> > -1)][0:(sizetype) ((long int) SAVE_EXPR  + -1)];
> >   SAVE_EXPR ;, (void) SAVE_EXPR ;;
> > float[0:(sizetype) ((long int) SAVE_EXPR  + -1)][0:(sizetype)
> > ((long int) SAVE_EXPR  + -1)] * P = 0B;
> >   (*P)[0][0] = 1.0e+0;
> >   return 0;
> > }
> > 
> > the decl-expr is the 'typedef' line.  The C++ FE produces
> > 
> > {
> >   unsigned int len = 1;
> >   float[0:(sizetype) (SAVE_EXPR <(ssizetype) len + -1>)][0:(sizetype)
> > (SAVE_EXPR <(ssizetype) len + -1>)] * P = 0B;
> > 
> >   <>;
> >   < >   (void) (((bitsizetype) ((sizetype) (SAVE_EXPR <(ssizetype) len + -1>) +
> > 1) * (bitsizetype) ((sizetype) (SAVE_EXPR <(ssizetype) len + -1>) + 1)) *
> > 32) >;
> >   < > -1>)][0:(sizetype) (SAVE_EXPR <(ssizetype) len + -1>)] * P = 0B;>>;
> >   < >   (void) ((*P)[0][0] = 1.0e+0) >;
> >   return  = 0;
> > }
> > 
> > notice the lack of a decl-expr here.  It has some weird expr_stmt
> > here covering the sizes though. Possibly because VLA arrays are a GNU
> > extension.
> 
> Indeed.
> 
> > Didn't look into the fortran FE issue but I expect it's similar
> > (it only occurs for pointers to VLAs as well).
> > 
> > I'll try to come up with patches.
> > 
> > Thanks for the hint,
> 
> You're welcome.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Szabolcs Nagy
On 28/04/16 09:47, Maxim Kuvyrkov wrote:
>> On Apr 27, 2016, at 7:26 PM, Szabolcs Nagy  wrote:
>>
>> with -mfentry, by default the user only has to
>> implement the fentry call (linux wants nops there, but
>> e.g. glibc could use -pg -mfentry for profiling on
>> aarch64 and the target specific details are easier to
>> document for an -m option than for something general).
> 
> I don't understand your point here, could you elaborate, please?
> 

if we only provide -mfentry then

- the kernel can use it (they have tools to nop patch the binary),

- others who don't want to fiddle with nops, just have the call,
can also use it (e.g. user-space profiling cannot really use
something that needs binary patching in case the user prefers
-pg -mfentry over the current -pg behaviour).

- it's target specific, so the magic abi of the fentry call can
be documented by the target according to the specific instruction
sequence that is used. (with nop-padding there are psabi and
compiler optimization interactions that may be hard to document
in a generic way and letting the user figure it out may cause
problems later in compiler development.. but i'm just speculating
based on the powerpc toc handling and ipa-ra findings.)

>> the nop-padding is more general, but the size and
>> layout of nops and the call abi will be target
>> specific and the user will most likely need to modify
>> the binary (to get the right sequence) which needs
>> additional tooling.  i don't know who might use it
>> other than linux (which already has tools to deal with
>> -mfentry).
> 
> Right, but this tooling will require minimal (if any) changes
> to be adapted to nop-pad approach.  If I remember correctly,
> recent versions of GCC and kernel for x86_64 generate NOPs,
> not the call sequence in the prologs when -mfentry is used.

i'm trying to find where this happens in the kernel, but
i only see scripts/recordmcount.{c,pl} which are based on
nop patching the fentry/mcount call sites.

without such call sites the tools have to be implemented
differently and the way the kernel records the call site
positions might not match the prolog-pad recording.



[Ada] Fix PR ada/70786

2016-04-28 Thread Eric Botcazou
It's a small pasto in one of the variants of Get_Immediate.

Tested on x86_64-suse-linux, applied on all active branches.


2016-04-28  Eric Botcazou  

PR ada/70786
* a-textio.adb (Get_Immediate): Add missing 'not' in expression.


-- 
Eric Botcazou
Index: a-textio.adb
===
--- a-textio.adb	(revision 235544)
+++ a-textio.adb	(working copy)
@@ -668,7 +668,7 @@ package body Ada.Text_IO is
 Available := True;
 
 Item :=
-  (if Is_Start_Of_Encoding (Character'Val (ch), File.WC_Method)
+  (if not Is_Start_Of_Encoding (Character'Val (ch), File.WC_Method)
then Character'Val (ch)
else Get_Upper_Half_Char_Immed (Character'Val (ch), File));
  end if;


Re: Avoid NULL cfun ICE in gcc/config/nvptx/nvptx.c:nvptx_libcall_value (was: [PATCH] Fix PR70760)

2016-04-28 Thread Alexander Monakov
On Thu, 28 Apr 2016, Richard Biener wrote:
> Doing anything based on 'cfun' here is fishy at least for the
> call context of aggregate_value_p as that is also used when
> looking at the caller side of a call for example when expanding calls
> where cfun is then the callers cfun and not the callees.
> 
> So I suggest to remove cfun->machine->doing_call and revisit the
> reason why it was added for PTX.

It does look a bit strange.  My understanding is this: today, nvptx never
returns aggregates in registers, so either an aggregate value is returned on
the stack, or a scalar value is returned in a register.  The backend wants to
record the RTL mode of the value being returned (hence the 'outgoing'
check in nvptx_function_value), but otherwise wants to behave as if callees
return values in a fixed register while callers receive values in a pseudo
register (hence the 'doing_call' check in nvptx_libcall_value: it can be
called in a context where pseudos can't be generated, but if we aren't
currently expanding a call to RTL, we can give the middle end whatever hard
reg in the right mode; it's a hack).

So if my understanding is correct, an additional !cfun check can be acceptable
as a fix alongside the existing hack.  Perhaps with a note about the nature of
the hack.

Thanks.
Alexander
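
The guard under discussion relies on short-circuit evaluation: the
dereference only happens once cfun is known to be non-NULL.  A minimal
mock-up follows; the struct definitions and use_hard_return_reg are
stand-ins invented for this sketch, not the real GCC types or the nvptx
hook.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock-ups of the GCC structures, reduced to the one field used here.  */
struct machine_function { bool doing_call; };
struct function { struct machine_function *machine; };

/* Stand-in for GCC's global cfun; NULL when no function is being
   compiled, as in the IPA PTA backtrace.  */
static struct function *cfun = NULL;

/* Hypothetical predicate: true when the hook would hand back the fixed
   hard return register instead of a pseudo.  */
static bool
use_hard_return_reg (void)
{
  /* !cfun is tested first, so cfun->machine is never read through a
     NULL pointer.  */
  return !cfun || !cfun->machine->doing_call;
}
```

This only shows why the extra test avoids the SIGSEGV; whether keying the
behaviour off cfun is right at all is the open question in this thread.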


Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Torsten Duwe
On Thu, Apr 28, 2016 at 11:39:48AM +0300, Maxim Kuvyrkov wrote:
> > On Apr 27, 2016, at 6:22 PM, Torsten Duwe  wrote:
> 
> Your current patch is great for experiments for the kernel engineers to check 
> if suggested approaches to code patching will work.  Still, I prefer to 
> implement LTO-friendly way of handling -fprolog-pad=N via function attributes.

That was exactly my intention. I only wanted *some* working compiler.
I'm sure you compiler people will have a better way to finally implement this.

All I can say so far about the ipa-ra issue is that it'd be great if
x9(?) could be left as volatile / scratch; the rest can be preserved.

Torsten



Re: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 15:33, Claudiu Zissulescu wrote:

OK to apply?

No.  You are clobbering DBL0H.

Besides, why would you change any of the code, apart from the argument
to #ifdef and the comments?



Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Torsten Duwe
On Thu, Apr 28, 2016 at 11:58:25AM +0100, Szabolcs Nagy wrote:
> On 28/04/16 09:47, Maxim Kuvyrkov wrote:
> >> On Apr 27, 2016, at 7:26 PM, Szabolcs Nagy  wrote:
> >>
> >> with -mfentry, by default the user only has to
> >> implement the fentry call (linux wants nops there, but
> >> e.g. glibc could use -pg -mfentry for profiling on
> >> aarch64 and the target specific details are easier to
> >> document for an -m option than for something general).
> > 
> > I don't understand your point here, could you elaborate, please?
> > 
> 
> if we only provide -mfentry then
> 
> - the kernel can use it (they have tools to nop patch the binary),
> 
> - others who don't want to fiddle with nops, just have the call,
> can also use it (e.g. user-space profiling cannot really use
> something that needs binary patching in case the user prefers
> -pg -mfentry over the current -pg behaviour).

Any examples of users not satisfied with the current -pg ;-) ?

> - it's target specific, so the magic abi of the fentry call can
> be documented by the target according to the specific instruction

There's a downside to this: you will have to reimplement it in gcc

  * for every architecture
  * for every ABI variant

while the generic approach is -- well -- somewhat generic :-]

> sequence that is used. (with nop-padding there are psabi and
> compiler optimization interactions that may be hard to document
> in a generic way and letting the user figure it out may cause
> problems later in compiler development.. but i'm just speculating
> based on the powerpc toc handling and ipa-ra findings.)

ipa-ra is from hell ;) At least from a function-patcher's standpoint.
You may argue that OTOH function binary patching is from hell :)

> >> the nop-padding is more general, but the size and
> >> layout of nops and the call abi will be target
> >> specific and the user will most likely need to modify
> >> the binary (to get the right sequence) which needs
> >> additional tooling.  i don't know who might use it
> >> other than linux (which already has tools to deal with
> >> -mfentry).

On exactly 1 (one!) architecture. s390x uses NOP padding, hint, hint...

> i'm trying to find where this happens in the kernel, but
> i only see scripts/recordmcount.{c,pl} which are based on
> nop patching the fentry/mcount call sites.
> 
> without such call sites the tools have to be implemented
> differently and the way the kernel records the call site
> positions might not match the prolog-pad recording.

AFAICS Maxim has provided a nice mechanism to find the NOP pads.

Let's see how far we can get, then discuss this further,
I suggest.

Torsten



RE: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Claudiu Zissulescu
> Besides, why would you change any of the code, apart from the argument
> to #ifdef and the comments?

It is not working/giving wrong results. I think, the test shows you this if you 
run it without all the libgcc mods.


Re: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Joern Wolfgang Rennecke



On 28/04/16 12:35, Claudiu Zissulescu wrote:

Besides, why would you change any of the code, apart from the argument
to #ifdef and the comments?

It is not working/giving wrong results. I think, the test shows you this if you 
run it without all the libgcc mods.

I can't.

Where exactly does the test go wrong?
Can you show a trace of __eqdf2 with register values?


Re: Avoid NULL cfun ICE in gcc/config/nvptx/nvptx.c:nvptx_libcall_value

2016-04-28 Thread Bernd Schmidt

On 04/28/2016 01:15 PM, Alexander Monakov wrote:

So if my understanding is correct, additional !cfun check can be acceptable as
a fix along the existing hack.  Perhaps with a note about the nature of the
hack.


Yes, I think Thomas' patch is ok.


Bernd


RE: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Claudiu Zissulescu
> 
> Where exactly does the test go wrong?

I will try to trace it back to when I developed it. Too much time has passed since then.
Probably something related to big-endian.




Re: [patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Matthias Klose

On 28.04.2016 12:52, Rainer Orth wrote:

Rainer Orth  writes:


Matthias Klose  writes:


Bumping the version from 6.0.0 to 6.1.0 broke gcj, because the minor
version is still encoded in the gcj abi, not seen during development of the
6 series until it was bumped for the final release.


This is PR java/70839.


I just noticed that your patch is incomplete: it leaves the now unused
minor around and incorrectly talks about sub-minor versions...

Here's what I had in the PR instead:

2016-04-28  Rainer Orth  

PR java/70839
* decl.c (parse_version): Remove minor handling.


yes, that looks good. Can't approve it myself.



Re: [PATCH 5/6] [ARC] Fix unwanted match for sign extend 16-bit constant.

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 15:33, Claudiu Zissulescu wrote:

The combine pass may conclude that the umulhisi3_imm pattern can also accept
sign-extended 16-bit constants. This patch prevents combine from considering
this pattern suitable.

OK to apply?
Claudiu

gcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/arc.md (umulhisi3_imm): Avoid unwanted match for sign
extend 16-bit constants.

...

* testsuite/gcc.target/arc/umulsihi3_z.c: New file.
-(match_operand:HI 2 "short_const_int_operand"  " L, L,I,C16,C16")))]
+(match_operand:HI 2 "short_const_int_operand"  " L, L,I,C16,C16")))
+  (use (match_dup 2))]


 That's not the way to fix it.  Get the predicates and constraints right.


Re: [patch] Don't encode the minor version in the gcj abi version

2016-04-28 Thread Andrew Haley
On 04/28/2016 12:45 PM, Matthias Klose wrote:
> yes, that looks good. Can't approve it myself.

OK.

Andrew.



RE: [PATCH 1/2] [ARC/LIBGCC] Add TLS support.

2016-04-28 Thread Claudiu Zissulescu
Committed r235558.

Thanks,
Claudiu

> > libgcc/
> > 2016-04-15  Claudiu Zissulescu  
> > Joern Rennecke  
> >
> > * config/arc/crttls.S: New file.
> > * config/arc/t-arc: New rule.
> > * config.host (arc*-*-elf*, arc*-*-linux*): Add crttls.o.
> > -
>   The libgcc part is OK.


RE: [PATCH 2/2] [ARC] Add TLS support.

2016-04-28 Thread Claudiu Zissulescu
Fixing the loose ends. Committed r235559

Thanks,
Claudiu

> 
> > +(define_insn "tls_gd_load"
> ..
> > +   ; if the linker has to patch this into IE, we need a long insns
> 
> Typo: a long insn.
> 
> arc_emit_call_tls_get_addr is missing a start-of-function comment.
> 
> Otherwise this is OK.



[C PATCH PING] PR43651: add warning for duplicate qualifier

2016-04-28 Thread Mikhail Maltsev
On 04/10/2016 11:12 PM, Martin Sebor wrote:
> On 04/09/2016 06:28 AM, Mikhail Maltsev wrote:
>> On 04/08/2016 08:54 PM, Martin Sebor wrote:
>>>> The name for new option "-Wduplicate-decl-specifier" and wording was
>>>> chosen to match the same option in Clang.
>>>
>>> My version of Clang also warns in C++ mode but if I'm reading
>>> the patch right, GCC would warn only C mode.  I would find it
>>> surprising if GCC provided the same option as Clang but didn't
>>> make it available in the same languages.  Do you have some
>>> reason for leaving it out that I'm not thinking of?
>> It is an error in C++ mode. Do we want to change this behavior?
> 
> You're right, G++ does give an error.  I missed it in my testing.
> Unlike C11, C++ requires a diagnostic for duplicated cv-qualifiers
> so by issuing a warning Clang is more permissive.  My personal
> inclination would be to treat this consistently between C and C++
> but whether or not to change it is something Jason would need to
> weigh in on.

Ping.

-- 
Regards,
Mikhail Maltsev


[PATCH, i386]: Improve fop and corresponding peephole2 patterns

2016-04-28 Thread Uros Bizjak
Hello!

mult_operator will never match in patterns, protected with
!COMMUTATIVE_ARITH. The patch also adds "reg = op (mem, reg)"
peephole2 to break additional cases of dependency of two loads from
x87 stack.

2016-04-28  Uros Bizjak  

* config/i386/i386.md (*fop__1_mixed): Do not check for
mult_operator when calculating "type" attribute.
(*fop__1_i387): Ditto.
(*fop_xf_1_i387): Ditto.
(x87 stack loads peephole2): Add "reg = op (mem, reg)" peephole2.
Use std::swap to swap operands.  Use RTL expressions to generate
converted pattern.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 235544)
+++ config/i386/i386.md (working copy)
@@ -14055,20 +14055,13 @@
&& !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
-(cond [(and (eq_attr "alternative" "2,3")
-   (match_operand:MODEF 3 "mult_operator"))
- (const_string "ssemul")
-  (and (eq_attr "alternative" "2,3")
-   (match_operand:MODEF 3 "div_operator"))
- (const_string "ssediv")
-  (eq_attr "alternative" "2,3")
- (const_string "sseadd")
-  (match_operand:MODEF 3 "mult_operator")
- (const_string "fmul")
-   (match_operand:MODEF 3 "div_operator")
- (const_string "fdiv")
-  ]
-  (const_string "fop")))
+   (if_then_else (eq_attr "alternative" "2,3")
+  (if_then_else (match_operand:MODEF 3 "div_operator")
+ (const_string "ssediv")
+ (const_string "sseadd"))
+  (if_then_else (match_operand:MODEF 3 "div_operator")
+ (const_string "fdiv")
+ (const_string "fop"
(set_attr "isa" "*,*,noavx,avx")
(set_attr "prefix" "orig,orig,orig,vex")
(set_attr "mode" "")
@@ -14090,12 +14083,9 @@
&& !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
-(cond [(match_operand:MODEF 3 "mult_operator")
- (const_string "fmul")
-   (match_operand:MODEF 3 "div_operator")
- (const_string "fdiv")
-  ]
-  (const_string "fop")))
+(if_then_else (match_operand:MODEF 3 "div_operator")
+   (const_string "fdiv")
+   (const_string "fop")))
(set_attr "mode" "")])
 
 ;; ??? Add SSE splitters for these!
@@ -14109,7 +14099,7 @@
&& !(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
&& (TARGET_USE_MODE_FIOP
|| optimize_function_for_size_p (cfun))"
-  { return output_387_binary_op (insn, operands); }
+  "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
 (cond [(match_operand:MODEF 3 "mult_operator")
  (const_string "fmul")
@@ -14130,7 +14120,7 @@
&& !(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
&& (TARGET_USE_MODE_FIOP
|| optimize_function_for_size_p (cfun))"
-  { return output_387_binary_op (insn, operands); }
+  "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
 (cond [(match_operand:MODEF 3 "mult_operator")
  (const_string "fmul")
@@ -14220,12 +14210,9 @@
&& !COMMUTATIVE_ARITH_P (operands[3])"
   "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
-(cond [(match_operand:XF 3 "mult_operator")
- (const_string "fmul")
-   (match_operand:XF 3 "div_operator")
- (const_string "fdiv")
-  ]
-  (const_string "fop")))
+(if_then_else (match_operand:XF 3 "div_operator")
+   (const_string "fdiv")
+   (const_string "fop")))
(set_attr "mode" "XF")])
 
 (define_insn "*fop_xf_2_i387"
@@ -14236,7 +14223,7 @@
   (match_operand:XF 2 "register_operand" "0")]))]
   "TARGET_80387
&& (TARGET_USE_MODE_FIOP || optimize_function_for_size_p (cfun))"
-  { return output_387_binary_op (insn, operands); }
+  "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
 (cond [(match_operand:XF 3 "mult_operator")
  (const_string "fmul")
@@ -14255,7 +14242,7 @@
 (match_operand:SWI24 2 "nonimmediate_operand" "m"))]))]
   "TARGET_80387
&& (TARGET_USE_MODE_FIOP || optimize_function_for_size_p (cfun))"
-  { return output_387_binary_op (insn, operands); }
+  "* return output_387_binary_op (insn, operands);"
   [(set (attr "type")
 (cond [(match_operand:XF 3 "mult_operator")
  (const_string "fmul")
@@ -17394,6 +17381,7 @@
 ;;   fmul bb fmul %st(1), %st
 ;;
 ;; Actually we only match the last two instructions for simplicity.
+
 (define_peephole2
   [(set (match_operand 0 "fp_registe

Re: [PATCH] Fix type field walking in gimplifier unsharing

2016-04-28 Thread Richard Biener
On Thu, 28 Apr 2016, Richard Biener wrote:

> On Thu, 28 Apr 2016, Eric Botcazou wrote:
> 
> > > Aww, I was hoping for sth that would not require me to fix all
> > > frontends ...
> > 
> > I don't really see how this can work without DECL_EXPR though.  You need to
> > define when the variable-sized expressions are evaluated to lay out the type,
> > otherwise it will be laid out on the first use, which may see a different
> > value of the expressions than the definition point.  The only way to do that
> > for a locally-defined type is to add a DECL_EXPR in GENERIC, so that the
> > gimplifier evaluates the expressions at the right spot.
> 
> Ah, so the C++ FE does this correctly but in addition to that it has
> 
>   /* When the pointed-to type involves components of variable size,
>  care must be taken to ensure that the size evaluation code is
>  emitted early enough to dominate all the possible later uses
>  and late enough for the variables on which it depends to have
>  been assigned.
> 
>  This is expected to happen automatically when the pointed-to
>  type has a name/declaration of it's own, but special attention
>  is required if the type is anonymous.
> ...
>   if (!TYPE_NAME (type)
>   && (decl_context == NORMAL || decl_context == FIELD)
>   && at_function_scope_p ()
>   && variably_modified_type_p (type, NULL_TREE))
> /* Force evaluation of the SAVE_EXPR.  */
> finish_expr_stmt (TYPE_SIZE (type));
> 
> so in this case the type doesn't have an associated TYPE_DECL and thus
> we can't build a DECL_EXPR.  To me the correct fix is then to
> always force a TYPE_DECL for variable-modified types.
> 
> Jason?

The following works (for the testcase):

Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c   (revision 235547)
+++ gcc/cp/decl.c   (working copy)
@@ -10393,8 +10393,11 @@ grokdeclarator (const cp_declarator *dec
  && (decl_context == NORMAL || decl_context == FIELD)
  && at_function_scope_p ()
  && variably_modified_type_p (type, NULL_TREE))
-   /* Force evaluation of the SAVE_EXPR.  */
-   finish_expr_stmt (TYPE_SIZE (type));
+   {
+ TYPE_NAME (type) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+NULL_TREE, type);
+ add_decl_expr (TYPE_NAME (type));
+   }
 
  if (declarator->kind == cdk_reference)
{

and I have a similar fix for the Fortran FE for one testcase I
reduced to

  character(10), dimension (2) :: implicit_result
  character(10), dimension (2) :: source
  implicit_result = reallocate_hnv (LEN (source))
contains
  FUNCTION reallocate_hnv(LEN)
CHARACTER(LEN=LEN), DIMENSION(:), POINTER :: reallocate_hnv
  END FUNCTION reallocate_hnv
end

Index: fortran/trans-array.c
===
--- fortran/trans-array.c   (revision 235547)
+++ fortran/trans-array.c   (working copy)
@@ -1094,6 +1094,16 @@ gfc_trans_create_temp_array (stmtblock_t
   info->descriptor = desc;
   size = gfc_index_one_node;
 
+  /* Emit a DECL_EXPR for the variable sized array type in
+ GFC_TYPE_ARRAY_DATAPTR_TYPE so the gimplification of its type
+ sizes works correctly.  */
+  tree arraytype = TREE_TYPE (GFC_TYPE_ARRAY_DATAPTR_TYPE (type));
+  if (! TYPE_NAME (arraytype))
+TYPE_NAME (arraytype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+   NULL_TREE, arraytype);
+  gfc_add_expr_to_block (pre, build1 (DECL_EXPR,
+ arraytype, TYPE_NAME (arraytype)));
+
   /* Fill in the array dtype.  */
   tmp = gfc_conv_descriptor_dtype (desc);
   gfc_add_modify (pre, tmp, gfc_get_dtype (TREE_TYPE (desc)));


I wonder if we can avoid allocating the TYPE_DECL by simply also
allowing TREE_TYPE as operand of a DECL_EXPR (to avoid adding
a 'TYPE_EXPR').

Richard.


RE: [PATCH 1/6] [ARC] Don't use drsub* instructions when selecting fpuda.

2016-04-28 Thread Claudiu Zissulescu
Committed r235562.

Thanks,
Claudiu

> >
> > gcc/
> > 2016-04-18  Claudiu Zissulescu  
> >
> > * config/arc/arc.md (cpu_facility): Add fpx variant.
> > (subdf3): Prohibit use reverse sub when assist operations option
> > is enabled.
> > * config/arc/fpx.md (subdf3_insn, *dsubh_peep2_insn): Allow drsub
> > instructions only when FPX is enabled.
> >  * testsuite/gcc.target/arc/trsub.c: New test.
> >
>   OK.


[PATCH] Add some more powi patterns to match.pd

2016-04-28 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2016-04-28  Richard Biener  

PR tree-optimization/70840
* match.pd: powi(-x, y) and powi(|x|,y) -> powi(x,y) if y is even;
Fix pow(copysign(x, y), z) -> pow(x, z) and add powi variant;
Mark x * pow(x,c) -> pow(x,c+1) commutative.
Add powi(x,y) * powi(z,y) -> powi(x*z,y).
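
The identities behind these patterns can be checked numerically with a
small sketch.  This is not GCC code: powi_sketch is a hypothetical
stand-in for powi written only for illustration, and the inputs in the
checks are assumed to be exactly representable so that the comparisons
are exact.

```cpp
#include <cmath>

// Hypothetical stand-in for powi: raises x to an integer power by
// repeated squaring.  Written only to illustrate the identities in the
// ChangeLog above; it is not the GCC builtin.
double powi_sketch(double x, int n)
{
  const bool negative = n < 0;
  // Take the magnitude of n in unsigned arithmetic to avoid overflow.
  unsigned int e = negative ? 0u - static_cast<unsigned int>(n)
                            : static_cast<unsigned int>(n);
  double result = 1.0;
  while (e != 0)
    {
      if (e & 1u)
        result *= x;
      x *= x;
      e >>= 1;
    }
  return negative ? 1.0 / result : result;
}

// powi(-x, y) == powi(x, y) and powi(|x|, y) == powi(x, y) when y is even.
bool even_power_ignores_sign(double x, int y)
{
  return powi_sketch(-x, y) == powi_sketch(x, y)
         && powi_sketch(std::fabs(x), y) == powi_sketch(x, y);
}

// powi(x, y) * powi(z, y) == powi(x * z, y), exact only when no
// rounding occurs in either evaluation order.
bool product_rule_holds(double x, double z, int y)
{
  return powi_sketch(x, y) * powi_sketch(z, y) == powi_sketch(x * z, y);
}
```

For an odd exponent the sign does matter, which is why the patterns
test the low bit of the exponent before stripping the negate or abs.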

Index: gcc/match.pd
===
--- gcc/match.pd(revision 235544)
+++ gcc/match.pd(working copy)
@@ -366,6 +366,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(with { HOST_WIDE_INT n; }
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
  (pows @0 @1)
+ /* Likewise for powi.  */
+ (for pows (POWI)
+  (simplify
+   (pows (op @0) INTEGER_CST@1)
+   (if (wi::bit_and (@1, 1) == 0)
+(pows @0 @1
  /* Strip negate and abs from both operands of hypot.  */
  (for hypots (HYPOT)
   (simplify
@@ -396,10 +402,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for pows (POW)
  copysigns (COPYSIGN)
  (simplify
-  (pows (copysigns @0 @1) REAL_CST@1)
+  (pows (copysigns @0 @2) REAL_CST@1)
   (with { HOST_WIDE_INT n; }
(if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
 (pows @0 @1)
+/* Likewise for powi.  */
+(for pows (POWI)
+ copysigns (COPYSIGN)
+ (simplify
+  (pows (copysigns @0 @2) INTEGER_CST@1)
+  (if (wi::bit_and (@1, 1) == 0)
+   (pows @0 @1
 
 (for hypots (HYPOT)
  copysigns (COPYSIGN)
@@ -2781,7 +2794,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
  /* Simplify x * pow(x,c) -> pow(x,c+1). */
  (simplify
-  (mult @0 (POW:s @0 REAL_CST@1))
+  (mult:c @0 (POW:s @0 REAL_CST@1))
   (if (!TREE_OVERFLOW (@1))
(POW @0 (plus @1 { build_one_cst (type); }
 
@@ -2819,6 +2832,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (POW:s @0 @1) (POW:s @2 @1))
(POW (mult @0 @2) @1))
 
+ /* Simplify powi(x,y) * powi(z,y) -> powi(x*z,y). */
+ (simplify
+  (mult (POWI:s @0 @1) (POWI:s @2 @1))
+   (POWI (mult @0 @2) @1))
+
  /* Simplify pow(x,c) / x -> pow(x,c-1). */
  (simplify
   (rdiv (POW:s @0 REAL_CST@1) @0)


Re: New hashtable power 2 rehash policy

2016-04-28 Thread Jonathan Wakely

I'm making this small change to some comments in hashtable_policy.h

Tested x86_64-linux, committing to trunk.


commit c387345f7c68df8812c7909f9187445a79bd5dcb
Author: Jonathan Wakely 
Date:   Thu Apr 28 11:25:29 2016 +0100

	* include/bits/hashtable_policy.h (__detail::_Insert_base,
	__detail::_Insert): Improve comments.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 7a2ac92..2c24c19 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -667,7 +667,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
*  Primary class template _Insert_base.
*
-   *  insert member functions appropriate to all _Hashtables.
+   *  Defines @c insert member functions appropriate to all _Hashtables.
*/
   template

Re: [PATCH 6/6] [ARC] Various instruction pattern fixes

2016-04-28 Thread Joern Wolfgang Rennecke



On 18/04/16 19:25, Claudiu Zissulescu wrote:

Forgot to add the reload cases. Here is the updated patch.

//Claudiu


gcc/
2016-04-18  Claudiu Zissulescu  

* config/arc/arc.md (mulsidi3): Change operand 0 predicate to
register_operand.
(umulsidi3): Likewise.
(indirect_jump): Fix jump instruction assembly patterns.
(arcset): Change operand 1 predicate to nonmemory_operand.
(arcsetltu, arcsetgeu): Likewise.

ChangeLog omission: You are also adding an r/n/r alternative.

(arcsethi, arcsetls): Fix pattern.

Otherwise this is OK.

If the constant / register comparisons come from an expander, in
general the expander should be fixed to swap the operands and
use the swapped comparison code, to get canonical rtl.
OTOH, constant re-materialization during register allocation can change
a reg-reg into a constant-reg comparison, and at that stage,
canonicalization would not be expected.
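
The operand swap described above can be sketched outside of RTL.  The
enum and the function names below are hypothetical and exist only for
this illustration; they are not the GCC rtx codes or helpers.

```cpp
// Hypothetical comparison codes, standing in for rtx comparison codes.
enum class Cmp { EQ, NE, LT, LE, GT, GE };

// Return the code that gives the same result once the two operands
// have been exchanged: (a < b) is the same test as (b > a).
Cmp swap_comparison(Cmp code)
{
  switch (code)
    {
    case Cmp::EQ: return Cmp::EQ;
    case Cmp::NE: return Cmp::NE;
    case Cmp::LT: return Cmp::GT;
    case Cmp::LE: return Cmp::GE;
    case Cmp::GT: return Cmp::LT;
    case Cmp::GE: return Cmp::LE;
    }
  return code;
}

// Evaluate a comparison on plain integers, used to check that swapping
// operands together with the code preserves the result.
bool evaluate(Cmp code, int lhs, int rhs)
{
  switch (code)
    {
    case Cmp::EQ: return lhs == rhs;
    case Cmp::NE: return lhs != rhs;
    case Cmp::LT: return lhs < rhs;
    case Cmp::LE: return lhs <= rhs;
    case Cmp::GT: return lhs > rhs;
    case Cmp::GE: return lhs >= rhs;
    }
  return false;
}
```

An expander that sees "constant < reg" would emit "reg > constant"
instead, so the constant ends up as the second operand.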


gcc-patches@gcc.gnu.org

2016-04-28 Thread Jonathan Wakely

A few more places where we should be using std::addressof.

Tested x86_64-linux, committed to trunk.
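
For readers unfamiliar with the problem, a minimal sketch of why
operator& is unsafe in generic code follows.  The Tricky type and the
two helper templates are hypothetical and written only for this
illustration; the sketch uses the public std::addressof from <memory>,
whereas the patch uses the library-internal std::__addressof.

```cpp
#include <memory>

// Hypothetical type whose overloaded unary operator& hides the real
// address of the object.
struct Tricky
{
  int value = 0;
  Tricky* operator&() { return nullptr; }
};

// Takes the address with the built-in syntax, so it picks up any
// user-provided overload of operator&.
template<typename T>
T* plain_address(T& obj)
{ return &obj; }

// Takes the address with std::addressof, which bypasses overloads of
// operator& and always yields the actual address.
template<typename T>
T* real_address(T& obj)
{ return std::addressof(obj); }
```

With a Tricky object, plain_address returns a null pointer while
real_address returns a usable pointer to the object.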

commit 6d5bde9f90e7f1470d3f8d53b3288e6832f536df
Author: Jonathan Wakely 
Date:   Thu Apr 28 11:45:22 2016 +0100

libstdc++/70766 use std::addressof instead of operator&

	PR libstdc++/70766
	* include/bits/basic_ios.tcc (basic_ios::_M_cache_locale): Use
	__addressof.
	* include/bits/stream_iterator.h (istream_iterator, ostream_iterator):
	Likewise.
	* include/std/atomic (atomic<_Tp>): Likewise.
	* include/std/shared_mutex (shared_lock): Likewise.
	* testsuite/24_iterators/istream_iterator/70766.cc: New test.
	* testsuite/24_iterators/ostream_iterator/70766.cc : New test.
	* testsuite/29_atomics/atomic/60695.cc: Adjust dg-error line number.
	* testsuite/29_atomics/atomic/70766.cc: New test.
	* testsuite/30_threads/shared_lock/70766.cc: New test.

diff --git a/libstdc++-v3/include/bits/basic_ios.tcc b/libstdc++-v3/include/bits/basic_ios.tcc
index 6c2ea11..0469220 100644
--- a/libstdc++-v3/include/bits/basic_ios.tcc
+++ b/libstdc++-v3/include/bits/basic_ios.tcc
@@ -157,17 +157,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 basic_ios<_CharT, _Traits>::_M_cache_locale(const locale& __loc)
 {
   if (__builtin_expect(has_facet<__ctype_type>(__loc), true))
-	_M_ctype = &use_facet<__ctype_type>(__loc);
+	_M_ctype = std::__addressof(use_facet<__ctype_type>(__loc));
   else
 	_M_ctype = 0;
 
   if (__builtin_expect(has_facet<__num_put_type>(__loc), true))
-	_M_num_put = &use_facet<__num_put_type>(__loc);
+	_M_num_put = std::__addressof(use_facet<__num_put_type>(__loc));
   else
 	_M_num_put = 0;
 
   if (__builtin_expect(has_facet<__num_get_type>(__loc), true))
-	_M_num_get = &use_facet<__num_get_type>(__loc);
+	_M_num_get = std::__addressof(use_facet<__num_get_type>(__loc));
   else
 	_M_num_get = 0;
 }
diff --git a/libstdc++-v3/include/bits/stream_iterator.h b/libstdc++-v3/include/bits/stream_iterator.h
index f9c6ba6..4afba4e 100644
--- a/libstdc++-v3/include/bits/stream_iterator.h
+++ b/libstdc++-v3/include/bits/stream_iterator.h
@@ -66,7 +66,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Construct start of input stream iterator.
   istream_iterator(istream_type& __s)
-  : _M_stream(&__s)
+  : _M_stream(std::__addressof(__s))
   { _M_read(); }
 
   istream_iterator(const istream_iterator& __obj)
@@ -84,7 +84,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   const _Tp*
-  operator->() const { return &(operator*()); }
+  operator->() const { return std::__addressof((operator*())); }
 
   istream_iterator&
   operator++()
@@ -168,7 +168,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 public:
   /// Construct from an ostream.
-  ostream_iterator(ostream_type& __s) : _M_stream(&__s), _M_string(0) {}
+  ostream_iterator(ostream_type& __s)
+  : _M_stream(std::__addressof(__s)), _M_string(0) {}
 
   /**
*  Construct from an ostream.
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index bdc1f25..3c8ece8 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -39,6 +39,7 @@
 #else
 
 #include 
+#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -222,17 +223,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) noexcept
-  { __atomic_store(&_M_i, &__i, __m); }
+  { __atomic_store(std::__addressof(_M_i), std::__addressof(__i), __m); }
 
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) volatile noexcept
-  { __atomic_store(&_M_i, &__i, __m); }
+  { __atomic_store(std::__addressof(_M_i), std::__addressof(__i), __m); }
 
   _Tp
   load(memory_order __m = memory_order_seq_cst) const noexcept
   { 
 _Tp tmp;
-	__atomic_load(&_M_i, &tmp, __m);
+	__atomic_load(std::__addressof(_M_i), std::__addressof(tmp), __m);
 	return tmp;
   }
 
@@ -240,7 +241,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   load(memory_order __m = memory_order_seq_cst) const volatile noexcept
   { 
 _Tp tmp;
-	__atomic_load(&_M_i, &tmp, __m);
+	__atomic_load(std::__addressof(_M_i), std::__addressof(tmp), __m);
 	return tmp;
   }
 
@@ -248,7 +249,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   exchange(_Tp __i, memory_order __m = memory_order_seq_cst) noexcept
   { 
 _Tp tmp;
-	__atomic_exchange(&_M_i, &__i, &tmp, __m);
+	__atomic_exchange(std::__addressof(_M_i), std::__addressof(__i),
+			  std::__addressof(tmp), __m);
 	return tmp;
   }
 
@@ -257,7 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   memory_order __m = memory_order_seq_cst) volatile noexcept
   { 
 _Tp tmp;
-	__atomic_exchange(&_M_i, &__i, &tmp, __m);
+	__atomic_exchange(std::__addressof(_M_i), std::__addressof(__i),
+			  std::__addressof(tmp), __m);
 	return tmp;
   }
 
@@ -265,14 +268,20 @@ _GLIBCXX_

Re: [PATCH] Update gmp/mpfr/mpc in-tree versions

2016-04-28 Thread Richard Biener
On Thu, 28 Apr 2016, Bernd Edlinger wrote:

> Hi,
> 
> here is the first part of the patch that addresses only the in-tree
> builds.  I tried different combinations of the documented supported
> in-tree versions, and all combinations seem to work.
> Then I changed the download_prerequisites batch to pick each pre-
> requisite's minimum version (that part is not tested, because I have
> no way to update the gcc.gnu.org ftp server).
> 
> Various boot-straps for x86_64-linux-gnu and armv7-linux-gnueabihf
> were successful.
> 
> Is it OK for trunk?

Please do not document that in-tree versions greater than XXX are
supported, instead just point at download_prerequisites.

Why do you not update to the latest mpc (there is 1.0.3) and mpfr, but leave
bugfixes for mpfr on the plate (there is 3.1.4)?

Does it make sense to wait for a new GMP release that allows us to get
rid of -DNO_ASM?

I will upload mpfr 3.1.4 and mpc 1.0.3.

Thanks,
Richard.

> Thanks
> Bernd.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[patch] generate_libstdcxx_web_docs: Use realpath to get absolute path

2016-04-28 Thread Jonathan Wakely

When I ran maintainer-scripts/generate_libstdcxx_web_docs to make the
onlinedocs/libstdc++ for 6.1 the other day it failed because I use a
relative path for the output dir argument. This would make it work,
but is relying on GNU realpath OK?

I think in practice nobody is going to generate those docs on anything
other than GNU/Linux, but I could instead use a Bashism like:

DOCSDIR=$(test "${2:0:1}" = "/" && echo "$2" || echo "$PWD/$2")

If it's OK for trunk it could go on the branches too, for generating
the 4.9.4, 5.4 and 6.2 docs.


commit b46ffd26927df284ea3c4760a40ce716e8e25fa2
Author: Jonathan Wakely 
Date:   Wed Apr 27 14:52:03 2016 +0100

	* generate_libstdcxx_web_docs: Use realpath to get absolute path.

diff --git a/maintainer-scripts/generate_libstdcxx_web_docs b/maintainer-scripts/generate_libstdcxx_web_docs
index 700e522..5878dfe 100755
--- a/maintainer-scripts/generate_libstdcxx_web_docs
+++ b/maintainer-scripts/generate_libstdcxx_web_docs
@@ -3,7 +3,7 @@
 # i.e. http://gcc.gnu.org/onlinedocs/gcc-x.y.z/libstdc++*
 
 SRCDIR=${1}
-DOCSDIR=${2}
+DOCSDIR=$(realpath -es ${2})
 
 if ! [ $# -eq 2 -a -x "${SRCDIR}/configure" -a -d "${DOCSDIR}" ]
 then
@@ -34,6 +34,9 @@ set -x
 ${SRCDIR}/configure --enable-languages=c,c++ --disable-gcc $disabled_libs --docdir=/docs
 eval `grep '^target=' config.log`
 make configure-target
+# If the following step fails with an error like
+# ! LaTeX Error: File `xtab.sty' not found.
+# then you need to install the relevant TeX package e.g. texlive-xtab
 make -C $target/libstdc++-v3 doc-install-html doc-install-xml doc-install-pdf DESTDIR=$DESTDIR
 cd $DESTDIR/docs
 mkdir libstdc++


RE: [PATCH 2/6] [ARC] Fix FPX/FPUDA code gen when compiling for big-endian.

2016-04-28 Thread Claudiu Zissulescu
Fixed naming in arc_rtx_costs, committed r235567.

Thanks,
Claudiu

> > gcc/
> > 2016-04-18  Claudiu Zissulescu  
> >
> > * config/arc/arc.c (arc_process_double_reg_moves): Fix for
> > big-endian compilation.
> > * config/arc/arc.md (addf3): Likewise.
> > (subdf3): Likewise.
> > (muldf3): Likewise.
> >
>   OK.
> 
> FWIW, there is also a FIXME for a little-endian-centric use of
> split_double in arc.c:arc_rtx_costs.


[PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-04-28 Thread Bin Cheng
Hi,
Tree if-conversion sometimes cannot convert a conditional array reference into
an unconditional one.  The root cause is that GCC conservatively assumes the
newly introduced array reference could be out of the array bound and thus trap.
This patch improves the situation by proving that the converted unconditional
array reference is within the array bound, using loop niter information.  To be
specific, it checks every index of the array reference in
ifcvt_memrefs_wont_trap to see if it is within bound.  This patch also factors
out base_object_writable, which checks whether the base object is writable.
Bootstrapped and tested on x86_64 and aarch64; is it OK?
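
The bound check described above can be sketched with plain integers.
This is only an illustration under simplifying assumptions: the
function name and parameters are hypothetical, the index is assumed to
evolve as init + i * step with constant init and step, and the strict
comparison mirrors the GT_EXPR test visible in the patch rather than
the tightest possible condition.

```cpp
#include <cstdint>

// Hypothetical helper, not GCC code.  Returns true if an index that
// starts at INIT and advances by STEP each iteration provably stays
// within [LOW, HIGH] for a loop whose iteration count is bounded by
// MAX_ITERS.
bool index_stays_in_bounds(std::int64_t init, std::int64_t step,
                           std::int64_t low, std::int64_t high,
                           std::uint64_t max_iters)
{
  // A zero step is rejected, as in the patch.
  if (step == 0)
    return false;

  // The first index must already be within bound.
  if (init < low || init > high)
    return false;

  // Distance to the bound the index moves towards, and |step|, both
  // computed in unsigned arithmetic to avoid signed overflow.
  std::uint64_t delta, step_abs;
  if (step < 0)
    {
      delta = static_cast<std::uint64_t>(init)
              - static_cast<std::uint64_t>(low);
      step_abs = 0 - static_cast<std::uint64_t>(step);
    }
  else
    {
      delta = static_cast<std::uint64_t>(high)
              - static_cast<std::uint64_t>(init);
      step_abs = static_cast<std::uint64_t>(step);
    }

  // Number of steps that can be taken before leaving the array.
  const std::uint64_t valid_niter = delta / step_abs;

  // Conservative: require strictly more valid steps than iterations.
  return valid_niter > max_iters;
}
```

For an array with indices 0..99 walked with step 1 from index 0, the
sketch accepts a loop bounded by 50 iterations and rejects one bounded
by 99, because the comparison is strict.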

Thanks,
bin

2016-04-28  Bin Cheng  

* tree-if-conv.c (tree-ssa-loop.h): Include header file.
(tree-ssa-loop-niter.h): Ditto.
(idx_within_array_bound, ref_within_array_bound): New functions.
(ifcvt_memrefs_wont_trap): Check if array ref is within bound.
Factor out check on writable base object to ...
(base_object_writable): ... here.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 2d14901..170e644 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -106,6 +106,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "tree-scalar-evolution.h"
+#include "tree-ssa-loop.h"
+#include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-ssa-address.h"
 #include "dbgcnt.h"
@@ -771,6 +773,132 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a)
 }
 }
 
+/* Return TRUE if can prove the index IDX of an array reference REF is
+   within array bound.  Return false otherwise.  */
+
+static bool
+idx_within_array_bound (tree ref, tree *idx, void *dta)
+{
+  widest_int niter;
+  tree ev, init, step;
+  tree low, high, type, unsigned_type, delta, valid_niter, step_abs, e;
+  bool sign;
+  struct loop *loop = (struct loop*) dta;
+
+  /* Only support within-bound access for array references.  */
+  if (TREE_CODE (ref) != ARRAY_REF)
+return false;
+
+  /* For arrays at the end of the structure, we are not guaranteed that they
+ do not really extend over their declared size.  However, for arrays of
+ size greater than one, this is unlikely to be intended.  */
+  if (array_at_struct_end_p (ref))
+return false;
+
+  ev = analyze_scalar_evolution (loop, *idx);
+  ev = instantiate_parameters (loop, ev);
+  init = initial_condition (ev);
+  step = evolution_part_in_loop_num (ev, loop->num);
+
+  if (!init || TREE_CODE (init) != INTEGER_CST
+  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
+return false;
+
+  low = array_ref_low_bound (ref);
+  high = array_ref_up_bound (ref);
+
+  /* The case of nonconstant bounds could be handled, but it would be
+ complicated.  */
+  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
+  || !high || TREE_CODE (high) != INTEGER_CST)
+return false;
+
+  sign = tree_int_cst_sign_bit (step);
+  type = TREE_TYPE (step);
+
+  /* In case the relevant bound of the array does not fit in type, or
+ it does, but bound + step (in type) still belongs into the range of the
+ array, the index may wrap and still stay within the range of the array
+ (consider e.g. if the array is indexed by the full range of
+ unsigned char).
+
+ To make things simpler, we require both bounds to fit into type, although
+ there are cases where this would not be strictly necessary.  */
+  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
+return false;
+
+  low = fold_convert (type, low);
+  high = fold_convert (type, high);
+
+  /* Check if the first idx is within bound.  */
+  if (tree_int_cst_compare (init, low) < 0
+  || tree_int_cst_compare (init, high) > 0)
+return false;
+
+  /* Don't issue signed overflow warnings.  */
+  fold_defer_overflow_warnings ();
+
+  unsigned_type = unsigned_type_for (type);
+  init = fold_convert (unsigned_type, init);
+  if (sign)
+{
+  tree extreme = fold_convert (unsigned_type, low);
+  delta = fold_build2 (MINUS_EXPR, unsigned_type, init, extreme);
+  step_abs = fold_build1 (NEGATE_EXPR, unsigned_type,
+ fold_convert (unsigned_type, step));
+}
+  else
+{
+  tree extreme = fold_convert (unsigned_type, high);
+  delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, init);
+  step_abs = fold_convert (unsigned_type, step);
+}
+  valid_niter = fold_build2 (FLOOR_DIV_EXPR, unsigned_type, delta, step_abs);
+
+  /* Check if idx is within bound through all niter of loop.  */
+  if (max_loop_iterations (loop, &niter)
+  && wi::fits_to_tree_p (niter, TREE_TYPE (valid_niter))
+  && (e = fold_binary (GT_EXPR, boolean_type_node, valid_niter,
+  wide_int_to_tree (TREE_TYPE (valid_niter),
+niter))) != NULL
+  && integer_nonzerop (e))
+{
+   
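
The range check in the hunk above can be illustrated outside of GCC. The following is only a sketch under simplifying assumptions: plain C integers stand in for trees, index_stays_in_bounds is a hypothetical name, and MAX_ITER stands for whatever upper bound on the iteration count is available. It mirrors the unsigned delta / step_abs division and the strict comparison used in the patch.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: return true if an index that starts
   at INIT and changes by STEP on each iteration is known to stay within
   [LOW, HIGH] for a loop whose iteration count is bounded by MAX_ITER.  */
static bool
index_stays_in_bounds (long init, long step, long low, long high,
                       unsigned long max_iter)
{
  unsigned long delta, step_abs, valid_niter;

  /* A zero step or an out-of-range first index is rejected up front.  */
  if (step == 0 || init < low || init > high)
    return false;

  /* Work in unsigned arithmetic so the subtraction cannot overflow.  */
  if (step < 0)
    {
      /* Decreasing index: the relevant bound is LOW.  */
      delta = (unsigned long) init - (unsigned long) low;
      step_abs = 0UL - (unsigned long) step;
    }
  else
    {
      /* Increasing index: the relevant bound is HIGH.  */
      delta = (unsigned long) high - (unsigned long) init;
      step_abs = (unsigned long) step;
    }

  /* Number of steps that can be taken before leaving the array.  */
  valid_niter = delta / step_abs;

  /* Same strict comparison as the GT_EXPR in the hunk above.  */
  return valid_niter > max_iter;
}
```

For a ten-element array (bounds 0..9) walked from 0 with step 1, nine steps are valid, so the check succeeds for a bound of 5 iterations and fails for a bound of 9.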

RE: [PATCH 3/6] [ARC] Pass mfpuda to assembler.

2016-04-28 Thread Claudiu Zissulescu
Committed r235568.

Thanks,
Claudiu

> > gcc/
> > 2016-04-18  Claudiu Zissulescu  
> >
> > * config/arc/arc.h (ASM_SPEC): Pass mfpuda to assembler.
> >
>   OK.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Bernd Schmidt

On 04/28/2016 12:35 PM, Jakub Jelinek wrote:

On Thu, Apr 28, 2016 at 12:31:40PM +0200, Bernd Schmidt wrote:

I really don't see anything in that function that looks like a huge time
sink, so I'm not that worried about it. I think it's likely to be buried way
down in the noise.


True, but the noise sums up, and the result is terrible speed of compiling
empty source files, something that e.g. Linux kernel or other packages
that have lots of small source files, care about a lot.
If initializing it early would buy us anything on code clarity etc., it
could be justified, but IMHO it doesn't, the code in libcpp already has the
delayed initialization anyway.


Well, it does buy us early (and reliable) error checks against the 
environment variable.



Bernd



Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-28 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 03:10:26PM +0200, Bernd Schmidt wrote:
> On 04/28/2016 12:35 PM, Jakub Jelinek wrote:
> >On Thu, Apr 28, 2016 at 12:31:40PM +0200, Bernd Schmidt wrote:
> >>I really don't see anything in that function that looks like a huge time
> >>sink, so I'm not that worried about it. I think it's likely to be buried way
> >>down in the noise.
> >
> >True, but the noise sums up, and the result is terrible speed of compiling
> >empty source files, something that e.g. Linux kernel or other packages
> >that have lots of small source files, care about a lot.
> >If initializing it early would buy us anything on code clarity etc., it
> >could be justified, but IMHO it doesn't, the code in libcpp already has the
> >delayed initialization anyway.
> 
> Well, it does buy us early (and reliable) error checks against the
> environment variable.

I'm not sure we really care about the env var unless it actually needs to be
used.  If we error only if it is used, people could e.g. use it in another
way, to verify their code doesn't contain any __TIME__ uses, compile with
the env var set to some invalid string and just compile everything with
that, it would diagnose any uses of __TIME__.

Jakub
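
The "validate only on first use" approach argued for above could look roughly like the sketch below. It is not the libcpp code: parse_epoch, get_epoch_lazily and the cache variables are hypothetical names, and the caller is assumed to pass in the raw environment string.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cache states for the lazily parsed timestamp.  */
enum epoch_state { EPOCH_UNSET, EPOCH_VALID, EPOCH_INVALID };

static enum epoch_state epoch_state = EPOCH_UNSET;
static long long cached_epoch;

/* Parse TEXT as a non-negative decimal integer.  Return true and store
   the value in *OUT on success, false on any malformed input.  */
static bool
parse_epoch (const char *text, long long *out)
{
  char *end;
  long long value;

  if (text == NULL || *text == '\0')
    return false;
  errno = 0;
  value = strtoll (text, &end, 10);
  if (errno != 0 || *end != '\0' || value < 0)
    return false;
  *out = value;
  return true;
}

/* Meant to be called only when a timestamp macro is actually expanded.
   The string is parsed once; later calls reuse the cached outcome, so an
   invalid value is diagnosable exactly when it is first needed.  */
static bool
get_epoch_lazily (const char *env_value, long long *out)
{
  if (epoch_state == EPOCH_UNSET)
    epoch_state = (parse_epoch (env_value, &cached_epoch)
                   ? EPOCH_VALID : EPOCH_INVALID);
  if (epoch_state == EPOCH_VALID)
    *out = cached_epoch;
  return epoch_state == EPOCH_VALID;
}
```

With this shape, a translation unit that never expands a timestamp macro never pays for the parse and never sees an error, which is the behaviour described in the message above.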


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-04-28 Thread Nathan Sidwell

On 04/27/16 16:59, Aaron Conole wrote:

Apologies for the top post. Pinging on this again. It still applies
cleanly, so no need to resubmit, I think. Is there anything else missing
or required before this can go in?


I'm not convinced this is a desirable feature.  IIRC your rationale for it was 
that you're somehow building the target program with inconsistent coverage 
data, and the messages about that are interfering with your program's output.


That's kind of the point of error messages -- to get in your face.

nathan


RE: [PATCH][SMS] SMS use loop induction variable analysis instead of depending on doloop optimization

2016-04-28 Thread Shiva Chen
Hi, 

I fixed some bugs so that testing passes on x86-64, and updated the patch
as 0001-SMS-use-loop-induction-variable-analysis-v1.patch.

Thanks,
Shiva

-Original Message-
From: Shiva Chen 
Sent: Thursday, April 28, 2016 2:07 PM
To: GCC Patches ; Shiva Chen 
Subject: [PATCH][SMS] SMS use loop induction variable analysis instead of 
depending on doloop optimization

Hi, 

Following Richard's suggestion in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01240.html
I am trying to remove the SMS dependency on the doloop pass.

SMS needs to adjust the kernel loop iteration count during the transformation.

To adjust the loop iteration count, SMS needs to find count_reg, which contains 
the loop iteration count, and then generate an adjustment instruction.

Currently, SMS finds the doloop_end pattern to get count_reg, and generates the 
adjustment instruction according to the doloop optimization result (the loop is 
transformed to count to zero with step = 1).

If SMS can't find the doloop_end pattern, or the loop is not in the form 
produced by the doloop optimization, it skips the loop.

Doloop optimization can benefit some targets even if the target doesn't 
support a special loop instruction.

E.g. for arm, doloop optimization can transform
 the instructions into subs and branch, which saves the
 comparison instruction.

However, if the loop iteration count computation generated by the doloop 
optimization is too complicated, performance drops.
(The PARAM_MAX_ITERATIONS_COMPUTATION_COST default value is 10, which may be too 
high for targets that don't support a special loop instruction.)

Such loops are not suitable for doloop optimization, so SMS can't be activated 
for them.

To free SMS from its dependency on doloop optimization, I use loop induction 
variable analysis to find count_reg and generate the kernel loop adjustment 
instruction for loop forms without doloop optimization (increment/decrement 
loops with step != 1).

Without doloop optimization, the induction variable could be a 
POST_INC/POST_DEC/PRE_INC/PRE_DEC in a memory reference, which the current 
implementation won't identify as a loop iv. So I modified the relevant code in 
loop-iv.c to identify this case.

With the patch, a backend target can activate SMS without defining a doloop_end 
pattern.

Could anyone help me to review the patch?
Any suggestion would be very helpful.

Thanks,
Shiva
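
To make the iteration-count adjustment described above concrete, here is a small arithmetic sketch. It is not taken from the attached patch: trip_count and adjusted_init are hypothetical names, the loop is assumed to have the form "for (i = init; i < bound; i += step)" with a positive step, and the kernel is assumed to run stage_count - 1 fewer iterations than the original loop, which is the usual shape of a modulo-scheduled loop with a prolog and epilog.

```c
/* Hypothetical helper: trip count of
     for (i = init; i < bound; i += step)
   for a positive STEP; returns 0 if the loop body never runs.  */
static unsigned long
trip_count (long init, long bound, long step)
{
  if (step <= 0 || init >= bound)
    return 0;
  /* Ceiling division of the distance by the step.  */
  return ((unsigned long) bound - (unsigned long) init
          + (unsigned long) step - 1) / (unsigned long) step;
}

/* Hypothetical helper: starting value of the induction variable for a
   kernel that must run STAGE_COUNT - 1 fewer iterations.  With a
   doloop-style counter the adjustment is a plain subtraction from the
   count; with a general induction variable it is scaled by STEP.
   STAGE_COUNT is assumed to be at least 1.  */
static long
adjusted_init (long init, long step, unsigned int stage_count)
{
  return init + (long) (stage_count - 1) * step;
}
```

For example, a loop from 0 to 10 with step 3 runs 4 times; with a stage count of 3 the adjusted start is 6, and the loop from 6 to 10 runs 2 times, i.e. 4 - (3 - 1).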



0001-SMS-use-loop-induction-variable-analysis-v1.patch
Description: 0001-SMS-use-loop-induction-variable-analysis-v1.patch


Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-28 Thread Ilya Enkovich
On 27 Apr 16:05, Richard Biener wrote:
> >>
> >> I'd like to see testcases covering the corner-cases - have them have
> >> upper bound estimates by adjusting known array sizes and also cover
> >> the case of peeling for gaps.
> >
> > OK, I'll make more tests.
> > Thanks,
> > Ilya
> >
> >>
> >> Richard.
> >>

Could you please look at new tests?  I added one simple case with
known array size and similar tests with a peeling for gaps w/ and
w/o vector iteration peeled.

Checked new tests with RUNTESTFLAGS="vect.exp=vect-nb-iter-ub-* 
--target_board=unix{-m32,}
on x86_64-pc-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2016-04-28  Ilya Enkovich  

* tree-vect-loop.c (vect_transform_loop): Fix
nb_iterations_upper_bound computation for vectorized loop.

gcc/testsuite/

2016-04-28  Ilya Enkovich  

* gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
optimization of vector loop.
* gcc.target/i386/vect-unpack-3.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-1.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-2.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-3.c: New test.


diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
new file mode 100644
index 000..b7504a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[127];
+char cc[127];
+
+void
+foo (int s)
+{
+  int i;
+   for (i = 0; i < s; i++)
+ ii[i] = (int) cc[i];
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
"cunroll" { target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
new file mode 100644
index 000..5332636
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[128];
+char cc[256];
+
+void
+foo (int s)
+{
+  int i;
+   for (i = 0; i < s; i++)
+ ii[i] = (int) cc[i*2];
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
"cunroll" { target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
new file mode 100644
index 000..5610f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target { 
i?86-*-* x86_64-*-* } } } */
+
+int ii[130];
+char cc[258];
+
+void
+foo (int s)
+{
+  int i;
+   for (i = 0; i < s; i++)
+ ii[i] = (int) cc[i*2];
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* 
x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "loop turned into non-loop; it never loops" 
"cunroll" { target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c 
b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
index 4825248..51c518e 100644
--- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
@@ -6,19 +6,22 @@
 
 #define N 120
 signed int yy[1];
+signed char zz[1];
 
 void
-__attribute__ ((noinline)) foo (signed char s)
+__attribute__ ((noinline,noclone)) foo (int s)
 {
-   signed char i;
+   int i;
for (i = 0; i < s; i++)
- yy[i] = (signed int) i;
+ yy[i] = zz[i];
 }
 
 void
 avx512bw_test ()
 {
   signed char i;
+  for (i = 0; i < N; i++)
+zz[i] = i;
   foo (N);
   for (i = 0; i < N; i++)
 if ( (signed int)i != yy [i] )
diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c 
b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
new file mode 100644
index 000..eb8a93e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math 
-mavx512bw -save-temps" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+#define N 120
+signed int yy[1];
+
+void
+__attribute__ ((noinline)) foo (signed char s)
+{
+   signed char i;
+   for (i = 0; i < s; i++)
+ yy[i] = (signed int) i;
+}
+
+void
+avx512bw_test ()
+{
+  signed char i;
+  foo (N);
+  for (i = 0; i < N; i++)
+if ( (signed int)i != yy [i] )
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
ind

[PATCH][internal-fn.c][committed] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-04-28 Thread Kyrill Tkachov

Hi all,

This is another instance of conditional compilation on WORD_REGISTER_OPERATIONS 
that's trivial to remove.

Bootstrapped and tested on arm, aarch64, x86_64.
Committing to trunk as obvious.

Thanks,
Kyrill

2016-04-28  Kyrylo Tkachov  

* internal-fn.c (expand_arith_overflow): Convert preprocessor check
for WORD_REGISTER_OPERATIONS to runtime check.
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 3ceaffe67eaa694afe35de8f7a13a182c46f05ff..2cbe198924c7b3aed34d52c0cc35612e09f5646c 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1807,11 +1807,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
   /* For sub-word operations, retry with a wider type first.  */
   if (orig_precres == precres && precop <= BITS_PER_WORD)
 	{
-#if WORD_REGISTER_OPERATIONS
-	  int p = BITS_PER_WORD;
-#else
-	  int p = precop;
-#endif
+	  int p = WORD_REGISTER_OPERATIONS ? BITS_PER_WORD : precop;
 	  enum machine_mode m = smallest_mode_for_size (p, MODE_INT);
 	  tree optype = build_nonstandard_integer_type (GET_MODE_PRECISION (m),
 			uns0_p && uns1_p
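
The general pattern behind this change, sketched outside of GCC: when a configuration macro is always defined to 0 or 1, an #if/#else pair can become an ordinary conditional expression, so both arms are compiled on every target and the constant condition is folded away by the compiler. The SKETCH_ macros and retry_precision below are hypothetical stand-ins, not the real target macros.

```c
/* Hypothetical stand-ins for target macros.  The sketch assumes the
   macro is always defined, to either 0 or 1.  */
#ifndef SKETCH_WORD_REGISTER_OPERATIONS
#define SKETCH_WORD_REGISTER_OPERATIONS 1
#endif
#define SKETCH_BITS_PER_WORD 32

/* Precision to retry with for a sub-word operation of precision PRECOP.
   Unlike an #if/#else pair, both operands of ?: are parsed and
   type-checked regardless of the macro's value.  */
static int
retry_precision (int precop)
{
  return SKETCH_WORD_REGISTER_OPERATIONS ? SKETCH_BITS_PER_WORD : precop;
}
```

The benefit is mainly in maintenance: code on the untaken arm can no longer bit-rot unnoticed on targets where the macro is 0.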


[PATCH] Mark predicates generated by genmatch as static

2016-04-28 Thread Patrick Palka
The predicate functions emitted by genmatch are expected to only be used
locally within {gimple,generic}-match.c, so this patch marks them as
static.  Does this look OK to commit after bootstrap and regtest?

gcc/ChangeLog:

* genmatch.c (write_predicate): Mark the emitted function as
static.
---
 gcc/genmatch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index ce964fa..2f5147f 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -3552,7 +3552,7 @@ decision_tree::gen (FILE *f, bool gimple)
 void
 write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple)
 {
-  fprintf (f, "\nbool\n"
+  fprintf (f, "\nstatic bool\n"
   "%s%s (tree t%s%s)\n"
   "{\n", gimple ? "gimple_" : "tree_", p->id,
   p->nargs > 0 ? ", tree *res_ops" : "",
-- 
2.8.1.361.g2fbef4c



[PATCH] Do not build a pointer-to-element type for arrays in layout_type

2016-04-28 Thread Richard Biener

I stumbled over this odd call, present since CVS repo import (r348).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-04-28  Richard Biener  

* stor-layout.c (layout_type): Do not build a pointer-to-element
type for arrays.

Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   (revision 235557)
+++ gcc/stor-layout.c   (working copy)
@@ -2243,8 +2243,6 @@ layout_type (tree type)
tree index = TYPE_DOMAIN (type);
tree element = TREE_TYPE (type);
 
-   build_pointer_type (element);
-
/* We need to know both bounds in order to compute the size.  */
if (index && TYPE_MAX_VALUE (index) && TYPE_MIN_VALUE (index)
&& TYPE_SIZE (element))


Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-28 Thread Richard Biener
On Thu, Apr 28, 2016 at 3:26 PM, Ilya Enkovich  wrote:
> On 27 Apr 16:05, Richard Biener wrote:
>> >>
>> >> I'd like to see testcases covering the corner-cases - have them have
>> >> upper bound estimates by adjusting known array sizes and also cover
>> >> the case of peeling for gaps.
>> >
>> > OK, I'll make more tests.
>> > Thanks,
>> > Ilya
>> >
>> >>
>> >> Richard.
>> >>
>
> Could you please look at new tests?  I added one simple case with
> known array size and similar tests with a peeling for gaps w/ and
> w/o vector iteration peeled.
>
> Checked new tests with RUNTESTFLAGS="vect.exp=vect-nb-iter-ub-* 
> --target_board=unix{-m32,}
> on x86_64-pc-linux-gnu.  OK for trunk?

Can you make the new testcases runtime ones, thus check that the
vectorized outcome
is ok (so we don't forget any trailing iterations)?

Ok with that change.

Richard.
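
A run-time variant of the strided test could be shaped roughly as follows. This is only a sketch of the idea, not the testcase that was committed: the array sizes follow vect-nb-iter-ub-2.c, count_mismatches is a hypothetical checker, and a real testcase would call it from main, abort on a nonzero result, and use dg-do run.

```c
#define N 128

int ii[N];
char cc[2 * N];

/* Loop under test; noinline keeps it from being folded into the caller.  */
__attribute__ ((noinline)) void
foo (int s)
{
  int i;
  for (i = 0; i < s; i++)
    ii[i] = (int) cc[i * 2];
}

/* Hypothetical checker: fill the input, run foo for S iterations and
   return the number of wrong elements.  Elements below S must hold the
   converted input; elements at or above S must be left untouched, so
   both a forgotten trailing iteration and an overrun are detected.  */
int
count_mismatches (int s)
{
  int i, bad = 0;

  for (i = 0; i < 2 * N; i++)
    cc[i] = (char) (i % 100);
  for (i = 0; i < N; i++)
    ii[i] = -1;

  foo (s);

  for (i = 0; i < N; i++)
    {
      int expected = i < s ? (int) cc[i * 2] : -1;
      if (ii[i] != expected)
        bad++;
    }
  return bad;
}
```

Calling the checker with both the full size and a size that is not a multiple of the vector length exercises the epilogue as well as the vector body.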

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-04-28  Ilya Enkovich  
>
> * tree-vect-loop.c (vect_transform_loop): Fix
> nb_iterations_upper_bound computation for vectorized loop.
>
> gcc/testsuite/
>
> 2016-04-28  Ilya Enkovich  
>
> * gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
> optimization of vector loop.
> * gcc.target/i386/vect-unpack-3.c: New test.
> * gcc.dg/vect/vect-nb-iter-ub-1.c: New test.
> * gcc.dg/vect/vect-nb-iter-ub-2.c: New test.
> * gcc.dg/vect/vect-nb-iter-ub-3.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
> new file mode 100644
> index 000..b7504a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target 
> { i?86-*-* x86_64-*-* } } } */
> +
> +int ii[127];
> +char cc[127];
> +
> +void
> +foo (int s)
> +{
> +  int i;
> +   for (i = 0; i < s; i++)
> + ii[i] = (int) cc[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
> "cunroll" { target { i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> new file mode 100644
> index 000..5332636
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target 
> { i?86-*-* x86_64-*-* } } } */
> +
> +int ii[128];
> +char cc[256];
> +
> +void
> +foo (int s)
> +{
> +  int i;
> +   for (i = 0; i < s; i++)
> + ii[i] = (int) cc[i*2];
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "loop turned into non-loop; it never loops" 
> "cunroll" { target { i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c 
> b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
> new file mode 100644
> index 000..5610f6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mavx512bw -fdump-tree-cunroll-details" { target 
> { i?86-*-* x86_64-*-* } } } */
> +
> +int ii[130];
> +char cc[258];
> +
> +void
> +foo (int s)
> +{
> +  int i;
> +   for (i = 0; i < s; i++)
> + ii[i] = (int) cc[i*2];
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump-not "loop turned into non-loop; it never 
> loops" "cunroll" { target { i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c 
> b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> index 4825248..51c518e 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> @@ -6,19 +6,22 @@
>
>  #define N 120
>  signed int yy[1];
> +signed char zz[1];
>
>  void
> -__attribute__ ((noinline)) foo (signed char s)
> +__attribute__ ((noinline,noclone)) foo (int s)
>  {
> -   signed char i;
> +   int i;
> for (i = 0; i < s; i++)
> - yy[i] = (signed int) i;
> + yy[i] = zz[i];
>  }
>
>  void
>  avx512bw_test ()
>  {
>signed char i;
> +  for (i = 0; i < N; i++)
> +zz[i] = i;
>foo (N);
>for (i = 0; i < N; i++)
>  if ( (signed int)i != yy [i] )
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c 
> b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> new file mode 100644
> index 000..eb8a93e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math 
> -mavx512bw -save-temps" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check

Re: [PATCH] Mark predicates generated by genmatch as static

2016-04-28 Thread Richard Biener
On Thu, Apr 28, 2016 at 3:50 PM, Patrick Palka  wrote:
> The predicate functions emitted by genmatch are expected to only be used
> locally within {gimple,generic}-match.c, so this patch marks them as
> static.  Does this look OK to commit after bootstrap and regtest?

Actually the idea was to for example generate predicates in match.pd
format for things like vectorizer pattern recog (I've done this for a few,
need to search for (partial) patches on my disk), so they are supposed
to be externally visible.

Of course we might want to make that explicit in some way with
a (extern_match ...) [or by prefixing local ones with a '*' ...].

Richard.

> gcc/ChangeLog:
>
> * genmatch.c (write_predicate): Mark the emitted function as
> static.
> ---
>  gcc/genmatch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
> index ce964fa..2f5147f 100644
> --- a/gcc/genmatch.c
> +++ b/gcc/genmatch.c
> @@ -3552,7 +3552,7 @@ decision_tree::gen (FILE *f, bool gimple)
>  void
>  write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple)
>  {
> -  fprintf (f, "\nbool\n"
> +  fprintf (f, "\nstatic bool\n"
>"%s%s (tree t%s%s)\n"
>"{\n", gimple ? "gimple_" : "tree_", p->id,
>p->nargs > 0 ? ", tree *res_ops" : "",
> --
> 2.8.1.361.g2fbef4c
>


[PATCH 3/4] PR c++/62314: C++: add fixit hint to misspelled member names

2016-04-28 Thread David Malcolm
When we emit a hint about a misspelled member name, it will slightly
aid readability if we use a fixit-hint to show the proposed
name in context within the source code (and in the future this
might support some kind of auto-apply in an IDE).

This patch adds such a hint to the C++ frontend, taking us from:

test.cc:10:15: error: 'struct foo' has no member named 'colour'; did you mean 
'color'?
   return ptr->colour;
   ^~

to:

test.cc:10:15: error: 'struct foo' has no member named 'colour'; did you mean 
'color'?
   return ptr->colour;
   ^~
   color

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/62314
* typeck.c (finish_class_member_access_expr): When
giving a hint about a possibly-misspelled member name,
add a fix-it replacement hint.

gcc/testsuite/ChangeLog:
PR c++/62314
* g++.dg/spellcheck-fields-2.C: New test case.
---
 gcc/cp/typeck.c| 18 +++---
 gcc/testsuite/g++.dg/spellcheck-fields-2.C | 19 +++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-fields-2.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 7e12009..95c777d 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2817,9 +2817,21 @@ finish_class_member_access_expr (cp_expr object, tree 
name, bool template_p,
  tree guessed_id = lookup_member_fuzzy (access_path, name,
 /*want_type=*/false);
  if (guessed_id)
-   error ("%q#T has no member named %qE; did you mean %qE?",
-  TREE_CODE (access_path) == TREE_BINFO
-  ? TREE_TYPE (access_path) : object_type, name, 
guessed_id);
+   {
+ location_t bogus_component_loc = input_location;
+ rich_location rich_loc (line_table, bogus_component_loc);
+ source_range bogus_component_range =
+   get_range_from_loc (line_table, bogus_component_loc);
+ rich_loc.add_fixit_replace
+   (bogus_component_range,
+IDENTIFIER_POINTER (guessed_id));
+ error_at_rich_loc
+   (&rich_loc,
+"%q#T has no member named %qE; did you mean %qE?",
+TREE_CODE (access_path) == TREE_BINFO
+? TREE_TYPE (access_path) : object_type, name,
+guessed_id);
+   }
  else
error ("%q#T has no member named %qE",
   TREE_CODE (access_path) == TREE_BINFO
diff --git a/gcc/testsuite/g++.dg/spellcheck-fields-2.C 
b/gcc/testsuite/g++.dg/spellcheck-fields-2.C
new file mode 100644
index 000..eb10b44
--- /dev/null
+++ b/gcc/testsuite/g++.dg/spellcheck-fields-2.C
@@ -0,0 +1,19 @@
+// { dg-options "-fdiagnostics-show-caret" }
+
+union u
+{
+  int color;
+  int shape;
+};
+
+int test (union u *ptr)
+{
+  return ptr->colour; // { dg-error "did you mean .color.?" }
+}
+
+// Verify that we get an underline and a fixit hint.
+/* { dg-begin-multiline-output "" }
+   return ptr->colour;
+   ^~
+   color
+   { dg-end-multiline-output "" } */
-- 
1.8.5.3
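
The three-line output shown in the message (source line, underline, replacement text) can be mimicked with a few lines of C. This is an illustration only, not GCC's diagnostic printer: render_fixit_replace is a hypothetical helper, columns are 0-based, and the underline is drawn as a caret followed by tildes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format into OUT (of size OUT_SIZE) the source
   LINE, an underline of LEN columns starting at 0-based column COL, and
   REPLACEMENT aligned under the start of the underline.  Returns the
   snprintf result, or -1 if the arguments are out of range.  */
static int
render_fixit_replace (char *out, size_t out_size, const char *line,
                      int col, int len, const char *replacement)
{
  char underline[256];
  int i;

  if (col < 0 || len < 1 || len >= (int) sizeof underline)
    return -1;

  /* Caret on the first column, tildes on the rest of the range.  */
  underline[0] = '^';
  for (i = 1; i < len; i++)
    underline[i] = '~';
  underline[len] = '\0';

  /* "%*s" with an empty string emits COL spaces of indentation.  */
  return snprintf (out, out_size, "%s\n%*s%s\n%*s%s\n",
                   line, col, "", underline, col, "", replacement);
}
```

For the line "  return ptr->colour;" the misspelled member starts at column 14 and spans 6 columns, so the replacement "color" is printed directly under it.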



Re: [PATCH] Turn some compile-time tests into run-time tests

2016-04-28 Thread Patrick Palka
On Wed, Apr 27, 2016 at 5:36 PM, Jeff Law  wrote:
> On 03/10/2016 04:38 PM, Patrick Palka wrote:
>>
>> I ran the command
>>
>>   git grep -l "dg-do compile" | xargs grep -l __builtin_abort | xargs grep
>> -lw main
>>
>> to find tests marked as compile-time tests that likely ought to instead
>> be marked as run-time tests, by the rationale that they use
>> __builtin_abort and they also define main().  (I also then confirmed that
>> they
>> compile, link and run cleanly on my machine.)
>>
>> After this patch, the remaining test files reported by the above command
>> are:
>>
>>   These do not define all the functions they use:
>> gcc/testsuite/g++.dg/ipa/devirt-41.C
>> gcc/testsuite/g++.dg/ipa/devirt-44.C
>> gcc/testsuite/g++.dg/ipa/devirt-45.C
>> gcc/testsuite/gcc.target/i386/pr55672.c
>>
>>   These are non-x86 tests so I can't confirm that they run cleanly:
>> gcc/testsuite/gcc.target/arm/pr58041.c
>> gcc/testsuite/gcc.target/powerpc/pr35907.c
>> gcc/testsuite/gcc.target/s390/dwarfregtable-1.c
>> gcc/testsuite/gcc.target/s390/dwarfregtable-2.c
>> gcc/testsuite/gcc.target/s390/dwarfregtable-3.c
>>
>>   These use dg-error:
>> libstdc++-v3/testsuite/20_util/forward/c_neg.cc
>> libstdc++-v3/testsuite/20_util/forward/f_neg.cc
>>
>> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
>> commit?  Does anyone have another heuristic one can use to help find
>> these kinds of typos?
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.dg/cpp0x/constexpr-aggr2.C: Make it a run-time test.
>> * g++.dg/cpp0x/nullptr32.C: Likewise.
>> * g++.dg/cpp1y/digit-sep-cxx11-neg.C: Likewise.
>> * g++.dg/cpp1y/digit-sep.C: Likewise.
>> * g++.dg/ext/flexary13.C: Likewise.
>> * gcc.dg/alias-14.c: Likewise.
>> * gcc.dg/ipa/PR65282.c: Likewise.
>> * gcc.dg/pr69644.c: Likewise.
>> * gcc.dg/tree-ssa/pr38533.c: Likewise.
>> * gcc.dg/tree-ssa/pr61385.c: Likewise.
>
> My worry with the 38533 test is that while the ASM defines "f" from the
> standpoint of dataflow, it does not actually emit any code to ensure "f" is
> actually defined.  This could lead to spurious aborts due to use of an
> uninitialized value at runtime.  Similarly for alias-14.c
>
> I'd be worried that we don't necessarily have sync_bool_compare_and_swap on
> all targets for 69644.

Ah yeah, good points..

>
> flexary13.C probably won't link on a cross target unless the cross libraries
> are available.  But that's probably OK.
>
> The rest seem OK to me.  Note that I'm not convinced all these tests were
> designed to be execution tests, even though they use __builtin_abort and
> friends.  Though it's a good marker of something that can/should be looked
> at.

True..  What made me look into this in the first place is that I
caught myself making a similar mistake, i.e. marking an execution test
case as dg-do compile instead of dg-do run out of habit.  But I
suppose it's worth looking at the context of each of these tests to
see if they were not actually intended to be execution tests.  I'll
double check this and report back; in the meantime I also found some
more tests that ought to be looked at.

>
>
> jeff
>


[PATCH 1/4] PR c++/62314: add fixit hint for missing "template <> " in explicit specialization

2016-04-28 Thread David Malcolm
This is a resend of a patch kit I sent in stage 3; the original post
was here:
  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01933.html

I've rebased the patches against yesterday's trunk and retested them.

They add various fix-it hints to existing diagnostics (PR 62314 is a
catch-all for adding fix-its).

The first patch in the kit adds a fix-it insertion hint for missing
"template <> " in explicit specializations, and improves the
reported range of the type name by capturing the full range, rather
than just one token within it.

I note that clang (http://clang.llvm.org/diagnostics.html) suggests
inserting
  template<>
whereas our diagnostic talks about
  template <>
hence I have the fixit suggest inserting that.  Should we change our
wording instead, and lose the space?

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/62314
* parser.c (cp_parser_class_head): Capture the start location;
use it to emit a fix-it insertion hint when complaining
about missing "template <> " in explicit specializations.

gcc/testsuite/ChangeLog:
PR c++/62314
* g++.dg/pr62314.C: New test case.
---
 gcc/cp/parser.c| 18 --
 gcc/testsuite/g++.dg/pr62314.C | 17 +
 2 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr62314.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 98a0cd4..ff16f73 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21655,6 +21655,8 @@ cp_parser_class_head (cp_parser* parser,
   if (class_key == none_type)
 return error_mark_node;
 
+  location_t class_head_start_location = input_location;
+
   /* Parse the attributes.  */
   attributes = cp_parser_attributes_opt (parser);
 
@@ -21871,8 +21873,20 @@ cp_parser_class_head (cp_parser* parser,
   && parser->num_template_parameter_lists == 0
   && template_id_p)
 {
-  error_at (type_start_token->location,
-   "an explicit specialization must be preceded by %<template <>%>");
+  /* Build a location of this form:
+   struct typename 
+   ^~
+ with caret==start at the start token, and
+ finishing at the end of the type.  */
+  location_t reported_loc
+= make_location (class_head_start_location,
+ class_head_start_location,
+ get_finish (type_start_token->location));
+  rich_location richloc (line_table, reported_loc);
+  richloc.add_fixit_insert (class_head_start_location, "template <> ");
+  error_at_rich_loc
+(&richloc,
+ "an explicit specialization must be preceded by %<template <>%>");
   invalid_explicit_specialization_p = true;
   /* Take the same action that would have been taken by
 cp_parser_explicit_specialization.  */
diff --git a/gcc/testsuite/g++.dg/pr62314.C b/gcc/testsuite/g++.dg/pr62314.C
new file mode 100644
index 000..ebe75ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr62314.C
@@ -0,0 +1,17 @@
+// { dg-options "-fdiagnostics-show-caret" }
+
+template 
+struct iterator_traits {};
+
+struct file_iterator;
+
+struct iterator_traits { // { dg-error "explicit specialization 
must be preceded by .template" }
+};
+
+/* Verify that we emit a fixit hint for this case.  */
+
+/* { dg-begin-multiline-output "" }
+ struct iterator_traits
+ ^
+ template <> 
+   { dg-end-multiline-output "" } */
-- 
1.8.5.3



[PATCH 2/4] PR c++/62314: add fixit hint for "expected ';' after class definition"

2016-04-28 Thread David Malcolm
Looking over the discussion of missing semicolons in
  "Quality of Implementation and Attention to Detail"
within
  http://clang.llvm.org/diagnostics.html
and comparing with
  https://gcc.gnu.org/wiki/ClangDiagnosticsComparison
I noticed that of the cases we do handle [1], there's room for
improvement; we currently emit:

test.c:2:11: error: expected ';' after struct definition
 struct a {}
   ^

whereas clang reportedly emits:

test.c:2:12: error: expected ';' after struct
 struct a {}
^
;

(note the offset of the location, and the fix-it hint)

The following patch gives us the latter, more readable output.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?

[1] I've also filed PR c++/68970 about a case given on the clang
page that we still don't handle.

gcc/cp/ChangeLog:
PR c++/62314
* parser.c (cp_parser_class_specifier_1): When reporting
missing semicolons, use a fixit-hint to suggest insertion
of a semicolon immediately after the closing brace,
offsetting the reported column accordingly.

gcc/testsuite/ChangeLog:
PR c++/62314
* g++.dg/parse/error5.C: Update column
number of missing semicolon error.
* g++.dg/pr62314-2.C: New test case.
---
 gcc/cp/parser.c | 19 ---
 gcc/testsuite/g++.dg/parse/error5.C |  2 +-
 gcc/testsuite/g++.dg/pr62314-2.C| 22 ++
 3 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr62314-2.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ff16f73..e3133d0 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21440,17 +21440,30 @@ cp_parser_class_specifier_1 (cp_parser* parser)
closing brace.  */
 if (closing_brace && TYPE_P (type) && want_semicolon)
   {
+   /* Locate the closing brace.  */
cp_token_position prev
  = cp_lexer_previous_token_position (parser->lexer);
cp_token *prev_token = cp_lexer_token_at (parser->lexer, prev);
location_t loc = prev_token->location;
 
+   /* We want to suggest insertion of a ';' immediately *after* the
+  closing brace, so, if we can, offset the location by 1 column.  */
+   location_t next_loc = loc;
+   if (!linemap_location_from_macro_expansion_p (line_table, loc))
+ next_loc = linemap_position_for_loc_and_offset (line_table, loc, 1);
+
+   rich_location richloc (line_table, next_loc);
+   richloc.add_fixit_insert (next_loc, ";");
+
if (CLASSTYPE_DECLARED_CLASS (type))
- error_at (loc, "expected %<;%> after class definition");
+ error_at_rich_loc (&richloc,
+"expected %<;%> after class definition");
else if (TREE_CODE (type) == RECORD_TYPE)
- error_at (loc, "expected %<;%> after struct definition");
+ error_at_rich_loc (&richloc,
+"expected %<;%> after struct definition");
else if (TREE_CODE (type) == UNION_TYPE)
- error_at (loc, "expected %<;%> after union definition");
+ error_at_rich_loc (&richloc,
+"expected %<;%> after union definition");
else
  gcc_unreachable ();
 
diff --git a/gcc/testsuite/g++.dg/parse/error5.C b/gcc/testsuite/g++.dg/parse/error5.C
index eb1f9c7..d14a476 100644
--- a/gcc/testsuite/g++.dg/parse/error5.C
+++ b/gcc/testsuite/g++.dg/parse/error5.C
@@ -13,7 +13,7 @@ class Foo { int foo() return 0; } };
 // need make cp_parser_error() report more accurate column numbers.
 // { dg-error "30:expected '\{' at end of input" "brace" { target *-*-* } 4 }
 
-// { dg-error "33:expected ';' after class definition" "semicolon" {target *-*-* } 4 }
+// { dg-error "34:expected ';' after class definition" "semicolon" {target *-*-* } 4 }
 
 // { dg-error "35:expected declaration before '\}' token" "declaration" {target *-*-* } 4 }
 
diff --git a/gcc/testsuite/g++.dg/pr62314-2.C b/gcc/testsuite/g++.dg/pr62314-2.C
new file mode 100644
index 000..deb0cb7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr62314-2.C
@@ -0,0 +1,22 @@
+// { dg-options "-fdiagnostics-show-caret" }
+
+template
+class a {} // { dg-error "11: expected .;. after class definition" }
+class temp {};
+a b;
+struct b {
+} // { dg-error "2: expected .;. after struct definition" }
+
+/* Verify that we emit fixit hints.  */
+
+/* { dg-begin-multiline-output "" }
+ class a {}
+   ^
+   ;
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+ }
+  ^
+  ;
+   { dg-end-multiline-output "" } */
-- 
1.8.5.3



[PATCH 4/4] C: add fixit hint to misspelled field names

2016-04-28 Thread David Malcolm
Similar to the C++ case, but more involved as the location of the
pertinent token isn't readily available.  The patch adds it as a param
to build_component_ref.  All callers are updated to provide the info,
apart from objc_build_component_ref; fixing the latter would lead to
a cascade of other changes, so it's simplest to provide UNKNOWN_LOCATION
there and have build_component_ref fall back gracefully for this case
to the old behavior of showing a hint in the message, without a fixit
replacement in the source view.

This does slightly change the location of the error; before we had:

test.c:11:13: error: 'union u' has no member named 'colour'; did you mean 'color'?
   return ptr->colour;
 ^~

with the patch we have:

test.c:11:15: error: 'union u' has no member named 'colour'; did you mean 'color'?
   return ptr->colour;
   ^~
   color

I think the location change is an improvement.
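
The fallback described above can be sketched roughly as follows.  This is
a toy model only: location, unknown_location, diagnostic and
report_missing_member are hypothetical stand-ins, not the real
location_t, UNKNOWN_LOCATION or build_component_ref.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for GCC's location_t and UNKNOWN_LOCATION.
typedef unsigned location;
const location unknown_location = 0;

struct diagnostic {
  std::string message;            // always carries the "did you mean" hint
  bool has_fixit;                 // true only when a location was supplied
  std::string fixit_replacement;  // replacement text for the source view
};

// Always mention the suggestion in the message; attach a fix-it
// replacement only when the caller knows where the member name is.
diagnostic report_missing_member(const std::string &type_name,
                                 const std::string &bad_name,
                                 const std::string &suggestion,
                                 location component_loc)
{
  diagnostic d;
  d.message = "'" + type_name + "' has no member named '" + bad_name
              + "'; did you mean '" + suggestion + "'?";
  d.has_fixit = (component_loc != unknown_location);
  if (d.has_fixit)
    d.fixit_replacement = suggestion;
  return d;
}
```

A caller that cannot supply a location (as in the ObjC case) would pass
unknown_location and get only the hint in the message.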

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
* c-parser.c (c_parser_postfix_expression): In __builtin_offsetof
and structure element reference, capture the location of the
element name token and pass it to build_component_ref.
(c_parser_postfix_expression_after_primary): Likewise for
structure element dereference.
(c_parser_omp_variable_list): Likewise for
OMP_CLAUSE_{_CACHE, MAP, FROM, TO}.
* c-tree.h (build_component_ref): Add location_t param.
* c-typeck.c (build_component_ref): Add location_t param
COMPONENT_LOC.  Use it, if available, when issuing hints about
misspelled member names to provide a fixit replacement hint.

gcc/objc/ChangeLog:
* objc-act.c (objc_build_component_ref): Update call
to build_component_ref for added param, passing UNKNOWN_LOCATION.

gcc/testsuite/ChangeLog:
* gcc.dg/spellcheck-fields-2.c: New test case.
---
 gcc/c/c-parser.c   | 34 +-
 gcc/c/c-tree.h |  2 +-
 gcc/c/c-typeck.c   | 26 +++
 gcc/objc/objc-act.c|  3 ++-
 gcc/testsuite/gcc.dg/spellcheck-fields-2.c | 19 +
 5 files changed, 68 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields-2.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 36c44ab..19e6772 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7707,8 +7707,9 @@ c_parser_postfix_expression (c_parser *parser)
   accept sub structure and sub array references.  */
if (c_parser_next_token_is (parser, CPP_NAME))
  {
+   c_token *comp_tok = c_parser_peek_token (parser);
offsetof_ref = build_component_ref
- (loc, offsetof_ref, c_parser_peek_token (parser)->value);
+ (loc, offsetof_ref, comp_tok->value, comp_tok->location);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || c_parser_next_token_is (parser,
@@ -7734,9 +7735,10 @@ c_parser_postfix_expression (c_parser *parser)
c_parser_error (parser, "expected identifier");
break;
  }
+   c_token *comp_tok = c_parser_peek_token (parser);
offsetof_ref = build_component_ref
- (loc, offsetof_ref,
-  c_parser_peek_token (parser)->value);
+ (loc, offsetof_ref, comp_tok->value,
+  comp_tok->location);
c_parser_consume_token (parser);
  }
else
@@ -8213,7 +8215,7 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 {
   struct c_expr orig_expr;
   tree ident, idx;
-  location_t sizeof_arg_loc[3];
+  location_t sizeof_arg_loc[3], comp_loc;
   tree sizeof_arg[3];
   unsigned int literal_zero_mask;
   unsigned int i;
@@ -8327,7 +8329,11 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
  c_parser_consume_token (parser);
  expr = default_function_array_conversion (expr_loc, expr);
  if (c_parser_next_token_is (parser, CPP_NAME))
-   ident = c_parser_peek_token (parser)->value;
+   {
+ c_token *comp_tok = c_parser_peek_token (parser);
+ ident = comp_tok->value;
+ comp_loc = comp_tok->location;
+   }
  else
{
  c_parser_error (parser, "expected identifier");
@@ -8339,7 +8345,8 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
  start = expr.get_start ();
  finish = c_parser_peek_token (parser)->get_finish ();
  c_parser_consume_token (parser);
- expr.value = build_component_ref (op_loc, expr.value, ident

Re: [ubsan PATCH] Fix compile-time hog with &TARGET_EXPRs (PR sanitizer/70342)

2016-04-28 Thread Marek Polacek
On Thu, Apr 28, 2016 at 11:07:30AM +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2016 at 07:03:25PM +0200, Marek Polacek wrote:
> > This test took forever to compile with -fsanitize=null, because the
> > instrumentation was creating incredible amount of duplicated expressions,
> > in a quadratic fashion.  I think the problem is that we instrument
> > &TARGET_EXPR <> expressions, which doesn't seem to be needed -- we only
> > need to instrument the initializers in TARGET_EXPRs.  With this patch, we
> > avoid creating tons of useless expressions and the compile time is reduced
> > from ~ infinity to <1s.
> > 
> > Jakub, do you see any problem with this?
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2016-04-27  Marek Polacek  
> > 
> > PR sanitizer/70342
> > * c-ubsan.c (ubsan_maybe_instrument_reference_or_call): Don't
> > null-instrument &TARGET_EXPR <...>.
> > 
> > * g++.dg/ubsan/null-7.C: New test.
> 
> I wonder if this wouldn't be better handled in tree_single_nonzero_warnv_p,
> perhaps like:
> 
>  case ADDR_EXPR:
>{
>   tree base = TREE_OPERAND (t, 0);
>  
>   if (!DECL_P (base))
> base = get_base_address (base);
> +
> + if (base && TREE_CODE (base) == TARGET_EXPR)
> +   base = TARGET_EXPR_SLOT (base);
>   
>   if (!base)
> return false;
> 
> (untested)?

That works too, though it of course affects all users, not just ubsan.  Here's
the patch with your suggested change.
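
To illustrate why looking through the TARGET_EXPR helps, here is a toy
model of the base lookup.  It is a sketch under assumptions: code, node
and address_known_nonzero are hypothetical, and it assumes that the
address of an ordinary declaration counts as non-null (weak symbols and
everything else the real function checks are ignored).

```cpp
#include <cassert>

// Hypothetical, heavily simplified tree codes.
enum class code { var_decl, target_expr, indirect_ref };

struct node {
  code kind;
  const node *slot;  // for target_expr: the temporary it initializes
};

// Is the address of `base` known to be non-null in this toy model?
bool address_known_nonzero(const node *base)
{
  // Mirror of the suggested change: use the TARGET_EXPR's slot as the base.
  if (base && base->kind == code::target_expr)
    base = base->slot;

  if (!base)
    return false;

  // Assumption: only declarations are known to have a non-null address.
  return base->kind == code::var_decl;
}
```

With the slot substituted, &TARGET_EXPR <...> is treated like the address
of the temporary itself, so in this model no null check would be needed.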

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-28  Marek Polacek  
Jakub Jelinek  

PR sanitizer/70342
* fold-const.c (tree_single_nonzero_warnv_p): For TARGET_EXPR, use
TARGET_EXPR_SLOT as a base.

* g++.dg/ubsan/null-7.C: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index 96d8484..171ac83 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -13531,6 +13531,9 @@ tree_single_nonzero_warnv_p (tree t, bool *strict_overflow_p)
if (!DECL_P (base))
  base = get_base_address (base);
 
+   if (base && TREE_CODE (base) == TARGET_EXPR)
+ base = TARGET_EXPR_SLOT (base);
+
if (!base)
  return false;
 
diff --git gcc/testsuite/g++.dg/ubsan/null-7.C gcc/testsuite/g++.dg/ubsan/null-7.C
index e69de29..8284bc7 100644
--- gcc/testsuite/g++.dg/ubsan/null-7.C
+++ gcc/testsuite/g++.dg/ubsan/null-7.C
@@ -0,0 +1,24 @@
+// PR sanitizer/70342
+// { dg-do compile }
+// { dg-options "-fsanitize=null" }
+
+class A {};
+class B {
+public:
+  B(A);
+};
+class C {
+public:
+  C operator<<(B);
+};
+class D {
+  D(const int &);
+  C m_blackList;
+};
+D::D(const int &) {
+  m_blackList << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A();
+}

Marek


[AArch64] Remove an unused reload hook.

2016-04-28 Thread Matthew Wahab

Hello,

Yvan Roux pointed out that the patch at
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01713.html was never
committed.

From the original submission:

  The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since
  the AArch64 backend no longer supports reload, this macro is not
  needed and this patch removes it.

This is a rebased and retested version of that patch.

Tested aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2016-04-26  Matthew Wahab  

* config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
* config/aarch64/aarch64-protos.h
(aarch64_legitimize_reload_address): Remove.
* config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
Remove.
[PATCH] [AArch64] Remove an unused reload hook.

---
 gcc/config/aarch64/aarch64-protos.h |   1 -
 gcc/config/aarch64/aarch64.c| 114 
 gcc/config/aarch64/aarch64.h|  15 -
 3 files changed, 130 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..6a8a850 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -339,7 +339,6 @@ int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
-rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..4a1acc9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5022,120 +5022,6 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
   return x;
 }
 
-/* Try a machine-dependent way of reloading an illegitimate address
-   operand.  If we find one, push the reload and return the new rtx.  */
-
-rtx
-aarch64_legitimize_reload_address (rtx *x_p,
-   machine_mode mode,
-   int opnum, int type,
-   int ind_levels ATTRIBUTE_UNUSED)
-{
-  rtx x = *x_p;
-
-  /* Do not allow mem (plus (reg, const)) if vector struct mode.  */
-  if (aarch64_vect_struct_mode_p (mode)
-  && GET_CODE (x) == PLUS
-  && REG_P (XEXP (x, 0))
-  && CONST_INT_P (XEXP (x, 1)))
-{
-  rtx orig_rtx = x;
-  x = copy_rtx (x);
-  push_reload (orig_rtx, NULL_RTX, x_p, NULL,
-		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-		   opnum, (enum reload_type) type);
-  return x;
-}
-
-  /* We must recognize output that we have already generated ourselves.  */
-  if (GET_CODE (x) == PLUS
-  && GET_CODE (XEXP (x, 0)) == PLUS
-  && REG_P (XEXP (XEXP (x, 0), 0))
-  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
-  && CONST_INT_P (XEXP (x, 1)))
-{
-  push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-		   opnum, (enum reload_type) type);
-  return x;
-}
-
-  /* We wish to handle large displacements off a base register by splitting
- the addend across an add and the mem insn.  This can cut the number of
- extra insns needed from 3 to 1.  It is only useful for load/store of a
- single register with 12 bit offset field.  */
-  if (GET_CODE (x) == PLUS
-  && REG_P (XEXP (x, 0))
-  && CONST_INT_P (XEXP (x, 1))
-  && HARD_REGISTER_P (XEXP (x, 0))
-  && mode != TImode
-  && mode != TFmode
-  && aarch64_regno_ok_for_base_p (REGNO (XEXP (x, 0)), true))
-{
-  HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
-  HOST_WIDE_INT low = val & 0xfff;
-  HOST_WIDE_INT high = val - low;
-  HOST_WIDE_INT offs;
-  rtx cst;
-  machine_mode xmode = GET_MODE (x);
-
-  /* In ILP32, xmode can be either DImode or SImode.  */
-  gcc_assert (xmode == DImode || xmode == SImode);
-
-  /* Reload non-zero BLKmode offsets.  This is because we cannot ascertain
-	 BLKmode alignment.  */
-  if (GET_MODE_SIZE (mode) == 0)
-	return NULL_RTX;
-
-  offs = low % GET_MODE_SIZE (mode);
-
-  /* Align misaligned offset by adjusting high p

RE: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Claudiu Zissulescu
Hi,

> Where exactly does the test go wrong?

The test which fails is this one: 
TEST_EQ (double, __DBL_MAX__, __DBL_MAX__, 1);
From the test file included in the patch.

> Can you show a trace of __eqdf2 with register values?

Sure thing, running for ARC700, using original implementation and enabled 
guarded code for FPX handling:

[0x02a2] 0xc000 K Z ld_s   r0,[sp,0x0] : lw [0x5000c0c0] => 0x : (w1) r0 <= 0x *
[0x02a4] 0xc101 K Z ld_s   r1,[sp,0x4] : lw [0x5000c0c4] => 0x7fef : (w1) r1 <= 0x7fef *
[0x02a6] 0xc202 K Z ld_s   r2,[sp,0x8] : lw [0x5000c0c8] => 0x : (w1) r2 <= 0x *
[0x02a8] 0xc303 K Z ld_s   r3,[sp,0xc] : lw [0x5000c0cc] => 0x7fef : (w1) r3 <= 0x7fef *
[0x02aa] 0x0aea K Z bl 0x2e8 : (w0) r31 <= 0x02ae *
[0x0590] 0x091d00e1 K Z brne.d r1,r3,0x1c
[0x0594] 0x2153050c K Z bmsk   r12,r1,0x14 : (w0) r12 <= 0x000f *
[0x0598] 0x200580be K Z or.f   0,r0,r2 *
[0x059c] 0x24cf1562 K  N   bset.ne r12,r12,0x15 : (w0) r12 <= 0x002f *
[0x05a0] 0x2414904c K  N   add1.f r12,r12,r1 : (w0) r12 <= 0x000d *
[0x05a4] 0x7fe0 K   C  j_s.d  [blink] *
[0x05a6] 0x20cc8086 KD  C  cmp.cc r0,r2
 
For reference, the routine:

.global __eqdf2
.balign 4
HIDDEN_FUNC(__eqdf2)
/* Good performance as long as the difference in high word is
   well predictable (as seen from the branch predictor).  */
__eqdf2:
brne.d DBL0H,DBL1H,.Lhighdiff
bmsk r12,DBL0H,20
#ifndef __HS__
/* The next two instructions are required to recognize the FPX
NaN, which has a pattern like this: 0x7ff0__8000_, as
opposed to 0x7ff8___.  */
or.f 0,DBL0L,DBL1L
bset.ne r12,r12,21
#endif /* __HS__ */
add1.f  r12,r12,DBL0H /* set c iff NaN; also, clear z if NaN.  */
j_s.d   [blink]
cmp.cc  DBL0L,DBL1L
.balign 4
.Lhighdiff:
or  r12,DBL0H,DBL1H
or.f 0,DBL0L,DBL1L
j_s.d   [blink]
bmsk.eq.f r12,r12,30
ENDFUNC(__eqdf2)

All those results were collected using nsimfree.

Please let me know if you need more info,
Claudiu



Re: [PATCH] Mark predicates generated by genmatch as static

2016-04-28 Thread Patrick Palka
On Thu, Apr 28, 2016 at 10:02 AM, Richard Biener
 wrote:
> On Thu, Apr 28, 2016 at 3:50 PM, Patrick Palka  wrote:
>> The predicate functions emitted by genmatch are expected to only be used
>> locally within {gimple,generic}-match.c, so this patch marks them as
>> static.  Does this look OK to commit after bootstrap and regtest?
>
> Actually the idea was to for example generate predicates in match.pd
> format for things like vectorizer pattern recog (I've done this for a few,
> need to search for (partial) patches on my disk), so they are supposed
> to be externally visible.
>
> Of course we might want to make that explicit in some way with
> a (extern_match ...) [or by prefixing local ones with a '*' ...].

Oh, I see.  That sounds useful.

>
> Richard.
>
>> gcc/ChangeLog:
>>
>> * genmatch.c (write_predicate): Mark the emitted function as
>> static.
>> ---
>>  gcc/genmatch.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
>> index ce964fa..2f5147f 100644
>> --- a/gcc/genmatch.c
>> +++ b/gcc/genmatch.c
>> @@ -3552,7 +3552,7 @@ decision_tree::gen (FILE *f, bool gimple)
>>  void
>>  write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple)
>>  {
>> -  fprintf (f, "\nbool\n"
>> +  fprintf (f, "\nstatic bool\n"
>>"%s%s (tree t%s%s)\n"
>>"{\n", gimple ? "gimple_" : "tree_", p->id,
>>p->nargs > 0 ? ", tree *res_ops" : "",
>> --
>> 2.8.1.361.g2fbef4c
>>


Re: [ubsan PATCH] Fix compile-time hog with &TARGET_EXPRs (PR sanitizer/70342)

2016-04-28 Thread Jakub Jelinek
On Thu, Apr 28, 2016 at 04:10:01PM +0200, Marek Polacek wrote:
> That works too, though it of course affects all users, not just ubsan.  Here's

Of course, but I think that is a good thing ;)

> the patch with your suggested change.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-04-28  Marek Polacek  
>   Jakub Jelinek  
> 
>   PR sanitizer/70342
>   * fold-const.c (tree_single_nonzero_warnv_p): For TARGET_EXPR, use
>   TARGET_EXPR_SLOT as a base.
> 
>   * g++.dg/ubsan/null-7.C: New test.

Ok for trunk.
For 6.2 dunno, either the same patch after a while, or perhaps your original
patch is safer (though I wonder whether one can construct a testcase where
it will instrument &(TARGET_EXPR <...>.field) nested many times and still
trigger the compile time hog with your patch).

Jakub


Re: [PATCH] Update gmp/mpfr/mpc in-tree versions

2016-04-28 Thread Bernd Edlinger
On 28.04.2016 14:35, Richard Biener wrote:
> On Thu, 28 Apr 2016, Bernd Edlinger wrote:
>
>> Hi,
>>
>> here is the first part of the patch that addresses only the in-tree
>> builds.  I tried different combinations of the documented supported
>> in-tree versions, and all combinations seem to work.
>> Then I changed the download_prerequisites batch to pick each pre-
>> requisite's minimum version (that part is not tested, because I have
>> no way to update the gcc.gnu.org ftp server).
>>
>> Various boot-straps for x86_64-linux-gnu and armv7-linux-gnueabihf
>> were successful.
>>
>> Is it OK for trunk?
>
> Please do not document that in-tree versions greater than XXX are
> supported, instead just point at download_prerequisites.
>

OK, done.

> Why do you not update to latest mpc (there is 1.0.3) and mpfr but leave
> bugfixes for mpfr on the plate (there is 3.1.4).
>

There's not really a good reason for that choice.

I just started with the latest version, and later moved to older
versions, because I did not want to restrict the supported versions
more than absolutely necessary, not even in-tree.

Are there any bug-fixes that we could depend upon?

> Does it make sense to wait for a new GMP release that allows us to get
> rid of -DNO_ASM?
>

I was _very_ surprised that gmp-6.0.0 did at first work in-tree but
enabled invalid assembly code, in gmp-6.0.0/mpn/generic/div_qr_1n_pi1.c
when __arm__ or __sparc__ or __s390x__ is defined together with NO_ASM.

All in all GMP contains a great deal of assembler code that we don't need
at all, my impression is that it is nearly impossible to test GMP
on every possible target, although it is all about mathematics.
So at least some choice would be good for us.

In that sense, I would not like to restrict the supported GMP versions
to just one version, that is not even released at this time.

> I will upload mpfr 3.1.4 and mpc 1.0.3.
>

Good.  I updated download_prerequisites to mpfr-3.1.4 and mpc-1.0.3
again, but left gmp-6.1.0 at the moment.


Thanks,
Bernd.

2016-04-28  Bernd Edlinger  

* configure.ac (mpfr): Remove pre-3.1.0 mpfr compatibility code.
* configure: Regenerated.
* Makefile.def (gmp): Explicitly disable assembler.
(mpfr): Adjust lib_path.
(mpc): Likewise.
* Makefile.in: Regenerated.

gcc/
2016-04-28  Bernd Edlinger  

* doc/install.texi: Document supported in-tree gmp/mpfr/mpc versions.

contrib/
2016-04-28  Bernd Edlinger  

* download_prerequisites: Adjust gmp/mpfr/mpc versions.
Index: Makefile.def
===
--- Makefile.def	(Revision 235487)
+++ Makefile.def	(Arbeitskopie)
@@ -50,6 +50,7 @@ host_modules= { module= gcc; bootstrap=true;
 host_modules= { module= gmp; lib_path=.libs; bootstrap=true;
 		// Work around in-tree gmp configure bug with missing flex.
 		extra_configure_flags='--disable-shared LEX="touch lex.yy.c"';
+		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true;
 		// none-*-* disables asm optimizations, bootstrap-testing
 		// the compiler more thoroughly.
@@ -57,11 +58,11 @@ host_modules= { module= gmp; lib_path=.libs; boots
 		// gmp's configure will complain if given anything
 		// different from host for target.
 	target="none-${host_vendor}-${host_os}"; };
-host_modules= { module= mpfr; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpfr; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpfr_configure_flags@';
 		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true; };
-host_modules= { module= mpc; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpc; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpc_gmp_configure_flags@ @extra_mpc_mpfr_configure_flags@';
 		no_install= true; };
 host_modules= { module= isl; lib_path=.libs; bootstrap=true;
Index: Makefile.in
===
--- Makefile.in	(Revision 235487)
+++ Makefile.in	(Arbeitskopie)
@@ -639,12 +639,12 @@ HOST_LIB_PATH_gmp = \
 
 @if mpfr
 HOST_LIB_PATH_mpfr = \
-  $$r/$(HOST_SUBDIR)/mpfr/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/.libs:
+  $$r/$(HOST_SUBDIR)/mpfr/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/src/.libs:
 @endif mpfr
 
 @if mpc
 HOST_LIB_PATH_mpc = \
-  $$r/$(HOST_SUBDIR)/mpc/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/.libs:
+  $$r/$(HOST_SUBDIR)/mpc/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/src/.libs:
 @endif mpc
 
 @if isl
@@ -11300,7 +11300,7 @@ all-gmp: configure-gmp
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 	$(HOST_EXPORTS)  \
 	(cd $(HOST_SUBDIR)/gmp && \
-	  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) $(STAGE1_FLAGS_TO_PASS)  \
+	  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) $(STAGE1_FLAGS_TO_PASS) AM_CFLAGS="-DNO_ASM" \
 		$(TARGET-gmp))
 @endif gmp
 
@@ -11329,7 +11329,7 @@ all-stage1-gmp: configure-stage1-gmp
 		CXXFLAGS_FOR_TARGET="$(CXXFLAGS_FOR_TARGET)" \
 		LIBCFLAGS_FOR_TARG

Re: [PATCH] Improve AVX512F sse4_1_round* patterns

2016-04-28 Thread Kirill Yukhin
Hi Jakub,
On 27 Apr 23:34, Jakub Jelinek wrote:
> Hi!
> 
> While AVX512F doesn't contain EVEX encoded vround{ss,sd,ps,pd} instructions,
> it contains vrndscale* which performs the same thing if bits [4:7] of the
> immediate are zero.
> 
> For _mm*_round_{ps,pd} we actually already emit vrndscale* for -mavx512f
> instead of vround* unconditionally (because
> _rndscale
> instruction has the same RTL as _round
> and the former, enabled for TARGET_AVX512F, comes first), for the scalar
> cases (thus __builtin_round* or _mm*_round_s{s,d}) the patterns we have
> don't allow extended registers and thus we end up with unnecessary moves
> if the inputs and/or outputs are or could be most effectively allocated
> in the xmm16+ registers.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
Your patch is OK.
> 
> 2016-04-27  Jakub Jelinek  
> 
>   * config/i386/i386.md (sse4_1_round2): Add avx512f alternative.
>   * config/i386/sse.md (sse4_1_round): Likewise.
> 
>   * gcc.target/i386/avx-vround-1.c: New test.
>   * gcc.target/i386/avx-vround-2.c: New test.
>   * gcc.target/i386/avx512vl-vround-1.c: New test.
>   * gcc.target/i386/avx512vl-vround-2.c: New test.
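
The equivalence mentioned in the quoted description can be modelled in
scalar code.  This is only a sketch: rndscale_matches_round and
rndscale_model are hypothetical helpers, the model assumes that bits
[4:7] of the immediate select the number of fraction bits M to keep, and
it ignores the low rounding-control bits by always using the current
rounding mode.

```cpp
#include <cassert>
#include <cmath>

// True when the immediate's bits [4:7] are zero, i.e. when the rndscale
// form is expected to behave like the plain round form.
bool rndscale_matches_round(unsigned imm8)
{
  return (imm8 & 0xf0u) == 0;
}

// Scalar model (assumption): round x to a multiple of 2^-M, where M is
// taken from bits [4:7] of imm8.  The low four bits are ignored here.
double rndscale_model(double x, unsigned imm8)
{
  const int m = static_cast<int>((imm8 >> 4) & 0xfu);
  return std::ldexp(std::nearbyint(std::ldexp(x, m)), -m);
}
```

With M == 0 the model reduces to a plain round-to-integer, which is the
case the new alternatives rely on.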

--
Thanks, K


Re: [PATCH 1/4] PR c++/62314: add fixit hint for missing "template <> " in explicit specialization

2016-04-28 Thread Trevor Saunders
On Thu, Apr 28, 2016 at 10:28:15AM -0400, David Malcolm wrote:
> This is a resend of a patch kit I sent in stage 3; the original post
> was here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01933.html
> 
> I've rebased the patches against yesterday's trunk and retested them.
> 
> They add various fix-it hints to existing diagnostics (PR 62314 is a
> catch-all for adding fix-its).
> 
> The first patch in the kit adds a fix-it insertion hint for missing
> "template <> " in explicit specializations, and improves the
> reported range of the type name by capturing the full range, rather
> than just one token within it.
> 
> I note that clang (http://clang.llvm.org/diagnostics.html) suggests
> inserting
>   template<>
> whereas our diagnostic talks about
>   template <>
> hence I have the fixit suggest inserting that.  Should we change our
> wording instead, and lose the space?

Selfishly I'd prefer to lose the space, on the grounds that all the other
projects I work on don't put one there and gcc is inconsistent about it.
That said, assuming there are projects that put a space there, it seems
unfortunate that we need to pick one, which will definitely be suboptimal
for some people.

Trev


Re: [PATCH] Update gmp/mpfr/mpc in-tree versions

2016-04-28 Thread Richard Biener
On Thu, 28 Apr 2016, Bernd Edlinger wrote:

> On 28.04.2016 14:35, Richard Biener wrote:
> > On Thu, 28 Apr 2016, Bernd Edlinger wrote:
> >
> >> Hi,
> >>
> >> here is the first part of the patch that addresses only the in-tree
> >> builds.  I tried different combinations of the documented supported
> >> in-tree versions, and all combinations seem to work.
> >> Then I changed the download_prerequisites batch to pick each pre-
> >> requisite's minimum version (that part is not tested, because I have
> >> no way to update the gcc.gnu.org ftp server).
> >>
> >> Various boot-straps for x86_64-linux-gnu and armv7-linux-gnueabihf
> >> were successful.
> >>
> >> Is it OK for trunk?
> >
> > Please do not document that in-tree versions greater than XXX are
> > supported, instead just point at download_prerequisites.
> >
> 
> OK, done.
> 
> > Why do you not update to latest mpc (there is 1.0.3) and mpfr but leave
> > bugfixes for mpfr on the plate (there is 3.1.4).
> >
> 
> There's not really a good reason for that choice.
> 
> I just started with the latest version, and later moved to older
> versions, because I did not want to restrict the supported versions
> more than absolutely necessary, not even in-tree.
> 
> Are there any bug-fixes that we could depend upon?
> 
> > Does it make sense to wait for a new GMP release that allows us to get
> > rid of -DNO_ASM?
> >
> 
> I was _very_ surprised that gmp-6.0.0 did at first work in-tree but
> enabled invalid assembly code, in gmp-6.0.0/mpn/generic/div_qr_1n_pi1.c
> when __arm__ or __sparc__ or __s390x__ is defined together with NO_ASM.
> 
> All in all GMP contains a great deal of assembler code that we don't need
> at all, my impression is that it is nearly impossible to test GMP
> on every possible target, although it is all about mathematics.
> So at least some choice would be good for us.
> 
> In that sense, I would not like to restrict the supported GMP versions
> to just one version, that is not even released at this time.

Another option would be to try if mini-gmp is enough for our
(in-tree) use and what the performance impact would be if we'd
use that (in-tree).

> > I will upload mpfr 3.1.4 and mpc 1.0.3.
> >
> 
> Good.  I updated download_prerequisites to mpfr-3.1.4 and mpc-1.0.3
> again, but left gmp-6.1.0 at the moment.

Thanks,
Richard.


Re: [PATCH] Turn some compile-time tests into run-time tests

2016-04-28 Thread Jeff Law

On 04/28/2016 08:03 AM, Patrick Palka wrote:



The rest seem OK to me.  Note that I'm not convinced all these tests were
designed to be execution tests, even though they use __builtin_abort and
friends.  Though it's a good marker of something that can/should be looked
at.


True..  What made me look into this in the first place is that I
caught myself making a similar mistake, i.e. marking an execution test
case as dg-do compile instead of dg-do run out of habit.
It's an easy mistake to make, and it's pretty low in terms of real-world
impact :-)



But I suppose it's worth looking at the context of each of these tests to
see if they were not actually intended to be execution tests.  I'll
double check this and report back; in the meantime I also found some
more tests that ought to be looked at.
I think for the set you already identified go ahead and make the 
approved changes.  We don't really lose anything by doing so.  Going 
forward we just have to continue to watch for this kind of thing 
slipping through the cracks and updating tests as mistakes are identified.


jeff


Re: [PATCH 4/6] [ARC] Handle FPX NaN within optimized floating point library.

2016-04-28 Thread Joern Wolfgang Rennecke



On 28/04/16 15:11, Claudiu Zissulescu wrote:

Sure thing, running for ARC700, using original implementation and enabled 
guarded code for FPX handling:

[0x02a2] 0xc000 K Z ld_s   r0,[sp,0x0] : lw [0x5000c0c0] => 0x : (w1) r0 <= 0x *
[0x02a4] 0xc101 K Z ld_s   r1,[sp,0x4] : lw [0x5000c0c4] => 0x7fef : (w1) r1 <= 0x7fef *
[0x02a6] 0xc202 K Z ld_s   r2,[sp,0x8] : lw [0x5000c0c8] => 0x : (w1) r2 <= 0x *
[0x02a8] 0xc303 K Z ld_s   r3,[sp,0xc] : lw [0x5000c0cc] => 0x7fef : (w1) r3 <= 0x7fef *
[0x02aa] 0x0aea K Z bl 0x2e8 : (w0) r31 <= 0x02ae *
[0x0590] 0x091d00e1 K Z brne.d r1,r3,0x1c
[0x0594] 0x2153050c K Z bmsk   r12,r1,0x14 : (w0) r12 <= 0x000f *
[0x0598] 0x200580be K Z or.f   0,r0,r2 *
[0x059c] 0x24cf1562 K  N   bset.ne r12,r12,0x15 : (w0) r12 <= 0x002f *
[0x05a0] 0x2414904c K  N   add1.f r12,r12,r1 : (w0) r12 <= 0x000d *
[0x05a4] 0x7fe0 K   C  j_s.d  [blink] *
[0x05a6] 0x20cc8086 KD  C  cmp.cc r0,r2
  
  

I see, we basically have an overflow.
I think the DPFP_COMPAT / __HS__ variant should be something like:

brne DBL0H,DBL1H,.Lhighdiff
mov_s r12,0x0020
or.f 0,DBL0L,DBL1L
bset.ne r12,r12,0

add1.f  r12,r12,DBL0H /* set c iff NaN; also, clear z if NaN.  */
j_s.d   [blink]
cmp.cc  DBL0L,DBL1L
...

Where the mov_s could be replaced with something else that loads the
same value, depending on what instructions are supported.
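
The overflow seen in the trace can be reproduced with plain integer
arithmetic.  The following is a sketch under assumptions:
eqdf2_nan_check and flags_result are hypothetical names, bmsk with
operand 20 is modelled as keeping bits 0..20, add1 as adding the second
source shifted left by one, and the inputs are the IEEE 754 encoding of
DBL_MAX (high word 0x7fefffff, low word 0xffffffff).

```cpp
#include <cassert>
#include <cstdint>

struct flags_result {
  std::uint32_t r12;  // value left in r12 after add1.f
  bool carry;         // carry flag after add1.f ("set c iff NaN")
};

// Model of the equal-high-word path of __eqdf2.
//   high      : DBL0H
//   low_or    : DBL0L | DBL1L, as computed by "or.f 0,DBL0L,DBL1L"
//   fpx_guard : whether the or.f / bset.ne pair is enabled
flags_result eqdf2_nan_check(std::uint32_t high, std::uint32_t low_or,
                             bool fpx_guard)
{
  std::uint32_t r12 = high & 0x1fffffu;          // bmsk r12,DBL0H,20
  if (fpx_guard && low_or != 0)
    r12 |= 1u << 21;                             // bset.ne r12,r12,21
  const std::uint32_t shifted = static_cast<std::uint32_t>(high << 1);
  const std::uint64_t sum = static_cast<std::uint64_t>(r12) + shifted;
  return { static_cast<std::uint32_t>(sum), (sum >> 32) != 0 };  // add1.f
}
```

With the guard enabled, DBL_MAX sets the carry flag, so it is classified
like a NaN and the conditional cmp.cc is skipped; with the guard disabled
the same input produces no carry.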


Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-28 Thread Jeff Law

On 04/28/2016 05:18 AM, Torsten Duwe wrote:

On Thu, Apr 28, 2016 at 11:39:48AM +0300, Maxim Kuvyrkov wrote:

On Apr 27, 2016, at 6:22 PM, Torsten Duwe  wrote:


Your current patch is great for experiments for the kernel engineers to check 
if suggested approaches to code patching will work.  Still, I prefer to 
implement LTO-friendly way of handling -fprolog-pad=N via function attributes.


That was exactly my intention. I only wanted *some* working compiler.
I'm sure you compiler people will have a better way to finally implement this.
Conceptually, we have nop insn patterns, so generically I'd implement
this by emitting a suitable set of nops followed by a scheduling
barrier, then thread the mess at the start of the prologue.
This would be 99.9% target-independent changes.


We'd just punt on targets that don't represent prologues as RTL.




All I can say so far about the ipa-ra issue is that it'd be great if
x9(?) could be left as volatile / scratch; the rest can be preserved.
ipa-ra doesn't really work that way.  It just notes what's used in the 
callee and the caller is allowed to look at that information and use it 
to optimize stuff on the caller side.


For example, call-clobbered registers that are not used in the callee 
can be used in the caller to hold values across the call.


This is going to wreak havoc for anything that assumes a call-clobbered 
register can always be safely used in the callee, particularly in the 
patched codepath.  One could argue that the patched codepath is the 
uncommon case and should be responsible for saving/restoring any 
register it uses to ensure it doesn't mess up any visible state.


Jeff



