Re: [ssa-coalesce] Rename register_ssa_partition

2016-11-15 Thread Richard Biener
On Mon, Nov 14, 2016 at 11:56 PM, kugan
 wrote:
> Hi Richard,
>
> On 08/11/16 23:45, Richard Biener wrote:
>>
>> On Tue, Nov 8, 2016 at 3:32 AM, kugan 
>> wrote:
>>>
>>> Hi,
>>>
>>> In tree-ssa-coalesce, register_ssa_partition () and
>>> register_ssa_partition_check have lost their meaning over various commits
>>> and now just verify that ssa_var is indeed an SSA_NAME and not a
>>> virtual_operand_p.  It is confusing when one looks at it for the first
>>> time and would expect more while reading register_ssa_partition.
>>>
>>> The attached patch just renames it to verify_ssa_for_coalesce to better
>>> reflect what it is doing now.
>>>
>>> Bootstrap and regression testing is ongoing. Is this OK for trunk if no
>>> regressions?
>>
>>
>> Hum, can you retain the inline wrapper please?  I find the new name
>> verify_ssa_for_coalesce bad as tree-ssa-live.h is something generic,
>> not just coalescing related.  I'd say a better improvement would be to
>> remove
>> register_ssa_partition completely.
>
> Do you like the attached patch, which completely removes it?

Yes.

Ok if it passes bootstrap/regtest.

Richard.

> Thanks,
> Kugan
>
>
>>
>> Richard.
>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-11-08  Kugan Vivekanandarajah  
>>>
>>> * tree-ssa-coalesce.c (register_default_def): Remove usage of the arg
>>> map, which is not used at all.
>>> (create_outofssa_var_map): Use verify_ssa_for_coalesce, renamed from
>>> register_ssa_partition.
>>> * tree-ssa-live.c (verify_ssa_for_coalesce): Renamed from
>>> register_ssa_partition.
>>> (register_ssa_partition_check): Remove.
>>> * tree-ssa-live.h (register_ssa_partition): Renamed to
>>> verify_ssa_for_coalesce.


Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Richard Biener
On Mon, Nov 14, 2016 at 11:13 PM, Jerry DeLisle  wrote:
> On 11/13/2016 11:03 PM, Thomas Koenig wrote:
>>
>> Hi Jerry,
>>
>> I think this
>>
>> +  /* Parameter adjustments */
>> +  c_dim1 = m;
>> +  c_offset = 1 + c_dim1;
>>
>> should be
>>
>> +  /* Parameter adjustments */
>> +  c_dim1 = rystride;
>> +  c_offset = 1 + c_dim1;
>>
>> Regarding options for matmul:  It is possible to add the
>> options to the lines in Makefile.in
>>
>> # Turn on vectorization and loop unrolling for matmul.
>> $(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS +=
>> -ftree-vectorize
>> -funroll-loops
>>
>> This is a great step forward.  I think we can close most matmul-related
>> PRs once this patch has been applied.
>>
>> Regards
>>
>> Thomas
>>
>
> With Thomas's suggestion, I can remove the #pragma optimize from the source
> code.  Doing this (long lines wrapped as shown):
>
> diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
> index 39d3e11..9ee17f9 100644
> --- a/libgfortran/Makefile.am
> +++ b/libgfortran/Makefile.am
> @@ -850,7 +850,7 @@ intrinsics/dprod_r8.f90 \
>  intrinsics/f2c_specifics.F90
>
>  # Turn on vectorization and loop unrolling for matmul.
> -$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
> -funroll-loops
> +$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math
> -fno-protect-parens -fstack-arrays -ftree-vectorize -funroll-loops --param
> max-unroll-times=4 -ftree-loop-vectorize

-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already.

>  # Logical matmul doesn't vectorize.
>  $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops
>
>
> Comparing gfortran 6 vs 7: (test program posted in PR51119)
>
> $ gfc6 -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
>  =========================================================
>  =                 MEASURED GIGAFLOPS                    =
>  =========================================================
>                    Matmul                          Matmul
>                    fixed                 Matmul  variable
>   Size  Loops    explicit  refMatmul   assumed  explicit
>  =========================================================
>      2   2000      11.928      0.047     0.082     0.138
>      4   2000       1.455      0.220     0.371     0.316
>      8   2000       1.476      0.737     0.704     1.574
>     16   2000       4.536      3.755     2.825     3.820
>     32   2000       6.070      5.443     3.124     5.158
>     64   2000       5.423      5.355     5.405     5.413
>    128   2000       5.913      5.841     5.917     5.917
>    256    477       5.865      5.252     5.863     5.862
>    512     59       2.794      2.841     2.794     2.791
>   1024      7       1.662      1.356     1.662     1.661
>   2048      1       1.753      1.724     1.753     1.754
>
> $ gfc -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
>  =========================================================
>  =                 MEASURED GIGAFLOPS                    =
>  =========================================================
>                    Matmul                          Matmul
>                    fixed                 Matmul  variable
>   Size  Loops    explicit  refMatmul   assumed  explicit
>  =========================================================
>      2   2000      12.146      0.042     0.090     0.146
>      4   2000       1.496      0.232     0.384     0.325
>      8   2000       2.330      0.765     0.763     0.965
>     16   2000       4.611      4.120     2.792     3.830
>     32   2000       6.068      5.265     3.102     4.859
>     64   2000       6.527      5.329     6.425     6.495
>    128   2000       8.207      5.643     8.336     8.441
>    256    477       9.210      4.967     9.367     9.299
>    512     59       8.330      2.772     8.422     8.342
>   1024      7       8.430      1.378     8.511     8.424
>   2048      1       8.339      1.718     8.425     8.322
>
> I do think we need to adjust the default inline limit and should do this
> separately from this patch.
>
> With these changes, OK for trunk?
>
> Regards,
>
> Jerry
>


Re: [libstdc++, testsuite] Add dg-require-thread-fence

2016-11-15 Thread Christophe Lyon
On 14 November 2016 at 21:31, Ramana Radhakrishnan
 wrote:
>
> On Mon, 14 Nov 2016 at 19:59, Christophe Lyon 
> wrote:
>>
>> On 14 November 2016 at 18:54, Mike Stump  wrote:
>> > On Oct 21, 2016, at 1:00 AM, Christophe Lyon
>> >  wrote:
>> >>
>> >> So if we say that the current behaviour has to keep being the default,
>> >> so that users think about what they are really doing,
>> >
>> > Having a toolchain not work by default to force users to think, isn't a
>> > winning strategy.
>> >
>> > Everything should always, just work.  Those things that don't, we should
>> > fix.
>> >
>> I tend to agree :-)
>>
>> Maybe Ramana changed his mind and would now no longer want to force
>> users to think?
>
>
>
> I haven't been able to deal with this thread, having been in and out of the
> office for the past month for various reasons.  I am not back at my desk
> until next week and ran out of time when I was at my desk to get back to
> this and actually address the comments in the newlib patch review.
>
>
> https://sourceware.org/ml/newlib/2015/msg00653.html
>

Thanks for the pointer, I missed it.

> This seems to have fallen through the cracks for various reasons, but that
> was the approach I was going for.  Some of the points made are taken, but
> having users not think about what they want to do about synchronisation and
> just providing empty stub functions which result in random run-time crashes
> isn't correct in my book.  If anyone is interested in moving forward I would
> suggest they take that approach or refine it further.
>
>
> Thanks,
> Ramana
>


Re: [MIPS] Enable descriptors for nested functions in Ada

2016-11-15 Thread Eric Botcazou
> Thanks for the patch. I'm a bit concerned about the interaction this
> will have with microMIPS which can (albeit not implemented today) use
> 2-byte alignment on function entry points.
> 
> Is the solution for other targets to mandate 4-byte alignment when
> using function descriptors?

Yes, the compiler will overalign functions for languages using descriptors 
(only Ada as of this writing): 2-byte alignment if the setting is 1, 4-byte 
alignment if the setting is 2, 8-byte alignment if the setting is 4, etc.

That's done by the FUNCTION_ALIGNMENT macro defined in defaults.h.
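
For illustration, that mapping is simply "alignment = 2 * setting" in bytes,
presumably so that the low bit(s) of a function pointer stay free to flag a
descriptor.  The helper below is hypothetical and only restates the relation;
the real logic is the FUNCTION_ALIGNMENT macro mentioned above:

/* Hypothetical sketch, not GCC code: code alignment (in bytes) implied
   by a given descriptor setting, per the mapping above
   (1 -> 2, 2 -> 4, 4 -> 8, ...).  */
static unsigned int
descriptor_code_alignment (unsigned int descriptor_setting)
{
  return 2 * descriptor_setting;
}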

> If so then I don't see a problem with this. We will have to account for
> that when GCC allows 2-byte aligned microMIPS functions.

OK, thanks.

-- 
Eric Botcazou


Re: [v3 PATCH] Implement P0504R0 (Revisiting in-place tag types for any/optional/variant).

2016-11-15 Thread Ville Voutilainen
On 14 November 2016 at 22:51, Ville Voutilainen
 wrote:
> On 14 November 2016 at 22:49, Ville Voutilainen
>  wrote:
>> I needed to do some minor tweaks in
>> testsuite/20_util/in_place/requirements.cc. I committed the attached
>> after testing the full suite on Linux-PPC64.
>
>
> P.S. make_unsigned/make_signed tests seem broken. That's not caused by
> any of my patches. The expected
> diagnostic message seems to have changed. I can probably fix that, but
> I will do the hash-poisoning patch first.

...and that was just some glitch in the matrix; those tests are fine on
both Linux-X64 and Linux-PPC64 now that I have re-run everything with
current trunk.


Re: RFA (openmp): C++ PATCH to make some references TREE_CONSTANT

2016-11-15 Thread Jakub Jelinek
On Mon, Nov 14, 2016 at 11:53:55PM -0500, Jason Merrill wrote:
> The standard says that references that refer to a constant address can
> be used in a constant-expression, but we haven't allowed that.  This
> patch implements it, but without the parser.c hunk it broke
> libgomp.c++/target-14.C.

The parser hunk can't be right: the decl you modify TREE_CONSTANT on
is not in any way "an OpenMP variable", it is just a variable used in some
OpenMP region, and modifying the flag will affect all the handling of the
variable, both outside and inside of OpenMP regions.

> Apparently the target mapping wants to use 't' in a way that doesn't
> actually refer to x?  This seems like a strange use of references.

Yes.  The privatization and mapping of variables with reference type
in OpenMP generally works by privatizing or mapping what that reference
references, and then creating a new reference that references the privatized
or mapped variable.

Comparing the *.original dump without your patch to the dump with your patch
minus the parser.c hunk shows:
-  if ((*a != 8 || *c != 142) || *t != 19)
+  if ((*a != 8 || *c != 142) || *(int &) &x != 19)
which might be fine outside of the OpenMP regions and, from what you wrote,
I understand is required in constexpr contexts, but it can't be right in the
OpenMP regions - it has to be deferred until the omplower pass does whatever
it needs to do.  This is similar to the DECL_VALUE_EXPR handling I mentioned
yesterday: there it is also dangerous to just fold a var with DECL_VALUE_EXPR
to its DECL_VALUE_EXPR until after the omplower pass - the gimplifier
gimplifies such vars to their DECL_VALUE_EXPR only if it is ok (the
omp_notice_variable function, together with e.g. the disregard_value_expr
langhook, decides when it is ok or not).

In target-14.C, there is mapping of reference t, so after lowering
there is going to be in the region used variable t' that refers to some int
in the target.  Your patch replaces uses of t in the region much earlier
with uses of &x, but that is something that isn't explicitly mapped, and
the OpenMP 4.5 rules say that such variable is firstprivatized implicitly,
so the code outside of the target region won't see any changes in the
variable.

So, is there a way to treat references similarly?  I.e. only "fold"
reference vars to what they refer to (DECL_INITIAL) in constexpr.c evaluation,
or in gimplification, where a langhook or omp_notice_variable etc. has the
last say on whether it is ok to do that or not?

Jakub


[PATCH] [ARC] New option handling, refurbish multilib support.

2016-11-15 Thread Claudiu Zissulescu
Please find attached the revised patch as requested.

Ok to apply?
Claudiu

gcc/
2016-05-09  Claudiu Zissulescu  

* config/arc/arc-arch.h: New file.
* config/arc/arc-arches.def: Likewise.
* config/arc/arc-cpus.def: Likewise.
* config/arc/arc-options.def: Likewise.
* config/arc/t-multilib: Likewise.
* config/arc/genmultilib.awk: Likewise.
* config/arc/genoptions.awk: Likewise.
* config/arc/arc-tables.opt: Likewise.
* config/arc/driver-arc.c: Likewise.
* testsuite/gcc.target/arc/nps400-cpu-flag.c: Likewise.
* common/config/arc/arc-common.c (arc_handle_option): Trace
toggled options.
* config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra
options; check for supported cpu against arc-cpus.def file.
(arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define
TARGET_CPU_BUILD macro; add driver-arc.o as an extra object.
* config/arc/arc-c.def: Add emacs local variables.
* config/arc/arc-opts.h (processor_type): Use arc-cpus.def file.
(FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA)
(FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL)
(FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines.
(DEFAULT_arc_fpu_build): Define.
(DEFAULT_arc_mpy_option): Define.
* config/arc/arc-protos.h (arc_init): Delete.
* config/arc/arc.c (arc_cpu_name): New variable.
(arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs)
(arc_arc700, arc_arc600, arc_arc601): New variable.
(arc_init): Add static; remove selection of default tune value,
cleanup obsolete error messages.
(arc_override_options): Make use of .def files for selecting the
right cpu and option configurations.
* config/arc/arc.h (stdbool.h): Include.
(TARGET_CPU_DEFAULT): Define.
(CPP_SPEC): Remove mcpu=NPS400 handling.
(arc_cpu_to_as): Declare.
(EXTRA_SPEC_FUNCTIONS): Define.
(OPTION_DEFAULT_SPECS): Likewise.
(ASM_DEFAULT): Remove.
(ASM_SPEC): Use arc_cpu_to_as.
(DRIVER_SELF_SPECS): Remove deprecated options.
(arc_base_cpu): Declare.
(TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM)
(TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_base_cpu
variable.
(MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT.
* config/arc/arc.md (attr_cpu): Remove.
* config/arc/arc.opt (mno-mpy): Deprecate.
(mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM)
(mcpu=ARCHS): Remove.
(mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc):
Deprecate.
(mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove.
(arc_fpu): Use new defines.
(mpy-option): Change to use numeric or string like inputs.
* config/arc/t-arc (driver-arc.o): New target.
(arc-cpus, t-multilib, arc-tables.opt): Likewise.
* config/arc/t-arc-newlib: Delete.
* config/arc/t-arc-uClibc: Renamed to t-uClibc.
* doc/invoke.texi (ARC): Update arc options.
---
 gcc/common/config/arc/arc-common.c |  69 +++--
 gcc/config.gcc |  47 +++---
 gcc/config/arc/arc-arch.h  | 123 +++
 gcc/config/arc/arc-arches.def  |  56 +++
 gcc/config/arc/arc-c.def   |   4 +
 gcc/config/arc/arc-cpus.def|  75 +
 gcc/config/arc/arc-options.def | 109 +
 gcc/config/arc/arc-opts.h  |  49 +-
 gcc/config/arc/arc-protos.h|   1 -
 gcc/config/arc/arc-tables.opt  |  90 +++
 gcc/config/arc/arc.c   | 179 --
 gcc/config/arc/arc.h   |  89 +--
 gcc/config/arc/arc.md  |   5 -
 gcc/config/arc/arc.opt | 169 ++--
 gcc/config/arc/driver-arc.c|  81 ++
 gcc/config/arc/genmultilib.awk | 203 +
 gcc/config/arc/genoptions.awk  |  86 +++
 gcc/config/arc/t-arc   |  19 +++
 gcc/config/arc/t-arc-newlib|  46 --
 gcc/config/arc/t-arc-uClibc|  20 ---
 gcc/config/arc/t-multilib  |  34 +
 gcc/config/arc/t-uClibc|  20 +++
 gcc/doc/invoke.texi|  90 +--
 gcc/testsuite/gcc.target/arc/nps400-cpu-flag.c |   4 +
 24 files changed, 1290 insertions(+), 378 deletions(-)
 create mode 100644 gcc/config/arc/arc-arch.h
 create mode 100644 gcc/config/arc/arc-arches.def
 create mode 100644 gcc/config/arc/arc-cpus.def
 create mode 100644 gcc/config/arc/arc-options.def
 create

Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-15 Thread Dominik Vogt
On Fri, Nov 11, 2016 at 02:17:58PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 11, 2016 at 09:58:21AM +0100, Dominik Vogt wrote:
> > > You say it needs more testing -- what testing?
> > 
> > Regression testing on AIX (David has done this in reply to the
> > original message), possibly also on 32-Bit Power, if that is a
> > reasonable target.
> 
> It is.  I'll test powerpc64-linux -m32, and a crosscompiler for
> powerpc-linux.
> 
> Thanks for the patch.  Please apply to trunk.  Does it need backports
> later?

Depends on whether the Power/AIX folks think backports are
necessary.  We'll apply the patch to trunk; if a backport is
desired later, David or someone else from IBM could do that (i.e.
it's not necessary that our department commits further patches).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [patch] Disable LTO note about strict aliasing

2016-11-15 Thread Eric Botcazou
> Can you verify that a TU compiled with -fstrict-aliasing will link as
> if -fno-strict-aliasing if -fno-strict-aliasing is specified at link time?

Yes, it does:

eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -c t.c -O2 -flto
eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -o t t.o -O2 -save-temps -fverbose-asm && grep strict-aliasing t.ltrans0.s
# -fstore-merging -fstrict-aliasing -fstrict-overflow
eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -o t t.o -O2 -fno-strict-aliasing -save-temps -fverbose-asm && grep strict-aliasing t.ltrans0.s
# -fno-strict-aliasing -fverbose-asm -fltrans t.ltrans0.o

> That said, -Wno-lto-type-mismatch can be used to disable the warning as
> well.

Right, but the wording ("code may be misoptimized") is a bit scary, so I'd
rather avoid it when possible.

-- 
Eric Botcazou


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Uros Bizjak
On Mon, Nov 14, 2016 at 7:28 PM, Andrew Senkevich
 wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>> The x86 part of the patch is OK with the above changes and additional
>> target attribute test for flags2 ISA features..
>
> Fixed according to your comments; I will follow up with additional tests soon.

OK.

Thanks,
Uros.


Move misplaced assignment in num_sign_bit_copies1

2016-11-15 Thread rsandifo
[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

The old assignment to bitwidth was before we handled VOIDmode with:

  if (mode == VOIDmode)
mode = GET_MODE (x);

so when VOIDmode was specified we would always use:

  if (bitwidth < GET_MODE_PRECISION (GET_MODE (x)))
{
  num0 = cached_num_sign_bit_copies (x, GET_MODE (x),
 known_x, known_mode, known_ret);
  return MAX (1,
  num0 - (int) (GET_MODE_PRECISION (GET_MODE (x)) - bitwidth));
}

For a zero bitwidth this always returns 1 (which is the most
pessimistic result).

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (num_sign_bit_copies1): Calculate bitwidth after
handling VOIDmode.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 4617e8e..35e95f2 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4795,7 +4795,6 @@ num_sign_bit_copies1 (const_rtx x, machine_mode mode, 
const_rtx known_x,
  unsigned int known_ret)
 {
   enum rtx_code code = GET_CODE (x);
-  unsigned int bitwidth = GET_MODE_PRECISION (mode);
   machine_mode inner_mode;
   int num0, num1, result;
   unsigned HOST_WIDE_INT nonzero;
@@ -4812,6 +4811,7 @@ num_sign_bit_copies1 (const_rtx x, machine_mode mode, 
const_rtx known_x,
 return 1;
 
   /* For a smaller object, just ignore the high bits.  */
+  unsigned int bitwidth = GET_MODE_PRECISION (mode);
   if (bitwidth < GET_MODE_PRECISION (GET_MODE (x)))
 {
   num0 = cached_num_sign_bit_copies (x, GET_MODE (x),



Re: Move misplaced assignment in num_sign_bit_copies1

2016-11-15 Thread Eric Botcazou
> 2016-11-15  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
> 
>   * rtlanal.c (num_sign_bit_copies1): Calculate bitwidth after
>   handling VOIDmode.

OK, thanks, but please change the comment too, I think "For a smaller mode" 
would be less confusing.

-- 
Eric Botcazou


[PATCH v2] df: Change defs in entry and uses in exit block during separate shrink-wrapping

2016-11-15 Thread Segher Boessenkool
So far all target implementations of the separate shrink-wrapping hooks
use the DF LIVE info to figure out around which basic blocks the non-
volatile registers need to be saved.  This is done by looking at the
IN+GEN+KILL sets of the basic blocks.  However, that doesn't work for
registers that DF says are defined in the entry block, or used in the
exit block.

This patch introduces a local flag DF_SCAN_EMPTY_ENTRY_EXIT that says
no registers should be defined in the entry block, and none used in the
exit block.  It also makes try_shrink_wrapping_separate use it.  The
rs6000 port is changed to use IN+GEN+KILL for the LR component.

Testing on powerpc64-linux {-m32,-m64}.  Is this okay for trunk if that
succeeds?


Segher


2016-11-15  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_components_for_bb): Mark the LR
component as used also if LR_REGNO is a live input to the bb.
* df-scan.c (df_get_entry_block_def_set): Return immediately after
clearing the set if DF_SCAN_EMPTY_ENTRY_EXIT is set.
(df_get_exit_block_use_set): Ditto.
* df.h (df_scan_flags): New enum.
* rtl.h (shrink_wrap_separate_in_progress): Declare new variable.
* shrink-wrap.c (try_shrink_wrapping_separate): Set
DF_SCAN_EMPTY_ENTRY_EXIT in df_scan->local_flags, and call
df_update_entry_block_defs and df_update_exit_block_uses
at the start; clear the flag and call those functions at the end.

---
 gcc/config/rs6000/rs6000.c |  3 ++-
 gcc/df-scan.c  | 16 
 gcc/df.h   |  7 +++
 gcc/shrink-wrap.c  | 19 +--
 4 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2ceddfd..d75d52c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27800,7 +27800,8 @@ rs6000_components_for_bb (basic_block bb)
   bitmap_set_bit (components, regno);
 
   /* LR needs to be saved around a bb if it is killed in that bb.  */
-  if (bitmap_bit_p (gen, LR_REGNO)
+  if (bitmap_bit_p (in, LR_REGNO)
+  || bitmap_bit_p (gen, LR_REGNO)
   || bitmap_bit_p (kill, LR_REGNO))
 bitmap_set_bit (components, 0);
 
diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 7cfd34b..e6b55b5 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3506,6 +3506,14 @@ df_get_entry_block_def_set (bitmap entry_block_defs)
 
   bitmap_clear (entry_block_defs);
 
+  /* For separate shrink-wrapping we use LIVE to analyze which basic blocks
+ need a prologue for some component to be executed before that block,
+ and we do not care about any other registers.  Hence, we do not want
+ any register for any component defined in the entry block, and we can
+ just leave all registers undefined.  */
+  if (df_scan->local_flags & DF_SCAN_EMPTY_ENTRY_EXIT)
+return;
+
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 {
   if (global_regs[i])
@@ -3665,6 +3673,14 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
 
   bitmap_clear (exit_block_uses);
 
+  /* For separate shrink-wrapping we use LIVE to analyze which basic blocks
+ need an epilogue for some component to be executed after that block,
+ and we do not care about any other registers.  Hence, we do not want
+ any register for any component seen as used in the exit block, and we
+ can just say no registers at all are used.  */
+  if (df_scan->local_flags & DF_SCAN_EMPTY_ENTRY_EXIT)
+return;
+
   /* Stack pointer is always live at the exit.  */
   bitmap_set_bit (exit_block_uses, STACK_POINTER_REGNUM);
 
diff --git a/gcc/df.h b/gcc/df.h
index 40c3794..7a2a6a1 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -450,6 +450,13 @@ enum df_chain_flags
   DF_UD_CHAIN  =  2  /* Build UD chains.  */
 };
 
+enum df_scan_flags
+{
+  /* Flags for the SCAN problem.  */
+  DF_SCAN_EMPTY_ENTRY_EXIT = 1  /* Don't define any registers in the entry
+  block; don't use any in the exit block.  */
+};
+
 enum df_changeable_flags
 {
   /* Scanning flags.  */
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index e480d4d..f838696 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -1793,7 +1793,13 @@ try_shrink_wrapping_separate (basic_block first_bb)
   if (!components)
 return;
 
-  /* We need LIVE info.  */
+  /* We need LIVE info, not defining anything in the entry block and not
+ using anything in the exit block.  A block then needs a component if
+ the register for that component is in the IN or GEN or KILL set for
+ that block.  */
+  df_scan->local_flags |= DF_SCAN_EMPTY_ENTRY_EXIT;
+  df_update_entry_block_defs ();
+  df_update_exit_block_uses ();
   df_live_add_problem ();
   df_live_set_all_dirty ();
   df_analyze ();
@@ -1859,9 +1865,10 @@ try_shrink_wrapping_separate (basic_block first_bb)
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
 
-  if (crtl->shrink_wrapped_sep

Fix simplify_shift_const_1 handling of vector shifts

2016-11-15 Thread Richard Sandiford
[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

simplify_shift_const_1 handles both shifts of scalars by scalars
and shifts of vectors by scalars.  For vectors this means that
each element is shifted by the same amount.
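
(As a concrete illustration of that vector-by-scalar case, using GCC's
generic vector extensions - nothing here is taken from the patch, and the
type name is made up:)

/* Each element of the vector is shifted right by the same scalar amount.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
shift_all_elements_right (v4si v, int amount)
{
  return v >> amount;   /* all four ints shifted by `amount' */
}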

However:

(a) the two cases weren't always distinguished, so we'd try
things for vectors that only made sense for scalars.

(b) a lot of the range and bitcount checks were based on the
bitsize or precision of the full shifted operand, rather
than the mode of each element.

Fixing (b) accidentally exposed more optimisation opportunities,
although that wasn't the point of the patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* combine.c (simplify_shift_const_1): Use the number of bits
in the inner mode to determine the range of the shift.
When handling shifts of vectors, skip any rules that apply
only to scalars.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6b7bdd0..66f628f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10216,12 +10216,12 @@ simplify_shift_const_1 (enum rtx_code code, 
machine_mode result_mode,
  want to do this inside the loop as it makes it more difficult to
  combine shifts.  */
   if (SHIFT_COUNT_TRUNCATED)
-orig_count &= GET_MODE_BITSIZE (mode) - 1;
+orig_count &= GET_MODE_UNIT_BITSIZE (mode) - 1;
 
   /* If we were given an invalid count, don't do anything except exactly
  what was requested.  */
 
-  if (orig_count < 0 || orig_count >= (int) GET_MODE_PRECISION (mode))
+  if (orig_count < 0 || orig_count >= (int) GET_MODE_UNIT_PRECISION (mode))
 return NULL_RTX;
 
   count = orig_count;
@@ -10238,16 +10238,14 @@ simplify_shift_const_1 (enum rtx_code code, 
machine_mode result_mode,
   /* Convert ROTATERT to ROTATE.  */
   if (code == ROTATERT)
{
- unsigned int bitsize = GET_MODE_PRECISION (result_mode);
+ unsigned int bitsize = GET_MODE_UNIT_PRECISION (result_mode);
  code = ROTATE;
- if (VECTOR_MODE_P (result_mode))
-   count = bitsize / GET_MODE_NUNITS (result_mode) - count;
- else
-   count = bitsize - count;
+ count = bitsize - count;
}
 
   shift_mode = try_widen_shift_mode (code, varop, count, result_mode,
 mode, outer_op, outer_const);
+  machine_mode shift_unit_mode = GET_MODE_INNER (shift_mode);
 
   /* Handle cases where the count is greater than the size of the mode
 minus 1.  For ASHIFT, use the size minus one as the count (this can
@@ -10259,12 +10257,12 @@ simplify_shift_const_1 (enum rtx_code code, 
machine_mode result_mode,
 multiple operations, each of which are defined, we know what the
 result is supposed to be.  */
 
-  if (count > (GET_MODE_PRECISION (shift_mode) - 1))
+  if (count > (GET_MODE_PRECISION (shift_unit_mode) - 1))
{
  if (code == ASHIFTRT)
-   count = GET_MODE_PRECISION (shift_mode) - 1;
+   count = GET_MODE_PRECISION (shift_unit_mode) - 1;
  else if (code == ROTATE || code == ROTATERT)
-   count %= GET_MODE_PRECISION (shift_mode);
+   count %= GET_MODE_PRECISION (shift_unit_mode);
  else
{
  /* We can't simply return zero because there may be an
@@ -10280,44 +10278,49 @@ simplify_shift_const_1 (enum rtx_code code, 
machine_mode result_mode,
   if (complement_p)
break;
 
-  /* An arithmetic right shift of a quantity known to be -1 or 0
-is a no-op.  */
-  if (code == ASHIFTRT
- && (num_sign_bit_copies (varop, shift_mode)
- == GET_MODE_PRECISION (shift_mode)))
+  if (shift_mode == shift_unit_mode)
{
- count = 0;
- break;
-   }
+ /* An arithmetic right shift of a quantity known to be -1 or 0
+is a no-op.  */
+ if (code == ASHIFTRT
+ && (num_sign_bit_copies (varop, shift_unit_mode)
+ == GET_MODE_PRECISION (shift_unit_mode)))
+   {
+ count = 0;
+ break;
+   }
 
-  /* If we are doing an arithmetic right shift and discarding all but
-the sign bit copies, this is equivalent to doing a shift by the
-bitsize minus one.  Convert it into that shift because it will often
-allow other simplifications.  */
-
-  if (code == ASHIFTRT
- && (count + num_sign_bit_copies (varop, shift_mode)
- >= GET_MODE_PRECISION (shift_mode)))
-   count = GET_MODE_PRECISION (shift_mode) - 1;
-
-  /* We simplify the tests below and elsewhere by converting
-ASHIFTRT to LSHIFTRT if we know the sign bit is clear.
-`make_compound_operation' will convert it to an ASHIFTRT for
-those machines (such a

[patch] remove more GCJ references

2016-11-15 Thread Matthias Klose
This patch removes some references to gcj in the top level and config
directories and in the gcc documentation.  The change to the config directory
requires regenerating aclocal.m4 and configure in each subdirectory.

Ok for the trunk?

Matthias



2016-11-14  Matthias Klose  

	* config-ml.in: Remove references to GCJ.
	* configure.ac: Likewise.
	* configure: Regenerate.

config/

2016-11-14  Matthias Klose  

	multi.m4: Don't set GCJ.

gcc/

2016-11-14  Matthias Klose  

	* doc/install.texi: Remove references to gcj/libjava.
	* doc/invoke.texi: Likewise.

Index: config/multi.m4
===
--- config/multi.m4	(revision 242381)
+++ config/multi.m4	(working copy)
@@ -64,5 +64,4 @@
 CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
 CC="$CC"
 CXX="$CXX"
-GFORTRAN="$GFORTRAN"
-GCJ="$GCJ"])])dnl
+GFORTRAN="$GFORTRAN"])])dnl
Index: config-ml.in
===
--- config-ml.in	(revision 242381)
+++ config-ml.in	(working copy)
@@ -511,7 +511,6 @@
 ADAFLAGS="$(ADAFLAGS) $${flags}" \
 prefix="$(prefix)" \
 exec_prefix="$(exec_prefix)" \
-GCJFLAGS="$(GCJFLAGS) $${flags}" \
 GOCFLAGS="$(GOCFLAGS) $${flags}" \
 CXXFLAGS="$(CXXFLAGS) $${flags}" \
 LIBCFLAGS="$(LIBCFLAGS) $${flags}" \
@@ -746,13 +745,12 @@
 break
   fi
 done
-ml_config_env='CC="${CC_}$flags" CXX="${CXX_}$flags" F77="${F77_}$flags" GCJ="${GCJ_}$flags" GFORTRAN="${GFORTRAN_}$flags" GOC="${GOC_}$flags"'
+ml_config_env='CC="${CC_}$flags" CXX="${CXX_}$flags" F77="${F77_}$flags" GFORTRAN="${GFORTRAN_}$flags" GOC="${GOC_}$flags"'
 
 if [ "${with_target_subdir}" = "." ]; then
 	CC_=$CC' '
 	CXX_=$CXX' '
 	F77_=$F77' '
-	GCJ_=$GCJ' '
 	GFORTRAN_=$GFORTRAN' '
 	GOC_=$GOC' '
 else
@@ -795,18 +793,6 @@
 	  esac
 	done
 
-	GCJ_=
-	for arg in ${GCJ}; do
-	  case $arg in
-	  -[BIL]"${ML_POPDIR}"/*)
-	GCJ_="${GCJ_}"`echo "X${arg}" | sed -n "s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n "s/X-[BIL]${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
-	  "${ML_POPDIR}"/*)
-	GCJ_="${GCJ_}"`echo "X${arg}" | sed -n "s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n "s/X${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
-	  *)
-	GCJ_="${GCJ_}${arg} " ;;
-	  esac
-	done
-
 	GFORTRAN_=
 	for arg in ${GFORTRAN}; do
 	  case $arg in
Index: configure.ac
===
--- configure.ac	(revision 242381)
+++ configure.ac	(working copy)
@@ -1256,7 +1256,6 @@
   AS_FOR_BUILD=${AS_FOR_BUILD-as}
   CC_FOR_BUILD=${CC_FOR_BUILD-gcc}
   CXX_FOR_BUILD=${CXX_FOR_BUILD-g++}
-  GCJ_FOR_BUILD=${GCJ_FOR_BUILD-gcj}
   GFORTRAN_FOR_BUILD=${GFORTRAN_FOR_BUILD-gfortran}
   GOC_FOR_BUILD=${GOC_FOR_BUILD-gccgo}
   DLLTOOL_FOR_BUILD=${DLLTOOL_FOR_BUILD-dlltool}
@@ -1270,7 +1269,6 @@
   AS_FOR_BUILD="\$(AS)"
   CC_FOR_BUILD="\$(CC)"
   CXX_FOR_BUILD="\$(CXX)"
-  GCJ_FOR_BUILD="\$(GCJ)"
   GFORTRAN_FOR_BUILD="\$(GFORTRAN)"
   GOC_FOR_BUILD="\$(GOC)"
   DLLTOOL_FOR_BUILD="\$(DLLTOOL)"
@@ -3183,7 +3181,6 @@
 AC_SUBST(CXXFLAGS_FOR_BUILD)
 AC_SUBST(CXX_FOR_BUILD)
 AC_SUBST(DLLTOOL_FOR_BUILD)
-AC_SUBST(GCJ_FOR_BUILD)
 AC_SUBST(GFORTRAN_FOR_BUILD)
 AC_SUBST(GOC_FOR_BUILD)
 AC_SUBST(LDFLAGS_FOR_BUILD)
@@ -3293,7 +3290,6 @@
 NCN_STRICT_CHECK_TARGET_TOOLS(CC_FOR_TARGET, cc gcc)
 NCN_STRICT_CHECK_TARGET_TOOLS(CXX_FOR_TARGET, c++ g++ cxx gxx)
 NCN_STRICT_CHECK_TARGET_TOOLS(GCC_FOR_TARGET, gcc, ${CC_FOR_TARGET})
-NCN_STRICT_CHECK_TARGET_TOOLS(GCJ_FOR_TARGET, gcj)
 NCN_STRICT_CHECK_TARGET_TOOLS(GFORTRAN_FOR_TARGET, gfortran)
 NCN_STRICT_CHECK_TARGET_TOOLS(GOC_FOR_TARGET, gccgo)
 
Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi	(revision 242381)
+++ gcc/doc/install.texi	(working copy)
@@ -338,10 +338,6 @@
 Used by various scripts to generate some files included in SVN (mainly
 Unicode-related and rarely changing) from source tables.
 
-@item @command{jar}, or InfoZIP (@command{zip} and @command{unzip})
-
-Necessary to build libgcj, the GCJ runtime.
-
 @end table
 
 Several support libraries are necessary to build GCC, some are required,
@@ -2139,240 +2135,6 @@
 tools.
 @end table
 
-@subheading Java-Specific Options
-
-The following option applies to the build of the Java front end.
-
-@table @code
-@item --disable-libgcj
-Specify that the run-time libraries
-used by GCJ should not be built.  This is useful in case you intend
-to use GCJ with some other run-time, or you're going to install it
-separately, or it just happens not to build on your particular
-machine.  I

[committed] Fix a GET_MODE_CLASS typo in mem_loc_descriptor

2016-11-15 Thread Richard Sandiford
...it should have been checking the size instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Committed as obvious.

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* dwarf2out.c (mem_loc_descriptor): Fix GET_MODE_CLASS/
GET_MODE_SIZE typo.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 58a5e1a..a7344ca 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -14976,7 +14976,7 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
   if ((!dwarf_strict || dwarf_version >= 5)
  && SCALAR_INT_MODE_P (mode))
{
- if (GET_MODE_CLASS (mode) > DWARF2_ADDR_SIZE)
+ if (GET_MODE_SIZE (mode) > DWARF2_ADDR_SIZE)
{
  op = DW_OP_div;
  goto do_binop;



[PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Bernd Edlinger
Hi!

The genattrtab build-tool uses way too much memory in general.
I think there is no other build step that uses more memory.

On the current trunk it takes around 700MB to build the
ARM latency tab files.  I debugged that yesterday
and found that this can be reduced to 8MB (!).  Yes, really.

So the attached patch does try really hard to hash and re-use
all ever created rtx objects.
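
(The underlying idea is plain hash-consing.  A minimal, self-contained sketch
of it in C - made-up types and names, not the genattrtab data structures:)

#include <stdint.h>
#include <stdlib.h>

/* Look every (code, op0, op1) triple up in a hash table before allocating,
   so structurally identical expressions share one object and equality
   becomes pointer comparison.  */
struct node { int code; struct node *op0, *op1; struct node *next; };

#define TABLE_SIZE 1021
static struct node *table[TABLE_SIZE];

static unsigned int
node_hash (int code, struct node *op0, struct node *op1)
{
  return ((unsigned int) code
          + 31u * (unsigned int) (uintptr_t) op0
          + 37u * (unsigned int) (uintptr_t) op1) % TABLE_SIZE;
}

/* Return the unique node for (code, op0, op1), allocating only if no
   equal node exists yet.  */
static struct node *
intern_node (int code, struct node *op0, struct node *op1)
{
  unsigned int h = node_hash (code, op0, op1);
  for (struct node *n = table[h]; n; n = n->next)
    if (n->code == code && n->op0 == op0 && n->op1 == op1)
      return n;                 /* reuse, no new allocation */

  struct node *n = malloc (sizeof *n);
  n->code = code;
  n->op0 = op0;
  n->op1 = op1;
  n->next = table[h];
  table[h] = n;
  return n;
}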


Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
Is it OK for trunk?


Thanks
Bernd.

2016-11-15  Bernd Edlinger  

* genattrtab.c (attr_rtx_1): Ignore ATTR_PERMANENT_P on arguments.
Use DEF_ATTR_STRING for string arguments.
Use RTL_HASH for integer arguments.
(attr_eq): Simplify.
(attr_copy_rtx): Remove.
(make_canonical, get_attr_value): Use attr_equal_p.
(copy_boolean): Rehash NOT.
(simplify_test_exp_in_temp,
optimize_attrs): Remove call to attr_copy_rtx.
(attr_alt_intersection, attr_alt_union,
attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT.
(make_automaton_attrs): Use attr_eq.
Index: gcc/genattrtab.c
===
--- gcc/genattrtab.c	(revision 242335)
+++ gcc/genattrtab.c	(working copy)
@@ -395,14 +395,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 {
   rtx arg0 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
-  if (! ATTR_PERMANENT_P (arg0))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  return rt_val;
-	}
-
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -425,15 +417,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   rtx arg0 = va_arg (p, rtx);
   rtx arg1 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
-  if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  XEXP (rt_val, 1) = arg1;
-	  return rt_val;
-	}
-
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -481,6 +464,9 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   char *arg0 = va_arg (p, char *);
   char *arg1 = va_arg (p, char *);
 
+  arg0 = DEF_ATTR_STRING (arg0);
+  arg1 = DEF_ATTR_STRING (arg1);
+
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
 	if (h->hashcode == hashcode
@@ -497,6 +483,29 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 	  XSTR (rt_val, 1) = arg1;
 	}
 }
+  else if (GET_RTX_LENGTH (code) == 2
+	   && GET_RTX_FORMAT (code)[0] == 'i'
+	   && GET_RTX_FORMAT (code)[1] == 'i')
+{
+  int  arg0 = va_arg (p, int);
+  int  arg1 = va_arg (p, int);
+
+  hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
+  for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
+	if (h->hashcode == hashcode
+	&& GET_CODE (h->u.rtl) == code
+	&& XINT (h->u.rtl, 0) == arg0
+	&& XINT (h->u.rtl, 1) == arg1)
+	  return h->u.rtl;
+
+  if (h == 0)
+	{
+	  rtl_obstack = hash_obstack;
+	  rt_val = rtx_alloc (code);
+	  XINT (rt_val, 0) = arg0;
+	  XINT (rt_val, 1) = arg1;
+	}
+}
   else if (code == CONST_INT)
 {
   HOST_WIDE_INT arg0 = va_arg (p, HOST_WIDE_INT);
@@ -592,7 +601,7 @@ attr_printf (unsigned int len, const char *fmt, ..
 static rtx
 attr_eq (const char *name, const char *value)
 {
-  return attr_rtx (EQ_ATTR, DEF_ATTR_STRING (name), DEF_ATTR_STRING (value));
+  return attr_rtx (EQ_ATTR, name, value);
 }
 
 static const char *
@@ -646,89 +655,6 @@ attr_equal_p (rtx x, rtx y)
 		 && rtx_equal_p (x, y)));
 }
 
-/* Copy an attribute value expression,
-   descending to all depths, but not copying any
-   permanent hashed subexpressions.  */
-
-static rtx
-attr_copy_rtx (rtx orig)
-{
-  rtx copy;
-  int i, j;
-  RTX_CODE code;
-  const char *format_ptr;
-
-  /* No need to copy a permanent object.  */
-  if (ATTR_PERMANENT_P (orig))
-return orig;
-
-  code = GET_CODE (orig);
-
-  switch (code)
-{
-case REG:
-CASE_CONST_ANY:
-case SYMBOL_REF:
-case MATCH_TEST:
-case CODE_LABEL:
-case PC:
-case CC0:
-  return orig;
-
-default:
-  break;
-}
-
-  copy = rtx_alloc (code);
-  PUT_MODE (copy, GET_MODE (orig));
-  ATTR_IND_SIMPLIFIED_P (copy) = ATTR_IND_SIMPLIFIED_P (orig);
-  ATTR_CURR_SIMPLIFIED_P (copy) = ATTR_CURR_SIMPLIFIED_P (orig);
-  ATTR_PERMANENT_P (copy) = ATTR_PERMANENT_P (orig);
-
-  format_ptr = GET_RTX_FORMAT (GET_CODE (copy));
-
-  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (copy)); i++)
-{
-  switch (*format_ptr++)
-	{
-	case 'e':
-	  XEXP (copy, i) = XEXP (orig, i);
-	  if (XEXP (orig, 

[patch] [5/6] fix installation documentation

2016-11-15 Thread Matthias Klose
Seen while cleaning up the GCJ references.  The last section of the gcj install
docs doesn't belong there, but in the cross-install section.  The change is
not necessary on the trunk once the gcj docs are removed.  Fixed this on the 5
and 6 branches by moving the section.

Committed as obvious.

Matthias

2016-11-15  Matthias Klose  

* doc/install.texi: Move the 'Overriding configure test results'
subsub section to the 'Cross-Compiler-Specific Options' sub section.
Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi	(revision 242386)
+++ gcc/doc/install.texi	(working copy)
@@ -2081,6 +2081,36 @@
 tools.
 @end table
 
+@subsubheading Overriding @command{configure} test results
+
+Sometimes, it might be necessary to override the result of some
+@command{configure} test, for example in order to ease porting to a new
+system or work around a bug in a test.  The toplevel @command{configure}
+script provides three variables for this:
+
+@table @code
+
+@item build_configargs
+@cindex @code{build_configargs}
+The contents of this variable is passed to all build @command{configure}
+scripts.
+
+@item host_configargs
+@cindex @code{host_configargs}
+The contents of this variable is passed to all host @command{configure}
+scripts.
+
+@item target_configargs
+@cindex @code{target_configargs}
+The contents of this variable is passed to all target @command{configure}
+scripts.
+
+@end table
+
+In order to avoid shell and @command{make} quoting issues for complex
+overrides, you can pass a setting for @env{CONFIG_SITE} and set
+variables in the site file.
+
 @subheading Java-Specific Options
 
 The following option applies to the build of the Java front end.
@@ -2315,36 +2345,7 @@
 
 @end table
 
-@subsubheading Overriding @command{configure} test results
 
-Sometimes, it might be necessary to override the result of some
-@command{configure} test, for example in order to ease porting to a new
-system or work around a bug in a test.  The toplevel @command{configure}
-script provides three variables for this:
-
-@table @code
-
-@item build_configargs
-@cindex @code{build_configargs}
-The contents of this variable is passed to all build @command{configure}
-scripts.
-
-@item host_configargs
-@cindex @code{host_configargs}
-The contents of this variable is passed to all host @command{configure}
-scripts.
-
-@item target_configargs
-@cindex @code{target_configargs}
-The contents of this variable is passed to all target @command{configure}
-scripts.
-
-@end table
-
-In order to avoid shell and @command{make} quoting issues for complex
-overrides, you can pass a setting for @env{CONFIG_SITE} and set
-variables in the site file.
-
 @html
 
 


Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Segher Boessenkool
Hi!

On Mon, Nov 14, 2016 at 04:43:35PM -0700, Kelvin Nilsen wrote:
>   * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
>   (UNSPEC_CMPRB2): New unspec value.

I wonder if you really need both?  The number of arguments will tell
which is which, anyway?

>   (cmprb_p): New expansion.

Not such a great name (now you get a gen_cmprb_p function which isn't
a predicate itself).

>   (CMPRB): Add byte-in-range built-in function.
>   (CMBRB2): Add byte-in-either_range built-in function.
>   (CMPEQB): Add byte-in-set builtin-in function.

"builtin-in", and you typoed an underscore?

> +;; Predicate: test byte within range.
> +;; Return in target register operand 0 a non-zero value iff the byte
> +;; held in bits 24:31 of operand 1 is within the inclusive range
> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
> +;; bits 8:15.
> +(define_expand "cmprb_p"

It seems you got the bit numbers mixed up.  Maybe just call it the low
byte, and the byte just above?

(And it always sets 0 or 1 here, you might want to make that more explicit).

> +;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the

That's 4, i.e. 0b0100.

> +;; Set operand 0 register to non-zero value iff the CR register named
> +;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set.
> +(define_insn "*setb"

LT is 8, GT is 4.  If LT is set it returns -1, otherwise if GT is set it
returns 1, otherwise it returns 0.
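
(A plain-C model of that behaviour, with the CR-field bit values given above
(LT = 8, GT = 4); this is only an illustration, not the rs6000 code.)

static int
setb_model (unsigned int cr_field)
{
  if (cr_field & 8)     /* LT set -> -1 */
    return -1;
  if (cr_field & 4)     /* GT set -> 1 */
    return 1;
  return 0;
}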

> +;; Predicate: test byte within two ranges.
> +;; Return in target register operand 0 a non-zero value iff the byte
> +;; held in bits 24:31 of operand 1 is within the inclusive range
> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
> +;; bits 8:15 or if the byte is within the inclusive range bounded
> +;; below by operand 2's bits 16:23 and above by operand 2's bits 24:31.
> +(define_expand "cmprb2_p"

The high bound is higher in the reg than the low bound.  See the example
where 0x3930 is used to do isdigit (and yes 0x3039 would be much more
fun, but alas).
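
(A plain-C model of the operand-2 encoding for the single-range case - low
bound in the least significant byte, high bound in the byte above it, so
0x3930 expresses the isdigit range '0'..'9'.  Illustration only, not the
builtin itself.)

#include <stdbool.h>

static bool
byte_in_range (unsigned char b, unsigned int range)
{
  unsigned char lo = range & 0xff;          /* e.g. 0x30, '0' */
  unsigned char hi = (range >> 8) & 0xff;   /* e.g. 0x39, '9' */
  return lo <= b && b <= hi;
}

/* byte_in_range ('5', 0x3930) is true; byte_in_range ('a', 0x3930) is false.  */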

> +;; Predicate: test byte membership within set of 8 bytes.
> +;; Return in target register operand 0 a non-zero value iff the byte
> +;; held in bits 24:31 of operand 1 equals at least one of the eight
> +;; byte values represented by the 64-bit register supplied as operand
> +;; 2.  Note that the 8 byte values held within operand 2 need not be
> +;; unique. 

(trailing space)

I wonder if we really need all these predicate expanders, or whether it
wouldn't be easier if the builtin handling code did the setb itself?


Segher


[PATCH] rs6000: Make deallocation of a large frame work (PR77687)

2016-11-15 Thread Segher Boessenkool
If we use ABI_V4 and we have a big stack frame, we end the epilogue
with a "mr 1,11" (or similar) instruction.  This instruction however
has no dependencies on the earlier restores from stack (done via r11),
so sched2 can end up reordering the insns.  That is bad because we
have no red zone, so you can then end up restoring from stack that has
already been deallocated.

This fixes it by making that restore depend on the memory accesses.

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?  Do we
want a testcase for this?


Segher


2016-11-15  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Emit the
stack_restore_tie insn instead of stack_tie, for the SVR4 and
SPE ABIs.
* config/rs6000/rs6000.md (stack_restore_tie): New define_insn.

---
 gcc/config/rs6000/rs6000.c  | 13 -
 gcc/config/rs6000/rs6000.md | 16 
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8a04248..2ceddfd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27487,7 +27487,11 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
 rtx frame_reg_rtx, HOST_WIDE_INT frame_off,
 unsigned updt_regno)
 {
-  rtx updt_reg_rtx;
+  /* If there is nothing to do, don't do anything.  */
+  if (frame_off == 0 && REGNO (frame_reg_rtx) == updt_regno)
+return NULL_RTX;
+
+  rtx updt_reg_rtx = gen_rtx_REG (Pmode, updt_regno);
 
   /* This blockage is needed so that sched doesn't decide to move
  the sp change before the register restores.  */
@@ -27495,18 +27499,17 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
   || (TARGET_SPE_ABI
  && info->spe_64bit_regs_used != 0
  && info->first_gp_reg_save != 32))
-rs6000_emit_stack_tie (frame_reg_rtx, frame_pointer_needed);
+return emit_insn (gen_stack_restore_tie (updt_reg_rtx, frame_reg_rtx,
+GEN_INT (frame_off)));
 
   /* If we are restoring registers out-of-line, we will be using the
  "exit" variants of the restore routines, which will reset the
  stack for us.  But we do need to point updt_reg into the
  right place for those routines.  */
-  updt_reg_rtx = gen_rtx_REG (Pmode, updt_regno);
-
   if (frame_off != 0)
 return emit_insn (gen_add3_insn (updt_reg_rtx,
 frame_reg_rtx, GEN_INT (frame_off)));
-  else if (REGNO (frame_reg_rtx) != updt_regno)
+  else
 return emit_move_insn (updt_reg_rtx, frame_reg_rtx);
 
   return NULL_RTX;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b3fe92a..a779f5c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12769,6 +12769,22 @@ (define_insn "stack_tie"
   ""
   [(set_attr "length" "0")])
 
+; Some 32-bit ABIs do not have a red zone, so the stack deallocation has to
+; stay behind all restores from the stack, it cannot be reordered to before
+; one.  See PR77687.
+(define_insn "stack_restore_tie"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
+   (plus:SI (match_operand:SI 1 "gpc_reg_operand" "r,r")
+(match_operand:SI 2 "reg_or_cint_operand" "O,rI")))
+   (set (mem:BLK (match_dup 0)) (const_int 0))
+   (set (mem:BLK (match_dup 1)) (const_int 0))]
+;   (clobber (mem:BLK (scratch:SI)))]
+  "TARGET_32BIT"
+  "@
+   mr %0,%1
+   add%I2 %0,%1,%2"
+  [(set_attr "type" "*,add")])
+
 (define_expand "epilogue"
   [(use (const_int 0))]
   ""
-- 
1.9.3



Re: [libstdc++, testsuite] Add dg-require-thread-fence

2016-11-15 Thread Jonathan Wakely

On 14/11/16 14:32 +0100, Christophe Lyon wrote:

On 20 October 2016 at 19:40, Jonathan Wakely  wrote:

On 20/10/16 10:33 -0700, Mike Stump wrote:


On Oct 20, 2016, at 9:34 AM, Jonathan Wakely  wrote:



On 20/10/16 09:26 -0700, Mike Stump wrote:


On Oct 20, 2016, at 5:20 AM, Jonathan Wakely  wrote:



I am considering leaving this in the ARM backend to force people to
think what they want to do about thread safety with statics and C++
on bare-metal systems.



The quoting makes it look like those are my words, but I was quoting
Ramana from https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html


Not quite in the GNU spirit?  The port people should decide the best way
to get as much functionality as possible and everything should just work, no
sharp edges.

Forcing people to think sounds like a sharp edge?



I'm inclined to agree, but we are talking about bare metal systems,



So?  gcc has been doing bare metal systems for more than 2 years now.  It
is pretty good at it.  All my primary targets today are themselves bare
metal systems (I test with newlib).


where there is no one-size-fits-all solution.



Configurations are like ice cream cones.  Everyone gets their flavor no
matter how weird or strange.  Putting nails in a cone because you don't know
if they like vanilla or chocolate isn't reasonable.  If you want, make two
flavors, and vend two, if you want to just do one, pick the flavor and vend
it.  Put an enum #define default_flavor vanilla, and you then have support
for any flavor you want.  Want to add a configure option for the flavor
select, add it.  You want to make a -mflavor=chocolate option, add it.  gcc
is literally littered with these things.



Like I said, you can either build the library with
-fno-threadsafe-statics or you can provide a definition of the missing
symbol.


I gave this a try (using CXXFLAGS_FOR_TARGET=-fno-threadsafe-statics).
It seems to do the trick indeed: almost all tests now pass, and the flag is
added to testcase compilation.

Among the 6 remaining failures, I noticed these two:
- experimental/type_erased_allocator/2.cc: still complains about the missing
__sync_synchronize. Does it need dg-require-thread-fence?


Yes, I think that test actually uses atomics directly, so does depend
on the fence.


- abi/header_cxxabi.c complains because the option is not valid for C.
I can see the test is already skipped for other C++-only options: is it OK
if I submit a patch to skip it if -fno-threadsafe-statics is used?


Yes, it makes sense there too.


I think I'm going to use this flag in validations from now on (target
arm-none-eabi only, with default mode/cpu/fpu).


Thanks for the update on this.



Re: [patch] Disable LTO note about strict aliasing

2016-11-15 Thread Richard Biener
On Tue, Nov 15, 2016 at 10:47 AM, Eric Botcazou  wrote:
>> Can you verify that a TU compiled with -fstrict-aliasing will link as
>> if -fno-strict-aliasing if -fno-strict-aliasing is specified at link time?
>
> Yes, it does:
>
> eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -c t.c -O2 -flto
> eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -o t t.o -O2 -save-temps -fverbose-asm && grep strict-aliasing t.ltrans0.s
> # -fstore-merging -fstrict-aliasing -fstrict-overflow
> eric@polaris:~/build/gcc/native> ~/install/gcc/bin/gcc -o t t.o -O2 -fno-strict-aliasing -save-temps -fverbose-asm && grep strict-aliasing t.ltrans0.s
> # -fno-strict-aliasing -fverbose-asm -fltrans t.ltrans0.o

Yes, I know -fno-strict-aliasing is globally set, but will all -fstrict-aliasing
optimization attributes on functions be "overwritten"?  That is, are you
sure that when optimizing a function originally compiled with -fstrict-aliasing
that -fno-strict-aliasing is in effect?

>> That said, -Wno-lto-type-mismatch can be used to disable the warning as
>> well.
>
> Right, but the wording ("code may be misoptimized") is a bit scary, so I'd
> rather avoid it when possible.
>
> --
> Eric Botcazou


Re: Fix simplify_shift_const_1 handling of vector shifts

2016-11-15 Thread Segher Boessenkool
Hi Richard,

On Tue, Nov 15, 2016 at 10:49:26AM +, Richard Sandiford wrote:
> simplify_shift_const_1 handles both shifts of scalars by scalars
> and shifts of vectors by scalars.  For vectors this means that
> each element is shifted by the same amount.
> 
> However:
> 
> (a) the two cases weren't always distinguished, so we'd try
> things for vectors that only made sense for scalars.
> 
> (b) a lot of the range and bitcount checks were based on the
> bitsize or precision of the full shifted operand, rather
> than the mode of each element.
> 
> Fixing (b) accidentally exposed more optimisation opportunities,
> although that wasn't the point of the patch.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Yes please.  Thanks!


Segher


Re: [Patch] Remove variant, variant and variant<>

2016-11-15 Thread Jonathan Wakely

On 12/11/16 12:11 -0800, Tim Shen wrote:

At Issaquah we decided to remove the supports above.


OK with a suitable ChangeLog, thanks.



Re: [patch] Disable LTO note about strict aliasing

2016-11-15 Thread Eric Botcazou
> Yes, I know -fno-strict-aliasing is globally set, but will all
> -fstrict-aliasing optimization attributes on functions be "overwritten"? 
> That is, are you sure that when optimizing a function originally compiled
> with -fstrict-aliasing that -fno-strict-aliasing is in effect?

Do you mean with the "optimize" attribute or somesuch?  If so, I suppose not, 
if the attributes are properly saved and restored; otherwise, I don't really 
understand the question because I don't think that LTO saves the entire option 
state on a per-function basis.

-- 
Eric Botcazou


Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Richard Sandiford
Bernd Edlinger  writes:
> Hi!
>
> The genattrtab build-tool uses way too much memory in general.
> I think there is no other build step that uses more memory.
>
> On the currently trunk it takes around 700MB to build the
> ARM latency tab files.  I debugged that yesterday
> and found that this can be reduced to 8MB (!).  Yes, really.
>
> So the attached patch does try really hard to hash and re-use
> all ever created rtx objects.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
> Is it OK for trunk?

Just to check: does this produce the same output as before?
And did you notice any difference in the time genattrtab
takes to run?

> Index: gcc/genattrtab.c
> ===
> --- gcc/genattrtab.c  (revision 242335)
> +++ gcc/genattrtab.c  (working copy)
> @@ -395,14 +395,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
>  {
>rtx arg0 = va_arg (p, rtx);
>  
> -  /* A permanent object cannot point to impermanent ones.  */
> -  if (! ATTR_PERMANENT_P (arg0))
> - {
> -   rt_val = rtx_alloc (code);
> -   XEXP (rt_val, 0) = arg0;
> -   return rt_val;
> - }
> -
>hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
>for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
>   if (h->hashcode == hashcode
> @@ -425,15 +417,6 @@ attr_rtx_1 (enum rtx_code code, va_list p)
>rtx arg0 = va_arg (p, rtx);
>rtx arg1 = va_arg (p, rtx);
>  
> -  /* A permanent object cannot point to impermanent ones.  */
> -  if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
> - {
> -   rt_val = rtx_alloc (code);
> -   XEXP (rt_val, 0) = arg0;
> -   XEXP (rt_val, 1) = arg1;
> -   return rt_val;
> - }
> -
>hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
>for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
>   if (h->hashcode == hashcode

ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
so that x != y when x or y is "permanent" implies that the attributes
must be different.  This lets attr_equal_p avoid a recursive walk:

static int
attr_equal_p (rtx x, rtx y)
{
  return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
 && rtx_equal_p (x, y)));
}

Does the patch still guarantee that?

Thanks,
Richard


Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-15 Thread Segher Boessenkool
Hi Andrew,

Thanks for the patch and looking into this.

On Mon, Nov 14, 2016 at 04:57:58PM +, Andrew Stubbs wrote:
> The testcase powerpc/fusion3.c causes an ICE when compiled with 
> -msoft-float.

> Basically, the problem is that the peephole optimization tries to create 
> a Power9 Fusion instruction, but those do not support SF values in 
> integer registers (AFAICT).

The peepholes do not support it, or maybe the define_insns do not either.
The machine of course will not care.

> So, presumably, I need to adjust either the predicate or the condition 
> of the peephole rules.

Yes.

> The predicate used is "toc_fusion_or_p9_reg_operand", and this might be 
> the root cause, but I don't know the architecture well enough to be 
> sure.

This fusion is quite simple really.  Offset addressing insns can only
access a 16-bit range, but instead of writing e.g.

lwz 0,0x12345678(3)

you can do

addis 4,3,0x1234
lwz 0,0x5678(4)

and the processor will execute it faster, essentially as if it were
written as the first example.  See 2.1.1 in Power ISA 3.0.
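
(Arithmetically, the addis supplies the high 16 bits and the load the low 16:
(0x1234 << 16) + 0x5678 = 0x12345678.  When the low half is 0x8000 or more,
the high part has to be incremented by one, because the 16-bit offset in the
load is sign-extended.)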

> The predicate code seems to suggest that "toc_fusion", whatever 
> that is, should be able to do this, but the insn produced by the 
> peephole uses only UNSPEC_FUSION_P9, which does not. Perhaps this 
> predicate is inappropriate for the P9 Fusion peephole, or perhaps it 
> needs to be taught about this corner case?

One of those yes.

> In any case, I don't want to change the predicate without being sure 
> what it does (here and elsewhere), so the attached patch solves the 
> problem by changing the condition.
> 
> Is this OK, or do I need to do something less blunt?

We can have floats in GPRs even without TARGET_SOFT_FLOAT, so this
does not even work?  And yes this is blunt :-)


Segher


An alternative fix for PR70944

2016-11-15 Thread Richard Sandiford
The transformations made by make_compound_operation apply
only to scalar integer modes.  The fix for PR70944 had enforced
that by returning early for vector modes at the top of the
function.  However, the function is supposed to be recursive,
so we should continue to look at integer suboperands even if
the outer operation is a vector one.

This patch instead splits out the non-recursive parts
of make_compound_operation into a subroutine and checks
that the mode is a scalar integer before calling it.
The patch was originally written to help with the later
conversion to static type checking of mode classes, but it
also happened to reenable optimisation of things like
vec_duplicate operands.

Note that the gen_lowparts in the PLUS and MINUS cases
were redundant, since new_rtx already had mode "mode"
at those points.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* combine.c (maybe_swap_commutative_operands): New function.
(combine_simplify_rtx): Use it.
(make_compound_operation_int): New function, split out of...
(make_compound_operation): ...here.  Use
maybe_swap_commutative_operands for both.

diff --git a/gcc/combine.c b/gcc/combine.c
index 66f628f..0665f38 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5479,6 +5479,21 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
in_cond, int unique_copy)
   return x;
 }
 
+/* If X is a commutative operation whose operands are not in the canonical
+   order, use substitutions to swap them.  */
+
+static void
+maybe_swap_commutative_operands (rtx x)
+{
+  if (COMMUTATIVE_ARITH_P (x)
+  && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)))
+{
+  rtx temp = XEXP (x, 0);
+  SUBST (XEXP (x, 0), XEXP (x, 1));
+  SUBST (XEXP (x, 1), temp);
+}
+}
+
 /* Simplify X, a piece of RTL.  We just operate on the expression at the
outer level; call `subst' to simplify recursively.  Return the new
expression.
@@ -5498,13 +5513,7 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
 
   /* If this is a commutative operation, put a constant last and a complex
  expression first.  We don't need to do this for comparisons here.  */
-  if (COMMUTATIVE_ARITH_P (x)
-  && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)))
-{
-  temp = XEXP (x, 0);
-  SUBST (XEXP (x, 0), XEXP (x, 1));
-  SUBST (XEXP (x, 1), temp);
-}
+  maybe_swap_commutative_operands (x);
 
   /* Try to fold this expression in case we have constants that weren't
  present before.  */
@@ -7747,55 +7756,38 @@ extract_left_shift (rtx x, int count)
   return 0;
 }
 
-/* Look at the expression rooted at X.  Look for expressions
-   equivalent to ZERO_EXTRACT, SIGN_EXTRACT, ZERO_EXTEND, SIGN_EXTEND.
-   Form these expressions.
-
-   Return the new rtx, usually just X.
+/* Subroutine of make_compound_operation.  *X_PTR is the rtx at the current
+   level of the expression and MODE is its mode.  IN_CODE is as for
+   make_compound_operation.  *NEXT_CODE_PTR is the value of IN_CODE
+   that should be used when recursing on operands of *X_PTR.
 
-   Also, for machines like the VAX that don't have logical shift insns,
-   try to convert logical to arithmetic shift operations in cases where
-   they are equivalent.  This undoes the canonicalizations to logical
-   shifts done elsewhere.
+   There are two possible actions:
 
-   We try, as much as possible, to re-use rtl expressions to save memory.
+   - Return null.  This tells the caller to recurse on *X_PTR with IN_CODE
+ equal to *NEXT_CODE_PTR, after which *X_PTR holds the final value.
 
-   IN_CODE says what kind of expression we are processing.  Normally, it is
-   SET.  In a memory address it is MEM.  When processing the arguments of
-   a comparison or a COMPARE against zero, it is COMPARE, or EQ if more
-   precisely it is an equality comparison against zero.  */
+   - Return a new rtx, which the caller returns directly.  */
 
-rtx
-make_compound_operation (rtx x, enum rtx_code in_code)
+static rtx
+make_compound_operation_int (machine_mode mode, rtx *x_ptr,
+enum rtx_code in_code,
+enum rtx_code *next_code_ptr)
 {
+  rtx x = *x_ptr;
+  enum rtx_code next_code = *next_code_ptr;
   enum rtx_code code = GET_CODE (x);
-  machine_mode mode = GET_MODE (x);
   int mode_width = GET_MODE_PRECISION (mode);
   rtx rhs, lhs;
-  enum rtx_code next_code;
-  int i, j;
   rtx new_rtx = 0;
+  int i;
   rtx tem;
-  const char *fmt;
   bool equality_comparison = false;
 
-  /* PR rtl-optimization/70944.  */
-  if (VECTOR_MODE_P (mode))
-return x;
-
-  /* Select the code to be used in recursive calls.  Once we are inside an
- address, we stay there.  If we have 

Add a load_extend_op wrapper

2016-11-15 Thread Richard Sandiford
LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
than a word.  However, callers weren't consistent about which of these
checks they made beforehand, and also weren't consistent about whether
"smaller" was based on (bit)size or precision (IMO it's the latter).
This patch adds a wrapper to try to make the macro easier to use.
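
For reference, the rtlanal.c hunk is cut off in the quoted patch below; the
new wrapper presumably amounts to something like this sketch (the exact guard
is my assumption, based on the description above):

  /* Sketch only, not the committed code: return the LOAD_EXTEND_OP for MODE
     when it applies, i.e. for scalar integer modes narrower than a word,
     and UNKNOWN otherwise.  */
  rtx_code
  load_extend_op (machine_mode mode)
  {
    if (SCALAR_INT_MODE_P (mode)
        && GET_MODE_PRECISION (mode) < BITS_PER_WORD)
      return LOAD_EXTEND_OP (mode);
    return UNKNOWN;
  }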

LOAD_EXTEND_OP is often used to disable transformations that aren't
beneficial when extends from memory are free, so being stricter about
the check accidentally exposed more optimisation opportunities.

"SUBREG_BYTE (...) == 0" and subreg_lowpart_p are implied by
paradoxical_subreg_p, so the patch also removes some redundant tests.

The patch doesn't change reload, since different checks could have
unforeseen consequences.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtl.h (load_extend_op): Declare.
* rtlanal.c (load_extend_op): New function.
(nonzero_bits1): Use it.
(num_sign_bit_copies1): Likewise.
* cse.c (cse_insn): Likewise.
* fold-const.c (fold_single_bit_test): Likewise.
(fold_unary_loc): Likewise.
* fwprop.c (free_load_extend): Likewise.
* postreload.c (reload_cse_simplify_set): Likewise.
(reload_cse_simplify_operands): Likewise.
* combine.c (try_combine): Likewise.
(simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
subreg_lowpart_p checks.

diff --git a/gcc/combine.c b/gcc/combine.c
index 0665f38..d685f44 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3738,7 +3738,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
{
  /* Or as a SIGN_EXTEND if LOAD_EXTEND_OP says that that's
 what it really is.  */
- if (LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (*split)))
+ if (load_extend_op (GET_MODE (SUBREG_REG (*split)))
  == SIGN_EXTEND)
SUBST (*split, gen_rtx_SIGN_EXTEND (split_mode,
SUBREG_REG (*split)));
@@ -6794,16 +6794,13 @@ simplify_set (rtx x)
  would require a paradoxical subreg.  Replace the subreg with a
  zero_extend to avoid the reload that would otherwise be required.  */
 
-  if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (src)))
-  && LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (src))) != UNKNOWN
-  && SUBREG_BYTE (src) == 0
-  && paradoxical_subreg_p (src)
-  && MEM_P (SUBREG_REG (src)))
+  enum rtx_code extend_op;
+  if (paradoxical_subreg_p (src)
+  && MEM_P (SUBREG_REG (src))
+  && (extend_op = load_extend_op (GET_MODE (SUBREG_REG (src != UNKNOWN)
 {
   SUBST (SET_SRC (x),
-gen_rtx_fmt_e (LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (src))),
-   GET_MODE (src), SUBREG_REG (src)));
+gen_rtx_fmt_e (extend_op, GET_MODE (src), SUBREG_REG (src)));
 
   src = SET_SRC (x);
 }
diff --git a/gcc/cse.c b/gcc/cse.c
index 11b8fbe..72f1c4f 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4915,11 +4915,10 @@ cse_insn (rtx_insn *insn)
 also have such operations, but this is only likely to be
 beneficial on these machines.  */
 
+  rtx_code extend_op;
   if (flag_expensive_optimizations && src_related == 0
- && (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
- && GET_MODE_CLASS (mode) == MODE_INT
  && MEM_P (src) && ! do_not_record
- && LOAD_EXTEND_OP (mode) != UNKNOWN)
+ && (extend_op = load_extend_op (mode)) != UNKNOWN)
{
  struct rtx_def memory_extend_buf;
  rtx memory_extend_rtx = &memory_extend_buf;
@@ -4928,7 +4927,7 @@ cse_insn (rtx_insn *insn)
  /* Set what we are trying to extend and the operation it might
 have been extended with.  */
  memset (memory_extend_rtx, 0, sizeof (*memory_extend_rtx));
- PUT_CODE (memory_extend_rtx, LOAD_EXTEND_OP (mode));
+ PUT_CODE (memory_extend_rtx, extend_op);
  XEXP (memory_extend_rtx, 0) = src;
 
  for (tmode = GET_MODE_WIDER_MODE (mode);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e14471e..c597414 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -6725,7 +6725,7 @@ fold_single_bit_test (location_t loc, enum tree_code code,
   /* If we are going to be able to omit the AND below, we must do our
 operations as unsigned.  If we must use the AND, we have a choice.
 Normally unsigned is faster, but for some machines signed is.  */
-  ops_unsigned = (LOAD_EXTEND_OP (operand_mode) == SIGN_EXTEND
+  ops_unsigned = (load_extend_op (operand_mode) == SIGN_EXTEND
  && !flag_syntax_o

Fix nb_iterations calculation in tree-vect-loop-manip.c

2016-11-15 Thread Richard Sandiford
We previously stored the number of loop iterations rather
than the number of latch iterations (a loop whose body executes
niters times executes its latch only niters - 1 times).

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Set
nb_iterations to the number of latch iterations rather than the
number of loop iterations.

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 6bfd332..4c6b8c7 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -285,7 +285,10 @@ slpeel_make_loop_iterate_ntimes (struct loop *loop, tree 
niters)
 LOCATION_LINE (loop_loc));
   dump_gimple_stmt (MSG_NOTE, TDF_SLIM, cond_stmt, 0);
 }
-  loop->nb_iterations = niters;
+
+  /* Record the number of latch iterations.  */
+  loop->nb_iterations = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
+build_int_cst (TREE_TYPE (niters), 1));
 }
 
 /* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.



Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Andrew Senkevich
2016-11-11 14:16 GMT+03:00 Uros Bizjak :
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> --- a/gcc/init-regs.c
> +++ b/gcc/init-regs.c
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
>
> These are middle-end changes, you will need a separate review for these.

Who could review these changes?


--
WBR,
Andrew


Fix nb_iterations_estimate calculation in tree-vect-loop.c

2016-11-15 Thread Richard Sandiford
vect_transform_loop has to reduce three iteration counts by
the vectorisation factor: nb_iterations_upper_bound,
nb_iterations_likely_upper_bound and nb_iterations_estimate.
All three are latch execution counts rather than loop body
execution counts.  The calculations were taking that into
account for the first two, but not for nb_iterations_estimate.

This patch updates the way the calculations are done to fix
this and to add a bit more commentary about what is going on.
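
As a concrete illustration (the numbers are mine, not from the patch): with an
upper bound of 103 latch iterations, i.e. at most 104 executions of the loop
body, and vf = 4, the vector body runs at most 104 / 4 = 26 times, so its
latch bound is (103 + 1) / 4 - 1 = 25.  With peeling for gaps one scalar
iteration is always left for the epilogue, so the vector body runs at most
floor (103 / 4) = 25 times and the latch bound is 24.  nb_iterations_estimate
needs exactly the same conversion, which is what was missing.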

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* tree-vect-loop.c (vect_transform_loop): Protect the updates of
all three iteration counts with an any_* test.  Use a single update
for each count.  Fix the calculation of nb_iterations_estimate.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1cd9c72..53570f3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7043,27 +7043,25 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
  expected_iterations / vf);
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-{
-  if (loop->nb_iterations_upper_bound != 0)
-loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
-  if (loop->nb_iterations_likely_upper_bound != 0)
-loop->nb_iterations_likely_upper_bound
-  = loop->nb_iterations_likely_upper_bound - 1;
-}
-  loop->nb_iterations_upper_bound
-= wi::udiv_floor (loop->nb_iterations_upper_bound + 1, vf) - 1;
-  loop->nb_iterations_likely_upper_bound
-= wi::udiv_floor (loop->nb_iterations_likely_upper_bound + 1, vf) - 1;
-
+  /* The minimum number of iterations performed by the epilogue.  This
+ is 1 when peeling for gaps because we always need a final scalar
+ iteration.  */
+  int min_epilogue_iters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0;
+  /* +1 to convert latch counts to loop iteration counts,
+ -min_epilogue_iters to remove iterations that cannot be performed
+   by the vector code.  */
+  int bias = 1 - min_epilogue_iters;
+  /* In these calculations the "- 1" converts loop iteration counts
+ back to latch counts.  */
+  if (loop->any_upper_bound)
+loop->nb_iterations_upper_bound
+  = wi::udiv_floor (loop->nb_iterations_upper_bound + bias, vf) - 1;
+  if (loop->any_likely_upper_bound)
+loop->nb_iterations_likely_upper_bound
+  = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + bias, vf) - 1;
   if (loop->any_estimate)
-{
-  loop->nb_iterations_estimate
-   = wi::udiv_floor (loop->nb_iterations_estimate, vf);
-   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-  && loop->nb_iterations_estimate != 0)
-loop->nb_iterations_estimate = loop->nb_iterations_estimate - 1;
-}
+loop->nb_iterations_estimate
+  = wi::udiv_floor (loop->nb_iterations_estimate + bias, vf) - 1;
 
   if (dump_enabled_p ())
 {



Re: An alternative fix for PR70944

2016-11-15 Thread Segher Boessenkool
On Tue, Nov 15, 2016 at 12:33:06PM +, Richard Sandiford wrote:
> The transformations made by make_compound_operation apply
> only to scalar integer modes.  The fix for PR70944 had enforced
> that by returning early for vector modes at the top of the
> function.  However, the function is supposed to be recursive,
> so we should continue to look at integer suboperands even if
> the outer operation is a vector one.
> 
> This patch instead splits out the non-recursive parts
> of make_compound_operation into a subroutine and checks
> that the mode is a scalar integer before calling it.
> The patch was originally written to help with the later
> conversion to static type checking of mode classes, but it
> also happened to reenable optimisation of things like
> vec_duplicate operands.
> 
> Note that the gen_lowparts in the PLUS and MINUS cases
> were redundant, since new_rtx already had mode "mode"
> at those points.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Yes, please do.  You can use maybe_swap_commutative_operands in
change_zero_ext as well, perhaps in more places, do you want to
take a look?

Thanks,


Segher


[committed] Fix up gcc.dg/uninit-pr78295.c on i686-linux

2016-11-15 Thread Jakub Jelinek
Hi!

This test fails on i686-linux (and perhaps on powerpc* too) due to -Wpsabi
warnings.  Fixed thusly, committed as obvious to trunk.

2016-11-15  Jakub Jelinek  

PR middle-end/78295
* gcc.dg/uninit-pr78295.c: Add -Wno-psabi to dg-options.

--- gcc/testsuite/gcc.dg/uninit-pr78295.c.jj2016-11-11 14:01:07.709408173 
+0100
+++ gcc/testsuite/gcc.dg/uninit-pr78295.c   2016-11-15 14:52:47.738947202 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wall" } */
+/* { dg-options "-O2 -Wall -Wno-psabi" } */
 
 typedef double vectype __attribute__ ((__vector_size__ (16)));
 

Jakub


[PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent redundant testing of branch-cost-1.c.

2016-11-15 Thread Toma Tabacu
Hi,

The branch-cost-1.c test uses the isa>=4 option to ensure the existence of the
MOVN/MOVZ instructions. This, however, does not take into account R6 targets,
which are accepted by the isa>=4 option but do not support MOVN/MOVZ.

This particular test does not fail on R6, because it is checking for the
absence of MOVN/MOVZ, but it is redundant.

This patch fixes this by replacing isa>=4 with (HAS_MOVN), which will only
accept targets in the [MIPS IV, R5] interval.

Tested with mips-img-linux-gnu.

Regards,
Toma Tabacu

gcc/testsuite/ChangeLog:

2016-11-15  Toma Tabacu  

* gcc.target/mips/branch-cost-1.c: Use (HAS_MOVN) instead of isa>=4,
in order to downgrade to R5.

diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c 
b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
index 61c3029..7f7ebbe 100644
--- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
+++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mbranch-cost=1 isa>=4" } */
+/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 NOMIPS16 int
 foo (int x, int y, int z, int k)


[PATCH] Support -fsanitize=integer-arith-overflow even for vectors (PR sanitizer/77823)

2016-11-15 Thread Jakub Jelinek
Hi!

On Mon, Nov 14, 2016 at 10:58:51AM +0100, Jakub Jelinek wrote:
> Working virtually out of Samoa.
> 
> The following patch is an attempt to handle -fsanitize=undefined
> for vectors.  We already diagnose out of bounds accesses for vector
> subscripts, this patch adds expansion for vector UBSAN_CHECK_* and generates
> those in ubsan.  Haven't finished up the many vect elements handling (want
> to emit a loop for code size).  Is this something we want for GCC 7?

Here is the full patch (just for -fsanitize=signed-integer-overflow, not
for -fsanitize=shift or -fsanitize={integer,float}-divide-by-zero for now).
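
To illustrate the kind of code this now instruments (my sketch, not the
committed overflow-vec-1.c):

  /* Compile with -fsanitize=signed-integer-overflow.  */
  typedef int V __attribute__ ((vector_size (4 * sizeof (int))));

  V
  add (V a, V b)
  {
    return a + b;  /* with the patch, each element is checked for overflow */
  }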

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-15  Jakub Jelinek  

PR sanitizer/77823
* ubsan.c (ubsan_build_overflow_builtin): Add DATAP argument, if
it points to non-NULL tree, use it instead of ubsan_create_data.
(instrument_si_overflow): Handle vector signed integer overflow
checking.
* ubsan.h (ubsan_build_overflow_builtin): Add DATAP argument.
* tree-vrp.c (simplify_internal_call_using_ranges): Punt for
vector IFN_UBSAN_CHECK_*.
* internal-fn.c (expand_addsub_overflow): Add DATAP argument,
pass it through to ubsan_build_overflow_builtin.
(expand_neg_overflow, expand_mul_overflow): Likewise.
(expand_vector_ubsan_overflow): New function.
(expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB,
expand_UBSAN_CHECK_MUL): Use it for vector arithmetics.
(expand_arith_overflow): Adjust expand_*_overflow callers.

* c-c++-common/ubsan/overflow-vec-1.c: New test.
* c-c++-common/ubsan/overflow-vec-2.c: New test.

--- gcc/ubsan.c.jj  2016-11-14 19:57:07.005897502 +0100
+++ gcc/ubsan.c 2016-11-15 09:09:33.288146293 +0100
@@ -1219,14 +1219,20 @@ instrument_null (gimple_stmt_iterator gs
 
 tree
 ubsan_build_overflow_builtin (tree_code code, location_t loc, tree lhstype,
- tree op0, tree op1)
+ tree op0, tree op1, tree *datap)
 {
   if (flag_sanitize_undefined_trap_on_error)
 return build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
 
-  tree data = ubsan_create_data ("__ubsan_overflow_data", 1, &loc,
-ubsan_type_descriptor (lhstype), NULL_TREE,
-NULL_TREE);
+  tree data;
+  if (datap && *datap)
+data = *datap;
+  else
+data = ubsan_create_data ("__ubsan_overflow_data", 1, &loc,
+ ubsan_type_descriptor (lhstype), NULL_TREE,
+ NULL_TREE);
+  if (datap)
+*datap = data;
   enum built_in_function fn_code;
 
   switch (code)
@@ -1272,14 +1278,15 @@ instrument_si_overflow (gimple_stmt_iter
   tree_code code = gimple_assign_rhs_code (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree lhstype = TREE_TYPE (lhs);
+  tree lhsinner = VECTOR_TYPE_P (lhstype) ? TREE_TYPE (lhstype) : lhstype;
   tree a, b;
   gimple *g;
 
   /* If this is not a signed operation, don't instrument anything here.
  Also punt on bit-fields.  */
-  if (!INTEGRAL_TYPE_P (lhstype)
-  || TYPE_OVERFLOW_WRAPS (lhstype)
-  || GET_MODE_BITSIZE (TYPE_MODE (lhstype)) != TYPE_PRECISION (lhstype))
+  if (!INTEGRAL_TYPE_P (lhsinner)
+  || TYPE_OVERFLOW_WRAPS (lhsinner)
+  || GET_MODE_BITSIZE (TYPE_MODE (lhsinner)) != TYPE_PRECISION (lhsinner))
 return;
 
   switch (code)
@@ -1305,7 +1312,7 @@ instrument_si_overflow (gimple_stmt_iter
   /* Represent i = -u;
 as
 i = UBSAN_CHECK_SUB (0, u);  */
-  a = build_int_cst (lhstype, 0);
+  a = build_zero_cst (lhstype);
   b = gimple_assign_rhs1 (stmt);
   g = gimple_build_call_internal (IFN_UBSAN_CHECK_SUB, 2, a, b);
   gimple_call_set_lhs (g, lhs);
@@ -1316,7 +1323,7 @@ instrument_si_overflow (gimple_stmt_iter
 into
 _N = UBSAN_CHECK_SUB (0, u);
 i = ABS_EXPR<_N>;  */
-  a = build_int_cst (lhstype, 0);
+  a = build_zero_cst (lhstype);
   b = gimple_assign_rhs1 (stmt);
   g = gimple_build_call_internal (IFN_UBSAN_CHECK_SUB, 2, a, b);
   a = make_ssa_name (lhstype);
--- gcc/ubsan.h.jj  2016-11-14 19:57:07.027897220 +0100
+++ gcc/ubsan.h 2016-11-14 20:37:20.892032650 +0100
@@ -52,7 +52,8 @@ extern tree ubsan_create_data (const cha
 extern tree ubsan_type_descriptor (tree, enum ubsan_print_style = 
UBSAN_PRINT_NORMAL);
 extern tree ubsan_encode_value (tree, bool = false);
 extern bool is_ubsan_builtin_p (tree);
-extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree, 
tree);
+extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree,
+ tree, tree *);
 extern tree ubsan_instrument_float_cast (location_t, tree, tree);
 extern tree ubsan_get_source_location_type (void);
 
--- gcc/tree-vrp.c.jj   2016-11-14 19:57:06.957898116 +0100
+++ gcc/tree-vrp.c  2016-11-

Re: [PATCH] rs6000: Make deallocation of a large frame work (PR77687)

2016-11-15 Thread David Edelsohn
On Tue, Nov 15, 2016 at 6:48 AM, Segher Boessenkool
 wrote:
> If we use ABI_V4 and we have a big stack frame, we end the epilogue
> with a "mr 1,11" (or similar) instruction.  This instruction however
> has no dependencies on the earlier restores from stack (done via r11),
> so sched2 can end up reordering the insns, which is bad because we
> have no red zone so that you then restore from stack that is already
> deallocated.
>
> This fixes it by making that restore depend on the memory accesses.
>
> Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?  Do we
> want a testcase for this?
>
>
> Segher
>
>
> 2016-11-15  Segher Boessenkool  
>
> * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Emit the
> stack_restore_tie insn instead of stack_tie, for the SVR4 and
> SPE ABIs.
> * config/rs6000/rs6000.md (stack_restore_tie): New define_insn.

Okay.

A similar change may be necessary for other uses of rs6000_stack_tie
in the prologue.

It would be good to comment somewhere that stack_restore_tie is a
superset of rs6000_stack_tie.

Thanks, David


[C++ PATCH] Add mangling for P0217R3 decompositions at namespace scope

2016-11-15 Thread Jakub Jelinek
Hi!

On the following testcase we ICE, because the underlying artificial decls
have NULL DECL_NAME (intentional), thus mangling is not able to figure out
what to do.  This patch attempts to follow the
http://sourcerytools.com/pipermail/cxx-abi-dev/2016-August/002951.html
proposal (and for error recovery just uses  in order not to ICE).

Not really sure about ABI tags though.
I guess one can specify abi tag on the whole decomposition, perhaps
__attribute__((abi_tag ("foobar"))) auto [ a, b ] = A ();
And/or there could be ABI tags on the type of the artifical decl.
What about ABI tags on the types that the decomposition resolved to
(say if std::tuple* is involved)?  Shall all ABI tags go at the end
of the whole decomp decl, or shall the individual source names have their
ABI tags attached after them?
What about the std::tuple* case where the standalone vars exist too,
shall e.g. abi_tag attributes be copied from the decomp var to those?
Any other attributes to copy over (e.g. unused comes to mind).

In any case, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk (and the rest would be resolved incrementally)?

2016-11-15  Jakub Jelinek  

* decl.c (cp_finish_decomp): For DECL_NAMESPACE_SCOPE_P decl,
set DECL_ASSEMBLER_NAME.
* parser.c (cp_parser_decomposition_declaration): Likewise
if returning error_mark_node.
* mangle.c (mangle_decomp): New function.
* cp-tree.h (mangle_decomp): New declaration.

* g++.dg/cpp1z/decomp12.C: New test.
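
For illustration (my own example; the new decomp12.C is not quoted below),
with the patch a namespace-scope declaration such as

  struct A { int i, j; };
  auto [ a, b ] = A ();

gets the assembler name _ZDC1a1bE: "DC", the source names of the bindings,
then "E", wrapped in the usual nested-name prefix when the enclosing scope is
not the global namespace.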

--- gcc/cp/decl.c.jj2016-11-15 09:57:00.0 +0100
+++ gcc/cp/decl.c   2016-11-15 12:16:41.230596777 +0100
@@ -7301,7 +7301,6 @@ get_tuple_decomp_init (tree decl, unsign
 void
 cp_finish_decomp (tree decl, tree first, unsigned int count)
 {
-  location_t loc = DECL_SOURCE_LOCATION (decl);
   if (error_operand_p (decl))
 {
  error_out:
@@ -7315,9 +7314,12 @@ cp_finish_decomp (tree decl, tree first,
}
  first = DECL_CHAIN (first);
}
+  if (DECL_P (decl) && DECL_NAMESPACE_SCOPE_P (decl))
+   SET_DECL_ASSEMBLER_NAME (decl, get_identifier (""));
   return;
 }
 
+  location_t loc = DECL_SOURCE_LOCATION (decl);
   if (type_dependent_expression_p (decl)
   /* This happens for range for when not in templates.
 Still add the DECL_VALUE_EXPRs for later processing.  */
@@ -7530,6 +7532,8 @@ cp_finish_decomp (tree decl, tree first,
i++;
  }
 }
+  if (DECL_NAMESPACE_SCOPE_P (decl))
+SET_DECL_ASSEMBLER_NAME (decl, mangle_decomp (decl, v));
 }
 
 /* Returns a declaration for a VAR_DECL as if:
--- gcc/cp/parser.c.jj  2016-11-15 10:37:56.0 +0100
+++ gcc/cp/parser.c 2016-11-15 12:16:26.361784744 +0100
@@ -12944,6 +12944,7 @@ cp_parser_decomposition_declaration (cp_
   tree decl = start_decl (declarator, decl_specifiers, SD_INITIALIZED,
  NULL_TREE, decl_specifiers->attributes,
  &pushed_scope);
+  tree orig_decl = decl;
 
   unsigned int i;
   cp_expr e;
@@ -13020,6 +13021,12 @@ cp_parser_decomposition_declaration (cp_
   if (pushed_scope)
 pop_scope (pushed_scope);
 
+  if (decl == error_mark_node && DECL_P (orig_decl))
+{
+  if (DECL_NAMESPACE_SCOPE_P (orig_decl))
+   SET_DECL_ASSEMBLER_NAME (orig_decl, get_identifier (""));
+}
+
   return decl;
 }
 
--- gcc/cp/mangle.c.jj  2016-11-11 14:01:06.0 +0100
+++ gcc/cp/mangle.c 2016-11-15 11:48:58.345751857 +0100
@@ -3995,6 +3995,53 @@ mangle_vtt_for_type (const tree type)
   return mangle_special_for_type (type, "TT");
 }
 
+/* Returns an identifier for the mangled name of the decomposition
+   artificial variable DECL.  DECLS is the vector of the VAR_DECLs
+   for the identifier-list.  */
+
+tree
+mangle_decomp (const tree decl, vec<tree> &decls)
+{
+  gcc_assert (!type_dependent_expression_p (decl));
+
+  location_t saved_loc = input_location;
+  input_location = DECL_SOURCE_LOCATION (decl);
+
+  start_mangling (decl);
+  write_string ("_Z");
+
+  tree context = decl_mangling_context (decl);
+  gcc_assert (context != NULL_TREE);
+
+  bool nested = false;
+  if (DECL_NAMESPACE_STD_P (context))
+write_string ("St");
+  else if (context != global_namespace)
+{
+  nested = true;
+  write_char ('N');
+  write_prefix (decl_mangling_context (decl));
+}
+
+  write_string ("DC");
+  unsigned int i;
+  tree d;
+  FOR_EACH_VEC_ELT (decls, i, d)
+write_unqualified_name (d);
+  write_char ('E');
+
+  if (nested)
+write_char ('E');
+
+  tree id = finish_mangling_get_identifier ();
+  if (DEBUG_MANGLE)
+fprintf (stderr, "mangle_decomp = '%s'\n\n",
+ IDENTIFIER_POINTER (id));
+
+  input_location = saved_loc;
+  return id;
+}
+
 /* Return an identifier for a construction vtable group.  TYPE is
the most derived class in the hierarchy; BINFO is the base
subobject for which this construction vtable group will be used.
--- gcc/cp/cp-tree.h.jj 2016-11-15 

[C++ PATCH] SOme further g++.dg/cpp1z/decomp*.C tests

2016-11-15 Thread Jakub Jelinek
Hi!

This patch adds 3 new tests.  Tested on x86_64-linux, ok for trunk?

2016-11-15  Jakub Jelinek  

* g++.dg/cpp1z/decomp13.C: New test.
* g++.dg/cpp1z/decomp14.C: New test.
* g++.dg/cpp1z/decomp15.C: New test.

--- gcc/testsuite/g++.dg/cpp1z/decomp13.C.jj2016-11-15 14:25:18.902048735 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp13.C   2016-11-15 14:48:12.795463351 
+0100
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct A { int f; };
+struct B { int b; };
+struct C : virtual A {};
+struct D : virtual A {};
+struct E { int f; };
+struct F : A { int f; };
+struct G : A, E {};
+struct H : C, D {};
+struct I : A, C {};// { dg-warning "due to ambiguity" }
+struct J : B {};
+struct K : B, virtual J {};// { dg-warning "due to ambiguity" }
+struct L : virtual J {};
+struct M : virtual J, L {};
+
+void
+foo (C &c, F &f, G &g, H &h, I &i, K &k, M &m)
+{
+  auto [ ci ] = c; // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } }
+  auto [ fi ] = f; // { dg-error "cannot decompose class type 'F': 
both it and its base class 'A' have non-static data members" }
+   // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } .-1 }
+  auto [ gi ] = g; // { dg-error "cannot decompose class type 'G': 
its base classes 'A' and 'E' have non-static data members" }
+   // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } .-1 }
+  auto [ hi ] = h; // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } }
+  auto [ ki ] = k; // { dg-error "'B' is an ambiguous base of 'K'" 
}
+   // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } .-1 }
+  auto [ mi ] = m; // { dg-warning "decomposition declaration only 
available with" "" { target c++14_down } }
+}
--- gcc/testsuite/g++.dg/cpp1z/decomp14.C.jj2016-11-15 14:30:40.296941834 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp14.C   2016-11-15 14:50:32.361678491 
+0100
@@ -0,0 +1,24 @@
+// { dg-do compile }
+// { dg-options "-std=c++1z" }
+
+struct A { bool a, b; };
+struct B { int a, b; };
+
+void
+foo ()
+{
+  auto [ a, b ] = A ();
+  for (auto [ a, b ] = A (); a; )
+;
+  if (auto [ a, b ] = A (); a)
+;
+  switch (auto [ a, b ] = B (); b)
+{
+case 2:
+  break;
+}
+  auto && [ c, d ] = A ();
+  [[maybe_unused]] auto [ e, f ] = A ();
+  alignas (A) auto [ g, h ] = A ();
+  __attribute__((unused)) auto [ i, j ] = A ();
+}
--- gcc/testsuite/g++.dg/cpp1z/decomp15.C.jj2016-11-15 14:38:55.198602649 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp15.C   2016-11-15 14:46:33.0 
+0100
@@ -0,0 +1,47 @@
+// { dg-do compile }
+// { dg-options "-std=c++1z" }
+
+struct A { bool a, b; };
+struct B { int a, b; };
+
+void
+foo ()
+{
+  auto [ a, b ] = A ();
+  for (; auto [ a, b ] = A (); )   // { dg-error 
"expected" }
+;
+  for (; false; auto [ a, b ] = A ())  // { dg-error 
"expected" }
+;
+  if (auto [ a, b ] = A ())// { dg-error 
"expected" }
+;
+  if (auto [ a, b ] = A (); auto [ c, d ] = A ())  // { dg-error 
"expected" }
+;
+  if (int d = 5; auto [ a, b ] = A ()) // { dg-error 
"expected" }
+;
+  switch (auto [ a, b ] = B ())// { dg-error 
"expected" }
+{
+case 2:
+  break;
+}
+  switch (int d = 5; auto [ a, b ] = B ()) // { dg-error 
"expected" }
+{
+case 2:
+  break;
+}
+  A e = A ();
+  auto && [ c, d ] = e;
+  auto [ i, j ] = A (), [ k, l ] = A ();   // { dg-error 
"expected" }
+  auto m = A (), [ n, o ] = A ();  // { dg-error 
"expected" }
+}
+
+template 
+auto [ a, b ] = A ();  // { dg-error 
"expected" }
+
+struct C
+{
+  auto [ e, f ] = A ();// { dg-error 
"expected" }
+  mutable auto [ g, h ] = A ();// { dg-error 
"expected" }
+  virtual auto [ i, j ] = A ();// { dg-error 
"expected" }
+  explicit auto [ k, l ] = A ();   // { dg-error 
"expected" }
+  friend auto [ m, n ] = A (); // { dg-error 
"expected" }
+};

Jakub


[PATCH] Constrain swap overload for std::optional (LWG 2748)

2016-11-15 Thread Jonathan Wakely

This implements the resolution to LWG 2748 which was approved the
other day at the WG21 meeting. I think the resolution is wrong,
because as the test shows it means that optional<T> can still be
swappable in some cases where T itself is not swappable. I've raised
that with the LWG and will probably create a new issue for it, but in
the meantime this implements what the spec says.
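
Concretely, the surprising case (a reduction of the new test below):

  struct C { };
  void swap(C&, C&) = delete;   // C itself is not swappable ...

  static_assert( std::is_swappable_v<std::optional<C>> );
  // ... yet optional<C> still is, via the unconstrained generic std::swap.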

* doc/xml/manual/intro.xml: Document LWG 2748 status.
* include/std/optional (optional::swap): Use is_nothrow_swappable_v
for exception specification.
(swap(optional&, optional&)): Disable when T is not swappable.
* testsuite/20_util/optional/swap/2.cc: New test.

Tested powerpc64le-linux.

This only affects experimental C++17 stuff, so I'm committing it even
though we're in Stage 3.

commit 9f7a9a3b0091dd09870e4044cacdb4db95a994ab
Author: Jonathan Wakely 
Date:   Tue Nov 15 12:06:08 2016 +

Constrain swap overload for std::optional (LWG 2748)

	* doc/xml/manual/intro.xml: Document LWG 2748 status.
	* include/std/optional (optional::swap): Use is_nothrow_swappable_v
	for exception specification.
	(swap(optional&, optional&)): Disable when T is not swappable.
	* testsuite/20_util/optional/swap/2.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 528b192..0df24bb 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1094,7 +1094,7 @@ requirements of the license of GCC.
 
 
 http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2583">2583:
-   There is no way to supply an allocator for  basic_string(str, pos)
+   There is no way to supply an allocator for basic_string(str, pos)

 
 Add new constructor
@@ -1107,6 +1107,14 @@ requirements of the license of GCC.
 Define the value_compare typedef.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2748">2748:
+   swappable traits for optionals
+   
+
+Disable the non-member swap overload when
+  the contained object is not swappable.
+
+
   
 
  
diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index ac73ea7..ea673cc 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -613,7 +613,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   swap(optional& __other)
   noexcept(is_nothrow_move_constructible<_Tp>()
-   && noexcept(swap(declval<_Tp&>(), declval<_Tp&>())))
+   && is_nothrow_swappable_v<_Tp>)
   {
 using std::swap;
 
@@ -920,8 +920,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__rhs || __lhs >= *__rhs; }
 
   // Swap and creation functions.
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2748. swappable traits for optionals
   template<typename _Tp>
-inline void
+inline enable_if_t<is_move_constructible_v<_Tp> && is_swappable_v<_Tp>>
 swap(optional<_Tp>& __lhs, optional<_Tp>& __rhs)
 noexcept(noexcept(__lhs.swap(__rhs)))
 { __lhs.swap(__rhs); }
diff --git a/libstdc++-v3/testsuite/20_util/optional/swap/2.cc b/libstdc++-v3/testsuite/20_util/optional/swap/2.cc
new file mode 100644
index 000..5793488
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/swap/2.cc
@@ -0,0 +1,45 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include <optional>
+
+// Swappable.
+struct A { };
+
+static_assert( std::is_nothrow_swappable_v<A> );
+static_assert( std::is_nothrow_swappable_v<std::optional<A>> );
+
+// Swappable, but might throw.
+struct B { };
+void swap(B&, B&) noexcept(false);
+
+static_assert( std::is_swappable_v<std::optional<B>> );
+static_assert( !std::is_nothrow_swappable_v<std::optional<B>> );
+
+// Not swappable, but optional<C> is swappable via the generic std::swap.
+struct C { };
+void swap(C&, C&) = delete;
+
+static_assert( std::is_swappable_v<std::optional<C>> );
+
+// Not swappable, and optional<D> not swappable via the generic std::swap.
+struct D { D(D&&) = delete; };
+
+static_assert( !std::is_swappable_v<std::optional<D>> );


[PATCH] Add std::string constructor for substring of string_view (LWG 2742)

2016-11-15 Thread Jonathan Wakely

This is another issue resolution for C++17 features that was approved
at the recent meeting. I think this resolution is wrong too, but in
this case the fix is obvious so I've gone ahead and done it.

* doc/xml/manual/intro.xml: Document LWG 2742 status.
* doc/html/*: Regenerate.
* include/bits/basic_string.h
(basic_string(const T&, size_type, size_type, const Allocator&)): Add
constructor for substring of basic_string_view, as per LWG 2742 but
with additional constraint to fix ambiguity.
* testsuite/21_strings/basic_string/cons/char/9.cc: New test.
* testsuite/21_strings/basic_string/cons/wchar_t/9.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit d8d7a6fba221a205f28212a8bb6288aa724d5c63
Author: Jonathan Wakely 
Date:   Tue Nov 15 12:52:42 2016 +

Add std::string constructor for substring of string_view (LWG 2742)

	* doc/xml/manual/intro.xml: Document LWG 2742 status.
	* doc/html/*: Regenerate.
	* include/bits/basic_string.h
	(basic_string(const T&, size_type, size_type, const Allocator&)): Add
	constructor for substring of basic_string_view, as per LWG 2742 but
	with additional constraint to fix ambiguity.
	* testsuite/21_strings/basic_string/cons/char/9.cc: New test.
	* testsuite/21_strings/basic_string/cons/wchar_t/9.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 0df24bb..7f2586d 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1107,6 +1107,14 @@ requirements of the license of GCC.
 Define the value_compare typedef.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2742">2742:
+   Inconsistent string interface taking string_view
+   
+
+Add the new constructor and additionally constrain it
+  to avoid ambiguities with non-const charT*.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2748">2748:
swappable traits for optionals

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b80e270..943e88d 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -586,12 +586,27 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 #if __cplusplus > 201402L
   /**
+   *  @brief  Construct string from a substring of a string_view.
+   *  @param  __t   Source string view.
+   *  @param  __pos The index of the first character to copy from __t.
+   *  @param  __n   The number of characters to copy from __t.
+   *  @param  __a   Allocator to use.
+   */
+  template,
+			__not_>>>
+	basic_string(const _Tp& __t, size_type __pos, size_type __n,
+		 const _Alloc& __a = _Alloc())
+	: basic_string(__sv_type(__t).substr(__pos, __n), __a) { }
+
+  /**
*  @brief  Construct string from a string_view.
*  @param  __sv  Source string view.
*  @param  __a  Allocator to use (default is default allocator).
*/
-  explicit basic_string(__sv_type __sv, const _Alloc& __a = _Alloc())
-	: basic_string(__sv.data(), __sv.size(), __a) {}
+  explicit
+  basic_string(__sv_type __sv, const _Alloc& __a = _Alloc())
+  : basic_string(__sv.data(), __sv.size(), __a) { }
 #endif // C++17
 
   /**
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/cons/char/9.cc b/libstdc++-v3/testsuite/21_strings/basic_string/cons/char/9.cc
new file mode 100644
index 000..0024ffc
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/cons/char/9.cc
@@ -0,0 +1,46 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++1z } }
+
+#include <string>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  using C = char;
+  using string_type = std::basic_string<C>;
+  using view_type = std::basic_string_view<C>;
+
+  std::allocator<C> alloc;
+  VERIFY( string_type(view_type("string")) == "string" );
+  VERIFY( string_type(view_type("string"), alloc) == "string" );
+
+  // LWG 2742
+  VERIFY( string_type("substring", 3, 6) == "string" );
+  VERIFY( string_type("substring", 3, 6, alloc) == "string" );
+  VERIFY( st

[PATCH] Fix find&replace error in license boilerplate

2016-11-15 Thread Jonathan Wakely

There are 138 test files that say "a moved_to of the GNU General
Public License", presumably from a find & replace that then got copied
to new tests.

Fixed as obvious, as shown by this patch (the full patch is too big
for the mailing list).

commit 4ae2edc3c515f76e02712f744a0e3081de94be18
Author: Jonathan Wakely 
Date:   Tue Nov 15 14:31:52 2016 +

Fix find&replace error in license boilerplate

	* testsuite/19_diagnostics/error_code/is_error_code_v.cc: Fix license
	text.
	* testsuite/20_util/any/assign/emplace.cc: Likewise.
	* testsuite/20_util/any/cons/in_place.cc: Likewise.
	* testsuite/20_util/any/make_any.cc: Likewise.
	* testsuite/20_util/any/requirements.cc: Likewise.
	* testsuite/20_util/any/typedefs.cc: Likewise.
	* testsuite/20_util/bind/is_placeholder_v.cc: Likewise.
	* testsuite/20_util/duration/requirements/treat_as_floating_point_v.cc:
	Likewise.
	* testsuite/20_util/in_place/requirements.cc: Likewise.
	* testsuite/20_util/optional/77288.cc: Likewise.
	* testsuite/20_util/optional/assignment/1.cc: Likewise.
	* testsuite/20_util/optional/assignment/2.cc: Likewise.
	* testsuite/20_util/optional/assignment/3.cc: Likewise.
	* testsuite/20_util/optional/assignment/4.cc: Likewise.
	* testsuite/20_util/optional/assignment/5.cc: Likewise.
	* testsuite/20_util/optional/assignment/6.cc: Likewise.
	* testsuite/20_util/optional/assignment/7.cc: Likewise.
	* testsuite/20_util/optional/cons/77727.cc: Likewise.
	* testsuite/20_util/optional/cons/move.cc: Likewise.
	* testsuite/20_util/optional/cons/value.cc: Likewise.
	* testsuite/20_util/optional/cons/value_neg.cc: Likewise.
	* testsuite/20_util/optional/constexpr/cons/value.cc: Likewise.
	* testsuite/20_util/optional/constexpr/make_optional.cc: Likewise.
	* testsuite/20_util/optional/constexpr/observers/1.cc: Likewise.
	* testsuite/20_util/optional/constexpr/observers/2.cc: Likewise.
	* testsuite/20_util/optional/constexpr/observers/3.cc: Likewise.
	* testsuite/20_util/optional/constexpr/observers/4.cc: Likewise.
	* testsuite/20_util/optional/constexpr/observers/5.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/1.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/2.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/3.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/4.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/5.cc: Likewise.
	* testsuite/20_util/optional/constexpr/relops/6.cc: Likewise.
	* testsuite/20_util/optional/hash.cc: Likewise.
	* testsuite/20_util/optional/make_optional.cc: Likewise.
	* testsuite/20_util/optional/observers/1.cc: Likewise.
	* testsuite/20_util/optional/observers/2.cc: Likewise.
	* testsuite/20_util/optional/observers/3.cc: Likewise.
	* testsuite/20_util/optional/observers/4.cc: Likewise.
	* testsuite/20_util/optional/observers/5.cc: Likewise.
	* testsuite/20_util/optional/observers/6.cc: Likewise.
	* testsuite/20_util/optional/relops/1.cc: Likewise.
	* testsuite/20_util/optional/relops/2.cc: Likewise.
	* testsuite/20_util/optional/relops/3.cc: Likewise.
	* testsuite/20_util/optional/relops/4.cc: Likewise.
	* testsuite/20_util/optional/relops/5.cc: Likewise.
	* testsuite/20_util/optional/relops/6.cc: Likewise.
	* testsuite/20_util/optional/requirements.cc: Likewise.
	* testsuite/20_util/optional/swap/1.cc: Likewise.
	* testsuite/20_util/optional/typedefs.cc: Likewise.
	* testsuite/20_util/ratio/requirements/ratio_equal_v.cc: Likewise.
	* testsuite/20_util/tuple/tuple_size_v.cc: Likewise.
	* testsuite/20_util/uses_allocator/requirements/uses_allocator_v.cc:
	Likewise.
	* testsuite/20_util/variable_templates_for_traits.cc: Likewise.
	* testsuite/20_util/variant/hash.cc: Likewise.
	* testsuite/21_strings/basic_string_view/typedefs.cc: Likewise.
	* testsuite/experimental/any/typedefs.cc: Likewise.
	* testsuite/experimental/array/make_array.cc: Likewise.
	* testsuite/experimental/array/neg.cc: Likewise.
	* testsuite/experimental/chrono/value.cc: Likewise.
	* testsuite/experimental/deque/erasure.cc: Likewise.
	* testsuite/experimental/forward_list/erasure.cc: Likewise.
	* testsuite/experimental/list/erasure.cc: Likewise.
	* testsuite/experimental/map/erasure.cc: Likewise.
	* testsuite/experimental/memory/observer_ptr/assignment/assign.cc:
	Likewise.
	* testsuite/experimental/memory/observer_ptr/cons/cons.cc: Likewise.
	* testsuite/experimental/memory/observer_ptr/hash/hash.cc: Likewise.
	* testsuite/experimental/memory/observer_ptr/make_observer.cc:
	Likewise.
	* testsuite/experimental/memory/observer_ptr/relops/relops.cc:
	Likewise.
	* testsuite/experimental/memory/observer_ptr/requirements.cc: Likewise.
	* testsuite/experimental/memory/observer_ptr/swap/sw

[C++ PATCH] tweak PR77337 testcase

2016-11-15 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 10:50:27PM +0100, Jakub Jelinek wrote:
> > +   self(); // error: use of 'decltype(auto) 
> > fix_type::operator()() [with Functor = main()::]' 
> > before deduction of 'auto'
> 
> Wouldn't it be clearer to turn that // error: line into
> // { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
> so that it is clear that the error is undesirable even to a casual reader?

Now in the form of patch.  Tested on x86_64-linux, ok for trunk?

2016-11-15  Jakub Jelinek  

* g++.dg/cpp1y/auto-fn33.C (main): Turn // error: ... into dg-bogus.

--- gcc/testsuite/g++.dg/cpp1y/auto-fn33.C.jj   2016-11-11 12:45:40.0 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/auto-fn33.C  2016-11-15 15:36:58.538054171 
+0100
@@ -20,7 +20,7 @@ int main()
  {
return 0;
 
-   self(); // error: use of 'decltype(auto) 
fix_type::operator()() [with Functor = main()::]' 
before deduction of 'auto'
+   self(); // { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
  });
 
   return zero();


Jakub


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-15 Thread Yuri Rumyantsev
Hi All,

Here is a patch for non-masked epilogue vectorization.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

Thanks.
Changelog:

2016-11-15  Yuri Rumyantsev  

* params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
* tree-if-conv.c (tree_if_conversion): Make public.
* tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Avoid
dynamic alias checks for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
* tree-vect-loop.c: include tree-if-conv.h.
(new_loop_vec_info): Add zeroing orig_loop_info field.
(vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
(vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
using passed argument.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization with less vf.  If-convert epilogue if
required. Print vectorization success for epilogue.
* tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
if it is required, pass loop_vinfo produced during vectorization of
loop body to vect_analyze_loop.
* tree-vectorizer.h (struct _loop_vec_info): Add new field
orig_loop_info.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
(vect_do_peeling): Change prototype to return epilogue.
(vect_analyze_loop): Add argument of loop_vec_info type.
(vect_transform_loop): Return created loop.

gcc/testsuite/

* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.


2016-11-14 20:04 GMT+03:00 Richard Biener :
> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>  wrote:
>>Richard,
>>
>>I checked one of the tests designed for epilogue vectorization using
>>patches 1 - 3 and found out that the built compiler performs vectorization
>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>
>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>t1.new-nomask.s -fdump-tree-vect-details
>>$ grep VECTORIZED -c t1.c.156t.vect
>>4
>> Without param only 2 loops are vectorized.
>>
>>Should I simply add a part of tests related to this feature or I must
>>delete all not necessary changes also?
>
> Please remove all not necessary changes.
>
> Richard.
>
>>Thanks.
>>Yuri.
>>
>>2016-11-14 16:40 GMT+03:00 Richard Biener :
>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>
 Richard,

 In my previous patch I forgot to remove couple lines related to aux
>>field.
 Here is the correct updated patch.
>>>
>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>> necessary parts from 1 and 2) if all not required parts are removed
>>> (and you'd add the testcases covering non-masked tail vect).
>>>
>>> Thus, can you please produce a single complete patch containing only
>>> non-masked epilogue vectoriziation?
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks.
 Yuri.

 2016-11-14 15:51 GMT+03:00 Richard Biener :
 > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
 >
 >> Richard,
 >>
 >> I prepare updated 3 patch with passing additional argument to
 >> vect_analyze_loop as you proposed (untested).
 >>
 >> You wrote:
 >> tw, I wonder if you can produce a single patch containing just
 >> epilogue vectorization, that is combine patches 1-3 but rip out
 >> changes only needed by later patches?
 >>
 >> Did you mean that I exclude all support for vectorization
>>epilogues,
 >> i.e. exclude from 2-nd patch all non-related changes
 >> like
 >>
 >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
 >> index 11863af..32011c1 100644
 >> --- a/gcc/tree-vect-loop.c
 >> +++ b/gcc/tree-vect-loop.c
 >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
 >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
 >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
 >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
 >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
 >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
 >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
 >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
 >> +  LOOP_VINFO_NEED_MASKING (res) = false;
 >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
 >
 > Yes.
 >
 >> Did you mean also that new combined patch must be working patch,
>>i.e.
 >> can be integrated without other patches?
 >
 > Yes.
 >
 >> Could you please look at updated patch?
 >
 > Will do.
 >
 > Thanks,
 > Richard.
 >
 >> Thanks.
 >> Yuri.
 >>
 >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
 >> > On Thu, 10 Nov 2016, Richard Biener wrote:
 >> >
 >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
 >> >>
 >> >> > Richard,
 >> >> >
 >> >> > Here is updated 3 patch.
 >> >> >
 >> >> 

Re: [PATCH] Fix find&replace error in license boilerplate

2016-11-15 Thread Jonathan Wakely

On 15/11/16 14:36 +, Jonathan Wakely wrote:

There are 138 test files that say "a moved_to of the GNU General
Public License", presumably from a find & replace that then got copied
to new tests.

Fixed as obvious, as shown by this patch (the full patch is too big
for the mailing list).


Fixed on the gcc-5 and gcc-6 branches too.


Re: RFA (openmp): C++ PATCH to make some references TREE_CONSTANT

2016-11-15 Thread Jason Merrill
On Tue, Nov 15, 2016 at 4:32 AM, Jakub Jelinek  wrote:
> So, is there a way to treat references the similarly?  I.e. only "fold"
> reference vars to what they refer (DECL_INITIAL) in constexpr.c evaluation,
> or gimplification where a langhook or omp_notice_variable etc. has the
> last say on when it is ok to do that or not?

Yes, we can just not fold away references in cp_fold.  Applying this instead.

Jason
commit 0f5d66d82ca526419d3d3f0ee032ea88b070b214
Author: Jason Merrill 
Date:   Mon Nov 14 14:15:57 2016 -0500

Allow references in constant-expressions.

* decl2.c (decl_maybe_constant_var_p): References qualify.
* constexpr.c (non_const_var_error): Handle references.
* init.c (constant_value_1): Always check decl_constant_var_p.
* cp-gimplify.c (cp_fold_maybe_rvalue): Don't fold references.
* error.c (dump_decl_name): Split out from dump_decl.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index e8c7702..40d1e7b 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3153,6 +3153,10 @@ non_const_var_error (tree r)
   else
gcc_unreachable ();
 }
+  else if (TREE_CODE (type) == REFERENCE_TYPE)
+inform (DECL_SOURCE_LOCATION (r),
+   "%qD was not initialized with a constant "
+   "expression", r);
   else
 {
   if (cxx_dialect >= cxx11 && !DECL_DECLARED_CONSTEXPR_P (r))
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 9b9b511..5b5c0be 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1977,7 +1977,8 @@ cp_fold_maybe_rvalue (tree x, bool rval)
   while (true)
 {
   x = cp_fold (x);
-  if (rval && DECL_P (x))
+  if (rval && DECL_P (x)
+ && TREE_CODE (TREE_TYPE (x)) != REFERENCE_TYPE)
{
  tree v = decl_constant_value (x);
  if (v != x && v != error_mark_node)
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 4ebc7dc..257d211 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -4144,6 +4144,9 @@ decl_maybe_constant_var_p (tree decl)
   if (DECL_HAS_VALUE_EXPR_P (decl))
 /* A proxy isn't constant.  */
 return false;
+  if (TREE_CODE (type) == REFERENCE_TYPE)
+/* References can be constant.  */
+return true;
   return (CP_TYPE_CONST_NON_VOLATILE_P (type)
  && INTEGRAL_OR_ENUMERATION_TYPE_P (type));
 }
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index fe1f751..7bf07c3 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1000,6 +1000,37 @@ dump_simple_decl (cxx_pretty_printer *pp, tree t, tree 
type, int flags)
 dump_type_suffix (pp, type, flags);
 }
 
+/* Print an IDENTIFIER_NODE that is the name of a declaration.  */
+
+static void
+dump_decl_name (cxx_pretty_printer *pp, tree t, int flags)
+{
+  /* These special cases are duplicated here so that other functions
+ can feed identifiers to error and get them demangled properly.  */
+  if (IDENTIFIER_TYPENAME_P (t))
+{
+  pp_cxx_ws_string (pp, "operator");
+  /* Not exactly IDENTIFIER_TYPE_VALUE.  */
+  dump_type (pp, TREE_TYPE (t), flags);
+  return;
+}
+  if (dguide_name_p (t))
+{
+  dump_decl (pp, CLASSTYPE_TI_TEMPLATE (TREE_TYPE (t)),
+TFF_PLAIN_IDENTIFIER);
+  return;
+}
+
+  const char *str = IDENTIFIER_POINTER (t);
+  if (!strncmp (str, "_ZGR", 3))
+{
+  pp_cxx_ws_string (pp, "");
+  return;
+}
+
+  pp_cxx_tree_identifier (pp, t);
+}
+
 /* Dump a human readable string for the decl T under control of FLAGS.  */
 
 static void
@@ -1155,21 +1186,8 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
   gcc_unreachable ();
   break;
 
-  /* These special cases are duplicated here so that other functions
-can feed identifiers to error and get them demangled properly.  */
 case IDENTIFIER_NODE:
-  if (IDENTIFIER_TYPENAME_P (t))
-   {
- pp_cxx_ws_string (pp, "operator");
- /* Not exactly IDENTIFIER_TYPE_VALUE.  */
- dump_type (pp, TREE_TYPE (t), flags);
- break;
-   }
-  else if (dguide_name_p (t))
-   dump_decl (pp, CLASSTYPE_TI_TEMPLATE (TREE_TYPE (t)),
-  TFF_PLAIN_IDENTIFIER);
-  else
-   pp_cxx_tree_identifier (pp, t);
+  dump_decl_name (pp, t, flags);
   break;
 
 case OVERLOAD:
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1fad79c..b4b6cdb 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2078,10 +2078,9 @@ static tree
 constant_value_1 (tree decl, bool strict_p, bool return_aggregate_cst_ok_p)
 {
   while (TREE_CODE (decl) == CONST_DECL
-|| (strict_p
-? decl_constant_var_p (decl)
-: (VAR_P (decl)
-   && CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (decl)
+|| decl_constant_var_p (decl)
+|| (!strict_p && VAR_P (decl)
+&& CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (decl
 {
   tree init;
   /* If DECL is a static data member in a template
diff --git a/gcc/tes

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Jeff Law

On 11/15/2016 05:55 AM, Andrew Senkevich wrote:

2016-11-11 14:16 GMT+03:00 Uros Bizjak :

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.


Who could review these changes?
I can.  I likely dropped the message because it looked x86 specific, so 
if you could resend it'd be appreciated.


jeff


[PATCH] Fix PR78306

2016-11-15 Thread Richard Biener

Apparently, for some unknown reason, we refuse to inline anything into
functions calling cilk_spawn.  That breaks fortified headers and
all other always-inline function calls (intrinsics come to my mind as 
well).

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2016-11-15  Richard Biener  

PR tree-optimization/78306
* ipa-inline-analysis.c (initialize_inline_failed): Do not
inhibit inlining if function calls cilk_spawn.
(can_inline_edge_p): Likewise.

* gcc.dg/cilk-plus/pr78306.c: New testcase.

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   (revision 242408)
+++ gcc/ipa-inline-analysis.c   (working copy)
@@ -1507,9 +1507,6 @@ initialize_inline_failed (struct cgraph_
 e->inline_failed = CIF_BODY_NOT_AVAILABLE;
   else if (callee->local.redefined_extern_inline)
 e->inline_failed = CIF_REDEFINED_EXTERN_INLINE;
-  else if (cfun && fn_contains_cilk_spawn_p (cfun))
-/* We can't inline if the function is spawing a function.  */
-e->inline_failed = CIF_CILK_SPAWN;
   else
 e->inline_failed = CIF_FUNCTION_NOT_CONSIDERED;
   gcc_checking_assert (!e->call_stmt_cannot_inline_p
Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c(revision 242408)
+++ gcc/ipa-inline.c(working copy)
@@ -368,11 +368,6 @@ can_inline_edge_p (struct cgraph_edge *e
   e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
   inlinable = false;
 }
-  else if (inline_summaries->get (caller)->contains_cilk_spawn)
-{
-  e->inline_failed = CIF_CILK_SPAWN;
-  inlinable = false;
-}
   /* Don't inline a function with mismatched sanitization attributes. */
   else if (!sanitize_attrs_match_for_inline_p (caller->decl, callee->decl))
 {
Index: gcc/testsuite/gcc.dg/cilk-plus/pr78306.c
===
--- gcc/testsuite/gcc.dg/cilk-plus/pr78306.c(revision 0)
+++ gcc/testsuite/gcc.dg/cilk-plus/pr78306.c(working copy)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fcilkplus" } */
+
+#define _FORTIFY_SOURCE=2
+#include 
+#include 
+#include 
+
+int sum(int low, int high)
+{
+  if(low == high) {
+return low;
+  }
+
+  int mid = low + (high-low)/2;
+  int a = cilk_spawn sum(low, mid);
+  int b = sum(mid+1, high);
+
+  // Some very expensive computation here
+  int foo[64];
+  memset(foo, 0, 64*sizeof(int)); // <--- Fails
+
+  cilk_sync;
+
+  return a+b;
+}
+
+int main(void) {
+  return sum(0, 100);
+}


RE: [PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent redundant testing of branch-cost-1.c.

2016-11-15 Thread Toma Tabacu

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Toma Tabacu
> Sent: 15 November 2016 14:00
> To: gcc-patches@gcc.gnu.org
> Cc: Matthew Fortune; catherine_mo...@mentor.com
> Subject: [PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent
> redundant testing of branch-cost-1.c.
> 
> Hi,
> 
> The branch-cost-1.c test uses the isa>=4 option to ensure the existence of the
> MOVN/MOVZ instructions. This, however, does not take into account R6
> targets,
> which are accepted by the isa>=4 option but do not support MOVN/MOVZ.
> 
> This particular test does not fail on R6, because it is checking for the
> absence of MOVN/MOVZ, but it is redundant.
> 
> This patch fixes this by replacing isa>=4 with (HAS_MOVN), which will only
> accept targets in the [MIPS IV, R5] interval.
> 
> Tested with mips-img-linux-gnu.
> 
> Regards,
> Toma Tabacu
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-15  Toma Tabacu  
> 
>   * gcc.target/mips/branch-cost-1.c: Use (HAS_MOVN) instead of isa>=4,
>   in order to downgrade to R5.
> 
> diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> index 61c3029..7f7ebbe 100644
> --- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> +++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-mbranch-cost=1 isa>=4" } */
> +/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */
>  /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
>  NOMIPS16 int
>  foo (int x, int y, int z, int k)

The version below has a slightly more precise ChangeLog entry.

Regards,
Toma Tabacu

gcc/testsuite/ChangeLog:

2016-11-15  Toma Tabacu  

* gcc.target/mips/branch-cost-1.c (dg-options): Use (HAS_MOVN) instead
of isa>=4, in order to downgrade to R5.

diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
index 61c3029..7f7ebbe 100644
--- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
+++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mbranch-cost=1 isa>=4" } */
+/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 NOMIPS16 int
 foo (int x, int y, int z, int k)


RE: [PATCH] [ARC] New option handling, refurbish multilib support.

2016-11-15 Thread Claudiu Zissulescu
> This looks fine.  Thanks for all your effort revising this patch.
> 
> Andrew
> 

Committed r242425. 

Thank you for your review,
Claudiu


Re: Fix nb_iterations calculation in tree-vect-loop-manip.c

2016-11-15 Thread Richard Biener
On Tue, Nov 15, 2016 at 1:44 PM, Richard Sandiford
 wrote:
> We previously stored the number of loop iterations rather
> than the number of latch iterations.

So ->nb_iterations was unused without SVE?  Otherwise can you please
add a testcase?

Thanks,
Richard.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> [ This patch is part of the SVE series posted here:
>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>
> gcc/
> 2016-11-15  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> * tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Set
> nb_iterations to the number of latch iterations rather than the
> number of loop iterations.
>
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 6bfd332..4c6b8c7 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -285,7 +285,10 @@ slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
>  LOCATION_LINE (loop_loc));
>dump_gimple_stmt (MSG_NOTE, TDF_SLIM, cond_stmt, 0);
>  }
> -  loop->nb_iterations = niters;
> +
> +  /* Record the number of latch iterations.  */
> +  loop->nb_iterations = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
> +build_int_cst (TREE_TYPE (niters), 1));
>  }
>
>  /* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
>


[committed] Add PR c++/71988 testcase

2016-11-15 Thread Jakub Jelinek
Hi!

I've fixed this PR already in r240198 as part of the PR77482
fix, thus I've just added the testcase for it.  Tested on x86_64-linux,
committed to trunk as obvious.

2016-11-15  Jakub Jelinek  

PR c++/71988
* g++.dg/cpp0x/constexpr-71988.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/constexpr-71988.C.jj	2016-11-15 16:15:45.002454953 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-71988.C	2016-11-15 16:16:22.799977409 +0100
@@ -0,0 +1,6 @@
+// PR c++/71988
+// { dg-do compile { target c++11 } }
+// { dg-options "-fdump-ipa-cgraph" }
+
+struct A {};
+constexpr A a;

Jakub


Re: [patch] Disable LTO note about strict aliasing

2016-11-15 Thread Richard Biener
On Tue, Nov 15, 2016 at 1:19 PM, Eric Botcazou  wrote:
>> Yes, I know -fno-strict-aliasing is globally set, but will all
>> -fstrict-aliasing optimization attributes on functions be "overwritten"?
>> That is, are you sure that when optimizing a function originally compiled
>> with -fstrict-aliasing that -fno-strict-aliasing is in effect?
>
> Do you mean with the "optimize" attribute or somesuch?  If so, I suppose not,
> if the attributes are properly saved and restored; otherwise, I don't really
> understand the question because I don't think that LTO saves the entire option
> state on a per-function basis.

Yes, it does since GCC 5 (or was it 6?).

Richard.

> --
> Eric Botcazou


Re: Fix nb_iterations_estimate calculation in tree-vect-loop.c

2016-11-15 Thread Richard Biener
On Tue, Nov 15, 2016 at 1:57 PM, Richard Sandiford
 wrote:
> vect_transform_loop has to reduce three iteration counts by
> the vectorisation factor: nb_iterations_upper_bound,
> nb_iterations_likely_upper_bound and nb_iterations_estimate.
> All three are latch execution counts rather than loop body
> execution counts.  The calculations were taking that into
> account for the first two, but not for nb_iterations_estimate.
>
> This patch updates the way the calculations are done to fix
> this and to add a bit more commentary about what is going on.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Thanks,
> Richard
>
>
> [ This patch is part of the SVE series posted here:
>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>
> gcc/
> 2016-11-15  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> * tree-vect-loop.c (vect_transform_loop): Protect the updates of
> all three iteration counts with an any_* test.  Use a single update
> for each count.  Fix the calculation of nb_iterations_estimate.
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 1cd9c72..53570f3 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -7043,27 +7043,25 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>/* Reduce loop iterations by the vectorization factor.  */
>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>   expected_iterations / vf);
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> -{
> -  if (loop->nb_iterations_upper_bound != 0)
> -loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 
> 1;
> -  if (loop->nb_iterations_likely_upper_bound != 0)
> -loop->nb_iterations_likely_upper_bound
> -  = loop->nb_iterations_likely_upper_bound - 1;
> -}
> -  loop->nb_iterations_upper_bound
> -= wi::udiv_floor (loop->nb_iterations_upper_bound + 1, vf) - 1;
> -  loop->nb_iterations_likely_upper_bound
> -= wi::udiv_floor (loop->nb_iterations_likely_upper_bound + 1, vf) - 1;
> -
> +  /* The minimum number of iterations performed by the epilogue.  This
> + is 1 when peeling for gaps because we always need a final scalar
> + iteration.  */
> +  int min_epilogue_iters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0;
> +  /* +1 to convert latch counts to loop iteration counts,
> + -min_epilogue_iters to remove iterations that cannot be performed
> +   by the vector code.  */
> +  int bias = 1 - min_epilogue_iters;
> +  /* In these calculations the "- 1" converts loop iteration counts
> + back to latch counts.  */
> +  if (loop->any_upper_bound)
> +loop->nb_iterations_upper_bound
> +  = wi::udiv_floor (loop->nb_iterations_upper_bound + bias, vf) - 1;
> +  if (loop->any_likely_upper_bound)
> +loop->nb_iterations_likely_upper_bound
> +  = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + bias, vf) - 
> 1;
>if (loop->any_estimate)
> -{
> -  loop->nb_iterations_estimate
> -   = wi::udiv_floor (loop->nb_iterations_estimate, vf);
> -   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -  && loop->nb_iterations_estimate != 0)
> -loop->nb_iterations_estimate = loop->nb_iterations_estimate - 1;
> -}
> +loop->nb_iterations_estimate
> +  = wi::udiv_floor (loop->nb_iterations_estimate + bias, vf) - 1;
>
>if (dump_enabled_p ())
>  {
>


Re: [PATCH] libiberty: demangler crash with missing :? or fold expression component.

2016-11-15 Thread Ian Lance Taylor
On Mon, Nov 14, 2016 at 3:39 PM, Mark Wielaard  wrote:
> When constructing a :? or fold expression that requires a third
> expression, only the first and second were explicitly checked to
> not be NULL. Since the third expression is also required in these
> constructs it needs to be explicitly checked and rejected when missing.
> Otherwise the demangler will crash once it tries to d_print the
> NULL component. Added two examples to demangle-expected of strings
> that would crash before this fix.
>
> Found by American Fuzzy Lop (afl) fuzzer.
> ---
>  libiberty/ChangeLog   | 7 +++
>  libiberty/cp-demangle.c   | 4 
>  libiberty/testsuite/demangle-expected | 8 
>  3 files changed, 19 insertions(+)
>
> diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
> index 41f3405..43617e4 100644
> --- a/libiberty/ChangeLog
> +++ b/libiberty/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-11-15  Mark Wielaard  
> +
> +   * cp-demangle.c (d_expression_1): Make sure third expression
> +   exists for ?: and fold expressions.
> +   * testsuite/demangle-expected: Add examples of strings that could
> +   crash the demangler because of missing expression.
> +

This is not the approach usually taken by the demangler.  The usual
approach would be to use a different code, other than
DEMANGLE_COMPONENT_TRINARY_ARG2, that requires a non-NULL right
argument, and test for that in d_make_comp.  But I suppose this
approach is simple enough, so this patch is OK.  Thanks.

Ian


RE: [PATCH 2/2] [ARC] Update target specific tests.

2016-11-15 Thread Claudiu Zissulescu
PING! Once the new options are in, we need also to update the tests. 

Andrew, please can you check it,
Claudiu

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Monday, May 30, 2016 2:33 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; g...@amylaar.uk;
> francois.bed...@synopsys.com
> Subject: [PATCH 2/2] [ARC] Update target specific tests.
> 
> Update the ARC specific tests.
> 
> OK to apply?
> Claudiu
> 
> gcc/
> 2016-05-26  Claudiu Zissulescu  
> 
>   * testsuite/gcc.target/arc/abitest.S: New file.
>   * testsuite/gcc.target/arc/va_args-1.c: Likewise.
>   * testsuite/gcc.target/arc/va_args-2.c: Likewise.
>   * testsuite/gcc.target/arc/va_args-3.c: Likewise.
>   * testsuite/gcc.target/arc/mcrc.c: Deleted.
>   * testsuite/gcc.target/arc/mdsp-packa.c: Likewise.
>   * testsuite/gcc.target/arc/mdvbf.c: Likewise.
>   * testsuite/gcc.target/arc/mmac-24.c: Likewise.
>   * testsuite/gcc.target/arc/mmac-d16.c: Likewise.
>   * testsuite/gcc.target/arc/mno-crc.c: Likewise.
>   * testsuite/gcc.target/arc/mno-dsp-packa.c: Likewise.
>   * testsuite/gcc.target/arc/mno-dvbf.c: Likewise.
>   * testsuite/gcc.target/arc/mno-mac-24.c: Likewise.
>   * testsuite/gcc.target/arc/mno-mac-d16.c: Likewise.
>   * testsuite/gcc.target/arc/mno-rtsc.c: Likewise.
>   * testsuite/gcc.target/arc/mno-xy.c: Likewise.
>   * testsuite/gcc.target/arc/mrtsc.c: Likewise.
>   * testsuite/gcc.target/arc/arc.exp (check_effective_target_arcem):
>   New function.
>   (check_effective_target_arc700): Likewise.
>   (check_effective_target_arc6xx): Likewise.
>   (check_effective_target_arcmpy): Likewise.
>   (check_effective_target_archs): Likewise.
>   (check_effective_target_clmcpu): Likewise.
>   * testsuite/gcc.target/arc/barrel-shifter-1.c: Changed.
>   * testsuite/gcc.target/arc/builtin_simd.c: Test only for ARC700
>   cpus.
>   * testsuite/gcc.target/arc/cmem-1.c: Changed.
>   * testsuite/gcc.target/arc/cmem-2.c: Likewise.
>   * testsuite/gcc.target/arc/cmem-3.c: Likewise.
>   * testsuite/gcc.target/arc/cmem-4.c: Likewise.
>   * testsuite/gcc.target/arc/cmem-5.c: Likewise.
>   * testsuite/gcc.target/arc/cmem-6.c: Likewise.
>   * testsuite/gcc.target/arc/cmem-7.c: Likewise.
>   * testsuite/gcc.target/arc/interrupt-1.c: Test for RTIE as well.
>   * testsuite/gcc.target/arc/interrupt-2.c: Skip it for ARCv2 cores.
>   * testsuite/gcc.target/arc/interrupt-3.c: Match also ARCv2
>   warnings.
>   * testsuite/gcc.target/arc/jump-around-jump.c: Update options.
>   * testsuite/gcc.target/arc/mARC601.c: Changed.
>   * testsuite/gcc.target/arc/mcpu-arc600.c: Changed.
>   * testsuite/gcc.target/arc/mcpu-arc601.c: Changed.
>   * testsuite/gcc.target/arc/mcpu-arc700.c: Changed.
>   * testsuite/gcc.target/arc/mdpfp.c: Skip for ARCv2 cores.
>   * testsuite/gcc.target/arc/movb-1.c: Changed.
>   * testsuite/gcc.target/arc/movb-2.c: Likewise.
>   * testsuite/gcc.target/arc/movb-3.c: Likewise.
>   * testsuite/gcc.target/arc/movb-4.c: Likewise.
>   * testsuite/gcc.target/arc/movb-5.c: Likewise.
>   * testsuite/gcc.target/arc/movb_cl-1.c: Likewise.
>   * testsuite/gcc.target/arc/movb_cl-2.c: Likewise.
>   * testsuite/gcc.target/arc/movbi_cl-1.c: Likewise.
>   * testsuite/gcc.target/arc/movh_cl-1.c: Likewise.
>   * testsuite/gcc.target/arc/mspfp.c: Skip for ARC HS cores.
>   * testsuite/gcc.target/arc/mul64.c: Enable it only for ARC600.
>   * testsuite/gcc.target/arc/mulsi3_highpart-1.c: Scan for ARCv2
>   instructions.
>   * testsuite/gcc.target/arc/mulsi3_highpart-2.c: Skip it for ARCv1
>   cores.
>   * testsuite/gcc.target/arc/no-dpfp-lrsr.c: Skip it for ARC HS.
>   * testsuite/gcc.target/arc/trsub.c: Only for ARC EM cores.
>   * testsuite/gcc.target/arc/builtin_simdarc.c: Changed.
>   * testsuite/gcc.target/arc/extzv-1.c: Likewise.
>   * testsuite/gcc.target/arc/insv-1.c: Likewise.
>   * testsuite/gcc.target/arc/insv-2.c: Likewise.
>   * testsuite/gcc.target/arc/mA6.c: Likewise.
>   * testsuite/gcc.target/arc/mA7.c: Likewise.
>   * testsuite/gcc.target/arc/mARC600.c: Likewise.
>   * testsuite/gcc.target/arc/mARC700.c: Likewise.
>   * testsuite/gcc.target/arc/mcpu-arc600.c: Likewise.
>   * testsuite/gcc.target/arc/mcpu-arc700.c: Likewise.
>   * testsuite/gcc.target/arc/movl-1.c: Likewise.
>   * testsuite/gcc.target/arc/nps400-1.c: Likewise.
>   * testsuite/gcc.target/arc/trsub.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/arc/abitest.S   | 31 +++
>  gcc/testsuite/gcc.target/arc/arc.exp | 66
> +++-
>  gcc/testsuite/gcc.target/arc/barrel-shifter-1.c  |  2 +-
>  gcc/testsuite/gcc.target/arc/builtin_simd.c  |  1 +
>  gcc/testsuite/gcc.target/arc/builtin_simdarc.c   |  1 +
>  gcc/testsuite/gcc.target/a

Re: [PATCH] Support -fsanitize=integer-arith-overflow even for vectors (PR sanitizer/77823)

2016-11-15 Thread Jeff Law

On 11/15/2016 07:03 AM, Jakub Jelinek wrote:

Hi!

On Mon, Nov 14, 2016 at 10:58:51AM +0100, Jakub Jelinek wrote:

Working virtually out of Samoa.

The following patch is an attempt to handle -fsanitize=undefined
for vectors.  We already diagnose out of bounds accesses for vector
subscripts, this patch adds expansion for vector UBSAN_CHECK_* and generates
those in ubsan.  Haven't finished up the many vect elements handling (want
to emit a loop for code size).  Is this something we want for GCC 7?


Here is the full patch (just for -fsanitize=signed-integer-overflow, not
for -fsanitize=shift or -fsanitize={integer,float}-divide-by-zero for now).
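
For reference, a minimal user-level example (an illustration only, not one of the
new testcases verbatim) of the vector arithmetic the instrumentation covers;
compile with -fsanitize=signed-integer-overflow, vector_size is a GNU extension:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
add (v4si a, v4si b)
{
  return a + b;   /* with the patch, each lane is checked for signed overflow */
}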

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-15  Jakub Jelinek  

PR sanitizer/77823
* ubsan.c (ubsan_build_overflow_builtin): Add DATAP argument, if
it points to non-NULL tree, use it instead of ubsan_create_data.
(instrument_si_overflow): Handle vector signed integer overflow
checking.
* ubsan.h (ubsan_build_overflow_builtin): Add DATAP argument.
* tree-vrp.c (simplify_internal_call_using_ranges): Punt for
vector IFN_UBSAN_CHECK_*.
* internal-fn.c (expand_addsub_overflow): Add DATAP argument,
pass it through to ubsan_build_overflow_builtin.
(expand_neg_overflow, expand_mul_overflow): Likewise.
(expand_vector_ubsan_overflow): New function.
(expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB,
expand_UBSAN_CHECK_MUL): Use it for vector arithmetics.
(expand_arith_overflow): Adjust expand_*_overflow callers.

* c-c++-common/ubsan/overflow-vec-1.c: New test.
* c-c++-common/ubsan/overflow-vec-2.c: New test.
I certainly don't see any reason why we wouldn't want additional
sanitizers, so ISTM it's really a matter of whether you're happy with the
implementation.


While there's a fair amount of changes in internal-fn.c, they're all 
sanitizer specific routines AFAICT.


Jeff



RE: [PATCH 2/2] [ARC] [libgcc] Fix defines

2016-11-15 Thread Claudiu Zissulescu
 
> Is there a reason that instruction should be uppercase?
> 
> This otherwise looks fine to me.
> 

Committed r242428. 
Thank you for your review,
Claudiu


Re: [patch] remove more GCJ references

2016-11-15 Thread Jeff Law

On 11/15/2016 03:55 AM, Matthias Klose wrote:

This patch removes some references to gcj in the top level and config
directories and in the gcc documentation.  The change to the config directory
requires regenerating aclocal.m4 and configure in each sub directory.

Ok for the trunk?

Matthias



2016-11-14  Matthias Klose  

* config-ml.in: Remove references to GCJ.
* configure.ac: Likewise.
* configure: Regenerate.

config/

2016-11-14  Matthias Klose  

multi.m4: Don't set GCJ.

gcc/

2016-11-14  Matthias Klose  

* doc/install.texi: Remove references to gcj/libjava.
* doc/invoke.texi: Likewise.


OK.
jeff


Re: [C++ PATCH] tweak PR77337 testcase

2016-11-15 Thread Jason Merrill
OK.

On Tue, Nov 15, 2016 at 9:39 AM, Jakub Jelinek  wrote:
> On Thu, Nov 10, 2016 at 10:50:27PM +0100, Jakub Jelinek wrote:
>> > +   self(); // error: use of 'decltype(auto) 
>> > fix_type::operator()() [with Functor = 
>> > main()::]' before deduction of 'auto'
>>
>> Wouldn't it be clearer to turn that // error: line into
>> // { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
>> so that it is clear that the error is undesirable even to casual reader?
>
> Now in the form of patch.  Tested on x86_64-linux, ok for trunk?
>
> 2016-11-15  Jakub Jelinek  
>
> * g++.dg/cpp1y/auto-fn33.C (main): Turn // error: ... into dg-bogus.
>
> --- gcc/testsuite/g++.dg/cpp1y/auto-fn33.C.jj   2016-11-11 12:45:40.0 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1y/auto-fn33.C  2016-11-15 15:36:58.538054171 
> +0100
> @@ -20,7 +20,7 @@ int main()
>   {
> return 0;
>
> -   self(); // error: use of 'decltype(auto) 
> fix_type::operator()() [with Functor = main()::]' 
> before deduction of 'auto'
> +   self(); // { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
>   });
>
>return zero();
>
>
> Jakub


Re: Fix nb_iterations calculation in tree-vect-loop-manip.c

2016-11-15 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 15, 2016 at 1:44 PM, Richard Sandiford
>  wrote:
>> We previously stored the number of loop iterations rather
>> than the number of latch iterations.
>
> So ->nb_iterations was unused without SVE?  Otherwise can you please
> add a testcase?

TBH I can't remember whether we noticed this by inspection or whether
it did manifest in the output somehow.  If it did, it would have been
an extra unnecessary iteration after complete unrolling, but usually
a later pass would remove the iteration as dead.

Thanks,
Richard


Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Jerry DeLisle

On 11/14/2016 11:22 PM, Thomas Koenig wrote:

Hi Jerry,


With these changes, OK for trunk?


Just going over this with a fine comb...

One thing just struck me:   The loop variables should be index_type, so

  const index_type m = xcount, n = ycount, k = count;

[...]

   index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset, i1, i2,
  i3, i4, i5, i6;

  /* Local variables */
  GFC_REAL_4 t1[65536], /* was [256][256] */
 f11, f12, f21, f22, f31, f32, f41, f42,
 f13, f14, f23, f24, f33, f34, f43, f44;
  index_type i, j, l, ii, jj, ll;
  index_type isec, jsec, lsec, uisec, ujsec, ulsec;

I agree that we should do the tuning of the inline limit
separately.



Several of my iterations used index_type. I found that using integer gives better
performance. The reason is that index_type is ptrdiff_t, which is a 64-bit
integer. I suspect we eliminate one memory fetch for each of these and reduce
the register loading by reducing the number of registers needed, a two-for-one
situation. I will change back and retest.


and Paul commented "-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already."

I will remove those two options and keep -ftree-vectorize.

I will report back my findings.

Thanks, and a fine tooth comb is a very good thing.

Jerry



Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jiong Wang

On 11/11/16 19:38, Jakub Jelinek wrote:


On Fri, Nov 11, 2016 at 06:21:48PM +, Jiong Wang wrote:

This patch introduces three AARCH64 private DWARF operations in vendor extension
space.

DW_OP_AARCH64_pauth 0xea
===
   Takes one unsigned LEB 128 Pointer Authentication Description. Bits [3:0] of
   the description contain the Authentication Action Code. All unused bits are
   initialized to 0. The operation then proceeds according to the value of the
   action code as described in the Action Code Table.

DW_OP_AARCH64_paciasp 0xeb
===
   Authenticates the contents in X30/LR register as per A key for instruction
   pointer using current CFA as salt. The result is pushed onto the stack.

DW_OP_AARCH64_paciasp_deref 0xec
===
   Takes one signed LEB128 offset and retrieves 8-byte contents from the address
   calculated by CFA plus this offset, the contents then authenticated as per A
   key for instruction pointer using current CFA as salt. The result is pushed
   onto the stack.

I'd like to point out that especially the vendor range of DW_OP_* is
extremely scarce resource, we have only a couple of unused values, so taking
3 out of the remaining unused 12 for a single architecture is IMHO too much.
Can't you use just a single opcode and encode which of the 3 operations it is
in say the low 2 bits of a LEB 128 operand?
We'll likely need to do RSN some multiplexing even for the generic GNU
opcodes if we need just a few further ones (say 0xff as an extension,
followed by uleb128 containing the opcode - 0xff).
In the non-vendor area we still have 54 values left, so there is more space
for future expansion.

Jakub


   
  Separate DWARF operations are introduced instead of combining all of them into
one mostly because these operations are going to be used for most functions once
return address signing is enabled, and because they describe frame unwinding they
will go into the unwind table for C++ programs or C programs compiled with
-fexceptions; the impact on unwind table size is significant.  So I was trying to
lower the unwind table size overhead as much as I can.

  IMHO, three numbers actually is not that much for one architecture in the DWARF
operation vendor extension space, as architecture vendors can overlap with each
other.  The only painful thing, from my understanding, is that there are platform
vendors, for example "GNU" and "LLVM" etc., which architecture vendors can't
overlap with.

  In include/dwarf2.def, I saw DW_OP_GNU* has reserved 13, DW_OP_HP* has reserved 7
and DW_OP_PGI has reserved 1.

  So, as an alternative approach, can these AArch64 extensions overlap with and
reuse the numbers reserved for DW_OP_HP*, for example 0xe4, 0xe5, 0xe6?  I am even
thinking the GNU toolchain could treat the 8 numbers reserved by the existing
DW_OP_HP* and DW_OP_SGI* as an architecture vendor area and allow multiplexing on
them for different architectures.  This may offer more flexibility for architecture
vendors.

  Under the current code base, my search shows the overlap should be safe inside
GCC/GDB, and we would only need a minor disassembler tweak in Binutils.

  Thanks.

Regards,
Jiong



Re: Add a load_extend_op wrapper

2016-11-15 Thread Jeff Law

On 11/15/2016 05:42 AM, Richard Sandiford wrote:

LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
than a word.  However, callers weren't consistent about which of these
checks they made beforehand, and also weren't consistent about whether
"smaller" was based on (bit)size or precision (IMO it's the latter).
This patch adds a wrapper to try to make the macro easier to use.
It's unclear to me how GET_MODE_PRECISION is different from 
GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about 
it, particularly in the context of vector modes and such.  I'm certainly 
willing to trust your judgment on this.






LOAD_EXTEND_OP is often used to disable transformations that aren't
beneficial when extends from memory are free, so being stricter about
the check accidentally exposed more optimisation opportunities.

Right.



"SUBREG_BYTE (...) == 0" and subreg_lowpart_p are implied by
paradoxical_subreg_p, so the patch also removes some redundant tests.

Always helpful.



The patch doesn't change reload, since different checks could have
unforeseen consequences.
I think the same concepts apply in reload, but I understand the 
hesitation to twiddle that code and deal with possible fallout.




Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtl.h (load_extend_op): Declare.
* rtlanal.c (load_extend_op): New function.
(nonzero_bits1): Use it.
(num_sign_bit_copies1): Likewise.
* cse.c (cse_insn): Likewise.
* fold-const.c (fold_single_bit_test): Likewise.
(fold_unary_loc): Likewise.
* fwprop.c (free_load_extend): Likewise.
* postreload.c (reload_cse_simplify_set): Likewise.
(reload_cse_simplify_operands): Likewise.
* combine.c (try_combine): Likewise.
(simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
subreg_lowpart_p checks.

OK.
jeff



Add a mem_alias_size helper class

2016-11-15 Thread Richard Sandiford
alias.c encodes memory sizes as follows:

size > 0: the exact size is known
size == 0: the size isn't known
size < 0: the exact size of the reference itself is known,
  but the address has been aligned via AND.  In this case
  "-size" includes the size of the reference and the worst-case
  number of bytes traversed by the AND.
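
To make that convention concrete, here is a small self-contained sketch (plain
long instead of HOST_WIDE_INT, invented helper names; an illustration of the
encoding described above, not code from alias.c):

#include <stdio.h>

static long encode_exact (long size) { return size; }   /* exact size known */
static long encode_unknown (void) { return 0; }          /* size unknown */
/* Aligned case: an AND with -align can move the address back by up to
   align - 1 bytes, so the worst-case span is size + align - 1, stored negated.  */
static long encode_aligned (long size, long align) { return -(size + align - 1); }

int
main (void)
{
  printf ("%ld %ld %ld\n", encode_exact (8), encode_unknown (),
          encode_aligned (4, 8));   /* prints: 8 0 -11 */
  return 0;
}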

This patch wraps this up in a helper class and associated
functions.  The new routines fix what seems to be a hole
in the old logic: if the size of a reference A was unknown,
offset_overlap_p would assume that it could conflict with any
other reference B, even if we could prove that B comes before A.

The fallback CONSTANT_P (x) && CONSTANT_P (y) case looked incorrect.
Either "c" is trustworthy as a distance between the two constants,
in which case the alignment handling should work as well there as
elsewhere, or "c" isn't trustworthy, in which case offset_overlap_p
is unsafe.  I think the latter's true; AFAICT we have no evidence
that "c" really is the distance between the two references, so using
it in the check doesn't make sense.

At this point we've excluded cases for which:

(a) the base addresses are the same
(b) x and y are SYMBOL_REFs, or SYMBOL_REF-based constants
wrapped in a CONST
(c) x and y are both constant integers

No useful cases should be left.  As things stood, we would
assume that:

  (mem:SI (const_int X))

could overlap:

  (mem:SI (symbol_ref Y))

but not:

  (mem:SI (const (plus (symbol_ref Y) (const_int 4

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (mem_alias_size): New class.
(mem_alias_size::mode): New function.
(mem_alias_size::exact_p): Likewise.
(mem_alias_size::max_size_known_p): Likewise.
(align_to): Likewise.
(alias_may_gt): Likewise.
(addr_side_effect_eval): Change type of size argument to
mem_alias_size.  Use plus_constant.
(offset_overlap_p): Change type of xsize and ysize to
mem_alias_size.  Use alias_may_gt.  Don't assume an overlap
between an access of unknown size and an access that's known
to be earlier than it.
(memrefs_conflict_p): Change type of xsize and ysize to
mem_alias_size.  Remove fallback CONSTANT_P (x) && CONSTANT_P (y)
handling.

diff --git a/gcc/alias.c b/gcc/alias.c
index 1ea2417..486d06a 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -148,7 +148,6 @@ struct GTY(()) alias_set_entry {
 };
 
 static int rtx_equal_for_memref_p (const_rtx, const_rtx);
-static int memrefs_conflict_p (int, rtx, int, rtx, HOST_WIDE_INT);
 static void record_set (rtx, const_rtx, void *);
 static int base_alias_check (rtx, rtx, rtx, rtx, machine_mode,
 machine_mode);
@@ -176,11 +175,104 @@ static struct {
   unsigned long long num_disambiguated;
 } alias_stats;
 
+/* Represents the size of a memory reference during alias analysis.
+   There are three possibilities:
 
-/* Set up all info needed to perform alias analysis on memory references.  */
+   (1) the size needs to be treated as completely unknown
+   (2) the size is known exactly and no alignment is applied to the address
+   (3) the size is known exactly but an alignment is applied to the address
+
+   (3) is used for aligned addresses of the form (and X (const_int -N)),
+   which can subtract something in the range [0, N) from the original
+   address X.  We handle this by subtracting N - 1 from X and adding N - 1
+   to the size, so that the range spans all possible bytes.  */
+class mem_alias_size {
+public:
+  /* Return an unknown size (case (1) above).  */
+  static mem_alias_size unknown () { return (HOST_WIDE_INT) 0; }
+
+  /* Return an exact size (case (2) above).  */
+  static mem_alias_size exact (HOST_WIDE_INT size) { return size; }
+
+  /* Return a worst-case size after alignment (case (3) above).
+ SIZE includes the maximum adjustment applied by the alignment.  */
+  static mem_alias_size aligned (HOST_WIDE_INT size) { return -size; }
+
+  /* Return the size of memory reference X.  */
+  static mem_alias_size mem (const_rtx x) { return MEM_SIZE (x); }
+
+  static mem_alias_size mode (machine_mode m);
+
+  /* Return true if the exact size of the memory is known.  */
+  bool exact_p () const { return m_value > 0; }
+  bool exact_p (HOST_WIDE_INT *) const;
+
+  /* Return true if an upper bound on the memory size is known;
+ i.e. not case (1) above.  */
+  bool max_size_known_p () const { return m_value != 0; }
+  bool max_size_known_p (HOST_WIDE_INT *) const;
+
+  /* Return true if the size is subject to alignment.  */
+  bool aligned_p () const { return m_value < 0; }
+
+private:
+  mem_alias_size (HOST_WIDE_INT value) : m_value (value) {}
+
+  HOST_WIDE_INT m_value;
+};
 
-/* 

Use simplify_gen_binary in canon_rtx

2016-11-15 Thread Richard Sandiford
After simplifying the operands of a PLUS, canon_rtx checked only
for cases in which one of the simplified operands was a constant,
falling back to gen_rtx_PLUS otherwise.  This left the PLUS in a
non-canonical order if one of the simplified operands was
(plus (reg R1) (const_int X)); we'd end up with:

   (plus (plus (reg R1) (const_int Y)) (reg R2))

rather than:

   (plus (plus (reg R1) (reg R2)) (const_int Y))

Fixing this exposed new DSE opportunities on spu-elf in
gcc.c-torture/execute/builtins/strcat-chk.c but otherwise
it doesn't seem to have much practical effect.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (canon_rtx): Use simplify_gen_binary.

diff --git a/gcc/alias.c b/gcc/alias.c
index 486d06a..74df23c 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1800,13 +1800,7 @@ canon_rtx (rtx x)
   rtx x1 = canon_rtx (XEXP (x, 1));
 
   if (x0 != XEXP (x, 0) || x1 != XEXP (x, 1))
-   {
- if (CONST_INT_P (x0))
-   return plus_constant (GET_MODE (x), x1, INTVAL (x0));
- else if (CONST_INT_P (x1))
-   return plus_constant (GET_MODE (x), x0, INTVAL (x1));
- return gen_rtx_PLUS (GET_MODE (x), x0, x1);
-   }
+   return simplify_gen_binary (PLUS, GET_MODE (x), x0, x1);
 }
 
   /* This gives us much better alias analysis when called from



[PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread Szabolcs Nagy
When fpu trapping is enabled in libgfortran, the return value of
feenableexcept is not checked.  Glibc reports there whether the operation
was unsuccessful, which happens if the target has no trapping support.

There seems to be a separate API for checking trapping support,
ieee_support_halting, but it only checked whether the exception status
flags are available, so check trapping support too by enabling
and disabling traps.

Updated the test that changes trapping to use ieee_support_halting
(I think this is better than XFAILing the test case, as it tests
things that work just fine without trapping support).

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/testsuite/
2016-11-15  Szabolcs Nagy  

PR libgfortran/78314
* gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting.

libgfortran/
2016-11-15  Szabolcs Nagy  

PR libgfortran/78314
* config/fpu-glibc.h (support_fpu_trap): Use feenableexcept.
diff --git a/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90 b/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
index 8fb4f6f..43aa3bf 100644
--- a/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
+++ b/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
@@ -9,7 +9,7 @@
   implicit none
 
   type(ieee_status_type) :: s1, s2
-  logical :: flags(5), halt(5)
+  logical :: flags(5), halt(5), haltworks
   type(ieee_round_type) :: mode
   real :: x
 
@@ -18,6 +18,7 @@
   call ieee_set_flag(ieee_all, .false.)
   call ieee_set_rounding_mode(ieee_down)
   call ieee_set_halting_mode(ieee_all, .false.)
+  haltworks = ieee_support_halting(ieee_overflow)
 
   call ieee_get_status(s1)
   call ieee_set_status(s1)
@@ -46,7 +47,7 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
   call ieee_set_status(s2)
 
@@ -58,7 +59,7 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
   call ieee_set_status(s1)
 
@@ -79,6 +80,6 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
 end
diff --git a/libgfortran/config/fpu-glibc.h b/libgfortran/config/fpu-glibc.h
index 6e505da..e254fb1 100644
--- a/libgfortran/config/fpu-glibc.h
+++ b/libgfortran/config/fpu-glibc.h
@@ -121,7 +121,43 @@ get_fpu_trap_exceptions (void)
 int
 support_fpu_trap (int flag)
 {
-  return support_fpu_flag (flag);
+  int exceptions = 0;
+  int old, ret;
+
+  if (!support_fpu_flag (flag))
+return 0;
+
+#ifdef FE_INVALID
+  if (flag & GFC_FPE_INVALID) exceptions |= FE_INVALID;
+#endif
+
+#ifdef FE_DIVBYZERO
+  if (flag & GFC_FPE_ZERO) exceptions |= FE_DIVBYZERO;
+#endif
+
+#ifdef FE_OVERFLOW
+  if (flag & GFC_FPE_OVERFLOW) exceptions |= FE_OVERFLOW;
+#endif
+
+#ifdef FE_UNDERFLOW
+  if (flag & GFC_FPE_UNDERFLOW) exceptions |= FE_UNDERFLOW;
+#endif
+
+#ifdef FE_DENORMAL
+  if (flag & GFC_FPE_DENORMAL) exceptions |= FE_DENORMAL;
+#endif
+
+#ifdef FE_INEXACT
+  if (flag & GFC_FPE_INEXACT) exceptions |= FE_INEXACT;
+#endif
+
+  old = fedisableexcept (exceptions);
+  if (old == -1)
+return 0;
+
+  ret = feenableexcept (exceptions) != -1;
+  feenableexcept (old);
+  return ret;
 }
 
 


Re: Use simplify_gen_binary in canon_rtx

2016-11-15 Thread Jeff Law

On 11/15/2016 09:07 AM, Richard Sandiford wrote:

After simplifying the operands of a PLUS, canon_rtx checked only
for cases in which one of the simplified operands was a constant,
falling back to gen_rtx_PLUS otherwise.  This left the PLUS in a
non-canonical order if one of the simplified operands was
(plus (reg R1) (const_int X)); we'd end up with:

   (plus (plus (reg R1) (const_int Y)) (reg R2))

rather than:

   (plus (plus (reg R1) (reg R2)) (const_int Y))

Fixing this exposed new DSE opportunities on spu-elf in
gcc.c-torture/execute/builtins/strcat-chk.c but otherwise
it doesn't seem to have much practical effect.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (canon_rtx): Use simplify_gen_binary.

OK.
jeff



Tweak LRA handling of shared spill slots

2016-11-15 Thread Richard Sandiford
The previous code processed the users of a stack slot in order of
decreasing size and allocated the slot based on the first user.
This seems a bit dangerous, since the ordering is based on the
mode of the biggest reference while the allocation is based also
on the size of the register itself (which I think could be larger).

That scheme doesn't scale well to polynomial sizes, since there's
no guarantee that the order of the sizes is known at compile time.
This patch instead records an upper bound on the size required
by all users of a slot.  It also records the maximum alignment
requirement.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* function.h (spill_slot_alignment): Declare.
* function.c (spill_slot_alignment): New function.
* lra-spills.c (slot): Add align and size fields.
(assign_mem_slot): Use them in the call to assign_stack_local.
(add_pseudo_to_slot): Update the fields.
(assign_stack_slot_num_and_sort_pseudos): Initialise the fields.

diff --git a/gcc/function.c b/gcc/function.c
index 0b1d168..b009a0d 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -246,6 +246,14 @@ frame_offset_overflow (HOST_WIDE_INT offset, tree func)
   return FALSE;
 }
 
+/* Return the minimum spill slot alignment for a register of mode MODE.  */
+
+unsigned int
+spill_slot_alignment (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return STACK_SLOT_ALIGNMENT (NULL_TREE, mode, GET_MODE_ALIGNMENT (mode));
+}
+
 /* Return stack slot alignment in bits for TYPE and MODE.  */
 
 static unsigned int
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..6898f7f 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -567,6 +567,8 @@ extern HOST_WIDE_INT get_frame_size (void);
return FALSE.  */
 extern bool frame_offset_overflow (HOST_WIDE_INT, tree);
 
+extern unsigned int spill_slot_alignment (machine_mode);
+
 extern rtx assign_stack_local_1 (machine_mode, HOST_WIDE_INT, int, int);
 extern rtx assign_stack_local (machine_mode, HOST_WIDE_INT, int);
 extern rtx assign_stack_temp_for_type (machine_mode, HOST_WIDE_INT, tree);
diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 6e044cd..9f1d5e9 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -104,6 +104,10 @@ struct slot
   /* Hard reg into which the slot pseudos are spilled. The value is
  negative for pseudos spilled into memory. */
   int hard_regno;
+  /* Maximum alignment required by all users of the slot.  */
+  unsigned int align;
+  /* Maximum size required by all users of the slot.  */
+  HOST_WIDE_INT size;
   /* Memory representing the all stack slot.  It can be different from
  memory representing a pseudo belonging to give stack slot because
  pseudo can be placed in a part of the corresponding stack slot.
@@ -128,51 +132,23 @@ assign_mem_slot (int i)
 {
   rtx x = NULL_RTX;
   machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-  unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
-  unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
-  unsigned int max_ref_width = GET_MODE_SIZE (lra_reg_info[i].biggest_mode);
-  unsigned int total_size = MAX (inherent_size, max_ref_width);
-  unsigned int min_align = max_ref_width * BITS_PER_UNIT;
-  int adjust = 0;
+  HOST_WIDE_INT inherent_size = PSEUDO_REGNO_BYTES (i);
+  machine_mode wider_mode
+= (GET_MODE_SIZE (mode) >= GET_MODE_SIZE (lra_reg_info[i].biggest_mode)
+   ? mode : lra_reg_info[i].biggest_mode);
+  HOST_WIDE_INT total_size = GET_MODE_SIZE (wider_mode);
+  HOST_WIDE_INT adjust = 0;
 
   lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
  && lra_reg_info[i].nrefs != 0 && reg_renumber[i] < 0);
 
-  x = slots[pseudo_slots[i].slot_num].mem;
-
-  /* We can use a slot already allocated because it is guaranteed the
- slot provides both enough inherent space and enough total
- space.  */
-  if (x)
-;
-  /* Each pseudo has an inherent size which comes from its own mode,
- and a total size which provides room for paradoxical subregs
- which refer to the pseudo reg in wider modes.  We allocate a new
- slot, making sure that it has enough inherent space and total
- space.  */
-  else
+  unsigned int slot_num = pseudo_slots[i].slot_num;
+  x = slots[slot_num].mem;
+  if (!x)
 {
-  rtx stack_slot;
-
-  /* No known place to spill from => no slot to reuse.  */
-  x = assign_stack_local (mode, total_size,
- min_align > inherent_align
- || total_size > inherent_size ? -1 : 0);
-  stack_slot = x;
-  /* Cancel the big-endian correction done in assign_stack_local.
-Get the address of the beginning of the slot.  This is so we
-can do a big-endian correction uncondi

Use MEM_SIZE rather than GET_MODE_SIZE in dce.c

2016-11-15 Thread Richard Sandiford
Using MEM_SIZE is more general, since it copes with cases where
targets are forced to use BLKmode references for whatever reason.

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* dce.c (check_argument_store): Pass the size instead of
the memory reference.
(find_call_stack_args): Pass MEM_SIZE to check_argument_store.

diff --git a/gcc/dce.c b/gcc/dce.c
index 154469c..16340b64 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -234,16 +234,17 @@ mark_nonreg_stores (rtx body, rtx_insn *insn, bool fast)
 }
 
 
-/* Return true if store to MEM, starting OFF bytes from stack pointer,
+/* Return true if a store to SIZE bytes, starting OFF bytes from stack pointer,
is a call argument store, and clear corresponding bits from SP_BYTES
bitmap if it is.  */
 
 static bool
-check_argument_store (rtx mem, HOST_WIDE_INT off, HOST_WIDE_INT min_sp_off,
- HOST_WIDE_INT max_sp_off, bitmap sp_bytes)
+check_argument_store (HOST_WIDE_INT size, HOST_WIDE_INT off,
+ HOST_WIDE_INT min_sp_off, HOST_WIDE_INT max_sp_off,
+ bitmap sp_bytes)
 {
   HOST_WIDE_INT byte;
-  for (byte = off; byte < off + GET_MODE_SIZE (GET_MODE (mem)); byte++)
+  for (byte = off; byte < off + size; byte++)
 {
   if (byte < min_sp_off
  || byte >= max_sp_off
@@ -468,8 +469,8 @@ find_call_stack_args (rtx_call_insn *call_insn, bool do_mark, bool fast,
break;
}
 
-  if (GET_MODE_SIZE (GET_MODE (mem)) == 0
- || !check_argument_store (mem, off, min_sp_off,
+  if (!MEM_SIZE_KNOWN_P (mem)
+ || !check_argument_store (MEM_SIZE (mem), off, min_sp_off,
max_sp_off, sp_bytes))
break;
 



Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jakub Jelinek
On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote:
> >>   Takes one signed LEB128 offset and retrieves 8-byte contents from the 
> >> address
> >>   calculated by CFA plus this offset, the contents then authenticated as 
> >> per A
> >>   key for instruction pointer using current CFA as salt. The result is 
> >> pushed
> >>   onto the stack.
> >I'd like to point out that especially the vendor range of DW_OP_* is
> >extremely scarce resource, we have only a couple of unused values, so taking
> >3 out of the remaining unused 12 for a single architecture is IMHO too much.
> >Can't you use just a single opcode and encode which of the 3 operations it is
> >in say the low 2 bits of a LEB 128 operand?
> >We'll likely need to do RSN some multiplexing even for the generic GNU
> >opcodes if we need just a few further ones (say 0xff as an extension,
> >followed by uleb128 containing the opcode - 0xff).
> >In the non-vendor area we still have 54 values left, so there is more space
> >for future expansion.
> 
>   Seperate DWARF operations are introduced instead of combining all of them 
> into
> one are mostly because these operations are going to be used for most of the
> functions once return address signing are enabled, and they are used for
> describing frame unwinding that they will go into unwind table for C++ program
> or C program compiled with -fexceptions, the impact on unwind table size is
> significant.  So I was trying to lower the unwind table size overhead as much 
> as
> I can.
> 
>   IMHO, three numbers actually is not that much for one architecture in DWARF
> operation vendor extension space as vendors can overlap with each other.  The
> only painful thing from my understand is there are platform vendors, for 
> example
> "GNU" and "LLVM" etc, for which architecture vendor can't overlap with.

For DW_OP_*, there aren't two vendor ranges like e.g. in ELF; there is just
one range, so ideally the opcodes would be unique everywhere.  Even if not,
there is just a single GNU vendor: there is no separate range for Aarch64 that
can overlap with a range for x86_64, powerpc, etc.

Perhaps we could declare that certain opcode subrange for the GNU vendor is
architecture specific and document that the meaning of opcodes in that range
and count/encoding of their arguments depends on the architecture, but then
we should document how to figure out the architecture too (e.g. for ELF
base it on the containing EM_*).  All the tools that look at DWARF (readelf,
objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on 
that
though.

I know nothing about the aarch64 return address signing: would all 3, or say
2, usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?  Perhaps there could be just 1
opcode that has all the info encoded in one bigger uleb128 or something
similar...
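
As a purely illustrative sketch of that kind of multiplexing (hypothetical helper
names and operand layout; the real encoding would have to be part of the DWARF
proposal itself):

#include <stdint.h>
#include <stddef.h>

/* Standard unsigned LEB128 encoding.  */
static size_t
encode_uleb128 (uint64_t value, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* One vendor DW_OP; a 2-bit sub-opcode selects between the three AArch64
   operations and the payload (action code, offset, ...) sits above it.
   A real proposal would also need to cover the signed-offset form.  */
static size_t
encode_pauth_operand (unsigned sub_op, uint64_t payload, unsigned char *buf)
{
  return encode_uleb128 ((payload << 2) | (sub_op & 3), buf);
}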

Jakub


Re: Some backward threader refactoring

2016-11-15 Thread Jeff Law

On 11/14/2016 02:39 AM, Jeff Law wrote:



I was looking at the possibility of dropping threading from VRP1/VRP2 or
DOM1/DOM2 in favor of the backwards threader -- the obvious idea being
to recover some compile-time for gcc-7.

Of the old-style threader passes (VRP1, VRP2, DOM1, DOM2), VRP2 is by
far the least useful.  But I can't see a path to removing it in the
gcc-7 timeframe.

Looking at what is caught by VRP and DOM threaders is quite interesting.
 VRP obviously catches stuff with ranges, some fairly complex.  While
you might think that querying range info in the backwards threader would
work, the problem is we lose way too much information as we drop
ASSERT_EXPRs.  (Recall that the threader runs while we're still in VRP
and thus has access to the ASSERT_EXPRs).

The DOM threaders catch stuff through state, simplifications and
bi-directional propagation of equivalences created by conditionals.

The most obvious limitation of the backwards walking threader is that it
only looks at PHIs, copies and constant initializations.  Other
statements are ignored and stop the backwards walk.
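
A tiny illustration of the kind of chain it can follow (a hypothetical example,
not taken from the testsuite):

int
f (int flag)
{
  int state = flag ? 3 : 0;   /* constant initializations feeding a PHI */
  if (state == 3)             /* the backwards walk from here reaches the constants */
    return 1;
  return 0;
}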

I've got a fair amount of support for walking through unary and limited
form binary expressions that I believe can be extended based on needs.
But that's not quite ready for stage1 close.  However, some of the
refactoring to make those changes easier to implement is ready.

This patch starts to break down fsm_find_control_statement_thread_paths
into more manageable hunks.

One such hunk is sub-path checking.  Essentially we're looking to add a
range of blocks to the thread path as we move from one def site to
another in the IL.  There aren't any functional changes in that
refactoring.  It's really just to make f_f_c_s_t_p easier to grok.

f_f_c_s_t_p has inline code to recursively walk backwards through PHI
nodes as well as assignments that are copies and constant initialization
terminals.  Pulling that handling out results in a f_f_c_s_t_p that fits
on a page.  It's just a hell of a lot easier to see what's going on.

The handling of assignments is slightly improved in this patch.
Essentially we only considered a const initialization using an
INTEGER_CST as a proper terminal node.  But certainly other constants
are useful -- ADDR_EXPR in particular and are now handled.  I'll mirror
that improvement in the PHI node routines tomorrow.

Anyway, this is really just meant to make it easier to start extending
the GIMPLE_ASSIGN handling.

Bootstrapped and regression tested on x86_64-linux-gnu.

I've got function comments for the new routines on a local branch.  I'll
get those installed before committing.
Final version attached.  Only change was allowing tcc_constant rather 
than just INTEGER_CST in PHIs and the addition of comments.


Bootstrapped and regression tested on x86, installing on the trunk.

Jeff
commit 4cbde473b184922d6c8423a7a63bdbb86de32b33
Author: Jeff Law 
Date:   Tue Nov 15 09:16:26 2016 -0700

* tree-ssa-threadbackward.c (fsm_find_thread_path): Remove unneeded
parameter.  Callers changed.
(check-subpath_and_update_thread_path): Extracted from
fsm_find_control_statement_thread_paths.
(handle_phi, handle_assignment, handle_assignment_p): Likewise.
(handle_phi, handle_assignment): Allow any constant node, not
just INTEGER_CST.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e8475f..a54423a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-11-15  Jeff Law  
+
+   * tree-ssa-threadbackward.c (fsm_find_thread_path): Remove unneeded
+   parameter.  Callers changed.
+   (check-subpath_and_update_thread_path): Extracted from
+   fsm_find_control_statement_thread_paths.
+   (handle_phi, handle_assignment, handle_assignment_p): Likewise.
+   (handle_phi, handle_assignment): Allow any constant node, not
+   just INTEGER_CST.
+
 2016-11-15  Claudiu Zissulescu  
 
* config/arc/arc-arch.h: New file.
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index fd7d855..203e20e 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -62,14 +62,12 @@ get_gimple_control_stmt (basic_block bb)
 /* Return true if the CFG contains at least one path from START_BB to END_BB.
When a path is found, record in PATH the blocks from END_BB to START_BB.
VISITED_BBS is used to make sure we don't fall into an infinite loop.  Bound
-   the recursion to basic blocks belonging to LOOP.
-   SPEED_P indicate that we could increase code size to improve the code path 
*/
+   the recursion to basic blocks belonging to LOOP.  */
 
 static bool
 fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
  vec *&path,
- hash_set *visited_bbs, loop_p loop,
- bool speed_p)
+ hash_set *visited_bbs, loop_p loop)
 {
   if (loop != start_bb->loop_father)
 return false;
@@ -85,8 +83,7 @@ fsm_find_thread_path (

Fix handling of unknown sizes in rtx_addr_can_trap_p

2016-11-15 Thread Richard Sandiford
If the size passed in to rtx_addr_can_trap_p was zero, the frame
handling would get the size from the mode instead.  However, this
too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
memory reference with no MEM_SIZE (which should be rare these days).
This meant that the conditions for a 4-byte access at offset X were
stricter than those for an access of unknown size at offset X.

This patch checks whether the size is still zero, as the
SYMBOL_REF handling does.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index a9d3960..889b14d 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -543,6 +543,8 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST_WIDE_INT offset, 
HOST_WIDE_INT size,
 
  if (size == 0)
size = GET_MODE_SIZE (mode);
+ if (size == 0)
+   return 1;
 
  if (x == frame_pointer_rtx)
{



Re: Use MEM_SIZE rather than GET_MODE_SIZE in dce.c

2016-11-15 Thread Jeff Law

On 11/15/2016 09:17 AM, Richard Sandiford wrote:

Using MEM_SIZE is more general, since it copes with cases where
targets are forced to use BLKmode references for whatever reason.

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* dce.c (check_argument_store): Pass the size instead of
the memory reference.
(find_call_stack_args): Pass MEM_SIZE to check_argument_store.

OK.

Jeff



Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread FX
Hi,

> There seems to be a separate api for checking trapping support:
> ieee_support_halting, but it only checked if the exception status
> flags are available, so check trapping support too by enabling
> and disabling traps.

Thanks for the patch.

I am worried about the unnecessary operations that we’re doing here: doesn’t 
glibc have a way to tell you what it supports without having to do it (twice, 
enabling then disabling)?

Also, the glibc doc states that: "Each of the macros FE_DIVBYZERO, FE_INEXACT, 
FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW is defined when the implementation 
supports handling of the corresponding exception”. It even says:

> Each constant is defined if and only if the FPU you are compiling for 
> supports that exception, so you can test for FPU support with ‘#ifdef’.

So it seems rather clear that compile-time tests are the recommended way to go.
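
As a minimal illustration of that compile-time style of check (my sketch, not
libgfortran code), the macro can simply be tested with the preprocessor:

#include <fenv.h>

/* Decided entirely at compile time: 1 if the FPU supports the underflow
   exception, 0 otherwise.  */
static int
support_underflow (void)
{
#ifdef FE_UNDERFLOW
  return 1;
#else
  return 0;
#endif
}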

FX

RE: [PATCH] MIPS/GCC: Mark text contents as code or data

2016-11-15 Thread Matthew Fortune
Maciej Rozycki  writes:
>   gcc/
>   * config/mips/mips-protos.h (mips_set_text_contents_type): New
>   prototype.
>   * config/mips/mips.h (ASM_OUTPUT_BEFORE_CASE_LABEL): New macro.
>   (ASM_OUTPUT_CASE_END): Likewise.
>   * config/mips/mips.c (mips_set_text_contents_type): New
>   function.
>   (mips16_emit_constants): Record the pool's initial label number
>   with the `consttable' insn.  Emit a `consttable_end' insn at the
>   end.
>   (mips_final_prescan_insn): Call `mips_set_text_contents_type'
>   for `consttable' insns.
>   (mips_final_postscan_insn): Call `mips_set_text_contents_type'
>   for `consttable_end' insns.
>   * config/mips/mips.md (unspec): Add UNSPEC_CONSTTABLE_END enum
>   value.
>   (consttable): Add operand.
>   (consttable_end): New insn.
> 
>   gcc/testsuite/
>   * gcc.target/mips/data-sym-jump.c: New test case.
>   * gcc.target/mips/data-sym-pool.c: New test case.
>   * gcc.target/mips/insn-pseudo-4.c: Adjust for constant pool
>   annotation.

Thanks for working on this, it is really useful functionality.

I'm a little concerned the expected output tests may be fragile over
time but let's wait and see.

OK to commit.

Thanks,
Matthew



Optimise CONCAT handling in emit_group_load

2016-11-15 Thread Richard Sandiford
The CONCAT handling in emit_group_load chooses between doing
an extraction from a single component or forcing the whole
thing to memory and extracting from there.  The condition for
the former (more efficient) option was:

  if ((bytepos == 0 && bytelen == slen0)
  || (bytepos != 0 && bytepos + bytelen <= slen))

On the one hand this seems dangerous, since the second line
allows bit ranges that start in the first component and leak
into the second.  On the other hand it seems strange to allow
references that start after the first byte of the second
component but not those that start after the first byte
of the first component.  This led to a pessimisation of
things like gcc.dg/builtins-54.c for hppa64-hp-hpux11.23.

This patch simply checks whether the reference is contained
within a single component.  It also makes sure that we do
an extraction on anything that doesn't span the whole
component (even if it's constant).
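
For concreteness, a standalone sketch of the two tests (illustrative sizes and
names only, mirroring the conditions quoted above), using two 8-byte
components:

static const unsigned int slen = 16, slen0 = 8;

static int
old_ok (unsigned int bytepos, unsigned int bytelen)
{
  return (bytepos == 0 && bytelen == slen0)
         || (bytepos != 0 && bytepos + bytelen <= slen);
}

static int
new_ok (unsigned int bytepos, unsigned int bytelen)
{
  unsigned int subpos = bytepos % slen0;
  return subpos + bytelen <= slen0;
}

/* old_ok (4, 8) is true but new_ok (4, 8) is false: the old condition
   accepted a reference that leaks from the first component into the
   second.  Both accept the fully-contained old_ok (8, 8) case.  */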

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* expr.c (emit_group_load_1): Tighten check for whether an
access involves only one operand of a CONCAT.  Use extract_bit_field
for constants if the bit range does span the whole operand.

diff --git a/gcc/expr.c b/gcc/expr.c
index 0b0946d..985c2b3 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2175,19 +2175,22 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, 
tree type, int ssize)
{
  unsigned int slen = GET_MODE_SIZE (GET_MODE (src));
  unsigned int slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
+ unsigned int elt = bytepos / slen0;
+ unsigned int subpos = bytepos % slen0;
 
- if ((bytepos == 0 && bytelen == slen0)
- || (bytepos != 0 && bytepos + bytelen <= slen))
+ if (subpos + bytelen <= slen0)
{
  /* The following assumes that the concatenated objects all
 have the same size.  In this case, a simple calculation
 can be used to determine the object and the bit field
 to be extracted.  */
- tmps[i] = XEXP (src, bytepos / slen0);
- if (! CONSTANT_P (tmps[i])
- && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode))
+ tmps[i] = XEXP (src, elt);
+ if (subpos != 0
+ || subpos + bytelen != slen0
+ || (!CONSTANT_P (tmps[i])
+ && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode)))
tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
-(bytepos % slen0) * BITS_PER_UNIT,
+subpos * BITS_PER_UNIT,
 1, NULL_RTX, mode, mode, false);
}
  else



[PATCH, Fortran, pr78356, v1] [7 Regression] [OOP] segfault allocating polymorphic variable with polymorphic component with allocatable component

2016-11-15 Thread Andre Vehreschild
Hi all,

attached patch fixes the issue raised. The issue here was that a copy of the
base class was generated and its address passed to the _vptr->copy() method,
which then accessed memory that was not present in the copy, it being an object
of only the base class. The patch fixes this by making sure the temporary
handle is a pointer to the data to copy.

Sorry if that is not clear. I am not feeling so well today. So here it is in
terms of pseudo code. This code was formerly generated:

struct ac {};
struct a : struct ac { integer *i; };

a src, dst;
ac temp;

temp = src; // temp is now only a copy of ac

_vptr.copy(&temp, &dst); // temp does not denote memory having a pointer to i

After the patch, this code is generated:

// types as above
a src, dst;
ac *temp; // !!! Now a pointer

temp = &src;
_vptr.copy(temp, &dst); // temp now points to memory that has a pointer to i
// and is valid for copying.

Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2016-11-15  Andre Vehreschild  

PR fortran/78356
* class.c (gfc_is_class_scalar_expr): Prevent taking an array ref for
a component ref.
* trans-expr.c (gfc_trans_assignment_1): Ensure a reference to the
object to copy is generated, when assigning class objects.

gcc/testsuite/ChangeLog:

2016-11-15  Andre Vehreschild  

PR fortran/78356
* gfortran.dg/class_allocate_23.f08: New test.


diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index b42ec40..9db86b4 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -378,7 +378,8 @@ gfc_is_class_scalar_expr (gfc_expr *e)
 	&& CLASS_DATA (e->symtree->n.sym)
 	&& !CLASS_DATA (e->symtree->n.sym)->attr.dimension
 	&& (e->ref == NULL
-	|| (strcmp (e->ref->u.c.component->name, "_data") == 0
+	|| (e->ref->type == REF_COMPONENT
+		&& strcmp (e->ref->u.c.component->name, "_data") == 0
 		&& e->ref->next == NULL)))
 return true;
 
@@ -390,7 +391,8 @@ gfc_is_class_scalar_expr (gfc_expr *e)
 	&& CLASS_DATA (ref->u.c.component)
 	&& !CLASS_DATA (ref->u.c.component)->attr.dimension
 	&& (ref->next == NULL
-		|| (strcmp (ref->next->u.c.component->name, "_data") == 0
+		|| (ref->next->type == REF_COMPONENT
+		&& strcmp (ref->next->u.c.component->name, "_data") == 0
 		&& ref->next->next == NULL)))
 	return true;
 }
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 48296b8..1331b07 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -9628,6 +9628,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   int n;
   bool maybe_workshare = false;
   symbol_attribute lhs_caf_attr, rhs_caf_attr, lhs_attr;
+  bool is_poly_assign;
 
   /* Assignment of the form lhs = rhs.  */
   gfc_start_block (&block);
@@ -9648,6 +9649,19 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 	  || gfc_is_alloc_class_scalar_function (expr2)))
 expr2->must_finalize = 1;
 
+  /* Checking whether a class assignment is desired is quite complicated and
+ needed at two locations, so do it once only before the information is
+ needed.  */
+  lhs_attr = gfc_expr_attr (expr1);
+  is_poly_assign = (use_vptr_copy || lhs_attr.pointer
+		|| (lhs_attr.allocatable && !lhs_attr.dimension))
+		   && (expr1->ts.type == BT_CLASS
+		   || gfc_is_class_array_ref (expr1, NULL)
+		   || gfc_is_class_scalar_expr (expr1)
+		   || gfc_is_class_array_ref (expr2, NULL)
+		   || gfc_is_class_scalar_expr (expr2));
+
+
   /* Only analyze the expressions for coarray properties, when in coarray-lib
  mode.  */
   if (flag_coarray == GFC_FCOARRAY_LIB)
@@ -9676,6 +9690,10 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   if (rss == gfc_ss_terminator)
 	/* The rhs is scalar.  Add a ss for the expression.  */
 	rss = gfc_get_scalar_ss (gfc_ss_terminator, expr2);
+  /* When doing a class assign, then the handle to the rhs needs to be a
+	 pointer to allow for polymorphism.  */
+  if (is_poly_assign && expr2->rank == 0 && !UNLIMITED_POLY (expr2))
+	rss->info->type = GFC_SS_REFERENCE;
 
   /* Associate the SS with the loop.  */
   gfc_add_ss_to_loop (&loop, lss);
@@ -9835,14 +9853,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 	gfc_add_block_to_block (&loop.post, &rse.post);
 }
 
-  lhs_attr = gfc_expr_attr (expr1);
-  if ((use_vptr_copy || lhs_attr.pointer
-   || (lhs_attr.allocatable && !lhs_attr.dimension))
-  && (expr1->ts.type == BT_CLASS
-	  || (gfc_is_class_array_ref (expr1, NULL)
-	  || gfc_is_class_scalar_expr (expr1))
-	  || (gfc_is_class_array_ref (expr2, NULL)
-	  || gfc_is_class_scalar_expr (expr2
+  if (is_poly_assign)
 {
   tmp = trans_class_assignment (&body, expr1, expr2, &lse, &rse,
 use_vptr_copy 

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Andrew Senkevich
2016-11-15 17:56 GMT+03:00 Jeff Law :
> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>
>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>>>
>>> --- a/gcc/genmodes.c
>>> +++ b/gcc/genmodes.c
>>> --- a/gcc/init-regs.c
>>> +++ b/gcc/init-regs.c
>>> --- a/gcc/machmode.h
>>> +++ b/gcc/machmode.h
>>>
>>> These are middle-end changes, you will need a separate review for these.
>>
>>
>> Who could review these changes?
>
> I can.  I likely dropped the message because it looked x86 specific, so if
> you could resend it'd be appreciated.

Attached (differs from the previous version only in fixed comment typos).


--
WBR,
Andrew


new_avx512_instructions_15.11.patch
Description: Binary data


C++ PATCH for c++/78358 (decltype and decomposition)

2016-11-15 Thread Jason Merrill
OK, (hopefully) one more patch for decltype and C++17 decomposition
declarations.  I hadn't been thinking that "referenced type" meant to
look through references in the tuple case, since other parts of
[dcl.decomp] define "the referenced type" directly, but that does seem
to be how it's used elsewhere in the standard.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 113051a8a3e231bb4003831a2f595cd8788eec64
Author: Jason Merrill 
Date:   Tue Nov 15 10:50:00 2016 -0500

PR c++/78358 - tuple decomposition decltype

* semantics.c (finish_decltype_type): Strip references for a tuple
decomposition.
* cp-tree.h (DECL_DECOMPOSITION_P): False for non-variables.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index edcd3b4..634efc9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -3627,10 +3627,10 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
   (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.var_declared_inline_p \
= true)
 
-/* Nonzero if NODE is the artificial VAR_DECL for decomposition
+/* Nonzero if NODE is an artificial VAR_DECL for a C++17 decomposition
declaration.  */
 #define DECL_DECOMPOSITION_P(NODE) \
-  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))  \
+  (VAR_P (NODE) && DECL_LANG_SPECIFIC (NODE)   \
? DECL_LANG_SPECIFIC (NODE)->u.base.decomposition_p \
: false)
 #define SET_DECL_DECOMPOSITION_P(NODE) \
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 29f5233..dc5ad13 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8873,14 +8873,6 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
   if (identifier_p (expr))
 expr = lookup_name (expr);
 
-  /* The decltype rules for decomposition are different from the rules for
-member access; in particular, the decomposition decl gets
-cv-qualifiers from the aggregate object, whereas decltype of a member
-access expr ignores the object.  */
-  if (VAR_P (expr) && DECL_DECOMPOSITION_P (expr)
- && DECL_HAS_VALUE_EXPR_P (expr))
-   return unlowered_expr_type (DECL_VALUE_EXPR (expr));
-
   if (INDIRECT_REF_P (expr))
 /* This can happen when the expression is, e.g., "a.b". Just
look at the underlying operand.  */
@@ -8898,6 +8890,21 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
 /* See through BASELINK nodes to the underlying function.  */
 expr = BASELINK_FUNCTIONS (expr);
 
+  /* decltype of a decomposition name drops references in the tuple case
+(unlike decltype of a normal variable) and keeps cv-qualifiers from
+the containing object in the other cases (unlike decltype of a member
+access expression).  */
+  if (DECL_DECOMPOSITION_P (expr))
+   {
+ if (DECL_HAS_VALUE_EXPR_P (expr))
+   /* Expr is an array or struct subobject proxy, handle
+  bit-fields properly.  */
+   return unlowered_expr_type (expr);
+ else
+   /* Expr is a reference variable for the tuple case.  */
+   return non_reference (TREE_TYPE (expr));
+   }
+
   switch (TREE_CODE (expr))
 {
 case FIELD_DECL:
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp12.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp12.C
new file mode 100644
index 000..a5b686a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp12.C
@@ -0,0 +1,20 @@
+// PR c++/78358
+// { dg-do run }
+// { dg-options -std=c++1z }
+
+#include <tuple>
+
+template <typename T, typename U> struct same_type;
+template <typename T> struct same_type<T, T> {};
+
+int main() {
+  std::tuple<int, char, double, bool> tuple = { 1, 'a', 2.3, true };
+  auto[i, c, d, b] = tuple;
+  same_type<std::tuple_element<0, decltype(tuple)>::type, decltype(i)>{};
+  same_type<decltype(i), int>{};
+  same_type<decltype(c), char>{};
+  same_type<decltype(d), double>{};
+  same_type<decltype(b), bool>{};
+  if (i != 1 || c != 'a' || d != 2.3 || b != true)
+    __builtin_abort ();
+}


Add more subreg offset helpers

2016-11-15 Thread Richard Sandiford
Provide versions of subreg_lowpart_offset and subreg_highpart_offset
that work on mode sizes rather than modes.  Also provide a routine
that converts an lsb position to a subreg offset.

The intent (in combination with later patches) is to move the
handling of the BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN case into
just two places, so that for other combinations we don't have
to split offsets into words and subwords.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtl.h (subreg_size_offset_from_lsb): Declare.
(subreg_offset_from_lsb): Likewise.
(subreg_size_lowpart_offset): Likewise.
(subreg_size_highpart_offset): Likewise.
* emit-rtl.c (subreg_size_lowpart_offset): New function.
(subreg_lowpart_offset): Use it.
(subreg_size_highpart_offset): New function.
(subreg_highpart_offset): Use it.
* rtlanal.c (subreg_size_offset_from_lsb): New function.
(subreg_offset_from_lsb): Likewise.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9ea0c8f..bc4e536 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1478,44 +1478,59 @@ gen_highpart_mode (machine_mode outermode, machine_mode 
innermode, rtx exp)
  subreg_highpart_offset (outermode, innermode));
 }
 
-/* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
+/* Return the SUBREG_BYTE for a lowpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
 
 unsigned int
-subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
+  if (outer_bytes > inner_bytes)
+/* Paradoxical subregs always have a SUBREG_BYTE of 0.  */
+return 0;
 
-  if (difference > 0)
-{
-  if (WORDS_BIG_ENDIAN)
-   offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-  if (BYTES_BIG_ENDIAN)
-   offset += difference % UNITS_PER_WORD;
-}
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+return inner_bytes - outer_bytes;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+return 0;
+  else
+return subreg_size_offset_from_lsb (outer_bytes, inner_bytes, 0);
+}
+
+/* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
 
-  return offset;
+unsigned int
+subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
+{
+  return subreg_size_lowpart_offset (GET_MODE_SIZE (outermode),
+GET_MODE_SIZE (innermode));
 }
 
-/* Return offset in bytes to get OUTERMODE high part
-   of the value in mode INNERMODE stored in memory in target format.  */
+/* Return the SUBREG_BYTE for a highpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
+
 unsigned int
-subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_highpart_offset (unsigned int outer_bytes,
+unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
+  gcc_assert (inner_bytes >= outer_bytes);
 
-  gcc_assert (GET_MODE_SIZE (innermode) >= GET_MODE_SIZE (outermode));
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+return 0;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+return inner_bytes - outer_bytes;
+  else
+return subreg_size_offset_from_lsb (outer_bytes, inner_bytes,
+   (inner_bytes - outer_bytes)
+   * BITS_PER_UNIT);
+}
 
-  if (difference > 0)
-{
-  if (! WORDS_BIG_ENDIAN)
-   offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-  if (! BYTES_BIG_ENDIAN)
-   offset += difference % UNITS_PER_WORD;
-}
+/* Return the SUBREG_BYTE for an OUTERMODE highpart of an INNERMODE value.  */
 
-  return offset;
+unsigned int
+subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
+{
+  return subreg_size_highpart_offset (GET_MODE_SIZE (outermode),
+ GET_MODE_SIZE (innermode));
 }
 
 /* Return 1 iff X, assumed to be a SUBREG,
diff --git a/gcc/rtl.h b/gcc/rtl.h
index df5172b..2fca974 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2178,6 +2178,10 @@ extern void get_full_rtx_cost (rtx, machine_mode, enum 
rtx_code, int,
 extern unsigned int subreg_lsb (const_rtx);
 extern unsigned int subreg_lsb_1 (machine_mode, machine_mode,
  unsigned int);
+extern unsigned int subreg_size_offset_from_lsb (unsigned int, unsigned int,
+uns

Use df_read_modify_subreg_p in cprop.c

2016-11-15 Thread Richard Sandiford
local_cprop_find_used_regs punted on all multiword registers,
with the comment:

  /* Setting a subreg of a register larger than word_mode leaves
 the non-written words unchanged.  */

But this only applies if the outer mode is smaller than the
inner mode.  If they're the same size then writes to the subreg
are a normal full update.
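
A rough sketch of the distinction being described (illustrative only, not the
real df_read_modify_subreg_p): a subreg store is a partial, read-modify-write
update only when the multi-word inner register is wider than the outer mode
being stored.

#include <stdbool.h>

/* OUTER_SIZE and INNER_SIZE are the byte sizes of the subreg's outer and
   inner modes; UNITS_PER_WORD is the word size in bytes.  */
static bool
partial_subreg_store_p (unsigned int outer_size, unsigned int inner_size,
                        unsigned int units_per_word)
{
  /* Same-size subreg writes update the whole register; only a narrower
     store into a multi-word register leaves the other words untouched.  */
  return inner_size > units_per_word && outer_size < inner_size;
}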

This patch uses df_read_modify_subreg_p instead.  A later patch
adds more uses of the same routine, but this part had a (positive)
effect on code generation for the testsuite whereas the others
seemed to be simple clean-ups.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* cprop.c (local_cprop_find_used_regs): Use df_read_modify_subreg_p.

diff --git a/gcc/cprop.c b/gcc/cprop.c
index 6b4c0b8..31868a5 100644
--- a/gcc/cprop.c
+++ b/gcc/cprop.c
@@ -1161,9 +1161,7 @@ local_cprop_find_used_regs (rtx *xptr, void *data)
   return;
 
 case SUBREG:
-  /* Setting a subreg of a register larger than word_mode leaves
-the non-written words unchanged.  */
-  if (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))) > BITS_PER_WORD)
+  if (df_read_modify_subreg_p (x))
return;
   break;
 



Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

2016-11-15 Thread Jeff Law

On 11/01/2016 03:39 PM, Wilco Dijkstra wrote:

 Jeff Law  wrote:


I think you'll need to look at bz61320 before this could go in.


I had a look, but there is nothing there that is related - eventually
a latent alignment bug was fixed in IVOpt.

Excellent.  Thanks for digging into what really happened.


Note that the bswap phase
currently inserts unaligned accesses irrespectively of STRICT_ALIGNMENT
or SLOW_UNALIGNED_ACCESS:

-  if (bswap
 - && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
 - && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
 -   return false;

If bswap is false no byte swap is needed, so we found a native endian load
and it will always perform the optimization by inserting an unaligned load.
This apparently works on all targets, and doesn't cause alignment traps or
huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
So I'm at a loss what these macros are supposed to mean and how I can query
whether a backend supports fast unaligned access for a particular mode.

What I actually want to write is something like:

 if (!FAST_UNALIGNED_LOAD (mode, align)) return false;

And know that it only accepts unaligned accesses that are efficient on the 
target.
Maybe we need a new hook like this and get rid of the old one?
As Richi indicated later, these decisions are probably made best at 
expansion time -- as long as we have the required information.  So I'd 
only go with a hook if (for example) the alignment information is lost 
by the time we get to expansion and thus we can't DTRT at expansion time.


Patch is OK.

jeff


Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Jerry DeLisle

On 11/15/2016 07:59 AM, Jerry DeLisle wrote:

On 11/14/2016 11:22 PM, Thomas Koenig wrote:

Hi Jerry,


With these changes, OK for trunk?


Just going over this with a fine comb...

One thing just struck me:   The loop variables should be index_type, so

  const index_type m = xcount, n = ycount, k = count;

[...]

   index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset, i1, i2,
  i3, i4, i5, i6;

  /* Local variables */
  GFC_REAL_4 t1[65536], /* was [256][256] */
 f11, f12, f21, f22, f31, f32, f41, f42,
 f13, f14, f23, f24, f33, f34, f43, f44;
  index_type i, j, l, ii, jj, ll;
  index_type isec, jsec, lsec, uisec, ujsec, ulsec;

I agree that we should do the tuning of the inline limit
separately.



Several of my iterations used index_type. I found using integer gives better
performance. The reason is that they are of type ptrdiff_t, which is a 64-bit
integer. I suspect we eliminate one memory fetch for each of these and reduce
the register loading by reducing the number of registers needed (two for one
in some situations). I will change back and retest.

and Paul commented "-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already."

I will remove those two options and keep -ftree-vectorize

I will report back my findings.



Changed back to index_type, all OK, must have been some OS stuff running in the 
background.


All comments incorporated. Standing by for approval.

Jerry



Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread Szabolcs Nagy
On 15/11/16 16:22, FX wrote:
>> There seems to be a separate api for checking trapping support:
>> ieee_support_halting, but it only checked if the exception status
>> flags are available, so check trapping support too by enabling
>> and disabling traps.
> 
> Thanks for the patch.
> 
> I am worried about the unnecessary operations that we’re doing here: doesn’t 
> glibc have a way to tell you what it supports without having to do it (twice, 
> enabling then disabling)?
> 
> Also, the glibc doc states that: "Each of the macros FE_DIVBYZERO, 
> FE_INEXACT, FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW is defined when the 
> implementation supports handling of the corresponding exception”. It evens 
> says:
> 
>> Each constant is defined if and only if the FPU you are compiling for 
>> supports that exception, so you can test for FPU support with ‘#ifdef’.
> 
> So it seems rather clear that compile-time tests are the recommended way to 
> go.

i think that's a documentation bug then, it
should say that the macros imply the support
of fpu exception status flags, but not trapping.

(otherwise glibc could not provide iso c annex
f conforming fenv on aarch64 and arm, where FE_*
must be defined, but only status flag support
is required.)

disabling/enabling makes this api a lot heavier
than before, but trapping cannot be decided at
compile-time, although the result may be cached,
i think this should not be a frequent operation.

otoh rereading my patch i think i fail to restore
the original exception state correctly.




Rework subreg_get_info

2016-11-15 Thread Richard Sandiford
This isn't intended to change the behaviour, just rewrite the
existing logic in a different (and hopefully clearer) way.
The new form -- particularly the part based on the "block"
concept -- is easier to convert to polynomial sizes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (subreg_get_info): Use more local variables.
Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
occupies at least one register.  Use byte_lowpart_offset to
check for big-endian offsets unless REG_WORDS_BIG_ENDIAN !=
WORDS_BIG_ENDIAN.  Share previously-duplicated if block.
Rework the main handling so that it operates on independently-
addressable YMODE-sized blocks.  Use subreg_size_lowpart_offset
to check lowpart offsets, without trying to find an equivalent
integer mode first.  Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
as a final register-endianness correction.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index ca6cced..7c0acf5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode 
xmode,
 unsigned int offset, machine_mode ymode,
 struct subreg_info *info)
 {
-  int nregs_xmode, nregs_ymode;
-  int mode_multiple, nregs_multiple;
-  int offset_adj, y_offset, y_offset_adj;
-  int regsize_xmode, regsize_ymode;
-  bool rknown;
+  unsigned int nregs_xmode, nregs_ymode;
 
   gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
 
-  rknown = false;
+  unsigned int xsize = GET_MODE_SIZE (xmode);
+  unsigned int ysize = GET_MODE_SIZE (ymode);
+  bool rknown = false;
 
   /* If there are holes in a non-scalar mode in registers, we expect
- that it is made up of its units concatenated together.  */
+ that it is made up of its units concatenated together.  Each scalar
+ unit occupies at least one register.  */
   if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
 {
-  machine_mode xmode_unit;
-
   nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
-  xmode_unit = GET_MODE_INNER (xmode);
+  unsigned int nunits = GET_MODE_NUNITS (xmode);
+  machine_mode xmode_unit = GET_MODE_INNER (xmode);
   gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
   gcc_assert (nregs_xmode
- == (GET_MODE_NUNITS (xmode)
+ == (nunits
  * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
   gcc_assert (hard_regno_nregs[xregno][xmode]
- == (hard_regno_nregs[xregno][xmode_unit]
- * GET_MODE_NUNITS (xmode)));
+ == hard_regno_nregs[xregno][xmode_unit] * nunits);
 
   /* You can only ask for a SUBREG of a value with holes in the middle
 if you don't cross the holes.  (Such a SUBREG should be done by
@@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
 3 for each part, but in memory it's two 128-bit parts.
 Padding is assumed to be at the end (not necessarily the 'high part')
 of each unit.  */
-  if ((offset / GET_MODE_SIZE (xmode_unit) + 1
-  < GET_MODE_NUNITS (xmode))
+  if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
  && (offset / GET_MODE_SIZE (xmode_unit)
- != ((offset + GET_MODE_SIZE (ymode) - 1)
- / GET_MODE_SIZE (xmode_unit
+ != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit
{
  info->representable_p = false;
  rknown = true;
@@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode 
xmode,
   nregs_ymode = hard_regno_nregs[xregno][ymode];
 
   /* Paradoxical subregs are otherwise valid.  */
-  if (!rknown
-  && offset == 0
-  && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode))
+  if (!rknown && offset == 0 && ysize > xsize)
 {
   info->representable_p = true;
   /* If this is a big endian paradoxical subreg, which uses more
 actual hard registers than the original register, we must
 return a negative offset so that we find the proper highpart
 of the register.  */
-  if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD
- ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
-   info->offset = nregs_xmode - nregs_ymode;
+  if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize > UNITS_PER_WORD
+ ? REG_WORDS_BIG_ENDIAN
+ : byte_lowpart_offset (ymode, xmode) != 0)
+   info->offset = (int) nregs_xmode - (int) nregs_ymode;
   else
info->offset = 0;
   info->nregs = nregs_ymode;
@@ -3673,31 +3667,23 @@ subreg_get_info (unsigned int xregno, machine_mode 
xmode,
  modes, we cannot generally 

Re: Fix handling of unknown sizes in rtx_addr_can_trap_p

2016-11-15 Thread Jeff Law

On 11/15/2016 09:21 AM, Richard Sandiford wrote:

If the size passed in to rtx_addr_can_trap_p was zero, the frame
handling would get the size from the mode instead.  However, this
too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
memory reference with no MEM_SIZE (which should be rare these days).
This meant that the conditions for a 4-byte access at offset X were
stricter than those for an access of unknown size at offset X.

This patch checks whether the size is still zero, as the
SYMBOL_REF handling does.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.
I guess it's conservatively correct in that claiming we can trap when we 
can't never hurts correctness.



I'm OK with the patch, but am quite curious how we got to this point 
without an attached MEM_SIZE.


jeff


[PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Martin Liška
Hi.

As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
value
with a pointer, enum value and an integer, one can eventually get zero value.
Thus I decided to introduce a new flag that would distinguish between not set 
hash value
and a valid and (possibly) zero value.
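
The underlying pattern is simple; a stripped-down C sketch with illustrative
names (not the ipa-icf code) of caching a hash with a separate validity flag
instead of treating zero as "not computed yet":

#include <stdbool.h>

struct cached_hash
{
  unsigned int value;   /* may legitimately be zero */
  bool computed;        /* replaces the old "value != 0" test */
};

static unsigned int
get_hash (struct cached_hash *h, unsigned int (*compute) (void))
{
  if (!h->computed)
    {
      h->value = compute ();
      h->computed = true;
    }
  return h->value;
}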

I've been running regression tests, ready to install after it finishes?
Martin
>From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 16:15:20 +0100
Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

gcc/ChangeLog:

2016-11-15  Martin Liska  

	PR ipa/78309
	* ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
	(sem_function::get_hash): Make condition based on m_hash_set.
	(sem_variable::get_hash): Likewise.
	* ipa-icf.h (sem_item::m_hash_set): New property.
---
 gcc/ipa-icf.c | 10 ++
 gcc/ipa-icf.h |  3 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 1ab67f3..4352fd0 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -140,7 +140,8 @@ sem_usage_pair::sem_usage_pair (sem_item *_item, unsigned int _index):
for bitmap memory allocation.  */
 
 sem_item::sem_item (sem_item_type _type,
-		bitmap_obstack *stack): type (_type), m_hash (0)
+		bitmap_obstack *stack): type (_type), m_hash (0),
+		m_hash_set (false)
 {
   setup (stack);
 }
@@ -151,7 +152,7 @@ sem_item::sem_item (sem_item_type _type,
 
 sem_item::sem_item (sem_item_type _type, symtab_node *_node,
 		hashval_t _hash, bitmap_obstack *stack): type(_type),
-  node (_node), m_hash (_hash)
+  node (_node), m_hash (_hash), m_hash_set (true)
 {
   decl = node->decl;
   setup (stack);
@@ -230,6 +231,7 @@ sem_item::target_supports_symbol_aliases_p (void)
 void sem_item::set_hash (hashval_t hash)
 {
   m_hash = hash;
+  m_hash_set = true;
 }
 
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
@@ -279,7 +281,7 @@ sem_function::get_bb_hash (const sem_bb *basic_block)
 hashval_t
 sem_function::get_hash (void)
 {
-  if (!m_hash)
+  if (!m_hash_set)
 {
   inchash::hash hstate;
   hstate.add_int (177454); /* Random number for function type.  */
@@ -2116,7 +2118,7 @@ sem_variable::parse (varpool_node *node, bitmap_obstack *stack)
 hashval_t
 sem_variable::get_hash (void)
 {
-  if (m_hash)
+  if (m_hash_set)
 return m_hash;
 
   /* All WPA streamed in symbols should have their hashes computed at compile
diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
index d8de655..8dc3d31 100644
--- a/gcc/ipa-icf.h
+++ b/gcc/ipa-icf.h
@@ -274,6 +274,9 @@ protected:
   /* Hash of item.  */
   hashval_t m_hash;
 
+  /* Indicated whether a hash value has been set or not.  */
+  bool m_hash_set;
+
 private:
   /* Initialize internal data structures. Bitmap STACK is used for
  bitmap memory allocation process.  */
-- 
2.10.1



Re: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Jeff Law

On 11/15/2016 09:43 AM, Martin Liška wrote:

Hi.

As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
value
with a pointer, enum value and an integer, one can eventually get zero value.
Thus I decided to introduce a new flag that would distinguish between not set 
hash value
and a valid and (possibly) zero value.

I've been running regression tests, ready to install after it finishes?
Martin


0001-Add-sem_item-m_hash_set-PR-ipa-78309.patch


From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 16:15:20 +0100
Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

gcc/ChangeLog:

2016-11-15  Martin Liska  

PR ipa/78309
* ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
(sem_function::get_hash): Make condition based on m_hash_set.
(sem_variable::get_hash): Likewise.
* ipa-icf.h (sem_item::m_hash_set): New property.

OK.

jeff



Re: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Jan Hubicka
> Hi.
> 
> As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
> value
> with a pointer, enum value and an integer, one can eventually get zero value.
> Thus I decided to introduce a new flag that would distinguish between not set 
> hash value
> and a valid and (possibly) zero value.
> 
> I've been running regression tests, ready to install after it finishes?
> Martin

> >From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 11 Nov 2016 16:15:20 +0100
> Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)
> 
> gcc/ChangeLog:
> 
> 2016-11-15  Martin Liska  
> 
>   PR ipa/78309
>   * ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
>   (sem_function::get_hash): Make condition based on m_hash_set.
>   (sem_variable::get_hash): Likewise.
>   * ipa-icf.h (sem_item::m_hash_set): New property.
Yep, zero is definitely a valid hash value.

Patch is OK. We may consider backporting it to release branches.
Honza


Re: Rework subreg_get_info

2016-11-15 Thread Richard Sandiford
Richard Sandiford  writes:
> This isn't intended to change the behaviour, just rewrite the
> existing logic in a different (and hopefully clearer) way.
> The new form -- particularly the part based on the "block"
> concept -- is easier to convert to polynomial sizes.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Sorry, I should have said: this was also tested by compiling the
testsuite before and after the change at -O2 -ftree-vectorize on:

aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
x86_64-linux-gnu xtensa-elf

There were no differences in assembly output.

Thanks,
Richard


Re: [PATCH/AARCH64] Have the verbose cost model output output be controllable

2016-11-15 Thread Andrew Pinski
On Fri, Oct 7, 2016 at 1:01 AM, Kyrill Tkachov
 wrote:
> Hi Andrew,
>
>
> On 24/09/16 06:46, Andrew Pinski wrote:
>>
>> Hi,
>>As reported in PR 61367, the aarch64 back-end is too verbose when it
>> is dealing with the cost model.  I tend to agree, no other back-end is
>> this verbose.  So I decided to add an option to enable this verbose
>> output if requested.
>>
>> I did NOT document it in invoke.texi because I don't feel like this is
>> an option which an user should use.  But I can add it if requested.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * config/aarch64/aarch64.opt (mverbose-cost-dump): New option.
>> * config/aarch64/aarch64.c (aarch64_rtx_costs): Use
>> flag_aarch64_verbose_cost instead of checking for details dump.
>> (aarch64_rtx_costs_wrapper): Likewise.
>
>
> I'm okay with the idea, but I can't approve (cc'ing people who can).

Ping?


> One nit:
>
> +mverbose-cost-dump
> +Common Var(flag_aarch64_verbose_cost)
+Enables verbose cost model dumping in the debug dump files.
>
> You should add "Undocumented" to that.
> I don't think the option is major enough to warrant an entry in invoke.texi.
> It's only for aarch64 backend developers who know exactly what they're
> looking for.
>
> Cheers,
> Kyrill
>
>


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jiong Wang



On 15/11/16 16:18, Jakub Jelinek wrote:

On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote:

   Takes one signed LEB128 offset and retrieves 8-byte contents from the address
   calculated by CFA plus this offset, the contents then authenticated as per A
   key for instruction pointer using current CFA as salt. The result is pushed
   onto the stack.

I'd like to point out that especially the vendor range of DW_OP_* is
extremely scarce resource, we have only a couple of unused values, so taking
3 out of the remaining unused 12 for a single architecture is IMHO too much.
Can't you use just a single opcode and encode which of the 3 operations it is
in say the low 2 bits of a LEB 128 operand?
We'll likely need to do RSN some multiplexing even for the generic GNU
opcodes if we need just a few further ones (say 0xff as an extension,
followed by uleb128 containing the opcode - 0xff).
In the non-vendor area we still have 54 values left, so there is more space
for future expansion.

   Separate DWARF operations are introduced instead of combining all of them
into one mostly because these operations are going to be used for most of the
functions once return address signing is enabled, and they are used for
describing frame unwinding, so they will go into the unwind table for C++
programs or C programs compiled with -fexceptions; the impact on unwind table
size is significant.  So I was trying to lower the unwind table size overhead
as much as I can.

   IMHO, three numbers actually is not that much for one architecture in the
DWARF operation vendor extension space, as vendors can overlap with each other.
The only painful thing, from my understanding, is that there are platform
vendors, for example "GNU" and "LLVM" etc., with which an architecture vendor
can't overlap.

For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just
one range, so ideally the opcodes would be unique everywhere, if not, there
is just a single GNU vendor, there is no separate range for Aarch64, that
can overlap with range for x86_64, and powerpc, etc.

Perhaps we could declare that certain opcode subrange for the GNU vendor is
architecture specific and document that the meaning of opcodes in that range
and count/encoding of their arguments depends on the architecture, but then
we should document how to figure out the architecture too (e.g. for ELF
base it on the containing EM_*).  All the tools that look at DWARF (readelf,
objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on 
that
though.

I know nothing about the aarch64 return address signing, would all 3 or say
2 usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?


  I think it's the latter, the DW_OP_AARCH64_paciasp and
DW_OP_AARCH64_paciasp_deref are going to appear frequently and at different pcs.
  
  For example, in the following function prologue there are three instructions,
at 0x0, 0x4 and 0x8.

  After the first instruction at 0x0, LR/X30 will be mangled.  The "paciasp" 
always
mangle LR register using SP as salt and write back the value into LR.  We then 
generate
DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is mangled in 
this
way so they can unwind the original value properly.

  After the second instruction at 0x4, The mangled value of LR/X30 will be 
pushed on
to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes: first 
fetch the
mangled value from stack offset -16, then do whatever to restore the original 
value
from the mangled value.  This is represented by (DW_OP_AARCH64_paciasp_deref, 
offset).

.cfi_startproc
   0x0  paciasp (this instruction sign return address register LR/X30)
.cfi_val_expression 30, DW_OP_AARCH64_paciasp
   0x4  stp x29, x30, [sp, -32]!
.cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
.cfi_offset 29, -32
.cfi_def_cfa_offset 32
   0x8  add x29, sp, 0


Perhaps if there is just 1
opcode and has all the info encoded just in one bigger uleb128 or something
similar...




Fix vec_cmp comparison mode

2016-11-15 Thread Richard Sandiford
vec_cmps assign the result of a vector comparison to a mask.
The optab was called with the destination having mode mask_mode
but with the source (the comparison) having mode VOIDmode,
which led to invalid rtl if the source operand was used directly.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* optabs.c (vector_compare_rtx): Add a cmp_mode parameter
and use it in the final call to gen_rtx_fmt_ee.
(expand_vec_cond_expr): Update accordingly.
(expand_vec_cmp_expr): Likewise.

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 7a1f025..b135c9b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5283,14 +5283,15 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
   return code;
 }
 
-/* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators.  OPNO holds an index of the first comparison
-   operand in insn with code ICODE.  Do not generate compare instruction.  */
+/* Return a comparison rtx of mode CMP_MODE for COND.  Use UNSIGNEDP to
+   select signed or unsigned operators.  OPNO holds the index of the
+   first comparison operand for insn ICODE.  Do not generate the
+   compare instruction itself.  */
 
 static rtx
-vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-   bool unsignedp, enum insn_code icode,
-   unsigned int opno)
+vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+   tree t_op0, tree t_op1, bool unsignedp,
+   enum insn_code icode, unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -5318,7 +5319,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, 
tree t_op1,
   create_input_operand (&ops[1], rtx_op1, m1);
   if (!maybe_legitimize_operands (icode, opno, 2, ops))
 gcc_unreachable ();
-  return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
+  return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
 }
 
 /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
@@ -5644,7 +5645,8 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree 
op1, tree op2,
return 0;
 }
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
+  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
+  icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -5688,7 +5690,8 @@ expand_vec_cmp_expr (tree type, tree exp, rtx target)
return 0;
 }
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  comparison = vector_compare_rtx (mask_mode, tcode, op0a, op0b,
+  unsignedp, icode, 2);
   create_output_operand (&ops[0], target, mask_mode);
   create_fixed_operand (&ops[1], comparison);
   create_fixed_operand (&ops[2], XEXP (comparison, 0));



Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-15 Thread Andrew Stubbs

On 15/11/16 12:29, Segher Boessenkool wrote:

The peepholes do not support it, or maybe the define_insns do not either.
The machine of course will not care.


Oh, OK, so probably the bug is not in the peephole at all, but in the 
define_insn, or lack thereof.


More investigation required.

Thanks

Andrew


Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-15 Thread Alexander Monakov
On Mon, 14 Nov 2016, Alexander Monakov wrote:
> On Mon, 14 Nov 2016, Martin Jambor wrote:
> 
> > Hi,
> > 
> > yesterday I forgot to send out the following patch.  The test
> > libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90 was failing
> > for me when I was testing the HSA branch merge but I believe the test
> > itself is wrong and the failure is due to us now adhering to OpenMP
> > 4.5 default mapping of scalars (i.e. firstprivate, as opposed to
> > tofrom in 4.0) and the test itself needs to be fixed in the following
> > way.
> 
> From inspection, I believe device-1.f90 in the same directory has the same
> issue?

Yep, I do see new test execution failures with both Intel MIC and PTX offloading
on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested patch
for the first two (on target2.f90 there's a different problem).
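
The semantics being relied on, shown as a C analogue rather than the Fortran
tests themselves (hedged illustration only): under OpenMP 4.5 a scalar
referenced in a target region defaults to firstprivate, so a value assigned
inside the region is not copied back to the host unless it is mapped
explicitly.

#include <omp.h>

int
is_offloaded (void)
{
  int res = 1;
  /* Without map(from: res) the 4.5 default is firstprivate and the
     assignment inside the target region would be lost on the host.  */
  #pragma omp target map(from: res)
  res = omp_is_initial_device ();
  return !res;
}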

Martin Jambor  
Alexander Monakov  

* testsuite/libgomp.fortran/examples-4/device-1.f90 (e_57_1): Add
mapping clauses to target constructs.
* testsuite/libgomp.fortran/examples-4/device-3.f90 (e_57_3): Ditto.

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
index a411db4..30148f1 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
@@ -9,12 +9,12 @@ program e_57_1
   a = 100
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
@@ -25,12 +25,12 @@ program e_57_1
   a = a + 200
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
@@ -41,12 +41,12 @@ program e_57_1
   a = a + 200
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
index a29f1b5..d770b91 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
@@ -8,13 +8,13 @@ program e_57_3
   integer :: default_device
 
   default_device = omp_get_default_device ()
-  !$omp target
+  !$omp target map(from: res)
 res = omp_is_initial_device ()
   !$omp end target
   if (res) call abort
 
   call omp_set_default_device (omp_get_num_devices ())
-  !$omp target
+  !$omp target map(from: res)
 res = omp_is_initial_device ()
   !$omp end target
   if (.not. res) call abort


Fix instances of gen_rtx_REG (VOIDmode, ...)

2016-11-15 Thread Richard Sandiford
Several definitions of INCOMING_RETURN_ADDR_RTX used
gen_rtx_REG (VOIDmode, ...), which with later patches
would trip an assert.  This patch converts them to use
Pmode instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* config/i386/i386.h (INCOMING_RETURN_ADDR_RTX): Use Pmode instead
of VOIDmode.
* config/ia64/ia64.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/iq2000/iq2000.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/m68k/m68k.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/microblaze/microblaze.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mips/mips.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mn10300/mn10300.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/nios2/nios2.h (INCOMING_RETURN_ADDR_RTX): Likewise.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..fdaf423 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2176,7 +2176,7 @@ extern int const 
x86_64_ms_sysv_extra_clobbered_registers[12];
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (VOIDmode, gen_rtx_REG (VOIDmode, STACK_POINTER_REGNUM))
+  gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
 
 /* After the prologue, RA is at -4(AP) in the current frame.  */
 #define RETURN_ADDR_RTX(COUNT, FRAME)  \
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ac0cb86..c79e20b 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -896,7 +896,7 @@ enum reg_class
RTL is either a `REG', indicating that the return value is saved in `REG',
or a `MEM' representing a location in the stack.  This enables DWARF2
unwind info for C++ EH.  */
-#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (VOIDmode, BR_REG (0))
+#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, BR_REG (0))
 
 /* A C expression whose value is an integer giving the offset, in bytes, from
the value of the stack pointer register to the top of the stack frame at the
diff --git a/gcc/config/iq2000/iq2000.h b/gcc/config/iq2000/iq2000.h
index 3b9dceb..e79c9a7 100644
--- a/gcc/config/iq2000/iq2000.h
+++ b/gcc/config/iq2000/iq2000.h
@@ -258,7 +258,7 @@ enum reg_class
 : (rtx) 0)
 
 /* Before the prologue, RA lives in r31.  */
-#define INCOMING_RETURN_ADDR_RTX  gen_rtx_REG (VOIDmode, GP_REG_FIRST + 31)
+#define INCOMING_RETURN_ADDR_RTX  gen_rtx_REG (Pmode, GP_REG_FIRST + 31)
 
 
 /* Register That Address the Stack Frame.  */
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index 2aa858f..7b63bd2 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -768,7 +768,7 @@ do { if (cc_prev_status.flags & CC_IN_68881) \
 
 /* Before the prologue, RA is at 0(%sp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (VOIDmode, gen_rtx_REG (VOIDmode, STACK_POINTER_REGNUM))
+  gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
 
 /* After the prologue, RA is at 4(AP) in the current frame.  */
 #define RETURN_ADDR_RTX(COUNT, FRAME) \
diff --git a/gcc/config/microblaze/microblaze.h b/gcc/config/microblaze/microblaze.h
index dbfb652..849fab9 100644
--- a/gcc/config/microblaze/microblaze.h
+++ b/gcc/config/microblaze/microblaze.h
@@ -182,7 +182,7 @@ extern enum pipeline_type microblaze_pipe;
NOTE:  GDB has a workaround and expects this incorrect value.
If this is fixed, a corresponding fix to GDB is needed.  */
 #define INCOMING_RETURN_ADDR_RTX   \
-  gen_rtx_REG (VOIDmode, GP_REG_FIRST + MB_ABI_SUB_RETURN_ADDR_REGNUM)
+  gen_rtx_REG (Pmode, GP_REG_FIRST + MB_ABI_SUB_RETURN_ADDR_REGNUM)
 
 /* Use DWARF 2 debugging information by default.  */
 #define DWARF2_DEBUGGING_INFO
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 81862a9..12662a7 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1469,7 +1469,7 @@ FP_ASM_SPEC "\
 #define DWARF_FRAME_RETURN_COLUMN RETURN_ADDR_REGNUM
 
 /* Before the prologue, RA lives in r31.  */
-#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (VOIDmode, RETURN_ADDR_REGNUM)
+#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM)
 
 /* Describe how we implement __builtin_eh_return.  */
 #define EH_RETURN_DATA_REGNO(N) \
diff --git a/gcc/config/mn10300/mn10300.h b/gcc/config/mn10300/mn10300.h
index 714c6a0..9fd3d4b 100644
--- a/gcc/config/mn10300/mn10300.h
+++ b/gcc/config/mn10300/mn10300.h
@@ -516,7 +516,7 @@ struct cum_arg
 /* The return address is saved both in the stack and in MDR.  Using
the stack location is handiest for what unwinding needs.  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (VOIDm

Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread FX
> disabling/enabling makes this api a lot heavier
> than before, but trapping cannot be decided at
> compile-time, although the result may be cached,
> i think this should not be a frequent operation.
> 
> otoh rereading my patch i think i fail to restore
> the original exception state correctly.

Well, if we have no choice, then let’s do it. (With an updated patch)

FX
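
A minimal C sketch of the probe being discussed, assuming the GNU fenv
extensions feenableexcept/fedisableexcept/fegetexcept; the actual
libgfortran change may differ, and restoring the previously enabled trap
set is exactly the part the quoted mail is worried about:

  #define _GNU_SOURCE
  #include <fenv.h>

  /* Probe whether trapping on FLAG can be enabled at runtime, then put the
     enabled-trap set back exactly as it was so the probe has no side
     effects.  */
  static int
  trapping_supported (int flag)
  {
    int old = fegetexcept ();               /* traps currently enabled */
    if (old == -1)
      return 0;
    int ok = feenableexcept (flag) != -1;   /* try to turn trapping on */
    fedisableexcept (FE_ALL_EXCEPT);        /* clear whatever we changed */
    feenableexcept (old);                   /* restore the original set */
    return ok;
  }

The result could be cached after the first call, as suggested above, since
support for trapping does not change while the process runs.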

Re: [PATCH] libiberty: Fix some demangler crashes caused by reading past end of input.

2016-11-15 Thread Ian Lance Taylor
On Mon, Nov 14, 2016 at 1:19 AM, Mark Wielaard  wrote:
> In various situations the cplus_demangle () function could read past the
> end of input causing crashes. Add checks in various places to not advance
> the demangle string location and fail early when end of string is reached.
> Add various examples of input strings to the testsuite that would crash
> test-demangle before the fixes.
>
> Found by using the American Fuzzy Lop (afl) fuzzer.
>
> libiberty/ChangeLog:
>
>* cplus-dem.c (demangle_signature): After 'H', template function,
>no success and don't advance position if end of string reached.
>(demangle_template): After 'z', template name, return zero on
>premature end of string.
>(gnu_special): Guard strchr against searching for zero characters.
>(do_type): If member, only advance mangled string when 'F' found.
>* testsuite/demangle-expected: Add examples of strings that could
>crash the demangler by reading past end of input.
> ---

This is OK.

Thanks.

Ian
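
For illustration, the shape of the strchr guard mentioned in the ChangeLog,
as a small self-contained C sketch rather than the actual libiberty code:

  #include <stddef.h>
  #include <string.h>

  /* strchr (s, '\0') "succeeds" by returning a pointer to the terminating
     NUL, so a caller that advances past the result walks off the end of
     the buffer.  Refusing a zero search character fails early instead.  */
  static const char *
  find_marker (const char *mangled, char marker)
  {
    if (marker == '\0')          /* end of input already reached */
      return NULL;
    return strchr (mangled, marker);
  }

The other hunks follow the same pattern: check for the string terminator
before consuming a character, and report no success instead of advancing.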

