[PATCH] Remove odd restriction on conversion folding

2016-06-29 Thread Richard Biener

The following removes the restriction of not removing a useless
intermediate conversion if the final conversion is to a type
whose precision doesn't match its mode precision (in case the
intermediate conversion has the same mode).

This restriction possibly dates from the times when such an outer conversion
didn't generate any code (IIRC I fixed that early during the 4.x series).
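
As a rough illustration of the general (T)(T2)x -> (T)x fold (not of the
removed restriction itself, which concerns final types whose precision
differs from their mode precision), here is a minimal sketch.  The function
names and the choice of int16_t/int32_t/int64_t are assumptions made for the
example only.  The intermediate type is at least as wide as the inner one
and has the same signedness, so the middle conversion changes nothing.

```c
#include <stdint.h>

/* Two-step conversion: int16_t -> int32_t -> int64_t.  The middle
   conversion is a widening with unchanged signedness.  */
int64_t
convert_via_int32 (int16_t x)
{
  return (int64_t) (int32_t) x;
}

/* Direct conversion: int16_t -> int64_t.  For every input this yields
   the same value as convert_via_int32.  */
int64_t
convert_direct (int16_t x)
{
  return (int64_t) x;
}
```

Both functions agree for all int16_t inputs, which is why dropping the
middle conversion is safe in this shape of expression.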

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-06-29  Richard Biener

* match.pd ((T)(T2)x -> (T)x): Remove restriction on final
precision not matching mode precision.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 237813)
+++ gcc/match.pd(working copy)
@@ -1652,14 +1652,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
float or both integer, we don't need the middle conversion if the
former is wider than the latter and doesn't change the signedness
(for integers).  Avoid this if the final type is a pointer since
-   then we sometimes need the middle conversion.  Likewise if the
-   final type has a precision not equal to the size of its mode.  */
+   then we sometimes need the middle conversion.  */
 (if (((inter_int && inside_int) || (inter_float && inside_float))
 && (final_int || final_float)
 && inter_prec >= inside_prec
-&& (inter_float || inter_unsignedp == inside_unsignedp)
-&& ! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
-  && TYPE_MODE (type) == TYPE_MODE (inter_type)))
+&& (inter_float || inter_unsignedp == inside_unsignedp))
  (ocvt @0))
 
 /* If we have a sign-extension of a zero-extended value, we can
@@ -1692,9 +1689,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && ((inter_unsignedp && inter_prec > inside_prec)
 == (final_unsignedp && final_prec > inter_prec))
 && ! (inside_ptr && inter_prec != final_prec)
-&& ! (final_ptr && inside_prec != inter_prec)
-&& ! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
-  && TYPE_MODE (type) == TYPE_MODE (inter_type)))
+&& ! (final_ptr && inside_prec != inter_prec))
  (ocvt @0))
 
 /* A truncation to an unsigned type (a zero-extension) should be


Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

2016-06-29 Thread Andreas Krebbel
On 06/28/2016 04:16 PM, Jiong Wang wrote:
...
> So my first impression is that TARGET_LEGITIMATE_ADDRESS_P on s390 does need
> a fix here. The following draft patch fixes this; my fix may be incorrect,
> as normally we will allow an illegal constant offset if it's
> associated with a virtual frame register before virtual register
> elimination, and I don't know if that should be considered here. Also I
> don't know if such a check should be constrained under some architecture
> features.
> 
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx 
> addr, bool strict)
> || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
>return false;
>   }
> +
> +  if (ad.disp
> +  && !s390_short_displacement (ad.disp))
> +return false;
> +
> return true;
>   }

Whether a short displacement is required or not depends on the instruction.  We
always relied on reload to take care of this.  legitimate_address_p needs to
accept the union of all valid addresses on S/390.  That change would probably
cause a severe performance regression.
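
To make the "union of all valid addresses" point concrete, here is a small
sketch.  It is not the s390 back-end code; the function names are
hypothetical, and the ranges assume the usual z/Architecture encodings
(an unsigned 12-bit short displacement and a signed 20-bit long
displacement).

```c
#include <stdbool.h>

/* Assumed range of a short displacement: unsigned 12-bit field.  */
bool
fits_short_displacement (long disp)
{
  return disp >= 0 && disp < 4096;
}

/* Assumed range of a long displacement: signed 20-bit field.  */
bool
fits_long_displacement (long disp)
{
  return disp >= -524288 && disp < 524288;
}

/* An address-level check has to accept anything that at least one
   instruction format can encode; narrowing to the operand a particular
   instruction needs is left to per-instruction constraints.  */
bool
displacement_valid_for_some_insn (long disp)
{
  return fits_short_displacement (disp) || fits_long_displacement (disp);
}
```

Rejecting every non-short displacement at the address level would also
reject addresses that long-displacement instructions can use directly.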

> Meanwhile I feel LRA might be too strict here, as we can't assume all
> addresses are legitimized before LRA, right? We still need to handle reloading
> an illegal address. Right now, though, a reload of
> 
>set (mem (reg + reg + offset)) regA
> 
>(The offset is out of range, so the inner address doesn't pass the
> constraint check.)
> 
> into
> 
> regB <- reg + reg + offset
>(set (mem (regB)) regA)

This is exactly what is supposed to happen in that case.  All addresses which
might be invalid to use directly in an insn can be handled with load address
or load address relative long (+ secondary reload for odd addresses).

> is treated as a spill after r237277, so I feel the following check in
> lra-constraints.c also needs relaxing?
> 
> if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
> 
> 
> Comments?
> 



Re: [PATCH] Improve TBAA with unions

2016-06-29 Thread Richard Biener
On Tue, 21 Jun 2016, Jeff Law wrote:

> On 05/24/2016 03:14 AM, Richard Biener wrote:
> > On Wed, 18 May 2016, Richard Biener wrote:
> > 
> > > 
> > > > The following adjusts get_alias_set behavior when applied to
> > > union accesses to use the union alias-set rather than alias-set
> > > zero.  This is in line with behavior from the alias oracle
> > > which (bogously) circumvents alias-set zero with looking at
> > > the alias-sets of the base object.  Thus for
> > > 
> > > union U { int i; float f; };
> > > 
> > > float
> > > foo (union U *u, double *p)
> > > {
> > >   u->f = 1.;
> > >   *p = 0;
> > >   return u->f;
> > > }
> > > 
> > > the langhooks ensured u->f has alias-set zero and thus disambiguation
> > > against *p was not allowed.  Still the alias-oracle did the disambiguation
> > > by using the alias set of the union here (I think optimizing the
> > > return to return 1. is valid).
> > > 
> > > We have a good place in the middle-end to apply such rules which
> > > is component_uses_parent_alias_set_from - this is where I move
> > > the logic that is duplicated in various frontends.
> > > 
> > > The Java and Ada frontends do not allow union type punning (LTO does),
> > > so this patch may eventually pessimize them.  I don't care anything
> > > about Java but Ada folks might want to chime in.
> > > 
> > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > > 
> > > Ok for trunk?
> > 
> > Ping.
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2016-05-18  Richard Biener  
> > > 
> > >   * alias.c (component_uses_parent_alias_set_from): Handle
> > >   type punning through union accesses by using the union alias set.
> > >   * gimple.c (gimple_get_alias_set): Remove union type punning case.
> > > 
> > >   c-family/
> > >   * c-common.c (c_common_get_alias_set): Remove union type punning case.
> > >   
> > >   fortran/
> > >   * f95-lang.c (LANG_HOOKS_GET_ALIAS_SET): Remove (un-)define.
> > >   (gfc_get_alias_set): Remove.
> You know the aliasing rules better than I.  If you're confident using the
> union's alias set is safe, then it's OK with me.
> 
> My only worry is that if we get it wrong, it's likely to be a subtle bug that
> may take a long time to expose itself.  But that in and of itself shouldn't
> stop us from going forward.

Ok, I have installed the patch now.  I had to adjust
g++.dg/torture/pr71002.C which was on the border of being invalid
(it relied on us using alias-set zero for all union accesses).
Basically it had union { struct A a; struct B b; } u; and accessed
that memory using a type of struct C.  The fix to the testcase is trivial:
just put C into the union.  The PR talks about that being difficult
for the origin of the testcase, as C is said to be not POD while the
union type has to be.  That still doesn't make it valid IMHO.
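
For reference, a minimal sketch of the kind of access that stays
well-behaved under this rule: the type-punned read goes directly through
the union, with the member that is read being part of the union type.
The union and function names are made up for the example, and the bit
pattern mentioned below assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>

/* Both members live in the union, so the punned access below is made
   directly through the union object.  */
union float_bits_u
{
  uint32_t i;
  float f;
};

/* Store through one member and read back through the other.  GCC
   documents this form of type punning as supported.  */
uint32_t
float_bits (float value)
{
  union float_bits_u u;
  u.f = value;
  return u.i;
}
```

With the stated assumption, float_bits (1.0f) yields 0x3f800000.  Accessing
the same storage through a type that is not a member of the union, as the
original testcase did with struct C, is not covered by this rule.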

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-06-29  Richard Biener  

PR middle-end/71002
* alias.c (component_uses_parent_alias_set_from): Handle
type punning through union accesses by using the union alias set.
* gimple.c (gimple_get_alias_set): Remove union type punning case.

c-family/
* c-common.c (c_common_get_alias_set): Remove union type punning case.

fortran/
* f95-lang.c (LANG_HOOKS_GET_ALIAS_SET): Remove (un-)define.
(gfc_get_alias_set): Remove.

* g++.dg/torture/pr71002.C: Adjust testcase.

Index: trunk/gcc/alias.c
===
*** trunk.orig/gcc/alias.c  2016-05-18 11:15:41.744792403 +0200
--- trunk/gcc/alias.c   2016-05-18 11:31:40.139709782 +0200
*** component_uses_parent_alias_set_from (co
*** 619,624 
--- 619,632 
case COMPONENT_REF:
  if (DECL_NONADDRESSABLE_P (TREE_OPERAND (t, 1)))
found = t;
+ /* Permit type-punning when accessing a union, provided the access
+is directly through the union.  For example, this code does not
+permit taking the address of a union member and then storing
+through it.  Even the type-punning allowed here is a GCC
+extension, albeit a common and useful one; the C standard says
+that such accesses have implementation-defined behavior.  */
+ else if (TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == UNION_TYPE)
+   found = t;
  break;
  
case ARRAY_REF:
Index: trunk/gcc/c-family/c-common.c
===
*** trunk.orig/gcc/c-family/c-common.c  2016-05-18 11:15:41.744792403 +0200
--- trunk/gcc/c-family/c-common.c   2016-05-18 11:31:40.143709828 +0200
*** static GTY(()) hash_table
*** 4734,4741 
  alias_set_type
  c_common_get_alias_set (tree t)
  {
-   tree u;
- 
/* For VLAs, use the alias set of the element type rather than the
   default of alias set 0 for types compared str

Re: Fix for PR70909 in Libiberty Demangler (4)

2016-06-29 Thread Marcel Böhme
Hi Jason,

These test cases are generated by fuzzing, which produces a lot of nonsensical
input data.
I think "Garbage In, Garbage Out" is quite applicable here.
With the patch, at least it no longer crashes, and the vulnerability is fixed.

Best regards,
- Marcel
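
For readers skimming the thread, the check that the quoted patch adds to
d_print_comp can be sketched roughly as follows.  This is a simplified
stand-alone model, not the libiberty code: struct stack_node and
seen_more_than_once are hypothetical names, and the component is reduced
to an opaque pointer.

```c
#include <stddef.h>

/* Simplified stand-in for the demangler's component stack: each entry
   records the component being printed and links to its parent entry.  */
struct stack_node
{
  const void *dc;
  const struct stack_node *parent;
};

/* Return 1 if DC already appears at least twice among the entries
   reachable from TOP, i.e. printing it again would keep following a
   cycle; return 0 otherwise.  */
int
seen_more_than_once (const struct stack_node *top, const void *dc)
{
  int hits = 0;

  if (dc == NULL)
    return 0;

  for (const struct stack_node *n = top; n != NULL; n = n->parent)
    if (n->dc == dc && ++hits >= 2)
      return 1;

  return 0;
}
```

A caller would test this before pushing a new entry for DC and simply
return when it reports 1, which bounds the recursion depth on cyclic input.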

> On 26 May 2016, at 10:05 PM, Jason Merrill  wrote:
> 
> It seems like in cases of malformed input we should return the input
> again rather than produce garbage like "K> ".  Maybe catch this sort of situation in
> d_lookup_template_parameter?
> 
> Jason
> 
> 
> On Mon, May 2, 2016 at 11:21 AM, Marcel Böhme  wrote:
>> Hi,
>> 
>> This fixes several stack overflows due to infinite recursion in d_print_comp 
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70909).
>> 
>> The method d_print_comp in cp-demangle.c recursively constructs the 
>> d_print_info dpi from the demangle_component dc. The method 
>> d_print_comp_inner traverses dc as a graph. Now, dc can be a graph with 
>> cycles leading to infinite recursion in several distinct cases. The patch 
>> uses the component stack to find whether the current node dc has itself as 
>> ancestor more than once.
>> 
>> Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases added
>> to libiberty/testsuite/demangle-expected and checked PR70909 and related
>> stack overflows are resolved.
>> 
>> Best regards,
>> - Marcel
>> 
>> 
>> 
>> Index: ChangeLog
>> ===
>> --- ChangeLog   (revision 235760)
>> +++ ChangeLog   (working copy)
>> @@ -1,3 +1,19 @@
>> +2016-05-02  Marcel Böhme  
>> +
>> +   PR c++/70909
>> +   PR c++/61460
>> +   PR c++/68700
>> +   PR c++/67738
>> +   PR c++/68383
>> +   PR c++/70517
>> +   PR c++/61805
>> +   PR c++/62279
>> +   PR c++/67264
>> +   * cp-demangle.c: Prevent infinite recursion when traversing cyclic
>> +   demangle component.
>> +   (d_print_comp): Return when demangle component has itself as ancestor
>> +   more than once.
>> +
>> 2016-04-30  Oleg Endo  
>> 
>>* configure: Remove SH5 support.
>> Index: cp-demangle.c
>> ===
>> --- cp-demangle.c   (revision 235760)
>> +++ cp-demangle.c   (working copy)
>> @@ -5436,6 +5436,24 @@ d_print_comp (struct d_print_info *dpi, int option
>> {
>>   struct d_component_stack self;
>> 
>> +  self.parent = dpi->component_stack;
>> +
>> +  while (self.parent)
>> +{
>> +  self.dc = self.parent->dc;
>> +  self.parent = self.parent->parent;
>> +  if (dc != NULL && self.dc == dc)
>> +   {
>> + while (self.parent)
>> +   {
>> + self.dc = self.parent->dc;
>> + self.parent = self.parent->parent;
>> + if (self.dc == dc)
>> +   return;
>> +   }
>> +   }
>> +}
>> +
>>   self.dc = dc;
>>   self.parent = dpi->component_stack;
>>   dpi->component_stack = &self;
>> Index: testsuite/demangle-expected
>> ===
>> --- testsuite/demangle-expected (revision 235760)
>> +++ testsuite/demangle-expected (working copy)
>> @@ -4431,3 +4431,69 @@ _Q.__0
>> 
>> _Q10-__9cafebabe.
>> cafebabe.::-(void)
>> +#
>> +# Test demangler crash PR62279
>> +
>> +_ZN5Utils9transformIPN15ProjectExplorer13BuildStepListEZNKS1_18BuildConfiguration14knownStepListsEvEUlS3_E_EE5QListIDTclfp0_cvT__RKS6_IS7_ET0_
>> +QList 
>> Utils::transform> ProjectExplorer::BuildConfiguration::knownStepLists() 
>> const::{lambda(ProjectExplorer::BuildStepList*)#1}>(ProjectExplorer::BuildConfiguration::knownStepLists()
>>  const::{lambda(ProjectExplorer::BuildStepList*)#1} const&, 
>> ProjectExplorer::BuildConfiguration::knownStepLists() 
>> const::{lambda(ProjectExplorer::BuildStepList*)#1})
>> +#
>> +
>> +_ZSt7forwardIKSaINSt6thread5_ImplISt12_Bind_simpleIFZN6WIM_DL5Utils9AsyncTaskC4IMNS3_8Hardware12FpgaWatchdogEKFvvEIPS8_EEEibOT_DpOT0_EUlvE_vEESD_RNSt16remove_referenceISC_E4typeE
>> +std::allocator>  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
>> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&& 
>> std::forward>  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, 
>> std::allocator>  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
>> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const&&, 
>> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > 
>> const>(std::remove_reference>  (WIM_DL::Hardware::FpgaWatchdog::*)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*>(int, bool, void 
>> (WIM_DL::Hardware::FpgaWatchdog::*&&)() const, 
>> WIM_DL::Hardware::FpgaWatchdog*&&)::{lambda()#1} ()> > > const>::type&)
>> +#
>> +# Test demangler crash PR61805
>> +
>> +_ZNK5n

RE: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets

2016-06-29 Thread Matthew Fortune
Robert Suchanek  writes:
> I'm resending this patch as it has been rebased and updated.  I reverted a
> change to the check_effective_target_vect_call_lrint procedure because it does
> not use a cached result.

Hi Richard,

Do you know who might be best to ask for a review for this vect.exp testsuite
change? 

Thanks,
Matthew

> 
> Regards,
> Robert
> 
> > -----Original Message-----
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> > On Behalf Of Robert Suchanek
> > Sent: 10 August 2015 13:15
> > To: catherine_mo...@mentor.com; Matthew Fortune
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 3/4] Add support to run auto-vectorization tests for
> > multiple effective targets
> >
> > Hi,
> >
> > This patch allows running auto-vectorization tests for more than one
> > effective target.  The initial proposal
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02289.html
> >
> > had some issues that have been addressed and should work as expected now.
> >
> > The idea was to add a wrapper procedure that would:
> > 1. Iterate over a list of EFFECTIVE_TARGETS e.g. mips_msa, mpaired_single.
> > 2. Add necessary compile time options for each effective target.
> > 3. Check if it's possible to compile and/or run on a target, and set
> >    dg-do-what-default accordingly.
> > 4. Set the target index to tell check_effective_target_vect_* which target
> >    is currently being processed.
> > 5. Invoke {gfortran-,g++-,}dg-runtest with the list of vector tests as
> >    normal.
> >
> > The above required that every vector feature, e.g. vect_int, that caches the
> > result is capable of tracking which target supports a feature.  The result is
> > saved to a list at an index controlled by the wrapper (et-dg-runtest).  Ports
> > not using this feature set DEFAULT_VECTFLAGS and the tests should run as they
> > used to.
> >
> > The patch was additionally tested on x86_64-unknown-linux-gnu and
> > aarch64-linux-gnu.
> >
> > Regards,
> > Robert
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/vect/vect.exp: Add and set new global EFFECTIVE_TARGETS. Call
> > g++-dg-runtest via et-dg-runtest.
> > * gcc.dg/graphite/graphite.exp: Likewise, but for dg-runtest.
> > * gcc.dg/vect/vect.exp: Likewise.
> > * gfortran.dg/graphite/graphite.exp: Likewise, but for
> > gfortran-dg-runtest.
> > * gfortran.dg/vect/vect.exp: Likewise.
> > * lib/target-supports.exp (check_mpaired_single_hw_available): New.
> > (check_mips_loongson_hw_available): Likewise.
> > (check_effective_target_mpaired_single_runtime): Likewise.
> > (check_effective_target_mips_loongson_runtime): Likewise.
> > (add_options_for_mpaired_single): Likewise.
> > (check_effective_target_vect_int): Add global et_index.
> > Check and save the supported feature for a target selected by
> > the et_index target.  Break long lines where appropriate.  Call
> > et-is-effective-target for MIPS with an argument instead of
> > check_effective_target_* where appropriate.
> > (check_effective_target_vect_intfloat_cvt): Likewise.
> > (check_effective_target_vect_uintfloat_cvt): Likewise.
> > (check_effective_target_vect_floatint_cvt): Likewise.
> > (check_effective_target_vect_floatuint_cvt): Likewise.
> > (check_effective_target_vect_simd_clones): Likewise.
> > (check_effective_target_vect_shift): Likewise.
> > (check_effective_target_whole_vector_shift): Likewise.
> > (check_effective_target_vect_bswap): Likewise.
> > (check_effective_target_vect_shift_char): Likewise.
> > (check_effective_target_vect_long): Likewise.
> > (check_effective_target_vect_float): Likewise.
> > (check_effective_target_vect_double): Likewise.
> > (check_effective_target_vect_long_long): Likewise.
> > (check_effective_target_vect_no_int_max): Likewise.
> > (check_effective_target_vect_no_int_add): Likewise.
> > (check_effective_target_vect_no_bitwise): Likewise.
> > (check_effective_target_vect_widen_shift): Likewise.
> > (check_effective_target_vect_no_align): Likewise.
> > (check_effective_target_vect_hw_misalign): Likewise.
> > (check_effective_target_vect_element_align): Likewise.
> > (check_effective_target_vect_condition): Likewise.
> > (check_effective_target_vect_cond_mixed): Likewise.
> > (check_effective_target_vect_char_mult): Likewise.
> > (check_effective_target_vect_short_mult): Likewise.
> > (check_effective_target_vect_int_mult): Likewise.
> > (check_effective_target_vect_extract_even_odd): Likewise.
> > (check_effective_target_vect_interleave): Likewise.
> > (check_effective_target_vect_stridedN): Likewise.
> > (check_effective_target_vect_multiple_sizes): Likewise.
> > (check_effective_target_vect64): Likewise.
> > (check_effective_target_vect_call_copysignf): Likewise.
> > (check_effective_target_vect_call_sqrtf): Likewise.
> > (check_effecti

[PATCH] PR 71667 - Handle live operations with DEBUG uses

2016-06-29 Thread Alan Hayward
In vectorizable_live_operation() we always assume the uses of a live operation
will be PHIs. However, when using -g, a use of a live operation might be a
DEBUG stmt.

This patch avoids adding any DEBUG statements to the worklist in
vectorizable_live_operation(). It also fixes a comment.
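
The filtering can be pictured with a small stand-alone model.  This is
only a sketch, not the vectorizer code: the use record, its fields and
collect_outside_uses are hypothetical stand-ins for the immediate-use
walk, flow_bb_inside_loop_p and is_gimple_debug in the patch.

```c
#include <stddef.h>

/* Hypothetical classification of a statement that uses the live value.  */
enum use_kind { USE_PHI, USE_DEBUG };

struct use
{
  enum use_kind kind;
  int inside_loop;   /* Nonzero if the using statement is in the loop.  */
};

/* Collect pointers to the uses that are outside the loop and are not
   debug statements.  WORKLIST must have room for N entries.  Returns
   the number of entries stored.  */
size_t
collect_outside_uses (const struct use *uses, size_t n,
                      const struct use **worklist)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (!uses[i].inside_loop && uses[i].kind != USE_DEBUG)
      worklist[count++] = &uses[i];

  return count;
}
```

Without the debug check, a build with -g could put an extra, non-PHI
statement on the worklist that a build without -g would never see.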

Tested on x86 and aarch64.
Ok to commit?

gcc/
PR tree-optimization/71667
* tree-vect-loop.c (vectorizable_live_operation): ignore DEBUG stmts

testsuite/gcc.dg/vect
PR tree-optimization/71667
* pr71667.c: New



diff --git a/gcc/testsuite/gcc.dg/vect/pr71667.c
b/gcc/testsuite/gcc.dg/vect/pr71667.c
new file mode 100644
index 
..e7012efa882a5497b0a6099c3d853f9eb
375cc53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr71667.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-g" } */
+
+unsigned int mu;
+int pt;
+
+void
+qf (void)
+{
+  int gy;
+  long int vz;
+
+  for (;;)
+{
+  for (gy = 0; gy < 80; ++gy)
+  {
+   vz = mu;
+   ++mu;
+   pt = (vz != 0) && (pt != 0);
+  }
+  while (gy < 81)
+   while (gy < 83)
+ {
+   vz = (vz != 0) ? 0 : mu;
+   ++gy;
+ }
+  pt = vz;
+  ++mu;
+}
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 
6c0337bbbcbebd6443fd3bcef45c1b23a7833486..2980a1b031cd3b919369b5e31dff7e066
5bc7578 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6352,11 +6352,12 @@ vectorizable_live_operation (gimple *stmt,
: gimple_get_lhs (stmt);
   lhs_type = TREE_TYPE (lhs);

-  /* Find all uses of STMT outside the loop - there should be exactly
one.  */
+  /* Find all uses of STMT outside the loop - there should be at least
one.  */
   auto_vec worklist;
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
-if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
-   worklist.safe_push (use_stmt);
+if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
+   && !is_gimple_debug (use_stmt))
+  worklist.safe_push (use_stmt);
   gcc_assert (worklist.length () >= 1);

   bitsize = TYPE_SIZE (TREE_TYPE (vectype));






Re: Determine more IVs to be non-overflowing

2016-06-29 Thread Richard Biener
On Mon, 27 Jun 2016, Jan Hubicka wrote:

> Hi,
> this patch makes simple_iv determine more often that an IV cannot overflow.
> First I commonized the logic in simple_iv with nowrap_type_p because it tests
> the same thing. Second I added iv_can_overflow_p, which uses the known upper
> bound on the number of iterations to see if the IV calculation can overflow.

+  if (!get_max_loop_iterations (loop, &nit))
+/* 10GHz CPU running one cycle loop will reach 2^60 iterations in 260
+   years.  I won't live long enough to be forced to fix the
+   miscompilation.  Having the limit here will let us to consider
+   64bit IVs with base 0 and step 1...16 as non-wrapping which makes
+   niter and ivopts go smoother.  */
+nit = ((widest_int)1 << 60);

I think that's on the border of being acceptable.  Consider said loop
being autoparallelized and run on a PFlop cluster.  Please take it out
for now.

+  if (INTEGRAL_TYPE_P (type))
+{
+  maxt = TYPE_MAX_VALUE (type);
+  mint = TYPE_MIN_VALUE (type);
+}
+  else
+{
+  maxt = upper_bound_in_type (type, type);
+  mint = lower_bound_in_type (type, type);
+}

I think we don't support any non-integral/pointer IVs and thus you
can simply use

  type_min/max = wi::max/min_value (TYPE_PRECISION (type), TYPE_SIGN (type));

+  type_min = wi::to_widest (mint);
+  type_max = wi::to_widest (maxt);
+  if ((base_max + step_max * (nit + 1)) > (type_max)
+  || type_min > (base_min + step_min * (nit + 1)))
+return true;

so I'm not convinced that widest_int precision is enough here.  Can't
you use wide_ints instead of widest_ints and use the wi::add / wi::mult 
overloads which provide the overflow flag?  Otherwise do what
VRP does and use FIXED_WIDE_INT to properly handle __int128_t IVs.
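
To illustrate the idea of checking each arithmetic step with an overflow
flag, rather than computing in a wider type and comparing afterwards,
here is a minimal sketch.  It does not use GCC's wide_int API; it models
the same pattern on int64_t with the __builtin_add_overflow and
__builtin_mul_overflow builtins, so it assumes GCC or Clang as the
compiler.  The function name and the fixed 64-bit signed type are
assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if base + step * (niter + 1) cannot be computed in
   int64_t without overflow.  Each step reports overflow separately,
   so no wider intermediate type is needed.  */
bool
iv_may_overflow (int64_t base, int64_t step, int64_t niter)
{
  int64_t count, span, last;

  if (__builtin_add_overflow (niter, 1, &count))
    return true;
  if (__builtin_mul_overflow (step, count, &span))
    return true;
  if (__builtin_add_overflow (base, span, &last))
    return true;

  return false;
}
```

The real check would be done in the precision and signedness of the IV
type and on the ranges of base and step, which this sketch leaves out.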

> One interesting thing is that the ivcanon IV variables that go from niter to 0
> are believed to be wrapping.  This is because the type is unsigned and -1 is
> then a large number.
> 
> It is not specified what overflow means; I suppose one can think of it as
> overflow in the calculation, which is sort of what happens in this case.
> Inspecting the code, I think both users (niter and ivopts) agree with a
> different interpretation, that the induction cannot wrap: that is, the sequence
> produced is always monotonically increasing or decreasing.

That makes sense.

> I would suggest incrementally renaming it to no_wrap_p, adding a comment in
> this sense, and handling this case, too.  It will need an update at one place
> in ivopts where we are determining the direction of iteration:
> 
>   /* We need to know that the candidate induction variable does not overflow.
>  While more complex analysis may be used to prove this, for now just
>  check that the variable appears in the original program and that it
>  is computed in a type that guarantees no overflows.  */
>   cand_type = TREE_TYPE (cand->iv->base);
>   if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
> return false;
> ..
>   step = int_cst_value (cand->iv->step);
> ...
>   /* Determine the new comparison operator.  */
>   comp = step < 0 ? GT_EXPR : LT_EXPR;
> 
> (which is the only occurrence of step). I guess we can add a function
> iv_direction that will return -1, 0, 1 for monotonically decreasing,
> constant/unknown and monotonically increasing inductions.

scev provides scev_direction for this (of course we don't have the chrec
anymore here).
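
As a rough sketch of the -1/0/1 helper suggested in the quoted text (this
is not scev_direction, whose interface is not shown in this thread), the
classification could look like the following.  The name iv_direction
comes from the suggestion; the parameters are assumptions made for the
example.

```c
#include <stdint.h>

/* Classify an induction by its step: -1 for monotonically decreasing,
   1 for monotonically increasing, 0 when the step is zero or unknown.
   STEP_KNOWN is nonzero when STEP is a compile-time constant.  */
int
iv_direction (int64_t step, int step_known)
{
  if (!step_known || step == 0)
    return 0;
  return step < 0 ? -1 : 1;
}
```

A caller choosing the comparison operator would then use the greater-than
form for -1, the less-than form for 1, and give up on 0.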

Richard.

> 
> Honza
> 
>   * tree-scalar-evolution.h (iv_can_overflow_p): Declare.
>   * tree-scalar-evolution.c (iv_can_overflow_p): New function.
>   (simple_iv): Use it.
>   * tree-ssa-loop-niter.c (nowrap_type_p): Use ANY_INTEGRAL_TYPE_P
> 
>   * gcc.dg/tree-ssa/scev-14.c: New testcase.
> Index: tree-scalar-evolution.h
> ===
> --- tree-scalar-evolution.h   (revision 237798)
> +++ tree-scalar-evolution.h   (working copy)
> @@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi
>  extern bool expression_expensive_p (tree);
>  extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv 
> *,
>  bool);
> +extern bool iv_can_overflow_p (struct loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (struct loop *, tree);
>  
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
> Index: tree-scalar-evolution.c
> ===
> --- tree-scalar-evolution.c   (revision 237798)
> +++ tree-scalar-evolution.c   (working copy)
> @@ -3309,6 +3310,70 @@ scev_reset (void)
>  }
>  }
>  
> +/* Return true if the IV calculation in TYPE can overflow based on the 
> knowledge
> +   of the upper bound on the number of iterations of LOOP, the BASE and STEP
> +   of IV.
> +
> +   We do not use information whether T

Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-06-29 Thread Richard Biener
On Mon, 27 Jun 2016, Richard Biener wrote:

> On Wed, 15 Jun 2016, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > With the proposed cost change for vector construction we will end up
> > > vectorizing the testcase in PR68961 again (on x86_64 and likely
> > > on ppc64le as well after that target gets adjustments).  Currently
> > > we can't optimize that away again noticing the direct overlap of
> > > > argument and return registers.  The obstacle is
> > >
> > > (insn 7 4 8 2 (set (reg:V2DF 93)
> > > (vec_concat:V2DF (reg/v:DF 91 [ a ])
> > > (reg/v:DF 92 [ aa ]))) 
> > > ...
> > > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> > > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> > > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> > > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> > >
> > > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> > >
> > > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> > > (easy fix below).
> > >
> > > Then combine doesn't like to simplify the multi-use (it tries some
> > > parallel it seems).  So I went to forwprop which eventually manages
> > > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> > > because it is not a constant.  Thus I allow arbitrary simplification
> > > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> > > to be a magic flag to tell it to restrict to the case where all
> > > uses can be simplified or so, nor to restrict simplifications to a REG.
> > > > But I don't see any undesirable simplifications of
> > > > (subreg ([vec_]concat)).
> > 
> > > Adding that as a special case to propagate_rtx feels like a hack though :-)
> > I think:
> > 
> > > Index: gcc/fwprop.c
> > > ===
> > > *** gcc/fwprop.c  (revision 237286)
> > > --- gcc/fwprop.c  (working copy)
> > > *** propagate_rtx (rtx x, machine_mode mode,
> > > *** 664,670 
> > > || (GET_CODE (new_rtx) == SUBREG
> > > && REG_P (SUBREG_REG (new_rtx))
> > > && (GET_MODE_SIZE (mode)
> > > !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
> > >   flags |= PR_CAN_APPEAR;
> > > if (!varying_mem_p (new_rtx))
> > >   flags |= PR_HANDLE_MEM;
> > > --- 664,673 
> > > || (GET_CODE (new_rtx) == SUBREG
> > > && REG_P (SUBREG_REG (new_rtx))
> > > && (GET_MODE_SIZE (mode)
> > > !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)
> > > !   || ((GET_CODE (new_rtx) == VEC_CONCAT
> > > !|| GET_CODE (new_rtx) == CONCAT)
> > > !   && GET_CODE (x) == SUBREG))
> > >   flags |= PR_CAN_APPEAR;
> > > if (!varying_mem_p (new_rtx))
> > >   flags |= PR_HANDLE_MEM;
> > 
> > ...this if statement should fundamentally only test new_rtx.
> > E.g. we'd want the same thing for any SUBREG inside X.
> > 
> > How about changing:
> > 
> >   /* The replacement we made so far is valid, if all of the recursive
> >  replacements were valid, or we could simplify everything to
> >  a constant.  */
> >   return valid_ops || can_appear || CONSTANT_P (tem);
> > 
> > so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> > I suppose that's likely to increase register pressure though,
> > if only some uses of new_rtx simplify.  (There again, requiring all
> > > uses to be replaceable could make hot code the hostage of cold code.)
> 
> > Yes, my fear was about a register pressure increase for the case where not
> > all uses can be replaced (fwprop doesn't seem to have code to verify or
> > require that).
> 
> I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG
> case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the
> new_rtx == [VEC_]CONCAT case for example.

Btw, I have installed the simplify-rtx.c part now.

Richard.

2016-06-29  Richard Biener  

PR rtl-optimization/68961
* simplify-rtx.c (simplify_subreg): Handle VEC_CONCAT like CONCAT.
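
The effect of the simplify_subreg change can be pictured with a toy
model.  This is not RTL code: struct concat2 stands in for
(vec_concat:V2DF a aa), and subreg_of_concat is a hypothetical helper
that models only the case where the subreg has exactly the size of one
operand; endianness and partial overlaps are ignored.

```c
#include <stddef.h>

/* Toy stand-in for a two-element concatenation of DFmode values.  */
struct concat2
{
  double first;
  double second;
};

/* Model a double-sized subreg at byte offset BYTE: offset 0 selects
   the first operand, offset sizeof (double) the second.  Any other
   offset is not simplified and yields NULL.  */
const double *
subreg_of_concat (const struct concat2 *c, size_t byte)
{
  if (byte == 0)
    return &c->first;
  if (byte == sizeof (double))
    return &c->second;
  return NULL;
}
```

In the testcase from the PR this is what lets the two DImode-sized pieces
be traced back to the original argument registers.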



Re: Determine more IVs to be non-overflowing

2016-06-29 Thread Jan Hubicka
> +  if (!get_max_loop_iterations (loop, &nit))
> +/* 10GHz CPU running one cycle loop will reach 2^60 iterations in 260
> +   years.  I won't live long enough to be forced to fix the
> +   miscompilation.  Having the limit here will let us to consider
> +   64bit IVs with base 0 and step 1...16 as non-wrapping which makes
> +   niter and ivopts go smoother.  */
> +nit = ((widest_int)1 << 60);
> 
> I think that's on the border of being acceptable.  Consider said loop
> being autoparallelized and run on a PFlop cluster.  Please take it out
> for now.

OK, once we are able to get more than 2^64 iterations in reasonable time,
we will need to revisit the gcov code ;)
> 
> +  if (INTEGRAL_TYPE_P (type))
> +{
> +  maxt = TYPE_MAX_VALUE (type);
> +  mint = TYPE_MIN_VALUE (type);
> +}
> +  else
> +{
> +  maxt = upper_bound_in_type (type, type);
> +  mint = lower_bound_in_type (type, type);
> +}
> 
> I think we don't support any non-integral/pointer IVs and thus you
> can simply use
> 
>   type_min/max = wi::max/min_value (TYPE_PRECISION (type), TYPE_SIGN (type));

OK.
> 
> +  type_min = wi::to_widest (mint);
> +  type_max = wi::to_widest (maxt);
> +  if ((base_max + step_max * (nit + 1)) > (type_max)
> +  || type_min > (base_min + step_min * (nit + 1)))
> +return true;
> 
> so I'm not convinced that widest_int precision is enough here.  Can't
> you use wide_ints instead of widest_ints and use the wi::add / wi::mult 
> overloads which provide the overflow flag?  Otherwise do what
> VRP does and use FIXED_WIDE_INT to properly handle __int128_t IVs.

nit is a widest_int, so I suppose first I need to convert it to wide_int
and then do the overflow checks.  Do we have a narrowing conversion with an
overflow flag, too?
> 
> scev provides scev_direction for this (of course we don't have the chrec
> anymore here).

Good, thanks!

Honza


Re: Determine more IVs to be non-overflowing

2016-06-29 Thread Richard Biener
On Wed, 29 Jun 2016, Jan Hubicka wrote:

> > +  if (!get_max_loop_iterations (loop, &nit))
> > +/* 10GHz CPU running one cycle loop will reach 2^60 iterations in 260
> > +   years.  I won't live long enough to be forced to fix the
> > +   miscompilation.  Having the limit here will let us to consider
> > +   64bit IVs with base 0 and step 1...16 as non-wrapping which makes
> > +   niter and ivopts go smoother.  */
> > +nit = ((widest_int)1 << 60);
> > 
> > I think that's on the border of being acceptable.  Consider said loop
> > being autoparallelized and run on a PFlop cluster.  Please take it out
> > for now.
> 
> OK, once we are able to get more than 2^64 iterations in reasonable time,
> we will need to revisit the gcov code ;)
> > 
> > +  if (INTEGRAL_TYPE_P (type))
> > +{
> > +  maxt = TYPE_MAX_VALUE (type);
> > +  mint = TYPE_MIN_VALUE (type);
> > +}
> > +  else
> > +{
> > +  maxt = upper_bound_in_type (type, type);
> > +  mint = lower_bound_in_type (type, type);
> > +}
> > 
> > I think we don't support any non-integral/pointer IVs and thus you
> > can simply use
> > 
> >   type_min/max = wi::max/min_value (TYPE_PRECISION (type), TYPE_SIGN 
> > (type));
> 
> OK.
> > 
> > +  type_min = wi::to_widest (mint);
> > +  type_max = wi::to_widest (maxt);
> > +  if ((base_max + step_max * (nit + 1)) > (type_max)
> > +  || type_min > (base_min + step_min * (nit + 1)))
> > +return true;
> > 
> > so I'm not convinced that widest_int precision is enough here.  Can't
> > you use wide_ints instead of widest_ints and use the wi::add / wi::mul
> > overloads which provide the overflow flag?  Otherwise do what
> > VRP does and use FIXED_WIDE_INT to properly handle __int128_t IVs.
> 
> nit is a widest_int, so I suppose I first need to convert it to
> wide_int and then do the overflow checks.  Do we have a narrowing
> conversion with an overflow flag, too?

I don't think so, you'd have to compare against type_max (nit is
"typeless").
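
A minimal sketch of the overflow-flag approach discussed above, written against fixed 64-bit integers rather than GCC's wide_int API.  The helper name iv_may_overflow is hypothetical, and the GCC/Clang builtins __builtin_add_overflow and __builtin_mul_overflow only stand in for arithmetic that reports overflow; this is an illustration, not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when base + step * (nit + 1) cannot be
// shown to fit in a signed 64-bit IV, i.e. when any intermediate step of
// the computation overflows.
bool iv_may_overflow(int64_t base, int64_t step, uint64_t nit)
{
  int64_t iters, scaled, last;

  // The builtins compute the exact result and report whether it fits the
  // destination type, so this also rejects nit + 1 > INT64_MAX.
  if (__builtin_add_overflow(nit, 1, &iters))
    return true;
  if (__builtin_mul_overflow(step, iters, &scaled))
    return true;
  if (__builtin_add_overflow(base, scaled, &last))
    return true;

  // No overflow was reported, so the final value fits the type.
  return false;
}
```

Because the step is constant, the IV is monotonic, so checking the final value is enough in this simplified model.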

Richard.

> > 
> > scev provides scev_direction for this (of course we don't have the chrec
> > anymore here).
> 
> Good, thanks!
> 
> Honza
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


RE: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets

2016-06-29 Thread Richard Biener
On Wed, 29 Jun 2016, Matthew Fortune wrote:

> Robert Suchanek  writes:
> > I'm resending this patch as it has been rebased and updated.  I reverted a 
> > change
> > to check_effective_target_vect_call_lrint procedure because it does not use
> > cached result.
> 
> Hi Richard,
> 
> Do you know who might be best to ask for a review for this vect.exp testsuite
> change? 

Mike may be - you need a TCL speaking reviewer ;)

Richard.

> Thanks,
> Matthew
> 
> > 
> > Regards,
> > Robert
> > 
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org 
> > > [mailto:gcc-patches-ow...@gcc.gnu.org] On
> > > Behalf Of Robert Suchanek
> > > Sent: 10 August 2015 13:15
> > > To: catherine_mo...@mentor.com; Matthew Fortune
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: [PATCH 3/4] Add support to run auto-vectorization tests for 
> > > multiple
> > > effective targets
> > >
> > > Hi,
> > >
> > > This patch allows auto-vectorization tests to be run for more than one
> > > effective target.  The initial proposal
> > >
> > > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02289.html
> > >
> > > had some issues that have been addressed and should work as expected now.
> > >
> > > The idea was to add a wrapper procedure that would:
> > > 1. Iterate over a list of EFFECTIVE_TARGETS, e.g. mips_msa,
> > >    mpaired_single.
> > > 2. Add necessary compile time options for each effective target.
> > > 3. Check if it's possible to compile and/or run on a target, and set
> > >dg-do-what-default accordingly.
> > > 4. Set the target index to tell check_effective_target_vect_* which 
> > > target is
> > >currently being processed.
> > > 5. Invoke {gfortran-,g++-,}dg-runtest with the list of vector tests as 
> > > normal.
> > >
> > > The above required that every vector feature e.g. vect_int that caches the
> > > result is
> > > capable of tracking what target supports a feature.  The result is saved 
> > > to a
> > > list
> > > at an index controlled by the wrapper (et-dg-runtest).  Ports not using 
> > > this
> > > feature,
> > > set DEFAULT_VECTFLAGS and the tests should run as they used to.
> > >
> > > The patch was additionally tested on x86_64-unknown-linux-gnu and aarch64-
> > > linux-gnu.
> > >
> > > Regards,
> > > Robert
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * g++.dg/vect/vect.exp: Add and set new global EFFECTIVE_TARGETS. Call
> > >   g++-dg-runtest via et-dg-runtest.
> > >   * gcc.dg/graphite/graphite.exp: Likewise, but for dg-runtest.
> > >   * gcc.dg/vect/vect.exp: Likewise.
> > >   * gfortran.dg/graphite/graphite.exp: Likewise, but for
> > >   gfortran-dg-runtest.
> > >   * gfortran.dg/vect/vect.exp: Likewise.
> > >   * lib/target-supports.exp (check_mpaired_single_hw_available): New.
> > >   (check_mips_loongson_hw_available): Likewise.
> > >   (check_effective_target_mpaired_single_runtime): Likewise.
> > >   (check_effective_target_mips_loongson_runtime): Likewise.
> > >   (add_options_for_mpaired_single): Likewise.
> > >   (check_effective_target_vect_int): Add global et_index.
> > >   Check and save the supported feature for a target selected by
> > >   the et_index target.  Break long lines where appropriate.  Call
> > >   et-is-effective-target for MIPS with an argument instead of
> > >   check_effective_target_* where appropriate.
> > >   (check_effective_target_vect_intfloat_cvt): Likewise.
> > >   (check_effective_target_vect_uintfloat_cvt): Likewise.
> > >   (check_effective_target_vect_floatint_cvt): Likewise.
> > >   (check_effective_target_vect_floatuint_cvt): Likewise.
> > >   (check_effective_target_vect_simd_clones): Likewise.
> > >   (check_effective_target_vect_shift): Likewise.
> > >   (check_effective_target_whole_vector_shift): Likewise.
> > >   (check_effective_target_vect_bswap): Likewise.
> > >   (check_effective_target_vect_shift_char): Likewise.
> > >   (check_effective_target_vect_long): Likewise.
> > >   (check_effective_target_vect_float): Likewise.
> > >   (check_effective_target_vect_double): Likewise.
> > >   (check_effective_target_vect_long_long): Likewise.
> > >   (check_effective_target_vect_no_int_max): Likewise.
> > >   (check_effective_target_vect_no_int_add): Likewise.
> > >   (check_effective_target_vect_no_bitwise): Likewise.
> > >   (check_effective_target_vect_widen_shift): Likewise.
> > >   (check_effective_target_vect_no_align): Likewise.
> > >   (check_effective_target_vect_hw_misalign): Likewise.
> > >   (check_effective_target_vect_element_align): Likewise.
> > >   (check_effective_target_vect_condition): Likewise.
> > >   (check_effective_target_vect_cond_mixed): Likewise.
> > >   (check_effective_target_vect_char_mult): Likewise.
> > >   (check_effective_target_vect_short_mult): Likewise.
> > >   (check_effective_target_vect_int_mult): Likewise.
> > >   (check_effective_target_vect_extract_even_odd): Likewise.
> > >   (check_effective_target_vect_interleave): Likewise.
> > >   (check_effective_target_vect_stridedN): Likewise.
> > >   (

Re: [PATCH] tree-ssa-strlen improvements (PR tree-optimization/71625)

2016-06-29 Thread Richard Biener
On Tue, 28 Jun 2016, Jakub Jelinek wrote:

> Hi!
> 
> This is just first small step towards this PR.
> It brings the ADDR_EXPR of DECL_P bases roughly on the same level as
> SSA_NAMEs pointers - so get_stridx_plus_constant works for them, and
> more importantly, before this patch there was a very severe bug in
> addr_stridxptr (missing list = list->next; in the loop), which meant that
> inside of a single decl we were tracking at most one string length, rather
> than the former 16 (or now 32) limit.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.
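
For readers following the addr_stridxptr change, here is a rough sketch of the list handling the patch description talks about: walking the per-decl list (with the previously missing advance step), capping its length, and keeping it sorted by ascending offset.  The Node type and slot_for_offset name are hypothetical stand-ins for struct stridxlist and addr_stridxptr, and plain new replaces the obstack allocation; it is a simplified model, not the GCC code.

```cpp
#include <cassert>

// Hypothetical stand-in for struct stridxlist.
struct Node {
  long offset;
  int idx;
  Node *next;
};

// Returns a pointer to the idx slot for OFF, inserting a node if needed so
// that the list stays sorted by ascending offset.  Returns nullptr once the
// list already holds LIMIT entries.  Nodes are never freed in this sketch.
int *slot_for_offset(Node *list, long off, int limit = 32)
{
  assert(list != nullptr);
  Node *before = nullptr;
  int i;
  for (i = 0; i < limit; i++) {
    if (list->offset == off)
      return &list->idx;
    if (list->offset > off && before == nullptr)
      before = list;          // first node that must follow the new entry
    if (list->next == nullptr)
      break;
    list = list->next;        // the advance step the old loop was missing
  }
  if (i == limit)
    return nullptr;
  if (before) {
    // Insert in front of BEFORE without a prev pointer: copy the node,
    // chain the copy behind it, then reuse the original for the new entry.
    Node *copy = new Node(*before);
    before->next = copy;
    before->offset = off;
    before->idx = 0;
    return &before->idx;
  }
  list->next = new Node{off, 0, nullptr};
  return &list->next->idx;
}
```

Without the advance step the loop would inspect the head node on every iteration, which matches the "at most one string length per decl" symptom described in the quoted message.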

> As a follow-up, I'd like to introduce string length at least N, where we
> don't know the exact string length, but we know there are no '\0' chars
> within N bytes.
> 
> 2016-06-28  Jakub Jelinek  
> 
>   PR tree-optimization/71625
>   * tree-ssa-strlen.c (get_addr_stridx): Add PTR argument.  Assume list
>   is sorted by ascending list->offset.  If PTR is non-NULL and there is
>   previous strinfo, call get_stridx_plus_constant.
>   (get_stridx): Pass exp as second argument to get_addr_stridx.
>   (addr_stridxptr): Add missing list = list->next, so that there can be
>   more than one entries in the list.  Bump limit from 16 to 32.  Ensure
>   the list is sorted by ascending list->offset.
>   (get_stridx_plus_constant): Adjust so that it can be also called with
>   ADDR_EXPR instead of SSA_NAME as PTR.
>   (handle_char_store): Pass NULL_TREE as second argument to
>   get_addr_stridx.
> 
>   * gcc.dg/strlenopt-28.c: New test.
> 
> --- gcc/tree-ssa-strlen.c.jj  2016-06-28 10:21:22.103693623 +0200
> +++ gcc/tree-ssa-strlen.c 2016-06-28 13:41:56.797718522 +0200
> @@ -159,10 +159,10 @@ get_strinfo (int idx)
>  /* Helper function for get_stridx.  */
>  
>  static int
> -get_addr_stridx (tree exp)
> +get_addr_stridx (tree exp, tree ptr)
>  {
>HOST_WIDE_INT off;
> -  struct stridxlist *list;
> +  struct stridxlist *list, *last = NULL;
>tree base;
>  
>if (!decl_to_stridxlist_htab)
> @@ -180,9 +180,22 @@ get_addr_stridx (tree exp)
>  {
>if (list->offset == off)
>   return list->idx;
> +  if (list->offset > off)
> + return 0;
> +  last = list;
>list = list->next;
>  }
>while (list);
> +
> +  if (ptr && last && last->idx > 0)
> +{
> +  strinfo *si = get_strinfo (last->idx);
> +  if (si
> +   && si->length
> +   && TREE_CODE (si->length) == INTEGER_CST
> +   && compare_tree_int (si->length, off - last->offset) != -1)
> + return get_stridx_plus_constant (si, off - last->offset, ptr);
> +}
>return 0;
>  }
>  
> @@ -234,7 +247,7 @@ get_stridx (tree exp)
>  
>if (TREE_CODE (exp) == ADDR_EXPR)
>  {
> -  int idx = get_addr_stridx (TREE_OPERAND (exp, 0));
> +  int idx = get_addr_stridx (TREE_OPERAND (exp, 0), exp);
>if (idx != 0)
>   return idx;
>  }
> @@ -304,15 +317,29 @@ addr_stridxptr (tree exp)
>if (existed)
>  {
>int i;
> -  for (i = 0; i < 16; i++)
> +  stridxlist *before = NULL;
> +  for (i = 0; i < 32; i++)
>   {
> if (list->offset == off)
>   return &list->idx;
> +   if (list->offset > off && before == NULL)
> + before = list;
> if (list->next == NULL)
>   break;
> +   list = list->next;
>   }
> -  if (i == 16)
> +  if (i == 32)
>   return NULL;
> +  if (before)
> + {
> +   list = before;
> +   before = XOBNEW (&stridx_obstack, struct stridxlist);
> +   *before = *list;
> +   list->next = before;
> +   list->offset = off;
> +   list->idx = 0;
> +   return &list->idx;
> + }
>list->next = XOBNEW (&stridx_obstack, struct stridxlist);
>list = list->next;
>  }
> @@ -613,9 +640,7 @@ verify_related_strinfos (strinfo *origsi
>  static int
>  get_stridx_plus_constant (strinfo *basesi, HOST_WIDE_INT off, tree ptr)
>  {
> -  gcc_checking_assert (TREE_CODE (ptr) == SSA_NAME);
> -
> -  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ptr))
> +  if (TREE_CODE (ptr) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ptr))
>  return 0;
>  
>if (basesi->length == NULL_TREE
> @@ -633,7 +658,8 @@ get_stridx_plus_constant (strinfo *bases
>|| TREE_CODE (si->length) != INTEGER_CST)
>  return 0;
>  
> -  if (ssa_ver_to_stridx.length () <= SSA_NAME_VERSION (ptr))
> +  if (TREE_CODE (ptr) == SSA_NAME
> +  && ssa_ver_to_stridx.length () <= SSA_NAME_VERSION (ptr))
>  ssa_ver_to_stridx.safe_grow_cleared (num_ssa_names);
>  
>gcc_checking_assert (compare_tree_int (si->length, off) != -1);
> @@ -651,6 +677,7 @@ get_stridx_plus_constant (strinfo *bases
>   {
> if (r == 0)
>   {
> +   gcc_assert (TREE_CODE (ptr) == SSA_NAME);
> ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
> return si->idx;
>   }
> @@ -2063,7 +2090,7 @@ handle_char_store (gim

Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-29 Thread Christophe Lyon
On 27 June 2016 at 11:58, Matthew Wahab  wrote:
> On 10/06/16 15:30, Matthew Wahab wrote:
>> On 10/06/16 15:22, Christophe Lyon wrote:
>>> On 10 June 2016 at 15:56, Matthew Wahab 
>>> wrote:
 On 10/06/16 09:32, Christophe Lyon wrote:
>
> On 9 June 2016 at 17:21, Matthew Wahab 
> wrote:
>>
> It's an improvement, but I'm still seeing a few problems with this
> patch:
> the vfp* tests are still failing in some of the configurations I test,
> because
> * you force dg-options that contains -mfloat-abi=hard,
> * you check effective-target arm_neon_fp16_hw
> * but you don't call dg-add-options arm_neon_fp16
>
>> I understand now. I still think it would be better to use a list of
>> require-effective-targets so I'll try that first and use the arm_neon_fp16
>> options if that doesn't work.
>>
>
> Sorry for the delay. I've added effective-target requirements to the
> tests to check for hard-fp and for VFP (i.e. non-neon) FP16 support. The
> directives for the VFP FP16 support are new. I've split them out to a
> separate patch, both patches are attached.
>

I have run validations with the 2 patches applied, and the result
looks OK to me, FWIW.

Thanks,

Christophe.

> The first patch adds:
>
> - effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
>   compiler and hardware support for FP16.
>
> - add-options features arm_fp16_ieee and arm_fp16_alternative, to
>   enable FP16 IEEE format and FP16 ARM Alternative format support
>
> Note that the existing add-options feature arm_fp16 enables the default
> FP16 format (fp16-format=none).
>
> The second patch updates the tests to use these directives. It also
> reworks the gcc.target/arm/fp16-aapcs-1.c test to focus on argument
> passing and return values, and adds a softfp variant as
> fp16-aapcs-2.c.
>
> As before, checked for arm-none-eabi with cross-compiled check-gcc and
> arm-linux-gnueabihf with native make check. I also ran the tests for
> cross-compiled arm-none-eabi with -mcpu=Cortex-M3.
>
> Ok for trunk?
> Matthew
>
> PATCH 1/2 ChangeLog
> gcc/
> 2016-06-27  Matthew Wahab  
>
> * doc/sourcebuild.texi (Effective-Target keywords): Add entries
> for arm_fp16_ok and arm_fp16_hw.
> (Add Options): Add entries for arm_fp16, arm_fp16_ieee and
> arm_fp16_alternative.
>
> testsuite/
> 2016-06-27  Matthew Wahab  
>
> * lib/target-supports.exp (add_options_for arm_fp16): Reword
> comment.
> (add_options_for_arm_fp16_ieee): New.
> (add_options_for_arm_fp16_alternative): New.
> (check_effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
> long-line.
> (check_effective_target_arm_fp16_hw): New.
>
> PATCH 2/2 ChangeLog
> testsuite/
> 2016-06-27  Matthew Wahab  
>
> * testsuite/gcc.target/arm/aapcs/neon-vect10.c: Require
> -mfloat-abi=hard.  Replace arm_neon_fp16_ok with arm_neon_fp16_hw.
> * testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.  Also add
> options for ARM FP16 IEEE format.
> * testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
> * testsuite/gcc.target/arm/fp16-aapcs-1.c: Require
> -mfloat-abi=hard.  Also simplify the test.
> * testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
>


Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-29 Thread Ramana Radhakrishnan
On Mon, Jun 27, 2016 at 10:58 AM, Matthew Wahab
 wrote:
> On 10/06/16 15:30, Matthew Wahab wrote:
>> On 10/06/16 15:22, Christophe Lyon wrote:
>>> On 10 June 2016 at 15:56, Matthew Wahab 
>>> wrote:
 On 10/06/16 09:32, Christophe Lyon wrote:
>
> On 9 June 2016 at 17:21, Matthew Wahab 
> wrote:
>>
> It's an improvement, but I'm still seeing a few problems with this
> patch:
> the vfp* tests are still failing in some of the configurations I test,
> because
> * you force dg-options that contains -mfloat-abi=hard,
> * you check effective-target arm_neon_fp16_hw
> * but you don't call dg-add-options arm_neon_fp16
>
>> I understand now. I still think it would be better to use a list of
>> require-effective-targets so I'll try that first and use the arm_neon_fp16
>> options if that doesn't work.
>>
>
> Sorry for the delay. I've added effective-target requirements to the
> tests to check for hard-fp and for VFP (i.e. non-neon) FP16 support. The
> directives for the VFP FP16 support are new. I've split them out to a
> separate patch, both patches are attached.
>
> The first patch adds:
>
> - effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
>   compiler and hardware support for FP16.
>
> - add-options features arm_fp16_ieee and arm_fp16_alternative, to
>   enable FP16 IEEE format and FP16 ARM Alternative format support
>
> Note that the existing add-options feature arm_fp16 enables the default
> FP16 format (fp16-format=none).
>
> The second patch updates the tests to use these directives. It also
> reworks the gcc.target/arm/fp16-aapcs-1.c test to focus on argument
> passing and return values, and adds a softfp variant as
> fp16-aapcs-2.c.
>
> As before, checked for arm-none-eabi with cross-compiled check-gcc and
> arm-linux-gnueabihf with native make check. I also ran the tests for
> cross-compiled arm-none-eabi with -mcpu=Cortex-M3.
>
> Ok for trunk?

Ok

Thanks,
Ramana





> Matthew
>
> PATCH 1/2 ChangeLog
> gcc/
> 2016-06-27  Matthew Wahab  
>
> * doc/sourcebuild.texi (Effective-Target keywords): Add entries
> for arm_fp16_ok and arm_fp16_hw.
> (Add Options): Add entries for arm_fp16, arm_fp16_ieee and
> arm_fp16_alternative.
>
> testsuite/
> 2016-06-27  Matthew Wahab  
>
> * lib/target-supports.exp (add_options_for arm_fp16): Reword
> comment.
> (add_options_for_arm_fp16_ieee): New.
> (add_options_for_arm_fp16_alternative): New.
> (check_effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
> long-line.
> (check_effective_target_arm_fp16_hw): New.
>
> PATCH 2/2 ChangeLog
> testsuite/
> 2016-06-27  Matthew Wahab  
>
> * testsuite/gcc.target/arm/aapcs/neon-vect10.c: Require
> -mfloat-abi=hard.  Replace arm_neon_fp16_ok with arm_neon_fp16_hw.
> * testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.  Also add
> options for ARM FP16 IEEE format.
> * testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
> * testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
> * testsuite/gcc.target/arm/fp16-aapcs-1.c: Require
> -mfloat-abi=hard.  Also simplify the test.
> * testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
>


Re: [AArch64] ARMv8.2 command line and feature macros support

2016-06-29 Thread James Greenhalgh
On Mon, Jun 27, 2016 at 03:58:00PM +0100, Jiong Wang wrote:
> On 07/06/16 09:46, Jiong Wang wrote:
> >2016-06-07  Matthew Wahab
> >Jiong Wang
> >
> >* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
> >* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
> >(AARCH64_FL_F16): New.
> >(AARCH64_FL_FOR_ARCH8_2): New.
> >(AARCH64_ISA_8_2): New.
> >(AARCH64_ISA_F16): New.
> >(TARGET_FP_F16INST): New.
> >(TARGET_SIMD_F16INST): New.
> >* config/aarch64/aarch64-option-extensions.def: New entry
> >for "fp16".
> >* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
> >Conditionally define
> >__ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
> >__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
> >* doc/invoke.texi (AArch64 Options): Document "armv8.2-a"
> >and "fp16".
> >
> 
> This is an updated version of this patch; the updates are:
> 
>   * When enabling "fp16" also enables "fp".
>   * When disabling "fp" also disables "fp16".
>   * When disabling "fp16" only disables "fp16".
> 
> OK for trunk?

Hi Jiong,

Some very minor comments below.

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index b15c23f0..d715da7 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -135,6 +135,8 @@ extern unsigned aarch64_architecture_version;
>  /* ARMv8.1 architecture extensions.  */
>  #define AARCH64_FL_LSE        (1 << 4)  /* Has Large System Extensions.  */
>  #define AARCH64_FL_V8_1       (1 << 5)  /* Has ARMv8.1 extensions.  */
> +#define AARCH64_FL_V8_2       (1 << 8)  /* Has ARMv8.2 features.  */
> +#define AARCH64_FL_F16        (1 << 9)  /* Has ARMv8.2 FP16 extensions.  */

Please add a new "section" for the ARMv8.2-A architecture extensions (just
a new comment that groups them), and s/ARMv8.2/ARMv8.2-A/ throughout this
section (if you'd like to do the follow-up trivial patch for
s/ARMv8.1/ARMv8.1-A/ I'd appreciate that).

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index aa11209..d44bbc0 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13035,7 +13035,10 @@ more feature modifiers.  This option has the form
>  @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
>  
>  The permissible values for @var{arch} are @samp{armv8-a},
> -@samp{armv8.1-a} or @var{native}.
> +@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
> +
> +The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
> +support for the ARMv8.2 architecture extensions.

ARMv8.2-A here too please.

>  
>  The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
>  support for the ARMv8.1 architecture extension.  In particular, it
> @@ -13140,6 +13143,8 @@ instructions.  This is on by default for all possible 
> values for options
>  @item lse
>  Enable Large System Extension instructions.  This is on by default for
>  @option{-march=armv8.1-a}.
> +@item fp16
> +Enable FP16 extension.

Please add a note here that enabling this extension also enables "fp"
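
The dependency rules listed for the updated patch (enabling "fp16" also enables "fp", disabling "fp" also disables "fp16", disabling "fp16" leaves "fp" alone) can be modelled with plain bit flags.  This is only a sketch: FL_FP, FL_F16 and the three helper functions are hypothetical names with made-up values, not the AARCH64_FL_* macros or the option-extension table itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, chosen only for this illustration.
constexpr uint32_t FL_FP  = 1u << 0;
constexpr uint32_t FL_F16 = 1u << 1;

// +fp16: turning on fp16 also turns on fp.
uint32_t enable_fp16(uint32_t flags)  { return flags | FL_F16 | FL_FP; }

// +nofp16: turning off fp16 leaves fp untouched.
uint32_t disable_fp16(uint32_t flags) { return flags & ~FL_F16; }

// +nofp: turning off fp also turns off fp16, which depends on it.
uint32_t disable_fp(uint32_t flags)   { return flags & ~(FL_FP | FL_F16); }
```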

This is OK for trunk otherwise.

Thanks,
James

> 
> 2016-06-27  Matthew Wahab  
> Jiong Wang  
> 
> * config/aarch64/aarch64-arches.def: Add "armv8.2-a".
> * config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
> (AARCH64_FL_F16): New.
> (AARCH64_FL_FOR_ARCH8_2): New.
> (AARCH64_ISA_8_2): New.
> (AARCH64_ISA_F16): New.
> (TARGET_FP_F16INST): New.
> (TARGET_SIMD_F16INST): New.
> * config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
> ("fp"): Disabling "fp" also disables "fp16".
> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
> Conditionally define
> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and 
> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
> * doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".



Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 28/06/16 21:59 +0200, François Dumont wrote:

  template
void
vector<_Tp, _Alloc>::
-_M_insert_aux(iterator __position, const _Tp& __x)
+_M_insert_value_aux(iterator __position, const _Tp& __x)
#endif
{
-  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
+  const _Tp* __ptr = std::__addressof(__x);
+
+  _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
+  _GLIBCXX_MOVE(*(this->_M_impl._M_finish - 1)));
+  ++this->_M_impl._M_finish;
+
+  _GLIBCXX_MOVE_BACKWARD3(__position.base(),
+ this->_M_impl._M_finish - 2,
+ this->_M_impl._M_finish - 1);
+
+  if (_M_data_ptr(__position.base()) <= __ptr
+ && __ptr < _M_data_ptr(this->_M_impl._M_finish - 1))


This is undefined behaviour. If the object is not contained in the
vector then you can't compare its address to addresses within the
vector.

I suggest forgetting about this optimisation unless we get a guarantee
from the compiler that we can do it safely. It's orthogonal to fixing
the correctness bug with emplacing an existing element of the vector.
We can fix the correctness bug now and worry about this optimisation
separately.
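
One way to address the correctness bug without comparing addresses is to copy the argument before any element is moved.  The following is a minimal sketch over a raw array, assuming a spare, already-constructed slot at *last; insert_shifting is a hypothetical name and this is not the libstdc++ implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Insert X at POS in the range [pos, last), shifting elements up by one.
// Assumes *last is a valid, assignable spare slot.  X may refer to an
// element of the range itself.
template <typename T>
void insert_shifting(T *pos, T *last, const T &x)
{
  assert(pos <= last);            // both point into the same array
  T copy = x;                     // copy first: the shift may overwrite x
  std::move_backward(pos, last, last + 1);
  *pos = std::move(copy);
}
```

The unconditional copy costs one extra construction when the argument does not alias the container, which is the price of not needing the address comparison.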




Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)

2016-06-29 Thread Martin Liška
On 06/28/2016 03:54 PM, Richard Biener wrote:
> I wonder about the inliner change.  If one marks a single function
> with -fstack-protector
> that implicitly marks callers with -fno-stack-protector.  So I'd
> rather disable inlining
> between different settings here?

It works the opposite way: if a caller has the flag disabled and a callee
has it enabled, then the flag is also set on the caller. Honza asked me to
propagate the flag in this manner.

If you prefer not to inline, I can prepare a patch.

Martin


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 28/06/16 21:59 +0200, François Dumont wrote:

@@ -303,16 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  emplace(const_iterator __position, _Args&&... __args)
  {
const size_type __n = __position - begin();


It looks like this should use __position - cbegin(), to avoid an
implicit conversion from iterator to const_iterator, and ...


-   if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
-   && __position == end())
- {
-   _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
-std::forward<_Args>(__args)...);
-   ++this->_M_impl._M_finish;
- }
+   if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
+ if (__position == end())


This could be __position == cend(), and ...


+   {
+ _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
+  std::forward<_Args>(__args)...);
+ ++this->_M_impl._M_finish;
+   }
+ else
+   _M_insert_aux(begin() + (__position - cbegin()),


This could use __n, and ...


+ std::forward<_Args>(__args)...);
else
- _M_insert_aux(begin() + (__position - cbegin()),
-   std::forward<_Args>(__args)...);
+ _M_realloc_insert_aux(begin() + (__position - cbegin()),


This could also use __n.


+   std::forward<_Args>(__args)...);
+
return iterator(this->_M_impl._M_start + __n);
  }
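
The reason for computing the offset once and reusing it can be shown with the public std::vector interface: an index stays meaningful across a reallocation, while iterators taken beforehand do not.  insert_keep_index is a hypothetical helper written for this illustration, not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Insert VALUE before POS and return an iterator to the new element,
// rebuilt from the index so it stays valid even if V reallocates.
std::vector<int>::iterator
insert_keep_index(std::vector<int> &v, std::vector<int>::const_iterator pos,
                  int value)
{
  // const_iterator minus const_iterator: no iterator conversion needed.
  const auto n = pos - v.cbegin();
  v.insert(pos, value);           // may reallocate and invalidate pos
  return v.begin() + n;
}
```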



Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)

2016-06-29 Thread Richard Biener
On Wed, Jun 29, 2016 at 11:02 AM, Martin Liška  wrote:
> On 06/28/2016 03:54 PM, Richard Biener wrote:
>> I wonder about the inliner change.  If one marks a single function
>> with -fstack-protector
>> that implicitly marks callers with -fno-stack-protector.  So I'd
>> rather disable inlining
>> between different settings here?
>
> It works the opposite way: if a caller has the flag disabled and a callee
> has it enabled, then the flag is also set on the caller. Honza asked me to
> propagate the flag in this manner.

If the intended use-case is to disable stack-protector on specific
functions then it is odd
if inlining a function with stack-protector on suddenly turns it on
for that function.

Not sure what the point is in the ability to do that per function
anyway (but yes, we can
control stack-protector behavior per function).

If the intended use-case is only for making it work better across TUs
for LTO and we want
to be conservative in _en_abling stack protection then your patch
accomplishes that.  OTOH
in that case merging the option in lto-wrapper works as well.

> If you prefer not to inline, I can prepare a patch.

So what was your original reason to pursue this?

Richard.

>
> Martin


[PING, two months] libgomp: In OpenACC testing, cycle through $offload_targets, and by default only build for the offload target that we're actually going to test

2016-06-29 Thread Thomas Schwinge
Hi!

Ping...  Now waiting for review for two months.

Quoting my latest commentary from
:

| [...] there actually is a difference between offload_plugins and
| offload_targets (for example, "intelmic"
| vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables --
| to avoid having to translate the more specific
| "x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
| into the less specific "intelmic" (for plugin loading) in
| libgomp/target.c.  I can do that, so that we can continue to use just a
| single offload_targets variable, but I consider that a less elegant
| solution.
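
For illustration, the translation that the quoted text would rather avoid might look roughly like this.  plugin_for_target is a hypothetical function, the substring matching is an assumption about how such a mapping could be done, and the "nvptx" entry is an assumed second example; only the "intelmic" pairing comes from the message above.

```cpp
#include <cassert>
#include <string>

// Map a specific offload target name to the less specific plugin name.
// Returns an empty string for targets this sketch does not know about.
std::string plugin_for_target(const std::string &target)
{
  if (target.find("intelmic") != std::string::npos)
    return "intelmic";
  if (target.find("nvptx") != std::string::npos)   // assumed example
    return "nvptx";
  return "";
}
```

Keeping two separate lists avoids having to maintain this kind of name matching inside the runtime.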


Grüße
 Thomas


On Wed, 25 May 2016 08:01:17 +0200, I wrote:
> Ping...
> 
> On Wed, 18 May 2016 13:41:25 +0200, I wrote:
> > Ping.
> > 
> > On Wed, 11 May 2016 15:45:13 +0200, I wrote:
> > > Ping.
> > > 
> > > On Mon, 02 May 2016 11:54:27 +0200, I wrote:
> > > > On Fri, 29 Apr 2016 09:43:41 +0200, Jakub Jelinek  
> > > > wrote:
> > > > > On Thu, Apr 28, 2016 at 12:43:43PM +0200, Thomas Schwinge wrote:
> > > > > > commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
> > > > > > Author: Thomas Schwinge 
> > > > > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > > > > 
> > > > > > libgomp: Unconfuse offload plugins vs. offload targets
> > > > > 
> > > > > I don't like this patch at all, rather than unconfusing stuff it
> > > > > makes stuff confusing.  Plugins are just a way to support various
> > > > > offloading targets.
> > > > 
> > > > Huh; my patch exactly clarifies that the offload_targets variable does
> > > > not actually list offload target names, but does list libgomp offload
> > > > plugin names...
> > > > 
> > > > > Can you please post just a short patch without all those changes
> > > > > that does what you want, rather than renaming everything at the same 
> > > > > time?
> > > > 
> > > > I thought incremental, self-contained patches were easier to review.
> > > > Anyway, here's the three patches merged into one:
> > > > 
> > > > commit 8060ae3474072eef685381d80f566d1c0942c603
> > > > Author: Thomas Schwinge 
> > > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > > 
> > > > libgomp: In OpenACC testing, cycle through $offload_targets, and by
> > > > default only build for the offload target that we're actually going to 
> > > > test
> > > > 
> > > > libgomp/
> > > > * plugin/configfrag.ac (offload_targets): Actually enumerate
> > > > offload targets, and add...
> > > > (offload_plugins): ... this one to enumerate offload plugins.
> > > > (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
> > > > * target.c (gomp_target_init): Adjust to that.
> > > > * testsuite/lib/libgomp.exp: Likewise.
> > > > (offload_targets_s, offload_targets_s_openacc): Remove 
> > > > variables.
> > > > (offload_target_to_openacc_device_type): New proc.
> > > > (check_effective_target_openacc_nvidia_accel_selected)
> > > > (check_effective_target_openacc_host_selected): Examine
> > > > $openacc_device_type instead of $offload_target_openacc.
> > > > * Makefile.in: Regenerate.
> > > > * config.h.in: Likewise.
> > > > * configure: Likewise.
> > > > * testsuite/Makefile.in: Likewise.
> > > > * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
> > > > $offload_targets (plus "disable") instead of
> > > > $offload_targets_s_openacc, and add "-foffload=$offload_target" 
> > > > to
> > > > tagopt.
> > > > * testsuite/libgomp.oacc-c/c.exp: Likewise.
> > > > * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> > > > ---
> > > >  libgomp/Makefile.in|  1 +
> > > >  libgomp/config.h.in|  4 +-
> > > >  libgomp/configure  | 44 +++--
> > > >  libgomp/plugin/configfrag.ac   | 39 +++-
> > > >  libgomp/target.c   |  8 +--
> > > >  libgomp/testsuite/Makefile.in  |  1 +
> > > >  libgomp/testsuite/lib/libgomp.exp  | 72 
> > > > ++
> > > >  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
> > > >  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
> > > >  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
> > > >  10 files changed, 142 insertions(+), 109 deletions(-)
> > > > 
> > > > diff --git libgomp/Makefile.in libgomp/Makefile.in
> > > > [snipped]
> > > > diff --git libgomp/config.h.in libgomp/config.h.in
> > > > [snipped]
> > > > diff --git libgomp/configure libgomp/configure
> > > > [snipped]
> > > > diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
> > > > index 88b4156..de0a6f6 100644
> > > > --- libgomp/plugin/configfrag.ac
> > > > +++ libgomp/plugin/configfrag.ac
> > > > @@ -26,8 

Rename PRAGMA_OMP_DECLARE_REDUCTION to PRAGMA_OMP_DECLARE (was: Clarify PRAGMA_OACC_* and PRAGMA_OMP_*)

2016-06-29 Thread Thomas Schwinge
Hi!

On Tue, 28 Jun 2016 14:20:26 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 28, 2016 at 02:13:24PM +0200, Thomas Schwinge wrote:
> > Looking at how OpenMP declare simd is handled in the C++ front end, I ran
> > into it being parsed for PRAGMA_OMP_DECLARE_REDUCTION, which got me
> > confused.  OK to commit the following to un-confuse this, in that
> > PRAGMA_OACC_* and PRAGMA_OMP_* don't describe what *eventually* is to be
> > parsed (which (nowadays?) is wrong for PRAGMA_OMP_DECLARE_REDUCTION,
> > anyway), but instead they describe what *so far* has been parsed.
> 
> I think I'd prefer if anything just to change PRAGMA_OMP_DECLARE_REDUCTION
> to PRAGMA_OMP_DECLARE (as the only case where there is any ambiguity) and
> nothing else, the rest would be more confusing than it is now.  

I anticipated that you would say something along these lines...  Thanks
at least for permitting to clarify PRAGMA_OMP_DECLARE_REDUCTION.
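
To show why the shorter name fits, here is a small sketch of the two-stage lookup: the pragma table only sees the first token ("declare"), and the directive is decided later from the following token.  The declare_kind enum and both function names are hypothetical, and the real front ends parse tokens rather than strings.

```cpp
#include <cassert>
#include <string>

enum pragma_kind { PRAGMA_NONE, PRAGMA_OMP_DECLARE, PRAGMA_OMP_FOR };

// Hypothetical second-stage result, not a GCC enum.
enum declare_kind { DECLARE_UNKNOWN, DECLARE_SIMD, DECLARE_REDUCTION,
                    DECLARE_TARGET };

// Stage 1: table lookup on the first token only.  At this point all that
// has been parsed is "declare", whatever follows it.
pragma_kind lookup_pragma(const std::string &first)
{
  if (first == "declare") return PRAGMA_OMP_DECLARE;
  if (first == "for")     return PRAGMA_OMP_FOR;
  return PRAGMA_NONE;
}

// Stage 2: the parser inspects the next token to pick the directive.
declare_kind parse_declare(const std::string &second)
{
  if (second == "simd")      return DECLARE_SIMD;
  if (second == "reduction") return DECLARE_REDUCTION;
  if (second == "target")    return DECLARE_TARGET;
  return DECLARE_UNKNOWN;
}
```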

> > commit fa557f6ad39992052decb413501c713db8ec59f0
> > Author: Thomas Schwinge 
> > Date:   Tue Jun 28 14:12:23 2016 +0200
> > 
> > Clarify PRAGMA_OACC_* and PRAGMA_OMP_*
> > 
> > gcc/c-family/
> > * c-pragma.h (enum pragma_kind): Rename PRAGMA_OACC_ENTER_DATA 
> > to
> > PRAGMA_OACC_ENTER, PRAGMA_OACC_EXIT_DATA to PRAGMA_OACC_EXIT,
> > PRAGMA_OMP_CANCELLATION_POINT to PRAGMA_OMP_CANCELLATION,
> > PRAGMA_OMP_DECLARE_REDUCTION to PRAGMA_OMP_DECLARE,
> > PRAGMA_OMP_END_DECLARE_TARGET to PRAGMA_OMP_END.  Adjust all
> > users.

Committed in r237842:

commit a4dd89cbd9a89bf7b4544cf0576b3607a0a7f281
Author: tschwinge 
Date:   Wed Jun 29 09:07:52 2016 +

Rename PRAGMA_OMP_DECLARE_REDUCTION to PRAGMA_OMP_DECLARE

gcc/c-family/
* c-pragma.h (enum pragma_kind): Rename
PRAGMA_OMP_DECLARE_REDUCTION to PRAGMA_OMP_DECLARE.  Adjust all
users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237842 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c-family/ChangeLog  | 6 ++
 gcc/c-family/c-pragma.c | 2 +-
 gcc/c-family/c-pragma.h | 2 +-
 gcc/c/c-parser.c| 4 ++--
 gcc/cp/parser.c | 4 ++--
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git gcc/c-family/ChangeLog gcc/c-family/ChangeLog
index d5b8395..679cb6b 100644
--- gcc/c-family/ChangeLog
+++ gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2016-06-29  Thomas Schwinge  
+
+   * c-pragma.h (enum pragma_kind): Rename
+   PRAGMA_OMP_DECLARE_REDUCTION to PRAGMA_OMP_DECLARE.  Adjust all
+   users.
+
 2016-06-29  Richard Biener  
 
PR middle-end/71002
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index c73aa82..277bc56 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1286,7 +1286,7 @@ static const struct omp_pragma_def omp_pragmas[] = {
   { "threadprivate", PRAGMA_OMP_THREADPRIVATE }
 };
 static const struct omp_pragma_def omp_pragmas_simd[] = {
-  { "declare", PRAGMA_OMP_DECLARE_REDUCTION },
+  { "declare", PRAGMA_OMP_DECLARE },
   { "distribute", PRAGMA_OMP_DISTRIBUTE },
   { "for", PRAGMA_OMP_FOR },
   { "parallel", PRAGMA_OMP_PARALLEL },
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index 65f10db..6d9cb08 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -46,7 +46,7 @@ enum pragma_kind {
   PRAGMA_OMP_CANCEL,
   PRAGMA_OMP_CANCELLATION_POINT,
   PRAGMA_OMP_CRITICAL,
-  PRAGMA_OMP_DECLARE_REDUCTION,
+  PRAGMA_OMP_DECLARE,
   PRAGMA_OMP_DISTRIBUTE,
   PRAGMA_OMP_END_DECLARE_TARGET,
   PRAGMA_OMP_FLUSH,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 7f491f1..1d2dac7 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -10215,7 +10215,7 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
   return false;
 
-case PRAGMA_OMP_DECLARE_REDUCTION:
+case PRAGMA_OMP_DECLARE:
   c_parser_omp_declare (parser, context);
   return false;
 
@@ -16381,7 +16381,7 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
   while (c_parser_next_token_is (parser, CPP_PRAGMA))
 {
   if (c_parser_peek_token (parser)->pragma_kind
- != PRAGMA_OMP_DECLARE_REDUCTION
+ != PRAGMA_OMP_DECLARE
  || c_parser_peek_2nd_token (parser)->type != CPP_NAME
  || strcmp (IDENTIFIER_POINTER
(c_parser_peek_2nd_token (parser)->value),
diff --git gcc/cp/parser.c gcc/cp/parser.c
index d1f06fd..739fca0 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -37203,7 +37203,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   parser->lexer->in_pragma = true;
 
   id = cp_parser_pragma_kind (pragma_tok);
-  if (id != PRAGMA_OMP_DECLARE_REDUCTION && id != PRAGMA_OACC_ROUTINE)
+  if (id != PRAGMA_OMP_DECLARE && id != PRAGMA_OACC_ROUTINE)
 cp_ensure_no_omp_declare_simd (parser);
   switch (id)
 {
@@ -37310,7 +37310,7 @@ cp_parser_

Re: Improve diagnostic messages of "#pragma omp cancel", "#pragma omp cancellation point" parsing (was: Clarify PRAGMA_OACC_* and PRAGMA_OMP_*)

2016-06-29 Thread Thomas Schwinge
Hi!

On Tue, 28 Jun 2016 14:25:28 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 28, 2016 at 02:19:13PM +0200, Thomas Schwinge wrote:
> > [...]
> > 
> > OK for trunk?
> 
> This LGTM if you tweak it to apply without the previous patch or with just
> a patch for PRAGMA_OMP_DECLARE.
> 
> Thanks.

Committed in r237843:

commit 44e775d8fa59d9732f2a10106c3f37e9e3224daa
Author: tschwinge 
Date:   Wed Jun 29 09:08:04 2016 +

Improve diagnostic messages of "#pragma omp cancel", "#pragma omp 
cancellation point" parsing

gcc/c/
* c-parser.c (c_parser_pragma) :
Move pragma context checking into...
(c_parser_omp_cancellation_point): ... here, and improve
diagnostic messages.
* c-typeck.c (c_finish_omp_cancel)
(c_finish_omp_cancellation_point): Improve diagnostic messages.
gcc/cp/
* parser.c (cp_parser_pragma) :
Move pragma context checking into...
(cp_parser_omp_cancellation_point): ... here, and improve
diagnostic messages.
* semantics.c (finish_omp_cancel, finish_omp_cancellation_point):
Improve diagnostic messages.
gcc/testsuite/
* c-c++-common/gomp/cancel-1.c: Extend.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237843 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c/ChangeLog|  9 
 gcc/c/c-parser.c   | 24 +-
 gcc/c/c-typeck.c   |  4 ++--
 gcc/cp/ChangeLog   |  9 
 gcc/cp/parser.c| 33 +++---
 gcc/cp/semantics.c |  4 ++--
 gcc/testsuite/ChangeLog|  4 
 gcc/testsuite/c-c++-common/gomp/cancel-1.c | 15 ++
 8 files changed, 72 insertions(+), 30 deletions(-)

diff --git gcc/c/ChangeLog gcc/c/ChangeLog
index df73934..7bd112b 100644
--- gcc/c/ChangeLog
+++ gcc/c/ChangeLog
@@ -1,3 +1,12 @@
+2016-06-29  Thomas Schwinge  
+
+   * c-parser.c (c_parser_pragma) :
+   Move pragma context checking into...
+   (c_parser_omp_cancellation_point): ... here, and improve
+   diagnostic messages.
+   * c-typeck.c (c_finish_omp_cancel)
+   (c_finish_omp_cancellation_point): Improve diagnostic messages.
+
 2016-06-29  Jakub Jelinek  
 
PR c/71685
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 1d2dac7..1a50dea 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -1358,11 +1358,11 @@ static tree c_parser_omp_for_loop (location_t, c_parser *, enum tree_code,
 static void c_parser_omp_taskwait (c_parser *);
 static void c_parser_omp_taskyield (c_parser *);
 static void c_parser_omp_cancel (c_parser *);
-static void c_parser_omp_cancellation_point (c_parser *);
 
 enum pragma_context { pragma_external, pragma_struct, pragma_param,
  pragma_stmt, pragma_compound };
 static bool c_parser_pragma (c_parser *, enum pragma_context, bool *);
+static void c_parser_omp_cancellation_point (c_parser *, enum pragma_context);
 static bool c_parser_omp_target (c_parser *, enum pragma_context, bool *);
 static void c_parser_omp_end_declare_target (c_parser *);
 static void c_parser_omp_declare (c_parser *, enum pragma_context);
@@ -10187,14 +10187,7 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   return false;
 
 case PRAGMA_OMP_CANCELLATION_POINT:
-  if (context != pragma_compound)
-   {
- if (context == pragma_stmt)
-   c_parser_error (parser, "%<#pragma omp cancellation point%> may "
-   "only be used in compound statements");
- goto bad_stmt;
-   }
-  c_parser_omp_cancellation_point (parser);
+  c_parser_omp_cancellation_point (parser, context);
   return false;
 
 case PRAGMA_OMP_THREADPRIVATE:
@@ -15668,7 +15661,7 @@ c_parser_omp_cancel (c_parser *parser)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TASKGROUP))
 
 static void
-c_parser_omp_cancellation_point (c_parser *parser)
+c_parser_omp_cancellation_point (c_parser *parser, enum pragma_context context)
 {
   location_t loc = c_parser_peek_token (parser)->location;
   tree clauses;
@@ -15691,6 +15684,17 @@ c_parser_omp_cancellation_point (c_parser *parser)
   return;
 }
 
+  if (context != pragma_compound)
+{
+  if (context == pragma_stmt)
+   error_at (loc, "%<#pragma omp cancellation point%> may only be used in"
+ " compound statements");
+  else
+   c_parser_error (parser, "expected declaration specifiers");
+  c_parser_skip_to_pragma_eol (parser, false);
+  return;
+}
+
   clauses
 = c_parser_omp_all_clauses (parser, OMP_CANCELLATION_POINT_CLAUSE_MASK,
"#pragma omp cancellation point");
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 56268fc..b2435de 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -11933,7 +11933,7 @@ c_finis

Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Paolo Carlini

Hi,

On 29/06/2016 10:57, Jonathan Wakely wrote:

On 28/06/16 21:59 +0200, François Dumont wrote:

+  if (_M_data_ptr(__position.base()) <= __ptr
+  && __ptr < _M_data_ptr(this->_M_impl._M_finish - 1))


This is undefined behaviour. If the object is not contained in the
vector then you can't compare its address to addresses within the
vector.


Uhm, would that be true also if the code used std::less? Aren't we doing 
something like that in std::basic_string under the assumption (Nathan?) 
that it would not be the case? Or maybe I'm misreading the code 
(admittedly I didn't follow in detail the whole exchange)


Paolo.


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 29/06/16 11:35 +0200, Paolo Carlini wrote:

Hi,

On 29/06/2016 10:57, Jonathan Wakely wrote:

On 28/06/16 21:59 +0200, François Dumont wrote:

+  if (_M_data_ptr(__position.base()) <= __ptr
+  && __ptr < _M_data_ptr(this->_M_impl._M_finish - 1))


This is undefined behaviour. If the object is not contained in the
vector then you can't compare its address to addresses within the
vector.


Uhm, would that be true also if the code used std::less? Aren't we 
doing something like that in std::basic_string under the assumption 
(Nathan?) that it would not be the case? Or maybe I'm misreading the 
code (admittedly I didn't follow in detail the whole exchange)


Yes, it's OK if you use std::less. In general there's no guarantee
that std::less defines the same order as operator<(T*, T*), but in
our implementation it does.

I'd still prefer to keep a correctness fix and a potentially risky
optimisation in separate commits.
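
For readers more at home in C: the same restriction exists there, since a relational comparison of pointers into different objects is undefined. One commonly used workaround, sketched below, compares the pointers after converting them to uintptr_t. This assumes a flat address space in which that conversion preserves ordering (the mapping is implementation-defined), and it is not what the libstdc++ patch does. The helper name points_into is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if P addresses one of the N elements starting at BASE.
   The comparison is done on integers, so it is well defined even when
   P points into an unrelated object.  */
static bool
points_into (const int *base, size_t n, const int *p)
{
  uintptr_t lo = (uintptr_t) base;
  uintptr_t hi = (uintptr_t) (base + n);	/* One past the end.  */
  uintptr_t q = (uintptr_t) p;
  return lo <= q && q < hi;
}
```

In C++ the portable spelling of the same idea is std::less on the pointers, as discussed above.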




[PATCH] Generate more effective one-operand permutation instruction for knl.

2016-06-29 Thread Yuri Rumyantsev
Hi All,

Here is a simple patch which generates the one-operand vperm instructions
introduced in knl.
Using this patch we got +5% speed-up on one important benchmark.

Bootstrapping and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:
2016-06-29  Yuri Rumyantsev  

* config/i386/i386.c (ix86_expand_vec_perm): Handle one-operand
permutation for TARGET_AVX512F.
(ix86_expand_vec_one_operand_perm_avx512): New function.
(expand_vec_perm_1): Invoke introduced function.
* tree-vect-loop.c (vect_transform_loop): Clear safelen value since
it may not be valid after vectorization.

gcc/testsuite/ChangeLog
* gcc/testsuite/gcc.target/i386/avx512f-vect-perm-1.c: New test.
* gcc/testsuite/gcc.target/i386/avx512f-vect-perm-2.c: New test.


patch
Description: Binary data


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Paolo Carlini

Hi,

On 29/06/2016 12:07, Jonathan Wakely wrote:

On 29/06/16 11:35 +0200, Paolo Carlini wrote:

Hi,

On 29/06/2016 10:57, Jonathan Wakely wrote:

On 28/06/16 21:59 +0200, François Dumont wrote:

+  if (_M_data_ptr(__position.base()) <= __ptr
+  && __ptr < _M_data_ptr(this->_M_impl._M_finish - 1))


This is undefined behaviour. If the object is not contained in the
vector then you can't compare its address to addresses within the
vector.


Uhm, would that be true also if the code used std::less? Aren't we 
doing something like that in std::basic_string under the assumption 
(Nathan?) that it would not be the case? Or maybe I'm misreading the 
code (admittedly I didn't follow in detail the whole exchange)


Yes, it's OK if you use std::less. In general there's no guarantee
that std::less defines the same order as operator<(T*, T*), but in
our implementation it does.

Ah Ok, excellent.


I'd still prefer to keep a correctness fix and a potentially risky
optimisation in separate commits.


Definitely. Maybe with an additional comment too!

Paolo.


Re: [AArch64] ARMv8.2 command line and feature macros support

2016-06-29 Thread Richard Earnshaw (lists)
On 29/06/16 09:43, James Greenhalgh wrote:
> On Mon, Jun 27, 2016 at 03:58:00PM +0100, Jiong Wang wrote:
>> On 07/06/16 09:46, Jiong Wang wrote:
>>> 2016-06-07  Matthew Wahab
>>>Jiong Wang
>>>
>>>* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
>>>* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
>>>(AARCH64_FL_F16): New.
>>>(AARCH64_FL_FOR_ARCH8_2): New.
>>>(AARCH64_ISA_8_2): New.
>>>(AARCH64_ISA_F16): New.
>>>(TARGET_FP_F16INST): New.
>>>(TARGET_SIMD_F16INST): New.
>>>* config/aarch64/aarch64-option-extensions.def: New entry
>>> for "fp16".
>>>* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
>>> Conditionally define
>>>__ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
>>> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
>>>* doc/invoke.texi (AArch64 Options): Document "armv8.2-a"
>>> and "fp16".
>>>
>>
>> This is an updated version of this patch, the updates are:
>>
>>   * When enabling "fp16" also enables "fp".
>>   * When disabling "fp" also disables "fp16".
>>   * When disabling "fp16" only disables "fp16".
>>
>> OK for trunk?
> 
> Hi Jiong,
> 
> Some very minor comments below.
> 
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index b15c23f0..d715da7 100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -135,6 +135,8 @@ extern unsigned aarch64_architecture_version;
>>  /* ARMv8.1 architecture extensions.  */
>>  #define AARCH64_FL_LSE    (1 << 4)  /* Has Large System Extensions.  */
>>  #define AARCH64_FL_V8_1   (1 << 5)  /* Has ARMv8.1 extensions.  */
>> +#define AARCH64_FL_V8_2   (1 << 8)  /* Has ARMv8.2 features.  */
>> +#define AARCH64_FL_F16    (1 << 9)  /* Has ARMv8.2 FP16 extensions.  */
> 
> Please add a new "section" for the ARMv8.2-A architecture extensions (just
> a new comment that groups them), and s/ARMv8.2/ARMv8.2-A/ throughout this
> section (if you'd like to do the follow-up trivial patch for
> s/ARMv8.1/ARMv8.1-A/ I'd appreciate that).
> 
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index aa11209..d44bbc0 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -13035,7 +13035,10 @@ more feature modifiers.  This option has the form
>>  @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
>>  
>>  The permissible values for @var{arch} are @samp{armv8-a},
>> -@samp{armv8.1-a} or @var{native}.
>> +@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
>> +
>> +The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
>> +support for the ARMv8.2 architecture extensions.
> 
> ARMv8.2-A here too please.

Why do we need to say that v8.2-a implies v8.1-a, when we don't say that
v8.1-a implies v8-a?  The whole implies clause seems unnecessary to me.

R.

> 
>>  
>>  The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
>>  support for the ARMv8.1 architecture extension.  In particular, it
>> @@ -13140,6 +13143,8 @@ instructions.  This is on by default for all possible values for options
>>  @item lse
>>  Enable Large System Extension instructions.  This is on by default for
>>  @option{-march=armv8.1-a}.
>> +@item fp16
>> +Enable FP16 extension.
> 
> Please add a note here that enabling this extension also enables "fp".
> 
> This is OK for trunk otherwise.
> 
> Thanks,
> James
> 
>>
>> 2016-06-27  Matthew Wahab  
>> Jiong Wang  
>>
>> * config/aarch64/aarch64-arches.def: Add "armv8.2-a".
>> * config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
>> (AARCH64_FL_F16): New.
>> (AARCH64_FL_FOR_ARCH8_2): New.
>> (AARCH64_ISA_8_2): New.
>> (AARCH64_ISA_F16): New.
>> (TARGET_FP_F16INST): New.
>> (TARGET_SIMD_F16INST): New.
>> * config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
>> ("fp"): Disabling "fp" also disables "fp16".
>> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): 
>> Conditionally define
>> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and 
>> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
>> * doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".
> 
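
The three rules for "fp16" and "fp" quoted above can be pictured as dependency handling on a feature bitmask. The following is a minimal sketch, not the code in aarch64-option-extensions.def: the flag names FL_FP and FL_F16, their bit values, and both helper functions are assumptions made for illustration.

```c
/* Hypothetical flag bits; the real values in aarch64.h differ.  */
#define FL_FP   (1u << 0)
#define FL_F16  (1u << 1)

/* Turn FEATURE on.  Enabling "fp16" also enables "fp".  */
static unsigned
enable_feature (unsigned flags, unsigned feature)
{
  flags |= feature;
  if (feature & FL_F16)
    flags |= FL_FP;
  return flags;
}

/* Turn FEATURE off.  Disabling "fp" also disables "fp16";
   disabling "fp16" leaves "fp" untouched.  */
static unsigned
disable_feature (unsigned flags, unsigned feature)
{
  flags &= ~feature;
  if (feature & FL_FP)
    flags &= ~FL_F16;
  return flags;
}
```

Under these assumptions, enabling FL_F16 on an empty mask yields both bits, disabling FL_FP clears both, and disabling FL_F16 keeps FL_FP set.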



Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)

2016-06-29 Thread Martin Liška
On 06/29/2016 11:12 AM, Richard Biener wrote:
> So what was your original reason to pursue this?
> 
> Richard.

I agree with you that handling the option during inlining is a bit of overkill.
I can live with just marking the option as Optimize, which will fix the reported PR.

Sending simplified version 2.

Ready for trunk?
Martin


Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)

2016-06-29 Thread Martin Liška
On 06/29/2016 12:27 PM, Martin Liška wrote:
> On 06/29/2016 11:12 AM, Richard Biener wrote:
>> So what was your original reason to pursue this?
>>
>> Richard.
> 
> I agree with you that handling the option during inlining is a bit of overkill.
> I can live with just marking the option as Optimize, which will fix the
> reported PR.
> 
> Sending simplified version 2.
> 
> Ready for trunk?
> Martin
> 

Adding missing patch.

M.
>From 67d230ebe22f41dd3fcc8a5f5fc108730619202e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 24 Jun 2016 19:21:37 +0200
Subject: [PATCH] Mark -fstack-protect as optimization flag.

gcc/ChangeLog:

2016-06-27  Martin Liska  

	PR middle-end/71585
	* common.opt (flag_stack_protect): Mark the flag as optimization
	flag.
	* ipa-inline-transform.c (inline_call): Remove unnecessary call
	of build_optimization_node.

gcc/testsuite/ChangeLog:

2016-06-27  Martin Liska  

	* gcc.dg/pr71585.c: New test.
	* gcc.dg/pr71585-2.c: New test.
	* gcc.dg/pr71585-3.c: New test.
---
 gcc/common.opt   |  8 
 gcc/ipa-inline-transform.c   |  3 +--
 gcc/testsuite/gcc.dg/pr71585-2.c | 23 +++
 gcc/testsuite/gcc.dg/pr71585.c   | 23 +++
 4 files changed, 51 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr71585-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr71585.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 5d90385..a7c5125 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2221,19 +2221,19 @@ Common RejectNegative Joined Var(common_deferred_options) Defer
 -fstack-limit-symbol=	Trap if the stack goes past symbol .
 
 fstack-protector
-Common Report Var(flag_stack_protect, 1) Init(-1)
+Common Report Var(flag_stack_protect, 1) Init(-1) Optimization
 Use propolice as a stack protection method.
 
 fstack-protector-all
-Common Report RejectNegative Var(flag_stack_protect, 2) Init(-1)
+Common Report RejectNegative Var(flag_stack_protect, 2) Init(-1) Optimization
 Use a stack protection method for every function.
 
 fstack-protector-strong
-Common Report RejectNegative Var(flag_stack_protect, 3) Init(-1)
+Common Report RejectNegative Var(flag_stack_protect, 3) Init(-1) Optimization
 Use a smart stack protection method for certain functions.
 
 fstack-protector-explicit
-Common Report RejectNegative Var(flag_stack_protect, 4)
+Common Report RejectNegative Var(flag_stack_protect, 4) Optimization
 Use stack protection method only for functions with the stack_protect attribute.
 
 fstack-usage
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index f6b7d41..9ac1efc 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -342,10 +342,10 @@ inline_call (struct cgraph_edge *e, bool update_original,
   if (dump_file)
 	fprintf (dump_file, "Dropping flag_strict_aliasing on %s:%i\n",
 		 to->name (), to->order);
-  build_optimization_node (&opts);
   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)
 	 = build_optimization_node (&opts);
 }
+
   inline_summary *caller_info = inline_summaries->get (to);
   inline_summary *callee_info = inline_summaries->get (callee);
   if (!caller_info->fp_expressions && callee_info->fp_expressions)
@@ -402,7 +402,6 @@ inline_call (struct cgraph_edge *e, bool update_original,
 	  if (dump_file)
 	fprintf (dump_file, "Copying FP flags from %s:%i to %s:%i\n",
 		 callee->name (), callee->order, to->name (), to->order);
-	  build_optimization_node (&opts);
 	  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)
 	 = build_optimization_node (&opts);
 	}
diff --git a/gcc/testsuite/gcc.dg/pr71585-2.c b/gcc/testsuite/gcc.dg/pr71585-2.c
new file mode 100644
index 000..c3010da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71585-2.c
@@ -0,0 +1,23 @@
+/* Test that stack protection is done on chosen functions. */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+
+/* This test checks the presence of __stack_chk_fail function in assembler.
+ * Compiler generates _stack_chk_fail_local (wrapper) calls instead for PIC.
+ */
+/* { dg-require-effective-target nonpic } */
+
+static int foo()
+{
+  return 0;
+}
+
+#pragma GCC push_options
+#pragma GCC optimize ("-fno-stack-protector")
+
+int main() { foo (); }
+
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times "stack_chk_fail" 0 } } */
diff --git a/gcc/testsuite/gcc.dg/pr71585.c b/gcc/testsuite/gcc.dg/pr71585.c
new file mode 100644
index 000..444aaa1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71585.c
@@ -0,0 +1,23 @@
+/* Test that stack protection is done on chosen functions. */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+
+/* This test checks the presence of __stack_chk_fail function in assembler.
+ * Compiler generates _stack_chk_fail_local (wrapper) calls instead for PIC.
+ */
+/* { dg-require-effective-target nonpic } */
+
+

[PATCH][AArch64] Fix some scan-assembler tests for -mabi=ilp32

2016-06-29 Thread Kyrill Tkachov

Hi all,

I notice these scan-assembler tests fail when testing -mabi=ilp32 because the
64-bit operation that they expect doesn't happen on the 32-bit long types in
that configuration.

The easy fix is to change the 'long' types to be 'long long' so that they are
always 64-bit.
With this patch the tests now pass with and without -mabi=ilp32.
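
As a small illustration of the width difference (a sketch, not part of the patch): 'long' is 64 bits under the default LP64 ABI but 32 bits with -mabi=ilp32, whereas 'long long' is at least 64 bits in every conforming C99 implementation. The helper names below are made up.

```c
#include <limits.h>

/* Holds for any conforming implementation, whatever the ABI.  */
_Static_assert (sizeof (long long) * CHAR_BIT >= 64,
		"long long is at least 64 bits");

/* Width of 'long' in bits: depends on the ABI (LP64 vs. ILP32).  */
static unsigned
long_bits (void)
{
  return (unsigned) (sizeof (long) * CHAR_BIT);
}

/* Width of 'long long' in bits: at least 64 everywhere.  */
static unsigned
long_long_bits (void)
{
  return (unsigned) (sizeof (long long) * CHAR_BIT);
}
```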

Ok for trunk?

Thanks,
Kyrill

2016-06-29  Kyrylo Tkachov  

* gcc.target/aarch64/cinc_common_1.c: Use long long instead of long.
* gcc.target/aarch64/combine_bfi_1.c: Likewise.
* gcc.target/aarch64/fmul_fcvt_1.c: Likewise.
* gcc.target/aarch64/mult-synth_4.c: Likewise.
* gcc.target/aarch64/target_attr_3.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
index d04126331a7456dc267fc999f12366acbddf7927..f93364f74ba04e8dc11914a5a1bc342edf5e0751 100644
--- a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
@@ -15,14 +15,14 @@ barsi (int x)
   return x > 100 ? x + 4 : x + 3;
 }
 
-long
-foodi (long x)
+long long
+foodi (long long x)
 {
   return x > 100 ? x - 2 : x - 1;
 }
 
-long
-bardi (long x)
+long long
+bardi (long long x)
 {
   return x > 100 ? x + 4 : x + 3;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
index accf14410938a6c0119f7b9b5c76db38170e8e09..9cc3bdb3ddfc286bfb6c079aaba27fb5edb1806c 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
@@ -25,8 +25,8 @@ f4 (int x, int y)
   return (x & ~0xff) | (y & 0xff);
 }
 
-long
-f5 (long x, long y)
+long long
+f5 (long long x, long long y)
 {
   return (x & ~0xull) | (y & 0x);
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
index 05e06e3729e59a73018ed647057ea42d9c9aa952..af155f2dda216bb7ace03209d3a8a809c4ee2f8d 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -14,26 +14,26 @@ usffoo##__a (float x)	\
   return x * __a##.0f;	\
 }			\
 			\
-long			\
+long long		\
 lsffoo##__a (float x)	\
 {			\
   return x * __a##.0f;	\
 }			\
 			\
-unsigned long		\
+unsigned long long	\
 ulsffoo##__a (float x)	\
 {			\
   return x * __a##.0f;	\
 }
 
 #define FUNC_DEFD(__a)	\
-long			\
+long long		\
 dffoo##__a (double x)	\
 {			\
   return x * __a##.0;	\
 }			\
 			\
-unsigned long		\
+unsigned long long	\
 udffoo##__a (double x)	\
 {			\
   return x * __a##.0;	\
diff --git a/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c b/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
index 4b607782f9e3257e3444b355d84970580f90a41e..f1283e84e200bc24986b6e0c57e33694e655b532 100644
--- a/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
 
-long
+long long
 foo (int x, int y)
 {
-   return (long)x * 6L;
+   return (long long)x * 6L;
 }
 
 /* { dg-final { scan-assembler-times "smull\tx\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_3.c b/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
index 50e52520ef0d1082681661999cb034f01ffc3fab..7887208fd7d83723c8f5e21000ed4359a27d0dae 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
@@ -5,12 +5,12 @@
and the fix is applied once.  */
 
 __attribute__ ((target ("fix-cortex-a53-835769")))
-unsigned long
-test (unsigned long a, double b, unsigned long c,
-  unsigned long d, unsigned long *e)
+unsigned long long
+test (unsigned long long a, double b, unsigned long long c,
+  unsigned long long d, unsigned long long *e)
 {
   double result;
-  volatile unsigned long tmp = *e;
+  volatile unsigned long long tmp = *e;
   __asm__ __volatile ("// %0, %1"
 			: "=w" (result)
 			: "0" (b)
@@ -18,12 +18,12 @@ test (unsigned long a, double b, unsigned long c,
   return c * d + d;
 }
 
-unsigned long
-test2 (unsigned long a, double b, unsigned long c,
-   unsigned long d, unsigned long *e)
+unsigned long long
+test2 (unsigned long long a, double b, unsigned long long c,
+   unsigned long long d, unsigned long long *e)
 {
   double result;
-  volatile unsigned long tmp = *e;
+  volatile unsigned long long tmp = *e;
   __asm__ __volatile ("// %0, %1"
 			: "=w" (result)
 			: "0" (b)


Re: [PATCH/AARCH64] Add rtx_costs routine for vulcan.

2016-06-29 Thread James Greenhalgh
On Thu, Jun 23, 2016 at 02:45:21PM +0530, Virendra Pathak wrote:
> Hi gcc-patches group,
> 
> Please find the patch for adding rtx_costs routine for vulcan cpu.
> 
> Tested by building a cross aarch64-linux-gcc, bootstrapping natively on
> aarch64-unknown-linux-gnu, and running make check (gcc). The patch adds no
> new regression failures.
> 
> Kindly review and merge the patch to trunk, if the patch is okay.
> Thanks.

This is OK, but I have the same question for you as I had for the
qdf24xx tuning that Jim proposed, so I won't commit it yet...

> gcc/ChangeLog:
> 
> Virendra Pathak  
> 
> * config/aarch64/aarch64-cores.def: Update vulcan COSTS.
> * config/aarch64/aarch64-cost-tables.h
> (vulcan_extra_costs): New variable.
> * config/aarch64/aarch64.c
> (vulcan_addrcost_table): Likewise.
> (vulcan_regmove_cost): Likewise.
> (vulcan_vector_cost): Likewise.
> (vulcan_branch_cost): Likewise.
> (vulcan_tunings): Likewise.


>
> +  {
> +/* FP SFmode */
> +{
> +  COSTS_N_INSNS (16),/* Div.  */
> +  COSTS_N_INSNS (6), /* Mult.  */
> +  COSTS_N_INSNS (6), /* Mult_addsub. */
> +  COSTS_N_INSNS (6), /* Fma.  */
> +  COSTS_N_INSNS (6), /* Addsub.  */
> +  COSTS_N_INSNS (5), /* Fpconst. */
> +  COSTS_N_INSNS (5), /* Neg.  */
> +  COSTS_N_INSNS (5), /* Compare.  */
> +  COSTS_N_INSNS (7), /* Widen.  */
> +  COSTS_N_INSNS (7), /* Narrow.  */
> +  COSTS_N_INSNS (7), /* Toint.  */
> +  COSTS_N_INSNS (7), /* Fromint.  */
> +  COSTS_N_INSNS (7)  /* Roundint.  */
> +},
> +/* FP DFmode */
> +{
> +  COSTS_N_INSNS (23),/* Div.  */
> +  COSTS_N_INSNS (6), /* Mult.  */
> +  COSTS_N_INSNS (6), /* Mult_addsub.  */
> +  COSTS_N_INSNS (6), /* Fma.  */
> +  COSTS_N_INSNS (6), /* Addsub.  */
> +  COSTS_N_INSNS (5), /* Fpconst.  */
> +  COSTS_N_INSNS (5), /* Neg.  */
> +  COSTS_N_INSNS (5), /* Compare.  */
> +  COSTS_N_INSNS (7), /* Widen.  */
> +  COSTS_N_INSNS (7), /* Narrow.  */
> +  COSTS_N_INSNS (7), /* Toint.  */
> +  COSTS_N_INSNS (7), /* Fromint.  */
> +  COSTS_N_INSNS (7)  /* Roundint.  */
> +}

Recently ( https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00251.html ,
 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01418.html ), I changed the
Cortex-A57 and Cortex-A53 cost tables to make the cost of a floating-point
operation relative to the cost of a floating-point move. This gave some
code generation benefits, particularly around conditional execution and
generation of constants.

Did you see those patches, and did you consider whether there would be a
benefit to doing the same for Vulcan?

Thanks,
James



Re: [PATCH][AArch64] Fix some scan-assembler tests for -mabi=ilp32

2016-06-29 Thread Richard Earnshaw (lists)
On 29/06/16 11:40, Kyrill Tkachov wrote:
> Hi all,
> 
> I notice these scan-assembler tests fail when testing -mabi=ilp32
> because the 64-bit operation that they
> expect doesn't happen on the 32-bit long types in that configuration.
> 
> The easy fix is to change the 'long' types to be 'long long' so that
> they are always 64-bit.
> With this patch the tests now pass with and without -mabi=ilp32.
> 
> Ok for trunk?
> 

I think we ought to have two tests, one for long and one for long long.
I'd only expect the long version to run when !ilp32.

R.

> Thanks,
> Kyrill
> 
> 2016-06-29  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/cinc_common_1.c: Use long long instead of long.
> * gcc.target/aarch64/combine_bfi_1.c: Likewise.
> * gcc.target/aarch64/fmul_fcvt_1.c: Likewise.
> * gcc.target/aarch64/mult-synth_4.c: Likewise.
> * gcc.target/aarch64/target_attr_3.c: Likewise.
> 
> aarch64-tests-ilp32.patch
> 
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
> index d04126331a7456dc267fc999f12366acbddf7927..f93364f74ba04e8dc11914a5a1bc342edf5e0751 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c
> @@ -15,14 +15,14 @@ barsi (int x)
>return x > 100 ? x + 4 : x + 3;
>  }
>  
> -long
> -foodi (long x)
> +long long
> +foodi (long long x)
>  {
>return x > 100 ? x - 2 : x - 1;
>  }
>  
> -long
> -bardi (long x)
> +long long
> +bardi (long long x)
>  {
>return x > 100 ? x + 4 : x + 3;
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c b/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
> index accf14410938a6c0119f7b9b5c76db38170e8e09..9cc3bdb3ddfc286bfb6c079aaba27fb5edb1806c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/combine_bfi_1.c
> @@ -25,8 +25,8 @@ f4 (int x, int y)
>return (x & ~0xff) | (y & 0xff);
>  }
>  
> -long
> -f5 (long x, long y)
> +long long
> +f5 (long long x, long long y)
>  {
>return (x & ~0xull) | (y & 0x);
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
> index 05e06e3729e59a73018ed647057ea42d9c9aa952..af155f2dda216bb7ace03209d3a8a809c4ee2f8d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
> @@ -14,26 +14,26 @@ usffoo##__a (float x) \
>return x * __a##.0f;   \
>  }\
>   \
> -long \
> +long long\
>  lsffoo##__a (float x)\
>  {\
>return x * __a##.0f;   \
>  }\
>   \
> -unsigned long\
> +unsigned long long   \
>  ulsffoo##__a (float x)   \
>  {\
>return x * __a##.0f;   \
>  }
>  
>  #define FUNC_DEFD(__a)   \
> -long \
> +long long\
>  dffoo##__a (double x)\
>  {\
>return x * __a##.0;\
>  }\
>   \
> -unsigned long\
> +unsigned long long   \
>  udffoo##__a (double x)   \
>  {\
>return x * __a##.0;\
> diff --git a/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c b/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
> index 4b607782f9e3257e3444b355d84970580f90a41e..f1283e84e200bc24986b6e0c57e33694e655b532 100644
> --- a/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
> @@ -1,10 +1,10 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
>  
> -long
> +long long
>  foo (int x, int y)
>  {
> -   return (long)x * 6L;
> +   return (long long)x * 6L;
>  }
>  
>  /* { dg-final { scan-assembler-times "smull\tx\[0-9\]+, w\[0-9\]+, 
> w\[0-9\]+" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_3.c 
> b/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
> index 
> 50e52520ef0d1082681661999cb034f01ffc3fab..7887208fd7d83723c8f5e21000ed4359a27d0dae
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_3.c
> @@ -5,12 +5,12 @@
> and the fix is applied once.  */
>  
>  __attribute__ ((target ("fix-cortex-a53-835769")))
> -unsigned long
> -test (unsigned long a, double b, unsigned long c,
> -  unsigned long d, unsigned long *e)
> +unsigned long long
> +test (unsigned long long a, double b, unsigned long long c,
> +  unsigned long long d, unsigned long long *e)
>  {
>double result;
> -  volatile unsigned long tmp = *e;
> +  volatile unsigned long long tmp = *e;
>__asm__ __volatile ("// %0, %1"
>   : "=w" (result)
>   : "0" (b)
> @@ -18,12 +18,12 @@ test (unsigned long a, double b, unsigned long c,
>   

Re: [PATCH][AArch64] Fix some scan-assembler tests for -mabi=ilp32

2016-06-29 Thread James Greenhalgh
On Wed, Jun 29, 2016 at 11:40:13AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> I notice these scan-assembler tests fail when testing -mabi=ilp32 because the
> 64-bit operation that they expect doesn't happen on the 32-bit long types in
> that configuration.
> 
> The easy fix is to change the 'long' types to be 'long long' so that they are
> always 64-bit.  With this patch the tests now pass with and without
> -mabi=ilp32.

In my opinion, the better fix would be something like:

  typedef int SImode __attribute__((mode(SI)));
  typedef int DImode __attribute__((mode(DI)));

Then:

  SImode
  foosi (SImode x)
  {
...
  }

  DImode
  foodi (DImode x)
  {
...
  }

As these patterns are explicitly testing that the back-end patterns are
triggered for data of any type, under any ABI, with the appropriate mode.
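
A minimal sketch of the suggestion above, assuming GCC or a compatible compiler (the mode attribute is a GNU extension). The body of foosi is an assumption that mirrors foodi from the patch; the original mail elides it.

```cpp
#include <cassert>
#include <climits>

// Integer types selected by machine mode rather than by a C type name.
typedef int SImode __attribute__ ((mode (SI)));  // single integer, 32 bits
typedef int DImode __attribute__ ((mode (DI)));  // double integer, 64 bits

// The widths follow the mode, whatever the ABI chooses for 'long'.
static_assert (sizeof (SImode) * CHAR_BIT == 32, "SImode is 32 bits");
static_assert (sizeof (DImode) * CHAR_BIT == 64, "DImode is 64 bits");

// Assumed body, same shape as foodi below.
SImode
foosi (SImode x)
{
  return x > 100 ? x - 2 : x - 1;
}

// Body taken from the cinc_common_1.c hunk quoted in the patch.
DImode
foodi (DImode x)
{
  return x > 100 ? x - 2 : x - 1;
}
```

With these typedefs the 64-bit variant stays 64-bit under both -mabi=lp64 and -mabi=ilp32, which is the property the scan-assembler tests rely on.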

Thanks,
James


> 2016-06-29  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/cinc_common_1.c: Use long long instead of long.
> * gcc.target/aarch64/combine_bfi_1.c: Likewise.
> * gcc.target/aarch64/fmul_fcvt_1.c: Likewise.
> * gcc.target/aarch64/mult-synth_4.c: Likewise.
> * gcc.target/aarch64/target_attr_3.c: Likewise.



Re: [PATCH][AArch64] Fix some scan-assembler tests for -mabi=ilp32

2016-06-29 Thread Richard Earnshaw (lists)
On 29/06/16 11:47, James Greenhalgh wrote:
> On Wed, Jun 29, 2016 at 11:40:13AM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> I notice these scan-assembler tests fail when testing -mabi=ilp32 because the
>> 64-bit operation that they expect doesn't happen on the 32-bit long types in
>> that configuration.
>>
>> The easy fix is to change the 'long' types to be 'long long' so that they are
>> always 64-bit.  With this patch the tests now pass with and without
>> -mabi=ilp32.
> 
> In my opinion, the better fix would be something like:
> 
>   typedef int SImode __attribute__((mode(SI)));
>   typedef int DImode __attribute__((mode(DI)));
> 
> Then:
> 
>   SImode
>   foosi (SImode x)
>   {
> ...
>   }
> 
>   DImode
>   foodi (DImode x)
>   {
> ...
>   }
> 
> As these patterns are explicitly testing that the back-end patterns are
> triggered for data of any type, under any ABI, with the appropriate mode.
> 

Yeah, good idea.

R.

> Thanks,
> James
> 
> 
>> 2016-06-29  Kyrylo Tkachov  
>>
>> * gcc.target/aarch64/cinc_common_1.c: Use long long instead of long.
>> * gcc.target/aarch64/combine_bfi_1.c: Likewise.
>> * gcc.target/aarch64/fmul_fcvt_1.c: Likewise.
>> * gcc.target/aarch64/mult-synth_4.c: Likewise.
>> * gcc.target/aarch64/target_attr_3.c: Likewise.
> 



Re: Determine more IVs to be non-overflowing

2016-06-29 Thread Jan Hubicka
Hi,
after some experimentation I am re-testing the following version.  The wide_int
variant seemed ugly.  Here I simply cap nit to be in range of the type and then
the widest_int calculation ought to be safe.

What do you think? Perhaps I need to use the signs of the STEP and BASE types
just in case some casts were dropped?
Honza

/* Return true if the IV calculation in TYPE can overflow based on the knowledge
   of the upper bound on the number of iterations of LOOP, the BASE and STEP
   of IV.

   We do not use information whether TYPE can overflow so it is safe to
   use this test even for derived IVs not computed every iteration or
   hypothetical IVs to be inserted into code.  */

bool
iv_can_overflow_p (struct loop *loop, tree type, tree base, tree step)
{
  widest_int nit;
  wide_int base_min, base_max, step_min, step_max, type_min, type_max;
  signop sgn = TYPE_SIGN (type);

  if (step == 0)
    return false;

  if (TREE_CODE (base) == INTEGER_CST)
    base_min = base_max = base;
  else if (TREE_CODE (base) == SSA_NAME && !POINTER_TYPE_P (TREE_TYPE (base))
           && get_range_info (base, &base_min, &base_max) == VR_RANGE)
    ;
  else
    return true;

  if (TREE_CODE (step) == INTEGER_CST)
    step_min = step_max = step;
  else if (TREE_CODE (step) == SSA_NAME && !POINTER_TYPE_P (TREE_TYPE (step))
           && get_range_info (step, &step_min, &step_max) == VR_RANGE)
    ;
  else
    return true;

  if (!get_max_loop_iterations (loop, &nit))
    return false;

  type_min = wi::min_value (type);
  type_max = wi::max_value (type);
  /* Watch overflow.  */
  if ((widest_int)1 << TYPE_PRECISION (type) <= nit)
    return true;
  if ((widest_int::from (base_max, sgn)
       + widest_int::from (step_max, sgn) * (nit + 1))
       > widest_int::from (type_max, sgn)
      || (widest_int::from (type_min, sgn)
          > (widest_int::from (base_min, sgn)
             + widest_int::from (step_min, sgn) * (nit + 1))))
    return true;
  return false;
}
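
The range arithmetic in the function above can be tried outside the compiler. The following is only a sketch under stated assumptions: the type is fixed to int32_t, __int128 (a GCC/Clang extension) stands in for widest_int, and value_range and iv_can_overflow_sketch are hypothetical names that do not exist in GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for widest_int: wide enough that the products below cannot wrap.
typedef __int128 wide;

// Inclusive range of possible values, as a value-range query might report.
struct value_range
{
  int32_t min;
  int32_t max;
};

// Return true if base + step * i may leave the range of int32_t for some
// i in [0, nit + 1], where NIT bounds the number of loop iterations.
bool
iv_can_overflow_sketch (value_range base, value_range step, uint64_t nit)
{
  const wide type_min = std::numeric_limits<int32_t>::min ();
  const wide type_max = std::numeric_limits<int32_t>::max ();

  // Cap NIT to the range of the type, as the mail describes.
  if (nit >= (uint64_t (1) << 32))
    return true;

  const wide iters = wide (nit) + 1;
  const wide highest = wide (base.max) + wide (step.max) * iters;
  const wide lowest = wide (base.min) + wide (step.min) * iters;
  return highest > type_max || lowest < type_min;
}
```

For example, a base of 0 with step 1 and at most 100 iterations cannot overflow, while a base ten below INT32_MAX with the same step and bound can.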


Re: [PATCH/AARCH64] Add rtx_costs routine for vulcan.

2016-06-29 Thread Virendra Pathak
Hi James,

> Did you see those patches, and did you consider whether there would be a
> benefit to doing the same for Vulcan?
No. I have not studied those patches yet. Currently I am working on
adding vulcan scheduler as a next patch.
Kindly advise on the following:
Could this patch be merged now (assuming you are okay),
and I will update vulcan costs based on your patch later (after vulcan
scheduler)?

Thanks for your time.

with regards,
Virendra Pathak


On Wed, Jun 29, 2016 at 4:11 PM, James Greenhalgh
 wrote:
> On Thu, Jun 23, 2016 at 02:45:21PM +0530, Virendra Pathak wrote:
>> Hi gcc-patches group,
>>
>> Please find the patch for adding rtx_costs routine for vulcan cpu.
>>
>> Tested with compiling cross aarch64-linux-gcc , bootstrapped native
>> aarch64-unknown-linux-gnu
>> and make check (gcc). No new regression failure is added by this patch.
>>
>> Kindly review and merge the patch to trunk, if the patch is okay.
>> Thanks.
>
> This is OK, but I have the same question for you as I had for the
> qdf24xx tuning that Jim proposed, so I won't commit it yet...
>
>> gcc/ChangeLog:
>>
>> Virendra Pathak  
>>
>> * config/aarch64/aarch64-cores.def: Update vulcan COSTS.
>> * config/aarch64/aarch64-cost-tables.h
>> (vulcan_extra_costs): New variable.
>> * config/aarch64/aarch64.c
>> (vulcan_addrcost_table): Likewise.
>> (vulcan_regmove_cost): Likewise.
>> (vulcan_vector_cost): Likewise.
>> (vulcan_branch_cost): Likewise.
>> (vulcan_tunings): Likewise.
>
>
>>
>> +  {
>> +/* FP SFmode */
>> +{
>> +  COSTS_N_INSNS (16),/* Div.  */
>> +  COSTS_N_INSNS (6), /* Mult.  */
>> +  COSTS_N_INSNS (6), /* Mult_addsub. */
>> +  COSTS_N_INSNS (6), /* Fma.  */
>> +  COSTS_N_INSNS (6), /* Addsub.  */
>> +  COSTS_N_INSNS (5), /* Fpconst. */
>> +  COSTS_N_INSNS (5), /* Neg.  */
>> +  COSTS_N_INSNS (5), /* Compare.  */
>> +  COSTS_N_INSNS (7), /* Widen.  */
>> +  COSTS_N_INSNS (7), /* Narrow.  */
>> +  COSTS_N_INSNS (7), /* Toint.  */
>> +  COSTS_N_INSNS (7), /* Fromint.  */
>> +  COSTS_N_INSNS (7)  /* Roundint.  */
>> +},
>> +/* FP DFmode */
>> +{
>> +  COSTS_N_INSNS (23),/* Div.  */
>> +  COSTS_N_INSNS (6), /* Mult.  */
>> +  COSTS_N_INSNS (6), /* Mult_addsub.  */
>> +  COSTS_N_INSNS (6), /* Fma.  */
>> +  COSTS_N_INSNS (6), /* Addsub.  */
>> +  COSTS_N_INSNS (5), /* Fpconst.  */
>> +  COSTS_N_INSNS (5), /* Neg.  */
>> +  COSTS_N_INSNS (5), /* Compare.  */
>> +  COSTS_N_INSNS (7), /* Widen.  */
>> +  COSTS_N_INSNS (7), /* Narrow.  */
>> +  COSTS_N_INSNS (7), /* Toint.  */
>> +  COSTS_N_INSNS (7), /* Fromint.  */
>> +  COSTS_N_INSNS (7)  /* Roundint.  */
>> +}
>
> Recently ( https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00251.html ,
>  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01418.html ), I changed the
> Cortex-A57 and Cortex-A53 cost tables to make the cost of a floating-point
> operation relative to the cost of a floating-point move. This gave some
> code generation benefits, particularly around conditional execution and
> generation of constants.
>
> Did you see those patches, and did you consider whether there would be a
> benefit to doing the same for Vulcan?
>
> Thanks,
> James
>


Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)

2016-06-29 Thread Richard Biener
On Wed, Jun 29, 2016 at 12:29 PM, Martin Liška  wrote:
> On 06/29/2016 12:27 PM, Martin Liška wrote:
>> On 06/29/2016 11:12 AM, Richard Biener wrote:
>>> So what was your original reason to pursue this?
>>>
>>> Richard.
>>
>> Agree with you that handling the option during inlining is a bit overkill.
>> I can live with just marking the option as Optimize, which will fix reported 
>> PR.
>>
>> Sending simplified version 2.
>>
>> Ready for trunk?
>> Martin
>>
>
> Adding missing patch.

Ok.  Might want to backport the inline_call hunk to all affected branches.

Richard.

> M.


Re: [AArch64] ARMv8.2 command line and feature macros support

2016-06-29 Thread Jiong Wang

Richard Earnshaw (lists) writes:

> On 29/06/16 09:43, James Greenhalgh wrote:
>> On Mon, Jun 27, 2016 at 03:58:00PM +0100, Jiong Wang wrote:
>>>  
>>>  The permissible values for @var{arch} are @samp{armv8-a},
>>> -@samp{armv8.1-a} or @var{native}.
>>> +@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
>>> +
>>> +The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
>>> +support for the ARMv8.2 architecture extensions.
>> 
>> ARMv8.2-A here too please.
>
> Why do we need to say that v8.2-a implies v8.1-a, when we don't say that
> v8.1-a implies v8-a?  The whole implies clause seems unnecessary to me.
>
> R.

Hi Richard,

  The description entries are organized in descending order.
  
  "v8.1-a implies v8-a" is right here below.
|
v
>>>  The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
>>>  support for the ARMv8.1 architecture extension.  In particular, it
>>> @@ -13140,6 +13143,8 @@ instructions.  This is on by default for all 
>>> possible values for options
>>>  @item lse
>>>  Enable Large System Extension instructions.  This is on by default for
>>>  @option{-march=armv8.1-a}.
>>> +@item fp16
>>> +Enable FP16 extension.
>> 

-- 
Regards,
Jiong


[PATCH, www, obvious] Fix typo in htdocs/develop.html

2016-06-29 Thread Martin Liška
Hello.

This looks like a typo, so I'm going to install the patch.

Index: htdocs/develop.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v
retrieving revision 1.165
diff --unified -r1.165 develop.html
--- htdocs/develop.html 3 Jun 2016 09:56:49 -   1.165
+++ htdocs/develop.html 29 Jun 2016 11:20:37 -
@@ -572,7 +572,7 @@
   GCC 6 Stage 4 (starts 2016-01-20)GCC 5.3 release (2015-12-04)
|\
| v
-   |   GCC 5.4 release (2015-06-03)
+   |   GCC 5.4 release (2016-06-03)
+-- GCC 6 branch created +
| \
v  v


Thanks,
Martin


Re: [PATCH] Generate more effective one-operand permutation instruction for knl.

2016-06-29 Thread Kirill Yukhin
Hello,
On 29 Jun 13:12, Yuri Rumyantsev wrote:
> Hi All,
> 
> Here is a simple patch which generates one-operand vperm instructions
> introduced in knl.
> Using this patch we got +5% speed-up on one important benchmark.
> 
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?
> 
> ChangeLog:
> 2016-06-29  Yuri Rumyantsev  
> 
> * config/i386/i386.c (ix86_expand_vec_perm): Add handle one-operand
> permutation for TARGET_AVX512F.
> (ix86_expand_vec_one_operand_perm_avx512): New function.
> (expand_vec_perm_1): Invoke introduced function.
> * tree-vect-loop.c (vect_transform_loop): Clear-up safelen value since
> it may be not valid after vectorization.
Patch is OK for main trunk.
> 
> gcc/testsuite/ChangeLog
> * gcc/testsuite/gcc.target/i386/avx512f-vect-perm-1.c: New test.
> * gcc/testsuite/gcc.target/i386/avx512f-vect-perm-2.c: New test.

--
Thanks, K




[PATCH][C++] C++ bitfield memory model for as-base classes

2016-06-29 Thread Richard Biener

Currently as-base classes lack DECL_BIT_FIELD_REPRESENTATIVEs which
means RTL expansion doesn't honor the C++ memory model for bitfields
in them thus for the following testcase

struct B {
B() {}
int x;
int a : 6;
int b : 6;
int c : 6;
};

struct C : B {
char d;
};

C c;

int main()
{
  c.c = 1;
  c.d = 2;
}

on x86 we happily store to c.c in a way creating a store data race with
c.d:

main:
.LFB6:
        .cfi_startproc
        movl    c+4(%rip), %eax
        andl    $-258049, %eax
        orb     $16, %ah
        movl    %eax, c+4(%rip)
        movb    $2, c+7(%rip)
        xorl    %eax, %eax
        ret
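
The overlap is visible from the layout alone. As a rough sketch, the helpers below (offset_of_d and d_reuses_tail_padding are hypothetical names, not part of any patch) measure where C::d lands for the testcase above. Under the Itanium C++ ABI used on x86_64 GNU/Linux one would expect d at byte 7, inside the 8 bytes that sizeof (B) reports; other ABIs may place it differently.

```cpp
#include <cassert>
#include <cstddef>

struct B {
    B() {}
    int x;
    int a : 6;
    int b : 6;
    int c : 6;
};

struct C : B {
    char d;
};

// Byte offset of C::d within a C object, measured on a live object.
std::size_t
offset_of_d (const C &obj)
{
  const char *start = reinterpret_cast<const char *> (&obj);
  const char *d_addr = &obj.d;
  return static_cast<std::size_t> (d_addr - start);
}

// True when d sits inside the bytes that sizeof (B) counts as B's own,
// i.e. the derived class re-uses B's tail padding.
bool
d_reuses_tail_padding (const C &obj)
{
  return offset_of_d (obj) < sizeof (B);
}
```

When d_reuses_tail_padding holds, a 32-bit read-modify-write of the word holding c.c also covers c.d, which is the store data race shown in the assembly.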

Fixing the lack of DECL_BIT_FIELD_REPRESENTATIVEs in as-base
classes doesn't help though as the C++ FE builds access trees
for c.c using the non-as-base class FIELD_DECLs which is because
of layout_class_type doing

  /* Now that we're done with layout, give the base fields the real types.  */
  for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
if (DECL_ARTIFICIAL (field) && IS_FAKE_BASE_TYPE (TREE_TYPE (field)))
  TREE_TYPE (field) = TYPE_CONTEXT (TREE_TYPE (field));

this would basically require us to always treat tail-padding in a
struct conservatively in finish_bitfield_representative (according
to the doubt by the ??? comment I patch out below).

Simply commenting out the above makes fixing build_simple_base_path
necessary but even after that it then complains in the verifier
later ("type mismatch in component reference" - as-base to class
assignment).

But it still somehow ends up using the wrong FIELD_DECL in the end.

Now I think we need to fix the wrong-code issue somehow and
doing so in stor-layout.c by conservatively treating tail-padding
is a possibility.  But that will pessimize PODs and other languages
unless we have a way to know whether a RECORD_TYPE possibly can
have its tail-padding re-used (I'd hate to put a lang_hooks.name
check there and even that would pessimize C++ PODs).

Any guidance here?

I opened PR71694 for this.

Richard.

2016-06-29  Richard Biener  

* stor-layout.c (finish_bitfield_representative): Clarify comment.

cp/
* class.c (layout_class_type): Call finish_bitfield_layout on
the as-base variant.

Index: gcc/stor-layout.c
===
*** gcc/stor-layout.c   (revision 237840)
--- gcc/stor-layout.c   (working copy)
*** finish_bitfield_representative (tree rep
*** 1866,1878 
  }
else
  {
!   /* ???  If you consider that tail-padding of this struct might be
!  re-used when deriving from it we cannot really do the following
!and thus need to set maxsize to bitsize?  Also we cannot
!generally rely on maxsize to fold to an integer constant, so
!use bitsize as fallback for this case.  */
tree maxsize = size_diffop (TYPE_SIZE_UNIT (DECL_CONTEXT (field)),
  DECL_FIELD_OFFSET (repr));
if (tree_fits_uhwi_p (maxsize))
maxbitsize = (tree_to_uhwi (maxsize) * BITS_PER_UNIT
  - tree_to_uhwi (DECL_FIELD_BIT_OFFSET (repr)));
--- 1866,1878 
  }
else
  {
!   /* Note that if the C++ FE sets up tail-padding to be re-used it
!  creates a as-base variant of the type with TYPE_SIZE adjusted
!accordingly.  So it is safe to include tail-padding here.  */
tree maxsize = size_diffop (TYPE_SIZE_UNIT (DECL_CONTEXT (field)),
  DECL_FIELD_OFFSET (repr));
+   /* We cannot generally rely on maxsize to fold to an integer constant,
+so use bitsize as fallback for this case.  */
if (tree_fits_uhwi_p (maxsize))
maxbitsize = (tree_to_uhwi (maxsize) * BITS_PER_UNIT
  - tree_to_uhwi (DECL_FIELD_BIT_OFFSET (repr)));
Index: gcc/cp/class.c
===
*** gcc/cp/class.c  (revision 237840)
--- gcc/cp/class.c  (working copy)
*** layout_class_type (tree t, tree *virtual
*** 6556,6561 
--- 6556,6564 
  }
*next_field = NULL_TREE;
  
+   /* Finish bitfield layout of the type.  */
+   finish_bitfield_layout (base_t);
+ 
/* Record the base version of the type.  */
CLASSTYPE_AS_BASE (t) = base_t;
TYPE_CONTEXT (base_t) = t;


[PATCH 0/9] remove some manual memory management

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

This is just a bunch of adding constructors and destructors and switching to
use auto_vec more.
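
As a rough illustration of the pattern, not GCC code: in the sketch below std::vector stands in for auto_vec, and decl_collector and count_positive are hypothetical names. The constructor takes over the explicit create calls, and the members free their own storage when the struct goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a struct such as free_lang_data_d.
struct decl_collector
{
  // The constructor replaces the explicit create (100) calls.
  decl_collector () { decls.reserve (100); types.reserve (100); }

  std::vector<int> worklist;
  std::vector<int> decls;
  std::vector<int> types;
};

// Collect the positive entries of VALUES and return how many there were.
// No release calls are needed: DATA's members clean up on return.
std::size_t
count_positive (const int *values, std::size_t n)
{
  decl_collector data;
  for (std::size_t i = 0; i < n; ++i)
    if (values[i] > 0)
      data.decls.push_back (values[i]);
  return data.decls.size ();
}
```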

patches individually bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

Trevor Saunders (9):
  tree.c: add [cd]tors to free_lang_data_d
  c-decl.c: add [cd]tors to c_struct_parse_info
  genextract.c: add [cd]tors to accum_extract
  ipa.c: remove static_{ctors,dtors} globals
  cfgexpand.c: use auto_vec in stack_vars_data
  ree.c: use auto_vec in ext_state
  tree-ssa-sccvn.c: use auto_vec for sccvn_dom_walker::cond_stack
  use auto_vec for more local variables
  remove unnecessary calls to vec::release

 gcc/c/c-decl.c| 16 +---
 gcc/c/c-parser.c  | 22 ++
 gcc/cfgexpand.c   | 12 +++-
 gcc/genextract.c  | 23 +++
 gcc/genmatch.c| 12 
 gcc/haifa-sched.c | 15 ---
 gcc/ipa.c | 37 +
 gcc/predict.c |  4 +---
 gcc/ree.c | 19 ++-
 gcc/tree-data-ref.c   |  8 ++--
 gcc/tree-diagnostic.c |  4 +---
 gcc/tree-ssa-alias.c  | 21 +++--
 gcc/tree-ssa-loop-niter.c |  6 ++
 gcc/tree-ssa-sccvn.c  | 26 ++
 gcc/tree-stdarg.c |  3 +--
 gcc/tree-vect-stmts.c | 11 ++-
 gcc/tree.c| 33 -
 17 files changed, 86 insertions(+), 186 deletions(-)

-- 
2.7.4



[PATCH 1/9] tree.c: add [cd]tors to free_lang_data_d

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* tree.c (struct free_lang_data_d): Add constructor and change
types of members to ones that automatically manage resources.
(fld_worklist_push): Adjust.
(find_decls_types): Likewise.
(find_decls_types_in_eh_region): Likewise.
(free_lang_data_in_cgraph): Stop manually creating and
destroying members of free_lang_data_d.
---
 gcc/tree.c | 33 -
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/gcc/tree.c b/gcc/tree.c
index bc60190..617f326 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5500,17 +5500,19 @@ free_lang_data_in_decl (tree decl)
 
 struct free_lang_data_d
 {
+  free_lang_data_d () : decls (100), types (100) {}
+
   /* Worklist to avoid excessive recursion.  */
-  vec<tree> worklist;
+  auto_vec<tree> worklist;
 
   /* Set of traversed objects.  Used to avoid duplicate visits.  */
-  hash_set<tree> *pset;
+  hash_set<tree> pset;
 
   /* Array of symbols to process with free_lang_data_in_decl.  */
-  vec<tree> decls;
+  auto_vec<tree> decls;
 
   /* Array of types to process with free_lang_data_in_type.  */
-  vec<tree> types;
+  auto_vec<tree> types;
 };
 
 
@@ -5569,7 +5571,7 @@ add_tree_to_fld_list (tree t, struct free_lang_data_d 
*fld)
 static inline void
 fld_worklist_push (tree t, struct free_lang_data_d *fld)
 {
-  if (t && !is_lang_specific (t) && !fld->pset->contains (t))
+  if (t && !is_lang_specific (t) && !fld->pset.contains (t))
 fld->worklist.safe_push ((t));
 }
 
@@ -5738,8 +5740,8 @@ find_decls_types (tree t, struct free_lang_data_d *fld)
 {
   while (1)
 {
-  if (!fld->pset->contains (t))
-   walk_tree (&t, find_decls_types_r, fld, fld->pset);
+  if (!fld->pset.contains (t))
+   walk_tree (&t, find_decls_types_r, fld, &fld->pset);
   if (fld->worklist.is_empty ())
break;
   t = fld->worklist.pop ();
@@ -5793,7 +5795,7 @@ find_decls_types_in_eh_region (eh_region r, struct 
free_lang_data_d *fld)
for (c = r->u.eh_try.first_catch; c ; c = c->next_catch)
  {
c->type_list = get_eh_types_for_runtime (c->type_list);
-   walk_tree (&c->type_list, find_decls_types_r, fld, fld->pset);
+   walk_tree (&c->type_list, find_decls_types_r, fld, &fld->pset);
  }
   }
   break;
@@ -5801,12 +5803,12 @@ find_decls_types_in_eh_region (eh_region r, struct 
free_lang_data_d *fld)
 case ERT_ALLOWED_EXCEPTIONS:
   r->u.allowed.type_list
= get_eh_types_for_runtime (r->u.allowed.type_list);
-  walk_tree (&r->u.allowed.type_list, find_decls_types_r, fld, fld->pset);
+  walk_tree (&r->u.allowed.type_list, find_decls_types_r, fld, &fld->pset);
   break;
 
 case ERT_MUST_NOT_THROW:
   walk_tree (&r->u.must_not_throw.failure_decl,
-find_decls_types_r, fld, fld->pset);
+find_decls_types_r, fld, &fld->pset);
   break;
 }
 }
@@ -5948,12 +5950,6 @@ free_lang_data_in_cgraph (void)
   unsigned i;
   alias_pair *p;
 
-  /* Initialize sets and arrays to store referenced decls and types.  */
-  fld.pset = new hash_set<tree>;
-  fld.worklist.create (0);
-  fld.decls.create (100);
-  fld.types.create (100);
-
   /* Find decls and types in the body of every function in the callgraph.  */
   FOR_EACH_FUNCTION (n)
 find_decls_types_in_node (n, &fld);
@@ -5983,11 +5979,6 @@ free_lang_data_in_cgraph (void)
   FOR_EACH_VEC_ELT (fld.types, i, t)
verify_type (t);
 }
-
-  delete fld.pset;
-  fld.worklist.release ();
-  fld.decls.release ();
-  fld.types.release ();
 }
 
 
-- 
2.7.4



[PATCH 5/9] cfgexpand.c: use auto_vec in stack_vars_data

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* cfgexpand.c (struct stack_vars_data): Make type of fields
auto_vec.
(expand_used_vars): Adjust.
---
 gcc/cfgexpand.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e4ddb3a..9750586 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1030,10 +1030,10 @@ struct stack_vars_data
   /* Vector of offset pairs, always end of some padding followed
  by start of the padding that needs Address Sanitizer protection.
  The vector is in reversed, highest offset pairs come first.  */
-  vec asan_vec;
+  auto_vec asan_vec;
 
   /* Vector of partition representative decls in between the paddings.  */
-  vec asan_decl_vec;
+  auto_vec asan_decl_vec;
 
   /* Base pseudo register for Address Sanitizer protected automatic vars.  */
   rtx asan_base;
@@ -2179,8 +2179,6 @@ expand_used_vars (void)
 {
   struct stack_vars_data data;
 
-  data.asan_vec = vNULL;
-  data.asan_decl_vec = vNULL;
   data.asan_base = NULL_RTX;
   data.asan_alignb = 0;
 
@@ -2239,9 +2237,6 @@ expand_used_vars (void)
}
 
   expand_stack_vars (NULL, &data);
-
-  data.asan_vec.release ();
-  data.asan_decl_vec.release ();
 }
 
   fini_vars_expansion ();
-- 
2.7.4



[PATCH 2/9] c-decl.c: add [cd]tors to c_struct_parse_info

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/c/ChangeLog:

2016-06-29  Trevor Saunders  

* c-decl.c (struct c_struct_parse_info): Add constructor and
change member types from vec to auto_vec.
(start_struct): Adjust.
(finish_struct): Likewise.
---
 gcc/c/c-decl.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 8b966fe..c173796 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -574,15 +574,15 @@ struct c_struct_parse_info
 {
   /* If warn_cxx_compat, a list of types defined within this
  struct.  */
-  vec struct_types;
+  auto_vec struct_types;
   /* If warn_cxx_compat, a list of field names which have bindings,
  and which are defined in this struct, but which are not defined
  in any enclosing struct.  This is used to clear the in_struct
  field of the c_bindings structure.  */
-  vec fields;
+  auto_vec fields;
   /* If warn_cxx_compat, a list of typedef names used when defining
  fields in this struct.  */
-  vec typedefs_seen;
+  auto_vec typedefs_seen;
 };
 
 /* Information for the struct or union currently being parsed, or
@@ -7443,10 +7443,7 @@ start_struct (location_t loc, enum tree_code code, tree 
name,
 TYPE_PACKED (v) = flag_pack_struct;
 
   *enclosing_struct_parse_info = struct_parse_info;
-  struct_parse_info = XNEW (struct c_struct_parse_info);
-  struct_parse_info->struct_types.create (0);
-  struct_parse_info->fields.create (0);
-  struct_parse_info->typedefs_seen.create (0);
+  struct_parse_info = new c_struct_parse_info ();
 
   /* FIXME: This will issue a warning for a use of a type defined
  within a statement expr used within sizeof, et. al.  This is not
@@ -8088,10 +8085,7 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   if (warn_cxx_compat)
 warn_cxx_compat_finish_struct (fieldlist, TREE_CODE (t), loc);
 
-  struct_parse_info->struct_types.release ();
-  struct_parse_info->fields.release ();
-  struct_parse_info->typedefs_seen.release ();
-  XDELETE (struct_parse_info);
+  delete struct_parse_info;
 
   struct_parse_info = enclosing_struct_parse_info;
 
-- 
2.7.4



[PATCH 7/9] tree-ssa-sccvn.c: use auto_vec for sccvn_dom_walker::cond_stack

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* tree-ssa-sccvn.c (sccvn_dom_walker::~sccvn_dom_walker): remove.
(sccvn_dom_walker): make cond_stack an auto_vec.
---
 gcc/tree-ssa-sccvn.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 0cbd2cd..95306c5 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -,8 +,7 @@ class sccvn_dom_walker : public dom_walker
 {
 public:
   sccvn_dom_walker ()
-: dom_walker (CDI_DOMINATORS, true), fail (false), cond_stack (vNULL) {}
-  ~sccvn_dom_walker ();
+: dom_walker (CDI_DOMINATORS, true), fail (false), cond_stack (0) {}
 
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
@@ -4456,15 +4455,10 @@ public:
 enum tree_code code, tree lhs, tree rhs, bool value);
 
   bool fail;
-  vec > >
+  auto_vec > >
 cond_stack;
 };
 
-sccvn_dom_walker::~sccvn_dom_walker ()
-{
-  cond_stack.release ();
-}
-
 /* Record a temporary condition for the BB and its dominated blocks.  */
 
 void
-- 
2.7.4



[PATCH 3/9] genextract.c: add [cd]tors to accum_extract

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* genextract.c (struct accum_extract): Add constructor and make
members auto_vec.
(gen_insn): Adjust.
---
 gcc/genextract.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/gcc/genextract.c b/gcc/genextract.c
index d591781..e6d5337 100644
--- a/gcc/genextract.c
+++ b/gcc/genextract.c
@@ -69,10 +69,12 @@ static struct code_ptr *peepholes;
 
 struct accum_extract
 {
-  vec<locstr> oplocs;
-  vec<locstr> duplocs;
-  vec<int> dupnums;
-  vec<char> pathstr;
+  accum_extract () : oplocs (10), duplocs (10), dupnums (10), pathstr (20) {}
+
+  auto_vec<locstr> oplocs;
+  auto_vec<locstr> duplocs;
+  auto_vec<int> dupnums;
+  auto_vec<char> pathstr;
 };
 
 /* Forward declarations.  */
@@ -87,11 +89,6 @@ gen_insn (md_rtx_info *info)
   struct code_ptr *link;
   struct accum_extract acc;
 
-  acc.oplocs.create (10);
-  acc.duplocs.create (10);
-  acc.dupnums.create (10);
-  acc.pathstr.create (20);
-
   /* Walk the insn's pattern, remembering at all times the path
  down to the walking point.  */
 
@@ -142,7 +139,7 @@ gen_insn (md_rtx_info *info)
   /* This extraction is the same as ours.  Just link us in.  */
   link->next = p->insns;
   p->insns = link;
-  goto done;
+  return;
 }
 
   /* Otherwise, make a new extraction method.  We stash the arrays
@@ -166,12 +163,6 @@ gen_insn (md_rtx_info *info)
   memcpy (p->oplocs, acc.oplocs.address (), op_count * sizeof (locstr));
   memcpy (p->duplocs, acc.duplocs.address (), dup_count * sizeof (locstr));
   memcpy (p->dupnums, acc.dupnums.address (), dup_count * sizeof (int));
-
- done:
-  acc.oplocs.release ();
-  acc.duplocs.release ();
-  acc.dupnums.release ();
-  acc.pathstr.release ();
 }
 
 /* Helper subroutine of walk_rtx: given a vec, an index, and a
-- 
2.7.4
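
The replacement of "goto done" by a plain return works because the destructor runs on every exit path. A small sketch of that idea, with hypothetical names (scoped_buffer, live_buffers and process are illustrations, not taken from genextract.c):

```cpp
#include <cassert>

// Number of scoped_buffer objects currently alive.
static int live_buffers = 0;

// Owns a resource for the lifetime of the object.
struct scoped_buffer
{
  scoped_buffer () { ++live_buffers; }
  ~scoped_buffer () { --live_buffers; }
  scoped_buffer (const scoped_buffer &) = delete;
  scoped_buffer &operator= (const scoped_buffer &) = delete;
};

// Return true if an existing entry was found and reused.
bool
process (bool found_existing)
{
  scoped_buffer acc;
  if (found_existing)
    return true;  // Early exit: ACC is still destroyed here.
  // Otherwise fall through to the code that builds a new entry.
  return false;
}
```

After either call to process the counter is back at zero, so no cleanup label is needed.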



[PATCH 4/9] ipa.c: remove static_{ctors,dtors} globals

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* ipa.c (record_cdtor_fn): Adjust.
(build_cdtor_fns): Likewise.
(ipa_cdtor_merge): Make static_ctors and static_dtors local
variables.
---
 gcc/ipa.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/gcc/ipa.c b/gcc/ipa.c
index 6722d3b..2609e32 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -989,11 +989,6 @@ cgraph_build_static_cdtor (char which, tree body, int 
priority)
   cgraph_build_static_cdtor_1 (which, body, priority, false);
 }
 
-/* A vector of FUNCTION_DECLs declared as static constructors.  */
-static vec<tree> static_ctors;
-/* A vector of FUNCTION_DECLs declared as static destructors.  */
-static vec<tree> static_dtors;
-
 /* When target does not have ctors and dtors, we call all constructor
and destructor by special initialization/destruction function
recognized by collect2.
@@ -1002,12 +997,12 @@ static vec<tree> static_dtors;
destructors and turn them into normal functions.  */
 
 static void
-record_cdtor_fn (struct cgraph_node *node)
+record_cdtor_fn (struct cgraph_node *node, vec<tree> *ctors, vec<tree> *dtors)
 {
   if (DECL_STATIC_CONSTRUCTOR (node->decl))
-static_ctors.safe_push (node->decl);
+ctors->safe_push (node->decl);
   if (DECL_STATIC_DESTRUCTOR (node->decl))
-static_dtors.safe_push (node->decl);
+dtors->safe_push (node->decl);
   node = cgraph_node::get (node->decl);
   DECL_DISREGARD_INLINE_LIMITS (node->decl) = 1;
 }
@@ -1018,7 +1013,7 @@ record_cdtor_fn (struct cgraph_node *node)
they are destructors.  */
 
 static void
-build_cdtor (bool ctor_p, vec<tree> cdtors)
+build_cdtor (bool ctor_p, const vec<tree> &cdtors)
 {
   size_t i,j;
   size_t len = cdtors.length ();
@@ -1135,20 +1130,20 @@ compare_dtor (const void *p1, const void *p2)
functions have magic names which are detected by collect2.  */
 
 static void
-build_cdtor_fns (void)
+build_cdtor_fns (vec<tree> *ctors, vec<tree> *dtors)
 {
-  if (!static_ctors.is_empty ())
+  if (!ctors->is_empty ())
 {
   gcc_assert (!targetm.have_ctors_dtors || in_lto_p);
-  static_ctors.qsort (compare_ctor);
-  build_cdtor (/*ctor_p=*/true, static_ctors);
+  ctors->qsort (compare_ctor);
+  build_cdtor (/*ctor_p=*/true, *ctors);
 }
 
-  if (!static_dtors.is_empty ())
+  if (!dtors->is_empty ())
 {
   gcc_assert (!targetm.have_ctors_dtors || in_lto_p);
-  static_dtors.qsort (compare_dtor);
-  build_cdtor (/*ctor_p=*/false, static_dtors);
+  dtors->qsort (compare_dtor);
+  build_cdtor (/*ctor_p=*/false, *dtors);
 }
 }
 
@@ -1161,14 +1156,16 @@ build_cdtor_fns (void)
 static unsigned int
 ipa_cdtor_merge (void)
 {
+  /* A vector of FUNCTION_DECLs declared as static constructors.  */
+  auto_vec<tree> ctors;
+  /* A vector of FUNCTION_DECLs declared as static destructors.  */
+  auto_vec<tree> dtors;
   struct cgraph_node *node;
   FOR_EACH_DEFINED_FUNCTION (node)
 if (DECL_STATIC_CONSTRUCTOR (node->decl)
|| DECL_STATIC_DESTRUCTOR (node->decl))
-   record_cdtor_fn (node);
-  build_cdtor_fns ();
-  static_ctors.release ();
-  static_dtors.release ();
+   record_cdtor_fn (node, &ctors, &dtors);
+  build_cdtor_fns (&ctors, &dtors);
   return 0;
 }
 
-- 
2.7.4



[PATCH 8/9] use auto_vec for more local variables

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/c/ChangeLog:

2016-06-29  Trevor Saunders  

* c-parser.c (c_parser_generic_selection): Make type of variable
auto_vec.
(c_parser_omp_declare_simd): Likewise.

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* cfgexpand.c (expand_used_vars): Make the type of a local variable 
auto_vec.
* genmatch.c (lower_for): Likewise.
* haifa-sched.c (haifa_sched_init): Likewise.
(add_to_speculative_block): Likewise.
(create_check_block_twin): Likewise.
* predict.c (handle_missing_profiles): Likewise.
* tree-data-ref.c (loop_nest_has_data_refs): Likewise.
* tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): 
Likewise.
(maybe_lower_iteration_bound): Likewise.
* tree-ssa-sccvn.c (DFS): Likewise.
* tree-stdarg.c (reachable_at_most_once): Likewise.
* tree-vect-stmts.c (vectorizable_conversion): Likewise.
(vectorizable_store): Likewise.
---
 gcc/c/c-parser.c  | 22 ++
 gcc/cfgexpand.c   |  3 +--
 gcc/genmatch.c| 12 
 gcc/haifa-sched.c | 15 ---
 gcc/predict.c |  4 +---
 gcc/tree-data-ref.c   |  5 +
 gcc/tree-diagnostic.c |  4 +---
 gcc/tree-ssa-loop-niter.c |  6 ++
 gcc/tree-ssa-sccvn.c  | 16 
 gcc/tree-stdarg.c |  3 +--
 gcc/tree-vect-stmts.c |  8 ++--
 11 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7f491f1..18dc618 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7243,7 +7243,6 @@ struct c_generic_association
 static struct c_expr
 c_parser_generic_selection (c_parser *parser)
 {
-  vec associations = vNULL;
   struct c_expr selector, error_expr;
   tree selector_type;
   struct c_generic_association matched_assoc;
@@ -7300,6 +7299,7 @@ c_parser_generic_selection (c_parser *parser)
   return error_expr;
 }
 
+  auto_vec associations;
   while (1)
 {
   struct c_generic_association assoc, *iter;
@@ -7320,13 +7320,13 @@ c_parser_generic_selection (c_parser *parser)
  if (type_name == NULL)
{
  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
- goto error_exit;
+ return error_expr;
}
  assoc.type = groktypename (type_name, NULL, NULL);
  if (assoc.type == error_mark_node)
{
  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
- goto error_exit;
+ return error_expr;
}
 
  if (TREE_CODE (assoc.type) == FUNCTION_TYPE)
@@ -7345,14 +7345,14 @@ c_parser_generic_selection (c_parser *parser)
   if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
{
  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
- goto error_exit;
+ return error_expr;
}
 
   assoc.expression = c_parser_expr_no_commas (parser, NULL);
   if (assoc.expression.value == error_mark_node)
{
  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
- goto error_exit;
+ return error_expr;
}
 
   for (ix = 0; associations.iterate (ix, &iter); ++ix)
@@ -7408,8 +7408,6 @@ c_parser_generic_selection (c_parser *parser)
   c_parser_consume_token (parser);
 }
 
-  associations.release ();
-
   if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
 {
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
@@ -7425,10 +7423,6 @@ c_parser_generic_selection (c_parser *parser)
 }
 
   return matched_assoc.expression;
-
- error_exit:
-  associations.release ();
-  return error_expr;
 }
 
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
@@ -16362,14 +16356,13 @@ check_clauses:
 static void
 c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
 {
-  vec clauses = vNULL;
+  auto_vec clauses;
   while (c_parser_next_token_is_not (parser, CPP_PRAGMA_EOL))
 {
   c_token *token = c_parser_peek_token (parser);
   if (token->type == CPP_EOF)
{
  c_parser_skip_to_pragma_eol (parser);
- clauses.release ();
  return;
}
   clauses.safe_push (*token);
@@ -16391,7 +16384,6 @@ c_parser_omp_declare_simd (c_parser *parser, enum 
pragma_context context)
  "%<#pragma omp declare simd%> must be followed by "
  "function declaration or definition or another "
  "%<#pragma omp declare simd%>");
- clauses.release ();
  return;
}
   c_parser_consume_pragma (parser);
@@ -16401,7 +16393,6 @@ c_parser_omp_declare_simd (c_parser *parser, enum 
pragma_context context)
  if (token->type == CPP_EOF)
{
  c_parser_skip

[PATCH 9/9] remove unnecessary calls to vec::release

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

There's no point in calling release () on an auto_vec just before it goes
out of scope.

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* tree-data-ref.c (find_data_references_in_stmt): Remove
unnecessary call to vec::release.
(graphite_find_data_references_in_stmt): Likewise.
* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Likewise.
* tree-vect-stmts.c (vectorizable_condition): Likewise.
---
 gcc/tree-data-ref.c   |  3 +--
 gcc/tree-ssa-alias.c  | 21 +++--
 gcc/tree-vect-stmts.c |  3 ---
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 1a7a2ea..acdaa2f 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -3999,7 +3999,7 @@ find_data_references_in_stmt (struct loop *nest, gimple 
*stmt,
   gcc_assert (dr != NULL);
   datarefs->safe_push (dr);
 }
-  references.release ();
+
   return ret;
 }
 
@@ -4029,7 +4029,6 @@ graphite_find_data_references_in_stmt (loop_p nest, 
loop_p loop, gimple *stmt,
   datarefs->safe_push (dr);
 }
 
-  references.release ();
   return ret;
 }
 
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index b663ddf..0ffa8c2 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -865,7 +865,7 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
   if (TREE_CODE (ref1) == MEM_REF)
 {
   if (!integer_zerop (TREE_OPERAND (ref1, 1)))
-   goto may_overlap;
+   return false;
   ref1 = TREE_OPERAND (TREE_OPERAND (ref1, 0), 0);
 }
 
@@ -878,7 +878,7 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
   if (TREE_CODE (ref2) == MEM_REF)
 {
   if (!integer_zerop (TREE_OPERAND (ref2, 1)))
-   goto may_overlap;
+   return false;
   ref2 = TREE_OPERAND (TREE_OPERAND (ref2, 0), 0);
 }
 
@@ -898,7 +898,7 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
   do
{
  if (component_refs1.is_empty ())
-   goto may_overlap;
+   return false;
  ref1 = component_refs1.pop ();
}
   while (!RECORD_OR_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (ref1, 0))));
@@ -906,7 +906,7 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
   do
{
  if (component_refs2.is_empty ())
-goto may_overlap;
+return false;
  ref2 = component_refs2.pop ();
}
   while (!RECORD_OR_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (ref2, 0))));
@@ -914,7 +914,7 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
   /* Beware of BIT_FIELD_REF.  */
   if (TREE_CODE (ref1) != COMPONENT_REF
  || TREE_CODE (ref2) != COMPONENT_REF)
-   goto may_overlap;
+   return false;
 
   tree field1 = TREE_OPERAND (ref1, 1);
   tree field2 = TREE_OPERAND (ref2, 1);
@@ -927,21 +927,14 @@ nonoverlapping_component_refs_of_decl_p (tree ref1, tree 
ref2)
 
   /* We cannot disambiguate fields in a union or qualified union.  */
   if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
-goto may_overlap;
+return false;
 
   /* Different fields of the same record type cannot overlap.
 ??? Bitfields can overlap at RTL level so punt on them.  */
   if (field1 != field2)
-   {
- component_refs1.release ();
- component_refs2.release ();
- return !(DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2));
-   }
+   return !(DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2));
 }
 
-may_overlap:
-  component_refs1.release ();
-  component_refs2.release ();
   return false;
 }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3281860..0238504 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7614,9 +7614,6 @@ vectorizable_condition (gimple *stmt, 
gimple_stmt_iterator *gsi,
  if (!masked)
vec_oprnds1 = vec_defs.pop ();
  vec_oprnds0 = vec_defs.pop ();
-
-  ops.release ();
-  vec_defs.release ();
 }
   else
 {
-- 
2.7.4
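
The point made in the commit message above (an auto_vec frees its storage
when it goes out of scope, so a trailing release () and the goto-to-cleanup
labels are redundant) can be illustrated with a small stand-alone sketch.
This is not GCC's vec.h: raw_vec, scoped_vec, live_buffers and all_positive
are hypothetical names invented for the illustration, and the sketch only
assumes that the destructor calls release (), as the message implies.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for vec/auto_vec; NOT the real vec.h API.
static int live_buffers = 0;  // number of heap buffers currently allocated

struct raw_vec {
  int *data = nullptr;
  unsigned len = 0, cap = 0;

  // Append V, growing the buffer as needed.
  void safe_push(int v) {
    if (len == cap) {
      unsigned new_cap = cap ? cap * 2 : 4;
      int *p = static_cast<int *>(std::realloc(data, new_cap * sizeof(int)));
      if (!p)
        std::abort();
      if (!data)
        ++live_buffers;  // first allocation for this vector
      data = p;
      cap = new_cap;
    }
    data[len++] = v;
  }

  // Free the buffer; safe to call more than once.
  void release() {
    if (data) {
      std::free(data);
      --live_buffers;
    }
    data = nullptr;
    len = cap = 0;
  }
};

// Scoped variant: the destructor performs the release.
struct scoped_vec : raw_vec {
  scoped_vec() = default;
  scoped_vec(const scoped_vec &) = delete;
  scoped_vec &operator=(const scoped_vec &) = delete;
  ~scoped_vec() { release(); }
};

// Early returns need neither a cleanup label nor an explicit release ().
bool all_positive(const int *in, unsigned n) {
  scoped_vec seen;
  for (unsigned i = 0; i < n; i++) {
    if (in[i] <= 0)
      return false;  // destructor frees the buffer on this path
    seen.safe_push(in[i]);
  }
  return true;       // and on this one
}
```

With a plain raw_vec in place of scoped_vec, each return in all_positive
would need its own release () call, which is the pattern these patches
remove.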



[PATCH 6/9] ree.c: use auto_vec in ext_state

2016-06-29 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-06-29  Trevor Saunders  

* ree.c (struct ext_state): Make type of members auto_vec.
(find_and_remove_re): Adjust.
---
 gcc/ree.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/gcc/ree.c b/gcc/ree.c
index 4627b4f..3245ac5 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -544,10 +544,10 @@ struct ext_state
   /* In order to avoid constant alloc/free, we keep these
  4 vectors live through the entire find_and_remove_re and just
  truncate them each time.  */
-  vec defs_list;
-  vec copies_list;
-  vec modified_list;
-  vec work_list;
+  auto_vec defs_list;
+  auto_vec copies_list;
+  auto_vec modified_list;
+  auto_vec work_list;
 
   /* For instructions that have been successfully modified, this is
  the original mode from which the insn is extending and
@@ -1147,7 +1147,6 @@ find_and_remove_re (void)
   vec reinsn_list;
   auto_vec reinsn_del_list;
   auto_vec reinsn_copy_list;
-  ext_state state;
 
   /* Construct DU chain to get all reaching definitions of each
  extension instruction.  */
@@ -1159,10 +1158,8 @@ find_and_remove_re (void)
 
   max_insn_uid = get_max_uid ();
   reinsn_list = find_removable_extensions ();
-  state.defs_list.create (0);
-  state.copies_list.create (0);
-  state.modified_list.create (0);
-  state.work_list.create (0);
+
+  ext_state state;
   if (reinsn_list.is_empty ())
 state.modified = NULL;
   else
@@ -1238,10 +1235,6 @@ find_and_remove_re (void)
 delete_insn (curr_insn);
 
   reinsn_list.release ();
-  state.defs_list.release ();
-  state.copies_list.release ();
-  state.modified_list.release ();
-  state.work_list.release ();
   XDELETEVEC (state.modified);
 
   if (dump_file && num_re_opportunities > 0)
-- 
2.7.4



Re: [PATCH][AArch64] Increase code alignment

2016-06-29 Thread James Greenhalgh
On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote:
> 
> ping
> 
> 
> From: Wilco Dijkstra
> Sent: 03 June 2016 11:51
> To: GCC Patches
> Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; 
> jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes
> Subject: [PATCH][AArch64] Increase code alignment
>     
> Increase loop alignment on Cortex cores to 8 and set function alignment to
> 16.  This makes things consistent across big.LITTLE cores, improves
> performance of benchmarks with tight loops and reduces performance variations
> due to small changes in code layout. It looks like almost all AArch64 cores agree
> on alignment of 16 for function, and 8 for loops and branches, so we should
> change -mcpu=generic as well if there is no disagreement - feedback welcome.
> 
> OK for commit?

Hi Wilco,

Sorry for the delay.

This patch is OK for trunk.

I hope we can continue the discussion as to whether there is a set of
values for -mcpu=generic that better suits the range of cores we now
support.

After Wilco's patch, and using the values in the proposed vulcan and
qdf24xx structures, and with the comments from this thread, the state of
the tuning structures will be:

-mcpu=: function-jump-loop alignment

cortex-a35: 16-8-8
cortex-a53: 16-8-8
cortex-a57: 16-8-8
cortex-a72: 16-8-8
cortex-a73: 16-8-4  (Though I'm guessing this is just an ordering issue on
                    when Kyrill copied the cost table. Kyrill/Wilco do you
                    want to spin an update for the Cortex-A73 alignment?
                    Consider it preapproved if you do)
exynos-m1 : 4-4-4   (2: 16-4-16, 3: 8-4-4)
thunderx  : 8-8-8   (But 16-8-8 acceptable/maybe better)
xgene1    : 16-8-16
qdf24xx   : 16-8-16
vulcan    : 16-8-16

Generic is currently set to 8-8-4, which doesn't look representative of
these individual cores at all.

Running down that list, I see a very compelling case for 16 for function
alignment (from comments in this thread, it is second choice, but not too
bad for exynos-m1, thunderx is maybe better at 16, but should not be
worse). I also see a close to unanimous case for 8 for jump alignment,
though I note that in Evandro's experiments 8 never came out as "best"
for exynos-m1. Did you get a feel in your experiments for what the
performance penalty of aligning jumps to 8 would be? That generic is
currently set to 8 seems like a good tie-breaker if the performance impact
for exynos-m1 would be minimal, as it gives no change from the current
behaviour.

For loop alignment I see a split straight down the middle between 8
and 16 (exynos-m1/cortex-a73 are the outliers at 4, but second place for
exynos-m1 was 16-4-16, and cortex-a73 might just be an artifact of when the
table was copied).

From that, and as a starting point for discussion, I'd suggest that
16-8-8 or 16-8-16 are most representative from the core support that has
been added to GCC.

Wilco, the Cortex-A cores tip the balance in favour of 16-8-8. As you've
most recently looked at the cost of setting loop alignment to 16, do you
have any comments on why you chose to standardize on 16-8-8 rather than
16-8-16, or what the cost would be of 16-8-16?

I've widened the CC list, and I'd appreciate any comments on what we all
think will be the right alignment parameters for -mcpu=generic.

Thanks,
James

> 
> ChangeLog:
> 
> 2016-05-03  Wilco Dijkstra  
> 
>     * gcc/config/aarch64/aarch64.c (cortexa53_tunings):
>     Increase loop alignment to 8.  Set function alignment to 16.
>     (cortexa35_tunings): Likewise.
>     (cortexa57_tunings): Increase loop alignment to 8.
>     (cortexa72_tunings): Likewise.
> 
 



[Ada] translate System.Address as void pointer for convention C

2016-06-29 Thread Eric Botcazou
The natural idiom is to import/export objects whose type is void pointer in C 
with the System.Address type in Ada, for example the malloc & memcpy routines.
However System.Address is (almost) always a scalar type in Ada so there is a 
discrepancy between the C side (pointer) and the Ada side (integer) which is 
problematic in at least 2 cases:
  1. Calling conventions that treat pointer and integer types differently, for 
example the SVR4 ABI on m68k or the x32 ABI on x86-64,
  2. LTO mode, which attempts to unify the representation across languages.

Geert (and probably others) suggested some time ago that System.Address be 
translated as void pointer for convention C in order to address this issue.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-29  Eric Botcazou  

PR ada/48835
PR ada/61954
* gcc-interface/gigi.h (enum standard_datatypes): Add ADT_realloc_decl
(realloc_decl): New macro.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Use local
variable for the entity type and translate it as void pointer if the
entity has convention C.
(gnat_to_gnu_entity) : If this is not a definition and the
external name matches that of malloc_decl or realloc_decl, return the
corresponding node directly.
(gnat_to_gnu_subprog_type): Likewise for parameter and return types.
* gcc-interface/trans.c (gigi): Initialize void_list_node here, not...
Initialize realloc_decl.
* gcc-interface/utils.c (install_builtin_elementary_types): ...here.
(build_void_list_node): Delete.
* gcc-interface/utils2.c (known_alignment) : Return the
alignment of the system allocator for malloc_decl and realloc_decl.
Do not take alignment from void pointer types either.

-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237789)
+++ gcc-interface/decl.c	(working copy)
@@ -603,6 +603,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 case E_Out_Parameter:
 case E_Variable:
   {
+	const Entity_Id gnat_type = Etype (gnat_entity);
 	/* Always create a variable for volatile objects and variables seen
 	   constant but with a Linker_Section pragma.  */
 	bool const_flag
@@ -643,14 +644,20 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  }
 
 	/* Get the type after elaborating the renamed object.  */
-	gnu_type = gnat_to_gnu_type (Etype (gnat_entity));
+	if (Convention (gnat_entity) == Convention_C
+	&& Is_Descendant_Of_Address (gnat_type))
+	  gnu_type = ptr_type_node;
+	else
+	  {
+	gnu_type = gnat_to_gnu_type (gnat_type);
 
-	/* If this is a standard exception definition, then use the standard
-	   exception type.  This is necessary to make sure that imported and
-	   exported views of exceptions are properly merged in LTO mode.  */
-	if (TREE_CODE (TYPE_NAME (gnu_type)) == TYPE_DECL
-	&& DECL_NAME (TYPE_NAME (gnu_type)) == exception_data_name_id)
-	  gnu_type = except_type_node;
+	/* If this is a standard exception definition, use the standard
+	   exception type.  This is necessary to make sure that imported
+	   and exported views of exceptions are merged in LTO mode.  */
+	if (TREE_CODE (TYPE_NAME (gnu_type)) == TYPE_DECL
+		&& DECL_NAME (TYPE_NAME (gnu_type)) == exception_data_name_id)
+	  gnu_type = except_type_node;
+	  }
 
 	/* For a debug renaming declaration, build a debug-only entity.  */
 	if (Present (Debug_Renaming_Link (gnat_entity)))
@@ -812,7 +819,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	 || (TYPE_SIZE (gnu_type)
 		 && integer_zerop (TYPE_SIZE (gnu_type))
 		 && !TREE_OVERFLOW (TYPE_SIZE (gnu_type))))
-	&& !Is_Constr_Subt_For_UN_Aliased (Etype (gnat_entity))
+	&& !Is_Constr_Subt_For_UN_Aliased (gnat_type)
 	&& No (Renamed_Object (gnat_entity))
 	&& No (Address_Clause (gnat_entity)))
 	  gnu_size = bitsize_unit_node;
@@ -828,8 +835,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		|| (!Optimize_Alignment_Space (gnat_entity)
 		&& kind != E_Exception
 		&& kind != E_Out_Parameter
-		&& Is_Composite_Type (Etype (gnat_entity))
-		&& !Is_Constr_Subt_For_UN_Aliased (Etype (gnat_entity))
+		&& Is_Composite_Type (gnat_type)
+		&& !Is_Constr_Subt_For_UN_Aliased (gnat_type)
 		&& !Is_Exported (gnat_entity)
 		&& !imported_p
 		&& No (Renamed_Object (gnat_entity))
@@ -895,12 +902,11 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* If this is an aliased object with an unconstrained array nominal
 	   subtype, make a type that includes the template.  We will either
 	   allocate or create a variable of that type, see below.  */
-	if (Is_Constr_Subt_For_UN_Aliased (Etype (gnat_entity))
-	&& Is_Array_Type (Underlying_Type (Etype (gnat_entity)))
+	if (Is_Constr_Subt_For_UN_Aliased (gnat_type)
+	&& Is_Array_Type (Underlying_Type (gnat_type))
 	&& !type_annotate_only)
 	  {
-	tree gnu_

Re: [PATCH] tree-ssa-strlen improvements (PR tree-optimization/71625)

2016-06-29 Thread Christophe Lyon
On 28 June 2016 at 16:23, Jakub Jelinek  wrote:
> Hi!
>
> This is just a first small step towards this PR.
> It brings the ADDR_EXPR of DECL_P bases roughly on the same level as
> SSA_NAMEs pointers - so get_stridx_plus_constant works for them, and
> more importantly, before this patch there was a very severe bug in
> addr_stridxptr (missing list = list->next; in the loop), which meant that
> inside of a single decl we were tracking at most one string length, rather
> than the former 16 (or now 32) limit.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> As a follow-up, I'd like to introduce string length at least N, where we
> don't know the exact string length, but we know there are no '\0' chars
> within N bytes.
>
> 2016-06-28  Jakub Jelinek  
>
> PR tree-optimization/71625
> * tree-ssa-strlen.c (get_addr_stridx): Add PTR argument.  Assume list
> is sorted by ascending list->offset.  If PTR is non-NULL and there is
> previous strinfo, call get_stridx_plus_constant.
> (get_stridx): Pass exp as second argument to get_addr_stridx.
> (addr_stridxptr): Add missing list = list->next, so that there can be
> more than one entry in the list.  Bump limit from 16 to 32.  Ensure
> the list is sorted by ascending list->offset.
> (get_stridx_plus_constant): Adjust so that it can be also called with
> ADDR_EXPR instead of SSA_NAME as PTR.
> (handle_char_store): Pass NULL_TREE as second argument to
> get_addr_stridx.
>
> * gcc.dg/strlenopt-28.c: New test.
>
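
For readers following the addr_stridxptr part of the quoted description (the
missing list = list->next, the 16 to 32 limit, and the list kept sorted by
ascending offset), here is a rough stand-alone sketch of that kind of lookup.
It is not the tree-ssa-strlen.c code: offset_entry, lookup_slot, list_length,
free_list and MAX_ENTRIES are hypothetical names, and the structure is only
an assumption based on the ChangeLog text.

```cpp
// Hypothetical node type: one tracked string start inside a single decl.
struct offset_entry {
  long offset;         // byte offset into the decl
  int idx;             // string index slot, 0 means "not assigned yet"
  offset_entry *next;  // next entry, kept in ascending offset order
};

const int MAX_ENTRIES = 32;  // mirrors the new limit named in the ChangeLog

// Return the idx slot for OFF, inserting a node in sorted position if needed.
// Returns nullptr when OFF is new and MAX_ENTRIES offsets are already tracked.
int *lookup_slot(offset_entry **head, long off) {
  int count = 0;
  offset_entry **link = head;
  // Walk past smaller offsets.  Advancing to the next node is the step the
  // patch description says the original loop was missing.
  while (*link && (*link)->offset < off) {
    link = &(*link)->next;
    ++count;
  }
  if (*link && (*link)->offset == off)
    return &(*link)->idx;
  // Count the remaining nodes so the limit applies to the whole list.
  for (offset_entry *e = *link; e; e = e->next)
    ++count;
  if (count >= MAX_ENTRIES)
    return nullptr;
  offset_entry *node = new offset_entry{off, 0, *link};
  *link = node;
  return &node->idx;
}

// Number of entries currently in the list.
int list_length(const offset_entry *head) {
  int n = 0;
  for (; head; head = head->next)
    ++n;
  return n;
}

// Free every node of the list.
void free_list(offset_entry *head) {
  while (head) {
    offset_entry *next = head->next;
    delete head;
    head = next;
  }
}
```

If the walk never advanced past the head, only the first offset could ever
be matched, which fits the "at most one string length per decl" behaviour
described in the quoted text.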

Hi,

This causes GCC build failures on arm-linux* targets, while building glibc:

/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/bin/arm-none-linux-gnueabi-gcc
ns_print.c -c -std=gnu99 -fgnu89-inline  -O2 -Wall -Winline -Wundef
-Wwrite-strings -fmerge-
all-constants -frounding-math -g -Wno-strict-prototypes
-Wno-write-strings -Wstrict-prototypes   -fPIC   -fstack-protector
-I../include -I/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gc
c-fsf-gccsrc/obj-arm-none-linux-gnueabi/glibc-1/resolv
-I/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/glibc-1
 -I../sysdeps/unix/sysv/linux/arm  -
I../nptl/sysdeps/unix/sysv/linux  -I../nptl/sysdeps/pthread
-I../sysdeps/pthread  -I../sysdeps/unix/sysv/linux  -I../sysdeps/gnu
-I../sysdeps/unix/inet  -I../nptl/sysdeps/unix/sysv  -
I../sysdeps/unix/sysv  -I../sysdeps/unix/arm  -I../nptl/sysdeps/unix
-I../sysdeps/unix  -I../sysdeps/posix
-I../sysdeps/arm/armv7/multiarch  -I../sysdeps/arm/armv7
-I../sysdeps/arm/a
rmv6t2  -I../sysdeps/arm/armv6  -I../sysdeps/arm/nptl
-I../sysdeps/arm/include -I../sysdeps/arm  -I../sysdeps/arm/soft-fp
-I../sysdeps/wordsize-32  -I../sysdeps/ieee754/flt-32  -I../s
ysdeps/ieee754/dbl-64  -I../sysdeps/ieee754  -I../sysdeps/generic
-I../nptl  -I.. -I../libio -I. -nostdinc -isystem
/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gc
c/arm-none-linux-gnueabi/7.0.0/include -isystem
/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/lib/gcc/arm-none-linux-gnueabi/7.0.0/include-fixed
-isystem /tmp/1440207_1.
tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabi/usr/include
 -D_LIBC_REENTRANT -include ../include/libc-symbols.h  -DPIC -DSHARED
-DNOT_IN_libc=1 -DIS_IN_libreso
lv=1 -DIN_LIB=libresolv-Dgethostbyname=res_gethostbyname
-Dgethostbyname2=res_gethostbyname2 -Dgethostbyaddr=res_gethostbyaddr
-Dgetnetbyname=res_getnetbyname -Dgetnetbyaddr=res_get
netbyaddr -o 
/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/glibc-1/resolv/ns_print.os
-MD -MP -MF /tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-g
ccsrc/obj-arm-none-linux-gnueabi/glibc-1/resolv/ns_print.os.dt -MT
/tmp/1440207_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/glibc-1/resolv/ns_print.os

ns_print.c: In function 'ns_sprintrrf':
ns_print.c:94:1: internal compiler error: in get_stridx_plus_constant,
at tree-ssa-strlen.c:680
 ns_sprintrrf(const u_char *msg, size_t msglen,
 ^~~~
0xcf1387 get_stridx_plus_constant

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:680
0xcf15ba get_addr_stridx

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:197
0xcf1776 get_stridx

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:250
0xcf6448 handle_pointer_plus

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2028
0xcf6448 strlen_optimize_stmt

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2297
0xcf6fd9 strlen_dom_walker::before_dom_children(basic_block_def*)

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2456
0x115c27c dom_walker::walk(basic_block_def*)

/tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:265
0xcef80a execute

Re: [PATCH] tree-ssa-strlen improvements (PR tree-optimization/71625)

2016-06-29 Thread Jakub Jelinek
On Wed, Jun 29, 2016 at 03:11:07PM +0200, Christophe Lyon wrote:
> This causes GCC build failures on arm-linux* targets, while building glibc:
> 
> ns_print.c: In function 'ns_sprintrrf':
> ns_print.c:94:1: internal compiler error: in get_stridx_plus_constant,
> at tree-ssa-strlen.c:680
>  ns_sprintrrf(const u_char *msg, size_t msglen,
>  ^~~~
> 0xcf1387 get_stridx_plus_constant
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:680
> 0xcf15ba get_addr_stridx
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:197
> 0xcf1776 get_stridx
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:250
> 0xcf6448 handle_pointer_plus
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2028
> 0xcf6448 strlen_optimize_stmt
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2297
> 0xcf6fd9 strlen_dom_walker::before_dom_children(basic_block_def*)
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2456
> 0x115c27c dom_walker::walk(basic_block_def*)
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:265
> 0xcef80a execute
> 
> /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2528
> 
> 
> I do not have a preprocessed file yet, I'll prepare one if you need.

That would be greatly appreciated (plus command line options).
Please file a PR and I'll handle it.  Thanks.

Jakub


Re: [PATCH 2/9] c-decl.c: add [cd]tors to c_struct_parse_info

2016-06-29 Thread David Malcolm
On Wed, 2016-06-29 at 08:26 -0400, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> gcc/c/ChangeLog:

I may be missing my coffee here but you mention adding a constructor in
the ChangeLog here:

> 2016-06-29  Trevor Saunders  
> 
>   * c-decl.c (struct c_struct_parse_info): Add constructor and
 ^^^

>   change member types from vec to auto_vec.
>   (start_struct): Adjust.
>   (finish_struct): Likewise.
> ---
>  gcc/c/c-decl.c | 16 +---
>  1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 8b966fe..c173796 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -574,15 +574,15 @@ struct c_struct_parse_info
>  {
>/* If warn_cxx_compat, a list of types defined within this
>   struct.  */
> -  vec struct_types;
> +  auto_vec struct_types;
>/* If warn_cxx_compat, a list of field names which have bindings,
>   and which are defined in this struct, but which are not defined
>   in any enclosing struct.  This is used to clear the in_struct
>   field of the c_bindings structure.  */
> -  vec fields;
> +  auto_vec fields;
>/* If warn_cxx_compat, a list of typedef names used when defining
>   fields in this struct.  */
> -  vec typedefs_seen;
> +  auto_vec typedefs_seen;

...I don't see an explicit constructor here.

>  };
>  
>  /* Information for the struct or union currently being parsed, or
> @@ -7443,10 +7443,7 @@ start_struct (location_t loc, enum tree_code
> code, tree name,
>  TYPE_PACKED (v) = flag_pack_struct;
>  
>*enclosing_struct_parse_info = struct_parse_info;
> -  struct_parse_info = XNEW (struct c_struct_parse_info);
> -  struct_parse_info->struct_types.create (0);
> -  struct_parse_info->fields.create (0);
> -  struct_parse_info->typedefs_seen.create (0);
> +  struct_parse_info = new c_struct_parse_info ();
>  
>/* FIXME: This will issue a warning for a use of a type defined
>   within a statement expr used within sizeof, et. al.  This is
> not
> @@ -8088,10 +8085,7 @@ finish_struct (location_t loc, tree t, tree
> fieldlist, tree attributes,
>if (warn_cxx_compat)
>  warn_cxx_compat_finish_struct (fieldlist, TREE_CODE (t), loc);
>  
> -  struct_parse_info->struct_types.release ();
> -  struct_parse_info->fields.release ();
> -  struct_parse_info->typedefs_seen.release ();
> -  XDELETE (struct_parse_info);
> +  delete struct_parse_info;
>  
>struct_parse_info = enclosing_struct_parse_info;
>  


[PATCH] Testcase for PR15256

2016-06-29 Thread Richard Biener

Noticed we have this fixed since GCC 6.

Committed.

Richard.

2016-06-29  Richard Biener  

PR middle-end/15256
* gcc.dg/tree-ssa/forwprop-34.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-34.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/forwprop-34.c (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-34.c (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-cddce1" } */
+ 
+ unsigned int
+ foo (unsigned int eax)
+ {
+   unsigned int edx = eax & 1;
+   edx ^= 1;
+   eax &= -2;
+   eax |= edx;
+   return eax;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times " = " 1 "cddce1" } } */
+ /* { dg-final { scan-tree-dump " = eax_\[0-9\]+\\(D\\) \\^ 1;" "cddce1" } } */
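
The two dg-final directives above expect the whole body of foo to collapse
to one assignment of the form eax ^ 1.  A quick stand-alone check of that
identity, written outside the testsuite; the function names below are made
up for the illustration and the sample values are arbitrary.

```cpp
// The operation sequence from the testcase, written out step by step.
unsigned int flip_low_bit_long(unsigned int eax) {
  unsigned int edx = eax & 1;  // isolate bit 0
  edx ^= 1;                    // invert it
  eax &= -2;                   // clear bit 0 (-2 converts to all ones but bit 0)
  eax |= edx;                  // merge the inverted bit back in
  return eax;
}

// The single statement the dump is expected to contain.
unsigned int flip_low_bit_short(unsigned int eax) { return eax ^ 1; }

// Compare both forms on a handful of inputs, including the extremes.
bool forms_agree() {
  const unsigned int samples[] = {0u, 1u, 2u, 3u, 0x7fffffffu,
                                  0x80000000u, 0xfffffffeu, 0xffffffffu};
  for (unsigned int v : samples)
    if (flip_low_bit_long(v) != flip_low_bit_short(v))
      return false;
  return true;
}
```

Only bit 0 is touched in either form: the upper bits pass through the mask
unchanged and bit 0 is replaced by its complement, which is what XOR with 1
does.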


Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Thomas Schwinge
Hi!

Cesar, I have not yet fully digested this, but do I understand right that
you're really fixing two issues here, that are related (OpenACC routines)
but still can be addressed independently of each other?  Do I understand
right that the first one, the "problems with acc routines [...]
incorrectly permitting 'acc seq' loops to call gang, worker and vector
routines" is just a Fortran front end patch?  If yes, please split that
one out, so as to reduce the volume of changes that remain to
be discussed.

On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis  
wrote:
> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> > On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> >> The second set of changes involves teaching the gimplifier to error when
> >> it detects a function call to a non-acc routine inside an OpenACC
> >> offloaded region.

As I understand, that's the same problem as has been discussed before
(Ilya CCed), and has recently again been filed in
 "ICE in LTO1 when attempting NVPTX
offloading (-fopenacc)", and  "ICE in LTO1
with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
threads include:
,
,
.

> >> Actually, I relaxed non-acc routines by excluding
> >> calls to builtin functions, including those prefixed with _gfortran_.
> >> Nvptx does have a newlib c library, and it also has a subset of
> >> libgfortran. Still, this solution is probably not optimal.
> > 
> > I don't really like that, hardcoding prefixes or whatever is available
> > (you have quite some subset of libc, libm etc. available too) in the
> > compiler looks very hackish.  What is wrong with complaining during
> > linking of the offloaded code?

ACK.  Jakub, do I understand you correctly, that you basically say that
every function declaration that is in scope inside offloaded regions (for
example, GCC builtin functions, or standard library functions declared in
target compiler's header files) is permitted to be called in offloaded
regions, and the offloading compiler will then either be able to resolve
these (nvptx back end knows about trigonometric functions, for example,
and a lot of functions are available in the nvptx libc), or otherwise
error out during the offloading compilation (during linking), gracefully
without terminating the target compilation (that "gracefully" bit is
currently missing -- that's for another day)?  That is, all such
functions are implicitly callable as OpenACC "seq" functions (which means
that they don't internally use gang/worker/vector parallelism).  In
particular, all these functions do *not* need to be marked with an
explicit "#pragma acc routine seq" directive.  (Functions internally
using gang/worker/vector parallelism will need to be marked
appropriately, using a "#pragma acc routine gang/worker/vector"
directive.)  That's how I understand your comment above, and your earlier
comments on this topic, and also is what I think should be done.

> Wouldn't the error get reported multiple times then, i.e. once per
> target? Then again, maybe this error could have been restrained to the
> host compiler.

That's not something I would care about right now.  :-)

> Anyway, this patch now reduces that error to a warning. Furthermore,
> that warning is being thrown in lower_omp_1 instead of
> gimplify_call_expr because the latter is called multiple times and that
> causes duplicate warnings. The only bit of fallout I had with this
> change was with the fortran FE's usage of BUILT_IN_EXPECT in
> gfc_{un}likely. Since these are generated implicitly by the FE, I just
> added an oacc_function attribute to those calls when flag_openacc is set.
> 
> >> Next, I had to modify the openacc header files in libgomp to mark
> >> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> >> build the openacc.mod module for gfortran with -fopenacc. But doing
> >> that caused gcc to stream offloaded code to the openacc.o object
> >> file. So, I've updated the behavior of flag_generate_offload such that
> >> minus one indicates that the user specified -foffload=disable, and that
> >> will prevent gcc from streaming offloaded lto code. The alternative was
> >> to hack libtool to build libgomp with -foffload=disable.
> > 
> > This also looks wrong.  I'd say the right thing is when loading modules
> > that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> > handled this well) into CU with the corresponding flags unset (-fopenacc,
> > -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> > IMHO the module loading code should just ignore it, pretend it wasn't there.
> > Similarly e.g. to 

Re: [PATCH 2/9] c-decl.c: add [cd]tors to c_struct_parse_info

2016-06-29 Thread Trevor Saunders
On Wed, Jun 29, 2016 at 09:21:01AM -0400, David Malcolm wrote:
> On Wed, 2016-06-29 at 08:26 -0400, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > gcc/c/ChangeLog:
> 
> I may be missing my coffee here but you mention adding a constructor in
> the ChangeLog here:

No, I think I was missing mine, and you are right.  I'll update the
ChangeLog locally.

Trev



Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Jakub Jelinek
On Wed, Jun 29, 2016 at 04:11:31PM +0200, Thomas Schwinge wrote:
> > >> Actually, I relaxed non-acc routines by excluding
> > >> calls to builtin functions, including those prefixed with _gfortran_.
> > >> Nvptx does have a newlib c library, and it also has a subset of
> > >> libgfortran. Still, this solution is probably not optimal.
> > > 
> > > I don't really like that, hardcoding prefixes or whatever is available
> > > (you have quite some subset of libc, libm etc. available too) in the
> > > compiler looks very hackish.  What is wrong with complaining during
> > > linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

Yes.  Well, OpenMP doesn't have different kinds of target functions, just
one.  And at least the current spec doesn't require that target regions or
declare target functions only call functions declared target, I guess mainly
because that would require that all the C/C++ headers are OpenMP aware and
declare everything that has the offloading counterpart.
For user code, of course users have to declare their routines, otherwise it
just can't be offloaded, and the implementation runtime is a very fuzzy
thing outside of the standard.

Jakub
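
As an illustration of the marking rules discussed in this thread, here is a
minimal C sketch.  The function names (clamp_abs, clamp_row, clamp_matrix) are
made up for the example; only the "#pragma acc routine" directives and the idea
that a library function such as abs needs no marking come from the discussion.
It assumes that a compiler built without OpenACC support (or run without
-fopenacc) simply ignores the pragmas and executes the code sequentially.

```c
#include <assert.h>
#include <stdlib.h>   /* abs: a library function declared in a system header */

/* User routine with no internal parallelism.  User code has to declare
   its own routines, so it is marked "seq" explicitly.  */
#pragma acc routine seq
int clamp_abs (int x, int limit)
{
  int v = abs (x);   /* library call, not marked with any acc directive */
  return v > limit ? limit : v;
}

/* User routine that uses vector parallelism internally, so it carries
   the matching level of parallelism on its routine directive.  */
#pragma acc routine vector
void clamp_row (int *row, int n, int limit)
{
#pragma acc loop vector
  for (int i = 0; i < n; i++)
    row[i] = clamp_abs (row[i], limit);
}

/* Offloaded region: a gang loop calling the vector routine.  */
void clamp_matrix (int *m, int rows, int cols, int limit)
{
  assert (rows >= 0 && cols >= 0);   /* host-side sanity check */
#pragma acc parallel loop gang copy (m[0:rows * cols])
  for (int r = 0; r < rows; r++)
    clamp_row (m + r * cols, cols, limit);
}
```

Calling clamp_row from a loop marked "seq" instead of "gang" is the kind of
mismatch that the loop-parallelism check in this patch series is meant to
diagnose.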


[C++ RFC/Patch] PR c++/71665

2016-06-29 Thread Paolo Carlini

Hi,

we have this pretty old regression where we ICE on the invalid snippet:

class A
{
  int f ();
  enum { a = f };
};

in fact we hit the gcc_checking_assert in cxx_eval_constant_expression

    case COMPONENT_REF:
      if (is_overloaded_fn (t))
        {
          /* We can only get here in checking mode via
             build_non_dependent_expr,  because any expression that
             calls or takes the address of the function will have
             pulled a FUNCTION_DECL out of the COMPONENT_REF.  */
          gcc_checking_assert (ctx->quiet || errorcount);
          *non_constant_p = true;
          return t;
        }

which goes back to my fix for c++/58647 (that's why Jakub CC-ed me). 
Three years later some archeology is required, this is the original thread:


https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03059.html

Obviously the comment above isn't (fully) correct (anymore), we don't 
get there via build_non_dependent_expr, the backtrace is simple:


#0  cxx_eval_constant_expression (ctx=0x7fffd170, t=0x768559c0, 
lval=false, non_constant_p=0x7fffd1a7, overflow_p=0x7fffd1a6, 
jump_target=0x0) at ../../trunk/gcc/cp/constexpr.c:3918
#1  0x00929dd1 in cxx_eval_outermost_constant_expr 
(t=0x768559c0, allow_non_constant=false, strict=true, object=0x0) at 
../../trunk/gcc/cp/constexpr.c:4211
#2  0x0092a46f in cxx_constant_value (t=0x768559c0, 
decl=0x0) at ../../trunk/gcc/cp/constexpr.c:4309
#3  0x006aebf2 in build_enumerator (name=0x769ce000, 
value=0x768559c0, enumtype=0x769bd690, attributes=0x0, 
loc=250592) at ../../trunk/gcc/cp/decl.c:13588
#4  0x007c9b35 in cp_parser_enumerator_definition 
(parser=0x77fbeab0, type=0x769bd690) at 
../../trunk/gcc/cp/parser.c:17431


thus from the parser straight through build_enumerator and then 
cxx_constant_value.


Now, I suspect a fix for this issue is simple, eg, does it really make 
sense to keep the gcc_checking_assert in the face of this kind of broken 
but quite straightforward snippet? Of course ctx->quiet is false in this 
case and errorcount is still 0 but when we return from 
cxx_constant_value with such a COMPONENT_REF, build_enumerator 
immediately does:


  if (TREE_CODE (value) != INTEGER_CST
      || ! INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (value)))
    {
      error ("enumerator value for %qD is not an integer constant",
             name);
      value = NULL_TREE;
    }

and we are Ok diagnostic-wise (if we want we could say something more 
specific in build_enumerator, eg, clang says "call to non-static member 
function without an object argument", edg does not).


What do you think?

Thanks,
Paolo.


Re: Minor tweak to df_note_bb_compute

2016-06-29 Thread Eric Botcazou
> This simply prevents valgrind from complaining about an invalid read when
> pass_free_cfg::execute calls df_analyze for targets with delay slots,
> because var_location instructions are present in the RTL stream at this
> point.
> 
> Tested on x86_64-suse-linux, applied on the mainline as obvious.
> 
> 
> 2016-06-09  Eric Botcazou  
> 
>   * df-problems.c (df_note_bb_compute): Guard use of DF_INSN_INFO_GET.

I have backported it onto the 6 branch because we got segfaults for a cross-
compiler hosted on Windows triggered by the invalid read.

-- 
Eric Botcazou


Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Cesar Philippidis
On 06/29/2016 07:11 AM, Thomas Schwinge wrote:

> Cesar, I have not yet fully digested this, but do I understand right that
> you're really fixing two issues here, that are related (OpenACC routines)
> but still can be addressed independently of each other?  Do I understand
> right that the first one, the "problems with acc routines [...]
> incorrectly permitting 'acc seq' loops to call gang, worker and vector
> routines" is just a Fortran front end patch?  If yes, please split that
> one out, so as to reduce the volume of remaining changes that remain to
> be discussed.

This patch addresses the following issues:

 1. Issues warnings when a non-acc routine function is called inside an
    OpenACC offloaded region.

 2. It corrects a bug that was allowing seq loops to call gang, worker
    and vector routines.

 3. It adds support for acc routines in fortran modules (which I
    noticed was missing when I added 'acc routine seq' to acc_on_device
    in the fortran openacc include files).

I'll split these into separate patches.

> On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis 
>  wrote:
>> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>>>> The second set of changes involves teaching the gimplifier to error when
>>>> it detects a function call to a non-acc routine inside an OpenACC
>>>> offloaded region.
> 
> As I understand, that's the same problem as has been discussed before
> (Ilya CCed), and has recently again been filed in
>  "ICE in LTO1 when attempting NVPTX
> offloading (-fopenacc)", and  "ICE in LTO1
> with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
> threads include:
> ,
> ,
> .
> 
>>>> Actually, I relaxed non-acc routines by excluding
>>>> calls to builtin functions, including those prefixed with _gfortran_.
>>>> Nvptx does have a newlib c library, and it also has a subset of
>>>> libgfortran. Still, this solution is probably not optimal.
>>>
>>> I don't really like that, hardcoding prefixes or whatever is available
>>> (you have quite some subset of libc, libm etc. available too) in the
>>> compiler looks very hackish.  What is wrong with complaining during
>>> linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

OK. I'll drop the warning changes from my patch set then unless you want
to keep it.

> A few random comments on the patch:
> 
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -303,6 +303,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
>> +  OACC_FUNCTION_SEQ,
>> +  OACC_FUNCTION_GANG,
>> +  OACC_FUNCTION_WORKER,
>> +  OACC_FUNCTION_VECTOR
>> +};
> 
> What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
> as I can tell?

It's used by the fortran module code. It controls how parallelism gets
encoded in the .mod files.

>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>>  
>>  /* Determine the loop level for a routine.   */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_SEQ;
>>  
>>if (clauses)
>>  {
>>unsigned mask = 0;

Re: Fix for PR70909 in Libiberty Demangler (4)

2016-06-29 Thread Pedro Alves
On 06/29/2016 08:43 AM, Marcel Böhme wrote:
> Hi Jason,
> 
> These test cases are generated by fuzzing which produces a lot of nonsensical 
> input data. 
> I think, "Garbage In, Garbage Out" is quite applicable here.
> With the patch at least it doesn’t crash and fixes the vulnerability.

Note that demangling shows up high in gdb profiles when loading
huge programs.  If we can avoid quadratic or worse complexity,
it'd be preferred.

Thanks,
Pedro Alves



Re: [PATCH PR70729] The second part of patch.

2016-06-29 Thread Thomas Schwinge
Hi!

On Tue, 28 Jun 2016 18:50:37 +0300, Yuri Rumyantsev  wrote:
> Here is the second part of patch to improve loop invariant code motion
> for loop marked with pragma omp simd.
> 
> Bootstrapping and regression testing did not show any new failures.

That got committed in r237844.  In my testing, that causes (or exposes?)
the following regressions (both with offloading enabled and disabled):

 PASS: libgomp.fortran/examples-4/simd-2.f90   -O1  (test for excess errors)
 PASS: libgomp.fortran/examples-4/simd-2.f90   -O1  execution test
 PASS: libgomp.fortran/examples-4/simd-2.f90   -O2  (test for excess errors)
-PASS: libgomp.fortran/examples-4/simd-2.f90   -O2  execution test
+FAIL: libgomp.fortran/examples-4/simd-2.f90   -O2  execution test
 PASS: libgomp.fortran/examples-4/simd-2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
-PASS: libgomp.fortran/examples-4/simd-2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
+FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
 PASS: libgomp.fortran/examples-4/simd-2.f90   -O3 -g  (test for excess errors)
-PASS: libgomp.fortran/examples-4/simd-2.f90   -O3 -g  execution test
+FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -g  execution test
 PASS: libgomp.fortran/examples-4/simd-2.f90   -Os  (test for excess errors)
 PASS: libgomp.fortran/examples-4/simd-2.f90   -Os  execution test
 PASS: libgomp.fortran/examples-4/simd-3.f90   -O0  (test for excess errors)

These are segmentation faults, for example:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fb65eeb52ef in ???
#1  0x400845 in __simd2_mod_MOD_add2
    at [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:18
#2  0x400845 in __simd2_mod_MOD_work
    at [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:29
#3  0x4008d4 in MAIN__
    at [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:68
#4  0x4005dc in main
    at [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:58
Segmentation fault (core dumped)

> PR tree-optimization/70729
> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider memory reference as
> independent in loops having positive safelen value.
> * tree-vect-loop.c (vect_transform_loop): Clear-up safelen value since
> it may be not valid after vectorization.


Grüße
 Thomas
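
For readers less familiar with the clause named in the ChangeLog, here is a
minimal C sketch of what "safelen" promises.  It is not the failing test
(simd-2.f90 is Fortran) and does not reproduce the segfault; the function name
fill_strided and the distance of 8 are assumptions chosen for illustration.
Without -fopenmp or -fopenmp-simd the pragma is ignored and the loop runs
sequentially.

```c
#include <assert.h>

#define DIST 8

/* Fill a[DIST..n-1] so that each element depends on the one DIST slots
   earlier.  safelen(8) states that iterations executed concurrently are
   fewer than 8 apart, so this distance-8 dependence is still honoured.  */
void fill_strided (int *a, int n)
{
  assert (n >= DIST);
  for (int i = 0; i < DIST; i++)
    a[i] = 0;

#pragma omp simd safelen(8)
  for (int i = DIST; i < n; i++)
    a[i] = a[i - DIST] + 1;
}
```

The loop still carries a real dependence between a[i] and a[i - 8]; a positive
safelen only bounds how far apart concurrently executed iterations may be.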




Re: [PATCH], Pr 71667, Fix PowerPC ISA 3.0 DImode Altivec load/store

2016-06-29 Thread Alan Hayward
Hi,

This patch references the wrong bug. It should reference 71677 and not
71667
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71677

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71667


Thanks,
Alan.



As we discussed in the patch review, there were some issues with using %Y for
the ISA 3.0 instructions LXSD and STXSD.

I have rewritten the patch so that we have a new memory constraint (%wY) that
explicitly targets those instructions.  I went back to the mov{DF,DD} patterns
and changed their use of %o to %wY.

I removed the test that was generated from the XalanNamespacesStack.cpp source
that showed up the problem.

I have bootstrapped this on a little endian power8 system and there were no
regressions in the test suite.  I also built Spec 2006 for power9 with this
compiler, and the xalancbmk benchmark now builds.  I will kick off a big endian
build on a power7 system.

Assuming there are no regressions in power7, are these patches ok to install in
the trunk, and backport to GCC 6.2 after a burn-in period?

2016-06-28  Michael Meissner  

PR target/71667
* config/rs6000/constraints.md (wY constraint): New constraint to
match the requirements for the LXSD and STXSD instructions.
* config/rs6000/predicates.md (offsettable_mem_14bit_operand): New
predicate to match the requirements for the LXSD and STXSD
instructions.
* config/rs6000/rs6000.md (mov_hardfloat32, FMOVE64 case):
Use constraint wY for LXSD/STXSD instructions instead of 'o' or 'Y'
to make sure that the bottom 2 bits of offset are 0, the address
form is offsettable, and no updating is done in the address mode.
(mov_hardfloat64, FMOVE64 case): Likewise.
(movdi_internal32): Likewise.
(movdi_internal64): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797






Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target

2016-06-29 Thread Thomas Preudhomme
Ping?

Best regards,

Thomas

On Thursday 02 June 2016 14:34:03 Thomas Preudhomme wrote:
> Ping?
> 
> On Thursday 26 May 2016 14:00:55 Thomas Preudhomme wrote:
> > [Sorry for the large recipient list, I wasn't sure who of C++ and x86
> > maintainers should approve this]
> > 
> > Hi,
> > 
> > 29_atomics/atomic/65913.cc test in libstdc++ is a runtime test that only
> > relies on atomic and gnu++11 support. Therefore I propose to require
> > atomic-builtins instead of an x86 (32 or 64 bits) target.
> > 
> > ChangeLog entry is as follows:
> > 
> > 2016-05-19  Thomas Preud'homme  
> > 
> > * testsuite/29_atomics/atomic/65913.cc: Require atomic-builtins
> > 
> > rather than specific target.
> > 
> > 
> > Patch is in attachment.
> > 
> > 
> > Is this ok for trunk?
> > 
> > Best regards,
> > 
> > Thomas



Re: [PATCH][AArch64] Increase code alignment

2016-06-29 Thread Evandro Menezes

On 06/29/16 07:59, James Greenhalgh wrote:

On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote:

ping


From: Wilco Dijkstra
Sent: 03 June 2016 11:51
To: GCC Patches
Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; 
jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes
Subject: [PATCH][AArch64] Increase code alignment
 
Increase loop alignment on Cortex cores to 8 and set function alignment to
16.  This makes things consistent across big.LITTLE cores, improves
performance of benchmarks with tight loops and reduces performance variations
due to small changes in code layout. It looks almost all AArch64 cores agree
on alignment of 16 for function, and 8 for loops and branches, so we should
change -mcpu=generic as well if there is no disagreement - feedback welcome.

OK for commit?

Hi Wilco,

Sorry for the delay.

This patch is OK for trunk.

I hope we can continue the discussion as to whether there is a set of
values for -mcpu=generic that better suits the range of cores we now
support.

After Wilco's patch, and using the values in the proposed vulcan and
qdf24xx structures, and with the comments from this thread, the state of
the tuning structures will be:

-mcpu=: function-jump-loop alignment

cortex-a35: 16-8-8
cortex-a53: 16-8-8
cortex-a57: 16-8-8
cortex-a72: 16-8-8
cortex-a73: 16-8-4 (Though I'm guessing this is just an ordering issue on
 when Kyrill copied the cost table. Kyrill/Wilco do you
 want to spin an update for the Cortex-A73 alignment?
 Consider it preapproved if you do)
exynos-m1 : 4-4-4  (2: 16-4-16, 3: 8-4-4)
thunderx  : 8-8-8  (But 16-8-8 acceptable/maybe better)
xgene1    : 16-8-16
qdf24xx   : 16-8-16
vulcan    : 16-8-16

Generic is currently set to 8-8-4, which doesn't look representative of
these individual cores at all.

Running down that list, I see a very compelling case for 16 for function
alignment (from comments in this thread, it is second choice, but not too
bad for exynos-m1, thunderx is maybe better at 16, but should not be
worse). I also see a close to unanimous case for 8 for jump alignment,
though I note that in Evandro's experiments 8 never came out as "best"
for exynos-m1. Did you get a feel in your experiments for what the
performance penalty of aligning jumps to 8 would be? That generic is
currently set to 8 seems like a good tie-breaker if the performance impact
for exynos-m1 would be minimal, as it gives no change from the current
behaviour.

For loop alignment I see a split straight down the middle between 8
and 16 (exynos-m1/cortex-a73 are the outliers at 4, but second place for
exynos-m1 was 16-4-16, and cortex-a73 might just be an artifact of when the
table was copied).

From that, and as a starting point for discussion, I'd suggest that
16-8-8 or 16-8-16 are most representative from the core support that has
been added to GCC.

Wilco, the Cortex-A cores tip the balance in favour of 16-8-8. As you've
most recently looked at the cost of setting loop alignment to 16, do you
have any comments on why you chose to standardize on 16-8-8 rather than
16-8-16, or what the cost would be of 16-8-16?

I've widened the CC list, and I'd appreciate any comments on what we all
think will be the right alignment parameters for -mcpu=generic.


This is a very good summary.  However, I think that it should also 
consider the effects on code size.


Evidently, an alignment of 16 has the greatest probability of increasing
code size, whereas an alignment of 4 doesn't increase it.  Likewise,
what is aligned also matters, since each kind of target occurs with a
different frequency.  Arguably, the function alignment has the least
effect on code size, followed by jumps and then loops, based on typical
frequencies.  In specific cases, the weights may move somewhat, of course.


As I stated in my previous reply, I also tracked the code size when
experimenting with different alignments.  It was clear from my data that
the jump alignment was the most critical to code size (~3% on average),
followed by loop alignment (<1%) and function alignment (<1%).
Therefore, I'd argue against setting the alignment for jumps to 16.  The
case can be made for an alignment of 8 for them, where the cost to code
size is more modest.


From a performance perspective, generic alignments at 16-8-8 would
rank 4th on Exynos M1, with a negligible code size penalty (<1%)
over the current 8-4-4.


HTH

--
Evandro Menezes



Re: [PATCH, rs6000] Enable some existing __float128 tests for powerpc64*

2016-06-29 Thread Joseph Myers
On Tue, 28 Jun 2016, Bill Schmidt wrote:

> -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */
> +/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* powerpc64*-*-* } } 
> */
>  /* { dg-options "-pedantic" } */
> +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } */

Rather than duplicating powerpc64 references everywhere, wouldn't it be 
better to add an effective-target keyword __float128, meaning that 
__float128 is available?  Along with { dg-add-options float128 }.

Also: powerpc-* targets with -m64 should always be handled in tests 
identically to powerpc64 targets, while powerpc64 targets with -m32 should 
presumably not run these tests.  That is, testing for powerpc64*-*-* is 
actually (always, for any test, not just these ones) incorrect in both 
directions (just as it's always incorrect for a test to list just one of 
i?86-*-* and x86_64-*-*, rather than listing both and then adding any 
restrictions required to 32-bit or 64-bit).

> -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */
> +/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* powerpc64*-*-* } } */
>  /* { dg-options "" } */
>  /* { dg-require-effective-target fenv_exceptions } */
> +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } */

Also: for execution tests, if you add extra options, you also need to 
restrict the test to running when the hardware actually supports the 
required features.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Patch, Fortran, 71623, v1][5/6/7 Regression] Segfault when allocating deferred-length characters to size of a pointer

2016-06-29 Thread Andre Vehreschild
Hi all,

the attached patch fixes the regression at least for trunk (I haven't
checked the others, but for 6 it should work as well, 5 may need a little
bit more work). The issue here is that computing the typespec involved
code in se.pre that was not merged to the parent block.

Bootstrapped and regtested on x86_64-linux-gnu/F23. Ok for trunk?

I promise to do a backport to gcc-6 and -5 next week, once this patch
has been living in trunk a few days.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2016-06-28  Andre Vehreschild  

PR fortran/71623
* trans-stmt.c (gfc_trans_allocate): Add code of pre block of typespec
in allocate to parent block.

gcc/testsuite/ChangeLog:

2016-06-28  Andre Vehreschild  

PR fortran/71623
* gfortran.dg/deferred_character_17.f90: New test.


diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 84bf749..5aa7778 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -5696,9 +5696,11 @@ gfc_trans_allocate (gfc_code * code)
 	  tmp = gfc_get_char_type (code->ext.alloc.ts.kind);
 	  tmp = TYPE_SIZE_UNIT (tmp);
 	  tmp = fold_convert (TREE_TYPE (se_sz.expr), tmp);
+	  gfc_add_block_to_block (&block, &se_sz.pre);
 	  expr3_esize = fold_build2_loc (input_location, MULT_EXPR,
 	 TREE_TYPE (se_sz.expr),
 	 tmp, se_sz.expr);
+	  expr3_esize = gfc_evaluate_now (expr3_esize, &block);
 	}
 }
 
@@ -5897,6 +5899,7 @@ gfc_trans_allocate (gfc_code * code)
 		 source= or mold= expression.  */
 	  gfc_init_se (&se_sz, NULL);
 	  gfc_conv_expr (&se_sz, code->ext.alloc.ts.u.cl->length);
+	  gfc_add_block_to_block (&block, &se_sz.pre);
 	  gfc_add_modify (&block, al_len,
 			  fold_convert (TREE_TYPE (al_len),
 	se_sz.expr));
@@ -5981,11 +5984,19 @@ gfc_trans_allocate (gfc_code * code)
 		 specified by a type spec for deferred length character
 		 arrays or unlimited polymorphic objects without a
 		 source= or mold= expression.  */
-	  gfc_init_se (&se_sz, NULL);
-	  gfc_conv_expr (&se_sz, code->ext.alloc.ts.u.cl->length);
-	  gfc_add_modify (&block, al_len,
-			  fold_convert (TREE_TYPE (al_len),
-	se_sz.expr));
+	  if (expr3_esize == NULL_TREE || code->ext.alloc.ts.kind != 1)
+		{
+		  gfc_init_se (&se_sz, NULL);
+		  gfc_conv_expr (&se_sz, code->ext.alloc.ts.u.cl->length);
+		  gfc_add_block_to_block (&block, &se_sz.pre);
+		  gfc_add_modify (&block, al_len,
+  fold_convert (TREE_TYPE (al_len),
+		se_sz.expr));
+		}
+	  else
+		gfc_add_modify (&block, al_len,
+fold_convert (TREE_TYPE (al_len),
+	  expr3_esize));
 	}
 	  else
 	/* No length information needed, because type to allocate
diff --git a/gcc/testsuite/gfortran.dg/deferred_character_17.f90 b/gcc/testsuite/gfortran.dg/deferred_character_17.f90
new file mode 100644
index 000..5a9d725
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/deferred_character_17.f90
@@ -0,0 +1,13 @@
+!{ dg-do run }
+
+! Check fix for PR fortran/71623
+
+program allocatemvce
+  implicit none
+  character(len=:), allocatable :: string
+  integer, dimension(4), target :: array = [1,2,3,4]
+  integer, dimension(:), pointer :: array_ptr
+  array_ptr => array
+  ! The allocate used to segfault
+  allocate(character(len=size(array_ptr))::string)
+end program allocatemvce


[AArch64] - feedback on the approach followed for vulcan scheduler

2016-06-29 Thread Virendra Pathak
Hi gcc-patches group,

I am working on adding vulcan.md (machine description) for vulcan cpu
in the aarch64 port. However, before proposing the final patch, I would like
the basic approach to be reviewed by you all (as changes are there in
aarch64.md)

In vulcan, a (load/store) instruction could be scheduled to cpu units in
different ways based on the addressing mode (e.g. load, or load+integer).
So the requirement is to identify the addressing mode of the (load/store)
instruction's operand while scheduling.

For this purpose, a new attribute "addr_type" has been added in the
aarch64.md file. This helps in identifying which operands of (load/store)
instruction should be considered for finding the addressing mode.

vulcan.md, while scheduling, calls a new function aarch64_mem_type_p
in the aarch64.c (via match_test) to decide the scheduling option based
on the addressing mode.

I have copied the code snippet below (complete patch is attached with
this mail).

Kindly review and give your feedback/comment.
Also if you think there could be a better alternative way, kindly suggest.

Thanks in advance for your time.



FILE - gcc/config/aarch64/aarch64-protos.h

/* Mask bits to use for aarch64_mem_type_p.  Unshifted/shifted index
   register variants are separated for scheduling purposes because the
   distinction matters on some cores.  */
#define AARCH64_ADDR_REG_IMM   0x01
#define AARCH64_ADDR_REG_WB    0x02
#define AARCH64_ADDR_REG_REG   0x04
#define AARCH64_ADDR_REG_SHIFT 0x08
#define AARCH64_ADDR_REG_EXT   0x10
#define AARCH64_ADDR_REG_SHIFT_EXT 0x20
#define AARCH64_ADDR_LO_SUM    0x40
#define AARCH64_ADDR_SYMBOLIC  0x80
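
To show how such a mask is meant to be used by the match_test calls in
vulcan.md further down in this message, here is a self-contained sketch.  It
is a simplified model, not the aarch64.c implementation: the enum addr_kind
and the helpers kind_to_mask_bit and addr_kind_matches are hypothetical
stand-ins for the result of address classification, and only five of the
eight mask bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask bits, copied from the aarch64-protos.h snippet.  */
#define AARCH64_ADDR_REG_IMM   0x01
#define AARCH64_ADDR_REG_WB    0x02
#define AARCH64_ADDR_REG_REG   0x04
#define AARCH64_ADDR_LO_SUM    0x40
#define AARCH64_ADDR_SYMBOLIC  0x80

/* Hypothetical classification result for one memory address.  */
enum addr_kind
{
  KIND_REG_IMM,    /* base register plus immediate offset */
  KIND_REG_WB,     /* base register with writeback */
  KIND_REG_REG,    /* base register plus index register */
  KIND_LO_SUM,     /* low part of a symbol address */
  KIND_SYMBOLIC    /* PC-relative symbolic address */
};

/* Map a classified address kind to its single mask bit.  */
static unsigned kind_to_mask_bit (enum addr_kind kind)
{
  switch (kind)
    {
    case KIND_REG_IMM:  return AARCH64_ADDR_REG_IMM;
    case KIND_REG_WB:   return AARCH64_ADDR_REG_WB;
    case KIND_REG_REG:  return AARCH64_ADDR_REG_REG;
    case KIND_LO_SUM:   return AARCH64_ADDR_LO_SUM;
    case KIND_SYMBOLIC: return AARCH64_ADDR_SYMBOLIC;
    }
  assert (0);   /* not reached for valid enum values */
  return 0;
}

/* True if KIND is one of the addressing modes selected by MASK.  */
bool addr_kind_matches (enum addr_kind kind, unsigned mask)
{
  return (kind_to_mask_bit (kind) & mask) != 0;
}
```

With this model, a reservation that passes SYMBOLIC | REG_IMM | LO_SUM accepts
a plain register-plus-offset load but rejects a writeback load, which would be
picked up by a separate reservation that passes REG_WB.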




FILE - gcc/config/aarch64/aarch64.md

(define_attr "addr_type" "none,op0,op1,op0addr,op1addr,lo_sum,wb"
  (const_string "none"))


(define_insn "*mov_aarch64"
  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w")
        (match_operand:SHORT 1 "general_operand"  " r,M,D,m, m,rZ,*w,*w, r,*w"))]
  "(register_operand (operands[0], mode)
|| aarch64_reg_or_zero (operands[1], mode))"

{
   switch (which_alternative)
 {
 case 0:
   return "mov\t%w0, %w1";
 case 1:
   return "mov\t%w0, %1";
 case 2:
   return aarch64_output_scalar_simd_mov_immediate (operands[1],
mode);
 case 3:
   return "ldr\t%w0, %1";
 case 4:
   return "ldr\t%0, %1";
 case 5:
   return "str\t%w1, %0";
 case 6:
   return "str\t%1, %0";
 case 7:
   return "umov\t%w0, %1.[0]";
 case 8:
   return "dup\t%0., %w1";
 case 9:
   return "dup\t%0, %1.[0]";
 default:
   gcc_unreachable ();
 }
}
  [(set_attr "type" "mov_reg,mov_imm,neon_move,load1,load1,store1,store1,\
 neon_to_gp,neon_from_gp,neon_dup")
   (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")
   (set_attr "addr_type" "*,*,*,op1,op1,op0,op0,*,*,*")]
)




FILE - gcc/config/aarch64/vulcan.md
;; Integer loads and stores.

(define_insn_reservation "vulcan_load_basic" 4
  (and (eq_attr "tune" "vulcan")
   (eq_attr "type" "load1")
   (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_SYMBOLIC
  | AARCH64_ADDR_REG_IMM
  | AARCH64_ADDR_LO_SUM)"))
  "vulcan_ls01")

(define_insn_reservation "vulcan_load_automod" 4
  (and (eq_attr "tune" "vulcan")
   (eq_attr "type" "load1")
   (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_REG_WB)"))
  "vulcan_ls01,vulcan_i012")




FILE - gcc/config/aarch64/aarch64.c

/* Return TRUE if INSN uses an address that satisfies any of the (non-strict)
   addressing modes specified by MASK.  This is intended for use in scheduling
   models that are sensitive to the form of address used by some particular
   instruction.  */

bool
aarch64_mem_type_p (rtx_insn *insn, unsigned HOST_WIDE_INT mask)
{
  aarch64_address_info info;
  bool valid;
  attr_addr_type addr_type;
  rtx mem, addr;
  machine_mode mode;

  addr_type = get_attr_addr_type (insn);

  switch (addr_type)
    {
    case ADDR_TYPE_WB:
      info.type = ADDRESS_REG_WB;
      break;

    case ADDR_TYPE_LO_SUM:
      info.type = ADDRESS_LO_SUM;
      break;

    case ADDR_TYPE_OP0:
    case ADDR_TYPE_OP1:
      extract_insn_cached (insn);

      mem = recog_data.operand[(addr_type == ADDR_TYPE_OP0) ? 0 : 1];

      gcc_assert (MEM_P (mem));

      addr = XEXP (mem, 0);
      mode = GET_MODE (mem);

    classify:
      valid = aarch64_classify_address (&info, addr, mode, MEM, false);
      if (!valid)
        return false;

      break;

    case ADDR_TYPE_OP0ADDR:
    case ADDR_TYPE_OP1ADDR:
      extract_insn_cached (insn);

      addr = recog_data.operand[(addr_type == ADDR_TYPE_OP0ADDR) ? 0 : 1];
      mode = DImode;
      goto classify;

    case ADDR_TYPE_NONE:
      return false;
    }

  switch (info.type)
{
case

[PATCH][AArch64] Canonicalize Cortex core tunings

2016-06-29 Thread Wilco Dijkstra
This patch sets the branch cost to the same most optimal setting for all
Cortex cores, reducing codesize and improving performance due to using more
CSEL instructions.  Set the autoprefetcher model in Cortex-A72 to weak like
the others.  Enable AES fusion in Cortex-A35.  As a result generated code is
now more similar as well as more optimal across Cortex cores.

Regress passes, OK for commit?

ChangeLog:
2016-06-29  Wilco Dijkstra  

* config/aarch64/aarch64.c (cortexa35_tunings):
Enable AES fusion.  Use cortexa57_branch_cost.
(cortexa53_tunings): Use cortexa57_branch_cost.
(cortexa72_tunings): Use cortexa57_branch_cost.
Use AUTOPREFETCHER_WEAK.
(cortexa73_tunings): Use cortexa57_branch_cost.
---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
7617f9fb273d17653df95eaf22b3cbeae3862230..5a20e175e72b2848499f07fcdf1856b3a7817e49
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -448,11 +448,11 @@ static const struct tune_params cortexa35_tunings =
   &generic_addrcost_table,
   &cortexa53_regmove_cost,
   &generic_vector_cost,
-  &generic_branch_cost,
+  &cortexa57_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
   1, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   8,   /* function_align.  */
   8,   /* jump_align.  */
@@ -474,7 +474,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_addrcost_table,
   &cortexa53_regmove_cost,
   &generic_vector_cost,
-  &generic_branch_cost,
+  &cortexa57_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
   2, /* issue_rate  */
@@ -526,7 +526,7 @@ static const struct tune_params cortexa72_tunings =
   &cortexa57_addrcost_table,
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
-  &generic_branch_cost,
+  &cortexa57_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
   3, /* issue_rate  */
@@ -542,7 +542,7 @@ static const struct tune_params cortexa72_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   0,   /* cache_line_size.  */
-  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)/* tune_flags.  */
 };
 
@@ -552,7 +552,7 @@ static const struct tune_params cortexa73_tunings =
   &cortexa57_addrcost_table,
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
-  &generic_branch_cost,
+  &cortexa57_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost.  */
   2, /* issue_rate.  */



[PATCH] Fix fold_binary RROTATE_EXPR opts (PR middle-end/71693)

2016-06-29 Thread Jakub Jelinek
Hi!

This issue is latent on the trunk/6.2, fails with checking in 5.x/4.9.
As usual in fold_*_loc, arg0/arg1 are op0/op1 after STRIP_NOPS, so they
might have an incompatible type; we need to convert them to the right type
first before optimizing.
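
The two rotate identities involved can be sketched outside the compiler.
This is a minimal model on a 16-bit type, not GCC code; rotr16 and the two
helpers are hypothetical names.  The explicit casts back to the 16-bit type
play a role loosely similar to converting the operands to the result type
before folding.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 16-bit value right by n bits; n is reduced modulo the precision.
inline std::uint16_t rotr16(std::uint16_t v, unsigned n) {
    n %= 16;
    if (n == 0)
        return v;
    // The operands promote to int, so the result is cast back to the
    // 16-bit type explicitly.
    return static_cast<std::uint16_t>((v >> n) | (v << (16 - n)));
}

// Rotating a bitwise OR equals OR-ing the rotated operands.
inline bool rotate_distributes_over_or(std::uint16_t a, std::uint16_t b,
                                       unsigned n) {
    return rotr16(static_cast<std::uint16_t>(a | b), n)
           == static_cast<std::uint16_t>(rotr16(a, n) | rotr16(b, n));
}

// Two rotates whose counts sum to a multiple of 16 give back the input.
inline bool rotates_cancel(std::uint16_t v, unsigned n1, unsigned n2) {
    return (n1 + n2) % 16 == 0 && rotr16(rotr16(v, n1), n2) == v;
}
```

Both identities only hold when every intermediate value has the same
precision, which is why a mismatched operand type is a problem.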

Bootstrapped/regtested on x86_64-linux/i686-linux, ok everywhere?

2016-06-29  Jakub Jelinek  

PR middle-end/71693
* fold-const.c (fold_binary_loc) : Cast
TREE_OPERAND (arg0, 0) and TREE_OPERAND (arg0, 1) to type
first when permuting bitwise operation with rotate.  Cast
TREE_OPERAND (arg0, 0) to type when cancelling two rotations.

* gcc.c-torture/compile/pr71693.c: New test.

--- gcc/fold-const.c.jj 2016-06-14 12:17:20.530282919 +0200
+++ gcc/fold-const.c2016-06-29 16:39:15.175562715 +0200
@@ -10294,11 +10294,15 @@ fold_binary_loc (location_t loc,
  || TREE_CODE (arg0) == BIT_IOR_EXPR
  || TREE_CODE (arg0) == BIT_XOR_EXPR)
  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
-   return fold_build2_loc (loc, TREE_CODE (arg0), type,
-   fold_build2_loc (loc, code, type,
-TREE_OPERAND (arg0, 0), arg1),
-   fold_build2_loc (loc, code, type,
-TREE_OPERAND (arg0, 1), arg1));
+   {
+ tree arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
+ tree arg01 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
+ return fold_build2_loc (loc, TREE_CODE (arg0), type,
+ fold_build2_loc (loc, code, type,
+  arg00, arg1),
+ fold_build2_loc (loc, code, type,
+  arg01, arg1));
+   }
 
   /* Two consecutive rotates adding up to the some integer
 multiple of the precision of the type can be ignored.  */
@@ -10307,7 +10311,7 @@ fold_binary_loc (location_t loc,
  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
  && wi::umod_trunc (wi::add (arg1, TREE_OPERAND (arg0, 1)),
 prec) == 0)
-   return TREE_OPERAND (arg0, 0);
+   return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
 
   return NULL_TREE;
 
--- gcc/testsuite/gcc.c-torture/compile/pr71693.c.jj2016-06-29 
16:39:15.175562715 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr71693.c   2016-06-29 
16:39:15.175562715 +0200
@@ -0,0 +1,10 @@
+/* PR middle-end/71693 */
+
+unsigned short v;
+
+void
+foo (int x)
+{
+  v = ((((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) & 0x00ffU) << 8)
+   | (((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) >> 8) & 0x00ffU));
+}

Jakub


Re: [PATCH], Pr 71667, Fix PowerPC ISA 3.0 DImode Altivec load/store

2016-06-29 Thread Michael Meissner
On Wed, Jun 29, 2016 at 05:29:36PM +0100, Alan Hayward wrote:
> Hi,
> 
> This patch references the wrong bug. It should reference 71677 and not
> 71667
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71677
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71667
> 
> 
> Thanks,
> Alan.

Whoops, yes you are correct.  Thanks for pointing that out.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Problem in cxx_fundamental_alignment_p?

2016-06-29 Thread Bernd Schmidt

I came across what I think is a bug in cxx_fundamental_alignment_p.

User alignments are specified in units of bytes. This is documented, and 
we can also see the following in handle_aligned_attribute, for the case 
when we have no args:

  align_expr = size_int (ATTRIBUTE_ALIGNED_VALUE / BITS_PER_UNIT);

Then, we compute the log of that alignment in check_user_alignment:

  i = check_user_alignment (align_expr, true)

That's the log of the alignment in bytes, as we can see a little further 
down:

  SET_TYPE_ALIGN (*type, (1U << i) * BITS_PER_UNIT);

Then, we call check_cxx_fundamental_alignment_constraints, which 
recomputes the alignment (in bytes) from that log:

  unsigned requested_alignment = 1U << align_log;

It then calls cxx_fundamental_alignment_p, where we compare it to 
TYPE_ALIGN values, which are specified in bits. So I think we have a 
mismatch there.
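
To make the unit mismatch concrete, here is a small sketch.  It is not the
GCC code: both function names are hypothetical, and limit_bytes stands in
for the larger of the long long and long double alignments on the target.

```cpp
#include <cassert>
#include <climits>

// Mismatched units: the request is in bytes but the limit is in bits,
// which models comparing a byte alignment against TYPE_ALIGN.
inline bool fundamental_alignment_mixed_units(unsigned align_bytes,
                                              unsigned limit_bytes) {
    unsigned limit_bits = limit_bytes * CHAR_BIT;
    return align_bytes <= limit_bits;
}

// Consistent units: both values in bytes, which models comparing against
// TYPE_ALIGN_UNIT.
inline bool fundamental_alignment_bytes(unsigned align_bytes,
                                        unsigned limit_bytes) {
    return align_bytes <= limit_bytes;
}
```

With a 16-byte limit, a 32-byte request passes the mixed-unit check but is
rejected by the byte-based one.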


Does that sound right? The patch below was bootstrapped and tested on 
x86_64-linux, without issues, but I'm not convinced this code is covered 
by any testcase. Dodji?



Bernd
	* c-common.c (cxx_fundamental_alignment_p): Use TYPE_ALIGN_UNIT,
	not TYPE_ALIGN.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 237797)
+++ gcc/c-family/c-common.c	(working copy)
@@ -7668,10 +7668,10 @@ fail:
   return NULL_TREE;
 }
 
-/* Check whether ALIGN is a valid user-specified alignment.  If so,
-   return its base-2 log; if not, output an error and return -1.  If
-   ALLOW_ZERO then 0 is valid and should result in a return of -1 with
-   no error.  */
+/* Check whether ALIGN is a valid user-specified alignment, specified
+   in bytes.  If so, return its base-2 log; if not, output an error
+   and return -1.  If ALLOW_ZERO then 0 is valid and should result in
+   a return of -1 with no error.  */
 int
 check_user_alignment (const_tree align, bool allow_zero)
 {
@@ -12700,9 +12700,9 @@ scalar_to_vector (location_t loc, enum t
   return stv_nothing;
 }
 
-/* Return true iff ALIGN is an integral constant that is a fundamental
-   alignment, as defined by [basic.align] in the c++-11
-   specifications.
+/* Return true iff ALIGN, which is specified in bytes, is an integral
+   constant that is a fundamental alignment, as defined by
+   [basic.align] in the c++-11 specifications.
 
That is:
 
@@ -12712,10 +12712,11 @@ scalar_to_vector (location_t loc, enum t
 alignof(max_align_t)].  */
 
 bool
-cxx_fundamental_alignment_p  (unsigned align)
+cxx_fundamental_alignment_p (unsigned align)
 {
-  return (align <=  MAX (TYPE_ALIGN (long_long_integer_type_node),
-			 TYPE_ALIGN (long_double_type_node)));
+  unsigned limit = MAX (TYPE_ALIGN_UNIT (long_long_integer_type_node),
+			TYPE_ALIGN_UNIT (long_double_type_node));
+  return align <= limit;
 }
 
 /* Return true if T is a pointer to a zero-sized aggregate.  */


Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-06-29 Thread Andrew Burgess
* Jeff Law  [2016-06-21 20:55:15 -0600]:

> On 06/10/2016 10:56 AM, Andrew Burgess wrote:
> > The global flag `user_defined_section_attribute' is set while parsing C
> > code when the section attribute is encountered.  The flag is set when
> > anything has the section attribute applied to it, functions or data.
> > 
> > The only place this global was used was within the gate function for
> > partitioning blocks (pass_partition_blocks::gate), however, the
> > partitioning is done during compilation, while the flag is set earlier,
> > during the parsing.  The flag is then cleared again during the final
> > compilation pass.
> > 
> > The result is that if any function or data has a section attribute then
> > the flag will be set to true during the file parse pass.  The first
> > compiled function will then skip the partition-blocks pass, and the flag
> > will be set back to false during the final-pass on the first function.
> > After then, the flag is never set to true again.
> > 
> > The guarding of the partition-blocks pass does not appear to be
> > necessary, given that applying a section attribute correctly
> > overrides the hot/cold section partitioning (this is taken care if in
> > varasm.c).
> > 
> > gcc/ChangeLog:
> > 
> > * gcc/bb-reorder.c: Remove 'toplev.h' include.
> > (pass_partition_blocks::gate): No longer check
> > user_defined_section_attribute.
> > * gcc/c-family/c-common.c (handle_section_attribute): No longer
> > set user_defined_section_attribute.
> > * gcc/final.c (rest_of_handle_final): Likewise.
> > * gcc/toplev.c: Remove definition of user_defined_section_attribute.
> > * gcc/toplev.h: Remove declaration of
> > user_defined_section_attribute.
> user_defined_section_attribute was introduced as part of the hot/cold
> partitioning changes.
> 
> https://gcc.gnu.org/ml/gcc-patches/2004-07/msg01545.html
> 
> 
> What's supposed to happen is hot/cold partitioning is supposed to be turned
> off for the function which has the a user defined section attribute.
> 
> So proper behaviour is to set the flag to true when the attribute is parsed
> and turn it off when we're finished with the current function. The gate for
> hot/cold partitioning should check the value of the flag and avoid hot/cold
> partitioning when the flag is true.
> 
> So AFAICT everything is working as it should.  Keep in mind that multiple
> functions might have user defined section attributes.

Jeff & Jakub,

Thanks for taking the time to review and provide feedback.  Let me
explain how I _think_ it's working at the moment, then you can point
out where I might be going wrong.

My understanding is that currently all functions are parsed, and then
all functions are optimised / compiled.

The user_defined_section_attribute variable is initialised to false,
and set true when the section attribute is parsed.  However, as
pass_partition_blocks::gate is called as part of the optimise /
compile process we only read this variable once all of the functions
have been parsed.

The user_defined_section_attribute variable is reset to false in
rest_of_handle_final, which is the final compile / optimise pass.  So
I think how it (doesn't) currently work is that
user_defined_section_attribute starts as false, then if _any_ function
has a user defined section attribute the variable is set true.  Then,
in the compile / optimise phase the _first_ function will not perform
the partition-blocks pass, after which user_defined_section_attribute
will be reset to false, and all future blocks will go through the
partition-blocks pass.

In the original patch I simply removed user_defined_section_attribute
figuring that given we've been happily running the partition-blocks
pass then this can't be too harmful, however, as Jakub said perhaps I
should have implemented a correct check.

I've attached an updated patch to remove
user_defined_section_attribute and replace the check in
pass_partition_blocks::gate with one that looks for the section
attribute on the function decl.
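
The behaviour described above can be modelled with a short sketch.  This is
a simplification under stated assumptions, not GCC code: Function and both
helpers are hypothetical, only functions (not data) carry the attribute,
and the two loops stand in for "parse everything, then compile everything".

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Function {
    std::string name;
    bool has_section_attr;
};

// Global-flag model: returns the functions that skip partitioning.
std::vector<std::string>
skipped_with_global_flag(const std::vector<Function>& fns) {
    bool flag = false;
    for (const Function& f : fns)      // parse phase sets the flag
        if (f.has_section_attr)
            flag = true;

    std::vector<std::string> skipped;
    for (const Function& f : fns) {    // compile phase, one function at a time
        if (flag)                      // the gate reads the global
            skipped.push_back(f.name);
        flag = false;                  // the final pass clears it
    }
    return skipped;
}

// Per-decl model: the gate looks at the function being compiled.
std::vector<std::string>
skipped_with_decl_check(const std::vector<Function>& fns) {
    std::vector<std::string> skipped;
    for (const Function& f : fns)
        if (f.has_section_attr)
            skipped.push_back(f.name);
    return skipped;
}
```

For functions a, b, c where only b has the attribute, the global-flag model
skips a, while the per-decl model skips b.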

Feedback welcome,

Thanks,
Andrew

---

gcc: Remove unneeded global flag

The global flag `user_defined_section_attribute' is set while parsing C
code when the section attribute is encountered.  The flag is set when
anything has the section attribute applied to it, functions or data.

The only place this global was used was within the gate function for
partitioning blocks (pass_partition_blocks::gate), however, the
partitioning is done during compilation, while the flag is set earlier,
during the parsing.  The flag is then cleared again during the final
compilation pass.

The result is that if any function or data has a section attribute then
the flag will be set to true during the file parse pass.  The first
compiled function will then skip the partition-blocks pass, and the flag
will be set back to false during the final-pass on the first function.
After that, the flag is never set to true again.

The guarding of the partition-blocks pass does not ap

[PATCH], Add PowerPC ISA 3.0 vector extract support

2016-06-29 Thread Michael Meissner
This patch adds support to generate the ISA 3.0 VEXTRACTUB, VEXTRACTUH, and
XXEXTRACTUW instructions to extract a constant element from the small vector
integer types.  I also added support to generate STXSIWX if a DImode is in an
Altivec register (which is needed for this patch when extracting char or short
elements).

I built this on a little endian power8 system, and there were no regressions in
make check.  Is it ok to install in the trunk?

Ideally, I would like to also install it on the GCC 6.2 branch.  Note, it
depends on the June 15th changes to allow DImode into Altivec registers, and
the fix for PR 71677 that was submitted on June 27th being installed before
these patches can be applied.

Assuming the above patches are applied, can I backport it to GCC 6.2 after
a burn-in period?

[gcc]
2016-06-29  Michael Meissner  

* config/rs6000/predicates.md (const_0_to_7_operand): New
predicate, recognize 0..7.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add
support for doing extracts from V16QImode, V8HImode, V4SImode
under ISA 3.0.
* config/rs6000/vsx.md (VSX_EXTRACT_I): Mode iterator for ISA 3.0
vector extract support.
(VSX_EXTRACT_PREDICATE): Mode attribute to validate element number
for ISA 3.0 vector extract.
(VSX_EXTRACT_INSN): Mode attribute to give the instruction to use
for ISA 3.0 vector extract.
(VSX_EX): Constraints to use for ISA 3.0 vector extract.
(vsx_extract_<mode>, VSX_EXTRACT_I): Add support for doing
extracts of a constant element number from small integer vectors
on 64-bit ISA 3.0 systems.
(vsx_extract_<mode>_di): Likewise.
* config/rs6000/rs6000.h (TARGET_VEXTRACTUB): New target macro to
say when we can do ISA 3.0 vector extracts.
* config/rs6000/rs6000.md (stfiwx): Allow DImode in Altivec
registers, using the stxsiwx instruction.

[gcc/testsuite]
2016-06-29  Michael Meissner  

* gcc.target/powerpc/p9-extract-1.c: New file to test ISA 3.0
vector extract instructions.
* gcc.target/powerpc/p9-extract-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 237826)
+++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy)
@@ -200,6 +200,11 @@ (define_predicate "const_2_to_3_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 2, 3)")))
 
+;; Match op = 0..7.
+(define_predicate "const_0_to_7_operand"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 0, 7)")))
+
 ;; Match op = 0..15
 (define_predicate "const_0_to_15_operand"
   (and (match_code "const_int")
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 237826)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
@@ -6916,6 +6916,30 @@ rs6000_expand_vector_extract (rtx target
case V4SFmode:
  emit_insn (gen_vsx_extract_v4sf (target, vec, GEN_INT (elt)));
  return;
+   case V16QImode:
+ if (TARGET_VEXTRACTUB)
+   {
+ emit_insn (gen_vsx_extract_v16qi (target, vec, GEN_INT (elt)));
+ return;
+   }
+ else
+   break;
+   case V8HImode:
+ if (TARGET_VEXTRACTUB)
+   {
+ emit_insn (gen_vsx_extract_v8hi (target, vec, GEN_INT (elt)));
+ return;
+   }
+ else
+   break;
+   case V4SImode:
+ if (TARGET_VEXTRACTUB)
+   {
+ emit_insn (gen_vsx_extract_v4si (target, vec, GEN_INT (elt)));
+ return;
+   }
+ else
+   break;
}
 }
 
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 237826)
+++ gcc/config/rs6000/vsx.md(.../gcc/config/rs6000) (working copy)
@@ -263,6 +263,27 @@ (define_mode_attr VS_64reg [(V2DF  "ws")
 (define_mode_iterator VSINT_84  [V4SI V2DI DI])
 (define_mode_iterator VSINT_842 [V8HI V4SI V2DI])
 
+;; Iterator for ISA 3.0 vector extract/insert of integer vectors
+(define_mode_iterator VSX_EXTRACT_I [V16QI V8HI V4SI])
+
+;; Mode attribute to give the correct predicate for ISA 3.0 vector extract and
+;; insert to validate the operand number.
+(define_mode_attr VSX_EXTRACT_PREDICATE [(V16QI "const_0_to_15_operand")
+(

Re: [PATCH, AARCH64] add qdf24xx tuning structure

2016-06-29 Thread Christophe Lyon
On 28 June 2016 at 10:21, Kyrill Tkachov  wrote:
>
> On 28/06/16 02:03, Jim Wilson wrote:
>>
>> On Mon, Jun 13, 2016 at 3:01 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi Jim,
>>>
>>> On 10/06/16 23:48, Jim Wilson wrote:
>>>>
>>>> This adds a tuning structure for qdf24xx.  This was tested with an
>>>> aarch64-linux bootstrap and a make check, with no regressions.  I also
>>>> tested it with an x86_64-linux C make check to verify that I didn't
>>>> break the testsuite for non aarch64 targets.
>>>
>>>
>>> As this also changes code in the arm backend
>>> it also needs a bootstrap and test on an arm target
>>> (arm-none-linux-gnueabihf for example).
>>> Can you please confirm that this passes successfully?
>>
>> Yes, I forgot to do the bootstrap and make check on arm.
>>
>> I tried to do that testing, and ran into problems with the armv8-a
>> assembler warning
>>  IT blocks containing 32-bit Thumb instructions are deprecated in
>> ARMv8
>> which messed up my testsuite results so much that they were unusable.
>
>
> Yes, that's PR 67591. For these purposes I usually configure with
I'm looking at this. Progress is slow as I keep being distracted.

Christophe

> something like --with-cpu=cortex-a15 unless I'm actually testing
> ARMv8-A functionality.
>
>> I had to rerun the tests to workaround that, and then got distracted
>> by other problems, but I have now done an armhf bootstrap and make
>> check with unpatched (cortex-a57) and patched (qdf24xx) trees and got
>> the same results.
>
>
> Thanks. That's ok arm-wise.
>
> Kyrill
>
>
>>
>> During the delay, the aarch64 tuning structure changed how the recip
>> square root approx is handled, so I had to make a trivial change to my
>> patch to compensate for that, and then redo the aarch64 bootstrap to
>> make sure it was still OK.  The new patch is attached, which otherwise
>> the same as the previous patch.  I'm assuming this is still OK to
>> install, as the previous patch was approved pending test results, but
>> will wait a bit in case someone wants to object.
>>
>> Jim
>
>


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread François Dumont

On 29/06/2016 11:10, Jonathan Wakely wrote:

On 28/06/16 21:59 +0200, François Dumont wrote:

@@ -303,16 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  emplace(const_iterator __position, _Args&&... __args)
  {
const size_type __n = __position - begin();


It looks like this should use __position - cbegin(), to avoid an
implicit conversion from iterator to const_iterator, and ...


-if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
-&& __position == end())
-  {
-_Alloc_traits::construct(this->_M_impl, 
this->_M_impl._M_finish,

- std::forward<_Args>(__args)...);
-++this->_M_impl._M_finish;
-  }
+if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
+  if (__position == end())


This could be __position == cend(), and ...


+{
+  _Alloc_traits::construct(this->_M_impl, 
this->_M_impl._M_finish,

+   std::forward<_Args>(__args)...);
+  ++this->_M_impl._M_finish;
+}
+  else
+_M_insert_aux(begin() + (__position - cbegin()),


This could use __n, and ...


+ std::forward<_Args>(__args)...);
else
-  _M_insert_aux(begin() + (__position - cbegin()),
-std::forward<_Args>(__args)...);
+  _M_realloc_insert_aux(begin() + (__position - cbegin()),


This could also use __n.


+ std::forward<_Args>(__args)...);
+
return iterator(this->_M_impl._M_start + __n);
  }


.

I tried those changes too but started getting failing tests in 
vector/ext_pointer, so I prefer not to touch that for the moment. I think 
the compilation error came from changing begin() + (__position - cbegin()) 
into begin() + __n, because of the overloaded operator+. The 2 other 
changes should be fine for a future patch.


As asked, here is now a patch that only fixes the robustness issue. The 
consequence is that it reverts the latest optimization because, without a 
smarter algorithm, we always need to make a copy to protect against insertion 
of values contained in the vector, as shown by the new tests.
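
The aliasing hazard can be shown with a small sketch.  This is not the
libstdc++ implementation: naive_insert and safe_insert are hypothetical
helpers working on a buffer that already has one spare slot, so no
reallocation is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shift [pos, size) one slot to the right, then store value at pos.
// If value refers to an element at or after pos, it is read after the
// shift and may no longer hold what the caller passed.
void naive_insert(std::vector<int>& buf, std::size_t size, std::size_t pos,
                  const int& value) {
    for (std::size_t i = size; i > pos; --i)
        buf[i] = buf[i - 1];
    buf[pos] = value;
}

// Same operation, but the value is copied before anything moves.
void safe_insert(std::vector<int>& buf, std::size_t size, std::size_t pos,
                 const int& value) {
    const int copy = value;
    for (std::size_t i = size; i > pos; --i)
        buf[i] = buf[i - 1];
    buf[pos] = copy;
}
```

Inserting buf[2] at the front of {1, 2, 3} with the naive version stores 2
rather than 3, because buf[2] has already been overwritten by the shift;
the copying version stores 3.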


François
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index eaafa22..c37880a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -949,7 +949,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus >= 201103L
 	  _M_emplace_back_aux(__x);
 #else
-	  _M_insert_aux(end(), __x);
+	  _M_realloc_insert_aux(end(), __x);
 #endif
   }
 
@@ -1435,21 +1435,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus < 201103L
   void
   _M_insert_aux(iterator __position, const value_type& __x);
-#else
-      template<typename... _Args>
-	static void
-	_S_insert_aux_assign(iterator __pos, _Args&&... __args)
-	{ *__pos =  _Tp(std::forward<_Args>(__args)...); }
-
-  static void
-  _S_insert_aux_assign(iterator __pos, _Tp&& __arg)
-  { *__pos = std::move(__arg); }
 
+  void
+  _M_realloc_insert_aux(iterator __position, const value_type& __x);
+#else
   template<typename... _Args>
 	void
 	_M_insert_aux(iterator __position, _Args&&... __args);
 
   template<typename... _Args>
+void
+_M_realloc_insert_aux(iterator __position, _Args&&... __args);
+
+  template<typename... _Args>
 	void
 	_M_emplace_back_aux(_Args&&... __args);
 #endif
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 604be7b..9061593 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -112,27 +112,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
 {
   const size_type __n = __position - begin();
-  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
-	  && __position == end())
-	{
-	  _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
-	  ++this->_M_impl._M_finish;
-	}
+  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
+	if (__position == end())
+	  {
+	_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
+	++this->_M_impl._M_finish;
+	  }
+	else
+#if __cplusplus >= 201103L
+	  _M_insert_aux(begin() + (__position - cbegin()), __x);
+#else
+	  _M_insert_aux(__position, __x);
+#endif
   else
-	{
 #if __cplusplus >= 201103L
-	  const auto __pos = begin() + (__position - cbegin());
-	  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
-	{
-	  _Tp __x_copy = __x;
-	  _M_insert_aux(__pos, std::move(__x_copy));
-	}
-	  else
-	_M_insert_aux(__pos, __x);
+	_M_realloc_insert_aux(begin() + (__position - cbegin()), __x);
 #else
-	_M_insert_aux(__position, __x);
+	_M_realloc_insert_aux(__position, __x);
 #endif
-	}
+
   return iterator(this->_M_impl._M_start + __n);
 }
 
@@ -303,16 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   emplace(const_iterator __position, _Args&&... __args)
   {
 	const size_type __n = __position - begin();
-	if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
-	&& __position == end())
-	  {

Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 29/06/16 21:43 +0200, François Dumont wrote:
As asked here is now a patch to only fix the robustness issue. The 
consequence is that it is reverting the latest optimization as, 
without smart algo, we always need to do a copy to protect against 
insertion of values contained in the vector as shown by new tests.


I don't understand. There is no problem with insert(), only with
emplace(), so why do both get worse?

Also, the problem is with emplacing from an lvalue, so why do the
number of operations change for test02 and test03, which are for
xvalues and rvalues?

I haven't analyzed your patch, but the results seem wrong. We should
not have to do any more work to insert rvalues.

What am I missing?



diff --git 
a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc 
b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
index 39a3f03..8bd72b7 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
@@ -149,7 +149,7 @@ test01()
void
test02()
{
-  const X::special expected{ 0, 0, 0, 0, 1, 3 };
+  const X::special expected{ 0, 1, 0, 0, 2, 3 };
  X::special ins, emp;
  {
std::vector v;
@@ -187,7 +187,7 @@ test02()
void
test03()
{
-  const X::special expected{ 1, 1, 0, 0, 1, 3 };
+  const X::special expected{ 1, 2, 0, 0, 2, 3 };
  X::special ins, emp;
  {
std::vector v;




Re: [PATCH], Add PowerPC ISA 3.0 vector extract support

2016-06-29 Thread Michael Meissner
On Wed, Jun 29, 2016 at 03:33:24PM -0400, Michael Meissner wrote:
> This patch adds support to generate the ISA 3.0 VEXTRACTUB, VEXTRACTUH, and
> XXEXTRACTUW instructions to extract a constant element from the small vector
> integer types.  I also added support to generate STXSIWX if a DImode is in an
> Altivec register (which is needed for this patch when extracting char or short
> elements).
> 
> I built this on a little endian power8 system, and there were no regressions 
> in
> make check.  Is it ok to install in the trunk?

Note, I discovered that the second test (p9-extract-2.c) had the wrong number
of '\'s for the alternative, and it would fail.

This is the correct patch.  Sorry about that.

Index: gcc/testsuite/gcc.target/powerpc/p9-extract-2.c
===
--- gcc/testsuite/gcc.target/powerpc/p9-extract-2.c 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)
 (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-extract-2.c 
(.../gcc/testsuite/gcc.target/powerpc)  (revision 237858)
@@ -0,0 +1,27 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+void extract_int_0 (int *p, vector int a) { *p = vec_extract (a, 0); }
+void extract_int_3 (int *p, vector int a) { *p = vec_extract (a, 3); }
+
+void extract_short_0 (short *p, vector short a) { *p = vec_extract (a, 0); }
+void extract_short_3 (short *p, vector short a) { *p = vec_extract (a, 7); }
+
+void extract_schar_0 (signed char *p, vector signed char a) { *p = vec_extract 
(a, 0); }
+void extract_schar_3 (signed char *p, vector signed char a) { *p = vec_extract 
(a, 15); }
+
+/* { dg-final { scan-assembler "vextractub"  } } */
+/* { dg-final { scan-assembler "vextractuh"  } } */
+/* { dg-final { scan-assembler "xxextractuw" } } */
+/* { dg-final { scan-assembler "stxsibx" } } */
+/* { dg-final { scan-assembler "stxsihx" } } */
+/* { dg-final { scan-assembler "stfiwx\|stxsiwx" } } */
+/* { dg-final { scan-assembler-not "mfvsrd"  } } */
+/* { dg-final { scan-assembler-not "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "stxv"} } */
+/* { dg-final { scan-assembler-not "lwa" } } */
+/* { dg-final { scan-assembler-not "stw" } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target

2016-06-29 Thread Mike Stump
Please include the libstdc++ list, they don't all read the other list.  Also, 
the patch or a link to the patch helps the reviewers find the patch, otherwise 
even finding the patch to review can be hard for some folks.

Seems reasonable to me, though, I'd normally punt to the atomic people.

> On Jun 29, 2016, at 9:35 AM, Thomas Preudhomme 
>  wrote:
> Ping?
> 
> Best regards,
> 
> Thomas
> 
> On Thursday 02 June 2016 14:34:03 Thomas Preudhomme wrote:
>> Ping?
>> 
>> On Thursday 26 May 2016 14:00:55 Thomas Preudhomme wrote:
>>> [Sorry for the large recipient list, I wasn't sure who of C++ and x86
>>> maintainers should approve this]
>>> 
>>> Hi,
>>> 
>>> 29_atomics/atomic/65913.cc test in libstdc++ is a runtime test that only
>>> rely on atomic and gnu++11 support. Therefore I propose to require
>>> atomic-builtins instead of an x86 (32 or 64 bits) target.
>>> 
>>> ChangeLog entry is as follows:
>>> 
>>> 2016-05-19  Thomas Preud'homme  
>>> 
>>>* testsuite/29_atomics/atomic/65913.cc: Require atomic-builtins
>>> 
>>> rather than specific target.
>>> 
>>> 
>>> Patch is in attachment.
>>> 
>>> 
>>> Is this ok for trunk?
>>> 
>>> Best regards,
>>> 
>>> Thomas



req_atomic_builtin_for_65913.patch
Description: application/applefile


Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target

2016-06-29 Thread Jonathan Wakely

On 29/06/16 13:49 -0700, Mike Stump wrote:

Please include the libstdc++ list, they don't all read the other list.


And the documentation clearly says (in two places) that all libstdc++
patches must go to the libstdc++ list.



Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target

2016-06-29 Thread Jonathan Wakely

On 29/06/16 13:49 -0700, Mike Stump wrote:

On Thursday 26 May 2016 14:00:55 Thomas Preudhomme wrote:

[Sorry for the large recipient list, I wasn't sure who of C++ and x86
maintainers should approve this]

Hi,

29_atomics/atomic/65913.cc test in libstdc++ is a runtime test that only
rely on atomic and gnu++11 support. Therefore I propose to require
atomic-builtins instead of an x86 (32 or 64 bits) target.

ChangeLog entry is as follows:

2016-05-19  Thomas Preud'homme  

   * testsuite/29_atomics/atomic/65913.cc: Require atomic-builtins

rather than specific target.


Patch is in attachment.


The original mail with the patch is:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00167.html



Is this ok for trunk?


Yes, any target that passes dg-require-atomic-builtins should be able
to do that test without libatomic, so OK for trunk.



Re: [PATCH] tree-ssa-strlen improvements (PR tree-optimization/71625)

2016-06-29 Thread Christophe Lyon
On 29 June 2016 at 15:20, Jakub Jelinek  wrote:
> On Wed, Jun 29, 2016 at 03:11:07PM +0200, Christophe Lyon wrote:
>> This causes GCC builds failures on arm-linux* targets, while building glibc:
>>
>> ns_print.c: In function 'ns_sprintrrf':
>> ns_print.c:94:1: internal compiler error: in get_stridx_plus_constant,
>> at tree-ssa-strlen.c:680
>>  ns_sprintrrf(const u_char *msg, size_t msglen,
>>  ^~~~
>> 0xcf1387 get_stridx_plus_constant
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:680
>> 0xcf15ba get_addr_stridx
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:197
>> 0xcf1776 get_stridx
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:250
>> 0xcf6448 handle_pointer_plus
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2028
>> 0xcf6448 strlen_optimize_stmt
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2297
>> 0xcf6fd9 strlen_dom_walker::before_dom_children(basic_block_def*)
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2456
>> 0x115c27c dom_walker::walk(basic_block_def*)
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/domwalk.c:265
>> 0xcef80a execute
>>         /tmp/1440207_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-ssa-strlen.c:2528
>>
>>
>> I do not have a preprocessed file yet, I'll prepare one if you need.
>
> That would be greatly appreciated (plus command line options).
> Please file a PR and I'll handle it.  Thanks.
>

This is PR 71707.
Sorry for the delay.

Thanks,

Christophe


> Jakub


Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases

2016-06-29 Thread Prathamesh Kulkarni
ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html

Thanks,
Prathamesh

On 7 June 2016 at 13:56, Prathamesh Kulkarni
 wrote:
> ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html
>
> Thanks,
> Prathamesh
>
> On 25 May 2016 at 18:19, Prathamesh Kulkarni
>  wrote:
>> On 23 May 2016 at 14:28, Prathamesh Kulkarni
>>  wrote:
>>> Hi,
>>> This patch overrides expand_divmod_libfunc for the ARM port and adds test-cases.
>>> I separated the SImode tests into a separate file from the DImode tests
>>> because certain arm configs (cortex-15) have a hardware div insn for
>>> SImode but not for DImode, and for that config we want the SImode tests
>>> to be disabled but not the DImode tests.
>>> The patch therefore has two effective-target checks: divmod and
>>> divmod_simode.
>>> Cross-tested on arm*-*-*.
>>> Bootstrap+test on arm-linux-gnueabihf in progress.
>>> Does this patch look OK ?
>> Hi,
>> This version adds a couple more test-cases and fixes a typo in
>> divmod-3-simode.c, divmod-4-simode.c
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Prathamesh


Re: [AArch64] ARMv8.2 command line and feature macros support

2016-06-29 Thread Andrew Pinski
On Mon, Jun 27, 2016 at 7:58 AM, Jiong Wang  wrote:
> On 07/06/16 09:46, Jiong Wang wrote:
>>
>> 2016-06-07  Matthew Wahab
>> Jiong Wang
>>
>> * config/aarch64/aarch64-arches.def: Add "armv8.2-a".
>> * config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
>> (AARCH64_FL_F16): New.
>> (AARCH64_FL_FOR_ARCH8_2): New.
>> (AARCH64_ISA_8_2): New.
>> (AARCH64_ISA_F16): New.
>> (TARGET_FP_F16INST): New.
>> (TARGET_SIMD_F16INST): New.
>> * config/aarch64/aarch64-option-extensions.def: New entry for
>> "fp16".
>> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
>> Conditionally define
>> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
>> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
>> * doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and
>> "fp16".
>>
>
> This is an updated version of this patch; the updates are:
>
>   * Enabling "fp16" also enables "fp".
>   * Disabling "fp" also disables "fp16".
>   * Disabling "fp16" disables only "fp16".
>
> OK for trunk?
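
The three dependency rules listed above can be sketched as operations on a
feature bitmask. This is only an illustration under stated assumptions: the
names kFp, kFp16, enable_extension and disable_extension are hypothetical and
do not correspond to the entries in aarch64-option-extensions.def.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
constexpr std::uint32_t kFp   = 1u << 0;
constexpr std::uint32_t kFp16 = 1u << 1;

// Bits that are turned on together with `feature`:
// enabling "fp16" also enables "fp".
constexpr std::uint32_t implied_on(std::uint32_t feature) {
  return feature == kFp16 ? (kFp16 | kFp) : feature;
}

// Bits that are turned off together with `feature`:
// disabling "fp" also disables "fp16"; disabling "fp16" touches nothing else.
constexpr std::uint32_t implied_off(std::uint32_t feature) {
  return feature == kFp ? (kFp | kFp16) : feature;
}

constexpr std::uint32_t enable_extension(std::uint32_t flags,
                                         std::uint32_t feature) {
  return flags | implied_on(feature);
}

constexpr std::uint32_t disable_extension(std::uint32_t flags,
                                          std::uint32_t feature) {
  return flags & ~implied_off(feature);
}
```

The asymmetry is the point of the update: turning a feature on pulls in what
it depends on, while turning a feature off removes what depends on it.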

I notice you did not add a profile entry in
aarch64-option-extensions.def.  Are you going to add this separately?
People will use gcc to invoke the assembler and use
-mcpu=armv8.2-a+profile on the command line.  In fact I already got a
request internally to support that.

Thanks,
Andrew

>
> 2016-06-27  Matthew Wahab  
> Jiong Wang  
>
> * config/aarch64/aarch64-arches.def: Add "armv8.2-a".
> * config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
> (AARCH64_FL_F16): New.
> (AARCH64_FL_FOR_ARCH8_2): New.
> (AARCH64_ISA_8_2): New.
> (AARCH64_ISA_F16): New.
> (TARGET_FP_F16INST): New.
> (TARGET_SIMD_F16INST): New.
> * config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
> ("fp"): Disabling "fp" also disables "fp16".
>
> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
> Conditionally define
> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
> * doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and
> "fp16".
>
>


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 29/06/16 21:36 +0100, Jonathan Wakely wrote:

On 29/06/16 21:43 +0200, François Dumont wrote:
As asked here is now a patch to only fix the robustness issue. The 
consequence is that it is reverting the latest optimization as, 
without smart algo, we always need to do a copy to protect against 
insertion of values contained in the vector as shown by new tests.


I don't understand. There is no problem with insert(), only with
emplace(), so why do both get worse?

Also, the problem is with emplacing from an lvalue, so why does the
number of operations change for test02 and test03, which are for
xvalues and rvalues?

I haven't analyzed your patch, but the results seem wrong. We should
not have to do any more work to insert rvalues.

What am I missing?


It seems to me that the minimal fix would be:

--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -312,7 +312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 }
   else
 _M_insert_aux(begin() + (__position - cbegin()),
-   std::forward<_Args>(__args)...);
+   _Tp(std::forward<_Args>(__args)...));
   return iterator(this->_M_impl._M_start + __n);
  }

This causes regressions in the insert_vs_emplace.cc test because we
insert rvalues using emplace:

 iterator
 insert(const_iterator __position, value_type&& __x)
 { return emplace(__position, std::move(__x)); }

That's suboptimal, since in the general case we need an extra
construction for emplacing, but we know that we don't need to do that
when inserting rvalues.

So the correct fix would be to implement inserting rvalues without
using emplace.

The attached patch is a smaller change, and doesn't change the number
of operations for insertions, only for emplacing lvalues.
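
To make the failure mode concrete, here is a minimal sketch of emplacing from
an lvalue that refers to an element of the same vector. The helper name
emplace_front_from_back is made up for illustration. Capacity is reserved up
front so that the insertion shifts elements in place instead of reallocating,
which is assumed to be the path that reaches _M_insert_aux.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: inserts a copy of the last element at the front,
// passing the element itself (an lvalue stored inside v) to emplace().
std::vector<std::string> emplace_front_from_back(std::vector<std::string> v) {
  assert(!v.empty());
  v.reserve(v.size() + 1);         // guarantee spare capacity: no reallocation
  v.emplace(v.begin(), v.back());  // the argument aliases an element of v
  return v;
}
```

A robust implementation has to copy the argument before shifting the
existing elements; otherwise the reference may observe a moved-from or
shifted element and the new front element gets the wrong value.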


diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index eaafa22..4f43c8e 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -981,6 +981,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   }
 
 #if __cplusplus >= 201103L
+private:
+  iterator
+  _M_emplace_aux(const_iterator __position, value_type&& __v)
+  { return insert(__position, std::move(__v)); }
+
+  template<typename... _Args>
+	iterator
+	_M_emplace_aux(const_iterator __position, _Args&&... __args)
+	{ return insert(__position, _Tp(std::forward<_Args>(__args)...)); }
+
+public:
   /**
*  @brief  Inserts an object in %vector before specified iterator.
*  @param  __position  A const_iterator into the %vector.
@@ -995,7 +1006,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   template<typename... _Args>
 	iterator
-	emplace(const_iterator __position, _Args&&... __args);
+	emplace(const_iterator __position, _Args&&... __args)
+	{ return _M_emplace_aux(__position, std::forward<_Args>(__args)...); }
 
   /**
*  @brief  Inserts given value into %vector before specified iterator.
@@ -1039,8 +1051,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  used the user should consider using std::list.
*/
   iterator
-  insert(const_iterator __position, value_type&& __x)
-  { return emplace(__position, std::move(__x)); }
+  insert(const_iterator __position, value_type&& __x);
 
   /**
*  @brief  Inserts an initializer_list into the %vector.
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 9cb5464..fbcab84 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -297,24 +297,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
 #if __cplusplus >= 201103L
   template<typename _Tp, typename _Alloc>
-template<typename... _Args>
-  typename vector<_Tp, _Alloc>::iterator
-  vector<_Tp, _Alloc>::
-  emplace(const_iterator __position, _Args&&... __args)
-  {
-	const size_type __n = __position - begin();
-	if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
-	&& __position == end())
-	  {
-	_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
- std::forward<_Args>(__args)...);
-	++this->_M_impl._M_finish;
-	  }
-	else
-	  _M_insert_aux(begin() + (__position - cbegin()),
-			std::forward<_Args>(__args)...);
-	return iterator(this->_M_impl._M_start + __n);
-  }
+auto
+vector<_Tp, _Alloc>::
+insert(const_iterator __position, value_type&& __v) -> iterator
+{
+  const auto __n = __position - cbegin();
+  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage
+	  && __position == cend())
+	{
+	  _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
+   std::move(__v));
+	  ++this->_M_impl._M_finish;
+	}
+  else
+	_M_insert_aux(begin() + __n, std::move(__v));
+  return iterator(this->_M_impl._M_start + __n);
+}
 
   template<typename _Tp, typename _Alloc>
 template<typename... _Args>
diff --git a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc
index 39a3f03

Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

2016-06-29 Thread Jiong Wang

Andreas Krebbel writes:

> On 06/28/2016 04:16 PM, Jiong Wang wrote:
> ...
>> So my first impression is that TARGET_LEGITIMATE_ADDRESS_P on s390 does need a
>> fix here. The following draft patch fixes this; my fix may be
>> incorrect, as normally we will allow an illegal constant offset if it's
>> associated with a virtual frame register before virtual register
>> elimination, and I don't know if that should be considered here. Also I
>> don't know if such a check should be constrained under some architecture
>> features.
>> 
>> --- a/gcc/config/s390/s390.c
>> +++ b/gcc/config/s390/s390.c
>> @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx addr, bool strict)
>> || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
>>return false;
>>   }
>> +
>> +  if (ad.disp
>> +  && !s390_short_displacement (ad.disp))
>> +return false;
>> +
>> return true;
>>   }
>
> Whether short displacement is required or not depends on the instruction. We
> always relied on reload to take care of this.  legitimate_address_p needs to
> accept the union of all valid addresses on S/390.  That change probably would
> cause a severe performance regression.
>
>> Meanwhile I feel LRA might be too strict here, as we can't assume all
>> addresses are legitimized before LRA, right? We still need to handle
>> reloading an illegal address. While now, a reload of
>> 
>>set (mem (reg + reg + offset)) regA
>> 
>>(The offset is out of range that the inner address doesn't pass
>> constraint check.)
>> 
>> into
>> 
>> regB <- reg + reg + offset
>>(set (mem (regB)) regA)
>
> This is exactly what is supposed to happen in that case. All addresses which
> might be invalid to be used directly in an insn can be handled with load
> address or load address relative long (+ secondary reload for odd addresses).
>
>> is treated as a spill after r237277, so I feel the following check in
>> lra-constraints.c also needs relaxing?
>> 
>> if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
>> 
>> 
>> Comments?

I am testing a fix which teaches LRA not to add the spill cost if it's "offmemok".

-- 
Regards,
Jiong


[PATCH,rs6000] Fix error in Power9 code generation for vpermr instruction

2016-06-29 Thread Kelvin Nilsen

Testing on a Power9 simulator revealed an error in the code emitted for
the *altivec_vpermr_<mode>_internal define_insn pattern.  Two of the
operands of the vpermr instruction had been emitted in the wrong order.  This
patch corrects the error.

Testing of this patch on a Power9 simulator demonstrates that this
patch corrects all of the following tests, which had been failing on
the simulator prior to this patch:

g++ tests:
> PASS: c-c++-common/torture/vector-subscript-1.c   -O1  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -O2  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -Os  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O1  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O2  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O3 -g  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -Os  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: c-c++-common/torture/vector-subscript-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -O0  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -O1  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -O2  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -Os  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: g++.dg/torture/vshuf-v16qi.C   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -O0  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -O1  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -O2  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -Os  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: g++.dg/torture/vshuf-v2di.C   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -O0  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -O1  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -O2  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -Os  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: g++.dg/torture/vshuf-v4si.C   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -O0  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -O1  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -O2  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -Os  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: g++.dg/torture/vshuf-v8hi.C   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: g++.old-deja/g++.other/sizeof4.C  -std=c++14  (test for errors, line 34)
> PASS: g++.old-deja/g++.other/sizeof4.C  -std=c++14 (test for excess errors)

gcc tests:
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O0  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O1  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O2  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O3 -g  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -Os  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> PASS: gcc.c-torture/execute/scal-to-vec1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -O1  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -O2  execution test
> PASS: c-c++-common/torture/vector-subscript-1.c   -Os  execution t

Re: [PATCH, rs6000] Use direct moves for __builtin_signbit for 128-bit floating-point

2016-06-29 Thread Segher Boessenkool
Hi,

On Tue, Jun 28, 2016 at 04:44:08PM -0500, Bill Schmidt wrote:
> +void
> +rs6000_split_signbit (rtx dest, rtx src)
> +{
> +  machine_mode d_mode = GET_MODE (dest);
> +  machine_mode s_mode = GET_MODE (src);
> +  rtx dest_di = (d_mode == DImode) ? dest : gen_lowpart (DImode, dest);
> +  rtx shift_reg = dest_di;
> +
> +  gcc_assert (REG_P (dest));
> +  gcc_assert (REG_P (src) || MEM_P (src));
> +
> +  if (MEM_P (src))
> +{
> +  rtx mem;
> +
> +  if (s_mode == SFmode)
> + mem = gen_rtx_SIGN_EXTEND (DImode, adjust_address (src, SImode, 0));
> +
> +  else if (GET_MODE_SIZE (s_mode) == 16 && !WORDS_BIG_ENDIAN)
> + mem = adjust_address (src, DImode, 8);
> +
> +  else
> + mem = adjust_address (src, DImode, 0);

else if (s_mode == DFmode)
  ...
else
  gcc_unreachable ();

Or does this case catch some other modes, too?  Make those explicit, then?
Or make everything based on size (not mode).

And put it in order of size if you make that change?

> +  if (FLOAT128_IEEE_P (mode))
> +{
> +  if (mode == KFmode)
> + {
> +   emit_insn (gen_signbitkf2_dm (operands[0], operands[1]));
> +   DONE;
> + }
> +  else if (mode == TFmode)
> + {
> +   emit_insn (gen_signbittf2_dm (operands[0], operands[1]));
> +   DONE;
> + }
> +  else
> + gcc_unreachable ();
> +}

If you put a single DONE at the end here things will look simpler.

Why does the similar thing in rs6000.c construct the RTL by hand?  This
way is safer.

> --- gcc/testsuite/gcc.target/powerpc/signbit-1.c  (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/signbit-1.c  (working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */

Default args aren't necessary.

> --- gcc/testsuite/gcc.target/powerpc/signbit-3.c  (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/signbit-3.c  (working copy)
> @@ -0,0 +1,172 @@
> +/* { dg-do run { target { powerpc*-*-linux* } } } */
> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-options "-mcpu=power7 -O2 -mfloat128 -static-libgcc -lm" } */

Why does this need -static-libgcc?


Segher


Re: Improve insert/emplace robustness to self insertion

2016-06-29 Thread Jonathan Wakely

On 29/06/16 21:43 +0200, François Dumont wrote:
I tried those changes too but started having failing tests in 
vector/ext_pointer so prefer to not touch that for the moment. I think 
the compilation error was coming from the change of begin() + 
(__position - cbegin()) into begin() + __n because of overloaded 
operator+. The 2 other changes should be fine for a future patch.


By the way, I fixed that in my patch too. The problem was that __n had
type size_type, but for adding to iterators it should really be
difference_type. Simply using auto makes it work :-)
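
The type issue described above can be shown with a small stand-alone sketch.
The helper to_mutable is hypothetical and only mirrors the shape of the fix:
subtracting two iterators yields the container's difference_type, and
letting auto deduce that type keeps the later iterator addition well-formed.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: converts a const_iterator into an iterator by
// computing its offset once and reusing it for iterator arithmetic.
template <typename T>
typename std::vector<T>::iterator
to_mutable(std::vector<T>& v, typename std::vector<T>::const_iterator pos) {
  const auto n = pos - v.cbegin();  // deduced as difference_type
  static_assert(
      std::is_same<typename std::remove_const<decltype(n)>::type,
                   typename std::vector<T>::difference_type>::value,
      "iterator subtraction yields difference_type, not size_type");
  assert(n >= 0);
  return v.begin() + n;             // iterator + difference_type
}
```

Declaring n as size_type would compile for plain pointers, but it can pick
the wrong overloaded operator+ for pointer-like types such as the ones
exercised by the ext_pointer tests.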

Unless you see any problems in my patch I'll finish testing it
tomorrow and commit it with your new testcases.

Thanks for inspiring me to investigate what we were doing wrong! :-)





[PATCH], Fix PowerPC bug 71677, make xalancbmk build in Spec 2006

2016-06-29 Thread Michael Meissner
Sigh, I forgot to attach the patch, and I also used the wrong bug number in my
previous patch.

As we discussed in the patch review, there were some issues with using %Y for
the ISA 3.0 instructions LXSD and STXSD.

I have rewritten the patch so that we have a new memory constraint (%wY) that
explicitly targets those instructions.  I went back to the mov{DF,DD} patterns
and changed their use of %o to %wY.
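
The displacement rule behind the new constraint can be illustrated outside of
GCC. The sketch below models only the DS-form requirement, that is a signed
16-bit byte offset whose bottom 2 bits are zero. The helper is_ds_form_offset
is hypothetical: it is not the predicate added by the patch, and it ignores
the offsettable and no-update requirements.

```cpp
// Hypothetical check for a DS-form displacement: the instruction encodes a
// 14-bit field that is shifted left by 2, so the byte offset must fit in a
// signed 16-bit range and be a multiple of 4.
constexpr bool is_ds_form_offset(long offset) {
  return offset >= -32768 && offset <= 32767 && (offset & 3) == 0;
}
```

An offset such as 6 fits a plain 16-bit displacement but not this form, which
is why a looser constraint can let through an address these instructions
cannot encode.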

I removed the test that was generated from the XalanNamespacesStack.cpp source
that showed up the problem.

I have bootstrapped this on a little endian power8 system and there were no
regressions in the test suite.  I also built Spec 2006 for power9 with this
compiler, and the xalancbmk benchmark now builds.  I will kick off a big endian
build on a power7 system.

Assuming there are no regressions in power7, are these patches ok to install in
the trunk, and backport to GCC 6.2 after a burn-in period?

2016-06-29  Michael Meissner  

PR target/71677
* config/rs6000/constraints.md (wY constraint): New constraint to
match the requirements for the LXSD and STXSD instructions.
* config/rs6000/predicates.md (offsettable_mem_14bit_operand): New
predicate to match the requirements for the LXSD and STXSD
instructions.
* config/rs6000/rs6000.md (mov<mode>_hardfloat32, FMOVE64 case):
Use constraint wY for LXSD/STXSD instructions instead of 'o' or 'Y'
to make sure that the bottom 2 bits of offset are 0, the address
form is offsettable, and no updating is done in the address mode.
(mov<mode>_hardfloat64, FMOVE64 case): Likewise.
(movdi_internal32): Likewise.
(movdi_internal64): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 237826)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -185,6 +185,11 @@ (define_constraint "wS"
   "Vector constant that can be loaded with XXSPLTIB & sign extension."
   (match_test "xxspltib_constant_split (op, mode)"))
 
+;; ISA 3.0 D-form instruction that has the bottom 2 bits 0 (LXSD or STXSD).
+(define_memory_constraint "wY"
+  "Offsettable memory operand, with bottom 2 bits 0"
+  (match_operand 0 "offsettable_mem_14bit_operand"))
+
 ;; Altivec style load/store that ignores the bottom bits of the address
 (define_memory_constraint "wZ"
   "Indexed or indirect memory operand, ignoring the bottom 4 bits"
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 237826)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -729,6 +729,15 @@ (define_predicate "offsettable_mem_opera
   (and (match_operand 0 "memory_operand")
(match_test "offsettable_nonstrict_memref_p (op)")))
 
+;; Return 1 if the operand is an offsettable memory operand for ISA 3.0
+;; scalar LXSD/STXSD that must have the bottom 2 bits 0 and no update
+;; form
+(define_predicate "offsettable_mem_14bit_operand"
+  (and (match_operand 0 "memory_operand")
+   (match_test "offsettable_nonstrict_memref_p (op)")
+   (match_test "mem_operand_gpr (op, mode)")
+   (not (match_test "update_address_mem  (op, mode)"))))
+
 ;; Return 1 if the operand is suitable for load/store quad memory.
 ;; This predicate only checks for non-atomic loads/stores (not lqarx/stqcx).
 (define_predicate "quad_memory_operand"
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 237826)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6773,8 +6773,8 @@ (define_split
 ;; except for 0.0 which can be created on VSX with an xor instruction.
 
 (define_insn "*mov_hardfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,Z,,o,,,!r,Y,r,!r")
-   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,Z,,o,r,Y,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,Z,,wY,,,!r,Y,r,!r")
+   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,Z,,wY,r,Y,r"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && 
TARGET_DOUBLE_FLOAT 
&& (gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))"
@@ -6812,8 +6812,8 @@ (define_insn "*mov_softfloat32"
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*mov_hardfloat64"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,o,,Z,,,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,")
-   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,o,,Z,r,Y,r,r,h,0,wg,r,,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,wY,,Z,,,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,")
+   (match_operand:FMOVE6

Re: [PATCH] Fix source locations of bad enum values (PR c/71610 and PR c/71613)

2016-06-29 Thread Joseph Myers
On Wed, 22 Jun 2016, David Malcolm wrote:

> PR c/71613 identifies a problem where we fail to report this enum:
> 
>   enum { e1 = LLONG_MIN };
> 
> with -pedantic, due to LLONG_MIN being inside a system header.
> 
> This patch updates the C and C++ frontends to use the location of the
> name as the primary location in the diagnostic, supplying the location
> of the value as a secondary location, fixing the issue.
> 
> Before:
>   $ gcc -c /tmp/test.c -Wpedantic
>   /tmp/test.c: In function 'main':
>   /tmp/test.c:3:14: warning: ISO C restricts enumerator values to range of 'int' [-Wpedantic]
>  enum { c = -30 };
> ^
> 
> After:
>   $ ./xgcc -B. -c /tmp/test.c -Wpedantic
>   /tmp/test.c: In function 'main':
>   /tmp/test.c:3:10: warning: ISO C restricts enumerator values to range of 'int' [-Wpedantic]
>  enum { c = -30 };
> ^   ~~~
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
> adds 13 PASS results to gcc.sum and 9 PASS results to g++.sum.
> 
> OK for trunk?

The C front-end changes are OK for the diagnostic improvement, although I 
think piecemeal fixes make no sense for the issue of missing diagnostics - 
in the absence of a clear strategy to identify which cases should or 
should not get diagnostics when a macro from a system header is expanded 
outside a system header, we need to make the default to give the 
diagnostics if the expansion location is outside a system header.  (This 
applies both to the general diagnostic disabling outside system headers, 
and to specific tests for being in system headers in front ends - those 
specific tests may or may not be relics left over from before the general 
disabling was added.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[patch] Get rid of stack trampolines for nested functions

2016-06-29 Thread Eric Botcazou
Hi,

this patch implements generic support for the elimination of stack trampolines 
and, consequently, of the need to make the stack executable when pointers to 
nested functions are used.  That's done on a per-language and per-target basis 
(i.e. there is 1 language hook and 1 target hook to parameterize it) and there 
are no changes whatsoever in code generation if both are not turned on (and 
the patch implements a -ftrampolines option to let the user override them).

The idea is based on the fact that, for targets using function descriptors as 
per their ABI like IA-64, AIX or VMS platforms, stack trampolines "degenerate" 
into descriptors built at run time on the stack and thus made up of data only, 
which in turn means that the stack doesn't need to be made executable.

This descriptor-based scheme is implemented generically for nested functions, 
i.e. the nested function lowering pass builds generic descriptors instead of 
trampolines on the stack when encountering pointers to nested functions, which 
means that there are 2 kinds of pointers to functions and therefore a run-time 
identification mechanism is needed for indirect calls to distinguish them.
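
As a rough illustration of why two kinds of function pointers force a
run-time check at indirect call sites, here is a sketch in which a descriptor
is plain data (code address plus static chain). Everything in it is
hypothetical: the types Descriptor and FuncRef, the helper call_indirect, and
in particular the explicit tag, which is used purely for clarity and is not
meant to describe the encoding the patch uses.

```cpp
#include <cassert>

// A descriptor is data only: the code address of the nested function plus
// the static chain (a pointer to the enclosing function's frame).
struct Descriptor {
  int (*code)(void* static_chain, int arg);
  void* static_chain;
};

// Hypothetical tagged reference: exactly one member is non-null.
struct FuncRef {
  int (*plain)(int arg);
  const Descriptor* descr;
};

// The indirect call site has to identify which kind of pointer it holds.
int call_indirect(const FuncRef& f, int arg) {
  if (f.descr != nullptr)
    return f.descr->code(f.descr->static_chain, arg);
  assert(f.plain != nullptr);
  return f.plain(arg);
}

// Stand-ins for an ordinary function and a lowered nested function.
struct Frame { int base; };
int twice(int x) { return 2 * x; }
int add_base(void* chain, int x) {
  return static_cast<Frame*>(chain)->base + x;
}
```

Because the descriptor lives on the stack as data and is never executed, the
stack does not need to be executable; the cost is the extra test on every
indirect call.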

Because of that, enabling the support breaks binary compatibility (for code 
manipulating pointers to nested functions).  That's OK for Ada, where nested 
functions are first-class citizens in the language so we really need this, 
but not for C; so, for example, Ada doesn't use it at the interface with C 
(when objects have "convention C" in Ada parlance).

This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been 
using it on native platforms (Linux, Windows, Solaris, etc) for years.

OK for the mainline?


2016-06-29  Eric Botcazou  

PR ada/37139
PR ada/67205
* common.opt (-ftrampolines): New option.
* doc/invoke.texi (Code Gen Options): Document it.
* doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS
* doc/tm.texi: Regenerate.
* builtins.def: Add init_descriptor and adjust_descriptor.
* builtins.c (expand_builtin_init_trampoline): Do not issue a warning
on platforms with descriptors.
(expand_builtin_init_descriptor): New function.
(expand_builtin_adjust_descriptor): Likewise.
(expand_builtin) : New case.
: Likewise.
* calls.c (prepare_call_address): Remove SIBCALLP parameter and add
FLAGS parameter.  Deal with indirect calls by descriptor and adjust.
Set STATIC_CHAIN_REG_P on the static chain register, if any.
(call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor.
(expand_call): Likewise.  Move around call to prepare_call_address
and pass all flags to it.
* cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR.
* gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value.
(gimple_call_set_by_descriptor): New setter.
(gimple_call_by_descriptor_p): New getter.
* gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR.
(gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR.
* langhooks.h (struct lang_hooks): Add custom_function_descriptors.
* langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define.
(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS.
* rtl.h (STATIC_CHAIN_REG_P): New macro.
* rtlanal.c (find_first_parameter_load): Skip static chain registers.
* target.def (custom_function_descriptors): New POD hook.
* tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR.
(CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR.
* tree-core.h (ECF_BY_DESCRIPTOR): New mask.
Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR.
* tree.c (make_node_stat) : Set function alignment to
DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY.
(build_common_builtin_nodes): Initialize init_descriptor and
adjust_descriptor.
* tree-nested.c: Include target.h.
(struct nesting_info): Add 'any_descr_created' field.
(get_descriptor_type): New function.
(lookup_element_for_decl): New function extracted from...
(create_field_for_decl): Likewise.
(lookup_tramp_for_decl): ...here.  Adjust.
(lookup_descr_for_decl): New function.
(convert_tramp_reference_op): Deal with descriptors.
(build_init_call_stmt): New function extracted from...
(finalize_nesting_tree_1): ...here.  Adjust and deal with descriptors.
* defaults.h (DEFAULT_FUNCTION_ALIGNMENT): Define.
(TRAMPOLINE_ALIGNMENT): Set to above instead of FUNCTION_BOUNDARY.
* config/aarch64/aarch64.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Define.
* config/alpha/alpha.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
* config/arm/arm.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
* config/arm/arm.c

Re: [PATCH], Add PowerPC ISA 3.0 vector extract support

2016-06-29 Thread Segher Boessenkool
On Wed, Jun 29, 2016 at 03:33:24PM -0400, Michael Meissner wrote:
> Is it ok to install in the trunk?

See comments below.  Okay with that taken care of.

> Ideally, I would like to also install it on the GCC 6.2 branch.  Note, it
> depends on the June 15th changes to allow DImode into Altivec registers, and
> the fix for PR 71677 that was submitted on June 27th being installed before
> these patches can be applied.

Is the DImode thing not on trunk yet?

> Assuming the above patches are supplied can I back port it to the GCC 6.2
> after a burn-in period?

Yes, thanks.


> +;; Mode attribute to give the instruction for vector extract.  %3 contains the
> +;; adjusted element count.
> +(define_mode_attr VSX_EXTRACT_INSN [(V16QI "vextractub %0,%1,%3")
> + (V8HI  "vextractuh %0,%1,%3")
> + (V4SI  "xxextractuw %x0,%x1,%3")])

Could you use %2 instead, please?  Much less surprising.  And please inline
it in the one place it is used.

> +(define_insn  "vsx_extract__di"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=")
> + (zero_extend:DI
> +  (vec_select:
> +   (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "")
> +   (parallel [(match_operand:QI 2 "" "n")]]
> +  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> +{
> +  int element = INTVAL (operands[2]);
> +  int unit_size = GET_MODE_UNIT_SIZE (mode);
> +  int offset = ((VECTOR_ELT_ORDER_BIG)
> + ? unit_size * element
> + : unit_size * (GET_MODE_NUNITS (mode) - 1 - element));
> +
> +  operands[3] = GEN_INT (offset);
> +  return "";
> +}

So that would be

  operands[2] = GEN_INT (offset);
  if (unit_size == 4)
    return "xxextractuw %x0,%x1,%2";
  else
    return "vextractu %0,%1,%2";

or such.
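
For reference, the offset computation in the quoted pattern can be written as
a stand-alone function and checked by hand. This is a sketch only: the name
vextract_byte_offset and its parameters are hypothetical stand-ins for
VECTOR_ELT_ORDER_BIG, GET_MODE_UNIT_SIZE and GET_MODE_NUNITS.

```cpp
// Byte offset of `element` inside the vector register, given the element
// order, the size of one element in bytes, and the number of elements.
constexpr int vextract_byte_offset(bool elt_order_big, int unit_size,
                                   int nunits, int element) {
  return elt_order_big ? unit_size * element
                       : unit_size * (nunits - 1 - element);
}
```

For a four-element vector of 4-byte units, element 0 maps to byte offset 0
in big-endian element order and to byte offset 12 in little-endian order.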


Segher


Re: [PATCH], Add PowerPC ISA 3.0 vector extract support

2016-06-29 Thread Michael Meissner
On Wed, Jun 29, 2016 at 05:20:26PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 29, 2016 at 03:33:24PM -0400, Michael Meissner wrote:
> > Is it ok to install in the trunk?
> 
> See comments below.  Okay with that taken care of.
> 
> > Ideally, I would like to also install it on the GCC 6.2 branch.  Note, it
> > depends on the June 15th changes to allow DImode into Altivec registers, and
> > the fix for PR 71677 that was submitted on June 27th being installed before
> > these patches can be applied.
> 
> Is the DImode thing not on trunk yet?

The DImode patch is on the trunk.  It is not on 6.2, and in fact since this
patch fixes a bug in that patch, both patches need to be applied at the same
time.

> > Assuming the above patches are supplied can I back port it to the GCC 6.2 after
> > a burn-in period?
> 
> Yes, thanks.
> 
> 
> > +;; Mode attribute to give the instruction for vector extract.  %3 contains the
> > +;; adjusted element count.
> > +(define_mode_attr VSX_EXTRACT_INSN [(V16QI "vextractub %0,%1,%3")
> > +   (V8HI  "vextractuh %0,%1,%3")
> > +   (V4SI  "xxextractuw %x0,%x1,%3")])
> 
> Could you use %2 instead, please?  Much less surprising.  And please inline
> it in the one place it is used.

I generally don't like overwriting existing arguments when you are doing the
modification, but I can use %2 if you prefer.

> > +(define_insn  "vsx_extract__di"
> > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=")
> > +   (zero_extend:DI
> > +(vec_select:
> > + (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "")
> > + (parallel [(match_operand:QI 2 "" "n")]]
> > +  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> > +{
> > +  int element = INTVAL (operands[2]);
> > +  int unit_size = GET_MODE_UNIT_SIZE (mode);
> > +  int offset = ((VECTOR_ELT_ORDER_BIG)
> > +   ? unit_size * element
> > +   : unit_size * (GET_MODE_NUNITS (mode) - 1 - element));
> > +
> > +  operands[3] = GEN_INT (offset);
> > +  return "";
> > +}
> 
> So that would be
> 
>   operands[2] = GEN_INT (offset);
>   if (unit_size == 4)
>     return "xxextractuw %x0,%x1,%2";
>   else
>     return "vextractu %0,%1,%2";

Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Fix bogus option suggestions for RejectNegative options (PR driver/71651)

2016-06-29 Thread Joseph Myers
On Fri, 24 Jun 2016, David Malcolm wrote:

> PR driver/71651 identifies an issue where the driver can make
> nonsensical suggestions for options flagged as RejectNegative:
> 
>  $ gcc -fno-stack-protector-explicit
>  gcc: error: unrecognized command line option '-fno-stack-protector-explicit';
>  did you mean '-fno-stack-protector-explicit'?
> 
> where the driver erroneously suggests the same bogus option that was
> provided.
> 
> Fixed thusly.
> 
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk and gcc-6-branch?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
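
The quoted report does not show the fix itself, so the following is only an illustrative sketch of one way such a bogus suggestion can be avoided: when building the list of spellings to suggest, the negated "-fno-" form is added only for options that accept negation. The struct opt_desc, its fields, and build_candidates are hypothetical names and do not correspond to the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CAND_LEN 64

/* Hypothetical option description: NAME is the option without its
   leading "-f"; REJECT_NEGATIVE is true if no "-fno-" form exists.  */
struct opt_desc
{
  const char *name;
  bool reject_negative;
};

/* Fill OUT (capacity CAP entries) with the spellings that may be
   offered as suggestions and return how many were written.  */
static size_t
build_candidates (const struct opt_desc *opts, size_t n_opts,
		  char out[][CAND_LEN], size_t cap)
{
  size_t n = 0;

  for (size_t i = 0; i < n_opts; i++)
    {
      /* "-fno-" plus the terminating NUL needs six extra bytes.  */
      assert (strlen (opts[i].name) + 6 <= CAND_LEN);

      if (n < cap)
	snprintf (out[n++], CAND_LEN, "-f%s", opts[i].name);

      /* Only options that can be negated get a "-fno-" candidate.  */
      if (!opts[i].reject_negative && n < cap)
	snprintf (out[n++], CAND_LEN, "-fno-%s", opts[i].name);
    }

  return n;
}
```

With this filtering, an option marked as rejecting negation contributes only its positive spelling, so a mistyped "-fno-" form can no longer be suggested back to the user unchanged.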


Re: [PATCH] Offer suggestions for misspelled --param names.

2016-06-29 Thread Joseph Myers
On Mon, 27 Jun 2016, David Malcolm wrote:

> Another use of spellcheck.{c|h}, this time for --param.
> 
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
> adds 4 PASS results to gcc.sum.
> 
> OK for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
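
As a rough illustration of how a suggestion for a misspelled name can be chosen, the sketch below picks the candidate with the smallest edit distance to what was typed. It is a standalone example under stated assumptions, not the code in spellcheck.{c|h}: the names edit_distance and closest_name, the fixed MAX_NAME_LEN limit, and the "at most half the typed length" cutoff are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 63

/* Levenshtein distance between S and T, or (size_t) -1 if either
   string is longer than MAX_NAME_LEN.  */
static size_t
edit_distance (const char *s, const char *t)
{
  assert (s != NULL && t != NULL);

  size_t len_s = strlen (s), len_t = strlen (t);
  size_t prev[MAX_NAME_LEN + 1], cur[MAX_NAME_LEN + 1];

  if (len_s > MAX_NAME_LEN || len_t > MAX_NAME_LEN)
    return (size_t) -1;

  /* Distance from the empty prefix of S to each prefix of T.  */
  for (size_t j = 0; j <= len_t; j++)
    prev[j] = j;

  for (size_t i = 1; i <= len_s; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= len_t; j++)
	{
	  size_t best = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
	  if (prev[j] + 1 < best)	/* deletion */
	    best = prev[j] + 1;
	  if (cur[j - 1] + 1 < best)	/* insertion */
	    best = cur[j - 1] + 1;
	  cur[j] = best;
	}
      memcpy (prev, cur, (len_t + 1) * sizeof prev[0]);
    }

  return prev[len_t];
}

/* Return the entry of NAMES closest to TYPED, or NULL if even the
   best match differs in more than half of TYPED's characters.  */
static const char *
closest_name (const char *typed, const char *const *names, size_t n)
{
  const char *best = NULL;
  size_t best_dist = (size_t) -1;

  for (size_t i = 0; i < n; i++)
    {
      size_t d = edit_distance (typed, names[i]);
      if (d < best_dist)
	{
	  best_dist = d;
	  best = names[i];
	}
    }

  if (best != NULL && best_dist > strlen (typed) / 2)
    return NULL;
  return best;
}
```

Given a candidate list containing "max-unroll-times", the input "max-unrol-times" is one edit away and would be matched to it, while an unrelated string such as "zzzz" yields no suggestion at all.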


Re: [PATCH 0/9] separate shrink-wrapping

2016-06-29 Thread Bernd Schmidt

On 06/08/2016 07:26 PM, Segher Boessenkool wrote:

> One thing I should try is put a USE of the saved registers at such
> exits, maybe that helps those passes that now delete frame restores
> to not do that.


Have you had a chance to try this?


Bernd

