Re: [PATCH] [Aarch64] Optimize subtract in shift counts

2017-08-22 Thread Richard Sandiford
Richard Biener  writes:
> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>  wrote:
>>Richard Biener  writes:
>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>>>  wrote:
> Correct. It is truncated for integer shift, but not simd shift
> instructions. We generate a pattern in the split that only
>>generates
> the integer shift instructions.

 That's unfortunate, because it would be nice to do this in
>>simplify_rtx,
 since it's machine-independent, but that has to be conditioned on
 SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>
>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in
>>> the patterns, like for example with
>>>
>>> (define_insn ashlSI3
>>>   [(set (match_operand 0 "")
>>>  (ashl:SI (match_operand ... )
>>>  (subreg:QI (match_operand:SI ...)))]
>>>
>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>magic
>>> optimization we expect.
>>
>>The problem with the explicit AND is that you'd end up with either
>>an AND of two constants for constant shifts, or with two separate
>>patterns,
>>one for constant shifts and one for variable shifts.  (And the problem
>>in
>>theory with two patterns is that it reduces the RA's freedom, although
>>in
>>practice I guess we'd always want a constant shift where possible for
>>cost reasons, and so the RA would never need to replace pseudos with
>>constants itself.)
>>
>>I think all useful instances of this optimisation will be exposed by
>>the gimple optimisers, so maybe expand could to do it based on
>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than
>>the rtx code and it does take the mode into account.
>
> Sure, that could work as well and also take into account range info. But
> we'd then need named expanders and the result would still have the
> explicit and or need to be an unspec or a different RTL operation.

Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
target-dependent rather than undefined behaviour, so it's OK
for a target to use shift codes with out-of-range values.  And
TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
how the normal shift optabs behave, so I don't think we'd need new
optabs or new unspecs.

E.g. it already works this way when expanding double-word shifts,
which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's
possible to use a shorter sequence if you know that the shift optab
truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED
isn't defined.

Thanks,
Richard

>
> Richard. 
>
>>Thanks,
>>Richard


Re: [PATCH] Fix fallout from VRP strict-overflow changes

2017-08-22 Thread Richard Biener
On Mon, 21 Aug 2017, Martin Sebor wrote:

> On 08/21/2017 01:51 AM, Richard Biener wrote:
> > On Sat, 19 Aug 2017, Andreas Schwab wrote:
> > 
> > > On Aug 17 2017, Richard Biener  wrote:
> > > 
> > > > I was notifed I broke proper handling of undefined overflow in
> > > > multiplicative ops handling.  The following resurrects previous
> > > > behavior (and adds a testcase).
> > > > 
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > 
> > > This breaks gfortran.dg/alloc_comp_auto_array_2.f90 on aarch64 with
> > > -mabi=ilp32 (only for -O3):
> > > 
> > > FAIL: gfortran.dg/alloc_comp_auto_array_2.f90   -O3 -g  (test for excess
> > > errors)
> > > Excess errors:
> > > /opt/gcc/gcc-20170818/gcc/testsuite/gfortran.dg/alloc_comp_auto_array_2.f90:33:0:
> > > Warning: '__builtin_memcpy' specified size between 2147483648 and
> > > 4294967295 exceeds maximum object size 2147483647 [-Wstringop-overflow=]
> > 
> > I believe this is an issue that went latent when I broke VRP earlier.
> > 
> > I have opened PR81908, will amend with some initial analysis.
> 
> FWIW, the core of the problem is that the warning tends to either
> expose limitations in optimizations that were not written to make
> use of range information, or indicate additional optimization
> opportunities due to range information.  In this case, since
> the only valid value in the range the memcpy argument is in (i.e.,
> ~[0, INT_MAX]) is zero, the call could be eliminated.  But this
> isn't noticed by any pass until the expander checks the call for
> validity.
> 
> It seems to me that this could be handled by enhancing gimple-fold
> in two ways: 1) fold arguments whose range contains only one valid
> value into constants, and 2) transform calls with one or more
> arguments entirely in invalid ranges into __builtin_unreachable.
> 
> I have been thinking prototyping this solution for a while but
> haven't gotten around to it yet so I can't say what problems it
> might run into.

Well, there will always be missed optimizations so the correct fix
for this is to not warn about memcpy with size == 0.

Sure, folding can be improved but the get_size_range code is broken
and has to be fixed.

Richard.


Re: [PATCH] [Aarch64] Optimize subtract in shift counts

2017-08-22 Thread Richard Biener
On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>>  wrote:
>>>Richard Biener  writes:
 On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
  wrote:
>> Correct. It is truncated for integer shift, but not simd shift
>> instructions. We generate a pattern in the split that only
>>>generates
>> the integer shift instructions.
>
> That's unfortunate, because it would be nice to do this in
>>>simplify_rtx,
> since it's machine-independent, but that has to be conditioned on
> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.

 SHIFT_COUNT_TRUNCATED should go ... you should express this in
 the patterns, like for example with

 (define_insn ashlSI3
   [(set (match_operand 0 "")
  (ashl:SI (match_operand ... )
  (subreg:QI (match_operand:SI ...)))]

 or an explicit and:SI and combine / simplify_rtx should apply the
>>>magic
 optimization we expect.
>>>
>>>The problem with the explicit AND is that you'd end up with either
>>>an AND of two constants for constant shifts, or with two separate
>>>patterns,
>>>one for constant shifts and one for variable shifts.  (And the problem
>>>in
>>>theory with two patterns is that it reduces the RA's freedom, although
>>>in
>>>practice I guess we'd always want a constant shift where possible for
>>>cost reasons, and so the RA would never need to replace pseudos with
>>>constants itself.)
>>>
>>>I think all useful instances of this optimisation will be exposed by
>>>the gimple optimisers, so maybe expand could to do it based on
>>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than
>>>the rtx code and it does take the mode into account.
>>
>> Sure, that could work as well and also take into account range info. But
>> we'd then need named expanders and the result would still have the
>> explicit and or need to be an unspec or a different RTL operation.
>
> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
> target-dependent rather than undefined behaviour, so it's OK
> for a target to use shift codes with out-of-range values.

Hmm, but that means simplify-rtx can't do anything with them because
we need to preserve target dependent behavior.  I think the RTL IL should
be always well-defined and its semantics shouldn't have any target
dependences (ideally, and if, then they should be well specified via
extra target hooks/macros).

>  And
> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
> how the normal shift optabs behave, so I don't think we'd need new
> optabs or new unspecs.
>
> E.g. it already works this way when expanding double-word shifts,
> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's
> possible to use a shorter sequence if you know that the shift optab
> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED
> isn't defined.

I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK
applies to the instructions generated by the named shift patterns but _not_
general shift RTXen.  But the generated pattern contains shift RTXen and how
can we figure whether they were generated by the named expanders or
by other means?  Don't define_expand also serve as define_insn for things
like combine?

That said, from looking at the docs and your description above it seems that
semantics are not fully reflected in the RTL instruction stream?

Richard.

> Thanks,
> Richard
>
>>
>> Richard.
>>
>>>Thanks,
>>>Richard


Re: [PATCH] [Aarch64] Optimize subtract in shift counts

2017-08-22 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>>>  wrote:
Richard Biener  writes:
> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>  wrote:
>>> Correct. It is truncated for integer shift, but not simd shift
>>> instructions. We generate a pattern in the split that only
generates
>>> the integer shift instructions.
>>
>> That's unfortunate, because it would be nice to do this in
simplify_rtx,
>> since it's machine-independent, but that has to be conditioned on
>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>
> SHIFT_COUNT_TRUNCATED should go ... you should express this in
> the patterns, like for example with
>
> (define_insn ashlSI3
>   [(set (match_operand 0 "")
>  (ashl:SI (match_operand ... )
>  (subreg:QI (match_operand:SI ...)))]
>
> or an explicit and:SI and combine / simplify_rtx should apply the
magic
> optimization we expect.

The problem with the explicit AND is that you'd end up with either
an AND of two constants for constant shifts, or with two separate
patterns,
one for constant shifts and one for variable shifts.  (And the problem
in
theory with two patterns is that it reduces the RA's freedom, although
in
practice I guess we'd always want a constant shift where possible for
cost reasons, and so the RA would never need to replace pseudos with
constants itself.)

I think all useful instances of this optimisation will be exposed by
the gimple optimisers, so maybe expand could to do it based on
TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than
the rtx code and it does take the mode into account.
>>>
>>> Sure, that could work as well and also take into account range info. But
>>> we'd then need named expanders and the result would still have the
>>> explicit and or need to be an unspec or a different RTL operation.
>>
>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
>> target-dependent rather than undefined behaviour, so it's OK
>> for a target to use shift codes with out-of-range values.
>
> Hmm, but that means simplify-rtx can't do anything with them because
> we need to preserve target dependent behavior.

Yeah, it needs to punt.  In practice that shouldn't matter much.

> I think the RTL IL should be always well-defined and its semantics
> shouldn't have any target dependences (ideally, and if, then they
> should be well specified via extra target hooks/macros).

That would be nice :-)  I think the problem has traditionally been
that shifts can be used in quite a few define_insn patterns besides
those for shift instructions.  So if your target defines shifts to have
256-bit precision (say) then you need to make sure that every define_insn
with a shift rtx will honour that.

It's more natural for target guarantees to apply to instructions than
to rtx codes.

>>  And
>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
>> how the normal shift optabs behave, so I don't think we'd need new
>> optabs or new unspecs.
>>
>> E.g. it already works this way when expanding double-word shifts,
>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's
>> possible to use a shorter sequence if you know that the shift optab
>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED
>> isn't defined.
>
> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK
> applies to the instructions generated by the named shift patterns but _not_
> general shift RTXen.  But the generated pattern contains shift RTXen and how
> can we figure whether they were generated by the named expanders or
> by other means?  Don't define_expand also serve as define_insn for things
> like combine?

Yeah, you can't (and aren't supposed to try to) reverse-engineer the
expander from the generated instructions.  TARGET_SHIFT_TRUNCATION_MASK
should only be used if you're expanding a shift optab.

> That said, from looking at the docs and your description above it seems that
> semantics are not fully reflected in the RTL instruction stream?

Yeah, the semantics go from being well-defined in the optab interface
to being target-dependent in the rtl stream.

Thanks,
Richard

>
> Richard.
>
>> Thanks,
>> Richard
>>
>>>
>>> Richard.
>>>
Thanks,
Richard


Re: [PATCH, gcc-7-branch] Backport PR80038

2017-08-22 Thread Richard Biener
On Mon, Aug 21, 2017 at 5:39 PM, Xi Ruoyao  wrote:
> On 2017-08-21 23:37 +0800, Xi Ruoyao wrote:
>> On 2017-04-25 09:30 -0600, Jeff Law wrote:
>> > On 04/14/2017 06:44 PM, Xi Ruoyao wrote:
>> > > On 2017-04-14 15:00 +0800, Xi Ruoyao wrote:
>> > > > On 2017-04-13 09:05 +0200, Richard Biener wrote:
>> > > >
>> > > > > Did you verify LTO bootstrap still works with the patch?
>> > > >
>> > > > I've just done a LTO bootstrapp (boarding a train :) ).
>> > > > It works with my patch.
>> > >
>> > > I've done dejagnu tests in lto.exp and built a Linux kernel
>> > > with lto bootstrapped GCC.   They seem good.
>> >
>> > Given Richi's general agreement around the in_lto_p, let's go with the
>> > patch on the trunk only.
>> >
>> > If you get positive feedback from Jan, then this can be backported to
>> > gcc-7 after it's been on the trunk for at least a week.
>>
>> We have 15 weeks now :)
>>
>> Re-tested on gcc-7-branch with lto-bootstrap.  No regressions.  Is it
>> OK to backport this to gcc-7-branch?
>
> I was too stupid so I didn't attach the patch :(
>
> It's attached here.

Ok for the gcc 7 branch.

Thanks,
Richard.

> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


[Ping~][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c

2017-08-22 Thread Jiong Wang

On 10/08/17 17:39, Jiong Wang wrote:

Hi,

  A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for 
ARMv8.3-A
return address signing, it is multiplexing DW_CFA_GNU_window_save in 
CFA vendor

extension space.

  This patch adds necessary code to make it available to external, the 
GDB
patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is 
intended

to use it.

  A new DW_CFA_DUP for it is added in dwarf2.def.  The use of 
DW_CFA_DUP is to

avoid duplicated case value issue when included in libiberty/dwarfnames.

  Native x86 builds OK to make sure no macro expanding errors.

  OK for trunk?

2017-08-10  Jiong Wang  

include/
* dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP.
* dwarf2.h (DW_CFA_DUP): New define.

libiberty/
* dwarfnames.c (DW_CFA_DUP): New define.



Ping~


Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-08-22 Thread Richard Biener
On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor  wrote:
> On 08/09/2017 10:14 AM, Jeff Law wrote:
>>
>> On 08/06/2017 05:08 PM, Martin Sebor wrote:
>>

 Well, simply because the way as implemented isn't a must-alias query
 but maybe one that's good enough for warnings (reduces false positives
 but surely doesn't eliminate them).
>>>
>>>
>>> I'm very interested in reducing the rate of false positives in
>>> these and all other warnings.  As I mentioned in my comments,
>>> I did my best to weed them out of the implementation by building
>>> GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
>>> a guarantee that there aren't any.  But the first implementation
>>> of any non-trivial feature is never perfect, and hardly any
>>> warning of sufficient complexity is free of false positives, no
>>> matter here it's implemented (the middle-end, front-end, or
>>> a standalone static analysis tool).  If you spotted some cases
>>> I had missed I'd certainly be grateful for examples.  Otherwise,
>>> they will undoubtedly be reported as more software is exposed
>>> to the warning and, if possible, fixed, as happens with all
>>> other warnings.
>>
>> I think Richi is saying that the must alias query you've built isn't
>> proper/correct.  It's less about false positives for Richi and more
>> about building a proper must alias query if I understand him correctly.
>>
>> I suspect he's also saying that you can't reasonably build must-alias on
>> top of a may-alias query framework.  They're pretty different queries.
>>
>> If you need something that is close to, but not quite a must alias
>> query, then you're going to have to make a argument for that and you
>> can't call it a must alias query.
>
>
> Attached is an updated and simplified patch that avoids making
> changes to any of the may-alias functions.  It turns out that
> all the information the logic needs to determine the overlap
> is already in the ao_ref structures populated by
> ao_ref_init_from_ptr_and_size.  The only changes to the pass
> are the enhancement to ao_ref_init_from_ptr_and_size to make
> use of range info and the addition of the two new functions
> used by the -Wrestrict clients outside the pass.

Warning for memcpy (p, p, ...) is going to fire false positives all around
given the C++ FE emits those in some cases and optimization can
expose that we are dealing with self-assignments.  And *p = *p is
valid.

@@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
}
}

+  /* Avoid folding the call if overlap is detected.  */
+  if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
+   return false;
+

no, please not.  You diagnosed the call (which might be a false positive)
so why keep it undefined?  The folded stmt will either have the same
semantics (aggregate copies expanded as memcpy) or have all reads
performed before writes.

The ao_ref_init_from_ptr_and_size change misses a changelog entry.

+detect_overlap (location_t loc, gimple *stmt, tree dst, tree src, tree size,
+   bool adjust /* = false */)
+{
+  ao_ref dstref, srcref;
+  unsigned HOST_WIDE_INT range[2];
+
+  /* Initialize and store the lower bound of a constant offset (in
+ bits), disregarding the offset for the destination.  */
+  ao_ref_init_from_ptr_and_size (&dstref, dst, size, range);
+  ao_ref_init_from_ptr_and_size (&srcref, src, size, range);

just pass NULL range to the first call?

-  ref->ref = NULL_TREE;
+
+  if (offrng)
+offrng[0] = offrng[1] = 0;
+
+ ref->ref = NULL_TREE;

bogus indent

+ else if (offrng && TREE_CODE (offset) == SSA_NAME)
+   {
+ wide_int min, max;
+ value_range_type rng = get_range_info (offset, &min, &max);
+ if (rng == VR_RANGE && wi::fits_uhwi_p (min))
+   {
+ ptr = gimple_assign_rhs1 (stmt);
+ offrng[0] = BITS_PER_UNIT * min.to_uhwi ();
+ offrng[1] = BITS_PER_UNIT * max.to_uhwi ();
+
+ extra_offset = offrng[0];

you didnt' check whether max fits uhwi.  The effect of passing offrng
is not documented.

+ else
+   /* Offset range is indeterminate.  */
+   offrng[0] = offrng[1] = HOST_WIDE_INT_M1U;

I believe a cleaner interface would be to do

void
ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size, tree
*var_byte_offset)

and set *var_byte_offset to your 'offset' above, leaving 'ref'
unchanged.  The caller
can then get at the range info of var_byte_offset and adjust
ref->offset if desired.
The indeterminate state is then much cleaner - NULL_TREE.

+unsigned HOST_WIDE_INT
+refs_overlap (ao_ref *ref1, ao_ref *ref2, unsigned HOST_WIDE_INT *aloff)
+{

bool
refs_must_overlap_p (ao_ref *, ao_ref *, unsigned HOST_WIDE_INT *off,
unsinged HOST_WIDE_INT *size)

your return values are in bytes thus

+
+  // Compare pointers.
+  offset1 += mem_ref_offset (base1) << LOG2_B

Re: [PATCH] Remove -feliminate-dwarf2-dups

2017-08-22 Thread Richard Biener
On Mon, 21 Aug 2017, Richard Biener wrote:

> 
> This was agreed upon and now that early LTO debug landed here it is.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> In case the gcc_unreachable () gets through I'll instead remove the
> if as noted in my followup TODO.

Applied w/o the

@@ -10259,7 +10001,10 @@ output_die (dw_die_ref die)
  on the section beginning, not on the actual DIE.  */
   && (!flag_generate_lto
  || die->die_tag != DW_TAG_compile_unit))
-output_die_symbol (die);
+{
+  output_die_symbol (die);
+  gcc_unreachable ();
+}

   dw2_asm_output_data_uleb128 (die->die_abbrev, "(DIE (%#lx) %s)",
   (unsigned long)die->die_offset,

hunk, I'll remove it as followup (passed LTO bootstrap and the dwarf
tests with -flto -g w/o ICEing).

Richard.

> Richard.
> 
> 2017-08-21  Richard Biener  
> 
>   * common.opt (feliminate-dwarf2-dups): Ignore.
>   * doc/invoke.texi (feliminate-dwarf2-dups): Remove documentation.
>   * dwarf2out.c (push_new_compile_unit, pop_compile_unit,
>   same_die_p_wrap, compute_section_prefix,
>   is_symbol_die, assign_symbol_names, break_out_includes): Remove.
>   (comdat_symbol_id, comdat_symbol_number): Likewise.
>   (cu_hash_table_entry, cu_hash_table_entry_hasher, cu_hash_type):
>   Likewise.
>   (check_duplicate_cu, record_comdat_symbol_number): Likewise.
>   (output_die): Mark unreachable path unreachable.
>   (dwarf2out_start_source_file): Do not create DW_TAG_GNU_BINCL.
>   (dwarf2out_end_source_file): Do not create DW_TAG_GNU_EINCL.
>   (dwarf2out_init): Remove code handling flag_eliminate_dwarf2_dups.
>   (dwarf2out_early_finish): Likewise.
> 
>   * g++.dg/debug/dwarf2/dwarf2-1.C: Remove -feliminate-dwarf2-dups.
>   * g++.dg/debug/dwarf2/dwarf2-2.C: Likewise.
>   * g++.dg/debug/dwarf2/pr46123-2.C: Likewise.
>   * g++.dg/debug/dwarf2/typedef5.C: Likewise.
>   * gcc.dg/debug/dwarf2/dwarf2-3.c: Likewise.
>   * gcc.dg/debug/dwarf2/dwarf2-3.h: Likewise.
>   * gcc.dg/debug/dwarf2/dups-types.c: Remove.
>   * gcc.dg/debug/dwarf2/dups-types.h: Likewise.
> 
> Index: gcc/common.opt
> ===
> --- gcc/common.opt(revision 251222)
> +++ gcc/common.opt(working copy)
> @@ -1303,8 +1303,8 @@ Common Report Var(flag_early_inlining) I
>  Perform early inlining.
>  
>  feliminate-dwarf2-dups
> -Common Report Var(flag_eliminate_dwarf2_dups)
> -Perform DWARF duplicate elimination.
> +Common Ignore
> +Does nothing.  Preserved for backward compatibility.
>  
>  fipa-sra
>  Common Report Var(flag_ipa_sra) Init(0) Optimization
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi   (revision 251222)
> +++ gcc/doc/invoke.texi   (working copy)
> @@ -347,7 +347,7 @@ Objective-C and Objective-C++ Dialects}.
>  -gcolumn-info  -gno-column-info @gol
>  -gvms  -gxcoff  -gxcoff+  -gz@r{[}=@var{type}@r{]} @gol
>  -fdebug-prefix-map=@var{old}=@var{new}  -fdebug-types-section @gol
> --feliminate-dwarf2-dups  -fno-eliminate-unused-debug-types @gol
> +-fno-eliminate-unused-debug-types @gol
>  -femit-struct-debug-baseonly  -femit-struct-debug-reduced @gol
>  -femit-struct-debug-detailed@r{[}=@var{spec-list}@r{]} @gol
>  -feliminate-unused-debug-symbols  -femit-class-debug-always @gol
> @@ -6834,8 +6834,8 @@ for maximum benefit.
>  
>  GCC no longer supports DWARF Version 1, which is substantially
>  different than Version 2 and later.  For historical reasons, some
> -other DWARF-related options (including @option{-feliminate-dwarf2-dups} 
> -and @option{-fno-dwarf2-cfi-asm}) retain a reference to DWARF Version 2
> +other DWARF-related options such as
> +@option{-fno-dwarf2-cfi-asm}) retain a reference to DWARF Version 2
>  in their names, but apply to all currently-supported versions of DWARF.
>  
>  @item -gstabs
> @@ -7027,12 +7027,6 @@ writing compressed debug sections, the o
>  if the assembler does not support them, @option{-gz} is silently ignored
>  when producing object files.
>  
> -@item -feliminate-dwarf2-dups
> -@opindex feliminate-dwarf2-dups
> -Compress DWARF debugging information by eliminating duplicated
> -information about each symbol.  This option only makes sense when
> -generating DWARF debugging information.
> -
>  @item -femit-struct-debug-baseonly
>  @opindex femit-struct-debug-baseonly
>  Emit debug information for struct-like types
> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c   (revision 251222)
> +++ gcc/dwarf2out.c   (working copy)
> @@ -3422,8 +3422,6 @@ static void equate_decl_number_to_die (t
>  static struct var_loc_node *add_var_loc_to_decl (tree, rtx, const char *);
>  static void print_spaces (FILE *);
>  static void print_die (dw_die_ref, FILE *);
> -static dw_die_ref push_new_c

[PATCH] Avoid re-allocating PHIs in split_edge

2017-08-22 Thread Richard Biener

The following patch makes sure to not grow the number of incoming
edges in the destination when doing split_edge on GIMPLE.  That's
easy by first redirecting the existing edge to the destination
to the new block rather than creating the new fallthru from the
new block to the destination.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-08-22  Richard Biener  

* tree-cfg.c (gimple_split_edge): Avoid reallocating target
PHI nodes.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 251215)
+++ gcc/tree-cfg.c  (working copy)
@@ -2844,10 +2844,11 @@ gimple_split_edge (edge edge_in)
   new_bb = create_empty_bb (after_bb);
   new_bb->frequency = EDGE_FREQUENCY (edge_in);
   new_bb->count = edge_in->count;
-  new_edge = make_single_succ_edge (new_bb, dest, EDGE_FALLTHRU);
 
   e = redirect_edge_and_branch (edge_in, new_bb);
   gcc_assert (e == edge_in);
+
+  new_edge = make_single_succ_edge (new_bb, dest, EDGE_FALLTHRU);
   reinstall_phi_args (new_edge, e);
 
   return new_bb;


Re: [PATCH] Switch on *.cc tests for g++ ASan

2017-08-22 Thread Richard Biener
On Wed, Aug 9, 2017 at 6:27 PM, Jeff Law  wrote:
> On 08/07/2017 11:59 PM, Slava Barinov wrote:
>>   * g++.dg/asan/asan.exp: Switch on *.cc tests.
>>
>> Signed-off-by: Slava Barinov 
>> ---
>>  gcc/testsuite/ChangeLog| 4 
>>  gcc/testsuite/g++.dg/asan/asan.exp | 2 +-
>>  2 files changed, 5 insertions(+), 1 deletion(-)
> Thanks.  Installed.

They all fail, I tried to fix that up but failed.  Thus reverted.

Richard.

> jeff


[AArch64, committed] Fix label mode

2017-08-22 Thread Richard Sandiford
This patch fixes a case where we tried to force a VOIDmode label
into a DImode register, which led to invalid rtl that was previously
undiagnosed.

Tested on aarch64-linux-gnu and applied as obvious.

Richard


2017-08-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.md (casesi): Use DImode rather than
VOIDmode for the LABEL_REF.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2017-08-22 10:11:45.067177420 +0100
+++ gcc/config/aarch64/aarch64.md   2017-08-22 10:13:02.285442249 +0100
@@ -498,7 +498,7 @@ (define_expand "casesi"
 const0_rtx),
operands[0], operands[2], operands[4]));
 
-operands[2] = force_reg (DImode, gen_rtx_LABEL_REF (VOIDmode, 
operands[3]));
+operands[2] = force_reg (DImode, gen_rtx_LABEL_REF (DImode, operands[3]));
 emit_jump_insn (gen_casesi_dispatch (operands[2], operands[0],
 operands[3]));
 DONE;


[AArch64] Remove use of wider vector modes

2017-08-22 Thread Richard Sandiford
The AArch64 port defined x2, x3 and x4 vector modes that were only used
in the rtl for the AdvSIMD LD{2,3,4} patterns.  It seems unlikely that
this rtl would have led to any valid simplifications, since the values
involved were unspecs that had a different number of operands from the
non-dreg versions.  (The dreg UNSPEC_LD2 had a single operand, while
the qreg one had two operands.)

As it happened, the patterns led to invalid simplifications on big-
endian targets due to a mix-up in the operand order, see Tamar's fix
in r240271.

This patch therefore replaces the rtl patterns with dedicated unspecs.
This allows the x2, x3 and x4 modes to be removed, avoiding a clash
with 256-bit and 512-bit SVE.

Tested on aarch64-linux-gnu.  Also tested by comparing the before and after
assembly for the testsuite at -O2 -ftree-vectorize on aarch64-linux-gnu
and aarch64_be-linux-gnu; there were no differences.  OK to install?

Richard


2017-08-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-modes.def: Remove 32-, 48- and 64-byte
vector modes.
* config/aarch64/iterators.md (VRL2, VRL3, VRL4): Delete.
* config/aarch64/aarch64.md (UNSPEC_LD2_DREG, UNSPEC_LD3_DREG)
(UNSPEC_LD4_DREG): New unspecs.
* config/aarch64/aarch64-simd.md (aarch64_ld2_dreg_le)
(aarch64_ld2_dreg_be): Replace with...
(aarch64_ld2_dreg): ...this pattern and use the new DREG
unspec.
(aarch64_ld3_dreg_le)
(aarch64_ld3_dreg_be): Replace with...
(aarch64_ld3_dreg): ...this pattern and use the new DREG
unspec.
(aarch64_ld4_dreg_le)
(aarch64_ld4_dreg_be): Replace with...
(aarch64_ld4_dreg): ...this pattern and use the new DREG
unspec.

Index: gcc/config/aarch64/aarch64-modes.def
===
--- gcc/config/aarch64/aarch64-modes.def2017-02-23 19:54:24.0 
+
+++ gcc/config/aarch64/aarch64-modes.def2017-08-22 10:11:04.724056356 
+0100
@@ -44,15 +44,5 @@ INT_MODE (OI, 32);
 INT_MODE (CI, 48);
 INT_MODE (XI, 64);
 
-/* Vector modes for register lists.  */
-VECTOR_MODES (INT, 32);/* V32QI V16HI V8SI V4DI.  */
-VECTOR_MODES (FLOAT, 32);  /* V8SF V4DF.  */
-
-VECTOR_MODES (INT, 48);/* V32QI V16HI V8SI V4DI.  */
-VECTOR_MODES (FLOAT, 48);  /* V8SF V4DF.  */
-
-VECTOR_MODES (INT, 64);/* V32QI V16HI V8SI V4DI.  */
-VECTOR_MODES (FLOAT, 64);  /* V8SF V4DF.  */
-
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2017-08-03 10:40:55.896279339 +0100
+++ gcc/config/aarch64/iterators.md 2017-08-22 10:11:04.727125997 +0100
@@ -711,21 +711,6 @@ (define_mode_attr Vendreg [(OI "T") (CI
 ;; ld..._lane and st..._lane operations.
 (define_mode_attr nregs [(OI "2") (CI "3") (XI "4")])
 
-(define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI")
-   (V4HF "V16HF")
-   (V2SI "V8SI")  (V2SF "V8SF")
-   (DI   "V4DI")  (DF   "V4DF")])
-
-(define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI")
-   (V4HF "V24HF")
-   (V2SI "V12SI")  (V2SF "V12SF")
-   (DI   "V6DI")  (DF   "V6DF")])
-
-(define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI")
-   (V4HF "V32HF")
-   (V2SI "V16SI")  (V2SF "V16SF")
-   (DI   "V8DI")  (DF   "V8DF")])
-
 ;; Mode for atomic operation suffixes
 (define_mode_attr atomic_sfx
   [(QI "b") (HI "h") (SI "") (DI "")])
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2017-08-16 08:50:34.060622654 +0100
+++ gcc/config/aarch64/aarch64.md   2017-08-22 10:11:04.726102783 +0100
@@ -98,10 +98,13 @@ (define_c_enum "unspec" [
 UNSPEC_GOTTINYTLS
 UNSPEC_LD1
 UNSPEC_LD2
+UNSPEC_LD2_DREG
 UNSPEC_LD2_DUP
 UNSPEC_LD3
+UNSPEC_LD3_DREG
 UNSPEC_LD3_DUP
 UNSPEC_LD4
+UNSPEC_LD4_DREG
 UNSPEC_LD4_DUP
 UNSPEC_LD2_LANE
 UNSPEC_LD3_LANE
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2017-08-17 17:29:27.227162205 +0100
+++ gcc/config/aarch64/aarch64-simd.md  2017-08-22 10:11:04.726102783 +0100
@@ -4963,278 +4963,62 @@ (define_expand "aarch64_ld_dreg_le"
+(define_insn "aarch64_ld2_dreg"
   [(set (match_operand:OI 0 "register_operand" "=w")
-   (subreg:OI
- (vec_concat:
-   (vec_concat:
-(unspec:VD
-   [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
-   UNSPEC_LD2)
-(vec_

[AArch64] Rename cmp_result iterator

2017-08-22 Thread Richard Sandiford
The comparison results provided by the V_cmp_result/v_cmp_result
attribute were simply the corresponding integer vector.  We'd also
like to have easy access to the integer vector for SVE, but using
"cmp_result" would be confusing because SVE comparisons return
predicates instead of vectors.  This patch therefore renames the
attributes to the more general V_INT_EQUIV/v_int_equiv instead.

As to the capitalisation: there are already many iterators that use
all lowercase vs. all uppercase names to distinguish all lowercase
vs. all uppercase expansions (e.g. fcvt_target and FCVT_TARGET).
It's also the convention used for the built-in mode/MODE/code/CODE/etc.
attributes.  IMO those names are easier to read at a glance, rather than
relying on a single letter's difference.

Tested on aarch64-linux-gnu.  OK to install?

Richard


2017-08-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/iterators.md (V_cmp_result): Rename to...
(V_INT_EQUIV): ...this.
(v_cmp_result): Rename to...
(v_int_equiv): ...this.
* config/aarch64/aarch64.md (xorsign3): Update accordingly.
* config/aarch64/aarch64-simd.md (xorsign3): Likewise.
(copysign3): Likewise.
(aarch64_simd_bsl_internal): Likewise.
(aarch64_simd_bsl): Likewise.
(vec_cmp): Likewise.
(vcond): Likewise.
(vcond): Likewise.
(vcondu): Likewise.
(aarch64_cm): Likewise.
(aarch64_cmtst): Likewise.
(aarch64_fac): Likewise.
(vec_perm_const): Likewise.
(vcond_mask_): Rename to...
(vcond_mask_): ...this.
(vec_cmp): Rename to...
(vec_cmp): ...this.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2017-08-22 10:11:04.727125997 +0100
+++ gcc/config/aarch64/iterators.md 2017-08-22 10:11:45.067177420 +0100
@@ -662,25 +662,25 @@ (define_mode_attr vwcore  [(V8QI "w") (V
 ;; Double vector types for ALLX.
 (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
 
-;; Mode of result of comparison operations.
-(define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
-   (V4HI "V4HI") (V8HI  "V8HI")
-   (V2SI "V2SI") (V4SI  "V4SI")
-   (DI   "DI")   (V2DI  "V2DI")
-   (V4HF "V4HI") (V8HF  "V8HI")
-   (V2SF "V2SI") (V4SF  "V4SI")
-   (V2DF "V2DI") (DF"DI")
-   (SF   "SI")   (HF"HI")])
+;; Mode with floating-point values replaced by like-sized integers.
+(define_mode_attr V_INT_EQUIV [(V8QI "V8QI") (V16QI "V16QI")
+  (V4HI "V4HI") (V8HI  "V8HI")
+  (V2SI "V2SI") (V4SI  "V4SI")
+  (DI   "DI")   (V2DI  "V2DI")
+  (V4HF "V4HI") (V8HF  "V8HI")
+  (V2SF "V2SI") (V4SF  "V4SI")
+  (V2DF "V2DI") (DF"DI")
+  (SF   "SI")   (HF"HI")])
 
-;; Lower case mode of results of comparison operations.
-(define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
-   (V4HI "v4hi") (V8HI  "v8hi")
-   (V2SI "v2si") (V4SI  "v4si")
-   (DI   "di")   (V2DI  "v2di")
-   (V4HF "v4hi") (V8HF  "v8hi")
-   (V2SF "v2si") (V4SF  "v4si")
-   (V2DF "v2di") (DF"di")
-   (SF   "si")])
+;; Lower case mode with floating-point values replaced by like-sized integers.
+(define_mode_attr v_int_equiv [(V8QI "v8qi") (V16QI "v16qi")
+  (V4HI "v4hi") (V8HI  "v8hi")
+  (V2SI "v2si") (V4SI  "v4si")
+  (DI   "di")   (V2DI  "v2di")
+  (V4HF "v4hi") (V8HF  "v8hi")
+  (V2SF "v2si") (V4SF  "v4si")
+  (V2DF "v2di") (DF"di")
+  (SF   "si")])
 
 ;; Mode for vector conditional operations where the comparison has
 ;; different type from the lhs.
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2017-08-22 10:11:04.726102783 +0100
+++ gcc/config/aarch64/aarch64.md   2017-08-22 10:11:45.067177420 +0100
@@ -5196,7 +5196,7 @@ (define_expand "xorsign3"
   "TARGET_FLOAT && TARGET_SIMD"
 {
 
-  machine_mode imode = mode;
+  machine_mode imode = mode;
   rtx mask = gen_reg_rtx (imode);
   rtx op1x = gen_reg_rtx (imode);
   rtx op2x = gen_reg_rtx (imode);
@@ -5205,13 +5205,13 @@ (define_expand "xorsign3"
   emit_move_insn (mask, GEN_INT (trunc_int

[AArch64] Tweak aarch64_classify_address interface

2017-08-22 Thread Richard Sandiford
Previously aarch64_classify_address used an rtx code to distinguish
LDP/STP addresses from normal addresses; the code was PARALLEL
to select LDP/STP and anything else to select normal addresses.
This patch replaces that parameter with a dedicated enum.

The SVE port will add another enum value that didn't map naturally
to an rtx code.

Tested on aarch64-linux-gnu.  OK to install?


2017-08-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New enum.
(aarch64_legitimate_address_p): Use it instead of an rtx code.
* config/aarch64/aarch64.c (aarch64_classify_address): Likewise.
(aarch64_legitimate_address_p): Likewise.
(aarch64_address_valid_for_prefetch_p): Update calls accordingly.
(aarch64_legitimate_address_hook_p): Likewise.
(aarch64_print_operand_address): Likewise.
(aarch64_address_cost): Likewise.
* config/aarch64/aarch64-simd.md (mov): Likewise.
* config/aarch64/constraints.md (Uad): Likewise.
* config/aarch64/predicates.md (aarch64_mem_pair_operand): Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2017-08-03 10:40:55.900281836 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-08-22 10:12:29.278432707 +0100
@@ -111,6 +111,19 @@ enum aarch64_symbol_type
   SYMBOL_FORCE_TO_MEM
 };
 
+/* Classifies the type of an address query.
+
+   ADDR_QUERY_M
+  Query what is valid for an "m" constraint and a memory_operand
+  (the rules are the same for both).
+
+   ADDR_QUERY_LDP_STP
+  Query what is valid for a load/store pair.  */
+enum aarch64_addr_query_type {
+  ADDR_QUERY_M,
+  ADDR_QUERY_LDP_STP
+};
+
 /* A set of tuning parameters contains references to size and time
cost models and vectors for address cost calculations, register
move costs and memory move costs.  */
@@ -433,7 +446,8 @@ bool aarch64_float_const_representable_p
 
 #if defined (RTX_CODE)
 
-bool aarch64_legitimate_address_p (machine_mode, rtx, RTX_CODE, bool);
+bool aarch64_legitimate_address_p (machine_mode, rtx,
+  aarch64_addr_query_type, bool);
 machine_mode aarch64_select_cc_mode (RTX_CODE, rtx, rtx);
 rtx aarch64_gen_compare_reg (RTX_CODE, rtx, rtx);
 rtx aarch64_load_tp (rtx);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-08-10 14:36:08.826443491 +0100
+++ gcc/config/aarch64/aarch64.c2017-08-22 10:12:29.280432708 +0100
@@ -4385,21 +4385,21 @@ virt_or_elim_regno_p (unsigned regno)
  || regno == ARG_POINTER_REGNUM);
 }
 
-/* Return true if X is a valid address for machine mode MODE.  If it is,
-   fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
-   effect.  OUTER_CODE is PARALLEL for a load/store pair.  */
+/* Return true if X is a valid address of type TYPE for machine mode MODE.
+   If it is, fill in INFO appropriately.  STRICT_P is true if
+   REG_OK_STRICT is in effect.  */
 
 static bool
 aarch64_classify_address (struct aarch64_address_info *info,
  rtx x, machine_mode mode,
- RTX_CODE outer_code, bool strict_p)
+ aarch64_addr_query_type type, bool strict_p)
 {
   enum rtx_code code = GET_CODE (x);
   rtx op0, op1;
 
   /* On BE, we use load/store pair for all large int mode load/stores.
  TI/TFmode may also use a load/store pair.  */
-  bool load_store_pair_p = (outer_code == PARALLEL
+  bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
|| mode == TImode
|| mode == TFmode
|| (BYTES_BIG_ENDIAN
@@ -4631,7 +4631,8 @@ aarch64_address_valid_for_prefetch_p (rt
   struct aarch64_address_info addr;
 
   /* PRFM accepts the same addresses as DImode...  */
-  bool res = aarch64_classify_address (&addr, x, DImode, MEM, strict_p);
+  bool res = aarch64_classify_address (&addr, x, DImode, ADDR_QUERY_M,
+  strict_p);
   if (!res)
 return false;
 
@@ -4667,19 +4668,18 @@ aarch64_legitimate_address_hook_p (machi
 {
   struct aarch64_address_info addr;
 
-  return aarch64_classify_address (&addr, x, mode, MEM, strict_p);
+  return aarch64_classify_address (&addr, x, mode, ADDR_QUERY_M, strict_p);
 }
 
-/* Return TRUE if X is a legitimate address for accessing memory in
-   mode MODE.  OUTER_CODE will be PARALLEL if this is a load/store
-   pair operation.  */
+/* Return TRUE if X is a legitimate address of type TYPE for accessing
+   memory in mode MODE.  STRICT_P is true if REG_OK_STRICT is in effect.  */
 bool
 aarch64_legitimate_address_p (machine_mode mode, rtx x,
- RTX_CODE outer_code, bool strict_p)
+ aar

[AArch64] Tighten address register subreg checks

2017-08-22 Thread Richard Sandiford
Previously we allowed subregs of non-GPR modes to be base and index
registers in non-strict mode.  In practice such subregs will always
require a reload, so we get better code by disallowing them.

Tested on aarch64-linux-gnu.  OK to install?

Richard


2017-08-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_base_register_rtx_p): Only allow
subregs whose inner modes can be stored in GPRs.
(aarch64_classify_index): Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-08-22 10:12:29.280432708 +0100
+++ gcc/config/aarch64/aarch64.c2017-08-22 10:13:53.032828125 +0100
@@ -4196,7 +4196,9 @@ aarch64_regno_ok_for_base_p (int regno,
 static bool
 aarch64_base_register_rtx_p (rtx x, bool strict_p)
 {
-  if (!strict_p && GET_CODE (x) == SUBREG)
+  if (!strict_p
+  && GET_CODE (x) == SUBREG
+  && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (x))])
 x = SUBREG_REG (x);
 
   return (REG_P (x) && aarch64_regno_ok_for_base_p (REGNO (x), strict_p));
@@ -4343,7 +4345,9 @@ aarch64_classify_index (struct aarch64_a
   else
 return false;
 
-  if (GET_CODE (index) == SUBREG)
+  if (!strict_p
+  && GET_CODE (index) == SUBREG
+  && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
 index = SUBREG_REG (index);
 
   if ((shift == 0 ||


Re: std::list optimizations

2017-08-22 Thread Jonathan Wakely

On 21/08/17 21:14 +0200, François Dumont wrote:

On 18/08/2017 22:04, Jonathan Wakely wrote:

On 28/07/17 18:42 +0200, François Dumont wrote:

Hi

  Completing execution of the testsuite revealed a bug. So here 
is the correct version of this patch.


François

On 21/07/2017 19:14, François Dumont wrote:

Hi

  Here is a proposal for 2 optimizations in the std::list 
implementation.


  Optimization on the move constructor taking an allocator for 
always equal allocators. Compare to the version in my previous 
std::list patch I am now doing it at std::list level rather than 
at _List_base level. This way we won't instantiate the insert 
call and we won't check for empty list when the allocator always 
compare equal.


  2nd optimization, I replace the _S_distance method by the 
std::distance algo which benefit from the nice [begin(), end()) 
range optimization when cxx11 abi is being used.


  Note that I am proposing the 2 change in 1 patch to save some 
review time but I can commit those separately.


Tested under x86_64 Linux normal mode.


  * include/bits/stl_list.h
  (_List_base<>::_S_distance): Remove.
  (_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
  (_List_base(_List_base&&, _Node_alloc_type&&)): Use latter.
  (_List_base(_Node_alloc_type&&)): New.
  (_List_base<>::_M_distance, _List_base<>::_M_node_count): Move...
  (list<>::_M_distance, list<>::_M_node_count): ...here. 
Replace calls to

  _S_distance with calls to std::distance.
  (list(list&&, const allocator_type&, true_type)): New.
  (list(list&&, const allocator_type&, false_type)): New.
  (list(list&&, const allocator_type&)): Adapt to call latters.

Ok to commit ?

François






 _List_base(_List_base&&) = default;

 _List_base(_List_base&& __x, _Node_alloc_type&& __a)
+  : _M_impl(std::move(__x._M_impl), std::move(__a))
+  { }
+
+  _List_base(_Node_alloc_type&& __a)
 : _M_impl(std::move(__a))
-  {
-if (__x._M_get_Node_allocator() == _M_get_Node_allocator())
-  _M_move_nodes(std::move(__x));
-// else caller must move individual elements.
-  }
+  { }



I like this change in principle, but it alters the behaviour of an
existing constructor. Existing code might use the constructor and get
broken by this.

You can avoid that by leaving the existing constructor alone and
adding two new ones for new code to use. Reordering the parameters
will make the new one distinct from the old one:

// Used when is_always_equal
_List_base(_Node_alloc_type&& __a, _List_base&& __x))
: _M_impl(std::move(__x._M_impl), std::move(__a))
{ }


I have chosen this approach and also adapt the _List_impl class to 
have same signature which moreover correspond to order of members so 
maybe not so bad.




_M_distance could be static though, neither version uses the 'this'
pointer, so it would be called _S_distance.


Applied.



Do those suggestions make sense? The idea is to ensure that a given
function signature continues to have the same effects. To introduce
new effects, use a new signature.


Sure, I didn't consider the explicit instantiation use case, makes sens.

So here is a new version. I propose to "mark" code kept for backward 
abi compatibility with the _GLIBCXX_INLINE_VERSION. Next time we 
decide to break abi we will just need to look for this macro to know 
what code can be removed.


Great, thanks.


Ok to commit if tests are successful ?


Yes, thanks.




Re: [PATCH][2/2] early LTO debug, main part

2017-08-22 Thread Szabolcs Nagy
On 04/08/17 13:21, Richard Biener wrote:
> On Thu, 3 Aug 2017, Jason Merrill wrote:
>> OK if testing passes.
> 
> Thanks.  Meanwhile testing passed.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, LTO bootstrapped
> on x86_64-unknown-linux-gnu (both all languages).  I've successfully
> built SPEC CPU 2006 with -flto -g (provides reasonable Fortran coverage).
> I've successfully ran the GCC testsuite with -flto -g which shows lots
> of FAILs but doesn't regress in any unexpected ways compared to before
> the patches.
> 
> I'll ping Ian about the simple-object part again and will apply
> earliest at Aug 14th.
> 
> Richard.
> 

on aarch64_be-none-elf i see

PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O3 (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O3 (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess errors)
PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess errors)
PASS->FAIL: gcc.dg/lto/20090914-1 c_lto_20090914-1_0.o-c_lto_20090914-1_0.o 
link, -flto
PASS->FAIL: gcc.dg/lto/20100426 c_lto_20100426_0.o-c_lto_20100426_0.o link, -r 
-nostdlib -flto -g
PASS->FAIL: gcc.dg/lto/20111207-2 c_lto_20111207-2_0.o-c_lto_20111207-2_0.o 
link,  -g -O -flto
PASS->FAIL: gcc.dg/lto/20111213-1 c_lto_20111213-1_0.o-c_lto_20111213-1_0.o 
link,  -flto -g
PASS->FAIL: gcc.dg/lto/pr51572-1 c_lto_pr51572-1_0.o-c_lto_pr51572-1_0.o link,  
-flto -g
PASS->FAIL: gcc.dg/lto/pr53470 c_lto_pr53470_0.o-c_lto_pr53470_0.o link,  -flto 
-g
PASS->FAIL: gcc.dg/lto/pr59323 c_lto_pr59323_0.o-c_lto_pr59323_0.o link,  -O2 
-g -flto
PASS->FAIL: gcc.dg/lto/pr59323-2 c_lto_pr59323-2_0.o-c_lto_pr59323-2_0.o link,  
-O2 -g -flto
PASS->FAIL: gcc.dg/pr43557-1.c (test for excess errors)

linking seems to fail with

/tmp/ccqAb1Wfdebugobjtem: file not recognized: Bad value
collect2: error: ld returned 1 exit status
lto-wrapper: fatal error: B/gcc/xgcc returned 1 exit status
compilation terminated.
P/aarch64_be-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
compiler exited with status 1

similar regressions on g++ tests:

PASS->FAIL: g++.dg/lto/20101010-4 cp_lto_20101010-4_0.o-cp_lto_20101010-4_0.o 
link,  -std=c++0x -flto -g -r
-nostdlib
PASS->FAIL: g++.dg/lto/20101015-2 cp_lto_20101015-2_0.o-cp_lto_20101015-2_0.o 
link,  -g -flto
PASS->FAIL: g++.dg/lto/pr42987 cp_lto_pr42987_0.o-cp_lto_pr42987_1.o link,  
-flto -flto-partition=none -g
PASS->FAIL: g++.dg/lto/pr42987 cp_lto_pr42987_0.o-cp_lto_pr42987_1.o link,  
-flto -g
PASS->FAIL: g++.dg/lto/pr48207 cp_lto_pr48207_0.o-cp_lto_pr48207_0.o link,  
-flto -g
PASS->FAIL: g++.dg/lto/pr48207-2 cp_lto_pr48207-2_0.o-cp_lto_pr48207-2_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr48207-3 cp_lto_pr48207-3_0.o-cp_lto_pr48207-3_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr48354-1 cp_lto_pr48354-1_0.o-cp_lto_pr48354-1_0.o 
link,  -g -flto
PASS->FAIL: g++.dg/lto/pr48508-1 cp_lto_pr48508-1_0.o-cp_lto_pr48508-1_1.o 
link,  -g -O2 -flto
-flto-partition=none
PASS->FAIL: g++.dg/lto/pr51564-1 cp_lto_pr51564-1_0.o-cp_lto_pr51564-1_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr51567-1 cp_lto_pr51567-1_0.o-cp_lto_pr51567-1_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr51572-2 cp_lto_pr51572-2_0.o-cp_lto_pr51572-2_0.o 
link,  -g -flto
PASS->FAIL: g++.dg/lto/pr51573-1 cp_lto_pr51573-1_0.o-cp_lto_pr51573-1_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr51650-1 cp_lto_pr51650-1_0.o-cp_lto_pr51650-1_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr51650-2 cp_lto_pr51650-2_0.o-cp_lto_pr51650-2_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr51650-3 cp_lto_pr51650-3_0.o-cp_lto_pr51650-3_0.o 
link,  -flto -g
PASS->FAIL: g++.dg/lto/pr52605 cp_lto_pr52605_0.o-cp_lto_pr52605_0.o link, 
-flto -g
PASS->FAIL: g++.dg/lto/pr53470 cp_lto_pr53470_0.o-cp_lto_pr53470_0.o link,  -g 
-flto
PASS->FAIL: g++.dg/lto/pr65193 cp_lto_pr65193_0.o-cp_lto_pr65193_0.o link, 
-fPIC -r -nostdlib -flto -O2 -g
PASS->FAIL: g++.dg/lto/pr65316 cp_lto_pr65316_0.o-cp_lto_pr65316_1.o link,  
-flto -std=c++11 -g2
-fno-lto-odr-type-merging -O2
PASS->FAIL: g++.dg/lto/pr65549 cp_lto_pr65549_0.o-cp_lto_pr65549_0.o link,  
-std=gnu++14 -flto -g
PASS->FAIL: g++.dg/lto/pr65549 cp_lto_pr65549_0.o-cp_lto_pr65549_0.o link,  
-std=gnu++14 -flto -g -O2
-fno-inline -flto-partition=max
PASS->FAIL: g++.dg/lto/pr69077 cp_lto_pr69077_0.o-cp_lto_pr69077_1.o link,  -O3 
-g -flto
PASS->FAIL: g++.dg/lto/pr69137 cp_lto_pr69137_0.o-cp_lto_pr69137_0.o link,  
-std=c++11 -g -flto
PASS->FAIL: g++.dg/lto/pr79000 cp_lto_pr79000_0.o-cp_lto_pr79000_1.o link, 
-flto -g
NA->FAIL: g++.dg/templa

[PATCH] Fix PR81921

2017-08-22 Thread Richard Biener

The following patch fixes PR81921 (and LTO build of libgo) which I ran
into when trying to enable free-lang-data for non-LTO compiles.

free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all functions
so we have them ending up with target_option_default_node eventually
which is something ix86_can_inline_p doesn't expect (I tried forcing
a compare of the actual options but that fails as well as we get
spurious differences in use-fpmath, the default node with -m32 
-march=x86_64 doesn't have it while non-default nodes have it).

The patch is what I consider safe for branches, we might want to work
on sth better (actually comparing always and fixing the fpmath issue)
on trunk as followup.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
and active branches?

Note the change to the ref = false conditional isn't strictly necessary
but it makes -flto and non-flto behave consistently.

Thanks,
Richard.

2017-08-22  Richard Biener  

PR target/81921
* config/i386/i386.c (ix86_can_inline_p): Treat
target_option_default_node as non-existent.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 251266)
+++ gcc/config/i386/i386.c  (working copy)
@@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
   /* If callee has no option attributes, then it is ok to inline.  */
-  if (!callee_tree)
+  if (!callee_tree || callee_tree == target_option_default_node)
 ret = true;
 
   /* If caller has no option attributes, but callee does then it is not ok to
  inline.  */
-  else if (!caller_tree)
+  else if (!caller_tree || caller_tree == target_option_default_node)
 ret = false;
 
   else
Index: gcc/testsuite/gcc.target/i386/pr81921.c
===
--- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -march=x86-64" } */
+
+extern __inline int  __attribute__((__gnu_inline__, __always_inline__, 
__artificial__, target("sse2")))
+_mm_loadu_si128 (int const *__P)
+{
+return *__P;
+}
+
+void __attribute__((target("ssse3"))) foo (void *p)
+{
+  volatile int x = _mm_loadu_si128 (p);
+}


Re: [PATCH][2/2] early LTO debug, main part

2017-08-22 Thread Richard Biener
On Tue, 22 Aug 2017, Szabolcs Nagy wrote:

> On 04/08/17 13:21, Richard Biener wrote:
> > On Thu, 3 Aug 2017, Jason Merrill wrote:
> >> OK if testing passes.
> > 
> > Thanks.  Meanwhile testing passed.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, LTO bootstrapped
> > on x86_64-unknown-linux-gnu (both all languages).  I've successfully
> > built SPEC CPU 2006 with -flto -g (provides reasonable Fortran coverage).
> > I've successfully ran the GCC testsuite with -flto -g which shows lots
> > of FAILs but doesn't regress in any unexpected ways compared to before
> > the patches.
> > 
> > I'll ping Ian about the simple-object part again and will apply
> > earliest at Aug 14th.
> > 
> > Richard.
> > 
> 
> on aarch64_be-none-elf i see
> 
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O3 (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O3 (test for excess 
> errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess errors)
> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess 
> errors)
> PASS->FAIL: gcc.dg/lto/20090914-1 c_lto_20090914-1_0.o-c_lto_20090914-1_0.o 
> link, -flto
> PASS->FAIL: gcc.dg/lto/20100426 c_lto_20100426_0.o-c_lto_20100426_0.o link, 
> -r -nostdlib -flto -g
> PASS->FAIL: gcc.dg/lto/20111207-2 c_lto_20111207-2_0.o-c_lto_20111207-2_0.o 
> link,  -g -O -flto
> PASS->FAIL: gcc.dg/lto/20111213-1 c_lto_20111213-1_0.o-c_lto_20111213-1_0.o 
> link,  -flto -g
> PASS->FAIL: gcc.dg/lto/pr51572-1 c_lto_pr51572-1_0.o-c_lto_pr51572-1_0.o 
> link,  -flto -g
> PASS->FAIL: gcc.dg/lto/pr53470 c_lto_pr53470_0.o-c_lto_pr53470_0.o link,  
> -flto -g
> PASS->FAIL: gcc.dg/lto/pr59323 c_lto_pr59323_0.o-c_lto_pr59323_0.o link,  -O2 
> -g -flto
> PASS->FAIL: gcc.dg/lto/pr59323-2 c_lto_pr59323-2_0.o-c_lto_pr59323-2_0.o 
> link,  -O2 -g -flto
> PASS->FAIL: gcc.dg/pr43557-1.c (test for excess errors)
> 
> linking seems to fail with
> 
> /tmp/ccqAb1Wfdebugobjtem: file not recognized: Bad value
> collect2: error: ld returned 1 exit status
> lto-wrapper: fatal error: B/gcc/xgcc returned 1 exit status
> compilation terminated.
> P/aarch64_be-none-elf/bin/ld: error: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> compiler exited with status 1

Can you file a bugreport please?  Can you investigate a bit, I suspect
a simple int main() {} and ./xgcc -B. -flto -g t.c fails the same way.
With -save-temps -v you should be able to inspect the generated
debugobj with readelf.

Is this a native compiler or a cross-compiler?  [I suspect endianess
issues somewhere?]

Thanks,
Richard.

> similar regressions on g++ tests:
> 
> PASS->FAIL: g++.dg/lto/20101010-4 cp_lto_20101010-4_0.o-cp_lto_20101010-4_0.o 
> link,  -std=c++0x -flto -g -r
> -nostdlib
> PASS->FAIL: g++.dg/lto/20101015-2 cp_lto_20101015-2_0.o-cp_lto_20101015-2_0.o 
> link,  -g -flto
> PASS->FAIL: g++.dg/lto/pr42987 cp_lto_pr42987_0.o-cp_lto_pr42987_1.o link,  
> -flto -flto-partition=none -g
> PASS->FAIL: g++.dg/lto/pr42987 cp_lto_pr42987_0.o-cp_lto_pr42987_1.o link,  
> -flto -g
> PASS->FAIL: g++.dg/lto/pr48207 cp_lto_pr48207_0.o-cp_lto_pr48207_0.o link,  
> -flto -g
> PASS->FAIL: g++.dg/lto/pr48207-2 cp_lto_pr48207-2_0.o-cp_lto_pr48207-2_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr48207-3 cp_lto_pr48207-3_0.o-cp_lto_pr48207-3_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr48354-1 cp_lto_pr48354-1_0.o-cp_lto_pr48354-1_0.o 
> link,  -g -flto
> PASS->FAIL: g++.dg/lto/pr48508-1 cp_lto_pr48508-1_0.o-cp_lto_pr48508-1_1.o 
> link,  -g -O2 -flto
> -flto-partition=none
> PASS->FAIL: g++.dg/lto/pr51564-1 cp_lto_pr51564-1_0.o-cp_lto_pr51564-1_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr51567-1 cp_lto_pr51567-1_0.o-cp_lto_pr51567-1_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr51572-2 cp_lto_pr51572-2_0.o-cp_lto_pr51572-2_0.o 
> link,  -g -flto
> PASS->FAIL: g++.dg/lto/pr51573-1 cp_lto_pr51573-1_0.o-cp_lto_pr51573-1_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr51650-1 cp_lto_pr51650-1_0.o-cp_lto_pr51650-1_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr51650-2 cp_lto_pr51650-2_0.o-cp_lto_pr51650-2_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr51650-3 cp_lto_pr51650-3_0.o-cp_lto_pr51650-3_0.o 
> link,  -flto -g
> PASS->FAIL: g++.dg/lto/pr52605 cp_lto_pr52605_0.o-cp_lto_pr52605_0.o link, 
> -flto -g
> PASS->FAIL: g++.dg/lto/pr53470 cp_lto_pr53470_0.o-cp_lto_pr53470_0.o link,  
> -g -flto
> PASS->FAIL: g++.dg/lto/pr65193 cp_lto_pr65193_0.o-cp_lto_pr65193_0.o link, 
> -fPIC -r -nostdlib -flto -O2 -g
> PASS->FAIL: g++.dg/lto/pr65316 cp_lto_pr65316_0.o-cp_lto_pr65316_1.o li

Re: [PATCH] [Aarch64] Optimize subtract in shift counts

2017-08-22 Thread Wilco Dijkstra
Hi,

The main reason we have this issue is that DImode can be treated as a
vector of size 1. As a result we do not know whether the shift is an integer
or SIMD instruction. One way around this is to never use the SIMD variant,
another is to introduce V1DImode for vectors of size 1.

Short term I believe the first option is best - I don't think scalar operations 
executed
as SIMD instructions are useful in general, and these shifts are likely never 
used in
reality, so we're adding a lot of complexity and potential bugs .
Long term we need to compute register classes and do instruction selection
much earlier so this doesn't increase complexity of register allocation and 
machine
descriptors.

Wilco

RFC: Explicit move preference hints

2017-08-22 Thread Wilco Dijkstra
Hi,

The register allocator inserts move preferences when an instruction has
one or more dead sources in add_insn_allocno_copies. If an instruction
doesn't have a matching constraint (eg. "0"), then any dead source is treated
as a copy with all destination registers with a low priority. In reality what
appears to happen is that the first dead source is treated as a copy. This
leads to non-intuitive allocations in eg. 4-register FMAs. Here you'd
prefer to have the accumulator and destination to use the same register
when possible: so fmadd  d2, d0, d5, d2 instead of fmadd  d0, d0, d5, d2.

Would it be reasonable to add an explicit move preference feature?
For example use "r#0" to indicate in ira_get_dup_out_num that an operand
prefers to be the same as the destination if possible. This would use a lower
move priority than "0" to avoid constraining the allocation but a bit higher 
than
a simple dead source (eg. freq / 4 or freq / 2).

Wilco

Re: Make more use of paradoxical_subreg_p

2017-08-22 Thread Richard Biener
On Mon, Aug 21, 2017 at 3:31 PM, Richard Sandiford
 wrote:
> This patch makes more use of the existing paradoxical_subreg_p
> predicate and also adds a version that operates on outer and
> inner modes.
>
> Some of the affected tests were based on GET_MODE_SIZE rather than
> GET_MODE_PRECISION and so the patch could change the result for modes
> that have the same size but different precisions.  I think in each
> case the change should be a no-op or more correct, since a mode with
> precision N bits can't be expected to hold all of a mode with precision
> M>N bits.
>
> The patch changes the branch taken in simplify_subreg for modes with
> equal precision, but the new form matches the commentary more closely.
> Both branches should be equally good in that situation.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking that
> there were no changes in the testsuite assembly output for one target
> per CPU.  OK to install?

Ok.

Thanks
Richard.

> Richard
>
>
> 2017-08-21  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * rtl.h (paradoxical_subreg_p): Define inline, and add a version
> that takes the outer and inner modes.
> * doc/rtl.texi: Use paradoxical_subreg_p instead of a GET_MODE_SIZE
> comparison as the canonical test for a paradoxical subreg.
> * combine.c (simplify_set): Use paradoxical_subreg_p.
> (make_extraction): Likewise.
> (force_to_mode): Likewise.
> (rtx_equal_for_field_assignment_p): Likewise.
> (gen_lowpart_for_combine): Likewise.
> (simplify_comparison): Likewise.
> * cse.c (equiv_constant): Likewise.
> * expmed.c (store_bit_field_1): Likewise.
> * final.c (alter_subreg): Likewise.
> * fwprop.c (propagate_rtx): Likewise.
> (forward_propagate_subreg): Likewise.
> * ira-conflicts.c (ira_build_conflicts): Likewise.
> * lower-subreg.c (simplify_gen_subreg_concatn): Likewise.
> * lra-constraints.c (curr_insn_transform): Likewise.
> (split_reg): Likewise.
> * lra-eliminations.c (move_plus_up): Likewise.
> (lra_eliminate_regs_1): Likewise.
> * recog.c (general_operand): Likewise.
> * ree.c (combine_reaching_defs): Likewise.
> * reload.c (push_reload): Likewise.
> (find_reloads): Likewise.
> * reload1.c (elimination_effects): Likewise.
> (compute_reload_subreg_offset): Likewise.
> (choose_reload_regs): Likewise.
> * rtlanal.c (subreg_lsb_1): Likewise.
> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
> (simplify_subreg): Likewise.
> * var-tracking.c (track_loc_p): Likewise.
> * emit-rtl.c (byte_lowpart_offset): Likewise.
> (paradoxical_subreg_p): Delete out-of-line definition.
>
> Index: gcc/rtl.h
> ===
> --- gcc/rtl.h   2017-08-21 10:42:34.185530464 +0100
> +++ gcc/rtl.h   2017-08-21 14:20:43.099964655 +0100
> @@ -2784,10 +2784,28 @@ extern rtx operand_subword (rtx, unsigne
>
>  /* In emit-rtl.c */
>  extern rtx operand_subword_force (rtx, unsigned int, machine_mode);
> -extern bool paradoxical_subreg_p (const_rtx);
>  extern int subreg_lowpart_p (const_rtx);
>  extern unsigned int subreg_size_lowpart_offset (unsigned int, unsigned int);
>
> +/* Return true if a subreg with the given outer and inner modes is
> +   paradoxical.  */
> +
> +inline bool
> +paradoxical_subreg_p (machine_mode outermode, machine_mode innermode)
> +{
> +  return GET_MODE_PRECISION (outermode) > GET_MODE_PRECISION (innermode);
> +}
> +
> +/* Return true if X is a paradoxical subreg, false otherwise.  */
> +
> +inline bool
> +paradoxical_subreg_p (const_rtx x)
> +{
> +  if (GET_CODE (x) != SUBREG)
> +return false;
> +  return paradoxical_subreg_p (GET_MODE (x), GET_MODE (SUBREG_REG (x)));
> +}
> +
>  /* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
>
>  inline unsigned int
> Index: gcc/doc/rtl.texi
> ===
> --- gcc/doc/rtl.texi2017-07-27 10:37:54.486030028 +0100
> +++ gcc/doc/rtl.texi2017-08-21 14:20:43.094947435 +0100
> @@ -1872,7 +1872,7 @@ expression is called @dfn{paradoxical}.
>  class of @code{subreg} is:
>
>  @smallexample
> -GET_MODE_SIZE (@var{m1}) > GET_MODE_SIZE (@var{m2})
> +paradoxical_subreg_p (@var{m1}, @var{m2})
>  @end smallexample
>
>  Paradoxical @code{subreg}s can be used as both lvalues and rvalues.
> Index: gcc/combine.c
> ===
> --- gcc/combine.c   2017-08-21 10:42:34.185530464 +0100
> +++ gcc/combine.c   2017-08-21 14:20:43.092940547 +0100
> @@ -6809,9 +6809,7 @@ simplify_set (rtx x)
>/ UNITS_PER_WORD)
>   == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
>+ (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)

Re: ping [PATCH] [MSP430] Fix PR78849: ICE on initialization of global struct containing __int20 array

2017-08-22 Thread Jozef Lawrynowicz

On 02/08/2017 17:45, Joseph Myers wrote:

On Wed, 2 Aug 2017, Jeff Law wrote:


I think Joseph's suggestion for looking at partial float handling may be
useful, though ia64's RFmode may be more interesting as it's not a
multiple of 8 in bitsize.  IF/KF modes on the ppc port have similar
properties.


The key issue those floating-point modes engage is the meaning of
precision.  IFmode and KFmode have precision set to 106 and 113 to
distinguish them.  But that's precision in the sense of number of mantissa
bits; as normally understood in GCC, precision should be the number of
significant bits, so 128 for both those modes (but having multiple
different binary floating-point modes with the same precision would
require changing how we deal with laying out long double, so the target
specifies a mode for floating-point types rather than leaving it to be
deduced from values such as LONG_DOUBLE_TYPE_SIZE).



Thanks for the advice, I'm looking into how the ppc KFmode behaves in
this situation now.

I also looked through the front end code a bit more, and the behaviour
of stor-layout.c:layout_type for RECORD_TYPE looks likes a smoking gun
to me.
For BOOLEAN_TYPE, INTEGER_TYPE, ENUMERAL_TYPE, REAL_TYPE,
FIXED_POINT_TYPE etc. layout_type sets:

TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));

But for the individual field types in RECORD_TYPE, UNION_TYPE or
QUAL_UNION_TYPE, this same setting of TYPE_SIZE and friends is not
performed.
So a field in a RECORD_TYPE might be an INTEGER_TYPE, but TYPE_SIZE for
this INTEGER_TYPE will not be set as it would have been had the type not
been a field in a RECORD_TYPE.

So the fix to me looks to be to ensure that the code for the base types
(BOOLEAN_TYPE, INTEGER_TYPE etc.) from layout_type is also executed for
each type that happens to be a field in a RECORD/UNION/QUAL_UNION_TYPE.


Re: [PATCH 2/2] C: use full locations within c_parser_expr_list's vec

2017-08-22 Thread Andreas Schwab
On Aug 18 2017, David Malcolm  wrote:

> gcc/c/ChangeLog:
>   * c-parser.c (c_parser_expr_list): Use c_expr::get_location ()
>   rather than peeking the location of the first token.
>   * c-tree.h (c_expr::get_location): New method.

This breaks gcc.dg/Wtraditional-conversion-2.c on m68k:

FAIL: gcc.dg/Wtraditional-conversion-2.c  (test for warnings, line 54)
FAIL: gcc.dg/Wtraditional-conversion-2.c  (test for warnings, line 55)
FAIL: gcc.dg/Wtraditional-conversion-2.c (test for excess errors)
Excess errors:
cc1: warning: passing argument 1 of 'ff' as 'float' rather than 'double' due to 
prototype
cc1: warning: passing argument 1 of 'x.ff' as 'float' rather than 'double' due 
to prototype

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Fix inliner assert with free-lang-data

2017-08-22 Thread Richard Biener

free-lang-data can end up unsharing TYPE_SIZE[_UNIT], the following patch
fixes that similar to how verify_type_variant behaves to not ICE during
inlining.

Refactoring free-lang-data a bit could avoid the false unsharing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-08-22  Richard Biener  

* tree-inline.c (remap_type_1): Change asserts on TYPE_SIZE[_UNIT]
to allow for free-lang-data replacements similar to verify_type_variant.

Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 251265)
+++ gcc/tree-inline.c   (working copy)
@@ -554,9 +554,16 @@ remap_type_1 (tree type, copy_body_data
   /* All variants of type share the same size, so use the already remaped 
data.  */
   if (TYPE_MAIN_VARIANT (new_tree) != new_tree)
 {
-  gcc_checking_assert (TYPE_SIZE (type) == TYPE_SIZE (TYPE_MAIN_VARIANT 
(type)));
-  gcc_checking_assert (TYPE_SIZE_UNIT (type) == TYPE_SIZE_UNIT 
(TYPE_MAIN_VARIANT (type)));
-
+  tree s = TYPE_SIZE (type);
+  tree mvs = TYPE_SIZE (TYPE_MAIN_VARIANT (type));
+  tree su = TYPE_SIZE_UNIT (type);
+  tree mvsu = TYPE_SIZE_UNIT (TYPE_MAIN_VARIANT (type));
+  gcc_checking_assert ((TREE_CODE (s) == PLACEHOLDER_EXPR
+   && (TREE_CODE (mvs) == PLACEHOLDER_EXPR))
+  || s == mvs);
+  gcc_checking_assert ((TREE_CODE (su) == PLACEHOLDER_EXPR
+   && (TREE_CODE (mvsu) == PLACEHOLDER_EXPR))
+  || su == mvsu);
   TYPE_SIZE (new_tree) = TYPE_SIZE (TYPE_MAIN_VARIANT (new_tree));
   TYPE_SIZE_UNIT (new_tree) = TYPE_SIZE_UNIT (TYPE_MAIN_VARIANT 
(new_tree));
 }


[PATCH] Remove output_die_symbol

2017-08-22 Thread Richard Biener

This addresses one postponed feedback from the early LTO debug time
by simply removing output_die_symbol from output_die as we have no
! die->comdat_type_p symbols left apart from the CU one produced by
LTO debug which I added the guard for.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2017-08-22  Richard Biener  

* dwarf2out.c (output_die_symbol): Remove.
(output_die): Do not output a DIEs symbol.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251273)
+++ gcc/dwarf2out.c (working copy)
@@ -3472,7 +3472,6 @@ static enum dwarf_form value_format (dw_
 static void output_value_format (dw_attr_node *);
 static void output_abbrev_section (void);
 static void output_die_abbrevs (unsigned long, dw_die_ref);
-static void output_die_symbol (dw_die_ref);
 static void output_die (dw_die_ref);
 static void output_compilation_unit_header (enum dwarf_unit_type);
 static void output_comp_unit (dw_die_ref, int, const unsigned char *);
@@ -9615,27 +9614,6 @@ output_abbrev_section (void)
   dw2_asm_output_data (1, 0, NULL);
 }
 
-/* Output a symbol we can use to refer to this DIE from another CU.  */
-
-static inline void
-output_die_symbol (dw_die_ref die)
-{
-  const char *sym = die->die_id.die_symbol;
-
-  gcc_assert (!die->comdat_type_p);
-
-  if (sym == 0)
-return;
-
-  if (strncmp (sym, DIE_LABEL_PREFIX, sizeof (DIE_LABEL_PREFIX) - 1) == 0)
-/* We make these global, not weak; if the target doesn't support
-   .linkonce, it doesn't support combining the sections, so debugging
-   will break.  */
-targetm.asm_out.globalize_label (asm_out_file, sym);
-
-  ASM_OUTPUT_LABEL (asm_out_file, sym);
-}
-
 /* Return a new location list, given the begin and end range, and the
expression.  */
 
@@ -9994,15 +9972,6 @@ output_die (dw_die_ref die)
   unsigned long size;
   unsigned ix;
 
-  /* If someone in another CU might refer to us, set up a symbol for
- them to point to.  */
-  if (! die->comdat_type_p && die->die_id.die_symbol
-  /* Don't output the symbol twice.  For LTO we want the label
- on the section beginning, not on the actual DIE.  */
-  && (!flag_generate_lto
- || die->die_tag != DW_TAG_compile_unit))
-output_die_symbol (die);
-
   dw2_asm_output_data_uleb128 (die->die_abbrev, "(DIE (%#lx) %s)",
   (unsigned long)die->die_offset,
   dwarf_tag_name (die->die_tag));


Re: [PATCH] Fix PR81921

2017-08-22 Thread Uros Bizjak
On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener  wrote:
>
> The following patch fixes PR81921 (and LTO build of libgo) which I ran
> into when trying to enable free-lang-data for non-LTO compiles.
>
> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all functions
> so we have them ending up with target_option_default_node eventually
> which is something ix86_can_inline_p doesn't expect (I tried forcing
> a compare of the actual options but that fails as well as we get
> spurious differences in use-fpmath, the default node with -m32
> -march=x86_64 doesn't have it while non-default nodes have it).
>
> The patch is what I consider safe for branches, we might want to work
> on sth better (actually comparing always and fixing the fpmath issue)
> on trunk as followup.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
> and active branches?
>
> Note the change to the ref = false conditional isn't strictly necessary
> but it makes -flto and non-flto behave consistently.
>
> Thanks,
> Richard.
>
> 2017-08-22  Richard Biener  
>
> PR target/81921
> * config/i386/i386.c (ix86_can_inline_p): Treat
> target_option_default_node as non-existent.

LGTM.

Please give the patch some soaking time in the mainline before
backporting it to release branches.

Thanks,
Uros.

> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  (revision 251266)
> +++ gcc/config/i386/i386.c  (working copy)
> @@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
>tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
>
>/* If callee has no option attributes, then it is ok to inline.  */
> -  if (!callee_tree)
> +  if (!callee_tree || callee_tree == target_option_default_node)
>  ret = true;
>
>/* If caller has no option attributes, but callee does then it is not ok to
>   inline.  */
> -  else if (!caller_tree)
> +  else if (!caller_tree || caller_tree == target_option_default_node)
>  ret = false;
>
>else
> Index: gcc/testsuite/gcc.target/i386/pr81921.c
> ===
> --- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent)
> +++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target lto } */
> +/* { dg-options "-flto -march=x86-64" } */
> +
> +extern __inline int  __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__, target("sse2")))
> +_mm_loadu_si128 (int const *__P)
> +{
> +return *__P;
> +}
> +
> +void __attribute__((target("ssse3"))) foo (void *p)
> +{
> +  volatile int x = _mm_loadu_si128 (p);
> +}


Re: [PATCH][2/2] early LTO debug, main part

2017-08-22 Thread Szabolcs Nagy
On 22/08/17 11:17, Richard Biener wrote:
> On Tue, 22 Aug 2017, Szabolcs Nagy wrote:
>> on aarch64_be-none-elf i see
>>
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 (test for excess errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O (test for excess errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O3 (test for excess errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 (test for excess errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O (test for excess 
>> errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O3 (test for excess 
>> errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess 
>> errors)
>> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess 
>> errors)
>> PASS->FAIL: gcc.dg/lto/20090914-1 c_lto_20090914-1_0.o-c_lto_20090914-1_0.o 
>> link, -flto
>> PASS->FAIL: gcc.dg/lto/20100426 c_lto_20100426_0.o-c_lto_20100426_0.o link, 
>> -r -nostdlib -flto -g
>> PASS->FAIL: gcc.dg/lto/20111207-2 c_lto_20111207-2_0.o-c_lto_20111207-2_0.o 
>> link,  -g -O -flto
>> PASS->FAIL: gcc.dg/lto/20111213-1 c_lto_20111213-1_0.o-c_lto_20111213-1_0.o 
>> link,  -flto -g
>> PASS->FAIL: gcc.dg/lto/pr51572-1 c_lto_pr51572-1_0.o-c_lto_pr51572-1_0.o 
>> link,  -flto -g
>> PASS->FAIL: gcc.dg/lto/pr53470 c_lto_pr53470_0.o-c_lto_pr53470_0.o link,  
>> -flto -g
>> PASS->FAIL: gcc.dg/lto/pr59323 c_lto_pr59323_0.o-c_lto_pr59323_0.o link,  
>> -O2 -g -flto
>> PASS->FAIL: gcc.dg/lto/pr59323-2 c_lto_pr59323-2_0.o-c_lto_pr59323-2_0.o 
>> link,  -O2 -g -flto
>> PASS->FAIL: gcc.dg/pr43557-1.c (test for excess errors)
>>
>> linking seems to fail with
>>
>> /tmp/ccqAb1Wfdebugobjtem: file not recognized: Bad value
>> collect2: error: ld returned 1 exit status
>> lto-wrapper: fatal error: B/gcc/xgcc returned 1 exit status
>> compilation terminated.
>> P/aarch64_be-none-elf/bin/ld: error: lto-wrapper failed
>> collect2: error: ld returned 1 exit status
>> compiler exited with status 1
> 
> Can you file a bugreport please?  Can you investigate a bit, I suspect
> a simple int main() {} and ./xgcc -B. -flto -g t.c fails the same way.
> With -save-temps -v you should be able to inspect the generated
> debugobj with readelf.
> 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81925

i could not figure much out, readelf could parse lto1 input
but printed errors on the final link command input.

> Is this a native compiler or a cross-compiler?  [I suspect endianess
> issues somewhere?]

cross compiler (from aarch64-linux-gnu to aarch64_be-none-elf)



Re: [PATCH][2/2] early LTO debug, main part

2017-08-22 Thread Richard Biener
On Tue, 22 Aug 2017, Szabolcs Nagy wrote:

> On 22/08/17 11:17, Richard Biener wrote:
> > On Tue, 22 Aug 2017, Szabolcs Nagy wrote:
> >> on aarch64_be-none-elf i see
> >>
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 (test for excess errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O (test for excess errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -O3 (test for excess errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 (test for excess errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O (test for excess 
> >> errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g1 -O3 (test for excess 
> >> errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 (test for excess errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O (test for excess 
> >> errors)
> >> PASS->FAIL: gcc.dg/debug/pr41893-1.c -gdwarf-2 -g3 -O3 (test for excess 
> >> errors)
> >> PASS->FAIL: gcc.dg/lto/20090914-1 
> >> c_lto_20090914-1_0.o-c_lto_20090914-1_0.o link, -flto
> >> PASS->FAIL: gcc.dg/lto/20100426 c_lto_20100426_0.o-c_lto_20100426_0.o 
> >> link, -r -nostdlib -flto -g
> >> PASS->FAIL: gcc.dg/lto/20111207-2 
> >> c_lto_20111207-2_0.o-c_lto_20111207-2_0.o link,  -g -O -flto
> >> PASS->FAIL: gcc.dg/lto/20111213-1 
> >> c_lto_20111213-1_0.o-c_lto_20111213-1_0.o link,  -flto -g
> >> PASS->FAIL: gcc.dg/lto/pr51572-1 c_lto_pr51572-1_0.o-c_lto_pr51572-1_0.o 
> >> link,  -flto -g
> >> PASS->FAIL: gcc.dg/lto/pr53470 c_lto_pr53470_0.o-c_lto_pr53470_0.o link,  
> >> -flto -g
> >> PASS->FAIL: gcc.dg/lto/pr59323 c_lto_pr59323_0.o-c_lto_pr59323_0.o link,  
> >> -O2 -g -flto
> >> PASS->FAIL: gcc.dg/lto/pr59323-2 c_lto_pr59323-2_0.o-c_lto_pr59323-2_0.o 
> >> link,  -O2 -g -flto
> >> PASS->FAIL: gcc.dg/pr43557-1.c (test for excess errors)
> >>
> >> linking seems to fail with
> >>
> >> /tmp/ccqAb1Wfdebugobjtem: file not recognized: Bad value
> >> collect2: error: ld returned 1 exit status
> >> lto-wrapper: fatal error: B/gcc/xgcc returned 1 exit status
> >> compilation terminated.
> >> P/aarch64_be-none-elf/bin/ld: error: lto-wrapper failed
> >> collect2: error: ld returned 1 exit status
> >> compiler exited with status 1
> > 
> > Can you file a bugreport please?  Can you investigate a bit, I suspect
> > a simple int main() {} and ./xgcc -B. -flto -g t.c fails the same way.
> > With -save-temps -v you should be able to inspect the generated
> > debugobj with readelf.
> > 
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81925
> 
> i could not figure much out, readelf could parse lto1 input
> but printed errors on the final link command input.
> 
> > Is this a native compiler or a cross-compiler?  [I suspect endianess
> > issues somewhere?]
> 
> cross compiler (from aarch64-linux-gnu to aarch64_be-none-elf)

I am testing the following fix (verified with a cross), will apply
as obvious if that succeeds.

Richard.

2017-08-22  Richard Biener  

PR lto/81925
* simple-object-elf.c (simple_object_elf_write_shdr): Adjust
type of sh_addralign and sh_entsize and properly write
sh_entsize as Elf_Addr.
(simple_object_elf_write_to_file): Read sh_entsize as Elf_Addr.

Index: libiberty/simple-object-elf.c
===
--- libiberty/simple-object-elf.c   (revision 251273)
+++ libiberty/simple-object-elf.c   (working copy)
@@ -830,8 +830,8 @@ simple_object_elf_write_shdr (simple_obj
  off_t sh_addr,
  unsigned int sh_offset, unsigned int sh_size,
  unsigned int sh_link, unsigned int sh_info,
- unsigned int sh_addralign,
- unsigned int sh_entsize,
+ size_t sh_addralign,
+ size_t sh_entsize,
  const char **errmsg, int *err)
 {
   struct simple_object_elf_attributes *attrs =
@@ -858,7 +858,7 @@ simple_object_elf_write_shdr (simple_obj
   ELF_SET_FIELD (fns, cl, Shdr, buf, sh_link, Elf_Word, sh_link);
   ELF_SET_FIELD (fns, cl, Shdr, buf, sh_info, Elf_Word, sh_info);
   ELF_SET_FIELD (fns, cl, Shdr, buf, sh_addralign, Elf_Addr, sh_addralign);
-  ELF_SET_FIELD (fns, cl, Shdr, buf, sh_entsize, Elf_Word, sh_entsize);
+  ELF_SET_FIELD (fns, cl, Shdr, buf, sh_entsize, Elf_Addr, sh_entsize);
 
   return simple_object_internal_write (descriptor, offset, buf, shdr_size,
   errmsg, err);
@@ -948,8 +948,8 @@ simple_object_elf_write_to_file (simple_
   off_t sh_addr = 0;
   unsigned int sh_link = 0;
   unsigned int sh_info = 0;
-  unsigned int sh_addralign = 1U << section->align;
-  unsigned int sh_entsize = 0;
+  size_t sh_addralign = 1U << section->align;
+  size_t sh_entsize = 0;
   if (eow->shdrs)
{
  sh_type = ELF_FETCH_FIELD (attrs->type_functions, attrs->ei_class, 
Shdr,
@@ -972,7 +972,7 @

Re: [PATCH] Fix PR81921

2017-08-22 Thread Richard Biener
On Tue, 22 Aug 2017, Uros Bizjak wrote:

> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener  wrote:
> >
> > The following patch fixes PR81921 (and LTO build of libgo) which I ran
> > into when trying to enable free-lang-data for non-LTO compiles.
> >
> > free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all functions
> > so we have them ending up with target_option_default_node eventually
> > which is something ix86_can_inline_p doesn't expect (I tried forcing
> > a compare of the actual options but that fails as well as we get
> > spurious differences in use-fpmath, the default node with -m32
> > -march=x86_64 doesn't have it while non-default nodes have it).
> >
> > The patch is what I consider safe for branches, we might want to work
> > on sth better (actually comparing always and fixing the fpmath issue)
> > on trunk as followup.
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
> > and active branches?
> >
> > Note the change to the ref = false conditional isn't strictly necessary
> > but it makes -flto and non-flto behave consistently.
> >
> > Thanks,
> > Richard.
> >
> > 2017-08-22  Richard Biener  
> >
> > PR target/81921
> > * config/i386/i386.c (ix86_can_inline_p): Treat
> > target_option_default_node as non-existent.
> 
> LGTM.
> 
> Please give the patch some soaking time in the mainline before
> backporting it to release branches.

Thanks.  Of course this was copied by other targets (and the x86
one maybe from the default).  So the following is an extended patch.

Ok for the rs6000 and aarch64 bits?

Thanks,
Richard.

2017-08-22  Richard Biener  

PR target/81921
* config/i386/i386.c (ix86_can_inline_p): Treat
target_option_default_node as non-existent.
* targhooks.c (default_target_can_inline_p): Likewise.
* config/aarch64/aarch64.c (aarch64_can_inline_p): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_can_inline_p): Likewise.
* config/rs6000/rs6000.c (rs6000_can_inline_p): Likewise.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 251266)
+++ gcc/config/i386/i386.c  (working copy)
@@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
   /* If callee has no option attributes, then it is ok to inline.  */
-  if (!callee_tree)
+  if (!callee_tree || callee_tree == target_option_default_node)
 ret = true;
 
   /* If caller has no option attributes, but callee does then it is not ok to
  inline.  */
-  else if (!caller_tree)
+  else if (!caller_tree || caller_tree == target_option_default_node)
 ret = false;
 
   else
Index: gcc/testsuite/gcc.target/i386/pr81921.c
===
--- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -march=x86-64" } */
+
+extern __inline int  __attribute__((__gnu_inline__, __always_inline__, 
__artificial__, target("sse2")))
+_mm_loadu_si128 (int const *__P)
+{
+return *__P;
+}
+
+void __attribute__((target("ssse3"))) foo (void *p)
+{
+  volatile int x = _mm_loadu_si128 (p);
+}
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 251274)
+++ gcc/targhooks.c (working copy)
@@ -1447,12 +1447,12 @@ default_target_can_inline_p (tree caller
   tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
 
   /* If callee has no option attributes, then it is ok to inline */
-  if (!callee_opts)
+  if (!callee_opts || callee_tree == target_option_default_node)
 ret = true;
 
   /* If caller has no option attributes, but callee does then it is not ok to
  inline */
-  else if (!caller_opts)
+  else if (!caller_opts || caller_tree == target_option_default_node)
 ret = false;
 
   /* If both caller and callee have attributes, assume that if the
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 251274)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10074,7 +10074,7 @@ aarch64_can_inline_p (tree caller, tree
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
   /* If callee has no option attributes, then it is ok to inline.  */
-  if (!callee_tree)
+  if (!callee_tree || callee_tree == target_option_default_node)
 return true;
 
   struct cl_target_option *caller_opts
Index: gcc/config/powerpcspe/powerpcspe.c
===
--- gcc/config/powerpcspe/powerpcspe.c  (revision 251274)
+++ gcc/config/powerpcspe/powerpcspe.c  (working copy)
@@ -40185,12 +40185,12 @@ rs6000_can_inline_p

Re: [PATCH] Fix PR81921

2017-08-22 Thread Kyrill Tkachov

Hi Richard,
[roping in more aarch64 maintainers]

On 22/08/17 13:27, Richard Biener wrote:

On Tue, 22 Aug 2017, Uros Bizjak wrote:


On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener  wrote:

The following patch fixes PR81921 (and LTO build of libgo) which I ran
into when trying to enable free-lang-data for non-LTO compiles.

free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all functions
so we have them ending up with target_option_default_node eventually
which is something ix86_can_inline_p doesn't expect (I tried forcing
a compare of the actual options but that fails as well as we get
spurious differences in use-fpmath, the default node with -m32
-march=x86_64 doesn't have it while non-default nodes have it).

The patch is what I consider safe for branches, we might want to work
on sth better (actually comparing always and fixing the fpmath issue)
on trunk as followup.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
and active branches?

Note the change to the ref = false conditional isn't strictly necessary
but it makes -flto and non-flto behave consistently.

Thanks,
Richard.

2017-08-22  Richard Biener  

 PR target/81921
 * config/i386/i386.c (ix86_can_inline_p): Treat
 target_option_default_node as non-existent.

LGTM.

Please give the patch some soaking time in the mainline before
backporting it to release branches.

Thanks.  Of course this was copied by other targets (and the x86
one maybe from the default).  So the following is an extended patch.

Ok for the rs6000 and aarch64 bits?

Thanks,
Richard.

2017-08-22  Richard Biener  

PR target/81921
* config/i386/i386.c (ix86_can_inline_p): Treat
target_option_default_node as non-existent.
* targhooks.c (default_target_can_inline_p): Likewise.
* config/aarch64/aarch64.c (aarch64_can_inline_p): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_can_inline_p): Likewise.
* config/rs6000/rs6000.c (rs6000_can_inline_p): Likewise.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 251266)
+++ gcc/config/i386/i386.c  (working copy)
@@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
  
/* If callee has no option attributes, then it is ok to inline.  */

-  if (!callee_tree)
+  if (!callee_tree || callee_tree == target_option_default_node)
  ret = true;
  
/* If caller has no option attributes, but callee does then it is not ok to

   inline.  */
-  else if (!caller_tree)
+  else if (!caller_tree || caller_tree == target_option_default_node)
  ret = false;
  
else

Index: gcc/testsuite/gcc.target/i386/pr81921.c
===
--- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -march=x86-64" } */
+
+extern __inline int  __attribute__((__gnu_inline__, __always_inline__, __artificial__, 
target("sse2")))
+_mm_loadu_si128 (int const *__P)
+{
+return *__P;
+}
+
+void __attribute__((target("ssse3"))) foo (void *p)
+{
+  volatile int x = _mm_loadu_si128 (p);
+}
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 251274)
+++ gcc/targhooks.c (working copy)
@@ -1447,12 +1447,12 @@ default_target_can_inline_p (tree caller
tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
  
/* If callee has no option attributes, then it is ok to inline */

-  if (!callee_opts)
+  if (!callee_opts || callee_tree == target_option_default_node)
  ret = true;
  
/* If caller has no option attributes, but callee does then it is not ok to

   inline */
-  else if (!caller_opts)
+  else if (!caller_opts || caller_tree == target_option_default_node)
  ret = false;
  
/* If both caller and callee have attributes, assume that if the

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 251274)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10074,7 +10074,7 @@ aarch64_can_inline_p (tree caller, tree
tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
  
/* If callee has no option attributes, then it is ok to inline.  */

-  if (!callee_tree)
+  if (!callee_tree || callee_tree == target_option_default_node)
  return true;
  


The aarch64 changes looks ok to me FWIW (since I wrote that function)

Kyrill


struct cl_target_option *caller_opts
Index: gcc/config/powerpcspe/powerpcspe.c
===
--- gcc/config/powerpcspe/powerpcspe.c  (revision 251274)
+++ gcc/

Re: [PATCH] Fix PR81921

2017-08-22 Thread Segher Boessenkool
On Tue, Aug 22, 2017 at 02:27:28PM +0200, Richard Biener wrote:
> Thanks.  Of course this was copied by other targets (and the x86
> one maybe from the default).  So the following is an extended patch.
> 
> Ok for the rs6000 and aarch64 bits?

Sure, rs6000 part is fine.  Thanks!


Segher


>   PR target/81921
>   * config/i386/i386.c (ix86_can_inline_p): Treat
>   target_option_default_node as non-existent.
>   * targhooks.c (default_target_can_inline_p): Likewise.
>   * config/aarch64/aarch64.c (aarch64_can_inline_p): Likewise.
>   * config/powerpcspe/powerpcspe.c (rs6000_can_inline_p): Likewise.
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Likewise.


Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations

2017-08-22 Thread Pekka Jääskeläinen
Hi Richard and Joseph,

Replies for both inline:

I wrote:
>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>> ‘ftz’ semantics.
>> FTZ operations were previously always “explicit” in the BRIG FE output, like 
>> you
>> propose here; there were builtin calls injected for all inputs and the
>> output of ‘ftz’-marked
>> float HSAIL instructions. This is still provided as a fallback for
>> targets which do not
>> support a CPU mode flag.

On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
 wrote:
> I see.  But how does making them implicit fix cases in the conformance
> testsuite?  That is, isn't the error in the runtime implementation of
> __hsail_ftz_*?  I'd have used a "simple" [...]

There are two parts in the story here:

1) Making the FTZ/DAZ “the default”, meaning no builtin calls or
similar are used to flush
the operands/results, but relying on that the runtime flips on the
FTZ/DAZ CPU flags
before executing this code. This is purely a performance optimization because
those FTZ/DAZ builtin calls (three per HSAIL instruction) ruin the performance
for multiple reasons. We implemented this optimization already in our
staging branch of
the BRIG FE.

2) Ensuring GCC does not perform certain compile-time optimizations with the
assumption that FTZ/DAZ is optional, but make it assume that ftz
should happen for
correctness. The proposed patch addresses this part for the compiler
side by disabling
the currently known optimizations which should be flushed at runtime
when “ftz denorm
math” is desired.

>> The problem with a special FTZ ‘operation’ of some kind in the generic 
>> output is
>> that the basic optimizations get confused by a new operation and we’d need to
>> add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
>> code, which
>> seems unnecessary to support this case as the optimizations typically apply 
>> also
>> for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.
>
> Apart from the exceptions you needed to guard ... do you have an example of
> a transform that is confused by explicit FTZ and that would be valid if that 
> FTZ
> were implicit?  An explicit FTZ should be much safer.  I think the builtins
> should also be CONST and not only PURE.

Explicit builtin calls ruin many optimizations starting from a simple
common subexpression
elimination if they don’t understand what the builtin returns for any
given operand. Thus,
inlining the builtin function’s code would be needed first and there
would be a lot of code
inlined due to the abundance of ftz calls required and you cannot
eliminate it all (as at
compile time you don’t know if the operand is a denorm or not). Another approach
would be to introduce special cases to the optimizations affected so
they understand
the FTZ builtin and might be able to remove the useless ones. This potentially
touches _a lot_ of code. And in the end, if the CPU could flush
denorms efficiently
using hardware (typically it’s faster to do FTZ in HW than gradual
underflow so this
is likely the case), any builtin call to do it that cannot be
optimized away presents
additional, possibly major, runtime overhead.

We tested if a simple common subexpression elimination case works with
the ftz builtins
and it didn’t. CONST didn’t help here.

However, I understand your concern that there might be optimizations
that still break the
FTZ semantics if there are no explicit builtin calls, but we are
prepared to fix them case by
case if/when they appear. The attached updated patch fixes a few
additional cases we noticed,
e.g. it disables several constant folding cases.

On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers  wrote:
> Presumably this means that constant folding needs to know about those
> semantics, both for operations with a subnormal floating-point argument
> (whether or not the output is floating point, or floating point in the
> same format), and those with such a result?
> Can assignments copy subnormals without converting them to zero?  Should
> comparisons flush input subnormals to zero before comparing?  Should
> conversions e.g. from float to double convert a float subnormal input to
> zero?

I can answer yes to all of these questions.

BR,
Pekka
From 0b97ccde3ec837329b4c551ccd7f98c074ca7a7b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henry=20Linjam=C3=A4ki?= 
Date: Mon, 24 Jul 2017 09:28:00 +0300
Subject: [PATCH] Added common -fftz-math flag

With the flag set on the compiler assumes the floating-point
operations must flush received and resulting subnormal floating-point
values to zero.
---
 gcc/common.opt  |   5 +
 gcc/doc/invoke.texi |  11 ++
 gcc/fold-const-call.c   |   9 +-
 gcc/fold-const.c|  22 +++
 gcc/match.pd|  14 +-
 gcc/simplify-rtx.c  |  30 +++-
 gcc/testsuite/gcc.dg/ftz-math.c | 330 
 7 files changed, 405 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ftz-math.c

diff --git a/

Re: Fix inconsistent section flags

2017-08-22 Thread coypu
ping


Re: [PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations

2017-08-22 Thread Richard Biener
On Tue, Aug 22, 2017 at 3:28 PM, Pekka Jääskeläinen  wrote:
> Hi Richard and Joseph,
>
> Replies for both inline:
>
> I wrote:
>>> Both the inputs and outputs must be flushed to zero in the HSAIL’s
>>> ‘ftz’ semantics.
>>> FTZ operations were previously always “explicit” in the BRIG FE output, 
>>> like you
>>> propose here; there were builtin calls injected for all inputs and the
>>> output of ‘ftz’-marked
>>> float HSAIL instructions. This is still provided as a fallback for
>>> targets which do not
>>> support a CPU mode flag.
>
> On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener
>  wrote:
>> I see.  But how does making them implicit fix cases in the conformance
>> testsuite?  That is, isn't the error in the runtime implementation of
>> __hsail_ftz_*?  I'd have used a "simple" [...]
>
> There are two parts in the story here:
>
> 1) Making the FTZ/DAZ “the default”, meaning no builtin calls or
> similar are used to flush
> the operands/results, but relying on that the runtime flips on the
> FTZ/DAZ CPU flags
> before executing this code. This is purely a performance optimization because
> those FTZ/DAZ builtin calls (three per HSAIL instruction) ruin the performance
> for multiple reasons. We implemented this optimization already in our
> staging branch of
> the BRIG FE.
>
> 2) Ensuring GCC does not perform certain compile-time optimizations with the
> assumption that FTZ/DAZ is optional, but make it assume that ftz
> should happen for
> correctness. The proposed patch addresses this part for the compiler
> side by disabling
> the currently known optimizations which should be flushed at runtime
> when “ftz denorm
> math” is desired.
>
>>> The problem with a special FTZ ‘operation’ of some kind in the generic 
>>> output is
>>> that the basic optimizations get confused by a new operation and we’d need 
>>> to
>>> add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
>>> code, which
>>> seems unnecessary to support this case as the optimizations typically apply 
>>> also
>>> for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.
>>
>> Apart from the exceptions you needed to guard ... do you have an example of
>> a transform that is confused by explicit FTZ and that would be valid if that 
>> FTZ
>> were implicit?  An explicit FTZ should be much safer.  I think the builtins
>> should also be CONST and not only PURE.
>
> Explicit builtin calls ruin many optimizations starting from a simple
> common subexpression
> elimination if they don’t understand what the builtin returns for any
> given operand.

Calls to const functions are CSEd just fine (if they are passed the same
argument, that is).

int __attribute__((const)) foo (int i);

int main()
{
  return foo(1) + foo(1);
}

results in 2 * foo (1).

Note that I expected FTZ to be a tree code and not a builtin.  The target
can then choose to simply elide all FTZ.  Constant folding can then
also correctly handle FTZ in the places where it is relevant.

> Thus,
> inlining the builtin function’s code would be needed first and there
> would be a lot of code
> inlined due to the abundance of ftz calls required and you cannot
> eliminate it all (as at
> compile time you don’t know if the operand is a denorm or not). Another 
> approach
> would be to introduce special cases to the optimizations affected so
> they understand
> the FTZ builtin and might be able to remove the useless ones. This potentially
> touches _a lot_ of code. And in the end, if the CPU could flush
> denorms efficiently
> using hardware (typically it’s faster to do FTZ in HW than gradual
> underflow so this
> is likely the case), any builtin call to do it that cannot be
> optimized away presents
> additional, possibly major, runtime overhead.

Understood.

> We tested if a simple common subexpression elimination case works with
> the ftz builtins
> and it didn’t. CONST didn’t help here.
>
> However, I understand your concern that there might be optimizations
> that still break the
> FTZ semantics if there are no explicit builtin calls, but we are
> prepared to fix them case by
> case if/when they appear. The attached updated patch fixes a few
> additional cases we noticed,
> e.g. it disables several constant folding cases.
>
> On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers  wrote:
>> Presumably this means that constant folding needs to know about those
>> semantics, both for operations with a subnormal floating-point argument
>> (whether or not the output is floating point, or floating point in the
>> same format), and those with such a result?
>> Can assignments copy subnormals without converting them to zero?  Should
>> comparisons flush input subnormals to zero before comparing?  Should
>> conversions e.g. from float to double convert a float subnormal input to
>> zero?
>
> I can answer yes to all of these questions.

I think the flag approach isn't good here.  If we'd have a mode that
doesn't have
denormals we could represent that but it's the language frontend that requires
a certa

Re: [PATCH] Switch on *.cc tests for g++ ASan

2017-08-22 Thread Jeff Law
On 08/22/2017 03:12 AM, Richard Biener wrote:
> On Wed, Aug 9, 2017 at 6:27 PM, Jeff Law  wrote:
>> On 08/07/2017 11:59 PM, Slava Barinov wrote:
>>>   * g++.dg/asan/asan.exp: Switch on *.cc tests.
>>>
>>> Signed-off-by: Slava Barinov 
>>> ---
>>>  gcc/testsuite/ChangeLog| 4 
>>>  gcc/testsuite/g++.dg/asan/asan.exp | 2 +-
>>>  2 files changed, 5 insertions(+), 1 deletion(-)
>> Thanks.  Installed.
> 
> They all fail, I tried to fix that up but failed.  Thus reverted.
Thanks.  It was on my todo list to figure out if any were passing.
Thanks for taking care of it.
jeff


Re: [PATCH] Avoid re-allocating PHIs in split_edge

2017-08-22 Thread Jeff Law
On 08/22/2017 03:03 AM, Richard Biener wrote:
> 
> The following patch makes sure to not grow the number of incoming
> edges in the destination when doing split_edge on GIMPLE.  That's
> easy by first redirecting the existing edge to the destination
> to the new block rather than creating the new fallthru from the
> new block to the destination.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> Richard.
> 
> 2017-08-22  Richard Biener  
> 
>   * tree-cfg.c (gimple_split_edge): Avoid reallocating target
>   PHI nodes.
Definitely a good thing.  Having PHIs get reallocated has led to some
subtle bugs.  I realize this isn't a complete solution to that problem,
but every bit helps.

jeff


Re: [PATCH] Improve alloca alignment

2017-08-22 Thread Wilco Dijkstra
Jeff Law wrote:
On 07/26/2017 05:29 PM, Wilco Dijkstra wrote:

> > But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0
> > seems wrong too given that round_push uses a different alignment to align 
> > to. 
> I had started to dig into the history of this code, but just didn't have
> the time to do so fully before needing to leave for the day.  To some
> degree I was hoping you knew the rationale behind the test against
> MAX_SUPPORTED_STACK_ALIGNMENT and I wouldn't have to do a ton of digging :-)

I looked further into this - it appears this works correctly since it is only 
bypassed if
size_align is already maximally aligned. round_push aligns to the preferred 
alignment,
which may be lower or equal to MAX_SUPPORTED_STACK_ALIGNMENT (which is
at least STACK_BOUNDARY).

Here is the updated version:

ChangeLog:
2017-08-22  Wilco Dijkstra  

* explow.c (get_dynamic_stack_size): Improve dynamic alignment.

diff --git a/gcc/explow.c b/gcc/explow.c
index 
50074e281edd5270c76d29feac6b7a92f598d11d..d3148273030a010ece1f8ea1c14eef64bbf4e78a
 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1234,15 +1234,20 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
  example), so we must preventively align the value.  We leave space
  in SIZE for the hole that might result from the alignment operation.  */
 
-  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
-
-  if (flag_stack_usage_info && pstack_usage_size)
-*pstack_usage_size += extra;
+  /* Since the stack is presumed to be aligned before this allocation,
+ we only need to increase the size of the allocation if the required
+ alignment is more than the stack alignment.  */
+  if (required_align > STACK_BOUNDARY)
+{
+  extra = (required_align - STACK_BOUNDARY) / BITS_PER_UNIT;
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);
+  if (size_align > STACK_BOUNDARY)
+   size_align = STACK_BOUNDARY;
 
-  if (extra && size_align > BITS_PER_UNIT)
-size_align = BITS_PER_UNIT;
+  if (flag_stack_usage_info && pstack_usage_size)
+   *pstack_usage_size += extra;
+}
 
   /* Round the size to a multiple of the required stack alignment.
  Since the stack is presumed to be rounded before this allocation,


Re: PR81635: Use chrecs to help find related data refs

2017-08-22 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
>  wrote:
>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
>>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>  wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: 0
>>DR_INIT: 0 or 4
>>DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd 
>> have:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: (intptr_t) i
>>DR_INIT: 0 or 4
>>DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood
> the issue.
> I made changes in scev to better fold type conversion by
> various sources
> of information, for example, vrp, niters, undefined overflow
> behavior etc.
> In theory these information should be available for other
> optimizers without
> querying scev.  For the mentioned test, vrp should compute
> accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) it without
> worrying
> overflow.  Note we don't do it in generic folding because
> (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or
> because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To
> provide such
> an interface, we changed tree-affine and made it do aggressive
> fold.  I am
> curious if it's possible to use aff_tree to implement
> strip_constant_offset
> here since aggressive folding is wanted.  After all, using
> additional chrec
> looks like a little heavy wrto the simple test.

 Yeah, using aff_tree does work here when the bounds are constant.
 It doesn't look like it works for things like:

 double p[1000];
 double q[1000];

 void
 f4 (unsigned int n)
 {
   for (unsigned int i = 0; i < n; i += 4)
 {
   double a = q[i] + p[i];
   double b = q[i + 1] + p[i + 1];
   q[i] = a;
   q[i + 1] = b;
 }
 }

 though, where the bounds on the global arrays guarantee that [i + 1] 
 can't
 overflow, even though "n" is unconstrained.  The patch as posted 
 handles
 this case too.
>>> BTW is this a missed optimization in value range analysis?  The range
>>> information for i should flow in a way like: array boundary -> niters
>>> -> scev/vrp.
>>> I think that's what niters/scev do in analysis.
>>
>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>> the previous VRP pass came before loop header copying, so the (single)
>> header has to cope with n == 0 case.  Thus we get:
> Ah, there are several passes in between vrp and pass_ch, not sure if
> any such pass depends on vrp intensively.  I would suggestion reorder
> the two passes, or standalone VRP interface updating information for
> loop region after header copied?   This is a non-trivial issue that
> needs to be fixed.  Niters analyzer rely on
> simplify_using_initial_conditions heavily to get the same information,
> which in my opinion should be provided by VRP.  Though this won't be
> able to obsolete simplify_using_initial_conditions because VRP is weak
> in symbolic range...
>
>>
>>   Visiting statement:
>>   i_15 = ASSERT_EXPR ;
>>   Intersecting
>> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>   and
>> [0, 0]
>>   to
>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>   Intersecting
>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>   and
>>

[PATCH][PR tree-optimization/81741] Throttle recording conditional equivalences

2017-08-22 Thread Jeff Law
This patch addresses some issues with conditional equivalences.

First, it stops recording blindly recording conditional equivalences.
Which leads to regressions...

So for certain binary expressions (for example x - y), if we lookup the
expression in the hash table and miss, we do a second lookup to see if
we have x == y in the hash table.  If so, then even though we don't know
the exact values of x and y, we can still provide a constant result.

I considered doing this in DOM rather than in the hash table lookup
routines, but then we'd have to duplicate the functionality in the
DOM/VRP threader.  Integrating the alternate lookup in the hash table
avoids that pitfall and turned out to be easy.  I've added a variety of
new tests to verify this functionality (extensions of pr71947 testcases).

For a conditional equivalence where the cost of computing one of the
SSA_NAMEs is higher than the other (as seen with 81741), we do record
the equivalence, but arrange it that we will replace the expensive name
with the cheap name.  Obviously, I use the code from 81741 for the
testcase here.

However, there are still cases where we have a conditional equivalence
and the names have the same cost to compute.  A greatly simplified
example can be found in gcc.dg/tree-ssa/20030922-2.c.

I've simply xfailed this test.  To fix it we need to build a second set
of tables that are essentially the conditional equivalences, without
setting SSA_NAME_VALUE (SSA_NAME_VALUE is what drives const/copy
propagation in DOM).  That's actually fairly easy and not terribly
costly.  What gets ugly is we have to consult this second set of tables
when doing simplifications.  Worse yet, we have to run through each
operand's conditional equivalences, substitute it in and try to
simplify.  It just don't expect it to hit enough to justify that pain.

The net result is we should stop ping-ponging copy propagations that
arise from conditional equivalences at the loss of some minor
optimizations in code similar to 20030922-2.c.

Bootstrapped and regression tested on x86_64.  Installing on the trunk.


Jeff


commit 44d01266aff5583b3b6db30158194656cfe88cae
Author: Jeff Law 
Date:   Tue Aug 22 09:10:02 2017 -0600

PR tree-optimization/81741
PR tree-optimization/71947
* tree-ssa-dom.c: Include tree-inline.h.
(record_temporary_equivalences): Only record SSA_NAME = SSA_NAME
equivalences if one is more expensive to compute than the other.
* tree-ssa-scopedtables.h (class const_or_copies): Make
record_const_or_copy_raw method private.
(class avail_exprs_stack): New method simplify_binary_operation.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): 
Call
avail_exprs_stack::simplify_binary_operation as needed.
(avail_exprs_stack::simplify_binary_operation): New function.

PR tree-optimization/81741
PR tree-optimization/71947
* gcc.dg/tree-ssa/pr81741.c: New test.
* gcc.dg/tree-ssa/pr71947-7.c: New test.
* gcc.dg/tree-ssa/pr71947-8.c: New test.
* gcc.dg/tree-ssa/pr71947-9.c: New test.
* gcc.dg/tree-ssa/pr71941-1.c: Tweak expected output.
* gcc.dg/tree-ssa/pr71941-2.c: Tweak expected output.
* gcc.dg/tree-ssa/pr71941-3.c: Tweak expected output.
* gcc.dg/tree-ssa/20030922-2.c: xfail.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ab85c074f24..9b941af74c6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@
+2017-08-22  Jeff Law  
+
+   PR tree-optimization/81741
+   PR tree-optimization/71947
+   * tree-ssa-dom.c: Include tree-inline.h.
+   (record_temporary_equivalences): Only record SSA_NAME = SSA_NAME
+   equivalences if one is more expensive to compute than the other.
+   * tree-ssa-scopedtables.h (class const_or_copies): Make
+   record_const_or_copy_raw method private.
+   (class avail_exprs_stack): New method simplify_binary_operation.
+   * tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): Call
+   avail_exprs_stack::simplify_binary_operation as needed.
+   (avail_exprs_stack::simplify_binary_operation): New function.
+
 2017-08-22  Sebastian Huber  
 
* config.gcc (powerpc-*-rtems*): Add rs6000/linux64.opt.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 17733519e0b..531d0f95ae7 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,16 @@
+2017-08-22  Jeff Law  
+
+   PR tree-optimization/81741
+   PR tree-optimization/71947
+   * gcc.dg/tree-ssa/pr81741.c: New test.
+   * gcc.dg/tree-ssa/pr71947-7.c: New test.
+   * gcc.dg/tree-ssa/pr71947-8.c: New test.
+   * gcc.dg/tree-ssa/pr71947-9.c: New test.
+   * gcc.dg/tree-ssa/pr71941-1.c: Tweak expected output.
+   * gcc.dg/tree-ssa/pr71941-2.c: Tweak expected output.
+   * gcc.dg/tree

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08 - V3

2017-08-22 Thread Jeff Law
On 08/18/2017 08:05 AM, Richard Biener wrote:
> On Mon, Jul 31, 2017 at 7:38 AM, Jeff Law  wrote:
>>
>> This patch introduces generic mechanisms to protect the dynamically
>> allocated stack space against stack-clash attacks.
>>
>> Changes since V2:
>>
>> Dynamic allocations can be emitted as unrolled inlined probes or with a
>> rotated loop.  Blockage insns are also properly emitted for the dynamic
>> area probes and the dynamic area probing now supports targets that may
>> make optimistic assumptions in their prologues.  Finally it uses the new
>> param to control the probing interval.
>>
>> Tests were updated to explicitly specify the guard and probing interval.
>>  New test to check inline/unrolled probes as well as rotated loop.
>>
>>
>>
>> * explow.c: Include "params.h".
>> (anti_adjust_stack_and_probe_stack_clash): New function.
>> (get_stack_check_protect): Likewise.
>> (compute_stack_clash_protection_loop_data): Likewise.
>> (emit_stack_clash_protection_loop_start): Likewise.
>> (emit_stack_clash_protection_loop_end): Likewise.
>> (allocate_dynamic_stack_space): Use get_stack_check_protect.
>> Use anti_adjust_stack_and_probe_stack_clash.
>> * explow.h (compute_stack_clash_protection_loop_data): Prototype.
>> (emit_stack_clash_protection_loop_start): Likewise.
>> (emit_stack_clash_protection_loop_end): Likewise.
>> * rtl.h (get_stack_check_protect): Prototype.
>> * defaults.h (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE):
>> Define new default.
>> * doc/tm.texi.in (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE):
>> Define.
> 
> Please make this a hook instead of a target macro.
Will do.  I should have known better :-)


> 
> Besides this it looks good (I trust you on the RTL details).
THanks.


Jeff


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3

2017-08-22 Thread Jeff Law
On 08/19/2017 12:22 PM, Martin Sebor wrote:
> On 07/30/2017 11:35 PM, Jeff Law wrote:
>> This patch introduces the stack clash protection options
>>
>> Changes since V2:
>>
>> Adds two new params.  The first controls the size of the guard area.
>> This controls the threshold for when a function prologue requires
>> probes.  The second controls the probing interval -- ie, once probes are
>> needed, how often do we emit them.  These are really meant more for
>> developers to experiment with than users.  Regardless I did go ahead and
>> document them./PARAM
>>
>> It also adds some sanity checking WRT combining stack clash protection
>> with -fstack-check.
> 
> Just a minor nit and suggestion:
> 
> "supproted" -> "supported"
> 
> +  warning_at (UNKNOWN_LOCATION, 0,
> +  "-fstack-clash_protection is not supproted on targets "
> +  "where the stack grows from lower to higher addresses");
> 
> and quote the name of the options in diagnostics, i.e., use either
> 
>   "%<"-fstack-clash_protection%> ..."
> 
> or
> 
>   "%qs is not supported...", "-fstack-clash_protection"
> 
> as you did in error ("value of parameter %qs must be a power of 2",
> ompiler_params[i].option);
> 
> Likewise in
> 
> +  warning_at (UNKNOWN_LOCATION, 0,
> +  "-fstack-check= and -fstack-clash_protection are mutually "
> +  "exclusive.  Disabling -fstack-check=");

Thanks.  I settled on the %< %> style.  None of the other warnings in
that area use either.  Otherwise I would have just selected whatever was
most commonly used in that code.

jeff


[PATCH] Add 'short_call' attribute for MIPS targets

2017-08-22 Thread Simon Atanasyan
Currently GCC supports 'long_call', 'far', and 'near' attributes. The
'long_call' and 'far' attributes are synonyms. This patch adds support
for the 'short_call' attribute as a synonym for `near` to make this list
complete, consistent with other targets, and compatible with attributes
supported by the Clang.

Tested on mipsel-linux-gnu.

2017-08-22  Simon Atanasyan  

gcc/

* config/mips/mips.c (mips_attribute_table): Add 'short_call' attribute.
(mips_near_type_p): Add 'short_call' attribute as a synonym for 'near'.
* doc/extend.texi (short_call): Document new function attribute.

gcc/testsuite/

* gcc.target/mips/near-far-1.c: Add check for 'short_call' attribute.
* gcc.target/mips/near-far-2.c: Likewise.
* gcc.target/mips/near-far-3.c: Likewise.
* gcc.target/mips/near-far-4.c: Likewise.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c (revision 251219)
+++ gcc/config/mips/mips.c (working copy)
@@ -598,6 +598,7 @@ static const struct attribute_spec mips_attribute_
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
om_diagnostic } */
   { "long_call",   0, 0, false, true,  true,  NULL, false },
+  { "short_call",  0, 0, false, true,  true,  NULL, false },
   { "far",   0, 0, false, true,  true,  NULL, false },
   { "near",0, 0, false, true,  true,  NULL, false },
   /* We would really like to treat "mips16" and "nomips16" as type
@@ -1171,13 +1172,14 @@ mflip_mips16_use_mips16_p (tree decl)
   return *slot;
 }

-/* Predicates to test for presence of "near" and "far"/"long_call"
+/* Predicates to test for presence of "near"/"short_call" and "far"/"long_call"
attributes on the given TYPE.  */

 static bool
 mips_near_type_p (const_tree type)
 {
-  return lookup_attribute ("near", TYPE_ATTRIBUTES (type)) != NULL;
+  return (lookup_attribute ("short_call", TYPE_ATTRIBUTES (type)) != NULL
+  || lookup_attribute ("near", TYPE_ATTRIBUTES (type)) != NULL);
 }

 static bool
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 251219)
+++ gcc/doc/extend.texi (working copy)
@@ -4528,10 +4528,12 @@ void __attribute__ ((interrupt("vector=hw3"))) v9
 @end smallexample

 @item long_call
+@itemx short_call
 @itemx near
 @itemx far
 @cindex indirect calls, MIPS
 @cindex @code{long_call} function attribute, MIPS
+@cindex @code{short_call} function attribute, MIPS
 @cindex @code{near} function attribute, MIPS
 @cindex @code{far} function attribute, MIPS
 These attributes specify how a particular function is called on MIPS@.
@@ -4539,8 +4541,9 @@ The attributes override the @option{-mlong-calls}
 command-line switch.  The @code{long_call} and @code{far} attributes are
 synonyms, and cause the compiler to always call
 the function by first loading its address into a register, and then using
-the contents of that register.  The @code{near} attribute has the opposite
-effect; it specifies that non-PIC calls should be made using the more
+the contents of that register.  The @code{short_call} and @code{near}
+attributes are synonyms, and have the opposite
+effect; they specify that non-PIC calls should be made using the more
 efficient @code{jal} instruction.

 @item mips16
Index: gcc/testsuite/gcc.target/mips/near-far-1.c
===
--- gcc/testsuite/gcc.target/mips/near-far-1.c (revision 251219)
+++ gcc/testsuite/gcc.target/mips/near-far-1.c (working copy)
@@ -3,6 +3,7 @@

 extern int long_call_func () __attribute__((long_call));
 extern int far_func () __attribute__((far));
+extern int short_call_func () __attribute__((short_call));
 extern int near_func () __attribute__((near));
 extern int normal_func ();

@@ -10,6 +11,7 @@ int test ()
 {
   return (long_call_func ()
   + far_func ()
+  + short_call_func ()
   + near_func ()
   + normal_func ());
 }
@@ -16,5 +18,6 @@ int test ()

 /* { dg-final { scan-assembler-not "\tjal\tlong_call_func\n" } } */
 /* { dg-final { scan-assembler-not "\tjal\tfar_func\n" } } */
+/* { dg-final { scan-assembler "\t(jal(|s)|balc)\tshort_call_func\n" } } */
 /* { dg-final { scan-assembler "\t(jal(|s)|balc)\tnear_func\n" } } */
 /* { dg-final { scan-assembler-not "\tjal\tnormal_func\n" } } */
Index: gcc/testsuite/gcc.target/mips/near-far-2.c
===
--- gcc/testsuite/gcc.target/mips/near-far-2.c (revision 251219)
+++ gcc/testsuite/gcc.target/mips/near-far-2.c (working copy)
@@ -3,6 +3,7 @@

 extern int long_call_func () __attribute__((long_call));
 extern int far_func () __attribute__((far));
+extern int short_call_func () __attribute__((short_call));
 extern int near_func () __attribute__((near));
 extern int normal_func ();

@@ -10,6 +11,7 @@ int test ()
 {
   return (long_call_func ()
   + far_func ()
+  + short_call_func ()
 

Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
On 08/22/2017 01:26 AM, Andreas Schwab wrote:
> On Aug 21 2017, Daniel Santos  wrote:
>
>> This is a problem that occured because of this code in
>> ix86_option_override_internal:
>>
>>   if (!opts_set->x_ix86_abi)
>> opts->x_ix86_abi = DEFAULT_ABI;
> Why is that a problem?  Note opts_set vs opts.

Just because the test !opts_set->x_ix86_abi will be true rather we
supplied no -mabi parameter or we supplied -mabi=sysv.

Daniel

> Andreas.



Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3

2017-08-22 Thread Jeff Law
On 08/18/2017 08:01 AM, Richard Biener wrote:
> On Mon, Jul 31, 2017 at 7:35 AM, Jeff Law  wrote:
>> This patch introduces the stack clash protection options
>>
>> Changes since V2:
>>
>> Adds two new params.  The first controls the size of the guard area.
>> This controls the threshold for when a function prologue requires
>> probes.  The second controls the probing interval -- ie, once probes are
>> needed, how often do we emit them.  These are really meant more for
>> developers to experiment with than users.  Regardless I did go ahead and
>> document them./PARAM
>>
>> It also adds some sanity checking WRT combining stack clash protection
>> with -fstack-check.
> 
> diff --git a/gcc/params.c b/gcc/params.c
> index fab0ffa..8afe4c4 100644
> --- a/gcc/params.c
> +++ b/gcc/params.c
> @@ -209,6 +209,11 @@ set_param_value (const char *name, int value,
>  error ("maximum value of parameter %qs is %u",
> compiler_params[i].option,
> compiler_params[i].max_value);
> +  else if ((strcmp (name, "stack-clash-protection-guard-size") == 0
> +|| strcmp (name, "stack-clash-protection-probe-interval") == 0)
> +   && exact_log2 (value) == -1)
> +error ("value of parameter %qs must be a power of 2",
> +   compiler_params[i].option);
>else
>  set_param_value_internal ((compiler_param) i, value,
>params, params_set, true);
> 
> I don't like this.  Either use them as if they were power-of-two
> (floor_log2/ceil_log2 as appropriate) or simply make them take
> the logarithm instead (like -mincoming-stack-boundary and friends).
Yes.  I was torn on this for a variety of reasons, including the fact
that I don't actually expect anyone to be mucking with those :-)

Given we've already other stuff in log2 form, I'll use that -- it seems
less surprising to me than using floor/ceil.

> 
> Both -fstack-clash-protection and -fstack-check cannot be turned
> off per function.  This means they would need merging in lto-wrapper.
> The alternative is to mark them with 'Optimization' and allow per-function
> specification (like we do for -fstack-protector).
Do you have a strong preference here?  I'd tend to go with tweaking
lto-wrapper as we really don't want to have this stuff varying per-function.

Presumably in lto-wrapper we just have to detect that both were enabled
and do something sensible.  We drop -fstack-check in toplev.c when both
are specified, we could just as easily call that situation a fatal error
in both toplev.c and lto-wrapper.c

jeff




Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Martin Sebor

On 08/21/2017 07:41 PM, Daniel Santos wrote:

It took me a while to figure out how to do this so I figured that it should be
in the docs.  OK for trunk?

* doc/install.texi: Add more details on selecting multiple tests.


Thank you!  It had taken me some time to figure this out.


+The file-matching expression following @var{filename}@command{.exp=} is treated
+as a series of whitespace-delimited glob expressions so that multiple patterns
+may be passed, although any whitespace must either be escaped or surrounded by
+tick marks if multiple expressions are desired. For example,


Do you mean single quotes?  I would suggest "escaped or quoted."
The whole argument to RUNTESTFLAGS can be quoted in either single
or double quotes and, AFAICT, so can the space-separated test
names within it.

Martin


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3

2017-08-22 Thread David Malcolm
On Tue, 2017-08-22 at 09:35 -0600, Jeff Law wrote:
> On 08/19/2017 12:22 PM, Martin Sebor wrote:
> > On 07/30/2017 11:35 PM, Jeff Law wrote:
> > > This patch introduces the stack clash protection options
> > > 
> > > Changes since V2:
> > > 
> > > Adds two new params.  The first controls the size of the guard
> > > area.
> > > This controls the threshold for when a function prologue requires
> > > probes.  The second controls the probing interval -- ie, once
> > > probes are
> > > needed, how often do we emit them.  These are really meant more
> > > for
> > > developers to experiment with than users.  Regardless I did go
> > > ahead and
> > > document them./PARAM
> > > 
> > > It also adds some sanity checking WRT combining stack clash
> > > protection
> > > with -fstack-check.
> > 
> > Just a minor nit and suggestion:
> > 
> > "supproted" -> "supported"
> > 
> > +  warning_at (UNKNOWN_LOCATION, 0,
> > +  "-fstack-clash_protection is not supproted on targets "
> > +  "where the stack grows from lower to higher addresses");
> > 
> > and quote the name of the options in diagnostics, i.e., use either
> > 
> >   "%<"-fstack-clash_protection%> ..."
> > 
> > or
> > 
> >   "%qs is not supported...", "-fstack-clash_protection"
> > 
> > as you did in error ("value of parameter %qs must be a power of 2",
> > ompiler_params[i].option);
> > 
> > Likewise in
> > 
> > +  warning_at (UNKNOWN_LOCATION, 0,
> > +  "-fstack-check= and -fstack-clash_protection are
> > mutually "
> > +  "exclusive.  Disabling -fstack-check=");
> 
> Thanks.  I settled on the %< %> style.  None of the other warnings in
> that area use either.  Otherwise I would have just selected whatever
> was
> most commonly used in that code.

A nit that don't seem to have been mentioned: the patch refers to
  -fstack-clash_protection (erroneous underscore)
in two places, which should be:
  -fstack-clash-protection (all dashes)


Dave


[PATCH][GCC][mid-end] Fix single use requirement for xorsign (x * copysign (1, y))

2017-08-22 Thread Tamar Christina
Hi All,

This patch fixes the placement of the single-use requirement for xorsign.

on `x = a * copysign (1, b)` the intention was that `copysign (1, b)` be
single use, and not `x`. Requiring `x` to be single use blocks transformation
where we do want it to occur.

Regtested on aarch64-none-linux-gnu and no regressions
(only target currently supporting this).

Ok for trunk?

gcc/
2017-08-22  Tamar Christina  

PR middle-end/19706
* tree-ssa-math-opts.c (convert_expand_mult_copysign):
Fix single-use check.

-- 
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 87940b6e00db0d34a472504cf70923d1d334eccb..f26d105c16145bc77dd5e3a05b13f22aff43d4d8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -3200,21 +3200,20 @@ convert_expand_mult_copysign (gimple *stmt, gimple_stmt_iterator *gsi)
   type = TREE_TYPE (lhs);
   machine_mode mode = TYPE_MODE (type);
 
-  if (HONOR_SNANS (type) || !has_single_use (lhs))
+  if (HONOR_SNANS (type))
 return false;
 
   if (TREE_CODE (treeop0) == SSA_NAME && TREE_CODE (treeop1) == SSA_NAME)
 {
   gimple *call0 = SSA_NAME_DEF_STMT (treeop0);
-  if (!is_copysign_call_with_1 (call0))
+  if (!is_copysign_call_with_1 (call0) || !has_single_use (treeop0))
 	{
 	  call0 = SSA_NAME_DEF_STMT (treeop1);
-	  if (!is_copysign_call_with_1 (call0))
+	  if (!is_copysign_call_with_1 (call0) || !has_single_use (treeop1))
 	return false;
 
 	  treeop1 = treeop0;
 	}
-
 	if (optab_handler (xorsign_optab, mode) == CODE_FOR_nothing)
 	  return false;
 



[PATCH][GCC][mid-end] Add generic test for xorsign (x * copysign (1, y))

2017-08-22 Thread Tamar Christina
Hi All,

Since XORSIGN is a mid-end transformation I'm adding a generic test
to check for targets that support it if the transformation has been
carried out or not in tree.

Regtested on aarch64-none-linux-gnu, arm-none-eabi
and x86_64-pc-linux-gnu with no regressions

Ok for trunk?

gcc/
2017-08-22  Tamar Christina  

PR middle-end/19706
* doc/sourcebuild.texi (Other hardware attributes):
Document xorsign.


gcc/testsuite
2017-08-22  Tamar Christina  

PR middle-end/19706
* gcc.dg/tree-ssa/pr19706.c: New.
* lib/target-supports.exp (check_effective_target_xorsign): New.

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e6313dc031ef5b2b5a72180bccf1e876812efe48..a1ca417b5c810b3e16520bf68944be709e0b8e92 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1908,6 +1908,9 @@ or @code{EM_SPARCV9} executables.
 @item vect_cmdline_needed
 Target requires a command line argument to enable a SIMD instruction set.
 
+@item xorsign
+Target supports the xorsign optab expansion.
+
 @end table
 
 @subsubsection Environment attributes
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr19706.c b/gcc/testsuite/gcc.dg/tree-ssa/pr19706.c
new file mode 100644
index ..92dd5db5a979b57cd7e575bc182f4803d82936d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr19706.c
@@ -0,0 +1,86 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-require-effective-target xorsign } */
+/* { dg-require-effective-target arm_v8_neon_ok { target { arm*-*-* } } } */
+/* { dg-add-options arm_v8_neon } */
+
+double
+check_d_pos (double x, double y)
+{
+  return x * __builtin_copysign (1.0, y);
+}
+
+float
+check_f_pos (float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+long double
+check_l_pos (long double x, long double y)
+{
+  return x * __builtin_copysignl (1.0, y);
+}
+
+/* --- */
+
+double
+check_d_neg (double x, double y)
+{
+  return x * __builtin_copysign (-1.0, y);
+}
+
+float
+check_f_neg (float x, float y)
+{
+  return x * __builtin_copysignf (-1.0f, y);
+}
+
+long double
+check_l_neg (long double x, long double y)
+{
+  return x * __builtin_copysignl (-1.0, y);
+}
+
+/* --- */
+
+double
+check_d_pos_rev (double x, double y)
+{
+  return __builtin_copysign (1.0, y) * x;
+}
+
+float
+check_f_pos_rev (float x, float y)
+{
+  return __builtin_copysignf (1.0f, y) * x;
+}
+
+long double
+check_l_pos_rev (long double x, long double y)
+{
+  return __builtin_copysignl (1.0, y) * x;
+}
+
+/* --- */
+
+double
+check_d_neg_rev (double x, double y)
+{
+  return __builtin_copysign (-1.0, y) * x;
+}
+
+float
+check_f_neg_rev (float x, float y)
+{
+  return __builtin_copysignf (-1.0f, y) * x;
+}
+
+long double
+check_l_neg_rev (long double x, long double y)
+{
+  return __builtin_copysignl (-1.0, y) * x;
+}
+
+/* { dg-final { scan-tree-dump "XORSIGN" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "copysign" "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5219fbf4671e83a6fa7affdab926115e8a23f9cb..db0c0ff089acba29a4c3c177d4ebacd40ce1a631 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5332,6 +5332,28 @@ proc check_effective_target_vect_perm_short { } {
 return $et_vect_perm_short_saved($et_index)
 }
 
+# Return 1 if the target plus current options supports folding of
+# copysign into XORSIGN.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_xorsign { } {
+global et_xorsign_saved
+global et_index
+
+if [info exists et_xorsign_saved($et_index)] {
+verbose "check_effective_target_xorsign: using cached result" 2
+} else {
+	set et_xorsign_saved($et_index) 0
+if { [istarget aarch64*-*-*] || [istarget arm*-*-*] } {
+	set et_xorsign_saved($et_index) 1
+}
+}
+verbose "check_effective_target_xorsign:\
+	 returning $et_xorsign_saved($et_index)" 2
+return $et_xorsign_saved($et_index)
+}
+
 # Return 1 if the target plus current options supports a vector
 # widening summation of *short* args into *int* result, 0 otherwise.
 #



Re: [PATCH][GCC][mid-end] Fix single use requirement for xorsign (x * copysign (1, y))

2017-08-22 Thread Tamar Christina
Sorry, forgot to add some maintainers :)

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Tuesday, August 22, 2017 5:27:23 PM
To: gcc-patches@gcc.gnu.org
Cc: nd; l...@redhat.com
Subject: [PATCH][GCC][mid-end] Fix single use requirement for xorsign (x * 
copysign (1, y))

Hi All,

This patch fixes the placement of the single-use requirement for xorsign.

on `x = a * copysign (1, b)` the intention was that `copysign (1, b)` be
single use, and not `x`. Requiring `x` to be single use blocks transformation
where we do want it to occur.

Regtested on aarch64-none-linux-gnu and no regressions
(only target currently supporting this).

Ok for trunk?

gcc/
2017-08-22  Tamar Christina  

PR middle-end/19706
* tree-ssa-math-opts.c (convert_expand_mult_copysign):
Fix single-use check.

--


Re: [PATCH] Fix PR81921

2017-08-22 Thread Richard Earnshaw (lists)
On 22/08/17 13:55, Kyrill Tkachov wrote:
> Hi Richard,
> [roping in more aarch64 maintainers]
> 
> On 22/08/17 13:27, Richard Biener wrote:
>> On Tue, 22 Aug 2017, Uros Bizjak wrote:
>>
>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener 
>>> wrote:
 The following patch fixes PR81921 (and LTO build of libgo) which I ran
 into when trying to enable free-lang-data for non-LTO compiles.

 free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all functions
 so we have them ending up with target_option_default_node eventually
 which is something ix86_can_inline_p doesn't expect (I tried forcing
 a compare of the actual options but that fails as well as we get
 spurious differences in use-fpmath, the default node with -m32
 -march=x86_64 doesn't have it while non-default nodes have it).

 The patch is what I consider safe for branches, we might want to work
 on sth better (actually comparing always and fixing the fpmath issue)
 on trunk as followup.

 Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
 and active branches?

 Note the change to the ref = false conditional isn't strictly necessary
 but it makes -flto and non-flto behave consistently.

 Thanks,
 Richard.

 2017-08-22  Richard Biener  

  PR target/81921
  * config/i386/i386.c (ix86_can_inline_p): Treat
  target_option_default_node as non-existent.
>>> LGTM.
>>>
>>> Please give the patch some soaking time in the mainline before
>>> backporting it to release branches.
>> Thanks.  Of course this was copied by other targets (and the x86
>> one maybe from the default).  So the following is an extended patch.
>>
>> Ok for the rs6000 and aarch64 bits?
>>
>> Thanks,
>> Richard.
>>
>> 2017-08-22  Richard Biener  
>>
>> PR target/81921
>> * config/i386/i386.c (ix86_can_inline_p): Treat
>> target_option_default_node as non-existent.
>> * targhooks.c (default_target_can_inline_p): Likewise.
>> * config/aarch64/aarch64.c (aarch64_can_inline_p): Likewise.
>> * config/powerpcspe/powerpcspe.c (rs6000_can_inline_p): Likewise.
>> * config/rs6000/rs6000.c (rs6000_can_inline_p): Likewise.
>>
>> Index: gcc/config/i386/i386.c
>> ===
>> --- gcc/config/i386/i386.c(revision 251266)
>> +++ gcc/config/i386/i386.c(working copy)
>> @@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
>> tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
>>   /* If callee has no option attributes, then it is ok to inline.  */
>> -  if (!callee_tree)
>> +  if (!callee_tree || callee_tree == target_option_default_node)
>>   ret = true;
>>   /* If caller has no option attributes, but callee does then it
>> is not ok to
>>inline.  */
>> -  else if (!caller_tree)
>> +  else if (!caller_tree || caller_tree == target_option_default_node)
>>   ret = false;
>>   else
>> Index: gcc/testsuite/gcc.target/i386/pr81921.c
>> ===
>> --- gcc/testsuite/gcc.target/i386/pr81921.c(nonexistent)
>> +++ gcc/testsuite/gcc.target/i386/pr81921.c(working copy)
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target lto } */
>> +/* { dg-options "-flto -march=x86-64" } */
>> +
>> +extern __inline int  __attribute__((__gnu_inline__,
>> __always_inline__, __artificial__, target("sse2")))
>> +_mm_loadu_si128 (int const *__P)
>> +{
>> +return *__P;
>> +}
>> +
>> +void __attribute__((target("ssse3"))) foo (void *p)
>> +{
>> +  volatile int x = _mm_loadu_si128 (p);
>> +}
>> Index: gcc/targhooks.c
>> ===
>> --- gcc/targhooks.c(revision 251274)
>> +++ gcc/targhooks.c(working copy)
>> @@ -1447,12 +1447,12 @@ default_target_can_inline_p (tree caller
>> tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
>>   /* If callee has no option attributes, then it is ok to inline */
>> -  if (!callee_opts)
>> +  if (!callee_opts || callee_tree == target_option_default_node)
>>   ret = true;
>>   /* If caller has no option attributes, but callee does then it
>> is not ok to
>>inline */
>> -  else if (!caller_opts)
>> +  else if (!caller_opts || caller_tree == target_option_default_node)
>>   ret = false;
>>   /* If both caller and callee have attributes, assume that if the
>> Index: gcc/config/aarch64/aarch64.c
>> ===
>> --- gcc/config/aarch64/aarch64.c(revision 251274)
>> +++ gcc/config/aarch64/aarch64.c(working copy)
>> @@ -10074,7 +10074,7 @@ aarch64_can_inline_p (tree caller, tree
>> tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
>>   /* If callee has no option attributes, then it is ok to inline.  */
>> -  if (!callee_tree)
>> +  i

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3

2017-08-22 Thread Jeff Law
On 08/22/2017 10:26 AM, David Malcolm wrote:
>> Thanks.  I settled on the %< %> style.  None of the other warnings in
>> that area use either.  Otherwise I would have just selected whatever
>> was
>> most commonly used in that code.
> 
> A nit that don't seem to have been mentioned: the patch refers to
>   -fstack-clash_protection (erroneous underscore)
> in two places, which should be:
>   -fstack-clash-protection (all dashes)
I already fixed that locally :-)

jeff


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Mike Stump

> On Aug 22, 2017, at 8:58 AM, Martin Sebor  wrote:
> 
> On 08/21/2017 07:41 PM, Daniel Santos wrote:
>> It took me a while to figure out how to do this so I figured that it should 
>> be
>> in the docs.  OK for trunk?
>> 
>>  * doc/install.texi: Add more details on selecting multiple tests.
> 
> Thank you!  It had taken me some time to figure this out.
> 
>> +The file-matching expression following @var{filename}@command{.exp=} is 
>> treated
>> +as a series of whitespace-delimited glob expressions so that multiple 
>> patterns
>> +may be passed, although any whitespace must either be escaped or surrounded 
>> by
>> +tick marks if multiple expressions are desired. For example,

> Do you mean single quotes?  I would suggest "escaped or quoted."


Yes, I agree.  Please, no tick, slightly too informal.

Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Mike Stump
On Aug 22, 2017, at 8:58 AM, Martin Sebor  wrote:
> 
> On 08/21/2017 07:41 PM, Daniel Santos wrote:
>> It took me a while to figure out how to do this so I figured that it should 
>> be
>> in the docs.  OK for trunk?

Oh, yeah, with the correction mentioned, Ok.

Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 10:58 AM, Martin Sebor wrote:
> On 08/21/2017 07:41 PM, Daniel Santos wrote:
>> It took me a while to figure out how to do this so I figured that it
>> should be
>> in the docs.  OK for trunk?
>>
>> * doc/install.texi: Add more details on selecting multiple tests.
>
> Thank you!  It had taken me some time to figure this out.
>
>> +The file-matching expression following @var{filename}@command{.exp=}
>> is treated
>> +as a series of whitespace-delimited glob expressions so that
>> multiple patterns
>> +may be passed, although any whitespace must either be escaped or
>> surrounded by
>> +tick marks if multiple expressions are desired. For example,
>
> Do you mean single quotes?

Yes.  I guess I've heard the terms "tick marks" and "single quotes" used
before.  Perhaps using 'single quotes' would be a good way to express it
(with the quotes).

>   I would suggest "escaped or quoted."
> The whole argument to RUNTESTFLAGS can be quoted in either single
> or double quotes and, AFAICT, so can the space-separated test
> names within it.

Well, mysteriously, double quotes do not work.  So if I pass
RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c pr80969-4.c"' then the second
pattern isn't used.  I have NO idea what happens to it because it I pass
RUNTESTFLAGS='i386.exp=pr80969-[12]*.c pr80969-4.c' then runtest
properly demands that I tell it what in the hell pr80969-4.c is supposed
to mean.  As an experiment, I created a symlink named \"pr80969-4.c and
using RUNTESTFLAGS='"i386.exp=pr80969-[12]*.c "pr80969-4.c' but it
didn't pick it up.  This is probably JAB (just another bug) in DejaGNU.

Among the variations I've tried are enclosing the expressions in
{braces},  \{escaped braces\} and comma-delimited \{escaped,braces\},
but none of these worked.

Daniel

> Martin
>



Re: [PATCH] Fix PR81921

2017-08-22 Thread Richard Biener
On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" 
 wrote:
>On 22/08/17 13:55, Kyrill Tkachov wrote:
>> Hi Richard,
>> [roping in more aarch64 maintainers]
>> 
>> On 22/08/17 13:27, Richard Biener wrote:
>>> On Tue, 22 Aug 2017, Uros Bizjak wrote:
>>>
 On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener
>
 wrote:
> The following patch fixes PR81921 (and LTO build of libgo) which I
>ran
> into when trying to enable free-lang-data for non-LTO compiles.
>
> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all
>functions
> so we have them ending up with target_option_default_node
>eventually
> which is something ix86_can_inline_p doesn't expect (I tried
>forcing
> a compare of the actual options but that fails as well as we get
> spurious differences in use-fpmath, the default node with -m32
> -march=x86_64 doesn't have it while non-default nodes have it).
>
> The patch is what I consider safe for branches, we might want to
>work
> on sth better (actually comparing always and fixing the fpmath
>issue)
> on trunk as followup.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for
>trunk
> and active branches?
>
> Note the change to the ref = false conditional isn't strictly
>necessary
> but it makes -flto and non-flto behave consistently.
>
> Thanks,
> Richard.
>
> 2017-08-22  Richard Biener  
>
>  PR target/81921
>  * config/i386/i386.c (ix86_can_inline_p): Treat
>  target_option_default_node as non-existent.
 LGTM.

 Please give the patch some soaking time in the mainline before
 backporting it to release branches.
>>> Thanks.  Of course this was copied by other targets (and the x86
>>> one maybe from the default).  So the following is an extended patch.
>>>
>>> Ok for the rs6000 and aarch64 bits?
>>>
>>> Thanks,
>>> Richard.
>>>
>>> 2017-08-22  Richard Biener  
>>>
>>> PR target/81921
>>> * config/i386/i386.c (ix86_can_inline_p): Treat
>>> target_option_default_node as non-existent.
>>> * targhooks.c (default_target_can_inline_p): Likewise.
>>> * config/aarch64/aarch64.c (aarch64_can_inline_p): Likewise.
>>> * config/powerpcspe/powerpcspe.c (rs6000_can_inline_p):
>Likewise.
>>> * config/rs6000/rs6000.c (rs6000_can_inline_p): Likewise.
>>>
>>> Index: gcc/config/i386/i386.c
>>> ===
>>> --- gcc/config/i386/i386.c(revision 251266)
>>> +++ gcc/config/i386/i386.c(working copy)
>>> @@ -7507,12 +7507,12 @@ ix86_can_inline_p (tree caller, tree cal
>>> tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
>>>   /* If callee has no option attributes, then it is ok to
>inline.  */
>>> -  if (!callee_tree)
>>> +  if (!callee_tree || callee_tree == target_option_default_node)
>>>   ret = true;
>>>   /* If caller has no option attributes, but callee does then it
>>> is not ok to
>>>inline.  */
>>> -  else if (!caller_tree)
>>> +  else if (!caller_tree || caller_tree ==
>target_option_default_node)
>>>   ret = false;
>>>   else
>>> Index: gcc/testsuite/gcc.target/i386/pr81921.c
>>> ===
>>> --- gcc/testsuite/gcc.target/i386/pr81921.c(nonexistent)
>>> +++ gcc/testsuite/gcc.target/i386/pr81921.c(working copy)
>>> @@ -0,0 +1,14 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-options "-flto -march=x86-64" } */
>>> +
>>> +extern __inline int  __attribute__((__gnu_inline__,
>>> __always_inline__, __artificial__, target("sse2")))
>>> +_mm_loadu_si128 (int const *__P)
>>> +{
>>> +return *__P;
>>> +}
>>> +
>>> +void __attribute__((target("ssse3"))) foo (void *p)
>>> +{
>>> +  volatile int x = _mm_loadu_si128 (p);
>>> +}
>>> Index: gcc/targhooks.c
>>> ===
>>> --- gcc/targhooks.c(revision 251274)
>>> +++ gcc/targhooks.c(working copy)
>>> @@ -1447,12 +1447,12 @@ default_target_can_inline_p (tree caller
>>> tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
>>>   /* If callee has no option attributes, then it is ok to inline
>*/
>>> -  if (!callee_opts)
>>> +  if (!callee_opts || callee_tree == target_option_default_node)
>>>   ret = true;
>>>   /* If caller has no option attributes, but callee does then it
>>> is not ok to
>>>inline */
>>> -  else if (!caller_opts)
>>> +  else if (!caller_opts || caller_tree ==
>target_option_default_node)
>>>   ret = false;
>>>   /* If both caller and callee have attributes, assume that if
>the
>>> Index: gcc/config/aarch64/aarch64.c
>>> ===
>>> --- gcc/config/aarch64/aarch64.c(revision 251274)
>>> +++ gcc/config/aarch64/aarch64.c(working copy)
>>> @@ -10074,7 +10074,7 @@ aarc

Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Mike Stump
On Aug 22, 2017, at 10:32 AM, Daniel Santos  wrote:
> 
>>  I would suggest "escaped or quoted."
>> The whole argument to RUNTESTFLAGS can be quoted in either single
>> or double quotes and, AFAICT, so can the space-separated test
>> names within it.
> 
> Well, mysteriously, double quotes do not work.

Did you try the obvious:

"\"pdf pdf\" pdf"

?  I think it should work fine.


[PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
OK, how's this one?

* doc/install.texi: Modify to add more details on running
selected tests.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/doc/install.texi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 7c9e2f25d44..da360da1c50 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2737,6 +2737,16 @@ the testsuite with filenames matching @samp{9805*}, you 
would use
 make check-g++ RUNTESTFLAGS="old-deja.exp=9805* @var{other-options}"
 @end smallexample
 
+The file-matching expression following @var{filename}@command{.exp=} is treated
+as a series of whitespace-delimited glob expressions so that multiple patterns
+may be passed, although any whitespace must either be escaped or surrounded by
+single quotes if multiple expressions are desired. For example,
+
+@smallexample
+make check-g++ RUNTESTFLAGS="old-deja.exp=9805*\ virtual2.c 
@var{other-options}"
+make check-g++ RUNTESTFLAGS="'old-deja.exp=9805* virtual2.c' 
@var{other-options}"
+@end smallexample
+
 The @file{*.exp} files are located in the testsuite directories of the GCC
 source, the most important ones being @file{compile.exp},
 @file{execute.exp}, @file{dg.exp} and @file{old-deja.exp}.
-- 
2.13.3



Re: Remove the frame size argument from function_prologue/epilogue

2017-08-22 Thread Mike Stump
On Aug 21, 2017, at 4:12 AM, Richard Sandiford  
wrote:
> 
> Later patches will add support for frame sizes that are a run-time
> invariant but not a compile-time constant.

I'm assuming dwarf and C++ exceptions will remain working.  :-)

Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Mike Stump
On Aug 22, 2017, at 10:44 AM, Daniel Santos  wrote:
> 
> OK, how's this one?

Ok.


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 12:32 PM, Mike Stump wrote:
> On Aug 22, 2017, at 10:32 AM, Daniel Santos  wrote:
>>>  I would suggest "escaped or quoted."
>>> The whole argument to RUNTESTFLAGS can be quoted in either single
>>> or double quotes and, AFAICT, so can the space-separated test
>>> names within it.
>> Well, mysteriously, double quotes do not work.
> Did you try the obvious:
>
> "\"pdf pdf\" pdf"
>
> ?  I think it should work fine.

Yes.  As I explained in the rest of my email I tried a great many
variations.  I can debug runtest some more and try to better understand
how this is getting parsed.

Daniel


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
On 08/22/2017 12:32 PM, Mike Stump wrote:
> On Aug 22, 2017, at 10:32 AM, Daniel Santos  wrote:
>>>  I would suggest "escaped or quoted."
>>> The whole argument to RUNTESTFLAGS can be quoted in either single
>>> or double quotes and, AFAICT, so can the space-separated test
>>> names within it.
>> Well, mysteriously, double quotes do not work.
> Did you try the obvious:
>
> "\"pdf pdf\" pdf"
>
> ?  I think it should work fine.

I have found one additional working mechanism:

RUNTESTFLAGS='i386.exp=\"pr80969-[12]*.c pr80969-4.c\"'

But using double quotes for both does NOT work:

RUNTESTFLAGS="i386.exp=\"pr80969-[12]*.c pr80969-4.c\""

So the three working options appears to be:
1. Escaping whitespace
2. Using double quotes for the whole value and single quotes for the
file.exp=patterns expression
3. Using single quotes for the whole value and double quotes for the
file.exp=patterns expression

Daniel


Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Daniel Santos
OK, the problem is at line 4014 of gcc/Makefile.in:

  $(MAKE) TESTSUITEDIR="$(TESTSUITEDIR)"
RUNTESTFLAGS="$(RUNTESTFLAGS)" \
check-parallel-$* \
 
Even worse, one can inject arbitrary shell commands here, not that I can
think of a scenario where it would be an actual security problem:

RUNTESTFLAGS="i386.exp=a b\"; beep\"" check-c

I presume that the solution would be to re-escape the contents of
RUNTESTFLAGS.

Daniel


[PATCH, rs6000] (v2) Testcase coverage for vec_perm built-ins

2017-08-22 Thread Will Schmidt
Hi, 
[PATCH, rs6000] (v2)  Testcase coverage for vec_perm built-ins

Add some Testcase coverage for the vector permute intrinsics.

(v2) Revisited and reworked all of the requires statements
throughout this batch.  Everything specifies -maltivec except
the double and long long tests, which require vsx.

(re-) Tested across power platforms.   OK for trunk?

Thanks,
-Will

   
[gcc/testsuite]

2017-08-17  Will Schmidt  

* gcc.target/powerpc/fold-vec-perm-char.c: New.
* gcc.target/powerpc/fold-vec-perm-double.c: New.
* gcc.target/powerpc/fold-vec-perm-float.c: New.
* gcc.target/powerpc/fold-vec-perm-int.c: New.
* gcc.target/powerpc/fold-vec-perm-longlong.c: New.
* gcc.target/powerpc/fold-vec-perm-pixel.c: New.
* gcc.target/powerpc/fold-vec-perm-short.c: New.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-char.c
new file mode 100644
index 000..d907eae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-char.c
@@ -0,0 +1,31 @@
+/* Verify that overloaded built-ins for vec_perm with char
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include 
+
+vector bool char
+testbc (vector bool char vbc2, vector bool char vbc3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vbc2, vbc3, vuc);
+}
+
+vector signed char
+testsc (vector signed char vsc2, vector signed char vsc3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vsc2, vsc3, vuc);
+}
+
+vector unsigned char
+testuc (vector unsigned char vuc2, vector unsigned char vuc3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vuc2, vuc3, vuc);
+}
+
+/* { dg-final { scan-assembler-times "vperm" 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-double.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-double.c
new file mode 100644
index 000..7ceca9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-double.c
@@ -0,0 +1,17 @@
+/* Verify that overloaded built-ins for vec_perm with double
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+// vector double needs -mvsx.
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+#include 
+
+vector double
+testd (vector double vd2, vector double vd3, vector unsigned char vuc)
+{
+  return vec_perm (vd2, vd3, vuc);
+}
+
+/* { dg-final { scan-assembler-times "vperm" 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-float.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-float.c
new file mode 100644
index 000..c9cfb0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-float.c
@@ -0,0 +1,16 @@
+/* Verify that overloaded built-ins for vec_perm with float
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include 
+
+vector float
+testf (vector float vf2, vector float vf3, vector unsigned char vuc)
+{
+  return vec_perm (vf2, vf3, vuc);
+}
+
+/* { dg-final { scan-assembler-times "vperm" 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-int.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-int.c
new file mode 100644
index 000..a2fdc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-int.c
@@ -0,0 +1,31 @@
+/* Verify that overloaded built-ins for vec_perm with int
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include 
+
+vector bool int
+testbi (vector bool int vbi2, vector bool int vbi3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vbi2, vbi3, vuc);
+}
+
+vector signed int
+testsi (vector signed int vsi2, vector signed int vsi3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vsi2, vsi3, vuc);
+}
+
+vector unsigned int
+testui (vector unsigned int vui2, vector unsigned int vui3,
+   vector unsigned char vuc)
+{
+  return vec_perm (vui2, vui3, vuc);
+}
+
+/* { dg-final { scan-assembler-times "vperm" 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c
new file mode 100644
index 000..7f3e574
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c
@@ -0,0 +1,32 @@
+/* Verify that overloaded built-ins for vec_perm with long long
+   inputs produce the right code.  */
+
+/* { dg-do compile {target lp64} } */
+// 'long long' in Altivec types is invalid without -mvsx.
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+#include 
+
+vector bool long long
+testbl (vector bool long long vbl2, vector bool long long vbl3,
+   vector unsigned char vuc)
+{
+  return vec_per

Re: [PATCH] [docs] Explain how to use multiple file-name patterns in RUNTESTFLAGS

2017-08-22 Thread Mike Stump
On Aug 22, 2017, at 11:53 AM, Daniel Santos  wrote:
> 
> OK, the problem is at line 4014 of gcc/Makefile.in:
> 
>  $(MAKE) TESTSUITEDIR="$(TESTSUITEDIR)"
> RUNTESTFLAGS="$(RUNTESTFLAGS)" \
>check-parallel-$* \

So, this is typical of sh scripting.  Most kids don't quote and know how to 
quote in other than trivial cases.  It is one of the reasons why scripting is 
both better and worse.  sh from day 1 should have had a quote function that 
would quote the operand, it doesn't.  If it did, we'd put $(quote ...) in there 
instead of "..." and it would just work.

> I presume that the solution would be to re-escape the contents of
> RUNTESTFLAGS.

Yes.  The annoyance factor is so high, that no one ever does.  Feel free rot 
ignore the problem.



[PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
> UNKNOWN_ABI.

It would seem to me that UNSPECIFIED_ABI would be a better value name.

Also, I don't really understand what opts_set and opts are, except that I had
guessed opts_set is what the user asked for (or didn't ask for) and opts is
what we're going to actually use.  Am I close?

I'm re-running tests, so if they pass is this OK?

Thanks,
Daniel
---
 gcc/config/i386/i386-opts.h | 5 +++--
 gcc/config/i386/i386.c  | 3 +--
 gcc/config/i386/i386.opt| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 542cd0f3d67..a1d1552a3c6 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -44,8 +44,9 @@ last_alg
 /* Available call abi.  */
 enum calling_abi
 {
-  SYSV_ABI = 0,
-  MS_ABI = 1
+  UNSPECIFIED_ABI = 0,
+  SYSV_ABI = 1,
+  MS_ABI = 2
 };
 
 enum fpmath_unit
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 650bcbc65ae..c08ad55fcd9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags)
 ? PMODE_DI : PMODE_SI;
 
-  if (!opts_set->x_ix86_abi)
+  if (opts_set->x_ix86_abi == UNSPECIFIED_ABI)
 opts->x_ix86_abi = DEFAULT_ABI;
 
   if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
 error ("-mabi=ms not supported with X32 ABI");
-  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index cd564315f04..f7b9f9707f7 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -525,7 +525,7 @@ Target Report Mask(IAMCU)
 Generate code that conforms to Intel MCU psABI.
 
 mabi=
-Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI)
+Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) 
Init(UNSPECIFIED_ABI)
 Generate code that conforms to the given ABI.
 
 Enum
-- 
2.13.3



Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Uros Bizjak
On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos  wrote:
>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
>> UNKNOWN_ABI.
>
> It would seem to me that UNSPECIFIED_ABI would be a better value name.
>
> Also, I don't really understand what opts_set and opts are, except that I had
> guessed opts_set is what the user asked for (or didn't ask for) and opts is
> what we're going to actually use.  Am I close?

Yes. opts_set is a flag that user specified an option at the command line.

However, I fail to see what is the problem. If nothing was specified,
then opts->x_ix86_abi is set to DEFAULT_ABI. Probably we don't need
Init(SYSV_ABI) in mabi= declaration at all.

Uros.

> I'm re-running tests, so if they pass is this OK?
>
> Thanks,
> Daniel
> ---
>  gcc/config/i386/i386-opts.h | 5 +++--
>  gcc/config/i386/i386.c  | 3 +--
>  gcc/config/i386/i386.opt| 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index 542cd0f3d67..a1d1552a3c6 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -44,8 +44,9 @@ last_alg
>  /* Available call abi.  */
>  enum calling_abi
>  {
> -  SYSV_ABI = 0,
> -  MS_ABI = 1
> +  UNSPECIFIED_ABI = 0,
> +  SYSV_ABI = 1,
> +  MS_ABI = 2
>  };
>
>  enum fpmath_unit
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 650bcbc65ae..c08ad55fcd9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p,
>  opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags)
>  ? PMODE_DI : PMODE_SI;
>
> -  if (!opts_set->x_ix86_abi)
> +  if (opts_set->x_ix86_abi == UNSPECIFIED_ABI)
>  opts->x_ix86_abi = DEFAULT_ABI;
>
>if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
>  error ("-mabi=ms not supported with X32 ABI");
> -  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>
>/* For targets using ms ABI enable ms-extensions, if not
>   explicit turned off.  For non-ms ABI we turn off this
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index cd564315f04..f7b9f9707f7 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -525,7 +525,7 @@ Target Report Mask(IAMCU)
>  Generate code that conforms to Intel MCU psABI.
>
>  mabi=
> -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI)
> +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) 
> Init(UNSPECIFIED_ABI)
>  Generate code that conforms to the given ABI.
>
>  Enum
> --
> 2.13.3
>


Re: Statement Frontier Notes, Location Views, and Inlined Entry Point Markers

2017-08-22 Thread Alexandre Oliva
On Aug 21, 2017, Richard Biener  wrote:

> On Fri, Aug 18, 2017 at 11:20 PM, Alexandre Oliva  wrote:

>> Besides implementing these new features, the patch contains multiple
>> fixes for -fcompare-debug errors detected at various optimization
>> levels, arising mainly from the introduction of begin stmt and inlined
>> entry point markers.

> Can you try to split those out?

I'm pretty sure I split out before the ones that were not triggered by
the introduction of the new debug position markers.  The remaining ones
thus don't necessarily stand on their own, since the conditions needed
to trigger them are not necessarily exercised by the compiler.  Plus,
most of the fixes introduce references to the new classification of
debug stmts and insns.

Would it be sensible and acceptable to bring the bits that introduce the
new classification along with their new uses, rather than make two
revisions at the same spots?

> +gno-statement-frontiers
> +Common Driver RejectNegative Var(debug_nonbind_markers_p, 0) Init(2)
> +Don't enforce progressive recommended breakpoint locations.
> +
> +gstatement-frontiers
> +Common Driver RejectNegative Var(debug_nonbind_markers_p, 1)
> +Emit progressive recommended breakpoint locations.

> others get away with a single flag and Init(-1).  That is, 
> -gstatement-frontiers
> should set it to 1 already and -gno- to 0.  Why do you need the explicit
> entry for gno-..?

Good question.  I vaguely recall wondering about that myself when
copying from some other option.  Will investigate.


>for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> -if (DEBUG_INSN_P (insn))
> +if (BIND_DEBUG_INSN_P (insn))
>{

> DEBUG_BIND_INSN_P?  GIMPLE has gimple_debug_bind_p ...

Yeah, gimple notation seems to go from most general to most specific
(first gimple, then debug, then bind), whereas we go from specific to
general in the rtl macro names (insn at the end, debug just before it,
and now bind just before it).  I suppose debug bind insn would work, as
well, though it feels as awkward and inconsistent as e.g. Brazil, São
Paulo, Earth to me.  Or, for that matter, Aug 22, 2017, ;-) so I guess
one could get used to it ;-)


> +  /* If the function has too many markers, drop them while expanding.  */
> +  if (cfun->debug_marker_count
> +  >= PARAM_VALUE (PARAM_MAX_DEBUG_MARKER_COUNT))
> +cfun->debug_nonbind_markers = false;
> +

> if they are not a problem up until here why care now?

IIRC we do have a limit for VTA notes too, but there's a C++ testcase
(g++.dg/tree-ssa/pr14703.C) that expands and inlines fibonacci template
functions so deep, more than doubling the number of statements at all
but the base recursion levels, so we'd end up with over 2^{85+} debug
stmts if we didn't cut them off somehow.

> That said,
> what's the number
> of markers per GIMPLE stmt for optimized builds of tramp3d-v4.cpp?  [it has
> around 10 original function calls per generated assembler line]

I'm not sure how to count gimple stmts precisely, but I can count DEBUG
ones exactly.  Compiling tramp3d-v4 with -O2 -g and either -g0, -g, or
-gno-statement-frontiers, respectively, the 228t.optimized dump contains:

$ grep -c '^ *# DEBUG ' tramp3d-v4.ii.228t.optimized-*
tramp3d-v4.ii.228t.optimized-g0:0
tramp3d-v4.ii.228t.optimized-markers:982493
tramp3d-v4.ii.228t.optimized-noSFN:743940

$ wc -l tramp3d-v4.ii.228t.optimized-*
  238038 tramp3d-v4.ii.228t.optimized-g0
 1220531 tramp3d-v4.ii.228t.optimized-markers
  981978 tramp3d-v4.ii.228t.optimized-noSFN

So it seems like SFN and IEPM add a little over one extra debug stmt per
real stmt, whereas VTA added 3.  For this testcase, anyway.

That's not surprising.  Most useful statements have at least one
(variable-binding) side effect, and quite often more than one, so VTA
notes tend to dominate the begin_stmt markers.

Even inlined calls, that might have a different ratio, it's still one
inlined entry point and possibly one begin stmt marker, vs one bind stmt
per parameter, plus the inlined code.


> Would a better option be to condense multiple adjacent notes to a single one?
> That way we'd have a natural bound as fallback.

That wouldn't help with the fibonacci testcase, I'm afraid, but it's an
interesting idea I had not explored.

I guess it wouldn't help much in typical code, since typical code has
side effects and thus binding stmts, so we wouldn't find long streams of
nonbinding markers.  Even the atypical case of the fibonacci template
function set would end up being cut-off once I get to adding bindings
for return values, which is in my list of things to explore.


> I expect heavily abstracted C++ to blow up GIMPLE IL considerably that way...

> Did you see what these do to memory/compile-time use with a LTO bootstrap?

I haven't observed any notable changes.  I'll be glad to collect and
supply any specific measurements you choose.


> +  if (MARKER_DEBUG_INSN_P (insn))
> +   return true;
> +

> DEBUG_MARKER_INSN_P

See above.


> +/

Re: [PATCH, rs6000] (v2) Testcase coverage for vec_perm built-ins

2017-08-22 Thread Segher Boessenkool
Hi!

On Tue, Aug 22, 2017 at 02:03:03PM -0500, Will Schmidt wrote:
> (v2) Revisited and reworked all of the requires statements
> throughout this batch.  Everything specifies -maltivec except
> the double and long long tests, which require vsx.
> 
> (re-) Tested across power platforms.   OK for trunk?

Looks fine now, thanks!  Okay for trunk.


Segher


>   * gcc.target/powerpc/fold-vec-perm-char.c: New.
>   * gcc.target/powerpc/fold-vec-perm-double.c: New.
>   * gcc.target/powerpc/fold-vec-perm-float.c: New.
>   * gcc.target/powerpc/fold-vec-perm-int.c: New.
>   * gcc.target/powerpc/fold-vec-perm-longlong.c: New.
>   * gcc.target/powerpc/fold-vec-perm-pixel.c: New.
>   * gcc.target/powerpc/fold-vec-perm-short.c: New.


[PATCH v4 0/4] [i386] PR80969 Fix ICE with -mabi=ms -mavx512f

2017-08-22 Thread Daniel Santos
I had to fix a few things for x32 compatibility and I this is ready
now.  H.J. tested on machine with avx512 (including x32) and I've tested
both native x32 and normal x86_64 with m64, m32 and mx32 and all is
well.  I've made more changes to the tests so I'm just submitting a
version 2 of the whole patch set.

OK for trunk?

2017-08-22  Daniel Santos  

* config/i386/i386.h (ix86_frame::stack_realign_allocate_offset):
Remove field.
(ix86_frame::stack_realign_allocate): New field.
(struct machine_frame_state): Modify comments.
(machine_frame_state::sp_realigned_fp_end): New field.
* config/i386/i386.c (ix86_compute_frame_layout): Rework stack frame
layout calculation.
(sp_valid_at): Add assertion to assure no attempt to access invalid
offset of a realigned stack.
(fp_valid_at): Likewise.
(choose_baseaddr): Modify comments.
(ix86_emit_outlined_ms2sysv_save): Adjust to changes in
ix86_expand_prologue.
(ix86_expand_prologue): Modify stack realignment and allocation.
(ix86_expand_epilogue): Modify comments.

2017-08-22  Daniel Santos  

* gcc.target/i386/pr80969-1.c: New testcase.
* gcc.target/i386/pr80969-2a.c: Likewise.
* gcc.target/i386/pr80969-2.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/i386/pr80969-4a.c: Likewise.
* gcc.target/i386/pr80969-4b.c: Likewise.
* gcc.target/i386/pr80969-4.c: Likewise.
* gcc.target/i386/pr80969-4.h: New header common to pr80969-4*.c


Thanks,
Daniel


Re: Statement Frontier Notes, Location Views, and Inlined Entry Point Markers

2017-08-22 Thread Alexandre Oliva
On Aug 21, 2017, Richard Biener  wrote:

> +gno-statement-frontiers
> +Common Driver RejectNegative Var(debug_nonbind_markers_p, 0) Init(2)
> +Don't enforce progressive recommended breakpoint locations.
> +
> +gstatement-frontiers
> +Common Driver RejectNegative Var(debug_nonbind_markers_p, 1)
> +Emit progressive recommended breakpoint locations.

> others get away with a single flag and Init(-1).  That is, 
> -gstatement-frontiers
> should set it to 1 already and -gno- to 0.  Why do you need the explicit
> entry for gno-..?

All debug options that support negation seem to have adopted this idiom;
without it, the negated options end up misparsed as -g with an argument,
and then set_debug_level complains that "no-..." is not a number.

The logic of matching the longest option name prefix doesn't seem to
work very well when options have a prefix that is also a valid option
with a Joined(OrMissing) argument.

The same problem applies to -O: -Ono-fast is not parsed as a negative of
-Ofast, but as -O with no-fast as the argument, even though no
RejectNegative flag is present under Ofast.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PATCH 1/4] [i386] Correct comments, add assertions to sp_valid_at and fp_valid_at

2017-08-22 Thread Daniel Santos
When we realign the stack frame (without DRAP), there may be a range of
CFA offsets that should never be touched because they are alignment
padding and any reference to them is almost certainly an error.
Previously, only the offset of where the realigned stack frame starts
was recorded and checked in sp_valid_at and fp_valid_at.

This change adds sp_realigned_fp_last to struct machine_frame_state to
record the last valid offset from which the frame pointer can be used
when the stack pointer is realigned and modifies sp_valid_at and
fp_valid_at to fail an assertion when passed an offset in the "no-man's
land" between these two values.

Comments for struct machine_frame_state incorrectly stated that a
realigned stack pointer could be used to access offsets equal to or
greater than sp_realigned_offset, but it is only valid for offsets that
are greater.  This was the (incorrect) behaviour of sp_valid_at and
fp_valid_at prior to r250587 and this change now corrects the
documentation and adds clarification of the CFA-relative calculation.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 45 ++---
 gcc/config/i386/i386.h | 18 +-
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c08ad55fcd9..601e3ef47f6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13177,26 +13177,36 @@ choose_baseaddr_len (unsigned int regno, 
HOST_WIDE_INT offset)
   return len;
 }
 
-/* Determine if the stack pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the stack pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
-static inline bool
+static bool
 sp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.sp_valid && !(fs.sp_realigned
- && cfa_offset <= fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset <= fs.sp_realigned_offset)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset <= fs.sp_realigned_fp_last);
+  return false;
+}
+  return fs.sp_valid;
 }
 
-/* Determine if the frame pointer is valid for accessing the cfa_offset.
-   The register is saved at CFA - CFA_OFFSET.  */
+/* Determine if the frame pointer is valid for accessing the CFA_OFFSET in
+   the frame save area.  The register is saved at CFA - CFA_OFFSET.  */
 
 static inline bool
 fp_valid_at (HOST_WIDE_INT cfa_offset)
 {
   const struct machine_frame_state &fs = cfun->machine->fs;
-  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
- && cfa_offset > fs.sp_realigned_offset);
+  if (fs.sp_realigned && cfa_offset > fs.sp_realigned_fp_last)
+{
+  /* Validate that the cfa_offset isn't in a "no-man's land".  */
+  gcc_assert (cfa_offset >= fs.sp_realigned_offset);
+  return false;
+}
+  return fs.fp_valid;
 }
 
 /* Choose a base register based upon alignment requested, speed and/or
@@ -14675,6 +14685,9 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
+  /* Record last valid frame pointer offset.  */
+  m->fs.sp_realigned_fp_last = m->fs.sp_offset;
+
   /* The computation of the size of the re-aligned stack frame means
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
@@ -14688,13 +14701,15 @@ ix86_expand_prologue (void)
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
stack_pointer_rtx,
GEN_INT (-align_bytes)));
-  /* For the purposes of register save area addressing, the stack
-pointer can no longer be used to access anything in the frame
-below m->fs.sp_realigned_offset and the frame pointer cannot be
-used for anything at or above.  */
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
   m->fs.sp_realigned = true;
   m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  /* The stack pointer may no longer be equal to CFA - m->fs.sp_offset.
+Beyond this point, stack access should be done via choose_baseaddr or
+by using sp_valid_at and fp_valid_at to determine the correct base
+register.  Henceforth, any CFA offset should be thought of as logical
+and not physical.  */
+  gcc_assert (m->fs.sp_realigned_offset >= m->fs.sp_realigned_fp_last);
   gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
   /* SEH unwind emit doesn't currently support REG_CFA_EXPRESSION, which
 is needed to describe where a register is saved using a rea

[PATCH 2/4] [i386] Modify ix86_compute_frame_layout

2017-08-22 Thread Daniel Santos
These changes affect how the stack frame is calculated from the region
starting at frame.reg_save_offset until frame.frame_pointer_offset,
which includes either the stub save area or the (inline) SSE register
save area and the va_args register save area.

The calculation used when not realigning the stack pointer is the same,
but when when realigning we calculate the 16-byte aligned space needed
in reverse so that the stack realignment boundary at
frame.stack_realign_offset may not necessarily be a multiple of
stack_alignment_needed, but the value of frame.frame_pointer_offset
will. This results in a properly aligned stack for the function body and
avoids wasting stack space.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 116 +
 gcc/config/i386/i386.h |   2 +-
 2 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 601e3ef47f6..30e84dd5303 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12960,6 +12960,14 @@ ix86_compute_frame_layout (void)
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
   gcc_assert (preferred_alignment <= stack_alignment_needed);
 
+  /* The only ABI saving SSE regs should be 64-bit ms_abi.  */
+  gcc_assert (TARGET_64BIT || !frame->nsseregs);
+  if (TARGET_64BIT && m->call_ms2sysv)
+{
+  gcc_assert (stack_alignment_needed >= 16);
+  gcc_assert (!frame->nsseregs);
+}
+
   /* For SEH we have to limit the amount of code movement into the prologue.
  At present we do this via a BLOCKAGE, at which point there's very little
  scheduling that can be done, which means that there's very little point
@@ -13022,54 +13030,88 @@ ix86_compute_frame_layout (void)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
-  /* When re-aligning the stack frame, but not saving SSE registers, this
- is the offset we want adjust the stack pointer to.  */
-  frame->stack_realign_allocate_offset = offset;
+  /* Calculate the size of the va-arg area (not including padding, if any).  */
+  frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  Use sp_valid_at
- to determine if the stack pointer is valid for a given offset and
- fp_valid_at for the frame pointer.  */
   if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
-  frame->stack_realign_offset = offset;
-
-  if (TARGET_64BIT && m->call_ms2sysv)
 {
-  gcc_assert (stack_alignment_needed >= 16);
-  gcc_assert (!frame->nsseregs);
+  /* We may need a 16-byte aligned stack for the remainder of the
+register save area, but the stack frame for the local function
+may require a greater alignment if using AVX/2/512.  In order
+to avoid wasting space, we first calculate the space needed for
+the rest of the register saves, add that to the stack pointer,
+and then realign the stack to the boundary of the start of the
+frame for the local function.  */
+  HOST_WIDE_INT space_needed = 0;
+  HOST_WIDE_INT sse_reg_space_needed = 0;
 
-  m->call_ms2sysv_pad_in = !!(offset & UNITS_PER_WORD);
-  offset += xlogue_layout::get_instance ().get_stack_space_used ();
-}
+  if (TARGET_64BIT)
+   {
+ if (m->call_ms2sysv)
+   {
+ m->call_ms2sysv_pad_in = 0;
+ space_needed = xlogue_layout::get_instance 
().get_stack_space_used ();
+   }
 
-  /* Align and set SSE register save area.  */
-  else if (frame->nsseregs)
-{
-  /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack.  However, many programs violate
-the ABI, and Wine64 forces stack realignment to compensate.
+ else if (frame->nsseregs)
+   /* The only ABI that has saved SSE registers (Win64) also has a
+  16-byte aligned default stack.  However, many programs violate
+  the ABI, and Wine64 forces stack realignment to compensate.  */
+   space_needed = frame->nsseregs * 16;
+
+ sse_reg_space_needed = space_needed = ROUND_UP (space_needed, 16);
+
+ /* 64-bit frame->va_arg_size should always be a multiple of 16, but
+rounding to be pedantic.  */
+ space_needed = ROUND_UP (space_needed + frame->va_arg_size, 16);
+   }
+  else
+   space_needed = frame->va_arg_size;
+
+  /* Record the allocation size required prior to the realignment AND.  */
+  frame->stack_realign_allocate = space_needed;
+
+  /* The re-aligned stack starts at frame->stack_realign_offset.  Values
+before this point are not directly comparable with values below
+this point.  Use sp_valid_at to determine if the stack pointer is
+valid for a given offse

[PATCH 3/4] [i386] Modify SP realignment in ix86_expand_prologue, et. al.

2017-08-22 Thread Daniel Santos
My first version of this patch inited m->fs.sp_realigned_fp_last with
the value of m->fs.sp_offset prior to performing the stack realignment.
I had forgotten, however, that when we're saving GP regs using MOV that
we delay SP modification as long as possible so that the value of
m->fs.sp_offset at this point is correct when we've used push, but
incorrect when we've used mov.

This has been tested on both x86_64-pc-linux-gnu{,x32} with
--target_board=unix/\{-m64,-mx32,-m32\}.

Original patch description:

The SP allocation calculation is now done in ix86_compute_frame_layout
and the result stored in ix86_frame::stack_realign_allocate.  This
change also updates comments for choose_baseaddr to clarify that the
alignment returned doesn't necessarily reflect the alignment of the
cfa_offset passed (e.g., you can pass cfa_offset 48 and it can return an
alignment of 64 bytes).

Since the alignment required may be more than 16-bytes, we cannot defer
SP allocation to ix86_emit_outlined_ms2sysv_save (when it's enabled), so
that function needs to be updated as well.

Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c | 58 --
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 30e84dd5303..dbc771da8aa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13359,10 +13359,13 @@ choose_basereg (HOST_WIDE_INT cfa_offset, rtx 
&base_reg,
 }
 
 /* Return an RTX that points to CFA_OFFSET within the stack frame and
-   the alignment of address.  If align is non-null, it should point to
+   the alignment of address.  If ALIGN is non-null, it should point to
an alignment value (in bits) that is preferred or zero and will
-   recieve the alignment of the base register that was selected.  The
-   valid base registers are taken from CFUN->MACHINE->FS.  */
+   recieve the alignment of the base register that was selected,
+   irrespective of rather or not CFA_OFFSET is a multiple of that
+   alignment value.
+
+   The valid base registers are taken from CFUN->MACHINE->FS.  */
 
 static rtx
 choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
@@ -14445,35 +14448,35 @@ ix86_emit_outlined_ms2sysv_save (const struct 
ix86_frame &frame)
   rtx sym, addr;
   rtx rax = gen_rtx_REG (word_mode, AX_REG);
   const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
-  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset () + m->fs.sp_offset;
-  HOST_WIDE_INT stack_alloc_size = frame.stack_pointer_offset - 
m->fs.sp_offset;
-  HOST_WIDE_INT stack_align_off_in = xlogue.get_stack_align_off_in ();
+  HOST_WIDE_INT allocate = frame.stack_pointer_offset - m->fs.sp_offset;
+
+  /* AL should only be live with sysv_abi.  */
+  gcc_assert (!ix86_eax_live_at_start_p ());
+
+  /* Setup RAX as the stub's base pointer.  We use stack_realign_offset rather
+ we've actually realigned the stack or not.  */
+  align = GET_MODE_ALIGNMENT (V4SFmode);
+  addr = choose_baseaddr (frame.stack_realign_offset
+ + xlogue.get_stub_ptr_offset (), &align);
+  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
+  emit_insn (gen_rtx_SET (rax, addr));
 
-  /* Verify that the incoming stack 16-byte alignment offset matches the
- layout we're using.  */
-  gcc_assert (stack_align_off_in == (m->fs.sp_offset & UNITS_PER_WORD));
+  /* Allocate stack if not already done.  */
+  if (allocate > 0)
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-allocate), -1, false);
 
   /* Get the stub symbol.  */
   sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
  : XLOGUE_STUB_SAVE);
   RTVEC_ELT (v, vi++) = gen_rtx_USE (VOIDmode, sym);
 
-  /* Setup RAX as the stub's base pointer.  */
-  align = GET_MODE_ALIGNMENT (V4SFmode);
-  addr = choose_baseaddr (rax_offset, &align);
-  gcc_assert (align >= GET_MODE_ALIGNMENT (V4SFmode));
-  insn = emit_insn (gen_rtx_SET (rax, addr));
-
-  gcc_assert (stack_alloc_size >= xlogue.get_stack_space_used ());
-  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-stack_alloc_size), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
   for (i = 0; i < ncregs; ++i)
 {
   const xlogue_layout::reginfo &r = xlogue.get_reginfo (i);
   rtx reg = gen_rtx_REG ((SSE_REGNO_P (r.regno) ? V4SFmode : word_mode),
 r.regno);
-  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);;
+  RTVEC_ELT (v, vi++) = gen_frame_store (reg, rax, -r.offset);
 }
 
   gcc_assert (vi == (unsigned)GET_NUM_ELEM (v));
@@ -14728,14 +14731,15 @@ ix86_expand_prologue (void)
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
   /* Record last valid frame pointer offset.  */
-  m->fs.sp_realigned_fp_last = m->fs.sp_of

[PATCH 4/4] [i386, testsuite] Add tests, fix bug in check_avx2_hw_available

2017-08-22 Thread Daniel Santos
Changes to lib/target-supports.exp and documentation:
* Add effective-targets avx512f and avx512f_runtime (needed for new
  tests).
* Corrects bug in check_avx2_hw_available.
* Adds documentation for effective-targets avx2, avx2_runtime (both
  missing), avx512f and avx512f_runtime.

The following tests are added.  The testcase in the PR is used as a base
and relevant variants are added to test other factors affected by the
patch set.

pr80969-1.c   Base test case.
pr80969-2.c   With ms to sysv call.
pr80969-2a.c  With ms to sysv call using stubs.
pr80969-3.c   With alloca (for DRAP test).
pr80969-4.c   With va_args passed via va_list
pr80969-4a.c  With va_args passed via va_list and ms to sysv call.
pr80969-4b.c  With va_args passed via va_list and ms to sysv call using
  stubs.
pr80969-4.h   Common header for pr80969-4*.c.

Signed-off-by: Daniel Santos 
---
 gcc/doc/sourcebuild.texi   |  12 +++
 gcc/testsuite/gcc.target/i386/pr80969-1.c  |  16 
 gcc/testsuite/gcc.target/i386/pr80969-2.c  |  27 +++
 gcc/testsuite/gcc.target/i386/pr80969-2a.c |   8 ++
 gcc/testsuite/gcc.target/i386/pr80969-3.c  |  32 
 gcc/testsuite/gcc.target/i386/pr80969-4.c  |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4.h  | 119 +
 gcc/testsuite/gcc.target/i386/pr80969-4a.c |   9 +++
 gcc/testsuite/gcc.target/i386/pr80969-4b.c |   9 +++
 gcc/testsuite/lib/target-supports.exp  |  66 
 10 files changed, 307 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80969-4b.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e6313dc031e..0bf4d6afeb6 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1855,6 +1855,18 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx2
+Target supports compiling @code{avx2} instructions.
+
+@item avx2_runtime
+Target supports the execution of @code{avx2} instructions.
+
+@item avx512f
+Target supports compiling @code{avx512f} instructions.
+
+@item avx512f_runtime
+Target supports the execution of @code{avx512f} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-1.c 
b/gcc/testsuite/gcc.target/i386/pr80969-1.c
new file mode 100644
index 000..e0520b45c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { ! x32 } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+int a[56];
+int b;
+int main (int argc, char *argv[]) {
+  int c;
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2.c 
b/gcc/testsuite/gcc.target/i386/pr80969-2.c
new file mode 100644
index 000..f885dee6512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func.  */
+
+int a[56];
+int b;
+
+static void __attribute__((sysv_abi)) sysv ()
+{
+}
+
+void __attribute__((sysv_abi)) (*volatile const sysv_noinfo)() = sysv;
+
+int main (int argc, char *argv[]) {
+  int c;
+  sysv_noinfo ();
+  for (; b; b++) {
+c = b;
+if (b & 1)
+  c = 2;
+a[b] = c;
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-2a.c 
b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
new file mode 100644
index 000..baea0796d24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-2a.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target { lp64 && avx512f_runtime } } } */
+/* { dg-do compile { target { lp64 && { ! avx512f_runtime } } } } */
+/* { dg-options "-Ofast -mabi=ms -mavx512f -mcall-ms2sysv-xlogues" } */
+/* { dg-require-effective-target avx512f } */
+
+/* Test when calling a sysv func using save/restore stubs.  */
+
+#include "pr80969-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr80969-3.c 
b/gcc/testsuite/gcc.target/i386/pr80969-3.c
new file mode 100644
index 000..d902a771cc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80969-3.c
@@ -0,0 +1,32 @@
+/* { dg-do run { target { { ! x32 } && avx512f_runtime } } } */
+/* { dg-do compile { target { { ! x32 } && { ! av

Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread JonY
On 08/22/2017 06:32 AM, Uros Bizjak wrote:
> On Tue, Aug 22, 2017 at 4:10 AM, Daniel Santos  
> wrote:
> 
>> This is a problem that occured because of this code in
>> ix86_option_override_internal:
>>
>>   if (!opts_set->x_ix86_abi)
>> opts->x_ix86_abi = DEFAULT_ABI;
>>
>> I tested this along with my other patches.  OK for trunk?
>>
>> * config/i386/i386-opts.h (enum calling_abi): Modify so that no legal
>> values are equivalent to zero.
> 
> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
> UNKNOWN_ABI.
> 
> Then change the above condition to
> 
> if (opts_set->x_ix86_abi == UNKNOWN_ABI)
> 
> We can't just init -mabi to DEFAULT_ABI, sinde this is selected at
> runtime. Maybe a comment should be added for UNKNOWN_ABI, that it is
> overriden in ix86_option_override_internal.
> 
> Uros.

This sounds sensible if there was ever a time some new ABI is added for x86.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 5/3] C++ bits to improve detection of attribute conflicts (PR 81544)

2017-08-22 Thread Jason Merrill
The C and C++ front ends already have a diagnose_mismatched_attributes 
function, which seems like the natural place to add more conflict checking.


Jason


[libcc1] Improve detection of triplet on compiler names

2017-08-22 Thread Sergio Durigan Junior
Hi there,

This is a series of two patches, one for GDB and one for GCC, which aims
to improve the detection and handling of triplets present on compiler
names.  The motivation for this series was mostly the fact that GDB's
"compile" command is broken on Debian unstable, as can be seen here:

  

The reason for the failure is the fact that Debian compiles GCC using
the --program-{prefix,suffix} options from configure in order to name
the compiler using the full triplet (i.e., Debian's GCC is not merely
named "gcc", but e.g. "x86_64-linux-gnu-gcc-7"), which end up naming the
C_COMPILER_NAME and CP_COMPILER_NAME defines with the specified prefix
and suffix.  Therefore, the regexp being used to match the compiler name
is wrong because it doesn't take into account the fact that the defines
may already contain the triplets.

The GCC patch improves the libcc1::compiler_triplet_regexp::find and
libcp1::compiler_triplet_regexp::find methods by first trying to match
the triplet in the compiler name and correctly discarding the triplet
part of the regexp if the matching succeeds.  I've had to do a few
modifications on the way the regexp's are built, but I'll explain them
in the patch itself.

The GDB patch is very simple: it adds the trailing "-" in the triplet
regexp.  Therefore, we will have a regexp that truly matches the full
triplet (e.g., "^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-") instead of one
that leaves the trailing "-" match to libcc1.

I've tested this patch both on my Fedora and my Debian machines, and
both now work as expected, independently of the presence of the triplet
string in the compiler name.  I am sorry about the cross-post, but these
patches are really dependent on one another.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


[PATCH 2/2] [GDB] Add trailing dash on triplet regexp

2017-08-22 Thread Sergio Durigan Junior
This is the GDB patch.

It is very simple, and just a necessary adjustment needed because of the
modifications made in the "make_regexp" functions on libcc1.

Now, GDB will provide a full regexp for triplet names, including the
trailing dash ("-").  Therefore, we will have a regexp that truly
matches the full triplet (e.g., "^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-")
instead of one that leaves the trailing "-" match to libcc1.

OK to apply?

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

gdb/ChangeLog:
2017-08-23  Sergio Durigan Junior  

* compile/compile.c (compile_to_object): Add trailing dash on
triplet regexp.

diff --git a/gdb/compile/compile.c b/gdb/compile/compile.c
index 91e084f89f..0ce77a8b95 100644
--- a/gdb/compile/compile.c
+++ b/gdb/compile/compile.c
@@ -509,7 +509,7 @@ compile_to_object (struct command_line *cmd, const char 
*cmd_string,
   arch_rx = gdbarch_gnu_triplet_regexp (gdbarch);
 
   /* Allow triplets with or without vendor set.  */
-  triplet_rx = concat (arch_rx, "(-[^-]*)?-", os_rx, (char *) NULL);
+  triplet_rx = concat (arch_rx, "(-[^-]*)?-", os_rx, "-", (char *) NULL);
   make_cleanup (xfree, triplet_rx);
 
   /* Set compiler command-line arguments.  */


[PATCH 1/2] [GCC] Improve regexp handling on libc[cp]1::compiler_triplet_regexp::find

2017-08-22 Thread Sergio Durigan Junior
This is the GCC patch.

It adjusts the behaviour of make_regexp in order to not add the "-" at
the end of the triplet regexp.  This is now GDB's responsibility.

It also improves the 'find' methods of both
libcc1::compiler_triplet_regexp and libcp1::compiler_triplet_regexp
classes by implementing a detection of a triplet string in the compiler
name.  If it finds the triplet there, then the pristine compiler name is
used to match a compiler in the system, without using the initial
triplet regexp provided by GDB.  If the compiler name doesn't start with
a triplet, then the old behaviour is maintained and we'll still search
for compilers using the triplet regexp.

With this patch it is possible to include triplets in compiler names
(via the --program-prefix configure option), like Debian does, for
example.  I chose not to worry too much about possible suffixes (which
can be specified via --program-suffix) because even with them it is
still possible to find the compiler in the system.  It is important to
note that Debian uses suffixes as well.

It is still perfectly possible to find compilers without
prefixes/suffixes, like "gcc" (this is how Fedora names its GCC, by the
way).

OK to apply?

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/

libcc1/ChangeLog:
2017-08-23  Sergio Durigan Junior  

* libcc1.cc (make_regexp): Don't add dash after triplet regexp.
Handle case when COMPILER is empty.
(libcc1::compiler_triplet_regexp::find): Detect when
C_COMPILER_NAME already contains a triplet.
* libcp1.cc (make_regexp): Don't add dash after triplet regexp.
Handle case when COMPILER is empty.
(libcp1::compiler_triplet_regexp::find): Detect when
CP_COMPILER_NAME already contains a triplet.

diff --git a/libcc1/libcc1.cc b/libcc1/libcc1.cc
index 0ef6c112dae..41a6e3cca77 100644
--- a/libcc1/libcc1.cc
+++ b/libcc1/libcc1.cc
@@ -336,31 +336,34 @@ make_regexp (const char *triplet_regexp, const char 
*compiler)
 {
   std::stringstream buf;
 
-  buf << "^" << triplet_regexp << "-";
+  buf << "^" << triplet_regexp;
 
-  // Quote the compiler name in case it has something funny in it.
-  for (const char *p = compiler; *p; ++p)
+  if (compiler[0] != '\0')
 {
-  switch (*p)
+  // Quote the compiler name in case it has something funny in it.
+  for (const char *p = compiler; *p; ++p)
{
-   case '.':
-   case '^':
-   case '$':
-   case '*':
-   case '+':
-   case '?':
-   case '(':
-   case ')':
-   case '[':
-   case '{':
-   case '\\':
-   case '|':
- buf << '\\';
- break;
+ switch (*p)
+   {
+   case '.':
+   case '^':
+   case '$':
+   case '*':
+   case '+':
+   case '?':
+   case '(':
+   case ')':
+   case '[':
+   case '{':
+   case '\\':
+   case '|':
+ buf << '\\';
+ break;
+   }
+ buf << *p;
}
-  buf << *p;
+  buf << "$";
 }
-  buf << "$";
 
   return buf.str ();
 }
@@ -382,12 +385,30 @@ libcc1::compiler::find (std::string &compiler 
ATTRIBUTE_UNUSED) const
 char *
 libcc1::compiler_triplet_regexp::find (std::string &compiler) const
 {
-  std::string rx = make_regexp (triplet_regexp_.c_str (), C_COMPILER_NAME);
+  regex_t triplet;
+  bool c_compiler_has_triplet = false;
+
+  // Some distros, like Debian, choose to use a prefix and a suffix
+  // in their C_COMPILER_NAME, so we try to check if that's the case
+  // here and adjust the regexp's accordingly.
+  std::string triplet_rx = make_regexp (triplet_regexp_.c_str (), "");
+  int code = regcomp (&triplet, triplet_rx.c_str (), REG_EXTENDED | REG_NOSUB);
+
+  if (code == 0)
+{
+  if (regexec (&triplet, C_COMPILER_NAME, 0, NULL, 0) == 0)
+   c_compiler_has_triplet = true;
+
+  regfree (&triplet);
+}
+
+  std::string rx = make_regexp (c_compiler_has_triplet
+   ? "" : triplet_regexp_.c_str (),
+   C_COMPILER_NAME);
   if (self_->verbose)
 fprintf (stderr, _("searching for compiler matching regex %s\n"),
 rx.c_str());
-  regex_t triplet;
-  int code = regcomp (&triplet, rx.c_str (), REG_EXTENDED | REG_NOSUB);
+  code = regcomp (&triplet, rx.c_str (), REG_EXTENDED | REG_NOSUB);
   if (code != 0)
 {
   size_t len = regerror (code, &triplet, NULL, 0);
diff --git a/libcc1/libcp1.cc b/libcc1/libcp1.cc
index bbd8488af93..5bac6e4e37c 100644
--- a/libcc1/libcp1.cc
+++ b/libcc1/libcp1.cc
@@ -360,31 +360,34 @@ make_regexp (const char *triplet_regexp, const char 
*compiler)
 {
   std::stringstream buf;
 
-  buf << "^" << triplet_regexp << "-";
+  buf << "^" << triplet_regexp;
 
-  // Quote the compiler name in case it has something funny in it.
-  for (const char *p = compiler; *p

Re: [PATCH] RTEMS: Add GCC Runtime Library Exception

2017-08-22 Thread Sebastian Huber

Hello Jeff,

On 03/08/17 07:11, Sebastian Huber wrote:

On 02/08/17 21:30, Jeff Law wrote:


On 07/24/2017 12:03 AM, Sebastian Huber wrote:

gcc/

PR libgcc/61152
* aarch64/rtems.h: Add GCC Runtime Library Exception. Format
changes.
* arm/rtems.h: Likewise.
* bfin/rtems.h: Likewise.
* i386/rtemself.h: Likewise.
* lm32/rtems.h: Likewise.
* m32c/rtems.h: Likewise.
* m68k/rtemself.h: Likewise.
* microblaze/rtems.h: Likewise.
* mips/rtems.h: Likewise.
* moxie/rtems.h: Likewise.
* nios2/rtems.h: Likewise.
* powerpcspe/rtems.h: Likewise.
* rs6000/rtems.h: Likewise.
* rtems.h: Likewise.
* sh/rtems.h: Likewise.
* sh/rtemself.h: Likewise.
* sparc/rtemself.h: Likewise.

This seems horribly wrong.  Did anyone ack this change?  I'm fully
supportive of target maintainers taking care of their areas, but
licensing stuff probably should get explicitly ack'd.

I just reviewed all the rtems config files and I don't see anything in
any of them that deserves a runtime exception with the possible
exception of rs6000/rtems.h.

Seriously.  Redefining the CPP builtins?  LINK_SPEC?  #undefs? Those
are not things we should be granting an exception for.

The one that looks marginal to me would be rs6000/rtems.h and its
definition of CRT_CALL_STATIC_FUNCTION.


I asked on the mailing list:

https://gcc.gnu.org/ml/gcc/2017-07/msg00171.html

Jakub Jelinek said that for header files included by libgcc it is 
important whether they have the runtime exception or not:


https://gcc.gnu.org/ml/gcc/2017-07/msg00176.html

There is also this PR61152 from 2014

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61152

which adds the runtime exception to a couple of gcc/ subdirectory 
files (including some RTEMS files). So, I had no explicit acknowledge, 
but my impression was that I did simply fix some left over files.


If you don't add the runtime exception to files included by libgcc, 
then the user of GCC must check that these files contain no content 
that deserves a copyright. Is this really user friendly?




What should I do now? It would be nice to have a general guideline on 
how to deal with header files used by libgcc.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Daniel Santos
On 08/22/2017 03:00 PM, Uros Bizjak wrote:
> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos  
> wrote:
>>> Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
>>> UNKNOWN_ABI.
>> It would seem to me that UNSPECIFIED_ABI would be a better value name.
>>
>> Also, I don't really understand what opts_set and opts are, except that I had
>> guessed opts_set is what the user asked for (or didn't ask for) and opts is
>> what we're going to actually use.  Am I close?
> Yes. opts_set is a flag that user specified an option at the command line.
>
> However, I fail to see what is the problem. If nothing was specified,
> then opts->x_ix86_abi is set to DEFAULT_ABI.

That is not what is happening.  If -mabi=sysv is specified, then the
test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is
zero.  When that is evaluated as true, then the abi is set to
DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line
option.

> Probably we don't need
> Init(SYSV_ABI) in mabi= declaration at all.

I'm guessing that if we don't specify an Init() option then it will
default to zero?  We just need a valid way to differentiate when
-mabi=sysv has been passed from when nothing has been passed.

Daniel

>
> Uros.
>
>> I'm re-running tests, so if they pass is this OK?
>>
>> Thanks,
>> Daniel
>> ---
>>  gcc/config/i386/i386-opts.h | 5 +++--
>>  gcc/config/i386/i386.c  | 3 +--
>>  gcc/config/i386/i386.opt| 2 +-
>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
>> index 542cd0f3d67..a1d1552a3c6 100644
>> --- a/gcc/config/i386/i386-opts.h
>> +++ b/gcc/config/i386/i386-opts.h
>> @@ -44,8 +44,9 @@ last_alg
>>  /* Available call abi.  */
>>  enum calling_abi
>>  {
>> -  SYSV_ABI = 0,
>> -  MS_ABI = 1
>> +  UNSPECIFIED_ABI = 0,
>> +  SYSV_ABI = 1,
>> +  MS_ABI = 2
>>  };
>>
>>  enum fpmath_unit
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 650bcbc65ae..c08ad55fcd9 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -5681,12 +5681,11 @@ ix86_option_override_internal (bool main_args_p,
>>  opts->x_ix86_pmode = TARGET_LP64_P (opts->x_ix86_isa_flags)
>>  ? PMODE_DI : PMODE_SI;
>>
>> -  if (!opts_set->x_ix86_abi)
>> +  if (opts_set->x_ix86_abi == UNSPECIFIED_ABI)
>>  opts->x_ix86_abi = DEFAULT_ABI;
>>
>>if (opts->x_ix86_abi == MS_ABI && TARGET_X32_P (opts->x_ix86_isa_flags))
>>  error ("-mabi=ms not supported with X32 ABI");
>> -  gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>>
>>/* For targets using ms ABI enable ms-extensions, if not
>>   explicit turned off.  For non-ms ABI we turn off this
>> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
>> index cd564315f04..f7b9f9707f7 100644
>> --- a/gcc/config/i386/i386.opt
>> +++ b/gcc/config/i386/i386.opt
>> @@ -525,7 +525,7 @@ Target Report Mask(IAMCU)
>>  Generate code that conforms to Intel MCU psABI.
>>
>>  mabi=
>> -Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) Init(SYSV_ABI)
>> +Target RejectNegative Joined Var(ix86_abi) Enum(calling_abi) 
>> Init(UNSPECIFIED_ABI)
>>  Generate code that conforms to the given ABI.
>>
>>  Enum
>> --
>> 2.13.3
>>



Re: [PATCH] [i386] PR 81850 Don't ignore -mabi=sysv on Cygwin/MinGW

2017-08-22 Thread Uros Bizjak
On Wed, Aug 23, 2017 at 7:23 AM, Daniel Santos  wrote:
> On 08/22/2017 03:00 PM, Uros Bizjak wrote:
>> On Tue, Aug 22, 2017 at 9:47 PM, Daniel Santos  
>> wrote:
 Please add UNKNOWN_ABI to the enum and initialize -mabi in i386.opt to
 UNKNOWN_ABI.
>>> It would seem to me that UNSPECIFIED_ABI would be a better value name.
>>>
>>> Also, I don't really understand what opts_set and opts are, except that I 
>>> had
>>> guessed opts_set is what the user asked for (or didn't ask for) and opts is
>>> what we're going to actually use.  Am I close?
>> Yes. opts_set is a flag that user specified an option at the command line.
>>
>> However, I fail to see what is the problem. If nothing was specified,
>> then opts->x_ix86_abi is set to DEFAULT_ABI.
>
> That is not what is happening.  If -mabi=sysv is specified, then the
> test (!opts_set->x_ix86_abi) is true since the value of SYSV_ABI is
> zero.  When that is evaluated as true, then the abi is set to
> DEFAULT_ABI, which on Windows is MS_ABI, thus ignoring the command line
> option.

Let's use the following patch:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3c82ae64f4f2..f8590f663285 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5682,7 +5682,7 @@ ix86_option_override_internal (bool main_args_p,
 ? PMODE_DI : PMODE_SI;

   if (!opts_set->x_ix86_abi)
-opts->x_ix86_abi = DEFAULT_ABI;
+printf ("Using default ABI\n"), opts->x_ix86_abi = DEFAULT_ABI;

   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
--cut here--

$ ./cc1 -O2 -quiet hello.c
Using default ABI
$ ./cc1 -O2 -mabi=sysv -quiet hello.c
$
$ ./cc1 -O2 -mabi=sysv -quiet hello.c
$

Again, opts_set is set to true when the option is specified on the
command line, it has nothing to do with the value of the option.

> I'm guessing that if we don't specify an Init() option then it will
> default to zero?  We just need a valid way to differentiate when
> -mabi=sysv has been passed from when nothing has been passed.

Yes, it defaults to zero, but since we live in c++ world nowadays, we
can't initialize enum with integer zero...

Uros.