Re: [ipa-vrp] ice in set_value_range

2016-11-10 Thread Andreas Schwab
On Nov 09 2016, Andrew Pinski  wrote:

> Either this patch or the patch for "Handle unary pass-through jump
> functions for ipa-vrp" caused a bootstrap failure on
> aarch64-linux-gnu.
> Bootstrap comparison failure!
> gcc/go/types.o differs
> gcc/fortran/class.o differs
> gcc/tree-ssa-live.o differs
> gcc/data-streamer-out.o differs
> gcc/ira-build.o differs
> gcc/hsa-gen.o differs
> gcc/hsa-brig.o differs
> gcc/omp-low.o differs
> gcc/lto-streamer-in.o differs
> gcc/real.o differs
> gcc/final.o differs
> gcc/df-core.o differs

Similar failure on ia64.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-10 Thread Pitchumani Sivanupandi

On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:

On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:
I have updated patch to include the flash size as well. Took that 
info from
device headers (it was fed into crt's device information note 
section also).


The new option would render -mn-flash superfluous, but we should 
keep it for

backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /*



 /* Classic, > 8K, <= 64K.  */
-AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, 
NULL,0x0060, 0x0, 1)
-AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, 
"__AVR_AT43USB355__",0x0060, 0x0, 1)
-AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, 
"__AVR_AT76C711__",  0x0060, 0x0, 1)
+AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, 
NULL,0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, 
"__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, 
"__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
+AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE, 
"__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)

 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, 
NULL,0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, 
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE, 
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, 
NULL,0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, 
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)

 /* Classic + MOVW + JMP/CALL.  */


If at43usb320 is in the wrong multilib, then this should be handled as 
separate issue / patch together with its own PR.  Sorry for the 
confusion.  I just noticed that some fields don't match...


It is not even clear to me from the data sheet if avr3 is the correct 
multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it 
also has MUL) as there is no reference to the exact instruction set -- 
Atmochip will know.


Moreover, such a change should be sync'ed with avr-libc as all 
multilib stuff is hand-wired there: no use of --print-foo meta 
information retrieval by avr-libc :-((


I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 
for this one.


Thats better. I've attached the updated patch. If OK, could someone 
commit please?


I'll try if I could find some more info for AT43USB320.

Regards,
Pitchumani

diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h
index 42eaee5..e0961d4 100644
--- a/gcc/config/avr/avr-arch.h
+++ b/gcc/config/avr/avr-arch.h
@@ -122,6 +122,9 @@ typedef struct
 
   /* Number of 64k segments in the flash.  */
   int n_flash;
+
+  /* Flash size in bytes.  */
+  int flash_size;
 } avr_mcu_t;
 
 /* AVR device specific features.
diff --git a/gcc/config/avr/avr-devices.c b/gcc/config/avr/avr-devices.c
index 7d13ba4..cef3b9a 100644
--- a/gcc/config/avr/avr-devices.c
+++ b/gcc/config/avr/avr-devices.c
@@ -111,12 +111,12 @@ avr_texinfo[] =
 const avr_mcu_t
 avr_mcu_types[] =
 {
-#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH)\
-  { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH },
+#define AVR_MCU(NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE)\
+  { NAME, ARCH, DEV_ATTRIBUTE, MACRO, DATA_SEC, TEXT_SEC, N_FLASH, FLASH_SIZE },
 #include "avr-mcus.def"
 #undef AVR_MCU
 /* End of list.  */
-  { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0 }
+  { NULL, ARCH_UNKNOWN, AVR_ISA_NONE, NULL, 0, 0, 0, 0 }
 };
 
 
diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..6e51c2e 100644
--- a/gcc/config/avr/avr-mcus.def
+++ b/gcc/config/avr/avr-mcus.def
@@ -62,295 +62,297 @@
N_FLASH   Number of 64 KiB flash segments, rounded up.  The default
  value for -mn-flash=.
 
+   FLASH_SIZEFlash size in bytes.
+
"avr2" must be first for the "0" default to work as intended.  */
 
 /* Classic, <= 8K.  */
-AVR_MCU ("avr2", ARCH_AVR2, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 6)
-AVR_MCU ("at90s2313",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2313__", 0x0060, 0x0, 1)
-AVR_MCU ("at90s2323",ARCH_AVR2, AVR_SHORT_SP, "__AVR_AT90S2323__", 0x0060, 0x0, 1)
-AVR_MCU ("at90s

Re: [PATCH TEST]Drop xfail for gcc.dg/vect/vect-cond-2.c

2016-11-10 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 9:46 PM, Christophe Lyon
 wrote:
> Hi Bin
>
> On 8 November 2016 at 13:37, Bin Cheng  wrote:
>> Hi,
>> Test gcc.dg/vect/vect-cond-2.c can be vectorized by GCC now, this patch 
>> drops the xfail.
>>
>> Thanks,
>> bin
>>
>> gcc/testsuite/ChangeLog
>> 2016-11-04  Bin Cheng  
>>
>> * gcc.dg/vect/vect-cond-2.c: Drop xfail.
>
> But the test is now fail on armeb
Thanks for reporting.  Also on targets don't support vect_max_reduc, I
will refine it.

Thanks,
bin
>
> Christophe


Re: [PATCH] Fix PR78189

2016-11-10 Thread Richard Biener
On Wed, 9 Nov 2016, Christophe Lyon wrote:

> On 9 November 2016 at 09:36, Bin.Cheng  wrote:
> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
> >>
> >>> Hi Richard,
> >>>
> >>>
> >>> On 7 November 2016 at 09:01, Richard Biener  wrote:
> >>> >
> >>> > The following fixes an oversight when computing alignment in the
> >>> > vectorizer.
> >>> >
> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >>> >
> >>> > Richard.
> >>> >
> >>> > 2016-11-07  Richard Biener  
> >>> >
> >>> > PR tree-optimization/78189
> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
> >>> > alignment computation.
> >>> >
> >>> > * g++.dg/torture/pr78189.C: New testcase.
> >>> >
> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
> >>> > ===
> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
> >>> > @@ -0,0 +1,41 @@
> >>> > +/* { dg-do run } */
> >>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" 
> >>> > } */
> >>> > +
> >>> > +#include 
> >>> > +
> >>> > +struct A
> >>> > +{
> >>> > +  void * a;
> >>> > +  void * b;
> >>> > +};
> >>> > +
> >>> > +struct alignas(16) B
> >>> > +{
> >>> > +  void * pad;
> >>> > +  void * misaligned;
> >>> > +  void * pad2;
> >>> > +
> >>> > +  A a;
> >>> > +
> >>> > +  void Null();
> >>> > +};
> >>> > +
> >>> > +void B::Null()
> >>> > +{
> >>> > +  a.a = nullptr;
> >>> > +  a.b = nullptr;
> >>> > +}
> >>> > +
> >>> > +void __attribute__((noinline,noclone))
> >>> > +NullB(void * misalignedPtr)
> >>> > +{
> >>> > +  B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) 
> >>> > - offsetof(B, misaligned));
> >>> > +  b->Null();
> >>> > +}
> >>> > +
> >>> > +int main()
> >>> > +{
> >>> > +  B b;
> >>> > +  NullB(&b.misaligned);
> >>> > +  return 0;
> >>> > +}
> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >>> > index 9346cfe..b03cb1e 100644
> >>> > --- gcc/tree-vect-data-refs.c
> >>> > +++ gcc/tree-vect-data-refs.c
> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
> >>> > data_reference *dr)
> >>> >base = ref;
> >>> >while (handled_component_p (base))
> >>> >  base = TREE_OPERAND (base, 0);
> >>> > +  unsigned int base_alignment;
> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
> >>> > + adjust things to make base_alignment valid as the alignment of
> >>> > + DR_BASE_ADDRESS.  */
> >>> >if (TREE_CODE (base) == MEM_REF)
> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
> >>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 
> >>> > 0));
> >>> > -  unsigned int base_alignment = get_object_alignment (base);
> >>> > +{
> >>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
> >>> > BITS_PER_UNIT;
> >>> > +  base_bitpos &= (base_alignment - 1);
> >>> > +}
> >>> > +  if (base_bitpos != 0)
> >>> > +base_alignment = base_bitpos & -base_bitpos;
> >>> > +  /* Also look at the alignment of the base address DR analysis
> >>> > + computed.  */
> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
> >>> > +  if (base_addr_alignment > base_alignment)
> >>> > +base_alignment = base_addr_alignment;
> >>> >
> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
> >>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
> >>>
> >>> Since you committed this patch (r241892), I'm seeing execution failures:
> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
> >>>   gcc.dg/vect/pr40074.c execution test
> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
> >>> --with-fpu=neon-fp16
> >>> (using qemu as simulator)
> >>
> >> The difference is that we now vectorize the testcase with versioning
> >> for alignment (but it should never execute the vectorized variant).
> >> I need arm peoples help to understand what is wrong.
> > Hi All,
> > I will look at it.
> >
> 
> Hi,
> 
> This is causing new regressions on armeb:
>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 2 loops" 1
>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
> "vectorized 2 loops" 1

It's actually an improvement as armeb can't do unaligned loads.
Before the patch we versioned both loops for alignment (they
can't be possibly both aligned by luck).  After the patch we
align 'arr' and thus the first loop becomes known-aligned and
vectorized while the 2nd one is now known unaligned (and thus
not vectorized because we can't do unalign

Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

2016-11-10 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html

Andrew, do you have any objections to this version?
Thanks,
Kyrill

On 01/11/16 15:21, Kyrill Tkachov wrote:


On 31/10/16 11:54, Kyrill Tkachov wrote:


On 24/10/16 17:15, Andrew Pinski wrote:

On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
 wrote:

Hi all,

When storing a 64-bit immediate that has equal bottom and top halves we
currently
synthesize the repeating 32-bit pattern twice and perform a single X-store.
With this patch we synthesize the 32-bit pattern once into a W register and
store
that twice using an STP. This reduces codesize bloat from synthesising the
same
constant multiple times at the expense of converting a store to a
store-pair.
It will only trigger if we can save two or more instructions, so it will
only transform:
 mov x1, 49370
 movkx1, 0xc0da, lsl 32
 str x1, [x0]

into:

 mov w1, 49370
 stp w1, w1, [x0]

when optimising for -Os, whereas it will always transform a 4-insn synthesis
sequence into a two-insn sequence + STP (see comments in the patch).

This patch triggers already but will trigger more with the store merging
pass
that I'm working on since that will generate more of these repeating 64-bit
constants.
This helps improve codegen on 456.hmmer where store merging can sometimes
create very
complex repeating constants and target-specific expand needs to break them
down.


Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
might cause an ICE with -mcpu=thunderx; I can't remember if the check
for slow unaligned store pair word is with the pattern or not.


I can't get it to ICE with -mcpu=thunderx.
The restriction is just on the STP forming code in the sched-fusion hooks AFAIK.


Basically the rule is
1) if 4 byte aligned, then it is better to do two str.
2) If 8 byte aligned, then doing stp is good
3) Otherwise it is better to do two str.


Ok, then I'll make the function just emit two stores and depend on the 
sched-fusion
machinery to fuse them into an STP when appropriate since that has the logic 
that
takes thunderx into account.



Here it is.
I've confirmed that it emits to STRs for 4 byte aligned stores when 
-mtune=thunderx
and still generates STP for the other tunings, though now sched-fusion is 
responsible for
merging them, which is ok by me.

Bootstrapped and tested on aarch64.
Ok for trunk?

Thanks,
Kyril


2016-11-01  Kyrylo Tkachov  

* config/aarch64/aarch64.md (mov): Call
aarch64_split_dimode_const_store on DImode constant stores.
* config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
New prototype.
* config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
function.

2016-11-01  Kyrylo Tkachov  

* gcc.target/aarch64/store_repeating_constant_1.c: New test.
* gcc.target/aarch64/store_repeating_constant_2.c: Likewise.






Thanks,
Andrew



Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-10-24  Kyrylo Tkachov  

 * config/aarch64/aarch64.md (mov): Call
 aarch64_split_dimode_const_store on DImode constant stores.
 * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
 New prototype.
 * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
 function.

2016-10-24  Kyrylo Tkachov  

 * gcc.target/aarch64/store_repeating_constant_1.c: New test.
 * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.








Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

2016-11-10 Thread Andrew Pinski
On Thu, Nov 10, 2016 at 1:04 AM, Kyrill Tkachov
 wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html
>
> Andrew, do you have any objections to this version?

Not really.

Thanks,
Andrew

> Thanks,
> Kyrill
>
> On 01/11/16 15:21, Kyrill Tkachov wrote:
>>
>>
>> On 31/10/16 11:54, Kyrill Tkachov wrote:
>>>
>>>
>>> On 24/10/16 17:15, Andrew Pinski wrote:

 On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
  wrote:
>
> Hi all,
>
> When storing a 64-bit immediate that has equal bottom and top halves we
> currently
> synthesize the repeating 32-bit pattern twice and perform a single
> X-store.
> With this patch we synthesize the 32-bit pattern once into a W register
> and
> store
> that twice using an STP. This reduces codesize bloat from synthesising
> the
> same
> constant multiple times at the expense of converting a store to a
> store-pair.
> It will only trigger if we can save two or more instructions, so it
> will
> only transform:
>  mov x1, 49370
>  movkx1, 0xc0da, lsl 32
>  str x1, [x0]
>
> into:
>
>  mov w1, 49370
>  stp w1, w1, [x0]
>
> when optimising for -Os, whereas it will always transform a 4-insn
> synthesis
> sequence into a two-insn sequence + STP (see comments in the patch).
>
> This patch triggers already but will trigger more with the store
> merging
> pass
> that I'm working on since that will generate more of these repeating
> 64-bit
> constants.
> This helps improve codegen on 456.hmmer where store merging can
> sometimes
> create very
> complex repeating constants and target-specific expand needs to break
> them
> down.


 Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
 might cause an ICE with -mcpu=thunderx; I can't remember if the check
 for slow unaligned store pair word is with the pattern or not.
>>>
>>>
>>> I can't get it to ICE with -mcpu=thunderx.
>>> The restriction is just on the STP forming code in the sched-fusion hooks
>>> AFAIK.
>>>
 Basically the rule is
 1) if 4 byte aligned, then it is better to do two str.
 2) If 8 byte aligned, then doing stp is good
 3) Otherwise it is better to do two str.
>>>
>>>
>>> Ok, then I'll make the function just emit two stores and depend on the
>>> sched-fusion
>>> machinery to fuse them into an STP when appropriate since that has the
>>> logic that
>>> takes thunderx into account.
>>>
>>
>> Here it is.
>> I've confirmed that it emits to STRs for 4 byte aligned stores when
>> -mtune=thunderx
>> and still generates STP for the other tunings, though now sched-fusion is
>> responsible for
>> merging them, which is ok by me.
>>
>> Bootstrapped and tested on aarch64.
>> Ok for trunk?
>>
>> Thanks,
>> Kyril
>>
>>
>> 2016-11-01  Kyrylo Tkachov  
>>
>> * config/aarch64/aarch64.md (mov): Call
>> aarch64_split_dimode_const_store on DImode constant stores.
>> * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>> New prototype.
>> * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>> function.
>>
>> 2016-11-01  Kyrylo Tkachov  
>>
>> * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>> * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>
>>>
>>>

 Thanks,
 Andrew


> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-10-24  Kyrylo Tkachov  
>
>  * config/aarch64/aarch64.md (mov): Call
>  aarch64_split_dimode_const_store on DImode constant stores.
>  * config/aarch64/aarch64-protos.h
> (aarch64_split_dimode_const_store):
>  New prototype.
>  * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>  function.
>
> 2016-10-24  Kyrylo Tkachov  
>
>  * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>  * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>>
>>>
>>
>


Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-10 Thread Andre Vehreschild
Hi Janus,

well, is it really that obvious? It fixes the ICE, correct. But what about
cases where the actual has an explicit interface, but is not a variable? Like
in:

function f()
  integer :: f
end function

call sub(f())

! sub() defined as in the pr.

I haven't checked whether that is valid Fortran.

Another thought, because I fell on my nose about this once too often, what
about the actual argument being a component of a derived type. I known this is
not the bug you fixes, but quite close. Think about:

type t
  real :: r(:)
end type

type(t) :: foo

call sub(foo%r)

I assume the check for C1232 does not address this correctly. Can you comment
having thought about this somewhat?

- Andre

On Wed, 9 Nov 2016 21:42:15 +0100
Janus Weil  wrote:

> Hi all,
> 
> I have committed yet another obvious ice-on-invalid fix:
> 
> https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev
> 
> Cheers,
> Janus


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [ARM] PR 78253 do not resolve weak ref locally

2016-11-10 Thread Richard Earnshaw
On 09/11/16 21:29, Christophe Lyon wrote:
> Hi,
> 
> PR 78253 shows that the handling of weak references has changed for
> ARM with gcc-5.
> 
> When r220674 was committed, default_binds_local_p_2 gained a new
> parameter (weak_dominate), which, when true, implies that a reference
> to a weak symbol defined locally will be resolved locally, even though
> it could be overridden by a strong definition in another object file.
> 
> With r220674, default_binds_local_p forces weak_dominate=true,
> effectively changing the previous behavior.
> 
> The attached patch introduces default_binds_local_p_4 which is a copy
> of default_binds_local_p_2, but using weak_dominate=false, and updates
> the ARM target to call default_binds_local_p_4 instead of
> default_binds_local_p_2.
> 
> I ran cross-tests on various arm* configurations with no regression,
> and checked that the test attached to the original bugzilla now works
> as expected.
> 
> I am not sure why weak_dominate defaults to true, and I couldn't
> really understand why by reading the threads related to r220674 and
> following updates to default_binds_local_p_* which all deal with other
> corner cases and do not discuss the weak_dominate parameter.
> 
> Or should this patch be made more generic?
> 

I certainly don't think it should be ARM specific.

The questions I have are:

1) What do other targets do today.  Are they the same, or different?
2) If different why?
3) Is the current behaviour really what was intended by the patch?  ie.
Was the old behaviour actually wrong?

R.
> Thanks,
> 
> Christophe
> 



Re: [PATCH v2] Add mcpu flag for Qualcomm falkor core

2016-11-10 Thread Richard Earnshaw
On 04/11/16 15:47, Siddhesh Poyarekar wrote:
> This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
> is identical to the qdf24xx part that was added earlier and hence
> retains the same tuning structure and continues to have the a57
> pipeline for now.  The part number has also been changed and this
> patch fixes this for both qdf24xx and falkor options.
> 
> Tested with aarch64 and armhf.
> 
> Siddhesh
> 
>   * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
>   number.
>   (falkor): New core.
>   * gcc/config/aarch64/aarch64-tune.md: Regenerated.
>   * gcc/config/arm/arm-cores.def (falkor): New core.
>   * gcc/config/arm/arm-tables.opt: Regenerated.
>   * gcc/config/arm/arm-tune.md: Regenerated.
>   * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
>   * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
>   support.
>   * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
>   (ARM Options/-mtune): Add falkor.
> 

Installed.  Please can you prepare a patch for the release notes as well.

R.



Re: [PATCH, RFC] Improve ivopts group costs

2016-11-10 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 1:02 PM, Bin.Cheng  wrote:
> On Thu, Nov 3, 2016 at 4:00 PM, Bin.Cheng  wrote:
>> On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov  
>> wrote:
>>> Hello,
>>>
>>> I'm facing the following problem related to ivopts. The problem is that GCC
>>> generates a lot of induction variables and as a result there is an
>>> unnecessary increase of stack usage and register pressure.
>>>
>>> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
>>> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
>>> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target
>>> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm
>>> attaching assembly I get on current trunk.
>>>
>>> The reason might be in use groups costs, in particular, in the way of
>>> estimation. Currently, the cost of a tuple (group, candidate) is a sum of
>>> per-use costs in the group. So, the cost of a group grows proportional to
>>> the number of uses in the group. This approach has a negative effect on the
>>> algorithm for finding the best set of induction variables: the part of a
>>> total cost related to adding a new candidate is almost washed out by the
>>> cost of the group. In addition, when there is a lot of groups with many uses
>>> inside and a target is out of free registers, the cost of spill is washing
>>> out too. As a result, GCC prefers to use native candidates (candidate
>>> created for a particular group) for each group of uses instead of
>>> considering the real cost of introducing a new variable into a set.
>>>
>>> The summing approach was added as a part of this patch
>>> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the
>>> motivation for taking the sum does not seem to be
>>> specifically discussed.
>>>
>>> I propose the following patch that changes a group cost from cost summing to
>>> selecting the largest one inside a group. For the given test case I have:
>>> necessary size of stack is decreased by almost 3 times and ldr\str amount
>>> are decreased by less than 2 times. Also I'm attaching assembly after
>>> applying the patch.
>>>
>>> The essential change in the patch is just:
>>>
>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>> index f9211ad..a149418 100644
>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data
>>> *data,
>>>  offset and where the iv_use is.  */
>>> cost = get_computation_cost (data, next, cand, true,
>>>  NULL, &can_autoinc, NULL);
>>> -  sum_cost += cost;
>>> +  if (sum_cost < cost)
>>> +sum_cost = cost;
>>>  }
>>>set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>>>  NULL_TREE, ERROR_MARK, inv_expr);
>>>
>>> Any suggestions?
>> Hi Evgeny,
>>
>> I tend to be practical here.  Given cost is not model that well in
>> IVOPT, any heuristic tuning in cost is quite likely resulting in
>> improvement in some cases, while regressions in others.  At least we
>> need to run good number of benchmarks for such changes.  Specific to
>> this one, the summary of cost in a group is a natural result of the
>> original code, in which uses' cost is added up before candidate
>> selection.  Take your code as an example, choosing loop's original
>> candidate for group results in lots of memory accesses with [base +
>> index << scale] addressing mode, which could have higher cost than
>> [base + offset] mode wrto u-arch, IMHO, it makes sense to add up this
>> cost.  OTOH, I totally agree that IVOPT tends to choose too many
>> candidates at the moment, especially for large loops.  Is it possible
>> to achieve the same effect by penalizing register pressure cost?
>> Meanwhile, I can collect benchmark data for your patch on AArch64 and
>> get back to you later.
> I collected spec2k6 data on one AArch64 processors, it doesn't give
> meaningful improvement or regression.  Looking at the test, it boils
> down to how we choose induction variable for loops having below memory
> references:
> for (biv)
>   MEM[base + biv << scale + off1];
>   MEM[base + biv << scale + off2];
>   // ...
>   MEM[base + biv << scale + offn];
>
> On targets support [base + index << scale + offset] addressing mode ,
> the biv should be preferred (if cost of the addressing mode is not
> blocking) thus register pressure is 2.  While on targets only support
> [base + index << scale], it is more complicated.  Choosing biv
> actually increases the register pressure by 1 than choosing {base_1 +
> biv << scale, step} as the candidate, or an additional computation
> outside of address expression is required for each memory referece.
> Well, there is one exception when "base" is actually anticipated on
> exit edge of this loop.  So this is really target dependent cost model
> issue, IMHO, decreasing group cost in target-independent wa

Re: [PATCH] PR78241, fix loop unroller when niter expr is not reliable

2016-11-10 Thread Richard Biener
On Wed, Nov 9, 2016 at 11:13 PM, Pat Haugen  wrote:
> The following fixes a problem introduced by my earlier loop unroller patch, 
> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01612.html. In instances where 
> the niter expr is not reliable we need to still emit an initial peel copy of 
> the loop.
>
> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

Ok.

Richard.

> -Pat
>
>
> 2016-11-09  Pat Haugen  
>
> PR rtl-optimization/78241
> * loop-unroll.c (unroll_loop_runtime_iterations): Don't adjust 
> 'niter', but
> emit initial peel copy if niter expr is not reliable.
>
>
> testsuite/ChangeLog:
> 2016-11-09  Pat Haugen  
>
> * gcc.dg/pr78241.c: New test.
>
>


[ARM] Remove duplicated enum type for CPU identifiers

2016-11-10 Thread Richard Earnshaw (lists)
Another minor cleanup.  We currently have two enumeration types for the
list of CPUs.  This patch gets rid of one of them and merges the uses
together.



* arm.h (target_cpus): Delete.
* arm-opts.h (enum processor_type): Prefix entries with TARGET_CPU_.
* arm.c (all_cores): Prefix IDENT with TARGET_CPU_.
(all_architectures): Likewise.
(arm_option_override): Adjust use of CPU enums.
(arm_sched_reorder): Likewise.
* vfp.md (movdi_vfp, movdi_vfp_cortexa8): Likewise.
* arm.opt (mcpu, mtune): Adjust use of CPU enums.
* arm/genopt.sh (processor_type): Prefix enumeration entries with
TARGET_CPU_.
* arm-tables.opt: Regenerated.

Installed on trunk.

>From 7f6788b63cdade43b245d33df667bc5af73c11aa Mon Sep 17 00:00:00 2001
From: Richard Earnshaw 
Date: Wed, 9 Nov 2016 13:11:10 +
Subject: [PATCH] [ARM] Remove duplicated enum type for CPU identifiers.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.7.4"

This is a multi-part message in MIME format.
--2.7.4
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


	* arm.h (target_cpus): Delete.
	* arm-opts.h (enum processor_type): Prefex entires with TARGET_CPU_.
	* arm.c (all_cores): Prefix IDENT with TARGET_CPU_.
	(all_architectures): Likewise.
	(arm_option_override): Adjust use of CPU enums.
	(arm_sched_reorder): Likewise.
	* vfp.md (movdi_vfp): Likewise.
	* arm.opt (mcpu, mtune): Adjust use of CPU enums.
	* arm/genopt.sh (processor_type): Prefix enumeration entries with
	TARGET_CPU_.
	* arm-tables.opt: Regenerated.
---
 gcc/config/arm/arm-opts.h |   4 +-
 gcc/config/arm/arm-tables.opt | 218 +-
 gcc/config/arm/arm.c  |  16 ++--
 gcc/config/arm/arm.h  |  10 --
 gcc/config/arm/arm.opt|   4 +-
 gcc/config/arm/genopt.sh  |   2 +-
 gcc/config/arm/vfp.md |   4 +-
 7 files changed, 124 insertions(+), 134 deletions(-)


--2.7.4
Content-Type: text/x-patch; name="0001-ARM-Remove-duplicated-enum-type-for-CPU-identifiers.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-ARM-Remove-duplicated-enum-type-for-CPU-identifiers.patch"

diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index a649ba5..9fa8113 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -30,11 +30,11 @@ enum processor_type
 {
 #undef ARM_CORE
 #define ARM_CORE(NAME, INTERNAL_IDENT, IDENT, ARCH, FLAGS, COSTS) \
-  INTERNAL_IDENT,
+  TARGET_CPU_##INTERNAL_IDENT,
 #include "arm-cores.def"
 #undef ARM_CORE
   /* Used to indicate that no processor has been specified.  */
-  arm_none
+  TARGET_CPU_arm_none
 };
 
 /* Which __fp16 format to use.
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index f7886b9..effdfbc 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -25,331 +25,331 @@ Name(processor_type) Type(enum processor_type)
 Known ARM CPUs (for use with the -mcpu= and -mtune= options):
 
 EnumValue
-Enum(processor_type) String(arm2) Value(arm2)
+Enum(processor_type) String(arm2) Value( TARGET_CPU_arm2)
 
 EnumValue
-Enum(processor_type) String(arm250) Value(arm250)
+Enum(processor_type) String(arm250) Value( TARGET_CPU_arm250)
 
 EnumValue
-Enum(processor_type) String(arm3) Value(arm3)
+Enum(processor_type) String(arm3) Value( TARGET_CPU_arm3)
 
 EnumValue
-Enum(processor_type) String(arm6) Value(arm6)
+Enum(processor_type) String(arm6) Value( TARGET_CPU_arm6)
 
 EnumValue
-Enum(processor_type) String(arm60) Value(arm60)
+Enum(processor_type) String(arm60) Value( TARGET_CPU_arm60)
 
 EnumValue
-Enum(processor_type) String(arm600) Value(arm600)
+Enum(processor_type) String(arm600) Value( TARGET_CPU_arm600)
 
 EnumValue
-Enum(processor_type) String(arm610) Value(arm610)
+Enum(processor_type) String(arm610) Value( TARGET_CPU_arm610)
 
 EnumValue
-Enum(processor_type) String(arm620) Value(arm620)
+Enum(processor_type) String(arm620) Value( TARGET_CPU_arm620)
 
 EnumValue
-Enum(processor_type) String(arm7) Value(arm7)
+Enum(processor_type) String(arm7) Value( TARGET_CPU_arm7)
 
 EnumValue
-Enum(processor_type) String(arm7d) Value(arm7d)
+Enum(processor_type) String(arm7d) Value( TARGET_CPU_arm7d)
 
 EnumValue
-Enum(processor_type) String(arm7di) Value(arm7di)
+Enum(processor_type) String(arm7di) Value( TARGET_CPU_arm7di)
 
 EnumValue
-Enum(processor_type) String(arm70) Value(arm70)
+Enum(processor_type) String(arm70) Value( TARGET_CPU_arm70)
 
 EnumValue
-Enum(processor_type) String(arm700) Value(arm700)
+Enum(processor_type) String(arm700) Value( TARGET_CPU_arm700)
 
 EnumValue
-Enum(processor_type) String(arm700i) Value(arm700i)
+Enum(processor_type) String(arm700i) Value( TARGET_CPU_arm700i)
 
 EnumValue
-Enum(processor_type) String(arm710) Value(arm710)
+Enum(processor_type) String(arm710) Value( TARGET_CPU_arm710)
 
 EnumValue
-Enum(pro

[patch 0/5] OpenACC tile clause support

2016-11-10 Thread Chung-Lin Tang
Hi Jakub,
this patch set is a port of the OpenACC tile clause functionality from the gomp4
branch. The tile clause allows the programmer to specify the tiling of an
OpenACC loop nest.

These patches were mostly implemented by Nathan, with the Fortran
front-end bits by Cesar. I'm just upstreaming this on behalf of them.

As for the testing status, one libgomp testcase is currently XFAILed,
as we're fixing this as PR78266. Other test results look fine.

Is this patch set okay for trunk? We hope to get this into GCC 7.

Thanks,
Chung-Lin


Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-10 Thread Richard Biener
On Thu, Nov 10, 2016 at 12:05 AM, Segher Boessenkool
 wrote:
> On Wed, Nov 09, 2016 at 11:29:45PM +0100, Marc Glisse wrote:
>> >>>This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
>> >>>to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
>> >>>of B for that last term).
>> >>
>> >>Would it make sense to implement this transformation in match.pd, next to
>> >>the "opposite" one, or do you need it at the RTL level because C only
>> >>becomes a constant at that stage?
>> >
>> >It becomes a constant in the later gimple passes, but we need it in the RTL
>> >simplifiers as well, even if you also do it in match.pd?
>>
>> (assuming it is always an improvement, even though it may use the same
>> number of operations and one more constant)
>
> And a shallower evaluation tree.
>
> Does match.pd (or the gimple optimisers in general) care about the number
> of constants at all?

It's a matter of definition what is simpler.  I think for two
different constants vs
one constant I'd prefer the single constant (the match.pd pattern would apply
to int128 as well).  Unless you have a andnot instruction and combine/CSE
are clever enough to see the opportunity to re-use a previous constant if
it changes bit-and to bit-and-not ...

I'd at _most_ consider them equal cost which would mean we'd need to
decide for a canonical form -- having the same one for constant and not constant
seems like a good idea here as well.

bernd notices better resource utilization due to the shallower tree for the
ior variant but that also eventually increases register pressure.

Thus, I'd leave this all to RTL land ;)  (esp. the constant with
and-not instruction
trick is likely not done on RTL)

Richard.

>> Sure, it doesn't hurt to have it in both places. It just seems that since
>> the problem was caused by match.pd in your original testcase, fixing it at
>> that level (undoing the harm as soon as possible) would make the RTL
>> version less useful (though not useless). Anyway, I don't feel competent
>> to decide when which form is preferable, I was just curious.
>
> I don't know either.  See PR63568.
>
>> (simplify
>>  (bit_xor:cs (bit_and:s (bit_xor:cs @0 @1) CONSTANT_CLASS_P@2) @0)
>>  (bit_ior (bit_and @0 (bit_not @2)) (bit_and @1 @2)))
>>
>> (this handles vectors as well, I don't know if that is desired)
>
> No clue.  There is a theme here ;-)
>
>
> Segher


[Patch 1/5] OpenACC tile clause support, OMP_CLAUSE_TILE adjustments

2016-11-10 Thread Chung-Lin Tang
This patch contains a few supporting changes that adjusts how
OMP_CLAUSE_TILE is handled. This is in support of the more
elaborate omp-low.c changes in another patch.

Thanks,
Chung-Lin

2016-XX-XX  Nathan Sidwell  

* tree.h (OMP_CLAUSE_TILE_ITERVAR, OMP_CLAUSE_TILE_COUNT): New.
* tree.c (omp_clause_num_ops): Adjust TILE ops.
* tree-nested.c (convert_nonlocal_omp_clauses): Allow OMP_CLAUSE_TILE.
* gimplify.c (gimplify_scan_omp_clauses): No special handling for
OMP_CLAUSE_TILE.
(gomplify_adjust_omp_clauses): Don't delete TILE.
(gimplify_omp_for): Deal with TILE.
Index: tree.c
===
--- tree.c	(revision 241809)
+++ tree.c	(working copy)
@@ -327,7 +327,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_GANGS  */
   1, /* OMP_CLAUSE_NUM_WORKERS  */
   1, /* OMP_CLAUSE_VECTOR_LENGTH  */
-  1, /* OMP_CLAUSE_TILE  */
+  3, /* OMP_CLAUSE_TILE  */
   2, /* OMP_CLAUSE__GRIDDIM_  */
 };
 
Index: tree.h
===
--- tree.h	(revision 241809)
+++ tree.h	(working copy)
@@ -1654,6 +1654,10 @@ extern void protected_set_expr_location (tree, loc
 
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
+#define OMP_CLAUSE_TILE_ITERVAR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 1)
+#define OMP_CLAUSE_TILE_COUNT(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 2)
 
 #define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
Index: tree-nested.c
===
--- tree-nested.c	(revision 241809)
+++ tree-nested.c	(working copy)
@@ -1274,6 +1274,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, stru
 	case OMP_CLAUSE_DEFAULT:
 	case OMP_CLAUSE_COPYIN:
 	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
@@ -1286,8 +1287,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, stru
 	case OMP_CLAUSE_AUTO:
 	  break;
 
-	  /* OpenACC tile clauses are discarded during gimplification.  */
-	case OMP_CLAUSE_TILE:
 	  /* The following clause belongs to the OpenACC cache directive, which
 	 is discarded during gimplification.  */
 	case OMP_CLAUSE__CACHE_:
@@ -1982,6 +1981,7 @@ convert_local_omp_clauses (tree *pclauses, struct
 	case OMP_CLAUSE_DEFAULT:
 	case OMP_CLAUSE_COPYIN:
 	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
@@ -1994,8 +1994,6 @@ convert_local_omp_clauses (tree *pclauses, struct
 	case OMP_CLAUSE_AUTO:
 	  break;
 
-	  /* OpenACC tile clauses are discarded during gimplification.  */
-	case OMP_CLAUSE_TILE:
 	  /* The following clause belongs to the OpenACC cache directive, which
 	 is discarded during gimplification.  */
 	case OMP_CLAUSE__CACHE_:
Index: gimplify.c
===
--- gimplify.c	(revision 241809)
+++ gimplify.c	(working copy)
@@ -8138,20 +8138,11 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
 	remove = true;
 	  break;
 
-	case OMP_CLAUSE_TILE:
-	  for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list;
-	   list = TREE_CHAIN (list))
-	{
-	  if (gimplify_expr (&TREE_VALUE (list), pre_p, NULL,
- is_gimple_val, fb_rvalue) == GS_ERROR)
-		remove = true;
-	}
-	  break;
-
 	case OMP_CLAUSE_NOWAIT:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_INDEPENDENT:
@@ -8927,13 +8918,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gi
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
-	  break;
-
 	case OMP_CLAUSE_TILE:
-	  /* We're not yet making use of the information provided by OpenACC
-	 tile clauses.  Discard these here, to simplify later middle end
-	 processing.  */
-	  remove = true;
 	  break;
 
 	default:
@@ -9388,10 +9373,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 		 (OMP_FOR_INIT (for_stmt))
 	   * 2);
 }
-  int collapse = 1;
-  c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_COLLAPSE);
-  if (c)
-collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
+  int collapse = 0;
+  /* Find the first of COLLAPSE or TILE.  */
+  for (c = OMP_FOR_CLAUSES (for_stmt); c; c = TREE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_COLLAPSE)
+  {
+	collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (c));
+	if (collapse == 1)
+	  /* Not really collapsing.  */
+	  collapse = 0;
+	break;
+  }
+else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_TILE)
+  {
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (c));
+	break;
+  }
+
   for

[Patch 2/5] OpenACC tile clause support, omp-low parts

2016-11-10 Thread Chung-Lin Tang
This part is the bulk of the patch set. It consists of
the definition of the GOACC_TILE internal fn, the lowering/expanding
of this in omp-low, as well as loop auto-partitioning adjustments
that help improve generated code.

This patch corresponds to the gomp4 committed patches:
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00518.html
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02031.html
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02351.html
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02418.html
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00107.html
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00152.html
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00314.html

with some small bits separated out to the prior patch in this series.

Thanks,
Chung-Lin

2016-XX-XX  Nathan Sidwell  

* internal-fn.def (GOACC_DIM_POS): Comment may be overly conservative.
(GOACC_TILE): New.
* internal-fn.c (expand_GOACC_TILE): New.

* omp-low.c (struct omp_for_data): Add tiling field.
(struct oacc_loop): Change 'ifns' to vector of call stmts,
add e_mask field.
(enum oacc_loop_flags): Add OLF_TILE flag.
(extract_omp_for_data): Deal with tiling.
(scan_sharing_clauses): Allow OMP_CLAUSE_TILE.
(lower_oacc_head_mark): Add OLF_TILE as appropriate.  Ensure 2 levels
for auto loops.  Remove default auto determining, moved to
oacc_loop_fixed_partitions.
(struct oacc_collapse): Add tile and outer fields.  */
(expand_oacc_collapse_init): Add LOC paramter.  Initialize tile
element fields.
(expand_oacc_collapse_vars): Add INNER parm, adjust for tiling,
avoid DIV for outermost collapse var.
(expand_oacc_for): Insert tile element loop as needed.  Adjust.
Remove out of date comments, fix whitespace.
(oacc_dim_call): New function, abstracted out from oacc_thread_numbers.
(oacc_thread_numbers): Use oacc_dim_call.
(oacc_xform_tile): New.
(new_oacc_loop_raw): Initialize e_mask, adjust for ifns vector.
(finish_oacc_loop): Adjust for ifns vector.
(oacc_loop_discover_walk): Append loop abstraction sites to list,
add case for GOACC_TILE fns.
(oacc_loop_xform_loop): Delete.
(oacc_loop_process): Iterate over call list directly, and add
handling for GOACC_TILE fns.
(oacc_loop_fixed_partitions): Determine default auto, deal with TILE,
dump partitioning.
(oacc_loop_auto_partitions): Add outer_assign parm. Assign all but
vector partitioning to outer loops.  Assign 2 partitions to loops
when available. Add TILE handling.
(oacc_loop_partition): Adjust oacc_loop_auto_partitions call.
(execite_oacc_device_lower): Process GOACC_TILE fns, ignore unknown 
specs.

Index: internal-fn.c
===
--- internal-fn.c	(revision 241809)
+++ internal-fn.c	(working copy)
@@ -2168,6 +2168,14 @@ expand_GOACC_REDUCTION (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This is expanded by oacc_device_lower pass.  */
+
+static void
+expand_GOACC_TILE (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Set errno to EDOM.  */
 
 static void
Index: internal-fn.def
===
--- internal-fn.def	(revision 241809)
+++ internal-fn.def	(working copy)
@@ -175,7 +175,7 @@ DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW, NULL)
dimension.  DIM_POS is pure (and not const) so that it isn't
thought to clobber memory and can be gcse'd within a single
parallel region, but not across FORK/JOIN boundaries.  They take a
-   single INTEGER_CST argument.  */
+   single INTEGER_CST argument.  This might be overly conservative.  */
 DEF_INTERNAL_FN (GOACC_DIM_SIZE, ECF_CONST | ECF_NOTHROW | ECF_LEAF, ".")
 DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".")
 
@@ -185,6 +185,10 @@ DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHRO
 /* OpenACC reduction abstraction.  See internal-fn.h  for usage.  */
 DEF_INTERNAL_FN (GOACC_REDUCTION, ECF_NOTHROW | ECF_LEAF, NULL)
 
+/* Openacc tile abstraction. Describes the spans of the element loop.
+   GOACC_TILE (num-loops, loop-no, tile-arg, tile-mask, element-mask).  */
+DEF_INTERNAL_FN (GOACC_TILE, ECF_NOTHROW | ECF_LEAF, NULL)
+
 /* Set errno to EDOM, if GCC knows how to do that directly for the
current target.  */
 DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL)
Index: omp-low.c
===
--- omp-low.c	(revision 241809)
+++ omp-low.c	(working copy)
@@ -213,7 +213,8 @@ struct omp_for_data
   tree chunk_size;
   gomp_for *for_stmt;
   tree pre, iter_type;
-  int collapse;
+  tree tiling;  /* Tiling values (if non null).  */
+  int collapse;  /* Collapsed loops, 1 for a non-collapsed loop.  */
   int ordered;
   bool have_nowait, have_ordered, simd_schedule;
 

[Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

2016-11-10 Thread Chung-Lin Tang
These are the patches for the C/C++ front-ends, along with the
testsuite patches.

Thanks,
Chung-Lin

2016-XX-XX  Nathan Sidwell  

c/
* c-parser.c (c_parser_omp_clause_collapse): Disallow tile.
(c_parser_oacc_clause_tile): Disallow collapse. Fix parsing and
semantic checking.
* c-parser.c (c_parser_omp_for_loop): Accept tiling constructs.

cp/
* parser.c (cp_parser_oacc_clause_tile): Disallow collapse.  Fix
parsing.  Parse constant expression. Remove semantic checking.
(cp_parser_omp_clause_collapse): Disallow tile.
(cp_parser_omp_for_loop): Deal with tile clause.  Don't emit a
parse error about missing for after already emitting one.
Use more conventional for idiom for unbounded loop.
* pt.c (tsubst_omp_clauses): Require integral constant expression
for COLLAPSE and TILE.  Remove broken TILE subst.
* semantics.c (finish_omp_clauses): Correct TILE semantic check.
(finish_omp_for): Deal with tile clause.

gcc/testsuite/
* c-c++-common/goacc/loop-auto-1.c: Adjust and add additional
case.
* c-c++-common/goacc/loop-auto-2.c: New.
* c-c++-common/goacc/tile.c: Include stdbool, fix expected errors.
* g++.dg/goacc/template.C: Test tile subst.  Adjust erroneous
uses.
* g++.dg/goacc/tile-1.C: Check tile subst.
* gcc.dg/goacc/loop-processing-1.c: Adjust dg-final pattern.
Index: c/c-parser.c
===
--- c/c-parser.c	(revision 241809)
+++ c/c-parser.c	(working copy)
@@ -11010,6 +11010,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tr
   location_t loc;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
+  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
 
   loc = c_parser_peek_token (parser)->location;
   if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11920,10 +11921,11 @@ static tree
 c_parser_oacc_clause_tile (c_parser *parser, tree list)
 {
   tree c, expr = error_mark_node;
-  location_t loc, expr_loc;
+  location_t loc;
   tree tile = NULL_TREE;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
 
   loc = c_parser_peek_token (parser)->location;
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11931,16 +11933,19 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 
   do
 {
+  if (tile && !c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+	return list;
+
   if (c_parser_next_token_is (parser, CPP_MULT)
 	  && (c_parser_peek_2nd_token (parser)->type == CPP_COMMA
 	  || c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_PAREN))
 	{
 	  c_parser_consume_token (parser);
-	  expr = integer_minus_one_node;
+	  expr = integer_zero_node;
 	}
   else
 	{
-	  expr_loc = c_parser_peek_token (parser)->location;
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
 	  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
 	  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
 	  expr = cexpr.value;
@@ -11952,28 +11957,20 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 	  return list;
 	}
 
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)))
-	{
-	  c_parser_error (parser, "% value must be integral");
-	  return list;
-	}
-
 	  expr = c_fully_fold (expr, false, NULL);
 
-	  /* Attempt to statically determine when expr isn't positive.  */
-	  c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr,
-			   build_int_cst (TREE_TYPE (expr), 0));
-	  protected_set_expr_location (c, expr_loc);
-	  if (c == boolean_true_node)
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+	  || TREE_CODE (expr) != INTEGER_CST
+	  || !tree_fits_shwi_p (expr)
+	  || tree_to_shwi (expr) <= 0)
 	{
-	  warning_at (expr_loc, 0,"% value must be positive");
-	  expr = integer_one_node;
+	  error_at (expr_loc, "% argument needs positive"
+			" integral constant");
+	  expr = integer_zero_node;
 	}
 	}
 
   tile = tree_cons (NULL_TREE, expr, tile);
-  if (c_parser_next_token_is (parser, CPP_COMMA))
-	c_parser_consume_token (parser);
 }
   while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
 
@@ -14899,11 +14896,17 @@ c_parser_omp_for_loop (location_t loc, c_parser *p
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   location_t for_loc;
+  bool tiling = false;
   vec *for_block = make_tree_vector ();
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+  {
+	tiling = true;
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+  }
 else if (OMP_CLAUSE_CO

[Patch 5/5] OpenACC tile clause support, libgomp testsuite patches

2016-11-10 Thread Chung-Lin Tang
Some additional tests and adjustments to existing ones were made.

2016-XX-XX  Nathan Sidwell  
Chung-Lin Tang  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/tile-1.c: New.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust and
add additional case.
* testsuite/libgomp.oacc-c-c++-common/vprop.c: XFAIL under
"openacc_nvidia_accel_selected".
* libgomp.oacc-fortran/nested-function-1.f90 (test2): Add num_workers(8)
clause.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 241809)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -112,7 +112,7 @@ int vector_1 (int *ary, int size)
 	ary[ix] = place ();
   }
 
-  return check (ary, size, 0, 0, 1);
+  return check (ary, size, 0, 1, 1);
 }
 
 int vector_2 (int *ary, int size)
@@ -196,10 +196,24 @@ int gang_3 (int *ary, int size)
 	ary[ix + jx * 64] = place ();
   }
 
+  return check (ary, size, 1, 1, 1);
+}
+
+int gang_4 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+for (int jx = 0; jx <  size; jx++)
+  ary[jx] = place ();
+  }
+
   return check (ary, size, 1, 0, 1);
 }
 
-#define N (32*32*32)
+#define N (32*32*32*2)
 int main ()
 {
   int ondev = 0;
@@ -227,6 +241,8 @@ int main ()
 return 1;
   if (gang_3 (ary,  N))
 return 1;
+  if (gang_4 (ary,  N))
+return 1;
 
   return 0;
 }
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/tile-1.c	(revision 0)
@@ -0,0 +1,281 @@
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+/* { dg-additional-options "-fopenacc-dim=32" } */
+
+#include 
+#include 
+
+static int check (const int *ary, int size, int gp, int wp, int vp)
+{
+  int exit = 0;
+  int ix;
+  int gangs[32], workers[32], vectors[32];
+
+  for (ix = 0; ix < 32; ix++)
+gangs[ix] = workers[ix] = vectors[ix] = 0;
+  
+  for (ix = 0; ix < size; ix++)
+{
+  vectors[ary[ix] & 0xff]++;
+  workers[(ary[ix] >> 8) & 0xff]++;
+  gangs[(ary[ix] >> 16) & 0xff]++;
+}
+
+  for (ix = 0; ix < 32; ix++)
+{
+  if (gp)
+	{
+	  int expect = gangs[0];
+	  if (gangs[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("gang %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && gangs[ix])
+	{
+	  exit = 1;
+	  printf ("gang %d unexpectedly used\n", ix);
+	}
+
+  if (wp)
+	{
+	  int expect = workers[0];
+	  if (workers[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("worker %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && workers[ix])
+	{
+	  exit = 1;
+	  printf ("worker %d unexpectedly used\n", ix);
+	}
+
+  if (vp)
+	{
+	  int expect = vectors[0];
+	  if (vectors[ix] != expect)
+	{
+	  exit = 1;
+	  printf ("vector %d not used %d times\n", ix, expect);
+	}
+	}
+  else if (ix && vectors[ix])
+	{
+	  exit = 1;
+	  printf ("vector %d unexpectedly used\n", ix);
+	}
+  
+}
+  return exit;
+}
+
+#pragma acc routine seq
+static int __attribute__((noinline)) place ()
+{
+  int r = 0;
+
+  if (acc_on_device (acc_device_nvidia))
+{
+  int g = 0, w = 0, v = 0;
+
+  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+  r = (g << 16) | (w << 8) | v;
+}
+  return r;
+}
+
+static void clear (int *ary, int size)
+{
+  int ix;
+
+  for (ix = 0; ix < size; ix++)
+ary[ix] = -1;
+}
+
+int gang_vector_1 (int *ary, int size)
+{
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop tile(128) gang vector
+for (int jx = 0; jx < size; jx++)
+  ary[jx] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
+int gang_vector_2a (int *ary, int size)
+{
+  if (size % 256)
+return 1;
+  
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop tile(64, 64) gang vector
+for (int jx = 0; jx < size / 256; jx++)
+  for (int ix = 0; ix < 256; ix++)
+	ary[jx * 256 + ix] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
+int gang_vector_2b (int *ary, int size)
+{
+  if (size % 256)
+return 1;
+  
+  clear (ary, size);
+#pragma acc parallel vector_length(32) num_gangs (32) copy (ary[0:size]) firstprivate (size)
+  {

[Patch 4/5] OpenACC tile clause support, Fortran front-end parts

2016-11-10 Thread Chung-Lin Tang
The Fortran front-end patches. These were originally written by Cesar.

Thanks,
Chung-Lin

2016-XX-XX  Cesar Philippidis  

fortran/
* openmp.c (resolve_oacc_positive_int_expr): Promote the warning
to an error.
(resolve_oacc_loop_blocks): Use integer zero to represent the '*'
tile argument.
(resolve_omp_clauses): Error on directives containing both tile
and collapse clauses.
* trans-openmp.c (gfc_trans_omp_do): Lower tiled loops like
collapsed loops.


gcc/testsuite/
* gfortran.dg/goacc/loop-2.f95: Change expected tile clause
warnings to errors.
* gfortran.dg/goacc/loop-5.f95: Likewise.
* gfortran.dg/goacc/sie.f95: Likewise.
* gfortran.dg/goacc/tile-1.f90: New test.
* gfortran.dg/goacc/tile-2.f90: New test
* gfortran.dg/goacc/tile-lowering.f95: New test.
Index: fortran/openmp.c
===
--- fortran/openmp.c	(revision 241809)
+++ fortran/openmp.c	(working copy)
@@ -3024,8 +3024,8 @@ resolve_oacc_positive_int_expr (gfc_expr *expr, co
   resolve_oacc_scalar_int_expr (expr, clause);
   if (expr->expr_type == EXPR_CONSTANT && expr->ts.type == BT_INTEGER
   && mpz_sgn(expr->value.integer) <= 0)
-gfc_warning (0, "INTEGER expression of %s clause at %L must be positive",
-		 clause, &expr->where);
+gfc_error ("INTEGER expression of %s clause at %L must be positive",
+	   clause, &expr->where);
 }
 
 /* Emits error when symbol is pointer, cray pointer or cray pointee
@@ -3859,6 +3859,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_claus
 if (omp_clauses->wait_list)
   for (el = omp_clauses->wait_list; el; el = el->next)
 	resolve_oacc_scalar_int_expr (el->expr, "WAIT");
+  if (omp_clauses->collapse && omp_clauses->tile_list)
+gfc_error ("Incompatible use of TILE and COLLAPSE at %L", &code->loc);
 }
 
 
@@ -4964,11 +4966,11 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	  if (el->expr == NULL)
 	{
 	  /* NULL expressions are used to represent '*' arguments.
-		 Convert those to a -1 expressions.  */
+		 Convert those to a 0 expressions.  */
 	  el->expr = gfc_get_constant_expr (BT_INTEGER,
 		gfc_default_integer_kind,
 		&code->loc);
-	  mpz_set_si (el->expr->value.integer, -1);
+	  mpz_set_si (el->expr->value.integer, 0);
 	}
 	  else
 	{
Index: fortran/trans-openmp.c
===
--- fortran/trans-openmp.c	(revision 241809)
+++ fortran/trans-openmp.c	(working copy)
@@ -3162,7 +3162,18 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op,
   vec inits = vNULL;
   dovar_init *di;
   unsigned ix;
+  gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list;
 
+  /* Both collapsed and tiled loops are lowered the same way.  In
+ OpenACC, those clauses are not compatible, so prioritize the tile
+ clause, if present.  */
+  if (tile)
+{
+  collapse = 0;
+  for (gfc_expr_list *el = tile; el; el = el->next)
+	collapse++;
+}
+
   if (collapse <= 0)
 collapse = 1;
 
Index: testsuite/gfortran.dg/goacc/loop-5.f95
===
--- testsuite/gfortran.dg/goacc/loop-5.f95	(revision 241809)
+++ testsuite/gfortran.dg/goacc/loop-5.f95	(working copy)
@@ -93,9 +93,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -129,9 +126,6 @@ program test
   DO j = 1,10
   ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -242,9 +236,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc kernels loop vector tile(*)
   DO i = 1,10
   ENDDO
@@ -333,9 +324,6 @@ program test
 DO j = 1,10
 ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc parallel loop vector tile(*)
   DO i = 1,10
   ENDDO
Index: testsuite/gfortran.dg/goacc/tile-1.f90
===
--- testsuite/gfortran.dg/goacc/tile-1.f90	(revision 0)
+++ testsuite/gfortran.dg/goacc/tile-1.f90	(revision 0)
@@ -0,0 +1,339 @@
+subroutine parloop
+  integer, parameter :: n = 100
+  integer i, j, k, a
+
+  !$acc parallel loop tile(10)
+  do i = 1, n
+  end do
+  
+  !$acc parallel loop tile(*)
+  do i = 1, n
+  end do
+
+  !$acc parallel loop tile(10, *)
+  do i = 1, n
+ do j = 1, n
+ end do
+  end do
+ 
+  !$acc parallel loop tile(10, *, i) ! { dg-error "" }
+  do i = 1, n
+ do j = 1, n
+do k = 1, n
+end do
+ end do
+  end do 
+
+  !$acc parallel loop t

Re: [PATCH] loop distribution bug fix

2016-11-10 Thread Richard Biener
On Thu, Nov 10, 2016 at 6:25 AM, Jim Wilson  wrote:
> This fixes a bug in code adding edges to the dependence graph. Values
> for this_dir can be -1 (backward edge), 0 (no edge), 1 (forward edge),
> and 2 (both backward and forward edges). There can be multiple
> dependencies checked, creating multiple edges that have to be merged
> together. this_dir contains the current edge. dir contains the
> previous edges. The code fails to handle the case where this_dir is 2,
> in which case we can return immediately. This is a minor optimization
> to improve compile time. The code handles the case where dir is
> non-zero and this_dir is zero by returning 2, which is incorrect. dir
> should be unmodified in this case. We can return 2 only if both dir
> and this_dir are non-zero and unequal, i.e. one is -1 and the other is
> 1. This problem creates extra unnecessary edges, which can prevent
> loops from being distributed. The patch fixes both problems.
>
> This passed an x86_64 gcc bootstrap with -ftree-loop-distribution
> added to BOOT_CFLAGS and a testsuite regression check.  Curiously, I
> see that I get different results for the C/C++ ubsan tests every time
> I run them, but this has nothing to do with my patch, as it happens
> with or without my patch.  I haven't tried debugging this yet.  Might
> be related to my recent upgrade to Ubuntu 16.04 LTS.  Otherwise, there
> are no regressions.
>
> On SPEC CPU2006, on aarch64, I see 5879 loops distributed without the
> patch, and 5906 loops distributed with the patch. So 27 extra loops
> are distributed which is about 0.5% more loop distributions. There is
> no measurable performance gain from the bug fix on the CPU2006 run
> time though I plan to spend some more time looking at this code to see
> if I can find other improvements.

The biggest "lack" of loop distribution is the ability to undo CSE so for

 for (;;)
  {
 a[i] = a[i] + 1;
 b[i] = a[i];
  }

CSE makes us see

  for (;;)
{
   tem = a[i];
   tem2 = tem + 1;
   a[i] = tem;
   b[i] = tem;
}

and loop distribution cannot re-materialize tem3 from a[i] thus most of the
time it ends up pulling redundant computations into each partition (sometimes
that can reduce memory bandwith if one less stream is used but sometimes
not, like in the above case).

Then of course the cost model is purely modeled for STREAM (reduce the number
of memory streams).  So loop distribution is expected to pessimize code for
the CSE case in case you are not memory bound and improve things if you
are memory bound.

> OK?

Ok.

Thanks for the improvement!
Richard.

> Jim


Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-10 Thread Martin Liška
On 11/09/2016 02:47 PM, Martin Liška wrote:
> On 11/09/2016 02:29 PM, Jakub Jelinek wrote:
>> On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
>>> As shown in the attached test-case, the assert cannot always be true.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>> Martin
>>
>>> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
>>> From: marxin 
>>> Date: Wed, 9 Nov 2016 11:52:00 +0100
>>> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR 
>>> sanitizer/78270)
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-11-09  Martin Liska  
>>>
>>> PR sanitizer/78270
>>> * gimplify.c (gimplify_switch_expr):
>>
>> No description on what you've changed.
>>
>> That said, I'm not 100% sure it is the right fix.
>> As the testcase shows, for switch without GIMPLE_BIND wrapping the body
>> we can have variables that are in scope from the switch onwards.
>> I bet we could also have variables that go out of scope, say if in the
>> compound literal's initializer there is ({ ... }) that declares variables.
>> I doubt you can have a valid case/default label in those though, so
>> perhaps it would be simpler not to create live_switch_vars at all
>> if SWITCH_BODY is not a BIND_EXPR?
> 
> I like the approach you introduced! I'll re-trigger regression tests and
> send a newer version of patch.
> 
> Martin

Sending the patch.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

> 
>>
>>  Jakub
>>
> 

>From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 11:52:00 +0100
Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270)

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	* gcc.dg/asan/pr78269.c: New test.

gcc/ChangeLog:

2016-11-09  Martin Liska  

	* gimplify.c (gimplify_switch_expr): Create live_switch_vars
	only when SWITCH_BODY is a BIND_EXPR.
---
 gcc/gimplify.c  | 20 +++-
 gcc/testsuite/gcc.dg/asan/pr78269.c | 13 +
 2 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d392450..da60c05 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2241,7 +2241,7 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 {
   vec labels;
   vec saved_labels;
-  hash_set *saved_live_switch_vars;
+  hash_set *saved_live_switch_vars = NULL;
   tree default_case = NULL_TREE;
   gswitch *switch_stmt;
 
@@ -2253,8 +2253,14 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
  labels.  Save all the things from the switch body to append after.  */
   saved_labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels.create (8);
-  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
-  gimplify_ctxp->live_switch_vars = new hash_set (4);
+
+  /* Do not create live_switch_vars if SWITCH_BODY is not a BIND_EXPR.  */
+  if (TREE_CODE (SWITCH_BODY (switch_expr)) == BIND_EXPR)
+	{
+	  saved_live_switch_vars = gimplify_ctxp->live_switch_vars;
+	  gimplify_ctxp->live_switch_vars = new hash_set (4);
+	}
+
   bool old_in_switch_expr = gimplify_ctxp->in_switch_expr;
   gimplify_ctxp->in_switch_expr = true;
 
@@ -2269,8 +2275,12 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
-  delete gimplify_ctxp->live_switch_vars;
+
+  if (gimplify_ctxp->live_switch_vars)
+	{
+	  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
+	  delete gimplify_ctxp->live_switch_vars;
+	}
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
   preprocess_case_label_vec_for_gimple (labels, index_type,
diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c
new file mode 100644
index 000..55840b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78269.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-switch-unreachable" }
+
+typedef struct
+{
+} bdaddr_t;
+
+int a;
+void fn1 ()
+{
+  switch (a)
+&(bdaddr_t){};
+}
-- 
2.10.1



[PATCH] [ARC] New option handling, refurbish multilib support.

2016-11-10 Thread Claudiu Zissulescu
Hi,

Please find the revised patch which includes the refurbishing of
mmpy-option option, and a new comment on DEFAULT_arc_fpu_build
define. As for the last suggestion, my proposal is to have a latter
patch on the topic of .cpu, synced with a related binutils patch.

OK to apply?
Claudiu

gcc/
2016-05-09  Claudiu Zissulescu  

* config/arc/arc-arch.h: New file.
* config/arc/arc-arches.def: Likewise.
* config/arc/arc-cpus.def: Likewise.
* config/arc/arc-options.def: Likewise.
* config/arc/t-multilib: Likewise.
* config/arc/genmultilib.awk: Likewise.
* config/arc/genoptions.awk: Likewise.
* config/arc/arc-tables.opt: Likewise.
* config/arc/driver-arc.c: Likewise.
* common/config/arc/arc-common.c (arc_handle_option): Trace
toggled options.
* config.gcc (arc*-*-*): Add arc-tables.opt to arc's extra
options; check for supported cpu against arc-cpus.def file.
(arc*-*-elf*, arc*-*-linux-uclibc*): Use new make fragment; define
TARGET_CPU_BUILD macro; add driver-arc.o as an extra object.
* config/arc/arc-c.def: Add emacs local variables.
* config/arc/arc-opts.h (processor_type): Use arc-cpus.def file.
(FPU_FPUS, FPU_FPUD, FPU_FPUDA, FPU_FPUDA_DIV, FPU_FPUDA_FMA)
(FPU_FPUDA_ALL, FPU_FPUS_DIV, FPU_FPUS_FMA, FPU_FPUS_ALL)
(FPU_FPUD_DIV, FPU_FPUD_FMA, FPU_FPUD_ALL): New defines.
(DEFAULT_arc_fpu_build): Define.
(DEFAULT_arc_mpy_option): Define.
* config/arc/arc-protos.h (arc_init): Delete.
* config/arc/arc.c (arc_cpu_name): New variable.
(arc_selected_cpu, arc_selected_arch, arc_arcem, arc_archs)
(arc_arc700, arc_arc600, arc_arc601): New variable.
(arc_init): Add static; remove selection of default tune value,
cleanup obsolete error messages.
(arc_override_options): Make use of .def files for selecting the
right cpu and option configurations.
* config/arc/arc.h (stdbool.h): Include.
(TARGET_CPU_DEFAULT): Define.
(CPP_SPEC): Remove mcpu=NPS400 handling.
(arc_cpu_to_as): Declare.
(EXTRA_SPEC_FUNCTIONS): Define.
(OPTION_DEFAULT_SPECS): Likewise.
(ASM_DEFAULT): Remove.
(ASM_SPEC): Use arc_cpu_to_as.
(DRIVER_SELF_SPECS): Remove deprecated options.
(arc_base_cpu): Declare.
(TARGET_ARC600, TARGET_ARC601, TARGET_ARC700, TARGET_EM)
(TARGET_HS, TARGET_V2, TARGET_ARC600): Make them use arc_base_cpu
variable.
(MULTILIB_DEFAULTS): Use ARC_MULTILIB_CPU_DEFAULT.
* config/arc/arc.md (attr_cpu): Remove.
* config/arc/arc.opt (mno-mpy): Deprecate.
(mcpu=ARC600, mcpu=ARC601, mcpu=ARC700, mcpu=NPS400, mcpu=ARCEM)
(mcpu=ARCHS): Remove.
(mcrc, mdsp-packa, mdvbf, mmac-d16, mmac-24, mtelephony, mrtsc):
Deprecate.
(mbarrel_shifte, mspfp_, mdpfp_, mdsp_pack, mmac_): Remove.
(arc_fpu): Use new defines.
(mpy-option): Change to use numeric or string like inputs.
* config/arc/t-arc (driver-arc.o): New target.
(arc-cpus, t-multilib, arc-tables.opt): Likewise.
* config/arc/t-arc-newlib: Delete.
* config/arc/t-arc-uClibc: Renamed to t-uClibc.
* doc/invoke.texi (ARC): Update arc options.
---
 gcc/common/config/arc/arc-common.c |  69 -
 gcc/config.gcc |  47 +
 gcc/config/arc/arc-arch.h  | 120 ++
 gcc/config/arc/arc-arches.def  |  56 ++
 gcc/config/arc/arc-c.def   |   4 +
 gcc/config/arc/arc-cpus.def|  75 ++
 gcc/config/arc/arc-options.def | 109 
 gcc/config/arc/arc-opts.h  |  49 +++--
 gcc/config/arc/arc-protos.h|   1 -
 gcc/config/arc/arc-tables.opt  |  90 
 gcc/config/arc/arc.c   | 176 +---
 gcc/config/arc/arc.h   |  89 
 gcc/config/arc/arc.md  |   5 -
 gcc/config/arc/arc.opt | 169 +++---
 gcc/config/arc/driver-arc.c|  78 ++
 gcc/config/arc/genmultilib.awk | 203 +
 gcc/config/arc/genoptions.awk  |  86 
 gcc/config/arc/t-arc   |  19 
 gcc/config/arc/t-arc-newlib|  46 -
 gcc/config/arc/t-arc-uClibc|  20 
 gcc/config/arc/t-multilib  |  34 +++
 gcc/config/arc/t-uClibc|  20 
 gcc/doc/invoke.texi|  90 +---
 23 files changed, 1277 insertions(+), 378 deletions(-)
 create mode 100644 gcc/config/arc/arc-arch.h
 create mode 100644 gcc/config/arc/arc-arches.def
 create mode 100644 gcc/config/arc/arc-cpus.def
 create mode 100644 gcc/config/arc/arc-options.def
 create mode 100644 gcc/config/arc/arc-tables.opt
 create mode 100644 gcc/config/arc/driver-a

Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

2016-11-10 Thread Pierre-Marie de Rodat

On 11/09/2016 10:02 AM, Richard Biener wrote:

Using scan-assembler-times on the dwarf?


I always have a bad feeling about this kind of check as I imagine it can 
break very easily with legit changes. But I have nothing better to 
contribute, so I’ve added one such testcase. ;-)



Ok to commit?


Ok.


This is committed as r242035. Thanks!

--
Pierre-Marie de Rodat


Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 12:02:45PM +0100, Martin Liška wrote:
> >From fb4b852a17656309e6acfb8da97cf9bce4b3b176 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 9 Nov 2016 11:52:00 +0100
> Subject: [PATCH] Create live_switch_vars conditionally (PR sanitizer/78270)
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 

Missing
PR sanitizer/78270
note.

>   * gcc.dg/asan/pr78269.c: New test.

Misnamed test, should be pr78270.c.
> 
> gcc/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 

Also missing the PR line.

>   * gimplify.c (gimplify_switch_expr): Create live_switch_vars
>   only when SWITCH_BODY is a BIND_EXPR.

Ok with those nits fixed.

Jakub


Patch ping: Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 05:42:51PM +0100, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly
> tested it on Solaris, ok for trunk?
> 
> 2016-11-03  Jakub Jelinek  
> 
>   * dwarf2out.c (size_of_discr_list): Fix typo in function comment.
> 
>   PR debug/78191
>   * dwarf2out.c (abbrev_opt_base_type_end): New variable.
>   (die_abbrev_cmp): Sort dies with die_abbrev smaller than
>   abbrev_opt_base_type_end only by increasing die_abbrev, before
>   any other dies.
>   (optimize_abbrev_table): Don't change abbrev numbers of
>   base types and CU or optimize implicit consts in them if
>   calc_base_type_die_sizes has been called during build_abbrev_table.
>   (calc_base_type_die_sizes): If abbrev_opt_start, set
>   abbrev_opt_base_type_end to one plus largest base type's
>   die_abbrev.

I'd like to ping this patch, it fixes a bootstrap failure on Solaris
(both sparc and x86), so I think it is high priority.

Thanks.

Jakub


Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-10 Thread Mark Wielaard
On Thu, 2016-11-03 at 18:39 +0100, Mark Wielaard wrote:
> Adds Rust symbol demangler. Rust mangles symbols using GNU_V3 style,
> adding a hash and various special character subtitutions. This adds
> a new rust style to cplus_demangle and adds 3 helper functions
> rust_demangle, rust_demangle_sym and rust_is_mangled.
> 
> rust-demangle.c was written by David. Mark did the code formatting to
> GNU style and integration into the gcc/libiberty build system and
> testsuite.
> 
> The original code was written for the perf tool. David agreed with
> submitting it for gcc/libiberty so binutils, gdb, valgrind, etc.
> can also easily reuse it. He has sent request-assign to ass...@gnu.org.
> I already have write after approval for gcc.

Ping.

Any comments are welcome. A variant of this is being used already in
perf and valgrind. It would be nice to make it part of upstream
libiberty.

I did also apply this patch to binutils-gdb/libiberty and with a
oneliner patch (*) it allows c++filt to handle rust mangled symbols and
accept --format=rust (and displays it with --help).

I did not yet test gdb, which would need a similar small tweak to its
current language settings. Any feedback on whether it also works as
expected with gdb appreciated.

Thanks,

Mark

(*)

diff --git a/binutils/cxxfilt.c b/binutils/cxxfilt.c
index d5863ee..7cf9458 100644
--- a/binutils/cxxfilt.c
+++ b/binutils/cxxfilt.c
@@ -242,6 +242,7 @@ main (int argc, char **argv)
 case gnu_v3_demangling:
 case dlang_demangling:
 case auto_demangling:
+case rust_demangling:
   valid_symbols = standard_symbol_characters ();
   break;
 case hp_demangling:



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> Here is updated 3 patch.
> 
> I checked that all new tests related to epilogue vectorization passed with it.
> 
> Your comments will be appreciated.

A lot better now.  Instead of the ->aux dance I now prefer to
pass the original loops loop_vinfo to vect_analyze_loop as
optional argument (if non-NULL we analyze the epilogue of that 
loop_vinfo).  OTOH I remember we mainly use it to get at the
original vectorization factor?  So we can pass down an (optional)
forced vectorization factor as well?

Richard.

> 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Hi Richard,
> >>
> >> I did not understand your last remark:
> >>
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >   && dump_enabled_p ())
> >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> >"loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > num_vectorized_loops++;
> >> >/* Now that the loop has been vectorized, allow it to be unrolled
> >> >   etc.  */
> >> >  loop->force_vectorize = false;
> >> >
> >> > +   /* Add new loop to a processing queue.  To make it easier
> >> > +  to match loop and its epilogue vectorization in dumps
> >> > +  put new loop as the next loop to process.  */
> >> > +   if (new_loop)
> >> > + {
> >> > +   loops.safe_insert (i + 1, new_loop->num);
> >> > +   vect_loops_num = number_of_loops (cfun);
> >> > + }
> >> >
> >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> f> unction which will set up stuff properly (and also perform
> >> > the if-conversion of the epilogue there).
> >> >
> >> > That said, if we can get in non-masked epilogue vectorization
> >> > separately that would be great.
> >>
> >> Could you please clarify your proposal.
> >
> > When a loop was vectorized set things up to immediately vectorize
> > its epilogue, avoiding changing the loop iteration and avoiding
> > the re-use of ->aux.
> >
> > Richard.
> >
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> I re-send all patches sent by Ilya earlier for review which support
> >> >> vectorization of loop epilogues and loops with low trip count. We
> >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> >> approved by Jeff.
> >> >>
> >> >> I did re-base of all patches and performed bootstrapping and
> >> >> regression testing that did not show any new failures. Also all
> >> >> changes related to new vect_do_peeling algorithm have been changed
> >> >> accordingly.
> >> >>
> >> >> Is it OK for trunk?
> >> >
> >> > I would have prefered that the series up to -03-nomask-tails would
> >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >> > the patchset is oddly separated.
> >> >
> >> > I have a comment on that part nevertheless:
> >> >
> >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >/* Check if we can possibly peel the loop.  */
> >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >> > -  || loop->inner)
> >> > +  || loop->inner
> >> > +  /* Required peeling was performed in prologue and
> >> > +is not required for epilogue.  */
> >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >> >  do_peeling = false;
> >> >
> >> >if (do_peeling
> >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >
> >> >do_versioning =
> >> > optimize_loop_nest_for_speed_p (loop)
> >> > -   && (!loop->inner); /* FORNOW */
> >> > +   && (!loop->inner) /* FORNOW */
> >> > +/* Required versioning was performed for the
> >> > +  original loop and is not required for epilogue.  */
> >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >> >
> >> >if (do_versioning)
> >> >  {
> >> >
> >> > please do that check in the single caller of this function.
> >> >
> >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >> > passing down info from the processed parent would be _much_ cleaner.
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > && dump_enabled_p ())
> >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > "loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > num_vectorized_loops++;
> >> > /*

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Thu, 10 Nov 2016, Richard Biener wrote:

> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> 
> > Richard,
> > 
> > Here is updated 3 patch.
> > 
> > I checked that all new tests related to epilogue vectorization passed with 
> > it.
> > 
> > Your comments will be appreciated.
> 
> A lot better now.  Instead of the ->aux dance I now prefer to
> pass the original loops loop_vinfo to vect_analyze_loop as
> optional argument (if non-NULL we analyze the epilogue of that 
> loop_vinfo).  OTOH I remember we mainly use it to get at the
> original vectorization factor?  So we can pass down an (optional)
> forced vectorization factor as well?

Btw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Thanks,
Richard.

> Richard.
> 
> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> > >
> > >> Hi Richard,
> > >>
> > >> I did not understand your last remark:
> > >>
> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> > >> >
> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> > >> >   && dump_enabled_p ())
> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> > >> >"loop vectorized\n");
> > >> > -   vect_transform_loop (loop_vinfo);
> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
> > >> > num_vectorized_loops++;
> > >> >/* Now that the loop has been vectorized, allow it to be 
> > >> > unrolled
> > >> >   etc.  */
> > >> >  loop->force_vectorize = false;
> > >> >
> > >> > +   /* Add new loop to a processing queue.  To make it easier
> > >> > +  to match loop and its epilogue vectorization in dumps
> > >> > +  put new loop as the next loop to process.  */
> > >> > +   if (new_loop)
> > >> > + {
> > >> > +   loops.safe_insert (i + 1, new_loop->num);
> > >> > +   vect_loops_num = number_of_loops (cfun);
> > >> > + }
> > >> >
> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > >> f> unction which will set up stuff properly (and also perform
> > >> > the if-conversion of the epilogue there).
> > >> >
> > >> > That said, if we can get in non-masked epilogue vectorization
> > >> > separately that would be great.
> > >>
> > >> Could you please clarify your proposal.
> > >
> > > When a loop was vectorized set things up to immediately vectorize
> > > its epilogue, avoiding changing the loop iteration and avoiding
> > > the re-use of ->aux.
> > >
> > > Richard.
> > >
> > >> Thanks.
> > >> Yuri.
> > >>
> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> > >> >
> > >> >> Hi All,
> > >> >>
> > >> >> I re-send all patches sent by Ilya earlier for review which support
> > >> >> vectorization of loop epilogues and loops with low trip count. We
> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> > >> >> approved by Jeff.
> > >> >>
> > >> >> I did re-base of all patches and performed bootstrapping and
> > >> >> regression testing that did not show any new failures. Also all
> > >> >> changes related to new vect_do_peeling algorithm have been changed
> > >> >> accordingly.
> > >> >>
> > >> >> Is it OK for trunk?
> > >> >
> > >> > I would have prefered that the series up to -03-nomask-tails would
> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> > >> > the patchset is oddly separated.
> > >> >
> > >> > I have a comment on that part nevertheless:
> > >> >
> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >/* Check if we can possibly peel the loop.  */
> > >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > >> > -  || loop->inner)
> > >> > +  || loop->inner
> > >> > +  /* Required peeling was performed in prologue and
> > >> > +is not required for epilogue.  */
> > >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> > >> >  do_peeling = false;
> > >> >
> > >> >if (do_peeling
> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >
> > >> >do_versioning =
> > >> > optimize_loop_nest_for_speed_p (loop)
> > >> > -   && (!loop->inner); /* FORNOW */
> > >> > +   && (!loop->inner) /* FORNOW */
> > >> > +/* Required versioning was performed for the
> > >> > +  original loop and is not required for epilogue.  */
> > >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> > >> >
> > >> >if (do_versioning)
> > >> >  {
> > >> >
> > >> > please do that check in the single caller of this function.
> > >> >
> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> > >> > passing down info from the processed parent

Re: [PATCH] DWARF: make signedness explicit for enumerator const values

2016-11-10 Thread Mark Wielaard
On Thu, 2016-10-13 at 18:12 +0200, Pierre-Marie de Rodat wrote:
> Currently, the DWARF description does not specify the signedness of the
> representation of enumeration types.  This is a problem in some
> contexts where DWARF consumers need to determine if value X is greater
> than value Y.
> 
> For instance in Ada:
> 
> type Enum_Type is ( A, B, C, D);
> for Enum_Type use (-1, 0, 1, 2);
> 
> type Rec_Type (E : Enum_Type) is record
>when A .. B => null;
>when others => B : Booleann;
> end record;
> 
> The above can be described in DWARF the following way:
> 
> DW_TAG_enumeration_type(Enum_Type)
> | DW_AT_byte_size: 1
>   DW_TAG_enumerator(A)
>   | DW_AT_const_value: -1
>   DW_TAG_enumerator(B)
>   | DW_AT_const_value: 0
>   DW_TAG_enumerator(C)
>   | DW_AT_const_value: 1
>   DW_TAG_enumerator(D)
>   | DW_AT_const_value: 2
> 
> DW_TAG_structure_type(Rec_Type)
>   DW_TAG_member(E)
>   | DW_AT_type: 
>   DW_TAG_variant_part
>   | DW_AT_discr: 
> DW_TAG_variant
> | DW_AT_discr_list: DW_DSC_range 0x7f 0
> DW_TAG_variant
> | DW_TAG_member(b)
> 
> DWARF consumers need to know that enumerators (A, B, C and D) are signed
> in order to determine the set of E values for which Rec_Type has a B
> field.  In practice, they need to know how to interpret the 0x7f LEB128
> number above (-1, not 127).
> 
> There seems to be only two alternatives to solve this issue: one is to
> add a DW_AT_type attribute to DW_TAG_enumerator_type DIEs to make it
> point to a base type that specifies the signedness.  The other is to
> make sure the form of the DW_AT_const_value attribute carries the
> signedness information.  This patch implements the latter.

IMHO having an explicit DW_AT_type pointing at the base type with size
and encoding for the DW_TAG_enumerator_type is better for consumers than
having to try and interpret the DW_FORM used to encode the values.

Alternatively could we just attach a DW_AT_encoding to the
DW_TAG_enumeration_type? The spec doesn't list it as one of the
attributes for an enumeration_type, but it makes sense given it already
carries bit/byte size attributes.

Thanks,

Mark


Re: RFA: PATCH to gengtype to avoid putting tree_node support in front end objects

2016-11-10 Thread Jakub Jelinek
On Thu, Oct 27, 2016 at 09:36:09AM -0400, Jason Merrill wrote:
> Currently, the way gengtype works it scans the list of source files
> with front end files at the end, and pushes data structures onto a
> stack.  It then processes the stack in LIFO order, so that data
> structures from front ends are handled first.  As a result, if a GTY
> data structure in a front end depends on tree_node, gengtype happily
> puts gt_ggc_mx(tree_node*&) in a front end file, leading to link
> errors on all other front ends.
> 
> This patch avoids this problem by appending to the list of data
> structures so that they are processed in FIFO order, and so tree_node
> gets handled in gtype-desc.o.
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?

> commit 487a1c95c0d3169b2041942ff4f8d71c9ff689eb
> Author: Jason Merrill 
> Date:   Wed Oct 26 23:12:23 2016 -0400
> 
> * gengtype.c (new_structure): Append to structures list.
> 
> (find_structure): Likewise.

Please remove the blank line in the ChangeLog.

When looking at the differences it creates, it is hard, because
all the generated files have all the functions emitted in reverse order now
from what it used to be, so I only looked at files where the size changed,
and that is beyond gtype.state only in my case gt-tree-phinodes.h
which lost
void
gt_ggc_mx (struct gimple *& x)
{
  if (x)
gt_ggc_mx_gimple ((void *) x);
}
and
void
gt_pch_nx (struct gimple *& x)
{
  if (x)
gt_pch_nx_gimple ((void *) x);
}
and gtype-desc.c which didn't contain those but now it does (for
gtype-desc.c it is hard to find out due to the reordering what else
has changed, but as gt-tree-phinodes.h shrunk by 170 characters and
gtype-desc.c grew by 170 characters, I'd think it is all that changed).
I believe those routines belong to gtype-desc.c, that is where similar
ones for tree_node, etc. are, tree-phinodes.h certainly isn't the header
that defines gimple.

So I think this patch is ok for trunk.  Thanks.

Jakub


[PATCH] Fix print_node for CONSTRUCTORs

2016-11-10 Thread Martin Liška
Hello.

Following patch fixes indentation of print_node when printing a constructor
that has some equal elements. Current implementation caches tree to prevent deep
debug outputs. Such behavior is undesired for ctor elements. Apart from that,
I switch to hash_set for a table that is used for tree node caching.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Before:
  constant 1>
val  
constant 120>
idx  constant 2> val 
idx  constant 3> val 
idx  constant 4> val 
idx  constant 5> val >

After:
  constant 1>
val  
constant 120>
idx  constant 2>
val  
constant 120>
idx  constant 3>
val  
constant 120>
idx  constant 4>
val  
constant 120>
idx  constant 5>
val  
constant 120>>

Ready to be installed?
Martin
>From 6d18ff00ec1d8e6a8a154fbb70af25b2dda8165e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 16:28:52 +0100
Subject: [PATCH] Fix print_node for CONSTRUCTORs

gcc/ChangeLog:

2016-11-10  Martin Liska  

	* print-tree.c (struct bucket): Remove.
	(print_node): Add new argument which drives whether a tree node
	is printed briefly or not.
	(debug_tree): Replace a custom hash table with hash_set.
	* print-tree.h (print_node): Add the argument.
---
 gcc/print-tree.c | 43 +++
 gcc/print-tree.h |  3 ++-
 2 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index 8c63cb8..f3ee04c 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -33,19 +33,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h" /* FIXME */
 #include "tree-cfg.h"
 #include "tree-dump.h"
+#include "print-tree.h"
 
 /* Define the hash table of nodes already seen.
Such nodes are not repeated; brief cross-references are used.  */
 
 #define HASH_SIZE 37
 
-struct bucket
-{
-  tree node;
-  struct bucket *next;
-};
-
-static struct bucket **table;
+static hash_set *table = NULL;
 
 /* Print PREFIX and ADDR to FILE.  */
 void
@@ -176,10 +171,9 @@ indent_to (FILE *file, int column)
starting in column INDENT.  */
 
 void
-print_node (FILE *file, const char *prefix, tree node, int indent)
+print_node (FILE *file, const char *prefix, tree node, int indent,
+	bool brief_for_visited)
 {
-  int hash;
-  struct bucket *b;
   machine_mode mode;
   enum tree_code_class tclass;
   int len;
@@ -219,21 +213,14 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
   /* Allow this function to be called if the table is not there.  */
   if (table)
 {
-  hash = ((uintptr_t) node) % HASH_SIZE;
-
   /* If node is in the table, just mention its address.  */
-  for (b = table[hash]; b; b = b->next)
-	if (b->node == node)
-	  {
-	print_node_brief (file, prefix, node, indent);
-	return;
-	  }
+  if (table->contains (node) && brief_for_visited)
+	{
+	  print_node_brief (file, prefix, node, indent);
+	  return;
+	}
 
-  /* Add this node to the table.  */
-  b = XNEW (struct bucket);
-  b->node = node;
-  b->next = table[hash];
-  table[hash] = b;
+  table->add (node);
 }
 
   /* Indent to the specified column, since this is the long form.  */
@@ -846,8 +833,8 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
 	FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (node),
   cnt, index, value)
 	  {
-		print_node (file, "idx", index, indent + 4);
-		print_node (file, "val", value, indent + 4);
+		print_node (file, "idx", index, indent + 4, false);
+		print_node (file, "val", value, indent + 4, false);
 	  }
 	  }
 	  break;
@@ -997,10 +984,10 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
 DEBUG_FUNCTION void
 debug_tree (tree node)
 {
-  table = XCNEWVEC (struct bucket *, HASH_SIZE);
+  table = new hash_set (HASH_SIZE);
   print_node (stderr, "", node, 0);
-  free (table);
-  table = 0;
+  delete table;
+  table = NULL;
   putc ('\n', stderr);
 }
 
diff --git a/gcc/print-tree.h b/gcc/print-tree.h
index 124deab..fd610f9 100644
--- a/gcc/print-tree.h
+++ b/gcc/print-tree.h
@@ -38,7 +38,8 @@ extern void debug_raw (vec &ref);
 extern void debug_raw (vec *ptr);
 #ifdef BUFSIZ
 extern void dump_addr (FILE*, const char *, const void *);
-extern void print_node (FILE *, const char *, tree, int);
+extern void print_node (FILE *, const char *, tree, int,
+			bool brief_for_visited = true);
 extern void print_node_brief (FILE *, const char *, const_tree, int);
 extern void indent_to (FILE *, int);
 #endif
-- 
2.10.1



Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-10 Thread Richard Biener
On Tue, Nov 8, 2016 at 5:03 AM, Martin Sebor  wrote:
> It's taken me longer than I expected to finally get back to this
> project.  Sorry about the delay.
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01110.html
>
> Attached is an updated patch with this enhancement and reflecting
> you previous comment.
>
> Besides running the GCC test suite I tested the patch by building
> Binutils and the Linux kernel.  It found one stpcpy-related overflow
> in Binutils that I'm looking into and reduced by one the number of
> problems reported by the -Wformat-length option in the kernel (I
> haven't yet checked which one it eliminated).
>
> Although I'm not done investigating the Binutils problem I'm posting
> the patch for review now to allow for comments before stage 1 ends.

@@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree var)
   return size_binop (code, base, off);
 }

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)

new functions need a comment.  But maybe you want to use tree_expr_nonnegative_p
to also allow signed but known positive ones?

+/* Fill the 2-element OFFRANGE array with the range of values OFF
+   is known to be in.  Postcodition: OFFRANGE[0] <= OFFRANGE[1].  */
+
+static bool
+get_offset_range (tree off, HOST_WIDE_INT offrange[2])
+{
+  STRIP_NOPS (off);

why strip nops (even sign changes!) here?  Why below convert things
via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]?

+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?

+  else if (range_type == VR_ANTI_RANGE)
+   {
+ offrange[0] = max.to_uhwi () + 1;
+ offrange[1] = min.to_uhwi () - 1;
+ return true;
+   }

first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
[10, 4] !?  That looks bogus (and contrary to the function comment
postcondition)

+  else if (range_type == VR_VARYING)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR)
+   {

please trust range-info instead of doing your own little VRP here.
VR_VARYING -> return false.

stopping review here noting that other parts of the compiler
split constant parts from variable parts and it looks like this is
what you want to do here?  That is, enhance

static vec object_sizes[4];

and cache a SSA-NAME, constant offset pair in addition?  Or just a range
(range of that SSA name + offset), if that's good enough -- wait, that's
what you get from get_range_info!

So I'm not sure where the whole complication in the patch comes from...

Possibly from the fact that VRP on pointers is limited and thus &a[i] + 5
doesn't get you a range for i + 5?

Richard.

> Martin
>
> PS The tests added in the patch (but nothing else) depend on
> the changes in the patch for c/53562:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00483.html
>


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
PING^2

On 10/31/2016 10:13 AM, Martin Liška wrote:
> PING^1
> 
> On 10/13/2016 05:34 PM, Martin Liška wrote:
>> Hello.
>>
>> As it's very hard to guess from GCC driver whether a target supports atomic 
>> updates
>> for GCOV counter or not, I decided to come up with a new option value 
>> (maybe-atomic),
>> that would be transformed in a corresponding value (single or atomic) in 
>> tree-profile.c.
>> The GCC driver selects the option when -pthread is present in the command 
>> line.
>>
>> That should fix all tests failures seen on AIX target.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>



Re: [ipa-vrp] ice in set_value_range

2016-11-10 Thread David Edelsohn
Kugan

Is there a PR for this failure?  It broke bootstrap on AIX as well and
I only was able to track it to your patch last night.

Thanks, David


[PATCH] Fix PR71762

2016-11-10 Thread Richard Biener

The following fixes PR71762 via reverting the transforms of
~X & Y to X < Y and similar because when the bools they apply to
are expanded to RTL undefined values are not reliably zero-extended
and thus the transform is invalid.  Ensuring the zero-extension
is too costly IMHO and the proper fix is to move the transform
to RTL where we can check known-zero-bits to validate validity
or to fix GIMPLE not not have operations on types not matching their mode
in precision.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
and branches?

Any takers for the RTL implementation?

Thanks,
Richard.

2016-11-10  Richard Biener  

PR middle-end/71762
* match.pd ((~X & Y) -> X < Y, (X & ~Y) -> Y < X,
(~X | Y) -> X <= Y, (X | ~Y) -> Y <= X): Remove.

* gcc.dg/torture/pr71762-1.c: New testcase.
* gcc.dg/torture/pr71762-2.c: Likewise.
* gcc.dg/torture/pr71762-3.c: Likewise.
* gcc.dg/tree-ssa/forwprop-28.c: XFAIL.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 242004)
+++ gcc/match.pd(working copy)
@@ -944,33 +944,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op:c truth_valued_p@0 (logical_inverted_value @0))
   { constant_boolean_node (op == NE_EXPR ? true : false, type); }))
 
-/* If arg1 and arg2 are booleans (or any single bit type)
-   then try to simplify:
-
-   (~X & Y) -> X < Y
-   (X & ~Y) -> Y < X
-   (~X | Y) -> X <= Y
-   (X | ~Y) -> Y <= X
-
-   But only do this if our result feeds into a comparison as
-   this transformation is not always a win, particularly on
-   targets with and-not instructions.
-   -> simplify_bitwise_binary_boolean */
-(simplify
-  (ne (bit_and:c (bit_not @0) @1) integer_zerop)
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
-   && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
-   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
-(lt @0 @1)
-(gt @0 @1
-(simplify
-  (ne (bit_ior:c (bit_not @0) @1) integer_zerop)
-  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
-   && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
-   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
-(le @0 @1)
-(ge @0 @1
-
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
Index: gcc/testsuite/gcc.dg/torture/pr71762-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-1.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fdisable-rtl-init-regs" } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  _Bool x;
+  if (foo (x, y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/pr71762-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-2.c(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  _Bool x[32];
+  if (foo (x[1], y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/pr71762-3.c
===
--- gcc/testsuite/gcc.dg/torture/pr71762-3.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr71762-3.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+static _Bool
+foo (_Bool a, _Bool b)
+{
+  int x = a && ! b;
+  return x != 0;
+}
+
+int y = 1;
+int main()
+{
+  register _Bool x
+  /* Add register spec for the argv parameter to main.  */
+#if __i386__ || __x86_64__
+  __asm__("%esi")
+#endif
+;
+  if (foo (x, y))
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (revision 242032)
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c (working copy)
@@ -83,6 +83,8 @@ test_8 (int code)
to a ordered compare.  But the transform does not trigger if we transform
the negated code == 22 compare to code != 22 first.  It turns out if
we do that we even generate better code on x86 at least.  */
+/* ???  As PR71762 notices this transform causes wrong-code issues in RTL
+   with one uninitialized operand, thus it has been disabled.  */
 
-/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 
"forwprop1"} } */
+/* { dg-final { scan-tree-dump-times "simplified to if \\\(\[^ ]* \[<>\]" 4 
"forwprop1" { xfail *-*-* } } } */
 


Re: [Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-10 Thread Janus Weil
Hi Andre,

> well, is it really that obvious?

well ... what can I say. If you wanna be strict about it, I guess
there is no such thing as an "obvious patch". There is basically
always something that you can miss, or that can be improved. Mikael's
patch was obvious to me in the sense that it is clear and simple and
fixes the ICE without any side effects. Thus it's a clear improvement,
and it has been rotting in bugzilla for over five years.

Are you implying that it was premature to commit it as 'obvious'?


> It fixes the ICE, correct. But what about
> cases where the actual has an explicit interface, but is not a variable? Like
> in:
> [...]
> Can you comment
> having thought about this somewhat?

In fact I have not thought about any further cases. Since you're not
giving full examples, I can only guess what you mean: The cases in the
attachment are working as expected. Anything else?

Cheers,
Janus




> On Wed, 9 Nov 2016 21:42:15 +0100
> Janus Weil  wrote:
>
>> Hi all,
>>
>> I have committed yet another obvious ice-on-invalid fix:
>>
>> https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev
>>
>> Cheers,
>> Janus
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de

  implicit none
  call sub(f())

contains

  function f()
integer :: f
  end function

  ! sub() defined as in the pr.
  subroutine sub (j)
integer, volatile :: j
  end subroutine

end
  type t
integer :: i
  end type

  type(t) :: foo

  call sub(foo%i)

contains

  ! sub() defined as in the pr.
  subroutine sub (j)
integer, volatile :: j
  end subroutine

end


Re: [ARM] PR 78253 do not resolve weak ref locally

2016-11-10 Thread Christophe Lyon
On 10 November 2016 at 11:05, Richard Earnshaw
 wrote:
> On 09/11/16 21:29, Christophe Lyon wrote:
>> Hi,
>>
>> PR 78253 shows that the handling of weak references has changed for
>> ARM with gcc-5.
>>
>> When r220674 was committed, default_binds_local_p_2 gained a new
>> parameter (weak_dominate), which, when true, implies that a reference
>> to a weak symbol defined locally will be resolved locally, even though
>> it could be overridden by a strong definition in another object file.
>>
>> With r220674, default_binds_local_p forces weak_dominate=true,
>> effectively changing the previous behavior.
>>
>> The attached patch introduces default_binds_local_p_4 which is a copy
>> of default_binds_local_p_2, but using weak_dominate=false, and updates
>> the ARM target to call default_binds_local_p_4 instead of
>> default_binds_local_p_2.
>>
>> I ran cross-tests on various arm* configurations with no regression,
>> and checked that the test attached to the original bugzilla now works
>> as expected.
>>
>> I am not sure why weak_dominate defaults to true, and I couldn't
>> really understand why by reading the threads related to r220674 and
>> following updates to default_binds_local_p_* which all deal with other
>> corner cases and do not discuss the weak_dominate parameter.
>>
>> Or should this patch be made more generic?
>>
>
> I certainly don't think it should be ARM specific.
That was my feeling too.

>
> The questions I have are:
>
> 1) What do other targets do today.  Are they the same, or different?

arm, aarch64, s390 use default_binds_local_p_2 since PR 65780, and
default_binds_local_p before that. Both have weak_dominate=true
i386 has its own version, calling default_binds_local_p_3 with true
for weak_dominate

But the behaviour of default_binds_local_p changed with r220674 as I said above.
See https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=220674 and
notice how weak_dominate was introduced

The original bug report is about a different case:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219

The original patch submission is
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00410.html
and the 1st version with weak_dominate is in:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00469.html
but it's not clear to me why this was introduced

> 2) If different why?
on aarch64, although binds_local_p returns true, the relocations used when
building the function pointer is still the same (still via the GOT).

aarch64 has different logic than arm when accessing a symbol
(eg aarch64_classify_symbol)

> 3) Is the current behaviour really what was intended by the patch?  ie.
> Was the old behaviour actually wrong?
>
That's what I was wondering.
Before r220674, calling a weak function directly or via a function
pointer had the same effect (in other words, the function pointer
points to the actual implementation: the strong one if any, the weak
one otherwise).

After r220674, on arm the function pointer points to the weak
definition, which seems wrong to me, it should leave the actual
resolution to the linker.


> R.
>> Thanks,
>>
>> Christophe
>>
>


Re: [PATCH] Fix PR78189

2016-11-10 Thread Christophe Lyon
On 10 November 2016 at 09:34, Richard Biener  wrote:
> On Wed, 9 Nov 2016, Christophe Lyon wrote:
>
>> On 9 November 2016 at 09:36, Bin.Cheng  wrote:
>> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
>> >> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>> >>
>> >>> Hi Richard,
>> >>>
>> >>>
>> >>> On 7 November 2016 at 09:01, Richard Biener  wrote:
>> >>> >
>> >>> > The following fixes an oversight when computing alignment in the
>> >>> > vectorizer.
>> >>> >
>> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> >>> >
>> >>> > Richard.
>> >>> >
>> >>> > 2016-11-07  Richard Biener  
>> >>> >
>> >>> > PR tree-optimization/78189
>> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
>> >>> > alignment computation.
>> >>> >
>> >>> > * g++.dg/torture/pr78189.C: New testcase.
>> >>> >
>> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>> >>> > ===
>> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
>> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
>> >>> > @@ -0,0 +1,41 @@
>> >>> > +/* { dg-do run } */
>> >>> > +/* { dg-additional-options "-ftree-slp-vectorize 
>> >>> > -fno-vect-cost-model" } */
>> >>> > +
>> >>> > +#include 
>> >>> > +
>> >>> > +struct A
>> >>> > +{
>> >>> > +  void * a;
>> >>> > +  void * b;
>> >>> > +};
>> >>> > +
>> >>> > +struct alignas(16) B
>> >>> > +{
>> >>> > +  void * pad;
>> >>> > +  void * misaligned;
>> >>> > +  void * pad2;
>> >>> > +
>> >>> > +  A a;
>> >>> > +
>> >>> > +  void Null();
>> >>> > +};
>> >>> > +
>> >>> > +void B::Null()
>> >>> > +{
>> >>> > +  a.a = nullptr;
>> >>> > +  a.b = nullptr;
>> >>> > +}
>> >>> > +
>> >>> > +void __attribute__((noinline,noclone))
>> >>> > +NullB(void * misalignedPtr)
>> >>> > +{
>> >>> > +  B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) 
>> >>> > - offsetof(B, misaligned));
>> >>> > +  b->Null();
>> >>> > +}
>> >>> > +
>> >>> > +int main()
>> >>> > +{
>> >>> > +  B b;
>> >>> > +  NullB(&b.misaligned);
>> >>> > +  return 0;
>> >>> > +}
>> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >>> > index 9346cfe..b03cb1e 100644
>> >>> > --- gcc/tree-vect-data-refs.c
>> >>> > +++ gcc/tree-vect-data-refs.c
>> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
>> >>> > data_reference *dr)
>> >>> >base = ref;
>> >>> >while (handled_component_p (base))
>> >>> >  base = TREE_OPERAND (base, 0);
>> >>> > +  unsigned int base_alignment;
>> >>> > +  unsigned HOST_WIDE_INT base_bitpos;
>> >>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>> >>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
>> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
>> >>> > + adjust things to make base_alignment valid as the alignment of
>> >>> > + DR_BASE_ADDRESS.  */
>> >>> >if (TREE_CODE (base) == MEM_REF)
>> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>> >>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 
>> >>> > 0));
>> >>> > -  unsigned int base_alignment = get_object_alignment (base);
>> >>> > +{
>> >>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
>> >>> > BITS_PER_UNIT;
>> >>> > +  base_bitpos &= (base_alignment - 1);
>> >>> > +}
>> >>> > +  if (base_bitpos != 0)
>> >>> > +base_alignment = base_bitpos & -base_bitpos;
>> >>> > +  /* Also look at the alignment of the base address DR analysis
>> >>> > + computed.  */
>> >>> > +  unsigned int base_addr_alignment = get_pointer_alignment 
>> >>> > (base_addr);
>> >>> > +  if (base_addr_alignment > base_alignment)
>> >>> > +base_alignment = base_addr_alignment;
>> >>> >
>> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>> >>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
>> >>>
>> >>> Since you committed this patch (r241892), I'm seeing execution failures:
>> >>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>> >>>   gcc.dg/vect/pr40074.c execution test
>> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>> >>> --with-fpu=neon-fp16
>> >>> (using qemu as simulator)
>> >>
>> >> The difference is that we now vectorize the testcase with versioning
>> >> for alignment (but it should never execute the vectorized variant).
>> >> I need arm peoples help to understand what is wrong.
>> > Hi All,
>> > I will look at it.
>> >
>>
>> Hi,
>>
>> This is causing new regressions on armeb:
>>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
>> scan-tree-dump-times vect "vectorized 2 loops" 1
>>   gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
>> "vectorized 2 loops" 1
>
> It's actually an improvement as armeb can't do unaligned loads.
> Before the patch we versioned both loops for alignment (they
> can't be possibly both ali

[PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Kyrill Tkachov

Hi all,

This patch implements the new separate shrink-wrapping hooks for aarch64.
In separate shrink wrapping (as I understand it) we consider each register 
save/restore as
a 'component' that can be performed independently of the other save/restores in 
the prologue/epilogue
and can be moved outside the prologue/epilogue and instead performed only in 
the basic blocks where it's
actually needed. This allows us to avoid saving and restoring registers on 
execution paths where a register
might not be needed.

In the most general form a 'component' can be any operation that the 
prologue/epilogue performs, for example
stack adjustment. But in this patch we only consider callee-saved register 
save/restores as components.
The code is in many ways similar to the powerpc implementation of the hooks.

The hooks implemented are:
* TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS: Returns a bitmap containing a bit 
for each register that should
be considered a 'component' i.e. its save/restore should be separated from the 
prologue and epilogue and placed
at the basic block where it's needed.

* TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB: Determine for a given basic block which 
'component' registers it needs.
This is determined through dataflow. If a component register is in the IN,GEN 
or KILL sets for the basic block
it's considered as needed and marked as such in the bitmap.

* TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS and 
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS: Given a bitmap
of component registers emits the save or restore code for them.

* TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS: Given a bitmap of component 
registers record in the backend that
the register is shrink-wrapped using this approach and that the normal prologue 
and epilogue expansion code
should not emit code for them. This is done similarly to powerpc by defining a 
bool array in machine_function
where we record whether each register is separately shrink-wrapped.  The 
prologue and epilogue expansion code
(through aarch64_save_callee_saves and aarch64_restore_callee_saves) is updated 
to not emit save/restores for
these registers if they appear in that array.

Our prologue and epilogue code has a lot of intricate logic to perform stack 
adjustments using the writeback
forms of the load/store instructions. Separately shrink-wrapping those 
registers marked for writeback
(cfun->machine->frame.wb_candidate1 and cfun->machine->frame.wb_candidate2) 
broke that codegen and I had to
emit an explicit stack adjustment instruction that created ugly 
prologue/epilogue sequences. So this patch
is conservative and doesn't allow shrink-wrapping of the registers marked for 
writeback. Maybe in the future
we can relax it (for example allow wrapping of one of the two writeback 
registers if the writeback amount
can be encoded in a single-register writeback store/load) but given the 
development stage of GCC I thought
I'd play it safe.

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were some 
interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%

I'll be re-running the benchmarks with Segher's recent patch [1] to see if they 
fix the regression
and if it does I think this can go in.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00889.html

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

2016-11-10  Kyrylo Tkachov  

* config/aarch64/aarch64.h (machine_function): Add
reg_is_wrapped_separately field.
* config/aarch64/aarch64.c (emit_set_insn): Change return type to
rtx_insn *.
(aarch64_save_callee_saves): Don't save registers that are wrapped
separately.
(aarch64_restore_callee_saves): Don't restore registers that are
wrapped separately.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
(aarch64_get_separate_components): New function.
(aarch64_components_for_bb): Likewise.
(aarch64_disqualify_components): Likewise.
(aarch64_emit_prologue_components): Likewise.
(aarch64_emit_epilogue_components): Likewise.
(aarch64_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
commit 14c7a66d9f3a44ef40499e61ca9643c7dfbc6c82
Author: Kyrylo Tkachov 
Date:   Tue Oct 11 09:25:54 2016 +0100

[AArch64] Separate shrink wrapping hooks implementation

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 325e725..5508333 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
 
 /* Emit an insn that's a simple single-set.  Both the 

Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-10 Thread Paul Richard Thomas
Hi Dominique.

snip
> I have a last glitch (which can be deferred if needed):
snip

Fixed by the new patch, which is attached. Bootstraps and regtests OK.

OK for trunk?

Paul

2016-11-10  Paul Thomas  

PR fortran/44265
* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
* resolve.c (flag_fn_result_spec): New function.
(resolve_fntype): Call it for character result lengths.
* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
* trans-decl.c (gfc_sym_mangled_identifier): Include the
procedure name in the mangled name for symbols with the
fn_result_spec bit set.
(gfc_get_symbol_decl): Mangle the name of these symbols.
(gfc_create_module_variable): Allow them through the assert.
(gfc_generate_function_code): Remove the assert before the
initialization of sym->tlink because the frontend no longer
uses this field.
* trans-expr.c (gfc_map_intrinsic_function): Add a case to
treat the LEN_TRIM intrinsic.

2016-11-10  Paul Thomas  

PR fortran/44265
* gfortran.dg/char_result_14.f90: New test.
* gfortran.dg/char_result_15.f90: New test.
* gfortran.dg/char_result_16.f90: New test.


-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 241994)
--- gcc/fortran/gfortran.h  (working copy)
*** typedef struct gfc_symbol
*** 1498,1503 
--- 1498,1505 
unsigned equiv_built:1;
/* Set if this variable is used as an index name in a FORALL.  */
unsigned forall_index:1;
+   /* Set if the symbol is used in a function result specification .  */
+   unsigned fn_result_spec:1;
/* Used to avoid multiple resolutions of a single symbol.  */
unsigned resolved:1;
/* Set if this is a module function or subroutine with the
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241994)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_equivalence (gfc_equiv *eq)
*** 15732,15737 
--- 15732,15785 
  }
  
  
+ /* Function called by resolve_fntype to flag other symbol used in the
+length type parameter specification of function resuls.  */
+ 
+ static bool
+ flag_fn_result_spec (gfc_expr *expr,
+  gfc_symbol *sym ATTRIBUTE_UNUSED,
+  int *f ATTRIBUTE_UNUSED)
+ {
+   gfc_namespace *ns;
+   gfc_symbol *s;
+ 
+   if (expr->expr_type == EXPR_VARIABLE)
+ {
+   s = expr->symtree->n.sym;
+   for (ns = s->ns; ns; ns = ns->parent)
+   if (!ns->parent)
+ break;
+ 
+   if (!s->fn_result_spec
+ && s->attr.flavor == FL_PARAMETER)
+   {
+ /* Function contained in a module */
+ if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE)
+   {
+ gfc_symtree *st;
+ s->fn_result_spec = 1;
+ /* Make sure that this symbol is translated as a module
+variable.  */
+ st = gfc_get_unique_symtree (ns);
+ st->n.sym = s;
+ s->refs++;
+   }
+ /* ... which is use associated and called.  */
+ else if (s->attr.use_assoc || s->attr.used_in_submodule
+   ||
+ /* External function matched with an interface.  */
+ (s->ns->proc_name
+  && ((s->ns == ns
+&& s->ns->proc_name->attr.if_source == IFSRC_DECL)
+  || s->ns->proc_name->attr.if_source == IFSRC_IFBODY)
+  && s->ns->proc_name->attr.function))
+   s->fn_result_spec = 1;
+   }
+ }
+   return false;
+ }
+ 
+ 
  /* Resolve function and ENTRY types, issue diagnostics if needed.  */
  
  static void
*** resolve_fntype (gfc_namespace *ns)
*** 15782,15787 
--- 15830,15838 
el->sym->attr.untyped = 1;
  }
}
+ 
+   if (sym->ts.type == BT_CHARACTER)
+ gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0);
  }
  
  
Index: gcc/fortran/symbol.c
===
*** gcc/fortran/symbol.c(revision 241994)
--- gcc/fortran/symbol.c(working copy)
*** gfc_new_symbol (const char *name, gfc_na
*** 2933,2938 
--- 2933,2939 
p->common_block = NULL;
p->f2k_derived = NULL;
p->assoc = NULL;
+   p->fn_result_spec = 0;

return p;
  }
Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 241994)
--- gcc/fortran/trans-decl.c(working copy)
*** gfc_sym_mangled_identifier (gfc_symbol *
*** 355,362 
if (sym->attr.is_bind_c == 1 && sym->binding_label)
  return get_identifier (sym

[PATCH][ARM] PR78255: wrong code generation for indirect sibling calls

2016-11-10 Thread Andre Vieira (lists)
Hi,

As reported in PR78255 there is currently an issue with indirect sibling
calls in ARM when the address of the sibling call is loaded into 'r3'
and that same register is chosen to align the stack.  See the report for
further information.

As I mentioned in the bugzilla ticket I am not sure this is the right
approach, though it works... Bootstrapped on ARM and no regressions.

Do you think this is OK? Another solution would be to make sure that
'arm_get_frame_offsets' recalculates offsets after we know that the call
is going to be indirect, i.e. after we know the address is going to be
loaded into a register, but I do not know what a sane way would be to
ensure this.

Regards,
Andre

gcc/ChangeLog
2016-11-10  Andre Vieira  

* config/arm/arm.md (sibcall_internal): Add 'use' to pattern.
(sibcall_value_internal): Likewise.
(sibcall_insn): Likewise.
(sibcall_value_insn): Likewise.


gcc/testsuite/ChangeLog
2016-11-10  Andre Vieira  

* gcc.target/arm/pr78255.c: New.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..ab28b15f3e4ebbaca2b8ec0523493b54cce8c306
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8192,7 +8192,8 @@
   [(parallel [(call (match_operand 0 "memory_operand" "")
(match_operand 1 "general_operand" ""))
  (return)
- (use (match_operand 2 "" ""))])])
+ (use (match_operand 2 "" ""))
+ (use (match_dup 0))])])
 
 ;; We may also be able to do sibcalls for Thumb, but it's much harder...
 (define_expand "sibcall"
@@ -8225,7 +8226,8 @@
   (call (match_operand 1 "memory_operand" "")
 (match_operand 2 "general_operand" "")))
  (return)
- (use (match_operand 3 "" ""))])])
+ (use (match_operand 3 "" ""))
+ (use (match_dup 1))])])
 
 (define_expand "sibcall_value"
   [(parallel [(set (match_operand 0 "" "")
@@ -8258,7 +8260,8 @@
  [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs, US"))
(match_operand 1 "" ""))
   (return)
-  (use (match_operand 2 "" ""))]
+  (use (match_operand 2 "" ""))
+  (use (match_operand 3 "" ""))]
   "TARGET_32BIT && SIBLING_CALL_P (insn)"
   "*
   if (which_alternative == 1)
@@ -8279,7 +8282,8 @@
(call (mem:SI (match_operand:SI 1 "call_insn_operand" "Cs,US"))
 (match_operand 2 "" "")))
   (return)
-  (use (match_operand 3 "" ""))]
+  (use (match_operand 3 "" ""))
+  (use (match_operand 4 "" ""))]
   "TARGET_32BIT && SIBLING_CALL_P (insn)"
   "*
   if (which_alternative == 1)
diff --git a/gcc/testsuite/gcc.target/arm/pr78255.c 
b/gcc/testsuite/gcc.target/arm/pr78255.c
new file mode 100644
index 
..4901acea51466c0bac92d9cb90e52b00b450d88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2" }  */
+
+#include 
+
+struct table_s
+{
+void (*fun0)
+( void );
+void (*fun1)
+( void );
+void (*fun2)
+( void );
+void (*fun3)
+( void );
+void (*fun4)
+( void );
+void (*fun5)
+( void );
+void (*fun6)
+( void );
+void (*fun7)
+( void );
+} table;
+
+void callback0(){__asm("mov r0, r0 \n\t");}
+void callback1(){__asm("mov r0, r0 \n\t");}
+void callback2(){__asm("mov r0, r0 \n\t");}
+void callback3(){__asm("mov r0, r0 \n\t");}
+void callback4(){__asm("mov r0, r0 \n\t");}
+
+void test (void) {
+memset(&table, 0, sizeof table);
+
+asm volatile ("" : : : "r3");
+
+table.fun0 = callback0;
+table.fun1 = callback1;
+table.fun2 = callback2;
+table.fun3 = callback3;
+table.fun4 = callback4;
+table.fun0();
+}
+
+void foo (void)
+{
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  unsigned long p = (unsigned long) &foo;
+  asm volatile ("mov r3, %0" : : "r" (p));
+  test ();
+
+  return 0;
+}


Unreviewed libstdc++ patch

2016-11-10 Thread Rainer Orth
The libstdc++ part of the following patch

[fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+ 

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00330.html

has remained unreviewed for a week.  Bruce already approved the
fixincludes part.

In the meantime, full testing on mainline has completed and backports to
the gcc-5 and gcc-6 branches have also been tested successfully.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Support no newline in print_gimple_stmt

2016-11-10 Thread Martin Liška
I've just noticed that tree-ssa-dse wrongly prints a new line to dump file.
For the next stage1, I'll go through usages of print_gimple_stmt and remove
extra new lines like:

gcc/auto-profile.c:  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
gcc/auto-profile.c-  fprintf (dump_file, "\n");

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From ab1ed77381f78a8940ca250ee0f5ef5cd6b87e7f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 10 Nov 2016 14:54:00 +0100
Subject: [PATCH] Support no newline in print_gimple_stmt

gcc/ChangeLog:

2016-11-10  Martin Liska  

	* gimple-pretty-print.c (print_gimple_stmt): Add new argument.
	* gimple-pretty-print.h (print_gimple_stmt): Declare the new
	argument.
	* tree-ssa-dse.c (dse_optimize_stmt): Use the argument to not to
	print a newline.
---
 gcc/gimple-pretty-print.c | 7 +--
 gcc/gimple-pretty-print.h | 2 +-
 gcc/tree-ssa-dse.c| 5 +++--
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f588f5e..8538911 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -76,13 +76,16 @@ debug_gimple_stmt (gimple *gs)
FLAGS as in pp_gimple_stmt_1.  */
 
 void
-print_gimple_stmt (FILE *file, gimple *g, int spc, int flags)
+print_gimple_stmt (FILE *file, gimple *g, int spc, int flags,
+		   bool newline)
 {
   pretty_printer buffer;
   pp_needs_newline (&buffer) = true;
   buffer.buffer->stream = file;
   pp_gimple_stmt_1 (&buffer, g, spc, flags);
-  pp_newline_and_flush (&buffer);
+  if (newline)
+pp_newline (&buffer);
+  pp_flush (&buffer);
 }
 
 DEBUG_FUNCTION void
diff --git a/gcc/gimple-pretty-print.h b/gcc/gimple-pretty-print.h
index f8eef99..6890cfb 100644
--- a/gcc/gimple-pretty-print.h
+++ b/gcc/gimple-pretty-print.h
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 extern void debug_gimple_stmt (gimple *);
 extern void debug_gimple_seq (gimple_seq);
 extern void print_gimple_seq (FILE *, gimple_seq, int, int);
-extern void print_gimple_stmt (FILE *, gimple *, int, int);
+extern void print_gimple_stmt (FILE *, gimple *, int, int, bool newline = true);
 extern void debug (gimple &ref);
 extern void debug (gimple *ptr);
 extern void print_gimple_expr (FILE *, gimple *, int, int);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 372a0be..b1a8c5d 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -237,7 +237,8 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 		{
 		  fprintf (dump_file, "  Deleted dead call '");
-		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
+		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0,
+ false);
 		  fprintf (dump_file, "'\n");
 		}
 
@@ -293,7 +294,7 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
   if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "  Deleted dead store '");
-	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
+	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0, false);
 	  fprintf (dump_file, "'\n");
 	}
 
-- 
2.10.1



Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-11-10 Thread Felipe Magno de Almeida
Hello,

Sorry for top-posting, but this is a ping for the attached patch.

The patch doesn't seem to have been applied nor refused. So I'm
pinging to see if I need to change something? I already have a
copyright assignment now.

I'm attaching a updated patch that doesn't conflict in the Changelog
file.

Regards,

On Fri, Sep 16, 2016 at 3:37 AM, Felipe Magno de Almeida
 wrote:
> Hello,
>
> Another patch.
>
> On Fri, Sep 16, 2016 at 2:53 AM, Felipe Magno de Almeida
>  wrote:
>> On Fri, Sep 16, 2016 at 2:42 AM, Marc Glisse  wrote:
>>> On Thu, 15 Sep 2016, Felipe Magno de Almeida wrote:
>>>
>>> +   || sizeof(uint32_t) == sizeof(void*)
>>> +|| sizeof(uint16_t) == sizeof(void*),
>>>
>>> Indentation is off?
>>>
 Call _M_extract_* functions family through temporary int objects
>>>
>>>
>>> Would it make sense to use a template type instead of int for this
>>> parameter? Or possibly have a typedef that defaults to int (what POSIX
>>> requires). The hard case would be a libc that uses bitfields for the fields
>>> of struct tm (that could save some space), but I don't think anyone does
>>> that.
>>
>> I've tried both approaches. Templates were causing problems of not
>> defined instantations because they were being used as ints too
>> in other _M_extract functions through a tmp integer. And typedef's
>> caused the same problem of having to use a tmp value of the right
>> type but for example _M_extract_wday_or_month could not have the
>> same type (in AVR they do) and I'd have to use a temporary anyway
>> then.
>>
>> This was the least intrusive way.
>>
 float pointing
>>>
>>> floating point?
>>
>> :D. Yes.
>>
>>> A ChangeLog entry would also help.
>>
>> OK.
>>
>>> --
>>> Marc Glisse
>>
>> Regards,
>> --
>> Felipe Magno de Almeida
>
>
>
> --
> Felipe Magno de Almeida



-- 
Felipe Magno de Almeida
diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f405ccd..b0efe72 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,17 @@
+2016-11-10  Felipe Magno de Almeida 
+
+	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
+
+2016-11-10  Felipe Magno de Almeida 
+
+	* include/bits/locale_facets_nonio.tcc: Avoid compilation errors
+	with non-standard struct tm.
+
+2016-11-10  Felipe Magno de Almeida 
+
+	* crossconfig.m4: Add avr target for cross-compilation
+	* configure: Regenerate
+
 2016-11-09  Tim Shen  
 
 	* libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()):
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 8481a48..5e3f783 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28840,6 +28840,55 @@ case "${host}" in
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr*-*-*)
+$as_echo "#define HAVE_ACOSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ASINF 1" >>confdefs.h
+
+$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h
+
+$as_echo "#define HAVE_ATANF 1" >>confdefs.h
+
+$as_echo "#define HAVE_CEILF 1" >>confdefs.h
+
+$as_echo "#define HAVE_COSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_COSHF 1" >>confdefs.h
+
+$as_echo "#define HAVE_EXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FABSF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FLOORF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FMODF 1" >>confdefs.h
+
+$as_echo "#define HAVE_FREXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SQRTF 1" >>confdefs.h
+
+$as_echo "#define HAVE_HYPOTF 1" >>confdefs.h
+
+$as_echo "#define HAVE_LDEXPF 1" >>confdefs.h
+
+$as_echo "#define HAVE_LOG10F 1" >>confdefs.h
+
+$as_echo "#define HAVE_LOGF 1" >>confdefs.h
+
+$as_echo "#define HAVE_MODFF 1" >>confdefs.h
+
+$as_echo "#define HAVE_POWF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SINF 1" >>confdefs.h
+
+$as_echo "#define HAVE_SINHF 1" >>confdefs.h
+
+$as_echo "#define HAVE_TANF 1" >>confdefs.h
+
+$as_echo "#define HAVE_TANHF 1" >>confdefs.h
+
+;;
+
   mips*-sde-elf*)
 # These definitions are for the SDE C library rather than newlib.
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
diff --git a/libstdc++-v3/crossconfig.m4 b/libstdc++-v3/crossconfig.m4
index 6abc84f..2b955ec 100644
--- a/libstdc++-v3/crossconfig.m4
+++ b/libstdc++-v3/crossconfig.m4
@@ -9,6 +9,32 @@ case "${host}" in
 # This is a freestanding configuration; there is nothing to do here.
 ;;
 
+  avr*-*-*)
+AC_DEFINE(HAVE_ACOSF)
+AC_DEFINE(HAVE_ASINF)
+AC_DEFINE(HAVE_ATAN2F)
+AC_DEFINE(HAVE_ATANF)
+AC_DEFINE(HAVE_CEILF)
+AC_DEFINE(HAVE_COSF)
+AC_DEFINE(HAVE_COSHF)
+AC_DEFINE(HAVE_EXPF)
+AC_DEFINE(HAVE_FABSF)
+AC_DEFINE(HAVE_FLOORF)
+AC_DEFINE(HAVE_FMODF)
+AC_DEFINE(HAVE_FREXPF)
+AC_DEFINE(HAVE_SQRTF)
+AC_DEFINE(HAVE_HYPOTF)
+AC_DEFINE(HAVE_LDEXPF)
+AC_DEFINE(HAVE_LOG10F)
+AC_DEFINE(HAVE_LOGF)
+AC_DEFINE(HAVE_MODFF)
+AC_DEFINE(HAVE_POWF)
+AC_DEFINE(HAVE_SINF)
+AC_DEFINE(HAVE

Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 05:19 AM, Martin Liška wrote:


On 10/13/2016 05:34 PM, Martin Liška wrote:

Hello.

As it's very hard to guess from GCC driver whether a target supports atomic 
updates
for GCOV counter or not, I decided to come up with a new option value 
(maybe-atomic),
that would be transformed in a corresponding value (single or atomic) in 
tree-profile.c.
The GCC driver selects the option when -pthread is present in the command line.

That should fix all tests failures seen on AIX target.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?


I dislike this.  If it's hard for gcc itself to know, how much harder 
for the user must it be?   (does gcc have another instance of an option 
that behaves 'prefer-A-or-B-if-you-can't'?


It's also not clear what problem it's solving for the user?  If the user 
needs atomic update, they should get a hard error if the target doesn't 
support it.  If they don't need atomic, why ask for it?


But as ever, I'm not going to veto it.

nathan

--
Nathan Sidwell


Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-11-10 Thread Felipe Magno de Almeida
On Thu, Nov 10, 2016 at 1:39 PM, Felipe Magno de Almeida
 wrote:
> Hello,
>
> Sorry for top-posting, but this is a ping for the attached patch.
>
> The patch doesn't seem to have been applied nor refused. So I'm
> pinging to see if I need to change something? I already have a
> copyright assignment now.
>
> I'm attaching a updated patch that doesn't conflict in the Changelog
> file.

Reattaching the patch with the correct git commit descriptions.

> Regards,

[snip]

> --
> Felipe Magno de Almeida

Kind regards,
-- 
Felipe Magno de Almeida
From 2e62f3a4b3c307e2abec0cb0302ec601b8f693e8 Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 15:54:50 -0300
Subject: [PATCH 1/3] Add #ifdef case for 16 bits in cow-stdexcept.cc

Added #ifdef case for when void* is 16 bits so it compiles in AVR
target.
---
 libstdc++-v3/ChangeLog  |  4 
 libstdc++-v3/src/c++11/cow-stdexcept.cc | 11 ---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f405ccd..80d2e9d 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,7 @@
+2016-11-10  Felipe Magno de Almeida 
+
+	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
+
 2016-11-09  Tim Shen  
 
 	* libstdc++-v3/include/bits/regex.h (regex_iterator::regex_iterator()):
diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index 31a89df..641b372 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -208,6 +208,8 @@ extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
 extern void _ZGTtdlPv (void* ptr) __attribute__((weak));
 extern uint8_t _ITM_RU1(const uint8_t *p)
   ITM_REGPARM __attribute__((weak));
+extern uint16_t _ITM_RU2(const uint16_t *p)
+  ITM_REGPARM __attribute__((weak));
 extern uint32_t _ITM_RU4(const uint32_t *p)
   ITM_REGPARM __attribute__((weak));
 extern uint64_t _ITM_RU8(const uint64_t *p)
@@ -272,12 +274,15 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s,
 static void* txnal_read_ptr(void* const * ptr)
 {
   static_assert(sizeof(uint64_t) == sizeof(void*)
-		|| sizeof(uint32_t) == sizeof(void*),
-		"Pointers must be 32 bits or 64 bits wide");
+		|| sizeof(uint32_t) == sizeof(void*)
+		|| sizeof(uint16_t) == sizeof(void*),
+		"Pointers must be 16 bits, 32 bits or 64 bits wide");
 #if __UINTPTR_MAX__ == __UINT64_MAX__
   return (void*)_ITM_RU8((const uint64_t*)ptr);
-#else
+#elif __UINTPTR_MAX__ == __UINT32_MAX__
   return (void*)_ITM_RU4((const uint32_t*)ptr);
+#else
+  return (void*)_ITM_RU2((const uint16_t*)ptr);
 #endif
 }
 
-- 
2.10.2


From 7ed4af72fe0bdee1a38c7487955590fb64f76a5d Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 18:52:57 -0300
Subject: [PATCH 2/3] Use temporary int objects to access struct tm members

Call _M_extract_* functions family through temporary int objects, so
it doesn't convert from lvalue to rvalue through a temporary in AVR
because of the incompatible types used in AVR-Libc.

This fixes compilation errors with AVR-Libc while compiling libstdc++
for AVR target.
---
 libstdc++-v3/ChangeLog|  5 +++
 libstdc++-v3/include/bits/locale_facets_nonio.tcc | 44 ---
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 80d2e9d..80c75c6 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,10 @@
 2016-11-10  Felipe Magno de Almeida 
 
+	* include/bits/locale_facets_nonio.tcc: Avoid compilation errors
+	with non-standard struct tm.
+
+2016-11-10  Felipe Magno de Almeida 
+
 	* src/c++11/cow-stdexcept.cc: Add special case for 16 bit pointers
 
 2016-11-09  Tim Shen  
diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 1a4f9a0..cc9d2df 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -659,30 +659,38 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
 		  // Abbreviated weekday name [tm_wday]
 		  const char_type*  __days1[7];
 		  __tp._M_days_abbreviated(__days1);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days1,
+  __tm->tm_wday = __mem;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days1,
 	  7, __io, __tmperr);
+  __mem = __tm->tm_wday;
 		  break;
 		case 'A':
 		  // Weekday name [tm_wday].
 		  const char_type*  __days2[7];
 		  __tp._M_days(__days2);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days2,
+  __mem = __tm->tm_wday;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days2,
 	  7, __io, __tmperr);
+  __tm->tm_wday = __mem;
 		  break;
 		case 'h':
 		case 'b':
 		  // Abbreviated month name [tm_mon]
 		  const char_type*  __months1[12];
 		  __tp._M_months_abbrevi

Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 10:43 AM, Nathan Sidwell  wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
>
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:

 Hello.

 As it's very hard to guess from GCC driver whether a target supports
 atomic updates
 for GCOV counter or not, I decided to come up with a new option value
 (maybe-atomic),
 that would be transformed in a corresponding value (single or atomic) in
 tree-profile.c.
 The GCC driver selects the option when -pthread is present in the
 command line.

 That should fix all tests failures seen on AIX target.

 Patch can bootstrap on ppc64le-redhat-linux and survives regression
 tests.

 Ready to be installed?
>
>
> I dislike this.  If it's hard for gcc itself to know, how much harder for
> the user must it be?   (does gcc have another instance of an option that
> behaves 'prefer-A-or-B-if-you-can't'?
>
> It's also not clear what problem it's solving for the user?  If the user
> needs atomic update, they should get a hard error if the target doesn't
> support it.  If they don't need atomic, why ask for it?
>
> But as ever, I'm not going to veto it.

Do you have a better suggestion?

gcc.c now imposes profile-update=atomic if -pthread is used, even if
the target does not support profile-update=atomic.

Either gcc.c must not impose profile-update=atomic or we need some way
of differentiating between when the request should fail because the
user really expects it and when the request should silently and gently
be ignored.

The atomic update feature is nice, but currently GCC is trying to be
too smart to guess how important the feature is to the user.

Thanks, David


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-10 Thread James Greenhalgh

On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote:
> On Wed, 9 Nov 2016, Bernd Edlinger wrote:
>
> > Yes, but maybe introduce a test if the half-wide value fits?
> >
> > like:
> >
> > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
>
> Something like that.

In patch form, that would look like this...

I've checked on my ARM and AArch64 trees with _Float16 support that this
lets the tests pass.

OK?

Thanks,
James

---
2016-11-10  James Greenhalgh  

* gcc.dg/torture/fp-int-convert.h (M_OK2): New, use it in
WVAL0S tests added in r241817.

diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
index bbe9666..2b904b6 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert.h
@@ -53,13 +53,14 @@ do {\
   TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U));		\
   TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U));		\
   TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U));		\
-  TEST_I_F_VAL (I, F, WVAL0S (I), 1);\
-  TEST_I_F_VAL (I, F, -WVAL0S (I), 1);\
+  TEST_I_F_VAL (I, F, WVAL0S (I), M_OK2 (M, U));		\
+  TEST_I_F_VAL (I, F, -WVAL0S (I), M_OK2 (M, U));		\
 } while (0)
 
 #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT)
 #define P_OK1(P, T) ((P) >= sizeof(T) * CHAR_BIT - 1)
 #define M_OK1(M, T) ((M) > sizeof(T) * CHAR_BIT - 1)
+#define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
 #define HVAL0U(P, U) (U)(P_OK (P, U)	 \
 			 ? (U)1		 \
 			 : (((U)1 << (sizeof(U) * CHAR_BIT - 1))	 \


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
> 
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
 Hello.

 As it's very hard to guess from GCC driver whether a target supports 
 atomic updates
 for GCOV counter or not, I decided to come up with a new option value 
 (maybe-atomic),
 that would be transformed in a corresponding value (single or atomic) in 
 tree-profile.c.
 The GCC driver selects the option when -pthread is present in the command 
 line.

 That should fix all tests failures seen on AIX target.

 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 Ready to be installed?
> 
> I dislike this.  If it's hard for gcc itself to know, how much harder for the 
> user must it be?   (does gcc have another instance of an option that behaves 
> 'prefer-A-or-B-if-you-can't'?
> 
> It's also not clear what problem it's solving for the user?  If the user 
> needs atomic update, they should get a hard error if the target doesn't 
> support it.  If they don't need atomic, why ask for it?

My initial motivation was to automatically selected -fprofile-update=atomic if 
supported by a target and when '-pthread' is present on command line.
As it's very problematic to identify (from GCC driver) whether a target 
supports or not atomic updates, 'maybe' option is the only possible we can 
guess.

> 
> But as ever, I'm not going to veto it.

Other option is to disable selection of -fprofile-update=atomic automatically.

Martin

> 
> nathan
> 



Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-10 Thread Segher Boessenkool
Hi Mike,

> I have built the spec 2006 CPU benchmark suite with these changes, and the
> power8 (ISA 2.07) code generation does not change.

Very good to hear :-)

Just some nits; okay for trunk with that fixed:

> +(define_split
> +  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
> + (sign_extend:EXTHI
> +  (match_operand:HI 1 "indexed_or_indirect_operand" "")))]
> +  "TARGET_P9_VECTOR && reload_completed"
> +  [(set (match_dup 2)
> + (match_dup 1))
> +   (set (match_dup 0)
> + (sign_extend:EXTHI (match_dup 2)))]
> +{
> +  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
> +})

Please lose the default "" (here and elsewhere).

> Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c
> ___
> Modified: svn:mergeinfo
>Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924
> 
> 
> Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c
> ___
> Modified: svn:mergeinfo
>Merged /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924

I don't know what this is?

Thanks!


Segher


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 07:43 AM, Nathan Sidwell wrote:

On 11/10/2016 05:19 AM, Martin Liška wrote:


On 10/13/2016 05:34 PM, Martin Liška wrote:

Hello.

As it's very hard to guess from GCC driver whether a target supports
atomic updates
for GCOV counter or not, I decided to come up with a new option
value (maybe-atomic),
that would be transformed in a corresponding value (single or
atomic) in tree-profile.c.
The GCC driver selects the option when -pthread is present in the
command line.

That should fix all tests failures seen on AIX target.

Patch can bootstrap on ppc64le-redhat-linux and survives regression
tests.

Ready to be installed?


I dislike this.  If it's hard for gcc itself to know, how much harder
for the user must it be?   (does gcc have another instance of an option
that behaves 'prefer-A-or-B-if-you-can't'?


Thinking further.  why isn't the right solution for 
-fprofile-update=atomic when faced with a target that cannot support it to:

a) issue an error and bail out at the first opportunity
b) or issue a warning and fall back to single threaded update?

For #b presumably there'll be the capability of suppressing that 
particular warning?


nathan

--
Nathan Sidwell


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 11:14 AM, Nathan Sidwell  wrote:
> On 11/10/2016 07:43 AM, Nathan Sidwell wrote:
>>
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
 On 10/13/2016 05:34 PM, Martin Liška wrote:
>
> Hello.
>
> As it's very hard to guess from GCC driver whether a target supports
> atomic updates
> for GCOV counter or not, I decided to come up with a new option
> value (maybe-atomic),
> that would be transformed in a corresponding value (single or
> atomic) in tree-profile.c.
> The GCC driver selects the option when -pthread is present in the
> command line.
>
> That should fix all tests failures seen on AIX target.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression
> tests.
>
> Ready to be installed?
>>
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder
>> for the user must it be?   (does gcc have another instance of an option
>> that behaves 'prefer-A-or-B-if-you-can't'?
>
>
> Thinking further.  why isn't the right solution for -fprofile-update=atomic
> when faced with a target that cannot support it to:
> a) issue an error and bail out at the first opportunity
> b) or issue a warning and fall back to single threaded update?
>
> For #b presumably there'll be the capability of suppressing that particular
> warning?

Because that incorrectly breaks a huge portion of the testsuite.
that's not what the user intended.

- David


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 07:55 AM, David Edelsohn wrote:


gcc.c now imposes profile-update=atomic if -pthread is used, even if
the target does not support profile-update=atomic.


ah, that's where this is coming from.

nathan

--
Nathan Sidwell


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Martin Liška
On 11/10/2016 05:17 PM, David Edelsohn wrote:
> Maybe instead of adding "maybe", we need to change the severity of the
> warning so that the warning is not emitted by default.

Adding the warning option to -Wextra can be solution. Is it acceptable
approach?

Martin


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Segher Boessenkool
Hi!

Great to see this.  Just a few comments...

On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote:
> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> +
> +static sbitmap
> +aarch64_get_separate_components (void)
> +{
> +  /* Calls to alloca further extend the stack frame and it can be messy to
> + figure out the location of the stack slots for each register.
> + For now be conservative.  */
> +  if (cfun->calls_alloca)
> +return NULL;

The generic code already disallows functions with alloca (in
try_shrink_wrapping_separate).

> +static void
> +aarch64_emit_prologue_components (sbitmap components)
> +{
> +  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
> +  ? HARD_FRAME_POINTER_REGNUM
> +  : STACK_POINTER_REGNUM);
> +
> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> +if (bitmap_bit_p (components, regno))
> +  {
> + rtx reg = gen_rtx_REG (Pmode, regno);
> + HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
> + if (!frame_pointer_needed)
> + offset += cfun->machine->frame.frame_size
> +   - cfun->machine->frame.hard_fp_offset;
> + rtx addr = plus_constant (Pmode, ptr_reg, offset);
> + rtx mem = gen_frame_mem (Pmode, addr);
> +
> + RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1;
> +  }
> +}

I think you should emit the CFI notes here directly, just like for the
epilogue components.


Segher


[PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
Hi,

this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires additional patch for register allocator from Vladimir
Makarov to be committed before.

gcc/
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX5124FMAPS_SET,
OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
OPTION_MASK_ISA_AVX5124VNNIW_SET,
OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
(ix86_handle_option): Handle OPT_mavx5124fmaps,
OPT_mavx5124vnniw.
* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
* config/i386/avx5124fmapsintrin.h: New file.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/constraints.md (h): New constraint.
* config/i386/cpuid.h: (bit_AVX5124VNNIW,
bit_AVX5124FMAPS): New.
* config/i386/driver-i386.c (host_detect_local_cpu):
Detect avx5124fmaps, avx5124vnniw.
* config/i386/i386-builtin-types.def: Add types
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
__builtin_ia32_vp4dpwssds_mask): New.
* config/i386/i386-c.c (ix86_target_macros_internal):
Define __AVX5124FMAPS__, __AVX5124VNNIW__.
* config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256),
VECTOR_MODE (INT, SI, 64)): New modes.
* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
-mavx5124vnniw.
(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
avx5124vnniw.
(ix86_expand_builtin): Handle new builtins.
(ix86_additional_allocno_class_p): New.
* config/i386/i386.h (TARGET_AVX5124FMAPS,
TARGET_AVX5124FMAPS_P,
TARGET_AVX5124VNNIW,
TARGET_AVX5124VNNIW_P): Define.
(reg_class): Add MOD4_SSE_REGS.
(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
avx5124vnniwintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
UNSPEC_VP4FNMADD,
UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
(define_mode_iterator IMOD4): New.
(define_mode_attr imod4_narrow): Ditto.
(define_insn "mov"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
* genmodes.c (mode_size_inline): Extend return type.
* machmode.h (mode_size, mode_base_align): Extend type.
gcc/testsuite/
* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmaps-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
   

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> Hi,
> 
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
> 
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
tabs), can you repost as attachment or configure your MUA not to do this?

Just a couple of random nits follow:

> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.

This mentions an option that doesn't exist, is that s/dd// ?

> * gcc.target/i386/sse-13.c: Ditto.

> @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
>   {
>opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> +
> +  //turn off additional isa flags

Comments start with capital letter, end with ., there should be space
between // and T, better use /* ... */ style comment to match other
comments in the file.

> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +
>   }

The formatting looks very weird.

Jakub


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread David Edelsohn
On Thu, Nov 10, 2016 at 10:58 AM, Martin Liška  wrote:
> On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
 On 10/13/2016 05:34 PM, Martin Liška wrote:
> Hello.
>
> As it's very hard to guess from GCC driver whether a target supports 
> atomic updates
> for GCOV counter or not, I decided to come up with a new option value 
> (maybe-atomic),
> that would be transformed in a corresponding value (single or atomic) in 
> tree-profile.c.
> The GCC driver selects the option when -pthread is present in the command 
> line.
>
> That should fix all tests failures seen on AIX target.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder for 
>> the user must it be?   (does gcc have another instance of an option that 
>> behaves 'prefer-A-or-B-if-you-can't'?
>>
>> It's also not clear what problem it's solving for the user?  If the user 
>> needs atomic update, they should get a hard error if the target doesn't 
>> support it.  If they don't need atomic, why ask for it?
>
> My initial motivation was to automatically selected -fprofile-update=atomic 
> if supported by a target and when '-pthread' is present on command line.
> As it's very problematic to identify (from GCC driver) whether a target 
> supports or not atomic updates, 'maybe' option is the only possible we can 
> guess.
>
>>
>> But as ever, I'm not going to veto it.
>
> Other option is to disable selection of -fprofile-update=atomic automatically.

Unfortunately, this cannot use a configure test or manually set value
based on target because the same gcc.c driver is invoked with
different options that may provide atomic update in some variants and
no atomic update in other (e.g., -m64) because the profile counter is
64 bits.

Maybe instead of adding "maybe", we need to change the severity of the
warning so that the warning is not emitted by default.

- David


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-10 Thread Kyrill Tkachov

On 10/11/16 16:26, Segher Boessenkool wrote:

Hi!


Hi,



Great to see this.  Just a few comments...

On Thu, Nov 10, 2016 at 02:25:47PM +, Kyrill Tkachov wrote:

+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
+
+static sbitmap
+aarch64_get_separate_components (void)
+{
+  /* Calls to alloca further extend the stack frame and it can be messy to
+ figure out the location of the stack slots for each register.
+ For now be conservative.  */
+  if (cfun->calls_alloca)
+return NULL;

The generic code already disallows functions with alloca (in
try_shrink_wrapping_separate).


Ok, I'll remove this.


+static void
+aarch64_emit_prologue_components (sbitmap components)
+{
+  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+? HARD_FRAME_POINTER_REGNUM
+: STACK_POINTER_REGNUM);
+
+  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
+if (bitmap_bit_p (components, regno))
+  {
+   rtx reg = gen_rtx_REG (Pmode, regno);
+   HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+   if (!frame_pointer_needed)
+   offset += cfun->machine->frame.frame_size
+ - cfun->machine->frame.hard_fp_offset;
+   rtx addr = plus_constant (Pmode, ptr_reg, offset);
+   rtx mem = gen_frame_mem (Pmode, addr);
+
+   RTX_FRAME_RELATED_P (emit_move_insn (mem, reg)) = 1;
+  }
+}

I think you should emit the CFI notes here directly, just like for the
epilogue components.


The prologue code in expand_prologue doesn't attach any explicit notes,
so I didn't want to deviate from that. Looking at the powerpc implementation,
would that be a REG_CFA_OFFSET with the (SET (mem) (reg)) expression for saving
the reg?

Thanks,
Kyrill



Segher




Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, James Greenhalgh wrote:

> 
> On Wed, Nov 09, 2016 at 10:16:35PM +, Joseph Myers wrote:
> > On Wed, 9 Nov 2016, Bernd Edlinger wrote:
> >
> > > Yes, but maybe introduce a test if the half-wide value fits?
> > >
> > > like:
> > >
> > > #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)
> >
> > Something like that.
> 
> In patch form, that would look like this...
> 
> I've checked on my ARM and AArch64 trees with _Float16 support that this
> lets the tests pass.
> 
> OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols

2016-11-10 Thread Jakub Jelinek
Hi!

It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols 
that are
not visible outside of its TU (so likely only inline asm or tools
looking at .symtab STB_LOCAL symbols would notice).  Perhaps that is fine
for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and
I think most of the users don't really care about mangling of those symbols.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-10  Jakub Jelinek  

* mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for
public or external symbols.

* g++.dg/cpp1z/noexcept-type14.C: New test.
* g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options.

--- gcc/cp/mangle.c.jj  2016-11-09 23:55:59.0 +0100
+++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100
@@ -3836,7 +3836,8 @@ mangle_decl (const tree decl)
 }
   SET_DECL_ASSEMBLER_NAME (decl, id);
 
-  if (G.need_cxx1z_warning)
+  if (G.need_cxx1z_warning
+  && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl)))
 warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat,
"mangled name for %qD will change in C++17 because the "
"exception specification is part of a function type",
--- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 
10:14:42.898857481 +0100
+++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C2016-11-10 
10:22:46.412741092 +0100
@@ -0,0 +1,26 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wall" }
+
+#define A asm volatile ("" : : : "memory")
+void foo () throw () {}
+extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" }
+void f2 (decltype (foo) *);// { dg-bogus "mangled name" }
+extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { 
target c++14_down } }
+void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { 
target c++14_down } }
+void f5 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { 
target c++14_down } }
+static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" }
+namespace N {
+void f7 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { 
target c++14_down } }
+}
+namespace {
+void f8 (decltype (foo) *) { A; }  // { dg-bogus "mangled name" }
+}
+void bar ()
+{
+  f3 (foo);
+  f4 (foo);
+  f5 (foo);
+  f6 (foo);
+  N::f7 (foo);
+  f8 (foo);
+}
--- gcc/testsuite/g++.dg/asan/asan_test.C.jj2016-11-10 00:00:00.0 
+0100
+++ gcc/testsuite/g++.dg/asan/asan_test.C   2016-11-10 13:28:55.348073019 
+0100
@@ -2,7 +2,7 @@
 // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } }
 // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
 // { dg-additional-sources "asan_globals_test-wrapper.cc" }
-// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall 
-Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 
-DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
+// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g 
-DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 
-DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
 // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } }
 // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { target 
arm*-*-* } }
 // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! 
run_expensive_tests } } }

Jakub


Re: [C++ PATCH] Only warn with -Wc++1z-compat about mangling of visible symbols

2016-11-10 Thread Jason Merrill
OK.

On Thu, Nov 10, 2016 at 8:52 AM, Jakub Jelinek  wrote:
> Hi!
>
> It seems -Wabi/-Wc++1z-compat warns about mangling changes even for symbols 
> that are
> not visible outside of its TU (so likely only inline asm or tools
> looking at .symtab STB_LOCAL symbols would notice).  Perhaps that is fine
> for -Wabi that isn't enabled in -Wall/-Wextra, but -Wc++1z-compat is, and
> I think most of the users don't really care about mangling of those symbols.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-10  Jakub Jelinek  
>
> * mangle.c (mangle_decl): Only emit -Wc++1z-compat warnings for
> public or external symbols.
>
> * g++.dg/cpp1z/noexcept-type14.C: New test.
> * g++.dg/asan/asan_test.C: Remove -Wno-c++1z-compat from dg-options.
>
> --- gcc/cp/mangle.c.jj  2016-11-09 23:55:59.0 +0100
> +++ gcc/cp/mangle.c 2016-11-10 10:14:26.914059686 +0100
> @@ -3836,7 +3836,8 @@ mangle_decl (const tree decl)
>  }
>SET_DECL_ASSEMBLER_NAME (decl, id);
>
> -  if (G.need_cxx1z_warning)
> +  if (G.need_cxx1z_warning
> +  && (TREE_PUBLIC (decl) || DECL_REALLY_EXTERN (decl)))
>  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc__1z_compat,
> "mangled name for %qD will change in C++17 because the "
> "exception specification is part of a function type",
> --- gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C.jj 2016-11-10 
> 10:14:42.898857481 +0100
> +++ gcc/testsuite/g++.dg/cpp1z/noexcept-type14.C2016-11-10 
> 10:22:46.412741092 +0100
> @@ -0,0 +1,26 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-Wall" }
> +
> +#define A asm volatile ("" : : : "memory")
> +void foo () throw () {}
> +extern void f1 (decltype (foo) *); // { dg-bogus "mangled name" }
> +void f2 (decltype (foo) *);// { dg-bogus "mangled name" }
> +extern void f3 (decltype (foo) *); // { dg-warning "mangled name" "" { 
> target c++14_down } }
> +void f4 (decltype (foo) *);// { dg-warning "mangled name" "" { 
> target c++14_down } }
> +void f5 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { 
> target c++14_down } }
> +static void f6 (decltype (foo) *) { A; }// { dg-bogus "mangled name" }
> +namespace N {
> +void f7 (decltype (foo) *) { A; }  // { dg-warning "mangled name" "" { 
> target c++14_down } }
> +}
> +namespace {
> +void f8 (decltype (foo) *) { A; }  // { dg-bogus "mangled name" }
> +}
> +void bar ()
> +{
> +  f3 (foo);
> +  f4 (foo);
> +  f5 (foo);
> +  f6 (foo);
> +  N::f7 (foo);
> +  f8 (foo);
> +}
> --- gcc/testsuite/g++.dg/asan/asan_test.C.jj2016-11-10 00:00:00.0 
> +0100
> +++ gcc/testsuite/g++.dg/asan/asan_test.C   2016-11-10 13:28:55.348073019 
> +0100
> @@ -2,7 +2,7 @@
>  // { dg-skip-if "" { *-*-* } { "*" } { "-O2" } }
>  // { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
>  // { dg-additional-sources "asan_globals_test-wrapper.cc" }
> -// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall 
> -Wno-c++1z-compat -Werror -g -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 
> -DASAN_HAS_BLACKLIST=0 -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" 
> }
> +// { dg-options "-std=c++11 -fsanitize=address -fno-builtin -Wall -Werror -g 
> -DASAN_UAR=0 -DASAN_HAS_EXCEPTIONS=1 -DASAN_HAS_BLACKLIST=0 
> -DSANITIZER_USE_DEJAGNU_GTEST=1 -lasan -lpthread -ldl" }
>  // { dg-additional-options "-DASAN_NEEDS_SEGV=1" { target { ! arm*-*-* } } }
>  // { dg-additional-options "-DASAN_LOW_MEMORY=1 -DASAN_NEEDS_SEGV=0" { 
> target arm*-*-* } }
>  // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! 
> run_expensive_tests } } }
>
> Jakub


[PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-10 Thread Jakub Jelinek
Hi!

On arm/aarch64 we ICE because some decls that make it here has non-NULL
DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that
doesn't fit into shwi would ICE similarly).  While it is arguably a FE bug
that it creates for VLA initialization from STRING_CST such a decl,
I believe we have some PRs about it already open.
I think it won't hurt to check for the large sizes properly even in this
function though, and punt on unexpected cases, or even extremely large ones.

Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
to test on arm and/or aarch64.  Ok for trunk if the testing there passes
too?

2016-11-10  Jakub Jelinek  

PR middle-end/78201
* varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
instead, return false if it is NULL, or doesn't fit into uhwi, or
is larger or equal to targetm.max_anchor_offset.

* g++.dg/opt/pr78201.C: New test.

--- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100
+++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100
@@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_
return false;
 
   /* Don't use section anchors for decls that won't fit inside a single
-anchor range to reduce the amount of instructions require to refer
+anchor range to reduce the amount of instructions required to refer
 to the entire declaration.  */
-  if (decl && DECL_SIZE (decl)
-&& tree_to_shwi (DECL_SIZE (decl))
-   >= (targetm.max_anchor_offset * BITS_PER_UNIT))
+  if (DECL_SIZE_UNIT (decl) == NULL_TREE
+ || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl))
+ || (tree_to_uhwi (DECL_SIZE_UNIT (decl))
+ >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset))
return false;
 
 }
--- gcc/testsuite/g++.dg/opt/pr78201.C.jj   2016-11-10 15:20:18.398660681 
+0100
+++ gcc/testsuite/g++.dg/opt/pr78201.C  2016-11-10 15:19:58.0 +0100
@@ -0,0 +1,13 @@
+// PR middle-end/78201
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct B { long d (); } *c;
+long e;
+
+void
+foo ()
+{
+  char a[e] = "";
+  c && c->d();
+}

Jakub


[committed] Bump fortran _OPENMP macro and openmp_version

2016-11-10 Thread Jakub Jelinek
Hi!

I think it is better to announce 4.5 than 4.0 at the current state.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2016-11-10  Jakub Jelinek  

gcc/fortran/
* cpp.c (cpp_define_builtins): Define _OPENMP to 201511 instead
of 201307.
* gfortran.texi: Mention partial OpenMP 4.5 support.
* intrinsic.texi: Update for OpenMP 4.5.
gcc/testsuite/
* gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
201307.
libgomp/
* omp_lib.f90.in (openmp_version): Change to 201511 from 201307.
* omp_lib.h.in (openmp_version): Likewise.
* testsuite/libgomp.fortran/openmp_version-1.f: Expect 201511 instead
of 201307.
* testsuite/libgomp.fortran/openmp_version-2.f90: Likewise.

--- gcc/fortran/cpp.c.jj2016-01-04 14:55:59.0 +0100
+++ gcc/fortran/cpp.c   2016-11-10 12:59:33.829477276 +0100
@@ -168,7 +168,7 @@ cpp_define_builtins (cpp_reader *pfile)
 cpp_define (pfile, "_OPENACC=201306");
 
   if (flag_openmp)
-cpp_define (pfile, "_OPENMP=201307");
+cpp_define (pfile, "_OPENMP=201511");
 
   /* The defines below are necessary for the TARGET_* macros.
 
--- gcc/fortran/gfortran.texi.jj2016-11-03 22:03:29.0 +0100
+++ gcc/fortran/gfortran.texi   2016-11-10 13:02:11.380466602 +0100
@@ -536,7 +536,8 @@ The current status of the support is can
 and @ref{TS 18508 status} sections of the documentation.
 
 Additionally, the GNU Fortran compilers supports the OpenMP specification
-(version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}).
+(version 4.0 and most of the features of the 4.5 version,
+@url{http://openmp.org/@/wp/@/openmp-specifications/}).
 There also is initial support for the OpenACC specification (targeting
 version 2.0, @uref{http://www.openacc.org/}).
 Note that this is an experimental feature, incomplete, and subject to
@@ -1999,7 +2000,7 @@ and environment variables that influence
 
 GNU Fortran strives to be compatible to the 
 @uref{http://openmp.org/wp/openmp-specifications/,
-OpenMP Application Program Interface v4.0}.
+OpenMP Application Program Interface v4.5}.
 
 To enable the processing of the OpenMP directive @code{!$omp} in
 free-form source code; the @code{c$omp}, @code{*$omp} and @code{!$omp}
--- gcc/fortran/intrinsic.texi.jj   2016-10-31 13:28:11.0 +0100
+++ gcc/fortran/intrinsic.texi  2016-11-10 13:02:57.463880355 +0100
@@ -14769,7 +14769,7 @@ with the following options: @code{-fno-u
 @section OpenMP Modules @code{OMP_LIB} and @code{OMP_LIB_KINDS}
 @table @asis
 @item @emph{Standard}:
-OpenMP Application Program Interface v4.0
+OpenMP Application Program Interface v4.5
 @end table
 
 
@@ -14783,8 +14783,8 @@ the named constants defined in the modul
 below.
 
 For details refer to the actual
-@uref{http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf,
-OpenMP Application Program Interface v4.0}.
+@uref{http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf,
+OpenMP Application Program Interface v4.5}.
 
 @code{OMP_LIB_KINDS} provides the following scalar default-integer
 named constants:
@@ -14799,7 +14799,7 @@ named constants:
 @code{OMP_LIB} provides the scalar default-integer
 named constant @code{openmp_version} with a value of the form
 @var{mm}, where @code{} is the year and @var{mm} the month
-of the OpenMP version; for OpenMP v4.0 the value is @code{201307}.
+of the OpenMP version; for OpenMP v4.5 the value is @code{201511}.
 
 The following scalar integer named constants of the
 kind @code{omp_sched_kind}:
--- gcc/testsuite/gfortran.dg/openmp-define-3.f90.jj2014-06-18 
09:11:57.0 +0200
+++ gcc/testsuite/gfortran.dg/openmp-define-3.f90   2016-11-10 
13:04:12.381927290 +0100
@@ -6,6 +6,6 @@
 # error _OPENMP not defined
 #endif
 
-#if _OPENMP != 201307
+#if _OPENMP != 201511
 # error _OPENMP defined to wrong value
 #endif
--- libgomp/omp_lib.f90.in.jj   2016-01-04 14:38:59.0 +0100
+++ libgomp/omp_lib.f90.in  2016-11-10 13:04:30.704694198 +0100
@@ -59,7 +59,7 @@
   module omp_lib
 use omp_lib_kinds
 implicit none
-integer, parameter :: openmp_version = 201307
+integer, parameter :: openmp_version = 201511
 
 interface
   subroutine omp_init_lock (svar)
--- libgomp/omp_lib.h.in.jj 2016-01-04 14:38:59.0 +0100
+++ libgomp/omp_lib.h.in2016-11-10 13:04:43.363533159 +0100
@@ -58,7 +58,7 @@
   parameter (omp_lock_hint_contended = 2)
   parameter (omp_lock_hint_nonspeculative = 4)
   parameter (omp_lock_hint_speculative = 8)
-  parameter (openmp_version = 201307)
+  parameter (openmp_version = 201511)
 
   external omp_init_lock, omp_init_nest_lock
   external omp_init_lock_with_hint
--- libgomp/testsuite/libgomp.fortran/openmp_version-1.f.jj 2014-06-18 
09:11:57.0 +0200
+++ libgomp/testsuite/libgomp.fortran/openmp_version-1.f2016-11-10 
13:05:04.255267386 +0100
@@ -4,6

[PATCH][AArch64] Tweak Cortex-A57 vector cost

2016-11-10 Thread Wilco Dijkstra
The existing vector costs stop some beneficial vectorization.  This is mostly 
due
to vector statement cost being set to 3 as well as vector loads having a higher
cost than scalar loads.  This means that even when we vectorize 4x, it is 
possible
that the cost of a vectorized loop is similar to the scalar version, and we fail
to vectorize.  For example for a particular loop the costs for -mcpu=generic 
are:

note: Cost model analysis: 
  Vector inside of loop cost: 146
  Vector prologue cost: 5
  Vector epilogue cost: 0
  Scalar iteration cost: 50
  Scalar outside cost: 0
  Vector outside cost: 5
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
note:   Runtime profitability threshold = 3
note:   Static estimate profitability threshold = 3
note: loop vectorized


While -mcpu=cortex-a57 reports:

note: Cost model analysis: 
  Vector inside of loop cost: 294
  Vector prologue cost: 15
  Vector epilogue cost: 0
  Scalar iteration cost: 74
  Scalar outside cost: 0
  Vector outside cost: 15
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 31
note:   Runtime profitability threshold = 30
note:   Static estimate profitability threshold = 30
note: not vectorized: vectorization not profitable.
note: not vectorized: iteration count smaller than user specified loop bound 
parameter or minimum profitable iterations (whichever is more conservative).


Using a cost of 3 for a vector operation suggests they are 3 times as
expensive as scalar operations.  Since most vector operations have a 
similar throughput as scalar operations, this is not correct.

Using slightly lower values for these heuristics now allows this loop
and many others to be vectorized.  On a proprietary benchmark the gain
from vectorizing this loop is around 15-30% which shows vectorizing it is
indeed beneficial.

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/aarch64/aarch64.c (cortexa57_vector_cost):
Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost 
=
   1, /* scalar_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
-  3, /* vec_stmt_cost  */
+  2, /* vec_stmt_cost  */
   3, /* vec_permute_cost  */
   8, /* vec_to_scalar_cost  */
   8, /* scalar_to_vec_cost  */
-  5, /* vec_align_load_cost  */
-  5, /* vec_unalign_load_cost  */
+  4, /* vec_align_load_cost  */
+  4, /* vec_unalign_load_cost  */
   1, /* vec_unalign_store_cost  */
   1, /* vec_store_cost  */
   1, /* cond_taken_branch_cost  */


gomp-nvptx branch status

2016-11-10 Thread Alexander Monakov
Hello,

I'd like to provide an overview of the gomp-nvptx branch status. In response to
this message I'll send two more emails, with libgomp and middle-end changes on
the branch.  Some of the changes to libgomp such as build machinery adaptations
have already received substantial comments in 2015, but the middle-end stuff is
mostly unreviewed I believe.

Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in omp-low.c,
as shown on the Cauldron.  SIMT outlining via gimplifier abuse is not there, and
neither is cloning of SIMD/SIMT loops.  Outlining is required for correctness,
and cloning is useful as it allows to avoid intermixing SIMD+SIMT and thus be
sure that SIMT lowering does not 'dirty' SIMD loops and regress host/MIC
vectorization.  I could argue that it's possible to improve my SIMT lowering to
avoid some dirtying (like moving loop-invariant calls to GOMP_SIMT_VF()), but
the need for outlining makes that moot anyway, I think.

To get great performance this will need further changes everywhere, including
in target-independent code, due to accidents like this bug (which I'd like to
ping given the topic): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706 

With OpenMP/PTX offloading there are 5 additional failures in 
check-target-libgomp:

Two due to tests using 'usleep' in a target region:
FAIL: libgomp.c/target-32.c (test for excess errors)
FAIL: libgomp.c/thread-limit-2.c (test for excess errors)

Two with 'target nowait' (not implemented)
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

One with 'target link' (not implemented)
FAIL: libgomp.c/target-link-1.c (test for excess errors)

Eventually these can be fixed by implementing the two missing OpenMP 4.5
features; for the 'usleep' issues, while I think it's not good to have tests
with that, eventually I'd like to provide a port of musl libc for PTX which
would also provide usleep (either a no-op stub, or based on a busy loop).

Short term, it should be possible to implement something like -foffload=^nvptx
to skip PTX (and only PTX) offloading on those tests.

Thanks.
Alexander


[PATCH 1/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
Currently the SBFM, UBFM and BFM instructions all use the attribute "bfm".
SBFM and UBFM include all shifts on AArch64, which are simpler than bitfield
insert.  Add a new bfx attribute for these instructions so that they can be
modelled more accurately in the future.  There is no difference in code 
generation.

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3)
Use bfx attribute.
(aarch64_lshr_sisd_or_int_3): Likewise.
(aarch64_ashr_sisd_or_int_3): Likewise.
(si3_insn_uxtw): Likewise.
(3_insn): Likewise.
(_ashl): Likewise.
(zero_extend_lshr): Likewise.
(extend_ashr): Likewise.
(): Likewise.
(insv): Likewise.
(andim_ashift_bfiz): Likewise.
* config/aarch64/thunderx.md (thunderx_shift): Add bfx.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise.
* config/arm/cortex-a57.md (cortex_a57_alu): Add bfx.
* config/arm/exynos-m1.md (exynos_m1_alu): Add bfx.
(exynos_m1_alu_p): Likewise.
* config/arm/types.md: Add bfx.
* config/arm/xgene1.md (xgene1_bfm): Add bfx.

--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
62eda569f9b642ac569a61718d7debf7eae1b59e..afd463602af4c3f19db8f8cc834aa8cf0b78867e
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3955,7 +3955,7 @@
shl\t%0, %1, %2
ushl\t%0, %1, %2"
   [(set_attr "simd" "no,no,yes,yes")
-   (set_attr "type" "bfm,shift_reg,neon_shift_imm, neon_shift_reg")]
+   (set_attr "type" "bfx,shift_reg,neon_shift_imm, neon_shift_reg")]
 )
 
 ;; Logical right shift using SISD or Integer instruction
@@ -3972,7 +3972,7 @@
#
#"
   [(set_attr "simd" "no,no,yes,yes,yes")
-   (set_attr "type" 
"bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+   (set_attr "type" 
"bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -4019,7 +4019,7 @@
#
#"
   [(set_attr "simd" "no,no,yes,yes,yes")
-   (set_attr "type" 
"bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+   (set_attr "type" 
"bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -4129,7 +4129,7 @@
   "@
\\t%w0, %w1, %2
\\t%w0, %w1, %w2"
-  [(set_attr "type" "bfm,shift_reg")]
+  [(set_attr "type" "bfx,shift_reg")]
 )
 
 (define_insn "*3_insn"
@@ -4141,7 +4141,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "\t%w0, %w1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*extr5_insn"
@@ -4234,7 +4234,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "bfiz\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*zero_extend_lshr"
@@ -4247,7 +4247,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "ubfx\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "*extend_ashr"
@@ -4260,7 +4260,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "sbfx\\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; ---
@@ -4283,7 +4283,7 @@
 (match_operand 3 "const_int_operand" "n")))]
   ""
   "bfx\\t%0, %1, %3, %2"
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; Bitfield Insert (insv)
@@ -4365,7 +4365,7 @@
  : GEN_INT ( - UINTVAL (operands[2]));
   return "bfiz\t%0, %1, %2, %3";
 }
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 ;; XXX We should match (any_extend (ashift)) here, like (and (ashift)) below
@@ -4377,7 +4377,7 @@
 (match_operand 3 "const_int_operand" "n")))]
   "aarch64_mask_and_shift_for_ubfiz_p (mode, operands[3], operands[2])"
   "ubfiz\\t%0, %1, %2, %P3"
-  [(set_attr "type" "bfm")]
+  [(set_attr "type" "bfx")]
 )
 
 (define_insn "bswap2"
diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
index 
058713a2ad98a364d36a3faaf0e93c39cb89adbc..7c1c28b0498cfe0129e3f0de7e29e31536fe421a
 100644
--- a/gcc/config/aarch64/thunderx.md
+++ b/gcc/config/aarch64/thunderx.md
@@ -39,7 +39,7 @@
 
 (define_insn_reservation "thunderx_shift" 1
   (and (eq_attr "tune" "thunderx")
-   (eq_attr "type" "bfm,extend,rotate_imm,shift_imm,shift_reg,rbit,rev"))
+   (eq_attr "type" 
"bfm,bfx,extend,rotate_imm,shift_imm,shift_reg,rbit,rev"))
   "thunderx_pipe0 | thunderx_pipe1")
 
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 
70c0f4daabe0ccb8e32808f1af51f5460e087a18..eb6d0b04976aaf441dd95cc43d02918226e75387
 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -93,7 +93,7 @@
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "alu_shift_imm,alus_shift_imm,
crc,logic_shift_imm,logics_

A RA patch necessary for new Intel insns generation

2016-11-10 Thread Vladimir N Makarov
Hi, the following patch is necessary for generation of new Intel insns 
requiring 4 aligned zmm regs.


Committed as rev. 242043.

Index: ChangeLog
===
--- ChangeLog	(revision 242040)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2016-11-10  Vladimir Makarov  
+
+	* target.def (additional_allocno_class_p): New.
+	* hooks.h (hook_bool_reg_class_t_false): New prototype.
+	* hooks.c (hook_bool_reg_class_t_false): New.
+	* ira.c (setup_allocno_and_important_classes): Use the new hook.
+	* doc/tm.texi.in (TARGET_ADDITIONAL_ALLOCNO_CLASS_P): Add it.
+	* doc/tm.texi: Update.
+
 2016-11-10  Jason Merrill  
 
 	* gengtype.c (new_structure): Append to structures list.
Index: hooks.h
===
--- hooks.h	(revision 242040)
+++ hooks.h	(working copy)
@@ -55,6 +55,7 @@ extern bool hook_bool_rtx_insn_true (rtx
 extern bool hook_bool_rtx_false (rtx);
 extern bool hook_bool_rtx_insn_int_false (rtx_insn *, int);
 extern bool hook_bool_uintp_uintp_false (unsigned int *, unsigned int *);
+extern bool hook_bool_reg_class_t_false (reg_class_t regclass);
 extern bool hook_bool_rtx_mode_int_int_intp_bool_false (rtx, machine_mode,
 			int, int, int *, bool);
 extern bool hook_bool_tree_tree_false (tree, tree);
Index: hooks.c
===
--- hooks.c	(revision 242040)
+++ hooks.c	(working copy)
@@ -466,3 +466,11 @@ hook_bool_uint_uintp_false (unsigned int
 {
   return false;
 }
+
+/* Generic hook that takes a register class and returns false.  */
+bool
+hook_bool_reg_class_t_false (reg_class_t regclass ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
Index: target.def
===
--- target.def	(revision 242040)
+++ target.def	(working copy)
@@ -5029,6 +5029,18 @@ DEFHOOK
  reg_class_t, (reg_class_t, machine_mode),
  NULL)
 
+/* Determine an additional allocno class.  */
+DEFHOOK
+(additional_allocno_class_p,
+ "This hook should return @code{true} if given class of registers should\
+  be an allocno class in any way.  Usually RA uses only one register\
+  class from all classes containing the same register set.  In some\
+  complicated cases, you need to have two or more such classes as\
+  allocno ones for RA correct work.  Not defining this hook is\
+  equivalent to returning @code{false} for all inputs.",
+ bool, (reg_class_t),
+ hook_bool_reg_class_t_false)
+
 DEFHOOK
 (cstore_mode,
  "This hook defines the machine mode to use for the boolean result of\
Index: ira.c
===
--- ira.c	(revision 242040)
+++ ira.c	(working copy)
@@ -1012,7 +1012,7 @@ setup_allocno_and_important_classes (voi
 temp_hard_regset2))
 	break;
 	}
-  if (j >= n)
+  if (j >= n || targetm.additional_allocno_class_p (i))
 	classes[n++] = (enum reg_class) i;
   else if (i == GENERAL_REGS)
 	/* Prefer general regs.  For i386 example, it means that
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 242040)
+++ doc/tm.texi.in	(working copy)
@@ -2507,6 +2507,8 @@ value that the middle-end intended.
 
 @hook TARGET_SPILL_CLASS
 
+@hook TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+
 @hook TARGET_CSTORE_MODE
 
 @hook TARGET_COMPUTE_PRESSURE_CLASSES
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 242040)
+++ doc/tm.texi	(working copy)
@@ -2899,6 +2899,10 @@ addressing.
 This hook defines a class of registers which could be used for spilling  pseudos of the given mode and class, or @code{NO_REGS} if only memory  should be used.  Not defining this hook is equivalent to returning  @code{NO_REGS} for all inputs.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ADDITIONAL_ALLOCNO_CLASS_P (reg_class_t)
+This hook should return @code{true} if given class of registers should  be an allocno class in any way.  Usually RA uses only one register  class from all classes containing the same register set.  In some  complicated cases, you need to have two or more such classes as  allocno ones for RA correct work.  Not defining this hook is  equivalent to returning @code{false} for all inputs.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_CSTORE_MODE (enum insn_code @var{icode})
 This hook defines the machine mode to use for the boolean result of  conditional store patterns.  The ICODE argument is the instruction code  for the cstore being performed.  Not definiting this hook is the same  as accepting the mode encoded into operand 0 of the cstore expander  patterns.
 @end deftypefn


Re: gomp-nvptx branch - libgomp changes

2016-11-10 Thread Alexander Monakov
libgomp/

* Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c.
* Makefile.in. Regenerate.
* configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
(LIBGOMP_USE_PTHREADS): ...here; new define.
* configure: Regenerate.
* config.h.in: Likewise.
* config/posix/affinity.c: Move to...
* affinity.c: ...here (new file).  Guard use of PThreads-specific
interface by LIBGOMP_USE_PTHREADS. 
* critical.c: Split out GOMP_atomic_{start,end} into...
* atomic.c: ...here (new file).
* env.c: Split out ICV definitions into...
* icv.c: ...here (new file) and...
* icv-device.c: ...here. New file.
* config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
(gomp_destroy_lock_30): Ditto.
(gomp_set_lock_30): Ditto.
(gomp_unset_lock_30): Ditto.
(gomp_test_lock_30): Ditto.
(gomp_init_nest_lock_30): Ditto.
(gomp_destroy_nest_lock_30): Ditto.
(gomp_set_nest_lock_30): Ditto.
(gomp_unset_nest_lock_30): Ditto.
(gomp_test_nest_lock_30): Ditto.
* lock.c: New.
* config/nvptx/lock.c: New.
* config/nvptx/bar.c: New.
* config/nvptx/bar.h: New.
* config/nvptx/doacross.h: New.
* config/nvptx/error.c: New.
* config/nvptx/icv-device.c: New.
* config/nvptx/mutex.h: New.
* config/nvptx/pool.h: New.
* config/nvptx/proc.c: New.
* config/nvptx/ptrlock.h: New.
* config/nvptx/sem.h: New.
* config/nvptx/simple-bar.h: New.
* config/nvptx/target.c: New.
* config/nvptx/task.c: New.
* config/nvptx/team.c: New.
* config/nvptx/time.c: New.
* config/posix/simple-bar.h: New.
* libgomp.h: Guard pthread.h inclusion.  Include simple-bar.h.
(gomp_num_teams_var): Declare.
(struct gomp_thread_pool): Change threads_dock member to
gomp_simple_barrier_t.
[__nvptx__] (gomp_thread): New implementation.
(gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
(gomp_thread_destructor): Ditto.
(gomp_init_thread_affinity): Ditto.
* team.c: Guard uses of PThreads-specific interfaces by
LIBGOMP_USE_PTHREADS.  Adjust all uses of threads_dock.
(gomp_free_thread) [__nvptx__]: Do not call 'free'.

* config/nvptx/alloc.c: Delete.
* config/nvptx/barrier.c: Ditto.
* config/nvptx/fortran.c: Ditto.
* config/nvptx/iter.c: Ditto.
* config/nvptx/iter_ull.c: Ditto.
* config/nvptx/loop.c: Ditto.
* config/nvptx/loop_ull.c: Ditto.
* config/nvptx/ordered.c: Ditto.
* config/nvptx/parallel.c: Ditto.
* config/nvptx/section.c: Ditto.
* config/nvptx/single.c: Ditto.
* config/nvptx/splay-tree.c: Ditto.
* config/nvptx/work.c: Ditto.

* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
-foffload=-lgfortran in addition to -lgfortran.
* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.

* plugin/plugin-nvptx.c: Include .
(struct targ_fn_descriptor): Add new fields.
(struct ptx_device): Ditto.  Set them...
(nvptx_open_device): ...here.
(nvptx_adjust_launch_bounds): New.
(nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
(link_ptx): Adjust log sizes.
(nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(nvptx_set_clocktick): New.  Use it...
(GOMP_OFFLOAD_load_image): ...here.  Set new targ_fn_descriptor
fields.
(GOMP_OFFLOAD_dev2dev): New.
(nvptx_adjust_launch_bounds): New.
(nvptx_stacks_size): New.
(nvptx_stacks_alloc): New.
(nvptx_stacks_free): New.
(GOMP_OFFLOAD_run): New.
(GOMP_OFFLOAD_async_run): New (stub).

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index a3e1c2b..4090336 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -58,12 +58,12 @@ libgomp_la_LDFLAGS = $(libgomp_version_info) 
$(libgomp_version_script) \
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgomp_la_LINK = $(LINK) $(libgomp_la_LDFLAGS)
 
-libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
-   iter_ull.c loop.c loop_ull.c ordered.c parallel.c sections.c single.c \
-   task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
-   time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \
-   oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
-   oacc-plugin.c oacc-cuda.c priority_queue.c
+libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
+   icv.c icv-device.c iter.c iter_ull.c loop.c loop_ull.c ordered.c \
+   parallel.c sections.c single.c task.c team.

Re: gomp-nvptx branch - middle-end changes

2016-11-10 Thread Alexander Monakov
gcc/
* internal-fn.c (expand_GOMP_SIMT_LANE): New.
(expand_GOMP_SIMT_VF): New.
(expand_GOMP_SIMT_LAST_LANE): New.
(expand_GOMP_SIMT_ORDERED_PRED): New.
(expand_GOMP_SIMT_VOTE_ANY): New.
(expand_GOMP_SIMT_XCHG_BFLY): New.
(expand_GOMP_SIMT_XCHG_IDX): New.
* internal-fn.def (GOMP_SIMT_LANE): New.
(GOMP_SIMT_VF): New.
(GOMP_SIMT_LAST_LANE): New.
(GOMP_SIMT_ORDERED_PRED): New.
(GOMP_SIMT_VOTE_ANY): New.
(GOMP_SIMT_XCHG_BFLY): New.
(GOMP_SIMT_XCHG_IDX): New.
* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
(create_omp_child_function): ...here.  Set "omp target entrypoint"
or "omp declare target" attribute based on is_gimple_omp_offloaded.
(omp_max_simt_vf): New.  Use it...
(omp_max_vf): ...here.
(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
(lower_omp_ordered): Likewise, for "ordered" lowering.
(expand_omp_simd): Add SIMT transforms.
(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
(execute_omp_device_lower): New.
(pass_data_omp_device_lower): New.
(pass_omp_device_lower): New pass.
(make_pass_omp_device_lower): New.
* passes.def (pass_omp_device_lower): Position new pass.
* tree-pass.h (PROP_gimple_lomp_dev): Define.
(make_pass_omp_device_lower): Declare.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index cbee97e..fd1cd8b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -157,6 +157,132 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (targetm.have_omp_simt_lane ());
+  emit_insn (targetm.gen_omp_simt_lane (target));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* Lane index of the first SIMT lane that supplies a non-zero argument.
+   This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
+   lane that executed the last iteration for handling OpenMP lastprivate.  */
+
+static void
+expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_last_lane ());
+  expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+}
+
+/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
+
+static void
+expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx ctr = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], ctr, mode);
+  gcc_assert (targetm.have_omp_simt_ordered ());
+  expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+}
+
+/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
+   any lane supplies a non-zero argument.  */
+
+static void
+expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_vote_any ());
+  expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+}
+
+/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
+   is destination lane index XOR given offset.  */
+
+static void
+expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx src = expand_normal (gimple_call_arg (stmt, 0));
+  rtx idx = expand_normal (gimple_call_arg (stmt, 1));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_op

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Vladimir N Makarov



On 11/10/2016 11:27 AM, Andrew Senkevich wrote:

Hi,

this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires additional patch for register allocator from Vladimir
Makarov to be committed before.



I've just committed the necessary patch.



[PATCH 2/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
The second patch updates the Cortex-A57 scheduler now that we can differentiate
between shifts and bitfield inserts.  The Cortex-A57 Software Optimization Guide
indicates that BFM operations use the integer multi-cycle pipeline, while ARM
UXTB/H instructions use the Integer 1 or Integer 0 pipelines, so swap the bfm
and extend reservations.  This results in minor scheduling differences.

I think the XGene-1 scheduler might need a similar change as currently all 
AArch64
shifts are modelled as 2-cycle operations.

ChangeLog:
2016-11-10  Wilco Dijkstra  

* config/arm/cortex-a57.md (cortex_a57_alu): Move extend here, move 
bfm...
(cortex_a57_alu_shift): ...here.

--
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index 
da461846baa5b28ce3d9c9f731dbfd7becb31a85..63072509e50375929f75c44af900a4803a6285f3
 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -297,7 +297,7 @@
(eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,\
alu_sreg,alus_sreg,logic_reg,logics_reg,\
adc_imm,adcs_imm,adc_reg,adcs_reg,\
-   adr,bfm,bfx,clz,rbit,rev,alu_dsp_reg,\
+   adr,bfx,extend,clz,rbit,rev,alu_dsp_reg,\
rotate_imm,shift_imm,shift_reg,\
mov_imm,mov_reg,\
mvn_imm,mvn_reg,\
@@ -307,7 +307,7 @@
 ;; ALU ops with immediate shift
 (define_insn_reservation "cortex_a57_alu_shift" 3
   (and (eq_attr "tune" "cortexa57")
-   (eq_attr "type" "extend,\
+   (eq_attr "type" "bfm,\
alu_shift_imm,alus_shift_imm,\
crc,logic_shift_imm,logics_shift_imm,\
mov_shift,mvn_shift"))



[PATCH][AArch64] Improve TI mode address offsets

2016-11-10 Thread Wilco Dijkstra
Improve TI mode address offsets - these may either use LDP of 64-bit or
LDR of 128-bit, so we need to use the correct intersection of offsets.
When splitting a large offset into base and offset, use a signed 9-bit 
unscaled offset.

Remove the Ump constraint on movti and movtf instructions as this blocks
the reload optimizer from merging address CSEs (is this supposed to work
only on 'm' constraints?).  The result is improved codesize, especially
wrf and gamess in SPEC2006.


int f (int x)
{
  __int128_t arr[100];
  arr[31] = 0;
  arr[48] = 0;
  arr[79] = 0;
  arr[65] = 0;
  arr[70] = 0;
  return arr[x];
}

Before patch (note the multiple redundant add x1, sp, 1024):
sub sp, sp, #1600
sbfiz   x0, x0, 4, 32
add x1, sp, 256
stp xzr, xzr, [x1, 240]
add x1, sp, 768
stp xzr, xzr, [x1]
add x1, sp, 1024
stp xzr, xzr, [x1, 240]
add x1, sp, 1024
stp xzr, xzr, [x1, 16]
add x1, sp, 1024
stp xzr, xzr, [x1, 96]
ldr w0, [sp, x0]
add sp, sp, 1600
ret

After patch:
sub sp, sp, #1600
sbfiz   x0, x0, 4, 32
add x1, sp, 1024
stp xzr, xzr, [sp, 496]
stp xzr, xzr, [x1, -256]
stp xzr, xzr, [x1, 240]
stp xzr, xzr, [x1, 16]
stp xzr, xzr, [x1, 96]
ldr w0, [sp, x0]
add sp, sp, 1600
ret


Bootstrap & regress OK.

ChangeLog:
2015-11-10  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.md (movti_aarch64): Change Ump to m.
(movtf_aarch64): Likewise.
* config/aarch64/aarch64.c (aarch64_classify_address):
Use correct intersection of offsets.
(aarch64_legitimize_address_displacement): Use 9-bit signed offsets.
(aarch64_legitimize_address): Use 9-bit signed offsets for TI/TF mode.
Use 7-bit signed scaled mode for modes > 16 bytes.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
3045e6d6447d5c1860feb51708eeb2a21d2caca9..45f44e96ba9e9d3c8c41d977aa509fa13398a8fd
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4066,7 +4066,8 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
 instruction memory accesses.  */
  if (mode == TImode || mode == TFmode)
return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
-   && offset_9bit_signed_unscaled_p (mode, offset));
+   && (offset_9bit_signed_unscaled_p (mode, offset)
+   || offset_12bit_unsigned_scaled_p (mode, offset)));
 
  /* A 7bit offset check because OImode will emit a ldp/stp
 instruction (only big endian will get here).
@@ -4270,18 +4271,19 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x,
 /* Split an out-of-range address displacement into a base and offset.
Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
to increase opportunities for sharing the base address of different sizes.
-   For TI/TFmode and unaligned accesses use a 256-byte range.  */
+   For unaligned accesses and TI/TF mode use the signed 9-bit range.  */
 static bool
 aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode 
mode)
 {
-  HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
+  HOST_WIDE_INT offset = INTVAL (*disp);
+  HOST_WIDE_INT base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
 
-  if (mode == TImode || mode == TFmode ||
-  (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
-mask = 0xff;
+  if (mode == TImode || mode == TFmode
+  || (offset & (GET_MODE_SIZE (mode) - 1)) != 0)
+base = (offset + 0x100) & ~0x1ff;
 
-  *off = GEN_INT (INTVAL (*disp) & ~mask);
-  *disp = GEN_INT (INTVAL (*disp) & mask);
+  *off = GEN_INT (base);
+  *disp = GEN_INT (offset - base);
   return true;
 }
 
@@ -5148,12 +5150,10 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
  x = gen_rtx_PLUS (Pmode, base, offset_rtx);
}
 
-  /* Does it look like we'll need a load/store-pair operation?  */
+  /* Does it look like we'll need a 16-byte load/store-pair operation?  */
   HOST_WIDE_INT base_offset;
-  if (GET_MODE_SIZE (mode) > 16
- || mode == TImode)
-   base_offset = ((offset + 64 * GET_MODE_SIZE (mode))
-  & ~((128 * GET_MODE_SIZE (mode)) - 1));
+  if (GET_MODE_SIZE (mode) > 16)
+   base_offset = (offset + 0x400) & ~0x7f0;
   /* For offsets aren't a multiple of the access size, the limit is
 -256...255.  */
   else if (offset & (GET_MODE_SIZE (mode) - 1))
@@ -5167,6 +5167,8 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
   /* Small negative offsets are supported.  */
   else if (IN_RANGE (offset, -256, 0))
base_offset = 0;
+  else if (mode == TImod

Re: [PATCH 2/2][AArch64] Add bfx attribute

2016-11-10 Thread Dr. Philipp Tomsich
> On 10 Nov 2016, at 18:14, Wilco Dijkstra  wrote:
> 
> I think the XGene-1 scheduler might need a similar change as currently all 
> AArch64
> shifts are modelled as 2-cycle operations.

Thanks for the heads-up.  We’ll indeed need to update this.

Regards,
Philipp.

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 19:36 GMT+03:00 Jakub Jelinek :
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
> tabs), can you repost as attachment or configure your MUA not to do this?
>
> Just a couple of random nits follow:
>
>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>
> This mentions an option that doesn't exist, is that s/dd// ?

Yes.
Attached fixed version.


--
WBR,
Andrew


new_avx512_instructions.patch
Description: Binary data


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 20:14 GMT+03:00 Vladimir N Makarov :
>
>
> On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
>>
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>>
>>
> I've just committed the necessary patch.

Thanks, Vladimir.


--
WBR,
Andrew


[PATCH][ARM] Improve max_insns_skipped logic

2016-11-10 Thread Wilco Dijkstra
Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
increasing codesize.  Given 4 works well for Thumb-2, use the same limit for ARM
for consistency. 

ChangeLog:
2016-11-04  Wilco Dijkstra  

* config/arm/arm.c (arm_option_params_internal): Improve setting of
max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-{
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-max_insns_skipped = arm_restrict_it ? 1 : 4;
-}
-  else
-/* When -mrestrict-it is in use tone down the if-conversion.  */
-max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default



Re: [PATCH] dwarf2cfi: Dump row differences before asserting

2016-11-10 Thread Segher Boessenkool
Ping.

On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote:
> If maybe_record_trace_start fails because the CFI is inconsistent on two
> paths into a block it currently just ICEs.  This changes it to also dump
> the CFI on those two paths in the dump file; debugging it without that
> information is hopeless.
> 
> Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2016-10-31  Segher Boessenkool  
> 
>   * dwarf2cfi.c (dump_cfi_row): Add forward declaration.
>   (maybe_record_trace_start): If the CFI is different on the new and
>   old paths, print out both to the dump file before ICEing.
> 
> ---
>  gcc/dwarf2cfi.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index da9da52..19edc28 100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -2239,6 +2239,8 @@ add_cfis_to_fde (void)
>  }
>  }
>  
> +static void dump_cfi_row (FILE *f, dw_cfi_row *row);
> +
>  /* If LABEL is the start of a trace, then initialize the state of that
> trace from CUR_TRACE and CUR_ROW.  */
>  
> @@ -2282,7 +2284,21 @@ maybe_record_trace_start (rtx_insn *start, rtx_insn 
> *origin)
>/* We ought to have the same state incoming to a given trace no
>matter how we arrive at the trace.  Anything else means we've
>got some kind of optimization error.  */
> -  gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row));
> +#if CHECKING_P
> +  if (!cfi_row_equal_p (cur_row, ti->beg_row))
> + {
> +   if (dump_file)
> + {
> +   fprintf (dump_file, "Inconsistent CFI state!\n");
> +   fprintf (dump_file, "SHOULD have:\n");
> +   dump_cfi_row (dump_file, ti->beg_row);
> +   fprintf (dump_file, "DO have:\n");
> +   dump_cfi_row (dump_file, cur_row);
> + }
> +
> +   gcc_unreachable ();
> + }
> +#endif
>  
>/* The args_size is allowed to conflict if it isn't actually used.  */
>if (ti->beg_true_args_size != args_size)
> -- 
> 1.9.3


Re: [PATCH] Introduce -fprofile-update=maybe-atomic

2016-11-10 Thread Nathan Sidwell

On 11/10/2016 08:24 AM, Martin Liška wrote:

On 11/10/2016 05:17 PM, David Edelsohn wrote:

Maybe instead of adding "maybe", we need to change the severity of the
warning so that the warning is not emitted by default.


Adding the warning option to -Wextra can be solution. Is it acceptable
approach?


I don't think that's good.  Now I understand the -pthreads thing, we 
have different use cases.


1) user explicitly said -fprofile-update=FOO.  They shouldn't have to 
enable something else to get a diagnostic that FOO doesn't work.


2) driver implicitly said -fprofile-update=FOO, because the user said 
-pthreads but the driver doesn't know if FOO is acceptable.  We want to 
silently fallback to the old behaviour.


The proposed solution addresses #2 by having the driver say 
-fprofile-update=META-FOO.  My dislike is that we're exposing this to 
the user and they're going to start using it.  That strikes me as 
undesirable.


How hard is it to implement the fprofile-update option value as a list. 
I.e. '-fprofile-update=atomic,single', with semantics of 'pick the first 
one you can do'? If that's straightforwards, then that seems to me as a 
better solution for #2. [flyby-thought, have 'atomic,single' as an 
acceptable single option value?]


Failing that, Martin's solution is probably the sanest available 
solution, but I'd like to rename 'maybe-atomic' to the more meaningful 
'prefer-atomic'.  With 'maybe-atomic', I'm left wondering if it looks at 
the phase of the moon.


nathan

--
Nathan Sidwell


Re: [PATCH] dwarf2cfi: Dump row differences before asserting

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 11:29:12AM -0600, Segher Boessenkool wrote:
> Ping.
> 
> On Mon, Oct 31, 2016 at 05:15:53PM +, Segher Boessenkool wrote:
> > If maybe_record_trace_start fails because the CFI is inconsistent on two
> > paths into a block it currently just ICEs.  This changes it to also dump
> > the CFI on those two paths in the dump file; debugging it without that
> > information is hopeless.
> > 
> > Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
> > 
> > 
> > Segher
> > 
> > 
> > 2016-10-31  Segher Boessenkool  
> > 
> > * dwarf2cfi.c (dump_cfi_row): Add forward declaration.
> > (maybe_record_trace_start): If the CFI is different on the new and
> > old paths, print out both to the dump file before ICEing.

Ok, thanks.

Jakub


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Joseph Myers
On Fri, 28 Oct 2016, Richard Biener wrote:

> +/* Parse a gimple expression.
> +
> +   gimple-expression:
> + gimple-unary-expression
> + gimple-call-statement
> + gimple-binary-expression
> + gimple-assign-expression
> + gimple-cast-expression

I don't see any comments expanding what the syntax is for most of these 
constructs.

> +  if (c_parser_next_token_is (parser, CPP_EQ))
> +c_parser_consume_token (parser);

That implies you're allowing an optional '=' at this point in the syntax.  
That doesn't seem to make sense to me; I'd expect you to do if (=) { 
process assignment; } else { other cases; } or similar.

> +  /* GIMPLE PHI expression.  */
> +  if (c_parser_next_token_is_keyword (parser, RID_PHI))

I don't see this mentioned in any of the syntax comments.

> +  struct {
> +/* The expression at this stack level.  */
> +struct c_expr expr;
> +/* The operation on its left.  */
> +enum tree_code op;
> +/* The source location of this operation.  */
> +location_t loc;
> +  } stack[2];
> +  int sp;
> +  /* Location of the binary operator.  */
> +  location_t binary_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
> +#define POP\

This all looks like excess complexity.  The syntax in the comment 
indicates that in GIMPLE, the operands of a binary expression are unary 
expressions.  So nothing related to precedence is needed at all, and you 
shouldn't need this stack construct.

-- 
Joseph S. Myers
jos...@codesourcery.com


[gomp4] add support for derived types in ACC UPDATE

2016-11-10 Thread Cesar Philippidis
OpenACC 2.0a has limited support for fortran derived types. Basically,
derived type variables are only supported in ACC UPDATE. Rather than
adding generalized support for derived times in the gimplifier, this
patch has the fortran FE pass both subarrays and arrays as void pointers
with an appropriate OMP_CLAUSE_SIZE. ACC UPDATE is an executable
directive, and the gimplifier would just end up pruning out all of the
unnecessary supporting data clauses otherwise.

As of right now, GCC still lacks support for non-contiguous subarray
arguments to ACC UPDATE. I'm not sure why it was decided to let ACC
UPDATE support non-contiguous subarrays, but it already is an oddball
with its lone support for derived types.

This patch has been committed to gomp-4_0-branch.

Cesar
2016-11-10  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_variable_list): New allow_derived argument.
	(gfc_match_omp_map_clause): Update call to
	gfc_match_omp_variable_list.
	(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clause.
	(gfc_match_oacc_update): Update call to gfc_match_omp_clauses.
	(resolve_omp_clauses): Permit derived type variables in ACC UPDATE
	clauses.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Lower derived type
	members.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Update handling of ACC
	UPDATE variables.

	gcc/testsuite/
	* gfortran.dg/goacc/derived-types.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/update-2.f90: New test.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 95885bc..0a9d137 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -216,7 +216,8 @@ static match
 gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 			 bool allow_common, bool *end_colon = NULL,
 			 gfc_omp_namelist ***headp = NULL,
-			 bool allow_sections = false)
+			 bool allow_sections = false,
+			 bool allow_derived = false)
 {
   gfc_omp_namelist *head, *tail, *p;
   locus old_loc, cur_loc;
@@ -242,7 +243,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	case MATCH_YES:
 	  gfc_expr *expr;
 	  expr = NULL;
-	  if (allow_sections && gfc_peek_ascii_char () == '(')
+	  if (allow_sections && gfc_peek_ascii_char () == '('
+	  || allow_derived && gfc_peek_ascii_char () == '%')
 	{
 	  gfc_current_locus = cur_loc;
 	  m = gfc_match_variable (&expr, 0);
@@ -634,10 +636,11 @@ cleanup:
 
 static bool
 gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
-			  bool common_blocks)
+			  bool common_blocks, bool allow_derived)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true)
+  if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true,
+   allow_derived)
   == MATCH_YES)
 {
   gfc_omp_namelist *n;
@@ -655,7 +658,8 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		   uint64_t dtype_mask, bool first = true,
-		   bool needs_space = true, bool openacc = false)
+		   bool needs_space = true, bool openacc = false,
+		   bool allow_derived = false)
 {
   gfc_omp_clauses *base_clauses, *c = gfc_get_omp_clauses ();
   locus old_loc;
@@ -773,7 +777,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_TOFROM, openacc))
+	   OMP_MAP_FORCE_TOFROM, openacc,
+	   allow_derived))
 	continue;
 	  if (mask & OMP_CLAUSE_COPYIN)
 	{
@@ -781,7 +786,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		{
 		  if (gfc_match ("copyin ( ") == MATCH_YES
 		  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-		   OMP_MAP_FORCE_TO, true))
+		   OMP_MAP_FORCE_TO, true,
+		   allow_derived))
 		continue;
 		}
 	  else if (gfc_match_omp_variable_list ("copyin (",
@@ -792,7 +798,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPYOUT)
 	  && gfc_match ("copyout ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_FROM, true))
+	   OMP_MAP_FORCE_FROM, true,
+	   allow_derived))
 	continue;
 	  if ((mask & OMP_CLAUSE_COPYPRIVATE)
 	  && gfc_match_omp_variable_list ("copyprivate (",
@@ -802,14 +809,15 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_CREATE)
 	  && gfc_match ("create ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_ALLOC, true))
+	   OMP_MAP_FORCE_ALLOC, true,
+	   allow_derived))
 	continue;
 	  break;
 	case 'd':
 	  if ((mask & OMP_CLAUSE_DELETE)
 	  && gfc_match ("delete ( ") == MATCH_YES
 	  && gfc_match_omp_map_cl

Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-10 Thread Michael Meissner
On Thu, Nov 10, 2016 at 10:05:25AM -0600, Segher Boessenkool wrote:
> Hi Mike,
> 
> > I have built the spec 2006 CPU benchmark suite with these changes, and the
> > power8 (ISA 2.07) code generation does not change.
> 
> Very good to hear :-)
> 
> Just some nits; okay for trunk with that fixed:
> 
> > +(define_split
> > +  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
> > +   (sign_extend:EXTHI
> > +(match_operand:HI 1 "indexed_or_indirect_operand" "")))]
> > +  "TARGET_P9_VECTOR && reload_completed"
> > +  [(set (match_dup 2)
> > +   (match_dup 1))
> > +   (set (match_dup 0)
> > +   (sign_extend:EXTHI (match_dup 2)))]
> > +{
> > +  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
> > +})
> 
> Please lose the default "" (here and elsewhere).

Ok for the match_operands, but I discovered that match_scratch still needs the
"".

> > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c
> > ___
> > Modified: svn:mergeinfo
> >Merged 
> > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-1.c:r241733-241924
> > 
> > 
> > Property changes on: gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c
> > ___
> > Modified: svn:mergeinfo
> >Merged 
> > /trunk/gcc/testsuite/gcc.target/powerpc/p9-minmax-2.c:r241733-241924
> 
> I don't know what this is?

This is some artifact from svnmerge.  It shows up a lot of times when I do a
svn diff against the branch.  Usually I try to delete it, but once in awhile it
slips through.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[patch, doc] PR 37998 -- Unclear documentation of -fno-common

2016-11-10 Thread Sandra Loosemore
This patch is for a long-standing issue with the way -fno-common is 
documented.  The specific bug report was about an ambiguous use of the 
word "This", but as I looked more at the way this option was described, 
I realized the whole thing was needlessly confusing because it didn't 
use the correct terminology -- the current text tries to use 
"declaration" to describe what the C standard calls a "tentative 
definition".


Joseph, can you look this over with your language-lawyer hat on, before 
I commit it?  I'm pretty sure I understand what this option does, but I 
want to be sure I'm explaining it correctly now.


-Sandra

2016-11-09  Sandra Loosemore  

	PR c/37998

	gcc/
	* doc/invoke.texi (Code Gen Options) [-fno-common]: Use correct
	terminology.  Expand to remove ambiguity.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 241974)
+++ gcc/doc/invoke.texi	(working copy)
@@ -11953,25 +11953,32 @@ Use it to conform to a non-default appli
 
 @item -fno-common
 @opindex fno-common
-In C code, controls the placement of uninitialized global variables.
-Unix C compilers have traditionally permitted multiple definitions of
-such variables in different compilation units by placing the variables
-in a common block.
-This is the behavior specified by @option{-fcommon}, and is the default
-for GCC on most targets.
-On the other hand, this behavior is not required by ISO C, and on some
-targets may carry a speed or code size penalty on variable references.
-The @option{-fno-common} option specifies that the compiler should place
-uninitialized global variables in the data section of the object file,
-rather than generating them as common blocks.
-This has the effect that if the same variable is declared
-(without @code{extern}) in two different compilations,
-you get a multiple-definition error when you link them.
-In this case, you must compile with @option{-fcommon} instead.
+@cindex tentative definitions
+In C code, this option controls the placement of global variables 
+defined without an initializer, known as @dfn{tentative definitions} 
+in the C standard.  Tentative definitions are distinct from declarations 
+of a variable with the @code{extern} keyword, which do not allocate storage.
+
+Unix C compilers have traditionally allocated storage for
+uninitialized global variables in a common block.  This allows the
+linker to resolve all tentative definitions of the same variable
+in different compilation units to the same object, or to a non-tentative
+definition.  
+This is the behavior specified by @option{-fcommon}, and is the default for 
+GCC on most targets.  
+On the other hand, this behavior is not required by ISO
+C, and on some targets may carry a speed or code size penalty on
+variable references.
+
+The @option{-fno-common} option specifies that the compiler should instead
+place uninitialized global variables in the data section of the object file.
+This inhibits the merging of tentative definitions by the linker so
+you get a multiple-definition error if the same 
+variable is defined in more than one compilation unit.
 Compiling with @option{-fno-common} is useful on targets for which
 it provides better performance, or if you wish to verify that the
 program will work on other systems that always treat uninitialized
-variable declarations this way.
+variable definitions this way.
 
 @item -fno-ident
 @opindex fno-ident


Re: [patch, doc] PR 37998 -- Unclear documentation of -fno-common

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, Sandra Loosemore wrote:

> This patch is for a long-standing issue with the way -fno-common is
> documented.  The specific bug report was about an ambiguous use of the word
> "This", but as I looked more at the way this option was described, I realized
> the whole thing was needlessly confusing because it didn't use the correct
> terminology -- the current text tries to use "declaration" to describe what
> the C standard calls a "tentative definition".
> 
> Joseph, can you look this over with your language-lawyer hat on, before I
> commit it?  I'm pretty sure I understand what this option does, but I want to
> be sure I'm explaining it correctly now.

This looks correct to me.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, gcc, wwwdocs] Document upcoming Qualcomm Falkor processor support

2016-11-10 Thread Siddhesh Poyarekar
Hi,

This patch documents the newly added flag in gcc 7 for the upcoming
Qualcomm Falkor processor core.

Siddhesh

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.24
diff -u -r1.24 changes.html
--- htdocs/gcc-7/changes.html   9 Nov 2016 14:28:59 -   1.24
+++ htdocs/gcc-7/changes.html   10 Nov 2016 13:43:42 -
@@ -296,6 +296,11 @@
-mcpu=vulcan and -mtune=vulcan options as
well as the equivalent target attributes and pragmas.
  
+ 
+   The Qualcomm Falkor processor is now supported via the
+   -mcpu=falkor and -mtune=falkor options as
+   well as the equivalent target attributes and pragmas.
+ 

 
 ARM


Re: [PATCH] Fix PR71762

2016-11-10 Thread Marc Glisse

On Thu, 10 Nov 2016, Richard Biener wrote:


The following fixes PR71762 via reverting the transforms of
~X & Y to X < Y and similar because when the bools they apply to
are expanded to RTL undefined values are not reliably zero-extended
and thus the transform is invalid.  Ensuring the zero-extension
is too costly IMHO and the proper fix is to move the transform
to RTL where we can check known-zero-bits to validate validity
or to fix GIMPLE not not have operations on types not matching their mode
in precision.


Can you explain why this particular transformation is special? We have a 
number of other optimizations that take advantage of the fact that bool is 
in [0, 1], even without looking at VRP, say for instance x != 0 -> x. Are 
we supposed to remove all of them?


--
Marc Glisse


Re: [PATCH] Fix PR71762

2016-11-10 Thread Richard Biener
On November 10, 2016 7:39:57 PM GMT+01:00, Marc Glisse  
wrote:
>On Thu, 10 Nov 2016, Richard Biener wrote:
>
>> The following fixes PR71762 via reverting the transforms of
>> ~X & Y to X < Y and similar because when the bools they apply to
>> are expanded to RTL undefined values are not reliably zero-extended
>> and thus the transform is invalid.  Ensuring the zero-extension
>> is too costly IMHO and the proper fix is to move the transform
>> to RTL where we can check known-zero-bits to validate validity
>> or to fix GIMPLE not not have operations on types not matching their
>mode
>> in precision.
>
>Can you explain why this particular transformation is special? We have
>a 
>number of other optimizations that take advantage of the fact that bool
>is 
>in [0, 1], even without looking at VRP, say for instance x != 0 -> x.
>Are 
>we supposed to remove all of them?

No.  But UNDEFINED & 0 is well-defined
Even if precision ends up being bigger.  Undefined != 0 is never well-defined.

Richard.




Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Richard Biener
On November 10, 2016 6:38:12 PM GMT+01:00, Joseph Myers 
 wrote:
>On Fri, 28 Oct 2016, Richard Biener wrote:
>
>> +/* Parse a gimple expression.
>> +
>> +   gimple-expression:
>> + gimple-unary-expression
>> + gimple-call-statement
>> + gimple-binary-expression
>> + gimple-assign-expression
>> + gimple-cast-expression
>
>I don't see any comments expanding what the syntax is for most of these
>
>constructs.
>
>> +  if (c_parser_next_token_is (parser, CPP_EQ))
>> +c_parser_consume_token (parser);
>
>That implies you're allowing an optional '=' at this point in the
>syntax.  
>That doesn't seem to make sense to me; I'd expect you to do if (=) { 
>process assignment; } else { other cases; } or similar.
>
>> +  /* GIMPLE PHI expression.  */
>> +  if (c_parser_next_token_is_keyword (parser, RID_PHI))
>
>I don't see this mentioned in any of the syntax comments.
>
>> +  struct {
>> +/* The expression at this stack level.  */
>> +struct c_expr expr;
>> +/* The operation on its left.  */
>> +enum tree_code op;
>> +/* The source location of this operation.  */
>> +location_t loc;
>> +  } stack[2];
>> +  int sp;
>> +  /* Location of the binary operator.  */
>> +  location_t binary_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
>> +#define POP   \
>
>This all looks like excess complexity.  The syntax in the comment 
>indicates that in GIMPLE, the operands of a binary expression are unary
>
>expressions.  So nothing related to precedence is needed at all, and
>you 
>shouldn't need this stack construct.

I'll address those comments.  As you did not have any comments on the 
c-parser.[CH] parts does that mean you are fine with them?  That is, does the 
above constitute a complete review of the patch?

Thanks,
Richard.




[PATCH, Fortran] PR78277: ICE in is_anonymous_component, at fortran/interface.c:450

2016-11-10 Thread Fritz Reese
All,

The attached fixes an ICE-on-invalid-code, specifically due to invalid
anonymous structure declarations, as seen in the attached test case.
This also improves error handling in such cases- the anonymous
structure body will continue to be parsed even if the variable-decl
after the opening variable-type-decl is invalid. (Something similar
could be done to improve regular structure declarations.) See the
in-code comments and comments on the on the PR for an additional
description

Along with the first patch, I've attached another patch
(struct_whitespace) containing whitespace-only changes to some
dec-structure-related code; the poor formatting was introduced before
I had my vim settings right.

I intend to commit the two attached patches soon for trunk if nobody
finds any issues with it. They both regtest on x86_64-redhat-linux of
course.

---
Fritz Reese

> pr78277.diff
From: Fritz O. Reese 
Date: Thu, 10 Nov 2016 13:36:54 -0500
Subject: [PATCH] Fix ICE and improve errors for invalid anonymous
structure declarations.

PR fortran/78277
* gcc/fortran/decl.c (gfc_match_data_decl): Gracefully handle bad
anonymous structure declarations.

PR fortran/78277
* gcc/testsuite/gfortran.dg/dec_structure_17.f90: New test.
<

> struct_whitespace.diff
From: Fritz O. Reese 
Date: Thu, 10 Nov 2016 11:02:08 -0500
Subject: [PATCH] Fix some whitespace.

   gcc/fortran/
   * decl.c (get_struct_decl, gfc_match_map, gfc_match_union): Fix
   whitespace.
   * interface.c (gfc_compare_union_types): Likewise.
<
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 1272f1f..bf6bc24 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -4901,7 +4901,28 @@ ok:
 }
 
   if (!gfc_error_flag_test ())
-gfc_error ("Syntax error in data declaration at %C");
+{
+  /* An anonymous structure declaration is unambiguous; if we matched one
+according to gfc_match_structure_decl, we need to return MATCH_YES
+here to avoid confusing the remaining matchers, even if there was an
+error during variable_decl.  We must flush any such errors.  Note this
+causes the parser to gracefully continue parsing the remaining input
+as a structure body, which likely follows.  */
+  if (current_ts.type == BT_DERIVED && current_ts.u.derived
+ && gfc_fl_struct (current_ts.u.derived->attr.flavor))
+   {
+ gfc_error_now ("Syntax error in anonymous structure declaration"
+" at %C");
+ /* Skip the bad variable_decl and line up for the start of the
+structure body.  */
+ gfc_error_recovery ();
+ m = MATCH_YES;
+ goto cleanup;
+   }
+
+  gfc_error ("Syntax error in data declaration at %C");
+}
+
   m = MATCH_ERROR;
 
   gfc_free_data_all (gfc_current_ns);
diff --git a/gcc/testsuite/gfortran.dg/dec_structure_17.f90 
b/gcc/testsuite/gfortran.dg/dec_structure_17.f90
new file mode 100644
index 000..18d3193
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_structure_17.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options "-fdec-structure" }
+!
+! PR fortran/78277
+!
+! Fix ICE for invalid structure declaration code.
+!
+
+subroutine sub1()
+  structure /s/
+structure t
+  integer i
+end structure
+  end structure
+  record /s/ u
+  interface
+subroutine sub0(u)
+  structure /s/
+structure t. ! { dg-error "Syntax error in anonymous structure decl" }
+  integer i
+end structure
+  end structure
+  record /s/ u
+end
+  end interface
+  call sub0(u)
+end
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 0120ceb..1272f1f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8597,31 +8597,31 @@ get_struct_decl (const char *name, sym_flavor fl, locus 
*decl,
 match
 gfc_match_map (void)
 {
-/* Counter used to give unique internal names to map structures. */
-static unsigned int gfc_map_id = 0;
-char name[GFC_MAX_SYMBOL_LEN + 1];
-gfc_symbol *sym;
-locus old_loc;
+  /* Counter used to give unique internal names to map structures. */
+  static unsigned int gfc_map_id = 0;
+  char name[GFC_MAX_SYMBOL_LEN + 1];
+  gfc_symbol *sym;
+  locus old_loc;
 
-old_loc = gfc_current_locus;
+  old_loc = gfc_current_locus;
 
-if (gfc_match_eos () != MATCH_YES)
-  {
-  gfc_error ("Junk after MAP statement at %C");
-  gfc_current_locus = old_loc;
-  return MATCH_ERROR;
-  }
+  if (gfc_match_eos () != MATCH_YES)
+{
+   gfc_error ("Junk after MAP statement at %C");
+   gfc_current_locus = old_loc;
+   return MATCH_ERROR;
+}
 
-/* Map blocks are anonymous so we make up unique names for the symbol table
-   which are invalid Fortran identifiers.  */
-snprintf (name, GFC_MAX_SYMBOL_LEN + 1, "MM$%u", gfc_map_id++);
+  /* Map blocks are anonymous so we make up unique names for the symbol table
+   

Re: [ipa-vrp] ice in set_value_range

2016-11-10 Thread kugan

Hi David,

Sorry about the breakage. I have already reverted this patch as this is 
causing bootstrap failures. I will test it on more targets before 
submitting this patch again.


Thanks,
Kugan

On 11/11/16 00:25, David Edelsohn wrote:

Kugan

Is there a PR for this failure?  It broke bootstrap on AIX as well and
I only was able to track it to your patch last night.

Thanks, David



Re: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-10 Thread Yvan Roux
Hi,

On 10 November 2016 at 18:00, Jakub Jelinek  wrote:
> Hi!
>
> On arm/aarch64 we ICE because some decls that make it here has non-NULL
> DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that
> doesn't fit into shwi would ICE similarly).  While it is arguably a FE bug
> that it creates for VLA initialization from STRING_CST such a decl,
> I believe we have some PRs about it already open.
> I think it won't hurt to check for the large sizes properly even in this
> function though, and punt on unexpected cases, or even extremely large ones.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
> to test on arm and/or aarch64.  Ok for trunk if the testing there passes
> too?

Bootstrapped and regtested on arm, aarch64, linux and bare targets
without issue.

Thanks,
Yvan

> 2016-11-10  Jakub Jelinek  
>
> PR middle-end/78201
> * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
> Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
> instead, return false if it is NULL, or doesn't fit into uhwi, or
> is larger or equal to targetm.max_anchor_offset.
>
> * g++.dg/opt/pr78201.C: New test.
>
> --- gcc/varasm.c.jj 2016-10-31 13:28:12.0 +0100
> +++ gcc/varasm.c2016-11-10 15:18:41.282886244 +0100
> @@ -6804,11 +6804,12 @@ default_use_anchors_for_symbol_p (const_
> return false;
>
>/* Don't use section anchors for decls that won't fit inside a single
> -anchor range to reduce the amount of instructions require to refer
> +anchor range to reduce the amount of instructions required to refer
>  to the entire declaration.  */
> -  if (decl && DECL_SIZE (decl)
> -&& tree_to_shwi (DECL_SIZE (decl))
> -   >= (targetm.max_anchor_offset * BITS_PER_UNIT))
> +  if (DECL_SIZE_UNIT (decl) == NULL_TREE
> + || !tree_fits_uhwi_p (DECL_SIZE_UNIT (decl))
> + || (tree_to_uhwi (DECL_SIZE_UNIT (decl))
> + >= (unsigned HOST_WIDE_INT) targetm.max_anchor_offset))
> return false;
>
>  }
> --- gcc/testsuite/g++.dg/opt/pr78201.C.jj   2016-11-10 15:20:18.398660681 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr78201.C  2016-11-10 15:19:58.0 +0100
> @@ -0,0 +1,13 @@
> +// PR middle-end/78201
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +struct B { long d (); } *c;
> +long e;
> +
> +void
> +foo ()
> +{
> +  char a[e] = "";
> +  c && c->d();
> +}
>
> Jakub


[gomp4] remove OMP_CLAUSE_DEVICE_RESIDENT

2016-11-10 Thread Cesar Philippidis
I've committed this patch to gomp-4_0-branch which removes
OMP_CLAUSE_DEVICE_RESIDENT. This standalone clause is no longer
necessary, and hasn't been for a while, because device_resident is
treated as a data mapping type for OMP_CLAUSE_MAP, and not a clause itself.

Cesar
2016-11-10  Cesar Philippidis  

	gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses_1): Remove
	OMP_CLAUSE_DEVICE_RESIDENT.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Remove
	OMP_CLAUSE_DEVICE_RESIDENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	* tree-core.h (enum omp_clause_code): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses):
	(convert_local_omp_clauses):
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (walk_tree_1): Likewise.


diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9924872..3c53414 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1781,9 +1781,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 	case OMP_LIST_USE_DEVICE:
 	  clause_code = OMP_CLAUSE_USE_DEVICE_PTR;
 	  goto add_clause;
-	case OMP_LIST_DEVICE_RESIDENT:
-	  clause_code = OMP_CLAUSE_DEVICE_RESIDENT;
-	  goto add_clause;
 
 	add_clause:
 	  omp_clauses
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 36c128b..9649fae 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7594,10 +7594,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	remove = true;
 	  break;
 
-	case OMP_CLAUSE_DEVICE_RESIDENT:
-	  remove = true;
-	  break;
-
 	case OMP_CLAUSE_NOWAIT:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_UNTIED:
@@ -8445,7 +8441,6 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE__CILK_FOR_COUNT_:
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_NUM_GANGS:
 	case OMP_CLAUSE_NUM_WORKERS:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e7cb66c..142330c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2246,7 +2246,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  break;
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE__CACHE_:
 	default:
@@ -2414,7 +2413,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  break;
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE__CACHE_:
 	default:
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index b07ff30..a3c4b18 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -313,9 +313,6 @@ enum omp_clause_code {
  #pragma acc cache (variable-list).  */
   OMP_CLAUSE__CACHE_,
 
-  /* OpenACC clause: device_resident (variable_list).  */
-  OMP_CLAUSE_DEVICE_RESIDENT,
-
   /* OpenACC clause: gang [(gang-argument-list)].
  Where
   gang-argument-list: [gang-argument-list, ] gang-argument
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index ed77798..55f1f20 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1215,7 +1215,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  gcc_unreachable ();
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
@@ -1914,7 +1913,6 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	  gcc_unreachable ();
 
 	case OMP_CLAUSE_BIND:
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index f5970e1..f1af737 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -407,9 +407,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
-case OMP_CLAUSE_DEVICE_RESIDENT:
-  name = "device_resident";
-  goto print_remap;
 case OMP_CLAUSE_TO_DECLARE:
   name = "to";
   goto print_remap;
diff --git a/gcc/tree.c b/gcc/tree.c
index f26e282..7c87612 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -281,7 +281,6 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   2, /* OMP_CLAUSE__CACHE_  */
-  1, /* OMP_CLAUSE_DEVICE_RESIDENT  */
   2, /* OMP_CLAUSE_GANG  */
   1, /* OMP_CLAUSE_ASYNC  */
   1, /* OMP_CLAUSE_WAIT  */
@@ -356,7 +355,6 @@ const char * const omp_clause_code_name[] =
   "use_device_ptr",
   "is_device_ptr",
   "_cache_",
-  "device_resident",
   "gang",
   "async",
   "wait",
@@ -11651,7 +11649,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
 	  /* FALLTHRU */
 
-	case OMP_CLAUSE_DEVICE_RESIDENT:
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
 	case OMP_CLAUSE_WORKER:


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-10 Thread Dominik Vogt
On Wed, Nov 09, 2016 at 03:46:38PM +0100, Richard Biener wrote:
> On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  wrote:
> > Something like the attached patch?  Robin and me have spent quite
> > some time to figure out the new pattern.  Two questions:
> >
> > 1) In the match expression you cannot just use SSA_NAME@0 because
> >then the "case SSA_NAME:" is added to a switch for another
> >pattern that already has that label.  Thus we made that "proxy"
> >predicate "ssa_name_p" that forces the code for the new pattern
> >out of the old switch and into a separate code block.  We
> >couldn't figure out whether this joining of case labels is a
> >feature in the matching language.  So, is this the right way to
> >deal with the conflicting labels?
> 
> No, just do not match SSA_NAME.  And instead of
> 
> +  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
> +   (if (is_gimple_assign (def_stmt)
> +   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
> 
> you instead want to change the pattern to
> 
> (simpify
>   (cmp (convert @0) INTEGER_CST@1)
> 
> @0 will then be your innerop
> 
> note that you can't use get_value_range but you have to use the
> get_range_info interface instead.  I suppose a helper function
> somewhere that computes whether an expression fits a type
> would be helpful (see expr_not_equal_to for sth related,
> thus expr_fits_type_p (@0, TREE_TYPE (@1)))

All right, I think we got that (new patch attached).

> Likewise the overflow_infinity checks do not translate to match.pd
> (or rahter the range info you get).

Can you give us another hint here, please?  The overflow check
should probably go into expr_fits_type_p, but with only the min
and max values from get_range_info, how can the overflow
TREE_OVERFLOW_P flag be retrieved from @1, to duplicate the logic
from is_{nega,posi}tive_overflow_infinity?  Is it availably
somewhere, or is it necessary to somehow re-calculate it from the
expression?

(This is really necessary so that cases like this don't start
folding with the patch:

--
signed char foo3uu (unsigned char a) 
{ 
  unsigned char d; 
  unsigned long un; 
 
  d = (a & 63) + 200; 
  un = d; 
  if (un >= 12) 
ubar(un); 
 
  return d; 
}
--

)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From d6f30094d892d86598f6af35cb3b99c258642e9b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Convert folding in simplify_cond_using_ranges() to match.pd.

In cases like this

  ac_4 = *p_3(D);
  bc_5 = ac_4 & 63;
  l_6 = (long int) bc_5;
  if (l_6 > 9)
bar(l_6);

the function would fold bc_5 into the condition, replacing l_6, but l_6 cannot
be eliminated.  We'd end up with bc_5 being used in two places and l_6 in one
place without the chance to eliminate either.

The patched code suppresses folding if

  bc_5 has only a single use (i.e. being cast to l_6),

AND

  l_6 has more than one use.

However, the patch does not catch the case where all uses of l_6 could be
eliminated by the function, e.g.

  ...
  if (l_6 > 9)
bar();
  else if (l_6 > 5)
foo();
---
 gcc/fold-const.c | 64 ++
 gcc/fold-const.h |  1 +
 gcc/match.pd | 28 
 gcc/tree-vrp.c   | 66 
 4 files changed, 93 insertions(+), 66 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 593ea16..082ad46 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9100,6 +9100,70 @@ expr_not_equal_to (tree t, const wide_int &w)
 }
 }
 
+/* Return true if the value range of T is known not fit into TYPE.  */
+
+bool
+expr_fits_type_p (tree t, tree type)
+{
+  wide_int min, max;
+  value_range_type rtype;
+  signop sign;
+  signop tsign;
+  bool has_range = false;
+
+  if (!INTEGRAL_TYPE_P (type))
+return false;
+  tsign = TYPE_SIGN (type);
+  sign = TYPE_SIGN (TREE_TYPE (t));
+  switch (TREE_CODE (t))
+{
+case INTEGER_CST:
+  min = t;
+  max = t;
+  has_range = true;
+  break;
+
+case SSA_NAME:
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
+   return false;
+  rtype = get_range_info (t, &min, &max);
+  if (rtype == VR_RANGE)
+   has_range = true;
+  else if (rtype == VR_ANTI_RANGE)
+   {
+ /* As ANTI_RANGEs always wrap around, just check if T's whole type's
+value range fits into TYPE.  */
+ min = TYPE_MIN_VALUE (TREE_TYPE (t));
+ max = TYPE_MAX_VALUE (TREE_TYPE (t));
+ has_range = true;
+   }
+  break;
+}
+  if (has_range)
+{
+  if (sign == tsign)
+   {
+ if (wi::le_p (max, TYPE_MAX_VALUE (type), sign)
+ && wi::ge_p (min, TYPE_MIN_VALUE (type), sign))
+   return true;
+   }
+  else if (sign == SIGNED && tsign == UNSIGNED)
+   {
+ if (wi::ge_p (min, 0, SIGNED)
+ && wi::le_p (max, TYPE_MAX_VALUE (type), UN

Re: Go patch committed: copy print code from Go 1.7 runtime

2016-11-10 Thread Ian Lance Taylor
On Tue, Oct 18, 2016 at 1:06 AM, Uros Bizjak  wrote:
>
>> This patch copies the code that implements the print and println
>> predeclared functions from the Go 1.7 runtime.  The compiler is
>> changed to use the new names, and to call the printlock and
>> printunlock functions around a sequence of print calls.  The writebuf
>> field in the g struct changes to a slice.  Bootstrapped and ran Go
>> testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
>
> This patch probably introduced recent regression on 32bit x86 multilib:
>
> Running target unix/-m32
> FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
> FAIL: go.test/test/printbig.go -O (test for excess errors)
> FAIL: go.test/test/printbig.go execution
>
> === go Summary for unix/-m32 ===
>
> # of expected passes6875
> # of unexpected failures3
> # of expected failures  1
> # of untested testcases 12
> # of unsupported tests  2
>
> e.g.:
>
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:27:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:45:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:38:
> error: integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:56:
> error: integer constant overflow
>
> FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
> UNTESTED: go.test/test/fixedbugs/bug114.go execution,  -O2 -g
>
> FAIL: go.test/test/printbig.go -O (test for excess errors)
> Excess errors:
> /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:12:8: error:
> integer constant overflow
> /home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:13:15:
> error: integer constant overflow
>
> ./printbig.exe >printbig.p 2>&1
> couldn't execute "./printbig.exe": no such file or directory
> FAIL: go.test/test/printbig.go execution
> UNTESTED: go.test/test/printbig.go compare


For the record, Uros filed this as https://gcc.gnu.org/PR78145 and it
is now fixed.

Ian


Re: Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-10 Thread Jeff Law

On 11/09/2016 03:40 AM, Andreas Schwab wrote:

As seen by the testcase in PR77822, combine can generate out-of-range
bit pos in a bit-field insn, unless the pattern explicitly rejects it.
This only makes a difference for expressions that are undefined at
runtime.  Without that we would either generate bad assembler or ICE in
output_btst.

PR target/78254
* config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
insns operating on a register.
Could you please include a testcase for this?  Even if it's something 
that triggered during a bootstrap -- few people bootstrap m68k these 
days :-)


jeff



Re: Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-10 Thread Andreas Schwab
On Nov 10 2016, Jeff Law  wrote:

> On 11/09/2016 03:40 AM, Andreas Schwab wrote:
>> As seen by the testcase in PR77822, combine can generate out-of-range
>> bit pos in a bit-field insn, unless the pattern explicitly rejects it.
>> This only makes a difference for expressions that are undefined at
>> runtime.  Without that we would either generate bad assembler or ICE in
>> output_btst.
>>
>>  PR target/78254
>>  * config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
>>  insns operating on a register.
> Could you please include a testcase for this?  Even if it's something that
> triggered during a bootstrap -- few people bootstrap m68k these days :-)

It didn't trigger during bootstrap.  It was the testcase for PR77822.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


C++ PATCH for c++/77337 (auto return and constexpr lambda)

2016-11-10 Thread Jason Merrill
The constexpr lambda change introduced some problematic dependency
ordering, since instantiate constexpr functions aggressively so that
they are available for constexpr evaluation.  The patch for 65942
delayed that instantiation by triggering it from constexpr evaluation
directly rather than earlier in mark_used, but instantiate_decl still
wanted to instantiate them right away.  This patch fixes that and a
latent bug in tsubst_friend_function, which was leaving DECL_INITIAL
set on a function that was not yet instantiated.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit aea89e80dc0b7b976fd69680771a8534af265d7e
Author: Jason Merrill 
Date:   Thu Nov 10 10:13:44 2016 -0800

PR c++/77337 - auto return and lambda

* pt.c (tsubst_friend_function): Don't set DECL_INITIAL.
(instantiate_decl): It's OK to defer a constexpr function.
* cp-tree.h (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION): Check
DECL_LANG_SPECIFIC.
* decl2.c (decl_defined_p): Use it.  No longer static.
* decl.c (redeclaration_error_message): Use decl_defined_p.
* constexpr.c (cxx_eval_call_expression): Set input_location around
call to instantiate_decl.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 43457d2..f75f0b0 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1464,9 +1464,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
   if (!DECL_INITIAL (fun)
   && DECL_TEMPLOID_INSTANTIATION (fun))
 {
+  location_t save_loc = input_location;
+  input_location = loc;
   ++function_depth;
   instantiate_decl (fun, /*defer_ok*/false, /*expl_inst*/false);
   --function_depth;
+  input_location = save_loc;
 }
 
   /* If in direct recursive call, optimize definition search.  */
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9b5b5bc..8183775 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4380,7 +4380,8 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
instantiated will not be a DECL_TEMPLATE_INSTANTIATION, but will be
a DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION.  */
 #define DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION(DECL) \
-  (DECL_TEMPLATE_INFO (DECL) && !DECL_USE_TEMPLATE (DECL))
+  (DECL_LANG_SPECIFIC (DECL) && DECL_TEMPLATE_INFO (DECL) \
+   && !DECL_USE_TEMPLATE (DECL))
 
 /* Nonzero if DECL is a function generated from a function 'temploid',
i.e. template, member of class template, or dependent friend.  */
@@ -5895,6 +5896,7 @@ extern void import_export_decl(tree);
 extern tree build_cleanup  (tree);
 extern tree build_offset_ref_call_from_tree(tree, vec **,
 tsubst_flags_t);
+extern bool decl_defined_p (tree);
 extern bool decl_constant_var_p(tree);
 extern bool decl_maybe_constant_var_p  (tree);
 extern void no_linkage_error   (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 4b18d4e..185c98b 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2778,8 +2778,8 @@ redeclaration_error_message (tree newdecl, tree olddecl)
 warn_extern_redeclared_static.  */
 
   /* Defining the same name twice is no good.  */
-  if (DECL_INITIAL (olddecl) != NULL_TREE
- && DECL_INITIAL (newdecl) != NULL_TREE)
+  if (decl_defined_p (olddecl)
+ && decl_defined_p (newdecl))
{
  if (DECL_NAME (olddecl) == NULL_TREE)
return G_("%q#D not declared in class");
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index e0fff1e..4ebc7dc 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -80,7 +80,6 @@ static void import_export_class (tree);
 static tree get_guard_bits (tree);
 static void determine_visibility_from_class (tree, tree);
 static bool determine_hidden_inline (tree);
-static bool decl_defined_p (tree);
 static void maybe_instantiate_decl (tree);
 
 /* A list of static class variables.  This is needed, because a
@@ -4085,11 +4084,15 @@ collect_ada_namespace (tree namespc, const char 
*source_file)
 /* Returns true iff there is a definition available for variable or
function DECL.  */
 
-static bool
+bool
 decl_defined_p (tree decl)
 {
   if (TREE_CODE (decl) == FUNCTION_DECL)
-return (DECL_INITIAL (decl) != NULL_TREE);
+return (DECL_INITIAL (decl) != NULL_TREE
+   /* A pending instantiation of a friend temploid is defined.  */
+   || (DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (decl)
+   && DECL_INITIAL (DECL_TEMPLATE_RESULT
+(DECL_TI_TEMPLATE (decl);
   else
 {
   gcc_assert (VAR_P (decl));
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e8b6afd..d4855d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9383,10 +9383,6 @@ tsubst_friend_function (tree decl, tree args)
   else
new_friend_result_template_info = NULL_TREE;
 
- 

Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-10 Thread Joseph Myers
On Thu, 10 Nov 2016, Richard Biener wrote:

> I'll address those comments.  As you did not have any comments on the 
> c-parser.[CH] parts does that mean you are fine with them?  That is, 
> does the above constitute a complete review of the patch?

I am fine with the c-parser.[ch] parts.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C++ PATCH for c++/77337 (auto return and constexpr lambda)

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 01:42:20PM -0800, Jason Merrill wrote:
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn33.C
> @@ -0,0 +1,27 @@
> +// PR c++/77337
> +// { dg-do compile { target c++14 } }
> +
> +template
> +struct fix_type {
> +  Functor functor;
> +
> +  decltype(auto) operator()()
> +  { return functor(*this); }
> +};
> +
> +template
> +fix_type fix(Functor functor)
> +{ return { functor }; }
> +
> +int main()
> +{
> +  auto zero = fix
> +([](auto& self) -> int // N.B. non-deduced, non-dependent return type
> + {
> +   return 0;
> +
> +   self(); // error: use of 'decltype(auto) 
> fix_type::operator()() [with Functor = main()::]' 
> before deduction of 'auto'

Wouldn't it be clearer to turn that // error: line into
// { dg-bogus "use of \[^\n\r]* before deduction of 'auto'" }
so that it is clear that the error is undesirable even to casual reader?

> + });
> +
> +  return zero();
> +}

Jakub


  1   2   >